[GIT PULL] SCSI fixes for 5.2-rc7

2019-07-05 Thread James Bottomley
Two iscsi fixes.  One for an oops in the client which can be triggered
by the server authentication protocol and the other in the target code
which causes data corruption.

The patches are available here:

git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git scsi-fixes

The short changelog is:

Maurizio Lombardi (1):
  scsi: iscsi: set auth_protocol back to NULL if CHAP_A value is not supported

Roman Bolshakov (1):
  scsi: target/iblock: Fix overrun in WRITE SAME emulation

And the diffstat:

 drivers/target/iscsi/iscsi_target_auth.c | 16 ++++++++--------
 drivers/target/target_core_iblock.c      |  2 +-
 2 files changed, 9 insertions(+), 9 deletions(-)

With full diff below.

James

---

diff --git a/drivers/target/iscsi/iscsi_target_auth.c b/drivers/target/iscsi/iscsi_target_auth.c
index 4e680d753941..e2fa3a3bc81d 100644
--- a/drivers/target/iscsi/iscsi_target_auth.c
+++ b/drivers/target/iscsi/iscsi_target_auth.c
@@ -89,6 +89,12 @@ static int chap_check_algorithm(const char *a_str)
return CHAP_DIGEST_UNKNOWN;
 }
 
+static void chap_close(struct iscsi_conn *conn)
+{
+   kfree(conn->auth_protocol);
+   conn->auth_protocol = NULL;
+}
+
 static struct iscsi_chap *chap_server_open(
struct iscsi_conn *conn,
struct iscsi_node_auth *auth,
@@ -126,7 +132,7 @@ static struct iscsi_chap *chap_server_open(
case CHAP_DIGEST_UNKNOWN:
default:
pr_err("Unsupported CHAP_A value\n");
-   kfree(conn->auth_protocol);
+   chap_close(conn);
return NULL;
}
 
@@ -141,19 +147,13 @@ static struct iscsi_chap *chap_server_open(
 * Generate Challenge.
 */
if (chap_gen_challenge(conn, 1, aic_str, aic_len) < 0) {
-   kfree(conn->auth_protocol);
+   chap_close(conn);
return NULL;
}
 
return chap;
 }
 
-static void chap_close(struct iscsi_conn *conn)
-{
-   kfree(conn->auth_protocol);
-   conn->auth_protocol = NULL;
-}
-
 static int chap_server_compute_md5(
struct iscsi_conn *conn,
struct iscsi_node_auth *auth,
diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
index b5ed9c377060..efebacd36101 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -515,7 +515,7 @@ iblock_execute_write_same(struct se_cmd *cmd)
 
/* Always in 512 byte units for Linux/Block */
block_lba += sg->length >> SECTOR_SHIFT;
-   sectors -= 1;
+   sectors -= sg->length >> SECTOR_SHIFT;
}
 
iblock_submit_bios(&list);
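
To make the WRITE SAME overrun concrete, here is a small userspace sketch of the loop arithmetic only; it is not the kernel code. The name sectors_submitted() and the `fixed` switch are made up for illustration, and the sketch assumes the requested sector count is a multiple of the scatterlist length expressed in 512-byte sectors.

```c
#include <stdint.h>

#define SECTOR_SHIFT 9 /* 512-byte units, as in the Linux block layer */

/*
 * Count how many 512-byte sectors the loop would submit.
 * sectors:   remaining length of the WRITE SAME request, in 512-byte sectors
 * sg_length: size in bytes of the single scatterlist entry being repeated
 * fixed:     0 models the old "sectors -= 1", 1 models the patched decrement
 */
uint64_t sectors_submitted(uint64_t sectors, uint32_t sg_length, int fixed)
{
    uint64_t step = sg_length >> SECTOR_SHIFT;
    uint64_t submitted = 0;

    while (sectors) {
        submitted += step;           /* one sg entry queued per iteration */
        sectors -= fixed ? step : 1; /* the line the patch changes */
    }
    return submitted;
}
```

With a 4096-byte entry and 16 sectors requested, the old decrement queues 128 sectors' worth of data, while the patched one queues exactly 16. With 512-byte entries both variants agree, so the overrun only shows up for larger logical block sizes.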


Re: rtc: zynqmp: One function call less in xlnx_rtc_alarm_irq_enable()

2019-07-05 Thread Markus Elfring
> Unless you use an upstream coccinelle script or you share the one you
> are using, this is not a useful information.

What do you think about extending the software development discussion
on a topic like “Pretty-printing of code for ternary operators?”?
https://systeme.lip6.fr/pipermail/cocci/2019-July/006079.html
https://lore.kernel.org/cocci/3d2a9d9a-790c-a0f0-f980-b560504ba...@web.de/

Regards,
Markus


[PATCH] irq/irqdomain: Fix typo in the comment on top of __irq_domain_add()

2019-07-05 Thread Zenghui Yu
Fix typo in the comment on top of __irq_domain_add().

Signed-off-by: Zenghui Yu 
---
 kernel/irq/irqdomain.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index a453e22..db7b713 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -123,7 +123,7 @@ void irq_domain_free_fwnode(struct fwnode_handle *fwnode)
  * @ops: domain callbacks
  * @host_data: Controller private data pointer
  *
- * Allocates and initialize and irq_domain structure.
+ * Allocates and initializes an irq_domain structure.
  * Returns pointer to IRQ domain, or NULL on failure.
  */
 struct irq_domain *__irq_domain_add(struct fwnode_handle *fwnode, int size,
-- 
1.8.3.1




Re: INFO: rcu detected stall in ext4_write_checks

2019-07-05 Thread Theodore Ts'o
On Fri, Jul 05, 2019 at 12:10:55PM -0700, Paul E. McKenney wrote:
> 
> Exactly, so although my patch might help for CONFIG_PREEMPT=n, it won't
> help in your scenario.  But looking at the dmesg from your URL above,
> I see the following:

I just tested with CONFIG_PREEMPT=n

% grep CONFIG_PREEMPT /build/ext4-64/.config
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTIRQ_TRACEPOINTS=y
# CONFIG_PREEMPTIRQ_EVENTS is not set

And with your patch, it's still not helping.

I think that's because SCHED_DEADLINE is a real-time style scheduler:

   In order to fulfill the guarantees that are made when a thread is
   admitted to the SCHED_DEADLINE policy, SCHED_DEADLINE threads are the
   highest priority (user controllable) threads in the system; if any
   SCHED_DEADLINE thread is runnable, it will preempt any thread scheduled
   under one of the other policies.

So a SCHED_DEADLINE process is not going to yield control of the CPU,
even if it calls cond_resched(), until the thread has run for more than
the sched_runtime parameter --- which for the syzkaller repro, was set
at 26 days.

There are some safety checks when using SCHED_DEADLINE:

   The kernel requires that:

   sched_runtime <= sched_deadline <= sched_period

   In addition, under the current implementation, all of the parameter
   values must be at least 1024 (i.e., just over one microsecond, which is
   the resolution of the implementation), and less than 2^63.  If any of
   these checks fails, sched_setattr(2) fails with the error EINVAL.

   The CBS guarantees non-interference between tasks, by throttling
   threads that attempt to over-run their specified Runtime.

   To ensure deadline scheduling guarantees, the kernel must prevent
   situations where the set of SCHED_DEADLINE threads is not feasible
   (schedulable) within the given constraints.  The kernel thus performs
   an admittance test when setting or changing SCHED_DEADLINE policy and
   attributes.  This admission test calculates whether the change is
   feasible; if it is not, sched_setattr(2) fails with the error EBUSY.
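
As a rough illustration of the parameter checks quoted above, here is a sketch that follows the man page wording rather than the kernel's implementation. The helper name dl_params_ok() and the two constants are hypothetical, and the admission test that can return EBUSY is not modeled at all.

```c
#include <stdbool.h>
#include <stdint.h>

#define DL_MIN_NS 1024ULL      /* lower bound stated in the man page text */
#define DL_MAX_NS (1ULL << 63) /* values must be strictly below 2^63 */

/* Returns true when the three values pass the range and ordering checks. */
bool dl_params_ok(uint64_t runtime, uint64_t deadline, uint64_t period)
{
    if (runtime < DL_MIN_NS || deadline < DL_MIN_NS || period < DL_MIN_NS)
        return false;
    if (runtime >= DL_MAX_NS || deadline >= DL_MAX_NS || period >= DL_MAX_NS)
        return false;
    /* sched_runtime <= sched_deadline <= sched_period */
    return runtime <= deadline && deadline <= period;
}
```

A runtime of 26 days is about 2.2e15 ns, far below 2^63, so dl_params_ok() accepts it provided the deadline and period are at least as large. Checks of this kind constrain the shape of the parameters, not how the thread behaves afterwards.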

The problem is that SCHED_DEADLINE is designed for sporadic tasks:

   A sporadic task is one that has a sequence of jobs, where each job is
   activated at most once per period.  Each job also has a relative
   deadline, before which it should finish execution, and a computation
   time, which is the CPU time necessary for executing the job.  The
   moment when a task wakes up because a new job has to be executed is
   called the arrival time (also referred to as the request time or
   release time).  The start time is the time at which a task starts its
   execution.  The absolute deadline is thus obtained by adding the
   relative deadline to the arrival time.

It appears that the kernel's admission control before allowing
SCHED_DEADLINE to be set on a thread was designed for sane
applications, and not abusive ones.  Given that the process started doing
abusive things *after* the SCHED_DEADLINE policy was set, for the kernel
to figure out that in fact SCHED_DEADLINE should be denied for any
arbitrary kernel thread would require either (a) solving the halting
problem, or (b) being able to anticipate the future (in which case,
we should be using that kernel algorithm to play the stock market  :-)

   - Ted


[PATCH] net: pasemi: fix an use-after-free in pasemi_mac_phy_init()

2019-07-05 Thread Wen Yang
The phy_dn variable is still being used in of_phy_connect() after the
of_node_put() call, which may result in use-after-free.

Fixes: 1dd2d06c0459 ("net: Rework pasemi_mac driver to use of_mdio infrastructure")
Signed-off-by: Wen Yang 
Cc: "David S. Miller" 
Cc: Thomas Gleixner 
Cc: Luis Chamberlain 
Cc: Michael Ellerman 
Cc: net...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/net/ethernet/pasemi/pasemi_mac.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/pasemi/pasemi_mac.c b/drivers/net/ethernet/pasemi/pasemi_mac.c
index bf5a7bc..be66601 100644
--- a/drivers/net/ethernet/pasemi/pasemi_mac.c
+++ b/drivers/net/ethernet/pasemi/pasemi_mac.c
@@ -1042,7 +1042,6 @@ static int pasemi_mac_phy_init(struct net_device *dev)
 
dn = pci_device_to_OF_node(mac->pdev);
phy_dn = of_parse_phandle(dn, "phy-handle", 0);
-   of_node_put(phy_dn);
 
mac->link = 0;
mac->speed = 0;
@@ -1051,6 +1050,7 @@ static int pasemi_mac_phy_init(struct net_device *dev)
phydev = of_phy_connect(dev, phy_dn, &pasemi_adjust_link, 0,
PHY_INTERFACE_MODE_SGMII);
 
+   of_node_put(phy_dn);
if (!phydev) {
printk(KERN_ERR "%s: Could not attach to phy\n", dev->name);
return -ENODEV;
-- 
2.9.5
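
The ordering problem this patch addresses can be sketched in userspace with a toy reference count. Everything below (struct node, node_get(), node_put(), connect_to()) is a hypothetical stand-in for the device-tree helpers, and the sketch assumes the caller holds the only reference, which is the worst case for a premature put.

```c
/* Hypothetical stand-in for a reference-counted device node. */
struct node {
    int refcount;
    int freed; /* set instead of really freeing, so the state stays inspectable */
};

static void node_get(struct node *n)
{
    n->refcount++;
}

static void node_put(struct node *n)
{
    if (--n->refcount == 0)
        n->freed = 1;
}

/* Stand-in for a consumer of the node: fails if the node is already released. */
static int connect_to(const struct node *n)
{
    return n->freed ? -1 : 0;
}

/* Old ordering: the reference is dropped before the node is used. */
int init_put_too_early(struct node *n)
{
    node_get(n);
    node_put(n);
    return connect_to(n);
}

/* Patched ordering: use the node first, then drop the reference. */
int init_put_after_use(struct node *n)
{
    int ret;

    node_get(n);
    ret = connect_to(n);
    node_put(n);
    return ret;
}
```

In this model the early put makes the consumer see a released node, while the reordered version succeeds and still leaves the reference count balanced.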



next-20190705 - problems generating certs/x509_certificate_list

2019-07-05 Thread Valdis Klētnieks
This worked fine in next-20190618, but in next-20190701 I'm seeing dmesg
entries at boot:

dmesg | grep -i x.509
[8.345699] Loading compiled-in X.509 certificates
[8.366137] Problem loading in-kernel X.509 certificate (-13)
[8.507348] cfg80211: Loading compiled-in X.509 certificates for regulatory database
[8.526556] cfg80211: Problem loading in-kernel X.509 certificate (-13)

I start debugging, and discover that certs/x509_certificate_list is a
zero-length file.  I rm it, and run 'make V=1 certs/system_certificates.o',
which tells me:

()
make -f ./scripts/Makefile.headersinst obj=include/uapi
make -f ./scripts/Makefile.headersinst obj=arch/x86/include/uapi
make -f ./scripts/Makefile.build obj=certs certs/system_certificates.o
 smoking gun alert
  scripts/extract-cert "" certs/x509_certificate_list

  gcc -Wp,-MD,certs/.system_certificates.o.d  -nostdinc -isystem 
/usr/lib/gcc/x86_64-redhat-linux/9/include -I./arch/x86/include 
-I./arch/x86/include/generated  -I./include -I./arch/x86/include/uapi 
-I./arch/x86/include/generated/uapi -I./include/uapi -I./include/generated/uapi 
-include ./include/linux/kconfig.h -D__KERNEL__ -D__ASSEMBLY__ -fno-PIE -m64 
-DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 
-DCONFIG_AS_SSSE3=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1 -DCONFIG_AS_AVX512=1 
-DCONFIG_AS_SHA1_NI=1 -DCONFIG_AS_SHA256_NI=1 -Wa,-gdwarf-2 -DCC_USING_FENTRY 
-I.   -c -o certs/system_certificates.o certs/system_certificates.S

I go look at extract-cert.c, and sure enough, if the first parameter is a
null string it just goes and creates an empty file.
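
For readers who have not looked at that code path, the behavior described above can be modeled as below. This is a sketch written from the description in this mail, not a copy of scripts/extract-cert.c; the function name extract_or_truncate() is made up, and the non-empty case is deliberately left unimplemented.

```c
#include <stdio.h>

/*
 * Hypothetical model of the described behavior.
 * Returns 0 on success, -1 on error or when real extraction would be needed.
 */
int extract_or_truncate(const char *cert_src, const char *cert_dst)
{
    FILE *out;

    if (cert_src[0] == '\0') {
        /* No input named: leave behind a zero-length output file. */
        out = fopen(cert_dst, "wb");
        if (!out)
            return -1;
        return fclose(out) == 0 ? 0 : -1;
    }

    /* Parsing a real certificate source is outside the scope of this sketch. */
    return -1;
}
```

The point is that an empty first argument is treated as a valid request and silently produces an empty file, so the failure only becomes visible later, at boot.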

The Makefile says:

quiet_cmd_extract_certs  = EXTRACT_CERTS   $(patsubst "%",%,$(2))
  cmd_extract_certs  = scripts/extract-cert $(2) $@

and damned if I know why $(2) is "". Diffed the config files from -0618 and
-0705, not seeing any relevant difference.

Any ideas?





[PATCH] net: axienet: fix a potential double free in axienet_probe()

2019-07-05 Thread Wen Yang
There is a possible double release of the np node in axienet_probe():

1701:   np = of_parse_phandle(pdev->dev.of_node, "axistream-connected", 0);
1702:   if (np) {
...
1787:   of_node_put(np); ---> released here
1788:   lp->eth_irq = platform_get_irq(pdev, 0);
1789:   } else {
...
1801:   }
1802:   if (IS_ERR(lp->dma_regs)) {
...
1805:   of_node_put(np); ---> double released here
1806:   goto free_netdev;
1807:   }

We solve this problem by removing the unnecessary of_node_put().

Fixes: 28ef9ebdb64c ("net: axienet: make use of axistream-connected attribute optional")
Signed-off-by: Wen Yang 
Cc: Anirudha Sarangi 
Cc: John Linn 
Cc: "David S. Miller" 
Cc: Michal Simek 
Cc: Robert Hancock 
Cc: net...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/net/ethernet/xilinx/xilinx_axienet_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
index 561e28a..4fc627f 100644
--- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
+++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
@@ -1802,7 +1802,6 @@ static int axienet_probe(struct platform_device *pdev)
if (IS_ERR(lp->dma_regs)) {
dev_err(&pdev->dev, "could not map DMA regs\n");
ret = PTR_ERR(lp->dma_regs);
-   of_node_put(np);
goto free_netdev;
}
if ((lp->rx_irq <= 0) || (lp->tx_irq <= 0)) {
-- 
2.9.5



[PATCH] can: flexcan: fix an use-after-free in flexcan_setup_stop_mode()

2019-07-05 Thread Wen Yang
The gpr_np variable is still being used in dev_dbg() after the
of_node_put() call, which may result in use-after-free.

Fixes: de3578c198c6 ("can: flexcan: add self wakeup support")
Signed-off-by: Wen Yang 
Cc: Wolfgang Grandegger 
Cc: Marc Kleine-Budde 
Cc: "David S. Miller" 
Cc: linux-...@vger.kernel.org
Cc: net...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/net/can/flexcan.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
index f2fe344..33ce45d 100644
--- a/drivers/net/can/flexcan.c
+++ b/drivers/net/can/flexcan.c
@@ -1437,10 +1437,10 @@ static int flexcan_setup_stop_mode(struct platform_device *pdev)
 
priv = netdev_priv(dev);
priv->stm.gpr = syscon_node_to_regmap(gpr_np);
-   of_node_put(gpr_np);
if (IS_ERR(priv->stm.gpr)) {
dev_dbg(&pdev->dev, "could not find gpr regmap\n");
-   return PTR_ERR(priv->stm.gpr);
+   ret = PTR_ERR(priv->stm.gpr);
+   goto out_put_node;
}
 
priv->stm.req_gpr = out_val[1];
@@ -1455,7 +1455,9 @@ static int flexcan_setup_stop_mode(struct platform_device *pdev)
 
device_set_wakeup_capable(&pdev->dev, true);
 
-   return 0;
+out_put_node:
+   of_node_put(gpr_np);
+   return ret;
 }
 
 static const struct of_device_id flexcan_of_match[] = {
-- 
2.9.5
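
The shape of this fix, a single exit label that drops the reference on both the success and the error path, can be sketched as follows. The struct res type, the setup() function and its fail_lookup parameter are hypothetical and only model the control flow.

```c
/* Hypothetical resource with a visible reference count. */
struct res {
    int refs;
};

/*
 * Acquire a reference, do some work that may fail, and release the
 * reference at one common exit point regardless of the outcome.
 */
int setup(struct res *r, int fail_lookup)
{
    int ret = 0;

    r->refs++; /* reference taken, as a node lookup would do */

    if (fail_lookup) {
        ret = -1;
        goto out_put; /* error path still reaches the release */
    }

    /* ... normal work that may still dereference r goes here ... */

out_put:
    r->refs--; /* released only after the last use */
    return ret;
}
```

Keeping the release at the label means later additions to the function body cannot accidentally use the resource after it has been put.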



[PATCH 1/3] kbuild: remove obj and src from the top Makefile

2019-07-05 Thread Masahiro Yamada
$(obj) is not used in the top Makefile at all. $(src) is used in
3 sites, but they can be replaced with $(srctree).

Signed-off-by: Masahiro Yamada 
---

 Makefile | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/Makefile b/Makefile
index 014390e32b0e..a5615edf2196 100644
--- a/Makefile
+++ b/Makefile
@@ -248,9 +248,6 @@ endif
 export KBUILD_CHECKSRC KBUILD_EXTMOD KBUILD_SRC
 
 objtree:= .
-src:= $(srctree)
-obj:= $(objtree)
-
 VPATH  := $(srctree)
 
 export srctree objtree VPATH
@@ -1705,7 +1702,7 @@ CHECKSTACK_ARCH := $(ARCH)
 endif
 checkstack:
$(OBJDUMP) -d vmlinux $$(find . -name '*.ko') | \
-   $(PERL) $(src)/scripts/checkstack.pl $(CHECKSTACK_ARCH)
+   $(PERL) $(srctree)/scripts/checkstack.pl $(CHECKSTACK_ARCH)
 
 kernelrelease:
@echo "$(KERNELVERSION)$$($(CONFIG_SHELL) $(srctree)/scripts/setlocalversion $(srctree))"
@@ -1724,11 +1721,11 @@ endif
 
 tools/: FORCE
$(Q)mkdir -p $(objtree)/tools
-   $(Q)$(MAKE) LDFLAGS= MAKEFLAGS="$(tools_silent) $(filter --j% -j,$(MAKEFLAGS))" O=$(abspath $(objtree)) subdir=tools -C $(src)/tools/
+   $(Q)$(MAKE) LDFLAGS= MAKEFLAGS="$(tools_silent) $(filter --j% -j,$(MAKEFLAGS))" O=$(abspath $(objtree)) subdir=tools -C $(srctree)/tools/
 
 tools/%: FORCE
$(Q)mkdir -p $(objtree)/tools
-   $(Q)$(MAKE) LDFLAGS= MAKEFLAGS="$(tools_silent) $(filter --j% -j,$(MAKEFLAGS))" O=$(abspath $(objtree)) subdir=tools -C $(src)/tools/ $*
+   $(Q)$(MAKE) LDFLAGS= MAKEFLAGS="$(tools_silent) $(filter --j% -j,$(MAKEFLAGS))" O=$(abspath $(objtree)) subdir=tools -C $(srctree)/tools/ $*
 
 # Single targets
 # ---
-- 
2.17.1



[PATCH 3/3] kbuild: add a flag to force absolute path for srctree

2019-07-05 Thread Masahiro Yamada
In the old days, Kbuild always used an absolute path for $(srctree).

Since commit 890676c65d69 ("kbuild: Use relative path when building in
the source tree"), $(srctree) is '.' when not using O=.

Yet, using absolute paths is useful in some cases even without O=, for
instance, to create a cscope file with absolute path tags.

O=. was used as an idiom to force Kbuild to use absolute paths even
when you are building in the source tree.

Since commit 25b146c5b8ce ("kbuild: allow Kbuild to start from any
directory"), Kbuild is too clever to be tricked. Even if you pass O=.,
Kbuild notices that you are building in the source tree and uses '.' for
$(srctree).

So, "make O=. cscope" is of no help in creating absolute path tags.

We cannot force one or the other according to commit e93bc1a0cab3
("Revert "kbuild: specify absolute paths for cscope""). Both relative
and absolute paths have pros and cons.

This commit adds a new flag KBUILD_ABS_SRCTREE to allow users to
choose the absolute path for $(srctree).

"make KBUILD_ABS_SRCTREE=1 cscope" will work as a replacement for
"make O=. cscope".

I added Fixes since that commit broke some users' workflow.

Fixes: 25b146c5b8ce ("kbuild: allow Kbuild to start from any directory")
Reported-by: Pawan Gupta 
Signed-off-by: Masahiro Yamada 
---

 Documentation/kbuild/kbuild.txt | 9 +++++++++
 Makefile                        | 4 ++++
 scripts/tags.sh                 | 3 +--
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/Documentation/kbuild/kbuild.txt b/Documentation/kbuild/kbuild.txt
index 7a7e2aa2fab5..3ef42f87f275 100644
--- a/Documentation/kbuild/kbuild.txt
+++ b/Documentation/kbuild/kbuild.txt
@@ -182,6 +182,15 @@ The output directory is often set using "O=..." on the commandline.
 
 The value can be overridden in which case the default value is ignored.
 
+KBUILD_ABS_SRCTREE
+--
+Kbuild uses a relative path to point to the tree when possible. For instance,
+when building in the source tree, the source tree path is '.'
+
+Setting this flag requests Kbuild to use absolute path to the source tree.
+There are some useful cases to do so, like when generating tag files with
+absolute path entries etc.
+
 KBUILD_SIGN_PIN
 --
 This variable allows a passphrase or PIN to be passed to the sign-file
diff --git a/Makefile b/Makefile
index 534a5dc796b1..6dc453f86f00 100644
--- a/Makefile
+++ b/Makefile
@@ -244,6 +244,10 @@ else
building_out_of_srctree := 1
 endif
 
+ifneq ($(KBUILD_ABS_SRCTREE),)
+srctree := $(abs_srctree)
+endif
+
 objtree:= .
 VPATH  := $(srctree)
 
diff --git a/scripts/tags.sh b/scripts/tags.sh
index 7fea4044749b..4e18ae5282a6 100755
--- a/scripts/tags.sh
+++ b/scripts/tags.sh
@@ -17,8 +17,7 @@ ignore="$(echo "$RCS_FIND_IGNORE" | sed 's|\\||g' )"
 # tags and cscope files should also ignore MODVERSION *.mod.c files
 ignore="$ignore ( -name *.mod.c ) -prune -o"
 
-# Do not use full path if we do not use O=.. builds
-# Use make O=. {tags|cscope}
+# Use make KBUILD_ABS_SRCTREE=1 {tags|cscope}
 # to force full paths for a non-O= build
 if [ "${srctree}" = "." -o -z "${srctree}" ]; then
tree=
-- 
2.17.1
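
The selection of $(srctree) after this series can be paraphrased in C as below. This is only a model of the Makefile conditionals visible in the patches, with a hypothetical function name choose_srctree() and a force_abs parameter standing in for a non-empty KBUILD_ABS_SRCTREE; both paths are assumed to be absolute and free of trailing slashes.

```c
#include <stddef.h>
#include <string.h>

/*
 * Model of how the top Makefile picks $(srctree).
 * force_abs != 0 corresponds to KBUILD_ABS_SRCTREE being set.
 */
const char *choose_srctree(const char *abs_srctree, const char *abs_objtree,
                           int force_abs)
{
    size_t src_len = strlen(abs_srctree);
    const char *slash = strrchr(abs_objtree, '/');

    if (force_abs)
        return abs_srctree;

    /* building in the source tree */
    if (strcmp(abs_srctree, abs_objtree) == 0)
        return ".";

    /*
     * building in a direct subdirectory of the source tree:
     * "<srctree>/" must equal the directory part of objtree
     */
    if (slash && (size_t)(slash - abs_objtree) == src_len &&
        strncmp(abs_srctree, abs_objtree, src_len) == 0)
        return "..";

    return abs_srctree;
}
```

With the flag set, an in-tree build gets the absolute path instead of '.', which is what the tags and cscope targets need to emit absolute entries.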



[PATCH 2/3] kbuild: replace KBUILD_SRCTREE with boolean building_out_of_srctree

2019-07-05 Thread Masahiro Yamada
Commit 25b146c5b8ce ("kbuild: allow Kbuild to start from any directory")
deprecated KBUILD_SRC.

It is only used in tools/testing/selftests/ to distinguish an out-of-tree
build. Replace it with a new boolean flag, building_out_of_srctree.

I also replaced the conditional ($(srctree),.) because the next commit
will allow an absolute path for $(srctree) even when building in the
source tree.

Signed-off-by: Masahiro Yamada 
---

 Makefile                         | 19 ++++++++-----------
 scripts/Makefile.build           |  2 +-
 scripts/Makefile.host            |  2 +-
 scripts/Makefile.lib             |  2 +-
 scripts/Makefile.modbuiltin      |  2 +-
 scripts/gdb/linux/Makefile       |  2 +-
 tools/testing/selftests/Makefile |  2 +-
 tools/testing/selftests/lib.mk   |  4 ++--
 8 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/Makefile b/Makefile
index a5615edf2196..534a5dc796b1 100644
--- a/Makefile
+++ b/Makefile
@@ -228,9 +228,12 @@ ifeq ("$(origin M)", "command line")
   KBUILD_EXTMOD := $(M)
 endif
 
+export KBUILD_CHECKSRC KBUILD_EXTMOD
+
 ifeq ($(abs_srctree),$(abs_objtree))
 # building in the source tree
 srctree := .
+   building_out_of_srctree :=
 else
 ifeq ($(abs_srctree)/,$(dir $(abs_objtree)))
 # building in a subdirectory of the source tree
@@ -238,19 +241,13 @@ else
 else
 srctree := $(abs_srctree)
 endif
-
-   # TODO:
-   # KBUILD_SRC is only used to distinguish in-tree/out-of-tree build.
-   # Replace it with $(srctree) or something.
-   KBUILD_SRC := $(abs_srctree)
+   building_out_of_srctree := 1
 endif
 
-export KBUILD_CHECKSRC KBUILD_EXTMOD KBUILD_SRC
-
 objtree:= .
 VPATH  := $(srctree)
 
-export srctree objtree VPATH
+export building_out_of_srctree srctree objtree VPATH
 
 # To make sure we do not include .config for any of the *config targets
 # catch them early, and hand them over to scripts/kconfig/Makefile
@@ -453,7 +450,7 @@ USERINCLUDE:= \
 LINUXINCLUDE:= \
-I$(srctree)/arch/$(SRCARCH)/include \
-I$(objtree)/arch/$(SRCARCH)/include/generated \
-   $(if $(filter .,$(srctree)),,-I$(srctree)/include) \
+   $(if $(building_out_of_srctree),-I$(srctree)/include) \
-I$(objtree)/include \
$(USERINCLUDE)
 
@@ -509,7 +506,7 @@ PHONY += outputmakefile
 # At the same time when output Makefile generated, generate .gitignore to
 # ignore whole output directory
 outputmakefile:
-ifneq ($(srctree),.)
+ifdef building_out_of_srctree
$(Q)ln -fsn $(srctree) source
$(Q)$(CONFIG_SHELL) $(srctree)/scripts/mkmakefile $(srctree)
$(Q)test -e .gitignore || \
@@ -1093,7 +1090,7 @@ PHONY += prepare archprepare prepare1 prepare3
 # and if so do:
 # 1) Check that make has not been executed in the kernel src $(srctree)
 prepare3: include/config/kernel.release
-ifneq ($(srctree),.)
+ifdef building_out_of_srctree
@$(kecho) '  Using $(srctree) as source for kernel'
$(Q)if [ -f $(srctree)/.config -o \
 -d $(srctree)/include/config -o \
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 341fca59d28f..1086caaac786 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -509,7 +509,7 @@ existing-targets := $(wildcard $(sort $(targets)))
 
 -include $(foreach f,$(existing-targets),$(dir $(f)).$(notdir $(f)).cmd)
 
-ifneq ($(srctree),.)
+ifdef building_out_of_srctree
 # Create directories for object files if they do not exist
 obj-dirs := $(sort $(obj) $(patsubst %/,%, $(dir $(targets))))
 # If targets exist, their directories apparently exist. Skip mkdir.
diff --git a/scripts/Makefile.host b/scripts/Makefile.host
index b6a54bdf0965..fcf0213e6ac8 100644
--- a/scripts/Makefile.host
+++ b/scripts/Makefile.host
@@ -69,7 +69,7 @@ _hostcxx_flags = $(KBUILD_HOSTCXXFLAGS) $(HOST_EXTRACXXFLAGS) \
 
 # $(objtree)/$(obj) for including generated headers from checkin source files
 ifeq ($(KBUILD_EXTMOD),)
-ifneq ($(srctree),.)
+ifdef building_out_of_srctree
 _hostc_flags   += -I $(objtree)/$(obj)
 _hostcxx_flags += -I $(objtree)/$(obj)
 endif
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 4d006923763c..f835a40ebae5 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -148,7 +148,7 @@ endif
 # $(srctree)/$(src) for including checkin headers from generated source files
 # $(objtree)/$(obj) for including generated headers from checkin source files
 ifeq ($(KBUILD_EXTMOD),)
-ifneq ($(srctree),.)
+ifdef building_out_of_srctree
 _c_flags   += -I $(srctree)/$(src) -I $(objtree)/$(obj)
 _a_flags   += -I $(srctree)/$(src) -I $(objtree)/$(obj)
 _cpp_flags += -I $(srctree)/$(src) -I $(objtree)/$(obj)
diff --git a/scripts/Makefile.modbuiltin b/scripts/Makefile.modbuiltin
index 12ac300fe51b..7d4711b88656 100644
--- a/scripts/Makefile.modbuiltin
+++ b/scripts/Makefile.modbuiltin
@@ -15,7 +15,7 @@ include 

[git pull] fix bogus default y in Kconfig (VALIDATE_FS_PARSER)

2019-07-05 Thread Al Viro
That thing should not be turned on by default, especially since
it's not quiet in case it finds no problems.  Geert has sent the obvious
fix quite a few times, but it fell through the cracks.

The following changes since commit 570d7a98e7d6d5d8706d94ffd2d40adeaa318332:

  vfs: move_mount: reject moving kernel internal mounts (2019-07-01 10:46:36 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git fixes

for you to fetch changes up to 75f2d86b20bf6aec0392d6dd2ae326d2ae0e:

  fs: VALIDATE_FS_PARSER should default to n (2019-07-05 11:22:11 -0400)


Geert Uytterhoeven (1):
  fs: VALIDATE_FS_PARSER should default to n

 fs/Kconfig | 1 -
 1 file changed, 1 deletion(-)


Re: kernel BUG at mm/swap_state.c:170!

2019-07-05 Thread Linus Torvalds
On Fri, Jul 5, 2019 at 4:03 PM Jan Kara  wrote:
>
> Yeah, I guess revert of 5fd4ca2d84b2 at this point is probably the best we
> can do. Let's CC Linus, Andrew, and Greg (Linus is travelling AFAIK so I'm
> not sure whether Greg won't do release for him).

I'm back home now, although possibly jetlagged.

The revert looks trivial (a conflict due to find_get_entries_tag()
having been removed in the meantime), and I guess that's the right
thing to do right now.

Matthew, comments?

   Linus


Re: [GIT PULL] nfsd bugfixes for 5.2

2019-07-05 Thread pr-tracker-bot
The pull request you sent on Fri, 5 Jul 2019 13:40:37 -0400:

> git://linux-nfs.org/~bfields/linux.git tags/nfsd-5.2-2

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/a8f46b5afe1c0a83c3013a339e6aeccc2f37342d

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [GIT PULL] Final KVM changes for 5.2

2019-07-05 Thread pr-tracker-bot
The pull request you sent on Fri,  5 Jul 2019 22:29:30 +0200:

> https://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/9fdb86c8cf9ae201d97334ecc2d1918800cac424

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [PULL REQUEST] i2c for 5.2

2019-07-05 Thread pr-tracker-bot
The pull request you sent on Fri, 5 Jul 2019 21:21:29 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git i2c/for-current

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/881ed91f7db58fcbe8fdca056907991c3c9d8f2d

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [PATCH 2/2] usb: pci-quirks: Minor cleanup for AMD PLL quirk

2019-07-05 Thread Ryan Kennedy
On Fri, Jul 5, 2019 at 3:10 PM Alan Stern  wrote:
>
> On Thu, 4 Jul 2019, Ryan Kennedy wrote:
>
> > usb_amd_find_chipset_info() is used for chipset detection for
> > several quirks. It is strange that its return value indicates
> > the need for the PLL quirk, which means it is often ignored.
> > This patch adds a function specifically for checking the PLL
> > quirk like the other ones. Additionally, rename probe_result to
> > something more appropriate.
> >
> > Signed-off-by: Ryan Kennedy 
>
> > @@ -322,6 +317,13 @@ bool usb_amd_prefetch_quirk(void)
> >  }
> >  EXPORT_SYMBOL_GPL(usb_amd_prefetch_quirk);
> >
> > +bool usb_amd_quirk_pll_check(void)
> > +{
> > + usb_amd_find_chipset_info();
> > + return amd_chipset.need_pll_quirk;
> > +}
> > +EXPORT_SYMBOL_GPL(usb_amd_quirk_pll_check);
>
> I really don't see the point of separating out all but one line into a
> different function.  You might as well just rename
> usb_amd_find_chipset_info to usb_amd_quirk_pll_check (along with the
> other code adjustments) and be done with it.

I did this for consistency with the others:

usb_amd_prefetch_quirk()
usb_amd_hang_symptom_quirk()
usb_hcd_amd_remote_wakeup_quirk()

They all need to ensure the chipset information exists then decide if
the particular quirk should be applied to the chipset.
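
The pattern described here, where each quirk helper first makes sure the chipset has been probed and then reports its own flag, might look roughly like this in isolation. All names below are hypothetical stand-ins rather than the functions in pci-quirks.c, and the detection result is hard-coded.

```c
#include <stdbool.h>

/* Hypothetical cached detection state, filled in on first use. */
struct chipset_info {
    bool probed;
    bool need_pll_quirk;
    bool need_prefetch_quirk;
};

static struct chipset_info chipset;
static int probe_count; /* only here so the sketch can show probing happens once */

static void find_chipset_info(void)
{
    if (chipset.probed)
        return;

    probe_count++;
    /* A real driver would inspect PCI devices here; values are made up. */
    chipset.need_pll_quirk = true;
    chipset.need_prefetch_quirk = false;
    chipset.probed = true;
}

bool quirk_pll_check(void)
{
    find_chipset_info();
    return chipset.need_pll_quirk;
}

bool quirk_prefetch_check(void)
{
    find_chipset_info();
    return chipset.need_prefetch_quirk;
}

int chipset_probe_count(void)
{
    return probe_count;
}
```

Each helper is one call plus one field read, which is the consistency argument made above: callers never have to know whether detection has already run.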

Ryan

>
> However, in the end I don't care if you still want to do this.  Either
> way:
>
> Acked-by: Alan Stern 
>
> Alan Stern
>


[PATCH v7 2/2] KVM: LAPIC: Inject timer interrupt via posted interrupt

2019-07-05 Thread Wanpeng Li
From: Wanpeng Li 

Dedicated instances are currently disturbed by unnecessary jitter because
the emulated lapic timers fire on the same pCPUs on which the vCPUs reside.
Unlike ARM, Intel has no hardware virtual timer for guests. Both programming
the timer in the guest and the firing of the emulated timer incur vmexits.
This patch tries to avoid the vmexit incurred when the emulated timer fires
in the dedicated-instance scenario.

When nohz_full is enabled in the dedicated-instance scenario, the emulated
timers can be offloaded to the nearest busy housekeeping CPUs, since APICv
has become really common in recent years. The guest timer interrupt is
injected via a posted interrupt, which is delivered by the housekeeping CPU
once the emulated timer fires.

The host admin should fine-tune the setup: e.g., in the dedicated-instance
scenario, nohz_full should cover the pCPUs on which the vCPUs reside,
several surplus pCPUs should be left for busy housekeeping, and
mwait/hlt/pause vmexits should be disabled to stay in non-root mode. With
that, a ~3% redis performance benefit can be observed on a Skylake server.

w/o patch:

VM-EXIT             Samples  Samples%  Time%   Min Time  Max Time  Avg time

EXTERNAL_INTERRUPT  42916    49.43%    39.30%  0.47us    106.09us  0.71us ( +-   1.09% )

w/ patch:

VM-EXIT             Samples  Samples%  Time%   Min Time  Max Time  Avg time

EXTERNAL_INTERRUPT  6871     9.29%     2.96%   0.44us    57.88us   0.72us ( +-   4.02% )

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Marcelo Tosatti 

Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/lapic.c| 101 ++--
 arch/x86/kvm/lapic.h|   1 +
 arch/x86/kvm/vmx/vmx.c  |   3 +-
 arch/x86/kvm/x86.c  |   6 +++
 arch/x86/kvm/x86.h  |   2 +
 include/linux/sched/isolation.h |   2 +
 kernel/sched/isolation.c|   6 +++
 7 files changed, 85 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 707ca9c..4869691 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -128,6 +128,17 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
return apic->vcpu->vcpu_id;
 }
 
+bool kvm_can_post_timer_interrupt(struct kvm_vcpu *vcpu)
+{
+   return pi_inject_timer && kvm_vcpu_apicv_active(vcpu);
+}
+EXPORT_SYMBOL_GPL(kvm_can_post_timer_interrupt);
+
+static bool kvm_use_posted_timer_interrupt(struct kvm_vcpu *vcpu)
+{
+   return kvm_can_post_timer_interrupt(vcpu) && vcpu->mode == IN_GUEST_MODE;
+}
+
 static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
u32 dest_id, struct kvm_lapic ***cluster, u16 *mask) {
switch (map->mode) {
@@ -1436,29 +1447,6 @@ static void apic_update_lvtt(struct kvm_lapic *apic)
}
 }
 
-static void apic_timer_expired(struct kvm_lapic *apic)
-{
-   struct kvm_vcpu *vcpu = apic->vcpu;
-   struct swait_queue_head *q = &vcpu->wq;
-   struct kvm_timer *ktimer = &apic->lapic_timer;
-
-   if (atomic_read(&apic->lapic_timer.pending))
-   return;
-
-   atomic_inc(&apic->lapic_timer.pending);
-   kvm_set_pending_timer(vcpu);
-
-   /*
-* For x86, the atomic_inc() is serialized, thus
-* using swait_active() is safe.
-*/
-   if (swait_active(q))
-   swake_up_one(q);
-
-   if (apic_lvtt_tscdeadline(apic) || ktimer->hv_timer_in_use)
-   ktimer->expired_tscdeadline = ktimer->tscdeadline;
-}
-
 /*
  * On APICv, this test will cause a busy wait
  * during a higher-priority task.
@@ -1532,7 +1520,7 @@ static inline void adjust_lapic_timer_advance(struct kvm_vcpu *vcpu,
apic->lapic_timer.timer_advance_ns = timer_advance_ns;
 }
 
-void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
+static void __kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
 {
struct kvm_lapic *apic = vcpu->arch.apic;
u64 guest_tsc, tsc_deadline;
@@ -1540,9 +1528,6 @@ void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
if (apic->lapic_timer.expired_tscdeadline == 0)
return;
 
-   if (!lapic_timer_int_injected(vcpu))
-   return;
-
tsc_deadline = apic->lapic_timer.expired_tscdeadline;
apic->lapic_timer.expired_tscdeadline = 0;
guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
@@ -1554,8 +1539,59 @@ void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
if (unlikely(!apic->lapic_timer.timer_advance_adjust_done))
adjust_lapic_timer_advance(vcpu, apic->lapic_timer.advance_expire_delta);
 }
+
+void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
+{
+   if (!lapic_timer_int_injected(vcpu))
+   return;
+
+   __kvm_wait_lapic_expire(vcpu);
+}
 EXPORT_SYMBOL_GPL(kvm_wait_lapic_expire);
 
+static void kvm_apic_inject_pending_timer_irqs(struct kvm_lapic *apic)
+{
+   struct kvm_timer *ktimer = &apic->lapic_timer;
+
+   kvm_apic_local_deliver(apic, APIC_LVTT);
+   if (apic_lvtt_tscdeadline(apic))
+   ktimer->tscdeadline = 0;
+   if (apic_lvtt_oneshot(apic)) {
+   

[PATCH v7 0/2] KVM: LAPIC: Implement Exitless Timer

2019-07-05 Thread Wanpeng Li
Dedicated instances are currently disturbed by unnecessary jitter because 
the emulated lapic timers fire on the same pCPUs on which the vCPUs reside.
Unlike ARM, Intel has no hardware virtual timer for guests. Both 
programming the timer in the guest and the firing of the emulated timer 
incur vmexits. This patchset tries to avoid the vmexit incurred when the 
emulated timer fires in the dedicated instance scenario.

When nohz_full is enabled in the dedicated instances scenario, unpinned 
timers are moved to the nearest busy housekeepers after commit
9642d18eee2cd ("nohz: Affine unpinned timers to housekeepers") and commit 
444969223c8 ("sched/nohz: Fix affine unpinned timers mess"). However, 
KVM always pins the lapic timer to the pCPU on which the vCPU resides; the 
reason is explained by commit 61abdbe0 ("kvm: x86: make lapic hrtimer 
pinned"). In practice, these emulated timers can be offloaded to the 
housekeeping CPUs, since APICv has become common in recent years. Once the 
emulated timer fires, the housekeeping CPU injects the guest timer 
interrupt via a posted interrupt.

The host should be well tuned by the admin, e.g. a dedicated instances 
scenario with nohz_full covering the pCPUs on which the vCPUs reside, 
several surplus pCPUs for busy housekeeping, and mwait/hlt/pause vmexits 
disabled to keep the vCPUs in non-root mode. With such a setup, a ~3% redis 
performance benefit can be observed on a Skylake server.

w/o patchset:

VM-EXIT             Samples  Samples%  Time%    Min Time  Max Time   Avg time

EXTERNAL_INTERRUPT  42916    49.43%    39.30%   0.47us    106.09us   0.71us ( +-   1.09% )

w/ patchset:

VM-EXIT             Samples  Samples%  Time%    Min Time  Max Time   Avg time

EXTERNAL_INTERRUPT  6871     9.29%     2.96%    0.44us    57.88us    0.72us ( +-   4.02% )
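
As a quick reading of the two measurements above, the relative drop in
EXTERNAL_INTERRUPT exits can be computed from the sample counts, taking
them as 42916 (w/o patchset) and 6871 (w/ patchset). The helper names
below are illustrative only and are not part of the patchset.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Illustrative helper (not from the patchset): percentage drop going
 * from 'before' samples to 'after' samples.
 */
double reduction_percent(unsigned long before, unsigned long after)
{
	assert(before > 0 && after <= before);
	return 100.0 * (double)(before - after) / (double)before;
}

/* Print the drop for the EXTERNAL_INTERRUPT sample counts quoted above. */
void print_external_interrupt_drop(void)
{
	printf("EXTERNAL_INTERRUPT samples reduced by %.1f%%\n",
	       reduction_percent(42916UL, 6871UL));
}
```

The result only compares exit counts over the two runs; it says nothing
about the run lengths, which are not given in the cover letter.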

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Marcelo Tosatti 

v6 -> v7:
 * remove bool argument

v5 -> v6:
 * don't overwrite whatever the user specified
 * introduce kvm_can_post_timer_interrupt and kvm_use_posted_timer_interrupt
 * remove kvm_hlt_in_guest() condition
 * squash all of 2/3/4 together

v4 -> v5:
 * update patch description in patch 1/4
 * feed latest apic->lapic_timer.expired_tscdeadline to kvm_wait_lapic_expire()
 * squash advance timer handling to patch 2/4

v3 -> v4:
 * drop the HRTIMER_MODE_ABS_PINNED, add kick after set pending timer
 * don't inject an already-expired timer via posted interrupt

v2 -> v3:
 * disarming the vmx preemption timer when posted_interrupt_inject_timer_enabled()
 * check kvm_hlt_in_guest instead

v1 -> v2:
 * check vcpu_halt_in_guest
 * move module parameter from kvm-intel to kvm
 * add housekeeping_enabled
 * rename apic_timer_expired_pi to kvm_apic_inject_pending_timer_irqs


Wanpeng Li (2):
  KVM: LAPIC: Make lapic timer unpinned
  KVM: LAPIC: Inject timer interrupt via posted interrupt

 arch/x86/kvm/lapic.c| 109 ++--
 arch/x86/kvm/lapic.h|   1 +
 arch/x86/kvm/vmx/vmx.c  |   3 +-
 arch/x86/kvm/x86.c  |  12 +++--
 arch/x86/kvm/x86.h  |   2 +
 include/linux/sched/isolation.h |   2 +
 kernel/sched/isolation.c|   6 +++
 7 files changed, 90 insertions(+), 45 deletions(-)

-- 
1.8.3.1



[PATCH v7 1/2] KVM: LAPIC: Make lapic timer unpinned

2019-07-05 Thread Wanpeng Li
From: Wanpeng Li 

Commit 61abdbe0bcc2 ("kvm: x86: make lapic hrtimer pinned") pinned the
lapic timer so that the guest would not have to wait until the next kvm
exit to see KVM_REQ_PENDING_TIMER set. An alternative is to kick the vCPU
after setting the KVM_REQ_PENDING_TIMER bit. Make the lapic timer
unpinned; this will be used in follow-up patches.
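
The "set the request bit, then kick" ordering described above can be
sketched in user space as follows. This is a simplified model under
stated assumptions: struct model_vcpu, REQ_PENDING_TIMER and the model_*
functions are hypothetical stand-ins, and the kick is reduced to a
counter rather than a real IPI or wakeup.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define REQ_PENDING_TIMER 0x1u	/* stand-in for KVM_REQ_PENDING_TIMER */

/* Hypothetical user-space model of a vCPU, not struct kvm_vcpu. */
struct model_vcpu {
	atomic_uint requests;	/* pending request bits */
	int running_on;		/* CPU running the vCPU, -1 if not in guest mode */
	unsigned int kicks;	/* how many kicks were delivered */
};

void model_vcpu_init(struct model_vcpu *v, int running_on)
{
	assert(v != NULL);
	atomic_init(&v->requests, 0u);
	v->running_on = running_on;
	v->kicks = 0;
}

/* Model of a kick: only needed when the target runs on another CPU. */
void model_kick(struct model_vcpu *v, int this_cpu)
{
	if (v->running_on >= 0 && v->running_on != this_cpu)
		v->kicks++;
}

/* Publish the request first, then kick, so the target sees the bit. */
void model_set_pending_timer(struct model_vcpu *v, int this_cpu)
{
	atomic_fetch_or(&v->requests, REQ_PENDING_TIMER);
	model_kick(v, this_cpu);
}

bool model_timer_pending(struct model_vcpu *v)
{
	return (atomic_load(&v->requests) & REQ_PENDING_TIMER) != 0;
}
```

With an unpinned timer the expiry handler may run on a CPU other than
the one running the vCPU, which is the case where the kick matters in
this model.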

Cc: Paolo Bonzini 
Cc: Radim Krčmář 
Cc: Marcelo Tosatti 
Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/lapic.c | 8 
 arch/x86/kvm/x86.c   | 6 +-
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 459d1ee..707ca9c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1582,7 +1582,7 @@ static void start_sw_tscdeadline(struct kvm_lapic *apic)
likely(ns > apic->lapic_timer.timer_advance_ns)) {
expire = ktime_add_ns(now, ns);
expire = ktime_sub_ns(expire, ktimer->timer_advance_ns);
-   hrtimer_start(&ktimer->timer, expire, HRTIMER_MODE_ABS_PINNED);
+   hrtimer_start(&ktimer->timer, expire, HRTIMER_MODE_ABS);
} else
apic_timer_expired(apic);
 
@@ -1684,7 +1684,7 @@ static void start_sw_period(struct kvm_lapic *apic)
 
hrtimer_start(&apic->lapic_timer.timer,
apic->lapic_timer.target_expiration,
-   HRTIMER_MODE_ABS_PINNED);
+   HRTIMER_MODE_ABS);
 }
 
 bool kvm_lapic_hv_timer_in_use(struct kvm_vcpu *vcpu)
@@ -2321,7 +2321,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int 
timer_advance_ns)
apic->vcpu = vcpu;
 
hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC,
-HRTIMER_MODE_ABS_PINNED);
+HRTIMER_MODE_ABS);
apic->lapic_timer.timer.function = apic_timer_fn;
if (timer_advance_ns == -1) {
apic->lapic_timer.timer_advance_ns = 
LAPIC_TIMER_ADVANCE_ADJUST_INIT;
@@ -2510,7 +2510,7 @@ void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
 
timer = &vcpu->arch.apic->lapic_timer.timer;
if (hrtimer_cancel(timer))
-   hrtimer_start_expires(timer, HRTIMER_MODE_ABS_PINNED);
+   hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
 }
 
 /*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3a7cd935..e199ac7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1437,12 +1437,8 @@ static void update_pvclock_gtod(struct timekeeper *tk)
 
 void kvm_set_pending_timer(struct kvm_vcpu *vcpu)
 {
-   /*
-* Note: KVM_REQ_PENDING_TIMER is implicitly checked in
-* vcpu_enter_guest.  This function is only called from
-* the physical CPU that is running vcpu.
-*/
kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
+   kvm_vcpu_kick(vcpu);
 }
 
 static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
-- 
1.8.3.1



Re: linux-next: build failure after merge of the kbuild tree

2019-07-05 Thread Masahiro Yamada
Hi Michael,

On Sat, Jul 6, 2019 at 9:05 AM Michael Kelley  wrote:
>
> From: Stephen Rothwell   Sent: Friday, July 5, 2019 
> 1:31 AM
> >
> > After merging the kbuild tree, today's linux-next build (powerpc
> > allyesconfig) failed like this:
> >
> > In file included from :
> > include/clocksource/hyperv_timer.h:18:10: fatal error: asm/mshyperv.h: No 
> > such file or
> > directory
> >  #include 
> >   ^~~~
> >
> > Caused by commit
> >
> >   34085aeb5816 ("kbuild: compile-test kernel headers to ensure they are 
> > self-contained")
> >
> > interacting with commit
> >
> >   dd2cb348613b ("clocksource/drivers: Continue making Hyper-V clocksource 
> > ISA agnostic")
> >
> > from the tip tree.
> >
>
> Thomas -- let's remove my two clocksource patches from your 'tip' tree.  I'll 
> need
> a little time to fully understand the self-contained header requirements and 
> restructure
> hyperv_timer.h to avoid this problem.

I do not think you have to drop your patches.

Since <asm/mshyperv.h> only exists on x86,
guarding it with CONFIG_X86 is OK.
So, I think Stephen's patch is OK as-is.

Perhaps, Kbuild is imposing too much burden,
but I'd like to try it and see how it goes.


-- 
Best Regards
Masahiro Yamada


[PATCH 2/6] fs/namespace.c: shift put_mountpoint() to callers of unhash_mnt()

2019-07-05 Thread Al Viro
From: Al Viro 

make unhash_mnt() return the mountpoint to be dropped, let callers
deal with it.

Signed-off-by: Al Viro 
---
 fs/namespace.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 746e3fd1f430..b7059a4f07e3 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -795,15 +795,17 @@ static void __touch_mnt_namespace(struct mnt_namespace 
*ns)
 /*
  * vfsmount lock must be held for write
  */
-static void unhash_mnt(struct mount *mnt)
+static struct mountpoint *unhash_mnt(struct mount *mnt)
 {
+   struct mountpoint *mp;
mnt->mnt_parent = mnt;
mnt->mnt_mountpoint = mnt->mnt.mnt_root;
list_del_init(&mnt->mnt_child);
hlist_del_init_rcu(&mnt->mnt_hash);
hlist_del_init(&mnt->mnt_mp_list);
-   put_mountpoint(mnt->mnt_mp);
+   mp = mnt->mnt_mp;
mnt->mnt_mp = NULL;
+   return mp;
 }
 
 /*
@@ -813,7 +815,7 @@ static void detach_mnt(struct mount *mnt, struct path 
*old_path)
 {
old_path->dentry = mnt->mnt_mountpoint;
old_path->mnt = &mnt->mnt_parent->mnt;
-   unhash_mnt(mnt);
+   put_mountpoint(unhash_mnt(mnt));
 }
 
 /*
@@ -823,7 +825,7 @@ static void umount_mnt(struct mount *mnt)
 {
/* old mountpoint will be dropped when we can do that */
mnt->mnt_ex_mountpoint = mnt->mnt_mountpoint;
-   unhash_mnt(mnt);
+   put_mountpoint(unhash_mnt(mnt));
 }
 
 /*
-- 
2.11.0
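
The shape of the change above (the detach helper hands the mountpoint
back, and each caller decides how to drop it) can be sketched in user
space. The model_* names and structs below are hypothetical stand-ins
for illustration; they are not the kernel's types, and locking is left
out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins, not the kernel's structs. */
struct model_mountpoint {
	int m_count;				/* number of references */
};

struct model_mount {
	struct model_mountpoint *mnt_mp;	/* NULL when not attached */
};

/* Drop one reference; returns 1 if that was the last one. */
int model_put_mountpoint(struct model_mountpoint *mp)
{
	assert(mp != NULL && mp->m_count > 0);
	return --mp->m_count == 0;
}

/* Detach, and return the mountpoint instead of dropping it here. */
struct model_mountpoint *model_unhash_mnt(struct model_mount *mnt)
{
	struct model_mountpoint *mp = mnt->mnt_mp;

	mnt->mnt_mp = NULL;
	return mp;
}

/* A caller in the style of umount_mnt(): it owns the put. */
int model_umount_mnt(struct model_mount *mnt)
{
	return model_put_mountpoint(model_unhash_mnt(mnt));
}
```

Returning the pointer keeps the detach step free of any policy about
where the dropped reference goes, which is what lets a later patch pass
extra arguments to the put without touching the detach helper.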



[PATCH 6/6] switch the remnants of releasing the mountpoint away from fs_pin

2019-07-05 Thread Al Viro
From: Al Viro 

We used to need rather convoluted ordering trickery to guarantee
that dput() of ex-mountpoints happens before the final mntput()
of the same.  Since we don't need that anymore, there's no point
playing with fs_pin for that.

Signed-off-by: Al Viro 
---
 fs/fs_pin.c| 10 ++
 fs/mount.h |  7 +--
 fs/namespace.c | 37 +++--
 include/linux/fs_pin.h |  1 -
 4 files changed, 26 insertions(+), 29 deletions(-)

diff --git a/fs/fs_pin.c b/fs/fs_pin.c
index a6497cf8ae53..47ef3c71ce90 100644
--- a/fs/fs_pin.c
+++ b/fs/fs_pin.c
@@ -19,20 +19,14 @@ void pin_remove(struct fs_pin *pin)
spin_unlock_irq(&pin->wait.lock);
 }
 
-void pin_insert_group(struct fs_pin *pin, struct vfsmount *m, struct 
hlist_head *p)
+void pin_insert(struct fs_pin *pin, struct vfsmount *m)
 {
spin_lock(&pin_lock);
-   if (p)
-   hlist_add_head(&pin->s_list, p);
+   hlist_add_head(&pin->s_list, &m->mnt_sb->s_pins);
hlist_add_head(&pin->m_list, &real_mount(m)->mnt_pins);
spin_unlock(&pin_lock);
 }
 
-void pin_insert(struct fs_pin *pin, struct vfsmount *m)
-{
-   pin_insert_group(pin, m, &m->mnt_sb->s_pins);
-}
-
 void pin_kill(struct fs_pin *p)
 {
wait_queue_entry_t wait;
diff --git a/fs/mount.h b/fs/mount.h
index 84aa8cdf4971..711a4093e475 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -58,7 +58,10 @@ struct mount {
struct mount *mnt_master;   /* slave is on master->mnt_slave_list */
struct mnt_namespace *mnt_ns;   /* containing namespace */
struct mountpoint *mnt_mp;  /* where is it mounted */
-   struct hlist_node mnt_mp_list;  /* list mounts with the same mountpoint 
*/
+   union {
+   struct hlist_node mnt_mp_list;  /* list mounts with the same 
mountpoint */
+   struct hlist_node mnt_umount;
+   };
struct list_head mnt_umounting; /* list entry for umount propagation */
 #ifdef CONFIG_FSNOTIFY
struct fsnotify_mark_connector __rcu *mnt_fsnotify_marks;
@@ -68,7 +71,7 @@ struct mount {
int mnt_group_id;   /* peer group identifier */
int mnt_expiry_mark;/* true if marked for expiry */
struct hlist_head mnt_pins;
-   struct fs_pin mnt_umount;
+   struct hlist_head mnt_stuck_children;
 } __randomize_layout;
 
 #define MNT_NS_INTERNAL ERR_PTR(-EINVAL) /* distinct from any mnt_namespace */
diff --git a/fs/namespace.c b/fs/namespace.c
index 326a9ab591bc..a5d0eac9749d 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -171,13 +171,6 @@ unsigned int mnt_get_count(struct mount *mnt)
 #endif
 }
 
-static void drop_mountpoint(struct fs_pin *p)
-{
-   struct mount *m = container_of(p, struct mount, mnt_umount);
-   pin_remove(p);
-   mntput(&m->mnt);
-}
-
 static struct mount *alloc_vfsmnt(const char *name)
 {
struct mount *mnt = kmem_cache_zalloc(mnt_cache, GFP_KERNEL);
@@ -215,7 +208,7 @@ static struct mount *alloc_vfsmnt(const char *name)
INIT_LIST_HEAD(&mnt->mnt_slave);
INIT_HLIST_NODE(&mnt->mnt_mp_list);
INIT_LIST_HEAD(&mnt->mnt_umounting);
-   init_fs_pin(&mnt->mnt_umount, drop_mountpoint);
+   INIT_HLIST_HEAD(&mnt->mnt_stuck_children);
}
return mnt;
 
@@ -1079,19 +1072,22 @@ static struct mount *clone_mnt(struct mount *old, 
struct dentry *root,
 
 static void cleanup_mnt(struct mount *mnt)
 {
+   struct hlist_node *p;
+   struct mount *m;
/*
-* This probably indicates that somebody messed
-* up a mnt_want/drop_write() pair.  If this
-* happens, the filesystem was probably unable
-* to make r/w->r/o transitions.
-*/
-   /*
+* The warning here probably indicates that somebody messed
+* up a mnt_want/drop_write() pair.  If this happens, the
+* filesystem was probably unable to make r/w->r/o transitions.
 * The locking used to deal with mnt_count decrement provides barriers,
 * so mnt_get_writers() below is safe.
 */
WARN_ON(mnt_get_writers(mnt));
if (unlikely(mnt->mnt_pins.first))
mnt_pin_kill(mnt);
+   hlist_for_each_entry_safe(m, p, &mnt->mnt_stuck_children, mnt_umount) {
+   hlist_del(&m->mnt_umount);
+   mntput(&m->mnt);
+   }
fsnotify_vfsmount_delete(&mnt->mnt);
dput(mnt->mnt.mnt_root);
deactivate_super(mnt->mnt.mnt_sb);
@@ -1160,6 +1156,7 @@ static void mntput_no_expire(struct mount *mnt)
struct mount *p, *tmp;
list_for_each_entry_safe(p, tmp, >mnt_mounts,  mnt_child) {
umount_mnt(p, );
+   hlist_add_head(>mnt_umount, 
>mnt_stuck_children);
}
}
unlock_mount_hash();
@@ -1352,6 +1349,8 @@ EXPORT_SYMBOL(may_umount);
 static void namespace_unlock(void)
 {
struct hlist_head head;
+   struct hlist_node *p;
+   struct mount *m;
 

[PATCH 4/6] make struct mountpoint bear the dentry reference to mountpoint, not struct mount

2019-07-05 Thread Al Viro
From: Al Viro 

Signed-off-by: Al Viro 
---
 fs/mount.h |  1 -
 fs/namespace.c | 66 +-
 2 files changed, 28 insertions(+), 39 deletions(-)

diff --git a/fs/mount.h b/fs/mount.h
index 6250de544760..84aa8cdf4971 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -69,7 +69,6 @@ struct mount {
int mnt_expiry_mark;/* true if marked for expiry */
struct hlist_head mnt_pins;
struct fs_pin mnt_umount;
-   struct dentry *mnt_ex_mountpoint;
 } __randomize_layout;
 
 #define MNT_NS_INTERNAL ERR_PTR(-EINVAL) /* distinct from any mnt_namespace */
diff --git a/fs/namespace.c b/fs/namespace.c
index b7059a4f07e3..911675de2a70 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -69,6 +69,8 @@ static struct hlist_head *mount_hashtable __read_mostly;
 static struct hlist_head *mountpoint_hashtable __read_mostly;
 static struct kmem_cache *mnt_cache __read_mostly;
 static DECLARE_RWSEM(namespace_sem);
+static HLIST_HEAD(unmounted);  /* protected by namespace_sem */
+static LIST_HEAD(ex_mountpoints);
 
 /* /sys/fs */
 struct kobject *fs_kobj;
@@ -172,7 +174,6 @@ unsigned int mnt_get_count(struct mount *mnt)
 static void drop_mountpoint(struct fs_pin *p)
 {
struct mount *m = container_of(p, struct mount, mnt_umount);
-   dput(m->mnt_ex_mountpoint);
pin_remove(p);
mntput(&m->mnt);
 }
@@ -739,7 +740,7 @@ static struct mountpoint *get_mountpoint(struct dentry 
*dentry)
 
/* Add the new mountpoint to the hash table */
read_seqlock_excl(&mount_lock);
-   new->m_dentry = dentry;
+   new->m_dentry = dget(dentry);
new->m_count = 1;
hlist_add_head(&new->m_hash, mp_hash(dentry));
INIT_HLIST_HEAD(&new->m_list);
@@ -752,7 +753,7 @@ static struct mountpoint *get_mountpoint(struct dentry 
*dentry)
return mp;
 }
 
-static void put_mountpoint(struct mountpoint *mp)
+static void put_mountpoint(struct mountpoint *mp, struct list_head *list)
 {
if (!--mp->m_count) {
struct dentry *dentry = mp->m_dentry;
@@ -760,6 +761,9 @@ static void put_mountpoint(struct mountpoint *mp)
spin_lock(&dentry->d_lock);
dentry->d_flags &= ~DCACHE_MOUNTED;
spin_unlock(&dentry->d_lock);
+   if (!list)
+   list = &ex_mountpoints;
+   dput_to_list(dentry, list);
hlist_del(&mp->m_hash);
kfree(mp);
}
@@ -813,19 +817,17 @@ static struct mountpoint *unhash_mnt(struct mount *mnt)
  */
 static void detach_mnt(struct mount *mnt, struct path *old_path)
 {
-   old_path->dentry = mnt->mnt_mountpoint;
+   old_path->dentry = dget(mnt->mnt_mountpoint);
old_path->mnt = &mnt->mnt_parent->mnt;
-   put_mountpoint(unhash_mnt(mnt));
+   put_mountpoint(unhash_mnt(mnt), NULL);
 }
 
 /*
  * vfsmount lock must be held for write
  */
-static void umount_mnt(struct mount *mnt)
+static void umount_mnt(struct mount *mnt, struct list_head *list)
 {
-   /* old mountpoint will be dropped when we can do that */
-   mnt->mnt_ex_mountpoint = mnt->mnt_mountpoint;
-   put_mountpoint(unhash_mnt(mnt));
+   put_mountpoint(unhash_mnt(mnt), list);
 }
 
 /*
@@ -837,7 +839,7 @@ void mnt_set_mountpoint(struct mount *mnt,
 {
mp->m_count++;
mnt_add_count(mnt, 1);  /* essentially, that's mntget */
-   child_mnt->mnt_mountpoint = dget(mp->m_dentry);
+   child_mnt->mnt_mountpoint = mp->m_dentry;
child_mnt->mnt_parent = mnt;
child_mnt->mnt_mp = mp;
hlist_add_head(&child_mnt->mnt_mp_list, &mp->m_list);
@@ -864,7 +866,6 @@ static void attach_mnt(struct mount *mnt,
 void mnt_change_mountpoint(struct mount *parent, struct mountpoint *mp, struct 
mount *mnt)
 {
struct mountpoint *old_mp = mnt->mnt_mp;
-   struct dentry *old_mountpoint = mnt->mnt_mountpoint;
struct mount *old_parent = mnt->mnt_parent;
 
list_del_init(&mnt->mnt_child);
@@ -873,23 +874,7 @@ void mnt_change_mountpoint(struct mount *parent, struct 
mountpoint *mp, struct m
 
attach_mnt(mnt, parent, mp);
 
-   put_mountpoint(old_mp);
-
-   /*
-* Safely avoid even the suggestion this code might sleep or
-* lock the mount hash by taking advantage of the knowledge that
-* mnt_change_mountpoint will not release the final reference
-* to a mountpoint.
-*
-* During mounting, the mount passed in as the parent mount will
-* continue to use the old mountpoint and during unmounting, the
-* old mountpoint will continue to exist until namespace_unlock,
-* which happens well after mnt_change_mountpoint.
-*/
-   spin_lock(&old_mountpoint->d_lock);
-   old_mountpoint->d_lockref.count--;
-   spin_unlock(&old_mountpoint->d_lock);
-
+   put_mountpoint(old_mp, NULL);
mnt_add_count(old_parent, -1);
 }
 
@@ -1142,6 +1127,8 @@ static DECLARE_DELAYED_WORK(delayed_mntput_work, 
delayed_mntput);
 

[PATCH 1/6] __detach_mounts(): lookup_mountpoint() can't return ERR_PTR() anymore

2019-07-05 Thread Al Viro
From: Al Viro 

... not since 1e9c75fb9c47 ("mnt: fix __detach_mounts infinite loop")

Signed-off-by: Al Viro 
---
 fs/namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 6fbc9126367a..746e3fd1f430 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1625,7 +1625,7 @@ void __detach_mounts(struct dentry *dentry)
namespace_lock();
lock_mount_hash();
mp = lookup_mountpoint(dentry);
-   if (IS_ERR_OR_NULL(mp))
+   if (!mp)
goto out_unlock;
 
event++;
-- 
2.11.0



[PATCH 3/6] Teach shrink_dcache_parent() to cope with mixed-filesystem shrink lists

2019-07-05 Thread Al Viro
From: Al Viro 

Currently, running into a shrink list that contains dentries from different
filesystems can cause several unpleasant things for shrink_dcache_parent()
and for umount(2).

The first problem is that there's a window during shrink_dentry_list() between
__dentry_kill() taking a victim out and the reference to its parent being
dropped.  During that window the parent looks like a genuine busy dentry.
shrink_dcache_parent() (or, worse yet, shrink_dcache_for_umount()) coming at
that time will see no eviction candidates and no indication that it needs to
wait for some shrink_dentry_list() to proceed further.

That applies to any shrink list that might intersect with the subtree we are
trying to shrink; the only reason it does not blow up on umount(2) in the
mainline is that we unregister the memory shrinker before hitting
shrink_dcache_for_umount().

Another problem happens if something in a mixed-filesystem shrink list gets
stuck in e.g. iput(), causing umount of an unrelated fs to spin waiting for
the stuck shrinker to get around to our dentries.

Solution:
1) have shrink_dentry_list() decrement the parent's refcount and
make sure it's on a shrink list (ours unless it already had been on some
other) before calling __dentry_kill().  That eliminates the window when
shrink_dcache_parent() would've blown past the entire subtree without
noticing anything with zero refcount not on shrink lists.
2) when shrink_dcache_parent() has found no eviction candidates,
but some dentries are still sitting on shrink lists, rather than
repeating the scan in hope that shrinkers have progressed, scan looking
for something on shrink lists with zero refcount.  If such a thing is
found, grab rcu_read_lock() and stop the scan, with caller locking
it for eviction, dropping out of RCU and doing __dentry_kill(), with
the same treatment for parent as shrink_dentry_list() would do.

Note that right now mixed-filesystem shrink lists do not occur, so this
is not a mainline bug.  However, there's a bunch of uses for such
beasts (e.g. the "try and evict everything we can out of given page"
patches; there are potential uses in mount-related code, considerably
simplifying the life in fs/namespace.c, etc.)
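
Step 1 of the solution above (drop the parent's reference and make sure
a zero-count parent ends up on a shrink list, ours unless it is already
on another one) can be modelled in user space. The flags, the struct and
model_dput_to_list() below are simplified, hypothetical stand-ins; the
real helper runs under ->d_lock and manipulates actual lists.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags, simplified from the dcache ones. */
#define MODEL_SHRINK_LIST 0x1u	/* entry sits on some shrink list */
#define MODEL_LRU_LIST    0x2u	/* entry sits on the LRU */

struct model_dentry {
	unsigned int flags;
	int count;		/* reference count */
};

/*
 * Drop one reference.  Returns true if the entry was queued on the
 * caller's shrink list by this call, false otherwise.
 */
bool model_dput_to_list(struct model_dentry *d)
{
	assert(d->count > 0);
	if (d->flags & MODEL_SHRINK_LIST) {
		/* already on a list: let the owner of that list deal with it */
		--d->count;
		return false;
	}
	/* an entry we are about to judge must not stay on the LRU */
	d->flags &= ~MODEL_LRU_LIST;
	if (--d->count == 0) {
		d->flags |= MODEL_SHRINK_LIST;
		return true;
	}
	return false;
}
```

In this model an entry with zero count is always visible on some shrink
list, which is the property the commit message relies on to close the
window.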

Signed-off-by: Al Viro 
---
 fs/dcache.c   | 98 ---
 fs/internal.h |  2 ++
 2 files changed, 83 insertions(+), 17 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index c435398f2c81..d8732cf2e302 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -861,6 +861,32 @@ void dput(struct dentry *dentry)
 }
 EXPORT_SYMBOL(dput);
 
+static void __dput_to_list(struct dentry *dentry, struct list_head *list)
+__must_hold(&dentry->d_lock)
+{
+   if (dentry->d_flags & DCACHE_SHRINK_LIST) {
+   /* let the owner of the list it's on deal with it */
+   --dentry->d_lockref.count;
+   } else {
+   if (dentry->d_flags & DCACHE_LRU_LIST)
+   d_lru_del(dentry);
+   if (!--dentry->d_lockref.count)
+   d_shrink_add(dentry, list);
+   }
+}
+
+void dput_to_list(struct dentry *dentry, struct list_head *list)
+{
+   rcu_read_lock();
+   if (likely(fast_dput(dentry))) {
+   rcu_read_unlock();
+   return;
+   }
+   rcu_read_unlock();
+   if (!retain_dentry(dentry))
+   __dput_to_list(dentry, list);
+   spin_unlock(&dentry->d_lock);
+}
 
 /* This must be called with d_lock held */
 static inline void __dget_dlock(struct dentry *dentry)
@@ -1067,7 +1093,7 @@ static bool shrink_lock_dentry(struct dentry *dentry)
return false;
 }
 
-static void shrink_dentry_list(struct list_head *list)
+void shrink_dentry_list(struct list_head *list)
 {
while (!list_empty(list)) {
struct dentry *dentry, *parent;
@@ -1089,18 +1115,9 @@ static void shrink_dentry_list(struct list_head *list)
rcu_read_unlock();
d_shrink_del(dentry);
parent = dentry->d_parent;
+   if (parent != dentry)
+   __dput_to_list(parent, list);
__dentry_kill(dentry);
-   if (parent == dentry)
-   continue;
-   /*
-* We need to prune ancestors too. This is necessary to prevent
-* quadratic behavior of shrink_dcache_parent(), but is also
-* expected to be beneficial in reducing dentry cache
-* fragmentation.
-*/
-   dentry = parent;
-   while (dentry && !lockref_put_or_lock(&dentry->d_lockref))
-   dentry = dentry_kill(dentry);
}
 }
 
@@ -1445,8 +1462,11 @@ int d_set_mounted(struct dentry *dentry)
 
 struct select_data {
struct dentry *start;
+   union {
+   long found;
+   struct dentry *victim;
+   };
struct list_head dispose;
-   int found;
 };
 
 static enum 

[PATCH 5/6] get rid of detach_mnt()

2019-07-05 Thread Al Viro
From: Al Viro 

Lift getting the original mount (dentry is actually not needed at all)
of the mountpoint into the callers - to do_move_mount() and pivot_root()
level.  That simplifies the cleanup in those and allows saner
arguments for attach_recursive_mnt().

Signed-off-by: Al Viro 
---
 fs/namespace.c | 62 ++
 1 file changed, 28 insertions(+), 34 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 911675de2a70..326a9ab591bc 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -815,16 +815,6 @@ static struct mountpoint *unhash_mnt(struct mount *mnt)
 /*
  * vfsmount lock must be held for write
  */
-static void detach_mnt(struct mount *mnt, struct path *old_path)
-{
-   old_path->dentry = dget(mnt->mnt_mountpoint);
-   old_path->mnt = &mnt->mnt_parent->mnt;
-   put_mountpoint(unhash_mnt(mnt), NULL);
-}
-
-/*
- * vfsmount lock must be held for write
- */
 static void umount_mnt(struct mount *mnt, struct list_head *list)
 {
put_mountpoint(unhash_mnt(mnt), list);
@@ -2037,7 +2027,7 @@ int count_mounts(struct mnt_namespace *ns, struct mount 
*mnt)
 static int attach_recursive_mnt(struct mount *source_mnt,
struct mount *dest_mnt,
struct mountpoint *dest_mp,
-   struct path *parent_path)
+   bool moving)
 {
struct user_namespace *user_ns = current->nsproxy->mnt_ns->user_ns;
HLIST_HEAD(tree_list);
@@ -2055,7 +2045,7 @@ static int attach_recursive_mnt(struct mount *source_mnt,
return PTR_ERR(smp);
 
/* Is there space to add these mounts to the mount namespace? */
-   if (!parent_path) {
+   if (!moving) {
err = count_mounts(ns, source_mnt);
if (err)
goto out;
@@ -2074,8 +2064,8 @@ static int attach_recursive_mnt(struct mount *source_mnt,
} else {
lock_mount_hash();
}
-   if (parent_path) {
-   detach_mnt(source_mnt, parent_path);
+   if (moving) {
+   unhash_mnt(source_mnt);
attach_mnt(source_mnt, dest_mnt, dest_mp);
touch_mnt_namespace(source_mnt->mnt_ns);
} else {
@@ -2173,7 +2163,7 @@ static int graft_tree(struct mount *mnt, struct mount *p, 
struct mountpoint *mp)
  d_is_dir(mnt->mnt.mnt_root))
return -ENOTDIR;
 
-   return attach_recursive_mnt(mnt, p, mp, NULL);
+   return attach_recursive_mnt(mnt, p, mp, false);
 }
 
 /*
@@ -2566,11 +2556,11 @@ static bool check_for_nsfs_mounts(struct mount *subtree)
 
 static int do_move_mount(struct path *old_path, struct path *new_path)
 {
-   struct path parent_path = {.mnt = NULL, .dentry = NULL};
struct mnt_namespace *ns;
struct mount *p;
struct mount *old;
-   struct mountpoint *mp;
+   struct mount *parent;
+   struct mountpoint *mp, *old_mp;
int err;
bool attached;
 
@@ -2580,7 +2570,9 @@ static int do_move_mount(struct path *old_path, struct 
path *new_path)
 
old = real_mount(old_path->mnt);
p = real_mount(new_path->mnt);
+   parent = old->mnt_parent;
attached = mnt_has_parent(old);
+   old_mp = old->mnt_mp;
ns = old->mnt_ns;
 
err = -EINVAL;
@@ -2608,7 +2600,7 @@ static int do_move_mount(struct path *old_path, struct 
path *new_path)
/*
 * Don't move a mount residing in a shared parent.
 */
-   if (attached && IS_MNT_SHARED(old->mnt_parent))
+   if (attached && IS_MNT_SHARED(parent))
goto out;
/*
 * Don't move a mount tree containing unbindable mounts to a destination
@@ -2624,18 +2616,21 @@ static int do_move_mount(struct path *old_path, struct 
path *new_path)
goto out;
 
err = attach_recursive_mnt(old, real_mount(new_path->mnt), mp,
-  attached ? &parent_path : NULL);
+  attached);
if (err)
goto out;
 
/* if the mount is moved, it should no longer be expire
 * automatically */
list_del_init(&old->mnt_expire);
+   if (attached)
+   put_mountpoint(old_mp, NULL);
 out:
unlock_mount(mp);
if (!err) {
-   path_put(&parent_path);
-   if (!attached)
+   if (attached)
+   mntput_no_expire(parent);
+   else
free_mnt_ns(ns);
}
return err;
@@ -3578,8 +3573,8 @@ EXPORT_SYMBOL(path_is_under);
 SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
const char __user *, put_old)
 {
-   struct path new, old, parent_path, root_parent, root;
-   struct mount *new_mnt, *root_mnt, *old_mnt;
+   struct path new, old, root;
+   struct mount *new_mnt, *root_mnt, *old_mnt, *root_parent, *ex_parent;

[RFC][PATCHES] (hopefully) saner refcounting for mountpoint dentries

2019-07-05 Thread Al Viro
Currently, we handle mountpoint dentry lifetime in a very convoluted
way.
* each struct mount attached to a mount tree contributes to ->d_count
of mountpoint dentry (pointed to by ->mnt_mountpoint).
* permanently detaching a mount from a mount tree moves the reference
into ->mnt_ex_mountpoint.
* that reference is dropped by drop_mountpoint(), which must happen
no later than the filesystem the mountpoint resides on gets shut down.

The last part makes for really unpleasant ordering logic; it works, but it's
bloody hard to follow and it's a lot more complex under the hood than anyone
would like.

The root cause of those complexities is that we can't do dput() while we
are detaching the thing, since the locking environment there doesn't tolerate
IO, blocking, etc., and dput() can trigger all of that.

Another complication (in analysis, not in the code) is that we also have
struct mountpoint in the picture.  Once upon a time it used to be a part
of struct dentry - the list of all mounts on given mountpoint.  Since
it doesn't make sense to bloat every dentry for the sake of a very small
fraction that will ever be anyone's mountpoints, that thing got separated.

What we have is
* mark in dentry flags (DCACHE_MOUNTED) set for dentries that are
currently mountpoints
* for each of those we have a struct mountpoint instance (exactly
one for each of those dentries).
* struct mountpoint has a pointer to its dentry (->m_dentry); it
does not contribute to refcount.
* struct mountpoint instances are hashed (all the time), using
->m_dentry as search key.
* struct mount has reference to struct mountpoint (->mnt_mp),
for as long as it is attached to a parent.  When ->mnt_mp is non-NULL
we are guaranteed that m->mnt_mp->m_dentry == m->mnt_mountpoint.
* struct mountpoint is refcounted, and ->mnt_mp contributes
to that refcount.  All other contributing references are transient -
pretty much dropped by the same function that has grabbed them.
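
The bookkeeping listed above (one refcounted struct mountpoint per
mountpoint dentry, found by using the dentry as the search key) can be
sketched in user space. Everything named model_* is a hypothetical
stand-in; a single list replaces the hash table and no locking is shown.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space model, not the kernel structures. */
struct model_mp {
	const void *m_dentry;	/* search key; not a counted reference here */
	int m_count;		/* number of holders */
	struct model_mp *next;
};

static struct model_mp *model_mp_table;	/* one bucket, for brevity */

struct model_mp *model_lookup_mp(const void *dentry)
{
	struct model_mp *mp;

	for (mp = model_mp_table; mp; mp = mp->next)
		if (mp->m_dentry == dentry)
			return mp;
	return NULL;
}

/* Find the instance for 'dentry' or create it; takes one reference. */
struct model_mp *model_get_mp(const void *dentry)
{
	struct model_mp *mp = model_lookup_mp(dentry);

	if (mp) {
		mp->m_count++;
		return mp;
	}
	mp = malloc(sizeof(*mp));
	if (!mp)
		return NULL;
	mp->m_dentry = dentry;
	mp->m_count = 1;
	mp->next = model_mp_table;
	model_mp_table = mp;
	return mp;
}

/* Drop one reference; unhash and free the instance on the last put. */
void model_put_mp(struct model_mp *mp)
{
	struct model_mp **pp;

	assert(mp != NULL && mp->m_count > 0);
	if (--mp->m_count)
		return;
	for (pp = &model_mp_table; *pp; pp = &(*pp)->next) {
		if (*pp == mp) {
			*pp = mp->next;
			break;
		}
	}
	free(mp);
}
```

Two lookups for the same key return the same instance, so there is
exactly one model_mp per key for as long as anyone holds it.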

The reasons why ->m_dentry can't become dangling (despite not contributing
to dentry refcount) or persist to the shutdown of filesystem dentry
belongs to are different for transient and persistent references to
struct mountpoint - holders of the former have dentry (and a struct
mount of the filesystem it's on) pinned until after they drop their
reference to struct mountpoint while the latter rely upon having the
(contributing) reference to the same dentry stay in struct mount
past dropping the reference to struct mountpoint.  It works, but
it's less than transparent and ultimately relies upon the mechanism
we use to order dropping dentry references from struct mount vs.
filesystem shutdowns. 

Note that once we have unmounted a struct mount, we don't really need
the reference to what used to be its mountpoint dentry - all we use
it for is eventually passing it to dput().  If we could drop it
immediately (i.e. if the locking environment allowed that), we
could do just that and forget about it as soon as mount is torn
from struct mountpoint.  IOW, we could make struct mountpoint
->m_dentry bear the contributing reference instead of struct mount
->mnt_mountpoint/->mnt_ex_mountpoint.

Locking environment really doesn't allow IO.  And ->d_count can
reach zero there.  However, while we can't kill such victim immediately,
we can put it (with zero refcount) on a shrink list of our own.  And
call shrink_dentry_list() once the locking allows.
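
The idea in the paragraph above (queue the zero-refcount victim on a
private list while the lock is held, and do the expensive release only
once the locking allows) can be sketched in user space. The model_*
names are hypothetical, and a plain flag stands in for the real lock so
the ordering rule can be checked with assertions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical object with a refcount and a link for the deferred list. */
struct model_obj {
	int count;
	bool released;
	struct model_obj *next;
};

static bool model_locked;	/* stands in for the lock that forbids blocking */

void model_lock(void)   { model_locked = true; }
void model_unlock(void) { model_locked = false; }

/* The "expensive" release: must never run while the lock is held. */
static void model_release(struct model_obj *o)
{
	assert(!model_locked);
	o->released = true;
}

/* Under the lock: drop a reference; at zero, queue instead of releasing. */
void model_put_deferred(struct model_obj *o, struct model_obj **list)
{
	assert(model_locked && o->count > 0);
	if (--o->count == 0) {
		o->next = *list;
		*list = o;
	}
}

/* After unlocking: release everything that was queued. */
void model_drain(struct model_obj **list)
{
	while (*list) {
		struct model_obj *o = *list;

		*list = o->next;
		model_release(o);
	}
}
```

A caller would take the lock, call model_put_deferred() as often as
needed, unlock, and then call model_drain() on its private list.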

That would almost work.  The problem is that until now all shrink
lists used to be homogeneous - all dentries on the same list belong
to the same filesystem.  And shrink_dcache_parent()/shrink_dcache_for_umount()
rely upon that.  If not for that, we could get rid of our ordering machinery.

There is another reason we want to cope with such mixed-origin shrink
lists - Slab Movable Objects patchset really needs that (well, either
that, or having a separate kmem_cache for each struct super_block).
Fortunately, that turns out to be reasonably easy to do.  And that allows
us to untangle the mess with mountpoints.  The series below does that;
it's in vfs.git #work.dcache and individual patches will be in followups
to this posting.

1) __detach_mounts(): lookup_mountpoint() can't return ERR_PTR() anymore
Forgotten removal of dead check near the code affected by the
subsequent patches.
2) fs/namespace.c: shift put_mountpoint() to callers of unhash_mnt()
A bit of preliminary massage - we want to be able to tell put_mountpoint()
where to put the dropped dentry if its ->d_count reaches 0.
3) Teach shrink_dcache_parent() to cope with mixed-filesystem shrink lists
The guts of that series.  We make shrink_dcache_parent() (and
shrink_dcache_for_umount()) deal with mixed shrink lists sanely.
New primitive added: dput_to_list().  shrink_dentry_list() made non-static.
See the commit message of that one for details.
4) make struct mountpoint bear the dentry reference to 

RE: linux-next: build failure after merge of the kbuild tree

2019-07-05 Thread Michael Kelley
From: Stephen Rothwell   Sent: Friday, July 5, 2019 1:31 
AM
> 
> After merging the kbuild tree, today's linux-next build (powerpc
> allyesconfig) failed like this:
> 
> In file included from <command-line>:
> include/clocksource/hyperv_timer.h:18:10: fatal error: asm/mshyperv.h: No such file or directory
>  #include <asm/mshyperv.h>
>   ^~~~
> 
> Caused by commit
> 
>   34085aeb5816 ("kbuild: compile-test kernel headers to ensure they are self-contained")
> 
> interacting with commit
> 
>   dd2cb348613b ("clocksource/drivers: Continue making Hyper-V clocksource ISA agnostic")
> 
> from the tip tree.
> 

Thomas -- let's remove my two clocksource patches from your 'tip' tree.  I'll need
a little time to fully understand the self-contained header requirements and restructure
hyperv_timer.h to avoid this problem.

Michael


[PATCH v9 net-next 2/5] net: ethernet: ti: davinci_cpdma: add dma mapped submit

2019-07-05 Thread Ivan Khoronzhuk
If a DMA-mapped packet needs to be sent, as with the XDP page pool,
the "mapped" submit can be used. This patch adds a DMA-mapped submit
based on the regular one.

Signed-off-by: Ivan Khoronzhuk 
---

v9..v8
- fix potential warnings on arm64 caused by typos in type casting

 drivers/net/ethernet/ti/davinci_cpdma.c | 89 ++---
 drivers/net/ethernet/ti/davinci_cpdma.h |  4 ++
 2 files changed, 83 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c
index 5cf1758d425b..4e693c3aab27 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -139,6 +139,7 @@ struct submit_info {
int directed;
void *token;
void *data;
+   int flags;
int len;
 };
 
@@ -184,6 +185,8 @@ static struct cpdma_control_info controls[] = {
 (directed << CPDMA_TO_PORT_SHIFT));\
} while (0)
 
+#define CPDMA_DMA_EXT_MAP  BIT(16)
+
 static void cpdma_desc_pool_destroy(struct cpdma_ctlr *ctlr)
 {
struct cpdma_desc_pool *pool = ctlr->pool;
@@ -1015,6 +1018,7 @@ static int cpdma_chan_submit_si(struct submit_info *si)
struct cpdma_chan   *chan = si->chan;
struct cpdma_ctlr   *ctlr = chan->ctlr;
int len = si->len;
+   int swlen = len;
struct cpdma_desc __iomem   *desc;
dma_addr_t  buffer;
u32 mode;
@@ -1036,16 +1040,22 @@ static int cpdma_chan_submit_si(struct submit_info *si)
chan->stats.runt_transmit_buff++;
}
 
-   buffer = dma_map_single(ctlr->dev, si->data, len, chan->dir);
-   ret = dma_mapping_error(ctlr->dev, buffer);
-   if (ret) {
-   cpdma_desc_free(ctlr->pool, desc, 1);
-   return -EINVAL;
-   }
-
mode = CPDMA_DESC_OWNER | CPDMA_DESC_SOP | CPDMA_DESC_EOP;
cpdma_desc_to_port(chan, mode, si->directed);
 
+   if (si->flags & CPDMA_DMA_EXT_MAP) {
+   buffer = (dma_addr_t)si->data;
+   dma_sync_single_for_device(ctlr->dev, buffer, len, chan->dir);
+   swlen |= CPDMA_DMA_EXT_MAP;
+   } else {
+   buffer = dma_map_single(ctlr->dev, si->data, len, chan->dir);
+   ret = dma_mapping_error(ctlr->dev, buffer);
+   if (ret) {
+   cpdma_desc_free(ctlr->pool, desc, 1);
+   return -EINVAL;
+   }
+   }
+
/* Relaxed IO accessors can be used here as there is read barrier
 * at the end of write sequence.
 */
@@ -1055,7 +1065,7 @@ static int cpdma_chan_submit_si(struct submit_info *si)
writel_relaxed(mode | len, &desc->hw_mode);
writel_relaxed((uintptr_t)si->token, &desc->sw_token);
writel_relaxed(buffer, &desc->sw_buffer);
-   writel_relaxed(len, &desc->sw_len);
+   writel_relaxed(swlen, &desc->sw_len);
desc_read(desc, sw_len);
 
__cpdma_chan_submit(chan, desc);
@@ -1079,6 +1089,32 @@ int cpdma_chan_idle_submit(struct cpdma_chan *chan, void *token, void *data,
si.data = data;
si.len = len;
si.directed = directed;
+   si.flags = 0;
+
+   spin_lock_irqsave(&chan->lock, flags);
+   if (chan->state == CPDMA_STATE_TEARDOWN) {
+   spin_unlock_irqrestore(&chan->lock, flags);
+   return -EINVAL;
+   }
+
+   ret = cpdma_chan_submit_si(&si);
+   spin_unlock_irqrestore(&chan->lock, flags);
+   return ret;
+}
+
+int cpdma_chan_idle_submit_mapped(struct cpdma_chan *chan, void *token,
+ dma_addr_t data, int len, int directed)
+{
+   struct submit_info si;
+   unsigned long flags;
+   int ret;
+
+   si.chan = chan;
+   si.token = token;
+   si.data = (void *)data;
+   si.len = len;
+   si.directed = directed;
+   si.flags = CPDMA_DMA_EXT_MAP;
 
spin_lock_irqsave(&chan->lock, flags);
if (chan->state == CPDMA_STATE_TEARDOWN) {
@@ -1103,6 +1139,32 @@ int cpdma_chan_submit(struct cpdma_chan *chan, void *token, void *data,
si.data = data;
si.len = len;
si.directed = directed;
+   si.flags = 0;
+
+   spin_lock_irqsave(&chan->lock, flags);
+   if (chan->state != CPDMA_STATE_ACTIVE) {
+   spin_unlock_irqrestore(&chan->lock, flags);
+   return -EINVAL;
+   }
+
+   ret = cpdma_chan_submit_si(&si);
+   spin_unlock_irqrestore(&chan->lock, flags);
+   return ret;
+}
+
+int cpdma_chan_submit_mapped(struct cpdma_chan *chan, void *token,
+dma_addr_t data, int len, int directed)
+{
+   struct submit_info si;
+   unsigned long flags;
+   int ret;
+
+   si.chan = chan;
+   si.token = token;
+   si.data = (void *)data;
+   si.len = len;
+   si.directed = directed;
+   

Re: [PATCH] m68k: One function call less in cf_tlb_miss()

2019-07-05 Thread Finn Thain


On Fri, 5 Jul 2019, Markus Elfring wrote:

> From: Markus Elfring 
> Date: Fri, 5 Jul 2019 17:11:37 +0200
> 
> Avoid an extra function call 

Not really. You've avoided an extra statement.

> by using a ternary operator instead of a conditional statement for a 
> setting selection.
> 
> This issue was detected by using the Coccinelle software.
> 
> Signed-off-by: Markus Elfring 
> ---
>  arch/m68k/mm/mcfmmu.c | 10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/m68k/mm/mcfmmu.c b/arch/m68k/mm/mcfmmu.c
> index 6cb1e41d58d0..02fc0778028e 100644
> --- a/arch/m68k/mm/mcfmmu.c
> +++ b/arch/m68k/mm/mcfmmu.c
> @@ -146,12 +146,10 @@ int cf_tlb_miss(struct pt_regs *regs, int write, int dtlb, int extension_word)
> 
>   mmu_write(MMUDR, (pte_val(*pte) & PAGE_MASK) |
>   ((pte->pte) & CF_PAGE_MMUDR_MASK) | MMUDR_SZ_8KB | MMUDR_X);
> -
> - if (dtlb)
> - mmu_write(MMUOR, MMUOR_ACC | MMUOR_UAA);
> - else
> - mmu_write(MMUOR, MMUOR_ITLB | MMUOR_ACC | MMUOR_UAA);
> -
> + mmu_write(MMUOR,
> +   dtlb
> +   ? MMUOR_ACC | MMUOR_UAA
> +   : MMUOR_ITLB | MMUOR_ACC | MMUOR_UAA);

If you are trying to avoid redundancy, why not finish the job?

+ mmu_write(MMUOR, (dtlb ? 0 : MMUOR_ITLB) | MMUOR_ACC | MMUOR_UAA);

-- 

>   local_irq_restore(flags);
>   return 0;
>  }
> --
> 2.22.0
> 
> 
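
The three forms discussed in this thread (the original if/else, the
ternary from the patch, and the factored variant suggested in the reply)
can be compared side by side. This is only a minimal sketch: the bit
values below are hypothetical placeholders, not the real ColdFire MMU
definitions, and the functions return the value that would be passed to
mmu_write() instead of touching hardware.

```c
/* Hypothetical bit values for illustration; the real ones are in the ColdFire MMU headers. */
#define MMUOR_UAA   0x01u
#define MMUOR_ACC   0x02u
#define MMUOR_ITLB  0x80u

/* Original form: two statements, one per branch. */
static unsigned int mmuor_branchy(int dtlb)
{
	if (dtlb)
		return MMUOR_ACC | MMUOR_UAA;
	else
		return MMUOR_ITLB | MMUOR_ACC | MMUOR_UAA;
}

/* Form from the patch: one expression, common bits still repeated. */
static unsigned int mmuor_ternary(int dtlb)
{
	return dtlb ? MMUOR_ACC | MMUOR_UAA
		    : MMUOR_ITLB | MMUOR_ACC | MMUOR_UAA;
}

/* Form from the reply: only the bit that differs depends on dtlb. */
static unsigned int mmuor_factored(int dtlb)
{
	return (dtlb ? 0 : MMUOR_ITLB) | MMUOR_ACC | MMUOR_UAA;
}
```

All three compute the same value for any dtlb; they differ only in how
much of the expression is repeated in the source.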


Re: [RESEND PATCH next v2 0/6] ARM: keystone: update dt and enable cpts support

2019-07-05 Thread santosh . shilimkar

On 7/5/19 8:12 AM, Grygorii Strashko wrote:

Hi Santosh,

This series is a set of platform changes required to enable NETCP CPTS reference
clock selection and final patch to enable CPTS for Keystone 66AK2E/L/HK SoCs.

Those patches were posted already [1] together with driver's changes, so this
is a re-send of DT/platform specific changes only, as driver's changes have
been merged already.

Patches 1-5: CPTS DT nodes update for TI Keystone 2 66AK2HK/E/L SoCs.
Patch 6: enables CPTS for TI Keystone 2 66AK2HK/E/L SoCs.

[1] https://patchwork.kernel.org/cover/10980037/

Grygorii Strashko (6):
   ARM: dts: keystone-clocks: add input fixed clocks
   ARM: dts: k2e-clocks: add input ext. fixed clocks tsipclka/b
   ARM: dts: k2e-netcp: add cpts refclk_mux node
   ARM: dts: k2hk-netcp: add cpts refclk_mux node
   ARM: dts: k2l-netcp: add cpts refclk_mux node
   ARM: configs: keystone: enable cpts


Will add these for 5.4 queue. Thanks !!

Regards,
Santosh


Re: linux-next: build failure after merge of the nvdimm tree

2019-07-05 Thread Stephen Rothwell
Hi Dan,

On Fri, 5 Jul 2019 15:32:19 -0700 Dan Williams  wrote:
>
> On Fri, Jul 5, 2019 at 12:20 AM Stephen Rothwell  wrote:
> >
> > After merging the nvdimm tree, today's linux-next build (x86_64
> > allmodconfig) failed like this:
> >
> > In file included from <command-line>:32:
> > ./usr/include/linux/virtio_pmem.h:19:2: error: unknown type name 'uint64_t'
> >   uint64_t start;
> >   ^~~~
> > ./usr/include/linux/virtio_pmem.h:20:2: error: unknown type name 'uint64_t'
> >   uint64_t size;
> >   ^~~~  
> 
> /me boggles at how this sat in 0day visible tree for a long while
> without this report?

These messages are produced by a new test in the kbuild tree, so you
need both it and the nvdimm tree together to get them.  That will
change after the merge window, of course.

-- 
Cheers,
Stephen Rothwell


Re: pagecache locking

2019-07-05 Thread Dave Chinner
On Wed, Jul 03, 2019 at 03:04:45AM +0300, Boaz Harrosh wrote:
> On 20/06/2019 01:37, Dave Chinner wrote:
> <>
> > 
> > I'd prefer it doesn't get lifted to the VFS because I'm planning on
> > getting rid of it in XFS with range locks. i.e. the XFS_MMAPLOCK is
> > likely to go away in the near term because a range lock can be
> > taken on either side of the mmap_sem in the page fault path.
> > 
> <>
> Sir Dave
> 
> Sorry if this was answered before. I am please very curious. In the zufs
> project I have an equivalent rw_MMAPLOCK that I _read_lock on page_faults.
> (Read & writes all take read-locks ...)
> The only reason I have it is because of lockdep actually.
> 
> Specifically for those xfstests that mmap a buffer then direct_IO in/out
> of that buffer from/to another file in the same FS or the same file.
> (For lockdep its the same case).

Which can deadlock if the same inode rwsem is taken on both sides of
the mmap_sem, as lockdep tells you...

> I would be perfectly happy to recursively _read_lock both from the top
> of the page_fault at the DIO path, and under in the page_fault. I'm
> _read_locking after all. But lockdep is hard to convince. So I stole the
> xfs idea of having an rw_MMAPLOCK. And grab yet another _write_lock at
> truncate/punch/clone time when all mapping traversal needs to stop for
> the destructive change to take place. (Allocations are done another way
> and are race safe with traversal)
> 
> How do you intend to address this problem with range-locks? ie recursively
> taking the same "lock"? because if not for the recursive-ity and lockdep I would
> not need the extra lock-object per inode.

As long as the IO ranges to the same file *don't overlap*, it should
be perfectly safe to take separate range locks (in read or write
mode) on either side of the mmap_sem as non-overlapping range locks
can be nested and will not self-deadlock.

The "recursive lock problem" still arises with DIO and page faults
inside gup, but it only occurs when the user buffer range overlaps
the DIO range to the same file. IOWs, the application is trying to
do something that has an undefined result and is likely to result in
data corruption. So, in that case I plan to have the gup page faults
fail and the DIO return -EDEADLOCK to userspace.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
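
The rule described above (non-overlapping ranges nest safely, an
overlapping nested request is refused) can be sketched as a plain
interval test. This is not the XFS range lock implementation, which is
not shown in this thread; struct byte_range and nested_range_check() are
hypothetical names, ranges are assumed half-open, and EDEADLK is used as
the portable spelling of the error mentioned in the message.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical half-open byte range [start, end) within one file. */
struct byte_range {
	uint64_t start;
	uint64_t end;
};

/* Two half-open ranges overlap iff each one starts before the other ends. */
static bool ranges_overlap(const struct byte_range *a,
			   const struct byte_range *b)
{
	return a->start < b->end && b->start < a->end;
}

/*
 * Decide whether a nested request (e.g. from a page fault inside a DIO)
 * may proceed while 'held' is already locked on the same file.
 * Returns 0 if the ranges are disjoint, -EDEADLK if they overlap.
 */
static int nested_range_check(const struct byte_range *held,
			      const struct byte_range *wanted)
{
	return ranges_overlap(held, wanted) ? -EDEADLK : 0;
}
```

With these assumptions, a DIO over [0, 4096) and a fault on [4096, 8192)
of the same file would be allowed, while a fault on [2048, 6000) would
be refused.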


Re: [PATCH] rtl8xxxu: Fix wifi low signal strength issue of RTL8723BU

2019-07-05 Thread Larry Finger

On 7/4/19 10:44 PM, Daniel Drake wrote:

On Wed, Jul 3, 2019 at 8:59 PM Jes Sorensen  wrote:

My point is this seems to be very dongle dependent :( We have to be
careful not breaking it for some users while fixing it for others.


Do you still have your device?

Once we get to the point when you are happy with Chris's two patches
here on a code review level, we'll reach out to other driver
contributors plus people who previously complained about these types
of problems, and see if we can get some wider testing.

Larry, do you have these devices, can you help with testing too?


I have some of the devices, and I can help with the testing.

Larry



Re: [PATCH 6/7] nfp: Use spinlock_t instead of struct spinlock

2019-07-05 Thread David Miller
From: Sebastian Andrzej Siewior 
Date: Thu,  4 Jul 2019 17:38:02 +0200

> For spinlocks the type spinlock_t should be used instead of "struct
> spinlock".
> 
> Use spinlock_t for spinlock's definition.
> 
> Cc: Jakub Kicinski 
> Cc: "David S. Miller" 
> Cc: oss-driv...@netronome.com
> Cc: net...@vger.kernel.org
> Signed-off-by: Sebastian Andrzej Siewior 

Applied to net-next, thanks.


Re: kernel BUG at mm/swap_state.c:170!

2019-07-05 Thread Jan Kara
On Fri 05-07-19 20:19:48, Mikhail Gavrilov wrote:
> Hey folks.
> Excuse me, is anybody read my previous message?
> 5.2-rc7 is still affected by this issue [the logs in file
> dmesg-5.2rc7-0.1.tar.xz] and I worry that stable 5.2 would be released
> with this bug because there is almost no time left and I didn't see
> the attention to this problem.
> I confirm that reverting commit 5fd4ca2d84b2 on top of the rc7 tag is
> help fix it [the logs in file dmesg-5.2rc7-0.2.tar.xz].
> I am still awaiting any feedback here.

Yeah, I guess revert of 5fd4ca2d84b2 at this point is probably the best we
can do. Let's CC Linus, Andrew, and Greg (Linus is travelling AFAIK so I'm
not sure whether Greg won't do release for him).

Honza
-- 
Jan Kara 
SUSE Labs, CR


Re: [PATCH v2] fs: Fix the default values of i_uid/i_gid on /proc/sys inodes.

2019-07-05 Thread Luis Chamberlain
Please Cc Andrew Morton on future follow-ups.

On Sat, Jul 06, 2019 at 12:19:16AM +0200, Radoslaw Burny wrote:
> On Fri, Jul 5, 2019 at 10:02 PM Luis Chamberlain  wrote:
> >
> >
> > Please re-state the main fix in the commit log, not just the subject.
> 
> Sure, I'll do this. Just to make sure - for every iteration on the
> commit message, I need to increment the patch "version" and resend the
> whole patch, right?

Right.

> >
> > Also, this does not explain why the current values are and the impact to
> > systems / users. This would help in determine and evaluating if this
> > deserves to be a stable fix.
> 
> This commit a (much overdue) resend of https://lkml.org/lkml/2018/11/30/990
> I think Eric's comment on the previous thread explained it best:

Ah, I knew this smelled familiar. Yes I recall. Please add more
information about all this to the commit log. The more info, the better
including reference to the old discussion and also a distilled summary of
what was discussed.

Preferably, avoid using lkml.org and use this URL instead, as lkml.org
is not under our control and can die, etc.

https://lore.kernel.org/lkml/20181126172607.125782-1-rbu...@google.com/

> > We spoke about this at LPC.  And this is the correct behavioral change.

Again, none of this is clear to the patch reviewer and again you didn't
mention any of it.

> >
> > The problem is there is a default value for i_uid and i_gid that is
> > correct in the general case.  That default value is not corect for
> > sysctl, because proc is weird.  As the sysctl permission check in
> > test_perm are all against GLOBAL_ROOT_UID and GLOBAL_ROOT_GID we did not
> > notice that i_uid and i_gid were being set wrong.
> >
> > So all this patch does is fix the default values i_uid and i_gid.
> 
> If my new commit message is still not conveying this clearly, feel
> free to suggest the specific wording (I'm new to the kernel patch
> process, and I might not be explaining the problems well enough).

Please condense the above into the commit log message. What you want
to make clear is what the implications are if this patch is not applied,
who is affected and why.

> > On Fri, Jul 05, 2019 at 06:30:21PM +0200, Radoslaw Burny wrote:
> > > This also fixes a problem where, in a user namespace without root user
> > > mapping, it is not possible to write to /proc/sys/kernel/shmmax.
> >
> > This does not explain why that should be possible and what impact this
> > limitation has.
> 
> Writing to /proc/sys/kernel/shmmax allows setting a shared memory
> limit for that container. Since this is usually a part of container's
> initial configuration, one would expect that the container's owner /
> creator is able to set the limit. Yet, due to the bug described here,
> no process can write the container's shmmax if the container's user
> namespace does not contain root mapping.

Please include this in the commit log. It does then seem worthy of a
stable commit. Please add the Cc: stable tag, ie put this:

Cc: sta...@vger.kernel.org # v4.8+

Right above the Signed-off-by tags.

Then the scripts which pick up stable patches will pick this up.

> Using a container with no root mapping seems to be a rare case, but we
> do use this configuration at Google, which is how I found the issue.
> Also, we use a generic tool to configure the container limits, and the
> inability to write any of them causes a hard failure.

This helps folks also, so please include this in the commit log.

> > > The problem was introduced by the combination of the two commits:
> > > * 81754357770ebd900801231e7bc8d151ddc00498: fs: Update
> > >   i_[ug]id_(read|write) to translate relative to s_user_ns
> > > - this caused the kernel to write INVALID_[UG]ID to i_uid/i_gid
> > > members of /proc/sys inodes if a containing userns does not have
> > > entries for root in the uid/gid_map.
> > This is 2014 commit merged as of v4.8.
> >
> > > * 0bd23d09b874e53bd1a2fe2296030aa2720d7b08: vfs: Don't modify inodes
> > >   with a uid or gid unknown to the vfs
> > > - changed the kernel to prevent opens for write if the i_uid/i_gid
> > > field in the inode is invalid
> >
> > This is a 2016 commit merged as of v4.8 as well...
> >
> > So regardless of the dates of the commits, are you saying this is a
> > regression you can confirm did not exist prior to v4.8? Did you test
> > v4.7 to confirm?
> 
> I assume no one has noticed this issue before because it requires such
> a specific combination of triggers.
> Yes, I've tested this with older kernel versions. I've additionally
> tested a 4.8 build with just 0aa2720d7b08 reverted, confirming that
> the revert fixes the issue.

Ummm 0aa2720d7b08 is the last part of the gitsum, you want to reference
the first part of the gitsum as otherwise git show 0aa2720d7b08 yields
nothing, but git show 0bd23d09b874e does.

OK so then the *real* issue was commit 0bd23d09b874e, so just add this
tag:

Fixes: 0bd23d09b874 ("vfs: Don't modify inodes with a 

Re: [PATCH net-next 0/9] net: hns3: some cleanups & bugfixes

2019-07-05 Thread David Miller
From: Huazhong Tan 
Date: Thu, 4 Jul 2019 22:04:19 +0800

> This patch-set includes cleanups and bugfixes for
> the HNS3 ethernet controller driver.
> 
> [patch 1/9] fixes VF's broadcast promisc mode not enabled after
> initializing.
> 
> [patch 2/9] adds hints for fibre port not support flow control.
> 
> [patch 3/9] fixes a port capbility updating issue.
> 
> [patch 4/9 - 9/9] adds some cleanups for HNS3 driver.

Series applied, thanks.


Re: [PATCH net] r8152: set RTL8152_UNPLUG only for real disconnection

2019-07-05 Thread David Miller
From: Hayes Wang 
Date: Thu, 4 Jul 2019 17:36:32 +0800

> Set the flag of RTL8152_UNPLUG if and only if the device is unplugged.
> Some error codes sometimes don't mean the real disconnection of usb device.
> For those situations, set the flag of RTL8152_UNPLUG causes the driver skips
> some flows of disabling the device, and it let the device stay at incorrect
> state.
> 
> Signed-off-by: Hayes Wang 

Applied.


Re: [PATCH] rtc: zynqmp: One function call less in xlnx_rtc_alarm_irq_enable()

2019-07-05 Thread Alexandre Belloni
On 05/07/2019 22:45:39+0200, Markus Elfring wrote:
> From: Markus Elfring 
> Date: Fri, 5 Jul 2019 22:37:58 +0200
> 
> Avoid an extra function call by using a ternary operator instead of
> a conditional statement for a setting selection.
> 

Please elaborate on why this is a good thing.

> This issue was detected by using the Coccinelle software.
> 

Unless you use an upstream coccinelle script or you share the one you
are using, this is not useful information.

> Signed-off-by: Markus Elfring 
> ---
>  drivers/rtc/rtc-zynqmp.c | 7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/rtc/rtc-zynqmp.c b/drivers/rtc/rtc-zynqmp.c
> index 00639594de0c..4631019a54e2 100644
> --- a/drivers/rtc/rtc-zynqmp.c
> +++ b/drivers/rtc/rtc-zynqmp.c
> @@ -124,11 +124,8 @@ static int xlnx_rtc_alarm_irq_enable(struct device *dev, u32 enabled)
>  {
>   struct xlnx_rtc_dev *xrtcdev = dev_get_drvdata(dev);
> 
> - if (enabled)
> - writel(RTC_INT_ALRM, xrtcdev->reg_base + RTC_INT_EN);
> - else
> - writel(RTC_INT_ALRM, xrtcdev->reg_base + RTC_INT_DIS);
> -
> + writel(RTC_INT_ALRM,
> +xrtcdev->reg_base + (enabled ? RTC_INT_EN : RTC_INT_DIS));

This makes the code less readable.

>   return 0;
>  }
> 
> --
> 2.22.0
> 

-- 
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


Re: linux-next: build failure after merge of the nvdimm tree

2019-07-05 Thread Dan Williams
On Fri, Jul 5, 2019 at 12:20 AM Stephen Rothwell  wrote:
>
> Hi all,
>
> After merging the nvdimm tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
>
> In file included from <command-line>:32:
> ./usr/include/linux/virtio_pmem.h:19:2: error: unknown type name 'uint64_t'
>   uint64_t start;
>   ^~~~
> ./usr/include/linux/virtio_pmem.h:20:2: error: unknown type name 'uint64_t'
>   uint64_t size;
>   ^~~~

/me boggles at how this sat in 0day visible tree for a long while
without this report?

>
> Caused by commit
>
>   403b7f973855 ("virtio-pmem: Add virtio pmem driver")
>
> I have used the nvdimm tree from next-20190704 for today.

Thanks Stephen, sorry for the noise.
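
The failure quoted above is what a header looks like when it uses a
type it never pulls in. As a minimal sketch of the "self-contained"
property the new kbuild test checks, the fragment below compiles on its
own only because it includes the header that declares its types. The
structure name is a hypothetical stand-in, not the contents of
virtio_pmem.h; kernel UAPI headers conventionally use __u64 from
<linux/types.h> for this purpose, and the thread does not say which fix
was eventually applied.

```c
/* Declares uint64_t; remove this line and the fragment no longer compiles alone. */
#include <stdint.h>

/* Hypothetical stand-in for a device config structure with two 64-bit fields. */
struct pmem_config_sketch {
	uint64_t start;
	uint64_t size;
};
```

A header is self-contained in this sense when a translation unit that
includes nothing else can still include it without errors.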


Re: [PATCH v2] ARM: configs: Remove useless UEVENT_HELPER_PATH

2019-07-05 Thread Olof Johansson
On Fri, Jul 5, 2019 at 3:26 PM Olof Johansson via Linux.Kernel.Org
 wrote:

This didn't work as I anticipated. Please ignore, apologies for the spam.


-Olof


Re: [PATCH v2 1/9] mmc: sdhci-sprd: Check the enable clock's return value correctly

2019-07-05 Thread Olof Johansson
On Fri, Jul 5, 2019 at 3:25 PM Olof Johansson via Linux.Kernel.Org
 wrote:

Hmm, well, that didn't work like I expected to. Sorry for the noise.


-Olof


Re: [PATCH] net: ethernet: allwinner: Remove unneeded memset

2019-07-05 Thread David Miller
From: Hariprasad Kelam 
Date: Thu, 4 Jul 2019 08:29:06 +0530

> Remove unneeded memset as alloc_etherdev is using kvzalloc which uses
> __GFP_ZERO flag
> 
> Signed-off-by: Hariprasad Kelam 

Applied.


Re: linux-next: Tree for Jun 28 (kernel/bpf/cgroup.c)

2019-07-05 Thread Randy Dunlap
On 6/28/19 1:52 PM, Randy Dunlap wrote:
> On 6/28/19 3:38 AM, Stephen Rothwell wrote:
>> Hi all,
>>
>> Changes since 20190627:
>>
> 
> on i386:
> 
> ld: kernel/bpf/cgroup.o: in function `cg_sockopt_func_proto':
> cgroup.c:(.text+0x2906): undefined reference to `bpf_sk_storage_delete_proto'
> ld: cgroup.c:(.text+0x2939): undefined reference to `bpf_sk_storage_get_proto'
> ld: kernel/bpf/cgroup.o: in function `__cgroup_bpf_run_filter_setsockopt':
> cgroup.c:(.text+0x85e4): undefined reference to `lock_sock_nested'
> ld: cgroup.c:(.text+0x8af2): undefined reference to `release_sock'
> ld: kernel/bpf/cgroup.o: in function `__cgroup_bpf_run_filter_getsockopt':
> cgroup.c:(.text+0x8fd6): undefined reference to `lock_sock_nested'
> ld: cgroup.c:(.text+0x94e4): undefined reference to `release_sock'
> 
> 
> Full randconfig file is attached.
> 

These build errors still happen in linux-next of 20190705...

-- 
~Randy


Re: [PATCH v2] fs: Fix the default values of i_uid/i_gid on /proc/sys inodes.

2019-07-05 Thread Radoslaw Burny
On Fri, Jul 5, 2019 at 10:02 PM Luis Chamberlain  wrote:
>
>
> Please re-state the main fix in the commit log, not just the subject.

Sure, I'll do this. Just to make sure - for every iteration on the
commit message, I need to increment the patch "version" and resend the
whole patch, right?

>
> Also, this does not explain why the current values are and the impact to
> systems / users. This would help in determine and evaluating if this
> deserves to be a stable fix.

This commit is a (much overdue) resend of https://lkml.org/lkml/2018/11/30/990
I think Eric's comment on the previous thread explained it best:

> We spoke about this at LPC.  And this is the correct behavioral change.
>
> The problem is there is a default value for i_uid and i_gid that is
> correct in the general case.  That default value is not corect for
> sysctl, because proc is weird.  As the sysctl permission check in
> test_perm are all against GLOBAL_ROOT_UID and GLOBAL_ROOT_GID we did not
> notice that i_uid and i_gid were being set wrong.
>
> So all this patch does is fix the default values i_uid and i_gid.

If my new commit message is still not conveying this clearly, feel
free to suggest the specific wording (I'm new to the kernel patch
process, and I might not be explaining the problems well enough).

>
>
> On Fri, Jul 05, 2019 at 06:30:21PM +0200, Radoslaw Burny wrote:
> > This also fixes a problem where, in a user namespace without root user
> > mapping, it is not possible to write to /proc/sys/kernel/shmmax.
>
> This does not explain why that should be possible and what impact this
> limitation has.

Writing to /proc/sys/kernel/shmmax allows setting a shared memory
limit for that container. Since this is usually a part of container's
initial configuration, one would expect that the container's owner /
creator is able to set the limit. Yet, due to the bug described here,
no process can write the container's shmmax if the container's user
namespace does not contain root mapping.

Using a container with no root mapping seems to be a rare case, but we
do use this configuration at Google, which is how I found the issue.
Also, we use a generic tool to configure the container limits, and the
inability to write any of them causes a hard failure.

>
> > The problem was introduced by the combination of the two commits:
> > * 81754357770ebd900801231e7bc8d151ddc00498: fs: Update
> >   i_[ug]id_(read|write) to translate relative to s_user_ns
> > - this caused the kernel to write INVALID_[UG]ID to i_uid/i_gid
> > members of /proc/sys inodes if a containing userns does not have
> > entries for root in the uid/gid_map.
> This is 2014 commit merged as of v4.8.
>
> > * 0bd23d09b874e53bd1a2fe2296030aa2720d7b08: vfs: Don't modify inodes
> >   with a uid or gid unknown to the vfs
> > - changed the kernel to prevent opens for write if the i_uid/i_gid
> > field in the inode is invalid
>
> This is a 2016 commit merged as of v4.8 as well...
>
> So regardless of the dates of the commits, are you saying this is a
> regression you can confirm did not exist prior to v4.8? Did you test
> v4.7 to confirm?

I assume no one has noticed this issue before because it requires such
a specific combination of triggers.
Yes, I've tested this with older kernel versions. I've additionally
tested a 4.8 build with just 0aa2720d7b08 reverted, confirming that
the revert fixes the issue.

>
> > This commit fixes the issue by defaulting i_uid/i_gid to
> > GLOBAL_ROOT_UID/GID.
>
> Why is this right?

Quoting Eric: "the sysctl permission check in test_perm are all
against GLOBAL_ROOT_UID and GLOBAL_ROOT_GID".
The values in the inode are not even read during test_perm, but
logically, the inode belongs to the root of the namespace.

>
> > Note that these values are not used for /proc/sys
> > access checks, so the change does not otherwise affect /proc semantics.
> >
> > Tested: Used a repro program that creates a user namespace without any
> > mapping and stat'ed /proc/$PID/root/proc/sys/kernel/shmmax from outside.
> > Before the change, it shows the overflow uid, with the change it's 0.
>
> Why is the overflow uid bad for user experience? Did you test prior to
> v4.8, ie on v4.7 to confirm this is indeed a regression?
>
> You'd want then to also ammend in the commit log a Fixes:  tag with both
> commits listed. If this is a stable fix (criteria yet to be determined),
> then we'd need a stable tag.

The overflow is technically correct; the uid in the inode is invalid,
hence it must be displayed as overflow uid. The fact that the uid is
invalid is the issue.
Logically, this commit fixes 81754357770e (as that commit first
introduced invalid uid/gid values). If you agree, I'll add this to my
updated commit.

>
>   Luis
>
> > Signed-off-by: Radoslaw Burny 
> > ---
> > Changelog since v1:
> > - Updated the commit title and description.
> >
> >  fs/proc/proc_sysctl.c | 4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/fs/proc/proc_sysctl.c 

Re: [PATCH bpf-next] Enable zext optimization for more RV64G ALU ops

2019-07-05 Thread Daniel Borkmann
On 07/05/2019 02:18 AM, Luke Nelson wrote:
> commit 66d0d5a854a6 ("riscv: bpf: eliminate zero extension code-gen")
> added the new zero-extension optimization for some BPF ALU operations.
> 
> Since then, bugs in the JIT that have been fixed in the bpf tree require
> this optimization to be added to other operations: commit 1e692f09e091
> ("bpf, riscv: clear high 32 bits for ALU32 add/sub/neg/lsh/rsh/arsh"),
> and commit fe121ee531d1 ("bpf, riscv: clear target register high 32-bits
> for and/or/xor on ALU32")
> 
> Now that these have been merged to bpf-next, the zext optimization can
> be enabled for the fixed operations.
> 
> Cc: Song Liu 
> Cc: Jiong Wang 
> Cc: Xi Wang 
> Signed-off-by: Luke Nelson 

Applied, thanks!


Re: [PATCH v2] gpiolib: Preserve desc->flags when setting state

2019-07-05 Thread Linus Walleij
Hi Chris,

thanks for your patch!

On Thu, Jul 4, 2019 at 6:21 AM Chris Packham
 wrote:

> desc->flags may already have values set by of_gpiochip_add() so make
> sure that this isn't undone when setting the initial direction.
>
> Fixes: 3edfb7bd76bd1cba ("gpiolib: Show correct direction from the beginning")
> Signed-off-by: Chris Packham 
> ---
>
> Notes:
> Changes in v2:
> - add braces to avoid ambiguious else warning

This is almost the solution!

> -   if (chip->get_direction && gpiochip_line_is_valid(chip, i))
> -   desc->flags = !chip->get_direction(chip, i) ?
> -   (1 << FLAG_IS_OUT) : 0;
> -   else
> -   desc->flags = !chip->direction_input ?
> -   (1 << FLAG_IS_OUT) : 0;
> +   if (chip->get_direction && gpiochip_line_is_valid(chip, i)) {
> +   if (!chip->get_direction(chip, i))
> +   set_bit(FLAG_IS_OUT, &desc->flags);

You need to clear_bit() in the reverse case. We just learned we can't
assume anything about the flags here, like just assign them.

> +   } else {
> +   if (!chip->direction_input)
> +   set_bit(FLAG_IS_OUT, &desc->flags);

Same here.

Yours,
Linus Walleij


Re: gpio desc flags being lost

2019-07-05 Thread Linus Walleij
On Wed, Jul 3, 2019 at 11:30 PM Chris Packham
 wrote:

> The problem is caused by commit 3edfb7bd76bd1cba ("gpiolib: Show correct
> direction from the beginning"). I'll see if I can whip up a patch to fix it.

Oh. I think:

   if (chip->get_direction && gpiochip_line_is_valid(chip, i))
desc->flags = !chip->get_direction(chip, i) ?
(1 << FLAG_IS_OUT) : 0;
else
desc->flags = !chip->direction_input ?
(1 << FLAG_IS_OUT) : 0;


Needs to have desc->flags |=  ... &= ~

if (!chip->get_direction(chip, i))
desc->flags |= (1 << FLAG_IS_OUT);
else
desc->flags &= ~(1 << FLAG_IS_OUT);

And the same for direction_input()

Yours,
Linus Walleij
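
The suggestion above amounts to a read-modify-write of a single bit
instead of an assignment to the whole word. A minimal sketch, assuming
plain (non-atomic) bit operations and hypothetical bit numbers; the
kernel code would use set_bit()/clear_bit() on desc->flags, and
set_direction_flag() is a name invented for this example.

```c
#include <stdbool.h>

/* Hypothetical bit numbers, chosen only for this sketch. */
enum {
	FLAG_IS_OUT = 1,
	FLAG_ACTIVE_LOW = 6,
};

/*
 * Update only the direction bit. Assigning the whole word would wipe
 * any flag that was set earlier (for example by device tree parsing);
 * OR-ing in or masking out one bit leaves the others untouched.
 */
static void set_direction_flag(unsigned long *flags, bool is_out)
{
	if (is_out)
		*flags |= 1UL << FLAG_IS_OUT;
	else
		*flags &= ~(1UL << FLAG_IS_OUT);
}
```

Handling both branches matters: setting the bit without ever clearing it
would leave a stale output flag on a line that later reports as input.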


Re: [PATCH net-next 0/2] net: mvpp2: Add classification based on the ETHER flow

2019-07-05 Thread Jakub Kicinski
On Fri,  5 Jul 2019 14:09:11 +0200, Maxime Chevallier wrote:
> Hello everyone,
> 
> This series adds support for classification of the ETHER flow in the
> mvpp2 driver.
> 
> The first patch allows detecting when a user specifies a flow_type that
> isn't supported by the driver, while the second adds support for this
> flow_type by adding the mapping between the ETHER_FLOW enum value and
> the relevant classifier flow entries.

LGTM


[GIT PULL] afs: Miscellany for 5.3

2019-07-05 Thread David Howells
Hi Linus,

Here's a set of minor changes for AFS for the next merge window:

 (1) Remove an unnecessary check in afs_unlink().

 (2) Add a tracepoint for tracking callback management.

 (3) Add a tracepoint for afs_server object usage.

 (4) Use struct_size().

 (5) Add mappings for AFS UAE abort codes to Linux error codes, using
 symbolic names rather than hex numbers in the .c file.

David
---
The following changes since commit 2cd42d19cffa0ec3dfb57b1b3e1a07a9bf4ed80a:

  afs: Fix setting of i_blocks (2019-06-20 18:12:02 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git 
tags/afs-next-20190628

for you to fetch changes up to 1eda8bab70ca7d353b4e865140eaec06fedbf871:

  afs: Add support for the UAE error table (2019-06-28 18:37:53 +0100)


AFS development


David Howells (4):
  afs: afs_unlink() doesn't need to check dentry->d_inode
  afs: Add some callback management tracepoints
  afs: Trace afs_server usage
  afs: Add support for the UAE error table

Zhengyuan Liu (1):
  fs/afs: use struct_size() in kzalloc()

 fs/afs/callback.c  |  20 ---
 fs/afs/cmservice.c |   5 +-
 fs/afs/dir.c   |  21 
 fs/afs/file.c  |   6 +--
 fs/afs/fsclient.c  |   2 +-
 fs/afs/inode.c |  17 +++---
 fs/afs/internal.h  |  18 +++
 fs/afs/misc.c  |  48 +++--
 fs/afs/protocol_uae.h  | 132 +
 fs/afs/rxrpc.c |   2 +-
 fs/afs/server.c|  39 +++---
 fs/afs/server_list.c   |   6 ++-
 fs/afs/write.c |   3 +-
 include/trace/events/afs.h | 132 +
 14 files changed, 369 insertions(+), 82 deletions(-)
 create mode 100644 fs/afs/protocol_uae.h



Re: [PATCH] gpiolib: fix incorrect IRQ requesting of an active-low lineevent

2019-07-05 Thread Linus Walleij
On Fri, Jul 5, 2019 at 12:35 PM  wrote:

> For example, there is a button which drives level to be low when it is 
> pushed, and drives level to be high when it is released.
> We want to catch the event when the button is pushed.
>
> In user space we configure a line event with the following code:
>
> req.handleflags = GPIOHANDLE_REQUEST_INPUT;
> req.eventflags = GPIOEVENT_REQUEST_FALLING_EDGE;

But *THIS* is the case that should have
GPIOHANDLE_REQUEST_ACTIVE_LOW, because you push
the button to activate it (it is inactive when not pushed).

Also this should have GPIOEVENT_REQUEST_RISING_EDGE.

> Run the same logic on another board where the polarity of the button is 
> inverted. The button drives level to be high when it is pushed.
> For the inverted level case, we have to add flag 
> GPIOHANDLE_REQUEST_ACTIVE_LOW:
>
> req.handleflags = GPIOHANDLE_REQUEST_INPUT | GPIOHANDLE_REQUEST_ACTIVE_LOW;
> req.eventflags = GPIOEVENT_REQUEST_FALLING_EDGE;

This one should not be active low.

And also have GPIOEVENT_REQUEST_RISING_EDGE.

However I agree that the semantic should change as in the
patch, it makes most logical sense.

The reason it looks as it does is because GPIO line values
and interrupts are two separate subsystems inside the kernel
with their own flags (as you've seen).

But you are right, userspace has no idea about that and should
not have to care.
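
One way to picture the semantic discussed here: userspace asks for an edge in logical ("active") terms, and the translation to the edge seen on the wire depends on the line polarity. The sketch below is a hypothetical model of that translation; the enum and function names are invented for the example and are not gpiolib or uAPI identifiers.

```c
#include <stdbool.h>

enum demo_edge { DEMO_EDGE_RISING, DEMO_EDGE_FALLING };

/*
 * Translate the edge requested in logical terms into the physical edge
 * the interrupt has to trigger on. On an active-low line the two are
 * swapped; on an active-high line they are identical.
 */
static enum demo_edge physical_edge(enum demo_edge logical, bool active_low)
{
	if (!active_low)
		return logical;
	return logical == DEMO_EDGE_RISING ? DEMO_EDGE_FALLING
					   : DEMO_EDGE_RISING;
}
```

Under this model a button that pulls the line low when pushed is requested as active-low with a rising edge, and the board with the inverted button drops the active-low flag and keeps the same rising-edge request.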

Yours,
Linus Walleij


Re: [PATCH bpf-next 1/2] bpf, libbpf: add a new API bpf_object__reuse_maps()

2019-07-05 Thread Daniel Borkmann
On 07/05/2019 10:44 PM, Anton Protopopov wrote:
> Add a new API bpf_object__reuse_maps() which can be used to replace all maps 
> in
> an object by maps pinned to a directory provided in the path argument.  
> Namely,
> each map M in the object will be replaced by a map pinned to path/M.name.
> 
> Signed-off-by: Anton Protopopov 
> ---
>  tools/lib/bpf/libbpf.c   | 34 ++
>  tools/lib/bpf/libbpf.h   |  2 ++
>  tools/lib/bpf/libbpf.map |  1 +
>  3 files changed, 37 insertions(+)
> 
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 4907997289e9..84c9e8f7bfd3 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -3144,6 +3144,40 @@ int bpf_object__unpin_maps(struct bpf_object *obj, 
> const char *path)
>   return 0;
>  }
>  
> +int bpf_object__reuse_maps(struct bpf_object *obj, const char *path)
> +{
> + struct bpf_map *map;
> +
> + if (!obj)
> + return -ENOENT;
> +
> + if (!path)
> + return -EINVAL;
> +
> + bpf_object__for_each_map(map, obj) {
> + int len, err;
> + int pinned_map_fd;
> + char buf[PATH_MAX];

We'd need to skip the case of bpf_map__is_internal(map) since they are always
recreated for the given object.

> + len = snprintf(buf, PATH_MAX, "%s/%s", path, 
> bpf_map__name(map));
> + if (len < 0) {
> + return -EINVAL;
> + } else if (len >= PATH_MAX) {
> + return -ENAMETOOLONG;
> + }
> +
> + pinned_map_fd = bpf_obj_get(buf);
> + if (pinned_map_fd < 0)
> + return pinned_map_fd;

Should we rather have a new map definition attribute that tells to reuse
the map if it's pinned in bpf fs, and if not, we create it and later on
pin it? This is what iproute2 is doing and which we're making use of heavily.
In bpf_object__reuse_maps() bailing out if bpf_obj_get() fails is perhaps
too limiting for a generic API as new version of an object file may contain
new maps which are not yet present in bpf fs at that point.
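
A rough sketch of the per-map policy suggested in the two comments above, written as plain C with no libbpf dependency. Everything in it is hypothetical: demo_map_action() and its enum are made up for illustration, and pinned_err stands in for the result of looking up the pinned path (0 meaning the pinned map was found, a negative errno otherwise).

```c
#include <errno.h>
#include <stdbool.h>

enum demo_action {
	DEMO_SKIP,		/* internal map: always recreated for the object */
	DEMO_REUSE,		/* pinned map found: reuse its fd */
	DEMO_CREATE_AND_PIN,	/* not pinned yet: create it, pin it later */
	DEMO_FAIL		/* any other lookup error stays fatal */
};

/* Decide what to do with one map, given the lookup result for its pin path. */
static enum demo_action demo_map_action(bool is_internal, int pinned_err)
{
	if (is_internal)
		return DEMO_SKIP;
	if (pinned_err == 0)
		return DEMO_REUSE;
	if (pinned_err == -ENOENT)
		return DEMO_CREATE_AND_PIN;
	return DEMO_FAIL;
}
```

The point of the third case is that a newer object file can introduce maps that have no pinned counterpart yet, so a missing pin is not treated as an error.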

> + err = bpf_map__reuse_fd(map, pinned_map_fd);
> + if (err)
> + return err;
> + }
> +
> + return 0;
> +}
> +
>  int bpf_object__pin_programs(struct bpf_object *obj, const char *path)
>  {
>   struct bpf_program *prog;
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index d639f47e3110..7fe465a1be76 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -82,6 +82,8 @@ int bpf_object__variable_offset(const struct bpf_object 
> *obj, const char *name,
>  LIBBPF_API int bpf_object__pin_maps(struct bpf_object *obj, const char 
> *path);
>  LIBBPF_API int bpf_object__unpin_maps(struct bpf_object *obj,
> const char *path);
> +LIBBPF_API int bpf_object__reuse_maps(struct bpf_object *obj,
> +   const char *path);
>  LIBBPF_API int bpf_object__pin_programs(struct bpf_object *obj,
>   const char *path);
>  LIBBPF_API int bpf_object__unpin_programs(struct bpf_object *obj,
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index 2c6d835620d2..66a30be6696c 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -172,5 +172,6 @@ LIBBPF_0.0.4 {
>   btf_dump__new;
>   btf__parse_elf;
>   bpf_object__load_xattr;
> + bpf_object__reuse_maps;
>   libbpf_num_possible_cpus;
>  } LIBBPF_0.0.3;
> 



Re: [patch V2 01/25] x86/kgbd: Use NMI_VECTOR not APIC_DM_NMI

2019-07-05 Thread Thomas Gleixner
On Thu, 4 Jul 2019, Thomas Gleixner wrote:

> apic->send_IPI_allbutself() takes a vector number as argument.
> 
> APIC_DM_NMI is clearly not a vector number. It's defined to 0x400 which is
> outside the vector space.
> 
> Use NMI_VECTOR instead as that's what it is intended to be.
> 
> Fixes: 82da3ff89dc2 ("x86: kgdb support")
> Signed-off-by: Thomas Gleixner 
> ---
> V2: New patch
> ---
>  arch/x86/kernel/kgdb.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- a/arch/x86/kernel/kgdb.c
> +++ b/arch/x86/kernel/kgdb.c
> @@ -424,7 +424,7 @@ static void kgdb_disable_hw_debug(struct
>   */
>  void kgdb_roundup_cpus(void)
>  {
> - apic->send_IPI_allbutself(APIC_DM_NMI);
> + apic->send_IPI_allbutself(VECTOR_NMI);

The changelog got it right, but this here needs to be NMI_VECTOR. While I
didn't, 0-day was able to find and turn on the config option ...

/blush
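
To make the mismatch from the changelog concrete, the check below tests whether a value fits the 8-bit vector space. The constants are redefined locally with a DEMO_ prefix so the snippet stands alone: 0x400 is the value quoted in the changelog, and 0x02 is assumed here as the x86 NMI vector number.

```c
#include <stdbool.h>

#define DEMO_APIC_DM_NMI	0x400u	/* delivery-mode bits, per the changelog */
#define DEMO_NMI_VECTOR		0x02u	/* assumption: NMI is vector 2 on x86 */
#define DEMO_NR_VECTORS		256u	/* vectors are 8 bits wide */

/* A vector number must fall inside the 0..255 vector space. */
static bool demo_is_valid_vector(unsigned int v)
{
	return v < DEMO_NR_VECTORS;
}
```

The delivery-mode constant fails this check and the vector constant passes it, which is the distinction the changelog draws.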



[GIT PULL] Keys: Set 4 - Key ACLs for 5.3

2019-07-05 Thread David Howells
Hi Linus,

Here's my fourth block of keyrings changes for the next merge window.  They
change the permissions model used by keys and keyrings to be based on an
internal ACL by the following means:

 (1) Replace the permissions mask internally with an ACL that contains a
 list of ACEs, each with a specific subject with a permissions mask.
 Potted default ACLs are available for new keys and keyrings.

 ACE subjects can be macroised to indicate the UID and GID specified on
 the key (which remain).  Future commits will be able to add additional
 subject types, such as specific UIDs or domain tags/namespaces.

 Also split a number of permissions to give finer control.  Examples
 include splitting the revocation permit from the change-attributes
 permit, thereby allowing someone to be granted permission to revoke a
 key without allowing them to change the owner; also the ability to
 join a keyring is split from the ability to link to it, thereby
 stopping a process accessing a keyring by joining it and thus
 acquiring use of possessor permits.

 (2) Provide a keyctl to allow the granting or denial of one or more
 permits to a specific subject.  Direct access to the ACL is not
 granted, and the ACL cannot be viewed.
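
As a rough mental model of item (1), an ACL can be pictured as an array of entries, each pairing a subject with a permissions mask, and a permission check as a scan over the entries that match the caller. The sketch below is hypothetical throughout: the type names, subject kinds and permission bits are invented for illustration and are not the kernel's definitions.

```c
#include <stddef.h>

enum demo_subject {
	DEMO_SUBJ_POSSESSOR,
	DEMO_SUBJ_OWNER,
	DEMO_SUBJ_GROUP,
	DEMO_SUBJ_EVERYONE
};

#define DEMO_PERM_VIEW		0x01u
#define DEMO_PERM_REVOKE	0x02u	/* split out from attribute changes */
#define DEMO_PERM_SETATTR	0x04u	/* e.g. changing the owner */

struct demo_ace {
	enum demo_subject subject;
	unsigned int perm;
};

/* Union of the permits granted to one subject kind across the whole ACL. */
static unsigned int demo_acl_perms(const struct demo_ace *acl, size_t n,
				   enum demo_subject who)
{
	unsigned int granted = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (acl[i].subject == who)
			granted |= acl[i].perm;
	return granted;
}
```

With separate revoke and set-attribute bits, an entry can grant DEMO_PERM_REVOKE to a subject without also letting it change the owner.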

David
---
The following changes since commit a58946c158a040068e7c94dc1d58bbd273258068:

  keys: Pass the network namespace into request_key mechanism (2019-06-27 
23:02:12 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git 
tags/keys-acl-20190703

for you to fetch changes up to 7a1ade847596dadc94b37e49f8c03f167fd71748:

  keys: Provide KEYCTL_GRANT_PERMISSION (2019-07-03 13:05:22 +0100)


Keyrings ACL


David Howells (2):
  keys: Replace uid/gid/perm permissions checking with an ACL
  keys: Provide KEYCTL_GRANT_PERMISSION

 Documentation/security/keys/core.rst   | 128 ++--
 Documentation/security/keys/request-key.rst|   9 +-
 certs/blacklist.c  |   7 +-
 certs/system_keyring.c |  12 +-
 drivers/md/dm-crypt.c  |   2 +-
 drivers/nvdimm/security.c  |   2 +-
 fs/afs/security.c  |   2 +-
 fs/cifs/cifs_spnego.c  |  25 +-
 fs/cifs/cifsacl.c  |  28 +-
 fs/cifs/connect.c  |   4 +-
 fs/crypto/keyinfo.c|   2 +-
 fs/ecryptfs/ecryptfs_kernel.h  |   2 +-
 fs/ecryptfs/keystore.c |   2 +-
 fs/fscache/object-list.c   |   2 +-
 fs/nfs/nfs4idmap.c |  30 +-
 fs/ubifs/auth.c|   2 +-
 include/linux/key.h| 121 +++
 include/uapi/linux/keyctl.h|  65 
 lib/digsig.c   |   2 +-
 net/ceph/ceph_common.c |   2 +-
 net/dns_resolver/dns_key.c |  12 +-
 net/dns_resolver/dns_query.c   |  15 +-
 net/rxrpc/key.c|  19 +-
 net/wireless/reg.c |   6 +-
 security/integrity/digsig.c|  31 +-
 security/integrity/digsig_asymmetric.c |   2 +-
 security/integrity/evm/evm_crypto.c|   2 +-
 security/integrity/ima/ima_mok.c   |  13 +-
 security/integrity/integrity.h |   6 +-
 .../integrity/platform_certs/platform_keyring.c|  14 +-
 security/keys/compat.c |   2 +
 security/keys/encrypted-keys/encrypted.c   |   2 +-
 security/keys/encrypted-keys/masterkey_trusted.c   |   2 +-
 security/keys/gc.c |   2 +-
 security/keys/internal.h   |  16 +-
 security/keys/key.c|  29 +-
 security/keys/keyctl.c | 104 --
 security/keys/keyring.c|  27 +-
 security/keys/permission.c | 361 +++--
 security/keys/persistent.c |  27 +-
 security/keys/proc.c   |  22 +-
 security/keys/process_keys.c   |  86 +++--
 security/keys/request_key.c|  34 +-
 security/keys/request_key_auth.c   |  15 +-
 security/selinux/hooks.c   |  16 +-
 security/smack/smack_lsm.c |   3 +-
 46 files changed, 992 insertions(+), 325 deletions(-)


Re: Re: [PATCH v2 3/7] rtc: mt6397: improvements of rtc driver

2019-07-05 Thread Alexandre Belloni
On 05/07/2019 17:35:46+0200, Frank Wunderlich wrote:
> Hi Alexander,
> 
> thank you for the Review
> 
> > Sent: Thursday, 04 July 2019 at 22:43
> > From: "Alexandre Belloni" 
> > > - rtc->rtc_dev = devm_rtc_allocate_device(rtc->dev);
> > > - if (IS_ERR(rtc->rtc_dev))
> > > - return PTR_ERR(rtc->rtc_dev);
> > > + ret = devm_request_threaded_irq(&pdev->dev, rtc->irq, NULL,
> > > + mtk_rtc_irq_handler_thread,
> > > + IRQF_ONESHOT | IRQF_TRIGGER_HIGH,
> > > + "mt6397-rtc", rtc);
> > >
> >
> > This change may lead to a crash and the allocation was intentionally
> > placed before the irq request.
> 
> i got no crash till now, but i will try to move the allocation before 
> irq-request
> 

Let's say the RTC has been used to start your platform, then the irq
handler will be called as soon as the irq is requested, leading to a
null pointer dereference.
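
The ordering problem can be modelled in a few lines of user-space C. This is a simulation under stated assumptions, not driver code: struct demo_rtc, demo_request_irq() and demo_probe() are hypothetical, and the handler records that it saw a NULL pointer instead of dereferencing it.

```c
#include <stddef.h>

struct demo_rtc {
	int *rtc_dev;		/* stands in for the allocated RTC device */
	int pending_irq;	/* alarm already latched, e.g. the RTC woke the board */
	int handler_saw_null;	/* set instead of crashing, for the demo */
};

static void demo_irq_handler(struct demo_rtc *rtc)
{
	if (rtc->rtc_dev == NULL)
		rtc->handler_saw_null = 1;	/* a real handler would dereference here */
}

/* Requesting the IRQ delivers an already-pending interrupt right away. */
static void demo_request_irq(struct demo_rtc *rtc)
{
	if (rtc->pending_irq)
		demo_irq_handler(rtc);
}

/* Probe with either ordering of allocation and IRQ request. */
static void demo_probe(struct demo_rtc *rtc, int *dev, int allocate_first)
{
	if (allocate_first)
		rtc->rtc_dev = dev;
	demo_request_irq(rtc);
	if (!allocate_first)
		rtc->rtc_dev = dev;
}
```

With a pending interrupt, only the allocate-first ordering lets the handler run against a valid device pointer.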

> > > - ret = request_threaded_irq(rtc->irq, NULL,
> > > -mtk_rtc_irq_handler_thread,
> > > -IRQF_ONESHOT | IRQF_TRIGGER_HIGH,
> > > -"mt6397-rtc", rtc);
> > >   if (ret) {
> > >   dev_err(&pdev->dev, "Failed to request alarm IRQ: %d: %d\n",
> > >   rtc->irq, ret);
> > > @@ -287,6 +281,10 @@ static int mtk_rtc_probe(struct platform_device 
> > > *pdev)
> > >
> > >   device_init_wakeup(&pdev->dev, 1);
> > >
> > > + rtc->rtc_dev = devm_rtc_allocate_device(&pdev->dev);
> > > + if (IS_ERR(rtc->rtc_dev))
> > > + return PTR_ERR(rtc->rtc_dev);
> > > +
> > >   rtc->rtc_dev->ops = &mtk_rtc_ops;
> 
> 
> > >  static const struct of_device_id mt6397_rtc_of_match[] = {
> > > + { .compatible = "mediatek,mt6323-rtc", },
> >
> > Unrelated change, this is not an improvement and must be accompanied by
> > a documentation change.
> 
> documentation is changed in 1/7 defining this compatible. i called it 
> improvement because existing driver now supports another chip
> 

Yes and IIRC, I did comment that the rtc change also had to be separated
from 1/7.

Also, I really doubt this new compatible is necessary at all as you
could simply directly use mediatek,mt6397-rtc.

-- 
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


Re: [ANNOUNCE] trace-cmd v2.8.1

2019-07-05 Thread Bhaskar Chowdhury


Cool !!

On 12:34 Fri 05 Jul , Steven Rostedt wrote:


Just after releasing 2.8, some bugs were found (isn't that always the
case?). Now we have 2.8.1 stable release:

 http://trace-cmd.org

-- Steve

Short log here:

Greg Thelen (2):
 trace-cmd: Always initialize write_record() len
 trace-cmd: Avoid using uninitialized handle

Steven Rostedt (VMware) (1):
 trace-cmd: Version 2.8.1

Tzvetomir Stoyanov (VMware) (1):
 trace-cmd: Do not free pages from the lookup table in struct cpu_data in 
case trace file is loaded.




[GIT PULL] Keys: Set 3 - Keyrings namespacing for 5.3

2019-07-05 Thread David Howells
Here's my third block of keyrings changes for the next merge window.

These patches help make keys and keyrings more namespace aware.  Firstly
some miscellaneous patches to make the process easier:

 (1) Simplify key index_key handling so that the word-sized chunks
 assoc_array requires don't have to be shifted about, making it easier
 to add more bits into the key.

 (2) Cache the hash value in the key so that we don't have to calculate on
 every key we examine during a search (it involves a bunch of
 multiplications).

 (3) Allow keyring_search() to search non-recursively.

Then the main patches:

 (4) Make it so that keyring names are per-user_namespace from the point of
 view of KEYCTL_JOIN_SESSION_KEYRING so that they're not accessible
 cross-user_namespace.

 keyctl_capabilities() shows KEYCTL_CAPS1_NS_KEYRING_NAME for this.

 (5) Move the user and user-session keyrings to the user_namespace rather
 than the user_struct.  This prevents them propagating directly across
 user_namespaces boundaries (ie. the KEY_SPEC_* flags will only pick
 from the current user_namespace).

 (6) Make it possible to include the target namespace in which the key shall
 operate in the index_key.  This will allow the possibility of multiple
 keys with the same description, but different target domains to be held
 in the same keyring.

 keyctl_capabilities() shows KEYCTL_CAPS1_NS_KEY_TAG for this.

 (7) Make it so that keys are implicitly invalidated by removal of a domain
 tag, causing them to be garbage collected.

 (8) Institute a network namespace domain tag that allows keys to be
 differentiated by the network namespace in which they operate.  New keys
 that are of a type marked 'KEY_TYPE_NET_DOMAIN' are assigned the network
 domain in force when they are created.

 (9) Make it so that the desired network namespace can be handed down into the
 request_key() mechanism.  This allows AFS, NFS, etc. to request keys
 specific to the network namespace of the superblock.

 This also means that the keys in the DNS record cache are thenceforth
 namespaced, provided network filesystems pass the appropriate network
 namespace down into dns_query().

 For DNS, AFS and NFS are good, whilst CIFS and Ceph are not.  Other
 cache keyrings, such as idmapper keyrings, also need to set the domain
 tag - for which they need access to the network namespace of the
 superblock.

David
---
The following changes since commit 3b8c4a08a471d56ecaaca939c972fdf5b8255629:

  keys: Kill off request_key_async{,_with_auxdata} (2019-06-26 20:58:13 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git 
tags/keys-namespace-20190627

for you to fetch changes up to a58946c158a040068e7c94dc1d58bbd273258068:

  keys: Pass the network namespace into request_key mechanism (2019-06-27 
23:02:12 +0100)


Keyrings namespacing


David Howells (9):
  keys: Simplify key description management
  keys: Cache the hash value to avoid lots of recalculation
  keys: Add a 'recurse' flag for keyring searches
  keys: Namespace keyring names
  keys: Move the user and user-session keyrings to the user_namespace
  keys: Include target namespace in match criteria
  keys: Garbage collect keys for which the domain has been removed
  keys: Network namespace domain tag
  keys: Pass the network namespace into request_key mechanism

 Documentation/security/keys/core.rst|  38 ++--
 Documentation/security/keys/request-key.rst |  29 ++-
 certs/blacklist.c   |   2 +-
 crypto/asymmetric_keys/asymmetric_type.c|   2 +-
 fs/afs/addr_list.c  |   4 +-
 fs/afs/dynroot.c|   8 +-
 fs/cifs/dns_resolve.c   |   3 +-
 fs/nfs/dns_resolve.c|   3 +-
 fs/nfs/nfs4idmap.c  |   2 +-
 include/linux/dns_resolver.h|   3 +-
 include/linux/key-type.h|   3 +
 include/linux/key.h |  81 -
 include/linux/sched/user.h  |  14 --
 include/linux/user_namespace.h  |  12 +-
 include/net/net_namespace.h |   3 +
 include/uapi/linux/keyctl.h |   2 +
 kernel/user.c   |   8 +-
 kernel/user_namespace.c |   9 +-
 lib/digsig.c|   2 +-
 net/ceph/messenger.c|   3 +-
 net/core/net_namespace.c|  20 +++
 net/dns_resolver/dns_key.c  |   1 +
 net/dns_resolver/dns_query.c|   7 +-
 net/rxrpc/key.c |   6 +-
 net/rxrpc/security.c|  

Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust

2019-07-05 Thread Andrew Cooper
On 05/07/2019 21:49, Paolo Bonzini wrote:
> On 05/07/19 22:25, Thomas Gleixner wrote:
>> In practice, this makes Linux vulnerable to CVE-2011-1898 / XSA-3, which
>> I'm disappointed to see wasn't shared with other software vendors at the
>> time.
> Oh, that brings back memories.  At the time I was working on Xen, so I
> remember that CVE.  IIRC there was some mitigation but the fix was
> basically to print a very scary error message if you used VT-d without
> interrupt remapping.  Maybe force the user to add something on the Xen
> command line too?

It was before my time.  I have no public comment on how the other
aspects of it were handled.

>> Is there any serious usage of virtualization w/o interrupt remapping left
>> or have the machines which are not capable been retired already?
> I think they were already starting to disappear in 2011, as I don't
> remember much worry about customers that were using systems without it.

ISTR Nehalem/Westmere era systems were the first to support interrupt
remapping, but were totally crippled with errata to the point of needing
to turn a prerequisite feature (Queued Invalidation) off.  I believe
later systems have it working to a first approximation.

As to the original question, whether people should be using such systems
is a different question to whether they actually are.

~Andrew


[GIT PULL] Keys: Set 2 - request_key() improvements for 5.3

2019-07-05 Thread David Howells
Hi Linus,

Here's my second block of keyrings changes for the next merge window.

These are all request_key()-related, including a fix and some improvements:

 (1) Fix the lack of a Link permission check on a key found by
 request_key(), thereby enabling request_key() to link keys that don't
 grant this permission to the target keyring (which must still grant
 Write permission).

 Note that the key must be in the caller's keyrings already to be
 found.

 (2) Invalidate used request_key authentication keys rather than revoking
 them, so that they get cleaned up immediately rather than hanging
 around till the expiry time is passed.

 (3) Move the RCU locks outwards from the keyring search functions so that
 a request_key_rcu() can be provided.  This can be called in RCU mode,
 so it can't sleep and can't upcall - but it can be called from
 LOOKUP_RCU pathwalk mode.

 (4) Cache the latest positive result of request_key*() temporarily in
 task_struct so that filesystems that make a lot of request_key() calls
 during pathwalk can take advantage of it to avoid having to redo the
 searching.  This requires CONFIG_KEYS_REQUEST_CACHE=y.

 It is assumed that the key just found is likely to be used multiple
 times in each step in an RCU pathwalk, and is likely to be reused for
 the next step too.

 Note that the cleanup of the cache is done on TIF_NOTIFY_RESUME, just
 before userspace resumes, and on exit.

David
---
The following changes since commit 45e0f30c30bb131663fbe1752974d6f2e39611e2:

  keys: Add capability-checking keyctl function (2019-06-19 13:27:45 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git 
tags/keys-request-20190626

for you to fetch changes up to 3b8c4a08a471d56ecaaca939c972fdf5b8255629:

  keys: Kill off request_key_async{,_with_auxdata} (2019-06-26 20:58:13 +0100)


request_key improvements


David Howells (6):
  keys: Fix request_key() lack of Link perm check on found key
  keys: Invalidate used request_key authentication keys
  keys: Move the RCU locks outwards from the keyring search functions
  keys: Provide request_key_rcu()
  keys: Cache result of request_key*() temporarily in task_struct
  keys: Kill off request_key_async{,_with_auxdata}

 Documentation/security/keys/core.rst|  38 ++--
 Documentation/security/keys/request-key.rst |  33 +++
 include/keys/request_key_auth-type.h|   1 +
 include/linux/key.h |  14 +--
 include/linux/sched.h   |   5 +
 include/linux/tracehook.h   |   7 ++
 kernel/cred.c   |   9 ++
 security/keys/Kconfig   |  18 
 security/keys/internal.h|   6 +-
 security/keys/key.c |   4 +-
 security/keys/keyring.c |  16 ++--
 security/keys/proc.c|   4 +-
 security/keys/process_keys.c|  41 -
 security/keys/request_key.c | 137 ++--
 security/keys/request_key_auth.c|  60 +++-
 15 files changed, 229 insertions(+), 164 deletions(-)


[GIT PULL] Keys: Set 1 - Miscellany for 5.3

2019-07-05 Thread David Howells
Hi Linus,

Here's my first block of keyrings changes for the next merge window.  I've
divided up the set into four blocks, but they need to be applied in order
as they would otherwise conflict with each other.

These are some miscellaneous keyrings fixes and improvements:

 (1) Fix a bunch of warnings from sparse, including missing RCU bits and
     kdoc-function argument mismatches.

 (2) Implement a keyctl to allow a key to be moved from one keyring to
 another, with the option of prohibiting key replacement in the
 destination keyring.

 (3) Grant Link permission to possessors of request_key_auth tokens so that
 upcall servicing daemons can more easily arrange things such that only
 the necessary auth key is passed to the actual service program, and
     not all the auth keys a daemon might possess.

 (4) Improvement in lookup_user_key().

 (5) Implement a keyctl to allow keyrings subsystem capabilities to be
 queried.

The keyutils next branch has commits to make available, document and test
the move-key and capabilities code:


https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log

They're currently on the 'next' branch.

David
---
The following changes since commit a188339ca5a396acc588e5851ed7e19f66b0ebd9:

  Linux 5.2-rc1 (2019-05-19 15:47:09 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git 
tags/keys-misc-20190619

for you to fetch changes up to 45e0f30c30bb131663fbe1752974d6f2e39611e2:

  keys: Add capability-checking keyctl function (2019-06-19 13:27:45 +0100)


Keyrings miscellany


David Howells (9):
  keys: sparse: Fix key_fs[ug]id_changed()
  keys: sparse: Fix incorrect RCU accesses
  keys: sparse: Fix kdoc mismatches
  keys: Change keyring_serialise_link_sem to a mutex
  keys: Break bits out of key_unlink()
  keys: Hoist locking out of __key_link_begin()
  keys: Add a keyctl to move a key between keyrings
  keys: Grant Link permission to possessers of request_key auth keys
  keys: Add capability-checking keyctl function

Eric Biggers (1):
  keys: Reuse keyring_index_key::desc_len in lookup_user_key()

 Documentation/security/keys/core.rst |  21 +++
 include/linux/key.h  |  13 +-
 include/uapi/linux/keyctl.h  |  17 +++
 kernel/cred.c|   4 +-
 security/keys/compat.c   |   6 +
 security/keys/internal.h |   7 +
 security/keys/key.c  |  27 +++-
 security/keys/keyctl.c   |  90 +++-
 security/keys/keyring.c  | 278 ---
 security/keys/process_keys.c |  26 ++--
 security/keys/request_key.c  |   9 +-
 security/keys/request_key_auth.c |   4 +-
 12 files changed, 418 insertions(+), 84 deletions(-)


Re: [PATCH] cpu/hotplug: Cache number of online CPUs

2019-07-05 Thread Thomas Gleixner
On Fri, 5 Jul 2019, Thomas Gleixner wrote:
> On Fri, 5 Jul 2019, Mathieu Desnoyers wrote:
> > - On Jul 5, 2019, at 4:49 AM, Ingo Molnar mi...@kernel.org wrote:
> > > * Mathieu Desnoyers  wrote:
> > >> The semantic I am looking for here is C11's relaxed atomics.
> > > 
> > > What does this mean?
> > 
> > C11 states:
> > 
> > "Atomic operations specifying memory_order_relaxed are relaxed only with
> > respect to memory ordering.  Implementations must still guarantee that any
> > given atomic access to a particular atomic object be indivisible with
> > respect to all other atomic accesses to that object."
> > 
> > So I am concerned that num_online_cpus() as proposed in this patch
> > tries to access __num_online_cpus non-atomically, and without using
> > READ_ONCE().
> >
> > 
> > Similarly, the update-side should use WRITE_ONCE(). Protecting with a mutex
> > does not provide mutual exclusion against concurrent readers of that 
> > variable.
> 
> Again. This is nothing new. The current implementation of num_online_cpus()
> has no guarantees whatsoever. 
> 
> bitmap_hweight() can be hit by a concurrent update of the mask it is
> looking at.
> 
> num_online_cpus() gives you only the correct number if you invoke it inside
> a cpuhp_lock held section. So why do we need that fuzz about atomicity now?
> 
> It's racy and was racy forever and even if we add that READ/WRITE_ONCE muck
> then it still won't give you a reliable answer unless you hold cpuhp_lock at
> least for read. So for me that READ/WRITE_ONCE is just a cosmetic and
> misleading reality distortion.

That said. If it makes everyone happy and feel better, I'm happy to add it
along with a big fat comment which explains that it's just preventing a
theoretical store/load tearing issue and does not provide any guarantees
other than that.
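
For reference, the C11 relaxed-atomics semantic quoted above looks roughly like this in user space. It is only an analogue under stated assumptions: the kernel would use READ_ONCE()/WRITE_ONCE() rather than <stdatomic.h>, and the demo_ names are hypothetical.

```c
#include <stdatomic.h>

/* Cached counter; static storage, so it starts out as zero. */
static atomic_uint demo_num_online_cpus;

/* Update side: callers are assumed to serialize updates with a lock. */
static void demo_set_online(unsigned int n)
{
	atomic_store_explicit(&demo_num_online_cpus, n, memory_order_relaxed);
}

/*
 * Read side: the load is indivisible, so no torn value is observed, but
 * there is no ordering guarantee and the result can be stale by the time
 * the caller looks at it.
 */
static unsigned int demo_get_online(void)
{
	return atomic_load_explicit(&demo_num_online_cpus, memory_order_relaxed);
}
```

That is all the relaxed accesses buy: no tearing. A reader that needs a stable answer still has to hold the hotplug lock.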

Thanks,

tglx


Re: [PATCH] mtd: spinand: Fix max_bad_eraseblocks_per_lun info in memorg

2019-07-05 Thread Miquel Raynal
On Thu, 2019-06-06 at 17:07:55 UTC, Schrempf Frieder wrote:
> From: Frieder Schrempf 
> 
> The 1Gb Macronix chip can have a maximum of 20 bad blocks, while
> the 2Gb version has twice as many blocks and therefore the maximum
> number of bad blocks is 40.
> 
> The 4Gb GigaDevice GD5F4GQ4xA has twice as many blocks as its 2Gb
> counterpart and therefore a maximum of 80 bad blocks.
> 
> Fixes: 377e517b5fa5 ("mtd: nand: Add max_bad_eraseblocks_per_lun info to 
> memorg")
> Reported-by: Emil Lenngren 
> Signed-off-by: Frieder Schrempf 

Applied to https://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git 
mtd/fixes, thanks.

Miquel


Re: [PATCH] mtd: rawnand: ingenic: Fix ingenic_ecc dependency

2019-07-05 Thread Miquel Raynal
On Sat, 2019-06-29 at 01:22:48 UTC, Paul Cercueil wrote:
> If MTD_NAND_JZ4780 is y and MTD_NAND_JZ4780_BCH is m,
> which select CONFIG_MTD_NAND_INGENIC_ECC to m, building fails:
> 
> drivers/mtd/nand/raw/ingenic/ingenic_nand.o: In function 
> `ingenic_nand_remove':
> ingenic_nand.c:(.text+0x177): undefined reference to `ingenic_ecc_release'
> drivers/mtd/nand/raw/ingenic/ingenic_nand.o: In function 
> `ingenic_nand_ecc_correct':
> ingenic_nand.c:(.text+0x2ee): undefined reference to `ingenic_ecc_correct'
> 
> To fix that, the ingenic_nand and ingenic_ecc modules have been fused
> into one single module.
> - The ingenic_ecc.c code is now compiled in only if
>   $(CONFIG_MTD_NAND_INGENIC_ECC) is set. This is now a boolean instead
>   of tristate.
> - To avoid changing the module name, the ingenic_nand.c file is moved to
>   ingenic_nand_drv.c. Then the module name is still ingenic_nand.
> - Since ingenic_ecc.c is no more a module, the module-specific macros
>   have been dropped, and the functions are no more exported for use by
>   the ingenic_nand driver.
> 
> Fixes: 15de8c6efd0e ("mtd: rawnand: ingenic: Separate top-level and SoC 
> specific code")
> Signed-off-by: Paul Cercueil 
> Reported-by: Arnd Bergmann 
> Reported-by: Hulk Robot 
> Cc: YueHaibing 
> Cc: sta...@vger.kernel.org

Applied to https://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git 
mtd/fixes, thanks.

Miquel


Re: [PATCH] [STABLE backport 4.9] arm64, vdso: Define vdso_{start,end} as array

2019-07-05 Thread Sasha Levin

On Fri, Jul 05, 2019 at 08:47:20PM +0200, Arnd Bergmann wrote:

From: Kees Cook 

Commit dbbb08f500d6146398b794fdc68a8e811366b451 upstream.

Adjust vdso_{start|end} to be char arrays to avoid compile-time analysis
that flags "too large" memcmp() calls with CONFIG_FORTIFY_SOURCE.

Cc: Jisheng Zhang 
Acked-by: Catalin Marinas 
Suggested-by: Mark Rutland 
Signed-off-by: Kees Cook 
Signed-off-by: Will Deacon 
Signed-off-by: Arnd Bergmann 
---
Backported to 4.9, which is lacking the rework from
2077be6783b5 ("arm64: Use __pa_symbol for kernel symbols")


I've queued both this and the 4.4 backport, thanks!

--
Thanks,
Sasha


Re: [PATCH] cpu/hotplug: Cache number of online CPUs

2019-07-05 Thread Thomas Gleixner
On Fri, 5 Jul 2019, Mathieu Desnoyers wrote:
> - On Jul 5, 2019, at 4:49 AM, Ingo Molnar mi...@kernel.org wrote:
> > * Mathieu Desnoyers  wrote:
> >> The semantic I am looking for here is C11's relaxed atomics.
> > 
> > What does this mean?
> 
> C11 states:
> 
> "Atomic operations specifying memory_order_relaxed are relaxed only with
> respect to memory ordering.  Implementations must still guarantee that any
> given atomic access to a particular atomic object be indivisible with
> respect to all other atomic accesses to that object."
> 
> So I am concerned that num_online_cpus() as proposed in this patch
> tries to access __num_online_cpus non-atomically, and without using
> READ_ONCE().
>
> 
> Similarly, the update-side should use WRITE_ONCE(). Protecting with a mutex
> does not provide mutual exclusion against concurrent readers of that variable.

Again. This is nothing new. The current implementation of num_online_cpus()
has no guarantees whatsoever. 

bitmap_hweight() can be hit by a concurrent update of the mask it is
looking at.

num_online_cpus() gives you only the correct number if you invoke it inside
a cpuhp_lock held section. So why do we need that fuzz about atomicity now?

It's racy and was racy forever and even if we add that READ/WRITE_ONCE muck
then it still won't give you a reliable answer unless you hold cpuhp_lock at
least for read. So for me that READ/WRITE_ONCE is just a cosmetic and
misleading reality distortion.

Thanks,

tglx






Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust

2019-07-05 Thread Paolo Bonzini
On 05/07/19 22:25, Thomas Gleixner wrote:
> In practice, this makes Linux vulnerable to CVE-2011-1898 / XSA-3, which
> I'm disappointed to see wasn't shared with other software vendors at the
> time.

Oh, that brings back memories.  At the time I was working on Xen, so I
remember that CVE.  IIRC there was some mitigation but the fix was
basically to print a very scary error message if you used VT-d without
interrupt remapping.  Maybe force the user to add something on the Xen
command line too?

> The more interesting question is whether this is all relevant. If I
> understood the issue correctly then this is mitigated by proper interrupt
> remapping.

Yes, and for Linux we're good I think.  VFIO by default refuses to use
the IOMMU if interrupt remapping is absent or disabled, and KVM's own
(pre-VFIO) IOMMU support was removed a couple years ago.  I guess the
secure boot lockdown patches should outlaw VFIO's
allow_unsafe_interrupts option, but that's it.

> Is there any serious usage of virtualization w/o interrupt remapping left
> or have the machines which are not capable been retired already?

I think they were already starting to disappear in 2011, as I don't
remember much worry about customers that were using systems without it.

Paolo


Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust

2019-07-05 Thread Andrew Cooper
On 05/07/2019 20:19, Nadav Amit wrote:
>> On Jul 5, 2019, at 8:47 AM, Andrew Cooper  wrote:
>>
>> On 04/07/2019 16:51, Thomas Gleixner wrote:
>>>  2) The loop termination logic is interesting at best.
>>>
>>> If the machine has no TSC or cpu_khz is not known yet it tries 1
>>> million times to ack stale IRR/ISR bits. What?
>>>
>>> With TSC it uses the TSC to calculate the loop termination. It takes a
>>> timestamp at entry and terminates the loop when:
>>>
>>>   (rdtsc() - start_timestamp) >= (cpu_khz << 10)
>>>
>>> That's roughly one second.
>>>
>>> Both methods are problematic. The APIC has 256 vectors, which means
>>> that in theory max. 256 IRR/ISR bits can be set. In practice this is
>>> impossible as the first 32 vectors are reserved and not affected and
>>> the chance that more than a few bits are set is close to zero.
>> [Disclaimer.  I talked to Thomas in private first, and he asked me to
>> post this publicly as the CVE is almost a decade old already.]
>>
>> I'm afraid that this isn't quite true.
>>
>> In terms of IDT vectors, the first 32 are reserved for exceptions, but
>> only the first 16 are reserved in the LAPIC.  Vectors 16-31 are fair
>> game for incoming IPIs (SDM Vol3, 10.5.2 Valid Interrupt Vectors).
>>
>> In practice, this makes Linux vulnerable to CVE-2011-1898 / XSA-3, which
>> I'm disappointed to see wasn't shared with other software vendors at the
>> time.
> IIRC (and from skimming the CVE again) the basic problem in Xen was that
> MSIs can be used when devices are assigned to generate IRQs with arbitrary
> vectors. The mitigation was to require interrupt remapping to be enabled in
> the IOMMU when IOMMU is used for DMA remapping (i.e., device assignment).
>
> Are you concerned about this case, additional concrete ones, or is it about
> security hardening? (or am I missing something?)

The phrase "impossible as the first 32 vectors are reserved" stuck out,
because it's not true.  That generally means that any logic derived from
it is also false. :)

In practice, I was thinking more about robustness against buggy
conditions.  Setting TPR to 1 at start of day is very easy.  Some of the
other protections, less so.

When it comes to virtualisation, security is an illusion when a guest
kernel has a real piece of hardware in its hands.  Anyone who is under
the misapprehension otherwise should try talking to an IOMMU hardware
engineer and see the reaction on their face.  IOMMUs were designed to
isolate devices when all controlling software was of the same privilege
level.  They don't magically make the system safe against a hostile
guest device driver, which in the most basic case, can still mount a DoS
attempt with deliberately bad DMA.

~Andrew
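
The TPR mitigation discussed in this thread can be modelled in a few lines
of C. This is a simplified sketch, not kernel code: lapic_would_deliver()
is a hypothetical helper, and it assumes the usual rule that a fixed
interrupt is held back unless its priority class (vector bits 7:4) is
above the class programmed into TPR. Real hardware compares against PPR,
which also accounts for in-service interrupts; the sketch ignores that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified delivery check (assumption: PPR == TPR, nothing in service).
 * Vectors 0-15 are never valid for the local APIC; above that, the
 * vector's priority class must be strictly greater than the TPR class.
 */
static bool lapic_would_deliver(uint8_t vector, uint8_t tpr)
{
	if (vector < 16)
		return false;
	return (vector >> 4) > (tpr >> 4);
}
```

With TPR at 0, vector 0x11 (the #AC slot) gets through. With TPR at 0x10,
vectors 16-31 are held back, while 0x20 and above, including 0x80, are
still delivered.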


Re: [PATCH] dax: Fix missed PMD wakeups

2019-07-05 Thread Dan Williams
On Fri, Jul 5, 2019 at 12:10 PM Matthew Wilcox  wrote:
>
> On Thu, Jul 04, 2019 at 04:27:14PM -0700, Dan Williams wrote:
> > On Thu, Jul 4, 2019 at 12:14 PM Matthew Wilcox  wrote:
> > >
> > > On Thu, Jul 04, 2019 at 06:54:50PM +0200, Jan Kara wrote:
> > > > On Wed 03-07-19 20:27:28, Matthew Wilcox wrote:
> > > > > So I think we're good for all current users.
> > > >
> > > > Agreed but it is an ugly trap. As I already said, I'd rather pay the
> > > > unnecessary cost of waiting for pte entry and have an easy to understand
> > > > interface. If we ever have a real world use case that would care for this
> > > > optimization, we will need to refactor functions to make this possible and
> > > > still keep the interfaces sane. For example get_unlocked_entry() could
> > > > return special "error code" indicating that there's no entry with matching
> > > > order in xarray but there's a conflict with it. That would be much less
> > > > error-prone interface.
> > >
> > > This is an internal interface.  I think it's already a pretty gnarly
> > > interface to use by definition -- it's going to sleep and might return
> > > almost anything.  There's not much scope for returning an error indicator
> > > either; value entries occupy half of the range (all odd numbers between 1
> > > and ULONG_MAX inclusive), plus NULL.  We could use an internal entry, but
> > > I don't think that makes the interface any easier to use than returning
> > > a locked entry.
> > >
> > > I think this iteration of the patch makes it a little clearer.  What do you
> > > think?
> > >
> >
> > Not much clearer to me. get_unlocked_entry() is now misnamed and this
>
> misnamed?  You'd rather it was called "try_get_unlocked_entry()"?

I was thinking more along the lines of
get_unlocked_but_sometimes_locked_entry(), i.e. per Jan's feedback to
keep the interface simple.
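
For readers wondering why there is "not much scope for returning an error
indicator" in the quoted text: XArray value entries are tagged by setting
bit 0, so every odd word is already a legal value. The sketch below is a
userspace model with hypothetical names (mk_value, is_value, to_value)
patterned on the kernel's xa_mk_value(), xa_is_value() and xa_to_value()
helpers; it is an illustration, not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Encode an integer as a value entry: shift left, set bit 0. */
static uintptr_t mk_value(uintptr_t v)
{
	return (v << 1) | 1;
}

/* Any entry with bit 0 set is a value entry, i.e. every odd number. */
static bool is_value(uintptr_t entry)
{
	return entry & 1;
}

/* Recover the integer stored in a value entry. */
static uintptr_t to_value(uintptr_t entry)
{
	return entry >> 1;
}
```

Because the encoding claims all odd words, an in-band error code would
have to come from the even (pointer-like or internal) half of the range.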


[PATCH] rtc: zynqmp: One function call less in xlnx_rtc_alarm_irq_enable()

2019-07-05 Thread Markus Elfring
From: Markus Elfring 
Date: Fri, 5 Jul 2019 22:37:58 +0200

Avoid an extra function call by using a ternary operator instead of
a conditional statement for a setting selection.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 drivers/rtc/rtc-zynqmp.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/rtc/rtc-zynqmp.c b/drivers/rtc/rtc-zynqmp.c
index 00639594de0c..4631019a54e2 100644
--- a/drivers/rtc/rtc-zynqmp.c
+++ b/drivers/rtc/rtc-zynqmp.c
@@ -124,11 +124,8 @@ static int xlnx_rtc_alarm_irq_enable(struct device *dev, u32 enabled)
 {
 	struct xlnx_rtc_dev *xrtcdev = dev_get_drvdata(dev);
 
-	if (enabled)
-		writel(RTC_INT_ALRM, xrtcdev->reg_base + RTC_INT_EN);
-	else
-		writel(RTC_INT_ALRM, xrtcdev->reg_base + RTC_INT_DIS);
-
+	writel(RTC_INT_ALRM,
+	       xrtcdev->reg_base + (enabled ? RTC_INT_EN : RTC_INT_DIS));
 	return 0;
 }

--
2.22.0



[PATCH bpf-next 1/2] bpf, libbpf: add a new API bpf_object__reuse_maps()

2019-07-05 Thread Anton Protopopov
Add a new API bpf_object__reuse_maps() which can be used to replace all maps in
an object by maps pinned to a directory provided in the path argument.  Namely,
each map M in the object will be replaced by a map pinned to path/M.name.

Signed-off-by: Anton Protopopov 
---
 tools/lib/bpf/libbpf.c   | 34 ++
 tools/lib/bpf/libbpf.h   |  2 ++
 tools/lib/bpf/libbpf.map |  1 +
 3 files changed, 37 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 4907997289e9..84c9e8f7bfd3 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -3144,6 +3144,40 @@ int bpf_object__unpin_maps(struct bpf_object *obj, const char *path)
 	return 0;
 }
 
+int bpf_object__reuse_maps(struct bpf_object *obj, const char *path)
+{
+	struct bpf_map *map;
+
+	if (!obj)
+		return -ENOENT;
+
+	if (!path)
+		return -EINVAL;
+
+	bpf_object__for_each_map(map, obj) {
+		int len, err;
+		int pinned_map_fd;
+		char buf[PATH_MAX];
+
+		len = snprintf(buf, PATH_MAX, "%s/%s", path, bpf_map__name(map));
+		if (len < 0) {
+			return -EINVAL;
+		} else if (len >= PATH_MAX) {
+			return -ENAMETOOLONG;
+		}
+
+		pinned_map_fd = bpf_obj_get(buf);
+		if (pinned_map_fd < 0)
+			return pinned_map_fd;
+
+		err = bpf_map__reuse_fd(map, pinned_map_fd);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 int bpf_object__pin_programs(struct bpf_object *obj, const char *path)
 {
 	struct bpf_program *prog;
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index d639f47e3110..7fe465a1be76 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -82,6 +82,8 @@ int bpf_object__variable_offset(const struct bpf_object *obj, const char *name,
 LIBBPF_API int bpf_object__pin_maps(struct bpf_object *obj, const char *path);
 LIBBPF_API int bpf_object__unpin_maps(struct bpf_object *obj,
  const char *path);
+LIBBPF_API int bpf_object__reuse_maps(struct bpf_object *obj,
+ const char *path);
 LIBBPF_API int bpf_object__pin_programs(struct bpf_object *obj,
const char *path);
 LIBBPF_API int bpf_object__unpin_programs(struct bpf_object *obj,
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 2c6d835620d2..66a30be6696c 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -172,5 +172,6 @@ LIBBPF_0.0.4 {
 	btf_dump__new;
 	btf__parse_elf;
 	bpf_object__load_xattr;
+	bpf_object__reuse_maps;
 	libbpf_num_possible_cpus;
 } LIBBPF_0.0.3;
-- 
2.19.1
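
The path construction in the patch above follows a common snprintf()
pattern: format into a fixed buffer, then treat a negative return as an
error and a return at or beyond the buffer size as truncation. A
standalone sketch of just that step follows. build_pin_path() is a
hypothetical helper written for this note, not part of libbpf, and the
directory and map names in the usage note are made up.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<dir>/<name>" into buf. Returns 0 on success, -EINVAL if
 * snprintf() reports an error, -ENAMETOOLONG if the result was truncated.
 */
static int build_pin_path(char *buf, size_t size, const char *dir,
			  const char *name)
{
	int len = snprintf(buf, size, "%s/%s", dir, name);

	if (len < 0)
		return -EINVAL;
	if ((size_t)len >= size)
		return -ENAMETOOLONG;
	return 0;
}
```

For example, a 32-byte buffer is enough for "/sys/fs/bpf/demo" plus
"counts", while an 8-byte buffer is rejected with -ENAMETOOLONG rather
than silently producing a shortened path.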



[PATCH bpf-next 2/2] bpf, libbpf: add an option to reuse existing maps in bpf_prog_load_xattr

2019-07-05 Thread Anton Protopopov
Add a new pinned_maps_path member to the bpf_prog_load_attr structure and
extend the bpf_prog_load_xattr() function to pass this pointer to the new
bpf_object__reuse_maps() helper. This change provides users with a simple
way to use existing pinned maps when (re)loading BPF programs.

Signed-off-by: Anton Protopopov 
---
 tools/lib/bpf/libbpf.c | 8 
 tools/lib/bpf/libbpf.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 84c9e8f7bfd3..9daa09c9fe1a 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -3953,6 +3953,14 @@ int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
 			first_prog = prog;
 	}
 
+	if (attr->pinned_maps_path) {
+		err = bpf_object__reuse_maps(obj, attr->pinned_maps_path);
+		if (err < 0) {
+			bpf_object__close(obj);
+			return err;
+		}
+	}
+
 	bpf_object__for_each_map(map, obj) {
 		if (!bpf_map__is_offload_neutral(map))
 			map->map_ifindex = attr->ifindex;
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 7fe465a1be76..6bf405bb9c1f 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -329,6 +329,7 @@ struct bpf_prog_load_attr {
 	int ifindex;
 	int log_level;
 	int prog_flags;
+	const char *pinned_maps_path;
 };
 
 LIBBPF_API int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr,
-- 
2.19.1



[PATCH bpf-next 0/2] libbpf: add an option to reuse maps when loading a program

2019-07-05 Thread Anton Protopopov
The following two patches add an option for users to reuse existing maps when
loading a program using the bpf_prog_load_xattr function.  A user can specify a
directory containing pinned maps inside the bpf_prog_load_attr structure, and in
this case the bpf_prog_load_xattr function will replace (bpf_map__reuse_fd) all
maps defined in the object with file descriptors obtained from corresponding
entries from the specified directory.

Anton Protopopov (2):
  bpf, libbpf: add a new API bpf_object__reuse_maps()
  bpf, libbpf: add an option to reuse existing maps in bpf_prog_load_xattr

 tools/lib/bpf/libbpf.c   | 42 
 tools/lib/bpf/libbpf.h   |  3 +++
 tools/lib/bpf/libbpf.map |  1 +
 3 files changed, 46 insertions(+)

--
2.19.1


Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust

2019-07-05 Thread Andy Lutomirski
On Fri, Jul 5, 2019 at 1:36 PM Thomas Gleixner  wrote:
>
> On Fri, 5 Jul 2019, Andy Lutomirski wrote:
> > On Fri, Jul 5, 2019 at 8:47 AM Andrew Cooper wrote:
> > > Because TPR is 0, an incoming IPI can trigger #AC, #CP, #VC or #SX
> > > without an error code on the stack, which results in a corrupt pt_regs
> > > in the exception handler, and a stack underflow on the way back out,
> > > most likely with a fault on IRET.
> > >
> > > These can be addressed by setting TPR to 0x10, which will inhibit
> > > delivery of any errant IPIs in this range, but some extra sanity logic
> > > may not go amiss.  An error code on a 64bit stack can be spotted with
> > > `testb $8, %spl` due to %rsp being aligned before pushing the exception
> > > frame.
> >
> > Several years ago, I remember having a discussion with someone (Jan
> > Beulich, maybe?) about how to efficiently make the entry code figure
> > out the error code situation automatically.  I suspect it was on IRC
> > and I can't find the logs.  I'm thinking that maybe we should just
> > make Linux's idtentry code do something like this.
> >
> > If nothing else, we could make idtentry do:
> >
> > testl $8, %esp   /* shorter than testb IIRC */
> > jz 1f  /* or jnz -- too lazy to figure it out */
> > pushq $-1
> > 1:
>
> Errm, no. We should not silently paper over it. If we detect that this came
> in with a wrong stack frame, i.e. not from a CPU originated exception, then
> we truly should yell loud. Also in that case you want to check the APIC:ISR
> and issue an EOI to clear it.

It gives us the option to replace idtentry with something
table-driven.  I don't think I love it, but it's not an awful idea.



>
> > > Another interesting problem is an IPI which hits vector 0x80.  A cunning
> > > attacker can use this to simulate system calls from unsuspecting
> > > positions in userspace, or for interrupting kernel context.  At the very
> > > least the int0x80 path does an unconditional swapgs, so will try to run
> > > with the user gs, and I expect things will explode quickly from there.
> >
> > At least SMAP helps here on non-FSGSBASE systems.  With FSGSBASE, I
>
> How does it help? It still crashes the kernel.
>
> > suppose we could harden this by adding a special check to int $0x80 to
> > validate GSBASE.
>
> > > One option here is to look at ISR and complain if it is found to be set.
> >
> > Barring some real hackery, we're toast long before we get far enough to
> > do that.
>
> No. We can map the APIC into the user space visible page tables for PTI
> without compromising the PTI isolation and it can be read very early on
> before SWAPGS. All you need is a register to clobber, not more. If the ISR
> is set, then go into an error path, yell loudly, issue EOI and return.
> The only issue I can see is: It's slow :)
>
>

I think this will be really extremely slow.  If we can restrict this
to x2apic machines, then maybe it's not so awful.

FWIW, if we just patch up the GS thing, then we are still vulnerable:
the bad guy can arrange for a privileged process to have register
state corresponding to a dangerous syscall and then send an int $0x80
via the APIC.


Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust

2019-07-05 Thread Andy Lutomirski
On Fri, Jul 5, 2019 at 1:25 PM Thomas Gleixner  wrote:
>
> Andrew,
>
> >
> > These can be addressed by setting TPR to 0x10, which will inhibit
>
> Right, that's easy and obvious.
>

This boots:

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 177aa8ef2afa..5257c40bde6c 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1531,11 +1531,14 @@ static void setup_local_APIC(void)
 #endif

 	/*
-	 * Set Task Priority to 'accept all'. We never change this
-	 * later on.
+	 * Set Task Priority to 'accept all except vectors 0-31'.  An APIC
+	 * vector in the 16-31 range can be delivered otherwise, but we'll
+	 * think it's an exception and terrible things will happen.
+	 * We never change this later on.
 	 */
 	value = apic_read(APIC_TASKPRI);
 	value &= ~APIC_TPRI_MASK;
+	value |= 0x10;
 	apic_write(APIC_TASKPRI, value);
 
 	apic_pending_intr_clear();


Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust

2019-07-05 Thread Thomas Gleixner
On Fri, 5 Jul 2019, Andy Lutomirski wrote:
> On Fri, Jul 5, 2019 at 8:47 AM Andrew Cooper wrote:
> > Because TPR is 0, an incoming IPI can trigger #AC, #CP, #VC or #SX
> > without an error code on the stack, which results in a corrupt pt_regs
> > in the exception handler, and a stack underflow on the way back out,
> > most likely with a fault on IRET.
> >
> > These can be addressed by setting TPR to 0x10, which will inhibit
> > delivery of any errant IPIs in this range, but some extra sanity logic
> > may not go amiss.  An error code on a 64bit stack can be spotted with
> > `testb $8, %spl` due to %rsp being aligned before pushing the exception
> > frame.
> 
> Several years ago, I remember having a discussion with someone (Jan
> Beulich, maybe?) about how to efficiently make the entry code figure
> out the error code situation automatically.  I suspect it was on IRC
> and I can't find the logs.  I'm thinking that maybe we should just
> make Linux's idtentry code do something like this.
> 
> If nothing else, we could make idtentry do:
> 
> testl $8, %esp   /* shorter than testb IIRC */
> jz 1f  /* or jnz -- too lazy to figure it out */
> pushq $-1
> 1:

Errm, no. We should not silently paper over it. If we detect that this came
in with a wrong stack frame, i.e. not from a CPU originated exception, then
we truly should yell loud. Also in that case you want to check the APIC:ISR
and issue an EOI to clear it.

> > Another interesting problem is an IPI which hits vector 0x80.  A cunning
> > attacker can use this to simulate system calls from unsuspecting
> > positions in userspace, or for interrupting kernel context.  At the very
> > least the int0x80 path does an unconditional swapgs, so will try to run
> > with the user gs, and I expect things will explode quickly from there.
> 
> At least SMAP helps here on non-FSGSBASE systems.  With FSGSBASE, I

How does it help? It still crashes the kernel.

> suppose we could harden this by adding a special check to int $0x80 to
> validate GSBASE.

> > One option here is to look at ISR and complain if it is found to be set.
> 
> Barring some real hackery, we're toast long before we get far enough to
> do that.

No. We can map the APIC into the user space visible page tables for PTI
without compromising the PTI isolation and it can be read very early on
before SWAPGS. All you need is a register to clobber, not more. If the ISR
is set, then go into an error path, yell loudly, issue EOI and return.
The only issue I can see is: It's slow :)

Thanks,

tglx




[GIT PULL] Final KVM changes for 5.2

2019-07-05 Thread Paolo Bonzini
Linus,

The following changes since commit 6fbc7275c7a9ba97877050335f290341a1fd8dbf:

  Linux 5.2-rc7 (2019-06-30 11:25:36 +0800)

are available in the git repository at:

  https://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus

for you to fetch changes up to e644fa18e2ffc8895ca30dade503ae10128573a6:

  KVM: arm64/sve: Fix vq_present() macro to yield a bool (2019-07-05 12:07:51 +0200)


x86 bugfix patches and one compilation fix for ARM.


Liran Alon (2):
  KVM: nVMX: Allow restore nested-state to enable eVMCS when vCPU in SMM
  KVM: nVMX: Change KVM_STATE_NESTED_EVMCS to signal vmcs12 is copied from eVMCS

Paolo Bonzini (1):
  KVM: x86: degrade WARN to pr_warn_ratelimited

Wanpeng Li (1):
  KVM: LAPIC: Fix pending interrupt in IRR blocked by software disable LAPIC

Zhang Lei (1):
  KVM: arm64/sve: Fix vq_present() macro to yield a bool

 arch/arm64/kvm/guest.c                          |  2 +-
 arch/x86/kvm/lapic.c                            |  2 +-
 arch/x86/kvm/vmx/nested.c                       | 30 -
 arch/x86/kvm/x86.c                              |  6 ++---
 tools/testing/selftests/kvm/x86_64/evmcs_test.c |  1 +
 5 files changed, 26 insertions(+), 15 deletions(-)


Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust

2019-07-05 Thread Thomas Gleixner
Andrew,

On Fri, 5 Jul 2019, Andrew Cooper wrote:

> On 04/07/2019 16:51, Thomas Gleixner wrote:
> >   2) The loop termination logic is interesting at best.
> >
> >  If the machine has no TSC or cpu_khz is not known yet it tries 1
> >  million times to ack stale IRR/ISR bits. What?
> >
> >  With TSC it uses the TSC to calculate the loop termination. It takes a
> >  timestamp at entry and terminates the loop when:
> >
> >   (rdtsc() - start_timestamp) >= (cpu_khz << 10)
> >
> >  That's roughly one second.
> >
> >  Both methods are problematic. The APIC has 256 vectors, which means
> >  that in theory max. 256 IRR/ISR bits can be set. In practice this is
> >  impossible as the first 32 vectors are reserved and not affected and
> >  the chance that more than a few bits are set is close to zero.
> 
> [Disclaimer.  I talked to Thomas in private first, and he asked me to
> post this publicly as the CVE is almost a decade old already.]

thanks for bringing this up!

> I'm afraid that this isn't quite true.
> 
> In terms of IDT vectors, the first 32 are reserved for exceptions, but
> only the first 16 are reserved in the LAPIC.  Vectors 16-31 are fair
> game for incoming IPIs (SDM Vol3, 10.5.2 Valid Interrupt Vectors).

Indeed.

> In practice, this makes Linux vulnerable to CVE-2011-1898 / XSA-3, which
> I'm disappointed to see wasn't shared with other software vendors at the
> time.

No comment.

> Because TPR is 0, an incoming IPI can trigger #AC, #CP, #VC or #SX
> without an error code on the stack, which results in a corrupt pt_regs
> in the exception handler, and a stack underflow on the way back out,
> most likely with a fault on IRET.
> 
> These can be addressed by setting TPR to 0x10, which will inhibit

Right, that's easy and obvious.

> delivery of any errant IPIs in this range, but some extra sanity logic
> may not go amiss.  An error code on a 64bit stack can be spotted with
> `testb $8, %spl` due to %rsp being aligned before pushing the exception
> frame.

The question is what we do with that information :)

> Another interesting problem is an IPI which hits vector 0x80.  A cunning
> attacker can use this to simulate system calls from unsuspecting
> positions in userspace, or for interrupting kernel context.  At the very
> least the int0x80 path does an unconditional swapgs, so will try to run
> with the user gs, and I expect things will explode quickly from there.

Cute.

> One option here is to look at ISR and complain if it is found to be set.

That's slow, but we could at least provide an option to do so.

> Another option, which I've only just remembered, is that AMD hardware
> has the Interrupt Enable Register in its extended APIC space, which may
> or may not be good enough to prohibit delivery of 0x80.  There isn't
> enough information in the APM to be clear, but the name suggests it is
> worth experimenting with.

I doubt it. Clearing a bit in the IER takes the interrupt out of the
priority decision logic. That's an SVM feature so interrupts directed
directly to guests cannot block other interrupts if they are not
serviced. It's grossly misnamed and won't help with the int80 issue.

The more interesting question is whether this is all relevant. If I
understood the issue correctly then this is mitigated by proper interrupt
remapping.

Is there any serious usage of virtualization w/o interrupt remapping left
or have the machines which are not capable been retired already?

Thanks,

tglx
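
A quick check of the "roughly one second" figure quoted above, as a
sketch. cpu_khz counts TSC cycles per millisecond, so a budget of
cpu_khz << 10 cycles works out to 1024 ms whatever the clock speed. The
helper names timeout_cycles() and timeout_usecs() are hypothetical and
exist only for this note.

```c
#include <assert.h>
#include <stdint.h>

/* TSC cycles the loop may spin for: cpu_khz << 10. */
static uint64_t timeout_cycles(uint64_t cpu_khz)
{
	return cpu_khz << 10;
}

/*
 * The same budget in microseconds: cycles divided by cycles per
 * millisecond (cpu_khz), scaled by 1000.
 */
static uint64_t timeout_usecs(uint64_t cpu_khz)
{
	return timeout_cycles(cpu_khz) * 1000 / cpu_khz;
}
```

For a 2 GHz TSC (cpu_khz = 2000000) the budget is 2048000000 cycles, and
for any frequency the result is 1024000 microseconds.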

Re: [PATCH 2/2] leds: tlc591xx: Use the OF version of the LED registration function

2019-07-05 Thread Pavel Machek
On Mon 2019-07-01 17:26:02, Jean-Jacques Hiblot wrote:
> The driver parses the device-tree to identify which LED should be handled.
> Since the information about the device node is known at this time, we can
> provide the LED core with it. It may be useful later.
> 
> Signed-off-by: Jean-Jacques Hiblot 

Acked-by: Pavel Machek 

> @@ -207,7 +207,7 @@ tlc591xx_probe(struct i2c_client *client,
>   led->led_no = idx++;
>   led->ldev.brightness_set_blocking = tlc591xx_brightness_set;
>   led->ldev.max_brightness = LED_FULL;
> - err = devm_led_classdev_register(dev, &led->ldev);
> + err = devm_of_led_classdev_register(dev, child, &led->ldev);
>   if (err < 0) {
>   dev_err(dev, "couldn't register LED %s\n",
>   led->ldev.name);

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: [PATCH 2/8] leds: as3645a: Fix misuse of strlcpy

2019-07-05 Thread Pavel Machek
On Thu 2019-07-04 16:57:42, Joe Perches wrote:
> Probable cut&paste typo - use the correct field size.
> 
> Signed-off-by: Joe Perches 

Ack.
Pavel
> ---
>  drivers/leds/leds-as3645a.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/leds/leds-as3645a.c b/drivers/leds/leds-as3645a.c
> index 14ab6b0e4de9..050088dff8dd 100644
> --- a/drivers/leds/leds-as3645a.c
> +++ b/drivers/leds/leds-as3645a.c
> @@ -668,7 +668,7 @@ static int as3645a_v4l2_setup(struct as3645a *flash)
>   };
>  
>   strlcpy(cfg.dev_name, led->name, sizeof(cfg.dev_name));
> - strlcpy(cfgind.dev_name, flash->iled_cdev.name, sizeof(cfg.dev_name));
> + strlcpy(cfgind.dev_name, flash->iled_cdev.name, sizeof(cfgind.dev_name));
>  
>   flash->vf = v4l2_flash_init(
>   &flash->client->dev, flash->flash_node, &flash->fled, NULL,

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


[PATCH] ACPI: PM: Fix "multiple definition of acpi_sleep_state_supported" for ARM64

2019-07-05 Thread Dexuan Cui


If CONFIG_ACPI_SYSTEM_POWER_STATES_SUPPORT is not set, the dummy version of
the function should be static.

Fixes: 1e2c3f0f1e93 ("ACPI: PM: Make acpi_sleep_state_supported() non-static")
Signed-off-by: Dexuan Cui 
Reported-by: kbuild test robot 
---

Sorry for not doing it right in the previous patch!

The patch fixes the build errors on ARM64:

   drivers/net/ethernet/qualcomm/emac/emac-phy.o: In function `acpi_sleep_state_supported':
>> emac-phy.c:(.text+0x1d8): multiple definition of `acpi_sleep_state_supported'
   drivers/net/ethernet/qualcomm/emac/emac.o:emac.c:(.text+0xbf8): first defined here
   drivers/net/ethernet/qualcomm/emac/emac-sgmii.o: In function `acpi_sleep_state_supported':
   emac-sgmii.c:(.text+0x548): multiple definition of `acpi_sleep_state_supported'
   drivers/net/ethernet/qualcomm/emac/emac.o:emac.c:(.text+0xbf8): first defined here


 include/acpi/acpi_bus.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index 4ce59bdc852e..8ffc4acf2b56 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -657,7 +657,7 @@ static inline int acpi_pm_set_bridge_wakeup(struct device *dev, bool enable)
 #ifdef CONFIG_ACPI_SYSTEM_POWER_STATES_SUPPORT
 bool acpi_sleep_state_supported(u8 sleep_state);
 #else
-bool acpi_sleep_state_supported(u8 sleep_state) { return false; }
+static bool acpi_sleep_state_supported(u8 sleep_state) { return false; }
 #endif
 
 #ifdef CONFIG_ACPI_SLEEP
-- 
2.17.1
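
The linker errors above come from a non-static function body living in a
header: every .c file that includes it emits its own global definition.
A minimal sketch of the usual header-stub pattern follows. All names
(DEMO_POWER_STATES_SUPPORT, demo_sleep_state_supported) are hypothetical
stand-ins, and the sketch uses "static inline", a common spelling for
such stubs, whereas the patch as posted uses plain "static".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Header-style fallback. DEMO_POWER_STATES_SUPPORT stands in for a
 * Kconfig symbol and is deliberately left undefined here, so the stub
 * branch is the one that gets compiled.
 */
#ifdef DEMO_POWER_STATES_SUPPORT
bool demo_sleep_state_supported(uint8_t sleep_state);
#else
/* Internal linkage: each includer gets a private copy, so no clash. */
static inline bool demo_sleep_state_supported(uint8_t sleep_state)
{
	(void)sleep_state;
	return false;
}
#endif
```

With internal linkage, any number of translation units can include the
header without the "multiple definition" failure reported by the build
robot.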



Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust

2019-07-05 Thread Andrew Cooper
On 05/07/2019 20:06, Andy Lutomirski wrote:
> On Fri, Jul 5, 2019 at 8:47 AM Andrew Cooper wrote:
>> On 04/07/2019 16:51, Thomas Gleixner wrote:
>>>   2) The loop termination logic is interesting at best.
>>>
>>>  If the machine has no TSC or cpu_khz is not known yet it tries 1
>>>  million times to ack stale IRR/ISR bits. What?
>>>
>>>  With TSC it uses the TSC to calculate the loop termination. It takes a
>>>  timestamp at entry and terminates the loop when:
>>>
>>> (rdtsc() - start_timestamp) >= (cpu_khz << 10)
>>>
>>>  That's roughly one second.
>>>
>>>  Both methods are problematic. The APIC has 256 vectors, which means
>>>  that in theory max. 256 IRR/ISR bits can be set. In practice this is
>>>  impossible as the first 32 vectors are reserved and not affected and
>>>  the chance that more than a few bits are set is close to zero.
>> [Disclaimer.  I talked to Thomas in private first, and he asked me to
>> post this publicly as the CVE is almost a decade old already.]
>>
>> I'm afraid that this isn't quite true.
>>
>> In terms of IDT vectors, the first 32 are reserved for exceptions, but
>> only the first 16 are reserved in the LAPIC.  Vectors 16-31 are fair
>> game for incoming IPIs (SDM Vol3, 10.5.2 Valid Interrupt Vectors).
>>
>> In practice, this makes Linux vulnerable to CVE-2011-1898 / XSA-3, which
>> I'm disappointed to see wasn't shared with other software vendors at the
>> time.
>>
>> Because TPR is 0, an incoming IPI can trigger #AC, #CP, #VC or #SX
>> without an error code on the stack, which results in a corrupt pt_regs
>> in the exception handler, and a stack underflow on the way back out,
>> most likely with a fault on IRET.
>>
>> These can be addressed by setting TPR to 0x10, which will inhibit
>> delivery of any errant IPIs in this range, but some extra sanity logic
>> may not go amiss.  An error code on a 64bit stack can be spotted with
>> `testb $8, %spl` due to %rsp being aligned before pushing the exception
>> frame.
> Several years ago, I remember having a discussion with someone (Jan
> Beulich, maybe?) about how to efficiently make the entry code figure
> out the error code situation automatically.  I suspect it was on IRC
> and I can't find the logs.

It was on IRC, but I don't remember exactly when, either.

> I'm thinking that maybe we should just
> make Linux's idtentry code do something like this.
>
> If nothing else, we could make idtentry do:
>
> testl $8, %esp   /* shorter than testb IIRC */

Sadly not.  test (unlike cmp and the basic mutative opcodes) doesn't
have a sign-extendable imm8 encoding.  The two options are:

f7 c4 08 00 00 00        test   $0x8,%esp
40 f6 c4 08      test   $0x8,%spl

> jz 1f  /* or jnz -- too lazy to figure it out */
> pushq $-1
> 1:

It is jz, and Xen does use this sequence for reserved/unimplemented
vectors, but we expect those codepaths never to be executed.
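
The alignment argument behind the test/jz pair can be checked with a
small C model. This is a sketch under the assumption stated in the
thread, namely that RSP is 16-byte aligned before the CPU pushes the
exception frame. rsp_after_frame() and frame_has_error_code() are
hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WORD 8u

/*
 * RSP after the CPU has pushed SS, RSP, RFLAGS, CS and RIP, plus an
 * optional error code, starting from a 16-byte aligned RSP.
 */
static uint64_t rsp_after_frame(uint64_t aligned_rsp, bool has_error_code)
{
	return aligned_rsp - (has_error_code ? 6 : 5) * WORD;
}

/* C equivalent of "test $8, %spl; jz": bit 3 clear means an error code. */
static bool frame_has_error_code(uint64_t rsp)
{
	return (rsp & 8) == 0;
}
```

Six pushed words keep RSP 16-byte aligned, so bit 3 is clear and the jz
branch skips the extra push; five words leave bit 3 set, and the -1
placeholder is pushed.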

>
> instead of the current hardcoded push.  The cost of a mispredicted
> branch here will be smallish compared to the absurdly large cost of
> the entry itself.  But I thought I had something more clever than
> this.  This sequence works, but it still feels like it should be
> possible to do better:
>
> .macro PUSH_ERROR_IF_NEEDED
> /*
>  * Before the IRET frame is pushed, RSP is aligned to a 16-byte
>  * boundary.  After SS .. RIP and the error code are pushed, RSP is
>  * once again aligned.  Pushing -1 will put -1 in the error code slot
>  * (regs->orig_ax) if there was no error code.
>  */
>
> pushq $-1    /* orig_ax = -1, maybe */
> /* now RSP points to orig_ax (aligned) or di (misaligned) */
> pushq $0
> /* now RSP points to di (misaligned) or si (aligned) */
> orq $8, %rsp
> /* now RSP points to di */
> addq $8, %rsp
> /* now RSP points to orig_ax, and we're in good shape */
> .endm
>
> Is there a better sequence for this?

The only aspect I can think of is whether mixing the push/pops with
explicit updates to %rsp is better or worse than a very well
predicted branch, given that various frontends have special tracking to
reduce instruction dependencies on %rsp.  I'll have to defer to the CPU
microarchitects as to which of the two options is the lesser evil.

That said, both Intel and AMD's Optimisation guides have stack alignment
suggestions which mix push/sub/and on function prolog, so I expect this
is as optimised as it can reasonably be in the pipelines.
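
The branchless PUSH_ERROR_IF_NEEDED sequence quoted above can be traced
with a toy stack model in C. This is an illustration only: struct stack,
push() and orig_ax_after_fixup() are hypothetical, the stack is a small
array, and the model assumes RSP is 16-byte aligned before the frame is
pushed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SLOTS 16

/* Tiny stack model: rsp is a byte offset into mem, slots are 8 bytes. */
struct stack {
	uint64_t mem[SLOTS];
	uint64_t rsp;
};

static void push(struct stack *s, uint64_t val)
{
	s->rsp -= 8;
	s->mem[s->rsp / 8] = val;
}

/*
 * Deliver an exception frame from a 16-byte aligned RSP, run the
 * branchless sequence, and return what ends up in the orig_ax slot.
 */
static uint64_t orig_ax_after_fixup(bool has_error_code, uint64_t error_code)
{
	struct stack s = { .rsp = SLOTS * 8 };	/* aligned starting point */
	int i;

	for (i = 0; i < 5; i++)			/* SS, RSP, RFLAGS, CS, RIP */
		push(&s, 0);
	if (has_error_code)
		push(&s, error_code);

	push(&s, (uint64_t)-1);			/* pushq $-1 */
	push(&s, 0);				/* pushq $0 */
	s.rsp |= 8;				/* orq $8, %rsp */
	s.rsp += 8;				/* addq $8, %rsp */

	assert(s.rsp == SLOTS * 8 - 6 * 8);	/* always the orig_ax slot */
	return s.mem[s.rsp / 8];
}
```

In both cases RSP ends on the orig_ax slot: a hardware error code is left
untouched there, and when none was pushed the slot holds the -1 written
by the first pushq.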

>> Another interesting problem is an IPI which hits vector 0x80.  A cunning
>> attacker can use this to simulate system calls from unsuspecting
>> positions in userspace, or for interrupting kernel context.  At the very
>> least the int0x80 path does an unconditional swapgs, so will try to run
>> with the user gs, and I expect things will explode quickly from there.
> At least SMAP helps here on non-FSGSBASE systems.  With FSGSBASE, I
> suppose we could harden this by adding a special 

[PATCH] rtc: stm32: One function call less in stm32_rtc_set_alarm()

2019-07-05 Thread Markus Elfring
From: Markus Elfring 
Date: Fri, 5 Jul 2019 22:10:10 +0200

Avoid an extra function call by using a ternary operator instead of
a conditional statement.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 drivers/rtc/rtc-stm32.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/rtc/rtc-stm32.c b/drivers/rtc/rtc-stm32.c
index 8e6c9b3bcc29..83793b530fed 100644
--- a/drivers/rtc/rtc-stm32.c
+++ b/drivers/rtc/rtc-stm32.c
@@ -519,11 +519,7 @@ static int stm32_rtc_set_alarm(struct device *dev, struct rtc_wkalrm *alrm)
/* Write to Alarm register */
writel_relaxed(alrmar, rtc->base + regs->alrmar);

-   if (alrm->enabled)
-   stm32_rtc_alarm_irq_enable(dev, 1);
-   else
-   stm32_rtc_alarm_irq_enable(dev, 0);
-
+   stm32_rtc_alarm_irq_enable(dev, alrm->enabled ? 1 : 0);
 end:
stm32_rtc_wpr_lock(rtc);

--
2.22.0



Re: [alsa-devel] [PATCH] sound: soc: codecs: wcd9335: add irqflag IRQF_ONESHOT flag

2019-07-05 Thread Ladislav Michl
On Fri, Jul 05, 2019 at 12:40:26AM +0530, Hariprasad Kelam wrote:
> Add IRQF_ONESHOT to ensure "Interrupt is not reenabled after the hardirq
> handler finished".
> 
> fixes below issue reported by coccicheck
> 
> sound/soc/codecs/wcd9335.c:4068:8-33: ERROR: Threaded IRQ with no
> primary handler requested without IRQF_ONESHOT
> 
> Signed-off-by: Hariprasad Kelam 
> ---
>  sound/soc/codecs/wcd9335.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/sound/soc/codecs/wcd9335.c b/sound/soc/codecs/wcd9335.c
> index 85737fe..7ab9bf6f 100644
> --- a/sound/soc/codecs/wcd9335.c
> +++ b/sound/soc/codecs/wcd9335.c
> @@ -4056,6 +4056,9 @@ static struct wcd9335_irq wcd9335_irqs[] = {
>  static int wcd9335_setup_irqs(struct wcd9335_codec *wcd)
>  {
>   int irq, ret, i;
> + unsigned long irqflags;
> +
> + irqflags = IRQF_TRIGGER_RISING | IRQF_ONESHOT;

Why does this change trigger adding a variable?

>   for (i = 0; i < ARRAY_SIZE(wcd9335_irqs); i++) {
>   irq = regmap_irq_get_virq(wcd->irq_data, wcd9335_irqs[i].irq);
> @@ -4067,7 +4070,7 @@ static int wcd9335_setup_irqs(struct wcd9335_codec *wcd)
>  
>   ret = devm_request_threaded_irq(wcd->dev, irq, NULL,
>   wcd9335_irqs[i].handler,
> - IRQF_TRIGGER_RISING,
> + irqflags,
>   wcd9335_irqs[i].name, wcd);
>   if (ret) {
>   dev_err(wcd->dev, "Failed to request %s\n",
> -- 
> 2.7.4
> 


Re: [PATCH v2] fs: Fix the default values of i_uid/i_gid on /proc/sys inodes.

2019-07-05 Thread Luis Chamberlain


Please re-state the main fix in the commit log, not just the subject.

Also, this does not explain what the current values are or their impact on
systems / users. That would help in determining and evaluating whether this
deserves to be a stable fix.

On Fri, Jul 05, 2019 at 06:30:21PM +0200, Radoslaw Burny wrote:
> This also fixes a problem where, in a user namespace without root user
> mapping, it is not possible to write to /proc/sys/kernel/shmmax.

This does not explain why that should be possible and what impact this
limitation has.

> The problem was introduced by the combination of the two commits:
> * 81754357770ebd900801231e7bc8d151ddc00498: fs: Update
>   i_[ug]id_(read|write) to translate relative to s_user_ns
> - this caused the kernel to write INVALID_[UG]ID to i_uid/i_gid
> members of /proc/sys inodes if a containing userns does not have
> entries for root in the uid/gid_map.
This is a 2014 commit merged as of v4.8.

> * 0bd23d09b874e53bd1a2fe2296030aa2720d7b08: vfs: Don't modify inodes
>   with a uid or gid unknown to the vfs
> - changed the kernel to prevent opens for write if the i_uid/i_gid
> field in the inode is invalid

This is a 2016 commit merged as of v4.8 as well...

So regardless of the dates of the commits, are you saying this is a
regression you can confirm did not exist prior to v4.8? Did you test
v4.7 to confirm?

> This commit fixes the issue by defaulting i_uid/i_gid to
> GLOBAL_ROOT_UID/GID.

Why is this right?

> Note that these values are not used for /proc/sys
> access checks, so the change does not otherwise affect /proc semantics.
> 
> Tested: Used a repro program that creates a user namespace without any
> mapping and stat'ed /proc/$PID/root/proc/sys/kernel/shmmax from outside.
> Before the change, it shows the overflow uid, with the change it's 0.

Why is the overflow uid bad for user experience? Did you test prior to
v4.8, ie on v4.7 to confirm this is indeed a regression?

You'd then also want to amend the commit log with a Fixes: tag listing both
commits. If this is a stable fix (criteria yet to be determined),
then we'd need a stable tag.

  Luis

> Signed-off-by: Radoslaw Burny 
> ---
> Changelog since v1:
> - Updated the commit title and description.
> 
>  fs/proc/proc_sysctl.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
> index c74570736b24..36ad1b0d6259 100644
> --- a/fs/proc/proc_sysctl.c
> +++ b/fs/proc/proc_sysctl.c
> @@ -499,6 +499,10 @@ static struct inode *proc_sys_make_inode(struct super_block *sb,
>  
>   if (root->set_ownership)
>   root->set_ownership(head, table, &inode->i_uid, &inode->i_gid);
> + else {
> + inode->i_uid = GLOBAL_ROOT_UID;
> + inode->i_gid = GLOBAL_ROOT_GID;
> + }
>  
>   return inode;
>  }
> -- 
> 2.22.0.410.gd8fdbe21b5-goog
> 
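
The effect described in the quoted commit message can be illustrated with a small userspace model. This is a rough sketch, not kernel code: `model_inode`, `model_make_inode()`, `model_may_open_for_write()` and `sample_set_ownership()` are hypothetical names, plain integers stand in for kuid_t/kgid_t, and the write check is reduced to "both ids are valid".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_INVALID_ID ((uint32_t)-1)	/* stands in for INVALID_UID/GID */
#define MODEL_ROOT_ID    0u			/* stands in for GLOBAL_ROOT_UID/GID */

struct model_inode {
	uint32_t i_uid;
	uint32_t i_gid;
};

/* Optional per-root callback, modelled on root->set_ownership. */
typedef void (*set_ownership_fn)(uint32_t *uid, uint32_t *gid);

/* Hypothetical callback, used only to exercise the first branch. */
static void sample_set_ownership(uint32_t *uid, uint32_t *gid)
{
	*uid = 1000;
	*gid = 1000;
}

static void model_make_inode(struct model_inode *inode,
			     set_ownership_fn set_ownership, int with_fix)
{
	assert(inode != NULL);

	/* Models a userns without a root mapping: ids start out invalid. */
	inode->i_uid = MODEL_INVALID_ID;
	inode->i_gid = MODEL_INVALID_ID;

	if (set_ownership) {
		set_ownership(&inode->i_uid, &inode->i_gid);
	} else if (with_fix) {
		/* The branch added by the patch. */
		inode->i_uid = MODEL_ROOT_ID;
		inode->i_gid = MODEL_ROOT_ID;
	}
}

/* Simplified stand-in for the "uid or gid unknown to the vfs" check. */
static int model_may_open_for_write(const struct model_inode *inode)
{
	return inode->i_uid != MODEL_INVALID_ID &&
	       inode->i_gid != MODEL_INVALID_ID;
}
```

In this model only the no-callback case changes behaviour: without the fix the inode keeps invalid ids and the write open is refused, with the fix it is owned by root.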


Re: [PATCH] rcuperf: Make rcuperf kernel test more robust for !expedited mode

2019-07-05 Thread Joel Fernandes
On Fri, Jul 05, 2019 at 08:09:32AM -0700, Paul E. McKenney wrote:
> On Fri, Jul 05, 2019 at 08:24:50AM -0400, Joel Fernandes wrote:
> > On Fri, Jul 05, 2019 at 12:52:31PM +0900, Byungchul Park wrote:
> > > On Thu, Jul 04, 2019 at 10:40:44AM -0700, Paul E. McKenney wrote:
> > > > On Thu, Jul 04, 2019 at 12:34:30AM -0400, Joel Fernandes (Google) wrote:
> > > > > It is possible that the rcuperf kernel test runs concurrently with 
> > > > > init
> > > > > starting up.  During this time, the system is running all grace 
> > > > > periods
> > > > > as expedited.  However, rcuperf can also be run for normal GP tests.
> > > > > Right now, it depends on a holdoff time before starting the test to
> > > > > ensure grace periods start later. This works fine with the default
> > > > > holdoff time however it is not robust in situations where init takes
> > > > > greater than the holdoff time to finish running. Or, as in my case:
> > > > > 
> > > > > I modified the rcuperf test locally to also run a thread that did
> > > > > preempt disable/enable in a loop. This had the effect of slowing down
> > > > > init. The end result was that the "batches:" counter in rcuperf was 0
> > > > > causing a division by 0 error in the results. This counter was 0 
> > > > > because
> > > > > only expedited GPs seem to happen, not normal ones which led to the
> > > > > rcu_state.gp_seq counter remaining constant across grace periods which
> > > > > unexpectedly happen to be expedited. The system was running expedited
> > > > > RCU all the time because rcu_unexpedited_gp() would not have run yet
> > > > > > from init.  In other words, the test would run concurrently with init
> > > > > booting in expedited GP mode.
> > > > > 
> > > > > To fix this properly, let us check if system_state if SYSTEM_RUNNING
> > > > > is set before starting the test. The system_state approximately aligns
> > > 
> > > Just minor typo..
> > > 
> > > To fix this properly, let us check if system_state if SYSTEM_RUNNING
> > > is set before starting the test. ...
> > > 
> > > Should be
> > > 
> > > To fix this properly, let us check if system_state is set to
> > > SYSTEM_RUNNING before starting the test. ...
> > 
> > That's a fair point. I wonder if Paul already fixed it up in his tree,
> > however I am happy to resend if he hasn't. Paul, how would you like to 
> > handle
> > this commit log nit?
> > 
> > it is just 'if ..' to 'is SYSTEM_RUNNING'
> 
> It now reads as follows:
> 
>   To fix this properly, this commit waits until system_state is
>   set to SYSTEM_RUNNING before starting the test.  This change is
>   made just before kernel_init() invokes rcu_end_inkernel_boot(),
>   and this latter is what turns off boot-time expediting of RCU
>   grace periods.

Ok, looks good to me, thanks.

And for below patch,

Reviewed-by: Joel Fernandes (Google) 


> I dropped the last paragraph about late_initcall().  And I suspect that
> the last clause from rcu_gp_is_expedited() can be dropped:
> 
> bool rcu_gp_is_expedited(void)
> {
>   return rcu_expedited || atomic_read(&rcu_expedited_nesting) ||
>  rcu_scheduler_active == RCU_SCHEDULER_INIT;
> }
> 
> This is because rcu_expedited_nesting is initialized to 1, and is
> decremented in rcu_end_inkernel_boot(), which is called long after
> rcu_scheduler_active has been set to RCU_SCHEDULER_RUNNING, which
> happens at core_initcall() time.  So if the last clause says "true",
> so does the second-to-last clause.
> 
> The similar check in rcu_gp_is_normal() is still needed, however, to allow
> the power-management subsystem to invoke synchronize_rcu() just after
> the scheduler has been initialized, but before RCU is aware of this.
> 
> So, how about the commit shown below?
> 
>   Thanx, Paul
> 
> 
> 
> commit 1f7e72efe3c761c2b34da7b59e01ad69c657db10
> Author: Paul E. McKenney 
> Date:   Fri Jul 5 08:05:10 2019 -0700
> 
> rcu: Remove redundant "if" condition from rcu_gp_is_expedited()
> 
> Because rcu_expedited_nesting is initialized to 1 and not decremented
> until just before init is spawned, rcu_expedited_nesting is guaranteed
> to be non-zero whenever rcu_scheduler_active == RCU_SCHEDULER_INIT.
> This commit therefore removes this redundant "if" equality test.
> 
> Signed-off-by: Paul E. McKenney 
> 
> diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> index 249517058b13..64e9cc8609e7 100644
> --- a/kernel/rcu/update.c
> +++ b/kernel/rcu/update.c
> @@ -136,8 +136,7 @@ static atomic_t rcu_expedited_nesting = ATOMIC_INIT(1);
>   */
>  bool rcu_gp_is_expedited(void)
>  {
> - return rcu_expedited || atomic_read(&rcu_expedited_nesting) ||
> -rcu_scheduler_active == RCU_SCHEDULER_INIT;
> + return rcu_expedited || atomic_read(&rcu_expedited_nesting);
>  }
>  EXPORT_SYMBOL_GPL(rcu_gp_is_expedited);
>  
> 
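
The redundancy argument in the quoted commit log can be checked with a toy model of the boot ordering. This is a sketch under the assumptions stated in that log: the nesting count starts at 1 and is only decremented (by rcu_end_inkernel_boot()) after the scheduler state has reached RUNNING. All `model_*` names are made up for illustration, and other callers that adjust the nesting count are ignored.

```c
#include <assert.h>
#include <stdbool.h>

enum model_sched_state {
	MODEL_SCHEDULER_INACTIVE,
	MODEL_SCHEDULER_INIT,
	MODEL_SCHEDULER_RUNNING,
};

struct model_rcu {
	bool expedited;			/* models rcu_expedited */
	int expedited_nesting;		/* models rcu_expedited_nesting */
	enum model_sched_state sched;	/* models rcu_scheduler_active */
};

/* Predicate before the patch: three clauses. */
static bool old_is_expedited(const struct model_rcu *r)
{
	return r->expedited || r->expedited_nesting ||
	       r->sched == MODEL_SCHEDULER_INIT;
}

/* Predicate after the patch: the last clause is dropped. */
static bool new_is_expedited(const struct model_rcu *r)
{
	return r->expedited || r->expedited_nesting;
}

/* Walks the boot order and counts steps where the two predicates differ. */
static int count_boot_mismatches(void)
{
	struct model_rcu r = { false, 1, MODEL_SCHEDULER_INACTIVE };
	int mismatches = 0;
	int step;

	for (step = 0; step < 4; step++) {
		if (step == 1)
			r.sched = MODEL_SCHEDULER_INIT;
		else if (step == 2)
			r.sched = MODEL_SCHEDULER_RUNNING;
		else if (step == 3)
			r.expedited_nesting--;	/* rcu_end_inkernel_boot() */

		assert(r.expedited_nesting >= 0);
		if (old_is_expedited(&r) != new_is_expedited(&r))
			mismatches++;
	}
	return mismatches;
}
```

The two predicates only disagree in a state with a zero nesting count while the scheduler state is still INIT, which the boot ordering above never produces.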


Re: [PATCH net-next] hinic: add fw version query

2019-07-05 Thread Jakub Kicinski
On Fri, 5 Jul 2019 02:40:28 +, Xue Chaojing wrote:
> This patch adds firmware version query in ethtool -i.
> 
> Signed-off-by: Xue Chaojing 

Reviewed-by: Jakub Kicinski 


Good day!

2019-07-05 Thread fuqingzheng
Good day

   I have a mutual business proposal concerning the transfer of a large
sum of money to an overseas account, with your help as a foreign partner
acting as beneficiary of the funds. Everything about this transaction will
be legal, without any breach of financial authority in either my country or
yours. If you are interested, I will give you more information about the
project as soon as I receive your positive reply.

Kind regards,

Executive Director

ICBC, China




Re: [PATCH] nvme: One function call less in nvme_update_disk_info()

2019-07-05 Thread Jens Axboe
On 7/5/19 1:15 PM, Markus Elfring wrote:
> From: Markus Elfring 
> Date: Fri, 5 Jul 2019 21:08:12 +0200
> 
> Avoid an extra function call by using a ternary operator instead of
> a conditional statement.
> 
> This issue was detected by using the Coccinelle software.
> 
> Signed-off-by: Markus Elfring 
> ---
>   drivers/nvme/host/core.c | 5 +
>   1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index b2dd4e391f5c..73888195bdb2 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1650,10 +1650,7 @@ static void nvme_update_disk_info(struct gendisk *disk,
>   nvme_config_discard(disk, ns);
>   nvme_config_write_zeroes(disk, ns);
> 
> - if (id->nsattr & (1 << 0))
> - set_disk_ro(disk, true);
> - else
> - set_disk_ro(disk, false);
> + set_disk_ro(disk, id->nsattr & (1 << 0) ? true : false);

Let's please not, the original is much more readable.

-- 
Jens Axboe
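
For reference, the forms under discussion compute the same flag. The sketch below is standalone C with made-up helper names (`ro_if_else()`, `ro_ternary()`, `ro_direct()`). It models only the bit test on nsattr, not set_disk_ro() itself, and assumes the result ends up in a C99 bool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Original form: explicit if/else. */
static bool ro_if_else(uint8_t nsattr)
{
	if (nsattr & (1 << 0))
		return true;
	else
		return false;
}

/* Form proposed in the patch: ternary operator. */
static bool ro_ternary(uint8_t nsattr)
{
	return nsattr & (1 << 0) ? true : false;
}

/* Conversion to bool already yields 0 or 1, so no ternary is needed. */
static bool ro_direct(uint8_t nsattr)
{
	return nsattr & (1 << 0);
}

/* Exhaustively checks that the three forms agree for every nsattr value. */
static void check_all_forms_agree(void)
{
	unsigned int v;

	for (v = 0; v <= 0xff; v++) {
		uint8_t a = (uint8_t)v;

		assert(ro_if_else(a) == ro_ternary(a));
		assert(ro_ternary(a) == ro_direct(a));
	}
}
```

Since all three agree, the choice between them is about readability, which is the objection raised above.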



Re: [PATCH v6 net-next 2/5] net: ethernet: ti: davinci_cpdma: add dma mapped submit

2019-07-05 Thread kbuild test robot
Hi Ivan,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Ivan-Khoronzhuk/xdp-allow-same-allocator-usage/20190706-003850
config: arm64-allmodconfig (attached as .config)
compiler: aarch64-linux-gcc (GCC) 7.4.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=7.4.0 make.cross ARCH=arm64 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot 

All warnings (new ones prefixed by >>):

   drivers/net//ethernet/ti/davinci_cpdma.c: In function 'cpdma_chan_submit_si':
>> drivers/net//ethernet/ti/davinci_cpdma.c:1047:12: warning: cast from pointer 
>> to integer of different size [-Wpointer-to-int-cast]
  buffer = (u32)si->data;
   ^
   drivers/net//ethernet/ti/davinci_cpdma.c: In function 
'cpdma_chan_idle_submit_mapped':
>> drivers/net//ethernet/ti/davinci_cpdma.c:1114:12: warning: cast to pointer 
>> from integer of different size [-Wint-to-pointer-cast]
 si.data = (void *)(u32)data;
   ^
   drivers/net//ethernet/ti/davinci_cpdma.c: In function 
'cpdma_chan_submit_mapped':
   drivers/net//ethernet/ti/davinci_cpdma.c:1164:12: warning: cast to pointer 
from integer of different size [-Wint-to-pointer-cast]
 si.data = (void *)(u32)data;
   ^
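
The warnings above come from casting between a 64-bit pointer and a 32-bit integer. One common way to avoid a size-changing cast is to go through uintptr_t. The sketch below is standalone C and not the driver's code: `model_dma_addr_t` is a hypothetical stand-in for dma_addr_t on a 64-bit build, and the helper names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's dma_addr_t on a 64-bit build. */
typedef uint64_t model_dma_addr_t;

/* Carry a DMA address in a void * field without a size-changing cast. */
static void *dma_addr_to_cookie(model_dma_addr_t addr)
{
	assert(addr <= UINTPTR_MAX);	/* must fit in a pointer */
	return (void *)(uintptr_t)addr;
}

/* Recover the DMA address from the pointer-sized cookie. */
static model_dma_addr_t cookie_to_dma_addr(const void *cookie)
{
	return (model_dma_addr_t)(uintptr_t)cookie;
}

/* What a (u32) cast keeps: only the low 32 bits of the address. */
static uint32_t truncated_low_bits(model_dma_addr_t addr)
{
	return (uint32_t)addr;
}
```

On a 64-bit target, an address above 4 GiB survives the uintptr_t round trip but loses its upper bits through the 32-bit cast. That truncation is the hazard the compiler is flagging on arm64.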

vim +1047 drivers/net//ethernet/ti/davinci_cpdma.c

  1015  
  1016  static int cpdma_chan_submit_si(struct submit_info *si)
  1017  {
  1018  struct cpdma_chan   *chan = si->chan;
  1019  struct cpdma_ctlr   *ctlr = chan->ctlr;
  1020  int len = si->len;
  1021  int swlen = len;
  1022  struct cpdma_desc __iomem   *desc;
  1023  dma_addr_t  buffer;
  1024  u32 mode;
  1025  int ret;
  1026  
  1027  if (chan->count >= chan->desc_num)  {
  1028  chan->stats.desc_alloc_fail++;
  1029  return -ENOMEM;
  1030  }
  1031  
  1032  desc = cpdma_desc_alloc(ctlr->pool);
  1033  if (!desc) {
  1034  chan->stats.desc_alloc_fail++;
  1035  return -ENOMEM;
  1036  }
  1037  
  1038  if (len < ctlr->params.min_packet_size) {
  1039  len = ctlr->params.min_packet_size;
  1040  chan->stats.runt_transmit_buff++;
  1041  }
  1042  
  1043  mode = CPDMA_DESC_OWNER | CPDMA_DESC_SOP | CPDMA_DESC_EOP;
  1044  cpdma_desc_to_port(chan, mode, si->directed);
  1045  
  1046  if (si->flags & CPDMA_DMA_EXT_MAP) {
> 1047  buffer = (u32)si->data;
  1048  dma_sync_single_for_device(ctlr->dev, buffer, len, chan->dir);
  1049  swlen |= CPDMA_DMA_EXT_MAP;
  1050  } else {
  1051  buffer = dma_map_single(ctlr->dev, si->data, len, chan->dir);
  1052  ret = dma_mapping_error(ctlr->dev, buffer);
  1053  if (ret) {
  1054  cpdma_desc_free(ctlr->pool, desc, 1);
  1055  return -EINVAL;
  1056  }
  1057  }
  1058  
  1059  /* Relaxed IO accessors can be used here as there is read barrier
  1060   * at the end of write sequence.
  1061   */
  1062  writel_relaxed(0, &desc->hw_next);
  1063  writel_relaxed(buffer, &desc->hw_buffer);
  1064  writel_relaxed(len, &desc->hw_len);
  1065  writel_relaxed(mode | len, &desc->hw_mode);
  1066  writel_relaxed((uintptr_t)si->token, &desc->sw_token);
  1067  writel_relaxed(buffer, &desc->sw_buffer);
  1068  writel_relaxed(swlen, &desc->sw_len);
  1069  desc_read(desc, sw_len);
  1070  
  1071  __cpdma_chan_submit(chan, desc);
  1072  
  1073  if (chan->state == CPDMA_STATE_ACTIVE && chan->rxfree)
  1074  chan_write(chan, rxfree, 1);
  1075  
  1076  chan->count++;
  1077  return 0;
  1078  }
  1079  
  1080  int cpdma_chan_idle_submit(struct cpdma_chan *chan, void *token, void *data,
  1081 int len, int directed)
  1082  {
  1083  struct submit_info si;
  1084  unsigned long flags;
  1085  int ret;
  1086  
  1087  si.chan = chan;
  1088  si.token = token;
  1089  si.data = data;
  1090  si.len = len;
  1091  si.directed = directed;
  1092  si.flags = 0;
  1093  
  1094  spin_lock_irqsave(&chan->lock, flags);
  1095  if (chan->state == CPDMA_STATE_TEARDOWN) {
  1096  spin_unlock_irqrestore(&chan->lock, flags);
  1097  return 

[PULL REQUEST] i2c for 5.2

2019-07-05 Thread Wolfram Sang
Linus,

I2C has a MAINTAINERS update which will be beneficial for developers, so
let's add it right away.

Please pull.

Thanks,

   Wolfram


The following changes since commit 6fbc7275c7a9ba97877050335f290341a1fd8dbf:

  Linux 5.2-rc7 (2019-06-30 11:25:36 +0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git i2c/for-current

for you to fetch changes up to f3a3ea28edd9a17588fede4ff53bc02d986cf4d1:

  i2c: tegra: Add Dmitry as a reviewer (2019-07-05 20:46:56 +0200)


Dmitry Osipenko (1):
  i2c: tegra: Add Dmitry as a reviewer

 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)




Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust

2019-07-05 Thread Nadav Amit
> On Jul 5, 2019, at 8:47 AM, Andrew Cooper  wrote:
> 
> On 04/07/2019 16:51, Thomas Gleixner wrote:
>>  2) The loop termination logic is interesting at best.
>> 
>> If the machine has no TSC or cpu_khz is not known yet it tries 1
>> million times to ack stale IRR/ISR bits. What?
>> 
>> With TSC it uses the TSC to calculate the loop termination. It takes a
>> timestamp at entry and terminates the loop when:
>> 
>>(rdtsc() - start_timestamp) >= (cpu_khz << 10)
>> 
>> That's roughly one second.
>> 
>> Both methods are problematic. The APIC has 256 vectors, which means
>> that in theory max. 256 IRR/ISR bits can be set. In practice this is
>> impossible as the first 32 vectors are reserved and not affected and
>> the chance that more than a few bits are set is close to zero.
> 
> [Disclaimer.  I talked to Thomas in private first, and he asked me to
> post this publicly as the CVE is almost a decade old already.]
> 
> I'm afraid that this isn't quite true.
> 
> In terms of IDT vectors, the first 32 are reserved for exceptions, but
> only the first 16 are reserved in the LAPIC.  Vectors 16-31 are fair
> game for incoming IPIs (SDM Vol3, 10.5.2 Valid Interrupt Vectors).
> 
> In practice, this makes Linux vulnerable to CVE-2011-1898 / XSA-3, which
> I'm disappointed to see wasn't shared with other software vendors at the
> time.

IIRC (and from skimming the CVE again) the basic problem in Xen was that
MSIs can be used when devices are assigned to generate IRQs with arbitrary
vectors. The mitigation was to require interrupt remapping to be enabled in
the IOMMU when IOMMU is used for DMA remapping (i.e., device assignment).

Are you concerned about this case, additional concrete ones, or is it about
security hardening? (or am I missing something?)
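
For readers following the vector discussion quoted above, the two reservation ranges and the "roughly one second" loop bound can be written down as a few lines of C. This is a sketch based only on the statements in this thread: the helper names are hypothetical, and the 0-15 / 0-31 split is the one Andrew cites from the SDM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IDT view: vectors 0-31 are reserved for exceptions. */
static bool idt_vector_is_exception(unsigned int vector)
{
	assert(vector < 256);
	return vector < 32;
}

/* LAPIC view: only vectors 0-15 are illegal for interrupts. */
static bool lapic_vector_is_valid(unsigned int vector)
{
	assert(vector < 256);
	return vector >= 16;
}

/* Vectors an IPI can deliver although the IDT treats them as exceptions. */
static unsigned int count_overlap_vectors(void)
{
	unsigned int v, n = 0;

	for (v = 0; v < 256; v++)
		if (idt_vector_is_exception(v) && lapic_vector_is_valid(v))
			n++;
	return n;
}

/* Loop bound from the quoted text, converted to milliseconds. */
static uint64_t tsc_timeout_ms(uint64_t cpu_khz)
{
	uint64_t cycles;

	assert(cpu_khz != 0);
	cycles = cpu_khz << 10;		/* (cpu_khz << 10) TSC cycles */
	return cycles / cpu_khz;	/* kHz == cycles per millisecond */
}
```

The overlap is the 16 vectors from 16 to 31. The timeout works out to 1024 ms for any TSC frequency, which is where "roughly one second" comes from.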

[PATCH] nvme: One function call less in nvme_update_disk_info()

2019-07-05 Thread Markus Elfring
From: Markus Elfring 
Date: Fri, 5 Jul 2019 21:08:12 +0200

Avoid an extra function call by using a ternary operator instead of
a conditional statement.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 drivers/nvme/host/core.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index b2dd4e391f5c..73888195bdb2 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1650,10 +1650,7 @@ static void nvme_update_disk_info(struct gendisk *disk,
nvme_config_discard(disk, ns);
nvme_config_write_zeroes(disk, ns);

-   if (id->nsattr & (1 << 0))
-   set_disk_ro(disk, true);
-   else
-   set_disk_ro(disk, false);
+   set_disk_ro(disk, id->nsattr & (1 << 0) ? true : false);

blk_mq_unfreeze_queue(disk->queue);
 }
--
2.22.0



Re: [PATCH 2/2] usb: pci-quirks: Minor cleanup for AMD PLL quirk

2019-07-05 Thread Alan Stern
On Thu, 4 Jul 2019, Ryan Kennedy wrote:

> usb_amd_find_chipset_info() is used for chipset detection for
> several quirks. It is strange that its return value indicates
> the need for the PLL quirk, which means it is often ignored.
> This patch adds a function specifically for checking the PLL
> quirk like the other ones. Additionally, rename probe_result to
> something more appropriate.
> 
> Signed-off-by: Ryan Kennedy 

> @@ -322,6 +317,13 @@ bool usb_amd_prefetch_quirk(void)
>  }
>  EXPORT_SYMBOL_GPL(usb_amd_prefetch_quirk);
>  
> +bool usb_amd_quirk_pll_check(void)
> +{
> + usb_amd_find_chipset_info();
> + return amd_chipset.need_pll_quirk;
> +}
> +EXPORT_SYMBOL_GPL(usb_amd_quirk_pll_check);

I really don't see the point of separating out all but one line into a
different function.  You might as well just rename 
usb_amd_find_chipset_info to usb_amd_quirk_pll_check (along with the 
other code adjustments) and be done with it.

However, in the end I don't care if you still want to do this.  Either 
way:

Acked-by: Alan Stern 

Alan Stern



Re: INFO: rcu detected stall in ext4_write_checks

2019-07-05 Thread Paul E. McKenney
On Fri, Jul 05, 2019 at 05:48:31PM +0200, Dmitry Vyukov wrote:
> On Fri, Jul 5, 2019 at 5:17 PM Paul E. McKenney  wrote:
> >
> > On Fri, Jul 05, 2019 at 03:24:26PM +0200, Dmitry Vyukov wrote:
> > > On Thu, Jun 27, 2019 at 12:47 AM Theodore Ts'o  wrote:
> > > >
> > > > More details about what is going on.  First, it requires root, because
> > > > one of the things that is required is using sched_setattr (which is enough to
> > > > shoot yourself in the foot):
> > > >
> > > > sched_setattr(0, {size=0, sched_policy=0x6 /* SCHED_??? */, 
> > > > sched_flags=0, sched_nice=0, sched_priority=0, 
> > > > sched_runtime=2251799813724439, sched_deadline=4611686018427453437, 
> > > > sched_period=0}, 0) = 0
> > > >
> > > > This is setting the scheduler policy to be SCHED_DEADLINE, with a
> > > > runtime parameter of 2251799.813724439 seconds (or 26 days) and a
> > > > deadline of 4611686018.427453437 seconds (or 146 *years*).  This means
> > > > a particular kernel thread can run for up to 26 **days** before it is
> > > > scheduled away, and if a kernel thread gets woken up or sent a signal,
> > > > no worries, it will wake up roughly seven times the interval that Rip
> > > > Van Winkle spent snoozing in a cave in the Catskill Mountains (in
> > > > Washington Irving's short story).
> > > >
> > > > We then kick off a half-dozen threads all running:
> > > >
> > > >sendfile(fd, fd, , 0x8080fffe);
> > > >
> > > > (and since count is a ridiculously large number, this gets cut down to):
> > > >
> > > >sendfile(fd, fd, , 2147479552);
> > > >
> > > > Is it any wonder that we are seeing RCU stalls?   :-)
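
The figures quoted above are easy to re-derive. The sketch below redoes the arithmetic in plain C. The helper names are made up, a year is taken as 365 days, and the sendfile clamp assumes the limit is INT_MAX rounded down to a 4 KiB page boundary, which reproduces the 2147479552 shown in the trace.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define MODEL_NSEC_PER_SEC 1000000000ULL
#define MODEL_SEC_PER_DAY  86400ULL
#define MODEL_SEC_PER_YEAR (365ULL * MODEL_SEC_PER_DAY)
#define MODEL_PAGE_SIZE    4096ULL	/* assumption: 4 KiB pages */

/* Nanoseconds to whole days, rounding down. */
static uint64_t ns_to_whole_days(uint64_t ns)
{
	return ns / MODEL_NSEC_PER_SEC / MODEL_SEC_PER_DAY;
}

/* Nanoseconds to whole 365-day years, rounding down. */
static uint64_t ns_to_whole_years(uint64_t ns)
{
	return ns / MODEL_NSEC_PER_SEC / MODEL_SEC_PER_YEAR;
}

/* Clamp a transfer count to INT_MAX rounded down to a page boundary. */
static uint64_t clamp_rw_count(uint64_t count)
{
	uint64_t max;

	assert(INT_MAX == 0x7fffffff);	/* assumes a 32-bit int */
	max = (uint64_t)INT_MAX & ~(MODEL_PAGE_SIZE - 1);
	return count > max ? max : count;
}
```

Feeding in sched_runtime, sched_deadline and the sendfile count from the trace gives 26 days, 146 years and 2147479552 bytes respectively.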
> > >
> > > +Peter, Ingo for sched_setattr and +Paul for rcu
> > >
> > > First of all: is it a semi-intended result of a root (CAP_SYS_NICE)
> > > doing local DoS abusing sched_setattr? It would be perfectly reasonable
> > > to starve other processes, but I am not sure about rcu. In the end the
> > > high prio process can use rcu itself, and then it will simply blow
> > > system memory by stalling rcu. So it seems that rcu stalls should not
> > > happen as a result of weird sched_setattr values. If that is the case,
> > > what needs to be fixed? sched_setattr? rcu? sendfile?
> >
> > Does the (untested, probably does not even build) patch shown below help?
> > This patch assumes that the kernel was built with CONFIG_PREEMPT=n.
> > And that I found all the tight loops on the do_sendfile() code path.
> 
> The config used when this happened is referenced from here:
> https://syzkaller.appspot.com/bug?extid=4bfbbf28a2e50ab07368
> and it contains:
> CONFIG_PREEMPT=y
> 
> So... what does this mean? The loop should have been preempted without
> the cond_resched() then, right?

Exactly, so although my patch might help for CONFIG_PREEMPT=n, it won't
help in your scenario.  But looking at the dmesg from your URL above,
I see the following:

rcu: rcu_preempt kthread starved for 10549 jiffies! g8969 f0x2 
RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0

And, prior to that:

rcu: All QSes seen, last rcu_preempt kthread activity 10503 
(4295056736-4295046233), jiffies_till_next_fqs=1, root ->qsmask 0x0

In other words, the grace period has finished, but RCU's grace-period
kthread hasn't gotten a chance to run, and thus hasn't marked it as
completed.  The standard workaround is to set the rcutree.kthread_prio
kernel boot parameter to a comfortably high real-time priority.

At least assuming that syzkaller isn't setting the scheduling priority
of random CPU-bound tasks to RT priority 99 or some such.  ;-)

Does that work for you?

Thanx, Paul

> > > If this is semi-intended, the only option I see is to disable
> > > something in syzkaller: sched_setattr entirely, or drop CAP_SYS_NICE,
> > > or ...? Any preference either way?
> >
> > Long-running tight loops in the kernel really should contain
> > cond_resched() or better.
> >
> > Thanx, Paul
> >
> > 
> >
> > diff --git a/fs/splice.c b/fs/splice.c
> > index 25212dcca2df..50aa3286764a 100644
> > --- a/fs/splice.c
> > +++ b/fs/splice.c
> > @@ -985,6 +985,7 @@ ssize_t splice_direct_to_actor(struct file *in, struct 
> > splice_desc *sd,
> > sd->pos = prev_pos + ret;
> > goto out_release;
> > }
> > +   cond_resched();
> > }
> >
> >  done:
> >
> 



Re: [PATCH] dax: Fix missed PMD wakeups

2019-07-05 Thread Matthew Wilcox
On Thu, Jul 04, 2019 at 04:27:14PM -0700, Dan Williams wrote:
> On Thu, Jul 4, 2019 at 12:14 PM Matthew Wilcox  wrote:
> >
> > On Thu, Jul 04, 2019 at 06:54:50PM +0200, Jan Kara wrote:
> > > On Wed 03-07-19 20:27:28, Matthew Wilcox wrote:
> > > > So I think we're good for all current users.
> > >
> > > Agreed but it is an ugly trap. As I already said, I'd rather pay the
> > > unnecessary cost of waiting for pte entry and have an easy to understand
> > > interface. If we ever have a real world use case that would care for this
> > > optimization, we will need to refactor functions to make this possible and
> > > still keep the interfaces sane. For example get_unlocked_entry() could
> > > return special "error code" indicating that there's no entry with matching
> > > order in xarray but there's a conflict with it. That would be much less
> > > error-prone interface.
> >
> > This is an internal interface.  I think it's already a pretty gnarly
> > interface to use by definition -- it's going to sleep and might return
> > almost anything.  There's not much scope for returning an error indicator
> > either; value entries occupy half of the range (all odd numbers between 1
> > and ULONG_MAX inclusive), plus NULL.  We could use an internal entry, but
> > I don't think that makes the interface any easier to use than returning
> > a locked entry.
> >
> > I think this iteration of the patch makes it a little clearer.  What do you
> > think?
> >
> 
> Not much clearer to me. get_unlocked_entry() is now misnamed and this

misnamed?  You'd rather it was called "try_get_unlocked_entry()"?

> arrangement allows for mismatches of @order argument vs @xas
> configuration.

> Can you describe, or even better demonstrate with
> numbers, why it's better to carry this complication than just
> converging the waitqueues between the types?

You've got the reproducer ;-)  It seems quite wrong to make a page fault
stall just because another task is working on a different page in the
same 2MB chunk.
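
As background to the remark quoted above that value entries occupy every odd number, here is a userspace model of that encoding. It mirrors the shift-and-tag scheme of the kernel's xa_mk_value(), xa_to_value() and xa_is_value() helpers, but the `model_*` functions are standalone re-implementations for illustration, not the kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Encode an integer as a tagged entry: shift left, set bit 0. */
static void *model_mk_value(unsigned long v)
{
	assert(v <= (UINTPTR_MAX >> 1));	/* top bit is lost to the tag */
	return (void *)(((uintptr_t)v << 1) | 1);
}

/* Decode a value entry back to the integer it carries. */
static unsigned long model_to_value(const void *entry)
{
	return (unsigned long)((uintptr_t)entry >> 1);
}

/* An entry is a value entry exactly when its low bit is set. */
static bool model_is_value(const void *entry)
{
	return (uintptr_t)entry & 1;
}
```

Every odd pointer-sized number therefore decodes as a value entry, and NULL (an even number) does not. That leaves little room for an in-band error code in a function that may return either.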

