[GIT PULL] SCSI fixes for 5.2-rc7
Two iscsi fixes.  One for an oops in the client which can be triggered
by the server authentication protocol and the other in the target code
which causes data corruption.

The patch is available here:

git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git scsi-fixes

The short changelog is:

Maurizio Lombardi (1):
      scsi: iscsi: set auth_protocol back to NULL if CHAP_A value is not supported

Roman Bolshakov (1):
      scsi: target/iblock: Fix overrun in WRITE SAME emulation

And the diffstat:

 drivers/target/iscsi/iscsi_target_auth.c | 16
 drivers/target/target_core_iblock.c      |  2 +-
 2 files changed, 9 insertions(+), 9 deletions(-)

With full diff below.

James

---

diff --git a/drivers/target/iscsi/iscsi_target_auth.c b/drivers/target/iscsi/iscsi_target_auth.c
index 4e680d753941..e2fa3a3bc81d 100644
--- a/drivers/target/iscsi/iscsi_target_auth.c
+++ b/drivers/target/iscsi/iscsi_target_auth.c
@@ -89,6 +89,12 @@ static int chap_check_algorithm(const char *a_str)
 	return CHAP_DIGEST_UNKNOWN;
 }
 
+static void chap_close(struct iscsi_conn *conn)
+{
+	kfree(conn->auth_protocol);
+	conn->auth_protocol = NULL;
+}
+
 static struct iscsi_chap *chap_server_open(
 	struct iscsi_conn *conn,
 	struct iscsi_node_auth *auth,
@@ -126,7 +132,7 @@ static struct iscsi_chap *chap_server_open(
 	case CHAP_DIGEST_UNKNOWN:
 	default:
 		pr_err("Unsupported CHAP_A value\n");
-		kfree(conn->auth_protocol);
+		chap_close(conn);
 		return NULL;
 	}
 
@@ -141,19 +147,13 @@ static struct iscsi_chap *chap_server_open(
 	 * Generate Challenge.
 	 */
 	if (chap_gen_challenge(conn, 1, aic_str, aic_len) < 0) {
-		kfree(conn->auth_protocol);
+		chap_close(conn);
 		return NULL;
 	}
 
 	return chap;
 }
 
-static void chap_close(struct iscsi_conn *conn)
-{
-	kfree(conn->auth_protocol);
-	conn->auth_protocol = NULL;
-}
-
 static int chap_server_compute_md5(
 	struct iscsi_conn *conn,
 	struct iscsi_node_auth *auth,
diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c
index b5ed9c377060..efebacd36101 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -515,7 +515,7 @@ iblock_execute_write_same(struct se_cmd *cmd)
 
 		/* Always in 512 byte units for Linux/Block */
 		block_lba += sg->length >> SECTOR_SHIFT;
-		sectors -= 1;
+		sectors -= sg->length >> SECTOR_SHIFT;
 	}
 
 	iblock_submit_bios(&list);
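As a rough illustration of why the WRITE SAME change above matters, here is a userspace toy model, not the kernel code. The function name and the `fixed` switch are made up for this sketch, and it assumes the request is a whole number of blocks. It counts how many 512-byte sectors a loop of that shape would submit when every iteration adds `sg_length` bytes.

```c
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte units, as in the Linux block layer */

/*
 * Toy model only (hypothetical helper, not from the kernel tree).
 *
 * sectors:   requested length in 512-byte sectors
 * sg_length: bytes added per loop iteration (one logical block)
 * fixed:     non-zero selects the corrected decrement
 *
 * Returns the number of 512-byte sectors the loop would submit.
 */
static uint64_t write_same_sectors_written(uint64_t sectors,
					   uint32_t sg_length, int fixed)
{
	uint64_t per_iter = sg_length >> SECTOR_SHIFT;
	uint64_t written = 0;

	/* The sketch only handles whole blocks of at least 512 bytes. */
	if (per_iter == 0 || sectors % per_iter)
		return 0;

	while (sectors) {
		written += per_iter;
		if (fixed)
			sectors -= per_iter;	/* account for what was added */
		else
			sectors -= 1;		/* old accounting: one sector per pass */
	}

	return written;
}
```

With a 4096-byte block, a request for 16 sectors submits 16 sectors under the corrected decrement and 128 under the old one in this model. With 512-byte blocks both variants agree, which may be why such an overrun would not show up on every configuration.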
Re: rtc: zynqmp: One function call less in xlnx_rtc_alarm_irq_enable()
> Unless you use an upstream coccinelle script or you share the one you
> are using, this is not a useful information.

What do you think about extending a software development discussion
on a topic like “Pretty-printing of code for ternary operators?”?

https://systeme.lip6.fr/pipermail/cocci/2019-July/006079.html
https://lore.kernel.org/cocci/3d2a9d9a-790c-a0f0-f980-b560504ba...@web.de/

Regards,
Markus
[PATCH] irq/irqdomain: Fix typo in the comment on top of __irq_domain_add()
Fix typo in the comment on top of __irq_domain_add().

Signed-off-by: Zenghui Yu
---
 kernel/irq/irqdomain.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index a453e22..db7b713 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -123,7 +123,7 @@ void irq_domain_free_fwnode(struct fwnode_handle *fwnode)
  * @ops: domain callbacks
  * @host_data: Controller private data pointer
  *
- * Allocates and initialize and irq_domain structure.
+ * Allocates and initializes an irq_domain structure.
  * Returns pointer to IRQ domain, or NULL on failure.
  */
 struct irq_domain *__irq_domain_add(struct fwnode_handle *fwnode, int size,
-- 
1.8.3.1
Re: INFO: rcu detected stall in ext4_write_checks
On Fri, Jul 05, 2019 at 12:10:55PM -0700, Paul E. McKenney wrote:
>
> Exactly, so although my patch might help for CONFIG_PREEMPT=n, it won't
> help in your scenario.  But looking at the dmesg from your URL above,
> I see the following:

I just tested with CONFIG_PREEMPT=n

% grep CONFIG_PREEMPT /build/ext4-64/.config
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTIRQ_TRACEPOINTS=y
# CONFIG_PREEMPTIRQ_EVENTS is not set

And with your patch, it's still not helping.

I think that's because SCHED_DEADLINE is a real-time style scheduler:

       In order to fulfill the guarantees that are made when a thread is
       admitted to the SCHED_DEADLINE policy, SCHED_DEADLINE threads are
       the highest priority (user controllable) threads in the system; if
       any SCHED_DEADLINE thread is runnable, it will preempt any thread
       scheduled under one of the other policies.

So a SCHED_DEADLINE process is not going to yield control of the CPU,
even if it calls cond_resched(), until the thread has run for more than
the sched_runtime parameter --- which for the syzkaller repro, was set
at 26 days.

There are some safety checks when using SCHED_DEADLINE:

       The kernel requires that:

           sched_runtime <= sched_deadline <= sched_period

       In addition, under the current implementation, all of the parameter
       values must be at least 1024 (i.e., just over one microsecond, which
       is the resolution of the implementation), and less than 2^63.  If
       any of these checks fails, sched_setattr(2) fails with the error
       EINVAL.

       The CBS guarantees non-interference between tasks, by throttling
       threads that attempt to over-run their specified Runtime.

       To ensure deadline scheduling guarantees, the kernel must prevent
       situations where the set of SCHED_DEADLINE threads is not feasible
       (schedulable) within the given constraints.  The kernel thus
       performs an admittance test when setting or changing SCHED_DEADLINE
       policy and attributes.  This admission test calculates whether the
       change is feasible; if it is not, sched_setattr(2) fails with the
       error EBUSY.

The problem is that SCHED_DEADLINE is designed for sporadic tasks:

       A sporadic task is one that has a sequence of jobs, where each job
       is activated at most once per period.  Each job also has a relative
       deadline, before which it should finish execution, and a
       computation time, which is the CPU time necessary for executing the
       job.  The moment when a task wakes up because a new job has to be
       executed is called the arrival time (also referred to as the
       request time or release time).  The start time is the time at which
       a task starts its execution.  The absolute deadline is thus
       obtained by adding the relative deadline to the arrival time.

It appears that the kernel's admission control before allowing
SCHED_DEADLINE to be set on a thread was designed for sane
applications, and not abusive ones.  Given that the process started
doing abusive things *after* the SCHED_DEADLINE policy was set, for the
kernel to figure out that SCHED_DEADLINE should in fact be denied for
an arbitrary thread would require either (a) solving the halting
problem, or (b) being able to anticipate the future (in which case, we
should be using that kernel algorithm to play the stock market :-)

						- Ted
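The static checks quoted above can be sketched in userspace C as follows. This is only an illustration of the quoted manual text, not the kernel's implementation: the function and macro names are hypothetical, and the admission test that returns EBUSY is not modelled.

```c
#include <errno.h>
#include <stdint.h>

/* Bounds taken from the quoted text; the names are made up for this sketch. */
#define DL_PARAM_MIN	1024ULL		/* "at least 1024" (nanoseconds) */
#define DL_PARAM_LIMIT	(1ULL << 63)	/* "less than 2^63" */

/* Range check applied to each of the three parameters. */
static int dl_param_in_range(uint64_t v)
{
	return v >= DL_PARAM_MIN && v < DL_PARAM_LIMIT;
}

/*
 * Returns 0 if the parameters pass the range and ordering checks
 * described above, -EINVAL otherwise.
 */
static int dl_params_check(uint64_t runtime, uint64_t deadline,
			   uint64_t period)
{
	if (!dl_param_in_range(runtime) ||
	    !dl_param_in_range(deadline) ||
	    !dl_param_in_range(period))
		return -EINVAL;

	/* sched_runtime <= sched_deadline <= sched_period */
	if (runtime > deadline || deadline > period)
		return -EINVAL;

	return 0;
}
```

A runtime of 26 days is about 2.2464e15 ns, far below 2^63, so in this sketch `dl_params_check()` accepts it whenever the deadline and period are at least as large. That is in line with the point made above: these checks bound the values, they do not judge whether the values are sane.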
[PATCH] net: pasemi: fix a use-after-free in pasemi_mac_phy_init()
The phy_dn variable is still being used in of_phy_connect() after the
of_node_put() call, which may result in use-after-free.

Fixes: 1dd2d06c0459 ("net: Rework pasemi_mac driver to use of_mdio infrastructure")
Signed-off-by: Wen Yang
Cc: "David S. Miller"
Cc: Thomas Gleixner
Cc: Luis Chamberlain
Cc: Michael Ellerman
Cc: net...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/net/ethernet/pasemi/pasemi_mac.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/pasemi/pasemi_mac.c b/drivers/net/ethernet/pasemi/pasemi_mac.c
index bf5a7bc..be66601 100644
--- a/drivers/net/ethernet/pasemi/pasemi_mac.c
+++ b/drivers/net/ethernet/pasemi/pasemi_mac.c
@@ -1042,7 +1042,6 @@ static int pasemi_mac_phy_init(struct net_device *dev)
 
 	dn = pci_device_to_OF_node(mac->pdev);
 	phy_dn = of_parse_phandle(dn, "phy-handle", 0);
-	of_node_put(phy_dn);
 
 	mac->link = 0;
 	mac->speed = 0;
@@ -1051,6 +1050,7 @@ static int pasemi_mac_phy_init(struct net_device *dev)
 	phydev = of_phy_connect(dev, phy_dn, &pasemi_adjust_link, 0,
 				PHY_INTERFACE_MODE_SGMII);
 
+	of_node_put(phy_dn);
 	if (!phydev) {
 		printk(KERN_ERR "%s: Could not attach to phy\n", dev->name);
 		return -ENODEV;
-- 
2.9.5
next-20190705 - problems generating certs/x509_certificate_list
This worked fine in next-20190618, but in next-20190701 I'm seeing dmesg
entries at boot:

dmesg | grep -i x.509
[8.345699] Loading compiled-in X.509 certificates
[8.366137] Problem loading in-kernel X.509 certificate (-13)
[8.507348] cfg80211: Loading compiled-in X.509 certificates for regulatory database
[8.526556] cfg80211: Problem loading in-kernel X.509 certificate (-13)

I start debugging, and discover that certs/x509_certificate_list is a
zero-length file.  I rm it, and 'make V=1 certs/system_certificates.o',
which tells me:

()
make -f ./scripts/Makefile.headersinst obj=include/uapi
make -f ./scripts/Makefile.headersinst obj=arch/x86/include/uapi
make -f ./scripts/Makefile.build obj=certs certs/system_certificates.o
smoking gun alert
  scripts/extract-cert "" certs/x509_certificate_list
  gcc -Wp,-MD,certs/.system_certificates.o.d -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/9/include -I./arch/x86/include -I./arch/x86/include/generated -I./include -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h -D__KERNEL__ -D__ASSEMBLY__ -fno-PIE -m64 -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_SSSE3=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1 -DCONFIG_AS_AVX512=1 -DCONFIG_AS_SHA1_NI=1 -DCONFIG_AS_SHA256_NI=1 -Wa,-gdwarf-2 -DCC_USING_FENTRY -I. -c -o certs/system_certificates.o certs/system_certificates.S

I go look at extract-cert.c, and sure enough, if the first parameter is a
null string it just goes and creates an empty file.

The Makefile says:

quiet_cmd_extract_certs = EXTRACT_CERTS $(patsubst "%",%,$(2))
      cmd_extract_certs = scripts/extract-cert $(2) $@

and damned if I know why $(2) is "".  Diffed the config files from -0618
and -0705, not seeing any relevant difference.

Any ideas?
[PATCH] net: axienet: fix a potential double free in axienet_probe()
There is a possible double release of the device node in axienet_probe():

1701:	np = of_parse_phandle(pdev->dev.of_node, "axistream-connected", 0);
1702:	if (np) {
...
1787:		of_node_put(np); ---> released here
1788:		lp->eth_irq = platform_get_irq(pdev, 0);
1789:	} else {
...
1801:	}
1802:	if (IS_ERR(lp->dma_regs)) {
...
1805:		of_node_put(np); ---> double released here
1806:		goto free_netdev;
1807:	}

We solve this problem by removing the unnecessary of_node_put().

Fixes: 28ef9ebdb64c ("net: axienet: make use of axistream-connected attribute optional")
Signed-off-by: Wen Yang
Cc: Anirudha Sarangi
Cc: John Linn
Cc: "David S. Miller"
Cc: Michal Simek
Cc: Robert Hancock
Cc: net...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/net/ethernet/xilinx/xilinx_axienet_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
index 561e28a..4fc627f 100644
--- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
+++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
@@ -1802,7 +1802,6 @@ static int axienet_probe(struct platform_device *pdev)
 	if (IS_ERR(lp->dma_regs)) {
 		dev_err(&pdev->dev, "could not map DMA regs\n");
 		ret = PTR_ERR(lp->dma_regs);
-		of_node_put(np);
 		goto free_netdev;
 	}
 	if ((lp->rx_irq <= 0) || (lp->tx_irq <= 0)) {
-- 
2.9.5
[PATCH] can: flexcan: fix a use-after-free in flexcan_setup_stop_mode()
The gpr_np variable is still being used in dev_dbg() after the
of_node_put() call, which may result in use-after-free.

Fixes: de3578c198c6 ("can: flexcan: add self wakeup support")
Signed-off-by: Wen Yang
Cc: Wolfgang Grandegger
Cc: Marc Kleine-Budde
Cc: "David S. Miller"
Cc: linux-...@vger.kernel.org
Cc: net...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 drivers/net/can/flexcan.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
index f2fe344..33ce45d 100644
--- a/drivers/net/can/flexcan.c
+++ b/drivers/net/can/flexcan.c
@@ -1437,10 +1437,10 @@ static int flexcan_setup_stop_mode(struct platform_device *pdev)
 
 	priv = netdev_priv(dev);
 	priv->stm.gpr = syscon_node_to_regmap(gpr_np);
-	of_node_put(gpr_np);
 	if (IS_ERR(priv->stm.gpr)) {
 		dev_dbg(&pdev->dev, "could not find gpr regmap\n");
-		return PTR_ERR(priv->stm.gpr);
+		ret = PTR_ERR(priv->stm.gpr);
+		goto out_put_node;
 	}
 
 	priv->stm.req_gpr = out_val[1];
@@ -1455,7 +1455,9 @@ static int flexcan_setup_stop_mode(struct platform_device *pdev)
 
 	device_set_wakeup_capable(&pdev->dev, true);
 
-	return 0;
+out_put_node:
+	of_node_put(gpr_np);
+	return ret;
 }
 
 static const struct of_device_id flexcan_of_match[] = {
-- 
2.9.5
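The single-exit cleanup shape used in the flexcan patch above can be sketched with a toy reference count. This is a userspace illustration only: `struct toy_node`, `toy_node_put()` and `toy_setup()` are made-up names, not the kernel's of_node API. The point is that the reference is dropped exactly once, after the last use, on both the error and the success path.

```c
#include <stddef.h>

/* Hypothetical stand-in for a refcounted device node. */
struct toy_node {
	int refcount;
};

/* Drop one reference; a NULL node is tolerated. */
static void toy_node_put(struct toy_node *n)
{
	if (n)
		n->refcount--;
}

/*
 * Returns 0 on success and a negative value on failure. Both paths
 * leave through the same label, so the put happens once and only
 * after every use of np.
 */
static int toy_setup(struct toy_node *np, int lookup_fails)
{
	int ret = 0;

	if (lookup_fails) {
		ret = -1;
		goto out_put_node;
	}

	/* ... further uses of np would go here, still holding the reference ... */

out_put_node:
	toy_node_put(np);
	return ret;
}
```

Starting from a count of 1, either path through `toy_setup()` leaves the count at 0. An early put followed by a later use, or a put on both the early and the error path, is the kind of imbalance the patches in this thread remove.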
[PATCH 1/3] kbuild: remove obj and src from the top Makefile
$(obj) is not used in the top Makefile at all.

$(src) is used in 3 sites, but they can be replaced with $(srctree).

Signed-off-by: Masahiro Yamada
---

 Makefile | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/Makefile b/Makefile
index 014390e32b0e..a5615edf2196 100644
--- a/Makefile
+++ b/Makefile
@@ -248,9 +248,6 @@ endif
 export KBUILD_CHECKSRC KBUILD_EXTMOD KBUILD_SRC
 
 objtree		:= .
-src		:= $(srctree)
-obj		:= $(objtree)
-
 VPATH		:= $(srctree)
 
 export srctree objtree VPATH
@@ -1705,7 +1702,7 @@ CHECKSTACK_ARCH := $(ARCH)
 endif
 checkstack:
 	$(OBJDUMP) -d vmlinux $$(find . -name '*.ko') | \
-	$(PERL) $(src)/scripts/checkstack.pl $(CHECKSTACK_ARCH)
+	$(PERL) $(srctree)/scripts/checkstack.pl $(CHECKSTACK_ARCH)
 
 kernelrelease:
 	@echo "$(KERNELVERSION)$$($(CONFIG_SHELL) $(srctree)/scripts/setlocalversion $(srctree))"
@@ -1724,11 +1721,11 @@ endif
 
 tools/: FORCE
 	$(Q)mkdir -p $(objtree)/tools
-	$(Q)$(MAKE) LDFLAGS= MAKEFLAGS="$(tools_silent) $(filter --j% -j,$(MAKEFLAGS))" O=$(abspath $(objtree)) subdir=tools -C $(src)/tools/
+	$(Q)$(MAKE) LDFLAGS= MAKEFLAGS="$(tools_silent) $(filter --j% -j,$(MAKEFLAGS))" O=$(abspath $(objtree)) subdir=tools -C $(srctree)/tools/
 
 tools/%: FORCE
 	$(Q)mkdir -p $(objtree)/tools
-	$(Q)$(MAKE) LDFLAGS= MAKEFLAGS="$(tools_silent) $(filter --j% -j,$(MAKEFLAGS))" O=$(abspath $(objtree)) subdir=tools -C $(src)/tools/ $*
+	$(Q)$(MAKE) LDFLAGS= MAKEFLAGS="$(tools_silent) $(filter --j% -j,$(MAKEFLAGS))" O=$(abspath $(objtree)) subdir=tools -C $(srctree)/tools/ $*
 
 # Single targets
 # ---
-- 
2.17.1
[PATCH 3/3] kbuild: add a flag to force absolute path for srctree
In the old days, Kbuild always used an absolute path for $(srctree).
Since commit 890676c65d69 ("kbuild: Use relative path when building in
the source tree"), $(srctree) is '.' when not using O=.

Yet, using absolute paths is useful in some cases even without O=, for
instance, to create a cscope file with absolute path tags.

O=. was used as an idiom to force Kbuild to use absolute paths even
when you are building in the source tree.

Since commit 25b146c5b8ce ("kbuild: allow Kbuild to start from any
directory"), Kbuild is too clever to be tricked. Even if you pass O=.,
Kbuild notices you are building in the source tree and uses '.' for
$(srctree).

So, "make O=. cscope" is no help to create absolute path tags.

We cannot force one or the other according to commit e93bc1a0cab3
("Revert "kbuild: specify absolute paths for cscope""). Both relative
and absolute paths have pros and cons.

This commit adds a new flag KBUILD_ABS_SRCTREE to allow users to
choose the absolute path for $(srctree).

"make KBUILD_ABS_SRCTREE=1 cscope" will work as a replacement of
"make O=. cscope".

I added Fixes since that commit broke some users' workflow.

Fixes: 25b146c5b8ce ("kbuild: allow Kbuild to start from any directory")
Reported-by: Pawan Gupta
Signed-off-by: Masahiro Yamada
---

 Documentation/kbuild/kbuild.txt | 9 +
 Makefile                        | 4
 scripts/tags.sh                 | 3 +--
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/Documentation/kbuild/kbuild.txt b/Documentation/kbuild/kbuild.txt
index 7a7e2aa2fab5..3ef42f87f275 100644
--- a/Documentation/kbuild/kbuild.txt
+++ b/Documentation/kbuild/kbuild.txt
@@ -182,6 +182,15 @@ The output directory is often set using "O=..." on the commandline.
 
 The value can be overridden in which case the default value is ignored.
 
+KBUILD_ABS_SRCTREE
+--
+Kbuild uses a relative path to point to the tree when possible. For instance,
+when building in the source tree, the source tree path is '.'
+
+Setting this flag requests Kbuild to use absolute path to the source tree.
+There are some useful cases to do so, like when generating tag files with
+absolute path entries etc.
+
 KBUILD_SIGN_PIN
 --
 This variable allows a passphrase or PIN to be passed to the sign-file
diff --git a/Makefile b/Makefile
index 534a5dc796b1..6dc453f86f00 100644
--- a/Makefile
+++ b/Makefile
@@ -244,6 +244,10 @@ else
 	building_out_of_srctree := 1
 endif
 
+ifneq ($(KBUILD_ABS_SRCTREE),)
+srctree := $(abs_srctree)
+endif
+
 objtree		:= .
 VPATH		:= $(srctree)
 
diff --git a/scripts/tags.sh b/scripts/tags.sh
index 7fea4044749b..4e18ae5282a6 100755
--- a/scripts/tags.sh
+++ b/scripts/tags.sh
@@ -17,8 +17,7 @@ ignore="$(echo "$RCS_FIND_IGNORE" | sed 's|\\||g' )"
 # tags and cscope files should also ignore MODVERSION *.mod.c files
 ignore="$ignore ( -name *.mod.c ) -prune -o"
 
-# Do not use full path if we do not use O=.. builds
-# Use make O=. {tags|cscope}
+# Use make KBUILD_ABS_SRCTREE=1 {tags|cscope}
 # to force full paths for a non-O= build
 if [ "${srctree}" = "." -o -z "${srctree}" ]; then
 	tree=
-- 
2.17.1
[PATCH 2/3] kbuild: replace KBUILD_SRC with boolean building_out_of_srctree
Commit 25b146c5b8ce ("kbuild: allow Kbuild to start from any
directory") deprecated KBUILD_SRC. It is only used in
tools/testing/selftests/ to distinguish out-of-tree build. Replace it
with a new boolean flag, building_out_of_srctree.

I also replaced the conditional ($(srctree),.) because the next commit
will allow an absolute path for $(srctree) even when building in the
source tree.

Signed-off-by: Masahiro Yamada
---

 Makefile                         | 19 ---
 scripts/Makefile.build           |  2 +-
 scripts/Makefile.host            |  2 +-
 scripts/Makefile.lib             |  2 +-
 scripts/Makefile.modbuiltin      |  2 +-
 scripts/gdb/linux/Makefile       |  2 +-
 tools/testing/selftests/Makefile |  2 +-
 tools/testing/selftests/lib.mk   |  4 ++--
 8 files changed, 16 insertions(+), 19 deletions(-)

diff --git a/Makefile b/Makefile
index a5615edf2196..534a5dc796b1 100644
--- a/Makefile
+++ b/Makefile
@@ -228,9 +228,12 @@ ifeq ("$(origin M)", "command line")
   KBUILD_EXTMOD := $(M)
 endif
 
+export KBUILD_CHECKSRC KBUILD_EXTMOD
+
 ifeq ($(abs_srctree),$(abs_objtree))
         # building in the source tree
         srctree := .
+	building_out_of_srctree :=
 else
         ifeq ($(abs_srctree)/,$(dir $(abs_objtree)))
                 # building in a subdirectory of the source tree
@@ -238,19 +241,13 @@ else
         else
                 srctree := $(abs_srctree)
         endif
-
-	# TODO:
-	# KBUILD_SRC is only used to distinguish in-tree/out-of-tree build.
-	# Replace it with $(srctree) or something.
-	KBUILD_SRC := $(abs_srctree)
+	building_out_of_srctree := 1
 endif
 
-export KBUILD_CHECKSRC KBUILD_EXTMOD KBUILD_SRC
-
 objtree		:= .
 VPATH		:= $(srctree)
 
-export srctree objtree VPATH
+export building_out_of_srctree srctree objtree VPATH
 
 # To make sure we do not include .config for any of the *config targets
 # catch them early, and hand them over to scripts/kconfig/Makefile
@@ -453,7 +450,7 @@ USERINCLUDE:= \
 LINUXINCLUDE:= \
 		-I$(srctree)/arch/$(SRCARCH)/include \
 		-I$(objtree)/arch/$(SRCARCH)/include/generated \
-		$(if $(filter .,$(srctree)),,-I$(srctree)/include) \
+		$(if $(building_out_of_srctree),-I$(srctree)/include) \
 		-I$(objtree)/include \
 		$(USERINCLUDE)
 
@@ -509,7 +506,7 @@ PHONY += outputmakefile
 # At the same time when output Makefile generated, generate .gitignore to
 # ignore whole output directory
 outputmakefile:
-ifneq ($(srctree),.)
+ifdef building_out_of_srctree
 	$(Q)ln -fsn $(srctree) source
 	$(Q)$(CONFIG_SHELL) $(srctree)/scripts/mkmakefile $(srctree)
 	$(Q)test -e .gitignore || \
@@ -1093,7 +1090,7 @@ PHONY += prepare archprepare prepare1 prepare3
 # and if so do:
 # 1) Check that make has not been executed in the kernel src $(srctree)
 prepare3: include/config/kernel.release
-ifneq ($(srctree),.)
+ifdef building_out_of_srctree
 	@$(kecho) ' Using $(srctree) as source for kernel'
 	$(Q)if [ -f $(srctree)/.config -o \
 		 -d $(srctree)/include/config -o \
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 341fca59d28f..1086caaac786 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -509,7 +509,7 @@ existing-targets := $(wildcard $(sort $(targets)))
 
 -include $(foreach f,$(existing-targets),$(dir $(f)).$(notdir $(f)).cmd)
 
-ifneq ($(srctree),.)
+ifdef building_out_of_srctree
 # Create directories for object files if they do not exist
 obj-dirs := $(sort $(obj) $(patsubst %/,%, $(dir $(targets))))
 # If targets exist, their directories apparently exist. Skip mkdir.
diff --git a/scripts/Makefile.host b/scripts/Makefile.host
index b6a54bdf0965..fcf0213e6ac8 100644
--- a/scripts/Makefile.host
+++ b/scripts/Makefile.host
@@ -69,7 +69,7 @@ _hostcxx_flags = $(KBUILD_HOSTCXXFLAGS) $(HOST_EXTRACXXFLAGS) \
 
 # $(objtree)/$(obj) for including generated headers from checkin source files
 ifeq ($(KBUILD_EXTMOD),)
-ifneq ($(srctree),.)
+ifdef building_out_of_srctree
 _hostc_flags += -I $(objtree)/$(obj)
 _hostcxx_flags += -I $(objtree)/$(obj)
 endif
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 4d006923763c..f835a40ebae5 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -148,7 +148,7 @@ endif
 # $(srctree)/$(src) for including checkin headers from generated source files
 # $(objtree)/$(obj) for including generated headers from checkin source files
 ifeq ($(KBUILD_EXTMOD),)
-ifneq ($(srctree),.)
+ifdef building_out_of_srctree
 _c_flags += -I $(srctree)/$(src) -I $(objtree)/$(obj)
 _a_flags += -I $(srctree)/$(src) -I $(objtree)/$(obj)
 _cpp_flags += -I $(srctree)/$(src) -I $(objtree)/$(obj)
diff --git a/scripts/Makefile.modbuiltin b/scripts/Makefile.modbuiltin
index 12ac300fe51b..7d4711b88656 100644
--- a/scripts/Makefile.modbuiltin
+++ b/scripts/Makefile.modbuiltin
@@ -15,7 +15,7 @@ include
[git pull] fix bogus default y in Kconfig (VALIDATE_FS_PARSER)
That thing should not be turned on by default, especially since
it's not quiet in case it finds no problems.  Geert has sent the obvious
fix quite a few times, but it fell through the cracks.

The following changes since commit 570d7a98e7d6d5d8706d94ffd2d40adeaa318332:

  vfs: move_mount: reject moving kernel internal mounts (2019-07-01 10:46:36 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git fixes

for you to fetch changes up to 75f2d86b20bf6aec0392d6dd2ae326d2ae0e:

  fs: VALIDATE_FS_PARSER should default to n (2019-07-05 11:22:11 -0400)

Geert Uytterhoeven (1):
      fs: VALIDATE_FS_PARSER should default to n

 fs/Kconfig | 1 -
 1 file changed, 1 deletion(-)
Re: kernel BUG at mm/swap_state.c:170!
On Fri, Jul 5, 2019 at 4:03 PM Jan Kara wrote:
>
> Yeah, I guess revert of 5fd4ca2d84b2 at this point is probably the best we
> can do. Let's CC Linus, Andrew, and Greg (Linus is travelling AFAIK so I'm
> not sure whether Greg won't do release for him).

I'm back home now, although possibly jetlagged.

The revert looks trivial (a conflict due to find_get_entries_tag()
having been removed in the meantime), and I guess that's the right
thing to do right now.

Matthew, comments?

             Linus
Re: [GIT PULL] nfsd bugfixes for 5.2
The pull request you sent on Fri, 5 Jul 2019 13:40:37 -0400:

> git://linux-nfs.org/~bfields/linux.git tags/nfsd-5.2-2

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/a8f46b5afe1c0a83c3013a339e6aeccc2f37342d

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker
Re: [GIT PULL] Final KVM changes for 5.2
The pull request you sent on Fri, 5 Jul 2019 22:29:30 +0200:

> https://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/9fdb86c8cf9ae201d97334ecc2d1918800cac424

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker
Re: [PULL REQUEST] i2c for 5.2
The pull request you sent on Fri, 5 Jul 2019 21:21:29 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git i2c/for-current

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/881ed91f7db58fcbe8fdca056907991c3c9d8f2d

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker
Re: [PATCH 2/2] usb: pci-quirks: Minor cleanup for AMD PLL quirk
On Fri, Jul 5, 2019 at 3:10 PM Alan Stern wrote:
>
> On Thu, 4 Jul 2019, Ryan Kennedy wrote:
>
> > usb_amd_find_chipset_info() is used for chipset detection for
> > several quirks. It is strange that its return value indicates
> > the need for the PLL quirk, which means it is often ignored.
> > This patch adds a function specifically for checking the PLL
> > quirk like the other ones. Additionally, rename probe_result to
> > something more appropriate.
> >
> > Signed-off-by: Ryan Kennedy
>
> > @@ -322,6 +317,13 @@ bool usb_amd_prefetch_quirk(void)
> >  }
> >  EXPORT_SYMBOL_GPL(usb_amd_prefetch_quirk);
> >
> > +bool usb_amd_quirk_pll_check(void)
> > +{
> > +	usb_amd_find_chipset_info();
> > +	return amd_chipset.need_pll_quirk;
> > +}
> > +EXPORT_SYMBOL_GPL(usb_amd_quirk_pll_check);
>
> I really don't see the point of separating out all but one line into a
> different function.  You might as well just rename
> usb_amd_find_chipset_info to usb_amd_quirk_pll_check (along with the
> other code adjustments) and be done with it.

I did this for consistency with the others:

usb_amd_prefetch_quirk()
usb_amd_hang_symptom_quirk()
usb_hcd_amd_remote_wakeup_quirk()

They all need to ensure the chipset information exists, then decide if
the particular quirk should be applied to the chipset.

Ryan

>
> However, in the end I don't care if you still want to do this.  Either
> way:
>
> Acked-by: Alan Stern
>
> Alan Stern
>
[PATCH v7 2/2] KVM: LAPIC: Inject timer interrupt via posted interrupt
From: Wanpeng Li

Dedicated instances are currently disturbed by unnecessary jitter because
the emulated lapic timers fire on the same pCPUs on which the vCPUs reside.
Unlike ARM, Intel has no hardware virtual timer for guests. Both
programming the timer in the guest and the firing of the emulated timer
incur vmexits. This patch tries to avoid the vmexit incurred when the
emulated timer fires in the dedicated instance scenario.

When nohz_full is enabled in the dedicated instances scenario, the emulated
timers can be offloaded to the nearest busy housekeeping CPUs, since APICv
has become really common in recent years. The guest timer interrupt is
injected as a posted interrupt, which is delivered by the housekeeping CPU
once the emulated timer fires.

The host admin should tune the setup accordingly: in the dedicated
instances scenario, nohz_full should cover the pCPUs on which the vCPUs
reside, several pCPUs should be left over for busy housekeeping, and
mwait/hlt/pause vmexits should be disabled so that the vCPUs stay in
non-root mode. With this, a ~3% redis performance benefit can be observed
on a Skylake server.

w/o patch:

            VM-EXIT  Samples  Samples%   Time%  Min Time  Max Time  Avg time

 EXTERNAL_INTERRUPT    42916    49.43%  39.30%    0.47us  106.09us    0.71us ( +- 1.09% )

w/ patch:

            VM-EXIT  Samples  Samples%   Time%  Min Time  Max Time  Avg time

 EXTERNAL_INTERRUPT     6871     9.29%   2.96%    0.44us   57.88us    0.72us ( +- 4.02% )

Cc: Paolo Bonzini
Cc: Radim Krčmář
Cc: Marcelo Tosatti
Signed-off-by: Wanpeng Li
---
 arch/x86/kvm/lapic.c            | 101 ++--
 arch/x86/kvm/lapic.h            |   1 +
 arch/x86/kvm/vmx/vmx.c          |   3 +-
 arch/x86/kvm/x86.c              |   6 +++
 arch/x86/kvm/x86.h              |   2 +
 include/linux/sched/isolation.h |   2 +
 kernel/sched/isolation.c        |   6 +++
 7 files changed, 85 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 707ca9c..4869691 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -128,6 +128,17 @@ static inline u32 kvm_x2apic_id(struct kvm_lapic *apic)
 	return apic->vcpu->vcpu_id;
 }
 
+bool kvm_can_post_timer_interrupt(struct kvm_vcpu *vcpu)
+{
+	return pi_inject_timer && kvm_vcpu_apicv_active(vcpu);
+}
+EXPORT_SYMBOL_GPL(kvm_can_post_timer_interrupt);
+
+static bool kvm_use_posted_timer_interrupt(struct kvm_vcpu *vcpu)
+{
+	return kvm_can_post_timer_interrupt(vcpu) && vcpu->mode == IN_GUEST_MODE;
+}
+
 static inline bool kvm_apic_map_get_logical_dest(struct kvm_apic_map *map,
 		u32 dest_id, struct kvm_lapic ***cluster, u16 *mask) {
 	switch (map->mode) {
@@ -1436,29 +1447,6 @@ static void apic_update_lvtt(struct kvm_lapic *apic)
 	}
 }
 
-static void apic_timer_expired(struct kvm_lapic *apic)
-{
-	struct kvm_vcpu *vcpu = apic->vcpu;
-	struct swait_queue_head *q = &vcpu->wq;
-	struct kvm_timer *ktimer = &apic->lapic_timer;
-
-	if (atomic_read(&apic->lapic_timer.pending))
-		return;
-
-	atomic_inc(&apic->lapic_timer.pending);
-	kvm_set_pending_timer(vcpu);
-
-	/*
-	 * For x86, the atomic_inc() is serialized, thus
-	 * using swait_active() is safe.
-	 */
-	if (swait_active(q))
-		swake_up_one(q);
-
-	if (apic_lvtt_tscdeadline(apic) || ktimer->hv_timer_in_use)
-		ktimer->expired_tscdeadline = ktimer->tscdeadline;
-}
-
 /*
  * On APICv, this test will cause a busy wait
  * during a higher-priority task.
@@ -1532,7 +1520,7 @@ static inline void adjust_lapic_timer_advance(struct kvm_vcpu *vcpu,
 	apic->lapic_timer.timer_advance_ns = timer_advance_ns;
 }
 
-void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
+static void __kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
 	u64 guest_tsc, tsc_deadline;
@@ -1540,9 +1528,6 @@ void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
 	if (apic->lapic_timer.expired_tscdeadline == 0)
 		return;
 
-	if (!lapic_timer_int_injected(vcpu))
-		return;
-
 	tsc_deadline = apic->lapic_timer.expired_tscdeadline;
 	apic->lapic_timer.expired_tscdeadline = 0;
 	guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
@@ -1554,8 +1539,59 @@ void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
 	if (unlikely(!apic->lapic_timer.timer_advance_adjust_done))
 		adjust_lapic_timer_advance(vcpu, apic->lapic_timer.advance_expire_delta);
 }
+
+void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
+{
+	if (!lapic_timer_int_injected(vcpu))
+		return;
+
+	__kvm_wait_lapic_expire(vcpu);
+}
 EXPORT_SYMBOL_GPL(kvm_wait_lapic_expire);
 
+static void kvm_apic_inject_pending_timer_irqs(struct kvm_lapic *apic)
+{
+	struct kvm_timer *ktimer = &apic->lapic_timer;
+
+	kvm_apic_local_deliver(apic, APIC_LVTT);
+	if (apic_lvtt_tscdeadline(apic))
+		ktimer->tscdeadline = 0;
+	if (apic_lvtt_oneshot(apic)) {
+
[PATCH v7 0/2] KVM: LAPIC: Implement Exitless Timer
Dedicated instances are currently disturbed by unnecessary jitter because
the emulated lapic timers fire on the same pCPUs on which the vCPUs reside.
Unlike ARM, Intel has no hardware virtual timer for guests. Both
programming the timer in the guest and the firing of the emulated timer
incur vmexits. This patchset tries to avoid the vmexit incurred when the
emulated timer fires in the dedicated instance scenario.

When nohz_full is enabled in the dedicated instances scenario, the unpinned
timer will be moved to the nearest busy housekeepers after commit
9642d18eee2cd ("nohz: Affine unpinned timers to housekeepers") and commit
444969223c8 ("sched/nohz: Fix affine unpinned timers mess"). However,
KVM always pins the lapic timer to the pCPU on which the vCPU resides; the
reason is explained by commit 61abdbe0 ("kvm: x86: make lapic hrtimer
pinned"). Actually, these emulated timers can be offloaded to the
housekeeping CPUs, since APICv has become really common in recent years.
The guest timer interrupt is injected as a posted interrupt, which is
delivered by the housekeeping CPU once the emulated timer fires.

The host admin should tune the setup accordingly: in the dedicated
instances scenario, nohz_full should cover the pCPUs on which the vCPUs
reside, several pCPUs should be left over for busy housekeeping, and
mwait/hlt/pause vmexits should be disabled so that the vCPUs stay in
non-root mode. With this, a ~3% redis performance benefit can be observed
on a Skylake server.

w/o patchset:

            VM-EXIT  Samples  Samples%   Time%  Min Time  Max Time  Avg time

 EXTERNAL_INTERRUPT    42916    49.43%  39.30%    0.47us  106.09us    0.71us ( +- 1.09% )

w/ patchset:

            VM-EXIT  Samples  Samples%   Time%  Min Time  Max Time  Avg time

 EXTERNAL_INTERRUPT     6871     9.29%   2.96%    0.44us   57.88us    0.72us ( +- 4.02% )

Cc: Paolo Bonzini
Cc: Radim Krčmář
Cc: Marcelo Tosatti

v6 -> v7:
 * remove bool argument

v5 -> v6:
 * don't overwrite whatever the user specified
 * introduce kvm_can_post_timer_interrupt and kvm_use_posted_timer_interrupt
 * remove kvm_hlt_in_guest() condition
 * squash all of 2/3/4 together

v4 -> v5:
 * update patch description in patch 1/4
 * feed latest apic->lapic_timer.expired_tscdeadline to kvm_wait_lapic_expire()
 * squash advance timer handling to patch 2/4

v3 -> v4:
 * drop the HRTIMER_MODE_ABS_PINNED, add kick after set pending timer
 * don't post-inject an already-expired timer

v2 -> v3:
 * disarming the vmx preemption timer when posted_interrupt_inject_timer_enabled()
 * check kvm_hlt_in_guest instead

v1 -> v2:
 * check vcpu_halt_in_guest
 * move module parameter from kvm-intel to kvm
 * add housekeeping_enabled
 * rename apic_timer_expired_pi to kvm_apic_inject_pending_timer_irqs

Wanpeng Li (2):
  KVM: LAPIC: Make lapic timer unpinned
  KVM: LAPIC: Inject timer interrupt via posted interrupt

 arch/x86/kvm/lapic.c            | 109 ++--
 arch/x86/kvm/lapic.h            |   1 +
 arch/x86/kvm/vmx/vmx.c          |   3 +-
 arch/x86/kvm/x86.c              |  12 +++--
 arch/x86/kvm/x86.h              |   2 +
 include/linux/sched/isolation.h |   2 +
 kernel/sched/isolation.c        |   6 +++
 7 files changed, 90 insertions(+), 45 deletions(-)

-- 
1.8.3.1
[PATCH v7 1/2] KVM: LAPIC: Make lapic timer unpinned
From: Wanpeng Li

Commit 61abdbe0bcc2 ("kvm: x86: make lapic hrtimer pinned") pinned the
lapic timer to avoid waiting until the next kvm exit for the guest to
see KVM_REQ_PENDING_TIMER set. An alternative solution is to give the
vCPU a kick after setting the KVM_REQ_PENDING_TIMER bit. Make the lapic
timer unpinned; this will be used in follow-up patches.

Cc: Paolo Bonzini
Cc: Radim Krčmář
Cc: Marcelo Tosatti
Signed-off-by: Wanpeng Li
---
 arch/x86/kvm/lapic.c | 8
 arch/x86/kvm/x86.c   | 6 +-
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 459d1ee..707ca9c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1582,7 +1582,7 @@ static void start_sw_tscdeadline(struct kvm_lapic *apic)
 	    likely(ns > apic->lapic_timer.timer_advance_ns)) {
 		expire = ktime_add_ns(now, ns);
 		expire = ktime_sub_ns(expire, ktimer->timer_advance_ns);
-		hrtimer_start(&ktimer->timer, expire, HRTIMER_MODE_ABS_PINNED);
+		hrtimer_start(&ktimer->timer, expire, HRTIMER_MODE_ABS);
 	} else
 		apic_timer_expired(apic);
 
@@ -1684,7 +1684,7 @@ static void start_sw_period(struct kvm_lapic *apic)
 
 	hrtimer_start(&apic->lapic_timer.timer,
 		apic->lapic_timer.target_expiration,
-		HRTIMER_MODE_ABS_PINNED);
+		HRTIMER_MODE_ABS);
 }
 
 bool kvm_lapic_hv_timer_in_use(struct kvm_vcpu *vcpu)
@@ -2321,7 +2321,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns)
 	apic->vcpu = vcpu;
 
 	hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC,
-		     HRTIMER_MODE_ABS_PINNED);
+		     HRTIMER_MODE_ABS);
 	apic->lapic_timer.timer.function = apic_timer_fn;
 	if (timer_advance_ns == -1) {
 		apic->lapic_timer.timer_advance_ns = LAPIC_TIMER_ADVANCE_ADJUST_INIT;
@@ -2510,7 +2510,7 @@ void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu)
 
 	timer = &vcpu->arch.apic->lapic_timer.timer;
 	if (hrtimer_cancel(timer))
-		hrtimer_start_expires(timer, HRTIMER_MODE_ABS_PINNED);
+		hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
 }
 
 /*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3a7cd935..e199ac7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1437,12 +1437,8 @@ static void update_pvclock_gtod(struct timekeeper *tk)
 
 void kvm_set_pending_timer(struct kvm_vcpu *vcpu)
 {
-	/*
-	 * Note: KVM_REQ_PENDING_TIMER is implicitly checked in
-	 * vcpu_enter_guest.  This function is only called from
-	 * the physical CPU that is running vcpu.
-	 */
 	kvm_make_request(KVM_REQ_PENDING_TIMER, vcpu);
+	kvm_vcpu_kick(vcpu);
 }
 
 static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
-- 
1.8.3.1
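For readers following the thread, the "set the request bit, then kick" idea from the patch above can be modelled in userspace. This is only a rough sketch under stated assumptions: struct vcpu_model, make_request(), vcpu_kick() and set_pending_timer() are hypothetical stand-ins, not KVM's types or functions, and the "kick" is reduced to a counter instead of an IPI.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these are the kernel's types. */
#define REQ_PENDING_TIMER 0

struct vcpu_model {
	atomic_ulong requests;	/* pending request bits */
	atomic_bool in_guest;	/* true while the vCPU runs guest code */
	int kicks;		/* counts simulated kicks (stand-in for an IPI) */
};

static void vcpu_model_init(struct vcpu_model *v, bool in_guest)
{
	atomic_init(&v->requests, 0);
	atomic_init(&v->in_guest, in_guest);
	v->kicks = 0;
}

/* Publish the request bit so the vCPU sees it on its next entry check. */
static void make_request(struct vcpu_model *v, int req)
{
	atomic_fetch_or(&v->requests, 1UL << req);
}

/*
 * Simplified kick: only a vCPU that is currently in guest mode needs to
 * be forced out; otherwise it will notice the bit before re-entering.
 */
static void vcpu_kick(struct vcpu_model *v)
{
	if (atomic_load(&v->in_guest))
		v->kicks++;
}

/*
 * With an unpinned timer the callback may run on another CPU, so the
 * request alone is not enough: the target has to be kicked as well.
 */
static void set_pending_timer(struct vcpu_model *v)
{
	make_request(v, REQ_PENDING_TIMER);
	vcpu_kick(v);
}

static bool request_pending(struct vcpu_model *v, int req)
{
	return atomic_load(&v->requests) & (1UL << req);
}
```

In this model, a timer firing while the vCPU is in guest mode produces exactly one kick, and a timer firing while it is outside guest mode only sets the bit.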
Re: linux-next: build failure after merge of the kbuild tree
Hi Michael,

On Sat, Jul 6, 2019 at 9:05 AM Michael Kelley wrote:
>
> From: Stephen Rothwell Sent: Friday, July 5, 2019 1:31 AM
> >
> > After merging the kbuild tree, today's linux-next build (powerpc
> > allyesconfig) failed like this:
> >
> > In file included from :
> > include/clocksource/hyperv_timer.h:18:10: fatal error: asm/mshyperv.h: No such file or directory
> >  #include <asm/mshyperv.h>
> >           ^~~~
> >
> > Caused by commit
> >
> >   34085aeb5816 ("kbuild: compile-test kernel headers to ensure they are self-contained")
> >
> > interacting with commit
> >
> >   dd2cb348613b ("clocksource/drivers: Continue making Hyper-V clocksource ISA agnostic")
> >
> > from the tip tree.
> >
>
> Thomas -- let's remove my two clocksource patches from your 'tip' tree. I'll need
> a little time to fully understand the self-contained header requirements and
> restructure hyperv_timer.h to avoid this problem.

I do not think you have to drop your patches.

Since <asm/mshyperv.h> only exists in x86, guarding it by CONFIG_X86 is OK.

So, I think Stephen's patch is OK as-is.

Perhaps, Kbuild is imposing too much burden, but I'd like to try it and see how it goes.

-- 
Best Regards
Masahiro Yamada
[PATCH 2/6] fs/namespace.c: shift put_mountpoint() to callers of unhash_mnt()
From: Al Viro

make unhash_mnt() return the mountpoint to be dropped, let callers
deal with it.

Signed-off-by: Al Viro
---
 fs/namespace.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 746e3fd1f430..b7059a4f07e3 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -795,15 +795,17 @@ static void __touch_mnt_namespace(struct mnt_namespace *ns)
 /*
  * vfsmount lock must be held for write
  */
-static void unhash_mnt(struct mount *mnt)
+static struct mountpoint *unhash_mnt(struct mount *mnt)
 {
+	struct mountpoint *mp;
 	mnt->mnt_parent = mnt;
 	mnt->mnt_mountpoint = mnt->mnt.mnt_root;
 	list_del_init(&mnt->mnt_child);
 	hlist_del_init_rcu(&mnt->mnt_hash);
 	hlist_del_init(&mnt->mnt_mp_list);
-	put_mountpoint(mnt->mnt_mp);
+	mp = mnt->mnt_mp;
 	mnt->mnt_mp = NULL;
+	return mp;
 }
 
 /*
@@ -813,7 +815,7 @@ static void detach_mnt(struct mount *mnt, struct path *old_path)
 {
 	old_path->dentry = mnt->mnt_mountpoint;
 	old_path->mnt = &mnt->mnt_parent->mnt;
-	unhash_mnt(mnt);
+	put_mountpoint(unhash_mnt(mnt));
 }
 
 /*
@@ -823,7 +825,7 @@ static void umount_mnt(struct mount *mnt)
 {
 	/* old mountpoint will be dropped when we can do that */
 	mnt->mnt_ex_mountpoint = mnt->mnt_mountpoint;
-	unhash_mnt(mnt);
+	put_mountpoint(unhash_mnt(mnt));
 }
 
 /*
-- 
2.11.0
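The shape of this change (the detach helper hands the reference back, the caller decides when to drop it) can be illustrated outside the kernel. The following is a minimal userspace sketch, assuming simplified stand-in structures; the struct layouts, new_mountpoint() and the freed_mountpoints counter are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the kernel's struct mountpoint/struct mount. */
struct mountpoint {
	int m_count;			/* reference count */
};

struct mount {
	struct mountpoint *mnt_mp;	/* where it is mounted, or NULL */
};

static int freed_mountpoints;		/* bookkeeping for the example only */

/* Drop one reference; free the object when the last one goes away. */
static void put_mountpoint(struct mountpoint *mp)
{
	if (!--mp->m_count) {
		free(mp);
		freed_mountpoints++;
	}
}

/*
 * Detach only: clear the pointer and return the reference that used to
 * be held, so the caller chooses when (and how) to drop it.
 */
static struct mountpoint *unhash_mnt(struct mount *mnt)
{
	struct mountpoint *mp = mnt->mnt_mp;

	mnt->mnt_mp = NULL;
	return mp;
}

/* A caller that keeps the old behaviour: drop immediately. */
static void umount_mnt(struct mount *mnt)
{
	put_mountpoint(unhash_mnt(mnt));
}

/* Allocation helper for the example. */
static struct mountpoint *new_mountpoint(int count)
{
	struct mountpoint *mp = malloc(sizeof(*mp));

	assert(mp != NULL);
	mp->m_count = count;
	return mp;
}
```

A caller that cannot drop the reference in its current context can hold on to the pointer returned by unhash_mnt() and call put_mountpoint() later, which is the flexibility the patch description asks for.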
[PATCH 6/6] switch the remnants of releasing the mountpoint away from fs_pin
From: Al Viro We used to need rather convoluted ordering trickery to guarantee that dput() of ex-mountpoints happens before the final mntput() of the same. Since we don't need that anymore, there's no point playing with fs_pin for that. Signed-off-by: Al Viro --- fs/fs_pin.c| 10 ++ fs/mount.h | 7 +-- fs/namespace.c | 37 +++-- include/linux/fs_pin.h | 1 - 4 files changed, 26 insertions(+), 29 deletions(-) diff --git a/fs/fs_pin.c b/fs/fs_pin.c index a6497cf8ae53..47ef3c71ce90 100644 --- a/fs/fs_pin.c +++ b/fs/fs_pin.c @@ -19,20 +19,14 @@ void pin_remove(struct fs_pin *pin) spin_unlock_irq(>wait.lock); } -void pin_insert_group(struct fs_pin *pin, struct vfsmount *m, struct hlist_head *p) +void pin_insert(struct fs_pin *pin, struct vfsmount *m) { spin_lock(_lock); - if (p) - hlist_add_head(>s_list, p); + hlist_add_head(>s_list, >mnt_sb->s_pins); hlist_add_head(>m_list, _mount(m)->mnt_pins); spin_unlock(_lock); } -void pin_insert(struct fs_pin *pin, struct vfsmount *m) -{ - pin_insert_group(pin, m, >mnt_sb->s_pins); -} - void pin_kill(struct fs_pin *p) { wait_queue_entry_t wait; diff --git a/fs/mount.h b/fs/mount.h index 84aa8cdf4971..711a4093e475 100644 --- a/fs/mount.h +++ b/fs/mount.h @@ -58,7 +58,10 @@ struct mount { struct mount *mnt_master; /* slave is on master->mnt_slave_list */ struct mnt_namespace *mnt_ns; /* containing namespace */ struct mountpoint *mnt_mp; /* where is it mounted */ - struct hlist_node mnt_mp_list; /* list mounts with the same mountpoint */ + union { + struct hlist_node mnt_mp_list; /* list mounts with the same mountpoint */ + struct hlist_node mnt_umount; + }; struct list_head mnt_umounting; /* list entry for umount propagation */ #ifdef CONFIG_FSNOTIFY struct fsnotify_mark_connector __rcu *mnt_fsnotify_marks; @@ -68,7 +71,7 @@ struct mount { int mnt_group_id; /* peer group identifier */ int mnt_expiry_mark;/* true if marked for expiry */ struct hlist_head mnt_pins; - struct fs_pin mnt_umount; + struct hlist_head mnt_stuck_children; } 
__randomize_layout; #define MNT_NS_INTERNAL ERR_PTR(-EINVAL) /* distinct from any mnt_namespace */ diff --git a/fs/namespace.c b/fs/namespace.c index 326a9ab591bc..a5d0eac9749d 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -171,13 +171,6 @@ unsigned int mnt_get_count(struct mount *mnt) #endif } -static void drop_mountpoint(struct fs_pin *p) -{ - struct mount *m = container_of(p, struct mount, mnt_umount); - pin_remove(p); - mntput(>mnt); -} - static struct mount *alloc_vfsmnt(const char *name) { struct mount *mnt = kmem_cache_zalloc(mnt_cache, GFP_KERNEL); @@ -215,7 +208,7 @@ static struct mount *alloc_vfsmnt(const char *name) INIT_LIST_HEAD(>mnt_slave); INIT_HLIST_NODE(>mnt_mp_list); INIT_LIST_HEAD(>mnt_umounting); - init_fs_pin(>mnt_umount, drop_mountpoint); + INIT_HLIST_HEAD(>mnt_stuck_children); } return mnt; @@ -1079,19 +1072,22 @@ static struct mount *clone_mnt(struct mount *old, struct dentry *root, static void cleanup_mnt(struct mount *mnt) { + struct hlist_node *p; + struct mount *m; /* -* This probably indicates that somebody messed -* up a mnt_want/drop_write() pair. If this -* happens, the filesystem was probably unable -* to make r/w->r/o transitions. -*/ - /* +* The warning here probably indicates that somebody messed +* up a mnt_want/drop_write() pair. If this happens, the +* filesystem was probably unable to make r/w->r/o transitions. * The locking used to deal with mnt_count decrement provides barriers, * so mnt_get_writers() below is safe. 
*/ WARN_ON(mnt_get_writers(mnt)); if (unlikely(mnt->mnt_pins.first)) mnt_pin_kill(mnt); + hlist_for_each_entry_safe(m, p, >mnt_stuck_children, mnt_umount) { + hlist_del(>mnt_umount); + mntput(>mnt); + } fsnotify_vfsmount_delete(>mnt); dput(mnt->mnt.mnt_root); deactivate_super(mnt->mnt.mnt_sb); @@ -1160,6 +1156,7 @@ static void mntput_no_expire(struct mount *mnt) struct mount *p, *tmp; list_for_each_entry_safe(p, tmp, >mnt_mounts, mnt_child) { umount_mnt(p, ); + hlist_add_head(>mnt_umount, >mnt_stuck_children); } } unlock_mount_hash(); @@ -1352,6 +1349,8 @@ EXPORT_SYMBOL(may_umount); static void namespace_unlock(void) { struct hlist_head head; + struct hlist_node *p; + struct mount *m;
[PATCH 4/6] make struct mountpoint bear the dentry reference to mountpoint, not struct mount
From: Al Viro Signed-off-by: Al Viro --- fs/mount.h | 1 - fs/namespace.c | 66 +- 2 files changed, 28 insertions(+), 39 deletions(-) diff --git a/fs/mount.h b/fs/mount.h index 6250de544760..84aa8cdf4971 100644 --- a/fs/mount.h +++ b/fs/mount.h @@ -69,7 +69,6 @@ struct mount { int mnt_expiry_mark;/* true if marked for expiry */ struct hlist_head mnt_pins; struct fs_pin mnt_umount; - struct dentry *mnt_ex_mountpoint; } __randomize_layout; #define MNT_NS_INTERNAL ERR_PTR(-EINVAL) /* distinct from any mnt_namespace */ diff --git a/fs/namespace.c b/fs/namespace.c index b7059a4f07e3..911675de2a70 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -69,6 +69,8 @@ static struct hlist_head *mount_hashtable __read_mostly; static struct hlist_head *mountpoint_hashtable __read_mostly; static struct kmem_cache *mnt_cache __read_mostly; static DECLARE_RWSEM(namespace_sem); +static HLIST_HEAD(unmounted); /* protected by namespace_sem */ +static LIST_HEAD(ex_mountpoints); /* /sys/fs */ struct kobject *fs_kobj; @@ -172,7 +174,6 @@ unsigned int mnt_get_count(struct mount *mnt) static void drop_mountpoint(struct fs_pin *p) { struct mount *m = container_of(p, struct mount, mnt_umount); - dput(m->mnt_ex_mountpoint); pin_remove(p); mntput(>mnt); } @@ -739,7 +740,7 @@ static struct mountpoint *get_mountpoint(struct dentry *dentry) /* Add the new mountpoint to the hash table */ read_seqlock_excl(_lock); - new->m_dentry = dentry; + new->m_dentry = dget(dentry); new->m_count = 1; hlist_add_head(>m_hash, mp_hash(dentry)); INIT_HLIST_HEAD(>m_list); @@ -752,7 +753,7 @@ static struct mountpoint *get_mountpoint(struct dentry *dentry) return mp; } -static void put_mountpoint(struct mountpoint *mp) +static void put_mountpoint(struct mountpoint *mp, struct list_head *list) { if (!--mp->m_count) { struct dentry *dentry = mp->m_dentry; @@ -760,6 +761,9 @@ static void put_mountpoint(struct mountpoint *mp) spin_lock(>d_lock); dentry->d_flags &= ~DCACHE_MOUNTED; spin_unlock(>d_lock); + if (!list) + list 
= _mountpoints; + dput_to_list(dentry, list); hlist_del(>m_hash); kfree(mp); } @@ -813,19 +817,17 @@ static struct mountpoint *unhash_mnt(struct mount *mnt) */ static void detach_mnt(struct mount *mnt, struct path *old_path) { - old_path->dentry = mnt->mnt_mountpoint; + old_path->dentry = dget(mnt->mnt_mountpoint); old_path->mnt = >mnt_parent->mnt; - put_mountpoint(unhash_mnt(mnt)); + put_mountpoint(unhash_mnt(mnt), NULL); } /* * vfsmount lock must be held for write */ -static void umount_mnt(struct mount *mnt) +static void umount_mnt(struct mount *mnt, struct list_head *list) { - /* old mountpoint will be dropped when we can do that */ - mnt->mnt_ex_mountpoint = mnt->mnt_mountpoint; - put_mountpoint(unhash_mnt(mnt)); + put_mountpoint(unhash_mnt(mnt), list); } /* @@ -837,7 +839,7 @@ void mnt_set_mountpoint(struct mount *mnt, { mp->m_count++; mnt_add_count(mnt, 1); /* essentially, that's mntget */ - child_mnt->mnt_mountpoint = dget(mp->m_dentry); + child_mnt->mnt_mountpoint = mp->m_dentry; child_mnt->mnt_parent = mnt; child_mnt->mnt_mp = mp; hlist_add_head(_mnt->mnt_mp_list, >m_list); @@ -864,7 +866,6 @@ static void attach_mnt(struct mount *mnt, void mnt_change_mountpoint(struct mount *parent, struct mountpoint *mp, struct mount *mnt) { struct mountpoint *old_mp = mnt->mnt_mp; - struct dentry *old_mountpoint = mnt->mnt_mountpoint; struct mount *old_parent = mnt->mnt_parent; list_del_init(>mnt_child); @@ -873,23 +874,7 @@ void mnt_change_mountpoint(struct mount *parent, struct mountpoint *mp, struct m attach_mnt(mnt, parent, mp); - put_mountpoint(old_mp); - - /* -* Safely avoid even the suggestion this code might sleep or -* lock the mount hash by taking advantage of the knowledge that -* mnt_change_mountpoint will not release the final reference -* to a mountpoint. 
-* -* During mounting, the mount passed in as the parent mount will -* continue to use the old mountpoint and during unmounting, the -* old mountpoint will continue to exist until namespace_unlock, -* which happens well after mnt_change_mountpoint. -*/ - spin_lock(_mountpoint->d_lock); - old_mountpoint->d_lockref.count--; - spin_unlock(_mountpoint->d_lock); - + put_mountpoint(old_mp, NULL); mnt_add_count(old_parent, -1); } @@ -1142,6 +1127,8 @@ static DECLARE_DELAYED_WORK(delayed_mntput_work, delayed_mntput);
[PATCH 1/6] __detach_mounts(): lookup_mountpoint() can't return ERR_PTR() anymore
From: Al Viro

... not since 1e9c75fb9c47 ("mnt: fix __detach_mounts infinite loop")

Signed-off-by: Al Viro
---
 fs/namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 6fbc9126367a..746e3fd1f430 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1625,7 +1625,7 @@ void __detach_mounts(struct dentry *dentry)
 	namespace_lock();
 	lock_mount_hash();
 	mp = lookup_mountpoint(dentry);
-	if (IS_ERR_OR_NULL(mp))
+	if (!mp)
 		goto out_unlock;
 
 	event++;
-- 
2.11.0
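Why the weaker check is sufficient can be seen from how error pointers are encoded. Below is a userspace imitation of the ERR_PTR()/IS_ERR()/IS_ERR_OR_NULL() helpers (re-implemented here for illustration, not copied from the kernel headers) together with a hypothetical lookup_mountpoint() that, like the kernel function after the referenced commit, returns either a valid pointer or NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace imitation of the kernel's error-pointer convention. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

/* Error pointers occupy the top MAX_ERRNO values of the address space. */
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline bool IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/* Hypothetical lookup table and lookup function for the example. */
struct mountpoint {
	int id;
};

static struct mountpoint table[2] = { { 1 }, { 2 } };

/* Returns a valid pointer or NULL, never an ERR_PTR() value. */
static struct mountpoint *lookup_mountpoint(int id)
{
	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (table[i].id == id)
			return &table[i];
	return NULL;
}
```

For a function with that contract, IS_ERR_OR_NULL(mp) and !mp always agree, so the plain NULL test says the same thing with less noise.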
[PATCH 3/6] Teach shrink_dcache_parent() to cope with mixed-filesystem shrink lists
From: Al Viro

Currently, running into a shrink list that contains dentries from different filesystems can cause several unpleasant things for shrink_dcache_parent() and for umount(2).

The first problem is that there's a window during shrink_dentry_list() between __dentry_kill() taking a victim out and dropping the reference to its parent. During that window the parent looks like a genuine busy dentry. shrink_dcache_parent() (or, worse yet, shrink_dcache_for_umount()) coming at that time will see no eviction candidates and no indication that it needs to wait for some shrink_dentry_list() to proceed further.

That applies to any shrink list that might intersect with the subtree we are trying to shrink; the only reason it does not blow up on umount(2) in the mainline is that we unregister the memory shrinker before hitting shrink_dcache_for_umount().

Another problem happens if something in a mixed-filesystem shrink list gets stuck in e.g. iput(), making umount of an unrelated fs spin waiting for the stuck shrinker to get around to our dentries.

Solution:

1) have shrink_dentry_list() decrement the parent's refcount and make sure it's on a shrink list (ours unless it already had been on some other) before calling __dentry_kill(). That eliminates the window when shrink_dcache_parent() would've blown past the entire subtree without noticing anything with zero refcount not on shrink lists.

2) when shrink_dcache_parent() has found no eviction candidates, but some dentries are still sitting on shrink lists, rather than repeating the scan in hope that shrinkers have progressed, scan looking for something on shrink lists with zero refcount. If such a thing is found, grab rcu_read_lock() and stop the scan, with the caller locking it for eviction, dropping out of RCU and doing __dentry_kill(), with the same treatment for the parent as shrink_dentry_list() would do.

Note that right now mixed-filesystem shrink lists do not occur, so this is not a mainline bug.
However, there's a bunch of uses for such beasts (e.g. the "try and evict everything we can out of given page" patches; there are potential uses in mount-related code, considerably simplifying the life in fs/namespace.c, etc.)

Signed-off-by: Al Viro
---
 fs/dcache.c   | 98 ---
 fs/internal.h |  2 ++
 2 files changed, 83 insertions(+), 17 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index c435398f2c81..d8732cf2e302 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -861,6 +861,32 @@ void dput(struct dentry *dentry)
 }
 EXPORT_SYMBOL(dput);
 
+static void __dput_to_list(struct dentry *dentry, struct list_head *list)
+__must_hold(&dentry->d_lock)
+{
+	if (dentry->d_flags & DCACHE_SHRINK_LIST) {
+		/* let the owner of the list it's on deal with it */
+		--dentry->d_lockref.count;
+	} else {
+		if (dentry->d_flags & DCACHE_LRU_LIST)
+			d_lru_del(dentry);
+		if (!--dentry->d_lockref.count)
+			d_shrink_add(dentry, list);
+	}
+}
+
+void dput_to_list(struct dentry *dentry, struct list_head *list)
+{
+	rcu_read_lock();
+	if (likely(fast_dput(dentry))) {
+		rcu_read_unlock();
+		return;
+	}
+	rcu_read_unlock();
+	if (!retain_dentry(dentry))
+		__dput_to_list(dentry, list);
+	spin_unlock(&dentry->d_lock);
+}
 
 /* This must be called with d_lock held */
 static inline void __dget_dlock(struct dentry *dentry)
@@ -1067,7 +1093,7 @@ static bool shrink_lock_dentry(struct dentry *dentry)
 	return false;
 }
 
-static void shrink_dentry_list(struct list_head *list)
+void shrink_dentry_list(struct list_head *list)
 {
 	while (!list_empty(list)) {
 		struct dentry *dentry, *parent;
@@ -1089,18 +1115,9 @@ static void shrink_dentry_list(struct list_head *list)
 		rcu_read_unlock();
 		d_shrink_del(dentry);
 		parent = dentry->d_parent;
+		if (parent != dentry)
+			__dput_to_list(parent, list);
 		__dentry_kill(dentry);
-		if (parent == dentry)
-			continue;
-		/*
-		 * We need to prune ancestors too. This is necessary to prevent
-		 * quadratic behavior of shrink_dcache_parent(), but is also
-		 * expected to be beneficial in reducing dentry cache
-		 * fragmentation.
-		 */
-		dentry = parent;
-		while (dentry && !lockref_put_or_lock(&dentry->d_lockref))
-			dentry = dentry_kill(dentry);
 	}
 }
 
@@ -1445,8 +1462,11 @@ int d_set_mounted(struct dentry *dentry)
 
 struct select_data {
 	struct dentry *start;
+	union {
+		long found;
+		struct dentry *victim;
+	};
 	struct list_head dispose;
-	int found;
 };
 
 static enum
[PATCH 5/6] get rid of detach_mnt()
From: Al Viro Lift getting the original mount (dentry is actually not needed at all) of the mountpoint into the callers - to do_move_mount() and pivot_root() level. That simplifies the cleanup in those and allows to get saner arguments for attach_mnt_recursive(). Signed-off-by: Al Viro --- fs/namespace.c | 62 ++ 1 file changed, 28 insertions(+), 34 deletions(-) diff --git a/fs/namespace.c b/fs/namespace.c index 911675de2a70..326a9ab591bc 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -815,16 +815,6 @@ static struct mountpoint *unhash_mnt(struct mount *mnt) /* * vfsmount lock must be held for write */ -static void detach_mnt(struct mount *mnt, struct path *old_path) -{ - old_path->dentry = dget(mnt->mnt_mountpoint); - old_path->mnt = >mnt_parent->mnt; - put_mountpoint(unhash_mnt(mnt), NULL); -} - -/* - * vfsmount lock must be held for write - */ static void umount_mnt(struct mount *mnt, struct list_head *list) { put_mountpoint(unhash_mnt(mnt), list); @@ -2037,7 +2027,7 @@ int count_mounts(struct mnt_namespace *ns, struct mount *mnt) static int attach_recursive_mnt(struct mount *source_mnt, struct mount *dest_mnt, struct mountpoint *dest_mp, - struct path *parent_path) + bool moving) { struct user_namespace *user_ns = current->nsproxy->mnt_ns->user_ns; HLIST_HEAD(tree_list); @@ -2055,7 +2045,7 @@ static int attach_recursive_mnt(struct mount *source_mnt, return PTR_ERR(smp); /* Is there space to add these mounts to the mount namespace? 
*/ - if (!parent_path) { + if (!moving) { err = count_mounts(ns, source_mnt); if (err) goto out; @@ -2074,8 +2064,8 @@ static int attach_recursive_mnt(struct mount *source_mnt, } else { lock_mount_hash(); } - if (parent_path) { - detach_mnt(source_mnt, parent_path); + if (moving) { + unhash_mnt(source_mnt); attach_mnt(source_mnt, dest_mnt, dest_mp); touch_mnt_namespace(source_mnt->mnt_ns); } else { @@ -2173,7 +2163,7 @@ static int graft_tree(struct mount *mnt, struct mount *p, struct mountpoint *mp) d_is_dir(mnt->mnt.mnt_root)) return -ENOTDIR; - return attach_recursive_mnt(mnt, p, mp, NULL); + return attach_recursive_mnt(mnt, p, mp, false); } /* @@ -2566,11 +2556,11 @@ static bool check_for_nsfs_mounts(struct mount *subtree) static int do_move_mount(struct path *old_path, struct path *new_path) { - struct path parent_path = {.mnt = NULL, .dentry = NULL}; struct mnt_namespace *ns; struct mount *p; struct mount *old; - struct mountpoint *mp; + struct mount *parent; + struct mountpoint *mp, *old_mp; int err; bool attached; @@ -2580,7 +2570,9 @@ static int do_move_mount(struct path *old_path, struct path *new_path) old = real_mount(old_path->mnt); p = real_mount(new_path->mnt); + parent = old->mnt_parent; attached = mnt_has_parent(old); + old_mp = old->mnt_mp; ns = old->mnt_ns; err = -EINVAL; @@ -2608,7 +2600,7 @@ static int do_move_mount(struct path *old_path, struct path *new_path) /* * Don't move a mount residing in a shared parent. */ - if (attached && IS_MNT_SHARED(old->mnt_parent)) + if (attached && IS_MNT_SHARED(parent)) goto out; /* * Don't move a mount tree containing unbindable mounts to a destination @@ -2624,18 +2616,21 @@ static int do_move_mount(struct path *old_path, struct path *new_path) goto out; err = attach_recursive_mnt(old, real_mount(new_path->mnt), mp, - attached ? 
_path : NULL); + attached); if (err) goto out; /* if the mount is moved, it should no longer be expire * automatically */ list_del_init(>mnt_expire); + if (attached) + put_mountpoint(old_mp, NULL); out: unlock_mount(mp); if (!err) { - path_put(_path); - if (!attached) + if (attached) + mntput_no_expire(parent); + else free_mnt_ns(ns); } return err; @@ -3578,8 +3573,8 @@ EXPORT_SYMBOL(path_is_under); SYSCALL_DEFINE2(pivot_root, const char __user *, new_root, const char __user *, put_old) { - struct path new, old, parent_path, root_parent, root; - struct mount *new_mnt, *root_mnt, *old_mnt; + struct path new, old, root; + struct mount *new_mnt, *root_mnt, *old_mnt, *root_parent, *ex_parent;
[RFC][PATCHES] (hopefully) saner refcounting for mountpoint dentries
Currently, we handle mountpoint dentry lifetime in a very convoluted way.

* each struct mount attached to a mount tree contributes to ->d_count of the mountpoint dentry (pointed to by ->mnt_mountpoint).
* permanently detaching a mount from a mount tree moves the reference into ->mnt_ex_mountpoint.
* that reference is dropped by drop_mountpoint(), which must happen no later than the filesystem the mountpoint resides on gets shut down.

The last part makes for really unpleasant ordering logic; it works, but it's bloody hard to follow and it's a lot more complex under the hood than anyone would like.

The root cause of those complexities is that we can't do dput() while we are detaching the thing, since the locking environment there doesn't tolerate IO, blocking, etc., and dput() can trigger all of that.

Another complication (in analysis, not in the code) is that we also have struct mountpoint in the picture. Once upon a time it used to be a part of struct dentry - the list of all mounts on a given mountpoint. Since it doesn't make sense to bloat every dentry for the sake of the very small fraction that will ever be anyone's mountpoints, that thing got separated. What we have is

* a mark in dentry flags (DCACHE_MOUNTED) set for dentries that are currently mountpoints
* for each of those we have a struct mountpoint instance (exactly one for each of those dentries).
* struct mountpoint has a pointer to its dentry (->m_dentry); it does not contribute to refcount.
* struct mountpoint instances are hashed (all the time), using ->m_dentry as search key.
* struct mount has a reference to struct mountpoint (->mnt_mp), for as long as it is attached to a parent. When ->mnt_mp is non-NULL we are guaranteed that m->mnt_mp->m_dentry == m->mnt_mountpoint.
* struct mountpoint is refcounted, and ->mnt_mp contributes to that refcount. All other contributing references are transient - pretty much dropped by the same function that has grabbed them.

The reasons why ->m_dentry can't become dangling (despite not contributing to dentry refcount) or persist to the shutdown of the filesystem the dentry belongs to are different for transient and persistent references to struct mountpoint - holders of the former have the dentry (and a struct mount of the filesystem it's on) pinned until after they drop their reference to struct mountpoint, while the latter rely upon having the (contributing) reference to the same dentry stay in struct mount past dropping the reference to struct mountpoint.

It works, but it's less than transparent and ultimately relies upon the mechanism we use to order dropping dentry references from struct mount vs. filesystem shutdowns.

Note that once we have unmounted a struct mount, we don't really need the reference to what used to be its mountpoint dentry - all we use it for is eventually passing it to dput(). If we could drop it immediately (i.e. if the locking environment allowed that), we could do just that and forget about it as soon as the mount is torn from struct mountpoint. IOW, we could make struct mountpoint ->m_dentry bear the contributing reference instead of struct mount ->mnt_mountpoint/->mnt_ex_mountpoint.

The locking environment really doesn't allow IO. And ->d_count can reach zero there. However, while we can't kill such a victim immediately, we can put it (with zero refcount) on a shrink list of our own. And call shrink_dentry_list() once the locking allows.

That would almost work. The problem is that until now all shrink lists used to be homogeneous - all dentries on the same list belong to the same filesystem. And shrink_dcache_parent()/shrink_dcache_for_umount() rely upon that.

If not for that, we could get rid of our ordering machinery. There is another reason we want to cope with such mixed-origin shrink lists - the Slab Movable Objects patchset really needs that (well, either that, or having a separate kmem_cache for each struct super_block).

Fortunately, that turns out to be reasonably easy to do. And that allows us to untangle the mess with mountpoints. The series below does that; it's in vfs.git #work.dcache and individual patches will be in followups to this posting.

1) __detach_mounts(): lookup_mountpoint() can't return ERR_PTR() anymore
Forgotten removal of a dead check near the code affected by the subsequent patches.

2) fs/namespace.c: shift put_mountpoint() to callers of unhash_mnt()
A bit of preliminary massage - we want to be able to tell put_mountpoint() where to put the dropped dentry if its ->d_count reaches 0.

3) Teach shrink_dcache_parent() to cope with mixed-filesystem shrink lists
The guts of that series. We make shrink_dcache_parent() (and shrink_dcache_for_umount()) deal with mixed shrink lists sanely. New primitive added: dput_to_list(). shrink_dentry_list() made non-static. See the commit message of that one for details.

4) make struct mountpoint bear the dentry reference to
RE: linux-next: build failure after merge of the kbuild tree
From: Stephen Rothwell Sent: Friday, July 5, 2019 1:31 AM
>
> After merging the kbuild tree, today's linux-next build (powerpc
> allyesconfig) failed like this:
>
> In file included from :
> include/clocksource/hyperv_timer.h:18:10: fatal error: asm/mshyperv.h: No such file or directory
>  #include <asm/mshyperv.h>
>           ^~~~
>
> Caused by commit
>
>   34085aeb5816 ("kbuild: compile-test kernel headers to ensure they are self-contained")
>
> interacting with commit
>
>   dd2cb348613b ("clocksource/drivers: Continue making Hyper-V clocksource ISA agnostic")
>
> from the tip tree.
>

Thomas -- let's remove my two clocksource patches from your 'tip' tree. I'll need a little time to fully understand the self-contained header requirements and restructure hyperv_timer.h to avoid this problem.

Michael
[PATCH v9 net-next 2/5] net: ethernet: ti: davinci_cpdma: add dma mapped submit
In case if dma mapped packet needs to be sent, like with XDP page pool, the "mapped" submit can be used. This patch adds dma mapped submit based on regular one. Signed-off-by: Ivan Khoronzhuk --- v9..v8 - fix potential warnings on arm64 caused by typos in type casting drivers/net/ethernet/ti/davinci_cpdma.c | 89 ++--- drivers/net/ethernet/ti/davinci_cpdma.h | 4 ++ 2 files changed, 83 insertions(+), 10 deletions(-) diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c index 5cf1758d425b..4e693c3aab27 100644 --- a/drivers/net/ethernet/ti/davinci_cpdma.c +++ b/drivers/net/ethernet/ti/davinci_cpdma.c @@ -139,6 +139,7 @@ struct submit_info { int directed; void *token; void *data; + int flags; int len; }; @@ -184,6 +185,8 @@ static struct cpdma_control_info controls[] = { (directed << CPDMA_TO_PORT_SHIFT));\ } while (0) +#define CPDMA_DMA_EXT_MAP BIT(16) + static void cpdma_desc_pool_destroy(struct cpdma_ctlr *ctlr) { struct cpdma_desc_pool *pool = ctlr->pool; @@ -1015,6 +1018,7 @@ static int cpdma_chan_submit_si(struct submit_info *si) struct cpdma_chan *chan = si->chan; struct cpdma_ctlr *ctlr = chan->ctlr; int len = si->len; + int swlen = len; struct cpdma_desc __iomem *desc; dma_addr_t buffer; u32 mode; @@ -1036,16 +1040,22 @@ static int cpdma_chan_submit_si(struct submit_info *si) chan->stats.runt_transmit_buff++; } - buffer = dma_map_single(ctlr->dev, si->data, len, chan->dir); - ret = dma_mapping_error(ctlr->dev, buffer); - if (ret) { - cpdma_desc_free(ctlr->pool, desc, 1); - return -EINVAL; - } - mode = CPDMA_DESC_OWNER | CPDMA_DESC_SOP | CPDMA_DESC_EOP; cpdma_desc_to_port(chan, mode, si->directed); + if (si->flags & CPDMA_DMA_EXT_MAP) { + buffer = (dma_addr_t)si->data; + dma_sync_single_for_device(ctlr->dev, buffer, len, chan->dir); + swlen |= CPDMA_DMA_EXT_MAP; + } else { + buffer = dma_map_single(ctlr->dev, si->data, len, chan->dir); + ret = dma_mapping_error(ctlr->dev, buffer); + if (ret) { + 
cpdma_desc_free(ctlr->pool, desc, 1); + return -EINVAL; + } + } + /* Relaxed IO accessors can be used here as there is read barrier * at the end of write sequence. */ @@ -1055,7 +1065,7 @@ static int cpdma_chan_submit_si(struct submit_info *si) writel_relaxed(mode | len, >hw_mode); writel_relaxed((uintptr_t)si->token, >sw_token); writel_relaxed(buffer, >sw_buffer); - writel_relaxed(len, >sw_len); + writel_relaxed(swlen, >sw_len); desc_read(desc, sw_len); __cpdma_chan_submit(chan, desc); @@ -1079,6 +1089,32 @@ int cpdma_chan_idle_submit(struct cpdma_chan *chan, void *token, void *data, si.data = data; si.len = len; si.directed = directed; + si.flags = 0; + + spin_lock_irqsave(>lock, flags); + if (chan->state == CPDMA_STATE_TEARDOWN) { + spin_unlock_irqrestore(>lock, flags); + return -EINVAL; + } + + ret = cpdma_chan_submit_si(); + spin_unlock_irqrestore(>lock, flags); + return ret; +} + +int cpdma_chan_idle_submit_mapped(struct cpdma_chan *chan, void *token, + dma_addr_t data, int len, int directed) +{ + struct submit_info si; + unsigned long flags; + int ret; + + si.chan = chan; + si.token = token; + si.data = (void *)data; + si.len = len; + si.directed = directed; + si.flags = CPDMA_DMA_EXT_MAP; spin_lock_irqsave(>lock, flags); if (chan->state == CPDMA_STATE_TEARDOWN) { @@ -1103,6 +1139,32 @@ int cpdma_chan_submit(struct cpdma_chan *chan, void *token, void *data, si.data = data; si.len = len; si.directed = directed; + si.flags = 0; + + spin_lock_irqsave(>lock, flags); + if (chan->state != CPDMA_STATE_ACTIVE) { + spin_unlock_irqrestore(>lock, flags); + return -EINVAL; + } + + ret = cpdma_chan_submit_si(); + spin_unlock_irqrestore(>lock, flags); + return ret; +} + +int cpdma_chan_submit_mapped(struct cpdma_chan *chan, void *token, +dma_addr_t data, int len, int directed) +{ + struct submit_info si; + unsigned long flags; + int ret; + + si.chan = chan; + si.token = token; + si.data = (void *)data; + si.len = len; + si.directed = directed; +
Re: [PATCH] m68k: One function call less in cf_tlb_miss()
On Fri, 5 Jul 2019, Markus Elfring wrote: > From: Markus Elfring > Date: Fri, 5 Jul 2019 17:11:37 +0200 > > Avoid an extra function call Not really. You've avoided an extra statement. > by using a ternary operator instead of a conditional statement for a > setting selection. > > This issue was detected by using the Coccinelle software. > > Signed-off-by: Markus Elfring > --- > arch/m68k/mm/mcfmmu.c | 10 -- > 1 file changed, 4 insertions(+), 6 deletions(-) > > diff --git a/arch/m68k/mm/mcfmmu.c b/arch/m68k/mm/mcfmmu.c > index 6cb1e41d58d0..02fc0778028e 100644 > --- a/arch/m68k/mm/mcfmmu.c > +++ b/arch/m68k/mm/mcfmmu.c > @@ -146,12 +146,10 @@ int cf_tlb_miss(struct pt_regs *regs, int write, int > dtlb, int extension_word) > > mmu_write(MMUDR, (pte_val(*pte) & PAGE_MASK) | > ((pte->pte) & CF_PAGE_MMUDR_MASK) | MMUDR_SZ_8KB | MMUDR_X); > - > - if (dtlb) > - mmu_write(MMUOR, MMUOR_ACC | MMUOR_UAA); > - else > - mmu_write(MMUOR, MMUOR_ITLB | MMUOR_ACC | MMUOR_UAA); > - > + mmu_write(MMUOR, > + dtlb > + ? MMUOR_ACC | MMUOR_UAA > + : MMUOR_ITLB | MMUOR_ACC | MMUOR_UAA); If you are trying to avoid redundancy, why not finish the job? + mmu_write(MMUOR, (dtlb ? 0 : MMUOR_ITLB) | MMUOR_ACC | MMUOR_UAA); -- > local_irq_restore(flags); > return 0; > } > -- > 2.22.0 > >
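The three spellings discussed in this thread (the original if/else, the ternary from the patch, and the factored form suggested in the reply) select the same value. A minimal sketch to check that, assuming placeholder bit values for the MMUOR_* constants (the real definitions live in the ColdFire MMU headers and are not shown in this thread); the helper function names are hypothetical:

```c
/* Placeholder bit values for illustration only; not the real ColdFire ones. */
#define MMUOR_UAA  0x01u
#define MMUOR_ACC  0x02u
#define MMUOR_ITLB 0x80u

/* Original form: two statements, one per branch. */
static unsigned int mmuor_if_else(int dtlb)
{
	if (dtlb)
		return MMUOR_ACC | MMUOR_UAA;
	else
		return MMUOR_ITLB | MMUOR_ACC | MMUOR_UAA;
}

/* Form from the patch: one ternary choosing between two full masks.
 * '|' binds tighter than '?:', so each arm is a complete OR expression. */
static unsigned int mmuor_ternary(int dtlb)
{
	return dtlb ? MMUOR_ACC | MMUOR_UAA
		    : MMUOR_ITLB | MMUOR_ACC | MMUOR_UAA;
}

/* Form from the reply: only the bit that differs is conditional. */
static unsigned int mmuor_factored(int dtlb)
{
	return (dtlb ? 0 : MMUOR_ITLB) | MMUOR_ACC | MMUOR_UAA;
}
```

In every variant the value ends up in exactly one mmu_write() call at run time, which is the point of the "You've avoided an extra statement" remark: what changes is the amount of source text, not the number of calls executed.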
Re: [RESEND PATCH next v2 0/6] ARM: keystone: update dt and enable cpts support
On 7/5/19 8:12 AM, Grygorii Strashko wrote: Hi Santosh, This series is set of platform changes required to enable NETCP CPTS reference clock selection and final patch to enable CPTS for Keystone 66AK2E/L/HK SoCs. Those patches were posted already [1] together with driver's changes, so this is re-send of DT/platform specific changes only, as driver's changes have been merged already. Patches 1-5: CPTS DT nodes update for TI Keystone 2 66AK2HK/E/L SoCs. Patch 6: enables CPTS for TI Keystone 2 66AK2HK/E/L SoCs. [1] https://patchwork.kernel.org/cover/10980037/ Grygorii Strashko (6): ARM: dts: keystone-clocks: add input fixed clocks ARM: dts: k2e-clocks: add input ext. fixed clocks tsipclka/b ARM: dts: k2e-netcp: add cpts refclk_mux node ARM: dts: k2hk-netcp: add cpts refclk_mux node ARM: dts: k2l-netcp: add cpts refclk_mux node ARM: configs: keystone: enable cpts Will add these for 5.4 queue. Thanks !! Regards, Santosh
Re: linux-next: build failure after merge of the nvdimm tree
Hi Dan, On Fri, 5 Jul 2019 15:32:19 -0700 Dan Williams wrote: > > On Fri, Jul 5, 2019 at 12:20 AM Stephen Rothwell > wrote: > > > > After merging the nvdimm tree, today's linux-next build (x86_64 > > allmodconfig) failed like this: > > > > In file included from :32: > > ./usr/include/linux/virtio_pmem.h:19:2: error: unknown type name 'uint64_t' > > uint64_t start; > > ^~~~ > > ./usr/include/linux/virtio_pmem.h:20:2: error: unknown type name 'uint64_t' > > uint64_t size; > > ^~~~ > > /me boggles at how this sat in 0day visible tree for a long while > without this report? These messages are produced by a new test in the kbuild tree, so you need both it and the nvdimm tree together to get them. That will change after the merge window, of course. -- Cheers, Stephen Rothwell
Re: pagecache locking
On Wed, Jul 03, 2019 at 03:04:45AM +0300, Boaz Harrosh wrote: > On 20/06/2019 01:37, Dave Chinner wrote: > <> > > > > I'd prefer it doesn't get lifted to the VFS because I'm planning on > > getting rid of it in XFS with range locks. i.e. the XFS_MMAPLOCK is > > likely to go away in the near term because a range lock can be > > taken on either side of the mmap_sem in the page fault path. > > > <> > Sir Dave > > Sorry if this was answered before. I am please very curious. In the zufs > project I have an equivalent rw_MMAPLOCK that I _read_lock on page_faults. > (Read & writes all take read-locks ...) > The only reason I have it is because of lockdep actually. > > Specifically for those xfstests that mmap a buffer then direct_IO in/out > of that buffer from/to another file in the same FS or the same file. > (For lockdep its the same case). Which can deadlock if the same inode rwsem is taken on both sides of the mmap_sem, as lockdep tells you... > I would be perfectly happy to recursively _read_lock both from the top > of the page_fault at the DIO path, and under in the page_fault. I'm > _read_locking after all. But lockdep is hard to convince. So I stole the > xfs idea of having an rw_MMAPLOCK. And grab yet another _write_lock at > truncate/punch/clone time when all mapping traversal needs to stop for > the destructive change to take place. (Allocations are done another way > and are race safe with traversal) > > How do you intend to address this problem with range-locks? ie recursively > taking the same "lock"? because if not for the recursive-ity and lockdep I > would > not need the extra lock-object per inode. As long as the IO ranges to the same file *don't overlap*, it should be perfectly safe to take separate range locks (in read or write mode) on either side of the mmap_sem as non-overlapping range locks can be nested and will not self-deadlock. 
The "recursive lock problem" still arises with DIO and page faults inside gup, but it only occurs when the user buffer range overlaps the DIO range to the same file. IOWs, the application is trying to do something that has an undefined result and is likely to result in data corruption. So, in that case I plan to have the gup page faults fail and the DIO return -EDEADLOCK to userspace Cheers, Dave. -- Dave Chinner da...@fromorbit.com
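The rule described above (nested range locks are fine as long as the two ranges on the same file do not overlap, and an overlapping request is refused with -EDEADLOCK) can be modelled in a few lines. This is only a sketch of the overlap test, not the XFS range lock implementation; the struct and function names are hypothetical and the ranges are assumed to be half-open byte ranges:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#ifndef EDEADLOCK
#define EDEADLOCK EDEADLK	/* some libcs only provide EDEADLK */
#endif

/* Hypothetical half-open byte range [start, end) within one file. */
struct byte_range {
	uint64_t start;
	uint64_t end;
};

/* Two half-open ranges overlap iff each one starts before the other ends. */
static bool ranges_overlap(struct byte_range a, struct byte_range b)
{
	return a.start < b.end && b.start < a.end;
}

/*
 * Model of the nesting rule: while 'held' is locked (e.g. the DIO range),
 * a second lock on 'wanted' (e.g. the page fault on the user buffer) may
 * be taken only if the ranges are disjoint; otherwise report -EDEADLOCK.
 */
static int nested_range_lock_check(struct byte_range held,
				   struct byte_range wanted)
{
	return ranges_overlap(held, wanted) ? -EDEADLOCK : 0;
}
```

For example, a DIO covering [0, 4096) with a buffer mapped from [4096, 8192) of the same file passes the check, while a buffer mapped from [2048, 6144) does not.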
Re: [PATCH] rtl8xxxu: Fix wifi low signal strength issue of RTL8723BU
On 7/4/19 10:44 PM, Daniel Drake wrote: On Wed, Jul 3, 2019 at 8:59 PM Jes Sorensen wrote: My point is this seems to be very dongle dependent :( We have to be careful not breaking it for some users while fixing it for others. Do you still have your device? Once we get to the point when you are happy with Chris's two patches here on a code review level, we'll reach out to other driver contributors plus people who previously complained about these types of problems, and see if we can get some wider testing. Larry, do you have these devices, can you help with testing too? I have some of the devices, and I can help with the testing. Larry
Re: [PATCH 6/7] nfp: Use spinlock_t instead of struct spinlock
From: Sebastian Andrzej Siewior Date: Thu, 4 Jul 2019 17:38:02 +0200 > For spinlocks the type spinlock_t should be used instead of "struct > spinlock". > > Use spinlock_t for spinlock's definition. > > Cc: Jakub Kicinski > Cc: "David S. Miller" > Cc: oss-driv...@netronome.com > Cc: net...@vger.kernel.org > Signed-off-by: Sebastian Andrzej Siewior Applied to net-next, thanks.
Re: kernel BUG at mm/swap_state.c:170!
On Fri 05-07-19 20:19:48, Mikhail Gavrilov wrote: > Hey folks. > Excuse me, is anybody read my previous message? > 5.2-rc7 is still affected by this issue [the logs in file > dmesg-5.2rc7-0.1.tar.xz] and I worry that stable 5.2 would be released > with this bug because there is almost no time left and I didn't see > the attention to this problem. > I confirm that reverting commit 5fd4ca2d84b2 on top of the rc7 tag is > help fix it [the logs in file dmesg-5.2rc7-0.2.tar.xz]. > I am still awaiting any feedback here. Yeah, I guess revert of 5fd4ca2d84b2 at this point is probably the best we can do. Let's CC Linus, Andrew, and Greg (Linus is travelling AFAIK so I'm not sure whether Greg won't do release for him). Honza -- Jan Kara SUSE Labs, CR
Re: [PATCH v2] fs: Fix the default values of i_uid/i_gid on /proc/sys inodes.
Please Cc Andrew Morton on future follow ups. On Sat, Jul 06, 2019 at 12:19:16AM +0200, Radoslaw Burny wrote: > On Fri, Jul 5, 2019 at 10:02 PM Luis Chamberlain wrote: > > > > > > Please re-state the main fix in the commit log, not just the subject. > > Sure, I'll do this. Just to make sure - for every iteration on the > commit message, I need to increment the patch "version" and resend the > whole patch, right? Right. > > > > Also, this does not explain why the current values are and the impact to > > systems / users. This would help in determine and evaluating if this > > deserves to be a stable fix. > > This commit a (much overdue) resend of https://lkml.org/lkml/2018/11/30/990 > I think Eric's comment on the previous thread explained it best: Ah, I knew this smelled familiar. Yes I recall. Please add more information about all this to the commit log. The more info, the better, including a reference to the old discussion and also a distilled summary of what was discussed. Preferably, avoid lkml.org and use this URL instead, as lkml.org is not under our control and can die, etc. https://lore.kernel.org/lkml/20181126172607.125782-1-rbu...@google.com/ > > We spoke about this at LPC. And this is the correct behavioral change. Again, none of this is clear to the patch reviewer and again you didn't mention any of it. > > > > The problem is there is a default value for i_uid and i_gid that is > > correct in the general case. That default value is not corect for > > sysctl, because proc is weird. As the sysctl permission check in > > test_perm are all against GLOBAL_ROOT_UID and GLOBAL_ROOT_GID we did not > > notice that i_uid and i_gid were being set wrong. > > > > So all this patch does is fix the default values i_uid and i_gid. > > If my new commit message is still not conveying this clearly, feel > free to suggest the specific wording (I'm new to the kernel patch > process, and I might not be explaining the problems well enough). 
Please condense the above into the commit log message. What you want to make clear is the implications if this patch is not applied, who is affected and why. > > On Fri, Jul 05, 2019 at 06:30:21PM +0200, Radoslaw Burny wrote: > > > This also fixes a problem where, in a user namespace without root user > > > mapping, it is not possible to write to /proc/sys/kernel/shmmax. > > > > This does not explain why that should be possible and what impact this > > limitation has. > > Writing to /proc/sys/kernel/shmmax allows setting a shared memory > limit for that container. Since this is usually a part of container's > initial configuration, one would expect that the container's owner / > creator is able to set the limit. Yet, due to the bug described here, > no process can write the container's shmmax if the container's user > namespace does not contain root mapping. Please include this in the commit log. It does then seem worthy of a stable commit. Please add the Cc: stable tag, ie put this: Cc: sta...@vger.kernel.org # v4.8+ Right above the Signed-off-by tags. Then the scripts which pick up stable patches will pick this up. > Using a container with no root mapping seems to be a rare case, but we > do use this configuration at Google, which is how I found the issue. > Also, we use a generic tool to configure the container limits, and the > inability to write any of them causes a hard failure. This helps folks also, so please include this in the commit log. > > > The problem was introduced by the combination of the two commits: > > > * 81754357770ebd900801231e7bc8d151ddc00498: fs: Update > > > i_[ug]id_(read|write) to translate relative to s_user_ns > > > - this caused the kernel to write INVALID_[UG]ID to i_uid/i_gid > > > members of /proc/sys inodes if a containing userns does not have > > > entries for root in the uid/gid_map. > > This is a 2014 commit merged as of v4.8. 
> > > > > * 0bd23d09b874e53bd1a2fe2296030aa2720d7b08: vfs: Don't modify inodes > > > with a uid or gid unknown to the vfs > > > - changed the kernel to prevent opens for write if the i_uid/i_gid > > > field in the inode is invalid > > > > This is a 2016 commit merged as of v4.8 as well... > > > > So regardless of the dates of the commits, are you saying this is a > > regression you can confirm did not exist prior to v4.8? Did you test > > v4.7 to confirm? > > I assume no one has noticed this issue before because it requires such > a specific combination of triggers. > Yes, I've tested this with older kernel versions. I've additionally > tested a 4.8 build with just 0aa2720d7b08 reverted, confirming that > the revert fixes the issue. Ummm 0aa2720d7b08 is the last part of the gitsum, you want to reference the first part of the gitsum as otherwise git show 0aa2720d7b08 yields nothing, but git show 0bd23d09b874e does. OK so then the *real* issue was commit 0bd23d09b874e, so just add this tag: Fixes: 0bd23d09b874 ("vfs: Don't modify inodes with a
Re: [PATCH net-next 0/9] net: hns3: some cleanups & bugfixes
From: Huazhong Tan Date: Thu, 4 Jul 2019 22:04:19 +0800 > This patch-set includes cleanups and bugfixes for > the HNS3 ethernet controller driver. > > [patch 1/9] fixes VF's broadcast promisc mode not enabled after > initializing. > > [patch 2/9] adds hints for fibre port not support flow control. > > [patch 3/9] fixes a port capbility updating issue. > > [patch 4/9 - 9/9] adds some cleanups for HNS3 driver. Series applied, thanks.
Re: [PATCH net] r8152: set RTL8152_UNPLUG only for real disconnection
From: Hayes Wang Date: Thu, 4 Jul 2019 17:36:32 +0800 > Set the flag of RTL8152_UNPLUG if and only if the device is unplugged. > Some error codes sometimes don't mean the real disconnection of usb device. > For those situations, set the flag of RTL8152_UNPLUG causes the driver skips > some flows of disabling the device, and it let the device stay at incorrect > state. > > Signed-off-by: Hayes Wang Applied.
Re: [PATCH] rtc: zynqmp: One function call less in xlnx_rtc_alarm_irq_enable()
On 05/07/2019 22:45:39+0200, Markus Elfring wrote: > From: Markus Elfring > Date: Fri, 5 Jul 2019 22:37:58 +0200 > > Avoid an extra function call by using a ternary operator instead of > a conditional statement for a setting selection. > Please elaborate on why this is a good thing. > This issue was detected by using the Coccinelle software. > Unless you use an upstream coccinelle script or you share the one you are using, this is not a useful information. > Signed-off-by: Markus Elfring > --- > drivers/rtc/rtc-zynqmp.c | 7 ++- > 1 file changed, 2 insertions(+), 5 deletions(-) > > diff --git a/drivers/rtc/rtc-zynqmp.c b/drivers/rtc/rtc-zynqmp.c > index 00639594de0c..4631019a54e2 100644 > --- a/drivers/rtc/rtc-zynqmp.c > +++ b/drivers/rtc/rtc-zynqmp.c > @@ -124,11 +124,8 @@ static int xlnx_rtc_alarm_irq_enable(struct device *dev, > u32 enabled) > { > struct xlnx_rtc_dev *xrtcdev = dev_get_drvdata(dev); > > - if (enabled) > - writel(RTC_INT_ALRM, xrtcdev->reg_base + RTC_INT_EN); > - else > - writel(RTC_INT_ALRM, xrtcdev->reg_base + RTC_INT_DIS); > - > + writel(RTC_INT_ALRM, > +xrtcdev->reg_base + (enabled ? RTC_INT_EN : RTC_INT_DIS)); This makes the code less readable. > return 0; > } > > -- > 2.22.0 > -- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com
Re: linux-next: build failure after merge of the nvdimm tree
On Fri, Jul 5, 2019 at 12:20 AM Stephen Rothwell wrote: > > Hi all, > > After merging the nvdimm tree, today's linux-next build (x86_64 > allmodconfig) failed like this: > > In file included from :32: > ./usr/include/linux/virtio_pmem.h:19:2: error: unknown type name 'uint64_t' > uint64_t start; > ^~~~ > ./usr/include/linux/virtio_pmem.h:20:2: error: unknown type name 'uint64_t' > uint64_t size; > ^~~~ /me boggles at how this sat in 0day visible tree for a long while without this report? > > Caused by commit > > 403b7f973855 ("virtio-pmem: Add virtio pmem driver") > > I have used the nvdimm tree from next-20190704 for today. Thanks Stephen, sorry for the noise.
Re: [PATCH v2] ARM: configs: Remove useless UEVENT_HELPER_PATH
On Fri, Jul 5, 2019 at 3:26 PM Olof Johansson via Linux.Kernel.Org wrote: This didn't work as I anticipated. Please ignore, apologies for the spam. -Olof
Re: [PATCH v2 1/9] mmc: sdhci-sprd: Check the enable clock's return value correctly
On Fri, Jul 5, 2019 at 3:25 PM Olof Johansson via Linux.Kernel.Org wrote: Hmm, well, that didn't work like I expected to. Sorry for the noise. -Olof
Re: [PATCH] net: ethernet: allwinner: Remove unneeded memset
From: Hariprasad Kelam Date: Thu, 4 Jul 2019 08:29:06 +0530 > Remove unneeded memset as alloc_etherdev is using kvzalloc which uses > __GFP_ZERO flag > > Signed-off-by: Hariprasad Kelam Applied.
Re: linux-next: Tree for Jun 28 (kernel/bpf/cgroup.c)
On 6/28/19 1:52 PM, Randy Dunlap wrote: > On 6/28/19 3:38 AM, Stephen Rothwell wrote: >> Hi all, >> >> Changes since 20190627: >> > > on i386: > > ld: kernel/bpf/cgroup.o: in function `cg_sockopt_func_proto': > cgroup.c:(.text+0x2906): undefined reference to `bpf_sk_storage_delete_proto' > ld: cgroup.c:(.text+0x2939): undefined reference to `bpf_sk_storage_get_proto' > ld: kernel/bpf/cgroup.o: in function `__cgroup_bpf_run_filter_setsockopt': > cgroup.c:(.text+0x85e4): undefined reference to `lock_sock_nested' > ld: cgroup.c:(.text+0x8af2): undefined reference to `release_sock' > ld: kernel/bpf/cgroup.o: in function `__cgroup_bpf_run_filter_getsockopt': > cgroup.c:(.text+0x8fd6): undefined reference to `lock_sock_nested' > ld: cgroup.c:(.text+0x94e4): undefined reference to `release_sock' > > > Full randconfig file is attached. > These build errors still happen in linux-next of 20190705... -- ~Randy
Re: [PATCH v2] fs: Fix the default values of i_uid/i_gid on /proc/sys inodes.
On Fri, Jul 5, 2019 at 10:02 PM Luis Chamberlain wrote: > > > Please re-state the main fix in the commit log, not just the subject. Sure, I'll do this. Just to make sure - for every iteration on the commit message, I need to increment the patch "version" and resend the whole patch, right? > > Also, this does not explain why the current values are and the impact to > systems / users. This would help in determine and evaluating if this > deserves to be a stable fix. This commit a (much overdue) resend of https://lkml.org/lkml/2018/11/30/990 I think Eric's comment on the previous thread explained it best: > We spoke about this at LPC. And this is the correct behavioral change. > > The problem is there is a default value for i_uid and i_gid that is > correct in the general case. That default value is not corect for > sysctl, because proc is weird. As the sysctl permission check in > test_perm are all against GLOBAL_ROOT_UID and GLOBAL_ROOT_GID we did not > notice that i_uid and i_gid were being set wrong. > > So all this patch does is fix the default values i_uid and i_gid. If my new commit message is still not conveying this clearly, feel free to suggest the specific wording (I'm new to the kernel patch process, and I might not be explaining the problems well enough). > > > On Fri, Jul 05, 2019 at 06:30:21PM +0200, Radoslaw Burny wrote: > > This also fixes a problem where, in a user namespace without root user > > mapping, it is not possible to write to /proc/sys/kernel/shmmax. > > This does not explain why that should be possible and what impact this > limitation has. Writing to /proc/sys/kernel/shmmax allows setting a shared memory limit for that container. Since this is usually a part of container's initial configuration, one would expect that the container's owner / creator is able to set the limit. Yet, due to the bug described here, no process can write the container's shmmax if the container's user namespace does not contain root mapping. 
Using a container with no root mapping seems to be a rare case, but we do use this configuration at Google, which is how I found the issue. Also, we use a generic tool to configure the container limits, and the inability to write any of them causes a hard failure. > > > The problem was introduced by the combination of the two commits: > > * 81754357770ebd900801231e7bc8d151ddc00498: fs: Update > > i_[ug]id_(read|write) to translate relative to s_user_ns > > - this caused the kernel to write INVALID_[UG]ID to i_uid/i_gid > > members of /proc/sys inodes if a containing userns does not have > > entries for root in the uid/gid_map. > This is 2014 commit merged as of v4.8. > > > * 0bd23d09b874e53bd1a2fe2296030aa2720d7b08: vfs: Don't modify inodes > > with a uid or gid unknown to the vfs > > - changed the kernel to prevent opens for write if the i_uid/i_gid > > field in the inode is invalid > > This is a 2016 commit merged as of v4.8 as well... > > So regardless of the dates of the commits, are you saying this is a > regression you can confirm did not exist prior to v4.8? Did you test > v4.7 to confirm? I assume no one has noticed this issue before because it requires such a specific combination of triggers. Yes, I've tested this with older kernel versions. I've additionally tested a 4.8 build with just 0aa2720d7b08 reverted, confirming that the revert fixes the issue. > > > This commit fixes the issue by defaulting i_uid/i_gid to > > GLOBAL_ROOT_UID/GID. > > Why is this right? Quoting Eric: "the sysctl permission check in test_perm are all against GLOBAL_ROOT_UID and GLOBAL_ROOT_GID". The values in the inode are not even read during test_perm, but logically, the inode belongs to the root of the namespace. > > > Note that these values are not used for /proc/sys > > access checks, so the change does not otherwise affect /proc semantics. 
> > > > Tested: Used a repro program that creates a user namespace without any > > mapping and stat'ed /proc/$PID/root/proc/sys/kernel/shmmax from outside. > > Before the change, it shows the overflow uid, with the change it's 0. > > Why is the overflow uid bad for user experience? Did you test prior to > v4.8, ie on v4.7 to confirm this is indeed a regression? > > You'd want then to also ammend in the commit log a Fixes: tag with both > commits listed. If this is a stable fix (criteria yet to be determined), > then we'd need a stable tag. The overflow is technically correct; the uid in the inode is invalid, hence it must be displayed as overflow uid. The fact that the uid is invalid is the issue. Logically, this commit fixes 81754357770e (as that commit first introduced invalid uid/gid values). If you agree, I'll add this to my updated commit. > > Luis > > > Signed-off-by: Radoslaw Burny > > --- > > Changelog since v1: > > - Updated the commit title and description. > > > > fs/proc/proc_sysctl.c | 4 > > 1 file changed, 4 insertions(+) > > > > diff --git a/fs/proc/proc_sysctl.c
Re: [PATCH bpf-next] Enable zext optimization for more RV64G ALU ops
On 07/05/2019 02:18 AM, Luke Nelson wrote: > commit 66d0d5a854a6 ("riscv: bpf: eliminate zero extension code-gen") > added the new zero-extension optimization for some BPF ALU operations. > > Since then, bugs in the JIT that have been fixed in the bpf tree require > this optimization to be added to other operations: commit 1e692f09e091 > ("bpf, riscv: clear high 32 bits for ALU32 add/sub/neg/lsh/rsh/arsh"), > and commit fe121ee531d1 ("bpf, riscv: clear target register high 32-bits > for and/or/xor on ALU32") > > Now that these have been merged to bpf-next, the zext optimization can > be enabled for the fixed operations. > > Cc: Song Liu > Cc: Jiong Wang > Cc: Xi Wang > Signed-off-by: Luke Nelson Applied, thanks!
Quotes needed For July Shipments
Hello dear, We are in the market for your products after meeting at your stand during last expo. Please kindly send us your latest catalog and price list so as to start a new project/order as promised during the exhibition. I would appreciate your response about the above details required so we can revert back to you asap. Kind regards Rhema Zoeh
Re: [PATCH v2] gpiolib: Preserve desc->flags when setting state
Hi Chris, thanks for your patch! On Thu, Jul 4, 2019 at 6:21 AM Chris Packham wrote: > desc->flags may already have values set by of_gpiochip_add() so make > sure that this isn't undone when setting the initial direction. > > Fixes: 3edfb7bd76bd1cba ("gpiolib: Show correct direction from the beginning") > Signed-off-by: Chris Packham > --- > > Notes: > Changes in v2: > - add braces to avoid ambiguious else warning This is almost the solution! > - if (chip->get_direction && gpiochip_line_is_valid(chip, i)) > - desc->flags = !chip->get_direction(chip, i) ? > - (1 << FLAG_IS_OUT) : 0; > - else > - desc->flags = !chip->direction_input ? > - (1 << FLAG_IS_OUT) : 0; > + if (chip->get_direction && gpiochip_line_is_valid(chip, i)) { > + if (!chip->get_direction(chip, i)) > + set_bit(FLAG_IS_OUT, &desc->flags); You need to clear_bit() in the reverse case. We just learned we can't assume anything about the flags here, like just assign them. > + } else { > + if (!chip->direction_input) > + set_bit(FLAG_IS_OUT, &desc->flags); Same here. Yours, Linus Walleij
Re: gpio desc flags being lost
On Wed, Jul 3, 2019 at 11:30 PM Chris Packham wrote: > The problem is caused by commit 3edfb7bd76bd1cba ("gpiolib: Show correct > direction from the beginning"). I'll see if I can whip up a patch to fix it. Oh. I think: if (chip->get_direction && gpiochip_line_is_valid(chip, i)) desc->flags = !chip->get_direction(chip, i) ? (1 << FLAG_IS_OUT) : 0; else desc->flags = !chip->direction_input ? (1 << FLAG_IS_OUT) : 0; Needs to have desc->flags |= ... &= ~ if (!chip->get_direction(chip, i)) desc->flags |= (1 << FLAG_IS_OUT); else desc->flags &= ~(1 << FLAG_IS_OUT); And the same for direction_input() Yours, Linus Walleij
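The difference between assigning desc->flags and updating a single bit in it can be shown with plain integers. A minimal userspace sketch, assuming FLAG_IS_OUT is a bit number (as the `1 << FLAG_IS_OUT` usage above implies); the value chosen here, FLAG_PRESET and the function names are hypothetical stand-ins, not gpiolib code:

```c
#include <stdbool.h>

/* Bit numbers for illustration; FLAG_PRESET stands in for whatever
 * of_gpiochip_add() may have set before the direction is initialised. */
#define FLAG_IS_OUT 1
#define FLAG_PRESET 4

/* Behaviour being fixed: plain assignment wipes every other flag. */
static unsigned long assign_direction(unsigned long flags, bool is_output)
{
	(void)flags;	/* previous contents are discarded */
	return is_output ? (1UL << FLAG_IS_OUT) : 0;
}

/* Suggested behaviour: set the bit for outputs, clear it for inputs,
 * and leave all other bits untouched. */
static unsigned long update_direction(unsigned long flags, bool is_output)
{
	if (is_output)
		flags |= 1UL << FLAG_IS_OUT;
	else
		flags &= ~(1UL << FLAG_IS_OUT);
	return flags;
}
```

The else branch matters as much as the if branch: only setting the bit would leave a stale FLAG_IS_OUT in place for a line that is actually an input, which is why the review of v2 asks for clear_bit() in the reverse case.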
Re: [PATCH net-next 0/2] net: mvpp2: Add classification based on the ETHER flow
On Fri, 5 Jul 2019 14:09:11 +0200, Maxime Chevallier wrote: > Hello everyone, > > This series adds support for classification of the ETHER flow in the > mvpp2 driver. > > The first patch allows detecting when a user specifies a flow_type that > isn't supported by the driver, while the second adds support for this > flow_type by adding the mapping between the ETHER_FLOW enum value and > the relevant classifier flow entries. LGTM
[GIT PULL] afs: Miscellany for 5.3
Hi Linus, Here's a set of minor changes for AFS for the next merge window: (1) Remove an unnecessary check in afs_unlink(). (2) Add a tracepoint for tracking callback management. (3) Add a tracepoint for afs_server object usage. (4) Use struct_size(). (5) Add mappings for AFS UAE abort codes to Linux error codes, using symbolic names rather than hex numbers in the .c file. David --- The following changes since commit 2cd42d19cffa0ec3dfb57b1b3e1a07a9bf4ed80a: afs: Fix setting of i_blocks (2019-06-20 18:12:02 +0100) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git tags/afs-next-20190628 for you to fetch changes up to 1eda8bab70ca7d353b4e865140eaec06fedbf871: afs: Add support for the UAE error table (2019-06-28 18:37:53 +0100) AFS development David Howells (4): afs: afs_unlink() doesn't need to check dentry->d_inode afs: Add some callback management tracepoints afs: Trace afs_server usage afs: Add support for the UAE error table Zhengyuan Liu (1): fs/afs: use struct_size() in kzalloc() fs/afs/callback.c | 20 --- fs/afs/cmservice.c | 5 +- fs/afs/dir.c | 21 fs/afs/file.c | 6 +-- fs/afs/fsclient.c | 2 +- fs/afs/inode.c | 17 +++--- fs/afs/internal.h | 18 +++ fs/afs/misc.c | 48 +++-- fs/afs/protocol_uae.h | 132 + fs/afs/rxrpc.c | 2 +- fs/afs/server.c| 39 +++--- fs/afs/server_list.c | 6 ++- fs/afs/write.c | 3 +- include/trace/events/afs.h | 132 + 14 files changed, 369 insertions(+), 82 deletions(-) create mode 100644 fs/afs/protocol_uae.h
Re: [PATCH] gpiolib: fix incorrect IRQ requesting of an active-low lineevent
On Fri, Jul 5, 2019 at 12:35 PM wrote: > For example, there is a button which drives level to be low when it is > pushed, and drivers level to be high when it is released. > We want to catch the event when the button is pushed. > > In user space we configure a line event with the following code: > > req.handleflags = GPIOHANDLE_REQUEST_INPUT; > req.eventflags = GPIOEVENT_REQUEST_FALLING_EDGE; But *THIS* is the case that should have GPIOHANDLE_REQUEST_ACTIVE_LOW, because you push the button to activate it (it is inactive when not pushed). Also this should have GPIOEVENT_REQUEST_RISING_EDGE. > Run the same logic on another board which the polarity of the button is > inverted. The button drives level to be high when it is pushed. > For the inverted level case, we have to add flag > GPIOHANDLE_REQUEST_ACTIVE_LOW: > > req.handleflags = GPIOHANDLE_REQUEST_INPUT | GPIOHANDLE_REQUEST_ACTIVE_LOW; > req.eventflags = GPIOEVENT_REQUEST_FALLING_EDGE; This one should not be active low. And also have GPIOEVENT_REQUEST_RISING_EDGE. However I agree that the semantic should change as in the patch, it makes most logical sense. The reason it looks as it does is because GPIO line values and interrupts are two separate subsystems inside the kernel with their own flags (as you've seen). But you are right, userspace has no idea about that and should not have to care. Yours, Linus Walleij
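One way to read the semantic agreed on above is that the edges userspace requests are logical, so on an active-low line a logical rising edge corresponds to a physical falling edge. The patch body is not quoted in this reply, so the following is only a sketch under that assumption; the flag names and values are hypothetical and deliberately do not reuse the real GPIOEVENT_* or IRQF_* identifiers:

```c
#include <stdbool.h>

/* Illustrative flag values, not the real UAPI or IRQ constants. */
#define EV_RISING_EDGE   (1u << 0)	/* logical edge requested by userspace */
#define EV_FALLING_EDGE  (1u << 1)
#define TRIG_RISING      (1u << 0)	/* physical edge the IRQ should fire on */
#define TRIG_FALLING     (1u << 1)

/* Translate logical edge requests into physical IRQ triggers. */
static unsigned int physical_triggers(unsigned int eventflags, bool active_low)
{
	unsigned int trig = 0;

	if (eventflags & EV_RISING_EDGE)
		trig |= active_low ? TRIG_FALLING : TRIG_RISING;
	if (eventflags & EV_FALLING_EDGE)
		trig |= active_low ? TRIG_RISING : TRIG_FALLING;
	return trig;
}
```

With this mapping, the button that pulls the line low when pushed is requested as active-low plus rising edge (the push is the inactive-to-active transition), and the board with inverted polarity uses the same rising-edge request without the active-low flag.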
Re: [PATCH bpf-next 1/2] bpf, libbpf: add a new API bpf_object__reuse_maps()
On 07/05/2019 10:44 PM, Anton Protopopov wrote: > Add a new API bpf_object__reuse_maps() which can be used to replace all maps > in > an object by maps pinned to a directory provided in the path argument. > Namely, > each map M in the object will be replaced by a map pinned to path/M.name. > > Signed-off-by: Anton Protopopov > --- > tools/lib/bpf/libbpf.c | 34 ++ > tools/lib/bpf/libbpf.h | 2 ++ > tools/lib/bpf/libbpf.map | 1 + > 3 files changed, 37 insertions(+) > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c > index 4907997289e9..84c9e8f7bfd3 100644 > --- a/tools/lib/bpf/libbpf.c > +++ b/tools/lib/bpf/libbpf.c > @@ -3144,6 +3144,40 @@ int bpf_object__unpin_maps(struct bpf_object *obj, > const char *path) > return 0; > } > > +int bpf_object__reuse_maps(struct bpf_object *obj, const char *path) > +{ > + struct bpf_map *map; > + > + if (!obj) > + return -ENOENT; > + > + if (!path) > + return -EINVAL; > + > + bpf_object__for_each_map(map, obj) { > + int len, err; > + int pinned_map_fd; > + char buf[PATH_MAX]; We'd need to skip the case of bpf_map__is_internal(map) since they are always recreated for the given object. > + len = snprintf(buf, PATH_MAX, "%s/%s", path, > bpf_map__name(map)); > + if (len < 0) { > + return -EINVAL; > + } else if (len >= PATH_MAX) { > + return -ENAMETOOLONG; > + } > + > + pinned_map_fd = bpf_obj_get(buf); > + if (pinned_map_fd < 0) > + return pinned_map_fd; Should we rather have a new map definition attribute that tells to reuse the map if it's pinned in bpf fs, and if not, we create it and later on pin it? This is what iproute2 is doing and which we're making use of heavily. In bpf_object__reuse_maps() bailing out if bpf_obj_get() fails is perhaps too limiting for a generic API as new version of an object file may contain new maps which are not yet present in bpf fs at that point. 
> + err = bpf_map__reuse_fd(map, pinned_map_fd); > + if (err) > + return err; > + } > + > + return 0; > +} > + > int bpf_object__pin_programs(struct bpf_object *obj, const char *path) > { > struct bpf_program *prog; > diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h > index d639f47e3110..7fe465a1be76 100644 > --- a/tools/lib/bpf/libbpf.h > +++ b/tools/lib/bpf/libbpf.h > @@ -82,6 +82,8 @@ int bpf_object__variable_offset(const struct bpf_object > *obj, const char *name, > LIBBPF_API int bpf_object__pin_maps(struct bpf_object *obj, const char > *path); > LIBBPF_API int bpf_object__unpin_maps(struct bpf_object *obj, > const char *path); > +LIBBPF_API int bpf_object__reuse_maps(struct bpf_object *obj, > + const char *path); > LIBBPF_API int bpf_object__pin_programs(struct bpf_object *obj, > const char *path); > LIBBPF_API int bpf_object__unpin_programs(struct bpf_object *obj, > diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map > index 2c6d835620d2..66a30be6696c 100644 > --- a/tools/lib/bpf/libbpf.map > +++ b/tools/lib/bpf/libbpf.map > @@ -172,5 +172,6 @@ LIBBPF_0.0.4 { > btf_dump__new; > btf__parse_elf; > bpf_object__load_xattr; > + bpf_object__reuse_maps; > libbpf_num_possible_cpus; > } LIBBPF_0.0.3; >
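Two pieces of the review above can be sketched without libbpf: the path construction with its truncation check, and the per-map decision the reviewer argues for (skip internal maps, reuse a map that is already pinned, otherwise create it and pin it later). This is a self-contained model only; build_pin_path(), choose_map_action() and the enum are hypothetical names, and no libbpf call is made:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Build "<dir>/<name>" into buf, mirroring the snprintf() checks in the
 * patch: a negative return is an error, and a return >= size means the
 * result was truncated. */
static int build_pin_path(char *buf, size_t size,
			  const char *dir, const char *name)
{
	int len = snprintf(buf, size, "%s/%s", dir, name);

	if (len < 0)
		return -EINVAL;
	if ((size_t)len >= size)
		return -ENAMETOOLONG;
	return 0;
}

/* Hypothetical per-map policy following the review comments. */
enum map_action {
	MAP_SKIP_INTERNAL,	/* internal maps are always recreated */
	MAP_REUSE_PINNED,	/* a pinned map exists: reuse its fd */
	MAP_CREATE_AND_PIN,	/* not pinned yet: create now, pin later */
};

static enum map_action choose_map_action(bool is_internal, bool pin_exists)
{
	if (is_internal)
		return MAP_SKIP_INTERNAL;
	return pin_exists ? MAP_REUSE_PINNED : MAP_CREATE_AND_PIN;
}
```

The third case is the one the posted patch does not cover: it returns the bpf_obj_get() error as soon as one map is missing from the directory, which is what the reviewer calls too limiting when a newer object file adds maps.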
Re: [patch V2 01/25] x86/kgbd: Use NMI_VECTOR not APIC_DM_NMI
On Thu, 4 Jul 2019, Thomas Gleixner wrote:

> apic->send_IPI_allbutself() takes a vector number as argument.
>
> APIC_DM_NMI is clearly not a vector number. It's defined to 0x400 which is
> outside the vector space.
>
> Use NMI_VECTOR instead as that's what it is intended to be.
>
> Fixes: 82da3ff89dc2 ("x86: kgdb support")
> Signed-off-by: Thomas Gleixner
> ---
> V2: New patch
> ---
>  arch/x86/kernel/kgdb.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- a/arch/x86/kernel/kgdb.c
> +++ b/arch/x86/kernel/kgdb.c
> @@ -424,7 +424,7 @@ static void kgdb_disable_hw_debug(struct
>   */
>  void kgdb_roundup_cpus(void)
>  {
> -	apic->send_IPI_allbutself(APIC_DM_NMI);
> +	apic->send_IPI_allbutself(VECTOR_NMI);

The changelog got it right, but this here needs to be NMI_VECTOR. While I
didn't, 0-day was able to find and turn on the config option ...

/blush
[GIT PULL] Keys: Set 4 - Key ACLs for 5.3
Hi Linus,

Here's my fourth block of keyrings changes for the next merge window. They
change the permissions model used by keys and keyrings to be based on an
internal ACL by the following means:

 (1) Replace the permissions mask internally with an ACL that contains a
     list of ACEs, each with a specific subject with a permissions mask.
     Potted default ACLs are available for new keys and keyrings.

     ACE subjects can be macroised to indicate the UID and GID specified on
     the key (which remain). Future commits will be able to add additional
     subject types, such as specific UIDs or domain tags/namespaces.

     Also split a number of permissions to give finer control. Examples
     include splitting the revocation permit from the change-attributes
     permit, thereby allowing someone to be granted permission to revoke a
     key without allowing them to change the owner; also the ability to
     join a keyring is split from the ability to link to it, thereby
     stopping a process accessing a keyring by joining it and thus
     acquiring use of possessor permits.

 (2) Provide a keyctl to allow the granting or denial of one or more
     permits to a specific subject. Direct access to the ACL is not
     granted, and the ACL cannot be viewed.

David --- The following changes since commit a58946c158a040068e7c94dc1d58bbd273258068: keys: Pass the network namespace into request_key mechanism (2019-06-27 23:02:12 +0100) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git tags/keys-acl-20190703 for you to fetch changes up to 7a1ade847596dadc94b37e49f8c03f167fd71748: keys: Provide KEYCTL_GRANT_PERMISSION (2019-07-03 13:05:22 +0100) Keyrings ACL David Howells (2): keys: Replace uid/gid/perm permissions checking with an ACL keys: Provide KEYCTL_GRANT_PERMISSION Documentation/security/keys/core.rst | 128 ++-- Documentation/security/keys/request-key.rst| 9 +- certs/blacklist.c | 7 +- certs/system_keyring.c | 12 +- drivers/md/dm-crypt.c | 2 +- drivers/nvdimm/security.c | 2 +- fs/afs/security.c | 2 +- fs/cifs/cifs_spnego.c | 25 +- fs/cifs/cifsacl.c | 28 +- fs/cifs/connect.c | 4 +- fs/crypto/keyinfo.c| 2 +- fs/ecryptfs/ecryptfs_kernel.h | 2 +- fs/ecryptfs/keystore.c | 2 +- fs/fscache/object-list.c | 2 +- fs/nfs/nfs4idmap.c | 30 +- fs/ubifs/auth.c| 2 +- include/linux/key.h| 121 +++ include/uapi/linux/keyctl.h| 65 lib/digsig.c | 2 +- net/ceph/ceph_common.c | 2 +- net/dns_resolver/dns_key.c | 12 +- net/dns_resolver/dns_query.c | 15 +- net/rxrpc/key.c| 19 +- net/wireless/reg.c | 6 +- security/integrity/digsig.c| 31 +- security/integrity/digsig_asymmetric.c | 2 +- security/integrity/evm/evm_crypto.c| 2 +- security/integrity/ima/ima_mok.c | 13 +- security/integrity/integrity.h | 6 +- .../integrity/platform_certs/platform_keyring.c| 14 +- security/keys/compat.c | 2 + security/keys/encrypted-keys/encrypted.c | 2 +- security/keys/encrypted-keys/masterkey_trusted.c | 2 +- security/keys/gc.c | 2 +- security/keys/internal.h | 16 +- security/keys/key.c| 29 +- security/keys/keyctl.c | 104 -- security/keys/keyring.c| 27 +- security/keys/permission.c | 361 +++-- security/keys/persistent.c | 27 +- security/keys/proc.c | 22 +- security/keys/process_keys.c | 86 +++-- 
security/keys/request_key.c| 34 +- security/keys/request_key_auth.c | 15 +- security/selinux/hooks.c | 16 +- security/smack/smack_lsm.c | 3 +- 46 files changed, 992 insertions(+), 325 deletions(-)
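The ACL model described in this pull request (a list of ACEs, each pairing a subject with a permissions mask, with revoke split from change-attributes and join split from link) can be pictured with a small userspace sketch. Everything here is an assumption made for illustration: the `demo_*` names, the bit values, and the rule that permits from all ACEs matching a subject are OR-ed together are not taken from the kernel patches.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical permit bits; names and values are illustrative only. */
enum demo_perm {
	DEMO_PERM_VIEW    = 1 << 0,
	DEMO_PERM_SETATTR = 1 << 1,	/* change-attributes permit */
	DEMO_PERM_REVOKE  = 1 << 2,	/* split out from SETATTR */
	DEMO_PERM_LINK    = 1 << 3,
	DEMO_PERM_JOIN    = 1 << 4,	/* split out from LINK */
};

/* Hypothetical subject kinds an ACE can name. */
enum demo_subject {
	DEMO_SUBJ_POSSESSOR,
	DEMO_SUBJ_OWNER,
	DEMO_SUBJ_GROUP,
	DEMO_SUBJ_EVERYONE,
};

struct demo_ace {
	enum demo_subject subject;
	uint32_t perms;
};

struct demo_acl {
	size_t nr_ace;
	const struct demo_ace *aces;
};

/*
 * Collect the permits of every ACE that names the subject and check
 * that all wanted permits are present (an assumed combination rule).
 */
bool demo_acl_allows(const struct demo_acl *acl, enum demo_subject who,
		     uint32_t want)
{
	uint32_t granted = 0;

	for (size_t i = 0; i < acl->nr_ace; i++)
		if (acl->aces[i].subject == who)
			granted |= acl->aces[i].perms;

	return (granted & want) == want;
}
```

The point of the split permits shows up directly: a subject can hold `DEMO_PERM_REVOKE` without `DEMO_PERM_SETATTR`, which a single combined bit could not express.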
Re: Re: [PATCH v2 3/7] rtc: mt6397: improvements of rtc driver
On 05/07/2019 17:35:46+0200, Frank Wunderlich wrote:
> Hi Alexander,
>
> thank you for the Review
>
> > Sent: Thursday, 04 July 2019 at 22:43
> > From: "Alexandre Belloni"
>
> > > -	rtc->rtc_dev = devm_rtc_allocate_device(rtc->dev);
> > > -	if (IS_ERR(rtc->rtc_dev))
> > > -		return PTR_ERR(rtc->rtc_dev);
> > > +	ret = devm_request_threaded_irq(&pdev->dev, rtc->irq, NULL,
> > > +					mtk_rtc_irq_handler_thread,
> > > +					IRQF_ONESHOT | IRQF_TRIGGER_HIGH,
> > > +					"mt6397-rtc", rtc);
> > >
> >
> > This change may lead to a crash and the allocation was intentionally
> > placed before the irq request.
>
> i got no crash till now, but i will try to move the allocation before
> irq-request
>

Let's say the RTC has been used to start your platform, then the irq
handler will be called as soon as the irq is requested, leading to a
null pointer dereference.

> > > -	ret = request_threaded_irq(rtc->irq, NULL,
> > > -				   mtk_rtc_irq_handler_thread,
> > > -				   IRQF_ONESHOT | IRQF_TRIGGER_HIGH,
> > > -				   "mt6397-rtc", rtc);
> > >  	if (ret) {
> > >  		dev_err(&pdev->dev, "Failed to request alarm IRQ: %d: %d\n",
> > >  			rtc->irq, ret);
> > > @@ -287,6 +281,10 @@ static int mtk_rtc_probe(struct platform_device *pdev)
> > >
> > >  	device_init_wakeup(&pdev->dev, 1);
> > >
> > > +	rtc->rtc_dev = devm_rtc_allocate_device(&pdev->dev);
> > > +	if (IS_ERR(rtc->rtc_dev))
> > > +		return PTR_ERR(rtc->rtc_dev);
> > > +
> > >  	rtc->rtc_dev->ops = &mtk_rtc_ops;
> >
> > >  static const struct of_device_id mt6397_rtc_of_match[] = {
> > > +	{ .compatible = "mediatek,mt6323-rtc", },
> >
> > Unrelated change, this is not an improvement and must be accompanied by
> > a documentation change.
>
> documentation is changed in 1/7 defining this compatible. i called it
> improvement because existing driver now supports another chip
>

Yes and IIRC, I did comment that the rtc change also had to be separated
from 1/7. Also, I really doubt this new compatible is necessary at all as
you could simply directly use mediatek,mt6397-rtc.

-- Alexandre Belloni, Bootlin Embedded Linux and Kernel engineering https://bootlin.com
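The ordering problem raised in this review can be reproduced in a small userspace model. This is a sketch under assumptions, not driver code: `demo_request_irq()` stands in for the irq request and simply runs the handler at once when an alarm is already pending, and the handler returns -1 at the point where the real one would dereference a NULL `rtc_dev`. All `demo_*` names are hypothetical.

```c
#include <stddef.h>

struct demo_rtc_dev {
	int alarms_seen;
};

struct demo_rtc {
	struct demo_rtc_dev *rtc_dev;	/* NULL until allocated */
	int irq_pending;		/* alarm already latched, e.g. RTC woke the board */
};

/* The handler reports the alarm through rtc->rtc_dev. */
static int demo_irq_handler(struct demo_rtc *rtc)
{
	if (!rtc->rtc_dev)
		return -1;	/* the real handler would dereference NULL here */
	rtc->rtc_dev->alarms_seen++;
	return 0;
}

/* Stand-in for the irq request: a pending interrupt fires immediately. */
static int demo_request_irq(struct demo_rtc *rtc)
{
	if (rtc->irq_pending) {
		rtc->irq_pending = 0;
		return demo_irq_handler(rtc);
	}
	return 0;
}

/* Order used by the patch under review: request the irq, then allocate. */
int demo_probe_irq_first(struct demo_rtc *rtc, struct demo_rtc_dev *dev)
{
	int ret = demo_request_irq(rtc);

	rtc->rtc_dev = dev;
	return ret;
}

/* Order the driver used on purpose: allocate, then request the irq. */
int demo_probe_alloc_first(struct demo_rtc *rtc, struct demo_rtc_dev *dev)
{
	rtc->rtc_dev = dev;
	return demo_request_irq(rtc);
}
```

A probe that never sees a pending alarm works in either order, which would explain why no crash was observed in testing.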
Re: [ANNOUNCE] trace-cmd v2.8.1
Cool !! On 12:34 Fri 05 Jul , Steven Rostedt wrote: Just after releasing 2.8, some bugs were found (isn't that always the case?). Now we have 2.8.1 stable release: http://trace-cmd.org -- Steve Short log here: Greg Thelen (2): trace-cmd: Always initialize write_record() len trace-cmd: Avoid using uninitialized handle Steven Rostedt (VMware) (1): trace-cmd: Version 2.8.1 Tzvetomir Stoyanov (VMware) (1): trace-cmd: Do not free pages from the lookup table in struct cpu_data in case trace file is loaded.
[GIT PULL] Keys: Set 3 - Keyrings namespacing for 5.3
Here's my third block of keyrings changes for the next merge window.

These patches help make keys and keyrings more namespace aware.

Firstly some miscellaneous patches to make the process easier:

 (1) Simplify key index_key handling so that the word-sized chunks
     assoc_array requires don't have to be shifted about, making it easier
     to add more bits into the key.

 (2) Cache the hash value in the key so that we don't have to calculate on
     every key we examine during a search (it involves a bunch of
     multiplications).

 (3) Allow keyring_search() to search non-recursively.

Then the main patches:

 (4) Make it so that keyring names are per-user_namespace from the point of
     view of KEYCTL_JOIN_SESSION_KEYRING so that they're not accessible
     cross-user_namespace.

     keyctl_capabilities() shows KEYCTL_CAPS1_NS_KEYRING_NAME for this.

 (5) Move the user and user-session keyrings to the user_namespace rather
     than the user_struct. This prevents them propagating directly across
     user_namespaces boundaries (ie. the KEY_SPEC_* flags will only pick
     from the current user_namespace).

 (6) Make it possible to include the target namespace in which the key
     shall operate in the index_key. This will allow the possibility of
     multiple keys with the same description, but different target domains
     to be held in the same keyring.

     keyctl_capabilities() shows KEYCTL_CAPS1_NS_KEY_TAG for this.

 (7) Make it so that keys are implicitly invalidated by removal of a domain
     tag, causing them to be garbage collected.

 (8) Institute a network namespace domain tag that allows keys to be
     differentiated by the network namespace in which they operate. New
     keys that are of a type marked 'KEY_TYPE_NET_DOMAIN' are assigned the
     network domain in force when they are created.

 (9) Make it so that the desired network namespace can be handed down into
     the request_key() mechanism. This allows AFS, NFS, etc. to request
     keys specific to the network namespace of the superblock.

This also means that the keys in the DNS record cache are thenceforth namespaced, provided network filesystems pass the appropriate network namespace down into dns_query(). For DNS, AFS and NFS are good, whilst CIFS and Ceph are not. Other cache keyrings, such as idmapper keyrings, also need to set the domain tag - for which they need access to the network namespace of the superblock. David --- The following changes since commit 3b8c4a08a471d56ecaaca939c972fdf5b8255629: keys: Kill off request_key_async{,_with_auxdata} (2019-06-26 20:58:13 +0100) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git tags/keys-namespace-20190627 for you to fetch changes up to a58946c158a040068e7c94dc1d58bbd273258068: keys: Pass the network namespace into request_key mechanism (2019-06-27 23:02:12 +0100) Keyrings namespacing David Howells (9): keys: Simplify key description management keys: Cache the hash value to avoid lots of recalculation keys: Add a 'recurse' flag for keyring searches keys: Namespace keyring names keys: Move the user and user-session keyrings to the user_namespace keys: Include target namespace in match criteria keys: Garbage collect keys for which the domain has been removed keys: Network namespace domain tag keys: Pass the network namespace into request_key mechanism Documentation/security/keys/core.rst| 38 ++-- Documentation/security/keys/request-key.rst | 29 ++- certs/blacklist.c | 2 +- crypto/asymmetric_keys/asymmetric_type.c| 2 +- fs/afs/addr_list.c | 4 +- fs/afs/dynroot.c| 8 +- fs/cifs/dns_resolve.c | 3 +- fs/nfs/dns_resolve.c| 3 +- fs/nfs/nfs4idmap.c | 2 +- include/linux/dns_resolver.h| 3 +- include/linux/key-type.h| 3 + include/linux/key.h | 81 - include/linux/sched/user.h | 14 -- include/linux/user_namespace.h | 12 +- include/net/net_namespace.h | 3 + include/uapi/linux/keyctl.h | 2 + kernel/user.c | 8 +- kernel/user_namespace.c | 9 +- lib/digsig.c| 2 +- net/ceph/messenger.c| 3 +- 
net/core/net_namespace.c| 20 +++ net/dns_resolver/dns_key.c | 1 + net/dns_resolver/dns_query.c| 7 +- net/rxrpc/key.c | 6 +- net/rxrpc/security.c|
Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust
On 05/07/2019 21:49, Paolo Bonzini wrote: > On 05/07/19 22:25, Thomas Gleixner wrote: >> In practice, this makes Linux vulnerable to CVE-2011-1898 / XSA-3, which >> I'm disappointed to see wasn't shared with other software vendors at the >> time. > Oh, that brings back memories. At the time I was working on Xen, so I > remember that CVE. IIRC there was some mitigation but the fix was > basically to print a very scary error message if you used VT-d without > interrupt remapping. Maybe force the user to add something on the Xen > command line too? It was before my time. I have no public comment on how the other aspects of it were handled. >> Is there any serious usage of virtualization w/o interrupt remapping left >> or have the machines which are not capable been retired already? > I think they were already starting to disappear in 2011, as I don't > remember much worry about customers that were using systems without it. ISTR Nehalem/Westmere era systems were the first to support interrupt remapping, but were totally crippled with errata to the point of needing to turn a prerequisite feature (Queued Invalidation) off. I believe later systems have it working to a first approximation. As to the original question, whether people should be using such systems is a different question to whether they actually are. ~Andrew
[GIT PULL] Keys: Set 2 - request_key() improvements for 5.3
Hi Linus, Here's my second block of keyrings changes for the next merge window. These are all request_key()-related, including a fix and some improvements: (1) Fix the lack of a Link permission check on a key found by request_key(), thereby enabling request_key() to link keys that don't grant this permission to the target keyring (which must still grant Write permission). Note that the key must be in the caller's keyrings already to be found. (2) Invalidate used request_key authentication keys rather than revoking them, so that they get cleaned up immediately rather than hanging around till the expiry time is passed. (3) Move the RCU locks outwards from the keyring search functions so that a request_key_rcu() can be provided. This can be called in RCU mode, so it can't sleep and can't upcall - but it can be called from LOOKUP_RCU pathwalk mode. (4) Cache the latest positive result of request_key*() temporarily in task_struct so that filesystems that make a lot of request_key() calls during pathwalk can take advantage of it to avoid having to redo the searching. This requires CONFIG_KEYS_REQUEST_CACHE=y. It is assumed that the key just found is likely to be used multiple times in each step in an RCU pathwalk, and is likely to be reused for the next step too. Note that the cleanup of the cache is done on TIF_NOTIFY_RESUME, just before userspace resumes, and on exit. 
David --- The following changes since commit 45e0f30c30bb131663fbe1752974d6f2e39611e2: keys: Add capability-checking keyctl function (2019-06-19 13:27:45 +0100) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git tags/keys-request-20190626 for you to fetch changes up to 3b8c4a08a471d56ecaaca939c972fdf5b8255629: keys: Kill off request_key_async{,_with_auxdata} (2019-06-26 20:58:13 +0100) request_key improvements David Howells (6): keys: Fix request_key() lack of Link perm check on found key keys: Invalidate used request_key authentication keys keys: Move the RCU locks outwards from the keyring search functions keys: Provide request_key_rcu() keys: Cache result of request_key*() temporarily in task_struct keys: Kill off request_key_async{,_with_auxdata} Documentation/security/keys/core.rst| 38 ++-- Documentation/security/keys/request-key.rst | 33 +++ include/keys/request_key_auth-type.h| 1 + include/linux/key.h | 14 +-- include/linux/sched.h | 5 + include/linux/tracehook.h | 7 ++ kernel/cred.c | 9 ++ security/keys/Kconfig | 18 security/keys/internal.h| 6 +- security/keys/key.c | 4 +- security/keys/keyring.c | 16 ++-- security/keys/proc.c| 4 +- security/keys/process_keys.c| 41 - security/keys/request_key.c | 137 ++-- security/keys/request_key_auth.c| 60 +++- 15 files changed, 229 insertions(+), 164 deletions(-)
[GIT PULL] Keys: Set 1 - Miscellany for 5.3
Hi Linus,

Here's my first block of keyrings changes for the next merge window. I've
divided up the set into four blocks, but they need to be applied in order
as they would otherwise conflict with each other.

These are some miscellaneous keyrings fixes and improvements:

 (1) Fix a bunch of warnings from sparse, including missing RCU bits and
     kdoc-function argument mismatches.

 (2) Implement a keyctl to allow a key to be moved from one keyring to
     another, with the option of prohibiting key replacement in the
     destination keyring.

 (3) Grant Link permission to possessors of request_key_auth tokens so that
     upcall servicing daemons can more easily arrange things such that only
     the necessary auth key is passed to the actual service program, and
     not all the auth keys a daemon might possess.

 (4) Improvement in lookup_user_key().

 (5) Implement a keyctl to allow keyrings subsystem capabilities to be
     queried.

The keyutils next branch has commits to make available, document and test
the move-key and capabilities code:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log

They're currently on the 'next' branch.

David --- The following changes since commit a188339ca5a396acc588e5851ed7e19f66b0ebd9: Linux 5.2-rc1 (2019-05-19 15:47:09 -0700) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git tags/keys-misc-20190619 for you to fetch changes up to 45e0f30c30bb131663fbe1752974d6f2e39611e2: keys: Add capability-checking keyctl function (2019-06-19 13:27:45 +0100) Keyrings miscellany David Howells (9): keys: sparse: Fix key_fs[ug]id_changed() keys: sparse: Fix incorrect RCU accesses keys: sparse: Fix kdoc mismatches keys: Change keyring_serialise_link_sem to a mutex keys: Break bits out of key_unlink() keys: Hoist locking out of __key_link_begin() keys: Add a keyctl to move a key between keyrings keys: Grant Link permission to possessers of request_key auth keys keys: Add capability-checking keyctl function Eric Biggers (1): keys: Reuse keyring_index_key::desc_len in lookup_user_key() Documentation/security/keys/core.rst | 21 +++ include/linux/key.h | 13 +- include/uapi/linux/keyctl.h | 17 +++ kernel/cred.c| 4 +- security/keys/compat.c | 6 + security/keys/internal.h | 7 + security/keys/key.c | 27 +++- security/keys/keyctl.c | 90 +++- security/keys/keyring.c | 278 --- security/keys/process_keys.c | 26 ++-- security/keys/request_key.c | 9 +- security/keys/request_key_auth.c | 4 +- 12 files changed, 418 insertions(+), 84 deletions(-)
Re: [PATCH] cpu/hotplug: Cache number of online CPUs
On Fri, 5 Jul 2019, Thomas Gleixner wrote:
> On Fri, 5 Jul 2019, Mathieu Desnoyers wrote:
> > - On Jul 5, 2019, at 4:49 AM, Ingo Molnar mi...@kernel.org wrote:
> > > * Mathieu Desnoyers wrote:
> > >> The semantic I am looking for here is C11's relaxed atomics.
> > >
> > > What does this mean?
> >
> > C11 states:
> >
> > "Atomic operations specifying memory_order_relaxed are relaxed only with
> > respect to memory ordering. Implementations must still guarantee that any
> > given atomic access to a particular atomic object be indivisible with
> > respect to all other atomic accesses to that object."
> >
> > So I am concerned that num_online_cpus() as proposed in this patch
> > tries to access __num_online_cpus non-atomically, and without using
> > READ_ONCE().
> >
> > Similarly, the update-side should use WRITE_ONCE(). Protecting with a mutex
> > does not provide mutual exclusion against concurrent readers of that
> > variable.
>
> Again. This is nothing new. The current implementation of num_online_cpus()
> has no guarantees whatsoever.
>
> bitmap_hweight() can be hit by a concurrent update of the mask it is
> looking at.
>
> num_online_cpus() gives you only the correct number if you invoke it inside
> a cpuhp_lock held section. So why do we need that fuzz about atomicity now?
>
> It's racy and was racy forever and even if we add that READ/WRITE_ONCE muck
> then it still won't give you a reliable answer unless you hold cpuhp_lock at
> least for read. So for me that READ/WRITE_ONCE is just a cosmetic and
> misleading reality distortion.

That said. If it makes everyone happy and feel better, I'm happy to add it
along with a big fat comment which explains that it's just preventing a
theoretical store/load tearing issue and does not provide any guarantees
other than that.

Thanks,

	tglx
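The distinction argued over in this thread can be shown in userspace with C11 relaxed atomics, which is the semantic Mathieu refers to. This is a minimal sketch, assuming writers are already serialised by an external lock (standing in for the hotplug lock, which is not modelled); the `demo_*` names are hypothetical and this is not the kernel implementation. The relaxed accesses only rule out torn loads and stores. They do not make the returned count reliable, which is exactly Thomas's point.

```c
#include <stdatomic.h>

/* Cached count of online CPUs; starts with the boot CPU. */
static atomic_uint demo_num_online = 1;

/*
 * Update side. Callers are assumed to hold the hotplug lock, so a plain
 * load + store pair is enough; the relaxed store plays the role of
 * WRITE_ONCE() and cannot be torn.
 */
void demo_set_cpu_online(int online)
{
	unsigned int n = atomic_load_explicit(&demo_num_online,
					      memory_order_relaxed);

	atomic_store_explicit(&demo_num_online, online ? n + 1 : n - 1,
			      memory_order_relaxed);
}

/*
 * Read side. The relaxed load plays the role of READ_ONCE(): the value
 * is read indivisibly, but it may be stale by the time the caller uses
 * it unless the caller holds the hotplug lock.
 */
unsigned int demo_num_online_cpus(void)
{
	return atomic_load_explicit(&demo_num_online, memory_order_relaxed);
}
```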
Re: [PATCH] mtd: spinand: Fix max_bad_eraseblocks_per_lun info in memorg
On Thu, 2019-06-06 at 17:07:55 UTC, Schrempf Frieder wrote: > From: Frieder Schrempf > > The 1Gb Macronix chip can have a maximum of 20 bad blocks, while > the 2Gb version has twice as many blocks and therefore the maximum > number of bad blocks is 40. > > The 4Gb GigaDevice GD5F4GQ4xA has twice as many blocks as its 2Gb > counterpart and therefore a maximum of 80 bad blocks. > > Fixes: 377e517b5fa5 ("mtd: nand: Add max_bad_eraseblocks_per_lun info to > memorg") > Reported-by: Emil Lenngren > Signed-off-by: Frieder Schrempf Applied to https://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git mtd/fixes, thanks. Miquel
Re: [PATCH] mtd: rawnand: ingenic: Fix ingenic_ecc dependency
On Sat, 2019-06-29 at 01:22:48 UTC, Paul Cercueil wrote: > If MTD_NAND_JZ4780 is y and MTD_NAND_JZ4780_BCH is m, > which select CONFIG_MTD_NAND_INGENIC_ECC to m, building fails: > > drivers/mtd/nand/raw/ingenic/ingenic_nand.o: In function > `ingenic_nand_remove': > ingenic_nand.c:(.text+0x177): undefined reference to `ingenic_ecc_release' > drivers/mtd/nand/raw/ingenic/ingenic_nand.o: In function > `ingenic_nand_ecc_correct': > ingenic_nand.c:(.text+0x2ee): undefined reference to `ingenic_ecc_correct' > > To fix that, the ingenic_nand and ingenic_ecc modules have been fused > into one single module. > - The ingenic_ecc.c code is now compiled in only if > $(CONFIG_MTD_NAND_INGENIC_ECC) is set. This is now a boolean instead > of tristate. > - To avoid changing the module name, the ingenic_nand.c file is moved to > ingenic_nand_drv.c. Then the module name is still ingenic_nand. > - Since ingenic_ecc.c is no more a module, the module-specific macros > have been dropped, and the functions are no more exported for use by > the ingenic_nand driver. > > Fixes: 15de8c6efd0e ("mtd: rawnand: ingenic: Separate top-level and SoC > specific code") > Signed-off-by: Paul Cercueil > Reported-by: Arnd Bergmann > Reported-by: Hulk Robot > Cc: YueHaibing > Cc: sta...@vger.kernel.org Applied to https://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git mtd/fixes, thanks. Miquel
Re: [PATCH] [STABLE backport 4.9] arm64, vdso: Define vdso_{start,end} as array
On Fri, Jul 05, 2019 at 08:47:20PM +0200, Arnd Bergmann wrote: From: Kees Cook Commit dbbb08f500d6146398b794fdc68a8e811366b451 upstream. Adjust vdso_{start|end} to be char arrays to avoid compile-time analysis that flags "too large" memcmp() calls with CONFIG_FORTIFY_SOURCE. Cc: Jisheng Zhang Acked-by: Catalin Marinas Suggested-by: Mark Rutland Signed-off-by: Kees Cook Signed-off-by: Will Deacon Signed-off-by: Arnd Bergmann --- Backported to 4.9, which is lacking the rework from 2077be6783b5 ("arm64: Use __pa_symbol for kernel symbols") I've queued both this and the 4.4 backport, thanks! -- Thanks, Sasha
Re: [PATCH] cpu/hotplug: Cache number of online CPUs
On Fri, 5 Jul 2019, Mathieu Desnoyers wrote:
> - On Jul 5, 2019, at 4:49 AM, Ingo Molnar mi...@kernel.org wrote:
> > * Mathieu Desnoyers wrote:
> >> The semantic I am looking for here is C11's relaxed atomics.
> >
> > What does this mean?
>
> C11 states:
>
> "Atomic operations specifying memory_order_relaxed are relaxed only with
> respect to memory ordering. Implementations must still guarantee that any
> given atomic access to a particular atomic object be indivisible with
> respect to all other atomic accesses to that object."
>
> So I am concerned that num_online_cpus() as proposed in this patch
> tries to access __num_online_cpus non-atomically, and without using
> READ_ONCE().
>
> Similarly, the update-side should use WRITE_ONCE(). Protecting with a mutex
> does not provide mutual exclusion against concurrent readers of that variable.

Again. This is nothing new. The current implementation of num_online_cpus()
has no guarantees whatsoever.

bitmap_hweight() can be hit by a concurrent update of the mask it is
looking at.

num_online_cpus() gives you only the correct number if you invoke it inside
a cpuhp_lock held section. So why do we need that fuzz about atomicity now?

It's racy and was racy forever and even if we add that READ/WRITE_ONCE muck
then it still won't give you a reliable answer unless you hold cpuhp_lock at
least for read. So for me that READ/WRITE_ONCE is just a cosmetic and
misleading reality distortion.

Thanks,

	tglx
Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust
On 05/07/19 22:25, Thomas Gleixner wrote: > In practice, this makes Linux vulnerable to CVE-2011-1898 / XSA-3, which > I'm disappointed to see wasn't shared with other software vendors at the > time. Oh, that brings back memories. At the time I was working on Xen, so I remember that CVE. IIRC there was some mitigation but the fix was basically to print a very scary error message if you used VT-d without interrupt remapping. Maybe force the user to add something on the Xen command line too? > The more interesting question is whether this is all relevant. If I > understood the issue correctly then this is mitigated by proper interrupt > remapping. Yes, and for Linux we're good I think. VFIO by default refuses to use the IOMMU if interrupt remapping is absent or disabled, and KVM's own (pre-VFIO) IOMMU support was removed a couple years ago. I guess the secure boot lockdown patches should outlaw VFIO's allow_unsafe_interrupts option, but that's it. > Is there any serious usage of virtualization w/o interrupt remapping left > or have the machines which are not capable been retired already? I think they were already starting to disappear in 2011, as I don't remember much worry about customers that were using systems without it. Paolo
Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust
On 05/07/2019 20:19, Nadav Amit wrote:
>> On Jul 5, 2019, at 8:47 AM, Andrew Cooper wrote:
>>
>> On 04/07/2019 16:51, Thomas Gleixner wrote:
>>> 2) The loop termination logic is interesting at best.
>>>
>>>    If the machine has no TSC or cpu_khz is not known yet it tries 1
>>>    million times to ack stale IRR/ISR bits. What?
>>>
>>>    With TSC it uses the TSC to calculate the loop termination. It takes a
>>>    timestamp at entry and terminates the loop when:
>>>
>>>        (rdtsc() - start_timestamp) >= (cpu_khz << 10)
>>>
>>>    That's roughly one second.
>>>
>>>    Both methods are problematic. The APIC has 256 vectors, which means
>>>    that in theory max. 256 IRR/ISR bits can be set. In practice this is
>>>    impossible as the first 32 vectors are reserved and not affected and
>>>    the chance that more than a few bits are set is close to zero.
>> [Disclaimer. I talked to Thomas in private first, and he asked me to
>> post this publicly as the CVE is almost a decade old already.]
>>
>> I'm afraid that this isn't quite true.
>>
>> In terms of IDT vectors, the first 32 are reserved for exceptions, but
>> only the first 16 are reserved in the LAPIC. Vectors 16-31 are fair
>> game for incoming IPIs (SDM Vol3, 10.5.2 Valid Interrupt Vectors).
>>
>> In practice, this makes Linux vulnerable to CVE-2011-1898 / XSA-3, which
>> I'm disappointed to see wasn't shared with other software vendors at the
>> time.
> IIRC (and from skimming the CVE again) the basic problem in Xen was that
> MSIs can be used when devices are assigned to generate IRQs with arbitrary
> vectors. The mitigation was to require interrupt remapping to be enabled in
> the IOMMU when IOMMU is used for DMA remapping (i.e., device assignment).
>
> Are you concerned about this case, additional concrete ones, or is it about
> security hardening? (or am I missing something?)

The phrase "impossible as the first 32 vectors are reserved" stuck out,
because it's not true. That generally means that any logic derived from
it is also false. :)

In practice, I was thinking more about robustness against buggy
conditions. Setting TPR to 1 at start of day is very easy. Some of the
other protections, less so.

When it comes to virtualisation, security is an illusion when a guest
kernel has a real piece of hardware in its hands. Anyone who is under
the misapprehension otherwise should try talking to an IOMMU hardware
engineer and see the reaction on their face.

IOMMUs were designed to isolate devices when all controlling software
was of the same privilege level. They don't magically make the system
safe against a hostile guest device driver, which in the most basic
case, can still mount a DoS attempt with deliberately bad DMA.

~Andrew
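The numbers quoted in this thread can be checked with a little arithmetic. The helpers below are a sketch with hypothetical `demo_*` names: `cpu_khz << 10` is a cycle budget, and since `cpu_khz` cycles elapse per millisecond, the budget corresponds to 1024 ms regardless of clock speed, which matches "roughly one second". The vector count follows Andrew's correction that only the first 16 vectors are reserved in the LAPIC.

```c
#include <stdint.h>

/* Cycle budget used by the loop termination check: cpu_khz << 10. */
uint64_t demo_timeout_cycles(uint32_t cpu_khz)
{
	return (uint64_t)cpu_khz << 10;
}

/* cpu_khz cycles elapse per millisecond, so cycles / cpu_khz is in ms. */
uint64_t demo_cycles_to_ms(uint64_t cycles, uint32_t cpu_khz)
{
	return cycles / cpu_khz;
}

/* Vectors that can show up in IRR/ISR if only the first 16 are reserved. */
unsigned int demo_deliverable_vectors(void)
{
	return 256 - 16;
}
```

For a 3 GHz part, `cpu_khz` is 3000000, the budget is 3072000000 cycles, and the conversion gives 1024 ms.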
Re: [PATCH] dax: Fix missed PMD wakeups
On Fri, Jul 5, 2019 at 12:10 PM Matthew Wilcox wrote: > > On Thu, Jul 04, 2019 at 04:27:14PM -0700, Dan Williams wrote: > > On Thu, Jul 4, 2019 at 12:14 PM Matthew Wilcox wrote: > > > > > > On Thu, Jul 04, 2019 at 06:54:50PM +0200, Jan Kara wrote: > > > > On Wed 03-07-19 20:27:28, Matthew Wilcox wrote: > > > > > So I think we're good for all current users. > > > > > > > > Agreed but it is an ugly trap. As I already said, I'd rather pay the > > > > unnecessary cost of waiting for pte entry and have an easy to understand > > > > interface. If we ever have a real world use case that would care for > > > > this > > > > optimization, we will need to refactor functions to make this possible > > > > and > > > > still keep the interfaces sane. For example get_unlocked_entry() could > > > > return special "error code" indicating that there's no entry with > > > > matching > > > > order in xarray but there's a conflict with it. That would be much less > > > > error-prone interface. > > > > > > This is an internal interface. I think it's already a pretty gnarly > > > interface to use by definition -- it's going to sleep and might return > > > almost anything. There's not much scope for returning an error indicator > > > either; value entries occupy half of the range (all odd numbers between 1 > > > and ULONG_MAX inclusive), plus NULL. We could use an internal entry, but > > > I don't think that makes the interface any easier to use than returning > > > a locked entry. > > > > > > I think this iteration of the patch makes it a little clearer. What do > > > you > > > think? > > > > > > > Not much clearer to me. get_unlocked_entry() is now misnamed and this > > misnamed? You'd rather it was called "try_get_unlocked_entry()"? I was thinking more along the lines of get_unlocked_but_sometimes_locked_entry(), i.e. per Jan's feedback to keep the interface simple.
[PATCH] rtc: zynqmp: One function call less in xlnx_rtc_alarm_irq_enable()
From: Markus Elfring
Date: Fri, 5 Jul 2019 22:37:58 +0200

Avoid an extra function call by using a ternary operator instead of
a conditional statement for a setting selection.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring
---
 drivers/rtc/rtc-zynqmp.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/rtc/rtc-zynqmp.c b/drivers/rtc/rtc-zynqmp.c
index 00639594de0c..4631019a54e2 100644
--- a/drivers/rtc/rtc-zynqmp.c
+++ b/drivers/rtc/rtc-zynqmp.c
@@ -124,11 +124,8 @@ static int xlnx_rtc_alarm_irq_enable(struct device *dev, u32 enabled)
 {
 	struct xlnx_rtc_dev *xrtcdev = dev_get_drvdata(dev);
 
-	if (enabled)
-		writel(RTC_INT_ALRM, xrtcdev->reg_base + RTC_INT_EN);
-	else
-		writel(RTC_INT_ALRM, xrtcdev->reg_base + RTC_INT_DIS);
-
+	writel(RTC_INT_ALRM,
+	       xrtcdev->reg_base + (enabled ? RTC_INT_EN : RTC_INT_DIS));
 	return 0;
 }
 
-- 
2.22.0
[PATCH bpf-next 1/2] bpf, libbpf: add a new API bpf_object__reuse_maps()
Add a new API bpf_object__reuse_maps() which can be used to replace all maps in an object by maps pinned to a directory provided in the path argument. Namely, each map M in the object will be replaced by a map pinned to path/M.name. Signed-off-by: Anton Protopopov --- tools/lib/bpf/libbpf.c | 34 ++ tools/lib/bpf/libbpf.h | 2 ++ tools/lib/bpf/libbpf.map | 1 + 3 files changed, 37 insertions(+) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 4907997289e9..84c9e8f7bfd3 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -3144,6 +3144,40 @@ int bpf_object__unpin_maps(struct bpf_object *obj, const char *path) return 0; } +int bpf_object__reuse_maps(struct bpf_object *obj, const char *path) +{ + struct bpf_map *map; + + if (!obj) + return -ENOENT; + + if (!path) + return -EINVAL; + + bpf_object__for_each_map(map, obj) { + int len, err; + int pinned_map_fd; + char buf[PATH_MAX]; + + len = snprintf(buf, PATH_MAX, "%s/%s", path, bpf_map__name(map)); + if (len < 0) { + return -EINVAL; + } else if (len >= PATH_MAX) { + return -ENAMETOOLONG; + } + + pinned_map_fd = bpf_obj_get(buf); + if (pinned_map_fd < 0) + return pinned_map_fd; + + err = bpf_map__reuse_fd(map, pinned_map_fd); + if (err) + return err; + } + + return 0; +} + int bpf_object__pin_programs(struct bpf_object *obj, const char *path) { struct bpf_program *prog; diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h index d639f47e3110..7fe465a1be76 100644 --- a/tools/lib/bpf/libbpf.h +++ b/tools/lib/bpf/libbpf.h @@ -82,6 +82,8 @@ int bpf_object__variable_offset(const struct bpf_object *obj, const char *name, LIBBPF_API int bpf_object__pin_maps(struct bpf_object *obj, const char *path); LIBBPF_API int bpf_object__unpin_maps(struct bpf_object *obj, const char *path); +LIBBPF_API int bpf_object__reuse_maps(struct bpf_object *obj, + const char *path); LIBBPF_API int bpf_object__pin_programs(struct bpf_object *obj, const char *path); LIBBPF_API int bpf_object__unpin_programs(struct 
bpf_object *obj, diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map index 2c6d835620d2..66a30be6696c 100644 --- a/tools/lib/bpf/libbpf.map +++ b/tools/lib/bpf/libbpf.map @@ -172,5 +172,6 @@ LIBBPF_0.0.4 { btf_dump__new; btf__parse_elf; bpf_object__load_xattr; + bpf_object__reuse_maps; libbpf_num_possible_cpus; } LIBBPF_0.0.3; -- 2.19.1
[PATCH bpf-next 2/2] bpf, libbpf: add an option to reuse existing maps in bpf_prog_load_xattr
Add a new pinned_maps_path member to the bpf_prog_load_attr structure and extend the bpf_prog_load_xattr() function to pass this pointer to the new bpf_object__reuse_maps() helper. This change provides users with a simple way to use existing pinned maps when (re)loading BPF programs. Signed-off-by: Anton Protopopov --- tools/lib/bpf/libbpf.c | 8 tools/lib/bpf/libbpf.h | 1 + 2 files changed, 9 insertions(+) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 84c9e8f7bfd3..9daa09c9fe1a 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -3953,6 +3953,14 @@ int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr, first_prog = prog; } + if (attr->pinned_maps_path) { + err = bpf_object__reuse_maps(obj, attr->pinned_maps_path); + if (err < 0) { + bpf_object__close(obj); + return err; + } + } + bpf_object__for_each_map(map, obj) { if (!bpf_map__is_offload_neutral(map)) map->map_ifindex = attr->ifindex; diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h index 7fe465a1be76..6bf405bb9c1f 100644 --- a/tools/lib/bpf/libbpf.h +++ b/tools/lib/bpf/libbpf.h @@ -329,6 +329,7 @@ struct bpf_prog_load_attr { int ifindex; int log_level; int prog_flags; + const char *pinned_maps_path; }; LIBBPF_API int bpf_prog_load_xattr(const struct bpf_prog_load_attr *attr, -- 2.19.1
[PATCH bpf-next 0/2] libbpf: add an option to reuse maps when loading a program
The following two patches add an option for users to reuse existing maps when loading a program using the bpf_prog_load_xattr function. A user can specify a directory containing pinned maps inside the bpf_prog_load_attr structure, and in this case the bpf_prog_load_xattr function will replace (bpf_map__reuse_fd) all maps defined in the object with file descriptors obtained from corresponding entries from the specified directory. Anton Protopopov (2): bpf, libbpf: add a new API bpf_object__reuse_maps() bpf, libbpf: add an option to reuse existing maps in bpf_prog_load_xattr tools/lib/bpf/libbpf.c | 42 tools/lib/bpf/libbpf.h | 3 +++ tools/lib/bpf/libbpf.map | 1 + 3 files changed, 46 insertions(+) -- 2.19.1
Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust
On Fri, Jul 5, 2019 at 1:36 PM Thomas Gleixner wrote: > > On Fri, 5 Jul 2019, Andy Lutomirski wrote: > > On Fri, Jul 5, 2019 at 8:47 AM Andrew Cooper > > wrote: > > > Because TPR is 0, an incoming IPI can trigger #AC, #CP, #VC or #SX > > > without an error code on the stack, which results in a corrupt pt_regs > > > in the exception handler, and a stack underflow on the way back out, > > > most likely with a fault on IRET. > > > > > > These can be addressed by setting TPR to 0x10, which will inhibit > > > delivery of any errant IPIs in this range, but some extra sanity logic > > > may not go amiss. An error code on a 64bit stack can be spotted with > > > `testb $8, %spl` due to %rsp being aligned before pushing the exception > > > frame. > > > > Several years ago, I remember having a discussion with someone (Jan > > Beulich, maybe?) about how to efficiently make the entry code figure > > out the error code situation automatically. I suspect it was on IRC > > and I can't find the logs. I'm thinking that maybe we should just > > make Linux's idtentry code do something like this. > > > > If nothing else, we could make idtentry do: > > > > testl $8, %esp /* shorter than testb IIRC */ > > jz 1f /* or jnz -- too lazy to figure it out */ > > pushq $-1 > > 1: > > Errm, no. We should not silently paper over it. If we detect that this came > in with a wrong stack frame, i.e. not from a CPU originated exception, then > we truly should yell loud. Also in that case you want to check the APIC:ISR > and issue an EOI to clear it. It gives us the option to replace idtentry with something table-driven. I don't think I love it, but it's not an awful idea. > > > > Another interesting problem is an IPI which its vector 0x80. A cunning > > > attacker can use this to simulate system calls from unsuspecting > > > positions in userspace, or for interrupting kernel context. 
At the very > > > least the int0x80 path does an unconditional swapgs, so will try to run > > > with the user gs, and I expect things will explode quickly from there. > > > > At least SMAP helps here on non-FSGSBASE systems. With FSGSBASE, I > > How does it help? It still crashes the kernel. > > > suppose we could harden this by adding a special check to int $0x80 to > > validate GSBASE. > > > > One option here is to look at ISR and complain if it is found to be set. > > > > Barring some real hackery, we're toast long before we get far enough to > > do that. > > No. We can map the APIC into the user space visible page tables for PTI > without compromising the PTI isolation and it can be read very early on > before SWAPGS. All you need is a register to clobber not more. It the ISR > is set, then go into an error path, yell loudly, issue EOI and return. > The only issue I can see is: It's slow :) > > I think this will be really extremely slow. If we can restrict this to x2apic machines, then maybe it's not so awful. FWIW, if we just patch up the GS thing, then we are still vulnerable: the bad guy can arrange for a privileged process to have register state corresponding to a dangerous syscall and then send an int $0x80 via the APIC.
Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust
On Fri, Jul 5, 2019 at 1:25 PM Thomas Gleixner wrote: > > Andrew, > > > > > These can be addressed by setting TPR to 0x10, which will inhibit > > Right, that's easy and obvious. > This boots: diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index 177aa8ef2afa..5257c40bde6c 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -1531,11 +1531,14 @@ static void setup_local_APIC(void) #endif /* -* Set Task Priority to 'accept all'. We never change this -* later on. +* Set Task Priority to 'accept all except vectors 0-31'. An APIC +* vector in the 16-31 range can be delivered otherwise, but we'll +* think it's an exception and terrible things will happen. +* We never change this later on. */ value = apic_read(APIC_TASKPRI); value &= ~APIC_TPRI_MASK; + value |= 0x10; apic_write(APIC_TASKPRI, value); apic_pending_intr_clear();
Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust
On Fri, 5 Jul 2019, Andy Lutomirski wrote: > On Fri, Jul 5, 2019 at 8:47 AM Andrew Cooper > wrote: > > Because TPR is 0, an incoming IPI can trigger #AC, #CP, #VC or #SX > > without an error code on the stack, which results in a corrupt pt_regs > > in the exception handler, and a stack underflow on the way back out, > > most likely with a fault on IRET. > > > > These can be addressed by setting TPR to 0x10, which will inhibit > > delivery of any errant IPIs in this range, but some extra sanity logic > > may not go amiss. An error code on a 64bit stack can be spotted with > > `testb $8, %spl` due to %rsp being aligned before pushing the exception > > frame. > > Several years ago, I remember having a discussion with someone (Jan > Beulich, maybe?) about how to efficiently make the entry code figure > out the error code situation automatically. I suspect it was on IRC > and I can't find the logs. I'm thinking that maybe we should just > make Linux's idtentry code do something like this. > > If nothing else, we could make idtentry do: > > testl $8, %esp /* shorter than testb IIRC */ > jz 1f /* or jnz -- too lazy to figure it out */ > pushq $-1 > 1: Errm, no. We should not silently paper over it. If we detect that this came in with a wrong stack frame, i.e. not from a CPU originated exception, then we truly should yell loud. Also in that case you want to check the APIC:ISR and issue an EOI to clear it. > > Another interesting problem is an IPI which its vector 0x80. A cunning > > attacker can use this to simulate system calls from unsuspecting > > positions in userspace, or for interrupting kernel context. At the very > > least the int0x80 path does an unconditional swapgs, so will try to run > > with the user gs, and I expect things will explode quickly from there. > > At least SMAP helps here on non-FSGSBASE systems. With FSGSBASE, I How does it help? It still crashes the kernel. 
> suppose we could harden this by adding a special check to int $0x80 to > validate GSBASE. > > One option here is to look at ISR and complain if it is found to be set. > > Barring some real hackery, we're toast long before we get far enough to > do that. No. We can map the APIC into the user space visible page tables for PTI without compromising the PTI isolation and it can be read very early on before SWAPGS. All you need is a register to clobber, nothing more. If the ISR is set, then go into an error path, yell loudly, issue EOI and return. The only issue I can see is: It's slow :) Thanks, tglx
[GIT PULL] Final KVM changes for 5.2
Linus, The following changes since commit 6fbc7275c7a9ba97877050335f290341a1fd8dbf: Linux 5.2-rc7 (2019-06-30 11:25:36 +0800) are available in the git repository at: https://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus for you to fetch changes up to e644fa18e2ffc8895ca30dade503ae10128573a6: KVM: arm64/sve: Fix vq_present() macro to yield a bool (2019-07-05 12:07:51 +0200) x86 bugfix patches and one compilation fix for ARM. Liran Alon (2): KVM: nVMX: Allow restore nested-state to enable eVMCS when vCPU in SMM KVM: nVMX: Change KVM_STATE_NESTED_EVMCS to signal vmcs12 is copied from eVMCS Paolo Bonzini (1): KVM: x86: degrade WARN to pr_warn_ratelimited Wanpeng Li (1): KVM: LAPIC: Fix pending interrupt in IRR blocked by software disable LAPIC Zhang Lei (1): KVM: arm64/sve: Fix vq_present() macro to yield a bool arch/arm64/kvm/guest.c | 2 +- arch/x86/kvm/lapic.c| 2 +- arch/x86/kvm/vmx/nested.c | 30 - arch/x86/kvm/x86.c | 6 ++--- tools/testing/selftests/kvm/x86_64/evmcs_test.c | 1 + 5 files changed, 26 insertions(+), 15 deletions(-)
Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust
Andrew, On Fri, 5 Jul 2019, Andrew Cooper wrote: > On 04/07/2019 16:51, Thomas Gleixner wrote: > > 2) The loop termination logic is interesting at best. > > > > If the machine has no TSC or cpu_khz is not known yet it tries 1 > > million times to ack stale IRR/ISR bits. What? > > > > With TSC it uses the TSC to calculate the loop termination. It takes a > > timestamp at entry and terminates the loop when: > > > > (rdtsc() - start_timestamp) >= (cpu_hkz << 10) > > > > That's roughly one second. > > > > Both methods are problematic. The APIC has 256 vectors, which means > > that in theory max. 256 IRR/ISR bits can be set. In practice this is > > impossible as the first 32 vectors are reserved and not affected and > > the chance that more than a few bits are set is close to zero. > > [Disclaimer. I talked to Thomas in private first, and he asked me to > post this publicly as the CVE is almost a decade old already.] thanks for bringing this up! > I'm afraid that this isn't quite true. > > In terms of IDT vectors, the first 32 are reserved for exceptions, but > only the first 16 are reserved in the LAPIC. Vectors 16-31 are fair > game for incoming IPIs (SDM Vol3, 10.5.2 Valid Interrupt Vectors). Indeed. > In practice, this makes Linux vulnerable to CVE-2011-1898 / XSA-3, which > I'm disappointed to see wasn't shared with other software vendors at the > time. No comment. > Because TPR is 0, an incoming IPI can trigger #AC, #CP, #VC or #SX > without an error code on the stack, which results in a corrupt pt_regs > in the exception handler, and a stack underflow on the way back out, > most likely with a fault on IRET. > > These can be addressed by setting TPR to 0x10, which will inhibit Right, that's easy and obvious. > delivery of any errant IPIs in this range, but some extra sanity logic > may not go amiss. An error code on a 64bit stack can be spotted with > `testb $8, %spl` due to %rsp being aligned before pushing the exception > frame. 
The question is what we do with that information :) > Another interesting problem is an IPI which its vector 0x80. A cunning > attacker can use this to simulate system calls from unsuspecting > positions in userspace, or for interrupting kernel context. At the very > least the int0x80 path does an unconditional swapgs, so will try to run > with the user gs, and I expect things will explode quickly from there. Cute. > One option here is to look at ISR and complain if it is found to be set. That's slow, but we could at least provide an option to do so. > Another option, which I've only just remembered, is that AMD hardware > has the Interrupt Enable Register in its extended APIC space, which may > or may not be good enough to prohibit delivery of 0x80. There isn't > enough information in the APM to be clear, but the name suggests it is > worth experimenting with. I doubt it. Clearing a bit in the IER takes the interrupt out of the priority decision logic. That's an SVM feature so that interrupts directed directly to guests cannot block other interrupts if they are not serviced. It's grossly misnamed and won't help with the int80 issue. The more interesting question is whether this is all relevant. If I understood the issue correctly then this is mitigated by proper interrupt remapping. Is there any serious usage of virtualization w/o interrupt remapping left or have the machines which are not capable been retired already? Thanks, tglx
Re: [PATCH 2/2] leds: tlc591xx: Use the OF version of the LED registration function
On Mon 2019-07-01 17:26:02, Jean-Jacques Hiblot wrote: > The driver parses the device-tree to identify which LED should be handled. > Since the information about the device node is known at this time, we can > provide the LED core with it. It may be useful later. > > Signed-off-by: Jean-Jacques Hiblot Acked-by: Pavel Machek > @@ -207,7 +207,7 @@ tlc591xx_probe(struct i2c_client *client, > led->led_no = idx++; > led->ldev.brightness_set_blocking = tlc591xx_brightness_set; > led->ldev.max_brightness = LED_FULL; > - err = devm_led_classdev_register(dev, &led->ldev); > + err = devm_of_led_classdev_register(dev, child, &led->ldev); > if (err < 0) { > dev_err(dev, "couldn't register LED %s\n", > led->ldev.name); -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [PATCH 2/8] leds: as3645a: Fix misuse of strlcpy
On Thu 2019-07-04 16:57:42, Joe Perches wrote: > Probable cut&paste typo - use the correct field size. > > Signed-off-by: Joe Perches Ack. Pavel > --- > drivers/leds/leds-as3645a.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/leds/leds-as3645a.c b/drivers/leds/leds-as3645a.c > index 14ab6b0e4de9..050088dff8dd 100644 > --- a/drivers/leds/leds-as3645a.c > +++ b/drivers/leds/leds-as3645a.c > @@ -668,7 +668,7 @@ static int as3645a_v4l2_setup(struct as3645a *flash) > }; > > strlcpy(cfg.dev_name, led->name, sizeof(cfg.dev_name)); > - strlcpy(cfgind.dev_name, flash->iled_cdev.name, sizeof(cfg.dev_name)); > + strlcpy(cfgind.dev_name, flash->iled_cdev.name, > sizeof(cfgind.dev_name)); > > flash->vf = v4l2_flash_init( > &flash->client->dev, flash->flash_node, &flash->fled, NULL, -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
[PATCH] ACPI: PM: Fix "multiple definition of acpi_sleep_state_supported" for ARM64
If CONFIG_ACPI_SYSTEM_POWER_STATES_SUPPORT is not set, the dummy version of the function should be static. Fixes: 1e2c3f0f1e93 ("ACPI: PM: Make acpi_sleep_state_supported() non-static") Signed-off-by: Dexuan Cui Reported-by: kbuild test robot --- Sorry for not doing it right in the previous patch! The patch fixes the build errors on ARM64: drivers/net/ethernet/qualcomm/emac/emac-phy.o: In function `acpi_sleep_state_supported': >> emac-phy.c:(.text+0x1d8): multiple definition of `acpi_sleep_state_supported' drivers/net/ethernet/qualcomm/emac/emac.o:emac.c:(.text+0xbf8): first defined here drivers/net/ethernet/qualcomm/emac/emac-sgmii.o: In function `acpi_sleep_state_supported': emac-sgmii.c:(.text+0x548): multiple definition of `acpi_sleep_state_supported' drivers/net/ethernet/qualcomm/emac/emac.o:emac.c:(.text+0xbf8): first defined here include/acpi/acpi_bus.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h index 4ce59bdc852e..8ffc4acf2b56 100644 --- a/include/acpi/acpi_bus.h +++ b/include/acpi/acpi_bus.h @@ -657,7 +657,7 @@ static inline int acpi_pm_set_bridge_wakeup(struct device *dev, bool enable) #ifdef CONFIG_ACPI_SYSTEM_POWER_STATES_SUPPORT bool acpi_sleep_state_supported(u8 sleep_state); #else -bool acpi_sleep_state_supported(u8 sleep_state) { return false; } +static bool acpi_sleep_state_supported(u8 sleep_state) { return false; } #endif #ifdef CONFIG_ACPI_SLEEP -- 2.17.1
Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust
On 05/07/2019 20:06, Andy Lutomirski wrote: > On Fri, Jul 5, 2019 at 8:47 AM Andrew Cooper > wrote: >> On 04/07/2019 16:51, Thomas Gleixner wrote: >>> 2) The loop termination logic is interesting at best. >>> >>> If the machine has no TSC or cpu_khz is not known yet it tries 1 >>> million times to ack stale IRR/ISR bits. What? >>> >>> With TSC it uses the TSC to calculate the loop termination. It takes a >>> timestamp at entry and terminates the loop when: >>> >>> (rdtsc() - start_timestamp) >= (cpu_hkz << 10) >>> >>> That's roughly one second. >>> >>> Both methods are problematic. The APIC has 256 vectors, which means >>> that in theory max. 256 IRR/ISR bits can be set. In practice this is >>> impossible as the first 32 vectors are reserved and not affected and >>> the chance that more than a few bits are set is close to zero. >> [Disclaimer. I talked to Thomas in private first, and he asked me to >> post this publicly as the CVE is almost a decade old already.] >> >> I'm afraid that this isn't quite true. >> >> In terms of IDT vectors, the first 32 are reserved for exceptions, but >> only the first 16 are reserved in the LAPIC. Vectors 16-31 are fair >> game for incoming IPIs (SDM Vol3, 10.5.2 Valid Interrupt Vectors). >> >> In practice, this makes Linux vulnerable to CVE-2011-1898 / XSA-3, which >> I'm disappointed to see wasn't shared with other software vendors at the >> time. >> >> Because TPR is 0, an incoming IPI can trigger #AC, #CP, #VC or #SX >> without an error code on the stack, which results in a corrupt pt_regs >> in the exception handler, and a stack underflow on the way back out, >> most likely with a fault on IRET. >> >> These can be addressed by setting TPR to 0x10, which will inhibit >> delivery of any errant IPIs in this range, but some extra sanity logic >> may not go amiss. An error code on a 64bit stack can be spotted with >> `testb $8, %spl` due to %rsp being aligned before pushing the exception >> frame. 
> Several years ago, I remember having a discussion with someone (Jan > Beulich, maybe?) about how to efficiently make the entry code figure > out the error code situation automatically. I suspect it was on IRC > and I can't find the logs. It was on IRC, but I don't remember exactly when, either. > I'm thinking that maybe we should just > make Linux's idtentry code do something like this. > > If nothing else, we could make idtentry do: > > testl $8, %esp /* shorter than testb IIRC */ Sadly not. test (unlike cmp and the basic mutative opcodes) doesn't have a sign-extendable imm8 encoding. The two options are: f7 c4 08 00 00 00 test $0x8,%esp 40 f6 c4 08 test $0x8,%spl > jz 1f /* or jnz -- too lazy to figure it out */ > pushq $-1 > 1: It is jz, and Xen does use this sequence for reserved/unimplemented vectors, but we expect those codepaths never to be executed. > > instead of the current hardcoded push. The cost of a mispredicted > branch here will be smallish compared to the absurdly large cost of > the entry itself. But I thought I had something more clever than > this. This sequence works, but it still feels like it should be > possible to do better: > > .macro PUSH_ERROR_IF_NEEDED > /* > * Before the IRET frame is pushed, RSP is aligned to a 16-byte > * boundary. After SS .. RIP and the error code are pushed, RSP is > * once again aligned. Pushing -1 will put -1 in the error code slot > * (regs->orig_ax) if there was no error code. > */ > > pushq$-1/* orig_ax = -1, maybe */ > /* now RSP points to orig_ax (aligned) or di (misaligned) */ > pushq$0 > /* now RSP points to di (misaligned) or si (aligned) */ > orq$8, %rsp > /* now RSP points to di */ > addq$8, %rsp > /* now RSP points to orig_ax, and we're in good shape */ > .endm > > Is there a better sequence for this? 
The only aspect I can think of is whether mixing the push/pops with explicit updates to %rsp is better or worse than a very well predicted branch, given that various frontends have special tracking to reduce instruction dependencies on %rsp. I'll have to defer to the CPU microarchitects as to which of the two options is the lesser evil. That said, both Intel and AMD's Optimisation guides have stack alignment suggestions which mix push/sub/and in function prologues, so I expect this is as optimised as it can reasonably be in the pipelines. >> Another interesting problem is an IPI which its vector 0x80. A cunning >> attacker can use this to simulate system calls from unsuspecting >> positions in userspace, or for interrupting kernel context. At the very >> least the int0x80 path does an unconditional swapgs, so will try to run >> with the user gs, and I expect things will explode quickly from there. > At least SMAP helps here on non-FSGSBASE systems. With FSGSBASE, I > suppose we could harden this by adding a special
[PATCH] rtc: stm32: One function call less in stm32_rtc_set_alarm()
From: Markus Elfring Date: Fri, 5 Jul 2019 22:10:10 +0200 Avoid an extra function call by using a ternary operator instead of a conditional statement. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring --- drivers/rtc/rtc-stm32.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/drivers/rtc/rtc-stm32.c b/drivers/rtc/rtc-stm32.c index 8e6c9b3bcc29..83793b530fed 100644 --- a/drivers/rtc/rtc-stm32.c +++ b/drivers/rtc/rtc-stm32.c @@ -519,11 +519,7 @@ static int stm32_rtc_set_alarm(struct device *dev, struct rtc_wkalrm *alrm) /* Write to Alarm register */ writel_relaxed(alrmar, rtc->base + regs->alrmar); - if (alrm->enabled) - stm32_rtc_alarm_irq_enable(dev, 1); - else - stm32_rtc_alarm_irq_enable(dev, 0); - + stm32_rtc_alarm_irq_enable(dev, alrm->enabled ? 1 : 0); end: stm32_rtc_wpr_lock(rtc); -- 2.22.0
Re: [alsa-devel] [PATCH] sound: soc: codecs: wcd9335: add irqflag IRQF_ONESHOT flag
On Fri, Jul 05, 2019 at 12:40:26AM +0530, Hariprasad Kelam wrote: > Add IRQF_ONESHOT to ensure "Interrupt is not reenabled after the hardirq > handler finished". > > fixes below issue reported by coccicheck > > sound/soc/codecs/wcd9335.c:4068:8-33: ERROR: Threaded IRQ with no > primary handler requested without IRQF_ONESHOT > > Signed-off-by: Hariprasad Kelam > --- > sound/soc/codecs/wcd9335.c | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/sound/soc/codecs/wcd9335.c b/sound/soc/codecs/wcd9335.c > index 85737fe..7ab9bf6f 100644 > --- a/sound/soc/codecs/wcd9335.c > +++ b/sound/soc/codecs/wcd9335.c > @@ -4056,6 +4056,9 @@ static struct wcd9335_irq wcd9335_irqs[] = { > static int wcd9335_setup_irqs(struct wcd9335_codec *wcd) > { > int irq, ret, i; > + unsigned long irqflags; > + > + irqflags = IRQF_TRIGGER_RISING | IRQF_ONESHOT; Why does this change trigger adding a variable? > for (i = 0; i < ARRAY_SIZE(wcd9335_irqs); i++) { > irq = regmap_irq_get_virq(wcd->irq_data, wcd9335_irqs[i].irq); > @@ -4067,7 +4070,7 @@ static int wcd9335_setup_irqs(struct wcd9335_codec *wcd) > > ret = devm_request_threaded_irq(wcd->dev, irq, NULL, > wcd9335_irqs[i].handler, > - IRQF_TRIGGER_RISING, > + irqflags, > wcd9335_irqs[i].name, wcd); > if (ret) { > dev_err(wcd->dev, "Failed to request %s\n", > -- > 2.7.4 > > ___ > Alsa-devel mailing list > alsa-de...@alsa-project.org > https://mailman.alsa-project.org/mailman/listinfo/alsa-devel
Re: [PATCH v2] fs: Fix the default values of i_uid/i_gid on /proc/sys inodes.
Please re-state the main fix in the commit log, not just the subject. Also, this does not explain what the current values are and their impact on systems / users. This would help in determining and evaluating whether this deserves to be a stable fix. On Fri, Jul 05, 2019 at 06:30:21PM +0200, Radoslaw Burny wrote: > This also fixes a problem where, in a user namespace without root user > mapping, it is not possible to write to /proc/sys/kernel/shmmax. This does not explain why that should be possible and what impact this limitation has. > The problem was introduced by the combination of the two commits: > * 81754357770ebd900801231e7bc8d151ddc00498: fs: Update > i_[ug]id_(read|write) to translate relative to s_user_ns > - this caused the kernel to write INVALID_[UG]ID to i_uid/i_gid > members of /proc/sys inodes if a containing userns does not have > entries for root in the uid/gid_map. This is a 2014 commit merged as of v4.8. > * 0bd23d09b874e53bd1a2fe2296030aa2720d7b08: vfs: Don't modify inodes > with a uid or gid unknown to the vfs > - changed the kernel to prevent opens for write if the i_uid/i_gid > field in the inode is invalid This is a 2016 commit merged as of v4.8 as well... So regardless of the dates of the commits, are you saying this is a regression you can confirm did not exist prior to v4.8? Did you test v4.7 to confirm? > This commit fixes the issue by defaulting i_uid/i_gid to > GLOBAL_ROOT_UID/GID. Why is this right? > Note that these values are not used for /proc/sys > access checks, so the change does not otherwise affect /proc semantics. > > Tested: Used a repro program that creates a user namespace without any > mapping and stat'ed /proc/$PID/root/proc/sys/kernel/shmmax from outside. > Before the change, it shows the overflow uid, with the change it's 0. Why is the overflow uid bad for user experience? Did you test prior to v4.8, i.e. on v4.7, to confirm this is indeed a regression?
You'd then also want to amend the commit log with a Fixes: tag listing both commits. If this is a stable fix (criteria yet to be determined), then we'd need a stable tag. Luis > Signed-off-by: Radoslaw Burny > --- > Changelog since v1: > - Updated the commit title and description. > > fs/proc/proc_sysctl.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c > index c74570736b24..36ad1b0d6259 100644 > --- a/fs/proc/proc_sysctl.c > +++ b/fs/proc/proc_sysctl.c > @@ -499,6 +499,10 @@ static struct inode *proc_sys_make_inode(struct > super_block *sb, > > if (root->set_ownership) > root->set_ownership(head, table, &inode->i_uid, &inode->i_gid); > + else { > + inode->i_uid = GLOBAL_ROOT_UID; > + inode->i_gid = GLOBAL_ROOT_GID; > + } > > return inode; > } > -- > 2.22.0.410.gd8fdbe21b5-goog >
Re: [PATCH] rcuperf: Make rcuperf kernel test more robust for !expedited mode
On Fri, Jul 05, 2019 at 08:09:32AM -0700, Paul E. McKenney wrote: > On Fri, Jul 05, 2019 at 08:24:50AM -0400, Joel Fernandes wrote: > > On Fri, Jul 05, 2019 at 12:52:31PM +0900, Byungchul Park wrote: > > > On Thu, Jul 04, 2019 at 10:40:44AM -0700, Paul E. McKenney wrote: > > > > On Thu, Jul 04, 2019 at 12:34:30AM -0400, Joel Fernandes (Google) wrote: > > > > > It is possible that the rcuperf kernel test runs concurrently with > > > > > init > > > > > starting up. During this time, the system is running all grace > > > > > periods > > > > > as expedited. However, rcuperf can also be run for normal GP tests. > > > > > Right now, it depends on a holdoff time before starting the test to > > > > > ensure grace periods start later. This works fine with the default > > > > > holdoff time however it is not robust in situations where init takes > > > > > greater than the holdoff time to finish running. Or, as in my case: > > > > > > > > > > I modified the rcuperf test locally to also run a thread that did > > > > > preempt disable/enable in a loop. This had the effect of slowing down > > > > > init. The end result was that the "batches:" counter in rcuperf was 0 > > > > > causing a division by 0 error in the results. This counter was 0 > > > > > because > > > > > only expedited GPs seem to happen, not normal ones which led to the > > > > > rcu_state.gp_seq counter remaining constant across grace periods which > > > > > unexpectedly happen to be expedited. The system was running expedited > > > > > RCU all the time because rcu_unexpedited_gp() would not have run yet > > > > > from init. In other words, the test would concurrently with init > > > > > booting in expedited GP mode. > > > > > > > > > > To fix this properly, let us check if system_state if SYSTEM_RUNNING > > > > > is set before starting the test. The system_state approximately aligns > > > > > > Just minor typo.. 
> > > > > > To fix this properly, let us check if system_state if SYSTEM_RUNNING > > > is set before starting the test. ... > > > > > > Should be > > > > > > To fix this properly, let us check if system_state is set to > > > SYSTEM_RUNNING before starting the test. ... > > > > That's a fair point. I wonder if Paul already fixed it up in his tree, > > however I am happy to resend if he hasn't. Paul, how would you like to > > handle > > this commit log nit? > > > > it is just 'if ..' to 'is SYSTEM_RUNNING' > > It now reads as follows: > > To fix this properly, this commit waits until system_state is > set to SYSTEM_RUNNING before starting the test. This change is > made just before kernel_init() invokes rcu_end_inkernel_boot(), > and this latter is what turns off boot-time expediting of RCU > grace periods. Ok, looks good to me, thanks. And for the patch below, Reviewed-by: Joel Fernandes (Google) > I dropped the last paragraph about late_initcall(). And I suspect that > the last clause from rcu_gp_is_expedited() can be dropped: > > bool rcu_gp_is_expedited(void) > { > return rcu_expedited || atomic_read(&rcu_expedited_nesting) || > rcu_scheduler_active == RCU_SCHEDULER_INIT; > } > > This is because rcu_expedited_nesting is initialized to 1, and is > decremented in rcu_end_inkernel_boot(), which is called long after > rcu_scheduler_active has been set to RCU_SCHEDULER_RUNNING, which > happens at core_initcall() time. So if the last clause says "true", > so does the second-to-last clause. > > The similar check in rcu_gp_is_normal() is still needed, however, to allow > the power-management subsystem to invoke synchronize_rcu() just after > the scheduler has been initialized, but before RCU is aware of this. > > So, how about the commit shown below? > > Thanx, Paul > > > > commit 1f7e72efe3c761c2b34da7b59e01ad69c657db10 > Author: Paul E.
McKenney > Date: Fri Jul 5 08:05:10 2019 -0700 > > rcu: Remove redundant "if" condition from rcu_gp_is_expedited() > > Because rcu_expedited_nesting is initialized to 1 and not decremented > until just before init is spawned, rcu_expedited_nesting is guaranteed > to be non-zero whenever rcu_scheduler_active == RCU_SCHEDULER_INIT. > This commit therefore removes this redundant "if" equality test. > > Signed-off-by: Paul E. McKenney > > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c > index 249517058b13..64e9cc8609e7 100644 > --- a/kernel/rcu/update.c > +++ b/kernel/rcu/update.c > @@ -136,8 +136,7 @@ static atomic_t rcu_expedited_nesting = ATOMIC_INIT(1); > */ > bool rcu_gp_is_expedited(void) > { > - return rcu_expedited || atomic_read(&rcu_expedited_nesting) || > -rcu_scheduler_active == RCU_SCHEDULER_INIT; > + return rcu_expedited || atomic_read(&rcu_expedited_nesting); > } > EXPORT_SYMBOL_GPL(rcu_gp_is_expedited); > >
Re: [PATCH net-next] hinic: add fw version query
On Fri, 5 Jul 2019 02:40:28 +, Xue Chaojing wrote: > This patch adds firmware version query in ethtool -i. > > Signed-off-by: Xue Chaojing Reviewed-by: Jakub Kicinski
Good day!
Good day. I have a mutual business proposal concerning the transfer of a large sum of money to a foreign account, with your help as a foreign partner acting as beneficiary of the funds. Everything about this transaction will be legal, without any breach of financial authority in either my country or yours. If you are interested, I will give you more information about the project as soon as I have received your positive reply. Best regards, Executive Director, ICBC, China
Re: [PATCH] nvme: One function call less in nvme_update_disk_info()
On 7/5/19 1:15 PM, Markus Elfring wrote: > From: Markus Elfring > Date: Fri, 5 Jul 2019 21:08:12 +0200 > > Avoid an extra function call by using a ternary operator instead of > a conditional statement. > > This issue was detected by using the Coccinelle software. > > Signed-off-by: Markus Elfring > --- > drivers/nvme/host/core.c | 5 + > 1 file changed, 1 insertion(+), 4 deletions(-) > > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c > index b2dd4e391f5c..73888195bdb2 100644 > --- a/drivers/nvme/host/core.c > +++ b/drivers/nvme/host/core.c > @@ -1650,10 +1650,7 @@ static void nvme_update_disk_info(struct gendisk *disk, > nvme_config_discard(disk, ns); > nvme_config_write_zeroes(disk, ns); > > - if (id->nsattr & (1 << 0)) > - set_disk_ro(disk, true); > - else > - set_disk_ro(disk, false); > + set_disk_ro(disk, id->nsattr & (1 << 0) ? true : false); Let's please not, the original is much more readable. -- Jens Axboe
Re: [PATCH v6 net-next 2/5] net: ethernet: ti: davinci_cpdma: add dma mapped submit
Hi Ivan, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on net-next/master] url: https://github.com/0day-ci/linux/commits/Ivan-Khoronzhuk/xdp-allow-same-allocator-usage/20190706-003850 config: arm64-allmodconfig (attached as .config) compiler: aarch64-linux-gcc (GCC) 7.4.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree GCC_VERSION=7.4.0 make.cross ARCH=arm64 If you fix the issue, kindly add following tag Reported-by: kbuild test robot All warnings (new ones prefixed by >>): drivers/net//ethernet/ti/davinci_cpdma.c: In function 'cpdma_chan_submit_si': >> drivers/net//ethernet/ti/davinci_cpdma.c:1047:12: warning: cast from pointer >> to integer of different size [-Wpointer-to-int-cast] buffer = (u32)si->data; ^ drivers/net//ethernet/ti/davinci_cpdma.c: In function 'cpdma_chan_idle_submit_mapped': >> drivers/net//ethernet/ti/davinci_cpdma.c:1114:12: warning: cast to pointer >> from integer of different size [-Wint-to-pointer-cast] si.data = (void *)(u32)data; ^ drivers/net//ethernet/ti/davinci_cpdma.c: In function 'cpdma_chan_submit_mapped': drivers/net//ethernet/ti/davinci_cpdma.c:1164:12: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] si.data = (void *)(u32)data; ^ vim +1047 drivers/net//ethernet/ti/davinci_cpdma.c 1015 1016 static int cpdma_chan_submit_si(struct submit_info *si) 1017 { 1018 struct cpdma_chan *chan = si->chan; 1019 struct cpdma_ctlr *ctlr = chan->ctlr; 1020 int len = si->len; 1021 int swlen = len; 1022 struct cpdma_desc __iomem *desc; 1023 dma_addr_t buffer; 1024 u32 mode; 1025 int ret; 1026 1027 if (chan->count >= chan->desc_num) { 1028 chan->stats.desc_alloc_fail++; 1029 return -ENOMEM; 1030 } 1031 1032 desc = cpdma_desc_alloc(ctlr->pool); 1033 if (!desc) { 1034 chan->stats.desc_alloc_fail++; 1035 return -ENOMEM; 1036 } 1037 1038 if (len < 
ctlr->params.min_packet_size) { 1039 len = ctlr->params.min_packet_size; 1040 chan->stats.runt_transmit_buff++; 1041 } 1042 1043 mode = CPDMA_DESC_OWNER | CPDMA_DESC_SOP | CPDMA_DESC_EOP; 1044 cpdma_desc_to_port(chan, mode, si->directed); 1045 1046 if (si->flags & CPDMA_DMA_EXT_MAP) { > 1047 buffer = (u32)si->data; 1048 dma_sync_single_for_device(ctlr->dev, buffer, len, chan->dir); 1049 swlen |= CPDMA_DMA_EXT_MAP; 1050 } else { 1051 buffer = dma_map_single(ctlr->dev, si->data, len, chan->dir); 1052 ret = dma_mapping_error(ctlr->dev, buffer); 1053 if (ret) { 1054 cpdma_desc_free(ctlr->pool, desc, 1); 1055 return -EINVAL; 1056 } 1057 } 1058 1059 /* Relaxed IO accessors can be used here as there is read barrier 1060 * at the end of write sequence. 1061 */ 1062 writel_relaxed(0, &desc->hw_next); 1063 writel_relaxed(buffer, &desc->hw_buffer); 1064 writel_relaxed(len, &desc->hw_len); 1065 writel_relaxed(mode | len, &desc->hw_mode); 1066 writel_relaxed((uintptr_t)si->token, &desc->sw_token); 1067 writel_relaxed(buffer, &desc->sw_buffer); 1068 writel_relaxed(swlen, &desc->sw_len); 1069 desc_read(desc, sw_len); 1070 1071 __cpdma_chan_submit(chan, desc); 1072 1073 if (chan->state == CPDMA_STATE_ACTIVE && chan->rxfree) 1074 chan_write(chan, rxfree, 1); 1075 1076 chan->count++; 1077 return 0; 1078 } 1079 1080 int cpdma_chan_idle_submit(struct cpdma_chan *chan, void *token, void *data, 1081 int len, int directed) 1082 { 1083 struct submit_info si; 1084 unsigned long flags; 1085 int ret; 1086 1087 si.chan = chan; 1088 si.token = token; 1089 si.data = data; 1090 si.len = len; 1091 si.directed = directed; 1092 si.flags = 0; 1093 1094 spin_lock_irqsave(&chan->lock, flags); 1095 if (chan->state == CPDMA_STATE_TEARDOWN) { 1096 spin_unlock_irqrestore(&chan->lock, flags); 1097 return
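The two warning classes in the report above come from casting between a pointer and a 32-bit integer on a target where pointers are 64 bits wide. The following userspace sketch shows why the width matters and how a round trip through `uintptr_t` avoids the size mismatch. The helper names are hypothetical, and whether this pattern is the right fix for the driver is left to the patch author; it only illustrates the general C behaviour.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; none of these exist in the driver. */

/* True if a 64-bit value survives being squeezed through a 32-bit integer. */
static bool survives_u32_cast(uint64_t value)
{
	return (uint64_t)(uint32_t)value == value;
}

/* uintptr_t is defined to be wide enough to hold a converted object pointer. */
static uintptr_t ptr_to_handle(void *ptr)
{
	return (uintptr_t)ptr;
}

static void *handle_to_ptr(uintptr_t handle)
{
	return (void *)handle;
}
```

A value such as 0x100000000 loses its upper bits when cast to a 32-bit type, which is the situation the compiler is warning about on arm64.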
[PULL REQUEST] i2c for 5.2
Linus, I2C has a MAINTAINERS update which will be beneficial for developers, so let's add it right away. Please pull. Thanks, Wolfram The following changes since commit 6fbc7275c7a9ba97877050335f290341a1fd8dbf: Linux 5.2-rc7 (2019-06-30 11:25:36 +0800) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git i2c/for-current for you to fetch changes up to f3a3ea28edd9a17588fede4ff53bc02d986cf4d1: i2c: tegra: Add Dmitry as a reviewer (2019-07-05 20:46:56 +0200) Dmitry Osipenko (1): i2c: tegra: Add Dmitry as a reviewer MAINTAINERS | 1 + 1 file changed, 1 insertion(+)
Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust
> On Jul 5, 2019, at 8:47 AM, Andrew Cooper wrote: > > On 04/07/2019 16:51, Thomas Gleixner wrote: >> 2) The loop termination logic is interesting at best. >> >> If the machine has no TSC or cpu_khz is not known yet it tries 1 >> million times to ack stale IRR/ISR bits. What? >> >> With TSC it uses the TSC to calculate the loop termination. It takes a >> timestamp at entry and terminates the loop when: >> >>(rdtsc() - start_timestamp) >= (cpu_khz << 10) >> >> That's roughly one second. >> >> Both methods are problematic. The APIC has 256 vectors, which means >> that in theory max. 256 IRR/ISR bits can be set. In practice this is >> impossible as the first 32 vectors are reserved and not affected and >> the chance that more than a few bits are set is close to zero. > > [Disclaimer. I talked to Thomas in private first, and he asked me to > post this publicly as the CVE is almost a decade old already.] > > I'm afraid that this isn't quite true. > > In terms of IDT vectors, the first 32 are reserved for exceptions, but > only the first 16 are reserved in the LAPIC. Vectors 16-31 are fair > game for incoming IPIs (SDM Vol3, 10.5.2 Valid Interrupt Vectors). > > In practice, this makes Linux vulnerable to CVE-2011-1898 / XSA-3, which > I'm disappointed to see wasn't shared with other software vendors at the > time. IIRC (and from skimming the CVE again) the basic problem in Xen was that MSIs can be used when devices are assigned to generate IRQs with arbitrary vectors. The mitigation was to require interrupt remapping to be enabled in the IOMMU when IOMMU is used for DMA remapping (i.e., device assignment). Are you concerned about this case, additional concrete ones, or is it about security hardening? (or am I missing something?)
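The "roughly one second" figure in the quoted loop-termination description follows from the units: a TSC running at cpu_khz kilohertz advances cpu_khz cycles per millisecond, so a budget of cpu_khz << 10 cycles is 1024 milliseconds regardless of the clock rate. A minimal arithmetic sketch, assuming the TSC ticks at exactly cpu_khz (the function names are hypothetical):

```c
#include <stdint.h>

/* Cycle budget used by the loop described above: cpu_khz * 1024. */
static uint64_t timeout_cycles(uint64_t cpu_khz)
{
	return cpu_khz << 10;
}

/*
 * Convert that budget to microseconds. cpu_khz cycles elapse per
 * millisecond, so cycles / cpu_khz is milliseconds; multiplying by 1000
 * first keeps the result in microseconds without losing precision.
 */
static uint64_t timeout_usecs(uint64_t cpu_khz)
{
	return timeout_cycles(cpu_khz) * 1000 / cpu_khz;
}
```

For any non-zero cpu_khz this evaluates to 1,024,000 microseconds, i.e. about 1.024 seconds.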
[PATCH] nvme: One function call less in nvme_update_disk_info()
From: Markus Elfring Date: Fri, 5 Jul 2019 21:08:12 +0200 Avoid an extra function call by using a ternary operator instead of a conditional statement. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring --- drivers/nvme/host/core.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index b2dd4e391f5c..73888195bdb2 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -1650,10 +1650,7 @@ static void nvme_update_disk_info(struct gendisk *disk, nvme_config_discard(disk, ns); nvme_config_write_zeroes(disk, ns); - if (id->nsattr & (1 << 0)) - set_disk_ro(disk, true); - else - set_disk_ro(disk, false); + set_disk_ro(disk, id->nsattr & (1 << 0) ? true : false); blk_mq_unfreeze_queue(disk->queue); } -- 2.22.0
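For comparison, the two forms in the diff above can be modelled in plain C. This is a userspace sketch, not driver code: `NS_ATTR_RO_BIT` and the three function names are hypothetical, and the third variant (relying on the implicit conversion to bool) is an extra illustration that does not appear in the patch. All three yield the same value for every input, and in each form the setter would run exactly once per invocation; the if/else version simply has two call sites in the source.

```c
#include <stdbool.h>
#include <stdint.h>

#define NS_ATTR_RO_BIT (1u << 0)  /* hypothetical name for bit 0 of nsattr */

/* Shape of the original code: explicit if/else. */
static bool ro_if_else(uint8_t nsattr)
{
	if (nsattr & NS_ATTR_RO_BIT)
		return true;
	else
		return false;
}

/* Shape of the proposed code: ternary operator. */
static bool ro_ternary(uint8_t nsattr)
{
	return nsattr & NS_ATTR_RO_BIT ? true : false;
}

/* Not in the patch: any non-zero value converts to true for a bool. */
static bool ro_implicit(uint8_t nsattr)
{
	return nsattr & NS_ATTR_RO_BIT;
}
```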
Re: [PATCH 2/2] usb: pci-quirks: Minor cleanup for AMD PLL quirk
On Thu, 4 Jul 2019, Ryan Kennedy wrote: > usb_amd_find_chipset_info() is used for chipset detection for > several quirks. It is strange that its return value indicates > the need for the PLL quirk, which means it is often ignored. > This patch adds a function specifically for checking the PLL > quirk like the other ones. Additionally, rename probe_result to > something more appropriate. > > Signed-off-by: Ryan Kennedy > @@ -322,6 +317,13 @@ bool usb_amd_prefetch_quirk(void) > } > EXPORT_SYMBOL_GPL(usb_amd_prefetch_quirk); > > +bool usb_amd_quirk_pll_check(void) > +{ > + usb_amd_find_chipset_info(); > + return amd_chipset.need_pll_quirk; > +} > +EXPORT_SYMBOL_GPL(usb_amd_quirk_pll_check); I really don't see the point of separating out all but one line into a different function. You might as well just rename usb_amd_find_chipset_info to usb_amd_quirk_pll_check (along with the other code adjustments) and be done with it. However, in the end I don't care if you still want to do this. Either way: Acked-by: Alan Stern Alan Stern
Re: INFO: rcu detected stall in ext4_write_checks
On Fri, Jul 05, 2019 at 05:48:31PM +0200, Dmitry Vyukov wrote: > On Fri, Jul 5, 2019 at 5:17 PM Paul E. McKenney wrote: > > > > On Fri, Jul 05, 2019 at 03:24:26PM +0200, Dmitry Vyukov wrote: > > > On Thu, Jun 27, 2019 at 12:47 AM Theodore Ts'o wrote: > > > > > > > > More details about what is going on. First, it requires root, because > > > > one of the things that is required is using sched_setattr (which is enough to > > > > shoot yourself in the foot): > > > > > > > > sched_setattr(0, {size=0, sched_policy=0x6 /* SCHED_??? */, > > > > sched_flags=0, sched_nice=0, sched_priority=0, > > > > sched_runtime=2251799813724439, sched_deadline=4611686018427453437, > > > > sched_period=0}, 0) = 0 > > > > > > > > This is setting the scheduler policy to be SCHED_DEADLINE, with a > > > > runtime parameter of 2251799.813724439 seconds (or 26 days) and a > > > > deadline of 4611686018.427453437 seconds (or 146 *years*). This means > > > > a particular kernel thread can run for up to 26 **days** before it is > > > > scheduled away, and if a kernel thread gets woken up or sent a signal, > > > > no worries, it will wake up roughly seven times the interval that Rip > > > > Van Winkle spent snoozing in a cave in the Catskill Mountains (in > > > > Washington Irving's short story). > > > > > > > > We then kick off a half-dozen threads all running: > > > > > > > >sendfile(fd, fd, , 0x8080fffe); > > > > > > > > (and since count is a ridiculously large number, this gets cut down to): > > > > > > > >sendfile(fd, fd, , 2147479552); > > > > > > > > Is it any wonder that we are seeing RCU stalls? :-) > > > > > > +Peter, Ingo for sched_setattr and +Paul for rcu > > > > > > First of all: is it a semi-intended result of a root (CAP_SYS_NICE) > > > doing local DoS abusing sched_setattr? It would be perfectly reasonable > > > to starve other processes, but I am not sure about rcu. In the end the > > > high prio process can use rcu itself, and then it will simply blow > > > system memory by stalling rcu. 
So it seems that rcu stalls should not > > > happen as a result of weird sched_setattr values. If that is the case, > > > what needs to be fixed? sched_setattr? rcu? sendfile? > > > > Does the (untested, probably does not even build) patch shown below help? > > This patch assumes that the kernel was built with CONFIG_PREEMPT=n. > > And that I found all the tight loops on the do_sendfile() code path. > > The config used when this happened is referenced from here: > https://syzkaller.appspot.com/bug?extid=4bfbbf28a2e50ab07368 > and it contains: > CONFIG_PREEMPT=y > > So... what does this mean? The loop should have been preempted without > the cond_resched() then, right? Exactly, so although my patch might help for CONFIG_PREEMPT=n, it won't help in your scenario. But looking at the dmesg from your URL above, I see the following: rcu: rcu_preempt kthread starved for 10549 jiffies! g8969 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0 And, prior to that: rcu: All QSes seen, last rcu_preempt kthread activity 10503 (4295056736-4295046233), jiffies_till_next_fqs=1, root ->qsmask 0x0 In other words, the grace period has finished, but RCU's grace-period kthread hasn't gotten a chance to run, and thus hasn't marked it as completed. The standard workaround is to set the rcutree.kthread_prio kernel boot parameter to a comfortably high real-time priority. At least assuming that syzkaller isn't setting the scheduling priority of random CPU-bound tasks to RT priority 99 or some such. ;-) Does that work for you? Thanx, Paul > > > If this is semi-intended, the only option I see is to disable > > > something in syzkaller: sched_setattr entirely, or drop CAP_SYS_NICE, > > > or ...? Any preference either way? > > > > Long-running tight loops in the kernel really should contain > > cond_resched() or better. 
> > > > Thanx, Paul > > > > > > > > diff --git a/fs/splice.c b/fs/splice.c > > index 25212dcca2df..50aa3286764a 100644 > > --- a/fs/splice.c > > +++ b/fs/splice.c > > @@ -985,6 +985,7 @@ ssize_t splice_direct_to_actor(struct file *in, struct > > splice_desc *sd, > > sd->pos = prev_pos + ret; > > goto out_release; > > } > > + cond_resched(); > > } > > > > done: > > >
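The figures quoted earlier in this thread (26 days, 146 years, and the sendfile() count being cut down to 2147479552) can be reproduced with integer arithmetic. The sketch below is a userspace illustration with hypothetical function names. It assumes the sched_setattr() values are in nanoseconds, a 365-day year, 4 KiB pages, and that the cap on the count is INT_MAX rounded down to a page boundary.

```c
#include <stdint.h>

#define MODEL_NSEC_PER_SEC 1000000000ULL
#define MODEL_SEC_PER_DAY  86400ULL

/* Whole days contained in a nanosecond count. */
static uint64_t ns_to_whole_days(uint64_t ns)
{
	return ns / MODEL_NSEC_PER_SEC / MODEL_SEC_PER_DAY;
}

/* Whole 365-day years contained in a nanosecond count. */
static uint64_t ns_to_whole_years(uint64_t ns)
{
	return ns / MODEL_NSEC_PER_SEC / (365ULL * MODEL_SEC_PER_DAY);
}

/*
 * Cap a requested transfer size at INT_MAX rounded down to a 4 KiB page
 * boundary (0x7ffff000), which is an assumption about how the count in
 * the strace output was reduced.
 */
static uint64_t clamp_transfer_count(uint64_t count)
{
	const uint64_t max_count = 0x7fffffffULL & ~0xfffULL;

	return count < max_count ? count : max_count;
}
```

With the values from the strace output, the runtime comes to 26 whole days, the deadline to 146 whole years, and a requested count of 0x8080fffe is reduced to 2147479552.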
Re: [PATCH] dax: Fix missed PMD wakeups
On Thu, Jul 04, 2019 at 04:27:14PM -0700, Dan Williams wrote: > On Thu, Jul 4, 2019 at 12:14 PM Matthew Wilcox wrote: > > > > On Thu, Jul 04, 2019 at 06:54:50PM +0200, Jan Kara wrote: > > > On Wed 03-07-19 20:27:28, Matthew Wilcox wrote: > > > > So I think we're good for all current users. > > > > > > Agreed but it is an ugly trap. As I already said, I'd rather pay the > > > unnecessary cost of waiting for pte entry and have an easy to understand > > > interface. If we ever have a real world use case that would care for this > > > optimization, we will need to refactor functions to make this possible and > > > still keep the interfaces sane. For example get_unlocked_entry() could > > > return special "error code" indicating that there's no entry with matching > > > order in xarray but there's a conflict with it. That would be much less > > > error-prone interface. > > > > This is an internal interface. I think it's already a pretty gnarly > > interface to use by definition -- it's going to sleep and might return > > almost anything. There's not much scope for returning an error indicator > > either; value entries occupy half of the range (all odd numbers between 1 > > and ULONG_MAX inclusive), plus NULL. We could use an internal entry, but > > I don't think that makes the interface any easier to use than returning > > a locked entry. > > > > I think this iteration of the patch makes it a little clearer. What do you > > think? > > > > Not much clearer to me. get_unlocked_entry() is now misnamed and this misnamed? You'd rather it was called "try_get_unlocked_entry()"? > arrangement allows for mismatches of @order argument vs @xas > configuration. > Can you describe, or even better demonstrate with > numbers, why it's better to carry this complication than just > converging the waitqueues between the types? You've got the reproducer ;-) It seems quite wrong to make a page fault stall just because another task is working on a different page in the same 2MB chunk.
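The remark that value entries occupy every odd number between 1 and ULONG_MAX follows from how the XArray tags small integers: the value is shifted left by one and the low bit is set. The sketch below is a userspace model of that encoding with hypothetical `model_` names, written from the encoding as described for the kernel's xa_mk_value(), xa_is_value() and xa_to_value() helpers rather than copied from kernel headers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Encode an integer as a tagged entry: shift left, set the low bit. */
static void *model_mk_value(uintptr_t v)
{
	return (void *)((v << 1) | 1);
}

/* An entry is a value entry exactly when its low bit is set (it is odd). */
static bool model_is_value(const void *entry)
{
	return ((uintptr_t)entry & 1) != 0;
}

/* Recover the integer stored in a value entry. */
static uintptr_t model_to_value(const void *entry)
{
	return (uintptr_t)entry >> 1;
}
```

Since every odd bit pattern decodes to some valid value entry and NULL already means "no entry", the function's return type has little room left for an in-band error code, which is the constraint being pointed out above.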