Re: [PATCH v5 0/5] Track and expose idle PURR and SPURR ticks
On Thu, Apr 30, 2020 at 09:46:13AM +0530, Gautham R Shenoy wrote: > Hello Michael, > > > > > > Michael, could you please consider this for 5.8 ? > > > > Yes. Has it been tested on KVM at all? > > No. I haven't tested this on KVM. Will do that today. The results on Shared LPAR and KVM are as follows: --- The lparstat results on a Shared LPAR are consistent with that observed on a dedicated LPAR when at least one of the threads of the core is active. When all the threads are idle, the lparstat shows incorrect idle percentage. But this is perhaps due to the fact that the Hypervisor puts a completely idle core in some power-saving state with runlatch turned off due to which PURR counts on the threads of a core do not add up to the elapsed timebase ticks. The results are in section A) below. lparstat is not supported on KVM. However, I performed some basic sanity checks on purr, spurr, idle_purr, and idle_spurr sysfs files that show up after this patch series. When CPUs are offlined, the idle_purr and idle_spurr sysfs files no longer show up, just like purr and spurr sysfs files. The values of the counters monotonically increase, except when the CPU is busy, in which case the idle_purr and idle_spurr counts are stagnant as expected. However, I don't think the even the values of PURR or SPURR make much sense on KVM guest, since the Linux Hypervisor doesn't set additional registers such as RWMR, except on POWER8, where the KVM sets RWMR corresponding to the number of online threads in a vCORE before dispatching the vcore. I haven't been able to test it on a POWER8 guest yet. The results on POWER9 are in section B) below. A ) Shared LPAR == 1. When all the threads of the core are running a CPU-Hog # ./lparstat -E 1 5 System Configuration type=Shared mode=Capped smt=8 lcpu=6 mem=10362752 kB cpus=10 ent=6.00 ---Actual--- -Normalized- %busy %idle Frequency %busy %idle -- -- - -- -- 100.00 0.00 2.90GHz[126%] 126.00 0.00 100.00 0.00 2.90GHz[126%] 126.00 0.00 100.00 0.00 2.90GHz[126%] 126.00 0.00 100.00 0.00 2.90GHz[126%] 126.00 0.00 100.01 0.00 2.90GHz[126%] 126.01 0.00 2. When 4 threads of a core are running CPU Hogs, with the remaining 4 threads idle. # ./lparstat -E 1 5 System Configuration type=Shared mode=Capped smt=8 lcpu=6 mem=10362752 kB cpus=10 ent=6.00 ---Actual--- -Normalized- %busy %idle Frequency %busy %idle -- -- - -- -- 81.06 18.94 2.97GHz[129%] 104.56 24.44 81.05 18.95 2.97GHz[129%] 104.56 24.44 81.06 18.95 2.97GHz[129%] 104.56 24.44 81.06 18.95 2.97GHz[129%] 104.56 24.44 81.05 18.95 2.97GHz[129%] 104.56 24.45 3. When 2 threads of a core are running CPU Hogs, with the other 6 threads idle. # ./lparstat -E 1 5 System Configuration type=Shared mode=Capped smt=8 lcpu=6 mem=10362752 kB cpus=10 ent=6.00 ---Actual--- -Normalized- %busy %idle Frequency %busy %idle -- -- - -- -- 65.21 34.79 3.13GHz[136%] 88.69 47.31 65.20 34.81 3.13GHz[136%] 88.67 47.33 64.25 35.76 3.13GHz[136%] 87.38 48.63 63.68 36.31 3.13GHz[136%] 86.60 49.39 63.55 36.45 3.13GHz[136%] 86.42 49.58 4. When a single thread of the core is running CPU Hog, remaining 7 threads are idle. # ./lparstat -E 1 5 System Configuration type=Shared mode=Capped smt=8 lcpu=6 mem=10362752 kB cpus=10 ent=6.00 ---Actual--- -Normalized- %busy %idle Frequency %busy %idle -- -- - -- -- 31.80 68.20 3.20GHz[139%] 44.20 94.80 31.80 68.20 3.20GHz[139%] 44.20 94.81 31.80 68.20 3.20GHz[139%] 44.20 94.80 31.80 68.21 3.20GHz[139%] 44.20 94.81 31.79 68.21 3.20GHz[139%] 44.19 94.81 5. When the LPAR is idle: # ./lparstat -E 1 5 System Configuration type=Shared mode=Capped smt=8 lcpu=6 mem=10362752 kB cpus=10 ent=6.00 ---Actual--- -Normalized- %busy %idle Frequency %busy %idle -- -- - -- -- 0.04 0.14 2.41GHz[105%] 0.04 0.15 0.04 0.15 2.36GHz[102%] 0.04 0.15 0.03 0.13 2.35GHz[102%] 0.03 0.14 0.03 0.13 2.31GHz[100%] 0.03 0.13 0.03 0.13 2.32GHz[101%] 0.03 0.14 In this case, the sum of the PURR values do not add up to the elapsed TB. This is probably due to the Hypervisor putting the core into some power-saving state with the runlatch turned off. # ./purr_tb -t 8 Got threads_per_core = 8 CORE 0: CPU 0 : Delta PURR : 85744 CPU 1 : Delta PURR : 113632 CPU 2 : Delta PURR : 78224 CPU 3 : Delta PURR : 68856 CPU 4 : Delta PURR : 78064 CPU 5 : Delta PURR : 60488 CPU 6 : Delta PURR : 6 CPU 7 : Delta PURR : 59464 Total Delta PURR : 622248 (Expected ~513156096)
Re: [PATCH v2 0/5] Fix ELF / FDPIC ELF core dumping, and use mmap_sem properly in there
On 1/5/20 5:07 am, Eric W. Biederman wrote: Linus Torvalds writes: On Thu, Apr 30, 2020 at 7:10 AM Greg Ungerer wrote: Most of that file goes back to pre-git days. And most of the commits since are not so much about binfmt_flat, as they are about cleanups or changes elsewhere where binfmt_flat was just a victim. I'll have a look at this. Thanks. Quick hack test shows moving setup_new_exec(bprm) to be just before install_exec_creds(bprm) works fine for the static binaries case. Doing the flush_old_exec(bprm) there too crashed out - I'll need to dig into that to see why. Just moving setup_new_exec() would at least allow us to then join the two together, and just say "setup_new_exec() does the credential installation too". But it is only half a help if we allow failure points between flush_old_exec and install_exec_creds. Greg do things work acceptably if install_exec_creds is moved to right after setup_new_exec? (patch below) Yes, confirmed. Worked fine with that patch applied. Looking at the code in load_flat_file after setup_new_exec it looks like the kinds of things that in binfmt_elf.c we do after install_exec_creds (aka vm_map). So I think we want install_exec_creds sooner, instead of setup_new_exec later. But if it's true that nobody really uses the odd flat library support any more and there are no testers, maybe we should consider ripping it out... I looked a little deeper and there is another reason to think about ripping out the flat library loader. The code is recursive, and supports a maximum of 4 shared libraries in the entire system. load_flat_binary load_flat_file calc_reloc load_flat_shared_libary load_flat_file I am mystified with what kind of system can survive with a grand total of 4 shared libaries. I think my a.out slackware system that I ran on my i486 had more shared libraries. The kind of embedded systems that were built with this stuff 20 years ago didn't have lots of applications and libraries. I think we found back then that most of your savings were from making libc shared. Less significant gains from making other libraries shared. And there was a bit of extra pain in setting them up with the shared library code generation options (that had to be unique for each one). The whole mechanism is a bit of hack, and there was a few other limitations with the way it worked (I don't recall what they were right now). I am definitely in favor of removing it. Regards Greg Having read just a bit more it is definitely guaranteed (by the code) that the first time load_flat_file is called id 0 will be used (aka id 0 is guaranteed to be the binary), and the ids 1, 2, 3 and 4 will only be used if a relocation includes that id to reference an external shared library. That part of the code is drop dead simple. --- This is what I was thinking about applying. diff --git a/fs/binfmt_flat.c b/fs/binfmt_flat.c index 831a2b25ba79..1a1d1fcb893f 100644 --- a/fs/binfmt_flat.c +++ b/fs/binfmt_flat.c @@ -541,6 +541,7 @@ static int load_flat_file(struct linux_binprm *bprm, /* OK, This is the point of no return */ set_personality(PER_LINUX_32BIT); setup_new_exec(bprm); + install_exec_creds(bprm); } /* @@ -963,8 +964,6 @@ static int load_flat_binary(struct linux_binprm *bprm) } } - install_exec_creds(bprm); - set_binfmt(_format); #ifdef CONFIG_MMU
Re: [PATCH] um: do not evaluate compiler's library path when cleaning
On Sat, Apr 18, 2020 at 3:06 AM Masahiro Yamada wrote: > > Since commit a83e4ca26af8 ("kbuild: remove cc-option switch from > -Wframe-larger-than="), 'make ARCH=um clean' emits an error message > as follows: > > $ make ARCH=um clean > gcc: error: missing argument to '-Wframe-larger-than=' > > We do not care compiler flags when cleaning. > > Use the '=' operator for lazy expansion because we do not use > LDFLAGS_pcap.o or LDFLAGS_vde.o when cleaning. > > While I was here, I removed the redundant -r option because it > already exists in the recipe. > > Fixes: a83e4ca26af8 ("kbuild: remove cc-option switch from > -Wframe-larger-than=") > Signed-off-by: Masahiro Yamada > --- > > arch/um/drivers/Makefile | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/um/drivers/Makefile b/arch/um/drivers/Makefile > index a290821e355c..2a249f619467 100644 > --- a/arch/um/drivers/Makefile > +++ b/arch/um/drivers/Makefile > @@ -18,9 +18,9 @@ ubd-objs := ubd_kern.o ubd_user.o > port-objs := port_kern.o port_user.o > harddog-objs := harddog_kern.o harddog_user.o > > -LDFLAGS_pcap.o := -r $(shell $(CC) $(KBUILD_CFLAGS) > -print-file-name=libpcap.a) > +LDFLAGS_pcap.o = $(shell $(CC) $(KBUILD_CFLAGS) -print-file-name=libpcap.a) > > -LDFLAGS_vde.o := -r $(shell $(CC) $(CFLAGS) -print-file-name=libvdeplug.a) > +LDFLAGS_vde.o = $(shell $(CC) $(CFLAGS) -print-file-name=libvdeplug.a) > > targets := pcap_kern.o pcap_user.o vde_kern.o vde_user.o > > -- > 2.25.1 > Applied to linux-kbuild. -- Best Regards Masahiro Yamada
Re: [PATCH] kbuild: remove unused AS assignment
On Mon, Apr 27, 2020 at 5:06 PM Will Deacon wrote: > > On Mon, Apr 27, 2020 at 09:30:19AM +0900, Masahiro Yamada wrote: > > $(AS) is not used anywhere, hence commit aa824e0c962b ("kbuild: remove > > AS variable") killed it. > > > > Remove the left-over code in arch/{arm,arm64}/Makefile. > > > > Signed-off-by: Masahiro Yamada > > --- > > > > arch/arm/Makefile | 2 -- > > arch/arm64/Makefile | 2 -- > > 2 files changed, 4 deletions(-) > > Acked-by: Will Deacon > > (If you post the arm64 part as a separate patch, I can pick it up) This is just a trivial cleanup patch. So, I do not split it per-arch. Applied to linux-kbuild. -- Best Regards Masahiro Yamada
Re: [PATCH] kbuild: remove '/' target
On Sat, Apr 25, 2020 at 10:18 PM Masahiro Yamada wrote: > > This notice has been here for a while. Remove it entirely now. > > Signed-off-by: Masahiro Yamada > --- > Applied to linux-kbuild. > Makefile | 4 > 1 file changed, 4 deletions(-) > > diff --git a/Makefile b/Makefile > index 49b2709ff44e..95c2d8c2dfe8 100644 > --- a/Makefile > +++ b/Makefile > @@ -1650,10 +1650,6 @@ _emodinst_post: _emodinst_ > clean-dirs := $(KBUILD_EXTMOD) > clean: rm-files := $(KBUILD_EXTMOD)/Module.symvers > $(KBUILD_EXTMOD)/modules.nsdeps > > -PHONY += / > -/: > - @echo >&2 '"$(MAKE) /" is no longer supported. Please use "$(MAKE) > ./" instead.' > - > PHONY += help > help: > @echo ' Building external modules.' > -- > 2.25.1 > -- Best Regards Masahiro Yamada
Re: [PATCH] kbuild: determine the output format of DTC by the target suffix
On Mon, Apr 27, 2020 at 11:13 PM Masahiro Yamada wrote: > > cmd_dtc takes the additional parameter $(2) to select the target > format, dtb or yaml. This makes things complicated when it is used > with cmd_and_fixdep and if_changed_rule. I actually stumbled on this. > See commit 3d4b2238684a ("kbuild: fix DT binding schema rule again to > avoid needless rebuilds"). > > Extract the suffix part of the target instead of passing the parameter. > Fortunately, this works for both $(obj)/%.dtb and $(obj)/%.dt.yaml . > > Signed-off-by: Masahiro Yamada Applied to linux-kbuild. > --- > > scripts/Makefile.lib | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib > index 12f6a331a8f3..cd52a8c6428f 100644 > --- a/scripts/Makefile.lib > +++ b/scripts/Makefile.lib > @@ -287,13 +287,13 @@ $(obj)/%.dtb.S: $(obj)/%.dtb FORCE > quiet_cmd_dtc = DTC $@ > cmd_dtc = mkdir -p $(dir ${dtc-tmp}) ; \ > $(HOSTCC) -E $(dtc_cpp_flags) -x assembler-with-cpp -o $(dtc-tmp) $< > ; \ > - $(DTC) -O $(2) -o $@ -b 0 \ > + $(DTC) -O $(patsubst .%,%,$(suffix $@)) -o $@ -b 0 \ > $(addprefix -i,$(dir $<) $(DTC_INCLUDE)) $(DTC_FLAGS) \ > -d $(depfile).dtc.tmp $(dtc-tmp) ; \ > cat $(depfile).pre.tmp $(depfile).dtc.tmp > $(depfile) > > $(obj)/%.dtb: $(src)/%.dts $(DTC) FORCE > - $(call if_changed_dep,dtc,dtb) > + $(call if_changed_dep,dtc) > > DT_CHECKER ?= dt-validate > DT_BINDING_DIR := Documentation/devicetree/bindings > @@ -304,7 +304,7 @@ quiet_cmd_dtb_check = CHECK $@ >cmd_dtb_check = $(DT_CHECKER) -u $(srctree)/$(DT_BINDING_DIR) -p > $(DT_TMP_SCHEMA) $@ > > define rule_dtc > - $(call cmd_and_fixdep,dtc,yaml) > + $(call cmd_and_fixdep,dtc) > $(call cmd,dtb_check) > endef > > -- > 2.25.1 > -- Best Regards Masahiro Yamada
[PATCH v2 2/2] PCI: hv: Retry PCI bus D0 entry when the first attempt failed with invalid device state
In the case of kdump, the PCI device was not cleanly shut down before the kdump kernel starts. This causes the initial attempt of entering D0 state in the kdump kernel to fail with invalid device state returned from Hyper-V host. When this happens, explicitly call PCI bus exit and retry to enter the D0 state. Signed-off-by: Wei Hu --- v2: Incorporate review comments from Michael Kelley, Dexuan Cui and Bjorn Helgaas drivers/pci/controller/pci-hyperv.c | 40 +++-- 1 file changed, 38 insertions(+), 2 deletions(-) diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c index e6fac0187722..92092a47d3af 100644 --- a/drivers/pci/controller/pci-hyperv.c +++ b/drivers/pci/controller/pci-hyperv.c @@ -2739,6 +2739,8 @@ static void hv_free_config_window(struct hv_pcibus_device *hbus) vmbus_free_mmio(hbus->mem_config->start, PCI_CONFIG_MMIO_LENGTH); } +static int hv_pci_bus_exit(struct hv_device *hdev, bool keep_devs); + /** * hv_pci_enter_d0() - Bring the "bus" into the D0 power state * @hdev: VMBus's tracking struct for this root PCI bus @@ -2751,8 +2753,10 @@ static int hv_pci_enter_d0(struct hv_device *hdev) struct pci_bus_d0_entry *d0_entry; struct hv_pci_compl comp_pkt; struct pci_packet *pkt; + bool retry = true; int ret; +enter_d0_retry: /* * Tell the host that the bus is ready to use, and moved into the * powered-on state. This includes telling the host which region @@ -2779,6 +2783,38 @@ static int hv_pci_enter_d0(struct hv_device *hdev) if (ret) goto exit; + /* +* In certain case (Kdump) the pci device of interest was +* not cleanly shut down and resource is still held on host +* side, the host could return invalid device status. +* We need to explicitly request host to release the resource +* and try to enter D0 again. +*/ + if (comp_pkt.completion_status < 0 && retry) { + retry = false; + + dev_err(>device, "Retrying D0 Entry\n"); + + /* +* Hv_pci_bus_exit() calls hv_send_resource_released() +* to free up resources of its child devices. +* In the kdump kernel we need to set the +* wslot_res_allocated to 255 so it scans all child +* devices to release resources allocated in the +* normal kernel before panic happened. +*/ + hbus->wslot_res_allocated = 255; + + ret = hv_pci_bus_exit(hdev, true); + + if (ret == 0) { + kfree(pkt); + goto enter_d0_retry; + } + dev_err(>device, + "Retrying D0 failed with ret %d\n", ret); + } + if (comp_pkt.completion_status < 0) { dev_err(>device, "PCI Pass-through VSP failed D0 Entry with status %x\n", @@ -3185,7 +3221,7 @@ static int hv_pci_probe(struct hv_device *hdev, return ret; } -static int hv_pci_bus_exit(struct hv_device *hdev, bool hibernating) +static int hv_pci_bus_exit(struct hv_device *hdev, bool keep_devs) { struct hv_pcibus_device *hbus = hv_get_drvdata(hdev); struct { @@ -3203,7 +3239,7 @@ static int hv_pci_bus_exit(struct hv_device *hdev, bool hibernating) if (hdev->channel->rescind) return 0; - if (!hibernating) { + if (!keep_devs) { /* Delete any children which might still exist. */ dr = kzalloc(sizeof(*dr), GFP_KERNEL); if (dr && hv_pci_start_relations_work(hbus, dr)) -- 2.20.1
Re: [PATCH] parisc: suppress error messages for 'make clean'
On Sat, Apr 25, 2020 at 2:47 PM Masahiro Yamada wrote: > > 'make ARCH=parisc clean' emits a tons of error messages as follows: > > $ make ARCH=parisc clean > gcc: error: unrecognized command line option '-mno-space-regs' > gcc: error: unrecognized command line option '-mfast-indirect-calls'; did > you mean '-mforce-indirect-call'? > gcc: error: unrecognized command line option '-mdisable-fpregs' > gcc: error: missing argument to '-Wframe-larger-than=' > gcc: error: unrecognized command line option '-mno-space-regs' > gcc: error: unrecognized command line option '-mfast-indirect-calls'; did > you mean '-mforce-indirect-call'? > gcc: error: unrecognized command line option '-mdisable-fpregs' > gcc: error: missing argument to '-Wframe-larger-than=' > ... > > You can supporess them except '-Wframe-larger-than' by setting correct > CROSS_COMPILE=, but we should not require any compiler for cleaning. > > This $(shell ...) is evaluated so many times because LIBGCC is exported. > Use the ':=' operator to evaluate it just once, and sink the stderr. > Applied to linux-kbuild. > Signed-off-by: Masahiro Yamada > --- > > arch/parisc/Makefile | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/parisc/Makefile b/arch/parisc/Makefile > index 628cd8bb7ad8..d82787da43cd 100644 > --- a/arch/parisc/Makefile > +++ b/arch/parisc/Makefile > @@ -21,7 +21,7 @@ KBUILD_IMAGE := vmlinuz > > NM = sh $(srctree)/arch/parisc/nm > CHECKFLAGS += -D__hppa__=1 > -LIBGCC = $(shell $(CC) $(KBUILD_CFLAGS) -print-libgcc-file-name) > +LIBGCC := $(shell $(CC) $(KBUILD_CFLAGS) -print-libgcc-file-name > 2>/dev/null) > export LIBGCC > > ifdef CONFIG_64BIT > -- > 2.25.1 > -- Best Regards Masahiro Yamada
[PATCH v2 1/2] PCI: hv: Fix the PCI HyperV probe failure path to release resource properly
Some error cases in hv_pci_probe() were not handled. Fix these error paths to release the resourses and clean up the state properly. Signed-off-by: Wei Hu --- drivers/pci/controller/pci-hyperv.c | 20 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c index e15022ff63e3..e6fac0187722 100644 --- a/drivers/pci/controller/pci-hyperv.c +++ b/drivers/pci/controller/pci-hyperv.c @@ -480,6 +480,9 @@ struct hv_pcibus_device { struct workqueue_struct *wq; + /* Highest slot of child device with resources allocated */ + int wslot_res_allocated; + /* hypercall arg, must not cross page boundary */ struct hv_retarget_device_interrupt retarget_msi_interrupt_params; @@ -2847,7 +2850,7 @@ static int hv_send_resources_allocated(struct hv_device *hdev) struct hv_pci_dev *hpdev; struct pci_packet *pkt; size_t size_res; - u32 wslot; + int wslot; int ret; size_res = (hbus->protocol_version < PCI_PROTOCOL_VERSION_1_2) @@ -2900,6 +2903,8 @@ static int hv_send_resources_allocated(struct hv_device *hdev) comp_pkt.completion_status); break; } + + hbus->wslot_res_allocated = wslot; } kfree(pkt); @@ -2918,10 +2923,10 @@ static int hv_send_resources_released(struct hv_device *hdev) struct hv_pcibus_device *hbus = hv_get_drvdata(hdev); struct pci_child_message pkt; struct hv_pci_dev *hpdev; - u32 wslot; + int wslot; int ret; - for (wslot = 0; wslot < 256; wslot++) { + for (wslot = hbus->wslot_res_allocated; wslot >= 0; wslot--) { hpdev = get_pcichild_wslot(hbus, wslot); if (!hpdev) continue; @@ -2936,8 +2941,12 @@ static int hv_send_resources_released(struct hv_device *hdev) VM_PKT_DATA_INBAND, 0); if (ret) return ret; + + hbus->wslot_res_allocated = wslot - 1; } + hbus->wslot_res_allocated = -1; + return 0; } @@ -3037,6 +3046,7 @@ static int hv_pci_probe(struct hv_device *hdev, if (!hbus) return -ENOMEM; hbus->state = hv_pcibus_init; + hbus->wslot_res_allocated = -1; /* * The PCI bus "domain" is what is called "segment" in ACPI and other @@ -3136,7 +3146,7 @@ static int hv_pci_probe(struct hv_device *hdev, ret = hv_pci_allocate_bridge_windows(hbus); if (ret) - goto free_irq_domain; + goto exit_d0; ret = hv_send_resources_allocated(hdev); if (ret) @@ -3154,6 +3164,8 @@ static int hv_pci_probe(struct hv_device *hdev, free_windows: hv_pci_free_bridge_windows(hbus); +exit_d0: + (void) hv_pci_bus_exit(hdev, true); free_irq_domain: irq_domain_remove(hbus->irq_domain); free_fwnode: -- 2.20.1
Re: [PATCH] unicore32: do not evaluate compiler's library path when cleaning
On Sat, Apr 25, 2020 at 3:07 PM Masahiro Yamada wrote: > > Since commit a83e4ca26af8 ("kbuild: remove cc-option switch from > -Wframe-larger-than="), 'make ARCH=unicore32 clean' emits error > messages as follows: > > $ make ARCH=unicore32 clean > gcc: error: missing argument to '-Wframe-larger-than=' > gcc: error: missing argument to '-Wframe-larger-than=' > > We do not care compiler flags when cleaning. > > Use the '=' operator for lazy expansion because we do not use > GNU_LIBC_A or GNU_LIBGCC_A when cleaning. > > Fixes: a83e4ca26af8 ("kbuild: remove cc-option switch from > -Wframe-larger-than=") > Signed-off-by: Masahiro Yamada > --- > Applied to linux-kbuild. > arch/unicore32/lib/Makefile | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/unicore32/lib/Makefile b/arch/unicore32/lib/Makefile > index 098981a01841..5af06645b8f0 100644 > --- a/arch/unicore32/lib/Makefile > +++ b/arch/unicore32/lib/Makefile > @@ -10,12 +10,12 @@ lib-y += strncpy_from_user.o strnlen_user.o > lib-y += clear_user.o copy_page.o > lib-y += copy_from_user.o copy_to_user.o > > -GNU_LIBC_A := $(shell $(CC) $(KBUILD_CFLAGS) > -print-file-name=libc.a) > +GNU_LIBC_A = $(shell $(CC) $(KBUILD_CFLAGS) > -print-file-name=libc.a) > GNU_LIBC_A_OBJS:= memchr.o memcpy.o memmove.o memset.o > GNU_LIBC_A_OBJS+= strchr.o strrchr.o > GNU_LIBC_A_OBJS+= rawmemchr.o # needed by > strrchr.o > > -GNU_LIBGCC_A := $(shell $(CC) $(KBUILD_CFLAGS) > -print-file-name=libgcc.a) > +GNU_LIBGCC_A = $(shell $(CC) $(KBUILD_CFLAGS) > -print-file-name=libgcc.a) > GNU_LIBGCC_A_OBJS := _ashldi3.o _ashrdi3.o _lshrdi3.o > GNU_LIBGCC_A_OBJS += _divsi3.o _modsi3.o _ucmpdi2.o _umodsi3.o > _udivsi3.o > > -- > 2.25.1 > -- Best Regards Masahiro Yamada
Re: [PATCH] h8300: suppress error messages for 'make clean'
On Sat, Apr 25, 2020 at 2:18 PM Masahiro Yamada wrote: > > 'make ARCH=h8300 clean' emits error messages as follows: > > $ make ARCH=h8300 clean > gcc: error: missing argument to '-Wframe-larger-than=' > gcc: error: unrecognized command line option '-mint32' > > You can suppress the second one by setting the correct CROSS_COMPILE=, > but we should not require any compiler for cleaning. > > Signed-off-by: Masahiro Yamada > --- Applied to linux-kbuild. > > arch/h8300/boot/compressed/Makefile | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/h8300/boot/compressed/Makefile > b/arch/h8300/boot/compressed/Makefile > index 9e2701069bbe..5942793f77a0 100644 > --- a/arch/h8300/boot/compressed/Makefile > +++ b/arch/h8300/boot/compressed/Makefile > @@ -18,7 +18,7 @@ CONFIG_MEMORY_START ?= 0x0040 > CONFIG_BOOT_LINK_OFFSET ?= 0x0028 > IMAGE_OFFSET := $(shell printf "0x%08x" > $$(($(CONFIG_MEMORY_START)+$(CONFIG_BOOT_LINK_OFFSET > > -LIBGCC := $(shell $(CROSS-COMPILE)$(CC) $(KBUILD_CFLAGS) > -print-libgcc-file-name) > +LIBGCC := $(shell $(CROSS-COMPILE)$(CC) $(KBUILD_CFLAGS) > -print-libgcc-file-name 2>/dev/null) > LDFLAGS_vmlinux := -Ttext $(IMAGE_OFFSET) -estartup -T $(obj)/vmlinux.lds \ > --defsym output=$(CONFIG_MEMORY_START) > > -- > 2.25.1 > -- Best Regards Masahiro Yamada
[PATCH v2 0/2] Fix PCI HyperV device error handling
This series better handles some PCI HyperV error cases in general and for kdump case. Some of review comments from previous individual patch reviews, including splitting into separate patches, have already been incorporated. Thanks, Wei Wei Hu (2): PCI: hv: Fix the PCI HyperV probe failure path to release resource properly PCI: hv: Retry PCI bus D0 entry when the first attempt failed with invalid device state drivers/pci/controller/pci-hyperv.c | 60 ++--- 1 file changed, 54 insertions(+), 6 deletions(-) -- 2.20.1
Re: [PATCH v2 1/4] kbuild: use $(CC_VERSION_TEXT) to evaluate CC_IS_GCC and CC_IS_CLANG
On Thu, Apr 23, 2020 at 11:24 PM Masahiro Yamada wrote: > > The result of '$(CC) --version | head -n 1' is already computed by the > top Makefile, and stored in the environment variable, CC_VERSION_TEXT. > > 'echo' is probably less expensive than the two commands $(CC) and > 'head' although this optimization is not noticeable level. > > Signed-off-by: Masahiro Yamada Applied to linux-kbuild. > --- > > Changes in v2: > - new patch > > init/Kconfig | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/init/Kconfig b/init/Kconfig > index 9e22ee8fbd75..5f797df3f043 100644 > --- a/init/Kconfig > +++ b/init/Kconfig > @@ -9,7 +9,7 @@ config DEFCONFIG_LIST > default "arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG)" > > config CC_IS_GCC > - def_bool $(success,$(CC) --version | head -n 1 | grep -q gcc) > + def_bool $(success,echo "$(CC_VERSION_TEXT)" | grep -q gcc) > > config GCC_VERSION > int > @@ -21,7 +21,7 @@ config LD_VERSION > default $(shell,$(LD) --version | $(srctree)/scripts/ld-version.sh) > > config CC_IS_CLANG > - def_bool $(success,$(CC) --version | head -n 1 | grep -q clang) > + def_bool $(success,echo "$(CC_VERSION_TEXT)" | grep -q clang) > > config CLANG_VERSION > int > -- > 2.25.1 > > -- > You received this message because you are subscribed to the Google Groups > "Clang Built Linux" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to clang-built-linux+unsubscr...@googlegroups.com. > To view this discussion on the web visit > https://groups.google.com/d/msgid/clang-built-linux/20200423142354.312088-1-masahiroy%40kernel.org. -- Best Regards Masahiro Yamada
Re: [PATCH] hexagon: suppress error message for 'make clean'
On Sat, Apr 25, 2020 at 4:06 AM Brian Cain wrote: > > > -Original Message- > > From: linux-hexagon-ow...@vger.kernel.org > ow...@vger.kernel.org> On Behalf Of Masahiro Yamada > ... > > 'make ARCH=hexagon clean' emits an error message as follows: > > > > $ make ARCH=hexagon clean > > gcc: error: unrecognized command line option '-G0' > > > > You can suppress it by setting the correct CROSS_COMPILE=, but we should > > not require any compiler for cleaning. > > > > Signed-off-by: Masahiro Yamada > > > Acked-by: Brian Cain Applied to linux-kbuild. Thanks. > > > --- > > > > arch/hexagon/Makefile | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/arch/hexagon/Makefile b/arch/hexagon/Makefile index > > 4c5858b80f0e..c168c6980d05 100644 > > --- a/arch/hexagon/Makefile > > +++ b/arch/hexagon/Makefile > > @@ -30,7 +30,7 @@ TIR_NAME := r19 > > KBUILD_CFLAGS += -ffixed-$(TIR_NAME) - > > DTHREADINFO_REG=$(TIR_NAME) -D__linux__ KBUILD_AFLAGS += - > > DTHREADINFO_REG=$(TIR_NAME) > > > > -LIBGCC := $(shell $(CC) $(KBUILD_CFLAGS) -print-libgcc-file-name) > > +LIBGCC := $(shell $(CC) $(KBUILD_CFLAGS) -print-libgcc-file-name > > +2>/dev/null) > > libs-y += $(LIBGCC) > > > > head-y := arch/hexagon/kernel/head.o > > -- > > 2.25.1 > > -- Best Regards Masahiro Yamada
Re: [RFC][PATCH] x86/ftrace: Have ftrace trampolines turn read-only at the end of system boot up
On Thu, Apr 30, 2020 at 11:47:33PM -0500, Josh Poimboeuf wrote: > On Thu, Apr 30, 2020 at 08:21:47PM -0400, Steven Rostedt wrote: > > The cause is the "ftrace=function" would register the function tracer > > and create a trampoline, and it will set it as executable and > > read-only. Then the "trace_options=func_stack_trace" would then update > > the same trampoline to include the stack tracer version of the function > > tracer. But since the trampoline already exists, it updates it with > > text_poke_bp(). The problem is that text_poke_bp() called while > > system_state == SYSTEM_BOOTING, it will simply do a memcpy() and not > > the page mapping, as it would think that the text is still read-write. > > But in this case it is not, and we take a fault and crash. > > > > Instead, lets keep the ftrace trampolines read-write during boot up, > > and then when the kernel executable text is set to read-only, the > > ftrace trampolines get set to read-only as well. > > Would it be easier to just call a new __text_poke_bp() which skips the > SYSTEM_BOOTING check, since you know the trampoline will always be > read-only? > > Like: early_trace_init() is called after mm_init(), so I thought it might work, but I guess not: [0.206271] Starting tracer 'function' [0.232032] BUG: kernel NULL pointer dereference, address: 0050 [0.232035] #PF: supervisor read access in kernel mode [0.232036] #PF: error_code(0x) - not-present page [0.232037] PGD 0 P4D 0 [0.232040] Oops: [#1] PREEMPT SMP PTI [0.232042] CPU: 0 PID: 0 Comm: swapper Not tainted 5.7.0-rc2-next-20200424+ #127 [0.232043] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014 [0.232047] RIP: 0010:walk_to_pmd+0x11/0x140 [0.232048] Code: 78 c1 32 82 e8 a0 68 ff ff 0f 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00 00 48 89 f0 48 c1 e8 24 25 f8 0f 00 00 <48> 03 47 50 0f 84 12 01 00 00 41 54 49 89 fc 55 48 89 c5 53 48 8b [0.232050] RSP: :82603c68 EFLAGS: 00010046 [0.232051] RAX: RBX: RCX: [0.232052] RDX: 82603cc8 RSI: RDI: [0.232053] RBP: 8063 R08: 0001 R09: fff00fff [0.232054] R10: R11: R12: 82603cc8 [0.232055] R13: R14: 0066 R15: feff [0.232056] FS: () GS:88813b60() knlGS: [0.232057] CS: 0010 DS: ES: CR0: 80050033 [0.232058] CR2: 0050 CR3: 02612001 CR4: 000606b0 [0.232061] DR0: DR1: DR2: [0.232062] DR3: DR6: fffe0ff0 DR7: 0400 [0.232063] Call Trace: [0.232066] __get_locked_pte+0x19/0x100 [0.232069] ? 0xa065 [0.232071] __text_poke+0x14d/0x640 [0.232073] ? 0xa065 [0.232076] text_poke_bp_batch+0x8b/0x1c0 [0.232078] ? 0xa065 [0.232080] __text_poke_bp+0x3a/0x60 [0.232083] arch_ftrace_update_trampoline+0xac/0x300 [0.232087] __register_ftrace_function+0x7c/0xc0 [0.232089] ftrace_startup+0x1e/0xf0 [0.232091] register_ftrace_function+0x24/0x70 [0.232093] func_set_flag+0x6f/0x90 [0.232096] __set_tracer_option.isra.0+0x24/0x50 [0.232098] trace_set_options+0x149/0x160 [0.232102] apply_trace_boot_options+0x44/0x6d [0.232105] register_tracer+0x1d8/0x1e9 [0.232107] early_trace_init+0x266/0x378 [0.232109] start_kernel+0x337/0x614 [0.232112] ? x86_family+0x5/0x20 [0.232114] secondary_startup_64+0xa4/0xb0 [0.232119] Modules linked in: [0.232120] CR2: 0050 [0.232126] random: get_random_bytes called from print_oops_end_marker+0x26/0x40 with crng_init=0 [0.232128] ---[ end trace 71e23a89b9b5224f ]--- [0.232130] RIP: 0010:walk_to_pmd+0x11/0x140 [0.232131] Code: 78 c1 32 82 e8 a0 68 ff ff 0f 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00 00 48 89 f0 48 c1 e8 24 25 f8 0f 00 00 <48> 03 47 50 0f 84 12 01 00 00 41 54 49 89 fc 55 48 89 c5 53 48 8b [0.232133] RSP: :82603c68 EFLAGS: 00010046 [0.232134] RAX: RBX: RCX: [0.232135] RDX: 82603cc8 RSI: RDI: [0.232136] RBP: 8063 R08: 0001 R09: fff00fff [0.232137] R10: R11: R12: 82603cc8 [0.232138] R13: R14: 0066 R15: feff [0.232139] FS: () GS:88813b60() knlGS: [0.232140] CS: 0010 DS: ES: CR0: 80050033 [0.232141] CR2: 0050 CR3: 02612001 CR4: 000606b0 [0.232142] DR0:
Re: BPFilter: bit size mismatch between bpfiter_umh and vmliux
On Thu, Apr 30, 2020 at 9:06 PM Masahiro Yamada wrote: > > Hi Alexei, > > On Wed, Apr 29, 2020 at 1:14 AM Alexei Starovoitov > wrote: > > > > > > At least, the build was successful, > > > but does this work at runtime? > > > > > > If this is a bug, I can fix it cleanly. > > > > > > I think the bit size of the user mode helper > > > should match to the kernel bit size. Is this correct? > > > > yes. they should match. > > In theory we can have -m32 umh running on 64-bit kernel, > > but I wouldn't bother adding support for such thing > > until there is a use case. > > Running 64-bit umh on 32-bit kernel is no go. > > > Thanks for the comments. > > > This issue will be fixed by this: > https://patchwork.kernel.org/patch/11515997/ > > and the Makefile will be cleaned up by this: > https://patchwork.kernel.org/patch/11515995/ > > > They are parts of the big series of Makefile cleanups. > So, I will apply the whole to kbuild tree. thank you. I saw the patches, but didn't have time to test or comment on them. To be fair umh logic was bit rotting a bit, but that will change soon.
Re: [PATCH v3 1/1] scsi: pm: Balance pm_only counter of request queue during system resume
On 2020-05-01 09:50, Bart Van Assche wrote: On 2020-04-30 18:42, Can Guo wrote: On 2020-05-01 04:32, Bart Van Assche wrote: > Has it been considered to test directly whether a SCSI device has been > runtime suspended instead of relying on blk_queue_pm_only()? How about > using pm_runtime_status_suspended() or adding a function in > block/blk-pm.h that checks whether q->rpm_status == RPM_SUSPENDED? Yes, I used to make the patch like that way, and it also worked well, as both ways are equal actually. I kinda like the current code because we should be confident that after scsi_dev_type_resume() returns, pm_only must be 0. Different reviewers may have different opinions, either way works well anyways. Hi Can, Please note that this is not a matter of personal preferences of a reviewer but a matter of correctness. blk_queue_pm_only() does not only return a value > 0 if a SCSI device has been runtime suspended but also returns true if scsi_device_quiesce() was called for another reason. Hence my request to test the "runtime suspended" status directly and not to rely on blk_queue_pm_only(). Thanks, Bart. Hi Bart, I agree we are pursuing correctness here, but as I said, I think both way are equally correct. I also agree with you that the alternative way, see [2], is much easier to be understood, we can take the alternative way if you are OK with it. [1] Currently, scsi_dev_type_resume() is the hooker for resume, thaw and restore. Per my understanding, when scsi_dev_type_resume() is running, it is not possible that scsi_device_quiesce() can be called to this sdev, at least not possible in current code base. So it is OK to rely on blk_queue_pm_only() in scsi_dev_type_resume(). [2] The alternative way which I have tested with is like below. I think it is what you requested for if my understanding is right, please correct me if I am wrong. diff --git a/drivers/scsi/scsi_pm.c b/drivers/scsi/scsi_pm.c index 3717eea..d18271d 100644 --- a/drivers/scsi/scsi_pm.c +++ b/drivers/scsi/scsi_pm.c @@ -74,12 +74,15 @@ static int scsi_dev_type_resume(struct device *dev, { const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL; int err = 0; + bool was_rpm_suspended = false; err = cb(dev, pm); scsi_device_resume(to_scsi_device(dev)); dev_dbg(dev, "scsi resume: %d\n", err); if (err == 0) { + was_rpm_suspended = pm_runtime_suspended(dev); + pm_runtime_disable(dev); err = pm_runtime_set_active(dev); pm_runtime_enable(dev); @@ -93,8 +96,10 @@ static int scsi_dev_type_resume(struct device *dev, */ if (!err && scsi_is_sdev_device(dev)) { struct scsi_device *sdev = to_scsi_device(dev); - - blk_set_runtime_active(sdev->request_queue); + if (was_rpm_suspended) + blk_post_runtime_resume(sdev->request_queue, 0); + else + blk_set_runtime_active(sdev->request_queue); } } Thanks, Can Guo
Re: [RFC][PATCH] x86/ftrace: Have ftrace trampolines turn read-only at the end of system boot up
On Thu, Apr 30, 2020 at 08:21:47PM -0400, Steven Rostedt wrote: > The cause is the "ftrace=function" would register the function tracer > and create a trampoline, and it will set it as executable and > read-only. Then the "trace_options=func_stack_trace" would then update > the same trampoline to include the stack tracer version of the function > tracer. But since the trampoline already exists, it updates it with > text_poke_bp(). The problem is that text_poke_bp() called while > system_state == SYSTEM_BOOTING, it will simply do a memcpy() and not > the page mapping, as it would think that the text is still read-write. > But in this case it is not, and we take a fault and crash. > > Instead, lets keep the ftrace trampolines read-write during boot up, > and then when the kernel executable text is set to read-only, the > ftrace trampolines get set to read-only as well. Would it be easier to just call a new __text_poke_bp() which skips the SYSTEM_BOOTING check, since you know the trampoline will always be read-only? Like: diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h index 67315fa3956a..710106256916 100644 --- a/arch/x86/include/asm/text-patching.h +++ b/arch/x86/include/asm/text-patching.h @@ -45,6 +45,7 @@ extern void *text_poke(void *addr, const void *opcode, size_t len); extern void text_poke_sync(void); extern void *text_poke_kgdb(void *addr, const void *opcode, size_t len); extern int poke_int3_handler(struct pt_regs *regs); +extern void __text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate); extern void text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate); extern void text_poke_queue(void *addr, const void *opcode, size_t len, const void *emulate); diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index 7867dfb3963e..9cc983cc9291 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -1265,6 +1265,14 @@ void __ref text_poke_queue(void *addr, const void *opcode, size_t len, const voi text_poke_loc_init(tp, addr, opcode, len, emulate); } +void __ref __text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate) +{ + struct text_poke_loc tp; + + text_poke_loc_init(, addr, opcode, len, emulate); + text_poke_bp_batch(, 1); +} + /** * text_poke_bp() -- update instructions on live kernel on SMP * @addr: address to patch @@ -1278,13 +1286,10 @@ void __ref text_poke_queue(void *addr, const void *opcode, size_t len, const voi */ void __ref text_poke_bp(void *addr, const void *opcode, size_t len, const void *emulate) { - struct text_poke_loc tp; - if (unlikely(system_state == SYSTEM_BOOTING)) { text_poke_early(addr, opcode, len); return; } - text_poke_loc_init(, addr, opcode, len, emulate); - text_poke_bp_batch(, 1); + __text_poke_bp(addr, opcode, len, emulate); } diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c index 867c126ddabe..c36f51f01f6e 100644 --- a/arch/x86/kernel/ftrace.c +++ b/arch/x86/kernel/ftrace.c @@ -469,7 +469,7 @@ void arch_ftrace_update_trampoline(struct ftrace_ops *ops) mutex_lock(_mutex); /* Do a safe modify in case the trampoline is executing */ new = ftrace_call_replace(ip, (unsigned long)func); - text_poke_bp((void *)ip, new, MCOUNT_INSN_SIZE, NULL); + __text_poke_bp((void *)ip, new, MCOUNT_INSN_SIZE, NULL); mutex_unlock(_mutex); }
[PATCH v2] vhost: vsock: kick send_pkt worker once device is started
Ning Bo reported an abnormal 2-second gap when booting Kata container [1]. The unconditional timeout was caused by VSOCK_DEFAULT_CONNECT_TIMEOUT of connecting from the client side. The vhost vsock client tries to connect an initializing virtio vsock server. The abnormal flow looks like: host-userspace vhost vsock guest vsock == === connect() > vhost_transport_send_pkt_work() initializing | vq->private_data==NULL | will not be queued V schedule_timeout(2s) vhost_vsock_start() <- device ready set vq->private_data wait for 2s and failed connect() again vq->private_data!=NULL recv connecting pkt Details: 1. Host userspace sends a connect pkt, at that time, guest vsock is under initializing, hence the vhost_vsock_start has not been called. So vq->private_data==NULL, and the pkt is not been queued to send to guest 2. Then it sleeps for 2s 3. After guest vsock finishes initializing, vq->private_data is set 4. When host userspace wakes up after 2s, send connecting pkt again, everything is fine. As suggested by Stefano Garzarella, this fixes it by additional kicking the send_pkt worker in vhost_vsock_start once the virtio device is started. This makes the pending pkt sent again. After this patch, kata-runtime (with vsock enabled) boot time is reduced from 3s to 1s on a ThunderX2 arm64 server. [1] https://github.com/kata-containers/runtime/issues/1917 Reported-by: Ning Bo Suggested-by: Stefano Garzarella Signed-off-by: Jia He --- v2: new solution suggested by Stefano Garzarella drivers/vhost/vsock.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index e36aaf9ba7bd..0716a9cdffee 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -543,6 +543,11 @@ static int vhost_vsock_start(struct vhost_vsock *vsock) mutex_unlock(>mutex); } + /* Some packets may have been queued before the device was started, +* let's kick the send worker to send them. +*/ + vhost_work_queue(>dev, >send_pkt_work); + mutex_unlock(>dev.mutex); return 0; -- 2.17.1
[PATCH kunit-next] kunit: kunit_tool: Separate out config/build/exec/parse
Add new subcommands to kunit.py to allow stages of the existing 'run' subcommand to be run independently: - 'config': Verifies that .config is a subset of .kunitconfig - 'build': Compiles a UML kernel for KUnit - 'exec': Runs the kernel, and outputs the test results. - 'parse': Parses test results from a file or stdin The 'run' command continues to behave as before. Signed-off-by: David Gow Reviewed-by: Brendan Higgins --- Changes since the RFC[1]: - Rebased on top of kselftest/kunit, adding support for --alltests and --make_options - Fixed an issue where the help text wouldn't print properly if an invalid subcommand were specified. - Changed 'parse' subcommand to make the filename optional (and default to stdin) as suggested by Brendan. [1]: https://lkml.org/lkml/2020/3/11/1225 tools/testing/kunit/kunit.py | 293 - tools/testing/kunit/kunit_tool_test.py | 63 +- 2 files changed, 296 insertions(+), 60 deletions(-) diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py index 7dca74774dd2..b01838b6f5f9 100755 --- a/tools/testing/kunit/kunit.py +++ b/tools/testing/kunit/kunit.py @@ -20,8 +20,17 @@ import kunit_config import kunit_kernel import kunit_parser -KunitResult = namedtuple('KunitResult', ['status','result']) +KunitResult = namedtuple('KunitResult', ['status','result','elapsed_time']) +KunitConfigRequest = namedtuple('KunitConfigRequest', + ['build_dir', 'defconfig', 'make_options']) +KunitBuildRequest = namedtuple('KunitBuildRequest', + ['jobs', 'build_dir', 'alltests', + 'make_options']) +KunitExecRequest = namedtuple('KunitExecRequest', + ['timeout', 'build_dir', 'alltests']) +KunitParseRequest = namedtuple('KunitParseRequest', + ['raw_output', 'input_data']) KunitRequest = namedtuple('KunitRequest', ['raw_output','timeout', 'jobs', 'build_dir', 'defconfig', 'alltests', 'make_options']) @@ -46,14 +55,25 @@ def get_kernel_root_path(): sys.exit(1) return parts[0] -def run_tests(linux: kunit_kernel.LinuxSourceTree, - request: KunitRequest) -> KunitResult: +def config_tests(linux: kunit_kernel.LinuxSourceTree, +request: KunitConfigRequest) -> KunitResult: + kunit_parser.print_with_timestamp('Configuring KUnit Kernel ...') + config_start = time.time() + if request.defconfig: + create_default_kunitconfig() success = linux.build_reconfig(request.build_dir, request.make_options) config_end = time.time() if not success: - return KunitResult(KunitStatus.CONFIG_FAILURE, 'could not configure kernel') + return KunitResult(KunitStatus.CONFIG_FAILURE, + 'could not configure kernel', + config_end - config_start) + return KunitResult(KunitStatus.SUCCESS, + 'configured kernel successfully', + config_end - config_start) +def build_tests(linux: kunit_kernel.LinuxSourceTree, + request: KunitBuildRequest) -> KunitResult: kunit_parser.print_with_timestamp('Building KUnit Kernel ...') build_start = time.time() @@ -64,78 +84,167 @@ def run_tests(linux: kunit_kernel.LinuxSourceTree, build_end = time.time() if not success: return KunitResult(KunitStatus.BUILD_FAILURE, 'could not build kernel') + if not success: + return KunitResult(KunitStatus.BUILD_FAILURE, + 'could not build kernel', + build_end - build_start) + return KunitResult(KunitStatus.SUCCESS, + 'built kernel successfully', + build_end - build_start) +def exec_tests(linux: kunit_kernel.LinuxSourceTree, + request: KunitExecRequest) -> KunitResult: kunit_parser.print_with_timestamp('Starting KUnit Kernel ...') test_start = time.time() - kunit_output = linux.run_kernel( + result = linux.run_kernel( timeout=None if request.alltests else request.timeout, build_dir=request.build_dir) + + test_end = time.time() + + return KunitResult(KunitStatus.SUCCESS, + result, + test_end - test_start) + +def parse_tests(request: KunitParseRequest) -> KunitResult: + parse_start = time.time() + + test_result = kunit_parser.TestResult(kunit_parser.TestStatus.SUCCESS, + [], + 'Tests not Parsed.') if request.raw_output: - raw_output =
Re: [PATCH v3 3/5] fs: Enable to enforce noexec mounts or file exec through RESOLVE_MAYEXEC
On Tue, 28 Apr 2020, Mickaël Salaün wrote: > Enable to either propagate the mount options from the underlying VFS > mount to prevent execution, or to propagate the file execute permission. > This may allow a script interpreter to check execution permissions > before reading commands from a file. I'm finding the description of this patch difficult to understand. In the first case, this seems to mean: if you open a file with RESOLVE_MAYEXEC from a noexec mount, then it will fail. Correct? In the second case, do you mean a RESOLVE_MAYEXEC open will fail if the file does not have +x set for the user? > The main goal is to be able to protect the kernel by restricting > arbitrary syscalls that an attacker could perform with a crafted binary > or certain script languages. This sounds like the job of seccomp. Why is this part of MAYEXEC? > It also improves multilevel isolation > by reducing the ability of an attacker to use side channels with > specific code. These restrictions can natively be enforced for ELF > binaries (with the noexec mount option) but require this kernel > extension to properly handle scripts (e.g., Python, Perl). Again, not sure why you're talking about side channels and MAYEXEC and mount options. Are you more generally talking about being able to prevent execution of arbitrary script files included by an interpreter? > Add a new sysctl fs.open_mayexec_enforce to control this behavior. > Indeed, because of compatibility with installed systems, only the system > administrator is able to check that this new enforcement is in line with > the system mount points and file permissions. A following patch adds > documentation. I don't like the idea of any of this feature set being configurable. RESOLVE_MAYEXEC as a new flag should have well-defined, stable semantics. -- James Morris
Re: [PATCH] pipe: read/write_iter() handler should check for IOCB_NOWAIT
On 4/30/20 10:14 PM, Jens Axboe wrote: > On 4/30/20 9:58 PM, Al Viro wrote: >> On Thu, Apr 30, 2020 at 10:24:46AM -0600, Jens Axboe wrote: >>> Pipe read/write only checks for the file O_NONBLOCK flag, but we should >>> also check for IOCB_NOWAIT for whether or not we should handle this read >>> or write in a non-blocking fashion. If we don't, then we will block on >>> data or space for iocbs that explicitly asked for non-blocking >>> operation. This messes up callers that explicitly ask for non-blocking >>> operations. >> >> Why does io_uring allow setting IOCB_NOWAIT without FMODE_NOWAIT, anyway? > > To do per-io non-blocking operations. It's not practical or convenient > to flip the file flag, nor does it work if you have multiple of them > going. If pipes honor the flag for the read/write iter handlers, then > we can handle them a lot more efficiently instead of requiring async > offload. Sorry, I think I misread you and the answer, while correct by itself, is not the answer to the question you are asking. You're saying we should not be using IOCB_NOWAIT if FMODE_NOWAIT isn't set, which is fair. I'll re-do the patch, we can probably just use FMODE_NOWAIT for the final check in io_uring instead. For pipes, we should include the setting of FMODE_NOWAIT for fifo_open() with the patch, at least. -- Jens Axboe
Re: [RFC][PATCH] x86/mm: Sync all vmalloc mappings before text_poke()
On Thu, 30 Apr 2020 22:26:55 -0400 (EDT) Mathieu Desnoyers wrote: > The tracers just have to make sure they perform their vmalloc'd memory > allocation before registering the tracepoint which can touch it, else they > need to issue vmalloc_sync_mappings() on their own before making the > newly allocated memory observable by instrumentation. What gets me is that I added the patch below (which adds a vmalloc_sync_mappings() just after the alloc_percpu()), but I also recorded all instances of vmalloc() with a stackdump, and I get this: colord-1673 [002] 84.764804: __vmalloc_node_range+0x5/0x2c0: vmalloc called here colord-1673 [002] 84.764807: => __ftrace_trace_stack+0x161/0x1a0 => __vmalloc_node_range+0x4d/0x2c0 => module_alloc+0x7e/0xd0 => bpf_jit_binary_alloc+0x70/0x110 => bpf_int_jit_compile+0x139/0x40a => bpf_prog_select_runtime+0xa3/0x120 => bpf_prepare_filter+0x533/0x5a0 => sk_attach_filter+0x13/0x50 => sock_setsockopt+0xd2f/0xf90 => __sys_setsockopt+0x18a/0x1a0 => __x64_sys_setsockopt+0x20/0x30 => do_syscall_64+0x60/0x230 => entry_SYSCALL_64_after_hwframe+0x49/0xb3 => 0 => 0 => 0 => 0 => 0 => 0 => 0 => 0 [ the above is from before the tracing started ] trace-cmd-1687 [002] 103.908850: __vmalloc_node_range+0x5/0x2c0: vmalloc called here trace-cmd-1687 [002] 103.908856: => __ftrace_trace_stack+0x161/0x1a0 => __vmalloc_node_range+0x4d/0x2c0 => vzalloc+0x48/0x50 => trace_pid_write+0x23d/0x2b0 => pid_write.isra.62+0xd1/0x2f0 => vfs_write+0xa8/0x1b0 => ksys_write+0x67/0xe0 => do_syscall_64+0x60/0x230 => entry_SYSCALL_64_after_hwframe+0x49/0xb3 => 0 => 0 => 0 => 0 => 0 => 0 => 0 => 0 trace-cmd-1697 [003] 104.088950: __vmalloc_node_range+0x5/0x2c0: vmalloc called here trace-cmd-1697 [003] 104.088954: => __ftrace_trace_stack+0x161/0x1a0 => __vmalloc_node_range+0x4d/0x2c0 => vzalloc+0x48/0x50 => trace_pid_write+0x23d/0x2b0 => pid_write.isra.62+0xd1/0x2f0 => vfs_write+0xa8/0x1b0 => ksys_write+0x67/0xe0 => do_syscall_64+0x60/0x230 => entry_SYSCALL_64_after_hwframe+0x49/0xb3 => 0 => 0 => 0 => 0 => 0 => 0 => 0 => 0 trace-cmd-1697 [003] 104.089666: __vmalloc_node_range+0x5/0x2c0: vmalloc called here trace-cmd-1697 [003] 104.089669: => __ftrace_trace_stack+0x161/0x1a0 => __vmalloc_node_range+0x4d/0x2c0 => vzalloc+0x48/0x50 => trace_pid_write+0xc1/0x2b0 => pid_write.isra.62+0xd1/0x2f0 => vfs_write+0xa8/0x1b0 => ksys_write+0x67/0xe0 => do_syscall_64+0x60/0x230 => entry_SYSCALL_64_after_hwframe+0x49/0xb3 => 0 => 0 => 0 => 0 => 0 => 0 => 0 => 0 trace-cmd-1697 [003] 104.098920: __vmalloc_node_range+0x5/0x2c0: vmalloc called here trace-cmd-1697 [003] 104.098924: => __ftrace_trace_stack+0x161/0x1a0 => __vmalloc_node_range+0x4d/0x2c0 => vzalloc+0x48/0x50 => trace_pid_write+0xc1/0x2b0 => pid_write.isra.62+0xd1/0x2f0 => vfs_write+0xa8/0x1b0 => ksys_write+0x67/0xe0 => do_syscall_64+0x60/0x230 => entry_SYSCALL_64_after_hwframe+0x49/0xb3 => 0 => 0 => 0 => 0 => 0 => 0 => 0 => 0 trace-cmd-1697 [003] 104.114518: __vmalloc_node_range+0x5/0x2c0: vmalloc called here trace-cmd-1697 [003] 104.114520: => __ftrace_trace_stack+0x161/0x1a0 => __vmalloc_node_range+0x4d/0x2c0 => vzalloc+0x48/0x50 => trace_pid_write+0xc1/0x2b0 => pid_write.isra.62+0xd1/0x2f0 => vfs_write+0xa8/0x1b0 => ksys_write+0x67/0xe0 => do_syscall_64+0x60/0x230 => entry_SYSCALL_64_after_hwframe+0x49/0xb3 => 0 => 0 => 0 => 0 => 0 => 0 => 0 => 0 trace-cmd-1697 [003] 104.130705: __vmalloc_node_range+0x5/0x2c0: vmalloc called here trace-cmd-1697 [003] 104.130707: => __ftrace_trace_stack+0x161/0x1a0 => __vmalloc_node_range+0x4d/0x2c0 => vzalloc+0x48/0x50 => trace_pid_write+0x23d/0x2b0 => event_pid_write.isra.30+0x21b/0x3b0 => vfs_write+0xa8/0x1b0 => ksys_write+0x67/0xe0 => do_syscall_64+0x60/0x230 => entry_SYSCALL_64_after_hwframe+0x49/0xb3 => 0 => 0 => 0 => 0 => 0 => 0 => 0 => 0 trace-cmd-1687 [001] 106.000510: __vmalloc_node_range+0x5/0x2c0: vmalloc called here trace-cmd-1687 [001] 106.000514: => __ftrace_trace_stack+0x161/0x1a0 => __vmalloc_node_range+0x4d/0x2c0 => vzalloc+0x48/0x50 => trace_pid_write+0x23d/0x2b0 => pid_write.isra.62+0xd1/0x2f0 => vfs_write+0xa8/0x1b0 => ksys_write+0x67/0xe0 => do_syscall_64+0x60/0x230 => entry_SYSCALL_64_after_hwframe+0x49/0xb3 => 0 => 0 => 0 => 0 => 0 => 0 => 0 => 0 The above is the calls to adding pids to set_event_pid. (I see I should probably make that code a bit more efficient, it calls the vmalloc code a bit too much). But what is missing, is the call to vmalloc from alloc_percpu(). In fact, I put in printks in the vmalloc() that's in alloc_percpu() and it doesn't trigger from the tracing code, and it does show up in my trace
Re: linux-next: manual merge of the mlx5-next tree with the kspp-gustavo tree
Stephen, On 4/30/20 22:30, Gustavo A. R. Silva wrote: >> This is now a conflict between the net-next and kspp-gustavo trees. >> > > Thanks for reporting this. I think the best solution, for now, is to remove > the > changes from my tree. I'll do it right away. > I just updated my -next tree. This conflict should be resolved now. :) Thanks -- Gustavo
Re: [PATCH] pipe: read/write_iter() handler should check for IOCB_NOWAIT
On 4/30/20 9:58 PM, Al Viro wrote: > On Thu, Apr 30, 2020 at 10:24:46AM -0600, Jens Axboe wrote: >> Pipe read/write only checks for the file O_NONBLOCK flag, but we should >> also check for IOCB_NOWAIT for whether or not we should handle this read >> or write in a non-blocking fashion. If we don't, then we will block on >> data or space for iocbs that explicitly asked for non-blocking >> operation. This messes up callers that explicitly ask for non-blocking >> operations. > > Why does io_uring allow setting IOCB_NOWAIT without FMODE_NOWAIT, anyway? To do per-io non-blocking operations. It's not practical or convenient to flip the file flag, nor does it work if you have multiple of them going. If pipes honor the flag for the read/write iter handlers, then we can handle them a lot more efficiently instead of requiring async offload. -- Jens Axboe
Re: [PATCH] drm/mediatek: cleanup coding style in mediatek a bit
Hi Bernard, Thank you for the patch! Yet something to improve: [auto build test ERROR on pza/reset/next] [also build test ERROR on drm-intel/for-linux-next drm-tip/drm-tip linus/master v5.7-rc3 next-20200430] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system. BTW, we also suggest to use '--base' option to specify the base tree in git format-patch, please see https://stackoverflow.com/a/37406982] url: https://github.com/0day-ci/linux/commits/Bernard-Zhao/drm-mediatek-cleanup-coding-style-in-mediatek-a-bit/20200428-055750 base: https://git.pengutronix.de/git/pza/linux reset/next config: arm64-allyesconfig (attached as .config) compiler: aarch64-linux-gcc (GCC) 9.3.0 reproduce: wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day GCC_VERSION=9.3.0 make.cross ARCH=arm64 If you fix the issue, kindly add following tag as appropriate Reported-by: kbuild test robot All errors (new ones prefixed by >>): drivers/gpu/drm/mediatek/mtk_hdmi.c: In function 'mtk_hdmi_hw_send_info_frame': >> drivers/gpu/drm/mediatek/mtk_hdmi.c:1807: error: unterminated argument list >> invoking macro "dev_err" 1807 | MODULE_LICENSE("GPL v2"); | >> drivers/gpu/drm/mediatek/mtk_hdmi.c:316:3: error: 'dev_err' undeclared >> (first use in this function); did you mean '_dev_err'? 316 | dev_err(hdmi->dev, "Wrong frame len: %d\n", frame_len; | ^~~ | _dev_err drivers/gpu/drm/mediatek/mtk_hdmi.c:316:3: note: each undeclared identifier is reported only once for each function it appears in >> drivers/gpu/drm/mediatek/mtk_hdmi.c:316:10: error: expected ';' at end of >> input 316 | dev_err(hdmi->dev, "Wrong frame len: %d\n", frame_len; | ^ | ; .. 1807 | MODULE_LICENSE("GPL v2"); | >> drivers/gpu/drm/mediatek/mtk_hdmi.c:316:3: error: expected declaration or >> statement at end of input 316 | dev_err(hdmi->dev, "Wrong frame len: %d\n", frame_len; | ^~~ >> drivers/gpu/drm/mediatek/mtk_hdmi.c:316:3: error: expected declaration or >> statement at end of input drivers/gpu/drm/mediatek/mtk_hdmi.c:308:6: warning: unused variable 'ctrl_frame_en' [-Wunused-variable] 308 | int ctrl_frame_en = 0; | ^ drivers/gpu/drm/mediatek/mtk_hdmi.c:302:6: warning: unused variable 'i' [-Wunused-variable] 302 | int i; | ^ drivers/gpu/drm/mediatek/mtk_hdmi.c:301:6: warning: unused variable 'ctrl_reg' [-Wunused-variable] 301 | u32 ctrl_reg = GRL_CTRL; | ^~~~ At top level: drivers/gpu/drm/mediatek/mtk_hdmi.c:298:13: warning: 'mtk_hdmi_hw_send_info_frame' defined but not used [-Wunused-function] 298 | static void mtk_hdmi_hw_send_info_frame(struct mtk_hdmi *hdmi, u8 *buffer, | ^~~ drivers/gpu/drm/mediatek/mtk_hdmi.c:293:13: warning: 'mtk_hdmi_hw_enable_dvi_mode' defined but not used [-Wunused-function] 293 | static void mtk_hdmi_hw_enable_dvi_mode(struct mtk_hdmi *hdmi, bool enable) | ^~~ drivers/gpu/drm/mediatek/mtk_hdmi.c:288:13: warning: 'mtk_hdmi_hw_write_int_mask' defined but not used [-Wunused-function] 288 | static void mtk_hdmi_hw_write_int_mask(struct mtk_hdmi *hdmi, u32 int_mask) | ^~ drivers/gpu/drm/mediatek/mtk_hdmi.c:282:13: warning: 'mtk_hdmi_hw_enable_notice' defined but not used [-Wunused-function] 282 | static void mtk_hdmi_hw_enable_notice(struct mtk_hdmi *hdmi, bool enable_notice) | ^ drivers/gpu/drm/mediatek/mtk_hdmi.c:271:13: warning: 'mtk_hdmi_hw_reset' defined but not used [-Wunused-function] 271 | static void mtk_hdmi_hw_reset(struct mtk_hdmi *hdmi) | ^ drivers/gpu/drm/mediatek/mtk_hdmi.c:266:13: warning: 'mtk_hdmi_hw_aud_unmute' defined but not used [-Wunused-function] 266 | static void mtk_hdmi_hw_aud_unmute(struct mtk_hdmi *hdmi) | ^~ drivers/gpu/drm/mediatek/mtk_hdmi.c:261:13: warning: 'mtk_hdmi_hw_aud_mute' defined but not used [-Wunused-function] 261 | static void mtk_hdmi_hw_aud_mute(struct mtk_hdmi *hdmi) | ^~~~ drivers/gpu/drm/mediatek/mtk_hdmi.c:255:13: warning: 'mtk_hdmi_hw_1p4_version_enable' defined but not used [-Wunused-function] 255 | static void mtk_hdmi_hw_1p4_version_enable(struct mtk_hdmi *hdmi, bool enable) | ^~~
Re: BPFilter: bit size mismatch between bpfiter_umh and vmliux
Hi Alexei, On Wed, Apr 29, 2020 at 1:14 AM Alexei Starovoitov wrote: > > > > At least, the build was successful, > > but does this work at runtime? > > > > If this is a bug, I can fix it cleanly. > > > > I think the bit size of the user mode helper > > should match to the kernel bit size. Is this correct? > > yes. they should match. > In theory we can have -m32 umh running on 64-bit kernel, > but I wouldn't bother adding support for such thing > until there is a use case. > Running 64-bit umh on 32-bit kernel is no go. Thanks for the comments. This issue will be fixed by this: https://patchwork.kernel.org/patch/11515997/ and the Makefile will be cleaned up by this: https://patchwork.kernel.org/patch/11515995/ They are parts of the big series of Makefile cleanups. So, I will apply the whole to kbuild tree. Thanks. -- Best Regards Masahiro Yamada
Re: [PATCH v3 1/5] fs: Add support for a RESOLVE_MAYEXEC flag on openat2(2)
On Tue, 28 Apr 2020, Mickaël Salaün wrote: > When the RESOLVE_MAYEXEC flag is passed, openat2(2) may be subject to > additional restrictions depending on a security policy managed by the > kernel through a sysctl or implemented by an LSM thanks to the > inode_permission hook. Reviewed-by: James Morris -- James Morris
linux-next: build warning after merge of the sound tree
Hi all, After merging the sound tree, today's linux-next build (arm multi_v7_defconfig) produced this warning: sound/pci/hda/patch_realtek.c: In function 'alc_fixup_hp_gpio_led': sound/pci/hda/patch_realtek.c:4134:6: warning: unused variable 'err' [-Wunused-variable] 4134 | int err; | ^~~ Introduced by commit 87dc36482cab ("ALSA: hda/realtek - Add LED class support for micmute LED") -- Cheers, Stephen Rothwell pgpvchfDJI3EV.pgp Description: OpenPGP digital signature
Re: [PATCH v3 2/5] fs: Add a MAY_EXECMOUNT flag to infer the noexec mount property
On Tue, 28 Apr 2020, Mickaël Salaün wrote: > An LSM doesn't get path information related to an access request to open > an inode. This new (internal) MAY_EXECMOUNT flag enables an LSM to > check if the underlying mount point of an inode is marked as executable. > This is useful to implement a security policy taking advantage of the > noexec mount option. > > This flag is set according to path_noexec(), which checks if a mount > point is mounted with MNT_NOEXEC or if the underlying superblock is > SB_I_NOEXEC. > > Signed-off-by: Mickaël Salaün > Reviewed-by: Philippe Trébuchet > Reviewed-by: Thibaut Sautereau > Cc: Aleksa Sarai > Cc: Al Viro > Cc: Kees Cook Are there any existing LSMs which plan to use this aspect? -- James Morris
Re: linux-next: manual merge of the akpm-current tree with the bpf-next tree
On Wed, Apr 29, 2020 at 06:24:06PM +1000, Stephen Rothwell wrote: > Hi Christoph, > > On Wed, 29 Apr 2020 08:54:04 +0200 Christoph Hellwig wrote: > > > > On Tue, Apr 28, 2020 at 11:49:34PM -0700, Alexei Starovoitov wrote: > > > On Tue, Apr 28, 2020 at 11:47 PM Christoph Hellwig wrote: > > > > > > > > On Wed, Apr 29, 2020 at 04:45:07PM +1000, Stephen Rothwell wrote: > > > > > > > > > > Today's linux-next merge of the akpm-current tree got a conflict in: > > > > > > > > > > kernel/sysctl.c > > > > > > > > > > between commit: > > > > > > > > > > f461d2dcd511 ("sysctl: avoid forward declarations") > > > > > > > > > > from the bpf-next tree and commits: > > > > > > > > Hmm, the above should have gone in through Al.. > > > > > > Al pushed them into vfs tree and we pulled that tag into bpf-next. > > > > Ok. And Stephen pulled your tree first. > > No, it is not in the branch I fetch from Al yet. Now it is...
Re: [PATCH -next] hinic: Use kmemdup instead of kzalloc and memcpy
From: Zou Wei Date: Wed, 29 Apr 2020 11:35:28 +0800 > Fixes coccicheck warnings: > > drivers/net/ethernet/huawei/hinic/hinic_hw_mbox.c:452:17-24: WARNING > opportunity for kmemdup > drivers/net/ethernet/huawei/hinic/hinic_hw_mbox.c:458:23-30: WARNING > opportunity for kmemdup > > Reported-by: Hulk Robot > Signed-off-by: Zou Wei Applied.
Re: [PATCH] pipe: read/write_iter() handler should check for IOCB_NOWAIT
On Thu, Apr 30, 2020 at 10:24:46AM -0600, Jens Axboe wrote: > Pipe read/write only checks for the file O_NONBLOCK flag, but we should > also check for IOCB_NOWAIT for whether or not we should handle this read > or write in a non-blocking fashion. If we don't, then we will block on > data or space for iocbs that explicitly asked for non-blocking > operation. This messes up callers that explicitly ask for non-blocking > operations. Why does io_uring allow setting IOCB_NOWAIT without FMODE_NOWAIT, anyway?
Re: linux-next: Tree for Apr 30 (objtool warnings)
On Thu, Apr 30, 2020 at 07:40:52AM -0700, Randy Dunlap wrote: > On 4/30/20 7:31 AM, Randy Dunlap wrote: > > On 4/30/20 12:40 AM, Stephen Rothwell wrote: > >> Hi all, > >> > >> Changes since 20200429: > >> > > > > on x86_64: > > > > kernel/exit.o: warning: objtool: __x64_sys_exit_group()+0x2e: unreachable > > instruction > > kernel/cred.o: warning: objtool: copy_creds()+0x278: unreachable instruction > > net/core/skbuff.o: warning: objtool: skb_push()+0x9e: unreachable > > instruction > > > > > > Full randconfig file is attached. > > > > oops, forgot to Cc: Josh and Peter. These are all related to -flive-patching, for some reason it reduces GCC's ability to detect local noreturn functions. I had some patches for these (and more) I'll need to dig up. -- Josh
Re: [PATCH net-next v2 3/4] net: phy: bcm54140: apply the workaround on b0 chips
From: Michael Walle Date: Wed, 29 Apr 2020 01:06:58 +0200 > The lower three bits of the phy_id specifies the chip stepping. The > workaround is specifically for the B0 stepping. Apply it only on these > chips. > > Signed-off-by: Michael Walle > Reviewed-by: Andrew Lunn > Reviewed-by: Florian Fainelli Applied.
Re: [PATCH net-next v2 4/4] net: phy: bcm54140: add second PHY ID
From: Michael Walle Date: Wed, 29 Apr 2020 01:06:59 +0200 > This PHY has two PHY IDs depending on its mode. Adjust the mask so that > it includes both IDs. > > Signed-off-by: Michael Walle Applied.
Re: [PATCH net-next 3/3] ptp: ptp_clockmatrix: Add adjphase() to support PHC write phase mode.
On Wed, Apr 29, 2020 at 08:28:25PM -0400, vincent.cheng...@renesas.com wrote: > @@ -871,6 +880,69 @@ static int idtcm_set_pll_mode(struct idtcm_channel > *channel, > > /* PTP Hardware Clock interface */ > > +/** > + * @brief Maximum absolute value for write phase offset in picoseconds > + * > + * Destination signed register is 32-bit register in resolution of 50ps > + * > + * 0x7fff * 50 = 2147483647 * 50 = 107374182350 > + */ > +static int _idtcm_adjphase(struct idtcm_channel *channel, s32 deltaNs) > +{ > + struct idtcm *idtcm = channel->idtcm; > + > + int err; > + u8 i; > + u8 buf[4] = {0}; > + s32 phaseIn50Picoseconds; > + s64 phaseOffsetInPs; Kernel coding style uses lower_case_underscores rather than lowerCamelCase. > + > + if (channel->pll_mode != PLL_MODE_WRITE_PHASE) { > + > + kthread_cancel_delayed_work_sync( > + >write_phase_ready_work); > + > + err = idtcm_set_pll_mode(channel, PLL_MODE_WRITE_PHASE); > + > + if (err) > + return err; > + > + channel->write_phase_ready = 0; > + > + kthread_queue_delayed_work(channel->kworker, > +>write_phase_ready_work, > +msecs_to_jiffies(WR_PHASE_SETUP_MS)); Each PHC driver automatically has a kworker provided by the class layer. In order to use it, set ptp_clock_info.do_aux_work to your callback function and then call ptp_schedule_worker() when needed. See drivers/net/ethernet/ti/cpts.c for example. Thanks, Richard
Re: [PATCH net-next v2 2/4] net: phy: bcm54140: fix phy_id_mask
From: Michael Walle Date: Wed, 29 Apr 2020 01:06:57 +0200 > Broadcom defines the bits for this PHY as follows: > { oui[24:3], model[6:0], revision[2:0] } > > Thus we have to mask the lower three bits only. > > Fixes: 6937602ed3f9 ("net: phy: add Broadcom BCM54140 support") > Signed-off-by: Michael Walle > Reviewed-by: Andrew Lunn > Reviewed-by: Florian Fainelli Applied.
Re: [PATCH net-next v2 1/4] net: phy: bcm54140: use genphy_soft_reset()
From: Michael Walle Date: Wed, 29 Apr 2020 01:06:56 +0200 > Set the .soft_reset() op to be sure there will be a reset even if there > is no hardware reset line registered. > > Signed-off-by: Michael Walle > Reviewed-by: Florian Fainelli Applied.
Re: [PATCH net-next] net: phy: at803x: add downshift support
From: Michael Walle Date: Tue, 28 Apr 2020 23:15:02 +0200 > The AR8031 and AR8035 support the link speed downshift. Add driver > support for it. One peculiarity of these PHYs is that it needs a > software reset after changing the setting, thus add the .soft_reset() > op and do a phy_init_hw() if necessary. > > This was tested on a custom board with the AR8031. > > Signed-off-by: Michael Walle Applied, thank you.
Re: [PATCH v3 0/5] Add support for RESOLVE_MAYEXEC
On Tue, 28 Apr 2020, Mickaël Salaün wrote: > Furthermore, the security policy can also be delegated to an LSM, either > a MAC system or an integrity system. For instance, the new kernel > MAY_OPENEXEC flag closes a major IMA measurement/appraisal interpreter > integrity gap by bringing the ability to check the use of scripts [1]. > Other uses are expected, such as for openat2(2) [2], SGX integration > [3], bpffs [4] or IPE [5]. Confirming that this is a highly desirable feature for the proposed IPE LSM. -- James Morris
Re: [PATCH] capabilities: add description for CAP_SETFCAP
On Tue, 28 Apr 2020, Stefan Hajnoczi wrote: > On Tue, Apr 14, 2020 at 04:49:45PM +0100, Stefan Hajnoczi wrote: > > Document the purpose of CAP_SETFCAP. For some reason this capability > > had no description while the others did. > > > > Signed-off-by: Stefan Hajnoczi > > --- > > include/uapi/linux/capability.h | 2 ++ > > 1 file changed, 2 insertions(+) > > Ping? Please resend and I'll add it to my tree. > > > diff --git a/include/uapi/linux/capability.h > > b/include/uapi/linux/capability.h > > index 272dc69fa080..7288f0ad44af 100644 > > --- a/include/uapi/linux/capability.h > > +++ b/include/uapi/linux/capability.h > > @@ -332,6 +332,8 @@ struct vfs_ns_cap_data { > > > > #define CAP_AUDIT_CONTROL30 > > > > +/* Set or remove capabilities on files */ > > + > > #define CAP_SETFCAP 31 > > > > /* Override MAC access. > > -- > > 2.25.1 > > > -- James Morris
Re: [PATCH v4 1/6] arch/x86/kvm: Refactor l1d flush lifecycle management
On Sat, 2020-04-25 at 11:49 +1000, Balbir Singh wrote: > On Fri, 2020-04-24 at 13:59 -0500, Tom Lendacky wrote: > > > > On 4/23/20 9:01 AM, Balbir Singh wrote: > > > Split out the allocation and free routines to be used in a follow > > > up set of patches (to reuse for L1D flushing). > > > > > > Signed-off-by: Balbir Singh > > > Reviewed-by: Kees Cook > > > --- > > > arch/x86/include/asm/cacheflush.h | 3 +++ > > > arch/x86/kernel/Makefile | 1 + > > > arch/x86/kernel/l1d_flush.c | 36 +++ > > > arch/x86/kvm/vmx/vmx.c| 25 +++-- > > > 4 files changed, 43 insertions(+), 22 deletions(-) > > > create mode 100644 arch/x86/kernel/l1d_flush.c > > > > > > diff --git a/arch/x86/include/asm/cacheflush.h > > > b/arch/x86/include/asm/cacheflush.h > > > index 63feaf2a5f93..bac56fcd9790 100644 > > > --- a/arch/x86/include/asm/cacheflush.h > > > +++ b/arch/x86/include/asm/cacheflush.h > > > @@ -6,6 +6,9 @@ > > > #include > > > #include > > > > > > +#define L1D_CACHE_ORDER 4 > > > > Since this is becoming a generic function now, shouldn't this value be > > based on the actual L1D cache size? Is this value based on a 32KB data > > cache and the idea is to write twice the size of the cache to be sure that > > every entry has been replaced - with the second 32KB to catch the odd line > > that might not have been pulled in? > > > > Currently the only users are VMX L1TF and optional prctl(). It should be > based > on actual L1D cache size, I checked a little bit and the largest L1D cache > size across various x86 bits is 64K. so there are three options here: > > 1. We refactor the code, we would need to save the L1D cache size and use > cpu_dev callbacks for L1D flush > 2. We can make the current code depend on L1D_FLUSH MSR and enable it only > when that feature is available. There would be no software fallback. Then > follow it up with #1 > 3. We keep the code as is on the assumption that all of L1D <= 64K across > the > current platforms and we do #1 in a followup (since the prctl is optional > and > the only other user is the VMX code). > > Thanks for the review, > Balbir Singh. > Tom, I have the following changes that I think will suffice for now (these are not properly formatted, but you get the idea) diff --git a/arch/x86/kernel/l1d_flush.c b/arch/x86/kernel/l1d_flush.c index a754b6c288a9..7fec0cc8f85c 100644 --- a/arch/x86/kernel/l1d_flush.c +++ b/arch/x86/kernel/l1d_flush.c @@ -92,6 +92,9 @@ int l1d_flush_init_once(void) { int ret = 0; + if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL) + return -ENOTSUPP; + if (static_cpu_has(X86_FEATURE_FLUSH_L1D) || l1d_flush_pages) return ret; Does that satisfy your comments about patch 1/6 and 2/6 in the series? Balbir Singh.
Re: [PATCH bpf] security: Fix the default value of fs_context_parse_param hook
On Thu, 30 Apr 2020, KP Singh wrote: > From: KP Singh > Applied to: git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git for-v5.7 -- James Morris
linux-next: manual merge of the drm tree with the drm-misc-fixes tree
Hi all, Today's linux-next merge of the drm tree got a conflict in: include/linux/dma-buf.h between commit: 6f49c2515e22 ("dma-buf: fix documentation build warnings") from the drm-misc-fixes tree and commit: 09606b5446c2 ("dma-buf: add peer2peer flag") from the drm tree. I fixed it up (see below) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell diff --cc include/linux/dma-buf.h index 57bcef6f988a,82e0a4a64601.. --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@@ -333,8 -334,16 +333,16 @@@ struct dma_buf * Attachment operations implemented by the importer. */ struct dma_buf_attach_ops { + /** +* @allow_peer2peer: +* +* If this is set to true the importer must be able to handle peer +* resources without struct pages. +*/ + bool allow_peer2peer; + /** - * @move_notify + * @move_notify: [optional] notification that the DMA-buf is moving * * If this callback is provided the framework can avoid pinning the * backing store while mappings exists. pgpWjGoepBcYY.pgp Description: OpenPGP digital signature
Re: [PATCH net-next 2/3] ptp: Add adjust_phase to ptp_clock_caps capability.
On Wed, Apr 29, 2020 at 08:28:24PM -0400, vincent.cheng...@renesas.com wrote: > From: Vincent Cheng > > Add adjust_phase to ptp_clock_caps capability to allow > user to query if a PHC driver supports adjust phase with > ioctl PTP_CLOCK_GETCAPS command. > > Signed-off-by: Vincent Cheng Reviewed-by: Richard Cochran
Re: [PATCH net-next 1/3] ptp: Add adjphase function to support phase offset control.
On Wed, Apr 29, 2020 at 08:28:23PM -0400, vincent.cheng...@renesas.com wrote: > diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c > index acabbe7..c46ff98 100644 > --- a/drivers/ptp/ptp_clock.c > +++ b/drivers/ptp/ptp_clock.c > @@ -146,6 +146,8 @@ static int ptp_clock_adjtime(struct posix_clock *pc, > struct __kernel_timex *tx) > else > err = ops->adjfreq(ops, ppb); > ptp->dialed_frequency = tx->freq; > + } else if (tx->modes & ADJ_OFFSET) { > + err = ops->adjphase(ops, tx->offset); This is a new method, and no drivers have it, so there must be a check that the function pointer is non-null. Thanks, Richard
Re: [PATCH RFC] Kbuild: Makefile: warn if auto.conf is obsolete
On Fri, May 1, 2020 at 4:25 AM Mauro Carvalho Chehab wrote: > > A new behavior on more recent kernels require to always call > "make modules_prepare" after *any* Kconfig changes. Again, this is the behavior since 2004. This commit: https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=95065ad3fa787c417008a36d3a5d9a3bab17ab98 Shrug if you complain about what has been stable more than 15 years. > This is not what a poor mortal would be expecting on a building > system, as it should, IMHO, be able to detect and auto-run > whatever is needed to use the newer setup. No. External module builds should never ever attempt to update in-tree files. This is because the build environment for external modules is usually located in /lib/modules/$(uname -r)/build/, which is read-only. A number of upstream developers (ab)use M= to compile test individual directories, despite the fact Kbuild supports the single target 'make drivers/staging/media/stomisp/' You need to cope with this conflicting comment line: https://github.com/masahir0y/linux/blob/v5.6/Makefile#L681 since you care if auto.conf is up-to-date. > Yet, while this is not solved, let's at least stop the build > and produce a warning, to notify the user about that. > > Signed-off-by: Mauro Carvalho Chehab > --- > > I would still prefer to call "make modules_prepare" directly, > on such cases, but just calling "make -C . modules_prepare" doesn't > work. So, the next best thing would be to at least print a message > and don't try to do a build with a broken auto.conf file. > > Makefile | 5 + > 1 file changed, 5 insertions(+) > > diff --git a/Makefile b/Makefile > index 70def4907036..492ee2396ab9 100644 > --- a/Makefile > +++ b/Makefile > @@ -1632,6 +1632,11 @@ $(objtree)/Module.symvers: > build-dirs := $(KBUILD_EXTMOD) > PHONY += modules > modules: descend $(objtree)/Module.symvers > + @if [ $(KCONFIG_CONFIG) -nt include/config/auto.conf ]; then \ > + echo " WARNING: $(KCONFIG_CONFIG) was modified. Need to > run:"; \ > + echo " $(MAKE) modules_prepare"; \ > + exit -1; \ > + fi > $(Q)$(MAKE) -f $(srctree)/scripts/Makefile.modpost > > PHONY += modules_install -- Best Regards Masahiro Yamada
Re: [PATCH] dpaa_eth: Fix comparing pointer to 0
From: Aishwarya Ramakrishnan Date: Mon, 27 Apr 2020 16:02:30 +0530 > Fixes coccicheck warning: > ./drivers/net/ethernet/freescale/dpaa/dpaa_eth.c:2110:30-31: > WARNING comparing pointer to 0 > > Avoid pointer type value compared to 0. > > Signed-off-by: Aishwarya Ramakrishnan Applied, thanks.
Re: linux-next: manual merge of the mlx5-next tree with the kspp-gustavo tree
Hi Stephen, On 4/30/20 22:12, Stephen Rothwell wrote: > Hi all, > > On Wed, 29 Apr 2020 12:06:25 +1000 Stephen Rothwell > wrote: >> >> Today's linux-next merge of the mlx5-next tree got a conflict in: >> >> include/linux/mlx5/mlx5_ifc.h >> >> between commit: >> >> 3ba225b506a2 ("treewide: Replace zero-length array with flexible-array >> member") >> >> from the kspp-gustavo tree and commit: >> >> d65dbedfd298 ("net/mlx5: Add support for COPY steering action") >> >> from the mlx5-next tree. >> >> I fixed it up (see below) and can carry the fix as necessary. This >> is now fixed as far as linux-next is concerned, but any non trivial >> conflicts should be mentioned to your upstream maintainer when your tree >> is submitted for merging. You may also want to consider cooperating >> with the maintainer of the conflicting tree to minimise any particularly >> complex conflicts. >> >> -- >> Cheers, >> Stephen Rothwell >> >> diff --cc include/linux/mlx5/mlx5_ifc.h >> index 8d30f18dcdee,fb243848132d.. >> --- a/include/linux/mlx5/mlx5_ifc.h >> +++ b/include/linux/mlx5/mlx5_ifc.h >> @@@ -5743,7 -5771,7 +5771,7 @@@ struct mlx5_ifc_alloc_modify_header_con >> u8 reserved_at_68[0x10]; >> u8 num_of_actions[0x8]; >> >> -union mlx5_ifc_set_action_in_add_action_in_auto_bits actions[]; >> - union mlx5_ifc_set_add_copy_action_in_auto_bits actions[0]; >> ++ union mlx5_ifc_set_add_copy_action_in_auto_bits actions[]; >> }; >> >> struct mlx5_ifc_dealloc_modify_header_context_out_bits { >> @@@ -9677,9 -9705,32 +9705,32 @@@ struct mlx5_ifc_mcda_reg_bits >> >> u8 reserved_at_60[0x20]; >> >> - u8 data[0][0x20]; >> + u8 data[][0x20]; >> }; >> >> + enum { >> +MLX5_MFRL_REG_RESET_TYPE_FULL_CHIP = BIT(0), >> +MLX5_MFRL_REG_RESET_TYPE_NET_PORT_ALIVE = BIT(1), >> + }; >> + >> + enum { >> +MLX5_MFRL_REG_RESET_LEVEL0 = BIT(0), >> +MLX5_MFRL_REG_RESET_LEVEL3 = BIT(3), >> +MLX5_MFRL_REG_RESET_LEVEL6 = BIT(6), >> + }; >> + >> + struct mlx5_ifc_mfrl_reg_bits { >> +u8 reserved_at_0[0x20]; >> + >> +u8 reserved_at_20[0x2]; >> +u8 pci_sync_for_fw_update_start[0x1]; >> +u8 pci_sync_for_fw_update_resp[0x2]; >> +u8 rst_type_sel[0x3]; >> +u8 reserved_at_28[0x8]; >> +u8 reset_type[0x8]; >> +u8 reset_level[0x8]; >> + }; >> + >> struct mlx5_ifc_mirc_reg_bits { >> u8 reserved_at_0[0x18]; >> u8 status_code[0x8]; > > This is now a conflict between the net-next and kspp-gustavo trees. > Thanks for reporting this. I think the best solution, for now, is to remove the changes from my tree. I'll do it right away. Thanks -- Gustavo
Re: [PATCH] net: moxa: Fix a potential double 'free_irq()'
From: Christophe JAILLET Date: Sun, 26 Apr 2020 22:59:21 +0200 > Should an irq requested with 'devm_request_irq' be released explicitly, > it should be done by 'devm_free_irq()', not 'free_irq()'. > > Fixes: 6c821bd9edc9 ("net: Add MOXA ART SoCs ethernet driver") > Signed-off-by: Christophe JAILLET Applied, thank you.
Re: [PATCH] platform/chrome: cros_ec_typec: Handle NULL EC pointer during probe.
Hi Prashant, I do not think it is present. Thinking about it, I do not think it shall be an issue on any released device as it will have either a firmware which wouldn't even trigger the typec probe or the one after the hierarchy fix. Likely I just got a firmware which was somewhere in between those two (As I did some unrelated FW testing). So, yes, probably putting this upstream is not necessary, though IMO more sanity checks - especially on non-critical run-once paths - are always better than having a kernel panic lingering around the corner, not like I am insisting on pushing the patch though with all the info, up to Enric. Cheers, Daniil On Fri, May 1, 2020 at 10:56 AM Prashant Malani wrote: > > Hi Daniil, > > On Fri, May 01, 2020 at 10:15:18AM +1000, Daniil Lunev wrote: > > On the official revision of coreboot for hatch it doesn't even try to > > load Type C. However it gives some warning messages from > > cros-usbpd-notify-acpi about EC, So I wonder why the check of the same > > type is not appropriate in the typec driver? > > I think the difference is that GOOG0003 is already present on shipped / > official versions of coreboot (so not having that check can cause > existing release images/devices to crash), whereas for GOOG0014 that is / > isn't the case. > > Is GOOG0014 present on the official release coreboot image for this > device? If so, what's its path (/sys/bus/acpi/devices//path) ? > > Best regards, > > -Prashant > > > > ../chrome/cros_usbpd_notify.c > > > > /* Get the EC device pointer needed to talk to the EC. */ > > ec_dev = dev_get_drvdata(dev->parent); > > if (!ec_dev) { > > /* > > * We continue even for older devices which don't have the > > * correct device heirarchy, namely, GOOG0003 is a child > > * of GOOG0004. > > */ > > dev_warn(dev, "Couldn't get Chrome EC device pointer.\n"); > > } > > > > > > # dmesg > > ... > > [8.513351] cros-ec-spi spi-PRP0001:02: EC failed to respond in time > > [8.722072] cros-ec-spi spi-PRP0001:02: EC failed to respond in time > > [8.729271] cros-ec-spi spi-PRP0001:02: Cannot identify the EC: error > > -110 > > [8.736966] cros-ec-spi spi-PRP0001:02: cannot register EC, > > fallback to spidev > > [8.767017] cros_ec_lpcs GOOG0004:00: Chrome EC device registered > > [8.807537] cros-usbpd-notify-acpi GOOG0003:00: Couldn't get Chrome > > EC device pointer. > > ... > > > > On Fri, May 1, 2020 at 2:17 AM Prashant Malani wrote: > > > > > > Hi Enric, > > > > > > On Thu, Apr 30, 2020 at 8:26 AM Enric Balletbo i Serra > > > wrote: > > > > > > > > Hi Prashant, > > > > > > > > On 30/4/20 2:43, Prashant Malani wrote: > > > > > On Wed, Apr 29, 2020 at 5:38 PM Daniil Lunev > > > > > wrote: > > > > >> > > > > >> [to make it appear on the mailing list as I didn't realize I was in > > > > >> hypertext sending mode] > > > > >> > > > > >> On Thu, Apr 30, 2020 at 10:11 AM Daniil Lunev > > > > >> wrote: > > > > >>> > > > > >>> Hi Enric. > > > > >>> I encountered the issue on a Hatch device when trying running 5.4 > > > > >>> kernel on that. After talking to Prashant it seems that any device > > > > >>> with coreboot built before a certain point (a particular fix for > > > > >>> device hierarchy in ACPI tables of Chrome devices which happened in > > > > >>> mid-April) will not be able to correctly initialize the driver and > > > > >>> will get a kernel panic trying to do so. > > > > > > > > > > A clarifying detail here: This should not be seen in any current > > > > > *production* device. No prod device firmware will carry the erroneous > > > > > ACPI device entry. > > > > > > > > > > > > > Thanks for the clarification. Then, I don't think we need to upstream > > > > this. This > > > > kind of "defensive-programming" it's not something that should matter > > > > to upstream. > > > > > > Actually, on second thought, I am not 100% sure about this: > > > Daniil, is the erroneous ACPI device on a *production* firmware for > > > this device (I'm not sure about the vintage of that device's BIOS)? > > > > > > My apologies for the confusion, Enric and Daniil; but would be good to > > > get clarification from Daniil. > > > > > > Best regards, > > > > > > > > Thanks, > > > > Enric > > > > > > > > > > > > >>> Thanks, > > > > >>> Daniil > > > > >>> > > > > >>> On Thu, Apr 30, 2020 at 7:58 AM Enric Balletbo i Serra > > > > >>> wrote: > > > > > > > > Hi Daniil, > > > > > > > > Thank you for the patch. > > > > > > > > On 28/4/20 3:02, Daniil Lunev wrote: > > > > > Missing EC in device hierarchy causes NULL pointer to be returned > > > > > to the > > > > > probe function which leads to NULL pointer dereference when > > > > > trying to > > > > > send a command to the EC. This can be the case if the device is > > > > > missing > > > > > or incorrectly configured in the firmware blob. Even if the > > > > > situation > > > > > > > > There is any production device with a buggy
Re: [PATCH V1 09/10] arch/kmap: Define kmap_atomic_prot() for all arch's
On Fri, May 01, 2020 at 03:37:34AM +0100, Al Viro wrote: > On Thu, Apr 30, 2020 at 01:38:44PM -0700, ira.we...@intel.com wrote: > > > -static inline void *kmap_atomic(struct page *page) > > +static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot) > > { > > preempt_disable(); > > pagefault_disable(); > > if (!PageHighMem(page)) > > return page_address(page); > > - return kmap_atomic_high(page); > > + return kmap_atomic_high_prot(page, prot); > > } > > +#define kmap_atomic(page) kmap_atomic_prot(page, kmap_prot) > > OK, so it *was* just a bisect hazard - you return to original semantics > wrt preempt_disable()... FWIW, how about doing the following: just before #5/10 have a patch that would touch only microblaze, ppc and x86 splitting their kmap_atomic_prot() into an inline helper + kmap_atomic_high_prot(). Then your #5 would leave their kmap_atomic_prot() as-is (it would use kmap_atomic_prot_high() instead). The rest of the series plays out pretty much the same way it does now, and wrappers on those 3 architectures would go away when an identical generic one is introduced in this commit (#9/10). AFAICS, that would avoid the bisect hazard and might even end up with less noise in the patches...
[PATCH v3 1/3] powerpc/numa: Set numa_node for all possible cpus
A Powerpc system with multiple possible nodes and with CONFIG_NUMA enabled always used to have a node 0, even if node 0 does not any cpus or memory attached to it. As per PAPR, node affinity of a cpu is only available once its present / online. For all cpus that are possible but not present, cpu_to_node() would point to node 0. To ensure a cpuless, memoryless dummy node is not online, powerpc need to make sure all possible but not present cpu_to_node are set to a proper node. Cc: linuxppc-...@lists.ozlabs.org Cc: linux...@kvack.org Cc: linux-kernel@vger.kernel.org Cc: Michal Hocko Cc: Mel Gorman Cc: Vlastimil Babka Cc: "Kirill A. Shutemov" Cc: Christopher Lameter Cc: Michael Ellerman Cc: Andrew Morton Cc: Linus Torvalds Cc: Gautham R Shenoy Signed-off-by: Srikar Dronamraju --- Changelog v1:->v2: - Rebased to v5.7-rc3 arch/powerpc/mm/numa.c | 16 ++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index 9fcf2d195830..b3615b7fdbdf 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -931,8 +931,20 @@ void __init mem_topology_setup(void) reset_numa_cpu_lookup_table(); - for_each_present_cpu(cpu) - numa_setup_cpu(cpu); + for_each_possible_cpu(cpu) { + /* +* Powerpc with CONFIG_NUMA always used to have a node 0, +* even if it was memoryless or cpuless. For all cpus that +* are possible but not present, cpu_to_node() would point +* to node 0. To remove a cpuless, memoryless dummy node, +* powerpc need to make sure all possible but not present +* cpu_to_node are set to a proper node. +*/ + if (cpu_present(cpu)) + numa_setup_cpu(cpu); + else + set_cpu_numa_node(cpu, first_online_node); + } } void __init initmem_init(void) -- 2.20.1
Re: linux-next: manual merge of the mlx5-next tree with the kspp-gustavo tree
Hi all, On Wed, 29 Apr 2020 12:06:25 +1000 Stephen Rothwell wrote: > > Today's linux-next merge of the mlx5-next tree got a conflict in: > > include/linux/mlx5/mlx5_ifc.h > > between commit: > > 3ba225b506a2 ("treewide: Replace zero-length array with flexible-array > member") > > from the kspp-gustavo tree and commit: > > d65dbedfd298 ("net/mlx5: Add support for COPY steering action") > > from the mlx5-next tree. > > I fixed it up (see below) and can carry the fix as necessary. This > is now fixed as far as linux-next is concerned, but any non trivial > conflicts should be mentioned to your upstream maintainer when your tree > is submitted for merging. You may also want to consider cooperating > with the maintainer of the conflicting tree to minimise any particularly > complex conflicts. > > -- > Cheers, > Stephen Rothwell > > diff --cc include/linux/mlx5/mlx5_ifc.h > index 8d30f18dcdee,fb243848132d.. > --- a/include/linux/mlx5/mlx5_ifc.h > +++ b/include/linux/mlx5/mlx5_ifc.h > @@@ -5743,7 -5771,7 +5771,7 @@@ struct mlx5_ifc_alloc_modify_header_con > u8 reserved_at_68[0x10]; > u8 num_of_actions[0x8]; > > - union mlx5_ifc_set_action_in_add_action_in_auto_bits actions[]; > -union mlx5_ifc_set_add_copy_action_in_auto_bits actions[0]; > ++union mlx5_ifc_set_add_copy_action_in_auto_bits actions[]; > }; > > struct mlx5_ifc_dealloc_modify_header_context_out_bits { > @@@ -9677,9 -9705,32 +9705,32 @@@ struct mlx5_ifc_mcda_reg_bits > > u8 reserved_at_60[0x20]; > > -u8 data[0][0x20]; > +u8 data[][0x20]; > }; > > + enum { > + MLX5_MFRL_REG_RESET_TYPE_FULL_CHIP = BIT(0), > + MLX5_MFRL_REG_RESET_TYPE_NET_PORT_ALIVE = BIT(1), > + }; > + > + enum { > + MLX5_MFRL_REG_RESET_LEVEL0 = BIT(0), > + MLX5_MFRL_REG_RESET_LEVEL3 = BIT(3), > + MLX5_MFRL_REG_RESET_LEVEL6 = BIT(6), > + }; > + > + struct mlx5_ifc_mfrl_reg_bits { > + u8 reserved_at_0[0x20]; > + > + u8 reserved_at_20[0x2]; > + u8 pci_sync_for_fw_update_start[0x1]; > + u8 pci_sync_for_fw_update_resp[0x2]; > + u8 rst_type_sel[0x3]; > + u8 reserved_at_28[0x8]; > + u8 reset_type[0x8]; > + u8 reset_level[0x8]; > + }; > + > struct mlx5_ifc_mirc_reg_bits { > u8 reserved_at_0[0x18]; > u8 status_code[0x8]; This is now a conflict between the net-next and kspp-gustavo trees. -- Cheers, Stephen Rothwell pgp0jN46tF6G_.pgp Description: OpenPGP digital signature
[PATCH v3 2/3] powerpc/numa: Prefer node id queried from vphn
Node id queried from the static device tree may not be correct. For example: it may always show 0 on a shared processor. Hence prefer the node id queried from vphn and fallback on the device tree based node id if vphn query fails. Cc: linuxppc-...@lists.ozlabs.org Cc: linux...@kvack.org Cc: linux-kernel@vger.kernel.org Cc: Michal Hocko Cc: Mel Gorman Cc: Vlastimil Babka Cc: "Kirill A. Shutemov" Cc: Christopher Lameter Cc: Michael Ellerman Cc: Andrew Morton Cc: Linus Torvalds Cc: Gautham R Shenoy Signed-off-by: Srikar Dronamraju --- Changelog v2:->v3: - Resolved comments from Gautham. Link v2: https://lore.kernel.org/linuxppc-dev/20200428093836.27190-1-sri...@linux.vnet.ibm.com/t/#u Changelog v1:->v2: - Rebased to v5.7-rc3 arch/powerpc/mm/numa.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index b3615b7fdbdf..79c74a2753c1 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -719,21 +719,22 @@ static int __init parse_numa_properties(void) */ for_each_present_cpu(i) { struct device_node *cpu; - int nid; - - cpu = of_get_cpu_node(i, NULL); - BUG_ON(!cpu); - nid = of_node_to_nid_single(cpu); - of_node_put(cpu); + int nid = vphn_get_nid(i); /* * Don't fall back to default_nid yet -- we will plug * cpus into nodes once the memory scan has discovered * the topology. */ - if (nid < 0) - continue; - node_set_online(nid); + if (nid == NUMA_NO_NODE) { + cpu = of_get_cpu_node(i, NULL); + BUG_ON(!cpu); + nid = of_node_to_nid_single(cpu); + of_node_put(cpu); + } + + if (likely(nid > 0)) + node_set_online(nid); } get_n_mem_cells(_mem_addr_cells, _mem_size_cells); -- 2.20.1
[PATCH v3 0/3] Offline memoryless cpuless node 0
Changelog v2:->v3: - Resolved comments from Gautham. Link v2: https://lore.kernel.org/linuxppc-dev/20200428093836.27190-1-sri...@linux.vnet.ibm.com/t/#u Changelog v1:->v2: - Rebased to v5.7-rc3 - Updated the changelog. Link v1: https://lore.kernel.org/linuxppc-dev/2020030237.5731-1-sri...@linux.vnet.ibm.com/t/#u Linux kernel configured with CONFIG_NUMA on a system with multiple possible nodes, marks node 0 as online at boot. However in practice, there are systems which have node 0 as memoryless and cpuless. This can cause 1. numa_balancing to be enabled on systems with only one online node. 2. Existence of dummy (cpuless and memoryless) node which can confuse users/scripts looking at output of lscpu / numactl. This patchset wants to correct this anomaly. This should only affect systems that have CONFIG_MEMORYLESS_NODES. Currently there are only 2 architectures ia64 and powerpc that have this config. Note: Patch 3 in this patch series depends on patches 1 and 2. Without patches 1 and 2, patch 3 might crash powerpc. v5.7-rc3 available: 2 nodes (0,2) node 0 cpus: node 0 size: 0 MB node 0 free: 0 MB node 2 cpus: 0 1 2 3 4 5 6 7 node 2 size: 32625 MB node 2 free: 31490 MB node distances: node 0 2 0: 10 20 2: 20 10 proc and sys files -- /sys/devices/system/node/online:0,2 /proc/sys/kernel/numa_balancing:1 /sys/devices/system/node/has_cpu: 2 /sys/devices/system/node/has_memory:2 /sys/devices/system/node/has_normal_memory: 2 /sys/devices/system/node/possible: 0-31 v5.7-rc3 + patches -- available: 1 nodes (2) node 2 cpus: 0 1 2 3 4 5 6 7 node 2 size: 32625 MB node 2 free: 31487 MB node distances: node 2 2: 10 proc and sys files -- /sys/devices/system/node/online:2 /proc/sys/kernel/numa_balancing:0 /sys/devices/system/node/has_cpu: 2 /sys/devices/system/node/has_memory:2 /sys/devices/system/node/has_normal_memory: 2 /sys/devices/system/node/possible: 0-31 Cc: linuxppc-...@lists.ozlabs.org Cc: linux...@kvack.org Cc: linux-kernel@vger.kernel.org Cc: Michal Hocko Cc: Mel Gorman Cc: Vlastimil Babka Cc: "Kirill A. Shutemov" Cc: Christopher Lameter Cc: Michael Ellerman Cc: Andrew Morton Cc: Linus Torvalds Cc: Gautham R Shenoy Srikar Dronamraju (3): powerpc/numa: Set numa_node for all possible cpus powerpc/numa: Prefer node id queried from vphn mm/page_alloc: Keep memoryless cpuless node 0 offline arch/powerpc/mm/numa.c | 35 --- mm/page_alloc.c| 4 +++- 2 files changed, 27 insertions(+), 12 deletions(-) -- 2.20.1
[PATCH v3 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline
Currently Linux kernel with CONFIG_NUMA on a system with multiple possible nodes, marks node 0 as online at boot. However in practice, there are systems which have node 0 as memoryless and cpuless. This can cause numa_balancing to be enabled on systems with only one node with memory and CPUs. The existence of this dummy node which is cpuless and memoryless node can confuse users/scripts looking at output of lscpu / numactl. By marking, N_ONLINE as NODE_MASK_NONE, lets stop assuming that Node 0 is always online. v5.7-rc3 available: 2 nodes (0,2) node 0 cpus: node 0 size: 0 MB node 0 free: 0 MB node 2 cpus: 0 1 2 3 4 5 6 7 node 2 size: 32625 MB node 2 free: 31490 MB node distances: node 0 2 0: 10 20 2: 20 10 proc and sys files -- /sys/devices/system/node/online:0,2 /proc/sys/kernel/numa_balancing:1 /sys/devices/system/node/has_cpu: 2 /sys/devices/system/node/has_memory:2 /sys/devices/system/node/has_normal_memory: 2 /sys/devices/system/node/possible: 0-31 v5.7-rc3 + patch -- available: 1 nodes (2) node 2 cpus: 0 1 2 3 4 5 6 7 node 2 size: 32625 MB node 2 free: 31487 MB node distances: node 2 2: 10 proc and sys files -- /sys/devices/system/node/online:2 /proc/sys/kernel/numa_balancing:0 /sys/devices/system/node/has_cpu: 2 /sys/devices/system/node/has_memory:2 /sys/devices/system/node/has_normal_memory: 2 /sys/devices/system/node/possible: 0-31 Note: On Powerpc, cpu_to_node of possible but not present cpus would previously return 0. Hence this commit depends on commit ("powerpc/numa: Set numa_node for all possible cpus") and commit ("powerpc/numa: Prefer node id queried from vphn"). Without the 2 commits, Powerpc system might crash. Cc: linuxppc-...@lists.ozlabs.org Cc: linux...@kvack.org Cc: linux-kernel@vger.kernel.org Cc: Michal Hocko Cc: Mel Gorman Cc: Vlastimil Babka Cc: "Kirill A. Shutemov" Cc: Christopher Lameter Cc: Michael Ellerman Cc: Andrew Morton Cc: Linus Torvalds Cc: Gautham R Shenoy Signed-off-by: Srikar Dronamraju --- Changelog v1:->v2: - Rebased to v5.7-rc3 Link v2: https://lore.kernel.org/linuxppc-dev/20200428093836.27190-1-sri...@linux.vnet.ibm.com/t/#u mm/page_alloc.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 69827d4fa052..03b89592af04 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -116,8 +116,10 @@ EXPORT_SYMBOL(latent_entropy); */ nodemask_t node_states[NR_NODE_STATES] __read_mostly = { [N_POSSIBLE] = NODE_MASK_ALL, +#ifdef CONFIG_NUMA + [N_ONLINE] = NODE_MASK_NONE, +#else [N_ONLINE] = { { [0] = 1UL } }, -#ifndef CONFIG_NUMA [N_NORMAL_MEMORY] = { { [0] = 1UL } }, #ifdef CONFIG_HIGHMEM [N_HIGH_MEMORY] = { { [0] = 1UL } }, -- 2.20.1
KASAN: slab-out-of-bounds Read in gadget_dev_desc_UDC_store
We report a bug (in linux-5.6.8) found by FuzzUSB (a modified version of syzkaller). This happened when the size of "name" buffer is smaller than that of "page" buffer (after function kstrdup executed at line 263). I guess it comes from the "page" buffer containing 0 value in the middle. So accessing the "name" buffer with "len" variable, which is used to indicate the size of "page" buffer, triggered memory access violation. To fix, it may need to check the size of name buffer, and try to use right index variable. kernel config: https://kt0755.github.io/etc/config_v5.6.8 == BUG: KASAN: slab-out-of-bounds in gadget_dev_desc_UDC_store+0x1ba/0x200 drivers/usb/gadget/configfs.c:266 Read of size 1 at addr 88806a55dd7e by task syz-executor.0/17208 CPU: 2 PID: 17208 Comm: syz-executor.0 Not tainted 5.6.8 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xce/0x128 lib/dump_stack.c:118 print_address_description.constprop.4+0x21/0x3c0 mm/kasan/report.c:374 __kasan_report+0x131/0x1b0 mm/kasan/report.c:506 kasan_report+0x12/0x20 mm/kasan/common.c:641 __asan_report_load1_noabort+0x14/0x20 mm/kasan/generic_report.c:132 gadget_dev_desc_UDC_store+0x1ba/0x200 drivers/usb/gadget/configfs.c:266 flush_write_buffer fs/configfs/file.c:251 [inline] configfs_write_file+0x2f1/0x4c0 fs/configfs/file.c:283 __vfs_write+0x85/0x110 fs/read_write.c:494 vfs_write+0x1cd/0x510 fs/read_write.c:558 ksys_write+0x18a/0x220 fs/read_write.c:611 __do_sys_write fs/read_write.c:623 [inline] __se_sys_write fs/read_write.c:620 [inline] __x64_sys_write+0x73/0xb0 fs/read_write.c:620 do_syscall_64+0x9e/0x510 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x452149 Code: 2d 61 fc ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 60 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:7f3bd907cc78 EFLAGS: 0246 ORIG_RAX: 0001 RAX: ffda RBX: 0073c0f8 RCX: 00452149 RDX: fed8 RSI: 23c0 RDI: 0003 RBP: 0003 R08: R09: R10: R11: 0246 R12: 004bf782 R13: 004d7710 R14: 7f3bd907d6d4 R15: Allocated by task 1: save_stack+0x21/0x90 mm/kasan/common.c:72 set_track mm/kasan/common.c:80 [inline] __kasan_kmalloc.constprop.3+0xa7/0xd0 mm/kasan/common.c:515 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:529 __kmalloc+0x11c/0x310 mm/slub.c:3841 kmalloc include/linux/slab.h:560 [inline] kzalloc include/linux/slab.h:669 [inline] acpi_os_allocate_zeroed+0x3e/0x42 include/acpi/platform/aclinuxex.h:57 acpi_ns_internalize_name+0xd9/0x16a drivers/acpi/acpica/nsutils.c:331 acpi_ns_get_node_unlocked+0x17e/0x1fe drivers/acpi/acpica/nsutils.c:666 acpi_ns_get_node+0x44/0x62 drivers/acpi/acpica/nsutils.c:726 acpi_ns_evaluate+0xc8/0x93e drivers/acpi/acpica/nseval.c:61 acpi_ut_evaluate_object+0xe4/0x386 drivers/acpi/acpica/uteval.c:60 acpi_ut_execute_power_methods+0xda/0x1b1 drivers/acpi/acpica/uteval.c:288 acpi_get_object_info+0x487/0x994 drivers/acpi/acpica/nsxfname.c:366 acpi_set_pnp_ids drivers/acpi/scan.c:1245 [inline] acpi_init_device_object+0xbfd/0x17a0 drivers/acpi/scan.c:1585 acpi_add_single_object+0x121/0x1710 drivers/acpi/scan.c:1620 acpi_bus_check_add+0x1c9/0x4f0 drivers/acpi/scan.c:1873 acpi_ns_walk_namespace+0x1d3/0x320 drivers/acpi/acpica/nswalk.c:236 acpi_walk_namespace+0xb5/0xef drivers/acpi/acpica/nsxfeval.c:606 acpi_bus_scan+0xdf/0xf0 drivers/acpi/scan.c:2054 acpi_scan_init+0x264/0x5e4 drivers/acpi/scan.c:2218 acpi_init+0x592/0x612 drivers/acpi/bus.c:1249 do_one_initcall+0xe0/0x650 init/main.c:1152 do_initcall_level init/main.c:1225 [inline] do_initcalls init/main.c:1241 [inline] do_basic_setup init/main.c:1261 [inline] kernel_init_freeable+0x5e8/0x67c init/main.c:1445 kernel_init+0x13/0x1b0 init/main.c:1352 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 Freed by task 1: save_stack+0x21/0x90 mm/kasan/common.c:72 set_track mm/kasan/common.c:80 [inline] kasan_set_free_info mm/kasan/common.c:337 [inline] __kasan_slab_free+0x135/0x190 mm/kasan/common.c:476 kasan_slab_free+0xe/0x10 mm/kasan/common.c:485 slab_free_hook mm/slub.c:1444 [inline] slab_free_freelist_hook mm/slub.c:1477 [inline] slab_free mm/slub.c:3034 [inline] kfree+0xf7/0x410 mm/slub.c:3995 acpi_os_free include/acpi/platform/aclinuxex.h:62 [inline] acpi_ns_get_node_unlocked+0x1c8/0x1fe drivers/acpi/acpica/nsutils.c:686 acpi_ns_get_node+0x44/0x62 drivers/acpi/acpica/nsutils.c:726 acpi_ns_evaluate+0xc8/0x93e drivers/acpi/acpica/nseval.c:61 acpi_ut_evaluate_object+0xe4/0x386 drivers/acpi/acpica/uteval.c:60 acpi_ut_execute_power_methods+0xda/0x1b1 drivers/acpi/acpica/uteval.c:288
Re: [PATCH 3/4] counter: Add character device interface
Hi William, I love your patch! Yet something to improve: [auto build test ERROR on stm32/stm32-next] [cannot apply to linus/master linux/master v5.7-rc3 next-20200430] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system. BTW, we also suggest to use '--base' option to specify the base tree in git format-patch, please see https://stackoverflow.com/a/37406982] url: https://github.com/0day-ci/linux/commits/William-Breathitt-Gray/Introduce-the-Counter-character-device-interface/20200430-051734 base: https://git.kernel.org/pub/scm/linux/kernel/git/atorgue/stm32.git stm32-next config: x86_64-randconfig-d003-20200501 (attached as .config) compiler: gcc-7 (Ubuntu 7.5.0-6ubuntu2) 7.5.0 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 If you fix the issue, kindly add following tag as appropriate Reported-by: kbuild test robot All errors (new ones prefixed by >>): In file included from :32:0: >> ./usr/include/linux/counter.h:22:2: error: unknown type name 'size_t' size_t num_signals; ^~ ./usr/include/linux/counter.h:23:2: error: unknown type name 'size_t' size_t num_counts; ^~ ./usr/include/linux/counter.h:24:2: error: unknown type name 'size_t' size_t num_ext; ^~ ./usr/include/linux/counter.h:35:2: error: unknown type name 'size_t' size_t index; ^~ ./usr/include/linux/counter.h:39:2: error: unknown type name 'size_t' size_t num_ext; ^~ ./usr/include/linux/counter.h:50:2: error: unknown type name 'size_t' size_t count_index; ^~ ./usr/include/linux/counter.h:51:2: error: unknown type name 'size_t' size_t index; ^~ ./usr/include/linux/counter.h:53:2: error: unknown type name 'size_t' size_t num_actions; ^~ ./usr/include/linux/counter.h:54:2: error: unknown type name 'size_t' size_t signal_index; ^~ ./usr/include/linux/counter.h:67:2: error: unknown type name 'size_t' size_t index; ^~ ./usr/include/linux/counter.h:71:2: error: unknown type name 'size_t' size_t num_functions; ^~ ./usr/include/linux/counter.h:72:2: error: unknown type name 'size_t' size_t num_synapses; ^~ ./usr/include/linux/counter.h:73:2: error: unknown type name 'size_t' size_t num_ext; ^~ ./usr/include/linux/counter.h:86:2: error: unknown type name 'size_t' size_t owner_index; ^~ ./usr/include/linux/counter.h:87:2: error: unknown type name 'size_t' size_t index; ^~ ./usr/include/linux/counter.h:99:2: error: unknown type name 'size_t' size_t index; ^~ ./usr/include/linux/counter.h:110:2: error: unknown type name 'size_t' size_t index; ^~ ./usr/include/linux/counter.h:121:2: error: unknown type name 'size_t' size_t index; ^~ ./usr/include/linux/counter.h:133:2: error: unknown type name 'size_t' size_t count_index; ^~ ./usr/include/linux/counter.h:134:2: error: unknown type name 'size_t' size_t index; ^~ ./usr/include/linux/counter.h:146:2: error: unknown type name 'size_t' size_t count_index; ^~ ./usr/include/linux/counter.h:147:2: error: unknown type name 'size_t' size_t index; ^~ ./usr/include/linux/counter.h:160:2: error: unknown type name 'size_t' size_t count_index; ^~ ./usr/include/linux/counter.h:161:2: error: unknown type name 'size_t' size_t synapse_index; ^~ ./usr/include/linux/counter.h:162:2: error: unknown type name 'size_t' size_t index; ^~ ./usr/include/linux/counter.h:176:2: error: unknown type name 'size_t' size_t owner_index; ^~ ./usr/include/linux/counter.h:177:2: error: unknown type name 'size_t' size_t index; ^~ >> ./usr/include/linux/counter.h:179:2: error: unknown type name 'bool' bool data; ^~~~ ./usr/include/linux/counter.h:191:2: error: unknown type name 'size_t' size_t owner_index; ^~ ./usr/include/linux/counter.h:192:2: error: unknown type name 'size_t' size_t index; ^~ ./usr/include/linux/counter.h:206:2: error: unknown type name 'size_t' size_t owner_index; ^~ ./usr/include/linux/counter.h:207:2: error: unknown type name 'size_t' size_t index; ^~ ./usr/include/linux/counter.h:221:2: error: unknown type name 'size_t' size_t owner_index; ^~ ./usr/include/linux/counter.h:222:2: error: unknown type name 'size_t' size_t index; ^~ ./usr/include/linux/counter.h:224:2: error: unknown type name 'size_t' size_t enum_index; ^~ ./usr/include/linux/counter.h:237:2: error: unknown type name 'size_t' size_t owner_index; ^~ ./usr/include/linux/counter.h:238:2: error: unknown type name 'size_t' size_t ext_index;
[git pull] drm fixes for 5.7-rc4
Hi Linus, Regular scheduled fixes pull for graphics. Nothing to extreme bunch of amdgpu fixes, i915 and qxl fixes, along with some misc ones. All seems to be progressing normally. Dave. drm-fixes-2020-05-01: drm fixes for 5.7-rc4 core: - EDID off by one DTD fix - DP mst write return code fix dma-buf: - fix SET_NAME ioctl uapi - doc fixes amdgpu: - Fix a green screen on resume issue - PM fixes for SR-IOV - SDMA fix for navi - Renoir display fixes - Cursor and pageflip stuttering fixes - Misc additional display fixes - (uapi) Add additional DCC tiling flags for navi1x i915: - Fix selftest refcnt leak (Xiyu) - Fix gem vma lock (Chris) - Fix gt's i915_request.timeline acquire by checking if cacheline is valid (Chris) - Fix IRQ postinistall fault masks (Matt) qxl: - use after gree fix - fix lost kunmap - release leak fix virtio: - context destruction fix The following changes since commit 6a8b55ed4056ea5559ebe4f6a4b247f627870d4c: Linux 5.7-rc3 (2020-04-26 13:51:02 -0700) are available in the Git repository at: git://anongit.freedesktop.org/drm/drm tags/drm-fixes-2020-05-01 for you to fetch changes up to e3dcd86b3b4c045a4db17c02340138a4c514fe20: Merge tag 'amd-drm-fixes-5.7-2020-04-29' of git://people.freedesktop.org/~agd5f/linux into drm-fixes (2020-05-01 11:19:55 +1000) drm fixes for 5.7-rc4 core: - EDID off by one DTD fix - DP mst write return code fix dma-buf: - fix SET_NAME ioctl uapi - doc fixes amdgpu: - Fix a green screen on resume issue - PM fixes for SR-IOV SDMA fix for navi - Renoir display fixes - Cursor and pageflip stuttering fixes - Misc additional display fixes - (uapi) Add additional DCC tiling flags for navi1x i915: - Fix selftest refcnt leak (Xiyu) - Fix gem vma lock (Chris) - Fix gt's i915_request.timeline acquire by checking if cacheline is valid (Chris) - Fix IRQ postinistall fault masks (Matt) qxl: - use after gree fix - fix lost kunmap - release leak fix virtio: - context destruction fix Aric Cyr (1): drm/amd/display: Use cursor locking to prevent flip delays Aurabindo Pillai (1): drm/amd/display: DispalyPort: Write OUI only if panel supports it Chris Wilson (2): drm/i915/gem: Hold obj->vma.lock over for_each_ggtt_vma() drm/i915/gt: Check cacheline is valid before acquiring Daniel Vetter (1): dma-buf: Fix SET_NAME ioctl uapi Dave Airlie (3): Merge tag 'drm-misc-fixes-2020-04-30' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Merge tag 'drm-intel-fixes-2020-04-30' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes Merge tag 'amd-drm-fixes-5.7-2020-04-29' of git://people.freedesktop.org/~agd5f/linux into drm-fixes Dmytro Laktyushkin (2): drm/amd/display: check if REFCLK_CNTL register is present drm/amd/display: fix rn soc bb update Gurchetan Singh (1): drm/virtio: only destroy created contexts Lyude Paul (1): drm/dp_mst: Fix drm_dp_send_dpcd_write() return code Marek Olšák (3): drm/amdgpu: add tiling flags from Mesa drm/amdgpu: invalidate L2 before SDMA IBs (v2) drm/amdgpu: bump version for invalidate L2 before SDMA IBs Matt Roper (1): drm/i915: Use proper fault mask in interrupt postinstall too Nicholas Kazlauskas (1): drm/amd/display: Defer cursor update around VUPDATE for all ASIC Randy Dunlap (1): dma-buf: fix documentation build warnings Rodrigo Siqueira (1): drm/amd/display: Fix green screen issue after suspend Sung Lee (1): drm/amd/display: Update downspread percent to match spreadsheet for DCN2.1 Tiecheng Zhou (2): Revert "drm/amd/powerplay: avoid using pm_en before it is initialized" drm/amd/powerplay: avoid using pm_en before it is initialized revised Vasily Averin (4): drm/qxl: qxl_release leak in qxl_draw_dirty_fb() drm/qxl: qxl_release leak in qxl_hw_surface_alloc() drm/qxl: lost qxl_bo_kunmap_atomic_page in qxl_image_init_helper() drm/qxl: qxl_release use after free Ville Syrjälä (1): drm/edid: Fix off-by-one in DispID DTD pixel clock Xiaodong Yan (1): drm/amd/display: blank dp stream before re-train the link Xiyu Yang (1): drm/i915/selftests: Fix i915_address_space refcnt leak drivers/dma-buf/dma-buf.c | 7 +- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 3 +- drivers/gpu/drm/amd/amdgpu/navi10_sdma_pkt_open.h | 16 + drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 14 +++- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 38 --- drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c | 27 drivers/gpu/drm/amd/display/dc/core/dc_stream.c| 40 ++- .../amd/display/dc/dce110/dce110_hw_sequencer.c| 1 + .../drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c | 10 +++ .../drm/amd/display/dc/dcn10/dcn10_hw_sequencer.h | 1 +
Re: [PATCH 2/2] epoll: atomically remove wait entry on wake up
Hi [This is an automated email] This commit has been processed because it contains a -stable tag. The stable tag indicates that it's relevant for the following trees: all The bot has tested the following trees: v5.6.7, v5.4.35, v4.19.118, v4.14.177, v4.9.220, v4.4.220. v5.6.7: Build OK! v5.4.35: Build OK! v4.19.118: Failed to apply! Possible dependencies: 1b53734bd0b2 ("epoll: fix possible lost wakeup on epoll_ctl() path") 35cff1a6e023 ("fs/epoll: rename check_events label to send_events") 86c051793b4c ("fs/epoll: deal with wait_queue only once") abc610e01c66 ("fs/epoll: avoid barrier after an epoll_wait(2) timeout") c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()") v4.14.177: Failed to apply! Possible dependencies: 002b343669c4 ("fs/epoll: loosen irq safety in ep_scan_ready_list()") 35cff1a6e023 ("fs/epoll: rename check_events label to send_events") 37b5e5212a44 ("epoll: remove ep_call_nested() from ep_eventpoll_poll()") 679abf381a18 ("fs/eventpoll.c: loosen irq safety in ep_poll()") 69112736e2f0 ("eventpoll: no need to mask the result of epi_item_poll() again") 86c051793b4c ("fs/epoll: deal with wait_queue only once") bec1a502d34d ("eventpoll: constify struct epoll_event pointers") c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()") d85e2aa2e34d ("annotate ep_scan_ready_list()") ee8ef0a4b167 ("epoll: use the waitqueue lock to protect ep->wq") v4.9.220: Failed to apply! Possible dependencies: 86c051793b4c ("fs/epoll: deal with wait_queue only once") 89d947561077 ("sd: Implement support for ZBC devices") ac6424b981bc ("sched/wait: Rename wait_queue_t => wait_queue_entry_t") bd166ef183c2 ("blk-mq-sched: add framework for MQ capable IO schedulers") cf43e6be865a ("block: add scalable completion tracking of requests") e806402130c9 ("block: split out request-only flags into a new namespace") v4.4.220: Failed to apply! Possible dependencies: 27489a3c827b ("blk-mq: turn hctx->run_work into a regular work struct") 511cbce2ff8b ("irq_poll: make blk-iopoll available outside the block layer") 86c051793b4c ("fs/epoll: deal with wait_queue only once") 8d354f133e86 ("blk-mq: improve layout of blk_mq_hw_ctx") 9467f85960a3 ("blk-mq/cpu-notif: Convert to new hotplug state machine") ac6424b981bc ("sched/wait: Rename wait_queue_t => wait_queue_entry_t") bd166ef183c2 ("blk-mq-sched: add framework for MQ capable IO schedulers") cf43e6be865a ("block: add scalable completion tracking of requests") d9d8c5c489f4 ("block: convert is_sync helpers to use REQ_OPs.") da8b44d5a9f8 ("timer: convert timer_slack_ns from unsigned long to u64") e57690fe009b ("blk-mq: don't overwrite rq->mq_ctx") e6a40b096e28 ("block: prepare request creation/destruction code to use REQ_OPs") e806402130c9 ("block: split out request-only flags into a new namespace") NOTE: The patch will not be queued to stable trees until it is upstream. How should we proceed with this patch? -- Thanks Sasha
linux-next: manual merge of the net-next tree with the kspp-gustavo tree
Hi all, Today's linux-next merge of the net-next tree got a conflict in: drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c between commit: 3ba225b506a2 ("treewide: Replace zero-length array with flexible-array member") from the kspp-gustavo tree and commit: 4780dbdbd957 ("mlxsw: spectrum_span: Replace zero-length array with flexible-array member") from the net-next tree (and further changes). I fixed it up (they had the same effect on this file, so I used the latter) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell pgpyYM_9ScpC4.pgp Description: OpenPGP digital signature
Re: [PATCH 0/7] padata: parallelize deferred page init
On Thu, Apr 30, 2020 at 06:09:35PM -0700, Josh Triplett wrote: > On Thu, Apr 30, 2020 at 04:11:18PM -0400, Daniel Jordan wrote: > > Sometimes the kernel doesn't take full advantage of system memory > > bandwidth, leading to a single CPU spending excessive time in > > initialization paths where the data scales with memory size. > > > > Multithreading naturally addresses this problem, and this series is the > > first step. > > > > It extends padata, a framework that handles many parallel singlethreaded > > jobs, to handle multithreaded jobs as well by adding support for > > splitting up the work evenly, specifying a minimum amount of work that's > > appropriate for one helper thread to do, load balancing between helpers, > > and coordinating them. More documentation in patches 4 and 7. > > > > The first user is deferred struct page init, a large bottleneck in > > kernel boot--actually the largest for us and likely others too. This > > path doesn't require concurrency limits, resource control, or priority > > adjustments like future users will (vfio, hugetlb fallocate, munmap) > > because it happens during boot when the system is otherwise idle and > > waiting on page init to finish. > > > > This has been tested on a variety of x86 systems and speeds up kernel > > boot by 6% to 49% by making deferred init 63% to 91% faster. Patch 6 > > has detailed numbers. Test results from other systems appreciated. > > > > This series is based on v5.6 plus these three from mmotm: > > > > mm-call-touch_nmi_watchdog-on-max-order-boundaries-in-deferred-init.patch > > mm-initialize-deferred-pages-with-interrupts-enabled.patch > > mm-call-cond_resched-from-deferred_init_memmap.patch > > > > All of the above can be found in this branch: > > > > git://oss.oracle.com/git/linux-dmjordan.git padata-mt-definit-v1 > > > > https://oss.oracle.com/git/gitweb.cgi?p=linux-dmjordan.git;a=shortlog;h=refs/heads/padata-mt-definit-v1 > > For the series (and the three prerequisite patches): > > Tested-by: Josh Triplett Appreciate the runs, Josh, thanks.
linux-next: manual merge of the net-next tree with the net tree
Hi all, Today's linux-next merge of the net-next tree got a conflict in: drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c between commit: 8075411d93b6 ("net/mlx5: DR, On creation set CQ's arm_db member to right value") from the net tree and commit: 73a75b96fc9a ("net/mlx5: Remove empty QP and CQ events handlers") from the net-next tree. I fixed it up (see below) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell diff --cc drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c index 18719acb7e54,c4ed25bb9ac8.. --- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c @@@ -689,18 -693,6 +693,12 @@@ static int dr_prepare_qp_to_rts(struct return 0; } - static void dr_cq_event(struct mlx5_core_cq *mcq, - enum mlx5_event event) - { - pr_info("CQ event %u on CQ #%u\n", event, mcq->cqn); - } - +static void dr_cq_complete(struct mlx5_core_cq *mcq, + struct mlx5_eqe *eqe) +{ + pr_err("CQ completion CQ: #%u\n", mcq->cqn); +} + static struct mlx5dr_cq *dr_create_cq(struct mlx5_core_dev *mdev, struct mlx5_uars_page *uar, size_t ncqe) @@@ -761,9 -753,6 +759,8 @@@ pas = (__be64 *)MLX5_ADDR_OF(create_cq_in, in, pas); mlx5_fill_page_frag_array(>wq_ctrl.buf, pas); - cq->mcq.event = dr_cq_event; + cq->mcq.comp = dr_cq_complete; + err = mlx5_core_create_cq(mdev, >mcq, in, inlen, out, sizeof(out)); kvfree(in); pgpiirFWSlynZ.pgp Description: OpenPGP digital signature
Re: [PATCH 5/7] mm: move zone iterator outside of deferred_init_maxorder()
Hi Alex, On Thu, Apr 30, 2020 at 02:43:28PM -0700, Alexander Duyck wrote: > On 4/30/2020 1:11 PM, Daniel Jordan wrote: > > padata will soon divide up pfn ranges between threads when parallelizing > > deferred init, and deferred_init_maxorder() complicates that by using an > > opaque index in addition to start and end pfns. Move the index outside > > the function to make splitting the job easier, and simplify the code > > while at it. > > > > deferred_init_maxorder() now always iterates within a single pfn range > > instead of potentially multiple ranges, and advances start_pfn to the > > end of that range instead of the max-order block so partial pfn ranges > > in the block aren't skipped in a later iteration. The section alignment > > check in deferred_grow_zone() is removed as well since this alignment is > > no longer guaranteed. It's not clear what value the alignment provided > > originally. > > > > Signed-off-by: Daniel Jordan > > So part of the reason for splitting it up along section aligned boundaries > was because we already had an existing functionality in deferred_grow_zone > that was going in and pulling out a section aligned chunk and processing it > to prepare enough memory for other threads to keep running. I suspect that > the section alignment was done because normally I believe that is also the > alignment for memory onlining. I think Pavel added that functionality, maybe he could confirm. My impression was that the reason deferred_grow_zone aligned the requested order up to a section was to make enough memory available to avoid being called on every allocation. > With this already breaking things up over multiple threads how does this > work with deferred_grow_zone? Which thread is it trying to allocate from if > it needs to allocate some memory for itself? I may not be following your question, but deferred_grow_zone doesn't allocate memory during the multithreading in deferred_init_memmap because the latter sets first_deferred_pfn so that deferred_grow_zone bails early. > Also what is to prevent a worker from stop deferred_grow_zone from bailing > out in the middle of a max order page block if there is a hole in the middle > of the block? deferred_grow_zone remains singlethreaded. It could stop in the middle of a max order block, but it can't run concurrently with deferred_init_memmap, as per above, so if deferred_init_memmap were to init 'n free the remaining part of the block, the previous portion would have already been initialized.
Re: [PATCH 0/7] padata: parallelize deferred page init
On Thu, Apr 30, 2020 at 05:40:59PM -0400, Pavel Tatashin wrote: > On Thu, Apr 30, 2020 at 5:31 PM Andrew Morton > wrote: > > On Thu, 30 Apr 2020 16:11:18 -0400 Daniel Jordan > > wrote: > > > > > Sometimes the kernel doesn't take full advantage of system memory > > > bandwidth, leading to a single CPU spending excessive time in > > > initialization paths where the data scales with memory size. > > > > > > Multithreading naturally addresses this problem, and this series is the > > > first step. > > > > > > It extends padata, a framework that handles many parallel singlethreaded > > > jobs, to handle multithreaded jobs as well by adding support for > > > splitting up the work evenly, specifying a minimum amount of work that's > > > appropriate for one helper thread to do, load balancing between helpers, > > > and coordinating them. More documentation in patches 4 and 7. > > > > > > The first user is deferred struct page init, a large bottleneck in > > > kernel boot--actually the largest for us and likely others too. This > > > path doesn't require concurrency limits, resource control, or priority > > > adjustments like future users will (vfio, hugetlb fallocate, munmap) > > > because it happens during boot when the system is otherwise idle and > > > waiting on page init to finish. > > > > > > This has been tested on a variety of x86 systems and speeds up kernel > > > boot by 6% to 49% by making deferred init 63% to 91% faster. > > > > How long is this up-to-91% in seconds? If it's 91% of a millisecond > > then not impressed. If it's 91% of two weeks then better :) The largest system I could test had 384G per node and saved 1.5 out of 4 seconds. > > Relatedly, how important is boot time on these large machines anyway? > > They presumably have lengthy uptimes so boot time is relatively > > unimportant? > > Large machines indeed have a lengthy uptime, but they also can host a > large number of VMs meaning that downtime of the host increases the > downtime of VMs in cloud environments. Some VMs might be very sensible > to downtime: game servers, traders, etc. > > > IOW, can you please explain more fully why this patchset is valuable to > > our users? I'll let the users speak for themselves, but I have a similar use case to Pavel of limiting the downtime of VMs running on these large systems, and spinning up instances as fast as possible is also desirable for our cloud users.
Re: [PATCH 08/15] usb: ehci: avoid gcc-10 zero-length-bounds warning
On Thu, 30 Apr 2020, Arnd Bergmann wrote: > Building ehci drivers with gcc-10 results in a number of warnings like > when an zero-length array is accessed: > > drivers/usb/host/ehci-hub.c: In function 'ehci_bus_suspend': > drivers/usb/host/ehci-hub.c:320:30: error: array subscript 14 is outside the > bounds of an interior zero-length array 'u32[0]' {aka 'unsigned int[0]'} > [-Werror=zero-length-bounds] > 320 |u32 __iomem *hostpc_reg = >regs->hostpc[port]; > | ^ > In file included from drivers/usb/host/ehci.h:273, > from drivers/usb/host/ehci-hcd.c:96: > include/linux/usb/ehci_def.h:186:7: note: while referencing 'hostpc' > 186 | u32 hostpc[0]; /* HOSTPC extension */ > | ^~ > In file included from drivers/usb/host/ehci-hcd.c:305: > drivers/usb/host/ehci-hub.c: In function 'ehci_hub_control': > drivers/usb/host/ehci-hub.c:892:15: error: array subscript 256 is outside the > bounds of an interior zero-length array 'u32[0]' {aka 'unsigned int[0]'} > [-Werror=zero-length-bounds] > 892 | hostpc_reg = >regs->hostpc[temp]; > | ^ > In file included from drivers/usb/host/ehci.h:273, > from drivers/usb/host/ehci-hcd.c:96: > include/linux/usb/ehci_def.h:186:7: note: while referencing 'hostpc' > 186 | u32 hostpc[0]; /* HOSTPC extension */ > | ^~ > > All these fields are colocated with reserved fields that I guess > refer to the correct field length. No, they don't. > Change the two struct definition to use an unnamed union to define > both of these fields at the same location as the corresponding > reserved fields. > > Signed-off-by: Arnd Bergmann > --- > include/linux/usb/ehci_def.h | 12 > 1 file changed, 8 insertions(+), 4 deletions(-) > > diff --git a/include/linux/usb/ehci_def.h b/include/linux/usb/ehci_def.h > index 78e00637..8777d8e56ef2 100644 > --- a/include/linux/usb/ehci_def.h > +++ b/include/linux/usb/ehci_def.h > @@ -127,7 +127,8 @@ struct ehci_regs { > #define FLAG_CF (1<<0) /* true: we'll support "high > speed" */ > > /* PORTSC: offset 0x44 */ > - u32 port_status[0]; /* up to N_PORTS */ > + union { > + u32 port_status[9]; /* up to N_PORTS */ This array can have up to 15 elements, meaning that it can extend out to offset 0x80. > /* EHCI 1.1 addendum */ > #define PORTSC_SUSPEND_STS_ACK 0 > #define PORTSC_SUSPEND_STS_NYET 1 > @@ -165,7 +166,8 @@ struct ehci_regs { > #define PORT_CONNECT (1<<0) /* device connected */ > #define PORT_RWC_BITS (PORT_CSC | PORT_PEC | PORT_OCC) > > - u32 reserved3[9]; > + u32 reserved3[9]; > + }; > > /* USBMODE: offset 0x68 */ > u32 usbmode;/* USB Device mode */ As you see, this next field actually lies inside the preceding array. It's not a real conflict; any hardware which supports the usbmode field uses only the first element of the port_status array. I don't know how you want to handle this. Doing #define usbmode port_status[9] doesn't seem like a very good approach, but I can't think of anything better at the moment. Maybe just set the array size to 9, as you did, but with a comment explaining what's really going on. > @@ -181,11 +183,13 @@ struct ehci_regs { > * PORTSCx > */ > /* HOSTPC: offset 0x84 */ > - u32 hostpc[0]; /* HOSTPC extension */ > + union { > + u32 hostpc[17]; /* HOSTPC extension */ Likewise, this array can have up to 15 elements. In fact, it's the same size as the port_status array. > #define HOSTPC_PHCD (1<<22) /* Phy clock disable */ > #define HOSTPC_PSPD (3<<25) /* Port speed detection */ > > - u32 reserved5[17]; > + u32 reserved5[17]; > + }; > > /* USBMODE_EX: offset 0xc8 */ > u32 usbmode_ex; /* USB Device mode extension */ Alan Stern
linux-next: manual merge of the net-next tree with the net tree
Hi all, Today's linux-next merge of the net-next tree got a conflict in: include/net/mptcp.h between commit: cfde141ea3fa ("mptcp: move option parsing into mptcp_incoming_options()") from the net tree and commit: 071c8ed6e88d ("tcp: mptcp: use mptcp receive buffer space to select rcv window") from the net-next tree. I fixed it up (see below) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell diff --cc include/net/mptcp.h index 3bce2019e4da,5288fba56e55.. --- a/include/net/mptcp.h +++ b/include/net/mptcp.h @@@ -68,8 -68,13 +68,10 @@@ static inline bool rsk_is_mptcp(const s return tcp_rsk(req)->is_mptcp; } + void mptcp_space(const struct sock *ssk, int *space, int *full_space); + -void mptcp_parse_option(const struct sk_buff *skb, const unsigned char *ptr, - int opsize, struct tcp_options_received *opt_rx); bool mptcp_syn_options(struct sock *sk, const struct sk_buff *skb, unsigned int *size, struct mptcp_out_options *opts); -void mptcp_rcv_synsent(struct sock *sk); bool mptcp_synack_options(const struct request_sock *req, unsigned int *size, struct mptcp_out_options *opts); bool mptcp_established_options(struct sock *sk, struct sk_buff *skb, pgpoAENNkOI1v.pgp Description: OpenPGP digital signature
Re: [RFC][PATCH] x86/mm: Sync all vmalloc mappings before text_poke()
On Thu, 30 Apr 2020 22:26:55 -0400 (EDT) Mathieu Desnoyers wrote: > - On Apr 30, 2020, at 9:13 PM, rostedt rost...@goodmis.org wrote: > > > [ Joerg, sending again this time not just to you. (hit reply to sender > > and not reply to all). Feel free to resend what you wrote before to this ] > > > > On Thu, 30 Apr 2020 21:14:34 +0200 > > Joerg Roedel wrote: > > > >> And alloc_percpu() calls down into pcpu_alloc(), which allocates new > >> percpu chunks using vmalloc() on x86. And there we are again in the > >> vmalloc area. > > > > So after a vmalloc() is made, should the page tables be synced? > > Why should it ? Usually, the page fault handler is able to resolve the > resulting minor page faults lazily. > > > > > This is a rather subtle bug, and I don't think it should be the caller of > > percpu_alloc() that needs to call vmalloc_sync_mappings(). > > Who said tracing was easy ? ;-) But anyone can hook to a tracepoint, and then if they hook to one that is in the page fault handler, and they use vmalloc, they can lock up the machine. > > > What's your suggestion for a fix? > > I know the question is not addressed to me, but here are my 2 cents: > > It's subtle because ftrace is tracing the page fault handler through > tracepoints. It would not make sense to slow down all vmalloc or > percpu_alloc() just because tracing recurses when tracing page faults. What's so damn special about alloc_percpu()? It's definitely not a fast path. And it's not used often. > > I think the right approach to solve this is to call vmalloc_sync_mappings() > before any vmalloc'd memory ends up being observable by instrumentation. > This can be achieved by adding a vmalloc_sync_mappings call on tracepoint > registration like I proposed in my patchset a few week ago: > > https://lore.kernel.org/r/20200409193543.18115-2-mathieu.desnoy...@efficios.com > > The tracers just have to make sure they perform their vmalloc'd memory > allocation before registering the tracepoint which can touch it, else they > need to issue vmalloc_sync_mappings() on their own before making the > newly allocated memory observable by instrumentation. > > This approach is not new: register_die_notifier() does exactly that today. > I'll give the answer I gave to Joerg when he replied to my accidental private (not public) email: Or even my original patch would be better than having the generic tracing code understanding the intrinsic properties of vmalloc() and alloc_percpu() on x86_64. I really don't think it is wise to have: foo = alloc_percpu(); /* * Because of some magic with the way alloc_percpu() works on * x86_64, we need to synchronize the pgd of all the tables, * otherwise the trace events that happen in x86_64 page fault * handlers can't cope with accessing the chance that a * alloc_percpu()'d memory might be touched in the page fault trace * event. Oh, and we need to audit all alloc_percpu() and vmalloc() * calls in tracing, because something might get triggered within a * page fault trace event! */ vmalloc_sync_mappings(); That would be exactly what I add as a comment if it were to be added in the generic tracing code. And we would need to audit any percpu alloc'd code in all tracing, or anything that might git hooked into something that hooks to the page fault trace point. Since this worked for a decade without this, I'm strongly against adding it in the generic code due to some issues with a single architecture. -- Steve
Re: [PATCH 21/29] mm: remove the pgprot argument to __vmalloc
>> On Tue, Apr 14, 2020 at 03:13:40PM +0200, Christoph Hellwig wrote: >> > The pgprot argument to __vmalloc is always PROT_KERNEL now, so remove >> > it. Greetings; I recently noticed this change via the linux-next tree. It may not be possible to edit at this late date, but the change description refers to PROT_KERNEL, which is a symbol which does not appear to exist; perhaps PAGE_KERNEL was meant? The mismatch caused me and a couple other folks some confusion briefly until we decided it was supposed to be PAGE_KERNEL; if it's not too late, editing the description to clarify so would be nice. Many thanks. John Dorminy
Re: [PATCH V1 09/10] arch/kmap: Define kmap_atomic_prot() for all arch's
On Thu, Apr 30, 2020 at 01:38:44PM -0700, ira.we...@intel.com wrote: > -static inline void *kmap_atomic(struct page *page) > +static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot) > { > preempt_disable(); > pagefault_disable(); > if (!PageHighMem(page)) > return page_address(page); > - return kmap_atomic_high(page); > + return kmap_atomic_high_prot(page, prot); > } > +#define kmap_atomic(page)kmap_atomic_prot(page, kmap_prot) OK, so it *was* just a bisect hazard - you return to original semantics wrt preempt_disable()...
[PATCH] streamline_config.pl: add LOCALMODCONFIG_PRESERVE to preserve some kconfigs
Sometimes it is useful to preserve batches of configs when making localmodconfig. For example, I usually don't want any usb and fs modules to be disabled. Now we can do it by: $ make LOCALMODCONFIG_PRESERVE="drivers/usb;fs" localmodconfig Signed-off-by: Changbin Du --- Documentation/admin-guide/README.rst | 8 +++- scripts/kconfig/streamline_config.pl | 23 +++ 2 files changed, 30 insertions(+), 1 deletion(-) diff --git a/Documentation/admin-guide/README.rst b/Documentation/admin-guide/README.rst index cc6151fc0845..6deff95362f8 100644 --- a/Documentation/admin-guide/README.rst +++ b/Documentation/admin-guide/README.rst @@ -209,10 +209,16 @@ Configuring the kernel store the lsmod of that machine into a file and pass it in as a LSMOD parameter. + Also, you can preserve modules in certen folders + or kconfig files by spcifying there paths in + parameter LOCALMODCONFIG_PRESERVE. + target$ lsmod > /tmp/mylsmod target$ scp /tmp/mylsmod host:/tmp - host$ make LSMOD=/tmp/mylsmod localmodconfig + host$ make LSMOD=/tmp/mylsmod \ + LOCALMODCONFIG_PRESERVE="drivers/usb;drivers/gpu;fs" \ + localmodconfig The above also works when cross compiling. diff --git a/scripts/kconfig/streamline_config.pl b/scripts/kconfig/streamline_config.pl index e2f8504f5a2d..ab5d1e10a5d0 100755 --- a/scripts/kconfig/streamline_config.pl +++ b/scripts/kconfig/streamline_config.pl @@ -143,6 +143,7 @@ my %depends; my %selects; my %prompts; my %objects; +my %config2kfile; my $var; my $iflevel = 0; my @ifdeps; @@ -201,6 +202,7 @@ sub read_kconfig { if (/^\s*(menu)?config\s+(\S+)\s*$/) { $state = "NEW"; $config = $2; + $config2kfile{"CONFIG_$config"} = $kconfig; # Add depends for 'if' nesting for (my $i = 0; $i < $iflevel; $i++) { @@ -592,6 +594,22 @@ while ($repeat) { my %setconfigs; +my @presevered_kconfigs; +@presevered_kconfigs = split(/;/,$ENV{LOCALMODCONFIG_PRESERVE}) if (defined($ENV{LOCALMODCONFIG_PRESERVE})); + +sub in_presevered_kconfigs { +my $kconfig = $config2kfile{$_[0]}; +if (!defined($kconfig)) { +return 0; +} +foreach my $excl (@presevered_kconfigs) { +if($kconfig =~ /^$excl/) { +return 1; +} +} +return 0; +} + # Finally, read the .config file and turn off any module enabled that # we could not find a reason to keep enabled. foreach my $line (@config_file) { @@ -644,6 +662,11 @@ foreach my $line (@config_file) { } if (/^(CONFIG.*)=(m|y)/) { +if (in_presevered_kconfigs($1)) { +dprint "Preserve config $1"; +print; +next; +} if (defined($configs{$1})) { if ($localyesconfig) { $setconfigs{$1} = 'y'; -- 2.25.1
Re: bug: Kbuild seems to require "make prepare-objtool" in order to use (some) new config vars
On Fri, May 1, 2020 at 4:10 AM Mauro Carvalho Chehab wrote: > > Em Fri, 1 May 2020 02:20:13 +0900 > Masahiro Yamada escreveu: > > > On Fri, May 1, 2020 at 1:49 AM Mauro Carvalho Chehab > > wrote: > > > > > > Em Thu, 30 Apr 2020 22:51:48 +0900 > > > Masahiro Yamada escreveu: > > > > > > > Hi Mauro, > > > > > > > > > > > > On Thu, Apr 30, 2020 at 8:17 PM Mauro Carvalho Chehab > > > > wrote: > > > > > > > > > > Hi Masahiro, > > > > > > > > > > Not sure if this was already reported (or eventually fixed) upstream. > > > > > > > > > > While doing a Kconfig reorg on media, I noticed a few weird things, > > > > > requiring me to call, on a few situations, "make modules_prepare" > > > > > manually after some changes. > > > > > > > > > > I'm now working on a patchset to yet to be merged upstream aiming to > > > > > resurrect a driver from staging. It is currently on this tree > > > > > (with is based at the media development tree, on the top of 5.7-rc1): > > > > > > > > > > > > > > > https://git.linuxtv.org/mchehab/experimental.git/log/?h=atomisp_v2 > > > > > > > > > > There, I was able to identify a misbehavior that it is probably > > > > > what forced me to need calling "make modules_prepare". > > > > > > > > > > The atomisp driver is on a very bad shape. Among its problems, it has > > > > > a > > > > > set of header definitions that should be different for two different > > > > > variants of the supported devices. In order to be able to compile for > > > > > either one of the variants, I added a new config var: > > > > > CONFIG_VIDEO_ATOMISP_ISP2401. > > > > > > > > > > The problem is that calling just > > > > > > > > > > ./scripts/config -e CONFIG_VIDEO_ATOMISP_ISP2401 > > > > > > > > > > or > > > > > ./scripts/config -d CONFIG_VIDEO_ATOMISP_ISP2401 > > > > > > > > > > > > scripts/config does not take the dependency into consideration > > > > at all. > > > > > > > > You need to enable/disable other options that it depends on. > > > > > > Yes, I know. on my tests, I did a "make allyesconfig" before > > > the patch whose added this dependency. So, the only dependency > > > left to be enabled or disabled was this one. > > > > > > The problem I'm pointing is not really do a select recursion > > > (that would be a really cool feature, but I know it is not > > > trivial), but, instead, that, despite the config var was > > > there, when I tried to build it with: > > > > > > make clean; make M=drivers/staging/media/atomisp > > > > > > It didn't do the right thing. Instead, it used ISP2401=y > > > on make clean and ISP2401=n at the drivers build. > > > > > > So, in order to test if patches won't break building, > > > depending on the value of this var, I'm currently doing: > > > > > > cls;./scripts/config -d CONFIG_VIDEO_ATOMISP_ISP2401 && make > > > prepare-objtool && make clean && make M=drivers/staging/media/atomisp && > > > ./scripts/config -e CONFIG_VIDEO_ATOMISP_ISP2401 && make prepare-objtool > > > && make clean && make M=drivers/staging/media/atomisp > > > > > > (note the alien "make prepare-objtool" in the middle) > > > > > > > > > > ./scripts/config -e STAGING -e STAGING_MEDIA -e MEDIA_SUPPORT -e > > > > MEDIA_CONTROLLER -e INTEL_ATOMISP -e VIDEOBUF_VMALLOC -e VIDEO_ATOMISP > > > > -d MEDIA_SUPPORT_FILTER -e VIDEO_DEV -e VIDEO_V4L2 -e > > > > VIDEO_ATOMISP_ISP2401 > > > > > > > > allows me to enable VIDEO_ATOMISP_ISP2401. > > > > > > > > > > > > It is painful to debug such complicated dependency graph, though. > > > > > > Yeah, dependencies at the media subsystem are usually quite complex. > > > > > > > > > > > > > is not enough anymore for the build to actually use the new config > > > > > value. > > > > > > > > > > It just keeps silently using the old config value. > > > > > > > > > > I double-checked it by adding this macro at the Makefile: > > > > > > > > > > $(info ISP2401: $(CONFIG_VIDEO_ATOMISP_ISP2401) > > > > > ) > > > > > > > > > > The Makefile doesn't see the change, except if I explicitly call > > > > > "make modules_prepare" or "make prepare-objtool". > > > > > > > > > > Even calling "make oldconfig" doesn't make it use the new CONFIG_* > > > > > > > > > > > > I do not know. > > > > > > > > I cannot look into it without detailed steps to reproduce it. > > > > > > I'm applying the enclosed patch to this branch: > > > > > > https://git.linuxtv.org/mchehab/experimental.git/log/?h=atomisp_v2 > > > > > > and running this: > > > > > > $ make allyesconfig 2>/dev/null >/dev/null; echo > > > "disable";./scripts/config -d CONFIG_VIDEO_ATOMISP_ISP2401 && make > > > M=drivers/staging/media/atomisp && ./scripts/config -e > > > CONFIG_VIDEO_ATOMISP_ISP2401 && echo "enable" && make clean && make > > > M=drivers/staging/media/atomisp > > > disable > > > > > > WARNING: Symbol version dump ./Module.symvers > > >is missing; modules will have no dependencies and > > >
Re: [PATCH V1 00/10] Remove duplicated kmap code
ira.we...@intel.com writes: > From: Ira Weiny > > The kmap infrastructure has been copied almost verbatim to every architecture. > This series consolidates obvious duplicated code by defining core functions > which call into the architectures only when needed. > > Some of the k[un]map_atomic() implementations have some similarities but the > similarities were not sufficient to warrant further changes. > > In addition we remove a duplicate implementation of kmap() in DRM. > > Testing was done by 0day to cover all the architectures I can't readily > build/test. I threw some powerpc builds at it and they all passed, so LGTM. cheers
Re: [RFC][PATCH] x86/mm: Sync all vmalloc mappings before text_poke()
- On Apr 30, 2020, at 9:13 PM, rostedt rost...@goodmis.org wrote: > [ Joerg, sending again this time not just to you. (hit reply to sender > and not reply to all). Feel free to resend what you wrote before to this ] > > On Thu, 30 Apr 2020 21:14:34 +0200 > Joerg Roedel wrote: > >> And alloc_percpu() calls down into pcpu_alloc(), which allocates new >> percpu chunks using vmalloc() on x86. And there we are again in the >> vmalloc area. > > So after a vmalloc() is made, should the page tables be synced? Why should it ? Usually, the page fault handler is able to resolve the resulting minor page faults lazily. > > This is a rather subtle bug, and I don't think it should be the caller of > percpu_alloc() that needs to call vmalloc_sync_mappings(). Who said tracing was easy ? ;-) > What's your suggestion for a fix? I know the question is not addressed to me, but here are my 2 cents: It's subtle because ftrace is tracing the page fault handler through tracepoints. It would not make sense to slow down all vmalloc or percpu_alloc() just because tracing recurses when tracing page faults. I think the right approach to solve this is to call vmalloc_sync_mappings() before any vmalloc'd memory ends up being observable by instrumentation. This can be achieved by adding a vmalloc_sync_mappings call on tracepoint registration like I proposed in my patchset a few week ago: https://lore.kernel.org/r/20200409193543.18115-2-mathieu.desnoy...@efficios.com The tracers just have to make sure they perform their vmalloc'd memory allocation before registering the tracepoint which can touch it, else they need to issue vmalloc_sync_mappings() on their own before making the newly allocated memory observable by instrumentation. This approach is not new: register_die_notifier() does exactly that today. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
Re: [PATCH] mm/gup.c: Corrected document reference path
On 2020-04-30 12:01, Souptick Joarder wrote: Document path Documentation/vm/pin_user_pages.rst is not a correct reference and it should be Documentation/core-api/pin_user_pages.rst. Signed-off-by: Souptick Joarder --- mm/gup.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) Reviewed-by: John Hubbard thanks, -- John Hubbard NVIDIA diff --git a/mm/gup.c b/mm/gup.c index 7ce796c..4952f12 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -2864,10 +2864,10 @@ int get_user_pages_fast(unsigned long start, int nr_pages, * the arguments here are identical. * * FOLL_PIN means that the pages must be released via unpin_user_page(). Please - * see Documentation/vm/pin_user_pages.rst for further details. + * see Documentation/core-api/pin_user_pages.rst for further details. * - * This is intended for Case 1 (DIO) in Documentation/vm/pin_user_pages.rst. It - * is NOT intended for Case 2 (RDMA: long-term pins). + * This is intended for Case 1 (DIO) in Documentation/core-api/pin_user_pages.rst. + * It is NOT intended for Case 2 (RDMA: long-term pins). */ int pin_user_pages_fast(unsigned long start, int nr_pages, unsigned int gup_flags, struct page **pages) @@ -2904,10 +2904,10 @@ int pin_user_pages_fast(unsigned long start, int nr_pages, * the arguments here are identical. * * FOLL_PIN means that the pages must be released via unpin_user_page(). Please - * see Documentation/vm/pin_user_pages.rst for details. + * see Documentation/core-api/pin_user_pages.rst for details. * - * This is intended for Case 1 (DIO) in Documentation/vm/pin_user_pages.rst. It - * is NOT intended for Case 2 (RDMA: long-term pins). + * This is intended for Case 1 (DIO) in Documentation/core-api/pin_user_pages.rst. + * It is NOT intended for Case 2 (RDMA: long-term pins). */ long pin_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm, unsigned long start, unsigned long nr_pages, @@ -2940,10 +2940,10 @@ long pin_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm, * FOLL_PIN is set. * * FOLL_PIN means that the pages must be released via unpin_user_page(). Please - * see Documentation/vm/pin_user_pages.rst for details. + * see Documentation/core-api/pin_user_pages.rst for details. * - * This is intended for Case 1 (DIO) in Documentation/vm/pin_user_pages.rst. It - * is NOT intended for Case 2 (RDMA: long-term pins). + * This is intended for Case 1 (DIO) in Documentation/core-api/pin_user_pages.rst. + * It is NOT intended for Case 2 (RDMA: long-term pins). */ long pin_user_pages(unsigned long start, unsigned long nr_pages, unsigned int gup_flags, struct page **pages,
Re: [PATCH V1 08/10] arch/kmap: Don't hard code kmap_prot values
On Thu, Apr 30, 2020 at 01:38:43PM -0700, ira.we...@intel.com wrote: > From: Ira Weiny > > To support kmap_atomic_prot() on all architectures each arch must > support protections passed in to them. > > Change csky, mips, nds32 and xtensa to use their global kmap_prot value > rather than a hard coded value which was equal. Minor nitpick: it's probably worth pointing out that kmap_prot on those is a constant...
[GIT PULL] SCSI fixes for 5.7-rc3
Four minor fixes: three in drivers and one in the core. The core one allows an additional state change that fixes a regression introduced by an update to the aacraid driver in the previous merge window. The patch is available here: git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git scsi-fixes The short changelog is: David Disseldorp (1): scsi: target/iblock: fix WRITE SAME zeroing Dexuan Cui (1): scsi: core: Allow the state change from SDEV_QUIESCE to SDEV_BLOCK Martin Wilck (2): scsi: qla2xxx: check UNLOADING before posting async work scsi: qla2xxx: set UNLOADING before waiting for session deletion And the diffstat drivers/scsi/qla2xxx/qla_os.c | 35 +-- drivers/scsi/scsi_lib.c | 1 + drivers/target/target_core_iblock.c | 2 +- 3 files changed, 19 insertions(+), 19 deletions(-) With full diff below. James --- diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c index d190db5ea7d9..1d9a4866f9a7 100644 --- a/drivers/scsi/qla2xxx/qla_os.c +++ b/drivers/scsi/qla2xxx/qla_os.c @@ -3732,6 +3732,13 @@ qla2x00_remove_one(struct pci_dev *pdev) } qla2x00_wait_for_hba_ready(base_vha); + /* +* if UNLOADING flag is already set, then continue unload, +* where it was set first. +*/ + if (test_and_set_bit(UNLOADING, _vha->dpc_flags)) + return; + if (IS_QLA25XX(ha) || IS_QLA2031(ha) || IS_QLA27XX(ha) || IS_QLA28XX(ha)) { if (ha->flags.fw_started) @@ -3750,15 +3757,6 @@ qla2x00_remove_one(struct pci_dev *pdev) qla2x00_wait_for_sess_deletion(base_vha); - /* -* if UNLOAD flag is already set, then continue unload, -* where it was set first. -*/ - if (test_bit(UNLOADING, _vha->dpc_flags)) - return; - - set_bit(UNLOADING, _vha->dpc_flags); - qla_nvme_delete(base_vha); dma_free_coherent(>pdev->dev, @@ -4864,6 +4862,9 @@ qla2x00_alloc_work(struct scsi_qla_host *vha, enum qla_work_type type) struct qla_work_evt *e; uint8_t bail; + if (test_bit(UNLOADING, >dpc_flags)) + return NULL; + QLA_VHA_MARK_BUSY(vha, bail); if (bail) return NULL; @@ -6628,13 +6629,6 @@ qla2x00_disable_board_on_pci_error(struct work_struct *work) struct pci_dev *pdev = ha->pdev; scsi_qla_host_t *base_vha = pci_get_drvdata(ha->pdev); - /* -* if UNLOAD flag is already set, then continue unload, -* where it was set first. -*/ - if (test_bit(UNLOADING, _vha->dpc_flags)) - return; - ql_log(ql_log_warn, base_vha, 0x015b, "Disabling adapter.\n"); @@ -6645,9 +6639,14 @@ qla2x00_disable_board_on_pci_error(struct work_struct *work) return; } - qla2x00_wait_for_sess_deletion(base_vha); + /* +* if UNLOADING flag is already set, then continue unload, +* where it was set first. +*/ + if (test_and_set_bit(UNLOADING, _vha->dpc_flags)) + return; - set_bit(UNLOADING, _vha->dpc_flags); + qla2x00_wait_for_sess_deletion(base_vha); qla2x00_delete_all_vps(ha, base_vha); diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index 47835c4b4ee0..06c260f6cdae 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -2284,6 +2284,7 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state) switch (oldstate) { case SDEV_RUNNING: case SDEV_CREATED_BLOCK: + case SDEV_QUIESCE: case SDEV_OFFLINE: break; default: diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c index 51ffd5c002de..1c181d31f4c8 100644 --- a/drivers/target/target_core_iblock.c +++ b/drivers/target/target_core_iblock.c @@ -432,7 +432,7 @@ iblock_execute_zero_out(struct block_device *bdev, struct se_cmd *cmd) target_to_linux_sector(dev, cmd->t_task_lba), target_to_linux_sector(dev, sbc_get_write_same_sectors(cmd)), - GFP_KERNEL, false); + GFP_KERNEL, BLKDEV_ZERO_NOUNMAP); if (ret) return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
[PATCH glibc 2/3] glibc: sched_getcpu(): use rseq cpu_id TLS on Linux (v7)
When available, use the cpu_id field from __rseq_abi on Linux to implement sched_getcpu(). Fall-back on the vgetcpu vDSO if unavailable. Benchmarks: x86-64: Intel E5-2630 v3@2.40GHz, 16-core, hyperthreading glibc sched_getcpu(): 13.7 ns (baseline) glibc sched_getcpu() using rseq: 2.5 ns (speedup: 5.5x) inline load cpuid from __rseq_abi TLS: 0.8 ns (speedup: 17.1x) CC: Carlos O'Donell CC: Florian Weimer CC: Joseph Myers CC: Szabolcs Nagy CC: Thomas Gleixner CC: Ben Maurer CC: Peter Zijlstra CC: "Paul E. McKenney" CC: Boqun Feng CC: Will Deacon CC: Paul Turner CC: libc-al...@sourceware.org CC: linux-kernel@vger.kernel.org CC: linux-...@vger.kernel.org --- Changes since v1: - rseq is only used if both __NR_rseq and RSEQ_SIG are defined. Changes since v2: - remove duplicated __rseq_abi extern declaration. Changes since v3: - update ChangeLog. Changes since v4: - Use atomic_load_relaxed to load the __rseq_abi.cpu_id field, a consequence of the fact that __rseq_abi is not volatile anymore. - Include atomic.h which provides atomic_load_relaxed. Changes since v5: - Use __ASSUME_RSEQ to detect rseq availability. Changes since v6: - Remove use of __ASSUME_RSEQ. --- sysdeps/unix/sysv/linux/sched_getcpu.c | 27 -- 1 file changed, 25 insertions(+), 2 deletions(-) diff --git a/sysdeps/unix/sysv/linux/sched_getcpu.c b/sysdeps/unix/sysv/linux/sched_getcpu.c index c019cfb3cf..2269c4f2bd 100644 --- a/sysdeps/unix/sysv/linux/sched_getcpu.c +++ b/sysdeps/unix/sysv/linux/sched_getcpu.c @@ -18,10 +18,15 @@ #include #include #include +#include #include -int -sched_getcpu (void) +#ifdef HAVE_GETCPU_VSYSCALL +# define HAVE_VSYSCALL +#endif + +static int +vsyscall_sched_getcpu (void) { unsigned int cpu; int r = -1; @@ -32,3 +37,21 @@ sched_getcpu (void) #endif return r == -1 ? r : cpu; } + +#include + +#ifdef RSEQ_SIG +int +sched_getcpu (void) +{ + int cpu_id = atomic_load_relaxed (&__rseq_abi.cpu_id); + + return cpu_id >= 0 ? cpu_id : vsyscall_sched_getcpu (); +} +#else +int +sched_getcpu (void) +{ + return vsyscall_sched_getcpu (); +} +#endif -- 2.17.1
[PATCH glibc 1/3] glibc: Perform rseq registration at C startup and thread creation (v19)
Register rseq TLS for each thread (including main), and unregister for each thread (excluding main). "rseq" stands for Restartable Sequences. See the rseq(2) man page proposed here: https://lkml.org/lkml/2018/9/19/647 Those are based on glibc master branch commit 6f0baacf0f89. The rseq system call was merged into Linux 4.18. CC: Carlos O'Donell CC: Florian Weimer CC: Joseph Myers CC: Szabolcs Nagy CC: Thomas Gleixner CC: Ben Maurer CC: Peter Zijlstra CC: "Paul E. McKenney" CC: Boqun Feng CC: Will Deacon CC: Dave Watson CC: Paul Turner CC: Rich Felker CC: libc-al...@sourceware.org CC: linux-kernel@vger.kernel.org CC: linux-...@vger.kernel.org --- Changes since v1: - Move __rseq_refcount to an extra field at the end of __rseq_abi to eliminate one symbol. All libraries/programs which try to register rseq (glibc, early-adopter applications, early-adopter libraries) should use the rseq refcount. It becomes part of the ABI within a user-space process, but it's not part of the ABI shared with the kernel per se. - Restructure how this code is organized so glibc keeps building on non-Linux targets. - Use non-weak symbol for __rseq_abi. - Move rseq registration/unregistration implementation into its own nptl/rseq.c compile unit. - Move __rseq_abi symbol under GLIBC_2.29. Changes since v2: - Move __rseq_refcount to its own symbol, which is less ugly than trying to play tricks with the rseq uapi. - Move __rseq_abi from nptl to csu (C start up), so it can be used across glibc, including memory allocator and sched_getcpu(). The __rseq_refcount symbol is kept in nptl, because there is no reason to use it elsewhere in glibc. Changes since v3: - Set __rseq_refcount TLS to 1 on register/set to 0 on unregister because glibc is the first/last user. - Unconditionally register/unregister rseq at thread start/exit, because glibc is the first/last user. - Add missing abilist items. - Rebase on glibc master commit a502c5294. - Add NEWS entry. Changes since v4: - Do not use "weak" symbols for __rseq_abi and __rseq_refcount. Based on "System V Application Binary Interface", weak only affects the link editor, not the dynamic linker. - Install a new sys/rseq.h system header on Linux, which contains the RSEQ_SIG definition, __rseq_abi declaration and __rseq_refcount declaration. Move those definition/declarations from rseq-internal.h to the installed sys/rseq.h header. - Considering that rseq is only available on Linux, move csu/rseq.c to sysdeps/unix/sysv/linux/rseq-sym.c. - Move __rseq_refcount from nptl/rseq.c to sysdeps/unix/sysv/linux/rseq-sym.c, so it is only defined on Linux. - Move both ABI definitions for __rseq_abi and __rseq_refcount to sysdeps/unix/sysv/linux/Versions, so they only appear on Linux. - Document __rseq_abi and __rseq_refcount volatile. - Document the RSEQ_SIG signature define. - Move registration functions from rseq.c to rseq-internal.h static inline functions. Introduce empty stubs in misc/rseq-internal.h, which can be overridden by architecture code in sysdeps/unix/sysv/linux/rseq-internal.h. - Rename __rseq_register_current_thread and __rseq_unregister_current_thread to rseq_register_current_thread and rseq_unregister_current_thread, now that those are only visible as internal static inline functions. - Invoke rseq_register_current_thread() from libc-start.c LIBC_START_MAIN rather than nptl init, so applications not linked against libpthread.so have rseq registered for their main() thread. Note that it is invoked separately for SHARED and !SHARED builds. Changes since v5: - Replace __rseq_refcount by __rseq_lib_abi, which contains two uint32_t: register_state and refcount. The "register_state" field allows inhibiting rseq registration from signal handlers nested on top of glibc registration and occuring after rseq unregistration by glibc. - Introduce enum rseq_register_state, which contains the states allowed for the struct rseq_lib_abi register_state field. Changes since v6: - Introduce bits/rseq.h to define RSEQ_SIG for each architecture. The generic bits/rseq.h does not define RSEQ_SIG, meaning that each architecture implementing rseq needs to implement bits/rseq.h. - Rename enum item RSEQ_REGISTER_NESTED to RSEQ_REGISTER_ONGOING. - Port to glibc-2.29. Changes since v7: - Remove __rseq_lib_abi symbol, including refcount and register_state fields. - Remove reference counting and nested signals handling from registration/unregistration functions. - Introduce new __rseq_handled exported symbol, which is set to 1 by glibc on C startup when it handles restartable sequences. This allows glibc to coexist with early adopter libraries and applications wishing to register restartable sequences when it is not handled by glibc. - Introduce rseq_init (), which sets __rseq_handled to 1 from C startup. - Update NEWS entry. - Update comments at the beginning of new files. - Registration depends on
Re: [PATCH] memcg: oom: ignore oom warnings from memory.max
On Fri, May 1, 2020 at 10:04 AM Shakeel Butt wrote: > > On Thu, Apr 30, 2020 at 6:39 PM Yafang Shao wrote: > > > > On Fri, May 1, 2020 at 2:27 AM Shakeel Butt wrote: > > > > > > Lowering memory.max can trigger an oom-kill if the reclaim does not > > > succeed. However if oom-killer does not find a process for killing, it > > > dumps a lot of warnings. > > > > > > > I have been confused by this behavior for several months and I think > > it will confuse more memcg users. > > We should keep the memcg oom behavior consistent with system oom - no > > oom kill if no process. > > > > What about bellow change ? > > > > Seems fine to me. If others are ok with this, please do send a signed-off > patch. > Sure, I will. > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index e28098e13f1c..25fbc37a747f 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -6086,6 +6086,9 @@ static ssize_t memory_max_write(struct > > kernfs_open_file *of, > > continue; > > } > > > > + if (!cgroup_is_populated(memcg->css.cgroup)) > > + break; > > + > > memcg_memory_event(memcg, MEMCG_OOM); > > if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0)) > > break; > > > > > Deleting a memcg does not reclaim memory from it and the memory can > > > linger till there is a memory pressure. One normal way to proactively > > > reclaim such memory is to set memory.max to 0 just before deleting the > > > memcg. However if some of the memcg's memory is pinned by others, this > > > operation can trigger an oom-kill without any process and thus can log a > > > lot un-needed warnings. So, ignore all such warnings from memory.max. > > > > > > Signed-off-by: Shakeel Butt > > > --- > > > include/linux/oom.h | 3 +++ > > > mm/memcontrol.c | 9 + > > > mm/oom_kill.c | 2 +- > > > 3 files changed, 9 insertions(+), 5 deletions(-) > > > > > > diff --git a/include/linux/oom.h b/include/linux/oom.h > > > index c696c265f019..6345dc55df64 100644 > > > --- a/include/linux/oom.h > > > +++ b/include/linux/oom.h > > > @@ -52,6 +52,9 @@ struct oom_control { > > > > > > /* Used to print the constraint info. */ > > > enum oom_constraint constraint; > > > + > > > + /* Do not warn even if there is no process to be killed. */ > > > + bool no_warn; > > > }; > > > > > > extern struct mutex oom_lock; > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > > index 317dbbaac603..a1f00d9b9bb0 100644 > > > --- a/mm/memcontrol.c > > > +++ b/mm/memcontrol.c > > > @@ -1571,7 +1571,7 @@ unsigned long mem_cgroup_size(struct mem_cgroup > > > *memcg) > > > } > > > > > > static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t > > > gfp_mask, > > > -int order) > > > +int order, bool no_warn) > > > { > > > struct oom_control oc = { > > > .zonelist = NULL, > > > @@ -1579,6 +1579,7 @@ static bool mem_cgroup_out_of_memory(struct > > > mem_cgroup *memcg, gfp_t gfp_mask, > > > .memcg = memcg, > > > .gfp_mask = gfp_mask, > > > .order = order, > > > + .no_warn = no_warn, > > > }; > > > bool ret; > > > > > > @@ -1821,7 +1822,7 @@ static enum oom_status mem_cgroup_oom(struct > > > mem_cgroup *memcg, gfp_t mask, int > > > mem_cgroup_oom_notify(memcg); > > > > > > mem_cgroup_unmark_under_oom(memcg); > > > - if (mem_cgroup_out_of_memory(memcg, mask, order)) > > > + if (mem_cgroup_out_of_memory(memcg, mask, order, false)) > > > ret = OOM_SUCCESS; > > > else > > > ret = OOM_FAILED; > > > @@ -1880,7 +1881,7 @@ bool mem_cgroup_oom_synchronize(bool handle) > > > mem_cgroup_unmark_under_oom(memcg); > > > finish_wait(_oom_waitq, ); > > > mem_cgroup_out_of_memory(memcg, > > > current->memcg_oom_gfp_mask, > > > -current->memcg_oom_order); > > > +current->memcg_oom_order, false); > > > } else { > > > schedule(); > > > mem_cgroup_unmark_under_oom(memcg); > > > @@ -6106,7 +6107,7 @@ static ssize_t memory_max_write(struct > > > kernfs_open_file *of, > > > } > > > > > > memcg_memory_event(memcg, MEMCG_OOM); > > > - if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0)) > > > + if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0, true)) > > > break; > > > } > > > > > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > > > index 463b3d74a64a..5ace39f6fe1e 100644 > > > --- a/mm/oom_kill.c > > > +++ b/mm/oom_kill.c > > > @@ -1098,7 +1098,7 @@ bool out_of_memory(struct oom_control *oc) > > > > > >
RE: Need help on "Self Detected Stall on CPU"
Thank you sir for your guidance and quick response. Let me introduce my colleagues Paul and Mikhail here (copied in CC). They would be taking actions based on your guidance in this email and may reach you with further queries. Appreciate your support and help. Thanks, Atul -Original Message- From: Paul E. McKenney Sent: 01 May 2020 00:47 To: Atul Kulkarni Cc: linux-kernel@vger.kernel.org Subject: Re: Need help on "Self Detected Stall on CPU" On Thu, Apr 30, 2020 at 06:47:20PM +, Atul Kulkarni wrote: > Dear Sir, > > Hope you are doing well. I have watched your various conference videos and > have read technical papers. > We are facing an issue with CPU stall on our systems and I felt like there is > no one better who can guide us on how we can deal with it. > > I have attached logs for your reference. Towards end I have run couple of > sysreq commands and have taken crash dump using sysreq which may help provide > additional information. > Could you please guide us on how we could fix this issue or identify what is > going wrong here? Let's focus on the first few lines of your console message: [20526.345089] INFO: rcu_preempt self-detected stall on CPU [20526.351110] 0-...: (1051 ticks this GP) idle=1fe/142/0 softirq=146268/146268 fqs=0 [20526.360163] (t=2101 jiffies g=96468 c=96467 q=2) [20526.365535] rcu_preempt kthread starved for 2101 jiffies! g96468 c96467 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0 The last line contains the hint, namely "rcu_preempt kthread starved for 2101 jiffies!" If you don't let RCU's kernel threads run, then RCU CPU stall warnings are expected behavior. The "RCU_GP_WAIT_FQS(3)" means that this kthread's last act was to sleep for three jiffies. As you can see from earlier in that same line, that was 2101 jiffies ago. The "->state=0x402" means that the scheduler believes that this kthread is blocked, that is not yet runnable. The usual way this sort of thing happens is a timer problem, be it a hardware configuration problem, a timer-driver bug, an interrupt-handling problem, and so on. This sort of problem is especially common when bringing up new hardware or when modifying timer code or when modifying code on the interrupt/exception paths. So the question to ask yourself is "Why is the timer wakeup not reaching this kthread?", with special attention to changed code and new hardware. Thanx, Paul
Re: [PATCH V1 05/10] arch/kmap_atomic: Consolidate duplicate code
On Thu, Apr 30, 2020 at 01:38:40PM -0700, ira.we...@intel.com wrote: > From: Ira Weiny > > Every arch has the same code to ensure atomic operations and a check for > !HIGHMEM page. > > Remove the duplicate code by defining a core kmap_atomic() which only > calls the arch specific kmap_atomic_high() when the page is high memory. Err AFAICS, you've just silently changed the semantics for kmap_atomic_prot() here. And while most of the callers are converted, drivers/gpu/drm/ttm/ttm_bo_util.c one is not, so at the very least it's a bisect hazard... And I would argue that having kmap_atomic() differ from kmap_atomic_prot() wrt disabling preempt is asking for trouble.
Re: linux-next: manual merge of the btrfs tree with the btrfs-fixes tree
On 2020/5/1 上午9:05, Stephen Rothwell wrote: > Hi all, > > On Fri, 1 May 2020 10:24:53 +1000 Stephen Rothwell > wrote: >> >> Today's linux-next merge of the btrfs tree got a conflict in: >> >> fs/btrfs/transaction.c >> >> between commit: >> >> fcc99734d1d4 ("btrfs: transaction: Avoid deadlock due to bad >> initialization timing of fs_info::journal_info") >> >> from the btrfs-fixes tree and commit: >> >> f12ca53a6fd6 ("btrfs: force chunk allocation if our global rsv is larger >> than metadata") >> >> from the btrfs tree. >> >> I fixed it up (see below) and can carry the fix as necessary. This >> is now fixed as far as linux-next is concerned, but any non trivial >> conflicts should be mentioned to your upstream maintainer when your tree >> is submitted for merging. You may also want to consider cooperating >> with the maintainer of the conflicting tree to minimise any particularly >> complex conflicts. >> >> -- >> Cheers, >> Stephen Rothwell >> >> diff --cc fs/btrfs/transaction.c >> index 2d5498136e5e,e4dbd8e3c641.. >> --- a/fs/btrfs/transaction.c >> +++ b/fs/btrfs/transaction.c >> @@@ -666,15 -674,17 +672,26 @@@ got_it >> current->journal_info = h; >> >> /* >> +* btrfs_record_root_in_trans() needs to alloc new extents, and may >> +* call btrfs_join_transaction() while we're also starting a >> +* transaction. >> +* >> +* Thus it need to be called after current->journal_info initialized, >> +* or we can deadlock. >> +*/ >> + btrfs_record_root_in_trans(h, root); >> + >> + * If the space_info is marked ALLOC_FORCE then we'll get upgraded to >> + * ALLOC_FORCE the first run through, and then we won't allocate for >> + * anybody else who races in later. We don't care about the return >> + * value here. >> + */ >> +if (do_chunk_alloc && num_bytes) { >> +u64 flags = h->block_rsv->space_info->flags; >> +btrfs_chunk_alloc(h, btrfs_get_alloc_profile(fs_info, flags), >> + CHUNK_ALLOC_NO_FORCE); >> +} >> + >> return h; The proper fix has landed in David's misc-next branch, which puts btrfs_record_root_in_trans(); after the if () {} code block. By that, btrfs_record_root_in_trans() has lesser chance to hit ENOSPC. Thanks, Qu >> >> join_fail: > > > I fixed the missing comment start in my resolution ... > signature.asc Description: OpenPGP digital signature
Re: [PATCH] memcg: oom: ignore oom warnings from memory.max
On Thu, Apr 30, 2020 at 6:39 PM Yafang Shao wrote: > > On Fri, May 1, 2020 at 2:27 AM Shakeel Butt wrote: > > > > Lowering memory.max can trigger an oom-kill if the reclaim does not > > succeed. However if oom-killer does not find a process for killing, it > > dumps a lot of warnings. > > > > I have been confused by this behavior for several months and I think > it will confuse more memcg users. > We should keep the memcg oom behavior consistent with system oom - no > oom kill if no process. > > What about bellow change ? > Seems fine to me. If others are ok with this, please do send a signed-off patch. > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index e28098e13f1c..25fbc37a747f 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -6086,6 +6086,9 @@ static ssize_t memory_max_write(struct > kernfs_open_file *of, > continue; > } > > + if (!cgroup_is_populated(memcg->css.cgroup)) > + break; > + > memcg_memory_event(memcg, MEMCG_OOM); > if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0)) > break; > > > Deleting a memcg does not reclaim memory from it and the memory can > > linger till there is a memory pressure. One normal way to proactively > > reclaim such memory is to set memory.max to 0 just before deleting the > > memcg. However if some of the memcg's memory is pinned by others, this > > operation can trigger an oom-kill without any process and thus can log a > > lot un-needed warnings. So, ignore all such warnings from memory.max. > > > > Signed-off-by: Shakeel Butt > > --- > > include/linux/oom.h | 3 +++ > > mm/memcontrol.c | 9 + > > mm/oom_kill.c | 2 +- > > 3 files changed, 9 insertions(+), 5 deletions(-) > > > > diff --git a/include/linux/oom.h b/include/linux/oom.h > > index c696c265f019..6345dc55df64 100644 > > --- a/include/linux/oom.h > > +++ b/include/linux/oom.h > > @@ -52,6 +52,9 @@ struct oom_control { > > > > /* Used to print the constraint info. */ > > enum oom_constraint constraint; > > + > > + /* Do not warn even if there is no process to be killed. */ > > + bool no_warn; > > }; > > > > extern struct mutex oom_lock; > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index 317dbbaac603..a1f00d9b9bb0 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -1571,7 +1571,7 @@ unsigned long mem_cgroup_size(struct mem_cgroup > > *memcg) > > } > > > > static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t > > gfp_mask, > > -int order) > > +int order, bool no_warn) > > { > > struct oom_control oc = { > > .zonelist = NULL, > > @@ -1579,6 +1579,7 @@ static bool mem_cgroup_out_of_memory(struct > > mem_cgroup *memcg, gfp_t gfp_mask, > > .memcg = memcg, > > .gfp_mask = gfp_mask, > > .order = order, > > + .no_warn = no_warn, > > }; > > bool ret; > > > > @@ -1821,7 +1822,7 @@ static enum oom_status mem_cgroup_oom(struct > > mem_cgroup *memcg, gfp_t mask, int > > mem_cgroup_oom_notify(memcg); > > > > mem_cgroup_unmark_under_oom(memcg); > > - if (mem_cgroup_out_of_memory(memcg, mask, order)) > > + if (mem_cgroup_out_of_memory(memcg, mask, order, false)) > > ret = OOM_SUCCESS; > > else > > ret = OOM_FAILED; > > @@ -1880,7 +1881,7 @@ bool mem_cgroup_oom_synchronize(bool handle) > > mem_cgroup_unmark_under_oom(memcg); > > finish_wait(_oom_waitq, ); > > mem_cgroup_out_of_memory(memcg, current->memcg_oom_gfp_mask, > > -current->memcg_oom_order); > > +current->memcg_oom_order, false); > > } else { > > schedule(); > > mem_cgroup_unmark_under_oom(memcg); > > @@ -6106,7 +6107,7 @@ static ssize_t memory_max_write(struct > > kernfs_open_file *of, > > } > > > > memcg_memory_event(memcg, MEMCG_OOM); > > - if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0)) > > + if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0, true)) > > break; > > } > > > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c > > index 463b3d74a64a..5ace39f6fe1e 100644 > > --- a/mm/oom_kill.c > > +++ b/mm/oom_kill.c > > @@ -1098,7 +1098,7 @@ bool out_of_memory(struct oom_control *oc) > > > > select_bad_process(oc); > > /* Found nothing?!?! */ > > - if (!oc->chosen) { > > + if (!oc->chosen && !oc->no_warn) { > > dump_header(oc, NULL); > > pr_warn("Out of memory and no killable processes...\n"); > > /* > > -- > > 2.26.2.526.g744177e7f7-goog > > > >
[PATCH bpf-next 2/2] bpf, arm: Optimize ALU ARSH K using asr immediate instruction
This patch adds an optimization that uses the asr immediate instruction for BPF_ALU BPF_ARSH BPF_K, rather than loading the immediate to a temporary register. This is similar to existing code for handling BPF_ALU BPF_{LSH,RSH} BPF_K. This optimization saves two instructions and is more consistent with LSH and RSH. Example of the code generated for BPF_ALU32_IMM(BPF_ARSH, BPF_REG_0, 5) before the optimization: 2c: movr8, #5 30: movr9, #0 34: asrr0, r0, r8 and after optimization: 2c: asrr0, r0, #5 Tested on QEMU using lib/test_bpf and test_verifier. Co-developed-by: Xi Wang Signed-off-by: Xi Wang Signed-off-by: Luke Nelson --- arch/arm/net/bpf_jit_32.c | 10 +++--- arch/arm/net/bpf_jit_32.h | 3 +++ 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c index 48b89211ee5c..0207b6ea6e8a 100644 --- a/arch/arm/net/bpf_jit_32.c +++ b/arch/arm/net/bpf_jit_32.c @@ -795,6 +795,9 @@ static inline void emit_a32_alu_i(const s8 dst, const u32 val, case BPF_RSH: emit(ARM_LSR_I(rd, rd, val), ctx); break; + case BPF_ARSH: + emit(ARM_ASR_I(rd, rd, val), ctx); + break; case BPF_NEG: emit(ARM_RSB_I(rd, rd, val), ctx); break; @@ -1408,7 +1411,6 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx) case BPF_ALU | BPF_MUL | BPF_X: case BPF_ALU | BPF_LSH | BPF_X: case BPF_ALU | BPF_RSH | BPF_X: - case BPF_ALU | BPF_ARSH | BPF_K: case BPF_ALU | BPF_ARSH | BPF_X: case BPF_ALU64 | BPF_ADD | BPF_K: case BPF_ALU64 | BPF_ADD | BPF_X: @@ -1465,10 +1467,12 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx) case BPF_ALU64 | BPF_MOD | BPF_K: case BPF_ALU64 | BPF_MOD | BPF_X: goto notyet; - /* dst = dst >> imm */ /* dst = dst << imm */ - case BPF_ALU | BPF_RSH | BPF_K: + /* dst = dst >> imm */ + /* dst = dst >> imm (signed) */ case BPF_ALU | BPF_LSH | BPF_K: + case BPF_ALU | BPF_RSH | BPF_K: + case BPF_ALU | BPF_ARSH | BPF_K: if (unlikely(imm > 31)) return -EINVAL; if (imm) diff --git a/arch/arm/net/bpf_jit_32.h b/arch/arm/net/bpf_jit_32.h index fb67cbc589e0..e0b593a1498d 100644 --- a/arch/arm/net/bpf_jit_32.h +++ b/arch/arm/net/bpf_jit_32.h @@ -94,6 +94,9 @@ #define ARM_INST_LSR_I 0x01a00020 #define ARM_INST_LSR_R 0x01a00030 +#define ARM_INST_ASR_I 0x01a00040 +#define ARM_INST_ASR_R 0x01a00050 + #define ARM_INST_MOV_R 0x01a0 #define ARM_INST_MOVS_R0x01b0 #define ARM_INST_MOV_I 0x03a0 -- 2.17.1
[PATCH bpf-next 1/2] bpf, arm: Optimize ALU64 ARSH X using orrpl conditional instruction
This patch optimizes the code generated by emit_a32_arsh_r64, which handles the BPF_ALU64 BPF_ARSH BPF_X instruction. The original code uses a conditional B followed by an unconditional ORR. The optimization saves one instruction by removing the B instruction and using a conditional ORR (with an inverted condition). Example of the code generated for BPF_ALU64_REG(BPF_ARSH, BPF_REG_0, BPF_REG_1), before optimization: 34: rsbip, r2, #32 38: subs r9, r2, #32 3c: lsrlr, r0, r2 40: orrlr, lr, r1, lsl ip 44: bmi0x4c 48: orrlr, lr, r1, asr r9 4c: asrip, r1, r2 50: movr0, lr 54: movr1, ip and after optimization: 34: rsbip, r2, #32 38: subs r9, r2, #32 3c: lsrlr, r0, r2 40: orrlr, lr, r1, lsl ip 44: orrpl lr, lr, r1, asr r9 48: asrip, r1, r2 4c: movr0, lr 50: movr1, ip Tested on QEMU using lib/test_bpf and test_verifier. Co-developed-by: Xi Wang Signed-off-by: Xi Wang Signed-off-by: Luke Nelson --- arch/arm/net/bpf_jit_32.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c index bf85d6db4931..48b89211ee5c 100644 --- a/arch/arm/net/bpf_jit_32.c +++ b/arch/arm/net/bpf_jit_32.c @@ -860,8 +860,8 @@ static inline void emit_a32_arsh_r64(const s8 dst[], const s8 src[], emit(ARM_SUBS_I(tmp2[0], rt, 32), ctx); emit(ARM_MOV_SR(ARM_LR, rd[1], SRTYPE_LSR, rt), ctx); emit(ARM_ORR_SR(ARM_LR, ARM_LR, rd[0], SRTYPE_ASL, ARM_IP), ctx); - _emit(ARM_COND_MI, ARM_B(0), ctx); - emit(ARM_ORR_SR(ARM_LR, ARM_LR, rd[0], SRTYPE_ASR, tmp2[0]), ctx); + _emit(ARM_COND_PL, + ARM_ORR_SR(ARM_LR, ARM_LR, rd[0], SRTYPE_ASR, tmp2[0]), ctx); emit(ARM_MOV_SR(ARM_IP, rd[0], SRTYPE_ASR, rt), ctx); arm_bpf_put_reg32(dst_lo, ARM_LR, ctx); -- 2.17.1
[PATCH bpf 0/2] bpf, arm: Small JIT optimizations
As Daniel suggested to us, we ran our formal verification tool, Serval, over the arm JIT. The bugs we found have been patched and applied to the bpf tree [1, 2]. This patch series introduces two small optimizations that simplify the JIT and use fewer instructions. [1] https://lore.kernel.org/bpf/20200408181229.10909-1-luke.r.n...@gmail.com/ [2] https://lore.kernel.org/bpf/20200409221752.28448-1-luke.r.n...@gmail.com/ Luke Nelson (2): bpf, arm: Optimize emit_a32_arsh_r64 using conditional instruction bpf, arm: Optimize ALU ARSH K using asr immediate instruction arch/arm/net/bpf_jit_32.c | 14 +- arch/arm/net/bpf_jit_32.h | 2 ++ 2 files changed, 11 insertions(+), 5 deletions(-) -- 2.17.1
Re: [PATCHv2] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
On Tue, 28 Apr 2020 23:36:27 +0200 Jiri Olsa wrote: > On Fri, Apr 17, 2020 at 04:38:10PM +0900, Masami Hiramatsu wrote: > > SNIP > > > > > > > The code within the kretprobe handler checks for probe reentrancy, > > > so we won't trigger any _raw_spin_lock_irqsave probe in there. > > > > > > The problem is in outside kprobe_flush_task, where we call: > > > > > > kprobe_flush_task > > > kretprobe_table_lock > > > raw_spin_lock_irqsave > > > _raw_spin_lock_irqsave > > > > > > where _raw_spin_lock_irqsave triggers the kretprobe and installs > > > kretprobe_trampoline handler on _raw_spin_lock_irqsave return. > > > > > > The kretprobe_trampoline handler is then executed with already > > > locked kretprobe_table_locks, and first thing it does is to > > > lock kretprobe_table_locks ;-) the whole lockup path like: > > > > > > kprobe_flush_task > > > kretprobe_table_lock > > > raw_spin_lock_irqsave > > > _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline > > > installed > > > > > > ---> kretprobe_table_locks locked > > > > > > kretprobe_trampoline > > > trampoline_handler > > > kretprobe_hash_lock(current, , ); <--- deadlock > > > > > > Adding kprobe_busy_begin/end helpers that mark code with fake > > > probe installed to prevent triggering of another kprobe within > > > this code. > > > > > > Using these helpers in kprobe_flush_task, so the probe recursion > > > protection check is hit and the probe is never set to prevent > > > above lockup. > > > > > > > Thanks Jiri! > > > > Ingo, could you pick this up? > > Ingo, any chance you could take this one? Hi Ingo, Should I make a pull request for all kprobes related patches to you? Thank you, -- Masami Hiramatsu
Re: [PATCH] selftests/ftrace: treat module requirement unmet situation as unsupported
Hi, On Wed, 29 Apr 2020 17:50:44 +0800 Po-Hsu Lin wrote: > When the required module for the test does not exist, use > exit_unsupported instead of exit_unresolved to indicate this test is > not supported. Hmm, this doesn't mean "the function is not supported" but "we can not resolve this because of the environmental issue". For example, if you forgot to install the modules, but the function itself is enabled, that can not be tested, but the system supports that feature. > > By doing this we can make test behaviour in sync with the > irqsoff_tracer.tc test in preemptirq, which is also treating module > existence in this way. Moreover, the test won't exit with a non-zero > return value if the module does not exist. It is OK to return zero even if the result is "unresolved", but I don't want to change the result of each test cases, because it clarify that you must install modules correctly, instead of enabling the feature. And OK, I found irqsoff_tracer.tc IS incorrect. It should be fixed to return UNRESOLVED if there is no test module. If you still think UNRESOLVED is unneeded, please propose the patch which removes all UNRESOLVED related code. That can start another discussion. Thank you, > > Fixes: 646f01ccdd59 ("ftrace/selftest: Add tests to test > register_ftrace_direct()") > Fixes: 4d23e9b4fd2e ("selftests/ftrace: Add trace_printk sample module test") > Fixes: 7bc026d6c032 ("selftests/ftrace: Add function filter on module > testcase") > Fixes: af2a0750f374 ("selftests/ftrace: Improve kprobe on module testcase to > load/unload module") > Signed-off-by: Po-Hsu Lin > --- > tools/testing/selftests/ftrace/test.d/direct/ftrace-direct.tc | 2 +- > tools/testing/selftests/ftrace/test.d/direct/kprobe-direct.tc | 2 +- > tools/testing/selftests/ftrace/test.d/event/trace_printk.tc| 2 +- > tools/testing/selftests/ftrace/test.d/ftrace/func_mod_trace.tc | 2 +- > tools/testing/selftests/ftrace/test.d/kprobe/kprobe_module.tc | 2 +- > 5 files changed, 5 insertions(+), 5 deletions(-) > > diff --git a/tools/testing/selftests/ftrace/test.d/direct/ftrace-direct.tc > b/tools/testing/selftests/ftrace/test.d/direct/ftrace-direct.tc > index d75a869..3d6189e 100644 > --- a/tools/testing/selftests/ftrace/test.d/direct/ftrace-direct.tc > +++ b/tools/testing/selftests/ftrace/test.d/direct/ftrace-direct.tc > @@ -5,7 +5,7 @@ > rmmod ftrace-direct ||: > if ! modprobe ftrace-direct ; then >echo "No ftrace-direct sample module - please make > CONFIG_SAMPLE_FTRACE_DIRECT=m" > - exit_unresolved; > + exit_unsupported; > fi > > echo "Let the module run a little" > diff --git a/tools/testing/selftests/ftrace/test.d/direct/kprobe-direct.tc > b/tools/testing/selftests/ftrace/test.d/direct/kprobe-direct.tc > index 801ecb6..3d0e3ca 100644 > --- a/tools/testing/selftests/ftrace/test.d/direct/kprobe-direct.tc > +++ b/tools/testing/selftests/ftrace/test.d/direct/kprobe-direct.tc > @@ -5,7 +5,7 @@ > rmmod ftrace-direct ||: > if ! modprobe ftrace-direct ; then >echo "No ftrace-direct sample module - please build with > CONFIG_SAMPLE_FTRACE_DIRECT=m" > - exit_unresolved; > + exit_unsupported; > fi > > if [ ! -f kprobe_events ]; then > diff --git a/tools/testing/selftests/ftrace/test.d/event/trace_printk.tc > b/tools/testing/selftests/ftrace/test.d/event/trace_printk.tc > index b02550b..dd8b10d 100644 > --- a/tools/testing/selftests/ftrace/test.d/event/trace_printk.tc > +++ b/tools/testing/selftests/ftrace/test.d/event/trace_printk.tc > @@ -5,7 +5,7 @@ > rmmod trace-printk ||: > if ! modprobe trace-printk ; then >echo "No trace-printk sample module - please make > CONFIG_SAMPLE_TRACE_PRINTK=m" > - exit_unresolved; > + exit_unsupported; > fi > > echo "Waiting for irq work" > diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/func_mod_trace.tc > b/tools/testing/selftests/ftrace/test.d/ftrace/func_mod_trace.tc > index 1a4b4a4..26dc06a 100644 > --- a/tools/testing/selftests/ftrace/test.d/ftrace/func_mod_trace.tc > +++ b/tools/testing/selftests/ftrace/test.d/ftrace/func_mod_trace.tc > @@ -13,7 +13,7 @@ echo '*:mod:trace_printk' > set_ftrace_filter > if ! modprobe trace-printk ; then >echo "No trace-printk sample module - please make > CONFIG_SAMPLE_TRACE_PRINTK= > m" > - exit_unresolved; > + exit_unsupported; > fi > > : "Wildcard should be resolved after loading module" > diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_module.tc > b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_module.tc > index d861bd7..4e07c69 100644 > --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_module.tc > +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_module.tc > @@ -8,7 +8,7 @@ rmmod trace-printk ||: > if ! modprobe trace-printk ; then >echo "No trace-printk sample module - please make > CONFIG_SAMPLE_TRACE_PRINTK= > m" > - exit_unresolved; > + exit_unsupported; > fi > > MOD=trace_printk > -- > 2.7.4 > -- Masami Hiramatsu
[PATCH 2/3] mm/swapfile.c: __swap_entry_free() always free 1 entry
__swap_entry_free() always free 1 entry, let's remove the usage. Signed-off-by: Wei Yang --- mm/swapfile.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/mm/swapfile.c b/mm/swapfile.c index f73e0c11fab9..1a877d1d40e3 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1251,13 +1251,14 @@ struct swap_info_struct *get_swap_device(swp_entry_t entry) } static unsigned char __swap_entry_free(struct swap_info_struct *p, - swp_entry_t entry, unsigned char usage) + swp_entry_t entry) { struct swap_cluster_info *ci; unsigned long offset = swp_offset(entry); + unsigned char usage; ci = lock_cluster_or_swap_info(p, offset); - usage = __swap_entry_free_locked(p, offset, usage); + usage = __swap_entry_free_locked(p, offset, 1); unlock_cluster_or_swap_info(p, ci); if (!usage) free_swap_slot(entry); @@ -1292,7 +1293,7 @@ void swap_free(swp_entry_t entry) p = _swap_info_get(entry); if (p) - __swap_entry_free(p, entry, 1); + __swap_entry_free(p, entry); } /* @@ -1715,7 +1716,7 @@ int free_swap_and_cache(swp_entry_t entry) p = _swap_info_get(entry); if (p) { - count = __swap_entry_free(p, entry, 1); + count = __swap_entry_free(p, entry); if (count == SWAP_HAS_CACHE && !swap_page_trans_huge_swapped(p, entry)) __try_to_reclaim_swap(p, swp_offset(entry), -- 2.23.0
[PATCH 3/3] mm/swapfile.c: count won't be bigger than SWAP_MAP_MAX
When the condition is true, there are two possibilities: 1. count == SWAP_MAP_BAD 2. count == (SWAP_MAP_MAX & COUNT_CONTINUED) == SWAP_MAP_SHMEM The first case would be filtered by the first if in __swap_duplicate(). And the second case means this swap entry is for shmem. Since we never do another duplication for shmem swap entry. This won't happen neither. Signed-off-by: Wei Yang --- mm/swapfile.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/mm/swapfile.c b/mm/swapfile.c index 1a877d1d40e3..88dd2ad34aad 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -3404,8 +3404,6 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage) if ((count & ~COUNT_CONTINUED) < SWAP_MAP_MAX) count += usage; - else if ((count & ~COUNT_CONTINUED) > SWAP_MAP_MAX) - err = -EINVAL; else if (swap_count_continued(p, offset, count)) count = COUNT_CONTINUED; else -- 2.23.0
[PATCH 1/3] mm/swapfile.c: classify SWAP_MAP_XXX to make it more readable
swap_info_struct->swap_map[] encodes some flag and count. And to do some condition check, it also introduces some special values. Currently those macros are defined with some magic order, which makes audience hard to understand the exact meaning. This patch split those macros into three categories: flag special value for first swap_map special value for continued swap_map May this help audiences a little. Signed-off-by: Wei Yang --- include/linux/swap.h | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index 1e99f7ac1d7e..ec507c67529b 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -183,12 +183,17 @@ enum { #define SWAP_CLUSTER_MAX 32UL #define COMPACT_CLUSTER_MAX SWAP_CLUSTER_MAX -#define SWAP_MAP_MAX 0x3e/* Max duplication count, in first swap_map */ -#define SWAP_MAP_BAD 0x3f/* Note pageblock is bad, in first swap_map */ +/* Bit flag in swap_map */ #define SWAP_HAS_CACHE 0x40/* Flag page is cached, in first swap_map */ -#define SWAP_CONT_MAX 0x7f/* Max count, in each swap_map continuation */ -#define COUNT_CONTINUED0x80/* See swap_map continuation for full count */ -#define SWAP_MAP_SHMEM 0xbf/* Owned by shmem/tmpfs, in first swap_map */ +#define COUNT_CONTINUED0x80/* Flag swap_map continuation for full count */ + +/* Special Value in first swap_map */ +#define SWAP_MAP_MAX 0x3e/* Max count */ +#define SWAP_MAP_BAD 0x3f/* Note page is bad */ +#define SWAP_MAP_SHMEM 0xbf/* Owned by shmem/tmpfs */ + +/* Special Value in each swap_map continuation */ +#define SWAP_CONT_MAX 0x7f/* Max count */ /* * We use this to track usage of a cluster. A cluster is a block of swap disk -- 2.23.0
Re: [PATCH v3 1/1] scsi: pm: Balance pm_only counter of request queue during system resume
On 2020-04-30 18:42, Can Guo wrote: > On 2020-05-01 04:32, Bart Van Assche wrote: > > Has it been considered to test directly whether a SCSI device has been > > runtime suspended instead of relying on blk_queue_pm_only()? How about > > using pm_runtime_status_suspended() or adding a function in > > block/blk-pm.h that checks whether q->rpm_status == RPM_SUSPENDED? > > Yes, I used to make the patch like that way, and it also worked well, as > both ways are equal actually. I kinda like the current code because we > should be confident that after scsi_dev_type_resume() returns, pm_only > must be 0. Different reviewers may have different opinions, either way > works well anyways. Hi Can, Please note that this is not a matter of personal preferences of a reviewer but a matter of correctness. blk_queue_pm_only() does not only return a value > 0 if a SCSI device has been runtime suspended but also returns true if scsi_device_quiesce() was called for another reason. Hence my request to test the "runtime suspended" status directly and not to rely on blk_queue_pm_only(). Thanks, Bart.
Re: [RFC PATCH V2 1/9] include/linux/pagemap.h: introduce attach/clear_page_private
On Fri, May 01, 2020 at 02:42:29AM +0100, Al Viro wrote: > On Thu, Apr 30, 2020 at 03:13:38PM -0700, Matthew Wilcox wrote: > > > > +/** > > > + * clear_page_private - clear page's private field and PG_private. > > > + * @page: page to be cleared. > > > + * > > > + * The counterpart function of attach_page_private. > > > + * Return: private data of page or NULL if page doesn't have private > > > data. > > > + */ > > > > Seems to me that the opposite of "attach" is "detach", not "clear". > > Or "remove", perhaps... Actually, "detach" is better - neither "clear" nor "remove" imply "... and give me what used to be attached there", as this thing is doing.
Re: [RFC PATCH v2 1/3] printk: Add function to set console to preferred console's driver
On (20/04/30 19:14), Alper Nebi Yasak wrote: > Currently, add_preferred_console sets a preferred console, but doesn't > actually change /dev/console to match it. That part is handled within > register_device, where a newly registered console driver will be set as > /dev/console if it matches the preferred console. > > However, if the relevant driver is already registered, the only way to > set it as /dev/console is by un-registering and re-registering it. Hmm. Preferred console selection is very fragile, there are too many setups and workarounds that even minor tweaks introduce regressions oftentimes. We have, by the way, a pending patchset which changes the same are - preferred console selection. git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk.git for-5.7-preferred-console [..] > An example is the xenfb_make_preferred_console() function: > > console_lock(); > for_each_console(c) { > if (!strcmp(c->name, "tty") && c->index == 0) > break; > } > console_unlock(); > if (c) { > unregister_console(c); > c->flags |= CON_CONSDEV; > c->flags &= ~CON_PRINTBUFFER; /* don't print again */ > register_console(c); > } I didn't know about this code. > The code above was introduced in commit 9e124fe16ff2 ("xen: Enable > console tty by default in domU if it's not a dummy"). In short, it's aim > is to set VT as the preferred console only after a working framebuffer > is registered and thus VT is not the dummy device. > > This patch introduces an update_console_to_preferred function that > handles the necessary /dev/console change. With this change, the example > above can be replaced with: > > console_lock(); > add_preferred_console("tty", 0, NULL); > update_console_to_preferred(); > console_unlock(); > > More importantly, these two calls can be moved to vt.c in order to bump > its priority when a non-dummy backend for it is introduced, solving that > problem in general. Let me take a look over the weekend. -ss
Re: [RFC PATCH V2 1/9] include/linux/pagemap.h: introduce attach/clear_page_private
On Thu, Apr 30, 2020 at 03:13:38PM -0700, Matthew Wilcox wrote: > > +/** > > + * clear_page_private - clear page's private field and PG_private. > > + * @page: page to be cleared. > > + * > > + * The counterpart function of attach_page_private. > > + * Return: private data of page or NULL if page doesn't have private data. > > + */ > > Seems to me that the opposite of "attach" is "detach", not "clear". Or "remove", perhaps...
Re: [PATCH v3 1/1] scsi: pm: Balance pm_only counter of request queue during system resume
On 2020-05-01 04:32, Bart Van Assche wrote: On 2020-04-29 22:40, Can Guo wrote: On 2020-04-30 13:08, Bart Van Assche wrote: On 2020-04-29 21:10, Can Guo wrote: During system resume, scsi_resume_device() decreases a request queue's pm_only counter if the scsi device was quiesced before. But after that, if the scsi device's RPM status is RPM_SUSPENDED, the pm_only counter is still held (non-zero). Current scsi resume hook only sets the RPM status of the scsi device and its request queue to RPM_ACTIVE, but leaves the pm_only counter unchanged. This may make the request queue's pm_only counter remain non-zero after resume hook returns, hence those who are waiting on the mq_freeze_wq would never be woken up. Fix this by calling blk_post_runtime_resume() if pm_only is non-zero to balance the pm_only counter which is held by the scsi device's RPM ops. How was this issue discovered? How has this patch been tested? As the issue was found after system resumes, so the issue was discovered during system suspend/resume test, and it is very easy to be replicated. After system resumes, if this issue hits some scsi devices, all bios sent to their request queues are blocked, which may cause a system hang if the scsi devices are vital to system functionality. To make sure the patch work well, we have tested system suspend/resume and made sure no system hang happen due to request queues got blocked by imbalanced pm_only counter. Thanks, that's very interesting information. My concern with this patch is that the power management code is not the only caller of blk_set_pm_only() / blk_clear_pm_only(). E.g. the SCSI SPI code also calls scsi_device_quiesce() and scsi_device_resume(). These last functions call blk_set_pm_only() and blk_clear_pm_only(). More calls of scsi_device_quiesce() and scsi_device_resume() might be added in the future. Has it been considered to test directly whether a SCSI device has been runtime suspended instead of relying on blk_queue_pm_only()? How about using pm_runtime_status_suspended() or adding a function in block/blk-pm.h that checks whether q->rpm_status == RPM_SUSPENDED? Thanks, Bart. Hi Bart, Slightly revised my previous mail. Please let me address your concern. First of all, it is allowed to call scsi_device_quiesce() multiple times, but one sdev's request queue's pm_only counter can only be increased once by scsi_device_quiesce(), because if a sdev has already been quiesced, in scsi_device_quiesce(), scsi_device_set_state(sdev, SDEV_QUIESCE) would return -ENIVAL (illegal state transform), then blk_clear_pm_only() shall be called to decrease pm_only once, so no matter how many times scsi_device_quiesce() is called, it can only increase pm_only once. scsi_device_resume() is same, it calls blk_clear_pm_only only once and only if the sdev was quiesced(). So, in a word, after scsi_device_resume() returns in scsi_dev_type_resume(), if a sdev has block layer runtime PM enabled (sdev->request_queue->dev is not NULL), its queue's pm_only counter should be 1 (if the sdev's runtime power status is RPM_SUSPENDED) or 0 (if the sdev's runtime power status is RPM_ACTIVE). If a sdev has block layer runtime PM disabled (sdev->request_queue->dev is NULL), its queue's pm_only counter should be 0. Has it been considered to test directly whether a SCSI device has been runtime suspended instead of relying on blk_queue_pm_only()? How about using pm_runtime_status_suspended() or adding a function in block/blk-pm.h that checks whether q->rpm_status == RPM_SUSPENDED? Yes, I used to make the patch like that way, and it also worked well, as both ways are equal actually. I kinda like the current code because we should be confident that after scsi_dev_type_resume() returns, pm_only must be 0. Different reviewers may have different opinions, either way works well anyways. Thanks, Can Guo.
Re: [PATCH -next] tty: serial: bcm63xx: fix missing clk_put() in bcm63xx_uart
On 4/28/2020 4:18 AM, Greg KH wrote: > On Mon, Apr 27, 2020 at 10:29:58AM -0700, Florian Fainelli wrote: >> >> >> On 4/26/2020 11:19 PM, Jiri Slaby wrote: >>> On 21. 04. 20, 14:31, Zou Wei wrote: This patch fixes below error reported by coccicheck drivers/tty/serial/bcm63xx_uart.c:848:2-8: ERROR: missing clk_put; clk_get on line 842 and execution via conditional on line 846 Fixes: ab4382d27412 ("tty: move drivers/serial/ to drivers/tty/serial/") Reported-by: Hulk Robot Signed-off-by: Zou Wei --- drivers/tty/serial/bcm63xx_uart.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/tty/serial/bcm63xx_uart.c b/drivers/tty/serial/bcm63xx_uart.c index 5674da2..ed0aa5c 100644 --- a/drivers/tty/serial/bcm63xx_uart.c +++ b/drivers/tty/serial/bcm63xx_uart.c @@ -843,8 +843,10 @@ static int bcm_uart_probe(struct platform_device *pdev) if (IS_ERR(clk) && pdev->dev.of_node) clk = of_clk_get(pdev->dev.of_node, 0); - if (IS_ERR(clk)) + if (IS_ERR(clk)) { + clk_put(clk); >>> >>> Why would you want to put an erroneous clk? >> >> Doh, somehow I completely missed, you are right this does not look legit. > > Ugh, can you send a revert for this please? Yes, now done: https://lore.kernel.org/linux-arm-kernel/20200501013904.1394-1-f.faine...@gmail.com/ -- Florian