Re: [PATCH v5 0/5] Track and expose idle PURR and SPURR ticks

2020-04-30 Thread Gautham R Shenoy
On Thu, Apr 30, 2020 at 09:46:13AM +0530, Gautham R Shenoy wrote:
> Hello Michael,
> > >
> > > Michael, could you please consider this for 5.8 ?
> > 
> > Yes. Has it been tested on KVM at all?
> 
> No. I haven't tested this on KVM. Will do that today.


The results on Shared LPAR and KVM are as follows:
---

The lparstat results on a Shared LPAR are consistent with that
observed on a dedicated LPAR when at least one of the threads of the
core is active. When all the threads are idle, the lparstat shows
incorrect idle percentage. But this is perhaps due to the fact that
the Hypervisor puts a completely idle core in some power-saving state
with runlatch turned off due to which PURR counts on the threads of a
core do not add up to the elapsed timebase ticks. The results are in
section A) below.

lparstat is not supported on KVM. However, I performed some basic
sanity checks on purr, spurr, idle_purr, and idle_spurr sysfs files
that show up after this patch series. When CPUs are offlined, the
idle_purr and idle_spurr sysfs files no longer show up, just like purr
and spurr sysfs files. The values of the counters monotonically
increase, except when the CPU is busy, in which case the idle_purr and
idle_spurr counts are stagnant as expected.

However, I don't think the even the values of PURR or SPURR make much
sense on KVM guest, since the Linux Hypervisor doesn't set additional
registers such as RWMR, except on POWER8, where the KVM sets RWMR
corresponding to the number of online threads in a vCORE before
dispatching the vcore. I haven't been able to test it on a POWER8
guest yet. The results on POWER9 are in section B) below.


A ) Shared LPAR
==

1. When all the threads of the core are running a CPU-Hog

# ./lparstat -E 1 5
System Configuration
type=Shared mode=Capped smt=8 lcpu=6 mem=10362752 kB cpus=10 ent=6.00 
---Actual--- -Normalized-
%busy  %idle   Frequency %busy  %idle
-- --  - -- --
100.00   0.00  2.90GHz[126%] 126.00   0.00
100.00   0.00  2.90GHz[126%] 126.00   0.00
100.00   0.00  2.90GHz[126%] 126.00   0.00
100.00   0.00  2.90GHz[126%] 126.00   0.00
100.01   0.00  2.90GHz[126%] 126.01   0.00

2. When 4 threads of a core are running CPU Hogs, with the remaining 4
threads idle.

# ./lparstat -E 1 5
System Configuration
type=Shared mode=Capped smt=8 lcpu=6 mem=10362752 kB cpus=10 ent=6.00 
---Actual--- -Normalized-
%busy  %idle   Frequency %busy  %idle
-- --  - -- --
 81.06  18.94  2.97GHz[129%] 104.56  24.44
 81.05  18.95  2.97GHz[129%] 104.56  24.44
 81.06  18.95  2.97GHz[129%] 104.56  24.44
 81.06  18.95  2.97GHz[129%] 104.56  24.44
 81.05  18.95  2.97GHz[129%] 104.56  24.45

3. When 2 threads of a core are running CPU Hogs, with the other 6
threads idle.

# ./lparstat -E 1 5
System Configuration
type=Shared mode=Capped smt=8 lcpu=6 mem=10362752 kB cpus=10 ent=6.00 
---Actual--- -Normalized-
%busy  %idle   Frequency %busy  %idle
-- --  - -- --
 65.21  34.79  3.13GHz[136%]  88.69  47.31
 65.20  34.81  3.13GHz[136%]  88.67  47.33
 64.25  35.76  3.13GHz[136%]  87.38  48.63
 63.68  36.31  3.13GHz[136%]  86.60  49.39
 63.55  36.45  3.13GHz[136%]  86.42  49.58
 

4. When a single thread of the core is running CPU Hog, remaining 7
threads are idle.
# ./lparstat -E 1 5
System Configuration
type=Shared mode=Capped smt=8 lcpu=6 mem=10362752 kB cpus=10 ent=6.00 
---Actual--- -Normalized-
%busy  %idle   Frequency %busy  %idle
-- --  - -- --
 31.80  68.20  3.20GHz[139%]  44.20  94.80
 31.80  68.20  3.20GHz[139%]  44.20  94.81
 31.80  68.20  3.20GHz[139%]  44.20  94.80
 31.80  68.21  3.20GHz[139%]  44.20  94.81
 31.79  68.21  3.20GHz[139%]  44.19  94.81

5. When the LPAR is idle:

# ./lparstat -E 1 5
System Configuration
type=Shared mode=Capped smt=8 lcpu=6 mem=10362752 kB cpus=10 ent=6.00 
---Actual--- -Normalized-
%busy  %idle   Frequency %busy  %idle
-- --  - -- --
  0.04   0.14  2.41GHz[105%]   0.04   0.15
  0.04   0.15  2.36GHz[102%]   0.04   0.15
  0.03   0.13  2.35GHz[102%]   0.03   0.14
  0.03   0.13  2.31GHz[100%]   0.03   0.13
  0.03   0.13  2.32GHz[101%]   0.03   0.14

In this case, the sum of the PURR values do not add up to the elapsed
TB. This is probably due to the Hypervisor putting the core into some
power-saving state with the runlatch turned off.

# ./purr_tb -t 8
Got threads_per_core = 8
CORE 0: 
CPU 0 : Delta PURR : 85744 
CPU 1 : Delta PURR : 113632 
CPU 2 : Delta PURR : 78224 
CPU 3 : Delta PURR : 68856 
CPU 4 : Delta PURR : 78064 
CPU 5 : Delta PURR : 60488 
CPU 6 : Delta PURR : 6 
CPU 7 : Delta PURR : 59464 
Total Delta PURR : 622248 (Expected ~513156096)


Re: [PATCH v2 0/5] Fix ELF / FDPIC ELF core dumping, and use mmap_sem properly in there

2020-04-30 Thread Greg Ungerer



On 1/5/20 5:07 am, Eric W. Biederman wrote:

Linus Torvalds  writes:


On Thu, Apr 30, 2020 at 7:10 AM Greg Ungerer  wrote:



Most of that file goes back to pre-git days. And most of the commits
since are not so much about binfmt_flat, as they are about cleanups or
changes elsewhere where binfmt_flat was just a victim.


I'll have a look at this.


Thanks.


Quick hack test shows moving setup_new_exec(bprm) to be just before
install_exec_creds(bprm) works fine for the static binaries case.
Doing the flush_old_exec(bprm) there too crashed out - I'll need to
dig into that to see why.


Just moving setup_new_exec() would at least allow us to then join the
two together, and just say "setup_new_exec() does the credential
installation too".


But it is only half a help if we allow failure points between
flush_old_exec and install_exec_creds.

Greg do things work acceptably if install_exec_creds is moved to right
after setup_new_exec? (patch below)


Yes, confirmed. Worked fine with that patch applied.



Looking at the code in load_flat_file after setup_new_exec it looks like
the kinds of things that in binfmt_elf.c we do after install_exec_creds
(aka vm_map).  So I think we want install_exec_creds sooner, instead
of setup_new_exec later.


But if it's true that nobody really uses the odd flat library support
any more and there are no testers, maybe we should consider ripping it
out...


I looked a little deeper and there is another reason to think about
ripping out the flat library loader.  The code is recursive, and
supports a maximum of 4 shared libraries in the entire system.

load_flat_binary
load_flat_file
calc_reloc
load_flat_shared_libary
load_flat_file


I am mystified with what kind of system can survive with a grand total
of 4 shared libaries.  I think my a.out slackware system that I ran on
my i486 had more shared libraries.


The kind of embedded systems that were built with this stuff 20 years
ago didn't have lots of applications and libraries. I think we found
back then that most of your savings were from making libc shared.
Less significant gains from making other libraries shared. And there
was a bit of extra pain in setting them up with the shared library
code generation options (that had to be unique for each one).

The whole mechanism is a bit of hack, and there was a few other
limitations with the way it worked (I don't recall what they were
right now).

I am definitely in favor of removing it.

Regards
Greg




Having read just a bit more it is definitely guaranteed (by the code)
that the first time load_flat_file is called id 0 will be used (aka id 0
is guaranteed to be the binary), and the ids 1, 2, 3 and 4 will only be
used if a relocation includes that id to reference an external shared
library.  That part of the code is drop dead simple.

---

This is what I was thinking about applying.

diff --git a/fs/binfmt_flat.c b/fs/binfmt_flat.c
index 831a2b25ba79..1a1d1fcb893f 100644
--- a/fs/binfmt_flat.c
+++ b/fs/binfmt_flat.c
@@ -541,6 +541,7 @@ static int load_flat_file(struct linux_binprm *bprm,
/* OK, This is the point of no return */
set_personality(PER_LINUX_32BIT);
setup_new_exec(bprm);
+   install_exec_creds(bprm);
}
  
  	/*

@@ -963,8 +964,6 @@ static int load_flat_binary(struct linux_binprm *bprm)
}
}
  
-	install_exec_creds(bprm);

-
set_binfmt(_format);
  
  #ifdef CONFIG_MMU





Re: [PATCH] um: do not evaluate compiler's library path when cleaning

2020-04-30 Thread Masahiro Yamada
On Sat, Apr 18, 2020 at 3:06 AM Masahiro Yamada  wrote:
>
> Since commit a83e4ca26af8 ("kbuild: remove cc-option switch from
> -Wframe-larger-than="), 'make ARCH=um clean' emits an error message
> as follows:
>
>   $ make ARCH=um clean
>   gcc: error: missing argument to '-Wframe-larger-than='
>
> We do not care compiler flags when cleaning.
>
> Use the '=' operator for lazy expansion because we do not use
> LDFLAGS_pcap.o or LDFLAGS_vde.o when cleaning.
>
> While I was here, I removed the redundant -r option because it
> already exists in the recipe.
>
> Fixes: a83e4ca26af8 ("kbuild: remove cc-option switch from 
> -Wframe-larger-than=")
> Signed-off-by: Masahiro Yamada 
> ---
>
>  arch/um/drivers/Makefile | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/um/drivers/Makefile b/arch/um/drivers/Makefile
> index a290821e355c..2a249f619467 100644
> --- a/arch/um/drivers/Makefile
> +++ b/arch/um/drivers/Makefile
> @@ -18,9 +18,9 @@ ubd-objs := ubd_kern.o ubd_user.o
>  port-objs := port_kern.o port_user.o
>  harddog-objs := harddog_kern.o harddog_user.o
>
> -LDFLAGS_pcap.o := -r $(shell $(CC) $(KBUILD_CFLAGS) 
> -print-file-name=libpcap.a)
> +LDFLAGS_pcap.o = $(shell $(CC) $(KBUILD_CFLAGS) -print-file-name=libpcap.a)
>
> -LDFLAGS_vde.o := -r $(shell $(CC) $(CFLAGS) -print-file-name=libvdeplug.a)
> +LDFLAGS_vde.o = $(shell $(CC) $(CFLAGS) -print-file-name=libvdeplug.a)
>
>  targets := pcap_kern.o pcap_user.o vde_kern.o vde_user.o
>
> --
> 2.25.1
>


Applied to linux-kbuild.


-- 
Best Regards
Masahiro Yamada


Re: [PATCH] kbuild: remove unused AS assignment

2020-04-30 Thread Masahiro Yamada
On Mon, Apr 27, 2020 at 5:06 PM Will Deacon  wrote:
>
> On Mon, Apr 27, 2020 at 09:30:19AM +0900, Masahiro Yamada wrote:
> > $(AS) is not used anywhere, hence commit aa824e0c962b ("kbuild: remove
> > AS variable") killed it.
> >
> > Remove the left-over code in arch/{arm,arm64}/Makefile.
> >
> > Signed-off-by: Masahiro Yamada 
> > ---
> >
> >  arch/arm/Makefile   | 2 --
> >  arch/arm64/Makefile | 2 --
> >  2 files changed, 4 deletions(-)
>
> Acked-by: Will Deacon 
>
> (If you post the arm64 part as a separate patch, I can pick it up)


This is just a trivial cleanup patch.
So, I do not split it per-arch.


Applied to linux-kbuild.


-- 
Best Regards
Masahiro Yamada


Re: [PATCH] kbuild: remove '/' target

2020-04-30 Thread Masahiro Yamada
On Sat, Apr 25, 2020 at 10:18 PM Masahiro Yamada  wrote:
>
> This notice has been here for a while. Remove it entirely now.
>
> Signed-off-by: Masahiro Yamada 
> ---
>



Applied to linux-kbuild.


>  Makefile | 4 
>  1 file changed, 4 deletions(-)
>
> diff --git a/Makefile b/Makefile
> index 49b2709ff44e..95c2d8c2dfe8 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1650,10 +1650,6 @@ _emodinst_post: _emodinst_
>  clean-dirs := $(KBUILD_EXTMOD)
>  clean: rm-files := $(KBUILD_EXTMOD)/Module.symvers 
> $(KBUILD_EXTMOD)/modules.nsdeps
>
> -PHONY += /
> -/:
> -   @echo >&2 '"$(MAKE) /" is no longer supported. Please use "$(MAKE) 
> ./" instead.'
> -
>  PHONY += help
>  help:
> @echo  '  Building external modules.'
> --
> 2.25.1
>


-- 
Best Regards
Masahiro Yamada


Re: [PATCH] kbuild: determine the output format of DTC by the target suffix

2020-04-30 Thread Masahiro Yamada
On Mon, Apr 27, 2020 at 11:13 PM Masahiro Yamada  wrote:
>
> cmd_dtc takes the additional parameter $(2) to select the target
> format, dtb or yaml. This makes things complicated when it is used
> with cmd_and_fixdep and if_changed_rule. I actually stumbled on this.
> See commit 3d4b2238684a ("kbuild: fix DT binding schema rule again to
> avoid needless rebuilds").
>
> Extract the suffix part of the target instead of passing the parameter.
> Fortunately, this works for both $(obj)/%.dtb and $(obj)/%.dt.yaml .
>
> Signed-off-by: Masahiro Yamada 



Applied to linux-kbuild.


> ---
>
>  scripts/Makefile.lib | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
> index 12f6a331a8f3..cd52a8c6428f 100644
> --- a/scripts/Makefile.lib
> +++ b/scripts/Makefile.lib
> @@ -287,13 +287,13 @@ $(obj)/%.dtb.S: $(obj)/%.dtb FORCE
>  quiet_cmd_dtc = DTC $@
>  cmd_dtc = mkdir -p $(dir ${dtc-tmp}) ; \
> $(HOSTCC) -E $(dtc_cpp_flags) -x assembler-with-cpp -o $(dtc-tmp) $< 
> ; \
> -   $(DTC) -O $(2) -o $@ -b 0 \
> +   $(DTC) -O $(patsubst .%,%,$(suffix $@)) -o $@ -b 0 \
> $(addprefix -i,$(dir $<) $(DTC_INCLUDE)) $(DTC_FLAGS) \
> -d $(depfile).dtc.tmp $(dtc-tmp) ; \
> cat $(depfile).pre.tmp $(depfile).dtc.tmp > $(depfile)
>
>  $(obj)/%.dtb: $(src)/%.dts $(DTC) FORCE
> -   $(call if_changed_dep,dtc,dtb)
> +   $(call if_changed_dep,dtc)
>
>  DT_CHECKER ?= dt-validate
>  DT_BINDING_DIR := Documentation/devicetree/bindings
> @@ -304,7 +304,7 @@ quiet_cmd_dtb_check =   CHECK   $@
>cmd_dtb_check =  $(DT_CHECKER) -u $(srctree)/$(DT_BINDING_DIR) -p 
> $(DT_TMP_SCHEMA) $@
>
>  define rule_dtc
> -   $(call cmd_and_fixdep,dtc,yaml)
> +   $(call cmd_and_fixdep,dtc)
> $(call cmd,dtb_check)
>  endef
>
> --
> 2.25.1
>


-- 
Best Regards
Masahiro Yamada


[PATCH v2 2/2] PCI: hv: Retry PCI bus D0 entry when the first attempt failed with invalid device state

2020-04-30 Thread Wei Hu
In the case of kdump, the PCI device was not cleanly shut down
before the kdump kernel starts. This causes the initial
attempt of entering D0 state in the kdump kernel to fail with
invalid device state returned from Hyper-V host.
When this happens, explicitly call PCI bus exit and retry to
enter the D0 state.

Signed-off-by: Wei Hu 
---
   v2: Incorporate review comments from Michael Kelley, Dexuan Cui and
   Bjorn Helgaas

 drivers/pci/controller/pci-hyperv.c | 40 +++--
 1 file changed, 38 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/controller/pci-hyperv.c 
b/drivers/pci/controller/pci-hyperv.c
index e6fac0187722..92092a47d3af 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -2739,6 +2739,8 @@ static void hv_free_config_window(struct hv_pcibus_device 
*hbus)
vmbus_free_mmio(hbus->mem_config->start, PCI_CONFIG_MMIO_LENGTH);
 }
 
+static int hv_pci_bus_exit(struct hv_device *hdev, bool keep_devs);
+
 /**
  * hv_pci_enter_d0() - Bring the "bus" into the D0 power state
  * @hdev:  VMBus's tracking struct for this root PCI bus
@@ -2751,8 +2753,10 @@ static int hv_pci_enter_d0(struct hv_device *hdev)
struct pci_bus_d0_entry *d0_entry;
struct hv_pci_compl comp_pkt;
struct pci_packet *pkt;
+   bool retry = true;
int ret;
 
+enter_d0_retry:
/*
 * Tell the host that the bus is ready to use, and moved into the
 * powered-on state.  This includes telling the host which region
@@ -2779,6 +2783,38 @@ static int hv_pci_enter_d0(struct hv_device *hdev)
if (ret)
goto exit;
 
+   /*
+* In certain case (Kdump) the pci device of interest was
+* not cleanly shut down and resource is still held on host
+* side, the host could return invalid device status.
+* We need to explicitly request host to release the resource
+* and try to enter D0 again.
+*/
+   if (comp_pkt.completion_status < 0 && retry) {
+   retry = false;
+
+   dev_err(>device, "Retrying D0 Entry\n");
+
+   /*
+* Hv_pci_bus_exit() calls hv_send_resource_released()
+* to free up resources of its child devices.
+* In the kdump kernel we need to set the
+* wslot_res_allocated to 255 so it scans all child
+* devices to release resources allocated in the
+* normal kernel before panic happened.
+*/
+   hbus->wslot_res_allocated = 255;
+
+   ret = hv_pci_bus_exit(hdev, true);
+
+   if (ret == 0) {
+   kfree(pkt);
+   goto enter_d0_retry;
+   }
+   dev_err(>device,
+   "Retrying D0 failed with ret %d\n", ret);
+   }
+
if (comp_pkt.completion_status < 0) {
dev_err(>device,
"PCI Pass-through VSP failed D0 Entry with status %x\n",
@@ -3185,7 +3221,7 @@ static int hv_pci_probe(struct hv_device *hdev,
return ret;
 }
 
-static int hv_pci_bus_exit(struct hv_device *hdev, bool hibernating)
+static int hv_pci_bus_exit(struct hv_device *hdev, bool keep_devs)
 {
struct hv_pcibus_device *hbus = hv_get_drvdata(hdev);
struct {
@@ -3203,7 +3239,7 @@ static int hv_pci_bus_exit(struct hv_device *hdev, bool 
hibernating)
if (hdev->channel->rescind)
return 0;
 
-   if (!hibernating) {
+   if (!keep_devs) {
/* Delete any children which might still exist. */
dr = kzalloc(sizeof(*dr), GFP_KERNEL);
if (dr && hv_pci_start_relations_work(hbus, dr))
-- 
2.20.1



Re: [PATCH] parisc: suppress error messages for 'make clean'

2020-04-30 Thread Masahiro Yamada
On Sat, Apr 25, 2020 at 2:47 PM Masahiro Yamada  wrote:
>
> 'make ARCH=parisc clean' emits a tons of error messages as follows:
>
>   $ make ARCH=parisc clean
>   gcc: error: unrecognized command line option '-mno-space-regs'
>   gcc: error: unrecognized command line option '-mfast-indirect-calls'; did 
> you mean '-mforce-indirect-call'?
>   gcc: error: unrecognized command line option '-mdisable-fpregs'
>   gcc: error: missing argument to '-Wframe-larger-than='
>   gcc: error: unrecognized command line option '-mno-space-regs'
>   gcc: error: unrecognized command line option '-mfast-indirect-calls'; did 
> you mean '-mforce-indirect-call'?
>   gcc: error: unrecognized command line option '-mdisable-fpregs'
>   gcc: error: missing argument to '-Wframe-larger-than='
> ...
>
> You can supporess them except '-Wframe-larger-than' by setting correct
> CROSS_COMPILE=, but we should not require any compiler for cleaning.
>
> This $(shell ...) is evaluated so many times because LIBGCC is exported.
> Use the ':=' operator to evaluate it just once, and sink the stderr.
>



Applied to linux-kbuild.


> Signed-off-by: Masahiro Yamada 
> ---
>
>  arch/parisc/Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/parisc/Makefile b/arch/parisc/Makefile
> index 628cd8bb7ad8..d82787da43cd 100644
> --- a/arch/parisc/Makefile
> +++ b/arch/parisc/Makefile
> @@ -21,7 +21,7 @@ KBUILD_IMAGE := vmlinuz
>
>  NM = sh $(srctree)/arch/parisc/nm
>  CHECKFLAGS += -D__hppa__=1
> -LIBGCC = $(shell $(CC) $(KBUILD_CFLAGS) -print-libgcc-file-name)
> +LIBGCC := $(shell $(CC) $(KBUILD_CFLAGS) -print-libgcc-file-name 
> 2>/dev/null)
>  export LIBGCC
>
>  ifdef CONFIG_64BIT
> --
> 2.25.1
>


-- 
Best Regards
Masahiro Yamada


[PATCH v2 1/2] PCI: hv: Fix the PCI HyperV probe failure path to release resource properly

2020-04-30 Thread Wei Hu
Some error cases in hv_pci_probe() were not handled. Fix these error
paths to release the resourses and clean up the state properly.

Signed-off-by: Wei Hu 
---
 drivers/pci/controller/pci-hyperv.c | 20 
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/controller/pci-hyperv.c 
b/drivers/pci/controller/pci-hyperv.c
index e15022ff63e3..e6fac0187722 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -480,6 +480,9 @@ struct hv_pcibus_device {
 
struct workqueue_struct *wq;
 
+   /* Highest slot of child device with resources allocated */
+   int wslot_res_allocated;
+
/* hypercall arg, must not cross page boundary */
struct hv_retarget_device_interrupt retarget_msi_interrupt_params;
 
@@ -2847,7 +2850,7 @@ static int hv_send_resources_allocated(struct hv_device 
*hdev)
struct hv_pci_dev *hpdev;
struct pci_packet *pkt;
size_t size_res;
-   u32 wslot;
+   int wslot;
int ret;
 
size_res = (hbus->protocol_version < PCI_PROTOCOL_VERSION_1_2)
@@ -2900,6 +2903,8 @@ static int hv_send_resources_allocated(struct hv_device 
*hdev)
comp_pkt.completion_status);
break;
}
+
+   hbus->wslot_res_allocated = wslot;
}
 
kfree(pkt);
@@ -2918,10 +2923,10 @@ static int hv_send_resources_released(struct hv_device 
*hdev)
struct hv_pcibus_device *hbus = hv_get_drvdata(hdev);
struct pci_child_message pkt;
struct hv_pci_dev *hpdev;
-   u32 wslot;
+   int wslot;
int ret;
 
-   for (wslot = 0; wslot < 256; wslot++) {
+   for (wslot = hbus->wslot_res_allocated; wslot >= 0; wslot--) {
hpdev = get_pcichild_wslot(hbus, wslot);
if (!hpdev)
continue;
@@ -2936,8 +2941,12 @@ static int hv_send_resources_released(struct hv_device 
*hdev)
   VM_PKT_DATA_INBAND, 0);
if (ret)
return ret;
+
+   hbus->wslot_res_allocated = wslot - 1;
}
 
+   hbus->wslot_res_allocated = -1;
+
return 0;
 }
 
@@ -3037,6 +3046,7 @@ static int hv_pci_probe(struct hv_device *hdev,
if (!hbus)
return -ENOMEM;
hbus->state = hv_pcibus_init;
+   hbus->wslot_res_allocated = -1;
 
/*
 * The PCI bus "domain" is what is called "segment" in ACPI and other
@@ -3136,7 +3146,7 @@ static int hv_pci_probe(struct hv_device *hdev,
 
ret = hv_pci_allocate_bridge_windows(hbus);
if (ret)
-   goto free_irq_domain;
+   goto exit_d0;
 
ret = hv_send_resources_allocated(hdev);
if (ret)
@@ -3154,6 +3164,8 @@ static int hv_pci_probe(struct hv_device *hdev,
 
 free_windows:
hv_pci_free_bridge_windows(hbus);
+exit_d0:
+   (void) hv_pci_bus_exit(hdev, true);
 free_irq_domain:
irq_domain_remove(hbus->irq_domain);
 free_fwnode:
-- 
2.20.1



Re: [PATCH] unicore32: do not evaluate compiler's library path when cleaning

2020-04-30 Thread Masahiro Yamada
On Sat, Apr 25, 2020 at 3:07 PM Masahiro Yamada  wrote:
>
> Since commit a83e4ca26af8 ("kbuild: remove cc-option switch from
> -Wframe-larger-than="), 'make ARCH=unicore32 clean' emits error
> messages as follows:
>
>   $ make ARCH=unicore32 clean
>   gcc: error: missing argument to '-Wframe-larger-than='
>   gcc: error: missing argument to '-Wframe-larger-than='
>
> We do not care compiler flags when cleaning.
>
> Use the '=' operator for lazy expansion because we do not use
> GNU_LIBC_A or GNU_LIBGCC_A when cleaning.
>
> Fixes: a83e4ca26af8 ("kbuild: remove cc-option switch from 
> -Wframe-larger-than=")
> Signed-off-by: Masahiro Yamada 
> ---
>


Applied to linux-kbuild.


>  arch/unicore32/lib/Makefile | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/unicore32/lib/Makefile b/arch/unicore32/lib/Makefile
> index 098981a01841..5af06645b8f0 100644
> --- a/arch/unicore32/lib/Makefile
> +++ b/arch/unicore32/lib/Makefile
> @@ -10,12 +10,12 @@ lib-y   += strncpy_from_user.o strnlen_user.o
>  lib-y  += clear_user.o copy_page.o
>  lib-y  += copy_from_user.o copy_to_user.o
>
> -GNU_LIBC_A := $(shell $(CC) $(KBUILD_CFLAGS) 
> -print-file-name=libc.a)
> +GNU_LIBC_A = $(shell $(CC) $(KBUILD_CFLAGS) 
> -print-file-name=libc.a)
>  GNU_LIBC_A_OBJS:= memchr.o memcpy.o memmove.o memset.o
>  GNU_LIBC_A_OBJS+= strchr.o strrchr.o
>  GNU_LIBC_A_OBJS+= rawmemchr.o  # needed by 
> strrchr.o
>
> -GNU_LIBGCC_A   := $(shell $(CC) $(KBUILD_CFLAGS) 
> -print-file-name=libgcc.a)
> +GNU_LIBGCC_A   = $(shell $(CC) $(KBUILD_CFLAGS) 
> -print-file-name=libgcc.a)
>  GNU_LIBGCC_A_OBJS  := _ashldi3.o _ashrdi3.o _lshrdi3.o
>  GNU_LIBGCC_A_OBJS  += _divsi3.o _modsi3.o _ucmpdi2.o _umodsi3.o 
> _udivsi3.o
>
> --
> 2.25.1
>


-- 
Best Regards
Masahiro Yamada


Re: [PATCH] h8300: suppress error messages for 'make clean'

2020-04-30 Thread Masahiro Yamada
On Sat, Apr 25, 2020 at 2:18 PM Masahiro Yamada  wrote:
>
> 'make ARCH=h8300 clean' emits error messages as follows:
>
>   $ make ARCH=h8300 clean
>   gcc: error: missing argument to '-Wframe-larger-than='
>   gcc: error: unrecognized command line option '-mint32'
>
> You can suppress the second one by setting the correct CROSS_COMPILE=,
> but we should not require any compiler for cleaning.
>
> Signed-off-by: Masahiro Yamada 
> ---


Applied to linux-kbuild.


>
>  arch/h8300/boot/compressed/Makefile | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/h8300/boot/compressed/Makefile 
> b/arch/h8300/boot/compressed/Makefile
> index 9e2701069bbe..5942793f77a0 100644
> --- a/arch/h8300/boot/compressed/Makefile
> +++ b/arch/h8300/boot/compressed/Makefile
> @@ -18,7 +18,7 @@ CONFIG_MEMORY_START ?= 0x0040
>  CONFIG_BOOT_LINK_OFFSET ?= 0x0028
>  IMAGE_OFFSET := $(shell printf "0x%08x" 
> $$(($(CONFIG_MEMORY_START)+$(CONFIG_BOOT_LINK_OFFSET
>
> -LIBGCC := $(shell $(CROSS-COMPILE)$(CC) $(KBUILD_CFLAGS) 
> -print-libgcc-file-name)
> +LIBGCC := $(shell $(CROSS-COMPILE)$(CC) $(KBUILD_CFLAGS) 
> -print-libgcc-file-name 2>/dev/null)
>  LDFLAGS_vmlinux := -Ttext $(IMAGE_OFFSET) -estartup -T $(obj)/vmlinux.lds \
> --defsym output=$(CONFIG_MEMORY_START)
>
> --
> 2.25.1
>


-- 
Best Regards
Masahiro Yamada


[PATCH v2 0/2] Fix PCI HyperV device error handling

2020-04-30 Thread Wei Hu
This series better handles some PCI HyperV error cases in general and
for kdump case. Some of review comments from previous individual
patch reviews, including splitting into separate patches, have
already been incorporated.

Thanks,
Wei


Wei Hu (2):
  PCI: hv: Fix the PCI HyperV probe failure path to release resource
properly
  PCI: hv: Retry PCI bus D0 entry when the first attempt failed with
invalid device state

 drivers/pci/controller/pci-hyperv.c | 60 ++---
 1 file changed, 54 insertions(+), 6 deletions(-)

-- 
2.20.1



Re: [PATCH v2 1/4] kbuild: use $(CC_VERSION_TEXT) to evaluate CC_IS_GCC and CC_IS_CLANG

2020-04-30 Thread Masahiro Yamada
On Thu, Apr 23, 2020 at 11:24 PM Masahiro Yamada  wrote:
>
> The result of '$(CC) --version | head -n 1' is already computed by the
> top Makefile, and stored in the environment variable, CC_VERSION_TEXT.
>
> 'echo' is probably less expensive than the two commands $(CC) and
> 'head' although this optimization is not noticeable level.
>
> Signed-off-by: Masahiro Yamada 

Applied to linux-kbuild.



> ---
>
> Changes in v2:
>   - new patch
>
>  init/Kconfig | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/init/Kconfig b/init/Kconfig
> index 9e22ee8fbd75..5f797df3f043 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -9,7 +9,7 @@ config DEFCONFIG_LIST
> default "arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG)"
>
>  config CC_IS_GCC
> -   def_bool $(success,$(CC) --version | head -n 1 | grep -q gcc)
> +   def_bool $(success,echo "$(CC_VERSION_TEXT)" | grep -q gcc)
>
>  config GCC_VERSION
> int
> @@ -21,7 +21,7 @@ config LD_VERSION
> default $(shell,$(LD) --version | $(srctree)/scripts/ld-version.sh)
>
>  config CC_IS_CLANG
> -   def_bool $(success,$(CC) --version | head -n 1 | grep -q clang)
> +   def_bool $(success,echo "$(CC_VERSION_TEXT)" | grep -q clang)
>
>  config CLANG_VERSION
> int
> --
> 2.25.1
>
> --
> You received this message because you are subscribed to the Google Groups 
> "Clang Built Linux" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to clang-built-linux+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/clang-built-linux/20200423142354.312088-1-masahiroy%40kernel.org.



-- 
Best Regards
Masahiro Yamada


Re: [PATCH] hexagon: suppress error message for 'make clean'

2020-04-30 Thread Masahiro Yamada
On Sat, Apr 25, 2020 at 4:06 AM Brian Cain  wrote:
>
> > -Original Message-
> > From: linux-hexagon-ow...@vger.kernel.org  > ow...@vger.kernel.org> On Behalf Of Masahiro Yamada
> ...
> > 'make ARCH=hexagon clean' emits an error message as follows:
> >
> >   $ make ARCH=hexagon clean
> >   gcc: error: unrecognized command line option '-G0'
> >
> > You can suppress it by setting the correct CROSS_COMPILE=, but we should
> > not require any compiler for cleaning.
> >
> > Signed-off-by: Masahiro Yamada 
>
>
> Acked-by: Brian Cain 

Applied to linux-kbuild. Thanks.


>
> > ---
> >
> >  arch/hexagon/Makefile | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/hexagon/Makefile b/arch/hexagon/Makefile index
> > 4c5858b80f0e..c168c6980d05 100644
> > --- a/arch/hexagon/Makefile
> > +++ b/arch/hexagon/Makefile
> > @@ -30,7 +30,7 @@ TIR_NAME := r19
> >  KBUILD_CFLAGS += -ffixed-$(TIR_NAME) -
> > DTHREADINFO_REG=$(TIR_NAME) -D__linux__  KBUILD_AFLAGS += -
> > DTHREADINFO_REG=$(TIR_NAME)
> >
> > -LIBGCC := $(shell $(CC) $(KBUILD_CFLAGS) -print-libgcc-file-name)
> > +LIBGCC := $(shell $(CC) $(KBUILD_CFLAGS) -print-libgcc-file-name
> > +2>/dev/null)
> >  libs-y += $(LIBGCC)
> >
> >  head-y := arch/hexagon/kernel/head.o
> > --
> > 2.25.1
>
>


-- 
Best Regards
Masahiro Yamada


Re: [RFC][PATCH] x86/ftrace: Have ftrace trampolines turn read-only at the end of system boot up

2020-04-30 Thread Josh Poimboeuf
On Thu, Apr 30, 2020 at 11:47:33PM -0500, Josh Poimboeuf wrote:
> On Thu, Apr 30, 2020 at 08:21:47PM -0400, Steven Rostedt wrote:
> > The cause is the "ftrace=function" would register the function tracer
> > and create a trampoline, and it will set it as executable and
> > read-only. Then the "trace_options=func_stack_trace" would then update
> > the same trampoline to include the stack tracer version of the function
> > tracer. But since the trampoline already exists, it updates it with
> > text_poke_bp(). The problem is that text_poke_bp() called while
> > system_state == SYSTEM_BOOTING, it will simply do a memcpy() and not
> > the page mapping, as it would think that the text is still read-write.
> > But in this case it is not, and we take a fault and crash.
> > 
> > Instead, lets keep the ftrace trampolines read-write during boot up,
> > and then when the kernel executable text is set to read-only, the
> > ftrace trampolines get set to read-only as well.
> 
> Would it be easier to just call a new __text_poke_bp() which skips the
> SYSTEM_BOOTING check, since you know the trampoline will always be
> read-only?
> 
> Like:

early_trace_init() is called after mm_init(), so I thought it might
work, but I guess not:

[0.206271] Starting tracer 'function'
[0.232032] BUG: kernel NULL pointer dereference, address: 0050
[0.232035] #PF: supervisor read access in kernel mode
[0.232036] #PF: error_code(0x) - not-present page
[0.232037] PGD 0 P4D 0 
[0.232040] Oops:  [#1] PREEMPT SMP PTI
[0.232042] CPU: 0 PID: 0 Comm: swapper Not tainted 5.7.0-rc2-next-20200424+ 
#127
[0.232043] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.12.0-2.fc30 04/01/2014
[0.232047] RIP: 0010:walk_to_pmd+0x11/0x140
[0.232048] Code: 78 c1 32 82 e8 a0 68 ff ff 0f 0b 66 66 2e 0f 1f 84 00 00 
00 00 00 0f 1f 00 0f 1f 44 00 00 48 89 f0 48 c1 e8 24 25 f8 0f 00 00 <48> 03 47 
50 0f 84 12 01 00 00 41 54 49 89 fc 55 48 89 c5 53 48 8b
[0.232050] RSP: :82603c68 EFLAGS: 00010046
[0.232051] RAX:  RBX:  RCX: 
[0.232052] RDX: 82603cc8 RSI:  RDI: 
[0.232053] RBP: 8063 R08: 0001 R09: fff00fff
[0.232054] R10:  R11:  R12: 82603cc8
[0.232055] R13:  R14: 0066 R15: feff
[0.232056] FS:  () GS:88813b60() 
knlGS:
[0.232057] CS:  0010 DS:  ES:  CR0: 80050033
[0.232058] CR2: 0050 CR3: 02612001 CR4: 000606b0
[0.232061] DR0:  DR1:  DR2: 
[0.232062] DR3:  DR6: fffe0ff0 DR7: 0400
[0.232063] Call Trace:
[0.232066]  __get_locked_pte+0x19/0x100
[0.232069]  ? 0xa065
[0.232071]  __text_poke+0x14d/0x640
[0.232073]  ? 0xa065
[0.232076]  text_poke_bp_batch+0x8b/0x1c0
[0.232078]  ? 0xa065
[0.232080]  __text_poke_bp+0x3a/0x60
[0.232083]  arch_ftrace_update_trampoline+0xac/0x300
[0.232087]  __register_ftrace_function+0x7c/0xc0
[0.232089]  ftrace_startup+0x1e/0xf0
[0.232091]  register_ftrace_function+0x24/0x70
[0.232093]  func_set_flag+0x6f/0x90
[0.232096]  __set_tracer_option.isra.0+0x24/0x50
[0.232098]  trace_set_options+0x149/0x160
[0.232102]  apply_trace_boot_options+0x44/0x6d
[0.232105]  register_tracer+0x1d8/0x1e9
[0.232107]  early_trace_init+0x266/0x378
[0.232109]  start_kernel+0x337/0x614
[0.232112]  ? x86_family+0x5/0x20
[0.232114]  secondary_startup_64+0xa4/0xb0
[0.232119] Modules linked in:
[0.232120] CR2: 0050
[0.232126] random: get_random_bytes called from 
print_oops_end_marker+0x26/0x40 with crng_init=0
[0.232128] ---[ end trace 71e23a89b9b5224f ]---
[0.232130] RIP: 0010:walk_to_pmd+0x11/0x140
[0.232131] Code: 78 c1 32 82 e8 a0 68 ff ff 0f 0b 66 66 2e 0f 1f 84 00 00 
00 00 00 0f 1f 00 0f 1f 44 00 00 48 89 f0 48 c1 e8 24 25 f8 0f 00 00 <48> 03 47 
50 0f 84 12 01 00 00 41 54 49 89 fc 55 48 89 c5 53 48 8b
[0.232133] RSP: :82603c68 EFLAGS: 00010046
[0.232134] RAX:  RBX:  RCX: 
[0.232135] RDX: 82603cc8 RSI:  RDI: 
[0.232136] RBP: 8063 R08: 0001 R09: fff00fff
[0.232137] R10:  R11:  R12: 82603cc8
[0.232138] R13:  R14: 0066 R15: feff
[0.232139] FS:  () GS:88813b60() 
knlGS:
[0.232140] CS:  0010 DS:  ES:  CR0: 80050033
[0.232141] CR2: 0050 CR3: 02612001 CR4: 000606b0
[0.232142] DR0: 

Re: BPFilter: bit size mismatch between bpfiter_umh and vmliux

2020-04-30 Thread Alexei Starovoitov
On Thu, Apr 30, 2020 at 9:06 PM Masahiro Yamada  wrote:
>
> Hi Alexei,
>
> On Wed, Apr 29, 2020 at 1:14 AM Alexei Starovoitov
>  wrote:
> > >
> > > At least, the build was successful,
> > > but does this work at runtime?
> > >
> > > If this is a bug, I can fix it cleanly.
> > >
> > > I think the bit size of the user mode helper
> > > should match to the kernel bit size. Is this correct?
> >
> > yes. they should match.
> > In theory we can have -m32 umh running on 64-bit kernel,
> > but I wouldn't bother adding support for such thing
> > until there is a use case.
> > Running 64-bit umh on 32-bit kernel is no go.
>
>
> Thanks for the comments.
>
>
> This issue will be fixed by this:
> https://patchwork.kernel.org/patch/11515997/
>
> and the Makefile will be cleaned up by this:
> https://patchwork.kernel.org/patch/11515995/
>
>
> They are parts of the big series of Makefile cleanups.
> So, I will apply the whole to kbuild tree.

thank you.
I saw the patches, but didn't have time to test or comment on them.
To be fair umh logic was bit rotting a bit, but that will change soon.


Re: [PATCH v3 1/1] scsi: pm: Balance pm_only counter of request queue during system resume

2020-04-30 Thread Can Guo

On 2020-05-01 09:50, Bart Van Assche wrote:

On 2020-04-30 18:42, Can Guo wrote:

On 2020-05-01 04:32, Bart Van Assche wrote:
> Has it been considered to test directly whether a SCSI device has been
> runtime suspended instead of relying on blk_queue_pm_only()? How about
> using pm_runtime_status_suspended() or adding a function in
> block/blk-pm.h that checks whether q->rpm_status == RPM_SUSPENDED?

Yes, I used to make the patch like that way, and it also worked well, 
as

both ways are equal actually. I kinda like the current code because we
should be confident that after scsi_dev_type_resume() returns, pm_only
must be 0. Different reviewers may have different opinions, either way
works well anyways.


Hi Can,

Please note that this is not a matter of personal preferences of a
reviewer but a matter of correctness. blk_queue_pm_only() does not only
return a value > 0 if a SCSI device has been runtime suspended but also
returns true if scsi_device_quiesce() was called for another reason.
Hence my request to test the "runtime suspended" status directly and 
not

to rely on blk_queue_pm_only().

Thanks,

Bart.


Hi Bart,

I agree we are pursuing correctness here, but as I said, I think both
way are equally correct. I also agree with you that the alternative way,
see [2], is much easier to be understood, we can take the alternative 
way

if you are OK with it.

[1] Currently, scsi_dev_type_resume() is the hooker for resume, thaw and
restore. Per my understanding, when scsi_dev_type_resume() is running,
it is not possible that scsi_device_quiesce() can be called to this 
sdev,

at least not possible in current code base. So it is OK to rely on
blk_queue_pm_only() in scsi_dev_type_resume().

[2] The alternative way which I have tested with is like below. I think
it is what you requested for if my understanding is right, please 
correct

me if I am wrong.

diff --git a/drivers/scsi/scsi_pm.c b/drivers/scsi/scsi_pm.c
index 3717eea..d18271d 100644
--- a/drivers/scsi/scsi_pm.c
+++ b/drivers/scsi/scsi_pm.c
@@ -74,12 +74,15 @@ static int scsi_dev_type_resume(struct device *dev,
 {
const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : 
NULL;

int err = 0;
+   bool was_rpm_suspended = false;

err = cb(dev, pm);
scsi_device_resume(to_scsi_device(dev));
dev_dbg(dev, "scsi resume: %d\n", err);

if (err == 0) {
+   was_rpm_suspended = pm_runtime_suspended(dev);
+
pm_runtime_disable(dev);
err = pm_runtime_set_active(dev);
pm_runtime_enable(dev);
@@ -93,8 +96,10 @@ static int scsi_dev_type_resume(struct device *dev,
 */
if (!err && scsi_is_sdev_device(dev)) {
struct scsi_device *sdev = to_scsi_device(dev);
-
-   blk_set_runtime_active(sdev->request_queue);
+   if (was_rpm_suspended)
+   
blk_post_runtime_resume(sdev->request_queue, 0);

+   else
+   
blk_set_runtime_active(sdev->request_queue);

}
}

Thanks,

Can Guo


Re: [RFC][PATCH] x86/ftrace: Have ftrace trampolines turn read-only at the end of system boot up

2020-04-30 Thread Josh Poimboeuf
On Thu, Apr 30, 2020 at 08:21:47PM -0400, Steven Rostedt wrote:
> The cause is the "ftrace=function" would register the function tracer
> and create a trampoline, and it will set it as executable and
> read-only. Then the "trace_options=func_stack_trace" would then update
> the same trampoline to include the stack tracer version of the function
> tracer. But since the trampoline already exists, it updates it with
> text_poke_bp(). The problem is that text_poke_bp() called while
> system_state == SYSTEM_BOOTING, it will simply do a memcpy() and not
> the page mapping, as it would think that the text is still read-write.
> But in this case it is not, and we take a fault and crash.
> 
> Instead, lets keep the ftrace trampolines read-write during boot up,
> and then when the kernel executable text is set to read-only, the
> ftrace trampolines get set to read-only as well.

Would it be easier to just call a new __text_poke_bp() which skips the
SYSTEM_BOOTING check, since you know the trampoline will always be
read-only?

Like:

diff --git a/arch/x86/include/asm/text-patching.h 
b/arch/x86/include/asm/text-patching.h
index 67315fa3956a..710106256916 100644
--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -45,6 +45,7 @@ extern void *text_poke(void *addr, const void *opcode, size_t 
len);
 extern void text_poke_sync(void);
 extern void *text_poke_kgdb(void *addr, const void *opcode, size_t len);
 extern int poke_int3_handler(struct pt_regs *regs);
+extern void __text_poke_bp(void *addr, const void *opcode, size_t len, const 
void *emulate);
 extern void text_poke_bp(void *addr, const void *opcode, size_t len, const 
void *emulate);
 
 extern void text_poke_queue(void *addr, const void *opcode, size_t len, const 
void *emulate);
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 7867dfb3963e..9cc983cc9291 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1265,6 +1265,14 @@ void __ref text_poke_queue(void *addr, const void 
*opcode, size_t len, const voi
text_poke_loc_init(tp, addr, opcode, len, emulate);
 }
 
+void __ref __text_poke_bp(void *addr, const void *opcode, size_t len, const 
void *emulate)
+{
+   struct text_poke_loc tp;
+
+   text_poke_loc_init(, addr, opcode, len, emulate);
+   text_poke_bp_batch(, 1);
+}
+
 /**
  * text_poke_bp() -- update instructions on live kernel on SMP
  * @addr:  address to patch
@@ -1278,13 +1286,10 @@ void __ref text_poke_queue(void *addr, const void 
*opcode, size_t len, const voi
  */
 void __ref text_poke_bp(void *addr, const void *opcode, size_t len, const void 
*emulate)
 {
-   struct text_poke_loc tp;
-
if (unlikely(system_state == SYSTEM_BOOTING)) {
text_poke_early(addr, opcode, len);
return;
}
 
-   text_poke_loc_init(, addr, opcode, len, emulate);
-   text_poke_bp_batch(, 1);
+   __text_poke_bp(addr, opcode, len, emulate);
 }
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 867c126ddabe..c36f51f01f6e 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -469,7 +469,7 @@ void arch_ftrace_update_trampoline(struct ftrace_ops *ops)
mutex_lock(_mutex);
/* Do a safe modify in case the trampoline is executing */
new = ftrace_call_replace(ip, (unsigned long)func);
-   text_poke_bp((void *)ip, new, MCOUNT_INSN_SIZE, NULL);
+   __text_poke_bp((void *)ip, new, MCOUNT_INSN_SIZE, NULL);
mutex_unlock(_mutex);
 }
 



[PATCH v2] vhost: vsock: kick send_pkt worker once device is started

2020-04-30 Thread Jia He
Ning Bo reported an abnormal 2-second gap when booting Kata container [1].
The unconditional timeout was caused by VSOCK_DEFAULT_CONNECT_TIMEOUT of
connecting from the client side. The vhost vsock client tries to connect
an initializing virtio vsock server.

The abnormal flow looks like:
host-userspace   vhost vsock   guest vsock
==   ===   
connect() >  vhost_transport_send_pkt_work()   initializing
   | vq->private_data==NULL
   | will not be queued
   V
schedule_timeout(2s)
 vhost_vsock_start()  <-   device ready
 set vq->private_data

wait for 2s and failed
connect() again  vq->private_data!=NULL recv connecting pkt

Details:
1. Host userspace sends a connect pkt, at that time, guest vsock is under
   initializing, hence the vhost_vsock_start has not been called. So
   vq->private_data==NULL, and the pkt is not been queued to send to guest
2. Then it sleeps for 2s
3. After guest vsock finishes initializing, vq->private_data is set
4. When host userspace wakes up after 2s, send connecting pkt again,
   everything is fine.

As suggested by Stefano Garzarella, this fixes it by additional kicking the
send_pkt worker in vhost_vsock_start once the virtio device is started. This
makes the pending pkt sent again.

After this patch, kata-runtime (with vsock enabled) boot time is reduced
from 3s to 1s on a ThunderX2 arm64 server.

[1] https://github.com/kata-containers/runtime/issues/1917

Reported-by: Ning Bo 
Suggested-by: Stefano Garzarella 
Signed-off-by: Jia He 
---
v2: new solution suggested by Stefano Garzarella

 drivers/vhost/vsock.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index e36aaf9ba7bd..0716a9cdffee 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -543,6 +543,11 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
mutex_unlock(>mutex);
}
 
+   /* Some packets may have been queued before the device was started,
+* let's kick the send worker to send them.
+*/
+   vhost_work_queue(>dev, >send_pkt_work);
+
mutex_unlock(>dev.mutex);
return 0;
 
-- 
2.17.1



[PATCH kunit-next] kunit: kunit_tool: Separate out config/build/exec/parse

2020-04-30 Thread David Gow
Add new subcommands to kunit.py to allow stages of the existing 'run'
subcommand to be run independently:
- 'config': Verifies that .config is a subset of .kunitconfig
- 'build': Compiles a UML kernel for KUnit
- 'exec': Runs the kernel, and outputs the test results.
- 'parse': Parses test results from a file or stdin

The 'run' command continues to behave as before.

Signed-off-by: David Gow 
Reviewed-by: Brendan Higgins 
---

Changes since the RFC[1]:
- Rebased on top of kselftest/kunit, adding support for --alltests and
  --make_options
- Fixed an issue where the help text wouldn't print properly if an
  invalid subcommand were specified.
- Changed 'parse' subcommand to make the filename optional (and default
  to stdin) as suggested by Brendan.

[1]: https://lkml.org/lkml/2020/3/11/1225

 tools/testing/kunit/kunit.py   | 293 -
 tools/testing/kunit/kunit_tool_test.py |  63 +-
 2 files changed, 296 insertions(+), 60 deletions(-)

diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
index 7dca74774dd2..b01838b6f5f9 100755
--- a/tools/testing/kunit/kunit.py
+++ b/tools/testing/kunit/kunit.py
@@ -20,8 +20,17 @@ import kunit_config
 import kunit_kernel
 import kunit_parser
 
-KunitResult = namedtuple('KunitResult', ['status','result'])
+KunitResult = namedtuple('KunitResult', ['status','result','elapsed_time'])
 
+KunitConfigRequest = namedtuple('KunitConfigRequest',
+   ['build_dir', 'defconfig', 'make_options'])
+KunitBuildRequest = namedtuple('KunitBuildRequest',
+  ['jobs', 'build_dir', 'alltests',
+   'make_options'])
+KunitExecRequest = namedtuple('KunitExecRequest',
+ ['timeout', 'build_dir', 'alltests'])
+KunitParseRequest = namedtuple('KunitParseRequest',
+  ['raw_output', 'input_data'])
 KunitRequest = namedtuple('KunitRequest', ['raw_output','timeout', 'jobs',
   'build_dir', 'defconfig',
   'alltests', 'make_options'])
@@ -46,14 +55,25 @@ def get_kernel_root_path():
sys.exit(1)
return parts[0]
 
-def run_tests(linux: kunit_kernel.LinuxSourceTree,
- request: KunitRequest) -> KunitResult:
+def config_tests(linux: kunit_kernel.LinuxSourceTree,
+request: KunitConfigRequest) -> KunitResult:
+   kunit_parser.print_with_timestamp('Configuring KUnit Kernel ...')
+
config_start = time.time()
+   if request.defconfig:
+   create_default_kunitconfig()
success = linux.build_reconfig(request.build_dir, request.make_options)
config_end = time.time()
if not success:
-   return KunitResult(KunitStatus.CONFIG_FAILURE, 'could not 
configure kernel')
+   return KunitResult(KunitStatus.CONFIG_FAILURE,
+  'could not configure kernel',
+  config_end - config_start)
+   return KunitResult(KunitStatus.SUCCESS,
+  'configured kernel successfully',
+  config_end - config_start)
 
+def build_tests(linux: kunit_kernel.LinuxSourceTree,
+   request: KunitBuildRequest) -> KunitResult:
kunit_parser.print_with_timestamp('Building KUnit Kernel ...')
 
build_start = time.time()
@@ -64,78 +84,167 @@ def run_tests(linux: kunit_kernel.LinuxSourceTree,
build_end = time.time()
if not success:
return KunitResult(KunitStatus.BUILD_FAILURE, 'could not build 
kernel')
+   if not success:
+   return KunitResult(KunitStatus.BUILD_FAILURE,
+  'could not build kernel',
+  build_end - build_start)
+   return KunitResult(KunitStatus.SUCCESS,
+  'built kernel successfully',
+  build_end - build_start)
 
+def exec_tests(linux: kunit_kernel.LinuxSourceTree,
+  request: KunitExecRequest) -> KunitResult:
kunit_parser.print_with_timestamp('Starting KUnit Kernel ...')
test_start = time.time()
-   kunit_output = linux.run_kernel(
+   result = linux.run_kernel(
timeout=None if request.alltests else request.timeout,
build_dir=request.build_dir)
+
+   test_end = time.time()
+
+   return KunitResult(KunitStatus.SUCCESS,
+  result,
+  test_end - test_start)
+
+def parse_tests(request: KunitParseRequest) -> KunitResult:
+   parse_start = time.time()
+
+   test_result = kunit_parser.TestResult(kunit_parser.TestStatus.SUCCESS,
+ [],
+ 'Tests not Parsed.')
if request.raw_output:
-   raw_output = 

Re: [PATCH v3 3/5] fs: Enable to enforce noexec mounts or file exec through RESOLVE_MAYEXEC

2020-04-30 Thread James Morris
On Tue, 28 Apr 2020, Mickaël Salaün wrote:

> Enable to either propagate the mount options from the underlying VFS
> mount to prevent execution, or to propagate the file execute permission.
> This may allow a script interpreter to check execution permissions
> before reading commands from a file.

I'm finding the description of this patch difficult to understand.

In the first case, this seems to mean: if you open a file with 
RESOLVE_MAYEXEC from a noexec mount, then it will fail. Correct?

In the second case, do you mean a RESOLVE_MAYEXEC open will fail if the 
file does not have +x set for the user?


> The main goal is to be able to protect the kernel by restricting
> arbitrary syscalls that an attacker could perform with a crafted binary
> or certain script languages.

This sounds like the job of seccomp. Why is this part of MAYEXEC?

>  It also improves multilevel isolation
> by reducing the ability of an attacker to use side channels with
> specific code.  These restrictions can natively be enforced for ELF
> binaries (with the noexec mount option) but require this kernel
> extension to properly handle scripts (e.g., Python, Perl).

Again, not sure why you're talking about side channels and MAYEXEC and 
mount options. Are you more generally talking about being able to prevent 
execution of arbitrary script files included by an interpreter?

> Add a new sysctl fs.open_mayexec_enforce to control this behavior.
> Indeed, because of compatibility with installed systems, only the system
> administrator is able to check that this new enforcement is in line with
> the system mount points and file permissions.  A following patch adds
> documentation.

I don't like the idea of any of this feature set being configurable. 
RESOLVE_MAYEXEC as a new flag should have well-defined, stable semantics.


-- 
James Morris


Re: [PATCH] pipe: read/write_iter() handler should check for IOCB_NOWAIT

2020-04-30 Thread Jens Axboe
On 4/30/20 10:14 PM, Jens Axboe wrote:
> On 4/30/20 9:58 PM, Al Viro wrote:
>> On Thu, Apr 30, 2020 at 10:24:46AM -0600, Jens Axboe wrote:
>>> Pipe read/write only checks for the file O_NONBLOCK flag, but we should
>>> also check for IOCB_NOWAIT for whether or not we should handle this read
>>> or write in a non-blocking fashion. If we don't, then we will block on
>>> data or space for iocbs that explicitly asked for non-blocking
>>> operation. This messes up callers that explicitly ask for non-blocking
>>> operations.
>>
>> Why does io_uring allow setting IOCB_NOWAIT without FMODE_NOWAIT, anyway?
> 
> To do per-io non-blocking operations. It's not practical or convenient
> to flip the file flag, nor does it work if you have multiple of them
> going. If pipes honor the flag for the read/write iter handlers, then
> we can handle them a lot more efficiently instead of requiring async
> offload.

Sorry, I think I misread you and the answer, while correct by itself, is
not the answer to the question you are asking. You're saying we should
not be using IOCB_NOWAIT if FMODE_NOWAIT isn't set, which is fair. I'll
re-do the patch, we can probably just use FMODE_NOWAIT for the final
check in io_uring instead.

For pipes, we should include the setting of FMODE_NOWAIT for fifo_open()
with the patch, at least.

-- 
Jens Axboe



Re: [RFC][PATCH] x86/mm: Sync all vmalloc mappings before text_poke()

2020-04-30 Thread Steven Rostedt
On Thu, 30 Apr 2020 22:26:55 -0400 (EDT)
Mathieu Desnoyers  wrote:

> The tracers just have to make sure they perform their vmalloc'd memory
> allocation before registering the tracepoint which can touch it, else they
> need to issue vmalloc_sync_mappings() on their own before making the
> newly allocated memory observable by instrumentation.

What gets me is that I added the patch below (which adds a
vmalloc_sync_mappings() just after the alloc_percpu()), but I also recorded
all instances of vmalloc() with a stackdump, and I get this:

  colord-1673  [002] 84.764804: __vmalloc_node_range+0x5/0x2c0: 
vmalloc called here
  colord-1673  [002] 84.764807: 
 => __ftrace_trace_stack+0x161/0x1a0
 => __vmalloc_node_range+0x4d/0x2c0
 => module_alloc+0x7e/0xd0
 => bpf_jit_binary_alloc+0x70/0x110
 => bpf_int_jit_compile+0x139/0x40a
 => bpf_prog_select_runtime+0xa3/0x120
 => bpf_prepare_filter+0x533/0x5a0
 => sk_attach_filter+0x13/0x50
 => sock_setsockopt+0xd2f/0xf90
 => __sys_setsockopt+0x18a/0x1a0
 => __x64_sys_setsockopt+0x20/0x30
 => do_syscall_64+0x60/0x230
 => entry_SYSCALL_64_after_hwframe+0x49/0xb3
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0


[ the above is from before the tracing started ]

   trace-cmd-1687  [002]    103.908850: __vmalloc_node_range+0x5/0x2c0: 
vmalloc called here
   trace-cmd-1687  [002]    103.908856: 
 => __ftrace_trace_stack+0x161/0x1a0
 => __vmalloc_node_range+0x4d/0x2c0
 => vzalloc+0x48/0x50
 => trace_pid_write+0x23d/0x2b0
 => pid_write.isra.62+0xd1/0x2f0
 => vfs_write+0xa8/0x1b0
 => ksys_write+0x67/0xe0
 => do_syscall_64+0x60/0x230
 => entry_SYSCALL_64_after_hwframe+0x49/0xb3
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
   trace-cmd-1697  [003]    104.088950: __vmalloc_node_range+0x5/0x2c0: 
vmalloc called here
   trace-cmd-1697  [003]    104.088954: 
 => __ftrace_trace_stack+0x161/0x1a0
 => __vmalloc_node_range+0x4d/0x2c0
 => vzalloc+0x48/0x50
 => trace_pid_write+0x23d/0x2b0
 => pid_write.isra.62+0xd1/0x2f0
 => vfs_write+0xa8/0x1b0
 => ksys_write+0x67/0xe0
 => do_syscall_64+0x60/0x230
 => entry_SYSCALL_64_after_hwframe+0x49/0xb3
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
   trace-cmd-1697  [003]    104.089666: __vmalloc_node_range+0x5/0x2c0: 
vmalloc called here
   trace-cmd-1697  [003]    104.089669: 
 => __ftrace_trace_stack+0x161/0x1a0
 => __vmalloc_node_range+0x4d/0x2c0
 => vzalloc+0x48/0x50
 => trace_pid_write+0xc1/0x2b0
 => pid_write.isra.62+0xd1/0x2f0
 => vfs_write+0xa8/0x1b0
 => ksys_write+0x67/0xe0
 => do_syscall_64+0x60/0x230
 => entry_SYSCALL_64_after_hwframe+0x49/0xb3
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
   trace-cmd-1697  [003]    104.098920: __vmalloc_node_range+0x5/0x2c0: 
vmalloc called here
   trace-cmd-1697  [003]    104.098924: 
 => __ftrace_trace_stack+0x161/0x1a0
 => __vmalloc_node_range+0x4d/0x2c0
 => vzalloc+0x48/0x50
 => trace_pid_write+0xc1/0x2b0
 => pid_write.isra.62+0xd1/0x2f0
 => vfs_write+0xa8/0x1b0
 => ksys_write+0x67/0xe0
 => do_syscall_64+0x60/0x230
 => entry_SYSCALL_64_after_hwframe+0x49/0xb3
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
   trace-cmd-1697  [003]    104.114518: __vmalloc_node_range+0x5/0x2c0: 
vmalloc called here
   trace-cmd-1697  [003]    104.114520: 
 => __ftrace_trace_stack+0x161/0x1a0
 => __vmalloc_node_range+0x4d/0x2c0
 => vzalloc+0x48/0x50
 => trace_pid_write+0xc1/0x2b0
 => pid_write.isra.62+0xd1/0x2f0
 => vfs_write+0xa8/0x1b0
 => ksys_write+0x67/0xe0
 => do_syscall_64+0x60/0x230
 => entry_SYSCALL_64_after_hwframe+0x49/0xb3
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
   trace-cmd-1697  [003]    104.130705: __vmalloc_node_range+0x5/0x2c0: 
vmalloc called here
   trace-cmd-1697  [003]    104.130707: 
 => __ftrace_trace_stack+0x161/0x1a0
 => __vmalloc_node_range+0x4d/0x2c0
 => vzalloc+0x48/0x50
 => trace_pid_write+0x23d/0x2b0
 => event_pid_write.isra.30+0x21b/0x3b0
 => vfs_write+0xa8/0x1b0
 => ksys_write+0x67/0xe0
 => do_syscall_64+0x60/0x230
 => entry_SYSCALL_64_after_hwframe+0x49/0xb3
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
   trace-cmd-1687  [001]    106.000510: __vmalloc_node_range+0x5/0x2c0: 
vmalloc called here
   trace-cmd-1687  [001]    106.000514: 
 => __ftrace_trace_stack+0x161/0x1a0
 => __vmalloc_node_range+0x4d/0x2c0
 => vzalloc+0x48/0x50
 => trace_pid_write+0x23d/0x2b0
 => pid_write.isra.62+0xd1/0x2f0
 => vfs_write+0xa8/0x1b0
 => ksys_write+0x67/0xe0
 => do_syscall_64+0x60/0x230
 => entry_SYSCALL_64_after_hwframe+0x49/0xb3
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0
 => 0

The above is the calls to adding pids to set_event_pid. (I see I should
probably make that code a bit more efficient, it calls the vmalloc code a
bit too much).

But what is missing, is the call to vmalloc from alloc_percpu(). In fact, I
put in printks in the vmalloc() that's in alloc_percpu() and it doesn't
trigger from the tracing code, and it does show up in my trace 

Re: linux-next: manual merge of the mlx5-next tree with the kspp-gustavo tree

2020-04-30 Thread Gustavo A. R. Silva
Stephen,

On 4/30/20 22:30, Gustavo A. R. Silva wrote:

>> This is now a conflict between the net-next and kspp-gustavo trees.
>>
> 
> Thanks for reporting this. I think the best solution, for now, is to remove 
> the
> changes from my tree. I'll do it right away.
> 

I just updated my -next tree. This conflict should be resolved now. :)

Thanks
--
Gustavo


Re: [PATCH] pipe: read/write_iter() handler should check for IOCB_NOWAIT

2020-04-30 Thread Jens Axboe
On 4/30/20 9:58 PM, Al Viro wrote:
> On Thu, Apr 30, 2020 at 10:24:46AM -0600, Jens Axboe wrote:
>> Pipe read/write only checks for the file O_NONBLOCK flag, but we should
>> also check for IOCB_NOWAIT for whether or not we should handle this read
>> or write in a non-blocking fashion. If we don't, then we will block on
>> data or space for iocbs that explicitly asked for non-blocking
>> operation. This messes up callers that explicitly ask for non-blocking
>> operations.
> 
> Why does io_uring allow setting IOCB_NOWAIT without FMODE_NOWAIT, anyway?

To do per-io non-blocking operations. It's not practical or convenient
to flip the file flag, nor does it work if you have multiple of them
going. If pipes honor the flag for the read/write iter handlers, then
we can handle them a lot more efficiently instead of requiring async
offload.

-- 
Jens Axboe



Re: [PATCH] drm/mediatek: cleanup coding style in mediatek a bit

2020-04-30 Thread kbuild test robot
Hi Bernard,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on pza/reset/next]
[also build test ERROR on drm-intel/for-linux-next drm-tip/drm-tip linus/master 
v5.7-rc3 next-20200430]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/Bernard-Zhao/drm-mediatek-cleanup-coding-style-in-mediatek-a-bit/20200428-055750
base:   https://git.pengutronix.de/git/pza/linux reset/next
config: arm64-allyesconfig (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.3.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day GCC_VERSION=9.3.0 make.cross 
ARCH=arm64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot 

All errors (new ones prefixed by >>):

   drivers/gpu/drm/mediatek/mtk_hdmi.c: In function 
'mtk_hdmi_hw_send_info_frame':
>> drivers/gpu/drm/mediatek/mtk_hdmi.c:1807: error: unterminated argument list 
>> invoking macro "dev_err"
1807 | MODULE_LICENSE("GPL v2");
 | 
>> drivers/gpu/drm/mediatek/mtk_hdmi.c:316:3: error: 'dev_err' undeclared 
>> (first use in this function); did you mean '_dev_err'?
 316 |   dev_err(hdmi->dev, "Wrong frame len: %d\n", frame_len;
 |   ^~~
 |   _dev_err
   drivers/gpu/drm/mediatek/mtk_hdmi.c:316:3: note: each undeclared identifier 
is reported only once for each function it appears in
>> drivers/gpu/drm/mediatek/mtk_hdmi.c:316:10: error: expected ';' at end of 
>> input
 316 |   dev_err(hdmi->dev, "Wrong frame len: %d\n", frame_len;
 |  ^
 |  ;
   ..
1807 | MODULE_LICENSE("GPL v2");
 |   
>> drivers/gpu/drm/mediatek/mtk_hdmi.c:316:3: error: expected declaration or 
>> statement at end of input
 316 |   dev_err(hdmi->dev, "Wrong frame len: %d\n", frame_len;
 |   ^~~
>> drivers/gpu/drm/mediatek/mtk_hdmi.c:316:3: error: expected declaration or 
>> statement at end of input
   drivers/gpu/drm/mediatek/mtk_hdmi.c:308:6: warning: unused variable 
'ctrl_frame_en' [-Wunused-variable]
 308 |  int ctrl_frame_en = 0;
 |  ^
   drivers/gpu/drm/mediatek/mtk_hdmi.c:302:6: warning: unused variable 'i' 
[-Wunused-variable]
 302 |  int i;
 |  ^
   drivers/gpu/drm/mediatek/mtk_hdmi.c:301:6: warning: unused variable 
'ctrl_reg' [-Wunused-variable]
 301 |  u32 ctrl_reg = GRL_CTRL;
 |  ^~~~
   At top level:
   drivers/gpu/drm/mediatek/mtk_hdmi.c:298:13: warning: 
'mtk_hdmi_hw_send_info_frame' defined but not used [-Wunused-function]
 298 | static void mtk_hdmi_hw_send_info_frame(struct mtk_hdmi *hdmi, u8 
*buffer,
 | ^~~
   drivers/gpu/drm/mediatek/mtk_hdmi.c:293:13: warning: 
'mtk_hdmi_hw_enable_dvi_mode' defined but not used [-Wunused-function]
 293 | static void mtk_hdmi_hw_enable_dvi_mode(struct mtk_hdmi *hdmi, bool 
enable)
 | ^~~
   drivers/gpu/drm/mediatek/mtk_hdmi.c:288:13: warning: 
'mtk_hdmi_hw_write_int_mask' defined but not used [-Wunused-function]
 288 | static void mtk_hdmi_hw_write_int_mask(struct mtk_hdmi *hdmi, u32 
int_mask)
 | ^~
   drivers/gpu/drm/mediatek/mtk_hdmi.c:282:13: warning: 
'mtk_hdmi_hw_enable_notice' defined but not used [-Wunused-function]
 282 | static void mtk_hdmi_hw_enable_notice(struct mtk_hdmi *hdmi, bool 
enable_notice)
 | ^
   drivers/gpu/drm/mediatek/mtk_hdmi.c:271:13: warning: 'mtk_hdmi_hw_reset' 
defined but not used [-Wunused-function]
 271 | static void mtk_hdmi_hw_reset(struct mtk_hdmi *hdmi)
 | ^
   drivers/gpu/drm/mediatek/mtk_hdmi.c:266:13: warning: 
'mtk_hdmi_hw_aud_unmute' defined but not used [-Wunused-function]
 266 | static void mtk_hdmi_hw_aud_unmute(struct mtk_hdmi *hdmi)
 | ^~
   drivers/gpu/drm/mediatek/mtk_hdmi.c:261:13: warning: 'mtk_hdmi_hw_aud_mute' 
defined but not used [-Wunused-function]
 261 | static void mtk_hdmi_hw_aud_mute(struct mtk_hdmi *hdmi)
 | ^~~~
   drivers/gpu/drm/mediatek/mtk_hdmi.c:255:13: warning: 
'mtk_hdmi_hw_1p4_version_enable' defined but not used [-Wunused-function]
 255 | static void mtk_hdmi_hw_1p4_version_enable(struct mtk_hdmi *hdmi, 
bool enable)
 | ^~~

Re: BPFilter: bit size mismatch between bpfiter_umh and vmliux

2020-04-30 Thread Masahiro Yamada
Hi Alexei,

On Wed, Apr 29, 2020 at 1:14 AM Alexei Starovoitov
 wrote:
> >
> > At least, the build was successful,
> > but does this work at runtime?
> >
> > If this is a bug, I can fix it cleanly.
> >
> > I think the bit size of the user mode helper
> > should match to the kernel bit size. Is this correct?
>
> yes. they should match.
> In theory we can have -m32 umh running on 64-bit kernel,
> but I wouldn't bother adding support for such thing
> until there is a use case.
> Running 64-bit umh on 32-bit kernel is no go.


Thanks for the comments.


This issue will be fixed by this:
https://patchwork.kernel.org/patch/11515997/

and the Makefile will be cleaned up by this:
https://patchwork.kernel.org/patch/11515995/


They are parts of the big series of Makefile cleanups.
So, I will apply the whole to kbuild tree.

Thanks.

--
Best Regards
Masahiro Yamada


Re: [PATCH v3 1/5] fs: Add support for a RESOLVE_MAYEXEC flag on openat2(2)

2020-04-30 Thread James Morris
On Tue, 28 Apr 2020, Mickaël Salaün wrote:

> When the RESOLVE_MAYEXEC flag is passed, openat2(2) may be subject to
> additional restrictions depending on a security policy managed by the
> kernel through a sysctl or implemented by an LSM thanks to the
> inode_permission hook.


Reviewed-by: James Morris 


-- 
James Morris



linux-next: build warning after merge of the sound tree

2020-04-30 Thread Stephen Rothwell
Hi all,

After merging the sound tree, today's linux-next build (arm
multi_v7_defconfig) produced this warning:

sound/pci/hda/patch_realtek.c: In function 'alc_fixup_hp_gpio_led':
sound/pci/hda/patch_realtek.c:4134:6: warning: unused variable 'err' 
[-Wunused-variable]
 4134 |  int err;
  |  ^~~

Introduced by commit

  87dc36482cab ("ALSA: hda/realtek - Add LED class support for micmute LED")

-- 
Cheers,
Stephen Rothwell


pgpvchfDJI3EV.pgp
Description: OpenPGP digital signature


Re: [PATCH v3 2/5] fs: Add a MAY_EXECMOUNT flag to infer the noexec mount property

2020-04-30 Thread James Morris
On Tue, 28 Apr 2020, Mickaël Salaün wrote:

> An LSM doesn't get path information related to an access request to open
> an inode.  This new (internal) MAY_EXECMOUNT flag enables an LSM to
> check if the underlying mount point of an inode is marked as executable.
> This is useful to implement a security policy taking advantage of the
> noexec mount option.
> 
> This flag is set according to path_noexec(), which checks if a mount
> point is mounted with MNT_NOEXEC or if the underlying superblock is
> SB_I_NOEXEC.
> 
> Signed-off-by: Mickaël Salaün 
> Reviewed-by: Philippe Trébuchet 
> Reviewed-by: Thibaut Sautereau 
> Cc: Aleksa Sarai 
> Cc: Al Viro 
> Cc: Kees Cook 

Are there any existing LSMs which plan to use this aspect?

-- 
James Morris


Re: linux-next: manual merge of the akpm-current tree with the bpf-next tree

2020-04-30 Thread Al Viro
On Wed, Apr 29, 2020 at 06:24:06PM +1000, Stephen Rothwell wrote:
> Hi Christoph,
> 
> On Wed, 29 Apr 2020 08:54:04 +0200 Christoph Hellwig  wrote:
> >
> > On Tue, Apr 28, 2020 at 11:49:34PM -0700, Alexei Starovoitov wrote:
> > > On Tue, Apr 28, 2020 at 11:47 PM Christoph Hellwig  wrote:  
> > > >
> > > > On Wed, Apr 29, 2020 at 04:45:07PM +1000, Stephen Rothwell wrote:  
> > > > >
> > > > > Today's linux-next merge of the akpm-current tree got a conflict in:
> > > > >
> > > > >   kernel/sysctl.c
> > > > >
> > > > > between commit:
> > > > >
> > > > >   f461d2dcd511 ("sysctl: avoid forward declarations")
> > > > >
> > > > > from the bpf-next tree and commits:  
> > > >
> > > > Hmm, the above should have gone in through Al..  
> > > 
> > > Al pushed them into vfs tree and we pulled that tag into bpf-next.  
> > 
> > Ok.  And Stephen pulled your tree first.
> 
> No, it is not in the branch I fetch from Al yet.

Now it is...


Re: [PATCH -next] hinic: Use kmemdup instead of kzalloc and memcpy

2020-04-30 Thread David Miller
From: Zou Wei 
Date: Wed, 29 Apr 2020 11:35:28 +0800

> Fixes coccicheck warnings:
> 
>  drivers/net/ethernet/huawei/hinic/hinic_hw_mbox.c:452:17-24: WARNING 
> opportunity for kmemdup
>  drivers/net/ethernet/huawei/hinic/hinic_hw_mbox.c:458:23-30: WARNING 
> opportunity for kmemdup
> 
> Reported-by: Hulk Robot 
> Signed-off-by: Zou Wei 

Applied.


Re: [PATCH] pipe: read/write_iter() handler should check for IOCB_NOWAIT

2020-04-30 Thread Al Viro
On Thu, Apr 30, 2020 at 10:24:46AM -0600, Jens Axboe wrote:
> Pipe read/write only checks for the file O_NONBLOCK flag, but we should
> also check for IOCB_NOWAIT for whether or not we should handle this read
> or write in a non-blocking fashion. If we don't, then we will block on
> data or space for iocbs that explicitly asked for non-blocking
> operation. This messes up callers that explicitly ask for non-blocking
> operations.

Why does io_uring allow setting IOCB_NOWAIT without FMODE_NOWAIT, anyway?


Re: linux-next: Tree for Apr 30 (objtool warnings)

2020-04-30 Thread Josh Poimboeuf
On Thu, Apr 30, 2020 at 07:40:52AM -0700, Randy Dunlap wrote:
> On 4/30/20 7:31 AM, Randy Dunlap wrote:
> > On 4/30/20 12:40 AM, Stephen Rothwell wrote:
> >> Hi all,
> >>
> >> Changes since 20200429:
> >>
> > 
> > on x86_64:
> > 
> > kernel/exit.o: warning: objtool: __x64_sys_exit_group()+0x2e: unreachable 
> > instruction
> > kernel/cred.o: warning: objtool: copy_creds()+0x278: unreachable instruction
> > net/core/skbuff.o: warning: objtool: skb_push()+0x9e: unreachable 
> > instruction
> > 
> > 
> > Full randconfig file is attached.
> > 
> 
> oops, forgot to Cc: Josh and Peter.

These are all related to -flive-patching, for some reason it reduces
GCC's ability to detect local noreturn functions.  I had some patches
for these (and more) I'll need to dig up.

-- 
Josh



Re: [PATCH net-next v2 3/4] net: phy: bcm54140: apply the workaround on b0 chips

2020-04-30 Thread David Miller
From: Michael Walle 
Date: Wed, 29 Apr 2020 01:06:58 +0200

> The lower three bits of the phy_id specifies the chip stepping. The
> workaround is specifically for the B0 stepping. Apply it only on these
> chips.
> 
> Signed-off-by: Michael Walle 
> Reviewed-by: Andrew Lunn 
> Reviewed-by: Florian Fainelli 

Applied.


Re: [PATCH net-next v2 4/4] net: phy: bcm54140: add second PHY ID

2020-04-30 Thread David Miller
From: Michael Walle 
Date: Wed, 29 Apr 2020 01:06:59 +0200

> This PHY has two PHY IDs depending on its mode. Adjust the mask so that
> it includes both IDs.
> 
> Signed-off-by: Michael Walle 

Applied.


Re: [PATCH net-next 3/3] ptp: ptp_clockmatrix: Add adjphase() to support PHC write phase mode.

2020-04-30 Thread Richard Cochran
On Wed, Apr 29, 2020 at 08:28:25PM -0400, vincent.cheng...@renesas.com wrote:
> @@ -871,6 +880,69 @@ static int idtcm_set_pll_mode(struct idtcm_channel 
> *channel,
>  
>  /* PTP Hardware Clock interface */
>  
> +/**
> + * @brief Maximum absolute value for write phase offset in picoseconds
> + *
> + * Destination signed register is 32-bit register in resolution of 50ps
> + *
> + * 0x7fff * 50 =  2147483647 * 50 = 107374182350
> + */
> +static int _idtcm_adjphase(struct idtcm_channel *channel, s32 deltaNs)
> +{
> + struct idtcm *idtcm = channel->idtcm;
> +
> + int err;
> + u8 i;
> + u8 buf[4] = {0};
> + s32 phaseIn50Picoseconds;
> + s64 phaseOffsetInPs;

Kernel coding style uses lower_case_underscores rather than lowerCamelCase.

> +
> + if (channel->pll_mode != PLL_MODE_WRITE_PHASE) {
> +
> + kthread_cancel_delayed_work_sync(
> + >write_phase_ready_work);
> +
> + err = idtcm_set_pll_mode(channel, PLL_MODE_WRITE_PHASE);
> +
> + if (err)
> + return err;
> +
> + channel->write_phase_ready = 0;
> +
> + kthread_queue_delayed_work(channel->kworker,
> +>write_phase_ready_work,
> +msecs_to_jiffies(WR_PHASE_SETUP_MS));

Each PHC driver automatically has a kworker provided by the class
layer.  In order to use it, set ptp_clock_info.do_aux_work to your
callback function and then call ptp_schedule_worker() when needed.

See drivers/net/ethernet/ti/cpts.c for example.

Thanks,
Richard



Re: [PATCH net-next v2 2/4] net: phy: bcm54140: fix phy_id_mask

2020-04-30 Thread David Miller
From: Michael Walle 
Date: Wed, 29 Apr 2020 01:06:57 +0200

> Broadcom defines the bits for this PHY as follows:
>   { oui[24:3], model[6:0], revision[2:0] }
> 
> Thus we have to mask the lower three bits only.
> 
> Fixes: 6937602ed3f9 ("net: phy: add Broadcom BCM54140 support")
> Signed-off-by: Michael Walle 
> Reviewed-by: Andrew Lunn 
> Reviewed-by: Florian Fainelli 

Applied.


Re: [PATCH net-next v2 1/4] net: phy: bcm54140: use genphy_soft_reset()

2020-04-30 Thread David Miller
From: Michael Walle 
Date: Wed, 29 Apr 2020 01:06:56 +0200

> Set the .soft_reset() op to be sure there will be a reset even if there
> is no hardware reset line registered.
> 
> Signed-off-by: Michael Walle 
> Reviewed-by: Florian Fainelli 

Applied.


Re: [PATCH net-next] net: phy: at803x: add downshift support

2020-04-30 Thread David Miller
From: Michael Walle 
Date: Tue, 28 Apr 2020 23:15:02 +0200

> The AR8031 and AR8035 support the link speed downshift. Add driver
> support for it. One peculiarity of these PHYs is that it needs a
> software reset after changing the setting, thus add the .soft_reset()
> op and do a phy_init_hw() if necessary.
> 
> This was tested on a custom board with the AR8031.
> 
> Signed-off-by: Michael Walle 

Applied, thank you.


Re: [PATCH v3 0/5] Add support for RESOLVE_MAYEXEC

2020-04-30 Thread James Morris
On Tue, 28 Apr 2020, Mickaël Salaün wrote:

> Furthermore, the security policy can also be delegated to an LSM, either
> a MAC system or an integrity system.  For instance, the new kernel
> MAY_OPENEXEC flag closes a major IMA measurement/appraisal interpreter
> integrity gap by bringing the ability to check the use of scripts [1].
> Other uses are expected, such as for openat2(2) [2], SGX integration
> [3], bpffs [4] or IPE [5].

Confirming that this is a highly desirable feature for the proposed IPE 
LSM.

-- 
James Morris



Re: [PATCH] capabilities: add description for CAP_SETFCAP

2020-04-30 Thread James Morris
On Tue, 28 Apr 2020, Stefan Hajnoczi wrote:

> On Tue, Apr 14, 2020 at 04:49:45PM +0100, Stefan Hajnoczi wrote:
> > Document the purpose of CAP_SETFCAP.  For some reason this capability
> > had no description while the others did.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  include/uapi/linux/capability.h | 2 ++
> >  1 file changed, 2 insertions(+)
> 
> Ping?

Please resend and I'll add it to my tree.

> 
> > diff --git a/include/uapi/linux/capability.h 
> > b/include/uapi/linux/capability.h
> > index 272dc69fa080..7288f0ad44af 100644
> > --- a/include/uapi/linux/capability.h
> > +++ b/include/uapi/linux/capability.h
> > @@ -332,6 +332,8 @@ struct vfs_ns_cap_data {
> >  
> >  #define CAP_AUDIT_CONTROL30
> >  
> > +/* Set or remove capabilities on files */
> > +
> >  #define CAP_SETFCAP 31
> >  
> >  /* Override MAC access.
> > -- 
> > 2.25.1
> > 
> 

-- 
James Morris




Re: [PATCH v4 1/6] arch/x86/kvm: Refactor l1d flush lifecycle management

2020-04-30 Thread Singh, Balbir
On Sat, 2020-04-25 at 11:49 +1000, Balbir Singh wrote:
> On Fri, 2020-04-24 at 13:59 -0500, Tom Lendacky wrote:
> > 
> > On 4/23/20 9:01 AM, Balbir Singh wrote:
> > > Split out the allocation and free routines to be used in a follow
> > > up set of patches (to reuse for L1D flushing).
> > > 
> > > Signed-off-by: Balbir Singh 
> > > Reviewed-by: Kees Cook 
> > > ---
> > >   arch/x86/include/asm/cacheflush.h |  3 +++
> > >   arch/x86/kernel/Makefile  |  1 +
> > >   arch/x86/kernel/l1d_flush.c   | 36 +++
> > >   arch/x86/kvm/vmx/vmx.c| 25 +++--
> > >   4 files changed, 43 insertions(+), 22 deletions(-)
> > >   create mode 100644 arch/x86/kernel/l1d_flush.c
> > > 
> > > diff --git a/arch/x86/include/asm/cacheflush.h
> > > b/arch/x86/include/asm/cacheflush.h
> > > index 63feaf2a5f93..bac56fcd9790 100644
> > > --- a/arch/x86/include/asm/cacheflush.h
> > > +++ b/arch/x86/include/asm/cacheflush.h
> > > @@ -6,6 +6,9 @@
> > >   #include 
> > >   #include 
> > > 
> > > +#define L1D_CACHE_ORDER 4
> > 
> > Since this is becoming a generic function now, shouldn't this value be
> > based on the actual L1D cache size? Is this value based on a 32KB data
> > cache and the idea is to write twice the size of the cache to be sure that
> > every entry has been replaced - with the second 32KB to catch the odd line
> > that might not have been pulled in?
> > 
> 
> Currently the only users are VMX L1TF and optional prctl(). It should be
> based
> on actual L1D cache size, I checked a little bit and the largest L1D cache
> size across various x86 bits is 64K. so there are three options here:
> 
> 1. We refactor the code, we would need to save the L1D cache size and use
> cpu_dev callbacks for L1D flush
> 2. We can make the current code depend on L1D_FLUSH MSR and enable it only
> when that feature is available. There would be no software fallback. Then
> follow it up with #1
> 3. We keep the code as is on the assumption that all of L1D <= 64K across
> the
> current platforms and we do #1 in a followup (since the prctl is optional
> and
> the only other user is the VMX code).
> 
> Thanks for the review,
> Balbir Singh.
> 

Tom,

I have the following changes that I think will suffice for now (these are not
properly formatted, but you get the idea)

diff --git a/arch/x86/kernel/l1d_flush.c b/arch/x86/kernel/l1d_flush.c
index a754b6c288a9..7fec0cc8f85c 100644
--- a/arch/x86/kernel/l1d_flush.c
+++ b/arch/x86/kernel/l1d_flush.c
@@ -92,6 +92,9 @@ int l1d_flush_init_once(void)
 {
int ret = 0;
 
+   if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+   return -ENOTSUPP;
+
if (static_cpu_has(X86_FEATURE_FLUSH_L1D) || l1d_flush_pages)
return ret;


Does that satisfy your comments about patch 1/6 and 2/6 in the series?

Balbir Singh.


Re: [PATCH bpf] security: Fix the default value of fs_context_parse_param hook

2020-04-30 Thread James Morris
On Thu, 30 Apr 2020, KP Singh wrote:

> From: KP Singh 
> 

Applied to:
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security.git 
for-v5.7


-- 
James Morris




linux-next: manual merge of the drm tree with the drm-misc-fixes tree

2020-04-30 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the drm tree got a conflict in:

  include/linux/dma-buf.h

between commit:

  6f49c2515e22 ("dma-buf: fix documentation build warnings")

from the drm-misc-fixes tree and commit:

  09606b5446c2 ("dma-buf: add peer2peer flag")

from the drm tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc include/linux/dma-buf.h
index 57bcef6f988a,82e0a4a64601..
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@@ -333,8 -334,16 +333,16 @@@ struct dma_buf 
   * Attachment operations implemented by the importer.
   */
  struct dma_buf_attach_ops {
+   /**
+* @allow_peer2peer:
+*
+* If this is set to true the importer must be able to handle peer
+* resources without struct pages.
+*/
+   bool allow_peer2peer;
+ 
/**
 -   * @move_notify
 +   * @move_notify: [optional] notification that the DMA-buf is moving
 *
 * If this callback is provided the framework can avoid pinning the
 * backing store while mappings exists.


pgpWjGoepBcYY.pgp
Description: OpenPGP digital signature


Re: [PATCH net-next 2/3] ptp: Add adjust_phase to ptp_clock_caps capability.

2020-04-30 Thread Richard Cochran
On Wed, Apr 29, 2020 at 08:28:24PM -0400, vincent.cheng...@renesas.com wrote:
> From: Vincent Cheng 
> 
> Add adjust_phase to ptp_clock_caps capability to allow
> user to query if a PHC driver supports adjust phase with
> ioctl PTP_CLOCK_GETCAPS command.
> 
> Signed-off-by: Vincent Cheng 

Reviewed-by: Richard Cochran 


Re: [PATCH net-next 1/3] ptp: Add adjphase function to support phase offset control.

2020-04-30 Thread Richard Cochran
On Wed, Apr 29, 2020 at 08:28:23PM -0400, vincent.cheng...@renesas.com wrote:
> diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
> index acabbe7..c46ff98 100644
> --- a/drivers/ptp/ptp_clock.c
> +++ b/drivers/ptp/ptp_clock.c
> @@ -146,6 +146,8 @@ static int ptp_clock_adjtime(struct posix_clock *pc, 
> struct __kernel_timex *tx)
>   else
>   err = ops->adjfreq(ops, ppb);
>   ptp->dialed_frequency = tx->freq;
> + } else if (tx->modes & ADJ_OFFSET) {
> + err = ops->adjphase(ops, tx->offset);

This is a new method, and no drivers have it, so there must be a check
that the function pointer is non-null.

Thanks,
Richard


Re: [PATCH RFC] Kbuild: Makefile: warn if auto.conf is obsolete

2020-04-30 Thread Masahiro Yamada
On Fri, May 1, 2020 at 4:25 AM Mauro Carvalho Chehab
 wrote:
>
> A new behavior on more recent kernels require to always call
> "make modules_prepare" after *any* Kconfig changes.


Again, this is the behavior since 2004.

This commit:
https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=95065ad3fa787c417008a36d3a5d9a3bab17ab98


Shrug if you complain about what has been stable
more than 15 years.


> This is not what a poor mortal would be expecting on a building
> system, as it should, IMHO, be able to detect and auto-run
> whatever is needed to use the newer setup.

No. External module builds should never ever attempt to update
in-tree files.

This is because the build environment for external modules
is usually located in /lib/modules/$(uname -r)/build/,
which is read-only.


A number of upstream developers (ab)use
M= to compile test individual directories,
despite the fact Kbuild supports
the single target 'make drivers/staging/media/stomisp/'



You need to cope with this conflicting comment line:
https://github.com/masahir0y/linux/blob/v5.6/Makefile#L681
since you care if auto.conf is up-to-date.





> Yet, while this is not solved, let's at least stop the build
> and produce a warning, to notify the user about that.
>
> Signed-off-by: Mauro Carvalho Chehab 
> ---
>
> I would still prefer to call "make modules_prepare" directly,
> on such cases, but just calling "make -C . modules_prepare" doesn't
> work. So, the next best thing would be to at least print a message
> and don't try to do a build with a broken auto.conf file.
>
>  Makefile | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/Makefile b/Makefile
> index 70def4907036..492ee2396ab9 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1632,6 +1632,11 @@ $(objtree)/Module.symvers:
>  build-dirs := $(KBUILD_EXTMOD)
>  PHONY += modules
>  modules: descend $(objtree)/Module.symvers
> +   @if [ $(KCONFIG_CONFIG) -nt include/config/auto.conf ]; then \
> +   echo "  WARNING: $(KCONFIG_CONFIG) was modified. Need to 
> run:"; \
> +   echo "   $(MAKE) modules_prepare"; \
> +   exit -1; \
> +   fi
> $(Q)$(MAKE) -f $(srctree)/scripts/Makefile.modpost
>
>  PHONY += modules_install








--
Best Regards
Masahiro Yamada


Re: [PATCH] dpaa_eth: Fix comparing pointer to 0

2020-04-30 Thread David Miller
From: Aishwarya Ramakrishnan 
Date: Mon, 27 Apr 2020 16:02:30 +0530

> Fixes coccicheck warning:
> ./drivers/net/ethernet/freescale/dpaa/dpaa_eth.c:2110:30-31:
> WARNING comparing pointer to 0
> 
> Avoid pointer type value compared to 0.
> 
> Signed-off-by: Aishwarya Ramakrishnan 

Applied, thanks.


Re: linux-next: manual merge of the mlx5-next tree with the kspp-gustavo tree

2020-04-30 Thread Gustavo A. R. Silva
Hi Stephen,

On 4/30/20 22:12, Stephen Rothwell wrote:
> Hi all,
> 
> On Wed, 29 Apr 2020 12:06:25 +1000 Stephen Rothwell  
> wrote:
>>
>> Today's linux-next merge of the mlx5-next tree got a conflict in:
>>
>>   include/linux/mlx5/mlx5_ifc.h
>>
>> between commit:
>>
>>   3ba225b506a2 ("treewide: Replace zero-length array with flexible-array 
>> member")
>>
>> from the kspp-gustavo tree and commit:
>>
>>   d65dbedfd298 ("net/mlx5: Add support for COPY steering action")
>>
>> from the mlx5-next tree.
>>
>> I fixed it up (see below) and can carry the fix as necessary. This
>> is now fixed as far as linux-next is concerned, but any non trivial
>> conflicts should be mentioned to your upstream maintainer when your tree
>> is submitted for merging.  You may also want to consider cooperating
>> with the maintainer of the conflicting tree to minimise any particularly
>> complex conflicts.
>>
>> -- 
>> Cheers,
>> Stephen Rothwell
>>
>> diff --cc include/linux/mlx5/mlx5_ifc.h
>> index 8d30f18dcdee,fb243848132d..
>> --- a/include/linux/mlx5/mlx5_ifc.h
>> +++ b/include/linux/mlx5/mlx5_ifc.h
>> @@@ -5743,7 -5771,7 +5771,7 @@@ struct mlx5_ifc_alloc_modify_header_con
>>  u8 reserved_at_68[0x10];
>>  u8 num_of_actions[0x8];
>>   
>> -union mlx5_ifc_set_action_in_add_action_in_auto_bits actions[];
>>  -   union mlx5_ifc_set_add_copy_action_in_auto_bits actions[0];
>> ++   union mlx5_ifc_set_add_copy_action_in_auto_bits actions[];
>>   };
>>   
>>   struct mlx5_ifc_dealloc_modify_header_context_out_bits {
>> @@@ -9677,9 -9705,32 +9705,32 @@@ struct mlx5_ifc_mcda_reg_bits 
>>   
>>  u8 reserved_at_60[0x20];
>>   
>>  -   u8 data[0][0x20];
>>  +   u8 data[][0x20];
>>   };
>>   
>> + enum {
>> +MLX5_MFRL_REG_RESET_TYPE_FULL_CHIP = BIT(0),
>> +MLX5_MFRL_REG_RESET_TYPE_NET_PORT_ALIVE = BIT(1),
>> + };
>> + 
>> + enum {
>> +MLX5_MFRL_REG_RESET_LEVEL0 = BIT(0),
>> +MLX5_MFRL_REG_RESET_LEVEL3 = BIT(3),
>> +MLX5_MFRL_REG_RESET_LEVEL6 = BIT(6),
>> + };
>> + 
>> + struct mlx5_ifc_mfrl_reg_bits {
>> +u8 reserved_at_0[0x20];
>> + 
>> +u8 reserved_at_20[0x2];
>> +u8 pci_sync_for_fw_update_start[0x1];
>> +u8 pci_sync_for_fw_update_resp[0x2];
>> +u8 rst_type_sel[0x3];
>> +u8 reserved_at_28[0x8];
>> +u8 reset_type[0x8];
>> +u8 reset_level[0x8];
>> + };
>> + 
>>   struct mlx5_ifc_mirc_reg_bits {
>>  u8 reserved_at_0[0x18];
>>  u8 status_code[0x8];
> 
> This is now a conflict between the net-next and kspp-gustavo trees.
> 

Thanks for reporting this. I think the best solution, for now, is to remove the
changes from my tree. I'll do it right away.

Thanks
--
Gustavo



Re: [PATCH] net: moxa: Fix a potential double 'free_irq()'

2020-04-30 Thread David Miller
From: Christophe JAILLET 
Date: Sun, 26 Apr 2020 22:59:21 +0200

> Should an irq requested with 'devm_request_irq' be released explicitly,
> it should be done by 'devm_free_irq()', not 'free_irq()'.
> 
> Fixes: 6c821bd9edc9 ("net: Add MOXA ART SoCs ethernet driver")
> Signed-off-by: Christophe JAILLET 

Applied, thank you.


Re: [PATCH] platform/chrome: cros_ec_typec: Handle NULL EC pointer during probe.

2020-04-30 Thread Daniil Lunev
Hi Prashant,
I do not think it is present. Thinking about it, I do not think it
shall be an issue on any released device as it will have either a
firmware which wouldn't even trigger the typec probe or the one after
the hierarchy fix. Likely I just got a firmware which was somewhere in
between those two (As I did some unrelated FW testing). So, yes,
probably putting this upstream is not necessary, though IMO more
sanity checks - especially on non-critical run-once paths - are always
better than having a kernel panic lingering around the corner, not
like I am insisting on pushing the patch though with all the info, up
to Enric.
Cheers,
Daniil

On Fri, May 1, 2020 at 10:56 AM Prashant Malani  wrote:
>
> Hi Daniil,
>
> On Fri, May 01, 2020 at 10:15:18AM +1000, Daniil Lunev wrote:
> > On the official revision of coreboot for hatch it doesn't even try to
> > load Type C. However it gives some warning messages from
> > cros-usbpd-notify-acpi about EC, So I wonder why the check of the same
> > type is not appropriate in the typec driver?
>
> I think the difference is that GOOG0003 is already present on shipped /
> official versions of coreboot (so not having that check can cause
> existing release images/devices to crash), whereas for GOOG0014 that is / 
> isn't the case.
>
> Is GOOG0014 present on the official release coreboot image for this
> device? If so, what's its path (/sys/bus/acpi/devices//path) ?
>
> Best regards,
>
> -Prashant
> >
> > ../chrome/cros_usbpd_notify.c
> >
> > /* Get the EC device pointer needed to talk to the EC. */
> > ec_dev = dev_get_drvdata(dev->parent);
> > if (!ec_dev) {
> > /*
> > * We continue even for older devices which don't have the
> > * correct device heirarchy, namely, GOOG0003 is a child
> > * of GOOG0004.
> > */
> > dev_warn(dev, "Couldn't get Chrome EC device pointer.\n");
> > }
> >
> >
> > # dmesg
> > ...
> > [8.513351] cros-ec-spi spi-PRP0001:02: EC failed to respond in time
> > [8.722072] cros-ec-spi spi-PRP0001:02: EC failed to respond in time
> > [8.729271] cros-ec-spi spi-PRP0001:02: Cannot identify the EC: error 
> > -110
> > [8.736966] cros-ec-spi spi-PRP0001:02: cannot register EC,
> > fallback to spidev
> > [8.767017] cros_ec_lpcs GOOG0004:00: Chrome EC device registered
> > [8.807537] cros-usbpd-notify-acpi GOOG0003:00: Couldn't get Chrome
> > EC device pointer.
> > ...
> >
> > On Fri, May 1, 2020 at 2:17 AM Prashant Malani  wrote:
> > >
> > > Hi Enric,
> > >
> > > On Thu, Apr 30, 2020 at 8:26 AM Enric Balletbo i Serra
> > >  wrote:
> > > >
> > > > Hi Prashant,
> > > >
> > > > On 30/4/20 2:43, Prashant Malani wrote:
> > > > > On Wed, Apr 29, 2020 at 5:38 PM Daniil Lunev  
> > > > > wrote:
> > > > >>
> > > > >> [to make it appear on the mailing list as I didn't realize I was in
> > > > >> hypertext sending mode]
> > > > >>
> > > > >> On Thu, Apr 30, 2020 at 10:11 AM Daniil Lunev  
> > > > >> wrote:
> > > > >>>
> > > > >>> Hi Enric.
> > > > >>> I encountered the issue on a Hatch device when trying running 5.4 
> > > > >>> kernel on that. After talking to Prashant it seems that any device 
> > > > >>> with coreboot built before a certain point (a particular fix for 
> > > > >>> device hierarchy in ACPI tables of Chrome devices which happened in 
> > > > >>> mid-April) will not be able to correctly initialize the driver and 
> > > > >>> will get a kernel panic trying to do so.
> > > > >
> > > > > A clarifying detail here: This should not be seen in any current
> > > > > *production* device. No prod device firmware will carry the erroneous
> > > > > ACPI device entry.
> > > > >
> > > >
> > > > Thanks for the clarification. Then, I don't think we need to upstream 
> > > > this. This
> > > > kind of "defensive-programming" it's not something that should matter 
> > > > to upstream.
> > >
> > > Actually, on second thought, I am not 100% sure about this:
> > > Daniil, is the erroneous ACPI device on a *production* firmware for
> > > this device (I'm not sure about the vintage of that device's BIOS)?
> > >
> > > My apologies for the confusion, Enric and Daniil; but would be good to
> > > get clarification from Daniil.
> > >
> > > Best regards,
> > > >
> > > > Thanks,
> > > >  Enric
> > > >
> > > >
> > > > >>> Thanks,
> > > > >>> Daniil
> > > > >>>
> > > > >>> On Thu, Apr 30, 2020 at 7:58 AM Enric Balletbo i Serra 
> > > > >>>  wrote:
> > > > 
> > > >  Hi Daniil,
> > > > 
> > > >  Thank you for the patch.
> > > > 
> > > >  On 28/4/20 3:02, Daniil Lunev wrote:
> > > > > Missing EC in device hierarchy causes NULL pointer to be returned 
> > > > > to the
> > > > > probe function which leads to NULL pointer dereference when 
> > > > > trying to
> > > > > send a command to the EC. This can be the case if the device is 
> > > > > missing
> > > > > or incorrectly configured in the firmware blob. Even if the 
> > > > > situation
> > > > 
> > > >  There is any production device with a buggy 

Re: [PATCH V1 09/10] arch/kmap: Define kmap_atomic_prot() for all arch's

2020-04-30 Thread Al Viro
On Fri, May 01, 2020 at 03:37:34AM +0100, Al Viro wrote:
> On Thu, Apr 30, 2020 at 01:38:44PM -0700, ira.we...@intel.com wrote:
> 
> > -static inline void *kmap_atomic(struct page *page)
> > +static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot)
> >  {
> > preempt_disable();
> > pagefault_disable();
> > if (!PageHighMem(page))
> > return page_address(page);
> > -   return kmap_atomic_high(page);
> > +   return kmap_atomic_high_prot(page, prot);
> >  }
> > +#define kmap_atomic(page)  kmap_atomic_prot(page, kmap_prot)
> 
> OK, so it *was* just a bisect hazard - you return to original semantics
> wrt preempt_disable()...

FWIW, how about doing the following: just before #5/10 have a patch
that would touch only microblaze, ppc and x86 splitting their
kmap_atomic_prot() into an inline helper + kmap_atomic_high_prot().
Then your #5 would leave their kmap_atomic_prot() as-is (it would
use kmap_atomic_prot_high() instead).  The rest of the series plays
out pretty much the same way it does now, and wrappers on those
3 architectures would go away when an identical generic one is
introduced in this commit (#9/10).

AFAICS, that would avoid the bisect hazard and might even end
up with less noise in the patches...


[PATCH v3 1/3] powerpc/numa: Set numa_node for all possible cpus

2020-04-30 Thread Srikar Dronamraju
A Powerpc system with multiple possible nodes and with CONFIG_NUMA
enabled always used to have a node 0, even if node 0 does not any cpus
or memory attached to it. As per PAPR, node affinity of a cpu is only
available once its present / online. For all cpus that are possible but
not present, cpu_to_node() would point to node 0.

To ensure a cpuless, memoryless dummy node is not online, powerpc need
to make sure all possible but not present cpu_to_node are set to a
proper node.

Cc: linuxppc-...@lists.ozlabs.org
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: Michal Hocko 
Cc: Mel Gorman 
Cc: Vlastimil Babka 
Cc: "Kirill A. Shutemov" 
Cc: Christopher Lameter 
Cc: Michael Ellerman 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Cc: Gautham R Shenoy 
Signed-off-by: Srikar Dronamraju 
---
Changelog v1:->v2:
- Rebased to v5.7-rc3

 arch/powerpc/mm/numa.c | 16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 9fcf2d195830..b3615b7fdbdf 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -931,8 +931,20 @@ void __init mem_topology_setup(void)
 
reset_numa_cpu_lookup_table();
 
-   for_each_present_cpu(cpu)
-   numa_setup_cpu(cpu);
+   for_each_possible_cpu(cpu) {
+   /*
+* Powerpc with CONFIG_NUMA always used to have a node 0,
+* even if it was memoryless or cpuless. For all cpus that
+* are possible but not present, cpu_to_node() would point
+* to node 0. To remove a cpuless, memoryless dummy node,
+* powerpc need to make sure all possible but not present
+* cpu_to_node are set to a proper node.
+*/
+   if (cpu_present(cpu))
+   numa_setup_cpu(cpu);
+   else
+   set_cpu_numa_node(cpu, first_online_node);
+   }
 }
 
 void __init initmem_init(void)
-- 
2.20.1



Re: linux-next: manual merge of the mlx5-next tree with the kspp-gustavo tree

2020-04-30 Thread Stephen Rothwell
Hi all,

On Wed, 29 Apr 2020 12:06:25 +1000 Stephen Rothwell  
wrote:
>
> Today's linux-next merge of the mlx5-next tree got a conflict in:
> 
>   include/linux/mlx5/mlx5_ifc.h
> 
> between commit:
> 
>   3ba225b506a2 ("treewide: Replace zero-length array with flexible-array 
> member")
> 
> from the kspp-gustavo tree and commit:
> 
>   d65dbedfd298 ("net/mlx5: Add support for COPY steering action")
> 
> from the mlx5-next tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.
> 
> -- 
> Cheers,
> Stephen Rothwell
> 
> diff --cc include/linux/mlx5/mlx5_ifc.h
> index 8d30f18dcdee,fb243848132d..
> --- a/include/linux/mlx5/mlx5_ifc.h
> +++ b/include/linux/mlx5/mlx5_ifc.h
> @@@ -5743,7 -5771,7 +5771,7 @@@ struct mlx5_ifc_alloc_modify_header_con
>   u8 reserved_at_68[0x10];
>   u8 num_of_actions[0x8];
>   
> - union mlx5_ifc_set_action_in_add_action_in_auto_bits actions[];
>  -union mlx5_ifc_set_add_copy_action_in_auto_bits actions[0];
> ++union mlx5_ifc_set_add_copy_action_in_auto_bits actions[];
>   };
>   
>   struct mlx5_ifc_dealloc_modify_header_context_out_bits {
> @@@ -9677,9 -9705,32 +9705,32 @@@ struct mlx5_ifc_mcda_reg_bits 
>   
>   u8 reserved_at_60[0x20];
>   
>  -u8 data[0][0x20];
>  +u8 data[][0x20];
>   };
>   
> + enum {
> + MLX5_MFRL_REG_RESET_TYPE_FULL_CHIP = BIT(0),
> + MLX5_MFRL_REG_RESET_TYPE_NET_PORT_ALIVE = BIT(1),
> + };
> + 
> + enum {
> + MLX5_MFRL_REG_RESET_LEVEL0 = BIT(0),
> + MLX5_MFRL_REG_RESET_LEVEL3 = BIT(3),
> + MLX5_MFRL_REG_RESET_LEVEL6 = BIT(6),
> + };
> + 
> + struct mlx5_ifc_mfrl_reg_bits {
> + u8 reserved_at_0[0x20];
> + 
> + u8 reserved_at_20[0x2];
> + u8 pci_sync_for_fw_update_start[0x1];
> + u8 pci_sync_for_fw_update_resp[0x2];
> + u8 rst_type_sel[0x3];
> + u8 reserved_at_28[0x8];
> + u8 reset_type[0x8];
> + u8 reset_level[0x8];
> + };
> + 
>   struct mlx5_ifc_mirc_reg_bits {
>   u8 reserved_at_0[0x18];
>   u8 status_code[0x8];

This is now a conflict between the net-next and kspp-gustavo trees.

-- 
Cheers,
Stephen Rothwell


pgp0jN46tF6G_.pgp
Description: OpenPGP digital signature


[PATCH v3 2/3] powerpc/numa: Prefer node id queried from vphn

2020-04-30 Thread Srikar Dronamraju
Node id queried from the static device tree may not
be correct. For example: it may always show 0 on a shared processor.
Hence prefer the node id queried from vphn and fallback on the device tree
based node id if vphn query fails.

Cc: linuxppc-...@lists.ozlabs.org
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: Michal Hocko 
Cc: Mel Gorman 
Cc: Vlastimil Babka 
Cc: "Kirill A. Shutemov" 
Cc: Christopher Lameter 
Cc: Michael Ellerman 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Cc: Gautham R Shenoy 
Signed-off-by: Srikar Dronamraju 
---
Changelog v2:->v3:
- Resolved comments from Gautham.
Link v2: 
https://lore.kernel.org/linuxppc-dev/20200428093836.27190-1-sri...@linux.vnet.ibm.com/t/#u

Changelog v1:->v2:
- Rebased to v5.7-rc3

 arch/powerpc/mm/numa.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index b3615b7fdbdf..79c74a2753c1 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -719,21 +719,22 @@ static int __init parse_numa_properties(void)
 */
for_each_present_cpu(i) {
struct device_node *cpu;
-   int nid;
-
-   cpu = of_get_cpu_node(i, NULL);
-   BUG_ON(!cpu);
-   nid = of_node_to_nid_single(cpu);
-   of_node_put(cpu);
+   int nid = vphn_get_nid(i);
 
/*
 * Don't fall back to default_nid yet -- we will plug
 * cpus into nodes once the memory scan has discovered
 * the topology.
 */
-   if (nid < 0)
-   continue;
-   node_set_online(nid);
+   if (nid == NUMA_NO_NODE) {
+   cpu = of_get_cpu_node(i, NULL);
+   BUG_ON(!cpu);
+   nid = of_node_to_nid_single(cpu);
+   of_node_put(cpu);
+   }
+
+   if (likely(nid > 0))
+   node_set_online(nid);
}
 
get_n_mem_cells(_mem_addr_cells, _mem_size_cells);
-- 
2.20.1



[PATCH v3 0/3] Offline memoryless cpuless node 0

2020-04-30 Thread Srikar Dronamraju
Changelog v2:->v3:
- Resolved comments from Gautham.
Link v2: 
https://lore.kernel.org/linuxppc-dev/20200428093836.27190-1-sri...@linux.vnet.ibm.com/t/#u

Changelog v1:->v2:
- Rebased to v5.7-rc3
- Updated the changelog.
Link v1: 
https://lore.kernel.org/linuxppc-dev/2020030237.5731-1-sri...@linux.vnet.ibm.com/t/#u

Linux kernel configured with CONFIG_NUMA on a system with multiple
possible nodes, marks node 0 as online at boot. However in practice,
there are systems which have node 0 as memoryless and cpuless.

This can cause
1. numa_balancing to be enabled on systems with only one online node.
2. Existence of dummy (cpuless and memoryless) node which can confuse
users/scripts looking at output of lscpu / numactl.

This patchset wants to correct this anomaly.

This should only affect systems that have CONFIG_MEMORYLESS_NODES.
Currently there are only 2 architectures ia64 and powerpc that have this
config.

Note: Patch 3 in this patch series depends on patches 1 and 2.
Without patches 1 and 2, patch 3 might crash powerpc.

v5.7-rc3
 available: 2 nodes (0,2)
 node 0 cpus:
 node 0 size: 0 MB
 node 0 free: 0 MB
 node 2 cpus: 0 1 2 3 4 5 6 7
 node 2 size: 32625 MB
 node 2 free: 31490 MB
 node distances:
 node   0   2
   0:  10  20
   2:  20  10

proc and sys files
--
 /sys/devices/system/node/online:0,2
 /proc/sys/kernel/numa_balancing:1
 /sys/devices/system/node/has_cpu:   2
 /sys/devices/system/node/has_memory:2
 /sys/devices/system/node/has_normal_memory: 2
 /sys/devices/system/node/possible:  0-31

v5.7-rc3 + patches
--
 available: 1 nodes (2)
 node 2 cpus: 0 1 2 3 4 5 6 7
 node 2 size: 32625 MB
 node 2 free: 31487 MB
 node distances:
 node   2
   2:  10

proc and sys files
--
/sys/devices/system/node/online:2
/proc/sys/kernel/numa_balancing:0
/sys/devices/system/node/has_cpu:   2
/sys/devices/system/node/has_memory:2
/sys/devices/system/node/has_normal_memory: 2
/sys/devices/system/node/possible:  0-31

Cc: linuxppc-...@lists.ozlabs.org
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: Michal Hocko 
Cc: Mel Gorman 
Cc: Vlastimil Babka 
Cc: "Kirill A. Shutemov" 
Cc: Christopher Lameter 
Cc: Michael Ellerman 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Cc: Gautham R Shenoy 

Srikar Dronamraju (3):
  powerpc/numa: Set numa_node for all possible cpus
  powerpc/numa: Prefer node id queried from vphn
  mm/page_alloc: Keep memoryless cpuless node 0 offline

 arch/powerpc/mm/numa.c | 35 ---
 mm/page_alloc.c|  4 +++-
 2 files changed, 27 insertions(+), 12 deletions(-)

-- 
2.20.1



[PATCH v3 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline

2020-04-30 Thread Srikar Dronamraju
Currently Linux kernel with CONFIG_NUMA on a system with multiple
possible nodes, marks node 0 as online at boot.  However in practice,
there are systems which have node 0 as memoryless and cpuless.

This can cause numa_balancing to be enabled on systems with only one node
with memory and CPUs. The existence of this dummy node which is cpuless and
memoryless node can confuse users/scripts looking at output of lscpu /
numactl.

By marking, N_ONLINE as NODE_MASK_NONE, lets stop assuming that Node 0 is
always online.

v5.7-rc3
 available: 2 nodes (0,2)
 node 0 cpus:
 node 0 size: 0 MB
 node 0 free: 0 MB
 node 2 cpus: 0 1 2 3 4 5 6 7
 node 2 size: 32625 MB
 node 2 free: 31490 MB
 node distances:
 node   0   2
   0:  10  20
   2:  20  10

proc and sys files
--
 /sys/devices/system/node/online:0,2
 /proc/sys/kernel/numa_balancing:1
 /sys/devices/system/node/has_cpu:   2
 /sys/devices/system/node/has_memory:2
 /sys/devices/system/node/has_normal_memory: 2
 /sys/devices/system/node/possible:  0-31

v5.7-rc3 + patch
--
 available: 1 nodes (2)
 node 2 cpus: 0 1 2 3 4 5 6 7
 node 2 size: 32625 MB
 node 2 free: 31487 MB
 node distances:
 node   2
   2:  10

proc and sys files
--
/sys/devices/system/node/online:2
/proc/sys/kernel/numa_balancing:0
/sys/devices/system/node/has_cpu:   2
/sys/devices/system/node/has_memory:2
/sys/devices/system/node/has_normal_memory: 2
/sys/devices/system/node/possible:  0-31

Note: On Powerpc, cpu_to_node of possible but not present cpus would
previously return 0. Hence this commit depends on commit ("powerpc/numa: Set
numa_node for all possible cpus") and commit ("powerpc/numa: Prefer node id
queried from vphn"). Without the 2 commits, Powerpc system might crash.

Cc: linuxppc-...@lists.ozlabs.org
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: Michal Hocko 
Cc: Mel Gorman 
Cc: Vlastimil Babka 
Cc: "Kirill A. Shutemov" 
Cc: Christopher Lameter 
Cc: Michael Ellerman 
Cc: Andrew Morton 
Cc: Linus Torvalds 
Cc: Gautham R Shenoy 
Signed-off-by: Srikar Dronamraju 
---
Changelog v1:->v2:
- Rebased to v5.7-rc3
Link v2: 
https://lore.kernel.org/linuxppc-dev/20200428093836.27190-1-sri...@linux.vnet.ibm.com/t/#u

 mm/page_alloc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 69827d4fa052..03b89592af04 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -116,8 +116,10 @@ EXPORT_SYMBOL(latent_entropy);
  */
 nodemask_t node_states[NR_NODE_STATES] __read_mostly = {
[N_POSSIBLE] = NODE_MASK_ALL,
+#ifdef CONFIG_NUMA
+   [N_ONLINE] = NODE_MASK_NONE,
+#else
[N_ONLINE] = { { [0] = 1UL } },
-#ifndef CONFIG_NUMA
[N_NORMAL_MEMORY] = { { [0] = 1UL } },
 #ifdef CONFIG_HIGHMEM
[N_HIGH_MEMORY] = { { [0] = 1UL } },
-- 
2.20.1



KASAN: slab-out-of-bounds Read in gadget_dev_desc_UDC_store

2020-04-30 Thread Kyungtae Kim
We report a bug (in linux-5.6.8) found by FuzzUSB (a modified version
of syzkaller).

This happened when the size of "name" buffer is smaller than that of
"page" buffer
(after function kstrdup executed at line 263).
I guess it comes from the "page" buffer containing 0 value in the middle.
So accessing the "name" buffer with "len" variable, which is used to
indicate the size of "page" buffer,
triggered memory access violation.
To fix, it may need to check the size of name buffer, and try to use
right index variable.

kernel config: https://kt0755.github.io/etc/config_v5.6.8

==
BUG: KASAN: slab-out-of-bounds in
gadget_dev_desc_UDC_store+0x1ba/0x200
drivers/usb/gadget/configfs.c:266
Read of size 1 at addr 88806a55dd7e by task syz-executor.0/17208

CPU: 2 PID: 17208 Comm: syz-executor.0 Not tainted 5.6.8 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xce/0x128 lib/dump_stack.c:118
 print_address_description.constprop.4+0x21/0x3c0 mm/kasan/report.c:374
 __kasan_report+0x131/0x1b0 mm/kasan/report.c:506
 kasan_report+0x12/0x20 mm/kasan/common.c:641
 __asan_report_load1_noabort+0x14/0x20 mm/kasan/generic_report.c:132
 gadget_dev_desc_UDC_store+0x1ba/0x200 drivers/usb/gadget/configfs.c:266
 flush_write_buffer fs/configfs/file.c:251 [inline]
 configfs_write_file+0x2f1/0x4c0 fs/configfs/file.c:283
 __vfs_write+0x85/0x110 fs/read_write.c:494
 vfs_write+0x1cd/0x510 fs/read_write.c:558
 ksys_write+0x18a/0x220 fs/read_write.c:611
 __do_sys_write fs/read_write.c:623 [inline]
 __se_sys_write fs/read_write.c:620 [inline]
 __x64_sys_write+0x73/0xb0 fs/read_write.c:620
 do_syscall_64+0x9e/0x510 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x452149
Code: 2d 61 fc ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48
89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
01 f0 ff ff 0f 83 fb 60 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:7f3bd907cc78 EFLAGS: 0246 ORIG_RAX: 0001
RAX: ffda RBX: 0073c0f8 RCX: 00452149
RDX: fed8 RSI: 23c0 RDI: 0003
RBP: 0003 R08:  R09: 
R10:  R11: 0246 R12: 004bf782
R13: 004d7710 R14: 7f3bd907d6d4 R15: 

Allocated by task 1:
 save_stack+0x21/0x90 mm/kasan/common.c:72
 set_track mm/kasan/common.c:80 [inline]
 __kasan_kmalloc.constprop.3+0xa7/0xd0 mm/kasan/common.c:515
 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:529
 __kmalloc+0x11c/0x310 mm/slub.c:3841
 kmalloc include/linux/slab.h:560 [inline]
 kzalloc include/linux/slab.h:669 [inline]
 acpi_os_allocate_zeroed+0x3e/0x42 include/acpi/platform/aclinuxex.h:57
 acpi_ns_internalize_name+0xd9/0x16a drivers/acpi/acpica/nsutils.c:331
 acpi_ns_get_node_unlocked+0x17e/0x1fe drivers/acpi/acpica/nsutils.c:666
 acpi_ns_get_node+0x44/0x62 drivers/acpi/acpica/nsutils.c:726
 acpi_ns_evaluate+0xc8/0x93e drivers/acpi/acpica/nseval.c:61
 acpi_ut_evaluate_object+0xe4/0x386 drivers/acpi/acpica/uteval.c:60
 acpi_ut_execute_power_methods+0xda/0x1b1 drivers/acpi/acpica/uteval.c:288
 acpi_get_object_info+0x487/0x994 drivers/acpi/acpica/nsxfname.c:366
 acpi_set_pnp_ids drivers/acpi/scan.c:1245 [inline]
 acpi_init_device_object+0xbfd/0x17a0 drivers/acpi/scan.c:1585
 acpi_add_single_object+0x121/0x1710 drivers/acpi/scan.c:1620
 acpi_bus_check_add+0x1c9/0x4f0 drivers/acpi/scan.c:1873
 acpi_ns_walk_namespace+0x1d3/0x320 drivers/acpi/acpica/nswalk.c:236
 acpi_walk_namespace+0xb5/0xef drivers/acpi/acpica/nsxfeval.c:606
 acpi_bus_scan+0xdf/0xf0 drivers/acpi/scan.c:2054
 acpi_scan_init+0x264/0x5e4 drivers/acpi/scan.c:2218
 acpi_init+0x592/0x612 drivers/acpi/bus.c:1249
 do_one_initcall+0xe0/0x650 init/main.c:1152
 do_initcall_level init/main.c:1225 [inline]
 do_initcalls init/main.c:1241 [inline]
 do_basic_setup init/main.c:1261 [inline]
 kernel_init_freeable+0x5e8/0x67c init/main.c:1445
 kernel_init+0x13/0x1b0 init/main.c:1352
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

Freed by task 1:
 save_stack+0x21/0x90 mm/kasan/common.c:72
 set_track mm/kasan/common.c:80 [inline]
 kasan_set_free_info mm/kasan/common.c:337 [inline]
 __kasan_slab_free+0x135/0x190 mm/kasan/common.c:476
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:485
 slab_free_hook mm/slub.c:1444 [inline]
 slab_free_freelist_hook mm/slub.c:1477 [inline]
 slab_free mm/slub.c:3034 [inline]
 kfree+0xf7/0x410 mm/slub.c:3995
 acpi_os_free include/acpi/platform/aclinuxex.h:62 [inline]
 acpi_ns_get_node_unlocked+0x1c8/0x1fe drivers/acpi/acpica/nsutils.c:686
 acpi_ns_get_node+0x44/0x62 drivers/acpi/acpica/nsutils.c:726
 acpi_ns_evaluate+0xc8/0x93e drivers/acpi/acpica/nseval.c:61
 acpi_ut_evaluate_object+0xe4/0x386 drivers/acpi/acpica/uteval.c:60
 acpi_ut_execute_power_methods+0xda/0x1b1 drivers/acpi/acpica/uteval.c:288

Re: [PATCH 3/4] counter: Add character device interface

2020-04-30 Thread kbuild test robot
Hi William,

I love your patch! Yet something to improve:

[auto build test ERROR on stm32/stm32-next]
[cannot apply to linus/master linux/master v5.7-rc3 next-20200430]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:
https://github.com/0day-ci/linux/commits/William-Breathitt-Gray/Introduce-the-Counter-character-device-interface/20200430-051734
base:   https://git.kernel.org/pub/scm/linux/kernel/git/atorgue/stm32.git 
stm32-next
config: x86_64-randconfig-d003-20200501 (attached as .config)
compiler: gcc-7 (Ubuntu 7.5.0-6ubuntu2) 7.5.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot 

All errors (new ones prefixed by >>):

   In file included from :32:0:
>> ./usr/include/linux/counter.h:22:2: error: unknown type name 'size_t'
 size_t num_signals;
 ^~
   ./usr/include/linux/counter.h:23:2: error: unknown type name 'size_t'
 size_t num_counts;
 ^~
   ./usr/include/linux/counter.h:24:2: error: unknown type name 'size_t'
 size_t num_ext;
 ^~
   ./usr/include/linux/counter.h:35:2: error: unknown type name 'size_t'
 size_t index;
 ^~
   ./usr/include/linux/counter.h:39:2: error: unknown type name 'size_t'
 size_t num_ext;
 ^~
   ./usr/include/linux/counter.h:50:2: error: unknown type name 'size_t'
 size_t count_index;
 ^~
   ./usr/include/linux/counter.h:51:2: error: unknown type name 'size_t'
 size_t index;
 ^~
   ./usr/include/linux/counter.h:53:2: error: unknown type name 'size_t'
 size_t num_actions;
 ^~
   ./usr/include/linux/counter.h:54:2: error: unknown type name 'size_t'
 size_t signal_index;
 ^~
   ./usr/include/linux/counter.h:67:2: error: unknown type name 'size_t'
 size_t index;
 ^~
   ./usr/include/linux/counter.h:71:2: error: unknown type name 'size_t'
 size_t num_functions;
 ^~
   ./usr/include/linux/counter.h:72:2: error: unknown type name 'size_t'
 size_t num_synapses;
 ^~
   ./usr/include/linux/counter.h:73:2: error: unknown type name 'size_t'
 size_t num_ext;
 ^~
   ./usr/include/linux/counter.h:86:2: error: unknown type name 'size_t'
 size_t owner_index;
 ^~
   ./usr/include/linux/counter.h:87:2: error: unknown type name 'size_t'
 size_t index;
 ^~
   ./usr/include/linux/counter.h:99:2: error: unknown type name 'size_t'
 size_t index;
 ^~
   ./usr/include/linux/counter.h:110:2: error: unknown type name 'size_t'
 size_t index;
 ^~
   ./usr/include/linux/counter.h:121:2: error: unknown type name 'size_t'
 size_t index;
 ^~
   ./usr/include/linux/counter.h:133:2: error: unknown type name 'size_t'
 size_t count_index;
 ^~
   ./usr/include/linux/counter.h:134:2: error: unknown type name 'size_t'
 size_t index;
 ^~
   ./usr/include/linux/counter.h:146:2: error: unknown type name 'size_t'
 size_t count_index;
 ^~
   ./usr/include/linux/counter.h:147:2: error: unknown type name 'size_t'
 size_t index;
 ^~
   ./usr/include/linux/counter.h:160:2: error: unknown type name 'size_t'
 size_t count_index;
 ^~
   ./usr/include/linux/counter.h:161:2: error: unknown type name 'size_t'
 size_t synapse_index;
 ^~
   ./usr/include/linux/counter.h:162:2: error: unknown type name 'size_t'
 size_t index;
 ^~
   ./usr/include/linux/counter.h:176:2: error: unknown type name 'size_t'
 size_t owner_index;
 ^~
   ./usr/include/linux/counter.h:177:2: error: unknown type name 'size_t'
 size_t index;
 ^~
>> ./usr/include/linux/counter.h:179:2: error: unknown type name 'bool'
 bool data;
 ^~~~
   ./usr/include/linux/counter.h:191:2: error: unknown type name 'size_t'
 size_t owner_index;
 ^~
   ./usr/include/linux/counter.h:192:2: error: unknown type name 'size_t'
 size_t index;
 ^~
   ./usr/include/linux/counter.h:206:2: error: unknown type name 'size_t'
 size_t owner_index;
 ^~
   ./usr/include/linux/counter.h:207:2: error: unknown type name 'size_t'
 size_t index;
 ^~
   ./usr/include/linux/counter.h:221:2: error: unknown type name 'size_t'
 size_t owner_index;
 ^~
   ./usr/include/linux/counter.h:222:2: error: unknown type name 'size_t'
 size_t index;
 ^~
   ./usr/include/linux/counter.h:224:2: error: unknown type name 'size_t'
 size_t enum_index;
 ^~
   ./usr/include/linux/counter.h:237:2: error: unknown type name 'size_t'
 size_t owner_index;
 ^~
   ./usr/include/linux/counter.h:238:2: error: unknown type name 'size_t'
 size_t ext_index;
  

[git pull] drm fixes for 5.7-rc4

2020-04-30 Thread Dave Airlie
Hi Linus,

Regular scheduled fixes pull for graphics. Nothing to extreme bunch of
amdgpu fixes, i915 and qxl fixes, along with some misc ones.

All seems to be progressing normally.

Dave.

drm-fixes-2020-05-01:
drm fixes for 5.7-rc4

core:
- EDID off by one DTD fix
- DP mst write return code fix

dma-buf:
- fix SET_NAME ioctl uapi
- doc fixes

amdgpu:
- Fix a green screen on resume issue
- PM fixes for SR-IOV
- SDMA fix for navi
- Renoir display fixes
- Cursor and pageflip stuttering fixes
- Misc additional display fixes
- (uapi) Add additional DCC tiling flags for navi1x

i915:
- Fix selftest refcnt leak (Xiyu)
- Fix gem vma lock (Chris)
- Fix gt's i915_request.timeline acquire by checking if cacheline is
valid (Chris)
- Fix IRQ postinistall fault masks (Matt)

qxl:
- use after gree fix
- fix lost kunmap
- release leak fix

virtio:
- context destruction fix
The following changes since commit 6a8b55ed4056ea5559ebe4f6a4b247f627870d4c:

  Linux 5.7-rc3 (2020-04-26 13:51:02 -0700)

are available in the Git repository at:

  git://anongit.freedesktop.org/drm/drm tags/drm-fixes-2020-05-01

for you to fetch changes up to e3dcd86b3b4c045a4db17c02340138a4c514fe20:

  Merge tag 'amd-drm-fixes-5.7-2020-04-29' of
git://people.freedesktop.org/~agd5f/linux into drm-fixes (2020-05-01
11:19:55 +1000)


drm fixes for 5.7-rc4

core:
- EDID off by one DTD fix
- DP mst write return code fix

dma-buf:
- fix SET_NAME ioctl uapi
- doc fixes

amdgpu:
- Fix a green screen on resume issue
- PM fixes for SR-IOV
 SDMA fix for navi
- Renoir display fixes
- Cursor and pageflip stuttering fixes
- Misc additional display fixes
- (uapi) Add additional DCC tiling flags for navi1x

i915:
- Fix selftest refcnt leak (Xiyu)
- Fix gem vma lock (Chris)
- Fix gt's i915_request.timeline acquire by checking if cacheline is
valid (Chris)
- Fix IRQ postinistall fault masks (Matt)

qxl:
- use after gree fix
- fix lost kunmap
- release leak fix

virtio:
- context destruction fix


Aric Cyr (1):
  drm/amd/display: Use cursor locking to prevent flip delays

Aurabindo Pillai (1):
  drm/amd/display: DispalyPort: Write OUI only if panel supports it

Chris Wilson (2):
  drm/i915/gem: Hold obj->vma.lock over for_each_ggtt_vma()
  drm/i915/gt: Check cacheline is valid before acquiring

Daniel Vetter (1):
  dma-buf: Fix SET_NAME ioctl uapi

Dave Airlie (3):
  Merge tag 'drm-misc-fixes-2020-04-30' of
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
  Merge tag 'drm-intel-fixes-2020-04-30' of
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
  Merge tag 'amd-drm-fixes-5.7-2020-04-29' of
git://people.freedesktop.org/~agd5f/linux into drm-fixes

Dmytro Laktyushkin (2):
  drm/amd/display: check if REFCLK_CNTL register is present
  drm/amd/display: fix rn soc bb update

Gurchetan Singh (1):
  drm/virtio: only destroy created contexts

Lyude Paul (1):
  drm/dp_mst: Fix drm_dp_send_dpcd_write() return code

Marek Olšák (3):
  drm/amdgpu: add tiling flags from Mesa
  drm/amdgpu: invalidate L2 before SDMA IBs (v2)
  drm/amdgpu: bump version for invalidate L2 before SDMA IBs

Matt Roper (1):
  drm/i915: Use proper fault mask in interrupt postinstall too

Nicholas Kazlauskas (1):
  drm/amd/display: Defer cursor update around VUPDATE for all ASIC

Randy Dunlap (1):
  dma-buf: fix documentation build warnings

Rodrigo Siqueira (1):
  drm/amd/display: Fix green screen issue after suspend

Sung Lee (1):
  drm/amd/display: Update downspread percent to match spreadsheet for DCN2.1

Tiecheng Zhou (2):
  Revert "drm/amd/powerplay: avoid using pm_en before it is initialized"
  drm/amd/powerplay: avoid using pm_en before it is initialized revised

Vasily Averin (4):
  drm/qxl: qxl_release leak in qxl_draw_dirty_fb()
  drm/qxl: qxl_release leak in qxl_hw_surface_alloc()
  drm/qxl: lost qxl_bo_kunmap_atomic_page in qxl_image_init_helper()
  drm/qxl: qxl_release use after free

Ville Syrjälä (1):
  drm/edid: Fix off-by-one in DispID DTD pixel clock

Xiaodong Yan (1):
  drm/amd/display: blank dp stream before re-train the link

Xiyu Yang (1):
  drm/i915/selftests: Fix i915_address_space refcnt leak

 drivers/dma-buf/dma-buf.c  |  7 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|  3 +-
 drivers/gpu/drm/amd/amdgpu/navi10_sdma_pkt_open.h  | 16 +
 drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 14 +++-
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c  | 38 ---
 drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c   | 27 
 drivers/gpu/drm/amd/display/dc/core/dc_stream.c| 40 ++-
 .../amd/display/dc/dce110/dce110_hw_sequencer.c|  1 +
 .../drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c  | 10 +++
 .../drm/amd/display/dc/dcn10/dcn10_hw_sequencer.h  |  1 +
 

Re: [PATCH 2/2] epoll: atomically remove wait entry on wake up

2020-04-30 Thread Sasha Levin
Hi

[This is an automated email]

This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: all

The bot has tested the following trees: v5.6.7, v5.4.35, v4.19.118, v4.14.177, 
v4.9.220, v4.4.220.

v5.6.7: Build OK!
v5.4.35: Build OK!
v4.19.118: Failed to apply! Possible dependencies:
1b53734bd0b2 ("epoll: fix possible lost wakeup on epoll_ctl() path")
35cff1a6e023 ("fs/epoll: rename check_events label to send_events")
86c051793b4c ("fs/epoll: deal with wait_queue only once")
abc610e01c66 ("fs/epoll: avoid barrier after an epoll_wait(2) timeout")
c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()")

v4.14.177: Failed to apply! Possible dependencies:
002b343669c4 ("fs/epoll: loosen irq safety in ep_scan_ready_list()")
35cff1a6e023 ("fs/epoll: rename check_events label to send_events")
37b5e5212a44 ("epoll: remove ep_call_nested() from ep_eventpoll_poll()")
679abf381a18 ("fs/eventpoll.c: loosen irq safety in ep_poll()")
69112736e2f0 ("eventpoll: no need to mask the result of epi_item_poll() 
again")
86c051793b4c ("fs/epoll: deal with wait_queue only once")
bec1a502d34d ("eventpoll: constify struct epoll_event pointers")
c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()")
d85e2aa2e34d ("annotate ep_scan_ready_list()")
ee8ef0a4b167 ("epoll: use the waitqueue lock to protect ep->wq")

v4.9.220: Failed to apply! Possible dependencies:
86c051793b4c ("fs/epoll: deal with wait_queue only once")
89d947561077 ("sd: Implement support for ZBC devices")
ac6424b981bc ("sched/wait: Rename wait_queue_t => wait_queue_entry_t")
bd166ef183c2 ("blk-mq-sched: add framework for MQ capable IO schedulers")
cf43e6be865a ("block: add scalable completion tracking of requests")
e806402130c9 ("block: split out request-only flags into a new namespace")

v4.4.220: Failed to apply! Possible dependencies:
27489a3c827b ("blk-mq: turn hctx->run_work into a regular work struct")
511cbce2ff8b ("irq_poll: make blk-iopoll available outside the block layer")
86c051793b4c ("fs/epoll: deal with wait_queue only once")
8d354f133e86 ("blk-mq: improve layout of blk_mq_hw_ctx")
9467f85960a3 ("blk-mq/cpu-notif: Convert to new hotplug state machine")
ac6424b981bc ("sched/wait: Rename wait_queue_t => wait_queue_entry_t")
bd166ef183c2 ("blk-mq-sched: add framework for MQ capable IO schedulers")
cf43e6be865a ("block: add scalable completion tracking of requests")
d9d8c5c489f4 ("block: convert is_sync helpers to use REQ_OPs.")
da8b44d5a9f8 ("timer: convert timer_slack_ns from unsigned long to u64")
e57690fe009b ("blk-mq: don't overwrite rq->mq_ctx")
e6a40b096e28 ("block: prepare request creation/destruction code to use 
REQ_OPs")
e806402130c9 ("block: split out request-only flags into a new namespace")


NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?

-- 
Thanks
Sasha


linux-next: manual merge of the net-next tree with the kspp-gustavo tree

2020-04-30 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  drivers/net/ethernet/mellanox/mlxsw/spectrum_span.c

between commit:

  3ba225b506a2 ("treewide: Replace zero-length array with flexible-array 
member")

from the kspp-gustavo tree and commit:

  4780dbdbd957 ("mlxsw: spectrum_span: Replace zero-length array with 
flexible-array member")

from the net-next tree (and further changes).

I fixed it up (they had the same effect on this file, so I used the
latter) and can carry the fix as necessary. This is now fixed as far as
linux-next is concerned, but any non trivial conflicts should be mentioned
to your upstream maintainer when your tree is submitted for merging.
You may also want to consider cooperating with the maintainer of the
conflicting tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell


pgpyYM_9ScpC4.pgp
Description: OpenPGP digital signature


Re: [PATCH 0/7] padata: parallelize deferred page init

2020-04-30 Thread Daniel Jordan
On Thu, Apr 30, 2020 at 06:09:35PM -0700, Josh Triplett wrote:
> On Thu, Apr 30, 2020 at 04:11:18PM -0400, Daniel Jordan wrote:
> > Sometimes the kernel doesn't take full advantage of system memory
> > bandwidth, leading to a single CPU spending excessive time in
> > initialization paths where the data scales with memory size.
> > 
> > Multithreading naturally addresses this problem, and this series is the
> > first step.
> > 
> > It extends padata, a framework that handles many parallel singlethreaded
> > jobs, to handle multithreaded jobs as well by adding support for
> > splitting up the work evenly, specifying a minimum amount of work that's
> > appropriate for one helper thread to do, load balancing between helpers,
> > and coordinating them.  More documentation in patches 4 and 7.
> > 
> > The first user is deferred struct page init, a large bottleneck in
> > kernel boot--actually the largest for us and likely others too.  This
> > path doesn't require concurrency limits, resource control, or priority
> > adjustments like future users will (vfio, hugetlb fallocate, munmap)
> > because it happens during boot when the system is otherwise idle and
> > waiting on page init to finish.
> > 
> > This has been tested on a variety of x86 systems and speeds up kernel
> > boot by 6% to 49% by making deferred init 63% to 91% faster.  Patch 6
> > has detailed numbers.  Test results from other systems appreciated.
> > 
> > This series is based on v5.6 plus these three from mmotm:
> > 
> >   mm-call-touch_nmi_watchdog-on-max-order-boundaries-in-deferred-init.patch
> >   mm-initialize-deferred-pages-with-interrupts-enabled.patch
> >   mm-call-cond_resched-from-deferred_init_memmap.patch
> > 
> > All of the above can be found in this branch:
> > 
> >   git://oss.oracle.com/git/linux-dmjordan.git padata-mt-definit-v1
> >   
> > https://oss.oracle.com/git/gitweb.cgi?p=linux-dmjordan.git;a=shortlog;h=refs/heads/padata-mt-definit-v1
> 
> For the series (and the three prerequisite patches):
> 
> Tested-by: Josh Triplett 

Appreciate the runs, Josh, thanks.


linux-next: manual merge of the net-next tree with the net tree

2020-04-30 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c

between commit:

  8075411d93b6 ("net/mlx5: DR, On creation set CQ's arm_db member to right 
value")

from the net tree and commit:

  73a75b96fc9a ("net/mlx5: Remove empty QP and CQ events handlers")

from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c
index 18719acb7e54,c4ed25bb9ac8..
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_send.c
@@@ -689,18 -693,6 +693,12 @@@ static int dr_prepare_qp_to_rts(struct 
return 0;
  }
  
- static void dr_cq_event(struct mlx5_core_cq *mcq,
-   enum mlx5_event event)
- {
-   pr_info("CQ event %u on CQ #%u\n", event, mcq->cqn);
- }
- 
 +static void dr_cq_complete(struct mlx5_core_cq *mcq,
 + struct mlx5_eqe *eqe)
 +{
 +  pr_err("CQ completion CQ: #%u\n", mcq->cqn);
 +}
 +
  static struct mlx5dr_cq *dr_create_cq(struct mlx5_core_dev *mdev,
  struct mlx5_uars_page *uar,
  size_t ncqe)
@@@ -761,9 -753,6 +759,8 @@@
pas = (__be64 *)MLX5_ADDR_OF(create_cq_in, in, pas);
mlx5_fill_page_frag_array(>wq_ctrl.buf, pas);
  
-   cq->mcq.event = dr_cq_event;
 +  cq->mcq.comp  = dr_cq_complete;
 +
err = mlx5_core_create_cq(mdev, >mcq, in, inlen, out, sizeof(out));
kvfree(in);
  


pgpiirFWSlynZ.pgp
Description: OpenPGP digital signature


Re: [PATCH 5/7] mm: move zone iterator outside of deferred_init_maxorder()

2020-04-30 Thread Daniel Jordan
Hi Alex,

On Thu, Apr 30, 2020 at 02:43:28PM -0700, Alexander Duyck wrote:
> On 4/30/2020 1:11 PM, Daniel Jordan wrote:
> > padata will soon divide up pfn ranges between threads when parallelizing
> > deferred init, and deferred_init_maxorder() complicates that by using an
> > opaque index in addition to start and end pfns.  Move the index outside
> > the function to make splitting the job easier, and simplify the code
> > while at it.
> > 
> > deferred_init_maxorder() now always iterates within a single pfn range
> > instead of potentially multiple ranges, and advances start_pfn to the
> > end of that range instead of the max-order block so partial pfn ranges
> > in the block aren't skipped in a later iteration.  The section alignment
> > check in deferred_grow_zone() is removed as well since this alignment is
> > no longer guaranteed.  It's not clear what value the alignment provided
> > originally.
> > 
> > Signed-off-by: Daniel Jordan 
> 
> So part of the reason for splitting it up along section aligned boundaries
> was because we already had an existing functionality in deferred_grow_zone
> that was going in and pulling out a section aligned chunk and processing it
> to prepare enough memory for other threads to keep running. I suspect that
> the section alignment was done because normally I believe that is also the
> alignment for memory onlining.

I think Pavel added that functionality, maybe he could confirm.

My impression was that the reason deferred_grow_zone aligned the requested
order up to a section was to make enough memory available to avoid being called
on every allocation.

> With this already breaking things up over multiple threads how does this
> work with deferred_grow_zone? Which thread is it trying to allocate from if
> it needs to allocate some memory for itself?

I may not be following your question, but deferred_grow_zone doesn't allocate
memory during the multithreading in deferred_init_memmap because the latter
sets first_deferred_pfn so that deferred_grow_zone bails early.

> Also what is to prevent a worker from stop deferred_grow_zone from bailing
> out in the middle of a max order page block if there is a hole in the middle
> of the block?

deferred_grow_zone remains singlethreaded.  It could stop in the middle of a
max order block, but it can't run concurrently with deferred_init_memmap, as
per above, so if deferred_init_memmap were to init 'n free the remaining part
of the block, the previous portion would have already been initialized.


Re: [PATCH 0/7] padata: parallelize deferred page init

2020-04-30 Thread Daniel Jordan
On Thu, Apr 30, 2020 at 05:40:59PM -0400, Pavel Tatashin wrote:
> On Thu, Apr 30, 2020 at 5:31 PM Andrew Morton  
> wrote:
> > On Thu, 30 Apr 2020 16:11:18 -0400 Daniel Jordan 
> >  wrote:
> >
> > > Sometimes the kernel doesn't take full advantage of system memory
> > > bandwidth, leading to a single CPU spending excessive time in
> > > initialization paths where the data scales with memory size.
> > >
> > > Multithreading naturally addresses this problem, and this series is the
> > > first step.
> > >
> > > It extends padata, a framework that handles many parallel singlethreaded
> > > jobs, to handle multithreaded jobs as well by adding support for
> > > splitting up the work evenly, specifying a minimum amount of work that's
> > > appropriate for one helper thread to do, load balancing between helpers,
> > > and coordinating them.  More documentation in patches 4 and 7.
> > >
> > > The first user is deferred struct page init, a large bottleneck in
> > > kernel boot--actually the largest for us and likely others too.  This
> > > path doesn't require concurrency limits, resource control, or priority
> > > adjustments like future users will (vfio, hugetlb fallocate, munmap)
> > > because it happens during boot when the system is otherwise idle and
> > > waiting on page init to finish.
> > >
> > > This has been tested on a variety of x86 systems and speeds up kernel
> > > boot by 6% to 49% by making deferred init 63% to 91% faster.
> >
> > How long is this up-to-91% in seconds?  If it's 91% of a millisecond
> > then not impressed.  If it's 91% of two weeks then better :)

The largest system I could test had 384G per node and saved 1.5 out of 4
seconds.

> > Relatedly, how important is boot time on these large machines anyway?
> > They presumably have lengthy uptimes so boot time is relatively
> > unimportant?
> 
> Large machines indeed have a lengthy uptime, but they also can host a
> large number of VMs meaning that downtime of the host increases the
> downtime of VMs in cloud environments. Some VMs might be very sensible
> to downtime: game servers, traders, etc.
>
> > IOW, can you please explain more fully why this patchset is valuable to
> > our users?

I'll let the users speak for themselves, but I have a similar use case to Pavel
of limiting the downtime of VMs running on these large systems, and spinning up
instances as fast as possible is also desirable for our cloud users.


Re: [PATCH 08/15] usb: ehci: avoid gcc-10 zero-length-bounds warning

2020-04-30 Thread Alan Stern
On Thu, 30 Apr 2020, Arnd Bergmann wrote:

> Building ehci drivers with gcc-10 results in a number of warnings like
> when an zero-length array is accessed:
> 
> drivers/usb/host/ehci-hub.c: In function 'ehci_bus_suspend':
> drivers/usb/host/ehci-hub.c:320:30: error: array subscript 14 is outside the 
> bounds of an interior zero-length array 'u32[0]' {aka 'unsigned int[0]'} 
> [-Werror=zero-length-bounds]
>   320 |u32 __iomem *hostpc_reg = >regs->hostpc[port];
>   |  ^
> In file included from drivers/usb/host/ehci.h:273,
>  from drivers/usb/host/ehci-hcd.c:96:
> include/linux/usb/ehci_def.h:186:7: note: while referencing 'hostpc'
>   186 |  u32  hostpc[0]; /* HOSTPC extension */
>   |   ^~
> In file included from drivers/usb/host/ehci-hcd.c:305:
> drivers/usb/host/ehci-hub.c: In function 'ehci_hub_control':
> drivers/usb/host/ehci-hub.c:892:15: error: array subscript 256 is outside the 
> bounds of an interior zero-length array 'u32[0]' {aka 'unsigned int[0]'} 
> [-Werror=zero-length-bounds]
>   892 |  hostpc_reg = >regs->hostpc[temp];
>   |   ^
> In file included from drivers/usb/host/ehci.h:273,
>  from drivers/usb/host/ehci-hcd.c:96:
> include/linux/usb/ehci_def.h:186:7: note: while referencing 'hostpc'
>   186 |  u32  hostpc[0]; /* HOSTPC extension */
>   |   ^~
> 
> All these fields are colocated with reserved fields that I guess
> refer to the correct field length.

No, they don't.

> Change the two struct definition to use an unnamed union to define
> both of these fields at the same location as the corresponding
> reserved fields.
> 
> Signed-off-by: Arnd Bergmann 
> ---
>  include/linux/usb/ehci_def.h | 12 
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/usb/ehci_def.h b/include/linux/usb/ehci_def.h
> index 78e00637..8777d8e56ef2 100644
> --- a/include/linux/usb/ehci_def.h
> +++ b/include/linux/usb/ehci_def.h
> @@ -127,7 +127,8 @@ struct ehci_regs {
>  #define FLAG_CF  (1<<0)  /* true: we'll support "high 
> speed" */
>  
>   /* PORTSC: offset 0x44 */
> - u32 port_status[0]; /* up to N_PORTS */
> + union {
> + u32 port_status[9]; /* up to N_PORTS */

This array can have up to 15 elements, meaning that it can extend out
to offset 0x80.

>  /* EHCI 1.1 addendum */
>  #define PORTSC_SUSPEND_STS_ACK 0
>  #define PORTSC_SUSPEND_STS_NYET 1
> @@ -165,7 +166,8 @@ struct ehci_regs {
>  #define PORT_CONNECT (1<<0)  /* device connected */
>  #define PORT_RWC_BITS   (PORT_CSC | PORT_PEC | PORT_OCC)
>  
> - u32 reserved3[9];
> + u32 reserved3[9];
> + };
>  
>   /* USBMODE: offset 0x68 */
>   u32 usbmode;/* USB Device mode */

As you see, this next field actually lies inside the preceding array.  
It's not a real conflict; any hardware which supports the usbmode field 
uses only the first element of the port_status array.

I don't know how you want to handle this.  Doing

#define usbmode port_status[9]

doesn't seem like a very good approach, but I can't think of anything 
better at the moment.  Maybe just set the array size to 9, as you did, 
but with a comment explaining what's really going on.

> @@ -181,11 +183,13 @@ struct ehci_regs {
>   * PORTSCx
>   */
>   /* HOSTPC: offset 0x84 */
> - u32 hostpc[0];  /* HOSTPC extension */
> + union {
> + u32 hostpc[17]; /* HOSTPC extension */

Likewise, this array can have up to 15 elements.  In fact, it's the 
same size as the port_status array.

>  #define HOSTPC_PHCD  (1<<22) /* Phy clock disable */
>  #define HOSTPC_PSPD  (3<<25) /* Port speed detection */
>  
> - u32 reserved5[17];
> + u32 reserved5[17];
> + };
>  
>   /* USBMODE_EX: offset 0xc8 */
>   u32 usbmode_ex; /* USB Device mode extension */

Alan Stern



linux-next: manual merge of the net-next tree with the net tree

2020-04-30 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  include/net/mptcp.h

between commit:

  cfde141ea3fa ("mptcp: move option parsing into mptcp_incoming_options()")

from the net tree and commit:

  071c8ed6e88d ("tcp: mptcp: use mptcp receive buffer space to select rcv 
window")

from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc include/net/mptcp.h
index 3bce2019e4da,5288fba56e55..
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@@ -68,8 -68,13 +68,10 @@@ static inline bool rsk_is_mptcp(const s
return tcp_rsk(req)->is_mptcp;
  }
  
+ void mptcp_space(const struct sock *ssk, int *space, int *full_space);
+ 
 -void mptcp_parse_option(const struct sk_buff *skb, const unsigned char *ptr,
 -  int opsize, struct tcp_options_received *opt_rx);
  bool mptcp_syn_options(struct sock *sk, const struct sk_buff *skb,
   unsigned int *size, struct mptcp_out_options *opts);
 -void mptcp_rcv_synsent(struct sock *sk);
  bool mptcp_synack_options(const struct request_sock *req, unsigned int *size,
  struct mptcp_out_options *opts);
  bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,


pgpoAENNkOI1v.pgp
Description: OpenPGP digital signature


Re: [RFC][PATCH] x86/mm: Sync all vmalloc mappings before text_poke()

2020-04-30 Thread Steven Rostedt
On Thu, 30 Apr 2020 22:26:55 -0400 (EDT)
Mathieu Desnoyers  wrote:

> - On Apr 30, 2020, at 9:13 PM, rostedt rost...@goodmis.org wrote:
> 
> > [ Joerg, sending again this time not just to you. (hit reply to sender
> >  and not reply to all). Feel free to resend what you wrote before to this ]
> > 
> > On Thu, 30 Apr 2020 21:14:34 +0200
> > Joerg Roedel  wrote:
> >   
> >> And alloc_percpu() calls down into pcpu_alloc(), which allocates new
> >> percpu chunks using vmalloc() on x86. And there we are again in the
> >> vmalloc area.  
> > 
> > So after a vmalloc() is made, should the page tables be synced?  
> 
> Why should it ? Usually, the page fault handler is able to resolve the
> resulting minor page faults lazily.
> 
> > 
> > This is a rather subtle bug, and I don't think it should be the caller of
> > percpu_alloc() that needs to call vmalloc_sync_mappings().  
> 
> Who said tracing was easy ? ;-)

But anyone can hook to a tracepoint, and then if they hook to one that is
in the page fault handler, and they use vmalloc, they can lock up the
machine.

> 
> > What's your suggestion for a fix?  
> 
> I know the question is not addressed to me, but here are my 2 cents:
> 
> It's subtle because ftrace is tracing the page fault handler through
> tracepoints. It would not make sense to slow down all vmalloc or
> percpu_alloc() just because tracing recurses when tracing page faults.

What's so damn special about alloc_percpu()? It's definitely not a fast
path. And it's not used often.

> 
> I think the right approach to solve this is to call vmalloc_sync_mappings()
> before any vmalloc'd memory ends up being observable by instrumentation.
> This can be achieved by adding a vmalloc_sync_mappings call on tracepoint
> registration like I proposed in my patchset a few week ago:
> 
> https://lore.kernel.org/r/20200409193543.18115-2-mathieu.desnoy...@efficios.com
> 
> The tracers just have to make sure they perform their vmalloc'd memory
> allocation before registering the tracepoint which can touch it, else they
> need to issue vmalloc_sync_mappings() on their own before making the
> newly allocated memory observable by instrumentation.
> 
> This approach is not new: register_die_notifier() does exactly that today.
> 

I'll give the answer I gave to Joerg when he replied to my accidental
private (not public) email:

Or even my original patch would be better than having the generic tracing
code understanding the intrinsic properties of vmalloc() and
alloc_percpu() on x86_64. I really don't think it is wise to have:

foo = alloc_percpu();

/*
 * Because of some magic with the way alloc_percpu() works on
 * x86_64, we need to synchronize the pgd of all the tables,
 * otherwise the trace events that happen in x86_64 page fault
 * handlers can't cope with accessing the chance that a
 * alloc_percpu()'d memory might be touched in the page fault trace
 * event. Oh, and we need to audit all alloc_percpu() and vmalloc()
 * calls in tracing, because something might get triggered within a
 * page fault trace event!
 */
vmalloc_sync_mappings();

That would be exactly what I add as a comment if it were to be added in the
generic tracing code.

And we would need to audit any percpu alloc'd code in all tracing, or
anything that might git hooked into something that hooks to the page fault
trace point.

Since this worked for a decade without this, I'm strongly against adding it
in the generic code due to some issues with a single architecture.

-- Steve



Re: [PATCH 21/29] mm: remove the pgprot argument to __vmalloc

2020-04-30 Thread John Dorminy
>> On Tue, Apr 14, 2020 at 03:13:40PM +0200, Christoph Hellwig wrote:
>> > The pgprot argument to __vmalloc is always PROT_KERNEL now, so remove
>> > it.

Greetings;

I recently noticed this change via the linux-next tree.

It may not be possible to edit at this late date, but the change
description refers to PROT_KERNEL, which is a symbol which does not
appear to exist; perhaps PAGE_KERNEL was meant? The mismatch caused me
and a couple other folks some confusion briefly until we decided it
was supposed to be PAGE_KERNEL; if it's not too late, editing the
description to clarify so would be nice.

Many thanks.

John Dorminy



Re: [PATCH V1 09/10] arch/kmap: Define kmap_atomic_prot() for all arch's

2020-04-30 Thread Al Viro
On Thu, Apr 30, 2020 at 01:38:44PM -0700, ira.we...@intel.com wrote:

> -static inline void *kmap_atomic(struct page *page)
> +static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot)
>  {
>   preempt_disable();
>   pagefault_disable();
>   if (!PageHighMem(page))
>   return page_address(page);
> - return kmap_atomic_high(page);
> + return kmap_atomic_high_prot(page, prot);
>  }
> +#define kmap_atomic(page)kmap_atomic_prot(page, kmap_prot)

OK, so it *was* just a bisect hazard - you return to original semantics
wrt preempt_disable()...


[PATCH] streamline_config.pl: add LOCALMODCONFIG_PRESERVE to preserve some kconfigs

2020-04-30 Thread Changbin Du
Sometimes it is useful to preserve batches of configs when making
localmodconfig. For example, I usually don't want any usb and fs
modules to be disabled. Now we can do it by:

 $ make LOCALMODCONFIG_PRESERVE="drivers/usb;fs" localmodconfig

Signed-off-by: Changbin Du 
---
 Documentation/admin-guide/README.rst |  8 +++-
 scripts/kconfig/streamline_config.pl | 23 +++
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/README.rst 
b/Documentation/admin-guide/README.rst
index cc6151fc0845..6deff95362f8 100644
--- a/Documentation/admin-guide/README.rst
+++ b/Documentation/admin-guide/README.rst
@@ -209,10 +209,16 @@ Configuring the kernel
store the lsmod of that machine into a file
and pass it in as a LSMOD parameter.
 
+   Also, you can preserve modules in certen folders
+   or kconfig files by spcifying there paths in
+   parameter LOCALMODCONFIG_PRESERVE.
+
target$ lsmod > /tmp/mylsmod
target$ scp /tmp/mylsmod host:/tmp
 
-   host$ make LSMOD=/tmp/mylsmod localmodconfig
+   host$ make LSMOD=/tmp/mylsmod \
+   
LOCALMODCONFIG_PRESERVE="drivers/usb;drivers/gpu;fs" \
+   localmodconfig
 
The above also works when cross compiling.
 
diff --git a/scripts/kconfig/streamline_config.pl 
b/scripts/kconfig/streamline_config.pl
index e2f8504f5a2d..ab5d1e10a5d0 100755
--- a/scripts/kconfig/streamline_config.pl
+++ b/scripts/kconfig/streamline_config.pl
@@ -143,6 +143,7 @@ my %depends;
 my %selects;
 my %prompts;
 my %objects;
+my %config2kfile;
 my $var;
 my $iflevel = 0;
 my @ifdeps;
@@ -201,6 +202,7 @@ sub read_kconfig {
if (/^\s*(menu)?config\s+(\S+)\s*$/) {
$state = "NEW";
$config = $2;
+   $config2kfile{"CONFIG_$config"} = $kconfig;
 
# Add depends for 'if' nesting
for (my $i = 0; $i < $iflevel; $i++) {
@@ -592,6 +594,22 @@ while ($repeat) {
 
 my %setconfigs;
 
+my @presevered_kconfigs;
+@presevered_kconfigs = split(/;/,$ENV{LOCALMODCONFIG_PRESERVE}) if 
(defined($ENV{LOCALMODCONFIG_PRESERVE}));
+
+sub in_presevered_kconfigs {
+my $kconfig = $config2kfile{$_[0]};
+if (!defined($kconfig)) {
+return 0;
+}
+foreach my $excl (@presevered_kconfigs) {
+if($kconfig =~ /^$excl/) {
+return 1;
+}
+}
+return 0;
+}
+
 # Finally, read the .config file and turn off any module enabled that
 # we could not find a reason to keep enabled.
 foreach my $line (@config_file) {
@@ -644,6 +662,11 @@ foreach my $line (@config_file) {
 }
 
 if (/^(CONFIG.*)=(m|y)/) {
+if (in_presevered_kconfigs($1)) {
+dprint "Preserve config $1";
+print;
+next;
+}
if (defined($configs{$1})) {
if ($localyesconfig) {
$setconfigs{$1} = 'y';
-- 
2.25.1



Re: bug: Kbuild seems to require "make prepare-objtool" in order to use (some) new config vars

2020-04-30 Thread Masahiro Yamada
On Fri, May 1, 2020 at 4:10 AM Mauro Carvalho Chehab
 wrote:
>
> Em Fri, 1 May 2020 02:20:13 +0900
> Masahiro Yamada  escreveu:
>
> > On Fri, May 1, 2020 at 1:49 AM Mauro Carvalho Chehab
> >  wrote:
> > >
> > > Em Thu, 30 Apr 2020 22:51:48 +0900
> > > Masahiro Yamada  escreveu:
> > >
> > > > Hi Mauro,
> > > >
> > > >
> > > > On Thu, Apr 30, 2020 at 8:17 PM Mauro Carvalho Chehab
> > > >  wrote:
> > > > >
> > > > > Hi Masahiro,
> > > > >
> > > > > Not sure if this was already reported (or eventually fixed) upstream.
> > > > >
> > > > > While doing a Kconfig reorg on media, I noticed a few weird things,
> > > > > requiring me to call, on a few situations, "make modules_prepare"
> > > > > manually after some changes.
> > > > >
> > > > > I'm now working on a patchset to yet to be merged upstream aiming to
> > > > > resurrect a driver from staging. It is currently on this tree
> > > > > (with is based at the media development tree, on the top of 5.7-rc1):
> > > > >
> > > > > 
> > > > > https://git.linuxtv.org/mchehab/experimental.git/log/?h=atomisp_v2
> > > > >
> > > > > There, I was able to identify a misbehavior that it is probably
> > > > > what forced me to need calling "make modules_prepare".
> > > > >
> > > > > The atomisp driver is on a very bad shape. Among its problems, it has 
> > > > > a
> > > > > set of header definitions that should be different for two different
> > > > > variants of the supported devices. In order to be able to compile for
> > > > > either one of the variants, I added a new config var:
> > > > > CONFIG_VIDEO_ATOMISP_ISP2401.
> > > > >
> > > > > The problem is that calling just
> > > > >
> > > > > ./scripts/config -e CONFIG_VIDEO_ATOMISP_ISP2401
> > > > >
> > > > > or
> > > > > ./scripts/config -d CONFIG_VIDEO_ATOMISP_ISP2401
> > > >
> > > >
> > > > scripts/config does not take the dependency into consideration
> > > > at all.
> > > >
> > > > You need to enable/disable other options that it depends on.
> > >
> > > Yes, I know. on my tests, I did a "make allyesconfig" before
> > > the patch whose added this dependency. So, the only dependency
> > > left to be enabled or disabled was this one.
> > >
> > > The problem I'm pointing is not really do a select recursion
> > > (that would be a really cool feature, but I know it is not
> > > trivial), but, instead, that, despite the config var was
> > > there, when I tried to build it with:
> > >
> > > make clean; make M=drivers/staging/media/atomisp
> > >
> > > It didn't do the right thing. Instead, it used ISP2401=y
> > > on make clean and ISP2401=n at the drivers build.
> > >
> > > So, in order to test if patches won't break building,
> > > depending on the value of this var, I'm currently doing:
> > >
> > > cls;./scripts/config -d CONFIG_VIDEO_ATOMISP_ISP2401 && make 
> > > prepare-objtool && make clean && make M=drivers/staging/media/atomisp && 
> > > ./scripts/config -e CONFIG_VIDEO_ATOMISP_ISP2401 && make prepare-objtool 
> > > && make clean && make M=drivers/staging/media/atomisp
> > >
> > > (note the alien "make prepare-objtool" in the middle)
> > >
> > >
> > > > ./scripts/config -e STAGING -e STAGING_MEDIA -e MEDIA_SUPPORT -e
> > > > MEDIA_CONTROLLER -e INTEL_ATOMISP -e VIDEOBUF_VMALLOC -e VIDEO_ATOMISP
> > > > -d MEDIA_SUPPORT_FILTER -e VIDEO_DEV -e VIDEO_V4L2 -e
> > > > VIDEO_ATOMISP_ISP2401
> > > >
> > > > allows me to enable VIDEO_ATOMISP_ISP2401.
> > > >
> > > >
> > > > It is painful to debug such complicated dependency graph, though.
> > >
> > > Yeah, dependencies at the media subsystem are usually quite complex.
> > >
> > > > >
> > > > > is not enough anymore for the build to actually use the new config 
> > > > > value.
> > > > >
> > > > > It just keeps silently using the old config value.
> > > > >
> > > > > I double-checked it by adding this macro at the Makefile:
> > > > >
> > > > > $(info  ISP2401: $(CONFIG_VIDEO_ATOMISP_ISP2401) 
> > > > > )
> > > > >
> > > > > The Makefile doesn't see the change, except if I explicitly call
> > > > > "make modules_prepare" or "make prepare-objtool".
> > > > >
> > > > > Even calling "make oldconfig" doesn't make it use the new CONFIG_*
> > > >
> > > >
> > > > I do not know.
> > > >
> > > > I cannot look into it without detailed steps to reproduce it.
> > >
> > > I'm applying the enclosed patch to this branch:
> > >
> > > https://git.linuxtv.org/mchehab/experimental.git/log/?h=atomisp_v2
> > >
> > > and running this:
> > >
> > > $ make allyesconfig 2>/dev/null >/dev/null; echo 
> > > "disable";./scripts/config -d CONFIG_VIDEO_ATOMISP_ISP2401 && make 
> > > M=drivers/staging/media/atomisp && ./scripts/config -e 
> > > CONFIG_VIDEO_ATOMISP_ISP2401 && echo "enable" && make clean && make 
> > > M=drivers/staging/media/atomisp
> > > disable
> > >
> > >   WARNING: Symbol version dump ./Module.symvers
> > >is missing; modules will have no dependencies and 
> > > 

Re: [PATCH V1 00/10] Remove duplicated kmap code

2020-04-30 Thread Michael Ellerman
ira.we...@intel.com writes:
> From: Ira Weiny 
>
> The kmap infrastructure has been copied almost verbatim to every architecture.
> This series consolidates obvious duplicated code by defining core functions
> which call into the architectures only when needed.
>
> Some of the k[un]map_atomic() implementations have some similarities but the
> similarities were not sufficient to warrant further changes.
>
> In addition we remove a duplicate implementation of kmap() in DRM.
>
> Testing was done by 0day to cover all the architectures I can't readily
> build/test.

I threw some powerpc builds at it and they all passed, so LGTM.

cheers


Re: [RFC][PATCH] x86/mm: Sync all vmalloc mappings before text_poke()

2020-04-30 Thread Mathieu Desnoyers
- On Apr 30, 2020, at 9:13 PM, rostedt rost...@goodmis.org wrote:

> [ Joerg, sending again this time not just to you. (hit reply to sender
>  and not reply to all). Feel free to resend what you wrote before to this ]
> 
> On Thu, 30 Apr 2020 21:14:34 +0200
> Joerg Roedel  wrote:
> 
>> And alloc_percpu() calls down into pcpu_alloc(), which allocates new
>> percpu chunks using vmalloc() on x86. And there we are again in the
>> vmalloc area.
> 
> So after a vmalloc() is made, should the page tables be synced?

Why should it ? Usually, the page fault handler is able to resolve the
resulting minor page faults lazily.

> 
> This is a rather subtle bug, and I don't think it should be the caller of
> percpu_alloc() that needs to call vmalloc_sync_mappings().

Who said tracing was easy ? ;-)

> What's your suggestion for a fix?

I know the question is not addressed to me, but here are my 2 cents:

It's subtle because ftrace is tracing the page fault handler through
tracepoints. It would not make sense to slow down all vmalloc or
percpu_alloc() just because tracing recurses when tracing page faults.

I think the right approach to solve this is to call vmalloc_sync_mappings()
before any vmalloc'd memory ends up being observable by instrumentation.
This can be achieved by adding a vmalloc_sync_mappings call on tracepoint
registration like I proposed in my patchset a few week ago:

https://lore.kernel.org/r/20200409193543.18115-2-mathieu.desnoy...@efficios.com

The tracers just have to make sure they perform their vmalloc'd memory
allocation before registering the tracepoint which can touch it, else they
need to issue vmalloc_sync_mappings() on their own before making the
newly allocated memory observable by instrumentation.

This approach is not new: register_die_notifier() does exactly that today.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH] mm/gup.c: Corrected document reference path

2020-04-30 Thread John Hubbard

On 2020-04-30 12:01, Souptick Joarder wrote:

Document path Documentation/vm/pin_user_pages.rst is not a correct
reference and it should be Documentation/core-api/pin_user_pages.rst.

Signed-off-by: Souptick Joarder 
---
  mm/gup.c | 18 +-
  1 file changed, 9 insertions(+), 9 deletions(-)



Reviewed-by: John Hubbard 


thanks,
--
John Hubbard
NVIDIA



diff --git a/mm/gup.c b/mm/gup.c
index 7ce796c..4952f12 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2864,10 +2864,10 @@ int get_user_pages_fast(unsigned long start, int 
nr_pages,
   * the arguments here are identical.
   *
   * FOLL_PIN means that the pages must be released via unpin_user_page(). 
Please
- * see Documentation/vm/pin_user_pages.rst for further details.
+ * see Documentation/core-api/pin_user_pages.rst for further details.
   *
- * This is intended for Case 1 (DIO) in Documentation/vm/pin_user_pages.rst. It
- * is NOT intended for Case 2 (RDMA: long-term pins).
+ * This is intended for Case 1 (DIO) in 
Documentation/core-api/pin_user_pages.rst.
+ * It is NOT intended for Case 2 (RDMA: long-term pins).
   */
  int pin_user_pages_fast(unsigned long start, int nr_pages,
unsigned int gup_flags, struct page **pages)
@@ -2904,10 +2904,10 @@ int pin_user_pages_fast(unsigned long start, int 
nr_pages,
   * the arguments here are identical.
   *
   * FOLL_PIN means that the pages must be released via unpin_user_page(). 
Please
- * see Documentation/vm/pin_user_pages.rst for details.
+ * see Documentation/core-api/pin_user_pages.rst for details.
   *
- * This is intended for Case 1 (DIO) in Documentation/vm/pin_user_pages.rst. It
- * is NOT intended for Case 2 (RDMA: long-term pins).
+ * This is intended for Case 1 (DIO) in 
Documentation/core-api/pin_user_pages.rst.
+ * It is NOT intended for Case 2 (RDMA: long-term pins).
   */
  long pin_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
   unsigned long start, unsigned long nr_pages,
@@ -2940,10 +2940,10 @@ long pin_user_pages_remote(struct task_struct *tsk, 
struct mm_struct *mm,
   * FOLL_PIN is set.
   *
   * FOLL_PIN means that the pages must be released via unpin_user_page(). 
Please
- * see Documentation/vm/pin_user_pages.rst for details.
+ * see Documentation/core-api/pin_user_pages.rst for details.
   *
- * This is intended for Case 1 (DIO) in Documentation/vm/pin_user_pages.rst. It
- * is NOT intended for Case 2 (RDMA: long-term pins).
+ * This is intended for Case 1 (DIO) in 
Documentation/core-api/pin_user_pages.rst.
+ * It is NOT intended for Case 2 (RDMA: long-term pins).
   */
  long pin_user_pages(unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,



Re: [PATCH V1 08/10] arch/kmap: Don't hard code kmap_prot values

2020-04-30 Thread Al Viro
On Thu, Apr 30, 2020 at 01:38:43PM -0700, ira.we...@intel.com wrote:
> From: Ira Weiny 
> 
> To support kmap_atomic_prot() on all architectures each arch must
> support protections passed in to them.
> 
> Change csky, mips, nds32 and xtensa to use their global kmap_prot value
> rather than a hard coded value which was equal.

Minor nitpick: it's probably worth pointing out that kmap_prot on those
is a constant...


[GIT PULL] SCSI fixes for 5.7-rc3

2020-04-30 Thread James Bottomley
Four minor fixes: three in drivers and one in the core.  The core one
allows an additional state change that fixes a regression introduced by
an update to the aacraid driver in the previous merge window.

The patch is available here:

git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi.git scsi-fixes

The short changelog is:

David Disseldorp (1):
  scsi: target/iblock: fix WRITE SAME zeroing

Dexuan Cui (1):
  scsi: core: Allow the state change from SDEV_QUIESCE to SDEV_BLOCK

Martin Wilck (2):
  scsi: qla2xxx: check UNLOADING before posting async work
  scsi: qla2xxx: set UNLOADING before waiting for session deletion

And the diffstat

 drivers/scsi/qla2xxx/qla_os.c   | 35 +--
 drivers/scsi/scsi_lib.c |  1 +
 drivers/target/target_core_iblock.c |  2 +-
 3 files changed, 19 insertions(+), 19 deletions(-)

With full diff below.

James

---

diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index d190db5ea7d9..1d9a4866f9a7 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -3732,6 +3732,13 @@ qla2x00_remove_one(struct pci_dev *pdev)
}
qla2x00_wait_for_hba_ready(base_vha);
 
+   /*
+* if UNLOADING flag is already set, then continue unload,
+* where it was set first.
+*/
+   if (test_and_set_bit(UNLOADING, _vha->dpc_flags))
+   return;
+
if (IS_QLA25XX(ha) || IS_QLA2031(ha) || IS_QLA27XX(ha) ||
IS_QLA28XX(ha)) {
if (ha->flags.fw_started)
@@ -3750,15 +3757,6 @@ qla2x00_remove_one(struct pci_dev *pdev)
 
qla2x00_wait_for_sess_deletion(base_vha);
 
-   /*
-* if UNLOAD flag is already set, then continue unload,
-* where it was set first.
-*/
-   if (test_bit(UNLOADING, _vha->dpc_flags))
-   return;
-
-   set_bit(UNLOADING, _vha->dpc_flags);
-
qla_nvme_delete(base_vha);
 
dma_free_coherent(>pdev->dev,
@@ -4864,6 +4862,9 @@ qla2x00_alloc_work(struct scsi_qla_host *vha, enum 
qla_work_type type)
struct qla_work_evt *e;
uint8_t bail;
 
+   if (test_bit(UNLOADING, >dpc_flags))
+   return NULL;
+
QLA_VHA_MARK_BUSY(vha, bail);
if (bail)
return NULL;
@@ -6628,13 +6629,6 @@ qla2x00_disable_board_on_pci_error(struct work_struct 
*work)
struct pci_dev *pdev = ha->pdev;
scsi_qla_host_t *base_vha = pci_get_drvdata(ha->pdev);
 
-   /*
-* if UNLOAD flag is already set, then continue unload,
-* where it was set first.
-*/
-   if (test_bit(UNLOADING, _vha->dpc_flags))
-   return;
-
ql_log(ql_log_warn, base_vha, 0x015b,
"Disabling adapter.\n");
 
@@ -6645,9 +6639,14 @@ qla2x00_disable_board_on_pci_error(struct work_struct 
*work)
return;
}
 
-   qla2x00_wait_for_sess_deletion(base_vha);
+   /*
+* if UNLOADING flag is already set, then continue unload,
+* where it was set first.
+*/
+   if (test_and_set_bit(UNLOADING, _vha->dpc_flags))
+   return;
 
-   set_bit(UNLOADING, _vha->dpc_flags);
+   qla2x00_wait_for_sess_deletion(base_vha);
 
qla2x00_delete_all_vps(ha, base_vha);
 
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 47835c4b4ee0..06c260f6cdae 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2284,6 +2284,7 @@ scsi_device_set_state(struct scsi_device *sdev, enum 
scsi_device_state state)
switch (oldstate) {
case SDEV_RUNNING:
case SDEV_CREATED_BLOCK:
+   case SDEV_QUIESCE:
case SDEV_OFFLINE:
break;
default:
diff --git a/drivers/target/target_core_iblock.c 
b/drivers/target/target_core_iblock.c
index 51ffd5c002de..1c181d31f4c8 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -432,7 +432,7 @@ iblock_execute_zero_out(struct block_device *bdev, struct 
se_cmd *cmd)
target_to_linux_sector(dev, cmd->t_task_lba),
target_to_linux_sector(dev,
sbc_get_write_same_sectors(cmd)),
-   GFP_KERNEL, false);
+   GFP_KERNEL, BLKDEV_ZERO_NOUNMAP);
if (ret)
return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE;
 


[PATCH glibc 2/3] glibc: sched_getcpu(): use rseq cpu_id TLS on Linux (v7)

2020-04-30 Thread Mathieu Desnoyers
When available, use the cpu_id field from __rseq_abi on Linux to
implement sched_getcpu().  Fall-back on the vgetcpu vDSO if unavailable.

Benchmarks:

x86-64: Intel E5-2630 v3@2.40GHz, 16-core, hyperthreading

glibc sched_getcpu(): 13.7 ns (baseline)
glibc sched_getcpu() using rseq:   2.5 ns (speedup:  5.5x)
inline load cpuid from __rseq_abi TLS: 0.8 ns (speedup: 17.1x)

CC: Carlos O'Donell 
CC: Florian Weimer 
CC: Joseph Myers 
CC: Szabolcs Nagy 
CC: Thomas Gleixner 
CC: Ben Maurer 
CC: Peter Zijlstra 
CC: "Paul E. McKenney" 
CC: Boqun Feng 
CC: Will Deacon 
CC: Paul Turner 
CC: libc-al...@sourceware.org
CC: linux-kernel@vger.kernel.org
CC: linux-...@vger.kernel.org
---
Changes since v1:
- rseq is only used if both __NR_rseq and RSEQ_SIG are defined.

Changes since v2:
- remove duplicated __rseq_abi extern declaration.

Changes since v3:
- update ChangeLog.

Changes since v4:
- Use atomic_load_relaxed to load the __rseq_abi.cpu_id field, a
  consequence of the fact that __rseq_abi is not volatile anymore.
- Include atomic.h which provides atomic_load_relaxed.

Changes since v5:
- Use __ASSUME_RSEQ to detect rseq availability.

Changes since v6:
- Remove use of __ASSUME_RSEQ.
---
 sysdeps/unix/sysv/linux/sched_getcpu.c | 27 --
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/sysdeps/unix/sysv/linux/sched_getcpu.c 
b/sysdeps/unix/sysv/linux/sched_getcpu.c
index c019cfb3cf..2269c4f2bd 100644
--- a/sysdeps/unix/sysv/linux/sched_getcpu.c
+++ b/sysdeps/unix/sysv/linux/sched_getcpu.c
@@ -18,10 +18,15 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
-int
-sched_getcpu (void)
+#ifdef HAVE_GETCPU_VSYSCALL
+# define HAVE_VSYSCALL
+#endif
+
+static int
+vsyscall_sched_getcpu (void)
 {
   unsigned int cpu;
   int r = -1;
@@ -32,3 +37,21 @@ sched_getcpu (void)
 #endif
   return r == -1 ? r : cpu;
 }
+
+#include 
+
+#ifdef RSEQ_SIG
+int
+sched_getcpu (void)
+{
+  int cpu_id = atomic_load_relaxed (&__rseq_abi.cpu_id);
+
+  return cpu_id >= 0 ? cpu_id : vsyscall_sched_getcpu ();
+}
+#else
+int
+sched_getcpu (void)
+{
+  return vsyscall_sched_getcpu ();
+}
+#endif
-- 
2.17.1



[PATCH glibc 1/3] glibc: Perform rseq registration at C startup and thread creation (v19)

2020-04-30 Thread Mathieu Desnoyers
Register rseq TLS for each thread (including main), and unregister for
each thread (excluding main).  "rseq" stands for Restartable Sequences.

See the rseq(2) man page proposed here:
  https://lkml.org/lkml/2018/9/19/647

Those are based on glibc master branch commit 6f0baacf0f89.
The rseq system call was merged into Linux 4.18.

CC: Carlos O'Donell 
CC: Florian Weimer 
CC: Joseph Myers 
CC: Szabolcs Nagy 
CC: Thomas Gleixner 
CC: Ben Maurer 
CC: Peter Zijlstra 
CC: "Paul E. McKenney" 
CC: Boqun Feng 
CC: Will Deacon 
CC: Dave Watson 
CC: Paul Turner 
CC: Rich Felker 
CC: libc-al...@sourceware.org
CC: linux-kernel@vger.kernel.org
CC: linux-...@vger.kernel.org
---
Changes since v1:
- Move __rseq_refcount to an extra field at the end of __rseq_abi to
  eliminate one symbol.

  All libraries/programs which try to register rseq (glibc,
  early-adopter applications, early-adopter libraries) should use the
  rseq refcount.  It becomes part of the ABI within a user-space
  process, but it's not part of the ABI shared with the kernel per se.

- Restructure how this code is organized so glibc keeps building on
  non-Linux targets.

- Use non-weak symbol for __rseq_abi.

- Move rseq registration/unregistration implementation into its own
  nptl/rseq.c compile unit.

- Move __rseq_abi symbol under GLIBC_2.29.

Changes since v2:
- Move __rseq_refcount to its own symbol, which is less ugly than
  trying to play tricks with the rseq uapi.
- Move __rseq_abi from nptl to csu (C start up), so it can be used
  across glibc, including memory allocator and sched_getcpu().  The
  __rseq_refcount symbol is kept in nptl, because there is no reason
  to use it elsewhere in glibc.

Changes since v3:
- Set __rseq_refcount TLS to 1 on register/set to 0 on unregister
  because glibc is the first/last user.
- Unconditionally register/unregister rseq at thread start/exit, because
  glibc is the first/last user.
- Add missing abilist items.
- Rebase on glibc master commit a502c5294.
- Add NEWS entry.

Changes since v4:
- Do not use "weak" symbols for __rseq_abi and __rseq_refcount.  Based on
  "System V Application Binary Interface", weak only affects the link
  editor, not the dynamic linker.
- Install a new sys/rseq.h system header on Linux, which contains the
  RSEQ_SIG definition, __rseq_abi declaration and __rseq_refcount
  declaration.  Move those definition/declarations from rseq-internal.h
  to the installed sys/rseq.h header.
- Considering that rseq is only available on Linux, move csu/rseq.c to
  sysdeps/unix/sysv/linux/rseq-sym.c.
- Move __rseq_refcount from nptl/rseq.c to
  sysdeps/unix/sysv/linux/rseq-sym.c, so it is only defined on Linux.
- Move both ABI definitions for __rseq_abi and __rseq_refcount to
  sysdeps/unix/sysv/linux/Versions, so they only appear on Linux.
- Document __rseq_abi and __rseq_refcount volatile.
- Document the RSEQ_SIG signature define.
- Move registration functions from rseq.c to rseq-internal.h static
  inline functions.  Introduce empty stubs in misc/rseq-internal.h,
  which can be overridden by architecture code in
  sysdeps/unix/sysv/linux/rseq-internal.h.
- Rename __rseq_register_current_thread and __rseq_unregister_current_thread
  to rseq_register_current_thread and rseq_unregister_current_thread,
  now that those are only visible as internal static inline functions.
- Invoke rseq_register_current_thread() from libc-start.c LIBC_START_MAIN
  rather than nptl init, so applications not linked against
  libpthread.so have rseq registered for their main() thread.  Note that
  it is invoked separately for SHARED and !SHARED builds.

Changes since v5:
- Replace __rseq_refcount by __rseq_lib_abi, which contains two
  uint32_t: register_state and refcount.  The "register_state" field
  allows inhibiting rseq registration from signal handlers nested on top
  of glibc registration and occuring after rseq unregistration by glibc.
- Introduce enum rseq_register_state, which contains the states allowed
  for the struct rseq_lib_abi register_state field.

Changes since v6:
- Introduce bits/rseq.h to define RSEQ_SIG for each architecture.
  The generic bits/rseq.h does not define RSEQ_SIG, meaning that each
  architecture implementing rseq needs to implement bits/rseq.h.
- Rename enum item RSEQ_REGISTER_NESTED to RSEQ_REGISTER_ONGOING.
- Port to glibc-2.29.

Changes since v7:
- Remove __rseq_lib_abi symbol, including refcount and register_state
  fields.
- Remove reference counting and nested signals handling from
  registration/unregistration functions.
- Introduce new __rseq_handled exported symbol, which is set to 1
  by glibc on C startup when it handles restartable sequences.
  This allows glibc to coexist with early adopter libraries and
  applications wishing to register restartable sequences when it
  is not handled by glibc.
- Introduce rseq_init (), which sets __rseq_handled to 1 from
  C startup.
- Update NEWS entry.
- Update comments at the beginning of new files.
- Registration depends on 

Re: [PATCH] memcg: oom: ignore oom warnings from memory.max

2020-04-30 Thread Yafang Shao
On Fri, May 1, 2020 at 10:04 AM Shakeel Butt  wrote:
>
> On Thu, Apr 30, 2020 at 6:39 PM Yafang Shao  wrote:
> >
> > On Fri, May 1, 2020 at 2:27 AM Shakeel Butt  wrote:
> > >
> > > Lowering memory.max can trigger an oom-kill if the reclaim does not
> > > succeed. However if oom-killer does not find a process for killing, it
> > > dumps a lot of warnings.
> > >
> >
> > I have been confused by this behavior for several months and I think
> > it will confuse more memcg users.
> > We should keep the memcg oom behavior consistent with system oom - no
> > oom kill if no process.
> >
> > What about bellow change ?
> >
>
> Seems fine to me. If others are ok with this, please do send a signed-off 
> patch.
>

Sure, I will.

> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index e28098e13f1c..25fbc37a747f 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -6086,6 +6086,9 @@ static ssize_t memory_max_write(struct
> > kernfs_open_file *of,
> > continue;
> > }
> >
> > +   if (!cgroup_is_populated(memcg->css.cgroup))
> > +   break;
> > +
> > memcg_memory_event(memcg, MEMCG_OOM);
> > if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0))
> > break;
> >
> > > Deleting a memcg does not reclaim memory from it and the memory can
> > > linger till there is a memory pressure. One normal way to proactively
> > > reclaim such memory is to set memory.max to 0 just before deleting the
> > > memcg. However if some of the memcg's memory is pinned by others, this
> > > operation can trigger an oom-kill without any process and thus can log a
> > > lot un-needed warnings. So, ignore all such warnings from memory.max.
> > >
> > > Signed-off-by: Shakeel Butt 
> > > ---
> > >  include/linux/oom.h | 3 +++
> > >  mm/memcontrol.c | 9 +
> > >  mm/oom_kill.c   | 2 +-
> > >  3 files changed, 9 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/include/linux/oom.h b/include/linux/oom.h
> > > index c696c265f019..6345dc55df64 100644
> > > --- a/include/linux/oom.h
> > > +++ b/include/linux/oom.h
> > > @@ -52,6 +52,9 @@ struct oom_control {
> > >
> > > /* Used to print the constraint info. */
> > > enum oom_constraint constraint;
> > > +
> > > +   /* Do not warn even if there is no process to be killed. */
> > > +   bool no_warn;
> > >  };
> > >
> > >  extern struct mutex oom_lock;
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index 317dbbaac603..a1f00d9b9bb0 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -1571,7 +1571,7 @@ unsigned long mem_cgroup_size(struct mem_cgroup 
> > > *memcg)
> > >  }
> > >
> > >  static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t 
> > > gfp_mask,
> > > -int order)
> > > +int order, bool no_warn)
> > >  {
> > > struct oom_control oc = {
> > > .zonelist = NULL,
> > > @@ -1579,6 +1579,7 @@ static bool mem_cgroup_out_of_memory(struct 
> > > mem_cgroup *memcg, gfp_t gfp_mask,
> > > .memcg = memcg,
> > > .gfp_mask = gfp_mask,
> > > .order = order,
> > > +   .no_warn = no_warn,
> > > };
> > > bool ret;
> > >
> > > @@ -1821,7 +1822,7 @@ static enum oom_status mem_cgroup_oom(struct 
> > > mem_cgroup *memcg, gfp_t mask, int
> > > mem_cgroup_oom_notify(memcg);
> > >
> > > mem_cgroup_unmark_under_oom(memcg);
> > > -   if (mem_cgroup_out_of_memory(memcg, mask, order))
> > > +   if (mem_cgroup_out_of_memory(memcg, mask, order, false))
> > > ret = OOM_SUCCESS;
> > > else
> > > ret = OOM_FAILED;
> > > @@ -1880,7 +1881,7 @@ bool mem_cgroup_oom_synchronize(bool handle)
> > > mem_cgroup_unmark_under_oom(memcg);
> > > finish_wait(_oom_waitq, );
> > > mem_cgroup_out_of_memory(memcg, 
> > > current->memcg_oom_gfp_mask,
> > > -current->memcg_oom_order);
> > > +current->memcg_oom_order, false);
> > > } else {
> > > schedule();
> > > mem_cgroup_unmark_under_oom(memcg);
> > > @@ -6106,7 +6107,7 @@ static ssize_t memory_max_write(struct 
> > > kernfs_open_file *of,
> > > }
> > >
> > > memcg_memory_event(memcg, MEMCG_OOM);
> > > -   if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0))
> > > +   if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0, true))
> > > break;
> > > }
> > >
> > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > > index 463b3d74a64a..5ace39f6fe1e 100644
> > > --- a/mm/oom_kill.c
> > > +++ b/mm/oom_kill.c
> > > @@ -1098,7 +1098,7 @@ bool out_of_memory(struct oom_control *oc)
> > >
> > > 

RE: Need help on "Self Detected Stall on CPU"

2020-04-30 Thread Atul Kulkarni
Thank you sir for your guidance and quick response.

Let me introduce my colleagues Paul and Mikhail here (copied in CC). They would 
be taking actions based on your guidance in this email and may reach you with 
further queries.

Appreciate your support and help.

Thanks,
Atul

-Original Message-
From: Paul E. McKenney  
Sent: 01 May 2020 00:47
To: Atul Kulkarni 
Cc: linux-kernel@vger.kernel.org
Subject: Re: Need help on "Self Detected Stall on CPU"

On Thu, Apr 30, 2020 at 06:47:20PM +, Atul Kulkarni wrote:
> Dear Sir,
> 
> Hope you are doing well.  I have watched your various conference videos and 
> have read technical papers.
> We are facing an issue with CPU stall on our systems and I felt like there is 
> no one better who can guide us on how we can deal with it.
> 
> I have attached logs for your reference. Towards end I have run couple of 
> sysreq commands and have taken crash dump using sysreq which may help provide 
> additional information.
> Could you please guide us on how we could fix  this issue or identify what is 
> going wrong here?

Let's focus on the first few lines of your console message:

[20526.345089] INFO: rcu_preempt self-detected stall on CPU [20526.351110]  
0-...: (1051 ticks this GP) idle=1fe/142/0 softirq=146268/146268 
fqs=0
[20526.360163]   (t=2101 jiffies g=96468 c=96467 q=2)
[20526.365535] rcu_preempt kthread starved for 2101 jiffies! g96468 c96467 f0x0 
RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0

The last line contains the hint, namely "rcu_preempt kthread starved for
2101 jiffies!"  If you don't let RCU's kernel threads run, then RCU CPU stall 
warnings are expected behavior.

The "RCU_GP_WAIT_FQS(3)" means that this kthread's last act was to sleep for 
three jiffies.  As you can see from earlier in that same line, that was 2101 
jiffies ago.  The "->state=0x402" means that the scheduler believes that this 
kthread is blocked, that is not yet runnable.

The usual way this sort of thing happens is a timer problem, be it a hardware 
configuration problem, a timer-driver bug, an interrupt-handling problem, and 
so on.  This sort of problem is especially common when bringing up new hardware 
or when modifying timer code or when modifying code on the interrupt/exception 
paths.

So the question to ask yourself is "Why is the timer wakeup not reaching this 
kthread?", with special attention to changed code and new hardware.

Thanx, Paul


Re: [PATCH V1 05/10] arch/kmap_atomic: Consolidate duplicate code

2020-04-30 Thread Al Viro
On Thu, Apr 30, 2020 at 01:38:40PM -0700, ira.we...@intel.com wrote:
> From: Ira Weiny 
> 
> Every arch has the same code to ensure atomic operations and a check for
> !HIGHMEM page.
> 
> Remove the duplicate code by defining a core kmap_atomic() which only
> calls the arch specific kmap_atomic_high() when the page is high memory.

Err  AFAICS, you've just silently changed the semantics for
kmap_atomic_prot() here.  And while most of the callers are converted,
drivers/gpu/drm/ttm/ttm_bo_util.c one is not, so at the very least it's
a bisect hazard...

And I would argue that having kmap_atomic() differ from kmap_atomic_prot()
wrt disabling preempt is asking for trouble.


Re: linux-next: manual merge of the btrfs tree with the btrfs-fixes tree

2020-04-30 Thread Qu Wenruo


On 2020/5/1 上午9:05, Stephen Rothwell wrote:
> Hi all,
> 
> On Fri, 1 May 2020 10:24:53 +1000 Stephen Rothwell  
> wrote:
>>
>> Today's linux-next merge of the btrfs tree got a conflict in:
>>
>>   fs/btrfs/transaction.c
>>
>> between commit:
>>
>>   fcc99734d1d4 ("btrfs: transaction: Avoid deadlock due to bad 
>> initialization timing of fs_info::journal_info")
>>
>> from the btrfs-fixes tree and commit:
>>
>>   f12ca53a6fd6 ("btrfs: force chunk allocation if our global rsv is larger 
>> than metadata")
>>
>> from the btrfs tree.
>>
>> I fixed it up (see below) and can carry the fix as necessary. This
>> is now fixed as far as linux-next is concerned, but any non trivial
>> conflicts should be mentioned to your upstream maintainer when your tree
>> is submitted for merging.  You may also want to consider cooperating
>> with the maintainer of the conflicting tree to minimise any particularly
>> complex conflicts.
>>
>> -- 
>> Cheers,
>> Stephen Rothwell
>>
>> diff --cc fs/btrfs/transaction.c
>> index 2d5498136e5e,e4dbd8e3c641..
>> --- a/fs/btrfs/transaction.c
>> +++ b/fs/btrfs/transaction.c
>> @@@ -666,15 -674,17 +672,26 @@@ got_it
>>  current->journal_info = h;
>>   
>>  /*
>>  +* btrfs_record_root_in_trans() needs to alloc new extents, and may
>>  +* call btrfs_join_transaction() while we're also starting a
>>  +* transaction.
>>  +*
>>  +* Thus it need to be called after current->journal_info initialized,
>>  +* or we can deadlock.
>>  +*/
>>  +   btrfs_record_root_in_trans(h, root);
>>  +
>> + * If the space_info is marked ALLOC_FORCE then we'll get upgraded to
>> + * ALLOC_FORCE the first run through, and then we won't allocate for
>> + * anybody else who races in later.  We don't care about the return
>> + * value here.
>> + */
>> +if (do_chunk_alloc && num_bytes) {
>> +u64 flags = h->block_rsv->space_info->flags;
>> +btrfs_chunk_alloc(h, btrfs_get_alloc_profile(fs_info, flags),
>> +  CHUNK_ALLOC_NO_FORCE);
>> +}
>> + 
>>  return h;

The proper fix has landed in David's misc-next branch, which puts
btrfs_record_root_in_trans(); after the if () {} code block.

By that, btrfs_record_root_in_trans() has lesser chance to hit ENOSPC.

Thanks,
Qu

>>   
>>   join_fail:
> 
> 
> I fixed the missing comment start in my resolution ...
> 



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] memcg: oom: ignore oom warnings from memory.max

2020-04-30 Thread Shakeel Butt
On Thu, Apr 30, 2020 at 6:39 PM Yafang Shao  wrote:
>
> On Fri, May 1, 2020 at 2:27 AM Shakeel Butt  wrote:
> >
> > Lowering memory.max can trigger an oom-kill if the reclaim does not
> > succeed. However if oom-killer does not find a process for killing, it
> > dumps a lot of warnings.
> >
>
> I have been confused by this behavior for several months and I think
> it will confuse more memcg users.
> We should keep the memcg oom behavior consistent with system oom - no
> oom kill if no process.
>
> What about bellow change ?
>

Seems fine to me. If others are ok with this, please do send a signed-off patch.

> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index e28098e13f1c..25fbc37a747f 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -6086,6 +6086,9 @@ static ssize_t memory_max_write(struct
> kernfs_open_file *of,
> continue;
> }
>
> +   if (!cgroup_is_populated(memcg->css.cgroup))
> +   break;
> +
> memcg_memory_event(memcg, MEMCG_OOM);
> if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0))
> break;
>
> > Deleting a memcg does not reclaim memory from it and the memory can
> > linger till there is a memory pressure. One normal way to proactively
> > reclaim such memory is to set memory.max to 0 just before deleting the
> > memcg. However if some of the memcg's memory is pinned by others, this
> > operation can trigger an oom-kill without any process and thus can log a
> > lot un-needed warnings. So, ignore all such warnings from memory.max.
> >
> > Signed-off-by: Shakeel Butt 
> > ---
> >  include/linux/oom.h | 3 +++
> >  mm/memcontrol.c | 9 +
> >  mm/oom_kill.c   | 2 +-
> >  3 files changed, 9 insertions(+), 5 deletions(-)
> >
> > diff --git a/include/linux/oom.h b/include/linux/oom.h
> > index c696c265f019..6345dc55df64 100644
> > --- a/include/linux/oom.h
> > +++ b/include/linux/oom.h
> > @@ -52,6 +52,9 @@ struct oom_control {
> >
> > /* Used to print the constraint info. */
> > enum oom_constraint constraint;
> > +
> > +   /* Do not warn even if there is no process to be killed. */
> > +   bool no_warn;
> >  };
> >
> >  extern struct mutex oom_lock;
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 317dbbaac603..a1f00d9b9bb0 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -1571,7 +1571,7 @@ unsigned long mem_cgroup_size(struct mem_cgroup 
> > *memcg)
> >  }
> >
> >  static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t 
> > gfp_mask,
> > -int order)
> > +int order, bool no_warn)
> >  {
> > struct oom_control oc = {
> > .zonelist = NULL,
> > @@ -1579,6 +1579,7 @@ static bool mem_cgroup_out_of_memory(struct 
> > mem_cgroup *memcg, gfp_t gfp_mask,
> > .memcg = memcg,
> > .gfp_mask = gfp_mask,
> > .order = order,
> > +   .no_warn = no_warn,
> > };
> > bool ret;
> >
> > @@ -1821,7 +1822,7 @@ static enum oom_status mem_cgroup_oom(struct 
> > mem_cgroup *memcg, gfp_t mask, int
> > mem_cgroup_oom_notify(memcg);
> >
> > mem_cgroup_unmark_under_oom(memcg);
> > -   if (mem_cgroup_out_of_memory(memcg, mask, order))
> > +   if (mem_cgroup_out_of_memory(memcg, mask, order, false))
> > ret = OOM_SUCCESS;
> > else
> > ret = OOM_FAILED;
> > @@ -1880,7 +1881,7 @@ bool mem_cgroup_oom_synchronize(bool handle)
> > mem_cgroup_unmark_under_oom(memcg);
> > finish_wait(_oom_waitq, );
> > mem_cgroup_out_of_memory(memcg, current->memcg_oom_gfp_mask,
> > -current->memcg_oom_order);
> > +current->memcg_oom_order, false);
> > } else {
> > schedule();
> > mem_cgroup_unmark_under_oom(memcg);
> > @@ -6106,7 +6107,7 @@ static ssize_t memory_max_write(struct 
> > kernfs_open_file *of,
> > }
> >
> > memcg_memory_event(memcg, MEMCG_OOM);
> > -   if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0))
> > +   if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0, true))
> > break;
> > }
> >
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index 463b3d74a64a..5ace39f6fe1e 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -1098,7 +1098,7 @@ bool out_of_memory(struct oom_control *oc)
> >
> > select_bad_process(oc);
> > /* Found nothing?!?! */
> > -   if (!oc->chosen) {
> > +   if (!oc->chosen && !oc->no_warn) {
> > dump_header(oc, NULL);
> > pr_warn("Out of memory and no killable processes...\n");
> > /*
> > --
> > 2.26.2.526.g744177e7f7-goog
> >
> >

[PATCH bpf-next 2/2] bpf, arm: Optimize ALU ARSH K using asr immediate instruction

2020-04-30 Thread Luke Nelson
This patch adds an optimization that uses the asr immediate instruction
for BPF_ALU BPF_ARSH BPF_K, rather than loading the immediate to
a temporary register. This is similar to existing code for handling
BPF_ALU BPF_{LSH,RSH} BPF_K. This optimization saves two instructions
and is more consistent with LSH and RSH.

Example of the code generated for BPF_ALU32_IMM(BPF_ARSH, BPF_REG_0, 5)
before the optimization:

  2c:  movr8, #5
  30:  movr9, #0
  34:  asrr0, r0, r8

and after optimization:

  2c:  asrr0, r0, #5

Tested on QEMU using lib/test_bpf and test_verifier.

Co-developed-by: Xi Wang 
Signed-off-by: Xi Wang 
Signed-off-by: Luke Nelson 
---
 arch/arm/net/bpf_jit_32.c | 10 +++---
 arch/arm/net/bpf_jit_32.h |  3 +++
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index 48b89211ee5c..0207b6ea6e8a 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -795,6 +795,9 @@ static inline void emit_a32_alu_i(const s8 dst, const u32 
val,
case BPF_RSH:
emit(ARM_LSR_I(rd, rd, val), ctx);
break;
+   case BPF_ARSH:
+   emit(ARM_ASR_I(rd, rd, val), ctx);
+   break;
case BPF_NEG:
emit(ARM_RSB_I(rd, rd, val), ctx);
break;
@@ -1408,7 +1411,6 @@ static int build_insn(const struct bpf_insn *insn, struct 
jit_ctx *ctx)
case BPF_ALU | BPF_MUL | BPF_X:
case BPF_ALU | BPF_LSH | BPF_X:
case BPF_ALU | BPF_RSH | BPF_X:
-   case BPF_ALU | BPF_ARSH | BPF_K:
case BPF_ALU | BPF_ARSH | BPF_X:
case BPF_ALU64 | BPF_ADD | BPF_K:
case BPF_ALU64 | BPF_ADD | BPF_X:
@@ -1465,10 +1467,12 @@ static int build_insn(const struct bpf_insn *insn, 
struct jit_ctx *ctx)
case BPF_ALU64 | BPF_MOD | BPF_K:
case BPF_ALU64 | BPF_MOD | BPF_X:
goto notyet;
-   /* dst = dst >> imm */
/* dst = dst << imm */
-   case BPF_ALU | BPF_RSH | BPF_K:
+   /* dst = dst >> imm */
+   /* dst = dst >> imm (signed) */
case BPF_ALU | BPF_LSH | BPF_K:
+   case BPF_ALU | BPF_RSH | BPF_K:
+   case BPF_ALU | BPF_ARSH | BPF_K:
if (unlikely(imm > 31))
return -EINVAL;
if (imm)
diff --git a/arch/arm/net/bpf_jit_32.h b/arch/arm/net/bpf_jit_32.h
index fb67cbc589e0..e0b593a1498d 100644
--- a/arch/arm/net/bpf_jit_32.h
+++ b/arch/arm/net/bpf_jit_32.h
@@ -94,6 +94,9 @@
 #define ARM_INST_LSR_I 0x01a00020
 #define ARM_INST_LSR_R 0x01a00030
 
+#define ARM_INST_ASR_I 0x01a00040
+#define ARM_INST_ASR_R 0x01a00050
+
 #define ARM_INST_MOV_R 0x01a0
 #define ARM_INST_MOVS_R0x01b0
 #define ARM_INST_MOV_I 0x03a0
-- 
2.17.1



[PATCH bpf-next 1/2] bpf, arm: Optimize ALU64 ARSH X using orrpl conditional instruction

2020-04-30 Thread Luke Nelson
This patch optimizes the code generated by emit_a32_arsh_r64, which
handles the BPF_ALU64 BPF_ARSH BPF_X instruction.

The original code uses a conditional B followed by an unconditional ORR.
The optimization saves one instruction by removing the B instruction
and using a conditional ORR (with an inverted condition).

Example of the code generated for BPF_ALU64_REG(BPF_ARSH, BPF_REG_0,
BPF_REG_1), before optimization:

  34:  rsbip, r2, #32
  38:  subs   r9, r2, #32
  3c:  lsrlr, r0, r2
  40:  orrlr, lr, r1, lsl ip
  44:  bmi0x4c
  48:  orrlr, lr, r1, asr r9
  4c:  asrip, r1, r2
  50:  movr0, lr
  54:  movr1, ip

and after optimization:

  34:  rsbip, r2, #32
  38:  subs   r9, r2, #32
  3c:  lsrlr, r0, r2
  40:  orrlr, lr, r1, lsl ip
  44:  orrpl  lr, lr, r1, asr r9
  48:  asrip, r1, r2
  4c:  movr0, lr
  50:  movr1, ip

Tested on QEMU using lib/test_bpf and test_verifier.

Co-developed-by: Xi Wang 
Signed-off-by: Xi Wang 
Signed-off-by: Luke Nelson 
---
 arch/arm/net/bpf_jit_32.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index bf85d6db4931..48b89211ee5c 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -860,8 +860,8 @@ static inline void emit_a32_arsh_r64(const s8 dst[], const 
s8 src[],
emit(ARM_SUBS_I(tmp2[0], rt, 32), ctx);
emit(ARM_MOV_SR(ARM_LR, rd[1], SRTYPE_LSR, rt), ctx);
emit(ARM_ORR_SR(ARM_LR, ARM_LR, rd[0], SRTYPE_ASL, ARM_IP), ctx);
-   _emit(ARM_COND_MI, ARM_B(0), ctx);
-   emit(ARM_ORR_SR(ARM_LR, ARM_LR, rd[0], SRTYPE_ASR, tmp2[0]), ctx);
+   _emit(ARM_COND_PL,
+ ARM_ORR_SR(ARM_LR, ARM_LR, rd[0], SRTYPE_ASR, tmp2[0]), ctx);
emit(ARM_MOV_SR(ARM_IP, rd[0], SRTYPE_ASR, rt), ctx);
 
arm_bpf_put_reg32(dst_lo, ARM_LR, ctx);
-- 
2.17.1



[PATCH bpf 0/2] bpf, arm: Small JIT optimizations

2020-04-30 Thread Luke Nelson
As Daniel suggested to us, we ran our formal verification tool, Serval,
over the arm JIT. The bugs we found have been patched and applied to the
bpf tree [1, 2]. This patch series introduces two small optimizations
that simplify the JIT and use fewer instructions.

[1] https://lore.kernel.org/bpf/20200408181229.10909-1-luke.r.n...@gmail.com/
[2] https://lore.kernel.org/bpf/20200409221752.28448-1-luke.r.n...@gmail.com/

Luke Nelson (2):
  bpf, arm: Optimize emit_a32_arsh_r64 using conditional instruction
  bpf, arm: Optimize ALU ARSH K using asr immediate instruction

 arch/arm/net/bpf_jit_32.c | 14 +-
 arch/arm/net/bpf_jit_32.h |  2 ++
 2 files changed, 11 insertions(+), 5 deletions(-)

-- 
2.17.1



Re: [PATCHv2] kretprobe: Prevent triggering kretprobe from within kprobe_flush_task

2020-04-30 Thread Masami Hiramatsu
On Tue, 28 Apr 2020 23:36:27 +0200
Jiri Olsa  wrote:

> On Fri, Apr 17, 2020 at 04:38:10PM +0900, Masami Hiramatsu wrote:
> 
> SNIP
> 
> > > 
> > > The code within the kretprobe handler checks for probe reentrancy,
> > > so we won't trigger any _raw_spin_lock_irqsave probe in there.
> > > 
> > > The problem is in outside kprobe_flush_task, where we call:
> > > 
> > >   kprobe_flush_task
> > > kretprobe_table_lock
> > >   raw_spin_lock_irqsave
> > > _raw_spin_lock_irqsave
> > > 
> > > where _raw_spin_lock_irqsave triggers the kretprobe and installs
> > > kretprobe_trampoline handler on _raw_spin_lock_irqsave return.
> > > 
> > > The kretprobe_trampoline handler is then executed with already
> > > locked kretprobe_table_locks, and first thing it does is to
> > > lock kretprobe_table_locks ;-) the whole lockup path like:
> > > 
> > >   kprobe_flush_task
> > > kretprobe_table_lock
> > >   raw_spin_lock_irqsave
> > > _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline 
> > > installed
> > > 
> > > ---> kretprobe_table_locks locked
> > > 
> > > kretprobe_trampoline
> > >   trampoline_handler
> > > kretprobe_hash_lock(current, , );  <--- deadlock
> > > 
> > > Adding kprobe_busy_begin/end helpers that mark code with fake
> > > probe installed to prevent triggering of another kprobe within
> > > this code.
> > > 
> > > Using these helpers in kprobe_flush_task, so the probe recursion
> > > protection check is hit and the probe is never set to prevent
> > > above lockup.
> > > 
> > 
> > Thanks Jiri!
> > 
> > Ingo, could you pick this up?
> 
> Ingo, any chance you could take this one?

Hi Ingo,

Should I make a pull request for all kprobes related patches to you?

Thank you,

-- 
Masami Hiramatsu 


Re: [PATCH] selftests/ftrace: treat module requirement unmet situation as unsupported

2020-04-30 Thread Masami Hiramatsu
Hi,

On Wed, 29 Apr 2020 17:50:44 +0800
Po-Hsu Lin  wrote:

> When the required module for the test does not exist, use
> exit_unsupported instead of exit_unresolved to indicate this test is
> not supported.

Hmm, this doesn't mean "the function is not supported" but
"we can not resolve this because of the environmental issue".
For example, if you forgot to install the modules, but the
function itself is enabled, that can not be tested, but the
system supports that feature.

> 
> By doing this we can make test behaviour in sync with the
> irqsoff_tracer.tc test in preemptirq, which is also treating module
> existence in this way. Moreover, the test won't exit with a non-zero
> return value if the module does not exist.

It is OK to return zero even if the result is "unresolved", but
I don't want to change the result of each test cases, because
it clarify that you must install modules correctly, instead of
enabling the feature.

And OK, I found irqsoff_tracer.tc IS incorrect. It should be fixed to
return UNRESOLVED if there is no test module.

If you still think UNRESOLVED is unneeded, please propose the patch
which removes all UNRESOLVED related code. That can start another
discussion.

Thank you,


> 
> Fixes: 646f01ccdd59 ("ftrace/selftest: Add tests to test 
> register_ftrace_direct()")
> Fixes: 4d23e9b4fd2e ("selftests/ftrace: Add trace_printk sample module test")
> Fixes: 7bc026d6c032 ("selftests/ftrace: Add function filter on module 
> testcase")
> Fixes: af2a0750f374 ("selftests/ftrace: Improve kprobe on module testcase to 
> load/unload module")
> Signed-off-by: Po-Hsu Lin 
> ---
>  tools/testing/selftests/ftrace/test.d/direct/ftrace-direct.tc  | 2 +-
>  tools/testing/selftests/ftrace/test.d/direct/kprobe-direct.tc  | 2 +-
>  tools/testing/selftests/ftrace/test.d/event/trace_printk.tc| 2 +-
>  tools/testing/selftests/ftrace/test.d/ftrace/func_mod_trace.tc | 2 +-
>  tools/testing/selftests/ftrace/test.d/kprobe/kprobe_module.tc  | 2 +-
>  5 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/testing/selftests/ftrace/test.d/direct/ftrace-direct.tc 
> b/tools/testing/selftests/ftrace/test.d/direct/ftrace-direct.tc
> index d75a869..3d6189e 100644
> --- a/tools/testing/selftests/ftrace/test.d/direct/ftrace-direct.tc
> +++ b/tools/testing/selftests/ftrace/test.d/direct/ftrace-direct.tc
> @@ -5,7 +5,7 @@
>  rmmod ftrace-direct ||:
>  if ! modprobe ftrace-direct ; then
>echo "No ftrace-direct sample module - please make 
> CONFIG_SAMPLE_FTRACE_DIRECT=m"
> -  exit_unresolved;
> +  exit_unsupported;
>  fi
>  
>  echo "Let the module run a little"
> diff --git a/tools/testing/selftests/ftrace/test.d/direct/kprobe-direct.tc 
> b/tools/testing/selftests/ftrace/test.d/direct/kprobe-direct.tc
> index 801ecb6..3d0e3ca 100644
> --- a/tools/testing/selftests/ftrace/test.d/direct/kprobe-direct.tc
> +++ b/tools/testing/selftests/ftrace/test.d/direct/kprobe-direct.tc
> @@ -5,7 +5,7 @@
>  rmmod ftrace-direct ||:
>  if ! modprobe ftrace-direct ; then
>echo "No ftrace-direct sample module - please build with 
> CONFIG_SAMPLE_FTRACE_DIRECT=m"
> -  exit_unresolved;
> +  exit_unsupported;
>  fi
>  
>  if [ ! -f kprobe_events ]; then
> diff --git a/tools/testing/selftests/ftrace/test.d/event/trace_printk.tc 
> b/tools/testing/selftests/ftrace/test.d/event/trace_printk.tc
> index b02550b..dd8b10d 100644
> --- a/tools/testing/selftests/ftrace/test.d/event/trace_printk.tc
> +++ b/tools/testing/selftests/ftrace/test.d/event/trace_printk.tc
> @@ -5,7 +5,7 @@
>  rmmod trace-printk ||:
>  if ! modprobe trace-printk ; then
>echo "No trace-printk sample module - please make 
> CONFIG_SAMPLE_TRACE_PRINTK=m"
> -  exit_unresolved;
> +  exit_unsupported;
>  fi
>  
>  echo "Waiting for irq work"
> diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/func_mod_trace.tc 
> b/tools/testing/selftests/ftrace/test.d/ftrace/func_mod_trace.tc
> index 1a4b4a4..26dc06a 100644
> --- a/tools/testing/selftests/ftrace/test.d/ftrace/func_mod_trace.tc
> +++ b/tools/testing/selftests/ftrace/test.d/ftrace/func_mod_trace.tc
> @@ -13,7 +13,7 @@ echo '*:mod:trace_printk' > set_ftrace_filter
>  if ! modprobe trace-printk ; then
>echo "No trace-printk sample module - please make 
> CONFIG_SAMPLE_TRACE_PRINTK=
>  m"
> -  exit_unresolved;
> +  exit_unsupported;
>  fi
>  
>  : "Wildcard should be resolved after loading module"
> diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_module.tc 
> b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_module.tc
> index d861bd7..4e07c69 100644
> --- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_module.tc
> +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_module.tc
> @@ -8,7 +8,7 @@ rmmod trace-printk ||:
>  if ! modprobe trace-printk ; then
>echo "No trace-printk sample module - please make 
> CONFIG_SAMPLE_TRACE_PRINTK=
>  m"
> -  exit_unresolved;
> +  exit_unsupported;
>  fi
>  
>  MOD=trace_printk
> -- 
> 2.7.4
> 


-- 
Masami Hiramatsu 


[PATCH 2/3] mm/swapfile.c: __swap_entry_free() always free 1 entry

2020-04-30 Thread Wei Yang
__swap_entry_free() always free 1 entry, let's remove the usage.

Signed-off-by: Wei Yang 
---
 mm/swapfile.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index f73e0c11fab9..1a877d1d40e3 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1251,13 +1251,14 @@ struct swap_info_struct *get_swap_device(swp_entry_t 
entry)
 }
 
 static unsigned char __swap_entry_free(struct swap_info_struct *p,
-  swp_entry_t entry, unsigned char usage)
+  swp_entry_t entry)
 {
struct swap_cluster_info *ci;
unsigned long offset = swp_offset(entry);
+   unsigned char usage;
 
ci = lock_cluster_or_swap_info(p, offset);
-   usage = __swap_entry_free_locked(p, offset, usage);
+   usage = __swap_entry_free_locked(p, offset, 1);
unlock_cluster_or_swap_info(p, ci);
if (!usage)
free_swap_slot(entry);
@@ -1292,7 +1293,7 @@ void swap_free(swp_entry_t entry)
 
p = _swap_info_get(entry);
if (p)
-   __swap_entry_free(p, entry, 1);
+   __swap_entry_free(p, entry);
 }
 
 /*
@@ -1715,7 +1716,7 @@ int free_swap_and_cache(swp_entry_t entry)
 
p = _swap_info_get(entry);
if (p) {
-   count = __swap_entry_free(p, entry, 1);
+   count = __swap_entry_free(p, entry);
if (count == SWAP_HAS_CACHE &&
!swap_page_trans_huge_swapped(p, entry))
__try_to_reclaim_swap(p, swp_offset(entry),
-- 
2.23.0



[PATCH 3/3] mm/swapfile.c: count won't be bigger than SWAP_MAP_MAX

2020-04-30 Thread Wei Yang
When the condition is true, there are two possibilities:

   1. count == SWAP_MAP_BAD
   2. count == (SWAP_MAP_MAX & COUNT_CONTINUED) == SWAP_MAP_SHMEM

The first case would be filtered by the first if in __swap_duplicate().

And the second case means this swap entry is for shmem. Since we never
do another duplication for shmem swap entry. This won't happen neither.

Signed-off-by: Wei Yang 
---
 mm/swapfile.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 1a877d1d40e3..88dd2ad34aad 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3404,8 +3404,6 @@ static int __swap_duplicate(swp_entry_t entry, unsigned 
char usage)
 
if ((count & ~COUNT_CONTINUED) < SWAP_MAP_MAX)
count += usage;
-   else if ((count & ~COUNT_CONTINUED) > SWAP_MAP_MAX)
-   err = -EINVAL;
else if (swap_count_continued(p, offset, count))
count = COUNT_CONTINUED;
else
-- 
2.23.0



[PATCH 1/3] mm/swapfile.c: classify SWAP_MAP_XXX to make it more readable

2020-04-30 Thread Wei Yang
swap_info_struct->swap_map[] encodes some flag and count. And to
do some condition check, it also introduces some special values.

Currently those macros are defined with some magic order, which makes
audience hard to understand the exact meaning.

This patch split those macros into three categories:

flag
special value for first swap_map
special value for continued swap_map

May this help audiences a little.

Signed-off-by: Wei Yang 
---
 include/linux/swap.h | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 1e99f7ac1d7e..ec507c67529b 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -183,12 +183,17 @@ enum {
 #define SWAP_CLUSTER_MAX 32UL
 #define COMPACT_CLUSTER_MAX SWAP_CLUSTER_MAX
 
-#define SWAP_MAP_MAX   0x3e/* Max duplication count, in first swap_map */
-#define SWAP_MAP_BAD   0x3f/* Note pageblock is bad, in first swap_map */
+/* Bit flag in swap_map */
 #define SWAP_HAS_CACHE 0x40/* Flag page is cached, in first swap_map */
-#define SWAP_CONT_MAX  0x7f/* Max count, in each swap_map continuation */
-#define COUNT_CONTINUED0x80/* See swap_map continuation for full 
count */
-#define SWAP_MAP_SHMEM 0xbf/* Owned by shmem/tmpfs, in first swap_map */
+#define COUNT_CONTINUED0x80/* Flag swap_map continuation for full 
count */
+
+/* Special Value in first swap_map */
+#define SWAP_MAP_MAX   0x3e/* Max count */
+#define SWAP_MAP_BAD   0x3f/* Note page is bad */
+#define SWAP_MAP_SHMEM 0xbf/* Owned by shmem/tmpfs */
+
+/* Special Value in each swap_map continuation */
+#define SWAP_CONT_MAX  0x7f/* Max count */
 
 /*
  * We use this to track usage of a cluster. A cluster is a block of swap disk
-- 
2.23.0



Re: [PATCH v3 1/1] scsi: pm: Balance pm_only counter of request queue during system resume

2020-04-30 Thread Bart Van Assche
On 2020-04-30 18:42, Can Guo wrote:
> On 2020-05-01 04:32, Bart Van Assche wrote:
> > Has it been considered to test directly whether a SCSI device has been
> > runtime suspended instead of relying on blk_queue_pm_only()? How about
> > using pm_runtime_status_suspended() or adding a function in
> > block/blk-pm.h that checks whether q->rpm_status == RPM_SUSPENDED?
> 
> Yes, I used to make the patch like that way, and it also worked well, as
> both ways are equal actually. I kinda like the current code because we
> should be confident that after scsi_dev_type_resume() returns, pm_only
> must be 0. Different reviewers may have different opinions, either way
> works well anyways.

Hi Can,

Please note that this is not a matter of personal preferences of a
reviewer but a matter of correctness. blk_queue_pm_only() does not only
return a value > 0 if a SCSI device has been runtime suspended but also
returns true if scsi_device_quiesce() was called for another reason.
Hence my request to test the "runtime suspended" status directly and not
to rely on blk_queue_pm_only().

Thanks,

Bart.


Re: [RFC PATCH V2 1/9] include/linux/pagemap.h: introduce attach/clear_page_private

2020-04-30 Thread Al Viro
On Fri, May 01, 2020 at 02:42:29AM +0100, Al Viro wrote:
> On Thu, Apr 30, 2020 at 03:13:38PM -0700, Matthew Wilcox wrote:
> 
> > > +/**
> > > + * clear_page_private - clear page's private field and PG_private.
> > > + * @page: page to be cleared.
> > > + *
> > > + * The counterpart function of attach_page_private.
> > > + * Return: private data of page or NULL if page doesn't have private 
> > > data.
> > > + */
> > 
> > Seems to me that the opposite of "attach" is "detach", not "clear".
> 
> Or "remove", perhaps...

Actually, "detach" is better - neither "clear" nor "remove" imply "... and give
me what used to be attached there", as this thing is doing.


Re: [RFC PATCH v2 1/3] printk: Add function to set console to preferred console's driver

2020-04-30 Thread Sergey Senozhatsky
On (20/04/30 19:14), Alper Nebi Yasak wrote:
> Currently, add_preferred_console sets a preferred console, but doesn't
> actually change /dev/console to match it. That part is handled within
> register_device, where a newly registered console driver will be set as
> /dev/console if it matches the preferred console.
> 
> However, if the relevant driver is already registered, the only way to
> set it as /dev/console is by un-registering and re-registering it.

Hmm. Preferred console selection is very fragile, there are too many
setups and workarounds that even minor tweaks introduce regressions
oftentimes.

We have, by the way, a pending patchset which changes the same
are - preferred console selection.

git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk.git 
for-5.7-preferred-console

[..]
> An example is the xenfb_make_preferred_console() function:
> 
>   console_lock();
>   for_each_console(c) {
>   if (!strcmp(c->name, "tty") && c->index == 0)
>   break;
>   }
>   console_unlock();
>   if (c) {
>   unregister_console(c);
>   c->flags |= CON_CONSDEV;
>   c->flags &= ~CON_PRINTBUFFER; /* don't print again */
>   register_console(c);
>   }

I didn't know about this code.

> The code above was introduced in commit 9e124fe16ff2 ("xen: Enable
> console tty by default in domU if it's not a dummy"). In short, it's aim
> is to set VT as the preferred console only after a working framebuffer
> is registered and thus VT is not the dummy device.
> 
> This patch introduces an update_console_to_preferred function that
> handles the necessary /dev/console change. With this change, the example
> above can be replaced with:
> 
>   console_lock();
>   add_preferred_console("tty", 0, NULL);
>   update_console_to_preferred();
>   console_unlock();
> 
> More importantly, these two calls can be moved to vt.c in order to bump
> its priority when a non-dummy backend for it is introduced, solving that
> problem in general.

Let me take a look over the weekend.

-ss


Re: [RFC PATCH V2 1/9] include/linux/pagemap.h: introduce attach/clear_page_private

2020-04-30 Thread Al Viro
On Thu, Apr 30, 2020 at 03:13:38PM -0700, Matthew Wilcox wrote:

> > +/**
> > + * clear_page_private - clear page's private field and PG_private.
> > + * @page: page to be cleared.
> > + *
> > + * The counterpart function of attach_page_private.
> > + * Return: private data of page or NULL if page doesn't have private data.
> > + */
> 
> Seems to me that the opposite of "attach" is "detach", not "clear".

Or "remove", perhaps...


Re: [PATCH v3 1/1] scsi: pm: Balance pm_only counter of request queue during system resume

2020-04-30 Thread Can Guo

On 2020-05-01 04:32, Bart Van Assche wrote:

On 2020-04-29 22:40, Can Guo wrote:

On 2020-04-30 13:08, Bart Van Assche wrote:

On 2020-04-29 21:10, Can Guo wrote:
During system resume, scsi_resume_device() decreases a request 
queue's
pm_only counter if the scsi device was quiesced before. But after 
that,
if the scsi device's RPM status is RPM_SUSPENDED, the pm_only 
counter is
still held (non-zero). Current scsi resume hook only sets the RPM 
status
of the scsi device and its request queue to RPM_ACTIVE, but leaves 
the

pm_only counter unchanged. This may make the request queue's pm_only
counter remain non-zero after resume hook returns, hence those who 
are
waiting on the mq_freeze_wq would never be woken up. Fix this by 
calling
blk_post_runtime_resume() if pm_only is non-zero to balance the 
pm_only

counter which is held by the scsi device's RPM ops.


How was this issue discovered? How has this patch been tested?


As the issue was found after system resumes, so the issue was 
discovered
during system suspend/resume test, and it is very easy to be 
replicated.
After system resumes, if this issue hits some scsi devices, all bios 
sent
to their request queues are blocked, which may cause a system hang if 
the

scsi devices are vital to system functionality.

To make sure the patch work well, we have tested system suspend/resume
and made sure no system hang happen due to request queues got blocked
by imbalanced pm_only counter.


Thanks, that's very interesting information. My concern with this patch
is that the power management code is not the only caller of
blk_set_pm_only() / blk_clear_pm_only(). E.g. the SCSI SPI code also
calls scsi_device_quiesce() and scsi_device_resume(). These last
functions call blk_set_pm_only() and blk_clear_pm_only(). More calls of
scsi_device_quiesce() and scsi_device_resume() might be added in the 
future.


Has it been considered to test directly whether a SCSI device has been
runtime suspended instead of relying on blk_queue_pm_only()? How about
using pm_runtime_status_suspended() or adding a function in
block/blk-pm.h that checks whether q->rpm_status == RPM_SUSPENDED?

Thanks,

Bart.


Hi Bart,

Slightly revised my previous mail.

Please let me address your concern.

First of all, it is allowed to call scsi_device_quiesce() multiple 
times,
but one sdev's request queue's pm_only counter can only be increased 
once

by scsi_device_quiesce(), because if a sdev has already been quiesced,
in scsi_device_quiesce(), scsi_device_set_state(sdev, SDEV_QUIESCE) 
would

return -ENIVAL (illegal state transform), then blk_clear_pm_only() shall
be called to decrease pm_only once, so no matter how many times
scsi_device_quiesce() is called, it can only increase pm_only once.

scsi_device_resume() is same, it calls blk_clear_pm_only only once and
only if the sdev was quiesced().

So, in a word, after scsi_device_resume() returns in 
scsi_dev_type_resume(),
if a sdev has block layer runtime PM enabled (sdev->request_queue->dev 
is not
NULL), its queue's pm_only counter should be 1 (if the sdev's runtime 
power
status is RPM_SUSPENDED) or 0 (if the sdev's runtime power status is 
RPM_ACTIVE).
If a sdev has block layer runtime PM disabled (sdev->request_queue->dev 
is NULL),

its queue's pm_only counter should be 0.

Has it been considered to test directly whether a SCSI device has been
runtime suspended instead of relying on blk_queue_pm_only()? How about
using pm_runtime_status_suspended() or adding a function in
block/blk-pm.h that checks whether q->rpm_status == RPM_SUSPENDED?

Yes, I used to make the patch like that way, and it also worked well, as
both ways are equal actually. I kinda like the current code because we
should be confident that after scsi_dev_type_resume() returns, pm_only
must be 0. Different reviewers may have different opinions, either way
works well anyways.

Thanks,

Can Guo.


Re: [PATCH -next] tty: serial: bcm63xx: fix missing clk_put() in bcm63xx_uart

2020-04-30 Thread Florian Fainelli



On 4/28/2020 4:18 AM, Greg KH wrote:
> On Mon, Apr 27, 2020 at 10:29:58AM -0700, Florian Fainelli wrote:
>>
>>
>> On 4/26/2020 11:19 PM, Jiri Slaby wrote:
>>> On 21. 04. 20, 14:31, Zou Wei wrote:
 This patch fixes below error reported by coccicheck

 drivers/tty/serial/bcm63xx_uart.c:848:2-8: ERROR: missing clk_put;
 clk_get on line 842 and execution via conditional on line 846

 Fixes: ab4382d27412 ("tty: move drivers/serial/ to drivers/tty/serial/")
 Reported-by: Hulk Robot 
 Signed-off-by: Zou Wei 
 ---
  drivers/tty/serial/bcm63xx_uart.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

 diff --git a/drivers/tty/serial/bcm63xx_uart.c 
 b/drivers/tty/serial/bcm63xx_uart.c
 index 5674da2..ed0aa5c 100644
 --- a/drivers/tty/serial/bcm63xx_uart.c
 +++ b/drivers/tty/serial/bcm63xx_uart.c
 @@ -843,8 +843,10 @@ static int bcm_uart_probe(struct platform_device 
 *pdev)
if (IS_ERR(clk) && pdev->dev.of_node)
clk = of_clk_get(pdev->dev.of_node, 0);
  
 -  if (IS_ERR(clk))
 +  if (IS_ERR(clk)) {
 +  clk_put(clk);
>>>
>>> Why would you want to put an erroneous clk?
>>
>> Doh, somehow I completely missed, you are right this does not look legit.
> 
> Ugh, can you send a revert for this please?

Yes, now done:

https://lore.kernel.org/linux-arm-kernel/20200501013904.1394-1-f.faine...@gmail.com/
-- 
Florian


  1   2   3   4   5   6   7   8   9   10   >