Re: [PATCH v2 02/15] perf ftrace: add option '-F/--funcs' to list available functions

2020-07-02 Thread Namhyung Kim
On Sat, Jun 27, 2020 at 10:37 PM Changbin Du  wrote:
>
> This adds an option '-F/--funcs' to list all available functions to trace,
> which is read from tracing file 'available_filter_functions'.
>
> $ sudo ./perf ftrace -F | head
> trace_initcall_finish_cb
> initcall_blacklisted
> do_one_initcall
> do_one_initcall
> trace_initcall_start_cb
> run_init_process
> try_to_run_init_process
> match_dev_by_label
> match_dev_by_uuid
> rootfs_init_fs_context
>
> Signed-off-by: Changbin Du 
>
> ---
> v2: option name '-l/--list-functions' -> '-F/--funcs'
> ---
>  tools/perf/Documentation/perf-ftrace.txt |  4 +++
>  tools/perf/builtin-ftrace.c  | 43 
>  2 files changed, 47 insertions(+)
>
> diff --git a/tools/perf/Documentation/perf-ftrace.txt b/tools/perf/Documentation/perf-ftrace.txt
> index 952e46669168..d79560dea19f 100644
> --- a/tools/perf/Documentation/perf-ftrace.txt
> +++ b/tools/perf/Documentation/perf-ftrace.txt
> @@ -30,6 +30,10 @@ OPTIONS
>  --verbose=::
>  Verbosity level.
>
> +-F::
> +--funcs::
> +List all available functions to trace.
> +
>  -p::
>  --pid=::
> Trace on existing process id (comma separated list).
> diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
> index c5718503eded..e793118e83a9 100644
> --- a/tools/perf/builtin-ftrace.c
> +++ b/tools/perf/builtin-ftrace.c
> @@ -32,6 +32,7 @@ struct perf_ftrace {
> struct evlist   *evlist;
> struct target   target;
> const char  *tracer;
> +   bool    list_avail_functions;
> struct list_headfilters;
> struct list_headnotrace;
> struct list_headgraph_funcs;
> @@ -127,6 +128,43 @@ static int append_tracing_file(const char *name, const char *val)
> return __write_tracing_file(name, val, true);
>  }
>
> +static int read_tracing_file_to_stdout(const char *name)
> +{
> +   char buf[4096];
> +   char *file;
> +   int fd;
> +   int ret = -1;
> +
> +   file = get_tracing_file(name);
> +   if (!file) {
> +   pr_debug("cannot get tracing file: %s\n", name);
> +   return -1;
> +   }
> +
> +   fd = open(file, O_RDONLY);
> +   if (fd < 0) {
> +   pr_debug("cannot open tracing file: %s: %s\n",
> +name, str_error_r(errno, buf, sizeof(buf)));
> +   goto out;
> +   }
> +
> +   /* read contents to stdout */
> +   while (true) {
> +   int n = read(fd, buf, sizeof(buf));
> +   if (n <= 0)
> +   goto out_close;
> +   if (fwrite(buf, n, 1, stdout) != 1)
> +   goto out_close;
> +   }
> +   ret = 0;

It seems the return value cannot be 0?

Thanks
Namhyung

> +
> +out_close:
> +   close(fd);
> +out:
> +   put_tracing_file(file);
> +   return ret;
> +}
> +
>  static int reset_tracing_cpu(void);
>  static void reset_tracing_filters(void);
>
> @@ -301,6 +339,9 @@ static int __cmd_ftrace(struct perf_ftrace *ftrace, int argc, const char **argv)
> signal(SIGCHLD, sig_handler);
> signal(SIGPIPE, sig_handler);
>
> +   if (ftrace->list_avail_functions)
> +   return read_tracing_file_to_stdout("available_filter_functions");
> +
> if (reset_tracing_files(ftrace) < 0) {
> pr_err("failed to reset ftrace\n");
> goto out;
> @@ -470,6 +511,8 @@ int cmd_ftrace(int argc, const char **argv)
> const struct option ftrace_options[] = {
> OPT_STRING('t', "tracer", &ftrace.tracer, "tracer",
>"tracer to use: function or function_graph (This option is deprecated)"),
> +   OPT_BOOLEAN('F', "funcs", &ftrace.list_avail_functions,
> +   "Show available functions to filter"),
> OPT_STRING('p', "pid", &ftrace.target.pid, "pid",
>"trace on existing process id"),
> OPT_INCR('v', "verbose", &verbose,
> --
> 2.25.1
>
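
Regarding Namhyung's point above: as written, the loop can only exit via
goto, so "ret = 0" is dead code. For illustration, one way the loop could
be restructured so the success path becomes reachable (a sketch only, not
the author's follow-up patch):

	/* Treat EOF as the success case; only a read error or a short
	 * fwrite() takes the error path, leaving ret at -1. */
	while (true) {
		int n = read(fd, buf, sizeof(buf));

		if (n == 0)
			break;			/* EOF: done copying */
		if (n < 0 || fwrite(buf, n, 1, stdout) != 1)
			goto out_close;
	}
	ret = 0;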


Re: [PATCH 16/23] seq_file: switch over direct seq_read method calls to seq_read_iter

2020-07-02 Thread Miguel Ojeda
On Thu, Jul 2, 2020 at 3:50 PM Christoph Hellwig  wrote:
>
> Do you have a suggestion for an automated replacement which does?
> I'll happily switch over to that.

I guess I'd simply find the unique set of cases that occur and create
a replacement for each manually. A handful of them or so may already
cover the majority of cases. CC'ing Joe since he deals with this sort
of stuff all the time.

Some cannot be handled with replacements, e.g. re-aligning the full
list is required to fit the longer `_iter` -- if you want to cover
those cases too, applying `clang-format` to the initializer may be a
good approach.
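
For instance, one such mechanical replacement could look like this (GNU
sed assumed; this only covers the single-space spelling, so one
expression would be added per spacing variant that actually occurs):

	git grep -l '\.read = seq_read,' | \
		xargs sed -i 's/\.read = seq_read,/.read_iter = seq_read_iter,/'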

Cheers,
Miguel


Re: [PATCH] sparc: sparc64_defconfig: add necessary configs for qemu

2020-07-02 Thread LABBE Corentin
On Thu, Jul 02, 2020 at 08:58:40PM +0100, Mark Cave-Ayland wrote:
> On 02/07/2020 14:02, Corentin Labbe wrote:
> 
> > The sparc64 qemu machines use pcnet32 network hardware by default, so for
> > simple boot testing using qemu, having PCNET32 is useful.
> > Same for its storage which is a PATA_CMD64.
> 
> Which version of QEMU are you using? qemu-system-sparc64 switched to using a 
> hme NIC
> by default in version 2.11 (see
> https://wiki.qemu.org/Documentation/Platforms/SPARC#Changes_to_sun4u_machine_from_2.11_onwards)
> which is well over 2 years ago...
> 

You are right, I verified in the code and it is sunhme by default.
So I will verify it works and send a v2.


Re: [bpf] af7ec13833: will-it-scale.per_process_ops -2.5% regression

2020-07-02 Thread Rong Chen




On 6/29/20 11:10 PM, Yonghong Song wrote:



On 6/28/20 1:50 AM, kernel test robot wrote:

Greeting,

FYI, we noticed a -2.5% regression of will-it-scale.per_process_ops due to commit:

commit: af7ec13833619e17f03aa73a785a2f871da6d66b ("bpf: Add bpf_skc_to_tcp6_sock() helper")

https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master


One of the previous emails claims that
    commit: 492e639f0c222784e2e0f121966375f641c61b15 ("bpf: Add bpf_seq_printf and bpf_seq_write helpers")
is responsible for a 2.5% improvement in will-it-scale.per_process_ops,
which I believe is false.


This commit should not cause a regression.

The performance variation is probably caused by the test environment,
which you may want to investigate further to reduce false alarms.

Thanks!


Hi Yonghong,

It's a function alignment issue: the commit affects the alignment of
functions, which causes a small regression. If we force
-falign-functions=32 in KBUILD_CFLAGS, the regression is gone:


diff --git a/Makefile b/Makefile
index 70def4907036c..9746afa4edc21 100644
--- a/Makefile
+++ b/Makefile
@@ -476,7 +476,7 @@ LINUXINCLUDE    := \
    $(USERINCLUDE)

 KBUILD_AFLAGS   := -D__ASSEMBLY__ -fno-PIE
-KBUILD_CFLAGS   := -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs \
+KBUILD_CFLAGS   := -Wall -Wundef -falign-functions=32 -Werror=strict-prototypes -Wno-trigraphs \
                   -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE \
                   -Werror=implicit-function-declaration -Werror=implicit-int \
                   -Wno-format-security \


Best Regards,
Rong Chen





in testcase: will-it-scale
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 
2.30GHz with 192G memory

with following parameters:

nr_task: 16
mode: process
test: mmap1
cpufreq_governor: performance
ucode: 0x5002f01

test-description: Will It Scale takes a testcase and runs it from 1 
through to n parallel copies to see if the testcase will scale. It 
builds both a process and threads based test in order to see any 
differences between the two.

test-url: https://github.com/antonblanchard/will-it-scale



If you fix the issue, kindly add following tag
Reported-by: kernel test robot 


Details are as below:

[...]




RE: [PATCH 0/3] Dynamic CPU frequency switching for the HiFive

2020-07-02 Thread Yash Shah
> -Original Message-
> From: David Abdurachmanov 
> Sent: 01 July 2020 17:34
> To: Andreas Schwab 
> Cc: Yash Shah ; devicet...@vger.kernel.org; Albert
> Ou ; Atish Patra ; Anup
> Patel ; lolliv...@baylibre.com; linux-
> ker...@vger.kernel.org List ; Green Wan
> ; Sachin Ghadi ;
> robh...@kernel.org; Palmer Dabbelt ;
> deepa.ker...@gmail.com; Paul Walmsley ( Sifive)
> ; Alistair Francis ;
> linux-riscv ; Bin Meng
> 
> Subject: Re: [PATCH 0/3] Dynamic CPU frequency switching for the HiFive
> 
> [External Email] Do not click links or attachments unless you recognize the
> sender and know the content is safe
> 
> On Wed, Jul 1, 2020 at 1:41 PM Andreas Schwab  wrote:
> >
> > On Jun 16 2020, Yash Shah wrote:
> >
> > > The patch series adds the support for dynamic CPU frequency
> > > switching for FU540-C000 SoC on the HiFive Unleashed board. All the
> > > patches are based on Paul Walmsley's work.
> > >
> > > This series is based on Linux v5.7 and tested on HiFive unleashed board.
> >
> > I'm using that patch with 5.7.5.
> >
> > It appears to interfere with serial output when using the ondemand
> > governor.
> 
> I do recall that the userspace governor is the only one supported, but this might
> have changed before this patch was posted.
> 
> Yash, do you have more details?

Yes, you are right. The userspace governor is the only one supported.

- Yash

> 
> >
> > I also see soft lockups when using the performance governor:
> >
> > [  101.587527] rcu: INFO: rcu_sched self-detected stall on CPU
> > [  101.592322] rcu: 0-...!: (932 ticks this GP) idle=11a/1/0x4004 softirq=4301/4301 fqs=4
> > [  101.601432]  (t=6001 jiffies g=4017 q=859)
> > [  101.605514] rcu: rcu_sched kthread starved for 5984 jiffies! g4017 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=2
> > [  101.615494] rcu: RCU grace-period kthread stack dump:
> > [  101.620530] rcu_sched   R  running task010  2 0x
> > [  101.627560] Call Trace:
> > [  101.630004] [] __schedule+0x25c/0x616
> > [  101.635205] [] schedule+0x42/0xb2
> > [  101.640070] [] schedule_timeout+0x56/0xb8
> > [  101.645626] [] rcu_gp_fqs_loop+0x208/0x248
> > [  101.651266] [] rcu_gp_kthread+0xc2/0xcc
> > [  101.656651] [] kthread+0xda/0xec
> > [  101.661426] [] ret_from_exception+0x0/0xc
> > [  101.666977] Task dump for CPU 0:
> > [  101.670187] loop0   R  running task0   655  2 0x0008
> > [  101.677218] Call Trace:
> > [  101.679657] [] walk_stackframe+0x0/0xaa
> > [  101.685036] [] show_stack+0x2a/0x34
> > [  101.690074] [] sched_show_task.part.0+0xc2/0xd2
> > [  101.696154] [] sched_show_task+0x64/0x66
> > [  101.701618] [] dump_cpu_task+0x3e/0x48
> > [  101.706916] [] rcu_dump_cpu_stacks+0x94/0xce
> > [  101.712731] [] print_cpu_stall+0x116/0x18a
> > [  101.718375] [] check_cpu_stall+0xcc/0x1a2
> > [  101.723929] [] rcu_pending.constprop.0+0x36/0xaa
> > [  101.730094] [] rcu_sched_clock_irq+0xa6/0xea
> > [  101.735913] [] update_process_times+0x1e/0x42
> > [  101.741821] [] tick_sched_handle+0x26/0x52
> > [  101.747456] [] tick_sched_timer+0x6a/0xd0
> > [  101.753015] [] __run_hrtimer.constprop.0+0x50/0xe8
> > [  101.759353] [] __hrtimer_run_queues+0x48/0x6c
> > [  101.765254] [] hrtimer_interrupt+0xca/0x1d4
> > [  101.770985] [] riscv_timer_interrupt+0x32/0x3a
> > [  101.776976] [] do_IRQ+0xa4/0xb8
> > [  101.781663] [] ret_from_exception+0x0/0xc
> >
> > Andreas.
> >
> > --
> > Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196
> > BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7 "And now for something
> > completely different."
> >
> > ___
> > linux-riscv mailing list
> > linux-ri...@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv


Re: [PATCH 2/2] devicetree: hwmon: shtc1: Add sensirion,shtc1.yaml

2020-07-02 Thread Guenter Roeck
On 7/2/20 8:48 PM, Chris Ruehl wrote:
> Add documentation for the newly added DTS support in the shtc1 driver.
> 
> Signed-off-by: Chris Ruehl 
> ---
>  .../bindings/hwmon/sensirion,shtc1.yaml   | 53 +++
>  1 file changed, 53 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/hwmon/sensirion,shtc1.yaml
> 
> diff --git a/Documentation/devicetree/bindings/hwmon/sensirion,shtc1.yaml b/Documentation/devicetree/bindings/hwmon/sensirion,shtc1.yaml
> new file mode 100644
> index ..e3e292bc6d7d
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/hwmon/sensirion,shtc1.yaml
> @@ -0,0 +1,53 @@
> +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/hwmon/sensirion,shtc1.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Sensirion SHTC1 Humidity and Temperature Sensor IC
> +
> +maintainers:
> +  - jdelv...@suse.com
> +
> +description: |
> +  The SHTC1, SHTW1 and SHTC3 are digital humidity and temperature sensors
> +  designed especially for battery-driven high-volume consumer electronics
> +  applications.
> +  For further information refer to Documentation/hwmon/shtc1.rst
> +
> +  This binding document describes the binding for the hardware monitor
> +  portion of the driver.
> +
> +properties:
> +  compatible:
> +enum:
> +  - sensirion,shtc1
> +  - sensirion,shtw1
> +  - sensirion,shtc3
> +
> +  reg: I2C address 0x70
> +
> +Optional properties:
> +  sensirion,blocking_io: |
> +u8, if > 0 the i2c bus hold until measure finished (default 0)
> +  sensirion,high_precision: |
> +u8, if > 0 aquire data with high precision (default 1)
> +

Why u8 and not boolean ?

Guenter

> +required:
> +  - compatible
> +  - reg
> +
> +additionalProperties: false
> +
> +Example:
> +   {
> +status = "okay";
> +clock-frequency = <40>;
> +
> +shtc3@70 {
> +  compatible = "sensirion,shtc3";
> +  reg = <0x70>
> +  sensirion,blocking_io = <1>;
> +  status = "okay";
> +};
> +  };
> 
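
For comparison, with boolean properties the example node would look
roughly like this (the hyphenated property names are assumptions for
illustration, not an accepted binding):

	shtc3@70 {
		compatible = "sensirion,shtc3";
		reg = <0x70>;
		/* presence of the property means true, absence means false */
		sensirion,blocking-io;
	};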



Re: [PATCH RFT] iio: adc: xilinx-xadc: use devm_krealloc()

2020-07-02 Thread kernel test robot
Hi Bartosz,

I love your patch! Yet something to improve:

[auto build test ERROR on iio/togreg]
[also build test ERROR on staging/staging-testing v5.8-rc3 next-20200702]
[cannot apply to xlnx/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Bartosz-Golaszewski/iio-adc-xilinx-xadc-use-devm_krealloc/20200703-002747
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio.git togreg
config: x86_64-allyesconfig (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 
ca464639a1c9dd3944eb055ffd2796e8c2e7639f)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install x86_64 cross compiling tool for clang build
# apt-get install binutils-x86-64-linux-gnu
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All error/warnings (new ones prefixed by >>):

>> drivers/iio/adc/xilinx-xadc-core.c:1179:24: error: implicit declaration of 
>> function 'devm_krealloc' [-Werror,-Wimplicit-function-declaration]
   indio_dev->channels = devm_krealloc(dev, channels,
 ^
>> drivers/iio/adc/xilinx-xadc-core.c:1179:22: warning: incompatible integer to 
>> pointer conversion assigning to 'const struct iio_chan_spec *' from 'int' 
>> [-Wint-conversion]
   indio_dev->channels = devm_krealloc(dev, channels,
   ^ 
   1 warning and 1 error generated.

vim +/devm_krealloc +1179 drivers/iio/adc/xilinx-xadc-core.c

  1093  
  1094  static int xadc_parse_dt(struct iio_dev *indio_dev, struct device_node 
*np,
  1095  unsigned int *conf)
  1096  {
  1097  struct device *dev = indio_dev->dev.parent;
  1098  struct xadc *xadc = iio_priv(indio_dev);
  1099  struct iio_chan_spec *channels, *chan;
  1100  struct device_node *chan_node, *child;
  1101  unsigned int num_channels;
  1102  const char *external_mux;
  1103  u32 ext_mux_chan;
  1104  u32 reg;
  1105  int ret;
  1106  
  1107  *conf = 0;
  1108  
  1109  ret = of_property_read_string(np, "xlnx,external-mux", &external_mux);
  1110  if (ret < 0 || strcasecmp(external_mux, "none") == 0)
  1111  xadc->external_mux_mode = XADC_EXTERNAL_MUX_NONE;
  1112  else if (strcasecmp(external_mux, "single") == 0)
  1113  xadc->external_mux_mode = XADC_EXTERNAL_MUX_SINGLE;
  1114  else if (strcasecmp(external_mux, "dual") == 0)
  1115  xadc->external_mux_mode = XADC_EXTERNAL_MUX_DUAL;
  1116  else
  1117  return -EINVAL;
  1118  
  1119  if (xadc->external_mux_mode != XADC_EXTERNAL_MUX_NONE) {
  1120  ret = of_property_read_u32(np, "xlnx,external-mux-channel",
  1121          &ext_mux_chan);
  1122  if (ret < 0)
  1123  return ret;
  1124  
  1125  if (xadc->external_mux_mode == XADC_EXTERNAL_MUX_SINGLE) {
  1126  if (ext_mux_chan == 0)
  1127  ext_mux_chan = XADC_REG_VPVN;
  1128  else if (ext_mux_chan <= 16)
  1129  ext_mux_chan = XADC_REG_VAUX(ext_mux_chan - 1);
  1130  else
  1131  return -EINVAL;
  1132  } else {
  1133  if (ext_mux_chan > 0 && ext_mux_chan <= 8)
  1134  ext_mux_chan = XADC_REG_VAUX(ext_mux_chan - 1);
  1135  else
  1136  return -EINVAL;
  1137  }
  1138  
  1139  *conf |= XADC_CONF0_MUX | XADC_CONF0_CHAN(ext_mux_chan);
  1140  }
  1141  
  1142  channels = devm_kmemdup(dev, xadc_channels,
  1143  sizeof(xadc_channels), GFP_KERNEL);
  1144  if (!channels)
  1145  return -ENOMEM;
  1146  
  1147  num_channels = 9;
  1148  chan = [9];
  1149  
  1150  chan_node = of_get_child_by_name(np, "xlnx,channels");
  1151  if (chan_node) {
  1152  for_each_child_of_node(chan_node, child) {
  1153  if (num_channels >= ARRAY_SIZE(xadc_channels)) {
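
The error above just means devm_krealloc() does not exist in the base
tree the robot applied this patch to; the helper is proposed in a
separate series. For local build testing, a shrink-only stand-in such as
the following sketch would suffice (an illustration under that
assumption, not the proposed devres helper):

	/* Shrink-only stand-in: valid for xadc_parse_dt() because it only
	 * ever reallocates the channel array downward, so copying new_size
	 * bytes from the old buffer cannot overread. */
	static void *devm_krealloc_shrink(struct device *dev, void *ptr,
					  size_t new_size, gfp_t gfp)
	{
		void *n = devm_kmalloc(dev, new_size, gfp);

		if (!n)
			return NULL;
		memcpy(n, ptr, new_size);	/* new_size <= old allocation */
		devm_kfree(dev, ptr);		/* release the oversized original */
		return n;
	}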
 

Re: [PATCH v11 2/2] phy: samsung-ufs: add UFS PHY driver for samsung SoC

2020-07-02 Thread kernel test robot
Hi Alim,

I love your patch! Perhaps something to improve:

[auto build test WARNING on robh/for-next]
[also build test WARNING on soc/for-next linus/master v5.8-rc3 next-20200702]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Alim-Akhtar/dt-bindings-phy-Document-Samsung-UFS-PHY-bindings/20200703-104336
base:   https://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git for-next
config: x86_64-allyesconfig (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 
ca464639a1c9dd3944eb055ffd2796e8c2e7639f)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install x86_64 cross compiling tool for clang build
# apt-get install binutils-x86-64-linux-gnu
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

>> drivers/phy/samsung/phy-samsung-ufs.c:150:6: warning: variable 'ret' is used 
>> uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
   if (IS_ERR(phy->rx0_symbol_clk)) {
   ^~~
   drivers/phy/samsung/phy-samsung-ufs.c:173:9: note: uninitialized use occurs 
here
   return ret;
  ^~~
   drivers/phy/samsung/phy-samsung-ufs.c:150:2: note: remove the 'if' if its 
condition is always false
   if (IS_ERR(phy->rx0_symbol_clk)) {
   ^~
   drivers/phy/samsung/phy-samsung-ufs.c:144:6: warning: variable 'ret' is used 
uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
   if (IS_ERR(phy->rx0_symbol_clk)) {
   ^~~
   drivers/phy/samsung/phy-samsung-ufs.c:173:9: note: uninitialized use occurs 
here
   return ret;
  ^~~
   drivers/phy/samsung/phy-samsung-ufs.c:144:2: note: remove the 'if' if its 
condition is always false
   if (IS_ERR(phy->rx0_symbol_clk)) {
   ^~
   drivers/phy/samsung/phy-samsung-ufs.c:138:6: warning: variable 'ret' is used 
uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
   if (IS_ERR(phy->tx0_symbol_clk)) {
   ^~~
   drivers/phy/samsung/phy-samsung-ufs.c:173:9: note: uninitialized use occurs 
here
   return ret;
  ^~~
   drivers/phy/samsung/phy-samsung-ufs.c:138:2: note: remove the 'if' if its 
condition is always false
   if (IS_ERR(phy->tx0_symbol_clk)) {
   ^~
   drivers/phy/samsung/phy-samsung-ufs.c:135:9: note: initialize the variable 
'ret' to silence this warning
   int ret;
  ^
   = 0
   3 warnings generated.

vim +150 drivers/phy/samsung/phy-samsung-ufs.c

   132  
   133  static int samsung_ufs_phy_symbol_clk_init(struct samsung_ufs_phy *phy)
   134  {
   135  int ret;
   136  
   137  phy->tx0_symbol_clk = devm_clk_get(phy->dev, "tx0_symbol_clk");
   138  if (IS_ERR(phy->tx0_symbol_clk)) {
   139  dev_err(phy->dev, "failed to get tx0_symbol_clk clock\n");
   140  goto out;
   141  }
   142  
   143  phy->rx0_symbol_clk = devm_clk_get(phy->dev, "rx0_symbol_clk");
   144  if (IS_ERR(phy->rx0_symbol_clk)) {
   145  dev_err(phy->dev, "failed to get rx0_symbol_clk clock\n");
   146  goto out;
   147  }
   148  
   149  phy->rx1_symbol_clk = devm_clk_get(phy->dev, "rx1_symbol_clk");
 > 150  if (IS_ERR(phy->rx0_symbol_clk)) {
   151  dev_err(phy->dev, "failed to get rx1_symbol_clk clock\n");
   152  goto out;
   153  }
   154  
   155  ret = clk_prepare_enable(phy->tx0_symbol_clk);
   156  if (ret) {
   157  dev_err(phy->dev, "%s: tx0_symbol_clk enable failed %d\n", __func__, ret);
   158  goto out;
   159  }
   160  ret = clk_prepare_enable(phy->rx0_symbol_clk);
   161  if (ret) {
   162  dev_err(phy->dev, "%s: rx0_symbol_clk enable failed %d\n", __func__, ret);
   163  clk_disable_unprepare(phy->tx0_symbol_clk);
   164  goto out;
   165  }
   166  ret = clk_
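
All three warnings point at the same pattern: ret is never assigned on
the devm_clk_get() error paths, and the check at line 150 tests
rx0_symbol_clk where rx1_symbol_clk was presumably meant. A sketch of
the shape a fix could take (an assumption, not the actual respin):

	phy->rx1_symbol_clk = devm_clk_get(phy->dev, "rx1_symbol_clk");
	if (IS_ERR(phy->rx1_symbol_clk)) {	/* was: rx0_symbol_clk */
		dev_err(phy->dev, "failed to get rx1_symbol_clk clock\n");
		ret = PTR_ERR(phy->rx1_symbol_clk);	/* set ret before bailing out */
		goto out;
	}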

Re: [PATCH 1/2] hwmon: shtc1: add support for device tree bindings

2020-07-02 Thread Guenter Roeck
On 7/2/20 8:48 PM, Chris Ruehl wrote:
> Add support for DTS bindings to the shtc driver, use CONFIG_OF
> to compile in the code if needed.
> 

Ah, here it is. The introducing patch should say something like "[PATCH 0/2]".

> Signed-off-by: Chris Ruehl 
> ---
>  drivers/hwmon/shtc1.c | 30 ++
>  1 file changed, 30 insertions(+)
> 
> diff --git a/drivers/hwmon/shtc1.c b/drivers/hwmon/shtc1.c
> index a0078ccede03..3bcabc1cbce8 100644
> --- a/drivers/hwmon/shtc1.c
> +++ b/drivers/hwmon/shtc1.c
> @@ -14,6 +14,9 @@
>  #include 
>  #include 
>  #include 
> +#ifdef CONFIG_OF

No. Please no conditional includes.

> +#include 
> +#endif
>  
>  /* commands (high precision mode) */
>  static const unsigned char shtc1_cmd_measure_blocking_hpm[] = { 0x7C, 0xA2 };
> @@ -196,6 +199,10 @@ static int shtc1_probe(struct i2c_client *client,
>   enum shtcx_chips chip = id->driver_data;
>   struct i2c_adapter *adap = client->adapter;
>   struct device *dev = &client->dev;
> +#ifdef CONFIG_OF
> + struct device_node *np = dev->of_node;
> + u8 value;
> +#endif
>  
>   if (!i2c_check_functionality(adap, I2C_FUNC_I2C)) {
>   dev_err(dev, "plain i2c transactions not supported\n");
> @@ -235,6 +242,20 @@ static int shtc1_probe(struct i2c_client *client,
>  
>   if (client->dev.platform_data)
>   data->setup = *(struct shtc1_platform_data *)dev->platform_data;
> +
> +#ifdef CONFIG_OF

Unnecessary ifdef. Selection of devicetree data or not can be made
based on np != NULL. Also, if devicetree data is available, platform
data should be ignored to avoid confusion.

> + if (np) {
> + if (of_property_read_bool(np, "sensirion,blocking_io")) {
> + of_property_read_u8(np, "sensirion,blocking_io", &value);
> + data->setup.blocking_io = (value > 0) ? true : false;
> + }
Why this complexity, and why not just make the property a boolean ?

> + if (of_property_read_bool(np, "sensicon,high_precision")) {
> + of_property_read_u8(np, "sensirion,high_precision", &value);
> + data->setup.high_precision = (value > 0) ? true : false;

"sensicon,high_precision" should also be a boolean.

> + }
> + }
> +#endif
> +
>   shtc1_select_command(data);
>   mutex_init(&data->update_lock);
>  
> @@ -257,6 +278,15 @@ static const struct i2c_device_id shtc1_id[] = {
>  };
>  MODULE_DEVICE_TABLE(i2c, shtc1_id);
>  
> +#ifdef CONFIG_OF
> +static const struct of_device_id shtc1_of_match[] = {
> + { .compatible = "sensirion,shtc1" },
> + { .compatible = "sensirion,shtw1" },
> + { .compatible = "sensirion,shtc3" },
> + { }
> +};
> +MODULE_DEVICE_TABLE(of, shtc1_of_match);
> +#endif
>  static struct i2c_driver shtc1_i2c_driver = {
>   .driver.name  = "shtc1",
>   .probe= shtc1_probe,
> 
Not sure how this works without setting of_match_table. I guess it just works
accidentally because .id_table also provides a devicetree match. Still,
of_match_table should be set.

Guenter
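
Putting these review points together, the probe path could end up
looking roughly like this (boolean property names are assumptions for
illustration, not the accepted binding):

	if (np) {
		/* devicetree takes precedence; booleans need no value parsing */
		data->setup.blocking_io =
			of_property_read_bool(np, "sensirion,blocking-io");
		data->setup.high_precision =
			!of_property_read_bool(np, "sensirion,low-precision");
	} else if (dev->platform_data) {
		data->setup = *(struct shtc1_platform_data *)dev->platform_data;
	}

and the driver registration would gain:

	static struct i2c_driver shtc1_i2c_driver = {
		.driver = {
			.name = "shtc1",
			.of_match_table = of_match_ptr(shtc1_of_match),
		},
		.probe = shtc1_probe,
		...
	};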



Re: [PATCH v2 01/15] perf ftrace: select function/function_graph tracer automatically

2020-07-02 Thread Namhyung Kim
Hello,

On Sat, Jun 27, 2020 at 10:37 PM Changbin Du  wrote:
>
> The '-g/-G' options have already implied function_graph tracer should be
> used instead of function tracer. So the extra option '--tracer' can be
> killed.
>
> This patch changes the behavior as below:
>   - By default, function tracer is used.
>   - If '-g' or '-G' option is on, then function_graph tracer is used.
>   - The perf configuration item 'ftrace.tracer' is marked as deprecated.
>   - The option '--tracer' is marked as deprecated.
>   - The default filter for -G/-T is to trace all functions.
>
> Here are some examples.
>
> This will start tracing all functions using function tracer:
>   $ sudo perf ftrace
>
> This will trace all functions using function graph tracer:
>   $ sudo perf ftrace -G
>
> This will trace function vfs_read using function graph tracer:
>   $ sudo perf ftrace -G vfs_read

As we support running a new task on the command line, it might
confuse users whether it's an argument of the -G option or a task to run.
One can use -- to separate them, but it's easy to miss.
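
For example (hypothetical invocations illustrating the ambiguity):

  # ambiguous: is vfs_read an argument of -G, or the workload to run?
  $ sudo perf ftrace -G vfs_read

  # explicit: trace vfs_read while running 'sleep 1' as the workload
  $ sudo perf ftrace -G vfs_read -- sleep 1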

Thanks
Namhyung


>
> Signed-off-by: Changbin Du 
> ---
>  tools/perf/Documentation/perf-config.txt |  5 -
>  tools/perf/Documentation/perf-ftrace.txt |  2 +-
>  tools/perf/builtin-ftrace.c  | 19 ---
>  3 files changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/tools/perf/Documentation/perf-config.txt b/tools/perf/Documentation/perf-config.txt
> index c7d3df5798e2..a25fee7de3b2 100644
> --- a/tools/perf/Documentation/perf-config.txt
> +++ b/tools/perf/Documentation/perf-config.txt
> @@ -612,11 +612,6 @@ trace.*::
> "libbeauty", the default, to use the same argument 
> beautifiers used in the
> strace-like sys_enter+sys_exit lines.
>
> -ftrace.*::
> -   ftrace.tracer::
> -   Can be used to select the default tracer. Possible values are
> -   'function' and 'function_graph'.
> -
>  llvm.*::
> llvm.clang-path::
> Path to clang. If omit, search it from $PATH.
> diff --git a/tools/perf/Documentation/perf-ftrace.txt b/tools/perf/Documentation/perf-ftrace.txt
> index b80c84307dc9..952e46669168 100644
> --- a/tools/perf/Documentation/perf-ftrace.txt
> +++ b/tools/perf/Documentation/perf-ftrace.txt
> @@ -24,7 +24,7 @@ OPTIONS
>
>  -t::
>  --tracer=::
> -   Tracer to use: function_graph or function.
> +   Tracer to use: function_graph or function. This option is deprecated.
>
>  -v::
>  --verbose=::
> diff --git a/tools/perf/builtin-ftrace.c b/tools/perf/builtin-ftrace.c
> index 2bfc1b0db536..c5718503eded 100644
> --- a/tools/perf/builtin-ftrace.c
> +++ b/tools/perf/builtin-ftrace.c
> @@ -27,7 +27,6 @@
>  #include "util/cap.h"
>  #include "util/config.h"
>
> -#define DEFAULT_TRACER  "function_graph"
>
>  struct perf_ftrace {
> struct evlist   *evlist;
> @@ -419,6 +418,7 @@ static int perf_ftrace_config(const char *var, const char *value, void *cb)
> if (strcmp(var, "ftrace.tracer"))
> return -1;
>
> +   pr_warning("Configuration ftrace.tracer is deprecated\n");
> if (!strcmp(value, "function_graph") ||
> !strcmp(value, "function")) {
> ftrace->tracer = value;
> @@ -459,7 +459,7 @@ int cmd_ftrace(int argc, const char **argv)
>  {
> int ret;
> struct perf_ftrace ftrace = {
> -   .tracer = DEFAULT_TRACER,
> +   .tracer = "function",
> .target = { .uid = UINT_MAX, },
> };
> const char * const ftrace_usage[] = {
> @@ -469,7 +469,7 @@ int cmd_ftrace(int argc, const char **argv)
> };
> const struct option ftrace_options[] = {
> OPT_STRING('t', "tracer", &ftrace.tracer, "tracer",
> -  "tracer to use: function_graph(default) or function"),
> +  "tracer to use: function or function_graph (This option is deprecated)"),
> OPT_STRING('p', "pid", &ftrace.target.pid, "pid",
>"trace on existing process id"),
> OPT_INCR('v', "verbose", &verbose,
> @@ -478,12 +478,14 @@ int cmd_ftrace(int argc, const char **argv)
> "system-wide collection from all CPUs"),
> OPT_STRING('C', "cpu", &ftrace.cpu_list, "cpu",
> "list of cpus to monitor"),
> -   OPT_CALLBACK('T', "trace-funcs", &ftrace.filters, "func",
> -"trace given functions only", parse_filter_func),
> +   OPT_CALLBACK_DEFAULT('T', "trace-funcs", &ftrace.filters, "func",
> +"trace given functions using function tracer",
> +parse_filter_func, "*"),
> OPT_CALLBACK('N', "notrace-funcs", &ftrace.notrace, "func",
>  "do not trace given functions", parse_filter_func),
> -   OPT_CALLBACK('G', "graph-funcs", &ftrace.graph_funcs, "func",
> -"Set graph filter on given functions", parse_filter_func),
> +   OPT_CALLBACK_DEFAULT('G', "graph-funcs", &ftrace.graph_funcs, "func",
> +

Re: [GIT PULL] fixes for v5.8-rc4

2020-07-02 Thread Linus Torvalds
On Thu, Jul 2, 2020 at 1:51 PM Christian Brauner
 wrote:
>
> A few comments on this since doing a grep data_race() reveals that currently
> only kernel/rcu/* is making use of this new annotation and this seems to be 
> the
> first annotation in core kernel: when this was first sent to me I was 
> obviously
> aware of the existence of KCSAN but not whether we had established a consenus
> around annotating places in the (core) kernel where we currently have benign
> data races that KCSAN complains about. I don't know whether we have reached a
> consensus in general yet or we're just doing this subsystem specific.

I'm not sure there's any consensus, and it depends on the quality of
the KCSAN bug reports (and the reporters). There were some KCSAN
reports that seemed to not actually be real data races as much as
"KCSAN was being stupid and not understanding idempotent value
setting".

It also depends on the quality of the patch and the description. If
KCSAN patches end up being "just shut up the tool", I will stop taking
them.

But in this case, as you say, we already had the comment about the
situation, and telling the tool about it obviously won't hurt.

Linus
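
For context, the annotation under discussion wraps an access that is
racy on purpose so KCSAN stops flagging it; a generic illustration (not
a hunk from this pull request):

	/* Lockless, best-effort read: writers serialize elsewhere and a
	 * stale value is harmless; data_race() documents that to KCSAN. */
	if (data_race(current->flags & PF_EXITING))
		return;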


Re: [PATCH 24/30] usb: mtu3: mtu3_trace: Supply missing mtu3_debug.h include file

2020-07-02 Thread kernel test robot
Hi Lee,

I love your patch! Perhaps something to improve:

[auto build test WARNING on usb/usb-testing]
[also build test WARNING on balbi-usb/testing/next char-misc/char-misc-testing 
v5.8-rc3 next-20200702]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Lee-Jones/Fix-a-bunch-of-W-1-issues-in-USB/20200702-225210
base:   https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git 
usb-testing
config: x86_64-allyesconfig (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 
003a086ffc0d1affbb8300b36225fb8150a2d40a)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install x86_64 cross compiling tool for clang build
# apt-get install binutils-x86-64-linux-gnu
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

   In file included from drivers/usb/mtu3/mtu3_trace.c:11:
>> drivers/usb/mtu3/mtu3_debug.h:29:36: warning: declaration of 'struct 
>> ssusb_mtk' will not be visible outside of this function [-Wvisibility]
   void ssusb_dev_debugfs_init(struct ssusb_mtk *ssusb);
  ^
   drivers/usb/mtu3/mtu3_debug.h:30:35: warning: declaration of 'struct 
ssusb_mtk' will not be visible outside of this function [-Wvisibility]
   void ssusb_dr_debugfs_init(struct ssusb_mtk *ssusb);
 ^
   drivers/usb/mtu3/mtu3_debug.h:31:39: warning: declaration of 'struct 
ssusb_mtk' will not be visible outside of this function [-Wvisibility]
   void ssusb_debugfs_create_root(struct ssusb_mtk *ssusb);
 ^
   drivers/usb/mtu3/mtu3_debug.h:32:39: warning: declaration of 'struct 
ssusb_mtk' will not be visible outside of this function [-Wvisibility]
   void ssusb_debugfs_remove_root(struct ssusb_mtk *ssusb);
 ^
   In file included from drivers/usb/mtu3/mtu3_trace.c:12:
   In file included from drivers/usb/mtu3/mtu3_trace.h:279:
   include/trace/define_trace.h:95:10: fatal error: './mtu3_trace.h' file not 
found
   #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
^
   include/trace/define_trace.h:90:32: note: expanded from macro 'TRACE_INCLUDE'
   # define TRACE_INCLUDE(system) __TRACE_INCLUDE(system)
  ^~~
   include/trace/define_trace.h:87:34: note: expanded from macro 
'__TRACE_INCLUDE'
   # define __TRACE_INCLUDE(system) __stringify(TRACE_INCLUDE_PATH/system.h)
^~~~
   include/linux/stringify.h:10:27: note: expanded from macro '__stringify'
   #define __stringify(x...)   __stringify_1(x)
   ^~~~
   include/linux/stringify.h:9:29: note: expanded from macro '__stringify_1'
   #define __stringify_1(x...) #x
   ^~
   :48:1: note: expanded from here
   "./mtu3_trace.h"
   ^~~~
   4 warnings and 1 error generated.
--
   In file included from drivers/usb/mtu3/mtu3_trace.c:11:
>> drivers/usb/mtu3/mtu3_debug.h:29:36: warning: declaration of 'struct 
>> ssusb_mtk' will not be visible outside of this function [-Wvisibility]
   void ssusb_dev_debugfs_init(struct ssusb_mtk *ssusb);
  ^
   drivers/usb/mtu3/mtu3_debug.h:30:35: warning: declaration of 'struct 
ssusb_mtk' will not be visible outside of this function [-Wvisibility]
   void ssusb_dr_debugfs_init(struct ssusb_mtk *ssusb);
 ^
   drivers/usb/mtu3/mtu3_debug.h:31:39: warning: declaration of 'struct 
ssusb_mtk' will not be visible outside of this function [-Wvisibility]
   void ssusb_debugfs_create_root(struct ssusb_mtk *ssusb);
 ^
   drivers/usb/mtu3/mtu3_debug.h:32:39: warning: declaration of 'struct 
ssusb_mtk' will not be visible outside of this function [-Wvisibility]
   void ssusb_debugfs_remove_root(struct ssusb_mtk *ssusb);
 ^
   4 warnings generated.

vim +29 drivers/usb/mtu3/mtu3_debug.h

ae07809255d3e3 Chunfeng Yun 2019-03-21  27  
ae07809255d3e3 Chunfeng Yun 2019-03-21  28  #if IS_ENABLED(CONFIG_DEBUG_FS)
ae07809255d3e3 Chunfeng Yun 2019-03-21 @29  void ssusb_dev_debugfs_init(struct 
ssusb_mtk *ssusb);
4aab6ad24a101b Chunfeng Yun 2019-03-21  30  void ssusb_dr_debugfs_init(struct 
ssusb_mtk *ssusb);
ae078092
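
The warning indicates mtu3_debug.h lacks a forward declaration, so each
prototype introduces a function-local struct type. A likely-shaped fix
(a sketch, not necessarily the submitted follow-up):

	/* give the prototypes a file-scope incomplete type to refer to */
	struct ssusb_mtk;

	void ssusb_dev_debugfs_init(struct ssusb_mtk *ssusb);
	void ssusb_dr_debugfs_init(struct ssusb_mtk *ssusb);
	void ssusb_debugfs_create_root(struct ssusb_mtk *ssusb);
	void ssusb_debugfs_remove_root(struct ssusb_mtk *ssusb);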

Re: [GIT PULL] Kselftest fixes update for Linux 5.8-rc4

2020-07-02 Thread pr-tracker-bot
The pull request you sent on Thu, 2 Jul 2020 10:26:58 -0600:

> git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest 
> tags/linux-kselftest-fixes-5.8-rc4

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/0dce88451f9c1cc5f1b73818e0608d5f84499e9a

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [GIT PULL] Kunit fixes update for Linux 5.8-rc4

2020-07-02 Thread pr-tracker-bot
The pull request you sent on Thu, 2 Jul 2020 09:23:55 -0600:

> git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest 
> tags/linux-kselftest-kunit-fixes-5.8-rc4

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/55844741a1e74bd41b4cea57502c2efedc99bf47

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [GIT PULL] nfsd bugfixes for 5.8

2020-07-02 Thread pr-tracker-bot
The pull request you sent on Thu, 2 Jul 2020 11:10:39 -0400:

> git://linux-nfs.org/~bfields/linux.git tags/nfsd-5.8-1

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/083176c86ffae8c9b467358eca5ba05a54a27898

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


mainline/master bisection: baseline.dmesg.crit on qemu_arm-vexpress-a15

2020-07-02 Thread kernelci.org bot
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* This automated bisection report was sent to you on the basis  *
* that you may be involved with the breaking commit it has  *
* found.  No manual investigation has been done to verify it,   *
* and the root cause of the problem may be somewhere else.  *
*   *
* If you do send a fix, please include this trailer:*
*   Reported-by: "kernelci.org bot"   *
*   *
* Hope this helps!  *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

mainline/master bisection: baseline.dmesg.crit on qemu_arm-vexpress-a15

Summary:
  Start:  7cc2a8ea1048 Merge tag 'block-5.8-2020-07-01' of 
git://git.kernel.dk/linux-block
  Plain log:  
https://storage.kernelci.org/mainline/master/v5.8-rc3-37-g7cc2a8ea1048/arm/vexpress_defconfig/gcc-8/lab-cip/baseline-vexpress-v2p-ca15-tc1.txt
  HTML log:   
https://storage.kernelci.org/mainline/master/v5.8-rc3-37-g7cc2a8ea1048/arm/vexpress_defconfig/gcc-8/lab-cip/baseline-vexpress-v2p-ca15-tc1.html
  Result: 38ac46002d1d arm: dts: vexpress: Move mcc node back into 
motherboard node

Checks:
  revert: PASS
  verify: PASS

Parameters:
  Tree:   mainline
  URL:https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
  Branch: master
  Target: qemu_arm-vexpress-a15
  CPU arch:   arm
  Lab:lab-cip
  Compiler:   gcc-8
  Config: vexpress_defconfig
  Test case:  baseline.dmesg.crit

Breaking commit found:

---
commit 38ac46002d1df5707566a73486452851341028d2
Author: Andre Przywara 
Date:   Wed Jun 3 17:22:37 2020 +0100

arm: dts: vexpress: Move mcc node back into motherboard node

Commit d9258898ad49 ("arm64: dts: arm: vexpress: Move fixed devices
out of bus node") moved the "mcc" DT node into the root node, because
it does not have any children using "reg" properties, so does violate
some dtc checks about "simple-bus" nodes.

However this broke the vexpress config-bus code, which walks up the
device tree to find the first node with an "arm,vexpress,site" property.
This gave the wrong result (matching the root node instead of the
motherboard node), so broke the clocks and some other devices for
VExpress boards.

Move the whole node back into its original position. This re-introduces
the dtc warning, but is conceptually the right thing to do. The dtc
warning seems to be overzealous here, there are discussions on fixing or
relaxing this check instead.

Link: 
https://lore.kernel.org/r/20200603162237.16319-1-andre.przyw...@arm.com
Fixes: d9258898ad49 ("arm64: dts: vexpress: Move fixed devices out of bus 
node")
Reported-and-tested-by: Guenter Roeck 
Signed-off-by: Andre Przywara 
Signed-off-by: Sudeep Holla 

diff --git a/arch/arm/boot/dts/vexpress-v2m-rs1.dtsi b/arch/arm/boot/dts/vexpress-v2m-rs1.dtsi
index e6308fb76183..a88ee5294d35 100644
--- a/arch/arm/boot/dts/vexpress-v2m-rs1.dtsi
+++ b/arch/arm/boot/dts/vexpress-v2m-rs1.dtsi
@@ -100,79 +100,6 @@
};
};
 
-   mcc {
-   compatible = "arm,vexpress,config-bus";
-   arm,vexpress,config-bridge = <&v2m_sysreg>;
-
-   oscclk0 {
-   /* MCC static memory clock */
-   compatible = "arm,vexpress-osc";
-   arm,vexpress-sysreg,func = <1 0>;
-   freq-range = <2500 6000>;
-   #clock-cells = <0>;
-   clock-output-names = "v2m:oscclk0";
-   };
-
-   v2m_oscclk1: oscclk1 {
-   /* CLCD clock */
-   compatible = "arm,vexpress-osc";
-   arm,vexpress-sysreg,func = <1 1>;
-   freq-range = <2375 6500>;
-   #clock-cells = <0>;
-   clock-output-names = "v2m:oscclk1";
-   };
-
-   v2m_oscclk2: oscclk2 {
-   /* IO FPGA peripheral clock */
-   compatible = "arm,vexpress-osc";
-   arm,vexpress-sysreg,func = <1 2>;
-   freq-range = <2400 2400>;
-   #clock-cells = <0>;
-   clock-output-names = "v2m:oscclk2";
-   };
-
-   volt-vio {
-   /* Logic level voltage */
-   compatible = "arm,vexpress-volt";
-   arm,vexpress-sysreg,func = <2 0>;
-   regulator-name = "VIO";
-   regulator-always-on;
-   label = "VIO";
-   };
-
-   temp-mcc {
-   

Re: [PATCH] editorconfig: Add automatic editor configuration file

2020-07-02 Thread Miguel Ojeda
Hi Danny,

On Fri, Jul 3, 2020 at 2:16 AM Danny Lin  wrote:
>
> +[*]
> +charset = utf-8
> +end_of_line = lf

While UTF-8 and LF are probably OK for all files, I am not 100% sure about:

> +insert_final_newline = true
> +indent_style = tab
> +indent_size = 8

for other languages and non-code files we may have around. Perhaps it
is best to avoid `[*]` unless we are sure?

Cheers,
Miguel
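
Concretely, the indent rules could be scoped to file types known to
follow the kernel's tab convention, for example (an illustrative sketch,
not the applied patch):

  [*.{c,h,S}]
  indent_style = tab
  indent_size = 8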


Re: [PATCH] shtc1: add support for device tree bindings

2020-07-02 Thread Guenter Roeck
On 7/2/20 8:48 PM, Chris Ruehl wrote:
> Add support for DTS bindings to the shtc driver
> The patches add the compatible table and of_property_read* to the
> shtc1.c. Newly created Yaml document has been released to the
> Documentation/devicetree/hwmon/sensirion,shtc1.yaml
> 
> Signed-off-by: Chris Ruehl 
> ---
>  Version 1
> 

There is no patch.

Guenter


Re: [PATCH] ASoC: cros_ec_codec: Log results when EC commands fail

2020-07-02 Thread Guenter Roeck
On Thu, Jul 2, 2020 at 8:30 PM Yu-Hsuan Hsu  wrote:
>
> Log results of failed EC commands to identify a problem more easily.
>
> Signed-off-by: Yu-Hsuan Hsu 
> ---
>  sound/soc/codecs/cros_ec_codec.c | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/sound/soc/codecs/cros_ec_codec.c b/sound/soc/codecs/cros_ec_codec.c
> index 8d45c628e988e..a4ab62f59efa6 100644
> --- a/sound/soc/codecs/cros_ec_codec.c
> +++ b/sound/soc/codecs/cros_ec_codec.c
> @@ -90,10 +90,17 @@ static int send_ec_host_command(struct cros_ec_device *ec_dev, uint32_t cmd,
> if (outsize)
> memcpy(msg->data, out, outsize);
>
> -   ret = cros_ec_cmd_xfer_status(ec_dev, msg);
> +   ret = cros_ec_cmd_xfer(ec_dev, msg);

This change isn't explained in the description.

Guenter

> if (ret < 0)
> goto error;
>
> +   if (msg->result != EC_RES_SUCCESS) {
> +   dev_err(ec_dev->dev, "Command %d failed: %d\n", cmd,
> +   msg->result);
> +   ret = -EPROTO;
> +   goto error;
> +   }
> +
> if (insize)
> memcpy(in, msg->data, insize);
>
> --
> 2.27.0.212.ge8ba1cc988-goog
>


Re: [PATCH 2/2] block: enable zone-append for iov_iter of bvec type

2020-07-02 Thread Damien Le Moal
On 2020/07/03 0:42, Kanchan Joshi wrote:
> zone-append with bvec iov_iter gives WARN_ON, and returns -EINVAL.
> Add new helper to process such iov_iter and add pages in bio honoring
> zone-append specific constraints.
> 
> Signed-off-by: Kanchan Joshi 
> Signed-off-by: Selvakumar S 
> Signed-off-by: Nitesh Shetty 
> Signed-off-by: Javier Gonzalez 
> ---
>  block/bio.c | 31 ---
>  1 file changed, 28 insertions(+), 3 deletions(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index 0cecdbc..ade9da7 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -975,6 +975,30 @@ static int __bio_iov_bvec_add_pages(struct bio *bio, struct iov_iter *iter)
>   iov_iter_advance(iter, size);
>   return 0;
>  }
> +static int __bio_iov_bvec_append_add_pages(struct bio *bio, struct iov_iter *iter)
> +{
> + const struct bio_vec *bv = iter->bvec;
> + unsigned int len;
> + size_t size;
> + struct request_queue *q = bio->bi_disk->queue;
> + unsigned int max_append_sectors = queue_max_zone_append_sectors(q);
> + bool same_page = false;
> +
> + if (WARN_ON_ONCE(!max_append_sectors))
> + return -EINVAL;
> +
> + if (WARN_ON_ONCE(iter->iov_offset > bv->bv_len))
> + return -EINVAL;
> +
> + len = min_t(size_t, bv->bv_len - iter->iov_offset, iter->count);
> + size = bio_add_hw_page(q, bio, bv->bv_page, len,
> + bv->bv_offset + iter->iov_offset,
> + max_append_sectors, _page);
> + if (unlikely(size != len))
> + return -EINVAL;
> + iov_iter_advance(iter, size);
> + return 0;
> +}
>  
>  #define PAGE_PTRS_PER_BVEC (sizeof(struct bio_vec) / sizeof(struct page 
> *))
>  
> @@ -1105,9 +1129,10 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>  
>   do {
>   if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
> - if (WARN_ON_ONCE(is_bvec))
> - return -EINVAL;
> - ret = __bio_iov_append_get_pages(bio, iter);
> + if (is_bvec)
> + ret = __bio_iov_bvec_append_add_pages(bio, iter);
> + else
> + ret = __bio_iov_append_get_pages(bio, iter);
>   } else {
>   if (is_bvec)
>   ret = __bio_iov_bvec_add_pages(bio, iter);
> 

The only user of this function that issues zone append requests is zonefs. The
issued requests are not using bvec iter but a user direct IO buffer iter. So
this change would have no user at all as far as I can see. Am I missing
something ? What IO path makes this change necessary ?


-- 
Damien Le Moal
Western Digital Research


Re: [PATCH v5 01/12] iommu: Change type of pasid to u32

2020-07-02 Thread Felix Kuehling
On 2020-07-02 at 3:10 p.m., Fenghua Yu wrote:
> Hi, Felix, Thomas, Joerg and maintainers,
>
> On Tue, Jun 30, 2020 at 10:12:38PM -0400, Felix Kuehling wrote:
>> On 2020-06-30 at 7:44 p.m., Fenghua Yu wrote:
>> You didn't change the return types of amdgpu_pasid_alloc and
>> kfd_pasid_alloc. amdgpu_pasid_alloc returns int, because it can return
>> negative error codes. But kfd_pasid_alloc could be updated, because it
>> returns 0 for errors.
> I fixed the return type to "u32" for kfd_pasid_alloc().

Thank you. The patch is

Acked-by: Felix Kuehling 



>
> The fix is minor and limited to patch 1. So instead of sending the
> whole series, I only send the updated patch 1 here. If you want me to
> send the whole series with the fix, I can do that too.
>
> Thanks.
>
> -Fenghua
>
> From 4ff6c14bb0761dd97d803350d31f87edc4336345 Mon Sep 17 00:00:00 2001
> From: Fenghua Yu 
> Date: Mon, 4 May 2020 18:00:55 +
> Subject: [PATCH v5.1 01/12] iommu: Change type of pasid to u32
>
> PASID is defined as a few different types in iommu including "int",
> "u32", and "unsigned int". To be consistent and to match with uapi
> definitions, define PASID and its variations (e.g. max PASID) as "u32".
> "u32" is also shorter and a little more explicit than "unsigned int".
>
> No PASID type change in uapi although it defines PASID as __u64 in
> some places.
>
> Suggested-by: Thomas Gleixner 
> Signed-off-by: Fenghua Yu 
> Reviewed-by: Tony Luck 
> Reviewed-by: Lu Baolu 
> ---
> v5.1:
> - Change return type to u32 for kfd_pasid_alloc() (Felix)
>
> v5:
> - Reviewed by Lu Baolu
>
> v4:
> - Change PASID type from "unsigned int" to "u32" (Christoph)
>
> v2:
> - Create this new patch to define PASID as "unsigned int" consistently in
>   iommu (Thomas)
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h|  4 +--
>  .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c|  2 +-
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c |  2 +-
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c |  2 +-
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |  2 +-
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |  2 +-
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  4 +--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c   |  6 ++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h   |  4 +--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  8 ++---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|  8 ++---
>  .../gpu/drm/amd/amdkfd/cik_event_interrupt.c  |  2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c   |  2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_dbgmgr.h   |  2 +-
>  .../drm/amd/amdkfd/kfd_device_queue_manager.c |  7 ++---
>  drivers/gpu/drm/amd/amdkfd/kfd_events.c   |  8 ++---
>  drivers/gpu/drm/amd/amdkfd/kfd_events.h   |  4 +--
>  drivers/gpu/drm/amd/amdkfd/kfd_iommu.c|  6 ++--
>  drivers/gpu/drm/amd/amdkfd/kfd_pasid.c|  4 +--
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 20 ++--
>  drivers/gpu/drm/amd/amdkfd/kfd_process.c  |  2 +-
>  .../gpu/drm/amd/include/kgd_kfd_interface.h   |  2 +-
>  drivers/iommu/amd/amd_iommu.h | 10 +++---
>  drivers/iommu/amd/iommu.c | 31 ++-
>  drivers/iommu/amd/iommu_v2.c  | 20 ++--
>  drivers/iommu/intel/dmar.c|  7 +++--
>  drivers/iommu/intel/intel-pasid.h | 24 +++---
>  drivers/iommu/intel/iommu.c   |  4 +--
>  drivers/iommu/intel/pasid.c   | 31 +--
>  drivers/iommu/intel/svm.c | 12 +++
>  drivers/iommu/iommu.c |  2 +-
>  drivers/misc/uacce/uacce.c|  2 +-
>  include/linux/amd-iommu.h |  8 ++---
>  include/linux/intel-iommu.h   | 12 +++
>  include/linux/intel-svm.h |  2 +-
>  include/linux/iommu.h | 10 +++---
>  include/linux/uacce.h |  2 +-
>  38 files changed, 141 insertions(+), 141 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index ffe149aafc39..dfef5a7e0f5a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -207,11 +207,11 @@ uint8_t amdgpu_amdkfd_get_xgmi_hops_count(struct 
> kgd_dev *dst, struct kgd_dev *s
>   })
>  
>  /* GPUVM API */
> -int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, unsigned int 
> pasid,
> +int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, u32 pasid,
>   void **vm, void **process_info,
>   struct dma_fence **ef);
>  int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct kgd_dev *kgd,
> - struct file *filp, unsigned int pasid,
> + struct file *filp, u32 

Re: [PATCH 1/9] media: rkvdec: h264: Support profile and level controls

2020-07-02 Thread Jonas Karlman
On 2020-07-03 04:54, Ezequiel Garcia wrote:
> On Wed, 2020-07-01 at 21:56 +, Jonas Karlman wrote:
>> The Rockchip Video Decoder used in RK3399 supports H.264 profiles from
>> Baseline to High 4:2:2 up to Level 5.1, except for the Extended profile.
>>
>> Expose the V4L2_CID_MPEG_VIDEO_H264_PROFILE and the
>> V4L2_CID_MPEG_VIDEO_H264_LEVEL control, so that userspace can query the
>> driver for the list of supported profiles and level.
>>
>> In current state only Baseline to High profile is supported by the driver.
>>
>> Signed-off-by: Jonas Karlman 
> 
> I think the patch is good so:
> 
> Reviewed-by: Ezequiel Garcia 
> 
> However, feel free to just drop this patch and support the profiles
> and levels at the end of the patchset, once High 10 and High 422
> support is there.

Sure, that makes more sense, will move to end in v2.

Regards,
Jonas

> 
> Thanks,
> Ezequiel
> 
>> ---
>>  drivers/staging/media/rkvdec/rkvdec.c | 13 +
>>  1 file changed, 13 insertions(+)
>>
>> diff --git a/drivers/staging/media/rkvdec/rkvdec.c b/drivers/staging/media/rkvdec/rkvdec.c
>> index 0f81b47792f6..b1de55aa6535 100644
>> --- a/drivers/staging/media/rkvdec/rkvdec.c
>> +++ b/drivers/staging/media/rkvdec/rkvdec.c
>> @@ -94,6 +94,19 @@ static const struct rkvdec_ctrl_desc 
>> rkvdec_h264_ctrl_descs[] = {
>>  .cfg.def = V4L2_MPEG_VIDEO_H264_START_CODE_ANNEX_B,
>>  .cfg.max = V4L2_MPEG_VIDEO_H264_START_CODE_ANNEX_B,
>>  },
>> +{
>> +.cfg.id = V4L2_CID_MPEG_VIDEO_H264_PROFILE,
>> +.cfg.min = V4L2_MPEG_VIDEO_H264_PROFILE_BASELINE,
>> +.cfg.max = V4L2_MPEG_VIDEO_H264_PROFILE_HIGH,
>> +.cfg.menu_skip_mask =
>> +BIT(V4L2_MPEG_VIDEO_H264_PROFILE_EXTENDED),
>> +.cfg.def = V4L2_MPEG_VIDEO_H264_PROFILE_MAIN,
>> +},
>> +{
>> +.cfg.id = V4L2_CID_MPEG_VIDEO_H264_LEVEL,
>> +.cfg.min = V4L2_MPEG_VIDEO_H264_LEVEL_1_0,
>> +.cfg.max = V4L2_MPEG_VIDEO_H264_LEVEL_5_1,
>> +},
>>  };
>>  
>>  static const struct rkvdec_ctrls rkvdec_h264_ctrls = {
> 
> 


Re: [PATCH 1/2] block: fix error code for zone-append

2020-07-02 Thread Damien Le Moal
On 2020/07/03 0:42, Kanchan Joshi wrote:
> avoid returning success when it should report failure, preventing
> odd behavior in caller.

You can be more precise here: the odd behavior is an infinite loop in
bio_iov_iter_get_pages(), which is the only user of
__bio_iov_append_get_pages().
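
Paraphrasing that caller to show the loop in question (shape assumed
from mainline kernels of this vintage):

	do {
		if (bio_op(bio) == REQ_OP_ZONE_APPEND)
			ret = __bio_iov_append_get_pages(bio, iter);
		else
			ret = __bio_iov_iter_get_pages(bio, iter);
		/* a 0 return without consuming iter never ends this loop */
	} while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));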

> 
> Signed-off-by: Kanchan Joshi 
> Signed-off-by: Selvakumar S 
> Signed-off-by: Nitesh Shetty 
> Signed-off-by: Javier Gonzalez 
> ---
>  block/bio.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index a7366c0..0cecdbc 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -1044,7 +1044,7 @@ static int __bio_iov_append_get_pages(struct bio *bio, struct iov_iter *iter)
>   size_t offset;
>  
>   if (WARN_ON_ONCE(!max_append_sectors))
> - return 0;
> + return -EINVAL;
>  
>   /*
>* Move page array up in the allocated memory for the bio vecs as far as
> 

Note: the odd behavior mentioned in the commit message cannot currently be
triggered since only zonefs issues REQ_OP_ZONE_APPEND BIOs so we are guaranteed
that max_append_sectors is not 0 in that case. But this fix certainly makes
things more solid. So:

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research


Re: objtool clac/stac handling change..

2020-07-02 Thread Christophe Leroy




On 03/07/2020 at 05:17, Michael Ellerman wrote:

Christophe Leroy  writes:

On 02/07/2020 at 15:34, Michael Ellerman wrote:

Linus Torvalds  writes:

On Wed, Jul 1, 2020 at 12:59 PM Al Viro  wrote:

On Wed, Jul 01, 2020 at 12:04:36PM -0700, Linus Torvalds wrote:


That's actually for the access granting. Shutting the access down ends
up always doing the same thing anyway..


#define user_read_access_end    prevent_current_read_from_user
#define user_write_access_end   prevent_current_write_to_user
static inline void prevent_current_read_from_user(void)
{
  prevent_user_access(NULL, NULL, ~0UL, KUAP_CURRENT_READ);
}

static inline void prevent_current_write_to_user(void)
{
  prevent_user_access(NULL, NULL, ~0UL, KUAP_CURRENT_WRITE);
}

and prevent_user_access() has instances that do care about the direction...


Go and look closer.

There are three cases:

   (a) the 32-bit book3s case. It looks like it cares, but when you look
closer, it ends up not caring about the read side, and saving the
"which address to I allow user writes to" in current->thread.kuap

   (b) the nohash 32-bit case - doesn't care

   (c) the 64-bit books case - doesn't care

So yes, in the (a) case it does make a difference between reads and
writes, but at least as far as I can tell, it ignores the read case,
and has code to avoid the unnecessary "disable user writes" case when
there was only a read enable done.


Yeah that's my understanding too.

Christophe is the expert on that code so I'll defer to him if I'm wrong.


Now, it's possible that I'm wrong, but the upshot of that is that even
on powerpc, I think that if we just made the rule be that "taking a
user exception should automatically do the 'user_access_end()' for us"
is trivial.


I think we can do something to make it work.

We don't have an equivalent of x86's ex_handler_uaccess(), so it's not
quite as easy as whacking a user_access_end() in there.


Isn't it something easy to do in bad_page_fault() ?


We'd need to do it there at least.

But I'm not convinced that's the only place we'd need to do it. We could
theoretically take a machine check on a user access, and those are
handled differently on each sub-(sub-sub)-platform, and I think all or
most of them don't call bad_page_fault().


Indeed, it needs to be done everywhere we do

regs->nip = extable_fixup(entry)

There are half a dozen places that do that; in addition to
bad_page_fault(), that's mainly machine checks, and also kprobes.


I think we can create a fixup_exception() function which takes regs and
entry as parameters and does the nip fixup and kuap closure.
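
Roughly along these lines (the names and the "blocked" kuap encoding are
assumptions; the real value is per-platform):

	static void fixup_exception(struct pt_regs *regs,
				    const struct exception_table_entry *entry)
	{
		regs->nip = extable_fixup(entry);
		/* close the user-access window recorded at exception entry
		 * so it is not restored on exception exit */
		regs->kuap = KUAP_BLOCKED;	/* assumed per-platform constant */
	}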





Not exactly a call to user_access_end() but altering regs->kuap so that
user access is not restored on exception exit.


Yes.


Probably the simplest option for us is to just handle it in our
unsafe_op_wrap(). I'll try and come up with something tomorrow.


unsafe_op_wrap() is not used anymore for unsafe_put_user() as we are now
using asm goto.


Sure, but we could change it back to use unsafe_op_wrap().


But the whole purpose of using goto in unsafe_???_user() is to allow the 
use of asm goto. See explanations in commit 
https://github.com/linuxppc/linux/commit/1bd4403d86a1c06cb6cc9ac87664a0c9d3413d51#diff-eba084de047bb8a9087dac10c06f44bc





I did a quick hack to do that and see no difference in the generated
code, but your commit adding put_user_goto() did show better code
generation, so possibly it depends on compiler version, or my example
wasn't complicated enough (filldir()).


Yes, as explained above, it should remove the error checking in the
caller, so your example was most likely too trivial.


Christophe


Re: [PATCH 2/2] arm64: Allocate crashkernel always in ZONE_DMA

2020-07-02 Thread chenzhou
Hi Bhupesh,


On 2020/7/3 3:22, Bhupesh Sharma wrote:
> Hi Will,
>
> On Thu, Jul 2, 2020 at 1:20 PM Will Deacon  wrote:
>> On Thu, Jul 02, 2020 at 03:44:20AM +0530, Bhupesh Sharma wrote:
>>> commit bff3b04460a8 ("arm64: mm: reserve CMA and crashkernel in
>>> ZONE_DMA32") allocates crashkernel for arm64 in the ZONE_DMA32.
>>>
>>> However as reported by Prabhakar, this breaks kdump kernel booting in
>>> ThunderX2 like arm64 systems. I have noticed this on another ampere
>>> arm64 machine. The OOM log in the kdump kernel looks like this:
>>>
>>>   [0.240552] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic 
>>> allocations
>>>   [0.247713] swapper/0: page allocation failure: order:1, 
>>> mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
>>>   <..snip..>
>>>   [0.274706] Call trace:
>>>   [0.277170]  dump_backtrace+0x0/0x208
>>>   [0.280863]  show_stack+0x1c/0x28
>>>   [0.284207]  dump_stack+0xc4/0x10c
>>>   [0.287638]  warn_alloc+0x104/0x170
>>>   [0.291156]  __alloc_pages_slowpath.constprop.106+0xb08/0xb48
>>>   [0.296958]  __alloc_pages_nodemask+0x2ac/0x2f8
>>>   [0.301530]  alloc_page_interleave+0x20/0x90
>>>   [0.305839]  alloc_pages_current+0xdc/0xf8
>>>   [0.309972]  atomic_pool_expand+0x60/0x210
>>>   [0.314108]  __dma_atomic_pool_init+0x50/0xa4
>>>   [0.318504]  dma_atomic_pool_init+0xac/0x158
>>>   [0.322813]  do_one_initcall+0x50/0x218
>>>   [0.326684]  kernel_init_freeable+0x22c/0x2d0
>>>   [0.331083]  kernel_init+0x18/0x110
>>>   [0.334600]  ret_from_fork+0x10/0x18
>>>
>>> This patch limits the crashkernel allocation to the first 1GB of
>>> the RAM accessible (ZONE_DMA), as otherwise we might run into OOM
>>> issues when crashkernel is executed, as it might have been originally
>>> allocated from either a ZONE_DMA32 memory or mixture of memory chunks
>>> belonging to both ZONE_DMA and ZONE_DMA32.
>> How does this interact with this ongoing series:
>>
>> https://lore.kernel.org/r/20200628083458.40066-1-chenzho...@huawei.com
>>
>> (patch 4, in particular)
> Many thanks for having a look at this patchset. I was not aware that
> Chen had sent out a new version.
> I had noted in the v9 review of the high/low range allocation
>  that I was working
> on a generic solution (irrespective of the crashkernel, low and high
> range allocation) which resulted in this patchset.
>
> The issue is two-fold: an OOPS in the memcg layer (PATCH 1/2, which has been
> Acked-by the memcg maintainer) and an OOM in the kdump kernel due to
> crashkernel allocation in ZONE_DMA32 region(s), which is addressed by
> this PATCH.
>
> I will have a closer look at the v10 patchset Chen shared, but seems
> it needs some rework as per Dave's review comments which he shared
> today.
> IMO, in the meanwhile this patchset can be used to fix the existing
> kdump issue with the upstream kernel.
Thanks for your work.
There was no progress on this issue for a long time, so I sent my solution
in the v8 comments and sent v9 recently.

I think directly limiting the crashkernel to ZONE_DMA isn't a good idea:
1. For the parameter "crashkernel=Y", reserving the crashkernel in the first
1G of memory will increase the probability of memory allocation failure.
Previous discussion from https://lkml.org/lkml/2019/10/21/725:
"With ZONE_DMA=y, this config will fail to reserve 512M CMA on a server"

2. For the parameter "crashkernel=Y@X", limiting the crashkernel to ZONE_DMA
is unreasonable for someone who really wants to reserve the crashkernel from
a specified start address.
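
For example (sizes and addresses purely illustrative):

	crashkernel=512M	# kernel picks the base address itself
	crashkernel=512M@2G	# base fixed at 2G -- impossible to honor
				# if reservation is forced into the
				# first 1GB (ZONE_DMA)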

I have sent v10: https://www.spinics.net/lists/arm-kernel/msg819408.html,
any comments are welcome.

Thanks,
Chen Zhou
>
>>> Fixes: bff3b04460a8 ("arm64: mm: reserve CMA and crashkernel in ZONE_DMA32")
>>> Cc: Johannes Weiner 
>>> Cc: Michal Hocko 
>>> Cc: Vladimir Davydov 
>>> Cc: James Morse 
>>> Cc: Mark Rutland 
>>> Cc: Will Deacon 
>>> Cc: Catalin Marinas 
>>> Cc: cgro...@vger.kernel.org
>>> Cc: linux...@kvack.org
>>> Cc: linux-arm-ker...@lists.infradead.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Cc: ke...@lists.infradead.org
>>> Reported-by: Prabhakar Kushwaha 
>>> Signed-off-by: Bhupesh Sharma 
>>> ---
>>>  arch/arm64/mm/init.c | 16 ++--
>>>  1 file changed, 14 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>>> index 1e93cfc7c47a..02ae4d623802 100644
>>> --- a/arch/arm64/mm/init.c
>>> +++ b/arch/arm64/mm/init.c
>>> @@ -91,8 +91,15 @@ static void __init reserve_crashkernel(void)
>>>   crash_size = PAGE_ALIGN(crash_size);
>>>
>>>   if (crash_base == 0) {
>>> - /* Current arm64 boot protocol requires 2MB alignment */
>>> - crash_base = memblock_find_in_range(0, arm64_dma32_phys_limit,
>>> + /* Current arm64 boot protocol requires 2MB alignment.
>>> +  * Also limit the crashkernel allocation to the first
>>> +  * 1GB of the RAM accessible (ZONE_DMA), as 

Re: [PATCH v2 0/8] Introduce sv48 support

2020-07-02 Thread Alex Ghiti

Hi Palmer,

Le 7/1/20 à 2:27 PM, Palmer Dabbelt a écrit :

On Wed, 03 Jun 2020 01:10:56 PDT (-0700), a...@ghiti.fr wrote:

This patchset implements sv48 support at runtime. The kernel will try to
boot with a 4-level page table and will fall back to 3-level if the HW
does not support it.

The biggest advantage is that we only have one kernel for 64bit, which
is way easier to maintain.

Folding the 4th level into a 3-level page table has almost no cost at
runtime. But as Palmer mentioned, the relocatable code generated is less
performant.

At the moment, there is no way to build a non-relocatable 64bit kernel
with a 3-level page table. We agreed that distributions will use this
runtime configuration anyway, but Palmer proposed to introduce a new
Kconfig option, which I will do later as sv48 support was asked for 5.8.


Sorry I wasn't clear last time, but this still has the same fundamental
issue: it forces 64-bit kernels to be relocatable, which imposes a
performance penalty.  We don't have any hardware that can actually take
advantage of sv48, so I don't want to take anything that penalizes what
people are actually using in order to add a feature people can't use.

I'd be OK taking this if sv48 support simply depended on a relocatable
kernel, as then users who want the faster kernel could still build one.
I don't want to take something that forces all 64-bit kernels to be
relocatable.


Indeed, I had not understood that this was a requirement. I will add a
patch on top of this one introducing a new config; I have to think about it.

But even if I understand that the new level of indirection coming with
PIE will be slower, is this new config worth it? Can we somehow benchmark
the performance loss? IMHO this config will get broken over time by lack
of testing, because I believe distributions will go for a KASLR kernel,
which requires the relocatability property anyway.


Alex



Finally, the user can now ask for sv39 explicitly by using the
device-tree, which will reduce the memory footprint and reduce the number
of memory accesses in case of a TLB miss.

Changes in v2:
  * Move variable declarations to pgtable.h in patch 5/7 as suggested by Anup
  * Restore mmu-type properties in patch 6 as suggested by Anup
  * Fix unused variable in patch 5 that was used in patch 6
  * Fix SPARSEMEM build (patch 2 was modified so I dropped the Reviewed-by)
  * Applied various Reviewed-by

Alexandre Ghiti (8):
  riscv: Get rid of compile time logic with MAX_EARLY_MAPPING_SIZE
  riscv: Allow to dynamically define VA_BITS
  riscv: Simplify MAXPHYSMEM config
  riscv: Prepare ptdump for vm layout dynamic addresses
  riscv: Implement sv48 support
  riscv: Allow user to downgrade to sv39 when hw supports sv48
  riscv: Use pgtable_l4_enabled to output mmu type in cpuinfo
  riscv: Explicit comment about user virtual address space size

 arch/riscv/Kconfig  |  34 ++---
 arch/riscv/include/asm/csr.h    |   3 +-
 arch/riscv/include/asm/fixmap.h |   1 +
 arch/riscv/include/asm/page.h   |  15 +++
 arch/riscv/include/asm/pgalloc.h    |  36 ++
 arch/riscv/include/asm/pgtable-64.h |  97 +-
 arch/riscv/include/asm/pgtable.h    |  31 -
 arch/riscv/include/asm/sparsemem.h  |   6 +-
 arch/riscv/kernel/cpu.c |  23 ++--
 arch/riscv/kernel/head.S    |   3 +-
 arch/riscv/mm/context.c |   2 +-
 arch/riscv/mm/init.c    | 194 
 arch/riscv/mm/ptdump.c  |  49 +--
 13 files changed, 412 insertions(+), 82 deletions(-)


RE: [PATCH v3] driver core: platform: expose numa_node to users in sysfs

2020-07-02 Thread Song Bao Hua (Barry Song)

> 
> However, it is still much clearer and more credible to users to expose
> the data directly from the ACPI table.
> 

Beyond the ARM64 IORT, numa_node is actually applicable to x86 and other
architectures as well, through the generic acpi_create_platform_device()
API:

drivers/acpi/scan.c:
static void acpi_default_enumeration(struct acpi_device *device)
{
...
if (!device->flags.enumeration_by_parent) {
acpi_create_platform_device(device, NULL);
acpi_device_set_enumerated(device);
}
}

struct platform_device *acpi_create_platform_device(struct acpi_device *adev,
struct property_entry *properties)
{
...

pdev = platform_device_register_full(&pdevinfo);
if (IS_ERR(pdev))
	dev_err(&adev->dev, "platform device creation failed: %ld\n",
		PTR_ERR(pdev));
else {
	set_dev_node(&pdev->dev, acpi_get_node(adev->handle));
	dev_dbg(&adev->dev, "created platform device %s\n",
		dev_name(&pdev->dev));
}

...

return pdev;
}

> >
> > Thanks,
> > John

Thanks
Barry




Re: [PATCH v2 1/4] x86/xen: remove 32-bit Xen PV guest support

2020-07-02 Thread Jürgen Groß

On 03.07.20 00:59, Boris Ostrovsky wrote:

On 7/1/20 7:06 AM, Juergen Gross wrote:

Xen requires 64-bit machines today, and since Xen 4.14 it can be
built without 32-bit PV guest support. There is no need to carry the
burden of 32-bit PV guest support in the kernel any longer, as new
guests can be either HVM or PVH, or they can use a 64-bit kernel.

Remove the 32-bit Xen PV support from the kernel.

Signed-off-by: Juergen Gross 
---
  arch/x86/entry/entry_32.S  | 109 +--
  arch/x86/include/asm/proto.h   |   2 +-
  arch/x86/include/asm/segment.h |   2 +-
  arch/x86/kernel/head_32.S  |  31 ---
  arch/x86/xen/Kconfig   |   3 +-
  arch/x86/xen/Makefile  |   3 +-
  arch/x86/xen/apic.c|  17 --
  arch/x86/xen/enlighten_pv.c|  48 +



Should we drop PageHighMem() test in set_aliased_prot()?


(And there are a few other places where it is used, in mmu_pv.c)


Yes, will drop those.






@@ -555,13 +547,8 @@ static void xen_load_tls(struct thread_struct *t, unsigned 
int cpu)
 * exception between the new %fs descriptor being loaded and
 * %fs being effectively cleared at __switch_to().
 */
-   if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_CPU) {
-#ifdef CONFIG_X86_32
-   lazy_load_gs(0);
-#else



I think this also needs an adjustment to the preceding comment.


Yes.




  
-#ifdef CONFIG_X86_PAE

-static void xen_set_pte_atomic(pte_t *ptep, pte_t pte)
-{
-   trace_xen_mmu_set_pte_atomic(ptep, pte);
-   __xen_set_pte(ptep, pte);



Probably not for this series but I wonder whether __xen_set_pte() should
continue to use hypercall now that we are 64-bit only.


As Andrew wrote already the hypercall will be cheaper.

I'll adjust the comment, though.





@@ -654,14 +621,12 @@ static int __xen_pgd_walk(struct mm_struct *mm, pgd_t 
*pgd,



Comment above should be updated.


Yes.


Juergen


Re: [PATCH v7 02/13] dt-bindings: panel: Convert rocktech, jh057n00900 to yaml

2020-07-02 Thread Sam Ravnborg
Hi Ondrej.

> > My bot found errors running 'make dt_binding_check' on your patch:
> > 
> > /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/display/bridge/nwl-dsi.example.dt.yaml:
> >  panel@0: '#address-cells', '#size-cells', 'port@0' do not match any of the 
> > regexes: 'pinctrl-[0-9]+'
> > /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/display/bridge/nwl-dsi.example.dt.yaml:
> >  panel@0: 'vcc-supply' is a required property
> > /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/display/bridge/nwl-dsi.example.dt.yaml:
> >  panel@0: 'iovcc-supply' is a required property
> > /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/display/bridge/nwl-dsi.example.dt.yaml:
> >  panel@0: 'reset-gpios' is a required property
> 
> Paths look bogus 
> 
> It should be .../rocktech,jh057n00900.yaml: ...

The example in nwl-dsi.yaml contains:
compatible = "rocktech,jh057n00900";

So the example is checked against your updated binding.
And the binding check fails because the example is no longer valid.

This needs to be fixed as we do not want to introduce new errors.
Either the example or the binding needs the fix.

Sam


> 
> regards,
>   o.
> 
> > 
> > See https://patchwork.ozlabs.org/patch/1320690
> > 
> > If you already ran 'make dt_binding_check' and didn't see the above
> > error(s), then make sure dt-schema is up to date:
> > 
> > pip3 install git+https://github.com/devicetree-org/dt-schema.git@master 
> > --upgrade
> > 
> > Please check and re-submit.
> > 


[PATCH v14 02/20] mm/page_idle: no unlikely double check for idle page counting

2020-07-02 Thread Alex Shi
As the function comments mention, a few missed isolated pages can be
tolerated. So go one step further and drop the unlikely double check.
That won't cause more idle pages, but it reduces lock contention.

This is also a preparation for later new page isolation feature.

Signed-off-by: Alex Shi 
Cc: Andrew Morton 
Cc: Johannes Weiner 
Cc: Matthew Wilcox 
Cc: Hugh Dickins 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/page_idle.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/mm/page_idle.c b/mm/page_idle.c
index 057c61df12db..5fdd753e151a 100644
--- a/mm/page_idle.c
+++ b/mm/page_idle.c
@@ -32,19 +32,11 @@
 static struct page *page_idle_get_page(unsigned long pfn)
 {
struct page *page = pfn_to_online_page(pfn);
-   pg_data_t *pgdat;
 
if (!page || !PageLRU(page) ||
!get_page_unless_zero(page))
return NULL;
 
-   pgdat = page_pgdat(page);
-   spin_lock_irq(&pgdat->lru_lock);
-   if (unlikely(!PageLRU(page))) {
-   put_page(page);
-   page = NULL;
-   }
-   spin_unlock_irq(&pgdat->lru_lock);
return page;
 }
 
-- 
1.8.3.1



[PATCH v14 15/20] mm/swap: serialize memcg changes during pagevec_lru_move_fn

2020-07-02 Thread Alex Shi
Hugh Dickins found a memcg change bug in the original version:
if we want to change the pgdat->lru_lock to the memcg's lruvec lock, we have
to serialize mem_cgroup_move_account during pagevec_lru_move_fn. The
possible bad scenario would look like:

cpu 0                                   cpu 1
lruvec = mem_cgroup_page_lruvec()
                                        if (!isolate_lru_page())
                                                mem_cgroup_move_account

spin_lock_irqsave(&lruvec->lru_lock)    <== wrong lock.

So we need the ClearPageLRU to block isolate_lru_page(), then serialize
the memcg change here.

Reported-by: Hugh Dickins 
Signed-off-by: Alex Shi 
Cc: Andrew Morton 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/swap.c | 31 +++
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index b24d5f69b93a..55eb2c2eed03 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -203,7 +203,7 @@ int get_kernel_page(unsigned long start, int write, struct 
page **pages)
 EXPORT_SYMBOL_GPL(get_kernel_page);
 
 static void pagevec_lru_move_fn(struct pagevec *pvec,
-   void (*move_fn)(struct page *page, struct lruvec *lruvec))
+   void (*move_fn)(struct page *page, struct lruvec *lruvec), bool add)
 {
int i;
struct pglist_data *pgdat = NULL;
@@ -221,8 +221,15 @@ static void pagevec_lru_move_fn(struct pagevec *pvec,
spin_lock_irqsave(&pgdat->lru_lock, flags);
}
 
+   /* new page add to lru or page moving between lru */
+   if (!add && !TestClearPageLRU(page))
+   continue;
+
lruvec = mem_cgroup_page_lruvec(page, pgdat);
(*move_fn)(page, lruvec);
+
+   if (!add)
+   SetPageLRU(page);
}
if (pgdat)
spin_unlock_irqrestore(&pgdat->lru_lock, flags);
@@ -259,7 +266,7 @@ void rotate_reclaimable_page(struct page *page)
local_lock_irqsave(&lru_rotate.lock, flags);
pvec = this_cpu_ptr(&lru_rotate.pvec);
if (!pagevec_add(pvec, page) || PageCompound(page))
-   pagevec_lru_move_fn(pvec, pagevec_move_tail_fn);
+   pagevec_lru_move_fn(pvec, pagevec_move_tail_fn, false);
local_unlock_irqrestore(&lru_rotate.lock, flags);
}
 }
@@ -328,7 +335,7 @@ static void activate_page_drain(int cpu)
struct pagevec *pvec = &per_cpu(lru_pvecs.activate_page, cpu);
 
if (pagevec_count(pvec))
-   pagevec_lru_move_fn(pvec, __activate_page);
+   pagevec_lru_move_fn(pvec, __activate_page, false);
 }
 
 static bool need_activate_page_drain(int cpu)
@@ -346,7 +353,7 @@ void activate_page(struct page *page)
pvec = this_cpu_ptr(&lru_pvecs.activate_page);
get_page(page);
if (!pagevec_add(pvec, page) || PageCompound(page))
-   pagevec_lru_move_fn(pvec, __activate_page);
+   pagevec_lru_move_fn(pvec, __activate_page, false);
local_unlock(&lru_pvecs.lock);
}
 }
@@ -621,21 +628,21 @@ void lru_add_drain_cpu(int cpu)
 
/* No harm done if a racing interrupt already did this */
local_lock_irqsave(&lru_rotate.lock, flags);
-   pagevec_lru_move_fn(pvec, pagevec_move_tail_fn);
+   pagevec_lru_move_fn(pvec, pagevec_move_tail_fn, false);
local_unlock_irqrestore(&lru_rotate.lock, flags);
}
 
pvec = &per_cpu(lru_pvecs.lru_deactivate_file, cpu);
if (pagevec_count(pvec))
-   pagevec_lru_move_fn(pvec, lru_deactivate_file_fn);
+   pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, false);
 
pvec = &per_cpu(lru_pvecs.lru_deactivate, cpu);
if (pagevec_count(pvec))
-   pagevec_lru_move_fn(pvec, lru_deactivate_fn);
+   pagevec_lru_move_fn(pvec, lru_deactivate_fn, false);
 
pvec = &per_cpu(lru_pvecs.lru_lazyfree, cpu);
if (pagevec_count(pvec))
-   pagevec_lru_move_fn(pvec, lru_lazyfree_fn);
+   pagevec_lru_move_fn(pvec, lru_lazyfree_fn, false);
 
activate_page_drain(cpu);
 }
@@ -664,7 +671,7 @@ void deactivate_file_page(struct page *page)
pvec = this_cpu_ptr(&lru_pvecs.lru_deactivate_file);
 
if (!pagevec_add(pvec, page) || PageCompound(page))
-   pagevec_lru_move_fn(pvec, lru_deactivate_file_fn);
+   pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, false);
local_unlock(&lru_pvecs.lock);
}
 }
@@ -686,7 +693,7 @@ void deactivate_page(struct page *page)
pvec = this_cpu_ptr(&lru_pvecs.lru_deactivate);
get_page(page);
if (!pagevec_add(pvec, page) || PageCompound(page))
-   pagevec_lru_move_fn(pvec, lru_deactivate_fn);
+   pagevec_lru_move_fn(pvec, 

[PATCH v14 16/20] mm/lru: replace pgdat lru_lock with lruvec lock

2020-07-02 Thread Alex Shi
This patch moves the per-node lru_lock into the lruvec, thus bringing an
lru_lock for each memcg per node. So on a large machine, each memcg no
longer has to suffer from per-node pgdat->lru_lock contention. They can go
fast with their own lru_lock.

After moving the memcg charge before lru insertion, page isolation can
stabilize the page's memcg, so the per-memcg lruvec lock is stable and can
replace the per-node lru lock.

According to Daniel Jordan's suggestion, I ran 208 'dd' tasks in 104
containers on a 2s * 26cores * HT box with a modified case:
https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git/tree/case-lru-file-readtwice

With this and later patches, the readtwice performance increases by about
80% with concurrent containers.

Also add a debug function to the locking which may give some clues if
something gets out of hand.

Signed-off-by: Alex Shi 
Cc: Hugh Dickins 
Cc: Andrew Morton 
Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: Vladimir Davydov 
Cc: Yang Shi 
Cc: Matthew Wilcox 
Cc: Konstantin Khlebnikov 
Cc: Tejun Heo 
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
Cc: cgro...@vger.kernel.org
---
 include/linux/memcontrol.h | 98 ++
 include/linux/mmzone.h |  2 +
 mm/compaction.c| 67 +++
 mm/huge_memory.c   |  9 ++---
 mm/memcontrol.c| 63 -
 mm/mlock.c | 32 +++
 mm/mmzone.c|  1 +
 mm/swap.c  | 79 +
 mm/vmscan.c| 70 ++---
 9 files changed, 300 insertions(+), 121 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index e77197a62809..6e670f991b42 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -411,6 +411,19 @@ static inline struct lruvec *mem_cgroup_lruvec(struct 
mem_cgroup *memcg,
 
 struct mem_cgroup *get_mem_cgroup_from_page(struct page *page);
 
+struct lruvec *lock_page_lruvec(struct page *page);
+struct lruvec *lock_page_lruvec_irq(struct page *page);
+struct lruvec *lock_page_lruvec_irqsave(struct page *page,
+   unsigned long *flags);
+
+#ifdef CONFIG_DEBUG_VM
+void lruvec_memcg_debug(struct lruvec *lruvec, struct page *page);
+#else
+static inline void lruvec_memcg_debug(struct lruvec *lruvec, struct page *page)
+{
+}
+#endif
+
 static inline
 struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css){
return css ? container_of(css, struct mem_cgroup, css) : NULL;
@@ -892,6 +905,31 @@ static inline void mem_cgroup_put(struct mem_cgroup *memcg)
 {
 }
 
+static inline struct lruvec *lock_page_lruvec(struct page *page)
+{
+   struct pglist_data *pgdat = page_pgdat(page);
+
+   spin_lock(&pgdat->__lruvec.lru_lock);
+   return &pgdat->__lruvec;
+}
+
+static inline struct lruvec *lock_page_lruvec_irq(struct page *page)
+{
+   struct pglist_data *pgdat = page_pgdat(page);
+
+   spin_lock_irq(&pgdat->__lruvec.lru_lock);
+   return &pgdat->__lruvec;
+}
+
+static inline struct lruvec *lock_page_lruvec_irqsave(struct page *page,
+   unsigned long *flagsp)
+{
+   struct pglist_data *pgdat = page_pgdat(page);
+
+   spin_lock_irqsave(&pgdat->__lruvec.lru_lock, *flagsp);
+   return &pgdat->__lruvec;
+}
+
 static inline struct mem_cgroup *
 mem_cgroup_iter(struct mem_cgroup *root,
struct mem_cgroup *prev,
@@ -1126,6 +1164,10 @@ static inline void count_memcg_page_event(struct page 
*page,
 void count_memcg_event_mm(struct mm_struct *mm, enum vm_event_item idx)
 {
 }
+
+static inline void lruvec_memcg_debug(struct lruvec *lruvec, struct page *page)
+{
+}
 #endif /* CONFIG_MEMCG */
 
 /* idx can be of type enum memcg_stat_item or node_stat_item */
@@ -1255,6 +1297,62 @@ static inline struct lruvec *parent_lruvec(struct lruvec 
*lruvec)
return mem_cgroup_lruvec(memcg, lruvec_pgdat(lruvec));
 }
 
+static inline void unlock_page_lruvec(struct lruvec *lruvec)
+{
+   spin_unlock(&lruvec->lru_lock);
+}
+
+static inline void unlock_page_lruvec_irq(struct lruvec *lruvec)
+{
+   spin_unlock_irq(&lruvec->lru_lock);
+}
+
+static inline void unlock_page_lruvec_irqrestore(struct lruvec *lruvec,
+   unsigned long flags)
+{
+   spin_unlock_irqrestore(&lruvec->lru_lock, flags);
+}
+
+/* Don't lock again iff page's lruvec locked */
+static inline struct lruvec *relock_page_lruvec_irq(struct page *page,
+   struct lruvec *locked_lruvec)
+{
+   struct pglist_data *pgdat = page_pgdat(page);
+   bool locked;
+
+   rcu_read_lock();
+   locked = mem_cgroup_page_lruvec(page, pgdat) == locked_lruvec;
+   rcu_read_unlock();
+
+   if (locked)
+   return locked_lruvec;
+
+   if (locked_lruvec)
+   unlock_page_lruvec_irq(locked_lruvec);
+
+   return lock_page_lruvec_irq(page);
+}
+
+/* Don't lock again iff page's lruvec locked */
+static inline 

[PATCH v14 20/20] mm/lru: revise the comments of lru_lock

2020-07-02 Thread Alex Shi
From: Hugh Dickins 

Since we changed pgdat->lru_lock to lruvec->lru_lock, it's time to
fix the now-incorrect comments in the code. Also fix some zone->lru_lock
comment errors from ancient times, etc.

Signed-off-by: Hugh Dickins 
Signed-off-by: Alex Shi 
Cc: Andrew Morton 
Cc: Tejun Heo 
Cc: Andrey Ryabinin 
Cc: Jann Horn 
Cc: Mel Gorman 
Cc: Johannes Weiner 
Cc: Matthew Wilcox 
Cc: Hugh Dickins 
Cc: cgro...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
---
 Documentation/admin-guide/cgroup-v1/memcg_test.rst | 15 +++
 Documentation/admin-guide/cgroup-v1/memory.rst | 21 +
 Documentation/trace/events-kmem.rst|  2 +-
 Documentation/vm/unevictable-lru.rst   | 22 --
 include/linux/mm_types.h   |  2 +-
 include/linux/mmzone.h |  2 +-
 mm/filemap.c   |  4 ++--
 mm/memcontrol.c|  2 +-
 mm/rmap.c  |  4 ++--
 mm/vmscan.c| 12 
 10 files changed, 36 insertions(+), 50 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v1/memcg_test.rst 
b/Documentation/admin-guide/cgroup-v1/memcg_test.rst
index 3f7115e07b5d..0b9f91589d3d 100644
--- a/Documentation/admin-guide/cgroup-v1/memcg_test.rst
+++ b/Documentation/admin-guide/cgroup-v1/memcg_test.rst
@@ -133,18 +133,9 @@ Under below explanation, we assume 
CONFIG_MEM_RES_CTRL_SWAP=y.
 
 8. LRU
 ==
-Each memcg has its own private LRU. Now, its handling is under global
-   VM's control (means that it's handled under global pgdat->lru_lock).
-   Almost all routines around memcg's LRU is called by global LRU's
-   list management functions under pgdat->lru_lock.
-
-   A special function is mem_cgroup_isolate_pages(). This scans
-   memcg's private LRU and call __isolate_lru_page() to extract a page
-   from LRU.
-
-   (By __isolate_lru_page(), the page is removed from both of global and
-   private LRU.)
-
+   Each memcg has its own vector of LRUs (inactive anon, active anon,
+   inactive file, active file, unevictable) of pages from each node,
+   each LRU handled under a single lru_lock for that memcg and node.
 
 9. Typical Tests.
 =
diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst 
b/Documentation/admin-guide/cgroup-v1/memory.rst
index 12757e63b26c..24450696579f 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -285,20 +285,17 @@ When oom event notifier is registered, event will be 
delivered.
 2.6 Locking
 ---
 
-   lock_page_cgroup()/unlock_page_cgroup() should not be called under
-   the i_pages lock.
+Lock order is as follows:
 
-   Other lock order is following:
+  Page lock (PG_locked bit of page->flags)
+mm->page_table_lock or split pte_lock
+  lock_page_memcg (memcg->move_lock)
+mapping->i_pages lock
+  lruvec->lru_lock.
 
-   PG_locked.
- mm->page_table_lock
- pgdat->lru_lock
-  lock_page_cgroup.
-
-  In many cases, just lock_page_cgroup() is called.
-
-  per-zone-per-cgroup LRU (cgroup's private LRU) is just guarded by
-  pgdat->lru_lock, it has no lock of its own.
+Per-node-per-memcgroup LRU (cgroup's private LRU) is guarded by
+lruvec->lru_lock; PG_lru bit of page->flags is cleared before
+isolating a page from its LRU under lruvec->lru_lock.
 
 2.7 Kernel Memory Extension (CONFIG_MEMCG_KMEM)
 ---
diff --git a/Documentation/trace/events-kmem.rst 
b/Documentation/trace/events-kmem.rst
index 555484110e36..68fa75247488 100644
--- a/Documentation/trace/events-kmem.rst
+++ b/Documentation/trace/events-kmem.rst
@@ -69,7 +69,7 @@ When pages are freed in batch, the also mm_page_free_batched 
is triggered.
 Broadly speaking, pages are taken off the LRU lock in bulk and
 freed in batch with a page list. Significant amounts of activity here could
 indicate that the system is under memory pressure and can also indicate
-contention on the zone->lru_lock.
+contention on the lruvec->lru_lock.
 
 4. Per-CPU Allocator Activity
 =
diff --git a/Documentation/vm/unevictable-lru.rst 
b/Documentation/vm/unevictable-lru.rst
index 17d0861b0f1d..0e1490524f53 100644
--- a/Documentation/vm/unevictable-lru.rst
+++ b/Documentation/vm/unevictable-lru.rst
@@ -33,7 +33,7 @@ reclaim in Linux.  The problems have been observed at 
customer sites on large
 memory x86_64 systems.
 
 To illustrate this with an example, a non-NUMA x86_64 platform with 128GB of
-main memory will have over 32 million 4k pages in a single zone.  When a large
+main memory will have over 32 million 4k pages in a single node.  When a large
 fraction of these pages are not evictable for any reason [see below], vmscan
 will spend a lot 

[PATCH v14 14/20] mm/mlock: reorder isolation sequence during munlock

2020-07-02 Thread Alex Shi
This patch reorders the isolation steps during munlock, moving the lru lock
to guard each page and unfolding the __munlock_isolate_lru_page function, in
preparation for the lru lock change.

__split_huge_page_refcount doesn't exist anymore, but we still have to guard
PageMlocked and PageLRU in __split_huge_page_tail; that is the reason the
ClearPageLRU action is moved after lru locking.

[l...@intel.com: found a sleeping function bug ... at mm/rmap.c]
Signed-off-by: Alex Shi 
Cc: Kirill A. Shutemov 
Cc: Andrew Morton 
Cc: Johannes Weiner 
Cc: Matthew Wilcox 
Cc: Hugh Dickins 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/mlock.c | 93 ++
 1 file changed, 51 insertions(+), 42 deletions(-)

diff --git a/mm/mlock.c b/mm/mlock.c
index 228ba5a8e0a5..7098be122966 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -103,25 +103,6 @@ void mlock_vma_page(struct page *page)
 }
 
 /*
- * Isolate a page from LRU with optional get_page() pin.
- * Assumes lru_lock already held and page already pinned.
- */
-static bool __munlock_isolate_lru_page(struct page *page, bool getpage)
-{
-   if (TestClearPageLRU(page)) {
-   struct lruvec *lruvec;
-
-   lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
-   if (getpage)
-   get_page(page);
-   del_page_from_lru_list(page, lruvec, page_lru(page));
-   return true;
-   }
-
-   return false;
-}
-
-/*
  * Finish munlock after successful page isolation
  *
  * Page must be locked. This is a wrapper for try_to_munlock()
@@ -181,6 +162,7 @@ static void __munlock_isolation_failed(struct page *page)
 unsigned int munlock_vma_page(struct page *page)
 {
int nr_pages;
+   bool clearlru = false;
pg_data_t *pgdat = page_pgdat(page);
 
/* For try_to_munlock() and to serialize with page migration */
@@ -189,32 +171,42 @@ unsigned int munlock_vma_page(struct page *page)
VM_BUG_ON_PAGE(PageTail(page), page);
 
/*
-* Serialize with any parallel __split_huge_page_refcount() which
+* Serialize with any parallel __split_huge_page_tail() which
 * might otherwise copy PageMlocked to part of the tail pages before
 * we clear it in the head page. It also stabilizes hpage_nr_pages().
 */
+   get_page(page);
spin_lock_irq(&pgdat->lru_lock);
+   clearlru = TestClearPageLRU(page);
 
if (!TestClearPageMlocked(page)) {
-   /* Potentially, PTE-mapped THP: do not skip the rest PTEs */
-   nr_pages = 1;
-   goto unlock_out;
+   if (clearlru)
+   SetPageLRU(page);
+   /*
+* Potentially, PTE-mapped THP: do not skip the rest PTEs
+* Reuse lock as memory barrier for release_pages racing.
+*/
+   spin_unlock_irq(&pgdat->lru_lock);
+   put_page(page);
+   return 0;
}
 
nr_pages = hpage_nr_pages(page);
__mod_zone_page_state(page_zone(page), NR_MLOCK, -nr_pages);
 
-   if (__munlock_isolate_lru_page(page, true)) {
+   if (clearlru) {
+   struct lruvec *lruvec;
+
+   lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
+   del_page_from_lru_list(page, lruvec, page_lru(page));
spin_unlock_irq(&pgdat->lru_lock);
__munlock_isolated_page(page);
-   goto out;
+   } else {
+   spin_unlock_irq(&pgdat->lru_lock);
+   put_page(page);
+   __munlock_isolation_failed(page);
}
-   __munlock_isolation_failed(page);
-
-unlock_out:
-   spin_unlock_irq(&pgdat->lru_lock);
 
-out:
return nr_pages - 1;
 }
 
@@ -297,34 +289,51 @@ static void __munlock_pagevec(struct pagevec *pvec, 
struct zone *zone)
pagevec_init(_putback);
 
/* Phase 1: page isolation */
-   spin_lock_irq(&zone->zone_pgdat->lru_lock);
for (i = 0; i < nr; i++) {
struct page *page = pvec->pages[i];
+   struct lruvec *lruvec;
+   bool clearlru;
 
-   if (TestClearPageMlocked(page)) {
-   /*
-* We already have pin from follow_page_mask()
-* so we can spare the get_page() here.
-*/
-   if (__munlock_isolate_lru_page(page, false))
-   continue;
-   else
-   __munlock_isolation_failed(page);
-   } else {
+   clearlru = TestClearPageLRU(page);
+   spin_lock_irq(&zone->zone_pgdat->lru_lock);
+
+   if (!TestClearPageMlocked(page)) {
delta_munlocked++;
+   if (clearlru)
+   SetPageLRU(page);
+   goto putback;
+   }
+
+   

[PATCH v14 04/20] mm/compaction: rename compact_deferred as compact_should_defer

2020-07-02 Thread Alex Shi
compact_deferred() is only a check that suggests deferral; the actual
deferring is done in defer_compaction(), not here. So better rename it to
avoid confusion.

Signed-off-by: Alex Shi 
Cc: Steven Rostedt 
Cc: Ingo Molnar 
Cc: Andrew Morton 
Cc: Vlastimil Babka 
Cc: Mike Kravetz 
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
---
 include/linux/compaction.h| 4 ++--
 include/trace/events/compaction.h | 2 +-
 mm/compaction.c   | 8 
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 6fa0eea3f530..be9ed7437a38 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -100,7 +100,7 @@ extern enum compact_result compaction_suitable(struct zone 
*zone, int order,
unsigned int alloc_flags, int highest_zoneidx);
 
 extern void defer_compaction(struct zone *zone, int order);
-extern bool compaction_deferred(struct zone *zone, int order);
+extern bool compaction_should_defer(struct zone *zone, int order);
 extern void compaction_defer_reset(struct zone *zone, int order,
bool alloc_success);
 extern bool compaction_restarting(struct zone *zone, int order);
@@ -199,7 +199,7 @@ static inline void defer_compaction(struct zone *zone, int 
order)
 {
 }
 
-static inline bool compaction_deferred(struct zone *zone, int order)
+static inline bool compaction_should_defer(struct zone *zone, int order)
 {
return true;
 }
diff --git a/include/trace/events/compaction.h 
b/include/trace/events/compaction.h
index 54e5bf081171..33633c71df04 100644
--- a/include/trace/events/compaction.h
+++ b/include/trace/events/compaction.h
@@ -274,7 +274,7 @@
1UL << __entry->defer_shift)
 );
 
-DEFINE_EVENT(mm_compaction_defer_template, mm_compaction_deferred,
+DEFINE_EVENT(mm_compaction_defer_template, mm_compaction_should_defer,
 
TP_PROTO(struct zone *zone, int order),
 
diff --git a/mm/compaction.c b/mm/compaction.c
index cd1ef9e5e638..f14780fc296a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -154,7 +154,7 @@ void defer_compaction(struct zone *zone, int order)
 }
 
 /* Returns true if compaction should be skipped this time */
-bool compaction_deferred(struct zone *zone, int order)
+bool compaction_should_defer(struct zone *zone, int order)
 {
unsigned long defer_limit = 1UL << zone->compact_defer_shift;
 
@@ -168,7 +168,7 @@ bool compaction_deferred(struct zone *zone, int order)
if (zone->compact_considered >= defer_limit)
return false;
 
-   trace_mm_compaction_deferred(zone, order);
+   trace_mm_compaction_should_defer(zone, order);
 
return true;
 }
@@ -2377,7 +2377,7 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, 
unsigned int order,
enum compact_result status;
 
if (prio > MIN_COMPACT_PRIORITY
-   && compaction_deferred(zone, order)) {
+   && compaction_should_defer(zone, order)) {
rc = max_t(enum compact_result, COMPACT_DEFERRED, rc);
continue;
}
@@ -2561,7 +2561,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
if (!populated_zone(zone))
continue;
 
-   if (compaction_deferred(zone, cc.order))
+   if (compaction_should_defer(zone, cc.order))
continue;
 
if (compaction_suitable(zone, cc.order, 0, zoneid) !=
-- 
1.8.3.1



[PATCH v14 00/20] per memcg lru lock

2020-07-02 Thread Alex Shi
This is a new version, based on v5.8-rc3.

Currently there is one lru_lock per node, pgdat->lru_lock, guarding the
lru lists, but we moved the lru lists into the memcg a long time ago. Still
using the per-node lru_lock is clearly unscalable: pages in each memcg have
to compete with each other for a single lru_lock. This patchset tries to use
a per-lruvec/memcg lru_lock to replace the per-node lru lock to guard the
lru lists, making it scalable for memcgs and gaining performance.

Currently lru_lock still guards both the lru list and the page's lru bit;
that's fine. But if we want to use a specific lruvec lock for the page, we
need to pin down the page's lruvec/memcg while locking. Just taking the
lruvec lock first may be undermined by the page's memcg charge/migration.
To fix this problem, we can hoist out the clearing of the page's lru bit
and use it as a pin-down action to block memcg changes. That's the reason
for the atomic function TestClearPageLRU. So isolating a page now needs
both actions: TestClearPageLRU and holding the lru_lock.
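
So the isolation pattern the series converges on looks roughly like this
(a sketch distilled from the patches below, not a verbatim excerpt):

	if (!TestClearPageLRU(page))
		/* someone else isolated it, or it was never on an lru */
		goto fail;

	/* lru bit cleared: the page's memcg/lruvec is now stable */
	lruvec = lock_page_lruvec_irq(page);
	del_page_from_lru_list(page, lruvec, page_lru(page));
	unlock_page_lruvec_irq(lruvec);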

The typical user of this is isolate_migratepages_block() in compaction.c;
we have to take the lru bit before the lru lock, which serializes page
isolation against memcg page charge/migration, since those change the
page's lruvec and hence the lru_lock that applies to it.

The above solution was suggested by Johannes Weiner, and based on his new
memcg charge path we have this patchset. (Hugh Dickins tested and
contributed much code, from the compaction fix to general code polish,
thanks a lot!)

The patchset includes 3 parts:
1, some code cleanup and minimal optimization as preparation.
2, use TestClearPageLRU as the page isolation condition.
3, replace the per-node lru_lock with a per-memcg, per-node lru_lock.

Following Daniel Jordan's suggestion, I ran 208 'dd' tasks in 104
containers on a 2s * 26cores * HT box with a modified case:
https://git.kernel.org/pub/scm/linux/kernel/git/wfg/vm-scalability.git/tree/case-lru-file-readtwice
With this patchset, the readtwice performance increased by about 80%
with concurrent containers.

Thanks to Hugh Dickins and Konstantin Khlebnikov, who both brought up this
idea 8 years ago, and to the others who gave comments as well: Daniel
Jordan, Mel Gorman, Shakeel Butt, Matthew Wilcox etc.

Thanks for the testing support from Intel 0day and from Rong Chen,
Fengguang Wu, and Yun Wang. Hugh Dickins also shared his kbuild-swap case.
Thanks!

Alex Shi (18):
  mm/vmscan: remove unnecessary lruvec adding
  mm/page_idle: no unlikely double check for idle page counting
  mm/compaction: correct the comments of compact_defer_shift
  mm/compaction: rename compact_deferred as compact_should_defer
  mm/thp: move lru_add_page_tail func to huge_memory.c
  mm/thp: clean up lru_add_page_tail
  mm/thp: narrow lru locking
  mm/memcg: add debug checking in lock_page_memcg
  mm/swap: fold vm event PGROTATED into pagevec_move_tail_fn
  mm/lru: move lru_lock holding in func lru_note_cost_page
  mm/lru: move lock into lru_note_cost
  mm/lru: introduce TestClearPageLRU
  mm/compaction: do page isolation first in compaction
  mm/mlock: reorder isolation sequence during munlock
  mm/swap: serialize memcg changes during pagevec_lru_move_fn
  mm/lru: replace pgdat lru_lock with lruvec lock
  mm/lru: introduce the relock_page_lruvec function
  mm/pgdat: remove pgdat lru_lock

Hugh Dickins (2):
  mm/vmscan: use relock for move_pages_to_lru
  mm/lru: revise the comments of lru_lock

 Documentation/admin-guide/cgroup-v1/memcg_test.rst |  15 +-
 Documentation/admin-guide/cgroup-v1/memory.rst |  21 ++-
 Documentation/trace/events-kmem.rst|   2 +-
 Documentation/vm/unevictable-lru.rst   |  22 +--
 include/linux/compaction.h |   4 +-
 include/linux/memcontrol.h |  98 
 include/linux/mm_types.h   |   2 +-
 include/linux/mmzone.h |   6 +-
 include/linux/page-flags.h |   1 +
 include/linux/swap.h   |   4 +-
 include/trace/events/compaction.h  |   2 +-
 mm/compaction.c| 113 +
 mm/filemap.c   |   4 +-
 mm/huge_memory.c   |  53 --
 mm/memcontrol.c|  71 -
 mm/memory.c|   3 -
 mm/mlock.c |  93 +--
 mm/mmzone.c|   1 +
 mm/page_alloc.c|   1 -
 mm/page_idle.c |   8 -
 mm/rmap.c  |   4 +-
 mm/swap.c  | 177 +++--
 mm/swap_state.c|   2 -
 mm/vmscan.c| 174 ++--
 mm/workingset.c|   2 -
 25 files changed, 518 

[PATCH v14 09/20] mm/swap: fold vm event PGROTATED into pagevec_move_tail_fn

2020-07-02 Thread Alex Shi
Fold the PGROTATED event collection into the pagevec_move_tail_fn callback,
like the other callbacks passed to pagevec_lru_move_fn do. Now all usages
of pagevec_lru_move_fn are the same and there is no need for the 3rd
parameter.

This simplifies the calling convention.

[l...@intel.com: found a build issue in the original patch, thanks]
Signed-off-by: Alex Shi 
Cc: Andrew Morton 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/swap.c | 66 +++
 1 file changed, 24 insertions(+), 42 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index 7701d855873d..dc8b02cdddcb 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -204,8 +204,7 @@ int get_kernel_page(unsigned long start, int write, struct 
page **pages)
 EXPORT_SYMBOL_GPL(get_kernel_page);
 
 static void pagevec_lru_move_fn(struct pagevec *pvec,
-   void (*move_fn)(struct page *page, struct lruvec *lruvec, void *arg),
-   void *arg)
+   void (*move_fn)(struct page *page, struct lruvec *lruvec))
 {
int i;
struct pglist_data *pgdat = NULL;
@@ -224,7 +223,7 @@ static void pagevec_lru_move_fn(struct pagevec *pvec,
}
 
lruvec = mem_cgroup_page_lruvec(page, pgdat);
-   (*move_fn)(page, lruvec, arg);
+   (*move_fn)(page, lruvec);
}
if (pgdat)
spin_unlock_irqrestore(&pgdat->lru_lock, flags);
@@ -232,35 +231,23 @@ static void pagevec_lru_move_fn(struct pagevec *pvec,
pagevec_reinit(pvec);
 }
 
-static void pagevec_move_tail_fn(struct page *page, struct lruvec *lruvec,
-void *arg)
+static void pagevec_move_tail_fn(struct page *page, struct lruvec *lruvec)
 {
-   int *pgmoved = arg;
-
if (PageLRU(page) && !PageUnevictable(page)) {
del_page_from_lru_list(page, lruvec, page_lru(page));
ClearPageActive(page);
add_page_to_lru_list_tail(page, lruvec, page_lru(page));
-   (*pgmoved) += hpage_nr_pages(page);
+   __count_vm_events(PGROTATED, hpage_nr_pages(page));
}
 }
 
 /*
- * pagevec_move_tail() must be called with IRQ disabled.
- * Otherwise this may cause nasty races.
- */
-static void pagevec_move_tail(struct pagevec *pvec)
-{
-   int pgmoved = 0;
-
-   pagevec_lru_move_fn(pvec, pagevec_move_tail_fn, &pgmoved);
-   __count_vm_events(PGROTATED, pgmoved);
-}
-
-/*
  * Writeback is about to end against a page which has been marked for immediate
  * reclaim.  If it still appears to be reclaimable, move it to the tail of the
  * inactive list.
+ *
+ * pagevec_move_tail_fn() must be called with IRQ disabled.
+ * Otherwise this may cause nasty races.
  */
 void rotate_reclaimable_page(struct page *page)
 {
@@ -273,7 +260,7 @@ void rotate_reclaimable_page(struct page *page)
local_lock_irqsave(&lru_rotate.lock, flags);
pvec = this_cpu_ptr(&lru_rotate.pvec);
if (!pagevec_add(pvec, page) || PageCompound(page))
-   pagevec_move_tail(pvec);
+   pagevec_lru_move_fn(pvec, pagevec_move_tail_fn);
local_unlock_irqrestore(&lru_rotate.lock, flags);
}
 }
@@ -315,8 +302,7 @@ void lru_note_cost_page(struct page *page)
  page_is_file_lru(page), hpage_nr_pages(page));
 }
 
-static void __activate_page(struct page *page, struct lruvec *lruvec,
-   void *arg)
+static void __activate_page(struct page *page, struct lruvec *lruvec)
 {
if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
int lru = page_lru_base_type(page);
@@ -340,7 +326,7 @@ static void activate_page_drain(int cpu)
struct pagevec *pvec = &per_cpu(lru_pvecs.activate_page, cpu);
 
if (pagevec_count(pvec))
-   pagevec_lru_move_fn(pvec, __activate_page, NULL);
+   pagevec_lru_move_fn(pvec, __activate_page);
 }
 
 static bool need_activate_page_drain(int cpu)
@@ -358,7 +344,7 @@ void activate_page(struct page *page)
pvec = this_cpu_ptr(&lru_pvecs.activate_page);
get_page(page);
if (!pagevec_add(pvec, page) || PageCompound(page))
-   pagevec_lru_move_fn(pvec, __activate_page, NULL);
+   pagevec_lru_move_fn(pvec, __activate_page);
local_unlock(&lru_pvecs.lock);
}
 }
@@ -374,7 +360,7 @@ void activate_page(struct page *page)
 
page = compound_head(page);
spin_lock_irq(&pgdat->lru_lock);
-   __activate_page(page, mem_cgroup_page_lruvec(page, pgdat), NULL);
+   __activate_page(page, mem_cgroup_page_lruvec(page, pgdat));
spin_unlock_irq(&pgdat->lru_lock);
 }
 #endif
@@ -526,8 +512,7 @@ void lru_cache_add_active_or_unevictable(struct page *page,
  * be write it out by flusher threads as this is much more effective
  * than the single-page writeout from reclaim.
  */
-static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec,
-   

[PATCH v14 11/20] mm/lru: move lock into lru_note_cost

2020-07-02 Thread Alex Shi
This patch moves the lru_lock into lru_note_cost. It's a bit ugly and may
cost more locking, but it's necessary for the later change from the
per-pgdat lru_lock to the per-memcg lru_lock.

Signed-off-by: Alex Shi 
Cc: Johannes Weiner 
Cc: Andrew Morton 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/swap.c   | 5 +++--
 mm/vmscan.c | 4 +---
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index b88ca630db70..c67699de4869 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -269,7 +269,9 @@ void lru_note_cost(struct lruvec *lruvec, bool file, 
unsigned int nr_pages)
 {
do {
unsigned long lrusize;
+   struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 
+   spin_lock_irq(&pgdat->lru_lock);
/* Record cost event */
if (file)
lruvec->file_cost += nr_pages;
@@ -293,15 +295,14 @@ void lru_note_cost(struct lruvec *lruvec, bool file, 
unsigned int nr_pages)
lruvec->file_cost /= 2;
lruvec->anon_cost /= 2;
}
+   spin_unlock_irq(&pgdat->lru_lock);
} while ((lruvec = parent_lruvec(lruvec)));
 }
 
 void lru_note_cost_page(struct page *page)
 {
-   spin_lock_irq(&page_pgdat(page)->lru_lock);
lru_note_cost(mem_cgroup_page_lruvec(page, page_pgdat(page)),
  page_is_file_lru(page), hpage_nr_pages(page));
-   spin_unlock_irq(&page_pgdat(page)->lru_lock);
 }
 
 static void __activate_page(struct page *page, struct lruvec *lruvec)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index ddb29d813d77..c1c4259b4de5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1976,19 +1976,17 @@ static int current_may_throttle(void)
, false);
 
spin_lock_irq(&pgdat->lru_lock);
-
move_pages_to_lru(lruvec, _list);
 
__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
-   lru_note_cost(lruvec, file, stat.nr_pageout);
item = current_is_kswapd() ? PGSTEAL_KSWAPD : PGSTEAL_DIRECT;
if (!cgroup_reclaim(sc))
__count_vm_events(item, nr_reclaimed);
__count_memcg_events(lruvec_memcg(lruvec), item, nr_reclaimed);
__count_vm_events(PGSTEAL_ANON + file, nr_reclaimed);
-
spin_unlock_irq(&pgdat->lru_lock);
 
+   lru_note_cost(lruvec, file, stat.nr_pageout);
mem_cgroup_uncharge_list(_list);
free_unref_page_list(_list);
 
-- 
1.8.3.1



[PATCH v14 18/20] mm/vmscan: use relock for move_pages_to_lru

2020-07-02 Thread Alex Shi
From: Hugh Dickins 

Use the relock function to replace the relocking action, and try to save
a few locking operations.

Signed-off-by: Hugh Dickins 
Signed-off-by: Alex Shi 
Cc: Andrew Morton 
Cc: Tejun Heo 
Cc: Andrey Ryabinin 
Cc: Jann Horn 
Cc: Mel Gorman 
Cc: Johannes Weiner 
Cc: Matthew Wilcox 
Cc: Hugh Dickins 
Cc: cgro...@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
---
 mm/vmscan.c | 17 ++---
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index bdb53a678e7e..078a1640ec60 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1854,15 +1854,15 @@ static unsigned noinline_for_stack 
move_pages_to_lru(struct lruvec *lruvec,
enum lru_list lru;
 
while (!list_empty(list)) {
-   struct lruvec *new_lruvec = NULL;
-
page = lru_to_page(list);
VM_BUG_ON_PAGE(PageLRU(page), page);
list_del(&page->lru);
if (unlikely(!page_evictable(page))) {
-   spin_unlock_irq(&lruvec->lru_lock);
+   if (lruvec) {
+   spin_unlock_irq(&lruvec->lru_lock);
+   lruvec = NULL;
+   }
putback_lru_page(page);
-   spin_lock_irq(&lruvec->lru_lock);
continue;
}
 
@@ -1876,12 +1876,7 @@ static unsigned noinline_for_stack 
move_pages_to_lru(struct lruvec *lruvec,
*list_add(&page->lru,)
* list_add(&page->lru,) //corrupt
 */
-   new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
-   if (new_lruvec != lruvec) {
-   if (lruvec)
-   spin_unlock_irq(&lruvec->lru_lock);
-   lruvec = lock_page_lruvec_irq(page);
-   }
+   lruvec = relock_page_lruvec_irq(page, lruvec);
SetPageLRU(page);
 
if (unlikely(put_page_testzero(page))) {
@@ -1890,8 +1885,8 @@ static unsigned noinline_for_stack 
move_pages_to_lru(struct lruvec *lruvec,
 
if (unlikely(PageCompound(page))) {
spin_unlock_irq(&lruvec->lru_lock);
+   lruvec = NULL;
destroy_compound_page(page);
-   spin_lock_irq(>lru_lock);
} else
list_add(&page->lru, &pages_to_free);
 
-- 
1.8.3.1



[PATCH v14 19/20] mm/pgdat: remove pgdat lru_lock

2020-07-02 Thread Alex Shi
Now that pgdat.lru_lock has been replaced by the lruvec lock, it's not
used anymore.

Signed-off-by: Alex Shi 
Cc: Andrew Morton 
Cc: Konstantin Khlebnikov 
Cc: Hugh Dickins 
Cc: Johannes Weiner 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: cgro...@vger.kernel.org
---
 include/linux/mmzone.h | 1 -
 mm/page_alloc.c| 1 -
 2 files changed, 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 36c1680efd90..8d7318ce5f62 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -735,7 +735,6 @@ struct deferred_split {
 
/* Write-intensive fields used by page reclaim */
ZONE_PADDING(_pad1_)
-   spinlock_t  lru_lock;
 
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
/*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 48eb0f1410d4..05ce6e1a3098 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6721,7 +6721,6 @@ static void __meminit pgdat_init_internals(struct 
pglist_data *pgdat)
init_waitqueue_head(&pgdat->pfmemalloc_wait);
 
pgdat_page_ext_init(pgdat);
-   spin_lock_init(&pgdat->lru_lock);
lruvec_init(&pgdat->__lruvec);
 }
 
-- 
1.8.3.1



[PATCH v14 08/20] mm/memcg: add debug checking in lock_page_memcg

2020-07-02 Thread Alex Shi
Add a debug check in lock_page_memcg, so we can get an alarm if anything
is wrong here.

Suggested-by: Johannes Weiner 
Signed-off-by: Alex Shi 
Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: Vladimir Davydov 
Cc: Andrew Morton 
Cc: cgro...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/memcontrol.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 19622328e4b5..fde47272b13c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1983,6 +1983,12 @@ struct mem_cgroup *lock_page_memcg(struct page *page)
if (unlikely(!memcg))
return NULL;
 
+#ifdef CONFIG_PROVE_LOCKING
+   local_irq_save(flags);
+   might_lock(&memcg->move_lock);
+   local_irq_restore(flags);
+#endif
+
if (atomic_read(&memcg->moving_account) <= 0)
return memcg;
 
-- 
1.8.3.1



[PATCH v14 10/20] mm/lru: move lru_lock holding in func lru_note_cost_page

2020-07-02 Thread Alex Shi
It's a cleanup patch w/o functional changes.

Signed-off-by: Alex Shi 
Cc: Johannes Weiner 
Cc: Andrew Morton 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/memory.c | 3 ---
 mm/swap.c   | 2 ++
 mm/swap_state.c | 2 --
 mm/workingset.c | 2 --
 4 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 87ec87cdc1ff..dafc5585517e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3150,10 +3150,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 * XXX: Move to lru_cache_add() when it
 * supports new vs putback
 */
-   spin_lock_irq(&page_pgdat(page)->lru_lock);
lru_note_cost_page(page);
-   spin_unlock_irq(&page_pgdat(page)->lru_lock);
-
lru_cache_add(page);
swap_readpage(page, true);
}
diff --git a/mm/swap.c b/mm/swap.c
index dc8b02cdddcb..b88ca630db70 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -298,8 +298,10 @@ void lru_note_cost(struct lruvec *lruvec, bool file, 
unsigned int nr_pages)
 
 void lru_note_cost_page(struct page *page)
 {
+   spin_lock_irq(&page_pgdat(page)->lru_lock);
lru_note_cost(mem_cgroup_page_lruvec(page, page_pgdat(page)),
  page_is_file_lru(page), hpage_nr_pages(page));
+   spin_unlock_irq(&page_pgdat(page)->lru_lock);
 }
 
 static void __activate_page(struct page *page, struct lruvec *lruvec)
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 05889e8e3c97..080be52db6a8 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -440,9 +440,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, 
gfp_t gfp_mask,
}
 
/* XXX: Move to lru_cache_add() when it supports new vs putback */
-   spin_lock_irq(&page_pgdat(page)->lru_lock);
lru_note_cost_page(page);
-   spin_unlock_irq(&page_pgdat(page)->lru_lock);
 
/* Caller will initiate read into locked page */
SetPageWorkingset(page);
diff --git a/mm/workingset.c b/mm/workingset.c
index 50b7937bab32..337d5b9ad132 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -372,9 +372,7 @@ void workingset_refault(struct page *page, void *shadow)
if (workingset) {
SetPageWorkingset(page);
/* XXX: Move to lru_cache_add() when it supports new vs putback */
-   spin_lock_irq(&page_pgdat(page)->lru_lock);
lru_note_cost_page(page);
-   spin_unlock_irq(&page_pgdat(page)->lru_lock);
inc_lruvec_state(lruvec, WORKINGSET_RESTORE);
}
 out:
-- 
1.8.3.1



[PATCH v14 17/20] mm/lru: introduce the relock_page_lruvec function

2020-07-02 Thread Alex Shi
Use this new function to replace the repeated, identical code.

Signed-off-by: Alex Shi 
Cc: Johannes Weiner 
Cc: Andrew Morton 
Cc: Thomas Gleixner 
Cc: Andrey Ryabinin 
Cc: Matthew Wilcox 
Cc: Mel Gorman 
Cc: Konstantin Khlebnikov 
Cc: Hugh Dickins 
Cc: Tejun Heo 
Cc: linux-kernel@vger.kernel.org
Cc: cgro...@vger.kernel.org
Cc: linux...@kvack.org
---
 mm/mlock.c  |  9 +
 mm/swap.c   | 25 ++---
 mm/vmscan.c |  8 +---
 3 files changed, 8 insertions(+), 34 deletions(-)

diff --git a/mm/mlock.c b/mm/mlock.c
index 97a8667b4c2c..fa976a5b91c7 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -289,17 +289,10 @@ static void __munlock_pagevec(struct pagevec *pvec, 
struct zone *zone)
/* Phase 1: page isolation */
for (i = 0; i < nr; i++) {
struct page *page = pvec->pages[i];
-   struct lruvec *new_lruvec;
bool clearlru;
 
clearlru = TestClearPageLRU(page);
-
-   new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
-   if (new_lruvec != lruvec) {
-   if (lruvec)
-   unlock_page_lruvec_irq(lruvec);
-   lruvec = lock_page_lruvec_irq(page);
-   }
+   lruvec = relock_page_lruvec_irq(page, lruvec);
 
if (!TestClearPageMlocked(page)) {
delta_munlocked++;
diff --git a/mm/swap.c b/mm/swap.c
index 80de8a5182ca..c4d8710c8957 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -209,20 +209,12 @@ static void pagevec_lru_move_fn(struct pagevec *pvec,
 
for (i = 0; i < pagevec_count(pvec); i++) {
struct page *page = pvec->pages[i];
-   struct lruvec *new_lruvec;
-
-   new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
-   if (lruvec != new_lruvec) {
-   if (lruvec)
-   unlock_page_lruvec_irqrestore(lruvec, flags);
-   lruvec = lock_page_lruvec_irqsave(page, &flags);
-   }
 
/* new page add to lru or page moving between lru */
if (!add && !TestClearPageLRU(page))
continue;
 
-   lruvec = mem_cgroup_page_lruvec(page, pgdat);
+   lruvec = relock_page_lruvec_irqsave(page, lruvec, &flags);
(*move_fn)(page, lruvec);
 
if (!add)
@@ -868,17 +860,12 @@ void release_pages(struct page **pages, int nr)
}
 
if (PageLRU(page)) {
-   struct lruvec *new_lruvec;
-
-   new_lruvec = mem_cgroup_page_lruvec(page,
-   page_pgdat(page));
-   if (new_lruvec != lruvec) {
-   if (lruvec)
-   unlock_page_lruvec_irqrestore(lruvec,
-   flags);
+   struct lruvec *pre_lruvec = lruvec;
+
+   lruvec = relock_page_lruvec_irqsave(page, lruvec,
+   &flags);
+   if (pre_lruvec != lruvec)
lock_batch = 0;
-   lruvec = lock_page_lruvec_irqsave(page, &flags);
-   }
 
__ClearPageLRU(page);
del_page_from_lru_list(page, lruvec, page_off_lru(page));
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 168c1659e430..bdb53a678e7e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4292,15 +4292,9 @@ void check_move_unevictable_pages(struct pagevec *pvec)
 
for (i = 0; i < pvec->nr; i++) {
struct page *page = pvec->pages[i];
-   struct lruvec *new_lruvec;
 
pgscanned++;
-   new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
-   if (lruvec != new_lruvec) {
-   if (lruvec)
-   unlock_page_lruvec_irq(lruvec);
-   lruvec = lock_page_lruvec_irq(page);
-   }
+   lruvec = relock_page_lruvec_irq(page, lruvec);
 
if (!PageLRU(page) || !PageUnevictable(page))
continue;
-- 
1.8.3.1



[PATCH v14 13/20] mm/compaction: do page isolation first in compaction

2020-07-02 Thread Alex Shi
Johannes Weiner has suggested:
"So here is a crazy idea that may be worth exploring:

Right now, pgdat->lru_lock protects both PageLRU *and* the lruvec's
linked list.

Can we make PageLRU atomic and use it to stabilize the lru_lock
instead, and then use the lru_lock only serialize list operations?
..."

Yes, this patch does so in __isolate_lru_page, which is the core
page isolation function in the compaction and shrinking paths.
With this patch, compaction will only deal with pages that have PageLRU
set and are now isolated, skipping just-allocated pages that have no LRU
bit. And the isolation can exclude the other isolations in memcg
move_account, page migration and thp split_huge_page.

As a side effect, PageLRU may be cleared during the shrink_inactive_list
path for isolation reasons. If so, we can skip that page.

Hugh Dickins fixed the following bugs in an early version of this
patch:

Fix lots of crashes under compaction load: isolate_migratepages_block()
must clean up appropriately when rejecting a page, setting PageLRU again
if it had been cleared; and a put_page() after get_page_unless_zero()
cannot safely be done while holding locked_lruvec - it may turn out to
be the final put_page(), which will take an lruvec lock when PageLRU.
And move __isolate_lru_page_prepare back after get_page_unless_zero to
make trylock_page() safe:
trylock_page() is not safe to use at this time: its setting PG_locked
can race with the page being freed or allocated ("Bad page"), and can
also erase flags being set by one of those "sole owners" of a freshly
allocated page who use non-atomic __SetPageFlag().

Suggested-by: Johannes Weiner 
Signed-off-by: Alex Shi 
Cc: Hugh Dickins 
Cc: Andrew Morton 
Cc: Matthew Wilcox 
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
---
 include/linux/swap.h |  2 +-
 mm/compaction.c  | 42 +-
 mm/vmscan.c  | 38 ++
 3 files changed, 56 insertions(+), 26 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 2c29399b29a0..6d23d3beeff7 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -358,7 +358,7 @@ extern void lru_cache_add_active_or_unevictable(struct page 
*page,
 extern unsigned long zone_reclaimable_pages(struct zone *zone);
 extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
gfp_t gfp_mask, nodemask_t *mask);
-extern int __isolate_lru_page(struct page *page, isolate_mode_t mode);
+extern int __isolate_lru_page_prepare(struct page *page, isolate_mode_t mode);
 extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
  unsigned long nr_pages,
  gfp_t gfp_mask,
diff --git a/mm/compaction.c b/mm/compaction.c
index f14780fc296a..2da2933fe56b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -869,6 +869,7 @@ static bool too_many_isolated(pg_data_t *pgdat)
if (!valid_page && IS_ALIGNED(low_pfn, pageblock_nr_pages)) {
if (!cc->ignore_skip_hint && get_pageblock_skip(page)) {
low_pfn = end_pfn;
+   page = NULL;
goto isolate_abort;
}
valid_page = page;
@@ -950,6 +951,21 @@ static bool too_many_isolated(pg_data_t *pgdat)
if (!(cc->gfp_mask & __GFP_FS) && page_mapping(page))
goto isolate_fail;
 
+   /*
+* Be careful not to clear PageLRU until after we're
+* sure the page is not being freed elsewhere -- the
+* page release code relies on it.
+*/
+   if (unlikely(!get_page_unless_zero(page)))
+   goto isolate_fail;
+
+   if (__isolate_lru_page_prepare(page, isolate_mode) != 0)
+   goto isolate_fail_put;
+
+   /* Try isolate the page */
+   if (!TestClearPageLRU(page))
+   goto isolate_fail_put;
+
/* If we already hold the lock, we can skip some rechecking */
if (!locked) {
locked = compact_lock_irqsave(&pgdat->lru_lock,
&flags, cc);
@@ -962,10 +978,6 @@ static bool too_many_isolated(pg_data_t *pgdat)
goto isolate_abort;
}
 
-   /* Recheck PageLRU and PageCompound under lock */
-   if (!PageLRU(page))
-   goto isolate_fail;
-
/*
 * Page become compound since the non-locked check,
 * and it's on LRU. It can only be a THP so the order
@@ -973,16 +985,13 @@ static bool too_many_isolated(pg_data_t *pgdat)
 */
if (unlikely(PageCompound(page) 

[PATCH v14 12/20] mm/lru: introduce TestClearPageLRU

2020-07-02 Thread Alex Shi
Combine the PageLRU check and ClearPageLRU into one function, the
newly introduced TestClearPageLRU. This function will be used as the
page isolation precondition, to exclude other isolations elsewhere.
There may then be non-PageLRU pages on an lru list, so the BUG
checks need to be removed accordingly.

Hugh Dickins pointed out that __page_cache_release and release_pages
have no need for an atomic clear of the bit, since there is no user of
the page at that moment; and that there is no need for get_page()
before clearing the lru bit in isolate_lru_page, since it '(1) Must be
called with an elevated refcount on the page'.

As Andrew Morton mentioned, this change dirties the cacheline even for
a page that isn't on the LRU. But the cost is acceptable according to
Rong Chen's report:
https://lkml.org/lkml/2020/3/4/173
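
For reference, the TESTCLEARFLAG() line added below makes the
page-flags machinery generate roughly the following helper (a sketch
of the macro expansion, not code added by this patch):

	static __always_inline int TestClearPageLRU(struct page *page)
	{
		/* atomic read-modify-write: only one caller can win the bit */
		return test_and_clear_bit(PG_lru, &PF_HEAD(page, 1)->flags);
	}

Two racing isolators both call TestClearPageLRU(); only the winner
proceeds to touch the lru list, which is what makes it usable as the
isolation precondition.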

Suggested-by: Johannes Weiner 
Signed-off-by: Alex Shi 
Cc: Hugh Dickins 
Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: Vladimir Davydov 
Cc: Andrew Morton 
Cc: linux-kernel@vger.kernel.org
Cc: cgro...@vger.kernel.org
Cc: linux...@kvack.org
---
 include/linux/page-flags.h |  1 +
 mm/mlock.c |  3 +--
 mm/swap.c  |  6 ++
 mm/vmscan.c| 26 +++---
 4 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 6be1aa559b1e..9554ed1387dc 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -326,6 +326,7 @@ static inline void page_init_poison(struct page *page, 
size_t size)
 PAGEFLAG(Dirty, dirty, PF_HEAD) TESTSCFLAG(Dirty, dirty, PF_HEAD)
__CLEARPAGEFLAG(Dirty, dirty, PF_HEAD)
 PAGEFLAG(LRU, lru, PF_HEAD) __CLEARPAGEFLAG(LRU, lru, PF_HEAD)
+   TESTCLEARFLAG(LRU, lru, PF_HEAD)
 PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
TESTCLEARFLAG(Active, active, PF_HEAD)
 PAGEFLAG(Workingset, workingset, PF_HEAD)
diff --git a/mm/mlock.c b/mm/mlock.c
index f8736136fad7..228ba5a8e0a5 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -108,13 +108,12 @@ void mlock_vma_page(struct page *page)
  */
 static bool __munlock_isolate_lru_page(struct page *page, bool getpage)
 {
-   if (PageLRU(page)) {
+   if (TestClearPageLRU(page)) {
struct lruvec *lruvec;
 
lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
if (getpage)
get_page(page);
-   ClearPageLRU(page);
del_page_from_lru_list(page, lruvec, page_lru(page));
return true;
}
diff --git a/mm/swap.c b/mm/swap.c
index c67699de4869..b24d5f69b93a 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -83,10 +83,9 @@ static void __page_cache_release(struct page *page)
struct lruvec *lruvec;
unsigned long flags;
 
+   __ClearPageLRU(page);
spin_lock_irqsave(&pgdat->lru_lock, flags);
lruvec = mem_cgroup_page_lruvec(page, pgdat);
-   VM_BUG_ON_PAGE(!PageLRU(page), page);
-   __ClearPageLRU(page);
del_page_from_lru_list(page, lruvec, page_off_lru(page));
spin_unlock_irqrestore(&pgdat->lru_lock, flags);
}
@@ -878,9 +877,8 @@ void release_pages(struct page **pages, int nr)
spin_lock_irqsave(&locked_pgdat->lru_lock,
flags);
}
 
-   lruvec = mem_cgroup_page_lruvec(page, locked_pgdat);
-   VM_BUG_ON_PAGE(!PageLRU(page), page);
__ClearPageLRU(page);
+   lruvec = mem_cgroup_page_lruvec(page, locked_pgdat);
del_page_from_lru_list(page, lruvec, 
page_off_lru(page));
}
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c1c4259b4de5..18986fefd49b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1548,16 +1548,16 @@ int __isolate_lru_page(struct page *page, 
isolate_mode_t mode)
 {
int ret = -EINVAL;
 
-   /* Only take pages on the LRU. */
-   if (!PageLRU(page))
-   return ret;
-
/* Compaction should not handle unevictable pages but CMA can do so */
if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE))
return ret;
 
ret = -EBUSY;
 
+   /* Only take pages on the LRU. */
+   if (!PageLRU(page))
+   return ret;
+
/*
 * To minimise LRU disruption, the caller can indicate that it only
 * wants to isolate pages it will be able to operate on without
@@ -1671,8 +1671,6 @@ static unsigned long isolate_lru_pages(unsigned long 
nr_to_scan,
page = lru_to_page(src);
prefetchw_prev_lru_page(page, src, flags);
 
-   VM_BUG_ON_PAGE(!PageLRU(page), page);
-
nr_pages = compound_nr(page);
total_scan += nr_pages;
 
@@ -1769,21 +1767,19 @@ int isolate_lru_page(struct page *page)
VM_BUG_ON_PAGE(!page_count(page), page);
WARN_RATELIMIT(PageTail(page), "trying to isolate tail 

[PATCH v14 01/20] mm/vmscan: remove unnecessary lruvec adding

2020-07-02 Thread Alex Shi
We don't have to add a freeable page to an lru and then remove it
again. This change saves a couple of actions and makes the move
clearer.

The SetPageLRU needs to be kept here for list integrity.
Otherwise:
 #0 move_pages_to_lru               #1 release_pages
                                    if (put_page_testzero())
 if !put_page_testzero
                                      !PageLRU //skip lru_lock
   list_add(&page->lru,)
                                      list_add(&page->lru,) //corrupt

[a...@linux-foundation.org: coding style fixes]
Signed-off-by: Alex Shi 
Cc: Andrew Morton 
Cc: Johannes Weiner 
Cc: Tejun Heo 
Cc: Matthew Wilcox 
Cc: Hugh Dickins 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/vmscan.c | 37 -
 1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 749d239c62b2..ddb29d813d77 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1856,26 +1856,29 @@ static unsigned noinline_for_stack 
move_pages_to_lru(struct lruvec *lruvec,
while (!list_empty(list)) {
page = lru_to_page(list);
VM_BUG_ON_PAGE(PageLRU(page), page);
+   list_del(&page->lru);
if (unlikely(!page_evictable(page))) {
-   list_del(&page->lru);
spin_unlock_irq(&pgdat->lru_lock);
putback_lru_page(page);
spin_lock_irq(&pgdat->lru_lock);
continue;
}
-   lruvec = mem_cgroup_page_lruvec(page, pgdat);
 
+   /*
+    * The SetPageLRU needs to be kept here for list integrity.
+    * Otherwise:
+    *   #0 move_pages_to_lru           #1 release_pages
+    *                                  if (put_page_testzero())
+    *   if !put_page_testzero
+    *                                    !PageLRU //skip lru_lock
+    *     list_add(&page->lru,)
+    *                                    list_add(&page->lru,) //corrupt
+    */
SetPageLRU(page);
-   lru = page_lru(page);
 
-   nr_pages = hpage_nr_pages(page);
-   update_lru_size(lruvec, lru, page_zonenum(page), nr_pages);
-   list_move(&page->lru, &lruvec->lists[lru]);
-
-   if (put_page_testzero(page)) {
+   if (unlikely(put_page_testzero(page))) {
__ClearPageLRU(page);
__ClearPageActive(page);
-   del_page_from_lru_list(page, lruvec, lru);
 
if (unlikely(PageCompound(page))) {
spin_unlock_irq(&pgdat->lru_lock);
@@ -1883,11 +1886,19 @@ static unsigned noinline_for_stack 
move_pages_to_lru(struct lruvec *lruvec,
spin_lock_irq(&pgdat->lru_lock);
} else
list_add(&page->lru, &pages_to_free);
-   } else {
-   nr_moved += nr_pages;
-   if (PageActive(page))
-   workingset_age_nonresident(lruvec, nr_pages);
+
+   continue;
}
+
+   lruvec = mem_cgroup_page_lruvec(page, pgdat);
+   lru = page_lru(page);
+   nr_pages = hpage_nr_pages(page);
+
+   update_lru_size(lruvec, lru, page_zonenum(page), nr_pages);
+   list_add(&page->lru, &lruvec->lists[lru]);
+   nr_moved += nr_pages;
+   if (PageActive(page))
+   workingset_age_nonresident(lruvec, nr_pages);
}
 
/*
-- 
1.8.3.1



[PATCH v14 05/20] mm/thp: move lru_add_page_tail func to huge_memory.c

2020-07-02 Thread Alex Shi
The function is only used in huge_memory.c; defining it in another
file under a CONFIG_TRANSPARENT_HUGEPAGE guard just looks weird.

Let's move it into the THP code, and make it static as Hugh Dickins
suggested.

Signed-off-by: Alex Shi 
Cc: Andrew Morton 
Cc: Johannes Weiner 
Cc: Matthew Wilcox 
Cc: Hugh Dickins 
Cc: linux-kernel@vger.kernel.org
Cc: linux...@kvack.org
---
 include/linux/swap.h |  2 --
 mm/huge_memory.c | 30 ++
 mm/swap.c| 33 -
 3 files changed, 30 insertions(+), 35 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 5b3216ba39a9..2c29399b29a0 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -339,8 +339,6 @@ extern void lru_note_cost(struct lruvec *lruvec, bool file,
  unsigned int nr_pages);
 extern void lru_note_cost_page(struct page *);
 extern void lru_cache_add(struct page *);
-extern void lru_add_page_tail(struct page *page, struct page *page_tail,
-struct lruvec *lruvec, struct list_head *head);
 extern void activate_page(struct page *);
 extern void mark_page_accessed(struct page *);
 extern void lru_add_drain(void);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 78c84bee7e29..9e050b13f597 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2340,6 +2340,36 @@ static void remap_page(struct page *page)
}
 }
 
+static void lru_add_page_tail(struct page *page, struct page *page_tail,
+   struct lruvec *lruvec, struct list_head *list)
+{
+   VM_BUG_ON_PAGE(!PageHead(page), page);
+   VM_BUG_ON_PAGE(PageCompound(page_tail), page);
+   VM_BUG_ON_PAGE(PageLRU(page_tail), page);
+   lockdep_assert_held(&lruvec_pgdat(lruvec)->lru_lock);
+
+   if (!list)
+   SetPageLRU(page_tail);
+
+   if (likely(PageLRU(page)))
+   list_add_tail(&page_tail->lru, &page->lru);
+   else if (list) {
+   /* page reclaim is reclaiming a huge page */
+   get_page(page_tail);
+   list_add_tail(&page_tail->lru, list);
+   } else {
+   /*
+* Head page has not yet been counted, as an hpage,
+* so we must account for each subpage individually.
+*
+* Put page_tail on the list at the correct position
+* so they all end up in order.
+*/
+   add_page_to_lru_list_tail(page_tail, lruvec,
+ page_lru(page_tail));
+   }
+}
+
 static void __split_huge_page_tail(struct page *head, int tail,
struct lruvec *lruvec, struct list_head *list)
 {
diff --git a/mm/swap.c b/mm/swap.c
index a82efc33411f..7701d855873d 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -933,39 +933,6 @@ void __pagevec_release(struct pagevec *pvec)
 }
 EXPORT_SYMBOL(__pagevec_release);
 
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-/* used by __split_huge_page_refcount() */
-void lru_add_page_tail(struct page *page, struct page *page_tail,
-  struct lruvec *lruvec, struct list_head *list)
-{
-   VM_BUG_ON_PAGE(!PageHead(page), page);
-   VM_BUG_ON_PAGE(PageCompound(page_tail), page);
-   VM_BUG_ON_PAGE(PageLRU(page_tail), page);
-   lockdep_assert_held(&lruvec_pgdat(lruvec)->lru_lock);
-
-   if (!list)
-   SetPageLRU(page_tail);
-
-   if (likely(PageLRU(page)))
-   list_add_tail(&page_tail->lru, &page->lru);
-   else if (list) {
-   /* page reclaim is reclaiming a huge page */
-   get_page(page_tail);
-   list_add_tail(&page_tail->lru, list);
-   } else {
-   /*
-* Head page has not yet been counted, as an hpage,
-* so we must account for each subpage individually.
-*
-* Put page_tail on the list at the correct position
-* so they all end up in order.
-*/
-   add_page_to_lru_list_tail(page_tail, lruvec,
- page_lru(page_tail));
-   }
-}
-#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
-
 static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
 void *arg)
 {
-- 
1.8.3.1



[PATCH v14 06/20] mm/thp: clean up lru_add_page_tail

2020-07-02 Thread Alex Shi
Since the first parameter is always the head page, it's better to
make that explicit.

Signed-off-by: Alex Shi 
Cc: Andrew Morton 
Cc: Johannes Weiner 
Cc: Matthew Wilcox 
Cc: Hugh Dickins 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/huge_memory.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9e050b13f597..b18f21da4dac 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2340,19 +2340,19 @@ static void remap_page(struct page *page)
}
 }
 
-static void lru_add_page_tail(struct page *page, struct page *page_tail,
+static void lru_add_page_tail(struct page *head, struct page *page_tail,
struct lruvec *lruvec, struct list_head *list)
 {
-   VM_BUG_ON_PAGE(!PageHead(page), page);
-   VM_BUG_ON_PAGE(PageCompound(page_tail), page);
-   VM_BUG_ON_PAGE(PageLRU(page_tail), page);
+   VM_BUG_ON_PAGE(!PageHead(head), head);
+   VM_BUG_ON_PAGE(PageCompound(page_tail), head);
+   VM_BUG_ON_PAGE(PageLRU(page_tail), head);
lockdep_assert_held(&lruvec_pgdat(lruvec)->lru_lock);
 
if (!list)
SetPageLRU(page_tail);
 
-   if (likely(PageLRU(page)))
-   list_add_tail(&page_tail->lru, &page->lru);
+   if (likely(PageLRU(head)))
+   list_add_tail(&page_tail->lru, &head->lru);
else if (list) {
/* page reclaim is reclaiming a huge page */
get_page(page_tail);
-- 
1.8.3.1



[PATCH v14 03/20] mm/compaction: correct the comments of compact_defer_shift

2020-07-02 Thread Alex Shi
There is no compact_defer_limit; it is compact_defer_shift that is
actually in use. Correct the comment, and add an explanation of
compact_order_failed.

Signed-off-by: Alex Shi 
Cc: Andrew Morton 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 include/linux/mmzone.h | 1 +
 mm/compaction.c| 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index f6f884970511..14c668b7e793 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -512,6 +512,7 @@ struct zone {
 * On compaction failure, 1<

[PATCH v14 07/20] mm/thp: narrow lru locking

2020-07-02 Thread Alex Shi
lru_lock and the page cache xa_lock have no reason to be held
together in the current sequence; taking them together isn't
necessary. Let's narrow the lru locking, but leave the
local_irq_disable() to block interrupt re-entry and keep the
statistics updates stable.

Hugh Dickins' point: split_huge_page_to_list() was already silly to be
using the _irqsave variant: it has just been taking sleeping locks, so
would already be broken if entered with interrupts enabled.
So we can save passing the flags argument down to __split_huge_page().
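
The resulting nesting in __split_huge_page() then looks roughly like
this (a sketch, file-backed case shown; the anon case skips the
xa_lock):

	local_irq_disable();		/* blocks re-entry, freezes statistics */
	xa_lock(&mapping->i_pages);	/* page cache, file-backed only */
	spin_lock(&pgdat->lru_lock);	/* only around the actual list surgery */
	/* ... split tails and put them on the lru ... */
	spin_unlock(&pgdat->lru_lock);
	/* ... fix up the page cache, xa_unlock ... */
	local_irq_enable();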

Signed-off-by: Alex Shi 
Signed-off-by: Wei Yang 
Cc: Hugh Dickins 
Cc: Kirill A. Shutemov 
Cc: Andrea Arcangeli 
Cc: Johannes Weiner 
Cc: Matthew Wilcox 
Cc: Andrew Morton 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/huge_memory.c | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b18f21da4dac..607869330329 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2433,7 +2433,7 @@ static void __split_huge_page_tail(struct page *head, int 
tail,
 }
 
 static void __split_huge_page(struct page *page, struct list_head *list,
-   pgoff_t end, unsigned long flags)
+ pgoff_t end)
 {
struct page *head = compound_head(page);
pg_data_t *pgdat = page_pgdat(head);
@@ -2442,8 +2442,6 @@ static void __split_huge_page(struct page *page, struct 
list_head *list,
unsigned long offset = 0;
int i;
 
-   lruvec = mem_cgroup_page_lruvec(head, pgdat);
-
/* complete memcg works before add pages to LRU */
mem_cgroup_split_huge_fixup(head);
 
@@ -2455,6 +2453,11 @@ static void __split_huge_page(struct page *page, struct 
list_head *list,
xa_lock(&swap_cache->i_pages);
}
 
+   /* lock lru list/PageCompound, ref frozen by page_ref_freeze */
+   spin_lock(&pgdat->lru_lock);
+
+   lruvec = mem_cgroup_page_lruvec(head, pgdat);
+
for (i = HPAGE_PMD_NR - 1; i >= 1; i--) {
__split_huge_page_tail(head, i, lruvec, list);
/* Some pages can be beyond i_size: drop them from page cache */
@@ -2474,6 +2477,8 @@ static void __split_huge_page(struct page *page, struct 
list_head *list,
}
 
ClearPageCompound(head);
+   spin_unlock(&pgdat->lru_lock);
+   /* Caller disabled irqs, so they are still disabled here */
 
split_page_owner(head, HPAGE_PMD_ORDER);
 
@@ -2491,8 +2496,7 @@ static void __split_huge_page(struct page *page, struct 
list_head *list,
page_ref_add(head, 2);
xa_unlock(&head->mapping->i_pages);
}
-
-   spin_unlock_irqrestore(&pgdat->lru_lock, flags);
+   local_irq_enable();
 
remap_page(head);
 
@@ -2631,12 +2635,10 @@ bool can_split_huge_page(struct page *page, int 
*pextra_pins)
 int split_huge_page_to_list(struct page *page, struct list_head *list)
 {
struct page *head = compound_head(page);
-   struct pglist_data *pgdata = NODE_DATA(page_to_nid(head));
struct deferred_split *ds_queue = get_deferred_split_queue(head);
struct anon_vma *anon_vma = NULL;
struct address_space *mapping = NULL;
int count, mapcount, extra_pins, ret;
-   unsigned long flags;
pgoff_t end;
 
VM_BUG_ON_PAGE(is_huge_zero_page(head), head);
@@ -2697,9 +2699,7 @@ int split_huge_page_to_list(struct page *page, struct 
list_head *list)
unmap_page(head);
VM_BUG_ON_PAGE(compound_mapcount(head), head);
 
-   /* prevent PageLRU to go away from under us, and freeze lru stats */
-   spin_lock_irqsave(&pgdata->lru_lock, flags);
-
+   local_irq_disable();
if (mapping) {
XA_STATE(xas, &mapping->i_pages, page_index(head));
 
@@ -2729,7 +2729,7 @@ int split_huge_page_to_list(struct page *page, struct 
list_head *list)
__dec_node_page_state(head, NR_FILE_THPS);
}
 
-   __split_huge_page(page, list, end, flags);
+   __split_huge_page(page, list, end);
if (PageSwapCache(head)) {
swp_entry_t entry = { .val = page_private(head) };
 
@@ -2748,7 +2748,7 @@ int split_huge_page_to_list(struct page *page, struct 
list_head *list)
spin_unlock(&ds_queue->split_queue_lock);
 fail:  if (mapping)
xa_unlock(&mapping->i_pages);
-   spin_unlock_irqrestore(&pgdata->lru_lock, flags);
+   local_irq_enable();
remap_page(head);
ret = -EBUSY;
}
-- 
1.8.3.1



Re: [PATCH v2 2/2] block: add max_active_zones to blk-sysfs

2020-07-02 Thread Damien Le Moal
On 2020/07/03 3:20, Niklas Cassel wrote:
> Add a new max_active_zones definition in the sysfs documentation.
> This definition will be common for all devices utilizing the zoned block
> device support in the kernel.
> 
> Export max_active_zones according to this new definition for NVMe Zoned
> Namespace devices, ZAC ATA devices (which are treated as SCSI devices by
> the kernel), and ZBC SCSI devices.
> 
> Add the new max_active_zones member to struct request_queue, rather
> than as a queue limit, since this property cannot be split across stacking
> drivers.
> 
> For SCSI devices, even though max active zones is not part of the ZBC/ZAC
> spec, export max_active_zones as 0, signifying "no limit".
> 
> Signed-off-by: Niklas Cassel 
> Reviewed-by: Javier González 
> ---
>  Documentation/block/queue-sysfs.rst |  7 +++
>  block/blk-sysfs.c   | 14 +-
>  drivers/nvme/host/zns.c |  1 +
>  drivers/scsi/sd_zbc.c   |  1 +
>  include/linux/blkdev.h  | 16 
>  5 files changed, 38 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/block/queue-sysfs.rst 
> b/Documentation/block/queue-sysfs.rst
> index f01cf8530ae4..f261a5c84170 100644
> --- a/Documentation/block/queue-sysfs.rst
> +++ b/Documentation/block/queue-sysfs.rst
> @@ -117,6 +117,13 @@ Maximum number of elements in a DMA scatter/gather list 
> with integrity
>  data that will be submitted by the block layer core to the associated
>  block driver.
>  
> +max_active_zones (RO)
> +-
> +For zoned block devices (zoned attribute indicating "host-managed" or
> +"host-aware"), the sum of zones belonging to any of the zone states:
> +EXPLICIT OPEN, IMPLICIT OPEN or CLOSED, is limited by this value.
> +If this value is 0, there is no limit.
> +
>  max_open_zones (RO)
>  ---
>  For zoned block devices (zoned attribute indicating "host-managed" or
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index fa42961e9678..624bb4d85fc7 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -310,6 +310,11 @@ static ssize_t queue_max_open_zones_show(struct 
> request_queue *q, char *page)
>   return queue_var_show(queue_max_open_zones(q), page);
>  }
>  
> +static ssize_t queue_max_active_zones_show(struct request_queue *q, char 
> *page)
> +{
> + return queue_var_show(queue_max_active_zones(q), page);
> +}
> +
>  static ssize_t queue_nomerges_show(struct request_queue *q, char *page)
>  {
>   return queue_var_show((blk_queue_nomerges(q) << 1) |
> @@ -677,6 +682,11 @@ static struct queue_sysfs_entry 
> queue_max_open_zones_entry = {
>   .show = queue_max_open_zones_show,
>  };
>  
> +static struct queue_sysfs_entry queue_max_active_zones_entry = {
> + .attr = {.name = "max_active_zones", .mode = 0444 },
> + .show = queue_max_active_zones_show,
> +};
> +
>  static struct queue_sysfs_entry queue_nomerges_entry = {
>   .attr = {.name = "nomerges", .mode = 0644 },
>   .show = queue_nomerges_show,
> @@ -776,6 +786,7 @@ static struct attribute *queue_attrs[] = {
>   &queue_zoned_entry.attr,
>   &queue_nr_zones_entry.attr,
>   &queue_max_open_zones_entry.attr,
> + &queue_max_active_zones_entry.attr,
>   &queue_nomerges_entry.attr,
>   &queue_rq_affinity_entry.attr,
>   &queue_iostats_entry.attr,
> @@ -803,7 +814,8 @@ static umode_t queue_attr_visible(struct kobject *kobj, 
> struct attribute *attr,
>   (!q->mq_ops || !q->mq_ops->timeout))
>   return 0;
>  
> - if (attr == &queue_max_open_zones_entry.attr &&
> + if ((attr == &queue_max_open_zones_entry.attr ||
> +  attr == &queue_max_active_zones_entry.attr) &&
>   !blk_queue_is_zoned(q))
>   return 0;
>  
> diff --git a/drivers/nvme/host/zns.c b/drivers/nvme/host/zns.c
> index 3d80b9cf6bfc..57cfd78731fb 100644
> --- a/drivers/nvme/host/zns.c
> +++ b/drivers/nvme/host/zns.c
> @@ -97,6 +97,7 @@ int nvme_update_zone_info(struct gendisk *disk, struct 
> nvme_ns *ns,
>   q->limits.zoned = BLK_ZONED_HM;
>   blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q);
>   blk_queue_max_open_zones(q, le32_to_cpu(id->mor) + 1);
> + blk_queue_max_active_zones(q, le32_to_cpu(id->mar) + 1);
>  free_data:
>   kfree(id);
>   return status;
> diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
> index aa3564139b40..d8b2c49d645b 100644
> --- a/drivers/scsi/sd_zbc.c
> +++ b/drivers/scsi/sd_zbc.c
> @@ -721,6 +721,7 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned 
> char *buf)
>   blk_queue_max_open_zones(q, 0);
>   else
>   blk_queue_max_open_zones(q, sdkp->zones_max_open);
> + blk_queue_max_active_zones(q, 0);
>   nr_zones = round_up(sdkp->capacity, zone_blocks) >> ilog2(zone_blocks);
>  
>   /* READ16/WRITE16 is mandatory for ZBC disks */
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index fe168abcfdda..bb9e6eb6a7e6 100644
> --- a/include/linux/blkdev.h
> +++ 

Re: [PATCH v2 1/2] block: add max_open_zones to blk-sysfs

2020-07-02 Thread Damien Le Moal
On 2020/07/03 3:20, Niklas Cassel wrote:
> Add a new max_open_zones definition in the sysfs documentation.
> This definition will be common for all devices utilizing the zoned block
> device support in the kernel.
> 
> Export max open zones according to this new definition for NVMe Zoned
> Namespace devices, ZAC ATA devices (which are treated as SCSI devices by
> the kernel), and ZBC SCSI devices.
> 
> Add the new max_open_zones member to struct request_queue, rather
> than as a queue limit, since this property cannot be split across stacking
> drivers.
> 
> Signed-off-by: Niklas Cassel 
> Reviewed-by: Javier González 
> ---
>  Documentation/block/queue-sysfs.rst |  7 +++
>  block/blk-sysfs.c   | 15 +++
>  drivers/nvme/host/zns.c |  1 +
>  drivers/scsi/sd_zbc.c   |  4 
>  include/linux/blkdev.h  | 16 
>  5 files changed, 43 insertions(+)
> 
> diff --git a/Documentation/block/queue-sysfs.rst 
> b/Documentation/block/queue-sysfs.rst
> index 6a8513af9201..f01cf8530ae4 100644
> --- a/Documentation/block/queue-sysfs.rst
> +++ b/Documentation/block/queue-sysfs.rst
> @@ -117,6 +117,13 @@ Maximum number of elements in a DMA scatter/gather list 
> with integrity
>  data that will be submitted by the block layer core to the associated
>  block driver.
>  
> +max_open_zones (RO)
> +---
> +For zoned block devices (zoned attribute indicating "host-managed" or
> +"host-aware"), the sum of zones belonging to any of the zone states:
> +EXPLICIT OPEN or IMPLICIT OPEN, is limited by this value.
> +If this value is 0, there is no limit.
> +
>  max_sectors_kb (RW)
>  ---
>  This is the maximum number of kilobytes that the block layer will allow
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 02643e149d5e..fa42961e9678 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -305,6 +305,11 @@ static ssize_t queue_nr_zones_show(struct request_queue 
> *q, char *page)
>   return queue_var_show(blk_queue_nr_zones(q), page);
>  }
>  
> +static ssize_t queue_max_open_zones_show(struct request_queue *q, char *page)
> +{
> + return queue_var_show(queue_max_open_zones(q), page);
> +}
> +
>  static ssize_t queue_nomerges_show(struct request_queue *q, char *page)
>  {
>   return queue_var_show((blk_queue_nomerges(q) << 1) |
> @@ -667,6 +672,11 @@ static struct queue_sysfs_entry queue_nr_zones_entry = {
>   .show = queue_nr_zones_show,
>  };
>  
> +static struct queue_sysfs_entry queue_max_open_zones_entry = {
> + .attr = {.name = "max_open_zones", .mode = 0444 },
> + .show = queue_max_open_zones_show,
> +};
> +
>  static struct queue_sysfs_entry queue_nomerges_entry = {
>   .attr = {.name = "nomerges", .mode = 0644 },
>   .show = queue_nomerges_show,
> @@ -765,6 +775,7 @@ static struct attribute *queue_attrs[] = {
>   &queue_nonrot_entry.attr,
>   &queue_zoned_entry.attr,
>   &queue_nr_zones_entry.attr,
> + &queue_max_open_zones_entry.attr,
>   _nomerges_entry.attr,
>   _rq_affinity_entry.attr,
>   _iostats_entry.attr,
> @@ -792,6 +803,10 @@ static umode_t queue_attr_visible(struct kobject *kobj, 
> struct attribute *attr,
>   (!q->mq_ops || !q->mq_ops->timeout))
>   return 0;
>  
> + if (attr == &queue_max_open_zones_entry.attr &&
> + !blk_queue_is_zoned(q))
> + return 0;
> +
>   return attr->mode;
>  }
>  
> diff --git a/drivers/nvme/host/zns.c b/drivers/nvme/host/zns.c
> index 04e5b991c00c..3d80b9cf6bfc 100644
> --- a/drivers/nvme/host/zns.c
> +++ b/drivers/nvme/host/zns.c
> @@ -96,6 +96,7 @@ int nvme_update_zone_info(struct gendisk *disk, struct 
> nvme_ns *ns,
>  
>   q->limits.zoned = BLK_ZONED_HM;
>   blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q);
> + blk_queue_max_open_zones(q, le32_to_cpu(id->mor) + 1);
>  free_data:
>   kfree(id);
>   return status;
> diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
> index 183a20720da9..aa3564139b40 100644
> --- a/drivers/scsi/sd_zbc.c
> +++ b/drivers/scsi/sd_zbc.c
> @@ -717,6 +717,10 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned 
> char *buf)
>   /* The drive satisfies the kernel restrictions: set it up */
>   blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q);
>   blk_queue_required_elevator_features(q, ELEVATOR_F_ZBD_SEQ_WRITE);
> + if (sdkp->zones_max_open == U32_MAX)
> + blk_queue_max_open_zones(q, 0);
> + else
> + blk_queue_max_open_zones(q, sdkp->zones_max_open);
>   nr_zones = round_up(sdkp->capacity, zone_blocks) >> ilog2(zone_blocks);
>  
>   /* READ16/WRITE16 is mandatory for ZBC disks */
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 8fd900998b4e..fe168abcfdda 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -520,6 +520,7 @@ struct request_queue {
>   unsigned intnr_zones;
>   unsigned long  

[PATCH net-next] net: bcmgenet: Allow changing carrier from user-space

2020-07-02 Thread Florian Fainelli
The GENET driver interfaces with the internal MoCA interface as well
as external MoCA chips like the BCM6802/6803 through a fixed-link
interface. It is desirable for the mocad user-space daemon to be able
to control the carrier state based upon out-of-band messages that it
receives from the MoCA chip.
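
For context, user space reaches the new callback through the
IFLA_CARRIER netlink attribute (e.g. "ip link set dev ethX carrier
off"); the core dispatch is roughly the following (a sketch of
dev_change_carrier() from memory, not part of this patch):

	int dev_change_carrier(struct net_device *dev, bool new_carrier)
	{
		const struct net_device_ops *ops = dev->netdev_ops;

		if (!ops->ndo_change_carrier)
			return -EOPNOTSUPP;
		if (!netif_device_present(dev))
			return -ENODEV;
		return ops->ndo_change_carrier(dev, new_carrier);
	}

The driver-side checks below then restrict the operation to the
fixed-link MoCA case, so ordinary PHY-managed links keep returning
-EOPNOTSUPP.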

Signed-off-by: Florian Fainelli 
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c 
b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index af924a8b885f..ee84a26bd8f3 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -3647,6 +3647,22 @@ static struct net_device_stats 
*bcmgenet_get_stats(struct net_device *dev)
return &dev->stats;
 }
 
+static int bcmgenet_change_carrier(struct net_device *dev, bool new_carrier)
+{
+   struct bcmgenet_priv *priv = netdev_priv(dev);
+
+   if (!dev->phydev || !phy_is_pseudo_fixed_link(dev->phydev) ||
+   priv->phy_interface != PHY_INTERFACE_MODE_MOCA)
+   return -EOPNOTSUPP;
+
+   if (new_carrier)
+   netif_carrier_on(dev);
+   else
+   netif_carrier_off(dev);
+
+   return 0;
+}
+
 static const struct net_device_ops bcmgenet_netdev_ops = {
.ndo_open   = bcmgenet_open,
.ndo_stop   = bcmgenet_close,
@@ -3660,6 +3676,7 @@ static const struct net_device_ops bcmgenet_netdev_ops = {
.ndo_poll_controller= bcmgenet_poll_controller,
 #endif
.ndo_get_stats  = bcmgenet_get_stats,
+   .ndo_change_carrier = bcmgenet_change_carrier,
 };
 
 /* Array of GENET hardware parameters/characteristics */
-- 
2.17.1



[tip:WIP.core/headers] BUILD SUCCESS 1167da790ee80bbd845f50a5a32a5b0c47491b35

2020-07-02 Thread kernel test robot
  allyesconfig
alphaalldefconfig
arc haps_hs_defconfig
arm   tegra_defconfig
sh  defconfig
m68k  sun3x_defconfig
mips  mips_paravirt_defconfig
sh sh7710voipgw_defconfig
c6x defconfig
i386  allnoconfig
i386defconfig
i386  debian-10.3
i386 allyesconfig
ia64 allmodconfig
ia64defconfig
ia64  allnoconfig
ia64 allyesconfig
m68k allmodconfig
m68k  allnoconfig
m68k   sun3_defconfig
m68k allyesconfig
nios2   defconfig
nios2allyesconfig
openriscdefconfig
c6x  allyesconfig
c6x   allnoconfig
openrisc allyesconfig
nds32   defconfig
csky allyesconfig
cskydefconfig
alpha   defconfig
alphaallyesconfig
xtensa   allyesconfig
h8300allyesconfig
h8300allmodconfig
xtensa  defconfig
arc defconfig
arc  allyesconfig
sh   allmodconfig
shallnoconfig
microblazeallnoconfig
mips allyesconfig
mips allmodconfig
pariscallnoconfig
parisc  defconfig
parisc   allyesconfig
parisc   allmodconfig
powerpc defconfig
powerpc  rhel-kconfig
powerpc  allmodconfig
powerpc   allnoconfig
i386 randconfig-a006-20200629
i386 randconfig-a002-20200629
i386 randconfig-a003-20200629
i386 randconfig-a001-20200629
i386 randconfig-a005-20200629
i386 randconfig-a004-20200629
i386 randconfig-a001-20200630
i386 randconfig-a003-20200630
i386 randconfig-a002-20200630
i386 randconfig-a004-20200630
i386 randconfig-a005-20200630
i386 randconfig-a006-20200630
i386 randconfig-a002-20200701
i386 randconfig-a001-20200701
i386 randconfig-a006-20200701
i386 randconfig-a005-20200701
i386 randconfig-a004-20200701
i386 randconfig-a003-20200701
x86_64   randconfig-a011-20200629
x86_64   randconfig-a012-20200629
x86_64   randconfig-a013-20200629
x86_64   randconfig-a014-20200629
x86_64   randconfig-a015-20200629
x86_64   randconfig-a016-20200629
x86_64   randconfig-a011-20200630
x86_64   randconfig-a014-20200630
x86_64   randconfig-a013-20200630
x86_64   randconfig-a015-20200630
x86_64   randconfig-a016-20200630
x86_64   randconfig-a012-20200630
x86_64   randconfig-a012-20200701
x86_64   randconfig-a016-20200701
x86_64   randconfig-a014-20200701
x86_64   randconfig-a011-20200701
x86_64   randconfig-a015-20200701
x86_64   randconfig-a013-20200701
i386 randconfig-a013-20200629
i386 randconfig-a016-20200629
i386 randconfig-a014-20200629
i386 randconfig-a012-20200629
i386 randconfig-a015-20200629
i386 randconfig-a011-20200629
i386 randconfig-a011-20200630
i386 randconfig-a016-20200630
i386 randconfig-a015-20200630
i386 randconfig-a014-20200630
i386 randconfig-a013-20200630
i386 randconfig-a012-20200630
i386 randconfig-a011-20200701
i386 randconfig-a015-20200701
i386 randconfig-a014-20200701
i386 randconfig-a016-20200701
i386 randconfig-a012-20200701
i386 randconfig-a013-20200701
i386 randconfig-a011-20200702
i386 randconfig-a014-20200702
i386 randconfig-a015-20200702
i386 randconfig-a016-20200702
i386

Re: [PATCH 1/2] block: add max_open_zones to blk-sysfs

2020-07-02 Thread Damien Le Moal
On 2020/07/02 21:37, Niklas Cassel wrote:
> On Tue, Jun 30, 2020 at 01:49:41AM +, Damien Le Moal wrote:
>> On 2020/06/16 19:28, Niklas Cassel wrote:
>>> diff --git a/drivers/nvme/host/zns.c b/drivers/nvme/host/zns.c
>>> index c08f6281b614..af156529f3b6 100644
>>> --- a/drivers/nvme/host/zns.c
>>> +++ b/drivers/nvme/host/zns.c
>>> @@ -82,6 +82,7 @@ int nvme_update_zone_info(struct gendisk *disk, struct 
>>> nvme_ns *ns,
>>>  
>>> q->limits.zoned = BLK_ZONED_HM;
>>> blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q);
>>> +   blk_queue_max_open_zones(q, le32_to_cpu(id->mor) + 1);
>>>  free_data:
>>> kfree(id);
>>> return status;
>>> diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
>>> index 183a20720da9..aa3564139b40 100644
>>> --- a/drivers/scsi/sd_zbc.c
>>> +++ b/drivers/scsi/sd_zbc.c
>>> @@ -717,6 +717,10 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned 
>>> char *buf)
>>> /* The drive satisfies the kernel restrictions: set it up */
>>> blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q);
>>> blk_queue_required_elevator_features(q, ELEVATOR_F_ZBD_SEQ_WRITE);
>>> +   if (sdkp->zones_max_open == U32_MAX)
>>> +   blk_queue_max_open_zones(q, 0);
>>> +   else
>>> +   blk_queue_max_open_zones(q, sdkp->zones_max_open);
>>
>> This is correct only for host-managed drives. Host-aware models define the
>> "OPTIMAL NUMBER OF OPEN SEQUENTIAL WRITE PREFERRED ZONES" instead of a 
>> maximum
>> number of open sequential write required zones.
>>
>> Since the standard does not actually explicitly define what the value of the
>> maximum number of open sequential write required zones should be for a
>> host-aware drive, I would suggest to always have the max_open_zones value 
>> set to
>> 0 for host-aware disks.
> 
> Isn't this already the case?
> 
> At least according to the comments:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/scsi/sd_zbc.c?h=v5.8-rc3#n555
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/scsi/sd_zbc.c?h=v5.8-rc3#n561
> 
> We seem to set
> 
> sdkp->zones_max_open = 0;
> 
> for host-aware, and
> 
> sdkp->zones_max_open = get_unaligned_be32(&buf[16]);
> 
> for host-managed.
> 
> So the blk_queue_max_open_zones(q, sdkp->zones_max_open) call in
> sd_zbc_read_zones() should already export this new sysfs property
> as 0 for host-aware disks.

Oh, yes ! You are absolutely right. Forgot about that code :)
Please disregard this comment.

> 
> 
> Kind regards,
> Niklas
> 
>>
>>> nr_zones = round_up(sdkp->capacity, zone_blocks) >> ilog2(zone_blocks);
>>>  
>>> /* READ16/WRITE16 is mandatory for ZBC disks */
>>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>>> index 8fd900998b4e..2f332f00501d 100644
>>> --- a/include/linux/blkdev.h
>>> +++ b/include/linux/blkdev.h
>>> @@ -520,6 +520,7 @@ struct request_queue {
>>> unsigned intnr_zones;
>>> unsigned long   *conv_zones_bitmap;
>>> unsigned long   *seq_zones_wlock;
>>> +   unsigned intmax_open_zones;
>>>  #endif /* CONFIG_BLK_DEV_ZONED */
>>>  
>>> /*
>>> @@ -729,6 +730,17 @@ static inline bool blk_queue_zone_is_seq(struct 
>>> request_queue *q,
>>> return true;
>>> return !test_bit(blk_queue_zone_no(q, sector), q->conv_zones_bitmap);
>>>  }
>>> +
>>> +static inline void blk_queue_max_open_zones(struct request_queue *q,
>>> +   unsigned int max_open_zones)
>>> +{
>>> +   q->max_open_zones = max_open_zones;
>>> +}
>>> +
>>> +static inline unsigned int queue_max_open_zones(const struct request_queue 
>>> *q)
>>> +{
>>> +   return q->max_open_zones;
>>> +}
>>>  #else /* CONFIG_BLK_DEV_ZONED */
>>>  static inline unsigned int blk_queue_nr_zones(struct request_queue *q)
>>>  {
>>> @@ -744,6 +756,14 @@ static inline unsigned int blk_queue_zone_no(struct 
>>> request_queue *q,
>>>  {
>>> return 0;
>>>  }
>>> +static inline void blk_queue_max_open_zones(struct request_queue *q,
>>> +   unsigned int max_open_zones)
>>> +{
>>> +}
>>
>> Why is this one necessary ? For the !CONFIG_BLK_DEV_ZONED case, no driver 
>> should
>> ever call this function.
> 
> Will remove in v2.
> 


-- 
Damien Le Moal
Western Digital Research


[PATCH -next] locktorture: make function torture_percpu_rwsem_init() static

2020-07-02 Thread Wei Yongjun
The sparse tool complains as follows:

kernel/locking/locktorture.c:569:6: warning:
 symbol 'torture_percpu_rwsem_init' was not declared. Should it be static?

And this function is not used outside of locktorture.c,
so this commit marks it static.

Signed-off-by: Wei Yongjun 
---
 kernel/locking/locktorture.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
index 9cfa5e89cff7..62d215b2e39f 100644
--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -566,7 +566,7 @@ static struct lock_torture_ops rwsem_lock_ops = {
 #include 
 static struct percpu_rw_semaphore pcpu_rwsem;
 
-void torture_percpu_rwsem_init(void)
+static void torture_percpu_rwsem_init(void)
 {
BUG_ON(percpu_init_rwsem(&pcpu_rwsem));
 }



Re: [PATCH v10 5/5] kdump: update Documentation about crashkernel on arm64

2020-07-02 Thread Dave Young
On 07/03/20 at 12:46pm, Dave Young wrote:
> Hi,
> 
> Thanks for the update, but still some nitpicks :(
> 
> I'm sorry I did not catch them previously, but maybe it is not worth
> reposting the whole series if no other changes are needed.

Feel free to add my acks for the common kdump part:

Acked-by: Dave Young 

Thanks
Dave



Re: [PATCH v2] crypto: ccp - Update CCP driver maintainer information

2020-07-02 Thread Herbert Xu
On Fri, Jun 26, 2020 at 02:09:39PM -0500, Tom Lendacky wrote:
> From: Tom Lendacky 
> 
> Add John Allen as a new CCP driver maintainer. Additionally, break out
> the driver SEV support and create a new maintainer entry, with Brijesh
> Singh and Tom Lendacky as maintainers.
> 
> Cc: John Allen 
> Cc: Brijesh Singh 
> Signed-off-by: Tom Lendacky 
> 
> ---
> 
> Changes from v1:
> - Change the email for Brijesh. The previous one is an alias, use the
>   proper email address in case the alias is ever removed.
> ---
>  MAINTAINERS | 9 +
>  1 file changed, 9 insertions(+)

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] crypto, x86: aesni: add compatibility with IAS

2020-07-02 Thread Herbert Xu
On Mon, Jun 22, 2020 at 04:24:33PM -0700, Jian Cai wrote:
> Clang's integrated assembler complains "invalid reassignment of
> non-absolute variable 'var_ddq_add'" while assembling
> arch/x86/crypto/aes_ctrby8_avx-x86_64.S. It was because var_ddq_add was
> reassigned with non-absolute values several times, which IAS did not
> support. We can avoid the reassignment by replacing the uses of
> var_ddq_add with its definitions accordingly to have compatibility
> with IAS.
> 
> Link: https://github.com/ClangBuiltLinux/linux/issues/1008
> Reported-by: Sedat Dilek 
> Reported-by: Fangrui Song 
> Tested-by: Sedat Dilek  # build+boot Linux v5.7.5; 
> clang v11.0.0-git
> Signed-off-by: Jian Cai 
> ---
>  arch/x86/crypto/aes_ctrby8_avx-x86_64.S | 14 +++---
>  1 file changed, 3 insertions(+), 11 deletions(-)

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] crypto: sun8i-ce - Fix runtime PM imbalance in sun8i_ce_cipher_init

2020-07-02 Thread Herbert Xu
On Mon, Jun 22, 2020 at 10:40:08AM +0800, Dinghao Liu wrote:
> pm_runtime_get_sync() increments the runtime PM usage counter even
> the call returns an error code. Thus a corresponding decrement is
> needed on the error handling path to keep the counter balanced.
> 
> Fix this by adding the missed function call.
> 
> Signed-off-by: Dinghao Liu 
> ---
>  drivers/crypto/allwinner/sun8i-ce/sun8i-ce-cipher.c | 1 +
>  1 file changed, 1 insertion(+)

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH 0/3] qce crypto fixes for tcrypto failures

2020-07-02 Thread Herbert Xu
On Mon, Jun 22, 2020 at 11:45:03AM +0530, Sivaprakash Murugesan wrote:
> while running tcrypto test cases on qce crypto engine few failures are
> noticed, this is mainly because of the updates on tcrypto driver and
> not testing qce reqgularly with mainline tcrypto driver.
> 
> This series tries to address few of the errors while running tcrypto on
> qce.
> 
> Sivaprakash Murugesan (3):
>   crypto: qce: support zero length test vectors
>   crypto: qce: re-initialize context on import
>   crypto: qce: sha: Do not modify scatterlist passed along with request
> 
>  drivers/crypto/Kconfig  |  2 ++
>  drivers/crypto/qce/common.h |  2 ++
>  drivers/crypto/qce/sha.c| 36 +---
>  3 files changed, 33 insertions(+), 7 deletions(-)

All applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] crypto: ccp - Fix use of merged scatterlists

2020-07-02 Thread Herbert Xu
On Mon, Jun 22, 2020 at 03:24:02PM -0500, John Allen wrote:
> Running the crypto manager self tests with
> CONFIG_CRYPTO_MANAGER_EXTRA_TESTS may result in several types of errors
> when using the ccp-crypto driver:
> 
> alg: skcipher: cbc-des3-ccp encryption failed on test vector 0; 
> expected_error=0, actual_error=-5 ...
> 
> alg: skcipher: ctr-aes-ccp decryption overran dst buffer on test vector 0 ...
> 
> alg: ahash: sha224-ccp test failed (wrong result) on test vector ...
> 
> These errors are the result of improper processing of scatterlists mapped
> for DMA.
> 
> Given a scatterlist in which entries are merged as part of mapping the
> scatterlist for DMA, the DMA length of a merged entry will reflect the
> combined length of the entries that were merged. The subsequent
> scatterlist entry will contain DMA information for the scatterlist entry
> after the last merged entry, but the non-DMA information will be that of
> the first merged entry.
> 
> The ccp driver does not take this scatterlist merging into account. To
> address this, add a second scatterlist pointer to track the current
> position in the DMA mapped representation of the scatterlist. Both the DMA
> representation and the original representation of the scatterlist must be
> tracked as while most of the driver can use just the DMA representation,
> scatterlist_map_and_copy() must use the original representation and
> expects the scatterlist pointer to be accurate to the original
> representation.
> 
> In order to properly walk the original scatterlist, the scatterlist must
> be walked until the combined lengths of the entries seen is equal to the
> DMA length of the current entry being processed in the DMA mapped
> representation.
> 
> Fixes: 63b945091a070 ("crypto: ccp - CCP device driver and interface support")
> Signed-off-by: John Allen 
> Cc: sta...@vger.kernel.org
> ---
>  drivers/crypto/ccp/ccp-dev.h |  1 +
>  drivers/crypto/ccp/ccp-ops.c | 37 +---
>  2 files changed, 27 insertions(+), 11 deletions(-)

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v2 1/2] mm/memblock: expose only miminal interface to add/walk physmem

2020-07-02 Thread Andrew Morton
On Thu, 2 Jul 2020 09:23:10 +0200 David Hildenbrand  wrote:

> >>> ---
> >>>  arch/s390/kernel/crash_dump.c |  6 ++--
> >>>  include/linux/memblock.h  | 28 ++---
> >>>  mm/memblock.c | 57 ++-
> >>>  3 files changed, 55 insertions(+), 36 deletions(-)
> > 
> > So I guess this should go via the s390 tree, since the second patch of
> > this series can go only upstream if both this patch and a patch which
> > is currently only on our features are merged before.
> > 
> > Any objections?
> 
> @Andrew, fine with you if this goes via the s390 tree?

Sure, please go ahead.


Re: [PATCH v10 5/5] kdump: update Documentation about crashkernel on arm64

2020-07-02 Thread Dave Young
Hi,

Thanks for the update, but still some nitpicks :(

I'm sorry I did not catch them previously, but maybe it is not worth
reposting the whole series if no other changes are needed.
On 07/03/20 at 11:58am, Chen Zhou wrote:
> Now we support crashkernel=X,[low] on arm64, update the Documentation.
> We could use parameters "crashkernel=X crashkernel=Y,low" to reserve
> memory above 4G.
> 
> Signed-off-by: Chen Zhou 
> Tested-by: John Donnelly 
> Tested-by: Prabhakar Kushwaha 
> ---
>  Documentation/admin-guide/kdump/kdump.rst   | 14 --
>  Documentation/admin-guide/kernel-parameters.txt | 17 +++--
>  2 files changed, 27 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kdump/kdump.rst 
> b/Documentation/admin-guide/kdump/kdump.rst
> index 2da65fef2a1c..e80fc9e28a9a 100644
> --- a/Documentation/admin-guide/kdump/kdump.rst
> +++ b/Documentation/admin-guide/kdump/kdump.rst
> @@ -299,7 +299,15 @@ Boot into System Kernel
> "crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory
> starting at physical address 0x0100 (16MB) for the dump-capture 
> kernel.
>  
> -   On x86 and x86_64, use "crashkernel=64M@16M".
> +   On x86 use "crashkernel=64M@16M".
> +
> +   On x86_64, use "crashkernel=Y" to select a region under 4G first, and
> +   fall back to reserve region above 4G.
> +   We can also use "crashkernel=X,high" to select a region above 4G, which
> +   also tries to allocate at least 256M below 4G automatically and
> +   "crashkernel=Y,low" can be used to allocate specified size low memory.
> +   Use "crashkernel=Y@X" if we really have to reserve memory from specified

s/we/you

> +   start address X.
>  
> On ppc64, use "crashkernel=128M@32M".
>  
> @@ -316,8 +324,10 @@ Boot into System Kernel
> kernel will automatically locate the crash kernel image within the
> first 512MB of RAM if X is not given.
>  
> -   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
> +   On arm64, use "crashkernel=Y[@X]". Note that the start address of
> the kernel, X if explicitly specified, must be aligned to 2MiB (0x20).
> +   If crashkernel=Z,low is specified simultaneously, reserve spcified size

s/spcified/specified

> +   low memory firstly and then reserve memory above 4G.
>  
>  Load the Dump-capture Kernel
>  
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index fb95fad81c79..58a731eed011 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -722,6 +722,9 @@
>   [KNL, x86_64] select a region under 4G first, and
>   fall back to reserve region above 4G when '@offset'
>   hasn't been specified.
> + [KNL, arm64] If crashkernel=X,low is specified, reserve
> + spcified size low memory firstly, and then reserve 
> memory

s/spcified/specified

> + above 4G.
>   See Documentation/admin-guide/kdump/kdump.rst for 
> further details.
>  
>   crashkernel=range1:size1[,range2:size2,...][@offset]
> @@ -746,13 +749,23 @@
>   requires at least 64M+32K low memory, also enough extra
>   low memory is needed to make sure DMA buffers for 32-bit
>   devices won't run out. Kernel would try to allocate at
> - at least 256M below 4G automatically.
> + least 256M below 4G automatically.
>   This one let user to specify own low range under 4G
>   for second kernel instead.
>   0: to disable low allocation.
>   It will be ignored when crashkernel=X,high is not used
>   or memory reserved is below 4G.
> -
> + [KNL, arm64] range under 4G.
> + This one let user to specify own low range under 4G

s/own low/a low

> + for crash dump kernel instead.
> + Be different from x86_64, kernel reserves specified size
> + physical memory region only when this parameter is 
> specified
> + instead of trying to reserve at least 256M below 4G
> + automatically.
> + Use this parameter along with crashkernel=X when we want
> + to reserve crashkernel above 4G. If there are devices
> + need to use ZONE_DMA in crash dump kernel, it is also
> + a good choice.
>   cryptomgr.notests
>   [KNL] Disable crypto self-tests
>  
> -- 
> 2.20.1
> 



[PATCH v6 08/10] iommu/mediatek: Extend protect pa alignment value

2020-07-02 Thread Chao Hao
Starting with mt6779, the IOMMU can send up to 256 bytes of data
rather than 128, so the memory protection PA alignment needs to be
extended to 256 bytes accordingly. Use a separate patch to modify it.
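
For context, the constant is consumed at probe time roughly like this
(a sketch; the buffer is over-allocated so its start can be rounded up
to the alignment):

	protect = devm_kzalloc(dev, MTK_PROTECT_PA_ALIGN * 2, GFP_KERNEL);
	if (!protect)
		return -ENOMEM;
	data->protect_base = ALIGN(virt_to_phys(protect), MTK_PROTECT_PA_ALIGN);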

Signed-off-by: Chao Hao 
Reviewed-by: Matthias Brugger 
---
 drivers/iommu/mtk_iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 5c8e141668fc..e71003037ffa 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -98,7 +98,7 @@
 #define F_MMU_INT_ID_LARB_ID(a)(((a) >> 7) & 0x7)
 #define F_MMU_INT_ID_PORT_ID(a)(((a) >> 2) & 0x1f)
 
-#define MTK_PROTECT_PA_ALIGN   128
+#define MTK_PROTECT_PA_ALIGN   256
 
 /*
  * Get the local arbiter ID and the portid within the larb arbiter
-- 
2.18.0


[PATCH v6 04/10] iommu/mediatek: Setting MISC_CTRL register

2020-07-02 Thread Chao Hao
Add F_MMU_IN_ORDER_WR_EN_MASK and F_MMU_STANDARD_AXI_MODE_MASK
definitions in the MISC_CTRL register.
F_MMU_STANDARD_AXI_MODE_MASK:
If we clear these bits (bit[3][19] = 0, i.e. do not follow the
standard AXI protocol), the iommu will prioritize sending an urgent
read command over a normal read command. This improves performance.
F_MMU_IN_ORDER_WR_EN_MASK:
If we clear these bits (bit[1][17] = 0, out-of-order write), the iommu
will re-order write commands and send the write commands with higher
priority first. Otherwise the write commands are sent in order. The
feature is controlled by the OUT_ORDER_WR_EN platform data flag.

Cc: Matthias Brugger 
Suggested-by: Yong Wu 
Signed-off-by: Chao Hao 
---
 drivers/iommu/mtk_iommu.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 40ca564d97af..219d7aa6f059 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -42,6 +42,9 @@
 #define F_INVLD_EN1BIT(1)
 
 #define REG_MMU_MISC_CTRL  0x048
+#define F_MMU_IN_ORDER_WR_EN_MASK  (BIT(1) | BIT(17))
+#define F_MMU_STANDARD_AXI_MODE_MASK   (BIT(3) | BIT(19))
+
 #define REG_MMU_DCM_DIS0x050
 
 #define REG_MMU_CTRL_REG   0x110
@@ -105,6 +108,7 @@
 #define HAS_BCLK   BIT(1)
 #define HAS_VLD_PA_RNG BIT(2)
 #define RESET_AXI  BIT(3)
+#define OUT_ORDER_WR_ENBIT(4)
 
 #define MTK_IOMMU_HAS_FLAG(pdata, _x) \
((((pdata)->flags) & (_x)) == (_x))
@@ -585,8 +589,14 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data 
*data)
 
if (MTK_IOMMU_HAS_FLAG(data->plat_data, RESET_AXI)) {
/* The register is called STANDARD_AXI_MODE in this case */
-   writel_relaxed(0, data->base + REG_MMU_MISC_CTRL);
+   regval = 0;
+   } else {
+   regval = readl_relaxed(data->base + REG_MMU_MISC_CTRL);
+   regval &= ~F_MMU_STANDARD_AXI_MODE_MASK;
+   if (MTK_IOMMU_HAS_FLAG(data->plat_data, OUT_ORDER_WR_EN))
+   regval &= ~F_MMU_IN_ORDER_WR_EN_MASK;
}
+   writel_relaxed(regval, data->base + REG_MMU_MISC_CTRL);
 
if (devm_request_irq(data->dev, data->irq, mtk_iommu_isr, 0,
 dev_name(data->dev), (void *)data)) {
-- 
2.18.0


[PATCH v6 05/10] iommu/mediatek: Move inv_sel_reg into the plat_data

2020-07-02 Thread Chao Hao
For mt6779, the MMU_INV_SEL register's offset changes from 0x38 to
0x2c, so put inv_sel_reg in the plat_data in order to use the right
one.
In addition, rename the old offset to REG_MMU_INV_SEL_GEN1 and use it
for SoCs before mt6779.

Cc: Yong Wu 
Signed-off-by: Chao Hao 
Reviewed-by: Matthias Brugger 
---
 drivers/iommu/mtk_iommu.c | 9 ++---
 drivers/iommu/mtk_iommu.h | 1 +
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 219d7aa6f059..533b8f76f592 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -37,7 +37,7 @@
 #define REG_MMU_INVLD_START_A  0x024
 #define REG_MMU_INVLD_END_A0x028
 
-#define REG_MMU_INV_SEL0x038
+#define REG_MMU_INV_SEL_GEN1   0x038
 #define F_INVLD_EN0BIT(0)
 #define F_INVLD_EN1BIT(1)
 
@@ -178,7 +178,7 @@ static void mtk_iommu_tlb_flush_all(void *cookie)
 
for_each_m4u(data) {
writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0,
-  data->base + REG_MMU_INV_SEL);
+  data->base + data->plat_data->inv_sel_reg);
writel_relaxed(F_ALL_INVLD, data->base + REG_MMU_INVALIDATE);
wmb(); /* Make sure the tlb flush all done */
}
@@ -195,7 +195,7 @@ static void mtk_iommu_tlb_flush_range_sync(unsigned long 
iova, size_t size,
for_each_m4u(data) {
spin_lock_irqsave(&data->tlb_lock, flags);
writel_relaxed(F_INVLD_EN1 | F_INVLD_EN0,
-  data->base + REG_MMU_INV_SEL);
+  data->base + data->plat_data->inv_sel_reg);
 
writel_relaxed(iova, data->base + REG_MMU_INVLD_START_A);
writel_relaxed(iova + size - 1,
@@ -784,18 +784,21 @@ static const struct dev_pm_ops mtk_iommu_pm_ops = {
 static const struct mtk_iommu_plat_data mt2712_data = {
.m4u_plat = M4U_MT2712,
.flags= HAS_4GB_MODE | HAS_BCLK | HAS_VLD_PA_RNG,
+   .inv_sel_reg  = REG_MMU_INV_SEL_GEN1,
.larbid_remap = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9},
 };
 
 static const struct mtk_iommu_plat_data mt8173_data = {
.m4u_plat = M4U_MT8173,
.flags= HAS_4GB_MODE | HAS_BCLK | RESET_AXI,
+   .inv_sel_reg  = REG_MMU_INV_SEL_GEN1,
.larbid_remap = {0, 1, 2, 3, 4, 5}, /* Linear mapping. */
 };
 
 static const struct mtk_iommu_plat_data mt8183_data = {
.m4u_plat = M4U_MT8183,
.flags= RESET_AXI,
+   .inv_sel_reg  = REG_MMU_INV_SEL_GEN1,
.larbid_remap = {0, 4, 5, 6, 7, 2, 3, 1},
 };
 
diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
index 5225a9170aaa..cf53f5e80d22 100644
--- a/drivers/iommu/mtk_iommu.h
+++ b/drivers/iommu/mtk_iommu.h
@@ -40,6 +40,7 @@ enum mtk_iommu_plat {
 struct mtk_iommu_plat_data {
enum mtk_iommu_plat m4u_plat;
u32 flags;
+   u32 inv_sel_reg;
unsigned char   larbid_remap[MTK_LARB_NR_MAX];
 };
 
-- 
2.18.0


[PATCH v6 06/10] iommu/mediatek: Add sub_comm id in translation fault

2020-07-02 Thread Chao Hao
The max larb number that an iommu HW supports is 8 (larb0~larb7 in
the diagram below). If the larb number exceeds 8, we use a sub_common
to merge several larbs into one. In that case we extend the larb_id:
bit[11:9] means the common-id;
bit[8:7] means the subcommon-id.
From these two values we can get the real larb number when a
translation fault happens.
The diagram is as below:
                 EMI
                  |
                IOMMU
                  |
            -------------
            |           |
        common1      common0
            |           |
            -------------
                  |
             smi common
                  |
  -------------------------------------
  |      |      |      |      |       |
 3'd0   3'd1   3'd2   3'd3   ...    3'd7    <- common_id (max is 8)
  |      |      |      |      |       |
Larb0  Larb1    |    Larb3   ...    Larb7
                |
         smi sub common
                |
     -----------------------------
     |       |        |        |
    2'd0    2'd1     2'd2     2'd3          <- sub_common_id (max is 4)
     |       |        |        |
   Larb8   Larb9    Larb10   Larb11

In this patch we extend larb_remap[] to larb_remap[8][4] for this.
larb_remap[x][y]: x is the common-id above, y is the subcommon-id
above.

We can also distinguish whether the M4U HW has a sub_common by the
HAS_SUB_COMM macro.
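
A worked example of the decode, using the mt6779 remap table from the
last patch of this series (the register value is hypothetical):

	/* assume regval has bit[11:9] = 5 and bit[8:7] = 1 */
	comm     = F_MMU_INT_ID_COMM_ID(regval);	/* = 5 */
	sub_comm = F_MMU_INT_ID_SUB_COMM_ID(regval);	/* = 1 */
	/* mt6779: .larbid_remap = {{0}, {1}, {2}, {3}, {5}, {7, 8}, {10}, {9}} */
	fault_larb = larbid_remap[comm][sub_comm];	/* = 8, i.e. larb8 */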

Cc: Matthias Brugger 
Signed-off-by: Chao Hao 
Reviewed-by: Yong Wu 
---
 drivers/iommu/mtk_iommu.c | 21 ++---
 drivers/iommu/mtk_iommu.h |  5 -
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 533b8f76f592..0d96dcd8612b 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -91,6 +91,8 @@
 #define REG_MMU1_INVLD_PA  0x148
 #define REG_MMU0_INT_ID0x150
 #define REG_MMU1_INT_ID0x154
+#define F_MMU_INT_ID_COMM_ID(a)(((a) >> 9) & 0x7)
+#define F_MMU_INT_ID_SUB_COMM_ID(a)(((a) >> 7) & 0x3)
 #define F_MMU_INT_ID_LARB_ID(a)(((a) >> 7) & 0x7)
 #define F_MMU_INT_ID_PORT_ID(a)(((a) >> 2) & 0x1f)
 
@@ -109,6 +111,7 @@
 #define HAS_VLD_PA_RNG BIT(2)
 #define RESET_AXI  BIT(3)
 #define OUT_ORDER_WR_ENBIT(4)
+#define HAS_SUB_COMM   BIT(5)
 
 #define MTK_IOMMU_HAS_FLAG(pdata, _x) \
((((pdata)->flags) & (_x)) == (_x))
@@ -239,7 +242,7 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
struct mtk_iommu_data *data = dev_id;
struct mtk_iommu_domain *dom = data->m4u_dom;
u32 int_state, regval, fault_iova, fault_pa;
-   unsigned int fault_larb, fault_port;
+   unsigned int fault_larb, fault_port, sub_comm = 0;
bool layer, write;
 
/* Read error info from registers */
@@ -255,10 +258,14 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id)
}
layer = fault_iova & F_MMU_FAULT_VA_LAYER_BIT;
write = fault_iova & F_MMU_FAULT_VA_WRITE_BIT;
-   fault_larb = F_MMU_INT_ID_LARB_ID(regval);
fault_port = F_MMU_INT_ID_PORT_ID(regval);
-
-   fault_larb = data->plat_data->larbid_remap[fault_larb];
+   if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_SUB_COMM)) {
+   fault_larb = F_MMU_INT_ID_COMM_ID(regval);
+   sub_comm = F_MMU_INT_ID_SUB_COMM_ID(regval);
+   } else {
+   fault_larb = F_MMU_INT_ID_LARB_ID(regval);
+   }
+   fault_larb = data->plat_data->larbid_remap[fault_larb][sub_comm];
 
if (report_iommu_fault(>domain, data->dev, fault_iova,
   write ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ)) {
@@ -785,21 +792,21 @@ static const struct mtk_iommu_plat_data mt2712_data = {
.m4u_plat = M4U_MT2712,
.flags= HAS_4GB_MODE | HAS_BCLK | HAS_VLD_PA_RNG,
.inv_sel_reg  = REG_MMU_INV_SEL_GEN1,
-   .larbid_remap = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9},
+   .larbid_remap = {{0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}},
 };
 
 static const struct mtk_iommu_plat_data mt8173_data = {
.m4u_plat = M4U_MT8173,
.flags= HAS_4GB_MODE | HAS_BCLK | RESET_AXI,
.inv_sel_reg  = REG_MMU_INV_SEL_GEN1,
-   .larbid_remap = {0, 1, 2, 3, 4, 5}, /* Linear mapping. */
+   .larbid_remap = {{0}, {1}, {2}, {3}, {4}, {5}}, /* Linear mapping. */
 };
 
 static const struct mtk_iommu_plat_data mt8183_data = {
.m4u_plat = M4U_MT8183,
.flags= RESET_AXI,
.inv_sel_reg  = REG_MMU_INV_SEL_GEN1,
-   .larbid_remap = {0, 4, 5, 6, 7, 2, 3, 1},
+   .larbid_remap = {{0}, {4}, {5}, {6}, {7}, {2}, {3}, {1}},
 };
 
 static const struct of_device_id mtk_iommu_of_ids[] = {
diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
index cf53f5e80d22..46d0d47b22e1 100644

[PATCH v6 10/10] iommu/mediatek: Add mt6779 basic support

2020-07-02 Thread Chao Hao
1. Starting from mt6779, INVLDT_SEL moves to offset=0x2c, so we add the
   REG_MMU_INV_SEL_GEN2 definition and mt6779 uses it.
2. Add mt6779_data to support mm_iommu HW init.

Cc: Yong Wu 
Signed-off-by: Chao Hao 
Reviewed-by: Matthias Brugger 
---
 drivers/iommu/mtk_iommu.c | 9 +
 drivers/iommu/mtk_iommu.h | 1 +
 2 files changed, 10 insertions(+)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index a816030d00f1..59e5a62a34db 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -37,6 +37,7 @@
 #define REG_MMU_INVLD_START_A  0x024
 #define REG_MMU_INVLD_END_A0x028
 
+#define REG_MMU_INV_SEL_GEN2   0x02c
 #define REG_MMU_INV_SEL_GEN1   0x038
 #define F_INVLD_EN0			BIT(0)
 #define F_INVLD_EN1			BIT(1)
@@ -808,6 +809,13 @@ static const struct mtk_iommu_plat_data mt2712_data = {
.larbid_remap = {{0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}},
 };
 
+static const struct mtk_iommu_plat_data mt6779_data = {
+   .m4u_plat  = M4U_MT6779,
+   .flags = HAS_SUB_COMM | OUT_ORDER_WR_EN | WR_THROT_EN,
+   .inv_sel_reg   = REG_MMU_INV_SEL_GEN2,
+   .larbid_remap  = {{0}, {1}, {2}, {3}, {5}, {7, 8}, {10}, {9}},
+};
+
 static const struct mtk_iommu_plat_data mt8173_data = {
.m4u_plat = M4U_MT8173,
.flags= HAS_4GB_MODE | HAS_BCLK | RESET_AXI,
@@ -824,6 +832,7 @@ static const struct mtk_iommu_plat_data mt8183_data = {
 
 static const struct of_device_id mtk_iommu_of_ids[] = {
{ .compatible = "mediatek,mt2712-m4u", .data = _data},
+   { .compatible = "mediatek,mt6779-m4u", .data = _data},
{ .compatible = "mediatek,mt8173-m4u", .data = _data},
{ .compatible = "mediatek,mt8183-m4u", .data = _data},
{}
diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
index 31edd05e2eb1..214898578026 100644
--- a/drivers/iommu/mtk_iommu.h
+++ b/drivers/iommu/mtk_iommu.h
@@ -37,6 +37,7 @@ struct mtk_iommu_suspend_reg {
 enum mtk_iommu_plat {
M4U_MT2701,
M4U_MT2712,
+   M4U_MT6779,
M4U_MT8173,
M4U_MT8183,
 };
-- 
2.18.0


[PATCH v6 00/10] MT6779 IOMMU SUPPORT

2020-07-02 Thread Chao Hao
This patchset adds mt6779 iommu support.
mt6779 has two IOMMUs, MM_IOMMU(M4U) and APU_IOMMU, which both use the ARM
Short-Descriptor translation format.
The mt6779 MM_IOMMU-SMI and APU_IOMMU HW diagram is below; it is only a
brief diagram:
                 EMI
                  |
         --------------------
         |                  |
      MM_IOMMU          APU_IOMMU
         |                  |
     SMI_COMMON --------- APU_BUS
         |          |        |
   SMI_LARB(0~11)   |        |
         |          |        |
         |          |   ----------------
         |          |   |      |       |
  Multimedia engine CCU VPU  MDLA   EMDA

All the connections are fixed in hardware; software cannot adjust them.
Compared with mt8183, the SMI_BUS_ID width has changed from 10 to 12 bits:
the SMI larb number is described in bit[11:7] and the port number in
bit[6:2]. In addition, some registers have changed in mt6779, so we need
to redefine and reuse them.

This patchset only uses MM_IOMMU, so we only add basic MM_IOMMU support:
smi_larb port definitions, register definitions and hardware
initialization.
change notes:

 v6:
  1. Fix build error for "PATCH v5 02/10".
  2. Use more precise definitions and commit messages.

 v5:
  1. Split "iommu/mediatek: Add mt6779 IOMMU basic support (patch v4)" into three
patches (from PATCH v5 08/10 to PATCH v5 10/10).
  2. Use macro definitions to replace bool values in mtk_iommu_plat_data 
structure
 http://lists.infradead.org/pipermail/linux-mediatek/2020-June/013586.html

 v4:
  1. Rebase on v5.8-rc1.
  2. Fix coding style.
  3. Add the F_MMU_IN_ORDER_WR_EN definition in MISC_CTRL to improve performance.
 https://lkml.org/lkml/2020/6/16/1741

 v3:
  1. Rebase on v5.7-rc1.
  2. Remove unused port definitions, e.g. the APU and CCU ports in mt6779-larb-port.h.
  3. Remove "change single domain to multiple domain" part(from PATCH v2 09/19 
to PATCH v2 19/19).
  4. Redesign mt6779 basic part
(1)Add some register definition and reuse them.
(2)Redesign smi larb bus ID to analyze IOMMU translation fault.
(3)Only init MM_IOMMU and not use APU_IOMMU.
 http://lists.infradead.org/pipermail/linux-mediatek/2020-May/029811.html

 v2:
  1. Rebase on v5.5-rc1.
  2. Delete the M4U_PORT_UNKNOWN define because it is not used.
  3. Correct coding format.
  4. Rename offset=0x48 register.
  5. Split "iommu/mediatek: Add mt6779 IOMMU basic support (patch v1)" into
several patches (patch v2).
 http://lists.infradead.org/pipermail/linux-mediatek/2020-January/026131.html

 v1:
 http://lists.infradead.org/pipermail/linux-mediatek/2019-November/024567.html

Chao Hao (10):
  dt-bindings: mediatek: Add bindings for MT6779
  iommu/mediatek: Rename the register STANDARD_AXI_MODE(0x48) to MISC_CTRL
  iommu/mediatek: Use a u32 flags to describe different HW features
  iommu/mediatek: Setting MISC_CTRL register
  iommu/mediatek: Move inv_sel_reg into the plat_data
  iommu/mediatek: Add sub_comm id in translation fault
  iommu/mediatek: Add REG_MMU_WR_LEN_CTRL register definition
  iommu/mediatek: Extend protect pa alignment value
  iommu/mediatek: Modify MMU_CTRL register setting
  iommu/mediatek: Add mt6779 basic support

 .../bindings/iommu/mediatek,iommu.txt |   2 +
 drivers/iommu/mtk_iommu.c | 110 +++---
 drivers/iommu/mtk_iommu.h |  20 +-
 include/dt-bindings/memory/mt6779-larb-port.h | 206 ++
 4 files changed, 299 insertions(+), 39 deletions(-)

--
2.18.0



[PATCH v6 02/10] iommu/mediatek: Rename the register STANDARD_AXI_MODE(0x48) to MISC_CTRL

2020-07-02 Thread Chao Hao
For the IOMMU register at offset=0x48, only the older mt8173/mt8183 use the
name STANDARD_AXI_MODE; all the latest SoCs extend this register with more
features in different bits, for example: axi_mode, in_order_en, coherent_en
and so on. So renaming it to REG_MMU_MISC_CTRL is more appropriate.

This patch only renames the register, no functional change.

Signed-off-by: Chao Hao 
Reviewed-by: Yong Wu 
Reviewed-by: Matthias Brugger 
---
 drivers/iommu/mtk_iommu.c | 14 +++---
 drivers/iommu/mtk_iommu.h |  5 -
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 2be96f1cdbd2..88d3df5b91c2 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -41,7 +41,7 @@
 #define F_INVLD_EN0			BIT(0)
 #define F_INVLD_EN1			BIT(1)
 
-#define REG_MMU_STANDARD_AXI_MODE  0x048
+#define REG_MMU_MISC_CTRL  0x048
 #define REG_MMU_DCM_DIS			0x050
 
 #define REG_MMU_CTRL_REG   0x110
@@ -573,8 +573,10 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data 
*data)
}
writel_relaxed(0, data->base + REG_MMU_DCM_DIS);
 
-   if (data->plat_data->reset_axi)
-   writel_relaxed(0, data->base + REG_MMU_STANDARD_AXI_MODE);
+   if (data->plat_data->reset_axi) {
+   /* The register is called STANDARD_AXI_MODE in this case */
+   writel_relaxed(0, data->base + REG_MMU_MISC_CTRL);
+   }
 
if (devm_request_irq(data->dev, data->irq, mtk_iommu_isr, 0,
 dev_name(data->dev), (void *)data)) {
@@ -718,8 +720,7 @@ static int __maybe_unused mtk_iommu_suspend(struct device 
*dev)
struct mtk_iommu_suspend_reg *reg = >reg;
void __iomem *base = data->base;
 
-   reg->standard_axi_mode = readl_relaxed(base +
-  REG_MMU_STANDARD_AXI_MODE);
+   reg->misc_ctrl = readl_relaxed(base + REG_MMU_MISC_CTRL);
reg->dcm_dis = readl_relaxed(base + REG_MMU_DCM_DIS);
reg->ctrl_reg = readl_relaxed(base + REG_MMU_CTRL_REG);
reg->int_control0 = readl_relaxed(base + REG_MMU_INT_CONTROL0);
@@ -743,8 +744,7 @@ static int __maybe_unused mtk_iommu_resume(struct device 
*dev)
dev_err(data->dev, "Failed to enable clk(%d) in resume\n", ret);
return ret;
}
-   writel_relaxed(reg->standard_axi_mode,
-  base + REG_MMU_STANDARD_AXI_MODE);
+   writel_relaxed(reg->misc_ctrl, base + REG_MMU_MISC_CTRL);
writel_relaxed(reg->dcm_dis, base + REG_MMU_DCM_DIS);
writel_relaxed(reg->ctrl_reg, base + REG_MMU_CTRL_REG);
writel_relaxed(reg->int_control0, base + REG_MMU_INT_CONTROL0);
diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
index ea949a324e33..7212e6fcf982 100644
--- a/drivers/iommu/mtk_iommu.h
+++ b/drivers/iommu/mtk_iommu.h
@@ -18,7 +18,10 @@
 #include 
 
 struct mtk_iommu_suspend_reg {
-   u32 standard_axi_mode;
+   union {
+   u32 standard_axi_mode;/* v1 */
+   u32 misc_ctrl;/* v2 */
+   };
u32 dcm_dis;
u32 ctrl_reg;
u32 int_control0;
-- 
2.18.0


[PATCH v6 03/10] iommu/mediatek: Use a u32 flags to describe different HW features

2020-07-02 Thread Chao Hao
Given the fact that we are adding more and more plat_data bool values,
it makes sense to use a u32 flags register and add the appropriate
macro definitions to set a flag and to check whether a flag is present.
No functional change.
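
As a reference, a standalone illustration of the flag scheme (the values
are hypothetical): the check requires all requested bits to be set, so a
multi-bit query means "has all of these features":

#include <assert.h>

#define BIT(n)          (1U << (n))
#define HAS_4GB_MODE    BIT(0)
#define HAS_BCLK        BIT(1)

struct plat_data { unsigned int flags; };

/* Same shape as MTK_IOMMU_HAS_FLAG() in the patch. */
#define HAS_FLAG(p, _x) ((((p)->flags) & (_x)) == (_x))

int main(void)
{
        struct plat_data d = { .flags = HAS_4GB_MODE | HAS_BCLK };

        assert(HAS_FLAG(&d, HAS_4GB_MODE));
        assert(HAS_FLAG(&d, HAS_4GB_MODE | HAS_BCLK));

        d.flags = HAS_4GB_MODE;
        assert(!HAS_FLAG(&d, HAS_4GB_MODE | HAS_BCLK));
        return 0;
}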

Cc: Yong Wu 
Suggested-by: Matthias Brugger 
Signed-off-by: Chao Hao 
Reviewed-by: Matthias Brugger 
---
 drivers/iommu/mtk_iommu.c | 28 +---
 drivers/iommu/mtk_iommu.h |  7 +--
 2 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 88d3df5b91c2..40ca564d97af 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -100,6 +100,15 @@
 #define MTK_M4U_TO_LARB(id)	(((id) >> 5) & 0xf)
 #define MTK_M4U_TO_PORT(id)	((id) & 0x1f)
 
+#define HAS_4GB_MODE   BIT(0)
+/* HW will use the EMI clock if there isn't the "bclk". */
+#define HAS_BCLK   BIT(1)
+#define HAS_VLD_PA_RNG BIT(2)
+#define RESET_AXI  BIT(3)
+
+#define MTK_IOMMU_HAS_FLAG(pdata, _x) \
+	((((pdata)->flags) & (_x)) == (_x))
+
 struct mtk_iommu_domain {
struct io_pgtable_cfg   cfg;
struct io_pgtable_ops   *iop;
@@ -563,7 +572,8 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data 
*data)
 upper_32_bits(data->protect_base);
writel_relaxed(regval, data->base + REG_MMU_IVRP_PADDR);
 
-   if (data->enable_4GB && data->plat_data->has_vld_pa_rng) {
+   if (data->enable_4GB &&
+   MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_VLD_PA_RNG)) {
/*
 * If 4GB mode is enabled, the validate PA range is from
 * 0x1__ to 0x1__. here record bit[32:30].
@@ -573,7 +583,7 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data 
*data)
}
writel_relaxed(0, data->base + REG_MMU_DCM_DIS);
 
-   if (data->plat_data->reset_axi) {
+   if (MTK_IOMMU_HAS_FLAG(data->plat_data, RESET_AXI)) {
/* The register is called STANDARD_AXI_MODE in this case */
writel_relaxed(0, data->base + REG_MMU_MISC_CTRL);
}
@@ -618,7 +628,7 @@ static int mtk_iommu_probe(struct platform_device *pdev)
 
/* Whether the current dram is over 4GB */
data->enable_4GB = !!(max_pfn > (BIT_ULL(32) >> PAGE_SHIFT));
-   if (!data->plat_data->has_4gb_mode)
+   if (!MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_4GB_MODE))
data->enable_4GB = false;
 
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
@@ -631,7 +641,7 @@ static int mtk_iommu_probe(struct platform_device *pdev)
if (data->irq < 0)
return data->irq;
 
-   if (data->plat_data->has_bclk) {
+   if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_BCLK)) {
data->bclk = devm_clk_get(dev, "bclk");
if (IS_ERR(data->bclk))
return PTR_ERR(data->bclk);
@@ -763,23 +773,19 @@ static const struct dev_pm_ops mtk_iommu_pm_ops = {
 
 static const struct mtk_iommu_plat_data mt2712_data = {
.m4u_plat = M4U_MT2712,
-   .has_4gb_mode = true,
-   .has_bclk = true,
-   .has_vld_pa_rng   = true,
+   .flags= HAS_4GB_MODE | HAS_BCLK | HAS_VLD_PA_RNG,
.larbid_remap = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9},
 };
 
 static const struct mtk_iommu_plat_data mt8173_data = {
.m4u_plat = M4U_MT8173,
-   .has_4gb_mode = true,
-   .has_bclk = true,
-   .reset_axi= true,
+   .flags= HAS_4GB_MODE | HAS_BCLK | RESET_AXI,
.larbid_remap = {0, 1, 2, 3, 4, 5}, /* Linear mapping. */
 };
 
 static const struct mtk_iommu_plat_data mt8183_data = {
.m4u_plat = M4U_MT8183,
-   .reset_axi= true,
+   .flags= RESET_AXI,
.larbid_remap = {0, 4, 5, 6, 7, 2, 3, 1},
 };
 
diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
index 7212e6fcf982..5225a9170aaa 100644
--- a/drivers/iommu/mtk_iommu.h
+++ b/drivers/iommu/mtk_iommu.h
@@ -39,12 +39,7 @@ enum mtk_iommu_plat {
 
 struct mtk_iommu_plat_data {
enum mtk_iommu_plat m4u_plat;
-   boolhas_4gb_mode;
-
-   /* HW will use the EMI clock if there isn't the "bclk". */
-   boolhas_bclk;
-   boolhas_vld_pa_rng;
-   boolreset_axi;
+   u32 flags;
unsigned char   larbid_remap[MTK_LARB_NR_MAX];
 };
 
-- 
2.18.0


[PATCH v6 01/10] dt-bindings: mediatek: Add bindings for MT6779

2020-07-02 Thread Chao Hao
This patch adds the description for the MT6779 IOMMU.

MT6779 has two IOMMUs, mm_iommu and apu_iommu, which both use the ARM
Short-Descriptor translation format.

In addition, mm_iommu and apu_iommu are two independent HW instances,
so we need to set them up separately.

The MT6779 IOMMU hardware diagram is below. It is only a brief diagram
of the IOMMU; it does not focus on the smi_larb part, so I do not
describe smi_larb in detail.

                 EMI
                  |
         --------------------
         |                  |
      MM_IOMMU          APU_IOMMU
         |                  |
     SMI_COMMON --------- APU_BUS
         |          |        |
   SMI_LARB(0~11)   |        |
         |          |        |
         |          |   ----------------
         |          |   |      |       |
  Multimedia engine CCU VPU  MDLA   EMDA

All the connections are fixed in hardware; software cannot adjust them.

Signed-off-by: Chao Hao 
Reviewed-by: Rob Herring 
---
 .../bindings/iommu/mediatek,iommu.txt |   2 +
 include/dt-bindings/memory/mt6779-larb-port.h | 206 ++
 2 files changed, 208 insertions(+)
 create mode 100644 include/dt-bindings/memory/mt6779-larb-port.h

diff --git a/Documentation/devicetree/bindings/iommu/mediatek,iommu.txt 
b/Documentation/devicetree/bindings/iommu/mediatek,iommu.txt
index ce59a505f5a4..c1ccd8582eb2 100644
--- a/Documentation/devicetree/bindings/iommu/mediatek,iommu.txt
+++ b/Documentation/devicetree/bindings/iommu/mediatek,iommu.txt
@@ -58,6 +58,7 @@ Required properties:
 - compatible : must be one of the following string:
"mediatek,mt2701-m4u" for mt2701 which uses generation one m4u HW.
"mediatek,mt2712-m4u" for mt2712 which uses generation two m4u HW.
+   "mediatek,mt6779-m4u" for mt6779 which uses generation two m4u HW.
"mediatek,mt7623-m4u", "mediatek,mt2701-m4u" for mt7623 which uses
 generation one m4u HW.
"mediatek,mt8173-m4u" for mt8173 which uses generation two m4u HW.
@@ -78,6 +79,7 @@ Required properties:
Specifies the mtk_m4u_id as defined in
dt-binding/memory/mt2701-larb-port.h for mt2701, mt7623
dt-binding/memory/mt2712-larb-port.h for mt2712,
+   dt-binding/memory/mt6779-larb-port.h for mt6779,
dt-binding/memory/mt8173-larb-port.h for mt8173, and
dt-binding/memory/mt8183-larb-port.h for mt8183.
 
diff --git a/include/dt-bindings/memory/mt6779-larb-port.h 
b/include/dt-bindings/memory/mt6779-larb-port.h
new file mode 100644
index ..2ad0899fbf2f
--- /dev/null
+++ b/include/dt-bindings/memory/mt6779-larb-port.h
@@ -0,0 +1,206 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2019 MediaTek Inc.
+ * Author: Chao Hao 
+ */
+
+#ifndef _DTS_IOMMU_PORT_MT6779_H_
+#define _DTS_IOMMU_PORT_MT6779_H_
+
+#define MTK_M4U_ID(larb, port)  (((larb) << 5) | (port))
+
+#define M4U_LARB0_ID			0
+#define M4U_LARB1_ID			1
+#define M4U_LARB2_ID			2
+#define M4U_LARB3_ID			3
+#define M4U_LARB4_ID			4
+#define M4U_LARB5_ID			5
+#define M4U_LARB6_ID			6
+#define M4U_LARB7_ID			7
+#define M4U_LARB8_ID			8
+#define M4U_LARB9_ID			9
+#define M4U_LARB10_ID   10
+#define M4U_LARB11_ID   11
+
+/* larb0 */
+#define M4U_PORT_DISP_POSTMASK0 MTK_M4U_ID(M4U_LARB0_ID, 0)
+#define M4U_PORT_DISP_OVL0_HDR  MTK_M4U_ID(M4U_LARB0_ID, 1)
+#define M4U_PORT_DISP_OVL1_HDR  MTK_M4U_ID(M4U_LARB0_ID, 2)
+#define M4U_PORT_DISP_OVL0  MTK_M4U_ID(M4U_LARB0_ID, 3)
+#define M4U_PORT_DISP_OVL1  MTK_M4U_ID(M4U_LARB0_ID, 4)
+#define M4U_PORT_DISP_PVRIC0MTK_M4U_ID(M4U_LARB0_ID, 5)
+#define M4U_PORT_DISP_RDMA0 MTK_M4U_ID(M4U_LARB0_ID, 6)
+#define M4U_PORT_DISP_WDMA0 MTK_M4U_ID(M4U_LARB0_ID, 7)
+#define M4U_PORT_DISP_FAKE0 MTK_M4U_ID(M4U_LARB0_ID, 8)
+
+/* larb1 */
+#define M4U_PORT_DISP_OVL0_2L_HDR   MTK_M4U_ID(M4U_LARB1_ID, 0)
+#define M4U_PORT_DISP_OVL1_2L_HDR   MTK_M4U_ID(M4U_LARB1_ID, 1)
+#define M4U_PORT_DISP_OVL0_2L   MTK_M4U_ID(M4U_LARB1_ID, 2)
+#define M4U_PORT_DISP_OVL1_2L   MTK_M4U_ID(M4U_LARB1_ID, 3)
+#define M4U_PORT_DISP_RDMA1 MTK_M4U_ID(M4U_LARB1_ID, 4)
+#define M4U_PORT_MDP_PVRIC0 MTK_M4U_ID(M4U_LARB1_ID, 5)
+#define M4U_PORT_MDP_PVRIC1 MTK_M4U_ID(M4U_LARB1_ID, 6)
+#define M4U_PORT_MDP_RDMA0  MTK_M4U_ID(M4U_LARB1_ID, 7)
+#define M4U_PORT_MDP_RDMA1  MTK_M4U_ID(M4U_LARB1_ID, 8)
+#define M4U_PORT_MDP_WROT0_R   

[PATCH v6 09/10] iommu/mediatek: Modify MMU_CTRL register setting

2020-07-02 Thread Chao Hao
The MMU_CTRL register of MT8173 is different from the other SoCs:
its in_order_wr_en is bit[9], which is zero by default.
The other SoCs have the victim_tlb_en feature mapped to bit[12],
and that bit is set to one by default. We need to preserve it
when setting F_MMU_TF_PROT_TO_PROGRAM_ADDR, as otherwise the
bit would be cleared and IOMMU performance would drop.
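
As a reference, a toy model of the fix (bit numbers come from the message;
plain variables stand in for the MMIO access, and the F_MMU_VICTIM_TLB_EN
name is assumed for illustration, not taken from the driver):

#include <assert.h>

#define F_MMU_TF_PROT_TO_PROGRAM_ADDR   (2 << 4)
#define F_MMU_VICTIM_TLB_EN             (1U << 12)  /* set by HW default */

int main(void)
{
        unsigned int reg = F_MMU_VICTIM_TLB_EN;     /* value after reset */

        unsigned int plain = F_MMU_TF_PROT_TO_PROGRAM_ADDR;       /* old: clears bit 12 */
        unsigned int rmw = reg | F_MMU_TF_PROT_TO_PROGRAM_ADDR;   /* new: preserves it  */

        assert(!(plain & F_MMU_VICTIM_TLB_EN));
        assert(rmw & F_MMU_VICTIM_TLB_EN);
        return 0;
}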

Cc: Matthias Brugger 
Cc: Yong Wu 
Signed-off-by: Chao Hao 
---
 drivers/iommu/mtk_iommu.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index e71003037ffa..a816030d00f1 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -555,11 +555,13 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data 
*data)
return ret;
}
 
-   if (data->plat_data->m4u_plat == M4U_MT8173)
+   if (data->plat_data->m4u_plat == M4U_MT8173) {
regval = F_MMU_PREFETCH_RT_REPLACE_MOD |
 F_MMU_TF_PROT_TO_PROGRAM_ADDR_MT8173;
-   else
-   regval = F_MMU_TF_PROT_TO_PROGRAM_ADDR;
+   } else {
+   regval = readl_relaxed(data->base + REG_MMU_CTRL_REG);
+   regval |= F_MMU_TF_PROT_TO_PROGRAM_ADDR;
+   }
writel_relaxed(regval, data->base + REG_MMU_CTRL_REG);
 
regval = F_L2_MULIT_HIT_EN |
-- 
2.18.0


[PATCH v6 07/10] iommu/mediatek: Add REG_MMU_WR_LEN_CTRL register definition

2020-07-02 Thread Chao Hao
Some platforms (e.g. mt6779) need to improve performance by setting the
REG_MMU_WR_LEN_CTRL register, and we can use the WR_THROT_EN macro to
control whether we need to set it. If the register keeps its default
value, the IOMMU sends commands to the EMI without restriction; as the
number of outstanding commands grows, EMI performance drops. So when
more than ten commands (the default value) are pending on the EMI, the
IOMMU stops sending commands to the EMI, preserving EMI performance by
enabling the write throttling mechanism (bit[5][21]=0) in the
MMU_WR_LEN_CTRL register.
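
As a reference, a toy model of the read-modify-write performed below (bit
numbers from the patch; a plain variable stands in for the register
access):

#include <assert.h>

#define BIT(n)                  (1U << (n))
#define F_MMU_WR_THROT_DIS_MASK (BIT(5) | BIT(21))

int main(void)
{
        unsigned int regval = 0xffffffffu;      /* pretend readl_relaxed() result */

        regval &= ~F_MMU_WR_THROT_DIS_MASK;     /* bit[5]/bit[21] = 0: throttling on */
        assert(!(regval & F_MMU_WR_THROT_DIS_MASK));
        return 0;
}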

Cc: Matthias Brugger 
Signed-off-by: Chao Hao 
---
 drivers/iommu/mtk_iommu.c | 11 +++
 drivers/iommu/mtk_iommu.h |  1 +
 2 files changed, 12 insertions(+)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 0d96dcd8612b..5c8e141668fc 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -46,6 +46,8 @@
 #define F_MMU_STANDARD_AXI_MODE_MASK   (BIT(3) | BIT(19))
 
 #define REG_MMU_DCM_DIS			0x050
+#define REG_MMU_WR_LEN_CTRL		0x054
+#define F_MMU_WR_THROT_DIS_MASK		(BIT(5) | BIT(21))
 
 #define REG_MMU_CTRL_REG   0x110
 #define F_MMU_TF_PROT_TO_PROGRAM_ADDR  (2 << 4)
@@ -112,6 +114,7 @@
 #define RESET_AXI  BIT(3)
 #define OUT_ORDER_WR_EN		BIT(4)
 #define HAS_SUB_COMM   BIT(5)
+#define WR_THROT_EN			BIT(6)
 
 #define MTK_IOMMU_HAS_FLAG(pdata, _x) \
 	((((pdata)->flags) & (_x)) == (_x))
@@ -593,6 +596,12 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data 
*data)
writel_relaxed(regval, data->base + REG_MMU_VLD_PA_RNG);
}
writel_relaxed(0, data->base + REG_MMU_DCM_DIS);
+   if (MTK_IOMMU_HAS_FLAG(data->plat_data, WR_THROT_EN)) {
+   /* write command throttling mode */
+   regval = readl_relaxed(data->base + REG_MMU_WR_LEN_CTRL);
+   regval &= ~F_MMU_WR_THROT_DIS_MASK;
+   writel_relaxed(regval, data->base + REG_MMU_WR_LEN_CTRL);
+   }
 
if (MTK_IOMMU_HAS_FLAG(data->plat_data, RESET_AXI)) {
/* The register is called STANDARD_AXI_MODE in this case */
@@ -747,6 +756,7 @@ static int __maybe_unused mtk_iommu_suspend(struct device 
*dev)
struct mtk_iommu_suspend_reg *reg = >reg;
void __iomem *base = data->base;
 
+   reg->wr_len_ctrl = readl_relaxed(base + REG_MMU_WR_LEN_CTRL);
reg->misc_ctrl = readl_relaxed(base + REG_MMU_MISC_CTRL);
reg->dcm_dis = readl_relaxed(base + REG_MMU_DCM_DIS);
reg->ctrl_reg = readl_relaxed(base + REG_MMU_CTRL_REG);
@@ -771,6 +781,7 @@ static int __maybe_unused mtk_iommu_resume(struct device 
*dev)
dev_err(data->dev, "Failed to enable clk(%d) in resume\n", ret);
return ret;
}
+   writel_relaxed(reg->wr_len_ctrl, base + REG_MMU_WR_LEN_CTRL);
writel_relaxed(reg->misc_ctrl, base + REG_MMU_MISC_CTRL);
writel_relaxed(reg->dcm_dis, base + REG_MMU_DCM_DIS);
writel_relaxed(reg->ctrl_reg, base + REG_MMU_CTRL_REG);
diff --git a/drivers/iommu/mtk_iommu.h b/drivers/iommu/mtk_iommu.h
index 46d0d47b22e1..31edd05e2eb1 100644
--- a/drivers/iommu/mtk_iommu.h
+++ b/drivers/iommu/mtk_iommu.h
@@ -31,6 +31,7 @@ struct mtk_iommu_suspend_reg {
u32 int_main_control;
u32 ivrp_paddr;
u32 vld_pa_rng;
+   u32 wr_len_ctrl;
 };
 
 enum mtk_iommu_plat {
-- 
2.18.0


RE: [PATCH v2 1/3] crypto: permit users to specify numa node of acomp hardware

2020-07-02 Thread Song Bao Hua (Barry Song)



> -Original Message-
> From: Herbert Xu [mailto:herb...@gondor.apana.org.au]
> Sent: Friday, July 3, 2020 4:11 PM
> To: Song Bao Hua (Barry Song) 
> Cc: da...@davemloft.net; Wangzhou (B) ;
> Jonathan Cameron ;
> a...@linux-foundation.org; linux-cry...@vger.kernel.org;
> linux...@kvack.org; linux-kernel@vger.kernel.org; Linuxarm
> ; Seth Jennings ; Dan
> Streetman ; Vitaly Wool 
> Subject: Re: [PATCH v2 1/3] crypto: permit users to specify numa node of
> acomp hardware
> 
> On Tue, Jun 23, 2020 at 04:16:08PM +1200, Barry Song wrote:
> >
> > -void *crypto_create_tfm(struct crypto_alg *alg,
> > -   const struct crypto_type *frontend)
> > +void *crypto_create_tfm_node(struct crypto_alg *alg,
> > +   const struct crypto_type *frontend,
> > +   int node)
> >  {
> > char *mem;
> > struct crypto_tfm *tfm = NULL;
> > @@ -451,6 +452,7 @@ void *crypto_create_tfm(struct crypto_alg *alg,
> >
> > tfm = (struct crypto_tfm *)(mem + tfmsize);
> > tfm->__crt_alg = alg;
> > +   tfm->node = node;
> 
> Should the kzalloc also use node?

Yes, it would be nice since the tfm will mainly be accessed by CPU on the 
specific node.
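
For concreteness, a hedged sketch of that follow-up (not the actual
crypto/api.c code; "total" stands for the tfmsize + alg size the real
function computes):

#include <linux/slab.h>

static void *alloc_tfm_mem(size_t total, int node)
{
	/* Allocate the tfm buffer itself on the requested NUMA node. */
	return kzalloc_node(total, GFP_KERNEL, node);
}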

> 
> Thanks,
> --
> Email: Herbert Xu 
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Thanks
Barry



linux-next: manual merge of the kspp tree with the drm-misc tree

2020-07-02 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the kspp tree got a conflict in:

  drivers/gpu/drm/drm_edid.c

between commit:

  948de84233d3 ("drm : Insert blank lines after declarations.")

from the drm-misc tree and commit:

  80b89ab785a4 ("treewide: Remove uninitialized_var() usage")

from the kspp tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/gpu/drm/drm_edid.c
index 252e89cb54a3,b98fa573e706..
--- a/drivers/gpu/drm/drm_edid.c
+++ b/drivers/gpu/drm/drm_edid.c
@@@ -3095,8 -3051,7 +3095,8 @@@ static int drm_cvt_modes(struct drm_con
const u8 empty[3] = { 0, 0, 0 };
  
for (i = 0; i < 4; i++) {
-   int uninitialized_var(width), height;
+   int width, height;
 +
cvt = &(timing->data.other_data.data.cvt[i]);
  
if (!memcmp(cvt->code, empty, 3))




Re: [PATCH v6 2/2] display/drm/bridge: TC358775 DSI/LVDS driver

2020-07-02 Thread kernel test robot
Hi Vinay,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.8-rc3 next-20200702]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Vinay-Simha-BN/dt-binding-Add-DSI-LVDS-TC358775-bridge-bindings/20200702-203915
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
cd77006e01b3198c75fb7819b3d0ff89709539bb
config: x86_64-allyesconfig (attached as .config)
compiler: clang version 11.0.0 (https://github.com/llvm/llvm-project 
003a086ffc0d1affbb8300b36225fb8150a2d40a)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install x86_64 cross compiling tool for clang build
# apt-get install binutils-x86-64-linux-gnu
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

>> drivers/gpu/drm/bridge/tc358775.c:457:2: warning: variable 'bpc' is used 
>> uninitialized whenever switch default is taken [-Wsometimes-uninitialized]
   default:
   ^~~
   drivers/gpu/drm/bridge/tc358775.c:464:34: note: uninitialized use occurs here
   dsiclk = mode->crtc_clock * 3 * bpc / tc->num_dsi_lanes / 1000;
   ^~~
   drivers/gpu/drm/bridge/tc358775.c:387:14: note: initialize the variable 
'bpc' to silence this warning
   u8 link, bpc;
   ^
= '\0'
>> drivers/gpu/drm/bridge/tc358775.c:527:1: warning: no previous prototype for 
>> function 'tc_mode_valid' [-Wmissing-prototypes]
   tc_mode_valid(struct drm_bridge *bridge,
   ^
   drivers/gpu/drm/bridge/tc358775.c:526:1: note: declare 'static' if the 
function is not intended to be used outside of this translation unit
   enum drm_mode_status
   ^
   static 
   drivers/gpu/drm/bridge/tc358775.c:566:8: warning: variable 'len' is used 
uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
   if (endpoint) {
   ^~~~
   drivers/gpu/drm/bridge/tc358775.c:579:22: note: uninitialized use occurs here
   tc->num_dsi_lanes = len / sizeof(u32);
   ^~~
   drivers/gpu/drm/bridge/tc358775.c:566:4: note: remove the 'if' if its 
condition is always true
   if (endpoint) {
   ^~
   drivers/gpu/drm/bridge/tc358775.c:562:7: warning: variable 'len' is used 
uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
   if (parent) {
   ^~
   drivers/gpu/drm/bridge/tc358775.c:579:22: note: uninitialized use occurs here
   tc->num_dsi_lanes = len / sizeof(u32);
   ^~~
   drivers/gpu/drm/bridge/tc358775.c:562:3: note: remove the 'if' if its 
condition is always true
   if (parent) {
   ^~~~
   drivers/gpu/drm/bridge/tc358775.c:558:6: warning: variable 'len' is used 
uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
   if (endpoint) {
   ^~~~
   drivers/gpu/drm/bridge/tc358775.c:579:22: note: uninitialized use occurs here
   tc->num_dsi_lanes = len / sizeof(u32);
   ^~~
   drivers/gpu/drm/bridge/tc358775.c:558:2: note: remove the 'if' if its 
condition is always true
   if (endpoint) {
   ^~
   drivers/gpu/drm/bridge/tc358775.c:550:9: note: initialize the variable 'len' 
to silence this warning
   int len;
  ^
   = 0
   drivers/gpu/drm/bridge/tc358775.c:662:16: error: incompatible function 
pointer types initializing 'enum drm_mode_status (*)(struct drm_bridge *, const 
struct drm_display_mode *)' with an expression of type 'enum drm_mode_status 
(struct drm_bridge *, const struct drm_display_info *, const struct 
drm_display_mode *)' [-Werror,-Wincompatible-function-pointer-types]
   .mode_valid = tc_mode_valid,
 ^
   5 warnings and 1 error generated.

vim +/bpc +457 drivers/gpu/drm/bridge/tc358775.c

   379  
   380  static void tc_bridge_enable(struct drm_bridge *bridge)
   381  {
   382  struct tc_data *tc = bridge_to_tc(bridge);
   383  u32 hback_porch, hsync_len, hfront_porch, hactive, htime1, 
htime2;
   384  u32 vback_porch, vsync_len, vfront_porch, vactive, vtime1, 
vtime2;
   385  u32 val =

Re: [PATCH] drm/msm/dpu: fix wrong return value in dpu_encoder_init()

2020-07-02 Thread Tianjia Zhang




On 2020/7/2 22:04, Markus Elfring wrote:

A positive value ENOMEM is returned here. I think this is a typo.
It is necessary to return a negative error value.


I imagine that a small adjustment could be nice for this change description.

What do you think about following the progress of the integration of
a previous patch like "[RESEND] drm/msm/dpu: fix error return code in
dpu_encoder_init"?
https://lore.kernel.org/dri-devel/20200618062803.152097-1-chentao...@huawei.com/
https://lore.kernel.org/patchwork/patch/1257957/
https://lkml.org/lkml/2020/6/18/46

Regards,
Markus



This is the same fix, please ignore this patch.

Thanks,
Tianjia


[PATCH v2] f2fs: add symbolic link to kobject in sysfs

2020-07-02 Thread Daeho Jeong
From: Daeho Jeong 

Added a symbolic link to the sysfs directory. It will
create symbolic links such as "mount_0" and "mount_1" to
each f2fs mount, in the order the filesystems were mounted. But
once a mount point is umounted, its sequential number
@x in "mount_@x" can be reused by a later newly mounted
point. This provides easy access to the sysfs node even
without knowing the specific device node name like sda19 or
dm-3.
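
As a reference, a userspace-style model of that reuse behaviour: like the
kernel IDA, the allocator hands out the lowest free ID, so a number
released at umount is picked up by the next mount (a simplified stand-in,
not the kernel's IDA implementation):

#include <stdbool.h>
#include <stdio.h>

static bool used[8];

static int ida_get(void)
{
        for (int i = 0; i < 8; i++)
                if (!used[i]) {
                        used[i] = true;
                        return i;
                }
        return -1;
}

static void ida_put(int id)
{
        used[id] = false;
}

int main(void)
{
        int a = ida_get();      /* mount_0 */
        int b = ida_get();      /* mount_1 */

        ida_put(a);             /* umount of mount_0 */
        printf("next mount gets mount_%d\n", ida_get());  /* mount_0 again */
        (void)b;
        return 0;
}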

Signed-off-by: Daeho Jeong 
---
 fs/f2fs/f2fs.h  |  4 
 fs/f2fs/sysfs.c | 31 +++
 2 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 4b28fd42fdbc..7d6c5f8ce16b 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1419,6 +1419,8 @@ struct decompress_io_ctx {
 #define MAX_COMPRESS_LOG_SIZE  8
 #define MAX_COMPRESS_WINDOW_SIZE   ((PAGE_SIZE) << MAX_COMPRESS_LOG_SIZE)
 
+#define MOUNT_NAME_SIZE	20
+
 struct f2fs_sb_info {
struct super_block *sb; /* pointer to VFS super block */
struct proc_dir_entry *s_proc;  /* proc entry */
@@ -1599,6 +1601,8 @@ struct f2fs_sb_info {
/* For sysfs suppport */
struct kobject s_kobj;
struct completion s_kobj_unregister;
+   int s_mount_id;
+   char s_mount_name[MOUNT_NAME_SIZE];
 
/* For shrinker support */
struct list_head s_list;
diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
index 2a140657fc4d..703d9f460d03 100644
--- a/fs/f2fs/sysfs.c
+++ b/fs/f2fs/sysfs.c
@@ -18,6 +18,7 @@
 #include 
 
 static struct proc_dir_entry *f2fs_proc_root;
+static struct ida f2fs_mount_ida;
 
 /* Sysfs support for f2fs */
 enum {
@@ -908,6 +909,9 @@ int __init f2fs_init_sysfs(void)
} else {
f2fs_proc_root = proc_mkdir("fs/f2fs", NULL);
}
+
+   ida_init(_mount_ida);
+
return ret;
 }
 
@@ -917,6 +921,7 @@ void f2fs_exit_sysfs(void)
kset_unregister(_kset);
remove_proc_entry("fs/f2fs", NULL);
f2fs_proc_root = NULL;
+   ida_destroy(_mount_ida);
 }
 
 int f2fs_register_sysfs(struct f2fs_sb_info *sbi)
@@ -928,12 +933,22 @@ int f2fs_register_sysfs(struct f2fs_sb_info *sbi)
init_completion(>s_kobj_unregister);
err = kobject_init_and_add(>s_kobj, _sb_ktype, NULL,
"%s", sb->s_id);
-   if (err) {
-   kobject_put(>s_kobj);
-   wait_for_completion(>s_kobj_unregister);
-   return err;
+   if (err)
+   goto err1;
+
+   sbi->s_mount_id = ida_simple_get(_mount_ida, 0, 0, GFP_KERNEL);
+   if (sbi->s_mount_id < 0) {
+   err = sbi->s_mount_id;
+   goto err1;
}
 
+   snprintf(sbi->s_mount_name, MOUNT_NAME_SIZE, "mount_%d",
+   sbi->s_mount_id);
+   err = sysfs_create_link(_kset.kobj, >s_kobj,
+   sbi->s_mount_name);
+   if (err)
+   goto err2;
+
if (f2fs_proc_root)
sbi->s_proc = proc_mkdir(sb->s_id, f2fs_proc_root);
 
@@ -948,6 +963,12 @@ int f2fs_register_sysfs(struct f2fs_sb_info *sbi)
victim_bits_seq_show, sb);
}
return 0;
+err2:
+   ida_simple_remove(_mount_ida, sbi->s_mount_id);
+err1:
+   kobject_put(>s_kobj);
+   wait_for_completion(>s_kobj_unregister);
+   return err;
 }
 
 void f2fs_unregister_sysfs(struct f2fs_sb_info *sbi)
@@ -959,6 +980,8 @@ void f2fs_unregister_sysfs(struct f2fs_sb_info *sbi)
remove_proc_entry("victim_bits", sbi->s_proc);
remove_proc_entry(sbi->sb->s_id, f2fs_proc_root);
}
+   sysfs_remove_link(_kset.kobj, sbi->s_mount_name);
+   ida_simple_remove(_mount_ida, sbi->s_mount_id);
kobject_del(>s_kobj);
kobject_put(>s_kobj);
 }
-- 
2.27.0.383.g050319c2ae-goog



[PATCH] MIPS: Do not use smp_processor_id() in preemptible code

2020-07-02 Thread Xingxing Su
Use preempt_disable() to fix the following bug under CONFIG_DEBUG_PREEMPT.

[   21.915305] BUG: using smp_processor_id() in preemptible [] code: 
qemu-system-mip/1056
[   21.923996] caller is do_ri+0x1d4/0x690
[   21.927921] CPU: 0 PID: 1056 Comm: qemu-system-mip Not tainted 5.8.0-rc2 #3
[   21.934913] Stack : 0001 8137 8071cd60 
a80f926d5ac95694
[   21.942984] a80f926d5ac95694  9807f0043c88 
80f2fe40
[   21.951054]   0001 

[   21.959123] 802d60cc 9807f0043dd8 81f4b1e8 
81f6
[   21.967192] 81f6 80fe  

[   21.975261] f500cce1 0001 0002 

[   21.983331] 80fe1a40 0006 8077f940 

[   21.991401] 8146 9807f004 9807f0043c80 
00fffba8cf20
[   21.999471] 8071cd60   

[   22.007541]   80212ab4 
a80f926d5ac95694
[   22.015610] ...
[   22.018086] Call Trace:
[   22.020562] [] show_stack+0xa4/0x138
[   22.025732] [] dump_stack+0xf0/0x150
[   22.030903] [] check_preemption_disabled+0xf4/0x100
[   22.037375] [] do_ri+0x1d4/0x690
[   22.042198] [] handle_ri_int+0x44/0x5c
[   24.359386] BUG: using smp_processor_id() in preemptible [] code: 
qemu-system-mip/1072
[   24.368204] caller is do_ri+0x1a8/0x690
[   24.372169] CPU: 4 PID: 1072 Comm: qemu-system-mip Not tainted 5.8.0-rc2 #3
[   24.379170] Stack : 0001 8137 8071cd60 
a80f926d5ac95694
[   24.387246] a80f926d5ac95694  98001007ef06bc88 
80f2fe40
[   24.395318]   0001 

[   24.403389] 802d60cc 98001007ef06bdd8 81f4b818 
81f6
[   24.411461] 81f6 80fe  

[   24.419533] f500cce1 0001 0002 

[   24.427603] 80fe 0006 8077f940 
0020
[   24.435673] 81460020 98001007ef068000 98001007ef06bc80 
00fff370
[   24.443745] 8071cd60   

[   24.451816]   80212ab4 
a80f926d5ac95694
[   24.459887] ...
[   24.462367] Call Trace:
[   24.464846] [] show_stack+0xa4/0x138
[   24.470029] [] dump_stack+0xf0/0x150
[   24.475208] [] check_preemption_disabled+0xf4/0x100
[   24.481682] [] do_ri+0x1a8/0x690
[   24.486509] [] handle_ri_int+0x44/0x5c

Signed-off-by: Xingxing Su 
---
 arch/mips/kernel/traps.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index 7c32c95..a46ce94 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -723,12 +723,14 @@ static int simulate_loongson3_cpucfg(struct pt_regs *regs,
perf_sw_event(PERF_COUNT_SW_EMULATION_FAULTS, 1, regs, 0);
 
/* Do not emulate on unsupported core models. */
-   if (!loongson3_cpucfg_emulation_enabled(_cpu_data))
+   preempt_disable();
+   if (!loongson3_cpucfg_emulation_enabled(_cpu_data)) {
+   preempt_enable();
return -1;
-
+   }
regs->regs[rd] = loongson3_cpucfg_read_synthesized(
_cpu_data, sel);
-
+   preempt_enable();
return 0;
}
 
-- 
2.1.0



Re: [PATCH v2 1/3] crypto: permit users to specify numa node of acomp hardware

2020-07-02 Thread Herbert Xu
On Tue, Jun 23, 2020 at 04:16:08PM +1200, Barry Song wrote:
>
> -void *crypto_create_tfm(struct crypto_alg *alg,
> - const struct crypto_type *frontend)
> +void *crypto_create_tfm_node(struct crypto_alg *alg,
> + const struct crypto_type *frontend,
> + int node)
>  {
>   char *mem;
>   struct crypto_tfm *tfm = NULL;
> @@ -451,6 +452,7 @@ void *crypto_create_tfm(struct crypto_alg *alg,
>  
>   tfm = (struct crypto_tfm *)(mem + tfmsize);
>   tfm->__crt_alg = alg;
> + tfm->node = node;

Should the kzalloc also use node?

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH V2] scsi: powertec: Fix different dev_id between 'request_irq()' and 'free_irq()'

2020-07-02 Thread Martin K. Petersen
On Fri, 26 Jun 2020 05:59:48 +0200, Christophe JAILLET wrote:

> The dev_id used in 'request_irq()' and 'free_irq()' should match.
> So use 'info' in both cases.

Applied to 5.9/scsi-queue, thanks!

[1/1] scsi: powertec: Fix different dev_id between request_irq() and free_irq()
  https://git.kernel.org/mkp/scsi/c/d179f7c76324

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 2/2] perf tools: Fix record failure when mixed with ARM SPE event

2020-07-02 Thread liwei (GF)
Hi Mathieu,

On 2020/7/3 7:03, Mathieu Poirier wrote:
> Hi Li,
> 
> On Tue, Jun 23, 2020 at 08:31:41PM +0800, Wei Li wrote:
>> When recording with cache-misses and arm_spe_x event, i found that
>> it will just fail without showing any error info if i put cache-misses
>> after arm_spe_x event.
>>
>> [root@localhost 0620]# perf record -e cache-misses -e \
>> arm_spe_0/ts_enable=1,pct_enable=1,pa_enable=1,load_filter=1,\
>> jitter=1,store_filter=1,min_latency=0/ sleep 1
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.067 MB perf.data ]
>> [root@localhost 0620]# perf record -e \
>> arm_spe_0/ts_enable=1,pct_enable=1,pa_enable=1,load_filter=1,jitter=1,\
>> store_filter=1,min_latency=0/ -e cache-misses sleep 1
>> [root@localhost 0620]#
>>
>> Finally, i found the reason is that the parameter 'arm_spe_pmu' passed to
>> arm_spe_recording_init() in auxtrace_record__init() is wrong. When the
>> arm_spe_x event is not the last event, 'arm_spe_pmus[i]' will be out of
>> bounds.
> 
> Yes, this indeed broken.  
> 
> The current code can only work if the only event to be
> traced is an arm_spe_X, or if it is the last event to be specified.
> Otherwise the last event type will be checked against all the
> arm_spe_pmus[i]->types, none will match and an out of bound i index will be
> used in arm_spc_recording_init().
> 
> Since this problem is not easy to figure out please include the above
> explanation in the changelog.
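
For reference, the out-of-bounds pattern described above boils down to
this standalone sketch (the PMU types are made up):

#include <stdio.h>

int main(void)
{
        int pmu_type[2] = { 10, 11 };   /* arm_spe_pmus[i]->type, made up */
        int nr_spes = 2, evsel_type = 99, i;

        for (i = 0; i < nr_spes; i++)
                if (evsel_type == pmu_type[i])
                        break;

        /* No match: i == nr_spes, so arm_spe_pmus[i] would be past the end. */
        if (i == nr_spes)
                printf("pmu_type[%d] is out of bounds\n", i);
        return 0;
}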

OK.
>>
>> It seems that the code can't support concurrent multiple different
>> arm_spe_x events currently. So add the code to check and record the
>> found 'arm_spe_pmu' to fix this issue.
>>
>> In fact, we don't support concurrent multiple same arm_spe_x events either,
>> that is checked in arm_spe_recording_options(), and it will show the
>> relevant info.
>>
>> Fixes: ffd3d18c20b8d ("perf tools: Add ARM Statistical Profiling Extensions 
>> (SPE) support")
>> Signed-off-by: Wei Li 
>> ---
>>  tools/perf/arch/arm/util/auxtrace.c | 10 +-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/perf/arch/arm/util/auxtrace.c 
>> b/tools/perf/arch/arm/util/auxtrace.c
>> index 62b7b03d691a..7bb6f29e766c 100644
>> --- a/tools/perf/arch/arm/util/auxtrace.c
>> +++ b/tools/perf/arch/arm/util/auxtrace.c
>> @@ -58,6 +58,7 @@ struct auxtrace_record
>>  bool found_etm = false;
>>  bool found_spe = false;
>>  static struct perf_pmu **arm_spe_pmus;
>> +static struct perf_pmu *arm_spe_pmu;
> 
> As far as I can tell the "static" doesn't do anything.
> 
I will remove that in v2.
>>  static int nr_spes = 0;
>>  int i = 0;
>>  
>> @@ -77,6 +78,13 @@ struct auxtrace_record
>>  
>>  for (i = 0; i < nr_spes; i++) {
>>  if (evsel->core.attr.type == arm_spe_pmus[i]->type) {
>> +if (found_spe && (arm_spe_pmu != 
>> arm_spe_pmus[i])) {
>> +pr_err("Concurrent multiple SPE 
>> operation not currently supported\n");
>> +*err = -EOPNOTSUPP;
>> +return NULL;
>> +}
> 
> Instead of the above, which as you rightly pointed out, is also done in
> arm_spe_recording_options() it might be best to just fix the "if (!nr_spes)"
> condition:
> if (!nr_spes || arm_spe_pmu)
> continue

This is more brief; I will use 'found_spe', as 'arm_spe_pmu' is not initialized.
> Furthermore, instead of having a new arm_spe_pmu variable you could simply 
> make
> found_spe a struct perf_pmu.  That would be one less variable to take care of.
> 
>> +
>> +arm_spe_pmu = arm_spe_pmus[i];
>>  found_spe = true;
> 
> Last but not least do you know where the memory allocated for array 
> arm_spe_pmus
> is released?  If you can't find it either then we have a memory leak and it
> would be nice to have that fixed.
Yes, we have a memory leak here indeed; I forgot to free it in this function.
As 'arm_spe_pmus' is defined as static, I think the author meant to assign it
only on the first call, but this function is only called once when executing
'record'. Should I go on fixing it, or just drop patch 1?

> Regards
> Mathieu
> 
> PS: Leo Yan has spent a fair amount of time in the SPE code - please CC him on
> your next revision.
> 
Thanks,
Wei


[PATCH 2/2] KVM: VMX: Use KVM_POSSIBLE_CR*_GUEST_BITS to initialize guest/host masks

2020-07-02 Thread Sean Christopherson
Use the "common" KVM_POSSIBLE_CR*_GUEST_BITS defines to initialize the
CR0/CR4 guest host masks instead of duplicating most of the CR4 mask and
open coding the CR0 mask.  SVM doesn't utilize the masks, i.e. the masks
are effectively VMX specific even if they're not named as such.  This
avoids duplicate code, better documents the guest owned CR0 bit, and
eliminates the need for a build-time assertion to keep VMX and x86
synchronized.

Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/vmx/nested.c |  4 ++--
 arch/x86/kvm/vmx/vmx.c| 15 +--
 2 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index d1af20b050a8..b26655104d4a 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4109,7 +4109,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
 * CR0_GUEST_HOST_MASK is already set in the original vmcs01
 * (KVM doesn't change it);
 */
-   vcpu->arch.cr0_guest_owned_bits = X86_CR0_TS;
+   vcpu->arch.cr0_guest_owned_bits = KVM_POSSIBLE_CR0_GUEST_BITS;
vmx_set_cr0(vcpu, vmcs12->host_cr0);
 
/* Same as above - no reason to call set_cr4_guest_host_mask().  */
@@ -4259,7 +4259,7 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu 
*vcpu)
 */
vmx_set_efer(vcpu, nested_vmx_get_vmcs01_guest_efer(vmx));
 
-   vcpu->arch.cr0_guest_owned_bits = X86_CR0_TS;
+   vcpu->arch.cr0_guest_owned_bits = KVM_POSSIBLE_CR0_GUEST_BITS;
vmx_set_cr0(vcpu, vmcs_readl(CR0_READ_SHADOW));
 
vcpu->arch.cr4_guest_owned_bits = ~vmcs_readl(CR4_GUEST_HOST_MASK);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 7fc5ca9cb5a0..2a42c86746f7 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -133,9 +133,6 @@ module_param_named(preemption_timer, 
enable_preemption_timer, bool, S_IRUGO);
 #define KVM_VM_CR0_ALWAYS_ON   \
(KVM_VM_CR0_ALWAYS_ON_UNRESTRICTED_GUEST |  \
 X86_CR0_WP | X86_CR0_PG | X86_CR0_PE)
-#define KVM_CR4_GUEST_OWNED_BITS \
-   (X86_CR4_PVI | X86_CR4_DE | X86_CR4_PCE | X86_CR4_OSFXSR  \
-| X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_TSD)
 
 #define KVM_VM_CR4_ALWAYS_ON_UNRESTRICTED_GUEST X86_CR4_VMXE
 #define KVM_PMODE_VM_CR4_ALWAYS_ON (X86_CR4_PAE | X86_CR4_VMXE)
@@ -4034,11 +4031,9 @@ void vmx_set_constant_host_state(struct vcpu_vmx *vmx)
 
 void set_cr4_guest_host_mask(struct vcpu_vmx *vmx)
 {
-   BUILD_BUG_ON(KVM_CR4_GUEST_OWNED_BITS & ~KVM_POSSIBLE_CR4_GUEST_BITS);
-
-   vmx->vcpu.arch.cr4_guest_owned_bits = KVM_CR4_GUEST_OWNED_BITS;
-   if (enable_ept)
-   vmx->vcpu.arch.cr4_guest_owned_bits |= X86_CR4_PGE;
+   vmx->vcpu.arch.cr4_guest_owned_bits = KVM_POSSIBLE_CR4_GUEST_BITS;
+   if (!enable_ept)
+   vmx->vcpu.arch.cr4_guest_owned_bits &= ~X86_CR4_PGE;
if (is_guest_mode(>vcpu))
vmx->vcpu.arch.cr4_guest_owned_bits &=
~get_vmcs12(>vcpu)->cr4_guest_host_mask;
@@ -4335,8 +4330,8 @@ static void init_vmcs(struct vcpu_vmx *vmx)
/* 22.2.1, 20.8.1 */
vm_entry_controls_set(vmx, vmx_vmentry_ctrl());
 
-   vmx->vcpu.arch.cr0_guest_owned_bits = X86_CR0_TS;
-   vmcs_writel(CR0_GUEST_HOST_MASK, ~X86_CR0_TS);
+   vmx->vcpu.arch.cr0_guest_owned_bits = KVM_POSSIBLE_CR0_GUEST_BITS;
+   vmcs_writel(CR0_GUEST_HOST_MASK, ~vmx->vcpu.arch.cr0_guest_owned_bits);
 
set_cr4_guest_host_mask(vmx);
 
-- 
2.26.0



[PATCH 0/2] KVM: VMX: CR0/CR4 guest/host masks cleanup

2020-07-02 Thread Sean Christopherson
Fix a bug where CR4.TSD isn't correctly marked as being possibly owned by
the guest in the common x86 macro, then clean up the mess that made the
bug possible by throwing away VMX's mix of duplicate code and open coded
tweaks.  The lack of a define for the guest-owned CR0 bit has bugged me
for a long time, but adding another define always seemed ridiculous.

Sean Christopherson (2):
  KVM: x86: Mark CR4.TSD as being possibly owned by the guest
  KVM: VMX: Use KVM_POSSIBLE_CR*_GUEST_BITS to initialize guest/host
masks

 arch/x86/kvm/kvm_cache_regs.h |  2 +-
 arch/x86/kvm/vmx/nested.c |  4 ++--
 arch/x86/kvm/vmx/vmx.c| 13 +
 3 files changed, 8 insertions(+), 11 deletions(-)

-- 
2.26.0



Re: [PATCH V2] scsi: eesox: Fix different dev_id between 'request_irq()' and 'free_irq()'

2020-07-02 Thread Martin K. Petersen
On Fri, 26 Jun 2020 06:05:53 +0200, Christophe JAILLET wrote:

> The dev_id used in 'request_irq()' and 'free_irq()' should match.
> So use 'info' in both cases.

Applied to 5.9/scsi-queue, thanks!

[1/1] scsi: eesox: Fix different dev_id between request_irq() and free_irq()
  https://git.kernel.org/mkp/scsi/c/86f2da1112cc

-- 
Martin K. Petersen  Oracle Linux Engineering


[PATCH 1/2] KVM: x86: Mark CR4.TSD as being possibly owned by the guest

2020-07-02 Thread Sean Christopherson
Mark CR4.TSD as being possibly owned by the guest as that is indeed the
case on VMX.  Without TSD being tagged as possibly owned by the guest, a
targeted read of CR4 to get TSD could observe a stale value.  This bug
is benign in the current code base as the sole consumer of TSD is the
emulator (for RDTSC) and the emulator always "reads" the entirety of CR4
when grabbing bits.

Add a build-time assertion to ensure VMX doesn't hand over more CR4
bits without also updating x86.
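
As a reference, a simplified model (not KVM code) of why an untagged
guest-owned bit can read stale: only bits tagged guest-owned are refreshed
from hardware, the rest come from the cached value:

#include <assert.h>

#define X86_CR4_TSD     (1UL << 2)

struct vcpu { unsigned long cached_cr4, hw_cr4, guest_owned; };

static unsigned long read_cr4_bits(struct vcpu *v, unsigned long mask)
{
        unsigned long fresh = mask & v->guest_owned;

        /* Guest-owned bits come from hardware, the rest from the cache. */
        return (v->hw_cr4 & fresh) | (v->cached_cr4 & (mask & ~fresh));
}

int main(void)
{
        struct vcpu v = { .cached_cr4 = 0, .hw_cr4 = X86_CR4_TSD,
                          .guest_owned = X86_CR4_TSD };

        /* TSD tagged guest-owned: the targeted read sees the live value. */
        assert(read_cr4_bits(&v, X86_CR4_TSD) == X86_CR4_TSD);

        /* Untagged: the same read returns the stale cached 0. */
        v.guest_owned = 0;
        assert(read_cr4_bits(&v, X86_CR4_TSD) == 0);
        return 0;
}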

Fixes: 52ce3c21aec3 ("x86,kvm,vmx: Don't trap writes to CR4.TSD")
Cc: sta...@vger.kernel.org
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/kvm_cache_regs.h | 2 +-
 arch/x86/kvm/vmx/vmx.c| 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index ff2d0e9ca3bc..cfe83d4ae625 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -7,7 +7,7 @@
 #define KVM_POSSIBLE_CR0_GUEST_BITS X86_CR0_TS
 #define KVM_POSSIBLE_CR4_GUEST_BITS  \
(X86_CR4_PVI | X86_CR4_DE | X86_CR4_PCE | X86_CR4_OSFXSR  \
-| X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_PGE)
+| X86_CR4_OSXMMEXCPT | X86_CR4_LA57 | X86_CR4_PGE | X86_CR4_TSD)
 
 #define BUILD_KVM_GPR_ACCESSORS(lname, uname)\
 static __always_inline unsigned long kvm_##lname##_read(struct kvm_vcpu *vcpu)\
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b1a23ad986ff..7fc5ca9cb5a0 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4034,6 +4034,8 @@ void vmx_set_constant_host_state(struct vcpu_vmx *vmx)
 
 void set_cr4_guest_host_mask(struct vcpu_vmx *vmx)
 {
+   BUILD_BUG_ON(KVM_CR4_GUEST_OWNED_BITS & ~KVM_POSSIBLE_CR4_GUEST_BITS);
+
vmx->vcpu.arch.cr4_guest_owned_bits = KVM_CR4_GUEST_OWNED_BITS;
if (enable_ept)
vmx->vcpu.arch.cr4_guest_owned_bits |= X86_CR4_PGE;
-- 
2.26.0



[PATCH 1/2] hwmon: shtc1: add support for device tree bindings

2020-07-02 Thread Chris Ruehl
Add support for device tree bindings to the shtc1 driver; use CONFIG_OF
to compile the code in only when needed.

Signed-off-by: Chris Ruehl 
---
 drivers/hwmon/shtc1.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/drivers/hwmon/shtc1.c b/drivers/hwmon/shtc1.c
index a0078ccede03..3bcabc1cbce8 100644
--- a/drivers/hwmon/shtc1.c
+++ b/drivers/hwmon/shtc1.c
@@ -14,6 +14,9 @@
 #include 
 #include 
 #include 
+#ifdef CONFIG_OF
+#include 
+#endif
 
 /* commands (high precision mode) */
 static const unsigned char shtc1_cmd_measure_blocking_hpm[]= { 0x7C, 0xA2 
};
@@ -196,6 +199,10 @@ static int shtc1_probe(struct i2c_client *client,
enum shtcx_chips chip = id->driver_data;
struct i2c_adapter *adap = client->adapter;
struct device *dev = >dev;
+#ifdef CONFIG_OF
+   struct device_node *np = dev->of_node;
+   u8 value;
+#endif
 
if (!i2c_check_functionality(adap, I2C_FUNC_I2C)) {
dev_err(dev, "plain i2c transactions not supported\n");
@@ -235,6 +242,20 @@ static int shtc1_probe(struct i2c_client *client,
 
if (client->dev.platform_data)
data->setup = *(struct shtc1_platform_data *)dev->platform_data;
+
+#ifdef CONFIG_OF
+   if (np) {
+   if (of_property_read_bool(np, "sensirion,blocking_io")) {
+   of_property_read_u8(np, "sensirion,blocking_io", 
);
+   data->setup.blocking_io = (value > 0) ? true : false;
+   }
+   if (of_property_read_bool(np, "sensicon,high_precision")) {
+   of_property_read_u8(np, "sensirion,high_precision", 
);
+   data->setup.high_precision = (value > 0) ? true : false;
+   }
+   }
+#endif
+
shtc1_select_command(data);
mutex_init(>update_lock);
 
@@ -257,6 +278,15 @@ static const struct i2c_device_id shtc1_id[] = {
 };
 MODULE_DEVICE_TABLE(i2c, shtc1_id);
 
+#ifdef CONFIG_OF
+static const struct of_device_id shtc1_of_match[] = {
+   { .compatible = "sensirion,shtc1" },
+   { .compatible = "sensirion,shtw1" },
+   { .compatible = "sensirion,shtc3" },
+   { }
+};
+MODULE_DEVICE_TABLE(of, shtc1_of_match);
+#endif
 static struct i2c_driver shtc1_i2c_driver = {
.driver.name  = "shtc1",
.probe= shtc1_probe,
-- 
2.20.1



[PATCH] shtc1: add support for device tree bindings

2020-07-02 Thread Chris Ruehl
Add support for device tree bindings to the shtc driver.
The patches add the compatible table and of_property_read* calls to
shtc1.c. A newly created YAML document has been added as
Documentation/devicetree/bindings/hwmon/sensirion,shtc1.yaml

Signed-off-by: Chris Ruehl 
---
 Version 1


[PATCH 2/2] devicetree: hwmon: shtc1: Add sensirion,shtc1.yaml

2020-07-02 Thread Chris Ruehl
Add documentation for the newly added DTS support in the shtc1 driver.

Signed-off-by: Chris Ruehl 
---
 .../bindings/hwmon/sensirion,shtc1.yaml   | 53 +++
 1 file changed, 53 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/hwmon/sensirion,shtc1.yaml

diff --git a/Documentation/devicetree/bindings/hwmon/sensirion,shtc1.yaml 
b/Documentation/devicetree/bindings/hwmon/sensirion,shtc1.yaml
new file mode 100644
index ..e3e292bc6d7d
--- /dev/null
+++ b/Documentation/devicetree/bindings/hwmon/sensirion,shtc1.yaml
@@ -0,0 +1,53 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/hwmon/sensirion,shtc1.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Sensirion SHTC1 Humidity and Temperature Sensor IC
+
+maintainers:
+  - jdelv...@suse.com
+
+description: |
+  The SHTC1, SHTW1 and SHTC3 are digital humidity and temperature sensors
+  designed especially for battery-driven high-volume consumer electronics
+  applications.
+  For further information refer to Documentation/hwmon/shtc1.rst
+
+  This binding document describes the binding for the hardware monitor
+  portion of the driver.
+
+properties:
+  compatible:
+enum:
+  - sensirion,shtc1
+  - sensirion,shtw1
+  - sensirion,shtc3
+
+  reg: I2C address 0x70
+
+Optional properties:
+  sensirion,blocking_io: |
+    u8, if > 0 the i2c bus is held until the measurement has finished (default 0)
+  sensirion,high_precision: |
+    u8, if > 0 acquire data with high precision (default 1)
+
+required:
+  - compatible
+  - reg
+
+additionalProperties: false
+
+Example:
+   {
+status = "okay";
+clock-frequency = <40>;
+
+shtc3@70 {
+  compatible = "sensirion,shtc3";
+  reg = <0x70>;
+  sensirion,blocking_io = <1>;
+  status = "okay";
+};
+  };
-- 
2.20.1



Re: objtool clac/stac handling change..

2020-07-02 Thread Michael Ellerman
Linus Torvalds  writes:
> On Thu, Jul 2, 2020 at 8:13 AM Christophe Leroy
>  wrote:
>>
>> Isn't it something easy to do in bad_page_fault() ?
>
> Can't the user access functions take any other faults on ppc?

Yes they definitely can.

I think I can enumerate all the possibilities on 64-bit, but I don't
know all the possibilities on all the 32-bit variants.

> On x86-64, we have the "address is non-canonical" case which doesn't
> take a page fault at all, but takes a general protection fault
> instead.

Right. On P9 radix we have an address-out-of-page-table-range exception
which I guess is similar, though that does end up at bad_page_fault() in
our case.

> But note that depending on how you nest and save/restore the state,
> things can be very subtle.
>
> For example, what can happen is:
>
>  (a) user_access_begin()..
>
>  (b) we take a normal interrupt
>
>  (c) the interrupt code does something that has an exception handling
> case entirely unrelated to the user access (on x86, it might be the
> "unsafe_msr' logic, for example.
>
>  (d) we take that exception, do "fixup_exception()" for whatever that
> interrupt did.
>
>  (e) we return from that exception to the fixed up state
>
>  (d) we return from the interrupt
>
>  (e) we should still have user accesses enabled.

Yes.

We broke that a few times when developing the KUAP support, which is why
I added bad_kuap_fault() to report the case where we are in a uaccess
region but are being blocked unexpectedly by KUAP.

> NOTE! on x86, we can have "all fixup_exceptions() will clear AC in the
> exception pt_regs", because AC is part of rflags which is saved on
> (and cleared for the duration of) all interrupt and exceptions.
>
> So what happens is that on x86 all of (b)-(d) will run with AC clear
> and no user accesses allowed, and (e) will have user accesses enabled
> again, because the "fixup_exception()" at (d) only affected the state
> of the interrupt hander (which already had AC clear anyway).
>
> But I don't think exceptions and interrupts save/restore the user
> access state on powerpc, do they?

Not implicitly.

We manually save it into pt_regs on the stack in the exception entry. On
64-bit it's done in kuap_save_amr_and_lock. 32-bit does it in
kuap_save_and_lock.

And then on the return path it's kuap_restore_amr() on 64-bit, and
kuap_restore on 32-bit.

> So on powerpc you do need to be more careful. You would only need to
> disable user access on exceptions that happen _on_ user accesses.
>
> The easiest way to do that is to do what x86 does: different
> exceptions have different handlers. It's not what we did originally,
> but it's been useful.
>
> Hmm.
>
> And again, on x86, this all works fine because of how exceptions
> save/restore the user_access state and it all nests fine. But I'm
> starting to wonder how the nesting works AT ALL for powerpc?
>
> Because that nesting happens even without
>
> IOW, even aside from this whole thing, what happens on PPC, when you have

I'll annotate what happens for the 64-bit case as it's the one I know
best:

>  (a) user_access_begin();
 - mtspr(SPRN_AMR, 0)   // 0 means loads & stores permitted

>  - profile NMI or interrupt happens
 - pt_regs->kuap = mfspr(SPRN_AMR)
 - mtspr(SPRN_AMR, AMR_KUAP_BLOCKED)

>  - it wants to do user stack tracing so does
> pagefault_disable();
>(b) get_user();
  - mtspr(SPRN_AMR, 0)
  - ld rN, <user address>
 pagefault_enable();
>- profile NMI/interrupt returns
   - mtspr(SPRN_AMR, pt_regs->kuap)
   - return from interrupt

>  (c) user accesss should work here!
>
> even if the "get_user()" in (b) would have done a
> "user_access_begin/end" pair, and regardless of whether (b) might have
> triggered a "fixup_exception()", and whether that fixup_exception()
> then did the user_access_end().
>
> On x86, this is all ok exactly because of how we only have the AC bit,
> and it nests very naturally with any exception handling.
>
> Is the ppc code nesting-safe? Particularly since it has that whole
> range-handling?

Yeah I think it is.

The range handling on 32-bit Book3S follows the same pattern as above,
except that on exception entry we don't save the content of an SPR to
pt_regs; instead we save current->thread.kuap (because there isn't a
single SPR that contains the KUAP state).

cheers


[PATCH v10 3/5] arm64: kdump: add memory for devices by DT property linux,usable-memory-range

2020-07-02 Thread Chen Zhou
If we want to reserve crashkernel above 4G, we can use the parameters
"crashkernel=X crashkernel=Y,low". In this case, the specified amount
of low memory is reserved for crash dump kernel devices and is never
mapped by the first kernel. This memory range is advertised to the
crash dump kernel via a DT property under /chosen,
linux,usable-memory-range = <BASE1 SIZE1 BASE2 SIZE2>

We reuse the DT property linux,usable-memory-range and make the low
memory region the second range "BASE2 SIZE2", which keeps compatibility
with existing user-space and older kdump kernels.

The crash dump kernel reads this property at boot time and calls
memblock_add() to add the low memory region after
memblock_cap_memory_range() has been called.

Signed-off-by: Chen Zhou 
Tested-by: John Donnelly 
Tested-by: Prabhakar Kushwaha 
---
 arch/arm64/mm/init.c | 43 +--
 1 file changed, 33 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index ce7ced85f5fb..f5b31e8f1f34 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -69,6 +69,15 @@ EXPORT_SYMBOL(vmemmap);
 phys_addr_t arm64_dma_phys_limit __ro_after_init;
 static phys_addr_t arm64_dma32_phys_limit __ro_after_init;
 
+/*
+ * The main usage of linux,usable-memory-range is for the crash dump kernel.
+ * Originally there was only one usable-memory region. Now the crash dump
+ * kernel supports at most two regions, a low region and a high region.
+ * To keep compatibility with existing user-space and older kdump kernels,
+ * the low region is always the last range of linux,usable-memory-range,
+ * if it exists.
+ */
+#define MAX_USABLE_RANGES  2
+
 #ifdef CONFIG_KEXEC_CORE
 /*
  * reserve_crashkernel() - reserves memory for crash kernel
@@ -272,9 +281,9 @@ early_param("mem", early_mem);
 static int __init early_init_dt_scan_usablemem(unsigned long node,
const char *uname, int depth, void *data)
 {
-   struct memblock_region *usablemem = data;
-   const __be32 *reg;
-   int len;
+   struct memblock_region *usable_rgns = data;
+   const __be32 *reg, *endp;
+   int len, nr = 0;
 
if (depth != 1 || strcmp(uname, "chosen") != 0)
return 0;
@@ -283,22 +292,36 @@ static int __init early_init_dt_scan_usablemem(unsigned long node,
if (!reg || (len < (dt_root_addr_cells + dt_root_size_cells)))
return 1;
 
-   usablemem->base = dt_mem_next_cell(dt_root_addr_cells, &reg);
-   usablemem->size = dt_mem_next_cell(dt_root_size_cells, &reg);
+   endp = reg + (len / sizeof(__be32));
+   while ((endp - reg) >= (dt_root_addr_cells + dt_root_size_cells)) {
+   usable_rgns[nr].base = dt_mem_next_cell(dt_root_addr_cells, &reg);
+   usable_rgns[nr].size = dt_mem_next_cell(dt_root_size_cells, &reg);
+
+   if (++nr >= MAX_USABLE_RANGES)
+   break;
+   }
 
return 1;
 }
 
 static void __init fdt_enforce_memory_region(void)
 {
-   struct memblock_region reg = {
-   .size = 0,
+   struct memblock_region usable_rgns[MAX_USABLE_RANGES] = {
+   { .size = 0 },
+   { .size = 0 }
};
 
-   of_scan_flat_dt(early_init_dt_scan_usablemem, &reg);
+   of_scan_flat_dt(early_init_dt_scan_usablemem, &usable_rgns);
 
-   if (reg.size)
-   memblock_cap_memory_range(reg.base, reg.size);
+   /*
+    * The first range is the whole usable region when there is only
+    * one region, or the high region when there are two; the second
+    * range, if it exists, is dedicated to the low region.
+    */
+   if (usable_rgns[0].size)
+   memblock_cap_memory_range(usable_rgns[0].base, usable_rgns[0].size);
+   if (usable_rgns[1].size)
+   memblock_add(usable_rgns[1].base, usable_rgns[1].size);
 }
 
 void __init arm64_memblock_init(void)
-- 
2.20.1



[PATCH v10 4/5] arm64: kdump: fix kdump broken with ZONE_DMA reintroduced

2020-07-02 Thread Chen Zhou
commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32")
broke arm64 kdump. If the memory reserved for the crash dump kernel
falls in ZONE_DMA32, devices in the crash dump kernel that need
ZONE_DMA will fail to allocate.

This patch addresses the above issue based on "reserving crashkernel
above 4G". Originally we reserved low memory below 4G; now we just need
to adjust the memory limit to arm64_dma_phys_limit in
reserve_crashkernel_low() if ZONE_DMA is enabled. That is, if there are
devices that need ZONE_DMA in the crash dump kernel, it is a good
choice to use the parameters "crashkernel=X crashkernel=Y,low".

Signed-off-by: Chen Zhou 
---
 kernel/crash_core.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index a7580d291c37..e8ecbbc761a3 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -320,6 +320,7 @@ int __init reserve_crashkernel_low(void)
unsigned long long base, low_base = 0, low_size = 0;
unsigned long total_low_mem;
int ret;
+   phys_addr_t crash_max = 1ULL << 32;
 
total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
 
@@ -352,7 +353,11 @@ int __init reserve_crashkernel_low(void)
return 0;
}
 
-   low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
+#ifdef CONFIG_ARM64
+   if (IS_ENABLED(CONFIG_ZONE_DMA))
+   crash_max = arm64_dma_phys_limit;
+#endif
+   low_base = memblock_find_in_range(0, crash_max, low_size, CRASH_ALIGN);
if (!low_base) {
pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
   (unsigned long)(low_size >> 20));
-- 
2.20.1



[PATCH v10 2/5] arm64: kdump: reserve crashkernel above 4G for crash dump kernel

2020-07-02 Thread Chen Zhou
Crashkernel=X tries to reserve memory for the crash dump kernel under
4G. If crashkernel=Y,low is specified simultaneously, first reserve the
specified amount of low memory for crash dump kernel devices and then
reserve memory above 4G.

As suggested by James, just introduce crashkernel=X,low on arm64. As
mentioned above, if crashkernel=X,low is specified simultaneously,
first reserve the specified amount of low memory for crash dump kernel
devices and then reserve memory above 4G, which is much simpler.
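
For example (sizes are hypothetical), booting with:

  crashkernel=1G crashkernel=256M,low

first reserves 256M below 4G for crash dump kernel devices, and then
reserves 1G for the crash kernel itself, which may end up above 4G.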

Signed-off-by: Chen Zhou 
Tested-by: John Donnelly 
Tested-by: Prabhakar Kushwaha 
---
 arch/arm64/kernel/setup.c |  8 +++-
 arch/arm64/mm/init.c  | 31 +--
 2 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 93b3844cf442..4dc51a2ac012 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -238,7 +238,13 @@ static void __init request_standard_resources(void)
kernel_data.end <= res->end)
request_resource(res, &kernel_data);
 #ifdef CONFIG_KEXEC_CORE
-   /* Userspace will find "Crash kernel" region in /proc/iomem. */
+   /*
+* Userspace will find "Crash kernel" region in /proc/iomem.
+* Note: the low region is renamed as Crash kernel (low).
+*/
+   if (crashk_low_res.end && crashk_low_res.start >= res->start &&
+   crashk_low_res.end <= res->end)
+   request_resource(res, &crashk_low_res);
if (crashk_res.end && crashk_res.start >= res->start &&
crashk_res.end <= res->end)
request_resource(res, &crashk_res);
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 1e93cfc7c47a..ce7ced85f5fb 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -81,6 +81,7 @@ static void __init reserve_crashkernel(void)
 {
unsigned long long crash_base, crash_size;
int ret;
+   phys_addr_t crash_max = arm64_dma32_phys_limit;
 
ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
&crash_size, &crash_base);
@@ -88,12 +89,38 @@ static void __init reserve_crashkernel(void)
if (ret || !crash_size)
return;
 
+   ret = reserve_crashkernel_low();
+   if (!ret && crashk_low_res.end) {
+   /*
+    * If crashkernel=X,low is specified, there may be two regions,
+    * and we need to make the following changes:
+    *
+    * 1. rename the low region as "Crash kernel (low)"
+    * In order to distinguish it from the high region and avoid any
+    * effect on the use of existing kexec-tools, rename the low
+    * region as "Crash kernel (low)".
+    *
+    * 2. change the upper bound for crash memory
+    * Set the upper bound for crash memory to
+    * MEMBLOCK_ALLOC_ACCESSIBLE.
+    *
+    * 3. mark the low region as "nomap"
+    * The low region is intended for crash dump kernel devices, so
+    * simply mark the low region as "nomap".
+    */
+   const char *rename = "Crash kernel (low)";
+
+   crashk_low_res.name = rename;
+   crash_max = MEMBLOCK_ALLOC_ACCESSIBLE;
+   memblock_mark_nomap(crashk_low_res.start,
+   resource_size(&crashk_low_res));
+   }
+
crash_size = PAGE_ALIGN(crash_size);
 
if (crash_base == 0) {
/* Current arm64 boot protocol requires 2MB alignment */
-   crash_base = memblock_find_in_range(0, arm64_dma32_phys_limit,
-   crash_size, SZ_2M);
+   crash_base = memblock_find_in_range(0, crash_max, crash_size,
+   SZ_2M);
if (crash_base == 0) {
pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
crash_size);
-- 
2.20.1



[PATCH v10 5/5] kdump: update Documentation about crashkernel on arm64

2020-07-02 Thread Chen Zhou
Now that we support crashkernel=X[,low] on arm64, update the
Documentation. We can use the parameters "crashkernel=X crashkernel=Y,low"
to reserve memory above 4G.

Signed-off-by: Chen Zhou 
Tested-by: John Donnelly 
Tested-by: Prabhakar Kushwaha 
---
 Documentation/admin-guide/kdump/kdump.rst   | 14 --
 Documentation/admin-guide/kernel-parameters.txt | 17 +++--
 2 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
index 2da65fef2a1c..e80fc9e28a9a 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -299,7 +299,15 @@ Boot into System Kernel
"crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory
starting at physical address 0x0100 (16MB) for the dump-capture kernel.
 
-   On x86 and x86_64, use "crashkernel=64M@16M".
+   On x86 use "crashkernel=64M@16M".
+
+   On x86_64, use "crashkernel=Y" to select a region under 4G first, and
+   fall back to reserving a region above 4G.
+   We can also use "crashkernel=X,high" to select a region above 4G, which
+   also tries to allocate at least 256M below 4G automatically, and
+   "crashkernel=Y,low" can be used to allocate a specified amount of low
+   memory. Use "crashkernel=Y@X" if we really have to reserve memory from
+   a specified start address X.
 
On ppc64, use "crashkernel=128M@32M".
 
@@ -316,8 +324,10 @@ Boot into System Kernel
kernel will automatically locate the crash kernel image within the
first 512MB of RAM if X is not given.
 
-   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
+   On arm64, use "crashkernel=Y[@X]". Note that the start address of
the kernel, X if explicitly specified, must be aligned to 2MiB (0x200000).
+   If crashkernel=Z,low is specified simultaneously, reserve the specified
+   amount of low memory first and then reserve memory above 4G.
 
 Load the Dump-capture Kernel
 
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index fb95fad81c79..58a731eed011 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -722,6 +722,9 @@
[KNL, x86_64] select a region under 4G first, and
fall back to reserve region above 4G when '@offset'
hasn't been specified.
+   [KNL, arm64] If crashkernel=X,low is specified, reserve
+   the specified amount of low memory first, and then
+   reserve memory above 4G.
See Documentation/admin-guide/kdump/kdump.rst for further details.
 
crashkernel=range1:size1[,range2:size2,...][@offset]
@@ -746,13 +749,23 @@
requires at least 64M+32K low memory, also enough extra
low memory is needed to make sure DMA buffers for 32-bit
devices won't run out. Kernel would try to allocate at
-   at least 256M below 4G automatically.
+   least 256M below 4G automatically.
This one let user to specify own low range under 4G
for second kernel instead.
0: to disable low allocation.
It will be ignored when crashkernel=X,high is not used
or memory reserved is below 4G.
-
+   [KNL, arm64] range under 4G.
+   This one lets the user specify a low range under 4G
+   for the crash dump kernel instead.
+   Unlike x86_64, the kernel reserves the specified size
+   of physical memory only when this parameter is
+   specified, instead of trying to reserve at least 256M
+   below 4G automatically.
+   Use this parameter along with crashkernel=X when we
+   want to reserve crashkernel above 4G. It is also a
+   good choice if there are devices that need ZONE_DMA
+   in the crash dump kernel.
cryptomgr.notests
[KNL] Disable crypto self-tests
 
-- 
2.20.1



[PATCH v10 1/5] x86: kdump: move reserve_crashkernel_low() into crash_core.c

2020-07-02 Thread Chen Zhou
In preparation for supporting reserve_crashkernel_low in arm64 as
x86_64 does, move reserve_crashkernel_low() into kernel/crash_core.c.

Also, as suggested by Dave, change the x86_64 CRASH_ALIGN to 2M:
CONFIG_PHYSICAL_ALIGN can be selected from 2M to 16M, so use the same
alignment as arm64.

Note: on arm64 we reserve low memory if and only if crashkernel=X,low
is specified. Unlike x86_64, we don't set up low memory automatically.

Reported-by: kbuild test robot 
Signed-off-by: Chen Zhou 
Tested-by: John Donnelly 
Tested-by: Prabhakar Kushwaha 
Acked-by: Dave Young 
---
 arch/x86/kernel/setup.c| 66 -
 include/linux/crash_core.h |  3 ++
 include/linux/kexec.h  |  2 -
 kernel/crash_core.c| 85 ++
 kernel/kexec_core.c| 17 
 5 files changed, 96 insertions(+), 77 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index a3767e74c758..33db99ae3035 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -401,8 +401,8 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 
 #ifdef CONFIG_KEXEC_CORE
 
-/* 16M alignment for crash kernel regions */
-#define CRASH_ALIGN	SZ_16M
+/* 2M alignment for crash kernel regions */
+#define CRASH_ALIGN	SZ_2M
 
 /*
  * Keep the crash kernel below this limit.
@@ -425,59 +425,6 @@ static void __init memblock_x86_reserve_range_setup_data(void)
 # define CRASH_ADDR_HIGH_MAX   SZ_64T
 #endif
 
-static int __init reserve_crashkernel_low(void)
-{
-#ifdef CONFIG_X86_64
-   unsigned long long base, low_base = 0, low_size = 0;
-   unsigned long total_low_mem;
-   int ret;
-
-   total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
-
-   /* crashkernel=Y,low */
-   ret = parse_crashkernel_low(boot_command_line, total_low_mem, &low_size, &low_base);
-   if (ret) {
-   /*
-* two parts from kernel/dma/swiotlb.c:
-* -swiotlb size: user-specified with swiotlb= or default.
-*
-* -swiotlb overflow buffer: now hardcoded to 32k. We round it
-* to 8M for other buffers that may need to stay low too. Also
-* make sure we allocate enough extra low memory so that we
-* don't run out of DMA buffers for 32-bit devices.
-*/
-   low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
-   } else {
-   /* passed with crashkernel=0,low ? */
-   if (!low_size)
-   return 0;
-   }
-
-   low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
-   if (!low_base) {
-   pr_err("Cannot reserve %ldMB crashkernel low memory, please try smaller size.\n",
-  (unsigned long)(low_size >> 20));
-   return -ENOMEM;
-   }
-
-   ret = memblock_reserve(low_base, low_size);
-   if (ret) {
-   pr_err("%s: Error reserving crashkernel low memblock.\n", __func__);
-   return ret;
-   }
-
-   pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (System low RAM: %ldMB)\n",
-   (unsigned long)(low_size >> 20),
-   (unsigned long)(low_base >> 20),
-   (unsigned long)(total_low_mem >> 20));
-
-   crashk_low_res.start = low_base;
-   crashk_low_res.end   = low_base + low_size - 1;
-   insert_resource(&iomem_resource, &crashk_low_res);
-#endif
-   return 0;
-}
-
 static void __init reserve_crashkernel(void)
 {
unsigned long long crash_size, crash_base, total_mem;
@@ -541,9 +488,12 @@ static void __init reserve_crashkernel(void)
return;
}
 
-   if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
-   memblock_free(crash_base, crash_size);
-   return;
+   if (crash_base >= (1ULL << 32)) {
+   if (reserve_crashkernel_low()) {
+   memblock_free(crash_base, crash_size);
+   return;
+   }
+   insert_resource(&iomem_resource, &crashk_low_res);
}
 
pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index 525510a9f965..4df8c0bff03e 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -63,6 +63,8 @@ phys_addr_t paddr_vmcoreinfo_note(void);
 extern unsigned char *vmcoreinfo_data;
 extern size_t vmcoreinfo_size;
 extern u32 *vmcoreinfo_note;
+extern struct resource crashk_res;
+extern struct resource crashk_low_res;
 
 Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
  void *data, size_t data_len);
@@ -74,5 +76,6 @@ int parse_crashkernel_high(char *cmdline, unsigned long long 
system_ram,
unsigned long long *crash_size, unsigned long long *crash_base);
 int 

[PATCH v10 0/5] support reserving crashkernel above 4G on arm64 kdump

2020-07-02 Thread Chen Zhou
This patch series enables reserving crashkernel above 4G on arm64.

There are the following issues in arm64 kdump:
1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
when there is not enough low memory.
2. Currently, crashkernel=Y@X can be used to reserve crashkernel above
4G; in this case, if swiotlb or DMA buffers are required, the crash dump
kernel will fail to boot because there is no low memory available for
allocation.
3. commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32") broke
arm64 kdump. If the memory reserved for the crash dump kernel falls in
ZONE_DMA32, devices in the crash dump kernel that need ZONE_DMA will
fail to allocate.

To solve these issues, introduce crashkernel=X,low to reserve a
specified amount of low memory.
Crashkernel=X tries to reserve memory for the crash dump kernel under
4G. If crashkernel=Y,low is specified simultaneously, first reserve the
specified amount of low memory for crash dump kernel devices and then
reserve memory above 4G.

When crashkernel is reserved above 4G and crashkernel=X,low is
specified simultaneously, the kernel should reserve the specified
amount of low memory for crash dump kernel devices. So there may be two
crash kernel regions, one below 4G, the other above 4G.
In order to distinguish the low region from the high region and avoid
any effect on the use of existing kexec-tools, rename the low region as
"Crash kernel (low)", and pass the low region by reusing the DT
property "linux,usable-memory-range". We make the low memory region the
last range of "linux,usable-memory-range" to keep compatibility with
existing user-space and older kdump kernels.
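
For example (addresses and sizes are made up), the crash dump kernel
might then see:

  linux,usable-memory-range = <0x9 0x00000000 0x0 0x40000000
                               0x0 0x60000000 0x0 0x10000000>;

i.e. a 1G high region at 0x900000000 followed by a 256M low region at
0x60000000.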

Besides, we need to modify kexec-tools:
arm64: support more than one crash kernel region (see [1])

Another update is the documentation of the DT property 'linux,usable-memory-range':
schemas: update 'linux,usable-memory-range' node schema (see [2])

The previous changes and discussions can be retrieved from:

Changes since [v9]
- Patch 1: add Acked-by from Dave.
- Update patch 5 according to Dave's comments.
- Update chosen schema.

Changes since [v8]
- Reuse DT property "linux,usable-memory-range".
As suggested by Rob, reuse the DT property "linux,usable-memory-range" to
pass the low memory region.
- Fix kdump broken with ZONE_DMA reintroduced.
- Update chosen schema.

Changes since [v7]
- Move x86 CRASH_ALIGN to 2M
As suggested by Dave, and after some testing, move x86 CRASH_ALIGN to 2M.
- Update Documentation/devicetree/bindings/chosen.txt.
Add corresponding documentation to
Documentation/devicetree/bindings/chosen.txt, as suggested by Arnd.
- Add Tested-by from John and pk.

Changes since [v6]
- Fix build errors reported by kbuild test robot.

Changes since [v5]
- Move reserve_crashkernel_low() into kernel/crash_core.c.
- Delete crashkernel=X,high.
- Modify crashkernel=X,low.
If crashkernel=X,low is specified simultaneously, first reserve the
specified amount of low memory for crash dump kernel devices and then
reserve memory above 4G. In addition, rename crashk_low_res as "Crash
kernel (low)" for arm64, and then pass it to the crash dump kernel via
the DT property "linux,low-memory-range".
- Update Documentation/admin-guide/kdump/kdump.rst.

Changes since [v4]
- Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.

Changes since [v3]
- Add memblock_cap_memory_ranges back for multiple ranges.
- Fix some compiling warnings.

Changes since [v2]
- Split patch "arm64: kdump: support reserving crashkernel above 4G" into
two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
patch.

Changes since [v1]:
- Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
- Remove the memblock_cap_memory_ranges() I added in v1 and implement
that in fdt_enforce_memory_region().
There are at most two crash kernel regions; for the two-region case, we
cap the memory range [min(regs[*].start), max(regs[*].end)] and then
remove the memory range in the middle, as sketched below.
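
A minimal sketch of that capping scheme (assuming regs[0] and regs[1]
are the two regions, with regs[0] the lower one):

static void __init cap_crash_regions(struct memblock_region *regs)
{
	phys_addr_t hole_start = regs[0].base + regs[0].size;

	/* keep only [regs[0].base, end of regs[1]) ... */
	memblock_cap_memory_range(regs[0].base,
				  regs[1].base + regs[1].size - regs[0].base);
	/* ... then drop the gap between the two regions */
	memblock_remove(hole_start, regs[1].base - hole_start);
}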

[1]: http://lists.infradead.org/pipermail/kexec/2020-June/020737.html
[2]: https://github.com/robherring/dt-schema/pull/19 
[v1]: https://lkml.org/lkml/2019/4/2/1174
[v2]: https://lkml.org/lkml/2019/4/9/86
[v3]: https://lkml.org/lkml/2019/4/9/306
[v4]: https://lkml.org/lkml/2019/4/15/273
[v5]: https://lkml.org/lkml/2019/5/6/1360
[v6]: https://lkml.org/lkml/2019/8/30/142
[v7]: https://lkml.org/lkml/2019/12/23/411
[v8]: https://lkml.org/lkml/2020/5/21/213
[v9]: https://lkml.org/lkml/2020/6/28/73

Chen Zhou (5):
  x86: kdump: move reserve_crashkernel_low() into crash_core.c
  arm64: kdump: reserve crashkernel above 4G for crash dump kernel
  arm64: kdump: add memory for devices by DT property
linux,usable-memory-range
  arm64: kdump: fix kdump broken with ZONE_DMA reintroduced
  kdump: update Documentation about crashkernel on arm64

 Documentation/admin-guide/kdump/kdump.rst | 14 ++-
 .../admin-guide/kernel-parameters.txt | 17 +++-
 arch/arm64/kernel/setup.c |  8 +-
 arch/arm64/mm/init.c  | 74 ---
 

RE: [PATCH v3 02/14] iommu: Report domain nesting info

2020-07-02 Thread Liu, Yi L
> From: Alex Williamson 
> Sent: Friday, July 3, 2020 1:55 AM
> 
> On Wed, 24 Jun 2020 01:55:15 -0700
> Liu Yi L  wrote:
> 
> > IOMMUs that support nesting translation need to report the capability
> > info to userspace, e.g. the format of first level/stage paging structures.
> >
> > This patch reports nesting info by DOMAIN_ATTR_NESTING. Caller can get
> > nesting info after setting DOMAIN_ATTR_NESTING.
> >
> > v2 -> v3:
> > *) remove cap/ecap_mask in iommu_nesting_info.
> > *) reuse DOMAIN_ATTR_NESTING to get nesting info.
> > *) return an empty iommu_nesting_info for SMMU drivers per Jean'
> >suggestion.
> >
> > Cc: Kevin Tian 
> > CC: Jacob Pan 
> > Cc: Alex Williamson 
> > Cc: Eric Auger 
> > Cc: Jean-Philippe Brucker 
> > Cc: Joerg Roedel 
> > Cc: Lu Baolu 
> > Signed-off-by: Liu Yi L 
> > Signed-off-by: Jacob Pan 
> > ---
> >  drivers/iommu/arm-smmu-v3.c | 29 --
> >  drivers/iommu/arm-smmu.c| 29 --
> >  include/uapi/linux/iommu.h  | 59 +
> >  3 files changed, 113 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> > index f578677..0c45d4d 100644
> > --- a/drivers/iommu/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm-smmu-v3.c
> > @@ -3019,6 +3019,32 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev)
> > return group;
> >  }
> >
> > +static int arm_smmu_domain_nesting_info(struct arm_smmu_domain *smmu_domain,
> > +   void *data)
> > +{
> > +   struct iommu_nesting_info *info = (struct iommu_nesting_info *) data;
> > +   u32 size;
> > +
> > +   if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
> > +   return -ENODEV;
> > +
> > +   size = sizeof(struct iommu_nesting_info);
> > +
> > +   /*
> > +* if the provided buffer size is not equal to the expected size,
> > +* return 0 and report the expected buffer size to the caller.
> > +*/
> > +   if (info->size != size) {
> > +   info->size = size;
> > +   return 0;
> > +   }
> > +
> > +   /* report an empty iommu_nesting_info for now */
> > +   memset(info, 0x0, size);
> > +   info->size = size;
> > +   return 0;
> > +}
> > +
> >  static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
> >  enum iommu_attr attr, void *data)
> >  {
> > @@ -3028,8 +3054,7 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
> > case IOMMU_DOMAIN_UNMANAGED:
> > switch (attr) {
> > case DOMAIN_ATTR_NESTING:
> > -   *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
> > -   return 0;
> > +   return arm_smmu_domain_nesting_info(smmu_domain, data);
> > default:
> > return -ENODEV;
> > }
> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > index 243bc4c..908607d 100644
> > --- a/drivers/iommu/arm-smmu.c
> > +++ b/drivers/iommu/arm-smmu.c
> > @@ -1506,6 +1506,32 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev)
> > return group;
> >  }
> >
> > +static int arm_smmu_domain_nesting_info(struct arm_smmu_domain *smmu_domain,
> > +   void *data)
> > +{
> > +   struct iommu_nesting_info *info = (struct iommu_nesting_info *) data;
> > +   u32 size;
> > +
> > +   if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
> > +   return -ENODEV;
> > +
> > +   size = sizeof(struct iommu_nesting_info);
> > +
> > +   /*
> > +* if the provided buffer size is not equal to the expected size,
> > +* return 0 and report the expected buffer size to the caller.
> > +*/
> > +   if (info->size != size) {
> > +   info->size = size;
> > +   return 0;
> > +   }
> > +
> > +   /* report an empty iommu_nesting_info for now */
> > +   memset(info, 0x0, size);
> > +   info->size = size;
> > +   return 0;
> > +}
> > +
> >  static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
> >  enum iommu_attr attr, void *data)
> >  {
> > @@ -1515,8 +1541,7 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
> > case IOMMU_DOMAIN_UNMANAGED:
> > switch (attr) {
> > case DOMAIN_ATTR_NESTING:
> > -   *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
> > -   return 0;
> > +   return arm_smmu_domain_nesting_info(smmu_domain, data);
> > default:
> > return -ENODEV;
> > }
> > diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> > index 1afc661..898c99a 100644
> > --- a/include/uapi/linux/iommu.h
> > +++ b/include/uapi/linux/iommu.h
> > @@ -332,4 +332,63 @@ struct iommu_gpasid_bind_data {
> > } vendor;
> >  };
> >
> > +/*
> > + * struct iommu_nesting_info - Information for nesting-capable IOMMU.

RE: [PATCH v3 01/14] vfio/type1: Refactor vfio_iommu_type1_ioctl()

2020-07-02 Thread Liu, Yi L
> From: Alex Williamson 
> Sent: Friday, July 3, 2020 5:21 AM
> To: Liu, Yi L 
> 
> On Wed, 24 Jun 2020 01:55:14 -0700
> Liu Yi L  wrote:
> 
> > This patch refactors the vfio_iommu_type1_ioctl() to use switch
> > instead of if-else, and each cmd got a helper function.
> >
> > Cc: Kevin Tian 
> > CC: Jacob Pan 
> > Cc: Alex Williamson 
> > Cc: Eric Auger 
> > Cc: Jean-Philippe Brucker 
> > Cc: Joerg Roedel 
> > Cc: Lu Baolu 
> > Suggested-by: Christoph Hellwig 
> > Signed-off-by: Liu Yi L 
> > ---
> >  drivers/vfio/vfio_iommu_type1.c | 392 ++--
> >  1 file changed, 213 insertions(+), 179 deletions(-)
> 
> I can go ahead and grab this one for my v5.9 next branch.  Thanks,

thanks, that would be a great help. I'll monitor your next branch on github.

Regards,
Yi Liu

> Alex
> 
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index 5e556ac..7accb59 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -2453,6 +2453,23 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
> > return ret;
> >  }
> >
> > +static int vfio_iommu_type1_check_extension(struct vfio_iommu *iommu,
> > +   unsigned long arg)
> > +{
> > +   switch (arg) {
> > +   case VFIO_TYPE1_IOMMU:
> > +   case VFIO_TYPE1v2_IOMMU:
> > +   case VFIO_TYPE1_NESTING_IOMMU:
> > +   return 1;
> > +   case VFIO_DMA_CC_IOMMU:
> > +   if (!iommu)
> > +   return 0;
> > +   return vfio_domains_have_iommu_cache(iommu);
> > +   default:
> > +   return 0;
> > +   }
> > +}
> > +
> >  static int vfio_iommu_iova_add_cap(struct vfio_info_cap *caps,
> >  struct vfio_iommu_type1_info_cap_iova_range *cap_iovas,
> >  size_t size)
> > @@ -2529,238 +2546,255 @@ static int vfio_iommu_migration_build_caps(struct vfio_iommu *iommu,
> > return vfio_info_add_capability(caps, &cap_mig.header, sizeof(cap_mig));
> > }
> >
> > -static long vfio_iommu_type1_ioctl(void *iommu_data,
> > -  unsigned int cmd, unsigned long arg)
> > +static int vfio_iommu_type1_get_info(struct vfio_iommu *iommu,
> > +unsigned long arg)
> >  {
> > -   struct vfio_iommu *iommu = iommu_data;
> > +   struct vfio_iommu_type1_info info;
> > unsigned long minsz;
> > +   struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
> > +   unsigned long capsz;
> > +   int ret;
> >
> > -   if (cmd == VFIO_CHECK_EXTENSION) {
> > -   switch (arg) {
> > -   case VFIO_TYPE1_IOMMU:
> > -   case VFIO_TYPE1v2_IOMMU:
> > -   case VFIO_TYPE1_NESTING_IOMMU:
> > -   return 1;
> > -   case VFIO_DMA_CC_IOMMU:
> > -   if (!iommu)
> > -   return 0;
> > -   return vfio_domains_have_iommu_cache(iommu);
> > -   default:
> > -   return 0;
> > -   }
> > -   } else if (cmd == VFIO_IOMMU_GET_INFO) {
> > -   struct vfio_iommu_type1_info info;
> > -   struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
> > -   unsigned long capsz;
> > -   int ret;
> > -
> > -   minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
> > +   minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
> >
> > -   /* For backward compatibility, cannot require this */
> > -   capsz = offsetofend(struct vfio_iommu_type1_info, cap_offset);
> > +   /* For backward compatibility, cannot require this */
> > +   capsz = offsetofend(struct vfio_iommu_type1_info, cap_offset);
> >
> > -   if (copy_from_user(&info, (void __user *)arg, minsz))
> > -   return -EFAULT;
> > +   if (copy_from_user(&info, (void __user *)arg, minsz))
> > +   return -EFAULT;
> >
> > -   if (info.argsz < minsz)
> > -   return -EINVAL;
> > +   if (info.argsz < minsz)
> > +   return -EINVAL;
> >
> > -   if (info.argsz >= capsz) {
> > -   minsz = capsz;
> > -   info.cap_offset = 0; /* output, no-recopy necessary */
> > -   }
> > +   if (info.argsz >= capsz) {
> > +   minsz = capsz;
> > +   info.cap_offset = 0; /* output, no-recopy necessary */
> > +   }
> >
> > -   mutex_lock(&iommu->lock);
> > -   info.flags = VFIO_IOMMU_INFO_PGSIZES;
> > +   mutex_lock(&iommu->lock);
> > +   info.flags = VFIO_IOMMU_INFO_PGSIZES;
> >
> > -   info.iova_pgsizes = iommu->pgsize_bitmap;
> > +   info.iova_pgsizes = iommu->pgsize_bitmap;
> >
> > -   ret = vfio_iommu_migration_build_caps(iommu, &caps);
> > +   ret = vfio_iommu_migration_build_caps(iommu, &caps);
> >
> > -   if (!ret)
> > -   ret = vfio_iommu_iova_build_caps(iommu, &caps);
> > +   if (!ret)
> > +   ret = vfio_iommu_iova_build_caps(iommu, &caps);
> >
> > -   mutex_unlock(&iommu->lock);
> 
