[PATCH v2] coccinelle: api: add devm_platform_ioremap_resource script

2019-07-15 Thread Himanshu Jha
Use the recently introduced devm_platform_ioremap_resource()
helper, which wraps platform_get_resource() and
devm_ioremap_resource() together. This produces much
cleaner code and removes the local `struct resource` declaration.

Signed-off-by: Himanshu Jha 
Signed-off-by: Julia Lawall 
---

v2:
   - added SPDX License
   - used IORESOURCE_MEM instead of generic argument.
   - added Julia's SoB.

 .../api/devm_platform_ioremap_resource.cocci  | 60 +++
 1 file changed, 60 insertions(+)
 create mode 100644 scripts/coccinelle/api/devm_platform_ioremap_resource.cocci

diff --git a/scripts/coccinelle/api/devm_platform_ioremap_resource.cocci 
b/scripts/coccinelle/api/devm_platform_ioremap_resource.cocci
new file mode 100644
index ..56a2e261d61d
--- /dev/null
+++ b/scripts/coccinelle/api/devm_platform_ioremap_resource.cocci
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0
+/// Use devm_platform_ioremap_resource helper which wraps
+/// platform_get_resource() and devm_ioremap_resource() together.
+///
+// Confidence: High
+// Copyright: (C) 2019 Himanshu Jha GPLv2.
+// Copyright: (C) 2019 Julia Lawall, Inria/LIP6. GPLv2.
+// Keywords: platform_get_resource, devm_ioremap_resource,
+// Keywords: devm_platform_ioremap_resource
+
+virtual patch
+virtual report
+
+@r depends on patch && !report@
+expression e1, e2, arg1, arg2, arg3;
+identifier id;
+@@
+
+(
+- id = platform_get_resource(arg1, IORESOURCE_MEM, arg2);
+|
+- struct resource *id = platform_get_resource(arg1, IORESOURCE_MEM, arg2);
+)
+  ... when != id
+- e1 = devm_ioremap_resource(arg3, id);
++ e1 = devm_platform_ioremap_resource(arg1, arg2);
+  ... when != id
+? id = e2
+
+@r1 depends on patch && !report@
+identifier r.id;
+type T;
+@@
+
+- T *id;
+  ... when != id
+
+@r2 depends on report && !patch@
+identifier id;
+expression e1, e2, arg1, arg2, arg3;
+position j0;
+@@
+
+(
+  id = platform_get_resource(arg1, IORESOURCE_MEM, arg2);
+|
+  struct resource *id = platform_get_resource(arg1, IORESOURCE_MEM, arg2);
+)
+  ... when != id
+  e1@j0 = devm_ioremap_resource(arg3, id);
+  ... when != id
+? id = e2
+
+@script:python depends on report && !patch@
+e1 << r2.e1;
+j0 << r2.j0;
+@@
+
+msg = "WARNING: Use devm_platform_ioremap_resource for %s" % (e1)
+coccilib.report.print_report(j0[0], msg)
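
For readers who have not used the new helper, here is a user-space sketch of the before/after shapes the script rewrites. The stub types and functions are hypothetical stand-ins (the real devm_ioremap_resource() takes a struct device *, not NULL); only the calling pattern mirrors the kernel API.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* User-space stubs standing in for the kernel APIs, so the shapes the
 * semantic patch matches and produces can be compared side by side. */
struct resource { uintptr_t start; };
struct platform_device { struct resource res; };
#define IORESOURCE_MEM 0x00000200

static struct resource *platform_get_resource(struct platform_device *pdev,
					      unsigned int type,
					      unsigned int num)
{
	(void)type;
	(void)num;
	return &pdev->res;	/* stub: hand back the single resource */
}

static void *devm_ioremap_resource(void *dev, struct resource *res)
{
	(void)dev;
	return (void *)res->start;	/* stub: no real ioremap here */
}

/* The helper just chains the two calls, which is why the two-call
 * pattern can be folded into one. */
static void *devm_platform_ioremap_resource(struct platform_device *pdev,
					    unsigned int index)
{
	return devm_ioremap_resource(NULL,
			platform_get_resource(pdev, IORESOURCE_MEM, index));
}

/* Before: the pattern rule r matches, including the local declaration. */
static void *probe_before(struct platform_device *pdev)
{
	struct resource *res;
	void *base;

	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
	base = devm_ioremap_resource(NULL, res);
	return base;
}

/* After: what the script rewrites the probe to. */
static void *probe_after(struct platform_device *pdev)
{
	return devm_platform_ioremap_resource(pdev, 0);
}
```

Rule r matches the two-call pattern (with or without the local declaration), and rule r1 then removes the now-unused `struct resource *` variable.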
-- 
2.17.1



linux-next: Tree for Jul 16

2019-07-15 Thread Stephen Rothwell
Hi all,

Please do not add v5.4 material to your linux-next included branches
until after v5.3-rc1 has been released.

Changes since 20190715:

The kbuild tree gained a build failure for which I reverted 4 commits.

The pm tree lost its build failure.

The rdma tree gained a conflict against the v4l-dvb-next tree.

Non-merge commits (relative to Linus' tree): 4992
 4742 files changed, 602614 insertions(+), 111348 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.
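
The fetch-and-reset flow above can be tried safely against throwaway repositories; the commands below simulate a new linux-next release with two local scratch repositories (assumes git 2.28+ for `init -b`) rather than touching the real remote.

```shell
set -e
# Demo of the recommended update flow: "git fetch" plus a hard reset,
# never "git pull" (which would merge the new linux-next release into
# the old one). All paths here are scratch directories.
tmp=$(mktemp -d)
git init -q -b master "$tmp/remote"
git -C "$tmp/remote" -c user.name=t -c user.email=t@t \
    commit -q --allow-empty -m next-20190715
git clone -q "$tmp/remote" "$tmp/linux-next"
git -C "$tmp/remote" -c user.name=t -c user.email=t@t \
    commit -q --allow-empty -m next-20190716          # a new release appears
git -C "$tmp/linux-next" fetch -q origin              # fetch, don't pull
git -C "$tmp/linux-next" reset -q --hard origin/master
git -C "$tmp/linux-next" log -1 --format=%s           # prints next-20190716
```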

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 299 trees (counting Linus' and 72 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (fec88ab0af97 Merge tag 'for-linus-hmm' of 
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma)
Merging fixes/master (1e0fd3346b49 MIPS: fix some more fall through errors in 
arch/mips)
Merging kspp-gustavo/for-next/kspp (d93512ef0f0e Makefile: Globally enable 
fall-through warning)
Merging kbuild-current/fixes (964a4eacef67 Merge tag 'dlm-5.3' of 
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm)
Merging arc-current/for-curr (24a20b0a443f ARC: [plat-hsdk]: Enable AXI DW DMAC 
in defconfig)
CONFLICT (content): Merge conflict in arch/arc/mm/fault.c
Merging arm-current/fixes (c5d0e49e8d8f ARM: 8867/1: vdso: pass --be8 to linker 
if necessary)
Merging arm-soc-fixes/arm/fixes (2659dc8d225c Merge tag 
'davinci-fixes-for-v5.2-part2' of 
git://git.kernel.org/pub/scm/linux/kernel/git/nsekhar/linux-davinci into 
arm/fixes)
Merging arm64-fixes/for-next/fixes (aa69fb62bea1 arm64/efi: Mark 
__efistub_stext_offset as an absolute symbol explicitly)
Merging m68k-current/for-linus (f28a1f16135c m68k: Don't select 
ARCH_HAS_DMA_PREP_COHERENT for nommu or coldfire)
Merging powerpc-fixes/fixes (192f0f8e9db7 Merge tag 'powerpc-5.3-1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux)
Merging s390-fixes/fixes (9a159190414d s390/unwind: avoid int overflow in 
outside_of_stack)
Merging sparc/master (192f0f8e9db7 Merge tag 'powerpc-5.3-1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (f384e62a82ba ISDN: hfcsusb: checking idx of ep 
configuration)
Merging bpf/master (6da193569cbe Merge branch 'bpf-fix-wide-loads-sockaddr')
Merging ipsec/master (114a5c324015 Merge tag 'mlx5-fixes-2019-07-11' of 
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux)
Merging netfilter/master (1c5ec78272e3 netfilter: nfnetlink: avoid deadlock due 
to synchronous request_module)
Merging ipvs/master (58e8b37069ff Merge branch 'net-phy-dp83867-add-some-fixes')
Merging wireless-drivers/master (41a531ffa4c5 rt2x00usb: fix rx queue hang)
Merging mac80211/master (d2ce8d6bfcfe nl80211: Fix undefined behavior in bit 
shift)
Merging rdma-fixes/for-rc (4b972a01a7da Linux 5.2-rc6)
Merging sound-current/for-linus (2f235d92ac22 ALSA: rme9652: Unneeded variable: 
"result".)
Merging sound-asoc-fixes/for-linus (bd6fe4ae445e Merge branch 'asoc-5.2' into 
asoc-linus)
Merging regmap-fixes/for-linus (ea09b3e21f18 Merge branch 'regmap-5.2' into 
regmap-linus)
Merging regulator-fixes/for-linus (174bc9941d7d Merge branch 'regulator-5.2' 
into regulator-linus)
Merging spi-fixes/for-linus (70b5fa4cc32f Merge branch 'spi-5.2' into spi-linus)
Merging pci-current/for-linus (6

Re: [PATCH] mmc: host: sdhci: Fix the incorrect soft reset operation when runtime resuming

2019-07-15 Thread Baolin Wang
Hi Adrian,

On Mon, 15 Jul 2019 at 20:39, Adrian Hunter  wrote:
>
> On 15/07/19 2:37 PM, Baolin Wang wrote:
> > Hi Adrian,
> >
> > On Mon, 15 Jul 2019 at 19:20, Adrian Hunter  wrote:
> >>
> >> On 15/07/19 1:58 PM, Baolin Wang wrote:
> >>> In the sdhci_runtime_resume_host() function, we always do a software reset
> >>> for all, but according to the specification, we should issue a reset command
> >>> and reinitialize the SD/eMMC card.
> >>
> >> Where does it say that?
> >
> > I checked the SD host controller simplified specification Ver4.20, and
> > in Page 75, Software Reset For All bit, it says "if this bit is set
> > to1, the host driver should issue reset command and  reinitialize the
> > SD card". (I did not check other versions).
>
> That might simply be assuming that the bus power also controls the card power.

Yes.

>
> >
> >>
> >>> However, we only reinitialize the
> >>> SD/eMMC card when the card is powered down during runtime suspend.
> >>>
> >>> Thus, for those platforms that do not power down the SD/eMMC card during
> >>> runtime suspend, we should not do a software reset for all. To fix this
> >>> issue, we can add a condition that checks MMC_CAP_AGGRESSIVE_PM
> >>> to decide whether to do a software reset for all or to reset only the
> >>> command and data lines.
> >>>
> >>> Signed-off-by: Baolin Wang 
> >>> ---
> >>>  drivers/mmc/host/sdhci.c |2 +-
> >>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
> >>> index 9715834..470c5e0 100644
> >>> --- a/drivers/mmc/host/sdhci.c
> >>> +++ b/drivers/mmc/host/sdhci.c
> >>> @@ -,7 +,7 @@ int sdhci_runtime_resume_host(struct sdhci_host 
> >>> *host)
> >>>   host->ops->enable_dma(host);
> >>>   }
> >>>
> >>> - sdhci_init(host, 0);
> >>> + sdhci_init(host, !(mmc->caps & MMC_CAP_AGGRESSIVE_PM));
> >>
> >> We have done a full reset for a long time, so it would be surprising to 
> >> need
> >> to change it.
> >>
> >> What problem is it causing?
> >
> > If we did not power down the SD card during runtime suspend, and we
> > reset for all when runtime resume, our SD host controller can not work
> > well, will meet some strange behavior, like:
> >
> > [6.525397] mmc0: Got data interrupt 0x0002 even though no data
> > operation was in progress.
> > [6.534189] mmc0: sdhci:  SDHCI REGISTER DUMP ===
> > [6.540797] mmc0: sdhci: Sys addr:  0x0008 | Version:  0x0004
> > [6.547413] mmc0: sdhci: Blk size:  0x0200 | Blk cnt:  0x
> > [6.554029] mmc0: sdhci: Argument:  0x03200101 | Trn mode: 0x0033
> > [6.560645] mmc0: sdhci: Present:   0x01f000f0 | Host ctl: 0x0030
> > [6.567262] mmc0: sdhci: Power: 0x | Blk gap:  0x
> > [6.573877] mmc0: sdhci: Wake-up:   0x | Clock:0x0007
> > [6.580493] mmc0: sdhci: Timeout:   0x000e | Int stat: 0x
> > [6.587109] mmc0: sdhci: Int enab:  0x037f000b | Sig enab: 0x037f000b
> > [6.593726] mmc0: sdhci: ACmd stat: 0x | Slot int: 0x
> > [6.600342] mmc0: sdhci: Caps:  0x1c6d0080 | Caps_1:   0x0807
> > [6.606959] mmc0: sdhci: Cmd:   0x061b | Max curr: 0x00ff
> > [6.613574] mmc0: sdhci: Resp[0]:   0x1201 | Resp[1]:  0x
> > [6.620190] mmc0: sdhci: Resp[2]:   0x | Resp[3]:  0x
> > [6.626806] mmc0: sdhci: Host ctl2: 0x3807
> > [6.631364] mmc0: sdhci: ADMA Err:  0x | ADMA Ptr: 
> > 0xdf062000
> > [6.638697] mmc0: sdhci: 
> > [6.645379] mmc0: cache flush error -84
> >
> > Got data interrupt but no data commands are processing now. With this
> > patch, then our SD host controller can work well. Did I miss anything
> > else? Thanks.
>
> The response seems to show the card in state 9 bus-testing, which would
> suggest the use of CMD19 for eMMC.  Perhaps the wrong command is used for
> eMMC re-tuning?

So it is strange, for eMMC we use CMD21 for tuning.

>
> The difficulty with changing long standing flow is that it might reveal
> problems for other existing hardware.  Did you consider making a
> driver-specific change?  The ->reset() callback could be used.

Understood. I will find a way to fix my issue and do not affect the
original logic used by other hardware. Thanks.

-- 
Baolin Wang
Best Regards
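
As a sketch of the behavior being proposed (not the actual sdhci.c code): sdhci_init()'s second argument selects a soft reset, so the patch effectively chooses the reset mask from the PM capability. The SDHCI_RESET_* values match the spec's software reset bits; the bit position used for MMC_CAP_AGGRESSIVE_PM below is illustrative, not copied from the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

#define SDHCI_RESET_ALL		0x01
#define SDHCI_RESET_CMD		0x02
#define SDHCI_RESET_DATA	0x04
#define MMC_CAP_AGGRESSIVE_PM	(1u << 7)	/* illustrative bit */

/* If the card stays powered across runtime suspend (no
 * MMC_CAP_AGGRESSIVE_PM), only the CMD and DATA lines are reset on
 * resume; otherwise the full reset is kept, since the card gets
 * reinitialized anyway. */
static uint8_t resume_reset_mask(uint32_t caps)
{
	int soft = !(caps & MMC_CAP_AGGRESSIVE_PM);	/* arg to sdhci_init() */

	return soft ? (SDHCI_RESET_CMD | SDHCI_RESET_DATA) : SDHCI_RESET_ALL;
}
```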


Re: [PATCH V5 11/18] clk: tegra210: Add support for Tegra210 clocks

2019-07-15 Thread Dmitry Osipenko
On Mon, 15 Jul 2019 21:37:09 -0700,
Sowjanya Komatineni  wrote:

> On 7/15/19 8:50 PM, Dmitry Osipenko wrote:
> > 16.07.2019 6:00, Sowjanya Komatineni wrote:  
> >> On 7/15/19 5:35 PM, Sowjanya Komatineni wrote:  
> >>> On 7/14/19 2:41 PM, Dmitry Osipenko wrote:  
>  13.07.2019 8:54, Sowjanya Komatineni wrote:  
> > On 6/29/19 8:10 AM, Dmitry Osipenko wrote:  
> >> 28.06.2019 5:12, Sowjanya Komatineni wrote:  
> >>> This patch adds system suspend and resume support for Tegra210
> >>> clocks.
> >>>
> >>> All the CAR controller settings are lost on suspend when core
> >>> power goes off.
> >>>
> >>> This patch has implementation for saving and restoring all
> >>> the PLLs and clocks context during system suspend and resume
> >>> to have the clocks back to same state for normal operation.
> >>>
> >>> Acked-by: Thierry Reding 
> >>> Signed-off-by: Sowjanya Komatineni 
> >>> ---
> >>>     drivers/clk/tegra/clk-tegra210.c | 115
> >>> ++-
> >>>     drivers/clk/tegra/clk.c  |  14 +
> >>>     drivers/clk/tegra/clk.h  |   1 +
> >>>     3 files changed, 127 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/drivers/clk/tegra/clk-tegra210.c
> >>> b/drivers/clk/tegra/clk-tegra210.c
> >>> index 1c08c53482a5..1b839544e086 100644
> >>> --- a/drivers/clk/tegra/clk-tegra210.c
> >>> +++ b/drivers/clk/tegra/clk-tegra210.c
> >>> @@ -9,10 +9,12 @@
> >>>     #include 
> >>>     #include 
> >>>     #include 
> >>> +#include 
> >>>     #include 
> >>>     #include 
> >>>     #include 
> >>>     #include 
> >>> +#include 
> >>>     #include 
> >>>     #include 
> >>>     #include 
> >>> @@ -20,6 +22,7 @@
> >>>     #include 
> >>>       #include "clk.h"
> >>> +#include "clk-dfll.h"
> >>>     #include "clk-id.h"
> >>>       /*
> >>> @@ -225,6 +228,7 @@
> >>>       #define CLK_RST_CONTROLLER_RST_DEV_Y_SET 0x2a8
> >>>     #define CLK_RST_CONTROLLER_RST_DEV_Y_CLR 0x2ac
> >>> +#define CPU_SOFTRST_CTRL 0x380
> >>>       #define LVL2_CLK_GATE_OVRA 0xf8
> >>>     #define LVL2_CLK_GATE_OVRC 0x3a0
> >>> @@ -2820,6 +2824,7 @@ static int tegra210_enable_pllu(void)
> >>>     struct tegra_clk_pll_freq_table *fentry;
> >>>     struct tegra_clk_pll pllu;
> >>>     u32 reg;
> >>> +    int ret;
> >>>       for (fentry = pll_u_freq_table; fentry->input_rate;
> >>> fentry++) {
> >>>     if (fentry->input_rate == pll_ref_freq)
> >>> @@ -2847,10 +2852,10 @@ static int tegra210_enable_pllu(void)
> >>>     fence_udelay(1, clk_base);
> >>>     reg |= PLL_ENABLE;
> >>>     writel(reg, clk_base + PLLU_BASE);
> >>> +    fence_udelay(1, clk_base);
> >>>     -    readl_relaxed_poll_timeout_atomic(clk_base +
> >>> PLLU_BASE, reg,
> >>> -  reg & PLL_BASE_LOCK, 2, 1000);
> >>> -    if (!(reg & PLL_BASE_LOCK)) {
> >>> +    ret = tegra210_wait_for_mask(&pllu, PLLU_BASE,
> >>> PLL_BASE_LOCK);
> >>> +    if (ret) {
> >>>     pr_err("Timed out waiting for PLL_U to lock\n");
> >>>     return -ETIMEDOUT;
> >>>     }
> >>> @@ -3283,6 +3288,103 @@ static void
> >>> tegra210_disable_cpu_clock(u32 cpu)
> >>>     }
> >>>       #ifdef CONFIG_PM_SLEEP
> >>> +static u32 cpu_softrst_ctx[3];
> >>> +static struct platform_device *dfll_pdev;
> >>> +#define car_readl(_base, _off) readl_relaxed(clk_base +
> >>> (_base) + ((_off) * 4))
> >>> +#define car_writel(_val, _base, _off) \
> >>> +    writel_relaxed(_val, clk_base + (_base) + ((_off) *
> >>> 4)) +
> >>> +static int tegra210_clk_suspend(void)
> >>> +{
> >>> +    unsigned int i;
> >>> +    struct device_node *node;
> >>> +
> >>> +    tegra_cclkg_burst_policy_save_context();
> >>> +
> >>> +    if (!dfll_pdev) {
> >>> +    node = of_find_compatible_node(NULL, NULL,
> >>> +   "nvidia,tegra210-dfll");
> >>> +    if (node)
> >>> +    dfll_pdev = of_find_device_by_node(node);
> >>> +
> >>> +    of_node_put(node);
> >>> +    if (!dfll_pdev)
> >>> +    pr_err("dfll node not found. no suspend for
> >>> dfll\n");
> >>> +    }
> >>> +
> >>> +    if (dfll_pdev)
> >>> +    tegra_dfll_suspend(dfll_pdev);
> >>> +
> >>> +    /* Enable PLLP_OUT_CPU after dfll suspend */
> >>> +    tegra_clk_set_pllp_out_cpu(true);
> >>> +
> >>> +    tegra_sclk_cclklp_burst_policy_save_context();
> >>> +
> >>> +    clk_save_context();
> >>> +
> >>> +    for (i = 0; i < ARRAY_SIZE(cpu_softrst_ctx); i++)
> >>> +    cpu_softrst_ctx[i] = car_readl(CPU_SOFTRST_CTRL, i);
> >>> +
> >>> +    return 0;
> >>> 

[LINUX PATCH v18 1/2] mtd: rawnand: nand_micron: Do not overwrite driver's read_page()/write_page()

2019-07-15 Thread Naga Sureshkumar Relli
Add checks before assigning chip->ecc.read_page() and chip->ecc.write_page()
so that hooks already installed by the driver are not overwritten.

Signed-off-by: Naga Sureshkumar Relli 
---
Changes in v18
 - None
---
 drivers/mtd/nand/raw/nand_micron.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/mtd/nand/raw/nand_micron.c 
b/drivers/mtd/nand/raw/nand_micron.c
index cbd4f09ac178..565f2696c747 100644
--- a/drivers/mtd/nand/raw/nand_micron.c
+++ b/drivers/mtd/nand/raw/nand_micron.c
@@ -500,8 +500,11 @@ static int micron_nand_init(struct nand_chip *chip)
chip->ecc.size = 512;
chip->ecc.strength = chip->base.eccreq.strength;
chip->ecc.algo = NAND_ECC_BCH;
-   chip->ecc.read_page = micron_nand_read_page_on_die_ecc;
-   chip->ecc.write_page = micron_nand_write_page_on_die_ecc;
+   if (!chip->ecc.read_page)
+   chip->ecc.read_page = micron_nand_read_page_on_die_ecc;
+
+   if (!chip->ecc.write_page)
+   chip->ecc.write_page = micron_nand_write_page_on_die_ecc;
 
if (ondie == MICRON_ON_DIE_MANDATORY) {
chip->ecc.read_page_raw = nand_read_page_raw_notsupp;
-- 
2.17.1



[LINUX PATCH v18 2/2] mtd: rawnand: pl353: Add basic driver for ARM PL353 SMC NAND interface

2019-07-15 Thread Naga Sureshkumar Relli
Add a driver for the ARM PL353 static memory controller NAND interface.
This controller is used in the Xilinx Zynq SoC for interfacing
NAND flash memory.

Reviewed-by: Helmut Grohne 
Signed-off-by: Naga Sureshkumar Relli 
---
xilinx zynq TRM link:
https://www.xilinx.com/support/documentation/user_guides/ug585-Zynq-7000-TRM.pdf

ARM pl353 smc TRM link:
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0380g/DDI0380G_smc_pl350_series_r2p1_trm.pdf

-> Tested Micron MT29F2G08ABAEAWP (On-die capable) and AMD/Spansion S34ML01G1.
-> Tested both x8 and x16 bus-widths.
-> Tested ubifs, mtd_debug tools and mtd-tests which exists in kernel as 
modules.
-> Tested jffs2

SMC memory controller driver is at drivers/memory/pl353-smc.c

Changes in v18:
 - Modified the driver's ->exec_op() method as per Boris' comments.
   Now in ->exec_op(), we no longer check for any NAND_CMDs
 - Removed checking of NAND_ROW_ADDR_3 inside the ->exec_op() method
 - Modified the driver's ->exec_op() PATTERN definitions to support
   all patterns; they are no longer specific to any particular
   pattern, as commented by Boris
Changes in v17:
 - This patch fixes the comments given by Helmut Grohne
 - Removed struct clk *mclk from struct pl353_nand_controller, as it is required
   only once during probe, so declare it inside the probe
 - Merged NAND_OP_DATA_IN_INSTR and NAND_OP_DATA_OUT_INSTR as single case in 
switch
 - Added updated Makefile and Kconfig file
 - Added pl353_nand_wait_ready(), in pl353_nand_write_page_raw() to wait for 
write
   to complete
Changes in v16:
 - Removed unnecessary comments
Changes in v15:
  All the comments given by Helmut Grohne to v14 are addressed in this series
  as mentioned below.
 - Removed below unused macros
   PL353_NAND_CMD_PHASE, PL353_NAND_DATA_PHASE and PL353_NAND_ECC_CONFIG
 - Used cond_resched() instead of cpu_relax() to eliminate the CPU spin for
   a full second
 - changed the size of cmnds[4] to cmnds[2]
 - Removed the unused variable end_cmd in struct pl353_nfc_op
 - Added new variable u16 addrs_56, instead of u32 addr5 and u32 addr6
 - Removed the unused variable cle_ale_delay_ns in struct pl353_nfc_op
 - Completely changed the nand_offset calculation; took a new variable
   called dataphase_addrflags and eliminated the casting with __force,
   just used offset + flags
 - in pl353_ecc_ooblayout64_free(), removed checking of section with
   ecc.steps, as section is 0 here
 - simplified the pl353_wait_for_dev_ready() and pl353_wait_for_ecc_done()
 - Updated the nfc_op->addrs calculation in pl353_nfc_parse_instructions()
 - Removed cond_delay(), instead used ndelay(), as it is sufficient
 - in pl353_nand_exec_op(), instead of assigning end_cmd twice, just assign
   it once by nfc_op.cmnds[1]
 - changed if (reading) to else in pl353_nand_exec_op()
 - Removed int err variable in pl353_nand_ecc_init(), instead just used
   single variable ret
 - Changed reading clock value by name rather than index in pl353_nand_probe()
 - Instead of always calling clk_get_rate(), stored the rate in the probe to a
   variable and used it later
Changes in v14:
 - Removed legacy hooks as per Miquel comments
Changes in v13:
 - Rebased the driver to mtd/next
Changes in v12:
 - Rebased the driver on top of v4.19 nand tree
 - Removed nand_scan_ident() and nand_scan_tail(), and added nand_controller_ops
   with ->attach_chip() and used nand_scan() instead.
 - Renamed pl353_nand_info structure to pl353_nand_controller
 - Renamed nand_base and nandaddr in pl353_nand_controller to 'regs' and 
'buf_addr'
 - Added new API pl353_wait_for_ecc_done() to wait for ecc done and call it from
   pl353_nand_write_page_hwecc() and pl353_nand_read_page_hwecc()
 - Defined new macro for max ECC blocks
 - Added return value check for ecc.calculate()
 - Renamed pl353_nand_cmd_function() to pl353_nand_exec_op_cmd()
 - Added x16 bus-width support
 - The dependent driver pl353-smc is already reviewed and hence dropped the
   smc driver
Changes in v11:
 - Removed Documentation patch and added the required info in driver as
   per Boris comments.
 - Removed unwanted variables from pl353_nand_info as per Miquel comments
 - Removed IO_ADDR_R/W.
 - Replaced onhot() with hweight32()
 - Defined macros for static values in function pl353_nand_correct_data()
 - Removed all unnecessary delays
 - Used nand_wait_ready() where ever is required
 - Modified the pl353_setup_data_interface() logic as per Miquel's comments.
 - Taken array instead of 7 values in pl353_setup_data_interface() and pass
   it to smc driver.
 - Added check to collect the return value of mtd_device_register().
Changes in v10:
 - Typo corrections, e.g. nand to NAND and soc to SoC
 - Defined macros for the values in pl353_nand_calculate_hwecc()
 - Modified ecc_status from int to char in pl353_nand_calculate_hwecc()
 - Changed the return type from int to bool for the function
   onehot()
 - Removed udelay(1000) in pl353_cmd_function, as it is not required
 - Dropped ecc->hwctl = NULL in 

[PATCH v5 3/4] media: venus: Update to bitrate based clock scaling

2019-07-15 Thread Aniket Masule
Introduce clock scaling using bitrate; the previous
calculation considered only the cycles per macroblock.
Also, clock scaling is now triggered before every
buffer is queued to the device. This helps in
deciding the precise clock cycles required.

Signed-off-by: Aniket Masule 
---
 drivers/media/platform/qcom/venus/helpers.c | 33 +
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/drivers/media/platform/qcom/venus/helpers.c 
b/drivers/media/platform/qcom/venus/helpers.c
index 2c976e4..edf403d 100644
--- a/drivers/media/platform/qcom/venus/helpers.c
+++ b/drivers/media/platform/qcom/venus/helpers.c
@@ -399,17 +399,26 @@ static int scale_clocks(struct venus_inst *inst)
return ret;
 }
 
-static unsigned long calculate_vpp_freq(struct venus_inst *inst)
+static unsigned long calculate_inst_freq(struct venus_inst *inst,
+unsigned long filled_len)
 {
-   unsigned long vpp_freq = 0;
+   unsigned long vpp_freq = 0, vsp_freq = 0;
+   u64 fps = inst->fps;
u32 mbs_per_sec;
 
mbs_per_sec = load_per_instance(inst);
vpp_freq = mbs_per_sec * inst->clk_data.codec_freq_data->vpp_freq;
/* 21 / 20 is overhead factor */
vpp_freq += vpp_freq / 20;
+   vsp_freq = mbs_per_sec * inst->clk_data.codec_freq_data->vsp_freq;
 
-   return vpp_freq;
+   /* 10 / 7 is overhead factor */
+   if (inst->session_type == VIDC_SESSION_TYPE_ENC)
+   vsp_freq += (inst->controls.enc.bitrate * 10) / 7;
+   else
+   vsp_freq += ((fps * filled_len * 8) * 10) / 7;
+
+   return max(vpp_freq, vsp_freq);
 }
 
 static int scale_clocks_v4(struct venus_inst *inst)
@@ -417,13 +426,27 @@ static int scale_clocks_v4(struct venus_inst *inst)
struct venus_core *core = inst->core;
const struct freq_tbl *table = core->res->freq_tbl;
unsigned int num_rows = core->res->freq_tbl_size;
+   struct v4l2_m2m_ctx *m2m_ctx = inst->m2m_ctx;
struct clk *clk = core->clks[0];
struct device *dev = core->dev;
unsigned int i;
unsigned long freq = 0, freq_core1 = 0, freq_core2 = 0;
+   unsigned long filled_len = 0;
+   struct venus_buffer *buf, *n;
+   struct vb2_buffer *vb;
int ret;
 
-   freq = calculate_vpp_freq(inst);
+   mutex_lock(&inst->lock);
+   v4l2_m2m_for_each_src_buf_safe(m2m_ctx, buf, n) {
+   vb = &buf->vb.vb2_buf;
+   filled_len = max(filled_len, vb2_get_plane_payload(vb, 0));
+   }
+   mutex_unlock(&inst->lock);
+
+   if (inst->session_type == VIDC_SESSION_TYPE_DEC && !filled_len)
+   return 0;
+
+   freq = calculate_inst_freq(inst, filled_len);
 
if (freq > table[0].freq)
dev_warn(dev, "HW is overloaded, needed: %lu max: %lu\n",
@@ -1093,6 +1116,8 @@ void venus_helper_vb2_buf_queue(struct vb2_buffer *vb)
if (ret)
goto unlock;
 
+   load_scale_clocks(inst);
+
ret = session_process_buf(inst, vbuf);
if (ret)
return_buf_error(inst, vbuf);
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
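
The frequency formula introduced above can be restated as plain arithmetic to see how the two overhead factors interact. This is a sketch, not kernel code; the parameter names are this sketch's own, and only the formulas come from the patch.

```c
#include <assert.h>

/* Plain-arithmetic restatement of calculate_inst_freq(). */
static unsigned long inst_freq(unsigned long mbs_per_sec,
			       unsigned long vpp_cycles,
			       unsigned long vsp_cycles,
			       unsigned long fps,
			       unsigned long filled_len,
			       int is_encoder,
			       unsigned long enc_bitrate)
{
	unsigned long vpp_freq = mbs_per_sec * vpp_cycles;
	unsigned long vsp_freq = mbs_per_sec * vsp_cycles;

	vpp_freq += vpp_freq / 20;	/* 21 / 20 overhead factor */

	if (is_encoder)
		vsp_freq += (enc_bitrate * 10) / 7;	/* 10 / 7 overhead */
	else
		vsp_freq += ((fps * filled_len * 8) * 10) / 7; /* bits/s */

	/* the instance needs the busier of the two pipelines */
	return vpp_freq > vsp_freq ? vpp_freq : vsp_freq;
}
```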



[PATCH v5 4/4] media: venus: Update core selection

2019-07-15 Thread Aniket Masule
The present core assignment is static. Introduce load balancing
across the cores: the load on each core is calculated and the core
with the minimum load is assigned to the given instance.

Signed-off-by: Aniket Masule 
---
 drivers/media/platform/qcom/venus/helpers.c| 69 +++---
 drivers/media/platform/qcom/venus/helpers.h|  2 +-
 drivers/media/platform/qcom/venus/hfi_helper.h |  1 +
 drivers/media/platform/qcom/venus/hfi_parser.h |  5 ++
 drivers/media/platform/qcom/venus/vdec.c   |  2 +-
 drivers/media/platform/qcom/venus/venc.c   |  2 +-
 6 files changed, 72 insertions(+), 9 deletions(-)

diff --git a/drivers/media/platform/qcom/venus/helpers.c 
b/drivers/media/platform/qcom/venus/helpers.c
index edf403d..d479793 100644
--- a/drivers/media/platform/qcom/venus/helpers.c
+++ b/drivers/media/platform/qcom/venus/helpers.c
@@ -26,6 +26,7 @@
 #include "helpers.h"
 #include "hfi_helper.h"
 #include "hfi_venus_io.h"
+#include "hfi_parser.h"
 
 struct intbuf {
struct list_head list;
@@ -331,6 +332,24 @@ static u32 load_per_instance(struct venus_inst *inst)
return mbs * inst->fps;
 }
 
+static u32 load_per_core(struct venus_core *core, u32 core_id)
+{
+   struct venus_inst *inst = NULL;
+   u32 mbs_per_sec = 0, load = 0;
+
+   mutex_lock(&core->lock);
+   list_for_each_entry(inst, &core->instances, list) {
+   if (inst->clk_data.core_id != core_id)
+   continue;
+
+   mbs_per_sec = load_per_instance(inst);
+   load = mbs_per_sec * inst->clk_data.codec_freq_data->vpp_freq;
+   }
+   mutex_unlock(&core->lock);
+
+   return load;
+}
+
 static u32 load_per_type(struct venus_core *core, u32 session_type)
 {
struct venus_inst *inst = NULL;
@@ -505,6 +524,16 @@ static int load_scale_clocks(struct venus_inst *inst)
return scale_clocks(inst);
 }
 
+int set_core_usage(struct venus_inst *inst, u32 usage)
+{
+   const u32 ptype = HFI_PROPERTY_CONFIG_VIDEOCORES_USAGE;
+   struct hfi_videocores_usage_type cu;
+
+   cu.video_core_enable_mask = usage;
+
+   return hfi_session_set_property(inst, ptype, &cu);
+}
+
 static void fill_buffer_desc(const struct venus_buffer *buf,
 struct hfi_buffer_desc *bd, bool response)
 {
@@ -808,19 +837,47 @@ int venus_helper_set_work_mode(struct venus_inst *inst, 
u32 mode)
 }
 EXPORT_SYMBOL_GPL(venus_helper_set_work_mode);
 
-int venus_helper_set_core_usage(struct venus_inst *inst, u32 usage)
+int venus_helper_set_core(struct venus_inst *inst)
 {
-   const u32 ptype = HFI_PROPERTY_CONFIG_VIDEOCORES_USAGE;
-   struct hfi_videocores_usage_type cu;
+   struct venus_core *core = inst->core;
+   u32 min_core_id = 0, core1_load = 0, core2_load = 0;
+   unsigned long min_load, max_freq, cur_inst_load;
+   u32 cores_max;
+   int ret;
 
if (!IS_V4(inst->core))
return 0;
 
-   cu.video_core_enable_mask = usage;
+   core1_load = load_per_core(core, VIDC_CORE_ID_1);
+   core2_load = load_per_core(core, VIDC_CORE_ID_2);
+   min_core_id = core1_load < core2_load ? VIDC_CORE_ID_1 : VIDC_CORE_ID_2;
+   min_load = min(core1_load, core2_load);
+   cores_max = core_num_max(inst);
 
-   return hfi_session_set_property(inst, ptype, &cu);
+   if (cores_max < VIDC_CORE_ID_2) {
+   min_core_id = VIDC_CORE_ID_1;
+   min_load = core1_load;
+   }
+
+   cur_inst_load = load_per_instance(inst) *
+   inst->clk_data.codec_freq_data->vpp_freq;
+   max_freq = core->res->freq_tbl[0].freq;
+
+   if ((cur_inst_load + min_load) > max_freq) {
+   dev_warn(core->dev, "HW is overloaded, needed: %lu max: %lu\n",
+cur_inst_load, max_freq);
+   return -EINVAL;
+   }
+
+   ret = set_core_usage(inst, min_core_id);
+   if (ret)
+   return ret;
+
+   inst->clk_data.core_id = min_core_id;
+
+   return 0;
 }
-EXPORT_SYMBOL_GPL(venus_helper_set_core_usage);
+EXPORT_SYMBOL_GPL(venus_helper_set_core);
 
 int venus_helper_init_codec_freq_data(struct venus_inst *inst)
 {
diff --git a/drivers/media/platform/qcom/venus/helpers.h 
b/drivers/media/platform/qcom/venus/helpers.h
index 2c13245..1034111 100644
--- a/drivers/media/platform/qcom/venus/helpers.h
+++ b/drivers/media/platform/qcom/venus/helpers.h
@@ -42,7 +42,7 @@ int venus_helper_set_output_resolution(struct venus_inst 
*inst,
   u32 buftype);
 int venus_helper_set_work_mode(struct venus_inst *inst, u32 mode);
 int venus_helper_init_codec_freq_data(struct venus_inst *inst);
-int venus_helper_set_core_usage(struct venus_inst *inst, u32 usage);
+int venus_helper_set_core(struct venus_inst *inst);
 int venus_helper_set_num_bufs(struct venus_inst *inst, unsigned int input_bufs,
  unsigned int output_bufs,
  unsigned int output2_bufs);
diff 
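
The balancing decision in venus_helper_set_core() above can be distilled as follows. The core ids, the "return 0 on overload" convention, and all parameter names are this sketch's own; the kernel code returns -EINVAL on overload and then programs the chosen core via set_core_usage().

```c
#include <assert.h>

/* Pick the least-loaded core for a new instance, or 0 if adding the
 * instance would exceed the top frequency-table entry. */
static unsigned int pick_core(unsigned long core1_load,
			      unsigned long core2_load,
			      unsigned int cores_max,
			      unsigned long cur_inst_load,
			      unsigned long max_freq,
			      unsigned long *min_load_out)
{
	unsigned int min_core_id = core1_load < core2_load ? 1 : 2;
	unsigned long min_load = core1_load < core2_load ? core1_load
							 : core2_load;

	if (cores_max < 2) {	/* firmware exposes a single core */
		min_core_id = 1;
		min_load = core1_load;
	}

	/* adding this instance must not push the core past the
	 * maximum supported frequency */
	if (cur_inst_load + min_load > max_freq)
		return 0;

	*min_load_out = min_load;
	return min_core_id;
}
```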

[PATCH v5 2/4] media: venus: Update clock scaling

2019-07-15 Thread Aniket Masule
The current clock scaling calculations are the same for vpu4 and
previous versions. For vpu4, the clock scaling calculations
are updated with cycles/mb. This helps in getting the precise
clock required.

Signed-off-by: Aniket Masule 
---
 drivers/media/platform/qcom/venus/helpers.c | 91 +++--
 1 file changed, 87 insertions(+), 4 deletions(-)

diff --git a/drivers/media/platform/qcom/venus/helpers.c 
b/drivers/media/platform/qcom/venus/helpers.c
index 7492373..2c976e4 100644
--- a/drivers/media/platform/qcom/venus/helpers.c
+++ b/drivers/media/platform/qcom/venus/helpers.c
@@ -348,8 +348,9 @@ static u32 load_per_type(struct venus_core *core, u32 
session_type)
return mbs_per_sec;
 }
 
-static int load_scale_clocks(struct venus_core *core)
+static int scale_clocks(struct venus_inst *inst)
 {
+   struct venus_core *core = inst->core;
const struct freq_tbl *table = core->res->freq_tbl;
unsigned int num_rows = core->res->freq_tbl_size;
unsigned long freq = table[0].freq;
@@ -398,6 +399,89 @@ static int load_scale_clocks(struct venus_core *core)
return ret;
 }
 
+static unsigned long calculate_vpp_freq(struct venus_inst *inst)
+{
+   unsigned long vpp_freq = 0;
+   u32 mbs_per_sec;
+
+   mbs_per_sec = load_per_instance(inst);
+   vpp_freq = mbs_per_sec * inst->clk_data.codec_freq_data->vpp_freq;
+   /* 21 / 20 is overhead factor */
+   vpp_freq += vpp_freq / 20;
+
+   return vpp_freq;
+}
+
+static int scale_clocks_v4(struct venus_inst *inst)
+{
+   struct venus_core *core = inst->core;
+   const struct freq_tbl *table = core->res->freq_tbl;
+   unsigned int num_rows = core->res->freq_tbl_size;
+   struct clk *clk = core->clks[0];
+   struct device *dev = core->dev;
+   unsigned int i;
+   unsigned long freq = 0, freq_core1 = 0, freq_core2 = 0;
+   int ret;
+
+   freq = calculate_vpp_freq(inst);
+
+   if (freq > table[0].freq)
+   dev_warn(dev, "HW is overloaded, needed: %lu max: %lu\n",
+freq, table[0].freq);
+
+   for (i = 0; i < num_rows; i++) {
+   if (freq > table[i].freq)
+   break;
+   freq = table[i].freq;
+   }
+
+   inst->clk_data.freq = freq;
+
+   mutex_lock(&core->lock);
+   list_for_each_entry(inst, &core->instances, list) {
+   if (inst->clk_data.core_id == VIDC_CORE_ID_1) {
+   freq_core1 += inst->clk_data.freq;
+   } else if (inst->clk_data.core_id == VIDC_CORE_ID_2) {
+   freq_core2 += inst->clk_data.freq;
+   } else if (inst->clk_data.core_id == VIDC_CORE_ID_3) {
+   freq_core1 += inst->clk_data.freq;
+   freq_core2 += inst->clk_data.freq;
+   }
+   }
+   mutex_unlock(&core->lock);
+
+   freq = max(freq_core1, freq_core2);
+
+   ret = clk_set_rate(clk, freq);
+   if (ret)
+   goto err;
+
+   ret = clk_set_rate(core->core0_clk, freq);
+   if (ret)
+   goto err;
+
+   ret = clk_set_rate(core->core1_clk, freq);
+   if (ret)
+   goto err;
+
+   return 0;
+
+err:
+   dev_err(dev, "failed to set clock rate %lu (%d)\n", freq, ret);
+   return ret;
+}
+
+static int load_scale_clocks(struct venus_inst *inst)
+{
+   if (IS_V4(inst->core))
+   return scale_clocks_v4(inst);
+
+   if (inst->state == INST_START)
+   return 0;
+
+   return scale_clocks(inst);
+}
+
 static void fill_buffer_desc(const struct venus_buffer *buf,
 struct hfi_buffer_desc *bd, bool response)
 {
@@ -1053,7 +1137,7 @@ void venus_helper_vb2_stop_streaming(struct vb2_queue *q)
 
venus_helper_free_dpb_bufs(inst);
 
-   load_scale_clocks(core);
+   load_scale_clocks(inst);
INIT_LIST_HEAD(&inst->registeredbufs);
}
 
@@ -1070,7 +1154,6 @@ void venus_helper_vb2_stop_streaming(struct vb2_queue *q)
 
 int venus_helper_vb2_start_streaming(struct venus_inst *inst)
 {
-   struct venus_core *core = inst->core;
int ret;
 
ret = intbufs_alloc(inst);
@@ -1081,7 +1164,7 @@ int venus_helper_vb2_start_streaming(struct venus_inst 
*inst)
if (ret)
goto err_bufs_free;
 
-   load_scale_clocks(core);
+   load_scale_clocks(inst);
 
ret = hfi_session_load_res(inst);
if (ret)
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH v5 1/4] media: venus: Add codec data table

2019-07-15 Thread Aniket Masule
Add vpp cycles for different types of codec.
It indicates the cycles required by the video hardware
to process each macroblock. Add vsp cycles, the cycles
required by the stream processor. Initialize the codec
data with core resources.

Signed-off-by: Aniket Masule 
---
 drivers/media/platform/qcom/venus/core.c| 13 +
 drivers/media/platform/qcom/venus/core.h| 16 +++
 drivers/media/platform/qcom/venus/helpers.c | 30 +
 drivers/media/platform/qcom/venus/helpers.h |  1 +
 drivers/media/platform/qcom/venus/vdec.c|  4 
 drivers/media/platform/qcom/venus/venc.c|  4 
 6 files changed, 68 insertions(+)

diff --git a/drivers/media/platform/qcom/venus/core.c 
b/drivers/media/platform/qcom/venus/core.c
index 7393667..ad6bb74 100644
--- a/drivers/media/platform/qcom/venus/core.c
+++ b/drivers/media/platform/qcom/venus/core.c
@@ -473,9 +473,22 @@ static __maybe_unused int venus_runtime_resume(struct 
device *dev)
{  244800, 1 }, /* 1920x1080@30 */
 };
 
+static struct codec_freq_data sdm845_codec_freq_data[] =  {
+   { V4L2_PIX_FMT_H264, VIDC_SESSION_TYPE_ENC, 675, 10 },
+   { V4L2_PIX_FMT_HEVC, VIDC_SESSION_TYPE_ENC, 675, 10 },
+   { V4L2_PIX_FMT_VP8, VIDC_SESSION_TYPE_ENC, 675, 10 },
+   { V4L2_PIX_FMT_MPEG2, VIDC_SESSION_TYPE_DEC, 200, 10 },
+   { V4L2_PIX_FMT_H264, VIDC_SESSION_TYPE_DEC, 200, 10 },
+   { V4L2_PIX_FMT_HEVC, VIDC_SESSION_TYPE_DEC, 200, 10 },
+   { V4L2_PIX_FMT_VP8, VIDC_SESSION_TYPE_DEC, 200, 10 },
+   { V4L2_PIX_FMT_VP9, VIDC_SESSION_TYPE_DEC, 200, 10 },
+};
+
 static const struct venus_resources sdm845_res = {
.freq_tbl = sdm845_freq_table,
.freq_tbl_size = ARRAY_SIZE(sdm845_freq_table),
+   .codec_freq_data = sdm845_codec_freq_data,
+   .codec_freq_data_size = ARRAY_SIZE(sdm845_codec_freq_data),
.clks = {"core", "iface", "bus" },
.clks_num = 3,
.max_load = 2563200,
diff --git a/drivers/media/platform/qcom/venus/core.h 
b/drivers/media/platform/qcom/venus/core.h
index 7a3feb5..b8aef19 100644
--- a/drivers/media/platform/qcom/venus/core.h
+++ b/drivers/media/platform/qcom/venus/core.h
@@ -35,12 +35,21 @@ struct reg_val {
u32 value;
 };
 
+struct codec_freq_data {
+   u32 pixfmt;
+   u32 session_type;
+   unsigned long vpp_freq;
+   unsigned long vsp_freq;
+};
+
 struct venus_resources {
u64 dma_mask;
const struct freq_tbl *freq_tbl;
unsigned int freq_tbl_size;
const struct reg_val *reg_tbl;
unsigned int reg_tbl_size;
+   const struct codec_freq_data *codec_freq_data;
+   unsigned int codec_freq_data_size;
const char * const clks[VIDC_CLKS_NUM_MAX];
unsigned int clks_num;
enum hfi_version hfi_version;
@@ -216,6 +225,12 @@ struct venus_buffer {
struct list_head ref_list;
 };
 
+struct clock_data {
+   u32 core_id;
+   unsigned long freq;
+   const struct codec_freq_data *codec_freq_data;
+};
+
 #define to_venus_buffer(ptr)   container_of(ptr, struct venus_buffer, vb)
 
 /**
@@ -275,6 +290,7 @@ struct venus_inst {
struct list_head list;
struct mutex lock;
struct venus_core *core;
+   struct clock_data clk_data;
struct list_head dpbbufs;
struct list_head internalbufs;
struct list_head registeredbufs;
diff --git a/drivers/media/platform/qcom/venus/helpers.c 
b/drivers/media/platform/qcom/venus/helpers.c
index 5cad601..7492373 100644
--- a/drivers/media/platform/qcom/venus/helpers.c
+++ b/drivers/media/platform/qcom/venus/helpers.c
@@ -715,6 +715,36 @@ int venus_helper_set_core_usage(struct venus_inst *inst, 
u32 usage)
 }
 EXPORT_SYMBOL_GPL(venus_helper_set_core_usage);
 
+int venus_helper_init_codec_freq_data(struct venus_inst *inst)
+{
+   const struct codec_freq_data *data;
+   unsigned int i, data_size;
+   u32 pixfmt;
+   int ret = 0;
+
+   if (!IS_V4(inst->core))
+   return 0;
+
+   data = inst->core->res->codec_freq_data;
+   data_size = inst->core->res->codec_freq_data_size;
+   pixfmt = inst->session_type == VIDC_SESSION_TYPE_DEC ?
+   inst->fmt_out->pixfmt : inst->fmt_cap->pixfmt;
+
+   for (i = 0; i < data_size; i++) {
+   if (data[i].pixfmt == pixfmt &&
+   data[i].session_type == inst->session_type) {
+   inst->clk_data.codec_freq_data = &data[i];
+   break;
+   }
+   }
+
+   if (!inst->clk_data.codec_freq_data)
+   ret = -EINVAL;
+
+   return ret;
+}
+EXPORT_SYMBOL_GPL(venus_helper_init_codec_freq_data);
+
 int venus_helper_set_num_bufs(struct venus_inst *inst, unsigned int input_bufs,
  unsigned int output_bufs,
  unsigned int output2_bufs)
diff --git a/drivers/media/platform/qcom/venus/helpers.h 
b/drivers/media/platform/qcom/venus/helpers.h
index 

[PATCH v5 0/4] media: venus: Update clock scaling and core selection

2019-07-15 Thread Aniket Masule
In this patch series, clock scaling and core selection methods are
updated. The current clock scaling and core selection methods are the
same for vpu4 and previous versions. Introducing load calculations using
vpp cycles, which indicate the cycles required by the video hardware to
process each macroblock. Also adding vsp cycles, the cycles required by
the stream processor. Clock scaling is now done more precisely using vpp
and vsp cycles. An instance is assigned to the core with minimum load,
instead of static assignment.
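The minimum-load core selection described above can be reduced to a toy model. All struct and function names below are illustrative assumptions for the sketch, not the driver's actual API; the only part taken from the series is the load formula (macroblocks/sec times vpp cycles per macroblock) and the pick-the-least-loaded-core policy.

```c
#include <assert.h>

/*
 * Toy model of min-load core selection: each instance contributes
 * load = macroblocks/sec * vpp cycles per macroblock, and a new
 * instance is placed on whichever core carries the smaller total.
 */
struct toy_inst {
	int core_id;               /* 1 or 2; 0 = unassigned */
	unsigned long mbs_per_sec; /* macroblocks per second */
	unsigned long vpp_cycles;  /* cycles per macroblock */
};

static unsigned long toy_core_load(const struct toy_inst *insts, int n,
				   int core_id)
{
	unsigned long load = 0;
	int i;

	for (i = 0; i < n; i++)
		if (insts[i].core_id == core_id)
			load += insts[i].mbs_per_sec * insts[i].vpp_cycles;
	return load;
}

static int toy_pick_core(const struct toy_inst *insts, int n)
{
	/* prefer core 1 on a tie, otherwise the less loaded core */
	return toy_core_load(insts, n, 1) <= toy_core_load(insts, n, 2) ? 1 : 2;
}
```

With one heavy instance already on core 1 and a light one on core 2, the next instance lands on core 2, which is the behavior the cover letter describes.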

Changes since v4:
 - Added call to load_scale_clocks from venus_helper_vb2_buf_queue.
 - Modified check to match core_id in core_selection.

Changes since v3:
 - vsp_cycles and vpp_cyles are now unsigned long.
 - Core number counting aligned with VIDC_CORE_ID_.
 - Aligned hardware overload handling of scale_clocks_v4 with scale_clocks.
 - Added bitrate based clock scaling patch in this patch series.
 - Instance state check is now moved from scale_clocks to load_scale_clocks.

Aniket Masule (4):
  media: venus: Add codec data table
  media: venus: Update clock scaling
  media: venus: Update to bitrate based clock scaling
  media: venus: Update core selection

 drivers/media/platform/qcom/venus/core.c   |  13 ++
 drivers/media/platform/qcom/venus/core.h   |  16 ++
 drivers/media/platform/qcom/venus/helpers.c| 215 +++--
 drivers/media/platform/qcom/venus/helpers.h|   3 +-
 drivers/media/platform/qcom/venus/hfi_helper.h |   1 +
 drivers/media/platform/qcom/venus/hfi_parser.h |   5 +
 drivers/media/platform/qcom/venus/vdec.c   |   6 +-
 drivers/media/platform/qcom/venus/venc.c   |   6 +-
 8 files changed, 252 insertions(+), 13 deletions(-)




[PATCH] remoteproc: qcom: Move glink_ssr notification after stop

2019-07-15 Thread Bjorn Andersson
glink_ssr is used to signal a remote processor "A" about the stopping of
another remote processor "B", so that in the event that remote processor
B is ever booted again the fifos of the glink channel between A and B are
in a known state.

But if remote processor A receives this notification before B is
actually stopped, the newly reset fifo indices will be interpreted as
there being "data" on the channel and either side of the channel will
enter a fatal error handler.

Move the glink_ssr notification to the "unprepare" state of the
rproc_subdev to avoid this issue.

This has the side effect of us not notifying the dying remote processor
itself about its fate, which has been seen to block in certain
resource-constrained scenarios.
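The index race described above can be illustrated with a toy model. This is purely illustrative and assumes nothing about glink internals beyond what the commit message states: "data available" is a mismatch between the producer and consumer indices, and resetting one side's index while the other side is still live manufactures phantom data.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy channel: non-equal indices are interpreted as pending data. */
struct toy_chan {
	unsigned int write_idx; /* advanced by the sender side */
	unsigned int read_idx;  /* advanced by the receiver side */
};

static bool toy_data_pending(const struct toy_chan *ch)
{
	return ch->write_idx != ch->read_idx;
}

/* Buggy order: the peer resets its index on notification before the
 * other processor has actually stopped, so the stale index remains. */
static bool toy_notify_before_stop(struct toy_chan *ch)
{
	ch->read_idx = 0;            /* reset on SSR notification */
	/* write_idx still holds the old value: phantom "data" */
	return toy_data_pending(ch);
}

/* Fixed order: stop (and reset) first, then notify. */
static bool toy_stop_then_notify(struct toy_chan *ch)
{
	ch->write_idx = 0;           /* processor stopped, fifo reset */
	ch->read_idx = 0;            /* peer resets on notification */
	return toy_data_pending(ch);
}
```

Moving the notification to the subdev's "unprepare" stage corresponds to the second ordering: by the time the peer resets its view, the fifo has already been quiesced.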

Signed-off-by: Bjorn Andersson 
---
 drivers/remoteproc/qcom_common.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/remoteproc/qcom_common.c b/drivers/remoteproc/qcom_common.c
index 6a448429f223..60650bcc8c67 100644
--- a/drivers/remoteproc/qcom_common.c
+++ b/drivers/remoteproc/qcom_common.c
@@ -200,7 +200,7 @@ void qcom_unregister_ssr_notifier(struct notifier_block *nb)
 }
 EXPORT_SYMBOL_GPL(qcom_unregister_ssr_notifier);
 
-static void ssr_notify_stop(struct rproc_subdev *subdev, bool crashed)
+static void ssr_notify_unprepare(struct rproc_subdev *subdev)
 {
struct qcom_rproc_ssr *ssr = to_ssr_subdev(subdev);
 
@@ -220,7 +220,7 @@ void qcom_add_ssr_subdev(struct rproc *rproc, struct 
qcom_rproc_ssr *ssr,
 const char *ssr_name)
 {
ssr->name = ssr_name;
-   ssr->subdev.stop = ssr_notify_stop;
+   ssr->subdev.unprepare = ssr_notify_unprepare;
 
rproc_add_subdev(rproc, &ssr->subdev);
 }
-- 
2.18.0



Re: linux-next: build failure after merge of the kbuild tree

2019-07-15 Thread Masahiro Yamada
Hi Stephen,

On Tue, Jul 16, 2019 at 1:31 PM Stephen Rothwell  wrote:
>
> Hi all,
>
> After merging the kbuild tree, today's linux-next build (x86_64
> allnoconfig) failed like this:
>
> make[1]: *** No rule to make target 'modules.order', needed by 
> 'autoksyms_recursive'.  Stop.
>
> Starting with commit
>
>   656c0ac3cb4b ("kbuild: modpost: read modules.order instead of 
> $(MODVERDIR)/*.mod")
>
> i.e. after reverting just:
>
> 25a55e249bc0 Revert "kbuild: create *.mod with full directory path and remove 
> MODVERDIR"
> 15657d9eceb6 Revert "kbuild: remove the first line of *.mod files"
> 8c21f4122839 Revert "kbuild: remove 'prepare1' target"
> 35bc41d44d2d Revert "kbuild: split out *.mod out of {single,multi}-used-m 
> rules"
>
> I get
>
> cat: modules.order: No such file or directory
>
> near the end of the allnoconfig build (but it does not fail).
>
> and when I remove 25a55e249bc0 i.e. stop reverting
>
>   0539f970a842 ("kbuild: create *.mod with full directory path and remove 
> MODVERDIR")
>
> the build fails.
>
> So, for today I have reverted these 4 commits:
>
>   56dce8121e97 ("kbuild: split out *.mod out of {single,multi}-used-m rules")
>   fbe0b5eb7890 ("kbuild: remove 'prepare1' target")
>   008fa222d268 ("kbuild: remove the first line of *.mod files")
>   0539f970a842 ("kbuild: create *.mod with full directory path and remove 
> MODVERDIR")
>


Ugh, sorry for the breakage.


For the build error, I will fix it as follows for tomorrow's linux-next:

diff --git a/Makefile b/Makefile
index bfb08cc647f8..5f3daca90862 100644
--- a/Makefile
+++ b/Makefile
@@ -1028,8 +1028,8 @@ vmlinux-deps := $(KBUILD_LDS)
$(KBUILD_VMLINUX_OBJS) $(KBUILD_VMLINUX_LIBS)

 # Recurse until adjust_autoksyms.sh is satisfied
 PHONY += autoksyms_recursive
-autoksyms_recursive: $(vmlinux-deps) modules.order
 ifdef CONFIG_TRIM_UNUSED_KSYMS
+autoksyms_recursive: $(vmlinux-deps) modules.order
$(Q)$(CONFIG_SHELL) $(srctree)/scripts/adjust_autoksyms.sh \
  "$(MAKE) -f $(srctree)/Makefile vmlinux"
 endif



For the warning, I will fix it as follows for tomorrow's linux-next:

diff --git a/scripts/Makefile.modpost b/scripts/Makefile.modpost
index 3e229d4f4d72..6b19c1a4eae5 100644
--- a/scripts/Makefile.modpost
+++ b/scripts/Makefile.modpost
@@ -64,7 +64,9 @@ modulesymfile := $(firstword $(KBUILD_EXTMOD))/Module.symvers
 modorder := $(if $(KBUILD_EXTMOD),$(KBUILD_EXTMOD)/)modules.order

 # Step 1), find all modules listed in modules.order
+ifdef CONFIG_MODULES
 modules := $(sort $(shell cat $(modorder)))
+endif

 # Stop after building .o files if NOFINAL is set. Makes compile tests quicker
 _modpost: $(if $(KBUILD_MODPOST_NOFINAL), $(modules:.ko:.o),$(modules))


Thanks!

-- 
Best Regards
Masahiro Yamada


[PATCH] staging: vt6656: change alignment to match parenthesis

2019-07-15 Thread Benjamin Sherman
Change indentation to match parentheses.  This complies with the Linux
kernel coding style and improves readability.

Signed-off-by: Benjamin Sherman 
---
 drivers/staging/vt6656/rxtx.c| 10 +-
 drivers/staging/vt6656/usbpipe.c |  2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/vt6656/rxtx.c b/drivers/staging/vt6656/rxtx.c
index 9def0748ffee..4e9cfacf75f2 100644
--- a/drivers/staging/vt6656/rxtx.c
+++ b/drivers/staging/vt6656/rxtx.c
@@ -287,12 +287,12 @@ static u16 vnt_rxtx_datahead_g(struct 
vnt_usb_send_context *tx_context,
buf->duration_a = vnt_get_duration_le(priv,
tx_context->pkt_type, need_ack);
buf->duration_b = vnt_get_duration_le(priv,
-   PK_TYPE_11B, need_ack);
+ PK_TYPE_11B, need_ack);
}
 
buf->time_stamp_off_a = vnt_time_stamp_off(priv, rate);
buf->time_stamp_off_b = vnt_time_stamp_off(priv,
-   priv->top_cck_basic_rate);
+  priv->top_cck_basic_rate);
 
tx_context->tx_hdr_size = vnt_mac_hdr_pos(tx_context, &buf->hdr);
 
@@ -325,7 +325,7 @@ static u16 vnt_rxtx_datahead_g_fb(struct 
vnt_usb_send_context *tx_context,
 
buf->time_stamp_off_a = vnt_time_stamp_off(priv, rate);
buf->time_stamp_off_b = vnt_time_stamp_off(priv,
-   priv->top_cck_basic_rate);
+  priv->top_cck_basic_rate);
 
tx_context->tx_hdr_size = vnt_mac_hdr_pos(tx_context, &buf->hdr);
 
@@ -655,7 +655,7 @@ static u16 vnt_rxtx_ab(struct vnt_usb_send_context 
*tx_context,
u8 need_ack = tx_context->need_ack;
 
buf->rrv_time = vnt_rxtx_rsvtime_le16(priv, tx_context->pkt_type,
-   frame_len, current_rate, need_ack);
+ frame_len, current_rate, 
need_ack);
 
if (need_mic)
head = &tx_head->tx_ab.tx.mic.head;
@@ -1036,7 +1036,7 @@ static int vnt_beacon_xmit(struct vnt_private *priv, 
struct sk_buff *skb)
 
/* Get Duration and TimeStampOff */
short_head->duration = vnt_get_duration_le(priv,
-   PK_TYPE_11B, false);
+  PK_TYPE_11B, false);
short_head->time_stamp_off =
vnt_time_stamp_off(priv, current_rate);
}
diff --git a/drivers/staging/vt6656/usbpipe.c b/drivers/staging/vt6656/usbpipe.c
index ff351a7a0876..d3304df6bd53 100644
--- a/drivers/staging/vt6656/usbpipe.c
+++ b/drivers/staging/vt6656/usbpipe.c
@@ -216,7 +216,7 @@ static void vnt_submit_rx_urb_complete(struct urb *urb)
}
 
urb->transfer_buffer = skb_put(rcb->skb,
-   skb_tailroom(rcb->skb));
+  skb_tailroom(rcb->skb));
}
 
if (usb_submit_urb(urb, GFP_ATOMIC)) {
-- 
2.22.0



Re: [Linux-kernel-mentees] [PATCH v3] PCI: Remove functions not called in include/linux/pci.h

2019-07-15 Thread Lukas Bulwahn



On Mon, 15 Jul 2019, Kelsey Skunberg wrote:

> Remove the following uncalled functions from include/linux/pci.h:
> 
> pci_block_cfg_access()
> pci_block_cfg_access_in_atomic()
> pci_unblock_cfg_access()
> 
> Functions were added in commit fb51ccbf217c ("PCI: Rework config space
> blocking services"), though no callers were added. Code continues to be
> unused and should be removed.
> 
> Signed-off-by: Kelsey Skunberg 
> ---
> 
> Changes since v1:
>   - Fixed Signed-off-by line to show full name
> 
> Changes since v2:
>   - Change commit message to reference prior commit properly with the
> following format:
>   commit <12-character sha prefix> ("")
> 
>  include/linux/pci.h | 5 -
>  1 file changed, 5 deletions(-)
> 
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index cf380544c700..3c9ba6133bea 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1656,11 +1656,6 @@ static inline void pci_release_regions(struct pci_dev 
> *dev) { }
>  
>  static inline unsigned long pci_address_to_pio(phys_addr_t addr) { return 
> -1; }
>  
> -static inline void pci_block_cfg_access(struct pci_dev *dev) { }
> -static inline int pci_block_cfg_access_in_atomic(struct pci_dev *dev)
> -{ return 0; }
> -static inline void pci_unblock_cfg_access(struct pci_dev *dev) { }
> -
>  static inline struct pci_bus *pci_find_next_bus(const struct pci_bus *from)
>  { return NULL; }
>  static inline struct pci_dev *pci_get_slot(struct pci_bus *bus,
> -- 
> 2.20.1

I just checked with elixir on v5.2 that all three identifiers are never 
referenced beyond their definitions in pci.h:

https://elixir.bootlin.com/linux/v5.2/ident/pci_block_cfg_access
https://elixir.bootlin.com/linux/v5.2/ident/pci_block_cfg_access_in_atomic
https://elixir.bootlin.com/linux/v5.2/ident/pci_unblock_cfg_access

So, what it is worth:

Reviewed-by: Lukas Bulwahn 

Lukas

> 
> ___
> Linux-kernel-mentees mailing list
> linux-kernel-ment...@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/linux-kernel-mentees
> 


Re: linux-next: build warning after merge of the ntb tree

2019-07-15 Thread Stephen Rothwell
Hi Logan,

On Mon, 15 Jul 2019 21:46:42 -0600 Logan Gunthorpe  wrote:
>
> I renamed the ntb.c file to core.c so we could add more files to build
> ntb.ko. See [1].

But that was last changed in June, so I assume that some change to the
build system has caused this warning to be produced now.

-- 
Cheers,
Stephen Rothwell




Re: [PATCH v4 4/4] media: venus: Update core selection

2019-07-15 Thread amasule

On 2019-07-15 21:30, Stanimir Varbanov wrote:

Hi,

On 7/2/19 5:46 PM, Aniket Masule wrote:

Present core assignment is static. Introduced load balancing
across the cores. Load on each core is calculated and the core
with minimum load is assigned to the given instance.

Signed-off-by: Aniket Masule 
---
 drivers/media/platform/qcom/venus/helpers.c| 69 
+++---

 drivers/media/platform/qcom/venus/helpers.h|  2 +-
 drivers/media/platform/qcom/venus/hfi_helper.h |  1 +
 drivers/media/platform/qcom/venus/hfi_parser.h |  5 ++
 drivers/media/platform/qcom/venus/vdec.c   |  2 +-
 drivers/media/platform/qcom/venus/venc.c   |  2 +-
 6 files changed, 72 insertions(+), 9 deletions(-)

diff --git a/drivers/media/platform/qcom/venus/helpers.c 
b/drivers/media/platform/qcom/venus/helpers.c

index 5726d86..321e9f7 100644
--- a/drivers/media/platform/qcom/venus/helpers.c
+++ b/drivers/media/platform/qcom/venus/helpers.c
@@ -26,6 +26,7 @@
 #include "helpers.h"
 #include "hfi_helper.h"
 #include "hfi_venus_io.h"
+#include "hfi_parser.h"

 struct intbuf {
struct list_head list;
@@ -331,6 +332,24 @@ static u32 load_per_instance(struct venus_inst 
*inst)

return mbs * inst->fps;
 }

+static u32 load_per_core(struct venus_core *core, u32 core_id)
+{
+   struct venus_inst *inst = NULL;
+   u32 mbs_per_sec = 0, load = 0;
+
+   mutex_lock(&core->lock);
+   list_for_each_entry(inst, &core->instances, list) {
+   if (!(inst->clk_data.core_id == core_id))


if (inst->clk_data.core_id != core_id)

I guess will be more readable?


Yes, I will modify the check.

+   continue;
+
+   mbs_per_sec = load_per_instance(inst);
+   load = mbs_per_sec * inst->clk_data.codec_freq_data->vpp_freq;
+   }
+   mutex_unlock(&core->lock);
+
+   return load;
+}
+





Regards,
Aniket


Re: [PATCH v4 3/4] media: venus: Update to bitrate based clock scaling

2019-07-15 Thread amasule

Hi,

On 2019-07-15 21:28, Stanimir Varbanov wrote:

Hi,

On 7/2/19 5:46 PM, Aniket Masule wrote:

Introduced clock scaling using bitrate; previous
calculations considered only the cycles per mb.
Also, clock scaling is now triggered before every
buffer is queued to the device. This helps in
deciding the precise clock cycles required.

Signed-off-by: Aniket Masule 
---
 drivers/media/platform/qcom/venus/helpers.c | 31 
+

 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/media/platform/qcom/venus/helpers.c 
b/drivers/media/platform/qcom/venus/helpers.c

index 2c976e4..5726d86 100644
--- a/drivers/media/platform/qcom/venus/helpers.c
+++ b/drivers/media/platform/qcom/venus/helpers.c
@@ -399,17 +399,26 @@ static int scale_clocks(struct venus_inst *inst)
return ret;
 }

-static unsigned long calculate_vpp_freq(struct venus_inst *inst)
+static unsigned long calculate_inst_freq(struct venus_inst *inst,
+unsigned long filled_len)
 {
-   unsigned long vpp_freq = 0;
+   unsigned long vpp_freq = 0, vsp_freq = 0;
+   u64 fps = inst->fps;
u32 mbs_per_sec;

mbs_per_sec = load_per_instance(inst);
vpp_freq = mbs_per_sec * inst->clk_data.codec_freq_data->vpp_freq;
/* 21 / 20 is overhead factor */
vpp_freq += vpp_freq / 20;
+   vsp_freq = mbs_per_sec * inst->clk_data.codec_freq_data->vsp_freq;

-   return vpp_freq;
+   /* 10 / 7 is overhead factor */
+   if (inst->session_type == VIDC_SESSION_TYPE_ENC)
+   vsp_freq += (inst->controls.enc.bitrate * 10) / 7;
+   else
+   vsp_freq += ((fps * filled_len * 8) * 10) / 7;
+
+   return max(vpp_freq, vsp_freq);
 }

 static int scale_clocks_v4(struct venus_inst *inst)
@@ -417,13 +426,27 @@ static int scale_clocks_v4(struct venus_inst 
*inst)

struct venus_core *core = inst->core;
const struct freq_tbl *table = core->res->freq_tbl;
unsigned int num_rows = core->res->freq_tbl_size;
+   struct v4l2_m2m_ctx *m2m_ctx = inst->m2m_ctx;
struct clk *clk = core->clks[0];
struct device *dev = core->dev;
unsigned int i;
unsigned long freq = 0, freq_core1 = 0, freq_core2 = 0;
+   unsigned long filled_len = 0;
+   struct venus_buffer *buf, *n;
+   struct vb2_buffer *vb;
int ret;

-   freq = calculate_vpp_freq(inst);
+   mutex_lock(&inst->lock);
+   v4l2_m2m_for_each_src_buf_safe(m2m_ctx, buf, n) {
+   vb = >vb.vb2_buf;
+   filled_len = max(filled_len, vb2_get_plane_payload(vb, 0));
+   }
+   mutex_unlock(&inst->lock);
+
+   if (inst->session_type == VIDC_SESSION_TYPE_DEC && !filled_len)
+   return 0;
+
+   freq = calculate_inst_freq(inst, filled_len);

if (freq > table[0].freq)
dev_warn(dev, "HW is overloaded, needed: %lu max: %lu\n",



The original patch has a call to load_scale_clocks from
venus_helper_vb2_buf_queue, why it is not included here?


Yes, I need to include it so that clocks are scaled before every buffer
is queued to the device, to get an accurate clock rate.

Regards,
Aniket


Re: [PATCH V5 11/18] clk: tegra210: Add support for Tegra210 clocks

2019-07-15 Thread Sowjanya Komatineni



On 7/15/19 8:50 PM, Dmitry Osipenko wrote:

16.07.2019 6:00, Sowjanya Komatineni wrote:

On 7/15/19 5:35 PM, Sowjanya Komatineni wrote:

On 7/14/19 2:41 PM, Dmitry Osipenko wrote:

13.07.2019 8:54, Sowjanya Komatineni wrote:

On 6/29/19 8:10 AM, Dmitry Osipenko wrote:

28.06.2019 5:12, Sowjanya Komatineni wrote:

This patch adds system suspend and resume support for Tegra210
clocks.

All the CAR controller settings are lost on suspend when core power
goes off.

This patch has implementation for saving and restoring all the PLLs
and clocks context during system suspend and resume to have the
clocks back to same state for normal operation.

Acked-by: Thierry Reding 
Signed-off-by: Sowjanya Komatineni 
---
    drivers/clk/tegra/clk-tegra210.c | 115
++-
    drivers/clk/tegra/clk.c  |  14 +
    drivers/clk/tegra/clk.h  |   1 +
    3 files changed, 127 insertions(+), 3 deletions(-)

diff --git a/drivers/clk/tegra/clk-tegra210.c
b/drivers/clk/tegra/clk-tegra210.c
index 1c08c53482a5..1b839544e086 100644
--- a/drivers/clk/tegra/clk-tegra210.c
+++ b/drivers/clk/tegra/clk-tegra210.c
@@ -9,10 +9,12 @@
    #include 
    #include 
    #include 
+#include 
    #include 
    #include 
    #include 
    #include 
+#include 
    #include 
    #include 
    #include 
@@ -20,6 +22,7 @@
    #include 
      #include "clk.h"
+#include "clk-dfll.h"
    #include "clk-id.h"
      /*
@@ -225,6 +228,7 @@
      #define CLK_RST_CONTROLLER_RST_DEV_Y_SET 0x2a8
    #define CLK_RST_CONTROLLER_RST_DEV_Y_CLR 0x2ac
+#define CPU_SOFTRST_CTRL 0x380
      #define LVL2_CLK_GATE_OVRA 0xf8
    #define LVL2_CLK_GATE_OVRC 0x3a0
@@ -2820,6 +2824,7 @@ static int tegra210_enable_pllu(void)
    struct tegra_clk_pll_freq_table *fentry;
    struct tegra_clk_pll pllu;
    u32 reg;
+    int ret;
      for (fentry = pll_u_freq_table; fentry->input_rate;
fentry++) {
    if (fentry->input_rate == pll_ref_freq)
@@ -2847,10 +2852,10 @@ static int tegra210_enable_pllu(void)
    fence_udelay(1, clk_base);
    reg |= PLL_ENABLE;
    writel(reg, clk_base + PLLU_BASE);
+    fence_udelay(1, clk_base);
    -    readl_relaxed_poll_timeout_atomic(clk_base + PLLU_BASE, reg,
-  reg & PLL_BASE_LOCK, 2, 1000);
-    if (!(reg & PLL_BASE_LOCK)) {
+    ret = tegra210_wait_for_mask(&pllu, PLLU_BASE, PLL_BASE_LOCK);
+    if (ret) {
    pr_err("Timed out waiting for PLL_U to lock\n");
    return -ETIMEDOUT;
    }
@@ -3283,6 +3288,103 @@ static void tegra210_disable_cpu_clock(u32
cpu)
    }
      #ifdef CONFIG_PM_SLEEP
+static u32 cpu_softrst_ctx[3];
+static struct platform_device *dfll_pdev;
+#define car_readl(_base, _off) readl_relaxed(clk_base + (_base) +
((_off) * 4))
+#define car_writel(_val, _base, _off) \
+    writel_relaxed(_val, clk_base + (_base) + ((_off) * 4))
+
+static int tegra210_clk_suspend(void)
+{
+    unsigned int i;
+    struct device_node *node;
+
+    tegra_cclkg_burst_policy_save_context();
+
+    if (!dfll_pdev) {
+    node = of_find_compatible_node(NULL, NULL,
+   "nvidia,tegra210-dfll");
+    if (node)
+    dfll_pdev = of_find_device_by_node(node);
+
+    of_node_put(node);
+    if (!dfll_pdev)
+    pr_err("dfll node not found. no suspend for dfll\n");
+    }
+
+    if (dfll_pdev)
+    tegra_dfll_suspend(dfll_pdev);
+
+    /* Enable PLLP_OUT_CPU after dfll suspend */
+    tegra_clk_set_pllp_out_cpu(true);
+
+    tegra_sclk_cclklp_burst_policy_save_context();
+
+    clk_save_context();
+
+    for (i = 0; i < ARRAY_SIZE(cpu_softrst_ctx); i++)
+    cpu_softrst_ctx[i] = car_readl(CPU_SOFTRST_CTRL, i);
+
+    return 0;
+}
+
+static void tegra210_clk_resume(void)
+{
+    unsigned int i;
+    struct clk_hw *parent;
+    struct clk *clk;
+
+    /*
+ * clk_restore_context restores clocks as per the clock tree.
+ *
+ * dfllCPU_out is first in the clock tree to get restored and it
+ * involves programming DFLL controller along with restoring
CPUG
+ * clock burst policy.
+ *
+ * DFLL programming needs dfll_ref and dfll_soc peripheral
clocks
+ * to be restores which are part ofthe peripheral clocks.

  ^ white-space

Please use spellchecker to avoid typos.


+ * So, peripheral clocks restore should happen prior to dfll
clock
+ * restore.
+ */
+
+    tegra_clk_osc_resume(clk_base);
+    for (i = 0; i < ARRAY_SIZE(cpu_softrst_ctx); i++)
+    car_writel(cpu_softrst_ctx[i], CPU_SOFTRST_CTRL, i);
+
+    /* restore all plls and peripheral clocks */
+    tegra210_init_pllu();
+    clk_restore_context();
+
+    fence_udelay(5, clk_base);
+
+    /* resume SCLK and CPULP clocks */
+    tegra_sclk_cpulp_burst_policy_restore_context();
+
+    /*
+ * restore CPUG clocks:
+ * - enable DFLL in open loop mode
+ * - switch CPUG to DFLL clock source
+ * - close DFLL loop
+ * - sync 

Re: [PULL REQUEST] i2c for 5.3

2019-07-15 Thread pr-tracker-bot
The pull request you sent on Mon, 15 Jul 2019 23:37:03 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git i2c/for-5.3

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/273cbf61c3ddee9574ef1f4959b9bc6db5b24271

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker


Re: [GIT PULL] PCI changes for v5.3

2019-07-15 Thread pr-tracker-bot
The pull request you sent on Mon, 15 Jul 2019 14:27:02 -0500:

> git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git 
> tags/pci-v5.3-changes

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/fb4da215ed92f564f7ca090bb81a199b0d6cab8a

Thank you!



Re: [GIT PULL] power-supply changes for 5.3

2019-07-15 Thread pr-tracker-bot
The pull request you sent on Mon, 15 Jul 2019 21:49:33 +0200:

> ssh://g...@gitolite.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply.git
>  tags/for-v5.3

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/5fe7b600a116187e10317d83fb56922c4ef6b76d

Thank you!



Re: [GIT PULL] MFD for v5.3

2019-07-15 Thread pr-tracker-bot
The pull request you sent on Mon, 15 Jul 2019 08:48:35 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd.git tags/mfd-next-5.3

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/8de262531f5fbb7458463224a7587429800c24bf

Thank you!



Re: [GIT PULL] Please pull RDMA subsystem changes

2019-07-15 Thread pr-tracker-bot
The pull request you sent on Mon, 15 Jul 2019 15:26:22 +:

> git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git tags/for-linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/2a3c389a0fde49b241430df806a34276568cfb29

Thank you!



Re: [PATCH 1/2] mm,sparse: Fix deactivate_section for early sections

2019-07-15 Thread Dan Williams
On Mon, Jul 15, 2019 at 1:16 AM Oscar Salvador  wrote:
>
> deactivate_section checks whether a section is early or not
> in order to either call free_map_bootmem() or depopulate_section_memmap().
> Being the former for sections added at boot time, and the latter for
> sections hotplugged.
>
> The problem is that we zero section_mem_map, so the last early_section()
> will always report false and the section will not be removed.
>
> Fix this by checking whether a section is early or not at function
> entry.
>
> Fixes: mmotm ("mm/sparsemem: Support sub-section hotplug")
> Signed-off-by: Oscar Salvador 
> ---
>  mm/sparse.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/mm/sparse.c b/mm/sparse.c
> index 3267c4001c6d..1e224149aab6 100644
> --- a/mm/sparse.c
> +++ b/mm/sparse.c
> @@ -738,6 +738,7 @@ static void section_deactivate(unsigned long pfn, 
> unsigned long nr_pages,
> DECLARE_BITMAP(map, SUBSECTIONS_PER_SECTION) = { 0 };
> DECLARE_BITMAP(tmp, SUBSECTIONS_PER_SECTION) = { 0 };
> struct mem_section *ms = __pfn_to_section(pfn);
> +   bool section_is_early = early_section(ms);
> struct page *memmap = NULL;
> unsigned long *subsection_map = ms->usage
> ? &ms->usage->subsection_map[0] : NULL;
> @@ -772,7 +773,7 @@ static void section_deactivate(unsigned long pfn, 
> unsigned long nr_pages,
> if (bitmap_empty(subsection_map, SUBSECTIONS_PER_SECTION)) {
> unsigned long section_nr = pfn_to_section_nr(pfn);
>
> -   if (!early_section(ms)) {
> +   if (!section_is_early) {
> kfree(ms->usage);
> ms->usage = NULL;
> }
> @@ -780,7 +781,7 @@ static void section_deactivate(unsigned long pfn, 
> unsigned long nr_pages,
> ms->section_mem_map = sparse_encode_mem_map(NULL, section_nr);
> }
>
> -   if (early_section(ms) && memmap)
> +   if (section_is_early && memmap)
> free_map_bootmem(memmap);
> else
> depopulate_section_memmap(pfn, nr_pages, altmap);

Reviewed-by: Dan Williams 

In fact, this bug was re-introduced between v9 and v10 as I had seen
this bug before, but did not write a reproducer for the unit test.


linux-next: build failure after merge of the kbuild tree

2019-07-15 Thread Stephen Rothwell
Hi all,

After merging the kbuild tree, today's linux-next build (x86_64
allnoconfig) failed like this:

make[1]: *** No rule to make target 'modules.order', needed by 
'autoksyms_recursive'.  Stop.

Starting with commit

  656c0ac3cb4b ("kbuild: modpost: read modules.order instead of 
$(MODVERDIR)/*.mod")

i.e. after reverting just:

25a55e249bc0 Revert "kbuild: create *.mod with full directory path and remove 
MODVERDIR"
15657d9eceb6 Revert "kbuild: remove the first line of *.mod files"
8c21f4122839 Revert "kbuild: remove 'prepare1' target"
35bc41d44d2d Revert "kbuild: split out *.mod out of {single,multi}-used-m rules"

I get

cat: modules.order: No such file or directory

near the end of the allnoconfig build (but it does not fail).

and when I remove 25a55e249bc0 i.e. stop reverting

  0539f970a842 ("kbuild: create *.mod with full directory path and remove 
MODVERDIR")

the build fails.

So, for today I have reverted these 4 commits:

  56dce8121e97 ("kbuild: split out *.mod out of {single,multi}-used-m rules")
  fbe0b5eb7890 ("kbuild: remove 'prepare1' target")
  008fa222d268 ("kbuild: remove the first line of *.mod files")
  0539f970a842 ("kbuild: create *.mod with full directory path and remove 
MODVERDIR")

-- 
Cheers,
Stephen Rothwell




Re: linux-next: manual merge of the akpm-current tree with the hmm tree

2019-07-15 Thread Dan Williams
On Sat, Jul 6, 2019 at 10:04 PM Andrew Morton  wrote:
>
> On Fri, 5 Jul 2019 12:08:15 + Jason Gunthorpe  wrote:
>
> > On Thu, Jul 04, 2019 at 04:29:55PM -0700, Dan Williams wrote:
> > > Guys, Andrew has kicked the subsection patches out of -mm because of
> > > the merge conflicts. Can we hold off on the hmm cleanups for this
> > > cycle?
> >
> > I agree with you we should prioritize your subsection patches over
> > CH's cleanup if we cannot have both.
> >
> > As I said, I'll drop CH's at Andrews request, but I do not want to
> > make any changes without being aligned with him.
>
> OK, I had a shot at repairing the damage on top of current linux-next.
> The great majority of the issues were in
> mm-devm_memremap_pages-enable-sub-section-remap.patch.
>
> Below are the rejects which I saw and below that is my attempt to
> resolve it all.  Dan, please go through this with a toothcomb.  I've
> just done an mmotm release with all this in it so please do whatever's
> needed to verify that it's all working correctly.

Apologies for the delay, I was offline last week for a move. This
looks good to me and even caught some "X >> PAGE_SHIFT" to
"PHYS_PFN(X)" conversions that I missed. It also passes my testing.
Thank you for accommodating a rebase.

Now to take a look at Oscar's fixes.


Re: [PATCH v1 3/3] clk: qcom: rcg: update the DFS macro for RCG

2019-07-15 Thread Taniya Das

Hello Stephen,

Thanks for the review.

On 7/16/2019 4:14 AM, Stephen Boyd wrote:

Quoting Taniya Das (2019-05-12 20:44:46)

On 5/10/2019 11:24 PM, Stephen Boyd wrote:

diff --git a/drivers/clk/qcom/clk-rcg.h b/drivers/clk/qcom/clk-rcg.h
index 5562f38..e40e8f8 100644
--- a/drivers/clk/qcom/clk-rcg.h
+++ b/drivers/clk/qcom/clk-rcg.h
@@ -171,7 +171,7 @@ struct clk_rcg_dfs_data {
};

#define DEFINE_RCG_DFS(r) \
-   { .rcg = &r##_src, .init = &r##_init }
+   { .rcg = &r, .init = &r##_init }


Why do we need to rename the init data?



We want to manage the init data as the clock source name, so that we
can auto-generate our code and do not have to rename the clock init
data manually if the DFS source names get updated at any point in
time.



Why is the clk name changing to not have a _src after the "root" of the
clk name? As long as I can remember, RCGs have a "_src" postfix.



Yes, the RCGs do have _src, so we want the init data to be generated
with the _src postfix as well, so that we do not have to manually clean up
the generated code.



Please manually cleanup the generated code, or fix the code
generator to do what you want.



Fixing the code manually is not what we intend to do, and it is time 
consuming with so many DFS-controlled clocks. This really helps us 
align with internal code.
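The disagreement is easier to follow with the token-pasting mechanics in front of you. Below is a standalone toy (stub types and a hypothetical `gcc_foo_src` clock, not the real Qualcomm driver) showing how passing the full `_src` name into the macro makes every `##`-generated symbol carry the postfix automatically:

```c
#include <assert.h>
#include <string.h>

/* Toy model (not the Qualcomm driver): the ## operator pastes the macro
 * argument into generated identifiers, so passing "gcc_foo_src" makes
 * the derived init symbol "gcc_foo_src_init" without manual renaming. */
struct clk_stub  { const char *name; };
struct init_stub { const char *name; };
struct dfs_data  { struct clk_stub *rcg; struct init_stub *init; };

#define DEFINE_DFS(r) { .rcg = &r, .init = &r##_init }

/* What auto-generated driver code would emit for one hypothetical RCG: */
static struct clk_stub  gcc_foo_src      = { "gcc_foo_src" };
static struct init_stub gcc_foo_src_init = { "gcc_foo_src_init" };

static struct dfs_data d = DEFINE_DFS(gcc_foo_src);
```

Here `DEFINE_DFS(gcc_foo_src)` expands to `{ .rcg = &gcc_foo_src, .init = &gcc_foo_src_init }`, which is why renaming the macro argument renames all generated symbols at once.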


--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation.

--


Re: [PATCH v1 2/3] clk: qcom: rcg2: Add support for hardware control mode

2019-07-15 Thread Taniya Das

Hello Stephen,

Thanks for your review.

On 7/16/2019 4:22 AM, Stephen Boyd wrote:

Quoting Taniya Das (2019-05-08 11:24:54)

diff --git a/drivers/clk/qcom/clk-rcg2.c b/drivers/clk/qcom/clk-rcg2.c
index 57dbac9..5bb6d45 100644
--- a/drivers/clk/qcom/clk-rcg2.c
+++ b/drivers/clk/qcom/clk-rcg2.c
@@ -289,6 +289,9 @@ static int __clk_rcg2_configure(struct clk_rcg2 *rcg, const 
struct freq_tbl *f)
 cfg |= rcg->parent_map[index].cfg << CFG_SRC_SEL_SHIFT;
 if (rcg->mnd_width && f->n && (f->m != f->n))
 cfg |= CFG_MODE_DUAL_EDGE;
+   if (rcg->flags & HW_CLK_CTRL_MODE)
+   cfg |= CFG_HW_CLK_CTRL_MASK;
+


Above this we have commit bdc3bbdd40ba ("clk: qcom: Clear hardware clock
control bit of RCG") that clears this bit. Is it possible to always set
this bit and then have an override flag used in sdm845 that says to
_not_ set this bit? Presumably on earlier platforms writing the bit is a
no-op so it's safe to write the bit on those platforms.

This way, if it's going to be the default we can avoid setting the flag
and only set the flag on older platforms where it shouldn't be done for
some reason.



Not all of the subsystem clock controllers have this hardware control
bit set by design. Thus we want to set it based on the flag.


 return regmap_update_bits(rcg->clkr.regmap, RCG_CFG_OFFSET(rcg),
 mask, cfg);
  }
--
Qualcomm INDIA, on behalf of Qualcomm Innovation Center, Inc.is a member
of the Code Aurora Forum, hosted by the  Linux Foundation.



--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation.

--


Re: [PATCH v4] PM / wakeup: show wakeup sources stats in sysfs

2019-07-15 Thread Tri Vo
On Tue, Jul 16, 2019 at 11:13 AM Greg Kroah-Hartman
 wrote:
>
> On Mon, Jul 15, 2019 at 11:48:27PM +0200, Rafael J. Wysocki wrote:
> > On Mon, Jul 15, 2019 at 11:44 PM Tri Vo  wrote:
> > >
> > > Userspace can use wakeup_sources debugfs node to plot history of suspend
> > > blocking wakeup sources over device's boot cycle. This information can
> > > then be used (1) for power-specific bug reporting and (2) towards
> > > attributing battery consumption to specific processes over a period of
> > > time.
> > >
> > > However, debugfs doesn't have stable ABI. For this reason, create a
> > > 'struct device' to expose wakeup sources statistics in sysfs under
> > > /sys/class/wakeup//.
> > >
> > > Introduce CONFIG_PM_SLEEP_STATS that enables/disables showing wakeup
> > > source statistics in sysfs.
> >
> > I'm not sure if this is really needed, but I'll let Greg decide.
>
> You are right.  Having zillions of config options is a pain, who is
> going to turn this off?
>
> But we can always remove the option before 5.4-rc1, so I'll take this
> as-is for now :)
>
> > Apart from this
> >
> > Reviewed-by: Rafael J. Wysocki 
>
> thanks for the review!  I'll wait for 5.3-rc1 to come out before adding
> this to my tree.

Greg, Rafael, thanks for taking the time to review this patch!


Re: [PATCH v2] x86/paravirt: Drop {read,write}_cr8() hooks

2019-07-15 Thread Juergen Gross

On 15.07.19 17:16, Andrew Cooper wrote:

There is a lot of infrastructure for functionality which is used
exclusively in __{save,restore}_processor_state() on the suspend/resume
path.

cr8 is an alias of APIC_TASKPRI, and APIC_TASKPRI is saved/restored by
lapic_{suspend,resume}().  Saving and restoring cr8 independently of the
rest of the Local APIC state isn't a clever thing to be doing.

Delete the suspend/resume cr8 handling, which shrinks the size of struct
saved_context, and allows for the removal of both PVOPS.

Signed-off-by: Andrew Cooper 


Reviewed-by: Juergen Gross 


Juergen


Re: [PATCH] ARM64/irqchip: Make ACPI_IORT depend on PCI again

2019-07-15 Thread Sinan Kaya
On 7/16/2019 12:04 AM, Sasha Levin wrote:
> ACPI_IORT lost its explicit dependency on PCI in c6bb8f89fa6df
> ("ARM64/irqchip: Update ACPI_IORT symbol selection logic") where the
> author has relied on the general dependency of ACPI on PCI.
> 
> However, that dependency was finally removed in 5d32a66541c4 ("PCI/ACPI:
> Allow ACPI to be built without CONFIG_PCI set") and now ACPI_IORT breaks
> when we try and build it without PCI support.
> 
> This patch brings back the explicit dependency of ACPI_IORT on PCI.
> 
> Fixes: 5d32a66541c4 ("PCI/ACPI: Allow ACPI to be built without CONFIG_PCI 
> set")
> Cc: sta...@kernel.org
> Signed-off-by: Sasha Levin 

Do you have more detail on what really is broken without this patch?

It should be possible to build the IORT table support without PCI.



[PATCH V3] cpufreq: Make cpufreq_generic_init() return void

2019-07-15 Thread Viresh Kumar
It always returns 0 (success) and its return type should really be void.
On top of that, many drivers have added error handling code based on its
return value, which is not required at all.

Change its return type to void and update all the callers.

Signed-off-by: Viresh Kumar 
---
V2->V3:
- Update bmips cpufreq driver to avoid "warning: 'ret' may be used
  uninitialized".
- The build bot reported this issue almost 4 days after this patch was
  posted; I was expecting it a lot earlier :)
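The "may be used uninitialized" warning comes from the shape of the old code: once the generic helper returns void, a leftover `ret` has a success path where it is returned unassigned. A minimal userspace model of the fixed shape (error code and names here are illustrative, not the real driver):

```c
#include <assert.h>
#include <stdbool.h>

static bool generic_init_called;

/* Stands in for cpufreq_generic_init() after the conversion to void. */
static void cpufreq_generic_init_void(void)
{
	generic_init_called = true;
}

/* Fixed shape: the error path returns its own code and the success
 * path returns a literal 0, so no "int ret" can be left unassigned. */
static int driver_init_fixed(bool table_ok)
{
	if (!table_ok)
		return -22;   /* models the PTR_ERR(freq_table) path */
	cpufreq_generic_init_void();
	return 0;
}
```

Keeping an `int ret` that is only assigned on the error path, then returned at the end, is exactly what the compiler flags; returning directly on each path sidesteps it.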

 drivers/cpufreq/bmips-cpufreq.c | 17 ++---
 drivers/cpufreq/cpufreq.c   |  4 +---
 drivers/cpufreq/davinci-cpufreq.c   |  3 ++-
 drivers/cpufreq/imx6q-cpufreq.c |  6 ++
 drivers/cpufreq/kirkwood-cpufreq.c  |  3 ++-
 drivers/cpufreq/loongson1-cpufreq.c |  8 +++-
 drivers/cpufreq/loongson2_cpufreq.c |  3 ++-
 drivers/cpufreq/maple-cpufreq.c |  3 ++-
 drivers/cpufreq/omap-cpufreq.c  | 15 +--
 drivers/cpufreq/pasemi-cpufreq.c|  3 ++-
 drivers/cpufreq/pmac32-cpufreq.c|  3 ++-
 drivers/cpufreq/pmac64-cpufreq.c|  3 ++-
 drivers/cpufreq/s3c2416-cpufreq.c   |  9 ++---
 drivers/cpufreq/s3c64xx-cpufreq.c   | 15 +++
 drivers/cpufreq/s5pv210-cpufreq.c   |  3 ++-
 drivers/cpufreq/sa1100-cpufreq.c|  3 ++-
 drivers/cpufreq/sa1110-cpufreq.c|  3 ++-
 drivers/cpufreq/spear-cpufreq.c |  3 ++-
 drivers/cpufreq/tegra20-cpufreq.c   |  8 +---
 include/linux/cpufreq.h |  2 +-
 20 files changed, 46 insertions(+), 71 deletions(-)

diff --git a/drivers/cpufreq/bmips-cpufreq.c b/drivers/cpufreq/bmips-cpufreq.c
index 56a4ebbf00e0..f7c23fa468f0 100644
--- a/drivers/cpufreq/bmips-cpufreq.c
+++ b/drivers/cpufreq/bmips-cpufreq.c
@@ -131,23 +131,18 @@ static int bmips_cpufreq_exit(struct cpufreq_policy 
*policy)
 static int bmips_cpufreq_init(struct cpufreq_policy *policy)
 {
struct cpufreq_frequency_table *freq_table;
-   int ret;
 
freq_table = bmips_cpufreq_get_freq_table(policy);
if (IS_ERR(freq_table)) {
-   ret = PTR_ERR(freq_table);
-   pr_err("%s: couldn't determine frequency table (%d).\n",
-   BMIPS_CPUFREQ_NAME, ret);
-   return ret;
+   pr_err("%s: couldn't determine frequency table (%ld).\n",
+   BMIPS_CPUFREQ_NAME, PTR_ERR(freq_table));
+   return PTR_ERR(freq_table);
}
 
-   ret = cpufreq_generic_init(policy, freq_table, TRANSITION_LATENCY);
-   if (ret)
-   bmips_cpufreq_exit(policy);
-   else
-   pr_info("%s: registered\n", BMIPS_CPUFREQ_NAME);
+   cpufreq_generic_init(policy, freq_table, TRANSITION_LATENCY);
+   pr_info("%s: registered\n", BMIPS_CPUFREQ_NAME);
 
-   return ret;
+   return 0;
 }
 
 static struct cpufreq_driver bmips_cpufreq_driver = {
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 4d6043ee7834..8dda62367816 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -159,7 +159,7 @@ EXPORT_SYMBOL_GPL(arch_set_freq_scale);
  * - set policies transition latency
  * - policy->cpus with all possible CPUs
  */
-int cpufreq_generic_init(struct cpufreq_policy *policy,
+void cpufreq_generic_init(struct cpufreq_policy *policy,
struct cpufreq_frequency_table *table,
unsigned int transition_latency)
 {
@@ -171,8 +171,6 @@ int cpufreq_generic_init(struct cpufreq_policy *policy,
 * share the clock and voltage and clock.
 */
cpumask_setall(policy->cpus);
-
-   return 0;
 }
 EXPORT_SYMBOL_GPL(cpufreq_generic_init);
 
diff --git a/drivers/cpufreq/davinci-cpufreq.c 
b/drivers/cpufreq/davinci-cpufreq.c
index 3de48ae60c29..297d23cad8b5 100644
--- a/drivers/cpufreq/davinci-cpufreq.c
+++ b/drivers/cpufreq/davinci-cpufreq.c
@@ -90,7 +90,8 @@ static int davinci_cpu_init(struct cpufreq_policy *policy)
 * Setting the latency to 2000 us to accommodate addition of drivers
 * to pre/post change notification list.
 */
-   return cpufreq_generic_init(policy, freq_table, 2000 * 1000);
+   cpufreq_generic_init(policy, freq_table, 2000 * 1000);
+   return 0;
 }
 
 static struct cpufreq_driver davinci_driver = {
diff --git a/drivers/cpufreq/imx6q-cpufreq.c b/drivers/cpufreq/imx6q-cpufreq.c
index 47ccfa6b17b7..648a09a1778a 100644
--- a/drivers/cpufreq/imx6q-cpufreq.c
+++ b/drivers/cpufreq/imx6q-cpufreq.c
@@ -190,14 +190,12 @@ static int imx6q_set_target(struct cpufreq_policy 
*policy, unsigned int index)
 
 static int imx6q_cpufreq_init(struct cpufreq_policy *policy)
 {
-   int ret;
-
policy->clk = clks[ARM].clk;
-   ret = cpufreq_generic_init(policy, freq_table, transition_latency);
+   cpufreq_generic_init(policy, freq_table, transition_latency);
policy->suspend_freq = max_freq;
dev_pm_opp_of_register_em(policy->cpus);
 
-   return ret;
+   return 0;
 }
 
 static struct cpufreq_driver 

[PATCH] ARM64/irqchip: Make ACPI_IORT depend on PCI again

2019-07-15 Thread Sasha Levin
ACPI_IORT lost its explicit dependency on PCI in c6bb8f89fa6df
("ARM64/irqchip: Update ACPI_IORT symbol selection logic") where the
author has relied on the general dependency of ACPI on PCI.

However, that dependency was finally removed in 5d32a66541c4 ("PCI/ACPI:
Allow ACPI to be built without CONFIG_PCI set") and now ACPI_IORT breaks
when we try and build it without PCI support.

This patch brings back the explicit dependency of ACPI_IORT on PCI.

Fixes: 5d32a66541c4 ("PCI/ACPI: Allow ACPI to be built without CONFIG_PCI set")
Cc: sta...@kernel.org
Signed-off-by: Sasha Levin 
---
 arch/arm64/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a36ff61321ce..d6d93027196b 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -4,7 +4,7 @@ config ARM64
select ACPI_CCA_REQUIRED if ACPI
select ACPI_GENERIC_GSI if ACPI
select ACPI_GTDT if ACPI
-   select ACPI_IORT if ACPI
+   select ACPI_IORT if (ACPI && PCI)
select ACPI_REDUCED_HARDWARE_ONLY if ACPI
select ACPI_MCFG if (ACPI && PCI)
select ACPI_SPCR_TABLE if ACPI
-- 
2.20.1



Re: [PATCH v2 1/4] opp: core: add regulators enable and disable

2019-07-15 Thread Chanwoo Choi
Hi Kamil,

On 19. 7. 15. 9:04 PM, Kamil Konieczny wrote:
> Add enable regulators to dev_pm_opp_set_regulators() and disable
> regulators to dev_pm_opp_put_regulators(). This prepares for
> converting exynos-bus devfreq driver to use dev_pm_opp_set_rate().

IMHO, it is not proper to mention the specific driver name.
If you explain the reason why the regulator has to be enabled before
using it, that is enough description.

> 
> Signed-off-by: Kamil Konieczny 
> --
> Changes in v2:
> 
> - move regulator enable and disable into loop
> 
> ---
>  drivers/opp/core.c | 18 +++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> index 0e7703fe733f..069c5cf8827e 100644
> --- a/drivers/opp/core.c
> +++ b/drivers/opp/core.c
> @@ -1570,6 +1570,10 @@ struct opp_table *dev_pm_opp_set_regulators(struct 
> device *dev,
>   goto free_regulators;
>   }
>  
> + ret = regulator_enable(reg);
> + if (ret < 0)
> + goto disable;
> +
>   opp_table->regulators[i] = reg;
>   }
>  
> @@ -1582,9 +1586,15 @@ struct opp_table *dev_pm_opp_set_regulators(struct 
> device *dev,
>  
>   return opp_table;
>  
> +disable:
> + regulator_put(reg);
> + --i;
> +
>  free_regulators:
> - while (i != 0)
> - regulator_put(opp_table->regulators[--i]);
> + for (; i >= 0; --i) {
> + regulator_disable(opp_table->regulators[i]);
> + regulator_put(opp_table->regulators[i]);
> + }
>  
>   kfree(opp_table->regulators);
>   opp_table->regulators = NULL;
> @@ -1610,8 +1620,10 @@ void dev_pm_opp_put_regulators(struct opp_table 
> *opp_table)
>   /* Make sure there are no concurrent readers while updating opp_table */
>   WARN_ON(!list_empty(&opp_table->opp_list));
>  
> - for (i = opp_table->regulator_count - 1; i >= 0; i--)
> + for (i = opp_table->regulator_count - 1; i >= 0; i--) {
> + regulator_disable(opp_table->regulators[i]);
>   regulator_put(opp_table->regulators[i]);
> + }
>  
>   _free_set_opp_data(opp_table);
>  
> 

I agree with enabling the regulator before using it.
The bootloader might not enable the regulators, and the kernel needs
to enable the regulators in order to increase the reference count
explicitly even if the bootloader enables them.
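The unwind ordering being reviewed here can be modeled standalone: resources acquired and enabled in order must be torn down in reverse, and the one that failed mid-step needs only partial cleanup. A toy sketch with counters standing in for `regulator_{enable,disable,put}()` (not the real OPP code; the failure on the third regulator is contrived for illustration):

```c
#include <assert.h>

/* Toy model of the v2 error unwind, not the real OPP core. */
enum { NREG = 3 };
static int enabled[NREG], held[NREG];

static int  reg_get(int i)     { held[i] = 1; return 0; }
static int  reg_enable(int i)  { if (i == 2) return -1; enabled[i] = 1; return 0; }
static void reg_disable(int i) { enabled[i] = 0; }
static void reg_put(int i)     { held[i] = 0; }

static int set_regulators(void)
{
	int i;

	for (i = 0; i < NREG; i++) {
		if (reg_get(i))
			goto unwind;
		if (reg_enable(i)) {
			reg_put(i);    /* current one: held but never enabled */
			goto unwind;
		}
	}
	return 0;

unwind:
	/* Everything before index i was both held and enabled: undo in
	 * reverse order, disable before put. */
	while (--i >= 0) {
		reg_disable(i);
		reg_put(i);
	}
	return -1;
}
```

After the failing call, every earlier regulator is disabled and released, and the partially set-up one is only released, mirroring the `disable:`/`free_regulators:` split in the patch.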

Reviewed-by: Chanwoo Choi 

-- 
Best Regards,
Chanwoo Choi
Samsung Electronics


Re: [Linux-kernel-mentees] [PATCH v2] PCI: Remove functions not called in include/linux/pci.h

2019-07-15 Thread Kelsey Skunberg
On Mon, Jul 15, 2019 at 06:37:17PM -0500, Bjorn Helgaas wrote:
> On Mon, Jul 15, 2019 at 09:42:47PM +0200, Lukas Bulwahn wrote:
> > On Mon, 15 Jul 2019, Kelsey Skunberg wrote:
> > 
> > > Remove the following uncalled functions from include/linux/pci.h:
> > > 
> > > pci_block_cfg_access()
> > > pci_block_cfg_access_in_atomic()
> > > pci_unblock_cfg_access()
> > > 
> > > Functions were added in patch fb51ccbf217c "PCI: Rework config space
> > > blocking services", ...
> 
> > Also note that commits are referred to with this format:
> > 
> > commit <12-character sha prefix> ("")
> 
> FWIW, I use this shell alias to generate these references:
> 
>   gsr is aliased to `git --no-pager show -s --abbrev-commit --abbrev=12 
> --pretty=format:"%h (\"%s\")%n"'
> 
>   $ gsr fb51ccb
>   fb51ccbf217c ("PCI: Rework config space blocking services")
> 
> Documentation/process/submitting-patches.rst mentions a 12-char (at
> least) SHA-1 but the e21d2170f36 example shows a *20*-char SHA-1,
> which seems excessive to me.
> 
> I personally skip the word "commit" because I figure it's pretty
> obvious what it is, but it's fine either way.
> 
> Bjorn

This is very useful! I definitely like the use of the alias. Thank you!

-Kelsey


Re: [Xen-devel] [PATCH 0/2] Remove 32-bit Xen PV guest support

2019-07-15 Thread Juergen Gross

On 15.07.19 19:39, Andrew Cooper wrote:

On 15/07/2019 18:28, Andy Lutomirski wrote:

On Mon, Jul 15, 2019 at 9:34 AM Andi Kleen  wrote:

Juergen Gross  writes:


The long term plan has been to replace Xen PV guests by PVH. The first
victim of that plan are now 32-bit PV guests, as those are used only
rather seldom these days. Xen on x86 requires 64-bit support and with
Grub2 now supporting PVH officially since version 2.04 there is no
need to keep 32-bit PV guest support alive in the Linux kernel.
Additionally Meltdown mitigation is not available in the kernel running
as 32-bit PV guest, so dropping this mode makes sense from security
point of view, too.

Normally we have a deprecation period for feature removals like this.
You would make the kernel print a warning for some releases, and when
no user complains you can then remove. If a user complains you can't.


As I understand it, the kernel rules do allow changes like this even
if there's a complaint: this is a patch that removes what is
effectively hardware support.  If the maintenance cost exceeds the
value, then removal is fair game.  (Obviously we weight the value of
preserving compatibility quite highly, but in this case, Xen dropped
32-bit hardware support a long time ago.  If the Xen hypervisor says
that 32-bit PV guest support is deprecated, it's deprecated.)

That being said, a warning might not be a bad idea.  What's the
current status of this in upstream Xen?


So personally, I'd prefer to see support stay, but at the end of the day
it is Juergen's choice as the maintainer of the code.


Especially on the security front we are unsafe with 32-bit PV Linux.
And making it safe would make it so slow that the needed effort would
not be well spent.


Juergen


Re: [PATCH v2 2/4] devfreq: exynos-bus: convert to use dev_pm_opp_set_rate()

2019-07-15 Thread Chanwoo Choi
Hi Kamil,

Looks good to me. But, this patch has some issue.
I added the detailed reviews.

I recommend that you make separate patches as follows,
in order to clarify the role of each change that applies the dev_pm_opp_* functions.

First patch,
Need to consolidate the following two functions into one function,
because the original exynos-bus.c has the problem that the regulator
of the parent devfreq device has to be enabled before enabling the clock.
This issue did not show up because the bootloader enables the bus-related
regulators before kernel booting.
- exynos_bus_parse_of()
- exynos_bus_parent_parse_of()

Second patch,
Apply dev_pm_opp_set_regulators() and dev_pm_opp_set_rate()


On 19. 7. 15. 9:04 PM, Kamil Konieczny wrote:
> Reuse opp core code for setting bus clock and voltage. As a side
> effect this allows use of the coupled regulators feature (required
> for boards using Exynos5422/5800 SoCs) because dev_pm_opp_set_rate()
> uses regulator_set_voltage_triplet() for setting regulator voltage
> while the old code used regulator_set_voltage_tol() with fixed
> tolerance. This patch also removes no longer needed parsing of DT
> property "exynos,voltage-tolerance" (no Exynos devfreq DT node uses
> it).
> 
> Signed-off-by: Kamil Konieczny 
> ---
>  drivers/devfreq/exynos-bus.c | 172 ++-
>  1 file changed, 66 insertions(+), 106 deletions(-)
> 
> diff --git a/drivers/devfreq/exynos-bus.c b/drivers/devfreq/exynos-bus.c
> index 486cc5b422f1..7fc4f76bd848 100644
> --- a/drivers/devfreq/exynos-bus.c
> +++ b/drivers/devfreq/exynos-bus.c
> @@ -25,7 +25,6 @@
>  #include 
>  
>  #define DEFAULT_SATURATION_RATIO 40
> -#define DEFAULT_VOLTAGE_TOLERANCE2
>  
>  struct exynos_bus {
>   struct device *dev;
> @@ -37,9 +36,9 @@ struct exynos_bus {
>  
>   unsigned long curr_freq;
>  
> - struct regulator *regulator;
> + struct opp_table *opp_table;
> +
>   struct clk *clk;
> - unsigned int voltage_tolerance;
>   unsigned int ratio;
>  };
>  
> @@ -99,56 +98,25 @@ static int exynos_bus_target(struct device *dev, unsigned 
> long *freq, u32 flags)
>  {
>   struct exynos_bus *bus = dev_get_drvdata(dev);
>   struct dev_pm_opp *new_opp;
> - unsigned long old_freq, new_freq, new_volt, tol;
>   int ret = 0;
> -
> - /* Get new opp-bus instance according to new bus clock */
> + /*
> +  * New frequency for bus may not be exactly matched to opp, adjust
> +  * *freq to correct value.
> +  */

Better to change this comment to the following style
to keep the consistency:

/* Get correct frequency for bus ... */

>   new_opp = devfreq_recommended_opp(dev, freq, flags);
>   if (IS_ERR(new_opp)) {
>   dev_err(dev, "failed to get recommended opp instance\n");
>   return PTR_ERR(new_opp);
>   }
>  
> - new_freq = dev_pm_opp_get_freq(new_opp);
> - new_volt = dev_pm_opp_get_voltage(new_opp);
>   dev_pm_opp_put(new_opp);
>  
> - old_freq = bus->curr_freq;
> -
> - if (old_freq == new_freq)
> - return 0;
> - tol = new_volt * bus->voltage_tolerance / 100;
> -
>   /* Change voltage and frequency according to new OPP level */
> + mutex_lock(&bus->lock);
> + ret = dev_pm_opp_set_rate(dev, *freq);
> + if (!ret)
> + bus->curr_freq = *freq;

Have to print an error log if ret holds a negative error value.
Modify it as follows:

if (ret < 0) {
dev_err(dev, "failed to set bus rate\n");
goto err;
}
bus->curr_freq = *freq;

err:
mutex_unlock(&bus->lock);

return ret;

>  
> - if (old_freq < new_freq) {
> - ret = regulator_set_voltage_tol(bus->regulator, new_volt, tol);
> - if (ret < 0) {
> - dev_err(bus->dev, "failed to set voltage\n");
> - goto out;
> - }
> - }
> -
> - ret = clk_set_rate(bus->clk, new_freq);
> - if (ret < 0) {
> - dev_err(dev, "failed to change clock of bus\n");
> - clk_set_rate(bus->clk, old_freq);
> - goto out;
> - }
> -
> - if (old_freq > new_freq) {
> - ret = regulator_set_voltage_tol(bus->regulator, new_volt, tol);
> - if (ret < 0) {
> - dev_err(bus->dev, "failed to set voltage\n");
> - goto out;
> - }
> - }
> - bus->curr_freq = new_freq;
> -
> - dev_dbg(dev, "Set the frequency of bus (%luHz -> %luHz, %luHz)\n",
> - old_freq, new_freq, clk_get_rate(bus->clk));
> -out:
>   mutex_unlock(&bus->lock);
>  
>   return ret;
> @@ -194,10 +162,11 @@ static void exynos_bus_exit(struct device *dev)
>   if (ret < 0)
>   dev_warn(dev, "failed to disable the devfreq-event devices\n");
>  
> - if (bus->regulator)
> - regulator_disable(bus->regulator);
> + if (bus->opp_table)
> + 

Re: [PATCH 1/2] x86/xen: remove 32-bit Xen PV guest support

2019-07-15 Thread Juergen Gross

On 15.07.19 17:44, Boris Ostrovsky wrote:



diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
index 084de77a109e..d42737f31304 100644
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -1,5 +1,5 @@
  # SPDX-License-Identifier: GPL-2.0
-OBJECT_FILES_NON_STANDARD_xen-asm_$(BITS).o := y
+OBJECT_FILES_NON_STANDARD_xen-asm_64.o := y
  
  ifdef CONFIG_FUNCTION_TRACER

  # Do not profile debug and lowlevel utilities
@@ -34,7 +34,7 @@ obj-$(CONFIG_XEN_PV)  += mmu_pv.o
  obj-$(CONFIG_XEN_PV)  += irq.o
  obj-$(CONFIG_XEN_PV)  += multicalls.o
  obj-$(CONFIG_XEN_PV)  += xen-asm.o
-obj-$(CONFIG_XEN_PV)   += xen-asm_$(BITS).o
+obj-$(CONFIG_XEN_PV)   += xen-asm_64.o


We should be able to merge xen-asm_64.S into xen-asm.S, shouldn't we?


Yes, probably a good idea to add that.


@@ -1312,15 +1290,7 @@ asmlinkage __visible void __init xen_start_kernel(void)
  
  	/* keep using Xen gdt for now; no urgent need to change it */
  
-#ifdef CONFIG_X86_32

-   pv_info.kernel_rpl = 1;
-   if (xen_feature(XENFEAT_supervisor_mode_kernel))
-   pv_info.kernel_rpl = 0;
-#else
pv_info.kernel_rpl = 0;


Is kernel_rpl needed anymore?


Yes, this can be dropped, together with get_kernel_rpl().


Juergen


Re: [PATCH V5 11/18] clk: tegra210: Add support for Tegra210 clocks

2019-07-15 Thread Dmitry Osipenko
16.07.2019 6:00, Sowjanya Komatineni wrote:
> 
> On 7/15/19 5:35 PM, Sowjanya Komatineni wrote:
>>
>> On 7/14/19 2:41 PM, Dmitry Osipenko wrote:
>>> 13.07.2019 8:54, Sowjanya Komatineni wrote:
 On 6/29/19 8:10 AM, Dmitry Osipenko wrote:
> 28.06.2019 5:12, Sowjanya Komatineni wrote:
>> This patch adds system suspend and resume support for Tegra210
>> clocks.
>>
>> All the CAR controller settings are lost on suspend when core power
>> goes off.
>>
>> This patch has implementation for saving and restoring all the PLLs
>> and clocks context during system suspend and resume to have the
>> clocks back to same state for normal operation.
>>
>> Acked-by: Thierry Reding 
>> Signed-off-by: Sowjanya Komatineni 
>> ---
>>    drivers/clk/tegra/clk-tegra210.c | 115
>> ++-
>>    drivers/clk/tegra/clk.c  |  14 +
>>    drivers/clk/tegra/clk.h  |   1 +
>>    3 files changed, 127 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/clk/tegra/clk-tegra210.c
>> b/drivers/clk/tegra/clk-tegra210.c
>> index 1c08c53482a5..1b839544e086 100644
>> --- a/drivers/clk/tegra/clk-tegra210.c
>> +++ b/drivers/clk/tegra/clk-tegra210.c
>> @@ -9,10 +9,12 @@
>>    #include 
>>    #include 
>>    #include 
>> +#include 
>>    #include 
>>    #include 
>>    #include 
>>    #include 
>> +#include 
>>    #include 
>>    #include 
>>    #include 
>> @@ -20,6 +22,7 @@
>>    #include 
>>      #include "clk.h"
>> +#include "clk-dfll.h"
>>    #include "clk-id.h"
>>      /*
>> @@ -225,6 +228,7 @@
>>      #define CLK_RST_CONTROLLER_RST_DEV_Y_SET 0x2a8
>>    #define CLK_RST_CONTROLLER_RST_DEV_Y_CLR 0x2ac
>> +#define CPU_SOFTRST_CTRL 0x380
>>      #define LVL2_CLK_GATE_OVRA 0xf8
>>    #define LVL2_CLK_GATE_OVRC 0x3a0
>> @@ -2820,6 +2824,7 @@ static int tegra210_enable_pllu(void)
>>    struct tegra_clk_pll_freq_table *fentry;
>>    struct tegra_clk_pll pllu;
>>    u32 reg;
>> +    int ret;
>>      for (fentry = pll_u_freq_table; fentry->input_rate;
>> fentry++) {
>>    if (fentry->input_rate == pll_ref_freq)
>> @@ -2847,10 +2852,10 @@ static int tegra210_enable_pllu(void)
>>    fence_udelay(1, clk_base);
>>    reg |= PLL_ENABLE;
>>    writel(reg, clk_base + PLLU_BASE);
>> +    fence_udelay(1, clk_base);
>>    -    readl_relaxed_poll_timeout_atomic(clk_base + PLLU_BASE, reg,
>> -  reg & PLL_BASE_LOCK, 2, 1000);
>> -    if (!(reg & PLL_BASE_LOCK)) {
>> +    ret = tegra210_wait_for_mask(&pllu, PLLU_BASE, PLL_BASE_LOCK);
>> +    if (ret) {
>>    pr_err("Timed out waiting for PLL_U to lock\n");
>>    return -ETIMEDOUT;
>>    }
>> @@ -3283,6 +3288,103 @@ static void tegra210_disable_cpu_clock(u32
>> cpu)
>>    }
>>      #ifdef CONFIG_PM_SLEEP
>> +static u32 cpu_softrst_ctx[3];
>> +static struct platform_device *dfll_pdev;
>> +#define car_readl(_base, _off) readl_relaxed(clk_base + (_base) +
>> ((_off) * 4))
>> +#define car_writel(_val, _base, _off) \
>> +    writel_relaxed(_val, clk_base + (_base) + ((_off) * 4))
>> +
>> +static int tegra210_clk_suspend(void)
>> +{
>> +    unsigned int i;
>> +    struct device_node *node;
>> +
>> +    tegra_cclkg_burst_policy_save_context();
>> +
>> +    if (!dfll_pdev) {
>> +    node = of_find_compatible_node(NULL, NULL,
>> +   "nvidia,tegra210-dfll");
>> +    if (node)
>> +    dfll_pdev = of_find_device_by_node(node);
>> +
>> +    of_node_put(node);
>> +    if (!dfll_pdev)
>> +    pr_err("dfll node not found. no suspend for dfll\n");
>> +    }
>> +
>> +    if (dfll_pdev)
>> +    tegra_dfll_suspend(dfll_pdev);
>> +
>> +    /* Enable PLLP_OUT_CPU after dfll suspend */
>> +    tegra_clk_set_pllp_out_cpu(true);
>> +
>> +    tegra_sclk_cclklp_burst_policy_save_context();
>> +
>> +    clk_save_context();
>> +
>> +    for (i = 0; i < ARRAY_SIZE(cpu_softrst_ctx); i++)
>> +    cpu_softrst_ctx[i] = car_readl(CPU_SOFTRST_CTRL, i);
>> +
>> +    return 0;
>> +}
>> +
>> +static void tegra210_clk_resume(void)
>> +{
>> +    unsigned int i;
>> +    struct clk_hw *parent;
>> +    struct clk *clk;
>> +
>> +    /*
>> + * clk_restore_context restores clocks as per the clock tree.
>> + *
>> + * dfllCPU_out is first in the clock tree to get restored and it
>> + * involves programming DFLL controller along with restoring
>> CPUG
>> + * clock burst policy.
>> + *
>> + * DFLL programming needs dfll_ref and 

RE: [EXT] Re: [v4,1/2] rtc/fsl: add FTM alarm driver as the wakeup source

2019-07-15 Thread Biwen Li
> Caution: EXT Email
> 
> On 15/07/2019 18:15:19+0800, Biwen Li wrote:
> > + device_init_wakeup(&pdev->dev, true);
> > + rtc->rtc_dev = devm_rtc_device_register(&pdev->dev, "ftm-alarm",
> > + &ftm_rtc_ops, THIS_MODULE);
> 
> To be clear, I want you to not use devm_rtc_device_register and use
> devm_rtc_allocate_device and rtc_register_device.
I will correct it in v5.
> 
> > + if (IS_ERR(rtc->rtc_dev)) {
> > + dev_err(&pdev->dev, "can't register rtc device\n");
> > + return PTR_ERR(rtc->rtc_dev);
> > + }
> > +
> 
> --
> Alexandre Belloni, Bootlin
> Embedded Linux and Kernel engineering
> https://bootlin.com


Re: linux-next: build warning after merge of the ntb tree

2019-07-15 Thread Logan Gunthorpe



On 2019-07-15 9:03 p.m., Stephen Rothwell wrote:
> Hi all,
> 
> After merging the ntb tree, today's linux-next build (x86_64 allmodconfig)
> produced this warning:
> 
> WARNING: could not open /home/sfr/next/next/drivers/ntb/ntb.c: No such file 
> or directory
> 
> The only thing I could see that might be relevant is commit
> 
>   56dce8121e97 ("kbuild: split out *.mod out of {single,multi}-used-m rules")
> 
> and some others in the kbuild tree. Nothing has changed recently in the
> ntb tree ...
> 
> drivers/ntb builds a module called ntb but there is no ntb.c file.
> 
> Any ideas?

I renamed the ntb.c file to core.c so we could add more files to build
ntb.ko. See [1].

Thanks,

Logan


[1]
https://lore.kernel.org/linux-iommu/20190422210528.15289-6-log...@deltatee.com/


[PATCH v4 4/4] numa: introduce numa cling feature

2019-07-15 Thread 王贇
Although we pay a lot of effort to settle a task down on a particular
node, there are still chances for the task to leave its preferred
node, via wakeup, numa swap migration or load balancing.

When cpu cgroups are used in a shared way, all the workloads see all
the cpus, which can be really bad, especially when there are many fast
wakeups. Although we can now put the tasks into numa groups, they
won't really stay on the same node; for example with numa groups ng_A,
ng_B, ng_C and ng_D, the result is very likely:

CPU Usage:
Node 0  Node 1
ng_A(600%)  ng_A(400%)
ng_B(400%)  ng_B(600%)
ng_C(400%)  ng_C(600%)
ng_D(600%)  ng_D(400%)

Memory Ratio:
Node 0  Node 1
ng_A(60%)   ng_A(40%)
ng_B(40%)   ng_B(60%)
ng_C(40%)   ng_C(60%)
ng_D(60%)   ng_D(40%)

Locality won't be too bad, but it is far from the best situation; we
want a numa group to settle down entirely on a particular node, with
everything balanced.

Thus we introduce numa cling, which tries to prevent tasks from
leaving the preferred node on the wakeup fast path.

This helps settle the workloads down on a single node, but when
multiple numa groups try to settle down on the same node, unbalance
could happen.

For example with numa groups ng_A, ng_B, ng_C and ng_D, it may result
in a situation like:

CPU Usage:
Node 0  Node 1
ng_A(1000%) ng_B(1000%)
ng_C(400%)  ng_C(600%)
ng_D(400%)  ng_D(600%)

Memory Ratio:
Node 0  Node 1
ng_A(100%)  ng_B(100%)
ng_C(10%)   ng_C(90%)
ng_D(10%)   ng_D(90%)

This is because when ng_C and ng_D come to have most of their memory
on node 1 at some point, task_x of ng_C staying on node 0 will try to
do a numa swap migration with task_y of ng_D staying on node 1 as long
as the load stays balanced; the result is that task_x ends up on node
1 and task_y on node 0, while both of them prefer node 1.

Now when other tasks of ng_D on node 1 wake up task_y, task_y will
very likely go back to node 1, and since numa cling is enabled it will
stay on node 1 even though the load is unbalanced; this can happen
frequently, and more and more tasks will come to prefer node 1 and
make it busy.

So the key point here is to stop doing numa cling when the load starts
to become unbalanced.

We achieve this by monitoring the migration failure ratio: in the
scenario above, too many tasks prefer node 1 and will keep migrating
to it, and the resulting imbalance can lead to migration failures.
When the failure ratio rises above the specified degree, we pause the
cling and try to resettle the workloads on a better node by stopping
tasks from preferring the busy node, which finally gives us a result
like:

CPU Usage:
Node 0  Node 1
ng_A(1000%) ng_B(1000%)
ng_C(1000%) ng_D(1000%)

Memory Ratio:
Node 0  Node 1
ng_A(100%)  ng_B(100%)
ng_C(100%)  ng_D(100%)

Now we achieved the best locality and maximum hot cache benefit.

Tested on a 2-node box with 96 cpus with sysbench-mysql-oltp_read_write:
X mysqld instances were created and attached to X cgroups, then X
sysbench instances were created and attached to the corresponding
cgroups to test mysql with the oltp_read_write script for 20 minutes;
the average eps shows:

origin  ng + cling
4 instances each 24 threads 7641.27 8010.18 +4.83%
4 instances each 48 threads 9423.39 10021.03+6.34%
4 instances each 72 threads 9691.47 10192.73+5.17%

8 instances each 24 threads 4485.44 4577.95 +2.06%
8 instances each 48 threads 5565.06 5737.50 +3.10%
8 instances each 72 threads 5605.20 5752.33 +2.63%

Also tested with perf-bench-numa, dbench, sysbench-memory and pgbench;
a tiny improvement was observed.

Signed-off-by: Michael Wang 
---
Since v3:
  * numa cling no longer override select_idle_sibling, instead we
prevent numa swap migration with tasks cling to dst-node, also
prevent wake affine to drag tasks away which already cling to
prev-cpu
  * refine comments

 include/linux/sched/sysctl.h |   3 +
 kernel/sched/fair.c  | 296 ---
 kernel/sysctl.c  |   9 ++
 3 files changed, 292 insertions(+), 16 deletions(-)

diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index d4f6215ee03f..6eef34331dd2 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -38,6 +38,9 @@ extern unsigned int sysctl_numa_balancing_scan_period_min;
 extern unsigned int sysctl_numa_balancing_scan_period_max;
 extern unsigned int sysctl_numa_balancing_scan_size;

+extern unsigned int sysctl_numa_balancing_cling_degree;
+extern unsigned int 

Re: [PATCH V5 11/18] clk: tegra210: Add support for Tegra210 clocks

2019-07-15 Thread Sowjanya Komatineni



On 7/15/19 8:00 PM, Sowjanya Komatineni wrote:


On 7/15/19 5:35 PM, Sowjanya Komatineni wrote:


On 7/14/19 2:41 PM, Dmitry Osipenko wrote:

13.07.2019 8:54, Sowjanya Komatineni пишет:

On 6/29/19 8:10 AM, Dmitry Osipenko wrote:

28.06.2019 5:12, Sowjanya Komatineni пишет:

This patch adds system suspend and resume support for Tegra210
clocks.

All the CAR controller settings are lost on suspend when core power
goes off.

This patch has implementation for saving and restoring all the PLLs
and clocks context during system suspend and resume to have the
clocks back to same state for normal operation.

Acked-by: Thierry Reding 
Signed-off-by: Sowjanya Komatineni 
---
   drivers/clk/tegra/clk-tegra210.c | 115
++-
   drivers/clk/tegra/clk.c  |  14 +
   drivers/clk/tegra/clk.h  |   1 +
   3 files changed, 127 insertions(+), 3 deletions(-)

diff --git a/drivers/clk/tegra/clk-tegra210.c
b/drivers/clk/tegra/clk-tegra210.c
index 1c08c53482a5..1b839544e086 100644
--- a/drivers/clk/tegra/clk-tegra210.c
+++ b/drivers/clk/tegra/clk-tegra210.c
@@ -9,10 +9,12 @@
   #include 
   #include 
   #include 
+#include 
   #include 
   #include 
   #include 
   #include 
+#include 
   #include 
   #include 
   #include 
@@ -20,6 +22,7 @@
   #include 
     #include "clk.h"
+#include "clk-dfll.h"
   #include "clk-id.h"
     /*
@@ -225,6 +228,7 @@
     #define CLK_RST_CONTROLLER_RST_DEV_Y_SET 0x2a8
   #define CLK_RST_CONTROLLER_RST_DEV_Y_CLR 0x2ac
+#define CPU_SOFTRST_CTRL 0x380
     #define LVL2_CLK_GATE_OVRA 0xf8
   #define LVL2_CLK_GATE_OVRC 0x3a0
@@ -2820,6 +2824,7 @@ static int tegra210_enable_pllu(void)
   struct tegra_clk_pll_freq_table *fentry;
   struct tegra_clk_pll pllu;
   u32 reg;
+    int ret;
     for (fentry = pll_u_freq_table; fentry->input_rate; 
fentry++) {

   if (fentry->input_rate == pll_ref_freq)
@@ -2847,10 +2852,10 @@ static int tegra210_enable_pllu(void)
   fence_udelay(1, clk_base);
   reg |= PLL_ENABLE;
   writel(reg, clk_base + PLLU_BASE);
+    fence_udelay(1, clk_base);
   -    readl_relaxed_poll_timeout_atomic(clk_base + PLLU_BASE, reg,
-  reg & PLL_BASE_LOCK, 2, 1000);
-    if (!(reg & PLL_BASE_LOCK)) {
+    ret = tegra210_wait_for_mask(&pllu, PLLU_BASE, PLL_BASE_LOCK);
+    if (ret) {
   pr_err("Timed out waiting for PLL_U to lock\n");
   return -ETIMEDOUT;
   }
@@ -3283,6 +3288,103 @@ static void 
tegra210_disable_cpu_clock(u32 cpu)

   }
     #ifdef CONFIG_PM_SLEEP
+static u32 cpu_softrst_ctx[3];
+static struct platform_device *dfll_pdev;
+#define car_readl(_base, _off) readl_relaxed(clk_base + (_base) +
((_off) * 4))
+#define car_writel(_val, _base, _off) \
+    writel_relaxed(_val, clk_base + (_base) + ((_off) * 4))
+
+static int tegra210_clk_suspend(void)
+{
+    unsigned int i;
+    struct device_node *node;
+
+    tegra_cclkg_burst_policy_save_context();
+
+    if (!dfll_pdev) {
+    node = of_find_compatible_node(NULL, NULL,
+   "nvidia,tegra210-dfll");
+    if (node)
+    dfll_pdev = of_find_device_by_node(node);
+
+    of_node_put(node);
+    if (!dfll_pdev)
+    pr_err("dfll node not found. no suspend for dfll\n");
+    }
+
+    if (dfll_pdev)
+    tegra_dfll_suspend(dfll_pdev);
+
+    /* Enable PLLP_OUT_CPU after dfll suspend */
+    tegra_clk_set_pllp_out_cpu(true);
+
+    tegra_sclk_cclklp_burst_policy_save_context();
+
+    clk_save_context();
+
+    for (i = 0; i < ARRAY_SIZE(cpu_softrst_ctx); i++)
+    cpu_softrst_ctx[i] = car_readl(CPU_SOFTRST_CTRL, i);
+
+    return 0;
+}
+
+static void tegra210_clk_resume(void)
+{
+    unsigned int i;
+    struct clk_hw *parent;
+    struct clk *clk;
+
+    /*
+ * clk_restore_context restores clocks as per the clock tree.
+ *
+ * dfllCPU_out is first in the clock tree to get restored 
and it
+ * involves programming DFLL controller along with restoring 
CPUG

+ * clock burst policy.
+ *
+ * DFLL programming needs dfll_ref and dfll_soc peripheral 
clocks

+ * to be restores which are part ofthe peripheral clocks.

 ^ white-space

Please use spellchecker to avoid typos.

+ * So, peripheral clocks restore should happen prior to dfll 
clock

+ * restore.
+ */
+
+    tegra_clk_osc_resume(clk_base);
+    for (i = 0; i < ARRAY_SIZE(cpu_softrst_ctx); i++)
+    car_writel(cpu_softrst_ctx[i], CPU_SOFTRST_CTRL, i);
+
+    /* restore all plls and peripheral clocks */
+    tegra210_init_pllu();
+    clk_restore_context();
+
+    fence_udelay(5, clk_base);
+
+    /* resume SCLK and CPULP clocks */
+    tegra_sclk_cpulp_burst_policy_restore_context();
+
+    /*
+ * restore CPUG clocks:
+ * - enable DFLL in open loop mode
+ * - switch CPUG to DFLL clock source
+ * - close DFLL loop
+ * - sync PLLX state
+ */
+    if (dfll_pdev)
+    

[PATCH v2 2/4] numa: append per-node execution time in cpu.numa_stat

2019-07-15 Thread 王贇
This patch introduces per-node execution time information, to give a
hint about the numa efficiency.

By doing 'cat /sys/fs/cgroup/cpu/CGROUP_PATH/cpu.numa_stat', we see a
new output line headed with 'exectime', like:

  exectime 311900 407166

which means the tasks of this cgroup executed for 311900 milliseconds
on node 0, and for 407166 milliseconds on node 1.

Combined with the per-node memory info from the memory cgroup, we can
estimate the numa efficiency; for example, if memory.numa_stat shows:

  total=206892 N0=21933 N1=185171

then by monitoring the increments, if the topology stays this way and
the locality is not good, this implies that numa balancing can't
migrate the memory from node 1 to node 0, where it is being accessed
by tasks on node 0, or that the tasks can't migrate to node 1 for some
reason; you may then consider binding the workloads to the cpus of
node 1.

Signed-off-by: Michael Wang 
---
Since v1:
  * move implementation from memory cgroup into cpu group
  * exectime now accounting in hierarchical way
  * change member name into jiffies

 kernel/sched/core.c  | 12 
 kernel/sched/fair.c  |  2 ++
 kernel/sched/sched.h |  1 +
 3 files changed, 15 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 71a8d3ed8495..f8aa73aa879b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7307,6 +7307,18 @@ static int cpu_numa_stat_show(struct seq_file *sf, void 
*v)
}
seq_putc(sf, '\n');

+   seq_puts(sf, "exectime");
+   for_each_online_node(nr) {
+   int cpu;
+   u64 sum = 0;
+
+   for_each_cpu(cpu, cpumask_of_node(nr))
+   sum += per_cpu(tg->numa_stat->jiffies, cpu);
+
+   seq_printf(sf, " %u", jiffies_to_msecs(sum));
+   }
+   seq_putc(sf, '\n');
+
return 0;
 }
 #endif
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cd716355d70e..2c362266af76 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2652,6 +2652,8 @@ static void update_tg_numa_stat(struct task_struct *p)
if (idx != -1)
this_cpu_inc(tg->numa_stat->locality[idx]);

+   this_cpu_inc(tg->numa_stat->jiffies);
+
tg = tg->parent;
}

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 685a9e670880..456f83f7f595 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -360,6 +360,7 @@ struct cfs_bandwidth {

 struct numa_stat {
u64 locality[NR_NL_INTERVAL];
+   u64 jiffies;
 };

 #endif
-- 
2.14.4.44.g2045bb6



[PATCH v2 3/4] numa: introduce numa group per task group

2019-07-15 Thread 王贇
By tracing numa page faults, we recognize tasks sharing the same
pages, and try to pack them together into a single numa group.

However, when two tasks share lots of page cache pages but few
anonymous pages, they have no chance to join the same group, since
numa balancing does not trace page cache pages.

Tracing page cache pages would cost too much, but we can use hints
from userland instead, and the cpu cgroup is a good source for them.

This patch introduces a new entry 'numa_group' for the cpu cgroup; by
echoing a non-zero value into the entry, we can now force all the
tasks of this cgroup to join the same numa group, serving the task
group.

In this way tasks are more likely to settle down on the same node, to
share closer cpu caches and gain benefit from NUMA on both file and
anonymous pages.

Besides, when multiple cgroups enable the numa group, they will be
able to exchange task locations by utilizing numa migration; in this
way they can achieve single-node settle-down without breaking load
balance.

Signed-off-by: Michael Wang 
---
Since v1:
  * just rebase, no logical changes

 kernel/sched/core.c  |  33 ++
 kernel/sched/fair.c  | 175 ++-
 kernel/sched/sched.h |  11 
 3 files changed, 218 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f8aa73aa879b..9f100c48d6e4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6802,6 +6802,8 @@ void sched_offline_group(struct task_group *tg)
 {
unsigned long flags;

+   update_tg_numa_group(tg, false);
+
/* End participation in shares distribution: */
unregister_fair_sched_group(tg);

@@ -7321,6 +7323,32 @@ static int cpu_numa_stat_show(struct seq_file *sf, void 
*v)

return 0;
 }
+
+static DEFINE_MUTEX(numa_mutex);
+
+static int cpu_numa_group_show(struct seq_file *sf, void *v)
+{
+   struct task_group *tg = css_tg(seq_css(sf));
+
+   mutex_lock(&numa_mutex);
+   show_tg_numa_group(tg, sf);
+   mutex_unlock(&numa_mutex);
+
+   return 0;
+}
+
+static int cpu_numa_group_write_s64(struct cgroup_subsys_state *css,
+   struct cftype *cft, s64 numa_group)
+{
+   int ret;
+   struct task_group *tg = css_tg(css);
+
+   mutex_lock(&numa_mutex);
+   ret = update_tg_numa_group(tg, numa_group);
+   mutex_unlock(&numa_mutex);
+
+   return ret;
+}
 #endif

 static struct cftype cpu_legacy_files[] = {
@@ -7364,6 +7392,11 @@ static struct cftype cpu_legacy_files[] = {
.name = "numa_stat",
.seq_show = cpu_numa_stat_show,
},
+   {
+   .name = "numa_group",
+   .write_s64 = cpu_numa_group_write_s64,
+   .seq_show = cpu_numa_group_show,
+   },
 #endif
{ } /* Terminate */
 };
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2c362266af76..c28ba040a563 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1073,6 +1073,7 @@ struct numa_group {
int nr_tasks;
pid_t gid;
int active_nodes;
+   bool evacuate;

struct rcu_head rcu;
unsigned long total_faults;
@@ -2246,6 +2247,176 @@ static inline void put_numa_group(struct numa_group 
*grp)
kfree_rcu(grp, rcu);
 }

+void show_tg_numa_group(struct task_group *tg, struct seq_file *sf)
+{
+   int nid;
+   struct numa_group *ng = tg->numa_group;
+
+   if (!ng) {
+   seq_puts(sf, "disabled\n");
+   return;
+   }
+
+   seq_printf(sf, "id %d nr_tasks %d active_nodes %d\n",
+  ng->gid, ng->nr_tasks, ng->active_nodes);
+
+   for_each_online_node(nid) {
+   int f_idx = task_faults_idx(NUMA_MEM, nid, 0);
+   int pf_idx = task_faults_idx(NUMA_MEM, nid, 1);
+
+   seq_printf(sf, "node %d ", nid);
+
+   seq_printf(sf, "mem_private %lu mem_shared %lu ",
+  ng->faults[f_idx], ng->faults[pf_idx]);
+
+   seq_printf(sf, "cpu_private %lu cpu_shared %lu\n",
+  ng->faults_cpu[f_idx], ng->faults_cpu[pf_idx]);
+   }
+}
+
+int update_tg_numa_group(struct task_group *tg, bool numa_group)
+{
+   struct numa_group *ng = tg->numa_group;
+
+   /* if no change then do nothing */
+   if ((ng != NULL) == numa_group)
+   return 0;
+
+   if (ng) {
+   /* put and evacuate tg's numa group */
+   rcu_assign_pointer(tg->numa_group, NULL);
+   ng->evacuate = true;
+   put_numa_group(ng);
+   } else {
+   unsigned int size = sizeof(struct numa_group) +
+   4*nr_node_ids*sizeof(unsigned long);
+
+   ng = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);
+   if (!ng)
+   return -ENOMEM;
+
+   refcount_set(&ng->refcount, 1);
+   spin_lock_init(&ng->lock);
+   ng->faults_cpu = ng->faults + 

[PATCH v2 1/4] numa: introduce per-cgroup numa balancing locality statistic

2019-07-15 Thread 王贇
This patch introduces a numa locality statistic, which tries to give a
hint about the numa balancing efficiency per cpu cgroup.

During numa balancing, we trace the local page access ratio of tasks,
which we call the locality.

By doing 'cat /sys/fs/cgroup/cpu/CGROUP_PATH/cpu.numa_stat', we see an
output line headed with 'locality', like:

  locality 15393 21259 13023 44461 21247 17012 28496 145402

The locality is divided into 8 regions, each number standing for the
milliseconds we saw a task running with a locality within that region;
for example, here we have tasks with locality around 0~12% running for
15393 ms, and tasks with locality around 88~100% running for 145402 ms.

By monitoring the increments, we can check whether the workloads of a
particular cgroup are doing well with numa; when most of the tasks are
running in the low-locality regions, something is wrong with your numa
policy.

Signed-off-by: Michael Wang 
---
Since v1:
  * move implementation from memory cgroup into cpu group
  * introduce new entry 'numa_stat' to present locality
  * locality now accounting in hierarchical way
  * locality now accounted into 8 regions equally

 include/linux/sched.h |  8 +++-
 kernel/sched/core.c   | 40 
 kernel/sched/debug.c  |  7 +++
 kernel/sched/fair.c   | 49 +
 kernel/sched/sched.h  | 29 +
 5 files changed, 132 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 907808f1acc5..eb26098de6ea 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1117,8 +1117,14 @@ struct task_struct {
 * scan window were remote/local or failed to migrate. The task scan
 * period is adapted based on the locality of the faults with different
 * weights depending on whether they were shared or private faults
+*
+* 0 -- remote faults
+* 1 -- local faults
+* 2 -- page migration failure
+* 3 -- remote page accessing
+* 4 -- local page accessing
 */
-   unsigned long   numa_faults_locality[3];
+   unsigned long   numa_faults_locality[5];

unsigned long   numa_pages_migrated;
 #endif /* CONFIG_NUMA_BALANCING */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fa43ce3962e7..71a8d3ed8495 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6367,6 +6367,10 @@ static struct kmem_cache *task_group_cache __read_mostly;
 DECLARE_PER_CPU(cpumask_var_t, load_balance_mask);
 DECLARE_PER_CPU(cpumask_var_t, select_idle_mask);

+#ifdef CONFIG_NUMA_BALANCING
+DECLARE_PER_CPU(struct numa_stat, root_numa_stat);
+#endif
+
 void __init sched_init(void)
 {
unsigned long alloc_size = 0, ptr;
@@ -6416,6 +6420,10 @@ void __init sched_init(void)
init_defrootdomain();
 #endif

+#ifdef CONFIG_NUMA_BALANCING
+   root_task_group.numa_stat = &root_numa_stat;
+#endif
+
 #ifdef CONFIG_RT_GROUP_SCHED
init_rt_bandwidth(_task_group.rt_bandwidth,
global_rt_period(), global_rt_runtime());
@@ -6727,6 +6735,7 @@ static DEFINE_SPINLOCK(task_group_lock);

 static void sched_free_group(struct task_group *tg)
 {
+   free_tg_numa_stat(tg);
free_fair_sched_group(tg);
free_rt_sched_group(tg);
autogroup_free(tg);
@@ -6742,6 +6751,9 @@ struct task_group *sched_create_group(struct task_group 
*parent)
if (!tg)
return ERR_PTR(-ENOMEM);

+   if (!alloc_tg_numa_stat(tg))
+   goto err;
+
if (!alloc_fair_sched_group(tg, parent))
goto err;

@@ -7277,6 +7289,28 @@ static u64 cpu_rt_period_read_uint(struct 
cgroup_subsys_state *css,
 }
 #endif /* CONFIG_RT_GROUP_SCHED */

+#ifdef CONFIG_NUMA_BALANCING
+static int cpu_numa_stat_show(struct seq_file *sf, void *v)
+{
+   int nr;
+   struct task_group *tg = css_tg(seq_css(sf));
+
+   seq_puts(sf, "locality");
+   for (nr = 0; nr < NR_NL_INTERVAL; nr++) {
+   int cpu;
+   u64 sum = 0;
+
+   for_each_possible_cpu(cpu)
+   sum += per_cpu(tg->numa_stat->locality[nr], cpu);
+
+   seq_printf(sf, " %u", jiffies_to_msecs(sum));
+   }
+   seq_putc(sf, '\n');
+
+   return 0;
+}
+#endif
+
 static struct cftype cpu_legacy_files[] = {
 #ifdef CONFIG_FAIR_GROUP_SCHED
{
@@ -7312,6 +7346,12 @@ static struct cftype cpu_legacy_files[] = {
.read_u64 = cpu_rt_period_read_uint,
.write_u64 = cpu_rt_period_write_uint,
},
+#endif
+#ifdef CONFIG_NUMA_BALANCING
+   {
+   .name = "numa_stat",
+   .seq_show = cpu_numa_stat_show,
+   },
 #endif
{ } /* Terminate */
 };
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index f7e4579e746c..a22b2a62aee2 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -848,6 

[PATCH v2 0/4] per-cgroup numa suite

2019-07-15 Thread 王贇
During our torture testing of numa, we found problems like:

  * missing per-cgroup information about the per-node execution status
  * missing per-cgroup information about the numa locality

That is, when we have a cpu cgroup running a bunch of tasks, there is
no good way to tell how its tasks are dealing with numa.

The first two patches are trying to complete the missing pieces, but
more problems appeared after monitoring these status:

  * tasks not always running on the preferred numa node
  * tasks from same cgroup running on different nodes

The task numa group handler will always check whether tasks are sharing
pages and try to pack them into a single numa group, so they will have
a chance to settle down on the same node, but this fails in some cases:

  * workloads share page caches rather than share mappings
  * workloads got too many wakeup across nodes

Since the page cache is not traced by numa balancing, there is no way
to recognize such a relationship, and when there are too many wakeups,
tasks will be dragged from the preferred node and then migrated back
by numa balancing, repeatedly.

Here the third patch tries to address the first issue: we can now give
the kernel a hint about the relationship of tasks, and pack them into
a single numa group.

And the fourth patch introduces numa cling, which tries to address the
wakeup issue: we now try to make tasks stay on the preferred node on
the wakeup fast path. To address the unbalance risk, we monitor the
numa migration failure ratio, and pause numa cling when it reaches the
specified degree.

Since v1:
  * move statistics from memory cgroup into cpu group
  * statistics now accounting in hierarchical way
  * locality now accounted into 8 regions equally
  * numa cling no longer override select_idle_sibling, instead we
prevent numa swap migration with tasks cling to dst-node, also
prevent wake affine to drag tasks away which already cling to
prev-cpu
  * other refine on comments and names

Michael Wang (4):
  v2 numa: introduce per-cgroup numa balancing locality statistic
  v2 numa: append per-node execution time in cpu.numa_stat
  v2 numa: introduce numa group per task group
  v4 numa: introduce numa cling feature

 include/linux/sched.h|   8 +-
 include/linux/sched/sysctl.h |   3 +
 kernel/sched/core.c  |  85 
 kernel/sched/debug.c |   7 +
 kernel/sched/fair.c  | 510 ++-
 kernel/sched/sched.h |  41 
 kernel/sysctl.c  |   9 +
 7 files changed, 651 insertions(+), 12 deletions(-)

-- 
2.14.4.44.g2045bb6



[PATCH] sched/core: fix double fetch in sched_copy_attr()

2019-07-15 Thread JingYi Hou
In sched_copy_attr(), attr->size is fetched twice, once via get_user()
and once via copy_from_user().

If userspace changes it between the two fetches, it may cause security
problems or unexpected behavior.

We can apply the same pattern used in perf_copy_attr(): use the value
fetched the first time to overwrite it after the second fetch.

Signed-off-by: JingYi Hou 
---
 kernel/sched/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2b037f195473..60088b907ef4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4945,6 +4945,8 @@ static int sched_copy_attr(struct sched_attr __user 
*uattr, struct sched_attr *a
ret = copy_from_user(attr, uattr, size);
if (ret)
return -EFAULT;
+
+   attr->size = size;

if ((attr->sched_flags & SCHED_FLAG_UTIL_CLAMP) &&
size < SCHED_ATTR_SIZE_VER1)
--
2.20.1



Re: [GIT PULL] Backlight for v5.3

2019-07-15 Thread Linus Torvalds
On Mon, Jul 15, 2019 at 1:00 AM Lee Jones  wrote:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight.git 
> backlight-next-5.3

Hmm. No such ref exists.

I can see the "backlight-next" branch which has the right commit SHA1,
but you normally do a signed tag, and I'd much rather merge a signed
tag than a bare branch for linux-next.

Did you perhaps forget to push it out?

  Linus


[PATCH] arm64: dts: imx8mm: Correct SAI3 RXC/TXFS pin's mux option #1

2019-07-15 Thread Anson . Huang
From: Anson Huang 

According to i.MX8MM reference manual Rev.1, 03/2019:

SAI3_RXC pin's mux option #1 should be GPT1_CLK, NOT GPT1_CAPTURE2;
SAI3_TXFS pin's mux option #1 should be GPT1_CAPTURE2, NOT GPT1_CLK.

Fixes: c1c9d41319c3 ("dt-bindings: imx: Add pinctrl binding doc for imx8mm")
Signed-off-by: Anson Huang 
---
 arch/arm64/boot/dts/freescale/imx8mm-pinfunc.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/boot/dts/freescale/imx8mm-pinfunc.h 
b/arch/arm64/boot/dts/freescale/imx8mm-pinfunc.h
index e25f7fc..cffa899 100644
--- a/arch/arm64/boot/dts/freescale/imx8mm-pinfunc.h
+++ b/arch/arm64/boot/dts/freescale/imx8mm-pinfunc.h
@@ -462,7 +462,7 @@
 #define MX8MM_IOMUXC_SAI3_RXFS_GPIO4_IO28   
0x1CC 0x434 0x000 0x5 0x0
 #define MX8MM_IOMUXC_SAI3_RXFS_TPSMP_HTRANS0
0x1CC 0x434 0x000 0x7 0x0
 #define MX8MM_IOMUXC_SAI3_RXC_SAI3_RX_BCLK  
0x1D0 0x438 0x000 0x0 0x0
-#define MX8MM_IOMUXC_SAI3_RXC_GPT1_CAPTURE2 
0x1D0 0x438 0x000 0x1 0x0
+#define MX8MM_IOMUXC_SAI3_RXC_GPT1_CLK  
0x1D0 0x438 0x000 0x1 0x0
 #define MX8MM_IOMUXC_SAI3_RXC_SAI5_RX_BCLK  
0x1D0 0x438 0x4D0 0x2 0x2
 #define MX8MM_IOMUXC_SAI3_RXC_GPIO4_IO29
0x1D0 0x438 0x000 0x5 0x0
 #define MX8MM_IOMUXC_SAI3_RXC_TPSMP_HTRANS1 
0x1D0 0x438 0x000 0x7 0x0
@@ -472,7 +472,7 @@
 #define MX8MM_IOMUXC_SAI3_RXD_GPIO4_IO30
0x1D4 0x43C 0x000 0x5 0x0
 #define MX8MM_IOMUXC_SAI3_RXD_TPSMP_HDATA0  
0x1D4 0x43C 0x000 0x7 0x0
 #define MX8MM_IOMUXC_SAI3_TXFS_SAI3_TX_SYNC 
0x1D8 0x440 0x000 0x0 0x0
-#define MX8MM_IOMUXC_SAI3_TXFS_GPT1_CLK 
0x1D8 0x440 0x000 0x1 0x0
+#define MX8MM_IOMUXC_SAI3_TXFS_GPT1_CAPTURE2
0x1D8 0x440 0x000 0x1 0x0
 #define MX8MM_IOMUXC_SAI3_TXFS_SAI5_RX_DATA1
0x1D8 0x440 0x4D8 0x2 0x2
 #define MX8MM_IOMUXC_SAI3_TXFS_GPIO4_IO31   
0x1D8 0x440 0x000 0x5 0x0
 #define MX8MM_IOMUXC_SAI3_TXFS_TPSMP_HDATA1 
0x1D8 0x440 0x000 0x7 0x0
-- 
2.7.4



[PATCH v3] coccinelle: semantic code search for missing of_node_put

2019-07-15 Thread Wen Yang
There are functions which increment a reference counter for a device node.
These functions belong to a programming interface for the management
of information from device trees.
The counter must be decremented after the last usage of a device node.
We find these functions by using the following script:


@initialize:ocaml@
@@

let relevant_str = "use of_node_put() on it when done"

let contains s1 s2 =
let re = Str.regexp_string s2
in
try ignore (Str.search_forward re s1 0); true
with Not_found -> false

let relevant_functions = ref []

let add_function f c =
if not (List.mem f !relevant_functions)
then
  begin
let s = String.concat " "
  (
(List.map String.lowercase_ascii
  (List.filter
(function x ->
  Str.string_match
  (Str.regexp "[a-zA-Z_\\(\\)][-a-zA-Z0-9_\\(\\)]*$")
x 0) (Str.split (Str.regexp "[ .;\t\n]+") c))))
 in
 if contains s relevant_str
 then
   begin
 Printf.printf "Found relevant function: %s\n" f;
 relevant_functions := f :: !relevant_functions
   end
  end

@r@
identifier fn;
comments c;
type T = struct device_node *;
@@

T@c fn(...) {
...
}

@script:ocaml@
f << r.fn;
c << r.c;
@@

let (cb,cm,ca) = List.hd c in
let c = String.concat " " cb in
add_function f c


Then copy the function names found by the above script to the r_miss_put
rule. This rule checks for missing of_node_put.

And this patch also looks for places where an of_node_put() call is on some
paths but not on others (implemented by the r_miss_put_ext rule).

Finally, this patch finds use-after-free issues for a node
(implemented by the r_use_after_put rule).

Suggested-by: Julia Lawall 
Signed-off-by: Wen Yang 
Cc: Julia Lawall 
Cc: Gilles Muller 
Cc: Nicolas Palix 
Cc: Michal Marek 
Cc: Masahiro Yamada 
Cc: Wen Yang 
Cc: Markus Elfring 
Cc: co...@systeme.lip6.fr
---
v3: delete the global set, add a rule that checks for use-after-free.
v2: improve the commit description and delete duplicate code.

 scripts/coccinelle/free/of_node_put.cocci | 192 ++
 1 file changed, 192 insertions(+)
 create mode 100644 scripts/coccinelle/free/of_node_put.cocci

diff --git a/scripts/coccinelle/free/of_node_put.cocci 
b/scripts/coccinelle/free/of_node_put.cocci
new file mode 100644
index 000..cda43fa
--- /dev/null
+++ b/scripts/coccinelle/free/of_node_put.cocci
@@ -0,0 +1,192 @@
+// SPDX-License-Identifier: GPL-2.0
+/// Find missing of_node_put
+///
+// Confidence: Moderate
+// Copyright: (C) 2018-2019 Wen Yang, ZTE.
+// Comments:
+// Options: --no-includes --include-headers
+
+virtual report
+virtual org
+
+@initialize:python@
+@@
+
+report_miss_prefix = "ERROR: missing of_node_put; acquired a node pointer with 
refcount incremented on line "
+report_miss_suffix = ", but without a corresponding object release within this 
function."
+org_miss_main = "acquired a node pointer with refcount incremented"
+org_miss_sec = "needed of_node_put"
+report_use_after_put = "ERROR: use-after-free; reference preceded by 
of_node_put on line "
+org_use_after_put_main = "of_node_put"
+org_use_after_put_sec = "reference"
+
+@r_miss_put exists@
+local idexpression struct device_node *x;
+expression e, e1;
+position p1, p2;
+statement S;
+type T, T1;
+@@
+
+* x = @p1\(of_find_all_nodes\|
+ of_get_cpu_node\|
+ of_get_parent\|
+ of_get_next_parent\|
+ of_get_next_child\|
+ of_get_next_cpu_node\|
+ of_get_compatible_child\|
+ of_get_child_by_name\|
+ of_find_node_opts_by_path\|
+ of_find_node_by_name\|
+ of_find_node_by_type\|
+ of_find_compatible_node\|
+ of_find_node_with_property\|
+ of_find_matching_node_and_match\|
+ of_find_node_by_phandle\|
+ of_parse_phandle\|
+ of_find_next_cache_node\|
+ of_get_next_available_child\)(...);
+...
+if (x == NULL || ...) S
+... when != e = (T)x
+when != of_node_put(x)
+when != of_get_next_parent(x)
+when != of_find_matching_node(x, ...)
+when != if (x) { ... return x; }
+when != v4l2_async_notifier_add_fwnode_subdev(..., <+...x...+>, ...)
+when != e1 = of_fwnode_handle(x)
+(
+ if (x) { ... when forall
+ of_node_put(x) ... }
+|
+ return (T1)x;
+|
+ return of_fwnode_handle(x);
+|
+* return@p2 ...;
+)
+
+@script:python depends on report && r_miss_put@
+p1 << r_miss_put.p1;
+p2 << r_miss_put.p2;
+@@
+
+coccilib.report.print_report(p2[0], report_miss_prefix + p1[0].line + 
report_miss_suffix)
+
+@script:python depends on org && r_miss_put@
+p1 << r_miss_put.p1;
+p2 << r_miss_put.p2;
+@@
+
+cocci.print_main(org_miss_main, p1)
+cocci.print_secs(org_miss_sec, p2)
+
+@r_miss_put_ext exists@
+local idexpression struct device_node *x;
+expression e, e1;
+position p1 != r_miss_put.p1, p2 != r_miss_put.p2;
+identifier f;
+statement S;
+type T, T1;
+@@
+
+(
+* x = 

linux-next: build warning after merge of the ntb tree

2019-07-15 Thread Stephen Rothwell
Hi all,

After merging the ntb tree, today's linux-next build (x86_64 allmodconfig)
produced this warning:

WARNING: could not open /home/sfr/next/next/drivers/ntb/ntb.c: No such file or 
directory

The only thing I could see that might be relevant is commit

  56dce8121e97 ("kbuild: split out *.mod out of {single,multi}-used-m rules")

and some others in the kbuild tree. Nothing has changed recently in the
ntb tree ...

drivers/ntb builds a module called ntb but there is no ntb.c file.

Any ideas?

-- 
Cheers,
Stephen Rothwell




Re: list corruption in deferred_split_scan()

2019-07-15 Thread Yang Shi




On 7/15/19 6:36 PM, Qian Cai wrote:



On Jul 15, 2019, at 8:22 PM, Yang Shi  wrote:



On 7/15/19 2:23 PM, Qian Cai wrote:

On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote:

Another possible lead is that without reverting the those commits below,
kdump
kernel would always also crash in shrink_slab_memcg() at this line,

map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true);

This looks a little bit weird. It seems nodeinfo[nid] is NULL? I didn't
think of where nodeinfo was freed but memcg was still online. Maybe a
check is needed:

Actually, "memcg" is NULL.

It sounds weird. shrink_slab() is called under mem_cgroup_iter(), which does pin
the memcg. So, the memcg should not go away.

Well, the commit “mm: shrinker: make shrinker not depend on memcg kmem” changed 
this line in shrink_slab_memcg(),

-   if (!memcg_kmem_enabled() || !mem_cgroup_online(memcg))
+   if (!mem_cgroup_online(memcg))
return 0;

Since the kdump kernel has the parameter “cgroup_disable=memory”, 
shrink_slab_memcg() will no longer be able to handle NULL memcg from 
mem_cgroup_iter() as,

if (mem_cgroup_disabled())  
return NULL;


Aha, yes. memcg_kmem_enabled() implicitly checks !mem_cgroup_disabled(). 
Thanks for figuring this out. I think we need to add a mem_cgroup_disabled() 
check before calling shrink_slab_memcg() as below:


diff --git a/mm/vmscan.c b/mm/vmscan.c
index a0301ed..2f03c61 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -701,7 +701,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int 
nid,

    unsigned long ret, freed = 0;
    struct shrinker *shrinker;

-   if (!mem_cgroup_is_root(memcg))
+   if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
    return shrink_slab_memcg(gfp_mask, nid, memcg, priority);

    if (!down_read_trylock(&shrinker_rwsem))




diff --git a/mm/vmscan.c b/mm/vmscan.c
index a0301ed..bacda49 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -602,6 +602,9 @@ static unsigned long shrink_slab_memcg(gfp_t
gfp_mask, int nid,
  if (!mem_cgroup_online(memcg))
  return 0;

+   if (!memcg->nodeinfo[nid])
+   return 0;
+
  if (!down_read_trylock(&shrinker_rwsem))
  return 0;


[9.072036][T1] BUG: KASAN: null-ptr-deref in shrink_slab+0x111/0x440
[9.072036][T1] Read of size 8 at addr 0dc8 by task
swapper/0/1
[9.072036][T1]
[9.072036][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-next-
20190711+ #10
[9.072036][T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
DL385
Gen10, BIOS A40 01/25/2019
[9.072036][T1] Call Trace:
[9.072036][T1]  dump_stack+0x62/0x9a
[9.072036][T1]  __kasan_report.cold.4+0xb0/0xb4
[9.072036][T1]  ? unwind_get_return_address+0x40/0x50
[9.072036][T1]  ? shrink_slab+0x111/0x440
[9.072036][T1]  kasan_report+0xc/0xe
[9.072036][T1]  __asan_load8+0x71/0xa0
[9.072036][T1]  shrink_slab+0x111/0x440
[9.072036][T1]  ? mem_cgroup_iter+0x98/0x840
[9.072036][T1]  ? unregister_shrinker+0x110/0x110
[9.072036][T1]  ? kasan_check_read+0x11/0x20
[9.072036][T1]  ? mem_cgroup_protected+0x39/0x260
[9.072036][T1]  shrink_node+0x31e/0xa30
[9.072036][T1]  ? shrink_node_memcg+0x1560/0x1560
[9.072036][T1]  ? ktime_get+0x93/0x110
[9.072036][T1]  do_try_to_free_pages+0x22f/0x820
[9.072036][T1]  ? shrink_node+0xa30/0xa30
[9.072036][T1]  ? kasan_check_read+0x11/0x20
[9.072036][T1]  ? check_chain_key+0x1df/0x2e0
[9.072036][T1]  try_to_free_pages+0x242/0x4d0
[9.072036][T1]  ? do_try_to_free_pages+0x820/0x820
[9.072036][T1]  __alloc_pages_nodemask+0x9ce/0x1bc0
[9.072036][T1]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
[9.072036][T1]  ? unwind_dump+0x260/0x260
[9.072036][T1]  ? kernel_text_address+0x33/0xc0
[9.072036][T1]  ? arch_stack_walk+0x8f/0xf0
[9.072036][T1]  ? ret_from_fork+0x22/0x40
[9.072036][T1]  alloc_page_interleave+0x18/0x130
[9.072036][T1]  alloc_pages_current+0xf6/0x110
[9.072036][T1]  allocate_slab+0x600/0x11f0
[9.072036][T1]  new_slab+0x46/0x70
[9.072036][T1]  ___slab_alloc+0x5d4/0x9c0
[9.072036][T1]  ? create_object+0x3a/0x3e0
[9.072036][T1]  ? fs_reclaim_acquire.part.15+0x5/0x30
[9.072036][T1]  ? ___might_sleep+0xab/0xc0
[9.072036][T1]  ? create_object+0x3a/0x3e0
[9.072036][T1]  __slab_alloc+0x12/0x20
[9.072036][T1]  ? __slab_alloc+0x12/0x20
[9.072036][T1]  kmem_cache_alloc+0x32a/0x400
[9.072036][T1]  create_object+0x3a/0x3e0
[9.072036][T1]  kmemleak_alloc+0x71/0xa0
[9.072036][T1]  kmem_cache_alloc+0x272/0x400
[9.072036][T1]  ? kasan_check_read+0x11/0x20
[9.072036][T1]  ? do_raw_spin_unlock+0xa8/0x140
[9.072036][T1]  acpi_ps_alloc_op+0x76/0x122
[

Re: [PATCH V5 11/18] clk: tegra210: Add support for Tegra210 clocks

2019-07-15 Thread Sowjanya Komatineni



On 7/15/19 5:35 PM, Sowjanya Komatineni wrote:


On 7/14/19 2:41 PM, Dmitry Osipenko wrote:

On 13.07.2019 8:54, Sowjanya Komatineni wrote:

On 6/29/19 8:10 AM, Dmitry Osipenko wrote:

On 28.06.2019 5:12, Sowjanya Komatineni wrote:

This patch adds system suspend and resume support for Tegra210
clocks.

All the CAR controller settings are lost on suspend when core power
goes off.

This patch has implementation for saving and restoring all the PLLs
and clocks context during system suspend and resume to have the
clocks back to same state for normal operation.

Acked-by: Thierry Reding 
Signed-off-by: Sowjanya Komatineni 
---
   drivers/clk/tegra/clk-tegra210.c | 115
++-
   drivers/clk/tegra/clk.c  |  14 +
   drivers/clk/tegra/clk.h  |   1 +
   3 files changed, 127 insertions(+), 3 deletions(-)

diff --git a/drivers/clk/tegra/clk-tegra210.c
b/drivers/clk/tegra/clk-tegra210.c
index 1c08c53482a5..1b839544e086 100644
--- a/drivers/clk/tegra/clk-tegra210.c
+++ b/drivers/clk/tegra/clk-tegra210.c
@@ -9,10 +9,12 @@
   #include 
   #include 
   #include 
+#include 
   #include 
   #include 
   #include 
   #include 
+#include 
   #include 
   #include 
   #include 
@@ -20,6 +22,7 @@
   #include 
     #include "clk.h"
+#include "clk-dfll.h"
   #include "clk-id.h"
     /*
@@ -225,6 +228,7 @@
     #define CLK_RST_CONTROLLER_RST_DEV_Y_SET 0x2a8
   #define CLK_RST_CONTROLLER_RST_DEV_Y_CLR 0x2ac
+#define CPU_SOFTRST_CTRL 0x380
     #define LVL2_CLK_GATE_OVRA 0xf8
   #define LVL2_CLK_GATE_OVRC 0x3a0
@@ -2820,6 +2824,7 @@ static int tegra210_enable_pllu(void)
   struct tegra_clk_pll_freq_table *fentry;
   struct tegra_clk_pll pllu;
   u32 reg;
+    int ret;
     for (fentry = pll_u_freq_table; fentry->input_rate; 
fentry++) {

   if (fentry->input_rate == pll_ref_freq)
@@ -2847,10 +2852,10 @@ static int tegra210_enable_pllu(void)
   fence_udelay(1, clk_base);
   reg |= PLL_ENABLE;
   writel(reg, clk_base + PLLU_BASE);
+    fence_udelay(1, clk_base);
   -    readl_relaxed_poll_timeout_atomic(clk_base + PLLU_BASE, reg,
-  reg & PLL_BASE_LOCK, 2, 1000);
-    if (!(reg & PLL_BASE_LOCK)) {
+    ret = tegra210_wait_for_mask(&pllu, PLLU_BASE, PLL_BASE_LOCK);
+    if (ret) {
   pr_err("Timed out waiting for PLL_U to lock\n");
   return -ETIMEDOUT;
   }
@@ -3283,6 +3288,103 @@ static void tegra210_disable_cpu_clock(u32 
cpu)

   }
     #ifdef CONFIG_PM_SLEEP
+static u32 cpu_softrst_ctx[3];
+static struct platform_device *dfll_pdev;
+#define car_readl(_base, _off) readl_relaxed(clk_base + (_base) +
((_off) * 4))
+#define car_writel(_val, _base, _off) \
+    writel_relaxed(_val, clk_base + (_base) + ((_off) * 4))
+
+static int tegra210_clk_suspend(void)
+{
+    unsigned int i;
+    struct device_node *node;
+
+    tegra_cclkg_burst_policy_save_context();
+
+    if (!dfll_pdev) {
+    node = of_find_compatible_node(NULL, NULL,
+   "nvidia,tegra210-dfll");
+    if (node)
+    dfll_pdev = of_find_device_by_node(node);
+
+    of_node_put(node);
+    if (!dfll_pdev)
+    pr_err("dfll node not found. no suspend for dfll\n");
+    }
+
+    if (dfll_pdev)
+    tegra_dfll_suspend(dfll_pdev);
+
+    /* Enable PLLP_OUT_CPU after dfll suspend */
+    tegra_clk_set_pllp_out_cpu(true);
+
+    tegra_sclk_cclklp_burst_policy_save_context();
+
+    clk_save_context();
+
+    for (i = 0; i < ARRAY_SIZE(cpu_softrst_ctx); i++)
+    cpu_softrst_ctx[i] = car_readl(CPU_SOFTRST_CTRL, i);
+
+    return 0;
+}
+
+static void tegra210_clk_resume(void)
+{
+    unsigned int i;
+    struct clk_hw *parent;
+    struct clk *clk;
+
+    /*
+ * clk_restore_context restores clocks as per the clock tree.
+ *
+ * dfllCPU_out is first in the clock tree to get restored and it
+ * involves programming DFLL controller along with restoring 
CPUG

+ * clock burst policy.
+ *
+ * DFLL programming needs dfll_ref and dfll_soc peripheral 
clocks

+ * to be restores which are part ofthe peripheral clocks.

 ^ white-space

Please use spellchecker to avoid typos.

+ * So, peripheral clocks restore should happen prior to dfll 
clock

+ * restore.
+ */
+
+    tegra_clk_osc_resume(clk_base);
+    for (i = 0; i < ARRAY_SIZE(cpu_softrst_ctx); i++)
+    car_writel(cpu_softrst_ctx[i], CPU_SOFTRST_CTRL, i);
+
+    /* restore all plls and peripheral clocks */
+    tegra210_init_pllu();
+    clk_restore_context();
+
+    fence_udelay(5, clk_base);
+
+    /* resume SCLK and CPULP clocks */
+    tegra_sclk_cpulp_burst_policy_restore_context();
+
+    /*
+ * restore CPUG clocks:
+ * - enable DFLL in open loop mode
+ * - switch CPUG to DFLL clock source
+ * - close DFLL loop
+ * - sync PLLX state
+ */
+    if (dfll_pdev)
+    tegra_dfll_resume(dfll_pdev, false);
+
+    

Re: [PATCH V35 15/29] acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down

2019-07-15 Thread Dave Young
Hi,
On 07/15/19 at 12:59pm, Matthew Garrett wrote:
> From: Josh Boyer 
> 
> This option allows userspace to pass the RSDP address to the kernel, which
> makes it possible for a user to modify the workings of hardware.  Reject
> the option when the kernel is locked down.
> 
> Signed-off-by: Josh Boyer 
> Signed-off-by: David Howells 
> Signed-off-by: Matthew Garrett 
> Reviewed-by: Kees Cook 
> cc: Dave Young 
> cc: linux-a...@vger.kernel.org
> ---
>  drivers/acpi/osl.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index 9c0edf2fc0dd..06e7cffc4386 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -26,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -180,7 +181,7 @@ acpi_physical_address __init 
> acpi_os_get_root_pointer(void)
>   acpi_physical_address pa;
>  
>  #ifdef CONFIG_KEXEC
> - if (acpi_rsdp)
> + if (acpi_rsdp && !security_locked_down(LOCKDOWN_ACPI_TABLES))
>   return acpi_rsdp;

I'm very sorry I noticed this late, but I have to say this will not work for
X86 with the latest kernel code.

acpi_physical_address __init acpi_os_get_root_pointer(void)
{
acpi_physical_address pa;

#ifdef CONFIG_KEXEC
if (acpi_rsdp)
return acpi_rsdp;
#endif
pa = acpi_arch_get_root_pointer();
if (pa)
return pa;
[snip]

In the above code, the later acpi_arch_get_root_pointer() still gets acpi_rsdp
from the cmdline param if provided.

You can check the arch/x86/boot/compressed/acpi.c, and
arch/x86/kernel/acpi/boot.c

In x86 early code, the cmdline-provided acpi_rsdp pointer is saved in
boot_params.acpi_rsdp_addr and then used in x86_default_get_root_pointer().
 
>  #endif
>   pa = acpi_arch_get_root_pointer();
> -- 
> 2.22.0.510.g264f2c817a-goog
> 

Thanks
Dave


Re: [PATCH AUTOSEL 5.1 013/219] x86/tsc: Use CPUID.0x16 to calculate missing crystal frequency

2019-07-15 Thread Daniel Drake
Hi,

On Mon, Jul 15, 2019 at 9:38 PM Sasha Levin  wrote:
> From: Daniel Drake 
>
> [ Upstream commit 604dc9170f2435d27da5039a3efd757dceadc684 ]

In my opinion this is not stable kernel material.

It alone does not solve a particular bug. It's part of a larger effort
to decrease usage of the 8254 PIT on modern platforms, which solves a
real problem (see "x86: skip PIT initialization on modern chipsets"),
but I'd conservatively recommend just leaving these patches out of the
stable tree. The problem has existed for a while without a
particularly wide impact, and there is a somewhat documented
workaround of changing BIOS settings.

Daniel


Re: [PATCH] net: sctp: fix warning "NULL check before some freeing functions is not needed"

2019-07-15 Thread Marcelo Ricardo Leitner
On Tue, Jul 16, 2019 at 07:50:02AM +0530, Hariprasad Kelam wrote:
> This patch removes NULL checks before calling kfree.
> 
> fixes below issues reported by coccicheck
> net/sctp/sm_make_chunk.c:2586:3-8: WARNING: NULL check before some
> freeing functions is not needed.
> net/sctp/sm_make_chunk.c:2652:3-8: WARNING: NULL check before some
> freeing functions is not needed.
> net/sctp/sm_make_chunk.c:2667:3-8: WARNING: NULL check before some
> freeing functions is not needed.
> net/sctp/sm_make_chunk.c:2684:3-8: WARNING: NULL check before some
> freeing functions is not needed.
> 
> Signed-off-by: Hariprasad Kelam 

Acked-by: Marcelo Ricardo Leitner 

> ---
>  net/sctp/sm_make_chunk.c | 12 
>  1 file changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index ed39396..36bd8a6e 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -2582,8 +2582,7 @@ static int sctp_process_param(struct sctp_association 
> *asoc,
>   case SCTP_PARAM_STATE_COOKIE:
>   asoc->peer.cookie_len =
>   ntohs(param.p->length) - sizeof(struct sctp_paramhdr);
> - if (asoc->peer.cookie)
> - kfree(asoc->peer.cookie);
> + kfree(asoc->peer.cookie);
>   asoc->peer.cookie = kmemdup(param.cookie->body, 
> asoc->peer.cookie_len, gfp);
>   if (!asoc->peer.cookie)
>   retval = 0;
> @@ -2648,8 +2647,7 @@ static int sctp_process_param(struct sctp_association 
> *asoc,
>   goto fall_through;
>  
>   /* Save peer's random parameter */
> - if (asoc->peer.peer_random)
> - kfree(asoc->peer.peer_random);
> + kfree(asoc->peer.peer_random);
>   asoc->peer.peer_random = kmemdup(param.p,
>   ntohs(param.p->length), gfp);
>   if (!asoc->peer.peer_random) {
> @@ -2663,8 +2661,7 @@ static int sctp_process_param(struct sctp_association 
> *asoc,
>   goto fall_through;
>  
>   /* Save peer's HMAC list */
> - if (asoc->peer.peer_hmacs)
> - kfree(asoc->peer.peer_hmacs);
> + kfree(asoc->peer.peer_hmacs);
>   asoc->peer.peer_hmacs = kmemdup(param.p,
>   ntohs(param.p->length), gfp);
>   if (!asoc->peer.peer_hmacs) {
> @@ -2680,8 +2677,7 @@ static int sctp_process_param(struct sctp_association 
> *asoc,
>   if (!ep->auth_enable)
>   goto fall_through;
>  
> - if (asoc->peer.peer_chunks)
> - kfree(asoc->peer.peer_chunks);
> + kfree(asoc->peer.peer_chunks);
>   asoc->peer.peer_chunks = kmemdup(param.p,
>   ntohs(param.p->length), gfp);
>   if (!asoc->peer.peer_chunks)
> -- 
> 2.7.4
> 


Re: [PATCH 1/4] numa: introduce per-cgroup numa balancing locality, statistic

2019-07-15 Thread 王贇
Hi Michal,

Thx for the comments :-)

On 2019/7/15 8:10 PM, Michal Koutný wrote:
> Hello Yun.
> 
> On Fri, Jul 12, 2019 at 06:10:24PM +0800, 王贇   
> wrote:
>> Forgive me but I have no idea on how to combined this
>> with memory cgroup's locality hierarchical update...
>> a parent memory cgroup does not have influence on the mems_allowed
>> of its children, correct?
> I'd recommend to look at the v2 of the cpuset controller that implements
> the hierarchical behavior among configured memory node sets.

Actually, whatever the memory node set or allowed-CPU set is, it takes
effect on the task's behavior regarding memory and CPU placement, while
the locality only cares about the results rather than the sets.

For example, if we bind tasks to the CPUs of node 0 and allow memory only
on node 1, via the cgroup controller or madvise, then they will run on
node 0 with all their memory on node 1; on each page fault taken for NUMA
balancing, the task accesses a page on node 1 remotely from node 0, so the
locality will always be 0.

> 
> (My comment would better fit to 
> [PATCH 3/4] numa: introduce numa group per task group
> IIUC, you could use cpuset controller to constraint memory nodes.)
> 
> For the second part (accessing numa statistics, i.e. this patch), I
> wonder wheter this information wouldn't be better presented under the
> cpuset controller too.

Yeah, we realized the cpu cgroup could be a better place to hold these
new statistics; both locality and exectime describe a task's running
behavior, related to memory location but not memory behavior itself. We
will apply this in the next version.

Regards,
Michael Wang

> 
> HTH,
> Michal
> 


[PATCH] net: sctp: fix warning "NULL check before some freeing functions is not needed"

2019-07-15 Thread Hariprasad Kelam
This patch removes NULL checks before calling kfree.

fixes below issues reported by coccicheck
net/sctp/sm_make_chunk.c:2586:3-8: WARNING: NULL check before some
freeing functions is not needed.
net/sctp/sm_make_chunk.c:2652:3-8: WARNING: NULL check before some
freeing functions is not needed.
net/sctp/sm_make_chunk.c:2667:3-8: WARNING: NULL check before some
freeing functions is not needed.
net/sctp/sm_make_chunk.c:2684:3-8: WARNING: NULL check before some
freeing functions is not needed.

Signed-off-by: Hariprasad Kelam 
---
 net/sctp/sm_make_chunk.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index ed39396..36bd8a6e 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -2582,8 +2582,7 @@ static int sctp_process_param(struct sctp_association 
*asoc,
case SCTP_PARAM_STATE_COOKIE:
asoc->peer.cookie_len =
ntohs(param.p->length) - sizeof(struct sctp_paramhdr);
-   if (asoc->peer.cookie)
-   kfree(asoc->peer.cookie);
+   kfree(asoc->peer.cookie);
asoc->peer.cookie = kmemdup(param.cookie->body, 
asoc->peer.cookie_len, gfp);
if (!asoc->peer.cookie)
retval = 0;
@@ -2648,8 +2647,7 @@ static int sctp_process_param(struct sctp_association 
*asoc,
goto fall_through;
 
/* Save peer's random parameter */
-   if (asoc->peer.peer_random)
-   kfree(asoc->peer.peer_random);
+   kfree(asoc->peer.peer_random);
asoc->peer.peer_random = kmemdup(param.p,
ntohs(param.p->length), gfp);
if (!asoc->peer.peer_random) {
@@ -2663,8 +2661,7 @@ static int sctp_process_param(struct sctp_association 
*asoc,
goto fall_through;
 
/* Save peer's HMAC list */
-   if (asoc->peer.peer_hmacs)
-   kfree(asoc->peer.peer_hmacs);
+   kfree(asoc->peer.peer_hmacs);
asoc->peer.peer_hmacs = kmemdup(param.p,
ntohs(param.p->length), gfp);
if (!asoc->peer.peer_hmacs) {
@@ -2680,8 +2677,7 @@ static int sctp_process_param(struct sctp_association 
*asoc,
if (!ep->auth_enable)
goto fall_through;
 
-   if (asoc->peer.peer_chunks)
-   kfree(asoc->peer.peer_chunks);
+   kfree(asoc->peer.peer_chunks);
asoc->peer.peer_chunks = kmemdup(param.p,
ntohs(param.p->length), gfp);
if (!asoc->peer.peer_chunks)
-- 
2.7.4



Re: [PATCH] ipvs: remove unnecessary space

2019-07-15 Thread yangxingwu
Pablo

v2 has been sent

I made the following changes:

1. removed all unnecessary spaces in one go
2. reverted bitmap_alloc (since it's irrelevant to this subject)
3. changed the subject to "net/netfilter: remove unnecessary spaces"

thanks

Pablo Neira Ayuso wrote on Mon, Jul 15, 2019 at 4:27 PM:
>
> On Wed, Jul 10, 2019 at 10:06:09AM +0200, Simon Horman wrote:
> > On Wed, Jul 10, 2019 at 03:45:52PM +0800, yangxingwu wrote:
> > > ---
> > >  net/netfilter/ipvs/ip_vs_mh.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/net/netfilter/ipvs/ip_vs_mh.c b/net/netfilter/ipvs/ip_vs_mh.c
> > > index 94d9d34..98e358e 100644
> > > --- a/net/netfilter/ipvs/ip_vs_mh.c
> > > +++ b/net/netfilter/ipvs/ip_vs_mh.c
> > > @@ -174,8 +174,8 @@ static int ip_vs_mh_populate(struct ip_vs_mh_state *s,
> > > return 0;
> > > }
> > >
> > > -   table =  kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
> > > -sizeof(unsigned long), GFP_KERNEL);
> > > +   table = kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
> > > +   sizeof(unsigned long), GFP_KERNEL);
>
> May I ask one thing? :-)
>
> Please, remove all unnecessary spaces in one go, search for:
>
> git grep "=  "
>
> in the netfilter tree, and send a v2 for this one.
>
> Thanks.


Re: [PATCH v4] PM / wakeup: show wakeup sources stats in sysfs

2019-07-15 Thread Greg Kroah-Hartman
On Mon, Jul 15, 2019 at 11:48:27PM +0200, Rafael J. Wysocki wrote:
> On Mon, Jul 15, 2019 at 11:44 PM Tri Vo  wrote:
> >
> > Userspace can use wakeup_sources debugfs node to plot history of suspend
> > blocking wakeup sources over device's boot cycle. This information can
> > then be used (1) for power-specific bug reporting and (2) towards
> > attributing battery consumption to specific processes over a period of
> > time.
> >
> > However, debugfs doesn't have stable ABI. For this reason, create a
> > 'struct device' to expose wakeup sources statistics in sysfs under
> > /sys/class/wakeup//.
> >
> > Introduce CONFIG_PM_SLEEP_STATS that enables/disables showing wakeup
> > source statistics in sysfs.
> 
> I'm not sure if this is really needed, but I'll let Greg decide.

You are right.  Having zillions of config options is a pain, who is
going to turn this off?

But we can always remove the option before 5.4-rc1, so I'll take this
as-is for now :)

> Apart from this
> 
> Reviewed-by: Rafael J. Wysocki 

thanks for the review!  I'll wait for 5.3-rc1 to come out before adding
this to my tree.

greg k-h


Re: [PATCH v3] PM / wakeup: show wakeup sources stats in sysfs

2019-07-15 Thread Greg KH
On Mon, Jul 15, 2019 at 02:48:13PM -0700, Tri Vo wrote:
> > And I am guessing that you actually tested this all out, and it works
> > for you?
> 
> Yes, I played around with wakelocks to make sure that wakeup source
> stats are added/updated/removed as expected.

Great!

> > Have you changed Android userspace to use the new api with no
> > problems?
> 
> Kalesh helped me test this patch (added him in Tested-by: field in
> latest patch version). We haven't tested beyond booting and manual
> inspection on android devices. Android userspace changes should be
> fairly trivial though.

Ok, I know some people wanted to do some premature optimization with the
original discussion about this, glad to see it's not needed...

thanks,

greg k-h


[PATCH v2] net/netfilter: remove unnecessary spaces

2019-07-15 Thread yangxingwu
This patch removes extra spaces.

Signed-off-by: yangxingwu 
---
 net/netfilter/ipset/ip_set_hash_gen.h  | 2 +-
 net/netfilter/ipset/ip_set_list_set.c  | 2 +-
 net/netfilter/ipvs/ip_vs_core.c| 2 +-
 net/netfilter/ipvs/ip_vs_mh.c  | 4 ++--
 net/netfilter/ipvs/ip_vs_proto_tcp.c   | 2 +-
 net/netfilter/nf_conntrack_ftp.c   | 2 +-
 net/netfilter/nf_conntrack_proto_tcp.c | 2 +-
 net/netfilter/nfnetlink_log.c  | 4 ++--
 net/netfilter/nfnetlink_queue.c| 4 ++--
 net/netfilter/xt_IDLETIMER.c   | 2 +-
 10 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/net/netfilter/ipset/ip_set_hash_gen.h 
b/net/netfilter/ipset/ip_set_hash_gen.h
index 10f6196..eb907d2 100644
--- a/net/netfilter/ipset/ip_set_hash_gen.h
+++ b/net/netfilter/ipset/ip_set_hash_gen.h
@@ -954,7 +954,7 @@ struct htype {
mtype_data_netmask(d, NCIDR_GET(h->nets[j].cidr[0]));
 #endif
key = HKEY(d, h->initval, t->htable_bits);
-   n =  rcu_dereference_bh(hbucket(t, key));
+   n = rcu_dereference_bh(hbucket(t, key));
if (!n)
continue;
for (i = 0; i < n->pos; i++) {
diff --git a/net/netfilter/ipset/ip_set_list_set.c 
b/net/netfilter/ipset/ip_set_list_set.c
index 8ada318..5c2be76 100644
--- a/net/netfilter/ipset/ip_set_list_set.c
+++ b/net/netfilter/ipset/ip_set_list_set.c
@@ -289,7 +289,7 @@ struct list_set {
if (n &&
!(SET_WITH_TIMEOUT(set) &&
  ip_set_timeout_expired(ext_timeout(n, set
-   n =  NULL;
+   n = NULL;
 
e = kzalloc(set->dsize, GFP_ATOMIC);
if (!e)
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 7138556..6b3ae76 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -615,7 +615,7 @@ int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff 
*skb,
unsigned int flags = (svc->flags & IP_VS_SVC_F_ONEPACKET &&
  iph->protocol == IPPROTO_UDP) ?
  IP_VS_CONN_F_ONE_PACKET : 0;
-   union nf_inet_addr daddr =  { .all = { 0, 0, 0, 0 } };
+   union nf_inet_addr daddr = { .all = { 0, 0, 0, 0 } };
 
/* create a new connection entry */
IP_VS_DBG(6, "%s(): create a cache_bypass entry\n", __func__);
diff --git a/net/netfilter/ipvs/ip_vs_mh.c b/net/netfilter/ipvs/ip_vs_mh.c
index 94d9d34..da0280c 100644
--- a/net/netfilter/ipvs/ip_vs_mh.c
+++ b/net/netfilter/ipvs/ip_vs_mh.c
@@ -174,8 +174,8 @@ static int ip_vs_mh_populate(struct ip_vs_mh_state *s,
return 0;
}
 
-   table =  kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
-sizeof(unsigned long), GFP_KERNEL);
+   table = kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
+   sizeof(unsigned long), GFP_KERNEL);
if (!table)
return -ENOMEM;
 
diff --git a/net/netfilter/ipvs/ip_vs_proto_tcp.c 
b/net/netfilter/ipvs/ip_vs_proto_tcp.c
index 915ac82..c7b46a9 100644
--- a/net/netfilter/ipvs/ip_vs_proto_tcp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_tcp.c
@@ -710,7 +710,7 @@ static int __ip_vs_tcp_init(struct netns_ipvs *ipvs, struct 
ip_vs_proto_data *pd
sizeof(tcp_timeouts));
if (!pd->timeout_table)
return -ENOMEM;
-   pd->tcp_state_table =  tcp_states;
+   pd->tcp_state_table = tcp_states;
return 0;
 }
 
diff --git a/net/netfilter/nf_conntrack_ftp.c b/net/netfilter/nf_conntrack_ftp.c
index 8c6c11b..26c1ff8 100644
--- a/net/netfilter/nf_conntrack_ftp.c
+++ b/net/netfilter/nf_conntrack_ftp.c
@@ -162,7 +162,7 @@ static int try_rfc959(const char *data, size_t dlen,
if (length == 0)
return 0;
 
-   cmd->u3.ip =  htonl((array[0] << 24) | (array[1] << 16) |
+   cmd->u3.ip = htonl((array[0] << 24) | (array[1] << 16) |
(array[2] << 8) | array[3]);
cmd->u.tcp.port = htons((array[4] << 8) | array[5]);
return length;
diff --git a/net/netfilter/nf_conntrack_proto_tcp.c 
b/net/netfilter/nf_conntrack_proto_tcp.c
index 1e2cc83..48f3a67 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -1225,7 +1225,7 @@ static int tcp_to_nlattr(struct sk_buff *skb, struct 
nlattr *nla,
[CTA_PROTOINFO_TCP_WSCALE_ORIGINAL] = { .type = NLA_U8 },
[CTA_PROTOINFO_TCP_WSCALE_REPLY]= { .type = NLA_U8 },
[CTA_PROTOINFO_TCP_FLAGS_ORIGINAL]  = { .len = sizeof(struct 
nf_ct_tcp_flags) },
-   [CTA_PROTOINFO_TCP_FLAGS_REPLY] = { .len =  sizeof(struct 
nf_ct_tcp_flags) },
+   [CTA_PROTOINFO_TCP_FLAGS_REPLY] = { .len = sizeof(struct 
nf_ct_tcp_flags) },
 };
 
 #define TCP_NLATTR_SIZE( \
diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
index 

RE: [EXT] [PATCH v1] net: fec: optionally reset PHY via a reset-controller

2019-07-15 Thread Andy Duan
From: Sven Van Asbroeck  Sent: Tuesday, July 16, 2019 5:05 
AM
> The current fec driver allows the PHY to be reset via a gpio, specified in the
> devicetree. However, some PHYs need to be reset in a more complex way.
> 
> To accommodate such PHYs, allow an optional reset controller in the fec
> devicetree. If no reset controller is found, the fec will fall back to the 
> legacy
> reset behaviour.
> 
> Example:
>  {
> phy-mode = "rgmii";
> resets = <_reset>;
> reset-names = "phy";
> status = "okay";
> };
> 
> Signed-off-by: Sven Van Asbroeck 
> ---
> 
> Will send a Documentation patch if this receives a positive review.
> 
>  drivers/net/ethernet/freescale/fec_main.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/freescale/fec_main.c
> b/drivers/net/ethernet/freescale/fec_main.c
> index 38f10f7dcbc3..5a5f3ed6f16d 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -61,6 +61,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
> 
> @@ -3335,6 +3336,7 @@ static int fec_enet_get_irq_cnt(struct
> platform_device *pdev)  static int  fec_probe(struct platform_device *pdev)
> {
> +   struct reset_control *phy_reset;
> struct fec_enet_private *fep;
> struct fec_platform_data *pdata;
> struct net_device *ndev;
> @@ -3490,7 +3492,9 @@ fec_probe(struct platform_device *pdev)
> pm_runtime_set_active(&pdev->dev);
> pm_runtime_enable(&pdev->dev);
> 
> -   ret = fec_reset_phy(pdev);
> +   phy_reset = devm_reset_control_get_exclusive(&pdev->dev,
> "phy");
> +   ret = IS_ERR(phy_reset) ? fec_reset_phy(pdev) :
> +   reset_control_reset(phy_reset);
> if (ret)
> goto failed_reset;

The patch looks fine.
But the reset mechanism in the driver should be abandoned, since
phylib can already handle MII bus reset and PHY device reset, as in
the code path below:

of_mdiobus_register()
of_mdiobus_register_phy()
phy_device_register()
mdiobus_register_device()
/* Assert the reset signal */
mdio_device_reset(mdiodev, 1);
/* Deassert the reset signal */
mdio_device_reset(&phydev->mdio, 0);

dts:
ethernet-phy@0 {
compatible = "ethernet-phy-id0141.0e90", 
"ethernet-phy-ieee802.3-c45";
interrupt-parent = <>;
interrupts = <35 1>;
reg = <0>;

resets = < 8>;
reset-names = "phy";
reset-gpios = < 4 1>;
reset-assert-us = <1000>;
reset-deassert-us = <2000>;
};
};

Please refer to doc:
Documentation/devicetree/bindings/net/ethernet-phy.yaml
> 
> --
> 2.17.1



Re: [PATCH v3 0/3] kernel/notifier.c: avoid duplicate registration

2019-07-15 Thread Xiaoming Ni
On 2019/7/15 13:38, Vasily Averin wrote:
> On 7/14/19 5:45 AM, Xiaoming Ni wrote:
>> On 2019/7/12 22:07, gre...@linuxfoundation.org wrote:
>>> On Fri, Jul 12, 2019 at 09:11:57PM +0800, Xiaoming Ni wrote:
 On 2019/7/11 21:57, Vasily Averin wrote:
> On 7/11/19 4:55 AM, Nixiaoming wrote:
>> On Wed, July 10, 2019 1:49 PM Vasily Averin wrote:
>>> On 7/10/19 6:09 AM, Xiaoming Ni wrote:
 Registering the same notifier to a hook repeatedly can cause the hook
 list to form a ring or lose other members of the list.
>>>
>>> I think is not enough to _prevent_ 2nd register attempt,
>>> it's enough to detect just attempt and generate warning to mark host in 
>>> bad state.
>>>
>>
>> Duplicate registration is prevented in my patch, not just "mark host in 
>> bad state"
>>
>> Duplicate registration is checked and exited in 
>> notifier_chain_cond_register()
>>
>> Duplicate registration was checked in notifier_chain_register() but only 
>> the alarm was triggered without exiting. added by commit 
>> 831246570d34692e 
>> ("kernel/notifier.c: double register detection")
>>
>> My patch is like a combination of 831246570d34692e and 
>> notifier_chain_cond_register(),
>>  which triggers an alarm and exits when a duplicate registration is 
>> detected.
>>
>>> Unexpected 2nd register of the same hook most likely will lead to 2nd 
>>> unregister,
>>> and it can lead to host crash in any time: 
>>> you can unregister notifier on first attempt it can be too early, it 
>>> can be still in use.
>>> on the other hand you can never call 2nd unregister at all.
>>
>> Since the member was not added to the linked list at the time of the 
>> second registration, 
>> no linked list ring was formed. 
>> The member is released on the first unregistration and -ENOENT on the 
>> second unregistration.
>> After patching, the fault has been alleviated
>
> You are wrong here.
> 2nd notifier's registration is a pure bug, this should never happen.
> If you know the way to reproduce this situation -- you need to fix it. 
>
> 2nd registration can happen in 2 cases:
> 1) missed rollback, when someone forget to call unregister after 
> successfull registration, 
> and then tried to call register again. It can lead to crash for example 
> when according module will be unloaded.
> 2) some subsystem is registered twice, for example from  different 
> namespaces.
> in this case unregister called during sybsystem cleanup in first 
> namespace will incorrectly remove notifier used 
> in second namespace, it also can lead to unexpacted behaviour.
>
 So in these two cases, is it more reasonable to trigger BUG() directly 
 when checking for duplicate registration ?
 But why does current notifier_chain_register() just trigger WARN() without 
 exiting ?
 notifier_chain_cond_register() direct exit without triggering WARN() ?
>>>
>>> It should recover from this, if it can be detected.  The main point is
>>> that not all apis have to be this "robust" when used within the kernel
>>> as we do allow for the callers to know what they are doing :)
>>>
>> In notifier_chain_register(), the condition ((*nl) == n) detects a repeated 
>> registration of the same hook.
>> We can intercept this situation and avoid forming a linked-list ring, to 
>> make the API more robust.
> 
> Once again -- yes, you CAN prevent list corruption, but you CANNOT recover 
> the host and return it to a safe state.
> If a double-register event is detected, it means you have a bug in the kernel.
> 
> Yes, you can add BUG() here and crash the host immediately, but I prefer to 
> use a warning in such situations.
> 
>>> If this does not cause any additional problems or slow downs, it's
>>> probably fine to add.
>>>
>> notifier_chain_register() is not a system hotspot function.
>> There is already a WARN_ONCE() check there, and the new patch adds no new 
>> check.
>> It only changes the handling under the (*nl) == n condition, which will 
>> not cause performance problems.
>> At the same time, avoiding the formation of a linked-list ring makes the 
>> system more robust.
> 
> I disagree: 
> yes, the node will have a correct list, but the node will still behave 
> wrongly and can crash the host at any time.

Sorry, my description was not accurate.

My patch does not prevent users from repeatedly registering hooks;
it avoids the linked-list ring caused by a user repeatedly registering a hook.

There are no modules in the current tree that register a hook twice.
But considering that not all modules are in the kernel source tree,
we should avoid the linked-list ring caused by repeated registration
in order to improve the robustness of the kernel API,
and to improve the efficiency of problem localization when a duplicate 
registration is detected.

Re: [PATCH v6 14/14] arm64: dts: Add power controller device node of MT8183

2019-07-15 Thread CK Hu
Hi, Weiyi:

On Mon, 2019-07-15 at 17:07 +0800, Weiyi Lu wrote:
> On Mon, 2019-07-15 at 16:07 +0800, CK Hu wrote:
> > Hi, Weiyi:
> > 
> > On Mon, 2019-07-01 at 16:57 +0800, CK Hu wrote:
> > > Hi, Weiyi:
> > > 
> > > On Thu, 2019-06-20 at 10:38 +0800, Weiyi Lu wrote:
> > > > Add a power controller node and an smi-common node for MT8183.
> > > > The scpsys node contains the clocks and regmapping of
> > > > infracfg and smi-common for bus protection.
> > > > 
> > > > Signed-off-by: Weiyi Lu 
> > > > ---
> > > >  arch/arm64/boot/dts/mediatek/mt8183.dtsi | 62 
> > > > 
> > > >  1 file changed, 62 insertions(+)
> > > > 
> > > > diff --git a/arch/arm64/boot/dts/mediatek/mt8183.dtsi 
> > > > b/arch/arm64/boot/dts/mediatek/mt8183.dtsi
> > > > index 08274bf..75c4881 100644
> > > > --- a/arch/arm64/boot/dts/mediatek/mt8183.dtsi
> > > > +++ b/arch/arm64/boot/dts/mediatek/mt8183.dtsi
> > > > @@ -8,6 +8,7 @@
> > > >  #include 
> > > >  #include 
> > > >  #include 
> > > > +#include 
> > > >  
> > > >  / {
> > > > compatible = "mediatek,mt8183";
> > > > @@ -196,6 +197,62 @@
> > > > #clock-cells = <1>;
> > > > };
> > > >  
> > > > +   scpsys: syscon@10006000 {
> > > > +   compatible = "mediatek,mt8183-scpsys", "syscon";
> > > > +   #power-domain-cells = <1>;
> > > > +   reg = <0 0x10006000 0 0x1000>;
> > > > +   clocks = < CLK_TOP_MUX_AUD_INTBUS>,
> > > > +< CLK_INFRA_AUDIO>,
> > > > +< CLK_INFRA_AUDIO_26M_BCLK>,
> > > > +< CLK_TOP_MUX_MFG>,
> > > > +< CLK_TOP_MUX_MM>,
> > > > +< CLK_TOP_MUX_CAM>,
> > > > +< CLK_TOP_MUX_IMG>,
> > > > +< CLK_TOP_MUX_IPU_IF>,
> > > > +< CLK_TOP_MUX_DSP>,
> > > > +< CLK_TOP_MUX_DSP1>,
> > > > +< CLK_TOP_MUX_DSP2>,
> > > > +< CLK_MM_SMI_COMMON>,
> > > > +< CLK_MM_SMI_LARB0>,
> > > > +< CLK_MM_SMI_LARB1>,
> > > > +< CLK_MM_GALS_COMM0>,
> > > > +< CLK_MM_GALS_COMM1>,
> > > > +< CLK_MM_GALS_CCU2MM>,
> > > > +< CLK_MM_GALS_IPU12MM>,
> > > > +< CLK_MM_GALS_IMG2MM>,
> > > > +< CLK_MM_GALS_CAM2MM>,
> > > > +< CLK_MM_GALS_IPU2MM>,
> > 
> > I've removed all mmsys clock in scpsys node and display still works, so
> > I think these subsys clock could be removed from scpsys node. It's
> > reasonable that subsys clock is controlled by subsys device or the
> > device use it. In MT2712 [1], the scpsys does not control subsys clock
> > and it works, so I think you should remove subsys clock in scpsys device
> > node.
> > 
> > [1]
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/boot/dts/mediatek/mt2712e.dtsi?h=v5.2
> > 
> > Regards,
> > CK
> > 
> 
> Hello CK,
> 
> Sorry, I can't agree with you at all.
> I thought you just created an environment where the MM (DISP) power
> domain could not be turned on and off properly.
> If you delete those mmsys clocks listed, bus protection will not work.
> These clocks are used for bus protection that I mentioned in patch [2].
> I guess you are now trying to solve the problem that mmsys blocks are
> used for probing two drivers. One for the display and another for the
> clock. Right?
> In the previous test you mentioned, you affected the registration
> of the mmsys clock first. That is why you saw the boot failure. I think boot
> failure is the real problem I should avoid if the mmsys clock cannot probe.
> 
> [2] https://patchwork.kernel.org/patch/11005747/
> 

OK, I'll try another way to fix the probe problem, but I still have a
question about bus protection. I'm not sure how bus protection works,
but I think that what mtk_scpsys_ext_clear_bus_protection() does could be
moved into mtk_smi_clk_enable(). What do you think?

Regards,
CK

> > 
> > > 
> > > Up to now, MT8183 mmsys has the same resource with another device node:
> > > 
> > >   mmsys: syscon@1400 {
> > >   compatible = "mediatek,mt8183-mmsys", "syscon";
> > >   reg = <0 0x1400 0 0x1000>;
> > >   #clock-cells = <1>;
> > >   };
> > > 
> > >   display_components: dispsys@1400 {
> > >   compatible = "mediatek,mt8183-display";
> > >   reg = <0 0x1400 0 0x1000>;
> > >   power-domains = < MT8183_POWER_DOMAIN_DISP>;
> > >   };
> > > 
> > > I think this two node 

Re: list corruption in deferred_split_scan()

2019-07-15 Thread Qian Cai



> On Jul 15, 2019, at 8:22 PM, Yang Shi  wrote:
> 
> 
> 
> On 7/15/19 2:23 PM, Qian Cai wrote:
>> On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote:
 Another possible lead is that without reverting those commits below, the
 kdump
 kernel would always also crash in shrink_slab_memcg() at this line,
 
 map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true);
>>> This looks a little bit weird. It seems nodeinfo[nid] is NULL? I can't
>>> think of where nodeinfo would be freed while memcg was still online. Maybe a
>>> check is needed:
>> Actually, "memcg" is NULL.
> 
> It sounds weird. shrink_slab() is called under mem_cgroup_iter(), which does 
> pin the memcg. So, the memcg should not go away.

Well, the commit “mm: shrinker: make shrinker not depend on memcg kmem” changed 
this line in shrink_slab_memcg(),

-   if (!memcg_kmem_enabled() || !mem_cgroup_online(memcg))
+   if (!mem_cgroup_online(memcg))
return 0;

Since the kdump kernel has the parameter “cgroup_disable=memory”, 
shrink_slab_memcg() will no longer be able to handle NULL memcg from 
mem_cgroup_iter() as,

if (mem_cgroup_disabled())  
return NULL;

> 
>> 
>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>> index a0301ed..bacda49 100644
>>> --- a/mm/vmscan.c
>>> +++ b/mm/vmscan.c
>>> @@ -602,6 +602,9 @@ static unsigned long shrink_slab_memcg(gfp_t
>>> gfp_mask, int nid,
>>>  if (!mem_cgroup_online(memcg))
>>>  return 0;
>>> 
>>> +   if (!memcg->nodeinfo[nid])
>>> +   return 0;
>>> +
>>>  if (!down_read_trylock(_rwsem))
>>>  return 0;
>>> 
 [9.072036][T1] BUG: KASAN: null-ptr-deref in 
 shrink_slab+0x111/0x440
 [9.072036][T1] Read of size 8 at addr 0dc8 by task
 swapper/0/1
 [9.072036][T1]
 [9.072036][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
 5.2.0-next-
 20190711+ #10
 [9.072036][T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
 DL385
 Gen10, BIOS A40 01/25/2019
 [9.072036][T1] Call Trace:
 [9.072036][T1]  dump_stack+0x62/0x9a
 [9.072036][T1]  __kasan_report.cold.4+0xb0/0xb4
 [9.072036][T1]  ? unwind_get_return_address+0x40/0x50
 [9.072036][T1]  ? shrink_slab+0x111/0x440
 [9.072036][T1]  kasan_report+0xc/0xe
 [9.072036][T1]  __asan_load8+0x71/0xa0
 [9.072036][T1]  shrink_slab+0x111/0x440
 [9.072036][T1]  ? mem_cgroup_iter+0x98/0x840
 [9.072036][T1]  ? unregister_shrinker+0x110/0x110
 [9.072036][T1]  ? kasan_check_read+0x11/0x20
 [9.072036][T1]  ? mem_cgroup_protected+0x39/0x260
 [9.072036][T1]  shrink_node+0x31e/0xa30
 [9.072036][T1]  ? shrink_node_memcg+0x1560/0x1560
 [9.072036][T1]  ? ktime_get+0x93/0x110
 [9.072036][T1]  do_try_to_free_pages+0x22f/0x820
 [9.072036][T1]  ? shrink_node+0xa30/0xa30
 [9.072036][T1]  ? kasan_check_read+0x11/0x20
 [9.072036][T1]  ? check_chain_key+0x1df/0x2e0
 [9.072036][T1]  try_to_free_pages+0x242/0x4d0
 [9.072036][T1]  ? do_try_to_free_pages+0x820/0x820
 [9.072036][T1]  __alloc_pages_nodemask+0x9ce/0x1bc0
 [9.072036][T1]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
 [9.072036][T1]  ? unwind_dump+0x260/0x260
 [9.072036][T1]  ? kernel_text_address+0x33/0xc0
 [9.072036][T1]  ? arch_stack_walk+0x8f/0xf0
 [9.072036][T1]  ? ret_from_fork+0x22/0x40
 [9.072036][T1]  alloc_page_interleave+0x18/0x130
 [9.072036][T1]  alloc_pages_current+0xf6/0x110
 [9.072036][T1]  allocate_slab+0x600/0x11f0
 [9.072036][T1]  new_slab+0x46/0x70
 [9.072036][T1]  ___slab_alloc+0x5d4/0x9c0
 [9.072036][T1]  ? create_object+0x3a/0x3e0
 [9.072036][T1]  ? fs_reclaim_acquire.part.15+0x5/0x30
 [9.072036][T1]  ? ___might_sleep+0xab/0xc0
 [9.072036][T1]  ? create_object+0x3a/0x3e0
 [9.072036][T1]  __slab_alloc+0x12/0x20
 [9.072036][T1]  ? __slab_alloc+0x12/0x20
 [9.072036][T1]  kmem_cache_alloc+0x32a/0x400
 [9.072036][T1]  create_object+0x3a/0x3e0
 [9.072036][T1]  kmemleak_alloc+0x71/0xa0
 [9.072036][T1]  kmem_cache_alloc+0x272/0x400
 [9.072036][T1]  ? kasan_check_read+0x11/0x20
 [9.072036][T1]  ? do_raw_spin_unlock+0xa8/0x140
 [9.072036][T1]  acpi_ps_alloc_op+0x76/0x122
 [9.072036][T1]  acpi_ds_execute_arguments+0x2f/0x18d
 [9.072036][T1]  acpi_ds_get_package_arguments+0x7d/0x84
 [9.072036][T1]  acpi_ns_init_one_package+0x33/0x61
 [9.072036][T1]  acpi_ns_init_one_object+0xfc/0x189
 [9.072036][T1]  acpi_ns_walk_namespace+0x114/0x1f2
 [ 

Re: [PATCH] ipvs: remove unnecessary space

2019-07-15 Thread yangxingwu
OK,

I will remove all unnecessary spaces and send the v2 patch.

Thanks, Pablo

Pablo Neira Ayuso  于2019年7月15日周一 下午4:27写道:
>
> On Wed, Jul 10, 2019 at 10:06:09AM +0200, Simon Horman wrote:
> > On Wed, Jul 10, 2019 at 03:45:52PM +0800, yangxingwu wrote:
> > > ---
> > >  net/netfilter/ipvs/ip_vs_mh.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/net/netfilter/ipvs/ip_vs_mh.c b/net/netfilter/ipvs/ip_vs_mh.c
> > > index 94d9d34..98e358e 100644
> > > --- a/net/netfilter/ipvs/ip_vs_mh.c
> > > +++ b/net/netfilter/ipvs/ip_vs_mh.c
> > > @@ -174,8 +174,8 @@ static int ip_vs_mh_populate(struct ip_vs_mh_state *s,
> > > return 0;
> > > }
> > >
> > > -   table =  kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
> > > -sizeof(unsigned long), GFP_KERNEL);
> > > +   table = kcalloc(BITS_TO_LONGS(IP_VS_MH_TAB_SIZE),
> > > +   sizeof(unsigned long), GFP_KERNEL);
>
> May I ask one thing? :-)
>
> Please, remove all unnecessary spaces in one go, search for:
>
> git grep "=  "
>
> in the netfilter tree, and send a v2 for this one.
>
> Thanks.


Re: [PATCH v2] nvmet-file: fix nvmet_file_flush() always returning an error

2019-07-15 Thread Sagi Grimberg

Reviewed-by: Sagi Grimberg 


Re: [PATCH v3 2/4] of/platform: Add functional dependency link from DT bindings

2019-07-15 Thread Frank Rowand
On 7/15/19 11:40 AM, Saravana Kannan wrote:
> Replying again because the previous email accidentally included HTML.
> 
> Thanks for taking the time to reconsider the wording Frank. Your
> intention was clear to me in the first email too.
> 
> A kernel command line option can also completely disable this
> functionality easily and cleanly. Can we pick that as an option? I've
> an implementation of that in the v5 series I sent out last week.

Yes, Rob suggested a command line option for debugging, and I am fine with
that.  But even with that, I would like a lot of testing so that we have a
chance of finding systems that have trouble with the changes and could
potentially be fixed before impacting a large number of users.

-Frank

> 
> -Saravana
> 
> On Mon, Jul 15, 2019 at 7:39 AM Frank Rowand  wrote:
>>
>> On 7/15/19 7:26 AM, Frank Rowand wrote:
>>> Hi Rob,
>>>
>>> Sorry for such a late reply...
>>>
>>>
>>> On 7/1/19 8:25 PM, Saravana Kannan wrote:
 On Mon, Jul 1, 2019 at 6:32 PM Rob Herring  wrote:
>
> On Mon, Jul 1, 2019 at 6:48 PM Saravana Kannan  
> wrote:
>>
>> Add device-links after the devices are created (but before they are
>> probed) by looking at common DT bindings like clocks and
>> interconnects.
>>
>> Automatically adding device-links for functional dependencies at the
>> framework level provides the following benefits:
>>
>> - Optimizes device probe order and avoids the useless work of
>>   attempting probes of devices that will not probe successfully
>>   (because their suppliers aren't present or haven't probed yet).
>>
>>   For example, in a commonly available mobile SoC, registering just
>>   one consumer device's driver at an initcall level earlier than the
>>   supplier device's driver causes 11 failed probe attempts before the
>>   consumer device probes successfully. This was with a kernel with all
>>   the drivers statically compiled in. This problem gets a lot worse if
>>   all the drivers are loaded as modules without direct symbol
>>   dependencies.
>>
>> - Supplier devices like clock providers, interconnect providers, etc
>>   need to keep the resources they provide active and at a particular
>>   state(s) during boot up even if their current set of consumers don't
>>   request the resource to be active. This is because the rest of the
>>   consumers might not have probed yet and turning off the resource
>>   before all the consumers have probed could lead to a hang or
>>   undesired user experience.
>>
>>   Some frameworks (Eg: regulator) handle this today by turning off
>>   "unused" resources at late_initcall_sync and hoping all the devices
>>   have probed by then. This is not a valid assumption for systems with
>>   loadable modules. Other frameworks (Eg: clock) just don't handle
>>   this due to the lack of a clear signal for when they can turn off
>>   resources. This leads to downstream hacks to handle cases like this
>>   that can easily be solved in the upstream kernel.
>>
>>   By linking devices before they are probed, we give suppliers a clear
>>   count of the number of dependent consumers. Once all of the
>>   consumers are active, the suppliers can turn off the unused
>>   resources without making assumptions about the number of consumers.
>>
>> By default we just add device-links to track "driver presence" (probe
>> succeeded) of the supplier device. If any other functionality provided
>> by device-links are needed, it is left to the consumer/supplier
>> devices to change the link when they probe.
>>
>> Signed-off-by: Saravana Kannan 
>> ---
>>  drivers/of/Kconfig|  9 
>>  drivers/of/platform.c | 52 +++
>>  2 files changed, 61 insertions(+)
>>
>> diff --git a/drivers/of/Kconfig b/drivers/of/Kconfig
>> index 37c2ccbefecd..7c7fa7394b4c 100644
>> --- a/drivers/of/Kconfig
>> +++ b/drivers/of/Kconfig
>> @@ -103,4 +103,13 @@ config OF_OVERLAY
>>  config OF_NUMA
>> bool
>>
>> +config OF_DEVLINKS
>
> I'd prefer this not be a config option. After all, we want one kernel
> build that works for all platforms.

 We need a lot more changes before one kernel build can work for all
 platforms. At least until then, I think we need this. Lot less chance
 of breaking existing platforms before all the missing pieces are
 created.

> A kernel command line option to disable might be useful for debugging.

 Or we can have a command line to enable this for platforms that want
 to use it and have it default off.
>>>
>>
>>> Given the fragility of the current boot sequence (without this patch set)
>>> and the potential breakage of existing systems, I think that if we choose
>>> to accept this patch set that it should first 

Re: [PATCH v1] scsi: ufs: change msleep to usleep_range

2019-07-15 Thread Stanley Chu
Hi Bean,

On Mon, 2019-07-15 at 11:21 +, Bean Huo (beanhuo) wrote:
> From: Bean Huo 
> 
> This patch changes msleep() to usleep_range() based on
> Documentation/timers/timers-howto.txt, which suggests using
> usleep_range() for small msec delays (1ms - 20ms) since msleep()
> will often sleep longer than the desired value.
> 
> After this change, boot time is 5ms-10ms faster than before.
> I tested this change on two different platforms: one was 5ms faster,
> the other about 10ms. I think this differs from platform to platform.
> 
> Actually, from the UFS host side, a 1ms-5ms delay is already sufficient for
> the initialization of its local UIC layer.
> 
> Fixes: 7a3e97b0dc4b ([SCSI] ufshcd: UFS Host controller driver)
> Signed-off-by: Bean Huo 
> ---
>  drivers/scsi/ufs/ufshcd.c | 10 ++
>  1 file changed, 2 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index a208589426b1..21f7b3b8026c 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -4213,12 +4213,6 @@ static int ufshcd_hba_execute_hce(struct ufs_hba *hba)
>  {
>   int retry;
>  
> - /*
> -  * msleep of 1 and 5 used in this function might result in msleep(20),
> -  * but it was necessary to send the UFS FPGA to reset mode during
> -  * development and testing of this driver. msleep can be changed to
> -  * mdelay and retry count can be reduced based on the controller.
> -  */
>   if (!ufshcd_is_hba_active(hba))
>   /* change controller state to "reset state" */
>   ufshcd_hba_stop(hba, true);
> @@ -4241,7 +4235,7 @@ static int ufshcd_hba_execute_hce(struct ufs_hba *hba)
>* instruction might be read back.
>* This delay can be changed based on the controller.
>*/
> - msleep(1);
> + usleep_range(1000, 1100);
>  
>   /* wait for the host controller to complete initialization */
>   retry = 10;
> @@ -4253,7 +4247,7 @@ static int ufshcd_hba_execute_hce(struct ufs_hba *hba)
>   "Controller enable failed\n");
>   return -EIO;
>   }
> - msleep(5);
> + usleep_range(5000, 5100);
>   }
>  
>   /* enable UIC related interrupts */

Acked-by: Stanley Chu 

Thanks,
Stanley





Re: [PATCH v3 6/6] interconnect: Add OPP table support for interconnects

2019-07-15 Thread Saravana Kannan
On Mon, Jul 15, 2019 at 1:16 AM Vincent Guittot
 wrote:
>
> On Tue, 9 Jul 2019 at 21:03, Saravana Kannan  wrote:
> >
> > On Tue, Jul 9, 2019 at 12:25 AM Vincent Guittot
> >  wrote:
> > >
> > > On Sun, 7 Jul 2019 at 23:48, Saravana Kannan  wrote:
> > > >
> > > > On Thu, Jul 4, 2019 at 12:12 AM Vincent Guittot
> > > >  wrote:
> > > > >
> > > > > On Wed, 3 Jul 2019 at 23:33, Saravana Kannan  
> > > > > wrote:
> > > > > >
> > > > > > On Tue, Jul 2, 2019 at 11:45 PM Vincent Guittot
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Wed, 3 Jul 2019 at 03:10, Saravana Kannan 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > Interconnect paths can have different performance points. Now 
> > > > > > > > that OPP
> > > > > > > > framework supports bandwidth OPP tables, add OPP table support 
> > > > > > > > for
> > > > > > > > interconnects.
> > > > > > > >
> > > > > > > > Devices can use the interconnect-opp-table DT property to 
> > > > > > > > specify OPP
> > > > > > > > tables for interconnect paths. And the driver can obtain the 
> > > > > > > > OPP table for
> > > > > > > > an interconnect path by calling icc_get_opp_table().
> > > > > > >
> > > > > > > The opp table of a path must come from the aggregation of OPP 
> > > > > > > tables
> > > > > > > of the interconnect providers.
> > > > > >
> > > > > > The aggregation of OPP tables of the providers is certainly the
> > > > > > superset of what a path can achieve, but to say that OPPs for
> > > > > > interconnect path should match that superset is an 
> > > > > > oversimplification
> > > > > > of the reality in hardware.
> > > > > >
> > > > > > There are lots of reasons an interconnect path might not want to use
> > > > > > all the available bandwidth options across all the interconnects in
> > > > > > the route.
> > > > > >
> > > > > > 1. That particular path might not have been validated or verified
> > > > > >during the HW design process for some of the 
> > > > > > frequencies/bandwidth
> > > > > >combinations of the providers.
> > > > >
> > > > > All these constraints are the provider's constraints, not the consumer's.
> > > > >
> > > > > The consumer asks for bandwidth according to its needs, and then the
> > > > > providers select the optimal bandwidth of each interconnect after
> > > > > aggregating all the requests and according to which OPPs have been
> > > > > validated
> > > >
> > > > Not really. The screening can be a consumer specific issue. The
> > > > consumer IP itself might have some issue with using too low of a
> > > > bandwidth or bandwidth that's not within some range. It should not be
> > >
> > > How can an IP ask for not enough bandwidth ?
> > > It asks the needed bandwidth based on its requirements
> >
> > The "enough bandwidth" is not always obvious. It's only for very
> > simple cases that you can calculate the required bandwidth. Even for
> > cases that you think might be "obvious/easy" aren't always easy.
> >
> > For example, you'd think a display IP would have a fixed bandwidth
> > requirement for a fixed resolution screen. But that's far from the
> > truth. It can also change as the number of layers change per frame.
> > For video decoder/encoder, it depends on how well the frames compress
> > with a specific compression scheme.
> > So the "required" bandwidth is often a heuristic based on the IP
> > frequency or traffic measurement.
> >
> > But that's not even the point I was making in this specific "bullet".
> >
> > A hardware IP might be screened/verified with only certain bandwidth
> > levels. Or it might have hardware bugs that prevent it from using
> > lower bandwidths even though they are technically sufficient. We need a
> > way to capture that per path. This is not even a fictional case; it
> > has been true multiple times over widely used IPs.
>
> here you are mixing HW constraints on the SoC and OPP screening with
> bandwidth requests from consumers.
> The ICC framework is about handling bandwidth requests, not trying to fix
> HW/voltage dependencies of the SoC
>
> >
> > > > the provider's job to take into account all the IP that might be
> > > > connected to the interconnects. If the interconnect HW itself didn't
> > >
> > > That's not what I'm saying. The provider knows which bandwidth the
> > > interconnect can provide as it is the ones which configures it. So if
> > > the interconnect has a finite number of bandwidth point based probably
> > > on the possible clock frequency and others config of the interconnect,
> > > it selects the best final config after aggregating the request of the
> > > consumer.
> >
> > I completely agree with this. What you are stating above is how it
> > should work and that's the whole point of the interconnect framework.
> >
> > But this is orthogonal to the point I'm making.
>
> It's not orthogonal because you want to add a OPP table pointer in the
> ICC path structure to fix your platform HW constraint whereas it's not
> the purpose of the framework IMO
>
> >
> > > > change, the provider 

Re: [PATCH v2] x86/paravirt: Drop {read,write}_cr8() hooks

2019-07-15 Thread Nadav Amit
> On Jul 15, 2019, at 4:30 PM, Andrew Cooper  wrote:
> 
> On 15/07/2019 19:17, Nadav Amit wrote:
>>> On Jul 15, 2019, at 8:16 AM, Andrew Cooper  
>>> wrote:
>>> 
>>> There is a lot of infrastructure for functionality which is used
>>> exclusively in __{save,restore}_processor_state() on the suspend/resume
>>> path.
>>> 
>>> cr8 is an alias of APIC_TASKPRI, and APIC_TASKPRI is saved/restored by
>>> lapic_{suspend,resume}().  Saving and restoring cr8 independently of the
>>> rest of the Local APIC state isn't a clever thing to be doing.
>>> 
>>> Delete the suspend/resume cr8 handling, which shrinks the size of struct
>>> saved_context, and allows for the removal of both PVOPS.
>> I think removing the interface for CR8 writes is also good to avoid
>> potential correctness issues, as the SDM says (10.8.6.1 "Interaction of Task
>> Priorities between CR8 and APIC”):
>> 
>> "Operating software should implement either direct APIC TPR updates or CR8
>> style TPR updates but not mix them. Software can use a serializing
>> instruction (for example, CPUID) to serialize updates between MOV CR8 and
>> stores to the APIC.”
>> 
>> And native_write_cr8() did not even issue a serializing instruction.
> 
> Given its location, the one write_cr8() is bounded by two serialising
> operations, so is safe in practice.

That’s what the “potential” in "potential correctness issues” means :)



Re: [PATCH 1/3] nvme: Pass the queue to SQ_SIZE/CQ_SIZE macros

2019-07-15 Thread Benjamin Herrenschmidt
On Tue, 2019-07-16 at 10:46 +1000, Benjamin Herrenschmidt wrote:
> # Conflicts:
> #   drivers/nvme/host/pci.c
> ---

Oops :-) You can strip that or should I resend ? I'll wait for
comments/reviews regardless.

Cheers,
Ben.



[PATCH 3/3] nvme: Add support for Apple 2018+ models

2019-07-15 Thread Benjamin Herrenschmidt
Based on reverse engineering and original patch by

Paul Pawlowski 

This adds support for Apple's weird implementation of NVMe in their
2018 or later machines. It accounts for the twice-as-big SQ entries
for the IO queues, and the fact that only interrupt vector 0 appears
to function properly.

Signed-off-by: Benjamin Herrenschmidt 
---
 drivers/nvme/host/core.c |  5 -
 drivers/nvme/host/nvme.h | 10 ++
 drivers/nvme/host/pci.c  |  6 ++
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 716ebe87a2b8..480ea24d8cf4 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2701,7 +2701,10 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
ctrl->hmmaxd = le16_to_cpu(id->hmmaxd);
 
/* Grab required IO queue size */
-   ctrl->iosqes = id->sqes & 0xf;
+   if (ctrl->quirks & NVME_QUIRK_128_BYTES_SQES)
+   ctrl->iosqes = 7;
+   else
+   ctrl->iosqes = id->sqes & 0xf;
if (ctrl->iosqes < NVME_NVM_IOSQES) {
dev_err(ctrl->device,
"unsupported required IO queue size %d\n", 
ctrl->iosqes);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 34ef35fcd8a5..b2a78d08b984 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -92,6 +92,16 @@ enum nvme_quirks {
 * Broken Write Zeroes.
 */
NVME_QUIRK_DISABLE_WRITE_ZEROES = (1 << 9),
+
+   /*
+* Use only one interrupt vector for all queues
+*/
+   NVME_QUIRK_SINGLE_VECTOR= (1 << 10),
+
+   /*
+* Use non-standard 128 bytes SQEs.
+*/
+   NVME_QUIRK_128_BYTES_SQES   = (1 << 11),
 };
 
 /*
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 54b35ea4af88..ab2358137419 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2080,6 +2080,9 @@ static int nvme_setup_irqs(struct nvme_dev *dev, unsigned 
int nr_io_queues)
dev->io_queues[HCTX_TYPE_DEFAULT] = 1;
dev->io_queues[HCTX_TYPE_READ] = 0;
 
+   if (dev->ctrl.quirks & NVME_QUIRK_SINGLE_VECTOR)
+   irq_queues = 1;
+
return pci_alloc_irq_vectors_affinity(pdev, 1, irq_queues,
  PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, );
 }
@@ -3037,6 +3040,9 @@ static const struct pci_device_id nvme_id_table[] = {
{ PCI_DEVICE_CLASS(PCI_CLASS_STORAGE_EXPRESS, 0xff) },
{ PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2001) },
{ PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2003) },
+   { PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2005),
+   .driver_data = NVME_QUIRK_SINGLE_VECTOR |
+   NVME_QUIRK_128_BYTES_SQES },
{ 0, }
 };
 MODULE_DEVICE_TABLE(pci, nvme_id_table);
-- 
2.17.1



[PATCH 1/3] nvme: Pass the queue to SQ_SIZE/CQ_SIZE macros

2019-07-15 Thread Benjamin Herrenschmidt
This will make it easier to handle variable queue entry sizes
later. No functional change.

Signed-off-by: Benjamin Herrenschmidt 

# Conflicts:
#   drivers/nvme/host/pci.c
---
 drivers/nvme/host/pci.c | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index dd10cf78f2d3..8f006638452b 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -28,8 +28,8 @@
 #include "trace.h"
 #include "nvme.h"
 
-#define SQ_SIZE(depth) (depth * sizeof(struct nvme_command))
-#define CQ_SIZE(depth) (depth * sizeof(struct nvme_completion))
+#define SQ_SIZE(q) ((q)->q_depth * sizeof(struct nvme_command))
+#define CQ_SIZE(q) ((q)->q_depth * sizeof(struct nvme_completion))
 
 #define SGES_PER_PAGE  (PAGE_SIZE / sizeof(struct nvme_sgl_desc))
 
@@ -1344,16 +1344,16 @@ static enum blk_eh_timer_return nvme_timeout(struct 
request *req, bool reserved)
 
 static void nvme_free_queue(struct nvme_queue *nvmeq)
 {
-   dma_free_coherent(nvmeq->dev->dev, CQ_SIZE(nvmeq->q_depth),
+   dma_free_coherent(nvmeq->dev->dev, CQ_SIZE(nvmeq),
(void *)nvmeq->cqes, nvmeq->cq_dma_addr);
if (!nvmeq->sq_cmds)
return;
 
if (test_and_clear_bit(NVMEQ_SQ_CMB, >flags)) {
pci_free_p2pmem(to_pci_dev(nvmeq->dev->dev),
-   nvmeq->sq_cmds, SQ_SIZE(nvmeq->q_depth));
+   nvmeq->sq_cmds, SQ_SIZE(nvmeq));
} else {
-   dma_free_coherent(nvmeq->dev->dev, SQ_SIZE(nvmeq->q_depth),
+   dma_free_coherent(nvmeq->dev->dev, SQ_SIZE(nvmeq),
nvmeq->sq_cmds, nvmeq->sq_dma_addr);
}
 }
@@ -1433,12 +1433,12 @@ static int nvme_cmb_qdepth(struct nvme_dev *dev, int 
nr_io_queues,
 }
 
 static int nvme_alloc_sq_cmds(struct nvme_dev *dev, struct nvme_queue *nvmeq,
-   int qid, int depth)
+   int qid)
 {
struct pci_dev *pdev = to_pci_dev(dev->dev);
 
if (qid && dev->cmb_use_sqes && (dev->cmbsz & NVME_CMBSZ_SQS)) {
-   nvmeq->sq_cmds = pci_alloc_p2pmem(pdev, SQ_SIZE(depth));
+   nvmeq->sq_cmds = pci_alloc_p2pmem(pdev, SQ_SIZE(nvmeq));
if (nvmeq->sq_cmds) {
nvmeq->sq_dma_addr = pci_p2pmem_virt_to_bus(pdev,
nvmeq->sq_cmds);
@@ -1447,11 +1447,11 @@ static int nvme_alloc_sq_cmds(struct nvme_dev *dev, 
struct nvme_queue *nvmeq,
return 0;
}
 
-   pci_free_p2pmem(pdev, nvmeq->sq_cmds, SQ_SIZE(depth));
+   pci_free_p2pmem(pdev, nvmeq->sq_cmds, SQ_SIZE(nvmeq));
}
}
 
-   nvmeq->sq_cmds = dma_alloc_coherent(dev->dev, SQ_SIZE(depth),
+   nvmeq->sq_cmds = dma_alloc_coherent(dev->dev, SQ_SIZE(nvmeq),
>sq_dma_addr, GFP_KERNEL);
if (!nvmeq->sq_cmds)
return -ENOMEM;
@@ -1465,12 +1465,13 @@ static int nvme_alloc_queue(struct nvme_dev *dev, int 
qid, int depth)
if (dev->ctrl.queue_count > qid)
return 0;
 
-   nvmeq->cqes = dma_alloc_coherent(dev->dev, CQ_SIZE(depth),
+   nvmeq->q_depth = depth;
+   nvmeq->cqes = dma_alloc_coherent(dev->dev, CQ_SIZE(nvmeq),
 >cq_dma_addr, GFP_KERNEL);
if (!nvmeq->cqes)
goto free_nvmeq;
 
-   if (nvme_alloc_sq_cmds(dev, nvmeq, qid, depth))
+   if (nvme_alloc_sq_cmds(dev, nvmeq, qid))
goto free_cqdma;
 
nvmeq->dev = dev;
@@ -1479,15 +1480,14 @@ static int nvme_alloc_queue(struct nvme_dev *dev, int 
qid, int depth)
nvmeq->cq_head = 0;
nvmeq->cq_phase = 1;
nvmeq->q_db = >dbs[qid * 2 * dev->db_stride];
-   nvmeq->q_depth = depth;
nvmeq->qid = qid;
dev->ctrl.queue_count++;
 
return 0;
 
  free_cqdma:
-   dma_free_coherent(dev->dev, CQ_SIZE(depth), (void *)nvmeq->cqes,
-   nvmeq->cq_dma_addr);
+   dma_free_coherent(dev->dev, CQ_SIZE(nvmeq), (void *)nvmeq->cqes,
+ nvmeq->cq_dma_addr);
  free_nvmeq:
return -ENOMEM;
 }
@@ -1515,7 +1515,7 @@ static void nvme_init_queue(struct nvme_queue *nvmeq, u16 
qid)
nvmeq->cq_head = 0;
nvmeq->cq_phase = 1;
nvmeq->q_db = >dbs[qid * 2 * dev->db_stride];
-   memset((void *)nvmeq->cqes, 0, CQ_SIZE(nvmeq->q_depth));
+   memset((void *)nvmeq->cqes, 0, CQ_SIZE(nvmeq));
nvme_dbbuf_init(dev, nvmeq, qid);
dev->online_queues++;
wmb(); /* ensure the first interrupt sees the initialization */
-- 
2.17.1



[PATCH 2/3] nvme: Retrieve the required IO queue entry size from the controller

2019-07-15 Thread Benjamin Herrenschmidt
On PCIe based NVMe devices, this will retrieve the IO queue entry
size from the controller and use the "required" setting.

It should always be 6 (64 bytes) by spec. However some controllers
such as Apple's are not properly implementing the spec and require
the size to be 7 (128 bytes).

This provides the groundwork for the subsequent quirks for these
controllers.

Signed-off-by: Benjamin Herrenschmidt 
---
 drivers/nvme/host/core.c | 25 +
 drivers/nvme/host/nvme.h |  1 +
 drivers/nvme/host/pci.c  |  9 ++---
 include/linux/nvme.h |  1 +
 4 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index cc09b81fc7f4..716ebe87a2b8 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1986,6 +1986,7 @@ int nvme_enable_ctrl(struct nvme_ctrl *ctrl, u64 cap)
ctrl->ctrl_config = NVME_CC_CSS_NVM;
ctrl->ctrl_config |= (page_shift - 12) << NVME_CC_MPS_SHIFT;
ctrl->ctrl_config |= NVME_CC_AMS_RR | NVME_CC_SHN_NONE;
+   /* Use default IOSQES. We'll update it later if needed */
ctrl->ctrl_config |= NVME_CC_IOSQES | NVME_CC_IOCQES;
ctrl->ctrl_config |= NVME_CC_ENABLE;
 
@@ -2698,6 +2699,30 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
ctrl->hmmin = le32_to_cpu(id->hmmin);
ctrl->hmminds = le32_to_cpu(id->hmminds);
ctrl->hmmaxd = le16_to_cpu(id->hmmaxd);
+
+   /* Grab required IO queue size */
+   ctrl->iosqes = id->sqes & 0xf;
+   if (ctrl->iosqes < NVME_NVM_IOSQES) {
+   dev_err(ctrl->device,
+   "unsupported required IO queue size %d\n", ctrl->iosqes);
+   ret = -EINVAL;
+   goto out_free;
+   }
+   /*
+* If our IO queue size isn't the default, update the setting
+* in CC:IOSQES.
+*/
+   if (ctrl->iosqes != NVME_NVM_IOSQES) {
+   ctrl->ctrl_config &= ~(0xfu << NVME_CC_IOSQES_SHIFT);
+   ctrl->ctrl_config |= ctrl->iosqes << NVME_CC_IOSQES_SHIFT;
+   ret = ctrl->ops->reg_write32(ctrl, NVME_REG_CC,
+ctrl->ctrl_config);
+   if (ret) {
+   dev_err(ctrl->device,
+   "error updating CC register\n");
+   goto out_free;
+   }
+   }
}
 
ret = nvme_mpath_init(ctrl, id);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 716a876119c8..34ef35fcd8a5 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -244,6 +244,7 @@ struct nvme_ctrl {
u32 hmmin;
u32 hmminds;
u16 hmmaxd;
+   u8 iosqes;
 
/* Fabrics only */
u16 sqsize;
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 8f006638452b..54b35ea4af88 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -28,7 +28,7 @@
 #include "trace.h"
 #include "nvme.h"
 
-#define SQ_SIZE(q) ((q)->q_depth * sizeof(struct nvme_command))
+#define SQ_SIZE(q) ((q)->q_depth << (q)->sqes)
 #define CQ_SIZE(q) ((q)->q_depth * sizeof(struct nvme_completion))
 
 #define SGES_PER_PAGE  (PAGE_SIZE / sizeof(struct nvme_sgl_desc))
@@ -162,7 +162,7 @@ static inline struct nvme_dev *to_nvme_dev(struct nvme_ctrl *ctrl)
 struct nvme_queue {
struct nvme_dev *dev;
spinlock_t sq_lock;
-   struct nvme_command *sq_cmds;
+   void *sq_cmds;
 /* only used for poll queues: */
spinlock_t cq_poll_lock cacheline_aligned_in_smp;
volatile struct nvme_completion *cqes;
@@ -178,6 +178,7 @@ struct nvme_queue {
u16 last_cq_head;
u16 qid;
u8 cq_phase;
+   u8 sqes;
unsigned long flags;
 #define NVMEQ_ENABLED  0
 #define NVMEQ_SQ_CMB   1
@@ -488,7 +489,8 @@ static void nvme_submit_cmd(struct nvme_queue *nvmeq, struct nvme_command *cmd,
bool write_sq)
 {
spin_lock(&nvmeq->sq_lock);
-   memcpy(&nvmeq->sq_cmds[nvmeq->sq_tail], cmd, sizeof(*cmd));
+   memcpy(nvmeq->sq_cmds + (nvmeq->sq_tail << nvmeq->sqes),
+  cmd, sizeof(*cmd));
if (++nvmeq->sq_tail == nvmeq->q_depth)
nvmeq->sq_tail = 0;
nvme_write_sq_db(nvmeq, write_sq);
@@ -1465,6 +1467,7 @@ static int nvme_alloc_queue(struct nvme_dev *dev, int qid, int depth)
if (dev->ctrl.queue_count > qid)
return 0;
 
+   nvmeq->sqes = qid ? dev->ctrl.iosqes : NVME_NVM_ADMSQES;
nvmeq->q_depth = depth;
nvmeq->cqes = dma_alloc_coherent(dev->dev, CQ_SIZE(nvmeq),
			 &nvmeq->cq_dma_addr, GFP_KERNEL);
diff --git a/include/linux/nvme.h b/include/linux/nvme.h

linux-next: manual merge of the rdma tree with the v4l-dvb-next tree

2019-07-15 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the rdma tree got a conflict in:

  Documentation/index.rst

between commit:

  09fdc957ad0d ("docs: leds: add it to the driver-api book")
(and others following)

from the v4l-dvb-next tree and commit:

  a3a400da206b ("docs: infiniband: add it to the driver-api bookset")

from the rdma tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc Documentation/index.rst
index f379e43fcda0,869616b57aa8..
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@@ -96,23 -90,9 +96,24 @@@ needed)
  
 driver-api/index
 core-api/index
 +   locking/index
 +   accounting/index
 +   block/index
 +   cdrom/index
 +   ide/index
 +   fb/index
 +   fpga/index
 +   hid/index
 +   iio/index
 +   leds/index
+infiniband/index
 media/index
 +   netlabel/index
 networking/index
 +   pcmcia/index
 +   target/index
 +   timers/index
 +   watchdog/index
 input/index
 hwmon/index
 gpu/index


pgpbLCUlO51aG.pgp
Description: OpenPGP digital signature


Re: [PATCH] mm/hmm: Fix bad subpage pointer in try_to_unmap_one

2019-07-15 Thread Ralph Campbell



On 7/15/19 4:34 PM, John Hubbard wrote:

On 7/15/19 3:00 PM, Andrew Morton wrote:

On Tue, 9 Jul 2019 18:24:57 -0700 Ralph Campbell  wrote:



On 7/9/19 5:28 PM, Andrew Morton wrote:

On Tue, 9 Jul 2019 15:35:56 -0700 Ralph Campbell  wrote:


When migrating a ZONE device private page from device memory to system
memory, the subpage pointer is initialized from a swap pte which computes
an invalid page pointer. A kernel panic results such as:

BUG: unable to handle page fault for address: ea1fffc8

Initialize subpage correctly before calling page_remove_rmap().


I think this is

Fixes: a5430dda8a3a1c ("mm/migrate: support un-addressable ZONE_DEVICE page in migration")
Cc: stable

yes?



Yes. Can you add this or should I send a v2?


I updated the patch.  Could we please have some review input?


From: Ralph Campbell 
Subject: mm/hmm: fix bad subpage pointer in try_to_unmap_one

When migrating a ZONE device private page from device memory to system
memory, the subpage pointer is initialized from a swap pte which computes
an invalid page pointer. A kernel panic results such as:

BUG: unable to handle page fault for address: ea1fffc8

Initialize subpage correctly before calling page_remove_rmap().

Link: http://lkml.kernel.org/r/20190709223556.28908-1-rcampb...@nvidia.com
Fixes: a5430dda8a3a1c ("mm/migrate: support un-addressable ZONE_DEVICE page in migration")
Signed-off-by: Ralph Campbell 
Cc: "Jérôme Glisse" 
Cc: "Kirill A. Shutemov" 
Cc: Mike Kravetz 
Cc: Jason Gunthorpe 
Cc: 
Signed-off-by: Andrew Morton 
---

  mm/rmap.c |1 +
  1 file changed, 1 insertion(+)

--- a/mm/rmap.c~mm-hmm-fix-bad-subpage-pointer-in-try_to_unmap_one
+++ a/mm/rmap.c
@@ -1476,6 +1476,7 @@ static bool try_to_unmap_one(struct page
 * No need to invalidate here it will synchronize on
 * against the special swap migration pte.
 */
+   subpage = page;
goto discard;
}
  


Hi Ralph and everyone,

While the above prevents a crash, I'm concerned that it is still not
an accurate fix. This fix leads to repeatedly removing the rmap, against the
same struct page, which is odd, and also doesn't directly address the
root cause, which I understand to be: this routine can't handle migrating
the zero page properly--over and back, anyway. (We should also mention more
about how this is triggered, in the commit description.)

I'll take a closer look at possible fixes (I have to step out for a bit) soon,
but any more experienced help is also appreciated here.

thanks,


I'm not surprised at the confusion. It took me quite a while to 
understand how migrate_vma() works with ZONE_DEVICE private memory.

The big point to be aware of is that when migrating a page to
device private memory, the source page's page->mapping pointer
is copied to the ZONE_DEVICE struct page and the page_mapcount()
is increased. So, the kernel sees the page as being "mapped"
but the page table entry as being is_swap_pte() so the CPU will fault
if it tries to access the mapped address.
So yes, the source anon page is unmapped, DMA'ed to the device,
and then mapped again. Then on a CPU fault, the zone device page
is unmapped, DMA'ed to system memory, and mapped again.
The rmap_walk() is used to clear the temporary migration pte so
that is another important detail of how migrate_vma() works.
At the moment, only single anon private pages can migrate to
device private memory so there are no subpages and setting it to "page"
should be correct for now. I'm looking at supporting migration of
transparent huge pages but that is a work in progress.
Let me know how much of all that you think should be in the change log.
Getting an Acked-by from Jerome would be nice too.

I see Christoph Hellwig got confused by this too [1].
I have a patch to clear page->mapping when freeing ZONE_DEVICE private
struct pages which I'll send out soon.
I'll probably also add some comments to struct page to include the
above info and maybe remove the _zd_pad_1 field.

[1] 740d6310ed4cd5c78e63 ("mm: don't clear ->mapping in hmm_devmem_free")



Re: [PATCH v2 1/2] dt-bindings: mmc: Document Aspeed SD controller

2019-07-15 Thread Andrew Jeffery



On Tue, 16 Jul 2019, at 07:47, Rob Herring wrote:
> On Thu, Jul 11, 2019 at 9:32 PM Andrew Jeffery  wrote:
> >
> > The ASPEED SD/SDIO/eMMC controller exposes two slots implementing the
> > SDIO Host Specification v2.00, with 1 or 4 bit data buses, or an 8 bit
> > data bus if only a single slot is enabled.
> >
> > Signed-off-by: Andrew Jeffery 
> > ---
> > In v2:
> >
> > * Rename to aspeed,sdhci.yaml
> > * Rename sd-controller compatible
> > * Add `maxItems: 1` for reg properties
> > * Move sdhci subnode description to patternProperties
> > * Drop sdhci compatible requirement
> > * #address-cells and #size-cells are required
> > * Prevent additional properties
> > * Implement explicit ranges in example
> > * Remove slot property
> >
> >  .../devicetree/bindings/mmc/aspeed,sdhci.yaml | 90 +++
> >  1 file changed, 90 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/mmc/aspeed,sdhci.yaml
> >
> > diff --git a/Documentation/devicetree/bindings/mmc/aspeed,sdhci.yaml 
> > b/Documentation/devicetree/bindings/mmc/aspeed,sdhci.yaml
> > new file mode 100644
> > index ..67a691c3348c
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/mmc/aspeed,sdhci.yaml
> > @@ -0,0 +1,90 @@
> > +# SPDX-License-Identifier: GPL-2.0-or-later
> > +%YAML 1.2
> > +---
> > +$id: http://devicetree.org/schemas/mmc/aspeed,sdhci.yaml#
> > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > +
> > +title: ASPEED SD/SDIO/eMMC Controller
> > +
> > +maintainers:
> > +  - Andrew Jeffery 
> > +  - Ryan Chen 
> > +
> > +description: |+
> > +  The ASPEED SD/SDIO/eMMC controller exposes two slots implementing the 
> > SDIO
> > +  Host Specification v2.00, with 1 or 4 bit data buses, or an 8 bit data 
> > bus if
> > +  only a single slot is enabled.
> > +
> > +  The two slots are supported by a common configuration area. As the 
> > SDHCIs for
> > +  the slots are dependent on the common configuration area, they are 
> > described
> > +  as child nodes.
> > +
> > +properties:
> > +  compatible:
> > +enum: [ aspeed,ast2400-sd-controller, aspeed,ast2500-sd-controller ]
> 
> This is actually a list of 4 strings. Please reformat to 1 per line.

On reflection that's obvious, but also a somewhat subtle interaction with the
preference for no quotes (the obvious caveat being "except where required").

Thanks for pointing it out.

I have been running `make dt_binding_check` and `make dtbs_check` over
these, looks like I need to up my game a bit though. Do you do additional things
in your workflow?

Andrew


Re: [PATCH V5 11/18] clk: tegra210: Add support for Tegra210 clocks

2019-07-15 Thread Sowjanya Komatineni



On 7/14/19 2:41 PM, Dmitry Osipenko wrote:

13.07.2019 8:54, Sowjanya Komatineni wrote:

On 6/29/19 8:10 AM, Dmitry Osipenko wrote:

28.06.2019 5:12, Sowjanya Komatineni wrote:

This patch adds system suspend and resume support for Tegra210
clocks.

All the CAR controller settings are lost on suspend when core power
goes off.

This patch has implementation for saving and restoring all the PLLs
and clocks context during system suspend and resume to have the
clocks back to same state for normal operation.

Acked-by: Thierry Reding 
Signed-off-by: Sowjanya Komatineni 
---
   drivers/clk/tegra/clk-tegra210.c | 115
++-
   drivers/clk/tegra/clk.c  |  14 +
   drivers/clk/tegra/clk.h  |   1 +
   3 files changed, 127 insertions(+), 3 deletions(-)

diff --git a/drivers/clk/tegra/clk-tegra210.c
b/drivers/clk/tegra/clk-tegra210.c
index 1c08c53482a5..1b839544e086 100644
--- a/drivers/clk/tegra/clk-tegra210.c
+++ b/drivers/clk/tegra/clk-tegra210.c
@@ -9,10 +9,12 @@
   #include 
   #include 
   #include 
+#include 
   #include 
   #include 
   #include 
   #include 
+#include 
   #include 
   #include 
   #include 
@@ -20,6 +22,7 @@
   #include 
     #include "clk.h"
+#include "clk-dfll.h"
   #include "clk-id.h"
     /*
@@ -225,6 +228,7 @@
     #define CLK_RST_CONTROLLER_RST_DEV_Y_SET 0x2a8
   #define CLK_RST_CONTROLLER_RST_DEV_Y_CLR 0x2ac
+#define CPU_SOFTRST_CTRL 0x380
     #define LVL2_CLK_GATE_OVRA 0xf8
   #define LVL2_CLK_GATE_OVRC 0x3a0
@@ -2820,6 +2824,7 @@ static int tegra210_enable_pllu(void)
   struct tegra_clk_pll_freq_table *fentry;
   struct tegra_clk_pll pllu;
   u32 reg;
+    int ret;
     for (fentry = pll_u_freq_table; fentry->input_rate; fentry++) {
   if (fentry->input_rate == pll_ref_freq)
@@ -2847,10 +2852,10 @@ static int tegra210_enable_pllu(void)
   fence_udelay(1, clk_base);
   reg |= PLL_ENABLE;
   writel(reg, clk_base + PLLU_BASE);
+    fence_udelay(1, clk_base);
   -    readl_relaxed_poll_timeout_atomic(clk_base + PLLU_BASE, reg,
-  reg & PLL_BASE_LOCK, 2, 1000);
-    if (!(reg & PLL_BASE_LOCK)) {
+    ret = tegra210_wait_for_mask(&pllu, PLLU_BASE, PLL_BASE_LOCK);
+    if (ret) {
   pr_err("Timed out waiting for PLL_U to lock\n");
   return -ETIMEDOUT;
   }
@@ -3283,6 +3288,103 @@ static void tegra210_disable_cpu_clock(u32 cpu)
   }
     #ifdef CONFIG_PM_SLEEP
+static u32 cpu_softrst_ctx[3];
+static struct platform_device *dfll_pdev;
+#define car_readl(_base, _off) readl_relaxed(clk_base + (_base) + ((_off) * 4))
+#define car_writel(_val, _base, _off) \
+    writel_relaxed(_val, clk_base + (_base) + ((_off) * 4))
+
+static int tegra210_clk_suspend(void)
+{
+    unsigned int i;
+    struct device_node *node;
+
+    tegra_cclkg_burst_policy_save_context();
+
+    if (!dfll_pdev) {
+    node = of_find_compatible_node(NULL, NULL,
+   "nvidia,tegra210-dfll");
+    if (node)
+    dfll_pdev = of_find_device_by_node(node);
+
+    of_node_put(node);
+    if (!dfll_pdev)
+    pr_err("dfll node not found. no suspend for dfll\n");
+    }
+
+    if (dfll_pdev)
+    tegra_dfll_suspend(dfll_pdev);
+
+    /* Enable PLLP_OUT_CPU after dfll suspend */
+    tegra_clk_set_pllp_out_cpu(true);
+
+    tegra_sclk_cclklp_burst_policy_save_context();
+
+    clk_save_context();
+
+    for (i = 0; i < ARRAY_SIZE(cpu_softrst_ctx); i++)
+    cpu_softrst_ctx[i] = car_readl(CPU_SOFTRST_CTRL, i);
+
+    return 0;
+}
+
+static void tegra210_clk_resume(void)
+{
+    unsigned int i;
+    struct clk_hw *parent;
+    struct clk *clk;
+
+    /*
+ * clk_restore_context restores clocks as per the clock tree.
+ *
+ * dfllCPU_out is first in the clock tree to get restored and it
+ * involves programming DFLL controller along with restoring CPUG
+ * clock burst policy.
+ *
+ * DFLL programming needs dfll_ref and dfll_soc peripheral clocks
+ * to be restores which are part ofthe peripheral clocks.

 ^ white-space

Please use spellchecker to avoid typos.


+ * So, peripheral clocks restore should happen prior to dfll clock
+ * restore.
+ */
+
+    tegra_clk_osc_resume(clk_base);
+    for (i = 0; i < ARRAY_SIZE(cpu_softrst_ctx); i++)
+    car_writel(cpu_softrst_ctx[i], CPU_SOFTRST_CTRL, i);
+
+    /* restore all plls and peripheral clocks */
+    tegra210_init_pllu();
+    clk_restore_context();
+
+    fence_udelay(5, clk_base);
+
+    /* resume SCLK and CPULP clocks */
+    tegra_sclk_cpulp_burst_policy_restore_context();
+
+    /*
+ * restore CPUG clocks:
+ * - enable DFLL in open loop mode
+ * - switch CPUG to DFLL clock source
+ * - close DFLL loop
+ * - sync PLLX state
+ */
+    if (dfll_pdev)
+    tegra_dfll_resume(dfll_pdev, false);
+
+    tegra_cclkg_burst_policy_restore_context();
+    fence_udelay(2, clk_base);
+
+   

Re: [RFC PATCH, x86]: Disable CPA cache flush for selfsnoop targets

2019-07-15 Thread Andi Kleen
On Mon, Jul 15, 2019 at 04:10:36PM -0700, Andy Lutomirski wrote:
> On Mon, Jul 15, 2019 at 3:53 PM Andi Kleen  wrote:
> >
> > > I haven't tested on a real kernel with i915.  Does i915 really hit
> > > this code path?  Does it happen more than once or twice at boot?
> >
> > Yes some workloads allocate/free a lot of write combined memory
> > for graphics objects.
> >
> 
> But where does that memory come from?  If it's from device memory
> (i.e. memory that's not in the kernel direct map), then, unless I
> missed something, we're never changing the cache mode per se -- we're
> just ioremap_wc-ing it, which doesn't require a flush.

Integrated graphics doesn't have device memory. There's a reserved
memory area, but a lot of the buffers the GPU works with
come from main memory.


-Andi


Re: list corruption in deferred_split_scan()

2019-07-15 Thread Yang Shi




On 7/15/19 2:23 PM, Qian Cai wrote:

On Fri, 2019-07-12 at 12:12 -0700, Yang Shi wrote:

Another possible lead is that without reverting those commits below, the
kdump kernel would always also crash in shrink_slab_memcg() at this line,

map = rcu_dereference_protected(memcg->nodeinfo[nid]->shrinker_map, true);

This looks a little bit weird. It seems nodeinfo[nid] is NULL? I can't
think of where nodeinfo would be freed while the memcg was still online.
Maybe a check is needed:

Actually, "memcg" is NULL.


It sounds weird. shrink_slab() is called in mem_cgroup_iter which does 
pin the memcg. So, the memcg should not go away.





diff --git a/mm/vmscan.c b/mm/vmscan.c
index a0301ed..bacda49 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -602,6 +602,9 @@ static unsigned long shrink_slab_memcg(gfp_t
gfp_mask, int nid,
  if (!mem_cgroup_online(memcg))
  return 0;

+   if (!memcg->nodeinfo[nid])
+   return 0;
+
  if (!down_read_trylock(&shrinker_rwsem))
  return 0;


[9.072036][T1] BUG: KASAN: null-ptr-deref in shrink_slab+0x111/0x440
[9.072036][T1] Read of size 8 at addr 0dc8 by task
swapper/0/1
[9.072036][T1]
[9.072036][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-next-
20190711+ #10
[9.072036][T1] Hardware name: HPE ProLiant DL385 Gen10/ProLiant
DL385
Gen10, BIOS A40 01/25/2019
[9.072036][T1] Call Trace:
[9.072036][T1]  dump_stack+0x62/0x9a
[9.072036][T1]  __kasan_report.cold.4+0xb0/0xb4
[9.072036][T1]  ? unwind_get_return_address+0x40/0x50
[9.072036][T1]  ? shrink_slab+0x111/0x440
[9.072036][T1]  kasan_report+0xc/0xe
[9.072036][T1]  __asan_load8+0x71/0xa0
[9.072036][T1]  shrink_slab+0x111/0x440
[9.072036][T1]  ? mem_cgroup_iter+0x98/0x840
[9.072036][T1]  ? unregister_shrinker+0x110/0x110
[9.072036][T1]  ? kasan_check_read+0x11/0x20
[9.072036][T1]  ? mem_cgroup_protected+0x39/0x260
[9.072036][T1]  shrink_node+0x31e/0xa30
[9.072036][T1]  ? shrink_node_memcg+0x1560/0x1560
[9.072036][T1]  ? ktime_get+0x93/0x110
[9.072036][T1]  do_try_to_free_pages+0x22f/0x820
[9.072036][T1]  ? shrink_node+0xa30/0xa30
[9.072036][T1]  ? kasan_check_read+0x11/0x20
[9.072036][T1]  ? check_chain_key+0x1df/0x2e0
[9.072036][T1]  try_to_free_pages+0x242/0x4d0
[9.072036][T1]  ? do_try_to_free_pages+0x820/0x820
[9.072036][T1]  __alloc_pages_nodemask+0x9ce/0x1bc0
[9.072036][T1]  ? gfp_pfmemalloc_allowed+0xc0/0xc0
[9.072036][T1]  ? unwind_dump+0x260/0x260
[9.072036][T1]  ? kernel_text_address+0x33/0xc0
[9.072036][T1]  ? arch_stack_walk+0x8f/0xf0
[9.072036][T1]  ? ret_from_fork+0x22/0x40
[9.072036][T1]  alloc_page_interleave+0x18/0x130
[9.072036][T1]  alloc_pages_current+0xf6/0x110
[9.072036][T1]  allocate_slab+0x600/0x11f0
[9.072036][T1]  new_slab+0x46/0x70
[9.072036][T1]  ___slab_alloc+0x5d4/0x9c0
[9.072036][T1]  ? create_object+0x3a/0x3e0
[9.072036][T1]  ? fs_reclaim_acquire.part.15+0x5/0x30
[9.072036][T1]  ? ___might_sleep+0xab/0xc0
[9.072036][T1]  ? create_object+0x3a/0x3e0
[9.072036][T1]  __slab_alloc+0x12/0x20
[9.072036][T1]  ? __slab_alloc+0x12/0x20
[9.072036][T1]  kmem_cache_alloc+0x32a/0x400
[9.072036][T1]  create_object+0x3a/0x3e0
[9.072036][T1]  kmemleak_alloc+0x71/0xa0
[9.072036][T1]  kmem_cache_alloc+0x272/0x400
[9.072036][T1]  ? kasan_check_read+0x11/0x20
[9.072036][T1]  ? do_raw_spin_unlock+0xa8/0x140
[9.072036][T1]  acpi_ps_alloc_op+0x76/0x122
[9.072036][T1]  acpi_ds_execute_arguments+0x2f/0x18d
[9.072036][T1]  acpi_ds_get_package_arguments+0x7d/0x84
[9.072036][T1]  acpi_ns_init_one_package+0x33/0x61
[9.072036][T1]  acpi_ns_init_one_object+0xfc/0x189
[9.072036][T1]  acpi_ns_walk_namespace+0x114/0x1f2
[9.072036][T1]  ? acpi_ns_init_one_package+0x61/0x61
[9.072036][T1]  ? acpi_ns_init_one_package+0x61/0x61
[9.072036][T1]  acpi_walk_namespace+0x9e/0xcb
[9.072036][T1]  ? acpi_sleep_proc_init+0x36/0x36
[9.072036][T1]  acpi_ns_initialize_objects+0x99/0xed
[9.072036][T1]  ? acpi_ns_find_ini_methods+0xa2/0xa2
[9.072036][T1]  ? acpi_tb_load_namespace+0x2dc/0x2eb
[9.072036][T1]  acpi_load_tables+0x61/0x80
[9.072036][T1]  acpi_init+0x10d/0x44b
[9.072036][T1]  ? acpi_sleep_proc_init+0x36/0x36
[9.072036][T1]  ? bus_uevent_filter+0x16/0x30
[9.072036][T1]  ? kobject_uevent_env+0x109/0x980
[9.072036][T1]  ? kernfs_get+0x13/0x20
[9.072036][T1]  ? kobject_uevent+0xb/0x10
[9.072036][T1]  ? kset_register+0x31/0x50
[9.072036][T1]  ? kset_create_and_add+0x9f/0xd0
[9.072036][T1]  ? acpi_sleep_proc_init+0x36/0x36
[ 

Re: [PATCH v1] clk: Add devm_clk_{prepare,enable,prepare_enable}

2019-07-15 Thread Guenter Roeck

On 7/15/19 8:34 AM, Marc Gonzalez wrote:

Provide devm variants for automatic resource release on device removal.
probe() error-handling is simpler, and remove is no longer required.

Signed-off-by: Marc Gonzalez 


Again ?

https://lore.kernel.org/patchwork/patch/755667/

This must be at least the third time this is tried. I don't think anything 
changed
since the previous submissions. I long since gave up and use 
devm_add_action_or_reset()
in affected drivers instead.

Guenter


---
  Documentation/driver-model/devres.rst |  3 +++
  drivers/clk/clk.c | 24 
  include/linux/clk.h   |  8 
  3 files changed, 35 insertions(+)

diff --git a/Documentation/driver-model/devres.rst 
b/Documentation/driver-model/devres.rst
index 1b6ced8e4294..9357260576ef 100644
--- a/Documentation/driver-model/devres.rst
+++ b/Documentation/driver-model/devres.rst
@@ -253,6 +253,9 @@ CLOCK
devm_clk_hw_register()
devm_of_clk_add_hw_provider()
devm_clk_hw_register_clkdev()
+  devm_clk_prepare()
+  devm_clk_enable()
+  devm_clk_prepare_enable()
  
  DMA

dmaenginem_async_device_register()
diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index c0990703ce54..5e85548357c0 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -914,6 +914,18 @@ int clk_prepare(struct clk *clk)
  }
  EXPORT_SYMBOL_GPL(clk_prepare);
  
+static void unprepare(void *clk)
+{
+   clk_unprepare(clk);
+}
+
+int devm_clk_prepare(struct device *dev, struct clk *clk)
+{
+   int rc = clk_prepare(clk);
+   return rc ? : devm_add_action_or_reset(dev, unprepare, clk);
+}
+EXPORT_SYMBOL_GPL(devm_clk_prepare);
+
  static void clk_core_disable(struct clk_core *core)
  {
	lockdep_assert_held(&enable_lock);
@@ -1136,6 +1148,18 @@ int clk_enable(struct clk *clk)
  }
  EXPORT_SYMBOL_GPL(clk_enable);
  
+static void disable(void *clk)
+{
+   clk_disable(clk);
+}
+
+int devm_clk_enable(struct device *dev, struct clk *clk)
+{
+   int rc = clk_enable(clk);
+   return rc ? : devm_add_action_or_reset(dev, disable, clk);
+}
+EXPORT_SYMBOL_GPL(devm_clk_enable);
+
  static int clk_core_prepare_enable(struct clk_core *core)
  {
int ret;
diff --git a/include/linux/clk.h b/include/linux/clk.h
index 3c096c7a51dc..d09b5207e3f1 100644
--- a/include/linux/clk.h
+++ b/include/linux/clk.h
@@ -895,6 +895,14 @@ static inline void clk_restore_context(void) {}
  
  #endif
  
+int devm_clk_prepare(struct device *dev, struct clk *clk);
+int devm_clk_enable(struct device *dev, struct clk *clk);
+static inline int devm_clk_prepare_enable(struct device *dev, struct clk *clk)
+{
+   int rc = devm_clk_prepare(dev, clk);
+   return rc ? : devm_clk_enable(dev, clk);
+}
+
  /* clk_prepare_enable helps cases using clk_enable in non-atomic context. */
  static inline int clk_prepare_enable(struct clk *clk)
  {





Re: [PATCH v2] KVM: x86: PMU Event Filter

2019-07-15 Thread Eric Hankland
> I think just disabling guest cpuid might not be enough, since guest
> could write to the msr without checking the cpuid.
>
> Why not just add a bitmap for fixed counter?
> e.g. fixed_counter_reject_bitmap
>
> At the beginning of reprogram_fixed_counter, we could add the check:
>
> if (test_bit(idx, &kvm->arch.fixed_counter_reject_bitmap))
>  return -EACCES;
>
> (Please test with your old guest and see if they have issues if we
> inject #GP when
> they try to set the fixed_ctrl msr. If there is, we could drop -EACCESS
> above)
>
> The bitmap could be set at kvm_vm_ioctl_set_pmu_event_filter.

intel_pmu_refresh() checks the guest cpuid and sets the number of
fixed counters according to that:
pmu->nr_arch_fixed_counters = min_t(int, edx.split.num_counters_fixed,
INTEL_PMC_MAX_FIXED);

and reprogram_fixed_counters()/get_fixed_pmc() respect this so the
guest can't just ignore the cpuid.

Adding a bitmap does let you do things like disable the first counter
but keep the second and third, but given that there are only three and
the events are likely to be on a whitelist anyway, it seemed like
adding the bitmap wasn't worth it. If you still feel the same way even
though we can disable them via the cpuid, I can add this in.

> I think it would be better to add more, please see below:
>
> enum kvm_pmu_action_type {
>  KVM_PMU_EVENT_ACTION_NONE = 0,
>  KVM_PMU_EVENT_ACTION_ACCEPT = 1,
>  KVM_PMU_EVENT_ACTION_REJECT = 2,
>  KVM_PMU_EVENT_ACTION_MAX
> };
>
> and do a check in kvm_vm_ioctl_set_pmu_event_filter()
>  if (filter->action >= KVM_PMU_EVENT_ACTION_MAX)
>  return -EINVAL;
>
> This is for detecting the case that we add a new action in
> userspace, while the kvm hasn't been updated to support that.
>
> KVM_PMU_EVENT_ACTION_NONE is for userspace to remove
> the filter after they set it.

We can achieve the same result by using a reject action with an empty
set of events - is there some advantage to "none" over that? I can add
that check for valid actions.

> > +#define KVM_PMU_EVENT_FILTER_MAX_EVENTS 63
>
> Why is this limit needed?

It serves to keep the filters on the smaller side and ensures the size
calculation can't overflow if users attempt to overflow it. Keeping the filter
under 4k is nicer for allocation - also, if we want really large
filters we might want to do something smarter than a linear traversal
of the filter when guests program counters.

> I think it looks tidier to wrap the changes above into a function:
>
>  if (kvm_pmu_filter_event(kvm, eventsel & AMD64_RAW_EVENT_MASK_NB))
>  return;

Okay - I can do that.

> > +   kvfree(filter);
>
> Probably better to have it conditionally?
>
> if (filter) {
>  synchronize_srcu();
>  kfree(filter)
> }
>
> You may want to factor it out, so that kvm_pmu_destroy could reuse.

Do you mean kvm_arch_destroy_vm? It looks like that's where kvm_arch
members are freed. I can do that.

Eric


[PATCH v2] mm/z3fold.c: Reinitialize zhdr structs after migration

2019-07-15 Thread Henry Burns
z3fold_page_migrate() calls memcpy(new_zhdr, zhdr, PAGE_SIZE).
However, zhdr contains fields that can't be directly copied over (e.g.
list_head, a circular linked list). We only need to initialize the
linked lists in new_zhdr, as z3fold_page_isolate() already ensures
that these lists are empty.

Additionally it is possible that zhdr->work has been placed in a
workqueue. In this case we shouldn't migrate the page, as zhdr->work
references zhdr as opposed to new_zhdr.

Fixes: bba4c5f96ce4 ("mm/z3fold.c: support page migration")
Signed-off-by: Henry Burns 
---
 Changelog since v1:
 - Made comments explicitly refer to new_zhdr->buddy.

 mm/z3fold.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/mm/z3fold.c b/mm/z3fold.c
index 42ef9955117c..f4b2283b19a3 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -1352,12 +1352,22 @@ static int z3fold_page_migrate(struct address_space *mapping, struct page *newpa
z3fold_page_unlock(zhdr);
return -EBUSY;
}
+   if (work_pending(&zhdr->work)) {
+   z3fold_page_unlock(zhdr);
+   return -EAGAIN;
+   }
new_zhdr = page_address(newpage);
memcpy(new_zhdr, zhdr, PAGE_SIZE);
newpage->private = page->private;
page->private = 0;
z3fold_page_unlock(zhdr);
spin_lock_init(&new_zhdr->page_lock);
+   INIT_WORK(&new_zhdr->work, compact_page_work);
+   /*
+* z3fold_page_isolate() ensures that new_zhdr->buddy is empty,
+* so we only have to reinitialize it.
+*/
+   INIT_LIST_HEAD(&new_zhdr->buddy);
new_mapping = page_mapping(page);
__ClearPageMovable(page);
ClearPagePrivate(page);
-- 
2.22.0.510.g264f2c817a-goog



Re: [PATCH v2] x86/paravirt: Drop {read,write}_cr8() hooks

2019-07-15 Thread Andy Lutomirski
On Mon, Jul 15, 2019 at 4:30 PM Andrew Cooper  wrote:
>
> On 15/07/2019 19:17, Nadav Amit wrote:
> >> On Jul 15, 2019, at 8:16 AM, Andrew Cooper  
> >> wrote:
> >>
> >> There is a lot of infrastructure for functionality which is used
> >> exclusively in __{save,restore}_processor_state() on the suspend/resume
> >> path.
> >>
> >> cr8 is an alias of APIC_TASKPRI, and APIC_TASKPRI is saved/restored by
> >> lapic_{suspend,resume}().  Saving and restoring cr8 independently of the
> >> rest of the Local APIC state isn't a clever thing to be doing.
> >>
> >> Delete the suspend/resume cr8 handling, which shrinks the size of struct
> >> saved_context, and allows for the removal of both PVOPS.
> > I think removing the interface for CR8 writes is also good to avoid
> > potential correctness issues, as the SDM says (10.8.6.1 "Interaction of Task
> > Priorities between CR8 and APIC”):
> >
> > "Operating software should implement either direct APIC TPR updates or CR8
> > style TPR updates but not mix them. Software can use a serializing
> > instruction (for example, CPUID) to serialize updates between MOV CR8 and
> > stores to the APIC.”
> >
> > And native_write_cr8() did not even issue a serializing instruction.
> >
>
> Given its location, the one write_cr8() is bounded by two serialising
> operations, so is safe in practice.
>
> However, I agree with the statement in the manual.  I could submit a v3
> with an updated commit message, or let it be fixed on commit.  Whichever
> is easiest.
>

I don't see anything wrong with the message.  If we actually used CR8
for interrupt priorities, we wouldn't want it to serialize.  The bug
is that the code that did the write_cr8() should have had a comment as
to how it serialized against lapic_restore().  But that doesn't seem
worth mentioning in the message, since, as noted, the real problem was
that it nonsensically restored just TPR without restoring everything
else.


Re: [v2 PATCH 0/2] mm: mempolicy: fix mbind()'s inconsistent behavior for unmovable pages

2019-07-15 Thread Yang Shi




On 7/15/19 4:51 PM, Yang Shi wrote:



On 7/15/19 3:22 PM, Andrew Morton wrote:
On Sat, 22 Jun 2019 08:20:07 +0800 Yang Shi 
 wrote:



Changelog
v2: * Fixed the inconsistent behavior by not aborting !vma_migratable()
      immediately by a separate patch (patch 1/2), and this is also the
      preparation for patch 2/2. For the details please see the commit
      log.  Per Vlastimil.
    * Not abort immediately if unmovable page is met. This should handle
      non-LRU movable pages and temporary off-LRU pages more friendly.
      Per Vlastimil and Michal Hocko.

Yang Shi (2):
   mm: mempolicy: make the behavior consistent when MPOL_MF_MOVE* and
      MPOL_MF_STRICT were specified
   mm: mempolicy: handle vma with unmovable pages mapped correctly in mbind



I'm seeing no evidence of review on these two.  Could we please take a
look?  2/2 fixes a kernel crash so let's please also think about the
-stable situation.


Thanks for following up on this. It seems I have a few patches stalled
due to lack of review.


BTW, this would not crash post-4.9 kernel since that BUG_ON had been 
removed. But, that behavior is definitely problematic as the commit 
log elaborated.




I have a note here that Vlastimil had an issue with [1/2] but I seem to
have misplaced that email :(


Vlastimil suggested something for v1, and I think his concerns and
suggestions have been addressed in this version. But the review stalled.







Re: [Linux-kernel-mentees] [PATCH v2] PCI: Remove functions not called in include/linux/pci.h

2019-07-15 Thread Bjorn Helgaas
On Mon, Jul 15, 2019 at 09:42:47PM +0200, Lukas Bulwahn wrote:
> On Mon, 15 Jul 2019, Kelsey Skunberg wrote:
> 
> > Remove the following uncalled functions from include/linux/pci.h:
> > 
> > pci_block_cfg_access()
> > pci_block_cfg_access_in_atomic()
> > pci_unblock_cfg_access()
> > 
> > Functions were added in patch fb51ccbf217c "PCI: Rework config space
> > blocking services", ...

> Also note that commits are referred to with this format:
> 
> commit <12-character sha prefix> ("")

FWIW, I use this shell alias to generate these references:

  gsr is aliased to `git --no-pager show -s --abbrev-commit --abbrev=12 
--pretty=format:"%h (\"%s\")%n"'

  $ gsr fb51ccb
  fb51ccbf217c ("PCI: Rework config space blocking services")

Documentation/process/submitting-patches.rst mentions a 12-char (at
least) SHA-1 but the e21d2170f36 example shows a *20*-char SHA-1,
which seems excessive to me.

I personally skip the word "commit" because I figure it's pretty
obvious what it is, but it's fine either way.

Bjorn


Re: [PATCH v2 2/2] interconnect: qcom: Add tagging and wake/sleep support for sdm845

2019-07-15 Thread David Dai

Hi Evan,

Thanks for the continued help in reviewing these patches!

On 7/11/2019 10:06 AM, Evan Green wrote:

Hi Georgi and David,

On Tue, Jun 18, 2019 at 2:17 AM Georgi Djakov  wrote:

From: David Dai 

Add support for wake and sleep commands by using a tag to indicate
whether or not the aggregate and set requests fall into execution
state specific bucket.

Signed-off-by: David Dai 
Signed-off-by: Georgi Djakov 
---
  drivers/interconnect/qcom/sdm845.c | 129 ++---
  1 file changed, 98 insertions(+), 31 deletions(-)

diff --git a/drivers/interconnect/qcom/sdm845.c 
b/drivers/interconnect/qcom/sdm845.c
index fb526004c82e..c100aab39415 100644
--- a/drivers/interconnect/qcom/sdm845.c
+++ b/drivers/interconnect/qcom/sdm845.c
@@ -66,6 +66,17 @@ struct bcm_db {
  #define SDM845_MAX_BCM_PER_NODE2
  #define SDM845_MAX_VCD 10

+#define QCOM_ICC_BUCKET_AMC0

What is AMC again? Is it the "right now" bucket? Maybe a comment on
the meaning of this bucket would be helpful.

That's correct. Will add a comment for this.



+#define QCOM_ICC_BUCKET_WAKE   1
+#define QCOM_ICC_BUCKET_SLEEP  2
+#define QCOM_ICC_NUM_BUCKETS   3
+#define QCOM_ICC_TAG_AMC   BIT(QCOM_ICC_BUCKET_AMC)
+#define QCOM_ICC_TAG_WAKE  BIT(QCOM_ICC_BUCKET_WAKE)
+#define QCOM_ICC_TAG_SLEEP BIT(QCOM_ICC_BUCKET_SLEEP)
+#define QCOM_ICC_TAG_ACTIVE_ONLY   (QCOM_ICC_TAG_AMC | QCOM_ICC_TAG_WAKE)
+#define QCOM_ICC_TAG_ALWAYS(QCOM_ICC_TAG_AMC | QCOM_ICC_TAG_WAKE |\
+QCOM_ICC_TAG_SLEEP)
+
  /**
   * struct qcom_icc_node - Qualcomm specific interconnect nodes
   * @name: the node name used in debugfs
@@ -75,7 +86,9 @@ struct bcm_db {
   * @channels: num of channels at this node
   * @buswidth: width of the interconnect between a node and the bus
   * @sum_avg: current sum aggregate value of all avg bw requests
+ * @sum_avg_cached: previous sum aggregate value of all avg bw requests
   * @max_peak: current max aggregate value of all peak bw requests
+ * @max_peak_cached: previous max aggregate value of all peak bw requests
   * @bcms: list of bcms associated with this logical node
   * @num_bcms: num of @bcms
   */
@@ -86,8 +99,10 @@ struct qcom_icc_node {
 u16 num_links;
 u16 channels;
 u16 buswidth;
-   u64 sum_avg;
-   u64 max_peak;
+   u64 sum_avg[QCOM_ICC_NUM_BUCKETS];
+   u64 sum_avg_cached[QCOM_ICC_NUM_BUCKETS];
+   u64 max_peak[QCOM_ICC_NUM_BUCKETS];
+   u64 max_peak_cached[QCOM_ICC_NUM_BUCKETS];
 struct qcom_icc_bcm *bcms[SDM845_MAX_BCM_PER_NODE];
 size_t num_bcms;
  };
@@ -112,8 +127,8 @@ struct qcom_icc_bcm {
 const char *name;
 u32 type;
 u32 addr;
-   u64 vote_x;
-   u64 vote_y;
+   u64 vote_x[QCOM_ICC_NUM_BUCKETS];
+   u64 vote_y[QCOM_ICC_NUM_BUCKETS];
 bool dirty;
 bool keepalive;
 struct bcm_db aux_data;
@@ -555,7 +570,7 @@ inline void tcs_cmd_gen(struct tcs_cmd *cmd, u64 vote_x, 
u64 vote_y,
 cmd->wait = true;
  }

-static void tcs_list_gen(struct list_head *bcm_list,
+static void tcs_list_gen(struct list_head *bcm_list, int bucket,
  struct tcs_cmd tcs_list[SDM845_MAX_VCD],
  int n[SDM845_MAX_VCD])
  {
@@ -573,8 +588,8 @@ static void tcs_list_gen(struct list_head *bcm_list,
 commit = true;
 cur_vcd_size = 0;
 }
> -   tcs_cmd_gen(&tcs_list[idx], bcm->vote_x, bcm->vote_y,
-   bcm->addr, commit);
> +   tcs_cmd_gen(&tcs_list[idx], bcm->vote_x[bucket],
+   bcm->vote_y[bucket], bcm->addr, commit);
 idx++;
 n[batch]++;
 /*
@@ -595,32 +610,39 @@ static void tcs_list_gen(struct list_head *bcm_list,

  static void bcm_aggregate(struct qcom_icc_bcm *bcm)
  {
-   size_t i;
-   u64 agg_avg = 0;
-   u64 agg_peak = 0;
+   size_t i, bucket;
+   u64 agg_avg[QCOM_ICC_NUM_BUCKETS] = {0};
+   u64 agg_peak[QCOM_ICC_NUM_BUCKETS] = {0};
 u64 temp;

-   for (i = 0; i < bcm->num_nodes; i++) {
-   temp = bcm->nodes[i]->sum_avg * bcm->aux_data.width;
-   do_div(temp, bcm->nodes[i]->buswidth * bcm->nodes[i]->channels);
-   agg_avg = max(agg_avg, temp);
+   for (bucket = 0; bucket < QCOM_ICC_NUM_BUCKETS; bucket++) {
+   for (i = 0; i < bcm->num_nodes; i++) {
+   temp = bcm->nodes[i]->sum_avg_cached[bucket] * 
bcm->aux_data.width;
+   do_div(temp, bcm->nodes[i]->buswidth * 
bcm->nodes[i]->channels);
+   agg_avg[bucket] = max(agg_avg[bucket], temp);

-   temp = bcm->nodes[i]->max_peak * bcm->aux_data.width;
-   do_div(temp, bcm->nodes[i]->buswidth);

Why is it that this one doesn't 

Re: [PATCH] mm/hmm: Fix bad subpage pointer in try_to_unmap_one

2019-07-15 Thread John Hubbard
On 7/15/19 3:00 PM, Andrew Morton wrote:
> On Tue, 9 Jul 2019 18:24:57 -0700 Ralph Campbell  wrote:
> 
>>
>> On 7/9/19 5:28 PM, Andrew Morton wrote:
>>> On Tue, 9 Jul 2019 15:35:56 -0700 Ralph Campbell  
>>> wrote:
>>>
 When migrating a ZONE device private page from device memory to system
 memory, the subpage pointer is initialized from a swap pte which computes
 an invalid page pointer. A kernel panic results such as:

 BUG: unable to handle page fault for address: ea1fffc8

 Initialize subpage correctly before calling page_remove_rmap().
>>>
>>> I think this is
>>>
>>> Fixes:  a5430dda8a3a1c ("mm/migrate: support un-addressable ZONE_DEVICE 
>>> page in migration")
>>> Cc: stable
>>>
>>> yes?
>>>
>>
>> Yes. Can you add this or should I send a v2?
> 
> I updated the patch.  Could we please have some review input?
> 
> 
> From: Ralph Campbell 
> Subject: mm/hmm: fix bad subpage pointer in try_to_unmap_one
> 
> When migrating a ZONE device private page from device memory to system
> memory, the subpage pointer is initialized from a swap pte which computes
> an invalid page pointer. A kernel panic results such as:
> 
> BUG: unable to handle page fault for address: ea1fffc8
> 
> Initialize subpage correctly before calling page_remove_rmap().
> 
> Link: http://lkml.kernel.org/r/20190709223556.28908-1-rcampb...@nvidia.com
> Fixes: a5430dda8a3a1c ("mm/migrate: support un-addressable ZONE_DEVICE page 
> in migration")
> Signed-off-by: Ralph Campbell 
> Cc: "Jérôme Glisse" 
> Cc: "Kirill A. Shutemov" 
> Cc: Mike Kravetz 
> Cc: Jason Gunthorpe 
> Cc: 
> Signed-off-by: Andrew Morton 
> ---
> 
>  mm/rmap.c |1 +
>  1 file changed, 1 insertion(+)
> 
> --- a/mm/rmap.c~mm-hmm-fix-bad-subpage-pointer-in-try_to_unmap_one
> +++ a/mm/rmap.c
> @@ -1476,6 +1476,7 @@ static bool try_to_unmap_one(struct page
>* No need to invalidate here it will synchronize on
>* against the special swap migration pte.
>*/
> + subpage = page;
>   goto discard;
>   }
>  

Hi Ralph and everyone,

While the above prevents a crash, I'm concerned that it is still not
an accurate fix. This fix leads to repeatedly removing the rmap, against the
same struct page, which is odd, and also doesn't directly address the
root cause, which I understand to be: this routine can't handle migrating
the zero page properly--over and back, anyway. (We should also mention more 
about how this is triggered, in the commit description.)

I'll take a closer look at possible fixes (I have to step out for a bit) soon, 
but any more experienced help is also appreciated here.

thanks,
-- 
John Hubbard
NVIDIA


Re: [PATCH v2] x86/paravirt: Drop {read,write}_cr8() hooks

2019-07-15 Thread Andrew Cooper
On 15/07/2019 19:17, Nadav Amit wrote:
>> On Jul 15, 2019, at 8:16 AM, Andrew Cooper  wrote:
>>
>> There is a lot of infrastructure for functionality which is used
>> exclusively in __{save,restore}_processor_state() on the suspend/resume
>> path.
>>
>> cr8 is an alias of APIC_TASKPRI, and APIC_TASKPRI is saved/restored by
>> lapic_{suspend,resume}().  Saving and restoring cr8 independently of the
>> rest of the Local APIC state isn't a clever thing to be doing.
>>
>> Delete the suspend/resume cr8 handling, which shrinks the size of struct
>> saved_context, and allows for the removal of both PVOPS.
> I think removing the interface for CR8 writes is also good to avoid
> potential correctness issues, as the SDM says (10.8.6.1 "Interaction of Task
> Priorities between CR8 and APIC”):
>
> "Operating software should implement either direct APIC TPR updates or CR8
> style TPR updates but not mix them. Software can use a serializing
> instruction (for example, CPUID) to serialize updates between MOV CR8 and
> stores to the APIC.”
>
> And native_write_cr8() did not even issue a serializing instruction.
>

Given its location, the one write_cr8() is bounded by two serialising
operations, so is safe in practice.

However, I agree with the statement in the manual.  I could submit a v3
with an updated commit message, or let it be fixed on commit.  Whichever
is easiest.

~Andrew


Re: [PATCH V35 19/29] Lock down module params that specify hardware parameters (eg. ioport)

2019-07-15 Thread James Morris
On Mon, 15 Jul 2019, Matthew Garrett wrote:

> From: David Howells 
> 
> Provided an annotation for module parameters that specify hardware
> parameters (such as io ports, iomem addresses, irqs, dma channels, fixed
> dma buffers and other types).

Adding Jessica.

> 
> Suggested-by: Alan Cox 
> Signed-off-by: David Howells 
> Signed-off-by: Matthew Garrett 
> Reviewed-by: Kees Cook 
> ---
>  include/linux/security.h |  1 +
>  kernel/params.c  | 28 +++-
>  security/lockdown/lockdown.c |  1 +
>  3 files changed, 25 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 8f7048395114..43fa3486522b 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -113,6 +113,7 @@ enum lockdown_reason {
>   LOCKDOWN_ACPI_TABLES,
>   LOCKDOWN_PCMCIA_CIS,
>   LOCKDOWN_TIOCSSERIAL,
> + LOCKDOWN_MODULE_PARAMETERS,
>   LOCKDOWN_INTEGRITY_MAX,
>   LOCKDOWN_CONFIDENTIALITY_MAX,
>  };
> diff --git a/kernel/params.c b/kernel/params.c
> index cf448785d058..f2779a76d39a 100644
> --- a/kernel/params.c
> +++ b/kernel/params.c
> @@ -12,6 +12,7 @@
>  #include 
>  #include 
>  #include 
> +#include <linux/security.h>
>  
>  #ifdef CONFIG_SYSFS
>  /* Protects all built-in parameters, modules use their own param_lock */
> @@ -96,13 +97,20 @@ bool parameq(const char *a, const char *b)
>   return parameqn(a, b, strlen(a)+1);
>  }
>  
> -static void param_check_unsafe(const struct kernel_param *kp)
> +static bool param_check_unsafe(const struct kernel_param *kp,
> +const char *doing)
>  {
> + if (kp->flags & KERNEL_PARAM_FL_HWPARAM &&
> + security_locked_down(LOCKDOWN_MODULE_PARAMETERS))
> + return false;
> +
>   if (kp->flags & KERNEL_PARAM_FL_UNSAFE) {
>   pr_notice("Setting dangerous option %s - tainting kernel\n",
> kp->name);
>   add_taint(TAINT_USER, LOCKDEP_STILL_OK);
>   }
> +
> + return true;
>  }
>  
>  static int parse_one(char *param,
> @@ -132,8 +140,10 @@ static int parse_one(char *param,
>   pr_debug("handling %s with %p\n", param,
>   params[i].ops->set);
>   kernel_param_lock(params[i].mod);
> - param_check_unsafe(&params[i]);
> - err = params[i].ops->set(val, &params[i]);
> + if (param_check_unsafe(&params[i], doing))
> + err = params[i].ops->set(val, &params[i]);
> + else
> + err = -EPERM;
>   kernel_param_unlock(params[i].mod);
>   return err;
>   }
> @@ -541,6 +551,12 @@ static ssize_t param_attr_show(struct module_attribute 
> *mattr,
>   return count;
>  }
>  
> +#ifdef CONFIG_MODULES
> +#define mod_name(mod) ((mod)->name)
> +#else
> +#define mod_name(mod) "unknown"
> +#endif
> +
>  /* sysfs always hands a nul-terminated string in buf.  We rely on that. */
>  static ssize_t param_attr_store(struct module_attribute *mattr,
>   struct module_kobject *mk,
> @@ -553,8 +569,10 @@ static ssize_t param_attr_store(struct module_attribute 
> *mattr,
>   return -EPERM;
>  
>   kernel_param_lock(mk->mod);
> - param_check_unsafe(attribute->param);
> - err = attribute->param->ops->set(buf, attribute->param);
> + if (param_check_unsafe(attribute->param, mod_name(mk->mod)))
> + err = attribute->param->ops->set(buf, attribute->param);
> + else
> + err = -EPERM;
>   kernel_param_unlock(mk->mod);
>   if (!err)
>   return len;
> diff --git a/security/lockdown/lockdown.c b/security/lockdown/lockdown.c
> index 07a49667f234..065432f9e218 100644
> --- a/security/lockdown/lockdown.c
> +++ b/security/lockdown/lockdown.c
> @@ -28,6 +28,7 @@ static char 
> *lockdown_reasons[LOCKDOWN_CONFIDENTIALITY_MAX+1] = {
>   [LOCKDOWN_ACPI_TABLES] = "modified ACPI tables",
>   [LOCKDOWN_PCMCIA_CIS] = "direct PCMCIA CIS storage",
>   [LOCKDOWN_TIOCSSERIAL] = "reconfiguration of serial port IO",
> + [LOCKDOWN_MODULE_PARAMETERS] = "unsafe module parameters",
>   [LOCKDOWN_INTEGRITY_MAX] = "integrity",
>   [LOCKDOWN_CONFIDENTIALITY_MAX] = "confidentiality",
>  };
> 

-- 
James Morris




Re: [PATCH] iio: cros_ec_accel_legacy: Always release lock when returning from _read()

2019-07-15 Thread Gwendal Grignou
Let's use Matthias' CL.

On Mon, Jul 15, 2019 at 4:17 PM Gwendal Grignou  wrote:
>
> Sorry for the original mistake. I uploaded a patch at
> https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1702884.
>
> On Mon, Jul 15, 2019 at 1:04 PM Matthias Kaehlcke  wrote:
> >
> > Hi Benson,
> >
> > On Mon, Jul 15, 2019 at 12:55:57PM -0700, Benson Leung wrote:
> > > Hi Matthias,
> > >
> > > On Mon, Jul 15, 2019 at 12:10:17PM -0700, Matthias Kaehlcke wrote:
> > > > Before doing any actual work cros_ec_accel_legacy_read() acquires
> > > > a mutex, which is released at the end of the function. However for
> > > > 'calibbias' channels the function returns directly, without releasing
> > > > the lock. The next attempt to acquire the lock blocks forever. Instead
> > > > of an explicit return statement use the common return path, which
> > > > releases the lock.
> > > >
> > > > Fixes: 11b86c7004ef1 ("platform/chrome: Add cros_ec_accel_legacy 
> > > > driver")
> > > > Signed-off-by: Matthias Kaehlcke 
> > > > ---
> > > >  drivers/iio/accel/cros_ec_accel_legacy.c | 3 ++-
> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/iio/accel/cros_ec_accel_legacy.c 
> > > > b/drivers/iio/accel/cros_ec_accel_legacy.c
> > > > index 46bb2e421bb9..27ca4a64dddf 100644
> > > > --- a/drivers/iio/accel/cros_ec_accel_legacy.c
> > > > +++ b/drivers/iio/accel/cros_ec_accel_legacy.c
> > > > @@ -206,7 +206,8 @@ static int cros_ec_accel_legacy_read(struct iio_dev 
> > > > *indio_dev,
> > > > case IIO_CHAN_INFO_CALIBBIAS:
> > > > /* Calibration not supported. */
> > > > *val = 0;
> > > > -   return IIO_VAL_INT;
> > > > +   ret = IIO_VAL_INT;
> > > > +   break;
> > >
> > > The value of ret is not used below this. It seems to be only used in
> > > case IIO_CHAN_INFO_RAW. In fact, with your change,
> > > there's no return value at all and we just reach the end of
> > > cros_ec_accel_legacy_read.
> > >
> > > > default:
> > > > return -EINVAL;
> > > > }
> > >
> >
> > I messed up. I was over-confident that a FROMLIST patch in our 4.19
> > kernel + this patch applying on upstream means that upstream uses the
> > same code. I should have double-checked that the upstream context is
> > actually the same.
> >
> > Sorry for the noise.


Re: [PATCH V35 26/29] debugfs: Restrict debugfs when the kernel is locked down

2019-07-15 Thread James Morris
On Mon, 15 Jul 2019, Matthew Garrett wrote:

> From: David Howells 
> 
> Disallow opening of debugfs files that might be used to muck around when
> the kernel is locked down as various drivers give raw access to hardware
> through debugfs.  Given the effort of auditing all 2000 or so files and
> manually fixing each one as necessary, I've chosen to apply a heuristic
> instead.  The following changes are made:

Adding the debugfs maintainers.

> 
>  (1) chmod and chown are disallowed on debugfs objects (though the root dir
>  can be modified by mount and remount, but I'm not worried about that).
> 
>  (2) When the kernel is locked down, only files with the following criteria
>  are permitted to be opened:
> 
>   - The file must have mode 00444
>   - The file must not have ioctl methods
>   - The file must not have mmap
> 
>  (3) When the kernel is locked down, files may only be opened for reading.
> 
> Normal device interaction should be done through configfs, sysfs or a
> miscdev, not debugfs.
> 
> Note that this makes it unnecessary to specifically lock down show_dsts(),
> show_devs() and show_call() in the asus-wmi driver.
> 
> I would actually prefer to lock down all files by default and have the
> the files unlocked by the creator.  This is tricky to manage correctly,
> though, as there are 19 creation functions and ~1600 call sites (some of
> them in loops scanning tables).
> 
> Signed-off-by: David Howells 
> cc: Andy Shevchenko 
> cc: acpi4asus-u...@lists.sourceforge.net
> cc: platform-driver-...@vger.kernel.org
> cc: Matthew Garrett 
> cc: Thomas Gleixner 
> Signed-off-by: Matthew Garrett 
> ---
>  fs/debugfs/file.c| 30 ++
>  fs/debugfs/inode.c   | 32 ++--
>  include/linux/security.h |  1 +
>  security/lockdown/lockdown.c |  1 +
>  4 files changed, 62 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c
> index 93e4ca6b2ad7..87846aad594b 100644
> --- a/fs/debugfs/file.c
> +++ b/fs/debugfs/file.c
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include <linux/security.h>
>  
>  #include "internal.h"
>  
> @@ -136,6 +137,25 @@ void debugfs_file_put(struct dentry *dentry)
>  }
>  EXPORT_SYMBOL_GPL(debugfs_file_put);
>  
> +/*
> + * Only permit access to world-readable files when the kernel is locked down.
> + * We also need to exclude any file that has ways to write or alter it as 
> root
> + * can bypass the permissions check.
> + */
> +static bool debugfs_is_locked_down(struct inode *inode,
> +struct file *filp,
> +const struct file_operations *real_fops)
> +{
> + if ((inode->i_mode & 07777) == 0444 &&
> + !(filp->f_mode & FMODE_WRITE) &&
> + !real_fops->unlocked_ioctl &&
> + !real_fops->compat_ioctl &&
> + !real_fops->mmap)
> + return false;
> +
> + return security_locked_down(LOCKDOWN_DEBUGFS);
> +}
> +
>  static int open_proxy_open(struct inode *inode, struct file *filp)
>  {
>   struct dentry *dentry = F_DENTRY(filp);
> @@ -147,6 +167,11 @@ static int open_proxy_open(struct inode *inode, struct 
> file *filp)
>   return r == -EIO ? -ENOENT : r;
>  
>   real_fops = debugfs_real_fops(filp);
> +
> + r = debugfs_is_locked_down(inode, filp, real_fops);
> + if (r)
> + goto out;
> +
>   real_fops = fops_get(real_fops);
>   if (!real_fops) {
>   /* Huh? Module did not clean up after itself at exit? */
> @@ -272,6 +297,11 @@ static int full_proxy_open(struct inode *inode, struct 
> file *filp)
>   return r == -EIO ? -ENOENT : r;
>  
>   real_fops = debugfs_real_fops(filp);
> +
> + r = debugfs_is_locked_down(inode, filp, real_fops);
> + if (r)
> + goto out;
> +
>   real_fops = fops_get(real_fops);
>   if (!real_fops) {
>   /* Huh? Module did not cleanup after itself at exit? */
> diff --git a/fs/debugfs/inode.c b/fs/debugfs/inode.c
> index 042b688ed124..7b975dbb2bb4 100644
> --- a/fs/debugfs/inode.c
> +++ b/fs/debugfs/inode.c
> @@ -26,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include <linux/security.h>
>  
>  #include "internal.h"
>  
> @@ -35,6 +36,32 @@ static struct vfsmount *debugfs_mount;
>  static int debugfs_mount_count;
>  static bool debugfs_registered;
>  
> +/*
> + * Don't allow access attributes to be changed whilst the kernel is locked 
> down
> + * so that we can use the file mode as part of a heuristic to determine 
> whether
> + * to lock down individual files.
> + */
> +static int debugfs_setattr(struct dentry *dentry, struct iattr *ia)
> +{
> + int ret = security_locked_down(LOCKDOWN_DEBUGFS);
> +
> + if (ret && (ia->ia_valid & (ATTR_MODE | ATTR_UID | ATTR_GID)))
> + return ret;
> + return simple_setattr(dentry, ia);
> +}
> +
> +static const struct inode_operations debugfs_file_inode_operations = {
> + .setattr 



[PATCH v6 2/4] iio: cros_ec_accel_legacy: Fix incorrect channel setting

2019-07-15 Thread Gwendal Grignou
INFO_SCALE is set both per channel and for all channels.
iio uses the all-channel setting, so the error was not user visible.

Signed-off-by: Gwendal Grignou 
---
 drivers/iio/accel/cros_ec_accel_legacy.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/iio/accel/cros_ec_accel_legacy.c 
b/drivers/iio/accel/cros_ec_accel_legacy.c
index 46bb2e421bb9..ad19d9c716f4 100644
--- a/drivers/iio/accel/cros_ec_accel_legacy.c
+++ b/drivers/iio/accel/cros_ec_accel_legacy.c
@@ -319,7 +319,6 @@ static const struct iio_chan_spec_ext_info 
cros_ec_accel_legacy_ext_info[] = {
.modified = 1,  \
.info_mask_separate =   \
BIT(IIO_CHAN_INFO_RAW) |\
-   BIT(IIO_CHAN_INFO_SCALE) |  \
BIT(IIO_CHAN_INFO_CALIBBIAS),   \
.info_mask_shared_by_all = BIT(IIO_CHAN_INFO_SCALE),\
.ext_info = cros_ec_accel_legacy_ext_info,  \
-- 
2.22.0.510.g264f2c817a-goog



[PATCH v6 3/4] iio: cros_ec_accel_legacy: Use cros_ec_sensors_core

2019-07-15 Thread Gwendal Grignou
Remove duplicate code in cros-ec-accel-legacy,
use cros-ec-sensors-core functions and structures when possible.

On glimmer, checked that the 2 accelerometers are present and working.

Signed-off-by: Gwendal Grignou 
Acked-by: Jonathan Cameron 
---
 drivers/iio/accel/Kconfig|   4 +-
 drivers/iio/accel/cros_ec_accel_legacy.c | 336 ---
 2 files changed, 55 insertions(+), 285 deletions(-)

diff --git a/drivers/iio/accel/Kconfig b/drivers/iio/accel/Kconfig
index 62a970a20219..7d0848f9ea45 100644
--- a/drivers/iio/accel/Kconfig
+++ b/drivers/iio/accel/Kconfig
@@ -201,9 +201,7 @@ config HID_SENSOR_ACCEL_3D
 
 config IIO_CROS_EC_ACCEL_LEGACY
tristate "ChromeOS EC Legacy Accelerometer Sensor"
-   select IIO_BUFFER
-   select IIO_TRIGGERED_BUFFER
-   select CROS_EC_LPC_REGISTER_DEVICE
+   depends on IIO_CROS_EC_SENSORS_CORE
help
  Say yes here to get support for accelerometers on Chromebook using
  legacy EC firmware.
diff --git a/drivers/iio/accel/cros_ec_accel_legacy.c 
b/drivers/iio/accel/cros_ec_accel_legacy.c
index ad19d9c716f4..f65578c65a1c 100644
--- a/drivers/iio/accel/cros_ec_accel_legacy.c
+++ b/drivers/iio/accel/cros_ec_accel_legacy.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -25,191 +26,51 @@
 
 #define DRV_NAME   "cros-ec-accel-legacy"
 
+#define CROS_EC_SENSOR_LEGACY_NUM 2
 /*
  * Sensor scale hard coded at 10 bits per g, computed as:
  * g / (2^10 - 1) = 0.009586168; with g = 9.80665 m.s^-2
  */
 #define ACCEL_LEGACY_NSCALE 9586168
 
-/* Indices for EC sensor values. */
-enum {
-   X,
-   Y,
-   Z,
-   MAX_AXIS,
-};
-
-/* State data for cros_ec_accel_legacy iio driver. */
-struct cros_ec_accel_legacy_state {
-   struct cros_ec_device *ec;
-
-   /*
-* Array holding data from a single capture. 2 bytes per channel
-* for the 3 channels plus the timestamp which is always last and
-* 8-bytes aligned.
-*/
-   s16 capture_data[8];
-   s8 sign[MAX_AXIS];
-   u8 sensor_num;
-};
-
-static int ec_cmd_read_u8(struct cros_ec_device *ec, unsigned int offset,
- u8 *dest)
-{
-   return ec->cmd_readmem(ec, offset, 1, dest);
-}
-
-static int ec_cmd_read_u16(struct cros_ec_device *ec, unsigned int offset,
-  u16 *dest)
-{
-   __le16 tmp;
-   int ret = ec->cmd_readmem(ec, offset, 2, &tmp);
-
-   *dest = le16_to_cpu(tmp);
-
-   return ret;
-}
-
-/**
- * read_ec_until_not_busy() - Read from EC status byte until it reads not busy.
- * @st: Pointer to state information for device.
- *
- * This function reads EC status until its busy bit gets cleared. It does not
- * wait indefinitely and returns -EIO if the EC status is still busy after a
- * few hundreds milliseconds.
- *
- * Return: 8-bit status if ok, -EIO on error
- */
-static int read_ec_until_not_busy(struct cros_ec_accel_legacy_state *st)
-{
-   struct cros_ec_device *ec = st->ec;
-   u8 status;
-   int attempts = 0;
-
-   ec_cmd_read_u8(ec, EC_MEMMAP_ACC_STATUS, &status);
-   while (status & EC_MEMMAP_ACC_STATUS_BUSY_BIT) {
-   /* Give up after enough attempts, return error. */
-   if (attempts++ >= 50)
-   return -EIO;
-
-   /* Small delay every so often. */
-   if (attempts % 5 == 0)
-   msleep(25);
-
-   ec_cmd_read_u8(ec, EC_MEMMAP_ACC_STATUS, &status);
-   }
-
-   return status;
-}
-
-/**
- * read_ec_accel_data_unsafe() - Read acceleration data from EC shared memory.
- * @st:Pointer to state information for device.
- * @scan_mask: Bitmap of the sensor indices to scan.
- * @data:  Location to store data.
- *
- * This is the unsafe function for reading the EC data. It does not guarantee
- * that the EC will not modify the data as it is being read in.
- */
-static void read_ec_accel_data_unsafe(struct cros_ec_accel_legacy_state *st,
- unsigned long scan_mask, s16 *data)
-{
-   int i = 0;
-   int num_enabled = bitmap_weight(&scan_mask, MAX_AXIS);
-
-   /* Read all sensors enabled in scan_mask. Each value is 2 bytes. */
-   while (num_enabled--) {
-   i = find_next_bit(&scan_mask, MAX_AXIS, i);
-   ec_cmd_read_u16(st->ec,
-   EC_MEMMAP_ACC_DATA +
-   sizeof(s16) *
-   (1 + i + st->sensor_num * MAX_AXIS),
-   data);
-   *data *= st->sign[i];
-   i++;
-   data++;
-   }
-}
-
-/**
- * read_ec_accel_data() - Read acceleration data from EC shared memory.
- * @st:Pointer to state information for device.
- * @scan_mask: Bitmap of the sensor indices to scan.
- * @data:  Location to store data.
- *
- * This is the safe function for reading the EC data. It 
