Re: [PATCH 0/2] Fix SEV user-space mapping of unencrypted coherent memory

2019-09-10 Thread Ingo Molnar


* Thomas Hellström (VMware)  wrote:

> With SEV, and sometimes with SME encryption, the DMA API coherent memory is
> typically unencrypted, meaning the linear kernel map has the encryption
> bit cleared. However, the default page protection returned from vm_get_page_prot()
> has the encryption bit set. So to compute the correct page protection we need
> to clear the encryption bit.
> 
> Also, in order for the encryption bit setting to survive across do_mmap() and
> mprotect_fixup(), we need to make pgprot_modify() aware of it and not touch
> it. Therefore make sme_me_mask part of _PAGE_CHG_MASK and make sure
> pgprot_modify() also preserves cleared bits that are part of _PAGE_CHG_MASK,
> not just set bits. The use of pgprot_modify() is currently quite limited and
> easy to audit.
> 
> (Note that the encryption status is not logically encoded in the pfn but in
> the page protection, even though an address line in the physical address is used.)
> 
> The patchset has seen some sanity testing by exporting dma_pgprot() and
> using it in the vmwgfx mmap handler with SEV enabled.
> 
> Changes since RFC:
> - Make sme_me_mask part of _PAGE_CHG_MASK rather than using it on its own in
>   pgprot_modify().
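
(To illustrate the "preserve cleared bits" point above with a toy user-space
sketch -- the bit values below are made up, not the real x86 sme_me_mask or
_PAGE_CHG_MASK, and the kernel's pgprot_modify() operates on pgprot_t rather
than plain integers:)

	#include <stdint.h>
	#include <stdio.h>

	#define ENC_BIT		(1ULL << 47)		/* stand-in for sme_me_mask */
	#define CHG_MASK	(ENC_BIT | 0x18ULL)	/* stand-in for _PAGE_CHG_MASK */

	/* Preserve every CHG_MASK bit from oldprot -- set or cleared -- and
	 * take the remaining bits from newprot. */
	static uint64_t pgprot_modify_like(uint64_t oldprot, uint64_t newprot)
	{
		return (oldprot & CHG_MASK) | (newprot & ~CHG_MASK);
	}

	int main(void)
	{
		uint64_t old = 0x18;			/* encryption bit already cleared */
		uint64_t new = ENC_BIT | 0x18;		/* vm_get_page_prot()-style default */

		/* The cleared encryption bit survives the "modify". */
		printf("%#llx\n", (unsigned long long)pgprot_modify_like(old, new));
		return 0;
	}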

Could you please add a "why is this patch-set needed", not just describe 
the "what does this patch set do"? I've seen zero discussion in the three 
changelogs of exactly why we'd want this, which drivers and features are 
affected and in what way, etc.

It's called a "fix" but doesn't explain what bad behavior it fixes.

Thanks,

Ingo


[PATCH] Staging: exfat: Avoid use of strcpy

2019-09-10 Thread Sandro Volery
Replaced strcpy with strscpy in exfat_core.c.

Signed-off-by: Sandro Volery 
---
 drivers/staging/exfat/exfat_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/exfat/exfat_core.c 
b/drivers/staging/exfat/exfat_core.c
index da8c58149c35..c71b145e8a24 100644
--- a/drivers/staging/exfat/exfat_core.c
+++ b/drivers/staging/exfat/exfat_core.c
@@ -2964,7 +2964,7 @@ s32 resolve_path(struct inode *inode, char *path, struct 
chain_t *p_dir,
if (strlen(path) >= (MAX_NAME_LENGTH * MAX_CHARSET_SIZE))
return FFS_INVALIDPATH;
 
-   strcpy(name_buf, path);
+   strscpy(name_buf, path, sizeof(name_buf));
 
	nls_cstring_to_uniname(sb, p_uniname, name_buf, &lossy);
if (lossy)
-- 
2.23.0
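
(For reference: unlike strcpy(), strscpy() is bounded by the destination
size, always NUL-terminates, and returns -E2BIG when the source had to be
truncated. A hedged sketch of how a caller could also detect truncation --
not part of this patch:)

	ssize_t copied;

	copied = strscpy(name_buf, path, sizeof(name_buf));
	if (copied < 0)		/* -E2BIG: path did not fit and was truncated */
		return FFS_INVALIDPATH;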



RE: [PATCH v5 2/2] mailbox: introduce ARM SMC based mailbox

2019-09-10 Thread Peng Fan
Hi Andre,

> Subject: Re: [PATCH v5 2/2] mailbox: introduce ARM SMC based mailbox
> 
> On Wed, 28 Aug 2019 03:03:02 +
> Peng Fan  wrote:
> 
> Hi,
> 
> > From: Peng Fan 
> >
> > This mailbox driver implements a mailbox which signals transmitted
> > data via an ARM smc (secure monitor call) instruction. The mailbox
> > receiver is implemented in firmware and can synchronously return data
> > when it returns execution to the non-secure world again.
> > An asynchronous receive path is not implemented.
> > This allows the usage of a mailbox to trigger firmware actions on SoCs
> > which either don't have a separate management processor or on which
> > such a core is not available. A user of this mailbox could be the SCP
> > interface.
> >
> > Modified from Andre Przywara's v2 patch
> > https://lore.kernel.org/patchwork/patch/812999/
> >
> > Cc: Andre Przywara 
> > Signed-off-by: Peng Fan 
> > ---
> >  drivers/mailbox/Kconfig   |   7 ++
> >  drivers/mailbox/Makefile  |   2 +
> >  drivers/mailbox/arm-smc-mailbox.c | 215
> > ++
> >  3 files changed, 224 insertions(+)
> >  create mode 100644 drivers/mailbox/arm-smc-mailbox.c
> >
> > diff --git a/drivers/mailbox/Kconfig b/drivers/mailbox/Kconfig index
> > ab4eb750bbdd..7707ee26251a 100644
> > --- a/drivers/mailbox/Kconfig
> > +++ b/drivers/mailbox/Kconfig
> > @@ -16,6 +16,13 @@ config ARM_MHU
> >   The controller has 3 mailbox channels, the last of which can be
> >   used in Secure mode only.
> >
> > +config ARM_SMC_MBOX
> > +   tristate "Generic ARM smc mailbox"
> > +   depends on OF && HAVE_ARM_SMCCC
> > +   help
> > + Generic mailbox driver which uses ARM smc calls to call into
> > + firmware for triggering mailboxes.
> > +
> >  config IMX_MBOX
> > tristate "i.MX Mailbox"
> > depends on ARCH_MXC || COMPILE_TEST
> > diff --git a/drivers/mailbox/Makefile b/drivers/mailbox/Makefile index
> > c22fad6f696b..93918a84c91b 100644
> > --- a/drivers/mailbox/Makefile
> > +++ b/drivers/mailbox/Makefile
> > @@ -7,6 +7,8 @@ obj-$(CONFIG_MAILBOX_TEST)  += mailbox-test.o
> >
> >  obj-$(CONFIG_ARM_MHU)  += arm_mhu.o
> >
> > +obj-$(CONFIG_ARM_SMC_MBOX) += arm-smc-mailbox.o
> > +
> >  obj-$(CONFIG_IMX_MBOX) += imx-mailbox.o
> >
> >  obj-$(CONFIG_ARMADA_37XX_RWTM_MBOX)+=
> armada-37xx-rwtm-mailbox.o
> > diff --git a/drivers/mailbox/arm-smc-mailbox.c
> > b/drivers/mailbox/arm-smc-mailbox.c
> > new file mode 100644
> > index ..76a2ae11ee4d
> > --- /dev/null
> > +++ b/drivers/mailbox/arm-smc-mailbox.c
> > @@ -0,0 +1,215 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (C) 2016,2017 ARM Ltd.
> > + * Copyright 2019 NXP
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include  #include 
> > +#include 
> > +
> > +#define ARM_SMC_MBOX_MEM_TRANS BIT(0)
> > +
> > +struct arm_smc_chan_data {
> > +   u32 function_id;
> > +   u32 chan_id;
> > +   u32 flags;
> > +};
> > +
> > +struct arm_smccc_mbox_cmd {
> > +   unsigned long a0, a1, a2, a3, a4, a5, a6, a7;
> 
> I think this is one or even two registers too long?
> The SMCCC speaks of the function ID in x0/r0 and six arguments, with a
> "client ID" being an optional seventh argument. Looking at the description
> there I believe we cannot use the client ID here for this purpose, as this is
> supposed to be set by a hypervisor before passing on an SMC to EL3 firmware.
> KVM does not allow passing on SMCs in this way.
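
(For reference, with the client ID left to the hypervisor, a transmit path
along these lines would match SMCCC -- function ID in the first register
plus up to six arguments. This is only a sketch using the generic
arm_smccc_smc() helper, not the code this driver ended up with:)

	struct arm_smccc_res res;

	/* function ID, six arguments, unused seventh slot, result */
	arm_smccc_smc(function_id, cmd->a1, cmd->a2, cmd->a3,
		      cmd->a4, cmd->a5, cmd->a6, 0, &res);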

I'll drop a7.

> 
> Also, while using "long" in here seems to make sense from the mailbox and
> SMC point of view, aliasing this to the mailbox client provided data seems
> dangerous to me, as this exposes the difference between arm32 and arm64
> to the client. I think this is not what we want, the client should not be
> architecture specific.

I see.

> 
> > +};
> > +
> > +typedef unsigned long (smc_mbox_fn)(unsigned long, unsigned long,
> > +   unsigned long, unsigned long,
> > +   unsigned long, unsigned long,
> > +   unsigned long, unsigned long);
> > +static smc_mbox_fn *invoke_smc_mbox_fn;
> > +
> > +static int arm_smc_send_data(struct mbox_chan *link, void *data)
> > +{
> > +   struct arm_smc_chan_data *chan_data = link->con_priv;
> > +   struct arm_smccc_mbox_cmd *cmd = data;
> > +   unsigned long ret;
> > +   u32 function_id;
> > +   u32 chan_id;
> > +
> > +   if (chan_data->flags & ARM_SMC_MBOX_MEM_TRANS) {
> > +   if (chan_data->function_id != UINT_MAX)
> > +   function_id = chan_data->function_id;
> > +   else
> > +   function_id = 

Re: [PATCH] gpiolib: don't clear FLAG_IS_OUT when emulating open-drain/open-source

2019-09-10 Thread Bartosz Golaszewski
On Tue, 10 Sep 2019 at 12:48, Sasha Levin  wrote:
>
> Hi,
>
> [This is an automated email]
>
> This commit has been processed because it contains a "Fixes:" tag,
> fixing commit: c663e5f56737 gpio: support native single-ended hardware 
> drivers.
>
> The bot has tested the following trees: v5.2.13, v4.19.71, v4.14.142, 
> v4.9.191.
>
> v5.2.13: Build OK!
> v4.19.71: Build OK!
> v4.14.142: Failed to apply! Possible dependencies:
> 02e479808b5d ("gpio: Alter semantics of *raw* operations to actually be 
> raw")
> fac9d8850a0c ("gpio: Get rid of _prefix and __prefixes")
>
> v4.9.191: Failed to apply! Possible dependencies:
> 02e479808b5d ("gpio: Alter semantics of *raw* operations to actually be 
> raw")
> 0db0f26c2c5d ("pinctrl-sx150x: Convert driver to use regmap API")
> 2956b5d94a76 ("pinctrl / gpio: Introduce .set_config() callback for GPIO 
> chips")
> 46a5c112a401 ("gpio: merrifield: Implement gpio_get_direction callback")
> 6489677f86c3 ("pinctrl-sx150x: Replace sx150x_*_cfg by means of regmap 
> API")
> 6697546d650d ("pinctrl-sx150x: Add SX1503 specific data")
> 9e80f9064e73 ("pinctrl: Add SX150X GPIO Extender Pinctrl Driver")
> e3ba81206811 ("pinctrl-sx150x: Improve OF device matching code")
> e7a718f9b1c1 ("gpio: merrifield: Add support for hardware debouncer")
>
>
> NOTE: The patch will not be queued to stable trees until it is upstream.
>
> How should we proceed with this patch?
>

Once it's accepted, I'll prepare backports.

Bart


Re: [PATCH V8 5/5] mmc: host: sdhci-pci: Add Genesys Logic GL975x support

2019-09-10 Thread Ben Chuang
On Wed, Sep 11, 2019 at 12:42 PM Guenter Roeck  wrote:
>
> On Fri, Sep 06, 2019 at 10:33:26AM +0800, Ben Chuang wrote:
> > From: Ben Chuang 
> >
> > Add support for the GL9750 and GL9755 chipsets.
> >
> > Enable v4 mode and wait 5 ms after setting the 1.8 V signal enable for GL9750/
> > GL9755. Fix the value of the SDHCI_MAX_CURRENT register and use the vendor
> > tuning flow for GL9750.
> >
> > Co-developed-by: Michael K Johnson 
> > Signed-off-by: Michael K Johnson 
> > Signed-off-by: Ben Chuang 
> > ---
> >  drivers/mmc/host/Kconfig  |   1 +
> >  drivers/mmc/host/Makefile |   2 +-
> >  drivers/mmc/host/sdhci-pci-core.c |   2 +
> >  drivers/mmc/host/sdhci-pci-gli.c  | 355 ++
> >  drivers/mmc/host/sdhci-pci.h  |   5 +
> >  5 files changed, 364 insertions(+), 1 deletion(-)
> >  create mode 100644 drivers/mmc/host/sdhci-pci-gli.c
> >
> > diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig
> > index 931770f17087..9fbfff514d6c 100644
> > --- a/drivers/mmc/host/Kconfig
> > +++ b/drivers/mmc/host/Kconfig
> > @@ -94,6 +94,7 @@ config MMC_SDHCI_PCI
> >   depends on MMC_SDHCI && PCI
> >   select MMC_CQHCI
> >   select IOSF_MBI if X86
> > + select MMC_SDHCI_IO_ACCESSORS
> >   help
> > This selects the PCI Secure Digital Host Controller Interface.
> > Most controllers found today are PCI devices.
> > diff --git a/drivers/mmc/host/Makefile b/drivers/mmc/host/Makefile
> > index 73578718f119..661445415090 100644
> > --- a/drivers/mmc/host/Makefile
> > +++ b/drivers/mmc/host/Makefile
> > @@ -13,7 +13,7 @@ obj-$(CONFIG_MMC_MXS)   += mxs-mmc.o
> >  obj-$(CONFIG_MMC_SDHCI)  += sdhci.o
> >  obj-$(CONFIG_MMC_SDHCI_PCI)  += sdhci-pci.o
> >  sdhci-pci-y  += sdhci-pci-core.o sdhci-pci-o2micro.o 
> > sdhci-pci-arasan.o \
> > -sdhci-pci-dwc-mshc.o
> > +sdhci-pci-dwc-mshc.o sdhci-pci-gli.o
> >  obj-$(subst m,y,$(CONFIG_MMC_SDHCI_PCI)) += sdhci-pci-data.o
> >  obj-$(CONFIG_MMC_SDHCI_ACPI) += sdhci-acpi.o
> >  obj-$(CONFIG_MMC_SDHCI_PXAV3)+= sdhci-pxav3.o
> > diff --git a/drivers/mmc/host/sdhci-pci-core.c 
> > b/drivers/mmc/host/sdhci-pci-core.c
> > index 4154ee11b47d..e5835fbf73bc 100644
> > --- a/drivers/mmc/host/sdhci-pci-core.c
> > +++ b/drivers/mmc/host/sdhci-pci-core.c
> > @@ -1682,6 +1682,8 @@ static const struct pci_device_id pci_ids[] = {
> >   SDHCI_PCI_DEVICE(O2, SEABIRD1, o2),
> >   SDHCI_PCI_DEVICE(ARASAN, PHY_EMMC, arasan),
> >   SDHCI_PCI_DEVICE(SYNOPSYS, DWC_MSHC, snps),
> > + SDHCI_PCI_DEVICE(GLI, 9750, gl9750),
> > + SDHCI_PCI_DEVICE(GLI, 9755, gl9755),
> >   SDHCI_PCI_DEVICE_CLASS(AMD, SYSTEM_SDHCI, PCI_CLASS_MASK, amd),
> >   /* Generic SD host controller */
> >   {PCI_DEVICE_CLASS(SYSTEM_SDHCI, PCI_CLASS_MASK)},
> > diff --git a/drivers/mmc/host/sdhci-pci-gli.c 
> > b/drivers/mmc/host/sdhci-pci-gli.c
> > new file mode 100644
> > index ..94462b94abec
> > --- /dev/null
> > +++ b/drivers/mmc/host/sdhci-pci-gli.c
> > @@ -0,0 +1,355 @@
> > +// SPDX-License-Identifier: GPL-2.0+
> > +/*
> > + * Copyright (C) 2019 Genesys Logic, Inc.
> > + *
> > + * Authors: Ben Chuang 
> > + *
> > + * Version: v0.9.0 (2019-08-08)
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include "sdhci.h"
> > +#include "sdhci-pci.h"
> > +
> > +/*  Genesys Logic extra registers */
> > +#define SDHCI_GLI_9750_WT 0x800
> > +#define   SDHCI_GLI_9750_WT_EN  BIT(0)
> > +#define   GLI_9750_WT_EN_ON  0x1
> > +#define   GLI_9750_WT_EN_OFF 0x0
> > +
> > +#define SDHCI_GLI_9750_DRIVING  0x860
> > +#define   SDHCI_GLI_9750_DRIVING_1   GENMASK(11, 0)
> > +#define   SDHCI_GLI_9750_DRIVING_2   GENMASK(27, 26)
> > +#define   GLI_9750_DRIVING_1_VALUE   0xFFF
> > +#define   GLI_9750_DRIVING_2_VALUE   0x3
> > +
> > +#define SDHCI_GLI_9750_PLL 0x864
> > +#define   SDHCI_GLI_9750_PLL_TX2_INV BIT(23)
> > +#define   SDHCI_GLI_9750_PLL_TX2_DLY GENMASK(22, 20)
> > +#define   GLI_9750_PLL_TX2_INV_VALUE 0x1
> > +#define   GLI_9750_PLL_TX2_DLY_VALUE 0x0
> > +
> > +#define SDHCI_GLI_9750_SW_CTRL  0x874
> > +#define   SDHCI_GLI_9750_SW_CTRL_4   GENMASK(7, 6)
> > +#define   GLI_9750_SW_CTRL_4_VALUE   0x3
> > +
> > +#define SDHCI_GLI_9750_MISC 0x878
> > +#define   SDHCI_GLI_9750_MISC_TX1_INV BIT(2)
> > +#define   SDHCI_GLI_9750_MISC_RX_INV BIT(3)
> > +#define   SDHCI_GLI_9750_MISC_TX1_DLY GENMASK(6, 4)
> > +#define   GLI_9750_MISC_TX1_INV_VALUE 0x0
> > +#define   GLI_9750_MISC_RX_INV_ON    0x1
> > +#define   GLI_9750_MISC_RX_INV_OFF   0x0
> > +#define   GLI_9750_MISC_RX_INV_VALUE GLI_9750_MISC_RX_INV_OFF
> > +#define   GLI_9750_MISC_TX1_DLY_VALUE 0x5
> > +
> > +#define SDHCI_GLI_9750_TUNING_CONTROL  0x540
> > +#define   SDHCI_GLI_9750_TUNING_CONTROL_EN  BIT(4)
> > +#define   

Re: [PATCH] driver core: ensure a device has valid node id in device_add()

2019-09-10 Thread Michal Hocko
On Tue 10-09-19 14:53:39, Michal Hocko wrote:
> On Tue 10-09-19 20:47:40, Yunsheng Lin wrote:
> > On 2019/9/10 19:12, Greg KH wrote:
> > > On Tue, Sep 10, 2019 at 01:04:51PM +0200, Michal Hocko wrote:
> > >> On Tue 10-09-19 18:58:05, Yunsheng Lin wrote:
> > >>> On 2019/9/10 17:31, Greg KH wrote:
> >  On Tue, Sep 10, 2019 at 02:43:32PM +0800, Yunsheng Lin wrote:
> > > On 2019/9/9 17:53, Greg KH wrote:
> > >> On Mon, Sep 09, 2019 at 02:04:23PM +0800, Yunsheng Lin wrote:
> > >>> Currently a device does not belong to any of the numa nodes
> > >>> (dev->numa_node is NUMA_NO_NODE) when the node id is neither
> > >>> specified by fw nor by virtual device layer and the device has
> > >>> no parent device.
> > >>
> > >> Is this really a problem?
> > >
> > > Not really.
> > > Someone need to guess the node id when it is not specified, right?
> > 
> >  No, why?  Guessing guarantees you will get it wrong on some systems.
> > 
> >  Are you seeing real problems because the id is not being set?  What
> >  problem is this fixing that you can actually observe?
> > >>>
> > >>> When passing the return value of dev_to_node() to cpumask_of_node()
> > >>> without checking the node id if the node id is not valid, there is
> > >>> global-out-of-bounds detected by KASAN as below:
> > >>
> > >> OK, I seem to remember this being brought up already. And now when I
> > >> think about it, we really want to make cpumask_of_node NUMA_NO_NODE
> > >> aware. That means using the same trick the allocator does for this
> > >> special case.
> > > 
> > > That seems reasonable to me, and much more "obvious" as to what is going
> > > on.
> > > 
> > 
> > Ok, thanks for the suggestion.
> > 
> > For arm64 and x86, there are two versions of cpumask_of_node().
> > 
> > when CONFIG_DEBUG_PER_CPU_MAPS is defined, the cpumask_of_node()
> >in arch/x86/mm/numa.c is used, which does partial node id checking:
> > 
> > const struct cpumask *cpumask_of_node(int node)
> > {
> > 	if (node >= nr_node_ids) {
> > 		printk(KERN_WARNING
> > 			"cpumask_of_node(%d): node > nr_node_ids(%u)\n",
> > 			node, nr_node_ids);
> > 		dump_stack();
> > 		return cpu_none_mask;
> > 	}
> > 	if (node_to_cpumask_map[node] == NULL) {
> > 		printk(KERN_WARNING
> > 			"cpumask_of_node(%d): no node_to_cpumask_map!\n",
> > 			node);
> > 		dump_stack();
> > 		return cpu_online_mask;
> > 	}
> > 	return node_to_cpumask_map[node];
> > }
> > 
> > when CONFIG_DEBUG_PER_CPU_MAPS is undefined, the cpumask_of_node()
> >in arch/x86/include/asm/topology.h is used:
> > 
> > static inline const struct cpumask *cpumask_of_node(int node)
> > {
> > 	return node_to_cpumask_map[node];
> > }
> 
> I would simply go with the following. There shouldn't be any need for the
> heavyweight checks that CONFIG_DEBUG_PER_CPU_MAPS has.
> 
> static inline const struct cpumask *cpumask_of_node(int node)
> {
> 	/* A nice comment goes here */
> 	if (node == NUMA_NO_NODE)
> 		return node_to_cpumask_map[numa_mem_id()];
> 	return node_to_cpumask_map[node];
> }

Sleeping over this and thinking more about the actual semantics, the above
is wrong. We cannot really copy the page allocator logic. Why? Simply
because the page allocator doesn't enforce near-node affinity. It
just picks it up as a preferred node, but then it is free to fall back to
any other NUMA node. This is not the case here: node_to_cpumask_map will
only restrict to the particular node's CPUs, which would give really
non-deterministic behavior depending on where the code is executed. So in
fact we really want to return cpu_online_mask for NUMA_NO_NODE.
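
Putting that together, a minimal sketch of the variant being argued for here
(untested, not a posted patch):

static inline const struct cpumask *cpumask_of_node(int node)
{
	/* A NUMA_NO_NODE device has no node affinity at all. */
	if (node == NUMA_NO_NODE)
		return cpu_online_mask;
	return node_to_cpumask_map[node];
}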

Sorry about the confusion.
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs


Re: [PATCH 00/10] Hwpoison soft-offline rework

2019-09-10 Thread Naoya Horiguchi
Hi Oscar,

Thank you for your working on this.

My testing shows the following error:

  [ 1926.932435] ===> testcase 'mce_ksm_soft-offline_avoid_access.auto2' start
  [ 1927.155321] bash (15853): drop_caches: 3
  [ 1929.019094] page:e5c384c4cd40 refcount:1 mapcount:0 
mapping:0003 index:0x70001
  [ 1929.021586] anon
  [ 1929.021588] flags: 
0x57ffe00088000e(referenced|uptodate|dirty|swapbacked|hwpoison)
  [ 1929.024289] raw: 0057ffe00088000e dead0100 dead0122 
0003
  [ 1929.026611] raw: 00070001   

  [ 1929.028760] page dumped because: VM_BUG_ON_PAGE(page_ref_count(page))
  [ 1929.030559] [ cut here ]
  [ 1929.031684] kernel BUG at mm/internal.h:73!
  [ 1929.032738] invalid opcode:  [#1] SMP PTI
  [ 1929.033941] CPU: 3 PID: 16052 Comm: mceinj.sh Not tainted 
5.3.0-rc8-v5.3-rc8-190911-1025-00010-ga436dbce8674+ #18
  [ 1929.037137] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
1.12.0-2.fc30 04/01/2014
  [ 1929.040066] RIP: 0010:page_set_poison+0xf9/0x160
  [ 1929.041665] Code: 63 02 7f 31 c0 5b 5d 41 5c c3 48 c7 c6 d0 d1 0c b0 48 89 
df e8 88 bb f8 ff 0f 0b 48 c7 c6 f0 2a 0d b0 48 89 df e8 77 bb f8 ff <0f> 0b 48 
8b 45 00 48 c1 e8 33 83 e0 07 83 f8 04 75 89 48 8b 45 08
  [ 1929.047773] RSP: 0018:b4fb8a73bde0 EFLAGS: 00010246
  [ 1929.049511] RAX: 0039 RBX: e5c384c4cd40 RCX: 

  [ 1929.051870] RDX:  RSI:  RDI: 
b00d1814
  [ 1929.054238] RBP: e5c384c4cd40 R08: 0596 R09: 
0048
  [ 1929.056599] R10:  R11: b4fb8a73bc58 R12: 

  [ 1929.058986] R13: b4fb8a73be10 R14: 00131335 R15: 
0001
  [ 1929.061366] FS:  7fc9e208d740() GS:9fa9bdb0() 
knlGS:
  [ 1929.063842] CS:  0010 DS:  ES:  CR0: 80050033
  [ 1929.065429] CR2: 55946c05d192 CR3: 0001365f2000 CR4: 
001406e0
  [ 1929.067373] Call Trace:
  [ 1929.068094]  soft_offline_page+0x2be/0x600
  [ 1929.069246]  soft_offline_page_store+0xdf/0x110
  [ 1929.070510]  kernfs_fop_write+0x116/0x190
  [ 1929.071618]  vfs_write+0xa5/0x1a0
  [ 1929.072614]  ksys_write+0x59/0xd0
  [ 1929.073548]  do_syscall_64+0x5f/0x1a0
  [ 1929.074554]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [ 1929.075957] RIP: 0033:0x7fc9e217ded8

It seems that soft-offlining on ksm pages is affected by this changeset.
Could you try to handle this?

- Naoya

On Tue, Sep 10, 2019 at 12:30:06PM +0200, Oscar Salvador wrote:
>
> This patchset was based on Naoya's hwpoison rework [1], so thanks to him
> for the initial work.
>
> This patchset aims to fix some issues lying in soft-offline handling,
> but it also takes the chance to go some steps further and perform
> cleanups and some refactoring as well.
>
>  - Motivation:
>
>A customer and I were facing an issue where poisoned pages were returned
>back to user-space after having offlined them properly.
>This was only seen under some memory stress + soft offlining pages.
>After some analysis, it became clear that the problem was that
>when kcompactd kicked in to migrate pages over, the compaction_alloc
>callback was handing poisoned pages to the migrate routine.
>Once such a page was later faulted in, __do_page_fault returned
>VM_FAULT_HWPOISON, causing the process to be killed.
>
>All this could happen because isolate_freepages_block and
>fast_isolate_freepages just check for the page to be PageBuddy,
>and since 1) poisoned pages can be part of a higher order page
>and 2) poisoned pages are also Page Buddy, they can sneak in easily.
>
>I also saw some problems with swap pages, but I suspected them to be the
>same sort of problem, so I did not follow that trace.
>
>The full explanation can be see in [2].
>
>  - Approach:
>
>The taken approach is to not let poisoned pages hit neither
>pcplists nor buddy freelists.
>This is achieved by:
>
> In-use pages:
>
>* Normal pages
>
>1) do not release the last reference count after the
>   invalidation/migration of the page.
>2) the page is being handed to page_set_poison, which does:
>   2a) sets PageHWPoison flag
>   2b) calls put_page (only to be able to call __page_cache_release)
>   Since poisoned pages are skipped in free_pages_prepare,
>   this put_page is safe.
>   2c) Sets the refcount to 1
>
>* Hugetlb pages
>
>1) Hand the page to page_set_poison after migration
>2) page_set_poison does:
>   2a) Calls dissolve_free_huge_page
>   2b) If the range to be dissolved contains poisoned pages,
>   we free the range as order-0 pages (as we do with gigantic
>   hugetlb pages), so free_pages_prepare will skip them accordingly.
>   so free_pages_prepare will skip them accordingly.
>   2c) Sets the refcount to 1
>
> Free pages:
>
>* Normal pages:
>
>1) Take 

Re: linux-next: Signed-off-by missing for commit in the net-next tree

2019-09-10 Thread Luciano Coelho
On Wed, 2019-09-11 at 00:42 +1000, Stephen Rothwell wrote:
> Hi all,
> 
> Commit
> 
>   aa43ae121675 ("iwlwifi: LTR updates")
> 
> is missing a Signed-off-by from its committer.

Oops, that was my fault.  What can we do about it? Is it enough if I
give my s-o-b publicly here?

I hereby sign off this change:

Signed-off-by: Luca Coelho 


--
Cheers,
Luca.



Re: [PATCH v2] Fix fTPM on AMD Zen+ CPUs

2019-09-10 Thread Seunghun Han
> > And why is this allocating memory inside the acpi table walker? It
> > seems to me like the memory should be allocated once the mapping is
> > made.
> >
>
> Yes, this looks bad. Letting the walker build the list and then using
> it is, probably, a better idea.
>
> > Maybe all the mappings should be created from the ACPI ranges right
> > away?
> >
>
> I don't know if it's a good idea to just map them all instead of doing
> so only for relevant ones. Maybe it is safe, here I need an advice
> from a more knowledgeable person.
>

Vanya,
I also made a patch series to address AMD's fTPM issue. My patch is here:
https://lkml.org/lkml/2019/9/9/132 .

The maintainer, Jarkko, wanted me to remark on your patch, so I would
like to cooperate with you.

Your patch is good for me. If you are fine, I would like to take your
patch and merge it with my patch series. I also would like to change
some points Jason mentioned before.

Of course, I will keep your commit message and Signed-off-by tag.
According to the guideline below, I will just add Co-developed-by and
Signed-off-by tags after yours.
https://www.kernel.org/doc/html/v5.2/process/submitting-patches.html#when-to-use-acked-by-cc-and-co-developed-by

If you have any idea about our co-work, please let me know.
I hope we can solve AMD's fTPM problem soon.

Seunghun


[PATCH v1 1/1] pinctrl: iproc: Add 'get_direction' support

2019-09-10 Thread Rayagonda Kokatanur
Add 'get_direction' support to the iProc GPIO driver.

Signed-off-by: Rayagonda Kokatanur 
---
 drivers/pinctrl/bcm/pinctrl-iproc-gpio.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/pinctrl/bcm/pinctrl-iproc-gpio.c 
b/drivers/pinctrl/bcm/pinctrl-iproc-gpio.c
index b70058c..d782d70 100644
--- a/drivers/pinctrl/bcm/pinctrl-iproc-gpio.c
+++ b/drivers/pinctrl/bcm/pinctrl-iproc-gpio.c
@@ -355,6 +355,15 @@ static int iproc_gpio_direction_output(struct gpio_chip 
*gc, unsigned gpio,
return 0;
 }
 
+static int iproc_gpio_get_direction(struct gpio_chip *gc, unsigned int gpio)
+{
+   struct iproc_gpio *chip = gpiochip_get_data(gc);
+   unsigned int offset = IPROC_GPIO_REG(gpio, IPROC_GPIO_OUT_EN_OFFSET);
+   unsigned int shift = IPROC_GPIO_SHIFT(gpio);
+
+   return !(readl(chip->base + offset) & BIT(shift));
+}
+
 static void iproc_gpio_set(struct gpio_chip *gc, unsigned gpio, int val)
 {
struct iproc_gpio *chip = gpiochip_get_data(gc);
@@ -784,6 +793,7 @@ static int iproc_gpio_probe(struct platform_device *pdev)
gc->free = iproc_gpio_free;
gc->direction_input = iproc_gpio_direction_input;
gc->direction_output = iproc_gpio_direction_output;
+   gc->get_direction = iproc_gpio_get_direction;
gc->set = iproc_gpio_set;
gc->get = iproc_gpio_get;
 
-- 
1.9.1



Re: [PATCH v3 1/2] drm/virtio: Rewrite virtio_gpu_queue_ctrl_buffer using fenced version.

2019-09-10 Thread Gerd Hoffmann
On Tue, Sep 10, 2019 at 01:06:50PM -0700, David Riley wrote:
> Factor out the function in preparation for generating the scatterlist prior to locking.

Patches are looking good now, but they don't apply.  What tree was used
to create them?

Latest virtio-gpu driver bits are in drm-misc-next (see
https://cgit.freedesktop.org/drm/drm-misc).

cheers,
  Gerd


Re: [Letux-kernel] [RFC PATCH 0/3] Enable 1GHz support on omap36xx

2019-09-10 Thread H. Nikolaus Schaller
Hi Adam,

> On 11.09.2019 at 02:41, Adam Ford  wrote:
 

 The error message looks as if we have to enable multi_regulator.

>>> 
>>> That will enable both vdd and vbb regulators from what I can tell in the 
>>> driver.
>>> 
 And that may need to rename cpu0-supply to vdd-supply (unless the
 names can be configured).
>>> 
>>> That is consistent with what I found.  vdd-supply = <&vcc>; and
>>> vbb-supply = <&abb_mpu_iva>;
>>> I put them both under the cpu node.  Unfortunately, when I did that,
>>> my board crashed.
>>> 
>>> I am thinking it has something to do with the abb_mpu_iva driver
>>> because until this point, we've always operated at 800MHz or lower
>>> which all have the same behavior in abb_mpu_iva.
>>> 
>>> With the patch you posted for the regulator, without the update to
>>> cpufreq,  and with debugging enabled, I received the following in
>>> dmesg:
>>> 
>>> [1.112518] ti_abb 483072f0.regulator-abb-mpu: Missing
>>> 'efuse-address' IO resource
>>> [1.112579] ti_abb 483072f0.regulator-abb-mpu: [0]v=1012500 ABB=0
>>> ef=0x0 rbb=0x0 fbb=0x0 vset=0x0
>>> [1.112609] ti_abb 483072f0.regulator-abb-mpu: [1]v=120 ABB=0
>>> ef=0x0 rbb=0x0 fbb=0x0 vset=0x0
>>> [1.112609] ti_abb 483072f0.regulator-abb-mpu: [2]v=1325000 ABB=0
>>> ef=0x0 rbb=0x0 fbb=0x0 vset=0x0
>>> [1.112640] ti_abb 483072f0.regulator-abb-mpu: [3]v=1375000 ABB=1
>>> ef=0x0 rbb=0x0 fbb=0x0 vset=0x0
>>> [1.112731] ti_abb 483072f0.regulator-abb-mpu: ti_abb_init_timings:
>>> Clk_rate=1300, sr2_cnt=0x0032
>>> 
>> 
>> Using an unmodified kernel, I changed the device tree to make the abb
>> regulator power the cpu instead of vcc.  After doing so, I was able to
>> read the microvolt value, and it matched the processor's desired OPP
>> voltage, and the debug code showed the abb regulator was attempting to
>> set the various index based on the needed voltage.  I think the abb
>> driver is working correctly.
>> 
>> I think tomorrow, I will re-apply the patches and tweak it again to
>> support both vdd and vbb regulators.  If it crashes again, I'll look
>> more closely at the logs to see if I can determine why.  I think it's
>> pretty close.  I also need to go back and find the SmartReflex stuff
>> as well.
>> 
> Well, I couldn't give it up for the night, so I thought I'd show my data dump
> 
> [9.807647] [ cut here ]
> [9.812469] WARNING: CPU: 0 PID: 68 at drivers/opp/core.c:630
> dev_pm_opp_set_rate+0x3cc/0x480
> [9.821044] Modules linked in: sha256_generic sha256_arm cfg80211
> joydev mousedev evdev snd_soc_omap_twl4030(+) leds_gpio led_class
> panel_simple pwm_omap_dmtimer gpio_keys pwm_bl cpufreq_dt omap3_isp v
> ideobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videobuf2_common
> bq27xxx_battery_hdq v4l2_fwnode snd_soc_omap_mcbsp bq27xxx_battery
> snd_soc_ti_sdma omap_wdt videodev mc omap_hdq wlcore_sdio wire cn ph
> y_twl4030_usb hwmon omap2430 musb_hdrc omap_mailbox twl4030_wdt
> watchdog udc_core rtc_twl snd_soc_twl4030 ohci_platform(+)
> snd_soc_core snd_pcm_dmaengine ohci_hcd snd_pcm ehci_omap(+)
> twl4030_pwrbutton sn
> d_timer twl4030_charger snd pwm_twl_led pwm_twl ehci_hcd industrialio
> soundcore twl4030_keypad matrix_keymap usbcore at24 tsc2004
> tsc200x_core usb_common omap_ssi hsi omapdss omapdss_base drm
> drm_panel_or
> ientation_quirks cec
> [9.894470] CPU: 0 PID: 68 Comm: kworker/0:2 Not tainted
> 5.3.0-rc3-00785-gfdfc7f21c6b7-dirty #5
> [9.903198] Hardware name: Generic OMAP36xx (Flattened Device Tree)
> [9.909515] Workqueue: events dbs_work_handler
> [9.914031] [] (unwind_backtrace) from []
> (show_stack+0x10/0x14)
> [9.921813] [] (show_stack) from []
> (dump_stack+0xb4/0xd4)
> [9.929107] [] (dump_stack) from []
> (__warn.part.3+0xa8/0xd4)
> [9.936614] [] (__warn.part.3) from []
> (warn_slowpath_null+0x40/0x4c)
> [9.944854] [] (warn_slowpath_null) from []
> (dev_pm_opp_set_rate+0x3cc/0x480)
> [9.953796] [] (dev_pm_opp_set_rate) from []
> (set_target+0x30/0x58 [cpufreq_dt])
> [9.963012] [] (set_target [cpufreq_dt]) from
> [] (__cpufreq_driver_target+0x188/0x514)
> [9.972717] [] (__cpufreq_driver_target) from
> [] (od_dbs_update+0x130/0x15c)
> [9.981567] [] (od_dbs_update) from []
> (dbs_work_handler+0x28/0x58)
> [9.989624] [] (dbs_work_handler) from []
> (process_one_work+0x20c/0x500)
> [9.998107] [] (process_one_work) from []
> (worker_thread+0x2c/0x5bc)
> [   10.006256] [] (worker_thread) from []
> (kthread+0x134/0x148)
> [   10.013702] [] (kthread) from []
> (ret_from_fork+0x14/0x2c)
> [   10.020965] Exception stack(0xde63bfb0 to 0xde63bff8)
> [   10.026062] bfa0: 
>   
> [   10.034271] bfc0:     
>   
> [   10.042510] bfe0:     0013 
> [   10.049224] ---[ end trace cf0e15fa4bd57700 ]---
> [   10.053894] cpu cpu0: 

[PATCH v4 09/14] software node: simplify property_entry_read_string_array()

2019-09-10 Thread Dmitry Torokhov
There is no need to treat string arrays and single strings separately; we can go
exclusively by the element length in relation to the data type size.

Signed-off-by: Dmitry Torokhov 
---
 drivers/base/swnode.c | 21 +++--
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/drivers/base/swnode.c b/drivers/base/swnode.c
index a019b5e90d3b..9c3e566c753e 100644
--- a/drivers/base/swnode.c
+++ b/drivers/base/swnode.c
@@ -173,28 +173,21 @@ static int property_entry_read_string_array(const struct 
property_entry *props,
const char *propname,
const char **strings, size_t nval)
 {
-   const struct property_entry *prop;
const void *pointer;
-   size_t array_len, length;
+   size_t length;
+   int array_len;
 
/* Find out the array length. */
-   prop = property_entry_get(props, propname);
-   if (!prop)
-   return -EINVAL;
-
-   if (prop->is_array)
-   /* Find the length of an array. */
-   array_len = property_entry_count_elems_of_size(props, propname,
- sizeof(const char *));
-   else
-   /* The array length for a non-array string property is 1. */
-   array_len = 1;
+   array_len = property_entry_count_elems_of_size(props, propname,
+  sizeof(const char *));
+   if (array_len < 0)
+   return array_len;
 
/* Return how many there are if strings is NULL. */
if (!strings)
return array_len;
 
-   array_len = min(nval, array_len);
+   array_len = min_t(size_t, nval, array_len);
length = array_len * sizeof(*strings);
 
pointer = property_entry_find(props, propname, length);
-- 
2.23.0.162.g0b9fbb3734-goog



[PATCH v4 07/14] software node: remove property_entry_read_uNN_array functions

2019-09-10 Thread Dmitry Torokhov
There is absolutely no reason to have them as we can handle it all nicely in
property_entry_read_int_array().

Signed-off-by: Dmitry Torokhov 
---
 drivers/base/swnode.c | 85 +++
 1 file changed, 14 insertions(+), 71 deletions(-)

diff --git a/drivers/base/swnode.c b/drivers/base/swnode.c
index 726195d334e4..a019b5e90d3b 100644
--- a/drivers/base/swnode.c
+++ b/drivers/base/swnode.c
@@ -131,66 +131,6 @@ static const void *property_entry_find(const struct 
property_entry *props,
return pointer;
 }
 
-static int property_entry_read_u8_array(const struct property_entry *props,
-   const char *propname,
-   u8 *values, size_t nval)
-{
-   const void *pointer;
-   size_t length = nval * sizeof(*values);
-
-   pointer = property_entry_find(props, propname, length);
-   if (IS_ERR(pointer))
-   return PTR_ERR(pointer);
-
-   memcpy(values, pointer, length);
-   return 0;
-}
-
-static int property_entry_read_u16_array(const struct property_entry *props,
-const char *propname,
-u16 *values, size_t nval)
-{
-   const void *pointer;
-   size_t length = nval * sizeof(*values);
-
-   pointer = property_entry_find(props, propname, length);
-   if (IS_ERR(pointer))
-   return PTR_ERR(pointer);
-
-   memcpy(values, pointer, length);
-   return 0;
-}
-
-static int property_entry_read_u32_array(const struct property_entry *props,
-const char *propname,
-u32 *values, size_t nval)
-{
-   const void *pointer;
-   size_t length = nval * sizeof(*values);
-
-   pointer = property_entry_find(props, propname, length);
-   if (IS_ERR(pointer))
-   return PTR_ERR(pointer);
-
-   memcpy(values, pointer, length);
-   return 0;
-}
-
-static int property_entry_read_u64_array(const struct property_entry *props,
-const char *propname,
-u64 *values, size_t nval)
-{
-   const void *pointer;
-   size_t length = nval * sizeof(*values);
-
-   pointer = property_entry_find(props, propname, length);
-   if (IS_ERR(pointer))
-   return PTR_ERR(pointer);
-
-   memcpy(values, pointer, length);
-   return 0;
-}
-
 static int
 property_entry_count_elems_of_size(const struct property_entry *props,
   const char *propname, size_t length)
@@ -209,21 +149,24 @@ static int property_entry_read_int_array(const struct 
property_entry *props,
 unsigned int elem_size, void *val,
 size_t nval)
 {
+   const void *pointer;
+   size_t length;
+
if (!val)
return property_entry_count_elems_of_size(props, name,
  elem_size);
-   switch (elem_size) {
-   case sizeof(u8):
-   return property_entry_read_u8_array(props, name, val, nval);
-   case sizeof(u16):
-   return property_entry_read_u16_array(props, name, val, nval);
-   case sizeof(u32):
-   return property_entry_read_u32_array(props, name, val, nval);
-   case sizeof(u64):
-   return property_entry_read_u64_array(props, name, val, nval);
-   }
 
-   return -ENXIO;
+   if (!is_power_of_2(elem_size) || elem_size > sizeof(u64))
+   return -ENXIO;
+
+   length = nval * elem_size;
+
+   pointer = property_entry_find(props, name, length);
+   if (IS_ERR(pointer))
+   return PTR_ERR(pointer);
+
+   memcpy(val, pointer, length);
+   return 0;
 }
 
 static int property_entry_read_string_array(const struct property_entry *props,
-- 
2.23.0.162.g0b9fbb3734-goog



[PATCH v4 04/14] software node: mark internal macros with double underscores

2019-09-10 Thread Dmitry Torokhov
Let's mark PROPERTY_ENTRY_* macros that are internal with double leading
underscores so users are not tempted to use them.

Signed-off-by: Dmitry Torokhov 
---
 include/linux/property.h | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/linux/property.h b/include/linux/property.h
index f89b930ca4b7..2c9d4d209296 100644
--- a/include/linux/property.h
+++ b/include/linux/property.h
@@ -256,7 +256,7 @@ struct property_entry {
  * and structs.
  */
 
-#define PROPERTY_ENTRY_ARRAY_LEN(_name_, _type_, _Type_, _val_, _len_) \
+#define __PROPERTY_ENTRY_ARRAY_LEN(_name_, _type_, _Type_, _val_, _len_)\
 (struct property_entry) {  \
.name = _name_, \
.length = (_len_) * sizeof(_type_), \
@@ -266,13 +266,13 @@ struct property_entry {
 }
 
 #define PROPERTY_ENTRY_U8_ARRAY_LEN(_name_, _val_, _len_)  \
-   PROPERTY_ENTRY_ARRAY_LEN(_name_, u8, U8, _val_, _len_)
+   __PROPERTY_ENTRY_ARRAY_LEN(_name_, u8, U8, _val_, _len_)
 #define PROPERTY_ENTRY_U16_ARRAY_LEN(_name_, _val_, _len_) \
-   PROPERTY_ENTRY_ARRAY_LEN(_name_, u16, U16, _val_, _len_)
+   __PROPERTY_ENTRY_ARRAY_LEN(_name_, u16, U16, _val_, _len_)
 #define PROPERTY_ENTRY_U32_ARRAY_LEN(_name_, _val_, _len_) \
-   PROPERTY_ENTRY_ARRAY_LEN(_name_, u32, U32, _val_, _len_)
+   __PROPERTY_ENTRY_ARRAY_LEN(_name_, u32, U32, _val_, _len_)
 #define PROPERTY_ENTRY_U64_ARRAY_LEN(_name_, _val_, _len_) \
-   PROPERTY_ENTRY_ARRAY_LEN(_name_, u64, U64, _val_, _len_)
+   __PROPERTY_ENTRY_ARRAY_LEN(_name_, u64, U64, _val_, _len_)
 
 #define PROPERTY_ENTRY_STRING_ARRAY_LEN(_name_, _val_, _len_)  \
 (struct property_entry) {  \
@@ -294,7 +294,7 @@ struct property_entry {
 #define PROPERTY_ENTRY_STRING_ARRAY(_name_, _val_) \
PROPERTY_ENTRY_STRING_ARRAY_LEN(_name_, _val_, ARRAY_SIZE(_val_))
 
-#define PROPERTY_ENTRY_INTEGER(_name_, _type_, _Type_, _val_)  \
+#define __PROPERTY_ENTRY_INTEGER(_name_, _type_, _Type_, _val_)\
 (struct property_entry) {  \
.name = _name_, \
.length = sizeof(_type_),   \
@@ -303,13 +303,13 @@ struct property_entry {
 }
 
 #define PROPERTY_ENTRY_U8(_name_, _val_)   \
-   PROPERTY_ENTRY_INTEGER(_name_, u8, U8, _val_)
+   __PROPERTY_ENTRY_INTEGER(_name_, u8, U8, _val_)
 #define PROPERTY_ENTRY_U16(_name_, _val_)  \
-   PROPERTY_ENTRY_INTEGER(_name_, u16, U16, _val_)
+   __PROPERTY_ENTRY_INTEGER(_name_, u16, U16, _val_)
 #define PROPERTY_ENTRY_U32(_name_, _val_)  \
-   PROPERTY_ENTRY_INTEGER(_name_, u32, U32, _val_)
+   __PROPERTY_ENTRY_INTEGER(_name_, u32, U32, _val_)
 #define PROPERTY_ENTRY_U64(_name_, _val_)  \
-   PROPERTY_ENTRY_INTEGER(_name_, u64, U64, _val_)
+   __PROPERTY_ENTRY_INTEGER(_name_, u64, U64, _val_)
 
 #define PROPERTY_ENTRY_STRING(_name_, _val_)   \
 (struct property_entry) {  \
-- 
2.23.0.162.g0b9fbb3734-goog



[PATCH v4 11/14] software node: move small properties inline when copying

2019-09-10 Thread Dmitry Torokhov
When copying/duplicating a set of properties, move smaller properties that
were stored separately directly inside the property entry structures. We can
move:

- up to 8 bytes from U8 arrays
- up to 4 words
- up to 2 double words
- one U64 value
- one or 2 strings.
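
(Roughly, those limits come from the size of the value union inside struct
property_entry -- a sketch, assuming the union is the size of a u64 and
pointers are 4 or 8 bytes:)

	sizeof(prop->value) == sizeof(u64) == 8 bytes, so inline storage fits:
		8 / sizeof(u8)           = 8 bytes
		8 / sizeof(u16)          = 4 words
		8 / sizeof(u32)          = 2 double words
		8 / sizeof(u64)          = 1 u64 value
		8 / sizeof(const char *) = 1 string pointer on 64-bit, 2 on 32-bit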

Signed-off-by: Dmitry Torokhov 
---
 drivers/base/swnode.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/base/swnode.c b/drivers/base/swnode.c
index 83e2a706a86e..1aa6559993ec 100644
--- a/drivers/base/swnode.c
+++ b/drivers/base/swnode.c
@@ -277,6 +277,16 @@ static int property_entry_copy_data(struct property_entry 
*dst,
dst->value = src->value;
}
 
+   if (!dst->is_inline && dst->length <= sizeof(dst->value)) {
+   /* We have an opportunity to move the data inline */
+   const void *tmp = dst->pointer;
+
+   memcpy(&dst->value, tmp, dst->length);
+   dst->is_inline = true;
+
+   kfree(tmp);
+   }
+
dst->length = src->length;
dst->type = src->type;
dst->name = kstrdup(src->name, GFP_KERNEL);
-- 
2.23.0.162.g0b9fbb3734-goog



[PATCH v4 08/14] software node: unify PROPERTY_ENTRY_XXX macros

2019-09-10 Thread Dmitry Torokhov
We can unify the string property initializer macros with the integer
initializers.

Signed-off-by: Dmitry Torokhov 
---
 include/linux/property.h | 64 +---
 1 file changed, 27 insertions(+), 37 deletions(-)

diff --git a/include/linux/property.h b/include/linux/property.h
index ec8f84d564a8..238e1507925f 100644
--- a/include/linux/property.h
+++ b/include/linux/property.h
@@ -245,37 +245,33 @@ struct property_entry {
 };
 
 /*
- * Note: the below four initializers for the anonymous union are carefully
+ * Note: the below initializers for the anonymous union are carefully
  * crafted to avoid gcc-4.4.4's problems with initialization of anon unions
  * and structs.
  */
 
-#define __PROPERTY_ENTRY_ARRAY_LEN(_name_, _type_, _Type_, _val_, _len_)\
+#define __PROPERTY_ENTRY_ELEMENT_SIZE(_elem_)  \
+   sizeof(((struct property_entry *)NULL)->value._elem_)
+
+#define __PROPERTY_ENTRY_ARRAY_LEN(_name_, _elem_, _Type_, _val_, _len_)\
 (struct property_entry) {  \
.name = _name_, \
-   .length = (_len_) * sizeof(_type_), \
+   .length = (_len_) * __PROPERTY_ENTRY_ELEMENT_SIZE(_elem_),  \
.is_array = true,   \
.type = DEV_PROP_##_Type_,  \
{ .pointer = _val_ },   \
 }
 
 #define PROPERTY_ENTRY_U8_ARRAY_LEN(_name_, _val_, _len_)  \
-   __PROPERTY_ENTRY_ARRAY_LEN(_name_, u8, U8, _val_, _len_)
+   __PROPERTY_ENTRY_ARRAY_LEN(_name_, u8_data, U8, _val_, _len_)
 #define PROPERTY_ENTRY_U16_ARRAY_LEN(_name_, _val_, _len_) \
-   __PROPERTY_ENTRY_ARRAY_LEN(_name_, u16, U16, _val_, _len_)
+   __PROPERTY_ENTRY_ARRAY_LEN(_name_, u16_data, U16, _val_, _len_)
 #define PROPERTY_ENTRY_U32_ARRAY_LEN(_name_, _val_, _len_) \
-   __PROPERTY_ENTRY_ARRAY_LEN(_name_, u32, U32, _val_, _len_)
+   __PROPERTY_ENTRY_ARRAY_LEN(_name_, u32_data, U32, _val_, _len_)
 #define PROPERTY_ENTRY_U64_ARRAY_LEN(_name_, _val_, _len_) \
-   __PROPERTY_ENTRY_ARRAY_LEN(_name_, u64, U64, _val_, _len_)
-
+   __PROPERTY_ENTRY_ARRAY_LEN(_name_, u64_data, U64, _val_, _len_)
 #define PROPERTY_ENTRY_STRING_ARRAY_LEN(_name_, _val_, _len_)  \
-(struct property_entry) {  \
-   .name = _name_, \
-   .length = (_len_) * sizeof(const char *),   \
-   .is_array = true,   \
-   .type = DEV_PROP_STRING,\
-   { .pointer = _val_ },   \
-}
+   __PROPERTY_ENTRY_ARRAY_LEN(_name_, str, STRING, _val_, _len_)
 
 #define PROPERTY_ENTRY_U8_ARRAY(_name_, _val_) \
PROPERTY_ENTRY_U8_ARRAY_LEN(_name_, _val_, ARRAY_SIZE(_val_))
@@ -288,30 +284,24 @@ struct property_entry {
 #define PROPERTY_ENTRY_STRING_ARRAY(_name_, _val_) \
PROPERTY_ENTRY_STRING_ARRAY_LEN(_name_, _val_, ARRAY_SIZE(_val_))
 
-#define __PROPERTY_ENTRY_INTEGER(_name_, _type_, _Type_, _val_)\
-(struct property_entry) {  \
-   .name = _name_, \
-   .length = sizeof(_type_),   \
-   .type = DEV_PROP_##_Type_,  \
-   { .value = { ._type_##_data = _val_ } },\
+#define __PROPERTY_ENTRY_ELEMENT(_name_, _elem_, _Type_, _val_)
\
+(struct property_entry) {  \
+   .name = _name_, \
+   .length = __PROPERTY_ENTRY_ELEMENT_SIZE(_elem_),\
+   .type = DEV_PROP_##_Type_,  \
+   { .value = { ._elem_ = _val_ } },   \
 }
 
-#define PROPERTY_ENTRY_U8(_name_, _val_)   \
-   __PROPERTY_ENTRY_INTEGER(_name_, u8, U8, _val_)
-#define PROPERTY_ENTRY_U16(_name_, _val_)  \
-   __PROPERTY_ENTRY_INTEGER(_name_, u16, U16, _val_)
-#define PROPERTY_ENTRY_U32(_name_, _val_)  \
-   __PROPERTY_ENTRY_INTEGER(_name_, u32, U32, _val_)
-#define PROPERTY_ENTRY_U64(_name_, _val_)  \
-   __PROPERTY_ENTRY_INTEGER(_name_, u64, U64, _val_)
-
-#define PROPERTY_ENTRY_STRING(_name_, _val_)   \
-(struct property_entry) {  \
-   .name = _name_, \
-   .length = sizeof(const char *), \
-   .type = DEV_PROP_STRING,\
-   { .value = { .str = _val_ } },  \
-}
+#define 

[PATCH v4 14/14] software node: remove separate handling of references

2019-09-10 Thread Dmitry Torokhov
Now that all users of references have moved to reference properties,
we can remove separate handling of references.

Signed-off-by: Dmitry Torokhov 
---
 drivers/base/swnode.c| 46 +++-
 include/linux/property.h | 14 
 2 files changed, 17 insertions(+), 43 deletions(-)

diff --git a/drivers/base/swnode.c b/drivers/base/swnode.c
index 07e1325789d2..ef38ea9f730b 100644
--- a/drivers/base/swnode.c
+++ b/drivers/base/swnode.c
@@ -465,9 +465,8 @@ software_node_get_reference_args(const struct fwnode_handle 
*fwnode,
 struct fwnode_reference_args *args)
 {
struct swnode *swnode = to_swnode(fwnode);
-   const struct software_node_reference *ref;
const struct software_node_ref_args *ref_array;
-   const struct software_node_ref_args *ref_args;
+   const struct software_node_ref_args *ref;
const struct property_entry *prop;
struct fwnode_handle *refnode;
int i;
@@ -476,37 +475,26 @@ software_node_get_reference_args(const struct 
fwnode_handle *fwnode,
return -ENOENT;
 
prop = property_entry_get(swnode->node->properties, propname);
-   if (prop) {
-   if (prop->type != DEV_PROP_REF)
-   return -EINVAL;
-
-   /*
-* We expect that references are never stored inline, even
-* single ones, as they are too big.
-*/
-   if (prop->is_inline)
-   return -EINVAL;
-
-   if (index * sizeof(*ref_args) >= prop->length)
-   return -ENOENT;
+   if (!prop)
+   return -ENOENT;
 
-   ref_array = prop->pointer;
-   ref_args = _array[index];
-   } else {
-   if (!swnode->node->references)
-   return -ENOENT;
+   if (prop->type != DEV_PROP_REF)
+   return -EINVAL;
 
-   for (ref = swnode->node->references; ref->name; ref++)
-   if (!strcmp(ref->name, propname))
-   break;
+   /*
+* We expect that references are never stored inline, even
+* single ones, as they are too big.
+*/
+   if (prop->is_inline)
+   return -EINVAL;
 
-   if (!ref->name || index > (ref->nrefs - 1))
-   return -ENOENT;
+   if (index * sizeof(*ref) >= prop->length)
+   return -ENOENT;
 
-   ref_args = >refs[index];
-   }
+   ref_array = prop->pointer;
+   ref = &ref_array[index];
 
-   refnode = software_node_fwnode(ref_args->node);
+   refnode = software_node_fwnode(ref->node);
if (!refnode)
return -ENOENT;
 
@@ -525,7 +513,7 @@ software_node_get_reference_args(const struct fwnode_handle 
*fwnode,
args->nargs = nargs;
 
for (i = 0; i < nargs; i++)
-   args->args[i] = ref_args->args[i];
+   args->args[i] = ref->args[i];
 
return 0;
 }
diff --git a/include/linux/property.h b/include/linux/property.h
index 08d3e9d126ef..fa5a2ddc0c7b 100644
--- a/include/linux/property.h
+++ b/include/linux/property.h
@@ -412,30 +412,16 @@ int fwnode_graph_parse_endpoint(const struct 
fwnode_handle *fwnode,
 /* -- 
*/
 /* Software fwnode support - when HW description is incomplete or missing */
 
-/**
- * struct software_node_reference - Named software node reference property
- * @name: Name of the property
- * @nrefs: Number of elements in @refs array
- * @refs: Array of references with optional arguments
- */
-struct software_node_reference {
-   const char *name;
-   unsigned int nrefs;
-   const struct software_node_ref_args *refs;
-};
-
 /**
  * struct software_node - Software node description
  * @name: Name of the software node
  * @parent: Parent of the software node
  * @properties: Array of device properties
- * @references: Array of software node reference properties
  */
 struct software_node {
const char *name;
const struct software_node *parent;
const struct property_entry *properties;
-   const struct software_node_reference *references;
 };
 
 bool is_software_node(const struct fwnode_handle *fwnode);
-- 
2.23.0.162.g0b9fbb3734-goog



[PATCH v4 06/14] software node: get rid of property_set_pointer()

2019-09-10 Thread Dmitry Torokhov
Instead of explicitly setting values of integer types when copying
property entries, let's just copy the entire value union when processing
non-array values.

When handling array values, assign the pointer there using the newly
introduced "raw" pointer union member. This allows us to remove
property_set_pointer().

In property_get_pointer() we do not need to handle each data type
separately; we can simply return either the raw pointer or a pointer to
the value union.

Signed-off-by: Dmitry Torokhov 
---
 drivers/base/swnode.c| 90 +---
 include/linux/property.h | 12 ++
 2 files changed, 22 insertions(+), 80 deletions(-)

diff --git a/drivers/base/swnode.c b/drivers/base/swnode.c
index 7bad41a8f65d..726195d334e4 100644
--- a/drivers/base/swnode.c
+++ b/drivers/base/swnode.c
@@ -103,71 +103,15 @@ property_entry_get(const struct property_entry *prop, 
const char *name)
return NULL;
 }
 
-static void
-property_set_pointer(struct property_entry *prop, const void *pointer)
-{
-   switch (prop->type) {
-   case DEV_PROP_U8:
-   if (prop->is_array)
-   prop->pointer.u8_data = pointer;
-   else
-   prop->value.u8_data = *((u8 *)pointer);
-   break;
-   case DEV_PROP_U16:
-   if (prop->is_array)
-   prop->pointer.u16_data = pointer;
-   else
-   prop->value.u16_data = *((u16 *)pointer);
-   break;
-   case DEV_PROP_U32:
-   if (prop->is_array)
-   prop->pointer.u32_data = pointer;
-   else
-   prop->value.u32_data = *((u32 *)pointer);
-   break;
-   case DEV_PROP_U64:
-   if (prop->is_array)
-   prop->pointer.u64_data = pointer;
-   else
-   prop->value.u64_data = *((u64 *)pointer);
-   break;
-   case DEV_PROP_STRING:
-   if (prop->is_array)
-   prop->pointer.str = pointer;
-   else
-   prop->value.str = pointer;
-   break;
-   default:
-   break;
-   }
-}
-
 static const void *property_get_pointer(const struct property_entry *prop)
 {
-   switch (prop->type) {
-   case DEV_PROP_U8:
-   if (prop->is_array)
-   return prop->pointer.u8_data;
-   return >value.u8_data;
-   case DEV_PROP_U16:
-   if (prop->is_array)
-   return prop->pointer.u16_data;
-   return >value.u16_data;
-   case DEV_PROP_U32:
-   if (prop->is_array)
-   return prop->pointer.u32_data;
-   return >value.u32_data;
-   case DEV_PROP_U64:
-   if (prop->is_array)
-   return prop->pointer.u64_data;
-   return >value.u64_data;
-   case DEV_PROP_STRING:
-   if (prop->is_array)
-   return prop->pointer.str;
-   return >value.str;
-   default:
+   if (!prop->length)
return NULL;
-   }
+
+   if (prop->is_array)
+   return prop->pointer;
+
+   return &prop->value;
 }
 
 static const void *property_entry_find(const struct property_entry *props,
@@ -322,13 +266,15 @@ static int property_entry_read_string_array(const struct 
property_entry *props,
 static void property_entry_free_data(const struct property_entry *p)
 {
const void *pointer = property_get_pointer(p);
+   const char * const *src_str;
size_t i, nval;
 
if (p->is_array) {
-   if (p->type == DEV_PROP_STRING && p->pointer.str) {
+   if (p->type == DEV_PROP_STRING && p->pointer) {
+   src_str = p->pointer;
nval = p->length / sizeof(const char *);
for (i = 0; i < nval; i++)
-   kfree(p->pointer.str[i]);
+   kfree(src_str[i]);
}
kfree(pointer);
} else if (p->type == DEV_PROP_STRING) {
@@ -341,6 +287,7 @@ static const char * const *
 property_copy_string_array(const struct property_entry *src)
 {
const char **d;
+   const char * const *src_str = src->pointer;
size_t nval = src->length / sizeof(*d);
int i;
 
@@ -349,8 +296,8 @@ property_copy_string_array(const struct property_entry *src)
return NULL;
 
for (i = 0; i < nval; i++) {
-   d[i] = kstrdup(src->pointer.str[i], GFP_KERNEL);
-   if (!d[i] && src->pointer.str[i]) {
+   d[i] = kstrdup(src_str[i], GFP_KERNEL);
+   if (!d[i] && src_str[i]) {
while (--i >= 0)
kfree(d[i]);
kfree(d);
@@ -380,20 +327,21 @@ static int 

[PATCH v4 05/14] software node: clean up property_copy_string_array()

2019-09-10 Thread Dmitry Torokhov
Because property_copy_string_array() stores the newly allocated pointer in the
destination property, we have awkward code in property_entry_copy_data()
where we fetch the new pointer from dst.

Let's change property_copy_string_array() to return the pointer and rely on the
common path in property_entry_copy_data() to store it in the destination structure.

Signed-off-by: Dmitry Torokhov 
---
 drivers/base/swnode.c | 19 ---
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/base/swnode.c b/drivers/base/swnode.c
index ee2a405cca9a..7bad41a8f65d 100644
--- a/drivers/base/swnode.c
+++ b/drivers/base/swnode.c
@@ -337,8 +337,8 @@ static void property_entry_free_data(const struct 
property_entry *p)
kfree(p->name);
 }
 
-static int property_copy_string_array(struct property_entry *dst,
- const struct property_entry *src)
+static const char * const *
+property_copy_string_array(const struct property_entry *src)
 {
const char **d;
size_t nval = src->length / sizeof(*d);
@@ -346,7 +346,7 @@ static int property_copy_string_array(struct property_entry 
*dst,
 
d = kcalloc(nval, sizeof(*d), GFP_KERNEL);
if (!d)
-   return -ENOMEM;
+   return NULL;
 
for (i = 0; i < nval; i++) {
d[i] = kstrdup(src->pointer.str[i], GFP_KERNEL);
@@ -354,12 +354,11 @@ static int property_copy_string_array(struct 
property_entry *dst,
while (--i >= 0)
kfree(d[i]);
kfree(d);
-   return -ENOMEM;
+   return NULL;
}
}
 
-   dst->pointer.str = d;
-   return 0;
+   return d;
 }
 
 static int property_entry_copy_data(struct property_entry *dst,
@@ -367,17 +366,15 @@ static int property_entry_copy_data(struct property_entry 
*dst,
 {
const void *pointer = property_get_pointer(src);
const void *new;
-   int error;
 
if (src->is_array) {
if (!src->length)
return -ENODATA;
 
if (src->type == DEV_PROP_STRING) {
-   error = property_copy_string_array(dst, src);
-   if (error)
-   return error;
-   new = dst->pointer.str;
+   new = property_copy_string_array(src);
+   if (!new)
+   return -ENOMEM;
} else {
new = kmemdup(pointer, src->length, GFP_KERNEL);
if (!new)
-- 
2.23.0.162.g0b9fbb3734-goog



[PATCH v4 13/14] platform/x86: intel_cht_int33fe: use inline reference properties

2019-09-10 Thread Dmitry Torokhov
Now that static device properties allow defining reference properties
together with all other types of properties, instead of managing them
separately, let's adjust the driver.

Signed-off-by: Dmitry Torokhov 
---
 drivers/platform/x86/intel_cht_int33fe.c | 81 
 1 file changed, 41 insertions(+), 40 deletions(-)

diff --git a/drivers/platform/x86/intel_cht_int33fe.c 
b/drivers/platform/x86/intel_cht_int33fe.c
index 1d5d877b9582..4177c5424931 100644
--- a/drivers/platform/x86/intel_cht_int33fe.c
+++ b/drivers/platform/x86/intel_cht_int33fe.c
@@ -46,30 +46,6 @@ struct cht_int33fe_data {
struct fwnode_handle *dp;
 };
 
-static const struct software_node nodes[];
-
-static const struct software_node_ref_args pi3usb30532_ref = {
-   [INT33FE_NODE_PI3USB30532]
-};
-
-static const struct software_node_ref_args dp_ref = {
-   [INT33FE_NODE_DISPLAYPORT]
-};
-
-static struct software_node_ref_args mux_ref;
-
-static const struct software_node_reference usb_connector_refs[] = {
-   { "orientation-switch", 1, _ref},
-   { "mode-switch", 1, _ref},
-   { "displayport", 1, _ref},
-   { }
-};
-
-static const struct software_node_reference fusb302_refs[] = {
-   { "usb-role-switch", 1, _ref},
-   { }
-};
-
 /*
  * Grrr I severly dislike buggy BIOS-es. At least one BIOS enumerates
  * the max17047 both through the INT33FE ACPI device (it is right there
@@ -105,8 +81,18 @@ static const struct property_entry max17047_props[] = {
{ }
 };
 
+/*
+ * We are not using inline property here because those are constant,
+ * and we need to adjust this one at runtime to point to real
+ * software node.
+ */
+static struct software_node_ref_args fusb302_mux_refs[] = {
+   { .node = NULL },
+};
+
 static const struct property_entry fusb302_props[] = {
PROPERTY_ENTRY_STRING("linux,extcon-name", "cht_wcove_pwrsrc"),
+   PROPERTY_ENTRY_REF_ARRAY("usb-role-switch", fusb302_mux_refs),
{ }
 };
 
@@ -122,6 +108,8 @@ static const u32 snk_pdo[] = {
PDO_VAR(5000, 12000, 3000),
 };
 
+static const struct software_node nodes[];
+
 static const struct property_entry usb_connector_props[] = {
PROPERTY_ENTRY_STRING("data-role", "dual"),
PROPERTY_ENTRY_STRING("power-role", "dual"),
@@ -129,15 +117,21 @@ static const struct property_entry usb_connector_props[] 
= {
PROPERTY_ENTRY_U32_ARRAY("source-pdos", src_pdo),
PROPERTY_ENTRY_U32_ARRAY("sink-pdos", snk_pdo),
PROPERTY_ENTRY_U32("op-sink-microwatt", 250),
+   PROPERTY_ENTRY_REF("orientation-switch",
+  [INT33FE_NODE_PI3USB30532]),
+   PROPERTY_ENTRY_REF("mode-switch",
+  [INT33FE_NODE_PI3USB30532]),
+   PROPERTY_ENTRY_REF("displayport",
+  [INT33FE_NODE_DISPLAYPORT]),
{ }
 };
 
 static const struct software_node nodes[] = {
-   { "fusb302", NULL, fusb302_props, fusb302_refs },
+   { "fusb302", NULL, fusb302_props },
{ "max17047", NULL, max17047_props },
{ "pi3usb30532" },
{ "displayport" },
-   { "connector", [0], usb_connector_props, usb_connector_refs },
+   { "connector", [0], usb_connector_props },
{ }
 };
 
@@ -173,9 +167,10 @@ static void cht_int33fe_remove_nodes(struct 
cht_int33fe_data *data)
 {
software_node_unregister_nodes(nodes);
 
-   if (mux_ref.node) {
-   fwnode_handle_put(software_node_fwnode(mux_ref.node));
-   mux_ref.node = NULL;
+   if (fusb302_mux_refs[0].node) {
+   fwnode_handle_put(
+   software_node_fwnode(fusb302_mux_refs[0].node));
+   fusb302_mux_refs[0].node = NULL;
}
 
if (data->dp) {
@@ -187,25 +182,31 @@ static void cht_int33fe_remove_nodes(struct 
cht_int33fe_data *data)
 
 static int cht_int33fe_add_nodes(struct cht_int33fe_data *data)
 {
+   const struct software_node *mux_ref_node;
int ret;
 
-   ret = software_node_register_nodes(nodes);
-   if (ret)
-   return ret;
-
-   /* The devices that are not created in this driver need extra steps. */
-
/*
 * There is no ACPI device node for the USB role mux, so we need to wait
 * until the mux driver has created software node for the mux device.
 * It means we depend on the mux driver. This function will return
 * -EPROBE_DEFER until the mux device is registered.
 */
-   mux_ref.node = software_node_find_by_name(NULL, "intel-xhci-usb-sw");
-   if (!mux_ref.node) {
-   ret = -EPROBE_DEFER;
-   goto err_remove_nodes;
-   }
+   mux_ref_node = software_node_find_by_name(NULL, "intel-xhci-usb-sw");
+   if (!mux_ref_node)
+   return -EPROBE_DEFER;
+
+   /*
+* Update node used in "usb-role-switch" property. Note that we
+* rely on software_node_register_nodes() to use the original
+* 

[PATCH v4 03/14] efi/apple-properties: use PROPERTY_ENTRY_U8_ARRAY_LEN

2019-09-10 Thread Dmitry Torokhov
Let's switch to using PROPERTY_ENTRY_U8_ARRAY_LEN() to initialize
property entries. Also, when dumping data, rely on local variables
instead of poking into the property entry structure directly.

Signed-off-by: Dmitry Torokhov 
---
 drivers/firmware/efi/apple-properties.c | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/firmware/efi/apple-properties.c 
b/drivers/firmware/efi/apple-properties.c
index 0e206c9e0d7a..5ccf39986a14 100644
--- a/drivers/firmware/efi/apple-properties.c
+++ b/drivers/firmware/efi/apple-properties.c
@@ -53,7 +53,8 @@ static void __init unmarshal_key_value_pairs(struct 
dev_header *dev_header,
 
for (i = 0; i < dev_header->prop_count; i++) {
int remaining = dev_header->len - (ptr - (void *)dev_header);
-   u32 key_len, val_len;
+   u32 key_len, val_len, entry_len;
+   const u8 *entry_data;
char *key;
 
if (sizeof(key_len) > remaining)
@@ -85,17 +86,14 @@ static void __init unmarshal_key_value_pairs(struct 
dev_header *dev_header,
ucs2_as_utf8(key, ptr + sizeof(key_len),
 key_len - sizeof(key_len));
 
-   entry[i].name = key;
-   entry[i].length = val_len - sizeof(val_len);
-   entry[i].is_array = !!entry[i].length;
-   entry[i].type = DEV_PROP_U8;
-   entry[i].pointer.u8_data = ptr + key_len + sizeof(val_len);
-
+   entry_data = ptr + key_len + sizeof(val_len);
+   entry_len = val_len - sizeof(val_len);
+   entry[i] = PROPERTY_ENTRY_U8_ARRAY_LEN(key, entry_data,
+  entry_len);
if (dump_properties) {
-   dev_info(dev, "property: %s\n", entry[i].name);
+   dev_info(dev, "property: %s\n", key);
print_hex_dump(KERN_INFO, pr_fmt(), DUMP_PREFIX_OFFSET,
-   16, 1, entry[i].pointer.u8_data,
-   entry[i].length, true);
+   16, 1, entry_data, entry_len, true);
}
 
ptr += key_len + val_len;
-- 
2.23.0.162.g0b9fbb3734-goog



[PATCH v4 12/14] software node: implement reference properties

2019-09-10 Thread Dmitry Torokhov
It is possible to store references to software nodes in the same fashion as
other static properties, so that users do not need to define separate
structures:

static const struct software_node gpio_bank_b_node = {
.name = "B",
};

static const struct property_entry simone_key_enter_props[] = {
PROPERTY_ENTRY_U32("linux,code", KEY_ENTER),
PROPERTY_ENTRY_STRING("label", "enter"),
PROPERTY_ENTRY_REF("gpios", _bank_b_node, 123, GPIO_ACTIVE_LOW),
{ }
};
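
(Illustration only, not part of this patch's diff: when several references
need to live under one property name, they can be grouped in a
software_node_ref_args array and attached with PROPERTY_ENTRY_REF_ARRAY(),
the way the intel_cht_int33fe conversion later in this series does.
gpio_b_refs and simone_keys_props are made-up names.)

static const struct software_node_ref_args gpio_b_refs[] = {
	{
		.node = &gpio_bank_b_node,
		.nargs = 2,
		.args = { 123, GPIO_ACTIVE_LOW },
	},
};

static const struct property_entry simone_keys_props[] = {
	PROPERTY_ENTRY_REF_ARRAY("gpios", gpio_b_refs),
	{ }
};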

Signed-off-by: Dmitry Torokhov 
---
 drivers/base/swnode.c| 48 +++--
 include/linux/property.h | 57 +---
 2 files changed, 81 insertions(+), 24 deletions(-)

diff --git a/drivers/base/swnode.c b/drivers/base/swnode.c
index 1aa6559993ec..07e1325789d2 100644
--- a/drivers/base/swnode.c
+++ b/drivers/base/swnode.c
@@ -265,6 +265,12 @@ static int property_entry_copy_data(struct property_entry 
*dst,
}
 
dst->pointer = new;
+   } else if (src->type == DEV_PROP_REF) {
+   /*
+* Reference properties are never stored inline as
+* they are too big.
+*/
+   return -EINVAL;
} else if (src->type == DEV_PROP_STRING) {
new = kstrdup(src->value.str, GFP_KERNEL);
if (!new && src->value.str)
@@ -460,21 +466,47 @@ software_node_get_reference_args(const struct 
fwnode_handle *fwnode,
 {
struct swnode *swnode = to_swnode(fwnode);
const struct software_node_reference *ref;
+   const struct software_node_ref_args *ref_array;
+   const struct software_node_ref_args *ref_args;
const struct property_entry *prop;
struct fwnode_handle *refnode;
int i;
 
-   if (!swnode || !swnode->node->references)
+   if (!swnode)
return -ENOENT;
 
-   for (ref = swnode->node->references; ref->name; ref++)
-   if (!strcmp(ref->name, propname))
-   break;
+   prop = property_entry_get(swnode->node->properties, propname);
+   if (prop) {
+   if (prop->type != DEV_PROP_REF)
+   return -EINVAL;
 
-   if (!ref->name || index > (ref->nrefs - 1))
-   return -ENOENT;
+   /*
+* We expect that references are never stored inline, even
+* single ones, as they are too big.
+*/
+   if (prop->is_inline)
+   return -EINVAL;
+
+   if (index * sizeof(*ref_args) >= prop->length)
+   return -ENOENT;
+
+   ref_array = prop->pointer;
+   ref_args = &ref_array[index];
+   } else {
+   if (!swnode->node->references)
+   return -ENOENT;
+
+   for (ref = swnode->node->references; ref->name; ref++)
+   if (!strcmp(ref->name, propname))
+   break;
+
+   if (!ref->name || index > (ref->nrefs - 1))
+   return -ENOENT;
+
+   ref_args = &ref->refs[index];
+   }
 
-   refnode = software_node_fwnode(ref->refs[index].node);
+   refnode = software_node_fwnode(ref_args->node);
if (!refnode)
return -ENOENT;
 
@@ -493,7 +525,7 @@ software_node_get_reference_args(const struct fwnode_handle 
*fwnode,
args->nargs = nargs;
 
for (i = 0; i < nargs; i++)
-   args->args[i] = ref->refs[index].args[i];
+   args->args[i] = ref_args->args[i];
 
return 0;
 }
diff --git a/include/linux/property.h b/include/linux/property.h
index ac7823d58cfe..08d3e9d126ef 100644
--- a/include/linux/property.h
+++ b/include/linux/property.h
@@ -22,6 +22,7 @@ enum dev_prop_type {
DEV_PROP_U32,
DEV_PROP_U64,
DEV_PROP_STRING,
+   DEV_PROP_REF,
 };
 
 enum dev_dma_attr {
@@ -218,6 +219,20 @@ static inline int fwnode_property_count_u64(const struct 
fwnode_handle *fwnode,
return fwnode_property_read_u64_array(fwnode, propname, NULL, 0);
 }
 
+struct software_node;
+
+/**
+ * struct software_node_ref_args - Reference property with additional arguments
+ * @node: Reference to a software node
+ * @nargs: Number of elements in @args array
+ * @args: Integer arguments
+ */
+struct software_node_ref_args {
+   const struct software_node *node;
+   unsigned int nargs;
+   u64 args[NR_FWNODE_REFERENCE_ARGS];
+};
+
 /**
  * struct property_entry - "Built-in" device property representation.
  * @name: Name of the property.
@@ -255,14 +270,20 @@ struct property_entry {
 #define __PROPERTY_ENTRY_ELEMENT_SIZE(_elem_)  \
sizeof(((struct property_entry *)NULL)->value._elem_)
 
-#define __PROPERTY_ENTRY_ARRAY_LEN(_name_, _elem_, _Type_, _val_, _len_)\
+#define __PROPERTY_ENTRY_ARRAY_ELSIZE_LEN(_name_, _elsize_, _Type_,\
+ 

[PATCH v4 01/14] software node: remove DEV_PROP_MAX

2019-09-10 Thread Dmitry Torokhov
This definition is not used anywhere, let's remove it.

Suggested-by: Andy Shevchenko 
Reviewed-by: Andy Shevchenko 
Signed-off-by: Dmitry Torokhov 
---
 include/linux/property.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/linux/property.h b/include/linux/property.h
index 9b3d4ca3a73a..44c1704f7163 100644
--- a/include/linux/property.h
+++ b/include/linux/property.h
@@ -22,7 +22,6 @@ enum dev_prop_type {
DEV_PROP_U32,
DEV_PROP_U64,
DEV_PROP_STRING,
-   DEV_PROP_MAX,
 };
 
 enum dev_dma_attr {
-- 
2.23.0.162.g0b9fbb3734-goog



[PATCH v4 02/14] software node: introduce PROPERTY_ENTRY_ARRAY_XXX_LEN()

2019-09-10 Thread Dmitry Torokhov
Sometimes we want to initialize a property entry array from a regular
pointer, when we can't determine the length automatically via ARRAY_SIZE().
Let's introduce PROPERTY_ENTRY_ARRAY_XXX_LEN macros that take an explicit
"len" argument.

Signed-off-by: Dmitry Torokhov 
---
 include/linux/property.h | 45 +---
 1 file changed, 28 insertions(+), 17 deletions(-)

diff --git a/include/linux/property.h b/include/linux/property.h
index 44c1704f7163..f89b930ca4b7 100644
--- a/include/linux/property.h
+++ b/include/linux/property.h
@@ -256,33 +256,44 @@ struct property_entry {
  * and structs.
  */
 
-#define PROPERTY_ENTRY_INTEGER_ARRAY(_name_, _type_, _Type_, _val_)\
+#define PROPERTY_ENTRY_ARRAY_LEN(_name_, _type_, _Type_, _val_, _len_) \
 (struct property_entry) {  \
.name = _name_, \
-   .length = ARRAY_SIZE(_val_) * sizeof(_type_),   \
+   .length = (_len_) * sizeof(_type_), \
.is_array = true,   \
.type = DEV_PROP_##_Type_,  \
{ .pointer = { ._type_##_data = _val_ } },  \
 }
 
-#define PROPERTY_ENTRY_U8_ARRAY(_name_, _val_) \
-   PROPERTY_ENTRY_INTEGER_ARRAY(_name_, u8, U8, _val_)
-#define PROPERTY_ENTRY_U16_ARRAY(_name_, _val_)\
-   PROPERTY_ENTRY_INTEGER_ARRAY(_name_, u16, U16, _val_)
-#define PROPERTY_ENTRY_U32_ARRAY(_name_, _val_)\
-   PROPERTY_ENTRY_INTEGER_ARRAY(_name_, u32, U32, _val_)
-#define PROPERTY_ENTRY_U64_ARRAY(_name_, _val_)\
-   PROPERTY_ENTRY_INTEGER_ARRAY(_name_, u64, U64, _val_)
+#define PROPERTY_ENTRY_U8_ARRAY_LEN(_name_, _val_, _len_)  \
+   PROPERTY_ENTRY_ARRAY_LEN(_name_, u8, U8, _val_, _len_)
+#define PROPERTY_ENTRY_U16_ARRAY_LEN(_name_, _val_, _len_) \
+   PROPERTY_ENTRY_ARRAY_LEN(_name_, u16, U16, _val_, _len_)
+#define PROPERTY_ENTRY_U32_ARRAY_LEN(_name_, _val_, _len_) \
+   PROPERTY_ENTRY_ARRAY_LEN(_name_, u32, U32, _val_, _len_)
+#define PROPERTY_ENTRY_U64_ARRAY_LEN(_name_, _val_, _len_) \
+   PROPERTY_ENTRY_ARRAY_LEN(_name_, u64, U64, _val_, _len_)
 
-#define PROPERTY_ENTRY_STRING_ARRAY(_name_, _val_) \
-(struct property_entry) {  \
-   .name = _name_, \
-   .length = ARRAY_SIZE(_val_) * sizeof(const char *), \
-   .is_array = true,   \
-   .type = DEV_PROP_STRING,\
-   { .pointer = { .str = _val_ } },\
+#define PROPERTY_ENTRY_STRING_ARRAY_LEN(_name_, _val_, _len_)  \
+(struct property_entry) {  \
+   .name = _name_, \
+   .length = (_len_) * sizeof(const char *),   \
+   .is_array = true,   \
+   .type = DEV_PROP_STRING,\
+   { .pointer = { .str = _val_ } },\
 }
 
+#define PROPERTY_ENTRY_U8_ARRAY(_name_, _val_) \
+   PROPERTY_ENTRY_U8_ARRAY_LEN(_name_, _val_, ARRAY_SIZE(_val_))
+#define PROPERTY_ENTRY_U16_ARRAY(_name_, _val_)
\
+   PROPERTY_ENTRY_U16_ARRAY_LEN(_name_, _val_, ARRAY_SIZE(_val_))
+#define PROPERTY_ENTRY_U32_ARRAY(_name_, _val_)
\
+   PROPERTY_ENTRY_U32_ARRAY_LEN(_name_, _val_, ARRAY_SIZE(_val_))
+#define PROPERTY_ENTRY_U64_ARRAY(_name_, _val_)
\
+   PROPERTY_ENTRY_U64_ARRAY_LEN(_name_, _val_, ARRAY_SIZE(_val_))
+#define PROPERTY_ENTRY_STRING_ARRAY(_name_, _val_) \
+   PROPERTY_ENTRY_STRING_ARRAY_LEN(_name_, _val_, ARRAY_SIZE(_val_))
+
 #define PROPERTY_ENTRY_INTEGER(_name_, _type_, _Type_, _val_)  \
 (struct property_entry) {  \
.name = _name_, \
-- 
2.23.0.162.g0b9fbb3734-goog



[PATCH v4 10/14] software node: rename is_array to is_inline

2019-09-10 Thread Dmitry Torokhov
We do not need a special flag to know if we are dealing with an array,
as we can derive that from the ratio between the property length and the
element size; however, we do need a flag to know whether the data is
stored directly inside the property_entry or separately.
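
As a small illustration (ex_vals and ex_props are made-up names, not from
this patch): a scalar entry now has is_inline set because its value lives
in the entry itself, while an array entry keeps is_inline clear and leaves
its data behind the pointer:

static const u32 ex_vals[] = { 1, 2, 3 };

static const struct property_entry ex_props[] = {
	/* value stored in the entry itself -> is_inline is set */
	PROPERTY_ENTRY_U32("single", 42),
	/* data stays behind the pointer -> is_inline stays clear */
	PROPERTY_ENTRY_U32_ARRAY("many", ex_vals),
	{ }
};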

Signed-off-by: Dmitry Torokhov 
---
 drivers/base/swnode.c|  9 +
 include/linux/property.h | 12 +++-
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/base/swnode.c b/drivers/base/swnode.c
index 9c3e566c753e..83e2a706a86e 100644
--- a/drivers/base/swnode.c
+++ b/drivers/base/swnode.c
@@ -108,7 +108,7 @@ static const void *property_get_pointer(const struct 
property_entry *prop)
if (!prop->length)
return NULL;
 
-   if (prop->is_array)
+   if (!prop->is_inline)
return prop->pointer;
 
    return &prop->value;
@@ -205,7 +205,7 @@ static void property_entry_free_data(const struct 
property_entry *p)
const char * const *src_str;
size_t i, nval;
 
-   if (p->is_array) {
+   if (!p->is_inline) {
if (p->type == DEV_PROP_STRING && p->pointer) {
src_str = p->pointer;
nval = p->length / sizeof(const char *);
@@ -250,7 +250,7 @@ static int property_entry_copy_data(struct property_entry 
*dst,
const void *pointer = property_get_pointer(src);
const void *new;
 
-   if (src->is_array) {
+   if (!src->is_inline) {
if (!src->length)
return -ENODATA;
 
@@ -264,15 +264,16 @@ static int property_entry_copy_data(struct property_entry 
*dst,
return -ENOMEM;
}
 
-   dst->is_array = true;
dst->pointer = new;
} else if (src->type == DEV_PROP_STRING) {
new = kstrdup(src->value.str, GFP_KERNEL);
if (!new && src->value.str)
return -ENOMEM;
 
+   dst->is_inline = true;
dst->value.str = new;
} else {
+   dst->is_inline = true;
dst->value = src->value;
}
 
diff --git a/include/linux/property.h b/include/linux/property.h
index 238e1507925f..ac7823d58cfe 100644
--- a/include/linux/property.h
+++ b/include/linux/property.h
@@ -222,15 +222,17 @@ static inline int fwnode_property_count_u64(const struct 
fwnode_handle *fwnode,
  * struct property_entry - "Built-in" device property representation.
  * @name: Name of the property.
  * @length: Length of data making up the value.
- * @is_array: True when the property is an array.
+ * @is_inline: True when the property value is stored directly in
+ *  property_entry instance.
  * @type: Type of the data in unions.
- * @pointer: Pointer to the property (an array of items of the given type).
- * @value: Value of the property (when it is a single item of the given type).
+ * @pointer: Pointer to the property when it is stored separately from
+ * the  property_entry instance.
+ * @value: Value of the property when it is stored inline.
  */
 struct property_entry {
const char *name;
size_t length;
-   bool is_array;
+   bool is_inline;
enum dev_prop_type type;
union {
const void *pointer;
@@ -257,7 +259,6 @@ struct property_entry {
 (struct property_entry) {  \
.name = _name_, \
.length = (_len_) * __PROPERTY_ENTRY_ELEMENT_SIZE(_elem_),  \
-   .is_array = true,   \
.type = DEV_PROP_##_Type_,  \
{ .pointer = _val_ },   \
 }
@@ -288,6 +289,7 @@ struct property_entry {
 (struct property_entry) {  \
.name = _name_, \
.length = __PROPERTY_ENTRY_ELEMENT_SIZE(_elem_),\
+   .is_inline = true,  \
.type = DEV_PROP_##_Type_,  \
{ .value = { ._elem_ = _val_ } },   \
 }
-- 
2.23.0.162.g0b9fbb3734-goog



[PATCH v4 00/14] software node: add support for reference properties

2019-09-10 Thread Dmitry Torokhov
This series implements "reference" properties for software nodes as true
properties, instead of managing them completely separately.

The first 11 patches are generic cleanups, consolidation and preparatory
changes to the existing code; patch #12 implements PROPERTY_ENTRY_REF()
and friends; patch #13 converts the existing user of references to the
property syntax, and patch #14 removes the remains of references as
entities that are managed separately.

Changes in v4:
- dealt with union aliasing concerns
- inline small properties on copy

Changes in v3:
- added various cleanups before implementing reference properties

Changes in v2:
- reworked code so that even single-entry reference properties are
  stored as arrays (i.e. the software_node_ref_args instances are
  not part of property_entry structure) to avoid size increase.
  From user's POV nothing is changed, one can still use PROPERTY_ENTRY_REF
  macro to define reference "inline".
- dropped unused DEV_PROP_MAX
- rebased on linux-next


Dmitry Torokhov (14):
  software node: remove DEV_PROP_MAX
  software node: introduce PROPERTY_ENTRY_ARRAY_XXX_LEN()
  efi/apple-properties: use PROPERTY_ENTRY_U8_ARRAY_LEN
  software node: mark internal macros with double underscores
  software node: clean up property_copy_string_array()
  software node: get rid of property_set_pointer()
  software node: remove property_entry_read_uNN_array functions
  software node: unify PROPERTY_ENTRY_XXX macros
  software node: simplify property_entry_read_string_array()
  software node: rename is_array to is_inline
  software node: move small properties inline when copying
  software node: implement reference properties
  platform/x86: intel_cht_int33fe: use inline reference properties
  software node: remove separate handling of references

 drivers/base/swnode.c| 266 ---
 drivers/firmware/efi/apple-properties.c  |  18 +-
 drivers/platform/x86/intel_cht_int33fe.c |  81 +++
 include/linux/property.h | 177 +++
 4 files changed, 230 insertions(+), 312 deletions(-)

-- 
2.23.0.162.g0b9fbb3734-goog



Re: [PATCH] net: stmmac: socfpga: re-use the `interface` parameter from platform data

2019-09-10 Thread Ardelean, Alexandru
On Tue, 2019-09-10 at 17:46 +0200, David Miller wrote:
> [External]
> 
> From: David Miller 
> Date: Tue, 10 Sep 2019 17:45:44 +0200 (CEST)
> 
> > From: Alexandru Ardelean 
> > Date: Fri, 6 Sep 2019 15:30:54 +0300
> > 
> > > The socfpga sub-driver defines an `interface` field in the `socfpga_dwmac`
> > > struct and parses it on init.
> > > 
> > > The shared `stmmac_probe_config_dt()` function also parses this from the
> > > device-tree and makes it available on the returned `plat_data` (which is
> > > the same data available via `netdev_priv()`).
> > > 
> > > All that's needed now is to dig that information out, via some
> > > `dev_get_drvdata()` && `netdev_priv()` calls and re-use it.
> > > 
> > > Signed-off-by: Alexandru Ardelean 
> > 
> > This doesn't build even on net-next.
> 

Right.
My bad.

I think I got confused with multiple/cross-testing and probably this change 
didn't even get compiled.

Apologies for this.
Will send a good version.

Alex

> Specifically:
> 
> drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c: In function 
> ‘socfpga_gen5_set_phy_mode’:
> drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c:264:44: error: ‘phymode’ 
> undeclared (first use in this function);
> did you mean ‘phy_modes’?
>   264 |   dev_err(dwmac->dev, "bad phy mode %d\n", phymode);
>   |^~~
> ./include/linux/device.h:1499:32: note: in definition of macro ‘dev_err’
>  1499 |  _dev_err(dev, dev_fmt(fmt), ##__VA_ARGS__)
>   |^~~
> drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c:264:44: note: each 
> undeclared identifier is reported only once for
> each function it appears in
>   264 |   dev_err(dwmac->dev, "bad phy mode %d\n", phymode);
>   |^~~
> ./include/linux/device.h:1499:32: note: in definition of macro ‘dev_err’
>  1499 |  _dev_err(dev, dev_fmt(fmt), ##__VA_ARGS__)
>   |^~~
> drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c: In function 
> ‘socfpga_gen10_set_phy_mode’:
> drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c:340:6: error: ‘phymode’ 
> undeclared (first use in this function);
> did you mean ‘phy_modes’?
>   340 |  phymode == PHY_INTERFACE_MODE_MII ||
>   |  ^~~
>   |  phy_modes


[PATCH] ARM: module: Drop 'rel->r_offset < 0' always false statement

2019-09-10 Thread Austin Kim
Since rel->r_offset is declared as Elf32_Addr,
this value is always non-negative.
typedef struct elf32_rel {
	Elf32_Addr	r_offset;
	Elf32_Word	r_info;
} Elf32_Rel;

typedef __u32   Elf32_Addr;
typedef unsigned int __u32;

Drop 'rel->r_offset < 0' statement which is always false.

Signed-off-by: Austin Kim 
---
 arch/arm/kernel/module.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index deef17f..0921ce7 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -92,7 +92,7 @@ apply_relocate(Elf32_Shdr *sechdrs, const char *strtab, 
unsigned int symindex,
sym = ((Elf32_Sym *)symsec->sh_addr) + offset;
symname = strtab + sym->st_name;
 
-   if (rel->r_offset < 0 || rel->r_offset > dstsec->sh_size - 
sizeof(u32)) {
+   if (rel->r_offset > dstsec->sh_size - sizeof(u32)) {
pr_err("%s: section %u reloc %u sym '%s': out of bounds 
relocation, offset %d size %u\n",
   module->name, relindex, i, symname,
   rel->r_offset, dstsec->sh_size);
-- 
2.6.2



Re: [PATCH V8 5/5] mmc: host: sdhci-pci: Add Genesys Logic GL975x support

2019-09-10 Thread Guenter Roeck
On Fri, Sep 06, 2019 at 10:33:26AM +0800, Ben Chuang wrote:
> From: Ben Chuang 
> 
> Add support for the GL9750 and GL9755 chipsets.
> 
> Enable v4 mode and wait 5ms after set 1.8V signal enable for GL9750/
> GL9755. Fix the value of SDHCI_MAX_CURRENT register and use the vendor
> tuning flow for GL9750.
> 
> Co-developed-by: Michael K Johnson 
> Signed-off-by: Michael K Johnson 
> Signed-off-by: Ben Chuang 
> ---
>  drivers/mmc/host/Kconfig  |   1 +
>  drivers/mmc/host/Makefile |   2 +-
>  drivers/mmc/host/sdhci-pci-core.c |   2 +
>  drivers/mmc/host/sdhci-pci-gli.c  | 355 ++
>  drivers/mmc/host/sdhci-pci.h  |   5 +
>  5 files changed, 364 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/mmc/host/sdhci-pci-gli.c
> 
> diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig
> index 931770f17087..9fbfff514d6c 100644
> --- a/drivers/mmc/host/Kconfig
> +++ b/drivers/mmc/host/Kconfig
> @@ -94,6 +94,7 @@ config MMC_SDHCI_PCI
>   depends on MMC_SDHCI && PCI
>   select MMC_CQHCI
>   select IOSF_MBI if X86
> + select MMC_SDHCI_IO_ACCESSORS
>   help
> This selects the PCI Secure Digital Host Controller Interface.
> Most controllers found today are PCI devices.
> diff --git a/drivers/mmc/host/Makefile b/drivers/mmc/host/Makefile
> index 73578718f119..661445415090 100644
> --- a/drivers/mmc/host/Makefile
> +++ b/drivers/mmc/host/Makefile
> @@ -13,7 +13,7 @@ obj-$(CONFIG_MMC_MXS)   += mxs-mmc.o
>  obj-$(CONFIG_MMC_SDHCI)  += sdhci.o
>  obj-$(CONFIG_MMC_SDHCI_PCI)  += sdhci-pci.o
>  sdhci-pci-y  += sdhci-pci-core.o sdhci-pci-o2micro.o 
> sdhci-pci-arasan.o \
> -sdhci-pci-dwc-mshc.o
> +sdhci-pci-dwc-mshc.o sdhci-pci-gli.o
>  obj-$(subst m,y,$(CONFIG_MMC_SDHCI_PCI)) += sdhci-pci-data.o
>  obj-$(CONFIG_MMC_SDHCI_ACPI) += sdhci-acpi.o
>  obj-$(CONFIG_MMC_SDHCI_PXAV3)+= sdhci-pxav3.o
> diff --git a/drivers/mmc/host/sdhci-pci-core.c 
> b/drivers/mmc/host/sdhci-pci-core.c
> index 4154ee11b47d..e5835fbf73bc 100644
> --- a/drivers/mmc/host/sdhci-pci-core.c
> +++ b/drivers/mmc/host/sdhci-pci-core.c
> @@ -1682,6 +1682,8 @@ static const struct pci_device_id pci_ids[] = {
>   SDHCI_PCI_DEVICE(O2, SEABIRD1, o2),
>   SDHCI_PCI_DEVICE(ARASAN, PHY_EMMC, arasan),
>   SDHCI_PCI_DEVICE(SYNOPSYS, DWC_MSHC, snps),
> + SDHCI_PCI_DEVICE(GLI, 9750, gl9750),
> + SDHCI_PCI_DEVICE(GLI, 9755, gl9755),
>   SDHCI_PCI_DEVICE_CLASS(AMD, SYSTEM_SDHCI, PCI_CLASS_MASK, amd),
>   /* Generic SD host controller */
>   {PCI_DEVICE_CLASS(SYSTEM_SDHCI, PCI_CLASS_MASK)},
> diff --git a/drivers/mmc/host/sdhci-pci-gli.c 
> b/drivers/mmc/host/sdhci-pci-gli.c
> new file mode 100644
> index ..94462b94abec
> --- /dev/null
> +++ b/drivers/mmc/host/sdhci-pci-gli.c
> @@ -0,0 +1,355 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * Copyright (C) 2019 Genesys Logic, Inc.
> + *
> + * Authors: Ben Chuang 
> + *
> + * Version: v0.9.0 (2019-08-08)
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include "sdhci.h"
> +#include "sdhci-pci.h"
> +
> +/*  Genesys Logic extra registers */
> +#define SDHCI_GLI_9750_WT 0x800
> +#define   SDHCI_GLI_9750_WT_EN  BIT(0)
> +#define   GLI_9750_WT_EN_ON  0x1
> +#define   GLI_9750_WT_EN_OFF 0x0
> +
> +#define SDHCI_GLI_9750_DRIVING  0x860
> +#define   SDHCI_GLI_9750_DRIVING_1GENMASK(11, 0)
> +#define   SDHCI_GLI_9750_DRIVING_2GENMASK(27, 26)
> +#define   GLI_9750_DRIVING_1_VALUE0xFFF
> +#define   GLI_9750_DRIVING_2_VALUE0x3
> +
> +#define SDHCI_GLI_9750_PLL 0x864
> +#define   SDHCI_GLI_9750_PLL_TX2_INVBIT(23)
> +#define   SDHCI_GLI_9750_PLL_TX2_DLYGENMASK(22, 20)
> +#define   GLI_9750_PLL_TX2_INV_VALUE0x1
> +#define   GLI_9750_PLL_TX2_DLY_VALUE0x0
> +
> +#define SDHCI_GLI_9750_SW_CTRL  0x874
> +#define   SDHCI_GLI_9750_SW_CTRL_4GENMASK(7, 6)
> +#define   GLI_9750_SW_CTRL_4_VALUE0x3
> +
> +#define SDHCI_GLI_9750_MISC0x878
> +#define   SDHCI_GLI_9750_MISC_TX1_INVBIT(2)
> +#define   SDHCI_GLI_9750_MISC_RX_INV BIT(3)
> +#define   SDHCI_GLI_9750_MISC_TX1_DLYGENMASK(6, 4)
> +#define   GLI_9750_MISC_TX1_INV_VALUE0x0
> +#define   GLI_9750_MISC_RX_INV_ON0x1
> +#define   GLI_9750_MISC_RX_INV_OFF   0x0
> +#define   GLI_9750_MISC_RX_INV_VALUE GLI_9750_MISC_RX_INV_OFF
> +#define   GLI_9750_MISC_TX1_DLY_VALUE0x5
> +
> +#define SDHCI_GLI_9750_TUNING_CONTROL  0x540
> +#define   SDHCI_GLI_9750_TUNING_CONTROL_EN  BIT(4)
> +#define   GLI_9750_TUNING_CONTROL_EN_ON 0x1
> +#define   GLI_9750_TUNING_CONTROL_EN_OFF0x0
> +#define   SDHCI_GLI_9750_TUNING_CONTROL_GLITCH_1BIT(16)
> +#define   SDHCI_GLI_9750_TUNING_CONTROL_GLITCH_2GENMASK(20, 19)
> +#define   GLI_9750_TUNING_CONTROL_GLITCH_1_VALUE

Re: [PATCH] Revert "locking/pvqspinlock: Don't wait if vCPU is preempted"

2019-09-10 Thread Waiman Long
On 9/10/19 6:56 AM, Wanpeng Li wrote:
> On Mon, 9 Sep 2019 at 18:56, Waiman Long  wrote:
>> On 9/9/19 2:40 AM, Wanpeng Li wrote:
>>> From: Wanpeng Li 
>>>
>>> This patch reverts commit 75437bb304b20 (locking/pvqspinlock: Don't wait if
>>> vCPU is preempted), we found great regression caused by this commit.
>>>
>>> Xeon Skylake box, 2 sockets, 40 cores, 80 threads, three VMs, each is 80 
>>> vCPUs.
>>> The score of ebizzy -M can reduce from 13000-14000 records/s to 1700-1800
>>> records/s with this commit.
>>>
>>>   Host   Guestscore
>>>
>>> vanilla + w/o kvm optimizes vanilla   1700-1800 records/s
>>> vanilla + w/o kvm optimizes vanilla + revert  13000-14000 records/s
>>> vanilla + w/ kvm optimizes  vanilla   4500-5000 records/s
>>> vanilla + w/ kvm optimizes  vanilla + revert  14000-15500 records/s
>>>
>>> Exit from aggressive wait-early mechanism can result in yield premature and
>>> incur extra scheduling latency in over-subscribe scenario.
>>>
>>> kvm optimizes:
>>> [1] commit d73eb57b80b (KVM: Boost vCPUs that are delivering interrupts)
>>> [2] commit 266e85a5ec9 (KVM: X86: Boost queue head vCPU to mitigate lock 
>>> waiter preemption)
>>>
>>> Tested-by: loobin...@tencent.com
>>> Cc: Peter Zijlstra 
>>> Cc: Thomas Gleixner 
>>> Cc: Ingo Molnar 
>>> Cc: Waiman Long 
>>> Cc: Paolo Bonzini 
>>> Cc: Radim Krčmář 
>>> Cc: loobin...@tencent.com
>>> Cc: sta...@vger.kernel.org
>>> Fixes: 75437bb304b20 (locking/pvqspinlock: Don't wait if vCPU is preempted)
>>> Signed-off-by: Wanpeng Li 
>>> ---
>>>  kernel/locking/qspinlock_paravirt.h | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/locking/qspinlock_paravirt.h 
>>> b/kernel/locking/qspinlock_paravirt.h
>>> index 89bab07..e84d21a 100644
>>> --- a/kernel/locking/qspinlock_paravirt.h
>>> +++ b/kernel/locking/qspinlock_paravirt.h
>>> @@ -269,7 +269,7 @@ pv_wait_early(struct pv_node *prev, int loop)
>>>   if ((loop & PV_PREV_CHECK_MASK) != 0)
>>>   return false;
>>>
>>> - return READ_ONCE(prev->state) != vcpu_running || 
>>> vcpu_is_preempted(prev->cpu);
>>> + return READ_ONCE(prev->state) != vcpu_running;
>>>  }
>>>
>>>  /*
>> There are several possibilities for this performance regression:
>>
>> 1) Multiple vcpus calling vcpu_is_preempted() repeatedly may cause some
>> cacheline contention issue depending on how that callback is implemented.
>>
>> 2) KVM may set the preempt flag for a short period whenver an vmexit
>> happens even if a vmenter is executed shortly after. In this case, we
>> may want to use a more durable vcpu suspend flag that indicates the vcpu
>> won't get a real vcpu back for a longer period of time.
>>
>> Perhaps you can add a lock event counter to count the number of
>> wait_early events caused by vcpu_is_preempted() being true to see if it
>> really cause a lot more wait_early than without the vcpu_is_preempted()
>> call.
> pv_wait_again:1:179
> pv_wait_early:1:189429
> pv_wait_head:1:263
> pv_wait_node:1:189429
> pv_vcpu_is_preempted:1:45588
> =sleep 5
> pv_wait_again:1:181
> pv_wait_early:1:202574
> pv_wait_head:1:267
> pv_wait_node:1:202590
> pv_vcpu_is_preempted:1:46336
>
> The sampling period is 5s, 6% of wait_early events caused by
> vcpu_is_preempted() being true.

6% isn't that high. However, when one vCPU voluntarily releases its
vCPU, all the subsequent waiters in the queue will do the same. It is
a cascading effect. Perhaps we wait early too aggressively with the
original patch.

I also looked up the email chain of the original commit. The patch
submitter did not provide any performance data to support this change.
The patch just looked reasonable at the time, so there was no
objection. Given that we now have hard evidence that this was not a
good idea, I think we should revert it.

Reviewed-by: Waiman Long 

Thanks,
Longman



Re: [RFC PATCH 1/2] x86: Don't let pgprot_modify() change the page encryption bit

2019-09-10 Thread Andy Lutomirski
On Tue, Sep 10, 2019 at 12:26 PM Thomas Hellström (VMware)
 wrote:
>
> On 9/10/19 6:11 PM, Andy Lutomirski wrote:
> >
> >> On Sep 5, 2019, at 8:24 AM, Christoph Hellwig  wrote:
> >>
> >>> On Thu, Sep 05, 2019 at 05:21:24PM +0200, Thomas Hellström (VMware) wrote:
>  On 9/5/19 4:15 PM, Dave Hansen wrote:
>  Hi Thomas,
> 
>  Thanks for the second batch of patches!  These look much improved on all
>  fronts.
> >>> Yes, although the TTM functionality isn't in yet. Hopefully we won't have 
> >>> to
> >>> bother you with those though, since this assumes TTM will be using the dma
> >>> API.
> >> Please take a look at dma_mmap_prepare and dma_mmap_fault in this
> >> branch:
> >>
> >> 
> >> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-mmap-improvements
> >>
> >> they should allow to fault dma api pages in the page fault handler.  But
> >> this is totally hot off the press and not actually tested for the last
> >> few patches.  Note that I've also included your two patches from this
> >> series to handle SEV.
> > I read that patch, and it seems like you’ve built in the assumption that 
> > all pages in the mapping use identical protection or, if not, that the same 
> > fake vma hack that TTM already has is used to fudge around it.  Could it be 
> > reworked slightly to avoid this?
> >
> > I wonder if it’s a mistake to put the encryption bits in vm_page_prot at 
> > all.
>
>  From my POW, the encryption bits behave quite similar in behaviour to
> the caching mode bits, and they're also in vm_page_prot. They're the
> reason TTM needs to modify the page protection in the fault handler in
> the first place.
>
> The problem seen in TTM is that we want to be able to change the
> vm_page_prot from the fault handler, but it's problematic since we have
> the mmap_sem typically only in read mode. Hence the fake vma hack. From
> what I can tell it's reasonably well-behaved, since pte_modify() skips
> the bits TTM updates, so mprotect() and mremap() works OK. I think
> split_huge_pmd may run into trouble, but we don't support it (yet) with
> TTM.

One thing I'm confused about: does TTM move individual pages between
main memory and device memory or does it move whole mappings?  If it
moves individual pages, then a single mapping could have PTEs from
dma_alloc_coherent() space and from PCI space.  How can this work with
vm_page_prot?  I guess you might get lucky and have both have the same
protection bits, but that seems like an unfortunate thing to rely on.

As a for-real example, take a look at arch/x86/entry/vdso/vma.c.  The
"vvar" VMA contains multiple pages that are backed by different types
of memory.  There's a page of ordinary kernel memory.  Historically
there was a page of genuine MMIO memory, but that's gone now.  If the
system is a SEV guest with pvclock enabled, then there's a page of
decrypted memory.   They all share a VMA, they're instantiated in
.fault, and there is no hackery involved.  Look at vvar_fault().

So, Christoph, can't you restructure your code a bit to compute the
caching and encryption state per page instead of once in
dma_mmap_prepare() and insert the pages accordingly?  You might need
to revert 6d958546ff611c9ae09b181e628c1c5d5da5ebda depending on
exactly how you structure it.
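
As a rough sketch of that per-page approach (my_buf, my_buf_pfn() and
my_buf_page_decrypted() are made-up placeholders; only
vmf_insert_pfn_prot() and pgprot_decrypted() are existing kernel
interfaces), a .fault handler could pick the protection per page along
these lines:

static vm_fault_t my_fault(struct vm_fault *vmf)
{
	struct my_buf *buf = vmf->vma->vm_private_data;
	unsigned long pfn = my_buf_pfn(buf, vmf->pgoff);
	pgprot_t prot = vmf->vma->vm_page_prot;

	/* Decide per page, like vvar_fault() does, instead of per VMA. */
	if (my_buf_page_decrypted(buf, vmf->pgoff))
		prot = pgprot_decrypted(prot);

	return vmf_insert_pfn_prot(vmf->vma, vmf->address, pfn, prot);
}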

>
> We could probably get away with a WRITE_ONCE() update of the
> vm_page_prot before taking the page table lock since
>
> a) We're locking out all other writers.
> b) We can't race with another fault to the same vma since we hold an
> address space lock ("buffer object reservation")
> c) When we need to update there are no valid page table entries in the
> vma, since it only happens directly after mmap(), or after an
> unmap_mapping_range() with the same address space lock. When another
> reader (for example split_huge_pmd()) sees a valid page table entry, it
> also sees the new page protection and things are fine.
>
> But that would really be a special case. To solve this properly we'd
> probably need an additional lock to protect the vm_flags and
> vm_page_prot, taken after mmap_sem and i_mmap_lock.
>

This is all horrible IMO.


Re: [PATCH 0/2] Perf/stat: Solve problems with repeat and interval

2019-09-10 Thread Ravi Bangoria




On 9/4/19 3:17 PM, Srikar Dronamraju wrote:

There are some problems in perf stat when using a combination of repeat and
interval options. This series tries to fix them.


For the series:

Tested-by: Ravi Bangoria 



Re: [BACKPORT 4.14.y v2 2/6] locking/lockdep: Add debug_locks check in __lock_downgrade()

2019-09-10 Thread Baolin Wang
On Tue, 10 Sep 2019 at 22:32, Greg KH  wrote:
>
> On Thu, Sep 05, 2019 at 11:07:14AM +0800, Baolin Wang wrote:
> > From: Waiman Long 
> >
> > [Upstream commit 513e1073d52e55b8024b4f238a48de7587c64ccf]
> >
> > Tetsuo Handa had reported he saw an incorrect "downgrading a read lock"
> > warning right after a previous lockdep warning. It is likely that the
> > previous warning turned off lock debugging causing the lockdep to have
> > inconsistency states leading to the lock downgrade warning.
> >
> > Fix that by add a check for debug_locks at the beginning of
> > __lock_downgrade().
> >
> > Reported-by: Tetsuo Handa 
> > Reported-by: syzbot+53383ae265fb161ef...@syzkaller.appspotmail.com
> > Signed-off-by: Waiman Long 
> > Signed-off-by: Peter Zijlstra (Intel) 
> > Cc: Andrew Morton 
> > Cc: Linus Torvalds 
> > Cc: Paul E. McKenney 
> > Cc: Peter Zijlstra 
> > Cc: Thomas Gleixner 
> > Cc: Will Deacon 
> > Link: 
> > https://lkml.kernel.org/r/1547093005-26085-1-git-send-email-long...@redhat.com
> > Signed-off-by: Ingo Molnar 
> > Signed-off-by: Baolin Wang 
> > ---
> >  kernel/locking/lockdep.c |3 +++
> >  1 file changed, 3 insertions(+)
>
> Why isn't this relevant for 4.19.y?  I can't add a patch to 4.14.y and
> then have someone upgrade to 4.19.y and not have the same fix in there,
> that would be a regression.
>
> So can you redo this series also with a 4.19.y set at the same so we
> don't get out of sync?  I've queued up your first patch already as that
> was in 4.19.y (and also needed in 4.9.y).

I understood, will do. Thanks.

-- 
Baolin Wang
Best Regards


Re: [PATCH v3] module: add link_flag param in ref_module func to decide whether to add usage link

2019-09-10 Thread Zhiqiang Liu
Friendly Ping...

On 2019/7/20 22:40, Zhiqiang Liu wrote:
> Users can call ref_module func in their modules to construct
> relationships with other modules. However, the holders
> '/sys/module//holders' of the target module donot include
> the users` module. So lsmod command misses detailed info of 'Used by'.
> 
> When load module, the process is given as follows,
> load_module()
>   -> mod_sysfs_setup()
>   -> add_usage_links
>   -> do_init_module
>   -> mod->init()
> 
> add_usage_links func creates holders of target modules linking to
> this module. If ref_module is called in mod->init() func, the usage
> links cannot be added.
> 
> Consider that add_module_usage and add usage_link may separate, the
> link_flag pram is added in ref_module func to decide whether add usage
> link after add_module_usage. If link_flag is true, it means usage link
> of a to b's holder_dir should be created immediately after add_module_usage.
> 
> V2->V3:
> - add link_flag pram in ref_module func to decide whether add usage link
> 
> V1->V2:
> - remove incorrect Fixes tag
> - fix error handling of sysfs_create_link as suggested by Jessica Yu
> 
> Signed-off-by: Zhiqiang Liu 
> Suggested-by: Jessica Yu 
> Reviewed-by: Kang Zhou 
> ---
>  include/linux/module.h |  2 +-
>  kernel/module.c| 27 ---
>  2 files changed, 21 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/module.h b/include/linux/module.h
> index 188998d3dca9..9ec04b9e93e8 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -632,7 +632,7 @@ static inline void __module_get(struct module *module)
>  #define symbol_put_addr(p) do { } while (0)
> 
>  #endif /* CONFIG_MODULE_UNLOAD */
> -int ref_module(struct module *a, struct module *b);
> +int ref_module(struct module *a, struct module *b, bool link_flag);
> 
>  /* This is a #define so the string doesn't get put in every .o file */
>  #define module_name(mod) \
> diff --git a/kernel/module.c b/kernel/module.c
> index 80c7c09584cf..00e4862a8ef7 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -837,25 +837,26 @@ static int already_uses(struct module *a, struct module 
> *b)
>   *'b' can walk the list to see who sourced them), and of 'a'
>   *targets (so 'a' can see what modules it targets).
>   */
> -static int add_module_usage(struct module *a, struct module *b)
> +static struct module_use *add_module_usage(struct module *a, struct module 
> *b)
>  {
>   struct module_use *use;
> 
>   pr_debug("Allocating new usage for %s.\n", a->name);
>   use = kmalloc(sizeof(*use), GFP_ATOMIC);
>   if (!use)
> - return -ENOMEM;
> + return NULL;
> 
>   use->source = a;
>   use->target = b;
>   list_add(>source_list, >source_list);
>   list_add(>target_list, >target_list);
> - return 0;
> + return use;
>  }
> 
>  /* Module a uses b: caller needs module_mutex() */
> -int ref_module(struct module *a, struct module *b)
> +int ref_module(struct module *a, struct module *b, bool link_flag)
>  {
> + struct module_use *use;
>   int err;
> 
>   if (b == NULL || already_uses(a, b))
> @@ -866,9 +867,21 @@ int ref_module(struct module *a, struct module *b)
>   if (err)
>   return err;
> 
> - err = add_module_usage(a, b);
> + use = add_module_usage(a, b);
> + if (!use) {
> + module_put(b);
> + return -ENOMEM;
> + }
> +
> + if (!link_flag)
> + return 0;
> +
> + err = sysfs_create_link(b->holders_dir, >mkobj.kobj, a->name);
>   if (err) {
>   module_put(b);
> + list_del(>source_list);
> + list_del(>target_list);
> + kfree(use);
>   return err;
>   }
>   return 0;
> @@ -1152,7 +1165,7 @@ static inline void module_unload_free(struct module 
> *mod)
>  {
>  }
> 
> -int ref_module(struct module *a, struct module *b)
> +int ref_module(struct module *a, struct module *b, bool link_flag)
>  {
>   return strong_try_module_get(b);
>  }
> @@ -1407,7 +1420,7 @@ static const struct kernel_symbol 
> *resolve_symbol(struct module *mod,
>   goto getname;
>   }
> 
> - err = ref_module(mod, owner);
> + err = ref_module(mod, owner, false);
>   if (err) {
>   sym = ERR_PTR(err);
>   goto getname;
> 



Re: [RFC PATCH 3/4] virtio: introduce a mdev based transport

2019-09-10 Thread Tiwei Bie
On Wed, Sep 11, 2019 at 10:52:03AM +0800, Jason Wang wrote:
> On 2019/9/11 上午9:47, Tiwei Bie wrote:
> > On Tue, Sep 10, 2019 at 04:19:34PM +0800, Jason Wang wrote:
> > > This path introduces a new mdev transport for virtio. This is used to
> > > use kernel virtio driver to drive the mediated device that is capable
> > > of populating virtqueue directly.
> > > 
> > > A new virtio-mdev driver will be registered to the mdev bus, when a
> > > new virtio-mdev device is probed, it will register the device with
> > > mdev based config ops. This means, unlike the exist hardware
> > > transport, this is a software transport between mdev driver and mdev
> > > device. The transport was implemented through:
> > > 
> > > - configuration access was implemented through parent_ops->read()/write()
> > > - vq/config callback was implemented through parent_ops->ioctl()
> > > 
> > > This transport is derived from virtio MMIO protocol and was wrote for
> > > kernel driver. But for the transport itself, but the design goal is to
> > > be generic enough to support userspace driver (this part will be added
> > > in the future).
> > > 
> > > Note:
> > > - current mdev assume all the parameter of parent_ops was from
> > >userspace. This prevents us from implementing the kernel mdev
> > >driver. For a quick POC, this patch just abuse those parameter and
> > >assume the mdev device implementation will treat them as kernel
> > >pointer. This should be addressed in the formal series by extending
> > >mdev_parent_ops.
> > > - for a quick POC, I just drive the transport from MMIO, I'm pretty
> > >there's lot of optimization space for this.
> > > 
> > > Signed-off-by: Jason Wang 
> > > ---
> > >   drivers/vfio/mdev/Kconfig|   7 +
> > >   drivers/vfio/mdev/Makefile   |   1 +
> > >   drivers/vfio/mdev/virtio_mdev.c  | 500 +++
> > >   include/uapi/linux/virtio_mdev.h | 131 
> > >   4 files changed, 639 insertions(+)
> > >   create mode 100644 drivers/vfio/mdev/virtio_mdev.c
> > >   create mode 100644 include/uapi/linux/virtio_mdev.h
> > > 
> > [...]
> > > diff --git a/include/uapi/linux/virtio_mdev.h 
> > > b/include/uapi/linux/virtio_mdev.h
> > > new file mode 100644
> > > index ..8040de6b960a
> > > --- /dev/null
> > > +++ b/include/uapi/linux/virtio_mdev.h
> > > @@ -0,0 +1,131 @@
> > > +/*
> > > + * Virtio mediated device driver
> > > + *
> > > + * Copyright 2019, Red Hat Corp.
> > > + *
> > > + * Based on Virtio MMIO driver by ARM Ltd, copyright ARM Ltd. 2011
> > > + *
> > > + * This header is BSD licensed so anyone can use the definitions to 
> > > implement
> > > + * compatible drivers/servers.
> > > + *
> > > + * Redistribution and use in source and binary forms, with or without
> > > + * modification, are permitted provided that the following conditions
> > > + * are met:
> > > + * 1. Redistributions of source code must retain the above copyright
> > > + *notice, this list of conditions and the following disclaimer.
> > > + * 2. Redistributions in binary form must reproduce the above copyright
> > > + *notice, this list of conditions and the following disclaimer in the
> > > + *documentation and/or other materials provided with the 
> > > distribution.
> > > + * 3. Neither the name of IBM nor the names of its contributors
> > > + *may be used to endorse or promote products derived from this 
> > > software
> > > + *without specific prior written permission.
> > > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 
> > > ``AS IS'' AND
> > > + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> > > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR 
> > > PURPOSE
> > > + * ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
> > > + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 
> > > CONSEQUENTIAL
> > > + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE 
> > > GOODS
> > > + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> > > + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, 
> > > STRICT
> > > + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY 
> > > WAY
> > > + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> > > + * SUCH DAMAGE.
> > > + */
> > > +#ifndef _LINUX_VIRTIO_MDEV_H
> > > +#define _LINUX_VIRTIO_MDEV_H
> > > +
> > > +#include 
> > > +#include 
> > > +#include 
> > > +
> > > +/*
> > > + * Ioctls
> > > + */
> > > +
> > > +struct virtio_mdev_callback {
> > > + irqreturn_t (*callback)(void *);
> > > + void *private;
> > > +};
> > > +
> > > +#define VIRTIO_MDEV 0xAF
> > > +#define VIRTIO_MDEV_SET_VQ_CALLBACK _IOW(VIRTIO_MDEV, 0x00, \
> > > +  struct virtio_mdev_callback)
> > > +#define VIRTIO_MDEV_SET_CONFIG_CALLBACK _IOW(VIRTIO_MDEV, 0x01, \
> > > + struct 

[PATCH] mm/memblock: fix typo in memblock doc

2019-09-10 Thread Cao jin
elaboarte -> elaborate
architecure -> architecture
compltes -> completes

Signed-off-by: Cao jin 
---
 mm/memblock.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index 7d4f61ae666a..0d0f92003d18 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -83,16 +83,16 @@
  * Note, that both API variants use implict assumptions about allowed
  * memory ranges and the fallback methods. Consult the documentation
  * of :c:func:`memblock_alloc_internal` and
- * :c:func:`memblock_alloc_range_nid` functions for more elaboarte
+ * :c:func:`memblock_alloc_range_nid` functions for more elaborate
  * description.
  *
  * As the system boot progresses, the architecture specific
  * :c:func:`mem_init` function frees all the memory to the buddy page
  * allocator.
  *
- * Unless an architecure enables %CONFIG_ARCH_KEEP_MEMBLOCK, the
+ * Unless an architecture enables %CONFIG_ARCH_KEEP_MEMBLOCK, the
  * memblock data structures will be discarded after the system
- * initialization compltes.
+ * initialization completes.
  */
 
 #ifndef CONFIG_NEED_MULTIPLE_NODES
-- 
2.21.0





Re: [vfs] 8bb3c61baf: vm-scalability.median -23.7% regression

2019-09-10 Thread Hugh Dickins
On Mon, 9 Sep 2019, Hugh Dickins wrote:
> On Mon, 9 Sep 2019, Al Viro wrote:
> > 
> > Anyway, see vfs.git#uncertain.shmem for what I've got with those folded in.
> > Do you see any problems with that one?  That's the last 5 commits in 
> > there...
> 
> It's mostly fine, I've no problem with going your way instead of what
> we had in mmotm; but I have seen some problems with it, and had been
> intending to send you a fixup patch tonight (shmem_reconfigure() missing
> unlock on error is the main problem, but there are other fixes needed).
> 
> But I'm growing tired. I've a feeling my "swap" of the mpols, instead
> of immediate mpol_put(), was necessary to protect against a race with
> shmem_get_sbmpol(), but I'm not clear-headed enough to trust myself on
> that now.  And I've a mystery to solve, that shmem_reconfigure() gets
> stuck into showing the wrong error message.

On my "swap" for the mpol_put(): no, the race against shmem_get_sbmpol()
is safe enough without that, and what you have matches what was always
done before. I rather like my "swap", which the previous double-free had
led me to, but it's fine if you prefer the ordinary way. I was probably
coming down from some over-exposure to iput() under spinlock, but there's
no such complications here.

> 
> Tomorrow
> 
> Oh, and my first attempt to build and boot that series over 5.3-rc5
> wouldn't boot. Luckily there was a tell-tale "i915" in the stacktrace,
> which reminded me of the drivers/gpu/drm/i915/gem/i915_gemfs.c fix
> we discussed earlier in the cycle.  That is of course in linux-next
> by now, but I wonder if your branch ought to contain a duplicate of
> that fix, so that people with i915 doing bisections on 5.4-rc do not
> fall into an unbootable hole between vfs and gpu merges.

Below are the fixups I arrived at last night (I've not rechecked your
tree today, to see if you made any changes since).  But they're not
enough: I now understand why shmem_reconfigure() got stuck showing
the wrong error message, but I'll have to leave it to you to decide
what to do about it, because I don't know whether it's just a mistake,
or different filesystem types have different needs there.

My /etc/fstab has a line in for one of my test mounts:
tmpfs/tlo tmpfs  size=4G   0 0
and that "size=4G" is what causes the problem: because each time
shmem_parse_options(fc, data) is called for a remount, data (that is,
options) points to a string starting with "size=4G,", followed by
what's actually been asked for in the remount options.

So if I try
mount -o remount,size=0 /tlo
that succeeds, setting the filesystem size to 0 meaning unlimited.
So if then as a test I try
mount -o remount,size=1M /tlo
that correctly fails with "Cannot retroactively limit size".
But then when I try
mount -o remount,nr_inodes=0 /tlo
I again get "Cannot retroactively limit size",
when it should have succeeded (again, 0 here meaning unlimited).

That's because the options in shmem_parse_options() are
"size=4G,nr_inodes=0", which indeed looks like an attempt to
retroactively limit size; but the user never asked "size=4G" there.

I think this problem, and some of what's fixed below, predate your
rework, and would equally affect the version in mmotm: I just didn't
discover these issues when I was testing that before.

Hugh

--- aviro/mm/shmem.c2019-09-09 14:10:34.379832855 -0700
+++ hughd/mm/shmem.c2019-09-09 23:29:28.467037895 -0700
@@ -3456,7 +3456,7 @@ static int shmem_parse_one(struct fs_con
ctx->huge = result.uint_32;
if (ctx->huge != SHMEM_HUGE_NEVER &&
!(IS_ENABLED(CONFIG_TRANSPARENT_HUGE_PAGECACHE) &&
-   has_transparent_hugepage()))
+ has_transparent_hugepage()))
goto unsupported_parameter;
ctx->seen |= SHMEM_SEEN_HUGE;
break;
@@ -3532,26 +3532,26 @@ static int shmem_reconfigure(struct fs_c
 
    spin_lock(&sbinfo->stat_lock);
inodes = sbinfo->max_inodes - sbinfo->free_inodes;
-   if (ctx->seen & SHMEM_SEEN_BLOCKS) {
+   if ((ctx->seen & SHMEM_SEEN_BLOCKS) && ctx->blocks) {
+   if (!sbinfo->max_blocks) {
+   err = "Cannot retroactively limit size";
+   goto out;
+   }
    if (percpu_counter_compare(&sbinfo->used_blocks,
   ctx->blocks) > 0) {
err = "Too small a size for current use";
goto out;
}
-   if (ctx->blocks && !sbinfo->max_blocks) {
-   err = "Cannot retroactively limit nr_blocks";
+   }
+   if ((ctx->seen & SHMEM_SEEN_INODES) && ctx->inodes) {
+   if (!sbinfo->max_inodes) {
+   err = "Cannot retroactively limit inodes";
goto out;
}
-   }
-   if (ctx->seen & SHMEM_SEEN_INODES) 

Re: [RFC PATCH 3/4] virtio: introduce a mdev based transport

2019-09-10 Thread Jason Wang



On 2019/9/11 上午9:47, Tiwei Bie wrote:

On Tue, Sep 10, 2019 at 04:19:34PM +0800, Jason Wang wrote:

This path introduces a new mdev transport for virtio. This is used to
use kernel virtio driver to drive the mediated device that is capable
of populating virtqueue directly.

A new virtio-mdev driver will be registered to the mdev bus, when a
new virtio-mdev device is probed, it will register the device with
mdev based config ops. This means, unlike the exist hardware
transport, this is a software transport between mdev driver and mdev
device. The transport was implemented through:

- configuration access was implemented through parent_ops->read()/write()
- vq/config callback was implemented through parent_ops->ioctl()

This transport is derived from virtio MMIO protocol and was wrote for
kernel driver. But for the transport itself, but the design goal is to
be generic enough to support userspace driver (this part will be added
in the future).

Note:
- current mdev assume all the parameter of parent_ops was from
   userspace. This prevents us from implementing the kernel mdev
   driver. For a quick POC, this patch just abuse those parameter and
   assume the mdev device implementation will treat them as kernel
   pointer. This should be addressed in the formal series by extending
   mdev_parent_ops.
- for a quick POC, I just drive the transport from MMIO, I'm pretty
   there's lot of optimization space for this.

Signed-off-by: Jason Wang 
---
  drivers/vfio/mdev/Kconfig|   7 +
  drivers/vfio/mdev/Makefile   |   1 +
  drivers/vfio/mdev/virtio_mdev.c  | 500 +++
  include/uapi/linux/virtio_mdev.h | 131 
  4 files changed, 639 insertions(+)
  create mode 100644 drivers/vfio/mdev/virtio_mdev.c
  create mode 100644 include/uapi/linux/virtio_mdev.h


[...]

diff --git a/include/uapi/linux/virtio_mdev.h b/include/uapi/linux/virtio_mdev.h
new file mode 100644
index ..8040de6b960a
--- /dev/null
+++ b/include/uapi/linux/virtio_mdev.h
@@ -0,0 +1,131 @@
+/*
+ * Virtio mediated device driver
+ *
+ * Copyright 2019, Red Hat Corp.
+ *
+ * Based on Virtio MMIO driver by ARM Ltd, copyright ARM Ltd. 2011
+ *
+ * This header is BSD licensed so anyone can use the definitions to implement
+ * compatible drivers/servers.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *notice, this list of conditions and the following disclaimer in the
+ *documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of IBM nor the names of its contributors
+ *may be used to endorse or promote products derived from this software
+ *without specific prior written permission.
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS 
IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+#ifndef _LINUX_VIRTIO_MDEV_H
+#define _LINUX_VIRTIO_MDEV_H
+
+#include 
+#include 
+#include 
+
+/*
+ * Ioctls
+ */
+
+struct virtio_mdev_callback {
+   irqreturn_t (*callback)(void *);
+   void *private;
+};
+
+#define VIRTIO_MDEV 0xAF
+#define VIRTIO_MDEV_SET_VQ_CALLBACK _IOW(VIRTIO_MDEV, 0x00, \
+struct virtio_mdev_callback)
+#define VIRTIO_MDEV_SET_CONFIG_CALLBACK _IOW(VIRTIO_MDEV, 0x01, \
+   struct virtio_mdev_callback)
+
+#define VIRTIO_MDEV_DEVICE_API_STRING  "virtio-mdev"
+
+/*
+ * Control registers
+ */
+
+/* Magic value ("virt" string) - Read Only */
+#define VIRTIO_MDEV_MAGIC_VALUE0x000
+
+/* Virtio device version - Read Only */
+#define VIRTIO_MDEV_VERSION0x004
+
+/* Virtio device ID - Read Only */
+#define VIRTIO_MDEV_DEVICE_ID  0x008
+
+/* Virtio vendor ID - Read Only */
+#define VIRTIO_MDEV_VENDOR_ID  0x00c
+
+/* Bitmask of the features supported by the device (host)
+ * (32 bits per set) - Read Only */
+#define VIRTIO_MDEV_DEVICE_FEATURES0x010
+
+/* Device (host) features set selector - Write Only */
+#define 

[PATCH v2] rtl8xxxu: add bluetooth co-existence support for single antenna

2019-09-10 Thread Chris Chiu
The RTL8723BU suffers from wifi disconnections while a bluetooth
device is connected. While wifi is doing tx/rx, bluetooth scans
return no results. This is because wifi and bluetooth share a
single antenna for RF communication and need a mechanism to
collaborate.

BT information is provided via packets sent from the co-processor to
the host (C2H). They contain the BT status, but rtl8723bu_handle_c2h
does not really handle them, and there is no bluetooth coexistence
mechanism to act on them.

This commit adds a workqueue that sets the tdma configuration and
coefficient table according to the parsed bluetooth link status and
the given wifi connection state. The tdma/coef tables come from the
vendor driver code for the RTL8192EU and RTL8723BU. However, this
commit only covers the single-antenna scenario, while the RTL8192EU
defaults to dual antenna. rtl8xxxu_parse_rxdesc24, which invokes the
c2h handler, is only used by the 8723b and 8192e, so the mechanism is
expected to work on both chips with a single antenna. Note that
RTL8192EU dual antenna is not supported.
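
The deferral itself follows the usual queue-and-schedule pattern; as a
rough sketch only (rtl8xxxu_queue_c2h() is a made-up name and the actual
patch body differs in detail), the rx path queues the C2H event and lets
the work item parse it in process context:

static void rtl8xxxu_queue_c2h(struct rtl8xxxu_priv *priv,
			       struct sk_buff *skb)
{
	unsigned long flags;

	spin_lock_irqsave(&priv->c2hcmd_lock, flags);
	__skb_queue_tail(&priv->c2hcmd_queue, skb);
	spin_unlock_irqrestore(&priv->c2hcmd_lock, flags);

	schedule_work(&priv->c2hcmd_work);
}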

Signed-off-by: Chris Chiu 
---

Notes:
  v2:
   - Add helper functions to replace bunch of tdma settings
   - Reformat some lines to meet kernel coding style


 .../net/wireless/realtek/rtl8xxxu/rtl8xxxu.h  |  37 +++
 .../realtek/rtl8xxxu/rtl8xxxu_8723b.c |   2 -
 .../wireless/realtek/rtl8xxxu/rtl8xxxu_core.c | 262 +-
 3 files changed, 294 insertions(+), 7 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h 
b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h
index 582c2a346cec..22e95b11bfbb 100644
--- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h
+++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h
@@ -1220,6 +1220,37 @@ enum ratr_table_mode_new {
RATEID_IDX_BGN_3SS = 14,
 };
 
+#define BT_INFO_8723B_1ANT_B_FTP   BIT(7)
+#define BT_INFO_8723B_1ANT_B_A2DP  BIT(6)
+#define BT_INFO_8723B_1ANT_B_HID   BIT(5)
+#define BT_INFO_8723B_1ANT_B_SCO_BUSY  BIT(4)
+#define BT_INFO_8723B_1ANT_B_ACL_BUSY  BIT(3)
+#define BT_INFO_8723B_1ANT_B_INQ_PAGE  BIT(2)
+#define BT_INFO_8723B_1ANT_B_SCO_ESCO  BIT(1)
+#define BT_INFO_8723B_1ANT_B_CONNECTIONBIT(0)
+
+enum _BT_8723B_1ANT_STATUS {
+   BT_8723B_1ANT_STATUS_NON_CONNECTED_IDLE  = 0x0,
+   BT_8723B_1ANT_STATUS_CONNECTED_IDLE  = 0x1,
+   BT_8723B_1ANT_STATUS_INQ_PAGE= 0x2,
+   BT_8723B_1ANT_STATUS_ACL_BUSY= 0x3,
+   BT_8723B_1ANT_STATUS_SCO_BUSY= 0x4,
+   BT_8723B_1ANT_STATUS_ACL_SCO_BUSY= 0x5,
+   BT_8723B_1ANT_STATUS_MAX
+};
+
+struct rtl8xxxu_btcoex {
+   u8  bt_status;
+   boolbt_busy;
+   boolhas_sco;
+   boolhas_a2dp;
+   boolhas_hid;
+   boolhas_pan;
+   boolhid_only;
+   boola2dp_only;
+   boolc2h_bt_inquiry;
+};
+
 #define RTL8XXXU_RATR_STA_INIT 0
 #define RTL8XXXU_RATR_STA_HIGH 1
 #define RTL8XXXU_RATR_STA_MID  2
@@ -1340,6 +1371,10 @@ struct rtl8xxxu_priv {
 */
struct ieee80211_vif *vif;
struct delayed_work ra_watchdog;
+   struct work_struct c2hcmd_work;
+   struct sk_buff_head c2hcmd_queue;
+   spinlock_t c2hcmd_lock;
+   struct rtl8xxxu_btcoex bt_coex;
 };
 
 struct rtl8xxxu_rx_urb {
@@ -1486,6 +1521,8 @@ void rtl8xxxu_fill_txdesc_v2(struct ieee80211_hw *hw, 
struct ieee80211_hdr *hdr,
 struct rtl8xxxu_txdesc32 *tx_desc32, bool sgi,
 bool short_preamble, bool ampdu_enable,
 u32 rts_rate);
+void rtl8723bu_set_ps_tdma(struct rtl8xxxu_priv *priv,
+  u8 arg1, u8 arg2, u8 arg3, u8 arg4, u8 arg5);
 
 extern struct rtl8xxxu_fileops rtl8192cu_fops;
 extern struct rtl8xxxu_fileops rtl8192eu_fops;
diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_8723b.c 
b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_8723b.c
index ceffe05bd65b..9ba661b3d767 100644
--- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_8723b.c
+++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_8723b.c
@@ -1580,9 +1580,7 @@ static void rtl8723b_enable_rf(struct rtl8xxxu_priv *priv)
/*
 * Software control, antenna at WiFi side
 */
-#ifdef NEED_PS_TDMA
rtl8723bu_set_ps_tdma(priv, 0x08, 0x00, 0x00, 0x00, 0x00);
-#endif
 
rtl8xxxu_write32(priv, REG_BT_COEX_TABLE1, 0x);
rtl8xxxu_write32(priv, REG_BT_COEX_TABLE2, 0x);
diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c 
b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
index a6f358b9e447..e4c1b08c8070 100644
--- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
+++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c
@@ -3820,9 +3820,8 @@ void rtl8xxxu_power_off(struct rtl8xxxu_priv *priv)
rtl8xxxu_write8(priv, REG_RSV_CTRL, 0x0e);
 }
 
-#ifdef 
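For readers following the series, below is a rough, hypothetical sketch (not
part of the patch above) of how the C2H BT-info byte could be decoded into
the rtl8xxxu_btcoex fields using the bit masks and enum the patch introduces;
the helper name and the exact status policy are illustrative assumptions
only, the real workqueue handler may differ.

/* Hypothetical helper, for illustration only: decode one BT-info byte
 * (as delivered in the C2H payload) into the coexistence state, using
 * the definitions added to rtl8xxxu.h above.
 */
static void rtl8xxxu_btcoex_parse_bt_info(struct rtl8xxxu_btcoex *btcoex,
					  u8 bt_info)
{
	btcoex->c2h_bt_inquiry = !!(bt_info & BT_INFO_8723B_1ANT_B_INQ_PAGE);
	btcoex->has_sco = !!(bt_info & (BT_INFO_8723B_1ANT_B_SCO_ESCO |
					BT_INFO_8723B_1ANT_B_SCO_BUSY));
	btcoex->has_a2dp = !!(bt_info & BT_INFO_8723B_1ANT_B_A2DP);
	btcoex->has_hid = !!(bt_info & BT_INFO_8723B_1ANT_B_HID);
	btcoex->has_pan = !!(bt_info & BT_INFO_8723B_1ANT_B_FTP);
	btcoex->bt_busy = !!(bt_info & BT_INFO_8723B_1ANT_B_ACL_BUSY);

	/* Derived "only" flags used when picking a tdma/coef table. */
	btcoex->hid_only = btcoex->has_hid &&
			   !(btcoex->has_a2dp || btcoex->has_pan);
	btcoex->a2dp_only = btcoex->has_a2dp &&
			    !(btcoex->has_hid || btcoex->has_pan);

	if (!(bt_info & BT_INFO_8723B_1ANT_B_CONNECTION))
		btcoex->bt_status = BT_8723B_1ANT_STATUS_NON_CONNECTED_IDLE;
	else if (btcoex->c2h_bt_inquiry)
		btcoex->bt_status = BT_8723B_1ANT_STATUS_INQ_PAGE;
	else if (btcoex->bt_busy)
		btcoex->bt_status = BT_8723B_1ANT_STATUS_ACL_BUSY;
	else
		btcoex->bt_status = BT_8723B_1ANT_STATUS_CONNECTED_IDLE;
}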

Re: [PATCH v5 1/2] dt-bindings: mailbox: add binding doc for the ARM SMC/HVC mailbox

2019-09-10 Thread Jassi Brar
On Mon, Sep 9, 2019 at 10:42 AM Andre Przywara  wrote:
>
> On Wed, 28 Aug 2019 03:02:58 +
> Peng Fan  wrote:
>
> Hi,
>
> sorry for the late reply, eventually managed to have a closer look on this.
>
> > From: Peng Fan 
> >
> > The ARM SMC/HVC mailbox binding describes a firmware interface to trigger
> > actions in software layers running in the EL2 or EL3 exception levels.
> > The term "ARM" here relates to the SMC instruction as part of the ARM
> > instruction set, not as a standard endorsed by ARM Ltd.
> >
> > Signed-off-by: Peng Fan 
> > ---
> >  .../devicetree/bindings/mailbox/arm-smc.yaml   | 125 
> > +
> >  1 file changed, 125 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/mailbox/arm-smc.yaml
> >
> > diff --git a/Documentation/devicetree/bindings/mailbox/arm-smc.yaml 
> > b/Documentation/devicetree/bindings/mailbox/arm-smc.yaml
> > new file mode 100644
> > index ..f8eb28d5e307
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/mailbox/arm-smc.yaml
> > @@ -0,0 +1,125 @@
> > +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
> > +%YAML 1.2
> > +---
> > +$id: http://devicetree.org/schemas/mailbox/arm-smc.yaml#
> > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > +
> > +title: ARM SMC Mailbox Interface
> > +
> > +maintainers:
> > +  - Peng Fan 
> > +
> > +description: |
> > +  This mailbox uses the ARM smc (secure monitor call) and hvc (hypervisor
> > +  call) instruction to trigger a mailbox-connected activity in firmware,
> > +  executing on the very same core as the caller. By nature this operation
> > +  is synchronous and this mailbox provides no way for asynchronous messages
> > +  to be delivered the other way round, from firmware to the OS, but
> > +  asynchronous notification could also be supported. However the value of
> > +  r0/w0/x0 the firmware returns after the smc call is delivered as a 
> > received
> > +  message to the mailbox framework, so a synchronous communication can be
> > +  established, for a asynchronous notification, no value will be returned.
> > +  The exact meaning of both the action the mailbox triggers as well as the
> > +  return value is defined by their users and is not subject to this 
> > binding.
> > +
> > +  One use case of this mailbox is the SCMI interface, which uses shared 
> > memory
> > +  to transfer commands and parameters, and a mailbox to trigger a function
> > +  call. This allows SoCs without a separate management processor (or when
> > +  such a processor is not available or used) to use this standardized
> > +  interface anyway.
> > +
> > +  This binding describes no hardware, but establishes a firmware interface.
> > +  Upon receiving an SMC using one of the described SMC function 
> > identifiers,
> > +  the firmware is expected to trigger some mailbox connected functionality.
> > +  The communication follows the ARM SMC calling convention.
> > +  Firmware expects an SMC function identifier in r0 or w0. The supported
> > +  identifiers are passed from consumers, or listed in the the arm,func-ids
> > +  properties as described below. The firmware can return one value in
> > +  the first SMC result register, it is expected to be an error value,
> > +  which shall be propagated to the mailbox client.
> > +
> > +  Any core which supports the SMC or HVC instruction can be used, as long 
> > as
> > +  a firmware component running in EL3 or EL2 is handling these calls.
> > +
> > +properties:
> > +  compatible:
> > +const: arm,smc-mbox
> > +
> > +  "#mbox-cells":
> > +const: 1
> > +
> > +  arm,num-chans:
> > +description: The number of channels supported.
> > +items:
> > +  minimum: 1
> > +  maximum: 4096 # Should be enough?
>
> This maximum sounds rather arbitrary. Why do we need one? In the driver this 
> just allocates more memory, so why not just impose no artificial limit at all?
>
This will be gone once the driver is converted to one channel per controller.

> Actually, do we need this property at all? Can't we just rely on the size of 
> arm,func-ids to determine this (using of_property_count_elems_of_size() in 
> the driver)? Having both sounds redundant and brings up the question what to 
> do if they don't match.
>

> > +
> > +  method:
> > +- enum:
> > +- smc
> > +- hvc
> > +
> > +  transports:
> > +- enum:
> > +- mem
> > +- reg
>
> Shouldn't there be a description on what both mean, exactly?
> For instance I would expect a list of registers to be shown for the "reg" 
> case, and be it by referring to the ARM SMCCC.
>
> Also looking at the driver this brings up more questions:
> - Which memory does mem refer to? If this is really the means of transport, 
> it should be referenced in this *controller* node and populated by the 
> driver. Looking at the example below and the driver code, it actually isn't 
> used that way, instead the memory is used and controlled by the mailbox 
> *client*.
> - 

[PATCH 2/2] ASoC: fsl_mqs: Add MQS component driver

2019-09-10 Thread Shengjiu Wang
MQS (medium quality sound) is used to generate medium quality
audio via a standard digital output pin. It can be used to
connect stereo speakers or headphones simply via power amplifier
stages without an additional DAC chip. It only accepts 2-channel,
LSB-valid 16-bit, MSB shift-out first, frame sync asserting with
the first bit of the frame, data shifted on the posedge of the
bit clock, 44.1 kHz or 48 kHz signals from SAI1 in left justified
format; and it provides an SNR target of no more than 20 dB for
signals below 10 kHz. Signals above 10 kHz will have worse
THD+N values.

MQS provides only simple audio reproduction. No internal pop,
click or distortion artifact reduction methods are provided.

The MQS receives the audio data from the SAI1 Tx section.

Signed-off-by: Shengjiu Wang 
---
 sound/soc/fsl/Kconfig   |  10 ++
 sound/soc/fsl/Makefile  |   2 +
 sound/soc/fsl/fsl_mqs.c | 336 
 3 files changed, 348 insertions(+)
 create mode 100644 sound/soc/fsl/fsl_mqs.c

diff --git a/sound/soc/fsl/Kconfig b/sound/soc/fsl/Kconfig
index aa99c008a925..65e8cd4be930 100644
--- a/sound/soc/fsl/Kconfig
+++ b/sound/soc/fsl/Kconfig
@@ -25,6 +25,16 @@ config SND_SOC_FSL_SAI
  This option is only useful for out-of-tree drivers since
  in-tree drivers select it automatically.
 
+config SND_SOC_FSL_MQS
+   tristate "Medium Quality Sound (MQS) module support"
+   depends on SND_SOC_FSL_SAI
+   select REGMAP_MMIO
+   help
+ Say Y if you want to add Medium Quality Sound (MQS)
+ support for the Freescale CPUs.
+ This option is only useful for out-of-tree drivers since
+ in-tree drivers select it automatically.
+
 config SND_SOC_FSL_AUDMIX
tristate "Audio Mixer (AUDMIX) module support"
select REGMAP_MMIO
diff --git a/sound/soc/fsl/Makefile b/sound/soc/fsl/Makefile
index c0dd04422fe9..8cde88c72d93 100644
--- a/sound/soc/fsl/Makefile
+++ b/sound/soc/fsl/Makefile
@@ -23,6 +23,7 @@ snd-soc-fsl-esai-objs := fsl_esai.o
 snd-soc-fsl-micfil-objs := fsl_micfil.o
 snd-soc-fsl-utils-objs := fsl_utils.o
 snd-soc-fsl-dma-objs := fsl_dma.o
+snd-soc-fsl-mqs-objs := fsl_mqs.o
 
 obj-$(CONFIG_SND_SOC_FSL_AUDMIX) += snd-soc-fsl-audmix.o
 obj-$(CONFIG_SND_SOC_FSL_ASOC_CARD) += snd-soc-fsl-asoc-card.o
@@ -33,6 +34,7 @@ obj-$(CONFIG_SND_SOC_FSL_SPDIF) += snd-soc-fsl-spdif.o
 obj-$(CONFIG_SND_SOC_FSL_ESAI) += snd-soc-fsl-esai.o
 obj-$(CONFIG_SND_SOC_FSL_MICFIL) += snd-soc-fsl-micfil.o
 obj-$(CONFIG_SND_SOC_FSL_UTILS) += snd-soc-fsl-utils.o
+obj-$(CONFIG_SND_SOC_FSL_MQS) += snd-soc-fsl-mqs.o
 obj-$(CONFIG_SND_SOC_POWERPC_DMA) += snd-soc-fsl-dma.o
 
 # MPC5200 Platform Support
diff --git a/sound/soc/fsl/fsl_mqs.c b/sound/soc/fsl/fsl_mqs.c
new file mode 100644
index ..d164f5da3460
--- /dev/null
+++ b/sound/soc/fsl/fsl_mqs.c
@@ -0,0 +1,336 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ALSA SoC IMX MQS driver
+ *
+ * Copyright (C) 2014-2019 Freescale Semiconductor, Inc.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define REG_MQS_CTRL   0x00
+
+#define MQS_EN_MASK(0x1 << 28)
+#define MQS_EN_SHIFT   (28)
+#define MQS_SW_RST_MASK(0x1 << 24)
+#define MQS_SW_RST_SHIFT   (24)
+#define MQS_OVERSAMPLE_MASK(0x1 << 20)
+#define MQS_OVERSAMPLE_SHIFT   (20)
+#define MQS_CLK_DIV_MASK   (0xFF << 0)
+#define MQS_CLK_DIV_SHIFT  (0)
+
+/* codec private data */
+struct fsl_mqs {
+   struct regmap *regmap;
+   struct clk *mclk;
+   struct clk *ipg;
+
+   unsigned int reg_iomuxc_gpr2;
+   unsigned int reg_mqs_ctrl;
+   bool use_gpr;
+};
+
+#define FSL_MQS_RATES  (SNDRV_PCM_RATE_44100 | SNDRV_PCM_RATE_48000)
+#define FSL_MQS_FORMATSSNDRV_PCM_FMTBIT_S16_LE
+
+static int fsl_mqs_hw_params(struct snd_pcm_substream *substream,
+struct snd_pcm_hw_params *params,
+struct snd_soc_dai *dai)
+{
+   struct snd_soc_component *component = dai->component;
+   struct fsl_mqs *mqs_priv = snd_soc_component_get_drvdata(component);
+   unsigned long mclk_rate;
+   int div, res;
+   int bclk, lrclk;
+
+   mclk_rate = clk_get_rate(mqs_priv->mclk);
+   bclk = snd_soc_params_to_bclk(params);
+   lrclk = params_rate(params);
+
+   /*
+* mclk_rate / (oversample(32,64) * FS * 2 * divider ) = repeat_rate;
+* if repeat_rate is 8, mqs can achieve better quality.
+* oversample rate is fixed to 32 currently.
+*/
+   div = mclk_rate / (32 * lrclk * 2 * 8);
+   res = mclk_rate % (32 * lrclk * 2 * 8);
+
+   if (res == 0 && div > 0 && div <= 256) {
+   if (mqs_priv->use_gpr) {
+   regmap_update_bits(mqs_priv->regmap, IOMUXC_GPR2,
+ 

[PATCH 1/2] ASoC: fsl_mqs: add DT binding documentation

2019-09-10 Thread Shengjiu Wang
Add the DT binding documentation for NXP MQS driver

Signed-off-by: Shengjiu Wang 
---
 .../devicetree/bindings/sound/fsl,mqs.txt | 20 +++
 1 file changed, 20 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/sound/fsl,mqs.txt

diff --git a/Documentation/devicetree/bindings/sound/fsl,mqs.txt 
b/Documentation/devicetree/bindings/sound/fsl,mqs.txt
new file mode 100644
index ..a1dbe181204a
--- /dev/null
+++ b/Documentation/devicetree/bindings/sound/fsl,mqs.txt
@@ -0,0 +1,20 @@
+fsl,mqs audio CODEC
+
+Required properties:
+
+  - compatible : Must contain one of "fsl,imx6sx-mqs", "fsl,codec-mqs"
+   "fsl,imx8qm-mqs", "fsl,imx8qxp-mqs".
+  - clocks : A list of phandles + clock-specifiers, one for each entry in
+clock-names
+  - clock-names : Must contain "mclk"
+  - gpr : The gpr node.
+
+Example:
+
+mqs: mqs {
+   compatible = "fsl,imx6sx-mqs";
+   gpr = <>;
+   clocks = < IMX6SX_CLK_SAI1>;
+   clock-names = "mclk";
+   status = "disabled";
+};
-- 
2.21.0
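As a worked example of the divider formula in the fsl_mqs driver patch above
(the 24.576 MHz mclk rate here is an assumed, illustrative value, not taken
from the patch): with lrclk = 48000 Hz, oversample = 32 and repeat_rate = 8,

	div = 24576000 / (32 * 48000 * 2 * 8) = 1
	res = 24576000 % (32 * 48000 * 2 * 8) = 0

so the "res == 0 && div > 0 && div <= 256" check in fsl_mqs_hw_params()
passes and the clock divider can be programmed.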



[PATCH V2 net-next 5/7] net: hns3: modify some logs format

2019-09-10 Thread Huazhong Tan
From: Guangbin Huang 

The pfc_en and pfc_map need to be displayed in hexadecimal notation,
dma addresses should be printed with %pad, and printed strings need
to end with "\n".

This patch modifies them accordingly.

Signed-off-by: Guangbin Huang 
Signed-off-by: Huazhong Tan 
---
 drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c  | 7 +--
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c  | 2 +-
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 2 +-
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c
index 5cf4c1e..28961a6 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c
@@ -166,6 +166,7 @@ static int hns3_dbg_bd_info(struct hnae3_handle *h, const 
char *cmd_buf)
struct hns3_enet_ring *ring;
u32 tx_index, rx_index;
u32 q_num, value;
+   dma_addr_t addr;
int cnt;
 
cnt = sscanf(_buf[8], "%u %u", _num, _index);
@@ -194,8 +195,9 @@ static int hns3_dbg_bd_info(struct hnae3_handle *h, const 
char *cmd_buf)
}
 
tx_desc = >desc[tx_index];
+   addr = le64_to_cpu(tx_desc->addr);
dev_info(dev, "TX Queue Num: %u, BD Index: %u\n", q_num, tx_index);
-   dev_info(dev, "(TX)addr: 0x%llx\n", tx_desc->addr);
+   dev_info(dev, "(TX)addr: %pad\n", );
dev_info(dev, "(TX)vlan_tag: %u\n", tx_desc->tx.vlan_tag);
dev_info(dev, "(TX)send_size: %u\n", tx_desc->tx.send_size);
dev_info(dev, "(TX)vlan_tso: %u\n", tx_desc->tx.type_cs_vlan_tso);
@@ -217,8 +219,9 @@ static int hns3_dbg_bd_info(struct hnae3_handle *h, const 
char *cmd_buf)
rx_index = (cnt == 1) ? value : tx_index;
rx_desc  = >desc[rx_index];
 
+   addr = le64_to_cpu(rx_desc->addr);
dev_info(dev, "RX Queue Num: %u, BD Index: %u\n", q_num, rx_index);
-   dev_info(dev, "(RX)addr: 0x%llx\n", rx_desc->addr);
+   dev_info(dev, "(RX)addr: %pad\n", );
dev_info(dev, "(RX)l234_info: %u\n", rx_desc->rx.l234_info);
dev_info(dev, "(RX)pkt_len: %u\n", rx_desc->rx.pkt_len);
dev_info(dev, "(RX)size: %u\n", rx_desc->rx.size);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
index 816f920..c063301 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c
@@ -342,7 +342,7 @@ static int hclge_ieee_setpfc(struct hnae3_handle *h, struct 
ieee_pfc *pfc)
hdev->tm_info.pfc_en = pfc->pfc_en;
 
netif_dbg(h, drv, netdev,
- "set pfc: pfc_en=%u, pfc_map=%u, num_tc=%u\n",
+ "set pfc: pfc_en=%x, pfc_map=%x, num_tc=%u\n",
  pfc->pfc_en, pfc_map, hdev->tm_info.num_tc);
 
hclge_tm_pfc_info_update(hdev);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 8d4dc1b..bc5bad3 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -3751,7 +3751,7 @@ static void hclge_reset_event(struct pci_dev *pdev, 
struct hnae3_handle *handle)
else if (time_after(jiffies, (hdev->last_reset_time + 4 * 5 * HZ)))
hdev->reset_level = HNAE3_FUNC_RESET;
 
-   dev_info(>pdev->dev, "received reset event , reset type is %d",
+   dev_info(>pdev->dev, "received reset event, reset type is %d\n",
 hdev->reset_level);
 
/* request reset & schedule reset task */
-- 
2.7.4



[PATCH V2 net-next 3/7] net: hns3: fix shaper parameter algorithm

2019-09-10 Thread Huazhong Tan
From: Yonglong Liu 

Currently, when the hns3 driver configures the tm shaper to limit
bandwidth below 20 Mbit using the parameters calculated by
hclge_shaper_para_calc(), the actual bandwidth limited by the tm
hardware module is not accurate enough; for example, it is 1.28 Mbit
when the user configures 1 Mbit.

This patch adjusts ir_calc to be closer to ir and always calculates
the ir_b parameter when the user configures a small bandwidth. It
also removes an unnecessary parenthesis when calculating the
denominator.

Fixes: 848440544b41 ("net: hns3: Add support of TX Scheduler & Shaper to HNS3 
driver")
Signed-off-by: Yonglong Liu 
Signed-off-by: Huazhong Tan 
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
index e829101..9f0e35f 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c
@@ -81,16 +81,13 @@ static int hclge_shaper_para_calc(u32 ir, u8 shaper_level,
return 0;
} else if (ir_calc > ir) {
/* Increasing the denominator to select ir_s value */
-   while (ir_calc > ir) {
+   while (ir_calc >= ir && ir) {
ir_s_calc++;
ir_calc = DIVISOR_IR_B_126 / (tick * (1 << ir_s_calc));
}
 
-   if (ir_calc == ir)
-   *ir_b = 126;
-   else
-   *ir_b = (ir * tick * (1 << ir_s_calc) +
-(DIVISOR_CLK >> 1)) / DIVISOR_CLK;
+   *ir_b = (ir * tick * (1 << ir_s_calc) + (DIVISOR_CLK >> 1)) /
+   DIVISOR_CLK;
} else {
/* Increasing the numerator to select ir_u value */
u32 numerator;
@@ -104,7 +101,7 @@ static int hclge_shaper_para_calc(u32 ir, u8 shaper_level,
if (ir_calc == ir) {
*ir_b = 126;
} else {
-   u32 denominator = (DIVISOR_CLK * (1 << --ir_u_calc));
+   u32 denominator = DIVISOR_CLK * (1 << --ir_u_calc);
*ir_b = (ir * tick + (denominator >> 1)) / denominator;
}
}
-- 
2.7.4
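A small self-contained illustration of the rounded division used in the
hunks above; the DIVISOR_CLK value below is a placeholder assumption for
the sketch, not the driver's real constant.

/* For non-negative operands and an even divisor, (x + (d >> 1)) / d is
 * round-to-nearest division, i.e. equivalent to the kernel's
 * DIV_ROUND_CLOSEST(x, d).  This is the expression the patch now uses
 * unconditionally for ir_b in the ir_calc > ir branch.
 */
#define DIVISOR_CLK 8000	/* placeholder value for illustration */

static unsigned int ir_b_calc(unsigned int ir, unsigned int tick,
			      unsigned int ir_s_calc)
{
	return (ir * tick * (1U << ir_s_calc) + (DIVISOR_CLK >> 1)) /
	       DIVISOR_CLK;
}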



[PATCH V2 net-next 1/7] net: hns3: add ethtool_ops.set_channels support for HNS3 VF driver

2019-09-10 Thread Huazhong Tan
From: Guangbin Huang 

This patch adds ethtool_ops.set_channels support for the HNS3 VF
driver and updates the related TQP and RSS information to support
modification of the VF TQP number, and it uses the current rss_size
instead of max_rss_size to initialize RSS.

Also, it fixes a format error in hclgevf_get_rss().

Signed-off-by: Guangbin Huang 
Signed-off-by: Huazhong Tan 
---
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c |  1 +
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c  | 83 --
 2 files changed, 79 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index aa692b1..f5a681d 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -1397,6 +1397,7 @@ static const struct ethtool_ops hns3vf_ethtool_ops = {
.set_rxfh = hns3_set_rss,
.get_link_ksettings = hns3_get_link_ksettings,
.get_channels = hns3_get_channels,
+   .set_channels = hns3_set_channels,
.get_coalesce = hns3_get_coalesce,
.set_coalesce = hns3_set_coalesce,
.get_regs_len = hns3_get_regs_len,
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
index 594cae8..e3090b3 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c
@@ -743,7 +743,7 @@ static int hclgevf_get_rss(struct hnae3_handle *handle, u32 
*indir, u8 *key,
 }
 
 static int hclgevf_set_rss(struct hnae3_handle *handle, const u32 *indir,
-  const  u8 *key, const  u8 hfunc)
+  const u8 *key, const u8 hfunc)
 {
struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
struct hclgevf_rss_cfg *rss_cfg = >rss_cfg;
@@ -2060,9 +2060,10 @@ static int hclgevf_config_gro(struct hclgevf_dev *hdev, 
bool en)
 static int hclgevf_rss_init_hw(struct hclgevf_dev *hdev)
 {
struct hclgevf_rss_cfg *rss_cfg = >rss_cfg;
-   int i, ret;
+   int ret;
+   u32 i;
 
-   rss_cfg->rss_size = hdev->rss_size_max;
+   rss_cfg->rss_size = hdev->nic.kinfo.rss_size;
 
if (hdev->pdev->revision >= 0x21) {
rss_cfg->hash_algo = HCLGEVF_RSS_HASH_ALGO_SIMPLE;
@@ -2099,13 +2100,13 @@ static int hclgevf_rss_init_hw(struct hclgevf_dev *hdev)
 
/* Initialize RSS indirect table */
for (i = 0; i < HCLGEVF_RSS_IND_TBL_SIZE; i++)
-   rss_cfg->rss_indirection_tbl[i] = i % hdev->rss_size_max;
+   rss_cfg->rss_indirection_tbl[i] = i % rss_cfg->rss_size;
 
ret = hclgevf_set_rss_indir_table(hdev);
if (ret)
return ret;
 
-   return hclgevf_set_rss_tc_mode(hdev, hdev->rss_size_max);
+   return hclgevf_set_rss_tc_mode(hdev, rss_cfg->rss_size);
 }
 
 static int hclgevf_init_vlan_config(struct hclgevf_dev *hdev)
@@ -2835,6 +2836,77 @@ static void hclgevf_get_tqps_and_rss_info(struct 
hnae3_handle *handle,
*max_rss_size = hdev->rss_size_max;
 }
 
+static void hclgevf_update_rss_size(struct hnae3_handle *handle,
+   u32 new_tqps_num)
+{
+   struct hnae3_knic_private_info *kinfo = >kinfo;
+   struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
+   u16 max_rss_size;
+
+   kinfo->req_rss_size = new_tqps_num;
+
+   max_rss_size = min_t(u16, hdev->rss_size_max,
+hdev->num_tqps / kinfo->num_tc);
+
+   /* Use the user's configuration when it is not larger than
+* max_rss_size, otherwise, use the maximum specification value.
+*/
+   if (kinfo->req_rss_size != kinfo->rss_size && kinfo->req_rss_size &&
+   kinfo->req_rss_size <= max_rss_size)
+   kinfo->rss_size = kinfo->req_rss_size;
+   else if (kinfo->rss_size > max_rss_size ||
+(!kinfo->req_rss_size && kinfo->rss_size < max_rss_size))
+   kinfo->rss_size = max_rss_size;
+
+   kinfo->num_tqps = kinfo->num_tc * kinfo->rss_size;
+}
+
+static int hclgevf_set_channels(struct hnae3_handle *handle, u32 new_tqps_num,
+   bool rxfh_configured)
+{
+   struct hclgevf_dev *hdev = hclgevf_ae_get_hdev(handle);
+   struct hnae3_knic_private_info *kinfo = >kinfo;
+   u16 cur_rss_size = kinfo->rss_size;
+   u16 cur_tqps = kinfo->num_tqps;
+   u32 *rss_indir;
+   unsigned int i;
+   int ret;
+
+   hclgevf_update_rss_size(handle, new_tqps_num);
+
+   ret = hclgevf_set_rss_tc_mode(hdev, kinfo->rss_size);
+   if (ret)
+   return ret;
+
+   /* RSS indirection table has been configured by user */
+   if (rxfh_configured)
+   goto out;
+
+   /* Reinitializes the rss indirect table according to the new RSS size */
+   rss_indir = kcalloc(HCLGEVF_RSS_IND_TBL_SIZE, 

[PATCH V2 net-next 7/7] net: hns3: add some DFX info for reset issue

2019-09-10 Thread Huazhong Tan
This patch adds more information for reset DFX. It also cleans up the
reset info, moves reset_fail_cnt into struct hclge_rst_stats, and
modifies some print formats.

Signed-off-by: Huazhong Tan 
---
 .../ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c | 32 --
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 11 
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h|  2 +-
 3 files changed, 30 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c
index 6dcce48..d0128d7 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c
@@ -931,22 +931,36 @@ static void hclge_dbg_fd_tcam(struct hclge_dev *hdev)
 
 static void hclge_dbg_dump_rst_info(struct hclge_dev *hdev)
 {
-   dev_info(>pdev->dev, "PF reset count: %d\n",
+   dev_info(>pdev->dev, "PF reset count: %u\n",
 hdev->rst_stats.pf_rst_cnt);
-   dev_info(>pdev->dev, "FLR reset count: %d\n",
+   dev_info(>pdev->dev, "FLR reset count: %u\n",
 hdev->rst_stats.flr_rst_cnt);
-   dev_info(>pdev->dev, "CORE reset count: %d\n",
-hdev->rst_stats.core_rst_cnt);
-   dev_info(>pdev->dev, "GLOBAL reset count: %d\n",
+   dev_info(>pdev->dev, "GLOBAL reset count: %u\n",
 hdev->rst_stats.global_rst_cnt);
-   dev_info(>pdev->dev, "IMP reset count: %d\n",
+   dev_info(>pdev->dev, "IMP reset count: %u\n",
 hdev->rst_stats.imp_rst_cnt);
-   dev_info(>pdev->dev, "reset done count: %d\n",
+   dev_info(>pdev->dev, "reset done count: %u\n",
 hdev->rst_stats.reset_done_cnt);
-   dev_info(>pdev->dev, "HW reset done count: %d\n",
+   dev_info(>pdev->dev, "HW reset done count: %u\n",
 hdev->rst_stats.hw_reset_done_cnt);
-   dev_info(>pdev->dev, "reset count: %d\n",
+   dev_info(>pdev->dev, "reset count: %u\n",
 hdev->rst_stats.reset_cnt);
+   dev_info(>pdev->dev, "reset count: %u\n",
+hdev->rst_stats.reset_cnt);
+   dev_info(>pdev->dev, "reset fail count: %u\n",
+hdev->rst_stats.reset_fail_cnt);
+   dev_info(>pdev->dev, "vector0 interrupt enable status: 0x%x\n",
+hclge_read_dev(>hw, HCLGE_MISC_VECTOR_REG_BASE));
+   dev_info(>pdev->dev, "reset interrupt source: 0x%x\n",
+hclge_read_dev(>hw, HCLGE_MISC_RESET_STS_REG));
+   dev_info(>pdev->dev, "reset interrupt status: 0x%x\n",
+hclge_read_dev(>hw, HCLGE_MISC_VECTOR_INT_STS));
+   dev_info(>pdev->dev, "hardware reset status: 0x%x\n",
+hclge_read_dev(>hw, HCLGE_GLOBAL_RESET_REG));
+   dev_info(>pdev->dev, "handshake status: 0x%x\n",
+hclge_read_dev(>hw, HCLGE_NIC_CSQ_DEPTH_REG));
+   dev_info(>pdev->dev, "function reset status: 0x%x\n",
+hclge_read_dev(>hw, HCLGE_FUN_RST_ING));
 }
 
 static void hclge_dbg_get_m7_stats_info(struct hclge_dev *hdev)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index bc5bad3..fd7f943 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -3547,12 +3547,12 @@ static bool hclge_reset_err_handle(struct hclge_dev 
*hdev)
 "reset failed because new reset interrupt\n");
hclge_clear_reset_cause(hdev);
return false;
-   } else if (hdev->reset_fail_cnt < MAX_RESET_FAIL_CNT) {
-   hdev->reset_fail_cnt++;
+   } else if (hdev->rst_stats.reset_fail_cnt < MAX_RESET_FAIL_CNT) {
+   hdev->rst_stats.reset_fail_cnt++;
set_bit(hdev->reset_type, >reset_pending);
dev_info(>pdev->dev,
 "re-schedule reset task(%d)\n",
-hdev->reset_fail_cnt);
+hdev->rst_stats.reset_fail_cnt);
return true;
}
 
@@ -3679,7 +3679,8 @@ static void hclge_reset(struct hclge_dev *hdev)
/* ignore RoCE notify error if it fails HCLGE_RESET_MAX_FAIL_CNT - 1
 * times
 */
-   if (ret && hdev->reset_fail_cnt < HCLGE_RESET_MAX_FAIL_CNT - 1)
+   if (ret &&
+   hdev->rst_stats.reset_fail_cnt < HCLGE_RESET_MAX_FAIL_CNT - 1)
goto err_reset;
 
rtnl_lock();
@@ -3695,7 +3696,7 @@ static void hclge_reset(struct hclge_dev *hdev)
goto err_reset;
 
hdev->last_reset_time = jiffies;
-   hdev->reset_fail_cnt = 0;
+   hdev->rst_stats.reset_fail_cnt = 0;
hdev->rst_stats.reset_done_cnt++;
ae_dev->reset_type = HNAE3_NONE_RESET;
 
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.h 

[PATCH V2 net-next 0/7] net: hns3: add a feature & bugfixes & cleanups

2019-09-10 Thread Huazhong Tan
This patch-set includes a VF feature, bugfixes and cleanups for the HNS3
ethernet controller driver.

[patch 01/07] adds ethtool_ops.set_channels support for the HNS3 VF driver.

[patch 02/07] adds recovery for when setting a new channel number fails.

[patch 03/07] fixes an error in the shaper parameter algorithm.

[patch 04/07] fixes an error related to ksettings.

[patch 05/07] adds cleanups for some log printing.

[patch 06/07] adds a NULL pointer check before a function call.

[patch 07/07] adds some debugging information for reset issues.

Change log:
V1->V2: addresses comments from David Miller.

Guangbin Huang (4):
  net: hns3: add ethtool_ops.set_channels support for HNS3 VF driver
  net: hns3: fix port setting handle for fibre port
  net: hns3: modify some logs format
  net: hns3: check NULL pointer before use

Huazhong Tan (1):
  net: hns3: add some DFX info for reset issue

Peng Li (1):
  net: hns3: revert to old channel when setting new channel num fail

Yonglong Liu (1):
  net: hns3: fix shaper parameter algorithm

 drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c |  7 +-
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c| 54 ++
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 16 +
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c |  2 +-
 .../ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c | 32 ++---
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 13 ++--
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.h|  2 +-
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_tm.c  | 11 ++-
 .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c  | 83 --
 9 files changed, 174 insertions(+), 46 deletions(-)

-- 
2.7.4



[PATCH V2 net-next 4/7] net: hns3: fix port setting handle for fibre port

2019-09-10 Thread Huazhong Tan
From: Guangbin Huang 

Since the hardware doesn't support using a specified speed and duplex
to negotiate, it's unnecessary to check and modify the port speed and
duplex for a fibre port when autoneg is on.

Fixes: 22f48e24a23d ("net: hns3: add autoneg and change speed support for fibre 
port")
Signed-off-by: Guangbin Huang 
Signed-off-by: Huazhong Tan 
---
 drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
index f5a681d..680c350 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
@@ -726,6 +726,12 @@ static int hns3_check_ksettings_param(const struct 
net_device *netdev,
u8 duplex;
int ret;
 
+   /* hw doesn't support use specified speed and duplex to negotiate,
+* unnecessary to check them when autoneg on.
+*/
+   if (cmd->base.autoneg)
+   return 0;
+
if (ops->get_ksettings_an_result) {
ops->get_ksettings_an_result(handle, , , );
if (cmd->base.autoneg == autoneg && cmd->base.speed == speed &&
@@ -787,6 +793,15 @@ static int hns3_set_link_ksettings(struct net_device 
*netdev,
return ret;
}
 
+   /* hw doesn't support use specified speed and duplex to negotiate,
+* ignore them when autoneg on.
+*/
+   if (cmd->base.autoneg) {
+   netdev_info(netdev,
+   "autoneg is on, ignore the speed and duplex\n");
+   return 0;
+   }
+
if (ops->cfg_mac_speed_dup_h)
ret = ops->cfg_mac_speed_dup_h(handle, cmd->base.speed,
   cmd->base.duplex);
-- 
2.7.4



[PATCH V2 net-next 2/7] net: hns3: revert to old channel when setting new channel num fail

2019-09-10 Thread Huazhong Tan
From: Peng Li 

After setting a new channel number, the driver needs to free the old
ring memory and allocate new ring memory. If there is not enough
memory and the new ring allocation fails, ring initialization may
fail. To make sure the network interface can still work normally, the
driver should revert the channels to the old configuration.

Signed-off-by: Peng Li 
Signed-off-by: Huazhong Tan 
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 51 ++---
 1 file changed, 37 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 9f3f8e3..8dbaf36 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -4410,6 +4410,30 @@ static int hns3_reset_notify(struct hnae3_handle *handle,
return ret;
 }
 
+static int hns3_change_channels(struct hnae3_handle *handle, u32 new_tqp_num,
+   bool rxfh_configured)
+{
+   int ret;
+
+   ret = handle->ae_algo->ops->set_channels(handle, new_tqp_num,
+rxfh_configured);
+   if (ret) {
+   dev_err(>pdev->dev,
+   "Change tqp num(%u) fail.\n", new_tqp_num);
+   return ret;
+   }
+
+   ret = hns3_reset_notify(handle, HNAE3_INIT_CLIENT);
+   if (ret)
+   return ret;
+
+   ret =  hns3_reset_notify(handle, HNAE3_UP_CLIENT);
+   if (ret)
+   hns3_reset_notify(handle, HNAE3_UNINIT_CLIENT);
+
+   return ret;
+}
+
 int hns3_set_channels(struct net_device *netdev,
  struct ethtool_channels *ch)
 {
@@ -4450,24 +4474,23 @@ int hns3_set_channels(struct net_device *netdev,
return ret;
 
org_tqp_num = h->kinfo.num_tqps;
-   ret = h->ae_algo->ops->set_channels(h, new_tqp_num, rxfh_configured);
+   ret = hns3_change_channels(h, new_tqp_num, rxfh_configured);
if (ret) {
-   ret = h->ae_algo->ops->set_channels(h, org_tqp_num,
-   rxfh_configured);
-   if (ret) {
-   /* If revert to old tqp failed, fatal error occurred */
-   dev_err(>dev,
-   "Revert to old tqp num fail, ret=%d", ret);
-   return ret;
+   int ret1;
+
+   netdev_warn(netdev,
+   "Change channels fail, revert to old value\n");
+   ret1 = hns3_change_channels(h, org_tqp_num, rxfh_configured);
+   if (ret1) {
+   netdev_err(netdev,
+  "revert to old channel fail\n");
+   return ret1;
}
-   dev_info(>dev,
-"Change tqp num fail, Revert to old tqp num");
-   }
-   ret = hns3_reset_notify(h, HNAE3_INIT_CLIENT);
-   if (ret)
+
return ret;
+   }
 
-   return hns3_reset_notify(h, HNAE3_UP_CLIENT);
+   return 0;
 }
 
 static const struct hns3_hw_error_info hns3_hw_err[] = {
-- 
2.7.4



[PATCH V2 net-next 6/7] net: hns3: check NULL pointer before use

2019-09-10 Thread Huazhong Tan
From: Guangbin Huang 

This patch checks whether ops->set_default_reset_request is NULL
before using it in hns3_slot_reset().

Signed-off-by: Guangbin Huang 
Signed-off-by: Huazhong Tan 
---
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 8dbaf36..616cad0 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -2006,7 +2006,8 @@ static pci_ers_result_t hns3_slot_reset(struct pci_dev 
*pdev)
 
ops = ae_dev->ops;
/* request the reset */
-   if (ops->reset_event && ops->get_reset_level) {
+   if (ops->reset_event && ops->get_reset_level &&
+   ops->set_default_reset_request) {
if (ae_dev->hw_err_reset_req) {
reset_type = ops->get_reset_level(ae_dev,
_dev->hw_err_reset_req);
-- 
2.7.4



Re: mtd raw nand denali.c broken for Intel/Altera Cyclone V

2019-09-10 Thread Masahiro Yamada
Hi Dinh,

On Wed, Sep 11, 2019 at 12:22 AM Dinh Nguyen  wrote:
>
>
>
> On 9/10/19 8:48 AM, Tim Sander wrote:
> > Hi
> >
> > I have noticed that my SPF records where not in place after moving the 
> > server,
> > so it seems the mail didn't go to the mailing list. Hopefully that's fixed 
> > now.
> >
> > Am Dienstag, 10. September 2019, 09:16:37 CEST schrieb Masahiro Yamada:
> >> On Fri, Sep 6, 2019 at 9:39 PM Tim Sander  wrote:
> >>> Hi
> >>>
> >>> I have noticed that there multiple breakages piling up for the denali nand
> >>> driver on the Intel/Altera Cyclone V. Unfortunately i had no time to track
> >>> the mainline kernel closely. So the breakage seems to pile up. I am a
> >>> little disapointed that Intel is not on the lookout that the kernel works
> >>> on the chips they are selling. I was really happy about the state of the
> >>> platform before concerning mainline support.
> >>>
> >>> The failure starts with kernel 4.19 or stable kernel release 4.18.19. The
> >>> commit is ba4a1b62a2d742df9e9c607ac53b3bf33496508f.
> >>
> >> Just for clarification, this corresponds to
> >> 0d55c668b218a1db68b5044bce4de74e1bd0f0c8 upstream.
> >>
> >>> The problem here is that
> >>> our platform works with a zero in the SPARE_AREA_SKIP_BYTES register.
> >>
> >> Please clarify the scope of "our platform".
> >> (Only you, or your company, or every individual using this chip?)
> > The company i work for uses this chip as a base for multiple products.
> >
> >> First, SPARE_AREA_SKIP_BYTES is not the property of the hardware.
> >> Rather, it is about the OOB layout, in other words, this parameter
> >> is defined by software.
> >>
> >> For example, U-Boot supports the Denali NAND driver.
> >> The SPARE_AREA_SKIP_BYTES is a user-configurable parameter:
> >> https://github.com/u-boot/u-boot/blob/v2019.10-rc3/drivers/mtd/nand/raw/Kcon
> >> fig#L112
> >>
> >>
> >> Your platform works with a zero in the SPARE_AREA_SKIP_BYTES register
> >> because the NAND chip on the board was initialized with a zero
> >> set to the SPARE_AREA_SKIP_BYTES register.
> >>
> >> If the NAND chip had been initialized with 8
> >> set to the SPARE_AREA_SKIP_BYTES register, it would have
> >> been working with 8 to the SPARE_AREA_SKIP_BYTES.
> >>
> >> The Boot ROM is the only (semi-)software that is unconfigurable by users,
> >> so the value of SPARE_AREA_SKIP_BYTES should be aligned with
> >> the boot ROM.
> >> I recommend you to check the spec of the boot ROM.
> > We boot from NOR flash. That's why i didn't see a problem booting probably.
> >
> >> (The maintainer of the platform, Dihn is CC'ed,
> >> so I hope he will jump in)
> > Yes i hope so too.
> >
>
> I don't have access to a NAND device at the moment. I'll try to find one
> and debug.
>
> Dinh


Dinh,
Do you have answers for the following questions?


- Does the SOCFPGA boot ROM support the NAND boot mode?

- If so, which value does it use for SPARE_AREA_SKIP_BYTES?




-- 
Best Regards
Masahiro Yamada


Re: [RFC PATCH 3/4] virtio: introudce a mdev based transport

2019-09-10 Thread Jason Wang



On 2019/9/10 下午9:52, Michael S. Tsirkin wrote:

On Tue, Sep 10, 2019 at 09:13:02PM +0800, Jason Wang wrote:

On 2019/9/10 下午6:01, Michael S. Tsirkin wrote:

+#ifndef _LINUX_VIRTIO_MDEV_H
+#define _LINUX_VIRTIO_MDEV_H
+
+#include 
+#include 
+#include 
+
+/*
+ * Ioctls
+ */

Pls add a bit more content here. It's redundant to state these
are ioctls. Much better to document what does each one do.


Ok.



+
+struct virtio_mdev_callback {
+   irqreturn_t (*callback)(void *);
+   void *private;
+};
+
+#define VIRTIO_MDEV 0xAF
+#define VIRTIO_MDEV_SET_VQ_CALLBACK _IOW(VIRTIO_MDEV, 0x00, \
+struct virtio_mdev_callback)
+#define VIRTIO_MDEV_SET_CONFIG_CALLBACK _IOW(VIRTIO_MDEV, 0x01, \
+   struct virtio_mdev_callback)

Function pointer in an ioctl parameter? How does this ever make sense?


I admit this is hacky (casting).



And can't we use a couple of registers for this, and avoid ioctls?


Yes, how about something like interrupt numbers for each virtqueue and
config?

Should we just reuse VIRTIO_PCI_COMMON_Q_XXX then?



You mean something like VIRTIO_PCI_COMMON_Q_MSIX? Then it becomes a PCI 
transport in fact. And using either MSIX or irq number is actually 
another layer of indirection. So I think we can just write callback 
function and parameter through registers.







+
+#define VIRTIO_MDEV_DEVICE_API_STRING  "virtio-mdev"
+
+/*
+ * Control registers
+ */
+
+/* Magic value ("virt" string) - Read Only */
+#define VIRTIO_MDEV_MAGIC_VALUE0x000
+
+/* Virtio device version - Read Only */
+#define VIRTIO_MDEV_VERSION0x004
+
+/* Virtio device ID - Read Only */
+#define VIRTIO_MDEV_DEVICE_ID  0x008
+
+/* Virtio vendor ID - Read Only */
+#define VIRTIO_MDEV_VENDOR_ID  0x00c
+
+/* Bitmask of the features supported by the device (host)
+ * (32 bits per set) - Read Only */
+#define VIRTIO_MDEV_DEVICE_FEATURES0x010
+
+/* Device (host) features set selector - Write Only */
+#define VIRTIO_MDEV_DEVICE_FEATURES_SEL0x014
+
+/* Bitmask of features activated by the driver (guest)
+ * (32 bits per set) - Write Only */
+#define VIRTIO_MDEV_DRIVER_FEATURES0x020
+
+/* Activated features set selector - Write Only */
+#define VIRTIO_MDEV_DRIVER_FEATURES_SEL0x024
+
+/* Queue selector - Write Only */
+#define VIRTIO_MDEV_QUEUE_SEL  0x030
+
+/* Maximum size of the currently selected queue - Read Only */
+#define VIRTIO_MDEV_QUEUE_NUM_MAX  0x034
+
+/* Queue size for the currently selected queue - Write Only */
+#define VIRTIO_MDEV_QUEUE_NUM  0x038
+
+/* Ready bit for the currently selected queue - Read Write */
+#define VIRTIO_MDEV_QUEUE_READY0x044

Is this same as started?


Do you mean "status"?

I really meant "enabled", didn't remember the correct name.
As in:  VIRTIO_PCI_COMMON_Q_ENABLE



Yes, it's the same.

Thanks





+
+/* Alignment of virtqueue - Read Only */
+#define VIRTIO_MDEV_QUEUE_ALIGN0x048
+
+/* Queue notifier - Write Only */
+#define VIRTIO_MDEV_QUEUE_NOTIFY   0x050
+
+/* Device status register - Read Write */
+#define VIRTIO_MDEV_STATUS 0x060
+
+/* Selected queue's Descriptor Table address, 64 bits in two halves */
+#define VIRTIO_MDEV_QUEUE_DESC_LOW 0x080
+#define VIRTIO_MDEV_QUEUE_DESC_HIGH0x084
+
+/* Selected queue's Available Ring address, 64 bits in two halves */
+#define VIRTIO_MDEV_QUEUE_AVAIL_LOW0x090
+#define VIRTIO_MDEV_QUEUE_AVAIL_HIGH   0x094
+
+/* Selected queue's Used Ring address, 64 bits in two halves */
+#define VIRTIO_MDEV_QUEUE_USED_LOW 0x0a0
+#define VIRTIO_MDEV_QUEUE_USED_HIGH0x0a4
+
+/* Configuration atomicity value */
+#define VIRTIO_MDEV_CONFIG_GENERATION  0x0fc
+
+/* The config space is defined by each driver as
+ * the per-driver configuration space - Read Write */
+#define VIRTIO_MDEV_CONFIG 0x100

Mixing device and generic config space is what virtio pci did,
caused lots of problems with extensions.
It would be better to reserve much more space.


I see, will do this.

Thanks





+
+#endif
+
+
+/* Ready bit for the currently selected queue - Read Write */
--
2.19.1
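As a side note for the register list quoted above: like virtio-mmio, 64-bit
virtqueue addresses are programmed as two 32-bit halves. A minimal sketch of
that split follows; it is an illustration only, and in this RFC such accesses
actually go through the mdev parent_ops->read()/write() path rather than a
memory-mapped base, so the writel() calls and the base pointer here are
assumptions.

static void vm_set_vq_desc_addr(void __iomem *base, u32 index, u64 desc_addr)
{
	/* Select the virtqueue, then write its descriptor table address
	 * as low and high 32-bit halves.
	 */
	writel(index, base + VIRTIO_MDEV_QUEUE_SEL);
	writel((u32)desc_addr, base + VIRTIO_MDEV_QUEUE_DESC_LOW);
	writel((u32)(desc_addr >> 32), base + VIRTIO_MDEV_QUEUE_DESC_HIGH);
}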


Re: [PATCH v5 1/2] dt-bindings: mailbox: add binding doc for the ARM SMC/HVC mailbox

2019-09-10 Thread Jassi Brar
On Mon, Sep 9, 2019 at 8:32 AM Andre Przywara  wrote:
>
> On Fri, 30 Aug 2019 03:12:29 -0500
> Jassi Brar  wrote:
>
> Hi,
>
> > On Fri, Aug 30, 2019 at 3:07 AM Peng Fan  wrote:
> > >
> > > > Subject: Re: [PATCH v5 1/2] dt-bindings: mailbox: add binding doc for 
> > > > the ARM
> > > > SMC/HVC mailbox
> > > >
> > > > On Fri, Aug 30, 2019 at 2:37 AM Peng Fan  wrote:
> > > > >
> > > > > Hi Jassi,
> > > > >
> > > > > > Subject: Re: [PATCH v5 1/2] dt-bindings: mailbox: add binding doc
> > > > > > for the ARM SMC/HVC mailbox
> > > > > >
> > > > > > On Fri, Aug 30, 2019 at 1:28 AM Peng Fan  wrote:
> > > > > >
> > > > > > > > > +examples:
> > > > > > > > > +  - |
> > > > > > > > > +sram@91 {
> > > > > > > > > +  compatible = "mmio-sram";
> > > > > > > > > +  reg = <0x0 0x93f000 0x0 0x1000>;
> > > > > > > > > +  #address-cells = <1>;
> > > > > > > > > +  #size-cells = <1>;
> > > > > > > > > +  ranges = <0 0x0 0x93f000 0x1000>;
> > > > > > > > > +
> > > > > > > > > +  cpu_scp_lpri: scp-shmem@0 {
> > > > > > > > > +compatible = "arm,scmi-shmem";
> > > > > > > > > +reg = <0x0 0x200>;
> > > > > > > > > +  };
> > > > > > > > > +
> > > > > > > > > +  cpu_scp_hpri: scp-shmem@200 {
> > > > > > > > > +compatible = "arm,scmi-shmem";
> > > > > > > > > +reg = <0x200 0x200>;
> > > > > > > > > +  };
> > > > > > > > > +};
> > > > > > > > > +
> > > > > > > > > +firmware {
> > > > > > > > > +  smc_mbox: mailbox {
> > > > > > > > > +#mbox-cells = <1>;
> > > > > > > > > +compatible = "arm,smc-mbox";
> > > > > > > > > +method = "smc";
> > > > > > > > > +arm,num-chans = <0x2>;
> > > > > > > > > +transports = "mem";
> > > > > > > > > +/* Optional */
> > > > > > > > > +arm,func-ids = <0xc2fe>, <0xc2ff>;
> > > > > > > > >
> > > > > > > > SMC/HVC is synchronously(block) running in "secure mode", i.e,
> > > > > > > > there can only be one instance running platform wide. Right?
> > > > > > >
> > > > > > > I think there could be channel for TEE, and channel for Linux.
> > > > > > > For virtualization case, there could be dedicated channel for 
> > > > > > > each VM.
> > > > > > >
> > > > > > I am talking from Linux pov. Functions 0xfe and 0xff above, can't
> > > > > > both be active at the same time, right?
> > > > >
> > > > > If I get your point correctly,
> > > > > On UP, both could not be active. On SMP, tx/rx could be both active,
> > > > > anyway this depends on secure firmware and Linux firmware design.
> > > > >
> > > > > Do you have any suggestions about arm,func-ids here?
> > > > >
> > > > I was thinking if this is just an instruction, why can't each channel be
> > > > represented as a controller, i.e, have exactly one func-id per 
> > > > controller node.
> > > > Define as many controllers as you need channels ?
> > >
> > > I am ok, this could make driver code simpler. Something as below?
> > >
> > > smc_tx_mbox: tx_mbox {
> > >   #mbox-cells = <0>;
> > >   compatible = "arm,smc-mbox";
> > >   method = "smc";
> > >   transports = "mem";
> > >   arm,func-id = <0xc2fe>;
> > > };
> > >
> > > smc_rx_mbox: rx_mbox {
> > >   #mbox-cells = <0>;
> > >   compatible = "arm,smc-mbox";
> > >   method = "smc";
> > >   transports = "mem";
> > >   arm,func-id = <0xc2ff>;
> > > };
> > >
> > > firmware {
> > >   scmi {
> > > compatible = "arm,scmi";
> > > mboxes = <_tx_mbox>, <_rx_mbox 1>;
> > > mbox-names = "tx", "rx";
> > > shmem = <_scp_lpri>, <_scp_hpri>;
> > >   };
> > > };
> > >
> > Yes, the channel part is good.
> > But I am not convinced by the need to have SCMI specific "transport" mode.
>
> Why would this be SCMI specific and what is the problem with having this 
> property?
> By the very nature of the SMC/HVC call you would expect to also pass 
> parameters in registers.
> However this limits the amount of data you can push, so the option of 
> reverting to a
> memory based payload sounds very reasonable.
>
Of course, it is very legit to pass data via mem and many platforms do
that. But as you note in your next post, the 'transport' doesn't seem
necessary doing what it does in the driver.

Cheers!
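To make the per-channel controller idea discussed above concrete, here is a
minimal sketch of the corresponding send path on the driver side; the struct
and field names are illustrative assumptions, while arm_smccc_smc() and
mbox_chan_received_data() are the existing kernel helpers.

/* Illustrative only: one SMC function ID per mailbox controller/channel. */
struct arm_smc_chan_data {
	u32 function_id;	/* taken from the arm,func-id property */
};

static int arm_smc_send_data(struct mbox_chan *link, void *data)
{
	struct arm_smc_chan_data *cd = link->con_priv;
	struct arm_smccc_res res;

	/* The payload sits either in shared memory ("mem") or could be
	 * passed in the remaining argument registers ("reg").
	 */
	arm_smccc_smc(cd->function_id, 0, 0, 0, 0, 0, 0, 0, &res);

	/* The first result register is handed back to the client. */
	mbox_chan_received_data(link, (void *)res.a0);

	return 0;
}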


Re: [PATCH 00/13] hisi_sas: Some misc patches

2019-09-10 Thread Martin K. Petersen


John,

> This patchset includes support for some more minor features, a bit of
> tidying, and a few patches to make the driver a bit more robust.

Applied to 5.4/scsi-queue, thanks.

-- 
Martin K. Petersen  Oracle Linux Engineering


Hello! Are you interested in customer databases?

2019-09-10 Thread 128128linux-kernel-digest
Hello! Are you interested in customer databases?


RE: [PATCH v5 1/2] dt-bindings: mailbox: add binding doc for the ARM SMC/HVC mailbox

2019-09-10 Thread Peng Fan
> Subject: Re: [PATCH v5 1/2] dt-bindings: mailbox: add binding doc for the ARM
> SMC/HVC mailbox
> 
> On Fri, 30 Aug 2019 03:12:29 -0500
> Jassi Brar  wrote:
> 
> Hi,
> 
> > On Fri, Aug 30, 2019 at 3:07 AM Peng Fan  wrote:
> > >
> > > > Subject: Re: [PATCH v5 1/2] dt-bindings: mailbox: add binding doc
> > > > for the ARM SMC/HVC mailbox
> > > >
> > > > On Fri, Aug 30, 2019 at 2:37 AM Peng Fan  wrote:
> > > > >
> > > > > Hi Jassi,
> > > > >
> > > > > > Subject: Re: [PATCH v5 1/2] dt-bindings: mailbox: add binding
> > > > > > doc for the ARM SMC/HVC mailbox
> > > > > >
> > > > > > On Fri, Aug 30, 2019 at 1:28 AM Peng Fan 
> wrote:
> > > > > >
> > > > > > > > > +examples:
> > > > > > > > > +  - |
> > > > > > > > > +sram@91 {
> > > > > > > > > +  compatible = "mmio-sram";
> > > > > > > > > +  reg = <0x0 0x93f000 0x0 0x1000>;
> > > > > > > > > +  #address-cells = <1>;
> > > > > > > > > +  #size-cells = <1>;
> > > > > > > > > +  ranges = <0 0x0 0x93f000 0x1000>;
> > > > > > > > > +
> > > > > > > > > +  cpu_scp_lpri: scp-shmem@0 {
> > > > > > > > > +compatible = "arm,scmi-shmem";
> > > > > > > > > +reg = <0x0 0x200>;
> > > > > > > > > +  };
> > > > > > > > > +
> > > > > > > > > +  cpu_scp_hpri: scp-shmem@200 {
> > > > > > > > > +compatible = "arm,scmi-shmem";
> > > > > > > > > +reg = <0x200 0x200>;
> > > > > > > > > +  };
> > > > > > > > > +};
> > > > > > > > > +
> > > > > > > > > +firmware {
> > > > > > > > > +  smc_mbox: mailbox {
> > > > > > > > > +#mbox-cells = <1>;
> > > > > > > > > +compatible = "arm,smc-mbox";
> > > > > > > > > +method = "smc";
> > > > > > > > > +arm,num-chans = <0x2>;
> > > > > > > > > +transports = "mem";
> > > > > > > > > +/* Optional */
> > > > > > > > > +arm,func-ids = <0xc2fe>, <0xc2ff>;
> > > > > > > > >
> > > > > > > > SMC/HVC is synchronously(block) running in "secure mode",
> > > > > > > > i.e, there can only be one instance running platform wide. 
> > > > > > > > Right?
> > > > > > >
> > > > > > > I think there could be channel for TEE, and channel for Linux.
> > > > > > > For virtualization case, there could be dedicated channel for each
> VM.
> > > > > > >
> > > > > > I am talking from Linux pov. Functions 0xfe and 0xff above,
> > > > > > can't both be active at the same time, right?
> > > > >
> > > > > If I get your point correctly,
> > > > > On UP, both could not be active. On SMP, tx/rx could be both
> > > > > active, anyway this depends on secure firmware and Linux firmware
> design.
> > > > >
> > > > > Do you have any suggestions about arm,func-ids here?
> > > > >
> > > > I was thinking if this is just an instruction, why can't each
> > > > channel be represented as a controller, i.e, have exactly one func-id 
> > > > per
> controller node.
> > > > Define as many controllers as you need channels ?
> > >
> > > I am ok, this could make driver code simpler. Something as below?
> > >
> > > smc_tx_mbox: tx_mbox {
> > >   #mbox-cells = <0>;
> > >   compatible = "arm,smc-mbox";
> > >   method = "smc";
> > >   transports = "mem";
> > >   arm,func-id = <0xc2fe>;
> > > };
> > >
> > > smc_rx_mbox: rx_mbox {
> > >   #mbox-cells = <0>;
> > >   compatible = "arm,smc-mbox";
> > >   method = "smc";
> > >   transports = "mem";
> > >   arm,func-id = <0xc2ff>;
> > > };
> > >
> > > firmware {
> > >   scmi {
> > > compatible = "arm,scmi";
> > > mboxes = <_tx_mbox>, <_rx_mbox 1>;
> > > mbox-names = "tx", "rx";
> > > shmem = <_scp_lpri>, <_scp_hpri>;
> > >   };
> > > };
> > >
> > Yes, the channel part is good.
> > But I am not convinced by the need to have SCMI specific "transport" mode.
> 
> Why would this be SCMI specific and what is the problem with having this
> property?
> By the very nature of the SMC/HVC call you would expect to also pass
> parameters in registers. However this limits the amount of data you can push,
> so the option of reverting to a memory based payload sounds very
> reasonable.
> On the other hand *just* using memory complicates things, in case you have a
> very simple protocol. You would need a memory region shared between
> firmware and OS, which is not always easily possible on every platform. Also
> this doesn't scale easily with multiple mailboxes and channels. Passing
> parameters via registers is also naturally consistent, as there would be no
> races and no need for synchronisation with other cores or other users of the
> mailbox.
> 
> So I clearly see the benefit of specifying *both* ways of payload transport.
> Given that this driver should be protocol agnostic, it makes a lot of sense to
> introduce both methods *now*, so in the future users can just use the register
> method, without extending the binding in a incompatible way later (earlier
> kernels would have the driver, but wouldn't know how 

[PATCH 1/2] arm64: dts: imx8mm: Remove incorrect fallback compatible for ocotp

2019-09-10 Thread Anson Huang
Compared to the i.MX7D, the i.MX8MM has a different ocotp layout, so
it should NOT use "fsl,imx7d-ocotp" as the ocotp's fallback
compatible; remove it.

Signed-off-by: Anson Huang 
---
 arch/arm64/boot/dts/freescale/imx8mm.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/freescale/imx8mm.dtsi 
b/arch/arm64/boot/dts/freescale/imx8mm.dtsi
index 5f9d0da..7c4dcce 100644
--- a/arch/arm64/boot/dts/freescale/imx8mm.dtsi
+++ b/arch/arm64/boot/dts/freescale/imx8mm.dtsi
@@ -426,7 +426,7 @@
};
 
ocotp: ocotp-ctrl@3035 {
-   compatible = "fsl,imx8mm-ocotp", 
"fsl,imx7d-ocotp", "syscon";
+   compatible = "fsl,imx8mm-ocotp", "syscon";
reg = <0x3035 0x1>;
clocks = < IMX8MM_CLK_OCOTP_ROOT>;
/* For nvmem subnodes */
-- 
2.7.4



[PATCH 2/2] arm64: dts: imx8mn: Use "fsl,imx8mm-ocotp" as ocotp's fallback compatible

2019-09-10 Thread Anson Huang
Use "fsl,imx8mm-ocotp" as the i.MX8MN ocotp's fallback compatible
instead of "fsl,imx7d-ocotp" to support SoC UID reading, as the
i.MX8MN reuses the i.MX8MM's SoC ID driver.

Signed-off-by: Anson Huang 
---
 arch/arm64/boot/dts/freescale/imx8mn.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/freescale/imx8mn.dtsi 
b/arch/arm64/boot/dts/freescale/imx8mn.dtsi
index e4efe8d..6cb6c9c 100644
--- a/arch/arm64/boot/dts/freescale/imx8mn.dtsi
+++ b/arch/arm64/boot/dts/freescale/imx8mn.dtsi
@@ -337,7 +337,7 @@
};
 
ocotp: ocotp-ctrl@3035 {
-   compatible = "fsl,imx8mn-ocotp", 
"fsl,imx7d-ocotp", "syscon";
+   compatible = "fsl,imx8mn-ocotp", 
"fsl,imx8mm-ocotp", "syscon";
reg = <0x3035 0x1>;
clocks = < IMX8MN_CLK_OCOTP_ROOT>;
#address-cells = <1>;
-- 
2.7.4



Re: [PATCH v2] scsi: virtio_scsi: unplug LUNs when events missed

2019-09-10 Thread Martin K. Petersen


Matt,

> The event handler calls scsi_scan_host() when events are missed, which
> will hotplug new LUNs.  However, this function won't remove any
> unplugged LUNs.  The result is that hotunplug doesn't work properly
> when the number of unplugged LUNs exceeds the event queue size
> (currently 8).
>
> Scan existing LUNs when events are missed to check if they are still
> present.  If not, remove them.

Applied to 5.4/scsi-queue, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering
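For readers who want to see the shape of the fix described in the quoted
changelog, here is a rough sketch of one way to prune unplugged LUNs during
such a rescan. It is an illustration only and not the applied patch: the
helper name, the TEST UNIT READY probe, and the timeout and retry values are
all assumptions.

static void virtscsi_prune_unplugged_luns(struct Scsi_Host *shost)
{
	struct scsi_device *sdev;
	struct scsi_sense_hdr sshdr;

	shost_for_each_device(sdev, shost) {
		/* An unplugged LUN typically fails TEST UNIT READY with
		 * an ILLEGAL REQUEST sense key; drop it in that case.
		 */
		if (scsi_test_unit_ready(sdev, 30 * HZ, 3, &sshdr) &&
		    scsi_sense_valid(&sshdr) &&
		    sshdr.sense_key == ILLEGAL_REQUEST)
			scsi_remove_device(sdev);
	}
}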


NOTICE!!

2019-09-10 Thread M K



Receive $39 Million for our mutual benefit.


Re: [PATCH net-next 1/7] net: hns3: add ethtool_ops.set_channels support for HNS3 VF driver

2019-09-10 Thread tanhuazhong




On 2019/9/11 1:25, David Miller wrote:

From: Huazhong Tan 
Date: Tue, 10 Sep 2019 16:58:22 +0800


+   /* Set to user value, no larger than max_rss_size. */
+   if (kinfo->req_rss_size != kinfo->rss_size && kinfo->req_rss_size &&
+   kinfo->req_rss_size <= max_rss_size) {
+   dev_info(>pdev->dev, "rss changes from %u to %u\n",
+kinfo->rss_size, kinfo->req_rss_size);
+   kinfo->rss_size = kinfo->req_rss_size;


Please do not emit kernel log messages for normal operations.



Will remove this log in V2.
Thanks.


.





RE: [EXT] Re: [V4 2/2] dmaengine: fsl-dpaa2-qdma: Add NXP dpaa2 qDMA controller driver for Layerscape SoCs

2019-09-10 Thread Peng Ma
Hi Vinod,

I sent this patch series (V5) on June 25, 2019 and haven't received
any comments yet. Its current state is "Not Applicable", so please let
me know what I need to do next.
Thank you very much in advance.

Patch link:
https://patchwork.kernel.org/patch/11015035/
https://patchwork.kernel.org/patch/11015033/

Best Regards,
Peng
>-Original Message-
>From: Vinod Koul 
>Sent: 2019年6月25日 0:46
>To: Peng Ma 
>Cc: dan.j.willi...@intel.com; Leo Li ;
>linux-kernel@vger.kernel.org; dmaeng...@vger.kernel.org
>Subject: [EXT] Re: [V4 2/2] dmaengine: fsl-dpaa2-qdma: Add NXP dpaa2 qDMA
>controller driver for Layerscape SoCs
>
>Caution: EXT Email
>
>On 13-06-19, 10:13, Peng Ma wrote:
>> DPPA2(Data Path Acceleration Architecture 2) qDMA supports channel
>> virtualization by allowing DMA
>
>typo virtualization
>
>> jobs to be enqueued into different frame queues.
>> Core can initiate a DMA transaction by preparing a frame
>> descriptor(FD) for each DMA job and enqueuing this job to a frame
>> queue. through a hardware portal. The qDMA
>  ^^^
>why this full stop?
>
>> +static struct dpaa2_qdma_comp *
>> +dpaa2_qdma_request_desc(struct dpaa2_qdma_chan *dpaa2_chan) {
>> + struct dpaa2_qdma_comp *comp_temp = NULL;
>> + unsigned long flags;
>> +
>> + spin_lock_irqsave(_chan->queue_lock, flags);
>> + if (list_empty(_chan->comp_free)) {
>> + spin_unlock_irqrestore(_chan->queue_lock, flags);
>> + comp_temp = kzalloc(sizeof(*comp_temp), GFP_NOWAIT);
>> + if (!comp_temp)
>> + goto err;
>> + comp_temp->fd_virt_addr =
>> + dma_pool_alloc(dpaa2_chan->fd_pool,
>GFP_NOWAIT,
>> +_temp->fd_bus_addr);
>> + if (!comp_temp->fd_virt_addr)
>> + goto err_comp;
>> +
>> + comp_temp->fl_virt_addr =
>> + dma_pool_alloc(dpaa2_chan->fl_pool,
>GFP_NOWAIT,
>> +_temp->fl_bus_addr);
>> + if (!comp_temp->fl_virt_addr)
>> + goto err_fd_virt;
>> +
>> + comp_temp->desc_virt_addr =
>> + dma_pool_alloc(dpaa2_chan->sdd_pool,
>GFP_NOWAIT,
>> +
>_temp->desc_bus_addr);
>> + if (!comp_temp->desc_virt_addr)
>> + goto err_fl_virt;
>> +
>> + comp_temp->qchan = dpaa2_chan;
>> + return comp_temp;
>> + }
>> +
>> + comp_temp = list_first_entry(_chan->comp_free,
>> +  struct dpaa2_qdma_comp, list);
>> + list_del(_temp->list);
>> + spin_unlock_irqrestore(_chan->queue_lock, flags);
>> +
>> + comp_temp->qchan = dpaa2_chan;
>> +
>> + return comp_temp;
>> +
>> +err_fl_virt:
>
>no err logs? how will you know what went wrong?
>
>> +static enum
>> +dma_status dpaa2_qdma_tx_status(struct dma_chan *chan,
>> + dma_cookie_t cookie,
>> + struct dma_tx_state *txstate) {
>> + return dma_cookie_status(chan, cookie, txstate);
>
>why not set dma_cookie_status as this callback?
>
>> +static int __cold dpaa2_qdma_setup(struct fsl_mc_device *ls_dev) {
>> + struct dpaa2_qdma_priv_per_prio *ppriv;
>> + struct device *dev = _dev->dev;
>> + struct dpaa2_qdma_priv *priv;
>> + u8 prio_def = DPDMAI_PRIO_NUM;
>> + int err = -EINVAL;
>> + int i;
>> +
>> + priv = dev_get_drvdata(dev);
>> +
>> + priv->dev = dev;
>> + priv->dpqdma_id = ls_dev->obj_desc.id;
>> +
>> + /* Get the handle for the DPDMAI this interface is associate with */
>> + err = dpdmai_open(priv->mc_io, 0, priv->dpqdma_id,
>_dev->mc_handle);
>> + if (err) {
>> + dev_err(dev, "dpdmai_open() failed\n");
>> + return err;
>> + }
>> + dev_info(dev, "Opened dpdmai object successfully\n");
>
>this is noise in kernel, consider debug level
>
>> +static int __cold dpaa2_dpdmai_bind(struct dpaa2_qdma_priv *priv) {
>> + int err;
>> + int i, num;
>> + struct device *dev = priv->dev;
>> + struct dpaa2_qdma_priv_per_prio *ppriv;
>> + struct dpdmai_rx_queue_cfg rx_queue_cfg;
>> + struct fsl_mc_device *ls_dev = to_fsl_mc_device(dev);
>
>the order is reverse than used in other fn, please stick to one style!
>--
>~Vinod
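
On the tx_status point above, the suggestion amounts to a one-liner at
registration time, roughly like the sketch below (the dma_device member
name in this driver is assumed, not taken from the patch):

	/* dma_cookie_status() already has the right prototype for the op */
	dpaa2_qdma->dma_dev.device_tx_status = dma_cookie_status;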


Re: [RFC PATCH 3/4] virtio: introudce a mdev based transport

2019-09-10 Thread Tiwei Bie
On Tue, Sep 10, 2019 at 04:19:34PM +0800, Jason Wang wrote:
> This patch introduces a new mdev transport for virtio. It allows the
> kernel virtio driver to drive a mediated device that is capable
> of populating virtqueues directly.
> 
> A new virtio-mdev driver will be registered to the mdev bus, when a
> new virtio-mdev device is probed, it will register the device with
> mdev based config ops. This means, unlike the exist hardware
> transport, this is a software transport between mdev driver and mdev
> device. The transport was implemented through:
> 
> - configuration access was implemented through parent_ops->read()/write()
> - vq/config callback was implemented through parent_ops->ioctl()
> 
> This transport is derived from the virtio MMIO protocol and was written
> for a kernel driver. For the transport itself, the design goal is to
> be generic enough to also support userspace drivers (this part will be
> added in the future).
> 
> Note:
> - current mdev assumes all the parameters of parent_ops come from
>   userspace. This prevents us from implementing the kernel mdev
>   driver. For a quick POC, this patch just abuses those parameters and
>   assumes the mdev device implementation will treat them as kernel
>   pointers. This should be addressed in the formal series by extending
>   mdev_parent_ops.
> - for a quick POC, I just derived the transport from MMIO; I'm pretty
>   sure there's a lot of optimization space for this.
> 
> Signed-off-by: Jason Wang 
> ---
>  drivers/vfio/mdev/Kconfig|   7 +
>  drivers/vfio/mdev/Makefile   |   1 +
>  drivers/vfio/mdev/virtio_mdev.c  | 500 +++
>  include/uapi/linux/virtio_mdev.h | 131 
>  4 files changed, 639 insertions(+)
>  create mode 100644 drivers/vfio/mdev/virtio_mdev.c
>  create mode 100644 include/uapi/linux/virtio_mdev.h
> 
[...]
> diff --git a/include/uapi/linux/virtio_mdev.h 
> b/include/uapi/linux/virtio_mdev.h
> new file mode 100644
> index ..8040de6b960a
> --- /dev/null
> +++ b/include/uapi/linux/virtio_mdev.h
> @@ -0,0 +1,131 @@
> +/*
> + * Virtio mediated device driver
> + *
> + * Copyright 2019, Red Hat Corp.
> + *
> + * Based on Virtio MMIO driver by ARM Ltd, copyright ARM Ltd. 2011
> + *
> + * This header is BSD licensed so anyone can use the definitions to implement
> + * compatible drivers/servers.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + *notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *notice, this list of conditions and the following disclaimer in the
> + *documentation and/or other materials provided with the distribution.
> + * 3. Neither the name of IBM nor the names of its contributors
> + *may be used to endorse or promote products derived from this software
> + *without specific prior written permission.
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS 
> IS'' AND
> + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + */
> +#ifndef _LINUX_VIRTIO_MDEV_H
> +#define _LINUX_VIRTIO_MDEV_H
> +
> +#include 
> +#include 
> +#include 
> +
> +/*
> + * Ioctls
> + */
> +
> +struct virtio_mdev_callback {
> + irqreturn_t (*callback)(void *);
> + void *private;
> +};
> +
> +#define VIRTIO_MDEV 0xAF
> +#define VIRTIO_MDEV_SET_VQ_CALLBACK _IOW(VIRTIO_MDEV, 0x00, \
> +  struct virtio_mdev_callback)
> +#define VIRTIO_MDEV_SET_CONFIG_CALLBACK _IOW(VIRTIO_MDEV, 0x01, \
> + struct virtio_mdev_callback)
> +
> +#define VIRTIO_MDEV_DEVICE_API_STRING"virtio-mdev"
> +
> +/*
> + * Control registers
> + */
> +
> +/* Magic value ("virt" string) - Read Only */
> +#define VIRTIO_MDEV_MAGIC_VALUE  0x000
> +
> +/* Virtio device version - Read Only */
> +#define VIRTIO_MDEV_VERSION  0x004
> +
> +/* Virtio device ID - Read Only */
> +#define VIRTIO_MDEV_DEVICE_ID0x008
> +
> +/* Virtio vendor ID - Read Only */
> +#define VIRTIO_MDEV_VENDOR_ID0x00c
> +
> +/* Bitmask of the features 

Re: [PATCH 0/3] rtlwifi: use generic rtl_evm_db_to_percentage

2019-09-10 Thread Pkshih
On Tue, 2019-09-10 at 21:04 +0200, Michael Straube wrote:
> Functions _rtl92{c,d}_evm_db_to_percentage are functionally identical
> to the generic version rtl_evm_db_to_percentage. This series converts
> rtl8192ce, rtl8192cu and rtl8192de to use the generic version.
> 
> Michael Straube (3):
>   rtlwifi: rtl8192ce: replace _rtl92c_evm_db_to_percentage with generic
> version
>   rtlwifi: rtl8192cu: replace _rtl92c_evm_db_to_percentage with generic
> version
>   rtlwifi: rtl8192de: replace _rtl92d_evm_db_to_percentage with generic
> version
> 
>  .../wireless/realtek/rtlwifi/rtl8192ce/trx.c  | 23 +--
>  .../wireless/realtek/rtlwifi/rtl8192cu/mac.c  | 18 +--
>  .../wireless/realtek/rtlwifi/rtl8192de/trx.c  | 18 ++-
>  3 files changed, 4 insertions(+), 55 deletions(-)
> 

I checked the generic version and the removed functions, and they are indeed
identical. Thanks for your patches.

Acked-by: Ping-Ke Shih 




Re: [PATCH net 1/2] sctp: remove redundant assignment when call sctp_get_port_local

2019-09-10 Thread maowenan



On 2019/9/11 3:22, Dan Carpenter wrote:
> On Tue, Sep 10, 2019 at 09:57:10PM +0300, Dan Carpenter wrote:
>> On Tue, Sep 10, 2019 at 03:13:42PM +0800, Mao Wenan wrote:
>>> There are redundant parentheses in the if clause when calling
>>> sctp_get_port_local in sctp_do_bind, and a redundant assignment to
>>> 'ret'. This patch cleans that up.
>>>
>>> Signed-off-by: Mao Wenan 
>>> ---
>>>  net/sctp/socket.c | 3 +--
>>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>>> index 9d1f83b10c0a..766b68b55ebe 100644
>>> --- a/net/sctp/socket.c
>>> +++ b/net/sctp/socket.c
>>> @@ -399,9 +399,8 @@ static int sctp_do_bind(struct sock *sk, union 
>>> sctp_addr *addr, int len)
>>>  * detection.
>>>  */
>>> addr->v4.sin_port = htons(snum);
>>> -   if ((ret = sctp_get_port_local(sk, addr))) {
>>> +   if (sctp_get_port_local(sk, addr))
>>> return -EADDRINUSE;
>>
>> sctp_get_port_local() returns a long which is either 0,1 or a pointer
>> casted to long.  It's not documented what it means and neither of the
>> callers use the return since commit 62208f12451f ("net: sctp: simplify
>> sctp_get_port").
> 
> Actually it was commit 4e54064e0a13 ("sctp: Allow only 1 listening
> socket with SO_REUSEADDR") from 11 years ago.  That patch fixed a bug,
> because before the code assumed that a pointer casted to an int was the
> same as a pointer casted to a long.

commit 4e54064e0a13 treated a non-zero return value as unexpected, so is the
current cleanup ok?

> 
> regards,
> dan carpenter
> 
> 
> .
> 



[PATCH v2 net] net: sonic: replace dev_kfree_skb in sonic_send_packet

2019-09-10 Thread Mao Wenan
sonic_send_packet can be called in irq or non-irq
context, so it is better to use dev_kfree_skb_any
instead of dev_kfree_skb.

Fixes: d9fb9f384292 ("*sonic/natsemi/ns83829: Move the National Semi-conductor 
drivers")
Signed-off-by: Mao Wenan 
---
 v2: change 'none irq' to 'non-irq'.
 drivers/net/ethernet/natsemi/sonic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/natsemi/sonic.c 
b/drivers/net/ethernet/natsemi/sonic.c
index 18fd62fbfb64..b339125b2f09 100644
--- a/drivers/net/ethernet/natsemi/sonic.c
+++ b/drivers/net/ethernet/natsemi/sonic.c
@@ -233,7 +233,7 @@ static int sonic_send_packet(struct sk_buff *skb, struct 
net_device *dev)
laddr = dma_map_single(lp->device, skb->data, length, DMA_TO_DEVICE);
if (!laddr) {
pr_err_ratelimited("%s: failed to map tx DMA buffer.\n", 
dev->name);
-   dev_kfree_skb(skb);
+   dev_kfree_skb_any(skb);
return NETDEV_TX_OK;
}
 
-- 
2.20.1
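
For reference, dev_kfree_skb_any() boils down to roughly the following
check (a paraphrased sketch of the helper from <linux/netdevice.h> /
net/core/dev.c, not a verbatim quote), which is why it is safe from both
hard-irq and ordinary process context:

	/* paraphrased sketch of the dev_kfree_skb_any() behaviour */
	static inline void kfree_skb_any_sketch(struct sk_buff *skb)
	{
		if (in_irq() || irqs_disabled())
			__dev_kfree_skb_irq(skb, SKB_REASON_DROPPED);	/* deferred, irq-safe free */
		else
			dev_kfree_skb(skb);				/* ordinary process-context free */
	}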



Re: [Xen-devel] [PATCH] xen/pci: try to reserve MCFG areas earlier

2019-09-10 Thread Igor Druzhinin
On 10/09/2019 22:19, Boris Ostrovsky wrote:
> On 9/10/19 4:36 PM, Igor Druzhinin wrote:
>> On 10/09/2019 18:48, Boris Ostrovsky wrote:
>>> On 9/10/19 5:46 AM, Igor Druzhinin wrote:
 On 10/09/2019 02:47, Boris Ostrovsky wrote:
> On 9/9/19 5:48 PM, Igor Druzhinin wrote:
>> On 09/09/2019 20:19, Boris Ostrovsky wrote:
>>
>>> The other question I have is why you think it's worth keeping
>>> xen_mcfg_late() as a late initcall. How could MCFG info be updated
>>> between acpi_init() and late_initcalls being run? I'd think it can only
>>> happen when a new device is hotplugged.
>>>
>> It was a precaution against setup_mcfg_map() calls that might add new
>> areas that are not in MCFG table but for some reason have _CBA method.
>> It's obviously a "firmware is broken" scenario so I don't have strong
>> feelings to keep it here. Will prefer to remove in v2 if you want.
> Isn't setup_mcfg_map() called before the first xen_add_device() which is 
> where you are calling xen_mcfg_late()?
>
 setup_mcfg_map() calls are done in order of root bus discovery which
 happens *after* the previous root bus has been enumerated. So the order
 is: call setup_mcfg_map() for root bus 0, find that
 pci_mmcfg_late_init() has finished MCFG area registration, perform PCI
 enumeration of bus 0, call xen_add_device() for every device there, call
 setup_mcfg_map() for root bus X, etc.
>>> Ah, yes. Multiple busses.
>>>
>>> If that's the case then why don't we need to call xen_mcfg_late() for
>>> the first device on each bus?
>>>
>> Ideally, yes - we'd like to call it for every bus discovered. But boot
>> time buses are already in MCFG (otherwise system boot might not simply
>> work as Jan pointed out) so it's not strictly required. The only case is
>> a potential PCI bus hot-plug but I'm not sure it actually works in
>> practice and we certainly didn't support it before. It might be solved
>> theoretically by subscribing to acpi_bus_type that is available after
>> acpi_init().
> 
> OK. Then *I think* we can drop late_initcall() but I would really like
> to hear when others think.
> 

Another thing that I implied by "not supporting" but want to explicitly
call out is that currently Xen will refuse to reserve any MCFG area
unless it actually existed in the MCFG table at boot. I don't clearly
understand the reasoning behind it, but it might be worth relaxing at
least the size-matching restriction on the Xen side now with this change.

Igor


Re: page_alloc.shuffle=1 + CONFIG_PROVE_LOCKING=y = arm64 hang

2019-09-10 Thread Sergey Senozhatsky
Cc-ing Ted, Arnd, Greg

On (09/10/19 11:22), Qian Cai wrote:
> [ 1078.283869][T43784] -> #3 (&(>lock)->rlock){-.-.}:
> [ 1078.291350][T43784]__lock_acquire+0x5c8/0xbb0
> [ 1078.296394][T43784]lock_acquire+0x154/0x428
> [ 1078.301266][T43784]_raw_spin_lock_irqsave+0x80/0xa0
> [ 1078.306831][T43784]tty_port_tty_get+0x28/0x68
> [ 1078.311873][T43784]tty_port_default_wakeup+0x20/0x40
> [ 1078.317523][T43784]tty_port_tty_wakeup+0x38/0x48
> [ 1078.322827][T43784]uart_write_wakeup+0x2c/0x50
> [ 1078.327956][T43784]pl011_tx_chars+0x240/0x260
> [ 1078.332999][T43784]pl011_start_tx+0x24/0xa8
> [ 1078.337868][T43784]__uart_start+0x90/0xa0
> [ 1078.342563][T43784]uart_write+0x15c/0x2c8
> [ 1078.347261][T43784]do_output_char+0x1c8/0x2b0
> [ 1078.352304][T43784]n_tty_write+0x300/0x668
> [ 1078.357087][T43784]tty_write+0x2e8/0x430
> [ 1078.361696][T43784]redirected_tty_write+0xcc/0xe8
> [ 1078.367086][T43784]do_iter_write+0x228/0x270
> [ 1078.372041][T43784]vfs_writev+0x10c/0x1c8
> [ 1078.376735][T43784]do_writev+0xdc/0x180
> [ 1078.381257][T43784]__arm64_sys_writev+0x50/0x60
> [ 1078.386476][T43784]el0_svc_handler+0x11c/0x1f0
> [ 1078.391606][T43784]el0_svc+0x8/0xc
> [ 1078.395691][T43784] 

uart_port->lock  ->  tty_port->lock

This thing alone is already a bit suspicious. We re-enter tty
here: tty -> uart -> serial -> tty

And we re-enter tty under uart_port->lock.

> [ 1078.395691][T43784] -> #2 (_lock_key){-.-.}:
> [ 1078.402561][T43784]__lock_acquire+0x5c8/0xbb0
> [ 1078.407604][T43784]lock_acquire+0x154/0x428
> [ 1078.412474][T43784]_raw_spin_lock+0x68/0x88
> [ 1078.417343][T43784]pl011_console_write+0x2ac/0x318
> [ 1078.422820][T43784]console_unlock+0x3c4/0x898
> [ 1078.427863][T43784]vprintk_emit+0x2d4/0x460
> [ 1078.432732][T43784]vprintk_default+0x48/0x58
> [ 1078.437688][T43784]vprintk_func+0x194/0x250
> [ 1078.442557][T43784]printk+0xbc/0xec
> [ 1078.446732][T43784]register_console+0x4a8/0x580
> [ 1078.451947][T43784]uart_add_one_port+0x748/0x878
> [ 1078.457250][T43784]pl011_register_port+0x98/0x128
> [ 1078.462639][T43784]sbsa_uart_probe+0x398/0x480
> [ 1078.467772][T43784]platform_drv_probe+0x70/0x108
> [ 1078.473075][T43784]really_probe+0x15c/0x5d8
> [ 1078.477944][T43784]driver_probe_device+0x94/0x1d0
> [ 1078.483335][T43784]__device_attach_driver+0x11c/0x1a8
> [ 1078.489072][T43784]bus_for_each_drv+0xf8/0x158
> [ 1078.494201][T43784]__device_attach+0x164/0x240
> [ 1078.499331][T43784]device_initial_probe+0x24/0x30
> [ 1078.504721][T43784]bus_probe_device+0xf0/0x100
> [ 1078.509850][T43784]device_add+0x63c/0x960
> [ 1078.514546][T43784]platform_device_add+0x1ac/0x3b8
> [ 1078.520023][T43784]platform_device_register_full+0x1fc/0x290
> [ 1078.526373][T43784]acpi_create_platform_device.part.0+0x264/0x3a8
> [ 1078.533152][T43784]acpi_create_platform_device+0x68/0x80
> [ 1078.539150][T43784]acpi_default_enumeration+0x34/0x78
> [ 1078.544887][T43784]acpi_bus_attach+0x340/0x3b8
> [ 1078.550015][T43784]acpi_bus_attach+0xf8/0x3b8
> [ 1078.555057][T43784]acpi_bus_attach+0xf8/0x3b8
> [ 1078.560099][T43784]acpi_bus_attach+0xf8/0x3b8
> [ 1078.565142][T43784]acpi_bus_scan+0x9c/0x100
> [ 1078.570015][T43784]acpi_scan_init+0x16c/0x320
> [ 1078.575058][T43784]acpi_init+0x330/0x3b8
> [ 1078.579666][T43784]do_one_initcall+0x158/0x7ec
> [ 1078.584797][T43784]kernel_init_freeable+0x9a8/0xa70
> [ 1078.590360][T43784]kernel_init+0x18/0x138
> [ 1078.595055][T43784]ret_from_fork+0x10/0x1c
>
> [ 1078.599835][T43784] -> #1 (console_owner){-...}:
> [ 1078.606618][T43784]__lock_acquire+0x5c8/0xbb0
> [ 1078.611661][T43784]lock_acquire+0x154/0x428
> [ 1078.616530][T43784]console_unlock+0x298/0x898
> [ 1078.621573][T43784]vprintk_emit+0x2d4/0x460
> [ 1078.626442][T43784]vprintk_default+0x48/0x58
> [ 1078.631398][T43784]vprintk_func+0x194/0x250
> [ 1078.636267][T43784]printk+0xbc/0xec
> [ 1078.640443][T43784]_warn_unseeded_randomness+0xb4/0xd0
> [ 1078.646267][T43784]get_random_u64+0x4c/0x100
> [ 1078.651224][T43784]add_to_free_area_random+0x168/0x1a0
> [ 1078.657047][T43784]free_one_page+0x3dc/0xd08
> [ 1078.662003][T43784]__free_pages_ok+0x490/0xd00
> [ 1078.667132][T43784]__free_pages+0xc4/0x118
> [ 1078.671914][T43784]__free_pages_core+0x2e8/0x428
> [ 1078.677219][T43784]memblock_free_pages+0xa4/0xec
> [ 1078.682522][T43784]memblock_free_all+0x264/0x330
> [ 1078.687825][T43784]mem_init+0x90/0x148
> [ 1078.692259][T43784]

Re: [Letux-kernel] [RFC PATCH 0/3] Enable 1GHz support on omap36xx

2019-09-10 Thread Adam Ford
On Tue, Sep 10, 2019 at 7:24 PM Adam Ford  wrote:
>
> On Tue, Sep 10, 2019 at 3:06 PM Adam Ford  wrote:
> >
> > On Tue, Sep 10, 2019 at 2:55 PM H. Nikolaus Schaller  
> > wrote:
> > >
> > > Ok,
> > >
> > > > Am 10.09.2019 um 20:51 schrieb H. Nikolaus Schaller 
> > > > :
> > > >
> > >  it, but then I got some nasty errors and crashes.
> > > >>>
> > > >>> I have done the same but not (yet) seen a crash or error. Maybe you 
> > > >>> had
> > > >>> a typo?
> > > >>
> > > >> Can you send me an updated patch?  I'd like to try to get where you
> > > >> are that doesn't crash.
> > > >
> > > > Yes, as soon as I have access.
> > >
> > > it turns out that my patch breaks cpufreq completely...
> > > So it looks as if *I* have a typo :)
> > >
> > > Hence I am likely running at constant speed and the
> > > VDD1 regulator is fixed a 1.200V.
> > >
> > > root@letux:~# dmesg|fgrep opp
> > > [2.426208] cpu cpu0: opp_parse_supplies: Invalid number of elements 
> > > in opp-microvolt property (6) with supplies (1)
> > > [2.438140] cpu cpu0: _of_add_opp_table_v2: Failed to add OPP, -22
> > > root@letux:~# cat /sys/class/regulator/regulator.8/microvolts
> > > 120
> > > root@letux:~#
> > >
> > > The error message looks as if we have to enable multi_regulator.
> >
> > That will enable both vdd and vbb regulators from what I can tell in the 
> > driver.
> >
> > > And that may need to rename cpu0-supply to vdd-supply (unless the
> > > names can be configured).
> >
> > That is consistent with what I found.  vdd-supply = <>; and
> > vbb-supply = <_mpu_iva>;
> > I put them both under the cpu node.  Unfortunately, when I did that,
> > my board crashed.
> >
> > I am thinking it has something to do with the abb_mpu_iva driver
> > because until this point, we've always operated at 800MHz or lower
> > which all have the same behavior in abb_mpu_iva.
> >
> > With the patch you posted for the regulator, without the update to
> > cpufreq,  and with debugging enabled, I received the following in
> > dmesg:
> >
> > [1.112518] ti_abb 483072f0.regulator-abb-mpu: Missing
> > 'efuse-address' IO resource
> > [1.112579] ti_abb 483072f0.regulator-abb-mpu: [0]v=1012500 ABB=0
> > ef=0x0 rbb=0x0 fbb=0x0 vset=0x0
> > [1.112609] ti_abb 483072f0.regulator-abb-mpu: [1]v=120 ABB=0
> > ef=0x0 rbb=0x0 fbb=0x0 vset=0x0
> > [1.112609] ti_abb 483072f0.regulator-abb-mpu: [2]v=1325000 ABB=0
> > ef=0x0 rbb=0x0 fbb=0x0 vset=0x0
> > [1.112640] ti_abb 483072f0.regulator-abb-mpu: [3]v=1375000 ABB=1
> > ef=0x0 rbb=0x0 fbb=0x0 vset=0x0
> > [1.112731] ti_abb 483072f0.regulator-abb-mpu: ti_abb_init_timings:
> > Clk_rate=1300, sr2_cnt=0x0032
> >
>
> Using an unmodified kernel, I changed the device tree to make the abb
> regulator power the cpu instead of vcc.  After doing so, I was able to
> read the microvolt value, and it matched the processor's desired OPP
> voltage, and the debug code showed the abb regulator was attempting to
> set the various index based on the needed voltage.  I think the abb
> driver is working correctly.
>
> I think tomorrow, I will re-apply the patches and tweak it again to
> support both vdd and vbb regulators.  If it crashes again, I'll look
> more closely at the logs to see if I can determine why.  I think it's
> pretty close.  I also need to go back and find the SmartReflex stuff
> as well.
>
Well, I couldn't give it up for the night, so I thought I'd show my data dump

[9.807647] [ cut here ]
[9.812469] WARNING: CPU: 0 PID: 68 at drivers/opp/core.c:630
dev_pm_opp_set_rate+0x3cc/0x480
[9.821044] Modules linked in: sha256_generic sha256_arm cfg80211
joydev mousedev evdev snd_soc_omap_twl4030(+) leds_gpio led_class
panel_simple pwm_omap_dmtimer gpio_keys pwm_bl cpufreq_dt omap3_isp v
ideobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videobuf2_common
bq27xxx_battery_hdq v4l2_fwnode snd_soc_omap_mcbsp bq27xxx_battery
snd_soc_ti_sdma omap_wdt videodev mc omap_hdq wlcore_sdio wire cn ph
y_twl4030_usb hwmon omap2430 musb_hdrc omap_mailbox twl4030_wdt
watchdog udc_core rtc_twl snd_soc_twl4030 ohci_platform(+)
snd_soc_core snd_pcm_dmaengine ohci_hcd snd_pcm ehci_omap(+)
twl4030_pwrbutton sn
d_timer twl4030_charger snd pwm_twl_led pwm_twl ehci_hcd industrialio
soundcore twl4030_keypad matrix_keymap usbcore at24 tsc2004
tsc200x_core usb_common omap_ssi hsi omapdss omapdss_base drm
drm_panel_or
ientation_quirks cec
[9.894470] CPU: 0 PID: 68 Comm: kworker/0:2 Not tainted
5.3.0-rc3-00785-gfdfc7f21c6b7-dirty #5
[9.903198] Hardware name: Generic OMAP36xx (Flattened Device Tree)
[9.909515] Workqueue: events dbs_work_handler
[9.914031] [] (unwind_backtrace) from []
(show_stack+0x10/0x14)
[9.921813] [] (show_stack) from []
(dump_stack+0xb4/0xd4)
[9.929107] [] (dump_stack) from []
(__warn.part.3+0xa8/0xd4)
[9.936614] [] (__warn.part.3) from []
(warn_slowpath_null+0x40/0x4c)
[9.944854] [] (warn_slowpath_null) from []

[PATCH] net: qrtr: fix memory leak in qrtr_tun_write_iter

2019-09-10 Thread Navid Emamdoost
In qrtr_tun_write_iter the allocated kbuf should be released in case an
error happens.

Signed-off-by: Navid Emamdoost 
---
 net/qrtr/tun.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/qrtr/tun.c b/net/qrtr/tun.c
index ccff1e544c21..1dba8b92560e 100644
--- a/net/qrtr/tun.c
+++ b/net/qrtr/tun.c
@@ -84,12 +84,18 @@ static ssize_t qrtr_tun_write_iter(struct kiocb *iocb, 
struct iov_iter *from)
if (!kbuf)
return -ENOMEM;
 
-   if (!copy_from_iter_full(kbuf, len, from))
+   if (!copy_from_iter_full(kbuf, len, from)) {
+   kfree(kbuf);
return -EFAULT;
+   }
 
ret = qrtr_endpoint_post(>ep, kbuf, len);
+   if (ret < 0) {
+   kfree(kbuf);
+   return ret;
+   }
 
-   return ret < 0 ? ret : len;
+   return len;
 }
 
 static __poll_t qrtr_tun_poll(struct file *filp, poll_table *wait)
-- 
2.17.1
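
An equivalent shape for the same fix, using a single error label as is
common in kernel code (a sketch only, with the same semantics as the hunk
above; the endpoint pointer name tun->ep is assumed from context):

	if (!copy_from_iter_full(kbuf, len, from)) {
		ret = -EFAULT;
		goto err_free;
	}

	ret = qrtr_endpoint_post(&tun->ep, kbuf, len);
	if (ret < 0)
		goto err_free;

	return len;

err_free:
	kfree(kbuf);
	return ret;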



Re: [RFC PATCH 2/4] rseq: Fix: Unregister rseq for CLONE_TLS

2019-09-10 Thread Mathieu Desnoyers
Of course, this patch title should read:

  rseq: Fix: Unregister rseq for CLONE_SETTLS

- On Sep 11, 2019, at 1:27 AM, Mathieu Desnoyers 
mathieu.desnoy...@efficios.com wrote:

 
> /*
>  * If parent process has a registered restartable sequences area, the
> - * child inherits. Only applies when forking a process, not a thread.
> + * child inherits. Unregister rseq for a clone with CLONE_TLS set.

and here CLONE_SETTLS as well.

>  */
> static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags)
> {
> - if (clone_flags & CLONE_THREAD) {
> + if (clone_flags & CLONE_TLS) {

.. and here.

Thanks,

Mathieu

>   t->rseq = NULL;
>   t->rseq_sig = 0;
>   t->rseq_event_mask = 0;
> --
> 2.17.1

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH] x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area.

2019-09-10 Thread Kirill A. Shutemov
On Tue, Sep 10, 2019 at 09:28:10AM -0500, Steve Wahl wrote:
> On Mon, Sep 09, 2019 at 11:14:14AM +0300, Kirill A. Shutemov wrote:
> > On Fri, Sep 06, 2019 at 04:29:50PM -0500, Steve Wahl wrote:
> > > ...
> > > The answer is to invalidate the pages of this table outside the
> > > address range occupied by the kernel before the page table is
> > > activated.  This patch has been validated to fix this problem on our
> > > hardware.
> > 
> > If the goal is to avoid *any* mapping of the reserved region to stop
> > speculation, I don't think this patch will do the job. We still (likely)
> > have the same memory mapped as part of the identity mapping. And it
> > happens at least in two places: here and before on decompression stage.
> 
> I imagine you are likely correct, ideally you would not map any
> reserved pages in these spaces.
> 
> I've been reading the code to try to understand what you say above.
> For identity mappings in the kernel, I see level2_ident_pgt mapping
> the first 1G.

This is for XEN case. Not sure how relevant it is for you.

> And I see early_dynamic_pgts being set up with an identity mapping of
> the kernel that seems to be pretty well restricted to the range _text
> through _end.

Right, but rounded to 2M around the place the kernel was decompressed to.
Some of the reserved areas from the listing below are smaller than 2M or
not aligned to 2M.

> Within the decompression code, I see an identity mapping of the first
> 4G set up within the 32 bit code.  I believe we go past that to the
> startup_64 entry point.  (I don't know how common that path is, but I
> don't have a way to test it without figuring out how to force it.)

The kernel can start in 64-bit mode directly, and in this case we inherit
page tables from the bootloader/BIOS. They are trusted to provide an
identity mapping that covers at least the kernel (plus some more essential
stuff), but they are free to map more.

> From a pragmatic standpoint, the guy who can verify this for me is on
> vacation, but I believe our BIOS won't ever place the halt-causing
> ranges in a space below 4GiB.  Which explains why this patch works for
> our hardware.  (We do have reserved regions below 4G, just not the
> ones that hardware causes a halt for accessing.)
> 
> In case it helps you picture the situation, our hardware takes a small
> portion of RAM from the end of each NUMA node (or it might be pairs or
> quads of NUMA nodes, I'm not entirely clear on this at the moment) for
> its own purposes.  Here's a section of our e820 table:
> 
> [0.00] BIOS-e820: [mem 0x7c00-0x8fff] reserved
> [0.00] BIOS-e820: [mem 0xf800-0xfbff] reserved
> [0.00] BIOS-e820: [mem 0xfe00-0xfe010fff] reserved
> [0.00] BIOS-e820: [mem 0x0001-0x002f7fff] usable
> [0.00] BIOS-e820: [mem 0x002f8000-0x00303fff] reserved
> [0.00] BIOS-e820: [mem 0x00304000-0x005f7bff] usable
> [0.00] BIOS-e820: [mem 0x005f7c00-0x00603fff] reserved
> [0.00] BIOS-e820: [mem 0x00604000-0x008f7bff] usable
> [0.00] BIOS-e820: [mem 0x008f7c00-0x00903fff] reserved
> [0.00] BIOS-e820: [mem 0x00904000-0x00bf7bff] usable
> [0.00] BIOS-e820: [mem 0x00bf7c00-0x00c03fff] reserved
> [0.00] BIOS-e820: [mem 0x00c04000-0x00ef7bff] usable
> [0.00] BIOS-e820: [mem 0x00ef7c00-0x00f03fff] reserved
> [0.00] BIOS-e820: [mem 0x00f04000-0x011f7bff] usable
> [0.00] BIOS-e820: [mem 0x011f7c00-0x01203fff] reserved
> [0.00] BIOS-e820: [mem 0x01204000-0x014f7bff] usable
> [0.00] BIOS-e820: [mem 0x014f7c00-0x01503fff] reserved
> [0.00] BIOS-e820: [mem 0x01504000-0x017f7bff] usable
> [0.00] BIOS-e820: [mem 0x017f7c00-0x01803fff] reserved

It would be interesting to know which of them are problematic.

> Our problem occurs when KASLR (or kexec) places the kernel close
> enough to the end of one of the usable sections, and the 1G of 1:1
> mapped space includes a portion of the following reserved section, and
> speculation touches the reserved area.

Are you sure that speculative access is to blame? Speculative access
must not cause a change in architectural state.

-- 
 Kirill A. Shutemov


[RFC PATCH 1/4] rseq: Fix: Reject unknown flags on rseq unregister

2019-09-10 Thread Mathieu Desnoyers
It is preferable to reject unknown flags within rseq unregistration
rather than to ignore them. This is an oversight caused by the fact that
the check for unknown flags comes after the rseq unregister flag check.

Signed-off-by: Mathieu Desnoyers 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra (Intel) 
Cc: "Paul E. McKenney" 
Cc: Boqun Feng 
Cc: "H . Peter Anvin" 
Cc: Paul Turner 
Cc: linux-...@vger.kernel.org
---
 kernel/rseq.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/rseq.c b/kernel/rseq.c
index 27c48eb7de40..a4f86a9d6937 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -310,6 +310,8 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, 
rseq_len,
int ret;
 
if (flags & RSEQ_FLAG_UNREGISTER) {
+   if (flags & ~RSEQ_FLAG_UNREGISTER)
+   return -EINVAL;
/* Unregister rseq for current thread. */
if (current->rseq != rseq || !current->rseq)
return -EINVAL;
-- 
2.17.1



[RFC PATCH 2/4] rseq: Fix: Unregister rseq for CLONE_TLS

2019-09-10 Thread Mathieu Desnoyers
It has been reported by Google that rseq is not behaving properly
with respect to clone when CLONE_VM is used without CLONE_THREAD.
It keeps the prior thread's rseq TLS registered when the TLS of the
thread has moved, so the kernel deals with the wrong TLS.

The approach of clearing the per task-struct rseq registration
on clone with CLONE_THREAD flag is incomplete. It does not cover
the use-case of clone with CLONE_VM set, but without CLONE_THREAD.

Looking more closely at each of the clone flags:

- CLONE_THREAD,
- CLONE_VM,
- CLONE_SETTLS.

It appears that the flag we really want to track is CLONE_SETTLS, which
moves the location of the TLS for the child, which makes the rseq
registration point to the wrong TLS.

Suggested-by: "H . Peter Anvin" 
Signed-off-by: Mathieu Desnoyers 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra (Intel) 
Cc: "Paul E. McKenney" 
Cc: Boqun Feng 
Cc: "H . Peter Anvin" 
Cc: Paul Turner 
Cc: Dmitry Vyukov 
Cc: linux-...@vger.kernel.org
---
 include/linux/sched.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9f51932bd543..deb4154dbf11 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1919,11 +1919,11 @@ static inline void rseq_migrate(struct task_struct *t)
 
 /*
  * If parent process has a registered restartable sequences area, the
- * child inherits. Only applies when forking a process, not a thread.
+ * child inherits. Unregister rseq for a clone with CLONE_TLS set.
  */
 static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags)
 {
-   if (clone_flags & CLONE_THREAD) {
+   if (clone_flags & CLONE_TLS) {
t->rseq = NULL;
t->rseq_sig = 0;
t->rseq_event_mask = 0;
-- 
2.17.1
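
For illustration, the problematic combination looks roughly like this from
userspace (a hypothetical snippet, not taken from the report):

	#define _GNU_SOURCE
	#include <sched.h>
	#include <signal.h>
	#include <stdlib.h>

	static int child_fn(void *arg) { return 0; }

	int main(void)
	{
		char *stack = malloc(64 * 1024);
		void *new_tls = malloc(4096);	/* stand-in for a properly set up TLS block */

		/* CLONE_VM without CLONE_THREAD, but with a new TLS: before this
		 * fix the child inherited the parent's rseq registration even
		 * though its TLS (and therefore its struct rseq) lives elsewhere. */
		clone(child_fn, stack + 64 * 1024,
		      CLONE_VM | CLONE_SETTLS | SIGCHLD,
		      NULL, NULL, new_tls, NULL);
		return 0;
	}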



[RFC PATCH 4/4] rseq/selftests: Use RSEQ_FLAG_UNREG_CLONE_FLAGS

2019-09-10 Thread Mathieu Desnoyers
Use the new RSEQ_FLAG_UNREG_CLONE_FLAGS rseq flag in the rseq selftests.

Signed-off-by: Mathieu Desnoyers 
Cc: Shuah Khan 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra (Intel) 
Cc: "Paul E. McKenney" 
Cc: Boqun Feng 
Cc: "H . Peter Anvin" 
Cc: Paul Turner 
Cc: Dmitry Vyukov 
---
 tools/testing/selftests/rseq/rseq.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/rseq/rseq.c 
b/tools/testing/selftests/rseq/rseq.c
index 7159eb777fd3..d5268570014c 100644
--- a/tools/testing/selftests/rseq/rseq.c
+++ b/tools/testing/selftests/rseq/rseq.c
@@ -68,9 +68,10 @@ static void signal_restore(sigset_t oldset)
 }
 
 static int sys_rseq(volatile struct rseq *rseq_abi, uint32_t rseq_len,
-   int flags, uint32_t sig)
+   int flags, uint32_t sig, int unreg_clone_flags)
 {
-   return syscall(__NR_rseq, rseq_abi, rseq_len, flags, sig);
+   return syscall(__NR_rseq, rseq_abi, rseq_len, flags, sig,
+  unreg_clone_flags);
 }
 
 int rseq_register_current_thread(void)
@@ -87,7 +88,9 @@ int rseq_register_current_thread(void)
}
if (__rseq_refcount++)
goto end;
-   rc = sys_rseq(&__rseq_abi, sizeof(struct rseq), 0, RSEQ_SIG);
+   rc = sys_rseq(&__rseq_abi, sizeof(struct rseq),
+ RSEQ_FLAG_UNREG_CLONE_FLAGS, RSEQ_SIG,
+ CLONE_SETTLS);
if (!rc) {
assert(rseq_current_cpu_raw() >= 0);
goto end;
@@ -116,7 +119,7 @@ int rseq_unregister_current_thread(void)
if (--__rseq_refcount)
goto end;
rc = sys_rseq(&__rseq_abi, sizeof(struct rseq),
- RSEQ_FLAG_UNREGISTER, RSEQ_SIG);
+ RSEQ_FLAG_UNREGISTER, RSEQ_SIG, 0);
if (!rc)
goto end;
__rseq_refcount = 1;
-- 
2.17.1



Re: [PATCH] pinctrl: at91-pio4: implement .get_multiple and .set_multiple

2019-09-10 Thread Linus Walleij
On Thu, Sep 5, 2019 at 3:13 PM Alexandre Belloni
 wrote:
>
> Implement .get_multiple and .set_multiple to allow reading or setting
> multiple pins simultaneously. Pins in the same bank will all be switched at
> the same time, improving synchronization and performance.
>
> Signed-off-by: Alexandre Belloni 

Good initiative!

> +   for (bank = 0; bank < atmel_pioctrl->nbanks; bank++) {
> +   unsigned int word = bank;
> +   unsigned int offset = 0;
> +   unsigned int reg;
> +
> +#if ATMEL_PIO_NPINS_PER_BANK != BITS_PER_LONG

Should it not be > rather than != ?

> +   word = BIT_WORD(bank * ATMEL_PIO_NPINS_PER_BANK);
> +   offset = bank * ATMEL_PIO_NPINS_PER_BANK % BITS_PER_LONG;
> +#endif

This doesn't look good for multiplatform kernels.

We need to get rid of any compiletime constants like this.

Not your fault I suppose it is already there, but this really need
to be fixed. Any ideas?

Yours,
Linus Walleij
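
One possible runtime-only shape for that computation, assuming the
pins-per-bank count were carried in the driver data rather than a macro
(field name hypothetical, untested):

	unsigned int first_pin = bank * atmel_pioctrl->npins_per_bank;	/* hypothetical field */
	unsigned int word = BIT_WORD(first_pin);
	unsigned int offset = first_pin % BITS_PER_LONG;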


[RFC PATCH 3/4] rseq: Introduce unreg_clone_flags

2019-09-10 Thread Mathieu Desnoyers
Considering that some custom libc could possibly choose not to use
CLONE_SETTLS, we should allow the libc to override the choice of clone
flags meant to unregister rseq. This is a policy decision which should
not be made by the kernel.

Therefore, introduce a new RSEQ_FLAG_UNREG_CLONE_FLAGS, which makes the
rseq system call expect an additional 5th argument: a mask of all the
clone flags which may each ensure rseq is unregistered upon clone.

So even if CLONE_SETTLS is eventually replaced by some other flag in the
future, the libc will be able to adapt and pass this new flag upon rseq
registration as well.

The default when RSEQ_FLAG_UNREG_CLONE_FLAGS is unset is to unregister
rseq on clone with CLONE_SETTLS.

Suggested-by: "H . Peter Anvin" 
Signed-off-by: Mathieu Desnoyers 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra (Intel) 
Cc: "Paul E. McKenney" 
Cc: Boqun Feng 
Cc: "H . Peter Anvin" 
Cc: Paul Turner 
Cc: Dmitry Vyukov 
Cc: linux-...@vger.kernel.org
---
 include/linux/sched.h |  9 +++--
 include/linux/syscalls.h  |  2 +-
 include/uapi/linux/rseq.h |  1 +
 kernel/rseq.c | 14 +++---
 4 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index deb4154dbf11..c8faa6f8493d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1138,6 +1138,7 @@ struct task_struct {
 * with respect to preemption.
 */
unsigned long rseq_event_mask;
+   int rseq_unreg_clone_flags;
 #endif
 
struct tlbflush_unmap_batch tlb_ubc;
@@ -1919,18 +1920,21 @@ static inline void rseq_migrate(struct task_struct *t)
 
 /*
  * If parent process has a registered restartable sequences area, the
- * child inherits. Unregister rseq for a clone with CLONE_TLS set.
+ * child inherits, except if it has been required to be explicitly
+ * unregistered when any of the rseq_unreg_clone_flags are passed to clone.
  */
 static inline void rseq_fork(struct task_struct *t, unsigned long clone_flags)
 {
-   if (clone_flags & CLONE_TLS) {
+   if (clone_flags & t->rseq_unreg_clone_flags) {
t->rseq = NULL;
t->rseq_sig = 0;
t->rseq_event_mask = 0;
+   t->rseq_unreg_clone_flags = 0;
} else {
t->rseq = current->rseq;
t->rseq_sig = current->rseq_sig;
t->rseq_event_mask = current->rseq_event_mask;
+   t->rseq_unreg_clone_flags = current->rseq_unreg_clone_flags;
}
 }
 
@@ -1939,6 +1943,7 @@ static inline void rseq_execve(struct task_struct *t)
t->rseq = NULL;
t->rseq_sig = 0;
t->rseq_event_mask = 0;
+   t->rseq_unreg_clone_flags = 0;
 }
 
 #else
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 88145da7d140..6a242cfcc360 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -987,7 +987,7 @@ asmlinkage long sys_pkey_free(int pkey);
 asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags,
  unsigned mask, struct statx __user *buffer);
 asmlinkage long sys_rseq(struct rseq __user *rseq, uint32_t rseq_len,
-int flags, uint32_t sig);
+int flags, uint32_t sig, int unreg_clone_flags);
 asmlinkage long sys_open_tree(int dfd, const char __user *path, unsigned 
flags);
 asmlinkage long sys_move_mount(int from_dfd, const char __user *from_path,
   int to_dfd, const char __user *to_path,
diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h
index 9a402fdb60e9..d71e3c6b7fdb 100644
--- a/include/uapi/linux/rseq.h
+++ b/include/uapi/linux/rseq.h
@@ -20,6 +20,7 @@ enum rseq_cpu_id_state {
 
 enum rseq_flags {
RSEQ_FLAG_UNREGISTER = (1 << 0),
+   RSEQ_FLAG_UNREG_CLONE_FLAGS = (1 << 1),
 };
 
 enum rseq_cs_flags_bit {
diff --git a/kernel/rseq.c b/kernel/rseq.c
index a4f86a9d6937..c59b8d3dc275 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -304,8 +304,8 @@ void rseq_syscall(struct pt_regs *regs)
 /*
  * sys_rseq - setup restartable sequences for caller thread.
  */
-SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, rseq_len,
-   int, flags, u32, sig)
+SYSCALL_DEFINE5(rseq, struct rseq __user *, rseq, u32, rseq_len,
+   int, flags, u32, sig, int, unreg_clone_flags)
 {
int ret;
 
@@ -324,12 +324,16 @@ SYSCALL_DEFINE4(rseq, struct rseq __user *, rseq, u32, 
rseq_len,
return ret;
current->rseq = NULL;
current->rseq_sig = 0;
+   current->rseq_unreg_clone_flags = 0;
return 0;
}
 
-   if (unlikely(flags))
+   if (unlikely(flags & ~RSEQ_FLAG_UNREG_CLONE_FLAGS))
return -EINVAL;
 
+   if (!(flags & RSEQ_FLAG_UNREG_CLONE_FLAGS))
+   unreg_clone_flags = CLONE_SETTLS;
+
if (current->rseq) {
/*
 * If rseq is 

Re: [Letux-kernel] [RFC PATCH 0/3] Enable 1GHz support on omap36xx

2019-09-10 Thread Adam Ford
On Tue, Sep 10, 2019 at 3:06 PM Adam Ford  wrote:
>
> On Tue, Sep 10, 2019 at 2:55 PM H. Nikolaus Schaller  
> wrote:
> >
> > Ok,
> >
> > > Am 10.09.2019 um 20:51 schrieb H. Nikolaus Schaller :
> > >
> >  it, but then I got some nasty errors and crashes.
> > >>>
> > >>> I have done the same but not (yet) seen a crash or error. Maybe you had
> > >>> a typo?
> > >>
> > >> Can you send me an updated patch?  I'd like to try to get where you
> > >> are that doesn't crash.
> > >
> > > Yes, as soon as I have access.
> >
> > it turns out that my patch breaks cpufreq completely...
> > So it looks as if *I* have a typo :)
> >
> > Hence I am likely running at constant speed and the
> > VDD1 regulator is fixed a 1.200V.
> >
> > root@letux:~# dmesg|fgrep opp
> > [2.426208] cpu cpu0: opp_parse_supplies: Invalid number of elements in 
> > opp-microvolt property (6) with supplies (1)
> > [2.438140] cpu cpu0: _of_add_opp_table_v2: Failed to add OPP, -22
> > root@letux:~# cat /sys/class/regulator/regulator.8/microvolts
> > 120
> > root@letux:~#
> >
> > The error message looks as if we have to enable multi_regulator.
>
> That will enable both vdd and vbb regulators from what I can tell in the 
> driver.
>
> > And that may need to rename cpu0-supply to vdd-supply (unless the
> > names can be configured).
>
> That is consistent with what I found.  vdd-supply = <>; and
> vbb-supply = <_mpu_iva>;
> I put them both under the cpu node.  Unfortunately, when I did that,
> my board crashed.
>
> I am thinking it has something to do with the abb_mpu_iva driver
> because until this point, we've always operated at 800MHz or lower
> which all have the same behavior in abb_mpu_iva.
>
> With the patch you posted for the regulator, without the update to
> cpufreq,  and with debugging enabled, I received the following in
> dmesg:
>
> [1.112518] ti_abb 483072f0.regulator-abb-mpu: Missing
> 'efuse-address' IO resource
> [1.112579] ti_abb 483072f0.regulator-abb-mpu: [0]v=1012500 ABB=0
> ef=0x0 rbb=0x0 fbb=0x0 vset=0x0
> [1.112609] ti_abb 483072f0.regulator-abb-mpu: [1]v=120 ABB=0
> ef=0x0 rbb=0x0 fbb=0x0 vset=0x0
> [1.112609] ti_abb 483072f0.regulator-abb-mpu: [2]v=1325000 ABB=0
> ef=0x0 rbb=0x0 fbb=0x0 vset=0x0
> [1.112640] ti_abb 483072f0.regulator-abb-mpu: [3]v=1375000 ABB=1
> ef=0x0 rbb=0x0 fbb=0x0 vset=0x0
> [1.112731] ti_abb 483072f0.regulator-abb-mpu: ti_abb_init_timings:
> Clk_rate=1300, sr2_cnt=0x0032
>

Using an unmodified kernel, I changed the device tree to make the abb
regulator power the cpu instead of vcc.  After doing so, I was able to
read the microvolt value, and it matched the processor's desired OPP
voltage, and the debug code showed the abb regulator was attempting to
set the various index based on the needed voltage.  I think the abb
driver is working correctly.

I think tomorrow, I will re-apply the patches and tweak it again to
support both vdd and vbb regulators.  If it crashes again, I'll look
more closely at the logs to see if I can determine why.  I think it's
pretty close.  I also need to go back and find the SmartReflex stuff
as well.

adam
>
> adam
> >
> > BR,
> > Nikolaus
> >


Re: [PATCH RESEND v4 8/9] KVM: MMU: Enable Lazy mode SPPT setup

2019-09-10 Thread Yang Weijiang
On Mon, Sep 09, 2019 at 07:10:22PM +0200, Paolo Bonzini wrote:
> On 04/09/19 15:49, Yang Weijiang wrote:
> >>> This would not enable SPP if the guest is backed by huge pages.
> >>> Instead, either the PT_PAGE_TABLE_LEVEL level must be forced for all
> >>> pages covered by SPP ranges, or (better) kvm_enable_spp_protection must
> >>> be able to cover multiple pages at once.
> >>>
> >>> Paolo
> >> OK, I'll figure out how to make it, thanks!
> > Hi, Paolo,
> > Regarding this change, I have some concerns: splitting EPT huge page
> > entries (e.g., a 1GB page) will take a long time compared with normal EPT
> > page fault processing, especially for multiple vcpus/pages, so the
> > in-flight time increases. But the HW walks the EPT for translations in
> > the meantime; would that bring any side effect,
> > or is there a way to mitigate it?
> 
> Sub-page permissions are only defined on EPT PTEs, not on large pages.
> Therefore, in order to allow subpage permissions the EPT page tables
> must already be split.
> 
> Paolo
Thanks, I've added code to handle hugepage, will be included in
next version patch.


Re: [PATCH] net: Remove the source address setting in connect() for UDP

2019-09-10 Thread Enke Chen (enkechen)
Hi, David:

Do you still have concerns about backward compatibility of the fix?

I really do not see how existing, working applications would be negatively 
impacted
by the fix.

Thanks.   -- Enke

-Original Message-
From: "Enke Chen (enkechen)" 
Date: Friday, September 6, 2019 at 12:23 AM
To: David Miller 
Cc: "kuz...@ms2.inr.ac.ru" , "yoshf...@linux-ipv6.org" 
, "net...@vger.kernel.org" , 
"linux-kernel@vger.kernel.org" , 
"xe-linux-external(mailer list)" , "Enke Chen 
(enkechen)" 
Subject: Re: [PATCH] net: Remove the source address setting in connect() for UDP

Hi, David:

Yes, I understand the code has been there for a long time.  But the issues are
real, and it's really nasty when
you run into them.  As I described in the patch log, there is no backward
compatibility issue in fixing it.

---
There is no backward compatibility issue here as the source address setting
in connect() is not needed anyway.

  - No impact on the source address selection when the source address
is explicitly specified by "bind()", or by the "IP_PKTINFO" option.

  - In the case that the source address is not explicitly specified,
the selection of the source address would be more accurate and
reliable based on the up-to-date routing table.
---

Thanks.  -- Enke

-Original Message-
From:  on behalf of David Miller 

Date: Friday, September 6, 2019 at 12:14 AM
To: "Enke Chen (enkechen)" 
Cc: "kuz...@ms2.inr.ac.ru" , "yoshf...@linux-ipv6.org" 
, "net...@vger.kernel.org" , 
"linux-kernel@vger.kernel.org" , 
"xe-linux-external(mailer list)" 
Subject: Re: [PATCH] net: Remove the source address setting in connect() for UDP

From: Enke Chen 
Date: Thu,  5 Sep 2019 19:54:37 -0700

> The connect() system call for a UDP socket is for setting the destination
> address and port. But the current code mistakenly sets the source address
> for the socket as well. Remove the source address setting in connect() for
> UDP in this patch.

Do you have any idea how many decades of precedence this behavior has and
therefore how much you potentially will break userspace?

This boat has sailed a long time ago I'm afraid.




[PATCH] scsi: bfa: release allocated memory in case of error

2019-09-10 Thread Navid Emamdoost
In bfad_im_get_stats, if bfa_port_get_stats fails, the allocated memory
needs to be released.

Signed-off-by: Navid Emamdoost 
---
 drivers/scsi/bfa/bfad_attr.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/bfa/bfad_attr.c b/drivers/scsi/bfa/bfad_attr.c
index 29ab81df75c0..fbfce02e5b93 100644
--- a/drivers/scsi/bfa/bfad_attr.c
+++ b/drivers/scsi/bfa/bfad_attr.c
@@ -275,8 +275,10 @@ bfad_im_get_stats(struct Scsi_Host *shost)
rc = bfa_port_get_stats(BFA_FCPORT(>bfa),
fcstats, bfad_hcb_comp, );
spin_unlock_irqrestore(>bfad_lock, flags);
-   if (rc != BFA_STATUS_OK)
+   if (rc != BFA_STATUS_OK) {
+   kfree(fcstats);
return NULL;
+   }
 
wait_for_completion();
 
-- 
2.17.1



Re: [PATCH] gpio: fix build failure: gpiochip_[un]lock*() static/non-static

2019-09-10 Thread John Hubbard

On 9/8/19 12:16 AM, Linus Walleij wrote:

On Sat, Sep 7, 2019 at 2:05 AM John Hubbard  wrote:


While building with !CONFIG_GPIOLIB, I experienced a build failure,
because driver.h in that configuration supplies both a static and
a non-static version of these routines:


I think this is fixed in my latest version of the "devel" branch?



OK, sounds good to me. Sorry for not spotting that a fix is in
the pipeline. :)


thanks,
--
John Hubbard
NVIDIA


Re: KASAN: use-after-free Read in rxrpc_send_keepalive

2019-09-10 Thread syzbot

syzbot has found a reproducer for the following crash on:

HEAD commit:3120b9a6 Merge tag 'ipc-fixes' of git://git.kernel.org/pub..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=107d1ca560
kernel config:  https://syzkaller.appspot.com/x/.config?x=ed2b148cd67382ec
dashboard link: https://syzkaller.appspot.com/bug?extid=d850c266e3df14da1d31
compiler:   clang version 9.0.0 (/home/glider/llvm/clang  
80fee25776c2fb61e74c1ecb1a523375c2500b69)

syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=1734709560
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=143bcca560

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+d850c266e3df14da1...@syzkaller.appspotmail.com

==
BUG: KASAN: use-after-free in rxrpc_send_keepalive+0xe2/0x3c0  
net/rxrpc/output.c:634

Read of size 8 at addr 8880a859a058 by task kworker/0:2/3016

CPU: 0 PID: 3016 Comm: kworker/0:2 Not tainted 5.3.0-rc8+ #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Workqueue: krxrpcd rxrpc_peer_keepalive_worker
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1d8/0x2f8 lib/dump_stack.c:113
 print_address_description+0x75/0x5b0 mm/kasan/report.c:351
 __kasan_report+0x14b/0x1c0 mm/kasan/report.c:482
 kasan_report+0x26/0x50 mm/kasan/common.c:618
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
 rxrpc_send_keepalive+0xe2/0x3c0 net/rxrpc/output.c:634
 rxrpc_peer_keepalive_dispatch net/rxrpc/peer_event.c:369 [inline]
 rxrpc_peer_keepalive_worker+0x76e/0xb40 net/rxrpc/peer_event.c:430
 process_one_work+0x7ef/0x10e0 kernel/workqueue.c:2269
 worker_thread+0xc01/0x1630 kernel/workqueue.c:2415
 kthread+0x332/0x350 kernel/kthread.c:255
 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

Allocated by task 9378:
 save_stack mm/kasan/common.c:69 [inline]
 set_track mm/kasan/common.c:77 [inline]
 __kasan_kmalloc+0x11c/0x1b0 mm/kasan/common.c:493
 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:507
 kmem_cache_alloc_trace+0x221/0x2f0 mm/slab.c:3550
 kmalloc include/linux/slab.h:552 [inline]
 kzalloc include/linux/slab.h:748 [inline]
 rxrpc_alloc_connection+0x79/0x490 net/rxrpc/conn_object.c:41
 rxrpc_alloc_client_connection net/rxrpc/conn_client.c:176 [inline]
 rxrpc_get_client_conn net/rxrpc/conn_client.c:339 [inline]
 rxrpc_connect_call+0xb30/0x2c40 net/rxrpc/conn_client.c:697
 rxrpc_new_client_call+0x6d5/0xb60 net/rxrpc/call_object.c:289
 rxrpc_new_client_call_for_sendmsg net/rxrpc/sendmsg.c:595 [inline]
 rxrpc_do_sendmsg+0xf2b/0x19b0 net/rxrpc/sendmsg.c:652
 rxrpc_sendmsg+0x5eb/0x8b0 net/rxrpc/af_rxrpc.c:585
 sock_sendmsg_nosec net/socket.c:637 [inline]
 sock_sendmsg net/socket.c:657 [inline]
 ___sys_sendmsg+0x60d/0x910 net/socket.c:2311
 __sys_sendmmsg+0x239/0x470 net/socket.c:2413
 __do_sys_sendmmsg net/socket.c:2442 [inline]
 __se_sys_sendmmsg net/socket.c:2439 [inline]
 __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2439
 do_syscall_64+0xfe/0x140 arch/x86/entry/common.c:296
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 16:
 save_stack mm/kasan/common.c:69 [inline]
 set_track mm/kasan/common.c:77 [inline]
 __kasan_slab_free+0x12a/0x1e0 mm/kasan/common.c:455
 kasan_slab_free+0xe/0x10 mm/kasan/common.c:463
 __cache_free mm/slab.c:3425 [inline]
 kfree+0x115/0x200 mm/slab.c:3756
 rxrpc_destroy_connection+0x1ec/0x240 net/rxrpc/conn_object.c:372
 __rcu_reclaim kernel/rcu/rcu.h:222 [inline]
 rcu_do_batch kernel/rcu/tree.c:2114 [inline]
 rcu_core+0x892/0xf10 kernel/rcu/tree.c:2314
 rcu_core_si+0x9/0x10 kernel/rcu/tree.c:2323
 __do_softirq+0x333/0x7c4 arch/x86/include/asm/paravirt.h:778

The buggy address belongs to the object at 8880a859a040
 which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 24 bytes inside of
 1024-byte region [8880a859a040, 8880a859a440)
The buggy address belongs to the page:
page:ea0002a16680 refcount:1 mapcount:0 mapping:8880aa400c40  
index:0x0 compound_mapcount: 0

flags: 0x1fffc010200(slab|head)
raw: 01fffc010200 ea00024cc688 ea0002684d88 8880aa400c40
raw:  8880a859a040 00010007 
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 8880a8599f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 8880a8599f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

8880a859a000: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb

^
 8880a859a080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 8880a859a100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==



[PATCH v4 3/9] hugetlb_cgroup: add reservation accounting for private mappings

2019-09-10 Thread Mina Almasry
Normally the pointer to the cgroup to uncharge hangs off the struct
page, and gets queried when it's time to free the page. With
hugetlb_cgroup reservations, this is not possible, because a page can
be reserved by one task and actually faulted in by another
task.

The best place to put the hugetlb_cgroup pointer to uncharge for
reservations is in the resv_map. But, because the resv_map has different
semantics for private and shared mappings, the code path to
charge/uncharge shared and private mappings is different. This patch
implements charging and uncharging for private mappings.

For private mappings, the counter to uncharge is in
resv_map->reservation_counter. On initializing the resv_map this is set
to NULL. On reservation of a region in a private mapping, the task's
hugetlb_cgroup is charged and the hugetlb_cgroup is placed in
resv_map->reservation_counter.

On hugetlb_vm_op_close, we uncharge resv_map->reservation_counter.

Signed-off-by: Mina Almasry 
---
 include/linux/hugetlb.h|  8 ++
 include/linux/hugetlb_cgroup.h | 11 
 mm/hugetlb.c   | 47 --
 mm/hugetlb_cgroup.c| 12 -
 4 files changed, 64 insertions(+), 14 deletions(-)
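
A rough sketch of the private-mapping charge path described in the
changelog (illustrative only; the actual hunks follow below):

	/* On reserving a range in a private mapping: charge the task's hugetlb
	 * cgroup for the reservation and remember the counter in the resv_map,
	 * so hugetlb_vm_op_close() can uncharge it later. */
	ret = hugetlb_cgroup_charge_cgroup(hstate_index(h),
					   chg * pages_per_huge_page(h),
					   &h_cg, true /* reserved */);
	if (ret)
		return ret;

	resv_map->reservation_counter =
		&h_cg->reserved_hugepage[hstate_index(h)];
	resv_map->pages_per_hpage = pages_per_huge_page(h);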

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 128ff1aff1c93..536cb144cf484 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -46,6 +46,14 @@ struct resv_map {
long adds_in_progress;
struct list_head region_cache;
long region_cache_count;
+ #ifdef CONFIG_CGROUP_HUGETLB
+   /*
+* On private mappings, the counter to uncharge reservations is stored
+* here. If these fields are 0, then the mapping is shared.
+*/
+   struct page_counter *reservation_counter;
+   unsigned long pages_per_hpage;
+#endif
 };
 extern struct resv_map *resv_map_alloc(void);
 void resv_map_release(struct kref *ref);
diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
index c467715dd8fb8..8c6ea58c63c89 100644
--- a/include/linux/hugetlb_cgroup.h
+++ b/include/linux/hugetlb_cgroup.h
@@ -25,6 +25,17 @@ struct hugetlb_cgroup;
 #define HUGETLB_CGROUP_MIN_ORDER   2

 #ifdef CONFIG_CGROUP_HUGETLB
+struct hugetlb_cgroup {
+   struct cgroup_subsys_state css;
+   /*
+* the counter to account for hugepages from hugetlb.
+*/
+   struct page_counter hugepage[HUGE_MAX_HSTATE];
+   /*
+* the counter to account for hugepage reservations from hugetlb.
+*/
+   struct page_counter reserved_hugepage[HUGE_MAX_HSTATE];
+};

 static inline struct hugetlb_cgroup *hugetlb_cgroup_from_page(struct page 
*page)
 {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e975f55aede94..fbd7c52e17348 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -711,6 +711,16 @@ struct resv_map *resv_map_alloc(void)
INIT_LIST_HEAD(_map->regions);

resv_map->adds_in_progress = 0;
+#ifdef CONFIG_CGROUP_HUGETLB
+   /*
+* Initialize these to 0. On shared mappings, 0's here indicate these
+* fields don't do cgroup accounting. On private mappings, these will be
+* re-initialized to the proper values, to indicate that hugetlb cgroup
+* reservations are to be un-charged from here.
+*/
+   resv_map->reservation_counter = NULL;
+   resv_map->pages_per_hpage = 0;
+#endif

INIT_LIST_HEAD(_map->region_cache);
list_add(>link, _map->region_cache);
@@ -3193,7 +3203,19 @@ static void hugetlb_vm_op_close(struct vm_area_struct 
*vma)

reserve = (end - start) - region_count(resv, start, end);

-   kref_put(>refs, resv_map_release);
+#ifdef CONFIG_CGROUP_HUGETLB
+   /*
+* Since we check for HPAGE_RESV_OWNER above, this must a private
+* mapping, and these values should be none-zero, and should point to
+* the hugetlb_cgroup counter to uncharge for this reservation.
+*/
+   WARN_ON(!resv->reservation_counter);
+   WARN_ON(!resv->pages_per_hpage);
+
+   hugetlb_cgroup_uncharge_counter(
+   resv->reservation_counter,
+   (end - start) * resv->pages_per_hpage);
+#endif

if (reserve) {
/*
@@ -3203,6 +3225,8 @@ static void hugetlb_vm_op_close(struct vm_area_struct 
*vma)
gbl_reserve = hugepage_subpool_put_pages(spool, reserve);
hugetlb_acct_memory(h, -gbl_reserve);
}
+
+   kref_put(>refs, resv_map_release);
 }

 static int hugetlb_vm_op_split(struct vm_area_struct *vma, unsigned long addr)
@@ -4536,6 +4560,7 @@ int hugetlb_reserve_pages(struct inode *inode,
struct hstate *h = hstate_inode(inode);
struct hugepage_subpool *spool = subpool_inode(inode);
struct resv_map *resv_map;
+   struct hugetlb_cgroup *h_cg;
long gbl_reserve;

/* This should never happen */
@@ -4569,11 +4594,29 @@ int hugetlb_reserve_pages(struct 

[PATCH v4 2/9] hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations

2019-09-10 Thread Mina Almasry
Augments hugetlb_cgroup_charge_cgroup to be able to charge the hugetlb
usage or hugetlb reservation counter.

Adds a new interface to uncharge a hugetlb_cgroup counter via
hugetlb_cgroup_uncharge_counter.

Integrates the counter with hugetlb_cgroup, via hugetlb_cgroup_init,
hugetlb_cgroup_have_usage, and hugetlb_cgroup_css_offline.

Signed-off-by: Mina Almasry 
---
 include/linux/hugetlb_cgroup.h | 13 --
 mm/hugetlb.c   |  6 ++-
 mm/hugetlb_cgroup.c| 82 +++---
 3 files changed, 80 insertions(+), 21 deletions(-)

diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h
index 063962f6dfc6a..c467715dd8fb8 100644
--- a/include/linux/hugetlb_cgroup.h
+++ b/include/linux/hugetlb_cgroup.h
@@ -52,14 +52,19 @@ static inline bool hugetlb_cgroup_disabled(void)
 }

 extern int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
-   struct hugetlb_cgroup **ptr);
+   struct hugetlb_cgroup **ptr,
+   bool reserved);
 extern void hugetlb_cgroup_commit_charge(int idx, unsigned long nr_pages,
 struct hugetlb_cgroup *h_cg,
 struct page *page);
 extern void hugetlb_cgroup_uncharge_page(int idx, unsigned long nr_pages,
 struct page *page);
 extern void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
-  struct hugetlb_cgroup *h_cg);
+  struct hugetlb_cgroup *h_cg,
+  bool reserved);
+extern void hugetlb_cgroup_uncharge_counter(struct page_counter *p,
+   unsigned long nr_pages);
+
 extern void hugetlb_cgroup_file_init(void) __init;
 extern void hugetlb_cgroup_migrate(struct page *oldhpage,
   struct page *newhpage);
@@ -83,7 +88,7 @@ static inline bool hugetlb_cgroup_disabled(void)

 static inline int
 hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
-struct hugetlb_cgroup **ptr)
+struct hugetlb_cgroup **ptr, bool reserved)
 {
return 0;
 }
@@ -102,7 +107,7 @@ hugetlb_cgroup_uncharge_page(int idx, unsigned long 
nr_pages, struct page *page)

 static inline void
 hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
-  struct hugetlb_cgroup *h_cg)
+  struct hugetlb_cgroup *h_cg, bool reserved)
 {
 }

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6d7296dd11b83..e975f55aede94 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2078,7 +2078,8 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
gbl_chg = 1;
}

-   ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), _cg);
+   ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), _cg,
+  false);
if (ret)
goto out_subpool_put;

@@ -2126,7 +2127,8 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
return page;

 out_uncharge_cgroup:
-   hugetlb_cgroup_uncharge_cgroup(idx, pages_per_huge_page(h), h_cg);
+   hugetlb_cgroup_uncharge_cgroup(idx, pages_per_huge_page(h), h_cg,
+   false);
 out_subpool_put:
if (map_chg || avoid_reserve)
hugepage_subpool_put_pages(spool, 1);
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index 51a72624bd1ff..2ab36a98d834e 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -38,8 +38,8 @@ struct hugetlb_cgroup {
 static struct hugetlb_cgroup *root_h_cgroup __read_mostly;

 static inline
-struct page_counter *hugetlb_cgroup_get_counter(struct hugetlb_cgroup *h_cg, int idx,
-bool reserved)
+struct page_counter *hugetlb_cgroup_get_counter(struct hugetlb_cgroup *h_cg,
+   int idx, bool reserved)
 {
if (reserved)
return  _cg->reserved_hugepage[idx];
@@ -74,8 +74,12 @@ static inline bool hugetlb_cgroup_have_usage(struct hugetlb_cgroup *h_cg)
int idx;

for (idx = 0; idx < hugetlb_max_hstate; idx++) {
-   if (page_counter_read(_cg->hugepage[idx]))
+   if (page_counter_read(hugetlb_cgroup_get_counter(h_cg, idx,
+   true)) ||
+   page_counter_read(hugetlb_cgroup_get_counter(h_cg, idx,
+   false))) {
return true;
+   }
}
return false;
 }
@@ -86,18 +90,30 @@ static void hugetlb_cgroup_init(struct hugetlb_cgroup *h_cgroup,
int idx;

for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) {
-   struct 

[PATCH v4 5/9] hugetlb: remove duplicated code

2019-09-10 Thread Mina Almasry
Remove duplicated code between region_chg and region_add, and refactor it into
a common function, add_reservation_in_range. This is mostly done because
there is a follow up change in this series that disables region
coalescing in region_add, and I want to make that change in one place
only. It should improve maintainability anyway on its own.

Signed-off-by: Mina Almasry 
---
 mm/hugetlb.c | 116 ---
 1 file changed, 54 insertions(+), 62 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index bea51ae422f63..ce5ed1056fefd 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -244,6 +244,57 @@ struct file_region {
long to;
 };

+static long add_reservation_in_range(
+   struct resv_map *resv, long f, long t, bool count_only)
+{
+
+   long chg = 0;
+   struct list_head *head = >regions;
+   struct file_region *rg = NULL, *trg = NULL, *nrg = NULL;
+
+   /* Locate the region we are before or in. */
+   list_for_each_entry(rg, head, link)
+   if (f <= rg->to)
+   break;
+
+   /* Round our left edge to the current segment if it encloses us. */
+   if (f > rg->from)
+   f = rg->from;
+
+   chg = t - f;
+
+   /* Check for and consume any regions we now overlap with. */
+   nrg = rg;
+   list_for_each_entry_safe(rg, trg, rg->link.prev, link) {
+   if (>link == head)
+   break;
+   if (rg->from > t)
+   break;
+
+   /* We overlap with this area, if it extends further than
+* us then we must extend ourselves.  Account for its
+* existing reservation.
+*/
+   if (rg->to > t) {
+   chg += rg->to - t;
+   t = rg->to;
+   }
+   chg -= rg->to - rg->from;
+
+   if (!count_only && rg != nrg) {
+   list_del(>link);
+   kfree(rg);
+   }
+   }
+
+   if (!count_only) {
+   nrg->from = f;
+   nrg->to = t;
+   }
+
+   return chg;
+}
+
 /*
  * Add the huge page range represented by [f, t) to the reserve
  * map.  Existing regions will be expanded to accommodate the specified
@@ -257,7 +308,7 @@ struct file_region {
 static long region_add(struct resv_map *resv, long f, long t)
 {
struct list_head *head = >regions;
-   struct file_region *rg, *nrg, *trg;
+   struct file_region *rg, *nrg;
long add = 0;

spin_lock(>lock);
@@ -287,38 +338,7 @@ static long region_add(struct resv_map *resv, long f, long t)
goto out_locked;
}

-   /* Round our left edge to the current segment if it encloses us. */
-   if (f > rg->from)
-   f = rg->from;
-
-   /* Check for and consume any regions we now overlap with. */
-   nrg = rg;
-   list_for_each_entry_safe(rg, trg, rg->link.prev, link) {
-   if (>link == head)
-   break;
-   if (rg->from > t)
-   break;
-
-   /* If this area reaches higher then extend our area to
-* include it completely.  If this is not the first area
-* which we intend to reuse, free it. */
-   if (rg->to > t)
-   t = rg->to;
-   if (rg != nrg) {
-   /* Decrement return value by the deleted range.
-* Another range will span this area so that by
-* end of routine add will be >= zero
-*/
-   add -= (rg->to - rg->from);
-   list_del(>link);
-   kfree(rg);
-   }
-   }
-
-   add += (nrg->from - f); /* Added to beginning of region */
-   nrg->from = f;
-   add += t - nrg->to; /* Added to end of region */
-   nrg->to = t;
+   add = add_reservation_in_range(resv, f, t, false);

 out_locked:
resv->adds_in_progress--;
@@ -345,8 +365,6 @@ static long region_add(struct resv_map *resv, long f, long t)
  */
 static long region_chg(struct resv_map *resv, long f, long t)
 {
-   struct list_head *head = >regions;
-   struct file_region *rg;
long chg = 0;

spin_lock(>lock);
@@ -375,34 +393,8 @@ static long region_chg(struct resv_map *resv, long f, long t)
goto retry_locked;
}

-   /* Locate the region we are before or in. */
-   list_for_each_entry(rg, head, link)
-   if (f <= rg->to)
-   break;
-
-   /* Round our left edge to the current segment if it encloses us. */
-   if (f > rg->from)
-   f = rg->from;
-   chg = t - f;
-
-   /* Check for and consume any regions we now overlap with. */
-   list_for_each_entry(rg, rg->link.prev, 

[PATCH v4 9/9] hugetlb_cgroup: Add hugetlb_cgroup reservation docs

2019-09-10 Thread Mina Almasry
Add docs for how to use hugetlb_cgroup reservations, and their behavior.

Signed-off-by: Mina Almasry 
Acked-by: Hillf Danton 
---
 .../admin-guide/cgroup-v1/hugetlb.rst | 84 ---
 1 file changed, 73 insertions(+), 11 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v1/hugetlb.rst b/Documentation/admin-guide/cgroup-v1/hugetlb.rst
index a3902aa253a96..cc6eb859fc722 100644
--- a/Documentation/admin-guide/cgroup-v1/hugetlb.rst
+++ b/Documentation/admin-guide/cgroup-v1/hugetlb.rst
@@ -2,13 +2,6 @@
 HugeTLB Controller
 ==

-The HugeTLB controller allows to limit the HugeTLB usage per control group and
-enforces the controller limit during page fault. Since HugeTLB doesn't
-support page reclaim, enforcing the limit at page fault time implies that,
-the application will get SIGBUS signal if it tries to access HugeTLB pages
-beyond its limit. This requires the application to know beforehand how much
-HugeTLB pages it would require for its use.
-
 HugeTLB controller can be created by first mounting the cgroup filesystem.

 # mount -t cgroup -o hugetlb none /sys/fs/cgroup
@@ -28,10 +21,14 @@ process (bash) into it.

 Brief summary of control files::

- hugetlb.<hugepagesize>.limit_in_bytes # set/show limit of "hugepagesize" hugetlb usage
- hugetlb.<hugepagesize>.max_usage_in_bytes # show max "hugepagesize" hugetlb usage recorded
- hugetlb.<hugepagesize>.usage_in_bytes # show current usage for "hugepagesize" hugetlb
- hugetlb.<hugepagesize>.failcnt # show the number of allocation failure due to HugeTLB limit
+ hugetlb.<hugepagesize>.reservation_limit_in_bytes # set/show limit of "hugepagesize" hugetlb reservations
+ hugetlb.<hugepagesize>.reservation_max_usage_in_bytes # show max "hugepagesize" hugetlb reservations recorded
+ hugetlb.<hugepagesize>.reservation_usage_in_bytes # show current reservations for "hugepagesize" hugetlb
+ hugetlb.<hugepagesize>.reservation_failcnt # show the number of allocation failure due to HugeTLB reservation limit
+ hugetlb.<hugepagesize>.limit_in_bytes # set/show limit of "hugepagesize" hugetlb faults
+ hugetlb.<hugepagesize>.max_usage_in_bytes # show max "hugepagesize" hugetlb usage recorded
+ hugetlb.<hugepagesize>.usage_in_bytes # show current usage for "hugepagesize" hugetlb
+ hugetlb.<hugepagesize>.failcnt # show the number of allocation failure due to HugeTLB usage limit

 For a system supporting three hugepage sizes (64k, 32M and 1G), the control
 files include::
@@ -40,11 +37,76 @@ files include::
   hugetlb.1GB.max_usage_in_bytes
   hugetlb.1GB.usage_in_bytes
   hugetlb.1GB.failcnt
+  hugetlb.1GB.reservation_limit_in_bytes
+  hugetlb.1GB.reservation_max_usage_in_bytes
+  hugetlb.1GB.reservation_usage_in_bytes
+  hugetlb.1GB.reservation_failcnt
   hugetlb.64KB.limit_in_bytes
   hugetlb.64KB.max_usage_in_bytes
   hugetlb.64KB.usage_in_bytes
   hugetlb.64KB.failcnt
+  hugetlb.64KB.reservation_limit_in_bytes
+  hugetlb.64KB.reservation_max_usage_in_bytes
+  hugetlb.64KB.reservation_usage_in_bytes
+  hugetlb.64KB.reservation_failcnt
   hugetlb.32MB.limit_in_bytes
   hugetlb.32MB.max_usage_in_bytes
   hugetlb.32MB.usage_in_bytes
   hugetlb.32MB.failcnt
+  hugetlb.32MB.reservation_limit_in_bytes
+  hugetlb.32MB.reservation_max_usage_in_bytes
+  hugetlb.32MB.reservation_usage_in_bytes
+  hugetlb.32MB.reservation_failcnt
+
+
+1. Reservation limits
+
+The HugeTLB controller allows limiting the HugeTLB reservations per control
+group and enforces the controller limit at reservation time. Reservation limits
+are superior to page fault limits (see section 2), since reservation limits are
+enforced at reservation time and never cause the application to get a SIGBUS
+signal. Instead, if the application is violating its limits, it gets an
+error at reservation time, i.e. the mmap or shmget call returns an error.
+
+
+2. Page fault limits
+
+The HugeTLB controller allows to limit the HugeTLB usage (page fault) per
+control group and enforces the controller limit during page fault. Since HugeTLB
+doesn't support page reclaim, enforcing the limit at page fault time implies
+that, the application will get SIGBUS signal if it tries to access HugeTLB
+pages beyond its limit. This requires the application to know beforehand how
+much HugeTLB pages it would require for its use.
+
+
+3. Caveats with shared memory
+
+a. Charging and uncharging:
+
+For shared hugetlb memory, both hugetlb reservation and usage (page faults) are
+charged to the first task that causes the memory to be reserved or faulted,
+and all subsequent uses of this reserved or faulted memory are done without
+charging.
+
+Shared hugetlb memory is only uncharged when it is unreserved or deallocated.
+This is usually when the hugetlbfs file is deleted, and not when the task that
+caused the reservation or fault has exited.
+
+b. Interaction between reservation limit and fault limit.
+
+Generally, it's not recommended to set both the reservation limit and the fault
+limit in a cgroup. For private memory, the fault usage cannot 

[PATCH v4 7/9] hugetlb_cgroup: add accounting for shared mappings

2019-09-10 Thread Mina Almasry
For shared mappings, the pointer to the hugetlb_cgroup to uncharge lives
in the resv_map entries, in file_region->reservation_counter.

After a call to region_chg, we charge the appropriate hugetlb_cgroup, and if
successful, we pass on the hugetlb_cgroup info to a follow up region_add call.
When a file_region entry is added to the resv_map via region_add, we put the
pointer to that cgroup in file_region->reservation_counter. If charging doesn't
succeed, we report the error to the caller, so that the kernel fails the
reservation.

On region_del, which is when the hugetlb memory is unreserved, we also uncharge
the file_region->reservation_counter.

Signed-off-by: Mina Almasry 
---
 mm/hugetlb.c | 147 ---
 1 file changed, 115 insertions(+), 32 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5eca34d9b753d..711690b87dce5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -242,6 +242,15 @@ struct file_region {
struct list_head link;
long from;
long to;
+#ifdef CONFIG_CGROUP_HUGETLB
+   /*
+* On shared mappings, each reserved region appears as a struct
+* file_region in resv_map. These fields hold the info needed to
+* uncharge each reservation.
+*/
+   struct page_counter *reservation_counter;
+   unsigned long pages_per_hpage;
+#endif
 };

 /* Helper that removes a struct file_region from the resv_map cache and returns
@@ -250,9 +259,29 @@ struct file_region {
 static struct file_region *get_file_region_entry_from_cache(
struct resv_map *resv, long from, long to);

-static long add_reservation_in_range(
-   struct resv_map *resv,
+/* Helper that records hugetlb_cgroup uncharge info. */
+static void record_hugetlb_cgroup_uncharge_info(struct hugetlb_cgroup *h_cg,
+   struct file_region *nrg, struct hstate *h)
+{
+#ifdef CONFIG_CGROUP_HUGETLB
+   if (h_cg) {
+   nrg->reservation_counter =
+   _cg->reserved_hugepage[hstate_index(h)];
+   nrg->pages_per_hpage = pages_per_huge_page(h);
+   } else {
+   nrg->reservation_counter = NULL;
+   nrg->pages_per_hpage = 0;
+   }
+#endif
+}
+
+/* Must be called with resv->lock held. Calling this with dry_run == true will
+ * count the number of pages to be added but will not modify the linked list.
+ */
+static long add_reservation_in_range(struct resv_map *resv,
long f, long t,
+   struct hugetlb_cgroup *h_cg,
+   struct hstate *h,
long *regions_needed,
bool count_only)
 {
@@ -294,6 +323,8 @@ static long add_reservation_in_range(
nrg = get_file_region_entry_from_cache(resv,
last_accounted_offset,
rg->from);
+   record_hugetlb_cgroup_uncharge_info(h_cg, nrg,
+   h);
list_add(>link, rg->link.prev);
} else if (regions_needed)
*regions_needed += 1;
@@ -310,6 +341,7 @@ static long add_reservation_in_range(
if (!count_only) {
nrg = get_file_region_entry_from_cache(resv,
last_accounted_offset, t);
+   record_hugetlb_cgroup_uncharge_info(h_cg, nrg, h);
list_add(>link, rg->link.prev);
} else if (regions_needed)
*regions_needed += 1;
@@ -317,6 +349,7 @@ static long add_reservation_in_range(
last_accounted_offset = t;
}

+   VM_BUG_ON(add < 0);
return add;
 }

@@ -333,8 +366,8 @@ static long add_reservation_in_range(
  * Return the number of new huge pages added to the map.  This
  * number is greater than or equal to zero.
  */
-static long region_add(struct resv_map *resv, long f, long t,
-   long regions_needed)
+static long region_add(struct hstate *h, struct hugetlb_cgroup *h_cg,
+   struct resv_map *resv, long f, long t, long regions_needed)
 {
long add = 0;

@@ -342,7 +375,7 @@ static long region_add(struct resv_map *resv, long f, long t,

VM_BUG_ON(resv->region_cache_count < regions_needed);

-   add = add_reservation_in_range(resv, f, t, NULL, false);
+   add = add_reservation_in_range(resv, f, t, h_cg, h, NULL, false);
resv->adds_in_progress -= regions_needed;

spin_unlock(>lock);
@@ -380,7 +413,8 @@ static long region_chg(struct resv_map *resv, long f, long t,
spin_lock(>lock);

/* Count how many hugepages in this range are NOT respresented. */
-   chg = add_reservation_in_range(resv, f, t, _needed, true);
+   chg = add_reservation_in_range(resv, f, t, NULL, NULL, _needed,
+   

[PATCH v4 0/9] hugetlb_cgroup: Add hugetlb_cgroup reservation limits

2019-09-10 Thread Mina Almasry
Problem:
Currently tasks attempting to allocate more hugetlb memory than is available get
a failure at mmap/shmget time. This is thanks to Hugetlbfs Reservations [1].
However, if a task attempts to allocate more hugetlb memory than its
hugetlb_cgroup limit allows (even though enough memory is otherwise available),
the kernel will allow the mmap/shmget call, but will SIGBUS the task when it
attempts to fault the memory in.

We have developers interested in using hugetlb_cgroups, and they have expressed
dissatisfaction regarding this behavior. We'd like to improve this
behavior such that tasks violating the hugetlb_cgroup limits get an error at
mmap/shmget time, rather than getting SIGBUS'd when they try to fault
the excess memory in.

The underlying problem is that today's hugetlb_cgroup accounting happens
at hugetlb memory *fault* time, rather than at *reservation* time.
Thus, enforcing the hugetlb_cgroup limit only happens at fault time, and
the offending task gets SIGBUS'd.

Proposed Solution:
A new page counter named hugetlb.xMB.reservation_[limit|usage]_in_bytes. This
counter has slightly different semantics than
hugetlb.xMB.[limit|usage]_in_bytes:

- While usage_in_bytes tracks all *faulted* hugetlb memory,
reservation_usage_in_bytes tracks all *reserved* hugetlb memory.

- If a task attempts to reserve more memory than limit_in_bytes allows,
the kernel will allow it to do so. But if a task attempts to reserve
more memory than reservation_limit_in_bytes allows, the kernel will fail this
reservation.

This proposal is implemented in this patch, with tests to verify
functionality and show the usage.
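
As a rough sketch of the intended usage (the cgroup mount point and the 1GB
value below are hypothetical; the file names follow the selftest added in
patch 8/9):

  # Assume 2MB hugepages and the cgroup-v1 hugetlb controller mounted at
  # /sys/fs/cgroup/hugetlb.
  mkdir /sys/fs/cgroup/hugetlb/app
  # Cap 2MB hugepage *reservations* for this cgroup at 1GB.
  echo $((1 << 30)) > \
    /sys/fs/cgroup/hugetlb/app/hugetlb.2MB.reservation_limit_in_bytes
  # A task in this cgroup that tries to mmap/shmget more than 1GB of hugetlb
  # memory now gets an error at mmap/shmget time instead of a SIGBUS at
  # fault time.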

Alternatives considered:
1. A new cgroup, instead of only a new page_counter attached to
   the existing hugetlb_cgroup. Adding a new cgroup seemed like a lot of code
   duplication with hugetlb_cgroup. Keeping hugetlb related page counters under
   hugetlb_cgroup seemed cleaner as well.

2. Instead of adding a new counter, we considered adding a sysctl that modifies
   the behavior of hugetlb.xMB.[limit|usage]_in_bytes, to do accounting at
   reservation time rather than fault time. Adding a new page_counter seems
   better as userspace could, if it wants, choose to enforce different cgroups
   differently: one via limit_in_bytes, and another via
   reservation_limit_in_bytes. This could be very useful if you're
   transitioning how hugetlb memory is partitioned on your system one
   cgroup at a time, for example. Also, someone may find usage for both
   limit_in_bytes and reservation_limit_in_bytes concurrently, and this
   approach gives them the option to do so.

Caveats:
1. This support is implemented for cgroups-v1. I have not tried
   hugetlb_cgroups with cgroups v2, and AFAICT it's not supported yet.
   This is largely because we use cgroups-v1 for now. If required, I
   can add hugetlb_cgroup support to cgroups v2 in this patch or
   a follow up.
2. Most complicated bit of this patch I believe is: where to store the
   pointer to the hugetlb_cgroup to uncharge at unreservation time?
   Normally the cgroup pointers hang off the struct page. But, with
   hugetlb_cgroup reservations, one task can reserve a specific page and another
   task may fault it in (I believe), so storing the pointer in struct
   page is not appropriate. Proposed approach here is to store the pointer in
   the resv_map. See patch for details.

Testing:
- Added tests passing.
- libhugetlbfs tests mostly passing, but some tests have trouble with and
  without this patch series. Seems environment issue rather than code:
  - Overall results:
** TEST SUMMARY
*  2M
*  32-bit 64-bit
* Total testcases:84  0
* Skipped: 0  0
*PASS:66  0
*FAIL:14  0
*Killed by signal: 0  0
*   Bad configuration: 4  0
*   Expected FAIL: 0  0
* Unexpected PASS: 0  0
*Test not present: 0  0
* Strange test result: 0  0
**
  - Failing tests:
- elflink_rw_and_share_test("linkhuge_rw") segfaults with and without this
  patch series.
- LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes malloc (2M: 32):
  FAILAddress is not hugepage
- LD_PRELOAD=libhugetlbfs.so HUGETLB_RESTRICT_EXE=unknown:malloc
  HUGETLB_MORECORE=yes malloc (2M: 32):
  FAILAddress is not hugepage
- LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes malloc_manysmall (2M: 32):
  FAILAddress is not hugepage
- GLIBC_TUNABLES=glibc.malloc.tcache_count=0 LD_PRELOAD=libhugetlbfs.so
  HUGETLB_MORECORE=yes heapshrink (2M: 32):
  FAILHeap not on hugepages
- GLIBC_TUNABLES=glibc.malloc.tcache_count=0 LD_PRELOAD=libhugetlbfs.so
  libheapshrink.so HUGETLB_MORECORE=yes heapshrink (2M: 32):
  FAILHeap not on hugepages
- HUGETLB_ELFMAP=RW linkhuge_rw (2M: 32): FAILsmall_data is not hugepage
- HUGETLB_ELFMAP=RW 

[PATCH v4 8/9] hugetlb_cgroup: Add hugetlb_cgroup reservation tests

2019-09-10 Thread Mina Almasry
The tests use both shared and private mapped hugetlb memory, and
monitor the hugetlb usage counter as well as the hugetlb reservation
counter. They test different configurations such as hugetlb memory usage
via hugetlbfs, or MAP_HUGETLB, or shmget/shmat, and with and without
MAP_POPULATE.

Signed-off-by: Mina Almasry 
---
 tools/testing/selftests/vm/.gitignore |   1 +
 tools/testing/selftests/vm/Makefile   |   4 +
 .../selftests/vm/charge_reserved_hugetlb.sh   | 440 ++
 .../selftests/vm/write_hugetlb_memory.sh  |  22 +
 .../testing/selftests/vm/write_to_hugetlbfs.c | 252 ++
 5 files changed, 719 insertions(+)
 create mode 100755 tools/testing/selftests/vm/charge_reserved_hugetlb.sh
 create mode 100644 tools/testing/selftests/vm/write_hugetlb_memory.sh
 create mode 100644 tools/testing/selftests/vm/write_to_hugetlbfs.c

diff --git a/tools/testing/selftests/vm/.gitignore b/tools/testing/selftests/vm/.gitignore
index 31b3c98b6d34d..d3bed9407773c 100644
--- a/tools/testing/selftests/vm/.gitignore
+++ b/tools/testing/selftests/vm/.gitignore
@@ -14,3 +14,4 @@ virtual_address_range
 gup_benchmark
 va_128TBswitch
 map_fixed_noreplace
+write_to_hugetlbfs
diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index 9534dc2bc9295..8d37d5409b52c 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -18,6 +18,7 @@ TEST_GEN_FILES += transhuge-stress
 TEST_GEN_FILES += userfaultfd
 TEST_GEN_FILES += va_128TBswitch
 TEST_GEN_FILES += virtual_address_range
+TEST_GEN_FILES += write_to_hugetlbfs

 TEST_PROGS := run_vmtests

@@ -29,3 +30,6 @@ include ../lib.mk
 $(OUTPUT)/userfaultfd: LDLIBS += -lpthread

 $(OUTPUT)/mlock-random-test: LDLIBS += -lcap
+
+# Why does adding $(OUTPUT)/ like above not apply this flag..?
+write_to_hugetlbfs: CFLAGS += -static
diff --git a/tools/testing/selftests/vm/charge_reserved_hugetlb.sh b/tools/testing/selftests/vm/charge_reserved_hugetlb.sh
new file mode 100755
index 0..09e90e8f6fab4
--- /dev/null
+++ b/tools/testing/selftests/vm/charge_reserved_hugetlb.sh
@@ -0,0 +1,440 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+
+set -e
+
+cgroup_path=/dev/cgroup/memory
+if [[ ! -e $cgroup_path ]]; then
+  mkdir -p $cgroup_path
+  mount -t cgroup -o hugetlb,memory cgroup $cgroup_path
+fi
+
+cleanup () {
+   echo $$ > $cgroup_path/tasks
+
+   set +e
+   if [[ "$(pgrep write_to_hugetlbfs)" != "" ]]; then
+ kill -2 write_to_hugetlbfs
+ # Wait for hugetlbfs memory to get depleted.
+ sleep 0.5
+   fi
+   set -e
+
+   if [[ -e /mnt/huge ]]; then
+ rm -rf /mnt/huge/*
+ umount /mnt/huge || echo error
+ rmdir /mnt/huge
+   fi
+   if [[ -e $cgroup_path/hugetlb_cgroup_test ]]; then
+ rmdir $cgroup_path/hugetlb_cgroup_test
+   fi
+   if [[ -e $cgroup_path/hugetlb_cgroup_test1 ]]; then
+ rmdir $cgroup_path/hugetlb_cgroup_test1
+   fi
+   if [[ -e $cgroup_path/hugetlb_cgroup_test2 ]]; then
+ rmdir $cgroup_path/hugetlb_cgroup_test2
+   fi
+   echo 0 > /proc/sys/vm/nr_hugepages
+   echo CLEANUP DONE
+}
+
+cleanup
+
+function expect_equal() {
+  local expected="$1"
+  local actual="$2"
+  local error="$3"
+
+  if [[ "$expected" != "$actual" ]]; then
+   echo "expected ($expected) != actual ($actual): $3"
+   cleanup
+   exit 1
+  fi
+}
+
+function setup_cgroup() {
+  local name="$1"
+  local cgroup_limit="$2"
+  local reservation_limit="$3"
+
+  mkdir $cgroup_path/$name
+
+  echo writing cgroup limit: "$cgroup_limit"
+  echo "$cgroup_limit" > $cgroup_path/$name/hugetlb.2MB.limit_in_bytes
+
+  echo writing reservation limit: "$reservation_limit"
+  echo "$reservation_limit" > \
+   $cgroup_path/$name/hugetlb.2MB.reservation_limit_in_bytes
+  echo 0 > $cgroup_path/$name/cpuset.cpus
+  echo 0 > $cgroup_path/$name/cpuset.mems
+}
+
+function write_hugetlbfs_and_get_usage() {
+  local cgroup="$1"
+  local size="$2"
+  local populate="$3"
+  local write="$4"
+  local path="$5"
+  local method="$6"
+  local private="$7"
+  local expect_failure="$8"
+
+  # Function return values.
+  reservation_failed=0
+  oom_killed=0
+  hugetlb_difference=0
+  reserved_difference=0
+
+  local hugetlb_usage=$cgroup_path/$cgroup/hugetlb.2MB.usage_in_bytes
+  local reserved_usage=$cgroup_path/$cgroup/hugetlb.2MB.reservation_usage_in_bytes
+
+  local hugetlb_before=$(cat $hugetlb_usage)
+  local reserved_before=$(cat $reserved_usage)
+
+  echo
+  echo Starting:
+  echo hugetlb_usage="$hugetlb_before"
+  echo reserved_usage="$reserved_before"
+  echo expect_failure is "$expect_failure"
+
+  set +e
+  if [[ "$method" == "1" ]] || [[ "$method" == 2 ]] || \
+ 

[PATCH v4 1/9] hugetlb_cgroup: Add hugetlb_cgroup reservation counter

2019-09-10 Thread Mina Almasry
These counters will track hugetlb reservations rather than hugetlb
memory faulted in. This patch only adds the counter; following patches
add the charging and uncharging of the counter.

Signed-off-by: Mina Almasry 
Acked-by: Hillf Danton 
---
 include/linux/hugetlb.h |  16 +-
 mm/hugetlb_cgroup.c | 111 ++--
 2 files changed, 100 insertions(+), 27 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index edfca42783192..128ff1aff1c93 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -320,6 +320,20 @@ unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,

 #ifdef CONFIG_HUGETLB_PAGE

+enum {
+   HUGETLB_RES_USAGE,
+   HUGETLB_RES_RESERVATION_USAGE,
+   HUGETLB_RES_LIMIT,
+   HUGETLB_RES_RESERVATION_LIMIT,
+   HUGETLB_RES_MAX_USAGE,
+   HUGETLB_RES_RESERVATION_MAX_USAGE,
+   HUGETLB_RES_FAILCNT,
+   HUGETLB_RES_RESERVATION_FAILCNT,
+   HUGETLB_RES_NULL,
+   HUGETLB_RES_MAX,
+};
+
+
 #define HSTATE_NAME_LEN 32
 /* Defines one hugetlb page size */
 struct hstate {
@@ -340,7 +354,7 @@ struct hstate {
unsigned int surplus_huge_pages_node[MAX_NUMNODES];
 #ifdef CONFIG_CGROUP_HUGETLB
/* cgroup control files */
-   struct cftype cgroup_files[5];
+   struct cftype cgroup_files[HUGETLB_RES_MAX];
 #endif
char name[HSTATE_NAME_LEN];
 };
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index 68c2f2f3c05b7..51a72624bd1ff 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -25,6 +25,10 @@ struct hugetlb_cgroup {
 * the counter to account for hugepages from hugetlb.
 */
struct page_counter hugepage[HUGE_MAX_HSTATE];
+   /*
+* the counter to account for hugepage reservations from hugetlb.
+*/
+   struct page_counter reserved_hugepage[HUGE_MAX_HSTATE];
 };

 #define MEMFILE_PRIVATE(x, val)(((x) << 16) | (val))
@@ -33,6 +37,15 @@ struct hugetlb_cgroup {

 static struct hugetlb_cgroup *root_h_cgroup __read_mostly;

+static inline
+struct page_counter *hugetlb_cgroup_get_counter(struct hugetlb_cgroup *h_cg, int idx,
+bool reserved)
+{
+   if (reserved)
+   return  _cg->reserved_hugepage[idx];
+   return _cg->hugepage[idx];
+}
+
 static inline
 struct hugetlb_cgroup *hugetlb_cgroup_from_css(struct cgroup_subsys_state *s)
 {
@@ -254,30 +267,33 @@ void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
return;
 }

-enum {
-   RES_USAGE,
-   RES_LIMIT,
-   RES_MAX_USAGE,
-   RES_FAILCNT,
-};
-
 static u64 hugetlb_cgroup_read_u64(struct cgroup_subsys_state *css,
   struct cftype *cft)
 {
struct page_counter *counter;
+   struct page_counter *reserved_counter;
struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_css(css);

counter = _cg->hugepage[MEMFILE_IDX(cft->private)];
+   reserved_counter = _cg->reserved_hugepage[MEMFILE_IDX(cft->private)];

switch (MEMFILE_ATTR(cft->private)) {
-   case RES_USAGE:
+   case HUGETLB_RES_USAGE:
return (u64)page_counter_read(counter) * PAGE_SIZE;
-   case RES_LIMIT:
+   case HUGETLB_RES_RESERVATION_USAGE:
+   return (u64)page_counter_read(reserved_counter) * PAGE_SIZE;
+   case HUGETLB_RES_LIMIT:
return (u64)counter->max * PAGE_SIZE;
-   case RES_MAX_USAGE:
+   case HUGETLB_RES_RESERVATION_LIMIT:
+   return (u64)reserved_counter->max * PAGE_SIZE;
+   case HUGETLB_RES_MAX_USAGE:
return (u64)counter->watermark * PAGE_SIZE;
-   case RES_FAILCNT:
+   case HUGETLB_RES_RESERVATION_MAX_USAGE:
+   return (u64)reserved_counter->watermark * PAGE_SIZE;
+   case HUGETLB_RES_FAILCNT:
return counter->failcnt;
+   case HUGETLB_RES_RESERVATION_FAILCNT:
+   return reserved_counter->failcnt;
default:
BUG();
}
@@ -291,6 +307,7 @@ static ssize_t hugetlb_cgroup_write(struct kernfs_open_file *of,
int ret, idx;
unsigned long nr_pages;
struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_css(of_css(of));
+   bool reserved = false;

if (hugetlb_cgroup_is_root(h_cg)) /* Can't set limit on root */
return -EINVAL;
@@ -304,9 +321,13 @@ static ssize_t hugetlb_cgroup_write(struct kernfs_open_file *of,
nr_pages = round_down(nr_pages, 1 << huge_page_order([idx]));

switch (MEMFILE_ATTR(of_cft(of)->private)) {
-   case RES_LIMIT:
+   case HUGETLB_RES_RESERVATION_LIMIT:
+   reserved = true;
+   /* Fall through. */
+   case HUGETLB_RES_LIMIT:
mutex_lock(_limit_mutex);
-   ret = page_counter_set_max(_cg->hugepage[idx], nr_pages);
+   ret = page_counter_set_max(hugetlb_cgroup_get_counter(h_cg, 

[PATCH v4 4/9] hugetlb: region_chg provides only cache entry

2019-09-10 Thread Mina Almasry
Current behavior is that region_chg provides both a cache entry in
resv->region_cache, AND a placeholder entry in resv->regions. region_add
first tries to use the placeholder, and if it finds that the placeholder
has been deleted by a racing region_del call, it uses the cache entry.

This behavior is completely unnecessary and is removed in this patch for
a couple of reasons:

1. region_add needs to either find a cached file_region entry in
   resv->region_cache, or find an entry in resv->regions to expand. It
   does not need both.
2. region_chg adding a placeholder entry in resv->regions opens up
   a possible race with region_del, where region_chg adds a placeholder
   region in resv->regions, and this region is deleted by a racing call
   to region_del during region_chg execution or before region_add is
   called. Removing the race makes the code easier to reason about and
   maintain.

In addition, a follow up patch in this series disables region
coalescing, which would be further complicated if the race with
region_del exists.

Signed-off-by: Mina Almasry 
---
 mm/hugetlb.c | 63 +---
 1 file changed, 11 insertions(+), 52 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index fbd7c52e17348..bea51ae422f63 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -246,14 +246,10 @@ struct file_region {

 /*
  * Add the huge page range represented by [f, t) to the reserve
- * map.  In the normal case, existing regions will be expanded
- * to accommodate the specified range.  Sufficient regions should
- * exist for expansion due to the previous call to region_chg
- * with the same range.  However, it is possible that region_del
- * could have been called after region_chg and modifed the map
- * in such a way that no region exists to be expanded.  In this
- * case, pull a region descriptor from the cache associated with
- * the map and use that for the new range.
+ * map.  Existing regions will be expanded to accommodate the specified
+ * range, or a region will be taken from the cache.  Sufficient regions
+ * must exist in the cache due to the previous call to region_chg with
+ * the same range.
  *
  * Return the number of new huge pages added to the map.  This
  * number is greater than or equal to zero.
@@ -272,9 +268,8 @@ static long region_add(struct resv_map *resv, long f, long t)

/*
 * If no region exists which can be expanded to include the
-* specified range, the list must have been modified by an
-* interleving call to region_del().  Pull a region descriptor
-* from the cache and use it for this range.
+* specified range, pull a region descriptor from the cache
+* and use it for this range.
 */
if (>link == head || t < rg->from) {
VM_BUG_ON(resv->region_cache_count <= 0);
@@ -339,15 +334,9 @@ static long region_add(struct resv_map *resv, long f, long t)
  * call to region_add that will actually modify the reserve
  * map to add the specified range [f, t).  region_chg does
  * not change the number of huge pages represented by the
- * map.  However, if the existing regions in the map can not
- * be expanded to represent the new range, a new file_region
- * structure is added to the map as a placeholder.  This is
- * so that the subsequent region_add call will have all the
- * regions it needs and will not fail.
- *
- * Upon entry, region_chg will also examine the cache of region descriptors
- * associated with the map.  If there are not enough descriptors cached, one
- * will be allocated for the in progress add operation.
+ * map.  A new file_region structure is added to the cache
+ * as a placeholder, so that the subsequent region_add
+ * call will have all the regions it needs and will not fail.
  *
  * Returns the number of huge pages that need to be added to the existing
  * reservation map for the range [f, t).  This number is greater or equal to
@@ -357,10 +346,9 @@ static long region_add(struct resv_map *resv, long f, long t)
 static long region_chg(struct resv_map *resv, long f, long t)
 {
struct list_head *head = >regions;
-   struct file_region *rg, *nrg = NULL;
+   struct file_region *rg;
long chg = 0;

-retry:
spin_lock(>lock);
 retry_locked:
resv->adds_in_progress++;
@@ -378,10 +366,8 @@ static long region_chg(struct resv_map *resv, long f, long t)
spin_unlock(>lock);

trg = kmalloc(sizeof(*trg), GFP_KERNEL);
-   if (!trg) {
-   kfree(nrg);
+   if (!trg)
return -ENOMEM;
-   }

spin_lock(>lock);
list_add(>link, >region_cache);
@@ -394,28 +380,6 @@ static long region_chg(struct resv_map *resv, long f, long t)
if (f <= rg->to)
break;

-   /* If we are below the current region then a new region is required.
-* Subtle, 

[PATCH v4 6/9] hugetlb: disable region_add file_region coalescing

2019-09-10 Thread Mina Almasry
A follow up patch in this series adds hugetlb cgroup uncharge info to the
file_region entries in resv->regions. The cgroup uncharge info may
differ for different regions, so they can no longer be coalesced at
region_add time. So, disable region coalescing in region_add in this
patch.

Behavior change:

Say a resv_map exists like this [0->1], [2->3], and [5->6].

Then a region_chg/add call comes in region_chg/add(f=0, t=5).

Old code would generate resv->regions: [0->5], [5->6].
New code would generate resv->regions: [0->1], [1->2], [2->3], [3->5],
[5->6].

Special care needs to be taken to handle the resv->adds_in_progress
variable correctly. In the past, only 1 region would be added for every
region_chg and region_add call. But now, each call may add multiple
regions, so we can no longer increment adds_in_progress by 1 in region_chg,
or decrement adds_in_progress by 1 after region_add or region_abort. Instead,
region_chg calls add_reservation_in_range() to count the number of regions
needed and allocates those, and that info is passed to region_add and
region_abort to decrement adds_in_progress correctly.

Signed-off-by: Mina Almasry 
---
 mm/hugetlb.c | 279 ++-
 1 file changed, 167 insertions(+), 112 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ce5ed1056fefd..5eca34d9b753d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -244,55 +244,80 @@ struct file_region {
long to;
 };

+/* Helper that removes a struct file_region from the resv_map cache and returns
+ * it for use.
+ */
+static struct file_region *get_file_region_entry_from_cache(
+   struct resv_map *resv, long from, long to);
+
 static long add_reservation_in_range(
-   struct resv_map *resv, long f, long t, bool count_only)
+   struct resv_map *resv,
+   long f, long t,
+   long *regions_needed,
+   bool count_only)
 {
-
-   long chg = 0;
+   long add = 0;
struct list_head *head = >regions;
+   long last_accounted_offset = f;
struct file_region *rg = NULL, *trg = NULL, *nrg = NULL;

-   /* Locate the region we are before or in. */
-   list_for_each_entry(rg, head, link)
-   if (f <= rg->to)
-   break;
-
-   /* Round our left edge to the current segment if it encloses us. */
-   if (f > rg->from)
-   f = rg->from;
+   if (regions_needed)
+   *regions_needed = 0;

-   chg = t - f;
+   /* In this loop, we essentially handle an entry for the range
+* last_accounted_offset -> rg->from, at every iteration, with some
+* bounds checking.
+*/
+   list_for_each_entry_safe(rg, trg, head, link) {
+   /* Skip irrelevant regions that start before our range. */
+   if (rg->from < f) {
+   /* If this region ends after the last accounted offset,
+* then we need to update last_accounted_offset.
+*/
+   if (rg->to > last_accounted_offset)
+   last_accounted_offset = rg->to;
+   continue;
+   }

-   /* Check for and consume any regions we now overlap with. */
-   nrg = rg;
-   list_for_each_entry_safe(rg, trg, rg->link.prev, link) {
-   if (>link == head)
-   break;
+   /* When we find a region that starts beyond our range, we've
+* finished.
+*/
if (rg->from > t)
break;

-   /* We overlap with this area, if it extends further than
-* us then we must extend ourselves.  Account for its
-* existing reservation.
+   /* Add an entry for last_accounted_offset -> rg->from, and
+* update last_accounted_offset.
 */
-   if (rg->to > t) {
-   chg += rg->to - t;
-   t = rg->to;
+   if (rg->from > last_accounted_offset) {
+   add += rg->from - last_accounted_offset;
+   if (!count_only) {
+   nrg = get_file_region_entry_from_cache(resv,
+   last_accounted_offset,
+   rg->from);
+   list_add(>link, rg->link.prev);
+   } else if (regions_needed)
+   *regions_needed += 1;
}
-   chg -= rg->to - rg->from;

-   if (!count_only && rg != nrg) {
-   list_del(>link);
-   kfree(rg);
-   }
+   last_accounted_offset = rg->to;
}

-   if (!count_only) {
-   nrg->from = f;
-   nrg->to = t;
+   /* Handle the 

Re: [PATCH] ftgmac100: Disable HW checksum generation on AST2500

2019-09-10 Thread Vijay Khemka


On 9/10/19, 3:50 PM, "Linux-aspeed on behalf of Vijay Khemka" 
 wrote:



On 9/10/19, 3:05 PM, "Florian Fainelli"  wrote:

On 9/10/19 2:37 PM, Vijay Khemka wrote:
> HW checksum generation is not working for AST2500, specially with IPV6
> over NCSI. All TCP packets with IPv6 get dropped. By disabling this
> it works perfectly fine with IPV6.
> 
> Verified with IPV6 enabled and can do ssh.

How about IPv4, do these packets have problem? If not, can you continue
advertising NETIF_F_IP_CSUM but take out NETIF_F_IPV6_CSUM?

I changed the code from (netdev->hw_features &= ~NETIF_F_HW_CSUM) to
(netdev->hw_features &= ~NETIF_F_IPV6_CSUM), and it is not working.
Don't know why. IPv4 works without any change, but IPv6 needs HW_CSUM
disabled.

Now I changed to
netdev->hw_features &= (~NETIF_F_HW_CSUM) | NETIF_F_IP_CSUM;
And it works.

> 
> Signed-off-by: Vijay Khemka 
> ---
>  drivers/net/ethernet/faraday/ftgmac100.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
> index 030fed65393e..591c9725002b 100644
> --- a/drivers/net/ethernet/faraday/ftgmac100.c
> +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> @@ -1839,8 +1839,9 @@ static int ftgmac100_probe(struct platform_device *pdev)
>   if (priv->use_ncsi)
>   netdev->hw_features |= NETIF_F_HW_VLAN_CTAG_FILTER;
>  
> - /* AST2400  doesn't have working HW checksum generation */
> - if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac")))
> + /* AST2400 and AST2500 don't have working HW checksum generation */
> + if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac") ||
> +of_device_is_compatible(np, "aspeed,ast2500-mac")))
>   netdev->hw_features &= ~NETIF_F_HW_CSUM;
>   if (np && of_get_property(np, "no-hw-checksum", NULL))
>   netdev->hw_features &= ~(NETIF_F_HW_CSUM | NETIF_F_RXCSUM);
> 


-- 
Florian






Re: pivot_root(".", ".") and the fchdir() dance

2019-09-10 Thread Eric W. Biederman
"Michael Kerrisk (man-pages)"  writes:

> Hello Christian,
>
>>> All: I plan to add the following text to the manual page:
>>>
>>>new_root and put_old may be the same  directory.   In  particular,
>>>the following sequence allows a pivot-root operation without need‐
>>>ing to create and remove a temporary directory:
>>>
>>>chdir(new_root);
>>>pivot_root(".", ".");
>>>umount2(".", MNT_DETACH);
>> 
>> Hm, should we mention that MS_PRIVATE or MS_SLAVE is usually needed
>> before the umount2()? Especially for the container case... I think we
>> discussed this briefly yesterday in person.
> Thanks for noticing. That detail (more precisely: not MS_SHARED) is
> already covered in the numerous other changes that I have pending
> for this page:
>
>The following restrictions apply:
>...
>-  The propagation type of new_root and its parent mount must  not
>   be MS_SHARED; similarly, if put_old is an existing mount point,
>   its propagation type must not be MS_SHARED.

Ugh.  That is close but not quite correct.

A better explanation:

The pivot_root system call will never propagate any changes it makes.
The pivot_root system call ensures this is safe by verifying that
none of put_old, the parent of new_root, and parent of the root directory
have a propagation type of MS_SHARED.

>

The concern from our conversation at the container mini-summit was that
there is a pathology: if in your initial mount namespace all of the
mounts are marked MS_SHARED, as systemd does (and as is almost necessary
if you are going to use mount propagation), and new_root itself
is MS_SHARED, then unmounting the old_root could propagate.

So I believe the desired sequence is:

>>>chdir(new_root);
+++mount("", ".", MS_SLAVE | MS_REC, NULL);
>>>pivot_root(".", ".");
>>>umount2(".", MNT_DETACH);

The change to new_root could be either MS_SLAVE or MS_PRIVATE.  So
long as it is not MS_SHARED the mount won't propagate back to the
parent mount namespace.
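
For reference, a minimal C sketch of that full sequence (a sketch only:
error handling is omitted, the new_root path is hypothetical, and since
glibc has no pivot_root() wrapper the raw syscall is used):

    #define _GNU_SOURCE
    #include <sys/mount.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void)
    {
        const char *new_root = "/mnt/newroot";   /* hypothetical mount point */

        chdir(new_root);
        /* Keep the umount below from propagating back to the parent ns. */
        mount("", ".", NULL, MS_SLAVE | MS_REC, NULL);
        syscall(SYS_pivot_root, ".", ".");
        umount2(".", MNT_DETACH);
        return 0;
    }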

Eric




[PATCH] wimax: i2400: fix memory leak

2019-09-10 Thread Navid Emamdoost
In i2400m_op_rfkill_sw_toggle(), the cmd buffer should be released along with
the skb response.

Signed-off-by: Navid Emamdoost 
---
 drivers/net/wimax/i2400m/op-rfkill.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/wimax/i2400m/op-rfkill.c b/drivers/net/wimax/i2400m/op-rfkill.c
index 6642bcb27761..8efb493ceec2 100644
--- a/drivers/net/wimax/i2400m/op-rfkill.c
+++ b/drivers/net/wimax/i2400m/op-rfkill.c
@@ -127,6 +127,7 @@ int i2400m_op_rfkill_sw_toggle(struct wimax_dev *wimax_dev,
"%d\n", result);
result = 0;
 error_cmd:
+   kfree(cmd);
kfree_skb(ack_skb);
 error_msg_to_dev:
 error_alloc:
-- 
2.17.1



Re: [PATCH] ftgmac100: Disable HW checksum generation on AST2500

2019-09-10 Thread Vijay Khemka


On 9/10/19, 3:05 PM, "Florian Fainelli"  wrote:

On 9/10/19 2:37 PM, Vijay Khemka wrote:
> HW checksum generation is not working for AST2500, specially with IPV6
> over NCSI. All TCP packets with IPv6 get dropped. By disabling this
> it works perfectly fine with IPV6.
> 
> Verified with IPV6 enabled and can do ssh.

How about IPv4, do these packets have problem? If not, can you continue
advertising NETIF_F_IP_CSUM but take out NETIF_F_IPV6_CSUM?

I changed the code from (netdev->hw_features &= ~NETIF_F_HW_CSUM) to
(netdev->hw_features &= ~NETIF_F_IPV6_CSUM), and it is not working.
Don't know why. IPv4 works without any change, but IPv6 needs HW_CSUM
disabled.

> 
> Signed-off-by: Vijay Khemka 
> ---
>  drivers/net/ethernet/faraday/ftgmac100.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
> index 030fed65393e..591c9725002b 100644
> --- a/drivers/net/ethernet/faraday/ftgmac100.c
> +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> @@ -1839,8 +1839,9 @@ static int ftgmac100_probe(struct platform_device *pdev)
>   if (priv->use_ncsi)
>   netdev->hw_features |= NETIF_F_HW_VLAN_CTAG_FILTER;
>  
> - /* AST2400  doesn't have working HW checksum generation */
> - if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac")))
> + /* AST2400 and AST2500 don't have working HW checksum generation */
> + if (np && (of_device_is_compatible(np, "aspeed,ast2400-mac") ||
> +of_device_is_compatible(np, "aspeed,ast2500-mac")))
>   netdev->hw_features &= ~NETIF_F_HW_CSUM;
>   if (np && of_get_property(np, "no-hw-checksum", NULL))
>   netdev->hw_features &= ~(NETIF_F_HW_CSUM | NETIF_F_RXCSUM);
> 


-- 
Florian




[PATCH] arm64: fix function types in COND_SYSCALL

2019-09-10 Thread Sami Tolvanen
Define a weak function in COND_SYSCALL instead of a weak alias to
sys_ni_syscall, which has an incompatible type. This fixes indirect
call mismatches with Control-Flow Integrity (CFI) checking.
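
As a rough userspace analogy of the mismatch (not part of the patch; all
names below are made up): syscall entries are called indirectly through a
"long (*)(const struct pt_regs *)" pointer, so a weak alias to a
zero-argument stub has the wrong type at that indirect call site, while a
real wrapper function with the expected signature does not:

    #include <stdio.h>

    struct pt_regs { unsigned long regs[31]; };
    typedef long (*syscall_fn_t)(const struct pt_regs *);

    /* Zero-argument stub, analogous to sys_ni_syscall(). */
    static long ni_syscall(void) { return -38; /* -ENOSYS */ }

    /* Wrapper with the signature the table expects; calling the stub
     * directly keeps the indirect call type-correct under CFI. */
    static long stub_syscall(const struct pt_regs *regs)
    {
        (void)regs;
        return ni_syscall();
    }

    int main(void)
    {
        struct pt_regs regs = { { 0 } };
        syscall_fn_t fn = stub_syscall;   /* types match */
        printf("%ld\n", fn(&regs));
        return 0;
    }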

Signed-off-by: Sami Tolvanen 
---
 arch/arm64/include/asm/syscall_wrapper.h | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/syscall_wrapper.h b/arch/arm64/include/asm/syscall_wrapper.h
index 507d0ee6bc69..06d880b3526c 100644
--- a/arch/arm64/include/asm/syscall_wrapper.h
+++ b/arch/arm64/include/asm/syscall_wrapper.h
@@ -8,6 +8,8 @@
 #ifndef __ASM_SYSCALL_WRAPPER_H
 #define __ASM_SYSCALL_WRAPPER_H
 
+struct pt_regs;
+
 #define SC_ARM64_REGS_TO_ARGS(x, ...)  \
__MAP(x,__SC_ARGS   \
  ,,regs->regs[0],,regs->regs[1],,regs->regs[2] \
@@ -35,8 +37,11 @@
ALLOW_ERROR_INJECTION(__arm64_compat_sys_##sname, ERRNO);	\
asmlinkage long __arm64_compat_sys_##sname(const struct pt_regs *__unused)
 
-#define COND_SYSCALL_COMPAT(name) \
-   cond_syscall(__arm64_compat_sys_##name);
+#define COND_SYSCALL_COMPAT(name)					\
+	asmlinkage long __weak __arm64_compat_sys_##name(const struct pt_regs *regs)	\
+	{								\
+		return sys_ni_syscall();				\
+	}
 
 #define COMPAT_SYS_NI(name) \
SYSCALL_ALIAS(__arm64_compat_sys_##name, sys_ni_posix_timers);
@@ -70,7 +75,11 @@
 #endif
 
 #ifndef COND_SYSCALL
-#define COND_SYSCALL(name) cond_syscall(__arm64_sys_##name)
+#define COND_SYSCALL(name)						\
+	asmlinkage long __weak __arm64_sys_##name(const struct pt_regs *regs)	\
+	{								\
+		return sys_ni_syscall();				\
+	}
 #endif
 
 #ifndef SYS_NI
-- 
2.23.0.162.g0b9fbb3734-goog



RE: [PATCH] mm: Add callback for defining compaction completion

2019-09-10 Thread Nitin Gupta
> -Original Message-
> From: owner-linux...@kvack.org  On Behalf
> Of Michal Hocko
> Sent: Tuesday, September 10, 2019 1:19 PM
> To: Nitin Gupta 
> Cc: a...@linux-foundation.org; vba...@suse.cz;
> mgor...@techsingularity.net; dan.j.willi...@intel.com;
> khalid.a...@oracle.com; Matthew Wilcox ; Yu Zhao
> ; Qian Cai ; Andrey Ryabinin
> ; Allison Randal ; Mike
> Rapoport ; Thomas Gleixner
> ; Arun KS ; Wei Yang
> ; linux-kernel@vger.kernel.org; linux-
> m...@kvack.org
> Subject: Re: [PATCH] mm: Add callback for defining compaction completion
> 
> On Tue 10-09-19 13:07:32, Nitin Gupta wrote:
> > For some applications we need to allocate almost all memory as
> hugepages.
> > However, on a running system, higher order allocations can fail if the
> > memory is fragmented. Linux kernel currently does on-demand
> compaction
> > as we request more hugepages but this style of compaction incurs very
> > high latency. Experiments with one-time full memory compaction
> > (followed by hugepage allocations) shows that kernel is able to
> > restore a highly fragmented memory state to a fairly compacted memory
> > state within <1 sec for a 32G system. Such data suggests that a more
> > proactive compaction can help us allocate a large fraction of memory
> > as hugepages keeping allocation latencies low.
> >
> > In general, compaction can introduce unexpected latencies for
> > applications that don't even have strong requirements for contiguous
> > allocations. It is also hard to efficiently determine if the current
> > system state can be easily compacted due to mixing of unmovable
> > memory. Due to these reasons, automatic background compaction by the
> > kernel itself is hard to get right in a way which does not hurt unsuspecting
> applications or waste CPU cycles.
> 
> We do trigger background compaction on a high order pressure from the
> page allocator by waking up kcompactd. Why is that not sufficient?
> 

Whenever kcompactd is woken up, it does just enough work to create
one free page of the given order (compaction_control.order) or higher.

Such a design causes very high latency for workloads where we want
to allocate lots of hugepages in a short period of time. With pro-active
compaction we can hide much of this latency. For some more background
discussion and data, please see this thread:

https://patchwork.kernel.org/patch/11098289/

> > Even with these caveats, pro-active compaction can still be very
> > useful in certain scenarios to reduce hugepage allocation latencies.
> > This callback interface allows drivers to drive compaction based on
> > their own policies like the current level of external fragmentation
> > for a particular order, system load etc.
> 
> So we do not trust the core MM to make a reasonable decision while we give
> a free ticket to modules. How does this make any sense at all? How is a
> random module going to make a more informed decision when it has less
> visibility on the overal MM situation.
>

Embedding any specific policy (like: keep external fragmentation for order-9
between 30-40%) within MM core looks like a bad idea. As a driver, we
can easily measure parameters like system load, current fragmentation level
for any order in any zone etc. to make an informed decision.
See the thread I referred to above for more background discussion.

> If you need to control compaction from the userspace you have an interface
> for that.  It is also completely unexplained why you need a completion
> callback.
> 

/proc/sys/vm/compact_memory does whole-system compaction, which is
often too much as a pro-active compaction strategy. To get more control
over how much compaction work to do, I have added a compaction callback
which controls how much work is done in one compaction cycle.
 
For example, as a test for this patch, I have a small test driver which defines
[low, high] external fragmentation thresholds for the HPAGE_ORDER. Whenever
extfrag is within this range, I run compact_zone_order with a callback which
returns COMPACT_CONTINUE till extfrag > low threshold and returns
COMPACT_PARTIAL_SKIPPED when extfrag <= low.

Here's the code for this sample driver:
https://gitlab.com/nigupta/memstress/snippets/1893847

Maybe this code can be added to Documentation/...
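
For readers who don't follow the link, the policy in that sample driver looks
roughly like the sketch below (illustrative only: extfrag_for_order() and the
threshold value are placeholders, not the exact code from the snippet):

    /* Keep compacting until external fragmentation for this order drops to
     * the low watermark, then tell the compaction loop to stop early. */
    static unsigned int extfrag_low = 30;   /* illustrative threshold, in percent */

    static enum compact_result hpage_compact_finished(struct zone *zone, int order)
    {
        if (extfrag_for_order(zone, order) > extfrag_low)
            return COMPACT_CONTINUE;
        return COMPACT_PARTIAL_SKIPPED;
    }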

Thanks,
Nitin

> 
> > Signed-off-by: Nitin Gupta 
> > ---
> >  include/linux/compaction.h | 10 ++
> >  mm/compaction.c| 20 ++--
> >  mm/internal.h  |  2 ++
> >  3 files changed, 26 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> > index 9569e7c786d3..1ea828450fa2 100644
> > --- a/include/linux/compaction.h
> > +++ b/include/linux/compaction.h
> > @@ -58,6 +58,16 @@ enum compact_result {
> > COMPACT_SUCCESS,
> >  };
> >
> > +/* Callback function to determine if compaction is finished. */
> > +typedef enum compact_result (*compact_finished_cb)(
> > +   struct zone *zone, int order);
> > +
> > +enum compact_result 
