date:20190220

Re: [v2,1/2] mtd: spi-nor: Add support for EN25Q80A

2019-02-20 Thread Boris Brezillon

From: Boris Brezillon 

On Mon, 2019-02-18 at 12:04:43 UTC, Schrempf Frieder wrote:
> From: Frieder Schrempf 
> 
> This adds support for the EON EN25Q80A, an 8Mb SPI NOR chip.
> It is used on i.MX6 boards by Kontron Electronics GmbH
> (N60xx, N61xx).
> It was only tested with a single data line connected, by writing and
> reading random data with dd.
> 
> Signed-off-by: Frieder Schrempf 
> Reviewed-by: Tudor Ambarus 

Applied to http://git.infradead.org/linux-mtd.git spi-nor/next, thanks.

Boris

Re: [v2,2/2] mtd: spi-nor: Add support for MX25V8035F

2019-02-20 Thread Boris Brezillon

From: Boris Brezillon 

On Mon, 2019-02-18 at 12:04:43 UTC, Schrempf Frieder wrote:
> From: Frieder Schrempf 
> 
> This adds support for the Macronix MX25V8035F, an 8Mb SPI NOR chip.
> It is used on i.MX6UL/ULL SoMs by Kontron Electronics GmbH (N631x).
> It was only tested with a single data line connected, by writing and
> reading random data with dd.
> 
> Signed-off-by: Frieder Schrempf 
> Reviewed-by: Tudor Ambarus 

Applied to http://git.infradead.org/linux-mtd.git spi-nor/next, thanks.

Boris

[PATCH] PM-runtime: fix deadlock when canceling hrtimer

2019-02-20 Thread Vincent Guittot

When rpm_resume() deactivates the autosuspend timer, it should only try
to cancel the hrtimer and not wait for the handler to finish, because both
rpm_resume() and pm_suspend_timer_fn() take the power.lock.
We can have the deadlock sequence:
CPU0                                    CPU1
rpm_resume()
  spin_lock_irqsave
                                        pm_suspend_timer_fn()
                                          spin_lock_irqsave
  pm_runtime_deactivate_timer()
    hrtimer_cancel()

hrtimer_try_to_cancel() is enough because dev->power.timer_expires is also
set to 0.

Fixes: 8234f6734c5d ("PM-runtime: Switch autosuspend over to using hrtimers")
Reported-by: Sunzhaosheng Sun(Zhaosheng) 
Signed-off-by: Vincent Guittot 
---
 drivers/base/power/runtime.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 04407d9..78937c4 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -121,7 +121,7 @@ EXPORT_SYMBOL_GPL(pm_runtime_suspended_time);
 static void pm_runtime_deactivate_timer(struct device *dev)
 {
 	if (dev->power.timer_expires > 0) {
-		hrtimer_cancel(&dev->power.suspend_timer);
+		hrtimer_try_to_cancel(&dev->power.suspend_timer);
 		dev->power.timer_expires = 0;
 	}
 }
-- 
2.7.4
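
Archive note: the deadlock described in this patch can be sketched with a toy
user-space model. This is only an illustration under stated assumptions, not
kernel code: ModelTimer, TimerState and deactivate_timer() are hypothetical
names, and the only thing borrowed from the kernel is the return convention of
hrtimer_try_to_cancel() (1 = dequeued, 0 = not active, -1 = handler running).
The model assumes the caller holds power.lock for the whole call, so a handler
that is already running can never finish while we wait for it.

```python
from enum import Enum


class TimerState(Enum):
    INACTIVE = 0
    ENQUEUED = 1
    RUNNING = 2  # handler entered, possibly blocked on power.lock


class ModelTimer:
    """Toy stand-in for an hrtimer; not the kernel implementation."""

    def __init__(self, state):
        self.state = state

    def try_to_cancel(self):
        # Same return convention as hrtimer_try_to_cancel():
        #   1 = was queued and is now removed, 0 = was not active,
        #  -1 = handler is currently running and cannot be stopped.
        if self.state is TimerState.RUNNING:
            return -1
        if self.state is TimerState.ENQUEUED:
            self.state = TimerState.INACTIVE
            return 1
        return 0


def deactivate_timer(timer, wait_for_handler, max_spins=1000):
    """Model of deactivating the timer while power.lock is held.

    The handler needs power.lock to finish and the caller holds it, so a
    RUNNING handler never completes inside this function. The waiting
    variant therefore spins until max_spins and reports "deadlock".
    """
    ret = timer.try_to_cancel()
    if wait_for_handler:
        spins = 0
        while ret < 0:
            spins += 1
            if spins >= max_spins:
                return "deadlock"
            ret = timer.try_to_cancel()
    return "done"
```

With wait_for_handler=False the function returns immediately even when the
handler is running, which matches the patch's argument that a single
try-to-cancel is sufficient because timer_expires is cleared right after.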

Re: [next] mtd: spi-nor: cadence-quadspi: fix spelling mistake: "Couldnt't" -> "Couldn't"

2019-02-20 Thread Boris Brezillon

From: Boris Brezillon 

On Fri, 2019-02-15 at 15:15:47 UTC, Colin King wrote:
> From: Colin Ian King 
> 
> There is a spelling mistake in a dev_error message. Fix it.
> 
> Signed-off-by: Colin Ian King 
> Reviewed-by: Tudor Ambarus 

Applied to http://git.infradead.org/linux-mtd.git spi-nor/next, thanks.

Boris

Re: [PATCH -next] platform/chrome: Fix Kconfig dependencies for wilco_ec

2019-02-20 Thread Enric Balletbo i Serra

Hi,

On 21/2/19 0:09, Randy Dunlap wrote:
> On 2/20/19 2:11 PM, Nick Crews wrote:
>> In the initial version of the Wilco EC driver, the
>> dependency order was wrong. Previously it was possible to
>> select CONFIG_WILCO_EC and CONFIG_CROS_EC_LPC without
>> having CONFIG_CROS_EC_LPC_MEC. This was wrong, since
>> WILCO_EC depends upon CONFIG_CROS_EC_LPC_MEC, not the
>> other way around.
>>
>> Fixes: 1733c32834e5d1 ("platform/chrome: Add new driver for Wilco EC")
>> Signed-off-by: Nick Crews 
> 
> Reported-by: Randy Dunlap 
> Acked-by: Randy Dunlap  # build-tested
> 

As this is [-next] material, I squashed that commit and queued it for 5.1.

Thanks,
 Enric


> Thanks.
> 
>> ---
>>  drivers/platform/chrome/Kconfig  | 2 +-
>>  drivers/platform/chrome/wilco_ec/Kconfig | 3 +--
>>  2 files changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/platform/chrome/Kconfig b/drivers/platform/chrome/Kconfig
>> index 462eb9dfa4f2..b69561050868 100644
>> --- a/drivers/platform/chrome/Kconfig
>> +++ b/drivers/platform/chrome/Kconfig
>> @@ -95,7 +95,7 @@ config CROS_EC_LPC
>>  
>>  config CROS_EC_LPC_MEC
>>  bool "ChromeOS Embedded Controller LPC Microchip EC (MEC) variant"
>> -depends on CROS_EC_LPC || WILCO_EC
>> +depends on CROS_EC_LPC
>>  default n
>>  help
>>If you say Y here, a variant LPC protocol for the Microchip EC
>> diff --git a/drivers/platform/chrome/wilco_ec/Kconfig b/drivers/platform/chrome/wilco_ec/Kconfig
>> index 20945a301ec6..c6bc4e8f3062 100644
>> --- a/drivers/platform/chrome/wilco_ec/Kconfig
>> +++ b/drivers/platform/chrome/wilco_ec/Kconfig
>> @@ -1,7 +1,6 @@
>>  config WILCO_EC
>>  tristate "ChromeOS Wilco Embedded Controller"
>> -depends on ACPI && X86
>> -select CROS_EC_LPC_MEC
>> +depends on ACPI && X86 && CROS_EC_LPC_MEC
>>  help
>>If you say Y here, you get support for talking to the ChromeOS
>>Wilco EC over an eSPI bus. This uses a simple byte-level protocol
>>
> 
>

Re: [PATCH -next] platform/chrome: Fix off-by-one error in wilco_ec/debugfs.c

2019-02-20 Thread Enric Balletbo i Serra

Hi,

On 20/2/19 23:15, Nick Crews wrote:
> Hi Enric,
> 
> On Wed, Feb 20, 2019 at 3:06 PM Enric Balletbo i Serra
>  wrote:
>>
>> Hi Nick,
>>
>> Thanks for the patch.
>>
>> On 20/2/19 22:58, Nick Crews wrote:
>>> Before, in debugfs.c it was possible to supply only the message type,
>>> and not supply any other arguments when sending raw commands. However,
>>> this is never used by the EC, and it led to an underflow error. Now,
>>> just don't allow too short of a command, we will never need
>>> that anyways.
>>>
>>> Fixes: 46c7fd06f8c9 ("platform/chrome: wilco_ec: Add support for raw commands in debugfs")
>>
>> As this is -next material I'd like to squash the fix if you don't mind.
> 
> 
> Please do. Fixing something after it's already in the tree was a new
> process for me, so I tried to copy other people's examples. Please let
> me know if there's anything I should do differently next time.
> 

Squashed and pushed for 5.1

Thanks,
 Enric


> Nick
> 
>>
>> -- Enric
>>
>>> Signed-off-by: Nick Crews 
>>> ---
>>>  drivers/platform/chrome/wilco_ec/debugfs.c | 6 +++---
>>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/platform/chrome/wilco_ec/debugfs.c b/drivers/platform/chrome/wilco_ec/debugfs.c
>>> index 46ff3b6c46c7..c090db2cd5be 100644
>>> --- a/drivers/platform/chrome/wilco_ec/debugfs.c
>>> +++ b/drivers/platform/chrome/wilco_ec/debugfs.c
>>> @@ -136,8 +136,8 @@ static ssize_t raw_write(struct file *file, const char __user *user_buf,
>>>   ret = parse_hex_sentence(buf, kcount, request_data, TYPE_AND_DATA_SIZE);
>>>   if (ret < 0)
>>>   return ret;
>>> - /* Need at least two bytes for message type */
>>> - if (ret < 2)
>>> + /* Need at least two bytes for message type and one for command */
>>> + if (ret < 3)
>>>   return -EINVAL;
>>>
>>>   /* Clear response data buffer */
>>> @@ -145,7 +145,7 @@ static ssize_t raw_write(struct file *file, const char __user *user_buf,
>>>
>>>   msg.type = request_data[0] << 8 | request_data[1];
>>>   msg.flags = WILCO_EC_FLAG_RAW;
>>> - msg.command = ret > 2 ? request_data[2] : 0;
>>> + msg.command = request_data[2];
>>>   msg.request_data = ret > 3 ? request_data + 3 : 0;
>>>   msg.request_size = ret - 3;
>>>   msg.response_data = debug_info->raw_data;
>>>
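
Archive note: the underflow that the quoted patch guards against can be
reproduced in a few lines. The sketch below is a user-space illustration, not
the driver code. It assumes the request size is stored in an unsigned 64-bit
field (SIZE_T_BITS is that assumption), and request_size_unchecked() and
build_raw_message() are hypothetical helper names that mirror the
"ret - 3" arithmetic and the "ret < 3" check from the diff.

```python
SIZE_T_BITS = 64  # assumption: request_size is an unsigned 64-bit field


def request_size_unchecked(nbytes):
    """Value of `nbytes - 3` once stored in an unsigned size field."""
    return (nbytes - 3) % (1 << SIZE_T_BITS)


def build_raw_message(data):
    """Mirror the patched check: 2 bytes of type plus 1 byte of command."""
    if len(data) < 3:
        raise ValueError("need at least 3 bytes: type (2) + command (1)")
    return {
        "type": data[0] << 8 | data[1],
        "command": data[2],
        "request_data": data[3:],
        "request_size": len(data) - 3,
    }
```

With only the two type bytes supplied, request_size_unchecked(2) wraps to
2**64 - 1, which is why rejecting anything shorter than three bytes is the
simpler fix.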

Re: [Xen-devel] [PATCH RFC 00/39] x86/KVM: Xen HVM guest support

2019-02-20 Thread Juergen Gross

On 21/02/2019 00:39, Marek Marczykowski-Górecki wrote:
> On Wed, Feb 20, 2019 at 08:15:30PM +, Joao Martins wrote:
>>  2. PV Driver support (patches 17 - 39)
>>
>>  We start by redirecting hypercalls from the backend to routines
>>  which emulate the behaviour that PV backends expect i.e. grant
>>  table and interdomain events. Next, we add support for late
>>  initialization of xenbus, followed by implementing
>>  frontend/backend communication mechanisms (i.e. grant tables and
>>  interdomain event channels). Finally, introduce xen-shim.ko,
>>  which will setup a limited Xen environment. This uses the added
>>  functionality of Xen specific shared memory (grant tables) and
>>  notifications (event channels).
> 
> Does it mean backends could be run in another guest, similarly as on
> real Xen? AFAIK virtio doesn't allow that as virtio backends need
> arbitrary write access to guest memory. But grant tables provide enough
> abstraction to do that safely.

As long as the grant table emulation in xen-shim isn't just a wrapper to
"normal" KVM guest memory access.

I guess the xen-shim implementation doesn't support the same kind of
guest memory isolation as Xen does?


Juergen

Re: [PATCH RFC v2 4/4] PCI: hotplug: Add quirk For Dell nvme pcie switches

2019-02-20 Thread Lukas Wunner

On Tue, Feb 19, 2019 at 07:20:30PM -0600, Alexandru Gagniuc wrote:
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -952,3 +952,23 @@ DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_QCOM, 0x0400,
> PCI_CLASS_BRIDGE_PCI, 8, quirk_cmd_compl);
>  DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_QCOM, 0x0401,
> PCI_CLASS_BRIDGE_PCI, 8, quirk_cmd_compl);
> +
> +

Duplicate newline.


> +static void fixup_dell_nvme_backplane_switches(struct pci_dev *pdev)

Can we have a little code comment above the function such as:

+/*
+ * Dell  NVMe storage backplanes disable in-band presence
+ * (PCIe r5.0 sec X.Y.Z) but neglect to set the corresponding flag in the
+ * Slot Capabilities 2 register.
+ */


> + if (pdev->subsystem_vendor != PCI_VENDOR_ID_DELL
> + || pdev->subsystem_device != 0x1fc7)

This looks a little unpolished, how about:

+   if (pdev->subsystem_vendor != PCI_VENDOR_ID_DELL ||
+   pdev->subsystem_device != 0x1fc7)


> + return;
> +
> + pdev->no_in_band_presence = 1;
> +}
> +
> +DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_PLX, 0x9733,

By convention there's no blank line between the closing curly brace
and the DECLARE_PCI_FIXUP_CLASS_FINAL().

If the quirk is x86-specific, please enclose it in "#ifdef CONFIG_X86"
to reduce kernel footprint on other arches.

Thanks,

Lukas

Re: [RESEND PATCH] amba: Allow pclk to be controlled by power domain

2019-02-20 Thread Ulf Hansson

On Tue, 19 Feb 2019 at 07:43, Bjorn Andersson
 wrote:
>
> On Tue 05 Feb 06:58 PST 2019, Ulf Hansson wrote:
>
> > On Thu, 31 Jan 2019 at 03:01, Bjorn Andersson
> >  wrote:
> > >
> > > On the Qualcomm SDM845 platform the apb_pclk is controlled as part of
> > > the QDSS power/clock domain. Handle this by allowing amba to operate
> > > without direct apb_pclk control, when a powerdomain is attached and no
> > > clock is described.
> > >
> > > Signed-off-by: Bjorn Andersson 
> > > ---
> > >
> > > Resending this separate from the series it was originally part of.
> > >
> > >  drivers/amba/bus.c | 9 +++--
> > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c
> > > index 41b706403ef7..3e13050c6d59 100644
> > > --- a/drivers/amba/bus.c
> > > +++ b/drivers/amba/bus.c
> > > @@ -219,8 +219,13 @@ static int amba_get_enable_pclk(struct amba_device *pcdev)
> > > int ret;
> > >
> > > pcdev->pclk = clk_get(&pcdev->dev, "apb_pclk");
> > > -   if (IS_ERR(pcdev->pclk))
> > > -   return PTR_ERR(pcdev->pclk);
> > > +   if (IS_ERR(pcdev->pclk)) {
> > > +   /* Continue with no clock specified, but pm_domain attached */
> > > +   if (PTR_ERR(pcdev->pclk) == -ENOENT && pcdev->dev.pm_domain)
> > > +   pcdev->pclk = NULL;
> >
> > This looks fragile to me.
> >
> > I would prefer to do a match with DT, to check whether the clock
> > is needed or not.
>
> Can you please elaborate on what you want me to match on?
>
> As an example you can find the patch depending on this here:
> https://lore.kernel.org/lkml/60ebf1617f0285c89e921bf3839cba6906d493c9.1548419933.git.saiprakash.ran...@codeaurora.org/

I would extend the compatible with a "soc-id" prefix and match on that.

If that doesn't work, I guess we need to check for the soc family/id,
and thus use soc_device_match().

>
> > Moreover, there should be no reason to check for the
> > ->dev.pm_domain, because, if there was an error while doing the
> > attach, that should already have been reported/propagated.
> >
>
> The purpose of this check was to extend the current requirement of a
> clock to require either a clock or a power domain, rather than just
> making the clock optional - which would be the result if this part is
> omitted.

Well, that would break the current requirement for everybody else,
which is that the clock is required and the PM domain is optional.

[...]

Kind regards
Uffe
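
Archive note: the three clock policies discussed in this thread can be put
side by side as a small truth-table sketch. This is an illustration only, not
the amba bus code; the function names are hypothetical, clk_err stands for the
result of the clock lookup (0 on success, a negative errno otherwise), and
ENOENT = 2 is the usual Linux errno value.

```python
ENOENT = 2


def pclk_policy_current(clk_err, has_pm_domain):
    """Existing rule from the thread: clock required, PM domain optional."""
    return clk_err


def pclk_policy_patch(clk_err, has_pm_domain):
    """Quoted patch: a missing clock is tolerated only with a PM domain."""
    if clk_err == -ENOENT and has_pm_domain:
        return 0
    return clk_err


def pclk_policy_optional(clk_err, has_pm_domain):
    """Without the pm_domain check: the clock simply becomes optional."""
    if clk_err == -ENOENT:
        return 0
    return clk_err
```

The policies differ only when the clock is missing: the current rule always
fails, the patch succeeds only with a PM domain attached, and the optional
variant always succeeds.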

linux-next: Tree for Feb 21

2019-02-20 Thread Stephen Rothwell

Hi all,

Changes since 20190220:

The v4l-dvb tree gained a conflict against the dma-mapping tree.

The kvm tree lost its build failure.

The xarray tree gained a build failure due to an interaction with the
rdma tree for which I applied a merge fix patch.

Non-merge commits (relative to Linus' tree): 9094
 9488 files changed, 427793 insertions(+), 225870 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 296 trees (counting Linus' and 69 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (2137397c92ae Merge tag 'sound-5.0' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound)
Merging fixes/master (ed3ce4cfc919 adfs: mark expected switch fall-throughs)
Merging kspp-gustavo/for-next/kspp (6f6c95f09001 ASN.1: mark expected switch fall-through)
Merging kbuild-current/fixes (207a369e3c08 sh: fix build error for invisible CONFIG_BUILTIN_DTB_SOURCE)
Merging arc-current/for-curr (1ea685503e5c ARC: define ARCH_SLAB_MINALIGN = 8)
Merging arm-current/fixes (fc67e6f120a3 ARM: 8835/1: dma-mapping: Clear DMA ops on teardown)
Merging arm64-fixes/for-next/fixes (74698f6971f2 arm64: Relax GIC version check during early boot)
Merging m68k-current/for-linus (bed1369f5190 m68k: Fix memblock-related crashes)
Merging powerpc-fixes/fixes (8f5b27347e88 powerpc/powernv/sriov: Register IOMMU groups for VFs)
Merging sparc/master (b71acb0e3721 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (9c2054a5cf41 net: dsa: fix unintended change of bridge interface STP state)
Merging bpf/master (f6be4d16039b selftests/bpf: make sure signal interrupts BPF_PROG_TEST_RUN)
Merging ipsec/master (660899ddf06a xfrm: Fix inbound traffic via XFRM interfaces across network namespaces)
Merging netfilter/master (1765f5dcd009 sky2: Increase D3 delay again)
Merging ipvs/master (b2e3d68d1251 netfilter: nft_compat: destroy function must not have side effects)
Merging wireless-drivers/master (d04ca383860b mt76x0u: fix suspend/resume)
Merging mac80211/master (83e37e0bdd14 mac80211: Restore vif beacon interval if start ap fails)
Merging rdma-fixes/for-rc (48396e80fb65 RDMA/srp: Rework SCSI device reset handling)
Merging sound-current/for-linus (268836649c07 Merge tag 'asoc-fix-v5.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus)
Merging sound-asoc-fixes/for-linus (7f23c5605d9f Merge branch 'asoc-5.0' into asoc-linus)
Merging regmap-fixes/for-linus (f17b5f06cb92 Linux 5.0-rc4)
Merging regulator-fixes/for-linus (7651105001d5 Merge branch 'regulator-5.0' into regulator-linus)
Merging spi-fixes/for-linus (af8ecdda3367 Merge branch 'spi-5.0' into spi-linus)
Merging pci-current/for-linus (f57a98e1b713 PCI: Work around Synopsys duplicate Device ID (HAPS USB3, NXP i.MX))
Merging driver-core.current/driver-core-linus (d13937116f1e Linux 5.0-rc6)
Merging tty.current/tty-linus (d13937116f1e Linux 5.0-rc6)
Merging usb.current/usb-linus (d13937116f1e Linux 5.0-rc6)
Merging usb-gadget-fixes/fixes (a53469a68eb8 usb: phy: am335x: fix race condition in _probe)
Merging usb-serial-fixes/usb-linus (8d7fa3d4ea3f USB: serial: ftdi_sio: add ID for Hjelmslund Electronics USB485)
Mergin

Re: [PATCH] net: dsa: add missing phy address offset

2019-02-20 Thread Marcel Reichmuth

On Wed, Feb 20, 2019 at 08:31:22PM +0100, Andrew Lunn wrote:
> On Wed, Feb 20, 2019 at 11:27:16AM -0800, Florian Fainelli wrote:
> > On 2/20/19 10:15 AM, Marcel Reichmuth wrote:
> > 
> > You are supposed to describe the port to PHY mapping using the binding,
> > so for instance:
> > 
> > ports {
> > 	port@0 {
> > 		reg = <0>;
> > 		phy-handle = <&phy1>;
> > 	};
> > 
> > };
> > 
> > mdio {
> > 	phy1: phy@1 {
> > 		reg = <1>;
> > 	};
> > };
> > 
> > etc. is not that working for you?
> 
> 
> The Espressobin does exactly this:
> 
> arch/arm64/boot/dts/marvell/armada-3720-espressobin.dts 
> 
> It also uses the 6341.
>
Thank you very much for your hints. Yes that works indeed too. I
just assumed it was intended to work automatically with the 
built-in phys as it does with the other switches I am using.

Re: [LKP] [driver core] 570d020012: will-it-scale.per_thread_ops -12.2% regression

2019-02-20 Thread Wei Yang

On Thu, Feb 21, 2019 at 03:18:22PM +0800, Huang, Ying wrote:
>Greg Kroah-Hartman  writes:
>
>> On Thu, Feb 21, 2019 at 11:10:49AM +0800, kernel test robot wrote:
>>> On Tue, Feb 19, 2019 at 01:19:04PM +0100, Greg Kroah-Hartman wrote:
>>> > On Tue, Feb 19, 2019 at 08:59:45AM +0800, Wei Yang wrote:
>>> > > On Mon, Feb 18, 2019 at 03:54:42PM +0800, kernel test robot wrote:
>>> > > >Greeting,
>>> > > >
>>> > > >FYI, we noticed a -12.2% regression of will-it-scale.per_thread_ops 
>>> > > >due to commit:
>>> > > >
>>> > > >
>>> > > >commit: 570d0200123fb4f809aa2f6226e93a458d664d70 ("driver core: move 
>>> > > >device->knode_class to device_private")
>>> > > >https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>> > > >
>>> > > 
>>> > > This is interesting.
>>> > > 
>>> > > I didn't expect the move of this field will impact the performance.
>>> > > 
>>> > > The reason is struct device is a hotter memory than 
>>> > > device->device_private?
>>> > > 
>>> > > >in testcase: will-it-scale
>>> > > >on test machine: 288 threads Knights Mill with 80G memory
>>> > > >with following parameters:
>>> > > >
>>> > > >   nr_task: 100%
>>> > > >   mode: thread
>>> > > >   test: unlink2
>>> > > >   cpufreq_governor: performance
>>> > > >
>>> > > >test-description: Will It Scale takes a testcase and runs it from 1 
>>> > > >through to n parallel copies to see if the testcase will scale. It 
>>> > > >builds both a process and threads based test in order to see any 
>>> > > >differences between the two.
>>> > > >test-url: https://github.com/antonblanchard/will-it-scale
>>> > > >
>>> > > >In addition to that, the commit also has significant impact on the 
>>> > > >following tests:
>>> > > >
>>> > > >+------------------+---------------------------------------------------------------+
>>> > > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -29.9% regression |
>>> > > >| test machine     | 288 threads Knights Mill with 80G memory                      |
>>> > > >| test parameters  | cpufreq_governor=performance                                  |
>>> > > >|                  | mode=thread                                                   |
>>> > > >|                  | nr_task=100%                                                  |
>>> > > >|                  | test=signal1                                                  |
>>> > 
>>> > Ok, I'm going to blame your testing system, or something here, and not
>>> > the above patch.
>>> > 
>>> > All this test does is call raise(3).  That does not touch the driver
>>> > core at all.
>>> > 
>>> > > >+------------------+---------------------------------------------------------------+
>>> > > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -16.5% regression |
>>> > > >| test machine     | 288 threads Knights Mill with 80G memory                      |
>>> > > >| test parameters  | cpufreq_governor=performance                                  |
>>> > > >|                  | mode=thread                                                   |
>>> > > >|                  | nr_task=100%                                                  |
>>> > > >|                  | test=open1                                                    |
>>> > > >+------------------+---------------------------------------------------------------+
>>> > 
>>> > Same here, open1 just calls open/close a lot.  No driver core
>>> > interaction at all there either.
>>> > 
>>> > So are you _sure_ this is the offending patch?
>>> 
>>> Hi Greg,
>>> 
>>> We did an experiment: we restored the layout of struct device and
>>> found the regression is gone. I guess the regression is not from the
>>> patch itself but is related to the struct layout.
>>> 
>>> 
>>> tests: 1
>>> testcase/path_params/tbox_group/run: 
>>> will-it-scale/performance-thread-100%-unlink2/lkp-knm01
>>> 
>>> 570d0200123fb4f8  a36dc70b810afe9183de2ea18f
>>> ----------------  --------------------------
>>>          %stddev      change         %stddev
>>>              \          |                \
>>>     237096              14%     270789        will-it-scale.workload
>>>        823              14%        939        will-it-scale.per_thread_ops
>>> 
>>> 
>>> tests: 1
>>> testcase/path_params/tbox_group/run: 
>>> will-it-scale/performance-thread-100%-signal1/lkp-knm01
>>> 
>>> 570d0200123fb4f8  a36dc70b810afe9183de2ea18f
>>> ----------------  --------------------------
>>>          %stddev      change         %stddev
>>>              \          |                \
>>>      93.51 ±  3%        48%     138.53 ±  3%  will-it-scale.time.user_time
>>>        186              40%        261        will-it-scale.per_thread_ops
>>>      53909              40%      75507        will-it-scale.workload
>>> 
>>> 
>>> tests: 1
>>> testcase/path_params/tbox_group/run: 
>>>

Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault

2019-02-20 Thread Masami Hiramatsu

On Fri, 15 Feb 2019 12:47:13 -0500
Steven Rostedt  wrote:

> From: Changbin Du 
> 
> The userspace can ask kprobe to intercept strings at any memory address,
> including invalid kernel address. In this case, fetch_store_strlen()
> would crash since it uses general usercopy function, and user access
> functions are no longer allowed to access kernel memory.
> 
> For example, we can crash the kernel by doing something as below:
> 
> $ sudo kprobe 'p:do_sys_open +0(+0(%si)):string'
> 
> [  103.620391] BUG: GPF in non-whitelisted uaccess (non-canonical address?)
> [  103.622104] general protection fault:  [#1] SMP PTI
> [  103.623424] CPU: 10 PID: 1046 Comm: cat Not tainted 
> 5.0.0-rc3-00130-gd73aba1-dirty #96
> [  103.625321] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> rel-1.12.0-2-g628b2e6-dirty-20190104_103505-linux 04/01/2014
> [  103.628284] RIP: 0010:process_fetch_insn+0x1ab/0x4b0
> [  103.629518] Code: 10 83 80 28 2e 00 00 01 31 d2 31 ff 48 8b 74 24 28 eb 0c 
> 81 fa ff 0f 00 00 7f 1c 85 c0 75 18 66 66 90 0f ae e8 48 63
>  ca 89 f8 <8a> 0c 31 66 66 90 83 c2 01 84 c9 75 dc 89 54 24 34 89 44 24 28 48
> [  103.634032] RSP: 0018:88845eb37ce0 EFLAGS: 00010246
> [  103.635312] RAX:  RBX: 888456c4e5a8 RCX: 
> 
> [  103.637057] RDX:  RSI: 2e646c2f6374652f RDI: 
> 
> [  103.638795] RBP:  R08:  R09: 
> 
> [  103.640556] R10: 0001 R11:  R12: 
> 
> [  103.642297] R13:  R14:  R15: 
> 
> [  103.644040] FS:  () GS:88846f00() 
> knlGS:
> [  103.646019] CS:  0010 DS:  ES:  CR0: 80050033
> [  103.647436] CR2: 7ffc79758038 CR3: 000463360006 CR4: 
> 00020ee0
> [  103.649147] Call Trace:
> [  103.649781]  ? sched_clock_cpu+0xc/0xa0
> [  103.650747]  ? do_sys_open+0x5/0x220
> [  103.651635]  kprobe_trace_func+0x303/0x380
> [  103.652645]  ? do_sys_open+0x5/0x220
> [  103.653528]  kprobe_dispatcher+0x45/0x50
> [  103.654682]  ? do_sys_open+0x1/0x220
> [  103.655875]  kprobe_ftrace_handler+0x90/0xf0
> [  103.657282]  ftrace_ops_assist_func+0x54/0xf0
> [  103.658564]  ? __call_rcu+0x1dc/0x280
> [  103.659482]  0xc0bf
> [  103.660384]  ? __ia32_sys_open+0x20/0x20
> [  103.661682]  ? do_sys_open+0x1/0x220
> [  103.662863]  do_sys_open+0x5/0x220
> [  103.663988]  do_syscall_64+0x60/0x210
> [  103.665201]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [  103.666862] RIP: 0033:0x7fc22fadccdd
> [  103.668034] Code: 48 89 54 24 e0 41 83 e2 40 75 32 89 f0 25 00 00 41 00 3d 
> 00 00 41 00 74 24 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff
>  ff 0f 05 <48> 3d 00 f0 ff ff 77 33 f3 c3 66 0f 1f 84 00 00 00 00 00 48 8d 44
> [  103.674029] RSP: 002b:7ffc7972c3a8 EFLAGS: 0287 ORIG_RAX: 
> 0101
> [  103.676512] RAX: ffda RBX: 562f86147a21 RCX: 
> 7fc22fadccdd
> [  103.678853] RDX: 0008 RSI: 7fc22fae1428 RDI: 
> ff9c
> [  103.681151] RBP:  R08:  R09: 
> 
> [  103.683489] R10:  R11: 0287 R12: 
> 7fc22fce90a8
> [  103.685774] R13: 0001 R14:  R15: 
> 
> [  103.688056] Modules linked in:
> [  103.689131] ---[ end trace 43792035c28984a1 ]---
> 
> This can be fixed by using probe_mem_read() instead, as it can handle faulting
> kernel memory addresses, which kprobes can legitimately do.

Basically OK to me.
Could you use probe_kernel_read() in this context, since probe_mem_read() is a
wrapper function for template code?

With that change,

Acked-by: Masami Hiramatsu 

And for the long term, I need to find a more efficient (or smarter) way to do it,
like strnlen_user() does.

Thank you,

> 
> Link: http://lkml.kernel.org/r/20190125151051.7381-1-changbin...@gmail.com
> 
> Cc: sta...@vger.kernel.org
> Fixes: 9da3f2b7405 ("x86/fault: BUG() when uaccess helpers fault on kernel 
> addresses")
> Signed-off-by: Changbin Du 
> Signed-off-by: Steven Rostedt (VMware) 
> ---
>  kernel/trace/trace_kprobe.c | 10 +-
>  1 file changed, 1 insertion(+), 9 deletions(-)
> 
> diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
> index d5fb09ebba8b..9eaf07f99212 100644
> --- a/kernel/trace/trace_kprobe.c
> +++ b/kernel/trace/trace_kprobe.c
> @@ -861,22 +861,14 @@ static const struct file_operations kprobe_profile_ops 
> = {
>  static nokprobe_inline int
>  fetch_store_strlen(unsigned long addr)
>  {
> - mm_segment_t old_fs;
>   int ret, len = 0;
>   u8 c;
>  
> - old_fs = get_fs();
> - set_fs(KERNEL_DS);
> - pagefault_disable();
> -
>   do {
> - ret = __copy_from_user_inatomic(&c, (u8 *)addr + len, 1);
> + ret = probe_mem_read(&c, (u8 *)addr + len, 1);
>   len++;
>   } while (c && ret
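
Archive note: the idea behind the quoted fix, measuring a string one byte at a
time through a reader that reports a fault instead of crashing, can be sketched
in user space. This is not the kernel implementation and does not complete the
truncated diff above. Memory is modelled as a dict from address to byte,
probe_read_byte() and string_length() are hypothetical names, MAX_STRING_SIZE
is an arbitrary cap chosen for the sketch, and EFAULT = 14 is the usual Linux
errno value.

```python
EFAULT = 14
MAX_STRING_SIZE = 4096  # hypothetical cap for this sketch


def probe_read_byte(memory, addr):
    """Fault-tolerant read: (0, byte) on success, (-EFAULT, 0) if unmapped."""
    if addr in memory:
        return 0, memory[addr]
    return -EFAULT, 0


def string_length(memory, addr):
    """Length including the NUL terminator, or a negative errno on a fault."""
    length = 0
    while length < MAX_STRING_SIZE:
        ret, c = probe_read_byte(memory, addr + length)
        if ret < 0:
            return ret
        length += 1
        if c == 0:
            break
    return length
```

Calling string_length() with the non-canonical address from the RSI register
in the oops above (0x2e646c2f6374652f) returns -EFAULT in this model rather
than faulting, which is the behaviour the patch is after.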

Re: [PATCH v3 00/16] powerpc/32: Use BATs/LTLBs for STRICT_KERNEL_RWX

2019-02-20 Thread Christophe Leroy





Le 21/02/2019 à 07:31, Christophe Leroy a écrit :



Le 21/02/2019 à 02:47, Michael Ellerman a écrit :

Christophe Leroy  writes:


The purpose of this series is to:
- use BATs with STRICT_KERNEL_RWX on book3s (See patch 13 for details.)
- use LTLBs with STRICT_KERNEL_RWX on 8xx (See patch 15 for a few 
details.)


This doesn't boot qemu-mac99 for me:

   spawn ~/src/qemu/ppc-softmmu/qemu-system-ppc -nographic -vga none 
-M mac99 -m 1G -kernel build/vmlinux -initrd ppc32-initrd.gz -append 
console=ttyPZ0 init=/bin/sh

   >> =
   >> OpenBIOS 1.1 [Feb 15 2019 10:05]
   >> Configuration device id QEMU version 1 machine id 1
   >> CPUs: 1
   >> Memory: 1024M
   >> UUID: ----
   >> CPU type PowerPC,G4
   milliseconds isn't unique.
   Welcome to OpenBIOS v1.1 built on Feb 15 2019 10:05
   >> [ppc] Kernel already loaded (0x0100 + 0x00c2c338) (initrd 
0x01d2d000 + 0x007e72f0)

   >> [ppc] Kernel command line: console=ttyPZ0 init=/bin/sh
   >> switching to new context:
   OF stdout device is: /pci@f200/mac-io@c/escc@13000/ch-a@13020
   Preparing to boot Linux version 
5.0.0-rc2-gcc-8.2.0-00125-g4fcb83ca7936 (michael@ka4) (gcc version 
8.2.0 (Buildroot 2018.11-rc2-3-ga0787e9)) #724 Thu Feb 21 12:03:14 
AEDT 2019

   Detected machine type: 0400
   command line:
   memory layout at init:
 memory_limit :  (16 MB aligned)
 alloc_bottom : 02515000
 alloc_top    : 3000
 alloc_top_hi : 4000
 rmo_top  : 3000
 ram_top  : 4000
   copying OF device tree...
   Building dt strings...
   Building dt structure...
   Device tree strings 0x02516000 -> 0x025150a4
   Device tree struct  0x02517000 -> 0x3fde7eb0
   Quiescing Open Firmware ...
   Booting Linux via __start() @ 0x0100 ...
   FAIL! Booting BE pmac32


That's pmac32 defconfig ish.
I haven't had time to debug it further sorry.



Ok. It boots fine without the '-m 1G'.

I'll find out why.



Doesn't boot because it maps memory above total_lowmem.

I'll fix it.

Christophe

Re: [PATCH 05/11] x86 topology: export die_siblings

2019-02-20 Thread Len Brown

Hi Brice,
Thank you for your suggestions!

> Patches #4 and #5 are changing the meaning of core_siblings (in the
> past, it always returned all threads in the entire package). All
> existing user-space tools will see each die as a separate package until
> they are updated to read die_siblings too. It only matters for multi-die
> CPUs when running a recent kernel with an old userspace tool, but it may
> still be considered a sysfs ABI change.

I agree.

Exhibit 1 is the "lscpu" program.

> Worse, things will break again if you ever add tile_siblings for
> CPUID.1f "Tiles". User-space will suddenly see 2 dies of 2 cores instead
> of 1 die of 2 tiles of 2 cores.

Agreed, the existing naming scheme is not resilient to future additions.

> I understand that this isn't easy to fix. But I want to make sure people
> are aware of the meaning of this change.

Here is my list of applications that care about the new CPUID leaf
and the concepts of packages and die:

cpuid
lscpu
x86_energy_perf_policy
turbostat

> The proper way to avoid this is to stop having file foo_siblings refer
> to "the container of foo" instead of "foo itself" (because that
> container changes when you add intermediate levels). Rename sysfs files
> like below, and you don't get any breakage anymore when adding
> intermediate levels:
>
> thread_siblings -> core_threads (can we do sysfs alias or symlink to
> keep the old name?)
>
> core_siblings -> die_threads
>
> die_siblings -> package_threads (needs an alias too)
>
> The documentation would also be much easier to read since "die_threads"
> is obviously "human-readable list of cpuX's hardware threads within the
> same die_id". And no need to modify the doc anymore when adding levels :)

I like your idea!

Hm, I think I'd skip creating "die_siblings", as it adds to the fragile
legacy naming scheme that we want to deprecate.

And although it is ill-defined and has a misleading name, I now think it
would be better to leave "core_siblings" as defined: a legacy synonym for
"package_threads".  Deprecate it, but keep its original definition
until it is removed.

Updated applications would use:

core_threads
die_threads
package_threads

and they'll be future proof if/when we add any new levels.

The legacy thread_siblings and core_siblings will stick around as aliases:

core_threads (thread_siblings)
die_threads
package_threads (core_siblings)

thanks!
Len Brown, Intel Open Source Technology Center
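
Archive note: a tool following the scheme proposed in this message could
prefer the new file names and fall back to the legacy aliases. The sketch
below is one possible way to do that under stated assumptions. The *_threads
names are only the proposal from this thread, not an established sysfs ABI,
and CANDIDATES and read_level_mask() are hypothetical names. The directory
argument is whatever per-CPU topology directory the caller wants to inspect.

```python
import os

# Proposed name first, legacy alias second. The *_threads names are the
# proposal from this thread, not an established sysfs ABI.
CANDIDATES = {
    "core": ("core_threads", "thread_siblings"),
    "die": ("die_threads",),
    "package": ("package_threads", "core_siblings"),
}


def read_level_mask(topology_dir, level):
    """Return (file_name, contents) for the first existing file, else None."""
    for name in CANDIDATES[level]:
        path = os.path.join(topology_dir, name)
        try:
            with open(path) as f:
                return name, f.read().strip()
        except FileNotFoundError:
            continue
    return None
```

Because the lookup is keyed by level rather than by file name, adding another
level later would only mean adding one more entry to CANDIDATES.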

Re: [PATCH v2 1/1] s390: vfio_ap: link the vfio_ap devices to the vfio_ap bus subsystem

2019-02-20 Thread Christian Borntraeger




On 20.02.2019 14:12, Harald Freudenberger wrote:
> On 18.02.19 19:08, Pierre Morel wrote:
>> Libudev relies on having a subsystem link for non-root devices. To
>> avoid libudev (and potentially other userspace tools) choking on the
>> matrix device let us introduce a vfio_ap bus and with that the vfio_ap
>> bus subsystem, and make the matrix device reside within it.
>>
>> Doing this we need to suppress the forced link from the matrix device to
>> the vfio_ap driver and we suppress the device_type we do not need
>> anymore.
>>
>> Since the associated matrix driver is not the vfio_ap driver any more,
>> we have to change the search for the devices on the vfio_ap driver in
>> the function vfio_ap_verify_queue_reserved.
>>
>> Reported-by: Marc Hartmayer 
>> Reported-by: Christian Borntraeger 
>> Signed-off-by: Pierre Morel 
>> ---
>>  drivers/s390/crypto/vfio_ap_drv.c | 48 
>> +--
>>  drivers/s390/crypto/vfio_ap_ops.c |  4 +--
>>  drivers/s390/crypto/vfio_ap_private.h |  1 +
>>  3 files changed, 43 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_drv.c 
>> b/drivers/s390/crypto/vfio_ap_drv.c
>> index 31c6c84..8e45559 100644
>> --- a/drivers/s390/crypto/vfio_ap_drv.c
>> +++ b/drivers/s390/crypto/vfio_ap_drv.c
>> @@ -24,10 +24,6 @@ MODULE_LICENSE("GPL v2");
>>  
>>  static struct ap_driver vfio_ap_drv;
>>  
>> -static struct device_type vfio_ap_dev_type = {
>> -.name = VFIO_AP_DEV_TYPE_NAME,
>> -};
>> -
>>  struct ap_matrix_dev *matrix_dev;
>>  
>>  /* Only type 10 adapters (CEX4 and later) are supported
>> @@ -62,6 +58,27 @@ static void vfio_ap_matrix_dev_release(struct device *dev)
>>  kfree(matrix_dev);
>>  }
>>  
>> +static int matrix_bus_match(struct device *dev, struct device_driver *drv)
>> +{
>> +return 1;
>> +}
>> +
>> +static struct bus_type matrix_bus = {
>> +.name = "vfio_ap",
>> +.match = &matrix_bus_match,
>> +};
>> +
>> +static int matrix_probe(struct device *dev)
>> +{
>> +return 0;
>> +}
>> +
>> +static struct device_driver matrix_driver = {
>> +.name = "vfio_ap",
>> +.bus = &matrix_bus,
>> +.probe = matrix_probe,
>> +};
>> +
>>  static int vfio_ap_matrix_dev_create(void)
>>  {
>>  int ret;
>> @@ -71,6 +88,10 @@ static int vfio_ap_matrix_dev_create(void)
>>  if (IS_ERR(root_device))
>>  return PTR_ERR(root_device);
>>  
>> +ret = bus_register(&matrix_bus);
>> +if (ret)
>> +goto bus_register_err;
>> +
>>  matrix_dev = kzalloc(sizeof(*matrix_dev), GFP_KERNEL);
>>  if (!matrix_dev) {
>>  ret = -ENOMEM;
>> @@ -87,30 +108,41 @@ static int vfio_ap_matrix_dev_create(void)
>>  mutex_init(&matrix_dev->lock);
>>  INIT_LIST_HEAD(&matrix_dev->mdev_list);
>>  
>> -matrix_dev->device.type = &vfio_ap_dev_type;
>>  dev_set_name(&matrix_dev->device, "%s", VFIO_AP_DEV_NAME);
>>  matrix_dev->device.parent = root_device;
>> +matrix_dev->device.bus = &matrix_bus;
>>  matrix_dev->device.release = vfio_ap_matrix_dev_release;
>> -matrix_dev->device.driver = &vfio_ap_drv.driver;
>> +matrix_dev->vfio_ap_drv = &vfio_ap_drv;
>>  
>>  ret = device_register(&matrix_dev->device);
>>  if (ret)
>>  goto matrix_reg_err;
>>  
>> +ret = driver_register(&matrix_driver);
>> +if (ret)
>> +goto matrix_drv_err;
>> +
>>  return 0;
>>  
>> +matrix_drv_err:
>> +device_unregister(&matrix_dev->device);
>>  matrix_reg_err:
>>  put_device(&matrix_dev->device);
>>  matrix_alloc_err:
>> +bus_unregister(&matrix_bus);
>> +bus_register_err:
>>  root_device_unregister(root_device);
>> -
>>  return ret;
>>  }
>>  
>>  static void vfio_ap_matrix_dev_destroy(void)
>>  {
>> +struct device *root_device = matrix_dev->device.parent;
>> +
>> +driver_unregister(&matrix_driver);
>>  device_unregister(&matrix_dev->device);
>> -root_device_unregister(matrix_dev->device.parent);
>> +bus_unregister(&matrix_bus);
>> +root_device_unregister(root_device);
>>  }
>>  
>>  static int __init vfio_ap_init(void)
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c 
>> b/drivers/s390/crypto/vfio_ap_ops.c
>> index 272ef42..900b9cf 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -198,8 +198,8 @@ static int vfio_ap_verify_queue_reserved(unsigned long 
>> *apid,
>>  qres.apqi = apqi;
>>  qres.reserved = false;
>>  
>> -ret = driver_for_each_device(matrix_dev->device.driver, NULL, &qres,
>> - vfio_ap_has_queue);
>> +ret = driver_for_each_device(&matrix_dev->vfio_ap_drv->driver, NULL,
>> + &qres, vfio_ap_has_queue);
>>  if (ret)
>>  return ret;
>>  
>> diff --git a/drivers/s390/crypto/vfio_ap_private.h 
>> b/drivers/s390/crypto/vfio_ap_private.h
>> index 5675492..76b7f98 100644
>> --- a/drivers/s390/crypto/vfio_ap_private.h
>> +++ b/drivers/s390/crypto/vfio_ap_private.h
>> @@ -40,6 +40,7 @@ struct ap_matrix_dev {
>>  struct ap_config_info info;
>>  struct list_head mdev_list;

Re: [PATCH 1/2 v2] kprobe: Do not use uaccess functions to access kernel memory that can fault

2019-02-20 Thread Masami Hiramatsu

On Wed, 20 Feb 2019 11:42:17 -0500
Steven Rostedt  wrote:

> On Thu, 21 Feb 2019 01:04:53 +0900
> Masami Hiramatsu  wrote:
> 
> > > What about just adding 'u' to the end of the offset? Say you have a
> > > data structure in kernel space that has a field in user space you want
> > > to reference?
> > > 
> > > 
> > >   field_val=+8u(+0(%si))  
> > 
> > Ah, that looks good :~) thank you for this idea!
> 
> 
> 
> Hmm, I wonder if we should make it +u8 or u+8? as +8u may be confused
> as unsigned? Like 8ULL. I don't know. Kernel developers suck at
> naming :-p

I like +u8 since it is easier to implement :-p.
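
To illustrate the point about implementation effort: with the marker placed
directly after the sign, a parser only has to look at one character before
handing the rest to strtol(). The following is a rough userspace sketch of
that idea, not the actual fetcharg parser; struct deref_offset and
parse_deref_offset() are hypothetical names used only here.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Result of parsing one dereference offset such as "+u8" or "-16". */
struct deref_offset {
	long offset;	/* signed byte offset */
	bool user;	/* true if the 'u' marker followed the sign */
};

/*
 * Parse "<sign>[u]<number>" at the start of s.
 * Returns a pointer to the first character after the number,
 * or NULL if s does not start with a valid offset.
 */
static const char *parse_deref_offset(const char *s, struct deref_offset *out)
{
	char *end;
	int sign;

	if (*s == '+')
		sign = 1;
	else if (*s == '-')
		sign = -1;
	else
		return NULL;
	s++;

	/* The marker sits right after the sign, before any digits. */
	out->user = (*s == 'u');
	if (out->user)
		s++;

	if (*s < '0' || *s > '9')
		return NULL;

	out->offset = sign * strtol(s, &end, 0);
	return end;
}
```

Given "+u8(+0(%si))" this yields offset 8 with the user flag set and returns
a pointer to "(+0(%si))", which the caller can parse recursively.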

Thank you,

> 
> 
> 
> -- Steve


-- 
Masami Hiramatsu

Re: [PATCH RFC v2 2/4] PCI: pciehp: Do not turn off slot if presence comes up after link

2019-02-20 Thread Lukas Wunner

On Tue, Feb 19, 2019 at 07:20:28PM -0600, Alexandru Gagniuc wrote:
> @@ -213,6 +213,21 @@ void pciehp_handle_disable_request(struct controller 
> *ctrl)
>   ctrl->request_result = pciehp_disable_slot(ctrl, SAFE_REMOVAL);
>  }
>  
> +static bool is_delayed_presence_up_event(struct controller *ctrl, u32 events)
> +{
> + bool present, link_active;
> +
> + if (!ctrl->inband_presence_disabled)
> + return false;
> +
> + present = pciehp_card_present(ctrl);
> + link_active = pciehp_check_link_active(ctrl);
> +
> + if (!present || !link_active || events & PCI_EXP_SLTSTA_DLLSC)
> + return false;
> +
> + return true;
> +}
>  void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 
> events)

Newline please after the closing curly brace.


> @@ -220,13 +235,22 @@ void pciehp_handle_presence_or_link_change(struct 
> controller *ctrl, u32 events)
>   /*
>* If the slot is on and presence or link has changed, turn it off.
>* Even if it's occupied again, we cannot assume the card is the same.
> +  * When the card is swapped, we also expect a change in link state,
> +  * without which, it's likely presence became high after link-active.
>*/

Maybe it's just me but I find the code comment difficult to understand.
How about something along the lines of:

/*
 * If the slot is on and presence or link has changed, turn it off.
 * Even if it's occupied again, we cannot assume the card is the same.
+*
+* An exception is a delayed "Card present" after a "Link Up".
+* This can happen on controllers with in-band presence disabled,
+* PCIe r5.0 sec X.Y.Z.
 */


>   mutex_lock(&ctrl->state_lock);
> + present = pciehp_card_present(ctrl);
> + link_active = pciehp_check_link_active(ctrl);
>   switch (ctrl->state) {

These two assignments appear to be superfluous as you're also performing
them in pciehp_check_link_active().

Thanks,

Lukas

Re: [LKP] [driver core] 570d020012: will-it-scale.per_thread_ops -12.2% regression

2019-02-20 Thread Greg Kroah-Hartman

On Thu, Feb 21, 2019 at 03:18:22PM +0800, Huang, Ying wrote:
> Greg Kroah-Hartman  writes:
> 
> > On Thu, Feb 21, 2019 at 11:10:49AM +0800, kernel test robot wrote:
> >> On Tue, Feb 19, 2019 at 01:19:04PM +0100, Greg Kroah-Hartman wrote:
> >> > On Tue, Feb 19, 2019 at 08:59:45AM +0800, Wei Yang wrote:
> >> > > On Mon, Feb 18, 2019 at 03:54:42PM +0800, kernel test robot wrote:
> >> > > >Greeting,
> >> > > >
> >> > > >FYI, we noticed a -12.2% regression of will-it-scale.per_thread_ops 
> >> > > >due to commit:
> >> > > >
> >> > > >
> >> > > >commit: 570d0200123fb4f809aa2f6226e93a458d664d70 ("driver core: move 
> >> > > >device->knode_class to device_private")
> >> > > >https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git 
> >> > > >master
> >> > > >
> >> > > 
> >> > > This is interesting.
> >> > > 
> >> > > I didn't expect the move of this field will impact the performance.
> >> > > 
> >> > > The reason is struct device is a hotter memory than 
> >> > > device->device_private?
> >> > > 
> >> > > >in testcase: will-it-scale
> >> > > >on test machine: 288 threads Knights Mill with 80G memory
> >> > > >with following parameters:
> >> > > >
> >> > > >  nr_task: 100%
> >> > > >  mode: thread
> >> > > >  test: unlink2
> >> > > >  cpufreq_governor: performance
> >> > > >
> >> > > >test-description: Will It Scale takes a testcase and runs it from 1 
> >> > > >through to n parallel copies to see if the testcase will scale. It 
> >> > > >builds both a process and threads based test in order to see any 
> >> > > >differences between the two.
> >> > > >test-url: https://github.com/antonblanchard/will-it-scale
> >> > > >
> >> > > >In addition to that, the commit also has significant impact on the 
> >> > > >following tests:
> >> > > >
> >> > > >+------------------+---------------------------------------------------------------+
> >> > > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -29.9% regression |
> >> > > >| test machine     | 288 threads Knights Mill with 80G memory                      |
> >> > > >| test parameters  | cpufreq_governor=performance                                  |
> >> > > >|                  | mode=thread                                                   |
> >> > > >|                  | nr_task=100%                                                  |
> >> > > >|                  | test=signal1                                                  |
> >> > 
> >> > Ok, I'm going to blame your testing system, or something here, and not
> >> > the above patch.
> >> > 
> >> > All this test does is call raise(3).  That does not touch the driver
> >> > core at all.
> >> > 
> >> > > >+------------------+---------------------------------------------------------------+
> >> > > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -16.5% regression |
> >> > > >| test machine     | 288 threads Knights Mill with 80G memory                      |
> >> > > >| test parameters  | cpufreq_governor=performance                                  |
> >> > > >|                  | mode=thread                                                   |
> >> > > >|                  | nr_task=100%                                                  |
> >> > > >|                  | test=open1                                                    |
> >> > > >+------------------+---------------------------------------------------------------+
> >> > 
> >> > Same here, open1 just calls open/close a lot.  No driver core
> >> > interaction at all there either.
> >> > 
> >> > So are you _sure_ this is the offending patch?
> >> 
> >> Hi Greg,
> >> 
> >> We did an experiment, recovered the layout of struct device, and
> >> found the regression is gone. I guess the regression is not from the
> >> patch but related to the struct layout.
> >> 
> >> 
> >> tests: 1
> >> testcase/path_params/tbox_group/run: 
> >> will-it-scale/performance-thread-100%-unlink2/lkp-knm01
> >> 
> >> 570d0200123fb4f8  a36dc70b810afe9183de2ea18f  
> >>   --  
> >>  %stddev  change %stddev
> >>  \  |\  
> >> 237096  14% 270789will-it-scale.workload
> >>823  14%939will-it-scale.per_thread_ops
> >> 
> >> 
> >> tests: 1
> >> testcase/path_params/tbox_group/run: 
> >> will-it-scale/performance-thread-100%-signal1/lkp-knm01
> >> 
> >> 570d0200123fb4f8  a36dc70b810afe9183de2ea18f  
> >>   --  
> >>  %stddev  change %stddev
> >>  \  |\  
> >>  93.51   3%48% 138.53   3%  will-it-scale.time.user_time
> >>186  40%261will-it-scale.per_thread_ops
> >>  53909  40%  75507

Re: [PATCH] iwlwifi: mvm: Use div64_s64 instead of do_div in iwl_mvm_debug_range_resp

2019-02-20 Thread Luciano Coelho

On Wed, 2019-02-20 at 10:56 -0700, Nathan Chancellor wrote:
> On Wed, Feb 20, 2019 at 11:51:34AM +0100, Arnd Bergmann wrote:
> > On Tue, Feb 19, 2019 at 7:22 PM Nathan Chancellor
> >  wrote:
> > > diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/ftm-
> > > initiator.c b/drivers/net/wireless/intel/iwlwifi/mvm/ftm-
> > > initiator.c
> > > index e9822a3ec373..92b22250eb7d 100644
> > > --- a/drivers/net/wireless/intel/iwlwifi/mvm/ftm-initiator.c
> > > +++ b/drivers/net/wireless/intel/iwlwifi/mvm/ftm-initiator.c
> > > @@ -462,7 +462,7 @@ static void iwl_mvm_debug_range_resp(struct
> > > iwl_mvm *mvm, u8 index,
> > >  {
> > > s64 rtt_avg = res->ftm.rtt_avg * 100;
> > > 
> > > -   do_div(rtt_avg, );
> > > +   div64_s64(rtt_avg, );
> > 
> > This is wrong: div64_s64 does not modify its argument like
> > do_div(), but
> > it returns the result instead. You also don't want to divide by a
> > 64-bit
> > value when the second argument is a small constant.
> > 
> > I think the correct way should be
> > 
> >s64 rtt_avg = div_s64(res->ftm.rtt_avg * 100, );
> > 
> > If you know that the value is positive, using unsigned types
> > and div_u64() would be slightly faster.
> > 
> >   Arnd
> 
> Thanks for the review and explanation, Arnd.
> 
> Luca, could you drop this version so I can resend it?

Sure, please do! I already applied this internally, but I can just fix
it with your new patch and that will be squashed before being sent
upstream, so it will look like your second patch.
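
For reference, the difference in calling convention that Arnd points out can
be sketched in userspace. The two helpers below are stand-ins written for
this illustration only (do_div_like() and div_s64_like() are not kernel
functions); they mimic just the convention: one overwrites the dividend and
returns the remainder, the other leaves its arguments alone and returns the
quotient.

```c
#include <stdint.h>

/*
 * Stand-in for the do_div() convention: the dividend is overwritten
 * with the quotient and the remainder is returned.
 */
static uint32_t do_div_like(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);

	*n /= base;
	return rem;
}

/*
 * Stand-in for the div_s64() convention: the arguments are left
 * untouched and the quotient is returned, so the caller must assign it.
 */
static int64_t div_s64_like(int64_t dividend, int32_t divisor)
{
	return dividend / divisor;
}
```

Calling the second form and discarding the return value, as the first
version of the patch did with div64_s64(), leaves the variable unchanged.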

--
Cheers,
Luca.

Re: [LKP] [RFC PATCH] mm, memory_hotplug: fix off-by-one in is_pageblock_removable

2019-02-20 Thread Michal Hocko

On Thu 21-02-19 11:18:07, Rong Chen wrote:
> Hi,
> 
> The patch can fix the issue for me.

Thanks for the confirmation!
-- 
Michal Hocko
SUSE Labs

Re: [PATCH] staging: rtlwifi: Use proper enum for return in halmac_parse_psd_data_88xx

2019-02-20 Thread Nicholas Mc Guire

On Wed, Feb 20, 2019 at 10:25:24PM -0700, Nathan Chancellor wrote:
> Clang warns:
> 
> drivers/staging/rtlwifi/halmac/halmac_88xx/halmac_func_88xx.c:2472:11:
> warning: implicit conversion from enumeration type 'enum
> halmac_cmd_process_status' to different enumeration type 'enum
> halmac_ret_status' [-Wenum-conversion]
> return HALMAC_CMD_PROCESS_ERROR;
> ~~ ^~~~
> 1 warning generated.
> 
yup - my bad I somehow managed to end up in halmac_cmd_process_status
rather than halmac_ret_status - HALMAC_RET_MALLOC_FAIL makes sense here.

interesting that gcc did not fuss at this.

thx!
hofrat

> Fix this by using the proper enum for allocation failures,
> HALMAC_RET_MALLOC_FAIL, which is used in the rest of this file.
> 
> Fixes: e4b08e16b7d9 ("staging: r8822be: check kzalloc return or bail")
> Link: https://github.com/ClangBuiltLinux/linux/issues/375
> Signed-off-by: Nathan Chancellor 

Reviewed-by: Nicholas Mc Guire 

> ---
>  drivers/staging/rtlwifi/halmac/halmac_88xx/halmac_func_88xx.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/staging/rtlwifi/halmac/halmac_88xx/halmac_func_88xx.c 
> b/drivers/staging/rtlwifi/halmac/halmac_88xx/halmac_func_88xx.c
> index ec742da030db..ddbeff8224ab 100644
> --- a/drivers/staging/rtlwifi/halmac/halmac_88xx/halmac_func_88xx.c
> +++ b/drivers/staging/rtlwifi/halmac/halmac_88xx/halmac_func_88xx.c
> @@ -2469,7 +2469,7 @@ halmac_parse_psd_data_88xx(struct halmac_adapter 
> *halmac_adapter, u8 *c2h_buf,
>   if (!psd_set->data) {
>   psd_set->data = kzalloc(psd_set->data_size, GFP_KERNEL);
>   if (!psd_set->data)
> - return HALMAC_CMD_PROCESS_ERROR;
> + return HALMAC_RET_MALLOC_FAIL;
>   }
>  
>   if (segment_id == 0)
> -- 
> 2.21.0.rc1
>

[PATCH V7 2/8] clocksource: tegra: add Tegra210 timer support

2019-02-20 Thread Joseph Lo

Add support for the Tegra210 timer that runs at oscillator clock
(TMR10-TMR13). We need these timers to work as clock event devices and to
replace the ARMv8 architected timer, because it can't survive a power
cycle of the CPU core or the CPUPORESET signal and therefore can't be a
wake-up source when the CPU suspends in power-down state.

Also convert the original driver to use timer-of API.

Cc: Daniel Lezcano 
Cc: Thomas Gleixner 
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Joseph Lo 
Acked-by: Thierry Reding 
Acked-by: Jon Hunter 
---
v7:
 * kconfig fix for 'depends on ARM || ARM64'
 * move suspend/resume to clkevt
 * refine the usage for the macro of TIMER_OF_DECLARE
v6:
 * refine the timer defines
 * add ack tag from Jon.
v5:
 * add ack tag from Thierry
v4:
 * merge timer-tegra210.c in previous version into timer-tegra20.c
v3:
 * use timer-of API
v2:
 * add error clean-up code
---
 drivers/clocksource/Kconfig |   3 +-
 drivers/clocksource/timer-tegra20.c | 370 +++-
 include/linux/cpuhotplug.h  |   1 +
 3 files changed, 262 insertions(+), 112 deletions(-)

diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
index 8dfd3bc448d0..5d93e580e5dc 100644
--- a/drivers/clocksource/Kconfig
+++ b/drivers/clocksource/Kconfig
@@ -131,7 +131,8 @@ config SUN5I_HSTIMER
 config TEGRA_TIMER
bool "Tegra timer driver" if COMPILE_TEST
select CLKSRC_MMIO
-   depends on ARM
+   select TIMER_OF
+   depends on ARM || ARM64
help
  Enables support for the Tegra driver.
 
diff --git a/drivers/clocksource/timer-tegra20.c 
b/drivers/clocksource/timer-tegra20.c
index 4293943f4e2b..fdb3d795a409 100644
--- a/drivers/clocksource/timer-tegra20.c
+++ b/drivers/clocksource/timer-tegra20.c
@@ -15,21 +15,24 @@
  *
  */
 
-#include 
+#include 
+#include 
+#include 
+#include 
+#include 
 #include 
-#include 
 #include 
-#include 
-#include 
-#include 
-#include 
-#include 
 #include 
 #include 
+#include 
 #include 
-#include 
+#include 
+
+#include "timer-of.h"
 
+#ifdef CONFIG_ARM
 #include 
+#endif
 
 #define RTC_SECONDS0x08
 #define RTC_SHADOW_SECONDS 0x0c
@@ -39,74 +42,161 @@
 #define TIMERUS_USEC_CFG 0x14
 #define TIMERUS_CNTR_FREEZE 0x4c
 
-#define TIMER1_BASE 0x0
-#define TIMER2_BASE 0x8
-#define TIMER3_BASE 0x50
-#define TIMER4_BASE 0x58
-
-#define TIMER_PTV 0x0
-#define TIMER_PCR 0x4
-
+#define TIMER_PTV  0x0
+#define TIMER_PTV_EN   BIT(31)
+#define TIMER_PTV_PER  BIT(30)
+#define TIMER_PCR  0x4
+#define TIMER_PCR_INTR_CLR BIT(30)
+
+#ifdef CONFIG_ARM
+#define TIMER_CPU0 0x50 /* TIMER3 */
+#else
+#define TIMER_CPU0 0x90 /* TIMER10 */
+#define TIMER10_IRQ_IDX10
+#define IRQ_IDX_FOR_CPU(cpu)   (TIMER10_IRQ_IDX + cpu)
+#endif
+#define TIMER_BASE_FOR_CPU(cpu) (TIMER_CPU0 + (cpu) * 8)
+
+static u32 usec_config;
 static void __iomem *timer_reg_base;
+#ifdef CONFIG_ARM
 static void __iomem *rtc_base;
-
 static struct timespec64 persistent_ts;
 static u64 persistent_ms, last_persistent_ms;
-
 static struct delay_timer tegra_delay_timer;
-
-#define timer_writel(value, reg) \
-   writel_relaxed(value, timer_reg_base + (reg))
-#define timer_readl(reg) \
-   readl_relaxed(timer_reg_base + (reg))
+#endif
 
 static int tegra_timer_set_next_event(unsigned long cycles,
 struct clock_event_device *evt)
 {
-   u32 reg;
+   void __iomem *reg_base = timer_of_base(to_timer_of(evt));
 
-   reg = 0x8000 | ((cycles > 1) ? (cycles-1) : 0);
-   timer_writel(reg, TIMER3_BASE + TIMER_PTV);
+   writel(TIMER_PTV_EN |
+  ((cycles > 1) ? (cycles - 1) : 0), /* n+1 scheme */
+  reg_base + TIMER_PTV);
 
return 0;
 }
 
-static inline void timer_shutdown(struct clock_event_device *evt)
+static int tegra_timer_shutdown(struct clock_event_device *evt)
 {
-   timer_writel(0, TIMER3_BASE + TIMER_PTV);
+   void __iomem *reg_base = timer_of_base(to_timer_of(evt));
+
+   writel(0, reg_base + TIMER_PTV);
+
+   return 0;
 }
 
-static int tegra_timer_shutdown(struct clock_event_device *evt)
+static int tegra_timer_set_periodic(struct clock_event_device *evt)
 {
-   timer_shutdown(evt);
+   void __iomem *reg_base = timer_of_base(to_timer_of(evt));
+
+   writel(TIMER_PTV_EN | TIMER_PTV_PER |
+  ((timer_of_rate(to_timer_of(evt)) / HZ) - 1),
+  reg_base + TIMER_PTV);
+
return 0;
 }
 
-static int tegra_timer_set_periodic(struct clock_event_device *evt)
+static irqreturn_t tegra_timer_isr(int irq, void *dev_id)
+{
+   struct clock_event_device *evt = (struct clock_event_device *)dev_id;
+   void __iomem *reg_base = timer_of_base(to_timer_of(evt));
+
+   writel(TIMER_PCR_INTR_CLR, reg_base + TIMER_PCR);
+   evt->event_handler(evt);
+
+   return IRQ_HANDLED;
+}
+
+static void tegra_timer_suspend(struct

[PATCH V7 1/8] dt-bindings: timer: add Tegra210 timer

2019-02-20 Thread Joseph Lo

The Tegra210 timer provides fourteen 29-bit timer counters and one 32-bit
timestamp counter. The TMRs run at either a fixed 1 MHz clock rate derived
from the oscillator clock (TMR0-TMR9) or directly at the oscillator clock
(TMR10-TMR13). Each TMR can be programmed to generate one-shot, periodic,
or watchdog interrupts.

Cc: Daniel Lezcano 
Cc: Thomas Gleixner 
Cc: linux-kernel@vger.kernel.org
Cc: devicet...@vger.kernel.org
Signed-off-by: Joseph Lo 
Reviewed-by: Rob Herring 
Acked-by: Jon Hunter 
---
v7:
 * no change
v6:
 * add ack tag from Jon.
v5:
 * no change
v4:
 * no change
v3:
 * no change
v2:
 * list all the interrupts that are supported by tegra210 timers block
 * add RB tag from Rob.
---
 .../bindings/timer/nvidia,tegra210-timer.txt  | 36 +++
 1 file changed, 36 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/timer/nvidia,tegra210-timer.txt

diff --git a/Documentation/devicetree/bindings/timer/nvidia,tegra210-timer.txt 
b/Documentation/devicetree/bindings/timer/nvidia,tegra210-timer.txt
new file mode 100644
index ..032cda96fe0d
--- /dev/null
+++ b/Documentation/devicetree/bindings/timer/nvidia,tegra210-timer.txt
@@ -0,0 +1,36 @@
+NVIDIA Tegra210 timer
+
+The Tegra210 timer provides fourteen 29-bit timer counters and one 32-bit
+timestamp counter. The TMRs run at either a fixed 1 MHz clock rate derived
+from the oscillator clock (TMR0-TMR9) or directly at the oscillator clock
+(TMR10-TMR13). Each TMR can be programmed to generate one-shot, periodic,
+or watchdog interrupts.
+
+Required properties:
+- compatible : "nvidia,tegra210-timer".
+- reg : Specifies base physical address and size of the registers.
+- interrupts : A list of 14 interrupts; one per each timer channels 0 through
+  13.
+- clocks : Must contain one entry, for the module clock.
+  See ../clocks/clock-bindings.txt for details.
+
+timer@60005000 {
+   compatible = "nvidia,tegra210-timer";
+   reg = <0x0 0x60005000 0x0 0x400>;
+   interrupts = ,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+,
+;
+   clocks = <&tegra_car TEGRA210_CLK_TIMER>;
+   clock-names = "timer";
+};
-- 
2.20.1

Re: [PATCH RFC v2 1/4] PCI: hotplug: Add support for disabling in-band presence

2019-02-20 Thread Lukas Wunner

On Tue, Feb 19, 2019 at 07:20:27PM -0600, Alexandru Gagniuc wrote:
> @@ -846,6 +846,9 @@ struct controller *pcie_init(struct pcie_device *dev)
>   if (pdev->is_thunderbolt)
>   slot_cap |= PCI_EXP_SLTCAP_NCCS;
>  
> + if (pdev->no_in_band_presence)
> + ctrl->inband_presence_disabled = 1;
> +
>   ctrl->slot_cap = slot_cap;
>   mutex_init(&ctrl->ctrl_lock);
>   mutex_init(&ctrl->state_lock);

The above hunk belongs in patch 4.


> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -413,6 +413,7 @@ struct pci_dev {
>   unsigned intnon_compliant_bars:1;   /* Broken BARs; ignore them */
>   unsigned intis_probed:1;/* Device probing in progress */
>   unsigned intlink_active_reporting:1;/* Device capable of reporting 
> link active */
> + unsigned intno_in_band_presence:1;  /* Device does not report 
> in-band presence */
>   unsigned intno_vf_scan:1;   /* Don't scan for VFs after IOV 
> enablement */
>   pci_dev_flags_t dev_flags;
>   atomic_tenable_cnt; /* pci_enable_device has been called */

Same here.

Thanks,

Lukas

Re: [LKP] [driver core] 570d020012: will-it-scale.per_thread_ops -12.2% regression

2019-02-20 Thread Huang, Ying

Greg Kroah-Hartman  writes:

> On Thu, Feb 21, 2019 at 11:10:49AM +0800, kernel test robot wrote:
>> On Tue, Feb 19, 2019 at 01:19:04PM +0100, Greg Kroah-Hartman wrote:
>> > On Tue, Feb 19, 2019 at 08:59:45AM +0800, Wei Yang wrote:
>> > > On Mon, Feb 18, 2019 at 03:54:42PM +0800, kernel test robot wrote:
>> > > >Greeting,
>> > > >
>> > > >FYI, we noticed a -12.2% regression of will-it-scale.per_thread_ops due 
>> > > >to commit:
>> > > >
>> > > >
>> > > >commit: 570d0200123fb4f809aa2f6226e93a458d664d70 ("driver core: move 
>> > > >device->knode_class to device_private")
>> > > >https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>> > > >
>> > > 
>> > > This is interesting.
>> > > 
>> > > I didn't expect the move of this field will impact the performance.
>> > > 
>> > > The reason is struct device is a hotter memory than 
>> > > device->device_private?
>> > > 
>> > > >in testcase: will-it-scale
>> > > >on test machine: 288 threads Knights Mill with 80G memory
>> > > >with following parameters:
>> > > >
>> > > >nr_task: 100%
>> > > >mode: thread
>> > > >test: unlink2
>> > > >cpufreq_governor: performance
>> > > >
>> > > >test-description: Will It Scale takes a testcase and runs it from 1 
>> > > >through to n parallel copies to see if the testcase will scale. It 
>> > > >builds both a process and threads based test in order to see any 
>> > > >differences between the two.
>> > > >test-url: https://github.com/antonblanchard/will-it-scale
>> > > >
>> > > >In addition to that, the commit also has significant impact on the 
>> > > >following tests:
>> > > >
>> > > >+------------------+---------------------------------------------------------------+
>> > > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -29.9% regression |
>> > > >| test machine     | 288 threads Knights Mill with 80G memory                      |
>> > > >| test parameters  | cpufreq_governor=performance                                  |
>> > > >|                  | mode=thread                                                   |
>> > > >|                  | nr_task=100%                                                  |
>> > > >|                  | test=signal1                                                  |
>> > 
>> > Ok, I'm going to blame your testing system, or something here, and not
>> > the above patch.
>> > 
>> > All this test does is call raise(3).  That does not touch the driver
>> > core at all.
>> > 
>> > > >+------------------+---------------------------------------------------------------+
>> > > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -16.5% regression |
>> > > >| test machine     | 288 threads Knights Mill with 80G memory                      |
>> > > >| test parameters  | cpufreq_governor=performance                                  |
>> > > >|                  | mode=thread                                                   |
>> > > >|                  | nr_task=100%                                                  |
>> > > >|                  | test=open1                                                    |
>> > > >+------------------+---------------------------------------------------------------+
>> > 
>> > Same here, open1 just calls open/close a lot.  No driver core
>> > interaction at all there either.
>> > 
>> > So are you _sure_ this is the offending patch?
>> 
>> Hi Greg,
>> 
>> We did an experiment, recovered the layout of struct device, and
>> found the regression is gone. I guess the regression is not from the
>> patch but related to the struct layout.
>> 
>> 
>> tests: 1
>> testcase/path_params/tbox_group/run: 
>> will-it-scale/performance-thread-100%-unlink2/lkp-knm01
>> 
>> 570d0200123fb4f8  a36dc70b810afe9183de2ea18f  
>>   --  
>>  %stddev  change %stddev
>>  \  |\  
>> 237096  14% 270789will-it-scale.workload
>>823  14%939will-it-scale.per_thread_ops
>> 
>> 
>> tests: 1
>> testcase/path_params/tbox_group/run: 
>> will-it-scale/performance-thread-100%-signal1/lkp-knm01
>> 
>> 570d0200123fb4f8  a36dc70b810afe9183de2ea18f  
>>   --  
>>  %stddev  change %stddev
>>  \  |\  
>>  93.51   3%48% 138.53   3%  will-it-scale.time.user_time
>>186  40%261will-it-scale.per_thread_ops
>>  53909  40%  75507will-it-scale.workload
>> 
>> 
>> tests: 1
>> testcase/path_params/tbox_group/run: 
>> will-it-scale/performance-thread-100%-open1/lkp-knm01
>> 
>> 570d0200123fb4f8  a36dc70b810afe9183de2ea18f  
>>   --  
>>  %stddev  change

RE: [PATCH v4 6/6] usb:cdns3 Fix for stuck packets in on-chip OUT buffer.

2019-02-20 Thread Felipe Balbi


Hi,

(please break your emails at 80-columns)

Pawel Laszczak  writes:
>>> One more thing. The workaround implements an algorithm that decides for
>>> which endpoint it should be enabled, e.g. for a composite device
>>> MSC+NCM+ACM it should work only for the ACM OUT endpoint.
>>>
>>
>>If ACM driver didn't queue the request for ACM OUT endpoint, why does the
>>controller accept the data at all?
>>
>>I didn't understand why we need a workaround for this. It should be standard
>>behaviour to NAK data if function driver didn't request for all endpoints.
>
> Yes, I agree with you. The controller shouldn’t accept such a packet. As far
> as I know, this behavior will be fixed in RTL.
>
> But I assume that some older versions of this controller are on the market,
> and the driver should work correctly with them.
>
> In the future this workaround can be limited to selected controllers only.
>
> Even now I assume that it can be enabled/disabled by a module parameter.

no module parameters, please. Use revision detection in runtime.

-- 
balbi



Re: [PATCH v2] perf/core: use strndup_user() instead of buggy open-coded version

2019-02-20 Thread Song Liu




> On Feb 20, 2019, at 4:20 PM, Masami Hiramatsu  wrote:
> 
> Hi Jann,
> 
> On Wed, 20 Feb 2019 17:54:43 +0100
> Jann Horn  wrote:
> 
>> The first version of this method was missing the check for
>> `ret == PATH_MAX`; then such a check was added, but it didn't call kfree()
>> on error, so there was still a small memory leak in the error case.
>> Fix it by using strndup_user() instead of open-coding it.
>> 
> 
> This looks good to me.
> 
> Reviewed-by: Masami Hiramatsu 
> 
> BTW, for stable, this is good. For the long term, I think we should
> fix strndup_user() to return -E2BIG when the user string is longer
> than max.
> 
> Thank you,
> 
>> Fixes: 0eadcc7a7bc0 ("perf/core: Fix perf_uprobe_init()")
>> Signed-off-by: Jann Horn 

Thanks for the fix!

Acked-by: Song Liu 


>> ---
>> v2:
>> - be compatible with existing error codes (Masami Hiramatsu)
>> 
>> kernel/trace/trace_event_perf.c | 16 +++-
>> 1 file changed, 7 insertions(+), 9 deletions(-)
>> 
>> diff --git a/kernel/trace/trace_event_perf.c 
>> b/kernel/trace/trace_event_perf.c
>> index 76217bbef815..4629a6104474 100644
>> --- a/kernel/trace/trace_event_perf.c
>> +++ b/kernel/trace/trace_event_perf.c
>> @@ -299,15 +299,13 @@ int perf_uprobe_init(struct perf_event *p_event,
>> 
>>  if (!p_event->attr.uprobe_path)
>>  return -EINVAL;
>> -path = kzalloc(PATH_MAX, GFP_KERNEL);
>> -if (!path)
>> -return -ENOMEM;
>> -ret = strncpy_from_user(
>> -path, u64_to_user_ptr(p_event->attr.uprobe_path), PATH_MAX);
>> -if (ret == PATH_MAX)
>> -return -E2BIG;
>> -if (ret < 0)
>> -goto out;
>> +
>> +path = strndup_user(u64_to_user_ptr(p_event->attr.uprobe_path),
>> +PATH_MAX);
>> +if (IS_ERR(path)) {
>> +ret = PTR_ERR(path);
>> +return (ret == -EINVAL) ? -E2BIG : ret;
>> +}
>>  if (path[0] == '\0') {
>>  ret = -EINVAL;
>>  goto out;
>> -- 
>> 2.21.0.rc0.258.g878e2cd30e-goog
>> 
> 
> 
> -- 
> Masami Hiramatsu
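
The shape of the fix (one helper that allocates, copies and checks the length
in a single step, so no early return can leak the buffer) can be sketched in
plain userspace C. This is only an analogue under stated assumptions:
dup_bounded() is a hypothetical helper, it copies from ordinary memory rather
than from user space, and it reports the too-long case as -E2BIG directly
instead of remapping -EINVAL as the patch does.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Duplicate src, whose length including the terminating NUL must fit
 * in max bytes. Returns the copy, or NULL with *err set to a negative
 * errno value. Nothing is allocated on the too-long path, so there is
 * nothing to free when that check fails.
 */
static char *dup_bounded(const char *src, size_t max, int *err)
{
	const char *nul = memchr(src, '\0', max);
	size_t len;
	char *copy;

	if (!nul) {		/* no NUL within the first max bytes */
		*err = -E2BIG;
		return NULL;
	}
	len = (size_t)(nul - src);

	copy = malloc(len + 1);
	if (!copy) {
		*err = -ENOMEM;
		return NULL;
	}
	memcpy(copy, src, len + 1);
	*err = 0;
	return copy;
}
```

The caller owns the returned buffer and frees it once, on every later exit
path, which matches the single "goto out" cleanup in the hunk above.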

Re: [PATCH 15/17] perf bpf-event: Add missing new line into pr_debug call

2019-02-20 Thread Song Liu




> On Feb 20, 2019, at 5:25 PM, Arnaldo Carvalho de Melo  wrote:
> 
> From: Jiri Olsa 
> 
> Add a missing new line into pr_debug call in 
> perf_event__synthesize_bpf_events(),
> so that the error message does not screw the verbose output.
> 
> Signed-off-by: Jiri Olsa 
> Cc: Alexander Shishkin 
> Cc: Andi Kleen 
> Cc: Namhyung Kim 
> Cc: Peter Zijlstra 
> Cc: Song Liu 
> Link: http://lkml.kernel.org/r/20190220122800.864-5-jo...@kernel.org
> Signed-off-by: Arnaldo Carvalho de Melo 

Acked-by: Song Liu 

Thanks for fixing this.

Song

> ---
> tools/perf/util/bpf-event.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/bpf-event.c b/tools/perf/util/bpf-event.c
> index 62dda96b0096..028c8ec1f62a 100644
> --- a/tools/perf/util/bpf-event.c
> +++ b/tools/perf/util/bpf-event.c
> @@ -233,7 +233,7 @@ int perf_event__synthesize_bpf_events(struct perf_tool 
> *tool,
>   err = 0;
>   break;
>   }
> - pr_debug("%s: can't get next program: %s%s",
> + pr_debug("%s: can't get next program: %s%s\n",
>__func__, strerror(errno),
>errno == EINVAL ? " -- kernel too old?" : "");
>   /* don't report error on old kernel or EPERM  */
> -- 
> 2.19.1
>

Re: [LKP] [driver core] 570d020012: will-it-scale.per_thread_ops -12.2% regression

2019-02-20 Thread Greg Kroah-Hartman

On Thu, Feb 21, 2019 at 11:10:49AM +0800, kernel test robot wrote:
> On Tue, Feb 19, 2019 at 01:19:04PM +0100, Greg Kroah-Hartman wrote:
> > On Tue, Feb 19, 2019 at 08:59:45AM +0800, Wei Yang wrote:
> > > On Mon, Feb 18, 2019 at 03:54:42PM +0800, kernel test robot wrote:
> > > >Greeting,
> > > >
> > > >FYI, we noticed a -12.2% regression of will-it-scale.per_thread_ops due 
> > > >to commit:
> > > >
> > > >
> > > >commit: 570d0200123fb4f809aa2f6226e93a458d664d70 ("driver core: move 
> > > >device->knode_class to device_private")
> > > >https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > > >
> > > 
> > > This is interesting.
> > > 
> > > I didn't expect the move of this field will impact the performance.
> > > 
> > > The reason is struct device is a hotter memory than 
> > > device->device_private?
> > > 
> > > >in testcase: will-it-scale
> > > >on test machine: 288 threads Knights Mill with 80G memory
> > > >with following parameters:
> > > >
> > > > nr_task: 100%
> > > > mode: thread
> > > > test: unlink2
> > > > cpufreq_governor: performance
> > > >
> > > >test-description: Will It Scale takes a testcase and runs it from 1 
> > > >through to n parallel copies to see if the testcase will scale. It 
> > > >builds both a process and threads based test in order to see any 
> > > >differences between the two.
> > > >test-url: https://github.com/antonblanchard/will-it-scale
> > > >
> > > >In addition to that, the commit also has significant impact on the 
> > > >following tests:
> > > >
> > > >+------------------+---------------------------------------------------------------+
> > > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -29.9% regression |
> > > >| test machine     | 288 threads Knights Mill with 80G memory                      |
> > > >| test parameters  | cpufreq_governor=performance                                  |
> > > >|                  | mode=thread                                                   |
> > > >|                  | nr_task=100%                                                  |
> > > >|                  | test=signal1                                                  |
> > 
> > Ok, I'm going to blame your testing system, or something here, and not
> > the above patch.
> > 
> > All this test does is call raise(3).  That does not touch the driver
> > core at all.
> > 
> > > >+------------------+---------------------------------------------------------------+
> > > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -16.5% regression |
> > > >| test machine     | 288 threads Knights Mill with 80G memory                      |
> > > >| test parameters  | cpufreq_governor=performance                                  |
> > > >|                  | mode=thread                                                   |
> > > >|                  | nr_task=100%                                                  |
> > > >|                  | test=open1                                                    |
> > > >+------------------+---------------------------------------------------------------+
> > 
> > Same here, open1 just calls open/close a lot.  No driver core
> > interaction at all there either.
> > 
> > So are you _sure_ this is the offending patch?
> 
> Hi Greg,
> 
> We did an experiment that restored the original layout of struct device,
> and found that the regression is gone. I guess the regression is not from
> the patch itself but is related to the struct layout.
> 
> 
> tests: 1
> testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-unlink2/lkp-knm01
> 
> 570d0200123fb4f8  a36dc70b810afe9183de2ea18f
> ----------------  --------------------------
>          %stddev      change         %stddev
>              \          |                \
>     237096              14%     270789        will-it-scale.workload
>        823              14%        939        will-it-scale.per_thread_ops
> 
> 
> tests: 1
> testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-signal1/lkp-knm01
> 
> 570d0200123fb4f8  a36dc70b810afe9183de2ea18f
> ----------------  --------------------------
>          %stddev      change         %stddev
>              \          |                \
>      93.51 ±  3%        48%     138.53 ±  3%  will-it-scale.time.user_time
>        186              40%        261        will-it-scale.per_thread_ops
>      53909              40%      75507        will-it-scale.workload
> 
> 
> tests: 1
> testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-open1/lkp-knm01
> 
> 570d0200123fb4f8  a36dc70b810afe9183de2ea18f
> ----------------  --------------------------
>          %stddev      change         %stddev
>              \          |                \
>     447722              22%     546258 ± 10%  will-it-scale.time.involuntary_context_switches
> 226995

Re: [PATCH] KVM: MMU: record maximum physical address width in kvm_mmu_extended_role

2019-02-20 Thread Yu Zhang

On Wed, Feb 20, 2019 at 03:06:10PM +0100, Vitaly Kuznetsov wrote:
> Yu Zhang  writes:
> 
> > Previously, commit 7dcd57552008 ("x86/kvm/mmu: check if tdp/shadow
> > MMU reconfiguration is needed") offered some optimization to avoid
> > the unnecessary reconfiguration. Yet one scenario is broken - when
> > cpuid changes VM's maximum physical address width, reconfiguration
> > is needed to reset the reserved bits.  Also, the TDP may need to
> > reset its shadow_root_level when this value is changed.
> >
> > To fix this, a new field, maxphyaddr, is introduced in the extended
> > role structure to keep track of the configured guest physical address
> > width.
> >
> > Signed-off-by: Yu Zhang 
> > ---
> > Cc: Paolo Bonzini 
> > Cc: "Radim Krčmář" 
> > Cc: Thomas Gleixner 
> > Cc: Ingo Molnar 
> > Cc: Borislav Petkov 
> > Cc: "H. Peter Anvin" 
> > Cc: linux-kernel@vger.kernel.org
> > ---
> >  arch/x86/include/asm/kvm_host.h | 1 +
> >  arch/x86/kvm/mmu.c  | 1 +
> >  2 files changed, 2 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 4660ce9..be87f71 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -299,6 +299,7 @@ struct kvm_mmu_memory_cache {
> > unsigned int cr4_smap:1;
> > unsigned int cr4_smep:1;
> > unsigned int cr4_la57:1;
> > +   unsigned int maxphyaddr:6;
> > };
> >  };
> >  
> > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> > index ce770b4..2b74505 100644
> > --- a/arch/x86/kvm/mmu.c
> > +++ b/arch/x86/kvm/mmu.c
> > @@ -4769,6 +4769,7 @@ static union kvm_mmu_extended_role kvm_calc_mmu_role_ext(struct kvm_vcpu *vcpu)
> > ext.cr4_pse = !!is_pse(vcpu);
> > ext.cr4_pke = !!kvm_read_cr4_bits(vcpu, X86_CR4_PKE);
> > ext.cr4_la57 = !!kvm_read_cr4_bits(vcpu, X86_CR4_LA57);
> > +   ext.maxphyaddr = cpuid_maxphyaddr(vcpu);
> >  
> > ext.valid = 1;
> 
> It seems that we can now drop 'valid' from role_ext as maxphyaddr can't
> be 0.

Thanks, Vitaly. Yes, we can drop this field. :)

> 
> Reviewed-by: Vitaly Kuznetsov 
> 
> -- 
> Vitaly
> 

B.R.
Yu
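
As a minimal sketch (not part of the patch), the effect of the new field can
be shown with a simplified role union. It assumes that a cached role and a
freshly computed one are compared as a single word, so any bitfield that
differs, including the guest physical address width, forces a
reconfiguration. The names demo_ext_role, demo_calc_role and
demo_needs_reconfig are hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an extended MMU role (not kernel code). */
union demo_ext_role {
	uint32_t word;			/* whole role, compared in one go */
	struct {
		unsigned int valid:1;
		unsigned int cr4_la57:1;
		unsigned int maxphyaddr:6;	/* 6 bits hold 0..63 */
	};
};

/* Build a role from the current (simulated) vCPU state. */
static union demo_ext_role demo_calc_role(bool la57, unsigned int maxphyaddr)
{
	union demo_ext_role role;

	assert(maxphyaddr < 64);	/* must fit the 6-bit field */
	role.word = 0;			/* zero unused bits before filling fields */
	role.valid = 1;
	role.cr4_la57 = la57;
	role.maxphyaddr = maxphyaddr;
	return role;
}

/* A reconfiguration is needed whenever any tracked bit changed. */
static bool demo_needs_reconfig(union demo_ext_role cached,
				union demo_ext_role fresh)
{
	return cached.word != fresh.word;
}
```

With this layout, two roles computed with widths 46 and 52 differ in their
word value, while two roles computed from identical inputs compare equal.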

Re: [PATCH] iio: cros_ec_accel_legacy: Refactor code in cros_ec_accel_legacy_probe

2019-02-20 Thread Kees Cook

On Wed, Feb 20, 2019 at 6:06 PM Gustavo A. R. Silva
 wrote:
>
> Refactor some code in order to fix both the technical implementation
> and the following warnings:
>
> drivers/iio/accel/cros_ec_accel_legacy.c: In function ‘cros_ec_accel_legacy_probe’:
> drivers/iio/accel/cros_ec_accel_legacy.c:387:36: warning: this statement may fall through [-Wimplicit-fallthrough=]
> ec_accel_channels[X].scan_index = Y;
> ^~~
> drivers/iio/accel/cros_ec_accel_legacy.c:388:3: note: here
>case Y:
>^~~~
> drivers/iio/accel/cros_ec_accel_legacy.c:389:36: warning: this statement may fall through [-Wimplicit-fallthrough=]
> ec_accel_channels[Y].scan_index = X;
> ^~~
> drivers/iio/accel/cros_ec_accel_legacy.c:390:3: note: here
>case Z:
>^~~~
>
> Notice that neither the for loop nor the switch statement is needed.
> Also, "state->sign[Y] = 1" should be unconditional.
>
> This patch is part of the ongoing efforts to enable
> -Wimplicit-fallthrough.
>
> Signed-off-by: Gustavo A. R. Silva 

Acked-by: Kees Cook 

-Kees

> ---
>  drivers/iio/accel/cros_ec_accel_legacy.c | 27 +++-
>  1 file changed, 12 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/iio/accel/cros_ec_accel_legacy.c b/drivers/iio/accel/cros_ec_accel_legacy.c
> index 063e89eff791..021f9f5cd3bb 100644
> --- a/drivers/iio/accel/cros_ec_accel_legacy.c
> +++ b/drivers/iio/accel/cros_ec_accel_legacy.c
> @@ -353,7 +353,7 @@ static int cros_ec_accel_legacy_probe(struct platform_device *pdev)
> struct cros_ec_sensor_platform *sensor_platform = dev_get_platdata(dev);
> struct iio_dev *indio_dev;
> struct cros_ec_accel_legacy_state *state;
> -   int ret, i;
> +   int ret;
>
> if (!ec || !ec->ec_dev) {
> dev_warn(>dev, "No EC device found.\n");
> @@ -381,20 +381,17 @@ static int cros_ec_accel_legacy_probe(struct platform_device *pdev)
>  * Present the channel using HTML5 standard:
>  * need to invert X and Y and invert some lid axis.
>  */
> -   for (i = X ; i < MAX_AXIS; i++) {
> -   switch (i) {
> -   case X:
> -   ec_accel_channels[X].scan_index = Y;
> -   case Y:
> -   ec_accel_channels[Y].scan_index = X;
> -   case Z:
> -   ec_accel_channels[Z].scan_index = Z;
> -   }
> -   if (state->sensor_num == MOTIONSENSE_LOC_LID && i != Y)
> -   state->sign[i] = -1;
> -   else
> -   state->sign[i] = 1;
> -   }
> +   ec_accel_channels[X].scan_index = Y;
> +   ec_accel_channels[Y].scan_index = X;
> +   ec_accel_channels[Z].scan_index = Z;
> +
> +   state->sign[Y] = 1;
> +
> +   if (state->sensor_num == MOTIONSENSE_LOC_LID)
> +   state->sign[X] = state->sign[Z] = -1;
> +   else
> +   state->sign[X] = state->sign[Z] = 1;
> +
> indio_dev->num_channels = ARRAY_SIZE(ec_accel_channels);
> indio_dev->dev.parent = >dev;
> indio_dev->info = _ec_accel_legacy_info;
> --
> 2.20.1
>


-- 
Kees Cook
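
As a minimal sketch (not part of the patch), the claim that neither the loop
nor the switch is needed can be checked outside the kernel. The code below
copies both variants into plain C and compares their results. struct
demo_state, DEMO_LOC_LID and the demo_* functions are hypothetical stand-ins;
the fall-through comments in the old variant are added here only to keep the
sketch warning-free.

```c
#include <assert.h>
#include <string.h>

enum { X, Y, Z, MAX_AXIS };

#define DEMO_LOC_LID 1	/* hypothetical stand-in for MOTIONSENSE_LOC_LID */

struct demo_state {
	int sensor_num;
	int sign[MAX_AXIS];
	int scan_index[MAX_AXIS];
};

/* Old shape: every case falls through, so all three stores run each pass. */
static void demo_setup_old(struct demo_state *s)
{
	int i;

	for (i = X; i < MAX_AXIS; i++) {
		switch (i) {
		case X:
			s->scan_index[X] = Y;
			/* fall through */
		case Y:
			s->scan_index[Y] = X;
			/* fall through */
		case Z:
			s->scan_index[Z] = Z;
		}
		if (s->sensor_num == DEMO_LOC_LID && i != Y)
			s->sign[i] = -1;
		else
			s->sign[i] = 1;
	}
}

/* New shape: straight-line stores, sign[Y] set unconditionally. */
static void demo_setup_new(struct demo_state *s)
{
	s->scan_index[X] = Y;
	s->scan_index[Y] = X;
	s->scan_index[Z] = Z;

	s->sign[Y] = 1;

	if (s->sensor_num == DEMO_LOC_LID)
		s->sign[X] = s->sign[Z] = -1;
	else
		s->sign[X] = s->sign[Z] = 1;
}

/* Returns 1 when both variants produce identical arrays. */
static int demo_same_result(int sensor_num)
{
	struct demo_state a = { .sensor_num = sensor_num };
	struct demo_state b = { .sensor_num = sensor_num };

	demo_setup_old(&a);
	demo_setup_new(&b);
	return memcmp(a.sign, b.sign, sizeof(a.sign)) == 0 &&
	       memcmp(a.scan_index, b.scan_index, sizeof(a.scan_index)) == 0;
}
```

Calling demo_same_result() for a lid sensor and for any other sensor number
exercises both branches of the sign assignment.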

[PATCH] perf record: Add support for limit perf output file size

2019-02-20 Thread Jiwei Sun

This patch adds a new option, --output-max-size, that limits the size of
the output file. A wrapper around the perf command can then use the option
to keep an unwary user from exhausting the disk space.

Signed-off-by: Jiwei Sun 
---
 tools/perf/builtin-record.c | 39 +
 1 file changed, 39 insertions(+)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 882285fb9f64..28a03929166d 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -81,6 +81,7 @@ struct record {
booltimestamp_boundary;
struct switch_outputswitch_output;
unsigned long long  samples;
+   unsigned long   output_max_size;/* = 0: unlimited */
 };
 
 static volatile int auxtrace_record__snapshot_started;
@@ -106,6 +107,12 @@ static bool switch_output_time(struct record *rec)
   trigger_is_ready(_output_trigger);
 }
 
+static bool record__output_max_size_exceeded(struct record *rec)
+{
+   return (rec->output_max_size &&
+   rec->bytes_written >= rec->output_max_size);
+}
+
 static int record__write(struct record *rec, struct perf_mmap *map __maybe_unused,
 void *bf, size_t size)
 {
@@ -118,6 +125,9 @@ static int record__write(struct record *rec, struct perf_mmap *map __maybe_unuse
 
rec->bytes_written += size;
 
+   if (record__output_max_size_exceeded(rec))
+   raise(SIGTERM);
+
if (switch_output_size(rec))
trigger_hit(_output_trigger);
 
@@ -1639,6 +1649,33 @@ static int parse_clockid(const struct option *opt, const char *str, int unset)
return -1;
 }
 
+static int parse_output_max_size(const struct option *opt, const char *str,
+int unset)
+{
+   unsigned long *s = (unsigned long *)opt->value;
+   static struct parse_tag tags_size[] = {
+   { .tag  = 'B', .mult = 1   },
+   { .tag  = 'K', .mult = 1 << 10 },
+   { .tag  = 'M', .mult = 1 << 20 },
+   { .tag  = 'G', .mult = 1 << 30 },
+   { .tag  = 0 },
+   };
+   unsigned long val;
+
+   if (unset) {
+   *s = 0;
+   return 0;
+   }
+
+   val = parse_tag_value(str, tags_size);
+   if (val != (unsigned long) -1) {
+   *s = val;
+   return 0;
+   }
+
+   return -1;
+}
+
 static int record__parse_mmap_pages(const struct option *opt,
const char *str,
int unset __maybe_unused)
@@ -1946,6 +1983,8 @@ static struct option __record_options[] = {
 _cblocks_default, "n", "Use  control blocks in 
asynchronous trace writing mode (default: 1, max: 4)",
 record__aio_parse),
 #endif
+   OPT_CALLBACK(0, "output-max-size", _max_size,
+"size", "Output file maximum size", parse_output_max_size),
OPT_END()
 };
 
-- 
2.20.1

[PATCH 0/3] RPMPD for QCS404

2019-02-20 Thread Bjorn Andersson

Rework the macros of the rpmpd driver and add the qcs404 power domains, then
add the rpmpd node to the dts.

Bjorn Andersson (3):
  soc: qcom: rpmpd: Modify corner defining macros
  soc: qcom: rpmpd: Add QCS404 corners
  arm64: dts: qcom: qcs404: Add rpmpd node

 .../devicetree/bindings/power/qcom,rpmpd.txt  |  1 +
 arch/arm64/boot/dts/qcom/qcs404.dtsi  | 35 +++
 drivers/soc/qcom/rpmpd.c  | 63 +--
 include/dt-bindings/power/qcom-rpmpd.h|  9 +++
 4 files changed, 88 insertions(+), 20 deletions(-)

-- 
2.18.0

[PATCH 2/3] soc: qcom: rpmpd: Add QCS404 corners

2019-02-20 Thread Bjorn Andersson

Add the shared cx/mx and the low-power-island's cx and mx power-domains
found on QCS404.

Signed-off-by: Bjorn Andersson 
---
 .../devicetree/bindings/power/qcom,rpmpd.txt  |  1 +
 drivers/soc/qcom/rpmpd.c  | 29 +++
 include/dt-bindings/power/qcom-rpmpd.h|  9 ++
 3 files changed, 39 insertions(+)

diff --git a/Documentation/devicetree/bindings/power/qcom,rpmpd.txt b/Documentation/devicetree/bindings/power/qcom,rpmpd.txt
index 980e5413d18f..b6c596883ea4 100644
--- a/Documentation/devicetree/bindings/power/qcom,rpmpd.txt
+++ b/Documentation/devicetree/bindings/power/qcom,rpmpd.txt
@@ -6,6 +6,7 @@ which then translates it into a corresponding voltage on a rail
 Required Properties:
  - compatible: Should be one of the following
* qcom,msm8996-rpmpd: RPM Power domain for the msm8996 family of SoC
+   * qcom,qcs404-rpmpd: RPM Power domain for the qcs404
* qcom,sdm845-rpmhpd: RPMh Power domain for the sdm845 family of SoC
  - #power-domain-cells: number of cells in Power domain specifier
must be 1.
diff --git a/drivers/soc/qcom/rpmpd.c b/drivers/soc/qcom/rpmpd.c
index 74b2f001b9c6..0001fcafaf97 100644
--- a/drivers/soc/qcom/rpmpd.c
+++ b/drivers/soc/qcom/rpmpd.c
@@ -19,6 +19,9 @@
 /* Resource types */
 #define RPMPD_SMPA 0x61706d73 /* smpa */
 #define RPMPD_LDOA 0x616f646c /* ldoa */
+#define RPMPD_RWMX 0x786d7772 /* rwmx */
+#define RPMPD_RWLC 0x636c7772 /* rwlc */
+#define RPMPD_RWLM 0x6d6c7772 /* rwlm */
 
 /* Operation Keys */
 #define KEY_CORNER 0x6e726f63 /* corn */
@@ -110,8 +113,34 @@ static const struct rpmpd_desc msm8996_desc = {
.num_pds = ARRAY_SIZE(msm8996_rpmpds),
 };
 
+/* qcs404 RPM Power domains */
+DEFINE_RPMPD_CORNER_PAIR(qcs404, vddmx, vddmx_ao, RWMX, 0);
+DEFINE_RPMPD_VFC(qcs404, vddmx_vfc, RWMX, 0);
+
+DEFINE_RPMPD_CORNER(qcs404, vdd_lpicx, RWLC, 0);
+DEFINE_RPMPD_VFC(qcs404, vdd_lpicx_vfc, RWLC, 0);
+
+DEFINE_RPMPD_CORNER(qcs404, vdd_lpimx, RWLM, 0);
+DEFINE_RPMPD_VFC(qcs404, vdd_lpimx_vfc, RWLM, 0);
+
+static struct rpmpd *qcs404_rpmpds[] = {
+   [QCS404_VDDMX] = _vddmx,
+   [QCS404_VDDMX_AO] = _vddmx_ao,
+   [QCS404_VDDMX_VFC] = _vddmx_vfc,
+   [QCS404_LPICX] = _vdd_lpicx,
+   [QCS404_LPICX_VFC] = _vdd_lpicx_vfc,
+   [QCS404_LPIMX] = _vdd_lpimx,
+   [QCS404_LPIMX_VFC] = _vdd_lpimx_vfc,
+};
+
+static const struct rpmpd_desc qcs404_desc = {
+   .rpmpds = qcs404_rpmpds,
+   .num_pds = ARRAY_SIZE(qcs404_rpmpds),
+};
+
 static const struct of_device_id rpmpd_match_table[] = {
{ .compatible = "qcom,msm8996-rpmpd", .data = _desc },
+   { .compatible = "qcom,qcs404-rpmpd", .data = _desc },
{ }
 };
 
diff --git a/include/dt-bindings/power/qcom-rpmpd.h b/include/dt-bindings/power/qcom-rpmpd.h
index 87d9c6611682..0b1b147292a3 100644
--- a/include/dt-bindings/power/qcom-rpmpd.h
+++ b/include/dt-bindings/power/qcom-rpmpd.h
@@ -36,4 +36,13 @@
 #define MSM8996_VDDSSCX5
 #define MSM8996_VDDSSCX_VFC6
 
+/* QCS404 Power Domains */
+#define QCS404_VDDMX   0
+#define QCS404_VDDMX_AO1
+#define QCS404_VDDMX_VFC   2
+#define QCS404_LPICX   3
+#define QCS404_LPICX_VFC   4
+#define QCS404_LPIMX   5
+#define QCS404_LPIMX_VFC   6
+
 #endif
-- 
2.18.0

[PATCH 1/3] soc: qcom: rpmpd: Modify corner defining macros

2019-02-20 Thread Bjorn Andersson

QCS404 uses individual resource type magic for each power-domain, so
adjust the macros slightly to make them reusable for this.

Signed-off-by: Bjorn Andersson 
---
 drivers/soc/qcom/rpmpd.c | 34 ++
 1 file changed, 14 insertions(+), 20 deletions(-)

diff --git a/drivers/soc/qcom/rpmpd.c b/drivers/soc/qcom/rpmpd.c
index 005326050c23..74b2f001b9c6 100644
--- a/drivers/soc/qcom/rpmpd.c
+++ b/drivers/soc/qcom/rpmpd.c
@@ -17,8 +17,8 @@
 #define domain_to_rpmpd(domain) container_of(domain, struct rpmpd, pd)
 
 /* Resource types */
-#define RPMPD_SMPA 0x61706d73
-#define RPMPD_LDOA 0x616f646c
+#define RPMPD_SMPA 0x61706d73 /* smpa */
+#define RPMPD_LDOA 0x616f646c /* ldoa */
 
 /* Operation Keys */
 #define KEY_CORNER 0x6e726f63 /* corn */
@@ -27,12 +27,12 @@
 
 #define MAX_RPMPD_STATE6
 
-#define DEFINE_RPMPD_CORNER_SMPA(_platform, _name, _active, r_id)  \
+#define DEFINE_RPMPD_CORNER_PAIR(_platform, _name, _active, r_type, r_id) \
static struct rpmpd _platform##_##_active;  \
static struct rpmpd _platform##_##_name = { \
.pd = { .name = #_name, },  \
.peer = &_platform##_##_active, \
-   .res_type = RPMPD_SMPA, \
+   .res_type = RPMPD_##r_type, \
.res_id = r_id, \
.key = KEY_CORNER,  \
};  \
@@ -40,33 +40,27 @@
.pd = { .name = #_active, },\
.peer = &_platform##_##_name,   \
.active_only = true,\
-   .res_type = RPMPD_SMPA, \
+   .res_type = RPMPD_##r_type, \
.res_id = r_id, \
.key = KEY_CORNER,  \
}
 
-#define DEFINE_RPMPD_CORNER_LDOA(_platform, _name, r_id)   \
+#define DEFINE_RPMPD_CORNER(_platform, _name, r_type, r_id)\
static struct rpmpd _platform##_##_name = { \
.pd = { .name = #_name, },  \
-   .res_type = RPMPD_LDOA, \
+   .res_type = RPMPD_##r_type, \
.res_id = r_id, \
.key = KEY_CORNER,  \
}
 
-#define DEFINE_RPMPD_VFC(_platform, _name, r_id, r_type)   \
+#define DEFINE_RPMPD_VFC(_platform, _name, r_type, r_id)   \
static struct rpmpd _platform##_##_name = { \
.pd = { .name = #_name, },  \
-   .res_type = r_type, \
+   .res_type = RPMPD_##r_type, \
.res_id = r_id, \
.key = KEY_FLOOR_CORNER,\
}
 
-#define DEFINE_RPMPD_VFC_SMPA(_platform, _name, r_id)  \
-   DEFINE_RPMPD_VFC(_platform, _name, r_id, RPMPD_SMPA)
-
-#define DEFINE_RPMPD_VFC_LDOA(_platform, _name, r_id)  \
-   DEFINE_RPMPD_VFC(_platform, _name, r_id, RPMPD_LDOA)
-
 struct rpmpd_req {
__le32 key;
__le32 nbytes;
@@ -94,12 +88,12 @@ struct rpmpd_desc {
 static DEFINE_MUTEX(rpmpd_lock);
 
 /* msm8996 RPM Power domains */
-DEFINE_RPMPD_CORNER_SMPA(msm8996, vddcx, vddcx_ao, 1);
-DEFINE_RPMPD_CORNER_SMPA(msm8996, vddmx, vddmx_ao, 2);
-DEFINE_RPMPD_CORNER_LDOA(msm8996, vddsscx, 26);
+DEFINE_RPMPD_CORNER_PAIR(msm8996, vddcx, vddcx_ao, SMPA, 1);
+DEFINE_RPMPD_CORNER_PAIR(msm8996, vddmx, vddmx_ao, SMPA, 2);
+DEFINE_RPMPD_CORNER(msm8996, vddsscx, LDOA, 26);
 
-DEFINE_RPMPD_VFC_SMPA(msm8996, vddcx_vfc, 1);
-DEFINE_RPMPD_VFC_LDOA(msm8996, vddsscx_vfc, 26);
+DEFINE_RPMPD_VFC(msm8996, vddcx_vfc, SMPA, 1);
+DEFINE_RPMPD_VFC(msm8996, vddsscx_vfc, LDOA, 26);
 
 static struct rpmpd *msm8996_rpmpds[] = {
[MSM8996_VDDCX] =   _vddcx,
-- 
2.18.0
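
As a minimal sketch (not part of the series), the comments added next to the
resource-type constants can be verified by decoding each 32-bit magic as four
ASCII bytes, least significant byte first. demo_magic_to_tag() and
demo_tag_to_magic() are hypothetical helpers written for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Decode a 32-bit magic into its 4-character tag (LSB is the first char). */
static void demo_magic_to_tag(uint32_t magic, char tag[5])
{
	int i;

	for (i = 0; i < 4; i++)
		tag[i] = (char)((magic >> (8 * i)) & 0xff);
	tag[4] = '\0';
}

/* Encode a 4-character tag into its magic; returns 0 for a bad length. */
static uint32_t demo_tag_to_magic(const char *tag)
{
	uint32_t magic = 0;
	int i;

	if (strlen(tag) != 4)
		return 0;

	for (i = 0; i < 4; i++)
		magic |= (uint32_t)(unsigned char)tag[i] << (8 * i);
	return magic;
}
```

Decoding 0x61706d73 this way yields "smpa", and encoding "ldoa" yields
0x616f646c, matching the values in the diff.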

[PATCH 3/3] arm64: dts: qcom: qcs404: Add rpmpd node

2019-02-20 Thread Bjorn Andersson

Add the rpmpd node on the qcs404 and define the available levels.

Signed-off-by: Bjorn Andersson 
---
 arch/arm64/boot/dts/qcom/qcs404.dtsi | 35 
 1 file changed, 35 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/qcs404.dtsi b/arch/arm64/boot/dts/qcom/qcs404.dtsi
index 6c86b267da82..1ceb0432af55 100644
--- a/arch/arm64/boot/dts/qcom/qcs404.dtsi
+++ b/arch/arm64/boot/dts/qcom/qcs404.dtsi
@@ -4,6 +4,7 @@
 #include 
 #include 
 #include 
+#include 
 
 / {
interrupt-parent = <>;
@@ -240,6 +241,40 @@
compatible = "qcom,rpmcc-qcs404";
#clock-cells = <1>;
};
+
+   rpmpd: power-controller {
+   compatible = "qcom,qcs404-rpmpd";
+   #power-domain-cells = <1>;
+   operating-points-v2 = <_opp_table>;
+
+   rpmpd_opp_table: opp-table {
+   compatible = "operating-points-v2";
+
+   rpmpd_opp1: opp1 {
+   opp-level = <1>;
+   };
+
+   rpmpd_opp2: opp2 {
+   opp-level = <2>;
+   };
+
+   rpmpd_opp3: opp3 {
+   opp-level = <3>;
+   };
+
+   rpmpd_opp4: opp4 {
+   opp-level = <4>;
+   };
+
+   rpmpd_opp5: opp5 {
+   opp-level = <5>;
+   };
+
+   rpmpd_opp6: opp6 {
+   opp-level = <6>;
+   };
+   };
+   };
};
};
 
-- 
2.18.0

[PATCH V8 4/4] arm64: dts: imx: add i.MX8QXP thermal support

2019-02-20 Thread Anson Huang

Add i.MX8QXP CPU thermal zone support.

Signed-off-by: Anson Huang 
---
Changes since V7:
- move the "imx,sensor-resource-id" to scu tsens node;
- correct #thermal-sensor-cells value to be 0 as there is ONLY one
  thermal zone now;
- add cooling map for passive mode.
---
 arch/arm64/boot/dts/freescale/imx8qxp.dtsi | 35 ++
 1 file changed, 35 insertions(+)

diff --git a/arch/arm64/boot/dts/freescale/imx8qxp.dtsi b/arch/arm64/boot/dts/freescale/imx8qxp.dtsi
index 4c3dd95..eccdf28 100644
--- a/arch/arm64/boot/dts/freescale/imx8qxp.dtsi
+++ b/arch/arm64/boot/dts/freescale/imx8qxp.dtsi
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 / {
interrupt-parent = <>;
@@ -34,6 +35,7 @@
reg = <0x0 0x0>;
enable-method = "psci";
next-level-cache = <_L2>;
+   #cooling-cells = <2>;
};
 
A35_1: cpu@1 {
@@ -116,6 +118,12 @@
rtc: rtc {
compatible = "fsl,imx8qxp-sc-rtc";
};
+
+   tsens: thermal-sensor {
+   compatible = "fsl,imx8qxp-sc-thermal", 
"fsl,imx-sc-thermal";
+   #thermal-sensor-cells = <0>;
+   imx,sensor-resource-id = ;
+   };
};
 
timer {
@@ -443,4 +451,31 @@
power-domains = < IMX_SC_R_GPIO_7>;
};
};
+
+   thermal_zones: thermal-zones {
+   cpu-thermal0 {
+   polling-delay-passive = <250>;
+   polling-delay = <2000>;
+   thermal-sensors = < 0>;
+   trips {
+   cpu_alert0: trip0 {
+   temperature = <107000>;
+   hysteresis = <2000>;
+   type = "passive";
+   };
+   cpu_crit0: trip1 {
+   temperature = <127000>;
+   hysteresis = <2000>;
+   type = "critical";
+   };
+   };
+   cooling-maps {
+   map0 {
+   trip = <_alert0>;
+   cooling-device =
+   <_0 THERMAL_NO_LIMIT 
THERMAL_NO_LIMIT>;
+   };
+   };
+   };
+   };
 };
-- 
2.7.4

[PATCH V8 3/4] defconfig: arm64: add i.MX system controller thermal support

2019-02-20 Thread Anson Huang

This patch enables CONFIG_IMX_SC_THERMAL as module.

Signed-off-by: Anson Huang 
---
No change.
---
 arch/arm64/configs/defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 2d9c390..52d503e 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -413,6 +413,7 @@ CONFIG_SENSORS_INA2XX=m
 CONFIG_THERMAL_GOV_POWER_ALLOCATOR=y
 CONFIG_CPU_THERMAL=y
 CONFIG_THERMAL_EMULATION=y
+CONFIG_IMX_SC_THERMAL=m
 CONFIG_ROCKCHIP_THERMAL=m
 CONFIG_RCAR_THERMAL=y
 CONFIG_RCAR_GEN3_THERMAL=y
-- 
2.7.4

[PATCH V8 2/4] thermal: imx_sc: add i.MX system controller thermal support

2019-02-20 Thread Anson Huang

i.MX8QXP is an ARMv8 SoC with a Cortex-M4 system controller inside. The
system controller is in charge of controlling power, clocks, thermal
sensors, etc.

This patch adds i.MX system controller thermal driver support. The Linux
kernel has to communicate with the system controller via MU (message unit)
IPC to get each thermal sensor's temperature. The driver supports multiple
sensors, which are passed from the device tree; please see the binding doc
for details.

Signed-off-by: Anson Huang 
---
Changes since V7:
- remove unused structure imx_sc_thermal_data to simplify the driver;
- move the "imx,sensor-resource-id" property from thermal zone node to
  scu tsens node, and get this property using phandle;
- remove the unused sensor number read from dts, as it is no longer needed.
---
 drivers/thermal/Kconfig  |  11 +++
 drivers/thermal/Makefile |   1 +
 drivers/thermal/imx_sc_thermal.c | 144 +++
 3 files changed, 156 insertions(+)
 create mode 100644 drivers/thermal/imx_sc_thermal.c

diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
index 58bb7d7..fec0ef5 100644
--- a/drivers/thermal/Kconfig
+++ b/drivers/thermal/Kconfig
@@ -223,6 +223,17 @@ config IMX_THERMAL
  cpufreq is used as the cooling device to throttle CPUs when the
  passive trip is crossed.
 
+config IMX_SC_THERMAL
+   tristate "Temperature sensor driver for NXP i.MX SoCs with System 
Controller"
+   depends on (ARCH_MXC && IMX_SCU) || COMPILE_TEST
+   depends on OF
+   help
+ Support for Temperature Monitor (TEMPMON) found on NXP i.MX SoCs with
+ system controller inside, Linux kernel has to communicate with system
+ controller via MU (message unit) IPC to get temperature from thermal
+ sensor. It supports one critical trip point and one
+ passive trip point for each thermal sensor.
+
 config MAX77620_THERMAL
tristate "Temperature sensor driver for Maxim MAX77620 PMIC"
depends on MFD_MAX77620
diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile
index 486d682..4062627 100644
--- a/drivers/thermal/Makefile
+++ b/drivers/thermal/Makefile
@@ -40,6 +40,7 @@ obj-$(CONFIG_DB8500_THERMAL)  += db8500_thermal.o
 obj-$(CONFIG_ARMADA_THERMAL)   += armada_thermal.o
 obj-$(CONFIG_TANGO_THERMAL)+= tango_thermal.o
 obj-$(CONFIG_IMX_THERMAL)  += imx_thermal.o
+obj-$(CONFIG_IMX_SC_THERMAL)   += imx_sc_thermal.o
 obj-$(CONFIG_MAX77620_THERMAL) += max77620_thermal.o
 obj-$(CONFIG_QORIQ_THERMAL)+= qoriq_thermal.o
 obj-$(CONFIG_DA9062_THERMAL)   += da9062-thermal.o
diff --git a/drivers/thermal/imx_sc_thermal.c b/drivers/thermal/imx_sc_thermal.c
new file mode 100644
index 000..145e73b
--- /dev/null
+++ b/drivers/thermal/imx_sc_thermal.c
@@ -0,0 +1,144 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Copyright 2018-2019 NXP.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "thermal_core.h"
+
+#define IMX_SC_MISC_FUNC_GET_TEMP  13
+#define IMX_SC_C_TEMP  0
+
+static struct imx_sc_ipc *thermal_ipc_handle;
+
+struct imx_sc_sensor {
+   struct thermal_zone_device *tzd;
+   u32 resource_id;
+};
+
+struct req_get_temp {
+   u16 resource_id;
+   u8 type;
+} __packed;
+
+struct resp_get_temp {
+   u16 celsius;
+   u8 tenths;
+} __packed;
+
+struct imx_sc_msg_misc_get_temp {
+   struct imx_sc_rpc_msg hdr;
+   union {
+   struct req_get_temp req;
+   struct resp_get_temp resp;
+   } data;
+} __packed;
+
+static int imx_sc_thermal_get_temp(void *data, int *temp)
+{
+   struct imx_sc_msg_misc_get_temp msg;
+   struct imx_sc_rpc_msg *hdr = 
+   struct imx_sc_sensor *sensor = data;
+   int ret;
+
+   msg.data.req.resource_id = sensor->resource_id;
+   msg.data.req.type = IMX_SC_C_TEMP;
+
+   hdr->ver = IMX_SC_RPC_VERSION;
+   hdr->svc = IMX_SC_RPC_SVC_MISC;
+   hdr->func = IMX_SC_MISC_FUNC_GET_TEMP;
+   hdr->size = 2;
+
+   ret = imx_scu_call_rpc(thermal_ipc_handle, , true);
+   if (ret) {
+   pr_err("read temp sensor %d failed, ret %d\n",
+   sensor->resource_id, ret);
+   return ret;
+   }
+
+   *temp = msg.data.resp.celsius * 1000 + msg.data.resp.tenths * 100;
+
+   return 0;
+}
+
+static const struct thermal_zone_of_device_ops imx_sc_thermal_ops = {
+   .get_temp = imx_sc_thermal_get_temp,
+};
+
+static int imx_sc_thermal_probe(struct platform_device *pdev)
+{
+   struct device_node *np, *sensor_np;
+   int ret, i = 0;
+
+   ret = imx_scu_get_handle(_ipc_handle);
+   if (ret)
+   return ret;
+
+   np = of_find_node_by_name(NULL, "thermal-zones");
+   if (!np)
+   return -ENODEV;
+
+   for_each_available_child_of_node(np, sensor_np) {
+   struct of_phandle_args tsens_args;
+

[PATCH V8 1/4] dt-bindings: fsl: scu: add thermal binding

2019-02-20 Thread Anson Huang

NXP i.MX8QXP is an ARMv8 SoC with a Cortex-M4 core inside as the system
controller. The system controller is in charge of managing system power,
clocks, thermal sensors, etc. The Linux kernel has to communicate with
the system controller via MU (message unit) IPC to get temperatures from
the thermal sensors. This patch adds the binding doc for the i.MX system
controller thermal driver.

Signed-off-by: Anson Huang 
Reviewed-by: Rob Herring 
---
Changes since V7:
- remove unused property "tsens-num";
- improve the compatible description;
- update examples according to latest dts file.
---
 .../devicetree/bindings/arm/freescale/fsl,scu.txt | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/Documentation/devicetree/bindings/arm/freescale/fsl,scu.txt b/Documentation/devicetree/bindings/arm/freescale/fsl,scu.txt
index 72d481c..d89147e 100644
--- a/Documentation/devicetree/bindings/arm/freescale/fsl,scu.txt
+++ b/Documentation/devicetree/bindings/arm/freescale/fsl,scu.txt
@@ -122,6 +122,19 @@ RTC bindings based on SCU Message Protocol
 Required properties:
 - compatible: should be "fsl,imx8qxp-sc-rtc";
 
+Thermal bindings based on SCU Message Protocol
+
+
+Required properties:
+- compatible:  Should be :
+ "fsl,imx8qxp-sc-thermal"
+   followed by "fsl,imx-sc-thermal";
+
+- #thermal-sensor-cells:   See 
Documentation/devicetree/bindings/thermal/thermal.txt
+   for a description.
+
+- imx,sensor-resource-id:  Property array to specify each thermal zone's 
sensor resource ID.
+
 Example (imx8qxp):
 -
 lsio_mu1: mailbox@5d1c {
@@ -168,6 +181,12 @@ firmware {
rtc: rtc {
compatible = "fsl,imx8qxp-sc-rtc";
};
+
+   tsens: thermal-sensor {
+   compatible = "fsl,imx8qxp-sc-thermal", 
"fsl,imx-sc-thermal";
+   #thermal-sensor-cells = <0>;
+   imx,sensor-resource-id = ;
+   };
};
 };
 
-- 
2.7.4

Re: [PATCH v2 1/3] x86/cpufeatures: Enumerate user wait instructions

2019-02-20 Thread Andy Lutomirski

On Wed, Feb 20, 2019 at 7:44 PM Tao Xu  wrote:
>
> From: Fenghua Yu 

>
> From patchwork Wed Jan 16 21:18:41 2019
> Content-Type: text/plain; charset="utf-8"

[snipped more stuff like this]

What happened here?

> +/* Return value that will be used to set umwait control MSR */
> +static inline u32 umwait_control_val(void)
> +{
> +   /*
> +* Enable or disable C0.2 (bit 0) based on global setting on all CPUs.
> +* When bit 0 is 1, C0.2 is disabled. Otherwise, C0.2 is enabled.
> +* So value in bit 0 is opposite of umwait_enable_c0_2.
> +*/
> +   return ~umwait_enable_c0_2 & UMWAIT_CONTROL_C02_MASK;
> +}

This function is horribly named.  How about something like
umwait_compute_msr_value() or something like that?  Also, what
happened to the maximum wait time?

> +
> +static ssize_t umwait_enable_c0_2_show(struct device *dev,
> +  struct device_attribute *attr,
> +  char *buf)
> +{
> +   return sprintf(buf, "%d\n", umwait_enable_c0_2);

I realize that it's traditional to totally ignore races in sysfs and
such, but it's a bad tradition.  Please either READ_ONCE it with a
comment or take the mutex.

> +static ssize_t umwait_enable_c0_2_store(struct device *dev,
> +   struct device_attribute *attr,
> +   const char *buf, size_t count)
> +{
> +   int enable_c0_2, cpu, ret;
> +   u32 msr_val;
> +
> +   ret = kstrtou32(buf, 10, _c0_2);
> +   if (ret)
> +   return ret;
> +
> +   if (enable_c0_2 != 1 && enable_c0_2 != 0)
> +   return -EINVAL;

How about if (enable_c0_2 > 1)?

> +
> +   mutex_lock(_lock);
> +
> +   umwait_enable_c0_2 = enable_c0_2;
> +   msr_val = umwait_control_val();
> +   get_online_cpus();
> +   /* All CPUs have same umwait control setting */
> +   for_each_online_cpu(cpu)
> +   wrmsr_on_cpu(cpu, MSR_IA32_UMWAIT_CONTROL, msr_val, 0);
> +   put_online_cpus();
> +
> +   mutex_unlock(_lock);

Please factor this thing out into a helper like
umwait_update_all_cpus().  That helper can assert that the lock is
held.

> +/* Set up umwait control MSR on this CPU using the current global setting. */
> +static int umwait_cpu_online(unsigned int cpu)
> +{
> +   u32 msr_val;
> +
> +   mutex_lock(_lock);
> +
> +   msr_val = umwait_control_val();
> +   wrmsr(MSR_IA32_UMWAIT_CONTROL, msr_val, 0);
> +
> +   mutex_unlock(_lock);
> +
> +   return 0;
> +}
> +
> +static int __init umwait_init(void)
> +{
> +   struct device *dev;
> +   int ret;
> +
> +   if (!boot_cpu_has(X86_FEATURE_WAITPKG))
> +   return -ENODEV;
> +
> +   /* Add CPU global user wait interface to control umwait. */
> +   dev = cpu_subsys.dev_root;
> +   ret = sysfs_create_group(>kobj, _attr_group);
> +   if (ret)
> +   return ret;
> +
> +   ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "umwait/intel:online",
> +   umwait_cpu_online, NULL);

This hotplug notifier thing is awful.  Thomas, do we have a function
that gets called every time a CPU is brought up (via BSP boot, AP
boot, hotplug, hibernation resume, etc) where we can just put all
these things?  cpu_init() is almost appropriate, except that it's
called at somewhat erratic times (quite different for BSP and AP IIRC)
and it's not called AFAICT during hibernation restore.  I suppose we
could add a new thing that is called by cpu_init() and
restore_processor_state().

Also, surely you should actually write the MSR in this function, too.

>
>  static int umwait_enable_c0_2 = 1; /* 0: disable C0.2. 1: enable C0.2. */
> +static u32 umwait_max_time; /* In TSC-quanta. Only bits [31:2] are used. */

I still think the default should be some reasonable nonzero value.  It
should be long enough that we get decent C0.2 residency and short
enough that UMWAIT never gives the impression that it is anything
other than a fancy way to save a bit of power and SMT resources when
spinning.  I don't want to see a situation where some library uses
UMWAIT under the expectation that it genuinely waits for an event,
appears to work well on bare metal on an otherwise idle system, and
falls apart when it's run in a VM guest or with other software
running.  IOW, programs more or less must be written to expect many
spurious wakeups, so I think we should pick a default value so that
there are essentially always many spurious wakeups.

As a guess, I think that the default wait time should be well under 1
ms but at least 20x the C0.2 entry+exit latency.
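
To make the sizing concrete, here is a rough userspace sketch (not code from the patch) of turning a wait time into a control value. It assumes the layout from the quoted comment (only bits [31:2] carry the maximum time, in TSC quanta) and additionally assumes that bit 0, when set, disables C0.2. The helper names and the 1 GHz TSC used in the note below are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the umwait control value (see the text above). */
#define UMWAIT_C02_DISABLE 0x1u         /* bit 0: 1 = C0.2 disabled (assumption) */
#define UMWAIT_TIME_MASK   0xfffffffcu  /* bits [31:2]: max wait time in TSC quanta */

/* Hypothetical helper: convert microseconds to TSC cycles.
 * The result is truncated to 32 bits, which is fine for sub-ms values. */
static uint32_t us_to_tsc(uint32_t us, uint32_t tsc_khz)
{
	return (uint32_t)(((uint64_t)us * tsc_khz) / 1000);
}

/* Hypothetical helper: build the control value from the two settings. */
static uint32_t umwait_ctrl_val(uint32_t max_time_tsc, int enable_c0_2)
{
	uint32_t val = max_time_tsc & UMWAIT_TIME_MASK; /* low two bits are not time */

	if (!enable_c0_2)
		val |= UMWAIT_C02_DISABLE;
	return val;
}
```

With an assumed 1 GHz TSC, us_to_tsc(100, 1000000) gives 100000 cycles, which is well under 1 ms.  Whether that is at least 20x the C0.2 entry+exit latency depends on the hardware and is not something this sketch can answer.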

--Andy
>  static ssize_t umwait_enable_c0_2_show(struct device *dev,
> @@ -61,8 +63,46 @@ static ssize_t umwait_enable_c0_2_store(struct device *dev,
>
>  static DEVICE_ATTR_RW(umwait_enable_c0_2);
>
> +static ssize_t umwait_max_time_show(struct device *kobj,
> +   struct

[PATCH] ipmi_si: fix oops when loading ipmi_si driver

2019-02-20 Thread Yang Yingliang

When we execute the following commands, we get an oops:
modprobe ipmi_si ports=0xffc0e3 type=bt

[  503.305487] ipmi_si: IPMI System Interface driver
[  503.305489] ipmi_hardcode: probing via hardcoded address
[  503.305491] ipmi_si: Adding hardcoded-specified bt state machine
[  503.305494] ipmi_si: Trying hardcoded-specified bt state machine at i/o 
address 0xffc0e3, slave address 0x0, irq 0
[  503.337865] ipmi_si ipmi_si.0: bt cap response too short: 3
[  503.337867] ipmi_si ipmi_si.0: using default values
[  503.337869] ipmi_si ipmi_si.0: req2rsp=5 secs retries=2
[  503.421948] ipmi_si ipmi_si.0: IPMI message handler: Found new BMC (man_id: 
0x0007db, prod_id: 0x0001, dev_id: 0x01)
[  503.665959] ipmi_si ipmi_si.0: IPMI bt interface initialized
[  512.188061] ipmi_si: module verification failed: signature and/or required 
key missing - tainting kernel
[  512.190063] ipmi_si: IPMI System Interface driver
[  512.190065] ipmi_hardcode: probing via hardcoded address
[  512.190067] ipmi_si: Adding hardcoded-specified bt state machine
[  512.190070] ipmi_si: Trying hardcoded-specified bt state machine at i/o 
address 0xffc0e3, slave address 0x0, irq 0
[  512.201867] ipmi_si ipmi_si.0: bt cap response too short: 3
[  512.201869] ipmi_si ipmi_si.0: using default values
[  512.201870] ipmi_si ipmi_si.0: req2rsp=5 secs retries=2
[  512.269898] Unable to handle kernel NULL pointer dereference at virtual 
address 0030
[  512.269899] Mem abort info:
[  512.269900]   ESR = 0x9606
[  512.269902]   Exception class = DABT (current EL), IL = 32 bits
[  512.269903]   SET = 0, FnV = 0
[  512.269904]   EA = 0, S1PTW = 0
[  512.269905] Data abort info:
[  512.269906]   ISV = 0, ISS = 0x0006
[  512.269908]   CM = 0, WnR = 0
[  512.269910] user pgtable: 4k pages, 48-bit VAs, pgdp = 3f829971
[  512.269912] [0030] pgd=005f1e7de003, pud=005f69728003, 
pmd=
[  512.269916] Internal error: Oops: 9606 [#1] SMP
[  512.274923] Modules linked in: ipmi_si(E+) nls_utf8 isofs dm_mirror 
dm_region_hash dm_log dm_mod aes_ce_blk crypto_simd cryptd aes_ce_cipher 
ghash_ce sha2_ce sha256_arm64 sha1_ce ses hibmc_drm enclosure hisi_sas_v2_hw 
hisi_sas_main sg sbsa_gwdt ip_tables marvell ixgbe hns_dsaf hns_enet_drv 
ipmi_devintf mpt3sas ipmi_msghandler hns_mdio hnae mdio [last unloaded: ipmi_si]
[  512.308100] CPU: 27 PID: 13691 Comm: modprobe Kdump: loaded Tainted: G   
 E 5.0.0-rc7+ #273
[  512.317456] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.37 
11/21/2017
[  512.324781] pstate: a005 (NzCv daif -PAN -UAO)
[  512.329643] pc : sysfs_do_create_link_sd.isra.0+0x5c/0x110
[  512.335204] lr : sysfs_do_create_link_sd.isra.0+0x50/0x110
[  512.340763] sp : 12a8b8b0
[  512.344118] x29: 12a8b8b0 x28: 805f69bf8818
[  512.349505] x27:  x26: 
[  512.354891] x25: 1184 x24: 805f69248ee0
[  512.360277] x23: 0001 x22: 11b0d000
[  512.365662] x21: 1116e3e0 x20: 805f76db7cb0
[  512.371047] x19: 0030 x18: 
[  512.376433] x17:  x16: 
[  512.381818] x15: 1176d708 x14: 5244006d726f6674
[  512.387204] x13: 0040 x12: 0008
[  512.392589] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
[  512.397974] x9 : 716e6573606aff2f x8 : 7f7f7f7f7f7f7f7f
[  512.403360] x7 : 2d68725e686c6f68 x6 : 0080
[  512.408745] x5 :  x4 : 805f08e2d440
[  512.414131] x3 : 11b0df60 x2 : 0001
[  512.419516] x1 :  x0 : 
[  512.424902] Process insmod (pid: 13691, stack limit = 0x07e6d9d7)
[  512.431784] Call trace:
[  512.434260]  sysfs_do_create_link_sd.isra.0+0x5c/0x110
[  512.439468]  sysfs_create_link+0x40/0x68
[  512.443446]  driver_sysfs_add+0x88/0xb8
[  512.447332]  device_bind_driver+0x20/0x70
[  512.451392]  __device_attach+0xa0/0x170
[  512.459409]  device_initial_probe+0x24/0x30
[  512.467809]  bus_probe_device+0xa0/0xa8
[  512.475844]  device_add+0x494/0x620
[  512.483441]  platform_device_add+0x118/0x2a0
[  512.491638]  try_smi_init+0x6c0/0x11c8 [ipmi_si]
[  512.500161]  init_ipmi_si+0x13c/0x1f0 [ipmi_si]
[  512.508342]  do_one_initcall+0x54/0x1f0
[  512.515668]  do_init_module+0x64/0x1e4
[  512.522810]  load_module+0x13fc/0x14f8
[  512.529816]  __se_sys_finit_module+0xa0/0x100
[  512.537398]  __arm64_sys_finit_module+0x24/0x30
[  512.545045]  el0_svc_common+0x120/0x148
[  512.552029]  el0_svc_handler+0x38/0x78
[  512.558918]  el0_svc+0x8/0xc
[  512.564960] Code: 942513e5 d503201f d503201f 35000400 (f9400273)
[  512.574273] ---[ end trace 2a5abdfb7a1edf93 ]---
[  512.582123] Kernel panic - not syncing: Fatal exception
[  512.590600] SMP: stopping secondary CPUs
[  512.597934] Kernel Offset: disabled
[  512.604635] CPU features: 0x002,21006008
[  512.611889] Memory Limit: none
[  512.621495] Starting crashdump kernel...
[

Re: [PATCH v3 00/16] powerpc/32: Use BATs/LTLBs for STRICT_KERNEL_RWX

2019-02-20 Thread Christophe Leroy





On 21/02/2019 at 02:47, Michael Ellerman wrote:

Christophe Leroy  writes:


The purpose of this series is to:
- use BATs with STRICT_KERNEL_RWX on book3s (See patch 13 for details.)
- use LTLBs with STRICT_KERNEL_RWX on 8xx (See patch 15 for a few details.)


This doesn't boot qemu-mac99 for me:

   spawn ~/src/qemu/ppc-softmmu/qemu-system-ppc -nographic -vga none -M mac99 
-m 1G -kernel build/vmlinux -initrd ppc32-initrd.gz -append console=ttyPZ0 
init=/bin/sh
   >> =
   >> OpenBIOS 1.1 [Feb 15 2019 10:05]
   >> Configuration device id QEMU version 1 machine id 1
   >> CPUs: 1
   >> Memory: 1024M
   >> UUID: ----
   >> CPU type PowerPC,G4
   milliseconds isn't unique.
   Welcome to OpenBIOS v1.1 built on Feb 15 2019 10:05
   >> [ppc] Kernel already loaded (0x0100 + 0x00c2c338) (initrd 0x01d2d000 
+ 0x007e72f0)
   >> [ppc] Kernel command line: console=ttyPZ0 init=/bin/sh
   >> switching to new context:
   OF stdout device is: /pci@f200/mac-io@c/escc@13000/ch-a@13020
   Preparing to boot Linux version 5.0.0-rc2-gcc-8.2.0-00125-g4fcb83ca7936 
(michael@ka4) (gcc version 8.2.0 (Buildroot 2018.11-rc2-3-ga0787e9)) #724 
Thu Feb 21 12:03:14 AEDT 2019
   Detected machine type: 0400
   command line:
   memory layout at init:
 memory_limit :  (16 MB aligned)
 alloc_bottom : 02515000
 alloc_top: 3000
 alloc_top_hi : 4000
 rmo_top  : 3000
 ram_top  : 4000
   copying OF device tree...
   Building dt strings...
   Building dt structure...
   Device tree strings 0x02516000 -> 0x025150a4
   Device tree struct  0x02517000 -> 0x3fde7eb0
   Quiescing Open Firmware ...
   Booting Linux via __start() @ 0x0100 ...
   FAIL! Booting BE pmac32


That's pmac32 defconfig ish.
I haven't had time to debug it further sorry.



Ok. It boots fine without the '-m 1G'.

I'll find out why.

Christophe

Re: [LKP] [driver core] 570d020012: will-it-scale.per_thread_ops -12.2% regression

2019-02-20 Thread Huang, Ying

Wei Yang  writes:

> On Thu, Feb 21, 2019 at 12:46:18PM +0800, Huang, Ying wrote:
>>Wei Yang  writes:
>>
>>> On Thu, Feb 21, 2019 at 11:10:49AM +0800, kernel test robot wrote:
On Tue, Feb 19, 2019 at 01:19:04PM +0100, Greg Kroah-Hartman wrote:
> On Tue, Feb 19, 2019 at 08:59:45AM +0800, Wei Yang wrote:
> > On Mon, Feb 18, 2019 at 03:54:42PM +0800, kernel test robot wrote:
> > >Greeting,
> > >
> > >FYI, we noticed a -12.2% regression of will-it-scale.per_thread_ops 
> > >due to commit:
> > >
> > >
> > >commit: 570d0200123fb4f809aa2f6226e93a458d664d70 ("driver core: move 
> > >device->knode_class to device_private")
> > >https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > >
> > 
> > This is interesting.
> > 
> > I didn't expect the move of this field will impact the performance.
> > 
> > The reason is struct device is a hotter memory than 
> > device->device_private?
> > 
> > >in testcase: will-it-scale
> > >on test machine: 288 threads Knights Mill with 80G memory
> > >with following parameters:
> > >
> > >   nr_task: 100%
> > >   mode: thread
> > >   test: unlink2
> > >   cpufreq_governor: performance
> > >
> > >test-description: Will It Scale takes a testcase and runs it from 1 
> > >through to n parallel copies to see if the testcase will scale. It 
> > >builds both a process and threads based test in order to see any 
> > >differences between the two.
> > >test-url: https://github.com/antonblanchard/will-it-scale
> > >
> > >In addition to that, the commit also has significant impact on the 
> > >following tests:
> > >
> > >+--+---+
> > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops 
> > >-29.9% regression |
> > >| test machine | 288 threads Knights Mill with 80G memory  
> > >|
> > >| test parameters  | cpufreq_governor=performance  
> > >|
> > >|  | mode=thread   
> > >|
> > >|  | nr_task=100%  
> > >|
> > >|  | test=signal1  
> > >|
> 
> Ok, I'm going to blame your testing system, or something here, and not
> the above patch.
> 
> All this test does is call raise(3).  That does not touch the driver
> core at all.
> 
> > >+--+---+
> > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops 
> > >-16.5% regression |
> > >| test machine | 288 threads Knights Mill with 80G memory  
> > >|
> > >| test parameters  | cpufreq_governor=performance  
> > >|
> > >|  | mode=thread   
> > >|
> > >|  | nr_task=100%  
> > >|
> > >|  | test=open1
> > >|
> > >+--+---+
> 
> Same here, open1 just calls open/close a lot.  No driver core
> interaction at all there either.
> 
> So are you _sure_ this is the offending patch?

Hi Greg,

We did an experiment: we restored the original layout of struct device,
and found that the regression is gone. I guess the regression does not
come from the patch itself but is related to the struct layout.
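
As a generic illustration of how removing a field can change where a neighbouring hot field lands, here is a small userspace sketch. The structs are invented for the example and are not the real struct device; the 64-byte cache line size is an assumption as well. It only shows the mechanism, not the actual cause of this regression.

```c
#include <assert.h>
#include <stddef.h>

#define ASSUMED_CACHELINE 64 /* assumption: 64-byte cache lines */

struct list_node {
	struct list_node *next, *prev;
};

/* Hypothetical layout before the field is moved out. */
struct dev_before {
	char pad[56];                 /* stand-in for earlier members */
	struct list_node knode_class; /* field that later moves elsewhere */
	long hot_counter;             /* stand-in for a frequently used member */
};

/* Hypothetical layout after the field has been moved out. */
struct dev_after {
	char pad[56];
	long hot_counter;
};

/* Index of the cache line (relative to the struct start) holding an offset. */
static size_t cacheline_of(size_t offset)
{
	return offset / ASSUMED_CACHELINE;
}
```

Here cacheline_of(offsetof(struct dev_before, hot_counter)) and cacheline_of(offsetof(struct dev_after, hot_counter)) differ, so the same member shares a line with different neighbours before and after the change.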


tests: 1
testcase/path_params/tbox_group/run: 
will-it-scale/performance-thread-100%-unlink2/lkp-knm01

570d0200123fb4f8  a36dc70b810afe9183de2ea18f  
  --  
 %stddev  change %stddev
 \  |\  
237096  14% 270789will-it-scale.workload
   823  14%939will-it-scale.per_thread_ops

>>>
>>> Do you have the comparison between a36dc70b810afe9183de2ea18f and the one
>>> before 570d020012?
>>>

tests: 1
testcase/path_params/tbox_group/run: 
will-it-scale/performance-thread-100%-signal1/lkp-knm01

570d0200123fb4f8  a36dc70b810afe9183de2ea18f  
  --  
 %stddev  change %stddev
 \  |\  
 93.51   3%48% 138.53   3%  will-it-scale.time.user_time
   186  40%261will-it-scale.per_thread_ops
 53909  40%  75507

Re: [tip:x86/cleanups] x86: Remove pr_fmt duplicate logging prefixes

2019-02-20 Thread Bjorn Helgaas

On Thu, May 17, 2018 at 11:45 AM Joe Perches  wrote:
>
> On Thu, 2018-05-17 at 20:27 +0200, Borislav Petkov wrote:
> > On Sun, May 13, 2018 at 12:27:45PM -0700, tip-bot for Joe Perches wrote:
> > > Commit-ID:  1de392f5d5e803663abbd8ed084233f154152bcd
> > > Gitweb: 
> > > https://git.kernel.org/tip/1de392f5d5e803663abbd8ed084233f154152bcd
> > > Author: Joe Perches 
> > > AuthorDate: Thu, 10 May 2018 08:45:30 -0700
> > > Committer:  Thomas Gleixner 
> > > CommitDate: Sun, 13 May 2018 21:25:18 +0200
> > >
> > > x86: Remove pr_fmt duplicate logging prefixes
> > >
> > > Converting pr_fmt from a default simple #define to use KBUILD_MODNAME
> > > added some duplicate prefixes.
> > >
> > > Remove the duplicate prefixes.
> []
> > Maybe I'm missing something but this dropped the prefixes now
> > completely:
> >
> > -e820: BIOS-provided physical RAM map:
> > +BIOS-provided physical RAM map:
> >
> > -e820: last_pfn = 0x43f000 max_arch_pfn = 0x4
> > +last_pfn = 0x43f000 max_arch_pfn = 0x4
> >
> > -e820: last_pfn = 0x9d000 max_arch_pfn = 0x4
> > +last_pfn = 0x9d000 max_arch_pfn = 0x4
> >
> > ...
> >
> > I don't think that was the intention.
>
> Hi Borislav
>
> It wasn't and isn't the intention.
>
> This is a patch _series_, and all of the patches
> from 4-18 depend on patch 3 which converts the
> generic define in include/linux/printk.h
>
> from
> #define pr_fmt(fmt) fmt
> to
> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>
> Perhaps a better option, which could be done in
> a v2 of the series, is to add a temporary
> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> to each file modified in patches 4-18 and then
> allow the follow-on script described in the 0/18
> cover letter to remove those #defines

Was there ever a v2?  What I see from 1de392f5d5e8 ("x86: Remove
pr_fmt duplicate logging prefixes") is exactly what Borislav noted.
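
For readers who have not looked at the mechanism, here is a minimal userspace sketch of how the two pr_fmt definitions quoted above change the output. It relies on string literal concatenation and on the GNU `##__VA_ARGS__` extension; the macro names fmt_old/fmt_new and the "e820" value for KBUILD_MODNAME are assumptions for illustration, not kernel code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define KBUILD_MODNAME "e820" /* assumed value; normally supplied by kbuild */

/* The two generic definitions discussed in the thread, under distinct names. */
#define pr_fmt_old(fmt) fmt
#define pr_fmt_new(fmt) KBUILD_MODNAME ": " fmt

/* Format into a buffer instead of the kernel log so the result can be checked. */
#define fmt_old(buf, size, fmt, ...) \
	snprintf(buf, size, pr_fmt_old(fmt), ##__VA_ARGS__)
#define fmt_new(buf, size, fmt, ...) \
	snprintf(buf, size, pr_fmt_new(fmt), ##__VA_ARGS__)
```

If the per-file prefix is removed while the generic definition is still the old one, the message comes out without any "e820: " prefix, which matches what the dmesg comparison shows.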

For example, https://bugzilla.kernel.org/attachment.cgi?id=281007 is a
v4.20 dmesg log from
https://bugzilla.kernel.org/show_bug.cgi?id=202511 .  That bug also
has a v4.17 dmesg log
(https://bugzilla.kernel.org/attachment.cgi?id=281011).  Comparing
them shows:

-Linux version 4.17.19-gentoo (root@survivor) (gcc version 7.3.0
(Gentoo 7.3.0-r3 p1.4)) #2 SMP Sat Sep 22 09:53:01 EDT 2018
+Linux version 4.20.6-gentoo (root@survivor) (gcc version 7.3.0
(Gentoo 7.3.0-r3 p1.4)) #1 SMP Tue Feb 5 09:46:51 EST 2019
...
-e820: BIOS-provided physical RAM map:
+BIOS-provided physical RAM map:

Not a big deal as far as I'm concerned, but it is a minor annoyance.

Bjorn

Re: [RFC PATCH] docs/memory-barriers.txt: Rewrite "KERNEL I/O BARRIER EFFECTS" section

2019-02-20 Thread Michael Ellerman

Will Deacon  writes:
> [+more ppc folks]
>
> On Mon, Feb 18, 2019 at 04:50:12PM +, Will Deacon wrote:
>> On Wed, Feb 13, 2019 at 10:27:09AM -0800, Linus Torvalds wrote:
>> > Note that even if mmiowb() is expensive (and I don't think that's
>> > actually even the case on ia64), you can - and probably should - do
>> > what PowerPC does.
>> > 
>> > Doing an IO barrier on PowerPC is insanely expensive, but they solve
>> > that by simply tracking the whole "have I done any IO" manually. It's not
>> > even that expensive, it just uses a percpu flag.
>> > 
>> > (Admittedly, PowerPC makes it less obvious that it's a percpu variable
>> > because it's actually in the special "paca" region that is like a
>> > hyper-local percpu area).
>
> [...]
>
>> > But we *could* first just do the mmiowb() unconditionally in the ia64
>> > unlocking code, and then see if anybody notices?
>> 
>> I'll hack this up as a starting point. We can always try to be clever later
>> on if it's deemed necessary.
>
> Ok, so I started hacking this up in core code with the percpu flag (since
> riscv apparently needs it), but I've now realised that I don't understand
> how the PowerPC trick works after all. Consider the following:
>
>   spin_lock();// io_sync = 0
>   outb(42, port); // io_sync = 1
>   spin_lock();// io_sync = 0
>   ...
>   spin_unlock();
>   spin_unlock();
>
> The inner lock could even happen in an irq afaict, but we'll end up skipping
> the mmiowb()/sync because the io_sync flag is unconditionally cleared by
> spin_lock(). Fixing this is complicated by the fact that I/O writes can be
> performed in preemptible context with no locks held, so we can end up
> spuriously setting the io_sync flag for arbitrary CPUs, hence the desire
> to clear it in spin_lock().
>
> If the paca entry was more than a byte, we could probably track that a
> spinlock is held and then avoid clearing the flag prematurely, but I have
> a feeling that I'm missing something. Anybody know how this is supposed to
> work?

I don't think you're missing anything :/

Having two flags like you suggest could work. Or you could just make the
flag into a nesting counter.

Or do you just remove the clearing from spin_lock()? 

That gets you:

spin_lock();
outb(42, port); // io_sync = 1
spin_lock();
...
spin_unlock();  // mb(); io_sync = 0
spin_unlock();


And I/O outside of the lock case:

outb(42, port); // io_sync = 1

spin_lock();
...
spin_unlock();  // mb(); io_sync = 0


Extra barriers are not ideal, but the odd spurious mb() might be
preferable to doing another compare and branch or increment in every
spin_lock()?
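
The two sequences above can be played through with a small userspace model. This is only a sketch under assumptions: the struct, the function names and the barrier counter are invented for illustration, and the "clearing" lock variant models the behaviour Will describes rather than the real powerpc code.

```c
#include <assert.h>

struct cpu_sim {
	int io_sync;  /* models the per-CPU "I/O was done" flag */
	int barriers; /* counts how often unlock had to issue the barrier */
};

static void sim_outb(struct cpu_sim *c)
{
	c->io_sync = 1;
}

/* Lock variant that unconditionally clears the flag. */
static void sim_lock_clearing(struct cpu_sim *c)
{
	c->io_sync = 0;
}

/* Lock variant with the clearing removed. */
static void sim_lock_plain(struct cpu_sim *c)
{
	(void)c;
}

/* Unlock issues the barrier only if I/O was recorded, then resets the flag. */
static void sim_unlock(struct cpu_sim *c)
{
	if (c->io_sync) {
		c->barriers++;
		c->io_sync = 0;
	}
}

/* lock; outb; lock; unlock; unlock -- returns the number of barriers issued. */
static int run_nested(void (*lock)(struct cpu_sim *))
{
	struct cpu_sim c = { 0, 0 };

	lock(&c);
	sim_outb(&c);
	lock(&c);
	sim_unlock(&c);
	sim_unlock(&c);
	return c.barriers;
}

/* outb with no lock held, followed by an unrelated lock/unlock pair. */
static int run_io_outside_lock(void (*lock)(struct cpu_sim *))
{
	struct cpu_sim c = { 0, 0 };

	sim_outb(&c);
	lock(&c);
	sim_unlock(&c);
	return c.barriers;
}
```

In this model run_nested(sim_lock_clearing) issues no barrier at all (the problem Will describes), while run_nested(sim_lock_plain) issues one.  The cost shows up in run_io_outside_lock(sim_lock_plain), which also issues one barrier even though the I/O was not done under the lock.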

cheers

linux-next: build failure after merge of the xarray tree

2019-02-20 Thread Stephen Rothwell

Hi all,

After merging the xarray tree, today's linux-next build (powerpc
ppc64_defconfig) failed like this:

In file included from include/linux/uio.h:12,
 from include/linux/socket.h:8,
 from include/rdma/rdma_cm.h:37,
 from drivers/infiniband/core/restrack.c:6:
drivers/infiniband/core/restrack.c: In function 'rt_xa_alloc_cyclic':
include/linux/kernel.h:40:18: warning: passing argument 3 of '__xa_alloc' makes 
pointer from integer without a cast [-Wint-conversion]
 #define U32_MAX  ((u32)~0U)
  ^~
drivers/infiniband/core/restrack.c:26:27: note: in expansion of macro 'U32_MAX'
  err = __xa_alloc(xa, id, U32_MAX, entry, GFP_KERNEL);
   ^~~
In file included from include/linux/radix-tree.h:31,
 from include/linux/fs.h:15,
 from include/linux/seq_file.h:11,
 from arch/powerpc/include/asm/machdep.h:12,
 from arch/powerpc/include/asm/archrandom.h:7,
 from include/linux/random.h:166,
 from include/linux/net.h:22,
 from include/linux/skbuff.h:29,
 from include/linux/if_arp.h:26,
 from include/rdma/ib_addr.h:39,
 from include/rdma/rdma_cm.h:39,
 from drivers/infiniband/core/restrack.c:6:
include/linux/xarray.h:524:61: note: expected 'void *' but argument is of type 
'unsigned int'
 int __must_check __xa_alloc(struct xarray *, u32 *id, void *entry,
   ~~^
drivers/infiniband/core/restrack.c:26:36: error: incompatible type for argument 
4 of '__xa_alloc'
  err = __xa_alloc(xa, id, U32_MAX, entry, GFP_KERNEL);
^
In file included from include/linux/radix-tree.h:31,
 from include/linux/fs.h:15,
 from include/linux/seq_file.h:11,
 from arch/powerpc/include/asm/machdep.h:12,
 from arch/powerpc/include/asm/archrandom.h:7,
 from include/linux/random.h:166,
 from include/linux/net.h:22,
 from include/linux/skbuff.h:29,
 from include/linux/if_arp.h:26,
 from include/rdma/ib_addr.h:39,
 from include/rdma/rdma_cm.h:39,
 from drivers/infiniband/core/restrack.c:6:
include/linux/xarray.h:525:3: note: expected 'struct xa_limit' but argument is 
of type 'void *'
   struct xa_limit, gfp_t);
   ^~~
drivers/infiniband/core/restrack.c:29:28: warning: passing argument 3 of 
'__xa_alloc' makes pointer from integer without a cast [-Wint-conversion]
   err = __xa_alloc(xa, id, *next, entry, GFP_KERNEL);
^
In file included from include/linux/radix-tree.h:31,
 from include/linux/fs.h:15,
 from include/linux/seq_file.h:11,
 from arch/powerpc/include/asm/machdep.h:12,
 from arch/powerpc/include/asm/archrandom.h:7,
 from include/linux/random.h:166,
 from include/linux/net.h:22,
 from include/linux/skbuff.h:29,
 from include/linux/if_arp.h:26,
 from include/rdma/ib_addr.h:39,
 from include/rdma/rdma_cm.h:39,
 from drivers/infiniband/core/restrack.c:6:
include/linux/xarray.h:524:61: note: expected 'void *' but argument is of type 
'u32' {aka 'unsigned int'}
 int __must_check __xa_alloc(struct xarray *, u32 *id, void *entry,
   ~~^
drivers/infiniband/core/restrack.c:29:35: error: incompatible type for argument 
4 of '__xa_alloc'
   err = __xa_alloc(xa, id, *next, entry, GFP_KERNEL);
   ^
In file included from include/linux/radix-tree.h:31,
 from include/linux/fs.h:15,
 from include/linux/seq_file.h:11,
 from arch/powerpc/include/asm/machdep.h:12,
 from arch/powerpc/include/asm/archrandom.h:7,
 from include/linux/random.h:166,
 from include/linux/net.h:22,
 from include/linux/skbuff.h:29,
 from include/linux/if_arp.h:26,
 from include/rdma/ib_addr.h:39,
 from include/rdma/rdma_cm.h:39,
 from drivers/infiniband/core/restrack.c:6:
include/linux/xarray.h:525:3: note: expected 'struct xa_limit' but argument is 
of type 'void *'
   struct xa_limit, gfp_t);
   ^~~

Caused by commit

  fd47c2f99f04 ("RDMA/restrack: Convert internal DB from hash to XArray")

from the rdma tree interacting with commit

  a3e4d3f97ec8 ("XArray: Redesign xa_alloc API")

from the xarray tree.

I added the following merge fix patch:

From: Stephen Rothwell 
Date: Thu, 21 Feb 2019 17:07:22 +1100
Subject: [PATCH] RDMA/restrack: fix for

Re: [PATCH] huegtlbfs: fix races and page leaks during migration

2019-02-20 Thread Andrew Morton

On Tue, 12 Feb 2019 14:14:00 -0800 Mike Kravetz  wrote:

> hugetlb pages should only be migrated if they are 'active'.  The routines
> set/clear_page_huge_active() modify the active state of hugetlb pages.
> When a new hugetlb page is allocated at fault time, set_page_huge_active
> is called before the page is locked.  Therefore, another thread could
> race and migrate the page while it is being added to page table by the
> fault code.  This race is somewhat hard to trigger, but can be seen by
> strategically adding udelay to simulate worst case scheduling behavior.
> Depending on 'how' the code races, various BUG()s could be triggered.
> 
> To address this issue, simply delay the set_page_huge_active call until
> after the page is successfully added to the page table.
> 
> Hugetlb pages can also be leaked at migration time if the pages are
> associated with a file in an explicitly mounted hugetlbfs filesystem.
> For example, a test program which hole punches, faults and migrates
> pages in such a file (1G in size) will eventually fail because it
> can not allocate a page.  Reported counts and usage at time of failure:
> 
> node0
> 537 free_hugepages
> 1024nr_hugepages
> 0   surplus_hugepages
> node1
> 1000free_hugepages
> 1024nr_hugepages
> 0   surplus_hugepages
> 
> Filesystem Size  Used Avail Use% Mounted on
> nodev  4.0G  4.0G 0 100% /var/opt/hugepool
> 
> Note that the filesystem shows 4G of pages used, while actual usage is
> 511 pages (just under 1G).  Failed trying to allocate page 512.
> 
> If a hugetlb page is associated with an explicitly mounted filesystem,
> this information is contained in the page_private field.  At migration
> time, this information is not preserved.  To fix, simply transfer
> page_private from old to new page at migration time if necessary.
> 
> Cc: 
> Fixes: bcc54222309c ("mm: hugetlb: introduce page_huge_active")
> Signed-off-by: Mike Kravetz 

cc:stable.  It would be nice to get some review of this one, please?

> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -859,6 +859,18 @@ static int hugetlbfs_migrate_page(struct address_space 
> *mapping,
>   rc = migrate_huge_page_move_mapping(mapping, newpage, page);
>   if (rc != MIGRATEPAGE_SUCCESS)
>   return rc;
> +
> + /*
> +  * page_private is subpool pointer in hugetlb pages.  Transfer to
> +  * new page.  PagePrivate is not associated with page_private for
> +  * hugetlb pages and can not be set here as only page_huge_active
> +  * pages can be migrated.
> +  */
> + if (page_private(page)) {
> + set_page_private(newpage, page_private(page));
> + set_page_private(page, 0);
> + }
> +
>   if (mode != MIGRATE_SYNC_NO_COPY)
>   migrate_page_copy(newpage, page);
>   else
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index a80832487981..f859e319e3eb 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3625,7 +3625,6 @@ static vm_fault_t hugetlb_cow(struct mm_struct *mm, 
> struct vm_area_struct *vma,
>   copy_user_huge_page(new_page, old_page, address, vma,
>   pages_per_huge_page(h));
>   __SetPageUptodate(new_page);
> - set_page_huge_active(new_page);
>  
>   mmun_start = haddr;
>   mmun_end = mmun_start + huge_page_size(h);
> @@ -3647,6 +3646,7 @@ static vm_fault_t hugetlb_cow(struct mm_struct *mm, 
> struct vm_area_struct *vma,
>   make_huge_pte(vma, new_page, 1));
>   page_remove_rmap(old_page, true);
>   hugepage_add_new_anon_rmap(new_page, vma, haddr);
> + set_page_huge_active(new_page);
>   /* Make the old page be freed below */
>   new_page = old_page;
>   }
> @@ -3792,7 +3792,6 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
>   }
>   clear_huge_page(page, address, pages_per_huge_page(h));
>   __SetPageUptodate(page);
> - set_page_huge_active(page);
>  
>   if (vma->vm_flags & VM_MAYSHARE) {
>   int err = huge_add_to_page_cache(page, mapping, idx);
> @@ -3863,6 +3862,10 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
>   }
>  
>   spin_unlock(ptl);
> +
> + /* May already be set if not newly allocated page */
> + set_page_huge_active(page);
> +
>   unlock_page(page);
>  out:
>   return ret;
> @@ -4097,7 +4100,6 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
>* the set_pte_at() write.
>*/
>   __SetPageUptodate(page);
> - set_page_huge_active(page);
>  
>   mapping = dst_vma->vm_file->f_mapping;
>   idx = vma_hugecache_offset(h, dst_vma, dst_addr);
> @@ -4165,6 +4167,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
>   update_mmu_cache(dst_vma, dst_addr, dst_pte);
>  
>   spin_unlock(ptl);
> +

Re: [LKP] [driver core] 570d020012: will-it-scale.per_thread_ops -12.2% regression

2019-02-20 Thread Wei Yang

On Thu, Feb 21, 2019 at 12:46:18PM +0800, Huang, Ying wrote:
>Wei Yang  writes:
>
>> On Thu, Feb 21, 2019 at 11:10:49AM +0800, kernel test robot wrote:
>>>On Tue, Feb 19, 2019 at 01:19:04PM +0100, Greg Kroah-Hartman wrote:
 On Tue, Feb 19, 2019 at 08:59:45AM +0800, Wei Yang wrote:
 > On Mon, Feb 18, 2019 at 03:54:42PM +0800, kernel test robot wrote:
 > >Greeting,
 > >
 > >FYI, we noticed a -12.2% regression of will-it-scale.per_thread_ops due 
 > >to commit:
 > >
 > >
 > >commit: 570d0200123fb4f809aa2f6226e93a458d664d70 ("driver core: move 
 > >device->knode_class to device_private")
 > >https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
 > >
 > 
 > This is interesting.
 > 
 > I didn't expect the move of this field will impact the performance.
 > 
 > The reason is struct device is a hotter memory than 
 > device->device_private?
 > 
 > >in testcase: will-it-scale
 > >on test machine: 288 threads Knights Mill with 80G memory
 > >with following parameters:
 > >
 > >nr_task: 100%
 > >mode: thread
 > >test: unlink2
 > >cpufreq_governor: performance
 > >
 > >test-description: Will It Scale takes a testcase and runs it from 1 
 > >through to n parallel copies to see if the testcase will scale. It 
 > >builds both a process and threads based test in order to see any 
 > >differences between the two.
 > >test-url: https://github.com/antonblanchard/will-it-scale
 > >
 > >In addition to that, the commit also has significant impact on the 
 > >following tests:
 > >
 > >+--+---+
 > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -29.9% 
 > >regression |
 > >| test machine | 288 threads Knights Mill with 80G memory   
 > >   |
 > >| test parameters  | cpufreq_governor=performance   
 > >   |
 > >|  | mode=thread
 > >   |
 > >|  | nr_task=100%   
 > >   |
 > >|  | test=signal1   
 > >   |
 
 Ok, I'm going to blame your testing system, or something here, and not
 the above patch.
 
 All this test does is call raise(3).  That does not touch the driver
 core at all.
 
 > >+--+---+
 > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -16.5% 
 > >regression |
 > >| test machine | 288 threads Knights Mill with 80G memory   
 > >   |
 > >| test parameters  | cpufreq_governor=performance   
 > >   |
 > >|  | mode=thread
 > >   |
 > >|  | nr_task=100%   
 > >   |
 > >|  | test=open1 
 > >   |
 > >+--+---+
 
 Same here, open1 just calls open/close a lot.  No driver core
 interaction at all there either.
 
 So are you _sure_ this is the offending patch?
>>>
>>>Hi Greg,
>>>
>>>We did an experiment: we restored the original layout of struct device,
>>>and found that the regression is gone. I guess the regression does not
>>>come from the patch itself but is related to the struct layout.
>>>
>>>
>>>tests: 1
>>>testcase/path_params/tbox_group/run: 
>>>will-it-scale/performance-thread-100%-unlink2/lkp-knm01
>>>
>>>570d0200123fb4f8  a36dc70b810afe9183de2ea18f  
>>>  --  
>>> %stddev  change %stddev
>>> \  |\  
>>>237096  14% 270789will-it-scale.workload
>>>   823  14%939will-it-scale.per_thread_ops
>>>
>>
>> Do you have the comparison between a36dc70b810afe9183de2ea18f and the one
>> before 570d020012?
>>
>>>
>>>tests: 1
>>>testcase/path_params/tbox_group/run: 
>>>will-it-scale/performance-thread-100%-signal1/lkp-knm01
>>>
>>>570d0200123fb4f8  a36dc70b810afe9183de2ea18f  
>>>  --  
>>> %stddev  change %stddev
>>> \  |\  
>>> 93.51   3%48% 138.53   3%  will-it-scale.time.user_time
>>>   186  40%261will-it-scale.per_thread_ops
>>> 53909  40%  75507will-it-scale.workload
>>>
>>>
>>>tests: 1
>>>testcase/path_params/tbox_group/run:

Re: [PATCH 5/6] lib: Fix function documentation for strncpy_from_user

2019-02-20 Thread Kees Cook

On Wed, Feb 20, 2019 at 9:25 PM Tobin C. Harding  wrote:
>
> On Wed, Feb 20, 2019 at 05:05:10PM -0800, Kees Cook wrote:
> > So, generally speaking, I'd love to split all strncpy* uses into
> > strscpy_zero() (when expecting to do str->str copies), and some new
> > function, named like mempadstr() or str2mem() that copies a str to a
> > __nonstring char array, with trailing padding, if there is space. Then
> > there is no more mixing the two cases and botching things.

I should use "converts" instead of "copies" above, just to drive the
point home. :)

>
> Oh cool, treewide changes, I'm down with that.  So to v2 I'll add
> str2mem() and then attack the tree as suggested.  What could possibly go
> wrong :)?

Some clear documentation needs to be written for str2mem() to really
help people understand what a "non string" character array is
(especially given that it LOOKS like it has NUL termination -- when in
fact it's just "padding").
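
As a starting point for that documentation, here is a purely hypothetical userspace sketch of what str2mem() could look like. No such function exists yet; the name comes from the discussion above, while the signature and semantics are assumptions. Functionally it does what strncpy() already does, and the separate name would only make the "this is not a string" intent explicit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical: convert a NUL-terminated string into a fixed-size,
 * non-string character array.  The destination is zero-padded if the
 * source is shorter, and is NOT NUL-terminated if the source fills it.
 */
static void str2mem(char *dest, size_t dest_len, const char *src)
{
	size_t n = 0;

	/* Length of src, capped at the size of the destination array. */
	while (n < dest_len && src[n] != '\0')
		n++;

	memcpy(dest, src, n);
	memset(dest + n, 0, dest_len - n); /* trailing padding, not a terminator */
}
```

With a 4-byte destination, "abcdef" yields the bytes 'a' 'b' 'c' 'd' with no NUL anywhere, and "ab" yields 'a' 'b' followed by two bytes of padding that merely look like a terminator.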

The tree-wide changes will likely take a while (and don't need to be
part of this series unless you want to find a couple good examples)
since we have to do them case-by-case: it's not always obvious when
it's actually a non-string, so getting help from maintainers here will
be needed. (And maybe some kind of flow chart added to
Documentation/process/deprecated.rst for how to stop using strncpy()
and strlcpy().)

What I can't quite figure out yet is how to find a way for sfr to flag
newly added users of strcpy, strncpy, and strlcpy. We might need to
bring back __deprecated, but hide it behind a W=linux-next flag or
something crazy. Stephen, in your builds you're already injecting
-Wimplicit-fallthrough: do you do W=1 or anything like that? If not, I
think we need some W= setting for your linux-next builds that generate
the maintainer-nag warnings...

-Kees

P.S. Here's C string API Rant (I just had to get this out, please feel
free to ignore):

strcpy returns dest ... which is already known, so it's a meaningless
return value.

strncpy returns dest (still meaningless)

strlcpy returns strlen(src) ... the length we WOULD have copied. Why
would we care about that? I'm operating on dest. Were there callers
that needed to both copy part of src and learn how long it was at the
same time?

strscpy returns -E2BIG or non-NUL bytes copied: yay, a return about
what actually happened from the operation!

... snprintf returns what it WOULD have written, much like strlcpy
above. At least snprintf has an excuse: it can be used to calculate an
allocation size (called with a NULL dest and 0 size) ... but shouldn't
"how big is this format string going to be?" be a separate function? I
wonder if we can kill all kernel uses of snprintf too (after
introducing a "how big would it be?" function and switching all other
callers over to scnprintf)...

So scnprintf() does the right thing (count of non-NUL bytes copied out).
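
The difference in return values is easy to see in userspace. The sketch below wraps vsnprintf() to mimic the scnprintf() behaviour described above; my_scnprintf is a made-up name for illustration and is not the kernel implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Illustrative wrapper: return the number of non-NUL bytes actually
 * stored in buf, rather than the length that would have been written.
 */
static int my_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int ret;

	if (size == 0)
		return 0;

	va_start(args, fmt);
	ret = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (ret < 0)
		return 0;              /* userspace vsnprintf can report errors */
	if ((size_t)ret >= size)
		return (int)(size - 1); /* output was truncated */
	return ret;
}
```

For a 4-byte buffer and the string "hello", snprintf() reports 5 (what it would have written), while the wrapper reports 3 (what actually landed in the buffer, "hel").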

So now our safe(r?) string API versions use different letters to show
they're safe: "s" in strcpy and "cn" in sprintf. /me cries forever.

-- 
Kees Cook

RE: Re: [PATCH v3 5/7] drivers: devfreq: add longer polling interval in idle

2019-02-20 Thread MyungJoo Ham

>> 
>> There are some requirements that you need to consider:
>> 
>> Is 30% really applicable to ALL devfreq devices?
>The 30% load while the device is on lowest OPP is to filter some noise.
>It might be tunable over sysfs for each device if you like.
>>  - What if some devices do not want such behaviors?
>They can set polling_idle_ms and polling_ms the same value.
>>  - What if some devices want different values (change behaviors)?
>A sysfs tunable is needed here.
>>  - What if some manufacturers want different default values?
>Like above (sysfs).
>>  - What if some devices want to let the framework know that it's in idle?
>There might be a field in devfreq->state which could handle this.
>>  - What if some other kernel context, device (drivers),
>>  or userspace process wants to notify that it's no longer idling?
>This issue is more related to the new movement in the 'interconnect'
>development. They have a goal for this kind of interactions and QoS
>between devices or their clients. In devfreq it would be possible
>to tackle this, but would require a lot of changes (notification chain,
>state machines in devices,
>
>> 
>> As mentioned in the internal thread (tizen.org),
>> I'm not convinced by the idea of assuming that a device can be
>> considered "idling" if it has simply "low" utilization.
>> 
>> You are going to deteriorate the UI response time of mobile devices
>> significantly.
>Current devfreq wake-up also does not guarantee that, maybe on a single
>CPU platform does.

Yes, the current devfreq does not enhance UI response time in the sense
that it would keep the same response time anyway.

However, your current approach will surely deteriorate it by lengthening
the polling latency.

For mobile and wearable devices, in many cases I've been witnessing,
the device idles right before the user's input (launching an app,
scrolling messages or web pages, or pressing a play button).
For mitigation, we often relay UI inputs to DVFS mechanisms so that
we either increase frequency for any UI inputs for a short period or
shorten the polling latency for a short period.
When a highly user-interactive device is idling or operating at a low frequency,
we should assume that it may need high performance at any time;
loosening the checking period is not a good solution in that sense,
although it is probably acceptable for servers or workstations.
But I don't think servers/workstations care about the power consumption of
DVFS checking loops anyway.



>
>I will try to address your and Chanwoo's comments that the devfreq still
>needs deferred polling in some platforms.
>Would it be OK if we have two options: deferred and delayed work while
>registering a wakeup for a device?
>That would be a function like: polling_mode_init(devfreq) instead of
>simple INIT_DEFERRED_WORK(), which will check the device's preference.
>The device driver could set a field, 'polling_mode', to an enum value:
>POWER_EFFICIENT or RELIABLE_INTERVAL. For compatibility with old drivers
>where the polling_mode = 0, SYSTEM_DEFAULT_POLLING_MODE (which is one
>of these two) would be used.
>Then the two-phase-polling-interval from this patch could only be used
>for the RELIABLE_INTERVAL configuration or even dropped.

One thing I want to add is: let's not over-complicate it.
Do you have any experimental results on how much power is saved by doing this?
(and user response time losses)


Cheers,
MyungJoo

[PATCH RFC 5/5] rcuwait: Replace rcu_assign_pointer() with WRITE_ONCE

2019-02-20 Thread Joel Fernandes (Google)

This suppresses a sparse error generated due to the recently added
rcu_assign_pointer sparse check below. It seems WRITE_ONCE should be
sufficient here.
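
For readers less familiar with the two primitives: rcu_assign_pointer()
behaves roughly like a release store, while WRITE_ONCE() is a plain
untorn store with no ordering. A loose userspace analogy using C11
atomics is sketched below. All names are hypothetical, and the sketch
does not attempt to show that the weaker store is correct for rcuwait;
it only illustrates the difference being traded away:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct sketch_task { int id; };
struct sketch_rcuwait { _Atomic(struct sketch_task *) task; };

/* Roughly rcu_assign_pointer(): earlier writes to *t are ordered first. */
static void sketch_publish_release(struct sketch_rcuwait *w,
				   struct sketch_task *t)
{
	assert(w != NULL);
	atomic_store_explicit(&w->task, t, memory_order_release);
}

/* Roughly WRITE_ONCE(): a single untorn store, no ordering implied. */
static void sketch_publish_once(struct sketch_rcuwait *w,
				struct sketch_task *t)
{
	assert(w != NULL);
	atomic_store_explicit(&w->task, t, memory_order_relaxed);
}

/* Reader side: an acquire load pairs with the release store above. */
static struct sketch_task *sketch_read(struct sketch_rcuwait *w)
{
	assert(w != NULL);
	return atomic_load_explicit(&w->task, memory_order_acquire);
}
```

The relaxed form is only safe when readers do not depend on seeing the
pointed-to data initialized before the pointer itself becomes visible.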

>> kernel//locking/percpu-rwsem.c:162:9: sparse: error: incompatible
types in comparison expression (different address spaces)

Signed-off-by: Joel Fernandes (Google) 
---
 include/linux/rcuwait.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/rcuwait.h b/include/linux/rcuwait.h
index 90bfa3279a01..9e5b4760e6c2 100644
--- a/include/linux/rcuwait.h
+++ b/include/linux/rcuwait.h
@@ -44,7 +44,7 @@ extern void rcuwait_wake_up(struct rcuwait *w);
 */ \
WARN_ON(current->exit_state);   \
\
-   rcu_assign_pointer((w)->task, current); \
+   WRITE_ONCE((w)->task, current); \
for (;;) {  \
/*  \
 * Implicit barrier (A) pairs with (B) in   \
-- 
2.21.0.rc0.258.g878e2cd30e-goog

[PATCH RFC 1/5] net: rtnetlink: Fix incorrect RCU API usage

2019-02-20 Thread Joel Fernandes (Google)

rtnl_register_internal() and rtnl_unregister_all() try to directly
dereference an RCU-protected pointer outside an RCU read-side section.
While this is OK to do since a lock is held, let us use the correct
API to avoid programmer bugs in the future.

This also fixes sparse warnings arising from not using RCU API.

net/core/rtnetlink.c:332:13: warning: incorrect type in assignment (different address spaces)
net/core/rtnetlink.c:332:13:    expected struct rtnl_link **tab
net/core/rtnetlink.c:332:13:    got struct rtnl_link *[noderef] *

Signed-off-by: Joel Fernandes (Google) 
---
 net/core/rtnetlink.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 5ea1bed08ede..98be4b4818a9 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -188,7 +188,7 @@ static int rtnl_register_internal(struct module *owner,
msgindex = rtm_msgindex(msgtype);
 
rtnl_lock();
-   tab = rtnl_msg_handlers[protocol];
+   tab = rtnl_dereference(rtnl_msg_handlers[protocol]);
if (tab == NULL) {
tab = kcalloc(RTM_NR_MSGTYPES, sizeof(void *), GFP_KERNEL);
if (!tab)
@@ -329,7 +329,7 @@ void rtnl_unregister_all(int protocol)
BUG_ON(protocol < 0 || protocol > RTNL_FAMILY_MAX);
 
rtnl_lock();
-   tab = rtnl_msg_handlers[protocol];
+   tab = rtnl_dereference(rtnl_msg_handlers[protocol]);
if (!tab) {
rtnl_unlock();
return;
-- 
2.21.0.rc0.258.g878e2cd30e-goog

[PATCH RFC 3/5] sched/cpufreq: Fix incorrect RCU API usage

2019-02-20 Thread Joel Fernandes (Google)

Recently I added an RCU annotation check to rcu_assign_pointer(). All
pointers assigned to RCU-protected data are to be annotated with __rcu
in order to be able to use rcu_assign_pointer(), similar to checks in
other RCU APIs.

This resulted in a sparse error: kernel//sched/cpufreq.c:41:9: sparse:
error: incompatible types in comparison expression (different address
spaces)

Fix this by using the correct APIs for RCU accesses. This will
potentially avoid any future bugs in the code. If it is felt that RCU
protection is not needed here, then the rcu_assign_pointer() call can be
dropped and replaced with, say, WRITE_ONCE() or smp_store_release(). Or
maybe we add a new API to do it. But calling rcu_assign_pointer() seems an
abuse of the RCU API unless RCU is being used.

Signed-off-by: Joel Fernandes (Google) 
---
 kernel/sched/cpufreq.c | 8 ++--
 kernel/sched/sched.h   | 2 +-
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/cpufreq.c b/kernel/sched/cpufreq.c
index 22bd8980f32f..c9aeb3bf5dc2 100644
--- a/kernel/sched/cpufreq.c
+++ b/kernel/sched/cpufreq.c
@@ -7,7 +7,7 @@
  */
 #include "sched.h"
 
-DEFINE_PER_CPU(struct update_util_data *, cpufreq_update_util_data);
+DEFINE_PER_CPU(struct update_util_data __rcu *, cpufreq_update_util_data);
 
 /**
  * cpufreq_add_update_util_hook - Populate the CPU's update_util_data pointer.
@@ -34,8 +34,12 @@ void cpufreq_add_update_util_hook(int cpu, struct 
update_util_data *data,
if (WARN_ON(!data || !func))
return;
 
-   if (WARN_ON(per_cpu(cpufreq_update_util_data, cpu)))
+   rcu_read_lock();
+   if (WARN_ON(rcu_dereference(per_cpu(cpufreq_update_util_data, cpu)))) {
+   rcu_read_unlock();
return;
+   }
+   rcu_read_unlock();
 
data->func = func;
rcu_assign_pointer(per_cpu(cpufreq_update_util_data, cpu), data);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index d04530bf251f..2ab545d40381 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2166,7 +2166,7 @@ static inline u64 irq_time_read(int cpu)
 #endif /* CONFIG_IRQ_TIME_ACCOUNTING */
 
 #ifdef CONFIG_CPU_FREQ
-DECLARE_PER_CPU(struct update_util_data *, cpufreq_update_util_data);
+DECLARE_PER_CPU(struct update_util_data __rcu *, cpufreq_update_util_data);
 
 /**
  * cpufreq_update_util - Take a note about CPU utilization changes.
-- 
2.21.0.rc0.258.g878e2cd30e-goog

[PATCH RFC 2/5] ixgbe: Fix incorrect RCU API usage

2019-02-20 Thread Joel Fernandes (Google)

Recently, I added an RCU annotation check in rcu_assign_pointer. This
caused a sparse error to be reported by the ixgbe driver.

Looking further, it seems the adapter->xdp_prog pointer is not annotated
with __rcu. Annotating it fixed the error, but caused a bunch of other
warnings.

This patch tries to fix all warnings by using RCU API properly. This
makes sense to do because not using RCU properly can result in various
hard to find bugs. This is a best effort fix and is only build tested.
The sparse errors and warnings go away with the change. I request
maintainers / developers in this area to test it properly.

Signed-off-by: Joel Fernandes (Google) 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h  |  4 ++--
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 17 -
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h 
b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 08d85e336bd4..3b14daf27516 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -311,7 +311,7 @@ struct ixgbe_ring {
struct ixgbe_ring *next;/* pointer to next ring in q_vector */
struct ixgbe_q_vector *q_vector; /* backpointer to host q_vector */
struct net_device *netdev;  /* netdev ring belongs to */
-   struct bpf_prog *xdp_prog;
+   struct bpf_prog __rcu *xdp_prog;
struct device *dev; /* device for DMA mapping */
void *desc; /* descriptor ring memory */
union {
@@ -560,7 +560,7 @@ struct ixgbe_adapter {
unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
/* OS defined structs */
struct net_device *netdev;
-   struct bpf_prog *xdp_prog;
+   struct bpf_prog __rcu *xdp_prog;
struct pci_dev *pdev;
struct mii_bus *mii_bus;
 
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index daff8183534b..6aa59bb13a14 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2199,7 +2199,7 @@ static struct sk_buff *ixgbe_run_xdp(struct ixgbe_adapter 
*adapter,
u32 act;
 
rcu_read_lock();
-   xdp_prog = READ_ONCE(rx_ring->xdp_prog);
+   xdp_prog = rcu_dereference(rx_ring->xdp_prog);
 
if (!xdp_prog)
goto xdp_out;
@@ -6547,7 +6547,7 @@ int ixgbe_setup_rx_resources(struct ixgbe_adapter 
*adapter,
 rx_ring->queue_index) < 0)
goto err;
 
-   rx_ring->xdp_prog = adapter->xdp_prog;
+   rcu_assign_pointer(rx_ring->xdp_prog, adapter->xdp_prog);
 
return 0;
 err:
@@ -10246,7 +10246,10 @@ static int ixgbe_xdp_setup(struct net_device *dev, 
struct bpf_prog *prog)
if (nr_cpu_ids > MAX_XDP_QUEUES)
return -ENOMEM;
 
-   old_prog = xchg(&adapter->xdp_prog, prog);
+   rcu_read_lock();
+   old_prog = rcu_dereference(adapter->xdp_prog);
+   rcu_assign_pointer(adapter->xdp_prog, prog);
+   rcu_read_unlock();
 
/* If transitioning XDP modes reconfigure rings */
if (!!prog != !!old_prog) {
@@ -10271,13 +10274,17 @@ static int ixgbe_xdp_setup(struct net_device *dev, 
struct bpf_prog *prog)
 static int ixgbe_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 {
struct ixgbe_adapter *adapter = netdev_priv(dev);
+   struct bpf_prog *prog;
 
switch (xdp->command) {
case XDP_SETUP_PROG:
return ixgbe_xdp_setup(dev, xdp->prog);
case XDP_QUERY_PROG:
-   xdp->prog_id = adapter->xdp_prog ?
-   adapter->xdp_prog->aux->id : 0;
+   rcu_read_lock();
+   prog = rcu_dereference(adapter->xdp_prog);
+   xdp->prog_id = prog ? prog->aux->id : 0;
+   rcu_read_unlock();
+
return 0;
case XDP_QUERY_XSK_UMEM:
return ixgbe_xsk_umem_query(adapter, &xdp->xsk.umem,
-- 
2.21.0.rc0.258.g878e2cd30e-goog

[PATCH RFC 4/5] sched/topology: Annonate RCU pointers properly

2019-02-20 Thread Joel Fernandes (Google)

The scheduler's topology code uses rcu_assign_pointer() to initialize
various pointers.

Let us annotate the pointers correctly, which also helps avoid future
bugs. This suppresses the new sparse errors caused by an annotation
check I added to rcu_assign_pointer().

Also replace rcu_assign_pointer call on rq->sd with WRITE_ONCE. This
should be sufficient for the rq->sd initialization.

This fixes sparse errors:
kernel//sched/topology.c:378:9: sparse: error: incompatible types in
comparison expression (different address spaces)
kernel//sched/topology.c:387:9: sparse: error: incompatible types in
comparison expression (different address spaces)
kernel//sched/topology.c:612:9: sparse: error: incompatible types in
comparison expression (different address spaces)
kernel//sched/topology.c:615:9: sparse: error: incompatible types in
comparison expression (different address spaces)
kernel//sched/topology.c:618:9: sparse: error: incompatible types in
comparison expression (different address spaces)
kernel//sched/topology.c:621:9: sparse: error: incompatible types in
comparison expression (different address spaces)

Signed-off-by: Joel Fernandes (Google) 
---
 kernel/sched/sched.h| 12 ++--
 kernel/sched/topology.c | 12 ++--
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2ab545d40381..806703afd4b0 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -780,7 +780,7 @@ struct root_domain {
 * NULL-terminated list of performance domains intersecting with the
 * CPUs of the rd. Protected by RCU.
 */
-   struct perf_domain  *pd;
+   struct perf_domain __rcu *pd;
 };
 
 extern struct root_domain def_root_domain;
@@ -1305,13 +1305,13 @@ static inline struct sched_domain 
*lowest_flag_domain(int cpu, int flag)
return sd;
 }
 
-DECLARE_PER_CPU(struct sched_domain *, sd_llc);
+DECLARE_PER_CPU(struct sched_domain __rcu *, sd_llc);
 DECLARE_PER_CPU(int, sd_llc_size);
 DECLARE_PER_CPU(int, sd_llc_id);
-DECLARE_PER_CPU(struct sched_domain_shared *, sd_llc_shared);
-DECLARE_PER_CPU(struct sched_domain *, sd_numa);
-DECLARE_PER_CPU(struct sched_domain *, sd_asym_packing);
-DECLARE_PER_CPU(struct sched_domain *, sd_asym_cpucapacity);
+DECLARE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared);
+DECLARE_PER_CPU(struct sched_domain __rcu *, sd_numa);
+DECLARE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing);
+DECLARE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity);
 extern struct static_key_false sched_asym_cpucapacity;
 
 struct sched_group_capacity {
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 3f35ba1d8fde..2eab2e16ded5 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -586,13 +586,13 @@ static void destroy_sched_domains(struct sched_domain *sd)
  * the cpumask of the domain), this allows us to quickly tell if
  * two CPUs are in the same cache domain, see cpus_share_cache().
  */
-DEFINE_PER_CPU(struct sched_domain *, sd_llc);
+DEFINE_PER_CPU(struct sched_domain __rcu *, sd_llc);
 DEFINE_PER_CPU(int, sd_llc_size);
 DEFINE_PER_CPU(int, sd_llc_id);
-DEFINE_PER_CPU(struct sched_domain_shared *, sd_llc_shared);
-DEFINE_PER_CPU(struct sched_domain *, sd_numa);
-DEFINE_PER_CPU(struct sched_domain *, sd_asym_packing);
-DEFINE_PER_CPU(struct sched_domain *, sd_asym_cpucapacity);
+DEFINE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared);
+DEFINE_PER_CPU(struct sched_domain __rcu *, sd_numa);
+DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing);
+DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity);
 DEFINE_STATIC_KEY_FALSE(sched_asym_cpucapacity);
 
 static void update_top_cache_domain(int cpu)
@@ -668,7 +668,7 @@ cpu_attach_domain(struct sched_domain *sd, struct 
root_domain *rd, int cpu)
 
rq_attach_root(rq, rd);
tmp = rq->sd;
-   rcu_assign_pointer(rq->sd, sd);
+   WRITE_ONCE(rq->sd, sd);
dirty_sched_domain_sysctl(cpu);
destroy_sched_domains(tmp);
 
-- 
2.21.0.rc0.258.g878e2cd30e-goog

[PATCH RFC 0/5] RCU fixes for rcu_assign_pointer() usage

2019-02-20 Thread Joel Fernandes (Google)

These patches fix various RCU API usage issues found due to sparse errors as a
result of the recent check to add rcu_check_sparse() to rcu_assign_pointer().
The errors in many cases seem to indicate either an incorrect API usage, or
missing annotations. The annotations added can also help avoid future incorrect
usages and bugs so it is a good idea to do in any case.

These are only build/boot tested and I request feedback from maintainers
and developers in the various areas the patches touch. Thanks for any feedback!

(There are still errors in rbtree.h but I have kept those for a later time
since fixing them is a bit more involved).

Joel Fernandes (Google) (5):
net: rtnetlink: Fix incorrect RCU API usage
ixgbe: Fix incorrect RCU API usage
sched/cpufreq: Fix incorrect RCU API usage
sched/topology: Annonate RCU pointers properly
rcuwait: Replace rcu_assign_pointer() with WRITE_ONCE

drivers/net/ethernet/intel/ixgbe/ixgbe.h  |  4 ++--
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 17 -
include/linux/rcuwait.h   |  2 +-
kernel/sched/cpufreq.c|  8 ++--
kernel/sched/sched.h  | 14 +++---
kernel/sched/topology.c   | 12 ++--
net/core/rtnetlink.c  |  4 ++--
7 files changed, 36 insertions(+), 25 deletions(-)

--
2.21.0.rc0.258.g878e2cd30e-goog

mmotm 2019-02-20-21-43 uploaded

2019-02-20 Thread akpm

The mm-of-the-moment snapshot 2019-02-20-21-43 has been uploaded to

   http://www.ozlabs.org/~akpm/mmotm/

mmotm-readme.txt says

README for mm-of-the-moment:

http://www.ozlabs.org/~akpm/mmotm/

This is a snapshot of my -mm patch queue.  Uploaded at random hopefully
more than once a week.

You will need quilt to apply these patches to the latest Linus release (4.x
or 4.x-rcY).  The series file is in broken-out.tar.gz and is duplicated in
http://ozlabs.org/~akpm/mmotm/series

The file broken-out.tar.gz contains two datestamp files: .DATE and
.DATE-yyyy-mm-dd-hh-mm-ss.  Both contain the string yyyy-mm-dd-hh-mm-ss,
followed by the base kernel version against which this patch series is to
be applied.

This tree is partially included in linux-next.  To see which patches are
included in linux-next, consult the `series' file.  Only the patches
within the #NEXT_PATCHES_START/#NEXT_PATCHES_END markers are included in
linux-next.

A git tree which contains the memory management portion of this tree is
maintained at git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git
by Michal Hocko.  It contains the patches which are between the
"#NEXT_PATCHES_START mm" and "#NEXT_PATCHES_END" markers, from the series
file, http://www.ozlabs.org/~akpm/mmotm/series.


A full copy of the full kernel tree with the linux-next and mmotm patches
already applied is available through git within an hour of the mmotm
release.  Individual mmotm releases are tagged.  The master branch always
points to the latest release, so it's constantly rebasing.

http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/

To develop on top of mmotm git:

  $ git remote add mmotm git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git
  $ git remote update mmotm
  $ git checkout -b topic mmotm/master
  
  $ git send-email mmotm/master.. [...]

To rebase a branch with older patches to a new mmotm release:

  $ git remote update mmotm
  $ git rebase --onto mmotm/master  topic




The directory http://www.ozlabs.org/~akpm/mmots/ (mm-of-the-second)
contains daily snapshots of the -mm tree.  It is updated more frequently
than mmotm, and is untested.

A git copy of this tree is available at

http://git.cmpxchg.org/cgit.cgi/linux-mmots.git/

and use of this tree is similar to
http://git.cmpxchg.org/cgit.cgi/linux-mmotm.git/, described above.


This mmotm tree contains the following patches against 5.0-rc7:
(patches marked "*" will be included in linux-next)

  origin.patch
* checkpatch-dont-interpret-stack-dumps-as-commit-ids.patch
* revert-initramfs-cleanup-incomplete-rootfs.patch
* numa-change-get_mempolicy-to-use-nr_node_ids-instead-of-max_numnodes.patch
* kasan-fix-assigning-tags-twice.patch
* kasan-kmemleak-pass-tagged-pointers-to-kmemleak.patch
* kmemleak-account-for-tagged-pointers-when-calculating-pointer-range-v2.patch
* kasan-slub-move-kasan_poison_slab-hook-before-page_address.patch
* kasan-slub-move-kasan_poison_slab-hook-before-page_address-v2.patch
* kasan-slub-fix-conflicts-with-config_slab_freelist_hardened.patch
* kasan-slub-fix-more-conflicts-with-config_slab_freelist_hardened.patch
* slub-fix-slab_consistency_checks-kasan_sw_tags.patch
* proc-oom-do-not-report-alien-mms-when-setting-oom_score_adj.patch
* huegtlbfs-fix-races-and-page-leaks-during-migration.patch
* mm-fix-__dump_page-for-poisoned-pages.patch
* mm-page_alloc-fix-a-division-by-zero-error-when-boosting-watermarks-v2.patch
* mm-handle-lru_add_drain_all-for-up-properly.patch
* mm-handle-lru_add_drain_all-for-up-properly-fix.patch
* psi-avoid-divide-by-zero-crash-inside-virtual-machines.patch
* tmpfs-fix-link-accounting-when-a-tmpfile-is-linked-in.patch
* kasan-fix-random-seed-generation-for-tag-based-mode.patch
* kasan-prevent-tracing-of-tagsc.patch
* kasan-slab-fix-conflicts-with-config_hardened_usercopy.patch
* kasan-slab-make-freelist-stored-without-tags.patch
* kasan-slab-remove-redundant-kasan_slab_alloc-hooks.patch
* slub-fix-a-crash-with-slub_debug-kasan_sw_tags.patch
* mm-dont-let-userspace-spam-allocations-warnings.patch
* mm-memory_hotplug-fix-off-by-one-in-is_pageblock_removable.patch
* kasan-remove-use-after-scope-bugs-detection.patch
* page_poison-play-nicely-with-kasan.patch
* kasan-fix-kasan_check_read-write-definitions.patch
* scripts-decode_stacktracesh-handle-rip-address-with-segment.patch
* sh-remove-nargs-from-__syscall.patch
* debugobjects-move-printk-out-of-db-lock-critical-sections.patch
* ocfs2-fix-a-panic-problem-caused-by-o2cb_ctl.patch
* ocfs2-fix-the-application-io-timeout-when-fstrim-is-running.patch
* ocfs2-use-zero-sized-array-and-struct_size-in-kzalloc.patch
* ocfs2-clear-zero-in-unaligned-direct-io.patch
* ocfs2-clear-zero-in-unaligned-direct-io-checkpatch-fixes.patch
* ocfs2-dlm-clean-dlm_lksb_get_lvb-and-dlm_lksb_put_lvb-when-the-cancel_pending-is-set.patch
* ocfs2-dlm-return-dlm_cancelgrant-if-the-lock-is-on-granted-list-and-the-operation-is-canceled.patch
* ocfs2-wait-for-recovering-done-after-direct-unlock-request.patch
*

Re: [LKP] [driver core] 570d020012: will-it-scale.per_thread_ops -12.2% regression

2019-02-20 Thread kernel test robot

On Thu, Feb 21, 2019 at 11:46:12AM +0800, Wei Yang wrote:
> On Thu, Feb 21, 2019 at 11:10:49AM +0800, kernel test robot wrote:
> >On Tue, Feb 19, 2019 at 01:19:04PM +0100, Greg Kroah-Hartman wrote:
> >> On Tue, Feb 19, 2019 at 08:59:45AM +0800, Wei Yang wrote:
> >> > On Mon, Feb 18, 2019 at 03:54:42PM +0800, kernel test robot wrote:
> >> > >Greeting,
> >> > >
> >> > >FYI, we noticed a -12.2% regression of will-it-scale.per_thread_ops due 
> >> > >to commit:
> >> > >
> >> > >
> >> > >commit: 570d0200123fb4f809aa2f6226e93a458d664d70 ("driver core: move 
> >> > >device->knode_class to device_private")
> >> > >https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> >> > >
> >> > 
> >> > This is interesting.
> >> > 
> >> > I didn't expect the move of this field will impact the performance.
> >> > 
> >> > The reason is struct device is a hotter memory than 
> >> > device->device_private?
> >> > 
> >> > >in testcase: will-it-scale
> >> > >on test machine: 288 threads Knights Mill with 80G memory
> >> > >with following parameters:
> >> > >
> >> > >nr_task: 100%
> >> > >mode: thread
> >> > >test: unlink2
> >> > >cpufreq_governor: performance
> >> > >
> >> > >test-description: Will It Scale takes a testcase and runs it from 1 
> >> > >through to n parallel copies to see if the testcase will scale. It 
> >> > >builds both a process and threads based test in order to see any 
> >> > >differences between the two.
> >> > >test-url: https://github.com/antonblanchard/will-it-scale
> >> > >
> >> > >In addition to that, the commit also has significant impact on the 
> >> > >following tests:
> >> > >
> >> > >+------------------+---------------------------------------------------------------+
> >> > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -29.9% regression |
> >> > >| test machine     | 288 threads Knights Mill with 80G memory                      |
> >> > >| test parameters  | cpufreq_governor=performance                                  |
> >> > >|                  | mode=thread                                                   |
> >> > >|                  | nr_task=100%                                                  |
> >> > >|                  | test=signal1                                                  |
> >> 
> >> Ok, I'm going to blame your testing system, or something here, and not
> >> the above patch.
> >> 
> >> All this test does is call raise(3).  That does not touch the driver
> >> core at all.
> >> 
> >> > >+------------------+---------------------------------------------------------------+
> >> > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -16.5% regression |
> >> > >| test machine     | 288 threads Knights Mill with 80G memory                      |
> >> > >| test parameters  | cpufreq_governor=performance                                  |
> >> > >|                  | mode=thread                                                   |
> >> > >|                  | nr_task=100%                                                  |
> >> > >|                  | test=open1                                                    |
> >> > >+------------------+---------------------------------------------------------------+
> >> 
> >> Same here, open1 just calls open/close a lot.  No driver core
> >> interaction at all there either.
> >> 
> >> So are you _sure_ this is the offending patch?
> >
> >Hi Greg,
> >
> >We did an experiment: we restored the original layout of struct device
> >and found that the regression is gone. I guess the regression is not
> >from the patch itself but is related to the struct layout.
> >
> >
> >tests: 1
> >testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-unlink2/lkp-knm01
> >
> >570d0200123fb4f8  a36dc70b810afe9183de2ea18f
> >----------------  --------------------------
> >         %stddev      change         %stddev
> >             \          |                \
> >    237096              14%     270789        will-it-scale.workload
> >       823              14%        939        will-it-scale.per_thread_ops
> >
> 
> Do you have the comparison between a36dc70b810afe9183de2ea18f and the one
> before 570d020012?

testcase/path_params/tbox_group/run: will-it-scale/performance-thread-100%-unlink2/lkp-knm01

4bd4e92cfe6d2af7  a36dc70b810afe9183de2ea18f
----------------  --------------------------
         %stddev     %change         %stddev
             \          |                \
    937.00            +0.2%     939.33        will-it-scale.per_thread_ops
    269989            +0.3%     270789        will-it-scale.workload

> >
> >tests: 1
> >testcase/path_params/tbox_group/run: 
> >will-it-scale/performance-thread-100%-signal1/lkp-knm01
> >
> >570d0200123fb4f8  a36dc70b810afe9183de2ea18f  
> >  --  
> > %stddev  change

Re: [PATCH 08/11] powercap/intel_rapl: Support multi-die/package

2019-02-20 Thread Len Brown

On Wed, Feb 20, 2019 at 6:02 AM Peter Zijlstra  wrote:

> >   list_for_each_entry(rp, &rapl_packages, plist) {
> > @@ -1457,7 +1457,7 @@ static void rapl_remove_package(struct rapl_package *rp)
> >  /* called from CPU hotplug notifier, hotplug lock held */
> >  static struct rapl_package *rapl_add_package(int cpu)
> >  {
> > - int id = topology_physical_package_id(cpu);
> > + int id = topology_unique_die_id(cpu);
> >   struct rapl_package *rp;
> >   int ret;
>
> And now your new function names are misnomers.

That is fair.

Seems that a subsequent re-name-only patch is appropriate.

Len Brown, Intel Open Source Technology Center

[PATCH v2 2/2] dmaengine: sprd: Change channel id to slave id for DMA cell specifier

2019-02-20 Thread Baolin Wang

We will describe the slave id in DMA cell specifier instead of DMA channel
id, thus we should save the slave id from DMA engine translation function,
and remove the channel id validation.

Meanwhile, we do not need to set the default slave id in
sprd_dma_alloc_chan_resources(), so remove it.

Signed-off-by: Baolin Wang 
---
Changes from v1:
 - Remove channel id from DT.
---
 drivers/dma/sprd-dma.c |   19 ---
 1 file changed, 4 insertions(+), 15 deletions(-)

diff --git a/drivers/dma/sprd-dma.c b/drivers/dma/sprd-dma.c
index e2f0167..48431e2 100644
--- a/drivers/dma/sprd-dma.c
+++ b/drivers/dma/sprd-dma.c
@@ -580,15 +580,7 @@ static irqreturn_t dma_irq_handle(int irq, void *dev_id)
 
 static int sprd_dma_alloc_chan_resources(struct dma_chan *chan)
 {
-   struct sprd_dma_chn *schan = to_sprd_dma_chan(chan);
-   int ret;
-
-   ret = pm_runtime_get_sync(chan->device->dev);
-   if (ret < 0)
-   return ret;
-
-   schan->dev_id = SPRD_DMA_SOFTWARE_UID;
-   return 0;
+   return pm_runtime_get_sync(chan->device->dev);
 }
 
 static void sprd_dma_free_chan_resources(struct dma_chan *chan)
@@ -1021,13 +1013,10 @@ static void sprd_dma_free_desc(struct virt_dma_desc *vd)
 static bool sprd_dma_filter_fn(struct dma_chan *chan, void *param)
 {
struct sprd_dma_chn *schan = to_sprd_dma_chan(chan);
-   struct sprd_dma_dev *sdev = to_sprd_dma_dev(&schan->vc.chan);
-   u32 req = *(u32 *)param;
+   u32 slave_id = *(u32 *)param;
 
-   if (req < sdev->total_chns)
-   return req == schan->chn_num + 1;
-   else
-   return false;
+   schan->dev_id = slave_id;
+   return true;
 }
 
 static int sprd_dma_probe(struct platform_device *pdev)
-- 
1.7.9.5

[PATCH v2 1/2] dt-bindings: dmaengine: sprd: Change channel id to slave id for DMA cell specifier

2019-02-20 Thread Baolin Wang

For the Spreadtrum DMA engine, all channels are equal, which means a slave can
request any channel by setting a unique slave id to trigger that channel.

Thus we can remove the channel id from the device tree and assign the channel
dynamically; in its place, we should add the slave id to the device tree.

Signed-off-by: Baolin Wang 
---
Changes from v1:
 - Remove channel id from DT.
---
 Documentation/devicetree/bindings/dma/sprd-dma.txt |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/dma/sprd-dma.txt 
b/Documentation/devicetree/bindings/dma/sprd-dma.txt
index 7a10fea..adccea994 100644
--- a/Documentation/devicetree/bindings/dma/sprd-dma.txt
+++ b/Documentation/devicetree/bindings/dma/sprd-dma.txt
@@ -31,7 +31,7 @@ DMA clients connected to the Spreadtrum DMA controller must 
use the format
 described in the dma.txt file, using a two-cell specifier for each channel.
 The two cells in order are:
 1. A phandle pointing to the DMA controller.
-2. The channel id.
+2. The slave id.
 
 spi0: spi@70a0{
...
-- 
1.7.9.5

Re: linux-next: Fixes tag needs some work in the net-next tree

2019-02-20 Thread Vinod Koul

On 20-02-19, 20:59, Stefano Brivio wrote:
> On Wed, 20 Feb 2019 11:02:01 -0800 (PST)
> David Miller  wrote:
> 
> > From: Jiri Pirko 
> > Date: Wed, 20 Feb 2019 09:36:11 +0100
> > 
> > > Would be good to have some robot checking "Fixes" sanity...  
> > 
> > I want to add a script to my trees that locally do it for me but the
> > backlog for patch review for me is so huge that I never get to "fun"
> > tasks like that
> 
> If it helps, this is what I use after being bitten once:
> 
> #!/bin/sh
> 
> [ ${#} -ne 2 ] && echo "Usage: ${0} PATCH_FILE GIT_TREE" && exit 1
> grep "^Fixes: " "${1}" | while read -r f; do
>   sha="$(echo "${f}" | cut -d' ' -f2)"
>   if [ -z "${sha}" ] || [ "${f}" != "$(git -C "${2}" show -s --abbrev=12 --pretty=format:"Fixes: %h (\"%s\")" "${sha}" 2>/dev/null)" ]; then
>   echo "Bad tag: ${f}" && exit 1
>   fi
> done

Awesome, thanks. I am adding this to my patch commit script as well as
my send script.

-- 
~Vinod

[PATCH v2] dax: Check the end of the block-device capacity with dax_direct_access()

2019-02-20 Thread Dan Williams

The checks in __bdev_dax_supported() helped mitigate a potential data
corruption bug in the pmem driver's handling of section alignment
padding. Strengthen the checks, including checking the end of the range,
to validate the dev_pagemap, Xarray entries, and sector-to-pfn
translation established for pmem namespaces.

Acked-by: Jan Kara 
Cc: "Darrick J. Wong" 
Signed-off-by: Dan Williams 
---
Changes in v2, simplify the calculation of the sector representing the
last page / pfn of the device.
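
As a worked example of that calculation: the diff below computes the
sector as PFN_DOWN(i_size_read(bdev->bd_inode) - 1) * 8. The factor of 8
corresponds to 4 KiB pages divided into 512-byte sectors, which is an
assumption of this sketch rather than something the patch states. A
userspace version, with hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12	/* assumes 4 KiB pages */
#define SKETCH_SECTOR_SHIFT	9	/* 512-byte sectors */

/* First sector of the last page of a device that is dev_bytes long. */
static uint64_t sketch_last_page_sector(uint64_t dev_bytes)
{
	uint64_t last_pfn;

	assert(dev_bytes > 0);
	/* PFN_DOWN(size - 1): index of the page holding the final byte. */
	last_pfn = (dev_bytes - 1) >> SKETCH_PAGE_SHIFT;
	/* Pages to sectors: multiply by 8 under the assumptions above. */
	return last_pfn << (SKETCH_PAGE_SHIFT - SKETCH_SECTOR_SHIFT);
}
```

For a 1 MiB device the last page has index 255, so the sector is
255 * 8 = 2040; a device of exactly one page yields sector 0.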

 drivers/dax/super.c |   38 --
 1 file changed, 28 insertions(+), 10 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 6e928f37d084..0cb8c30ea278 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -86,12 +86,14 @@ bool __bdev_dax_supported(struct block_device *bdev, int blocksize)
 {
struct dax_device *dax_dev;
bool dax_enabled = false;
+   pgoff_t pgoff, pgoff_end;
struct request_queue *q;
-   pgoff_t pgoff;
-   int err, id;
-   pfn_t pfn;
-   long len;
char buf[BDEVNAME_SIZE];
+   void *kaddr, *end_kaddr;
+   pfn_t pfn, end_pfn;
+   sector_t last_page;
+   long len, len2;
+   int err, id;
 
if (blocksize != PAGE_SIZE) {
pr_debug("%s: error: unsupported blocksize for dax\n",
@@ -113,6 +115,14 @@ bool __bdev_dax_supported(struct block_device *bdev, int blocksize)
return false;
}
 
+   last_page = PFN_DOWN(i_size_read(bdev->bd_inode) - 1) * 8;
+   err = bdev_dax_pgoff(bdev, last_page, PAGE_SIZE, &pgoff_end);
+   if (err) {
+   pr_debug("%s: error: unaligned partition for dax\n",
+   bdevname(bdev, buf));
+   return false;
+   }
+
dax_dev = dax_get_by_host(bdev->bd_disk->disk_name);
if (!dax_dev) {
pr_debug("%s: error: device does not support dax\n",
@@ -121,14 +131,15 @@ bool __bdev_dax_supported(struct block_device *bdev, int blocksize)
}
 
id = dax_read_lock();
-   len = dax_direct_access(dax_dev, pgoff, 1, NULL, &pfn);
+   len = dax_direct_access(dax_dev, pgoff, 1, &kaddr, &pfn);
+   len2 = dax_direct_access(dax_dev, pgoff_end, 1, &end_kaddr, &end_pfn);
dax_read_unlock(id);
 
put_dax(dax_dev);
 
-   if (len < 1) {
+   if (len < 1 || len2 < 1) {
pr_debug("%s: error: dax access failed (%ld)\n",
-   bdevname(bdev, buf), len);
+   bdevname(bdev, buf), len < 1 ? len : len2);
return false;
}
 
@@ -143,13 +154,20 @@ bool __bdev_dax_supported(struct block_device *bdev, int blocksize)
 */
WARN_ON(IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API));
dax_enabled = true;
-   } else if (pfn_t_devmap(pfn)) {
-   struct dev_pagemap *pgmap;
+   } else if (pfn_t_devmap(pfn) && pfn_t_devmap(end_pfn)) {
+   struct dev_pagemap *pgmap, *end_pgmap;
 
pgmap = get_dev_pagemap(pfn_t_to_pfn(pfn), NULL);
-   if (pgmap && pgmap->type == MEMORY_DEVICE_FS_DAX)
+   end_pgmap = get_dev_pagemap(pfn_t_to_pfn(end_pfn), NULL);
+   if (pgmap && pgmap == end_pgmap && pgmap->type == MEMORY_DEVICE_FS_DAX
+   && pfn_t_to_page(pfn)->pgmap == pgmap
+   && pfn_t_to_page(end_pfn)->pgmap == pgmap
+   && pfn_t_to_pfn(pfn) == PHYS_PFN(__pa(kaddr))
+   && pfn_t_to_pfn(end_pfn) == PHYS_PFN(__pa(end_kaddr)))
dax_enabled = true;
put_dev_pagemap(pgmap);
+   put_dev_pagemap(end_pgmap);
+
}
 
if (!dax_enabled) {
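
The v2 calculation of the last page's sector can be checked by hand. The following is a minimal userspace sketch, assuming 4096-byte pages and 512-byte sectors (which is where the factor of 8 in the patch comes from). The function name and the SKETCH_* constants are illustrative only and are not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this sketch: 4 KiB pages and 512-byte sectors. */
#define SKETCH_PAGE_SHIFT       12
#define SKETCH_SECTORS_PER_PAGE 8	/* 4096 / 512 */

/*
 * Mirrors PFN_DOWN(size - 1) * 8 from the patch: returns the first
 * sector of the page that holds the last byte of the device.
 */
static uint64_t last_page_sector(uint64_t size_bytes)
{
	assert(size_bytes > 0);	/* size - 1 would wrap for an empty device */
	return ((size_bytes - 1) >> SKETCH_PAGE_SHIFT) * SKETCH_SECTORS_PER_PAGE;
}
```

Under these assumptions a 1 GiB device has 262144 pages, so the last page is page 262143 and starts at sector 2097144. Subtracting one before shifting keeps a device whose size is an exact multiple of the page size from pointing one page past the end.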

Re: [PATCH 3/6] lib/string: Use correct docstring format

2019-02-20 Thread Kees Cook

On Wed, Feb 20, 2019 at 8:14 PM Randy Dunlap  wrote:
> It's already in Documentation/core-api/kernel-api.rst, under
> "String Manipulation."

Ah! Thanks, yes, I missed it. :)

-- 
Kees Cook

[PATCH] staging: rtlwifi: Use proper enum for return in halmac_parse_psd_data_88xx

2019-02-20 Thread Nathan Chancellor

Clang warns:

drivers/staging/rtlwifi/halmac/halmac_88xx/halmac_func_88xx.c:2472:11:
warning: implicit conversion from enumeration type 'enum
halmac_cmd_process_status' to different enumeration type 'enum
halmac_ret_status' [-Wenum-conversion]
return HALMAC_CMD_PROCESS_ERROR;
~~ ^~~~
1 warning generated.

Fix this by using the proper enum for allocation failures,
HALMAC_RET_MALLOC_FAIL, which is used in the rest of this file.

Fixes: e4b08e16b7d9 ("staging: r8822be: check kzalloc return or bail")
Link: https://github.com/ClangBuiltLinux/linux/issues/375
Signed-off-by: Nathan Chancellor 
---
 drivers/staging/rtlwifi/halmac/halmac_88xx/halmac_func_88xx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/rtlwifi/halmac/halmac_88xx/halmac_func_88xx.c b/drivers/staging/rtlwifi/halmac/halmac_88xx/halmac_func_88xx.c
index ec742da030db..ddbeff8224ab 100644
--- a/drivers/staging/rtlwifi/halmac/halmac_88xx/halmac_func_88xx.c
+++ b/drivers/staging/rtlwifi/halmac/halmac_88xx/halmac_func_88xx.c
@@ -2469,7 +2469,7 @@ halmac_parse_psd_data_88xx(struct halmac_adapter *halmac_adapter, u8 *c2h_buf,
if (!psd_set->data) {
psd_set->data = kzalloc(psd_set->data_size, GFP_KERNEL);
if (!psd_set->data)
-   return HALMAC_CMD_PROCESS_ERROR;
+   return HALMAC_RET_MALLOC_FAIL;
}
 
if (segment_id == 0)
-- 
2.21.0.rc1
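
For readers unfamiliar with this class of warning, here is a minimal sketch of the pattern. The two enumerations and the function are hypothetical stand-ins, not the real halmac definitions, and the numeric values are invented for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the two halmac enumerations. */
enum sketch_ret_status {
	SKETCH_RET_SUCCESS = 0,
	SKETCH_RET_MALLOC_FAIL = 1,
};

enum sketch_cmd_process_status {
	SKETCH_CMD_PROCESS_IDLE = 0,
	SKETCH_CMD_PROCESS_ERROR = 5,
};

/*
 * The function is declared to return sketch_ret_status, so every return
 * statement should use an enumerator of that type. Returning
 * SKETCH_CMD_PROCESS_ERROR here would still compile in C, but Clang
 * reports it under -Wenum-conversion, and a caller comparing the result
 * against sketch_ret_status values would misread it.
 */
static enum sketch_ret_status check_alloc_sketch(const void *data)
{
	if (data == NULL)
		return SKETCH_RET_MALLOC_FAIL;
	return SKETCH_RET_SUCCESS;
}
```

The fix in the patch has the same shape: the return type stays as it is and the returned enumerator is swapped for one that belongs to that type.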

Re: [PATCH 5/6] lib: Fix function documentation for strncpy_from_user

2019-02-20 Thread Tobin C. Harding

On Wed, Feb 20, 2019 at 05:05:10PM -0800, Kees Cook wrote:
> On Mon, Feb 18, 2019 at 4:52 PM Jann Horn  wrote:
> > AFAICS the byte_at_a_time loop exits when max==0 is reached, and then
> > if `res >= count` (in other words, if we've copied as many bytes as
> > requested, haven't encountered a null byte so far, and haven't reached
> > the end of the address space), we return `res`, which is the same as
> > `count`. Are you sure?
> 
> Oh, whew, there is only 1 arch-specific implementation of this. I
> thought you meant there were multiple implementations.
> 
> So, generally speaking, I'd love to split all strncpy* uses into
> strscpy_zero() (when expecting to do str->str copies), and some new
> function, named like mempadstr() or str2mem() that copies a str to a
> __nonstring char array, with trailing padding, if there is space. Then
> there is no more mixing the two cases and botching things.

Oh cool, treewide changes, I'm down with that.  So to v2 I'll add
str2mem() and then attack the tree as suggested.  What could possibly go
wrong :)?

Tobin
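
As a rough illustration of the second helper Kees describes, here is a userspace sketch. The name str2mem_sketch() and its signature are assumptions made for this example; no such function exists in the thread's patches. It copies a string into a fixed-size byte array that is not treated as a C string and zero-pads whatever space is left.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical str2mem(): copy the characters of a NUL-terminated string
 * into a fixed-size byte array that is NOT a C string. Unused trailing
 * bytes are zeroed. If the string fills the array exactly, or is longer,
 * no NUL terminator is stored. Returns the number of bytes copied.
 */
static size_t str2mem_sketch(char *dest, size_t dest_size, const char *src)
{
	size_t len = 0;

	/* Bounded length scan, so src is never read past dest_size bytes. */
	while (len < dest_size && src[len] != '\0')
		len++;

	memcpy(dest, src, len);
	memset(dest + len, 0, dest_size - len);
	return len;
}
```

The resulting bytes are the same as what strncpy() leaves in the destination. The point of a separate name is that the call site states the intent (a padded, possibly unterminated array) instead of looking like a string copy.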

Re: [PATCH 4/6] lib/string: Add string copy/zero function

2019-02-20 Thread Tobin C. Harding

On Wed, Feb 20, 2019 at 04:48:18PM -0800, Kees Cook wrote:
> On Mon, Feb 18, 2019 at 3:24 PM Tobin C. Harding  wrote:
> >
> > We have a function to copy strings safely and we have a function to copy
> > strings _and_ zero the tail of the destination (if source string is
> > shorter than destination buffer) but we do not have a function to do
> > both at once.  This means developers must write this themselves if they
> > desire this functionality.  This is a chore, and also leaves us open to
> > off by one errors unnecessarily.
> >
> > Add a function that calls strscpy() then memset()s the tail to zero if
> > the source string is shorter than the destination buffer.
> >
> > Add testing via kselftest.
> >
> > Signed-off-by: Tobin C. Harding 
> > ---
> >  include/linux/string.h |  4 
> >  lib/Kconfig.debug  |  2 +-
> >  lib/string.c   | 30 --
> >  lib/test_string.c  | 31 +++
> >  4 files changed, 64 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/string.h b/include/linux/string.h
> > index 7927b875f80c..695a5e6a31e3 100644
> > --- a/include/linux/string.h
> > +++ b/include/linux/string.h
> > @@ -31,6 +31,10 @@ size_t strlcpy(char *, const char *, size_t);
> >  #ifndef __HAVE_ARCH_STRSCPY
> >  ssize_t strscpy(char *, const char *, size_t);
> >  #endif
> > +
> > +/* Wrapper function, no arch specific code required */
> > +ssize_t strscpy_zeroed(char *dest, const char *src, size_t count);
> 
> bikeshed: I think "pad" is shorter and more descriptive. How about
> something like strspad() strscpy_pad() or strscpy_zero()? (just to
> shorten it slightly)

I like strscpy_pad()

> Not a blocker, just a TODO: we need a wrapper to do
> CONFIG_FORTIFY_SOURCE checking for strscpy() (and strscpy_zeroed()) to
> check for __builtin_object_size() vs the "size" argument, as done in
> strlcpy() in include/linux/string.h

I'll look into this for v2

> > @@ -238,6 +237,33 @@ ssize_t strscpy(char *dest, const char *src, size_t count)
> >  EXPORT_SYMBOL(strscpy);
> >  #endif
> >
> > +/**
> > + * strscpy_zeroed() - Copy a C-string into a sized buffer
> > + * @dest: Where to copy the string to
> > + * @src: Where to copy the string from
> > + * @count: Size of destination buffer
> > + *
> > + * If the source string is shorter than the destination buffer, zeros
> > + * the tail of the destination buffer.
> > + *
> > + * Return: The number of characters copied (not including the trailing
> > + * NUL) or -E2BIG if the destination buffer wasn't big enough.
> > + */
> > +ssize_t strscpy_zeroed(char *dest, const char *src, size_t count)
> > +{
> > +   ssize_t written;
> > +
> > +   written = strscpy(dest, src, count);
> > +   if (written < 0)
> > +   return written;
> 
> If written < 0 we filled everything (i.e. we wrote "count - 1" bytes).
> If we also exactly wrote "count - 1", then we also don't need the zero
> padding either, since strscpy wrote the trailing NUL.
> 
> so:
> 
> if (written < 0 || (count && written == count - 1))
> return written;
> 
> > +
> > +   if (written < count)
> > +   memset(dest + written, 0, count - written);
> 
> Now we know written must be [0, count - 2], so we can just:
> 
> memset(dest + written + 1, 0, count - written - 1);
> 
> The pattern (which should be added to the selftest) is:
> 
> count   source   written                              pad@
> 0       *        -E2BIG (0 char, 0 NUL, 0 to zero)
> 
> 1       "a"      -E2BIG (0 char, 1 NUL, 0 to zero)
> 1       ""       0      (0 char, 1 NUL, 0 to zero)
> 
> 2       "ab"     -E2BIG (1 char, 1 NUL, 0 to zero)
> 2       "a"      1      (1 char, 1 NUL, 0 to zero)
> 2       ""       0      (0 char, 1 NUL, 1 to zero)    dest + 1
> 
> 3       "abc"    -E2BIG (2 char, 1 NUL, 0 to zero)
> 3       "ab"     2      (2 char, 1 NUL, 0 to zero)
> 3       "a"      1      (1 char, 1 NUL, 1 to zero)    dest + 2
> 3       ""       0      (0 char, 1 NUL, 2 to zero)    dest + 1
> 
> 4       "abcd"   -E2BIG (3 char, 1 NUL, 0 to zero)
> 4       "abc"    3      (3 char, 1 NUL, 0 to zero)
> 4       "ab"     2      (2 char, 1 NUL, 1 to zero)    dest + 3
> 4       "a"      1      (1 char, 1 NUL, 2 to zero)    dest + 2
> 4       ""       0      (0 char, 1 NUL, 3 to zero)    dest + 1

So thorough, you're the man.
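
The review comments above can be tried out in userspace. The following is a sketch only: strscpy_sketch() is a simplified stand-in for the kernel's strscpy() (it omits the kernel's word-at-a-time path and its size sanity check), and strscpy_pad_sketch() applies the early return and the adjusted memset() that Kees suggests. Both names are made up for this example.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>	/* ssize_t */

/* Simplified stand-in for strscpy(): returns -E2BIG on truncation. */
static ssize_t strscpy_sketch(char *dest, const char *src, size_t count)
{
	size_t i;

	if (count == 0)
		return -E2BIG;
	for (i = 0; i < count - 1 && src[i] != '\0'; i++)
		dest[i] = src[i];
	dest[i] = '\0';
	return src[i] == '\0' ? (ssize_t)i : -E2BIG;
}

/* Padded variant following the review: copy, then zero the tail. */
static ssize_t strscpy_pad_sketch(char *dest, const char *src, size_t count)
{
	ssize_t written = strscpy_sketch(dest, src, count);

	/*
	 * Truncated, or string plus NUL filled the buffer exactly: there
	 * is nothing left to pad. count is nonzero whenever written >= 0.
	 */
	if (written < 0 || (size_t)written == count - 1)
		return written;

	/* written is in [0, count - 2]; zero everything after the NUL. */
	memset(dest + written + 1, 0, count - (size_t)written - 1);
	return written;
}
```

With a 4-byte buffer pre-filled with a marker byte, copying "a" leaves the bytes 'a', 0, 0, 0 and returns 1, which corresponds to the "4 "a"" row of the table (two bytes to zero, starting at dest + 2).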

> > +
> > +   return written;
> > +}
> > +EXPORT_SYMBOL(strscpy_zeroed);
> > +
> >  #ifndef __HAVE_ARCH_STRCAT
> >  /**
> >   * strcat - Append one %NUL-terminated string to another
> > diff --git a/lib/test_string.c b/lib/test_string.c
> > index a9cba442389a..cc4eef51a395 100644
> > --- a/lib/test_string.c
> > +++ b/lib/test_string.c
> > @@ -111,6 +111,32 @@ static __init int memset64_selftest(void)
> > return 0;
> >  }
> >
> > +static __init int strscpy_zeroed_selftest(void)
> > +{
> > +   char

Re: [PATCH 2/6] lib/string: Fix erroneous 'overflow' documentation

2019-02-20 Thread Tobin C. Harding

On Wed, Feb 20, 2019 at 04:02:37PM -0800, Kees Cook wrote:
> On Mon, Feb 18, 2019 at 3:24 PM Tobin C. Harding  wrote:
> >
> > Current documentation uses 'overflow' to describe a situation where less
> > data is written to a buffer than buffer size not more.  'overflow' is
> > the wrong word here - since we don't typically say 'underflow' change
> > the whole sentence.
> >
> > Fix erroneous 'overflow' documentation for under filled buffer.
> >
> > Signed-off-by: Tobin C. Harding 
> > ---
> >  lib/string.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/string.c b/lib/string.c
> > index 38e4ca08e757..7f1d72db53c5 100644
> > --- a/lib/string.c
> > +++ b/lib/string.c
> > @@ -173,8 +173,8 @@ EXPORT_SYMBOL(strlcpy);
> >   *
> >   * Preferred to strncpy() since it always returns a valid string, and
> >   * doesn't unnecessarily force the tail of the destination buffer to be
> > - * zeroed.  If the zeroing is desired, it's likely cleaner to use strscpy()
> > - * with an overflow test, then just memset() the tail of the dest buffer.
> > + * zeroed.  If the zeroing is desired, it's likely cleaner to use strscpy(),
> > + * check the return size, then just memset() the tail of the dest buffer.
> >   */
> 
> I'd just fold this patch into the strscpy_zeroed() patch. No need for
> a kind of "no op" change here when we'll just change it again with
> better advice ("use strscpy_zeroed()!")

Got it.

thanks,
Tobin.

[RFC PATCH 1/1] f2fs-dev: ioctl for removing a range from F2FS

2019-02-20 Thread sunqiuyang

From: Qiuyang Sun 

This ioctl shrinks a given length (aligned to sections) from the end of the
main area. Any cursegs and valid blocks will be moved out before
invalidating the range.

This feature can be used for adjusting partition sizes online.

Signed-off-by: Qiuyang Sun 
---
 fs/f2fs/f2fs.h|  9 ++
 fs/f2fs/file.c| 28 +++
 fs/f2fs/gc.c  | 83 +--
 fs/f2fs/segment.c | 47 +++
 fs/f2fs/segment.h |  1 +
 fs/f2fs/super.c   |  1 +
 6 files changed, 156 insertions(+), 13 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 8c69e12..fd7f3ba 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -406,6 +406,8 @@ static inline bool __has_cursum_space(struct f2fs_journal *journal,
 #define F2FS_IOC_SET_PIN_FILE  _IOW(F2FS_IOCTL_MAGIC, 13, __u32)
 #define F2FS_IOC_GET_PIN_FILE  _IOR(F2FS_IOCTL_MAGIC, 14, __u32)
 #define F2FS_IOC_PRECACHE_EXTENTS  _IO(F2FS_IOCTL_MAGIC, 15)
+#define F2FS_IOC_RESIZE_FROM_END   _IOWR(F2FS_IOCTL_MAGIC, 16, \
+   struct f2fs_resize_from_end)
 
 #define F2FS_IOC_SET_ENCRYPTION_POLICY FS_IOC_SET_ENCRYPTION_POLICY
 #define F2FS_IOC_GET_ENCRYPTION_POLICY FS_IOC_GET_ENCRYPTION_POLICY
@@ -457,6 +459,10 @@ struct f2fs_flush_device {
u32 segments;   /* # of segments to flush */
 };
 
+struct f2fs_resize_from_end {
+   u64 len;/* bytes to shrink */
+};
+
 /* for inline stuff */
 #define DEF_INLINE_RESERVED_SIZE   1
 static inline int get_extra_isize(struct inode *inode);
@@ -1226,6 +1232,7 @@ struct f2fs_sb_info {
unsigned int segs_per_sec;  /* segments per section */
unsigned int secs_per_zone; /* sections per zone */
unsigned int total_sections;/* total section count */
+   unsigned int new_total_sections;/* for resize from end */
unsigned int total_node_count;  /* total node block count */
unsigned int total_valid_node_count;/* valid node block count */
loff_t max_file_blocks; /* max block index of file */
@@ -3008,6 +3015,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
 int f2fs_disable_cp_again(struct f2fs_sb_info *sbi);
 void f2fs_release_discard_addrs(struct f2fs_sb_info *sbi);
 int f2fs_npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra);
+void allocate_segment_for_resize(struct f2fs_sb_info *sbi, int type);
 void f2fs_allocate_new_segments(struct f2fs_sb_info *sbi);
 int f2fs_trim_fs(struct f2fs_sb_info *sbi, struct fstrim_range *range);
 bool f2fs_exist_trim_candidates(struct f2fs_sb_info *sbi,
@@ -3146,6 +3154,7 @@ int f2fs_migrate_page(struct address_space *mapping, struct page *newpage,
 int f2fs_gc(struct f2fs_sb_info *sbi, bool sync, bool background,
unsigned int segno);
 void f2fs_build_gc_manager(struct f2fs_sb_info *sbi);
+int f2fs_resize_from_end(struct f2fs_sb_info *sbi, size_t resize_len);
 
 /*
  * recovery.c
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index b8f5d12..29e70fd 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -2968,6 +2968,32 @@ static int f2fs_ioc_precache_extents(struct file *filp, unsigned long arg)
return f2fs_precache_extents(file_inode(filp));
 }
 
+static int f2fs_ioc_resize_from_end(struct file *filp, unsigned long arg)
+{
+   struct f2fs_sb_info *sbi = F2FS_I_SB(file_inode(filp));
+   struct f2fs_resize_from_end param;
+   int ret;
+
+   if (!capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
+   if (f2fs_readonly(sbi->sb))
+   return -EROFS;
+
+   if (copy_from_user(&param, (struct f2fs_resize_from_end __user *)arg,
+   sizeof(param)))
+   return -EFAULT;
+
+   ret = mnt_want_write_file(filp);
+   if (ret)
+   return ret;
+
+   ret = f2fs_resize_from_end(sbi, param.len);
+   mnt_drop_write_file(filp);
+
+   return ret;
+}
+
 long f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 {
if (unlikely(f2fs_cp_error(F2FS_I_SB(file_inode(filp)))))
@@ -3024,6 +3050,8 @@ long f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
return f2fs_ioc_set_pin_file(filp, arg);
case F2FS_IOC_PRECACHE_EXTENTS:
return f2fs_ioc_precache_extents(filp, arg);
+   case F2FS_IOC_RESIZE_FROM_END:
+   return f2fs_ioc_resize_from_end(filp, arg);
default:
return -ENOTTY;
}
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 195cf0f..3877e99 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -311,7 +311,7 @@ static int get_victim_by_default(struct f2fs_sb_info *sbi,
struct sit_info *sm = SIT_I(sbi);
struct victim_sel_policy p;
unsigned int secno, last_victim;
-   unsigned int last_segment = MAIN_SEGS(sbi);
+

Re: [PATCH 1/6] lib/string: Enable string selftesting

2019-02-20 Thread Tobin C. Harding

On Wed, Feb 20, 2019 at 03:57:18PM -0800, Kees Cook wrote:
> On Mon, Feb 18, 2019 at 3:24 PM Tobin C. Harding  wrote:
> >
> > Currently we have a test module but it is not tied into the kselftest
> > infrastructure.  In preparation for adding string manipulation functions
> > and testing we should enable kselftest to utilize the test module.
> >
> > Enable string testing via kselftest infrastructure.
> >
> > Signed-off-by: Tobin C. Harding 
> > ---
> >  lib/Kconfig.debug | 14 ++
> >  lib/Makefile  |  2 +-
> >  lib/test_string.c |  4 ++--
> >  tools/testing/selftests/lib/Makefile  |  2 +-
> >  tools/testing/selftests/lib/config|  1 +
> >  tools/testing/selftests/lib/string.sh | 19 +++
> >  6 files changed, 38 insertions(+), 4 deletions(-)
> >  create mode 100755 tools/testing/selftests/lib/string.sh
> >
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index d4df5b24d75e..0dca64c1d8a4 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -1802,8 +1802,22 @@ config ASYNC_RAID6_TEST
> >  config TEST_HEXDUMP
> > tristate "Test functions located in the hexdump module at runtime"
> >
> > +config TEST_STRING
> > +   tristate "Perform selftest on string manipulation functions"
> > +   default n
> > +   help
> > +Enable this option to test string manipulation functions.
> > +   Currently this only tests memset_{16,32,64}.
> > +
> > +   If unsure, say N.
> > +
> >  config TEST_STRING_HELPERS
> > tristate "Test functions located in the string_helpers module at runtime"
> > +   default n
> > +   help
> > +Enable this option to unit test code in lib/string_helpers.c
> > +
> > +If unsure, say N.
> >
> >  config TEST_KSTRTOX
> > tristate "Test kstrto*() family of functions at runtime"
> > diff --git a/lib/Makefile b/lib/Makefile
> > index e1b59da71418..9c30e1fee27f 100644
> > --- a/lib/Makefile
> > +++ b/lib/Makefile
> > @@ -39,7 +39,7 @@ obj-y += bcd.o div64.o sort.o parser.o debug_locks.o random32.o \
> >  bsearch.o find_bit.o llist.o memweight.o kfifo.o \
> >  percpu-refcount.o rhashtable.o reciprocal_div.o \
> >  once.o refcount.o usercopy.o errseq.o bucket_locks.o
> > -obj-$(CONFIG_STRING_SELFTEST) += test_string.o
> > +obj-$(CONFIG_TEST_STRING) += test_string.o
> 
> This patch should remove 'config STRING_SELFTEST' from lib/Kconfig too.
> 
> > diff --git a/tools/testing/selftests/lib/Makefile b/tools/testing/selftests/lib/Makefile
> > index 70d5711e3ac8..2ee4559b277e 100644
> > --- a/tools/testing/selftests/lib/Makefile
> > +++ b/tools/testing/selftests/lib/Makefile
> > @@ -3,6 +3,6 @@
> >  # No binaries, but make sure arg-less "make" doesn't trigger "run_tests"
> >  all:
> >
> > -TEST_PROGS := printf.sh bitmap.sh prime_numbers.sh
> > +TEST_PROGS := printf.sh bitmap.sh prime_numbers.sh string.sh
> >
> >  include ../lib.mk
> > diff --git a/tools/testing/selftests/lib/config b/tools/testing/selftests/lib/config
> > index 126933bcc950..2032402ad409 100644
> > --- a/tools/testing/selftests/lib/config
> > +++ b/tools/testing/selftests/lib/config
> > @@ -1,3 +1,4 @@
> >  CONFIG_TEST_PRINTF=m
> >  CONFIG_TEST_BITMAP=m
> > +CONFIG_TEST_STRING=m
> >  CONFIG_PRIME_NUMBERS=m
> > diff --git a/tools/testing/selftests/lib/string.sh b/tools/testing/selftests/lib/string.sh
> > new file mode 100755
> > index ..99024b6f3a6a
> > --- /dev/null
> > +++ b/tools/testing/selftests/lib/string.sh
> > @@ -0,0 +1,19 @@
> > +#!/bin/sh
> > +# SPDX-License-Identifier: GPL-2.0
> > +# Runs string manipulation tests using test_string kernel module
> > +
> > +# Kselftest framework requirement - SKIP code is 4.
> > +ksft_skip=4
> > +
> > +if ! /sbin/modprobe -q -n test_string; then
> > +   echo "string: module test_string is not found [SKIP]"
> > +   exit $ksft_skip
> > +fi
> > +
> > +if /sbin/modprobe -q test_string; then
> > +   /sbin/modprobe -q -r test_string
> > +   echo "string: ok"
> > +else
> > +   echo "string: [FAIL]"
> > +   exit 1
> > +fi
> > --
> > 2.20.1
> >
> 
> You mentioned "redundant scripts" here. You might want to refactor
> first, and have a common tool that does the core testing, and then
> have the scripts doing one line each:
> 
> i.e.:
> 
> #!/bin/bash
> exec ./test_module.sh prime_numbers selftest=65536
> 
> with "test_module.sh" doing all the rest.
> 
> I bet there are other test_*.ko tests we could wire up too, and this
> refactor will make that much easier. And actually, maybe we should
> just have a single test running that just reads the "config" file for
> the list of test modules, and runs them with the correct output format
> to show which are skipped, etc.

Got it.  I'll have a go at all this and pre-pend it to v2.

thanks,
Tobin.

Re: [PATCH 0/6] lib: Add safe string funtions

2019-02-20 Thread Tobin C. Harding

On Wed, Feb 20, 2019 at 03:31:07PM -0800, Kees Cook wrote:
> On Mon, Feb 18, 2019 at 3:24 PM Tobin C. Harding  wrote:
> > During your talk at LCA you mentioned that we could do with a couple
> > more safe string functions.  One to zero the tail of the destination
> > buffer after call to strscpy() and also the self explanatory
> > strscpy_from_user().
> 
> Thanks for jumping in with this! :)

Good to be working with you again.

> > I couldn't work out if this is a false positive or not?  Does the new
> > config option CONFIG_TEST_STRING need more documentation?  I don't see
> > where extra docs should be added and it seems self explanatory as is.
> 
> Usually this just means the help string in Kconfig is "too short".
> Sometimes this is a false positive -- really up to you if you think it
> needs more. :)

Cool, thanks.

Re: [PATCH] kbuild: compute false-positive -Wmaybe-uninitialized cases in Kconfig

2019-02-20 Thread Nathan Chancellor

On Thu, Feb 21, 2019 at 01:13:38PM +0900, Masahiro Yamada wrote:
> Since -Wmaybe-uninitialized was introduced by GCC 4.7, we have patched
> various false positives:
> 
>  - commit e74fc973b6e5 ("Turn off -Wmaybe-uninitialized when building
>with -Os") turned off this option for -Os.
> 
>  - commit 815eb71e7149 ("Kbuild: disable 'maybe-uninitialized' warning
>for CONFIG_PROFILE_ALL_BRANCHES") turned off this option for
>CONFIG_PROFILE_ALL_BRANCHES
> 
>  - commit a76bcf557ef4 ("Kbuild: enable -Wmaybe-uninitialized warning
>for "make W=1"") turned off this option for GCC < 4.9
>Arnd provided more explanation in https://lkml.org/lkml/2017/3/14/903
> 
> I think this looks better by shifting the logic from Makefile to Kconfig.
> 

I agree!

> Signed-off-by: Masahiro Yamada 

Reviewed-by: Nathan Chancellor 

> ---
> 
>  Makefile | 10 +++---
>  init/Kconfig | 17 +
>  kernel/trace/Kconfig |  1 +
>  3 files changed, 21 insertions(+), 7 deletions(-)
> 
> diff --git a/Makefile b/Makefile
> index 1bb0535..b21aa2e3 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -656,17 +656,13 @@ KBUILD_CFLAGS   += $(call cc-disable-warning, int-in-bool-context)
>  
>  ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
>  KBUILD_CFLAGS+= $(call cc-option,-Oz,-Os)
> -KBUILD_CFLAGS+= $(call cc-disable-warning,maybe-uninitialized,)
> -else
> -ifdef CONFIG_PROFILE_ALL_BRANCHES
> -KBUILD_CFLAGS+= -O2 $(call cc-disable-warning,maybe-uninitialized,)
>  else
>  KBUILD_CFLAGS   += -O2
>  endif
> -endif
>  
> -KBUILD_CFLAGS += $(call cc-ifversion, -lt, 0409, \
> - $(call cc-disable-warning,maybe-uninitialized,))
> +ifdef CONFIG_CC_DISABLE_WARN_MAYBE_UNINITIALIZED
> +KBUILD_CFLAGS   += -Wno-maybe-uninitialized
> +endif
>  
>  # Tell gcc to never replace conditional load with a non-conditional one
>  KBUILD_CFLAGS+= $(call cc-option,--param=allow-store-data-races=0)
> diff --git a/init/Kconfig b/init/Kconfig
> index c9386a3..1f05a88 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -26,6 +26,22 @@ config CLANG_VERSION
>  config CC_HAS_ASM_GOTO
>   def_bool $(success,$(srctree)/scripts/gcc-goto.sh $(CC))
>  
> +config CC_HAS_WARN_MAYBE_UNINITIALIZED
> + def_bool $(cc-option,-Wmaybe-uninitialized)
> + help
> +   GCC >= 4.7 supports this option.
> +
> +config CC_DISABLE_WARN_MAYBE_UNINITIALIZED
> + bool
> + depends on CC_HAS_WARN_MAYBE_UNINITIALIZED
> + default CC_IS_GCC && GCC_VERSION < 40900  # unreliable for GCC < 4.9
> + help
> +   GCC's -Wmaybe-uninitialized is not reliable by definition.
> +   Lots of false positive warnings are produced in some cases.
> +
> +   If this option is enabled, -Wno-maybe-uninitialized is passed
> +   to the compiler to suppress maybe-uninitialized warnings.
> +
>  config CONSTRUCTORS
>   bool
>   depends on !UML
> @@ -1113,6 +1129,7 @@ config CC_OPTIMIZE_FOR_PERFORMANCE
>  
>  config CC_OPTIMIZE_FOR_SIZE
>   bool "Optimize for size"
> + imply CC_DISABLE_WARN_MAYBE_UNINITIALIZED  # avoid false positives
>   help
> Enabling this option will pass "-Os" instead of "-O2" to
> your compiler resulting in a smaller kernel.
> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> index fa8b1fe..8bd1d6d 100644
> --- a/kernel/trace/Kconfig
> +++ b/kernel/trace/Kconfig
> @@ -370,6 +370,7 @@ config PROFILE_ANNOTATED_BRANCHES
>  config PROFILE_ALL_BRANCHES
>   bool "Profile all if conditionals" if !FORTIFY_SOURCE
>   select TRACE_BRANCH_PROFILING
> + imply CC_DISABLE_WARN_MAYBE_UNINITIALIZED  # avoid false positives
>   help
> This tracer profiles all branch conditions. Every if ()
> taken in the kernel is recorded whether it hit or miss.
> -- 
> 2.7.4
>

[PATCH] f2fs: no need to take page lock in readdir

2019-02-20 Thread Gao Xiang

VFS takes inode_lock for readdir, so there is no need to
take the page lock in readdir at all, just like the majority of
other generic filesystems.

This patch improves concurrency since .iterate_shared
was introduced to VFS years ago.

Signed-off-by: Gao Xiang 
---

 I personally tend to use read_mapping_page here, but it seems
 that f2fs has some remaining customized code since it was
 merged into Linux, so use f2fs_find_data_page instead.

 fs/f2fs/dir.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index ecc3a4e2be96..64602bc1e092 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -873,7 +873,7 @@ static int f2fs_readdir(struct file *file, struct dir_context *ctx)
page_cache_sync_readahead(inode->i_mapping, ra, file, n,
min(npages - n, (pgoff_t)MAX_DIR_RA_PAGES));
 
-   dentry_page = f2fs_get_lock_data_page(inode, n, false);
+   dentry_page = f2fs_find_data_page(inode, n);
if (IS_ERR(dentry_page)) {
err = PTR_ERR(dentry_page);
if (err == -ENOENT) {
@@ -891,11 +891,11 @@ static int f2fs_readdir(struct file *file, struct dir_context *ctx)
err = f2fs_fill_dentries(ctx, ,
n * NR_DENTRY_IN_BLOCK, );
if (err) {
-   f2fs_put_page(dentry_page, 1);
+   f2fs_put_page(dentry_page, 0);
break;
}
 
-   f2fs_put_page(dentry_page, 1);
+   f2fs_put_page(dentry_page, 0);
}
 out_free:
fscrypt_fname_free_buffer();
-- 
2.14.4

Re: [LKP] [driver core] 570d020012: will-it-scale.per_thread_ops -12.2% regression

2019-02-20 Thread Huang, Ying

Wei Yang  writes:

> On Thu, Feb 21, 2019 at 11:10:49AM +0800, kernel test robot wrote:
>>On Tue, Feb 19, 2019 at 01:19:04PM +0100, Greg Kroah-Hartman wrote:
>>> On Tue, Feb 19, 2019 at 08:59:45AM +0800, Wei Yang wrote:
>>> > On Mon, Feb 18, 2019 at 03:54:42PM +0800, kernel test robot wrote:
>>> > >Greeting,
>>> > >
>>> > >FYI, we noticed a -12.2% regression of will-it-scale.per_thread_ops due 
>>> > >to commit:
>>> > >
>>> > >
>>> > >commit: 570d0200123fb4f809aa2f6226e93a458d664d70 ("driver core: move 
>>> > >device->knode_class to device_private")
>>> > >https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>> > >
>>> > 
>>> > This is interesting.
>>> > 
>>> > I didn't expect the move of this field will impact the performance.
>>> > 
>>> > The reason is struct device is a hotter memory than 
>>> > device->device_private?
>>> > 
>>> > >in testcase: will-it-scale
>>> > >on test machine: 288 threads Knights Mill with 80G memory
>>> > >with following parameters:
>>> > >
>>> > > nr_task: 100%
>>> > > mode: thread
>>> > > test: unlink2
>>> > > cpufreq_governor: performance
>>> > >
>>> > >test-description: Will It Scale takes a testcase and runs it from 1 
>>> > >through to n parallel copies to see if the testcase will scale. It 
>>> > >builds both a process and threads based test in order to see any 
>>> > >differences between the two.
>>> > >test-url: https://github.com/antonblanchard/will-it-scale
>>> > >
>>> > >In addition to that, the commit also has significant impact on the 
>>> > >following tests:
>>> > >
>>> > >+------------------+---------------------------------------------------------------+
>>> > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -29.9% regression |
>>> > >| test machine     | 288 threads Knights Mill with 80G memory                      |
>>> > >| test parameters  | cpufreq_governor=performance                                  |
>>> > >|                  | mode=thread                                                   |
>>> > >|                  | nr_task=100%                                                  |
>>> > >|                  | test=signal1                                                  |
>>> Ok, I'm going to blame your testing system, or something here, and not
>>> the above patch.
>>> 
>>> All this test does is call raise(3).  That does not touch the driver
>>> core at all.
>>> 
>>> > >+------------------+---------------------------------------------------------------+
>>> > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -16.5% regression |
>>> > >| test machine     | 288 threads Knights Mill with 80G memory                      |
>>> > >| test parameters  | cpufreq_governor=performance                                  |
>>> > >|                  | mode=thread                                                   |
>>> > >|                  | nr_task=100%                                                  |
>>> > >|                  | test=open1                                                    |
>>> > >+------------------+---------------------------------------------------------------+
>>> 
>>> Same here, open1 just calls open/close a lot.  No driver core
>>> interaction at all there either.
>>> 
>>> So are you _sure_ this is the offending patch?
>>
>>Hi Greg,
>>
>>We did an experiment: we restored the layout of struct device, and we
>>found the regression is gone. I guess the regression is not from the
>>patch itself but is related to the struct layout.
>>
>>
>>tests: 1
>>testcase/path_params/tbox_group/run: 
>>will-it-scale/performance-thread-100%-unlink2/lkp-knm01
>>
>>570d0200123fb4f8  a36dc70b810afe9183de2ea18f  
>>  --  
>> %stddev  change %stddev
>> \  |\  
>>237096  14% 270789will-it-scale.workload
>>   823  14%939will-it-scale.per_thread_ops
>>
>
> Do you have the comparison between a36dc70b810afe9183de2ea18f and the one
> before 570d020012?
>
>>
>>tests: 1
>>testcase/path_params/tbox_group/run: 
>>will-it-scale/performance-thread-100%-signal1/lkp-knm01
>>
>>570d0200123fb4f8  a36dc70b810afe9183de2ea18f  
>>  --  
>> %stddev  change %stddev
>> \  |\  
>> 93.51   3%48% 138.53   3%  will-it-scale.time.user_time
>>   186  40%261will-it-scale.per_thread_ops
>> 53909  40%  75507will-it-scale.workload
>>
>>
>>tests: 1
>>testcase/path_params/tbox_group/run: 
>>will-it-scale/performance-thread-100%-open1/lkp-knm01
>>
>>570d0200123fb4f8  a36dc70b810afe9183de2ea18f  
>>  --  
>> %stddev  change %stddev
>> \  |\  
>>
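
The struct-layout hypothesis raised in this thread can be illustrated in isolation. The sketch below uses two invented structures, not the real struct device, and assumes a 64-byte cache line. It only shows that removing a member changes which cache line later members land in; it does not establish that this is what caused the reported regression.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE 64	/* assumed cache-line size */

/* Two 8-byte links, standing in for a list node. */
struct sketch_node {
	uint64_t next;
	uint64_t prev;
};

/* Layout before: a list node sits ahead of a frequently used member. */
struct sketch_dev_before {
	char hdr[48];
	struct sketch_node knode;	/* the member that gets moved away */
	uint64_t hot_counter;
};

/* Layout after: removing the node shifts every later member down. */
struct sketch_dev_after {
	char hdr[48];
	uint64_t hot_counter;
};

/* Index of the cache line that a given byte offset falls into. */
static size_t cache_line_of(size_t offset)
{
	return offset / SKETCH_CACHE_LINE;
}
```

Here hot_counter moves from offset 64 (cache line 1) to offset 48 (cache line 0). On a real structure the same kind of shift can place a frequently written member next to frequently read ones, or the reverse, which would be consistent with the benchmark numbers changing once the old layout was restored.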

Re: [PATCH] cpufreq: kyro: Reduce frame-size of qcom_cpufreq_kryo_probe()

2019-02-20 Thread Viresh Kumar

On 21-02-19, 10:02, Amit Kucheria wrote:
> Perhaps I was just unfamiliar with the dev_pm_opp_set_supported_hw()
> API where the actual allocation happens 3 levels deep. Maybe the
> comment should apply to dev_pm_opp_set_supported_hw(). I leave it to
> you to decide.

I think we are fine without any comments here :)

Thanks for your reviews Amit.

-- 
viresh
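
For anyone following the thread, the arithmetic behind the frame-size warning is short. The patch under review replaces a local `struct opp_table *opp_tables[NR_CPUS]` array with a kcalloc() allocation; with NR_CPUS at 256 and 8-byte pointers, the array alone takes 2048 bytes of stack. The userspace sketch below uses calloc() as an analogue for kcalloc(), and every name in it is a stand-in chosen for this example.

```c
#include <stdlib.h>

#define SKETCH_NR_CPUS 256	/* the arm64 default referred to in the patch */

struct opp_table_sketch;	/* opaque stand-in for struct opp_table */

/* Bytes that a 'struct opp_table *opp_tables[NR_CPUS]' local occupies. */
static size_t stack_array_bytes(void)
{
	return SKETCH_NR_CPUS * sizeof(struct opp_table_sketch *);
}

/*
 * Heap alternative, analogous to kcalloc(num_possible_cpus(), ...):
 * a zeroed array of pointers sized at run time. The caller frees it.
 */
static struct opp_table_sketch **alloc_tables(size_t nr_cpus)
{
	return calloc(nr_cpus, sizeof(struct opp_table_sketch *));
}
```

Sizing the allocation by the number of possible CPUs rather than NR_CPUS also means the array only grows as large as the running system needs.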

What are the new features of Linux Kernel 5.0-rc7?

2019-02-20 Thread Turritopsis Dohrnii Teo En Ming

Good afternoon from Singapore,

What are the new features of Linux Kernel 5.0-rc7?

Thank you.

===BEGIN EMAIL SIGNATURE===

The Gospel for all Targeted Individuals (TIs):

[The New York Times] Microwave Weapons Are Prime Suspect in Ills of
U.S. Embassy Workers

Link: 
https://www.nytimes.com/2018/09/01/science/sonic-attack-cuba-microwave.html



Singaporean Mr. Turritopsis Dohrnii Teo En Ming's Academic
Qualifications as at 14 Feb 2019

[1] https://tdtemcerts.wordpress.com/

[2] https://tdtemcerts.blogspot.sg/

[3] https://www.scribd.com/user/270125049/Teo-En-Ming

===END EMAIL SIGNATURE===

Re: [PATCH] cpufreq: kyro: Reduce frame-size of qcom_cpufreq_kryo_probe()

2019-02-20 Thread Amit Kucheria

On Thu, Feb 21, 2019 at 9:15 AM Viresh Kumar  wrote:
>
> On 20-02-19, 21:56, Amit Kucheria wrote:
> > On Wed, Feb 20, 2019 at 4:44 PM Viresh Kumar  
> > wrote:
> > >
> > > With the introduction of commit 846a415bf440 ("arm64: default NR_CPUS to
> > > 256"), we have started getting following compilation warning:
> > >
> > > qcom-cpufreq-kryo.c:168:1: warning: the frame size of 2160 bytes is 
> > > larger than 2048 bytes [-Wframe-larger-than=]
> > >
> > > Fix that by dynamically allocating opp_tables and freeing it later.
> > >
> > > Compile tested only.
> > >
> > > Signed-off-by: Viresh Kumar 
> > > ---
> > >  drivers/cpufreq/qcom-cpufreq-kryo.c | 9 -
> > >  1 file changed, 8 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/cpufreq/qcom-cpufreq-kryo.c 
> > > b/drivers/cpufreq/qcom-cpufreq-kryo.c
> > > index 1c8583cc06a2..6888cb6db2ef 100644
> > > --- a/drivers/cpufreq/qcom-cpufreq-kryo.c
> > > +++ b/drivers/cpufreq/qcom-cpufreq-kryo.c
> > > @@ -75,7 +75,7 @@ static enum _msm8996_version 
> > > qcom_cpufreq_kryo_get_msm_id(void)
> > >
> > >  static int qcom_cpufreq_kryo_probe(struct platform_device *pdev)
> > >  {
> > > -   struct opp_table *opp_tables[NR_CPUS] = {0};
> > > +   struct opp_table **opp_tables;
> > > enum _msm8996_version msm8996_version;
> > > struct nvmem_cell *speedbin_nvmem;
> > > struct device_node *np;
> > > @@ -133,6 +133,10 @@ static int qcom_cpufreq_kryo_probe(struct 
> > > platform_device *pdev)
> > > }
> > > kfree(speedbin);
> > >
> > > +   opp_tables = kcalloc(num_possible_cpus(), sizeof(*opp_tables), 
> > > GFP_KERNEL);
> > > +   if (!opp_tables)
> > > +   return -ENOMEM;
> > > +
> >
> > > Perhaps add a comment above it saying that the actual opp_table is
> > > allocated in the loop below because of dev_pm_opp_set_supported_hw?
> >
> > I was staring at this for a few minutes wondering why you needed this
> > kcalloc before I realised that opp_tables (missed the 's') is a
> > temporary array of pointers. :-)
>
> I feel that you got confused because this patch didn't have the diff
> where the opp_tables thing is getting used. When we see the .c file
> itself, it is pretty clear what is going on and I believe the
> comment would be totally unnecessary and redundant.
>
> This is how it looks now, please lemme know if you still prefer the
> comment :)

Perhaps I was just unfamiliar with the dev_pm_opp_set_supported_hw()
API where the actual allocation happens 3 levels deep. Maybe the
comment should apply to dev_pm_opp_set_supported_hw(). I leave it to
you to decide.

>         opp_tables = kcalloc(num_possible_cpus(), sizeof(*opp_tables),
>                              GFP_KERNEL);
>         if (!opp_tables)
>                 return -ENOMEM;
>
>         for_each_possible_cpu(cpu) {
>                 cpu_dev = get_cpu_device(cpu);
>                 if (NULL == cpu_dev) {
>                         ret = -ENODEV;
>                         goto free_opp;
>                 }
>
>                 opp_tables[cpu] = dev_pm_opp_set_supported_hw(cpu_dev,
>                                                               , 1);
>                 if (IS_ERR(opp_tables[cpu])) {
>                         ret = PTR_ERR(opp_tables[cpu]);
>                         dev_err(cpu_dev, "Failed to set supported hardware\n");
>                         goto free_opp;
>                 }
>         }
>
>         kfree(opp_tables);
>
>
> --
> viresh

RE: [PATCH v3 2/2] drivers: devfreq: add tracing for scheduling work

2019-02-20 Thread MyungJoo Ham

>This patch adds basic tracing of the devfreq workqueue and delayed work.
>It aims to capture changes of the polling intervals and device state.
>
>Signed-off-by: Lukasz Luba 
>---
> drivers/devfreq/devfreq.c | 5 +
> 1 file changed, 5 insertions(+)
>
>diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
>index 0ae3de7..660365a 100644
>--- a/drivers/devfreq/devfreq.c
>+++ b/drivers/devfreq/devfreq.c

Acked-by: MyungJoo Ham

RE: [PATCH v3 1/2] trace: events: add devfreq trace event file

2019-02-20 Thread MyungJoo Ham

>The patch adds a new file with trace events for the devfreq
>framework. They are used for performance analysis of the framework.
>It also updates the MAINTAINERS file, adding a new entry for
>devfreq maintainers.
>
>Signed-off-by: Lukasz Luba 
>---
> MAINTAINERS|  1 +
> include/trace/events/devfreq.h | 40 
> 2 files changed, 41 insertions(+)
> create mode 100644 include/trace/events/devfreq.h

Acked-by: MyungJoo Ham 

Thanks!


Cheers,
MyungJoo

[PATCH] kbuild: compute false-positive -Wmaybe-uninitialized cases in Kconfig

2019-02-20 Thread Masahiro Yamada

Since -Wmaybe-uninitialized was introduced by GCC 4.7, we have patched
various false positives:

 - commit e74fc973b6e5 ("Turn off -Wmaybe-uninitialized when building
   with -Os") turned off this option for -Os.

 - commit 815eb71e7149 ("Kbuild: disable 'maybe-uninitialized' warning
   for CONFIG_PROFILE_ALL_BRANCHES") turned off this option for
   CONFIG_PROFILE_ALL_BRANCHES

 - commit a76bcf557ef4 ("Kbuild: enable -Wmaybe-uninitialized warning
   for "make W=1"") turned off this option for GCC < 4.9
   Arnd provided more explanation in https://lkml.org/lkml/2017/3/14/903

I think this looks better by shifting the logic from Makefile to Kconfig.

Signed-off-by: Masahiro Yamada 
---

 Makefile | 10 +++---
 init/Kconfig | 17 +
 kernel/trace/Kconfig |  1 +
 3 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/Makefile b/Makefile
index 1bb0535..b21aa2e3 100644
--- a/Makefile
+++ b/Makefile
@@ -656,17 +656,13 @@ KBUILD_CFLAGS += $(call cc-disable-warning, 
int-in-bool-context)
 
 ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
 KBUILD_CFLAGS  += $(call cc-option,-Oz,-Os)
-KBUILD_CFLAGS  += $(call cc-disable-warning,maybe-uninitialized,)
-else
-ifdef CONFIG_PROFILE_ALL_BRANCHES
-KBUILD_CFLAGS  += -O2 $(call cc-disable-warning,maybe-uninitialized,)
 else
 KBUILD_CFLAGS   += -O2
 endif
-endif
 
-KBUILD_CFLAGS += $(call cc-ifversion, -lt, 0409, \
-   $(call cc-disable-warning,maybe-uninitialized,))
+ifdef CONFIG_CC_DISABLE_WARN_MAYBE_UNINITIALIZED
+KBUILD_CFLAGS   += -Wno-maybe-uninitialized
+endif
 
 # Tell gcc to never replace conditional load with a non-conditional one
 KBUILD_CFLAGS  += $(call cc-option,--param=allow-store-data-races=0)
diff --git a/init/Kconfig b/init/Kconfig
index c9386a3..1f05a88 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -26,6 +26,22 @@ config CLANG_VERSION
 config CC_HAS_ASM_GOTO
def_bool $(success,$(srctree)/scripts/gcc-goto.sh $(CC))
 
+config CC_HAS_WARN_MAYBE_UNINITIALIZED
+   def_bool $(cc-option,-Wmaybe-uninitialized)
+   help
+ GCC >= 4.7 supports this option.
+
+config CC_DISABLE_WARN_MAYBE_UNINITIALIZED
+   bool
+   depends on CC_HAS_WARN_MAYBE_UNINITIALIZED
+   default CC_IS_GCC && GCC_VERSION < 40900  # unreliable for GCC < 4.9
+   help
+ GCC's -Wmaybe-uninitialized is not reliable by definition.
+ Lots of false positive warnings are produced in some cases.
+
+ If this option is enabled, -Wno-maybe-uninitialized is passed
+ to the compiler to suppress maybe-uninitialized warnings.
+
 config CONSTRUCTORS
bool
depends on !UML
@@ -1113,6 +1129,7 @@ config CC_OPTIMIZE_FOR_PERFORMANCE
 
 config CC_OPTIMIZE_FOR_SIZE
bool "Optimize for size"
+   imply CC_DISABLE_WARN_MAYBE_UNINITIALIZED  # avoid false positives
help
  Enabling this option will pass "-Os" instead of "-O2" to
  your compiler resulting in a smaller kernel.
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index fa8b1fe..8bd1d6d 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -370,6 +370,7 @@ config PROFILE_ANNOTATED_BRANCHES
 config PROFILE_ALL_BRANCHES
bool "Profile all if conditionals" if !FORTIFY_SOURCE
select TRACE_BRANCH_PROFILING
+   imply CC_DISABLE_WARN_MAYBE_UNINITIALIZED  # avoid false positives
help
  This tracer profiles all branch conditions. Every if ()
  taken in the kernel is recorded whether it hit or miss.
-- 
2.7.4
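
For readers unfamiliar with the warning, the following is a minimal userspace sketch of the kind of code that can produce a -Wmaybe-uninitialized false positive. It is not taken from the kernel or from this patch; lookup_sketch() is a hypothetical name. The variable is only read on paths where it was written, but the compiler has to prove that the two conditions are correlated. Whether a particular GCC version and optimization level (for example -Os) actually warns here is an assumption and may vary.

```c
/*
 * Hypothetical example: "value" is assigned and read under the same
 * condition, so it is never used uninitialized. Some compiler versions
 * or optimization levels may still report it as maybe-uninitialized
 * because they lose track of the correlation between the two tests.
 */
static int lookup_sketch(int have_key, int key)
{
	int value;	/* deliberately not initialized at declaration */

	if (have_key)
		value = key * 2;

	/* ... unrelated work could sit here and obscure the data flow ... */

	if (have_key)
		return value;

	return -1;
}
```

Initializing the variable at its declaration silences the warning, at the cost of hiding a real bug if a later change breaks the correlation, which is one reason disabling the warning per configuration can be preferable.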

Re: general protection fault in __dentry_path

2019-02-20 Thread syzbot


syzbot has found a reproducer for the following crash on:

HEAD commit:2137397c92ae Merge tag 'sound-5.0' of git://git.kernel.org..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1270bf78c0
kernel config:  https://syzkaller.appspot.com/x/.config?x=7132344728e7ec3f
dashboard link: https://syzkaller.appspot.com/bug?extid=7857962b4d45e602b8ad
compiler:   gcc (GCC) 9.0.0 20181231 (experimental)
userspace arch: i386
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=150bee14c0
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=12f401d4c0

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+7857962b4d45e602b...@syzkaller.appspotmail.com

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault:  [#1] PREEMPT SMP KASAN
CPU: 0 PID: 12576 Comm: syz-executor696 Not tainted 5.0.0-rc7+ #81
kobject: 'kvm' (985ff3e6): kobject_uevent_env
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

RIP: 0010:__dentry_path+0x49e/0x7c0 fs/d_path.c:344
Code: 89 fc 41 83 e4 01 44 89 e6 e8 fe e4 b2 ff 45 84 e4 0f 85 04 02 00 00  
e8 b0 e3 b2 ff 48 8b 85 18 ff ff ff 44 89 bd 40 ff ff ff <80> 38 00 0f 85  
f9 02 00 00 48 8b 85 38 ff ff ff 41 83 e7 01 44 89
kobject: 'kvm' (985ff3e6): fill_kobj_path: path  
= '/devices/virtual/misc/kvm'

RSP: 0018:888096127c58 EFLAGS: 00010293
RAX: dc05 RBX:  RCX: 81bcfdc2
RDX:  RSI: 81bcfdd0 RDI: 0001
RBP: 888096127d48 R08: 88809b17c540 R09: 
R10:  R11:  R12: 
R13: 888096127d20 R14: 888092473afe R15: 00014e78
FS:  () GS:8880ae80(0063) knlGS:f7fe4b40
CS:  0010 DS: 002b ES: 002b CR0: 80050033
CR2: 080fb028 CR3: 9de68000 CR4: 001426f0
kobject: 'kvm' (985ff3e6): kobject_uevent_env
Call Trace:
kobject: 'kvm' (985ff3e6): fill_kobj_path: path  
= '/devices/virtual/misc/kvm'

 dentry_path_raw+0x26/0x30 fs/d_path.c:371
 kvm_uevent_notify_change.part.0+0x213/0x440  
arch/x86/kvm/../../../virt/kvm/kvm_main.c:4051
 kvm_uevent_notify_change arch/x86/kvm/../../../virt/kvm/kvm_main.c:4018  
[inline]
 kvm_dev_ioctl_create_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:3356  
[inline]

 kvm_dev_ioctl+0x1132/0x1750 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3378
 __do_compat_sys_ioctl fs/compat_ioctl.c:1052 [inline]
 __se_compat_sys_ioctl fs/compat_ioctl.c:998 [inline]
 __ia32_compat_sys_ioctl+0x197/0x620 fs/compat_ioctl.c:998
 do_syscall_32_irqs_on arch/x86/entry/common.c:326 [inline]
 do_fast_syscall_32+0x281/0xc98 arch/x86/entry/common.c:397
 entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
RIP: 0023:0xf7fe8869
Code: 85 d2 74 02 89 0a 5b 5d c3 8b 04 24 c3 8b 14 24 c3 8b 3c 24 c3 90 90  
90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90  
90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90

RSP: 002b:f7fe41fc EFLAGS: 0293 ORIG_RAX: 0036
RAX: ffda RBX: 0003 RCX: ae01
RDX:  RSI:  RDI: 
RBP: 003d0f00 R08:  R09: 
R10:  R11:  R12: 
R13:  R14:  R15: 
Modules linked in:
---[ end trace 4fe494385b47fe74 ]---
kobject: 'kvm' (985ff3e6): kobject_uevent_env
RIP: 0010:__dentry_path+0x49e/0x7c0 fs/d_path.c:344
Code: 89 fc 41 83 e4 01 44 89 e6 e8 fe e4 b2 ff 45 84 e4 0f 85 04 02 00 00  
e8 b0 e3 b2 ff 48 8b 85 18 ff ff ff 44 89 bd 40 ff ff ff <80> 38 00 0f 85  
f9 02 00 00 48 8b 85 38 ff ff ff 41 83 e7 01 44 89

RSP: 0018:888096127c58 EFLAGS: 00010293
RAX: dc05 RBX:  RCX: 81bcfdc2
RDX:  RSI: 81bcfdd0 RDI: 0001
RBP: 888096127d48 R08: 88809b17c540 R09: 
R10:  R11:  R12: 
R13: 888096127d20 R14: 888092473afe R15: 00014e78
kobject: 'kvm' (985ff3e6): fill_kobj_path: path  
= '/devices/virtual/misc/kvm'

FS:  () GS:8880ae80(0063) knlGS:f7fe4b40
CS:  0010 DS: 002b ES: 002b CR0: 80050033
CR2: 080fb038 CR3: 9de68000 CR4: 001426f0

[PATCH 2/2] loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part()

2019-02-20 Thread Dongli Zhang

Commit 0da03cab87e6
("loop: Fix deadlock when calling blkdev_reread_part()") moves
blkdev_reread_part() out of the loop_ctl_mutex. However,
GENHD_FL_NO_PART_SCAN is set before __blkdev_reread_part(). As a result,
__blkdev_reread_part() will fail the check of GENHD_FL_NO_PART_SCAN and
will not rescan the loop device to delete all partitions.

Below are steps to reproduce the issue:

step1 # dd if=/dev/zero of=tmp.raw bs=1M count=100
step2 # losetup -P /dev/loop0 tmp.raw
step3 # parted /dev/loop0 mklabel gpt
step4 # parted -a none -s /dev/loop0 mkpart primary 64s 1
step5 # losetup -d /dev/loop0

Step 5 fails to delete /dev/loop0p1 (created in step 4), and the kernel
prints the following warning:

[  464.414043] __loop_clr_fd: partition scan of loop0 failed (rc=-22)

This patch sets GENHD_FL_NO_PART_SCAN after blkdev_reread_part().

Fixes: 0da03cab87e6 ("loop: Fix deadlock when calling blkdev_reread_part()")
Signed-off-by: Dongli Zhang 
---
 drivers/block/loop.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 7908673..736e55b 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1034,6 +1034,15 @@ loop_init_xfer(struct loop_device *lo, struct 
loop_func_table *xfer,
return err;
 }
 
+static void loop_disable_partscan(struct loop_device *lo)
+{
+   mutex_lock(&loop_ctl_mutex);
+   lo->lo_flags = 0;
+   if (!part_shift)
+           lo->lo_disk->flags |= GENHD_FL_NO_PART_SCAN;
+   mutex_unlock(&loop_ctl_mutex);
+}
+
 static int __loop_clr_fd(struct loop_device *lo, bool release)
 {
struct file *filp = NULL;
@@ -1096,9 +1105,6 @@ static int __loop_clr_fd(struct loop_device *lo, bool 
release)
 
partscan = lo->lo_flags & LO_FLAGS_PARTSCAN && bdev;
lo_number = lo->lo_number;
-   lo->lo_flags = 0;
-   if (!part_shift)
-   lo->lo_disk->flags |= GENHD_FL_NO_PART_SCAN;
loop_unprepare_queue(lo);
 out_unlock:
mutex_unlock(&loop_ctl_mutex);
@@ -1121,6 +1127,9 @@ static int __loop_clr_fd(struct loop_device *lo, bool 
release)
/* Device is gone, no point in returning error */
err = 0;
}
+
+   loop_disable_partscan(lo);
+
/*
 * Need not hold loop_ctl_mutex to fput backing file.
 * Calling fput holding loop_ctl_mutex triggers a circular
-- 
2.7.4
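
The ordering problem described in this patch can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: struct disk_sketch, SKETCH_NO_PART_SCAN and the helper functions are hypothetical stand-ins, and the model assumes the rescan refuses to run with -EINVAL (the rc=-22 seen in the warning) whenever the no-scan flag is already set.

```c
#include <errno.h>

#define SKETCH_NO_PART_SCAN 0x1	/* stand-in for GENHD_FL_NO_PART_SCAN */

/* Hypothetical, heavily simplified model of a disk with partitions. */
struct disk_sketch {
	unsigned int flags;
	int partitions;
};

/* Model of the rescan: it refuses to run when scanning is disabled. */
static int reread_part_sketch(struct disk_sketch *disk)
{
	if (disk->flags & SKETCH_NO_PART_SCAN)
		return -EINVAL;

	disk->partitions = 0;	/* the backing file is gone: drop partitions */
	return 0;
}

/* Old order: the flag is set first, so the rescan is rejected. */
static int detach_flag_first(struct disk_sketch *disk)
{
	disk->flags |= SKETCH_NO_PART_SCAN;
	return reread_part_sketch(disk);
}

/* Order used by the patch: rescan first, then disable scanning. */
static int detach_rescan_first(struct disk_sketch *disk)
{
	int err = reread_part_sketch(disk);

	disk->flags |= SKETCH_NO_PART_SCAN;
	return err;
}
```

In this model, detach_flag_first() leaves the stale partition in place and returns -EINVAL, while detach_rescan_first() removes it and still ends with scanning disabled.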

[PATCH 0/2] loop: fix two issues introduced by prior commit

2019-02-20 Thread Dongli Zhang

This patch set fixes two issues introduced by prior commits.


[PATCH 1/2] loop: do not print warn message if partition scan is successful

[PATCH 1/2] fixes d57f3374ba48 ("loop: Move special partition reread
handling in loop_clr_fd()") so that the warning message is printed only
when the partition scan fails.

[PATCH 2/2] loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part()

[PATCH 2/2] fixes 0da03cab87e6 ("loop: Fix deadlock when calling
blkdev_reread_part()") to not set GENHD_FL_NO_PART_SCAN before partition
scan when detaching loop device from the file.

Thank you very much!

Dongli Zhang

[PATCH 1/2] loop: do not print warn message if partition scan is successful

2019-02-20 Thread Dongli Zhang

Do not print warn message when the partition scan returns 0.

Fixes: d57f3374ba48 ("loop: Move special partition reread handling in 
loop_clr_fd()")
Signed-off-by: Dongli Zhang 
---
 drivers/block/loop.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index cf55389..7908673 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1115,8 +1115,9 @@ static int __loop_clr_fd(struct loop_device *lo, bool 
release)
err = __blkdev_reread_part(bdev);
else
err = blkdev_reread_part(bdev);
-   pr_warn("%s: partition scan of loop%d failed (rc=%d)\n",
-   __func__, lo_number, err);
+   if (err)
+   pr_warn("%s: partition scan of loop%d failed (rc=%d)\n",
+   __func__, lo_number, err);
/* Device is gone, no point in returning error */
err = 0;
}
-- 
2.7.4

Re: [PATCH 3/6] lib/string: Use correct docstring format

2019-02-20 Thread Randy Dunlap

On 2/20/19 4:07 PM, Kees Cook wrote:
> On Mon, Feb 18, 2019 at 3:24 PM Tobin C. Harding  wrote:
>>
>> Currently the docstring comments for strscpy() are not in the correct
>> format.  Prior to working on this file fix up the docstring.
>>
>> Use correct docstring format for strscpy().
> 
> Is this attached to "make htmldocs" anywhere? Maybe in the device
> driver api doc? That's where I put refcount_t. See
> driver-api/basics.rst and put something like:
> 
> String Handling
> 
> 
> .. kernel-doc:: lib/string.c
>:internal:
> 
> and add that chunk to this patch.

It's already in Documentation/core-api/kernel-api.rst, under
"String Manipulation."

> Acked-by: Kees Cook 
> 
> -Kees
> 
>>
>> Signed-off-by: Tobin C. Harding 
>> ---
>>  lib/string.c | 11 ++-
>>  1 file changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/lib/string.c b/lib/string.c
>> index 7f1d72db53c5..65969cf32f5d 100644
>> --- a/lib/string.c
>> +++ b/lib/string.c
>> @@ -159,11 +159,9 @@ EXPORT_SYMBOL(strlcpy);
>>   * @src: Where to copy the string from
>>   * @count: Size of destination buffer
>>   *
>> - * Copy the string, or as much of it as fits, into the dest buffer.
>> - * The routine returns the number of characters copied (not including
>> - * the trailing NUL) or -E2BIG if the destination buffer wasn't big enough.
>> - * The behavior is undefined if the string buffers overlap.
>> - * The destination buffer is always NUL terminated, unless it's zero-sized.
>> + * Copy the string, or as much of it as fits, into the dest buffer.  The
>> + * behavior is undefined if the string buffers overlap.  The destination
>> + * buffer is always NUL terminated, unless it's zero-sized.
>>   *
>>   * Preferred to strlcpy() since the API doesn't require reading memory
>>   * from the src string beyond the specified "count" bytes, and since
>> @@ -175,6 +173,9 @@ EXPORT_SYMBOL(strlcpy);
>>   * doesn't unnecessarily force the tail of the destination buffer to be
>>   * zeroed.  If the zeroing is desired, it's likely cleaner to use strscpy(),
>>   * check the return size, then just memset() the tail of the dest buffer.
>> + *
>> + * Return: The number of characters copied (not including the trailing
>> + * NUL) or -E2BIG if the destination buffer wasn't big enough.
>>   */
>>  ssize_t strscpy(char *dest, const char *src, size_t count)
>>  {
>> --
>> 2.20.1
>>
> 
> 


-- 
~Randy
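
To make the return-value contract in the docstring above concrete, here is a small userspace sketch. It is not the kernel implementation (which copies word-at-a-time); strscpy_sketch() is a hypothetical name that only mirrors the documented behavior: copy as much as fits, always NUL-terminate a non-empty buffer, and return the number of characters copied or -E2BIG on truncation.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/*
 * Byte-at-a-time sketch of the documented strscpy() semantics.
 * Reads at most "count" bytes from src.
 */
static ssize_t strscpy_sketch(char *dest, const char *src, size_t count)
{
	size_t i;

	if (count == 0)
		return -E2BIG;	/* zero-sized buffer: nothing can be stored */

	for (i = 0; i < count - 1 && src[i] != '\0'; i++)
		dest[i] = src[i];
	dest[i] = '\0';

	/* If src did not end where we stopped, the copy was truncated. */
	return src[i] == '\0' ? (ssize_t)i : -E2BIG;
}
```

A caller would typically check for a negative return value rather than comparing lengths, which is the main ergonomic difference from strlcpy().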

[PATCH] mm/cma_debug: Avoid to use global cma_debugfs_root

2019-02-20 Thread Yue Hu

From: Yue Hu 

Currently cma_debugfs_root is a file-scope global. That is unnecessary
since it is only used by cma_debugfs_add_one(), which is called from
cma_debugfs_init(). Pass it as an argument instead and make it a local
variable. Also remove the unused idx parameter.

Signed-off-by: Yue Hu 
---
 mm/cma_debug.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/mm/cma_debug.c b/mm/cma_debug.c
index f234672..2c2c869 100644
--- a/mm/cma_debug.c
+++ b/mm/cma_debug.c
@@ -21,8 +21,6 @@ struct cma_mem {
unsigned long n;
 };
 
-static struct dentry *cma_debugfs_root;
-
 static int cma_debugfs_get(void *data, u64 *val)
 {
unsigned long *p = data;
@@ -162,7 +160,7 @@ static int cma_alloc_write(void *data, u64 val)
 }
 DEFINE_SIMPLE_ATTRIBUTE(cma_alloc_fops, NULL, cma_alloc_write, "%llu\n");
 
-static void cma_debugfs_add_one(struct cma *cma, int idx)
+static void cma_debugfs_add_one(struct cma *cma, struct dentry *root_dentry)
 {
struct dentry *tmp;
char name[16];
@@ -170,7 +168,7 @@ static void cma_debugfs_add_one(struct cma *cma, int idx)
 
scnprintf(name, sizeof(name), "cma-%s", cma->name);
 
-   tmp = debugfs_create_dir(name, cma_debugfs_root);
+   tmp = debugfs_create_dir(name, root_dentry);
 
debugfs_create_file("alloc", 0200, tmp, cma, &cma_alloc_fops);
debugfs_create_file("free", 0200, tmp, cma, &cma_free_fops);
@@ -188,6 +186,7 @@ static void cma_debugfs_add_one(struct cma *cma, int idx)
 
 static int __init cma_debugfs_init(void)
 {
+   struct dentry *cma_debugfs_root;
int i;
 
cma_debugfs_root = debugfs_create_dir("cma", NULL);
@@ -195,7 +194,7 @@ static int __init cma_debugfs_init(void)
return -ENOMEM;
 
for (i = 0; i < cma_area_count; i++)
-   cma_debugfs_add_one(&cma_areas[i], i);
+   cma_debugfs_add_one(&cma_areas[i], cma_debugfs_root);
 
return 0;
 }
-- 
1.9.1

[PATCH] Documentation: fix admin-guide/README.rst minimum gcc version requirement

2019-02-20 Thread Randy Dunlap

From: Randy Dunlap 

Fix minimum gcc version as specified in Documentation/process/changes.rst.

Suggested-by: Matthew Wilcox 
Signed-off-by: Randy Dunlap 
---
 Documentation/admin-guide/README.rst |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- lnx-50-rc7.orig/Documentation/admin-guide/README.rst
+++ lnx-50-rc7/Documentation/admin-guide/README.rst
@@ -251,7 +251,7 @@ Configuring the kernel
 Compiling the kernel
 
 
- - Make sure you have at least gcc 3.2 available.
+ - Make sure you have at least gcc 4.6 available.
For more information, refer to :ref:`Documentation/process/changes.rst 
`.
 
Please note that you can still run a.out user programs with this kernel.

[PATCH] mm/cma_debug: Check for null tmp in cma_debugfs_add_one()

2019-02-20 Thread Yue Hu

From: Yue Hu 

If debugfs_create_dir() fails, the following debugfs_create_file()
calls are meaningless since they depend on a non-NULL tmp dentry, and
they only waste CPU time.

Signed-off-by: Yue Hu 
---
 mm/cma_debug.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/cma_debug.c b/mm/cma_debug.c
index 2c2c869..3e9d984 100644
--- a/mm/cma_debug.c
+++ b/mm/cma_debug.c
@@ -169,6 +169,8 @@ static void cma_debugfs_add_one(struct cma *cma, struct 
dentry *root_dentry)
scnprintf(name, sizeof(name), "cma-%s", cma->name);
 
tmp = debugfs_create_dir(name, root_dentry);
+   if (!tmp)
+   return;
 
debugfs_create_file("alloc", 0200, tmp, cma, &cma_alloc_fops);
debugfs_create_file("free", 0200, tmp, cma, &cma_free_fops);
-- 
1.9.1

Re: [PATCH 05/32] locking/lockdep: Prepare valid_state() to handle plain masks

2019-02-20 Thread Frederic Weisbecker

On Wed, Feb 13, 2019 at 11:47:13AM -0800, Linus Torvalds wrote:
> On Wed, Feb 13, 2019 at 7:16 AM Frederic Weisbecker  
> wrote:
> > >
> > > If "vectors" only has the high bit set, you end up with "fs" having
> > > the value "64".
> > >
> > > And then "vectors >>= fs" is undefined and won't actually do anything
> > > at all on x86.
> >
> > Oh! ok didn't know that...
> 
> So in general, shift counts >= width of the type (or negative) are undefined.
> 
> They can sometimes happen to work (that's the "undefined" part ;), but
> it's not reliable or portable.
> 
> It's why you occasionally see things like
> 
> drivers/block/sx8.c:
> tmp = (blk_rq_pos(rq) >> 16) >> 16;
> 
> to get the upper 32 bits of the value. It is written with that odd
> double shift, rather than being written as ">> 32". That way it works
> even if the sector type happens to be 32-bit (and the compiler will
> just end up turning it into a zero if it's an unsigned 32-bit type
> since it's compile-time obvious).

Ok, I see.

> 
> > I see, perhaps I should use for_each_set_bit() that should take care about
> > those details?
> 
> That would _work_, but don't do that. "for_each_set_bit()" works on
> bitmaps in memory, and is slow for a simple word case. In addition to
> being slow, it uses the Linux tradition of working on bitmaps that are
> comprised of "unsigned long". So it has byte order issues too.
> 
> So for_each_set_bit() is useful when you have real arrays of bits and
> are using the "set_bit()" etc interfaces.

Yeah I suspected some overhead.

> 
> When you're actually working on just a single variable, your "__ffs()"
> model works fine, you just need to be careful to _not_ do the "+1" and
> then use it for shifts.
> 
> Also, it actually turns out that if you want to be really clever, you
> can play tricks if you don't care about the exact bit *number*.
> 
> For example, this expression:
> 
>v =  a & (a-1);
> 
> will remove the lowest bit set from 'a' very cheaply. So what you can
> do is something like this:
> 
> void for_each_bit_in_mask(u64 mask)
> {
>         while (mask) {
>                 u64 newmask = mask & (mask-1);
>                 u64 onebit = mask ^ newmask;
>                 mask = newmask;
>                 do_something_with(onebit);
>         }
> }
> 
> to do some operation on each bit set, and quite efficiently and
> without any undefined behavior or expensive shifts.
> 
> But the above trick does require that you are happy to just see the
> bit *mask*, not the bit *number*. I'm not sure any of your cases match
> that.

Nice, I couldn't resist introducing such a headache in my set ;-) unfortunately
I indeed need the bit number itself most of the time. 

So following your 1st advice, I should rather do something along the lines of:

   nr = 0;
   while (mask) {
           fs = __ffs64(mask);
           mask >>= fs;
           mask >>= 1;
           nr += fs + 1;
           process_bit_nr(nr - 1);
   }

And define a for_each_lock_usage_bit(usage_mask) on top of it.

Thanks a lot!
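
As an illustration of the two approaches discussed in this message, here is a minimal userspace sketch in C. It is not code from the patch set: collect_bits_clear_lowest() and collect_bits_double_shift() are hypothetical helper names, and the GCC/Clang builtin __builtin_ctzll() is assumed as a stand-in for the kernel's __ffs64(). Both variants report bit numbers without ever shifting a 64-bit value by 64.

```c
#include <stdint.h>

/*
 * Variant 1: find the lowest set bit's number, then clear it with the
 * "mask & (mask - 1)" trick. No shifts are involved at all.
 * Stores the bit numbers in out[] (lowest first), returns how many.
 */
static unsigned int collect_bits_clear_lowest(uint64_t mask,
					      unsigned int out[64])
{
	unsigned int count = 0;

	while (mask) {
		/* ctz is undefined for 0, which the loop condition excludes */
		out[count++] = (unsigned int)__builtin_ctzll(mask);
		mask &= mask - 1;
	}
	return count;
}

/*
 * Variant 2: the loop proposed above. The shift is split in two
 * ("fs" is at most 63, then one more), so the count never reaches 64.
 */
static unsigned int collect_bits_double_shift(uint64_t mask,
					      unsigned int out[64])
{
	unsigned int count = 0;
	unsigned int nr = 0;

	while (mask) {
		unsigned int fs = (unsigned int)__builtin_ctzll(mask);

		mask >>= fs;
		mask >>= 1;
		nr += fs + 1;
		out[count++] = nr - 1;
	}
	return count;
}
```

With only bit 63 set, a single "mask >>= fs + 1" would be a shift by 64; both variants above avoid that case.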

Re: [LKP] [driver core] 570d020012: will-it-scale.per_thread_ops -12.2% regression

2019-02-20 Thread Wei Yang

On Thu, Feb 21, 2019 at 11:10:49AM +0800, kernel test robot wrote:
>On Tue, Feb 19, 2019 at 01:19:04PM +0100, Greg Kroah-Hartman wrote:
>> On Tue, Feb 19, 2019 at 08:59:45AM +0800, Wei Yang wrote:
>> > On Mon, Feb 18, 2019 at 03:54:42PM +0800, kernel test robot wrote:
>> > >Greeting,
>> > >
>> > >FYI, we noticed a -12.2% regression of will-it-scale.per_thread_ops due 
>> > >to commit:
>> > >
>> > >
>> > >commit: 570d0200123fb4f809aa2f6226e93a458d664d70 ("driver core: move 
>> > >device->knode_class to device_private")
>> > >https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>> > >
>> > 
>> > This is interesting.
>> > 
>> > I didn't expect that moving this field would impact performance.
>> > 
>> > Is the reason that struct device is hotter memory than
>> > device->device_private?
>> > 
>> > >in testcase: will-it-scale
>> > >on test machine: 288 threads Knights Mill with 80G memory
>> > >with following parameters:
>> > >
>> > >  nr_task: 100%
>> > >  mode: thread
>> > >  test: unlink2
>> > >  cpufreq_governor: performance
>> > >
>> > >test-description: Will It Scale takes a testcase and runs it from 1 
>> > >through to n parallel copies to see if the testcase will scale. It builds 
>> > >both a process and threads based test in order to see any differences 
>> > >between the two.
>> > >test-url: https://github.com/antonblanchard/will-it-scale
>> > >
>> > >In addition to that, the commit also has significant impact on the 
>> > >following tests:
>> > >
>> > >+------------------+---------------------------------------------------------------+
>> > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -29.9% regression |
>> > >| test machine     | 288 threads Knights Mill with 80G memory                      |
>> > >| test parameters  | cpufreq_governor=performance                                  |
>> > >|                  | mode=thread                                                   |
>> > >|                  | nr_task=100%                                                  |
>> > >|                  | test=signal1                                                  |
>> 
>> Ok, I'm going to blame your testing system, or something here, and not
>> the above patch.
>> 
>> All this test does is call raise(3).  That does not touch the driver
>> core at all.
>> 
>> > >+------------------+---------------------------------------------------------------+
>> > >| testcase: change | will-it-scale: will-it-scale.per_thread_ops -16.5% regression |
>> > >| test machine     | 288 threads Knights Mill with 80G memory                      |
>> > >| test parameters  | cpufreq_governor=performance                                  |
>> > >|                  | mode=thread                                                   |
>> > >|                  | nr_task=100%                                                  |
>> > >|                  | test=open1                                                    |
>> > >+------------------+---------------------------------------------------------------+
>> 
>> Same here, open1 just calls open/close a lot.  No driver core
>> interaction at all there either.
>> 
>> So are you _sure_ this is the offending patch?
>
>Hi Greg,
>
>We did an experiment: we restored the original layout of struct device and
>found that the regression is gone. I guess the regression is not from the
>patch itself but related to the struct layout.
>
>
>tests: 1
>testcase/path_params/tbox_group/run: 
>will-it-scale/performance-thread-100%-unlink2/lkp-knm01
>
>570d0200123fb4f8  a36dc70b810afe9183de2ea18f
>----------------  --------------------------
>         %stddev      change         %stddev
>             \          |                \
>    237096              14%     270789        will-it-scale.workload
>       823              14%        939        will-it-scale.per_thread_ops
>

Do you have the comparison between a36dc70b810afe9183de2ea18f and the one
before 570d020012?

>
>tests: 1
>testcase/path_params/tbox_group/run: 
>will-it-scale/performance-thread-100%-signal1/lkp-knm01
>
>570d0200123fb4f8  a36dc70b810afe9183de2ea18f
>----------------  --------------------------
>         %stddev      change         %stddev
>             \          |                \
>     93.51 ±  3%        48%     138.53 ±  3%  will-it-scale.time.user_time
>       186              40%        261        will-it-scale.per_thread_ops
>     53909              40%      75507        will-it-scale.workload
>
>
>tests: 1
>testcase/path_params/tbox_group/run: 
>will-it-scale/performance-thread-100%-open1/lkp-knm01
>
>570d0200123fb4f8  a36dc70b810afe9183de2ea18f
>----------------  --------------------------
>         %stddev      change         %stddev
>             \          |                \
>    447722              22%     546258 ± 10%  will-it-scale.time.involuntary_context_switches
>    226995              19%     269751

Re: [PATCH] cpufreq: kyro: Reduce frame-size of qcom_cpufreq_kryo_probe()

2019-02-20 Thread Viresh Kumar

On 20-02-19, 21:56, Amit Kucheria wrote:
> On Wed, Feb 20, 2019 at 4:44 PM Viresh Kumar  wrote:
> >
> > With the introduction of commit 846a415bf440 ("arm64: default NR_CPUS to
> > 256"), we have started getting following compilation warning:
> >
> > qcom-cpufreq-kryo.c:168:1: warning: the frame size of 2160 bytes is larger 
> > than 2048 bytes [-Wframe-larger-than=]
> >
> > Fix that by dynamically allocating opp_tables and freeing it later.
> >
> > Compile tested only.
> >
> > Signed-off-by: Viresh Kumar 
> > ---
> >  drivers/cpufreq/qcom-cpufreq-kryo.c | 9 -
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/cpufreq/qcom-cpufreq-kryo.c 
> > b/drivers/cpufreq/qcom-cpufreq-kryo.c
> > index 1c8583cc06a2..6888cb6db2ef 100644
> > --- a/drivers/cpufreq/qcom-cpufreq-kryo.c
> > +++ b/drivers/cpufreq/qcom-cpufreq-kryo.c
> > @@ -75,7 +75,7 @@ static enum _msm8996_version 
> > qcom_cpufreq_kryo_get_msm_id(void)
> >
> >  static int qcom_cpufreq_kryo_probe(struct platform_device *pdev)
> >  {
> > -   struct opp_table *opp_tables[NR_CPUS] = {0};
> > +   struct opp_table **opp_tables;
> > enum _msm8996_version msm8996_version;
> > struct nvmem_cell *speedbin_nvmem;
> > struct device_node *np;
> > @@ -133,6 +133,10 @@ static int qcom_cpufreq_kryo_probe(struct 
> > platform_device *pdev)
> > }
> > kfree(speedbin);
> >
> > +   opp_tables = kcalloc(num_possible_cpus(), sizeof(*opp_tables), 
> > GFP_KERNEL);
> > +   if (!opp_tables)
> > +   return -ENOMEM;
> > +
> 
> Perhaps add a comment above it saying that the actual opp_table is
> allocated in the loop below because of dev_pm_opp_set_supported_hw?
> 
> I was staring at this for a few minutes wondering why you needed this
> kcalloc before I realised that opp_tables (missed the 's') is a
> temporary array of pointers. :-)

I feel that you got confused because this patch didn't have the diff
where the opp_tables thing is getting used. When we see the .c file
itself, it is pretty clear what is going on and I believe the
comment would be totally unnecessary and redundant.

This is how it looks now, please lemme know if you still prefer the
comment :)

        opp_tables = kcalloc(num_possible_cpus(), sizeof(*opp_tables),
                             GFP_KERNEL);
        if (!opp_tables)
                return -ENOMEM;

        for_each_possible_cpu(cpu) {
                cpu_dev = get_cpu_device(cpu);
                if (NULL == cpu_dev) {
                        ret = -ENODEV;
                        goto free_opp;
                }

                opp_tables[cpu] = dev_pm_opp_set_supported_hw(cpu_dev,
                                                              , 1);
                if (IS_ERR(opp_tables[cpu])) {
                        ret = PTR_ERR(opp_tables[cpu]);
                        dev_err(cpu_dev, "Failed to set supported hardware\n");
                        goto free_opp;
                }
        }

        kfree(opp_tables);


-- 
viresh
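
For context on the frame-size warning, a rough userspace sketch of the trade-off follows. The names (SKETCH_NR_CPUS, struct table_sketch, probe_sketch()) are hypothetical, and calloc()/free() stand in for kcalloc()/kfree(). On a 64-bit build a fixed array of 256 pointers needs 256 * 8 = 2048 bytes, so the array alone reaches the 2048-byte limit named in the warning; allocating the pointer array on the heap keeps the stack frame small.

```c
#include <errno.h>
#include <stdlib.h>

#define SKETCH_NR_CPUS 256	/* mirrors the new arm64 NR_CPUS default */

struct table_sketch;		/* opaque stand-in for struct opp_table */

/* Stack bytes that "struct table_sketch *tables[SKETCH_NR_CPUS]" needs. */
static size_t fixed_array_bytes(void)
{
	return SKETCH_NR_CPUS * sizeof(struct table_sketch *);
}

/*
 * Heap variant: only the temporary array of pointers is allocated here.
 * The objects the pointers refer to would come from elsewhere.
 */
static int probe_sketch(size_t possible_cpus)
{
	struct table_sketch **tables;
	size_t cpu;
	int ret = 0;

	tables = calloc(possible_cpus, sizeof(*tables));
	if (!tables)
		return -ENOMEM;

	for (cpu = 0; cpu < possible_cpus; cpu++) {
		/*
		 * A real driver would store a per-CPU table pointer here.
		 * The sketch only checks that the slots start out NULL,
		 * as they do after calloc() on mainstream platforms.
		 */
		if (tables[cpu] != NULL) {
			ret = -EINVAL;
			break;
		}
	}

	free(tables);
	return ret;
}
```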

[PATCH] : drop the gcc-3.3 'const' hack in roundup()

2019-02-20 Thread Randy Dunlap

From: Randy Dunlap 

The single quotation marks around "const" were causing a
documentation markup warning with reST.  Instead of fixing that
warning, just delete that comment line and the gcc-3.3 hack of
using "const" in the roundup() macro since gcc-3.3 is no longer
supported for kernel builds.

I did around 20 different $arch builds with no problems, but
we'll just have to see if this causes problems for anyone else
out there.

Suggested-by: Matthew Wilcox 
Signed-off-by: Randy Dunlap 
---
 include/linux/kernel.h |4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

--- lnx-50-rc7.orig/include/linux/kernel.h
+++ lnx-50-rc7/include/linux/kernel.h
@@ -133,12 +133,10 @@
  *
  * Rounds @x up to next multiple of @y. If @y will always be a power
  * of 2, consider using the faster round_up().
- *
- * The `const' here prevents gcc-3.3 from calling __divdi3
  */
 #define roundup(x, y) (\
 {  \
-   const typeof(y) __y = y;\
+   typeof(y) __y = y;  \
(((x) + (__y - 1)) / __y) * __y;\
 }  \
 )
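
The behaviour of the patched macro can be checked outside the kernel. The following is a small sketch, assuming a compiler with GNU C extensions (gcc or clang); `round_up_pow2()` is a hypothetical name for a macro modelled on the kernel's `round_up()`, included only to illustrate the power-of-2 remark in the comment.

```c
/*
 * Generic variant, as in the patched header: valid for any positive y.
 * Relies on GNU statement expressions; __typeof__ is spelled this way so
 * the sketch also builds with strict -std= settings.
 */
#define roundup(x, y) (				\
{						\
	__typeof__(y) __y = y;			\
	(((x) + (__y - 1)) / __y) * __y;	\
}						\
)

/*
 * Power-of-2 variant (hypothetical name): replaces the division with a
 * mask, so the result is only meaningful when y is a power of two.
 */
#define round_up_pow2(x, y) \
	((((x) - 1) | ((__typeof__(x))(y) - 1)) + 1)
```

Because `__y` is a local copy, an argument such as `n++` is evaluated exactly once, with or without the `const` qualifier.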

RE: [PATCH v3] usb: chipidea: Grab the (legacy) USB PHY by phandle first

2019-02-20 Thread Peter Chen

 
> 
> On Mon, 2019-02-18 at 03:04 +0000, Peter Chen wrote:
> > > According to the chipidea driver bindings, the USB PHY is specified via
> > > the "phys" phandle node. However, this only takes effect for USB PHYs
> > > that use the common PHY framework. For legacy USB PHYs, a simple lookup
> > > based on the USB PHY type is done instead.
> > >
> > > This does not play out well when more than one USB PHY is
> > > registered, since the first registered PHY matching the type will
> > > always be returned regardless of what the driver was bound to.
> > >
> > > Fix this by looking up the PHY based on the "phys" phandle node.
> > > Although generic PHYs are rather matched by their "phys-name" and not
> > > the "phys" phandle directly, there is no helper for similar lookup on
> > > legacy PHYs and it's probably not worth the effort to add it.
> > >
> > > When no legacy USB PHY is found by phandle, fall back to grabbing any
> > > registered USB2 PHY. This ensures backward compatibility if some users
> > > were actually relying on this mechanism.
> > >
> > > Signed-off-by: Paul Kocialkowski 
> > > ---
> > > Changes since v2:
> > > * Fixed typos in commit message.
> > >
> > > Changes since v1:
> > > * Only consider legacy USB PHY error for fallback as suggested;
> > > * Checked against EPROBE_DEFER before entering fallback.
> > >
> > >  drivers/usb/chipidea/core.c | 9 -
> > >  1 file changed, 8 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/usb/chipidea/core.c b/drivers/usb/chipidea/core.c
> > > index 7bfcbb23c2a4..016e4004fe9d 100644
> > > --- a/drivers/usb/chipidea/core.c
> > > +++ b/drivers/usb/chipidea/core.c
> > > @@ -954,8 +954,15 @@ static int ci_hdrc_probe(struct platform_device *pdev)
> > >   } else if (ci->platdata->usb_phy) {
> > >   ci->usb_phy = ci->platdata->usb_phy;
> > >   } else {
> > > + ci->usb_phy = devm_usb_get_phy_by_phandle(dev->parent, "phys",
> > > +   0);
> > >   ci->phy = devm_phy_get(dev->parent, "usb-phy");
> > > - ci->usb_phy = devm_usb_get_phy(dev->parent, USB_PHY_TYPE_USB2);
> > > +
> > > + /* Fallback to grabbing any registered USB2 PHY */
> > > + if (IS_ERR(ci->usb_phy) &&
> > > + PTR_ERR(ci->usb_phy) != -EPROBE_DEFER)
> > > + ci->usb_phy = devm_usb_get_phy(dev->parent,
> > > +USB_PHY_TYPE_USB2);
> > >
> >
> > I think you may need to do the above if ci->phy is an error and the error
> > is not -EPROBE_DEFER.
> 
> As Thomas pointed out during the review of v1, the initial flow is to try
> and get both ci->usb_phy and ci->phy and let the code use ci->phy in
> priority later.
> 

If there is a generic PHY node under the USB controller and a USB PHY
somewhere else, both ci->phy and ci->usb_phy are valid. I originally thought
that was the problem you had met.

Since you are trying to fix the legacy USB PHY grab issue, I hope you could
consider the generic PHY as well.

Peter
 

> This change attempts to keep this flow intact. The EPROBE_DEFER check is in
> case the initial ci->usb_phy is valid but deferred: we don't want to
> overwrite it by calling devm_usb_get_phy which might return an actual error
> and result in losing the EPROBE_DEFER information.
> 
> Does that make sense to you?
> 
> Cheers,
> 
> Paul
> 
> > Peter
> >
> > >   /* if both generic PHY and USB PHY layers aren't enabled */
> > >   if (PTR_ERR(ci->phy) == -ENOSYS &&
> > > --
> > > 2.20.1
> --
> Paul Kocialkowski, Bootlin
> Embedded Linux and kernel engineering
> https://bootlin.com
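
The fallback rule discussed in this thread (keep the phandle lookup result unless it is an error other than -EPROBE_DEFER) can be modelled in userspace. This is a sketch only: `ERR_PTR()`, `PTR_ERR()` and `IS_ERR()` are simplified re-implementations of the kernel helpers, `pick_usb_phy()` is a hypothetical name, and both lookup results are passed in up front, whereas the patch performs the second lookup only when it is needed.

```c
#include <errno.h>
#include <stdint.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517	/* kernel-internal errno, absent from userspace headers */
#endif
#define MAX_ERRNO 4095

/* Simplified userspace versions of the kernel's error-pointer helpers. */
void *ERR_PTR(long err)
{
	return (void *)(intptr_t)err;
}

long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct usb_phy { const char *name; };

/*
 * by_phandle: result of the "phys" phandle lookup.
 * by_type:    result of the legacy type-based lookup.
 * A deferred probe must not be masked by the fallback, so by_phandle is
 * returned unchanged when it carries -EPROBE_DEFER.
 */
struct usb_phy *pick_usb_phy(struct usb_phy *by_phandle,
			     struct usb_phy *by_type)
{
	if (IS_ERR(by_phandle) && PTR_ERR(by_phandle) != -EPROBE_DEFER)
		return by_type;
	return by_phandle;
}
```

With this rule a valid PHY found by phandle always wins, a hard error such as -ENODEV falls through to the type-based lookup, and -EPROBE_DEFER is propagated so the probe can be retried.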

[PATCH -next] drm/nouveau/dmem: remove set but not used variable 'drm'

2019-02-20 Thread YueHaibing

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/gpu/drm/nouveau/nouveau_dmem.c: In function 'nouveau_dmem_free':
drivers/gpu/drm/nouveau/nouveau_dmem.c:103:22: warning:
 variable 'drm' set but not used [-Wunused-but-set-variable]
  struct nouveau_drm *drm;
  ^

Signed-off-by: YueHaibing 
---
 drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index aa9fec80492d..900a302b7ce9 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -100,12 +100,10 @@ static void
 nouveau_dmem_free(struct hmm_devmem *devmem, struct page *page)
 {
struct nouveau_dmem_chunk *chunk;
-   struct nouveau_drm *drm;
unsigned long idx;
 
chunk = (void *)hmm_devmem_page_get_drvdata(page);
idx = page_to_pfn(page) - chunk->pfn_first;
-   drm = chunk->drm;
 
/*
 * FIXME:

[PATCHv7 3/4] pci: layerscape: Add the EP mode support.

2019-02-20 Thread Xiaowei Bao

Add the PCIe EP mode support for layerscape platform.

Signed-off-by: Xiaowei Bao 
Reviewed-by: Minghuan Lian 
Reviewed-by: Zhiqiang Hou 
Reviewed-by: Kishon Vijay Abraham I 
---
depends on: https://patchwork.kernel.org/project/linux-pci/list/?series=66177

v2:
 - remove the EP mode check function.
v3:
 - modify the return value when entering the default case.
v4:
 - no change.
v5:
 - no change.
v6:
 - modify the code based on the submitted patch of the EP framework.
v7:
 - fix up the compile warning issue.

 drivers/pci/controller/dwc/Makefile|2 +-
 drivers/pci/controller/dwc/pci-layerscape-ep.c |  156 
 2 files changed, 157 insertions(+), 1 deletions(-)
 create mode 100644 drivers/pci/controller/dwc/pci-layerscape-ep.c

diff --git a/drivers/pci/controller/dwc/Makefile b/drivers/pci/controller/dwc/Makefile
index 7bcdcdf..b5f3b83 100644
--- a/drivers/pci/controller/dwc/Makefile
+++ b/drivers/pci/controller/dwc/Makefile
@@ -8,7 +8,7 @@ obj-$(CONFIG_PCI_EXYNOS) += pci-exynos.o
 obj-$(CONFIG_PCI_IMX6) += pci-imx6.o
 obj-$(CONFIG_PCIE_SPEAR13XX) += pcie-spear13xx.o
 obj-$(CONFIG_PCI_KEYSTONE) += pci-keystone.o
-obj-$(CONFIG_PCI_LAYERSCAPE) += pci-layerscape.o
+obj-$(CONFIG_PCI_LAYERSCAPE) += pci-layerscape.o pci-layerscape-ep.o
 obj-$(CONFIG_PCIE_QCOM) += pcie-qcom.o
 obj-$(CONFIG_PCIE_ARMADA_8K) += pcie-armada8k.o
 obj-$(CONFIG_PCIE_ARTPEC6) += pcie-artpec6.o
diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c b/drivers/pci/controller/dwc/pci-layerscape-ep.c
new file mode 100644
index 0000000..a42c9c3
--- /dev/null
+++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
@@ -0,0 +1,156 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * PCIe controller EP driver for Freescale Layerscape SoCs
+ *
+ * Copyright (C) 2018 NXP Semiconductor.
+ *
+ * Author: Xiaowei Bao 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "pcie-designware.h"
+
+#define PCIE_DBI2_OFFSET   0x1000  /* DBI2 base address*/
+
+struct ls_pcie_ep {
+   struct dw_pcie  *pci;
+};
+
+#define to_ls_pcie_ep(x)   dev_get_drvdata((x)->dev)
+
+static int ls_pcie_establish_link(struct dw_pcie *pci)
+{
+   return 0;
+}
+
+static const struct dw_pcie_ops ls_pcie_ep_ops = {
+   .start_link = ls_pcie_establish_link,
+};
+
+static const struct of_device_id ls_pcie_ep_of_match[] = {
+   { .compatible = "fsl,ls-pcie-ep",},
+   { },
+};
+
+static const struct pci_epc_features ls_pcie_epc_features = {
+   .linkup_notifier = false,
+   .msi_capable = true,
+   .msix_capable = false,
+};
+
+static const struct pci_epc_features*
+ls_pcie_ep_get_features(struct dw_pcie_ep *ep)
+{
+   return &ls_pcie_epc_features;
+}
+
+static void ls_pcie_ep_init(struct dw_pcie_ep *ep)
+{
+   struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
+   enum pci_barno bar;
+
+   for (bar = BAR_0; bar <= BAR_5; bar++)
+   dw_pcie_ep_reset_bar(pci, bar);
+}
+
+static int ls_pcie_ep_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
+ enum pci_epc_irq_type type, u16 interrupt_num)
+{
+   struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
+
+   switch (type) {
+   case PCI_EPC_IRQ_LEGACY:
+   return dw_pcie_ep_raise_legacy_irq(ep, func_no);
+   case PCI_EPC_IRQ_MSI:
+   return dw_pcie_ep_raise_msi_irq(ep, func_no, interrupt_num);
+   case PCI_EPC_IRQ_MSIX:
+   return dw_pcie_ep_raise_msix_irq(ep, func_no, interrupt_num);
+   default:
+   dev_err(pci->dev, "UNKNOWN IRQ type\n");
+   return -EINVAL;
+   }
+}
+
+static struct dw_pcie_ep_ops pcie_ep_ops = {
+   .ep_init = ls_pcie_ep_init,
+   .raise_irq = ls_pcie_ep_raise_irq,
+   .get_features = ls_pcie_ep_get_features,
+};
+
+static int __init ls_add_pcie_ep(struct ls_pcie_ep *pcie,
+   struct platform_device *pdev)
+{
+   struct dw_pcie *pci = pcie->pci;
+   struct device *dev = pci->dev;
+   struct dw_pcie_ep *ep;
+   struct resource *res;
+   int ret;
+
+   ep = &pci->ep;
+   ep->ops = &pcie_ep_ops;
+
+   res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "addr_space");
+   if (!res)
+   return -EINVAL;
+
+   ep->phys_base = res->start;
+   ep->addr_size = resource_size(res);
+
+   ret = dw_pcie_ep_init(ep);
+   if (ret) {
+   dev_err(dev, "failed to initialize endpoint\n");
+   return ret;
+   }
+
+   return 0;
+}
+
+static int __init ls_pcie_ep_probe(struct platform_device *pdev)
+{
+   struct device *dev = &pdev->dev;
+   struct dw_pcie *pci;
+   struct ls_pcie_ep *pcie;
+   struct resource *dbi_base;
+   int ret;
+
+   pcie = devm_kzalloc(dev, sizeof(*pcie), GFP_KERNEL);
+   if (!pcie)
+   return -ENOMEM;
+
+   pci = devm_kzalloc(dev, sizeof(*pci), GFP_KERNEL);
+   if (!pci)
+

[PATCHv7 2/4] arm64: dts: Add the PCIE EP node in dts

2019-02-20 Thread Xiaowei Bao

Add the PCIE EP node in dts for ls1046a.

Signed-off-by: Xiaowei Bao 
Reviewed-by: Minghuan Lian 
Reviewed-by: Zhiqiang Hou 
Reviewed-by: Rob Herring 
---
v2:
 - Add the SoC specific compatibles. 
v3:
 - no change
v4:
 - no change
v5:
 - change the OB win number due to the RM update.
v6:
 - no change
v7:
 - no change

 arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi |   34 +++-
 1 files changed, 33 insertions(+), 1 deletions(-)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
index 9a2106e..576262e 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
@@ -657,6 +657,17 @@
status = "disabled";
};
 
+   pcie_ep@340 {
+   compatible = "fsl,ls1046a-pcie-ep","fsl,ls-pcie-ep";
+   reg = <0x00 0x0340 0x0 0x0010
+   0x40 0x 0x8 0x>;
+   reg-names = "regs", "addr_space";
+   num-ib-windows = <6>;
+   num-ob-windows = <8>;
+   num-lanes = <2>;
+   status = "disabled";
+   };
+
pcie@350 {
compatible = "fsl,ls1046a-pcie";
reg = <0x00 0x0350 0x0 0x0010   /* controller registers */
@@ -683,6 +694,17 @@
status = "disabled";
};
 
+   pcie_ep@350 {
+   compatible = "fsl,ls1046a-pcie-ep","fsl,ls-pcie-ep";
+   reg = <0x00 0x0350 0x0 0x0010
+   0x48 0x 0x8 0x>;
+   reg-names = "regs", "addr_space";
+   num-ib-windows = <6>;
+   num-ob-windows = <8>;
+   num-lanes = <2>;
+   status = "disabled";
+   };
+
pcie@360 {
compatible = "fsl,ls1046a-pcie";
reg = <0x00 0x0360 0x0 0x0010   /* controller registers */
@@ -709,6 +731,17 @@
status = "disabled";
};
 
+   pcie_ep@360 {
+   compatible = "fsl,ls1046a-pcie-ep", "fsl,ls-pcie-ep";
+   reg = <0x00 0x0360 0x0 0x0010
+   0x50 0x 0x8 0x>;
+   reg-names = "regs", "addr_space";
+   num-ib-windows = <6>;
+   num-ob-windows = <8>;
+   num-lanes = <2>;
+   status = "disabled";
+   };
+
qdma: dma-controller@838 {
compatible = "fsl,ls1046a-qdma", "fsl,ls1021a-qdma";
reg = <0x0 0x838 0x0 0x1000>, /* Controller regs */
@@ -729,7 +762,6 @@
queue-sizes = <64 64>;
big-endian;
};
-
};
 
reserved-memory {
-- 
1.7.1

[PATCHv7 4/4] misc: pci_endpoint_test: Add the layerscape EP device support

2019-02-20 Thread Xiaowei Bao

Add the layerscape EP device support in pci_endpoint_test driver.

Signed-off-by: Xiaowei Bao 
Reviewed-by: Minghuan Lian 
Reviewed-by: Zhiqiang Hou 
Reviewed-by: Greg KH 
---
v2:
 - no change
v3:
 - no change
v4:
 - delete the comments.
v5:
 - no change.
v6:
 - no change.
v7:
 - no change.

 drivers/misc/pci_endpoint_test.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
index 896e2df..29582fe 100644
--- a/drivers/misc/pci_endpoint_test.c
+++ b/drivers/misc/pci_endpoint_test.c
@@ -788,6 +788,7 @@ static void pci_endpoint_test_remove(struct pci_dev *pdev)
 static const struct pci_device_id pci_endpoint_test_tbl[] = {
{ PCI_DEVICE(PCI_VENDOR_ID_TI, PCI_DEVICE_ID_TI_DRA74x) },
{ PCI_DEVICE(PCI_VENDOR_ID_TI, PCI_DEVICE_ID_TI_DRA72x) },
+   { PCI_DEVICE(PCI_VENDOR_ID_FREESCALE, 0x81c0) },
{ PCI_DEVICE(PCI_VENDOR_ID_SYNOPSYS, 0xedda) },
{ }
 };
-- 
1.7.1
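
How the new table entry ends up selecting a device can be sketched in userspace. This is a simplified model, not the PCI core's matching code: `struct pci_id` and `match_id()` are hypothetical, and the numeric vendor IDs are assumptions that should be checked against `include/linux/pci_ids.h`; only the device IDs 0x81c0 and 0xedda come from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values; verify against include/linux/pci_ids.h. */
#define VENDOR_ID_FREESCALE	0x1957
#define VENDOR_ID_SYNOPSYS	0x16c3

/* Hypothetical, cut-down stand-in for struct pci_device_id. */
struct pci_id {
	uint16_t vendor;
	uint16_t device;
};

/* The table ends with an all-zero sentinel, like the `{ }` entry above. */
const struct pci_id test_tbl[] = {
	{ VENDOR_ID_FREESCALE, 0x81c0 },
	{ VENDOR_ID_SYNOPSYS, 0xedda },
	{ 0, 0 },
};

/* Return the matching entry, or NULL when the device is not listed. */
const struct pci_id *match_id(const struct pci_id *tbl,
			      uint16_t vendor, uint16_t device)
{
	for (; tbl->vendor || tbl->device; tbl++)
		if (tbl->vendor == vendor && tbl->device == device)
			return tbl;
	return NULL;
}
```

A device whose vendor/device pair is absent from the table simply never binds to the test driver, which is why the one-line addition is enough to enable the Layerscape endpoint.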

[PATCHv7 1/4] dt-bindings: add DT binding for the layerscape PCIe controller with EP mode

2019-02-20 Thread Xiaowei Bao

Add the documentation for the Device Tree binding for the layerscape PCIe
controller with EP mode.

Signed-off-by: Xiaowei Bao 
Reviewed-by: Minghuan Lian 
Reviewed-by: Zhiqiang Hou 
Reviewed-by: Rob Herring 
---
v2:
 - Add the SoC specific compatibles.
v3:
 - modify the commit message.
v4:
 - no change.
v5:
 - no change.
v6:
 - no change.
v7:
 - no change.

 .../devicetree/bindings/pci/layerscape-pci.txt |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/Documentation/devicetree/bindings/pci/layerscape-pci.txt b/Documentation/devicetree/bindings/pci/layerscape-pci.txt
index 9b2b8d6..e20ceaa 100644
--- a/Documentation/devicetree/bindings/pci/layerscape-pci.txt
+++ b/Documentation/devicetree/bindings/pci/layerscape-pci.txt
@@ -13,6 +13,7 @@ information.
 
 Required properties:
 - compatible: should contain the platform identifier such as:
+  RC mode:
 "fsl,ls1021a-pcie"
 "fsl,ls2080a-pcie", "fsl,ls2085a-pcie"
 "fsl,ls2088a-pcie"
@@ -20,6 +21,8 @@ Required properties:
 "fsl,ls1046a-pcie"
 "fsl,ls1043a-pcie"
 "fsl,ls1012a-pcie"
+  EP mode:
+   "fsl,ls1046a-pcie-ep", "fsl,ls-pcie-ep"
 - reg: base addresses and lengths of the PCIe controller register blocks.
 - interrupts: A list of interrupt outputs of the controller. Must contain an
   entry for each entry in the interrupt-names property.
-- 
1.7.1
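
The reason the binding lists both a SoC-specific and a generic EP compatible can be illustrated with a small userspace model. This is a sketch under assumptions: `first_compatible_match()` is a hypothetical helper, far simpler than the kernel's device-tree matching, and only the compatible strings are taken from this series.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the index of the first entry in the node's "compatible" list that
 * the driver table also lists, or -1 when nothing matches. Both arrays are
 * NULL-terminated.
 */
int first_compatible_match(const char *const *node_compat,
			   const char *const *driver_ids)
{
	for (int i = 0; node_compat[i]; i++)
		for (int j = 0; driver_ids[j]; j++)
			if (strcmp(node_compat[i], driver_ids[j]) == 0)
				return i;
	return -1;
}

/* Node side: most specific string first, generic fallback second. */
const char *const ls1046a_ep_node[] = {
	"fsl,ls1046a-pcie-ep", "fsl,ls-pcie-ep", NULL
};

/* Driver side: the EP driver in this series lists only the generic string. */
const char *const ls_pcie_ep_driver[] = { "fsl,ls-pcie-ep", NULL };
```

Here the node binds through its second, generic entry, while the SoC-specific string stays available should a driver later need to special-case the LS1046A.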
