Re: [PATCH v1] dmaengine: imx-sdma: refine to load context only once

2019-01-04 Thread Vinod Koul
On 06-12-18, 06:36, Robin Gong wrote:
> The context loaded only one time before channel running,but
^^
space after comma please

> currently sdma_config_channel() and dma_prep_* duplicated with
> sdma_load_context(), so refine it to load context only one time
> before channel running and reload after the channel terminated.

I was going to apply it, but it fails to apply for me; please rebase and
resend.

-- 
~Vinod


Re: [PATCH] scsi/mvsas/mv_init.c: Use dma_zalloc_coherent

2019-01-04 Thread Sabyasachi Gupta
On Fri, Jan 4, 2019 at 6:43 PM John Garry  wrote:
>
> On 04/01/2019 12:48, Sabyasachi Gupta wrote:
> > On Wed, Dec 19, 2018 at 6:49 PM Sabyasachi Gupta
> >  wrote:
> >>
> >> On Sat, Dec 1, 2018 at 6:40 PM Sabyasachi Gupta
> >>  wrote:
> >>>
> >>> On Wed, Nov 21, 2018 at 7:18 PM Sabyasachi Gupta
> >>>  wrote:
> 
>  Replace dma_alloc_coherent + memset with dma_zalloc_coherent
> 
>
> If you're going to make this change, then how about changing these to the
> managed version, so that we can avoid the explicit freeing at driver
> removal?

Sorry, I don't understand what you mean.

>
>  Signed-off-by: Sabyasachi Gupta 
> >>>
> >>> Any comment on this patch?
> >>
> >> Any comment on this patch?
> >
> > Any comment on this patch?
> >
> >>
> >>>
>  ---
>   drivers/scsi/mvsas/mv_init.c | 12 
>   1 file changed, 4 insertions(+), 8 deletions(-)
> 
>  diff --git a/drivers/scsi/mvsas/mv_init.c b/drivers/scsi/mvsas/mv_init.c
>  index 3ac3437..495bddb 100644
>  --- a/drivers/scsi/mvsas/mv_init.c
>  +++ b/drivers/scsi/mvsas/mv_init.c
>  @@ -253,33 +253,29 @@ static int mvs_alloc(struct mvs_info *mvi, struct 
>  Scsi_Host *shost)
>  /*
>   * alloc and init our DMA areas
>   */
>  -   mvi->tx = dma_alloc_coherent(mvi->dev,
>  +   mvi->tx = dma_zalloc_coherent(mvi->dev,
>   sizeof(*mvi->tx) * MVS_CHIP_SLOT_SZ,
>   &mvi->tx_dma, GFP_KERNEL);
>
> I'm guessing that this does not pass checkpatch with --strict option.
>
> Thanks,
> John

I have not checked it with the --strict option.

>
>  if (!mvi->tx)
>  goto err_out;
>  -   memset(mvi->tx, 0, sizeof(*mvi->tx) * MVS_CHIP_SLOT_SZ);
>  -   mvi->rx_fis = dma_alloc_coherent(mvi->dev, MVS_RX_FISL_SZ,
>  +   mvi->rx_fis = dma_zalloc_coherent(mvi->dev, MVS_RX_FISL_SZ,
>   &mvi->rx_fis_dma, GFP_KERNEL);
>  if (!mvi->rx_fis)
>  goto err_out;
>  -   memset(mvi->rx_fis, 0, MVS_RX_FISL_SZ);
> 
>  -   mvi->rx = dma_alloc_coherent(mvi->dev,
>  +   mvi->rx = dma_zalloc_coherent(mvi->dev,
>   sizeof(*mvi->rx) * (MVS_RX_RING_SZ 
>  + 1),
>   &mvi->rx_dma, GFP_KERNEL);
>  if (!mvi->rx)
>  goto err_out;
>  -   memset(mvi->rx, 0, sizeof(*mvi->rx) * (MVS_RX_RING_SZ + 1));
>  mvi->rx[0] = cpu_to_le32(0xfff);
>  mvi->rx_cons = 0xfff;
> 
>  -   mvi->slot = dma_alloc_coherent(mvi->dev,
>  +   mvi->slot = dma_zalloc_coherent(mvi->dev,
> sizeof(*mvi->slot) * slot_nr,
> &mvi->slot_dma, GFP_KERNEL);
>  if (!mvi->slot)
>  goto err_out;
>  -   memset(mvi->slot, 0, sizeof(*mvi->slot) * slot_nr);
> 
>  mvi->bulk_buffer = dma_alloc_coherent(mvi->dev,
> TRASH_BUCKET_SIZE,
>  --
>  2.7.4
> 
> >
> > .
> >
>
>


Re: [PATCHv3 1/2] mm/memblock: extend the limit inferior of bottom-up after parsing hotplug attr

2019-01-04 Thread Mike Rapoport
On Thu, Jan 03, 2019 at 10:47:06AM -0800, Tejun Heo wrote:
> Hello,
> 
> On Wed, Jan 02, 2019 at 07:05:38PM +0200, Mike Rapoport wrote:
> > I agree that currently the bottom-up allocation after the kernel text has
> > issues with KASLR. But these issues are not necessarily related to
> > memory hotplug. Even with a single memory node, a bottom-up allocation will
> > fail if KASLR puts the kernel near the end of node0.
> > 
> > What I am trying to understand is whether there is a fundamental reason to
> > prevent allocations from [0, kernel_start)?
> > 
> > Maybe Tejun can recall why he suggested to start bottom-up allocations from
> > kernel_end.
> 
> That's from 79442ed189ac ("mm/memblock.c: introduce bottom-up
> allocation mode").  I wasn't involved in that patch, so no idea why
> the restrictions were added, but FWIW it doesn't seem necessary to me.

I should have added the reference [1] in the first place :)
Thanks!

[1] https://lore.kernel.org/lkml/20130904192215.gg26...@mtj.dyndns.org/
 
> Thanks.
> 
> -- 
> tejun
> 

-- 
Sincerely yours,
Mike.



Re: [PATCH v5 05/11] platform/x86: intel_pmc: Make PCI dependency explicit

2019-01-04 Thread Sinan Kaya

On 1/4/2019 9:20 AM, Andy Shevchenko wrote:

Unfortunately, this is not how it should be fixed.

First of all, we (Heikki, Sathya, me — all in the Cc list) agreed that this
driver should be refactored and unified with intel_scu_ipc.
Second, it has two drivers inside, and PCI is just optional glue for them;
thus, it needs to be split accordingly.


OK. Two questions:

1. How should I change this now?
2. Can you change the Kconfig accordingly after you merge the two drivers?

I'm focused on the build failure at this moment.




Re: [PATCH] riscv: remove redundant kernel-space generic-y

2019-01-04 Thread Masahiro Yamada
On Fri, Jan 4, 2019 at 11:16 AM Palmer Dabbelt  wrote:
>
> On Sun, 16 Dec 2018 06:11:11 PST (-0800), yamada.masah...@socionext.com wrote:
> > This commit removes redundant generic-y defines in
> > arch/riscv/include/asm/Kbuild.
> >
> > [1] It is redundant to define the same generic-y in both
> > arch/$(ARCH)/include/asm/Kbuild and
> > arch/$(ARCH)/include/uapi/asm/Kbuild.
> >
> > Remove the following generic-y:
> >
> >   errno.h
> >   fcntl.h
> >   ioctl.h
> >   ioctls.h
> >   ipcbuf.h
> >   mman.h
> >   msgbuf.h
> >   param.h
> >   poll.h
> >   posix_types.h
> >   resource.h
> >   sembuf.h
> >   setup.h
> >   shmbuf.h
> >   signal.h
> >   socket.h
> >   sockios.h
> >   stat.h
> >   statfs.h
> >   swab.h
> >   termbits.h
> >   termios.h
> >   types.h
> >
> > [2] It is redundant to define generic-y when arch-specific
> > implementation exists in arch/$(ARCH)/include/asm/*.h
> >
> > Remove the following generic-y:
> >
> >   cacheflush.h
> >   dma-mapping.h
> >   module.h
> >
> > Signed-off-by: Masahiro Yamada 
> > ---
> >
> >  arch/riscv/include/asm/Kbuild | 26 --
> >  1 file changed, 26 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
> > index 6a646d9..f7068f3 100644
> > --- a/arch/riscv/include/asm/Kbuild
> > +++ b/arch/riscv/include/asm/Kbuild
> > @@ -1,5 +1,4 @@
> >  generic-y += bugs.h
> > -generic-y += cacheflush.h
> >  generic-y += checksum.h
> >  generic-y += compat.h
> >  generic-y += cputime.h
> > @@ -7,18 +6,12 @@ generic-y += device.h
> >  generic-y += div64.h
> >  generic-y += dma.h
> >  generic-y += dma-contiguous.h
> > -generic-y += dma-mapping.h
> >  generic-y += emergency-restart.h
> > -generic-y += errno.h
> >  generic-y += exec.h
> >  generic-y += fb.h
> > -generic-y += fcntl.h
> >  generic-y += hardirq.h
> >  generic-y += hash.h
> >  generic-y += hw_irq.h
> > -generic-y += ioctl.h
> > -generic-y += ioctls.h
> > -generic-y += ipcbuf.h
> >  generic-y += irq_regs.h
> >  generic-y += irq_work.h
> >  generic-y += kdebug.h
> > @@ -27,34 +20,15 @@ generic-y += kvm_para.h
> >  generic-y += local.h
> >  generic-y += local64.h
> >  generic-y += mm-arch-hooks.h
> > -generic-y += mman.h
> > -generic-y += module.h
> > -generic-y += msgbuf.h
> >  generic-y += mutex.h
> > -generic-y += param.h
> >  generic-y += percpu.h
> > -generic-y += poll.h
> > -generic-y += posix_types.h
> >  generic-y += preempt.h
> > -generic-y += resource.h
> >  generic-y += scatterlist.h
> >  generic-y += sections.h
> > -generic-y += sembuf.h
> >  generic-y += serial.h
> > -generic-y += setup.h
> > -generic-y += shmbuf.h
> >  generic-y += shmparam.h
> > -generic-y += signal.h
> > -generic-y += socket.h
> > -generic-y += sockios.h
> > -generic-y += stat.h
> > -generic-y += statfs.h
> > -generic-y += swab.h
> > -generic-y += termbits.h
> > -generic-y += termios.h
> >  generic-y += topology.h
> >  generic-y += trace_clock.h
> > -generic-y += types.h
> >  generic-y += unaligned.h
> >  generic-y += user.h
> >  generic-y += vga.h
>
> Thanks.  These just sort of collected there because I hadn't trimmed them.  Is
> there a script that checks these?


No script is available for now, but
I'd like to turn on warnings to catch some cases:
https://patchwork.kernel.org/patch/10746823/


BTW, can I apply this to my tree
along with other cleanups?



-- 
Best Regards
Masahiro Yamada


Re: [PATCH v3] mm/page_owner: fix for deferred struct page init

2019-01-04 Thread Qian Cai
On 1/4/19 8:09 AM, Michal Hocko wrote:
>> Here is the number without DEFERRED_STRUCT_PAGE_INIT.
>>
>> == page_ext_init() after page_alloc_init_late() ==
>> Node 0, zone DMA: page owner found early allocated 0 pages
>> Node 0, zone DMA32: page owner found early allocated 7009 pages
>> Node 0, zone Normal: page owner found early allocated 85827 pages
>> Node 4, zone Normal: page owner found early allocated 75063 pages
>>
>> == page_ext_init() before kmemleak_init() ==
>> Node 0, zone DMA: page owner found early allocated 0 pages
>> Node 0, zone DMA32: page owner found early allocated 6654 pages
>> Node 0, zone Normal: page owner found early allocated 41907 pages
>> Node 4, zone Normal: page owner found early allocated 41356 pages
>>
>> So, it told us that it will miss tens of thousands of early page allocation 
>> call
>> sites.
> 
> This is an answer for the first part of the question (how much). The
> second is _do_we_care_?

Well, the purpose of this simple "ugly" ifdef is to avoid a regression for the
existing page_owner users with DEFERRED_STRUCT_PAGE_INIT deselected, who would
otherwise start to miss tens of thousands of early page allocation call sites.

The other option I can think of that would not hurt your eyes is to rewrite the
whole of page_ext_init(), init_page_owner() and init_debug_guardpage() to use
early functions throughout, so they work both with DEFERRED_STRUCT_PAGE_INIT=y
and without. However, I have a hard time convincing myself that is a sensible
thing to do.


Re: [PATCH v6 1/7] dmaengine: xilinx_dma: commonize DMA copy size calculation

2019-01-04 Thread Vinod Koul
On 20-11-18, 16:31, Andrea Merello wrote:
> This patch removes a bit of duplicated code by introducing a new
> function that implements calculations for DMA copy size, and
> prepares for changes to the copy size calculation that will
> happen in following patches.

Applied all, thanks
-- 
~Vinod


Re: [PATCH ghak90 (was ghak32) V4 01/10] audit: collect audit task parameters

2019-01-04 Thread Richard Guy Briggs
On 2019-01-03 18:50, Guenter Roeck wrote:
> Hi Richard,
> 
> On Tue, Jul 31, 2018 at 04:07:36PM -0400, Richard Guy Briggs wrote:
> > The audit-related parameters in struct task_struct should ideally be
> > collected together and accessed through a standard audit API.
> > 
> > Collect the existing loginuid, sessionid and audit_context together in a
> > new struct audit_task_info called "audit" in struct task_struct.
> > 
> > Use kmem_cache to manage this pool of memory.
> > Un-inline audit_free() to be able to always recover that memory.
> > 
> > See: https://github.com/linux-audit/audit-kernel/issues/81
> > 
> > Signed-off-by: Richard Guy Briggs 
> 
> Overall I am not sure if keeping task_struct a bit smaller is worth
> the added complexity, but I guess that is just me. 

The motivation was to consolidate all the audit bits into one pointer,
isolating them from the rest of the kernel, restricting access only to
helper functions to prevent abuse by other subsystems and trying to
reduce kABI issues in the future.  I agree it is a bit more complex.  It
was provoked by the need to add contid which seemed to make the most
sense as a peer to loginuid and sessionid, and adding it to task_struct
would have made it a bit too generic and available.

This is addressed at some length by Paul Moore here in v2:
https://lkml.org/lkml/2018/4/18/759

> Anyway, couple of nitpicks. Please feel free to ignore, and my apologies
> if some of all of the comments are duplicates.

Noted.  They all look like reasonable improvements, particularly the
unnecessary else and the default return.  Thanks.  The double context check
may go away anyway with the removal of audit_take_context() in Paul's
2a1fe215e730 ("audit: use current whenever possible"), which has yet to be
incorporated.

> Guenter
> 
> > ---
> >  include/linux/audit.h | 34 --
> >  include/linux/sched.h |  5 +
> >  init/init_task.c  |  3 +--
> >  init/main.c   |  2 ++
> >  kernel/auditsc.c  | 51 
> > ++-
> >  kernel/fork.c |  4 +++-
> >  6 files changed, 73 insertions(+), 26 deletions(-)
> > 
> > diff --git a/include/linux/audit.h b/include/linux/audit.h
> > index 9334fbe..8964332 100644
> > --- a/include/linux/audit.h
> > +++ b/include/linux/audit.h
> > @@ -219,8 +219,15 @@ static inline void audit_log_task_info(struct 
> > audit_buffer *ab,
> >  
> >  /* These are defined in auditsc.c */
> > /* Public API */
> 
> Not sure if the structure below belongs after "Public API".
> Is it part of the public API ?
> 
> > +struct audit_task_info {
> > +   kuid_t  loginuid;
> > +   unsigned intsessionid;
> > +   struct audit_context*ctx;
> > +};
> 
> Add empty line ?
> 
> > +extern struct audit_task_info init_struct_audit;
> > +extern void __init audit_task_init(void);
> >  extern int  audit_alloc(struct task_struct *task);
> > -extern void __audit_free(struct task_struct *task);
> > +extern void audit_free(struct task_struct *task);
> >  extern void __audit_syscall_entry(int major, unsigned long a0, unsigned 
> > long a1,
> >   unsigned long a2, unsigned long a3);
> >  extern void __audit_syscall_exit(int ret_success, long ret_value);
> > @@ -242,12 +249,15 @@ extern void audit_seccomp_actions_logged(const char 
> > *names,
> >  
> >  static inline void audit_set_context(struct task_struct *task, struct 
> > audit_context *ctx)
> >  {
> > -   task->audit_context = ctx;
> > +   task->audit->ctx = ctx;
> >  }
> >  
> >  static inline struct audit_context *audit_context(void)
> >  {
> > -   return current->audit_context;
> > +   if (current->audit)
> > +   return current->audit->ctx;
> > +   else
> > +   return NULL;
> 
> Unnecessary else (and static checkers may complain).
> 
> >  }
> >  
> >  static inline bool audit_dummy_context(void)
> > @@ -255,11 +265,7 @@ static inline bool audit_dummy_context(void)
> > void *p = audit_context();
> > return !p || *(int *)p;
> >  }
> > -static inline void audit_free(struct task_struct *task)
> > -{
> > -   if (unlikely(task->audit_context))
> > -   __audit_free(task);
> > -}
> > +
> >  static inline void audit_syscall_entry(int major, unsigned long a0,
> >unsigned long a1, unsigned long a2,
> >unsigned long a3)
> > @@ -331,12 +337,18 @@ extern int auditsc_get_stamp(struct audit_context 
> > *ctx,
> >  
> >  static inline kuid_t audit_get_loginuid(struct task_struct *tsk)
> >  {
> > -   return tsk->loginuid;
> > +   if (tsk->audit)
> > +   return tsk->audit->loginuid;
> > +   else
> > +   return INVALID_UID;
> 
> Unnecessary else.
> 
> >  }
> >  
> >  static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
> >  {
> > -   return tsk->sessionid;
> > +   if (tsk->audit)
> > +   return tsk->audit->sessionid;

Re: [PATCH v2 1/7] sysfs/cpu: Add "Unknown" vulnerability state

2019-01-04 Thread Will Deacon
On Fri, Jan 04, 2019 at 03:18:05PM +0100, Greg Kroah-Hartman wrote:
> On Fri, Jan 04, 2019 at 02:08:32PM +, Dave Martin wrote:
> > On Thu, Jan 03, 2019 at 05:48:31PM +0100, Greg Kroah-Hartman wrote:
> > > On Thu, Jan 03, 2019 at 10:38:16AM -0600, Jeremy Linton wrote:
> > > > On 01/03/2019 03:38 AM, Greg Kroah-Hartman wrote:
> > > > > On Wed, Jan 02, 2019 at 06:49:15PM -0600, Jeremy Linton wrote:
> > > > > > There is a lot of variation in the Arm ecosystem. Because of this,
> > > > > > there exist possible cases where the kernel cannot authoritatively
> > > > > > determine if a machine is vulnerable.
> > > > > 
> > > > > Really?  Why not?  What keeps you from "knowing" this?  Can't the
> > > > > developer of the chip tell you?
> > > > 
> > > > There tends to be a few cases, possibly incomplete white/black lists,
> > > 
> > > Then fix the lists :)
> > > 
> > > > firmware that isn't responding correctly, or the user didn't build in 
> > > > the
> > > > code to check the mitigation (possibly because its an embedded system 
> > > > and
> > > > they know its not vulnerable?).
> > > 
> > > If the firmware doesn't respond, that would imply it is vulnerable :)
> > > 
> > > And if the code isn't built in, again, it's vulnerable.
> > > 
> > > > I would hope that it is an exceptional case.
> > > 
> > > Then have the default be vulnerable, don't give people false hope.
> > > 
> > > > > > Rather than guess the vulnerability status in cases where
> > > > > > the mitigation is disabled or the firmware isn't responding
> > > > > > correctly, we need to display an "Unknown" state.
> > > > > 
> > > > > Shouldn't "Unknown" really be the same thing as "Vulnerable"?  A user
> > > > > should treat it the same way, "Unknown" makes it feel like "maybe I 
> > > > > can
> > > > > just ignore this and hope I really am safe", which is not a good idea 
> > > > > at
> > > > > all.
> > > > 
> > > > I tend to agree its not clear what to do with "unknown".
> > > > 
> > > > OTOH, I think there is a hesitation to declare something vulnerable 
> > > > when it
> > > > isn't. Meltdown, for example, is fairly rare given that it currently only
> > > > affects a few arm parts, so declaring someone vulnerable when they 
> > > > likely
> > > > aren't is going to be just as difficult to explain.
> > > 
> > > If you know it is rare, then you know how to properly detect it so
> > > "unknown" is not needed, correct?
> > > 
> > > Again, "unknown" is not going to help anyone out here, please don't do
> > > it.
> > 
> > Thinking about it, "unknown" is actually the common case.
> > 
> > Kernels that predate the sysfs vulnerabilities interface effectively
> > report this for all vulnerabilities by omitting the sysfs entries
> > entirely.
> > 
> > Current kernels also don't know anything about future vulnerabilities
> > that may be added in sysfs later on (but which may nevertheless be
> > discovered subsequently to affect current hardware).
> > 
> > So, can we simply omit the sysfs entries for which we can't provide a
> > good answer?
> 
> As you say, we already do this for older systems.
> 
> But don't add new logic to explicitly not create the files just because
> we "can not figure it out".  For those systems, I would default to
> "vulnerable" as I think that's what we do today, right?

Nope: currently the vulnerabilities directory doesn't even exist for arm64
because we don't select GENERIC_CPU_VULNERABILITIES.

There are also a few other things to consider here:

  1. The action to take as an end-user is slightly different in the case
 that you know for sure that your system is vulnerable, as opposed to
 the case that you don't know whether your system is vulnerable or not.
 The former needs a firmware update; the latter needs a statement about
 the CPU, which could result in a simple whitelist update in Linux.

  2. There's an unfortunate political angle to this. Whilst the Arm website
 [1] provides information for all of the Arm-designed CPUs (i.e.
 Cortex-A*), it doesn't comment on partner implementations. I'm not at
 all keen to be seen as branding them all as vulnerable in the Linux
 kernel, as this is likely to cause more problems than it solves.
 If we had complete whitelist information available in public, that
 would be ideal, but it's not the case.

  3. The architecture has added some ID registers to determine if a CPU
 is affected by Spectre and Meltdown, so a whitelist only needs to
 cover existing CPUs.

So I agree with Dave that continuing to omit the files when we don't know
whether or not the system is affected is the right thing to do.

Will

[1] 
https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability


Re: [PATCH 1/4] pmbus: associate PMBUS_SKIP_STATUS_CHECK with driver_data

2019-01-04 Thread Guenter Roeck

On 1/4/19 12:55 AM, Xiaoting Liu wrote:

The current code compares the device name with the name in i2c_device_id to
decide whether PMBUS_SKIP_STATUS_CHECK should be set in pmbus_platform_data,
which means that adding a new device with PMBUS_SKIP_STATUS_CHECK also
requires modifying code in pmbus_probe().

This patch adds pmbus_device_info to hold the pages and flags. A pointer to
it is stored in the driver_data of i2c_device_id, which makes adding new
devices more straightforward.



Good idea, but I don't see at this time where the patch is needed.
Maybe in patch 3/4, which is missing from the series?


Signed-off-by: Shunyong Yang 
Signed-off-by: Xiaoting Liu 
---
  drivers/hwmon/pmbus/pmbus.c | 54 +++--
  include/linux/pmbus.h   |  5 +
  2 files changed, 37 insertions(+), 22 deletions(-)

diff --git a/drivers/hwmon/pmbus/pmbus.c b/drivers/hwmon/pmbus/pmbus.c
index 7688dab32f6e..aa4cf9636e99 100644
--- a/drivers/hwmon/pmbus/pmbus.c
+++ b/drivers/hwmon/pmbus/pmbus.c
@@ -172,13 +172,15 @@ static int pmbus_probe(struct i2c_client *client,
 struct pmbus_driver_info *info;
 struct pmbus_platform_data *pdata = NULL;
	struct device *dev = &client->dev;
+   struct pmbus_device_info *device_info;

 info = devm_kzalloc(dev, sizeof(struct pmbus_driver_info), GFP_KERNEL);
 if (!info)
 return -ENOMEM;

-   if (!strcmp(id->name, "dps460") || !strcmp(id->name, "dps800") ||
-   !strcmp(id->name, "sgd009")) {
+   device_info = (struct pmbus_device_info *)id->driver_data;
+
+   if (device_info->flags & PMBUS_SKIP_STATUS_CHECK) {
 pdata = devm_kzalloc(dev, sizeof(struct pmbus_platform_data),
  GFP_KERNEL);
 if (!pdata)
@@ -187,36 +189,44 @@ static int pmbus_probe(struct i2c_client *client,
 pdata->flags = PMBUS_SKIP_STATUS_CHECK;
 }

-   info->pages = id->driver_data;
+   info->pages = device_info->pages;
 info->identify = pmbus_identify;
 dev->platform_data = pdata;

 return pmbus_do_probe(client, id, info);
  }

+static const struct pmbus_device_info default_pmbus_info = {1, 0};
+static const struct pmbus_device_info dps460_pmbus_info = {
+   1, PMBUS_SKIP_STATUS_CHECK};
+static const struct pmbus_device_info dps800_pmbus_info = {
+   1, PMBUS_SKIP_STATUS_CHECK};
+static const struct pmbus_device_info sgd009_pmbus_info = {
+   1, PMBUS_SKIP_STATUS_CHECK};


Three structures with exactly the same content does not add value.
Please merge into one with a common name that reflects its use.


+static const struct pmbus_device_info pmbus_info = {0, 0};


default_pmbus_info and pmbus_info are badly named and ordered.
The name should reflect that one sets one page and that the other
leaves the number of pages unset.

I would suggest three structures and names, such as

pmbus_info_one
pmbus_info_one_skip
pmbus_info_zero

though I am open to better names.


  /*
   * Use driver_data to set the number of pages supported by the chip.
   */
  static const struct i2c_device_id pmbus_id[] = {
-   {"adp4000", 1},
-   {"bmr453", 1},
-   {"bmr454", 1},
-   {"dps460", 1},
-   {"dps800", 1},
-   {"mdt040", 1},
-   {"ncp4200", 1},
-   {"ncp4208", 1},
-   {"pdt003", 1},
-   {"pdt006", 1},
-   {"pdt012", 1},
-   {"pmbus", 0},
-   {"sgd009", 1},
-   {"tps40400", 1},
-   {"tps544b20", 1},
-   {"tps544b25", 1},
-   {"tps544c20", 1},
-   {"tps544c25", 1},
-   {"udt020", 1},
+   {"adp4000", (kernel_ulong_t)&default_pmbus_info},
+   {"bmr453", (kernel_ulong_t)&default_pmbus_info},
+   {"bmr454", (kernel_ulong_t)&default_pmbus_info},
+   {"dps460", (kernel_ulong_t)&dps460_pmbus_info},
+   {"dps800", (kernel_ulong_t)&dps800_pmbus_info},
+   {"mdt040", (kernel_ulong_t)&default_pmbus_info},
+   {"ncp4200", (kernel_ulong_t)&default_pmbus_info},
+   {"ncp4208", (kernel_ulong_t)&default_pmbus_info},
+   {"pdt003", (kernel_ulong_t)&default_pmbus_info},
+   {"pdt006", (kernel_ulong_t)&default_pmbus_info},
+   {"pdt012", (kernel_ulong_t)&default_pmbus_info},
+   {"pmbus", (kernel_ulong_t)&pmbus_info},
+   {"sgd009", (kernel_ulong_t)&sgd009_pmbus_info},
+   {"tps40400", (kernel_ulong_t)&default_pmbus_info},
+   {"tps544b20", (kernel_ulong_t)&default_pmbus_info},
+   {"tps544b25", (kernel_ulong_t)&default_pmbus_info},
+   {"tps544c20", (kernel_ulong_t)&default_pmbus_info},
+   {"tps544c25", (kernel_ulong_t)&default_pmbus_info},
+   {"udt020", (kernel_ulong_t)&default_pmbus_info},
 {}
  };

diff --git a/include/linux/pmbus.h b/include/linux/pmbus.h
index ee3c2aba2a8e..3c05edad7666 100644
--- a/include/linux/pmbus.h
+++ b/include/linux/pmbus.h
@@ -46,4 +46,9 @@ struct pmbus_platform_data {
 struct regulator_init_data *reg_init_data;
  };

+struct pmbus_device_info {
+   int pages;
+   u32 flags;
+};
+


This should not be needed here. The structure 

[GIT PULL] chrome-platform updates for v4.21

2019-01-04 Thread Benson Leung
Hi Linus,

Happy new year. Sorry for the late in the window pull request, but here are
the updates for chrome-platform.

Mainly a maintainership change, and a pair of patches from Brian to fixup
wakeup handling for one kind of EC event.


The following changes since commit 651022382c7f8da46cb4872a545ee1da6d097d2a:

  Linux 4.20-rc1 (2018-11-04 15:37:52 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platform.git 
tags/tag-chrome-platform-for-v4.21

for you to fetch changes up to cdd6a4a0e2ec590c30ad0e965fa08bf37461cadb:

  MAINTAINERS: add maintainers for ChromeOS EC sub-drivers (2019-01-03 12:26:43 
-0800)


chrome platform changes for v4.21

Changes for EC_MKBP_EVENT_SENSOR_FIFO handling.
Also, maintainership changes: Olof out, Enric Balletbo in.


Benson Leung (1):
  MAINTAINERS: platform/chrome: Add Enric as a maintainer

Brian Norris (2):
  platform/chrome: straighten out cros_ec_get_{next,host}_event() error 
codes
  platform/chrome: don't report EC_MKBP_EVENT_SENSOR_FIFO as wakeup

Enric Balletbo i Serra (1):
  MAINTAINERS: add maintainers for ChromeOS EC sub-drivers

Olof Johansson (1):
  MAINTAINERS: platform/chrome: remove myself as maintainer

 MAINTAINERS | 11 ++-
 drivers/platform/chrome/cros_ec_proto.c | 22 +-
 include/linux/mfd/cros_ec.h |  6 --
 3 files changed, 31 insertions(+), 8 deletions(-)

Thanks,
Benson

-- 
Benson Leung
Staff Software Engineer
Chrome OS Kernel
Google Inc.
ble...@google.com
Chromium OS Project
ble...@chromium.org




Re: [PATCH] dmaengine: fix dmaengine_desc_callback_valid() doesn't check for callback_result

2019-01-04 Thread Vinod Koul
On 16-11-18, 14:56, Andrea Merello wrote:
> There are two flavors of DMA completion callbacks: callback() and
> callback_result(); the latter takes an additional parameter that carries
> result information.
> 
> Most dmaengine helper functions that work with callbacks take care of both
> flavors i.e. dmaengine_desc_get_callback_invoke() first checks for
> callback_result() to be not NULL, and eventually it calls this one;
> otherwise it goes on checking for callback().
> 
> It seems however that dmaengine_desc_callback_valid() does not care about
> callback_result(), and it returns false in case callback() is NULL but
> callback_result() is not; unless there is a (hidden to me) reason for doing
> so then I'd say this is wrong.
> 
> I've hit this by using a DMA controller driver (xilinx_dma) that doesn't
> trigger any callback invocation unless dmaengine_desc_callback_valid()
> returns true, while I had only callback_result() implemented in my client
> driver (which AFAICT is always fine since dmaengine documentation says that
> callback() will be deprecated).
> 
> This patch fixes this by making dmaengine_desc_callback_valid() to return
> true in the said scenario.
> 
> Signed-off-by: Andrea Merello 
> ---
>  drivers/dma/dmaengine.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/dma/dmaengine.h b/drivers/dma/dmaengine.h
> index 501c0b063f85..0ba2c1f3c55d 100644
> --- a/drivers/dma/dmaengine.h
> +++ b/drivers/dma/dmaengine.h
> @@ -168,7 +168,7 @@ dmaengine_desc_get_callback_invoke(struct 
> dma_async_tx_descriptor *tx,
>  static inline bool
>  dmaengine_desc_callback_valid(struct dmaengine_desc_callback *cb)
>  {
> - return (cb->callback) ? true : false;
> + return (cb->callback || cb->callback_result);

So I do not think this one should take care of callback_result; it is
supposed to check whether the callback is valid or not, nothing more.

Of course, usage of it may be incorrect, and that should be fixed. We do have
dmaengine_desc_callback_invoke(), which propagates the callback_result to the
user.

-- 
~Vinod


Re: (Audio/Video): The Truth about Linux GPLv2 and license recission (revocation).

2019-01-04 Thread Ivan Ivanov
Sadly, your informative videos were deleted (by the NSA?); could you please
re-upload them? This could be useful for removing your source code if you
disagree with the Code of Conduct.

On Fri, 4 Jan 2019 at 16:04, wrote:
>
> Information regarding the rights of the linux programmers, regarding
> rescission (revocation) of their granted license to use their code.
>
> Video:
> http://www.liveleak.com/view?t=9O5vz_1546606404
>
> ( Audio Only: )
> (Part1: http://www.liveleak.com/view?t=s3Sr9_1546605652  )
> (Part2: http://www.liveleak.com/view?t=aOkfS_1546605889  )
>


[PATCH] regulator: act8945a: Use rdev_get_id() to access id of regulator

2019-01-04 Thread Axel Lin
Use rdev_get_id() instead of directly access rdev->desc->id.

Signed-off-by: Axel Lin 
---
 drivers/regulator/act8945a-regulator.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/regulator/act8945a-regulator.c 
b/drivers/regulator/act8945a-regulator.c
index 603db77723b6..caa61d306a69 100644
--- a/drivers/regulator/act8945a-regulator.c
+++ b/drivers/regulator/act8945a-regulator.c
@@ -87,7 +87,8 @@ static const struct regulator_linear_range 
act8945a_voltage_ranges[] = {
 static int act8945a_set_suspend_state(struct regulator_dev *rdev, bool enable)
 {
struct regmap *regmap = rdev->regmap;
-   int id = rdev->desc->id, reg, val;
+   int id = rdev_get_id(rdev);
+   int reg, val;
 
switch (id) {
case ACT8945A_ID_DCDC1:
@@ -159,7 +160,7 @@ static int act8945a_set_mode(struct regulator_dev *rdev, 
unsigned int mode)
 {
struct act8945a_pmic *act8945a = rdev_get_drvdata(rdev);
struct regmap *regmap = rdev->regmap;
-   int id = rdev->desc->id;
+   int id = rdev_get_id(rdev);
int reg, ret, val = 0;
 
switch (id) {
@@ -190,11 +191,11 @@ static int act8945a_set_mode(struct regulator_dev *rdev, 
unsigned int mode)
 
switch (mode) {
case REGULATOR_MODE_STANDBY:
-   if (rdev->desc->id > ACT8945A_ID_DCDC3)
+   if (id > ACT8945A_ID_DCDC3)
val = BIT(5);
break;
case REGULATOR_MODE_NORMAL:
-   if (rdev->desc->id <= ACT8945A_ID_DCDC3)
+   if (id <= ACT8945A_ID_DCDC3)
val = BIT(5);
break;
default:
@@ -213,7 +214,7 @@ static int act8945a_set_mode(struct regulator_dev *rdev, 
unsigned int mode)
 static unsigned int act8945a_get_mode(struct regulator_dev *rdev)
 {
struct act8945a_pmic *act8945a = rdev_get_drvdata(rdev);
-   int id = rdev->desc->id;
+   int id = rdev_get_id(rdev);
 
if (id < ACT8945A_ID_DCDC1 || id >= ACT8945A_ID_MAX)
return -EINVAL;
-- 
2.17.1



Re: [PATCH 4/4] pmbus (dps650ab): add power supply driver

2019-01-04 Thread Guenter Roeck

On 1/4/19 1:05 AM, Xiaoting Liu wrote:

The Delta dps650ab provides main power and standby power to the server. The
dps650ab can be detected via MFR_ID and MFR_MODEL, according to the
manufacturer's feedback. This patch adds a driver to monitor the power
supply status.

Signed-off-by: Xiaoting Liu 
---
  drivers/hwmon/pmbus/Kconfig|  10 +
  drivers/hwmon/pmbus/Makefile   |   1 +
  drivers/hwmon/pmbus/dps650ab.c | 100 +
  drivers/hwmon/pmbus/pmbus.c|   3 ++
  4 files changed, 114 insertions(+)

diff --git a/drivers/hwmon/pmbus/Kconfig b/drivers/hwmon/pmbus/Kconfig
index 629cb45f8557..de4638396592 100644
--- a/drivers/hwmon/pmbus/Kconfig
+++ b/drivers/hwmon/pmbus/Kconfig
@@ -184,4 +184,14 @@ config SENSORS_ZL6100
   This driver can also be built as a module. If so, the module will
   be called zl6100.

+config SENSORS_DPS650AB
+   tristate "Delta DPS650AB"
+   default n
+   help
+ If you say yes here you get hardware monitoring support for the
+ Delta DPS650AB controller.
+
+ This driver can also be built as a module. If so, the module will
+ be called dps650ab.
+
  endif # PMBUS
diff --git a/drivers/hwmon/pmbus/Makefile b/drivers/hwmon/pmbus/Makefile
index ea0e39518c21..414818230a26 100644
--- a/drivers/hwmon/pmbus/Makefile
+++ b/drivers/hwmon/pmbus/Makefile
@@ -21,3 +21,4 @@ obj-$(CONFIG_SENSORS_TPS53679)	+= tps53679.o
  obj-$(CONFIG_SENSORS_UCD9000)  += ucd9000.o
  obj-$(CONFIG_SENSORS_UCD9200)  += ucd9200.o
  obj-$(CONFIG_SENSORS_ZL6100)   += zl6100.o
+obj-$(CONFIG_SENSORS_DPS650AB) += dps650ab.o
diff --git a/drivers/hwmon/pmbus/dps650ab.c b/drivers/hwmon/pmbus/dps650ab.c
new file mode 100644
index ..3c300e621f5a
--- /dev/null
+++ b/drivers/hwmon/pmbus/dps650ab.c
@@ -0,0 +1,100 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Hardware monitoring driver for DELTA DPS650AB
+ *
+ * Copyright (c) 2018 Huaxintong Semiconductor Technology Co., Ltd.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "pmbus.h"
+
+#define DPS650AB_MFR_ID		"DELTA"
+#define DPS650AB_MFR_MODEL	"DPS-650AB-16 A"
+
+static int dps650ab_probe(struct i2c_client *client,
+const struct i2c_device_id *id)
+{
+   struct pmbus_driver_info *info;
+   u8 buf[I2C_SMBUS_BLOCK_MAX];
+   int ret;
+
+   memset(buf, 0, I2C_SMBUS_BLOCK_MAX);
+


I don't think this is needed.


+   if (!i2c_check_functionality(client->adapter,
+I2C_FUNC_SMBUS_READ_BYTE_DATA
+   | I2C_FUNC_SMBUS_READ_WORD_DATA
+   | I2C_FUNC_SMBUS_READ_BLOCK_DATA))
+   return -ENODEV;
+
+   ret = i2c_smbus_read_block_data(client, PMBUS_MFR_ID, buf);
+   if (ret < 0) {
+   dev_err(&client->dev, "Failed to read PMBUS_MFR_ID\n");
+   return ret;
+   }
+
+   if (strncmp(buf, DPS650AB_MFR_ID, strlen(DPS650AB_MFR_ID))) {


It might make sense to verify the length of the returned string as well,
unless strings such as "DELTAX" are accepted on purpose. But even if
that is the case I would prefer an added check of
"ret < strlen(DPS650AB_MFR_ID)" over the memset() above


+   dev_err(&client->dev, "DPS650AB_MFR_ID unrecognised\n");
+   return -ENODEV;
+   }
+
+   ret = i2c_smbus_read_block_data(client, PMBUS_MFR_MODEL, buf);
+   if (ret < 0) {
+   dev_err(&client->dev, "Failed to read PMBUS_MFR_MODEL\n");
+   return ret;
+   }
+
+   if (strncmp(buf, DPS650AB_MFR_MODEL, strlen(DPS650AB_MFR_MODEL))) {


... to ensure that nothing bad slips through here. As currently written,
the code accepts a MFR_ID of "DELTA50AB-16 A" combined with a MFR_MODEL
"DPS-6" as valid.


+   dev_err(&client->dev, "DPS650AB_MFR_MODEL unrecognised\n");


It would be a nice touch to actually list the unrecognized model. Otherwise
the message is quite useless. Same for MFR_ID.


+   return -ENODEV;
+   }
+
+   info = devm_kzalloc(&client->dev, sizeof(*info), GFP_KERNEL);
+   if (!info)
+   return -ENOMEM;
+
+   info->pages = 1;
+   info->format[PSC_VOLTAGE_IN] = linear;
+   info->format[PSC_VOLTAGE_OUT] = linear;
+   info->format[PSC_CURRENT_IN] = linear;
+   info->format[PSC_CURRENT_OUT] = linear;
+   info->format[PSC_POWER] = linear;
+   info->format[PSC_TEMPERATURE] = linear;
+
+   info->func[0] = PMBUS_HAVE_VIN
+   | PMBUS_HAVE_IIN | PMBUS_HAVE_PIN
+   | PMBUS_HAVE_STATUS_INPUT
+   | PMBUS_HAVE_POUT | PMBUS_HAVE_FAN12
+   | PMBUS_HAVE_IOUT | PMBUS_HAVE_STATUS_IOUT
+   | PMBUS_HAVE_VOUT | PMBUS_HAVE_STATUS_VOUT
+   | PMBUS_HAVE_TEMP | PMBUS_HAVE_TEMP2
+   | PMBUS_HAVE_STATUS_TEMP;
+   info->func[1] = info->func[0];
+
+   return pmbus_do_probe(client, id, info);
+}
+

Re: [PATCH] dmaengine: qcom_hidma: Check for driver register failure

2019-01-04 Thread Vinod Koul
On 28-12-18, 14:11, Aditya Pakki wrote:
> While initializing the driver, the function platform_driver_register can
> fail and return an error. Consistent with other invocations, this patch
> returns the error upstream.

Applied, thanks

-- 
~Vinod


Re: [PATCH] dma/mv_xor: Fix a missing check in mv_xor_channel_add

2019-01-04 Thread Vinod Koul
On 24-12-18, 11:41, Aditya Pakki wrote:
> dma_async_device_register() may fail and return an error. The capabilities
> checked in mv_xor_channel_add() are not complete. The fix handles the
> error by freeing the resources.

Applied after fixing subsystem tag, thanks

-- 
~Vinod


Re: [PATCH] dmaengine: stm32-mdma: Add a check on read_u32_array

2019-01-04 Thread Vinod Koul
On 28-12-18, 13:26, Aditya Pakki wrote:
> In stm32_mdma_probe, after reading the property "st,ahb-addr-masks", the
> second call is not checked for failure. This time-of-check to
> time-of-use error on "count" is now sent upstream.

Applied, thanks

-- 
~Vinod


[PATCH] mm: kmemleak: Turn kmemleak_lock to spin lock and RCU primitives

2019-01-04 Thread zhe.he
From: He Zhe 

It's not necessary to keep consistency between readers and writers of
kmemleak_lock. RCU is a better fit for this case. To gain better
performance, convert the reader locks to RCU read locks and the writer
locks to normal spin locks.

"time echo scan > /sys/kernel/debug/kmemleak" is improved from around 1.010s to
0.475s, without lock debug options, tested on Intel Corporation Broadwell Client
platform/Basking Ridge.

spin_lock_nested is replaced with the irqsave version since the original
enclosing irqsave lock is gone. Otherwise we might hit the following
potential deadlock, reported by lockdep.

WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
4.20.0-standard #1 Not tainted
-
kmemleak/163 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
8d7de78e (&(&object->lock)->rlock/1){+.+.}, at: scan_block+0xc4/0x1e0

and this task is already holding:
9178399c (&(&object->lock)->rlock){..-.}, at: scan_gray_list+0xec/0x180
which would create a new lock dependency:
 (&(&object->lock)->rlock){..-.} -> (&(&object->lock)->rlock/1){+.+.}

but this new dependency connects a HARDIRQ-irq-safe lock:
 (&(&object->lock)->rlock){-.-.}

snip

       CPU0                           CPU1
       ----                           ----
  lock(&(&object->lock)->rlock/1);
                                      local_irq_disable();
                                      lock(&(&object->lock)->rlock);
                                      lock(&(&object->lock)->rlock);
  <Interrupt>
    lock(&(&object->lock)->rlock);

Signed-off-by: He Zhe 
Cc: catalin.mari...@arm.com
---
 mm/kmemleak.c | 38 --
 1 file changed, 16 insertions(+), 22 deletions(-)
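The reader/writer split described in the commit message can be sketched in
userspace with C11 atomics standing in for the kernel primitives: an acquire
load plays the role of rcu_read_lock()/rcu_dereference(), and a tiny spinlock
serializes writers as spin_lock_irqsave() does. All names below are
illustrative, not kernel API:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Userspace analogue of the kmemleak change: lookups take no global
 * lock and rely on release/acquire publication of the shared pointer,
 * while writers still serialize among themselves. */
struct object { int value; };

static _Atomic(struct object *) published = NULL; /* the "object tree" */
static atomic_flag writer_lock = ATOMIC_FLAG_INIT;

void publish_object(struct object *obj)
{
	/* writer side: serialized, like spin_lock_irqsave() in the patch */
	while (atomic_flag_test_and_set_explicit(&writer_lock,
						 memory_order_acquire))
		;	/* spin */
	atomic_store_explicit(&published, obj, memory_order_release);
	atomic_flag_clear_explicit(&writer_lock, memory_order_release);
}

struct object *lookup_object_lockless(void)
{
	/* reader side: lockless acquire load, cf. rcu_read_lock() plus
	 * rcu_dereference() in the kernel */
	return atomic_load_explicit(&published, memory_order_acquire);
}
```

The readers never contend with each other or block the writer, which is
where the reported scan-time improvement comes from.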

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index f9d9dc2..ef9ea00 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -26,7 +26,7 @@
  *
  * The following locks and mutexes are used by kmemleak:
  *
- * - kmemleak_lock (rwlock): protects the object_list modifications and
+ * - kmemleak_lock (spinlock): protects the object_list modifications and
  *   accesses to the object_tree_root. The object_list is the main list
  *   holding the metadata (struct kmemleak_object) for the allocated memory
  *   blocks. The object_tree_root is a red black tree used to look-up
@@ -199,7 +199,7 @@ static LIST_HEAD(gray_list);
 /* search tree for object boundaries */
 static struct rb_root object_tree_root = RB_ROOT;
 /* rw_lock protecting the access to object_list and object_tree_root */
-static DEFINE_RWLOCK(kmemleak_lock);
+static DEFINE_SPINLOCK(kmemleak_lock);
 
 /* allocation caches for kmemleak internal data */
 static struct kmem_cache *object_cache;
@@ -515,9 +515,7 @@ static struct kmemleak_object *find_and_get_object(unsigned long ptr, int alias)
struct kmemleak_object *object;
 
rcu_read_lock();
-   read_lock_irqsave(&kmemleak_lock, flags);
object = lookup_object(ptr, alias);
-   read_unlock_irqrestore(&kmemleak_lock, flags);
 
/* check whether the object is still available */
if (object && !get_object(object))
@@ -537,13 +535,13 @@ static struct kmemleak_object *find_and_remove_object(unsigned long ptr, int alias)
unsigned long flags;
struct kmemleak_object *object;
 
-   write_lock_irqsave(&kmemleak_lock, flags);
+   spin_lock_irqsave(&kmemleak_lock, flags);
	object = lookup_object(ptr, alias);
	if (object) {
		rb_erase(&object->rb_node, &object_tree_root);
		list_del_rcu(&object->object_list);
	}
-   write_unlock_irqrestore(&kmemleak_lock, flags);
+   spin_unlock_irqrestore(&kmemleak_lock, flags);
 
return object;
 }
@@ -617,7 +615,7 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size,
/* kernel backtrace */
object->trace_len = __save_stack_trace(object->trace);
 
-   write_lock_irqsave(&kmemleak_lock, flags);
+   spin_lock_irqsave(&kmemleak_lock, flags);
 
min_addr = min(min_addr, ptr);
max_addr = max(max_addr, ptr + size);
@@ -648,7 +646,7 @@ static struct kmemleak_object *create_object(unsigned long ptr, size_t size,
 
	list_add_tail_rcu(&object->object_list, &object_list);
 out:
-   write_unlock_irqrestore(&kmemleak_lock, flags);
+   spin_unlock_irqrestore(&kmemleak_lock, flags);
return object;
 }
 
@@ -1334,7 +1332,7 @@ static void scan_block(void *_start, void *_end,
unsigned long *end = _end - (BYTES_PER_POINTER - 1);
unsigned long flags;
 
-   read_lock_irqsave(&kmemleak_lock, flags);
+   rcu_read_lock();
for (ptr = start; ptr < end; ptr++) {
struct kmemleak_object *object;
unsigned long pointer;
@@ -1350,14 +1348,8 @@ static void scan_block(void *_start, void *_end,
if (pointer < min_addr || pointer >= max_addr)
continue;
 
-   /*
-* No need for get_object() here since we hold kmemleak_lock.
-* object->use_count cannot be dropped to 0 while the object
-* is still present in object_tree_root and object_list
-* (with 

Re: [PATCH v5 06/11] platform/x86: apple-gmux: Make PCI dependency explicit

2019-01-04 Thread Andy Shevchenko
On Fri, Jan 4, 2019 at 4:12 PM Andy Shevchenko
 wrote:
>
> On Wed, Jan 2, 2019 at 8:10 PM Sinan Kaya  wrote:
> >
> > After 'commit 5d32a66541c4 ("PCI/ACPI: Allow ACPI to be built without
> > CONFIG_PCI set")' dependencies on CONFIG_PCI that previously were
> > satisfied implicitly through dependencies on CONFIG_ACPI have to be
> > specified directly. This driver depends on the PCI infrastructure but
> > the dependency has not been explicitly called out.
> >
> > Fixes: 5d32a66541c46 ("PCI/ACPI: Allow ACPI to be built without CONFIG_PCI 
> > set")
> > Signed-off-by: Sinan Kaya 
> > ---
> >  drivers/platform/x86/Kconfig | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig
> > index 7afb96cb1cd6..89f4b86244a7 100644
> > --- a/drivers/platform/x86/Kconfig
> > +++ b/drivers/platform/x86/Kconfig
> > @@ -1136,6 +1136,7 @@ config SAMSUNG_Q10
> >  config APPLE_GMUX
> > tristate "Apple Gmux Driver"
>
> > depends on ACPI
> > +   depends on PCI
>
> Can you stick with the same pattern you have used for the rest in this series?
> I.e.
>  depends on ACPI && PCI

After satisfying it

Reviewed-by: Andy Shevchenko 
Acked-by: Andy Shevchenko 

>
>
> > depends on PNP
> > depends on BACKLIGHT_CLASS_DEVICE
> > depends on BACKLIGHT_APPLE=n || BACKLIGHT_APPLE
> > --
> > 2.19.0
> >
>
>
> --
> With Best Regards,
> Andy Shevchenko



-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH v5 05/11] platform/x86: intel_pmc: Make PCI dependency explicit

2019-01-04 Thread Andy Shevchenko
On Wed, Jan 2, 2019 at 8:10 PM Sinan Kaya  wrote:
>
> After 'commit 5d32a66541c4 ("PCI/ACPI: Allow ACPI to be built without
> CONFIG_PCI set")' dependencies on CONFIG_PCI that previously were
> satisfied implicitly through dependencies on CONFIG_ACPI have to be
> specified directly. Code relies on PCI for execution. Specify this
> in the Kconfig.
>

Unfortunately, this is not how it should be fixed.

First of all, we (Heikki, Sathya, me — all in Cc list) agreed that this
driver should be refactored and unified with intel_scu_ipc.
Then, it has two drivers inside and PCI is just a glue for it which is
optional. Thus, needs to be split accordingly.

> Fixes: 5d32a66541c46 ("PCI/ACPI: Allow ACPI to be built without CONFIG_PCI 
> set")
> Signed-off-by: Sinan Kaya 
> ---
>  drivers/platform/x86/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig
> index b36ea14b41ad..7afb96cb1cd6 100644
> --- a/drivers/platform/x86/Kconfig
> +++ b/drivers/platform/x86/Kconfig
> @@ -1174,7 +1174,7 @@ config INTEL_SMARTCONNECT
>
>  config INTEL_PMC_IPC
> tristate "Intel PMC IPC Driver"
> -   depends on ACPI
> +   depends on ACPI && PCI
> ---help---
> This driver provides support for PMC control on some Intel platforms.
> The PMC is an ARC processor which defines IPC commands for 
> communication
> --
> 2.19.0
>


-- 
With Best Regards,
Andy Shevchenko


[PATCH] hv_balloon: avoid touching uninitialized struct page during tail onlining

2019-01-04 Thread Vitaly Kuznetsov
Hyper-V memory hotplug protocol has 2M granularity and in Linux x86 we use
128M. To deal with it we implement partial section onlining by registering
custom page onlining callback (hv_online_page()). Later, when more memory
arrives we try to online the 'tail' (see hv_bring_pgs_online()).

It was found that in some cases this 'tail' onlining causes issues:

 BUG: Bad page state in process kworker/0:2  pfn:109e3a
 page:e08344278e80 count:0 mapcount:1 mapping: index:0x0
 flags: 0xf8000()
 raw: 000f8000 dead0100 dead0200 
 raw:    
 page dumped because: nonzero mapcount
 ...
 Workqueue: events hot_add_req [hv_balloon]
 Call Trace:
  dump_stack+0x5c/0x80
  bad_page.cold.112+0x7f/0xb2
  free_pcppages_bulk+0x4b8/0x690
  free_unref_page+0x54/0x70
  hv_page_online_one+0x5c/0x80 [hv_balloon]
  hot_add_req.cold.24+0x182/0x835 [hv_balloon]
  ...

Turns out that we now have deferred struct page initialization for memory
hotplug so e.g. memory_block_action() in drivers/base/memory.c does
pages_correctly_probed() check and in that check it avoids inspecting
struct pages and checks sections instead. But in Hyper-V balloon driver we
do PageReserved(pfn_to_page()) check and this is now wrong.

Switch to checking online_section_nr() instead.

Signed-off-by: Vitaly Kuznetsov 
---
 drivers/hv/hv_balloon.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 5301fef16c31..7c6349a50ef1 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -888,12 +888,14 @@ static unsigned long handle_pg_range(unsigned long pg_start,
pfn_cnt -= pgs_ol;
/*
 * Check if the corresponding memory block is already
-* online by checking its last previously backed page.
-* In case it is we need to bring rest (which was not
-* backed previously) online too.
+* online. It is possible to observe struct pages still
+* being uninitialized here so check section instead.
+* In case the section is online we need to bring the
+* rest of pfns (which were not backed previously)
+* online too.
 */
if (start_pfn > has->start_pfn &&
-   !PageReserved(pfn_to_page(start_pfn - 1)))
+   online_section_nr(pfn_to_section_nr(start_pfn)))
hv_bring_pgs_online(has, start_pfn, pgs_ol);
 
}
-- 
2.20.1



Re: [PATCH v2 1/7] sysfs/cpu: Add "Unknown" vulnerability state

2019-01-04 Thread Greg Kroah-Hartman
On Fri, Jan 04, 2019 at 02:08:32PM +, Dave Martin wrote:
> On Thu, Jan 03, 2019 at 05:48:31PM +0100, Greg Kroah-Hartman wrote:
> > On Thu, Jan 03, 2019 at 10:38:16AM -0600, Jeremy Linton wrote:
> > > On 01/03/2019 03:38 AM, Greg Kroah-Hartman wrote:
> > > > On Wed, Jan 02, 2019 at 06:49:15PM -0600, Jeremy Linton wrote:
> > > > > There is a lot of variation in the Arm ecosystem. Because of this,
> > > > > there exist possible cases where the kernel cannot authoritatively
> > > > > determine if a machine is vulnerable.
> > > > 
> > > > Really?  Why not?  What keeps you from "knowing" this?  Can't the
> > > > developer of the chip tell you?
> > > 
> > > There tends to be a few cases, possibly incomplete white/black lists,
> > 
> > Then fix the lists :)
> > 
> > > firmware that isn't responding correctly, or the user didn't build in the
> > > code to check the mitigation (possibly because its an embedded system and
> > > they know its not vulnerable?).
> > 
> > If the firmware doesn't respond, that would imply it is vulnerable :)
> > 
> > And if the code isn't built in, again, it's vulnerable.
> > 
> > > I would hope that it is an exceptional case.
> > 
> > Then have the default be vulnerable, don't give people false hope.
> > 
> > > > > Rather than guess the vulnerability status in cases where
> > > > > the mitigation is disabled or the firmware isn't responding
> > > > > correctly, we need to display an "Unknown" state.
> > > > 
> > > > Shouldn't "Unknown" really be the same thing as "Vulnerable"?  A user
> > > > should treat it the same way, "Unknown" makes it feel like "maybe I can
> > > > just ignore this and hope I really am safe", which is not a good idea at
> > > > all.
> > > 
> > > I tend to agree its not clear what to do with "unknown".
> > > 
> > > OTOH, I think there is a hesitation to declare something vulnerable when 
> > > it
> > > isn't. Meltdown for example, is fairly rare given that it currently only
> > > affects a few arm parts, so declaring someone vulnerable when they likely
> > > aren't is going to be just as difficult to explain.
> > 
> > If you know it is rare, then you know how to properly detect it so
> > "unknown" is not needed, correct?
> > 
> > Again, "unknown" is not going to help anyone out here, please don't do
> > it.
> 
> Thinking about it, "unknown" is actually the common case.
> 
> Kernels that predate the sysfs vulnerabilities interface effectively
> report this for all vulnerabilities by omitting the sysfs entries
> entirely.
> 
> Current kernels also don't know anything about future vulnerabilities
> that may be added in sysfs later on (but which may nevertheless be
> discovered subsequently to affect current hardware).
> 
> So, can we simply omit the sysfs entries for which we can't provide a
> good answer?

As you say, we already do this for older systems.

But don't add new logic to explicitly not create the files just because
we "can not figure it out".  For those systems, I would default to
"vulnerable" as I think that's what we do today, right?

thanks,
greg k-h


RE: [PATCH v7 01/10] usb: gadget: udc: Add timer support for usb requests

2019-01-04 Thread Anurag Kumar Vulisha
Hi Felipe,

Resending...

Since I am waiting on your suggestion, thought of giving a reminder.
 
Thanks,
Anurag Kumar Vulisha

>-Original Message-
>From: Anurag Kumar Vulisha
>Sent: Wednesday, December 12, 2018 8:41 PM
>To: 'Alan Stern' ; Felipe Balbi 
>Cc: Greg Kroah-Hartman ; Shuah Khan
>; Johan Hovold ; Jaejoong Kim
>; Benjamin Herrenschmidt ;
>Roger Quadros ; Manu Gautam ;
>martin.peter...@oracle.com; Bart Van Assche ; Mike
>Christie ; Matthew Wilcox ; Colin Ian
>King ; linux-...@vger.kernel.org; linux-
>ker...@vger.kernel.org; v.anuragku...@gmail.com; Thinh Nguyen
>; Tejas Joglekar ; Ajay
>Yugalkishore Pandey 
>Subject: RE: [PATCH v7 01/10] usb: gadget: udc: Add timer support for usb 
>requests
>
>
>Hi Felipe,
>
>>-Original Message-
>>From: Alan Stern [mailto:st...@rowland.harvard.edu]
>>Sent: Friday, December 07, 2018 10:40 PM
>>To: Felipe Balbi 
>>Cc: Anurag Kumar Vulisha ; Greg Kroah-Hartman
>>; Shuah Khan ; Johan Hovold
>>; Jaejoong Kim ; Benjamin
>>Herrenschmidt ; Roger Quadros ;
>Manu
>>Gautam ; martin.peter...@oracle.com; Bart Van
>>Assche ; Mike Christie ; Matthew
>>Wilcox ; Colin Ian King ; 
>>linux-
>>u...@vger.kernel.org; linux-kernel@vger.kernel.org; v.anuragku...@gmail.com;
>>Thinh Nguyen ; Tejas Joglekar
>>; Ajay Yugalkishore Pandey 
>>Subject: RE: [PATCH v7 01/10] usb: gadget: udc: Add timer support for usb 
>>requests
>>
>>On Fri, 7 Dec 2018, Felipe Balbi wrote:
>>
>>>
>>> hi,
>>>
>>> Anurag Kumar Vulisha  writes:
>>> >>Does the data book suggest a value for the timeout?
>>> >>
>>> >
>>> > No, the databook doesn't mention about the timeout value
>>> >
>>> >>> >At this point, it seems that the generic approach will be messier than 
>>> >>> >having
>>every
>>> >>> >controller driver implement its own fix.  At least, that's how it 
>>> >>> >appears to me.
>>>
>>> Why, if the UDC implementation will, anyway, be a timer?
>>
>>It's mostly a question of what happens when the timer expires.  (After
>>all, starting a timer is just as easy to do in a UDC driver as it is in
>>the core.)  When the timer expires, what can the core do?
>>
>>Don't say it can cancel the offending request and resubmit it.  That
>>leads to the sort of trouble we discussed earlier in this thread.  In
>>particular, we don't want the class driver's completion routine to be
>>called when the cancel occurs.  Furthermore, this leads to a race:
>>Suppose the class driver decides to cancel the request at the same time
>>as the core does a cancel and resubmit.  Then the class driver's cancel
>>could get lost and the request would remain on the UDC's queue.
>>
>>What you really want to do is issue the appropriate stop and restart
>>commands to the hardware, while leaving the request logically active
>>the entire time.  The UDC core can't do this, but a UDC driver can.
>>
>
>I agree with Alan's comment as it looks like there may be some corner cases
>where the issue may occur with the dequeue approach. Are you okay if the timeout
>handler gets moved to the dwc3 driver (the timers still added into udc layer)?
>Please let us know your suggestion on this
>
>Thanks,
>Anurag Kumar Vulisha
>
>>> >>(Especially if dwc3 is the only driver affected.)
>>> >
>>> > As discussed above, the issue may happen with other gadgets too. As I got 
>>> > divide
>>opinions
>>> > on this implementation and both the implementations looks fine to me, I am
>little
>>confused
>>> > on which should be implemented.
>>> >
>>> > @Felipe: Do you agree with Alan's implementation? Please let us know your
>>suggestion
>>> > on this.
>>>
>>> I still think a generic timer is a better solution since it has other uses.
>>
>>Putting a struct timer into struct usb_request is okay with me, but I
>>wouldn't go any farther than that.
>>
>>> >>Since the purpose of the timeout is to detect a deadlock caused by a
>>> >>hardware bug, I suggest a fixed and relatively short timeout value such
>>> >>as one second.  Cancelling and requeuing a few requests at 1-second
>>> >>intervals shouldn't add very much overhead.
>>>
>>> I wouldn't call this a HW bug though. This is just how the UDC
>>> behaves. There are N streams and host can move data in any stream at any
>>> time. This means that host & gadget _can_ disagree on what stream to
>>> start next.
>>
>>But the USB 3 spec says what should happen when the host and gadget
>>disagree in this way, doesn't it?  And it doesn't say that they should
>>deadlock.  :-)  Or have I misread the spec?
>>
>>> One way to avoid this would be to never pre-start any streams and always
>>> rely on XferNotReady, but that would mean greatly reduced throughput for
>>> streams.
>>
>>It would be good if there was some way to actively detect the problem
>>instead of passively waiting for a timer to expire.
>>
>>Alan Stern



Re: [PATCH v5 04/11] platform/x86: intel_ips: make PCI dependency explicit

2019-01-04 Thread Andy Shevchenko
On Wed, Jan 2, 2019 at 8:10 PM Sinan Kaya  wrote:
>
> After 'commit 5d32a66541c4 ("PCI/ACPI: Allow ACPI to be built without
> CONFIG_PCI set")' dependencies on CONFIG_PCI that previously were
> satisfied implicitly through dependencies on CONFIG_ACPI have to be
> specified directly. The IPS driver is a PCI device driver, but this has
> not been mentioned anywhere in Kconfig.
>

Acked-by: Andy Shevchenko 

> Fixes: 5d32a66541c46 ("PCI/ACPI: Allow ACPI to be built without CONFIG_PCI 
> set")
> Signed-off-by: Sinan Kaya 
> ---
>  drivers/platform/x86/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig
> index e3b62c2ee8d1..b36ea14b41ad 100644
> --- a/drivers/platform/x86/Kconfig
> +++ b/drivers/platform/x86/Kconfig
> @@ -1009,7 +1009,7 @@ config INTEL_MFLD_THERMAL
>
>  config INTEL_IPS
> tristate "Intel Intelligent Power Sharing"
> -   depends on ACPI
> +   depends on ACPI && PCI
> ---help---
>   Intel Calpella platforms support dynamic power sharing between the
>   CPU and GPU, maximizing performance in a given TDP.  This driver,
> --
> 2.19.0
>


-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH v5] arm64: implement ftrace with regs

2019-01-04 Thread Torsten Duwe
On Mon, Dec 17, 2018 at 09:52:04AM +0530, Amit Daniel Kachhap wrote:
> There is no error message or crash but no useful output like below,
> 
> /sys/kernel/tracing # echo wake_up_process > set_graph_function
> /sys/kernel/tracing # echo function_graph > current_tracer
> /sys/kernel/tracing # cat trace
> # tracer: function_graph
> #
> # CPU  DURATION  FUNCTION CALLS
> # | |   | |   |   |   |

It turned out the graph caller works perfectly, I only broke the filtering.
Fixed now in v6; thanks a lot for testing!

Torsten



Re: [PATCH v5 06/11] platform/x86: apple-gmux: Make PCI dependency explicit

2019-01-04 Thread Andy Shevchenko
On Wed, Jan 2, 2019 at 8:10 PM Sinan Kaya  wrote:
>
> After 'commit 5d32a66541c4 ("PCI/ACPI: Allow ACPI to be built without
> CONFIG_PCI set")' dependencies on CONFIG_PCI that previously were
> satisfied implicitly through dependencies on CONFIG_ACPI have to be
> specified directly. This driver depends on the PCI infrastructure but
> the dependency has not been explicitly called out.
>
> Fixes: 5d32a66541c46 ("PCI/ACPI: Allow ACPI to be built without CONFIG_PCI 
> set")
> Signed-off-by: Sinan Kaya 
> ---
>  drivers/platform/x86/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/platform/x86/Kconfig b/drivers/platform/x86/Kconfig
> index 7afb96cb1cd6..89f4b86244a7 100644
> --- a/drivers/platform/x86/Kconfig
> +++ b/drivers/platform/x86/Kconfig
> @@ -1136,6 +1136,7 @@ config SAMSUNG_Q10
>  config APPLE_GMUX
> tristate "Apple Gmux Driver"

> depends on ACPI
> +   depends on PCI

Can you stick with the same pattern you have used for the rest in this series?
I.e.
 depends on ACPI && PCI


> depends on PNP
> depends on BACKLIGHT_CLASS_DEVICE
> depends on BACKLIGHT_APPLE=n || BACKLIGHT_APPLE
> --
> 2.19.0
>


-- 
With Best Regards,
Andy Shevchenko


[PATCH v6] arm64: implement ftrace with regs

2019-01-04 Thread Torsten Duwe
Use -fpatchable-function-entry (gcc8) to add 2 NOPs at the beginning
of each function. Replace the first NOP thus generated with a quick LR
saver (move it to scratch reg x9), so the 2nd replacement insn, the call
to ftrace, does not clobber the value. Ftrace will then generate the
standard stack frames.

Note that patchable-function-entry in GCC disables IPA-RA, which means
ABI register calling conventions are obeyed *and* scratch registers
such as x9 are available.

Introduce and handle an ftrace_regs_trampoline for module PLTs, right
after ftrace_trampoline, and double the size of this special section.

Signed-off-by: Torsten Duwe 

---

This patch applies on 4.20 with the additional changes
bdb85cd1d20669dfae813555dddb745ad09323ba
(arm64/module: switch to ADRP/ADD sequences for PLT entries)
and
7dc48bf96aa0fc8aa5b38cc3e5c36ac03171e680
(arm64: ftrace: always pass instrumented pc in x0)
along with their respective series, or alternatively on Linus' master,
which already has these.

changes since v5:

* fix mentioned pc in x0 to hold the start address of the call site,
  not the return address or the branch address.
  This resolves the problem found by Amit.
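For reference, the effect of -fpatchable-function-entry can be reproduced
per-function with GCC's patchable_function_entry attribute (GCC 8+, also
supported by recent clang). This is a hedged userspace sketch; the kernel
build uses the command-line flag, not the attribute:

```c
/* Equivalent of building this one function with
 * -fpatchable-function-entry=2: GCC emits 2 NOPs at the very start of
 * the function.  At runtime ftrace can rewrite them; on arm64 the first
 * becomes the quick LR saver ("mov x9, x30") and the second the branch
 * to the ftrace entry code, so the saved LR survives the call.  The
 * NOPs are transparent to normal execution. */
__attribute__((patchable_function_entry(2)))
int traced_add(int a, int b)
{
	return a + b;
}
```

Since patchable-function-entry disables IPA-RA for such functions, the
compiler keeps ABI calling conventions and leaves scratch registers like
x9 free, as the commit message notes.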

---
 arch/arm64/Kconfig|2 
 arch/arm64/Makefile   |4 +
 arch/arm64/include/asm/assembler.h|1 
 arch/arm64/include/asm/ftrace.h   |   13 +++
 arch/arm64/include/asm/module.h   |3 
 arch/arm64/kernel/Makefile|6 -
 arch/arm64/kernel/entry-ftrace.S  |  131 ++
 arch/arm64/kernel/ftrace.c|  125 
 arch/arm64/kernel/module-plts.c   |3 
 arch/arm64/kernel/module.c|2 
 drivers/firmware/efi/libstub/Makefile |3 
 include/asm-generic/vmlinux.lds.h |1 
 include/linux/compiler_types.h|4 +
 13 files changed, 262 insertions(+), 36 deletions(-)
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -131,6 +131,8 @@ config ARM64
select HAVE_DEBUG_KMEMLEAK
select HAVE_DMA_CONTIGUOUS
select HAVE_DYNAMIC_FTRACE
+   select HAVE_DYNAMIC_FTRACE_WITH_REGS \
+   if $(cc-option,-fpatchable-function-entry=2)
select HAVE_EFFICIENT_UNALIGNED_ACCESS
select HAVE_FTRACE_MCOUNT_RECORD
select HAVE_FUNCTION_TRACER
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -79,6 +79,10 @@ ifeq ($(CONFIG_ARM64_MODULE_PLTS),y)
 KBUILD_LDFLAGS_MODULE  += -T $(srctree)/arch/arm64/kernel/module.lds
 endif
 
+ifeq ($(CONFIG_DYNAMIC_FTRACE_WITH_REGS),y)
+  CC_FLAGS_FTRACE := -fpatchable-function-entry=2
+endif
+
 # Default value
 head-y := arch/arm64/kernel/head.o
 
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -17,6 +17,19 @@
 #define MCOUNT_ADDR		((unsigned long)_mcount)
 #define MCOUNT_INSN_SIZE   AARCH64_INSN_SIZE
 
+/*
+ * DYNAMIC_FTRACE_WITH_REGS is implemented by adding 2 NOPs at the beginning
+ * of each function, with the second NOP actually calling ftrace. Contrary
+ * to a classic _mcount call, the call instruction to be modified is thus
+ * the second one, and not the only one.
+ */
+#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
+#define ARCH_SUPPORTS_FTRACE_OPS 1
+#define REC_IP_BRANCH_OFFSET AARCH64_INSN_SIZE
+#else
+#define REC_IP_BRANCH_OFFSET 0
+#endif
+
 #ifndef __ASSEMBLY__
 #include 
 
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -7,9 +7,9 @@ CPPFLAGS_vmlinux.lds:= -DTEXT_OFFSET=$(
 AFLAGS_head.o  := -DTEXT_OFFSET=$(TEXT_OFFSET)
 CFLAGS_armv8_deprecated.o := -I$(src)
 
-CFLAGS_REMOVE_ftrace.o = -pg
-CFLAGS_REMOVE_insn.o = -pg
-CFLAGS_REMOVE_return_address.o = -pg
+CFLAGS_REMOVE_ftrace.o = -pg $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_insn.o = -pg $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_return_address.o = -pg $(CC_FLAGS_FTRACE)
 
 # Object file lists.
 arm64-obj-y:= debug-monitors.o entry.o irq.o fpsimd.o  
\
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -13,7 +13,8 @@ cflags-$(CONFIG_X86)  += -m$(BITS) -D__K
 
 # arm64 uses the full KBUILD_CFLAGS so it's necessary to explicitly
 # disable the stackleak plugin
-cflags-$(CONFIG_ARM64) := $(subst -pg,,$(KBUILD_CFLAGS)) -fpie \
+cflags-$(CONFIG_ARM64) := $(filter-out -pg $(CC_FLAGS_FTRACE)\
+ ,$(KBUILD_CFLAGS)) -fpie \
   $(DISABLE_STACKLEAK_PLUGIN)
 cflags-$(CONFIG_ARM)   := $(subst -pg,,$(KBUILD_CFLAGS)) \
   -fno-builtin -fpic \
--- a/arch/arm64/kernel/entry-ftrace.S
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -13,6 +13,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 /*
  * Gcc with -pg will put the following code in the beginning of each function:
@@ -122,6 +124,7 @@ skip_ftrace_call:   // }
 ENDPROC(_mcount)
 
 #else /* CONFIG_DYNAMIC_FTRACE */
+#ifndef 

Re: [PATCH v2 1/7] sysfs/cpu: Add "Unknown" vulnerability state

2019-01-04 Thread Dave Martin
On Thu, Jan 03, 2019 at 05:48:31PM +0100, Greg Kroah-Hartman wrote:
> On Thu, Jan 03, 2019 at 10:38:16AM -0600, Jeremy Linton wrote:
> > On 01/03/2019 03:38 AM, Greg Kroah-Hartman wrote:
> > > On Wed, Jan 02, 2019 at 06:49:15PM -0600, Jeremy Linton wrote:
> > > > There is a lot of variation in the Arm ecosystem. Because of this,
> > > > there exist possible cases where the kernel cannot authoritatively
> > > > determine if a machine is vulnerable.
> > > 
> > > Really?  Why not?  What keeps you from "knowing" this?  Can't the
> > > developer of the chip tell you?
> > 
> > There tends to be a few cases, possibly incomplete white/black lists,
> 
> Then fix the lists :)
> 
> > firmware that isn't responding correctly, or the user didn't build in the
> > code to check the mitigation (possibly because its an embedded system and
> > they know its not vulnerable?).
> 
> If the firmware doesn't respond, that would imply it is vulnerable :)
> 
> And if the code isn't built in, again, it's vulnerable.
> 
> > I would hope that it is an exceptional case.
> 
> Then have the default be vulnerable, don't give people false hope.
> 
> > > > Rather than guess the vulnerability status in cases where
> > > > the mitigation is disabled or the firmware isn't responding
> > > > correctly, we need to display an "Unknown" state.
> > > 
> > > Shouldn't "Unknown" really be the same thing as "Vulnerable"?  A user
> > > should treat it the same way, "Unknown" makes it feel like "maybe I can
> > > just ignore this and hope I really am safe", which is not a good idea at
> > > all.
> > 
> > I tend to agree its not clear what to do with "unknown".
> > 
> > OTOH, I think there is a hesitation to declare something vulnerable when it
> > isn't. Meltdown for example, is fairly rare given that it currently only
> > affects a few arm parts, so declaring someone vulnerable when they likely
> > aren't is going to be just as difficult to explain.
> 
> If you know it is rare, then you know how to properly detect it so
> "unknown" is not needed, correct?
> 
> Again, "unknown" is not going to help anyone out here, please don't do
> it.

Thinking about it, "unknown" is actually the common case.

Kernels that predate the sysfs vulnerabilities interface effectively
report this for all vulnerabilities by omitting the sysfs entries
entirely.

Current kernels also don't know anything about future vulnerabilities
that may be added in sysfs later on (but which may nevertheless be
discovered subsequently to affect current hardware).

So, can we simply omit the sysfs entries for which we can't provide a
good answer?

IMHO the kernel should make best efforts to provide answers for every
vulnerability that it knows about, so the checks should not be Kconfig-
dependent without a good reason.

There will be cases where whitelists/blacklists are the only source of
answers, and we are ultimately reliant on vendors to provide that
information.  Upstream Linux is likely to lag, there's not much we can
do about that.

Cheers
---Dave


Applied "ASoC: xlnx: add pcm formatter platform driver" to the asoc tree

2019-01-04 Thread Mark Brown
The patch

   ASoC: xlnx: add pcm formatter platform driver

has been applied to the asoc tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git 

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

>From 6f6c3c36f0917be24587eeba818ab4fdfcb5465a Mon Sep 17 00:00:00 2001
From: Maruthi Srinivas Bayyavarapu 
Date: Fri, 21 Dec 2018 14:27:28 +0530
Subject: [PATCH] ASoC: xlnx: add pcm formatter platform driver

The audio formatter PL IP supports DMA of two streams -
mm2s and s2mm for playback and capture respectively. Apart from
DMA, the IP also does conversions like PCM to AES and vice versa.
This patch adds DMA component driver for the IP.

Signed-off-by: Maruthi Srinivas Bayyavarapu 

Signed-off-by: Mark Brown 
---
 sound/soc/xilinx/xlnx_formatter_pcm.c | 565 ++
 1 file changed, 565 insertions(+)
 create mode 100644 sound/soc/xilinx/xlnx_formatter_pcm.c

diff --git a/sound/soc/xilinx/xlnx_formatter_pcm.c 
b/sound/soc/xilinx/xlnx_formatter_pcm.c
new file mode 100644
index 000000000000..f7235f7664d7
--- /dev/null
+++ b/sound/soc/xilinx/xlnx_formatter_pcm.c
@@ -0,0 +1,565 @@
+// SPDX-License-Identifier: GPL-2.0
+//
+// Xilinx ASoC audio formatter support
+//
+// Copyright (C) 2018 Xilinx, Inc.
+//
+// Author: Maruthi Srinivas Bayyavarapu 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#define DRV_NAME "xlnx_formatter_pcm"
+
+#define XLNX_S2MM_OFFSET   0
+#define XLNX_MM2S_OFFSET   0x100
+
+#define XLNX_AUD_CORE_CONFIG   0x4
+#define XLNX_AUD_CTRL  0x10
+#define XLNX_AUD_STS   0x14
+
#define AUD_CTRL_RESET_MASK    BIT(1)
+#define AUD_CFG_MM2S_MASK  BIT(15)
+#define AUD_CFG_S2MM_MASK  BIT(31)
+
+#define XLNX_AUD_FS_MULTIPLIER 0x18
+#define XLNX_AUD_PERIOD_CONFIG 0x1C
+#define XLNX_AUD_BUFF_ADDR_LSB 0x20
+#define XLNX_AUD_BUFF_ADDR_MSB 0x24
#define XLNX_AUD_XFER_COUNT    0x28
+#define XLNX_AUD_CH_STS_START  0x2C
+#define XLNX_BYTES_PER_CH  0x44
+
+#define AUD_STS_IOC_IRQ_MASK   BIT(31)
#define AUD_STS_CH_STS_MASK    BIT(29)
+#define AUD_CTRL_IOC_IRQ_MASK  BIT(13)
+#define AUD_CTRL_TOUT_IRQ_MASK BIT(14)
+#define AUD_CTRL_DMA_EN_MASK   BIT(0)
+
+#define CFG_MM2S_CH_MASK   GENMASK(11, 8)
+#define CFG_MM2S_CH_SHIFT  8
+#define CFG_MM2S_XFER_MASK GENMASK(14, 13)
#define CFG_MM2S_XFER_SHIFT    13
+#define CFG_MM2S_PKG_MASK  BIT(12)
+
+#define CFG_S2MM_CH_MASK   GENMASK(27, 24)
+#define CFG_S2MM_CH_SHIFT  24
+#define CFG_S2MM_XFER_MASK GENMASK(30, 29)
#define CFG_S2MM_XFER_SHIFT    29
+#define CFG_S2MM_PKG_MASK  BIT(28)
+
+#define AUD_CTRL_DATA_WIDTH_SHIFT  16
+#define AUD_CTRL_ACTIVE_CH_SHIFT   19
+#define PERIOD_CFG_PERIODS_SHIFT   16
+
#define PERIODS_MIN    2
#define PERIODS_MAX    6
+#define PERIOD_BYTES_MIN   192
+#define PERIOD_BYTES_MAX   (50 * 1024)
+
+enum bit_depth {
+   BIT_DEPTH_8,
+   BIT_DEPTH_16,
+   BIT_DEPTH_20,
+   BIT_DEPTH_24,
+   BIT_DEPTH_32,
+};
+
+struct xlnx_pcm_drv_data {
+   void __iomem *mmio;
+   bool s2mm_presence;
+   bool mm2s_presence;
+   unsigned int s2mm_irq;
+   unsigned int mm2s_irq;
+   struct snd_pcm_substream *play_stream;
+   struct snd_pcm_substream *capture_stream;
+   struct clk *axi_clk;
+};
+
+/*
+ * struct xlnx_pcm_stream_param - stream configuration
+ * @mmio: base address offset
+ * @interleaved: audio channels arrangement in buffer
+ * @xfer_mode: data formatting mode during transfer
+ * @ch_limit: Maximum channels supported
+ * @buffer_size: stream ring buffer size
+ */
+struct xlnx_pcm_stream_param {
+   void __iomem *mmio;
+   bool interleaved;
+   u32 xfer_mode;
+   u32 ch_limit;
+   u64 buffer_size;
+};
+
+static const struct snd_pcm_hardware xlnx_pcm_hardware = {
+   .info = SNDRV_PCM_INFO_INTERLEAVED | SNDRV_PCM_INFO_BLOCK_TRANSFER |
+   SNDRV_PCM_INFO_BATCH | SNDRV_PCM_INFO_PAUSE |
+   SNDRV_PCM_INFO_RESUME,
+   .formats = SNDRV_PCM_FMTBIT_S8 | SNDRV_PCM_FMTBIT_S16_LE |
+  SNDRV_PCM_FMTBIT_S24_LE,
+   .channels_min = 2,
+   .channels_max = 2,
+   .rates = SNDRV_PCM_RATE_8000_192000,
+   .rate_min = 8000,
+   .rate_max = 192000,

Applied "ASoC: rockchip: fix platform_no_drv_owner.cocci warnings" to the asoc tree

2019-01-04 Thread Mark Brown
The patch

   ASoC: rockchip: fix platform_no_drv_owner.cocci warnings

has been applied to the asoc tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git 


Thanks,
Mark

>From e1de3d237b5083fcd0da6fcf600848a4cef9cc67 Mon Sep 17 00:00:00 2001
From: YueHaibing 
Date: Tue, 25 Dec 2018 02:20:36 +
Subject: [PATCH] ASoC: rockchip: fix platform_no_drv_owner.cocci warnings

Remove .owner field if calls are used which set it automatically
Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci

Signed-off-by: YueHaibing 
Signed-off-by: Mark Brown 
---
 sound/soc/codecs/rk3328_codec.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/sound/soc/codecs/rk3328_codec.c b/sound/soc/codecs/rk3328_codec.c
index f3442a2283ea..24f8f86d58e9 100644
--- a/sound/soc/codecs/rk3328_codec.c
+++ b/sound/soc/codecs/rk3328_codec.c
@@ -508,7 +508,6 @@ MODULE_DEVICE_TABLE(of, rk3328_codec_of_match);
 static struct platform_driver rk3328_codec_driver = {
.driver = {
   .name = "rk3328-codec",
-  .owner = THIS_MODULE,
   .of_match_table = of_match_ptr(rk3328_codec_of_match),
},
.probe = rk3328_platform_probe,
-- 
2.20.1



Applied "ASoC: rockchip: support ACODEC for rk3328" to the asoc tree

2019-01-04 Thread Mark Brown
The patch

   ASoC: rockchip: support ACODEC for rk3328

has been applied to the asoc tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git 


Thanks,
Mark

>From c32759035ad246d3e4c65d23a07f9e6ba32caeaf Mon Sep 17 00:00:00 2001
From: Katsuhiro Suzuki 
Date: Fri, 21 Dec 2018 00:36:35 +0900
Subject: [PATCH] ASoC: rockchip: support ACODEC for rk3328

This patch adds support for audio CODEC core of rk3328.

Rockchip does not publish a detailed specification of this core,
but the driver source code is available on their GitHub repository.
  https://github.com/rockchip-linux/kernel

So I ported this code to linux-next and added some trivial fixes.

Signed-off-by: Katsuhiro Suzuki 
Signed-off-by: Mark Brown 
---
 .../bindings/sound/rockchip,rk3328-codec.txt  |  23 +
 sound/soc/codecs/Kconfig  |   5 +
 sound/soc/codecs/Makefile |   2 +
 sound/soc/codecs/rk3328_codec.c   | 517 ++
 sound/soc/codecs/rk3328_codec.h   | 210 +++
 5 files changed, 757 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/sound/rockchip,rk3328-codec.txt
 create mode 100644 sound/soc/codecs/rk3328_codec.c
 create mode 100644 sound/soc/codecs/rk3328_codec.h

diff --git a/Documentation/devicetree/bindings/sound/rockchip,rk3328-codec.txt 
b/Documentation/devicetree/bindings/sound/rockchip,rk3328-codec.txt
new file mode 100644
index 000000000000..2469588c7ccb
--- /dev/null
+++ b/Documentation/devicetree/bindings/sound/rockchip,rk3328-codec.txt
@@ -0,0 +1,23 @@
+* Rockchip Rk3328 internal codec
+
+Required properties:
+
+- compatible: "rockchip,rk3328-codec"
+- reg: physical base address of the controller and length of memory mapped
+  region.
+- rockchip,grf: the phandle of the syscon node for GRF register.
+- clocks: a list of phandle + clock-specifer pairs, one for each entry in 
clock-names.
+- clock-names: should be "pclk".
+- spk-depop-time-ms: speaker depop time in msec.
+
+Example for rk3328 internal codec:
+
+codec: codec@ff410000 {
+   compatible = "rockchip,rk3328-codec";
+   reg = <0x0 0xff410000 0x0 0x1000>;
+   rockchip,grf = <&grf>;
+   clocks = <&cru PCLK_ACODEC>;
+   clock-names = "pclk";
+   spk-depop-time-ms = <100>;
+   status = "disabled";
+};
diff --git a/sound/soc/codecs/Kconfig b/sound/soc/codecs/Kconfig
index d46de3e04ff6..87cb9c51e6f5 100644
--- a/sound/soc/codecs/Kconfig
+++ b/sound/soc/codecs/Kconfig
@@ -130,6 +130,7 @@ config SND_SOC_ALL_CODECS
select SND_SOC_PCM5102A
select SND_SOC_PCM512x_I2C if I2C
select SND_SOC_PCM512x_SPI if SPI_MASTER
+   select SND_SOC_RK3328
select SND_SOC_RT274 if I2C
select SND_SOC_RT286 if I2C
select SND_SOC_RT298 if I2C
@@ -806,6 +807,10 @@ config SND_SOC_PCM512x_SPI
select SND_SOC_PCM512x
select REGMAP_SPI
 
+config SND_SOC_RK3328
+   tristate "Rockchip RK3328 audio CODEC"
+   select REGMAP_MMIO
+
 config SND_SOC_RL6231
tristate
default y if SND_SOC_RT5514=y
diff --git a/sound/soc/codecs/Makefile b/sound/soc/codecs/Makefile
index fbe36e6177b0..9bb3346fab2f 100644
--- a/sound/soc/codecs/Makefile
+++ b/sound/soc/codecs/Makefile
@@ -133,6 +133,7 @@ snd-soc-pcm5102a-objs := pcm5102a.o
 snd-soc-pcm512x-objs := pcm512x.o
 snd-soc-pcm512x-i2c-objs := pcm512x-i2c.o
 snd-soc-pcm512x-spi-objs := pcm512x-spi.o
+snd-soc-rk3328-objs := rk3328_codec.o
 snd-soc-rl6231-objs := rl6231.o
 snd-soc-rl6347a-objs := rl6347a.o
 snd-soc-rt1305-objs := rt1305.o
@@ -400,6 +401,7 @@ obj-$(CONFIG_SND_SOC_PCM5102A)  += snd-soc-pcm5102a.o
 obj-$(CONFIG_SND_SOC_PCM512x)  += snd-soc-pcm512x.o
 obj-$(CONFIG_SND_SOC_PCM512x_I2C)  += snd-soc-pcm512x-i2c.o
 obj-$(CONFIG_SND_SOC_PCM512x_SPI)  += snd-soc-pcm512x-spi.o
+obj-$(CONFIG_SND_SOC_RK3328)   += snd-soc-rk3328.o
 obj-$(CONFIG_SND_SOC_RL6231)   += snd-soc-rl6231.o
 obj-$(CONFIG_SND_SOC_RL6347A)  += snd-soc-rl6347a.o
 obj-$(CONFIG_SND_SOC_RT1305)   += snd-soc-rt1305.o
diff --git a/sound/soc/codecs/rk3328_codec.c b/sound/soc/codecs/rk3328_codec.c
new file mode 100644
index 000000000000..71f3fc2d970c
--- /dev/null
+++ b/sound/soc/codecs/rk3328_codec.c
@@ -0,0 +1,517 @@
+// SPDX-License-Identifier: GPL-2.0

Applied "ASoC: rockchip: add workaround for silence of rk3288 ACODEC" to the asoc tree

2019-01-04 Thread Mark Brown
The patch

   ASoC: rockchip: add workaround for silence of rk3288 ACODEC

has been applied to the asoc tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git 


Thanks,
Mark

>From f5758544d98c8bec7793aeea28928f5e8bd45d47 Mon Sep 17 00:00:00 2001
From: Katsuhiro Suzuki 
Date: Fri, 21 Dec 2018 00:36:36 +0900
Subject: [PATCH] ASoC: rockchip: add workaround for silence of rk3288 ACODEC

This patch adds a reset and precharge in the shutdown path of the PCM device.

The ACODEC goes silent if we change Fs from 48kHz to 44.1kHz. This
workaround seems to work, but I don't know whether it is the correct
sequence for the ACODEC.

Signed-off-by: Katsuhiro Suzuki 
Signed-off-by: Mark Brown 
---
 sound/soc/codecs/rk3328_codec.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/sound/soc/codecs/rk3328_codec.c b/sound/soc/codecs/rk3328_codec.c
index 71f3fc2d970c..f3442a2283ea 100644
--- a/sound/soc/codecs/rk3328_codec.c
+++ b/sound/soc/codecs/rk3328_codec.c
@@ -261,9 +261,12 @@ static int rk3328_codec_close_playback(struct 
rk3328_codec_priv *rk3328)
mdelay(1);
}
 
+   /* Workaround for silence when changed Fs 48 -> 44.1kHz */
+   rk3328_codec_reset(rk3328);
+
regmap_update_bits(rk3328->regmap, DAC_PRECHARGE_CTRL,
   DAC_CHARGE_CURRENT_ALL_MASK,
-  DAC_CHARGE_CURRENT_I);
+  DAC_CHARGE_CURRENT_ALL_ON);
 
return 0;
 }
-- 
2.20.1



Re: [PATCH] tty/n_hdlc: fix sleep in !TASK_RUNNING state warning

2019-01-04 Thread Paul Fulghum



> On Jan 4, 2019, at 2:23 AM, Tetsuo Handa  
> wrote:
> 
> But why not to clarify what are appropriate sanity checks?
> ...
> want a cleanup for scripts/checkpatch.pl .


These are good goals. I avoid purely cosmetic patches. I do not object to 
cosmetic patches from others that do not change behavior.

The checks that concern you deal with changing tty line disciplines. Dealing 
with line discipline changes has been an ongoing issue since n_hdlc was derived 
from other line disciplines 20 years ago, with major overhauls along the way. 
It is complex: driver layers shifting during operation while dealing properly 
with opens, closes, hangups, and sleeping operations. Patches have been added 
to the latest unreleased kernel to address line discipline changes, it is still 
evolving.

Why are the existing line discipline checks in n_hdlc where they are? Because 
that's how they evolved from where they started to accommodate these changes. 
There are not many and their function is known: has the line discipline changed 
at that point? I know that is not satisfying but coming up with a definitive 
comment saying a check is absolutely required in one place and not in another 
requires more insight into the long history of a moving target than I have. 
Without that insight I would not alter existing checks in code that is not 
causing problems.

Re: [PATCH] Variable "val" in function rt274_i2c_probe() could be uninitialized

2019-01-04 Thread Mark Brown
On Thu, Jan 03, 2019 at 01:59:12PM -0800, Yizhuo wrote:
> Inside function rt274_i2c_probe(), if regmap_read() function
> returns -EINVAL, then local variable "val" leaves uninitialized
> but used in if statement. This is potentially unsafe.

Please use subject lines matching the style for the subsystem.  This
makes it easier for people to identify relevant patches.


signature.asc
Description: PGP signature


[PATCH 1/2] mailbox: mailbox-test: fix debugfs in multi-instances

2019-01-04 Thread Fabien Dessenne
Create one debugfs entry directory per instance to support multiple
instances.

Signed-off-by: Ludovic Barre 
Signed-off-by: Fabien Dessenne 
---
 drivers/mailbox/mailbox-test.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/mailbox/mailbox-test.c b/drivers/mailbox/mailbox-test.c
index 58bfafc..4e2bfba 100644
--- a/drivers/mailbox/mailbox-test.c
+++ b/drivers/mailbox/mailbox-test.c
@@ -31,7 +31,6 @@
 (MBOX_MAX_MSG_LEN / MBOX_BYTES_PER_LINE))
 
 static bool mbox_data_ready;
-static struct dentry *root_debugfs_dir;
 
 struct mbox_test_device {
struct device   *dev;
@@ -45,6 +44,7 @@ struct mbox_test_device {
spinlock_t  lock;
wait_queue_head_t   waitq;
struct fasync_struct*async_queue;
+   struct dentry   *root_debugfs_dir;
 };
 
 static ssize_t mbox_test_signal_write(struct file *filp,
@@ -262,16 +262,16 @@ static int mbox_test_add_debugfs(struct platform_device 
*pdev,
if (!debugfs_initialized())
return 0;
 
-   root_debugfs_dir = debugfs_create_dir("mailbox", NULL);
-   if (!root_debugfs_dir) {
+   tdev->root_debugfs_dir = debugfs_create_dir(dev_name(&pdev->dev), NULL);
+   if (!tdev->root_debugfs_dir) {
dev_err(&pdev->dev, "Failed to create Mailbox debugfs\n");
return -EINVAL;
}
 
-   debugfs_create_file("message", 0600, root_debugfs_dir,
+   debugfs_create_file("message", 0600, tdev->root_debugfs_dir,
tdev, &mbox_test_message_ops);
 
-   debugfs_create_file("signal", 0200, root_debugfs_dir,
+   debugfs_create_file("signal", 0200, tdev->root_debugfs_dir,
tdev, &mbox_test_signal_ops);
 
return 0;
@@ -416,7 +416,7 @@ static int mbox_test_remove(struct platform_device *pdev)
 {
struct mbox_test_device *tdev = platform_get_drvdata(pdev);
 
-   debugfs_remove_recursive(root_debugfs_dir);
+   debugfs_remove_recursive(tdev->root_debugfs_dir);
 
if (tdev->tx_channel)
mbox_free_channel(tdev->tx_channel);
-- 
2.7.4



[PATCH 2/2] mailbox: stm32-ipcc: remove useless device_init_wakeup call

2019-01-04 Thread Fabien Dessenne
If the "wakeup-source" property does not exist there is no need to call
device_init_wakeup("false") at probe.

Signed-off-by: Fabien Dessenne 
---
 drivers/mailbox/stm32-ipcc.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/mailbox/stm32-ipcc.c b/drivers/mailbox/stm32-ipcc.c
index 7ca8a14..210fe50 100644
--- a/drivers/mailbox/stm32-ipcc.c
+++ b/drivers/mailbox/stm32-ipcc.c
@@ -276,8 +276,6 @@ static int stm32_ipcc_probe(struct platform_device *pdev)
dev_err(dev, "Failed to set wake up irq\n");
goto err_init_wkp;
}
-   } else {
-   device_init_wakeup(dev, false);
}
 
/* mailbox controller */
-- 
2.7.4



[PATCH 1/2] mailbox: stm32-ipcc: do not enable wakeup source by default

2019-01-04 Thread Fabien Dessenne
By default do not enable the wakeup source.
This lets the userspace application decide whether the wakeup source
shall be enabled or not.

Signed-off-by: Fabien Dessenne 
---
 drivers/mailbox/stm32-ipcc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mailbox/stm32-ipcc.c b/drivers/mailbox/stm32-ipcc.c
index a338bd4..7ca8a14 100644
--- a/drivers/mailbox/stm32-ipcc.c
+++ b/drivers/mailbox/stm32-ipcc.c
@@ -270,7 +270,7 @@ static int stm32_ipcc_probe(struct platform_device *pdev)
goto err_clk;
}
 
-   device_init_wakeup(dev, true);
+   device_set_wakeup_capable(dev, true);
ret = dev_pm_set_dedicated_wake_irq(dev, ipcc->wkp);
if (ret) {
dev_err(dev, "Failed to set wake up irq\n");
-- 
2.7.4



Re: NULL pointer dereference when writing fuzzed data to /dev/uhid

2019-01-04 Thread Anatoly Trosinenko
> I wanted to tell you that I started investigating the other private
> report you sent us, but couldn't find the time to properly come with a
> fix as the fuzzed data is hard to discriminate from valid data.

Oops, I thought I was "over-securing" these issues and everyone
ignored them, since I heard the default policy here is not to send reports
privately without a good reason, so I re-evaluated them and sent them
publicly...

OTOH these ones seem not too severe: at first glance they
require physical or root access, and even if they don't, *these
ones* are only NULL dereferences.

Best regards
Anatoly

Fri, 4 Jan 2019 at 16:25, Benjamin Tissoires :
>
> Hi Anatoly,
>
>
> On Fri, Jan 4, 2019 at 1:32 PM Anatoly Trosinenko
>  wrote:
> >
> > Hello,
> >
> > When writing the attached file to /dev/uhid, a NULL dereference occurs
> > in kernel. As I understand, the problem is not UHID-specific, but is
> > related to HID subsystem.
>
> Thanks for the report.
> I wanted to tell you that I started investigating the other private
> report you sent us, but couldn't find the time to properly come with a
> fix as the fuzzed data is hard to discriminate from valid data.
>
> A couple of notes though:
> - writing to uhid needs to be done by root. Any distribution that
> doesn't enforce that is doomed to have several security issues
> - we could somehow reproduce those fuzzed data on a USB or Bluetooth
> connection, but that would require physical access to the device, so
> you are doomed also
> - last IIRC, there was some attempts by the ChromeOS team to allow
> access to the HID stack from the Chrome plugins, I don't know if this
> is able to generate the issues.
>
> On the specifics reported here:
>
> >
> > How to reproduce:
> > 1) Checkout the fresh master branch of the Linux kernel (tested on
> > commit 96d4f267e)
> > 2) Compile it with the attached config (kvm-xfstests capable)
> > 3) Take one of reproducers and execute
> > cat /vtmp/repro > /dev/uhid
> >
> > What happens:
> >
> > For chicony.bin:
> >
> > root@kvm-xfstests:~# cat /vtmp/chicony.bin > /dev/uhid
> > [   19.072703] BUG: unable to handle kernel NULL pointer dereference
> > at 0002
> > [   19.073371] #PF error: [normal kernel read fault]
> > [   19.073755] PGD 800078b2c067 P4D 800078b2c067 PUD 0
> > [   19.074223] Oops:  [#1] SMP PTI
> > [   19.074809] CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted
> > 4.20.0-xfstests-10979-g96d4f267e40 #1
> > [   19.075965] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> > BIOS 1.11.1-1ubuntu1 04/01/2014
> > [   19.077599] Workqueue: events uhid_device_add_worker
> > [   19.078019] RIP: 0010:ch_switch12_report_fixup+0x13/0x70
>
> This driver expects the device to be connected on USB, and you are
> triggering the oops because you are on uhid.
> I am chasing the USB dependencies in most drivers, but this is a hard
> task to do when I do not have access to the actual devices.
>
> I guess one way of fixing that is to add a check for the actual
> transport driver during probe:
> hid_is_using_ll_driver(hdev, _hid_driver)
>
> Patches are welcome :)
>
> > [   19.078462] Code: 49 8b 00 3e 80 60 20 df b8 01 00 00 00 c3 66 0f
> > 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 8f 48 19 00 00 48 89 f0 48
> > 8b 49 d8 <80> 79 02 01 74 01 c3 81 7f 3c 21 14 00 00 75 f6 83 3a 7f 76
> > f1 80
> > [   19.080103] RSP: 0018:a1d880367c48 EFLAGS: 00010286
> > [   19.080541] RAX: 9b653d27b180 RBX: 9b653a6fb948 RCX: 
> > 
> > [   19.081133] RDX: a1d880367c5c RSI: 9b653d27b180 RDI: 
> > 9b653a6fa000
> > [   19.081780] RBP: 9b653d27b180 R08: 00064992eed0 R09: 
> > 
> > [   19.082409] R10:  R11:  R12: 
> > 9b653a6fa000
> > [   19.083017] R13: 83f14510 R14: 83f14440 R15: 
> > 
> > [   19.083619] FS:  () GS:9b653fc0()
> > knlGS:
> > [   19.084362] CS:  0010 DS:  ES:  CR0: 80050033
> > [   19.085164] CR2: 0002 CR3: 788b8004 CR4: 
> > 00360ef0
> > [   19.085789] Call Trace:
> > [   19.086011]  hid_open_report+0x81/0x2c0
> > [   19.086341]  hid_device_probe+0x135/0x160
> > [   19.086754]  ? __driver_attach+0x110/0x110
> > [   19.087109]  really_probe+0xe0/0x390
> > [   19.087411]  ? __driver_attach+0x110/0x110
> > [   19.087782]  bus_for_each_drv+0x78/0xc0
> > [   19.088134]  __device_attach+0xcc/0x130
> > [   19.088477]  bus_probe_device+0x9f/0xb0
> > [   19.088832]  device_add+0x422/0x680
> > [   19.089144]  ? __debugfs_create_file+0xb5/0xf0
> > [   19.089536]  hid_add_device+0xec/0x280
> > [   19.089880]  uhid_device_add_worker+0x15/0x60
> > [   19.090270]  process_one_work+0x238/0x5d0
> > [   19.090627]  worker_thread+0x3d/0x390
> > [   19.090959]  ? process_one_work+0x5d0/0x5d0
> > [   19.091331]  kthread+0x121/0x140
> > [   19.096732]  ? __kthread_create_on_node+0x1a0/0x1a0
> > [   19.097164]  

[PATCH 2/2] mailbox: mailbox-test: fix null pointer if no mmio

2019-01-04 Thread Fabien Dessenne
Fix a NULL pointer dereference when resource_size() is called with no ioresource.

Signed-off-by: Ludovic Barre 
Signed-off-by: Fabien Dessenne 
---
 drivers/mailbox/mailbox-test.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/mailbox/mailbox-test.c b/drivers/mailbox/mailbox-test.c
index 4e2bfba..4e4ac4b 100644
--- a/drivers/mailbox/mailbox-test.c
+++ b/drivers/mailbox/mailbox-test.c
@@ -363,22 +363,24 @@ static int mbox_test_probe(struct platform_device *pdev)
 
/* It's okay for MMIO to be NULL */
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   size = resource_size(res);
tdev->tx_mmio = devm_ioremap_resource(&pdev->dev, res);
-   if (PTR_ERR(tdev->tx_mmio) == -EBUSY)
+   if (PTR_ERR(tdev->tx_mmio) == -EBUSY) {
/* if reserved area in SRAM, try just ioremap */
+   size = resource_size(res);
tdev->tx_mmio = devm_ioremap(&pdev->dev, res->start, size);
-   else if (IS_ERR(tdev->tx_mmio))
+   } else if (IS_ERR(tdev->tx_mmio)) {
tdev->tx_mmio = NULL;
+   }
 
/* If specified, second reg entry is Rx MMIO */
res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
-   size = resource_size(res);
tdev->rx_mmio = devm_ioremap_resource(&pdev->dev, res);
-   if (PTR_ERR(tdev->rx_mmio) == -EBUSY)
+   if (PTR_ERR(tdev->rx_mmio) == -EBUSY) {
+   size = resource_size(res);
tdev->rx_mmio = devm_ioremap(&pdev->dev, res->start, size);
-   else if (IS_ERR(tdev->rx_mmio))
+   } else if (IS_ERR(tdev->rx_mmio)) {
tdev->rx_mmio = tdev->tx_mmio;
+   }
 
tdev->tx_channel = mbox_test_request_channel(pdev, "tx");
tdev->rx_channel = mbox_test_request_channel(pdev, "rx");
-- 
2.7.4



Re: [PATCH v4 00/10] steal tasks to improve CPU utilization

2019-01-04 Thread Shijith Thotton
On 07-Dec-18 3:09 AM, Steve Sistare wrote:
> 
> When a CPU has no more CFS tasks to run, and idle_balance() fails to
> find a task, then attempt to steal a task from an overloaded CPU in the
> same LLC. Maintain and use a bitmap of overloaded CPUs to efficiently
> identify candidates.  To minimize search time, steal the first migratable
> task that is found when the bitmap is traversed.  For fairness, search
> for migratable tasks on an overloaded CPU in order of next to run.
> 
> This simple stealing yields a higher CPU utilization than idle_balance()
> alone, because the search is cheap, so it may be called every time the CPU
> is about to go idle.  idle_balance() does more work because it searches
> widely for the busiest queue, so to limit its CPU consumption, it declines
> to search if the system is too busy.  Simple stealing does not offload the
> globally busiest queue, but it is much better than running nothing at all.
> 
> The bitmap of overloaded CPUs is a new type of sparse bitmap, designed to
> reduce cache contention vs the usual bitmap when many threads concurrently
> set, clear, and visit elements.
> 
> Patch 1 defines the sparsemask type and its operations.
> 
> Patches 2, 3, and 4 implement the bitmap of overloaded CPUs.
> 
> Patches 5 and 6 refactor existing code for a cleaner merge of later
>patches.
> 
> Patches 7 and 8 implement task stealing using the overloaded CPUs bitmap.
> 
> Patch 9 disables stealing on systems with more than 2 NUMA nodes for the
> time being because of performance regressions that are not due to stealing
> per se.  See the patch description for details.
> 
> Patch 10 adds schedstats for comparing the new behavior to the old, and
> is provided as a convenience for developers only, not for integration.
> 
> The patch series is based on kernel 4.20.0-rc1.  It compiles, boots, and
> runs with/without each of CONFIG_SCHED_SMT, CONFIG_SMP, CONFIG_SCHED_DEBUG,
> and CONFIG_PREEMPT.  It runs without error with CONFIG_DEBUG_PREEMPT +
> CONFIG_SLUB_DEBUG + CONFIG_DEBUG_PAGEALLOC + CONFIG_DEBUG_MUTEXES +
> CONFIG_DEBUG_SPINLOCK + CONFIG_DEBUG_ATOMIC_SLEEP.  CPU hot plug and CPU
> bandwidth control were tested.
> 
> Stealing improves utilization with only a modest CPU overhead in scheduler
> code.  In the following experiment, hackbench is run with varying numbers
> of groups (40 tasks per group), and the delta in /proc/schedstat is shown
> for each run, averaged per CPU, augmented with these non-standard stats:
> 
>%find - percent of time spent in old and new functions that search for
>  idle CPUs and tasks to steal and set the overloaded CPUs bitmap.
> 
>steal - number of times a task is stolen from another CPU.
> 
> X6-2: 1 socket * 10 cores * 2 hyperthreads = 20 CPUs
> Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
> hackbench  process 10
> sched_wakeup_granularity_ns=1500
> 
>baseline
>grps  time  %busy  slice   sched   idle wake %find  steal
>1    8.084  75.02   0.10  105476  46291   59183  0.31  0
>2   13.892  85.33   0.10  190225  70958   119264  0.45  0
>3   19.668  89.04   0.10  263896  87047   176850  0.49  0
>4   25.279  91.28   0.10  322171  94691   227474  0.51  0
>8   47.832  94.86   0.09  630636 144141   486322  0.56  0
> 
>new
>grps  time  %busy  slice   sched   idle wake %find  steal  %speedup
>1    5.938  96.80   0.24   31255   7190    24061  0.63   7433  36.1
>2   11.491  99.23   0.16   74097   4578    69512  0.84  19463  20.9
>3   16.987  99.66   0.15  115824   1985   113826  0.77  24707  15.8
>4   22.504  99.80   0.14  167188   2385   164786  0.75  29353  12.3
>8   44.441  99.86   0.11  389153   1616   387401  0.67  38190   7.6
> 
> Elapsed time improves by 8 to 36%, and CPU busy utilization is up
> by 5 to 22% hitting 99% for 2 or more groups (80 or more tasks).
> The cost is at most 0.4% more find time.
> 
> Additional performance results follow.  A negative "speedup" is a
> regression.  Note: for all hackbench runs, sched_wakeup_granularity_ns
> is set to 15 msec.  Otherwise, preemptions increase at higher loads and
> distort the comparison between baseline and new.
> 
> -- 1 Socket Results --
> 
> X6-2: 1 socket * 10 cores * 2 hyperthreads = 20 CPUs
> Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
> Average of 10 runs of: hackbench  process 10
> 
>           --- base --     --- new ---
>    groups    time %stdev    time %stdev  %speedup
>         1   8.008    0.1   5.905    0.2      35.6
>         2  13.814    0.2  11.438    0.1      20.7
>         3  19.488    0.2  16.919    0.1      15.1
>         4  25.059    0.1  22.409    0.1      11.8
>         8  47.478    0.1  44.221    0.1       7.3
> 
> X6-2: 1 socket * 22 cores * 2 hyperthreads = 44 CPUs
> Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> Average of 10 runs of: hackbench  process 10
> 
>  --- base ----- new ---
>groupstime %stdev  

Re: [PATCH v4 1/2] spi: Add Renesas R-Car Gen3 RPC SPI controller driver

2019-01-04 Thread Sergei Shtylyov
Hello!

On 01/03/2019 09:35 AM, masonccy...@mxic.com.tw wrote:

[...]
>> >> > +#define RPC_CMNCR_MOIIO3(val)   (((val) & 0x3) << 22)
>> >> > +#define RPC_CMNCR_MOIIO2(val)   (((val) & 0x3) << 20)
>> >> > +#define RPC_CMNCR_MOIIO1(val)   (((val) & 0x3) << 18)
>> >> > +#define RPC_CMNCR_MOIIO0(val)   (((val) & 0x3) << 16)
>> >> > +#define RPC_CMNCR_MOIIO_HIZ   (RPC_CMNCR_MOIIO0(3) |
>> >> RPC_CMNCR_MOIIO1(3) | \
>> >> > + RPC_CMNCR_MOIIO2(3) | RPC_CMNCR_MOIIO3(3))
>> >> > +#define RPC_CMNCR_IO3FV(val)   (((val) & 0x3) << 14)
>> >> > +#define RPC_CMNCR_IO2FV(val)   (((val) & 0x3) << 12)
>> >>
>> >>Like I said, the above 2 aren't documented in the manual v1.00...
>> >
>> > okay, add a description as:
>> > /* RPC_CMNCR_IO3FV/IO2FV are undocumented bit, but must be set */
>> > #define RPC_CMNCR_IO3FV(val)(((val) & 0x3) << 14)
>> > #define RPC_CMNCR_IO2FV(val)(((val) & 0x3) << 12)
>> > #define RPC_CMNCR_IO0FV(val)(((val) & 0x3) << 8)
>> > #define RPC_CMNCR_IOFV_HIZ  (RPC_CMNCR_IO0FV(3) | RPC_CMNCR_IO2FV(3) | 
>> > \
>> >   RPC_CMNCR_IO3FV(3))
>> >
>> > is it ok?
>>
>>Yes. But would have been enough if you just commented with // on
>> the same line --
>> it seems these are legal now...
> 
> on the same line is over 80 char,
> #define RPC_CMNCR_IO3FV(val)(((val) & 0x3) << 14) // undocumented bit, 
> but must be set
> #define RPC_CMNCR_IO2FV(val)(((val) & 0x3) << 12) // undocumented bit, 
> but must be set
> 
> or just
> #define RPC_CMNCR_IO3FV(val)(((val) & 0x3) << 14) // undocumented bit
> #define RPC_CMNCR_IO2FV(val)(((val) & 0x3) << 12) // undocumented bit
> is it ok ?

   The second variant would be enough.

[...]

>> >> > +   ret = rpc_spi_set_freq(rpc, desc->mem->spi->max_speed_hz);
>> >> > +   if (ret)
>> >> > +  return ret;
>> >> > +
>> >> > +   rpc_spi_mem_set_prep_op_cfg(desc->mem->spi,
>> >> > +>info.op_tmpl, , );
>> >> > +
>> >> > +   regmap_write(rpc->regmap, RPC_CMNCR, RPC_CMNCR_SFDE |
>> >> > +   RPC_CMNCR_MOIIO_HIZ | RPC_CMNCR_IOFV_HIZ |
>> >> > +   RPC_CMNCR_BSZ(0));
>> >>
>> >>Why not set this in the probing time and only set/clear the MD bit?
>> >>
>> >
>> > same above!
>> > Make sure the value in these register are setting correctly
>> > before RPC starting a SPI transfer.
>>
>>You can set it once and only change the bits you need to change 
>> afterwards.
>> What's wrong with it?
>>
> 
> if so, it will patch to:
> --
> regmap_read(rpc->regmap, RPC_CMNCR, );
> data &= ~RPC_CMNCR_MD;
> regmap_write(rpc->regmap, RPC_CMNCR, data);
> --
> Do you think this way is better ?

No, this one is better:

regmap_update_bits(rpc->regmap, RPC_CMNCR, RPC_CMNCR_MD, 0);

> maybe this is better,
> write(read(rpc->regs + RPC_CMNCR) & ~RPC_CMNCR_MD,
> rpc->regs + RPC_CMNCR);

   It's effectively the same code as your 1st variant...
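[Editor's note: the regmap_update_bits() call suggested above is a masked read-modify-write — only the bits selected by the mask change, everything else is preserved. A toy userspace model of that semantic (register index and MD bit position are illustrative, not the real RPC-IF layout):]

```c
#include <assert.h>
#include <stdint.h>

/* Minimal model of regmap_update_bits(map, reg, mask, val):
 * bits selected by 'mask' are replaced with the matching bits of
 * 'val'; all other bits keep their previous value. */

#define RPC_CMNCR      0
#define RPC_CMNCR_MD   (1u << 31)   /* MD bit; exact position illustrative here */

static uint32_t regs[1];            /* stand-in for the device register file */

static void update_bits(unsigned int reg, uint32_t mask, uint32_t val)
{
    regs[reg] = (regs[reg] & ~mask) | (val & mask);
}
```

This is why the single regmap_update_bits() line replaces the quoted three-line read/clear/write sequence without changing behavior.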
 
[...]
>> >> > +static void rpc_spi_transfer_setup(struct rpc_spi *rpc,
>> >> > +   struct spi_message *msg)
>> >> > +{
>> >> [...]
>> >> > +   for (i = xfercnt - 1; i < xfercnt && xfercnt > 1; i++) {
>> >> > +  if (xfer[i].rx_buf) {
>> >> > + rpc->smenr |=
>> >> > +RPC_SMENR_SPIDE(rpc_bits_set(xfer[i].len)) |
>> >> > +RPC_SMENR_SPIDB
>> >> > +(ilog2((unsigned int)xfer[i].rx_nbits));
>> >>
>> >>Mhm, I would indent this contination line by 1 extra tab...
>> >>
>> >> > +  } else if (xfer[i].tx_buf) {
>> >> > + rpc->smenr |=
>> >> > +RPC_SMENR_SPIDE(rpc_bits_set(xfer[i].len)) |
>> >> > +RPC_SMENR_SPIDB
>> >> > +(ilog2((unsigned int)xfer[i].tx_nbits));
>> >>
>> >>And this one...
>> >
>> > like this ?
>> > 
>> >  for (i = xfercnt - 1; i < xfercnt && xfercnt > 1; i++) {
>> >  if (xfer[i].rx_buf) {
>> >  rpc->smenr |=
>> >  RPC_SMENR_SPIDE(rpc_bits_set(xfer
>> [i].len)) |
>> >  RPC_SMENR_SPIDB(
>> >  ilog2((unsigned int)xfer
>> [i].rx_nbits));
>> >  } else if (xfer[i].tx_buf) {
>> >  rpc->smenr |=
>> >  RPC_SMENR_SPIDE(rpc_bits_set(xfer
>> [i].len)) |
>> >  RPC_SMENR_SPIDB(
>> >  ilog2((unsigned int)xfer
>> [i].tx_nbits));
>>
>>I didn't mean you need to leave ( on the first line, can be left
>> on the  new
>> line, as before.
>>
> 
> how about this style ?
> -
>  for (i = xfercnt - 1; i < xfercnt && xfercnt > 1; i++) {
>  if (xfer[i].rx_buf) {
> 

Re: [PATCH 00/10] steal tasks to improve CPU utilization

2019-01-04 Thread Shijith Thotton
On 22-Oct-18 8:40 PM, Steve Sistare wrote:
> 
> When a CPU has no more CFS tasks to run, and idle_balance() fails to
> find a task, then attempt to steal a task from an overloaded CPU in the
> same LLC. Maintain and use a bitmap of overloaded CPUs to efficiently
> identify candidates.  To minimize search time, steal the first migratable
> task that is found when the bitmap is traversed.  For fairness, search
> for migratable tasks on an overloaded CPU in order of next to run.
> 
> This simple stealing yields a higher CPU utilization than idle_balance()
> alone, because the search is cheap, so it may be called every time the CPU
> is about to go idle.  idle_balance() does more work because it searches
> widely for the busiest queue, so to limit its CPU consumption, it declines
> to search if the system is too busy.  Simple stealing does not offload the
> globally busiest queue, but it is much better than running nothing at all.
> 
> The bitmap of overloaded CPUs is a new type of sparse bitmap, designed to
> reduce cache contention vs the usual bitmap when many threads concurrently
> set, clear, and visit elements.
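[Editor's note: the stealing policy described in the cover letter above can be modeled in a few lines — a bitmap of overloaded CPUs, and an idle CPU taking the first candidate it finds rather than searching for the busiest queue. A toy userspace sketch; all names are illustrative, not the kernel's:]

```c
#include <assert.h>
#include <stdint.h>

/* One bit per CPU, set while that CPU has more than one runnable task.
 * A CPU going idle scans the mask and "steals" from the first
 * overloaded CPU it finds, mirroring the cheap first-migratable-task
 * policy of the series (vs. idle_balance()'s search for the busiest
 * queue). */

static uint64_t overload_mask;

static void set_overloaded(int cpu)   { overload_mask |=  (UINT64_C(1) << cpu); }
static void clear_overloaded(int cpu) { overload_mask &= ~(UINT64_C(1) << cpu); }

/* Return the first overloaded CPU other than 'self', or -1 if none. */
static int try_steal(int self)
{
    uint64_t mask = overload_mask & ~(UINT64_C(1) << self);

    if (!mask)
        return -1;
    return __builtin_ctzll(mask);   /* first set bit: no global search */
}
```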
> 
> Patch 1 defines the sparsemask type and its operations.
> 
> Patches 2, 3, and 4 implement the bitmap of overloaded CPUs.
> 
> Patches 5 and 6 refactor existing code for a cleaner merge of later
>patches.
> 
> Patches 7 and 8 implement task stealing using the overloaded CPUs bitmap.
> 
> Patch 9 disables stealing on systems with more than 2 NUMA nodes for the
> time being because of performance regressions that are not due to stealing
> per-se.  See the patch description for details.
> 
> Patch 10 adds schedstats for comparing the new behavior to the old, and
>provided as a convenience for developers only, not for integration.
> 
> The patch series is based on kernel 4.19.0-rc7.  It compiles, boots, and
> runs with/without each of CONFIG_SCHED_SMT, CONFIG_SMP, CONFIG_SCHED_DEBUG,
> and CONFIG_PREEMPT.  It runs without error with CONFIG_DEBUG_PREEMPT +
> CONFIG_SLUB_DEBUG + CONFIG_DEBUG_PAGEALLOC + CONFIG_DEBUG_MUTEXES +
> CONFIG_DEBUG_SPINLOCK + CONFIG_DEBUG_ATOMIC_SLEEP.  CPU hot plug and CPU
> bandwidth control were tested.
> 
> Stealing improves utilization with only a modest CPU overhead in scheduler
> code.  In the following experiment, hackbench is run with varying numbers
> of groups (40 tasks per group), and the delta in /proc/schedstat is shown
> for each run, averaged per CPU, augmented with these non-standard stats:
> 
>%find - percent of time spent in old and new functions that search for
>  idle CPUs and tasks to steal and set the overloaded CPUs bitmap.
> 
>steal - number of times a task is stolen from another CPU.
> 
> X6-2: 1 socket * 10 cores * 2 hyperthreads = 20 CPUs
> Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
> hackbench  process 10
> sched_wakeup_granularity_ns=1500
> 
>baseline
>grps  time  %busy  slice   sched   idle wake %find  steal
>1    8.084  75.02   0.10  105476  46291    59183  0.31  0
>2   13.892  85.33   0.10  190225  70958   119264  0.45  0
>3   19.668  89.04   0.10  263896  87047   176850  0.49  0
>4   25.279  91.28   0.10  322171  94691   227474  0.51  0
>8   47.832  94.86   0.09  630636 144141   486322  0.56  0
> 
>new
>grps  time  %busy  slice   sched   idle wake %find  steal  %speedup
>1    5.938  96.80   0.24   31255   7190    24061  0.63   7433  36.1
>2   11.491  99.23   0.16   74097   4578    69512  0.84  19463  20.9
>3   16.987  99.66   0.15  115824   1985   113826  0.77  24707  15.8
>4   22.504  99.80   0.14  167188   2385   164786  0.75  29353  12.3
>8   44.441  99.86   0.11  389153   1616   387401  0.67  38190   7.6
> 
> Elapsed time improves by 8 to 36%, and CPU busy utilization is up
> by 5 to 22% hitting 99% for 2 or more groups (80 or more tasks).
> The cost is at most 0.4% more find time.
> 
> Additional performance results follow.  A negative "speedup" is a
> regression.  Note: for all hackbench runs, sched_wakeup_granularity_ns
> is set to 15 msec.  Otherwise, preemptions increase at higher loads and
> distort the comparison between baseline and new.
> 
> -- 1 Socket Results --
> 
> X6-2: 1 socket * 10 cores * 2 hyperthreads = 20 CPUs
> Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
> Average of 10 runs of: hackbench  process 10
> 
>              --- base --    --- new ---
>    groups    time  %stdev    time  %stdev  %speedup
>         1   8.008     0.1   5.905     0.2      35.6
>         2  13.814     0.2  11.438     0.1      20.7
>         3  19.488     0.2  16.919     0.1      15.1
>         4  25.059     0.1  22.409     0.1      11.8
>         8  47.478     0.1  44.221     0.1       7.3
> 
> X6-2: 1 socket * 22 cores * 2 hyperthreads = 44 CPUs
> Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> Average of 10 runs of: hackbench  process 10
> 
>              --- base --    --- new ---
>    groups    time  %stdev 

Re: [PATCH] irqchip: madera: Drop GPIO includes

2019-01-04 Thread Richard Fitzgerald

On 03/01/19 10:26, Linus Walleij wrote:

This irqchip does not use anything GPIO-related so drop
the GPIO includes.



An older version did some stuff with GPIOs but when that code was removed
this include obviously was forgotten.


Cc: Richard Fitzgerald 
Signed-off-by: Linus Walleij 
---
  drivers/irqchip/irq-madera.c | 2 --
  1 file changed, 2 deletions(-)

diff --git a/drivers/irqchip/irq-madera.c b/drivers/irqchip/irq-madera.c
index e9256dee1a45..8b81271c823c 100644
--- a/drivers/irqchip/irq-madera.c
+++ b/drivers/irqchip/irq-madera.c
@@ -7,7 +7,6 @@
   */
  
  #include 

-#include 
  #include 
  #include 
  #include 
@@ -16,7 +15,6 @@
  #include 
  #include 
  #include 
-#include 
  #include 
  #include 
  #include 



Acked-by: Richard Fitzgerald 


Re: [PATCH] ARM: mach-s3c24xx: Fix boolean expressions in osiris_dvs_notify

2019-01-04 Thread Krzysztof Kozlowski
On Thu, 3 Jan 2019 at 21:46, Gustavo A. R. Silva  wrote:
>
> Fix boolean expressions by using logical AND operator '&&'
> instead of bitwise operator '&'.
>
> This issue was detected with the help of Coccinelle.
>
> Fixes: 4fa084af28ca ("ARM: OSIRIS: DVS (Dynamic Voltage Scaling) supoort.")
> Cc: sta...@vger.kernel.org
> Signed-off-by: Gustavo A. R. Silva 
> ---
>  arch/arm/mach-s3c24xx/mach-osiris-dvs.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm/mach-s3c24xx/mach-osiris-dvs.c 
> b/arch/arm/mach-s3c24xx/mach-osiris-dvs.c
> index 058ce73137e8..ccbd7b7965ca 100644
> --- a/arch/arm/mach-s3c24xx/mach-osiris-dvs.c
> +++ b/arch/arm/mach-s3c24xx/mach-osiris-dvs.c
> @@ -65,16 +65,16 @@ static int osiris_dvs_notify(struct notifier_block *nb,
>
> switch (val) {
> case CPUFREQ_PRECHANGE:
> -   if (old_dvs & !new_dvs ||
> -   cur_dvs & !new_dvs) {
> +   if (old_dvs && !new_dvs ||
> +   cur_dvs && !new_dvs) {

Technically the old code will work fine because all variables are
bools (so 0 or 1). Therefore I am not sure whether this should be
ported to stable.
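[Editor's note: the point above — bitwise '&' and logical '&&' coincide for 0/1 values but diverge for wider integers (and '&&' additionally short-circuits) — can be demonstrated directly:]

```c
#include <assert.h>
#include <stdbool.h>

/* For operands restricted to 0/1, 'a & !b' and 'a && !b' agree, which
 * is why the old osiris_dvs_notify() code happened to work.  For any
 * other integer value the two operators give different results. */

static int bitwise_and_not(int a, int b) { return a & !b; }
static int logical_and_not(int a, int b) { return a && !b; }
```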

Anyway the patch is itself okay and I will take it after merge window. Thanks!

Best regards,
Krzysztof


Re: Column for keywords?

2019-01-04 Thread Dmitry Vyukov
On Fri, Jan 4, 2019 at 2:01 PM Dmitry Vyukov  wrote:
>
> On Wed, May 16, 2018 at 6:50 PM Tetsuo Handa
>  wrote:
> >
> > Dmitry Vyukov wrote:
> > > On Tue, May 15, 2018 at 10:59 PM, Tetsuo Handa
> > >  wrote:
> > > > Dmitry Vyukov wrote:
> > > >> I've filed https://github.com/google/syzkaller/issues/608 to not lose
> > > >> track of this.
> > > >
> > > > Thanks. Since the time lag between a patch was proposed and that patch 
> > > > is
> > > > applied to a git tree tends to become long, duplicated works like
> > > > https://www.spinics.net/lists/linux-fsdevel/msg125240.html and
> > > > http://lkml.kernel.org/r/964a8b27-cd69-357c-fe78-76b066056...@i-love.sakura.ne.jp
> > > > are already occurring.
> > >
> > > This is bad.
> > >
> > > > Therefore, it is important that the state of the bug (e.g.
> > > > bisected, cause identified, patch proposed) is visible from the table.
> > >
> > > What do you think about the last section of:
> > > https://groups.google.com/d/msg/syzkaller-bugs/nw7BIW9V2wk/NE0P_Au4AQAJ
> > > there is already a way to say "there is a pending fix for this".
> >
> > That lacks a way to annotate "there is a pending fix for this, but the fix
> > is not yet applied to any git tree". I mean not only "git trees which syzbot
> > is checking" but also "git trees which are publicly visible".
> >
> > (Also, if we can later correct the patch using "#syz fix:" in case the patch
> > title was renamed, it is not clear how to specify multiple patches using
> > "#syz fix:" when a patch which meant to fix the reported problem contained
> > a regression or was incomplete and thus fixup patch followed shortly. An
> > example is commit 5f3e3b85cc0a5eae and commit ef95a90ae6f4f219 in
> > "WARNING: kmalloc bug in memdup_user (2)". I've tried
> >
> >   #syz fix: RDMA/ucma: Correct option size check using optlen
> >   #syz fix: RDMA/ucma: ucma_context reference leak in error path
> >
> > but only the former patch was recorded.)
> >
> > >
> > > But one problem with manual tagging is how to make everybody update
> > > these tags. If only few people do it, it can still lead to duplicate
> > > work. And it's not syzbot-specific. Can happen with just any bug
> > > report on kernel mailing lists. Traditionally it's solved with bug
> > > tracking systems and assigning bugs when a developer starts working on
> > > it. But kernel does not have a working bug tracker.
> > >
> > > One simple thing we can do is make syzbot poll more trees to discover
> > > Reported-by tags faster. This will automatically update status on
> > > dashboard to "fix pending". I've filed
> > > https://github.com/google/syzkaller/issues/610 for this. Ideally, we
> > > would intercept all mailed patches, but it's hard with kernel
> > > development process because there is no system that tracks all pending
> > > patches.
> > >
> >
> > The problem is that the pending fix won't be applied to any git tree.
> > It depends on when reviewers and maintainers can find time for
> > reviewing/committing the fix. Scanning all git trees unlikely helps.
> >
> >   the criteria is that you are "reasonably sure that the commit will
> >   reach upstream under this title", for whatever reason
> >
> > won't apply to not yet reviewed patches. What I want is a way to specify
> > "a patch was proposed but the patch is not yet reviewed/tested/applied".
> >
> > Generally, progresses are not recorded frequently enough to avoid duplicated
> > works. I want to check not only "fix pending" stage but also e.g. "problem
> > guessed", "bisected", "cause identified", "patch proposed", "patch reviewed"
> > stages from the top page's table.
>
> 1. This sounds very much like general bug tracking system. We
> specifically didn't want to go down the slippery slope of implementing
> yet another bug tracking system.
> 2. This problem is not specific to syzbot in any way (just like lost
> bug reports). Kernel developers waste time on duplicate work for other
> bug reports too.
> So I think (1) we need a bug tracking system, (2) use that system for
> syzbot to solve this local problem.


+Ted who also says that it is not possible to make sense out of
current state of kernel bug reports (e.g. what are open bugs for ext4
sorted by priority).
+Doug who says the same re rdma_cm subsystem.

Both said this in the context of syzbot, but I fail to see how this is
any syzbot-specific. This highlights the more broad problem with
kernel development process.


Re: NULL pointer dereference when writing fuzzed data to /dev/uhid

2019-01-04 Thread Benjamin Tissoires
Hi Anatoly,


On Fri, Jan 4, 2019 at 1:32 PM Anatoly Trosinenko
 wrote:
>
> Hello,
>
> When writing the attached file to /dev/uhid, a NULL dereference occurs
> in kernel. As I understand, the problem is not UHID-specific, but is
> related to HID subsystem.

Thanks for the report.
I wanted to tell you that I started investigating the other private
report you sent us, but couldn't find the time to properly come with a
fix as the fuzzed data is hard to discriminate from valid data.

A couple of notes though:
- writing to uhid needs to be done by root. Any distribution that
doesn't enforce that is doomed to have several security issues
- we could somehow reproduce those fuzzed data on a USB or Bluetooth
connection, but that would require physical access to the device, so
you are doomed also
- last IIRC, there was some attempts by the ChromeOS team to allow
access to the HID stack from the Chrome plugins, I don't know if this
is able to generate the issues.

On the specifics reported here:

>
> How to reproduce:
> 1) Checkout the fresh master branch of the Linux kernel (tested on
> commit 96d4f267e)
> 2) Compile it with the attached config (kvm-xfstests capable)
> 3) Take one of reproducers and execute
> cat /vtmp/repro > /dev/uhid
>
> What happens:
>
> For chicony.bin:
>
> root@kvm-xfstests:~# cat /vtmp/chicony.bin > /dev/uhid
> [   19.072703] BUG: unable to handle kernel NULL pointer dereference
> at 0002
> [   19.073371] #PF error: [normal kernel read fault]
> [   19.073755] PGD 800078b2c067 P4D 800078b2c067 PUD 0
> [   19.074223] Oops:  [#1] SMP PTI
> [   19.074809] CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted
> 4.20.0-xfstests-10979-g96d4f267e40 #1
> [   19.075965] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.11.1-1ubuntu1 04/01/2014
> [   19.077599] Workqueue: events uhid_device_add_worker
> [   19.078019] RIP: 0010:ch_switch12_report_fixup+0x13/0x70

This driver expects the device to be connected on USB, and you are
triggering the oops because you are on uhid.
I am chasing the USB dependencies in most drivers, but this is a hard
task to do when I do not have access to the actual devices.

I guess one way of fixing that is to add a check for the actual
transport driver during probe:
hid_is_using_ll_driver(hdev, _hid_driver)

Patches are welcome :)
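[Editor's note: the guard sketched above — rejecting devices that did not arrive over the expected transport at probe time — can be modeled in userspace. Types and names below are illustrative stand-ins, not the real HID core API:]

```c
#include <assert.h>
#include <errno.h>

/* A driver that only understands the USB transport should refuse to
 * bind when the device arrived over another ll driver (uhid here),
 * instead of later dereferencing USB-only state in report_fixup(). */

enum ll_transport { LL_USB, LL_UHID };

struct toy_hid_device {
    enum ll_transport transport;
};

/* Stand-in for hid_is_using_ll_driver(hdev, &usb_hid_driver). */
static int is_usb(const struct toy_hid_device *hdev)
{
    return hdev->transport == LL_USB;
}

static int toy_probe(const struct toy_hid_device *hdev)
{
    if (!is_usb(hdev))
        return -ENODEV;  /* bail out; let the generic driver handle it */
    return 0;            /* safe to apply USB-specific fixups */
}
```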

> [   19.078462] Code: 49 8b 00 3e 80 60 20 df b8 01 00 00 00 c3 66 0f
> 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 8f 48 19 00 00 48 89 f0 48
> 8b 49 d8 <80> 79 02 01 74 01 c3 81 7f 3c 21 14 00 00 75 f6 83 3a 7f 76
> f1 80
> [   19.080103] RSP: 0018:a1d880367c48 EFLAGS: 00010286
> [   19.080541] RAX: 9b653d27b180 RBX: 9b653a6fb948 RCX: 
> 
> [   19.081133] RDX: a1d880367c5c RSI: 9b653d27b180 RDI: 
> 9b653a6fa000
> [   19.081780] RBP: 9b653d27b180 R08: 00064992eed0 R09: 
> 
> [   19.082409] R10:  R11:  R12: 
> 9b653a6fa000
> [   19.083017] R13: 83f14510 R14: 83f14440 R15: 
> 
> [   19.083619] FS:  () GS:9b653fc0()
> knlGS:
> [   19.084362] CS:  0010 DS:  ES:  CR0: 80050033
> [   19.085164] CR2: 0002 CR3: 788b8004 CR4: 
> 00360ef0
> [   19.085789] Call Trace:
> [   19.086011]  hid_open_report+0x81/0x2c0
> [   19.086341]  hid_device_probe+0x135/0x160
> [   19.086754]  ? __driver_attach+0x110/0x110
> [   19.087109]  really_probe+0xe0/0x390
> [   19.087411]  ? __driver_attach+0x110/0x110
> [   19.087782]  bus_for_each_drv+0x78/0xc0
> [   19.088134]  __device_attach+0xcc/0x130
> [   19.088477]  bus_probe_device+0x9f/0xb0
> [   19.088832]  device_add+0x422/0x680
> [   19.089144]  ? __debugfs_create_file+0xb5/0xf0
> [   19.089536]  hid_add_device+0xec/0x280
> [   19.089880]  uhid_device_add_worker+0x15/0x60
> [   19.090270]  process_one_work+0x238/0x5d0
> [   19.090627]  worker_thread+0x3d/0x390
> [   19.090959]  ? process_one_work+0x5d0/0x5d0
> [   19.091331]  kthread+0x121/0x140
> [   19.096732]  ? __kthread_create_on_node+0x1a0/0x1a0
> [   19.097164]  ret_from_fork+0x3a/0x50
> [   19.097483] CR2: 0002
> [   19.097779] ---[ end trace 1b547acaae113039 ]---
> [   19.098186] RIP: 0010:ch_switch12_report_fixup+0x13/0x70
> [   19.098621] Code: 49 8b 00 3e 80 60 20 df b8 01 00 00 00 c3 66 0f
> 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 8f 48 19 00 00 48 89 f0 48
> 8b 49 d8 <80> 79 02 01 74 01 c3 81 7f 3c 21 14 00 00 75 f6 83 3a 7f 76
> f1 80
> [   19.100251] RSP: 0018:a1d880367c48 EFLAGS: 00010286
> [   19.100707] RAX: 9b653d27b180 RBX: 9b653a6fb948 RCX: 
> 
> [   19.101321] RDX: a1d880367c5c RSI: 9b653d27b180 RDI: 
> 9b653a6fa000
> [   19.102448] RBP: 9b653d27b180 R08: 00064992eed0 R09: 
> 
> [   19.103029] R10:  R11:  R12: 
> 9b653a6fa000
> [   19.103601] R13: 83f14510 R14: 83f14440 

Re: [PATCH] netfilter: account ebt_table_info to kmemcg

2019-01-04 Thread Michal Hocko
On Thu 03-01-19 12:52:54, Shakeel Butt wrote:
> On Mon, Dec 31, 2018 at 2:12 AM Michal Hocko  wrote:
> >
> > On Sun 30-12-18 19:59:53, Shakeel Butt wrote:
> > > On Sun, Dec 30, 2018 at 12:00 AM Michal Hocko  wrote:
> > > >
> > > > On Sun 30-12-18 08:45:13, Michal Hocko wrote:
> > > > > On Sat 29-12-18 11:34:29, Shakeel Butt wrote:
> > > > > > On Sat, Dec 29, 2018 at 2:06 AM Michal Hocko  
> > > > > > wrote:
> > > > > > >
> > > > > > > On Sat 29-12-18 10:52:15, Florian Westphal wrote:
> > > > > > > > Michal Hocko  wrote:
> > > > > > > > > On Fri 28-12-18 17:55:24, Shakeel Butt wrote:
> > > > > > > > > > The [ip,ip6,arp]_tables use x_tables_info internally and 
> > > > > > > > > > the underlying
> > > > > > > > > > memory is already accounted to kmemcg. Do the same for 
> > > > > > > > > > ebtables. The
> > > > > > > > > > syzbot, by using setsockopt(EBT_SO_SET_ENTRIES), was able 
> > > > > > > > > > to OOM the
> > > > > > > > > > whole system from a restricted memcg, a potential DoS.
> > > > > > > > >
> > > > > > > > > What is the lifetime of these objects? Are they bound to any 
> > > > > > > > > process?
> > > > > > > >
> > > > > > > > No, they are not.
> > > > > > > > They are free'd only when userspace requests it or the netns is
> > > > > > > > destroyed.
> > > > > > >
> > > > > > > Then this is problematic, because the oom killer is not able to
> > > > > > > guarantee the hard limit and so the excessive memory consumption 
> > > > > > > cannot
> > > > > > > be really contained. As a result the memcg will be basically 
> > > > > > > useless
> > > > > > > until somebody tears down the charged objects by other means. The 
> > > > > > > memcg
> > > > > > > oom killer will surely kill all the existing tasks in the cgroup 
> > > > > > > and
> > > > > > > this could somehow reduce the problem. Maybe this is sufficient 
> > > > > > > for
> > > > > > > some usecases but that should be properly analyzed and described 
> > > > > > > in the
> > > > > > > changelog.
> > > > > > >
> > > > > >
> > > > > > Can you explain why you think the memcg hard limit will not be
> > > > > > enforced? From what I understand, the memcg oom-killer will kill the
> > > > > > allocating processes as you have mentioned. We do force charging for
> > > > > > very limited conditions but here the memcg oom-killer will take care
> > > > > > of
> > > > >
> > > > > I was talking about the force charge part. Depending on a specific
> > > > > allocation and its life time this can gradually get us over hard limit
> > > > > without any bound theoretically.
> > > >
> > > > Forgot to mention. Since b8c8a338f75e ("Revert "vmalloc: back off when
> > > > the current task is killed"") there is no way to bail out from the
> > > > vmalloc allocation loop so if the request is really large then the memcg
> > > > oom will not help. Is that a problem here?
> > > >
> > >
> > > Yes, I think it will be an issue here.
> > >
> > > > Maybe it is time to revisit fatal_signal_pending check.
> > >
> > > Yes, we will need something to handle the memcg OOM. I will think more
> > > on that front or if you have any ideas, please do propose.
> >
> > I can see three options here:
> > - do not force charge on memcg oom or introduce a limited charge
> >   overflow (reserves basically).
> > - revert the revert and reintroduce the fatal_signal_pending
> >   check into vmalloc
> > - be more specific and check tsk_is_oom_victim in vmalloc and
> >   fail
> >
> 
> I think for the long term solution we might need something similar to
> memcg oom reserves (1) but for quick fix I think we can do the
> combination of (2) and (3).

Johannes argued that fatal_signal_pending is too general check for
vmalloc. I would argue that we already break out of some operations on
fatal signals. tsk_is_oom_victim is more subtle but much more targeted
on the other hand.

I do not have any strong preference to be honest but I agree that some
limited reserves would be the best solution long term. I just do not
have any idea how to scale those reserves to be meaningful.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH v5 10/15] KVM: s390: add functions to (un)register GISC with GISA

2019-01-04 Thread Cornelia Huck
On Wed, 2 Jan 2019 18:29:00 +0100
Pierre Morel  wrote:

> On 19/12/2018 20:17, Michael Mueller wrote:
> > Add the IAM (Interruption Alert Mask) to the architecture specific
> > kvm struct. This mask in the GISA is used to define for which ISC
> > a GIB alert can be issued.
> > 
> > The functions kvm_s390_gisc_register() and kvm_s390_gisc_unregister()
> > are used to (un)register a GISC (guest ISC) with a virtual machine and
> > its GISA.
> > 
> > Upon successful completion, kvm_s390_gisc_register() returns the
> > ISC to be used for GIB alert interruptions. A negative return code
> > indicates an error during registration.
> > 
> > These functions will be used by other adapter types like AP and PCI to
> > request pass-through interruption support.
> > 
> > Signed-off-by: Michael Mueller 
> > ---
> >   arch/s390/include/asm/kvm_host.h |  9 ++
> >   arch/s390/kvm/interrupt.c| 66 
> > 
> >   2 files changed, 75 insertions(+)
> > 

> > +int kvm_s390_gisc_register(struct kvm *kvm, u32 gisc)
> > +{
> > +   if (!kvm->arch.gib_in_use)
> > +   return -ENODEV;
> > +   if (gisc > MAX_ISC)
> > +   return -ERANGE;
> > +
> > +   spin_lock(>arch.iam_ref_lock);
> > +   if (kvm->arch.iam_ref_count[gisc] == 0)
> > +   kvm->arch.iam |= 0x80 >> gisc;
> > +   kvm->arch.iam_ref_count[gisc]++;
> > +   if (kvm->arch.iam_ref_count[gisc] == 1)
> > +   set_iam(kvm->arch.gisa, kvm->arch.iam);  
> 
> testing the set_iam return value?
> Even it should be fine if the caller works correctly, this is done 
> before GISA is ever used.

My feeling is that checking the return code is a good idea, even if it
Should Never Fail(tm).

> 
> > +   spin_unlock(>arch.iam_ref_lock);
> > +
> > +   return gib->nisc;
> > +}
> > +EXPORT_SYMBOL_GPL(kvm_s390_gisc_register);
> > +
> > +int kvm_s390_gisc_unregister(struct kvm *kvm, u32 gisc)
> > +{
> > +   int rc = 0;
> > +
> > +   if (!kvm->arch.gib_in_use)
> > +   return -ENODEV;
> > +   if (gisc > MAX_ISC)
> > +   return -ERANGE;
> > +
> > +   spin_lock(>arch.iam_ref_lock);
> > +   if (kvm->arch.iam_ref_count[gisc] == 0) {
> > +   rc = -EINVAL;
> > +   goto out;
> > +   }
> > +   kvm->arch.iam_ref_count[gisc]--;
> > +   if (kvm->arch.iam_ref_count[gisc] == 0) {
> > +   kvm->arch.iam &= ~(0x80 >> gisc);
> > +   set_iam(kvm->arch.gisa, kvm->arch.iam);

Any chance of this function failing here? If yes, would there be any
implications?

> > +   }
> > +out:
> > +   spin_unlock(>arch.iam_ref_lock);
> > +
> > +   return rc;
> > +}
> > +EXPORT_SYMBOL_GPL(kvm_s390_gisc_unregister);
> > +
> >   void kvm_s390_gib_destroy(void)
> >   {
> > if (!gib)
> >   
> 
> 
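[Editor's note: the per-ISC reference counting in the quoted patch — set the IAM bit on the 0->1 transition, clear it on 1->0 — can be exercised as a standalone model. MAX_ISC and the 0x80 >> gisc encoding follow the patch; locking and the GISA write are omitted:]

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ISC 7

static uint8_t iam;                   /* interruption alert mask */
static unsigned int ref[MAX_ISC + 1]; /* per-GISC registration count */

static int gisc_register(uint32_t gisc)
{
    if (gisc > MAX_ISC)
        return -1;                    /* -ERANGE in the patch */
    if (ref[gisc]++ == 0)
        iam |= 0x80 >> gisc;          /* first user: enable alerts */
    return 0;
}

static int gisc_unregister(uint32_t gisc)
{
    if (gisc > MAX_ISC || ref[gisc] == 0)
        return -1;                    /* -ERANGE / -EINVAL in the patch */
    if (--ref[gisc] == 0)
        iam &= ~(0x80 >> gisc);       /* last user: disable alerts */
    return 0;
}
```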



Re: [PATCH v4] selftests: add headers_install to lib.mk

2019-01-04 Thread Geert Uytterhoeven
Hi Anders,

On Wed, Aug 8, 2018 at 12:46 PM Anders Roxell  wrote:
> On Tue, 7 Aug 2018 at 04:33, Masahiro Yamada
>  wrote:
> > 2018-08-07 2:03 GMT+09:00 Shuah Khan :
> > > On 07/25/2018 10:08 AM, Anders Roxell wrote:
> > >> On Tue, 24 Jul 2018 at 19:11, Shuah Khan  wrote:
> > >>> On 07/23/2018 02:49 PM, Anders Roxell wrote:
> >  On Thu, 7 Jun 2018 at 13:09, Anders Roxell  
> >  wrote:
> > >
> > > If the kernel headers aren't installed we can't build all the tests.
> > > Add a new make target rule 'khdr' in the file lib.mk to generate the
> > > kernel headers and that gets include for every test-dir Makefile that
> > > includes lib.mk If the testdir in turn have its own sub-dirs the
> > > top_srcdir needs to be set to the linux-rootdir to be able to generate
> > > the kernel headers.
> > >
> > > Signed-off-by: Anders Roxell 
> > > Reviewed-by: Fathi Boudra 
> > > ---
> > >  Makefile   | 14 
> > > +-
> > >  scripts/subarch.include| 13 +
> > >  tools/testing/selftests/android/Makefile   |  2 +-
> > >  tools/testing/selftests/android/ion/Makefile   |  2 ++
> > >  tools/testing/selftests/futex/functional/Makefile  |  1 +
> > >  tools/testing/selftests/gpio/Makefile  |  7 ++-
> > >  tools/testing/selftests/kvm/Makefile   |  7 ++-
> > >  tools/testing/selftests/lib.mk | 12 
> > >  tools/testing/selftests/net/Makefile   |  1 +
> > >  .../selftests/networking/timestamping/Makefile |  1 +
> > >  tools/testing/selftests/vm/Makefile|  4 
> > >  11 files changed, 36 insertions(+), 28 deletions(-)
> > >  create mode 100644 scripts/subarch.include
> > >
> > > diff --git a/Makefile b/Makefile
> > > index 6b9aea95ae3a..8050072300fa 100644
> > > --- a/Makefile
> > > +++ b/Makefile
> > > @@ -286,19 +286,7 @@ KERNELRELEASE = $(shell cat 
> > > include/config/kernel.release 2> /dev/null)
> > >  KERNELVERSION = $(VERSION)$(if $(PATCHLEVEL),.$(PATCHLEVEL)$(if 
> > > $(SUBLEVEL),.$(SUBLEVEL)))$(EXTRAVERSION)
> > >  export VERSION PATCHLEVEL SUBLEVEL KERNELRELEASE KERNELVERSION
> > >
> > > -# SUBARCH tells the usermode build what the underlying arch is.  
> > > That is set
> > > -# first, and if a usermode build is happening, the "ARCH=um" on the 
> > > command
> > > -# line overrides the setting of ARCH below.  If a native build is 
> > > happening,
> > > -# then ARCH is assigned, getting whatever value it gets normally, and
> > > -# SUBARCH is subsequently ignored.
> > > -
> > > -SUBARCH := $(shell uname -m | sed -e s/i.86/x86/ -e s/x86_64/x86/ \
> > > - -e s/sun4u/sparc64/ \
> > > - -e s/arm.*/arm/ -e s/sa110/arm/ \
> > > - -e s/s390x/s390/ -e 
> > > s/parisc64/parisc/ \
> > > - -e s/ppc.*/powerpc/ -e 
> > > s/mips.*/mips/ \
> > > - -e s/sh[234].*/sh/ -e 
> > > s/aarch64.*/arm64/ \
> > > - -e s/riscv.*/riscv/)
> > > +include scripts/subarch.include
> > >>>
> > >>> What is the reason for this SUBARCH block move to to 
> > >>> scripts/subarch.include?
> > >>> Is this necessary for adding headers install dependency to lib.mk?
> > >>
> > >> This is needed to create headers for cross build.
> > >
> > > I am sorry for the delay on this patch. I am going to get this into 4.19.
> > > If anybody has objections, please let me.
> > >
> > > Anders! Will be able to rebase the patch and send me the latest. I think
> > > I have Acks from kvm, android, and vm so far.
> >
> > I may be missing something about the tools/ directory,
> > but why isn't it like this?
> >
> >
> > kselftest: headers_install
> > $(Q)$(MAKE) -C $(srctree)/tools/testing/selftests run_tests
>
> That wont work if you cross compile, since you wont run the tests
> where you build them.

So why does a similar dependency work for samples in the main Makefile
(even listed twice!)?

Documentation/ samples/: headers_install

samples: headers_install

Or perhaps this also doesn't work?

> Then you can argue that we should break it up in a build and run rule.
> However, my understanding how people just build and run the tests
> for that particular subsystem that they develop for, like:
> make -C tools/testing/selftests//

I'm still a bit puzzled: how can you "build and run" the tests when
cross-compiling, as you don't run the tests where you build them?

Thanks!

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I 

Re: [PATCH] drm/tegra: hdmi: Fix audio to work with any pixel clock rate

2019-01-04 Thread Thierry Reding
On Wed, Dec 14, 2016 at 06:20:39PM +0100, Alban Bedel wrote:
> The audio setting implementation was limited to a few specific pixel
> clocks. This prevented HDMI audio from working on several test devices
> as they need a pixel clock that is not supported by this implementation.
> 
> Fix this by implementing the algorithm provided in the TRM using fixed
> point arithmetic. This allows the driver to cope with any sane pixel
> clock rate.
> 
> Signed-off-by: Alban Bedel 
> ---
>  drivers/gpu/drm/tegra/hdmi.c | 163 
> ++-
>  1 file changed, 54 insertions(+), 109 deletions(-)

I had completely forgotten about this one, but then ran into this issue
myself yesterday and only then I started remembering this patch. It did
apply almost cleanly and still works perfectly. I made a couple of minor
modifications (see below) and applied this for v4.22.

> diff --git a/drivers/gpu/drm/tegra/hdmi.c b/drivers/gpu/drm/tegra/hdmi.c
> index cda0491ed6bf..b6078d6604b1 100644
> --- a/drivers/gpu/drm/tegra/hdmi.c
> +++ b/drivers/gpu/drm/tegra/hdmi.c
> @@ -14,6 +14,7 @@
>  #include 
>  #include 
>  #include 
> +#include 

Turned this into #include  which I think is the more
canonical way to get at do_div() nowadays.

>  
>  #include 
>  #include 
> @@ -112,68 +113,11 @@ static inline void tegra_hdmi_writel(struct tegra_hdmi 
> *hdmi, u32 value,
>  }
>  
>  struct tegra_hdmi_audio_config {
> - unsigned int pclk;
>   unsigned int n;
>   unsigned int cts;
>   unsigned int aval;
>  };
>  
> -static const struct tegra_hdmi_audio_config tegra_hdmi_audio_32k[] = {
> - {  2520, 4096,  25200, 24000 },
> - {  2700, 4096,  27000, 24000 },
> - {  7425, 4096,  74250, 24000 },
> - { 14850, 4096, 148500, 24000 },
> - { 0,0,  0, 0 },
> -};
> -
> -static const struct tegra_hdmi_audio_config tegra_hdmi_audio_44_1k[] = {
> - {  2520, 5880,  26250, 25000 },
> - {  2700, 5880,  28125, 25000 },
> - {  7425, 4704,  61875, 2 },
> - { 14850, 4704, 123750, 2 },
> - { 0,0,  0, 0 },
> -};
> -
> -static const struct tegra_hdmi_audio_config tegra_hdmi_audio_48k[] = {
> - {  2520, 6144,  25200, 24000 },
> - {  2700, 6144,  27000, 24000 },
> - {  7425, 6144,  74250, 24000 },
> - { 14850, 6144, 148500, 24000 },
> - { 0,0,  0, 0 },
> -};
> -
> -static const struct tegra_hdmi_audio_config tegra_hdmi_audio_88_2k[] = {
> - {  2520, 11760,  26250, 25000 },
> - {  2700, 11760,  28125, 25000 },
> - {  7425,  9408,  61875, 2 },
> - { 14850,  9408, 123750, 2 },
> - { 0, 0,  0, 0 },
> -};
> -
> -static const struct tegra_hdmi_audio_config tegra_hdmi_audio_96k[] = {
> - {  2520, 12288,  25200, 24000 },
> - {  2700, 12288,  27000, 24000 },
> - {  7425, 12288,  74250, 24000 },
> - { 14850, 12288, 148500, 24000 },
> - { 0, 0,  0, 0 },
> -};
> -
> -static const struct tegra_hdmi_audio_config tegra_hdmi_audio_176_4k[] = {
> - {  2520, 23520,  26250, 25000 },
> - {  2700, 23520,  28125, 25000 },
> - {  7425, 18816,  61875, 2 },
> - { 14850, 18816, 123750, 2 },
> - { 0, 0,  0, 0 },
> -};
> -
> -static const struct tegra_hdmi_audio_config tegra_hdmi_audio_192k[] = {
> - {  2520, 24576,  25200, 24000 },
> - {  2700, 24576,  27000, 24000 },
> - {  7425, 24576,  74250, 24000 },
> - { 14850, 24576, 148500, 24000 },
> - { 0, 0,  0, 0 },
> -};
> -
>  static const struct tmds_config tegra20_tmds_config[] = {
>   { /* slow pixel clock modes */
>   .pclk = 2700,
> @@ -411,52 +355,49 @@ static const struct tmds_config tegra124_tmds_config[] 
> = {
>   },
>  };
>  
> -static const struct tegra_hdmi_audio_config *
> -tegra_hdmi_get_audio_config(unsigned int sample_rate, unsigned int pclk)
> +static int
> +tegra_hdmi_get_audio_config(unsigned int audio_freq, unsigned int pix_clock,
> + struct tegra_hdmi_audio_config *config)
>  {
> - const struct tegra_hdmi_audio_config *table;
> -
> - switch (sample_rate) {
> - case 32000:
> - table = tegra_hdmi_audio_32k;
> - break;
> -
> - case 44100:
> - table = tegra_hdmi_audio_44_1k;
> - break;
> -
> - case 48000:
> - table = tegra_hdmi_audio_48k;
> - break;
> -
> - case 88200:
> - table = tegra_hdmi_audio_88_2k;
> - break;
> -
> - case 96000:
> - table = tegra_hdmi_audio_96k;
> - break;
> -
> - case 176400:
> - table = tegra_hdmi_audio_176_4k;
> - break;
> -
> - case 192000:
> - table = tegra_hdmi_audio_192k;
> - break;
> -
> - default:
> 

Re: [PATCH] scsi/mvsas/mv_init.c: Use dma_zalloc_coherent

2019-01-04 Thread John Garry

On 04/01/2019 12:48, Sabyasachi Gupta wrote:
> On Wed, Dec 19, 2018 at 6:49 PM Sabyasachi Gupta
>  wrote:
>>
>> On Sat, Dec 1, 2018 at 6:40 PM Sabyasachi Gupta
>>  wrote:
>>>
>>> On Wed, Nov 21, 2018 at 7:18 PM Sabyasachi Gupta
>>>  wrote:
>>>>
>>>> Replace dma_alloc_coherent + memset with dma_zalloc_coherent
>>>>

If you're going to make this change, then how about change these to the
managed version, so that we can avoid the explicit free'ing at driver
removal?

>>>> Signed-off-by: Sabyasachi Gupta 
>>>
>>> Any comment on this patch?
>>
>> Any comment on this patch?
>
> Any comment on this patch?






>>>> ---
>>>>  drivers/scsi/mvsas/mv_init.c | 12 
>>>>  1 file changed, 4 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/drivers/scsi/mvsas/mv_init.c b/drivers/scsi/mvsas/mv_init.c
>>>> index 3ac3437..495bddb 100644
>>>> --- a/drivers/scsi/mvsas/mv_init.c
>>>> +++ b/drivers/scsi/mvsas/mv_init.c
>>>> @@ -253,33 +253,29 @@ static int mvs_alloc(struct mvs_info *mvi, struct Scsi_Host *shost)
>>>> 	/*
>>>> 	 * alloc and init our DMA areas
>>>> 	 */
>>>> -	mvi->tx = dma_alloc_coherent(mvi->dev,
>>>> +	mvi->tx = dma_zalloc_coherent(mvi->dev,
>>>> 				     sizeof(*mvi->tx) * MVS_CHIP_SLOT_SZ,
>>>> 				     &mvi->tx_dma, GFP_KERNEL);


I'm guessing that this does not pass checkpatch with --strict option.

Thanks,
John


>>>> 	if (!mvi->tx)
>>>> 		goto err_out;
>>>> -	memset(mvi->tx, 0, sizeof(*mvi->tx) * MVS_CHIP_SLOT_SZ);
>>>> -	mvi->rx_fis = dma_alloc_coherent(mvi->dev, MVS_RX_FISL_SZ,
>>>> +	mvi->rx_fis = dma_zalloc_coherent(mvi->dev, MVS_RX_FISL_SZ,
>>>> 					 &mvi->rx_fis_dma, GFP_KERNEL);
>>>> 	if (!mvi->rx_fis)
>>>> 		goto err_out;
>>>> -	memset(mvi->rx_fis, 0, MVS_RX_FISL_SZ);
>>>>
>>>> -	mvi->rx = dma_alloc_coherent(mvi->dev,
>>>> +	mvi->rx = dma_zalloc_coherent(mvi->dev,
>>>> 				     sizeof(*mvi->rx) * (MVS_RX_RING_SZ + 1),
>>>> 				     &mvi->rx_dma, GFP_KERNEL);
>>>> 	if (!mvi->rx)
>>>> 		goto err_out;
>>>> -	memset(mvi->rx, 0, sizeof(*mvi->rx) * (MVS_RX_RING_SZ + 1));
>>>> 	mvi->rx[0] = cpu_to_le32(0xfff);
>>>> 	mvi->rx_cons = 0xfff;
>>>>
>>>> -	mvi->slot = dma_alloc_coherent(mvi->dev,
>>>> +	mvi->slot = dma_zalloc_coherent(mvi->dev,
>>>> 				       sizeof(*mvi->slot) * slot_nr,
>>>> 				       &mvi->slot_dma, GFP_KERNEL);
>>>> 	if (!mvi->slot)
>>>> 		goto err_out;
>>>> -	memset(mvi->slot, 0, sizeof(*mvi->slot) * slot_nr);
>>>>
>>>> 	mvi->bulk_buffer = dma_alloc_coherent(mvi->dev,
>>>> 					      TRASH_BUCKET_SIZE,
>>>> --
>>>> 2.7.4



.






[PATCH v2 2/2] arm64/sve: Disentangle <uapi/asm/sigcontext.h> from <uapi/asm/ptrace.h>

2019-01-04 Thread Dave Martin
Currently, <uapi/asm/sigcontext.h> provides common definitions for
describing SVE context structures that are also used by the ptrace
definitions in <uapi/asm/ptrace.h>.

For this reason, a #include of <asm/sigcontext.h> was added in
ptrace.h, but it turns out that this can interact badly with
userspace code that tries to include ptrace.h on top of the libc
headers (which may provide their own shadow definitions for
sigcontext.h).

To make the headers easier for userspace to consume, this patch
bounces the common definitions into an __SVE_* namespace and moves
them to a backend header <uapi/asm/sve_context.h> that can be
included by the other headers as appropriate.  This should allow
ptrace.h to be used alongside libc's sigcontext.h (if any) without
ill effects.

This should make the situation unambiguous: <asm/sigcontext.h> is
the header to include for the sigframe-specific definitions, while
<asm/ptrace.h> is the header to include for ptrace-specific
definitions.

To avoid conflicting with existing usage, <asm/sigcontext.h>
remains the canonical way to get the common definitions for
SVE_VQ_MIN, sve_vq_from_vl() etc., both in userspace and in the
kernel: relying on these being defined as a side effect of
including just <asm/ptrace.h> was never intended to be safe.

Signed-off-by: Dave Martin 

---

Changes since v1:

 * Remove check for inappropriate inclusion of sve_context.h.

   A comment is added instead to document this usage restriction.

 * Include <linux/types.h> in sve_context.h for __u32 (thanks, kbuild
   test robot)

   Annoyingly, the __u32 cast is not really needed in expected use, but
   it is possible that userspace is accidentally relying on it.
---
 arch/arm64/include/uapi/asm/ptrace.h  | 39 ++---
 arch/arm64/include/uapi/asm/sigcontext.h  | 56 +++
 arch/arm64/include/uapi/asm/sve_context.h | 53 +
 3 files changed, 99 insertions(+), 49 deletions(-)
 create mode 100644 arch/arm64/include/uapi/asm/sve_context.h

diff --git a/arch/arm64/include/uapi/asm/ptrace.h 
b/arch/arm64/include/uapi/asm/ptrace.h
index 65ef8b0..cff79c5 100644
--- a/arch/arm64/include/uapi/asm/ptrace.h
+++ b/arch/arm64/include/uapi/asm/ptrace.h
@@ -23,7 +23,7 @@
>  #include <linux/types.h>
>  
>  #include <asm/hwcap.h>
> -#include <asm/sigcontext.h>
> +#include <asm/sve_context.h>
 
 
 /*
@@ -130,9 +130,9 @@ struct user_sve_header {
  */
 
 /* Offset from the start of struct user_sve_header to the register data */
-#define SVE_PT_REGS_OFFSET \
-   ((sizeof(struct user_sve_header) + (SVE_VQ_BYTES - 1))  \
-   / SVE_VQ_BYTES * SVE_VQ_BYTES)
+#define SVE_PT_REGS_OFFSET \
+   ((sizeof(struct user_sve_header) + (__SVE_VQ_BYTES - 1))\
+   / __SVE_VQ_BYTES * __SVE_VQ_BYTES)
 
 /*
  * The register data content and layout depends on the value of the
@@ -178,39 +178,36 @@ struct user_sve_header {
  * Additional data might be appended in the future.
  */
 
-#define SVE_PT_SVE_ZREG_SIZE(vq)   SVE_SIG_ZREG_SIZE(vq)
-#define SVE_PT_SVE_PREG_SIZE(vq)   SVE_SIG_PREG_SIZE(vq)
-#define SVE_PT_SVE_FFR_SIZE(vq)SVE_SIG_FFR_SIZE(vq)
+#define SVE_PT_SVE_ZREG_SIZE(vq)   __SVE_ZREG_SIZE(vq)
+#define SVE_PT_SVE_PREG_SIZE(vq)   __SVE_PREG_SIZE(vq)
+#define SVE_PT_SVE_FFR_SIZE(vq)__SVE_FFR_SIZE(vq)
 #define SVE_PT_SVE_FPSR_SIZE   sizeof(__u32)
 #define SVE_PT_SVE_FPCR_SIZE   sizeof(__u32)
 
-#define __SVE_SIG_TO_PT(offset) \
-   ((offset) - SVE_SIG_REGS_OFFSET + SVE_PT_REGS_OFFSET)
-
 #define SVE_PT_SVE_OFFSET  SVE_PT_REGS_OFFSET
 
 #define SVE_PT_SVE_ZREGS_OFFSET \
-   __SVE_SIG_TO_PT(SVE_SIG_ZREGS_OFFSET)
+   (SVE_PT_REGS_OFFSET + __SVE_ZREGS_OFFSET)
 #define SVE_PT_SVE_ZREG_OFFSET(vq, n) \
-   __SVE_SIG_TO_PT(SVE_SIG_ZREG_OFFSET(vq, n))
+   (SVE_PT_REGS_OFFSET + __SVE_ZREG_OFFSET(vq, n))
 #define SVE_PT_SVE_ZREGS_SIZE(vq) \
-   (SVE_PT_SVE_ZREG_OFFSET(vq, SVE_NUM_ZREGS) - SVE_PT_SVE_ZREGS_OFFSET)
+   (SVE_PT_SVE_ZREG_OFFSET(vq, __SVE_NUM_ZREGS) - SVE_PT_SVE_ZREGS_OFFSET)
 
 #define SVE_PT_SVE_PREGS_OFFSET(vq) \
-   __SVE_SIG_TO_PT(SVE_SIG_PREGS_OFFSET(vq))
+   (SVE_PT_REGS_OFFSET + __SVE_PREGS_OFFSET(vq))
 #define SVE_PT_SVE_PREG_OFFSET(vq, n) \
-   __SVE_SIG_TO_PT(SVE_SIG_PREG_OFFSET(vq, n))
+   (SVE_PT_REGS_OFFSET + __SVE_PREG_OFFSET(vq, n))
 #define SVE_PT_SVE_PREGS_SIZE(vq) \
-   (SVE_PT_SVE_PREG_OFFSET(vq, SVE_NUM_PREGS) - \
+   (SVE_PT_SVE_PREG_OFFSET(vq, __SVE_NUM_PREGS) - \
SVE_PT_SVE_PREGS_OFFSET(vq))
 
 #define SVE_PT_SVE_FFR_OFFSET(vq) \
-   __SVE_SIG_TO_PT(SVE_SIG_FFR_OFFSET(vq))
+   (SVE_PT_REGS_OFFSET + __SVE_FFR_OFFSET(vq))
 
 #define SVE_PT_SVE_FPSR_OFFSET(vq) \
((SVE_PT_SVE_FFR_OFFSET(vq) + SVE_PT_SVE_FFR_SIZE(vq) + \
-   (SVE_VQ_BYTES - 1)) \
-   / SVE_VQ_BYTES * SVE_VQ_BYTES)
+   (__SVE_VQ_BYTES - 1))   \
+   / __SVE_VQ_BYTES * __SVE_VQ_BYTES)
 #define 

[PATCH v2 1/2] arm64/sve: ptrace: Fix SVE_PT_REGS_OFFSET definition

2019-01-04 Thread Dave Martin
SVE_PT_REGS_OFFSET is supposed to indicate the offset for skipping
over the ptrace NT_ARM_SVE header (struct user_sve_header) to the
start of the SVE register data proper.

However, currently SVE_PT_REGS_OFFSET is defined in terms of struct
sve_context, which is wrong: that structure describes the SVE
header in the signal frame, not in the ptrace regset.

This patch fixes the definition to use the ptrace header structure
struct user_sve_header instead.

By good fortune, the two structures are the same size anyway, so
there is no functional or ABI change.

Signed-off-by: Dave Martin 
---
 arch/arm64/include/uapi/asm/ptrace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/include/uapi/asm/ptrace.h 
b/arch/arm64/include/uapi/asm/ptrace.h
index a36227f..65ef8b0 100644
--- a/arch/arm64/include/uapi/asm/ptrace.h
+++ b/arch/arm64/include/uapi/asm/ptrace.h
@@ -131,7 +131,7 @@ struct user_sve_header {
 
 /* Offset from the start of struct user_sve_header to the register data */
 #define SVE_PT_REGS_OFFSET \
-   ((sizeof(struct sve_context) + (SVE_VQ_BYTES - 1))  \
+   ((sizeof(struct user_sve_header) + (SVE_VQ_BYTES - 1))  \
/ SVE_VQ_BYTES * SVE_VQ_BYTES)
 
 /*
-- 
2.1.4



[PATCH v2 0/2] arm64/sve: UAPI: Disentangle ptrace.h from sigcontext.h

2019-01-04 Thread Dave Martin
This patch refactors the UAPI header definitions for the Arm SVE
extension to avoid multiple-definition problems when userspace mixes its
own sigcontext.h definitions with the kernel's ptrace.h (which is
apparently routine).

A common backend header is created to hold common definitions, suitably
namespaced.

The extra header guard in sve_context.h was considered redundant and is
now gone, so patch 1 (kbuild: install_headers.sh: Strip _UAPI from
#if-defined() guards) from the previous version of the series has been
dropped.

Dave Martin (2):
  arm64/sve: ptrace: Fix SVE_PT_REGS_OFFSET definition
   arm64/sve: Disentangle <uapi/asm/sigcontext.h> from <uapi/asm/ptrace.h>


 arch/arm64/include/uapi/asm/ptrace.h  | 39 ++---
 arch/arm64/include/uapi/asm/sigcontext.h  | 56 +++
 arch/arm64/include/uapi/asm/sve_context.h | 53 +
 3 files changed, 99 insertions(+), 49 deletions(-)
 create mode 100644 arch/arm64/include/uapi/asm/sve_context.h

-- 
2.1.4



Re: [PATCH v3] mm/page_owner: fix for deferred struct page init

2019-01-04 Thread Michal Hocko
On Thu 03-01-19 17:22:29, Qian Cai wrote:
> On 1/3/19 3:22 PM, Michal Hocko wrote:
> > On Thu 03-01-19 14:53:47, Qian Cai wrote:
> >> On 1/3/19 2:07 PM, Michal Hocko wrote:
> >>> So can we make the revert with an
> >>> explanation that the patch was wrong?
> >>> If we want to make hacks to catch more objects to be tracked then it
> >>> would be great to have some numbers in hands.
> >>
> >> Well, those numbers are subject to change depending on future start_kernel()
> >> order. Right now, there are many functions that could be caught earlier by page
> >> owner.
> >>
> >>kmemleak_init();
> > [...]
> >>sched_init_smp();
> > 
> > The kernel source dump will not tell us much of course. A ballpark
> > number for whether we are talking about dozens, hundreds or thousands of
> > allocations would tell us something at least, wouldn't it?
> > 
> > Handwaving that it might help us some is not particularly useful. We are
> > already losing some allocations already. Does it matter? Well, that
> > depends, sometimes we do want to catch an owner of particular page and
> > it is sad to find nothing. But how many times have you or somebody else
> > encountered that in practice. That is exactly a useful information to
> > judge an ugly ifdefery in the code. See my point?
> 
> Here is the number without DEFERRED_STRUCT_PAGE_INIT.
> 
> == page_ext_init() after page_alloc_init_late() ==
> Node 0, zone DMA: page owner found early allocated 0 pages
> Node 0, zone DMA32: page owner found early allocated 7009 pages
> Node 0, zone Normal: page owner found early allocated 85827 pages
> Node 4, zone Normal: page owner found early allocated 75063 pages
> 
> == page_ext_init() before kmemleak_init() ==
> Node 0, zone DMA: page owner found early allocated 0 pages
> Node 0, zone DMA32: page owner found early allocated 6654 pages
> Node 0, zone Normal: page owner found early allocated 41907 pages
> Node 4, zone Normal: page owner found early allocated 41356 pages
> 
> So, it told us that it will miss tens of thousands of early page allocation 
> call
> sites.

This is an answer for the first part of the question (how much). The
second is _do_we_care_?

-- 
Michal Hocko
SUSE Labs


(Audio/Video): The Truth about Linux GPLv2 and license recission (revocation).

2019-01-04 Thread vsnsdualce
Information regarding the rights of the linux programmers, regarding 
rescission (revocation) of their granted license to use their code.


Video:
http://www.liveleak.com/view?t=9O5vz_1546606404

( Audio Only: )
(Part1: http://www.liveleak.com/view?t=s3Sr9_1546605652  )
(Part2: http://www.liveleak.com/view?t=aOkfS_1546605889  )



Re: Column for keywords?

2019-01-04 Thread Dmitry Vyukov
On Wed, May 16, 2018 at 6:50 PM Tetsuo Handa
 wrote:
>
> Dmitry Vyukov wrote:
> > On Tue, May 15, 2018 at 10:59 PM, Tetsuo Handa
> >  wrote:
> > > Dmitry Vyukov wrote:
> > >> I've filed https://github.com/google/syzkaller/issues/608 to not lose
> > >> track of this.
> > >
> > > Thanks. Since the time lag between a patch was proposed and that patch is
> > > applied to a git tree tends to become long, duplicated works like
> > > https://www.spinics.net/lists/linux-fsdevel/msg125240.html and
> > > http://lkml.kernel.org/r/964a8b27-cd69-357c-fe78-76b066056...@i-love.sakura.ne.jp
> > > are already occurring.
> >
> > This is bad.
> >
> > > Therefore, it is important that the state of the bug (e.g.
> > > bisected, cause identified, patch proposed) is visible from the table.
> >
> > What do you think about the last section of:
> > https://groups.google.com/d/msg/syzkaller-bugs/nw7BIW9V2wk/NE0P_Au4AQAJ
> > there is already a way to say "there is a pending fix for this".
>
> That lacks a way to annotate "there is a pending fix for this, but the fix
> is not yet applied to any git tree". I mean not only "git trees which syzbot
> is checking" but also "git trees which are publicly visible".
>
> (Also, if we can later correct the patch using "#syz fix:" in case the patch
> title was renamed, it is not clear how to specify multiple patches using
> "#syz fix:" when a patch which meant to fix the reported problem contained
> a regression or was incomplete and thus fixup patch followed shortly. An
> example is commit 5f3e3b85cc0a5eae and commit ef95a90ae6f4f219 in
> "WARNING: kmalloc bug in memdup_user (2)". I've tried
>
>   #syz fix: RDMA/ucma: Correct option size check using optlen
>   #syz fix: RDMA/ucma: ucma_context reference leak in error path
>
> but only the former patch was recorded.)
>
> >
> > But one problem with manual tagging is how to make everybody update
> > these tags. If only few people do it, it can still lead to duplicate
> > work. And it's not syzbot-specific. Can happen with just any bug
> > report on kernel mailing lists. Traditionally it's solved with bug
> > tracking systems and assigning bugs when a developer starts working on
> > it. But kernel does not have a working bug tracker.
> >
> > One simple thing we can do is make syzbot poll more trees to discover
> > Reported-by tags faster. This will automatically update status on
> > dashboard to "fix pending". I've filed
> > https://github.com/google/syzkaller/issues/610 for this. Ideally, we
> > would intercept all mailed patches, but it's hard with kernel
> > development process because there is no system that tracks all pending
> > patches.
> >
>
> The problem is that the pending fix won't be applied to any git tree.
> It depends on when reviewers and maintainers can find time for
> reviewing/committing the fix. Scanning all git trees unlikely helps.
>
>   the criteria is that you are "reasonably sure that the commit will
>   reach upstream under this title", for whatever reason
>
> won't apply to not yet reviewed patches. What I want is a way to specify
> "a patch was proposed but the patch is not yet reviewed/tested/applied".
>
> Generally, progress is not recorded frequently enough to avoid duplicated
> work. I want to check not only the "fix pending" stage but also e.g. "problem
> guessed", "bisected", "cause identified", "patch proposed", "patch reviewed"
> stages from the top page's table.

1. This sounds very much like general bug tracking system. We
specifically didn't want to go down the slippery slope of implementing
yet another bug tracking system.
2. This problem is not specific to syzbot in any way (just like lost
bug reports). Kernel developers waste time on duplicate work for other
bug reports too.
So I think (1) we need a bug tracking system, (2) use that system for
syzbot to solve this local problem.


direct.c:undefined reference to `arch_dma_alloc'

2019-01-04 Thread kbuild test robot
Hi Christoph,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   645ff1e8e704c4f33ab1fcd3c87f95cb9b6d7144
commit: 68c608345cc569bcfa1c1b2add4c00c343ecf933 swiotlb: remove dma_mark_clean
date:   3 weeks ago
config: ia64-zx1_defconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 8.1.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout 68c608345cc569bcfa1c1b2add4c00c343ecf933
# save the attached .config to linux build tree
GCC_VERSION=8.1.0 make.cross ARCH=ia64 

All errors (new ones prefixed by >>):

   kernel/dma/mapping.o: In function `dma_common_get_sgtable':
   mapping.c:(.text+0x10d2): undefined reference to `arch_dma_coherent_to_pfn'
   kernel/dma/mapping.o: In function `dma_common_mmap':
   mapping.c:(.text+0x1452): undefined reference to `arch_dma_coherent_to_pfn'
   kernel/dma/direct.o: In function `dma_direct_alloc':
>> direct.c:(.text+0x1162): undefined reference to `arch_dma_alloc'
   kernel/dma/direct.o: In function `dma_direct_free':
>> direct.c:(.text+0x1352): undefined reference to `arch_dma_free'

---
0-DAY kernel test infrastructure            Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


[GIT PULL] Please pull powerpc/linux.git powerpc-4.21-2 tag

2019-01-04 Thread Michael Ellerman
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi Linus,

Please pull some powerpc fixes for 4.21:

The following changes since commit 96d4f267e40f9509e8a66e2b39e8b95655617693:

  Remove 'type' argument from access_ok() function (2019-01-03 18:57:57 -0800)

are available in the git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
tags/powerpc-4.21-2

for you to fetch changes up to 074400a7be61250d9f0ccec07d5c35ffec4d8d22:

  powerpc: Drop use of 'type' from access_ok() (2019-01-04 23:07:59 +1100)

- --
powerpc fixes for 4.21 #2

A fix for the recent access_ok() change, which broke the build. We recently
added a use of type in order to squash a warning elsewhere about type being
unused.

A handful of other minor build fixes, and one defconfig update.

Thanks to:
  Christian Lamparter, Christophe Leroy, Diana Craciun, Mathieu Malaterre.

- --
Christian Lamparter (1):
  powerpc/4xx/ocm: Fix compilation error due to PAGE_KERNEL usage

Diana Craciun (1):
  powerpc/fsl: Fixed warning: orphan section `__btb_flush_fixup'

Mathieu Malaterre (1):
  powerpc: Drop use of 'type' from access_ok()

Michael Ellerman (4):
  powerpc/4xx/ocm: Fix phys_addr_t printf warnings
  powerpc/configs: Add PPC4xx_OCM to ppc40x_defconfig
  KVM: PPC: Book3S HV: radix: Fix uninitialized var build error
  Merge branch 'master' into fixes


 arch/powerpc/configs/ppc40x_defconfig  |  1 +
 arch/powerpc/include/asm/uaccess.h |  2 +-
 arch/powerpc/kernel/head_booke.h   | 18 --
 arch/powerpc/kvm/book3s_64_mmu_radix.c |  2 +-
 arch/powerpc/platforms/4xx/ocm.c   | 10 +-
 5 files changed, 20 insertions(+), 13 deletions(-)
-BEGIN PGP SIGNATURE-

iQIcBAEBAgAGBQJcL1eOAAoJEFHr6jzI4aWA86wP/3V5zlUNtsyXpcCx60PNIMya
6z6oWVKoM92nMIyfWuoxKBEbylSyixFSxOsKoiVJDyigW5WWNjsPQ1KOUD2OVn0V
OQhj2aDXwUtxdlQ5dK0H9lodsUKUEw1/saqcgi2OS8iDDa8kJIEemDe29Omf5DyE
U2RKl++nhAfK8BLfMRCAIqL+TVZq9Zy+zNwbVtsmj7lAlw+Csb4KV3YvlJ7b6pjD
Ntm6O3X8sy7GtJdMpeBDOMTbWuvIQpefVH135jS1IjJZbrafZK6a4gQ9cs5mx68d
+/nyzkcjy83eRGkv3QY5Y+vEa0wkFdBb6Nf0MH63xCPVYVjom6pRUGL+RPldi2GO
bmieDXavlicX9y+BCQSKASTzNo2uR2heF31aCnr6bsOSeT8ZNGbIwgHuGGqrzN+8
qLQb00hA0uKlbNgPpNoyYO5K3khntR348HfH0Uwlm3Nttvn1TEh87c9L19FcdlMw
SlA6kpJ6328FkEv5FUPu7v7764dPgOQNImPKnsYdhbUxhHnDMx33LzsUEQQcYWpM
3nsKQMW1yqMH2BG8u9LQtZErFanX+RNm81OsP38F8nvGe1RJkoq7R7MSooZ9tzs2
Ymf8W+KNofKmXVAPh7AyKQzpy0a/sEqAXmtTPBf7bsdwU5PQB0+LMf0yzyZpEp2U
3eFcseqDoO32ScDZQUYt
=Y+yv
-END PGP SIGNATURE-


Re: Remove 'type' argument from access_ok() function

2019-01-04 Thread Michael Ellerman
On Fri, 2019-01-04 at 09:27:58 UTC, Mathieu Malaterre wrote:
> In commit 05a4ab823983 ("powerpc/uaccess: fix warning/error with
> access_ok()") an attempt was made to remove a warning by referencing the
> variable `type`, however in commit 96d4f267e40f ("Remove 'type' argument
> from access_ok() function") the variable `type` has been removed.
> 
> Revert commit 05a4ab823983 ("powerpc/uaccess: fix warning/error with
> access_ok()") to fix the following compilation error:
> 
>   arch/powerpc/include/asm/uaccess.h:66:32: error: ‘type’ 
> undeclared (first use in this function)
> 
> Signed-off-by: Mathieu Malaterre 

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/074400a7be61250d9f0ccec07d5c35

cheers


[REGRESSION, BISECTED] pci: nvme device with HMB fails on arm64

2019-01-04 Thread Liviu Dudau
Hello Christoph,

As I have mentioned to you after Xmas, I think your dma-mapping series breaks
NVMe drivers that use HMB on arm64 (RK3399 NanoPC T4 board with Toshiba RC100
SSD in this case). The observed behaviour is that the modprobe of the nvme
module will hang due to a kernel crash, which I've sent a patch for to the mm
subsystem ("mm/vmalloc.c: don't dereference possible NULL pointer in __vunmap"),
but that behaviour is triggered by the failure of the NVMe drive to access host
memory buffers (error 24578, flags 0x1).

Now that your series has been merged into Linus' tree, I've bisected it to this:

bfd56cd605219d90b210a5377fca31a644efe95c is the first bad commit
commit bfd56cd605219d90b210a5377fca31a644efe95c
Author: Christoph Hellwig 
Date:   Sun Nov 4 17:38:39 2018 +0100

dma-mapping: support highmem in the generic remap allocator

By using __dma_direct_alloc_pages we can deal entirely with struct page
instead of having to derive a kernel virtual address.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Robin Murphy 

:04 04 565ae62f55f04c11da2471bd59d1b0328273992d 
40047c9ecf715f6f7e8293b335c1f16dd511a0e0 M  kernel

Note that the bisect was done on the mainline tree without any additional
patches, as applying those makes the bisect process think that the culprit
is the merging of the tracing changes.

I haven't tried to revert the patch as it is clear that the following patch
depends on it, and also because during the bisect process one of the steps
generated a kernel that failed to boot as it was missing your patch
9ab91e7c5c51 ("arm64: default to the direct mapping in get_arch_dma_ops").
I've marked that step as bad, as it was related to the series, but I might
have been wrong there.

The full bisect log is this:

$ git bisect log
git bisect start
# good: [00c569b567c7f1f0da6162868fd02a9f29411805] Merge tag 'locks-v4.21-1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
git bisect good 00c569b567c7f1f0da6162868fd02a9f29411805
# bad: [645ff1e8e704c4f33ab1fcd3c87f95cb9b6d7144] Merge branch 'for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
git bisect bad 645ff1e8e704c4f33ab1fcd3c87f95cb9b6d7144
# bad: [02061181d3a9ccfe15ef6bc15fa56283acc47620] Merge tag 'staging-4.21-rc1' 
of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect bad 02061181d3a9ccfe15ef6bc15fa56283acc47620
# bad: [f346b0becb1bc62e45495f9cdbae3eef35d0b635] Merge branch 'akpm' (patches 
from Andrew)
git bisect bad f346b0becb1bc62e45495f9cdbae3eef35d0b635
# bad: [938edb8a31b976c9a92eb0cd4ff481e93f76c1f1] Merge tag 'scsi-misc' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
git bisect bad 938edb8a31b976c9a92eb0cd4ff481e93f76c1f1
# good: [00203ba40d40d7f33857416adfb18adaf0e40123] kyber: use sbitmap 
add_wait_queue/list_del wait helpers
git bisect good 00203ba40d40d7f33857416adfb18adaf0e40123
# good: [735bcc77e6ba83e464665cea9041072190ede37e] scsi: hisi_sas: Fix warnings 
detected by sparse
git bisect good 735bcc77e6ba83e464665cea9041072190ede37e
# bad: [af7ddd8a627c62a835524b3f5b471edbbbcce025] Merge tag 'dma-mapping-4.21' 
of git://git.infradead.org/users/hch/dma-mapping
git bisect bad af7ddd8a627c62a835524b3f5b471edbbbcce025
# bad: [8ddbe5943c0b1259b5ddb6dc1729863433fc256c] dma-mapping: move 
dma_cache_sync out of line
git bisect bad 8ddbe5943c0b1259b5ddb6dc1729863433fc256c
# bad: [887712a0a5b31e0cf28087f6445de431b4722e52] x86/calgary: remove the 
mapping_error dma_map_ops method
git bisect bad 887712a0a5b31e0cf28087f6445de431b4722e52
# bad: [b0cbeae4944924640bf550b75487729a20204c14] dma-direct: remove the 
mapping_error dma_map_ops method
git bisect bad b0cbeae4944924640bf550b75487729a20204c14
# bad: [bfd56cd605219d90b210a5377fca31a644efe95c] dma-mapping: support highmem 
in the generic remap allocator
git bisect bad bfd56cd605219d90b210a5377fca31a644efe95c
# good: [704f2c20eaa566f6906e8812b6e2115889bd753d] dma-direct: reject highmem 
pages from dma_alloc_from_contiguous
git bisect good 704f2c20eaa566f6906e8812b6e2115889bd753d
# good: [0c3b3171ceccb8830c2bb5adff1b4e9b204c1450] dma-mapping: move the arm64 
noncoherent alloc/free support to common code
git bisect good 0c3b3171ceccb8830c2bb5adff1b4e9b204c1450
# first bad commit: [bfd56cd605219d90b210a5377fca31a644efe95c] dma-mapping: 
support highmem in the generic remap allocator

Does anyone have any suggestions on what I might try as a fix?


Best regards,
Liviu



Re: [PATCH] perf stat: Poll for monitored tasks being alive in fork mode

2019-01-04 Thread Jiri Olsa
On Fri, Jan 04, 2019 at 10:28:17AM +0800, Jin Yao wrote:
> Following test shows the stat keeps running even if no longer
> task to monitor (mgen exits at ~5s).
> 
> perf stat -e cycles -p `pgrep mgen` -I1000 -- sleep 10
>            time          counts unit events
>     1.000148916   1,308,365,864      cycles
>     2.000379171   1,297,269,875      cycles
>     3.000556719   1,297,187,078      cycles
>     4.000914241     761,261,827      cycles
>     5.001306091   <not counted>      cycles
>     6.001676881   <not counted>      cycles
>     7.002046336   <not counted>      cycles
>     8.002405651   <not counted>      cycles
>     9.002766625   <not counted>      cycles
>    10.001395827   <not counted>      cycles
> 
> We'd better finish stat immediately if there's no longer task to
> monitor.
> 
> After:
> 
> perf stat -e cycles -p `pgrep mgen` -I1000 -- sleep 10
> time counts unit events
>  1.000180062  1,236,592,661  cycles
>  2.000421539  1,223,733,572  cycles
>  3.000609910  1,297,047,663  cycles
>  4.000807545  1,297,215,816  cycles
>  5.001001578  1,297,208,032  cycles
>     6.001390345     582,343,659      cycles
> sleep: Terminated
> 
> Now the stat exits immediately when the monitored tasks ends.
> 
> Signed-off-by: Jin Yao 
> ---
>  tools/perf/builtin-stat.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 63a3afc..71f3bc8 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -553,6 +553,13 @@ static int __run_perf_stat(int argc, const char **argv, 
> int run_idx)
>  
>   if (interval || timeout) {
>   while (!waitpid(child_pid, &status, WNOHANG)) {
> + if (!is_target_alive(&target,
> + evsel_list->threads) &&
> + (child_pid != -1)) {

do we need that child_pid check? we just returned from waitpid
so we should be ok.. we just make the race window smaller

could we just do:

if (!is_target_alive(&target, 
evsel_list->threads)) {
kill(child_pid, SIGTERM);
break;
}

also I'm not sure we should do this only under new option,
as it might break people's scripts.. thoughts?

jirka

> + kill(child_pid, SIGTERM);
> + break;
> + }
> +
>   nanosleep(&ts, NULL);
>   if (timeout)
>   break;
> -- 
> 2.7.4
> 


[PATCH 25/25] mm, compaction: Do not direct compact remote memory

2019-01-04 Thread Mel Gorman
Remote compaction is expensive and possibly counter-productive. Locality
is expected to often have better performance characteristics than remote
high-order pages. For small allocations, it's expected that locality is
generally required or fallbacks are possible. For larger allocations such
as THP, they are forbidden at the time of writing but if __GFP_THISNODE
is ever removed, then it would still be preferable to fallback to small
local base pages over remote THP in the general case. kcompactd is still
woken via kswapd so compaction happens eventually.

While this patch potentially has both positive and negative effects,
it is best to avoid the possibility of remote compaction given the cost
relative to any potential benefit.
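The filter the patch adds boils down to a node comparison. As a stand-alone
sketch (the zone struct here is a toy stand-in, not the kernel's):

```c
#include <assert.h>

/* Toy stand-in for the kernel's zone: only the node id matters here. */
struct zone { int nid; };

static int zone_to_nid(const struct zone *z)
{
	return z->nid;
}

/* Returns 1 when the candidate zone should be skipped as remote memory. */
static int skip_remote_zone(const struct zone *candidate,
			    const struct zone *preferred)
{
	return zone_to_nid(candidate) != zone_to_nid(preferred);
}
```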

Signed-off-by: Mel Gorman 
---
 mm/compaction.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
index ae70be023b21..cc17f0c01811 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2348,6 +2348,16 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, 
unsigned int order,
continue;
}
 
+   /*
+* Do not compact remote memory. It's expensive and high-order
+* small allocations are expected to prefer or require local
+* memory. Similarly, larger requests such as THP can fallback
+* to base pages in preference to remote huge pages if
+* __GFP_THISNODE is not specified
+*/
+   if (zone_to_nid(zone) != 
zone_to_nid(ac->preferred_zoneref->zone))
+   continue;
+
status = compact_zone_order(zone, order, gfp_mask, prio,
alloc_flags, ac_classzone_idx(ac), capture);
rc = max(status, rc);
-- 
2.16.4



[PATCH 21/25] mm, compaction: Round-robin the order while searching the free lists for a target

2019-01-04 Thread Mel Gorman
As compaction proceeds and creates high-order blocks, the free list
search gets less efficient as the larger blocks are used as compaction
targets. Eventually, the larger blocks will be behind the migration
scanner for partially migrated pageblocks and the search fails. This
patch round-robins what orders are searched so that larger blocks can be
ignored and find smaller blocks that can be used as migration targets.

The overall impact was small on 1-socket but it avoids corner cases where
the migration/free scanners meet prematurely or situations where many of
the pageblocks encountered by the free scanner are almost full instead of
being properly packed. Previous testing had indicated that without this
patch there were occasional large spikes in the free scanner. By
coincidence, the 2-socket results showed a 54% reduction in the free
scanner, but that will not be universally true.
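The round-robin wrap logic can be exercised in isolation. A minimal
userspace sketch (the compact_control struct is reduced to the two fields
the patch uses):

```c
#include <assert.h>

/* Reduced stand-in for the kernel's compact_control. */
struct compact_control { int order; int search_order; };

/* Same logic as the patch: count orders down with wraparound, returning
 * -1 once the search wraps back to its starting order. */
static int next_search_order(struct compact_control *cc, int order)
{
	order--;
	if (order < 0)
		order = cc->order - 1;

	/* Search wrapped around? */
	if (order == cc->search_order) {
		cc->search_order--;
		if (cc->search_order < 0)
			cc->search_order = cc->order - 1;
		return -1;
	}
	return order;
}

/* Collect the orders visited for a given starting point. */
static int walk_orders(struct compact_control *cc, int *out)
{
	int n = 0;
	int order;

	for (order = cc->search_order; order >= 0;
	     order = next_search_order(cc, order))
		out[n++] = order;
	return n;
}
```

Starting at order 2 with cc->order == 4 visits 2, 1, 0, 3 and then stops,
leaving search_order pointing one lower for the next attempt.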

Signed-off-by: Mel Gorman 
---
 mm/compaction.c | 33 ++---
 mm/internal.h   |  3 ++-
 2 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 6c5552c6d8f9..652e249168b1 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1154,6 +1154,24 @@ fast_isolate_around(struct compact_control *cc, unsigned 
long pfn, unsigned long
set_pageblock_skip(page);
 }
 
+/* Search orders in round-robin fashion */
+static int next_search_order(struct compact_control *cc, int order)
+{
+   order--;
+   if (order < 0)
+   order = cc->order - 1;
+
+   /* Search wrapped around? */
+   if (order == cc->search_order) {
+   cc->search_order--;
+   if (cc->search_order < 0)
+   cc->search_order = cc->order - 1;
+   return -1;
+   }
+
+   return order;
+}
+
 static unsigned long
 fast_isolate_freepages(struct compact_control *cc)
 {
@@ -1186,9 +1204,15 @@ fast_isolate_freepages(struct compact_control *cc)
if (WARN_ON_ONCE(min_pfn > low_pfn))
low_pfn = min_pfn;
 
-   for (order = cc->order - 1;
-order >= 0 && !page;
-order--) {
+   /*
+* Search starts from the last successful isolation order or the next
+* order to search after a previous failure
+*/
+   cc->search_order = min_t(unsigned int, cc->order - 1, cc->search_order);
+
+   for (order = cc->search_order;
+!page && order >= 0;
+order = next_search_order(cc, order)) {
struct free_area *area = &cc->zone->free_area[order];
struct list_head *freelist;
struct page *freepage;
@@ -1211,6 +1235,7 @@ fast_isolate_freepages(struct compact_control *cc)
 
if (pfn >= low_pfn) {
cc->fast_search_fail = 0;
+   cc->search_order = order;
page = freepage;
break;
}
@@ -2146,6 +2171,7 @@ static enum compact_result compact_zone_order(struct zone 
*zone, int order,
.total_migrate_scanned = 0,
.total_free_scanned = 0,
.order = order,
+   .search_order = order,
.gfp_mask = gfp_mask,
.zone = zone,
.mode = (prio == COMPACT_PRIO_ASYNC) ?
@@ -2385,6 +2411,7 @@ static void kcompactd_do_work(pg_data_t *pgdat)
struct zone *zone;
struct compact_control cc = {
.order = pgdat->kcompactd_max_order,
+   .search_order = pgdat->kcompactd_max_order,
.total_migrate_scanned = 0,
.total_free_scanned = 0,
.classzone_idx = pgdat->kcompactd_classzone_idx,
diff --git a/mm/internal.h b/mm/internal.h
index e5ca2a10b8ad..d028abd8a8f3 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -191,7 +191,8 @@ struct compact_control {
struct zone *zone;
unsigned long total_migrate_scanned;
unsigned long total_free_scanned;
-   unsigned int fast_search_fail;  /* failures to use free list searches */
+   unsigned short fast_search_fail;/* failures to use free list searches */
+   unsigned short search_order;/* order to start a fast search at */
const gfp_t gfp_mask;   /* gfp mask of a direct compactor */
int order;  /* order a direct compactor needs */
int migratetype;/* migratetype of direct compactor */
-- 
2.16.4



[PATCH 24/25] mm, compaction: Capture a page under direct compaction

2019-01-04 Thread Mel Gorman
Compaction is inherently race-prone as a suitable page freed during
compaction can be allocated by any parallel task. This patch uses a
capture_control structure to isolate a page immediately when it is freed
by a direct compactor in the slow path of the page allocator. The intent
is to avoid redundant scanning.

                              4.20.0                 4.20.0
                     selective-v2r15          capture-v2r15
Amean fault-both-1      0.00 (   0.00%)       0.00 *   0.00%*
Amean fault-both-3   2624.85 (   0.00%)    2594.49 (   1.16%)
Amean fault-both-5   3842.66 (   0.00%)    4088.32 (  -6.39%)
Amean fault-both-7   5459.47 (   0.00%)    5936.54 (  -8.74%)
Amean fault-both-12  9276.60 (   0.00%)   10160.85 (  -9.53%)
Amean fault-both-18 14030.73 (   0.00%)   13908.92 (   0.87%)
Amean fault-both-24 13298.10 (   0.00%)   16819.86 * -26.48%*
Amean fault-both-30 17648.62 (   0.00%)   17901.74 (  -1.43%)
Amean fault-both-32 19161.67 (   0.00%)   18621.32 (   2.82%)

Latency is only moderately affected but the devil is in the details.
A closer examination indicates that base page fault latency is much
reduced but latency of huge pages is increased as it takes greater care
to succeed. Part of the "problem" is that allocation success rates
are close to 100% even when under pressure and compaction gets harder

                          4.20.0                 4.20.0
                 selective-v2r15          capture-v2r15
Percentage huge-1      0.00 (   0.00%)       0.00 (   0.00%)
Percentage huge-3     99.95 (   0.00%)      99.98 (   0.03%)
Percentage huge-5     98.83 (   0.00%)      98.01 (  -0.84%)
Percentage huge-7     96.78 (   0.00%)      98.30 (   1.58%)
Percentage huge-12    98.85 (   0.00%)      97.76 (  -1.10%)
Percentage huge-18    97.52 (   0.00%)      99.05 (   1.57%)
Percentage huge-24    97.07 (   0.00%)      99.34 (   2.35%)
Percentage huge-30    96.59 (   0.00%)      99.08 (   2.58%)
Percentage huge-32    95.94 (   0.00%)      99.03 (   3.22%)

And scan rates are reduced as expected by 10% for the migration
scanner and 37% for the free scanner indicating that there is
less redundant work.

Compaction migrate scanned    20338945.00    18133661.00
Compaction free scanned       12590377.00     7986174.00

The impact on 2-socket is much larger albeit not presented. Under
a different workload that fragments heavily, the allocation latency
is reduced by 26% while the success rate goes from 63% to 80%
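The capture idea itself is simple to model in userspace. This is a toy
model, not the kernel API: a direct compactor parks a capture_control where
the free path can see it, and a page freed at the matching order is handed
over directly instead of going back onto the free list where a parallel
task could steal it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures. */
struct page { int id; };
struct capture_control { struct page *page; int order; };

static struct capture_control *task_capc;	/* models current->capture_control */
static struct page *free_list[16];
static int free_count;

static void free_page_model(struct page *page, int order)
{
	if (task_capc && order == task_capc->order && !task_capc->page) {
		task_capc->page = page;	/* captured: skip the free list */
		return;
	}
	free_list[free_count++] = page;	/* normal freeing path */
}
```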

Signed-off-by: Mel Gorman 
---
 include/linux/compaction.h |  3 ++-
 include/linux/sched.h  |  4 
 kernel/sched/core.c|  3 +++
 mm/compaction.c| 31 +++--
 mm/internal.h  |  9 +++
 mm/page_alloc.c| 58 ++
 6 files changed, 96 insertions(+), 12 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 68250a57aace..b0d530cf46d1 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -95,7 +95,8 @@ extern int sysctl_compact_unevictable_allowed;
 extern int fragmentation_index(struct zone *zone, unsigned int order);
 extern enum compact_result try_to_compact_pages(gfp_t gfp_mask,
unsigned int order, unsigned int alloc_flags,
-   const struct alloc_context *ac, enum compact_priority prio);
+   const struct alloc_context *ac, enum compact_priority prio,
+   struct page **page);
 extern void reset_isolation_suitable(pg_data_t *pgdat);
 extern enum compact_result compaction_suitable(struct zone *zone, int order,
unsigned int alloc_flags, int classzone_idx);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 89541d248893..f5ac0cf9cc32 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -47,6 +47,7 @@ struct pid_namespace;
 struct pipe_inode_info;
 struct rcu_node;
 struct reclaim_state;
+struct capture_control;
 struct robust_list_head;
 struct sched_attr;
 struct sched_param;
@@ -964,6 +965,9 @@ struct task_struct {
 
struct io_context   *io_context;
 
+#ifdef CONFIG_COMPACTION
+   struct capture_control  *capture_control;
+#endif
/* Ptrace state: */
unsigned long   ptrace_message;
kernel_siginfo_t*last_siginfo;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f66920173370..ef478b0daa45 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2177,6 +2177,9 @@ static void __sched_fork(unsigned long clone_flags, 
struct task_struct *p)
INIT_HLIST_HEAD(>preempt_notifiers);
 #endif
 
+#ifdef CONFIG_COMPACTION
+   p->capture_control = NULL;
+#endif
init_numa_balancing(clone_flags, p);
 }
 
diff --git a/mm/compaction.c b/mm/compaction.c
index 7f316e1a7275..ae70be023b21 100644

[PATCH 23/25] mm, compaction: Be selective about what pageblocks to clear skip hints

2019-01-04 Thread Mel Gorman
Pageblock hints are cleared when compaction restarts or kswapd makes
enough progress that it can sleep, but this is over-eager: the bit is
cleared for migration sources with no LRU pages and for migration targets
with no free pages. As pageblock skip hint flushes are relatively rare and
out-of-band
with respect to kswapd, this patch makes a few more expensive checks to
see if it's appropriate to even clear the bit. Every pageblock that is
not cleared will avoid 512 pages being scanned unnecessarily on x86-64.

The impact is variable with different workloads showing small differences
in latency, success rates and scan rates. This is expected as clearing
the hints is not that common but doing a small amount of work out-of-band
to avoid a large amount of work in-band later is generally a good thing.
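The sampling loop in __reset_isolation_pfn() below probes one page every
(1 << PAGE_ALLOC_COSTLY_ORDER) pages rather than every page. With the
x86-64 defaults that is 64 probes per pageblock instead of 512:

```c
#include <assert.h>

#define PAGE_ALLOC_COSTLY_ORDER	3	/* kernel default */
#define PAGEBLOCK_NR_PAGES	512	/* x86-64 with 2M pageblocks */

/* Count how many pages the sampling loop actually touches per pageblock. */
static int samples_per_pageblock(void)
{
	int samples = 0;
	int off;

	for (off = 0; off < PAGEBLOCK_NR_PAGES;
	     off += (1 << PAGE_ALLOC_COSTLY_ORDER))
		samples++;
	return samples;
}
```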

Signed-off-by: Mel Gorman 
---
 include/linux/mmzone.h |   2 +
 mm/compaction.c| 119 +
 2 files changed, 102 insertions(+), 19 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index cc4a507d7ca4..faa1e6523f49 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -480,6 +480,8 @@ struct zone {
unsigned long   compact_cached_free_pfn;
/* pfn where async and sync compaction migration scanner should start */
unsigned long   compact_cached_migrate_pfn[2];
+   unsigned long   compact_init_migrate_pfn;
+   unsigned long   compact_init_free_pfn;
 #endif
 
 #ifdef CONFIG_COMPACTION
diff --git a/mm/compaction.c b/mm/compaction.c
index cc532e81a7b7..7f316e1a7275 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -231,6 +231,62 @@ static bool pageblock_skip_persistent(struct page *page)
return false;
 }
 
+static bool
+__reset_isolation_pfn(struct zone *zone, unsigned long pfn, bool check_source,
+   bool check_target)
+{
+   struct page *page = pfn_to_online_page(pfn);
+   struct page *end_page;
+
+   if (!page)
+   return false;
+   if (zone != page_zone(page))
+   return false;
+   if (pageblock_skip_persistent(page))
+   return false;
+
+   /*
+* If skip is already cleared do no further checking once the
+* restart points have been set.
+*/
+   if (check_source && check_target && !get_pageblock_skip(page))
+   return true;
+
+   /*
+* If clearing skip for the target scanner, do not select a
+* non-movable pageblock as the starting point.
+*/
+   if (!check_source && check_target &&
+   get_pageblock_migratetype(page) != MIGRATE_MOVABLE)
+   return false;
+
+   /*
+* Only clear the hint if a sample indicates there is either a
+* free page or an LRU page in the block. One or other condition
+* is necessary for the block to be a migration source/target.
+*/
+   page = pfn_to_page(pageblock_start_pfn(pfn));
+   if (zone != page_zone(page))
+   return false;
+   end_page = page + pageblock_nr_pages;
+
+   do {
+   if (check_source && PageLRU(page)) {
+   clear_pageblock_skip(page);
+   return true;
+   }
+
+   if (check_target && PageBuddy(page)) {
+   clear_pageblock_skip(page);
+   return true;
+   }
+
+   page += (1 << PAGE_ALLOC_COSTLY_ORDER);
+   } while (page < end_page);
+
+   return false;
+}
+
 /*
  * This function is called to clear all cached information on pageblocks that
  * should be skipped for page isolation when the migrate and free page scanner
@@ -238,30 +294,54 @@ static bool pageblock_skip_persistent(struct page *page)
  */
 static void __reset_isolation_suitable(struct zone *zone)
 {
-   unsigned long start_pfn = zone->zone_start_pfn;
-   unsigned long end_pfn = zone_end_pfn(zone);
-   unsigned long pfn;
+   unsigned long migrate_pfn = zone->zone_start_pfn;
+   unsigned long free_pfn = zone_end_pfn(zone);
+   unsigned long reset_migrate = free_pfn;
+   unsigned long reset_free = migrate_pfn;
+   bool source_set = false;
+   bool free_set = false;
 
-   zone->compact_blockskip_flush = false;
+   if (!zone->compact_blockskip_flush)
+   return;
 
-   /* Walk the zone and mark every pageblock as suitable for isolation */
-   for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
-   struct page *page;
+   zone->compact_blockskip_flush = false;
 
+   /*
+* Walk the zone and update pageblock skip information. Source looks
+* for PageLRU while target looks for PageBuddy. When the scanner
+* is found, both PageBuddy and PageLRU are checked as the pageblock
+* is suitable as both source and target.
+*/
+  

[PATCH 22/25] mm, compaction: Sample pageblocks for free pages

2019-01-04 Thread Mel Gorman
Once fast searching finishes, there is a possibility that the linear
scanner is scanning full blocks found by the fast scanner earlier. This
patch uses an adaptive stride to sample pageblocks for free pages. The
more consecutive full pageblocks encountered, the larger the stride until
a pageblock with free pages is found. The scanners might meet slightly
sooner but it is an acceptable risk given that the search of the free
lists may still encounter the pages and adjust the cached PFN of the free
scanner accordingly.

In terms of latency and success rates, the impact is not obvious but the
free scan rate is reduced by 87% on a 1-socket machine and 92% on a
2-socket machine. It's also the first time in the series where the number
of pages scanned by the migration scanner is greater than the free scanner
due to the increased search efficiency.
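The adaptive behaviour described above can be sketched as a small pure
function. This is illustrative of the described policy, not the exact
kernel loop: sample more sparsely while pageblocks come up empty, and fall
back to a dense scan once free pages are found.

```c
#include <assert.h>

#define COMPACT_CLUSTER_MAX	32	/* kernel's SWAP_CLUSTER_MAX */

/* Pick the next sampling stride from the last block's isolation result. */
static unsigned int next_stride(unsigned int stride, unsigned long nr_isolated)
{
	if (nr_isolated)
		return 1;		/* pages found: scan densely again */
	if (stride < COMPACT_CLUSTER_MAX)
		stride <<= 1;		/* full block: sample more sparsely */
	return stride;
}
```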

Signed-off-by: Mel Gorman 
---
 mm/compaction.c | 27 +--
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 652e249168b1..cc532e81a7b7 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -441,6 +441,7 @@ static unsigned long isolate_freepages_block(struct 
compact_control *cc,
unsigned long *start_pfn,
unsigned long end_pfn,
struct list_head *freelist,
+   unsigned int stride,
bool strict)
 {
int nr_scanned = 0, total_isolated = 0;
@@ -450,10 +451,14 @@ static unsigned long isolate_freepages_block(struct 
compact_control *cc,
unsigned long blockpfn = *start_pfn;
unsigned int order;
 
+   /* Strict mode is for isolation, speed is secondary */
+   if (strict)
+   stride = 1;
+
cursor = pfn_to_page(blockpfn);
 
/* Isolate free pages. */
-   for (; blockpfn < end_pfn; blockpfn++, cursor++) {
+   for (; blockpfn < end_pfn; blockpfn += stride, cursor += stride) {
int isolated;
struct page *page = cursor;
 
@@ -624,7 +629,7 @@ isolate_freepages_range(struct compact_control *cc,
break;
 
isolated = isolate_freepages_block(cc, _start_pfn,
-   block_end_pfn, , true);
+   block_end_pfn, , 0, true);
 
/*
 * In strict mode, isolate_freepages_block() returns 0 if
@@ -1139,7 +1144,7 @@ fast_isolate_around(struct compact_control *cc, unsigned 
long pfn, unsigned long
 
/* Scan before */
if (start_pfn != pfn) {
-   isolate_freepages_block(cc, _pfn, pfn, >freepages, 
false);
+   isolate_freepages_block(cc, _pfn, pfn, >freepages, 1, 
false);
if (cc->nr_freepages >= cc->nr_migratepages)
return;
}
@@ -1147,7 +1152,7 @@ fast_isolate_around(struct compact_control *cc, unsigned 
long pfn, unsigned long
/* Scan after */
start_pfn = pfn + nr_isolated;
if (start_pfn != end_pfn)
-   isolate_freepages_block(cc, _pfn, end_pfn, 
>freepages, false);
+   isolate_freepages_block(cc, _pfn, end_pfn, 
>freepages, 1, false);
 
/* Skip this pageblock in the future as it's full or nearly full */
if (cc->nr_freepages < cc->nr_migratepages)
@@ -1333,7 +1338,9 @@ static void isolate_freepages(struct compact_control *cc)
unsigned long isolate_start_pfn; /* exact pfn we start at */
unsigned long block_end_pfn;/* end of current pageblock */
unsigned long low_pfn;   /* lowest pfn scanner is able to scan */
+   unsigned long nr_isolated;
struct list_head *freelist = >freepages;
+   unsigned int stride;
 
/* Try a small search of the free lists for a candidate */
isolate_start_pfn = fast_isolate_freepages(cc);
@@ -1356,6 +1363,7 @@ static void isolate_freepages(struct compact_control *cc)
block_end_pfn = min(block_start_pfn + pageblock_nr_pages,
zone_end_pfn(zone));
low_pfn = pageblock_end_pfn(cc->migrate_pfn);
+   stride = cc->mode == MIGRATE_ASYNC ? COMPACT_CLUSTER_MAX : 1;
 
/*
 * Isolate free pages until enough are available to migrate the
@@ -1387,8 +1395,8 @@ static void isolate_freepages(struct compact_control *cc)
continue;
 
/* Found a block suitable for isolating free pages from. */
-   isolate_freepages_block(cc, _start_pfn, block_end_pfn,
-   freelist, false);
+   nr_isolated = isolate_freepages_block(cc, _start_pfn,
+   block_end_pfn, freelist, stride, false);
 
/* Update the skip hint if the full pageblock was scanned */
if (isolate_start_pfn == block_end_pfn)
@@ 

[PATCH 20/25] mm, compaction: Reduce unnecessary skipping of migration target scanner

2019-01-04 Thread Mel Gorman
The fast isolation of pages can move the scanner faster than is necessary
depending on the contents of the free list. This patch will only allow
the fast isolation to initialise the scanner and advance it slowly. The
primary means of moving the scanner forward is via the linear scanner
to reduce the likelihood the migration source/target scanners meet
prematurely triggering a rescan.

                              4.20.0                 4.20.0
                     noresched-v2r15         slowfree-v2r15
Amean fault-both-1      0.00 (   0.00%)       0.00 *   0.00%*
Amean fault-both-3   2736.50 (   0.00%)    2512.53 (   8.18%)
Amean fault-both-5   4133.70 (   0.00%)    4159.43 (  -0.62%)
Amean fault-both-7   5738.61 (   0.00%)    5950.15 (  -3.69%)
Amean fault-both-12  9392.82 (   0.00%)    8674.38 (   7.65%)
Amean fault-both-18 13257.15 (   0.00%)   12850.79 (   3.07%)
Amean fault-both-24 16859.44 (   0.00%)   17242.86 (  -2.27%)
Amean fault-both-30 16249.30 (   0.00%)   19404.18 * -19.42%*
Amean fault-both-32 14904.71 (   0.00%)   16200.79 (  -8.70%)

The impact to latency, success rates and scan rates is marginal but
avoiding unnecessary restarts is important. It helps later patches that
are more careful about how pageblocks are treated as earlier iterations
of those patches hit corner cases where the restarts were punishing and
very visible.
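The slow-advance rule in the fast_isolate_freepages() hunk below can be
checked in isolation: rather than jumping the cached free-scanner position
to the highest candidate found, it pulls the cache back one pageblock so
the linear scanner stays the primary means of advancing it. A sketch:

```c
#include <assert.h>

#define PAGEBLOCK_NR_PAGES	512UL

/* Compute the new cached free-scanner pfn from a fast-search result. */
static unsigned long update_cached_free_pfn(unsigned long cached,
					    unsigned long highest)
{
	if (highest && highest >= cached)
		return highest - PAGEBLOCK_NR_PAGES;
	return cached;	/* no candidate, or candidate behind the cache */
}
```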

Signed-off-by: Mel Gorman 
---
 mm/compaction.c | 27 ++-
 1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 75eb0d40d4d7..6c5552c6d8f9 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -324,10 +324,9 @@ static void update_cached_migrate(struct compact_control 
*cc, unsigned long pfn)
  * future. The information is later cleared by __reset_isolation_suitable().
  */
 static void update_pageblock_skip(struct compact_control *cc,
-   struct page *page, unsigned long nr_isolated)
+   struct page *page, unsigned long pfn)
 {
struct zone *zone = cc->zone;
-   unsigned long pfn;
 
if (cc->no_set_skip_hint)
return;
@@ -335,13 +334,8 @@ static void update_pageblock_skip(struct compact_control 
*cc,
if (!page)
return;
 
-   if (nr_isolated)
-   return;
-
set_pageblock_skip(page);
 
-   pfn = page_to_pfn(page);
-
/* Update where async and sync compaction should restart */
if (pfn < zone->compact_cached_free_pfn)
zone->compact_cached_free_pfn = pfn;
@@ -359,7 +353,7 @@ static inline bool pageblock_skip_persistent(struct page 
*page)
 }
 
 static inline void update_pageblock_skip(struct compact_control *cc,
-   struct page *page, unsigned long nr_isolated)
+   struct page *page, unsigned long pfn)
 {
 }
 
@@ -450,7 +444,7 @@ static unsigned long isolate_freepages_block(struct 
compact_control *cc,
bool strict)
 {
int nr_scanned = 0, total_isolated = 0;
-   struct page *cursor, *valid_page = NULL;
+   struct page *cursor;
unsigned long flags = 0;
bool locked = false;
unsigned long blockpfn = *start_pfn;
@@ -477,9 +471,6 @@ static unsigned long isolate_freepages_block(struct 
compact_control *cc,
if (!pfn_valid_within(blockpfn))
goto isolate_fail;
 
-   if (!valid_page)
-   valid_page = page;
-
/*
 * For compound pages such as THP and hugetlbfs, we can save
 * potentially a lot of iterations if we skip them at once.
@@ -576,10 +567,6 @@ static unsigned long isolate_freepages_block(struct 
compact_control *cc,
if (strict && blockpfn < end_pfn)
total_isolated = 0;
 
-   /* Update the pageblock-skip if the whole pageblock was scanned */
-   if (blockpfn == end_pfn)
-   update_pageblock_skip(cc, valid_page, total_isolated);
-
cc->total_free_scanned += nr_scanned;
if (total_isolated)
count_compact_events(COMPACTISOLATED, total_isolated);
@@ -1295,8 +1282,10 @@ fast_isolate_freepages(struct compact_control *cc)
}
}
 
-   if (highest && highest > cc->zone->compact_cached_free_pfn)
+   if (highest && highest >= cc->zone->compact_cached_free_pfn) {
+   highest -= pageblock_nr_pages;
cc->zone->compact_cached_free_pfn = highest;
+   }
 
cc->total_free_scanned += nr_scanned;
if (!page)
@@ -1376,6 +1365,10 @@ static void isolate_freepages(struct compact_control *cc)
isolate_freepages_block(cc, _start_pfn, block_end_pfn,
freelist, false);
 
+   /* Update the skip hint if the full pageblock was scanned */
+   

[PATCH 19/25] mm, compaction: Do not consider a need to reschedule as contention

2019-01-04 Thread Mel Gorman
Scanning on large machines can take a considerable length of time and
eventually need to be rescheduled. This is treated as an abort event but
that's not appropriate as the attempt is likely to be retried after making
numerous checks and taking another cycle through the page allocator.
This patch will check the need to reschedule if necessary but continue
the scanning.

The main benefit is reduced scanning when compaction is taking a long time
or the machine is over-saturated. It also avoids an unnecessary exit of
compaction that ends up being retried by the page allocator in the outer
loop.

                              4.20.0                 4.20.0
                    synccached-v2r15        noresched-v2r15
Amean fault-both-3   2655.55 (   0.00%)    2736.50 (  -3.05%)
Amean fault-both-5   4580.67 (   0.00%)    4133.70 (   9.76%)
Amean fault-both-7   5740.50 (   0.00%)    5738.61 (   0.03%)
Amean fault-both-12  9237.55 (   0.00%)    9392.82 (  -1.68%)
Amean fault-both-18 12899.51 (   0.00%)   13257.15 (  -2.77%)
Amean fault-both-24 16342.47 (   0.00%)   16859.44 (  -3.16%)
Amean fault-both-30 20394.26 (   0.00%)   16249.30 *  20.32%*
Amean fault-both-32 17450.76 (   0.00%)   14904.71 *  14.59%*

Signed-off-by: Mel Gorman 
---
 mm/compaction.c | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 1a41a2dbff24..75eb0d40d4d7 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -398,19 +398,11 @@ static bool compact_lock_irqsave(spinlock_t *lock, 
unsigned long *flags,
return true;
 }
 
-/*
- * Aside from avoiding lock contention, compaction also periodically checks
- * need_resched() and records async compaction as contended if necessary.
- */
+/* Avoid soft-lockups due to long scan times */
 static inline void compact_check_resched(struct compact_control *cc)
 {
-   /* async compaction aborts if contended */
-   if (need_resched()) {
-   if (cc->mode == MIGRATE_ASYNC)
-   cc->contended = true;
-
+   if (need_resched())
cond_resched();
-   }
 }
 
 /*
-- 
2.16.4



[PATCH 18/25] mm, compaction: Rework compact_should_abort as compact_check_resched

2019-01-04 Thread Mel Gorman
With incremental changes, compact_should_abort no longer makes
any documented sense. Rename to compact_check_resched and update the
associated comments.  There is no benefit other than reducing redundant
code and making the intent slightly clearer. It could potentially be
merged with earlier patches but that would just make the review slightly
harder.

Signed-off-by: Mel Gorman 
---
 mm/compaction.c | 61 ++---
 1 file changed, 23 insertions(+), 38 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index be27e4fa1b40..1a41a2dbff24 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -398,6 +398,21 @@ static bool compact_lock_irqsave(spinlock_t *lock, 
unsigned long *flags,
return true;
 }
 
+/*
+ * Aside from avoiding lock contention, compaction also periodically checks
+ * need_resched() and records async compaction as contended if necessary.
+ */
+static inline void compact_check_resched(struct compact_control *cc)
+{
+   /* async compaction aborts if contended */
+   if (need_resched()) {
+   if (cc->mode == MIGRATE_ASYNC)
+   cc->contended = true;
+
+   cond_resched();
+   }
+}
+
 /*
  * Compaction requires the taking of some coarse locks that are potentially
  * very heavily contended. The lock should be periodically unlocked to avoid
@@ -426,33 +441,7 @@ static bool compact_unlock_should_abort(spinlock_t *lock,
return true;
}
 
-   if (need_resched()) {
-   if (cc->mode == MIGRATE_ASYNC)
-   cc->contended = true;
-   cond_resched();
-   }
-
-   return false;
-}
-
-/*
- * Aside from avoiding lock contention, compaction also periodically checks
- * need_resched() and either schedules in sync compaction or aborts async
- * compaction. This is similar to what compact_unlock_should_abort() does, but
- * is used where no lock is concerned.
- *
- * Returns false when no scheduling was needed, or sync compaction scheduled.
- * Returns true when async compaction should abort.
- */
-static inline bool compact_should_abort(struct compact_control *cc)
-{
-   /* async compaction aborts if contended */
-   if (need_resched()) {
-   if (cc->mode == MIGRATE_ASYNC)
-   cc->contended = true;
-
-   cond_resched();
-   }
+   compact_check_resched(cc);
 
return false;
 }
@@ -750,8 +739,7 @@ isolate_migratepages_block(struct compact_control *cc, 
unsigned long low_pfn,
return 0;
}
 
-   if (compact_should_abort(cc))
-   return 0;
+   compact_check_resched(cc);
 
if (cc->direct_compaction && (cc->mode == MIGRATE_ASYNC)) {
skip_on_failure = true;
@@ -1374,12 +1362,10 @@ static void isolate_freepages(struct compact_control 
*cc)
isolate_start_pfn = block_start_pfn) {
/*
 * This can iterate a massively long zone without finding any
-* suitable migration targets, so periodically check if we need
-* to schedule, or even abort async compaction.
+* suitable migration targets, so periodically check resched.
 */
-   if (!(block_start_pfn % (SWAP_CLUSTER_MAX * pageblock_nr_pages))
-   && compact_should_abort(cc))
-   break;
+   if (!(block_start_pfn % (SWAP_CLUSTER_MAX * 
pageblock_nr_pages)))
+   compact_check_resched(cc);
 
page = pageblock_pfn_to_page(block_start_pfn, block_end_pfn,
zone);
@@ -1673,11 +1659,10 @@ static isolate_migrate_t isolate_migratepages(struct 
zone *zone,
/*
 * This can potentially iterate a massively long zone with
 * many pageblocks unsuitable, so periodically check if we
-* need to schedule, or even abort async compaction.
+* need to schedule.
 */
-   if (!(low_pfn % (SWAP_CLUSTER_MAX * pageblock_nr_pages))
-   && compact_should_abort(cc))
-   break;
+   if (!(low_pfn % (SWAP_CLUSTER_MAX * pageblock_nr_pages)))
+   compact_check_resched(cc);
 
page = pageblock_pfn_to_page(block_start_pfn, block_end_pfn,
zone);
-- 
2.16.4



[PATCH 17/25] mm, compaction: Keep cached migration PFNs synced for unusable pageblocks

2019-01-04 Thread Mel Gorman
Migrate has separate cached PFNs for ASYNC and SYNC* migration on the
basis that some migrations will fail in ASYNC mode. However, if the cached
PFNs match at the start of scanning and pageblocks are skipped due to
having no isolation candidates, then the sync state does not matter.
This patch keeps matching cached PFNs in sync until a pageblock with
isolation candidates is found.

The actual benefit is marginal given that the sync scanner following the
async scanner will often skip a number of pageblocks, but it's useless
work. Any benefit depends heavily on whether the scanners restarted
recently, so overall the reduction in scan rates is a mere 2.8%, which
is borderline noise.

Signed-off-by: Mel Gorman 
---
 mm/compaction.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
index 921720f7a416..be27e4fa1b40 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1967,6 +1967,7 @@ static enum compact_result compact_zone(struct 
compact_control *cc)
unsigned long end_pfn = zone_end_pfn(cc->zone);
unsigned long last_migrated_pfn;
const bool sync = cc->mode != MIGRATE_ASYNC;
+   bool update_cached;
unsigned long a, b, c;
 
cc->migratetype = gfpflags_to_migratetype(cc->gfp_mask);
@@ -2019,6 +2020,17 @@ static enum compact_result compact_zone(struct 
compact_control *cc)
 
last_migrated_pfn = 0;
 
+   /*
+* Migrate has separate cached PFNs for ASYNC and SYNC* migration on
+* the basis that some migrations will fail in ASYNC mode. However,
+* if the cached PFNs match and pageblocks are skipped due to having
+* no isolation candidates, then the sync state does not matter.
+* Until a pageblock with isolation candidates is found, keep the
+* cached PFNs in sync to avoid revisiting the same blocks.
+*/
+   update_cached = !sync &&
+   cc->zone->compact_cached_migrate_pfn[0] == 
cc->zone->compact_cached_migrate_pfn[1];
+
trace_mm_compaction_begin(start_pfn, cc->migrate_pfn,
cc->free_pfn, end_pfn, sync);
 
@@ -2050,6 +2062,11 @@ static enum compact_result compact_zone(struct 
compact_control *cc)
last_migrated_pfn = 0;
goto out;
case ISOLATE_NONE:
+   if (update_cached) {
+   cc->zone->compact_cached_migrate_pfn[1] =
+   cc->zone->compact_cached_migrate_pfn[0];
+   }
+
/*
 * We haven't isolated and migrated anything, but
 * there might still be unflushed migrations from
@@ -2057,6 +2074,7 @@ static enum compact_result compact_zone(struct 
compact_control *cc)
 */
goto check_drain;
case ISOLATE_SUCCESS:
+   update_cached = false;
last_migrated_pfn = start_pfn;
;
}
-- 
2.16.4



RE: [PATCH] fsl/fman: avoid sleeping in atomic context while adding an address

2019-01-04 Thread Madalin-cristian Bucur
Hi,

this is a duplicate of this other patch addressing the issue:

commit 0d9c9a238faf925823bde866182c663b6d734f2e
Author: Scott Wood 
Date:   Thu Dec 27 18:29:09 2018 -0600

fsl/fman: Use GFP_ATOMIC in {memac,tgec}_add_hash_mac_address()

Thank you,
Madalin
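For readers unfamiliar with the rule both patches enforce, it can be
modeled in userspace. The GFP flags here are a toy enum, not the kernel's:
a GFP_KERNEL allocation may sleep, so it is illegal while a spinlock is
held, and GFP_ATOMIC must be used in that context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum gfp { GFP_KERNEL, GFP_ATOMIC };	/* toy stand-ins */

static int in_atomic_ctx;	/* models "spinlock held" / atomic context */

/* Model allocator: flags a bug when a sleeping allocation is attempted
 * in atomic context, as the kernel's might_sleep() check would. */
static void *model_kmalloc(size_t size, enum gfp flags, int *bug)
{
	if (flags == GFP_KERNEL && in_atomic_ctx)
		*bug = 1;	/* kernel would splat: "BUG: sleeping
				 * function called from invalid context" */
	return malloc(size);
}
```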

> -Original Message-
> From: Yi Wang 
> Sent: Friday, January 4, 2019 7:50 AM
> To: Madalin-cristian Bucur 
> Cc: da...@davemloft.net; net...@vger.kernel.org; linux-
> ker...@vger.kernel.org; xue.zhih...@zte.com.cn; wang.y...@zte.com.cn;
> huang.jun...@zte.com.cn; Junhua Huang 
> Subject: [PATCH] fsl/fman: avoid sleeping in atomic context while adding
> an address
> 
> From: Junhua Huang 
> 
> dev_set_rx_mode will call function pointer mac_dev->add_hash_mac_addr
> while holding spin_lock_bh. The function pointer points to
> memac_add_hash_mac_address when ethernet type is fman-memac,
> which will kmalloc use GFP_KERNEL flag.
> 
> / # ifconfig eth2 192.168.1.168
> [  576.604544] BUG: sleeping function called from invalid context at
> mm/slab.h:393
> [  576.610587] in_atomic(): 1, irqs_disabled(): 0, pid: 2751, name:
> ifconfig
> [  576.616105] 2 locks held by ifconfig/2751:
> [  576.618916] #0:(rtnl_mutex)at: []
> .rtnl_lock+0x1c/0x30
> [  576.625523] #1:(_xmit_ETHER)at: []
> .dev_set_rx_mode+0x24/0x54
> [  576.632745] CPU: 5 PID: 2751 Comm: ifconfig Tainted: G   W 4.9.115-
> rt93-EMBSYS-@332 #3
> [  576.642942] Call Trace:
> [  576.644085] [c0007499b440] [c0a09eb4]
> .dump_stack+0xe0/0x14c (unreliable)
> [  576.650642] [c0007499b4d0] [c0076f3c]
> .___might_sleep+0x1ac/0x278
> [  576.656493] [c0007499b560] [c01aad6c]
> .kmem_cache_alloc+0x144/0x28c
> [  576.662518] [c0007499b620] [c0634c18]
> .memac_add_hash_mac_address+0x100/0x194
> [  576.669416] [c0007499b6b0] [c0630b54]
> .set_multi+0x1bc/0x20c
> [  576.674829] [c0007499b760] [c063718c]
> .dpaa_set_rx_mode+0x84/0x104
> [  576.680765] [c0007499b7e0] [c07f9394]
> .__dev_set_rx_mode+0x64/0xdc
> [  576.686701] [c0007499b870] [c07f943c]
> .dev_set_rx_mode+0x30/0x54
> [  576.692464] [c0007499b8f0] [c07f98f4]
> .__dev_change_flags+0x98/0x1c4
> [  576.698573] [c0007499b980] [c07f9a4c]
> .dev_change_flags+0x2c/0x80
> [  576.704429] [c0007499ba10] [c08cde28]
> .devinet_ioctl+0x624/0x8e0
> [  576.710191] [c0007499bb00] [c08d1678]
> .inet_ioctl+0x1f4/0x250
> [  576.715697] [c0007499bb70] [c07c89d8]
> .sock_do_ioctl+0x50/0xa8
> [  576.721284] [c0007499bc00] [c07c94e8]
> .sock_ioctl+0x2b8/0x39c
> [  576.726786] [c0007499bca0] [c01e2500]
> .do_vfs_ioctl+0xc8/0x8b4
> [  576.732373] [c0007499bd90] [c01e2d44] .SyS_ioctl+0x58/0xa4
> [  576.737612] [c0007499be30] [c7a4]
> system_call+0x38/0x108
> 
> Signed-off-by: Junhua Huang 
> ---
>  drivers/net/ethernet/freescale/fman/fman_memac.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/freescale/fman/fman_memac.c
> b/drivers/net/ethernet/freescale/fman/fman_memac.c
> index 71a5ded..21dd557 100644
> --- a/drivers/net/ethernet/freescale/fman/fman_memac.c
> +++ b/drivers/net/ethernet/freescale/fman/fman_memac.c
> @@ -923,7 +923,7 @@ int memac_add_hash_mac_address(struct fman_mac *memac,
> enet_addr_t *eth_addr)
>   hash = get_mac_addr_hash_code(addr) & HASH_CTRL_ADDR_MASK;
> 
>   /* Create element to be added to the driver hash table */
> - hash_entry = kmalloc(sizeof(*hash_entry), GFP_KERNEL);
> + hash_entry = kmalloc(sizeof(*hash_entry), GFP_ATOMIC);
>   if (!hash_entry)
>   return -ENOMEM;
>   hash_entry->addr = addr;
> --
> 2.15.2



[PATCH 16/25] mm, compaction: Check early for huge pages encountered by the migration scanner

2019-01-04 Thread Mel Gorman
When scanning for sources or targets, PageCompound is checked for huge
pages as they can be skipped quickly but it happens relatively late after
a lot of setup and checking. This patch short-cuts the check to make it
earlier. It might still change when the lock is acquired but this has
less overhead overall. The free scanner advances but the migration scanner
does not. Typically the free scanner encounters more movable blocks that
change state over the lifetime of the system and also tends to scan more
aggressively as it's actively filling its portion of the physical address
space with data. This could change in the future but for the moment,
this worked better in practice and incurred fewer scan restarts.

The impact on latency and allocation success rates is marginal but the
free scan rates are reduced by 32% and system CPU usage is reduced by
2.6%. The 2-socket results are not materially different.

Signed-off-by: Mel Gorman 
---
 mm/compaction.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 608d274f9880..921720f7a416 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1071,6 +1071,9 @@ static bool suitable_migration_source(struct 
compact_control *cc,
 {
int block_mt;
 
+   if (pageblock_skip_persistent(page))
+   return false;
+
if ((cc->mode != MIGRATE_ASYNC) || !cc->direct_compaction)
return true;
 
@@ -1693,12 +1696,17 @@ static isolate_migrate_t isolate_migratepages(struct 
zone *zone,
continue;
 
/*
-* For async compaction, also only scan in MOVABLE blocks.
-* Async compaction is optimistic to see if the minimum amount
-* of work satisfies the allocation.
+* For async compaction, also only scan in MOVABLE blocks
+* without huge pages. Async compaction is optimistic to see
+* if the minimum amount of work satisfies the allocation.
+* The cached PFN is updated as it's possible that all
+* remaining blocks between source and target are suitable
+* and the compaction scanners fail to meet.
 */
-   if (!suitable_migration_source(cc, page))
+   if (!suitable_migration_source(cc, page)) {
+   update_cached_migrate(cc, block_end_pfn);
continue;
+   }
 
/* Perform the isolation */
low_pfn = isolate_migratepages_block(cc, low_pfn,
-- 
2.16.4



[PATCH 14/25] mm, compaction: Avoid rescanning the same pageblock multiple times

2019-01-04 Thread Mel Gorman
Pageblocks are marked for skip when no pages are isolated after a scan.
However, it's possible to hit corner cases where the migration scanner
gets stuck near the boundary between the source and target scanner. Due
to pages being migrated in blocks of COMPACT_CLUSTER_MAX, pages that
are migrated can be reallocated before the pageblock is complete. The
pageblock is not necessarily skipped so it can be rescanned multiple
times. Similarly, a pageblock with some dirty/writeback pages may fail
to isolate and be rescanned until writeback completes which is wasteful.

This patch tracks if a pageblock is being rescanned. If so, then the entire
pageblock will be migrated as one operation. This narrows the race window
during which pages can be reallocated during migration. Secondly, if there
are pages that cannot be isolated then the pageblock will still be fully
scanned and marked for skipping. On the second rescan, the pageblock skip
is set and the migration scanner makes progress.

4.20.0 4.20.0
  finishscan-v2r15 norescan-v2r15
Amean fault-both-3  3729.80 (   0.00%) 2872.13 *  23.00%*
Amean fault-both-5  5148.49 (   0.00%) 4330.56 *  15.89%*
Amean fault-both-7  7393.24 (   0.00%) 6496.63 (  12.13%)
Amean fault-both-12   11709.32 (   0.00%)   10280.59 (  12.20%)
Amean fault-both-18   16626.82 (   0.00%)   11079.19 *  33.37%*
Amean fault-both-24   19944.34 (   0.00%)   17207.80 *  13.72%*
Amean fault-both-30   23435.53 (   0.00%)   17736.13 *  24.32%*
Amean fault-both-32   23948.70 (   0.00%)   18509.41 *  22.71%*

   4.20.0 4.20.0
 finishscan-v2r15 norescan-v2r15
Percentage huge-1 0.00 (   0.00%)0.00 (   0.00%)
Percentage huge-3    88.39 (   0.00%)   96.87 (   9.60%)
Percentage huge-5    92.07 (   0.00%)   94.63 (   2.77%)
Percentage huge-7    91.96 (   0.00%)   93.83 (   2.03%)
Percentage huge-12   93.38 (   0.00%)   92.65 (  -0.78%)
Percentage huge-18   91.89 (   0.00%)   93.66 (   1.94%)
Percentage huge-24   91.37 (   0.00%)   93.15 (   1.95%)
Percentage huge-30   92.77 (   0.00%)   93.16 (   0.42%)
Percentage huge-32   87.97 (   0.00%)   92.58 (   5.24%)

The fault latency reduction is large and while the THP allocation
success rate is only slightly higher, it's already high at this
point of the series.

Compaction migrate scanned    60718343.00    31772603.00
Compaction free scanned      933061894.00    63267928.00

Migration scan rates are reduced by 48% and free scan rates are
also reduced as the same migration source block is not being selected
multiple times. The corner case where migration scan rates go through the
roof due to a dirty/writeback pageblock located at the boundary of the
migration/free scanner did not happen in this case. When it does happen,
the scan rates multiply by factors measured in the hundreds and would be
misleading to present.

Signed-off-by: Mel Gorman 
---
 mm/compaction.c | 32 ++++++++++++++++++++++++++------
 mm/internal.h   |  1 +
 2 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 9438f0564ed5..9c2cc7955446 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -959,8 +959,11 @@ isolate_migratepages_block(struct compact_control *cc, 
unsigned long low_pfn,
cc->nr_migratepages++;
nr_isolated++;
 
-   /* Avoid isolating too much */
-   if (cc->nr_migratepages == COMPACT_CLUSTER_MAX) {
+   /*
+* Avoid isolating too much unless this block is being
+* rescanned (e.g. dirty/writeback pages, parallel allocation).
+*/
+   if (cc->nr_migratepages == COMPACT_CLUSTER_MAX && !cc->rescan) {
++low_pfn;
break;
}
@@ -1007,11 +1010,14 @@ isolate_migratepages_block(struct compact_control *cc, 
unsigned long low_pfn,
spin_unlock_irqrestore(zone_lru_lock(zone), flags);
 
/*
-* Updated the cached scanner pfn if the pageblock was scanned
-* without isolating a page. The pageblock may not be marked
-* skipped already if there were no LRU pages in the block.
+* Updated the cached scanner pfn once the pageblock has been scanned
+* Pages will either be migrated in which case there is no point
+* scanning in the near future or migration failed in which case the
+* failure reason may persist. The block is marked for skipping if
+* there were no pages isolated in the block or if the block is
+* rescanned twice in a row.
 */
-   if (low_pfn == end_pfn && !nr_isolated) {
+   if (low_pfn == end_pfn && (!nr_isolated || cc->rescan)) {
   

[PATCH 15/25] mm, compaction: Finish pageblock scanning on contention

2019-01-04 Thread Mel Gorman
Async migration aborts on spinlock contention but contention can be high
when there are multiple compaction attempts and kswapd is active. The
consequence is that the migration scanners move forward uselessly while
still contending on locks for longer while leaving suitable migration
sources behind.

This patch will acquire the lock but track when contention occurs. When
it does, the current pageblock will finish as compaction may succeed for
that block and then abort. This will have a variable impact on latency as
in some cases useless scanning is avoided (reduces latency) but a lock
will be contended (increase latency) or a single contended pageblock is
scanned that would otherwise have been skipped (increase latency).

4.20.0 4.20.0
norescan-v2r15finishcontend-v2r15
Amean fault-both-1 0.00 (   0.00%)0.00 *   0.00%*
Amean fault-both-3  2872.13 (   0.00%) 2973.08 (  -3.51%)
Amean fault-both-5  4330.56 (   0.00%) 3870.19 (  10.63%)
Amean fault-both-7  6496.63 (   0.00%) 6580.50 (  -1.29%)
Amean fault-both-12   10280.59 (   0.00%) 9527.40 (   7.33%)
Amean fault-both-18   11079.19 (   0.00%)   13395.86 * -20.91%*
Amean fault-both-24   17207.80 (   0.00%)   14936.94 *  13.20%*
Amean fault-both-30   17736.13 (   0.00%)   16748.46 (   5.57%)
Amean fault-both-32   18509.41 (   0.00%)   18521.30 (  -0.06%)

   4.20.0 4.20.0
   norescan-v2r15finishcontend-v2r15
Percentage huge-1 0.00 (   0.00%)0.00 (   0.00%)
Percentage huge-3    96.87 (   0.00%)   97.57 (   0.72%)
Percentage huge-5    94.63 (   0.00%)   96.88 (   2.39%)
Percentage huge-7    93.83 (   0.00%)   95.47 (   1.74%)
Percentage huge-12   92.65 (   0.00%)   98.64 (   6.47%)
Percentage huge-18   93.66 (   0.00%)   98.33 (   4.98%)
Percentage huge-24   93.15 (   0.00%)   98.88 (   6.15%)
Percentage huge-30   93.16 (   0.00%)   97.09 (   4.21%)
Percentage huge-32   92.58 (   0.00%)   96.20 (   3.92%)

As expected, a variable impact on latency while allocation success
rates are slightly higher. System CPU usage is reduced by about 10%,
but the scan rate impact is mixed:

Compaction migrate scanned    31772603    19980216
Compaction free scanned   63267928   120381828

Migration scan rates are reduced by 37%, which is expected as a pageblock
is now used by the async scanner instead of being skipped, but free scanning is
increased. This can be partially accounted for by the increased success
rate but also by the fact that the scanners do not meet for longer when
pageblocks are actually used. Overall this is justified and completing
a pageblock scan is very important for later patches.

Signed-off-by: Mel Gorman 
---
 mm/compaction.c | 95 +++--
 1 file changed, 39 insertions(+), 56 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 9c2cc7955446..608d274f9880 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -376,24 +376,25 @@ static bool test_and_set_skip(struct compact_control *cc, 
struct page *page,
 
 /*
  * Compaction requires the taking of some coarse locks that are potentially
- * very heavily contended. For async compaction, back out if the lock cannot
- * be taken immediately. For sync compaction, spin on the lock if needed.
+ * very heavily contended. For async compaction, trylock and record if the
+ * lock is contended. The lock will still be acquired but compaction will
+ * abort when the current block is finished regardless of success rate.
+ * Sync compaction acquires the lock.
  *
- * Returns true if the lock is held
- * Returns false if the lock is not held and compaction should abort
+ * Always returns true which makes it easier to track lock state in callers.
  */
-static bool compact_trylock_irqsave(spinlock_t *lock, unsigned long *flags,
+static bool compact_lock_irqsave(spinlock_t *lock, unsigned long *flags,
struct compact_control *cc)
 {
-   if (cc->mode == MIGRATE_ASYNC) {
-   if (!spin_trylock_irqsave(lock, *flags)) {
-   cc->contended = true;
-   return false;
-   }
-   } else {
-   spin_lock_irqsave(lock, *flags);
+   /* Track if the lock is contended in async mode */
+   if (cc->mode == MIGRATE_ASYNC && !cc->contended) {
+   if (spin_trylock_irqsave(lock, *flags))
+   return true;
+
+   cc->contended = true;
}
 
+   spin_lock_irqsave(lock, *flags);
return true;
 }
 
@@ -426,10 +427,8 @@ static bool compact_unlock_should_abort(spinlock_t *lock,
}
 
if (need_resched()) {
-   if (cc->mode == MIGRATE_ASYNC) {
+

[PATCH 13/25] mm, compaction: Use free lists to quickly locate a migration target

2019-01-04 Thread Mel Gorman
Similar to the migration scanner, this patch uses the free lists to quickly
locate a migration target. The search is different in that lower orders
will be searched for a suitable high PFN if necessary but the search
is still bound. This is justified on the grounds that the free scanner
typically scans linearly much more than the migration scanner.

If a free page is found, it is isolated and compaction continues if enough
pages were isolated. For SYNC* scanning, the full pageblock is scanned
for any remaining free pages so that it can be marked for skipping in
the near future.

1-socket thpfioscale
4.20.0 4.20.0
 isolmig-v2r15 findfree-v2r15
Amean fault-both-1 0.00 (   0.00%)0.00 *   0.00%*
Amean fault-both-3  3066.68 (   0.00%) 2884.51 (   5.94%)
Amean fault-both-5  4298.49 (   0.00%) 4419.70 (  -2.82%)
Amean fault-both-7  5986.99 (   0.00%) 6039.04 (  -0.87%)
Amean fault-both-12 9324.85 (   0.00%) 9992.34 (  -7.16%)
Amean fault-both-18   13350.05 (   0.00%)   12690.05 (   4.94%)
Amean fault-both-24   13491.77 (   0.00%)   14393.93 (  -6.69%)
Amean fault-both-30   15630.86 (   0.00%)   16894.08 (  -8.08%)
Amean fault-both-32   17428.50 (   0.00%)   17813.68 (  -2.21%)

The impact on latency is variable but the search is optimistic and
sensitive to the exact system state. Success rates are similar but
the major impact is to the rate of scanning

4.20.0-rc6  4.20.0-rc6
  isolmig-v1r4findfree-v1r8
Compaction migrate scanned    25516488    28324352
Compaction free scanned       87603321    56131065

The free scan rates are reduced by 35%. The 2-socket reductions for the
free scanner are more dramatic which is a likely reflection that the
machine has more memory.

Signed-off-by: Mel Gorman 
---
 mm/compaction.c | 203 ++--
 1 file changed, 198 insertions(+), 5 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 24e3a9db4b70..9438f0564ed5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1136,7 +1136,7 @@ static inline bool compact_scanners_met(struct 
compact_control *cc)
 
 /* Reorder the free list to reduce repeated future searches */
 static void
-move_freelist_tail(struct list_head *freelist, struct page *freepage)
+move_freelist_head(struct list_head *freelist, struct page *freepage)
 {
LIST_HEAD(sublist);
 
@@ -1147,6 +1147,193 @@ move_freelist_tail(struct list_head *freelist, struct 
page *freepage)
}
 }
 
+static void
+move_freelist_tail(struct list_head *freelist, struct page *freepage)
+{
+   LIST_HEAD(sublist);
+
+   if (!list_is_last(freelist, &freepage->lru)) {
+   list_cut_before(&sublist, freelist, &freepage->lru);
+   if (!list_empty(&sublist))
+   list_splice_tail(&sublist, freelist);
+   }
+}
+
+static void
+fast_isolate_around(struct compact_control *cc, unsigned long pfn, unsigned long nr_isolated)
+{
+   unsigned long start_pfn, end_pfn;
+   struct page *page = pfn_to_page(pfn);
+
+   /* Do not search around if there are enough pages already */
+   if (cc->nr_freepages >= cc->nr_migratepages)
+   return;
+
+   /* Minimise scanning during async compaction */
+   if (cc->direct_compaction && cc->mode == MIGRATE_ASYNC)
+   return;
+
+   /* Pageblock boundaries */
+   start_pfn = pageblock_start_pfn(pfn);
+   end_pfn = min(start_pfn + pageblock_nr_pages, zone_end_pfn(cc->zone));
+
+   /* Scan before */
+   if (start_pfn != pfn) {
+   isolate_freepages_block(cc, &start_pfn, pfn, &cc->freepages, false);
+   if (cc->nr_freepages >= cc->nr_migratepages)
+   return;
+   }
+
+   /* Scan after */
+   start_pfn = pfn + nr_isolated;
+   if (start_pfn != end_pfn)
+   isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, false);
+
+   /* Skip this pageblock in the future as it's full or nearly full */
+   if (cc->nr_freepages < cc->nr_migratepages)
+   set_pageblock_skip(page);
+}
+
+static unsigned long
+fast_isolate_freepages(struct compact_control *cc)
+{
+   unsigned int limit = min(1U, freelist_scan_limit(cc) >> 1);
+   unsigned int order_scanned = 0, nr_scanned = 0;
+   unsigned long low_pfn, min_pfn, high_pfn = 0, highest = 0;
+   unsigned long nr_isolated = 0;
+   unsigned long distance;
+   struct page *page = NULL;
+   bool scan_start = false;
+   int order;
+
+   /*
+* If starting the scan, use a deeper search and use the highest
+* PFN found if a suitable one is not found.
+*/
+   if (cc->free_pfn == pageblock_start_pfn(zone_end_pfn(cc->zone) - 1)) {
+   limit = pageblock_nr_pages >> 1;
+   scan_start = true;
+   }
+

[PATCH 11/25] mm, compaction: Use free lists to quickly locate a migration source

2019-01-04 Thread Mel Gorman
The migration scanner is a linear scan of a zone with a potentially large
search space.  Furthermore, many pageblocks are unusable such as those
filled with reserved pages or partially filled with pages that cannot
migrate. These still get scanned in the common case of allocating a THP
and the cost accumulates.

The patch uses a partial search of the free lists to locate a migration
source candidate that is marked as MOVABLE when allocating a THP. It
prefers picking a block with a larger number of free pages already on
the basis that there are fewer pages to migrate to free the entire block.
The lowest PFN found during searches is tracked as the basis of the start
for the linear search after the first search of the free list fails.
After the search, the free list is shuffled so that the next search will
not encounter the same page. If the search fails then the subsequent
searches will be shorter and the linear scanner is used.

If this search fails, or if the request is for a small or
unmovable/reclaimable allocation then the linear scanner is still used. It
is somewhat pointless to use the list search in those cases. Small free
pages must be used for the search and there is no guarantee that movable
pages are located within that block that are contiguous.

4.20.0 4.20.0
failfast-v2r15  findmig-v2r15
Amean fault-both-1 0.00 (   0.00%)0.00 *   0.00%*
Amean fault-both-3  3833.72 (   0.00%) 3505.69 (   8.56%)
Amean fault-both-5  4967.15 (   0.00%) 5794.13 * -16.65%*
Amean fault-both-7  7139.19 (   0.00%) 7663.09 (  -7.34%)
Amean fault-both-12   11326.30 (   0.00%)   10983.36 (   3.03%)
Amean fault-both-18   16270.70 (   0.00%)   13602.71 *  16.40%*
Amean fault-both-24   19839.65 (   0.00%)   16145.77 *  18.62%*
Amean fault-both-30   21707.05 (   0.00%)   19753.82 (   9.00%)
Amean fault-both-32   21968.16 (   0.00%)   20616.16 (   6.15%)

   4.20.0 4.20.0
   failfast-v2r15  findmig-v2r15
Percentage huge-1 0.00 (   0.00%)0.00 (   0.00%)
Percentage huge-3    84.62 (   0.00%)   90.58 (   7.05%)
Percentage huge-5    88.43 (   0.00%)   91.34 (   3.29%)
Percentage huge-7    88.33 (   0.00%)   92.21 (   4.39%)
Percentage huge-12   88.74 (   0.00%)   92.48 (   4.21%)
Percentage huge-18   86.52 (   0.00%)   91.65 (   5.93%)
Percentage huge-24   86.42 (   0.00%)   90.23 (   4.41%)
Percentage huge-30   86.67 (   0.00%)   90.17 (   4.04%)
Percentage huge-32   86.00 (   0.00%)   89.72 (   4.32%)

This shows an improvement in allocation latencies and a slight increase
in allocation success rates. While not presented, there was a 13% reduction
in migration scanning and a 10% reduction on system CPU usage. A 2-socket
machine showed similar benefits.

Signed-off-by: Mel Gorman 
---
 mm/compaction.c | 179 +++-
 mm/internal.h   |   2 +
 2 files changed, 179 insertions(+), 2 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 8f0ce44dba41..137e32e8a2f5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1050,6 +1050,12 @@ static bool suitable_migration_target(struct 
compact_control *cc,
return false;
 }
 
+static inline unsigned int
+freelist_scan_limit(struct compact_control *cc)
+{
+   return (COMPACT_CLUSTER_MAX >> cc->fast_search_fail) + 1;
+}
+
 /*
  * Test whether the free scanner has reached the same or lower pageblock than
  * the migration scanner, and compaction should thus terminate.
@@ -1060,6 +1066,19 @@ static inline bool compact_scanners_met(struct 
compact_control *cc)
<= (cc->migrate_pfn >> pageblock_order);
 }
 
+/* Reorder the free list to reduce repeated future searches */
+static void
+move_freelist_tail(struct list_head *freelist, struct page *freepage)
+{
+   LIST_HEAD(sublist);
+
+   if (!list_is_last(freelist, &freepage->lru)) {
+   list_cut_position(&sublist, freelist, &freepage->lru);
+   if (!list_empty(&sublist))
+   list_splice_tail(&sublist, freelist);
+   }
+}
+
 /*
  * Based on information in the current compact_control, find blocks
  * suitable for isolating free pages from and then isolate them.
@@ -1217,6 +1236,160 @@ typedef enum {
  */
 int sysctl_compact_unevictable_allowed __read_mostly = 1;
 
+static inline void
+update_fast_start_pfn(struct compact_control *cc, unsigned long pfn)
+{
+   if (cc->fast_start_pfn == ULONG_MAX)
+   return;
+
+   if (!cc->fast_start_pfn)
+   cc->fast_start_pfn = pfn;
+
+   cc->fast_start_pfn = min(cc->fast_start_pfn, pfn);
+}
+
+static inline void
+reinit_migrate_pfn(struct compact_control *cc)
+{
+   if (!cc->fast_start_pfn || cc->fast_start_pfn == ULONG_MAX)
+   

[PATCH 12/25] mm, compaction: Keep migration source private to a single compaction instance

2019-01-04 Thread Mel Gorman
Due to either a fast search of the free list or a linear scan, it is
possible for multiple compaction instances to pick the same pageblock
for migration.  This is lucky for one scanner and increases scanning for
all the others. It also allows a race between requests over which one first
allocates the resulting free block.

This patch tests and updates the pageblock skip for the migration scanner
carefully. When isolating a block, it will check and skip if the block is
already in use. Once the zone lock is acquired, it will be rechecked so
that only one scanner can set the pageblock skip for exclusive use. Any
scanner contending will continue with a linear scan. The skip bit is
still set if no pages can be isolated in a range. While this may result
in redundant scanning, it avoids unnecessarily acquiring the zone lock
when there are no suitable migration sources.

1-socket thpscale
4.20.0 4.20.0
 findmig-v2r15  isolmig-v2r15
Amean fault-both-1 0.00 (   0.00%)0.00 *   0.00%*
Amean fault-both-3  3505.69 (   0.00%) 3066.68 *  12.52%*
Amean fault-both-5  5794.13 (   0.00%) 4298.49 *  25.81%*
Amean fault-both-7  7663.09 (   0.00%) 5986.99 *  21.87%*
Amean fault-both-12   10983.36 (   0.00%) 9324.85 (  15.10%)
Amean fault-both-18   13602.71 (   0.00%)   13350.05 (   1.86%)
Amean fault-both-24   16145.77 (   0.00%)   13491.77 *  16.44%*
Amean fault-both-30   19753.82 (   0.00%)   15630.86 *  20.87%*
Amean fault-both-32   20616.16 (   0.00%)   17428.50 *  15.46%*

This is the first patch that shows a significant reduction in latency as
multiple compaction scanners do not operate on the same blocks. There is
a small increase in the success rate

   4.20.0-rc6 4.20.0-rc6
 findmig-v1r4   isolmig-v1r4
Percentage huge-3    90.58 (   0.00%)   95.84 (   5.81%)
Percentage huge-5    91.34 (   0.00%)   94.19 (   3.12%)
Percentage huge-7    92.21 (   0.00%)   93.78 (   1.71%)
Percentage huge-12   92.48 (   0.00%)   94.33 (   2.00%)
Percentage huge-18   91.65 (   0.00%)   94.15 (   2.72%)
Percentage huge-24   90.23 (   0.00%)   94.23 (   4.43%)
Percentage huge-30   90.17 (   0.00%)   95.17 (   5.54%)
Percentage huge-32   89.72 (   0.00%)   93.59 (   4.32%)

Compaction migrate scanned    54168306    25516488
Compaction free scanned      800530954    87603321

Migration scan rates are reduced by 52%.

Signed-off-by: Mel Gorman 
---
 mm/compaction.c | 126 
 1 file changed, 99 insertions(+), 27 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 137e32e8a2f5..24e3a9db4b70 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -279,13 +279,52 @@ void reset_isolation_suitable(pg_data_t *pgdat)
}
 }
 
+/*
+ * Sets the pageblock skip bit if it was clear. Note that this is a hint as
+ * locks are not required for read/writers. Returns true if it was already set.
+ */
+static bool test_and_set_skip(struct compact_control *cc, struct page *page,
+   unsigned long pfn)
+{
+   bool skip;
+
+   /* Do no update if skip hint is being ignored */
+   if (cc->ignore_skip_hint)
+   return false;
+
+   if (!IS_ALIGNED(pfn, pageblock_nr_pages))
+   return false;
+
+   skip = get_pageblock_skip(page);
+   if (!skip && !cc->no_set_skip_hint)
+   set_pageblock_skip(page);
+
+   return skip;
+}
+
+static void update_cached_migrate(struct compact_control *cc, unsigned long pfn)
+{
+   struct zone *zone = cc->zone;
+
+   pfn = pageblock_end_pfn(pfn);
+
+   /* Set for isolation rather than compaction */
+   if (cc->no_set_skip_hint)
+   return;
+
+   if (pfn > zone->compact_cached_migrate_pfn[0])
+   zone->compact_cached_migrate_pfn[0] = pfn;
+   if (cc->mode != MIGRATE_ASYNC &&
+   pfn > zone->compact_cached_migrate_pfn[1])
+   zone->compact_cached_migrate_pfn[1] = pfn;
+}
+
 /*
  * If no pages were isolated then mark this pageblock to be skipped in the
  * future. The information is later cleared by __reset_isolation_suitable().
  */
 static void update_pageblock_skip(struct compact_control *cc,
-   struct page *page, unsigned long nr_isolated,
-   bool migrate_scanner)
+   struct page *page, unsigned long nr_isolated)
 {
struct zone *zone = cc->zone;
unsigned long pfn;
@@ -304,16 +343,8 @@ static void update_pageblock_skip(struct compact_control 
*cc,
pfn = page_to_pfn(page);
 
/* Update where async and sync compaction should restart */
-   if (migrate_scanner) {
-   if (pfn 

[PATCH 06/25] mm, compaction: Skip pageblocks with reserved pages

2019-01-04 Thread Mel Gorman
Reserved pages are set at boot time, tend to be clustered and almost never
become unreserved. When isolating pages for either migration sources or
targets, skip the entire pageblock if one PageReserved page is encountered
on the grounds that it is highly probable the entire pageblock is reserved.

The performance impact is relative to the number of reserved pages in
the system and their location so it'll be variable but intuitively it
should make sense. If the memblock allocator was ever changed to spread
reserved pages throughout the address space then this patch would be
impaired but it would also be considered a bug given that such a change
would ruin fragmentation.

On both 1-socket and 2-socket machines, scan rates are reduced slightly
on workloads that intensively allocate THP while the system is fragmented.

Signed-off-by: Mel Gorman 
---
 mm/compaction.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
index 3afa4e9188b6..94d1e5b062ea 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -484,6 +484,15 @@ static unsigned long isolate_freepages_block(struct 
compact_control *cc,
goto isolate_fail;
}
 
+   /*
+* A reserved page is never freed and tend to be clustered in
+* the same pageblock. Skip the block.
+*/
+   if (PageReserved(page)) {
+   blockpfn = end_pfn;
+   break;
+   }
+
if (!PageBuddy(page))
goto isolate_fail;
 
@@ -827,6 +836,13 @@ isolate_migratepages_block(struct compact_control *cc, 
unsigned long low_pfn,
goto isolate_success;
}
 
+   /*
+* A reserved page is never freed and tend to be
+* clustered in the same pageblocks. Skip the block.
+*/
+   if (PageReserved(page))
+   low_pfn = end_pfn;
+
goto isolate_fail;
}
 
-- 
2.16.4



[PATCH 10/25] mm, compaction: Ignore the fragmentation avoidance boost for isolation and compaction

2019-01-04 Thread Mel Gorman
When pageblocks get fragmented, watermarks are artificially boosted to
reclaim pages to avoid further fragmentation events. However, compaction
is often either fragmentation-neutral or moving movable pages away from
unmovable/reclaimable pages. As the true watermarks are preserved, allow
compaction to ignore the boost factor.

The expected impact is very slight as the main benefit is that compaction
is slightly more likely to succeed when the system has been fragmented
very recently. On both 1-socket and 2-socket machines for THP-intensive
allocation during fragmentation the success rate was increased by less
than 1% which is marginal. However, detailed tracing indicated that
failure of migration due to a premature ENOMEM triggered by watermark
checks were eliminated.

Signed-off-by: Mel Gorman 
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 57ba9d1da519..05c9a81d54ed 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2958,7 +2958,7 @@ int __isolate_free_page(struct page *page, unsigned int 
order)
 * watermark, because we already know our high-order page
 * exists.
 */
-   watermark = min_wmark_pages(zone) + (1UL << order);
+   watermark = zone->_watermark[WMARK_MIN] + (1UL << order);
if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA))
return 0;
 
-- 
2.16.4



[PATCH 08/25] mm, compaction: Always finish scanning of a full pageblock

2019-01-04 Thread Mel Gorman
When compaction is finishing, it uses a flag to ensure the pageblock
is complete but it makes sense to always complete migration of a
pageblock. Minimally, skip information is based on a pageblock and
partially scanned pageblocks may incur more scanning in the future. The
pageblock skip handling also becomes more strict later in the series and
the hint is more useful if a complete pageblock was always scanned.

This potentially impacts latency as more scanning is done, but it's not a
consistent win or loss as the scanning is not always a high percentage
of the pageblock and sometimes it is offset by future reductions
in scanning. Hence, the results are not presented this time due to a
misleading mix of gains/losses without any clear pattern. However, full
scanning of the pageblock is important for later patches.

Signed-off-by: Mel Gorman 
Acked-by: Vlastimil Babka 
---
 mm/compaction.c | 19 ++++++++-----------
 mm/internal.h   |  1 -
 2 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 94d1e5b062ea..8bf2090231a3 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1347,16 +1347,14 @@ static enum compact_result __compact_finished(struct 
compact_control *cc)
if (is_via_compact_memory(cc->order))
return COMPACT_CONTINUE;
 
-   if (cc->finishing_block) {
-   /*
-* We have finished the pageblock, but better check again that
-* we really succeeded.
-*/
-   if (IS_ALIGNED(cc->migrate_pfn, pageblock_nr_pages))
-   cc->finishing_block = false;
-   else
-   return COMPACT_CONTINUE;
-   }
+   /*
+* Always finish scanning a pageblock to reduce the possibility of
+* fallbacks in the future. This is particularly important when
+* migration source is unmovable/reclaimable but it's not worth
+* special casing.
+*/
+   if (!IS_ALIGNED(cc->migrate_pfn, pageblock_nr_pages))
+   return COMPACT_CONTINUE;
 
/* Direct compactor: Is a suitable page free? */
for (order = cc->order; order < MAX_ORDER; order++) {
@@ -1398,7 +1396,6 @@ static enum compact_result __compact_finished(struct compact_control *cc)
return COMPACT_SUCCESS;
}
 
-   cc->finishing_block = true;
return COMPACT_CONTINUE;
}
}
diff --git a/mm/internal.h b/mm/internal.h
index c6f794ad21a9..edb4029f64c8 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -202,7 +202,6 @@ struct compact_control {
bool direct_compaction; /* False from kcompactd or /proc/... */
bool whole_zone;/* Whole zone should/has been scanned */
bool contended; /* Signal lock or sched contention */
-   bool finishing_block;   /* Finishing current pageblock */
 };
 
 unsigned long
-- 
2.16.4



[PATCH 09/25] mm, compaction: Use the page allocator bulk-free helper for lists of pages

2019-01-04 Thread Mel Gorman
release_pages() is a simpler version of free_unref_page_list() but it
tracks the highest PFN for caching the restart point of the compaction
free scanner. This patch optionally tracks the highest PFN in the core
helper and converts compaction to use it. The performance impact is
limited but it should reduce lock contention slightly in some cases.
The main benefit is removing some partially duplicated code.

Signed-off-by: Mel Gorman 
---
 include/linux/gfp.h |  7 ++-
 mm/compaction.c | 12 +++-
 mm/page_alloc.c | 10 +-
 3 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 5f5e25fd6149..9e58799b730f 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -543,7 +543,12 @@ void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask);
 extern void __free_pages(struct page *page, unsigned int order);
 extern void free_pages(unsigned long addr, unsigned int order);
 extern void free_unref_page(struct page *page);
-extern void free_unref_page_list(struct list_head *list);
+extern void __free_page_list(struct list_head *list, bool dropref, unsigned long *highest_pfn);
+
+static inline void free_unref_page_list(struct list_head *list)
+{
+   return __free_page_list(list, false, NULL);
+}
 
 struct page_frag_cache;
 extern void __page_frag_cache_drain(struct page *page, unsigned int count);
diff --git a/mm/compaction.c b/mm/compaction.c
index 8bf2090231a3..8f0ce44dba41 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -52,16 +52,10 @@ static inline void count_compact_events(enum vm_event_item item, long delta)
 
 static unsigned long release_freepages(struct list_head *freelist)
 {
-   struct page *page, *next;
-   unsigned long high_pfn = 0;
+   unsigned long high_pfn;
 
-   list_for_each_entry_safe(page, next, freelist, lru) {
-   unsigned long pfn = page_to_pfn(page);
-   list_del(&page->lru);
-   __free_page(page);
-   if (pfn > high_pfn)
-   high_pfn = pfn;
-   }
+   __free_page_list(freelist, true, &high_pfn);
+   INIT_LIST_HEAD(freelist);
 
return high_pfn;
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cde5dac6229a..57ba9d1da519 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2876,18 +2876,26 @@ void free_unref_page(struct page *page)
 /*
  * Free a list of 0-order pages
  */
-void free_unref_page_list(struct list_head *list)
+void __free_page_list(struct list_head *list, bool dropref,
+   unsigned long *highest_pfn)
 {
struct page *page, *next;
unsigned long flags, pfn;
int batch_count = 0;
 
+   if (highest_pfn)
+   *highest_pfn = 0;
+
/* Prepare pages for freeing */
list_for_each_entry_safe(page, next, list, lru) {
+   if (dropref)
+   WARN_ON_ONCE(!put_page_testzero(page));
pfn = page_to_pfn(page);
if (!free_unref_page_prepare(page, pfn))
list_del(&page->lru);
set_page_private(page, pfn);
+   if (highest_pfn && pfn > *highest_pfn)
+   *highest_pfn = pfn;
}
 
local_irq_save(flags);
-- 
2.16.4



[PATCH 07/25] mm, migrate: Immediately fail migration of a page with no migration handler

2019-01-04 Thread Mel Gorman
Pages with no migration handler use a fallback handler which sometimes
works and sometimes persistently retries. A historical example was blockdev
pages but there are others such as odd refcounting when page->private
is used.  These are retried multiple times which is wasteful during
compaction so this patch will fail migration faster unless the caller
specifies MIGRATE_SYNC.

This is not expected to help THP allocation success rates but it did
reduce latencies very slightly in some cases.

1-socket thpfioscale
4.20.0 4.20.0
  noreserved-v2r15 failfast-v2r15
Amean fault-both-1         0.00 (   0.00%)        0.00 *   0.00%*
Amean fault-both-3      3839.67 (   0.00%)     3833.72 (   0.15%)
Amean fault-both-5      5177.47 (   0.00%)     4967.15 (   4.06%)
Amean fault-both-7      7245.03 (   0.00%)     7139.19 (   1.46%)
Amean fault-both-12    11534.89 (   0.00%)    11326.30 (   1.81%)
Amean fault-both-18    16241.10 (   0.00%)    16270.70 (  -0.18%)
Amean fault-both-24    19075.91 (   0.00%)    19839.65 (  -4.00%)
Amean fault-both-30    22712.11 (   0.00%)    21707.05 (   4.43%)
Amean fault-both-32    21692.92 (   0.00%)    21968.16 (  -1.27%)

The 2-socket results are not materially different. Scan rates are similar
as expected.

Signed-off-by: Mel Gorman 
Acked-by: Vlastimil Babka 
---
 mm/migrate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 5d1839a9148d..547cc1f3f3bb 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -916,7 +916,7 @@ static int fallback_migrate_page(struct address_space *mapping,
 */
if (page_has_private(page) &&
!try_to_release_page(page, GFP_KERNEL))
-   return -EAGAIN;
+   return mode == MIGRATE_SYNC ? -EAGAIN : -EBUSY;
 
return migrate_page(mapping, newpage, page, mode);
 }
-- 
2.16.4



[PATCH 04/25] mm, compaction: Remove unnecessary zone parameter in some instances

2019-01-04 Thread Mel Gorman
A zone parameter is passed into a number of top-level compaction functions
despite the fact that it's already in compact_control. This is harmless but
it did need an audit to check if zone actually ever changes meaningfully.
This patch removes the parameter in a number of top-level functions. The
change could be much deeper but this was enough to briefly clarify the
flow.

No functional change.

Signed-off-by: Mel Gorman 
---
 mm/compaction.c | 54 ++
 1 file changed, 26 insertions(+), 28 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index fb4d9f52ed56..7acb43f07303 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1300,8 +1300,7 @@ static inline bool is_via_compact_memory(int order)
return order == -1;
 }
 
-static enum compact_result __compact_finished(struct zone *zone,
-   struct compact_control *cc)
+static enum compact_result __compact_finished(struct compact_control *cc)
 {
unsigned int order;
const int migratetype = cc->migratetype;
@@ -1312,7 +1311,7 @@ static enum compact_result __compact_finished(struct zone *zone,
/* Compaction run completes if the migrate and free scanner meet */
if (compact_scanners_met(cc)) {
/* Let the next compaction start anew. */
-   reset_cached_positions(zone);
+   reset_cached_positions(cc->zone);
 
/*
 * Mark that the PG_migrate_skip information should be cleared
@@ -1321,7 +1320,7 @@ static enum compact_result __compact_finished(struct zone *zone,
 * based on an allocation request.
 */
if (cc->direct_compaction)
-   zone->compact_blockskip_flush = true;
+   cc->zone->compact_blockskip_flush = true;
 
if (cc->whole_zone)
return COMPACT_COMPLETE;
@@ -1345,7 +1344,7 @@ static enum compact_result __compact_finished(struct zone *zone,
 
/* Direct compactor: Is a suitable page free? */
for (order = cc->order; order < MAX_ORDER; order++) {
-   struct free_area *area = &zone->free_area[order];
+   struct free_area *area = &cc->zone->free_area[order];
bool can_steal;
 
/* Job done if page is free of the right migratetype */
@@ -1391,13 +1390,12 @@ static enum compact_result __compact_finished(struct zone *zone,
return COMPACT_NO_SUITABLE_PAGE;
 }
 
-static enum compact_result compact_finished(struct zone *zone,
-   struct compact_control *cc)
+static enum compact_result compact_finished(struct compact_control *cc)
 {
int ret;
 
-   ret = __compact_finished(zone, cc);
-   trace_mm_compaction_finished(zone, cc->order, ret);
+   ret = __compact_finished(cc);
+   trace_mm_compaction_finished(cc->zone, cc->order, ret);
if (ret == COMPACT_NO_SUITABLE_PAGE)
ret = COMPACT_CONTINUE;
 
@@ -1524,16 +1522,16 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
return false;
 }
 
-static enum compact_result compact_zone(struct zone *zone, struct compact_control *cc)
+static enum compact_result compact_zone(struct compact_control *cc)
 {
enum compact_result ret;
-   unsigned long start_pfn = zone->zone_start_pfn;
-   unsigned long end_pfn = zone_end_pfn(zone);
+   unsigned long start_pfn = cc->zone->zone_start_pfn;
+   unsigned long end_pfn = zone_end_pfn(cc->zone);
unsigned long last_migrated_pfn;
const bool sync = cc->mode != MIGRATE_ASYNC;
 
cc->migratetype = gfpflags_to_migratetype(cc->gfp_mask);
-   ret = compaction_suitable(zone, cc->order, cc->alloc_flags,
+   ret = compaction_suitable(cc->zone, cc->order, cc->alloc_flags,
cc->classzone_idx);
/* Compaction is likely to fail */
if (ret == COMPACT_SUCCESS || ret == COMPACT_SKIPPED)
@@ -1546,8 +1544,8 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 * Clear pageblock skip if there were failures recently and compaction
 * is about to be retried after being deferred.
 */
-   if (compaction_restarting(zone, cc->order))
-   __reset_isolation_suitable(zone);
+   if (compaction_restarting(cc->zone, cc->order))
+   __reset_isolation_suitable(cc->zone);
 
/*
 * Setup to move all movable pages to the end of the zone. Used cached
@@ -1559,16 +1557,16 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
cc->migrate_pfn = start_pfn;
cc->free_pfn = pageblock_start_pfn(end_pfn - 1);
} else {
-   cc->migrate_pfn = zone->compact_cached_migrate_pfn[sync];
-   cc->free_pfn = zone->compact_cached_free_pfn;
+ 

[PATCH 05/25] mm, compaction: Rename map_pages to split_map_pages

2019-01-04 Thread Mel Gorman
From the function name alone, it's non-obvious that high-order free pages
are split into order-0 pages. Fix it.

Signed-off-by: Mel Gorman 
---
 mm/compaction.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 7acb43f07303..3afa4e9188b6 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -66,7 +66,7 @@ static unsigned long release_freepages(struct list_head *freelist)
return high_pfn;
 }
 
-static void map_pages(struct list_head *list)
+static void split_map_pages(struct list_head *list)
 {
unsigned int i, order, nr_pages;
struct page *page, *next;
@@ -644,7 +644,7 @@ isolate_freepages_range(struct compact_control *cc,
}
 
/* __isolate_free_page() does not map the pages */
-   map_pages(&freelist);
+   split_map_pages(&freelist);
 
if (pfn < end_pfn) {
/* Loop terminated early, cleanup. */
@@ -1141,7 +1141,7 @@ static void isolate_freepages(struct compact_control *cc)
}
 
/* __isolate_free_page() does not map the pages */
-   map_pages(freelist);
+   split_map_pages(freelist);
 
/*
 * Record where the free scanner will restart next time. Either we
-- 
2.16.4



[PATCH 02/25] mm, compaction: Rearrange compact_control

2019-01-04 Thread Mel Gorman
compact_control spans two cache lines with write-intensive lines on
both. Rearrange so the most write-intensive fields are in the same
cache line. This has a negligible impact on the overall performance of
compaction and is more a tidying exercise than anything.

Signed-off-by: Mel Gorman 
Acked-by: Vlastimil Babka 
---
 mm/internal.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 5ddf5d3771a0..9437ba5791db 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -183,14 +183,14 @@ extern int user_min_free_kbytes;
 struct compact_control {
struct list_head freepages; /* List of free pages to migrate to */
struct list_head migratepages;  /* List of pages being migrated */
-   struct zone *zone;
unsigned int nr_freepages;  /* Number of isolated free pages */
unsigned int nr_migratepages;   /* Number of pages to migrate */
-   unsigned long total_migrate_scanned;
-   unsigned long total_free_scanned;
unsigned long free_pfn; /* isolate_freepages search base */
unsigned long migrate_pfn;  /* isolate_migratepages search base */
unsigned long last_migrated_pfn;/* Not yet flushed page being freed */
+   struct zone *zone;
+   unsigned long total_migrate_scanned;
+   unsigned long total_free_scanned;
const gfp_t gfp_mask;   /* gfp mask of a direct compactor */
int order;  /* order a direct compactor needs */
int migratetype;/* migratetype of direct compactor */
-- 
2.16.4



[PATCH 00/25] Increase success rates and reduce latency of compaction v2

2019-01-04 Thread Mel Gorman
This series reduces scan rates and improves success rates of compaction, primarily
by using the free lists to shorten scans, better controlling of skip
information and whether multiple scanners can target the same block and
capturing pageblocks before being stolen by parallel requests. The series
is based on the 4.21/5.0 merge window after Andrew's tree had been merged.
It's known to rebase cleanly.

Primarily I'm using thpscale to measure the impact of the series. The
benchmark creates a large file, maps it, faults it, punches holes in the
mapping so that the virtual address space is fragmented and then tries
to allocate THP. It re-executes for different numbers of threads. From a
fragmentation perspective, the workload is relatively benign but it does
stress compaction.

The overall impact on latencies for a 1-socket machine is

  baselinepatches
Amean fault-both-3      5362.80 (   0.00%)     4446.89 *  17.08%*
Amean fault-both-5      9488.75 (   0.00%)     5660.86 *  40.34%*
Amean fault-both-7     11909.86 (   0.00%)     8549.63 *  28.21%*
Amean fault-both-12    16185.09 (   0.00%)    11508.36 *  28.90%*
Amean fault-both-18    12057.72 (   0.00%)    19013.48 * -57.69%*
Amean fault-both-24    23939.95 (   0.00%)    19676.16 *  17.81%*
Amean fault-both-30    26606.14 (   0.00%)    27363.23 (  -2.85%)
Amean fault-both-32    31677.12 (   0.00%)    23154.09 *  26.91%*

While there is a glitch at the 18-thread mark, it's known that the base
page allocation latency was much lower and huge pages were taking
longer -- partially due to a high allocation success rate.

The allocation success rates are much improved

 baselinepatches
Percentage huge-3     70.93 (   0.00%)      98.30 (  38.60%)
Percentage huge-5     56.02 (   0.00%)      83.36 (  48.81%)
Percentage huge-7     60.98 (   0.00%)      89.04 (  46.01%)
Percentage huge-12    73.02 (   0.00%)      94.36 (  29.23%)
Percentage huge-18    94.37 (   0.00%)      95.87 (   1.58%)
Percentage huge-24    84.95 (   0.00%)      97.41 (  14.67%)
Percentage huge-30    83.63 (   0.00%)      96.69 (  15.62%)
Percentage huge-32    81.69 (   0.00%)      96.10 (  17.65%)

That's a nearly perfect allocation success rate.

The biggest impact is on the scan rates

Compaction migrate scanned     106520811     26934599
Compaction free scanned       4180735040     26584944

The number of pages scanned for migration was reduced by 74% and the
free scanner was reduced by 99.36%. So much less work in exchange
for lower latency and better success rates.

The series was also evaluated using a workload that heavily fragments
memory but the benefits there are also significant, albeit not presented.

It was commented that we should be rethinking scanning entirely and to
a large extent I agree. However, to achieve that you need a lot of this
series in place first so it's best to make the linear scanners as best
as possible before ripping them out.

 include/linux/compaction.h |3 +-
 include/linux/gfp.h|7 +-
 include/linux/mmzone.h |2 +
 include/linux/sched.h  |4 +
 kernel/sched/core.c|3 +
 mm/compaction.c| 1031 ++--
 mm/internal.h  |   23 +-
 mm/migrate.c   |2 +-
 mm/page_alloc.c|   70 ++-
 9 files changed, 908 insertions(+), 237 deletions(-)

-- 
2.16.4



[PATCH 01/25] mm, compaction: Shrink compact_control

2019-01-04 Thread Mel Gorman
The isolate and migrate scanners should never isolate more than a pageblock
of pages so unsigned int is sufficient saving 8 bytes on a 64-bit build.

Signed-off-by: Mel Gorman 
Acked-by: Vlastimil Babka 
---
 mm/internal.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index f4a7bb02decf..5ddf5d3771a0 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -184,8 +184,8 @@ struct compact_control {
struct list_head freepages; /* List of free pages to migrate to */
struct list_head migratepages;  /* List of pages being migrated */
struct zone *zone;
-   unsigned long nr_freepages; /* Number of isolated free pages */
-   unsigned long nr_migratepages;  /* Number of pages to migrate */
+   unsigned int nr_freepages;  /* Number of isolated free pages */
+   unsigned int nr_migratepages;   /* Number of pages to migrate */
unsigned long total_migrate_scanned;
unsigned long total_free_scanned;
unsigned long free_pfn; /* isolate_freepages search base */
-- 
2.16.4



[PATCH 03/25] mm, compaction: Remove last_migrated_pfn from compact_control

2019-01-04 Thread Mel Gorman
The last_migrated_pfn field is a bit dubious as to whether it really helps
but either way, the information from it can be inferred without increasing
the size of compact_control so remove the field.

Signed-off-by: Mel Gorman 
Acked-by: Vlastimil Babka 
---
 mm/compaction.c | 25 +
 mm/internal.h   |  1 -
 2 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index ef29490b0f46..fb4d9f52ed56 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -886,15 +886,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
cc->nr_migratepages++;
nr_isolated++;
 
-   /*
-* Record where we could have freed pages by migration and not
-* yet flushed them to buddy allocator.
-* - this is the lowest page that was isolated and likely be
-* then freed by migration.
-*/
-   if (!cc->last_migrated_pfn)
-   cc->last_migrated_pfn = low_pfn;
-
/* Avoid isolating too much */
if (cc->nr_migratepages == COMPACT_CLUSTER_MAX) {
++low_pfn;
@@ -918,7 +909,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
}
putback_movable_pages(&cc->migratepages);
cc->nr_migratepages = 0;
-   cc->last_migrated_pfn = 0;
nr_isolated = 0;
}
 
@@ -1539,6 +1529,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
enum compact_result ret;
unsigned long start_pfn = zone->zone_start_pfn;
unsigned long end_pfn = zone_end_pfn(zone);
+   unsigned long last_migrated_pfn;
const bool sync = cc->mode != MIGRATE_ASYNC;
 
cc->migratetype = gfpflags_to_migratetype(cc->gfp_mask);
@@ -1584,7 +1575,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
cc->whole_zone = true;
}
 
-   cc->last_migrated_pfn = 0;
+   last_migrated_pfn = 0;
 
trace_mm_compaction_begin(start_pfn, cc->migrate_pfn,
cc->free_pfn, end_pfn, sync);
@@ -1593,12 +1584,14 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 
while ((ret = compact_finished(zone, cc)) == COMPACT_CONTINUE) {
int err;
+   unsigned long start_pfn = cc->migrate_pfn;
 
switch (isolate_migratepages(zone, cc)) {
case ISOLATE_ABORT:
ret = COMPACT_CONTENDED;
putback_movable_pages(&cc->migratepages);
cc->nr_migratepages = 0;
+   last_migrated_pfn = 0;
goto out;
case ISOLATE_NONE:
/*
@@ -1608,6 +1601,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 */
goto check_drain;
case ISOLATE_SUCCESS:
+   last_migrated_pfn = start_pfn;
;
}
 
@@ -1639,8 +1633,7 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
cc->migrate_pfn = block_end_pfn(
cc->migrate_pfn - 1, cc->order);
/* Draining pcplists is useless in this case */
-   cc->last_migrated_pfn = 0;
-
+   last_migrated_pfn = 0;
}
}
 
@@ -1652,18 +1645,18 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 * compact_finished() can detect immediately if allocation
 * would succeed.
 */
-   if (cc->order > 0 && cc->last_migrated_pfn) {
+   if (cc->order > 0 && last_migrated_pfn) {
int cpu;
unsigned long current_block_start =
block_start_pfn(cc->migrate_pfn, cc->order);
 
-   if (cc->last_migrated_pfn < current_block_start) {
+   if (last_migrated_pfn < current_block_start) {
cpu = get_cpu();
lru_add_drain_cpu(cpu);
drain_local_pages(zone);
put_cpu();
/* No more flushing until we migrate again */
-   cc->last_migrated_pfn = 0;
+   last_migrated_pfn = 0;
}
}
 
diff --git a/mm/internal.h b/mm/internal.h
index 

[PATCH 1/4] rtlwifi: rtl8723ae: Take the FW LPS mode handling out

2019-01-04 Thread Bernd Edlinger
This appears to trigger a firmware bug and causes severe
problems with rtl8723ae PCI devices.

When the power save mode is activated for longer periods
of time the firmware stops to receive any packets.

This problem was exposed by commit 873ffe154ae0 ("rtlwifi:
Fix logic error in enter/exit power-save mode").

Previously the power save mode was only active rarely and
only for a short time so that the problem was not noticeable.

Signed-off-by: Bernd Edlinger 
---
  .../net/wireless/realtek/rtlwifi/rtl8723ae/fw.c| 20 
  .../net/wireless/realtek/rtlwifi/rtl8723ae/fw.h|  1 -
  .../net/wireless/realtek/rtlwifi/rtl8723ae/hw.c| 56 +-
  3 files changed, 1 insertion(+), 76 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/fw.c b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/fw.c
index bf9859f..77833fb 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/fw.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/fw.c
@@ -268,26 +268,6 @@ void rtl8723e_fill_h2c_cmd(struct ieee80211_hw *hw,
   (u8 *)&tmp_cmdbuf);
  }
  
-void rtl8723e_set_fw_pwrmode_cmd(struct ieee80211_hw *hw, u8 mode)
-{
-   struct rtl_priv *rtlpriv = rtl_priv(hw);
-   u8 u1_h2c_set_pwrmode[3] = { 0 };
-   struct rtl_ps_ctl *ppsc = rtl_psc(rtl_priv(hw));
-
-   RT_TRACE(rtlpriv, COMP_POWER, DBG_LOUD, "FW LPS mode = %d\n", mode);
-
-   SET_H2CCMD_PWRMODE_PARM_MODE(u1_h2c_set_pwrmode, mode);
-   SET_H2CCMD_PWRMODE_PARM_SMART_PS(u1_h2c_set_pwrmode,
-   (rtlpriv->mac80211.p2p) ? ppsc->smart_ps : 1);
-   SET_H2CCMD_PWRMODE_PARM_BCN_PASS_TIME(u1_h2c_set_pwrmode,
- ppsc->reg_max_lps_awakeintvl);
-
-   RT_PRINT_DATA(rtlpriv, COMP_CMD, DBG_DMESG,
- "rtl8723e_set_fw_rsvdpagepkt(): u1_h2c_set_pwrmode\n",
- u1_h2c_set_pwrmode, 3);
-   rtl8723e_fill_h2c_cmd(hw, H2C_SETPWRMODE, 3, u1_h2c_set_pwrmode);
-}
-
  #define BEACON_PG 0 /* ->1 */
  #define PSPOLL_PG 2
  #define NULL_PG   3
diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/fw.h b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/fw.h
index 2e668fc..8618b82 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/fw.h
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/fw.h
@@ -55,7 +55,6 @@
  
  void rtl8723e_fill_h2c_cmd(struct ieee80211_hw *hw, u8 element_id,
   u32 cmd_len, u8 *p_cmdbuffer);
-void rtl8723e_set_fw_pwrmode_cmd(struct ieee80211_hw *hw, u8 mode);
  void rtl8723e_set_fw_rsvdpagepkt(struct ieee80211_hw *hw, bool b_dl_finished);
  void rtl8723e_set_fw_joinbss_report_cmd(struct ieee80211_hw *hw, u8 mstatus);
  void rtl8723e_set_p2p_ps_offload_cmd(struct ieee80211_hw *hw, u8 
p2p_ps_state);
diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/hw.c b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/hw.c
index f783e4a..f0eb356 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/hw.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/hw.c
@@ -408,29 +408,7 @@ void rtl8723e_set_hw_reg(struct ieee80211_hw *hw, u8 variable, u8 *val)
case HW_VAR_WPA_CONFIG:
rtl_write_byte(rtlpriv, REG_SECCFG, *((u8 *)val));
break;
-   case HW_VAR_SET_RPWM:{
-   u8 rpwm_val;
-
-   rpwm_val = rtl_read_byte(rtlpriv, REG_PCIE_HRPWM);
-   udelay(1);
-
-   if (rpwm_val & BIT(7)) {
-   rtl_write_byte(rtlpriv, REG_PCIE_HRPWM,
-  (*(u8 *)val));
-   } else {
-   rtl_write_byte(rtlpriv, REG_PCIE_HRPWM,
-  ((*(u8 *)val) | BIT(7)));
-   }
-
-   break;
-   }
case HW_VAR_H2C_FW_PWRMODE:{
-   u8 psmode = (*(u8 *)val);
-
-   if (psmode != FW_PS_ACTIVE_MODE)
-   rtl8723e_dm_rf_saving(hw, true);
-
-   rtl8723e_set_fw_pwrmode_cmd(hw, (*(u8 *)val));
break;
}
case HW_VAR_FW_PSMODE_STATUS:
@@ -513,39 +491,7 @@ void rtl8723e_set_hw_reg(struct ieee80211_hw *hw, u8 variable, u8 *val)
break;
}
case HW_VAR_FW_LPS_ACTION:{
-   bool b_enter_fwlps = *((bool *)val);
-   u8 rpwm_val, fw_pwrmode;
-   bool fw_current_inps;
-
-   if (b_enter_fwlps) {
-   rpwm_val = 0x02;/* RF off */
-   fw_current_inps = true;
-   rtlpriv->cfg->ops->set_hw_reg(hw,
-   HW_VAR_FW_PSMODE_STATUS,
-   

Re: [PATCH] scsi/dpt_i2o.c: Use dma_zalloc_coherent

2019-01-04 Thread Sabyasachi Gupta
On Wed, Dec 19, 2018 at 6:41 PM Sabyasachi Gupta
 wrote:
>
> On Sat, Dec 1, 2018 at 6:45 PM Sabyasachi Gupta
>  wrote:
> >
> > On Sat, Nov 17, 2018 at 6:17 PM Sabyasachi Gupta
> >  wrote:
> > >
> > > Replaced dma_alloc_coherent + memset with dma_zalloc_coherent
> > >
> > > Signed-off-by: Sabyasachi Gupta 
> >
> > Any comment on this patch?
>
> Any comment on this patch?
>
Any comment on this patch?

> >
> > > ---
> > >  drivers/scsi/dpt_i2o.c | 12 
> > >  1 file changed, 4 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/drivers/scsi/dpt_i2o.c b/drivers/scsi/dpt_i2o.c
> > > index 37de8fb..056383a 100644
> > > --- a/drivers/scsi/dpt_i2o.c
> > > +++ b/drivers/scsi/dpt_i2o.c
> > > @@ -1370,13 +1370,12 @@ static s32 adpt_i2o_reset_hba(adpt_hba* pHba)
> > > schedule_timeout_uninterruptible(1);
> > > } while (m == EMPTY_QUEUE);
> > >
> > > -   status = dma_alloc_coherent(&pHba->pDev->dev, 4, &addr, GFP_KERNEL);
> > > +   status = dma_zalloc_coherent(&pHba->pDev->dev, 4, &addr, GFP_KERNEL);
> > > if(status == NULL) {
> > > adpt_send_nop(pHba, m);
> > > printk(KERN_ERR"IOP reset failed - no free memory.\n");
> > > return -ENOMEM;
> > > }
> > > -   memset(status,0,4);
> > >
> > > msg[0]=EIGHT_WORD_MSG_SIZE|SGL_OFFSET_0;
> > > msg[1]=I2O_CMD_ADAPTER_RESET<<24|HOST_TID<<12|ADAPTER_TID;
> > > @@ -2836,14 +2835,13 @@ static s32 adpt_i2o_init_outbound_q(adpt_hba* pHba)
> > >
> > > msg=(u32 __iomem *)(pHba->msg_addr_virt+m);
> > >
> > > -   status = dma_alloc_coherent(&pHba->pDev->dev, 4, &addr, GFP_KERNEL);
> > > +   status = dma_zalloc_coherent(&pHba->pDev->dev, 4, &addr, GFP_KERNEL);
> > > if (!status) {
> > > adpt_send_nop(pHba, m);
> > > printk(KERN_WARNING"%s: IOP reset failed - no free memory.\n",
> > > pHba->name);
> > > return -ENOMEM;
> > > }
> > > -   memset(status, 0, 4);
> > >
> > > writel(EIGHT_WORD_MSG_SIZE| SGL_OFFSET_6, &msg[0]);
> > > writel(I2O_CMD_OUTBOUND_INIT<<24 | HOST_TID<<12 | ADAPTER_TID, &msg[1]);
> > > @@ -2890,14 +2888,13 @@ static s32 adpt_i2o_init_outbound_q(adpt_hba* pHba)
> > > pHba->reply_pool, pHba->reply_pool_pa);
> > > }
> > >
> > > -   pHba->reply_pool = dma_alloc_coherent(&pHba->pDev->dev,
> > > +   pHba->reply_pool = dma_zalloc_coherent(&pHba->pDev->dev,
> > > pHba->reply_fifo_size * REPLY_FRAME_SIZE * 4,
> > > &pHba->reply_pool_pa, GFP_KERNEL);
> > > if (!pHba->reply_pool) {
> > > printk(KERN_ERR "%s: Could not allocate reply pool\n", pHba->name);
> > > return -ENOMEM;
> > > }
> > > -   memset(pHba->reply_pool, 0 , pHba->reply_fifo_size * REPLY_FRAME_SIZE * 4);
> > >
> > > for(i = 0; i < pHba->reply_fifo_size; i++) {
> > > writel(pHba->reply_pool_pa + (i * REPLY_FRAME_SIZE * 4),
> > > @@ -3126,13 +3123,12 @@ static int adpt_i2o_build_sys_table(void)
> > > sys_tbl_len = sizeof(struct i2o_sys_tbl) +  // Header + IOPs
> > > (hba_count) * sizeof(struct i2o_sys_tbl_entry);
> > >
> > > -   sys_tbl = dma_alloc_coherent(&pHba->pDev->dev,
> > > +   sys_tbl = dma_zalloc_coherent(&pHba->pDev->dev,
> > > sys_tbl_len, &sys_tbl_pa, GFP_KERNEL);
> > > if (!sys_tbl) {
> > > printk(KERN_WARNING "SysTab Set failed. Out of memory.\n");
> > > return -ENOMEM;
> > > }
> > > -   memset(sys_tbl, 0, sys_tbl_len);
> > >
> > > sys_tbl->num_entries = hba_count;
> > > sys_tbl->version = I2OVERSION;
> > > --
> > > 2.7.4
> > >


[PATCH 2/4] rtlwifi: rtl8723ae: Don't use dm.undec_sm_pwdb for input

2019-01-04 Thread Bernd Edlinger
  gain control when no beacon was received in the connected state.

This avoids sporadic "Connection to AP ... lost" errors.

Signed-off-by: Bernd Edlinger 
---
  drivers/net/wireless/realtek/rtlwifi/core.c | 2 ++
  drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c | 8 
  2 files changed, 10 insertions(+)

diff --git a/drivers/net/wireless/realtek/rtlwifi/core.c b/drivers/net/wireless/realtek/rtlwifi/core.c
index 4bf7967..ce23339 100644
--- a/drivers/net/wireless/realtek/rtlwifi/core.c
+++ b/drivers/net/wireless/realtek/rtlwifi/core.c
@@ -1957,5 +1957,7 @@ void rtl_dm_diginit(struct ieee80211_hw *hw, u32 cur_igvalue)
dm_digtable->bt30_cur_igi = 0x32;
dm_digtable->pre_cck_pd_state = CCK_PD_STAGE_MAX;
dm_digtable->cur_cck_pd_state = CCK_PD_STAGE_LOWRSSI;
+   dm_digtable->pre_cck_fa_state = 0;
+   dm_digtable->cur_cck_fa_state = 0;
  }
  EXPORT_SYMBOL(rtl_dm_diginit);
diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
index 42a6fba..902b944 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
@@ -151,8 +151,14 @@ static u8 rtl8723e_dm_initial_gain_min_pwdb(struct ieee80211_hw *hw)
  {
struct rtl_priv *rtlpriv = rtl_priv(hw);
struct dig_t *dm_digtable = &rtlpriv->dm_digtable;
+   struct rtl_mac *mac = rtl_mac(rtlpriv);
long rssi_val_min = 0;
  
+   if (mac->link_state == MAC80211_LINKED &&
+   mac->opmode == NL80211_IFTYPE_STATION &&
+   rtlpriv->link_info.bcn_rx_inperiod == 0)
+   return 0;
+
if ((dm_digtable->curmultista_cstate == DIG_MULTISTA_CONNECT) &&
(dm_digtable->cursta_cstate == DIG_STA_CONNECT)) {
if (rtlpriv->dm.entry_min_undec_sm_pwdb != 0)
@@ -417,6 +423,8 @@ static void rtl8723e_dm_cck_packet_detection_thresh(struct ieee80211_hw *hw)
} else {
rtl_set_bbreg(hw, RCCK0_CCA, MASKBYTE2, 0xcd);
rtl_set_bbreg(hw, RCCK0_SYSTEM, MASKBYTE1, 0x47);
+   dm_digtable->pre_cck_fa_state = 0;
+   dm_digtable->cur_cck_fa_state = 0;
  
}
dm_digtable->pre_cck_pd_state = dm_digtable->cur_cck_pd_state;
-- 
1.9.1


[PATCH 4/4] rtlwifi: Move the clearing of

2019-01-04 Thread Bernd Edlinger
  rtlpriv->link_info.num_rx_inperiod in rtl_watchdog_wq_callback a few lines
  down.

This is necessary since it is still used in the "AP off" detection
code block. Moved clearing of rtlpriv->link_info.num_tx_inperiod
as well for consistency.

Signed-off-by: Bernd Edlinger 
---
  drivers/net/wireless/realtek/rtlwifi/base.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/base.c b/drivers/net/wireless/realtek/rtlwifi/base.c
index ef9b502..7aa68fe 100644
--- a/drivers/net/wireless/realtek/rtlwifi/base.c
+++ b/drivers/net/wireless/realtek/rtlwifi/base.c
@@ -2172,8 +2172,6 @@ void rtl_watchdog_wq_callback(void *data)
;
}
  
-   rtlpriv->link_info.num_rx_inperiod = 0;
-   rtlpriv->link_info.num_tx_inperiod = 0;
for (tid = 0; tid <= 7; tid++)
rtlpriv->link_info.tidtx_inperiod[tid] = 0;
  
@@ -2236,6 +2234,8 @@ void rtl_watchdog_wq_callback(void *data)
rtlpriv->btcoexist.btc_info.in_4way = false;
}
  
+   rtlpriv->link_info.num_rx_inperiod = 0;
+   rtlpriv->link_info.num_tx_inperiod = 0;
rtlpriv->link_info.bcn_rx_inperiod = 0;
  
/* <6> scan list */
-- 
1.9.1


Re: [PATCH 2/2] virtio: document virtio_config_ops restrictions

2019-01-04 Thread Cornelia Huck
On Thu, 3 Jan 2019 18:28:49 +0100
Halil Pasic  wrote:

> On Thu,  3 Jan 2019 17:08:04 +0100
> Cornelia Huck  wrote:
> 
> > Some transports (e.g. virtio-ccw) implement virtio operations that
> > seem to be a simple read/write as something more involved that
> > cannot be done from an atomic context.
> > 
> > Give at least a hint about that.
> > 
> > Signed-off-by: Cornelia Huck 
> > ---
> >  include/linux/virtio_config.h | 5 +
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
> > index 7087ef946ba7..987b6491b946 100644
> > --- a/include/linux/virtio_config.h
> > +++ b/include/linux/virtio_config.h
> > @@ -12,6 +12,11 @@ struct irq_affinity;
> >  
> >  /**
> >   * virtio_config_ops - operations for configuring a virtio device
> > + * Note: Do not assume that a transport implements all of the operations
> > + *   getting/setting a value as a simple read/write! Generally 
> > speaking,
> > + *   any of @get/@set, @get_status/@set_status, or @get_features/
> > + *   @finalize_features are NOT safe to be called from an atomic
> > + *   context.  
> 
> I think the only exception is @bus_name (and maybe @generation, I don't
> know) because it does not have to 'speak' with the hypervisor. If a
> transport operation has to 'speak' with the hypervisor, we do it by
> making it interpret a channel program. That means not safe to be called
> form atomic context. Or am I missing something?

I explicitly singled out the listed callbacks because they read/write a
value and there might be more to them than meets the eye. I would
assume that nobody expects calling e.g. find_vqs (allocating queues)
from an atomic context to be a good idea :) Maybe I should do
s/Generally speaking/In particular/ ?

That said, it's only a hint; we should add might_sleep as well to
interfaces where it makes sense.
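The might_sleep idea mentioned above can be illustrated with a toy userspace model (all names and the warning mechanism here are stand-ins, not the real kernel helper, which checks preempt and IRQ state):

```c
#include <stdbool.h>

/* Toy stand-in for the kernel's might_sleep() debug check: count a
 * warning when a potentially sleeping operation runs in "atomic"
 * context instead of WARNing like the kernel does. */
static bool in_atomic_context;
static int might_sleep_warnings;

static void toy_might_sleep(void)
{
	if (in_atomic_context)
		might_sleep_warnings++;	/* the kernel would WARN here */
}

/* Models a config accessor that may sleep, e.g. virtio-ccw having to
 * interpret a channel program for a "simple" config-space read. */
static unsigned char config_get_byte(void)
{
	toy_might_sleep();
	return 0x42;
}

/* Call the accessor once from process context and once from atomic
 * context; returns the number of warnings triggered (expected: 1). */
static int run_demo(void)
{
	might_sleep_warnings = 0;

	in_atomic_context = false;
	config_get_byte();		/* process context: fine */

	in_atomic_context = true;
	config_get_byte();		/* atomic context: warns */
	in_atomic_context = false;

	return might_sleep_warnings;
}
```

Sprinkling a check like this into the transport-independent wrappers would catch callers that assume the get/set callbacks are cheap reads and writes.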


[PATCH 0/4] rtlwifi: Fix issues with rtl8723ae

2019-01-04 Thread Bernd Edlinger
Currently the rtl8723ae driver is broken (since v4.7).

Connection to AP is lost very often, especially when
the signal level is not very good.

The main issue is the power save mode is basically
not working, and seems to trigger a firmware bug.
So I had to remove that code.

While debugging the driver I found a couple related issues,
for instance that the signal level in dm.undec_sm_pwdb
is no longer accurate (may be even much too high) when no more
packets are received, and it increases the likelihood to receive
something if the input gain is set to maximum.

The patch was tested with the rtl8723ae PCI card in my laptop
against a FRITZ!Box 7590 AP -- the WiFi connection works now
very reliably for me.


Bernd Edlinger (4):
   rtlwifi: rtl8723ae: Take the FW LPS mode handling out
   rtlwifi: rtl8723ae: Don't use dm.undec_sm_pwdb for input gain control
when no beacon was received in the connected state.
   rtlwifi: rtl8723ae: Re-introduce
 rtl8723e_dm_refresh_rate_adaptive_mask
   rtlwifi: Move the clearing of rtlpriv->link_info.num_rx_inperiod in
  rtl_watchdog_wq_callback a few lines down.

  drivers/net/wireless/realtek/rtlwifi/base.c|  4 +-
  drivers/net/wireless/realtek/rtlwifi/core.c|  2 +
  .../net/wireless/realtek/rtlwifi/rtl8723ae/dm.c| 95 +-
  .../net/wireless/realtek/rtlwifi/rtl8723ae/fw.c| 20 -
  .../net/wireless/realtek/rtlwifi/rtl8723ae/fw.h|  1 -
  .../net/wireless/realtek/rtlwifi/rtl8723ae/hw.c| 56 +
  6 files changed, 98 insertions(+), 80 deletions(-)

-- 
1.9.1


Re: [PATCH] scsi/mvsas/mv_init.c: Use dma_zalloc_coherent

2019-01-04 Thread Sabyasachi Gupta
On Wed, Dec 19, 2018 at 6:49 PM Sabyasachi Gupta
 wrote:
>
> On Sat, Dec 1, 2018 at 6:40 PM Sabyasachi Gupta
>  wrote:
> >
> > On Wed, Nov 21, 2018 at 7:18 PM Sabyasachi Gupta
> >  wrote:
> > >
> > > Replace dma_alloc_coherent + memset with dma_zalloc_coherent
> > >
> > > Signed-off-by: Sabyasachi Gupta 
> >
> > Any comment on this patch?
>
> Any comment on this patch?

Any comment on this patch?

>
> >
> > > ---
> > >  drivers/scsi/mvsas/mv_init.c | 12 
> > >  1 file changed, 4 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/drivers/scsi/mvsas/mv_init.c b/drivers/scsi/mvsas/mv_init.c
> > > index 3ac3437..495bddb 100644
> > > --- a/drivers/scsi/mvsas/mv_init.c
> > > +++ b/drivers/scsi/mvsas/mv_init.c
> > > @@ -253,33 +253,29 @@ static int mvs_alloc(struct mvs_info *mvi, struct 
> > > Scsi_Host *shost)
> > > /*
> > >  * alloc and init our DMA areas
> > >  */
> > > -   mvi->tx = dma_alloc_coherent(mvi->dev,
> > > +   mvi->tx = dma_zalloc_coherent(mvi->dev,
> > >  sizeof(*mvi->tx) * MVS_CHIP_SLOT_SZ,
> > >  &mvi->tx_dma, GFP_KERNEL);
> > > if (!mvi->tx)
> > > goto err_out;
> > > -   memset(mvi->tx, 0, sizeof(*mvi->tx) * MVS_CHIP_SLOT_SZ);
> > > -   mvi->rx_fis = dma_alloc_coherent(mvi->dev, MVS_RX_FISL_SZ,
> > > +   mvi->rx_fis = dma_zalloc_coherent(mvi->dev, MVS_RX_FISL_SZ,
> > >  &mvi->rx_fis_dma, GFP_KERNEL);
> > > if (!mvi->rx_fis)
> > > goto err_out;
> > > -   memset(mvi->rx_fis, 0, MVS_RX_FISL_SZ);
> > >
> > > -   mvi->rx = dma_alloc_coherent(mvi->dev,
> > > +   mvi->rx = dma_zalloc_coherent(mvi->dev,
> > >  sizeof(*mvi->rx) * (MVS_RX_RING_SZ + 
> > > 1),
> > >  &mvi->rx_dma, GFP_KERNEL);
> > > if (!mvi->rx)
> > > goto err_out;
> > > -   memset(mvi->rx, 0, sizeof(*mvi->rx) * (MVS_RX_RING_SZ + 1));
> > > mvi->rx[0] = cpu_to_le32(0xfff);
> > > mvi->rx_cons = 0xfff;
> > >
> > > -   mvi->slot = dma_alloc_coherent(mvi->dev,
> > > +   mvi->slot = dma_zalloc_coherent(mvi->dev,
> > >sizeof(*mvi->slot) * slot_nr,
> > >    &mvi->slot_dma, GFP_KERNEL);
> > > if (!mvi->slot)
> > > goto err_out;
> > > -   memset(mvi->slot, 0, sizeof(*mvi->slot) * slot_nr);
> > >
> > > mvi->bulk_buffer = dma_alloc_coherent(mvi->dev,
> > >TRASH_BUCKET_SIZE,
> > > --
> > > 2.7.4
> > >
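The transformation in this patch mirrors the familiar userspace pattern of replacing malloc() plus memset() with calloc(). A minimal sketch of the same idea (function names are mine, not from the patch):

```c
#include <stdlib.h>
#include <string.h>

/* alloc + explicit zeroing: the shape of the old
 * dma_alloc_coherent() + memset() sequence */
static void *alloc_then_zero(size_t n)
{
	void *p = malloc(n);

	if (p)
		memset(p, 0, n);
	return p;
}

/* single zeroing allocation: the shape dma_zalloc_coherent()
 * collapses it into */
static void *zalloc(size_t n)
{
	return calloc(1, n);
}

/* Both paths must yield a fully zeroed buffer; returns 1 on success. */
static int buffers_equal_zero(size_t n)
{
	unsigned char *a = alloc_then_zero(n);
	unsigned char *b = zalloc(n);
	int ok = a && b;
	size_t i;

	for (i = 0; ok && i < n; i++)
		ok = (a[i] == 0) && (b[i] == 0);
	free(a);
	free(b);
	return ok;
}
```

The behavior is identical; the win is purely fewer lines and no window in which the buffer holds uninitialized data between the two calls.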


[PATCH 3/4] rtlwifi: rtl8723ae: Re-introduce

2019-01-04 Thread Bernd Edlinger
  rtl8723e_dm_refresh_rate_adaptive_mask

This function was present in a previous version of the code base,
it works just fine for me -- as long as it is not using stale data.

Fixed a style nit in rtl8723e_dm_init_rate_adaptive_mask.

Signed-off-by: Bernd Edlinger 
---
  .../net/wireless/realtek/rtlwifi/rtl8723ae/dm.c| 87 +-
  1 file changed, 85 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c 
b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
index 902b944..acfd54c 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8723ae/dm.c
@@ -673,7 +673,7 @@ void rtl8723e_dm_check_txpower_tracking(struct ieee80211_hw 
*hw)
  void rtl8723e_dm_init_rate_adaptive_mask(struct ieee80211_hw *hw)
  {
struct rtl_priv *rtlpriv = rtl_priv(hw);
-   struct rate_adaptive *p_ra = &(rtlpriv->ra);
+   struct rate_adaptive *p_ra = &rtlpriv->ra;
  
p_ra->ratr_state = DM_RATR_STA_INIT;
p_ra->pre_ratr_state = DM_RATR_STA_INIT;
@@ -685,6 +685,89 @@ void rtl8723e_dm_init_rate_adaptive_mask(struct 
ieee80211_hw *hw)
  
  }
  
+void rtl8723e_dm_refresh_rate_adaptive_mask(struct ieee80211_hw *hw)
+{
+   struct rtl_priv *rtlpriv = rtl_priv(hw);
+   struct rtl_hal *rtlhal = rtl_hal(rtl_priv(hw));
+   struct rtl_mac *mac = rtl_mac(rtl_priv(hw));
+   struct rate_adaptive *p_ra = &rtlpriv->ra;
+   u32 low_rssithresh_for_ra, high_rssithresh_for_ra;
+   struct ieee80211_sta *sta = NULL;
+
+   if (is_hal_stop(rtlhal)) {
+   RT_TRACE(rtlpriv, COMP_RATE, DBG_LOUD,
+" driver is going to unload\n");
+   return;
+   }
+
+   if (!rtlpriv->dm.useramask) {
+   RT_TRACE(rtlpriv, COMP_RATE, DBG_LOUD,
+" driver does not control rate adaptive mask\n");
+   return;
+   }
+
+   if (mac->link_state == MAC80211_LINKED &&
+   mac->opmode == NL80211_IFTYPE_STATION) {
+   switch (p_ra->pre_ratr_state) {
+   case DM_RATR_STA_HIGH:
+   high_rssithresh_for_ra = 50;
+   low_rssithresh_for_ra = 20;
+   break;
+   case DM_RATR_STA_MIDDLE:
+   high_rssithresh_for_ra = 55;
+   low_rssithresh_for_ra = 20;
+   break;
+   case DM_RATR_STA_LOW:
+   high_rssithresh_for_ra = 60;
+   low_rssithresh_for_ra = 25;
+   break;
+   default:
+   high_rssithresh_for_ra = 50;
+   low_rssithresh_for_ra = 20;
+   break;
+   }
+
+   if (rtlpriv->link_info.bcn_rx_inperiod == 0)
+   switch (p_ra->pre_ratr_state) {
+   case DM_RATR_STA_HIGH:
+   default:
+   p_ra->ratr_state = DM_RATR_STA_MIDDLE;
+   break;
+   case DM_RATR_STA_MIDDLE:
+   case DM_RATR_STA_LOW:
+   p_ra->ratr_state = DM_RATR_STA_LOW;
+   break;
+   }
+   else if (rtlpriv->dm.undec_sm_pwdb > high_rssithresh_for_ra)
+   p_ra->ratr_state = DM_RATR_STA_HIGH;
+   else if (rtlpriv->dm.undec_sm_pwdb > low_rssithresh_for_ra)
+   p_ra->ratr_state = DM_RATR_STA_MIDDLE;
+   else
+   p_ra->ratr_state = DM_RATR_STA_LOW;
+
+   if (p_ra->pre_ratr_state != p_ra->ratr_state) {
+   RT_TRACE(rtlpriv, COMP_RATE, DBG_LOUD,
+"RSSI = %ld\n",
+rtlpriv->dm.undec_sm_pwdb);
+   RT_TRACE(rtlpriv, COMP_RATE, DBG_LOUD,
+"RSSI_LEVEL = %d\n", p_ra->ratr_state);
+   RT_TRACE(rtlpriv, COMP_RATE, DBG_LOUD,
+"PreState = %d, CurState = %d\n",
+p_ra->pre_ratr_state, p_ra->ratr_state);
+
+   rcu_read_lock();
+   sta = rtl_find_sta(hw, mac->bssid);
+   if (sta)
+   rtlpriv->cfg->ops->update_rate_tbl(hw, sta,
+  p_ra->ratr_state,
+ true);
+   rcu_read_unlock();
+
+   p_ra->pre_ratr_state = p_ra->ratr_state;
+   }
+   }
+}
+
  void rtl8723e_dm_rf_saving(struct ieee80211_hw *hw, u8 bforce_in_normal)
  {
struct rtl_priv *rtlpriv = rtl_priv(hw);
@@ -834,7 +917,7 @@ void rtl8723e_dm_watchdog(struct ieee80211_hw *hw)
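The hysteresis in the re-introduced function can be sketched as a small state machine (toy code, function and enum names are mine; the thresholds are the ones in the patch above):

```c
/* RA states, mirroring DM_RATR_STA_HIGH/MIDDLE/LOW in the patch */
enum ra_state { RA_HIGH, RA_MIDDLE, RA_LOW };

/* Next rate-adaptive state from the previous state, the smoothed RSSI
 * (undec_sm_pwdb) and the beacon count for the period.  The per-state
 * thresholds create hysteresis: the bar to *enter* the high state is
 * higher when coming from a lower state.  With no beacons received,
 * the RSSI is considered stale and the state is stepped down. */
static enum ra_state ra_next_state(enum ra_state pre, long pwdb, int bcn_rx)
{
	long hi, lo;

	switch (pre) {
	case RA_MIDDLE:
		hi = 55; lo = 20;
		break;
	case RA_LOW:
		hi = 60; lo = 25;
		break;
	case RA_HIGH:
	default:
		hi = 50; lo = 20;
		break;
	}

	if (bcn_rx == 0)	/* no beacons: pwdb is stale, step down */
		return pre == RA_HIGH ? RA_MIDDLE : RA_LOW;
	if (pwdb > hi)
		return RA_HIGH;
	if (pwdb > lo)
		return RA_MIDDLE;
	return RA_LOW;
}
```

This also shows why patch 4 (moving the num_rx_inperiod clearing) matters: the bcn_rx == 0 branch only fires if the beacon counter has not been cleared prematurely.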

Re: [PATCH RFC lora-next 2/4] net: lora: sx1301: add minimal to get AGC working prior to tx work

2019-01-04 Thread Ben Whitten




On 29/12/2018 09:58, Andreas Färber wrote:

Hi Ben,

Am 19.12.18 um 16:56 schrieb Ben Whitten:

As part of initialisation when opening the lora device after loading
the AGC firmware we need to satisfy its startup procedure, which involves
a few steps:

Loading a 16 entry lookup table.
For this I have hard-coded the Laird ETSI-certified table for use on the
RG186-M2 (EU) cards, this will need investigation on how other devices
load calibration data.


Isn't calibration performed before this firmware is initialized? I left
out reading the values back from firmware previously, but that should
get implemented. In the userspace implementation, do you get these from
a config file or did you modify the reference code to hardcode them?


The calibration you refer to is the IQ offset calibration, as far as
I can tell this is a separate thing from the power table the chip uses.
In the user space implementation these values are placed in the config
file.



If these are different calibration values from the ones returned by
firmware, then a DT property would be an easy way to get
hardware-specific data into the driver. However, same as with your clk
config, that makes us dependent on DT, which we shouldn't be for ACPI
and USB. If we consider it configuration data rather than an immutable
fact, then we would need a netlink command to set them.


Perhaps we can provide both mechanisms, a DT power table and a mechanism
to set via netlink prior to opening the interface.
As these power settings are hardware-dependent and certified for our
card by the FCC and ETSI, I would prefer to include them in DT.



In any case, we have some other vendors on this list, so hopefully
someone can comment. :)



Selecting the correct channel to transmit on.
Currently always 0 for the reference design.


Similarly: DT or netlink depending on whether fixed, and fall back to 0
as default.


Agreed, so on the DT would it be appropriate to have a handle in the
sx1301 node with a link to the radio which can transmit, eg.

tx = < 0>;

Allowing for both to be transmitters if that is a hardware choice.
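A possible shape for that phandle property (node names, labels, and the compatible strings here are illustrative only, not an agreed binding):

```
sx1301@0 {
	compatible = "semtech,sx1301";

	/* radio allowed to transmit: <phandle index> */
	tx = <&radio0 0>;

	radio0: radio@0 {
		compatible = "semtech,sx1257";
	};

	radio1: radio@1 {
		compatible = "semtech,sx1257";
	};
};
```

A netlink attribute could then override or replace this default before the interface is brought up, covering the ACPI/USB cases where no DT node exists.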





Then ending the AGC init procedure and seeing that it has come up.

Signed-off-by: Ben Whitten 
---
  drivers/net/lora/sx1301.c | 254 +-
  drivers/net/lora/sx1301.h |  16 +++
  2 files changed, 268 insertions(+), 2 deletions(-)


Many thanks for working on this! Some nits inline.


diff --git a/drivers/net/lora/sx1301.c b/drivers/net/lora/sx1301.c
index e75df93b96d8..0c7b6d0b31af 100644
--- a/drivers/net/lora/sx1301.c
+++ b/drivers/net/lora/sx1301.c
@@ -24,6 +24,121 @@
  
  #include "sx1301.h"
  
+static struct sx1301_tx_gain_lut tx_gain_lut[] = {

+   {
+   .dig_gain = 0,
+   .pa_gain = 0,
+   .dac_gain = 3,
+   .mix_gain = 8,
+   .rf_power = -3,
+   },
+   {
+   .dig_gain = 0,
+   .pa_gain = 0,
+   .dac_gain = 3,
+   .mix_gain = 9,
+   .rf_power = 0,
+   },
+   {
+   .dig_gain = 0,
+   .pa_gain = 0,
+   .dac_gain = 3,
+   .mix_gain = 12,
+   .rf_power = 3,
+   },
+   {
+   .dig_gain = 0,
+   .pa_gain = 0,
+   .dac_gain = 3,
+   .mix_gain = 13,
+   .rf_power = 4,
+   },
+   {
+   .dig_gain = 0,
+   .pa_gain = 1,
+   .dac_gain = 3,
+   .mix_gain = 8,
+   .rf_power = 6,
+   },
+   {
+   .dig_gain = 0,
+   .pa_gain = 1,
+   .dac_gain = 3,
+   .mix_gain = 9,
+   .rf_power = 9,
+   },
+   {
+   .dig_gain = 0,
+   .pa_gain = 1,
+   .dac_gain = 3,
+   .mix_gain = 10,
+   .rf_power = 10,
+   },
+   {
+   .dig_gain = 0,
+   .pa_gain = 1,
+   .dac_gain = 3,
+   .mix_gain = 11,
+   .rf_power = 12,
+   },
+   {
+   .dig_gain = 0,
+   .pa_gain = 1,
+   .dac_gain = 3,
+   .mix_gain = 12,
+   .rf_power = 13,
+   },
+   {
+   .dig_gain = 0,
+   .pa_gain = 1,
+   .dac_gain = 3,
+   .mix_gain = 13,
+   .rf_power = 14,
+   },
+   {
+   .dig_gain = 0,
+   .pa_gain = 1,
+   .dac_gain = 3,
+   .mix_gain = 15,
+   .rf_power = 16,
+   },
+   {
+   .dig_gain = 0,
+   .pa_gain = 2,
+   .dac_gain = 3,
+   .mix_gain = 10,
+   .rf_power = 19,
+   },
+   {
+   .dig_gain = 0,
+   .pa_gain = 2,
+   .dac_gain = 3,
+   .mix_gain = 11,
+   .rf_power = 21,
+   },
+   {
+

CFS scheduler: spin_lock usage causes dead lock when smp_apic_timer_interrupt occurs

2019-01-04 Thread Tom Putzeys
Dear Ingo and Peter,

I would like to report a possible bug in the CFS scheduler causing a deadlock.

We suspect this bug to have caused intermittent yet highly-persistent system 
freezes on our quad-core SMP systems.

We noticed the problem on 4.1.17 preempt-rt but we suspect the problematic code 
is not linked to the preempt-rt patch and is also present in the latest 4.20 
kernel.

The problem concerns the use of spin_lock to lock cfs_b in a situation where 
the spin lock is used in an interrupt handler:
-  __run_hrtimer (in kernel/time/hrtimer.c) calls fn(timer) with IRQs enabled. 
This can call sched_cfs_period_timer() (in kernel/sched/fair.c) which locks 
cfs_b. 
- the hard IRQ smp_apic_timer_interrupt can then occur. It can call 
ttwu_queue() which grabs the spin lock for its CPU run queue and can then try 
to enqueue a task via the CFS scheduler.
- this can call check_enqueue_throttle() which can call assign_cfs_rq_runtime() 
which tries to obtain the cfs_b lock. It is now blocked.

The cfs_b lock uses spin_lock and so was not intended for use inside a hard
IRQ, but the CFS scheduler does just that when it uses an hrtimer interrupt to
wake up and enqueue work. Our initial impression is that cfs_b needs to be
locked using spin_lock_irqsave.
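The reported inversion can be illustrated with a toy, single-threaded C model (all names here are stand-ins, not kernel code; a real deadlock spins forever, so the model just counts a nested acquisition as one):

```c
#include <stdbool.h>

/* Single-threaded toy model of the reported lockup.  A "hard IRQ" is
 * delivered synchronously whenever it is pending and not masked, and a
 * second acquisition of the non-recursive lock counts as a deadlock. */
static bool irqs_masked;
static bool irq_pending;
static int  lock_depth;
static bool deadlocked;

static void toy_spin_lock(void)
{
	if (++lock_depth > 1)
		deadlocked = true;	/* would spin forever on real HW */
}

static void toy_spin_unlock(void)
{
	lock_depth--;
}

/* Models smp_apic_timer_interrupt -> ttwu_queue -> check_enqueue_throttle
 * -> assign_cfs_rq_runtime, which takes cfs_b's lock again. */
static void timer_irq(void)
{
	toy_spin_lock();
	toy_spin_unlock();
}

static void deliver_irq_if_allowed(void)
{
	if (irq_pending && !irqs_masked) {
		irq_pending = false;
		timer_irq();
	}
}

/* Models sched_cfs_period_timer() taking cfs_b's lock, with or without
 * masking interrupts around the critical section (irqsave semantics).
 * Returns 1 if the run deadlocked, 0 otherwise. */
static int simulate(bool use_irqsave)
{
	lock_depth = 0;
	irq_pending = false;
	deadlocked = false;
	irqs_masked = use_irqsave;	/* spin_lock_irqsave */

	toy_spin_lock();
	irq_pending = true;		/* timer IRQ fires inside the section */
	deliver_irq_if_allowed();
	toy_spin_unlock();

	irqs_masked = false;		/* spin_unlock_irqrestore */
	deliver_irq_if_allowed();	/* deferred IRQ now runs safely */

	return deadlocked;
}
```

With plain locking the simulated IRQ re-enters the held lock; with irqsave semantics the IRQ is deferred until the lock is released, which is exactly the change proposed above.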

My colleague Mike Pearce has submitted a bug report on Bugzilla 3 weeks ago: 
https://bugzilla.kernel.org/show_bug.cgi?id=201993

We would appreciate any feedback.

Kind regards,

Tom

 
