blk_mq_sched_insert_request: inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage

2017-07-26 Thread Michael Ellerman
Hi Jens,

I'm seeing the lockdep warning below on shutdown on a Power8 machine
using IPR.

If I'm reading it right it looks like the spin_lock() (non-irq) in
blk_mq_sched_insert_request() is the immediate cause.

Looking at blk_mq_requeue_work() (the caller), it is doing
spin_lock_irqsave(). So is switching blk_mq_sched_insert_request() to
spin_lock_irqsave() the right fix?

cheers


ipr 0001:08:00.0: shutdown


WARNING: inconsistent lock state
4.13.0-rc2-gcc6x-gf74c89b #1 Not tainted

inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
swapper/28/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
 (&(>lock)->rlock){+.?...}, at: [] 
blk_mq_sched_dispatch_requests+0xa4/0x2a0
{SOFTIRQ-ON-W} state was registered at:
  lock_acquire+0xec/0x2e0
  _raw_spin_lock+0x44/0x70
  blk_mq_sched_insert_request+0x88/0x1f0
  blk_mq_requeue_work+0x108/0x180
  process_one_work+0x310/0x800
  worker_thread+0x88/0x520
  kthread+0x164/0x1b0
  ret_from_kernel_thread+0x5c/0x74
irq event stamp: 3572314
hardirqs last  enabled at (3572314): [] 
_raw_spin_unlock_irqrestore+0x58/0xb0
hardirqs last disabled at (3572313): [] 
_raw_spin_lock_irqsave+0x3c/0x90
softirqs last  enabled at (3572302): [] irq_enter+0x9c/0xe0
softirqs last disabled at (3572303): [] irq_exit+0x108/0x150

other info that might help us debug this:
 Possible unsafe locking scenario:

   CPU0
   
  lock(&(>lock)->rlock);
  
lock(&(>lock)->rlock);

 *** DEADLOCK ***

2 locks held by swapper/28/0:
 #0:  ((_cmd->timer)){+.-...}, at: [] 
call_timer_fn+0x10/0x4b0
 #1:  (rcu_read_lock){..}, at: [] 
__blk_mq_run_hw_queue+0xa0/0x2c0

stack backtrace:
CPU: 28 PID: 0 Comm: swapper/28 Not tainted 4.13.0-rc2-gcc6x-gf74c89b #1
Call Trace:
[c01fffe97550] [c0b50818] dump_stack+0xe8/0x160 (unreliable)
[c01fffe97590] [c01586d0] print_usage_bug+0x2d0/0x390
[c01fffe97640] [c0158f34] mark_lock+0x7a4/0x8e0
[c01fffe976f0] [c015a000] __lock_acquire+0x6a0/0x1a70
[c01fffe97860] [c015befc] lock_acquire+0xec/0x2e0
[c01fffe97930] [c0b71514] _raw_spin_lock+0x44/0x70
[c01fffe97960] [c05b60f4] blk_mq_sched_dispatch_requests+0xa4/0x2a0
[c01fffe979c0] [c05acac0] __blk_mq_run_hw_queue+0x100/0x2c0
[c01fffe97a00] [c05ad478] __blk_mq_delay_run_hw_queue+0x118/0x130
[c01fffe97a40] [c05ad61c] blk_mq_start_hw_queues+0x6c/0xa0
[c01fffe97a80] [c0797aac] scsi_kick_queue+0x2c/0x60
[c01fffe97aa0] [c0797cf0] scsi_run_queue+0x210/0x360
[c01fffe97b10] [c079b888] scsi_run_host_queues+0x48/0x80
[c01fffe97b40] [c07b6090] ipr_ioa_bringdown_done+0x70/0x1e0
[c01fffe97bc0] [c07bc860] ipr_reset_ioa_job+0x80/0xf0
[c01fffe97bf0] [c07b4d50] ipr_reset_timer_done+0xd0/0x100
[c01fffe97c30] [c01937bc] call_timer_fn+0xdc/0x4b0
[c01fffe97cf0] [c0193d08] expire_timers+0x178/0x330
[c01fffe97d60] [c01940c8] run_timer_softirq+0xb8/0x120
[c01fffe97de0] [c0b726a8] __do_softirq+0x168/0x6d8
[c01fffe97ef0] [c00df2c8] irq_exit+0x108/0x150
[c01fffe97f10] [c0017bf4] __do_irq+0x2a4/0x4a0
[c01fffe97f90] [c002da50] call_do_irq+0x14/0x24
[c007fad93aa0] [c0017e8c] do_IRQ+0x9c/0x140
[c007fad93af0] [c0008b98] hardware_interrupt_common+0x138/0x140
--- interrupt: 501 at .L1.42+0x0/0x4
LR = arch_local_irq_restore.part.4+0x84/0xb0
[c007fad93de0] [c007ffc1f7d8] 0xc007ffc1f7d8 (unreliable)
[c007fad93e00] [c0988d3c] cpuidle_enter_state+0x1bc/0x530
[c007fad93e60] [c01457cc] call_cpuidle+0x4c/0x90
[c007fad93e80] [c0145b28] do_idle+0x208/0x2f0
[c007fad93ef0] [c0145f8c] cpu_startup_entry+0x3c/0x50
[c007fad93f20] [c0042bc0] start_secondary+0x3b0/0x4b0
[c007fad93f90] [c000ac6c] start_secondary_prolog+0x10/0x14


Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?

2017-07-26 Thread Nicholas Piggin
On Wed, 26 Jul 2017 18:42:14 -0700
"Paul E. McKenney"  wrote:

> On Wed, Jul 26, 2017 at 04:22:00PM -0700, David Miller wrote:

> > Indeed, that really wouldn't explain how we end up with a RCU stall
> > dump listing almost all of the cpus as having missed a grace period.  
> 
> I have seen stranger things, but admittedly not often.

So the backtraces show the RCU gp thread in schedule_timeout.

Are you sure that it's timeout has expired and it's not being scheduled,
or could it be a bad (large) timeout (looks unlikely) or that it's being
scheduled but not correctly noting gps on other CPUs?

It's not in R state, so if it's not being scheduled at all, then it's
because the timer has not fired:

[ 1984.628602] rcu_preempt kthread starved for 5663 jiffies! g1566 c1565 f0x0 
RCU_GP_WAIT_FQS(3) ->state=0x1
[ 1984.638153] rcu_preempt S0 9  2 0x
[ 1984.643626] Call trace:
[ 1984.646059] [] __switch_to+0x90/0xa8
[ 1984.651189] [] __schedule+0x19c/0x5d8
[ 1984.656400] [] schedule+0x38/0xa0
[ 1984.661266] [] schedule_timeout+0x124/0x218
[ 1984.667002] [] rcu_gp_kthread+0x4fc/0x748
[ 1984.672564] [] kthread+0xfc/0x128
[ 1984.677429] [] ret_from_fork+0x10/0x50


Re: KVM guests freeze under upstream kernel

2017-07-26 Thread Michael Ellerman
jos...@linux.vnet.ibm.com writes:
> On Thu, Jul 20, 2017 at 10:18:18PM -0300, jos...@linux.vnet.ibm.com wrote:
>> On Thu, Jul 20, 2017 at 03:21:59PM +1000, Paul Mackerras wrote:
>> > 
>> > Did you check the host kernel logs for any oops messages?
>> 
>> dmesg was clean but after sometime waiting (I forgot QEMU running in
>> another terminal) I got the oops below (after rebooting the host I 
>> couldn't reproduce it again).
>> 
>> Another test that I did was:
>> Compile with transparent huge pages disabled: KVM works fine
>> Compile with transparent huge pages enabled: doesn't work
>>   + disabling it in /sys/kernel/mm/transparent_hugepage: doesn't work
>> 
>> Just out of my own curiosity I made this small change:
>> 
>> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h
>> b/arch/powerpc/include
>> index c0737c8..f94a3b6 100644
>> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
>> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
>> @@ -80,7 +80,7 @@
>>  
>>   #define _PAGE_SOFT_DIRTY   _RPAGE_SW3 /* software: software dirty
>>   tracking 
>>#define _PAGE_SPECIAL  _RPAGE_SW2 /* software: special page */
>>-#define _PAGE_DEVMAP   _RPAGE_SW1 /* software: ZONE_DEVICE page 
>> */
>>+#define _PAGE_DEVMAP   _RPAGE_RSV3
>> #define __HAVE_ARCH_PTE_DEVMAP
>> 
>> and it works. I chose _RPAGE_RSV3 because it uses the same value that
>> x86 uses (0x0400UL) but I don't if it could have any side
>> effect
>> 
>
> Does this change make any sense to you people?

No :)

I think it's just hiding the bug somehow. Presumably we have some code
somewhere that is getting confused by _RPAGE_SW1 being set, or setting
that bit incorrectly.

cheers


Re: [PATCH 05/11] powerpc/topology: Remove the unused parent_node() macro

2017-07-26 Thread Dou Liyang

Hi Michael,

At 07/27/2017 10:21 AM, Michael Ellerman wrote:

Dou Liyang  writes:


Commit a7be6e5a7f8d ("mm: drop useless local parameters of
__register_one_node()") removes the last user of parent_node().

The parent_node() macro in POWERPC platform is unnecessary.

Remove it for cleanup.

Reported-by: Michael Ellerman 
Signed-off-by: Dou Liyang 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/include/asm/topology.h | 2 --
 1 file changed, 2 deletions(-)


Thanks for doing this series.



It's my pleasure. :)

Sincerely,

dou


Acked-by: Michael Ellerman 

cheers








Re: [PATCH 05/11] powerpc/topology: Remove the unused parent_node() macro

2017-07-26 Thread Michael Ellerman
Dou Liyang  writes:

> Commit a7be6e5a7f8d ("mm: drop useless local parameters of
> __register_one_node()") removes the last user of parent_node().
>
> The parent_node() macro in POWERPC platform is unnecessary.
>
> Remove it for cleanup.
>
> Reported-by: Michael Ellerman 
> Signed-off-by: Dou Liyang 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: linuxppc-dev@lists.ozlabs.org
> ---
>  arch/powerpc/include/asm/topology.h | 2 --
>  1 file changed, 2 deletions(-)

Thanks for doing this series.

Acked-by: Michael Ellerman 

cheers


Re: [PATCH 1/6] powerpc: Free up four 64K PTE bits in 4K backed HPTE pages

2017-07-26 Thread Aneesh Kumar K.V



On 07/26/2017 09:36 PM, Ram Pai wrote:

On Wed, Jul 26, 2017 at 04:05:48PM +0530, Aneesh Kumar K.V wrote:

Ram Pai  writes:




diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
b/arch/powerpc/include/asm/book3s/64/hash-64k.h
index 9732837..62e580c 100644
--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -12,18 +12,14 @@
   */
  #define H_PAGE_COMBO  _RPAGE_RPN0 /* this is a combo 4k page */
  #define H_PAGE_4K_PFN _RPAGE_RPN1 /* PFN is for a single 4k page */
+#define H_PAGE_BUSY_RPAGE_RPN42 /* software: PTE & hash are busy */



Why are we moving H_PAGE_BUSY. Right now 4k and 64k linux page table
format looks similar.


The goal is to clear off all the _RPAGE_RSV* bits so that they can be
used for protection keys.  the aim is to keep the protection-bits in the
_RPAGE_RSV* bits, so that they will work as-is whenever radix MMU enables
protection keys.

Yes this makes the PTE format differ from 4k PTE. Hopefully it is a
small inconvenience. The PTE format for 4K is anyway not exactly the
same compared to 64K PTE format. For example, higher RPN bits are
used on 4K but not on 64k. lower RPN bits are used on 64k but not
on 4k.

I was wondering why in this patch ? You do in the next patch

--- a/arch/powerpc/include/asm/book3s/64/hash-64k.h
+++ b/arch/powerpc/include/asm/book3s/64/hash-64k.h
@@ -12,7 +12,7 @@
  */
 #define H_PAGE_COMBO   _RPAGE_RPN0 /* this is a combo 4k page */
 #define H_PAGE_4K_PFN  _RPAGE_RPN1 /* PFN is for a single 4k page */
-#define H_PAGE_BUSY_RPAGE_RPN42 /* software: PTE & hash are busy */
+#define H_PAGE_BUSY_RPAGE_RPN44 /* software: PTE & hash are busy */



-aneesh



Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?

2017-07-26 Thread Paul E. McKenney
On Wed, Jul 26, 2017 at 04:22:00PM -0700, David Miller wrote:
> From: "Paul E. McKenney" 
> Date: Wed, 26 Jul 2017 16:15:05 -0700
> 
> > On Wed, Jul 26, 2017 at 03:45:40PM -0700, David Miller wrote:
> >> Just out of curiousity, what x86 idle method is your machine using?
> >> The mwait one or the one which simply uses 'halt'?  The mwait variant
> >> might mask this bug, and halt would be a lot closer to how sparc64 and
> >> Jonathan's system operates.
> > 
> > My kernel builds with CONFIG_INTEL_IDLE=n, which I believe means that
> > I am not using the mwait one.  Here is a grep for IDLE in my .config:
> > 
> > CONFIG_NO_HZ_IDLE=y
> > CONFIG_GENERIC_SMP_IDLE_THREAD=y
> > # CONFIG_IDLE_PAGE_TRACKING is not set
> > CONFIG_ACPI_PROCESSOR_IDLE=y
> > CONFIG_CPU_IDLE=y
> > # CONFIG_CPU_IDLE_GOV_LADDER is not set
> > CONFIG_CPU_IDLE_GOV_MENU=y
> > # CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED is not set
> > # CONFIG_INTEL_IDLE is not set
> 
> No, that doesn't influence it.  It is determined by cpu features at
> run time.
> 
> If you are using mwait, it'll say so in your kernel log like this:
> 
>   using mwait in idle threads

Thank you for the hint!

And vim says:

"E486: Pattern not found: using mwait in idle threads"

> >> On sparc64 the cpu yield we do in the idle loop sleeps the cpu.  It's
> >> local TICK register keeps advancing, and the local timer therefore
> >> will still trigger.  Also, any externally generated interrupts
> >> (including cross calls) will wake up the cpu as well.
> >> 
> >> The tick-sched code is really tricky wrt. NO_HZ even in the NO_HZ_IDLE
> >> case.  One of my running theories is that we miss scheduling a tick
> >> due to a race.  That would be consistent with the behavior we see
> >> in the RCU dumps, I think.
> > 
> > But wouldn't you have to miss a -lot- of ticks to get an RCU CPU stall
> > warning?  By default, your grace period needs to extend for more than
> > 21 seconds (more than one-third of a -minute-) to get one.  Or do
> > you mean that the ticks get shut off now and forever, as opposed to
> > just losing one of them?
> 
> Hmmm, good point.  And I was talking about simply missing one tick.
> 
> Indeed, that really wouldn't explain how we end up with a RCU stall
> dump listing almost all of the cpus as having missed a grace period.

I have seen stranger things, but admittedly not often.

Thanx, Paul



Re: [PATCH v3 4/5] powerpc/lib/sstep: Add prty instruction emulation

2017-07-26 Thread Michael Ellerman
Segher Boessenkool  writes:

> On Wed, Jul 26, 2017 at 08:03:30PM +1000, Michael Ellerman wrote:
>> Segher Boessenkool  writes:
>> > A general question about these patches: some things are inside #ifdef
>> > __powerpc64__, some are not.  It seems it is the wrong macro, and it
>> > should be used (or not used) consistently?
>> 
>> Why is it the wrong macro? Because we tend to use CONFIG_PPC64 you mean?
>
> Yeah.  But I see sstep.c already mixes those two at will (or if there
> is a distinction, I'm not seeing it :-) )

Yeah OK. In practice they're equivalent, if CONFIG_PPC64=y then the
kernel is built 64-bit and therefore __powerpc64__ is defined.

But I agree it's a mess, we should use CONFIG_PPC64 exclusively unless
there's some reason not to (which I don't think there ever is).

>> I thought the reason some are #ifdef'ed is that some are 64-bit only.
>> ie. bpermd is 64-bit only ?
>
> 64-bit only, in what way?  It's not clear what the rules are.

Instructions that have "d" in the name? :P

> It's not instructions that can only run in 64-bit mode.
> It's not instructions that only give a usable result with 64-bit regs
> implemented.
> It's not instructions only implemented on 64-bit CPUs.

I think it's trying to be that ^

If you build a 32-bit kernel then instructions that are only defined on
64-bit CPUs should be treated as illegal, so the easiest way to achieve
that is to #ifdef off the code for those instructions.

cheers


Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?

2017-07-26 Thread David Miller
From: "Paul E. McKenney" 
Date: Wed, 26 Jul 2017 16:15:05 -0700

> On Wed, Jul 26, 2017 at 03:45:40PM -0700, David Miller wrote:
>> Just out of curiousity, what x86 idle method is your machine using?
>> The mwait one or the one which simply uses 'halt'?  The mwait variant
>> might mask this bug, and halt would be a lot closer to how sparc64 and
>> Jonathan's system operates.
> 
> My kernel builds with CONFIG_INTEL_IDLE=n, which I believe means that
> I am not using the mwait one.  Here is a grep for IDLE in my .config:
> 
>   CONFIG_NO_HZ_IDLE=y
>   CONFIG_GENERIC_SMP_IDLE_THREAD=y
>   # CONFIG_IDLE_PAGE_TRACKING is not set
>   CONFIG_ACPI_PROCESSOR_IDLE=y
>   CONFIG_CPU_IDLE=y
>   # CONFIG_CPU_IDLE_GOV_LADDER is not set
>   CONFIG_CPU_IDLE_GOV_MENU=y
>   # CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED is not set
>   # CONFIG_INTEL_IDLE is not set

No, that doesn't influence it.  It is determined by cpu features at
run time.

If you are using mwait, it'll say so in your kernel log like this:

using mwait in idle threads

>> On sparc64 the cpu yield we do in the idle loop sleeps the cpu.  It's
>> local TICK register keeps advancing, and the local timer therefore
>> will still trigger.  Also, any externally generated interrupts
>> (including cross calls) will wake up the cpu as well.
>> 
>> The tick-sched code is really tricky wrt. NO_HZ even in the NO_HZ_IDLE
>> case.  One of my running theories is that we miss scheduling a tick
>> due to a race.  That would be consistent with the behavior we see
>> in the RCU dumps, I think.
> 
> But wouldn't you have to miss a -lot- of ticks to get an RCU CPU stall
> warning?  By default, your grace period needs to extend for more than
> 21 seconds (more than one-third of a -minute-) to get one.  Or do
> you mean that the ticks get shut off now and forever, as opposed to
> just losing one of them?

Hmmm, good point.  And I was talking about simply missing one tick.

Indeed, that really wouldn't explain how we end up with a RCU stall
dump listing almost all of the cpus as having missed a grace period.


Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?

2017-07-26 Thread Paul E. McKenney
On Wed, Jul 26, 2017 at 03:45:40PM -0700, David Miller wrote:
> From: "Paul E. McKenney" 
> Date: Wed, 26 Jul 2017 15:36:58 -0700
> 
> > And without CONFIG_SOFTLOCKUP_DETECTOR, I see five runs of 24 with RCU
> > CPU stall warnings.  So it seems likely that CONFIG_SOFTLOCKUP_DETECTOR
> > really is having an effect.
> 
> Thanks for all of the info Paul, I'll digest this and scan over the
> code myself.
> 
> Just out of curiousity, what x86 idle method is your machine using?
> The mwait one or the one which simply uses 'halt'?  The mwait variant
> might mask this bug, and halt would be a lot closer to how sparc64 and
> Jonathan's system operates.

My kernel builds with CONFIG_INTEL_IDLE=n, which I believe means that
I am not using the mwait one.  Here is a grep for IDLE in my .config:

CONFIG_NO_HZ_IDLE=y
CONFIG_GENERIC_SMP_IDLE_THREAD=y
# CONFIG_IDLE_PAGE_TRACKING is not set
CONFIG_ACPI_PROCESSOR_IDLE=y
CONFIG_CPU_IDLE=y
# CONFIG_CPU_IDLE_GOV_LADDER is not set
CONFIG_CPU_IDLE_GOV_MENU=y
# CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED is not set
# CONFIG_INTEL_IDLE is not set

> On sparc64 the cpu yield we do in the idle loop sleeps the cpu.  It's
> local TICK register keeps advancing, and the local timer therefore
> will still trigger.  Also, any externally generated interrupts
> (including cross calls) will wake up the cpu as well.
> 
> The tick-sched code is really tricky wrt. NO_HZ even in the NO_HZ_IDLE
> case.  One of my running theories is that we miss scheduling a tick
> due to a race.  That would be consistent with the behavior we see
> in the RCU dumps, I think.

But wouldn't you have to miss a -lot- of ticks to get an RCU CPU stall
warning?  By default, your grace period needs to extend for more than
21 seconds (more than one-third of a -minute-) to get one.  Or do
you mean that the ticks get shut off now and forever, as opposed to
just losing one of them?

> Anyways, just a theory, and that's why I keep mentioning that commit
> about the revert of the revert (specifically
> 411fe24e6b7c283c3a1911450cdba6dd3aaea56e).
> 
> :-)

I am running an overnight test in preparation for attempting to push
some fixes for regressions into 4.12, but will try reverting this
and enabling CONFIG_HZ_PERIODIC tomorrow.

Jonathan, might the commit that Dave points out above be what reduces
the probability of occurrence as you test older releases?

Thanx, Paul



Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?

2017-07-26 Thread David Miller
From: "Paul E. McKenney" 
Date: Wed, 26 Jul 2017 15:36:58 -0700

> And without CONFIG_SOFTLOCKUP_DETECTOR, I see five runs of 24 with RCU
> CPU stall warnings.  So it seems likely that CONFIG_SOFTLOCKUP_DETECTOR
> really is having an effect.

Thanks for all of the info Paul, I'll digest this and scan over the
code myself.

Just out of curiousity, what x86 idle method is your machine using?
The mwait one or the one which simply uses 'halt'?  The mwait variant
might mask this bug, and halt would be a lot closer to how sparc64 and
Jonathan's system operates.

On sparc64 the cpu yield we do in the idle loop sleeps the cpu.  It's
local TICK register keeps advancing, and the local timer therefore
will still trigger.  Also, any externally generated interrupts
(including cross calls) will wake up the cpu as well.

The tick-sched code is really tricky wrt. NO_HZ even in the NO_HZ_IDLE
case.  One of my running theories is that we miss scheduling a tick
due to a race.  That would be consistent with the behavior we see
in the RCU dumps, I think.

Anyways, just a theory, and that's why I keep mentioning that commit
about the revert of the revert (specifically
411fe24e6b7c283c3a1911450cdba6dd3aaea56e).

:-)


Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?

2017-07-26 Thread Paul E. McKenney
On Wed, Jul 26, 2017 at 10:50:13AM -0700, Paul E. McKenney wrote:
> On Wed, Jul 26, 2017 at 09:54:32AM -0700, David Miller wrote:
> > From: "Paul E. McKenney" 
> > Date: Wed, 26 Jul 2017 08:49:00 -0700
> > 
> > > On Wed, Jul 26, 2017 at 04:33:40PM +0100, Jonathan Cameron wrote:
> > >> Didn't leave it long enough. Still bad on 4.10-rc7 just took over
> > >> an hour to occur.
> > > 
> > > And it is quite possible that SOFTLOCKUP_DETECTOR=y and HZ_PERIODIC=y
> > > are just greatly reducing the probability of the problem rather than
> > > completely preventing it.
> > > 
> > > Still, hopefully useful information, thank you for the testing!
> > 
> > I guess that invalidates my idea to test reverting recent changes to
> > the tick-sched.c code... :-/
> > 
> > In NO_HZ_IDLE mode, what is really supposed to happen on a completely
> > idle system?
> > 
> > All the cpus enter the idle loop, have no timers programmed, and they
> > all just go to sleep until an external event happens.
> > 
> > What ensures that grace periods get processed in this regime?
> 
> There are several different situations with different mechanisms:
> 
> 1.No grace period is in progress and no RCU callbacks are pending
>   anywhere in the system.  In this case, some other event would
>   need to start a grace period, so RCU just stays idle until that
>   happens, possibly indefinitely.  According to the battery-powered
>   embedded guys, this is a feature, not a bug.  ;-)
> 
> 2.No grace period is in progress, but there is at least one RCU
>   callback somewhere in the system.  In this case, the mechanism
>   depends on CONFIG_RCU_FAST_NO_HZ:
> 
>   CONFIG_RCU_FAST_NO_HZ=n:  The CPU on which the callback is
>   queued will return "true" in response to the call to
>   rcu_needs_cpu() that is made shortly before that CPU
>   enters idle.  This will cause the scheduling-clock
>   interrupt to remain on, despite the CPU being idle,
>   which will in turn allow RCU's state machine to continue
>   running out of softirq, triggered by the scheduling-clock
>   interrupts.
> 
>   CONFIG_RCU_FAST_NO_HZ=y:  The CPU on which the callback is queued
>   will return "false" in response to the call to
>   rcu_needs_cpu() that is made shortly before that CPU
>   enters idle.  However, it will also request a next event
>   about six seconds in the future if all callbacks do
>   nothing but free memory (kfree_rcu()), or about four
>   jiffies in the future if at least one callback does
>   something more than just free memory.
> 
>   There is also a rcu_prepare_for_idle() function that
>   is invoked later in the idle-entry process in this case
>   which will wake up the grace-period kthread if need be.
> 
> 3.A grace period is in progress.  In this case the grace-period
>   kthread is either currently running (in which case there will be
>   at least one non-idle CPU) or is in a timed wait for its next
>   scan for idle/offline CPUs (such CPUs need the grace-period
>   kthread to report quiescent states on their behalf).  In this
>   latter case, the timer subsystem will post a next event that
>   will be the wakeup time for the grace-period kthread, or some
>   earlier event.
> 
>   This is where we have been seeing trouble, if for no other
>   reason because RCU CPU stall warnings only happen when there
>   is a grace period in progress.
> 
> That is the theory, anyway...
> 
> And when I enabled CONFIG_SOFTLOCKUP_DETECTOR, I still see failures.
> I did 24 half-hour rcutorture runs on the TREE01 scenario, and two of them
> saw RCU CPU stall warnings with starvation of the grace-period kthread.
> I just now started another test but without CONFIG_SOFTLOCKUP_DETECTOR
> to see if it makes a significance difference for my testing.  I do have
> CONFIG_RCU_FAST_NO_HZ=y in my runs.

And without CONFIG_SOFTLOCKUP_DETECTOR, I see five runs of 24 with RCU
CPU stall warnings.  So it seems likely that CONFIG_SOFTLOCKUP_DETECTOR
really is having an effect.

Thanx, Paul



[PATCH 05/11] powerpc/topology: Remove the unused parent_node() macro

2017-07-26 Thread Dou Liyang
Commit a7be6e5a7f8d ("mm: drop useless local parameters of
__register_one_node()") removes the last user of parent_node().

The parent_node() macro in POWERPC platform is unnecessary.

Remove it for cleanup.

Reported-by: Michael Ellerman 
Signed-off-by: Dou Liyang 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Cc: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/include/asm/topology.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/powerpc/include/asm/topology.h 
b/arch/powerpc/include/asm/topology.h
index dc4e159..2d84bca 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -16,8 +16,6 @@ struct device_node;
 
 #include 
 
-#define parent_node(node)  (node)
-
 #define cpumask_of_node(node) ((node) == -1 ?  \
   cpu_all_mask :   \
   node_to_cpumask_map[node])
-- 
2.5.5





Re: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active

2017-07-26 Thread Brijesh Singh



On 07/26/2017 02:26 PM, H. Peter Anvin wrote:


   \

   static inline void outs##bwl(int port, const void *addr, unsigned

long count) \

   {


This will clash with a fix I did to add a "memory" clobber
for the traditional implementation, see
https://patchwork.kernel.org/patch/9854573/


Is it even worth leaving these as inline functions?
Given the speed of IO cycles it is unlikely that the cost of calling

a real

function will be significant.
The code bloat reduction will be significant.


I think the smallest code would be the original "rep insb" etc, which
should be smaller than a function call, unlike the loop. Then again,
there is a rather small number of affected device drivers, almost all
of them for ancient hardware that you won't even build in a 64-bit
x86 kernel, see the list below. The only user I found that is

actually

still relevant is drivers/tty/hvc/hvc_xen.c, which uses it for the

early

console.



There are some indirect user of string I/O functions. The following
functions
defined in lib/iomap.c calls rep version of ins and outs.

- ioread8_rep, ioread16_rep, ioread32_rep
- iowrite8_rep, iowrite16_rep, iowrite32_rep

I found that several drivers use above functions.

Here is one approach to convert it into non-inline functions. In this
approach,
I have added a new file arch/x86/kernel/io.c which provides non rep
version of
string I/O routines. The file gets built and used only when
AMD_MEM_ENCRYPT is
enabled. On positive side, if we don't build kernel with
AMD_MEM_ENCRYPT support
then we use inline routines, when AMD_MEM_ENCRYPT is built then we make
a function
call. Inside the function we unroll only when SEV is active.

Do you see any issue with this approach ? thanks

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index e080a39..104927d 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -323,8 +323,9 @@ static inline unsigned type in##bwl##_p(int port)
\
  unsigned type value = in##bwl(port);\
  slow_down_io(); \
  return value;   \
-}
\
-
\
+}
+
+#define BUILDIO_REP(bwl, bw, type)
\
static inline void outs##bwl(int port, const void *addr, unsigned long
count) \
{
\
  asm volatile("rep; outs" #bwl   \
@@ -335,12 +336,31 @@ static inline void ins##bwl(int port, void *addr,
unsigned long count)\
{
\
  asm volatile("rep; ins" #bwl\
   : "+D"(addr), "+c"(count) : "d"(port));\
-}
+}
\
  
  BUILDIO(b, b, char)

  BUILDIO(w, w, short)
  BUILDIO(l, , int)
  
+#ifdef CONFIG_AMD_MEM_ENCRYPT

+extern void outsb_try_rep(int port, const void *addr, unsigned long
count);
+extern void insb_try_rep(int port, void *addr, unsigned long count);
+extern void outsw_try_rep(int port, const void *addr, unsigned long
count);
+extern void insw_try_rep(int port, void *addr, unsigned long count);
+extern void outsl_try_rep(int port, const void *addr, unsigned long
count);
+extern void insl_try_rep(int port, void *addr, unsigned long count);
+#define outsb  outsb_try_rep
+#define insb   insb_try_rep
+#define outsw  outsw_try_rep
+#define insw   insw_try_rep
+#define outsl  outsl_try_rep
+#define insl   insl_try_rep
+#else
+BUILDIO_REP(b, b, char)
+BUILDIO_REP(w, w, short)
+BUILDIO_REP(l, , int)
+#endif
+
  extern void *xlate_dev_mem_ptr(phys_addr_t phys);
  extern void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr);

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index a01892b..3b6e2a3 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -42,6 +42,7 @@ CFLAGS_irq.o := -I$(src)/../include/asm/trace
  
  obj-y  := process_$(BITS).o signal.o

  obj-$(CONFIG_COMPAT)   += signal_compat.o
+obj-$(CONFIG_AMD_MEM_ENCRYPT) += io.o
obj-y  += traps.o irq.o irq_$(BITS).o
dumpstack_$(BITS).o
  obj-y  += time.o ioport.o dumpstack.o nmi.o
  obj-$(CONFIG_MODIFY_LDT_SYSCALL)   += ldt.o
diff --git a/arch/x86/kernel/io.c b/arch/x86/kernel/io.c
new file mode 100644
index 000..f58afa9
--- /dev/null
+++ b/arch/x86/kernel/io.c
@@ -0,0 +1,87 @@
+#include 
+#include 
+#include 
+
+void outsb_try_rep(int port, const void *addr, unsigned long count)
+{
+   if (sev_active()) {
+   unsigned char *value = (unsigned char *)addr;
+   while (count) {
+   outb(*value, port);
+   value++;
+   count--;
+   }
+   } else {
+   asm volatile("rep; outsb" : "+S"(addr), "+c"(count) :
"d"(port));
+   }
+}
+
+void insb_try_rep(int port, void *addr, unsigned long count)
+{
+   if (sev_active()) {
+   unsigned char *value = (unsigned char *)addr;
+  

Re: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active

2017-07-26 Thread H. Peter Anvin
,Eric Biederman ,Tejun Heo 
,Paolo Bonzini ,Andrew Morton 
,"Kirill A . Shutemov" 
,Lu Baolu 
From: h...@zytor.com
Message-ID: 

On July 26, 2017 9:24:45 PM GMT+02:00, Brijesh Singh  
wrote:
>
>Hi Arnd and David,
>
>On 07/26/2017 05:45 AM, Arnd Bergmann wrote:
>> On Tue, Jul 25, 2017 at 11:51 AM, David Laight
> wrote:
>>> From: Brijesh Singh
 Sent: 24 July 2017 20:08
 From: Tom Lendacky 

 Secure Encrypted Virtualization (SEV) does not support string I/O,
>so
 unroll the string I/O operation into a loop operating on one
>element at
 a time.

 Signed-off-by: Tom Lendacky 
 Signed-off-by: Brijesh Singh 
 ---
   arch/x86/include/asm/io.h | 26 ++
   1 file changed, 22 insertions(+), 4 deletions(-)

 diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
 index e080a39..2f3c002 100644
 --- a/arch/x86/include/asm/io.h
 +++ b/arch/x86/include/asm/io.h
 @@ -327,14 +327,32 @@ static inline unsigned type in##bwl##_p(int
>port)   \

>   \
   static inline void outs##bwl(int port, const void *addr, unsigned
>long count) \
   {
>> 
>> This will clash with a fix I did to add a "memory" clobber
>> for the traditional implementation, see
>> https://patchwork.kernel.org/patch/9854573/
>> 
>>> Is it even worth leaving these as inline functions?
>>> Given the speed of IO cycles it is unlikely that the cost of calling
>a real
>>> function will be significant.
>>> The code bloat reduction will be significant.
>> 
>> I think the smallest code would be the original "rep insb" etc, which
>> should be smaller than a function call, unlike the loop. Then again,
>> there is a rather small number of affected device drivers, almost all
>> of them for ancient hardware that you won't even build in a 64-bit
>> x86 kernel, see the list below. The only user I found that is
>actually
>> still relevant is drivers/tty/hvc/hvc_xen.c, which uses it for the
>early
>> console.
>
>
>There are some indirect user of string I/O functions. The following
>functions
>defined in lib/iomap.c calls rep version of ins and outs.
>
>- ioread8_rep, ioread16_rep, ioread32_rep
>- iowrite8_rep, iowrite16_rep, iowrite32_rep
>
>I found that several drivers use above functions.
>
>Here is one approach to convert it into non-inline functions. In this
>approach,
>I have added a new file arch/x86/kernel/io.c which provides non rep
>version of
>string I/O routines. The file gets built and used only when
>AMD_MEM_ENCRYPT is
>enabled. On positive side, if we don't build kernel with
>AMD_MEM_ENCRYPT support
>then we use inline routines, when AMD_MEM_ENCRYPT is built then we make
>a function
>call. Inside the function we unroll only when SEV is active.
>
>Do you see any issue with this approach ? thanks
>
>diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
>index e080a39..104927d 100644
>--- a/arch/x86/include/asm/io.h
>+++ b/arch/x86/include/asm/io.h
>@@ -323,8 +323,9 @@ static inline unsigned type in##bwl##_p(int port)  
>\
>  unsigned type value = in##bwl(port);\
>  slow_down_io(); \
>  return value;   \
>-} 
>\
>-  
>\
>+}
>+
>+#define BUILDIO_REP(bwl, bw, type)
>\
>static inline void outs##bwl(int port, const void *addr, unsigned long
>count) \
>{ 
>\
>  asm volatile("rep; outs" #bwl   \
>@@ -335,12 +336,31 @@ static inline void ins##bwl(int port, void *addr,
>unsigned long count)\
>{ 
>\
>  asm volatile("rep; ins" #bwl\
>   : "+D"(addr), "+c"(count) : "d"(port));\
>-}
>+} 
>\
>  
>  BUILDIO(b, b, char)
>  BUILDIO(w, w, short)
>  BUILDIO(l, , int)
>  
>+#ifdef CONFIG_AMD_MEM_ENCRYPT
>+extern void outsb_try_rep(int port, const void *addr, unsigned long
>count);
>+extern void insb_try_rep(int port, void *addr, unsigned long count);
>+extern void outsw_try_rep(int port, const void *addr, unsigned long
>count);
>+extern void insw_try_rep(int port, void *addr, unsigned long count);
>+extern 

Re: [PATCH v2] include/linux/vfio.h: Guard powerpc-specific functions with CONFIG_VFIO_SPAPR_EEH

2017-07-26 Thread Alex Williamson
On Tue, 18 Jul 2017 14:22:20 -0300
Murilo Opsfelder Araujo  wrote:

> When CONFIG_EEH=y and CONFIG_VFIO_SPAPR_EEH=n, build fails with the
> following:
> 
> drivers/vfio/pci/vfio_pci.o: In function `.vfio_pci_release':
> vfio_pci.c:(.text+0xa98): undefined reference to 
> `.vfio_spapr_pci_eeh_release'
> drivers/vfio/pci/vfio_pci.o: In function `.vfio_pci_open':
> vfio_pci.c:(.text+0x1420): undefined reference to 
> `.vfio_spapr_pci_eeh_open'
> 
> In this case, vfio_pci.c should use the empty definitions of
> vfio_spapr_pci_eeh_open and vfio_spapr_pci_eeh_release functions.
> 
> This patch fixes it by guarding these function definitions with
> CONFIG_VFIO_SPAPR_EEH, the symbol that controls whether vfio_spapr_eeh.c is
> built, which is where the non-empty versions of these functions are. We need 
> to
> make use of IS_ENABLED() macro because CONFIG_VFIO_SPAPR_EEH is a tristate
> option.
> 
> This issue was found during a randconfig build. Logs are here:
> 
> http://kisskb.ellerman.id.au/kisskb/buildresult/12982362/
> 
> Signed-off-by: Murilo Opsfelder Araujo 
> ---

Applied to my for-linus branch with David and Alexey's R-b for v4.13.
Thanks,

Alex

> 
> Changes from v1:
> - Rebased on top of next-20170718.
> 
>  include/linux/vfio.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 586809a..a47b985 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -152,7 +152,7 @@ extern int vfio_set_irqs_validate_and_prepare(struct 
> vfio_irq_set *hdr,
> size_t *data_size);
> 
>  struct pci_dev;
> -#ifdef CONFIG_EEH
> +#if IS_ENABLED(CONFIG_VFIO_SPAPR_EEH)
>  extern void vfio_spapr_pci_eeh_open(struct pci_dev *pdev);
>  extern void vfio_spapr_pci_eeh_release(struct pci_dev *pdev);
>  extern long vfio_spapr_iommu_eeh_ioctl(struct iommu_group *group,
> @@ -173,7 +173,7 @@ static inline long vfio_spapr_iommu_eeh_ioctl(struct 
> iommu_group *group,
>  {
>   return -ENOTTY;
>  }
> -#endif /* CONFIG_EEH */
> +#endif /* CONFIG_VFIO_SPAPR_EEH */
> 
>  /*
>   * IRQfd - generic
> --
> 2.9.4



Re: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active

2017-07-26 Thread Brijesh Singh


Hi Arnd and David,

On 07/26/2017 05:45 AM, Arnd Bergmann wrote:

On Tue, Jul 25, 2017 at 11:51 AM, David Laight  wrote:

From: Brijesh Singh

Sent: 24 July 2017 20:08
From: Tom Lendacky 

Secure Encrypted Virtualization (SEV) does not support string I/O, so
unroll the string I/O operation into a loop operating on one element at
a time.

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
  arch/x86/include/asm/io.h | 26 ++
  1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index e080a39..2f3c002 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -327,14 +327,32 @@ static inline unsigned type in##bwl##_p(int port) 
  \
   \
  static inline void outs##bwl(int port, const void *addr, unsigned long count) 
\
  {


This will clash with a fix I did to add a "memory" clobber
for the traditional implementation, see
https://patchwork.kernel.org/patch/9854573/


Is it even worth leaving these as inline functions?
Given the speed of IO cycles it is unlikely that the cost of calling a real
function will be significant.
The code bloat reduction will be significant.


I think the smallest code would be the original "rep insb" etc, which
should be smaller than a function call, unlike the loop. Then again,
there is a rather small number of affected device drivers, almost all
of them for ancient hardware that you won't even build in a 64-bit
x86 kernel, see the list below. The only user I found that is actually
still relevant is drivers/tty/hvc/hvc_xen.c, which uses it for the early
console.



There are some indirect user of string I/O functions. The following functions
defined in lib/iomap.c calls rep version of ins and outs.

- ioread8_rep, ioread16_rep, ioread32_rep
- iowrite8_rep, iowrite16_rep, iowrite32_rep

I found that several drivers use above functions.

Here is one approach to convert it into non-inline functions. In this approach,
I have added a new file arch/x86/kernel/io.c which provides non rep version of
string I/O routines. The file gets built and used only when AMD_MEM_ENCRYPT is
enabled. On positive side, if we don't build kernel with AMD_MEM_ENCRYPT support
then we use inline routines, when AMD_MEM_ENCRYPT is built then we make a 
function
call. Inside the function we unroll only when SEV is active.

Do you see any issue with this approach ? thanks

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index e080a39..104927d 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -323,8 +323,9 @@ static inline unsigned type in##bwl##_p(int port)   
\
unsigned type value = in##bwl(port);\
slow_down_io(); \
return value;   \
-}  \
-   \
+}
+
+#define BUILDIO_REP(bwl, bw, type) \
 static inline void outs##bwl(int port, const void *addr, unsigned long count) \
 {  \
asm volatile("rep; outs" #bwl   \
@@ -335,12 +336,31 @@ static inline void ins##bwl(int port, void *addr, 
unsigned long count)\
 {  \
asm volatile("rep; ins" #bwl\
 : "+D"(addr), "+c"(count) : "d"(port));\
-}
+}  \
 
 BUILDIO(b, b, char)

 BUILDIO(w, w, short)
 BUILDIO(l, , int)
 
+#ifdef CONFIG_AMD_MEM_ENCRYPT

+extern void outsb_try_rep(int port, const void *addr, unsigned long count);
+extern void insb_try_rep(int port, void *addr, unsigned long count);
+extern void outsw_try_rep(int port, const void *addr, unsigned long count);
+extern void insw_try_rep(int port, void *addr, unsigned long count);
+extern void outsl_try_rep(int port, const void *addr, unsigned long count);
+extern void insl_try_rep(int port, void *addr, unsigned long count);
+#define outsb  outsb_try_rep
+#define insb   insb_try_rep
+#define outsw  outsw_try_rep
+#define insw   insw_try_rep
+#define outsl  outsl_try_rep
+#define insl   insl_try_rep
+#else
+BUILDIO_REP(b, b, char)
+BUILDIO_REP(w, w, short)
+BUILDIO_REP(l, , int)
+#endif
+
 extern void *xlate_dev_mem_ptr(phys_addr_t phys);
 extern void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr);

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index a01892b..3b6e2a3 100644
--- a/arch/x86/kernel/Makefile
+++ 

Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?

2017-07-26 Thread Paul E. McKenney
On Wed, Jul 26, 2017 at 09:54:32AM -0700, David Miller wrote:
> From: "Paul E. McKenney" 
> Date: Wed, 26 Jul 2017 08:49:00 -0700
> 
> > On Wed, Jul 26, 2017 at 04:33:40PM +0100, Jonathan Cameron wrote:
> >> Didn't leave it long enough. Still bad on 4.10-rc7 just took over
> >> an hour to occur.
> > 
> > And it is quite possible that SOFTLOCKUP_DETECTOR=y and HZ_PERIODIC=y
> > are just greatly reducing the probability of the problem rather than
> > completely preventing it.
> > 
> > Still, hopefully useful information, thank you for the testing!
> 
> I guess that invalidates my idea to test reverting recent changes to
> the tick-sched.c code... :-/
> 
> In NO_HZ_IDLE mode, what is really supposed to happen on a completely
> idle system?
> 
> All the cpus enter the idle loop, have no timers programmed, and they
> all just go to sleep until an external event happens.
> 
> What ensures that grace periods get processed in this regime?

There are several different situations with different mechanisms:

1.  No grace period is in progress and no RCU callbacks are pending
anywhere in the system.  In this case, some other event would
need to start a grace period, so RCU just stays idle until that
happens, possibly indefinitely.  According to the battery-powered
embedded guys, this is a feature, not a bug.  ;-)

2.  No grace period is in progress, but there is at least one RCU
callback somewhere in the system.  In this case, the mechanism
depends on CONFIG_RCU_FAST_NO_HZ:

CONFIG_RCU_FAST_NO_HZ=n:  The CPU on which the callback is
queued will return "true" in response to the call to
rcu_needs_cpu() that is made shortly before that CPU
enters idle.  This will cause the scheduling-clock
interrupt to remain on, despite the CPU being idle,
which will in turn allow RCU's state machine to continue
running out of softirq, triggered by the scheduling-clock
interrupts.

CONFIG_RCU_FAST_NO_HZ=y:  The CPU on which the callback is queued
will return "false" in response to the call to
rcu_needs_cpu() that is made shortly before that CPU
enters idle.  However, it will also request a next event
about six seconds in the future if all callbacks do
nothing but free memory (kfree_rcu()), or about four
jiffies in the future if at least one callback does
something more than just free memory.

There is also a rcu_prepare_for_idle() function that
is invoked later in the idle-entry process in this case
which will wake up the grace-period kthread if need be.

3.  A grace period is in progress.  In this case the grace-period
kthread is either currently running (in which case there will be
at least one non-idle CPU) or is in a timed wait for its next
scan for idle/offline CPUs (such CPUs need the grace-period
kthread to report quiescent states on their behalf).  In this
latter case, the timer subsystem will post a next event that
will be the wakeup time for the grace-period kthread, or some
earlier event.

This is where we have been seeing trouble, if for no other
reason because RCU CPU stall warnings only happen when there
is a grace period in progress.

That is the theory, anyway...

And when I enabled CONFIG_SOFTLOCKUP_DETECTOR, I still see failures.
I did 24 half-hour rcutorture runs on the TREE01 scenario, and two of them
saw RCU CPU stall warnings with starvation of the grace-period kthread.
I just now started another test but without CONFIG_SOFTLOCKUP_DETECTOR
to see if it makes a significance difference for my testing.  I do have
CONFIG_RCU_FAST_NO_HZ=y in my runs.

Thanx, Paul



Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?

2017-07-26 Thread Jonathan Cameron
On Wed, 26 Jul 2017 09:54:32 -0700
David Miller  wrote:

> From: "Paul E. McKenney" 
> Date: Wed, 26 Jul 2017 08:49:00 -0700
> 
> > On Wed, Jul 26, 2017 at 04:33:40PM +0100, Jonathan Cameron wrote:  
> >> Didn't leave it long enough. Still bad on 4.10-rc7 just took over
> >> an hour to occur.  
> > 
> > And it is quite possible that SOFTLOCKUP_DETECTOR=y and HZ_PERIODIC=y
> > are just greatly reducing the probability of the problem rather than
> > completely preventing it.
> > 
> > Still, hopefully useful information, thank you for the testing!  

Not sure it actually gives us much information, but no issues yet
with a simple program running every cpu that wakes up every 3 seconds.

Will leave it running overnight and report back in the morning.

> 
> I guess that invalidates my idea to test reverting recent changes to
> the tick-sched.c code... :-/
> 
> In NO_HZ_IDLE mode, what is really supposed to happen on a completely
> idle system?
> 
> All the cpus enter the idle loop, have no timers programmed, and they
> all just go to sleep until an external event happens.
> 
> What ensures that grace periods get processed in this regime?


Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?

2017-07-26 Thread David Miller
From: "Paul E. McKenney" 
Date: Wed, 26 Jul 2017 08:49:00 -0700

> On Wed, Jul 26, 2017 at 04:33:40PM +0100, Jonathan Cameron wrote:
>> Didn't leave it long enough. Still bad on 4.10-rc7 just took over
>> an hour to occur.
> 
> And it is quite possible that SOFTLOCKUP_DETECTOR=y and HZ_PERIODIC=y
> are just greatly reducing the probability of the problem rather than
> completely preventing it.
> 
> Still, hopefully useful information, thank you for the testing!

I guess that invalidates my idea to test reverting recent changes to
the tick-sched.c code... :-/

In NO_HZ_IDLE mode, what is really supposed to happen on a completely
idle system?

All the cpus enter the idle loop, have no timers programmed, and they
all just go to sleep until an external event happens.

What ensures that grace periods get processed in this regime?


Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?

2017-07-26 Thread David Miller
From: "Paul E. McKenney" 
Date: Wed, 26 Jul 2017 07:14:17 -0700

> Dave, any other ideas on what might be causing this or what might be
> tested?

I was going to go through the changes that happened between v4.12
and now to kernel/timer/tick-sched.c and see if reverting any of
those help.

But I don't know when I would get to that.

Commit 411fe24e6b7c283c3a1911450cdba6dd3aaea56e, looks particularly juicy.
:-)


Re: [RFC Part1 PATCH v3 03/17] x86/mm: Secure Encrypted Virtualization (SEV) support

2017-07-26 Thread Tom Lendacky

On 7/25/2017 11:28 PM, Borislav Petkov wrote:

On Mon, Jul 24, 2017 at 02:07:43PM -0500, Brijesh Singh wrote:

From: Tom Lendacky 

Provide support for Secure Encyrpted Virtualization (SEV). This initial


Your subject misses a verb and patch subjects should have an active verb
denoting what the patch does. The sentence above is a good example.


Yup, will update.




support defines a flag that is used by the kernel to determine if it is
running with SEV active.

Signed-off-by: Tom Lendacky 
Signed-off-by: Brijesh Singh 
---
  arch/x86/include/asm/mem_encrypt.h | 2 ++
  arch/x86/mm/mem_encrypt.c  | 3 +++
  include/linux/mem_encrypt.h| 8 +++-
  3 files changed, 12 insertions(+), 1 deletion(-)


...


diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 0fbd092..1e4643e 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -40,6 +40,9 @@ static char sme_cmdline_off[] __initdata = "off";
  unsigned long sme_me_mask __section(.data) = 0;
  EXPORT_SYMBOL_GPL(sme_me_mask);
  
+unsigned int sev_enabled __section(.data) = 0;

+EXPORT_SYMBOL_GPL(sev_enabled);


So sev_enabled is a pure bool used only in bool context, not like
sme_me_mask whose value is read too. Which means, you can make the
former static and query it only through accessor functions.


If it's made static then the sme_active()/sev_active() inline functions
would need to be turned into functions within the mem_encrypt.c file. So
there's a trade-off to do that, which is the better one?




  /* Buffer used for early in-place encryption by BSP, no locking needed */
  static char sme_early_buffer[PAGE_SIZE] __aligned(PAGE_SIZE);
  
diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h

index 1255f09..ea0831a 100644
--- a/include/linux/mem_encrypt.h
+++ b/include/linux/mem_encrypt.h
@@ -22,12 +22,18 @@
  #else /* !CONFIG_ARCH_HAS_MEM_ENCRYPT */
  
  #define sme_me_mask	0UL

+#define sev_enabled0
  
  #endif	/* CONFIG_ARCH_HAS_MEM_ENCRYPT */
  
  static inline bool sme_active(void)

  {
-   return !!sme_me_mask;
+   return (sme_me_mask && !sev_enabled);


You don't need the brackets. Below too.


Ok.




+}
+
+static inline bool sev_active(void)
+{
+   return (sme_me_mask && sev_enabled);
  }


So this is confusing, TBH. SME and SEV are not mutually exclusive and
yet the logic here says so. Why?

I mean, in the hypervisor context, sme_active() is still true.

/me is confused.


The kernel needs to distinguish between running under SME and running
under SEV. SME and SEV are similar but not the same. The trampoline code
is a good example.  Before paging is activated, SME will access all
memory as decrypted, but SEV will access all memory as encrypted.  So
when APs are being brought up under SME the trampoline area cannot be
encrypted, whereas under SEV the trampoline area must be encrypted.

Thanks,
Tom





Re: [PATCH 1/6] powerpc: Free up four 64K PTE bits in 4K backed HPTE pages

2017-07-26 Thread Ram Pai
On Wed, Jul 26, 2017 at 04:05:48PM +0530, Aneesh Kumar K.V wrote:
> Ram Pai  writes:
> 
> > Rearrange 64K PTE bits to  free  up  bits 3, 4, 5  and  6,
> > in the 4K backed HPTE pages.These bits continue to be used
> > for 64K backed HPTE pages in this patch, but will be freed
> > up in the next patch. The  bit  numbers are big-endian  as
> > defined in the ISA3.0
> >
> > The patch does the following change to the 4k htpe backed
> > 64K PTE's format.
> >
> > H_PAGE_BUSY moves from bit 3 to bit 9 (B bit in the figure
> > below)
> > V0 which occupied bit 4 is not used anymore.
> > V1 which occupied bit 5 is not used anymore.
> > V2 which occupied bit 6 is not used anymore.
> > V3 which occupied bit 7 is not used anymore.
> >
> > Before the patch, the 4k backed 64k PTE format was as follows
> >
> >  0 1 2 3 4  5  6  7  8 9 10...63
> >  : : : : :  :  :  :  : : ::
> >  v v v v v  v  v  v  v v vv
> >
> > ,-,-,-,-,--,--,--,--,-,-,-,-,-,--,-,-,-,
> > |x|x|x|B|V0|V1|V2|V3|x| | |x|x||x|x|x|x| <- primary pte
> > '_'_'_'_'__'__'__'__'_'_'_'_'_''_'_'_'_'
> > |S|G|I|X|S |G |I |X |S|G|I|X|..|S|G|I|X| <- secondary pte
> > '_'_'_'_'__'__'__'__'_'_'_'_'__'_'_'_'_'
> >
> > After the patch, the 4k backed 64k PTE format is as follows
> >
> >  0 1 2 3 4  5  6  7  8 9 10...63
> >  : : : : :  :  :  :  : : ::
> >  v v v v v  v  v  v  v v vv
> >
> > ,-,-,-,-,--,--,--,--,-,-,-,-,-,--,-,-,-,
> > |x|x|x| |  |  |  |  |x|B| |x|x||.|.|.|.| <- primary pte
> > '_'_'_'_'__'__'__'__'_'_'_'_'_''_'_'_'_'
> > |S|G|I|X|S |G |I |X |S|G|I|X|..|S|G|I|X| <- secondary pte
> > '_'_'_'_'__'__'__'__'_'_'_'_'__'_'_'_'_'
> >
> > the four  bits S,G,I,X (one quadruplet per 4k HPTE) that
> > cache  the  hash-bucket  slot  value, is initialized  to
> > 1,1,1,1 indicating -- an invalid slot.   If  a HPTE gets
> > cached in a   slot(i.e 7th  slot  of  secondary hash
> > bucket), it is  released  immediately. In  other  words,
> > even  though    is   a valid slot  value in the hash
> > bucket, we consider it invalid and  release the slot and
> > the HPTE.  This  gives  us  the opportunity to determine
> > the validity of S,G,I,X  bits  based on its contents and
> > not on any of the bits V0,V1,V2 or V3 in the primary PTE
> >
> > When   we  release  aHPTEcached in the  slot
> > we alsorelease  a  legitimate   slot  in the primary
> > hash bucket  and  unmap  its  corresponding  HPTE.  This
> > is  to  ensure   that  we do get a HPTE cached in a slot
> > of the primary hash bucket, the next time we retry.
> >
> > Though  treating    slot  as  invalid,  reduces  the
> > number of  available  slots  in the hash bucket and  may
> > have  an  effect   on the performance, the probabilty of
> > hitting a  slot is extermely low.
> >
> > Compared  to  the  current   scheme, the above described
> > scheme  reduces  the  number of false hash table updates
> > significantly   andhas  the   added   advantage   of
> > releasing  four  valuable  PTE bits for other purpose.
> >
> > NOTE:even though bits 3, 4, 5, 6, 7 are  not  used  when
> > the  64K  PTE is backed by 4k HPTE,  they continue to be
> > used  if  the  PTE  gets  backed  by 64k HPTE.  The next
> > patch will decouple that aswell, and truely  release the
> > bits.
> >
> > This idea was jointly developed by Paul Mackerras,
> > Aneesh, Michael Ellermen and myself.
> >
> > 4K PTE format remains unchanged currently.
> >
> > The patch does the following code changes
> > a) PTE flags are split between 64k and 4k  header files.
> > b) __hash_page_4K()  is  reimplemented   to reflect the
> >above logic.
> >
> > Reviewed-by: Aneesh Kumar K.V 
> > Signed-off-by: Ram Pai 
> > ---
> >  arch/powerpc/include/asm/book3s/64/hash-4k.h  |2 +
> >  arch/powerpc/include/asm/book3s/64/hash-64k.h |8 +--
> >  arch/powerpc/include/asm/book3s/64/hash.h |1 -
> >  arch/powerpc/mm/hash64_64k.c  |   74 
> > -
> >  arch/powerpc/mm/hash_utils_64.c   |4 +-
> >  5 files changed, 55 insertions(+), 34 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h 
> > b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> > index 0c4e470..f959c00 100644
> > --- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
> > +++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> > @@ -16,6 +16,8 @@
> >  #define H_PUD_TABLE_SIZE   (sizeof(pud_t) << H_PUD_INDEX_SIZE)
> >  #define H_PGD_TABLE_SIZE   (sizeof(pgd_t) << H_PGD_INDEX_SIZE)
> >
> > +#define H_PAGE_BUSY_RPAGE_RSV1 /* software: PTE & hash are 
> > busy */
> > +
> >  /* PTE flags 

Re: [RFC Part1 PATCH v3 05/17] x86, realmode: Don't decrypt trampoline area under SEV

2017-07-26 Thread Borislav Petkov
 Subject: x86/realmode: ...

On Mon, Jul 24, 2017 at 02:07:45PM -0500, Brijesh Singh wrote:
> From: Tom Lendacky 
> 
> When SEV is active the trampoline area will need to be in encrypted
> memory so only mark the area decrypted if SME is active.
> 
> Signed-off-by: Tom Lendacky 
> Signed-off-by: Brijesh Singh 
> ---
>  arch/x86/realmode/init.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
> index 1f71980..c7eeca7 100644
> --- a/arch/x86/realmode/init.c
> +++ b/arch/x86/realmode/init.c
> @@ -63,9 +63,11 @@ static void __init setup_real_mode(void)
>   /*
>* If SME is active, the trampoline area will need to be in
>* decrypted memory in order to bring up other processors
> -  * successfully.
> +  * successfully. For SEV the trampoline area needs to be in
> +  * encrypted memory, so only do this for SME.

Or simply say:

"It is not needed for SEV."

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 


Re: [PATCH v3 4/5] powerpc/lib/sstep: Add prty instruction emulation

2017-07-26 Thread Segher Boessenkool
On Wed, Jul 26, 2017 at 08:03:30PM +1000, Michael Ellerman wrote:
> Segher Boessenkool  writes:
> > A general question about these patches: some things are inside #ifdef
> > __powerpc64__, some are not.  It seems it is the wrong macro, and it
> > should be used (or not used) consistently?
> 
> Why is it the wrong macro? Because we tend to use CONFIG_PPC64 you mean?

Yeah.  But I see sstep.c already mixes those two at will (or if there
is a distinction, I'm not seeing it :-) )

> I thought the reason some are #ifdef'ed is that some are 64-bit only.
> ie. bpermd is 64-bit only ?

64-bit only, in what way?  It's not clear what the rules are.

It's not instructions that can only run in 64-bit mode.
It's not instructions that only give a usable result with 64-bit regs
implemented.
It's not instructions only implemented on 64-bit CPUs.
It's not even "all instructions that would not give a correct result
in the low 32 bits of GPRs if the high 32 bits are not implemented".


Segher


Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?

2017-07-26 Thread Paul E. McKenney
On Wed, Jul 26, 2017 at 04:33:40PM +0100, Jonathan Cameron wrote:
> On Wed, 26 Jul 2017 15:23:15 +0100
> Jonathan Cameron  wrote:
> 
> > On Wed, 26 Jul 2017 07:14:17 -0700
> > "Paul E. McKenney"  wrote:
> > 
> > > On Wed, Jul 26, 2017 at 01:28:01PM +0100, Jonathan Cameron wrote:  
> > > > On Wed, 26 Jul 2017 10:32:32 +0100
> > > > Jonathan Cameron  wrote:
> > > > 
> > > > > On Wed, 26 Jul 2017 09:16:23 +0100
> > > > > Jonathan Cameron  wrote:
> > > > > 
> > > > > > On Tue, 25 Jul 2017 21:12:17 -0700
> > > > > > "Paul E. McKenney"  wrote:
> > > > > >   
> > > > > > > On Tue, Jul 25, 2017 at 09:02:33PM -0700, David Miller wrote: 
> > > > > > >
> > > > > > > > From: "Paul E. McKenney" 
> > > > > > > > Date: Tue, 25 Jul 2017 20:55:45 -0700
> > > > > > > >   
> > > > > > > > > On Tue, Jul 25, 2017 at 02:10:29PM -0700, David Miller wrote: 
> > > > > > > > >  
> > > > > > > > >> Just to report, turning softlockup back on fixes things for 
> > > > > > > > >> me on
> > > > > > > > >> sparc64 too.  
> > > > > > > > > 
> > > > > > > > > Very good!
> > > > > > > > >   
> > > > > > > > >> The thing about softlockup is it runs an hrtimer, which 
> > > > > > > > >> seems to run
> > > > > > > > >> about every 4 seconds.  
> > > > > > > > > 
> > > > > > > > > I could see where that could shake things loose, but I am 
> > > > > > > > > surprised that
> > > > > > > > > it would be needed.  I ran a short run with 
> > > > > > > > > CONFIG_SOFTLOCKUP_DETECTOR=y
> > > > > > > > > with no trouble, but I will be running a longer test later on.
> > > > > > > > >   
> > > > > > > > >> So I wonder if this is a NO_HZ problem.  
> > > > > > > > > 
> > > > > > > > > Might be.  My tests run with NO_HZ_FULL=n and NO_HZ_IDLE=y.  
> > > > > > > > > What are
> > > > > > > > > you running?  (Again, my symptoms are slightly different, so 
> > > > > > > > > I might
> > > > > > > > > be seeing a different bug.)  
> > > > > > > > 
> > > > > > > > I run with NO_HZ_FULL=n and NO_HZ_IDLE=y, just like you.
> > > > > > > > 
> > > > > > > > To clarify, the symptoms show up with SOFTLOCKUP_DETECTOR 
> > > > > > > > disabled.  
> > > > > > > 
> > > > > > > Same here -- but my failure case happens fairly rarely, so it 
> > > > > > > will take
> > > > > > > some time to gain reasonable confidence that enabling 
> > > > > > > SOFTLOCKUP_DETECTOR
> > > > > > > had effect.
> > > > > > > 
> > > > > > > But you are right, might be interesting to try NO_HZ_PERIODIC=y
> > > > > > > or NO_HZ_FULL=y.  So many possible tests, and so little time.  ;-)
> > > > > > > 
> > > > > > >   Thanx, Paul
> > > > > > > 
> > > > > > I'll be the headless chicken running around and trying as many tests
> > > > > > as I can fit in.  Typical time to see the failure for us is sub 10
> > > > > > minutes so we'll see how far we get.
> > > > > > 
> > > > > > Make me a list to run if you like ;)
> > > > > > 
> > > > > > NO_HZ_PERIODIC=y running now.  
> > > > > By which I mean CONFIG_HZ_PERIODIC=y
> > > 
> > > I did get that messed up, didn't I?  Sorry for my confusion!
> > >   
> > > > > Anyhow, run for 40 minutes with out seeing a splat but my sanity check
> > > > > on the NO_FULL_HZ=n and NO_HZ_IDLE=y this morning took 20 minutes so
> > > > > I won't have much confidence until we are a few hours in on this.
> > > > > 
> > > > > Anyhow, certainly looking like a promising direction for 
> > > > > investigation!
> > > > > 
> > > > Well it's done over 3 hours without a splat so I think it is fine with
> > > > CONFIG_HZ_PERIODIC=y
> > > 
> > > Thank you!
> > > 
> > > If you run with SOFTLOCKUP_DETECTOR=n and NO_HZ_IDLE=y, but have a normal
> > > user task waking up every few seconds on each CPU, does the problem occur?
> > > (The question is whether any disturbance gets things going, or whether 
> > > there
> > > is something special about SOFTLOCKUP_DETECTOR=y and HZ_PERIODIC=y.
> > > 
> > > Dave, any other ideas on what might be causing this or what might be
> > > tested?
> > > 
> > >   Thanx, Paul
> > >   
> > 
> > Although it's still early days (40 mins in) it looks like the issue first
> > occurred between 4.10-rc7 and 4.11-rc1 (don't ask why those particular RCs)
> > 
> > Bad as with current kernel on 4.11-rc1 and good on 4.10-rc7.
> 
> Didn't leave it long enough. Still bad on 4.10-rc7 just took over
> an hour to occur.

And it is quite possible that SOFTLOCKUP_DETECTOR=y and HZ_PERIODIC=y
are just greatly reducing the probability of the problem rather than
completely preventing it.

Still, hopefully useful information, thank you for the testing!


Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?

2017-07-26 Thread Jonathan Cameron
On Wed, 26 Jul 2017 15:23:15 +0100
Jonathan Cameron  wrote:

> On Wed, 26 Jul 2017 07:14:17 -0700
> "Paul E. McKenney"  wrote:
> 
> > On Wed, Jul 26, 2017 at 01:28:01PM +0100, Jonathan Cameron wrote:  
> > > On Wed, 26 Jul 2017 10:32:32 +0100
> > > Jonathan Cameron  wrote:
> > > 
> > > > On Wed, 26 Jul 2017 09:16:23 +0100
> > > > Jonathan Cameron  wrote:
> > > > 
> > > > > On Tue, 25 Jul 2017 21:12:17 -0700
> > > > > "Paul E. McKenney"  wrote:
> > > > >   
> > > > > > On Tue, Jul 25, 2017 at 09:02:33PM -0700, David Miller wrote:   
> > > > > >  
> > > > > > > From: "Paul E. McKenney" 
> > > > > > > Date: Tue, 25 Jul 2017 20:55:45 -0700
> > > > > > >   
> > > > > > > > On Tue, Jul 25, 2017 at 02:10:29PM -0700, David Miller wrote:   
> > > > > > > >
> > > > > > > >> Just to report, turning softlockup back on fixes things for me 
> > > > > > > >> on
> > > > > > > >> sparc64 too.  
> > > > > > > > 
> > > > > > > > Very good!
> > > > > > > >   
> > > > > > > >> The thing about softlockup is it runs an hrtimer, which seems 
> > > > > > > >> to run
> > > > > > > >> about every 4 seconds.  
> > > > > > > > 
> > > > > > > > I could see where that could shake things loose, but I am 
> > > > > > > > surprised that
> > > > > > > > it would be needed.  I ran a short run with 
> > > > > > > > CONFIG_SOFTLOCKUP_DETECTOR=y
> > > > > > > > with no trouble, but I will be running a longer test later on.
> > > > > > > >   
> > > > > > > >> So I wonder if this is a NO_HZ problem.  
> > > > > > > > 
> > > > > > > > Might be.  My tests run with NO_HZ_FULL=n and NO_HZ_IDLE=y.  
> > > > > > > > What are
> > > > > > > > you running?  (Again, my symptoms are slightly different, so I 
> > > > > > > > might
> > > > > > > > be seeing a different bug.)  
> > > > > > > 
> > > > > > > I run with NO_HZ_FULL=n and NO_HZ_IDLE=y, just like you.
> > > > > > > 
> > > > > > > To clarify, the symptoms show up with SOFTLOCKUP_DETECTOR 
> > > > > > > disabled.  
> > > > > > 
> > > > > > Same here -- but my failure case happens fairly rarely, so it will 
> > > > > > take
> > > > > > some time to gain reasonable confidence that enabling 
> > > > > > SOFTLOCKUP_DETECTOR
> > > > > > had effect.
> > > > > > 
> > > > > > But you are right, might be interesting to try NO_HZ_PERIODIC=y
> > > > > > or NO_HZ_FULL=y.  So many possible tests, and so little time.  ;-)
> > > > > > 
> > > > > > Thanx, Paul
> > > > > > 
> > > > > I'll be the headless chicken running around and trying as many tests
> > > > > as I can fit in.  Typical time to see the failure for us is sub 10
> > > > > minutes so we'll see how far we get.
> > > > > 
> > > > > Make me a list to run if you like ;)
> > > > > 
> > > > > NO_HZ_PERIODIC=y running now.  
> > > > By which I mean CONFIG_HZ_PERIODIC=y
> > 
> > I did get that messed up, didn't I?  Sorry for my confusion!
> >   
> > > > Anyhow, run for 40 minutes with out seeing a splat but my sanity check
> > > > on the NO_FULL_HZ=n and NO_HZ_IDLE=y this morning took 20 minutes so
> > > > I won't have much confidence until we are a few hours in on this.
> > > > 
> > > > Anyhow, certainly looking like a promising direction for investigation!
> > > > 
> > > Well it's done over 3 hours without a splat so I think it is fine with
> > > CONFIG_HZ_PERIODIC=y
> > 
> > Thank you!
> > 
> > If you run with SOFTLOCKUP_DETECTOR=n and NO_HZ_IDLE=y, but have a normal
> > user task waking up every few seconds on each CPU, does the problem occur?
> > (The question is whether any disturbance gets things going, or whether there
> > is something special about SOFTLOCKUP_DETECTOR=y and HZ_PERIODIC=y.
> > 
> > Dave, any other ideas on what might be causing this or what might be
> > tested?
> > 
> > Thanx, Paul
> >   
> 
> Although it's still early days (40 mins in) it looks like the issue first
> occurred between 4.10-rc7 and 4.11-rc1 (don't ask why those particular RCs)
> 
> Bad as with current kernel on 4.11-rc1 and good on 4.10-rc7.
Didn't leave it long enough. Still bad on 4.10-rc7 just took over
an hour to occur.
> 
> Could be something different was hiding it in 4.10 though.  We have a fair
> delta from mainline back then unfortunately so bisecting will be
> 'interesting'.
> 
> I'll see if I can get the test you suggest running.
> 
> Jonathan
> ___
> linuxarm mailing list
> linux...@huawei.com
> http://rnd-openeuler.huawei.com/mailman/listinfo/linuxarm



Re: [RFC Part1 PATCH v3 04/17] x86/mm: Don't attempt to encrypt initrd under SEV

2017-07-26 Thread Borislav Petkov
On Mon, Jul 24, 2017 at 02:07:44PM -0500, Brijesh Singh wrote:
> From: Tom Lendacky 
> 
> When SEV is active the initrd/initramfs will already have already been
> placed in memory encyrpted so do not try to encrypt it.
> 
> Signed-off-by: Tom Lendacky 
> Signed-off-by: Brijesh Singh 
> ---
>  arch/x86/kernel/setup.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 0bfe0c1..01d56a1 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -379,9 +379,11 @@ static void __init reserve_initrd(void)
>* If SME is active, this memory will be marked encrypted by the
>* kernel when it is accessed (including relocation). However, the
>* ramdisk image was loaded decrypted by the bootloader, so make
> -  * sure that it is encrypted before accessing it.
> +  * sure that it is encrypted before accessing it. For SEV the
> +  * ramdisk will already be encyrpted, so only do this for SME.
   ^
Please introduce a spellchecker into your patch creation workflow.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 


Re: [PATCH 3/4] powerpc: pseries: only store the device node basename in full_name

2017-07-26 Thread Rob Herring
On Wed, Jul 26, 2017 at 5:07 AM, David Laight  wrote:
> From: Rob Herring
>> Sent: 25 July 2017 22:44
>> With dependencies on full_name containing the entire device node path
>> removed, stop storing the full_name in nodes created by
>> dlpar_configure_connector() and pSeries_reconfig_add_node().
> ...
>>   dn = kzalloc(sizeof(*dn), GFP_KERNEL);
>>   if (!dn)
>>   return NULL;
>>
>>   name = (char *)ccwa + be32_to_cpu(ccwa->name_offset);
>> - dn->full_name = kasprintf(GFP_KERNEL, "%s/%s", path, name);
>> + dn->full_name = kasprintf(GFP_KERNEL, "%s", name);
>
> Isn't this just strdup()?

Yes, it can be simplified to that now.

> Perhaps you should rename full_name to name - since it is no longer 'full'?

Ideally, yes. However, we still have users in other places tree wide
(which should still work with the change) and I don't think it is
worth the additional churn. Also, we already have "name" as that is
the node name without the unit-address. I'd like to get rid of that,
but name is special in that it is exposed as a property too. Finally,
full_name is still the full path on Sparc.

> Maybe you could do a single malloc() for both 'dn' and the name?

Sure.

Rob


Re: [PATCH 0/4] Removing full paths from DT full_name

2017-07-26 Thread Rob Herring
On Wed, Jul 26, 2017 at 9:20 AM, Frank Rowand  wrote:
> Hi Rob,
>
> On 07/25/17 14:44, Rob Herring wrote:
>> This series is the last steps to remove storing the full path for every
>> DT node. Instead, we can create full path strings dynamically as needed
>> with printf %pOF specifiers (commit ce4fecf1fe15). There are a number of
>> remaining direct users of full_name after this series. I don't believe
>> there should be any functional impact for those users with the change to
>> only the node name (+unit-address). The majority are for struct
>> resource.name. This should only affect /proc/iomem display.
>
> I added a new dependency on full_name in:
>
>   [PATCH v4 3/3] of: overlay: add overlay symbols to live device tree
>
> You don't need to fix that -- I knew the removal of full_name was coming
> and expected to adapt to that change when it came, so I'll take care of
> it.  It will probably take me two or three weeks to get to it, but that
> shouldn't be a big deal since the affected code is a new feature that
> is not yet used.

Indeed, I had fixed up the print statements, but missed that. Thanks
for the heads up.

Rob


Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?

2017-07-26 Thread Jonathan Cameron
On Wed, 26 Jul 2017 07:14:17 -0700
"Paul E. McKenney"  wrote:

> On Wed, Jul 26, 2017 at 01:28:01PM +0100, Jonathan Cameron wrote:
> > On Wed, 26 Jul 2017 10:32:32 +0100
> > Jonathan Cameron  wrote:
> >   
> > > On Wed, 26 Jul 2017 09:16:23 +0100
> > > Jonathan Cameron  wrote:
> > >   
> > > > On Tue, 25 Jul 2017 21:12:17 -0700
> > > > "Paul E. McKenney"  wrote:
> > > > 
> > > > > On Tue, Jul 25, 2017 at 09:02:33PM -0700, David Miller wrote:  
> > > > > > From: "Paul E. McKenney" 
> > > > > > Date: Tue, 25 Jul 2017 20:55:45 -0700
> > > > > > 
> > > > > > > On Tue, Jul 25, 2017 at 02:10:29PM -0700, David Miller wrote: 
> > > > > > >
> > > > > > >> Just to report, turning softlockup back on fixes things for me on
> > > > > > >> sparc64 too.
> > > > > > > 
> > > > > > > Very good!
> > > > > > > 
> > > > > > >> The thing about softlockup is it runs an hrtimer, which seems to 
> > > > > > >> run
> > > > > > >> about every 4 seconds.
> > > > > > > 
> > > > > > > I could see where that could shake things loose, but I am 
> > > > > > > surprised that
> > > > > > > it would be needed.  I ran a short run with 
> > > > > > > CONFIG_SOFTLOCKUP_DETECTOR=y
> > > > > > > with no trouble, but I will be running a longer test later on.
> > > > > > > 
> > > > > > >> So I wonder if this is a NO_HZ problem.
> > > > > > > 
> > > > > > > Might be.  My tests run with NO_HZ_FULL=n and NO_HZ_IDLE=y.  What 
> > > > > > > are
> > > > > > > you running?  (Again, my symptoms are slightly different, so I 
> > > > > > > might
> > > > > > > be seeing a different bug.)
> > > > > > 
> > > > > > I run with NO_HZ_FULL=n and NO_HZ_IDLE=y, just like you.
> > > > > > 
> > > > > > To clarify, the symptoms show up with SOFTLOCKUP_DETECTOR disabled. 
> > > > > >
> > > > > 
> > > > > Same here -- but my failure case happens fairly rarely, so it will 
> > > > > take
> > > > > some time to gain reasonable confidence that enabling 
> > > > > SOFTLOCKUP_DETECTOR
> > > > > had effect.
> > > > > 
> > > > > But you are right, might be interesting to try NO_HZ_PERIODIC=y
> > > > > or NO_HZ_FULL=y.  So many possible tests, and so little time.  ;-)
> > > > > 
> > > > >   Thanx, Paul
> > > > >   
> > > > I'll be the headless chicken running around and trying as many tests
> > > > as I can fit in.  Typical time to see the failure for us is sub 10
> > > > minutes so we'll see how far we get.
> > > > 
> > > > Make me a list to run if you like ;)
> > > > 
> > > > NO_HZ_PERIODIC=y running now.
> > > By which I mean CONFIG_HZ_PERIODIC=y  
> 
> I did get that messed up, didn't I?  Sorry for my confusion!
> 
> > > Anyhow, run for 40 minutes with out seeing a splat but my sanity check
> > > on the NO_FULL_HZ=n and NO_HZ_IDLE=y this morning took 20 minutes so
> > > I won't have much confidence until we are a few hours in on this.
> > > 
> > > Anyhow, certainly looking like a promising direction for investigation!
> > >   
> > Well it's done over 3 hours without a splat so I think it is fine with
> > CONFIG_HZ_PERIODIC=y  
> 
> Thank you!
> 
> If you run with SOFTLOCKUP_DETECTOR=n and NO_HZ_IDLE=y, but have a normal
> user task waking up every few seconds on each CPU, does the problem occur?
> (The question is whether any disturbance gets things going, or whether there
> is something special about SOFTLOCKUP_DETECTOR=y and HZ_PERIODIC=y.
> 
> Dave, any other ideas on what might be causing this or what might be
> tested?
> 
>   Thanx, Paul
> 

Although it's still early days (40 mins in) it looks like the issue first
occurred between 4.10-rc7 and 4.11-rc1 (don't ask why those particular RCs)

Bad as with current kernel on 4.11-rc1 and good on 4.10-rc7.

Could be something different was hiding it in 4.10 though.  We have a fair
delta from mainline back then unfortunately so bisecting will be
'interesting'.

I'll see if I can get the test you suggest running.

Jonathan


Re: [PATCH 0/4] Removing full paths from DT full_name

2017-07-26 Thread Frank Rowand
Hi Rob,

On 07/25/17 14:44, Rob Herring wrote:
> This series is the last steps to remove storing the full path for every 
> DT node. Instead, we can create full path strings dynamically as needed 
> with printf %pOF specifiers (commit ce4fecf1fe15). There are a number of 
> remaining direct users of full_name after this series. I don't believe 
> there should be any functional impact for those users with the change to 
> only the node name (+unit-address). The majority are for struct 
> resource.name. This should only affect /proc/iomem display.

I added a new dependency on full_name in:

  [PATCH v4 3/3] of: overlay: add overlay symbols to live device tree

You don't need to fix that -- I knew the removal of full_name was coming
and expected to adapt to that change when it came, so I'll take care of
it.  It will probably take me two or three weeks to get to it, but that
shouldn't be a big deal since the affected code is a new feature that
is not yet used.

-Frank

> 
> Patches 1 and 2 can be applied now for 4.14. For patches 3 and 4, my 
> target is 4.15 after all the dependencies have been merged.
> 
> PPC folks, Please test! The PPC parts are untested. A git branch with 
> all the dependencies is here[1].
> 
> Rob
> 
> [1] git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git dt-printf
> 
> Rob Herring (4):
>   powerpc: pseries: vio: match parent nodes with of_find_node_by_path
>   powerpc: pseries: remove dlpar_attach_node dependency on full path
>   powerpc: pseries: only store the device node basename in full_name
>   of/fdt: only store the device node basename in full_name
> 
>  arch/powerpc/platforms/pseries/dlpar.c   | 26 +++
>  arch/powerpc/platforms/pseries/hotplug-cpu.c |  2 +-
>  arch/powerpc/platforms/pseries/mobility.c|  2 +-
>  arch/powerpc/platforms/pseries/pseries.h |  2 +-
>  arch/powerpc/platforms/pseries/reconfig.c|  2 +-
>  arch/powerpc/platforms/pseries/vio.c |  4 +-
>  drivers/of/fdt.c | 69 
> +---
>  7 files changed, 23 insertions(+), 84 deletions(-)
> 



Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?

2017-07-26 Thread Paul E. McKenney
On Wed, Jul 26, 2017 at 01:28:01PM +0100, Jonathan Cameron wrote:
> On Wed, 26 Jul 2017 10:32:32 +0100
> Jonathan Cameron  wrote:
> 
> > On Wed, 26 Jul 2017 09:16:23 +0100
> > Jonathan Cameron  wrote:
> > 
> > > On Tue, 25 Jul 2017 21:12:17 -0700
> > > "Paul E. McKenney"  wrote:
> > >   
> > > > On Tue, Jul 25, 2017 at 09:02:33PM -0700, David Miller wrote:
> > > > > From: "Paul E. McKenney" 
> > > > > Date: Tue, 25 Jul 2017 20:55:45 -0700
> > > > >   
> > > > > > On Tue, Jul 25, 2017 at 02:10:29PM -0700, David Miller wrote:  
> > > > > >> Just to report, turning softlockup back on fixes things for me on
> > > > > >> sparc64 too.  
> > > > > > 
> > > > > > Very good!
> > > > > >   
> > > > > >> The thing about softlockup is it runs an hrtimer, which seems to 
> > > > > >> run
> > > > > >> about every 4 seconds.  
> > > > > > 
> > > > > > I could see where that could shake things loose, but I am surprised 
> > > > > > that
> > > > > > it would be needed.  I ran a short run with 
> > > > > > CONFIG_SOFTLOCKUP_DETECTOR=y
> > > > > > with no trouble, but I will be running a longer test later on.
> > > > > >   
> > > > > >> So I wonder if this is a NO_HZ problem.  
> > > > > > 
> > > > > > Might be.  My tests run with NO_HZ_FULL=n and NO_HZ_IDLE=y.  What 
> > > > > > are
> > > > > > you running?  (Again, my symptoms are slightly different, so I might
> > > > > > be seeing a different bug.)  
> > > > > 
> > > > > I run with NO_HZ_FULL=n and NO_HZ_IDLE=y, just like you.
> > > > > 
> > > > > To clarify, the symptoms show up with SOFTLOCKUP_DETECTOR disabled.   
> > > > >
> > > > 
> > > > Same here -- but my failure case happens fairly rarely, so it will take
> > > > some time to gain reasonable confidence that enabling 
> > > > SOFTLOCKUP_DETECTOR
> > > > had effect.
> > > > 
> > > > But you are right, might be interesting to try NO_HZ_PERIODIC=y
> > > > or NO_HZ_FULL=y.  So many possible tests, and so little time.  ;-)
> > > > 
> > > > Thanx, Paul
> > > > 
> > > I'll be the headless chicken running around and trying as many tests
> > > as I can fit in.  Typical time to see the failure for us is sub 10
> > > minutes so we'll see how far we get.
> > > 
> > > Make me a list to run if you like ;)
> > > 
> > > NO_HZ_PERIODIC=y running now.  
> > By which I mean CONFIG_HZ_PERIODIC=y

I did get that messed up, didn't I?  Sorry for my confusion!

> > Anyhow, run for 40 minutes with out seeing a splat but my sanity check
> > on the NO_FULL_HZ=n and NO_HZ_IDLE=y this morning took 20 minutes so
> > I won't have much confidence until we are a few hours in on this.
> > 
> > Anyhow, certainly looking like a promising direction for investigation!
> > 
> Well it's done over 3 hours without a splat so I think it is fine with
> CONFIG_HZ_PERIODIC=y

Thank you!

If you run with SOFTLOCKUP_DETECTOR=n and NO_HZ_IDLE=y, but have a normal
user task waking up every few seconds on each CPU, does the problem occur?
(The question is whether any disturbance gets things going, or whether there
is something special about SOFTLOCKUP_DETECTOR=y and HZ_PERIODIC=y.

Dave, any other ideas on what might be causing this or what might be
tested?

Thanx, Paul



Re: [PATCH 4/4] of/fdt: only store the device node basename in full_name

2017-07-26 Thread Rob Herring
On Wed, Jul 26, 2017 at 5:26 AM, Michael Ellerman  wrote:
> Rob Herring  writes:
>
>> With dependencies on a statically allocated full path name converted to
>> use %pOF format specifier, we can store just the basename of node, and
>> the unflattening of the FDT can be simplified.
>>
>> This commit will affect the remaining users of full_name. After
>> analyzing these users, the remaining cases should only change some print
>> messages. The main users of full_name are providing a name for struct
>> resource. The resource names shouldn't be important other than providing
>> /proc/iomem names.
>>
>> We no longer distinguish between pre and post 0x10 dtb formats as either
>> a full path or basename will work. However, less than 0x10 formats have
>> been broken since the conversion to use libfdt (and no one has cared).
>
> For the record - yes we did care. It broke booting with old versions of
> kexec, and it was a royal P.I.T.# to debug :D

Sorry, I forgot about that one. I'll drop the statement. I had gone
back and looked and only found the issue on mpc8323 booting[1] which
was an issue with libfdt having more checks on the fdt. I proposed
some fixes, but never heard back on that.

Rob

[1] https://lkml.org/lkml/2015/6/10/820


Re: Possible circular locking dependency detected between cpu_hotplug_lock.rw_sem and wfc.work

2017-07-26 Thread Thomas Gleixner
On Wed, 26 Jul 2017, Michael Ellerman wrote:

> Hi Thomas,
> 
> I'm seeing the lockdep barf below on some bare metal Power8 machines.
> 
> This seems to be caused by our smp_cpus_done(), which does:
> 
>   void __init smp_cpus_done(unsigned int max_cpus)
>   {
>   /*
>* We want the setup_cpu() here to be called on the boot CPU, but
>* init might run on any CPU, so make sure it's invoked on the boot
>* CPU.
>*/
>   if (smp_ops && smp_ops->setup_cpu)
>   work_on_cpu_safe(boot_cpuid, smp_setup_cpu_workfn, NULL);
> 
> 
> I don't think CPU hotplug can happen at this point, so I don't think
> there's really a bug.
> 
> But it looks like the work_on_cpu_safe() call could just go away, since
> you pinned init to the boot CPU in 8fb12156b8db ("init: Pin init task to
> the boot CPU, initially"). Though I can't see where init is unpinned, so
> maybe we do still need to do it?

It's undone in sched_init_smp(). So it looks safe. The call order is:

 smp_init()
...
smp_cpus_done()

 sched_init_smp()

Thanks,

tglx


[PATCH] powerpc/boot: Fix 64-bit boot wrapper build with non-biarch compiler

2017-07-26 Thread Michael Ellerman
Historically the boot wrapper was always built 32-bit big endian, even
for 64-bit kernels. That was because old firmwares didn't necessarily
support booting a 64-bit image. Because of that arch/powerpc/boot/Makefile
uses CROSS32CC for compilation.

However when we added 64-bit little endian support, we also added
support for building the boot wrapper 64-bit. However we kept using
CROSS32CC, because in most cases it is just CC and everything works.

However if the user doesn't specify CROSS32_COMPILE (which no one ever
does AFAIK), and CC is *not* biarch (32/64-bit capable), then CROSS32CC
becomes just "gcc". On native systems that is probably OK, but if we're
cross building it definitely isn't, leading to eg:

  gcc ... -m64 -mlittle-endian -mabi=elfv2 ... arch/powerpc/boot/cpm-serial.c
  gcc: error: unrecognized argument in option ‘-mabi=elfv2’
  gcc: error: unrecognized command line option ‘-mlittle-endian’
  make: *** [zImage] Error 2

To fix it, stop using CROSS32CC, because we may or may not be building
32-bit. Instead setup a BOOTCC, which defaults to CC, and only use
CROSS32_COMPILE if it's set and we're building for 32-bit.

Fixes: 147c05168fc8 ("powerpc/boot: Add support for 64bit little endian 
wrapper")
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/boot/Makefile | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index a7814a7b1523..6f952fe1f084 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -25,12 +25,20 @@ compress-$(CONFIG_KERNEL_XZ)   := CONFIG_KERNEL_XZ
 BOOTCFLAGS:= -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
 -fno-strict-aliasing -Os -msoft-float -pipe \
 -fomit-frame-pointer -fno-builtin -fPIC -nostdinc \
--isystem $(shell $(CROSS32CC) -print-file-name=include) \
 -D$(compress-y)
 
+BOOTCC := $(CC)
 ifdef CONFIG_PPC64_BOOT_WRAPPER
 BOOTCFLAGS += -m64
+else
+BOOTCFLAGS += -m32
+ifdef CROSS32_COMPILE
+BOOTCC := $(CROSS32_COMPILE)gcc
+endif
 endif
+
+BOOTCFLAGS += -isystem $(shell $(BOOTCC) -print-file-name=include)
+
 ifdef CONFIG_CPU_BIG_ENDIAN
 BOOTCFLAGS += -mbig-endian
 else
@@ -183,10 +191,10 @@ clean-files := $(zlib-) $(zlibheader-) 
$(zliblinuxheader-) \
empty.c zImage.coff.lds zImage.ps3.lds zImage.lds
 
 quiet_cmd_bootcc = BOOTCC  $@
-  cmd_bootcc = $(CROSS32CC) -Wp,-MD,$(depfile) $(BOOTCFLAGS) -c -o $@ $<
+  cmd_bootcc = $(BOOTCC) -Wp,-MD,$(depfile) $(BOOTCFLAGS) -c -o $@ $<
 
 quiet_cmd_bootas = BOOTAS  $@
-  cmd_bootas = $(CROSS32CC) -Wp,-MD,$(depfile) $(BOOTAFLAGS) -c -o $@ $<
+  cmd_bootas = $(BOOTCC) -Wp,-MD,$(depfile) $(BOOTAFLAGS) -c -o $@ $<
 
 quiet_cmd_bootar = BOOTAR  $@
   cmd_bootar = $(CROSS32AR) -cr$(KBUILD_ARFLAGS) $@. $(filter-out 
FORCE,$^); mv $@. $@
-- 
2.7.4



Re: KVM guests freeze under upstream kernel

2017-07-26 Thread joserz
On Thu, Jul 20, 2017 at 10:18:18PM -0300, jos...@linux.vnet.ibm.com wrote:
> On Thu, Jul 20, 2017 at 03:21:59PM +1000, Paul Mackerras wrote:
> > On Thu, Jul 20, 2017 at 12:02:23AM -0300, jos...@linux.vnet.ibm.com wrote:
> > > On Thu, Jul 20, 2017 at 09:42:50AM +1000, Benjamin Herrenschmidt wrote:
> > > > On Wed, 2017-07-19 at 16:46 -0300, jos...@linux.vnet.ibm.com wrote:
> > > > > Hello!
> > > > > 
> > > > > We're not able to boot any KVM guest using upstream kernel 
> > > > > (cb8c65ccff7f77d0285f1b126c72d37b2572c865 - 4.13.0-rc1+).
> > > > > After reaching the SLOF initial counting, the guest simply freezes:
> > > > 
> > > > Can you send our .config ?
> > > 
> > > Sure,
> > > 
> > > Answering Michael as well:
> > > 
> > > It's a P9 with RHEL kernel 4.11.0-10.el7a.ppc64le installed. The problem
> > > was noticed with kernel > 4.13 (I'm currently running 4.13.0-rc1+).
> > > 
> > > QEMU is https://github.com/dgibson/qemu (ppc-for-2.10) but I gave the
> > > default packaged Qemu a try.
> > > 
> > > For the guest, I tried both a vanilla Ubuntu 17.04 and the host kernel.
> > > But they had never a chance to run since the freezing happened in SLOF.
> > > 
> > > Note that using the 4.11.0-10.el7a.ppc64le kernel it works fine
> > > (for any of these Qemu/Guest setup). With 4.13.0-rc1 I have it run after
> > > reverting that referred commit.
> > 
> > Is the host kernel running in radix mode?
> 
> yes
> 
> > 
> > Did you check the host kernel logs for any oops messages?
> 
> dmesg was clean but after sometime waiting (I forgot QEMU running in
> another terminal) I got the oops below (after rebooting the host I 
> couldn't reproduce it again).
> 
> Another test that I did was:
> Compile with transparent huge pages disabled: KVM works fine
> Compile with transparent huge pages enabled: doesn't work
>   + disabling it in /sys/kernel/mm/transparent_hugepage: doesn't work
> 
> Just out of my own curiosity I made this small change:
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h
> b/arch/powerpc/include
> index c0737c8..f94a3b6 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -80,7 +80,7 @@
>  
>   #define _PAGE_SOFT_DIRTY   _RPAGE_SW3 /* software: software dirty
>   tracking 
>#define _PAGE_SPECIAL  _RPAGE_SW2 /* software: special page */
>-#define _PAGE_DEVMAP   _RPAGE_SW1 /* software: ZONE_DEVICE page */
>+#define _PAGE_DEVMAP   _RPAGE_RSV3
> #define __HAVE_ARCH_PTE_DEVMAP
> 
> and it works. I chose _RPAGE_RSV3 because it uses the same value that
> x86 uses (0x0400UL) but I don't if it could have any side
> effect
> 

Does this change make any sense to you people?
I didn't see any side effect expect that devices backed memory will have
a bigger address space in transparent huge pages IF I understand that
correctly.

If so I can send a patch with this change.

Thank you!!



[PATCH] powerpc/Makefile: Fix ld version check with 64-bit LE-only toolchain

2017-07-26 Thread Michael Ellerman
In commit efe0160cfd40 ("powerpc/64: Linker on-demand sfpr functions
for modules"), we added an ld version check early in the powerpc
top-level Makefile.

Because the Makefile runs before the kernel config is setup, the
checks for CONFIG_CPU_LITTLE_ENDIAN etc. all take the default case. So
we end up configuring ld for 32-bit big endian.

That would be OK, except that for historical (or perhaps no) reason,
we use 'override LD' to add the endian flags to the LD variable
itself, rather than the normal approach of adding them to LDFLAGS.

The end result is that when we check the ld version we run it as:

  $(CROSS_COMPILE)ld -EB -m elf32ppc --version

This often works, unless you are using a 64-bit only and/or little
endian only, toolchain. In which case you see something like:

  $ make defconfig
  powerpc64le-linux-ld: unrecognised emulation mode: elf32ppc
  Supported emulations: elf64lppc elf32lppc elf32lppclinux elf32lppcsim
  /bin/sh: 1: [: -ge: unexpected operator

The proper fix is to stop using 'override LD', but that will require a
fair bit of testing. Instead we can fix it for now just by reordering
the Makefile to do the version check earlier.

Fixes: efe0160cfd40 ("powerpc/64: Linker on-demand sfpr functions for modules")
Signed-off-by: Michael Ellerman 
---
 arch/powerpc/Makefile | 25 +
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 8d4ed73d5490..e2b3e7a00c9e 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -59,6 +59,19 @@ machine-$(CONFIG_PPC64) += 64
 machine-$(CONFIG_CPU_LITTLE_ENDIAN) += le
 UTS_MACHINE := $(subst $(space),,$(machine-y))
 
+# XXX This needs to be before we override LD below
+ifdef CONFIG_PPC32
+KBUILD_LDFLAGS_MODULE += arch/powerpc/lib/crtsavres.o
+else
+ifeq ($(call ld-ifversion, -ge, 22500, y),y)
+# Have the linker provide sfpr if possible.
+# There is a corresponding test in arch/powerpc/lib/Makefile
+KBUILD_LDFLAGS_MODULE += --save-restore-funcs
+else
+KBUILD_LDFLAGS_MODULE += arch/powerpc/lib/crtsavres.o
+endif
+endif
+
 ifeq ($(CONFIG_CPU_LITTLE_ENDIAN),y)
 override LD+= -EL
 LDEMULATION:= lppc
@@ -190,18 +203,6 @@ else
 CHECKFLAGS += -D__LITTLE_ENDIAN__
 endif
 
-ifdef CONFIG_PPC32
-KBUILD_LDFLAGS_MODULE += arch/powerpc/lib/crtsavres.o
-else
-ifeq ($(call ld-ifversion, -ge, 22500, y),y)
-# Have the linker provide sfpr if possible.
-# There is a corresponding test in arch/powerpc/lib/Makefile
-KBUILD_LDFLAGS_MODULE += --save-restore-funcs
-else
-KBUILD_LDFLAGS_MODULE += arch/powerpc/lib/crtsavres.o
-endif
-endif
-
 ifeq ($(CONFIG_476FPE_ERR46),y)
KBUILD_LDFLAGS_MODULE += --ppc476-workaround \
-T $(srctree)/arch/powerpc/platforms/44x/ppc476_modules.lds
-- 
2.7.4



Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?

2017-07-26 Thread Jonathan Cameron
On Wed, 26 Jul 2017 13:28:01 +0100
Jonathan Cameron  wrote:

> On Wed, 26 Jul 2017 10:32:32 +0100
> Jonathan Cameron  wrote:
> 
> > On Wed, 26 Jul 2017 09:16:23 +0100
> > Jonathan Cameron  wrote:
> >   
> > > On Tue, 25 Jul 2017 21:12:17 -0700
> > > "Paul E. McKenney"  wrote:
> > > 
> > > > On Tue, Jul 25, 2017 at 09:02:33PM -0700, David Miller wrote:  
> > > > > From: "Paul E. McKenney" 
> > > > > Date: Tue, 25 Jul 2017 20:55:45 -0700
> > > > > 
> > > > > > On Tue, Jul 25, 2017 at 02:10:29PM -0700, David Miller wrote:   
> > > > > >  
> > > > > >> Just to report, turning softlockup back on fixes things for me on
> > > > > >> sparc64 too.
> > > > > > 
> > > > > > Very good!
> > > > > > 
> > > > > >> The thing about softlockup is it runs an hrtimer, which seems to 
> > > > > >> run
> > > > > >> about every 4 seconds.
> > > > > > 
> > > > > > I could see where that could shake things loose, but I am surprised 
> > > > > > that
> > > > > > it would be needed.  I ran a short run with 
> > > > > > CONFIG_SOFTLOCKUP_DETECTOR=y
> > > > > > with no trouble, but I will be running a longer test later on.
> > > > > > 
> > > > > >> So I wonder if this is a NO_HZ problem.
> > > > > > 
> > > > > > Might be.  My tests run with NO_HZ_FULL=n and NO_HZ_IDLE=y.  What 
> > > > > > are
> > > > > > you running?  (Again, my symptoms are slightly different, so I might
> > > > > > be seeing a different bug.)
> > > > > 
> > > > > I run with NO_HZ_FULL=n and NO_HZ_IDLE=y, just like you.
> > > > > 
> > > > > To clarify, the symptoms show up with SOFTLOCKUP_DETECTOR disabled.   
> > > > >  
> > > > 
> > > > Same here -- but my failure case happens fairly rarely, so it will take
> > > > some time to gain reasonable confidence that enabling 
> > > > SOFTLOCKUP_DETECTOR
> > > > had effect.
> > > > 
> > > > But you are right, might be interesting to try NO_HZ_PERIODIC=y
> > > > or NO_HZ_FULL=y.  So many possible tests, and so little time.  ;-)
> > > > 
> > > > Thanx, Paul
> > > >   
> > > I'll be the headless chicken running around and trying as many tests
> > > as I can fit in.  Typical time to see the failure for us is sub 10
> > > minutes so we'll see how far we get.
> > > 
> > > Make me a list to run if you like ;)
> > > 
> > > NO_HZ_PERIODIC=y running now.
> > By which I mean CONFIG_HZ_PERIODIC=y
> > 
> > Anyhow, run for 40 minutes with out seeing a splat but my sanity check
> > on the NO_FULL_HZ=n and NO_HZ_IDLE=y this morning took 20 minutes so
> > I won't have much confidence until we are a few hours in on this.
> > 
> > Anyhow, certainly looking like a promising direction for investigation!
> >   
> Well it's done over 3 hours without a splat so I think it is fine with
> CONFIG_HZ_PERIODIC=y
> 
As I think we expected, the problem occurs with NO_HZ_FULL.
Happened pretty quickly but given the somewhat random nature,
might just be coincidence.

Jonathan
> 
> > Jonathan
> >   
> > > 
> > > Jonathan
> > > 
> > > ___
> > > linuxarm mailing list
> > > linux...@huawei.com
> > > http://rnd-openeuler.huawei.com/mailman/listinfo/linuxarm
> > 
> > 
> > ___
> > linuxarm mailing list
> > linux...@huawei.com
> > http://rnd-openeuler.huawei.com/mailman/listinfo/linuxarm  
> 
> 
> ___
> linuxarm mailing list
> linux...@huawei.com
> http://rnd-openeuler.huawei.com/mailman/listinfo/linuxarm




Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?

2017-07-26 Thread Jonathan Cameron
On Wed, 26 Jul 2017 10:32:32 +0100
Jonathan Cameron  wrote:

> On Wed, 26 Jul 2017 09:16:23 +0100
> Jonathan Cameron  wrote:
> 
> > On Tue, 25 Jul 2017 21:12:17 -0700
> > "Paul E. McKenney"  wrote:
> >   
> > > On Tue, Jul 25, 2017 at 09:02:33PM -0700, David Miller wrote:
> > > > From: "Paul E. McKenney" 
> > > > Date: Tue, 25 Jul 2017 20:55:45 -0700
> > > >   
> > > > > On Tue, Jul 25, 2017 at 02:10:29PM -0700, David Miller wrote:  
> > > > >> Just to report, turning softlockup back on fixes things for me on
> > > > >> sparc64 too.  
> > > > > 
> > > > > Very good!
> > > > >   
> > > > >> The thing about softlockup is it runs an hrtimer, which seems to run
> > > > >> about every 4 seconds.  
> > > > > 
> > > > > I could see where that could shake things loose, but I am surprised 
> > > > > that
> > > > > it would be needed.  I ran a short run with 
> > > > > CONFIG_SOFTLOCKUP_DETECTOR=y
> > > > > with no trouble, but I will be running a longer test later on.
> > > > >   
> > > > >> So I wonder if this is a NO_HZ problem.  
> > > > > 
> > > > > Might be.  My tests run with NO_HZ_FULL=n and NO_HZ_IDLE=y.  What are
> > > > > you running?  (Again, my symptoms are slightly different, so I might
> > > > > be seeing a different bug.)  
> > > > 
> > > > I run with NO_HZ_FULL=n and NO_HZ_IDLE=y, just like you.
> > > > 
> > > > To clarify, the symptoms show up with SOFTLOCKUP_DETECTOR disabled. 
> > > >  
> > > 
> > > Same here -- but my failure case happens fairly rarely, so it will take
> > > some time to gain reasonable confidence that enabling SOFTLOCKUP_DETECTOR
> > > had effect.
> > > 
> > > But you are right, might be interesting to try NO_HZ_PERIODIC=y
> > > or NO_HZ_FULL=y.  So many possible tests, and so little time.  ;-)
> > > 
> > >   Thanx, Paul
> > > 
> > I'll be the headless chicken running around and trying as many tests
> > as I can fit in.  Typical time to see the failure for us is sub 10
> > minutes so we'll see how far we get.
> > 
> > Make me a list to run if you like ;)
> > 
> > NO_HZ_PERIODIC=y running now.  
> By which I mean CONFIG_HZ_PERIODIC=y
> 
> Anyhow, run for 40 minutes with out seeing a splat but my sanity check
> on the NO_FULL_HZ=n and NO_HZ_IDLE=y this morning took 20 minutes so
> I won't have much confidence until we are a few hours in on this.
> 
> Anyhow, certainly looking like a promising direction for investigation!
> 
Well it's done over 3 hours without a splat so I think it is fine with
CONFIG_HZ_PERIODIC=y


> Jonathan
> 
> > 
> > Jonathan
> > 
> > ___
> > linuxarm mailing list
> > linux...@huawei.com
> > http://rnd-openeuler.huawei.com/mailman/listinfo/linuxarm  
> 
> 
> ___
> linuxarm mailing list
> linux...@huawei.com
> http://rnd-openeuler.huawei.com/mailman/listinfo/linuxarm




Possible circular locking dependency detected between cpu_hotplug_lock.rw_sem and wfc.work

2017-07-26 Thread Michael Ellerman
Hi Thomas,

I'm seeing the lockdep barf below on some bare metal Power8 machines.

This seems to be caused by our smp_cpus_done(), which does:

  void __init smp_cpus_done(unsigned int max_cpus)
  {
/*
 * We want the setup_cpu() here to be called on the boot CPU, but
 * init might run on any CPU, so make sure it's invoked on the boot
 * CPU.
 */
if (smp_ops && smp_ops->setup_cpu)
work_on_cpu_safe(boot_cpuid, smp_setup_cpu_workfn, NULL);


I don't think CPU hotplug can happen at this point, so I don't think
there's really a bug.

But it looks like the work_on_cpu_safe() call could just go away, since
you pinned init to the boot CPU in 8fb12156b8db ("init: Pin init task to
the boot CPU, initially"). Though I can't see where init is unpinned, so
maybe we do still need to do it?

cheers


[1.468911] ipr: IBM Power RAID SCSI Device Driver version: 2.6.4 (March 14, 
2017)
[1.469032] ipr 0001:08:00.0: Found IOA with IRQ: 0
[1.469167] 
[1.469193] ==
[1.469250] WARNING: possible circular locking dependency detected
[1.469309] 4.12.0-gcc6-gb40b238 #1 Not tainted
[1.469355] --
[1.469413] kworker/0:1/971 is trying to acquire lock:
[1.469459]  (cpu_hotplug_lock.rw_sem){++}, at: [] 
apply_workqueue_attrs+0x34/0xa0
[1.469544] 
[1.469544] but task is already holding lock:
[1.469602]  (()){+.+.+.}, at: [] 
process_one_work+0x25c/0x800
[1.469674] 
[1.469674] which lock already depends on the new lock.
[1.469674] 
[1.469744] 
[1.469744] the existing dependency chain (in reverse order) is:
[1.469813] 
[1.469813] -> #1 (()){+.+.+.}:
[1.469862]flush_work+0x74/0x340
[1.469898]work_on_cpu+0xb8/0xe0
[1.469934]work_on_cpu_safe+0x80/0xd0
[1.469982]smp_cpus_done+0x54/0xb0
[1.470017]smp_init+0x1a0/0x1cc
[1.470053]kernel_init_freeable+0x1f4/0x3b0
[1.470100]kernel_init+0x24/0x160
[1.470137]ret_from_kernel_thread+0x5c/0x74
[1.470183] 
[1.470183] -> #0 (cpu_hotplug_lock.rw_sem){++}:
[1.470244]lock_acquire+0xec/0x2e0
[1.470279]cpus_read_lock+0x4c/0xd0
[1.470315]apply_workqueue_attrs+0x34/0xa0
[1.470362]__alloc_workqueue_key+0x1b4/0x5b4
[1.470410]scsi_host_alloc+0x328/0x4b0
[1.470457]ipr_probe+0xb4/0x1f00
[1.470493]local_pci_probe+0x6c/0x140
[1.470541]work_for_cpu_fn+0x38/0x60
[1.470588]process_one_work+0x310/0x800
[1.470635]worker_thread+0x2b0/0x520
[1.470682]kthread+0x164/0x1b0
[1.470718]ret_from_kernel_thread+0x5c/0x74
[1.470764] 
[1.470764] other info that might help us debug this:
[1.470764] 
[1.470834]  Possible unsafe locking scenario:
[1.470834] 
[1.470891]CPU0CPU1
[1.470937]
[1.470984]   lock(());
[1.471020]lock(cpu_hotplug_lock.rw_sem);
[1.471078]lock(());
[1.471136]   lock(cpu_hotplug_lock.rw_sem);
[1.471183] 
[1.471183]  *** DEADLOCK ***
[1.471183] 
[1.471242] 2 locks held by kworker/0:1/971:
[1.471289]  #0:  ("events"){.+.+.+}, at: [] 
process_one_work+0x25c/0x800
[1.471360]  #1:  (()){+.+.+.}, at: [] 
process_one_work+0x25c/0x800
[1.471488] 
[1.471488] stack backtrace:
[1.471582] CPU: 0 PID: 971 Comm: kworker/0:1 Not tainted 
4.12.0-gcc6-gb40b238 #1
[1.471721] Workqueue: events work_for_cpu_fn
[1.471813] Call Trace:
[1.471859] [c007f877f560] [c0b491b8] dump_stack+0xe8/0x160 
(unreliable)
[1.471996] [c007f877f5a0] [c0153e68] 
print_circular_bug+0x288/0x3d0
[1.472133] [c007f877f640] [c0157ec8] 
__lock_acquire+0x1858/0x1a20
[1.472271] [c007f877f7b0] [c0158bfc] lock_acquire+0xec/0x2e0
[1.472386] [c007f877f880] [c00d4f8c] cpus_read_lock+0x4c/0xd0
[1.472501] [c007f877f8b0] [c0100974] 
apply_workqueue_attrs+0x34/0xa0
[1.472639] [c007f877f8f0] [c0102ef4] 
__alloc_workqueue_key+0x1b4/0x5b4
[1.472776] [c007f877f9b0] [c078cd58] scsi_host_alloc+0x328/0x4b0
[1.472913] [c007f877fa50] [c07bd414] ipr_probe+0xb4/0x1f00
[1.473028] [c007f877fbb0] [c06157fc] local_pci_probe+0x6c/0x140
[1.473166] [c007f877fc40] [c00f8298] work_for_cpu_fn+0x38/0x60
[1.473281] [c007f877fc70] [c00fdbe0] 
process_one_work+0x310/0x800
[1.473418] [c007f877fd30] [c00fe380] worker_thread+0x2b0/0x520
[1.473534] [c007f877fdc0] [c0107c84] kthread+0x164/0x1b0
[1.473649] [c007f877fe30] [c000b6e8] 
ret_from_kernel_thread+0x5c/0x74


Re: [PATCH 0/4] Allow non-legacy cards to be vgaarb default

2017-07-26 Thread Laszlo Ersek
On 07/25/17 17:56, Gabriele Paoloni wrote:
> Hi Laszlo
> 
> [...]
> 
>>
>> Having practically zero background in gfx development (either kernel or
>> Xorg), I think the problem is that vga_default_device() /
>> vga_set_default_device(), which -- apparently -- "boot_vga" is based
>> upon, come from "drivers/gpu/vga/vgaarb.c". Namely, the concept of
>> "primary / boot display device" is tied to the VGA arbiter, plus only a
>> PCI device can currently be marked as primary/boot display device.
>>
>> Can these concepts be split from each other? (I can fully imagine that
>> this would result in a userspace visible interface change (or
>> addition),
>> so that e.g. "/sys/devices/**/boot_gpu" would have to be consulted by
>> display servers.)
>>
>> (Sorry if I'm totally wrong.)
>>
>> ... Hm, reading the thread starter at
>> > d...@lists.ozlabs.org/msg120851.html>,
>> and the references within... It looks like this work is motivated by
>> hardware that is supposed to be PCI, but actually breaks the specs. Is
>> that correct? If so, then I don't think I can suggest anything useful.
> 
> My understanding is that the current PCIe HW is specs compliant but the
> vgaarb, in order to make a VGA device the default one, requires all the
> bridges on top of such device to have the "VGA Enable" bit set (optional
> bit in the PCI Express™ to PCI/PCI-X Bridge Spec). I.e. all the bridges
> on top have to support legacy VGA devices; and this is not mandatory
> from the specs...right?

Sounds very plausible to me. And, I guess if the "boot GPU" concept were
split from the VGA arbiter, then the VGA arbiter's above requirement
(a) would not have to be disturbed, and
(b) would no longer interfere with the kind of hardware that's being
discussed.

Thanks
Laszlo

> BTW my VGA experience is limited too...this is just my understanding...
> 
> Gab
> 
>> Specs exist so that hardware vendors and software authors follow them.
>> If hardware does not conform, then software should either refuse to
>> work
>> with it, or handle it with quirks, on a case-by-case basis. I guess
>> this
>> means that I don't agree with the
>>
>>   broad[] suggest[ion] that a more generic solution would be better
>>
>> which seems to disqualify me from the discussion, as it must have been
>> suggested by people with incomparably more experience than what I have
>> :)
>>
>> Thanks
>> Laszlo



Re: [RFC Part1 PATCH v3 13/17] x86/io: Unroll string I/O when SEV is active

2017-07-26 Thread Arnd Bergmann
On Tue, Jul 25, 2017 at 11:51 AM, David Laight  wrote:
> From: Brijesh Singh
>> Sent: 24 July 2017 20:08
>> From: Tom Lendacky 
>>
>> Secure Encrypted Virtualization (SEV) does not support string I/O, so
>> unroll the string I/O operation into a loop operating on one element at
>> a time.
>>
>> Signed-off-by: Tom Lendacky 
>> Signed-off-by: Brijesh Singh 
>> ---
>>  arch/x86/include/asm/io.h | 26 ++
>>  1 file changed, 22 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
>> index e080a39..2f3c002 100644
>> --- a/arch/x86/include/asm/io.h
>> +++ b/arch/x86/include/asm/io.h
>> @@ -327,14 +327,32 @@ static inline unsigned type in##bwl##_p(int port)  
>>  \
>>   \
>>  static inline void outs##bwl(int port, const void *addr, unsigned long 
>> count) \
>>  {

This will clash with a fix I did to add a "memory" clobber
for the traditional implementation, see
https://patchwork.kernel.org/patch/9854573/

> Is it even worth leaving these as inline functions?
> Given the speed of IO cycles it is unlikely that the cost of calling a real
> function will be significant.
> The code bloat reduction will be significant.

I think the smallest code would be the original "rep insb" etc, which
should be smaller than a function call, unlike the loop. Then again,
there is a rather small number of affected device drivers, almost all
of them for ancient hardware that you won't even build in a 64-bit
x86 kernel, see the list below. The only user I found that is actually
still relevant is drivers/tty/hvc/hvc_xen.c, which uses it for the early
console.

   Arnd

---
Full list for reference

$ git grep -wl '\(__ide_\|\)\(in\|out\)s\(b\|w\|l\)' drivers sound/
drivers/atm/horizon.c
drivers/cdrom/gdrom.c
drivers/gpu/drm/vmwgfx/vmwgfx_msg.h
drivers/ide/ide-io-std.c
drivers/isdn/hardware/avm/avmcard.h
drivers/isdn/hardware/eicon/divasmain.c
drivers/isdn/hardware/mISDN/avmfritz.c
drivers/isdn/hardware/mISDN/iohelper.h
drivers/isdn/hardware/mISDN/netjet.c
drivers/isdn/hardware/mISDN/w6692.c
drivers/isdn/hisax/asuscom.c
drivers/isdn/hisax/avm_a1.c
drivers/isdn/hisax/avm_a1p.c
drivers/isdn/hisax/avm_pci.c
drivers/isdn/hisax/diva.c
drivers/isdn/hisax/elsa.c
drivers/isdn/hisax/gazel.c
drivers/isdn/hisax/hisax_fcpcipnp.c
drivers/isdn/hisax/ix1_micro.c
drivers/isdn/hisax/mic.c
drivers/isdn/hisax/netjet.c
drivers/isdn/hisax/niccy.c
drivers/isdn/hisax/saphir.c
drivers/isdn/hisax/sedlbauer.c
drivers/isdn/hisax/sportster.c
drivers/isdn/hisax/teles3.c
drivers/isdn/hisax/w6692.c
drivers/isdn/hisax/w6692.h
drivers/media/rc/winbond-cir.c
drivers/mtd/spi-nor/aspeed-smc.c
drivers/net/appletalk/cops.c
drivers/net/arcnet/arcdevice.h
drivers/net/arcnet/com20020.c
drivers/net/ethernet/3com/3c509.c
drivers/net/ethernet/3com/3c515.c
drivers/net/ethernet/3com/3c574_cs.c
drivers/net/ethernet/3com/3c589_cs.c
drivers/net/ethernet/8390/axnet_cs.c
drivers/net/ethernet/8390/mcf8390.c
drivers/net/ethernet/8390/ne.c
drivers/net/ethernet/8390/ne2k-pci.c
drivers/net/ethernet/8390/pcnet_cs.c
drivers/net/ethernet/8390/smc-ultra.c
drivers/net/ethernet/amd/nmclan_cs.c
drivers/net/ethernet/fujitsu/fmvj18x_cs.c
drivers/net/ethernet/hp/hp100.c
drivers/net/ethernet/smsc/smc9194.c
drivers/net/ethernet/smsc/smc91c92_cs.c
drivers/net/ethernet/smsc/smc91x.h
drivers/net/ethernet/xircom/xirc2ps_cs.c
drivers/net/sb1000.c
drivers/net/wan/sbni.c
drivers/net/wireless/cisco/airo.c
drivers/net/wireless/intersil/hostap/hostap_cs.c
drivers/net/wireless/intersil/hostap/hostap_plx.c
drivers/net/wireless/marvell/libertas/if_cs.c
drivers/net/wireless/wl3501_cs.c
drivers/parport/parport_pc.c
drivers/pcmcia/m32r_cfc.c
drivers/scsi/NCR53c406a.c
drivers/scsi/aha152x.c
drivers/scsi/eata_pio.c
drivers/scsi/fdomain.c
drivers/scsi/g_NCR5380.c
drivers/scsi/imm.c
drivers/scsi/nsp32_io.h
drivers/scsi/pcmcia/nsp_io.h
drivers/scsi/pcmcia/sym53c500_cs.c
drivers/scsi/ppa.c
drivers/scsi/qlogicfas408.c
drivers/scsi/sym53c416.c
drivers/staging/comedi/drivers/adl_pci9111.c
drivers/staging/comedi/drivers/cb_pcidas.c
drivers/staging/comedi/drivers/das16m1.c
drivers/staging/comedi/drivers/das1800.c
drivers/staging/iio/adc/ad7606_par.c
drivers/tty/hvc/hvc_xen.c
drivers/tty/isicom.c
drivers/tty/rocket_int.h
drivers/usb/host/isp1362.h
drivers/usb/musb/blackfin.c
sound/drivers/opl4/opl4_lib.c
sound/isa/gus/gus_dram.c
sound/isa/gus/gus_pcm.c


Re: [RESEND] [v3 PATCH 2/2] powernv/powerpc: Clear PECE1 in LPCR via stop-api only on Hotplug

2017-07-26 Thread Nicholas Piggin
On Fri, 21 Jul 2017 16:31:34 +0530
"Gautham R. Shenoy"  wrote:

> From: "Gautham R. Shenoy" 
> 
> Currently we use the stop-api provided by the firmware to program the
> SLW engine to restore the values of hypervisor resources that get lost
> on deeper idle states (such as winkle). Since the deep states were
> only used for CPU-Hotplug on POWER8 systems, we would program the LPCR
> to have the PECE1 bit since Hotplugged CPUs shouldn't be spuriously
> woken up by decrementer.
> 
> On POWER9, some of the deep platform idle states such as stop4 can be
> used in cpuidle as well. In this case, we want the CPU in stop4 to be
> woken up by the decrementer when some timer on the CPU expires.
> 
> In this patch, we program the stop-api for LPCR with PECE1
> bit cleared only when we are offlining the CPU and set it
> back once the CPU is online.

Looks pretty good to me, thanks!

Reviewed-by: Nicholas Piggin 


> 
> Signed-off-by: Gautham R. Shenoy 
> ---
> v2 --> v3:
> - Program the LPCR during platform idle entry/exit on both POWER8 and
>   POWER9
> 
> v1 --> v2:
> - Move the LPCR manipulations for CPU-Hotplug into idle.c
> 
>  arch/powerpc/platforms/powernv/idle.c | 34 +-
>  arch/powerpc/platforms/powernv/smp.c  |  8 
>  2 files changed, 33 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/idle.c 
> b/arch/powerpc/platforms/powernv/idle.c
> index 2abee07..a1296e7 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -68,7 +68,7 @@ static int pnv_save_sprs_for_deep_states(void)
>* all cpus at boot. Get these reg values of current cpu and use the
>* same across all cpus.
>*/
> - uint64_t lpcr_val = mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1;
> + uint64_t lpcr_val = mfspr(SPRN_LPCR);
>   uint64_t hid0_val = mfspr(SPRN_HID0);
>   uint64_t hid1_val = mfspr(SPRN_HID1);
>   uint64_t hid4_val = mfspr(SPRN_HID4);
> @@ -355,6 +355,14 @@ void power9_idle(void)
>  }
>  
>  #ifdef CONFIG_HOTPLUG_CPU
> +static void pnv_program_cpu_hotplug_lpcr(unsigned int cpu, u64 lpcr_val)
> +{
> + u64 pir = get_hard_smp_processor_id(cpu);
> +
> + mtspr(SPRN_LPCR, lpcr_val);
> + opal_slw_set_reg(pir, SPRN_LPCR, lpcr_val);
> +}
> +
>  /*
>   * pnv_cpu_offline: A function that puts the CPU into the deepest
>   * available platform idle state on a CPU-Offline.
> @@ -364,6 +372,20 @@ unsigned long pnv_cpu_offline(unsigned int cpu)
>  {
>   unsigned long srr1;
>   u32 idle_states = pnv_get_supported_cpuidle_states();
> + u64 lpcr_val;
> +
> + /*
> +  * We don't want to take decrementer interrupts while we are
> +  * offline, so clear LPCR:PECE1. We keep PECE2 (and
> +  * LPCR_PECE_HVEE on P9) enabled as to let IPIs in.
> +  *
> +  * If the CPU gets woken up by a special wakeup, ensure that
> +  * the SLW engine sets LPCR with decrementer bit cleared, else
> +  * the CPU will come back to the kernel due to a spurious
> +  * wakeup.
> +  */
> + lpcr_val = mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1;
> + pnv_program_cpu_hotplug_lpcr(cpu, lpcr_val);
>  
>   __ppc64_runlatch_off();
>  
> @@ -394,6 +416,16 @@ unsigned long pnv_cpu_offline(unsigned int cpu)
>  
>   __ppc64_runlatch_on();
>  
> + /*
> +  * Re-enable decrementer interrupts in LPCR.
> +  *
> +  * Further, we want stop states to be woken up by decrementer
> +  * for non-hotplug cases. So program the LPCR via stop api as
> +  * well.
> +  */
> + lpcr_val = mfspr(SPRN_LPCR) | (u64)LPCR_PECE1;
> + pnv_program_cpu_hotplug_lpcr(cpu, lpcr_val);
> +
>   return srr1;
>  }
>  #endif
> diff --git a/arch/powerpc/platforms/powernv/smp.c 
> b/arch/powerpc/platforms/powernv/smp.c
> index 40dae96..536b07b 100644
> --- a/arch/powerpc/platforms/powernv/smp.c
> +++ b/arch/powerpc/platforms/powernv/smp.c
> @@ -164,12 +164,6 @@ static void pnv_smp_cpu_kill_self(void)
>   if (cpu_has_feature(CPU_FTR_ARCH_207S))
>   wmask = SRR1_WAKEMASK_P8;
>  
> - /* We don't want to take decrementer interrupts while we are offline,
> -  * so clear LPCR:PECE1. We keep PECE2 (and LPCR_PECE_HVEE on P9)
> -  * enabled as to let IPIs in.
> -  */
> - mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1);
> -
>   while (!generic_check_cpu_restart(cpu)) {
>   /*
>* Clear IPI flag, since we don't handle IPIs while
> @@ -219,8 +213,6 @@ static void pnv_smp_cpu_kill_self(void)
>  
>   }
>  
> - /* Re-enable decrementer interrupts */
> - mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) | LPCR_PECE1);
>   DBG("CPU%d coming online...\n", cpu);
>  }
>  



Re: [v3 PATCH 1/2] powernv/powerpc:Save/Restore additional SPRs for stop4 cpuidle

2017-07-26 Thread Nicholas Piggin
On Fri, 21 Jul 2017 16:11:37 +0530
"Gautham R. Shenoy"  wrote:

> From: "Gautham R. Shenoy" 
> 
> The stop4 idle state on POWER9 is a deep idle state which loses
> hypervisor resources, but whose latency is low enough that it can be
> exposed via cpuidle.
> 
> Until now, the deep idle states which lose hypervisor resources (eg:
> winkle) were only exposed via CPU-Hotplug.  Hence currently on wakeup
> from such states, barring a few SPRs which need to be restored to
> their older value, rest of the SPRS are reinitialized to their values
> corresponding to that at boot time.
> 
> When stop4 is used in the context of cpuidle, we want these additional
> SPRs to be restored to their older value, to ensure that the context
> on the CPU coming back from idle is same as it was before going idle.
> 
> In this patch, we define a SPR save area in PACA (since we have used
> up the volatile register space in the stack) and on POWER9, we restore
> SPRN_PID, SPRN_LDBAR, SPRN_FSCR, SPRN_HFSCR, SPRN_MMCRA, SPRN_MMCR1,
> SPRN_MMCR2 to the values they had before entering stop.
> 
> Signed-off-by: Gautham R. Shenoy 

Looks good to me. Keeping in mind we need to tidy up and unify
all this SPR handling and save/restore etc. in the longer term.

Reviewed-by: Nicholas Piggin 

> ---
> v2-->v3:
> - Use a structure instead of an array for the stop sprs save area.
> - Name the offsets into the paca->stop_sprs as STOP_XXX instead of
>   PACA_XXX.
> - Add comments in the assembly code explaining why saving/restoring
>   is not needed on POWER8.
> v1-->v2:
> No change
> 
>  arch/powerpc/include/asm/cpuidle.h | 11 +++
>  arch/powerpc/include/asm/paca.h|  7 
>  arch/powerpc/kernel/asm-offsets.c  |  8 +
>  arch/powerpc/kernel/idle_book3s.S  | 65 
> --
>  4 files changed, 89 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/cpuidle.h 
> b/arch/powerpc/include/asm/cpuidle.h
> index 52586f9..8a174cb 100644
> --- a/arch/powerpc/include/asm/cpuidle.h
> +++ b/arch/powerpc/include/asm/cpuidle.h
> @@ -67,6 +67,17 @@
>  #define ERR_DEEP_STATE_ESL_MISMATCH  -2
>  
>  #ifndef __ASSEMBLY__
> +/* Additional SPRs that need to be saved/restored during stop */
> +struct stop_sprs {
> + u64 pid;
> + u64 ldbar;
> + u64 fscr;
> + u64 hfscr;
> + u64 mmcr1;
> + u64 mmcr2;
> + u64 mmcra;
> +};
> +
>  extern u32 pnv_fastsleep_workaround_at_entry[];
>  extern u32 pnv_fastsleep_workaround_at_exit[];
>  
> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
> index dc88a31..04b60af 100644
> --- a/arch/powerpc/include/asm/paca.h
> +++ b/arch/powerpc/include/asm/paca.h
> @@ -31,6 +31,7 @@
>  #endif
>  #include 
>  #include 
> +#include 
>  
>  register struct paca_struct *local_paca asm("r13");
>  
> @@ -183,6 +184,12 @@ struct paca_struct {
>   struct paca_struct **thread_sibling_pacas;
>   /* The PSSCR value that the kernel requested before going to stop */
>   u64 requested_psscr;
> +
> + /*
> +  * Save area for additional SPRs that need to be
> +  * saved/restored during cpuidle stop.
> +  */
> + struct stop_sprs stop_sprs;
>  #endif
>  
>  #ifdef CONFIG_PPC_STD_MMU_64
> diff --git a/arch/powerpc/kernel/asm-offsets.c 
> b/arch/powerpc/kernel/asm-offsets.c
> index a7b5af3..e2a48df 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -743,6 +743,14 @@ int main(void)
>   OFFSET(PACA_SUBCORE_SIBLING_MASK, paca_struct, subcore_sibling_mask);
>   OFFSET(PACA_SIBLING_PACA_PTRS, paca_struct, thread_sibling_pacas);
>   OFFSET(PACA_REQ_PSSCR, paca_struct, requested_psscr);
> +#define STOP_SPR(x, f)   OFFSET(x, paca_struct, stop_sprs.f)
> + STOP_SPR(STOP_PID, pid);
> + STOP_SPR(STOP_LDBAR, ldbar);
> + STOP_SPR(STOP_FSCR, fscr);
> + STOP_SPR(STOP_HFSCR, hfscr);
> + STOP_SPR(STOP_MMCR1, mmcr1);
> + STOP_SPR(STOP_MMCR2, mmcr2);
> + STOP_SPR(STOP_MMCRA, mmcra);
>  #endif
>  
>   DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
> diff --git a/arch/powerpc/kernel/idle_book3s.S 
> b/arch/powerpc/kernel/idle_book3s.S
> index 5adb390e..5e6af97 100644
> --- a/arch/powerpc/kernel/idle_book3s.S
> +++ b/arch/powerpc/kernel/idle_book3s.S
> @@ -84,7 +84,61 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_300)
>   std r3,_WORT(r1)
>   mfspr   r3,SPRN_WORC
>   std r3,_WORC(r1)
> +/*
> + * On POWER9, there are idle states such as stop4, invoked via cpuidle,
> + * that lose hypervisor resources. In such cases, we need to save
> + * additional SPRs before entering those idle states so that they can
> + * be restored to their older values on wakeup from the idle state.
> + *
> + * On POWER8, the only such deep idle state is winkle which is used
> + * only in the context of CPU-Hotplug, where these additional SPRs are
> + * reinitiazed to a sane 

Re: [PATCH 1/6] powerpc: Free up four 64K PTE bits in 4K backed HPTE pages

2017-07-26 Thread Aneesh Kumar K.V
Ram Pai  writes:

> Rearrange 64K PTE bits to  free  up  bits 3, 4, 5  and  6,
> in the 4K backed HPTE pages.These bits continue to be used
> for 64K backed HPTE pages in this patch, but will be freed
> up in the next patch. The  bit  numbers are big-endian  as
> defined in the ISA3.0
>
> The patch does the following change to the 4k htpe backed
> 64K PTE's format.
>
> H_PAGE_BUSY moves from bit 3 to bit 9 (B bit in the figure
>   below)
> V0 which occupied bit 4 is not used anymore.
> V1 which occupied bit 5 is not used anymore.
> V2 which occupied bit 6 is not used anymore.
> V3 which occupied bit 7 is not used anymore.
>
> Before the patch, the 4k backed 64k PTE format was as follows
>
>  0 1 2 3 4  5  6  7  8 9 10...63
>  : : : : :  :  :  :  : : ::
>  v v v v v  v  v  v  v v vv
>
> ,-,-,-,-,--,--,--,--,-,-,-,-,-,--,-,-,-,
> |x|x|x|B|V0|V1|V2|V3|x| | |x|x||x|x|x|x| <- primary pte
> '_'_'_'_'__'__'__'__'_'_'_'_'_''_'_'_'_'
> |S|G|I|X|S |G |I |X |S|G|I|X|..|S|G|I|X| <- secondary pte
> '_'_'_'_'__'__'__'__'_'_'_'_'__'_'_'_'_'
>
> After the patch, the 4k backed 64k PTE format is as follows
>
>  0 1 2 3 4  5  6  7  8 9 10...63
>  : : : : :  :  :  :  : : ::
>  v v v v v  v  v  v  v v vv
>
> ,-,-,-,-,--,--,--,--,-,-,-,-,-,--,-,-,-,
> |x|x|x| |  |  |  |  |x|B| |x|x||.|.|.|.| <- primary pte
> '_'_'_'_'__'__'__'__'_'_'_'_'_''_'_'_'_'
> |S|G|I|X|S |G |I |X |S|G|I|X|..|S|G|I|X| <- secondary pte
> '_'_'_'_'__'__'__'__'_'_'_'_'__'_'_'_'_'
>
> the four  bits S,G,I,X (one quadruplet per 4k HPTE) that
> cache  the  hash-bucket  slot  value, is initialized  to
> 1,1,1,1 indicating -- an invalid slot.   If  a HPTE gets
> cached in a   slot(i.e 7th  slot  of  secondary hash
> bucket), it is  released  immediately. In  other  words,
> even  though    is   a valid slot  value in the hash
> bucket, we consider it invalid and  release the slot and
> the HPTE.  This  gives  us  the opportunity to determine
> the validity of S,G,I,X  bits  based on its contents and
> not on any of the bits V0,V1,V2 or V3 in the primary PTE
>
> When   we  release  aHPTEcached in the  slot
> we alsorelease  a  legitimate   slot  in the primary
> hash bucket  and  unmap  its  corresponding  HPTE.  This
> is  to  ensure   that  we do get a HPTE cached in a slot
> of the primary hash bucket, the next time we retry.
>
> Though  treating    slot  as  invalid,  reduces  the
> number of  available  slots  in the hash bucket and  may
> have  an  effect   on the performance, the probabilty of
> hitting a  slot is extermely low.
>
> Compared  to  the  current   scheme, the above described
> scheme  reduces  the  number of false hash table updates
> significantly   andhas  the   added   advantage   of
> releasing  four  valuable  PTE bits for other purpose.
>
> NOTE:even though bits 3, 4, 5, 6, 7 are  not  used  when
> the  64K  PTE is backed by 4k HPTE,  they continue to be
> used  if  the  PTE  gets  backed  by 64k HPTE.  The next
> patch will decouple that aswell, and truely  release the
> bits.
>
> This idea was jointly developed by Paul Mackerras,
> Aneesh, Michael Ellermen and myself.
>
> 4K PTE format remains unchanged currently.
>
> The patch does the following code changes
> a) PTE flags are split between 64k and 4k  header files.
> b) __hash_page_4K()  is  reimplemented   to reflect the
>above logic.
>
> Reviewed-by: Aneesh Kumar K.V 
> Signed-off-by: Ram Pai 
> ---
>  arch/powerpc/include/asm/book3s/64/hash-4k.h  |2 +
>  arch/powerpc/include/asm/book3s/64/hash-64k.h |8 +--
>  arch/powerpc/include/asm/book3s/64/hash.h |1 -
>  arch/powerpc/mm/hash64_64k.c  |   74 
> -
>  arch/powerpc/mm/hash_utils_64.c   |4 +-
>  5 files changed, 55 insertions(+), 34 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h 
> b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> index 0c4e470..f959c00 100644
> --- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
> +++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> @@ -16,6 +16,8 @@
>  #define H_PUD_TABLE_SIZE (sizeof(pud_t) << H_PUD_INDEX_SIZE)
>  #define H_PGD_TABLE_SIZE (sizeof(pgd_t) << H_PGD_INDEX_SIZE)
>
> +#define H_PAGE_BUSY  _RPAGE_RSV1 /* software: PTE & hash are busy */
> +
>  /* PTE flags to conserve for HPTE identification */
>  #define _PAGE_HPTEFLAGS (H_PAGE_BUSY | H_PAGE_HASHPTE | \
>H_PAGE_F_SECOND | H_PAGE_F_GIX)
> diff --git a/arch/powerpc/include/asm/book3s/64/hash-64k.h 
> b/arch/powerpc/include/asm/book3s/64/hash-64k.h
> index 

Re: [PATCH 4/4] of/fdt: only store the device node basename in full_name

2017-07-26 Thread Michael Ellerman
Rob Herring  writes:

> With dependencies on a statically allocated full path name converted to
> use %pOF format specifier, we can store just the basename of node, and
> the unflattening of the FDT can be simplified.
>
> This commit will affect the remaining users of full_name. After
> analyzing these users, the remaining cases should only change some print
> messages. The main users of full_name are providing a name for struct
> resource. The resource names shouldn't be important other than providing
> /proc/iomem names.
>
> We no longer distinguish between pre and post 0x10 dtb formats as either
> a full path or basename will work. However, less than 0x10 formats have
> been broken since the conversion to use libfdt (and no one has cared).

For the record - yes we did care. It broke booting with old versions of
kexec, and it was a royal P.I.T.# to debug :D

cheers


RE: [PATCH 3/4] powerpc: pseries: only store the device node basename in full_name

2017-07-26 Thread David Laight
From: Rob Herring
> Sent: 25 July 2017 22:44
> With dependencies on full_name containing the entire device node path
> removed, stop storing the full_name in nodes created by
> dlpar_configure_connector() and pSeries_reconfig_add_node().
...
>   dn = kzalloc(sizeof(*dn), GFP_KERNEL);
>   if (!dn)
>   return NULL;
> 
>   name = (char *)ccwa + be32_to_cpu(ccwa->name_offset);
> - dn->full_name = kasprintf(GFP_KERNEL, "%s/%s", path, name);
> + dn->full_name = kasprintf(GFP_KERNEL, "%s", name);

Isn't this just strdup()?
Perhaps you should rename full_name to name - since it is no longer 'full'?

Maybe you could do a single malloc() for both 'dn' and the name?

David



Re: [PATCH v3 4/5] powerpc/lib/sstep: Add prty instruction emulation

2017-07-26 Thread Michael Ellerman
Segher Boessenkool  writes:

> On Tue, Jul 25, 2017 at 01:33:19PM +1000, Matt Brown wrote:
>> +static nokprobe_inline void do_prty(struct pt_regs *regs, unsigned long v,
>> +int size, int ra)
>> +{
>> +unsigned long long res = v;
>> +
>> +res = (0x0001000100010001 & res) + (0x0001000100010001 & (res >> 8));
>> +res = (0x00070007 & res) + (0x00070007 & (res >> 16));
>> +if (size == 32) {   /* prtyw */
>> +regs->gpr[ra] = (0x00010001 & res);
>> +return;
>> +}
>> +
>> +res = (0x000f & res) + (0x000f & (res >> 32));
>> +regs->gpr[ra] = res & 1;/*prtyd */
>> +}
>
> Does 7's and 0xf look strange (since the top bit in the values there
> is always 0).  You always "and" the values with 1 later, you could do
> that immediately (and change + to ^).
>
> A general question about these patches: some things are inside #ifdef
> __powerpc64__, some are not.  It seems it is the wrong macro, and it
> should be used (or not used) consistently?

Why is it the wrong macro? Because we tend to use CONFIG_PPC64 you mean?

I thought the reason some are #ifdef'ed is that some are 64-bit only.
ie. bpermd is 64-bit only ?

cheers


Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?

2017-07-26 Thread Jonathan Cameron
On Wed, 26 Jul 2017 09:16:23 +0100
Jonathan Cameron  wrote:

> On Tue, 25 Jul 2017 21:12:17 -0700
> "Paul E. McKenney"  wrote:
> 
> > On Tue, Jul 25, 2017 at 09:02:33PM -0700, David Miller wrote:  
> > > From: "Paul E. McKenney" 
> > > Date: Tue, 25 Jul 2017 20:55:45 -0700
> > > 
> > > > On Tue, Jul 25, 2017 at 02:10:29PM -0700, David Miller wrote:
> > > >> Just to report, turning softlockup back on fixes things for me on
> > > >> sparc64 too.
> > > > 
> > > > Very good!
> > > > 
> > > >> The thing about softlockup is it runs an hrtimer, which seems to run
> > > >> about every 4 seconds.
> > > > 
> > > > I could see where that could shake things loose, but I am surprised that
> > > > it would be needed.  I ran a short run with CONFIG_SOFTLOCKUP_DETECTOR=y
> > > > with no trouble, but I will be running a longer test later on.
> > > > 
> > > >> So I wonder if this is a NO_HZ problem.
> > > > 
> > > > Might be.  My tests run with NO_HZ_FULL=n and NO_HZ_IDLE=y.  What are
> > > > you running?  (Again, my symptoms are slightly different, so I might
> > > > be seeing a different bug.)
> > > 
> > > I run with NO_HZ_FULL=n and NO_HZ_IDLE=y, just like you.
> > > 
> > > To clarify, the symptoms show up with SOFTLOCKUP_DETECTOR disabled.
> > 
> > Same here -- but my failure case happens fairly rarely, so it will take
> > some time to gain reasonable confidence that enabling SOFTLOCKUP_DETECTOR
> > had effect.
> > 
> > But you are right, might be interesting to try NO_HZ_PERIODIC=y
> > or NO_HZ_FULL=y.  So many possible tests, and so little time.  ;-)
> > 
> > Thanx, Paul
> >   
> I'll be the headless chicken running around and trying as many tests
> as I can fit in.  Typical time to see the failure for us is sub 10
> minutes so we'll see how far we get.
> 
> Make me a list to run if you like ;)
> 
> NO_HZ_PERIODIC=y running now.
By which I mean CONFIG_HZ_PERIODIC=y

Anyhow, run for 40 minutes with out seeing a splat but my sanity check
on the NO_FULL_HZ=n and NO_HZ_IDLE=y this morning took 20 minutes so
I won't have much confidence until we are a few hours in on this.

Anyhow, certainly looking like a promising direction for investigation!

Jonathan

> 
> Jonathan
> 
> ___
> linuxarm mailing list
> linux...@huawei.com
> http://rnd-openeuler.huawei.com/mailman/listinfo/linuxarm




Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?

2017-07-26 Thread Jonathan Cameron
On Tue, 25 Jul 2017 21:12:17 -0700
"Paul E. McKenney"  wrote:

> On Tue, Jul 25, 2017 at 09:02:33PM -0700, David Miller wrote:
> > From: "Paul E. McKenney" 
> > Date: Tue, 25 Jul 2017 20:55:45 -0700
> >   
> > > On Tue, Jul 25, 2017 at 02:10:29PM -0700, David Miller wrote:  
> > >> Just to report, turning softlockup back on fixes things for me on
> > >> sparc64 too.  
> > > 
> > > Very good!
> > >   
> > >> The thing about softlockup is it runs an hrtimer, which seems to run
> > >> about every 4 seconds.  
> > > 
> > > I could see where that could shake things loose, but I am surprised that
> > > it would be needed.  I ran a short run with CONFIG_SOFTLOCKUP_DETECTOR=y
> > > with no trouble, but I will be running a longer test later on.
> > >   
> > >> So I wonder if this is a NO_HZ problem.  
> > > 
> > > Might be.  My tests run with NO_HZ_FULL=n and NO_HZ_IDLE=y.  What are
> > > you running?  (Again, my symptoms are slightly different, so I might
> > > be seeing a different bug.)  
> > 
> > I run with NO_HZ_FULL=n and NO_HZ_IDLE=y, just like you.
> > 
> > To clarify, the symptoms show up with SOFTLOCKUP_DETECTOR disabled.  
> 
> Same here -- but my failure case happens fairly rarely, so it will take
> some time to gain reasonable confidence that enabling SOFTLOCKUP_DETECTOR
> had effect.
> 
> But you are right, might be interesting to try NO_HZ_PERIODIC=y
> or NO_HZ_FULL=y.  So many possible tests, and so little time.  ;-)
> 
>   Thanx, Paul
> 
I'll be the headless chicken running around and trying as many tests
as I can fit in.  Typical time to see the failure for us is sub 10
minutes so we'll see how far we get.

Make me a list to run if you like ;)

NO_HZ_PERIODIC=y running now.

Jonathan



Re: RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?

2017-07-26 Thread Jonathan Cameron
On Tue, 25 Jul 2017 20:53:06 -0700
"Paul E. McKenney"  wrote:

> On Wed, Jul 26, 2017 at 12:52:07AM +0800, Jonathan Cameron wrote:
> > On Tue, 25 Jul 2017 08:12:45 -0700
> > "Paul E. McKenney"  wrote:
> >   
> > > On Tue, Jul 25, 2017 at 10:42:45PM +0800, Jonathan Cameron wrote:  
> > > > On Tue, 25 Jul 2017 06:46:26 -0700
> > > > "Paul E. McKenney"  wrote:
> > > > 
> > > > > On Tue, Jul 25, 2017 at 10:26:54PM +1000, Nicholas Piggin wrote:
> > > > > > On Tue, 25 Jul 2017 19:32:10 +0800
> > > > > > Jonathan Cameron  wrote:
> > > > > >   
> > > > > > > Hi All,
> > > > > > > 
> > > > > > > We observed a regression on our d05 boards (but curiously not
> > > > > > > the fairly similar but single socket / smaller core count
> > > > > > > d03), initially seen with linux-next prior to the merge window
> > > > > > > and still present in v4.13-rc2.
> > > > > > > 
> > > > > > > The symptom is:  
> > > > > 
> > > > > Adding Dave Miller and the sparcli...@vger.kernel.org email on CC, as
> > > > > they have been seeing something similar, and you might well have saved
> > > > > them the trouble of bisecting.
> > > > > 
> > > > > [ . . . ]
> > > > > 
> > > > > > > [ 1984.628602] rcu_preempt kthread starved for 5663 jiffies! 
> > > > > > > g1566 c1565 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1  
> > > > > 
> > > > > This is the cause from an RCU perspective.  You had a lot of idle 
> > > > > CPUs,
> > > > > and RCU is not permitted to disturb them -- the battery-powered 
> > > > > embedded
> > > > > guys get very annoyed by that sort of thing.  What happens instead is
> > > > > that each CPU updates a per-CPU state variable when entering or 
> > > > > exiting
> > > > > idle, and the grace-period kthread ("rcu_preempt kthread" in the above
> > > > > message) checks these state variables, and if when sees an idle CPU,
> > > > > it reports a quiescent state on that CPU's behalf.
> > > > > 
> > > > > But the grace-period kthread can only do this work if it gets a chance
> > > > > to run.  And the message above says that this kthread hasn't had a 
> > > > > chance
> > > > > to run for a full 5,663 jiffies.  For completeness, the "g1566 c1565"
> > > > > says that grace period #1566 is in progress, the "f0x0" says that no 
> > > > > one
> > > > > is needing another grace period #1567.  The "RCU_GP_WAIT_FQS(3)" says
> > > > > that the grace-period kthread has fully initialized the current grace
> > > > > period and is sleeping for a few jiffies waiting to scan for idle 
> > > > > tasks.
> > > > > Finally, the "->state=0x1" says that the grace-period kthread is in
> > > > > TASK_INTERRUPTIBLE state, in other words, still sleeping.
> > > > 
> > > > Thanks for the explanation!
> > > > > 
> > > > > So my first question is "What did commit 05a4a9527 (kernel/watchdog:
> > > > > split up config options) do to prevent the grace-period kthread from
> > > > > getting a chance to run?" 
> > > > 
> > > > As far as we can tell it was a side effect of that patch.
> > > > 
> > > > The real cause is that patch changed the result of defconfigs to stop 
> > > > running
> > > > the softlockup detector - now CONFIG_SOFTLOCKUP_DETECTOR
> > > > 
> > > > Enabling that on 4.13-rc2 (and presumably everything in between)
> > > > means we don't see the problem any more.
> > > > 
> > > > > I must confess that I don't see anything
> > > > > obvious in that commit, so my second question is "Are we sure that
> > > > > reverting this commit makes the problem go away?"
> > > > 
> > > > Simply enabling CONFIG_SOFTLOCKUP_DETECTOR seems to make it go away.
> > > > That detector fires up a thread on every cpu, which may be relevant.
> > > 
> > > Interesting...  Why should it be necessary to fire up a thread on every
> > > CPU in order to make sure that RCU's grace-period kthreads get some
> > > CPU time?  Especially give how many idle CPUs you had on your system.
> > > 
> > > So I have to ask if there is some other bug that the softlockup detector
> > > is masking.  
> > I am thinking the same.  We can try going back further than 4.12 tomorrow
> > (we think we can realistically go back to 4.8 and possibly 4.6
> > with this board)  
> 
> Looking forward to seeing results!
> 
> > > > > and my third is "Is
> > > > > this an intermittent problem that led to a false bisection?"
> > > > 
> > > > Whilst it is a bit slow to occur, we verified with long runs on either
> > > > side of that patch and since with the option enabled on latest mainline.
> > > > 
> > > > Also can cause the issue before that patch by disabling the previous
> > > > relevant option on 4.12.
> > > 
> > > OK, thank you -- hard to argue with that!  ;-)  
> > 
> > We thought it was a pretty unlikely a bisection result
> > hence hammered it thoroughly ;)  
> 
> Glad that I am not the only paranoid one out here.  ;-)
> 
> > > Except that I am still 

Re: [PATCH v3 4/5] powerpc/lib/sstep: Add prty instruction emulation

2017-07-26 Thread Gabriel Paubert
On Tue, Jul 25, 2017 at 06:08:05PM +1000, Balbir Singh wrote:
> On Tue, 2017-07-25 at 13:33 +1000, Matt Brown wrote:
> > This adds emulation for the prtyw and prtyd instructions.
> > Tested for logical correctness against the prtyw and prtyd instructions
> > on ppc64le.
> > 
> > Signed-off-by: Matt Brown 
> > ---
> > v3:
> > - optimised using the Giles-Miller method of side-ways addition
> > v2:
> > - fixed opcodes
> > - fixed bitshifting and typecast errors
> > - merged do_prtyw and do_prtyd into single function
> > ---
> >  arch/powerpc/lib/sstep.c | 27 +++
> >  1 file changed, 27 insertions(+)
> > 
> > diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
> > index 6a79618..0bcf631 100644
> > --- a/arch/powerpc/lib/sstep.c
> > +++ b/arch/powerpc/lib/sstep.c
> > @@ -655,6 +655,25 @@ static nokprobe_inline void do_bpermd(struct pt_regs 
> > *regs, unsigned long v1,
> > regs->gpr[ra] = perm;
> >  }
> >  #endif /* __powerpc64__ */
> > +/*
> > + * The size parameter adjusts the equivalent prty instruction.
> > + * prtyw = 32, prtyd = 64
> > + */
> > +static nokprobe_inline void do_prty(struct pt_regs *regs, unsigned long v,
> > +   int size, int ra)
> > +{
> > +   unsigned long long res = v;
> > +
> > +   res = (0x0001000100010001 & res) + (0x0001000100010001 & (res >> 8));
> > +   res = (0x00070007 & res) + (0x00070007 & (res >> 16));
> > +   if (size == 32) {   /* prtyw */
> > +   regs->gpr[ra] = (0x00010001 & res);
> > +   return;
> > +   }
> > +
> > +   res = (0x000f & res) + (0x000f & (res >> 32));
> > +   regs->gpr[ra] = res & 1;/*prtyd */
> 
> Looks good, you can also xor instead of adding, but the masks would need
> to change in that case.

Actually, I'd prefer to use xor for this case, since you hardly ever
need any mask, except at the end. Most masks are only needed because of
carry propagation, which the xor completely avoid.

In other words:

unsigned long long res = v ^ (v >> 8);
res ^= res >> 16; 
if (size == 32) {
regs->gpr[ra] = res & 0x00010001;
return;
}
res ^= res >> 32;
regs-> gpr[ra] = res & 1;

should work, but I've not even compiled it.

Gabriel


Re: [PATCH v3 2/5] powerpc/lib/sstep: Add popcnt instruction emulation

2017-07-26 Thread Gabriel Paubert
On Tue, Jul 25, 2017 at 01:33:17PM +1000, Matt Brown wrote:
> This adds emulations for the popcntb, popcntw, and popcntd instructions.
> Tested for correctness against the popcnt{b,w,d} instructions on ppc64le.
> 
> Signed-off-by: Matt Brown 
> ---
> v3:
>   - optimised using the Giles-Miller method of side-ways addition
> v2:
>   - fixed opcodes
>   - fixed typecasting
>   - fixed bitshifting error for both 32 and 64bit arch
> ---
>  arch/powerpc/lib/sstep.c | 40 +++-
>  1 file changed, 39 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
> index 87d277f..c1f9cdb 100644
> --- a/arch/powerpc/lib/sstep.c
> +++ b/arch/powerpc/lib/sstep.c
> @@ -612,6 +612,32 @@ static nokprobe_inline void do_cmpb(struct pt_regs 
> *regs, unsigned long v1,
>   regs->gpr[rd] = out_val;
>  }
>  
> +/*
> + * The size parameter is used to adjust the equivalent popcnt instruction.
> + * popcntb = 8, popcntw = 32, popcntd = 64
> + */
> +static nokprobe_inline void do_popcnt(struct pt_regs *regs, unsigned long v1,
> + int size, int ra)
> +{
> + unsigned long long out = v1;
> +
> + out = (0x & out) + (0x & (out >> 1));

This can be simplified in a less obvious way as:
out -= (out >> 1) & 0x;

which maps each pair of bits according to the following:
00 -> 00
01 -> 01
10 -> 01
11 -> 10

This should save one instruction.

> + out = (0x & out) + (0x & (out >> 2));

Ok, but now each nibble is between 0 and 4, so the addition of two
nibbles can't overflow or generate carry into the higher one.

> + out = (0x0f0f0f0f0f0f0f0f & out) + (0x0f0f0f0f0f0f0f0f & (out >> 4));

out += out >> 4;
out &= 0x0f0f0f0f0f0f0f0f;

which should also save one instruction

> + if (size == 8) {/* popcntb */
> + regs->gpr[ra] = out;
> + return;
> + }

At this point each count occupies at least one byte and can no more
overflow, so masking is only needed before returning.

> + out = (0x001f001f001f001f & out) + (0x001f001f001f001f & (out >> 8));
out += out >> 8;

> + out = (0x003f003f & out) + (0x003f003f & (out >> 16));

out += out >> 16;

> + if (size == 32) {   /* popcntw */
> + regs->gpr[ra] = out;
regs->gpr[ra] = out & 0x003f003f;

> + return;
> + }
> + out = (0x007f & out) + (0x007f & (out >> 32));
out = (out + (out >> 32)) & 0x7f;


Gabriel

> + regs->gpr[ra] = out;/* popcntd */
> +}
> +
>  static nokprobe_inline int trap_compare(long v1, long v2)
>  {
>   int ret = 0;
> @@ -1194,6 +1220,10 @@ int analyse_instr(struct instruction_op *op, struct 
> pt_regs *regs,
>   regs->gpr[ra] = regs->gpr[rd] & ~regs->gpr[rb];
>   goto logical_done;
>  
> + case 122:   /* popcntb */
> + do_popcnt(regs, regs->gpr[rd], 8, ra);
> + goto logical_done;
> +
>   case 124:   /* nor */
>   regs->gpr[ra] = ~(regs->gpr[rd] | regs->gpr[rb]);
>   goto logical_done;
> @@ -1206,6 +1236,10 @@ int analyse_instr(struct instruction_op *op, struct 
> pt_regs *regs,
>   regs->gpr[ra] = regs->gpr[rd] ^ regs->gpr[rb];
>   goto logical_done;
>  
> + case 378:   /* popcntw */
> + do_popcnt(regs, regs->gpr[rd], 32, ra);
> + goto logical_done;
> +
>   case 412:   /* orc */
>   regs->gpr[ra] = regs->gpr[rd] | ~regs->gpr[rb];
>   goto logical_done;
> @@ -1217,7 +1251,11 @@ int analyse_instr(struct instruction_op *op, struct 
> pt_regs *regs,
>   case 476:   /* nand */
>   regs->gpr[ra] = ~(regs->gpr[rd] & regs->gpr[rb]);
>   goto logical_done;
> -
> +#ifdef __powerpc64__
> + case 506:   /* popcntd */
> + do_popcnt(regs, regs->gpr[rd], 64, ra);
> + goto logical_done;
> +#endif
>   case 922:   /* extsh */
>   regs->gpr[ra] = (signed short) regs->gpr[rd];
>   goto logical_done;
> -- 
> 2.9.3