Re: [PATCH v9 00/12] Support PPTT for ARM64

2018-06-04 Thread Catalin Marinas
On Thu, May 17, 2018 at 06:05:24PM +0100, Catalin Marinas wrote:
> On Fri, May 11, 2018 at 06:57:55PM -0500, Jeremy Linton wrote:
> > Jeremy Linton (12):
> >   drivers: base: cacheinfo: move cache_setup_of_node()
> >   drivers: base: cacheinfo: setup DT cache properties early
> >   cacheinfo: rename of_node to fw_token
> >   arm64/acpi: Create arch specific cpu to acpi id helper
> >   ACPI/PPTT: Add Processor Properties Topology Table parsing
> >   ACPI: Enable PPTT support on ARM64
> >   drivers: base cacheinfo: Add support for ACPI based firmware tables
> >   arm64: Add support for ACPI based firmware tables
> >   arm64: topology: rename cluster_id
> >   arm64: topology: enable ACPI/PPTT based CPU topology
> >   ACPI: Add PPTT to injectable table list
> >   arm64: topology: divorce MC scheduling domain from core_siblings
> 
> Queued for 4.18 (without Sudeep's latest property_read_u64 cacheinfo
> patch - http://lkml.kernel.org/r/20180517154701.GA20281@e107155-lin; I
> can add it separately).

I'm going to revert patch 12 in this series (arm64: topology: divorce MC
scheduling domain from core_siblings) until the problem is understood
and a fix proposed and tested. It's likely that the PPTT for arm64 will
only be fully enabled in 4.19.

-- 
Catalin


Re: [PATCH v9 00/12] Support PPTT for ARM64

2018-05-29 Thread Robin Murphy

On 29/05/18 16:51, Geert Uytterhoeven wrote:
> Hi Will,
>
> On Tue, May 29, 2018 at 5:08 PM, Will Deacon wrote:
>> On Tue, May 29, 2018 at 02:18:40PM +0100, Sudeep Holla wrote:
>>> On 29/05/18 12:56, Geert Uytterhoeven wrote:
>>>> On Tue, May 29, 2018 at 1:14 PM, Sudeep Holla wrote:
>>>>> On 29/05/18 11:48, Geert Uytterhoeven wrote:
>>>>>> System suspend still works fine on systems with big cores only:
>>>>>>
>>>>>> R-Car H3 ES1.0 (4xCA57 (4xCA53 disabled in firmware))
>>>>>> R-Car M3-N (2xCA57)
>>>>>>
>>>>>> Reverting this commit fixes the issue for me.
>>>>>
>>>>> I can't find anything that relates to system suspend in these patches
>>>>> unless they are messing with something during CPU hot plug-in back
>>>>> during resume.
>>>>
>>>> It's only the last patch that introduces the breakage.
>>>
>>> As specified in the commit log, it won't change any behavior for DT
>>> systems if it's non-NUMA or single node system. So I am still wondering
>>> what could trigger this regression.
>>
>> I wonder if we're somehow giving an uninitialised/invalid NUMA configuration
>> to the scheduler, although I can't see how this would happen.
>>
>> Geert -- if you enable CONFIG_DEBUG_PER_CPU_MAPS=y and apply the diff below
>> do you see anything shouting in dmesg?
>
> Thanks, but unfortunately it doesn't help.
> I added some debug code to print cpumask, but so far I don't see anything
> suspicious.


Do you have CONFIG_NUMA enabled? On a hunch I've managed to reproduce 
what looks like the same thing on a Juno board with NUMA=n; going in 
with external debug it seems to be stuck in the loop in 
init_sched_groups_capacity(), with an approximate stack trace of:

init_sched_groups_capacity()
partition_sched_domains()
cpuset_cpu_active()
sched_cpu_activate()
cpuhp_invoke_callback()
cpuhp_thread_fn()

My hunch is based on the fact that it looks like we can, under the right 
circumstances, end up with default_topology picking up cpu_online_mask 
as a sibling mask via cpu_coregroup_mask(), and given the great 
coincidence that that's going to change when hotplugging out CPUs on 
suspend, things might not react too well to that. Things also look to go 
utterly haywire once into a full-blown systemd userspace with cpuidle, 
but I haven't got a clear picture of that yet.
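
To make the hunch above concrete, here is a minimal userspace sketch of the suspected failure mode. It is not kernel code: mask_t, online_mask, coregroup_mask_model(), cpu_offline_model() and stale_siblings_after_offline() are all hypothetical names invented for illustration, and the six-CPU starting mask is an assumption. It only models the idea that a sibling mask which aliases the live online mask changes as CPUs are hot-unplugged, so anything built from an earlier reading of it goes stale.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model of the suspected problem; not kernel code. */
typedef uint32_t mask_t;               /* one bit per CPU */

static mask_t online_mask = 0x3f;      /* assumption: six CPUs online */

/* Models a coregroup lookup that falls back to the live online mask. */
static mask_t coregroup_mask_model(int cpu)
{
	(void)cpu;                     /* every CPU gets the same answer */
	return online_mask;
}

/* Models hot-unplugging one CPU, as happens on the way into suspend. */
static void cpu_offline_model(int cpu)
{
	online_mask &= ~((mask_t)1 << cpu);
}

/*
 * Snapshot the sibling mask for CPU 0, offline the given CPU, and return
 * the bits that are in the snapshot but no longer in the live mask.
 */
static mask_t stale_siblings_after_offline(int cpu)
{
	mask_t snapshot = coregroup_mask_model(0);

	assert(cpu >= 0 && cpu < 32);
	cpu_offline_model(cpu);
	return snapshot & ~coregroup_mask_model(0);
}
```

Starting from the assumed 0x3f mask, stale_siblings_after_offline(5) returns 0x20: the snapshot still names CPU 5 as a sibling although the live mask no longer does. Whether this is what actually trips up the scheduler here is still only a guess.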


Robin.


Re: [PATCH v9 00/12] Support PPTT for ARM64

2018-05-29 Thread Geert Uytterhoeven
Hi Will,

On Tue, May 29, 2018 at 5:08 PM, Will Deacon  wrote:
> On Tue, May 29, 2018 at 02:18:40PM +0100, Sudeep Holla wrote:
>> On 29/05/18 12:56, Geert Uytterhoeven wrote:
>> > On Tue, May 29, 2018 at 1:14 PM, Sudeep Holla  wrote:
>> >> On 29/05/18 11:48, Geert Uytterhoeven wrote:
>> >>> System suspend still works fine on systems with big cores only:
>> >>>
>> >>> R-Car H3 ES1.0 (4xCA57 (4xCA53 disabled in firmware))
>> >>> R-Car M3-N (2xCA57)
>> >>>
>> >>> Reverting this commit fixes the issue for me.
>> >>
>> >> I can't find anything that relates to system suspend in these patches
>> >> unless they are messing with something during CPU hot plug-in back
>> >> during resume.
>> >
>> > It's only the last patch that introduces the breakage.
>> >
>>
>> As specified in the commit log, it won't change any behavior for DT
>> systems if it's non-NUMA or single node system. So I am still wondering
>> what could trigger this regression.
>
> I wonder if we're somehow giving an uninitialised/invalid NUMA configuration
> to the scheduler, although I can't see how this would happen.
>
> Geert -- if you enable CONFIG_DEBUG_PER_CPU_MAPS=y and apply the diff below
> do you see anything shouting in dmesg?

Thanks, but unfortunately it doesn't help.
I added some debug code to print cpumask, but so far I don't see anything
suspicious.

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH v9 00/12] Support PPTT for ARM64

2018-05-17 Thread Catalin Marinas
On Fri, May 11, 2018 at 06:57:55PM -0500, Jeremy Linton wrote:
> Jeremy Linton (12):
>   drivers: base: cacheinfo: move cache_setup_of_node()
>   drivers: base: cacheinfo: setup DT cache properties early
>   cacheinfo: rename of_node to fw_token
>   arm64/acpi: Create arch specific cpu to acpi id helper
>   ACPI/PPTT: Add Processor Properties Topology Table parsing
>   ACPI: Enable PPTT support on ARM64
>   drivers: base cacheinfo: Add support for ACPI based firmware tables
>   arm64: Add support for ACPI based firmware tables
>   arm64: topology: rename cluster_id
>   arm64: topology: enable ACPI/PPTT based CPU topology
>   ACPI: Add PPTT to injectable table list
>   arm64: topology: divorce MC scheduling domain from core_siblings

Queued for 4.18 (without Sudeep's latest property_read_u64 cacheinfo
patch - http://lkml.kernel.org/r/20180517154701.GA20281@e107155-lin; I
can add it separately).

Thanks.

-- 
Catalin