Re: next/master bisection: baseline.login on kontron-kbox-a-230-ls

2021-03-16 Thread Guillaume Tucker
Hi Sahil,

Please see the bisection report below about a boot failure on
kontron-kbox-a-230-ls on linux-next.

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org but this one
looks valid.

The kernel is hitting this issue:

[5.326403] kernel BUG at arch/arm64/kernel/traps.c:406!

Full log:

  https://storage.kernelci.org/next/master/next-20210316/arm64/defconfig/gcc-8/lab-kontron/baseline-kontron-kbox-a-230-ls.html

The issue can be reproduced with a plain arm64 defconfig, and
doesn't seem to be impacting other platforms on kernelci.org.
More details can be found here:

  https://kernelci.org/test/case/id/605057a041fc669ff0addccc/

Please let us know if you need any help debugging the issue on
this platform or to try a fix.

Best wishes,
Guillaume


On 16/03/2021 14:23, KernelCI bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has      *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.      *
> *                                                               *
> * If you do send a fix, please include this trailer:            *
> *   Reported-by: "kernelci.org bot"                             *
> *                                                               *
> * Hope this helps!                                              *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> next/master bisection: baseline.login on kontron-kbox-a-230-ls
> 
> Summary:
>   Start:      0f4b0bb396f6 Add linux-next specific files for 20210316
>   Plain log:  https://storage.kernelci.org/next/master/next-20210316/arm64/defconfig+CONFIG_ARM64_64K_PAGES=y/clang-11/lab-kontron/baseline-kontron-kbox-a-230-ls.txt
>   HTML log:   https://storage.kernelci.org/next/master/next-20210316/arm64/defconfig+CONFIG_ARM64_64K_PAGES=y/clang-11/lab-kontron/baseline-kontron-kbox-a-230-ls.html
>   Result:     48787485f8de arm64: dts: ls1028a: enable optee node
> 
> Checks:
>   revert:     PASS
>   verify:     PASS
> 
> Parameters:
>   Tree:       next
>   URL:        https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>   Branch:     master
>   Target:     kontron-kbox-a-230-ls
>   CPU arch:   arm64
>   Lab:        lab-kontron
>   Compiler:   clang-11
>   Config:     defconfig+CONFIG_ARM64_64K_PAGES=y
>   Test case:  baseline.login
> 
> Breaking commit found:
> 
> ---
> commit 48787485f8de44915016d4583e898b62bb2d5753
> Author: Sahil Malhotra 
> Date:   Fri Mar 5 14:03:51 2021 +0530
> 
> arm64: dts: ls1028a: enable optee node
> 
> optee node was disabled in ls1028a.dtsi, enabling it by default.
> 
> Signed-off-by: Sahil Malhotra 
> Signed-off-by: Shawn Guo 
> 
> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> index c1f2f402ad53..3d96c6beb7e2 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> @@ -95,7 +95,6 @@
>   optee {
>   compatible = "linaro,optee-tz";
>   method = "smc";
> - status = "disabled";
>   };
>   };
> ---
> 
> 
> Git bisection log:
> 
> ---
> git bisect start
> # good: [1e28eed17697bcf343c6743f0028cc3b5dd88bf0] Linux 5.12-rc3
> git bisect good 1e28eed17697bcf343c6743f0028cc3b5dd88bf0
> # bad: [0f4b0bb396f6f424a7b074d00cb71f5966edcb8a] Add linux-next specific files for 20210316
> git bisect bad 0f4b0bb396f6f424a7b074d00cb71f5966edcb8a
> # bad: [edd84c42baeffe66740143a04f24588fded94241] Merge remote-tracking branch 'drm-misc/for-linux-next'
> git bisect bad edd84c42baeffe66740143a04f24588fded94241
> # bad: [a76f62d56da82bee1a4c35dd6375a8fdd57eca4e] Merge remote-tracking branch 'cel/for-next'
> git bisect bad a76f62d56da82bee1a4c35dd6375a8fdd57eca4e
> # bad: [38872831aa5ec902b861d14e641cfeea97ca913a] Merge remote-tracking branch 'qcom/for-next'
> git bisect bad 38872831aa5ec902b861d14e641cfeea97ca913a
> # good: [287bccb5b7f13f88cae2e14f49b0572a3bd43a1c] Merge remote-tracking branch 'dma-mapping/for-next'
> git bisect good 287bccb5b7f13f88cae2e14f49b0572a3bd43a1c
> # bad: [b56586a8bfe0fb60a81b804cba49deb0d93e6623] Merge remote-tracking branch 'imx-mxs/for-next'
> git bisect bad b56586a8bfe0fb60a81b804cba49deb0d93e6623
> # bad: [8b5531915cf217d205ca813a10fc79987fb528fb] Merge branch 'imx/defconfig' into for-next
> git bisect bad 8b5531915cf217d205ca813a10fc79987fb528fb
> # bad: [3a28e405ca096d692df2dd4b61f179b6fbed0da3] arm64: dts: 

Re: next/master bisection: baseline.login on rk3399-gru-kevin

2021-03-04 Thread Guillaume Tucker
Hi Ray,

Please see the bisection report below about a boot failure on
rk3399-gru-kevin on linux-next.

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org but this one
looks valid.

The boot log shows a kernel panic with a NULL pointer
dereference:

  https://storage.kernelci.org/next/master/next-20210304/arm64/defconfig/gcc-8/lab-collabora/baseline-rk3399-gru-kevin.html#L673

Some more details can be found here:

  https://kernelci.org/test/case/id/60405c6fa031a93136addcc0/


Please let us know if you need any help with debugging the issue
or trying a fix on this platform.

Thanks,
Guillaume

On 04/03/2021 12:02, KernelCI bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has      *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.      *
> *                                                               *
> * If you do send a fix, please include this trailer:            *
> *   Reported-by: "kernelci.org bot"                             *
> *                                                               *
> * Hope this helps!                                              *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> next/master bisection: baseline.login on rk3399-gru-kevin
> 
> Summary:
>   Start:      f5427c2460eb Add linux-next specific files for 20210304
>   Plain log:  https://storage.kernelci.org/next/master/next-20210304/arm64/defconfig+CONFIG_RANDOMIZE_BASE=y/gcc-8/lab-collabora/baseline-rk3399-gru-kevin.txt
>   HTML log:   https://storage.kernelci.org/next/master/next-20210304/arm64/defconfig+CONFIG_RANDOMIZE_BASE=y/gcc-8/lab-collabora/baseline-rk3399-gru-kevin.html
>   Result:     59fa3def35de usb: dwc3: add a power supply for current control
> 
> Checks:
>   revert:     PASS
>   verify:     PASS
> 
> Parameters:
>   Tree:       next
>   URL:        https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>   Branch:     master
>   Target:     rk3399-gru-kevin
>   CPU arch:   arm64
>   Lab:        lab-collabora
>   Compiler:   gcc-8
>   Config:     defconfig+CONFIG_RANDOMIZE_BASE=y
>   Test case:  baseline.login
> 
> Breaking commit found:
> 
> ---
> commit 59fa3def35de957881ac142a384487e27e8fe527
> Author: Ray Chi 
> Date:   Mon Feb 22 19:51:48 2021 +0800
> 
> usb: dwc3: add a power supply for current control
> 
> Currently, VBUS draw callback does no action when the
> generic PHYs are used. This patch adds an additional
> path to control charging current through power supply
> interface.
> 
> Signed-off-by: Ray Chi 
> Link: https://lore.kernel.org/r/20210222115149.3606776-2-ray...@google.com
> Signed-off-by: Greg Kroah-Hartman 
> 
> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
> index f2448d0a9d39..d15f065849cd 100644
> --- a/drivers/usb/dwc3/core.c
> +++ b/drivers/usb/dwc3/core.c
> @@ -1238,6 +1238,8 @@ static void dwc3_get_properties(struct dwc3 *dwc)
>   u8  rx_max_burst_prd;
>   u8  tx_thr_num_pkt_prd;
>   u8  tx_max_burst_prd;
> + const char  *usb_psy_name;
> + int ret;
>  
>   /* default to highest possible threshold */
>   lpm_nyet_threshold = 0xf;
> @@ -1263,6 +1265,13 @@ static void dwc3_get_properties(struct dwc3 *dwc)
>   else
>   dwc->sysdev = dwc->dev;
>  
> + ret = device_property_read_string(dev, "usb-psy-name", &usb_psy_name);
> + if (ret >= 0) {
> + dwc->usb_psy = power_supply_get_by_name(usb_psy_name);
> + if (!dwc->usb_psy)
> + dev_err(dev, "couldn't get usb power supply\n");
> + }
> +
>   dwc->has_lpm_erratum = device_property_read_bool(dev,
>   "snps,has-lpm-erratum");
>   device_property_read_u8(dev, "snps,lpm-nyet-threshold",
> @@ -1619,6 +1628,9 @@ static int dwc3_probe(struct platform_device *pdev)
>  assert_reset:
>   reset_control_assert(dwc->reset);
>  
> + if (!dwc->usb_psy)
> + power_supply_put(dwc->usb_psy);
> +
>   return ret;
>  }
>  
> @@ -1641,6 +1653,9 @@ static int dwc3_remove(struct platform_device *pdev)
>   dwc3_free_event_buffers(dwc);
>   dwc3_free_scratch_buffers(dwc);
>  
> + if (!dwc->usb_psy)
> + power_supply_put(dwc->usb_psy);
> +
>   return 0;
>  }
>  
> diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
> index 052b20d52651..6708fdf358b3 100644
> --- a/drivers/usb/dwc3/core.h
> +++ b/drivers/usb/dwc3/core.h
> @@ -30,6 +30,8 @@
>  
>  #include 
>  
> +#include 
> +
>  #define DWC3_MSG_MAX 500
>  
>  /* Global constants 

Re: [PATCH 5.10 000/661] 5.10.20-rc2 review

2021-03-03 Thread Guillaume Tucker
On 02/03/2021 12:40, Greg Kroah-Hartman wrote:
> On Tue, Mar 02, 2021 at 11:38:36AM +0000, Guillaume Tucker wrote:
>> On 01/03/2021 19:37, Greg Kroah-Hartman wrote:
>>> This is the start of the stable review cycle for the 5.10.20 release.
>>> There are 661 patches in this series, all will be posted as a response
>>> to this one.  If anyone has any issues with these being applied, please
>>> let me know.
>>>
>>> Responses should be made by Wed, 03 Mar 2021 19:34:53 +.
>>> Anything received after that time might be too late.
>>>
>>> The whole patch series can be found in one patch at:
>>>   https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.20-rc2.gz
>>> or in the git tree and branch at:
>>>   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
>>> and the diffstat can be found below.
>>>
>>> thanks,
>>>
>>> greg k-h
>>
>>
>> I've been through the KernelCI results for v5.10.20-rc2 and made
>> this manual reply, hoping to eventually get it all automated.
>>
>>
>>
>> First there was one build regression with the arm
>> realview_defconfig:
>>
>> kernel/rcu/tree.c:683:2: error: implicit declaration of function ‘IRQ_WORK_INIT’; did you mean ‘IRQMASK_I_BIT’? [-Werror=implicit-function-declaration]
>>   IRQ_WORK_INIT(late_wakeup_func);
>>   ^
>>   IRQMASK_I_BIT
>> kernel/rcu/tree.c:683:2: error: invalid initializer
>>
>>
>> Full log:
>>
>>   https://storage.kernelci.org/stable-rc/linux-5.10.y/v5.10.19-662-g92929e15cdc0/arm/realview_defconfig/gcc-8/build.log
> 
> That should now be resolved with a new -rc release for 5.4.y and 5.10.y.

Confirmed in my other email for v5.10.20-rc4.

>> There were also a few new build warnings.  Here's a comparison of
>> the number of builds that completed with no warnings, with at
>> least one warning, and with an error between current stable and
>> stable-rc:
>>
>>               pass  warn  error
>> v5.10.19       188     6      0
>> v5.10.20-rc2   180    15      1
>>
>> Full details for these 2 revisions respectively:
>>
>>   https://kernelci.org/build/stable/branch/linux-5.10.y/kernel/v5.10.19/
>>   https://kernelci.org/build/stable-rc/branch/linux-5.10.y/kernel/v5.10.19-662-g92929e15cdc0/
> 
> That error should be resolved.
> 
> Warnings for non-x86 arches I have not been tracking to try to get down
> to 0.  That would be a good project for someone to work on...

OK, so until we get to 0 we should probably ignore warnings when
replying to the -rc review threads.  If someone wants to pick
this up in the meantime, kernelci.org can definitely help.

>> Then on the runtime testing side, there was one boot regression
>> detected on imx8mp-evk as detailed here:
>>
>>   https://kernelci.org/test/case/id/603d69ec2924db6b9baddcb2/
>>
>> I've re-run a couple of tests with both v5.10.19 and v5.10.20-rc2
>> and also got a failure with v5.10.19, so it looks like it's not
>> really a new regression but more of an intermittent problem.
>> Bisections are not enabled in NXP's lab so we don't have results
>> about which commit caused it.  We should chase this up, I've
>> already asked if they're OK to enable bisection.  Then we may
>> bisect with an older revision that is really booting to find the
>> root cause...
> 
> Finding that root cause would be good, but doesn't really sound like a
> regression yet :)

Yep.  Bisections are now getting enabled in the NXP test lab, so
we'll share the news if it leads to something.  FWIW the same
test passed with v5.10.20-rc4.

>> Presumably it's not OK to have this build error in the v5.10.20
>> release, assuming the boot regression is not new and can be
>> ignored, but that's your call.  So it seems a bit early for
>> KernelCI to stamp it with Tested-by, even though it was tested
>> but it's more a matter of clarifying the semantics and whether
>> Tested-by implicitly means "works for me".  What do you think?
> 
> Try the new release to see if that fixes the build errors for you.

All passing now.

> And thanks for doing all of the testing here, this round was a rough one
> for a variety of different reasons...

You're welcome.  That's what KernelCI is here for :)

It'll just take a bit more typing to automate the replies and use
the last stable release as a reference to detect new regressions
on stable-rc.  I think patc...@kernelci.org you're putting on CC
will make things easier in this respect, in fact that's what it
was originally created for.

Best wishes,
Guillaume
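
Using the last stable release as a reference could be sketched
roughly as below.  This is only an illustration, assuming the
per-revision build counts have already been collected; the
BuildCounts type and the new_regressions() helper are hypothetical
names, not part of any KernelCI API.  The numbers are the ones from
the pass/warn/error table quoted earlier in this message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildCounts:
    """Number of builds per outcome for one kernel revision."""
    passed: int
    warned: int
    errored: int


def new_regressions(reference: BuildCounts, candidate: BuildCounts) -> dict:
    """Return how many more warning/error builds the candidate has.

    Only increases are reported; improvements are clamped to zero.
    """
    return {
        "warn": max(candidate.warned - reference.warned, 0),
        "error": max(candidate.errored - reference.errored, 0),
    }


stable = BuildCounts(passed=188, warned=6, errored=0)      # v5.10.19
stable_rc = BuildCounts(passed=180, warned=15, errored=1)  # v5.10.20-rc2

delta = new_regressions(stable, stable_rc)
print(delta)  # {'warn': 9, 'error': 1}
```

Comparing against the reference rather than against zero means
warnings already present in the stable release would not be
reported again on every -rc.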


Re: [PATCH 5.10 000/657] 5.10.20-rc4 review

2021-03-03 Thread Guillaume Tucker
On 02/03/2021 19:28, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.10.20 release.
> There are 657 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Thu, 04 Mar 2021 19:25:07 +.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
>   https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.20-rc4.gz
> or in the git tree and branch at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h


No build errors seen on kernelci.org:

  https://kernelci.org/build/stable-rc/branch/linux-5.10.y/kernel/v5.10.19-658-g083cbba104d9/


No test regressions either:

  https://kernelci.org/test/job/stable-rc/branch/linux-5.10.y/kernel/v5.10.19-658-g083cbba104d9/


Tested-by: "kernelci.org bot" 


Thanks,
Guillaume
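
The decision behind the reply above could be automated along these
lines.  This is a minimal sketch, assuming the strict reading where
Tested-by is only given when there are no build errors and no test
regressions; the review_trailer() helper is hypothetical, and the
address in TESTER is a placeholder since the real one is not shown
in this archive.

```python
from typing import Optional

# Placeholder identity: the real bot address is not shown in this archive.
TESTER = '"kernelci.org bot" <bot@example.org>'


def review_trailer(build_errors: int, test_regressions: int) -> Optional[str]:
    """Return a Tested-by trailer only when nothing regressed."""
    if build_errors < 0 or test_regressions < 0:
        raise ValueError("counts must be non-negative")
    if build_errors == 0 and test_regressions == 0:
        return f"Tested-by: {TESTER}"
    return None


print(review_trailer(build_errors=0, test_regressions=0))  # v5.10.20-rc4 case
print(review_trailer(build_errors=1, test_regressions=0))  # v5.10.20-rc2 case
```

With the rc4 results the helper returns the trailer, and with the
rc2 build error it returns None, which matches what was done by
hand for these two reviews.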


Re: [PATCH 5.10 000/661] 5.10.20-rc2 review

2021-03-02 Thread Guillaume Tucker
On 01/03/2021 19:37, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.10.20 release.
> There are 661 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
> 
> Responses should be made by Wed, 03 Mar 2021 19:34:53 +.
> Anything received after that time might be too late.
> 
> The whole patch series can be found in one patch at:
>   https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.20-rc2.gz
> or in the git tree and branch at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
> and the diffstat can be found below.
> 
> thanks,
> 
> greg k-h


I've been through the KernelCI results for v5.10.20-rc2 and made
this manual reply, hoping to eventually get it all automated.



First there was one build regression with the arm
realview_defconfig:

kernel/rcu/tree.c:683:2: error: implicit declaration of function ‘IRQ_WORK_INIT’; did you mean ‘IRQMASK_I_BIT’? [-Werror=implicit-function-declaration]
  IRQ_WORK_INIT(late_wakeup_func);
  ^
  IRQMASK_I_BIT
kernel/rcu/tree.c:683:2: error: invalid initializer


Full log:

  https://storage.kernelci.org/stable-rc/linux-5.10.y/v5.10.19-662-g92929e15cdc0/arm/realview_defconfig/gcc-8/build.log


There were also a few new build warnings.  Here's a comparison of
the number of builds that completed with no warnings, with at
least one warning, and with an error between current stable and
stable-rc:

              pass  warn  error
v5.10.19       188     6      0
v5.10.20-rc2   180    15      1

Full details for these 2 revisions respectively:

  https://kernelci.org/build/stable/branch/linux-5.10.y/kernel/v5.10.19/
  https://kernelci.org/build/stable-rc/branch/linux-5.10.y/kernel/v5.10.19-662-g92929e15cdc0/



Then on the runtime testing side, there was one boot regression
detected on imx8mp-evk as detailed here:

  https://kernelci.org/test/case/id/603d69ec2924db6b9baddcb2/

I've re-run a couple of tests with both v5.10.19 and v5.10.20-rc2
and also got a failure with v5.10.19, so it looks like it's not
really a new regression but more of an intermittent problem.
Bisections are not enabled in NXP's lab so we don't have results
about which commit caused it.  We should chase this up, I've
already asked if they're OK to enable bisection.  Then we may
bisect with an older revision that is really booting to find the
root cause...



Presumably it's not OK to have this build error in the v5.10.20
release, assuming the boot regression is not new and can be
ignored, but that's your call.  So it seems a bit early for
KernelCI to stamp it with Tested-by, even though it was tested
but it's more a matter of clarifying the semantics and whether
Tested-by implicitly means "works for me".  What do you think?

Best wishes,
Guillaume
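
The re-run reasoning used for imx8mp-evk above can be written down
as a small rule.  This is a sketch only: classify_failure() is a
hypothetical helper, and the run lists in the example are
illustrative values, not actual lab results.

```python
from typing import Sequence


def classify_failure(reference_runs: Sequence[bool],
                     candidate_runs: Sequence[bool]) -> str:
    """Classify a candidate failure using re-runs on the reference release.

    Each sequence holds one boolean per run, True meaning the test passed.
    """
    if not reference_runs or not candidate_runs:
        raise ValueError("need at least one run on each revision")
    if all(candidate_runs):
        return "pass"
    if not all(reference_runs):
        # The reference fails too, so the failure is not new.
        return "intermittent"
    return "regression"


# Failures seen on both revisions (illustrative values only).
print(classify_failure([True, False], [False, True]))  # intermittent
```

A failure is only called a regression when the reference release
passes every re-run, which is why a single failing run on v5.10.19
was enough to treat the imx8mp-evk result as intermittent.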


Re: mainline/master bisection: baseline.login on meson-sm1-khadas-vim3l

2021-02-24 Thread Guillaume Tucker
On 24/02/2021 08:52, Marc Zyngier wrote:
> On Tue, 23 Feb 2021 21:03:52 +,
> Guillaume Tucker  wrote:
>>
>> On 23/02/2021 14:18, Marc Zyngier wrote:
>>> Hi Guillaume,
>>>
>>> On Tue, 23 Feb 2021 09:46:30 +,
>>> Guillaume Tucker  wrote:
>>>>
>>>> Hello Marc,
>>>>
>>>> Please see the bisection report below about a boot failure on
>>>> meson-sm1-khadas-vim3l on mainline.  It seems to only be
>>>> affecting kernels built with CONFIG_ARM64_64K_PAGES=y.
>>>>
>>>> Reports aren't automatically sent to the public while we're
>>>> trialing new bisection features on kernelci.org but this one
>>>> looks valid.
>>>>
>>>> There's no output in the log, so the kernel is most likely
>>>> crashing early.  Some more details can be found here:
>>>>
>>>>   https://kernelci.org/test/case/id/6034bed3b344e2860daddcc8/
>>>>
>>>> Please let us know if you need any help to debug the issue or try
>>>> a fix on this platform.
>>>
>>> Thanks for the heads up.
>>>
>>> There is actually a fundamental problem with the patch you bisected
>>> to: it provides no guarantee that the point where we enable the EL2
>>> MMU is in the idmap and, as it turns out, the code we're running from
>>> disappears from under our feet, leading to a translation fault we're
>>> not prepared to handle.
>>>
>>> How does it work with 4kB pages? Luck.
>>
>> There may be a fascinating explanation for it, but luck works
>> too.  It really seems to be booting happily with 4k pages:
>>
>>   https://kernelci.org/test/plan/id/60347b358de339d1b7addcc5/
> 
> Oh, I know it boots fine with 4k, that's what I used everywhere.
> We're just lucky that the bit of code that deals with the MMU happens
> to *also* be in the idmap. With 64k pages, it gets pushed further down
> the line, and bad things happen. Short of explicit statements in the
> code, luck rules.

OK I see that now, thanks for the explanation.

>>> Do you mind giving the patch below a go? It does work on my vim3l and
>>> on a FVP, so odds are that it will solve it for you too.
>>
>> Sure, and that worked here as well:
>>
>>   http://lava.baylibre.com:10080/scheduler/job/752416
>>
>> and here's the test branch where I applied your fix, for
>> completeness:
>>
>>   https://gitlab.collabora.com/gtucker/linux/-/commits/v5.11-vim3l-vhe/
> 
> Awesome. thanks for having tested it.
> 
>> As always, if you do send a patch with the fix, please give some
>> credit to the bot:
>>
>>   Reported-by: "kernelci.org bot"  
> 
> Will do. Mind if I credit you too for the testing?

Sure:

  Tested-by: Guillaume Tucker 

Thanks,
Guillaume


Re: mainline/master bisection: baseline.login on meson-sm1-khadas-vim3l

2021-02-23 Thread Guillaume Tucker
On 23/02/2021 14:18, Marc Zyngier wrote:
> Hi Guillaume,
> 
> On Tue, 23 Feb 2021 09:46:30 +,
> Guillaume Tucker  wrote:
>>
>> Hello Marc,
>>
>> Please see the bisection report below about a boot failure on
>> meson-sm1-khadas-vim3l on mainline.  It seems to only be
>> affecting kernels built with CONFIG_ARM64_64K_PAGES=y.
>>
>> Reports aren't automatically sent to the public while we're
>> trialing new bisection features on kernelci.org but this one
>> looks valid.
>>
>> There's no output in the log, so the kernel is most likely
>> crashing early.  Some more details can be found here:
>>
>>   https://kernelci.org/test/case/id/6034bed3b344e2860daddcc8/
>>
>> Please let us know if you need any help to debug the issue or try
>> a fix on this platform.
> 
> Thanks for the heads up.
> 
> There is actually a fundamental problem with the patch you bisected
> to: it provides no guarantee that the point where we enable the EL2
> MMU is in the idmap and, as it turns out, the code we're running from
> disappears from under our feet, leading to a translation fault we're
> not prepared to handle.
> 
> How does it work with 4kB pages? Luck.

There may be a fascinating explanation for it, but luck works
too.  It really seems to be booting happily with 4k pages:

  https://kernelci.org/test/plan/id/60347b358de339d1b7addcc5/

> Do you mind giving the patch below a go? It does work on my vim3l and
> on a FVP, so odds are that it will solve it for you too.

Sure, and that worked here as well:

  http://lava.baylibre.com:10080/scheduler/job/752416

and here's the test branch where I applied your fix, for
completeness:

  https://gitlab.collabora.com/gtucker/linux/-/commits/v5.11-vim3l-vhe/

As always, if you do send a patch with the fix, please give some
credit to the bot:

  Reported-by: "kernelci.org bot"  

Thanks,
Guillaume


> Thanks,
> 
>   M.
> 
> diff --git a/arch/arm64/kernel/hyp-stub.S b/arch/arm64/kernel/hyp-stub.S
> index 678cd2c618ee..fbd2543b8f7d 100644
> --- a/arch/arm64/kernel/hyp-stub.S
> +++ b/arch/arm64/kernel/hyp-stub.S
> @@ -96,8 +96,10 @@ SYM_CODE_START_LOCAL(mutate_to_vhe)
>   cmp x1, xzr
>   and x2, x2, x1
>   csinv   x2, x2, xzr, ne
> - cbz x2, 1f
> + cbnz x2, 2f
>  
> +1:   eret
> +2:
>   // Engage the VHE magic!
>   mov_q   x0, HCR_HOST_VHE_FLAGS
>   msr hcr_el2, x0
> @@ -131,11 +133,29 @@ SYM_CODE_START_LOCAL(mutate_to_vhe)
>   msr mair_el1, x0
>   isb
>  
> + // Hack the exception return to stay at EL2
> + mrs x0, spsr_el1
> + and x0, x0, #~PSR_MODE_MASK
> + mov x1, #PSR_MODE_EL2h
> + orr x0, x0, x1
> + msr spsr_el1, x0
> +
> + b   enter_vhe
> +SYM_CODE_END(mutate_to_vhe)
> +
> + // At the point where we reach enter_vhe(), we run with
> + // the MMU off (which is enforced by mutate_to_vhe()).
> + // We thus need to be in the idmap, or everything will
> + // explode when enabling the MMU.
> +
> + .pushsection .idmap.text, "ax"
> +
> +SYM_CODE_START_LOCAL(enter_vhe)
> + // Enable the EL2 S1 MMU, as set up from EL1
>   // Invalidate TLBs before enabling the MMU
>   tlbi vmalle1
>   dsb nsh
>  
> - // Enable the EL2 S1 MMU, as set up from EL1
>   mrs_s   x0, SYS_SCTLR_EL12
>   set_sctlr_el1   x0
>  
> @@ -143,17 +163,12 @@ SYM_CODE_START_LOCAL(mutate_to_vhe)
>   mov_q   x0, INIT_SCTLR_EL1_MMU_OFF
>   msr_s   SYS_SCTLR_EL12, x0
>  
> - // Hack the exception return to stay at EL2
> - mrs x0, spsr_el1
> - and x0, x0, #~PSR_MODE_MASK
> - mov x1, #PSR_MODE_EL2h
> - orr x0, x0, x1
> - msr spsr_el1, x0
> -
>   mov x0, xzr
>  
> -1:   eret
> -SYM_CODE_END(mutate_to_vhe)
> + eret
> +SYM_CODE_END(enter_vhe)
> +
> + .popsection
>  
>  .macro invalid_vector label
>  SYM_CODE_START_LOCAL(\label)
> 
> 



Re: next/master bisection: baseline.login on r8a77960-ulcb

2021-02-23 Thread Guillaume Tucker
Hi Christoph,

Please see the bisection report below about a boot failure on
r8a77960-ulcb on next-20210222.  

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org but this one
looks valid.

The log shows a kernel panic, more details can be found here:

  https://kernelci.org/test/case/id/6034bde034504edc9faddd2c/

Please let us know if you need any help to debug the issue or try
a fix on this platform.

Best wishes,
Guillaume

On 23/02/2021 02:02, KernelCI bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has      *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.      *
> *                                                               *
> * If you do send a fix, please include this trailer:            *
> *   Reported-by: "kernelci.org bot"                             *
> *                                                               *
> * Hope this helps!                                              *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> next/master bisection: baseline.login on r8a77960-ulcb
> 
> Summary:
>   Start:      37dfbfbdca66 Add linux-next specific files for 20210222
>   Plain log:  https://storage.kernelci.org/next/master/next-20210222/arm64/defconfig/clang-10/lab-baylibre/baseline-r8a77960-ulcb.txt
>   HTML log:   https://storage.kernelci.org/next/master/next-20210222/arm64/defconfig/clang-10/lab-baylibre/baseline-r8a77960-ulcb.html
>   Result:     567d877f9a7d swiotlb: refactor swiotlb_tbl_map_single
> 
> Checks:
>   revert:     PASS
>   verify:     PASS
> 
> Parameters:
>   Tree:       next
>   URL:        https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>   Branch:     master
>   Target:     r8a77960-ulcb
>   CPU arch:   arm64
>   Lab:        lab-baylibre
>   Compiler:   clang-10
>   Config:     defconfig
>   Test case:  baseline.login
> 
> Breaking commit found:
> 
> ---
> commit 567d877f9a7d6bf4e4bf0ecd6de23fec8039b123
> Author: Christoph Hellwig 
> Date:   Thu Feb 4 11:08:35 2021 +0100
> 
> swiotlb: refactor swiotlb_tbl_map_single
> 
> Split out a bunch of self-contained helpers to make the function easier
> to follow.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Jianxiong Gao 
> Tested-by: Jianxiong Gao 
> Signed-off-by: Konrad Rzeszutek Wilk 
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index b38b1553c466..381c24ef1ac1 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -468,134 +468,133 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr,
>   }
>  }
>  
> -phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
> - size_t mapping_size, size_t alloc_size,
> - enum dma_data_direction dir, unsigned long attrs)
> -{
> - dma_addr_t tbl_dma_addr = phys_to_dma_unencrypted(hwdev, io_tlb_start);
> - unsigned long flags;
> - phys_addr_t tlb_addr;
> - unsigned int nslots, stride, index, wrap;
> - int i;
> - unsigned long mask;
> - unsigned long offset_slots;
> - unsigned long max_slots;
> - unsigned long tmp_io_tlb_used;
> -
> - if (no_iotlb_memory)
> - panic("Can not allocate SWIOTLB buffer earlier and can't now provide you with the DMA bounce buffer");
> -
> - if (mem_encrypt_active())
> - pr_warn_once("Memory encryption is active and system is using DMA bounce buffers\n");
> +#define slot_addr(start, idx) ((start) + ((idx) << IO_TLB_SHIFT))
>  
> - if (mapping_size > alloc_size) {
> - dev_warn_once(hwdev, "Invalid sizes (mapping: %zd bytes, alloc: %zd bytes)",
> -   mapping_size, alloc_size);
> - return (phys_addr_t)DMA_MAPPING_ERROR;
> - }
> -
> - mask = dma_get_seg_boundary(hwdev);
> +/*
> + * Carefully handle integer overflow which can occur when boundary_mask == ~0UL.
> + */
> +static inline unsigned long get_max_slots(unsigned long boundary_mask)
> +{
> + if (boundary_mask == ~0UL)
> + return 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
> + return nr_slots(boundary_mask + 1);
> +}
>  
> - tbl_dma_addr &= mask;
> +static unsigned int wrap_index(unsigned int index)
> +{
> + if (index >= io_tlb_nslabs)
> + return 0;
> + return index;
> +}
>  
> - offset_slots = nr_slots(tbl_dma_addr);
> +/*
> + * Find a suitable number of IO TLB entries size that will fit this request and
> + * allocate a buffer from that IO TLB pool.
> + */
> +static int find_slots(struct device *dev, size_t alloc_size)
> +{
> + unsigned long boundary_mask = 
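
The overflow that get_max_slots() guards against in the quoted diff
can be illustrated outside the kernel.  The sketch below emulates
unsigned long arithmetic in Python and rests on assumptions not
shown in this report: a 64-bit unsigned long, IO_TLB_SHIFT equal to
11, and nr_slots() rounding up to whole slots.  It is not a claim
about the cause of the boot failure, only about what the special
case is for.

```python
BITS_PER_LONG = 64   # assumption: 64-bit kernel
IO_TLB_SHIFT = 11    # assumption: 2 KiB slots
ULONG_MAX = (1 << BITS_PER_LONG) - 1  # ~0UL


def nr_slots(val: int) -> int:
    """Round val up to a whole number of slots (assumed definition)."""
    size = 1 << IO_TLB_SHIFT
    return (val + size - 1) // size


def get_max_slots_naive(boundary_mask: int) -> int:
    """Without the special case, boundary_mask + 1 wraps around to 0."""
    return nr_slots((boundary_mask + 1) & ULONG_MAX)


def get_max_slots(boundary_mask: int) -> int:
    """Mirror of the helper in the diff, with the ~0UL special case."""
    if boundary_mask == ULONG_MAX:
        return 1 << (BITS_PER_LONG - IO_TLB_SHIFT)
    return nr_slots(boundary_mask + 1)


print(get_max_slots_naive(ULONG_MAX))  # 0
print(get_max_slots(ULONG_MAX) == 1 << 53)  # True
print(get_max_slots(0xFFFFFFFF))  # 2097152
```

Under these assumptions, a device with no segment boundary limit
would get zero usable slots from the naive version, while the
helper in the diff returns the full range instead.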

Re: mainline/master bisection: baseline.login on meson-sm1-khadas-vim3l

2021-02-23 Thread Guillaume Tucker
Hello Marc,

Please see the bisection report below about a boot failure on
meson-sm1-khadas-vim3l on mainline.  It seems to only be
affecting kernels built with CONFIG_ARM64_64K_PAGES=y.

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org but this one
looks valid.

There's no output in the log, so the kernel is most likely
crashing early.  Some more details can be found here:

  https://kernelci.org/test/case/id/6034bed3b344e2860daddcc8/

Please let us know if you need any help to debug the issue or try
a fix on this platform.

Best wishes,
Guillaume

On 22/02/2021 12:38, KernelCI bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has      *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.      *
> *                                                               *
> * If you do send a fix, please include this trailer:            *
> *   Reported-by: "kernelci.org bot"                             *
> *                                                               *
> * Hope this helps!                                              *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> mainline/master bisection: baseline.login on meson-sm1-khadas-vim3l
> 
> Summary:
>   Start:      31caf8b2a847 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
>   Plain log:  https://storage.kernelci.org/mainline/master/v5.11-7579-g31caf8b2a847/arm64/defconfig+CONFIG_ARM64_64K_PAGES=y/clang-10/lab-baylibre/baseline-meson-sm1-khadas-vim3l.txt
>   HTML log:   https://storage.kernelci.org/mainline/master/v5.11-7579-g31caf8b2a847/arm64/defconfig+CONFIG_ARM64_64K_PAGES=y/clang-10/lab-baylibre/baseline-meson-sm1-khadas-vim3l.html
>   Result:     0c93df9622d4 arm64: Initialise as nVHE before switching to VHE
> 
> Checks:
>   revert:     PASS
>   verify:     PASS
> 
> Parameters:
>   Tree:       mainline
>   URL:        https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>   Branch:     master
>   Target:     meson-sm1-khadas-vim3l
>   CPU arch:   arm64
>   Lab:        lab-baylibre
>   Compiler:   clang-10
>   Config:     defconfig+CONFIG_ARM64_64K_PAGES=y
>   Test case:  baseline.login
> 
> Breaking commit found:
> 
> ---
> commit 0c93df9622d4d921bcd0dc83f71fed9e98f5119f
> Author: Marc Zyngier 
> Date:   Mon Feb 8 09:57:14 2021 +
> 
> arm64: Initialise as nVHE before switching to VHE
> 
> As we are aiming to be able to control whether we enable VHE or
> not, let's always drop down to EL1 first, and only then upgrade
> to VHE if at all possible.
> 
> This means that if the kernel is booted at EL2, we always start
> with a nVHE init, drop to EL1 to initialise the kernel, and
> only then upgrade the kernel EL to EL2 if possible (the process
> is obviously shortened for secondary CPUs).
> 
> The resume path is handled similarly to a secondary CPU boot.
> 
> Signed-off-by: Marc Zyngier 
> Acked-by: David Brazdil 
> Acked-by: Catalin Marinas 
> Link: https://lore.kernel.org/r/20210208095732.3267263-6-...@kernel.org
> [will: Avoid calling switch_to_vhe twice on kaslr path]
> Signed-off-by: Will Deacon 
> 
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index 28e9735302df..ec66dc061b0c 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -447,6 +447,7 @@ SYM_FUNC_START_LOCAL(__primary_switched)
>   ret // to __primary_switch()
>  0:
>  #endif
> + bl  switch_to_vhe   // Prefer VHE if possible
>   add sp, sp, #16
>   mov x29, #0
>   mov x30, #0
> @@ -493,42 +494,6 @@ SYM_INNER_LABEL(init_el1, SYM_L_LOCAL)
>   eret
>  
>  SYM_INNER_LABEL(init_el2, SYM_L_LOCAL)
> -#ifdef CONFIG_ARM64_VHE
> - /*
> -  * Check for VHE being present. x2 being non-zero indicates that we
> -  * do have VHE, and that the kernel is intended to run at EL2.
> -  */
> - mrs x2, id_aa64mmfr1_el1
> - ubfx x2, x2, #ID_AA64MMFR1_VHE_SHIFT, #4
> -#else
> - mov x2, xzr
> -#endif
> - cbz x2, init_el2_nvhe
> -
> - /*
> -  * When VHE _is_ in use, EL1 will not be used in the host and
> -  * requires no configuration, and all non-hyp-specific EL2 setup
> -  * will be done via the _EL1 system register aliases in __cpu_setup.
> -  */
> - mov_q   x0, HCR_HOST_VHE_FLAGS
> - msr hcr_el2, x0
> - isb
> -
> - init_el2_state vhe
> -
> - isb
> -
> - mov_q   x0, INIT_PSTATE_EL2
> - msr spsr_el2, x0
> - msr elr_el2, lr
> - mov w0, 

Re: [PATCH] iommu/tegra-smmu: Fix mc errors on tegra124-nyan

2021-02-22 Thread Guillaume Tucker
On 18/02/2021 22:07, Nicolin Chen wrote:
> Commit 25938c73cd79 ("iommu/tegra-smmu: Rework tegra_smmu_probe_device()")
> removed certain hack in the tegra_smmu_probe() by relying on IOMMU core to
> of_xlate SMMU's SID per device, so as to get rid of tegra_smmu_find() and
> tegra_smmu_configure() that are typically done in the IOMMU core also.
> 
> This approach works for both existing devices that have DT nodes and other
> devices (like PCI device) that don't exist in DT, on Tegra210 and Tegra3
> upon testing. However, Page Fault errors are reported on tegra124-Nyan:
> 
>   tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
>     EMEM address decode error (SMMU translation error [--S])
>   tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
>     Page fault (SMMU translation error [--S])
> 
> After debugging, I found that the mentioned commit changed some function
> callback sequence of tegra-smmu's, resulting in enabling SMMU for display
> client before display driver gets initialized. I couldn't reproduce exact
> same issue on Tegra210 as Tegra124 (arm-32) differs at arch-level code.
> 
> Actually this Page Fault is a known issue, as on most of Tegra platforms,
> display gets enabled by the bootloader for the splash screen feature, so
> it keeps filling the framebuffer memory. A proper fix to this issue is to
> 1:1 linear map the framebuffer memory to IOVA space so the SMMU will have
> the same address as the physical address in its page table. Yet, Thierry
> has been working on the solution above for a year, and it hasn't merged.
> 
> Therefore, let's partially revert the mentioned commit to fix the errors.
> 
> The reason why we do a partial revert here is that we can still set priv
> in ->of_xlate() callback for PCI devices. Meanwhile, devices existing in
> DT, like display, will go through tegra_smmu_configure() at the stage of
> bus_set_iommu() when SMMU gets probed(), as what it did before we merged
> the mentioned commit.
> 
> Once we have the linear map solution for framebuffer memory, this change
> can be cleaned away.
> 
> [Big thanks to Guillaume who reported and helped debugging/verification]
> 
> Fixes: 25938c73cd79 ("iommu/tegra-smmu: Rework tegra_smmu_probe_device()")
> Reported-by: Guillaume Tucker 

You're welcome.  I would actually prefer to see it as reported by
kernelci.org since I wouldn't have found it without the automated
testing and bisection.  If you're OK to change this trailer:

  Reported-by: "kernelci.org bot" 

> Signed-off-by: Nicolin Chen 
> ---
> 
> Guillaume, would you please give a "Tested-by" to this change? Thanks!

Sure. https://lava.collabora.co.uk/scheduler/job/3249387

  Tested-by: Guillaume Tucker 

Thanks,
Guillaume


Re: [PATCH RESEND v2 4/5] iommu/tegra-smmu: Rework tegra_smmu_probe_device()

2021-02-18 Thread Guillaume Tucker
On 18/02/2021 10:35, Nicolin Chen wrote:
> Hi Guillaume,
> 
> Thank you for the test results! And sorry for my belated reply.

No worries :)

> On Thu, Feb 11, 2021 at 03:50:05PM +0000, Guillaume Tucker wrote:
>>> On Sat, Feb 06, 2021 at 01:40:13PM +0000, Guillaume Tucker wrote:
>>>>> It'd be nicer if I can get both logs of the vanilla kernel (failing)
>>>>> and the commit-reverted version (passing), each applying this patch.
>>>>
>>>> Sure, I've run 3 jobs:
>>>>
>>>> * v5.11-rc6 as a reference, to see the original issue:
>>>>   https://lava.collabora.co.uk/scheduler/job/3187848
>>>>
>>>> * + your debug patch:
>>>>   https://lava.collabora.co.uk/scheduler/job/3187849
>>>>
>>>> * + the "breaking" commit reverted, passing the tests:
>>>>   https://lava.collabora.co.uk/scheduler/job/3187851
>>>
>>> Thanks for the help!
>>>
>>> I am able to figure out what's probably wrong, yet not so sure
>>> about the best solution at this point.
>>>
>>> Would it be possible for you to run one more time with another
>>> debugging patch? I'd like to see the same logs as previous:
>>> 1. Vanilla kernel + debug patch
>>> 2. Vanilla kernel + Reverted + debug patch
>>
>> As it turns out, next-20210210 is passing all the tests again so
>> it looks like this got fixed in the meantime:
>>
>>   https://lava.collabora.co.uk/scheduler/job/3210192
> 
> I checked this passing log, however, found that the regression is
> still there though test passed, as the prints below aren't normal:
>   tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
>     EMEM address decode error (SMMU translation error [--S])
>   tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
>     Page fault (SMMU translation error [--S])

Ah yes sorry, there are other KernelCI checks for kernel errors
but those weren't enabled in the bisection so I didn't notice them.
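
As an illustration only, a check of this kind could be sketched in
Python as below. This is not the actual KernelCI implementation; the
pattern, the function name find_smmu_faults and the sample text are
assumptions based on the two tegra-mc lines quoted in this thread.

```python
import re

# Pattern modelled on the tegra-mc lines quoted in this thread
# (an assumption, not a list of every message the driver can print).
SMMU_FAULT = re.compile(
    r"tegra-mc \S+: (?P<client>\w+): (?P<op>read|write) "
    r"@(?P<addr>0x[0-9a-fA-F]+): "
    r"(?P<kind>EMEM address decode error|Page fault) "
    r"\(SMMU translation error"
)

def find_smmu_faults(log_text):
    """Return (client, op, addr, kind) for each SMMU fault in a log."""
    # Log lines may be wrapped, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return [m.group("client", "op", "addr", "kind")
            for m in SMMU_FAULT.finditer(flat)]

sample = """
tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
   EMEM address decode error (SMMU translation error [--S])
tegra-mc 70019000.memory-controller: display0a: read @0xfe056b40:
   Page fault (SMMU translation error [--S])
"""

faults = find_smmu_faults(sample)
for client, op, addr, kind in faults:
    print(f"{client} {op} {addr}: {kind}")
```

A job could then be flagged whenever the returned list is non-empty,
even if the functional tests themselves pass.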

> I was trying to think of a simpler solution than a revert. However,
> given the fact that the callback sequence could change -- guessing
> likely a recent change in iommu core, I feel it safer to revert my
> previous change, not necessarily being a complete revert though.
> 
> I attached my partial reverting change in this email. Would it be
> possible for you to run one more test for me to confirm it? It'd
> keep the tests passing while eliminating all error prints above.
> 
> If the fix works, I'll re-send it to mail list by adding a commit
> message.

Sure, here's next-20210218 as a reference:

  https://lava.collabora.co.uk/scheduler/job/3241236

and here with your patch applied on top of it:

  https://lava.collabora.co.uk/scheduler/job/3241246

The git branch I've used where your patch is applied:

  https://gitlab.collabora.com/gtucker/linux/-/commits/next-20210218-nyan-big-drm-read/

The errors seem to have disappeared but I'll let you double check
that things are all back to a working state.

BTW: This thread is a good example of how having an "on-demand"
KernelCI service to let developers re-run tests with extra
patches would allow them to fix issues independently.  We'll keep
that in mind for the future.

Best wishes,
Guillaume


Re: [PATCH RESEND v2 4/5] iommu/tegra-smmu: Rework tegra_smmu_probe_device()

2021-02-11 Thread Guillaume Tucker
On 10/02/2021 08:20, Nicolin Chen wrote:
> Hi Guillaume,
> 
> On Sat, Feb 06, 2021 at 01:40:13PM +0000, Guillaume Tucker wrote:
>>> It'd be nicer if I can get both logs of the vanilla kernel (failing)
>>> and the commit-reverted version (passing), each applying this patch.
>>
>> Sure, I've run 3 jobs:
>>
>> * v5.11-rc6 as a reference, to see the original issue:
>>   https://lava.collabora.co.uk/scheduler/job/3187848
>>
>> * + your debug patch:
>>   https://lava.collabora.co.uk/scheduler/job/3187849
>>
>> * + the "breaking" commit reverted, passing the tests:
>>   https://lava.collabora.co.uk/scheduler/job/3187851
> 
> Thanks for the help!
> 
> I am able to figure out what's probably wrong, yet not so sure
> about the best solution at this point.
> 
> Would it be possible for you to run one more time with another
> debugging patch? I'd like to see the same logs as previous:
> 1. Vanilla kernel + debug patch
> 2. Vanilla kernel + Reverted + debug patch

As it turns out, next-20210210 is passing all the tests again so
it looks like this got fixed in the meantime:

  https://lava.collabora.co.uk/scheduler/job/3210192
  https://lava.collabora.co.uk/results/3210192/0_igt-kms-tegra

And here's a more extensive list of IGT tests on next-20210211,
all the regressions have been fixed:

  https://kernelci.org/test/plan/id/60254c42f51df36be53abe62/


I haven't run a reversed bisection to find the fix, but I guess
it wouldn't be too hard to find out what happened by hand anyway.
I see the drm/tegra/for-5.12-rc1 tag has been merged into
linux-next, maybe that solved the issue?
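
For reference, a reversed bisection is the usual binary search with
the roles swapped: it looks for the first commit where the failure is
gone.  A minimal Python sketch, assuming a linear history that flips
from broken to fixed exactly once; the commit names and the
is_broken() callback are hypothetical stand-ins for real revisions
and a real test job:

```python
def first_fixed(commits, is_broken):
    """Return the index of the first commit where is_broken() is False.

    Assumes commits[0] is broken, commits[-1] is fixed, and the
    history flips from broken to fixed exactly once.
    """
    lo, hi = 0, len(commits) - 1  # lo: known broken, hi: known fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_broken(commits[mid]):
            lo = mid
        else:
            hi = mid
    return hi

# Hypothetical history; "c5" stands in for the unknown fixing commit.
history = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
fix_at = history.index("c5")
tested = []

def is_broken(commit):
    # Stand-in for building the revision and running the test job.
    tested.append(commit)
    return history.index(commit) < fix_at

idx = first_fixed(history, is_broken)
print("first fixed commit:", history[idx])
print("revisions tested:", tested)
```

Only the midpoints are built and tested, so the number of jobs grows
with the logarithm of the range rather than its length.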

FYI I've also run some jobs with your debug patch and with the
breaking patch reverted:

  https://lava.collabora.co.uk/scheduler/job/3210245
  https://lava.collabora.co.uk/scheduler/job/3210596

Meanwhile I'll see what can be done to improve the automated
bisection so if there are new IGT regressions they would get
reported earlier.  I guess it would have saved us all some time
if it had been bisected in December.

Thanks,
Guillaume


Re: [PATCH RESEND v2 4/5] iommu/tegra-smmu: Rework tegra_smmu_probe_device()

2021-02-06 Thread Guillaume Tucker
On 05/02/2021 09:45, Nicolin Chen wrote:
> Hi Guillaume,
> 
> On Thu, Feb 04, 2021 at 09:24:23PM -0800, Nicolin Chen wrote:
>>> Please let us know if you need any help debugging this issue or
>>> to try a fix on this platform.
>>
>> Yes, I don't have any Tegra124 platform to run. It'd be very nice
>> if you can run some debugging patch (I can provide you) and a fix
>> after I root cause the issue.
> 
> Would it be possible for you to run with the given debugging patch?
> 
> It'd be nicer if I can get both logs of the vanilla kernel (failing)
> and the commit-reverted version (passing), each applying this patch.

Sure, I've run 3 jobs:

* v5.11-rc6 as a reference, to see the original issue:
  https://lava.collabora.co.uk/scheduler/job/3187848

* + your debug patch:
  https://lava.collabora.co.uk/scheduler/job/3187849

* + the "breaking" commit reverted, passing the tests:
  https://lava.collabora.co.uk/scheduler/job/3187851


You can see the history of the test branch I'm using here, with
the 3 revisions mentioned above:

  https://gitlab.collabora.com/gtucker/linux/-/commits/linux-5.11-rc6-nyan-big-drm-read/


Hope that helps,
Guillaume


Re: next/master bisection: baseline.login on rk3288-rock2-square

2021-02-06 Thread Guillaume Tucker
On 05/02/2021 12:05, Ard Biesheuvel wrote:
> On Fri, 5 Feb 2021 at 09:21, Ard Biesheuvel  wrote:
>>
>> On Thu, 4 Feb 2021 at 22:31, Guillaume Tucker
>>  wrote:
>>>
>>> On 04/02/2021 18:23, Nick Desaulniers wrote:
>>>> On Thu, Feb 4, 2021 at 10:12 AM Nathan Chancellor  
>>>> wrote:
>>>>>
>>>>> On Thu, Feb 04, 2021 at 10:06:08AM -0800, 'Nick Desaulniers' via Clang 
>>>>> Built Linux wrote:
>>>>>> On Thu, Feb 4, 2021 at 8:02 AM Ard Biesheuvel  wrote:
>>>>>>>
>>>>>>> On Thu, 4 Feb 2021 at 16:53, Guillaume Tucker
>>>>>>>  wrote:
>>>>>>>>
>>>>>>>> On 04/02/2021 15:42, Ard Biesheuvel wrote:
>>>>>>>>> On Thu, 4 Feb 2021 at 12:32, Guillaume Tucker
>>>>>>>>>  wrote:
>>>>>>>>>>
>>>>>>>>>> Essentially:
>>>>>>>>>>
>>>>>>>>>>   make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 
>>>>>>>>>> CC="ccache clang" zImage
>>>>>>
>>>>>> This command should link with BFD (and assemble with GAS); it's only
>>>>>> using clang as the compiler.
>>>>>
>>>>> I think you missed the 'LLVM=1' before CC="ccache clang". That should
>>>>> use all of the LLVM utilities minus the integrated assembler while
>>>>> wrapping clang with ccache.
>>>>
>>>> You're right, I missed `LLVM=1`. Adding `LD=ld.bfd` I think should
>>>> permit fallback to BFD.
>>>
>>> That was close, except we're cross-compiling with GCC for arm.
>>> So I've now built a plain next-20210203 (without Ard's fix) using
>>> this command line:
>>>
>>> make LD=arm-linux-gnueabihf-ld.bfd -j18 ARCH=arm 
>>> CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage
>>>
>>> I'm using a modified Docker image gtucker/kernelci-build-clang-11
>>> with the very latest LLVM 11 and gcc-8-arm-linux-gnueabihf
>>> packages added to be able to use the GNU linker.  BTW I guess we
>>> should enable this kind of hybrid build setup on kernelci.org as
>>> well.
>>>
>>> Full build log + kernel binaries can be found here:
>>>
>>> 
>>> https://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-41/arm/multi_v7_defconfig/clang-11/
>>>
>>> And this booted fine, which confirms it's really down to how
>>> ld.lld puts together the kernel image.  Does it actually solve
>>> the debate whether this is an issue to fix in the assembly code
>>> or at link time?
>>>
>>> Full test job details for the record:
>>>
>>> https://lava.collabora.co.uk/scheduler/job/3176004
>>>
>>
>>
>> So the issue appears to be in the way the linker generates the
>> _kernel_bss_size symbol, which obviously has an impact, given that the
>> queued fix takes it into account in the cache_clean operation.
>>
>> On GNU ld, I see
>>
>>    479: 00065e14     0 NOTYPE  GLOBAL DEFAULT  ABS _kernel_bss_size
>>
>> whereas on LLVM ld.lld, I see
>>
>>    433: c1c86e98     0 NOTYPE  GLOBAL DEFAULT  ABS _kernel_bss_size
>>
>> and adding this value may cause the cache clean to operate on unmapped
>> addresses, or cause the addition to wrap and not perform a cache clean
>> at all.
>>
>> AFAICT, this also breaks the appended DTB case in LLVM, so this needs
>> a separate fix in any case.
> 
> I pushed a combined branch of torvalds/master, rmk/fixes (still
> containing my 9052/1 fix) and this patch to my for-kernelci branch
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/
> 
> Guillaume,
> 
> It seems there is no Clang-11 coverage there, right? Mind giving this
> branch a spin? If this fixes the regressions, we can get these queued
> up.

That's right, Clang builds are only enabled on linux-next and
mainline at the moment.  We could enable it on any other branch
where it makes sense.  How about just the main defconfig for arm,
arm64 and x86_64 on your ardb/for-kernelci branch?

For now I've run another set of builds with clang-11 and got the
following test results with your branch on staging:

  https://staging.kernelci.org/test/job/ardb/branch/for-kernelci/kernel/v5.11-rc6-146-g923ca344043a/plan/baseline/

which are all passing.

I'll reply to the thread with your patch to confirm.

Guillaume
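
For reference, the effect Ard describes for the two _kernel_bss_size
values quoted above can be sketched with plain 32-bit arithmetic.
This is only an illustration: the start address is a hypothetical
value and clean_range() is not the decompressor's actual code, it
just adds the symbol value to an address the way a 32-bit register
would.

```python
MASK32 = 0xFFFFFFFF

def clean_range(start, bss_size):
    """Return (start, end) of a cache-clean range in 32-bit arithmetic."""
    end = (start + bss_size) & MASK32  # wraps like a 32-bit register
    return start, end

start = 0x60008000  # hypothetical address, for illustration only

for linker, size in (("GNU ld", 0x00065E14), ("ld.lld", 0xC1C86E98)):
    s, e = clean_range(start, size)
    status = "ok" if e > s else "wrapped (end below start)"
    print(f"{linker}: start={s:#010x} end={e:#010x} {status}")
```

With the GNU ld value the end address lands just above the start;
with the ld.lld value the sum overflows 32 bits and the end address
falls below the start, which matches the "wrap" case in the
description.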


Re: next/master bisection: baseline.login on rk3288-rock2-square

2021-02-04 Thread Guillaume Tucker
On 04/02/2021 18:23, Nick Desaulniers wrote:
> On Thu, Feb 4, 2021 at 10:12 AM Nathan Chancellor  wrote:
>>
>> On Thu, Feb 04, 2021 at 10:06:08AM -0800, 'Nick Desaulniers' via Clang Built 
>> Linux wrote:
>>> On Thu, Feb 4, 2021 at 8:02 AM Ard Biesheuvel  wrote:
>>>>
>>>> On Thu, 4 Feb 2021 at 16:53, Guillaume Tucker
>>>>  wrote:
>>>>>
>>>>> On 04/02/2021 15:42, Ard Biesheuvel wrote:
>>>>>> On Thu, 4 Feb 2021 at 12:32, Guillaume Tucker
>>>>>>  wrote:
>>>>>>>
>>>>>>> Essentially:
>>>>>>>
>>>>>>>   make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 
>>>>>>> CC="ccache clang" zImage
>>>
>>> This command should link with BFD (and assemble with GAS); it's only
>>> using clang as the compiler.
>>
>> I think you missed the 'LLVM=1' before CC="ccache clang". That should
>> use all of the LLVM utilities minus the integrated assembler while
>> wrapping clang with ccache.
> 
> You're right, I missed `LLVM=1`. Adding `LD=ld.bfd` I think should
> permit fallback to BFD.

That was close, except we're cross-compiling with GCC for arm.
So I've now built a plain next-20210203 (without Ard's fix) using
this command line:

make LD=arm-linux-gnueabihf-ld.bfd -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage

I'm using a modified Docker image gtucker/kernelci-build-clang-11
with the very latest LLVM 11 and gcc-8-arm-linux-gnueabihf
packages added to be able to use the GNU linker.  BTW I guess we
should enable this kind of hybrid build setup on kernelci.org as
well.

Full build log + kernel binaries can be found here:

  https://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-41/arm/multi_v7_defconfig/clang-11/

And this booted fine, which confirms it's really down to how
ld.lld puts together the kernel image.  Does it actually solve
the debate whether this is an issue to fix in the assembly code
or at link time?

Full test job details for the record:

https://lava.collabora.co.uk/scheduler/job/3176004

Hope that helps,
Guillaume


Re: next/master bisection: baseline.login on rk3288-rock2-square

2021-02-04 Thread Guillaume Tucker
On 04/02/2021 16:01, Ard Biesheuvel wrote:
> On Thu, 4 Feb 2021 at 16:53, Guillaume Tucker
>  wrote:
>>
>> On 04/02/2021 15:42, Ard Biesheuvel wrote:
>>> On Thu, 4 Feb 2021 at 12:32, Guillaume Tucker
>>>  wrote:
>>>>
>>>> On 04/02/2021 10:33, Guillaume Tucker wrote:
>>>>> On 04/02/2021 10:27, Ard Biesheuvel wrote:
>>>>>> On Thu, 4 Feb 2021 at 11:06, Russell King - ARM Linux admin
>>>>>>  wrote:
>>>>>>>
>>>>>>> On Thu, Feb 04, 2021 at 10:07:58AM +0100, Ard Biesheuvel wrote:
>>>>>>>> On Thu, 4 Feb 2021 at 09:43, Guillaume Tucker
>>>>>>>>  wrote:
>>>>>>>>>
>>>>>>>>> Hi Ard,
>>>>>>>>>
>>>>>>>>> Please see the bisection report below about a boot failure on
>>>>>>>>> rk3288 with next-20210203.  It was also bisected on
>>>>>>>>> imx6q-var-dt6customboard with next-20210202.
>>>>>>>>>
>>>>>>>>> Reports aren't automatically sent to the public while we're
>>>>>>>>> trialing new bisection features on kernelci.org but this one
>>>>>>>>> looks valid.
>>>>>>>>>
>>>>>>>>> The kernel is most likely crashing very early on, so there's
>>>>>>>>> nothing in the logs.  Please let us know if you need some help
>>>>>>>>> with debugging or trying a fix on these platforms.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks for the report.
>>>>>>>
>>>>>>> Ard,
>>>>>>>
>>>>>>> I want to send my fixes branch today which includes your regression
>>>>>>> fix that caused this regression.
>>>>>>>
>>>>>>> As this is proving difficult to fix, I can only drop your fix from
>>>>>>> my fixes branch - and given that this seems to be problematical, I'm
>>>>>>> tempted to revert the original change at this point which should fix
>>>>>>> both of these regressions - and then we have another go at getting rid
>>>>>>> of the set/way instructions during the next cycle.
>>>>>>>
>>>>>>> Thoughts?
>>>>>>>
>>>>>>
>>>>>> Hi Russell,
>>>>>>
>>>>>> If Guillaume is willing to do the experiment, and it fixes the issue,
>>>>>
>>>>> Yes, I'm running some tests with that fix now and should have
>>>>> some results shortly.
>>>>
>>>> Yes it does fix the issue:
>>>>
>>>>   https://lava.collabora.co.uk/scheduler/job/3173819
>>>>
>>>> with Ard's fix applied to this test branch:
>>>>
>>>>   
>>>> https://gitlab.collabora.com/gtucker/linux/-/commits/next-20210203-ard-fix/
>>>>
>>>>
>>>> +clang +Nick
>>>>
>>>> It's worth mentioning that the issue only happens with kernels
>>>> built with Clang.  As you can see there are several other arm
>>>> platforms failing with clang-11 builds but booting fine with
>>>> gcc-8:
>>>>
>>>>   
>>>> https://kernelci.org/test/job/next/branch/master/kernel/next-20210203/plan/baseline/
>>>>
>>>> Here's a sample build log:
>>>>
>>>>   
>>>> https://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-33/arm/multi_v7_defconfig/clang-11/build.log
>>>>
>>>> Essentially:
>>>>
>>>>   make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache 
>>>> clang" zImage
>>>>
>>>> I believe it should be using the GNU assembler as LLVM_IAS=1 is
>>>> not defined, but there may be something more subtle about it.
>>>>
>>>
>>>
>>> Do you have a link for a failing zImage built from multi_v7_defconfig?
>>
>> Sure, this one was built from a plain next-20210203:
>>
>>   
>> http://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-33/arm/multi_v7_defconfig/clang-11/zImage
>>
>> You can also find the dtbs, modules and other thin

Re: next/master bisection: baseline.login on rk3288-rock2-square

2021-02-04 Thread Guillaume Tucker
On 04/02/2021 15:42, Ard Biesheuvel wrote:
> On Thu, 4 Feb 2021 at 12:32, Guillaume Tucker
>  wrote:
>>
>> On 04/02/2021 10:33, Guillaume Tucker wrote:
>>> On 04/02/2021 10:27, Ard Biesheuvel wrote:
>>>> On Thu, 4 Feb 2021 at 11:06, Russell King - ARM Linux admin
>>>>  wrote:
>>>>>
>>>>> On Thu, Feb 04, 2021 at 10:07:58AM +0100, Ard Biesheuvel wrote:
>>>>>> On Thu, 4 Feb 2021 at 09:43, Guillaume Tucker
>>>>>>  wrote:
>>>>>>>
>>>>>>> Hi Ard,
>>>>>>>
>>>>>>> Please see the bisection report below about a boot failure on
>>>>>>> rk3288 with next-20210203.  It was also bisected on
>>>>>>> imx6q-var-dt6customboard with next-20210202.
>>>>>>>
>>>>>>> Reports aren't automatically sent to the public while we're
>>>>>>> trialing new bisection features on kernelci.org but this one
>>>>>>> looks valid.
>>>>>>>
>>>>>>> The kernel is most likely crashing very early on, so there's
>>>>>>> nothing in the logs.  Please let us know if you need some help
>>>>>>> with debugging or trying a fix on these platforms.
>>>>>>>
>>>>>>
>>>>>> Thanks for the report.
>>>>>
>>>>> Ard,
>>>>>
>>>>> I want to send my fixes branch today which includes your regression
>>>>> fix that caused this regression.
>>>>>
>>>>> As this is proving difficult to fix, I can only drop your fix from
>>>>> my fixes branch - and given that this seems to be problematical, I'm
>>>>> tempted to revert the original change at this point which should fix
>>>>> both of these regressions - and then we have another go at getting rid
>>>>> of the set/way instructions during the next cycle.
>>>>>
>>>>> Thoughts?
>>>>>
>>>>
>>>> Hi Russell,
>>>>
>>>> If Guillaume is willing to do the experiment, and it fixes the issue,
>>>
>>> Yes, I'm running some tests with that fix now and should have
>>> some results shortly.
>>
>> Yes it does fix the issue:
>>
>>   https://lava.collabora.co.uk/scheduler/job/3173819
>>
>> with Ard's fix applied to this test branch:
>>
>>   https://gitlab.collabora.com/gtucker/linux/-/commits/next-20210203-ard-fix/
>>
>>
>> +clang +Nick
>>
>> It's worth mentioning that the issue only happens with kernels
>> built with Clang.  As you can see there are several other arm
>> platforms failing with clang-11 builds but booting fine with
>> gcc-8:
>>
>>   
>> https://kernelci.org/test/job/next/branch/master/kernel/next-20210203/plan/baseline/
>>
>> Here's a sample build log:
>>
>>   
>> https://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-33/arm/multi_v7_defconfig/clang-11/build.log
>>
>> Essentially:
>>
>>   make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache 
>> clang" zImage
>>
>> I believe it should be using the GNU assembler as LLVM_IAS=1 is
>> not defined, but there may be something more subtle about it.
>>
> 
> 
> Do you have a link for a failing zImage built from multi_v7_defconfig?

Sure, this one was built from a plain next-20210203:

  http://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-33/arm/multi_v7_defconfig/clang-11/zImage

You can also find the dtbs, modules and other things in that same
directory.

For the record, here's the test job that used it:

  https://lava.collabora.co.uk/scheduler/job/3173792

Guillaume


Re: next/master bisection: baseline.login on rk3288-rock2-square

2021-02-04 Thread Guillaume Tucker
On 04/02/2021 10:33, Guillaume Tucker wrote:
> On 04/02/2021 10:27, Ard Biesheuvel wrote:
>> On Thu, 4 Feb 2021 at 11:06, Russell King - ARM Linux admin
>>  wrote:
>>>
>>> On Thu, Feb 04, 2021 at 10:07:58AM +0100, Ard Biesheuvel wrote:
>>>> On Thu, 4 Feb 2021 at 09:43, Guillaume Tucker
>>>>  wrote:
>>>>>
>>>>> Hi Ard,
>>>>>
>>>>> Please see the bisection report below about a boot failure on
>>>>> rk3288 with next-20210203.  It was also bisected on
>>>>> imx6q-var-dt6customboard with next-20210202.
>>>>>
>>>>> Reports aren't automatically sent to the public while we're
>>>>> trialing new bisection features on kernelci.org but this one
>>>>> looks valid.
>>>>>
>>>>> The kernel is most likely crashing very early on, so there's
>>>>> nothing in the logs.  Please let us know if you need some help
>>>>> with debugging or trying a fix on these platforms.
>>>>>
>>>>
>>>> Thanks for the report.
>>>
>>> Ard,
>>>
>>> I want to send my fixes branch today which includes your regression
>>> fix that caused this regression.
>>>
>>> As this is proving difficult to fix, I can only drop your fix from
>>> my fixes branch - and given that this seems to be problematical, I'm
>>> tempted to revert the original change at this point which should fix
>>> both of these regressions - and then we have another go at getting rid
>>> of the set/way instructions during the next cycle.
>>>
>>> Thoughts?
>>>
>>
>> Hi Russell,
>>
>> If Guillaume is willing to do the experiment, and it fixes the issue,
> 
> Yes, I'm running some tests with that fix now and should have
> some results shortly.

Yes it does fix the issue:

  https://lava.collabora.co.uk/scheduler/job/3173819

with Ard's fix applied to this test branch:

  https://gitlab.collabora.com/gtucker/linux/-/commits/next-20210203-ard-fix/


+clang +Nick

It's worth mentioning that the issue only happens with kernels
built with Clang.  As you can see there are several other arm
platforms failing with clang-11 builds but booting fine with
gcc-8:

  https://kernelci.org/test/job/next/branch/master/kernel/next-20210203/plan/baseline/

Here's a sample build log:

  https://storage.staging.kernelci.org/gtucker/next-20210203-ard-fix/v5.10-rc4-24722-g58b6c0e507b7-gtucker_single-staging-33/arm/multi_v7_defconfig/clang-11/build.log

Essentially:

  make -j18 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- LLVM=1 CC="ccache clang" zImage

I believe it should be using the GNU assembler as LLVM_IAS=1 is
not defined, but there may be something more subtle about it.

Thanks,
Guillaume


>> it proves that rk3288 is relying on the flush before the MMU is
>> disabled, and so in that case, the fix is trivial, and we can just
>> apply it.
>>
>> If the experiment fails (which would mean rk3288 does not tolerate the
>> cache maintenance being performed after cache off), it is going to be
>> hairy, and so it will definitely take more time.
>>
>> So in the latter case (or if Guillaume does not get back to us), I
>> think reverting my queued fix is the only sane option. But in that
>> case, may I suggest that we queue the revert of the original by-VA
>> change for v5.12 so it gets lots of coverage in -next, and allows us
>> an opportunity to come up with a proper fix in the same timeframe, and
>> backport the revert and the subsequent fix as a pair? Otherwise, we'll
>> end up in the situation where v5.10.x until today has by-va, v5.10.x-y
>> has set/way, and v5.10y+ has by-va again. (I don't think we care about
>> anything before that, given that v5.4 predates any of this)
>>
>> But in the end, I'm happy to go along with whatever works best for you.
> 
> Thanks,
> Guillaume
> 



Re: [PATCH RESEND v2 4/5] iommu/tegra-smmu: Rework tegra_smmu_probe_device()

2021-02-04 Thread Guillaume Tucker
Hi Nicolin,

A regression was detected by kernelci.org in IGT's drm_read tests
on mainline; it was first seen on 17th December 2020.  You can
find some details here:

  https://kernelci.org/test/case/id/600b82dc1e3208f123d3dffc/

Then an automated bisection was run and it landed on this
patch (v5.10-rc3-4-g25938c73cd79 on mainline).  Normally, an
email is generated automatically but I had to start this one by
hand as there were issues getting it to complete.

You can see the failing test cases with this patch:

  https://lava.collabora.co.uk/results/3126405/0_igt-kms-tegra

Some errors are seen around this point in the log:

  https://lava.collabora.co.uk/scheduler/job/3126405#L1005

[3.029729] tegra-mc 70019000.memory-controller: display0a: read @0xfe00: EMEM address decode error (SMMU translation error [--S])
[3.042058] tegra-mc 70019000.memory-controller: display0a: read @0xfe00: Page fault (SMMU translation error [--S])


Here's the same test passing with this patch reverted:

  https://lava.collabora.co.uk/results/3126570/0_igt-kms-tegra
  

For completeness, you can see all the test jobs run by the
automated bisection here:

  https://lava.collabora.co.uk/scheduler/device_type/tegra124-nyan-big?dt_length=25_search=bisection-gtucker-12#dt_


Please let us know if you need any help debugging this issue or
to try a fix on this platform.

Best wishes,
Guillaume

On 25/11/2020 10:10, Nicolin Chen wrote:
> The bus_set_iommu() in tegra_smmu_probe() enumerates all clients
> to call in tegra_smmu_probe_device() where each client searches
> its DT node for smmu pointer and swgroup ID, so as to configure
> an fwspec. But this requires a valid smmu pointer even before mc
> and smmu drivers are probed. So in tegra_smmu_probe() we added a
> line of code to fill mc->smmu, marking "a bit of a hack".
> 
> This works for most of clients in the DTB, however, doesn't work
> for a client that doesn't exist in DTB, a PCI device for example.
> 
> Actually, if we return ERR_PTR(-ENODEV) in ->probe_device() when
> it's called from bus_set_iommu(), iommu core will let everything
> carry on. Then when a client gets probed, of_iommu_configure() in
> iommu core will search DTB for swgroup ID and call ->of_xlate()
> to prepare an fwspec, similar to tegra_smmu_probe_device() and
> tegra_smmu_configure(). Then it'll call tegra_smmu_probe_device()
> again, and this time we shall return smmu->iommu pointer properly.
> 
> So we can get rid of tegra_smmu_find() and tegra_smmu_configure()
> along with DT polling code by letting the iommu core handle every
> thing, except a problem that we search iommus property in DTB not
> only for swgroup ID but also for mc node to get mc->smmu pointer
> to call dev_iommu_priv_set() and return the smmu->iommu pointer.
> So we'll need to find another way to get smmu pointer.
> 
> Referencing the implementation of sun50i-iommu driver, of_xlate()
> has client's dev pointer, mc node and swgroup ID. This means that
> we can call dev_iommu_priv_set() in of_xlate() instead, so we can
> simply get smmu pointer in ->probe_device().
> 
> This patch reworks tegra_smmu_probe_device() by:
> 1) Removing mc->smmu hack in tegra_smmu_probe() so as to return
>    ERR_PTR(-ENODEV) in tegra_smmu_probe_device() during stage of
>    tegra_smmu_probe/tegra_mc_probe().
> 2) Moving dev_iommu_priv_set() to of_xlate() so we can get smmu
>    pointer in tegra_smmu_probe_device() to replace DTB polling.
> 3) Removing tegra_smmu_configure() accordingly since iommu core
>    takes care of it.
> 
> This also fixes a problem that previously we could add clients to
> iommu groups before iommu core initializes its default domain:
> ubuntu@jetson:~$ dmesg | grep iommu
> platform 5000.host1x: Adding to iommu group 1
> platform 5700.gpu: Adding to iommu group 2
> iommu: Default domain type: Translated
> platform 5420.dc: Adding to iommu group 3
> platform 5424.dc: Adding to iommu group 3
> platform 5434.vic: Adding to iommu group 4
> 
> Though it works fine with IOMMU_DOMAIN_UNMANAGED, but will have
> warnings if switching to IOMMU_DOMAIN_DMA:
> iommu: Failed to allocate default IOMMU domain of type 0 for
>    group (null) - Falling back to IOMMU_DOMAIN_DMA
> iommu: Failed to allocate default IOMMU domain of type 0 for
>    group (null) - Falling back to IOMMU_DOMAIN_DMA
> 
> Now, bypassing the first probe_device() call from bus_set_iommu()
> fixes the sequence:
> ubuntu@jetson:~$ dmesg | grep iommu
> iommu: Default domain type: Translated
> tegra-host1x 5000.host1x: Adding to iommu group 0
> tegra-dc 5420.dc: Adding to iommu group 1
> tegra-dc 5424.dc: Adding to iommu group 1
> tegra-vic 5434.vic: Adding to iommu group 2
> nouveau 5700.gpu: Adding to iommu group 3
> 
> Note that dmesg log above is testing with IOMMU_DOMAIN_UNMANAGED.
> 
> Reviewed-by: Dmitry Osipenko 
> Tested-by: Dmitry Osipenko 
> 

Re: next/master bisection: baseline.login on rk3288-rock2-square

2021-02-04 Thread Guillaume Tucker
On 04/02/2021 10:27, Ard Biesheuvel wrote:
> On Thu, 4 Feb 2021 at 11:06, Russell King - ARM Linux admin
>  wrote:
>>
>> On Thu, Feb 04, 2021 at 10:07:58AM +0100, Ard Biesheuvel wrote:
>>> On Thu, 4 Feb 2021 at 09:43, Guillaume Tucker
>>>  wrote:
>>>>
>>>> Hi Ard,
>>>>
>>>> Please see the bisection report below about a boot failure on
>>>> rk3288 with next-20210203.  It was also bisected on
>>>> imx6q-var-dt6customboard with next-20210202.
>>>>
>>>> Reports aren't automatically sent to the public while we're
>>>> trialing new bisection features on kernelci.org but this one
>>>> looks valid.
>>>>
>>>> The kernel is most likely crashing very early on, so there's
>>>> nothing in the logs.  Please let us know if you need some help
>>>> with debugging or trying a fix on these platforms.
>>>>
>>>
>>> Thanks for the report.
>>
>> Ard,
>>
>> I want to send my fixes branch today which includes your regression
>> fix that caused this regression.
>>
>> As this is proving difficult to fix, I can only drop your fix from
>> my fixes branch - and given that this seems to be problematical, I'm
>> tempted to revert the original change at this point which should fix
>> both of these regressions - and then we have another go at getting rid
>> of the set/way instructions during the next cycle.
>>
>> Thoughts?
>>
> 
> Hi Russell,
> 
> If Guillaume is willing to do the experiment, and it fixes the issue,

Yes, I'm running some tests with that fix now and should have
some results shortly.

> it proves that rk3288 is relying on the flush before the MMU is
> disabled, and so in that case, the fix is trivial, and we can just
> apply it.
> 
> If the experiment fails (which would mean rk3288 does not tolerate the
> cache maintenance being performed after cache off), it is going to be
> hairy, and so it will definitely take more time.
> 
> So in the latter case (or if Guillaume does not get back to us), I
> think reverting my queued fix is the only sane option. But in that
> case, may I suggest that we queue the revert of the original by-VA
> change for v5.12 so it gets lots of coverage in -next, and allows us
> an opportunity to come up with a proper fix in the same timeframe, and
> backport the revert and the subsequent fix as a pair? Otherwise, we'll
> end up in the situation where v5.10.x until today has by-va, v5.10.x-y
> has set/way, and v5.10y+ has by-va again. (I don't think we care about
> anything before that, given that v5.4 predates any of this)
> 
> But in the end, I'm happy to go along with whatever works best for you.

Thanks,
Guillaume


Re: next/master bisection: baseline.login on sun50i-h5-libretech-all-h3-cc

2021-02-04 Thread Guillaume Tucker
Hi Samuel,

Please see the bisection report below about a boot failure on
sun50i-h5-libretech-all-h3-cc with next-20210203.

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org but this one
looks valid.

The kernel is most likely crashing very early on, so there's
nothing in the logs.  Please let us know if you need some help
with debugging or trying a fix on these platforms.

Best wishes,
Guillaume
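
For readers unfamiliar with how a report like the one quoted below is
produced: a bisection is essentially a binary search over the commits
between a known-good and a known-bad revision.  The following is only a
minimal sketch in Python, not the kernelci.org implementation; the
function name, the commit labels and the boots() callback are all
hypothetical, and it assumes every commit after the breaking one fails
too.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], boots: Callable[[str], bool]) -> str:
    """Return the first commit for which boots() is False.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the breaking one also fails.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if boots(commits[mid]):
            good = mid      # failure was introduced later
        else:
            bad = mid       # failure was introduced here or earlier
    return commits[bad]


# Hypothetical history in which commit "c5" introduces the boot failure.
history = [f"c{i}" for i in range(10)]
culprit = first_bad_commit(history, lambda c: int(c[1:]) < 5)
print(culprit)
```

Each step halves the remaining range, so a linux-next range of a few
thousand commits needs only a dozen or so boot tests.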


On 03/02/2021 23:49, KernelCI bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has      *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.      *
> *                                                               *
> * If you do send a fix, please include this trailer:            *
> *   Reported-by: "kernelci.org bot"                             *
> *                                                               *
> * Hope this helps!                                              *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> next/master bisection: baseline.login on sun50i-h5-libretech-all-h3-cc
> 
> Summary:
>   Start:      58b6c0e507b7 Add linux-next specific files for 20210203
>   Plain log:  https://storage.kernelci.org/next/master/next-20210203/arm64/defconfig/gcc-8/lab-baylibre/baseline-sun50i-h5-libretech-all-h3-cc.txt
>   HTML log:   https://storage.kernelci.org/next/master/next-20210203/arm64/defconfig/gcc-8/lab-baylibre/baseline-sun50i-h5-libretech-all-h3-cc.html
>   Result:     7240f6156428 ARM: dts: sunxi: Move wakeup-capable IRQs to r_intc
> 
> Checks:
>   revert:     PASS
>   verify:     PASS
> 
> Parameters:
>   Tree:       next
>   URL:        https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>   Branch:     master
>   Target:     sun50i-h5-libretech-all-h3-cc
>   CPU arch:   arm64
>   Lab:        lab-baylibre
>   Compiler:   gcc-8
>   Config:     defconfig
>   Test case:  baseline.login
> 
> Breaking commit found:
> 
> ---
> commit 7240f6156428fd61a9b681db71cc288848dd04d7
> Author: Samuel Holland 
> Date:   Sun Jan 17 23:50:38 2021 -0600
> 
> ARM: dts: sunxi: Move wakeup-capable IRQs to r_intc
> 
> All IRQs that can be used to wake up the system must be routed through
> r_intc, so they are visible to firmware while the system is suspended.
> 
> In addition to the external NMI input, which is already routed through
> r_intc, these include PIO and R_PIO (gpio-keys), the LRADC, and the RTC.
> 
> Acked-by: Maxime Ripard 
> Signed-off-by: Samuel Holland 
> Signed-off-by: Chen-Yu Tsai 
> 
> diff --git a/arch/arm/boot/dts/sun6i-a31.dtsi b/arch/arm/boot/dts/sun6i-a31.dtsi
> index 9532331af8ef..a31f9072bf79 100644
> --- a/arch/arm/boot/dts/sun6i-a31.dtsi
> +++ b/arch/arm/boot/dts/sun6i-a31.dtsi
> @@ -611,6 +611,7 @@
>   pio: pinctrl@1c20800 {
>   compatible = "allwinner,sun6i-a31-pinctrl";
>   reg = <0x01c20800 0x400>;
> + interrupt-parent = <_intc>;
>   interrupts = ,
>,
>,
> @@ -802,6 +803,7 @@
>   lradc: lradc@1c22800 {
>   compatible = "allwinner,sun4i-a10-lradc-keys";
>   reg = <0x01c22800 0x100>;
> + interrupt-parent = <_intc>;
>   interrupts = ;
>   status = "disabled";
>   };
> @@ -1299,6 +1301,7 @@
>   #clock-cells = <1>;
>   compatible = "allwinner,sun6i-a31-rtc";
>   reg = <0x01f0 0x54>;
> + interrupt-parent = <_intc>;
>   interrupts = ,
>;
>   clocks = <>;
> @@ -1383,6 +1386,7 @@
>   r_pio: pinctrl@1f02c00 {
>   compatible = "allwinner,sun6i-a31-r-pinctrl";
>   reg = <0x01f02c00 0x400>;
> + interrupt-parent = <_intc>;
>   interrupts = ,
>;
>   clocks = <_gates 0>, <>, < 0>;
> diff --git a/arch/arm/boot/dts/sun8i-a23-a33.dtsi b/arch/arm/boot/dts/sun8i-a23-a33.dtsi
> index a84c90a660ca..4461d5098b20 100644
> --- a/arch/arm/boot/dts/sun8i-a23-a33.dtsi
> +++ b/arch/arm/boot/dts/sun8i-a23-a33.dtsi
> @@ -338,6 +338,7 @@
>   pio: pinctrl@1c20800 {
>   /* compatible gets set in SoC specific dtsi file */
>   reg = <0x01c20800 0x400>;
> + interrupt-parent = <_intc>;
>   /* 

Re: next/master bisection: baseline.login on rk3288-rock2-square

2021-02-04 Thread Guillaume Tucker
Hi Ard,

Please see the bisection report below about a boot failure on
rk3288 with next-20210203.  It was also bisected on
imx6q-var-dt6customboard with next-20210202.

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org but this one
looks valid.

The kernel is most likely crashing very early on, so there's
nothing in the logs.  Please let us know if you need some help
with debugging or trying a fix on these platforms.

Best wishes,
Guillaume


On 04/02/2021 04:25, KernelCI bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has      *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.      *
> *                                                               *
> * If you do send a fix, please include this trailer:            *
> *   Reported-by: "kernelci.org bot"                             *
> *                                                               *
> * Hope this helps!                                              *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> next/master bisection: baseline.login on rk3288-rock2-square
> 
> Summary:
>   Start:      58b6c0e507b7 Add linux-next specific files for 20210203
>   Plain log:  https://storage.kernelci.org/next/master/next-20210203/arm/multi_v7_defconfig/clang-11/lab-collabora/baseline-rk3288-rock2-square.txt
>   HTML log:   https://storage.kernelci.org/next/master/next-20210203/arm/multi_v7_defconfig/clang-11/lab-collabora/baseline-rk3288-rock2-square.html
>   Result:     5a29552af92d ARM: 9052/1: decompressor: cover BSS in cache clean and reorder with MMU disable on v7
> 
> Checks:
>   revert:     PASS
>   verify:     PASS
> 
> Parameters:
>   Tree:       next
>   URL:        https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>   Branch:     master
>   Target:     rk3288-rock2-square
>   CPU arch:   arm
>   Lab:        lab-collabora
>   Compiler:   clang-11
>   Config:     multi_v7_defconfig
>   Test case:  baseline.login
> 
> Breaking commit found:
> 
> ---
> commit 5a29552af92dbd62c2b6fd1cddf7dad1ef7555b2
> Author: Ard Biesheuvel 
> Date:   Sun Jan 24 18:03:45 2021 +0100
> 
> ARM: 9052/1: decompressor: cover BSS in cache clean and reorder with MMU disable on v7
> 
> Commit 401b368caaec ("ARM: decompressor: switch to by-VA cache maintenance
> for v7 cores") replaced the by-set/way cache maintenance in the decompressor
> with by-VA cache maintenance, which is more appropriate for the task at
> hand, especially under virtualization on hosts with non-architected system
> caches that are not affected by by-set/way maintenance at all.
> 
> On such systems, that commit inadvertently removed the cache clean and
> invalidate of all of the guest's memory that is performed by KVM on behalf
> of the guest after its MMU is disabled (but only if any by-set/way cache
> maintenance instructions were issued first). This resulted in various
> erroneous behaviors observed by Russell, all involving the mini-stack
> used by the core kernel's v7 boot code, and which resides in BSS. It
> seems intractable to figure out exactly what goes wrong in each of these
> cases, but some small experiments did suggest that the lack of a cache
> clean and invalidate *after* disabling the MMU and caches is what
> triggers the errors, presumably because cachelines are being allocated
> or reallocated while the first cache clean and invalidate is in progress.
> 
> To ensure that no cache lines cover any of the data that is accessed by
> the booting kernel with the MMU off, include the uncompressed kernel's
> BSS region in the cache clean operation.
> 
> Also, to ensure that no cachelines are allocated while the cache is being
> cleaned, perform the cache clean operation *after* disabling the MMU and
> caches when running on v7 or later, by making a tail call to the clean
> routine from the cache_off routine. This requires passing the VA range
> to cache_off(), which means some care needs to be taken to preserve
> R0 and R1 across the call to cache_off().
> 
> Since this makes the first cache clean redundant, call it with the
> range reduced to zero. This only affects v7, as all other versions
> ignore R0/R1 entirely.
> 
> Link: https://lore.kernel.org/linux-arm-kernel/20210122152012.30075-1-a...@kernel.org
> 
> Fixes: 401b368caaec ("ARM: decompressor: switch to by-VA cache maintenance for v7 cores")
> Reported-by: Russell King 
> Signed-off-by: Ard Biesheuvel 
> Signed-off-by: Russell King 
> 
> diff --git 

Re: next/master bisection: baseline.bootrr.clk-mt8173-mm-probed on mt8173-elm-hana

2021-01-12 Thread Guillaume Tucker
Hi Saravana,

Please see the bisection report below about the clk-mt8173-mm and
mtk-mmsys drivers failing to probe on mt8173-elm-hana with
next-20210111.

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org but this one
looks valid.

Some details can be found here:

  https://kernelci.org/test/plan/id/5ffbeef7f97770d9ccc94cf0/

The bisection was run with CONFIG_ARM64_64K_PAGES=y and only
against the clk-mt8173-mm test case, but it can be reproduced
with a plain arm64 defconfig and the mtk-mmsys driver is also
confirmed to be failing to probe with this patch.

Your commit message acknowledges the fact that it might "break"
some drivers, or rather that some drivers might need to be fixed:

> If this patch prevents some devices from probing, it's very likely due
> to the system having one or more device drivers that "probe"/set up a
> device (DT node with compatible property) without creating a struct
> device for it. [...]

It sounds like this is what needs to be done here, so I've also
put some MediaTek maintainers on CC.

Thanks,
Guillaume
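
To illustrate the failure mode described in the commit message quoted
above, here is a toy model in Python of dependency-ordered probing.
It is a sketch under simplifying assumptions, not the driver core
logic: the node names are hypothetical, and the only rule modelled is
that a consumer probes once all of its suppliers have probed, while a
supplier that never gets a struct device can never be seen as probed.

```python
def probe_all(suppliers, has_struct_device):
    """Return the set of nodes that end up probed.

    suppliers: dict mapping each node to the list of nodes it depends on.
    has_struct_device: set of nodes for which a struct device exists.
    A node can probe only if it has a struct device and all of its
    suppliers have already probed.
    """
    probed = set()
    progress = True
    while progress:
        progress = False
        for node, deps in suppliers.items():
            if node in probed or node not in has_struct_device:
                continue
            if all(dep in probed for dep in deps):
                probed.add(node)
                progress = True
    return probed


# Hypothetical topology: "consumer-b" depends on "consumer-a", which in
# turn depends on "supplier".
topology = {
    "supplier": [],
    "consumer-a": ["supplier"],
    "consumer-b": ["consumer-a"],
}

# Every node has a struct device: everything probes.
all_present = probe_all(topology, {"supplier", "consumer-a", "consumer-b"})
print(sorted(all_present))

# "supplier" is set up without a struct device: its consumers wait forever.
supplier_missing = probe_all(topology, {"consumer-a", "consumer-b"})
print(sorted(supplier_missing))
```

In this model the second case leaves every consumer unprobed, which
matches the kind of symptom reported here, although the actual
dependency chain on mt8173 has not been confirmed.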

On 11/01/2021 18:24, KernelCI bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has      *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.      *
> *                                                               *
> * If you do send a fix, please include this trailer:            *
> *   Reported-by: "kernelci.org bot"                             *
> *                                                               *
> * Hope this helps!                                              *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> next/master bisection: baseline.bootrr.clk-mt8173-mm-probed on mt8173-elm-hana
> 
> Summary:
>   Start:      ef8b014ee4a1 Add linux-next specific files for 20210111
>   Plain log:  https://storage.kernelci.org/next/master/next-20210111/arm64/defconfig+CONFIG_ARM64_64K_PAGES=y/clang-11/lab-collabora/baseline-mt8173-elm-hana.txt
>   HTML log:   https://storage.kernelci.org/next/master/next-20210111/arm64/defconfig+CONFIG_ARM64_64K_PAGES=y/clang-11/lab-collabora/baseline-mt8173-elm-hana.html
>   Result:     e590474768f1 driver core: Set fw_devlink=on by default
> 
> Checks:
>   revert:     PASS
>   verify:     PASS
> 
> Parameters:
>   Tree:       next
>   URL:        https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>   Branch:     master
>   Target:     mt8173-elm-hana
>   CPU arch:   arm64
>   Lab:        lab-collabora
>   Compiler:   clang-11
>   Config:     defconfig+CONFIG_ARM64_64K_PAGES=y
>   Test case:  baseline.bootrr.clk-mt8173-mm-probed
> 
> Breaking commit found:
> 
> ---
> commit e590474768f1cc04852190b61dec692411b22e2a
> Author: Saravana Kannan 
> Date:   Thu Dec 17 19:17:03 2020 -0800
> 
> driver core: Set fw_devlink=on by default
> 
> Cyclic dependencies in some firmware was one of the last remaining
> reasons fw_devlink=on couldn't be set by default. Now that cyclic
> dependencies don't block probing, set fw_devlink=on by default.
> 
> Setting fw_devlink=on by default brings a bunch of benefits (currently,
> only for systems with device tree firmware):
> * Significantly cuts down deferred probes.
> * Device probe is effectively attempted in graph order.
> * Makes it much easier to load drivers as modules without having to
>   worry about functional dependencies between modules (depmod is still
>   needed for symbol dependencies).
> 
> If this patch prevents some devices from probing, it's very likely due
> to the system having one or more device drivers that "probe"/set up a
> device (DT node with compatible property) without creating a struct
> device for it.  If we hit such cases, the device drivers need to be
> fixed so that they populate struct devices and probe them like normal
> device drivers so that the driver core is aware of the devices and their
> status. See [1] for an example of such a case.
> 
> [1] - https://lore.kernel.org/lkml/CAGETcx9PiX==mlxb9po8myyk6u2vhpvwtmsa5nkd-ywh5xh...@mail.gmail.com/
> Signed-off-by: Saravana Kannan
> Link: https://lore.kernel.org/r/20201218031703.3053753-6-sarava...@google.com
> Signed-off-by: Greg Kroah-Hartman 
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 4e15193aafad..e61e62b624ce 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -1457,7 +1457,7 @@ static void device_links_purge(struct device *dev)
>  #define FW_DEVLINK_FLAGS_RPM (FW_DEVLINK_FLAGS_ON | \
>DL_FLAG_PM_RUNTIME)
>  

Re: pmwg/integ bisection: baseline.login on rk3328-rock64

2021-01-12 Thread Guillaume Tucker
Hi Vincent,

Please see the bisection report below about a boot failure on
rk3328-rock64 with the pmwg/integ branch.

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org but this one
looks valid.

There's nothing in the serial console log, probably because it's
crashing too early during boot.

Some details can be found here:

  https://kernelci.org/test/case/id/5ffb978de38e717501c94cd8/

The bisection was run with CONFIG_RANDOMIZE_BASE=y enabled, but
the same issue occurs with a plain defconfig from that branch.
Results with other configs and platforms can be compared here:

  
https://kernelci.org/test/job/pmwg/branch/integ/kernel/v5.11-rc3-13-gcea05edf93998/plan/baseline/

Please let us know if you need some help to test a fix or debug
the issue.

Thanks,
Guillaume


On 11/01/2021 05:36, KernelCI bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has      *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.      *
> *                                                               *
> * If you do send a fix, please include this trailer:            *
> *   Reported-by: "kernelci.org bot"                             *
> *                                                               *
> * Hope this helps!                                              *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> pmwg/integ bisection: baseline.login on rk3328-rock64
> 
> Summary:
>   Start:      cea05edf9399 Merge remote-tracking branch 'georgi.db845c/db845c-fixes' into integ
>   Plain log:  https://storage.kernelci.org/pmwg/integ/v5.11-rc3-13-gcea05edf93998/arm64/defconfig+CONFIG_RANDOMIZE_BASE=y/gcc-8/lab-baylibre/baseline-rk3328-rock64.txt
>   HTML log:   https://storage.kernelci.org/pmwg/integ/v5.11-rc3-13-gcea05edf93998/arm64/defconfig+CONFIG_RANDOMIZE_BASE=y/gcc-8/lab-baylibre/baseline-rk3328-rock64.html
>   Result:     31379ec3d17b arm64/hikey: update defconfig
> 
> Checks:
>   revert:     PASS
>   verify:     PASS
> 
> Parameters:
>   Tree:       pmwg
>   URL:        https://git.linaro.org/power/linux.git
>   Branch:     integ
>   Target:     rk3328-rock64
>   CPU arch:   arm64
>   Lab:        lab-baylibre
>   Compiler:   gcc-8
>   Config:     defconfig+CONFIG_RANDOMIZE_BASE=y
>   Test case:  baseline.login
> 
> Breaking commit found:
> 
> ---
> commit 31379ec3d17bf215585f1bac15eff77351830d37
> Author: Vincent Guittot 
> Date:   Tue Nov 17 10:02:58 2020 +0100
> 
> arm64/hikey: update defconfig
> 
> Signed-off-by: Vincent Guittot 
> 
> diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> index 5cfe3cf6f2ac..4d2e85c7f96b 100644
> --- a/arch/arm64/configs/defconfig
> +++ b/arch/arm64/configs/defconfig
> @@ -12,9 +12,11 @@ CONFIG_TASK_IO_ACCOUNTING=y
>  CONFIG_IKCONFIG=y
>  CONFIG_IKCONFIG_PROC=y
>  CONFIG_NUMA_BALANCING=y
> +CONFIG_CGROUPS=y
>  CONFIG_MEMCG=y
>  CONFIG_MEMCG_SWAP=y
>  CONFIG_BLK_CGROUP=y
> +CONFIG_CGROUP_SCHED=y
>  CONFIG_CGROUP_PIDS=y
>  CONFIG_CGROUP_HUGETLB=y
>  CONFIG_CPUSETS=y
> @@ -22,7 +24,6 @@ CONFIG_CGROUP_DEVICE=y
>  CONFIG_CGROUP_CPUACCT=y
>  CONFIG_CGROUP_PERF=y
>  CONFIG_USER_NS=y
> -CONFIG_SCHED_AUTOGROUP=y
>  CONFIG_BLK_DEV_INITRD=y
>  CONFIG_KALLSYMS_ALL=y
>  # CONFIG_COMPAT_BRK is not set
> @@ -83,7 +84,6 @@ CONFIG_CPU_FREQ_GOV_POWERSAVE=m
>  CONFIG_CPU_FREQ_GOV_USERSPACE=y
>  CONFIG_CPU_FREQ_GOV_ONDEMAND=y
>  CONFIG_CPU_FREQ_GOV_CONSERVATIVE=m
> -CONFIG_CPU_FREQ_GOV_SCHEDUTIL=y
>  CONFIG_CPUFREQ_DT=y
>  CONFIG_ACPI_CPPC_CPUFREQ=m
>  CONFIG_ARM_ALLWINNER_SUN50I_CPUFREQ_NVMEM=m
> @@ -264,6 +264,7 @@ CONFIG_VIRTIO_BLK=y
>  CONFIG_BLK_DEV_NVME=m
>  CONFIG_SRAM=y
>  CONFIG_PCI_ENDPOINT_TEST=m
> +CONFIG_HISI_HIKEY_USB=m
>  CONFIG_EEPROM_AT24=m
>  CONFIG_EEPROM_AT25=m
>  CONFIG_UACCE=m
> @@ -768,9 +769,13 @@ CONFIG_USB_CONFIGFS_RNDIS=y
>  CONFIG_USB_CONFIGFS_EEM=y
>  CONFIG_USB_CONFIGFS_MASS_STORAGE=y
>  CONFIG_USB_CONFIGFS_F_FS=y
> +CONFIG_USB_ETH=m
>  CONFIG_TYPEC=m
>  CONFIG_TYPEC_TCPM=m
> +CONFIG_TYPEC_TCPCI=m
> +CONFIG_TYPEC_RT1711H=m
>  CONFIG_TYPEC_FUSB302=m
> +CONFIG_TYPEC_UCSI=m
>  CONFIG_TYPEC_HD3SS3220=m
>  CONFIG_MMC=y
>  CONFIG_MMC_BLOCK_MINORS=32
> @@ -997,6 +1002,7 @@ CONFIG_PHY_XGENE=y
>  CONFIG_PHY_SUN4I_USB=y
>  CONFIG_PHY_MIXEL_MIPI_DPHY=m
>  CONFIG_PHY_HI6220_USB=y
> +CONFIG_PHY_HI3660_USB=m
>  CONFIG_PHY_HISTB_COMBPHY=y
>  CONFIG_PHY_HISI_INNO_USB2=y
>  CONFIG_PHY_MVEBU_CP110_COMPHY=y
> @@ -1059,7 +1065,6 @@ CONFIG_CUSE=m
>  CONFIG_OVERLAY_FS=m
>  CONFIG_VFAT_FS=y
>  CONFIG_HUGETLBFS=y
> -CONFIG_CONFIGFS_FS=y
>  CONFIG_EFIVAR_FS=y
>  CONFIG_SQUASHFS=y
>  CONFIG_NFS_FS=y
> @@ -1088,7 +1093,6 @@ CONFIG_DEBUG_INFO=y
>  CONFIG_MAGIC_SYSRQ=y
>  CONFIG_DEBUG_FS=y
>  

Re: kernelci/staging-next bisection: sleep.login on rk3288-rock2-square #2286-staging

2021-01-12 Thread Guillaume Tucker
On 12/01/2021 10:53, Guillaume Tucker wrote:
> On 05/01/2021 09:13, Mike Rapoport wrote:
>> On Sun, Jan 03, 2021 at 03:09:14PM -0500, Andrea Arcangeli wrote:
>>> Hello Mike,
>>>
>>> On Sun, Jan 03, 2021 at 03:47:53PM +0200, Mike Rapoport wrote:
>>>> Thanks for the logs, it seems that implicitly adding reserved regions to
>>>> memblock.memory wasn't that bright idea :)
>>>
>>> Would it be possible to somehow clean up the hack then?
>>>
>>> The only difference between the clean solution and the hack is that
>>> the hack intended to achieve the exact same, but without adding the
>>> reserved regions to memblock.memory.
>>
>> I didn't consider adding reserved regions to memblock.memory as a clean
>> solution, this was still a hack, but I didn't think that things are that
>> fragile.
>>
>> I still think we cannot rely on memblock.reserved to detect
>> memory/zone/node sizes and the boot failure reported here confirms this.
>>  
>>> The comment on that problematic area says the reserved area cannot be
>>> used for DMA because of some unexplained hw issue, and that doing so
>>> prevents booting, but since the area got reserved, even with the clean
>>> solution, it shouldn't have never been used for DMA?
>>>
>>> So I can only imagine that the physical memory region is way more
>>> problematic than just for DMA. It sounds like that anything that
>>> touches it, including the CPU, will hang the system, not just DMA. It
>>> sounds somewhat similar to the other e820 direct mapping issue on x86?
>>
>> My understanding is that the boot failed because when I implicitly added
>> the reserved region to memblock.memory the memory size seen by
>> free_area_init() jumped from 2G to 4G because the reserved area was close
>> to 4G. The very first allocation would get a chunk from slightly below of
>> 4G and as there is no real memory there, the kernel would crash.
>>  
>>> If you want to test the hack on the arm board to check if it boots you
>>> can use the below commit:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/commit/?id=c3ea2633015104ce0df33dcddbc36f57de1392bc
>>
>> My take is your solution would boot with this memory configuration, but I
>> still don't think that using memblock.reserved for zone/node sizing is
>> correct.
> 
> The rk3288 platform has now been failing to boot for nearly a
> month on linux-next:
> 
>   https://kernelci.org/test/case/id/5ffbed0a31ad81239bc94cdb/
> 
> Until a fix or a new version of this patch is made, would it be
> possible to drop it or revert it so the platform becomes usable
> again?
> 
> Or if you want, I can make a cleaned-up version of my hack to
> ignore the problematic region if you still need your patch to be
> on linux-next, but that would probably be less than ideal.

By the way, another bisection found that this commit is also
breaking tegra124-nyan-big but only with both CONFIG_EFI=y and
CONFIG_ARM_LPAE=y enabled:

  https://kernelci.org/test/case/id/5ff6b1e26cf19f3b10c94cc5/

The plain multi_v7_defconfig is booting fine:

  https://kernelci.org/test/plan/id/5ff6b0a1db91b8a2b9c94cba/

I haven't looked into this one or tried to make it boot like
rk3288, but please let me know if there's anything there that can
be done to help.

Thanks,
Guillaume


Re: kernelci/staging-next bisection: sleep.login on rk3288-rock2-square #2286-staging

2021-01-12 Thread Guillaume Tucker
On 05/01/2021 09:13, Mike Rapoport wrote:
> On Sun, Jan 03, 2021 at 03:09:14PM -0500, Andrea Arcangeli wrote:
>> Hello Mike,
>>
>> On Sun, Jan 03, 2021 at 03:47:53PM +0200, Mike Rapoport wrote:
>>> Thanks for the logs, it seems that implicitly adding reserved regions to
>>> memblock.memory wasn't that bright idea :)
>>
>> Would it be possible to somehow clean up the hack then?
>>
>> The only difference between the clean solution and the hack is that
>> the hack intended to achieve the exact same, but without adding the
>> reserved regions to memblock.memory.
> 
> I didn't consider adding reserved regions to memblock.memory as a clean
> solution, this was still a hack, but I didn't think that things are that
> fragile.
> 
> I still think we cannot rely on memblock.reserved to detect
> memory/zone/node sizes and the boot failure reported here confirms this.
>  
>> The comment on that problematic area says the reserved area cannot be
>> used for DMA because of some unexplained hw issue, and that doing so
>> prevents booting, but since the area got reserved, even with the clean
>> solution, it shouldn't have never been used for DMA?
>>
>> So I can only imagine that the physical memory region is way more
>> problematic than just for DMA. It sounds like that anything that
>> touches it, including the CPU, will hang the system, not just DMA. It
>> sounds somewhat similar to the other e820 direct mapping issue on x86?
> 
> My understanding is that the boot failed because when I implicitly added
> the reserved region to memblock.memory the memory size seen by
> free_area_init() jumped from 2G to 4G because the reserved area was close
> to 4G. The very first allocation would get a chunk from slightly below of
> 4G and as there is no real memory there, the kernel would crash.
>  
>> If you want to test the hack on the arm board to check if it boots you
>> can use the below commit:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/andrea/aa.git/commit/?id=c3ea2633015104ce0df33dcddbc36f57de1392bc
> 
> My take is your solution would boot with this memory configuration, but I
> still don't think that using memblock.reserved for zone/node sizing is
> correct.

The rk3288 platform has now been failing to boot for nearly a
month on linux-next:

  https://kernelci.org/test/case/id/5ffbed0a31ad81239bc94cdb/

Until a fix or a new version of this patch is made, would it be
possible to drop it or revert it so the platform becomes usable
again?

Or if you want, I can make a cleaned-up version of my hack to
ignore the problematic region if you still need your patch to be
on linux-next, but that would probably be less than ideal.

Thanks,
Guillaume
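
Mike's explanation of the crash can be made concrete with a toy model
in Python.  This is only a sketch of the arithmetic, not memblock
itself: the region sizes and addresses are hypothetical (2 GiB of RAM
at physical address 0, and a small reserved region just below 4 GiB),
and the helper names are made up for illustration.

```python
GIB = 1 << 30
MIB = 1 << 20


def span_end(regions):
    """Highest physical address covered by a list of (base, size) regions."""
    return max(base + size for base, size in regions)


def is_backed(addr, ram):
    """True if addr falls inside a region that is real RAM."""
    return any(base <= addr < base + size for base, size in ram)


# Hypothetical layout: 2 GiB of RAM at physical address 0 and a small
# reserved region just below 4 GiB that is not backed by RAM.
ram = [(0, 2 * GIB)]
reserved = [(4 * GIB - 64 * MIB, 16 * MIB)]

before = span_end(ram)             # span seen with RAM only
after = span_end(ram + reserved)   # span once reserved is added to memory
print(before / GIB, after / GIB)

# A top-down allocator that trusts the inflated span would start looking
# just below its end, where there is no RAM in this model.
candidate = after - MIB
print(is_backed(candidate, ram))
```

With the reserved region treated as memory, the apparent end of memory
moves from 2 GiB to nearly 4 GiB, and an address taken from just below
that end is not backed by anything.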


Re: kernelci/staging-next bisection: sleep.login on rk3288-rock2-square #2286-staging

2020-12-18 Thread Guillaume Tucker
On 13/12/2020 08:23, Mike Rapoport wrote:
> Hi Guillaume,
> 
> On Fri, Dec 11, 2020 at 09:53:46PM +, Guillaume Tucker wrote:
>> Hi Mike,
>>
>> Please see the bisection report below about a boot failure on
>> rk3288 with next-20201210.
>>
>> Reports aren't automatically sent to the public while we're
>> trialing new bisection features on kernelci.org but this one
>> looks valid.
>>
>> There's nothing in the serial console log, probably because it's
>> crashing too early during boot.  This was confirmed on two rk3288
>> platforms on kernelci.org: rk3288-veyron-jaq and
>> rk3288-rock2-square.  There's no clear sign about other platforms
>> being impacted.
>>
>> If this looks like something you want to investigate but you
>> don't have a platform at hand to reproduce it, please let us know
>> if you would like the test to be re-run on kernelci.org with some
>> debug config turned on, or if you have a fix to try.
> 
> I'd appreciate if you can build a working kernel with
> CONFIG_DEBUG_MEMORY_INIT=y and run it with
> 
>   memblock=debug mminit_loglevel=4
> 
> in the command line.
> 
> If I understand correctly, DEBUG_LL is not an option for these platforms
> so if earlyprintk didn't display the log there is not much to do about
> it.

OK, sorry for the delay.  I've built a kernel and booted it as
you requested, and also found that the issue was due to this
memory area defined in arch/arm/boot/dts/rk3288.dtsi:

    reserved-memory {
        #address-cells = <2>;
        #size-cells = <2>;
        ranges;

        /*
         * The rk3288 cannot use the memory area above 0xfe00
         * for dma operations for some reason. While there is
         * probably a better solution available somewhere, we
         * haven't found it yet and while devices with 2GB of ram
         * are not affected, this issue prevents 4GB from booting.
         * So to make these devices at least bootable, block
         * this area for the time being until the real solution
         * is found.
         */
        dma-unusable@fe00 {
            reg = <0x0 0xfe00 0x0 0x100>;
        };
    };

So I've put a hack[1] on top of 950c37691925 to skip adding a
node in memblock_enforce_memory_reserved_overlap() if the base
address is 0xfe00, which got the kernel booting.  Here's the
console log:

  https://people.collabora.com/~gtucker/tmp/2966825.txt

and the full test job details, if this helps:

  https://lava.collabora.co.uk/scheduler/job/2966825


I haven't really looked much further than that, but I'll be
available on Monday to help run other tests if needed.

Thanks,
Guillaume

[1] https://people.collabora.com/~gtucker/tmp/2966825.patch
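
For reference, with #address-cells = <2> and #size-cells = <2> each
reg entry is four 32-bit cells: two forming a 64-bit base and two
forming a 64-bit size.  The Python sketch below shows that decoding,
plus a filter in the spirit of the hack described above.  It is an
illustration only: the function names are hypothetical, the cell
values are made up rather than taken from rk3288.dtsi, and the real
hack lives in kernel C code (see [1]).

```python
def decode_reg(cells, address_cells, size_cells):
    """Group 32-bit devicetree cells into (base, size) pairs."""
    stride = address_cells + size_cells
    if len(cells) % stride:
        raise ValueError("reg length is not a multiple of the cell stride")

    def join(words):
        # Cells are ordered most significant first.
        value = 0
        for word in words:
            value = (value << 32) | word
        return value

    return [
        (join(cells[i:i + address_cells]),
         join(cells[i + address_cells:i + stride]))
        for i in range(0, len(cells), stride)
    ]


def skip_region(regions, skipped_base):
    """Drop any region that starts at skipped_base."""
    return [r for r in regions if r[0] != skipped_base]


# Hypothetical values, not the ones from rk3288.dtsi.
regions = decode_reg([0x0, 0x80000000, 0x0, 0x00100000,
                      0x1, 0x00000000, 0x0, 0x00200000], 2, 2)
print([(hex(base), hex(size)) for base, size in regions])

remaining = skip_region(regions, 0x80000000)
print([(hex(base), hex(size)) for base, size in remaining])
```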


Re: next/master bisection: baseline.login on ox820-cloudengines-pogoplug-series-3

2020-12-18 Thread Guillaume Tucker
Hi Ard,

Please see the bisection report below about a boot failure on
ox820-cloudengines-pogoplug-series-3.  There was also a bisection
yesterday with next-20201216 which landed on the same commit, on
the same platform and also with oxnas_v6_defconfig.  I'm not
aware of any other platform on kernelci.org showing the same
regression.

Hope this helps!

Best wishes,
Guillaume

On 18/12/2020 10:51, KernelCI bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has      *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.      *
> *                                                               *
> * If you do send a fix, please include this trailer:            *
> *   Reported-by: "kernelci.org bot"                             *
> *                                                               *
> * Hope this helps!                                              *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> next/master bisection: baseline.login on ox820-cloudengines-pogoplug-series-3
> 
> Summary:
>   Start:      90cc8cf2d1ab Add linux-next specific files for 20201217
>   Plain log:  https://storage.kernelci.org/next/master/next-20201217/arm/oxnas_v6_defconfig/gcc-8/lab-baylibre/baseline-ox820-cloudengines-pogoplug-series-3.txt
>   HTML log:   https://storage.kernelci.org/next/master/next-20201217/arm/oxnas_v6_defconfig/gcc-8/lab-baylibre/baseline-ox820-cloudengines-pogoplug-series-3.html
>   Result:     f77ac2e378be ARM: 9030/1: entry: omit FP emulation for UND exceptions taken in kernel mode
> 
> Checks:
>   revert:     PASS
>   verify:     PASS
> 
> Parameters:
>   Tree:       next
>   URL:        https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>   Branch:     master
>   Target:     ox820-cloudengines-pogoplug-series-3
>   CPU arch:   arm
>   Lab:        lab-baylibre
>   Compiler:   gcc-8
>   Config:     oxnas_v6_defconfig
>   Test case:  baseline.login
> 
> Breaking commit found:
> 
> ---
> commit f77ac2e378be9dd61eb88728f0840642f045d9d1
> Author: Ard Biesheuvel 
> Date:   Thu Nov 19 18:09:16 2020 +0100
> 
> ARM: 9030/1: entry: omit FP emulation for UND exceptions taken in kernel mode
> 
> There are a couple of problems with the exception entry code that deals
> with FP exceptions (which are reported as UND exceptions) when building
> the kernel in Thumb2 mode:
> - the conditional branch to vfp_kmode_exception in vfp_support_entry()
>   may be out of range for its target, depending on how the linker decides
>   to arrange the sections;
> - when the UND exception is taken in kernel mode, the emulation handling
>   logic is entered via the 'call_fpe' label, which means we end up using
>   the wrong value/mask pairs to match and detect the NEON opcodes.
> 
> Since UND exceptions in kernel mode are unlikely to occur on a hot path
> (as opposed to the user mode version which is invoked for VFP support
> code and lazy restore), we can use the existing undef hook machinery for
> any kernel mode instruction emulation that is needed, including calling
> the existing vfp_kmode_exception() routine for unexpected cases. So drop
> the call to call_fpe, and instead, install an undef hook that will get
> called for NEON and VFP instructions that trigger an UND exception in
> kernel mode.
> 
> While at it, make sure that the PC correction is accurate for the
> execution mode where the exception was taken, by checking the PSR
> Thumb bit.
> 
> Cc: Dmitry Osipenko 
> Cc: Kees Cook 
> Fixes: eff8728fe698 ("vmlinux.lds.h: Add PGO and AutoFDO input sections")
> Signed-off-by: Ard Biesheuvel 
> Reviewed-by: Linus Walleij 
> Reviewed-by: Nick Desaulniers 
> Signed-off-by: Russell King 
> 
> diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
> index c4220f51fcf3..0ea8529a4872 100644
> --- a/arch/arm/kernel/entry-armv.S
> +++ b/arch/arm/kernel/entry-armv.S
> @@ -252,31 +252,10 @@ __und_svc:
>  #else
>   svc_entry
>  #endif
> - @
> - @ call emulation code, which returns using r9 if it has emulated
> - @ the instruction, or the more conventional lr if we are to treat
> - @ this as a real undefined instruction
> - @
> - @  r0 - instruction
> - @
> -#ifndef CONFIG_THUMB2_KERNEL
> - ldr r0, [r4, #-4]
> -#else
> - mov r1, #2
> - ldrh r0, [r4, #-2]   @ Thumb instruction at LR - 2
> - cmp r0, #0xe800 @ 32-bit instruction if xx >= 0
> - blo __und_svc_fault
> - ldrh r9, [r4] @ bottom 16 bits
> - add r4, r4, 
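
[Editor's sketch, not part of the original thread.]  The commit
message above mentions making the PC correction depend on the PSR
Thumb bit, and the removed assembly tests the first halfword of a
Thumb instruction against 0xe800.  A minimal C model of those two
checks might look as follows.  The function names are hypothetical
and chosen only for illustration; the correction values (2 bytes in
Thumb state, 4 in ARM state) are assumed from the "LR - 2" and
"#-4" offsets visible in the removed code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Thumb state bit in the ARM program status register (bit 5). */
#define PSR_T_BIT 0x00000020u

static_assert(PSR_T_BIT == (1u << 5), "T bit is PSR bit 5");

/*
 * Hypothetical helper: number of bytes to step back from the
 * exception return address to reach the faulting instruction.
 * Assumption: 2 in Thumb state, 4 in ARM state.
 */
static unsigned int und_pc_correction(uint32_t psr)
{
	return (psr & PSR_T_BIT) ? 2u : 4u;
}

/*
 * Hypothetical helper modelling the removed "cmp r0, #0xe800" test:
 * a first halfword of 0xe800 or above starts a 32-bit Thumb-2
 * instruction, anything below is a 16-bit instruction.
 */
static bool thumb_insn_is_32bit(uint16_t first_halfword)
{
	return first_halfword >= 0xe800u;
}
```

This only models the decision logic; the real entry code works on
saved registers in assembly and is not reproduced here.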

Re: linusw/devel bisection: baseline.bootrr.mediatek-mt8173-pinctrl-probed on mt8173-elm-hana

2020-12-16 Thread Guillaume Tucker
On 16/12/2020 12:41, Linus Walleij wrote:
> On Wed, Dec 16, 2020 at 11:10 AM Guillaume Tucker
>  wrote:
> 
>>> It seems we need to teach the core to ignore the name (empty string).
>>
>> OK great, I see you've sent a patch for that.  I'll check if we
>> can confirm it fixes the issue (something I'd like to also
>> automate...).
> 
> Yups would love to hear if this solves it, it should be in today's
> -next.

Yes in fact it appears to be all fixed on your for-next branch:

  https://kernelci.org/test/case/id/5fda32f92738afa48dc94ce1/

Today's linux-next was not tested in the Collabora lab because of
some infrastructure problem, but that's resolved now so it should
be in tomorrow's results.

Best wishes,
Guillaume


Re: linusw/devel bisection: baseline.bootrr.mediatek-mt8173-pinctrl-probed on mt8173-elm-hana

2020-12-16 Thread Guillaume Tucker
On 15/12/2020 12:20, Linus Walleij wrote:
> On Mon, Dec 14, 2020 at 11:28 PM Guillaume Tucker
>  wrote:
> 
>> Please see the bisection report below about the pinctrl driver
>> failing to probe on the arm64 mt8173-elm-hana platform.
> 
> That's an excellent, helpful report which helps a lot!
> Thank you for doing this!

Thanks for the feedback!  Glad this helped.

>> This is the error message:
>>
>> [0.051788] gpio gpiochip0: Detected name collision for GPIO name ''
>> [0.051813] gpio gpiochip0: GPIO name collision on the same chip, this is not allowed, fix all lines on the chip to have unique names
>> [0.051832] gpiochip_add_data_with_key: GPIOs 377..511 (1000b000.pinctrl) failed to register, -17
>> [0.051946] mediatek-mt8173-pinctrl: probe of 1000b000.pinctrl failed with error -22
> 
> It seems we need to teach the core to ignore the name (empty string).

OK great, I see you've sent a patch for that.  I'll check if we
can confirm it fixes the issue (something I'd like to also
automate...).

Best wishes,
Guillaume
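
[Editor's sketch, not part of the original thread.]  The failure
above comes from several lines on one chip sharing the empty name
''.  The idea of "teaching the core to ignore the empty string" can
be illustrated with a small userspace model.  This is not the
gpiolib code or the patch Linus sent; the function name and the
array-of-strings representation are assumptions made for the
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of a per-chip uniqueness check: return true if
 * two lines on the same chip share a name.  NULL and "" both mean
 * "unnamed" and are skipped, so any number of unnamed lines is fine.
 */
static bool chip_has_name_collision(const char *const *names, size_t count)
{
	assert(names != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		if (!names[i] || names[i][0] == '\0')
			continue;	/* unnamed line, never collides */

		for (size_t j = i + 1; j < count; j++) {
			if (names[j] && strcmp(names[i], names[j]) == 0)
				return true;
		}
	}
	return false;
}
```

With this rule a chip whose lines are mostly unnamed registers
normally, while two lines both called "reset" would still be
rejected.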


Re: linusw/devel bisection: baseline.bootrr.mediatek-mt8173-pinctrl-probed on mt8173-elm-hana

2020-12-14 Thread Guillaume Tucker
Hi Linus,

Please see the bisection report below about the pinctrl driver
failing to probe on the arm64 mt8173-elm-hana platform.

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org but this one
looks valid.

This is the error message:

[0.051788] gpio gpiochip0: Detected name collision for GPIO name ''
[0.051813] gpio gpiochip0: GPIO name collision on the same chip, this is not allowed, fix all lines on the chip to have unique names
[0.051832] gpiochip_add_data_with_key: GPIOs 377..511 (1000b000.pinctrl) failed to register, -17
[0.051946] mediatek-mt8173-pinctrl: probe of 1000b000.pinctrl failed with error -22

and the full log:

  https://storage.kernelci.org/linusw/devel/v5.10-rc4-91-g65efb43ac94b/arm64/defconfig/gcc-8/lab-collabora/baseline-mt8173-elm-hana.html#L492

I guess some GPIO now needs to be renamed following your patch
which enforces uniqueness, so it's not a problem with the patch
per se.  As I'm not sure if it's something you would want to fix
yourself, I've also CC-ed MediaTek and others such as Enric who
knows about this platform and helped enable the test in KernelCI.

Best wishes,
Guillaume

On 14/12/2020 13:47, KernelCI bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has      *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.      *
> *                                                               *
> * If you do send a fix, please include this trailer:            *
> *   Reported-by: "kernelci.org bot"                             *
> *                                                               *
> * Hope this helps!                                              *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> linusw/devel bisection: baseline.bootrr.mediatek-mt8173-pinctrl-probed on mt8173-elm-hana
> 
> Summary:
>   Start:      65efb43ac94b gpiolib: Disallow identical line names in the same chip
>   Plain log:  https://storage.kernelci.org/linusw/devel/v5.10-rc4-91-g65efb43ac94b/arm64/defconfig/gcc-8/lab-collabora/baseline-mt8173-elm-hana.txt
>   HTML log:   https://storage.kernelci.org/linusw/devel/v5.10-rc4-91-g65efb43ac94b/arm64/defconfig/gcc-8/lab-collabora/baseline-mt8173-elm-hana.html
>   Result:     65efb43ac94b gpiolib: Disallow identical line names in the same chip
> 
> Checks:
>   revert:     PASS
>   verify:     PASS
> 
> Parameters:
>   Tree:       linusw
>   URL:        https://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio.git/
>   Branch:     devel
>   Target:     mt8173-elm-hana
>   CPU arch:   arm64
>   Lab:        lab-collabora
>   Compiler:   gcc-8
>   Config:     defconfig
>   Test case:  baseline.bootrr.mediatek-mt8173-pinctrl-probed
> 
> Breaking commit found:
> 
> ---
> commit 65efb43ac94bffeb652cddba4106817bb38c5e71
> Author: Linus Walleij 
> Date:   Sat Dec 12 01:34:47 2020 +0100
> 
> gpiolib: Disallow identical line names in the same chip
> 
> We need to make this namespace hierarchical: at least do not
> allow two lines on the same chip to have the same name, this
> is just too much flexibility. If we name a line on a chip,
> name it uniquely on that chip.
> 
> I don't know what happens if we just apply this, I *hope* there
> are not a lot of systems out there breaking this simple and
> intuitive rule.
> 
> As a side effect, this makes the device tree naming code
> scream a bit if names are not globally unique.
> 
> I think there are not super-many device trees out there naming
> their lines so let's fix this before the problem becomes
> widespread.
> 
> Cc: Geert Uytterhoeven 
> Cc: Johan Hovold 
> Signed-off-by: Linus Walleij 
> Link: https://lore.kernel.org/r/20201212003447.238474-1-linus.wall...@linaro.org
> 
> diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
> index 5ce0c14c637b..fe1b96b7f127 100644
> --- a/drivers/gpio/gpiolib.c
> +++ b/drivers/gpio/gpiolib.c
> @@ -330,11 +330,9 @@ static struct gpio_desc *gpio_name_to_desc(const char * const name)
>  
>  /*
>   * Take the names from gc->names and assign them to their GPIO descriptors.
> - * Warn if a name is already used for a GPIO line on a different GPIO chip.
>   *
> - * Note that:
> - *   1. Non-unique names are still accepted,
> - *   2. Name collisions within the same GPIO chip are not reported.
> + * - Fail if a name is already used for a GPIO line on the same chip.
> + * - Allow names to not be globally unique but warn about it.
>   */
>  static int gpiochip_set_desc_names(struct gpio_chip *gc)
>  {
> @@ -343,13 +341,19 @@ static int gpiochip_set_desc_names(struct 

Re: kernelci/staging-next bisection: sleep.login on rk3288-rock2-square #2286-staging

2020-12-11 Thread Guillaume Tucker
Hi Mike,

Please see the bisection report below about a boot failure on
rk3288 with next-20201210.

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org but this one
looks valid.

There's nothing in the serial console log, probably because it's
crashing too early during boot.  This was confirmed on two rk3288
platforms on kernelci.org: rk3288-veyron-jaq and
rk3288-rock2-square.  There's no clear sign about other platforms
being impacted.

If this looks like something you want to investigate but you
don't have a platform at hand to reproduce it, please let us know
if you would like the test to be re-run on kernelci.org with some
debug config turned on, or if you have a fix to try.

Thanks,
Guillaume

On 11/12/2020 21:34, staging.kernelci.org bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has      *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.      *
> *                                                               *
> * If you do send a fix, please include this trailer:            *
> *   Reported-by: "kernelci.org bot"                             *
> *                                                               *
> * Hope this helps!                                              *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> kernelci/staging-next bisection: sleep.login on rk3288-rock2-square #2286-staging
> 
> Summary:
>   Start:      7f507faf2d85 staging-next-20201211.0

This is really next-20201210...  The revision shown here is just
an artifact of staging.kernelci.org which creates its own tags.

>   Plain log:  https://storage.staging.kernelci.org/kernelci/staging-next/staging-next-20201211.0/arm/multi_v7_defconfig/gcc-8/lab-collabora/sleep-rk3288-rock2-square.txt
>   HTML log:   https://storage.staging.kernelci.org/kernelci/staging-next/staging-next-20201211.0/arm/multi_v7_defconfig/gcc-8/lab-collabora/sleep-rk3288-rock2-square.html
>   Result:     950c37691925 mm: memblock: enforce overlap of memory.memblock and memory.reserved
> 
> Checks:
>   revert:     PASS
>   verify:     PASS
> 
> Parameters:
>   Tree:       kernelci
>   URL:        https://github.com/kernelci/linux.git
>   Branch:     staging-next
>   Target:     rk3288-rock2-square
>   CPU arch:   arm
>   Lab:        lab-collabora
>   Compiler:   gcc-8
>   Config:     multi_v7_defconfig
>   Test case:  sleep.login
> 
> Breaking commit found:
> 
> ---
> commit 950c3769192512118a87432dd42e71c5241dbd10
> Author: Mike Rapoport 
> Date:   Thu Dec 10 15:40:51 2020 +1100
> 
> mm: memblock: enforce overlap of memory.memblock and memory.reserved
> 
> Patch series "mm: fix initialization of struct page for holes in memory layout", v2.
> 
> Commit 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions
> rather that check each PFN") exposed several issues with the memory map
> initialization and these patches fix those issues.
> 
> Initially there were crashes during compaction that Qian Cai reported back
> in April [1].  It seemed back then that the problem was fixed, but a few
> weeks ago Andrea Arcangeli hit the same bug [2] and after a long
> discussion between us [3] I think these patches are the proper fix.
> 
> [1] https://lore.kernel.org/lkml/8c537eb7-85ee-4dcf-943e-3cc0ed0df...@lca.pw
> [2] https://lore.kernel.org/lkml/20201121194506.13464-1-aarca...@redhat.com
> [3] https://lore.kernel.org/mm-commits/20201206005401.qkuavgoxr%a...@linux-foundation.org
> 
> This patch (of 2):
> 
> memblock does not require that the reserved memory ranges will be a subset
> of memblock.memory.
> 
> As a result there may be reserved pages that are not in the range of any
> zone or node because zone and node boundaries are detected based on
> memblock.memory and pages that only present in memblock.reserved are not
> taken into account during zone/node size detection.
> 
> Make sure that all ranges in memblock.reserved are added to
> memblock.memory before calculating node and zone boundaries.
> 
> Link: https://lkml.kernel.org/r/20201209214304.6812-1-r...@kernel.org
> Link: https://lkml.kernel.org/r/20201209214304.6812-2-r...@kernel.org
> Fixes: 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
> Signed-off-by: Mike Rapoport 
> Reported-by: Andrea Arcangeli 
> Cc: Baoquan He 
> Cc: David Hildenbrand 
> Cc: Mel Gorman 
> Cc: Michal Hocko 
> Cc: Qian Cai 
> Cc: Vlastimil Babka 
> Cc: 
> Signed-off-by: Andrew Morton 
> Signed-off-by: Stephen 
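
[Editor's sketch, not part of the original thread.]  The commit
message above explains that reserved ranges are not required to be
a subset of memblock.memory, so some reserved pages can fall
outside every zone or node.  The condition can be modelled in plain
C as below.  The struct and function names are hypothetical, and
this is a simplified userspace model rather than the memblock API;
it assumes the memory ranges have already been merged, so a
reserved range straddling two adjacent memory ranges would be
counted as uncovered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open physical address range [start, end). */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Hypothetical helper: count the reserved ranges that are not fully
 * contained in any single memory range.  A non-zero result is the
 * situation the patch avoids by adding reserved ranges to the memory
 * list before node and zone boundaries are calculated.
 */
static size_t count_uncovered_reserved(const struct range *mem, size_t nmem,
				       const struct range *res, size_t nres)
{
	size_t uncovered = 0;

	assert(mem != NULL || nmem == 0);
	assert(res != NULL || nres == 0);

	for (size_t i = 0; i < nres; i++) {
		bool covered = false;

		for (size_t j = 0; j < nmem; j++) {
			if (res[i].start >= mem[j].start &&
			    res[i].end <= mem[j].end) {
				covered = true;
				break;
			}
		}
		if (!covered)
			uncovered++;
	}
	return uncovered;
}
```

In this model, a reserved range sitting in a hole between two
memory banks is reported as uncovered, which is the kind of range
that would be missed during zone and node size detection.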

Re: [PATCH 2/2] drm/meson: dw-hdmi: Enable the iahb clock early enough

2020-11-20 Thread Guillaume Tucker
On 20/11/2020 09:42, Marc Zyngier wrote:
> Instead of moving meson_dw_hdmi_init() around which breaks existing
> platform, let's enable the clock meson_dw_hdmi_init() depends on.
> This means we don't have to worry about this clock being enabled or
> not, depending on the boot-loader features.
> 
> Fixes: b33340e33acd ("drm/meson: dw-hdmi: Ensure that clocks are enabled before touching the TOP registers")
> Reported-by: Guillaume Tucker 

Although I am triaging kernelci bisections, it was initially
found thanks to our friendly bot.  So if you're OK with this, it
would most definitely appreciate a mention:

  Reported-by: "kernelci.org bot" 

Thanks,
Guillaume


Re: next/master bisection: baseline.dmesg.emerg on meson-gxbb-p200

2020-11-19 Thread Guillaume Tucker
Hi Marc,

On 19/11/2020 11:58, Marc Zyngier wrote:
> On 2020-11-19 10:26, Neil Armstrong wrote:
>> On 19/11/2020 11:20, Marc Zyngier wrote:
>>> On 2020-11-19 08:50, Guillaume Tucker wrote:
>>>> Please see the automated bisection report below about some kernel
>>>> errors on meson-gxbb-p200.
>>>>
>>>> Reports aren't automatically sent to the public while we're
>>>> trialing new bisection features on kernelci.org, however this one
>>>> looks valid.
>>>>
>>>> The bisection started with next-20201118 but the errors are still
>>>> present in next-20201119.  Details for this regression:
>>>>
>>>>   https://kernelci.org/test/case/id/5fb6196bfd0127fd68d8d902/
>>>>
>>>> The first error is:
>>>>
>>>>   [   14.757489] Internal error: synchronous external abort: 96000210 [#1] PREEMPT SMP
>>>
>>> Looks like yet another clock ordering setup. I guess different Amlogic
>>> platforms have slightly different ordering requirements.
>>>
>>> Neil, do you have any idea of which platform requires which ordering?
>>> The variability in DT and platforms is pretty difficult to follow (and
>>> I don't think I have such board around).
>>
>> The requirements should be the same, here the init was done before calling
>> dw_hdmi_probe to be sure the clocks and internals resets were deasserted.
>> But since you boot from u-boot already enabling these, it's already active.
>>
>> The solution would be to revert and do some check in meson_dw_hdmi_init() to
>> check if already enabled and do nothing.
> 
> A better fix seems to be this, which makes it explicit that there is
> a dependency between some of the registers accessed from meson_dw_hdmi_init()
> and the iahb clock.
> 
> Guillaume, can you give this a go on your failing box?

I confirm it solves the problem.  Please add this to your fix
patch if it's OK with you:

  Reported-by: "kernelci.org bot" 
  Tested-by: Guillaume Tucker 


For the record, it passed all the tests when applied on top of
the "bad" revision found by the bisection:

  http://lava.baylibre.com:10080/scheduler/alljobs?search=v5.10-rc3-1021-gb8668a2e5ea1

and the exact same test on the "bad" revision without the fix
consistently showed the error:

  http://lava.baylibre.com:10080/scheduler/job/374176


Thanks,
Guillaume


> diff --git a/drivers/gpu/drm/meson/meson_dw_hdmi.c b/drivers/gpu/drm/meson/meson_dw_hdmi.c
> index 7f8eea494147..52af8ba94311 100644
> --- a/drivers/gpu/drm/meson/meson_dw_hdmi.c
> +++ b/drivers/gpu/drm/meson/meson_dw_hdmi.c
> @@ -146,6 +146,7 @@ struct meson_dw_hdmi {
>  struct reset_control *hdmitx_ctrl;
>  struct reset_control *hdmitx_phy;
>  struct clk *hdmi_pclk;
> +    struct clk *iahb_clk;
>  struct clk *venci_clk;
>  struct regulator *hdmi_supply;
>  u32 irq_stat;
> @@ -1033,6 +1034,13 @@ static int meson_dw_hdmi_bind(struct device *dev, struct device *master,
>  }
>  clk_prepare_enable(meson_dw_hdmi->hdmi_pclk);
> 
> +    meson_dw_hdmi->iahb_clk = devm_clk_get(dev, "iahb");
> +    if (IS_ERR(meson_dw_hdmi->iahb_clk)) {
> +    dev_err(dev, "Unable to get iahb clk\n");
> +    return PTR_ERR(meson_dw_hdmi->iahb_clk);
> +    }
> +    clk_prepare_enable(meson_dw_hdmi->iahb_clk);
> +
>  meson_dw_hdmi->venci_clk = devm_clk_get(dev, "venci");
>  if (IS_ERR(meson_dw_hdmi->venci_clk)) {
>  dev_err(dev, "Unable to get venci clk\n");
> @@ -1071,6 +1079,8 @@ static int meson_dw_hdmi_bind(struct device *dev, struct device *master,
> 
>  encoder->possible_crtcs = BIT(0);
> 
> +    meson_dw_hdmi_init(meson_dw_hdmi);
> +
>  DRM_DEBUG_DRIVER("encoder initialized\n");
> 
>  /* Bridge / Connector */
> @@ -1095,8 +1105,6 @@ static int meson_dw_hdmi_bind(struct device *dev, struct device *master,
>  if (IS_ERR(meson_dw_hdmi->hdmi))
>  return PTR_ERR(meson_dw_hdmi->hdmi);
> 
> -    meson_dw_hdmi_init(meson_dw_hdmi);
> -
>  next_bridge = of_drm_find_bridge(pdev->dev.of_node);
>  if (next_bridge)
>  drm_bridge_attach(encoder, next_bridge,
> 
> 



Re: next/master bisection: baseline.dmesg.emerg on meson-gxbb-p200

2020-11-19 Thread Guillaume Tucker
Please see the automated bisection report below about some kernel
errors on meson-gxbb-p200.

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org, however this one
looks valid.

The bisection started with next-20201118 but the errors are still
present in next-20201119.  Details for this regression:

  https://kernelci.org/test/case/id/5fb6196bfd0127fd68d8d902/

The first error is:

  [   14.757489] Internal error: synchronous external abort: 96000210 [#1] PREEMPT SMP

Full log:

  https://storage.kernelci.org/next/master/next-20201119/arm64/defconfig/gcc-8/lab-baylibre/baseline-meson-gxbb-p200.html#L410

Some other platforms are failing to boot starting with
next-20201118 but it's unclear whether that's due to the same
issue.  They might lead to a successful bisection which would
help clarify this.  All the baseline test results can be found
here:

  https://kernelci.org/test/job/next/branch/master/kernel/next-20201119/plan/baseline/


Hope this helps.  Please let us know if you need some help to
reproduce the issue or to try a fix.

Thanks,
Guillaume

On 19/11/2020 03:03, KernelCI bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has      *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.      *
> *                                                               *
> * If you do send a fix, please include this trailer:            *
> *   Reported-by: "kernelci.org bot"                             *
> *                                                               *
> * Hope this helps!                                              *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> next/master bisection: baseline.dmesg.emerg on meson-gxbb-p200
> 
> Summary:
>   Start:      205292332779 Add linux-next specific files for 20201118
>   Plain log:  https://storage.kernelci.org/next/master/next-20201118/arm64/defconfig/gcc-8/lab-baylibre/baseline-meson-gxbb-p200.txt
>   HTML log:   https://storage.kernelci.org/next/master/next-20201118/arm64/defconfig/gcc-8/lab-baylibre/baseline-meson-gxbb-p200.html
>   Result:     b33340e33acd drm/meson: dw-hdmi: Ensure that clocks are enabled before touching the TOP registers
> 
> Checks:
>   revert:     PASS
>   verify:     PASS
> 
> Parameters:
>   Tree:       next
>   URL:        https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>   Branch:     master
>   Target:     meson-gxbb-p200
>   CPU arch:   arm64
>   Lab:        lab-baylibre
>   Compiler:   gcc-8
>   Config:     defconfig
>   Test case:  baseline.dmesg.emerg
> 
> Breaking commit found:
> 
> ---
> commit b33340e33acdfe5ca6a5aa1244709575ae1e0432
> Author: Marc Zyngier 
> Date:   Mon Nov 16 20:07:44 2020 +
> 
> drm/meson: dw-hdmi: Ensure that clocks are enabled before touching the TOP registers
> 
> Removing the meson-dw-hdmi module and re-inserting it results in a hang
> as the driver writes to HDMITX_TOP_SW_RESET. Similar effects can be seen
> when booting with mainline u-boot and using the u-boot provided DT (which
> is highly desirable).
> 
> The reason for the hang seem to be that the clocks are not always
> enabled by the time we enter meson_dw_hdmi_init(). Moving this call
> *after* dw_hdmi_probe() ensures that the clocks are enabled.
> 
> Fixes: 1374b8375c2e ("drm/meson: dw_hdmi: add resume/suspend hooks")
> Signed-off-by: Marc Zyngier 
> Acked-by: Neil Armstrong 
> Signed-off-by: Neil Armstrong 
> Link: https://patchwork.freedesktop.org/patch/msgid/20201116200744.495826-5-...@kernel.org
> 
> diff --git a/drivers/gpu/drm/meson/meson_dw_hdmi.c b/drivers/gpu/drm/meson/meson_dw_hdmi.c
> index 68826cf9993f..7f8eea494147 100644
> --- a/drivers/gpu/drm/meson/meson_dw_hdmi.c
> +++ b/drivers/gpu/drm/meson/meson_dw_hdmi.c
> @@ -1073,8 +1073,6 @@ static int meson_dw_hdmi_bind(struct device *dev, struct device *master,
>  
>   DRM_DEBUG_DRIVER("encoder initialized\n");
>  
> - meson_dw_hdmi_init(meson_dw_hdmi);
> -
>   /* Bridge / Connector */
>  
>   dw_plat_data->priv_data = meson_dw_hdmi;
> @@ -1097,6 +1095,8 @@ static int meson_dw_hdmi_bind(struct device *dev, struct device *master,
>   if (IS_ERR(meson_dw_hdmi->hdmi))
>   return PTR_ERR(meson_dw_hdmi->hdmi);
>  
> + meson_dw_hdmi_init(meson_dw_hdmi);
> +
>   next_bridge = of_drm_find_bridge(pdev->dev.of_node);
>   if (next_bridge)
>   drm_bridge_attach(encoder, next_bridge,
> ---
> 
> 
> Git bisection log:
> 
> 

Re: rmk/for-next bisection: baseline.login on bcm2836-rpi-2-b

2020-11-16 Thread Guillaume Tucker
On 16/11/2020 12:20, Ard Biesheuvel wrote:
> On Mon, 16 Nov 2020 at 12:20, Ard Biesheuvel  wrote:
>>
>> On Sun, 15 Nov 2020 at 15:11, Ard Biesheuvel  wrote:
>>>
>>> On Fri, 13 Nov 2020 at 17:25, Ard Biesheuvel  wrote:
>>>>
>>>> On Fri, 13 Nov 2020 at 17:15, Ard Biesheuvel  wrote:
>>>>>
>>>>> On Fri, 13 Nov 2020 at 16:58, Russell King - ARM Linux admin
>>>>>  wrote:
>>>>>>
>>>>>> On Fri, Nov 13, 2020 at 03:43:27PM +, Guillaume Tucker wrote:
>>>>>>> On 13/11/2020 10:35, Ard Biesheuvel wrote:
>>>>>>>> On Fri, 13 Nov 2020 at 11:31, Guillaume Tucker
>>>>>>>>  wrote:
>>>>>>>>>
>>>>>>>>> Hi Ard,
>>>>>>>>>
>>>>>>>>> Please see the bisection report below about a boot failure on
>>>>>>>>> RPi-2b.
>>>>>>>>>
>>>>>>>>> Reports aren't automatically sent to the public while we're
>>>>>>>>> trialing new bisection features on kernelci.org but this one
>>>>>>>>> looks valid.
>>>>>>>>>
>>>>>>>>> There's nothing in the serial console log, probably because it's
>>>>>>>>> crashing too early during boot.  I'm not sure if other platforms
>>>>>>>>> on kernelci.org were hit by this in the same way, but there
>>>>>>>>> doesn't seem to be any.
>>>>>>>>>
>>>>>>>>> The same regression can be seen on rmk's for-next branch as well
>>>>>>>>> as in linux-next.  It happens with both bcm2835_defconfig and
>>>>>>>>> multi_v7_defconfig.
>>>>>>>>>
>>>>>>>>> Some more details can be found here:
>>>>>>>>>
>>>>>>>>>   https://kernelci.org/test/case/id/5fae44823818ee918adb8864/
>>>>>>>>>
>>>>>>>>> If this looks like a real issue but you don't have a platform at
>>>>>>>>> hand to reproduce it, please let us know if you would like the
>>>>>>>>> KernelCI test to be re-run with earlyprintk or some debug config
>>>>>>>>> turned on, or if you have a fix to try.
>>>>>>>>>
>>>>>>>>> Best wishes,
>>>>>>>>> Guillaume
>>>>>>>>>
>>>>>>>>
>>>>>>>> Hello Guillaume,
>>>>>>>>
>>>>>>>> That patch did have an issue, but it was already fixed by
>>>>>>>>
>>>>>>>> https://www.armlinux.org.uk/developer/patches/viewpatch.php?id=9020/1
>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=fc2933c133744305236793025b00c2f7d258b687
>>>>>>>>
>>>>>>>> Could you please double check whether cherry-picking that on top of
>>>>>>>> the first bad commit fixes the problem?
>>>>>>>
>>>>>>> Sadly this doesn't appear to be fixing the issue.  I've
>>>>>>> cherry-picked your patch on top of the commit found by the
>>>>>>> bisection but it still didn't boot, here's the git log:
>>>>>>>
>>>>>>> cbb9656e83ca ARM: 9020/1: mm: use correct section size macro to describe the FDT virtual address
>>>>>>> 7a1be318f579 ARM: 9012/1: move device tree mapping out of linear region
>>>>>>> e9a2f8b599d0 ARM: 9011/1: centralize phys-to-virt conversion of DT/ATAGS address
>>>>>>> 3650b228f83a Linux 5.10-rc1
>>>>>>>
>>>>>>> Test log: https://people.collabora.com/~gtucker/lava/boot/rpi-2-b/v5.10-rc1-3-gcbb9656e83ca/
>>>>>>>
>>>>>>> There's no output so it's hard to tell what is going on, but
>>>>>>> reverting the bad commit does make the board boot (that's
>>>>>>> what "revert: PASS" means in the bisect report).  So it's
>>>>>>> unlikely that there is another issue causing the boot failure.
>>>>>>
>>>>>> These silent boot failures are precisely wh

Re: rmk/for-next bisection: baseline.login on bcm2836-rpi-2-b

2020-11-13 Thread Guillaume Tucker
On 13/11/2020 10:35, Ard Biesheuvel wrote:
> On Fri, 13 Nov 2020 at 11:31, Guillaume Tucker
>  wrote:
>>
>> Hi Ard,
>>
>> Please see the bisection report below about a boot failure on
>> RPi-2b.
>>
>> Reports aren't automatically sent to the public while we're
>> trialing new bisection features on kernelci.org but this one
>> looks valid.
>>
>> There's nothing in the serial console log, probably because it's
>> crashing too early during boot.  I'm not sure if other platforms
>> on kernelci.org were hit by this in the same way, but there
>> doesn't seem to be any.
>>
>> The same regression can be seen on rmk's for-next branch as well
>> as in linux-next.  It happens with both bcm2835_defconfig and
>> multi_v7_defconfig.
>>
>> Some more details can be found here:
>>
>>   https://kernelci.org/test/case/id/5fae44823818ee918adb8864/
>>
>> If this looks like a real issue but you don't have a platform at
>> hand to reproduce it, please let us know if you would like the
>> KernelCI test to be re-run with earlyprintk or some debug config
>> turned on, or if you have a fix to try.
>>
>> Best wishes,
>> Guillaume
>>
> 
> Hello Guillaume,
> 
> That patch did have an issue, but it was already fixed by
> 
> https://www.armlinux.org.uk/developer/patches/viewpatch.php?id=9020/1
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=fc2933c133744305236793025b00c2f7d258b687
> 
> Could you please double check whether cherry-picking that on top of
> the first bad commit fixes the problem?

Sadly this doesn't appear to be fixing the issue.  I've
cherry-picked your patch on top of the commit found by the
bisection but it still didn't boot, here's the git log:

cbb9656e83ca ARM: 9020/1: mm: use correct section size macro to describe the FDT virtual address
7a1be318f579 ARM: 9012/1: move device tree mapping out of linear region
e9a2f8b599d0 ARM: 9011/1: centralize phys-to-virt conversion of DT/ATAGS address
3650b228f83a Linux 5.10-rc1

Test log: https://people.collabora.com/~gtucker/lava/boot/rpi-2-b/v5.10-rc1-3-gcbb9656e83ca/

There's no output so it's hard to tell what is going on, but
reverting the bad commit does make the board boot (that's
what "revert: PASS" means in the bisect report).  So it's
unlikely that there is another issue causing the boot failure.

Let me know if there's anything else you want to try out, I'll soon be
done for today but can help again next week.

Thanks,
Guillaume

>> On 13/11/2020 02:27, KernelCI bot wrote:
>>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>>> * This automated bisection report was sent to you on the basis  *
>>> * that you may be involved with the breaking commit it has      *
>>> * found.  No manual investigation has been done to verify it,   *
>>> * and the root cause of the problem may be somewhere else.      *
>>> *                                                               *
>>> * If you do send a fix, please include this trailer:            *
>>> *   Reported-by: "kernelci.org bot"                             *
>>> *                                                               *
>>> * Hope this helps!                                              *
>>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>>>
>>> rmk/for-next bisection: baseline.login on bcm2836-rpi-2-b
>>>
>>> Summary:
>>>   Start:      40bd54f12902 Merge branch 'devel-stable' into for-next
>>>   Plain log:  https://storage.kernelci.org/rmk/for-next/for-linus-35-g40bd54f129026/arm/bcm2835_defconfig/gcc-8/lab-collabora/baseline-bcm2836-rpi-2-b.txt
>>>   HTML log:   https://storage.kernelci.org/rmk/for-next/for-linus-35-g40bd54f129026/arm/bcm2835_defconfig/gcc-8/lab-collabora/baseline-bcm2836-rpi-2-b.html
>>>   Result:     7a1be318f579 ARM: 9012/1: move device tree mapping out of linear region
>>>
>>> Checks:
>>>   revert:     PASS
>>>   verify:     PASS
>>>
>>> Parameters:
>>>   Tree:       rmk
>>>   URL:        git://git.armlinux.org.uk/~rmk/linux-arm.git
>>>   Branch:     for-next
>>>   Target:     bcm2836-rpi-2-b
>>>   CPU arch:   arm
>>>   Lab:        lab-collabora
>>>   Compiler:   gcc-8
>>>   Config:     bcm2835_defconfig
>>>   Test case:  baseline.login
>>>
>>> Breaking commit found:
>>>
>>> ---
>>> commit 7a1be318f5795cb66f

Re: rmk/for-next bisection: baseline.login on bcm2836-rpi-2-b

2020-11-13 Thread Guillaume Tucker
Hi Ard,

Please see the bisection report below about a boot failure on
RPi-2b.

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org but this one
looks valid.

There's nothing in the serial console log, probably because it's
crashing too early during boot.  I'm not sure if other platforms
on kernelci.org were hit by this in the same way, but there
doesn't seem to be any.

The same regression can be seen on rmk's for-next branch as well
as in linux-next.  It happens with both bcm2835_defconfig and
multi_v7_defconfig.

Some more details can be found here:

  https://kernelci.org/test/case/id/5fae44823818ee918adb8864/

If this looks like a real issue but you don't have a platform at
hand to reproduce it, please let us know if you would like the
KernelCI test to be re-run with earlyprintk or some debug config
turned on, or if you have a fix to try.

Best wishes,
Guillaume



On 13/11/2020 02:27, KernelCI bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has      *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.      *
> *                                                               *
> * If you do send a fix, please include this trailer:            *
> *   Reported-by: "kernelci.org bot"                             *
> *                                                               *
> * Hope this helps!                                              *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> rmk/for-next bisection: baseline.login on bcm2836-rpi-2-b
> 
> Summary:
>   Start:  40bd54f12902 Merge branch 'devel-stable' into for-next
>   Plain log:  
> https://storage.kernelci.org/rmk/for-next/for-linus-35-g40bd54f129026/arm/bcm2835_defconfig/gcc-8/lab-collabora/baseline-bcm2836-rpi-2-b.txt
>   HTML log:   
> https://storage.kernelci.org/rmk/for-next/for-linus-35-g40bd54f129026/arm/bcm2835_defconfig/gcc-8/lab-collabora/baseline-bcm2836-rpi-2-b.html
>   Result: 7a1be318f579 ARM: 9012/1: move device tree mapping out of 
> linear region
> 
> Checks:
>   revert: PASS
>   verify: PASS
> 
> Parameters:
>   Tree:   rmk
>   URL:git://git.armlinux.org.uk/~rmk/linux-arm.git
>   Branch: for-next
>   Target: bcm2836-rpi-2-b
>   CPU arch:   arm
>   Lab:lab-collabora
>   Compiler:   gcc-8
>   Config: bcm2835_defconfig
>   Test case:  baseline.login
> 
> Breaking commit found:
> 
> ---
> commit 7a1be318f5795cb66fa0dc86b3ace427fe68057f
> Author: Ard Biesheuvel 
> Date:   Sun Oct 11 10:21:37 2020 +0100
> 
> ARM: 9012/1: move device tree mapping out of linear region
> 
> On ARM, setting up the linear region is tricky, given the constraints
> around placement and alignment of the memblocks, and how the kernel
> itself as well as the DT are placed in physical memory.
> 
> Let's simplify matters a bit, by moving the device tree mapping to the
> top of the address space, right between the end of the vmalloc region
> and the start of the fixmap region, and create a read-only mapping
> for it that is independent of the size of the linear region, and how it
> is organized.
> 
> Since this region was formerly used as a guard region, which will now be
> populated fully on LPAE builds by this read-only mapping (which will
> still be able to function as a guard region for stray writes), bump the
> start of the [underutilized] fixmap region by 512 KB as well, to ensure
> that there is always a proper guard region here. Doing so still leaves
> ample room for the fixmap space, even with NR_CPUS set to its maximum
> value of 32.
> 
> Tested-by: Linus Walleij 
> Reviewed-by: Linus Walleij 
> Reviewed-by: Nicolas Pitre 
> Signed-off-by: Ard Biesheuvel 
> Signed-off-by: Russell King 
> 
> diff --git a/Documentation/arm/memory.rst b/Documentation/arm/memory.rst
> index 0521b4ce5c96..34bb23c44a71 100644
> --- a/Documentation/arm/memory.rst
> +++ b/Documentation/arm/memory.rst
> @@ -45,9 +45,14 @@ fffe8000	fffeffff	DTCM mapping area for platforms with
>  fffe0000	fffe7fff	ITCM mapping area for platforms with
>  				ITCM mounted inside the CPU.
>  
> -ffc00000	ffefffff	Fixmap mapping region.  Addresses provided
> +ffc80000	ffefffff	Fixmap mapping region.  Addresses provided
>  				by fix_to_virt() will be located here.
>  
> +ffc00000	ffc7ffff	Guard region
> +
> +ff800000	ffbfffff	Permanent, fixed read-only mapping of the
> +				firmware provided DT blob
> +
>  fee00000	feffffff

[PATCH v2] rtc: hym8563: enable wakeup when applicable

2020-11-06 Thread Guillaume Tucker
Enable wakeup in the hym8563 driver if the IRQ was successfully
requested or if wakeup-source is set in the devicetree.

As per the description of device_init_wakeup(), it should be enabled
for "devices that everyone expects to be wakeup sources".  One would
expect this to be the case with a real-time clock.

Tested on rk3288-rock2-square, which has an IRQ configured for the
RTC.  As a result, wakeup was enabled during driver initialisation.

Fixes: dcaf03849352 ("rtc: add hym8563 rtc-driver")
Reported-by: kernelci.org bot 
Signed-off-by: Guillaume Tucker 
---

Notes:
v2: enable wakeup if irq or wakeup-source

 drivers/rtc/rtc-hym8563.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/rtc/rtc-hym8563.c b/drivers/rtc/rtc-hym8563.c
index 0fb79c4afb46..24e0095be058 100644
--- a/drivers/rtc/rtc-hym8563.c
+++ b/drivers/rtc/rtc-hym8563.c
@@ -527,8 +527,6 @@ static int hym8563_probe(struct i2c_client *client,
 	hym8563->client = client;
 	i2c_set_clientdata(client, hym8563);
 
-	device_set_wakeup_capable(&client->dev, true);
-
 	ret = hym8563_init_device(client);
 	if (ret) {
 		dev_err(&client->dev, "could not init device, %d\n", ret);
@@ -547,6 +545,11 @@ static int hym8563_probe(struct i2c_client *client,
 		}
 	}
 
+	if (client->irq > 0 ||
+	    device_property_read_bool(&client->dev, "wakeup-source")) {
+		device_init_wakeup(&client->dev, true);
+	}
+
 	/* check state of calendar information */
 	ret = i2c_smbus_read_byte_data(client, HYM8563_SEC);
 	if (ret < 0)
-- 
2.20.1
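
For illustration, the condition added in v2 can be modelled outside the
kernel as a small predicate.  This is only a sketch under stated
assumptions: rtc_should_init_wakeup() and its parameters are hypothetical
names that do not exist in the driver, and the real code calls
device_init_wakeup() instead of returning a value.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace model of the v2 policy: wakeup is enabled when
 * the RTC has a usable interrupt line (irq > 0) or when the devicetree
 * node carries the "wakeup-source" property.
 */
static bool rtc_should_init_wakeup(int irq, bool has_wakeup_source)
{
	return irq > 0 || has_wakeup_source;
}
```

Under this model, a board with neither an IRQ nor the property would no
longer have wakeup enabled, which differs from the v1 behaviour of
enabling it unconditionally.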



Re: [PATCH] rtc: hym8563: enable wakeup by default

2020-11-05 Thread Guillaume Tucker
On 05/11/2020 22:09, Alexandre Belloni wrote:
> On 05/11/2020 22:01:10+0000, Guillaume Tucker wrote:
>> Enable wakeup by default in the hym8563 driver to match the behaviour
>> implemented by the majority of RTC drivers.  As per the description of
>> device_init_wakeup(), it should be enabled for "devices that everyone
>> expects to be wakeup sources".  One would expect this to be the case
>> with a real-time clock.
>>
> 
> Actually, the proper way of doing it for a discrete RTC is to only
> enable wakeup if the irq request is successful or when the wakeup-source
> property is present on the node.

Thanks for the quick reply.  I see, I'll send a v2 accordingly.

Guillaume

>> Fixes: dcaf03849352 ("rtc: add hym8563 rtc-driver")
>> Reported-by: kernelci.org bot 
>> Signed-off-by: Guillaume Tucker 
>> ---
>>  drivers/rtc/rtc-hym8563.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/rtc/rtc-hym8563.c b/drivers/rtc/rtc-hym8563.c
>> index 0fb79c4afb46..6fccfe634d57 100644
>> --- a/drivers/rtc/rtc-hym8563.c
>> +++ b/drivers/rtc/rtc-hym8563.c
>> @@ -527,7 +527,7 @@ static int hym8563_probe(struct i2c_client *client,
>>  	hym8563->client = client;
>>  	i2c_set_clientdata(client, hym8563);
>>  
>> -	device_set_wakeup_capable(&client->dev, true);
>> +	device_init_wakeup(&client->dev, true);
>>  
>>  	ret = hym8563_init_device(client);
>>  	if (ret) {
>> -- 
>> 2.20.1
>>
> 



[PATCH] rtc: hym8563: enable wakeup by default

2020-11-05 Thread Guillaume Tucker
Enable wakeup by default in the hym8563 driver to match the behaviour
implemented by the majority of RTC drivers.  As per the description of
device_init_wakeup(), it should be enabled for "devices that everyone
expects to be wakeup sources".  One would expect this to be the case
with a real-time clock.

Fixes: dcaf03849352 ("rtc: add hym8563 rtc-driver")
Reported-by: kernelci.org bot 
Signed-off-by: Guillaume Tucker 
---
 drivers/rtc/rtc-hym8563.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/rtc/rtc-hym8563.c b/drivers/rtc/rtc-hym8563.c
index 0fb79c4afb46..6fccfe634d57 100644
--- a/drivers/rtc/rtc-hym8563.c
+++ b/drivers/rtc/rtc-hym8563.c
@@ -527,7 +527,7 @@ static int hym8563_probe(struct i2c_client *client,
 	hym8563->client = client;
 	i2c_set_clientdata(client, hym8563);
 
-	device_set_wakeup_capable(&client->dev, true);
+	device_init_wakeup(&client->dev, true);
 
 	ret = hym8563_init_device(client);
 	if (ret) {
-- 
2.20.1



Re: [PATCH] x86/x86_64_defconfig: Enable the serial console

2020-10-12 Thread Guillaume Tucker
On 12/10/2020 15:40, Willy Tarreau wrote:
> On Mon, Oct 12, 2020 at 04:32:12PM +0200, Borislav Petkov wrote:
>> On Mon, Oct 12, 2020 at 11:22:10AM +0100, Guillaume Tucker wrote:
>>> However, it was found while adding some x86 Chromebooks[1] to
>>> KernelCI that x86_64_defconfig lacked some basic things for
>>> anyone to be able to boot a kernel with a serial console enabled
>>> on those.
>>
>> Hold on, those are laptops, right? How come they do have serial console?
>> Because laptops don't have serial console - that has been the eternal
>> problem with debugging kernels on laptops.

Yes the link you pointed at is a prerequisite to enable serial
console in the firmware (Coreboot/Depthcharge).

> Well, to be precise, they don't have *anymore*. I used to exclusively
> select laptops having a serial port given that I was using it daily with
> routers, until I had to resign when I abandoned my good old NC8000 :-/

You can get serial console on recent enough Chromebooks with a
debug interface such as SuzyQable:

  https://www.sparkfun.com/products/14746

It's not a USB Type-C adapter, it has a debug interface which
works with Chromebooks that support Case-Closed Debugging.
Anyone can do that without modifying the Chromebook, and with a
bit of patience to go through the documentation[1]...

The KernelCI sample results from my previous email were run using
just that: off-the-shelf Chromebooks + SuzyQ + rebuilt firmware
for interactive console and tftp boot + kernel with the config
options in Enric's patch.

Thanks,
Guillaume


[1] 
https://chromium.googlesource.com/chromiumos/platform/ec/+/cr50_stab/docs/case_closed_debugging_cr50.md


Re: [PATCH] x86/x86_64_defconfig: Enable the serial console

2020-10-12 Thread Guillaume Tucker
On 12/10/2020 04:58, Willy Tarreau wrote:
> Hi Enric,
> 
> On Sun, Oct 11, 2020 at 07:05:55PM +0200, Enric Balletbo i Serra wrote:
>> For arm64 (i.e : arm64_defconfig):
>> 1. Someone renames CONFIG_A to CONFIG_AB, sends a patch, and as he did a
>> grep, the patch modifies all the defconfigs.
>> 2. The patch is accepted and merged in linux-next.
>> 3. KernelCI builds linux-next, boots the kernel on the hardware and all the
>> tests continue passing.
>>
>>
>> For x86:
>> 1. Someone renames CONFIG_A to CONFIG_AB, sends a patch, and as he did a
>> grep, the patch modifies all the defconfigs.
>> 2. The patch is accepted and merged in linux-next.
>> 3. KernelCI builds linux-next, boots the kernel on the hardware, and some
>> tests start to fail or are skipped.
>> 4. The maintainer is notified about the behavior change, so he will need to
>> look at the problem, and find it.
>> 5. The maintainer sends a patch.
>> 6. The patch is accepted, but he needs to tag the release as per kernel <
>> x.y.z version it should use CONFIG_A and for kernel > x.y.z it should pick
>> CONFIG_AB.
>> 7. KernelCI builds linux-next, boots the kernel on the hardware and all the
>> tests pass again.
> 
> Previously I thought I understood your needs, but now I don't anymore. You
> seem to be saying that you're not testing *anything* outside of defconfig,
> and that as such you'd like defconfig to be complete enough to provide good
> coverage. This sounds a bit odd to me. And what if in the arm64 case, the
> CONFIG_YOUR_V4L2_DEVICE is *not* added to defconfig ? You're in the same
> situation.
> 
> We all know it's not fun to have to deal with local config snippets, but
> as soon as you plan to boot on a specific hardware, this is unavoidable.
> Also, config symbols are rarely renamed. Most often they are moved under
> new entries (e.g. CONFIG_VENDOR_FOO) which are enabled by default, so
> that updating your old configuration using "make olddefconfig" is enough
> to update it.
> 
> What I'm understanding from your proposed change is not to support
> KernelCI, but to support Chromebooks by default. This could make more
> sense if that's a relevant platform whose support is currently limited
> by default, I'm not able to judge that, but at least it seems to me
> this would make more sense than having specific configs for KernelCI.

This is correct, KernelCI doesn't really need these configs to be
upstreamed.  It's useful as Enric pointed out, but there are
already several specific config fragments being managed by the
KernelCI build system as one would expect, and we can take care
of one more if need be.

However, it was found while adding some x86 Chromebooks[1] to
KernelCI that x86_64_defconfig lacked some basic things for
anyone to be able to boot a kernel with a serial console enabled
on those.  That is what this patch is really about.  When doing
upstream kernel development and building your own kernel, it is
obviously a very useful thing to have.

Agreed, it is easy enough for a developer to turn these configs
on when required.  But it's not entirely trivial to find out
which configs to turn on, especially when you don't have access
to the kernel log.  I went through the Chrome OS 4.14 kernel
config fragments to get there.  Not everyone would agree, but it
does seem to me that the convenience of having it upstream
outweighs the costs.

If it's about size or performance, anyone can compare the kernel
image sizes and other things with the KernelCI (staging) build
artifacts based on v5.9[2].

As mentioned earlier in this thread, there aren't any written
rules about what goes into x86_64_defconfig and what does not.
Based on past history, and looking at it from a developer's point
of view rather than KernelCI, does it make sense in this case?

Thanks,
Guillaume


[1] HP Intel x360 "octopus" and AMD 11A-G6-EE "grunt":
https://staging.kernelci.org/test/plan/id/5f8101a97ba4fdae00cafbb0/
https://staging.kernelci.org/test/plan/id/5f81003f56c3586920cafbb4/

[2] Plain x86_64_defconfig:

https://storage.staging.kernelci.org/kernelci/staging-mainline/staging-mainline-20201011.0/x86_64/x86_64_defconfig/gcc-8/
with "x86 Chromebook" fragment:

https://storage.staging.kernelci.org/kernelci/staging-mainline/staging-mainline-20201011.0/x86_64/x86_64_defconfig+x86-chromebook/gcc-8/



Re: media/master bisection: v4l2-compliance-vivid.Format-ioctls-Input-3.VIDIOC_TRY_FMT on qemu_arm-virt-gicv3

2020-10-02 Thread Guillaume Tucker
On 30/09/2020 09:05, Guillaume Tucker wrote:
> Please see the bisection report below about a regression in
> v4l2-compliance on vivid.
> 
> Reports aren't automatically sent to the public while we're
> trialing new bisection features on kernelci.org but this one
> looks valid.
> 
> 
> The full results for v4l2-compliance on vivid for
> v5.9-rc4-471-gc0c8db7bc953 show 22 individual test case
> regressions which might all be due to a single issue:
> 
>   https://kernelci.org/test/plan/id/5f728c108b008d61f4bf9db7/
> 
> For comparison, these are the results from the previous revision in
> the media tree:
> 
>   https://kernelci.org/test/plan/id/5f6b44ddea4abb1888bf9db4/
> 
> Also worth noting is that the v4l2-compliance test suite was
> updated on Friday 25th, in-between the revisions mentioned above.
> So the issue might have been present earlier but not detected.

Turns out, it needed yet another update.  The failures were all
due to the fact that the v4l2-compliance version being used on
kernelci.org was lagging by a few days behind the media/master
branch.

It's a pretty rare issue, but it would be nice to have a way to
avoid that.  On the KernelCI side of things, we should start
monitoring tests and rebuild them automatically rather than on a
fixed weekly basis.  On the kernel side of things, it would help
if the tests were updated _before_ the changes were applied to
the branch as otherwise there would still be a window for this
kind of issue to occur.

Generally speaking, what do you think would be the best way to
fit the KernelCI v4l2-compliance test cycles into the media
subsystem workflow?

Best wishes,
Guillaume


> On 29/09/2020 07:30, KernelCI bot wrote:
>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>> * This automated bisection report was sent to you on the basis  *
>> * that you may be involved with the breaking commit it has  *
>> * found.  No manual investigation has been done to verify it,   *
>> * and the root cause of the problem may be somewhere else.  *
>> *   *
>> * If you do send a fix, please include this trailer:*
>> *   Reported-by: "kernelci.org bot"   *
>> *   *
>> * Hope this helps!  *
>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>>
>> media/master bisection: 
>> v4l2-compliance-vivid.Format-ioctls-Input-3.VIDIOC_TRY_FMT on 
>> qemu_arm-virt-gicv3
>>
>> Summary:
>>   Start:  c0c8db7bc953 media: MAINTAINERS: remove Maxime Jourdan as 
>> maintainer of Amlogic VDEC
>>   Plain log:  
>> https://storage.kernelci.org/media/master/v5.9-rc4-471-gc0c8db7bc953/arm/multi_v7_defconfig+virtualvideo/gcc-8/lab-collabora/v4l2-compliance-vivid-qemu_arm-virt-gicv3.txt
>>   HTML log:   
>> https://storage.kernelci.org/media/master/v5.9-rc4-471-gc0c8db7bc953/arm/multi_v7_defconfig+virtualvideo/gcc-8/lab-collabora/v4l2-compliance-vivid-qemu_arm-virt-gicv3.html
>>   Result: 2f491463497a media: vivid: Add support to the CSC API
>>
>> Checks:
>>   revert: PASS
>>   verify: PASS
>>
>> Parameters:
>>   Tree:   media
>>   URL:https://git.linuxtv.org/media_tree.git
>>   Branch: master
>>   Target: qemu_arm-virt-gicv3
>>   CPU arch:   arm
>>   Lab:lab-collabora
>>   Compiler:   gcc-8
>>   Config: multi_v7_defconfig+virtualvideo
>>   Test case:  v4l2-compliance-vivid.Format-ioctls-Input-3.VIDIOC_TRY_FMT
>>
>> Breaking commit found:
>>
>> ---
>> commit 2f491463497ad43bc06968a334747c6b6b20fc74
>> Author: Dafna Hirschfeld 
>> Date:   Thu Aug 27 21:46:09 2020 +0200
>>
>> media: vivid: Add support to the CSC API
>> 
>> The CSC API (Colorspace conversion) allows userspace to try
>> to configure the colorspace, transfer function, Y'CbCr/HSV encoding
>> and the quantization for capture devices. This patch adds support
>> to the CSC API in vivid.
>> Using the CSC API, userspace is allowed to do the following:
>> 
>> - Set the colorspace.
>> - Set the xfer_func.
>> - Set the ycbcr_enc function for YUV formats.
>> - Set the hsv_enc function for HSV formats
>> - Set the quantization for YUV and RGB formats.
>> 
>> Signed-off-by: Dafna Hirschfeld 
>> Signed-off-by: Hans Verkuil 
>> Signed-off-by: Mauro Carvalho Chehab 
>>

Re: media/master bisection: v4l2-compliance-vivid.Format-ioctls-Input-3.VIDIOC_TRY_FMT on qemu_arm-virt-gicv3

2020-09-30 Thread Guillaume Tucker
Please see the bisection report below about a regression in
v4l2-compliance on vivid.

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org but this one
looks valid.


The full results for v4l2-compliance on vivid for
v5.9-rc4-471-gc0c8db7bc953 show 22 individual test case
regressions which might all be due to a single issue:

  https://kernelci.org/test/plan/id/5f728c108b008d61f4bf9db7/

For comparison, these are the results from the previous revision in
the media tree:

  https://kernelci.org/test/plan/id/5f6b44ddea4abb1888bf9db4/

Also worth noting is that the v4l2-compliance test suite was
updated on Friday 25th, in-between the revisions mentioned above.
So the issue might have been present earlier but not detected.

Hope this helps!

Thanks,
Guillaume


On 29/09/2020 07:30, KernelCI bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has  *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.  *
> *   *
> * If you do send a fix, please include this trailer:*
> *   Reported-by: "kernelci.org bot"   *
> *   *
> * Hope this helps!  *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> media/master bisection: 
> v4l2-compliance-vivid.Format-ioctls-Input-3.VIDIOC_TRY_FMT on 
> qemu_arm-virt-gicv3
> 
> Summary:
>   Start:  c0c8db7bc953 media: MAINTAINERS: remove Maxime Jourdan as 
> maintainer of Amlogic VDEC
>   Plain log:  
> https://storage.kernelci.org/media/master/v5.9-rc4-471-gc0c8db7bc953/arm/multi_v7_defconfig+virtualvideo/gcc-8/lab-collabora/v4l2-compliance-vivid-qemu_arm-virt-gicv3.txt
>   HTML log:   
> https://storage.kernelci.org/media/master/v5.9-rc4-471-gc0c8db7bc953/arm/multi_v7_defconfig+virtualvideo/gcc-8/lab-collabora/v4l2-compliance-vivid-qemu_arm-virt-gicv3.html
>   Result: 2f491463497a media: vivid: Add support to the CSC API
> 
> Checks:
>   revert: PASS
>   verify: PASS
> 
> Parameters:
>   Tree:   media
>   URL:https://git.linuxtv.org/media_tree.git
>   Branch: master
>   Target: qemu_arm-virt-gicv3
>   CPU arch:   arm
>   Lab:lab-collabora
>   Compiler:   gcc-8
>   Config: multi_v7_defconfig+virtualvideo
>   Test case:  v4l2-compliance-vivid.Format-ioctls-Input-3.VIDIOC_TRY_FMT
> 
> Breaking commit found:
> 
> ---
> commit 2f491463497ad43bc06968a334747c6b6b20fc74
> Author: Dafna Hirschfeld 
> Date:   Thu Aug 27 21:46:09 2020 +0200
> 
> media: vivid: Add support to the CSC API
> 
> The CSC API (Colorspace conversion) allows userspace to try
> to configure the colorspace, transfer function, Y'CbCr/HSV encoding
> and the quantization for capture devices. This patch adds support
> to the CSC API in vivid.
> Using the CSC API, userspace is allowed to do the following:
> 
> - Set the colorspace.
> - Set the xfer_func.
> - Set the ycbcr_enc function for YUV formats.
> - Set the hsv_enc function for HSV formats
> - Set the quantization for YUV and RGB formats.
> 
> Signed-off-by: Dafna Hirschfeld 
> Signed-off-by: Hans Verkuil 
> Signed-off-by: Mauro Carvalho Chehab 
> 
> diff --git a/drivers/media/test-drivers/vivid/vivid-vid-cap.c 
> b/drivers/media/test-drivers/vivid/vivid-vid-cap.c
> index e94beef008c8..eadf28ab1e39 100644
> --- a/drivers/media/test-drivers/vivid/vivid-vid-cap.c
> +++ b/drivers/media/test-drivers/vivid/vivid-vid-cap.c
> @@ -560,6 +560,7 @@ int vivid_try_fmt_vid_cap(struct file *file, void *priv,
>   unsigned factor = 1;
>   unsigned w, h;
>   unsigned p;
> + bool user_set_csc = !!(mp->flags & V4L2_PIX_FMT_FLAG_SET_CSC);
>  
>   fmt = vivid_get_format(dev, mp->pixelformat);
>   if (!fmt) {
> @@ -633,13 +634,30 @@ int vivid_try_fmt_vid_cap(struct file *file, void *priv,
>   (fmt->bit_depth[p] / fmt->vdownsampling[p])) /
>   (fmt->bit_depth[0] / fmt->vdownsampling[0]);
>  
> - mp->colorspace = vivid_colorspace_cap(dev);
> - if (fmt->color_enc == TGP_COLOR_ENC_HSV)
> - mp->hsv_enc = vivid_hsv_enc_cap(dev);
> - else
> + if (!user_set_csc || !v4l2_is_colorspace_valid(mp->colorspace))
> + mp->colorspace = vivid_colorspace_cap(dev);
> +
> + if (!user_set_csc || !v4l2_is_xfer_func_valid(mp->xfer_func))
> + mp->xfer_func = vivid_xfer_func_cap(dev);
> +
> + if (fmt->color_enc == TGP_COLOR_ENC_HSV) {
> + if (!user_set_csc || !v4l2_is_hsv_enc_valid(mp->hsv_enc))
> +  

Re: [PATCH v3 16/16] ARM: Remove custom IRQ stat accounting

2020-09-28 Thread Guillaume Tucker
Hi Marc,

On 24/09/2020 14:09, Guillaume Tucker wrote:
> On 24/09/2020 10:29, Marc Zyngier wrote:
>> Hi Guillaume,
>>
>> On Thu, 24 Sep 2020 10:00:09 +0100,
>> Guillaume Tucker  wrote:
>>>
>>> Hi Marc,
>>>
>>> On 01/09/2020 15:43, Marc Zyngier wrote:
>>>> Let's switch the arm code to the core accounting, which already
>>>> does everything we need.
>>>>
>>>> Reviewed-by: Valentin Schneider 
>>>> Signed-off-by: Marc Zyngier 
>>>> ---
>>>>  arch/arm/include/asm/hardirq.h | 17 -----------------
>>>>  arch/arm/kernel/smp.c          | 20 ++++----------------
>>>>  2 files changed, 4 insertions(+), 33 deletions(-)
>>>
>>> This appears to be causing a NULL pointer dereference on
>>> beaglebone-black, it got bisected automatically several times.
>>> None of the other platforms in the KernelCI labs appears to be
>>> affected.
>>
>> Hmm. My bet is that because this is a UP machine running an SMP
>> kernel, and I fell into the trap of forgetting about this 32bit
>> configuration.
>>
>> I expect the following patch to fix it. Please give it a go if you can
>> (I'm away at the moment and can't test much, and do not have any
>> physical 32bit machine to test this on).
> 
> OK thanks, that worked:
> 
>   https://lava.baylibre.com/scheduler/job/143170
> 
> I've added this fix to the kernel branch used on
> staging.kernelci.org which is based on linux-next, so it will get
> fully verified a bit later today.
> 
> Guillaume
> 
> 
>> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
>> index 00327fa74b01..b4e3d336dc33 100644
>> --- a/arch/arm/kernel/smp.c
>> +++ b/arch/arm/kernel/smp.c
>> @@ -531,7 +531,12 @@ void show_ipi_list(struct seq_file *p, int prec)
>>  unsigned int cpu, i;
>>  
>>  for (i = 0; i < NR_IPI; i++) {
>> -unsigned int irq = irq_desc_get_irq(ipi_desc[i]);
>> +unsigned int irq;
>> +
>> +if (!ipi_desc[i])
>> +continue;
>> +
>> +irq = irq_desc_get_irq(ipi_desc[i]);
>>  seq_printf(p, "%*s%u: ", prec - 1, "IPI", i);
>>  
>>  for_each_online_cpu(cpu)

This fix has now been fully tested, with no visible side effects:

  
https://staging.kernelci.org/test/job/kernelci/branch/staging.kernelci.org/kernel/staging-20200928.1/plan/baseline/

In the meantime, the same issue was detected (without the fix)
and bisected on sun5i-a13-olinuxino-micro and landed on the same
commit.  A few more platforms are also impacted such as imx53-qsb
as mentioned by Fabio.

The commit is in your irqchip tree so I guess we should wait for
you to apply the fix.  If you do make a separate commit to fix
the issue, please add:

  Reported-by: kernelci.org bot 

and also:

  Tested-by: Guillaume Tucker 

Thanks,
Guillaume
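
As an illustration of the guard used in the fix quoted above, the
skip-on-NULL pattern can be sketched in plain C.  Everything named here
is an assumption made for the example: struct mock_irq_desc and
collect_ipi_irqs() are stand-ins that do not exist in the kernel, and
only the "if the descriptor is unset, continue" step is taken from the
patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct irq_desc, reduced to one field. */
struct mock_irq_desc {
	unsigned int irq;
};

/*
 * Walk a table of descriptor pointers and copy out the IRQ number of
 * every populated slot.  Slots left NULL (as on a UP machine running
 * an SMP kernel, where the IPIs were never set up) are skipped rather
 * than dereferenced.  Returns the number of entries written to out[].
 */
static size_t collect_ipi_irqs(struct mock_irq_desc *const descs[],
			       size_t nr, unsigned int out[])
{
	size_t i, count = 0;

	for (i = 0; i < nr; i++) {
		if (!descs[i])
			continue;

		out[count++] = descs[i]->irq;
	}

	return count;
}
```

Without the NULL check, the first unset slot would be dereferenced,
which matches the failure mode seen on beaglebone-black.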


Re: [PATCH v3 16/16] ARM: Remove custom IRQ stat accounting

2020-09-24 Thread Guillaume Tucker
On 24/09/2020 14:34, Fabio Estevam wrote:
> Hi Guillaume,
> 
> On Thu, Sep 24, 2020 at 6:01 AM Guillaume Tucker
>  wrote:
> 
>> This appears to be causing a NULL pointer dereference on
>> beaglebone-black, it got bisected automatically several times.
>> None of the other platforms in the KernelCI labs appears to be
>> affected.
> 
> Actually imx53-qsb is also affected:
> https://storage.kernelci.org/next/master/next-20200924/arm/imx_v6_v7_defconfig/gcc-8/lab-pengutronix/baseline-imx53-qsrb.html
> 
> kernelci marks it Boot result: PASS though.
> 
> Shouldn't kernelci flag a warning or error instead?

Thanks for bringing this up.  The status in the HTML log file is
a very coarse one: in this case the board booted "fine" since it
reached a login prompt.  The issue was detected later when
checking for errors in the kernel log.

But yes you're right, the issue is also impacting imx53-qsrb
indeed.  I didn't spot that because it was only reported as a
regression on staging.kernelci.org, whereas imx53-qsrb is in the
Pengutronix lab which is not sending results there at the moment.

The failures can be found on the production web dashboard though,
but not as regressions:

beaglebone-black:

  https://kernelci.org/test/case/id/5f6c7f1ab7c8c5472cbf9de9/

imx53-qsrb:

  https://kernelci.org/test/case/id/5f6c7ea6f89a9d0f4dbf9ddf/


I need to investigate why that is the case, knowing that the
regression was detected correctly on staging which is the
development KernelCI instance:

  https://staging.kernelci.org/test/plan/id/5f6bea67f724eb1b34dce581/


Thanks,
Guillaume




Re: [PATCH v3 16/16] ARM: Remove custom IRQ stat accounting

2020-09-24 Thread Guillaume Tucker
On 24/09/2020 10:29, Marc Zyngier wrote:
> Hi Guillaume,
> 
> On Thu, 24 Sep 2020 10:00:09 +0100,
> Guillaume Tucker  wrote:
>>
>> Hi Marc,
>>
>> On 01/09/2020 15:43, Marc Zyngier wrote:
>>> Let's switch the arm code to the core accounting, which already
>>> does everything we need.
>>>
>>> Reviewed-by: Valentin Schneider 
>>> Signed-off-by: Marc Zyngier 
>>> ---
>>>  arch/arm/include/asm/hardirq.h | 17 -----------------
>>>  arch/arm/kernel/smp.c          | 20 ++++----------------
>>>  2 files changed, 4 insertions(+), 33 deletions(-)
>>
>> This appears to be causing a NULL pointer dereference on
>> beaglebone-black, it got bisected automatically several times.
>> None of the other platforms in the KernelCI labs appears to be
>> affected.
> 
> Hmm. My bet is that because this is a UP machine running an SMP
> kernel, and I fell into the trap of forgetting about this 32bit
> configuration.
> 
> I expect the following patch to fix it. Please give it a go if you can
> (I'm away at the moment and can't test much, and do not have any
> physical 32bit machine to test this on).

OK thanks, that worked:

  https://lava.baylibre.com/scheduler/job/143170

I've added this fix to the kernel branch used on
staging.kernelci.org which is based on linux-next, so it will get
fully verified a bit later today.

Guillaume


> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index 00327fa74b01..b4e3d336dc33 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -531,7 +531,12 @@ void show_ipi_list(struct seq_file *p, int prec)
>   unsigned int cpu, i;
>  
>   for (i = 0; i < NR_IPI; i++) {
> - unsigned int irq = irq_desc_get_irq(ipi_desc[i]);
> + unsigned int irq;
> +
> + if (!ipi_desc[i])
> + continue;
> +
> + irq = irq_desc_get_irq(ipi_desc[i]);
>   seq_printf(p, "%*s%u: ", prec - 1, "IPI", i);
>  
>   for_each_online_cpu(cpu)
> 
> Thanks,
> 
>   M.
> 



Re: [PATCH v3 16/16] ARM: Remove custom IRQ stat accounting

2020-09-24 Thread Guillaume Tucker
Hi Marc,

On 01/09/2020 15:43, Marc Zyngier wrote:
> Let's switch the arm code to the core accounting, which already
> does everything we need.
> 
> Reviewed-by: Valentin Schneider 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm/include/asm/hardirq.h | 17 -----------------
>  arch/arm/kernel/smp.c          | 20 ++++----------------
>  2 files changed, 4 insertions(+), 33 deletions(-)

This appears to be causing a NULL pointer dereference on
beaglebone-black, it got bisected automatically several times.
None of the other platforms in the KernelCI labs appears to be
affected.

Here's the error in the full job log, with next-20200923:

  
https://storage.staging.kernelci.org/kernelci/staging.kernelci.org/staging-20200924.0/arm/multi_v7_defconfig/gcc-8/lab-baylibre/baseline-beaglebone-black.html#L460

and some meta-data:

  https://staging.kernelci.org/test/case/id/5f6bea67f724eb1b34dce584/

The full bisection report is available here:

  https://groups.io/g/kernelci-results-staging/message/2094

I've also run it again with a debug build to locate the problem,
see below.


> diff --git a/arch/arm/include/asm/hardirq.h b/arch/arm/include/asm/hardirq.h
> index 7a88f160b1fb..b95848ed2bc7 100644
> --- a/arch/arm/include/asm/hardirq.h
> +++ b/arch/arm/include/asm/hardirq.h
> @@ -6,29 +6,12 @@
>  #include 
>  #include 
>  
> -/* number of IPIS _not_ including IPI_CPU_BACKTRACE */
> -#define NR_IPI   7
> -
>  typedef struct {
>   unsigned int __softirq_pending;
> -#ifdef CONFIG_SMP
> - unsigned int ipi_irqs[NR_IPI];
> -#endif
>  } cacheline_aligned irq_cpustat_t;
>  
>  #include/* Standard mappings for irq_cpustat_t 
> above */
>  
> -#define __inc_irq_stat(cpu, member)  __IRQ_STAT(cpu, member)++
> -#define __get_irq_stat(cpu, member)  __IRQ_STAT(cpu, member)
> -
> -#ifdef CONFIG_SMP
> -u64 smp_irq_stat_cpu(unsigned int cpu);
> -#else
> -#define smp_irq_stat_cpu(cpu)	0
> -#endif
> -
> -#define arch_irq_stat_cpu	smp_irq_stat_cpu
> -
>  #define __ARCH_IRQ_EXIT_IRQS_DISABLED	1
>  
>  #endif /* __ASM_HARDIRQ_H */
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index d51e64955a26..aead847ac8b9 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -65,6 +65,7 @@ enum ipi_msg_type {
>   IPI_CPU_STOP,
>   IPI_IRQ_WORK,
>   IPI_COMPLETION,
> + NR_IPI,
>   /*
>* CPU_BACKTRACE is special and not included in NR_IPI
>* or tracable with trace_ipi_*
> @@ -529,27 +530,16 @@ void show_ipi_list(struct seq_file *p, int prec)
>   unsigned int cpu, i;
>  
>   for (i = 0; i < NR_IPI; i++) {
> + unsigned int irq = irq_desc_get_irq(ipi_desc[i]);

It looks like irq_desc_get_irq() gets called with a NULL
pointer (well, 0x0000001c):

(gdb) l *0xc030ef38
0xc030ef38 is in show_ipi_list (../include/linux/irqdesc.h:123).
118		return container_of(data->common, struct irq_desc, irq_common_data);
119	}
120	
121	static inline unsigned int irq_desc_get_irq(struct irq_desc *desc)
122	{
123		return desc->irq_data.irq;
124	}
125	
126	static inline struct irq_data *irq_desc_get_irq_data(struct irq_desc *desc)
127	{

Full job log: https://lava.baylibre.com/scheduler/job/142375#L727

I haven't looked any further but hopefully this should be a good
enough clue to find the root cause.  I don't know if you have a
platform at hand to reproduce the issue, please let me know if
you need some help with debugging or testing a fix.

Hope this helps,
Guillaume
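
To illustrate why a NULL descriptor shows up as a small non-zero
faulting address rather than 0, here is a rough sketch.  The struct
layout below is entirely made up for the example (the real struct
irq_desc is laid out differently and depends on the kernel config);
the point is only that reading desc->irq_data.irq through a NULL desc
accesses the address equal to the offset of that member.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, chosen only so that the numbers are easy to follow. */
struct mock_irq_common_data {
	uint32_t words[6];		/* 24 bytes of unrelated fields */
};

struct mock_irq_data {
	uint32_t mask;
	uint32_t irq;			/* the member being read */
};

struct mock_irq_desc {
	struct mock_irq_common_data irq_common_data;
	struct mock_irq_data irq_data;
};

/*
 * Address that would be accessed when reading desc->irq_data.irq with
 * desc == NULL: the offset of the member from the start of the struct.
 */
static size_t null_desc_fault_offset(void)
{
	return offsetof(struct mock_irq_desc, irq_data) +
	       offsetof(struct mock_irq_data, irq);
}
```

With this made-up layout the offset is 24 + 4 = 28 bytes, i.e. 0x1c.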


>   seq_printf(p, "%*s%u: ", prec - 1, "IPI", i);
>  
>   for_each_online_cpu(cpu)
> - seq_printf(p, "%10u ",
> -__get_irq_stat(cpu, ipi_irqs[i]));
> + seq_printf(p, "%10u ", kstat_irqs_cpu(irq, cpu));
>  
>   seq_printf(p, " %s\n", ipi_types[i]);
>   }
>  }
>  
> -u64 smp_irq_stat_cpu(unsigned int cpu)
> -{
> - u64 sum = 0;
> - int i;
> -
> - for (i = 0; i < NR_IPI; i++)
> - sum += __get_irq_stat(cpu, ipi_irqs[i]);
> -
> - return sum;
> -}
> -
>  void arch_send_call_function_ipi_mask(const struct cpumask *mask)
>  {
>   smp_cross_call(mask, IPI_CALL_FUNC);
> @@ -630,10 +620,8 @@ static void do_handle_IPI(int ipinr)
>  {
>   unsigned int cpu = smp_processor_id();
>  
> - if ((unsigned)ipinr < NR_IPI) {
> + if ((unsigned)ipinr < NR_IPI)
>   trace_ipi_entry_rcuidle(ipi_types[ipinr]);
> - __inc_irq_stat(cpu, ipi_irqs[ipinr]);
> - }
>  
>   switch (ipinr) {
>   case IPI_WAKEUP:
> 



Re: [PATCH] cma: make number of CMA areas dynamic, remove CONFIG_CMA_AREAS

2020-09-16 Thread Guillaume Tucker
On 16/09/2020 17:30, Mike Kravetz wrote:
> On 9/16/20 2:14 AM, Song Bao Hua (Barry Song) wrote:
>>>> -----Original Message-----
>>>> From: Mike Kravetz [mailto:mike.krav...@oracle.com]
>>>> Sent: Wednesday, September 16, 2020 8:57 AM
>>>> To: linux...@kvack.org; linux-kernel@vger.kernel.org;
>>>> linux-arm-ker...@lists.infradead.org; linux-m...@vger.kernel.org
>>>> Cc: Roman Gushchin; Song Bao Hua (Barry Song); Mike Rapoport;
>>>> Joonsoo Kim; Rik van Riel; Aslan Bakirov; Michal Hocko;
>>>> Andrew Morton; Mike Kravetz
>>>> Subject: [PATCH] cma: make number of CMA areas dynamic, remove
>>>> CONFIG_CMA_AREAS
>>>>
>>>> The number of distinct CMA areas is limited by the constant
>>>> CONFIG_CMA_AREAS.  In most environments, this was set to a default
>>>> value of 7.  Not too long ago, support was added to allocate hugetlb
>>>> gigantic pages from CMA.  More recent changes to make dma_alloc_coherent
>>>> NUMA-aware on arm64 added more potential users of CMA areas.  Along
>>>> with the dma_alloc_coherent changes, the default value of CMA_AREAS
>>>> was bumped up to 19 if NUMA is enabled.
>>>>
>>>> It seems that the number of CMA users is likely to grow.  Instead of
>>>> using a static array for cma areas, use a simple linked list.  These
>>>> areas are used before normal memory allocators, so use the memblock
>>>> allocator.
>>>>
>>>> Acked-by: Roman Gushchin
>>>> Signed-off-by: Mike Kravetz
>>>> ---
>>>> rfc->v1
>>>>   - Made minor changes suggested by Song Bao Hua (Barry Song)
>>>>   - Removed check for late calls to cma_init_reserved_mem that was part
>>>>     of RFC.
>>>>   - Added ACK from Roman Gushchin
>>>>   - Still in need of arm testing
>>>
>>> Unfortunately, the test result on my arm64 board is negative, Linux can't
>>> boot after applying this patch.
>>>
>>> I guess we have to hold on this patch for a while till this is fixed. BTW,
>>> Mike, do you have a qemu-based arm64 numa system to debug? It is very easy
>>> to reproduce, we don't need to use hugetlb_cma and pernuma_cma. Just the
>>> default cma will make the boot hang.
>>
>> Hi Mike,
>> I spent some time on debugging the boot issue and sent a patch here:
>> https://lore.kernel.org/linux-mm/20200916085933.25220-1-song.bao@hisilicon.com/
>> All details and the kernel oops can be found there.
>> Please feel free to merge my patch into your v2 if you want. And we
>> probably need ack from arm maintainers.
>>
>> Also,  +Will,
>>
>> Hi Will, the whole story is that Mike tried to remove the cma array with
>> CONFIG_CMA_AREAS and moved to use memblock_alloc() to allocate cma area,
>> so that the number of cma areas could be dynamic. It turns out it causes
>> a kernel panic on arm64 during system boot as the returned address from
>> memblock_alloc is invalid before paging_init() is done on arm64.
>>
> 
> Thank you!
> 
> Based on your analysis, I am concerned that other architectures may also
> have issues.
> 
> Andrew,
> I suggest we remove this patch from your tree.  I will audit all architectures
> which enable CMA and look for similar issues there.  Will then merge Barry's
> patch into a V2 with any other arch specific changes.

FYI This was also bisected on kernelci.org[1] and it landed on
this commit: c999bd436fe9 ("mm/cma: make number of CMA areas
dynamic, remove CONFIG_CMA_AREAS").  Only arm and arm64 seem to
be affected, and not with all the builds:

  https://kernelci.org/test/job/next/branch/master/kernel/next-20200916/plan/baseline/

The list of failures above might help someone debug the issue
with a platform they have at hand.

Guillaume

[1] https://groups.io/g/kernelci-results-staging/message/2027
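
For readers skimming the thread, the trade-off between the two approaches
discussed above can be sketched in userspace C.  All names here
(demo_area, demo_register_static, demo_register_dynamic) are hypothetical,
and malloc() only stands in for an early boot allocator; this is not the
code from the patch.

```c
#include <stddef.h>
#include <stdlib.h>

#define DEMO_MAX_AREAS 7	/* mirrors the default limit quoted above */

struct demo_area {
	unsigned long base_pfn;
	unsigned long count;
	struct demo_area *next;	/* used by the list variant only */
};

/* Variant 1: fixed array, registration fails once the table is full. */
static struct demo_area demo_table[DEMO_MAX_AREAS];
static unsigned int demo_table_used;

static int demo_register_static(unsigned long base_pfn, unsigned long count)
{
	if (demo_table_used == DEMO_MAX_AREAS)
		return -1;
	demo_table[demo_table_used].base_pfn = base_pfn;
	demo_table[demo_table_used].count = count;
	demo_table_used++;
	return 0;
}

/* Variant 2: linked list; malloc() stands in for an early allocator. */
static struct demo_area *demo_list;
static unsigned int demo_list_len;

static int demo_register_dynamic(unsigned long base_pfn, unsigned long count)
{
	struct demo_area *area = malloc(sizeof(*area));

	if (!area)
		return -1;
	area->base_pfn = base_pfn;
	area->count = count;
	area->next = demo_list;
	demo_list = area;
	demo_list_len++;
	return 0;
}
```

The list variant removes the fixed limit, but it only works if the
allocator returns usable memory at the time of the call.  That dependency
is what the arm64 report in this thread points at.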


Re: [PATCH v2 1/4] ARM: exynos: clear L310_AUX_CTRL_NS_LOCKDOWN in default l2c_aux_val

2020-09-01 Thread Guillaume Tucker
On 01/09/2020 16:25, Krzysztof Kozlowski wrote:
> On Tue, 1 Sep 2020 at 16:42, Guillaume Tucker
>  wrote:
>>
>> On 01/09/2020 14:51, Krzysztof Kozlowski wrote:
>>> On Tue, 1 Sep 2020 at 15:45, Krzysztof Kozlowski  wrote:
>>>>
>>>> On Tue, 1 Sep 2020 at 15:34, Guillaume Tucker
>>>>  wrote:
>>>>>
>>>>> Hi Krzysztof, Russell,
>>>>>
>>>>> On 10/08/2020 13:22, Guillaume Tucker wrote:
>>>>>> The L310_AUX_CTRL_NS_LOCKDOWN flag is set during the L2C enable
>>>>>> sequence.  There is no need to set it in the default register value,
>>>>>> this was done before support for it was implemented in the code.  It
>>>>>> is not set in the hardware initial value either.
>>>>>>
>>>>>> Clean this up by removing this flag from the default l2c_aux_val, and
>>>>>> add it to the l2c_aux_mask to print an alert message if it was already
>>>>>> set before the kernel initialisation.
>>>>>>
>>>>>> Signed-off-by: Guillaume Tucker 
>>>>>> ---
>>>>>>
>>>>>> Notes:
>>>>>> v2: fix flag name L310_AUX_CTRL_NS_LOCKDOWN
>>>>>>
>>>>>>  arch/arm/mach-exynos/exynos.c | 4 ++--
>>>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> I believe this v2 series has addressed all previous comments and
>>>>> you were waiting for the 5.9 merge window to end.  The patches
>>>>> all still apply cleanly on v5.9-rc3.  Do you want me to resend
>>>>> the series anyway or is there anything else needed at this point?
>>>>>
>>>>> Maybe one thing that wasn't completely clear in v1 was whether
>>>>> patch 2/4 was the right approach.  I've explained the reason
>>>>> behind it but didn't get a final reply from Russell[1].
>>>>
>>>> I am sorry, my bad. I already applied this one and 3/4 (dts).
>>>> Apparently I forgot to reply with confirmation and Patchwork did not
>>>> notify you for some reason.
>>
>> No problem, I see them in linux-next now.  Thanks!
>>
>>>> Patch 2/4 does not look like one for me so I would need ack from
>>>> Russell to take. Did you submit it to the ARM patches queue?
>>
>> I've CC-ed linux-arm-ker...@lists.infradead.org on the whole
>> series.  Did you mean anything else by the ARM patches queue?
> 
> Unless anything changed, so far all ARM-core related patches had to be
> submitted to Russell's system. I didn't submit anything for 3 years so
> maybe something changed...
> https://www.arm.linux.org.uk/developer/patches/

Ah yes, thanks.  I hadn't visited that website for ages...  The
patch 2/4 is there now:

  https://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=9007/1

Best wishes,
Guillaume


Re: [PATCH v2 1/4] ARM: exynos: clear L310_AUX_CTRL_NS_LOCKDOWN in default l2c_aux_val

2020-09-01 Thread Guillaume Tucker
On 01/09/2020 14:51, Krzysztof Kozlowski wrote:
> On Tue, 1 Sep 2020 at 15:45, Krzysztof Kozlowski  wrote:
>>
>> On Tue, 1 Sep 2020 at 15:34, Guillaume Tucker
>>  wrote:
>>>
>>> Hi Krzysztof, Russell,
>>>
>>> On 10/08/2020 13:22, Guillaume Tucker wrote:
>>>> The L310_AUX_CTRL_NS_LOCKDOWN flag is set during the L2C enable
>>>> sequence.  There is no need to set it in the default register value,
>>>> this was done before support for it was implemented in the code.  It
>>>> is not set in the hardware initial value either.
>>>>
>>>> Clean this up by removing this flag from the default l2c_aux_val, and
>>>> add it to the l2c_aux_mask to print an alert message if it was already
>>>> set before the kernel initialisation.
>>>>
>>>> Signed-off-by: Guillaume Tucker 
>>>> ---
>>>>
>>>> Notes:
>>>> v2: fix flag name L310_AUX_CTRL_NS_LOCKDOWN
>>>>
>>>>  arch/arm/mach-exynos/exynos.c | 4 ++--
>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> I believe this v2 series has addressed all previous comments and
>>> you were waiting for the 5.9 merge window to end.  The patches
>>> all still apply cleanly on v5.9-rc3.  Do you want me to resend
>>> the series anyway or is there anything else needed at this point?
>>>
>>> Maybe one thing that wasn't completely clear in v1 was whether
>>> patch 2/4 was the right approach.  I've explained the reason
>>> behind it but didn't get a final reply from Russell[1].
>>
>> I am sorry, my bad. I already applied this one and 3/4 (dts).
>> Apparently I forgot to reply with confirmation and Patchwork did not
>> notify you for some reason.

No problem, I see them in linux-next now.  Thanks!

>> Patch 2/4 does not look like one for me so I would need ack from
>> Russell to take. Did you submit it to the ARM patches queue?

I've CC-ed linux-arm-ker...@lists.infradead.org on the whole
series.  Did you mean anything else by the ARM patches queue?

>> Patch 4/4 will wait for v5.10-rc1 as it depends on 1/4 and it is DTS patch.
> 
> Correct: Patch 4/4 will wait for v5.10 because it depends on the DTS patch.

Sure, in fact patch 4/4 depends on the DTS one (3/4) and also on
the l2c fix (2/4) as otherwise prefetch would actually not be
enabled.  So it sounds like both remaining ones 2/4 and 4/4 are
actually now pending Russell's ack.

Best wishes,
Guillaume


[1] 
https://lore.kernel.org/lkml/46fa1159-fcd6-b528-b8e8-2fba04823...@collabora.com/


Re: [PATCH v2 1/4] ARM: exynos: clear L310_AUX_CTRL_NS_LOCKDOWN in default l2c_aux_val

2020-09-01 Thread Guillaume Tucker
Hi Krzysztof, Russell,

On 10/08/2020 13:22, Guillaume Tucker wrote:
> The L310_AUX_CTRL_NS_LOCKDOWN flag is set during the L2C enable
> sequence.  There is no need to set it in the default register value,
> this was done before support for it was implemented in the code.  It
> is not set in the hardware initial value either.
> 
> Clean this up by removing this flag from the default l2c_aux_val, and
> add it to the l2c_aux_mask to print an alert message if it was already
> set before the kernel initialisation.
> 
> Signed-off-by: Guillaume Tucker 
> ---
> 
> Notes:
> v2: fix flag name L310_AUX_CTRL_NS_LOCKDOWN
> 
>  arch/arm/mach-exynos/exynos.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

I believe this v2 series has addressed all previous comments and
you were waiting for the 5.9 merge window to end.  The patches
all still apply cleanly on v5.9-rc3.  Do you want me to resend
the series anyway or is there anything else needed at this point?

Maybe one thing that wasn't completely clear in v1 was whether
patch 2/4 was the right approach.  I've explained the reason
behind it but didn't get a final reply from Russell[1].

Best wishes,
Guillaume


[1] 
https://lore.kernel.org/lkml/46fa1159-fcd6-b528-b8e8-2fba04823...@collabora.com/


> diff --git a/arch/arm/mach-exynos/exynos.c b/arch/arm/mach-exynos/exynos.c
> index 36c3785a..a96f3353a0c1 100644
> --- a/arch/arm/mach-exynos/exynos.c
> +++ b/arch/arm/mach-exynos/exynos.c
> @@ -193,8 +193,8 @@ static void __init exynos_dt_fixup(void)
>  }
>  
>  DT_MACHINE_START(EXYNOS_DT, "Samsung Exynos (Flattened Device Tree)")
> - .l2c_aux_val= 0x3c40,
> - .l2c_aux_mask   = 0xc20f,
> + .l2c_aux_val= 0x3840,
> + .l2c_aux_mask   = 0xc60f,
>   .smp= smp_ops(exynos_smp_ops),
>   .map_io = exynos_init_io,
>   .init_early = exynos_firmware_init,
> 



Re: mainline/master bisection: baseline.login on mt8173-elm-hana

2020-08-17 Thread Guillaume Tucker
Please see the bisection report below about a boot failure.

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org but this one
looks valid.

The log doesn't appear to be showing anything from the kernel, so
it's likely to be crashing very early.  Please let us know if you
need some help with investigating this issue, to try booting with
earlyprintk or anything.

Hope this helps.

Thanks,
Guillaume

On 16/08/2020 12:44, KernelCI bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has  *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.  *
> *   *
> * If you do send a fix, please include this trailer:*
> *   Reported-by: "kernelci.org bot"   *
> *   *
> * Hope this helps!  *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> mainline/master bisection: baseline.login on mt8173-elm-hana
> 
> Summary:
>   Start:  a1d21081a60d Merge 
> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
>   Plain log:  
> https://storage.kernelci.org/mainline/master/v5.8-13249-ga1d21081a60d/arm64/defconfig+CONFIG_RANDOMIZE_BASE=y/gcc-8/lab-collabora/baseline-mt8173-elm-hana.txt
>   HTML log:   
> https://storage.kernelci.org/mainline/master/v5.8-13249-ga1d21081a60d/arm64/defconfig+CONFIG_RANDOMIZE_BASE=y/gcc-8/lab-collabora/baseline-mt8173-elm-hana.html
>   Result: f97dbf48ca43 irqchip/mtk-sysirq: Convert to a platform driver
> 
> Checks:
>   revert: PASS
>   verify: PASS
> 
> Parameters:
>   Tree:   mainline
>   URL:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>   Branch: master
>   Target: mt8173-elm-hana
>   CPU arch:   arm64
>   Lab:lab-collabora
>   Compiler:   gcc-8
>   Config: defconfig+CONFIG_RANDOMIZE_BASE=y
>   Test case:  baseline.login
> 
> Breaking commit found:
> 
> ---
> commit f97dbf48ca43009e8b8bcdf07f47fc9f06149b36
> Author: Saravana Kannan 
> Date:   Fri Jul 17 17:06:36 2020 -0700
> 
> irqchip/mtk-sysirq: Convert to a platform driver
> 
> This driver can work as a platform driver. So covert it to a platform
> driver.
> 
> Signed-off-by: Saravana Kannan 
> Signed-off-by: Marc Zyngier 
> Reviewed-by: Hanks Chen 
> Link: 
> https://lore.kernel.org/r/20200718000637.3632841-4-sarava...@google.com
> 
> diff --git a/drivers/irqchip/irq-mtk-sysirq.c 
> b/drivers/irqchip/irq-mtk-sysirq.c
> index 6ff98b87e5c0..7299c5ab4d10 100644
> --- a/drivers/irqchip/irq-mtk-sysirq.c
> +++ b/drivers/irqchip/irq-mtk-sysirq.c
> @@ -231,4 +231,6 @@ static int __init mtk_sysirq_of_init(struct device_node 
> *node,
>   kfree(chip_data);
>   return ret;
>  }
> -IRQCHIP_DECLARE(mtk_sysirq, "mediatek,mt6577-sysirq", mtk_sysirq_of_init);
> +IRQCHIP_PLATFORM_DRIVER_BEGIN(mtk_sysirq)
> +IRQCHIP_MATCH("mediatek,mt6577-sysirq", mtk_sysirq_of_init)
> +IRQCHIP_PLATFORM_DRIVER_END(mtk_sysirq)
> ---
> 
> 
> Git bisection log:
> 
> ---
> git bisect start
> # good: [e4cbce4d131753eca271d9d67f58c6377f27ad21] Merge tag 
> 'sched-core-2020-08-03' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect good e4cbce4d131753eca271d9d67f58c6377f27ad21
> # bad: [a1d21081a60dfb7fddf4a38b66d9cef603b317a9] Merge 
> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
> git bisect bad a1d21081a60dfb7fddf4a38b66d9cef603b317a9
> # skip: [47ec5303d73ea344e84f46660fff693c57641386] Merge 
> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
> git bisect skip 47ec5303d73ea344e84f46660fff693c57641386
> # good: [99f0975d760b6320dd5490bd0bc3a31900284757] dt-bindings: i2c: 
> renesas,iic: Document r8a774e1 support
> git bisect good 99f0975d760b6320dd5490bd0bc3a31900284757
> # good: [641ca08547f83bd265477150a66cf2378bc98ed7] nfp: convert to new 
> udp_tunnel_nic infra
> git bisect good 641ca08547f83bd265477150a66cf2378bc98ed7
> # good: [38c392cef19019457ddcfb197ff3d9c5267698e6] powerpc/pseries: remove 
> dlpar_cpu_readd()
> git bisect good 38c392cef19019457ddcfb197ff3d9c5267698e6
> # skip: [cc3365bbd07c26aa2e4c7435068292e03116d4e7] perf tools: Add 
> clockid_name function
> git bisect skip cc3365bbd07c26aa2e4c7435068292e03116d4e7
> # good: [1ea528b0963040273471dc904bf8b0a243741d9f] drm/komeda: Use GEM CMA 
> object functions
> git bisect good 1ea528b0963040273471dc904bf8b0a243741d9f
> # good: 

Re: mainline/master bisection: baseline.bootrr.rockchip-pcie-probed on rk3399-gru-kevin

2020-08-17 Thread Guillaume Tucker
Hi,

Please see the bisection report below about a driver probe
regression with rockchip-pcie.

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org but this one
looks valid.

It seems to be due to this error:

  <6>[   16.842128] rockchip-pcie f800.pcie: no vpcie12v regulator found

Full log:

  https://storage.kernelci.org/mainline/master/v5.8-13249-ga1d21081a60d/arm64/defconfig/gcc-8/lab-collabora/baseline-rk3399-gru-kevin.html

The issue was not there in v5.8 when the driver was probing fine:

  https://storage.kernelci.org/mainline/master/v5.8/arm64/defconfig/gcc-8/lab-collabora/baseline-rk3399-gru-kevin.html

Hope this helps.

Thanks,
Guillaume

On 16/08/2020 16:18, KernelCI bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has  *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.  *
> *   *
> * If you do send a fix, please include this trailer:*
> *   Reported-by: "kernelci.org bot"   *
> *   *
> * Hope this helps!  *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> mainline/master bisection: baseline.bootrr.rockchip-pcie-probed on 
> rk3399-gru-kevin
> 
> Summary:
>   Start:  a1d21081a60d Merge 
> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
>   Plain log:  
> https://storage.kernelci.org/mainline/master/v5.8-13249-ga1d21081a60d/arm64/defconfig/gcc-8/lab-collabora/baseline-rk3399-gru-kevin.txt
>   HTML log:   
> https://storage.kernelci.org/mainline/master/v5.8-13249-ga1d21081a60d/arm64/defconfig/gcc-8/lab-collabora/baseline-rk3399-gru-kevin.html
>   Result: 2f96593ecc37 of_address: Add bus type match for pci ranges 
> parser
> 
> Checks:
>   revert: PASS
>   verify: PASS
> 
> Parameters:
>   Tree:   mainline
>   URL:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>   Branch: master
>   Target: rk3399-gru-kevin
>   CPU arch:   arm64
>   Lab:lab-collabora
>   Compiler:   gcc-8
>   Config: defconfig
>   Test case:  baseline.bootrr.rockchip-pcie-probed
> 
> Breaking commit found:
> 
> ---
> commit 2f96593ecc37e98bf99525f0629128080533867f
> Author: Jiaxun Yang 
> Date:   Tue Jul 28 23:36:55 2020 +0800
> 
> of_address: Add bus type match for pci ranges parser
> 
> So the parser can be used to parse range property of ISA bus.
> 
> As they're all using PCI-like method of range property, there is no need
> start a new parser.
> 
> Signed-off-by: Jiaxun Yang 
> Reviewed-by: Rob Herring 
> Signed-off-by: Thomas Bogendoerfer 
> 
> diff --git a/drivers/of/address.c b/drivers/of/address.c
> index 8eea3f6e29a4..813936d419ad 100644
> --- a/drivers/of/address.c
> +++ b/drivers/of/address.c
> @@ -49,6 +49,7 @@ struct of_bus {
>   u64 (*map)(__be32 *addr, const __be32 *range,
>   int na, int ns, int pna);
>   int (*translate)(__be32 *addr, u64 offset, int na);
> + boolhas_flags;
>   unsigned int(*get_flags)(const __be32 *addr);
>  };
>  
> @@ -364,6 +365,7 @@ static struct of_bus of_busses[] = {
>   .count_cells = of_bus_pci_count_cells,
>   .map = of_bus_pci_map,
>   .translate = of_bus_pci_translate,
> + .has_flags = true,
>   .get_flags = of_bus_pci_get_flags,
>   },
>  #endif /* CONFIG_PCI */
> @@ -375,6 +377,7 @@ static struct of_bus of_busses[] = {
>   .count_cells = of_bus_isa_count_cells,
>   .map = of_bus_isa_map,
>   .translate = of_bus_isa_translate,
> + .has_flags = true,
>   .get_flags = of_bus_isa_get_flags,
>   },
>   /* Default */
> @@ -698,9 +701,10 @@ static int parser_init(struct of_pci_range_parser 
> *parser,
>  
>   parser->node = node;
>   parser->pna = of_n_addr_cells(node);
> - parser->na = of_bus_n_addr_cells(node);
> - parser->ns = of_bus_n_size_cells(node);
>   parser->dma = !strcmp(name, "dma-ranges");
> + parser->bus = of_match_bus(node);
> +
> + parser->bus->count_cells(parser->node, &parser->na, &parser->ns);
>  
>   parser->range = of_get_property(node, name, &rlen);
>   if (parser->range == NULL)
> @@ -732,6 +736,7 @@ struct of_pci_range *of_pci_range_parser_one(struct 
> of_pci_range_parser *parser,
>   int na = parser->na;
>   int ns = parser->ns;
>   int np = parser->pna + na + ns;
> + int busflag_na = 0;
>  
>   if (!range)
>   return 

Re: [PATCH 2/3] ARM: l2c: update prefetch bits in L2X0_AUX_CTRL using DT value

2020-08-10 Thread Guillaume Tucker
On 29/07/2020 17:22, Guillaume Tucker wrote:
> On 29/07/2020 15:18, Russell King - ARM Linux admin wrote:
>> On Wed, Jul 29, 2020 at 02:47:32PM +0100, Guillaume Tucker wrote:
>>> The L310_PREFETCH_CTRL register bits 28 and 29 to enable data and
>>> instruction prefetch respectively can also be accessed via the
>>> L2X0_AUX_CTRL register.  They appear to be actually wired together in
>>> hardware between the registers.  Changing them in the prefetch
>>> register only will get undone when restoring the aux control register
>>> later on.  For this reason, set these bits in both registers during
>>> initialisation according to the DT attributes.
>>
>> How will that happen?
>>
>> We write the auxiliary control register before the prefetch control
>> register, so the prefetch control register will take precedence.  See
>> l2c310_configure() - l2c_configure() writes the auxiliary control
>> register, and the function writes the prefetch control register later.
> 
> What I'm seeing is that outer_cache.configure() gets called, at
> least on exynos4412-odroidx2.  See l2c_enable():
> 
>   if (outer_cache.configure)
>   outer_cache.configure(&l2x0_saved_regs);
>   else
>   l2x0_data->configure(base);
> 
> Then instead of l2c310_configure(), exynos_l2_configure() gets
> called and writes prefetch_ctrl right before aux_ctrl.  Should
> exynos_l2_configure() be changed to swap the register writes?
> 
> 
>> I think the real issue is that Exynos has been modifying the prefetch
>> settings via its machine .aux_mask / .aux_val configuration, and the
>> opposite is actually true: the prefetch control register values will
>> overwrite the attempt to modify the auxiliary control values set through
>> the machine .aux_mask/.aux_val.
> 
> Yes with l2c310_configure() but not with exynos_l2_configure().
> 
> To be clear, this is what I've found to be happening, if you
> switch to using the device tree prefetch attributes and clear
> the bits in the default l2c_aux_val (see PATCH 3/3):
> 
> 1. l2x0_of_init() first gets called with the default aux_val
> 
> 2. l2c310_of_parse() sets the bits in l2x0_saved_regs.prefetch_ctrl
>but not in aux_val (unless you apply this patch 2/3)
> 
> 3. l2c_enable() calls exynos_l2_configure() which writes
>prefetch_ctrl and then aux_ctrl - thus setting the prefetch bits
>and then clearing them just after
> 
> 4. l2c310_enable() reads back aux_ctrl and prefetch, both of which
>now have the bits cleared (the pr_info() message about prefetch
>enabled gets skipped)
> 
> 
> That's why I thought it would be safer to set the prefetch bits
> in both registers so it should work regardless of the
> initialisation sequence.  Also, if we want these bits to be
> changed, we should clear them in the aux_mask value to not get
> another error message about register corruption - so I'm doing
> that too.
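
Steps 1 to 4 above can be reproduced with a toy model of the two
registers.  The sketch assumes, as the description suggests, that bits 28
and 29 are a single piece of state visible through both registers, so the
later write wins.  The demo_* names are hypothetical and nothing here
touches real hardware or the real driver.

```c
#include <stdint.h>

#define DEMO_BIT(n)		(UINT32_C(1) << (n))
#define DEMO_SHARED_BITS	(DEMO_BIT(28) | DEMO_BIT(29))

/* Toy model: the two prefetch enable bits are mirrored in both registers. */
struct demo_l2c_hw {
	uint32_t aux_ctrl;
	uint32_t prefetch_ctrl;
};

static void demo_write_aux(struct demo_l2c_hw *hw, uint32_t val)
{
	hw->aux_ctrl = val;
	hw->prefetch_ctrl = (hw->prefetch_ctrl & ~DEMO_SHARED_BITS) |
			    (val & DEMO_SHARED_BITS);
}

static void demo_write_prefetch(struct demo_l2c_hw *hw, uint32_t val)
{
	hw->prefetch_ctrl = val;
	hw->aux_ctrl = (hw->aux_ctrl & ~DEMO_SHARED_BITS) |
		       (val & DEMO_SHARED_BITS);
}

/* Order described for exynos_l2_configure(): prefetch first, then aux. */
static uint32_t demo_prefetch_then_aux(uint32_t prefetch, uint32_t aux)
{
	struct demo_l2c_hw hw = { 0, 0 };

	demo_write_prefetch(&hw, prefetch);
	demo_write_aux(&hw, aux);
	return hw.prefetch_ctrl & DEMO_SHARED_BITS;
}

/* Order described for l2c310_configure(): aux first, then prefetch. */
static uint32_t demo_aux_then_prefetch(uint32_t prefetch, uint32_t aux)
{
	struct demo_l2c_hw hw = { 0, 0 };

	demo_write_aux(&hw, aux);
	demo_write_prefetch(&hw, prefetch);
	return hw.prefetch_ctrl & DEMO_SHARED_BITS;
}
```

In this model, a prefetch value with both bits set and an aux value with
both bits clear ends up with prefetch disabled in the first ordering and
enabled in the second.  Setting the bits in both values gives the same
result either way, which is the argument for patch 2/3.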

I've kept this patch as-is in the v2 because I wasn't sure
whether you wanted the issue to be addressed differently in the
end.  I just made it a bit clearer in the commit message that
it's fixing an issue when using the DT prefetch properties.
Please let me know if you want me to rework this in any way.

Thanks,
Guillaume

>>> Fixes: ec3bd0e68a67 ("ARM: 8391/1: l2c: add options to overwrite 
>>> prefetching behavior")
>>> Signed-off-by: Guillaume Tucker 
>>> ---
>>>  arch/arm/mm/cache-l2x0.c | 16 ++++++++++++----
>>>  1 file changed, 12 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
>>> index 12c26eb88afb..43d91bfd2360 100644
>>> --- a/arch/arm/mm/cache-l2x0.c
>>> +++ b/arch/arm/mm/cache-l2x0.c
>>> @@ -1249,20 +1249,28 @@ static void __init l2c310_of_parse(const struct device_node *np,
>>>  
>>> ret = of_property_read_u32(np, "prefetch-data", &val);
>>> if (ret == 0) {
>>> -   if (val)
>>> +   if (val) {
>>> prefetch |= L310_PREFETCH_CTRL_DATA_PREFETCH;
>>> -   else
>>> +   *aux_val |= L310_PREFETCH_CTRL_DATA_PREFETCH;
>>> +   } else {
>>> prefetch &= ~L310_PREFETCH_CTRL_DATA_PREFETCH;
>>> +   *aux_val &= ~L310_PREFETCH_CTRL_DATA_PREFETCH;
>>> +   }
>>> +   *aux_mask &= ~L310_PREFETCH_CTRL_DATA_PREFETCH;
>>> } else if (ret != -EINVAL) {
>>> pr_err("L2C-310 OF prefetch-data property value is missing\n");
>>> 

Re: [PATCH 3/3] ARM: exynos: use DT prefetch attributes rather than l2c_aux_val

2020-08-10 Thread Guillaume Tucker
On 03/08/2020 14:13, Krzysztof Kozlowski wrote:
> On Wed, Jul 29, 2020 at 02:47:33PM +0100, Guillaume Tucker wrote:
>> Use the standard l2c2x0 device tree bindings to enable data and
>> instruction prefetch on exynos4210 and exynos4412 and clear the
>> respective bits in the default l2c_aux_val.  No other Exynos platform
>> relying on this default register value appears to be using the l2x0
>> cache.
>>
>> Signed-off-by: Guillaume Tucker 
>> ---
>>  arch/arm/boot/dts/exynos4210.dtsi | 2 ++
>>  arch/arm/boot/dts/exynos4412.dtsi | 2 ++
>>  arch/arm/mach-exynos/exynos.c | 4 ++--
> 
> I will need these split between DTS and mach changes.

Of course, sorry.  Fixed in v2.

Thanks,
Guillaume


Re: [PATCH 1/3] ARM: exynos: clear L220_AUX_CTRL_NS_LOCKDOWN in default l2c_aux_val

2020-08-10 Thread Guillaume Tucker
On 03/08/2020 15:22, Russell King - ARM Linux admin wrote:
> On Mon, Aug 03, 2020 at 03:34:39PM +0200, Krzysztof Kozlowski wrote:
>> On Wed, Jul 29, 2020 at 02:47:31PM +0100, Guillaume Tucker wrote:
>>> The L220_AUX_CTRL_NS_LOCKDOWN flag is set during the L2C enable
>>> sequence.  There is no need to set it in the default register value,
>>> this was done before support for it was implemented in the code.  It
>>> is not set in the hardware initial value either.
>>>
>>> Clean this up by removing this flag from the default l2c_aux_val, and
>>> add it to the l2c_aux_mask to print an alert message if it was already
>>> set before the kernel initialisation.
>>>
>>> Signed-off-by: Guillaume Tucker 
>>> ---
>>>  arch/arm/mach-exynos/exynos.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> Makes sense. I'll take it after the merge window.
> 
> Yes, because platforms actually have no control over this bit through
> these values.
> 
> Please fix the description to use the right define, it's
> L310_AUX_CTRL_NS_LOCKDOWN not L220_AUX_CTRL_NS_LOCKDOWN.

Thanks, fixed in v2.

Guillaume



[PATCH v2 1/4] ARM: exynos: clear L310_AUX_CTRL_NS_LOCKDOWN in default l2c_aux_val

2020-08-10 Thread Guillaume Tucker
The L310_AUX_CTRL_NS_LOCKDOWN flag is set during the L2C enable
sequence.  There is no need to set it in the default register value,
this was done before support for it was implemented in the code.  It
is not set in the hardware initial value either.

Clean this up by removing this flag from the default l2c_aux_val, and
add it to the l2c_aux_mask to print an alert message if it was already
set before the kernel initialisation.

Signed-off-by: Guillaume Tucker 
---

Notes:
v2: fix flag name L310_AUX_CTRL_NS_LOCKDOWN

 arch/arm/mach-exynos/exynos.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/mach-exynos/exynos.c b/arch/arm/mach-exynos/exynos.c
index 36c3785a..a96f3353a0c1 100644
--- a/arch/arm/mach-exynos/exynos.c
+++ b/arch/arm/mach-exynos/exynos.c
@@ -193,8 +193,8 @@ static void __init exynos_dt_fixup(void)
 }
 
 DT_MACHINE_START(EXYNOS_DT, "Samsung Exynos (Flattened Device Tree)")
-   .l2c_aux_val= 0x3c40,
-   .l2c_aux_mask   = 0xc20f,
+   .l2c_aux_val= 0x3840,
+   .l2c_aux_mask   = 0xc60f,
.smp= smp_ops(exynos_smp_ops),
.map_io = exynos_init_io,
.init_early = exynos_firmware_init,
-- 
2.20.1
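
The change to the two constants in this patch is plain bit arithmetic,
sketched below.  Two things are assumptions: that the flag is bit 26 of
the register, and that the low half-word of each constant is zero (the
values in this archive look truncated, so only the visible upper digits
are used, zero-padded).  The demo_* names are hypothetical.

```c
#include <stdint.h>

#define DEMO_BIT(n)			(UINT32_C(1) << (n))
#define DEMO_AUX_CTRL_NS_LOCKDOWN	DEMO_BIT(26)	/* assumed bit position */

/* Drop the flag from the default register value. */
static uint32_t demo_val_without_lockdown(uint32_t aux_val)
{
	return aux_val & ~DEMO_AUX_CTRL_NS_LOCKDOWN;
}

/* Add the flag to the mask. */
static uint32_t demo_mask_with_lockdown(uint32_t aux_mask)
{
	return aux_mask | DEMO_AUX_CTRL_NS_LOCKDOWN;
}
```

Applied to the zero-padded old value 0x3c400000 and old mask 0xc20f0000,
the helpers give 0x38400000 and 0xc60f0000, which matches the leading
digits in the diff above.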



[PATCH v2 2/4] ARM: l2c: fix prefetch bits init in L2X0_AUX_CTRL using DT values

2020-08-10 Thread Guillaume Tucker
The L310_PREFETCH_CTRL register bits 28 and 29 to enable data and
instruction prefetch respectively can also be accessed via the
L2X0_AUX_CTRL register.  They appear to be actually wired together in
hardware between the registers.  Changing them in the prefetch
register only will get undone when restoring the aux control register
later on.  For this reason, set these bits in both registers during
initialisation according to the devicetree property values.

Fixes: ec3bd0e68a67 ("ARM: 8391/1: l2c: add options to overwrite prefetching behavior")
Signed-off-by: Guillaume Tucker 
---

Notes:
v2: tweak commit message to show this is a fix

 arch/arm/mm/cache-l2x0.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
index 12c26eb88afb..43d91bfd2360 100644
--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -1249,20 +1249,28 @@ static void __init l2c310_of_parse(const struct device_node *np,
 
ret = of_property_read_u32(np, "prefetch-data", &val);
if (ret == 0) {
-   if (val)
+   if (val) {
prefetch |= L310_PREFETCH_CTRL_DATA_PREFETCH;
-   else
+   *aux_val |= L310_PREFETCH_CTRL_DATA_PREFETCH;
+   } else {
prefetch &= ~L310_PREFETCH_CTRL_DATA_PREFETCH;
+   *aux_val &= ~L310_PREFETCH_CTRL_DATA_PREFETCH;
+   }
+   *aux_mask &= ~L310_PREFETCH_CTRL_DATA_PREFETCH;
} else if (ret != -EINVAL) {
pr_err("L2C-310 OF prefetch-data property value is missing\n");
}
 
ret = of_property_read_u32(np, "prefetch-instr", &val);
if (ret == 0) {
-   if (val)
+   if (val) {
prefetch |= L310_PREFETCH_CTRL_INSTR_PREFETCH;
-   else
+   *aux_val |= L310_PREFETCH_CTRL_INSTR_PREFETCH;
+   } else {
prefetch &= ~L310_PREFETCH_CTRL_INSTR_PREFETCH;
+   *aux_val &= ~L310_PREFETCH_CTRL_INSTR_PREFETCH;
+   }
+   *aux_mask &= ~L310_PREFETCH_CTRL_INSTR_PREFETCH;
} else if (ret != -EINVAL) {
pr_err("L2C-310 OF prefetch-instr property value is missing\n");
}
-- 
2.20.1
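
The effect of the hunk above on the three values involved can be sketched
in standalone C.  The structure and function names are hypothetical.  The
way the final aux value is formed, (hardware & mask) | val, is my reading
of how the driver combines them and should be treated as an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_BIT(n)		(UINT32_C(1) << (n))
#define DEMO_DATA_PREFETCH	DEMO_BIT(28)
#define DEMO_INSTR_PREFETCH	DEMO_BIT(29)

struct demo_l2c_cfg {
	uint32_t prefetch;	/* value for the prefetch control register */
	uint32_t aux_val;	/* bits forced on in the aux control register */
	uint32_t aux_mask;	/* bits kept from the current aux register */
};

/*
 * Apply one devicetree prefetch property to both register images, and
 * remove the bit from the mask so the DT value, not the old hardware
 * state, decides the result.
 */
static void demo_apply_prefetch(struct demo_l2c_cfg *cfg, uint32_t bit,
				bool enable)
{
	if (enable) {
		cfg->prefetch |= bit;
		cfg->aux_val |= bit;
	} else {
		cfg->prefetch &= ~bit;
		cfg->aux_val &= ~bit;
	}
	cfg->aux_mask &= ~bit;
}

/* New aux value: keep the masked hardware bits, then force aux_val. */
static uint32_t demo_compute_aux(const struct demo_l2c_cfg *cfg,
				 uint32_t hw_aux)
{
	return (hw_aux & cfg->aux_mask) | cfg->aux_val;
}
```

Because the bit is cleared from the mask in both branches, a property
value of 0 disables prefetch even if the hardware register had the bit
set beforehand.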



[PATCH v2 3/4] ARM: dts: exynos: add prefetch properties for L2C-310 cache

2020-08-10 Thread Guillaume Tucker
Add the devicetree properties to enable instruction and data prefetch
on exynos4210 and exynos4412 which use the L2C-310 cache.  No other
Exynos chip appears to be using this L2 cache hardware.

This follows the default bits being set in the l2c_aux_val register
for the Exynos platform, which can now be cleared as a result.

Signed-off-by: Guillaume Tucker 
---

Notes:
v2: split patch to include devicetree changes only

 arch/arm/boot/dts/exynos4210.dtsi | 2 ++
 arch/arm/boot/dts/exynos4412.dtsi | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/arch/arm/boot/dts/exynos4210.dtsi b/arch/arm/boot/dts/exynos4210.dtsi
index b4466232f0c1..7e0d253b26ef 100644
--- a/arch/arm/boot/dts/exynos4210.dtsi
+++ b/arch/arm/boot/dts/exynos4210.dtsi
@@ -102,6 +102,8 @@
reg = <0x10502000 0x1000>;
cache-unified;
cache-level = <2>;
+   prefetch-data = <1>;
+   prefetch-instr = <1>;
arm,tag-latency = <2 2 1>;
arm,data-latency = <2 2 1>;
};
diff --git a/arch/arm/boot/dts/exynos4412.dtsi b/arch/arm/boot/dts/exynos4412.dtsi
index 48868947373e..37efa247bf4d 100644
--- a/arch/arm/boot/dts/exynos4412.dtsi
+++ b/arch/arm/boot/dts/exynos4412.dtsi
@@ -218,6 +218,8 @@
reg = <0x10502000 0x1000>;
cache-unified;
cache-level = <2>;
+   prefetch-data = <1>;
+   prefetch-instr = <1>;
arm,tag-latency = <2 2 1>;
arm,data-latency = <3 2 1>;
arm,double-linefill = <1>;
-- 
2.20.1



[PATCH v2 4/4] ARM: exynos: clear prefetch bits in default l2c_aux_val

2020-08-10 Thread Guillaume Tucker
Clear the L310_AUX_CTRL_DATA_PREFETCH and L310_AUX_CTRL_INSTR_PREFETCH
bits in the l2c_aux_val defaults for Exynos since they can now be set
using the standard l2c2x0 devicetree bindings.

Signed-off-by: Guillaume Tucker 
---

Notes:
v2: split patch to only clear exynos platform register bits

 arch/arm/mach-exynos/exynos.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/mach-exynos/exynos.c b/arch/arm/mach-exynos/exynos.c
index a96f3353a0c1..0e906cc3a48e 100644
--- a/arch/arm/mach-exynos/exynos.c
+++ b/arch/arm/mach-exynos/exynos.c
@@ -193,8 +193,8 @@ static void __init exynos_dt_fixup(void)
 }
 
 DT_MACHINE_START(EXYNOS_DT, "Samsung Exynos (Flattened Device Tree)")
-   .l2c_aux_val= 0x3840,
-   .l2c_aux_mask   = 0xc60f,
+   .l2c_aux_val= 0x0840,
+   .l2c_aux_mask   = 0xf60f,
.smp= smp_ops(exynos_smp_ops),
.map_io = exynos_init_io,
.init_early = exynos_firmware_init,
-- 
2.20.1



Re: [PATCH 2/3] ARM: l2c: update prefetch bits in L2X0_AUX_CTRL using DT value

2020-07-29 Thread Guillaume Tucker
On 29/07/2020 15:18, Russell King - ARM Linux admin wrote:
> On Wed, Jul 29, 2020 at 02:47:32PM +0100, Guillaume Tucker wrote:
>> The L310_PREFETCH_CTRL register bits 28 and 29 to enable data and
>> instruction prefetch respectively can also be accessed via the
>> L2X0_AUX_CTRL register.  They appear to be actually wired together in
>> hardware between the registers.  Changing them in the prefetch
>> register only will get undone when restoring the aux control register
>> later on.  For this reason, set these bits in both registers during
>> initialisation according to the DT attributes.
> 
> How will that happen?
> 
> We write the auxiliary control register before the prefetch control
> register, so the prefetch control register will take precedence.  See
> l2c310_configure() - l2c_configure() writes the auxiliary control
> register, and the function writes the prefetch control register later.

What I'm seeing is that outer_cache.configure() gets called, at
least on exynos4412-odroidx2.  See l2c_enable():

if (outer_cache.configure)
outer_cache.configure(_saved_regs);
else
l2x0_data->configure(base);

Then instead of l2c310_configure(), exynos_l2_configure() gets
called and writes prefetch_ctrl right before aux_ctrl.  Should
exynos_l2_configure() be changed to swap the register writes?


> I think the real issue is that Exynos has been modifying the prefetch
> settings via its machine .aux_mask / .aux_val configuration, and the
> opposite is actually true: the prefetch control register values will
> overwrite the attempt to modify the auxiliary control values set through
> the machine .aux_mask/.aux_val.

Yes with l2c310_configure() but not with exynos_l2_configure().

To be clear, this is what I've found to be happening, if you
switch to using the device tree prefetch attributes and clear
the bits in the default l2c_aux_val (see PATCH 3/3):

1. l2x0_of_init() first gets called with the default aux_val

2. l2c310_of_parse() sets the bits in l2x0_saved_regs.prefetch_ctrl
   but not in aux_val (unless you apply this patch 2/3)

3. l2c_enable() calls exynos_l2_configure() which writes
   prefetch_ctrl and then aux_ctrl - thus setting the prefetch bits
   and then clearing them just after

4. l2c310_enable() reads back aux_ctrl and prefetch, both of which
   now have the bits cleared (the pr_info() message about prefetch
   enabled gets skipped)


That's why I thought it would be safer to set the prefetch bits
in both registers so it should work regardless of the
initialisation sequence.  Also, if we want these bits to be
changed, we should clear them in the aux_mask value to not get
another error message about register corruption - so I'm doing
that too.

Thanks,
Guillaume


>> Fixes: ec3bd0e68a67 ("ARM: 8391/1: l2c: add options to overwrite prefetching 
>> behavior")
>> Signed-off-by: Guillaume Tucker 
>> ---
>>  arch/arm/mm/cache-l2x0.c | 16 
>>  1 file changed, 12 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
>> index 12c26eb88afb..43d91bfd2360 100644
>> --- a/arch/arm/mm/cache-l2x0.c
>> +++ b/arch/arm/mm/cache-l2x0.c
>> @@ -1249,20 +1249,28 @@ static void __init l2c310_of_parse(const struct 
>> device_node *np,
>>  
>>  ret = of_property_read_u32(np, "prefetch-data", &val);
>>  if (ret == 0) {
>> -if (val)
>> +if (val) {
>>  prefetch |= L310_PREFETCH_CTRL_DATA_PREFETCH;
>> -else
>> +*aux_val |= L310_PREFETCH_CTRL_DATA_PREFETCH;
>> +} else {
>>  prefetch &= ~L310_PREFETCH_CTRL_DATA_PREFETCH;
>> +*aux_val &= ~L310_PREFETCH_CTRL_DATA_PREFETCH;
>> +}
>> +*aux_mask &= ~L310_PREFETCH_CTRL_DATA_PREFETCH;
>>  } else if (ret != -EINVAL) {
>>  pr_err("L2C-310 OF prefetch-data property value is missing\n");
>>  }
>>  
>>  ret = of_property_read_u32(np, "prefetch-instr", &val);
>>  if (ret == 0) {
>> -if (val)
>> +if (val) {
>>  prefetch |= L310_PREFETCH_CTRL_INSTR_PREFETCH;
>> -else
>> +*aux_val |= L310_PREFETCH_CTRL_INSTR_PREFETCH;
>> +} else {
>>  prefetch &= ~L310_PREFETCH_CTRL_INSTR_PREFETCH;
>> +*aux_val &= ~L310_PREFETCH_CTRL_INSTR_PREFETCH;
>> +}
>> +*aux_mask &= ~L310_PREFETCH_CTRL_INSTR_PREFETCH;
>>  } else if (ret != -EINVAL) {
>>  pr_err("L2C-310 OF prefetch-instr property value is missing\n");
>>  }
>> -- 
>> 2.20.1
>>
>>
> 



[PATCH 1/3] ARM: exynos: clear L220_AUX_CTRL_NS_LOCKDOWN in default l2c_aux_val

2020-07-29 Thread Guillaume Tucker
The L220_AUX_CTRL_NS_LOCKDOWN flag is set during the L2C enable
sequence.  There is no need to set it in the default register value,
this was done before support for it was implemented in the code.  It
is not set in the hardware initial value either.

Clean this up by removing this flag from the default l2c_aux_val, and
add it to the l2c_aux_mask to print an alert message if it was already
set before the kernel initialisation.

Signed-off-by: Guillaume Tucker 
---
 arch/arm/mach-exynos/exynos.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/mach-exynos/exynos.c b/arch/arm/mach-exynos/exynos.c
index 36c3785a..a96f3353a0c1 100644
--- a/arch/arm/mach-exynos/exynos.c
+++ b/arch/arm/mach-exynos/exynos.c
@@ -193,8 +193,8 @@ static void __init exynos_dt_fixup(void)
 }
 
 DT_MACHINE_START(EXYNOS_DT, "Samsung Exynos (Flattened Device Tree)")
-   .l2c_aux_val= 0x3c40,
-   .l2c_aux_mask   = 0xc20f,
+   .l2c_aux_val= 0x3840,
+   .l2c_aux_mask   = 0xc60f,
.smp= smp_ops(exynos_smp_ops),
.map_io = exynos_init_io,
.init_early = exynos_firmware_init,
-- 
2.20.1



[PATCH 2/3] ARM: l2c: update prefetch bits in L2X0_AUX_CTRL using DT value

2020-07-29 Thread Guillaume Tucker
The L310_PREFETCH_CTRL register bits 28 and 29 to enable data and
instruction prefetch respectively can also be accessed via the
L2X0_AUX_CTRL register.  They appear to be actually wired together in
hardware between the registers.  Changing them in the prefetch
register only will get undone when restoring the aux control register
later on.  For this reason, set these bits in both registers during
initialisation according to the DT attributes.

Fixes: ec3bd0e68a67 ("ARM: 8391/1: l2c: add options to overwrite prefetching 
behavior")
Signed-off-by: Guillaume Tucker 
---
 arch/arm/mm/cache-l2x0.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
index 12c26eb88afb..43d91bfd2360 100644
--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -1249,20 +1249,28 @@ static void __init l2c310_of_parse(const struct 
device_node *np,
 
ret = of_property_read_u32(np, "prefetch-data", &val);
if (ret == 0) {
-   if (val)
+   if (val) {
prefetch |= L310_PREFETCH_CTRL_DATA_PREFETCH;
-   else
+   *aux_val |= L310_PREFETCH_CTRL_DATA_PREFETCH;
+   } else {
prefetch &= ~L310_PREFETCH_CTRL_DATA_PREFETCH;
+   *aux_val &= ~L310_PREFETCH_CTRL_DATA_PREFETCH;
+   }
+   *aux_mask &= ~L310_PREFETCH_CTRL_DATA_PREFETCH;
} else if (ret != -EINVAL) {
pr_err("L2C-310 OF prefetch-data property value is missing\n");
}
 
ret = of_property_read_u32(np, "prefetch-instr", &val);
if (ret == 0) {
-   if (val)
+   if (val) {
prefetch |= L310_PREFETCH_CTRL_INSTR_PREFETCH;
-   else
+   *aux_val |= L310_PREFETCH_CTRL_INSTR_PREFETCH;
+   } else {
prefetch &= ~L310_PREFETCH_CTRL_INSTR_PREFETCH;
+   *aux_val &= ~L310_PREFETCH_CTRL_INSTR_PREFETCH;
+   }
+   *aux_mask &= ~L310_PREFETCH_CTRL_INSTR_PREFETCH;
} else if (ret != -EINVAL) {
pr_err("L2C-310 OF prefetch-instr property value is missing\n");
}
-- 
2.20.1



[PATCH 3/3] ARM: exynos: use DT prefetch attributes rather than l2c_aux_val

2020-07-29 Thread Guillaume Tucker
Use the standard l2c2x0 device tree bindings to enable data and
instruction prefetch on exynos4210 and exynos4412 and clear the
respective bits in the default l2c_aux_val.  No other Exynos platform
relying on this default register value appears to be using the l2x0
cache.

Signed-off-by: Guillaume Tucker 
---
 arch/arm/boot/dts/exynos4210.dtsi | 2 ++
 arch/arm/boot/dts/exynos4412.dtsi | 2 ++
 arch/arm/mach-exynos/exynos.c | 4 ++--
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/arm/boot/dts/exynos4210.dtsi 
b/arch/arm/boot/dts/exynos4210.dtsi
index b4466232f0c1..7e0d253b26ef 100644
--- a/arch/arm/boot/dts/exynos4210.dtsi
+++ b/arch/arm/boot/dts/exynos4210.dtsi
@@ -102,6 +102,8 @@
reg = <0x10502000 0x1000>;
cache-unified;
cache-level = <2>;
+   prefetch-data = <1>;
+   prefetch-instr = <1>;
arm,tag-latency = <2 2 1>;
arm,data-latency = <2 2 1>;
};
diff --git a/arch/arm/boot/dts/exynos4412.dtsi 
b/arch/arm/boot/dts/exynos4412.dtsi
index 48868947373e..37efa247bf4d 100644
--- a/arch/arm/boot/dts/exynos4412.dtsi
+++ b/arch/arm/boot/dts/exynos4412.dtsi
@@ -218,6 +218,8 @@
reg = <0x10502000 0x1000>;
cache-unified;
cache-level = <2>;
+   prefetch-data = <1>;
+   prefetch-instr = <1>;
arm,tag-latency = <2 2 1>;
arm,data-latency = <3 2 1>;
arm,double-linefill = <1>;
diff --git a/arch/arm/mach-exynos/exynos.c b/arch/arm/mach-exynos/exynos.c
index a96f3353a0c1..0e906cc3a48e 100644
--- a/arch/arm/mach-exynos/exynos.c
+++ b/arch/arm/mach-exynos/exynos.c
@@ -193,8 +193,8 @@ static void __init exynos_dt_fixup(void)
 }
 
 DT_MACHINE_START(EXYNOS_DT, "Samsung Exynos (Flattened Device Tree)")
-   .l2c_aux_val= 0x3840,
-   .l2c_aux_mask   = 0xc60f,
+   .l2c_aux_val= 0x0840,
+   .l2c_aux_mask   = 0xf60f,
.smp= smp_ops(exynos_smp_ops),
.map_io = exynos_init_io,
.init_early = exynos_firmware_init,
-- 
2.20.1



Re: mainline/master bisection: baseline.dmesg.crit on qemu_arm-vexpress-a15

2020-07-16 Thread Guillaume Tucker
Hi André,

Sorry for the delay, I missed the thread on this issue.

On 03/07/2020 11:49, André Przywara wrote:
> On 03/07/2020 06:38, kernelci.org bot wrote:
> 
> Hi Guillaume,
> 
> is this report legit? The situation didn't change from Monday, I just
> repeated the test with mainline compared to my patch reverted.
> 
> What is the actual failure here? You pointed to:
> <2>GIC CPU mask not found - kernel will fail to boot.
> but I don't see any explicit line stating that as the culprit anywhere
> in the logs. Actually the last line says:
> 00:24:07.04  Job finished correctly

The failure is a "crit" kernel error message.  The test job still
completes, but it has detected this new error and the bisection
reliably lands on the same commit mentioned below as the root
cause.  As you found out, the commit is not to blame so this is a
false positive.  However, it's a bit more complicated...

> And I see the GIC messages with and without this patch. As mentioned on
> Monday, "-smp 2" should be added to the QEMU command line to fix that.

All the test jobs involved in this bisection can be found here:

  
https://lava.ciplatform.org/scheduler/alljobs?length=25=lava-bisection-2224#table

The bisection first ran the "good" and "bad" revisions and didn't
detect the kernel error message with the "good" one.  However,
taking a closer look at the logs, the error was actually there:

  https://lava.ciplatform.org/scheduler/job/27647#L454

and then there were some kernel warnings:

  https://lava.ciplatform.org/scheduler/job/27647#L561

which didn't occur with the "bad" revision:

  https://lava.ciplatform.org/scheduler/job/27648


For some reason probably related to the kernel warnings, when
testing the "good" revision the GIC kernel errors got silenced
and dmesg didn't print them.  This misled the test into a false
positive.

Bisections are only run when a regression occurs, and it looks
like these GIC errors have been around for a long time.  What I
believe happened is that the errors got hidden at some
point (allegedly due to the kernel warnings) and then came back
later.  This was then wrongly detected as a regression.


So we have 2 problems to solve.  The first one is to actually
remove these kernel errors, and you've explained how to do that
with the QEMU command line.  I've resubmitted the test job with
it and it worked indeed:

  https://lava.collabora.co.uk/scheduler/job/2493885

and sent a fix for it:

  https://github.com/kernelci/kernelci-core/pull/442


The other problem is about preventing such cases from occurring
again in the future on kernelci.org, by making kernel error
detection more robust.  But that's not a kernel problem.

Please bear with us, hopefully this false positive should not
come back.  Thanks for your help with investigating the GIC
errors in the first place.

Guillaume

>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>> * This automated bisection report was sent to you on the basis  *
>> * that you may be involved with the breaking commit it has  *
>> * found.  No manual investigation has been done to verify it,   *
>> * and the root cause of the problem may be somewhere else.  *
>> *   *
>> * If you do send a fix, please include this trailer:*
>> *   Reported-by: "kernelci.org bot"   *
>> *   *
>> * Hope this helps!  *
>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>>
>> mainline/master bisection: baseline.dmesg.crit on qemu_arm-vexpress-a15
>>
>> Summary:
>>   Start:  7cc2a8ea1048 Merge tag 'block-5.8-2020-07-01' of 
>> git://git.kernel.dk/linux-block
>>   Plain log:  
>> https://storage.kernelci.org/mainline/master/v5.8-rc3-37-g7cc2a8ea1048/arm/vexpress_defconfig/gcc-8/lab-cip/baseline-vexpress-v2p-ca15-tc1.txt
>>   HTML log:   
>> https://storage.kernelci.org/mainline/master/v5.8-rc3-37-g7cc2a8ea1048/arm/vexpress_defconfig/gcc-8/lab-cip/baseline-vexpress-v2p-ca15-tc1.html
>>   Result: 38ac46002d1d arm: dts: vexpress: Move mcc node back into 
>> motherboard node
>>
>> Checks:
>>   revert: PASS
>>   verify: PASS
> 
> What does that mean? That reverting the patch made the test pass?
> I did exactly that, and reverting made it worse, because poweroff
> doesn't work (among other things).
> So could this be a testing artifact? Because of the failing poweroff the
> test timed out or something?
> 
> Many thanks,
> Andre
> 
>>
>> Parameters:
>>   Tree:   mainline
>>   URL:
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>>   Branch: master
>>   Target: qemu_arm-vexpress-a15
>>   CPU arch:   arm
>>   Lab:lab-cip
>>   Compiler:   gcc-8
>>   Config: vexpress_defconfig
>>   Test case:  baseline.dmesg.crit
>>
>> Breaking commit found:
>>
>> 

Re: mainline/master bisection: baseline.dmesg.crit on qemu_arm-vexpress-a15

2020-07-16 Thread Guillaume Tucker
On 06/07/2020 13:49, Sudeep Holla wrote:
> Hi,
> 
> On Sun, Jul 05, 2020 at 07:12:58PM -0700, kernelci.org bot wrote:
>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>> * This automated bisection report was sent to you on the basis  *
>> * that you may be involved with the breaking commit it has  *
>> * found.  No manual investigation has been done to verify it,   *
>> * and the root cause of the problem may be somewhere else.  *
>> *   *
>> * If you do send a fix, please include this trailer:*
>> *   Reported-by: "kernelci.org bot"   *
>> *   *
>> * Hope this helps!  *
>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>>
> 
> Andre test and replied to one of the similar but earlier reports.
> Unless we get some response to that, we can't proceed and we can't
> do much other than ignoring these reports. Please respond to Andre's
> queries.


Sorry I missed your email, the regression is still there.  I'll
reply to André.

Guillaume


Re: chrome-platform/for-kernelci bisection: baseline.bootrr.rockchip-dp-probed on rk3399-gru-kevin

2020-07-09 Thread Guillaume Tucker
On 09/07/2020 10:17, Enric Balletbo i Serra wrote:
> Hi,
> 
> On 8/7/20 22:32, Guenter Roeck wrote:
>> On 7/8/20 11:59 AM, kernelci.org bot wrote:
>>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>>> * This automated bisection report was sent to you on the basis  *
>>> * that you may be involved with the breaking commit it has  *
>>> * found.  No manual investigation has been done to verify it,   *
>>> * and the root cause of the problem may be somewhere else.  *
>>> *   *
>>> * If you do send a fix, please include this trailer:*
>>> *   Reported-by: "kernelci.org bot"   *
>>> *   *
>>> * Hope this helps!  *
>>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>>>
>>> chrome-platform/for-kernelci bisection: baseline.bootrr.rockchip-dp-probed 
>>> on rk3399-gru-kevin
>>>
>>> Summary:
>>>   Start:  154353417996 KERNELCI: x86_64_defconfig: Enable support for 
>>> Chromebooks devices
>>>   Plain log:  
>>> https://storage.kernelci.org/chrome-platform/for-kernelci/v5.8-rc1-20-g154353417996/arm64/defconfig/gcc-8/lab-collabora/baseline-rk3399-gru-kevin.txt
>>>   HTML log:   
>>> https://storage.kernelci.org/chrome-platform/for-kernelci/v5.8-rc1-20-g154353417996/arm64/defconfig/gcc-8/lab-collabora/baseline-rk3399-gru-kevin.html
>>>   Result: 8c9a6ef40bf4 platform/chrome: cros_ec_proto: Convert EC error 
>>> codes to Linux error codes
>>>
>>> Checks:
>>>   revert: PASS
>>>   verify: PASS
>>>
>>> Parameters:
>>>   Tree:   chrome-platform
>>>   URL:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux.git
>>>   Branch: for-kernelci
>>>   Target: rk3399-gru-kevin
>>>   CPU arch:   arm64
>>>   Lab:lab-collabora
>>>   Compiler:   gcc-8
>>>   Config: defconfig
>>>   Test case:  baseline.bootrr.rockchip-dp-probed
>>>
>>> Breaking commit found:
>>>
>>> ---
>>> commit 8c9a6ef40bf400c64c9907031bd32b59f9d4aea2
>>> Author: Guenter Roeck 
>>> Date:   Sat Jul 4 07:26:07 2020 -0700
>>>
>>> platform/chrome: cros_ec_proto: Convert EC error codes to Linux error 
>>> codes
>>> 
>>> The EC reports a variety of error codes. Most of those, with the 
>>> exception
>>> of EC_RES_INVALID_VERSION, are converted to -EPROTO. As result, the 
>>> actual
>>> error code gets lost. Convert all EC errors to Linux error codes to 
>>> report
>>> a more meaningful error to the caller to aid debugging.
>>> 
>>> Cc: Yu-Hsuan Hsu 
>>> Cc: Prashant Malani 
>>> Signed-off-by: Guenter Roeck 
>>> Signed-off-by: Enric Balletbo i Serra 
>>>
> 
> So, as Guenter pointed I dropped this patch now.
> 
>>
>> So, just FTR, turns out that there are callers which specifically check for
>> -EPROTO and examine the EC error code if it is returned, or just accept
>> -EPROTO as generic failure (but nothing else). Example is 
>> drivers/pwm/pwm-cros-ec.c:
>> cros_ec_num_pwms(). Such commands now fail, in this case because
>> EC_RES_INVALID_PARAM is now returned as -EINVAL and cros_ec_num_pwms()
>> doesn't expect that.
>>
> 
> Right, that's interesting, and I'll take in consideration for future reworks 
> of
> the above patch and also take a deeper look at those specific cases reported.

This bisection is probably one of the most interesting ones
indeed.  I should mention it when I finally get round to making
a "KernelCI bisections hall of fame" blog post.
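
For reference, the failure mode Guenter describes can be sketched
in plain C as below.  This is only an illustration: the enum values,
both mapping functions and count_channels() are hypothetical
stand-ins, not the real cros_ec code, and the fallback to -EPROTO in
the new mapping is an assumption made to keep the sketch short.

```c
#include <errno.h>

/* Hypothetical EC result codes; placeholder values only. */
enum ec_result { EC_OK = 0, EC_INVALID_PARAM = 1, EC_OTHER_ERROR = 2 };

/* Old behaviour as described in the commit message: errors are
 * reported to the caller as -EPROTO. */
static int ec_to_errno_old(enum ec_result r)
{
	return r == EC_OK ? 0 : -EPROTO;
}

/* New behaviour: an invalid parameter becomes -EINVAL.  Other
 * errors stay -EPROTO here (assumption for this sketch). */
static int ec_to_errno_new(enum ec_result r)
{
	switch (r) {
	case EC_OK:
		return 0;
	case EC_INVALID_PARAM:
		return -EINVAL;
	default:
		return -EPROTO;
	}
}

/* Hypothetical caller in the style described for
 * cros_ec_num_pwms(): probe indices until the EC rejects one.
 * Only -EPROTO plus an "invalid parameter" result is accepted as
 * the end of the list; any other error is passed up as a failure. */
static int count_channels(int (*map)(enum ec_result), int n_present)
{
	for (int i = 0; ; i++) {
		enum ec_result r = (i < n_present) ? EC_OK : EC_INVALID_PARAM;
		int ret = map(r);

		if (ret == 0)
			continue;
		if (ret == -EPROTO && r == EC_INVALID_PARAM)
			return i;	/* expected end of list */
		return ret;		/* unexpected error code */
	}
}
```

With the old mapping the caller counts the channels as before; with
the new one it gets -EINVAL at the end of the list and reports a
failure, which would then show up as a probe failure.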

> BTW, Guillaume, I queued that patch to give a try and test 3 days ago. Is the
> bisection job expected to take that time to run? In this case I think it also
> took some time to receive the build test, so probably is just a matter of 
> having
> lot of jobs in the queue?
> 
> I am not complaining at all, just curious, and just want to know to improve my
> maintainer workflow.

I think what you're doing is perfectly fine, there were some
issues with Jenkins and one build server that caused some
KernelCI builds to not be run this week.  Also there is an
intermittent bug in LAVA that causes tests to not run, so I think
the delay was due to an unfortunate combination of infrastructure
issues.

We now have a rather fast build server dedicated to bisections,
so for a maintainer branch like yours where it just takes a
handful of iterations I would expect this kind of report to be
sent at most 6h after a git push.  Bisecting linux-next can take
a few extra hours with typically 10~15 iterations.
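
As a rough guide to those iteration counts, here is a small sketch
in C.  It assumes an idealised bisection that halves the range of
candidate commits at every step; real git history with merges won't
follow this exactly, and bisect_iterations() is just a name made up
for the example.

```c
/* Worst-case number of build-and-test iterations needed to narrow
 * n_commits candidates down to one, assuming each iteration halves
 * the remaining range (rounding up). */
static unsigned int bisect_iterations(unsigned long n_commits)
{
	unsigned int steps = 0;
	unsigned long remaining = n_commits;

	while (remaining > 1) {
		remaining -= remaining / 2;	/* keep the larger half */
		steps++;
	}
	return steps;
}
```

Under that assumption, 20 candidate commits need 5 iterations, and
10 to 15 iterations correspond to somewhere between about a thousand
and thirty thousand commits.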

Best wishes,
Guillaume

>> drivers/iio/common/cros_ec_sensors/cros_ec_sensors.c has a similar problem;
>> it only accepts -EPROTO as "valid" error, but nothing else. I didn't check
>> for others.
>>
>> Guenter
>>



Re: next/master bisection: baseline.dmesg.crit on qemu_arm-vexpress-a15

2020-06-26 Thread Guillaume Tucker
On 26/06/2020 20:11, kernelci.org bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has  *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.  *
> *   *
> * If you do send a fix, please include this trailer:*
> *   Reported-by: "kernelci.org bot"   *
> *   *
> * Hope this helps!  *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> next/master bisection: baseline.dmesg.crit on qemu_arm-vexpress-a15
> 
> Summary:
>   Start:  36e3135df4d4 Add linux-next specific files for 20200626
>   Plain log:  
> https://storage.kernelci.org/next/master/next-20200626/arm/vexpress_defconfig/gcc-8/lab-collabora/baseline-vexpress-v2p-ca15-tc1.txt
>   HTML log:   
> https://storage.kernelci.org/next/master/next-20200626/arm/vexpress_defconfig/gcc-8/lab-collabora/baseline-vexpress-v2p-ca15-tc1.html
>   Result: 38ac46002d1d arm: dts: vexpress: Move mcc node back into 
> motherboard node
> 
> Checks:
>   revert: PASS
>   verify: PASS
> 
> Parameters:
>   Tree:   next
>   URL:
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>   Branch: master
>   Target: qemu_arm-vexpress-a15
>   CPU arch:   arm
>   Lab:lab-collabora
>   Compiler:   gcc-8
>   Config: vexpress_defconfig
>   Test case:  baseline.dmesg.crit

The critical error message allegedly caused by this commit is:

<2>GIC CPU mask not found - kernel will fail to boot.

The report needs to be improved to show it automatically, sorry
if this wasn't clear.

Guillaume


> Breaking commit found:
> 
> ---
> commit 38ac46002d1df5707566a73486452851341028d2
> Author: Andre Przywara 
> Date:   Wed Jun 3 17:22:37 2020 +0100
> 
> arm: dts: vexpress: Move mcc node back into motherboard node
> 
> Commit d9258898ad49 ("arm64: dts: arm: vexpress: Move fixed devices
> out of bus node") moved the "mcc" DT node into the root node, because
> it does not have any children using "reg" properties, so does violate
> some dtc checks about "simple-bus" nodes.
> 
> However this broke the vexpress config-bus code, which walks up the
> device tree to find the first node with an "arm,vexpress,site" property.
> This gave the wrong result (matching the root node instead of the
> motherboard node), so broke the clocks and some other devices for
> VExpress boards.
> 
> Move the whole node back into its original position. This re-introduces
> the dtc warning, but is conceptually the right thing to do. The dtc
> warning seems to be overzealous here, there are discussions on fixing or
> relaxing this check instead.
> 
> Link: 
> https://lore.kernel.org/r/20200603162237.16319-1-andre.przyw...@arm.com
> Fixes: d9258898ad49 ("arm64: dts: vexpress: Move fixed devices out of bus 
> node")
> Reported-and-tested-by: Guenter Roeck 
> Signed-off-by: Andre Przywara 
> Signed-off-by: Sudeep Holla 
> 
> diff --git a/arch/arm/boot/dts/vexpress-v2m-rs1.dtsi 
> b/arch/arm/boot/dts/vexpress-v2m-rs1.dtsi
> index e6308fb76183..a88ee5294d35 100644
> --- a/arch/arm/boot/dts/vexpress-v2m-rs1.dtsi
> +++ b/arch/arm/boot/dts/vexpress-v2m-rs1.dtsi
> @@ -100,79 +100,6 @@
>   };
>   };
>  
> - mcc {
> - compatible = "arm,vexpress,config-bus";
> - arm,vexpress,config-bridge = <_sysreg>;
> -
> - oscclk0 {
> - /* MCC static memory clock */
> - compatible = "arm,vexpress-osc";
> - arm,vexpress-sysreg,func = <1 0>;
> - freq-range = <2500 6000>;
> - #clock-cells = <0>;
> - clock-output-names = "v2m:oscclk0";
> - };
> -
> - v2m_oscclk1: oscclk1 {
> - /* CLCD clock */
> - compatible = "arm,vexpress-osc";
> - arm,vexpress-sysreg,func = <1 1>;
> - freq-range = <2375 6500>;
> - #clock-cells = <0>;
> - clock-output-names = "v2m:oscclk1";
> - };
> -
> - v2m_oscclk2: oscclk2 {
> - /* IO FPGA peripheral clock */
> - compatible = "arm,vexpress-osc";
> - arm,vexpress-sysreg,func = <1 2>;
> - freq-range = <2400 2400>;
> - #clock-cells = <0>;
> - clock-output-names = "v2m:oscclk2";
> - };
> -
> - 

Re: media/master bisection: v4l2-compliance-uvc.Buffer-ioctls-Input-0.VIDIOC_REQBUFS/CREATE_BUFS/QUERYBUF on rk3399-gru-kevin

2020-06-26 Thread Guillaume Tucker
On 26/06/2020 08:02, Hans Verkuil wrote:
> Hi Guillaume,
> 
> You need to update v4l-utils to the latest version from our git master branch.
> 
> The reserved field in reqbufs is now in use as a flags field, so it is no 
> longer
> zero. The compliance test has been updated accordingly in the v4l-utils git 
> repo.

I see, thanks.  It's getting updated today.
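
For anyone else hitting this, the reason the old test fails can be
sketched in a few lines of C.  The struct below is a simplified,
hypothetical stand-in for struct v4l2_requestbuffers (it only keeps
the anonymous union mentioned in the commit message, which needs
C11), and all_zero() is an assumed equivalent of the zero check the
old test applied to the reserved field.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in: flags and reserved share the same storage. */
struct reqbufs_sketch {
	uint32_t count;
	union {
		uint32_t flags;
		uint32_t reserved[1];
	};
};

/* Returns 1 if every byte in the buffer is zero, 0 otherwise. */
static int all_zero(const void *p, size_t len)
{
	const unsigned char *b = p;

	for (size_t i = 0; i < len; i++)
		if (b[i])
			return 0;
	return 1;
}
```

As long as flags is left at 0 the check on reserved still passes,
but as soon as any flag bit is set the same bytes are no longer
zero, so a test that still insists on a zeroed reserved field fails.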

Guillaume


> On 26/06/2020 08:56, Guillaume Tucker wrote:
>> Please see the bisection report below about a regression in
>> v4l2-compliance with uvcvideo:
>>
>> [   25.495039] uvcvideo: Failed to query (SET_CUR) UVC control 10 on unit 2: 
>> -32 (exp. 2).
>>  fail: v4l2-test-buffers.cpp(680): check_0(reqbufs.reserved, 
>> sizeof(reqbufs.reserved))
>>  test VIDIOC_REQBUFS/CREATE_BUFS/QUERYBUF: FAIL
>>
>>
>> as seen in the full job log:
>>
>> 
>> https://storage.kernelci.org/media/master/v5.8-rc1-64-ge30cc79cc80f/arm64/defconfig/gcc-8/lab-collabora/v4l2-compliance-uvc-rk3399-gru-kevin.html#L1713
>>
>> with a few more details about the regression here:
>>
>> https://kernelci.org/test/case/id/5ef23169140826f73d97bf51/
>>
>> and the same test case failure also seen with vivid:
>>
>> https://kernelci.org/test/case/id/5ef23699f641f7b3e597bf3f/
>>
>>
>> The bisection actually ran a couple of days ago but there was an
>> email error when sending the report, so I'm sending it by hand
>> now.  I hope the issue hasn't spread too widely already, I know
>> it's also affecting linux-next.
>>
>> Guillaume
>>
>>
>> On 25/06/2020 23:19, kernelci.org bot wrote:
>>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>>> * This automated bisection report was sent to you on the basis  *
>>> * that you may be involved with the breaking commit it has  *
>>> * found.  No manual investigation has been done to verify it,   *
>>> * and the root cause of the problem may be somewhere else.  *
>>> *   *
>>> * If you do send a fix, please include this trailer:*
>>> *   Reported-by: "kernelci.org bot"   *
>>> *   *
>>> * Hope this helps!  *
>>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>>>
>>> media/master bisection: 
>>> v4l2-compliance-uvc.Buffer-ioctls-Input-0.VIDIOC_REQBUFS/CREATE_BUFS/QUERYBUF
>>>  on rk3399-gru-kevin
>>>
>>> Summary:
>>>   Start:  e30cc79cc80f media: media-request: Fix crash if memory 
>>> allocation fails
>>>   Plain log:  
>>> https://storage.kernelci.org/media/master/v5.8-rc1-64-ge30cc79cc80f/arm64/defconfig/gcc-8/lab-collabora/v4l2-compliance-uvc-rk3399-gru-kevin.txt
>>>   HTML log:   
>>> https://storage.kernelci.org/media/master/v5.8-rc1-64-ge30cc79cc80f/arm64/defconfig/gcc-8/lab-collabora/v4l2-compliance-uvc-rk3399-gru-kevin.html
>>>   Result: 1e0b2318fa75 media: videobuf2: handle 
>>> V4L2_FLAG_MEMORY_NON_CONSISTENT flag
>>>
>>> Checks:
>>>   revert: PASS
>>>   verify: PASS
>>>
>>> Parameters:
>>>   Tree:   media
>>>   URL:https://git.linuxtv.org/media_tree.git
>>>   Branch: master
>>>   Target: rk3399-gru-kevin
>>>   CPU arch:   arm64
>>>   Lab:lab-collabora
>>>   Compiler:   gcc-8
>>>   Config: defconfig
>>>   Test case:  
>>> v4l2-compliance-uvc.Buffer-ioctls-Input-0.VIDIOC_REQBUFS/CREATE_BUFS/QUERYBUF
>>>
>>> Breaking commit found:
>>>
>>> ---
>>> commit 1e0b2318fa75d186ee0d2be31843ce867385fcc4
>>> Author: Sergey Senozhatsky 
>>> Date:   Thu May 14 18:01:45 2020 +0200
>>>
>>> media: videobuf2: handle V4L2_FLAG_MEMORY_NON_CONSISTENT flag
>>> 
>>> This patch lets user-space to request a non-consistent memory
>>> allocation during CREATE_BUFS and REQBUFS ioctl calls.
>>> 
>>> = CREATE_BUFS
>>> 
>>>   struct v4l2_create_buffers has seven 4-byte reserved areas,
>>>   so reserved[0] is renamed to ->flags. The struct, thus, now
>>>   has six reserved 4-byte regions.
>>> 
>>> = CREATE_BUFS32
>>> 
>>>   struct v4l2_create_buff

Re: media/master bisection: v4l2-compliance-uvc.Buffer-ioctls-Input-0.VIDIOC_REQBUFS/CREATE_BUFS/QUERYBUF on rk3399-gru-kevin

2020-06-26 Thread Guillaume Tucker
Please see the bisection report below about a regression in
v4l2-compliance with uvcvideo:

[   25.495039] uvcvideo: Failed to query (SET_CUR) UVC control 10 on unit 2: 
-32 (exp. 2).
fail: v4l2-test-buffers.cpp(680): check_0(reqbufs.reserved, 
sizeof(reqbufs.reserved))
test VIDIOC_REQBUFS/CREATE_BUFS/QUERYBUF: FAIL


as seen in the full job log:


https://storage.kernelci.org/media/master/v5.8-rc1-64-ge30cc79cc80f/arm64/defconfig/gcc-8/lab-collabora/v4l2-compliance-uvc-rk3399-gru-kevin.html#L1713

with a few more details about the regression here:

https://kernelci.org/test/case/id/5ef23169140826f73d97bf51/

and the same test case failure also seen with vivid:

https://kernelci.org/test/case/id/5ef23699f641f7b3e597bf3f/


The bisection actually ran a couple of days ago but there was an
email error when sending the report, so I'm sending it by hand
now.  I hope the issue hasn't spread too widely already, I know
it's also affecting linux-next.

Guillaume


On 25/06/2020 23:19, kernelci.org bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has  *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.  *
> *   *
> * If you do send a fix, please include this trailer:*
> *   Reported-by: "kernelci.org bot"   *
> *   *
> * Hope this helps!  *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> media/master bisection: 
> v4l2-compliance-uvc.Buffer-ioctls-Input-0.VIDIOC_REQBUFS/CREATE_BUFS/QUERYBUF 
> on rk3399-gru-kevin
> 
> Summary:
>   Start:  e30cc79cc80f media: media-request: Fix crash if memory 
> allocation fails
>   Plain log:  
> https://storage.kernelci.org/media/master/v5.8-rc1-64-ge30cc79cc80f/arm64/defconfig/gcc-8/lab-collabora/v4l2-compliance-uvc-rk3399-gru-kevin.txt
>   HTML log:   
> https://storage.kernelci.org/media/master/v5.8-rc1-64-ge30cc79cc80f/arm64/defconfig/gcc-8/lab-collabora/v4l2-compliance-uvc-rk3399-gru-kevin.html
>   Result: 1e0b2318fa75 media: videobuf2: handle 
> V4L2_FLAG_MEMORY_NON_CONSISTENT flag
> 
> Checks:
>   revert: PASS
>   verify: PASS
> 
> Parameters:
>   Tree:   media
>   URL:https://git.linuxtv.org/media_tree.git
>   Branch: master
>   Target: rk3399-gru-kevin
>   CPU arch:   arm64
>   Lab:lab-collabora
>   Compiler:   gcc-8
>   Config: defconfig
>   Test case:  
> v4l2-compliance-uvc.Buffer-ioctls-Input-0.VIDIOC_REQBUFS/CREATE_BUFS/QUERYBUF
> 
> Breaking commit found:
> 
> ---
> commit 1e0b2318fa75d186ee0d2be31843ce867385fcc4
> Author: Sergey Senozhatsky 
> Date:   Thu May 14 18:01:45 2020 +0200
> 
> media: videobuf2: handle V4L2_FLAG_MEMORY_NON_CONSISTENT flag
> 
> This patch lets user-space to request a non-consistent memory
> allocation during CREATE_BUFS and REQBUFS ioctl calls.
> 
> = CREATE_BUFS
> 
>   struct v4l2_create_buffers has seven 4-byte reserved areas,
>   so reserved[0] is renamed to ->flags. The struct, thus, now
>   has six reserved 4-byte regions.
> 
> = CREATE_BUFS32
> 
>   struct v4l2_create_buffers32 has seven 4-byte reserved areas,
>   so reserved[0] is renamed to ->flags. The struct, thus, now
>   has six reserved 4-byte regions.
> 
> = REQBUFS
> 
>  We use one bit of a ->reserved[1] member of struct v4l2_requestbuffers,
>  which is now renamed to ->flags. Unlike v4l2_create_buffers, struct
>  v4l2_requestbuffers does not have enough reserved room. Therefore for
>  backward compatibility  ->reserved and ->flags were put into anonymous
>  union.
> 
> Signed-off-by: Sergey Senozhatsky 
> Signed-off-by: Hans Verkuil 
> Signed-off-by: Mauro Carvalho Chehab 
> 
> diff --git a/Documentation/userspace-api/media/v4l/vidioc-create-bufs.rst 
> b/Documentation/userspace-api/media/v4l/vidioc-create-bufs.rst
> index e1afc5b504c2..f2a702870fad 100644
> --- a/Documentation/userspace-api/media/v4l/vidioc-create-bufs.rst
> +++ b/Documentation/userspace-api/media/v4l/vidioc-create-bufs.rst
> @@ -121,7 +121,12 @@ than the number requested.
>   other changes, then set ``count`` to 0, ``memory`` to
>   ``V4L2_MEMORY_MMAP`` and ``format.type`` to the buffer type.
>  * - __u32
> -  - ``reserved``\ [7]
> +  - ``flags``
> +  - Specifies additional buffer management attributes.
> + See :ref:`memory-flags`.
> +
> +* - __u32
> +  - ``reserved``\ [6]
>- A place holder for future extensions. Drivers and applications
>   must 

Re: stable-rc/linux-5.6.y bisection: baseline.dmesg.crit on bcm2837-rpi-3-b

2020-06-25 Thread Guillaume Tucker
On 25/06/2020 06:24, kernelci.org bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has      *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.      *
> *                                                               *
> * If you do send a fix, please include this trailer:            *
> *   Reported-by: "kernelci.org bot"                             *
> *                                                               *
> * Hope this helps!                                              *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> stable-rc/linux-5.6.y bisection: baseline.dmesg.crit on bcm2837-rpi-3-b
> 
> Summary:
>   Start:  61aba373f570 Linux 5.6.19
>   Plain log:  
> https://storage.kernelci.org/stable-rc/linux-5.6.y/v5.6.19/arm64/defconfig/gcc-8/lab-baylibre/baseline-bcm2837-rpi-3-b.txt
>   HTML log:   
> https://storage.kernelci.org/stable-rc/linux-5.6.y/v5.6.19/arm64/defconfig/gcc-8/lab-baylibre/baseline-bcm2837-rpi-3-b.html
>   Result: 9cf5d5444c78 Revert "cgroup: Add memory barriers to plug 
> cgroup_rstat_updated() race window"
> 
> Checks:
>   revert: PASS
>   verify: PASS
> 
> Parameters:
>   Tree:   stable-rc
>   URL:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
>   Branch: linux-5.6.y
>   Target: bcm2837-rpi-3-b
>   CPU arch:   arm64
>   Lab:    lab-baylibre
>   Compiler:   gcc-8
>   Config: defconfig
>   Test case:  baseline.dmesg.crit

The "crit" kernel message is:

[   17.536674] hwmon hwmon1: Undervoltage detected!

which is a known intermittent issue on rpi-3-b that is most
likely to be unrelated to the patch found by this bisection.

Maybe the bisection landed on this patch because it makes the
undervoltage more likely as a side effect, if it affects timing
or the binary size or addresses in very subtle ways.  But even
then it wouldn't be a regression.

We should stop running bisections on rpi-3-b, at least for this
kernel error message, until the undervoltage issue has been
fixed, to avoid further false positives.
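
As an illustration only, skipping bisections for a known intermittent
message could be sketched as a per-platform denylist.  This is not
KernelCI code: KNOWN_NOISE, is_known_noise() and should_bisect() are
hypothetical names, and the only entry is the message quoted above.

```python
import re

# Hypothetical denylist of (platform, pattern) pairs for messages that are
# known to be intermittent on that platform.  Only this entry comes from
# the thread; the structure itself is an assumption.
KNOWN_NOISE = [
    ("bcm2837-rpi-3-b", re.compile(r"hwmon hwmon\d+: Undervoltage detected!")),
]


def is_known_noise(platform, message):
    """Return True if the message matches a known-noise pattern for platform."""
    return any(p == platform and rx.search(message) for p, rx in KNOWN_NOISE)


def should_bisect(platform, crit_messages):
    """Bisect only if at least one message is not known noise."""
    return any(not is_known_noise(platform, m) for m in crit_messages)


print(should_bisect(
    "bcm2837-rpi-3-b",
    ["[   17.536674] hwmon hwmon1: Undervoltage detected!"],
))
```

With only the undervoltage message present, should_bisect() returns
False; any other "crit" message on the same platform would still
trigger a bisection.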

Sorry for the noise.

Guillaume


> Breaking commit found:
> 
> ---
> commit 9cf5d5444c78c14bb9f90dd21cef361ee14ba64b
> Author: Tejun Heo 
> Date:   Thu Apr 9 14:55:35 2020 -0400
> 
> Revert "cgroup: Add memory barriers to plug cgroup_rstat_updated() race 
> window"
> 
> [ Upstream commit d8ef4b38cb69d907f9b0e889c44d05fc0f890977 ]
> 
> This reverts commit 9a9e97b2f1f2 ("cgroup: Add memory barriers to plug
> cgroup_rstat_updated() race window").
> 
> The commit was added in anticipation of memcg rstat conversion which 
> needed
> synchronous accounting for the event counters (e.g. oom kill count). 
> However,
> the conversion didn't get merged due to percpu memory overhead concern 
> which
> couldn't be addressed at the time.
> 
> Unfortunately, the patch's addition of smp_mb() to cgroup_rstat_updated()
> meant that every scheduling event now had to go through an additional full
> barrier and Mel Gorman noticed it as 1% regression in netperf UDP_STREAM 
> test.
> 
> There's no need to have this barrier in tree now and even if we need
> synchronous accounting in the future, the right thing to do is separating 
> that
> out to a separate function so that hot paths which don't care about
> synchronous behavior don't have to pay the overhead of the full barrier. 
> Let's
> revert.
> 
> Signed-off-by: Tejun Heo 
> Reported-by: Mel Gorman 
> Link: http://lkml.kernel.org/r/20200409154413.gk3...@techsingularity.net
> Cc: v4.18+
> Signed-off-by: Sasha Levin 
> 
> diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
> index 6f87352f8219..41ca996568df 100644
> --- a/kernel/cgroup/rstat.c
> +++ b/kernel/cgroup/rstat.c
> @@ -33,12 +33,9 @@ void cgroup_rstat_updated(struct cgroup *cgrp, int cpu)
>   return;
>  
>   /*
> -  * Paired with the one in cgroup_rstat_cpu_pop_updated().  Either we
> -  * see NULL updated_next or they see our updated stat.
> -  */
> - smp_mb();
> -
> - /*
> +  * Speculative already-on-list test. This may race leading to
> +  * temporary inaccuracies, which is fine.
> +  *
>* Because @parent's updated_children is terminated with @parent
>* instead of NULL, we can tell whether @cgrp is on the list by
>* testing the next pointer for NULL.
> @@ -134,13 +131,6 @@ static struct cgroup 
> *cgroup_rstat_cpu_pop_updated(struct cgroup *pos,
>   *nextp = rstatc->updated_next;
>   rstatc->updated_next = NULL;
>  
> - /*
> -  * Paired with the one in cgroup_rstat_cpu_updated().
> - 

Re: krzysztof/for-next bisection: baseline.dmesg.crit on bcm2837-rpi-3-b

2020-06-23 Thread Guillaume Tucker
On 23/06/2020 15:23, kernelci.org bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has      *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.      *
> *                                                               *
> * If you do send a fix, please include this trailer:            *
> *   Reported-by: "kernelci.org bot"                             *
> *                                                               *
> * Hope this helps!                                              *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> krzysztof/for-next bisection: baseline.dmesg.crit on bcm2837-rpi-3-b
> 
> Summary:
>   Start:  d6fe116541b7 Merge branch 'next/soc' into for-next
>   Plain log:  
> https://storage.kernelci.org/krzysztof/for-next/v5.8-rc1-14-gd6fe116541b7/arm64/defconfig+CONFIG_RANDOMIZE_BASE=y/gcc-8/lab-baylibre/baseline-bcm2837-rpi-3-b.txt
>   HTML log:   
> https://storage.kernelci.org/krzysztof/for-next/v5.8-rc1-14-gd6fe116541b7/arm64/defconfig+CONFIG_RANDOMIZE_BASE=y/gcc-8/lab-baylibre/baseline-bcm2837-rpi-3-b.html
>   Result: 5b17a04addc2 ARM: exynos: clear L310_AUX_CTRL_FULL_LINE_ZERO in 
> default l2c_aux_val
> 
> Checks:
>   revert: PASS
>   verify: PASS
> 
> Parameters:
>   Tree:   krzysztof
>   URL:    https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux.git
>   Branch: for-next
>   Target: bcm2837-rpi-3-b
>   CPU arch:   arm64
>   Lab:    lab-baylibre
>   Compiler:   gcc-8
>   Config: defconfig+CONFIG_RANDOMIZE_BASE=y
>   Test case:  baseline.dmesg.crit

The "crit" kernel message is:

[   17.566555] hwmon hwmon1: Undervoltage detected!

which does not seem to have anything to do with the patch found
by the bisection.  Also, the bcm2837-rpi-3-b uses Cortex-A53
cores and no L2C-310 cache.

This undervoltage issue is actually an intermittent issue that
was already present before.  See next-20200616:

  https://kernelci.org/test/case/id/5ee880c10e8d4cd38797bf52/
  
https://storage.kernelci.org/next/master/next-20200616/arm64/defconfig/gcc-8/lab-baylibre/baseline-bcm2837-rpi-3-b.html#L708

I'll still take a closer look to be sure this is actually noise.
The same revision built without CONFIG_RANDOMIZE_BASE=y passed
fine, although I don't see how this could be related:

  https://kernelci.org/test/plan/id/5ef1ccb2d9df2557d597bf20/

Maybe the rpi-3-b could get an undervoltage depending on the
address where the kernel was loaded, and somehow my patch would
make this more likely?  It sounds so far-fetched...

This is so ironic - after 6 months with no false positives in
kernelci bisections, and with this rpi-3-b issue too random to
ever cause a bisection to succeed, I get this report which landed
on a commit that I made, one week after enabling public bisection
email reports again.  It must be trying to tell me something :)

Guillaume


> Breaking commit found:
> 
> -------
> commit 5b17a04addc29201dc142c8d2c077eb7745d2e35
> Author: Guillaume Tucker 
> Date:   Fri Jun 12 14:58:37 2020 +0100
> 
> ARM: exynos: clear L310_AUX_CTRL_FULL_LINE_ZERO in default l2c_aux_val
> 
> This "alert" error message can be seen on exynos4412-odroidx2:
> 
> L2C: platform modifies aux control register: 0x0207 -> 0x3e470001
> L2C: platform provided aux values permit register corruption.
> 
> Followed by this plain error message:
> 
> L2C-310: enabling full line of zeros but not enabled in Cortex-A9
> 
> To fix it, don't set the L310_AUX_CTRL_FULL_LINE_ZERO flag (bit 0) in
> the default value of l2c_aux_val.  It may instead be enabled when
> applicable by the logic in l2c310_enable() if the attribute
> "arm,full-line-zero-disable" was set in the device tree.
> 
> The initial commit that introduced this default value was in v2.6.38
> commit 1cf0eb799759 ("ARM: S5PV310: Add L2 cache init function in
> cpu.c").
> 
> However, the code to set the L310_AUX_CTRL_FULL_LINE_ZERO flag and
> manage that feature was added much later and the default value was not
> updated then.  So this seems to have been a subtle oversight
> especially since enabling it only in the cache and not in the A9 core
> doesn't actually prevent the platform from running.  According to the
> TRM, the opposite would be a real issue, if the feature was enabled in
> the A9 core but not in the cache controller.
> 

Re: next/master bisection: baseline.login on ox820-cloudengines-pogoplug-series-3

2020-06-18 Thread Guillaume Tucker
On 18/06/2020 15:09, Miquel Raynal wrote:
> Hi Guillaume,
> 
> Miquel Raynal  wrote on Thu, 18 Jun 2020
> 15:23:24 +0200:
> 
>> Hi Guillaume,
>>
>> Guillaume Tucker  wrote on Thu, 18 Jun
>> 2020 13:28:05 +0100:
>>
>>> Please see the bisection report below about a kernel panic.
>>>
>>> Reports aren't automatically sent to the public while we're
>>> trialing new bisection features on kernelci.org but this one
>>> looks valid.
>>>
>>> See the kernel Oops due to a NULL pointer followed by a panic:
>>>
>>>   
>>> https://storage.kernelci.org/next/master/next-20200618/arm/oxnas_v6_defconfig/gcc-8/lab-baylibre/baseline-ox820-cloudengines-pogoplug-series-3.html#L504
>>
>> Thanks for the report, I will not be able to manage it before Monday,
>> but I'll try to take care of it early next week.
> 
> Actually Boris saw the issue, I just updated nand/next, it should be
> part of tomorrow's linux-next. Could you please report if it fixes your
> boot?

Sure, will check tomorrow.  Thanks for the update.

We may also consider adding the nand/next branch to kernelci.org
and catch issues earlier.  We can discuss that separately.

Guillaume


Re: next/master bisection: baseline.login on ox820-cloudengines-pogoplug-series-3

2020-06-18 Thread Guillaume Tucker
Please see the bisection report below about a kernel panic.

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org but this one
looks valid.

See the kernel Oops due to a NULL pointer followed by a panic:

  
https://storage.kernelci.org/next/master/next-20200618/arm/oxnas_v6_defconfig/gcc-8/lab-baylibre/baseline-ox820-cloudengines-pogoplug-series-3.html#L504

Thanks,
Guillaume


On 18/06/2020 13:20, kernelci.org bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has      *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.      *
> *                                                               *
> * If you do send a fix, please include this trailer:            *
> *   Reported-by: "kernelci.org bot"                             *
> *                                                               *
> * Hope this helps!                                              *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> next/master bisection: baseline.login on ox820-cloudengines-pogoplug-series-3
> 
> Summary:
>   Start:  ce2cc8efd7a4 Add linux-next specific files for 20200618
>   Plain log:  
> https://storage.kernelci.org/next/master/next-20200618/arm/oxnas_v6_defconfig/gcc-8/lab-baylibre/baseline-ox820-cloudengines-pogoplug-series-3.txt
>   HTML log:   
> https://storage.kernelci.org/next/master/next-20200618/arm/oxnas_v6_defconfig/gcc-8/lab-baylibre/baseline-ox820-cloudengines-pogoplug-series-3.html
>   Result: 7b929258ff0e mtd: rawnand: Allocate the interface 
> configurations dynamically
> 
> Checks:
>   revert: PASS
>   verify: PASS
> 
> Parameters:
>   Tree:   next
>   URL:
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>   Branch: master
>   Target: ox820-cloudengines-pogoplug-series-3
>   CPU arch:   arm
>   Lab:    lab-baylibre
>   Compiler:   gcc-8
>   Config: oxnas_v6_defconfig
>   Test case:  baseline.login
> 
> Breaking commit found:
> 
> ---
> commit 7b929258ff0e913616e21661a757f5ecb776d337
> Author: Miquel Raynal 
> Date:   Fri May 29 13:13:22 2020 +0200
> 
> mtd: rawnand: Allocate the interface configurations dynamically
> 
> Instead of manipulating the statically allocated structure and copy
> timings around, allocate one at identification time and save it in the
> nand_chip structure once it has been initialized.
> 
> All NAND chips using the same interface configuration during reset and
> startup, we define a helper to retrieve a single reset interface
> configuration object, shared across all NAND chips.
> 
> We use a second pointer to always have a reference on the currently
> applied interface configuration, which may either point to the "best
> interface configuration" or to the "default reset interface
> configuration".
> 
> Signed-off-by: Miquel Raynal 
> Reviewed-by: Boris Brezillon 
> Link: 
> https://lore.kernel.org/linux-mtd/20200529111322.7184-29-miquel.ray...@bootlin.com
> 
> diff --git a/drivers/mtd/nand/raw/internals.h 
> b/drivers/mtd/nand/raw/internals.h
> index 5ebfbb89e572..012876e14317 100644
> --- a/drivers/mtd/nand/raw/internals.h
> +++ b/drivers/mtd/nand/raw/internals.h
> @@ -93,6 +93,7 @@ onfi_find_closest_sdr_mode(const struct nand_sdr_timings 
> *spec_timings);
>  int nand_choose_best_sdr_timings(struct nand_chip *chip,
>struct nand_interface_config *iface,
>struct nand_sdr_timings *spec_timings);
> +const struct nand_interface_config *nand_get_reset_interface_config(void);
>  int nand_get_features(struct nand_chip *chip, int addr, u8 
> *subfeature_param);
>  int nand_set_features(struct nand_chip *chip, int addr, u8 
> *subfeature_param);
>  int nand_read_page_raw_notsupp(struct nand_chip *chip, u8 *buf,
> diff --git a/drivers/mtd/nand/raw/nand_base.c 
> b/drivers/mtd/nand/raw/nand_base.c
> index 753328f106c1..4a0d486210e9 100644
> --- a/drivers/mtd/nand/raw/nand_base.c
> +++ b/drivers/mtd/nand/raw/nand_base.c
> @@ -928,9 +928,9 @@ static int nand_reset_interface(struct nand_chip *chip, 
> int chipnr)
>* timings to timing mode 0.
>*/
>  
> - onfi_fill_interface_config(chip, >interface_config,
> -NAND_SDR_IFACE, 0);
> - ret = ops->setup_interface(chip, chipnr, >interface_config);
> + chip->current_interface_config = nand_get_reset_interface_config();
> + ret = ops->setup_interface(chip, chipnr,
> +chip->current_interface_config);
>   if (ret)
>   pr_err("Failed to configure data interface to SDR 

Re: [PATCH] ARM: exynos: update l2c_aux_mask to fix alert message

2020-06-12 Thread Guillaume Tucker
On 02/04/2020 14:11, Russell King - ARM Linux admin wrote:
> On Thu, Apr 02, 2020 at 02:03:52PM +0100, Russell King - ARM Linux admin 
> wrote:
>> On Thu, Apr 02, 2020 at 01:13:24PM +0100, Guillaume Tucker wrote:
>>> On 01/04/2020 17:31, Russell King - ARM Linux admin wrote:
>>>> On Wed, Apr 01, 2020 at 05:08:03PM +0100, Guillaume Tucker wrote:
>>>>> Allow setting the number of cycles for RAM reads in the pl310 cache
>>>>> controller L2 auxiliary control register mask (bits 0-2) since it
>>>>> needs to be changed in software.  This only affects exynos4210 and
>>>>> exynos4412 as they use the pl310 cache controller.
>>>>>
>>>>> With the mask used until now, the following warnings were generated,
>>>>> the 2nd one being a pr_alert():
>>>>>
>>>>>   L2C: platform modifies aux control register: 0x0207 -> 0x3e470001
>>>>>   L2C: platform provided aux values permit register corruption.
>>>>>
>>>>> This latency cycles value has always been set in software in spite of
>>>>> the warnings.  Keep it this way but clear the alert message about
>>>>> register corruption to acknowledge it is a valid thing to do.
>>>>
>>>> This is telling you that you are doing something you should not be
>>>> doing.  The L2C controller should be configured by board firmware
>>>> first and foremost, because if, for example, u-boot makes use of the
>>>> L2 cache, or any other pre-main kernel code (in other words,
>>>> decompressor) the setup of the L2 controller will be wrong.
>>>>
>>>> So, NAK.
>>>
>>> OK thanks, I guess I misinterpreted the meaning of the error
>>> message.  It's really saying that the register value was not the
>>> right one before the kernel tried to change it.  Next step for me
>>> is to look into U-Boot.
>>
>> The message "L2C: platform provided aux values permit register
>> corruption." means that bits are set in both the mask and the value
>> fields.  Since the new value is calculated as:
>>
>>  old = register value;
>>  new = old & mask;
>>  new |= val;
>>
>> If bits are set in both "mask" and "val" for a multi-bit field, the
>> value ending up in the field may not be what is intended.  Consider
>> a 5-bit field set initially to 10101, and the requested value is
>> 01000 with a mask of 1.  What you end up with is not 01000, but
>> 11101.  Hence, register corruption.  It is not possible to easily
>> tell whether the mask and values refer to a multi-bit field or not,
>> so the mere fact that bits are set in both issues the alert.
>>
>> The message "L2C: platform modifies aux control register ..." means
>> that you're trying to modify the value of the auxiliary control
>> register, which brings with it the problems I stated in my previous
>> email; platform configuration of the L2C must be done by firmware and
>> not the kernel for the reasons I've set out.
> 
> Actually, looking at the values there:
> 
> .l2c_aux_val= 0x3c41,
> -   .l2c_aux_mask   = 0xc20f,
> +   .l2c_aux_mask   = 0xc208,
> 
> Bit 0 is L310_AUX_CTRL_FULL_LINE_ZERO feature bit, which platforms have
> no business fiddling with - it is a Cortex-A9/L2C310 specific feature
> that needs both ends to be configured correctly to work.  The L2C code
> knows this and will deal with it.  So, .l2c_aux_val should drop setting
> bit 0.

Ack, I've just sent a patch to fix that:

  ARM: exynos: clear L310_AUX_CTRL_FULL_LINE_ZERO in default l2c_aux_val

Sorry about the confusion in my first patch; I was misled by
the TRM of an earlier revision of the L2C-310, in which these bits
were used to set the RAM data read latency.  It all makes sense
to me now with the documentation that matches the hardware.

> It's also setting L310_AUX_CTRL_NS_LOCKDOWN, which the kernel already
> deals with - this bit should be dropped as well.

OK I'll take a look at that and send a separate patch.

Presumably this bit should also be added to the mask to report an
error if the kernel is changing it?

> It's clearing L310_AUX_CTRL_CACHE_REPLACE_RR - this should be setup by
> firmware.

As far as I can tell, L310_AUX_CTRL_CACHE_REPLACE_RR is bit 25
i.e. 0x0200, which is set in hardware by default and is not
changed by the kernel:

L2C: platform modifies aux control register: 0x0207 -> 0x3e470001

Also this bit is already in the mask, and there is no error about
register corruption any more with the patch I sent today.

> For the prefetching, I thought there were DT properties for that.
> Please look at that, and see whether you can eliminate most of the
> .l2c_aux_val field set bits, and the .l2c_aux_mask clear bits.

Ack, it's handled by l2c310_of_parse().  I'll take a look and see
if I can make another patch for that.

Thanks,
Guillaume
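
The mask/value arithmetic quoted above (new = old & mask, then
new |= val) can be modelled in a few lines.  This is a sketch of the
arithmetic only, not the kernel's L2C code: apply_aux() and
permits_corruption() are hypothetical names.  The quoted example gives
the mask as "1", which looks truncated by the archive; the sketch
assumes an all-ones 5-bit mask (11111), since that reading reproduces
the stated result 11101.

```python
def apply_aux(old, mask, val):
    """Model of the update quoted in the thread: new = (old & mask) | val."""
    return (old & mask) | val


def permits_corruption(mask, val):
    """True when some bit is set in both mask and val, which is the
    condition the thread describes for the register corruption alert."""
    return (mask & val) != 0


# The 5-bit example from the thread, assuming the mask is all ones.
old, val, mask = 0b10101, 0b01000, 0b11111
print(f"{apply_aux(old, mask, val):05b}")  # old bits leak into the field
print(permits_corruption(mask, val))

# Clearing the field's bits in the mask first yields the requested value.
safe_mask = mask & ~0b11111
print(f"{apply_aux(old, safe_mask, val):05b}")
print(permits_corruption(safe_mask, val))
```

The first call keeps the old bits alongside the new ones, which is the
corruption described; with the field cleared in the mask, the result is
exactly the requested value and no bit is shared between mask and val.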


[PATCH] ARM: exynos: clear L310_AUX_CTRL_FULL_LINE_ZERO in default l2c_aux_val

2020-06-12 Thread Guillaume Tucker
This "alert" error message can be seen on exynos4412-odroidx2:

L2C: platform modifies aux control register: 0x0207 -> 0x3e470001
L2C: platform provided aux values permit register corruption.

Followed by this plain error message:

L2C-310: enabling full line of zeros but not enabled in Cortex-A9

To fix it, don't set the L310_AUX_CTRL_FULL_LINE_ZERO flag (bit 0) in
the default value of l2c_aux_val.  It may instead be enabled when
applicable by the logic in l2c310_enable() if the attribute
"arm,full-line-zero-disable" was set in the device tree.

The initial commit that introduced this default value was in v2.6.38:

  1cf0eb799759 "ARM: S5PV310: Add L2 cache init function in cpu.c"

However, the code to set the L310_AUX_CTRL_FULL_LINE_ZERO flag and
manage that feature was added much later and the default value was not
updated then.  So this seems to have been a subtle oversight
especially since enabling it only in the cache and not in the A9 core
doesn't actually prevent the platform from running.  According to the
TRM, the opposite would be a real issue, if the feature was enabled in
the A9 core but not in the cache controller.

Reported-by: "kernelci.org bot" 
Signed-off-by: Guillaume Tucker 
---
 arch/arm/mach-exynos/exynos.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/mach-exynos/exynos.c b/arch/arm/mach-exynos/exynos.c
index 7a8d1555db40..36c3785a 100644
--- a/arch/arm/mach-exynos/exynos.c
+++ b/arch/arm/mach-exynos/exynos.c
@@ -193,7 +193,7 @@ static void __init exynos_dt_fixup(void)
 }
 
 DT_MACHINE_START(EXYNOS_DT, "Samsung Exynos (Flattened Device Tree)")
-   .l2c_aux_val= 0x3c41,
+   .l2c_aux_val= 0x3c40,
.l2c_aux_mask   = 0xc20f,
.smp= smp_ops(exynos_smp_ops),
.map_io = exynos_init_io,
-- 
2.20.1



Re: next/master bisection: baseline.login on meson-sm1-sei610

2020-05-25 Thread Guillaume Tucker
Please see the bisection report below about a kernel Oops.

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org but this one
looks valid.

Guillaume


On 23/05/2020 18:46, kernelci.org bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has      *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.      *
> *                                                               *
> * If you do send a fix, please include this trailer:            *
> *   Reported-by: "kernelci.org bot"                             *
> *                                                               *
> * Hope this helps!                                              *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> next/master bisection: baseline.login on meson-sm1-sei610
> 
> Summary:
>   Start:  c11d28ab4a691 Add linux-next specific files for 20200522
>   Plain log:  
> https://storage.kernelci.org/next/master/next-20200522/arm64/defconfig/gcc-8/lab-baylibre/baseline-meson-sm1-sei610.txt
>   HTML log:   
> https://storage.kernelci.org/next/master/next-20200522/arm64/defconfig/gcc-8/lab-baylibre/baseline-meson-sm1-sei610.html
>   Result: 013af227f58a9 usb: dwc3: meson-g12a: handle the phy and glue 
> registers separately
> 
> Checks:
>   revert: PASS
>   verify: PASS
> 
> Parameters:
>   Tree:   next
>   URL:
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>   Branch: master
>   Target: meson-sm1-sei610
>   CPU arch:   arm64
>   Lab:    lab-baylibre
>   Compiler:   gcc-8
>   Config: defconfig
>   Test case:  baseline.login
> 
> Breaking commit found:
> 
> ---
> commit 013af227f58a97ffc61b99301f8f4448dc7e7f55
> Author: Neil Armstrong 
> Date:   Thu Mar 26 14:44:55 2020 +0100
> 
> usb: dwc3: meson-g12a: handle the phy and glue registers separately
> 
> On the Amlogic GXL/GXM SoCs, only the USB control registers are available,
> the PHY mode being handled in the PHY registers.
> 
> Thus, handle the PHY mode registers in separate regmaps and prepare
> support for Amlogic GXL/GXM SoCs by moving the regmap setup in a callback
> set in the SoC match data.
> 
> Reviewed-by: Martin Blumenstingl 
> Signed-off-by: Neil Armstrong 
> Signed-off-by: Felipe Balbi 
> 
> diff --git a/drivers/usb/dwc3/dwc3-meson-g12a.c 
> b/drivers/usb/dwc3/dwc3-meson-g12a.c
> index f49c9e2665376..d7eff4d7c5fe6 100644
> --- a/drivers/usb/dwc3/dwc3-meson-g12a.c
> +++ b/drivers/usb/dwc3/dwc3-meson-g12a.c
> @@ -30,7 +30,7 @@
>  #include 
>  #include 
>  
> -/* USB2 Ports Control Registers */
> +/* USB2 Ports Control Registers, offsets are per-port */
>  
>  #define U2P_REG_SIZE 0x20
>  
> @@ -50,14 +50,16 @@
>  
>  /* USB Glue Control Registers */
>  
> -#define USB_R0   0x80
> +#define G12A_GLUE_OFFSET 0x80
> +
> +#define USB_R0   0x00
>   #define USB_R0_P30_LANE0_TX2RX_LOOPBACK BIT(17)
>   #define USB_R0_P30_LANE0_EXT_PCLK_REQ   BIT(18)
>   #define USB_R0_P30_PCS_RX_LOS_MASK_VAL_MASK GENMASK(28, 19)
>   #define USB_R0_U2D_SS_SCALEDOWN_MODE_MASK   GENMASK(30, 29)
>   #define USB_R0_U2D_ACT  BIT(31)
>  
> -#define USB_R1   0x84
> +#define USB_R1   0x04
>   #define USB_R1_U3H_BIGENDIAN_GS BIT(0)
>   #define USB_R1_U3H_PME_ENABLE   BIT(1)
>   #define USB_R1_U3H_HUB_PORT_OVERCURRENT_MASKGENMASK(4, 2)
> @@ -69,23 +71,23 @@
>   #define USB_R1_U3H_FLADJ_30MHZ_REG_MASK GENMASK(24, 19)
>   #define USB_R1_P30_PCS_TX_SWING_FULL_MASK   GENMASK(31, 25)
>  
> -#define USB_R2   0x88
> +#define USB_R2   0x08
>   #define USB_R2_P30_PCS_TX_DEEMPH_3P5DB_MASK GENMASK(25, 20)
>   #define USB_R2_P30_PCS_TX_DEEMPH_6DB_MASK   GENMASK(31, 26)
>  
> -#define USB_R3   0x8c
> +#define USB_R3   0x0c
>   #define USB_R3_P30_SSC_ENABLE   BIT(0)
>   #define USB_R3_P30_SSC_RANGE_MASK   GENMASK(3, 1)
>   #define USB_R3_P30_SSC_REF_CLK_SEL_MASK 

Re: next/master bisection: baseline.login on panda

2020-05-20 Thread Guillaume Tucker
Please see the bisection report below about a boot failure.

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org but this one
looks valid.

Unfortunately there isn't anything in the kernel log; it's
probably crashing very early on.  The bisection was run on
omap4-panda, and omap3-beagle-xm seems to have the same issue
as it's also failing to boot.

Please let us know if anyone is able to debug the issue or if we
need to rerun the KernelCI job with earlyprintk enabled or any
debug config option.

Thanks,
Guillaume

On 20/05/2020 09:34, kernelci.org bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has      *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.      *
> *                                                               *
> * If you do send a fix, please include this trailer:            *
> *   Reported-by: "kernelci.org bot"                             *
> *                                                               *
> * Hope this helps!                                              *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> next/master bisection: baseline.login on panda
> 
> Summary:
>   Start:  fb57b1fabcb28 Add linux-next specific files for 20200519
>   Plain log:  
> https://storage.kernelci.org/next/master/next-20200519/arm/omap2plus_defconfig/gcc-8/lab-baylibre/baseline-omap4-panda.txt
>   HTML log:   
> https://storage.kernelci.org/next/master/next-20200519/arm/omap2plus_defconfig/gcc-8/lab-baylibre/baseline-omap4-panda.html
>   Result: ce574c27ae275 iommu: Move iommu_group_create_direct_mappings() 
> out of iommu_group_add_device()
> 
> Checks:
>   revert: PASS
>   verify: PASS
> 
> Parameters:
>   Tree:   next
>   URL:
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>   Branch: master
>   Target: panda
>   CPU arch:   arm
>   Lab:lab-baylibre
>   Compiler:   gcc-8
>   Config: omap2plus_defconfig
>   Test case:  baseline.login
> 
> Breaking commit found:
> 
> ---
> commit ce574c27ae275bc51b6437883fc9cd1c46b498e5
> Author: Joerg Roedel 
> Date:   Wed Apr 29 15:36:50 2020 +0200
> 
> iommu: Move iommu_group_create_direct_mappings() out of 
> iommu_group_add_device()
> 
> After the previous changes the iommu group may not have a default
> domain when iommu_group_add_device() is called. With no default domain
> iommu_group_create_direct_mappings() will do nothing and no direct
> mappings will be created.
> 
> Rename iommu_group_create_direct_mappings() to
> iommu_create_device_direct_mappings() to better reflect that the
> function creates direct mappings only for one device and not for all
> devices in the group. Then move the call to the places where a default
> domain actually exists.
> 
> Signed-off-by: Joerg Roedel 
> Tested-by: Marek Szyprowski 
> Acked-by: Marek Szyprowski 
> Link: https://lore.kernel.org/r/20200429133712.31431-13-j...@8bytes.org
> Signed-off-by: Joerg Roedel 
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 7de0e29db3338..834a45da0ed0f 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -89,6 +89,8 @@ static int __iommu_attach_group(struct iommu_domain *domain,
>   struct iommu_group *group);
>  static void __iommu_detach_group(struct iommu_domain *domain,
>struct iommu_group *group);
> +static int iommu_create_device_direct_mappings(struct iommu_group *group,
> +struct device *dev);
>  
>  #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)\
>  struct iommu_group_attribute iommu_group_attr_##_name =  \
> @@ -243,6 +245,8 @@ static int __iommu_probe_device_helper(struct device *dev)
>   if (group->default_domain)
>   ret = __iommu_attach_device(group->default_domain, dev);
>  
> + iommu_create_device_direct_mappings(group, dev);
> +
>   iommu_group_put(group);
>  
>   if (ret)
> @@ -263,6 +267,7 @@ static int __iommu_probe_device_helper(struct device *dev)
>  int iommu_probe_device(struct device *dev)
>  {
>   const struct iommu_ops *ops = dev->bus->iommu_ops;
> + struct iommu_group *group;
>   int ret;
>  
>   WARN_ON(dev->iommu_group);
> @@ -285,6 +290,10 @@ int iommu_probe_device(struct device *dev)
>   if (ret)
>   goto err_module_put;
>  
> + group = iommu_group_get(dev);
> + iommu_create_device_direct_mappings(group, dev);
> + iommu_group_put(group);
> +
>   if (ops->probe_finalize)
>   

Re: next/master bisection: baseline.login on jetson-tk1

2020-05-13 Thread Guillaume Tucker
On 12/05/2020 16:16, Joerg Roedel wrote:
> Hi Guillaume,
> 
> thanks for the report!
> 
> On Tue, May 12, 2020 at 07:05:13AM +0100, Guillaume Tucker wrote:
>>> Summary:
>>>   Start:      4b20e7462caa6 Add linux-next specific files for 20200511
>>>   Plain log:  https://storage.kernelci.org/next/master/next-20200511/arm/tegra_defconfig/gcc-8/lab-collabora/baseline-tegra124-jetson-tk1.txt
>>>   HTML log:   https://storage.kernelci.org/next/master/next-20200511/arm/tegra_defconfig/gcc-8/lab-collabora/baseline-tegra124-jetson-tk1.html
>>>   Result:     3eeeb45c6d044 iommu: Remove add_device()/remove_device() code-paths
> 
> Okay, so it faults at
> 
>   PC is at __iommu_probe_device+0x20/0x1b8
> 
> Can you translate that for me into a code-line, please? That would help
> find the issue.

Sure, sorry for the delay.  I've built my own image as vmlinux is
not stored by kernelci and reproduced the problem:

  https://lava.collabora.co.uk/scheduler/job/2403076#L544

which this time gave me:

<4>[2.540558] PC is at iommu_probe_device+0x1c/0x15c
<4>[2.545606] LR is at of_iommu_configure+0x15c/0x1c4
<4>[2.550736] pc : []lr : []psr: a013

which in turn brings us to:

(gdb) l *0xc092e0e4
0xc092e0e4 is in iommu_probe_device (drivers/iommu/iommu.c:232).
227 int ret;
228 
229 if (!dev_iommu_get(dev))
230 return -ENOMEM;
231 
232 if (!try_module_get(ops->owner)) {
233 ret = -EINVAL;
234 goto err_out;
235 }
236 
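
As a side note on the address arithmetic, the sketch below shows how
a symbol+offset pair from the oops relates to the absolute address
given to gdb.  It assumes that 0xc092e0e4 is the PC value elided from
the log line above and that it corresponds to
iommu_probe_device+0x1c; the variable names are illustrative only.

```shell
#!/bin/sh
# Assumption: 0xc092e0e4 (the address passed to gdb above) is the
# faulting PC, reported in the oops as iommu_probe_device+0x1c.
pc=0xc092e0e4
offset=0x1c

# Start of the symbol = PC - offset.  This is the address one would
# expect to find for iommu_probe_device in the matching System.map.
sym_start=$(printf '0x%x' $(($pc - $offset)))
echo "iommu_probe_device starts at $sym_start"
```

With a vmlinux that was built with debug info, the same source lookup
can also be done non-interactively, for example with
gdb -batch -ex 'l *0xc092e0e4' vmlinux.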


Hope this helps.

Guillaume


kernelci.org transitioning to functional testing

2020-05-13 Thread Guillaume Tucker
As kernelci.org is expanding its functional testing
capabilities, the concept of boot testing is now being
deprecated.

Next Monday 18th May, the web dashboard on https://kernelci.org
will be updated to primarily show functional test results
rather than boot results.  The Boots tab will still be
available until 5th June to ease the transition.

The new equivalent to boot testing is the *baseline* test suite
which also runs sanity checks using dmesg and bootrr[1].

Boot email reports will eventually be replaced with baseline
reports.  For those of you already familiar with the test email
reports, they will be simplified to only show regressions with
links to the dashboard for all the details.

Some functional tests are already being run by kernelci.org.
Their results have only been shared by email so far, but they will
become visible on the web dashboard next week.  In particular:
v4l2-compliance, i-g-t for DRM/KMS and Panfrost,
suspend/resume...

And of course, a lot of functional test suites are in the
process of being added: kselftest, KUnit, LTP, xfstests,
extended i-g-t coverage and many more.

The detailed schedule is available on a GitHub issue[2].

Please let us know if you have any questions, comments or
concerns either in this thread, on kerne...@groups.io or IRC
#kernelci on Freenode.

Stay tuned!

Thanks,
Guillaume


[1] bootrr: https://github.com/kernelci/bootrr
[2] schedule: https://github.com/kernelci/kernelci-backend/issues/238



Re: next/master bisection: baseline.login on jetson-tk1

2020-05-12 Thread Guillaume Tucker
Please see the bisection report below about a kernel panic.

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org but this one
looks valid.

See the kernel Oops due to a NULL pointer followed by a panic:


https://storage.kernelci.org/next/master/next-20200511/arm/tegra_defconfig/gcc-8/lab-collabora/baseline-tegra124-jetson-tk1.html#L573

Stack trace:

<0>[2.953683] [] (__iommu_probe_device) from [] (iommu_probe_device+0x18/0x124)
<0>[2.962810] [] (iommu_probe_device) from [] (of_iommu_configure+0x154/0x1b8)
<0>[2.971853] [] (of_iommu_configure) from [] (of_dma_configure+0x144/0x2c8)
<0>[2.980722] [] (of_dma_configure) from [] (host1x_attach_driver+0x148/0x2c4)
<0>[2.989763] [] (host1x_attach_driver) from [] (host1x_driver_register_full+0x70/0xcc)
<0>[2.999585] [] (host1x_driver_register_full) from [] (host1x_drm_init+0x14/0x50)
<0>[3.008973] [] (host1x_drm_init) from [] (do_one_initcall+0x50/0x2b0)
<0>[3.017405] [] (do_one_initcall) from [] (kernel_init_freeable+0x188/0x200)
<0>[3.026361] [] (kernel_init_freeable) from [] (kernel_init+0x8/0x114)
<0>[3.034794] [] (kernel_init) from [] (ret_from_fork+0x14/0x2c)

Guillaume


On 12/05/2020 02:24, kernelci.org bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has      *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.      *
> *                                                               *
> * If you do send a fix, please include this trailer:            *
> *   Reported-by: "kernelci.org bot"                             *
> *                                                               *
> * Hope this helps!                                              *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> next/master bisection: baseline.login on jetson-tk1
> 
> Summary:
>   Start:      4b20e7462caa6 Add linux-next specific files for 20200511
>   Plain log:  https://storage.kernelci.org/next/master/next-20200511/arm/tegra_defconfig/gcc-8/lab-collabora/baseline-tegra124-jetson-tk1.txt
>   HTML log:   https://storage.kernelci.org/next/master/next-20200511/arm/tegra_defconfig/gcc-8/lab-collabora/baseline-tegra124-jetson-tk1.html
>   Result:     3eeeb45c6d044 iommu: Remove add_device()/remove_device() code-paths
> 
> Checks:
>   revert:     PASS
>   verify:     PASS
> 
> Parameters:
>   Tree:       next
>   URL:        https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>   Branch:     master
>   Target:     jetson-tk1
>   CPU arch:   arm
>   Lab:        lab-collabora
>   Compiler:   gcc-8
>   Config:     tegra_defconfig
>   Test case:  baseline.login
> 
> Breaking commit found:
> 
> ---
> commit 3eeeb45c6d0444b368cdeba9bdafa8bbcf5370d1
> Author: Joerg Roedel 
> Date:   Wed Apr 29 15:37:10 2020 +0200
> 
> iommu: Remove add_device()/remove_device() code-paths
> 
> All drivers are converted to use the probe/release_device()
> call-backs, so the add_device/remove_device() pointers are unused and
> the code using them can be removed.
> 
> Signed-off-by: Joerg Roedel 
> Tested-by: Marek Szyprowski 
> Acked-by: Marek Szyprowski 
> Link: https://lore.kernel.org/r/20200429133712.31431-33-j...@8bytes.org
> Signed-off-by: Joerg Roedel 
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 397fd4fd0c320..7f99e5ae432c6 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -220,12 +220,20 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
>   return ret;
>  }
>  
> -static int __iommu_probe_device_helper(struct device *dev)
> +int iommu_probe_device(struct device *dev)
>  {
>   const struct iommu_ops *ops = dev->bus->iommu_ops;
>   struct iommu_group *group;
>   int ret;
>  
> + if (!dev_iommu_get(dev))
> + return -ENOMEM;
> +
> + if (!try_module_get(ops->owner)) {
> + ret = -EINVAL;
> + goto err_out;
> + }
> +
>   ret = __iommu_probe_device(dev, NULL);
>   if (ret)
>   goto err_out;
> @@ -259,75 +267,23 @@ static int __iommu_probe_device_helper(struct device *dev)
>  
>  err_release:
>   iommu_release_device(dev);
> +
>  err_out:
>   return ret;
>  
>  }
>  
> -int iommu_probe_device(struct device *dev)
> +void iommu_release_device(struct device *dev)
>  {
>   const struct iommu_ops *ops = dev->bus->iommu_ops;
> - struct iommu_group *group;
> - int ret;
> -
> - WARN_ON(dev->iommu_group);
> -
> - if (!ops)
> - return -EINVAL;
> -
> - if (!dev_iommu_get(dev))
> - return -ENOMEM;

Re: stable/linux-4.4.y bisection: baseline.login on at91-sama5d4_xplained

2020-05-11 Thread Guillaume Tucker
Please see the bisection report below about a boot failure.

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org but this one
looks valid.

It appears to be due to the fact that the network interface is
failing to get brought up:

[  114.385000] Waiting up to 10 more seconds for network.
[  124.355000] Sending DHCP requests ...#
..#
.#
 timed out!
[  212.355000] IP-Config: Reopening network devices...
[  212.365000] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
#


I guess the board would boot fine without network if it didn't
have ip=dhcp in the command line, so it's not strictly a kernel
boot failure but still an ethernet issue.

There wasn't any failure reported by kernelci on linux-4.9.y so
maybe this patch was applied by mistake on linux-4.4.y but I
haven't investigated enough to prove this.

Thanks,
Guillaume


On 10/05/2020 18:27, kernelci.org bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has      *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.      *
> *                                                               *
> * If you do send a fix, please include this trailer:            *
> *   Reported-by: "kernelci.org bot"                             *
> *                                                               *
> * Hope this helps!                                              *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> stable/linux-4.4.y bisection: baseline.login on at91-sama5d4_xplained
> 
> Summary:
>   Start:      e157447efd85b Linux 4.4.223
>   Plain log:  https://storage.kernelci.org/stable/linux-4.4.y/v4.4.223/arm/multi_v7_defconfig/gcc-8/lab-baylibre/baseline-at91-sama5d4_xplained.txt
>   HTML log:   https://storage.kernelci.org/stable/linux-4.4.y/v4.4.223/arm/multi_v7_defconfig/gcc-8/lab-baylibre/baseline-at91-sama5d4_xplained.html
>   Result:     0d1951fa23ba0 net: phy: Avoid polling PHY with PHY_IGNORE_INTERRUPTS
> 
> Checks:
>   revert:     PASS
>   verify:     PASS
> 
> Parameters:
>   Tree:       stable
>   URL:        https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
>   Branch:     linux-4.4.y
>   Target:     at91-sama5d4_xplained
>   CPU arch:   arm
>   Lab:        lab-baylibre
>   Compiler:   gcc-8
>   Config:     multi_v7_defconfig
>   Test case:  baseline.login
> 
> Breaking commit found:
> 
> ---
> commit 0d1951fa23ba0d35a4c5498ff28d1c5206d6fcdd
> Author: Florian Fainelli 
> Date:   Mon Jan 18 19:33:06 2016 -0800
> 
> net: phy: Avoid polling PHY with PHY_IGNORE_INTERRUPTS
> 
> commit d5c3d84657db57bd23ecd58b97f1c99dd42a7b80 upstream.
> 
> Commit 2c7b49212a86 ("phy: fix the use of PHY_IGNORE_INTERRUPT") changed
> a hunk in phy_state_machine() in the PHY_RUNNING case which was not
> needed. The change essentially makes the PHY library treat PHY devices
> with PHY_IGNORE_INTERRUPT to keep polling for the PHY device, even
> though the intent is not to do it.
> 
> Fix this by reverting that specific hunk, which makes the PHY state
> machine wait for state changes, and stay in the PHY_RUNNING state for as
> long as needed.
> 
> Fixes: 2c7b49212a86 ("phy: fix the use of PHY_IGNORE_INTERRUPT")
> Signed-off-by: Florian Fainelli 
> Signed-off-by: David S. Miller 
> Signed-off-by: Greg Kroah-Hartman 
> 
> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> index 7d2cf015c5e76..b242bec834f4b 100644
> --- a/drivers/net/phy/phy.c
> +++ b/drivers/net/phy/phy.c
> @@ -912,10 +912,10 @@ void phy_state_machine(struct work_struct *work)
>   phydev->adjust_link(phydev->attached_dev);
>   break;
>   case PHY_RUNNING:
> - /* Only register a CHANGE if we are polling or ignoring
> -  * interrupts and link changed since latest checking.
> + /* Only register a CHANGE if we are polling and link changed
> +  * since latest checking.
>*/
> - if (!phy_interrupt_is_valid(phydev)) {
> + if (phydev->irq == PHY_POLL) {
>   old_link = phydev->link;
>   err = phy_read_status(phydev);
>   if (err)
> @@ -1015,8 +1015,13 @@ void phy_state_machine(struct work_struct *work)
>   dev_dbg(&phydev->dev, "PHY state change %s -> %s\n",
>   phy_state_to_str(old_state), phy_state_to_str(phydev->state));
>  
> - queue_delayed_work(system_power_efficient_wq, &phydev->state_queue,
> -PHY_STATE_TIME * HZ);
> + /* Only re-schedule a PHY state machine change if we are polling the
> +  * PHY, if PHY_IGNORE_INTERRUPT is set, 

Re: stable-rc/linux-5.4.y bisection: baseline.dmesg.alert on meson-g12a-x96-max

2020-05-01 Thread Guillaume Tucker
Please see the bisection report below about a kernel Oops.

Reports aren't automatically sent to the public while we're
trialing new bisection features on kernelci.org but this one
looks valid.

The log shows a kernel NULL pointer dereference:

  
https://storage.kernelci.org/stable-rc/linux-5.4.y/v5.4.36-52-g35bbc55d9e29/arm64/defconfig/gcc-8/lab-baylibre/baseline-meson-g12a-x96-max.html#L1113

The call stack is not the same as in the commit message found by
the bisection, so maybe it only fixed part of the problem:

<1>[   16.007376] Unable to handle kernel NULL pointer dereference at virtual address 0010
<1>[   16.016300] Mem abort info:
<1>[   16.019269]   ESR = 0x9606
<1>[   16.022571]   EC = 0x25: DABT (current EL), IL = 32 bits
<1>[   16.028075]   SET = 0, FnV = 0
<1>[   16.031356]   EA = 0, S1PTW = 0
<1>[   16.034705] Data abort info:
<1>[   16.037837]   ISV = 0, ISS = 0x0006
<1>[   16.041876]   CM = 0, WnR = 0
<1>[   16.045128] user pgtable: 4k pages, 48-bit VAs, pgdp=be0f
<1>[   16.051702] [0010] pgd=be117003, pud=be118003, pmd=
<0>[   16.051709] Internal error: Oops: 9606 [#1] PREEMPT SMP
<4>[   16.133466] CPU: 2 PID: 33 Comm: kworker/2:1 Tainted: GW 5.4.37-rc1 #1
<4>[   16.141566] Hardware name: Shenzhen Amediatech Technology Co., Ltd X96 Max (DT)
<4>[   16.149087] Workqueue: events deferred_probe_work_func
<4>[   16.154419] pstate: 2005 (nzCv daif -PAN -UAO)
<4>[   16.159428] pc : snd_soc_dapm_new_dai+0x3c/0x1b0
<4>[   16.164252] lr : snd_soc_dapm_connect_dai_link_widgets+0x114/0x268
<4>[   16.256970] Call trace:
<4>[   16.259647]  snd_soc_dapm_new_dai+0x3c/0x1b0
<4>[   16.264129]  snd_soc_dapm_connect_dai_link_widgets+0x114/0x268
<4>[   16.270167]  snd_soc_instantiate_card+0x858/0xb88
<4>[   16.275083]  snd_soc_register_card+0xf8/0x120
<4>[   16.279656]  devm_snd_soc_register_card+0x40/0x90
<4>[   16.284575]  axg_card_probe+0x9dc/0xaf0 [snd_soc_meson_axg_sound_card]
<4>[   16.291299]  platform_drv_probe+0x50/0xa0
<4>[   16.295524]  really_probe+0xd4/0x328
<4>[   16.299319]  driver_probe_device+0x54/0xe8
...


Guillaume


On 01/05/2020 10:32, kernelci.org bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has      *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.      *
> *                                                               *
> * If you do send a fix, please include this trailer:            *
> *   Reported-by: "kernelci.org bot"                             *
> *                                                               *
> * Hope this helps!                                              *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> stable-rc/linux-5.4.y bisection: baseline.dmesg.alert on meson-g12a-x96-max
> 
> Summary:
>   Start:      35bbc55d9e296 Linux 5.4.37-rc1
>   Plain log:  https://storage.kernelci.org/stable-rc/linux-5.4.y/v5.4.36-52-g35bbc55d9e29/arm64/defconfig/gcc-8/lab-baylibre/baseline-meson-g12a-x96-max.txt
>   HTML log:   https://storage.kernelci.org/stable-rc/linux-5.4.y/v5.4.36-52-g35bbc55d9e29/arm64/defconfig/gcc-8/lab-baylibre/baseline-meson-g12a-x96-max.html
>   Result:     09f4294793bd3 ASoC: meson: axg-card: fix codec-to-codec link setup
> 
> Checks:
>   revert:     PASS
>   verify:     PASS
> 
> Parameters:
>   Tree:       stable-rc
>   URL:        https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
>   Branch:     linux-5.4.y
>   Target:     meson-g12a-x96-max
>   CPU arch:   arm64
>   Lab:        lab-baylibre
>   Compiler:   gcc-8
>   Config:     defconfig
>   Test case:  baseline.dmesg.alert
> 
> Breaking commit found:
> 
> ---
> commit 09f4294793bd3e70d68fdab5b392dff18bff62ca
> Author: Jerome Brunet 
> Date:   Mon Apr 20 13:45:10 2020 +0200
> 
> ASoC: meson: axg-card: fix codec-to-codec link setup
> 
> commit 1164284270779e1865cc2046a2a01b58a1e858a9 upstream.
> 
> Since the addition of commit 9b5db059366a ("ASoC: soc-pcm: dpcm: Only allow
> playback/capture if supported"), meson-axg cards which have codec-to-codec
> links fail to init and Oops:
> 
>   Unable to handle kernel NULL pointer dereference at virtual address 0128
>   Internal error: Oops: 9644 [#1] PREEMPT SMP
>   CPU: 3 PID: 1582 Comm: arecord Not tainted 5.7.0-rc1
>   pc : invalidate_paths_ep+0x30/0xe0
>   lr : snd_soc_dapm_dai_get_connected_widgets+0x170/0x1a8
>   Call trace:
>invalidate_paths_ep+0x30/0xe0
>snd_soc_dapm_dai_get_connected_widgets+0x170/0x1a8
>dpcm_path_get+0x38/0xd0
>dpcm_fe_dai_open+0x70/0x920
>

Re: net-next/master boot bisection: v5.3-13203-gc01ebd6c4698 on bcm2836-rpi-2-b

2019-10-02 Thread Guillaume Tucker
On 02/10/2019 18:26, Masahiro Yamada wrote:
> On Thu, Oct 3, 2019 at 2:24 AM David Miller  wrote:
>>
>> From: Guillaume Tucker 
>> Date: Wed, 2 Oct 2019 18:21:31 +0100
>>
>>> It seems like this isn't the case on the Raspberry Pi 2b with
>>> bcm2835_defconfig.  Here's an example of the kernel errors:
>>
>> This has been fixed upstream I believe, it was some ARM assembler issue
>> or something like that.
>>
>> In any event, definitely not a networking problem. :-)

Quite, and there was also a bisection on the clk-next branch.  If
some subsystem branches don't rebase with the fix and the problem
keeps happening then we'll be disabling boot bisections for them
temporarily to avoid email noise.

On a side note, we're also planning to add a way to mark a
revision as fixed to stop reporting particular failures that have
been fixed upstream - but that's not possible at the moment.

> The fix and related discussions are available.
> 
> https://lore.kernel.org/patchwork/patch/1132785/

Great, thanks!  Sorry I missed that thread.  Thank you also for
having mentioned the kernelci.org bot in the fix.

Guillaume


Re: net-next/master boot bisection: v5.3-13203-gc01ebd6c4698 on bcm2836-rpi-2-b

2019-10-02 Thread Guillaume Tucker
On 02/10/2019 11:05, kernelci.org bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has      *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.      *
> *                                                               *
> * If you do send a fix, please include this trailer:            *
> *   Reported-by: "kernelci.org bot"                             *
> *                                                               *
> * Hope this helps!                                              *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> net-next/master boot bisection: v5.3-13203-gc01ebd6c4698 on bcm2836-rpi-2-b
> 
> Summary:
>   Start:      c01ebd6c4698 r8152: Use guard clause and fix comment typos
>   Details:    https://kernelci.org/boot/id/5d942a9059b514a119d857e9
>   Plain log:  https://storage.kernelci.org//net-next/master/v5.3-13203-gc01ebd6c4698/arm/bcm2835_defconfig/gcc-8/lab-collabora/boot-bcm2836-rpi-2-b.txt
>   HTML log:   https://storage.kernelci.org//net-next/master/v5.3-13203-gc01ebd6c4698/arm/bcm2835_defconfig/gcc-8/lab-collabora/boot-bcm2836-rpi-2-b.html
>   Result:     ac7c3e4ff401 compiler: enable CONFIG_OPTIMIZE_INLINING forcibly
> 
> Checks:
>   revert:     PASS
>   verify:     PASS
> 
> Parameters:
>   Tree:       net-next
>   URL:        git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
>   Branch:     master
>   Target:     bcm2836-rpi-2-b
>   CPU arch:   arm
>   Lab:        lab-collabora
>   Compiler:   gcc-8
>   Config:     bcm2835_defconfig
>   Test suite: boot
> 
> Breaking commit found:
> 
> ---
> commit ac7c3e4ff401b304489a031938dbeaab585bfe0a
> Author: Masahiro Yamada 
> Date:   Wed Sep 25 16:47:42 2019 -0700
> 
> compiler: enable CONFIG_OPTIMIZE_INLINING forcibly
> 
> Commit 9012d011660e ("compiler: allow all arches to enable
> CONFIG_OPTIMIZE_INLINING") allowed all architectures to enable this
> option.  A couple of build errors were reported by randconfig, but all of
> them have been ironed out.
> 
> Towards the goal of removing CONFIG_OPTIMIZE_INLINING entirely (and it
> will simplify the 'inline' macro in compiler_types.h), this commit changes
> it to always-on option.  Going forward, the compiler will always be
> allowed to not inline functions marked 'inline'.
> 
> This is not a problem for x86 since it has been long used by
> arch/x86/configs/{x86_64,i386}_defconfig.
> 
> I am keeping the config option just in case any problem crops up for other
> architectures.

It seems like this isn't the case on the Raspberry Pi 2b with
bcm2835_defconfig.  Here's an example of the kernel errors:

https://lava.collabora.co.uk/scheduler/job/1859342#L327

There doesn't appear to be any problem with multi_v7_defconfig on
the same platform as shown with this test:

https://lava.collabora.co.uk/scheduler/job/1858986

So it's not entirely broken, but some investigation needs to be
done to find out what differences between bcm2835_defconfig and
multi_v7_defconfig are making the problem apparent.

I guess people can turn the option off in their local configs in
the meantime.  It doesn't seem like turning it off in
bcm2835_defconfig would be a solution though as it would only
hide the issue for a bit longer until the option is removed.

Guillaume

> The code clean-up will be done after confirming this is solid.
> 
> Link: http://lkml.kernel.org/r/20190830034304.24259-1-yamada.masah...@socionext.com
> Signed-off-by: Masahiro Yamada 
> Acked-by: Nick Desaulniers 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: Miguel Ojeda 
> Signed-off-by: Andrew Morton 
> Signed-off-by: Linus Torvalds 
> 
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 6b1b1703a646..93d97f9b0157 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -311,7 +311,7 @@ config HEADERS_CHECK
> relevant for userspace, say 'Y'.
>  
>  config OPTIMIZE_INLINING
> - bool "Allow compiler to uninline functions marked 'inline'"
> + def_bool y
>   help
> This option determines if the kernel forces gcc to inline the functions
> developers have marked 'inline'. Doing so takes away freedom from gcc to
> @@ -322,8 +322,6 @@ config OPTIMIZE_INLINING
> decision will become the default in the future. Until then this option
> is there to test gcc for this.
>  
> -   If unsure, say N.
> -
>  config DEBUG_SECTION_MISMATCH
>   bool "Enable full Section mismatch analysis"
>   help
> ---
> 
> 
> Git bisection log:
> 
> 

[PATCH v2] merge_config.sh: ignore unwanted grep errors

2019-09-02 Thread Guillaume Tucker
The merge_config.sh script verifies that all the config options have
their expected value in the resulting file and prints any issues as
warnings.  These checks aren't intended to be treated as errors given
the current implementation.  However, since "set -e" was added, if the
grep command to look for a config option does not find it the script
will then abort prematurely.

Handle the case where the grep exit status is non-zero by setting
ACTUAL_VAL to an empty string to restore previous functionality.

Fixes: cdfca821571d ("merge_config.sh: Check error codes from make")
Signed-off-by: Guillaume Tucker 
Cc: Jon Hunter 
---

Notes:
v2: use true rather than echo as per Jon Hunter's suggestion

 scripts/kconfig/merge_config.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/kconfig/merge_config.sh b/scripts/kconfig/merge_config.sh
index d924c51d28b7..f2cc10b1d404 100755
--- a/scripts/kconfig/merge_config.sh
+++ b/scripts/kconfig/merge_config.sh
@@ -177,7 +177,7 @@ make KCONFIG_ALLCONFIG=$TMP_FILE $OUTPUT_ARG $ALLTARGET
 for CFG in $(sed -n -e "$SED_CONFIG_EXP1" -e "$SED_CONFIG_EXP2" $TMP_FILE); do
 
REQUESTED_VAL=$(grep -w -e "$CFG" $TMP_FILE)
-   ACTUAL_VAL=$(grep -w -e "$CFG" "$KCONFIG_CONFIG")
+   ACTUAL_VAL=$(grep -w -e "$CFG" "$KCONFIG_CONFIG" || true)
if [ "x$REQUESTED_VAL" != "x$ACTUAL_VAL" ] ; then
echo "Value requested for $CFG not in final .config"
echo "Requested value:  $REQUESTED_VAL"
-- 
2.20.1
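
For illustration only, the behaviour described in the commit message
can be reproduced outside the kernel tree with the sketch below.  The
temporary file stands in for the final .config, and the CONFIG_FOO and
CONFIG_BAR names are made up; only the grep invocation and the
"|| true" fallback are taken from the patch.

```shell
#!/bin/sh
# Stand-in for the final .config; file name and contents are made up.
cfg=$(mktemp)
echo "CONFIG_FOO=y" > "$cfg"

# Without the fix: grep finds nothing and exits with status 1, so the
# assignment fails and "set -e" ends the inner shell before the echo.
before=$(sh -c 'set -e; ACTUAL_VAL=$(grep -w -e CONFIG_BAR "$1"); echo reached' sh "$cfg") \
    && rc_before=0 || rc_before=$?

# With the fix: "|| true" forces a zero status, ACTUAL_VAL is simply
# empty and the inner shell carries on.
after=$(sh -c 'set -e; ACTUAL_VAL=$(grep -w -e CONFIG_BAR "$1" || true); echo "reached, value=[$ACTUAL_VAL]"' sh "$cfg") \
    && rc_after=0 || rc_after=$?

echo "without fix: status=$rc_before output=[$before]"
echo "with fix:    status=$rc_after output=[$after]"
rm -f "$cfg"
```

The inner shells are only there so that the abort can be observed
without ending the outer script.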



Re: [PATCH 1/1] merge_config.sh: ignore unwanted grep errors

2019-09-02 Thread Guillaume Tucker
On 02/09/2019 15:32, Jon Hunter wrote:
> 
> On 02/09/2019 15:26, Guillaume Tucker wrote:
>> On 02/09/2019 15:21, Jon Hunter wrote:
>>>
>>> On 02/09/2019 15:14, Guillaume Tucker wrote:
>>>> + Jon Hunter who hit a similar issue
>>>
>>> Thanks for adding me.
>>>
>>>> On 28/08/2019 21:19, Guillaume Tucker wrote:
>>>>> The merge_config.sh script verifies that all the config options have
>>>>> their expected value in the resulting file and prints any issues as
>>>>> warnings.  These checks aren't intended to be treated as errors given
>>>>> the current implementation.  However, since "set -e" was added, if the
>>>>> grep command to look for a config option does not find it the script
>>>>> will then abort prematurely.
>>>>>
>>>>> Handle the case where the grep exit status is non-zero by setting
>>>>> ACTUAL_VAL to an empty string to restore previous functionality.
>>>>>
>>>>> Fixes: cdfca821571d ("merge_config.sh: Check error codes from make")
>>>>> Signed-off-by: Guillaume Tucker 
>>>>> ---
>>>>>  scripts/kconfig/merge_config.sh | 2 +-
>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/scripts/kconfig/merge_config.sh b/scripts/kconfig/merge_config.sh
>>>>> index d924c51d28b7..d673268d414b 100755
>>>>> --- a/scripts/kconfig/merge_config.sh
>>>>> +++ b/scripts/kconfig/merge_config.sh
>>>>> @@ -177,7 +177,7 @@ make KCONFIG_ALLCONFIG=$TMP_FILE $OUTPUT_ARG $ALLTARGET
>>>>>  for CFG in $(sed -n -e "$SED_CONFIG_EXP1" -e "$SED_CONFIG_EXP2" $TMP_FILE); do
>>>>>  
>>>>>   REQUESTED_VAL=$(grep -w -e "$CFG" $TMP_FILE)
>>>>> - ACTUAL_VAL=$(grep -w -e "$CFG" "$KCONFIG_CONFIG")
>>>>> + ACTUAL_VAL=$(grep -w -e "$CFG" "$KCONFIG_CONFIG" || echo)
>>>
>>> Shouldn't this just be 'true' instead of 'echo'?
>>
>> I just explained why I used "echo" on your thread.  Essentially,
>> I think both can be used but "echo" made more sense to me because
>> the script is then using the output string from the command
>> rather than the exit status.
> 
> Yes just saw that. However, I don't think that using 'echo' is
> necessary. The grep command does not output anything and so the variable
> will essentially be an empty string, we just need to ensure that no
> error is returned from the command. In cases such as these I always use
> 'true' in conjunction with grep.

Sure, that makes sense too.  Your solution is arguably a bit
simpler so I agree it would be better to use "true" here.

I can submit a v2 with "true" if that helps, unless you prefer to
send your version of the fix yourself?


Also we're actually using this fix in KernelCI to test it on top
of Mark's patch:

  https://github.com/kernelci/linux/commits/staging.kernelci.org

so I can get it tested again with quite a few build variants
using "true".  It's kind of trivial but we need a working
merge_config.sh anyway on that branch.

Guillaume


Re: [PATCH 1/1] merge_config.sh: ignore unwanted grep errors

2019-09-02 Thread Guillaume Tucker
On 02/09/2019 15:21, Jon Hunter wrote:
> 
> On 02/09/2019 15:14, Guillaume Tucker wrote:
>> + Jon Hunter who hit a similar issue
> 
> Thanks for adding me.
> 
>> On 28/08/2019 21:19, Guillaume Tucker wrote:
>>> The merge_config.sh script verifies that all the config options have
>>> their expected value in the resulting file and prints any issues as
>>> warnings.  These checks aren't intended to be treated as errors given
>>> the current implementation.  However, since "set -e" was added, if the
>>> grep command to look for a config option does not find it the script
>>> will then abort prematurely.
>>>
>>> Handle the case where the grep exit status is non-zero by setting
>>> ACTUAL_VAL to an empty string to restore previous functionality.
>>>
>>> Fixes: cdfca821571d ("merge_config.sh: Check error codes from make")
>>> Signed-off-by: Guillaume Tucker 
>>> ---
>>>  scripts/kconfig/merge_config.sh | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/scripts/kconfig/merge_config.sh b/scripts/kconfig/merge_config.sh
>>> index d924c51d28b7..d673268d414b 100755
>>> --- a/scripts/kconfig/merge_config.sh
>>> +++ b/scripts/kconfig/merge_config.sh
>>> @@ -177,7 +177,7 @@ make KCONFIG_ALLCONFIG=$TMP_FILE $OUTPUT_ARG $ALLTARGET
>>>  for CFG in $(sed -n -e "$SED_CONFIG_EXP1" -e "$SED_CONFIG_EXP2" $TMP_FILE); do
>>>  
>>> REQUESTED_VAL=$(grep -w -e "$CFG" $TMP_FILE)
>>> -   ACTUAL_VAL=$(grep -w -e "$CFG" "$KCONFIG_CONFIG")
>>> +   ACTUAL_VAL=$(grep -w -e "$CFG" "$KCONFIG_CONFIG" || echo)
> 
> Shouldn't this just be 'true' instead of 'echo'?

I just explained why I used "echo" on your thread.  Essentially,
I think both can be used but "echo" made more sense to me because
the script is then using the output string from the command
rather than the exit status.
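
A quick way to check that the two fallbacks end up with the same
ACTUAL_VAL is sketched below.  It is a stand-alone test rather than
anything from merge_config.sh: /dev/null stands in for a config file
in which the option is missing, and CONFIG_BAR is a made-up name.

```shell
#!/bin/sh
# grep finds nothing in /dev/null and exits 1, so the fallback runs.
# "echo" prints a newline, but command substitution strips trailing
# newlines, so both variables end up as empty strings.
with_echo=$(grep -w -e CONFIG_BAR /dev/null || echo)
with_true=$(grep -w -e CONFIG_BAR /dev/null || true)

if [ "x$with_echo" = "x$with_true" ]; then
    echo "both fallbacks give the same value: [$with_echo]"
fi
```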

Guillaume


Re: [PATCH v2] merge_config.sh: Check error codes from make

2019-09-02 Thread Guillaume Tucker
On 02/09/2019 15:06, Jon Hunter wrote:
> 
> On 19/08/2019 21:06, Mark Brown wrote:
>> When we execute make after merging the configurations we ignore any
>> errors it produces causing whatever is running merge_config.sh to be
>> unaware of any failures.  This issue was noticed by Guillaume Tucker
>> while looking at problems with testing of clang only builds in KernelCI
>> which caused Kbuild to be unable to find a working host compiler.
>>
>> This implementation was suggested by Yamada-san.
>>
>> Suggested-by: Masahiro Yamada 
>> Reported-by: Guillaume Tucker 
>> Signed-off-by: Mark Brown 
>> ---
> 
> I have noticed some recent build failures on -next and the bisect is
> pointing to this commit. I have been looking at why this commit is
> making the builds fail and I see a few different things going on ...
> 
> 1. By using 'set -e', if grep fails to find a kconfig option in the
>    resulting config file, then the script exits silently without reporting
>    which option it failed to find. Hence, it is unclear what triggered
>    the failure. This may happen when options are being disabled.
> 
> 2. If an option is disabled by the config fragment that happens to be a
>    parent of other kconfig options, then although the parent and
>    children are disabled correctly, the script may fail because it no
>    longer finds the children in the resulting config file. A specific
>    example here is CONFIG_NFS_V4. We disable this option due to
>    issues with some host machines we use, and disabling this also
>    disables CONFIG_NFS_V4_1 and CONFIG_NFS_V4_2. Now if all 3 of these
>    options are enabled by default in the base config file, such as the
>    case in the ARM64 defconfig, then disabling CONFIG_NFS_V4 in the
>    config fragment causes merge_config.sh to fail because
>    CONFIG_NFS_V4_1 and CONFIG_NFS_V4_2 are not defined at all in
>    the resulting config. This causes grep to fail to find these and
>    hence causes the script to terminate. In the resulting config file we
>    just have '# CONFIG_NFS_V4 is not set'. I am not sure if there is an
>    easy way to determine if a missing config option is legitimate or
>    not.
> 
> A simple way to test the above is ...
> 
>  $ export ARCH=arm64
>  $ echo "CONFIG_NFS_V4=n" > kfrag
>  $ ./scripts/kconfig/merge_config.sh arch/arm64/configs/defconfig kfrag
> 
> If the intent is to catch errors returned by make, then one simple fix
> would be ...
> 
> diff --git a/scripts/kconfig/merge_config.sh b/scripts/kconfig/merge_config.sh
> index bec246719aea..63c8565206a4 100755
> --- a/scripts/kconfig/merge_config.sh
> +++ b/scripts/kconfig/merge_config.sh
> @@ -179,7 +179,7 @@ make KCONFIG_ALLCONFIG=$TMP_FILE $OUTPUT_ARG $ALLTARGET
>  for CFG in $(sed -n -e "$SED_CONFIG_EXP1" -e "$SED_CONFIG_EXP2" $TMP_FILE); do
>  
> REQUESTED_VAL=$(grep -w -e "$CFG" $TMP_FILE)
> -   ACTUAL_VAL=$(grep -w -e "$CFG" "$KCONFIG_CONFIG")
> +   ACTUAL_VAL=$(grep -w -e "$CFG" "$KCONFIG_CONFIG" || true)
> if [ "x$REQUESTED_VAL" != "x$ACTUAL_VAL" ] ; then
> echo "Value requested for $CFG not in final .config"
> echo "Requested value:  $REQUESTED_VAL"
> 
> 
> I have been using merge_config.sh to enable and disable various options
> we need for testing for some time now and so would hope I am not doing
> anything out of the ordinary here.
> 
> Let me know your thoughts.

I've added you to another thread with a fix I sent last week for
the same issue.

The way I addressed it with "echo" was to explicitly return an
empty line as that is essentially what is then being used to
compare the config values.  I guess "true" also works in
practice.

My understanding is that "set -e" was added primarily to catch
errors returned by the make command.  The config value checks
with grep have always been warnings that don't cause errors, so I
would assume that it should stay like this until there's a
conscious decision to change this behaviour.
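
For illustration, here is a minimal sketch of the behaviour discussed above.  It is not taken from merge_config.sh: the temporary file, the "run" helper and the choice of CONFIG_NFS_V4_1 are made up for the demo, and it only assumes a POSIX shell and grep.  Each snippet runs in a child shell started with "-e" so that an abort in one case does not stop the others.

```shell
#!/bin/sh
# Hypothetical demo; names are made up and not taken from merge_config.sh.
CONFIG=$(mktemp)
echo '# CONFIG_NFS_V4 is not set' > "$CONFIG"
export CONFIG

# Run a snippet in a child shell with "set -e" active and report its status.
run() {
	status=0
	sh -ec "$1" || status=$?
	echo "exit status: $status"
}

# 1. No fallback: grep finds nothing and exits 1, so the child shell aborts
#    at the assignment and never reaches the echo.
run 'VAL=$(grep -w -e CONFIG_NFS_V4_1 "$CONFIG"); echo "not reached"'

# 2. "|| echo": the substitution succeeds and yields an empty string.
run 'VAL=$(grep -w -e CONFIG_NFS_V4_1 "$CONFIG" || echo); echo "VAL=[$VAL]"'

# 3. "|| true": same result, since the trailing newline from echo is
#    stripped by the command substitution anyway.
run 'VAL=$(grep -w -e CONFIG_NFS_V4_1 "$CONFIG" || true); echo "VAL=[$VAL]"'

rm -f "$CONFIG"
```

The first case reports exit status 1, while the other two print "VAL=[]" followed by exit status 0, which is why either fallback keeps the comparison against the requested value working as a warning only.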

Thanks,
Guillaume


Re: [PATCH 1/1] merge_config.sh: ignore unwanted grep errors

2019-09-02 Thread Guillaume Tucker
+ Jon Hunter who hit a similar issue

On 28/08/2019 21:19, Guillaume Tucker wrote:
> The merge_config.sh script verifies that all the config options have
> their expected value in the resulting file and prints any issues as
> warnings.  These checks aren't intended to be treated as errors given
> the current implementation.  However, since "set -e" was added, if the
> grep command to look for a config option does not find it the script
> will then abort prematurely.
> 
> Handle the case where the grep exit status is non-zero by setting
> ACTUAL_VAL to an empty string to restore previous functionality.
> 
> Fixes: cdfca821571d ("merge_config.sh: Check error codes from make")
> Signed-off-by: Guillaume Tucker 
> ---
>  scripts/kconfig/merge_config.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/scripts/kconfig/merge_config.sh b/scripts/kconfig/merge_config.sh
> index d924c51d28b7..d673268d414b 100755
> --- a/scripts/kconfig/merge_config.sh
> +++ b/scripts/kconfig/merge_config.sh
> @@ -177,7 +177,7 @@ make KCONFIG_ALLCONFIG=$TMP_FILE $OUTPUT_ARG $ALLTARGET
>  for CFG in $(sed -n -e "$SED_CONFIG_EXP1" -e "$SED_CONFIG_EXP2" $TMP_FILE); do
>  
>   REQUESTED_VAL=$(grep -w -e "$CFG" $TMP_FILE)
> - ACTUAL_VAL=$(grep -w -e "$CFG" "$KCONFIG_CONFIG")
> + ACTUAL_VAL=$(grep -w -e "$CFG" "$KCONFIG_CONFIG" || echo)
>   if [ "x$REQUESTED_VAL" != "x$ACTUAL_VAL" ] ; then
>   echo "Value requested for $CFG not in final .config"
>   echo "Requested value:  $REQUESTED_VAL"
> 


[PATCH 1/1] merge_config.sh: ignore unwanted grep errors

2019-08-28 Thread Guillaume Tucker
The merge_config.sh script verifies that all the config options have
their expected value in the resulting file and prints any issues as
warnings.  These checks aren't intended to be treated as errors given
the current implementation.  However, since "set -e" was added, if the
grep command to look for a config option does not find it the script
will then abort prematurely.

Handle the case where the grep exit status is non-zero by setting
ACTUAL_VAL to an empty string to restore previous functionality.

Fixes: cdfca821571d ("merge_config.sh: Check error codes from make")
Signed-off-by: Guillaume Tucker 
---
 scripts/kconfig/merge_config.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/kconfig/merge_config.sh b/scripts/kconfig/merge_config.sh
index d924c51d28b7..d673268d414b 100755
--- a/scripts/kconfig/merge_config.sh
+++ b/scripts/kconfig/merge_config.sh
@@ -177,7 +177,7 @@ make KCONFIG_ALLCONFIG=$TMP_FILE $OUTPUT_ARG $ALLTARGET
 for CFG in $(sed -n -e "$SED_CONFIG_EXP1" -e "$SED_CONFIG_EXP2" $TMP_FILE); do
 
 	REQUESTED_VAL=$(grep -w -e "$CFG" $TMP_FILE)
-	ACTUAL_VAL=$(grep -w -e "$CFG" "$KCONFIG_CONFIG")
+	ACTUAL_VAL=$(grep -w -e "$CFG" "$KCONFIG_CONFIG" || echo)
 	if [ "x$REQUESTED_VAL" != "x$ACTUAL_VAL" ] ; then
 		echo "Value requested for $CFG not in final .config"
 		echo "Requested value:  $REQUESTED_VAL"
-- 
2.20.1



Re: [PATCH 0/4] Followup to "Make clk_hw::init NULL after clk registration"

2019-08-19 Thread Guillaume Tucker
On 15/08/2019 17:00, Stephen Boyd wrote:
> I found some more cases where the init structure is referenced from
> within the clk_hw struct after clk_registration is called. I suspect the
> rtc driver fix is useful to avoid crashes on Allwinner devices, reported
> by kernel-ci.

Please feel free to add this trailer where appropriate:

  Reported-by: "kernelci.org bot" 


Thanks,
Guillaume


[PATCH] media: vivid: fix device init when no_error_inj=1 and fb disabled

2019-07-24 Thread Guillaume Tucker
Add an extra condition to add the video output control class when the
device has some hdmi outputs defined.  This is required to then always
be able to add the display present control, which is enabled when
there are some hdmi outputs.

This fixes the corner case where no_error_inj is enabled and the
device has no frame buffer but some hdmi outputs, as otherwise the
video output control class would be added anyway.  Without this fix,
the sanity checks fail in v4l2_ctrl_new() as name is NULL.

Fixes: c533435ffb91 ("media: vivid: add display present control")
Cc: sta...@vger.kernel.org
Signed-off-by: Guillaume Tucker 
---
 drivers/media/platform/vivid/vivid-ctrls.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/media/platform/vivid/vivid-ctrls.c b/drivers/media/platform/vivid/vivid-ctrls.c
index 3e916c8befb7..7a52f585cab7 100644
--- a/drivers/media/platform/vivid/vivid-ctrls.c
+++ b/drivers/media/platform/vivid/vivid-ctrls.c
@@ -1473,7 +1473,7 @@ int vivid_create_controls(struct vivid_dev *dev, bool show_ccs_cap,
 	v4l2_ctrl_handler_init(hdl_vid_cap, 55);
 	v4l2_ctrl_new_custom(hdl_vid_cap, &vivid_ctrl_class, NULL);
 	v4l2_ctrl_handler_init(hdl_vid_out, 26);
-	if (!no_error_inj || dev->has_fb)
+	if (!no_error_inj || dev->has_fb || dev->num_hdmi_outputs)
 		v4l2_ctrl_new_custom(hdl_vid_out, &vivid_ctrl_class, NULL);
 	v4l2_ctrl_handler_init(hdl_vbi_cap, 21);
 	v4l2_ctrl_new_custom(hdl_vbi_cap, &vivid_ctrl_class, NULL);
-- 
2.20.1



Re: next/master boot bisection: next-20190617 on sun8i-h2-plus-orangepi-zero

2019-06-18 Thread Guillaume Tucker
Hi Martin,

On 18/06/2019 21:58, Martin Blumenstingl wrote:
> Hi Guillaume,
> 
> On Tue, Jun 18, 2019 at 10:53 PM Guillaume Tucker
>  wrote:
>>
>> On 18/06/2019 21:42, Martin Blumenstingl wrote:
>>> On Tue, Jun 18, 2019 at 6:53 PM Kevin Hilman  wrote:
>>> [...]
>>>> This seems to have broken on several sunxi SoCs, but also a MIPS SoC
>>>> (pistachio_marduk):
>>>>
>>>> https://storage.kernelci.org/next/master/next-20190618/mips/pistachio_defconfig/gcc-8/lab-baylibre-seattle/boot-pistachio_marduk.html
>>> today I learned why initializing arrays on the stack is important
>>> too bad gcc didn't warn that I was about to shoot myself (or someone
>>> else) in the foot :/
>>>
>>> I just sent a fix: [0]
>>>
>>> sorry for this issue and thanks to Kernel CI for even pointing out the
>>> offending commit (this makes things a lot easier than just yelling
>>> that "something is broken")
>>
>> Glad that helped :)
>>
>> If you would be so kind as to credit our robot friend in your
>> patch, it'll be forever grateful:
>>
>>   Reported-by: "kernelci.org bot" 
> sure
> do you want me to re-send my other patch or should I just reply to it
> adding the Reported-by tag and hope that Dave will catch it when
> applying the patch?

Well that's no big deal so replying would already be great.  The
important part is that the fix gets applied.

> in either case: I did mention in the patch description that Kernel CI caught it

I see, thanks!

> by the way: I didn't know how to credit the Kernel CI bot.
> syzbot / syzkaller makes that bit easy as it's mentioned in the
> generated email, see [0] for a (random) example
> have you considered adding the Reported-by to the generated email?

Yes, we've got some bugs to fix first but that will be added to
the email report soon (next week I guess).  Thanks for the
suggestion though.

Guillaume

> [0] https://lkml.org/lkml/2019/4/19/638


Re: next/master boot bisection: next-20190617 on sun8i-h2-plus-orangepi-zero

2019-06-18 Thread Guillaume Tucker
On 18/06/2019 21:42, Martin Blumenstingl wrote:
> On Tue, Jun 18, 2019 at 6:53 PM Kevin Hilman  wrote:
> [...]
>> This seems to have broken on several sunxi SoCs, but also a MIPS SoC
>> (pistachio_marduk):
>>
>> https://storage.kernelci.org/next/master/next-20190618/mips/pistachio_defconfig/gcc-8/lab-baylibre-seattle/boot-pistachio_marduk.html
> today I learned why initializing arrays on the stack is important
> too bad gcc didn't warn that I was about to shoot myself (or someone
> else) in the foot :/
> 
> I just sent a fix: [0]
> 
> sorry for this issue and thanks to Kernel CI for even pointing out the
> offending commit (this makes things a lot easier than just yelling
> that "something is broken")

Glad that helped :)

If you would be so kind as to credit our robot friend in your
patch, it'll be forever grateful:

  Reported-by: "kernelci.org bot" 

Thanks,
Guillaume

> Martin
> 
> 
> [0] https://patchwork.ozlabs.org/patch/1118313/
> 



Re: [alsa-devel] next/master boot bisection: next-20190528 on sun8i-h3-libretech-all-h3-cc

2019-06-07 Thread Guillaume Tucker
On 30/05/2019 16:53, Takashi Iwai wrote:
> On Thu, 30 May 2019 11:16:22 +0200,
> kernelci.org bot wrote:
>>
>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>> * This automated bisection report was sent to you on the basis  *
>> * that you may be involved with the breaking commit it has  *
>> * found.  No manual investigation has been done to verify it,   *
>> * and the root cause of the problem may be somewhere else.  *
>> * Hope this helps!  *
>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>>
>> next/master boot bisection: next-20190528 on sun8i-h3-libretech-all-h3-cc
>>
>> Summary:
>>   Start:      531b0a360899 Add linux-next specific files for 20190528
>>   Details:    https://kernelci.org/boot/id/5cece0fd59b5144bc47a362b
>>   Plain log:  https://storage.kernelci.org//next/master/next-20190528/arm/sunxi_defconfig/gcc-8/lab-baylibre/boot-sun8i-h3-libretech-all-h3-cc.txt
>>   HTML log:   https://storage.kernelci.org//next/master/next-20190528/arm/sunxi_defconfig/gcc-8/lab-baylibre/boot-sun8i-h3-libretech-all-h3-cc.html
>>   Result:     34ac3c3eb8f0 ASoC: core: lock client_mutex while removing link components
>>
>> Checks:
>>   revert: PASS
>>   verify: PASS
>>
>> Parameters:
>>   Tree:       next
>>   URL:        git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>>   Branch:     master
>>   Target:     sun8i-h3-libretech-all-h3-cc
>>   CPU arch:   arm
>>   Lab:        lab-baylibre
>>   Compiler:   gcc-8
>>   Config:     sunxi_defconfig
>>   Test suite: boot
>>
>> Breaking commit found:
>>
>> ---
>> commit 34ac3c3eb8f0c07252ceddf0a22dd240e5c91ccb
>> Author: Ranjani Sridharan 
>> Date:   Thu May 23 10:12:01 2019 -0700
>>
>> ASoC: core: lock client_mutex while removing link components
>> 
>> Removing link components results in topology unloading. So,
>> acquire the client_mutex before removing components in
>> soc_remove_link_components. This will prevent the lockdep warning
>> seen when dai links are removed during topology removal.
>> 
>> Signed-off-by: Ranjani Sridharan 
>> Signed-off-by: Mark Brown 
>>
>> diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c
>> index 2403bec2fccf..7c9415987ac7 100644
>> --- a/sound/soc/soc-core.c
>> +++ b/sound/soc/soc-core.c
>> @@ -1005,12 +1005,14 @@ static void soc_remove_link_components(struct snd_soc_card *card,
>>  	struct snd_soc_component *component;
>>  	struct snd_soc_rtdcom_list *rtdcom;
>>  
>> +	mutex_lock(&client_mutex);
>>  	for_each_rtdcom(rtd, rtdcom) {
>>  		component = rtdcom->component;
>>  
>>  		if (component->driver->remove_order == order)
>>  			soc_remove_component(component);
>>  	}
>> +	mutex_unlock(&client_mutex);
>>  }
> 
> Indeed this dead-locks in the error path of
> snd_soc_instantiate_card():
> 
> snd_soc_instantiate_card() ->
>   mutex_lock(&client_mutex);
>   
>   -> soc_cleanup_card_resources();
>     -> soc_remove_dai_links();
>       -> soc_remove_link_components();
>          mutex_lock(&client_mutex);
> 
> 
> Ranjani, which code path is your patch trying to address?  Maybe it would
> be better to take client_mutex on the caller side, as in snd_soc_unbind_card()?

Is anyone looking into this issue?

It is still occurring in next-20190606: a bisection run today
landed on the same commit.  There just haven't been any new
bisection reports because they have been temporarily disabled
while we fix some issues on kernelci.org.

Thanks,
Guillaume

>>  static void soc_remove_dai_links(struct snd_soc_card *card)
>> ---
>>
>>
>> Git bisection log:
>>
>> ---
>> git bisect start
>> # good: [cd6c84d8f0cdc911df435bb075ba22ce3c605b07] Linux 5.2-rc2
>> git bisect good cd6c84d8f0cdc911df435bb075ba22ce3c605b07
>> # bad: [531b0a360899269bd99a38ba9852a8ba46852bcd] Add linux-next specific files for 20190528
>> git bisect bad 531b0a360899269bd99a38ba9852a8ba46852bcd
>> # bad: [0b61d4c3b7d7938ef0014778c328e3f65c0d6d57] Merge remote-tracking branch 'crypto/master'
>> git bisect bad 0b61d4c3b7d7938ef0014778c328e3f65c0d6d57
>> # bad: [6179e21b065dc0f592cd3d9d3676bd64d4278025] Merge remote-tracking branch 'xtensa/xtensa-for-next'
>> git bisect bad 6179e21b065dc0f592cd3d9d3676bd64d4278025
>> # bad: [3e085f66fe7e93575f2a583a3d434415cef2d860] Merge remote-tracking branch 'amlogic/for-next'
>> git bisect bad 3e085f66fe7e93575f2a583a3d434415cef2d860
>> # bad: [b9afa223a3420432bc483d2b43429c88c6a5d0e0] Merge remote-tracking branch 'staging.current/staging-linus'
>> git bisect bad b9afa223a3420432bc483d2b43429c88c6a5d0e0
>> # good: [fc6557648e19dbd207dc815c6e09fc6452f01e63] Merge remote-tracking branch 'bpf/master'
>> 

Re: linusw/for-next boot bisection: v5.2-rc1-8-g73a790c68d7e on rk3288-veyron-jaq

2019-05-28 Thread Guillaume Tucker
Hi Geert,

On 28/05/2019 08:45, Geert Uytterhoeven wrote:
> Hi Guillaume,
> 
> On Tue, May 28, 2019 at 9:13 AM Guillaume Tucker
>  wrote:
>> On 28/05/2019 00:38, kernelci.org bot wrote:
>>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>>> * This automated bisection report was sent to you on the basis  *
>>> * that you may be involved with the breaking commit it has  *
>>> * found.  No manual investigation has been done to verify it,   *
>>> * and the root cause of the problem may be somewhere else.  *
>>> * Hope this helps!  *
>>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>>>
>>> linusw/for-next boot bisection: v5.2-rc1-8-g73a790c68d7e on rk3288-veyron-jaq
>>>
>>> Summary:
>>>   Start:      73a790c68d7e Merge branch 'devel' into for-next
>>>   Details:    https://kernelci.org/boot/id/5cebf03d59b514dd627a3629
>>>   Plain log:  https://storage.kernelci.org//linusw/for-next/v5.2-rc1-8-g73a790c68d7e/arm/multi_v7_defconfig/gcc-8/lab-collabora/boot-rk3288-veyron-jaq.txt
>>>   HTML log:   https://storage.kernelci.org//linusw/for-next/v5.2-rc1-8-g73a790c68d7e/arm/multi_v7_defconfig/gcc-8/lab-collabora/boot-rk3288-veyron-jaq.html
>>>   Result:     28694e009e51 thermal: rockchip: fix up the tsadc pinctrl setting error
>>>
>>> Checks:
>>>   revert: PASS
>>>   verify: PASS
>>>
>>> Parameters:
>>>   Tree:       linusw
>>>   URL:        https://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio.git/
>>>   Branch:     for-next
>>>   Target:     rk3288-veyron-jaq
>>>   CPU arch:   arm
>>>   Lab:        lab-collabora
>>>   Compiler:   gcc-8
>>>   Config:     multi_v7_defconfig
>>>   Test suite: boot
>>>
>>> Breaking commit found:
>>>
>>> ---
>>> commit 28694e009e512451ead5519dd801f9869acb1f60
>>> Author: Elaine Zhang 
>>> Date:   Tue Apr 30 18:09:44 2019 +0800
>>>
>>> thermal: rockchip: fix up the tsadc pinctrl setting error
>>
>> This commit has now been reverted in mainline.  Would it be OK
>> for you to rebase your for-next branch on v5.2-rc2 or cherry-pick
>> the revert to avoid recurring bisections?
>>
>> Ideally this should have been fixed or reverted in mainline
>> before v5.2-rc1 was released, or even earlier when this was first
>> found in -next on 13th May.  Unfortunately it was overlooked and
>> then spread to other branches like yours.
> 
> I'm afraid it's gonna spread to even more for-next branches, as most
> subsystem maintainers base their for-next branch on the previous rc1
> release.  Typically maintainers do not rebase their for-next branches,
> and do not cherry-pick fixes, unless they are critical for their
> subsystem.  So you can expect this to show up in e.g. the m68k for-next
> branch soon...

That is what I feared, thanks for confirming.

> Can't you mark this as a known issue, to prevent spending cycles on the
> same bisection, and sending out more bisection reports for the same
> issue?

Not really, so I've disabled bisections in the linux-gpio tree
and a few other maintainers' trees for now.  I'll see if we can
come up with a more systematic way of suppressing bisections in
similar cases (i.e. the issue has been fixed in mainline later
than the base commit for the branch being tested).
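
As a rough sketch of the kind of check this could involve (purely hypothetical, not something KernelCI implements as far as this thread says): given the hash of the mainline fix or revert, "git merge-base --is-ancestor" can tell whether the branch under test already contains it.  The function names and the bisect/skip policy below are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the KernelCI code base.

# Succeeds (exit 0) when commit $1 is part of the history of revision $2.
contains_commit() {
	git merge-base --is-ancestor "$1" "$2"
}

# Decide whether a failure on a branch is worth bisecting, given the
# mainline commit that fixes or reverts an already known regression.
# Note: an unknown commit also makes git exit non-zero, so a real
# implementation would need to tell that apart from "not an ancestor".
should_bisect() {
	fix=$1
	branch=$2
	if contains_commit "$fix" "$branch"; then
		echo "bisect: $branch already contains $fix"
	else
		echo "skip: $branch predates $fix"
	fi
}
```

A call such as "should_bisect <revert-hash> for-next" in a clone that has both mainline and the maintainer's tree would then print a "skip:" line for a branch still based on v5.2-rc1, assuming the revert only landed afterwards.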

Thanks,
Guillaume


Re: linusw/for-next boot bisection: v5.2-rc1-8-g73a790c68d7e on rk3288-veyron-jaq

2019-05-28 Thread Guillaume Tucker
Hi Linus,

On 28/05/2019 00:38, kernelci.org bot wrote:
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> * This automated bisection report was sent to you on the basis  *
> * that you may be involved with the breaking commit it has  *
> * found.  No manual investigation has been done to verify it,   *
> * and the root cause of the problem may be somewhere else.  *
> * Hope this helps!  *
> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
> 
> linusw/for-next boot bisection: v5.2-rc1-8-g73a790c68d7e on rk3288-veyron-jaq
> 
> Summary:
>   Start:      73a790c68d7e Merge branch 'devel' into for-next
>   Details:    https://kernelci.org/boot/id/5cebf03d59b514dd627a3629
>   Plain log:  https://storage.kernelci.org//linusw/for-next/v5.2-rc1-8-g73a790c68d7e/arm/multi_v7_defconfig/gcc-8/lab-collabora/boot-rk3288-veyron-jaq.txt
>   HTML log:   https://storage.kernelci.org//linusw/for-next/v5.2-rc1-8-g73a790c68d7e/arm/multi_v7_defconfig/gcc-8/lab-collabora/boot-rk3288-veyron-jaq.html
>   Result:     28694e009e51 thermal: rockchip: fix up the tsadc pinctrl setting error
> 
> Checks:
>   revert: PASS
>   verify: PASS
> 
> Parameters:
>   Tree:       linusw
>   URL:        https://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio.git/
>   Branch:     for-next
>   Target:     rk3288-veyron-jaq
>   CPU arch:   arm
>   Lab:        lab-collabora
>   Compiler:   gcc-8
>   Config:     multi_v7_defconfig
>   Test suite: boot
> 
> Breaking commit found:
> 
> ---
> commit 28694e009e512451ead5519dd801f9869acb1f60
> Author: Elaine Zhang 
> Date:   Tue Apr 30 18:09:44 2019 +0800
> 
> thermal: rockchip: fix up the tsadc pinctrl setting error

This commit has now been reverted in mainline.  Would it be OK
for you to rebase your for-next branch on v5.2-rc2 or cherry-pick
the revert to avoid recurring bisections?

Ideally this should have been fixed or reverted in mainline
before v5.2-rc1 was released, or even earlier when this was first
found in -next on 13th May.  Unfortunately it was overlooked and
then spread to other branches like yours.

Thanks,
Guillaume

