Re: [PATCH 4.19 000/247] 4.19.178-rc1 review

2021-03-03 Thread Hanjun Guo

On 2021/3/2 0:10, Greg Kroah-Hartman wrote:

This is the start of the stable review cycle for the 4.19.178 release.
There are 247 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed, 03 Mar 2021 16:09:49 +.
Anything received after that time might be too late.


Tested on arm64 and x86 for 4.19.178-rc3+,

arm64:

Testcase Result Summary:
total_num: 4674
succeed_num: 4674
failed_num: 0
timeout_num: 0

x86 (No kernel failures)

Testcase Result Summary:
total_num: 4674
succeed_num: 4673
failed_num: 1
timeout_num: 0

Tested-by: Hulk Robot 


Re: [PATCH 5.4 000/340] 5.4.102-rc1 review

2021-03-03 Thread Hanjun Guo

On 2021/3/2 0:09, Greg Kroah-Hartman wrote:

This is the start of the stable review cycle for the 5.4.102 release.
There are 340 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed, 03 Mar 2021 16:09:49 +.
Anything received after that time might be too late.


Tested on arm64 and x86 for 5.4.102-rc4+,

arm64:

Testcase Result Summary:
total_num: 4716
succeed_num: 4716
failed_num: 0
timeout_num: 0

x86 (No kernel failures)

Testcase Result Summary:
total_num: 4716
succeed_num: 4715
failed_num: 1
timeout_num: 0

Tested-by: Hulk Robot 


Re: [PATCH 5.4 000/340] 5.4.102-rc1 review

2021-03-02 Thread Hanjun Guo

On 2021/3/2 20:31, Greg Kroah-Hartman wrote:

On Tue, Mar 02, 2021 at 02:42:15PM +0800, Hanjun Guo wrote:

Hi Greg,

On 2021/3/2 0:09, Greg Kroah-Hartman wrote:

This is the start of the stable review cycle for the 5.4.102 release.
There are 340 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed, 03 Mar 2021 16:09:49 +.
Anything received after that time might be too late.

Our test CI monitored the 5.4.102-rc2, and compile failure:

kernel/rcu/tree.c:617:2: error: implicit declaration of function
‘IRQ_WORK_INIT’; did you mean ‘IRQ_WORK_BUSY’?
[-Werror=implicit-function-declaration]
   IRQ_WORK_INIT(late_wakeup_func);
   ^
   IRQ_WORK_BUSY
kernel/rcu/tree.c:617:2: error: invalid initializer

Should be commit e1e41aa31ed1 (rcu/nocb: Trigger self-IPI on late
deferred wake up before user resume) fails the build.

Ah, thank you, I'll go fix that up.  Looks like 5.10.y also fails with
that issue...


It is, same compile error for 5.10. I just confirmed both arm64 and
x86 have the same compile error.

Thanks
Hanjun


Re: [PATCH 5.4 000/340] 5.4.102-rc1 review

2021-03-02 Thread Hanjun Guo

Hi Greg,

On 2021/3/2 0:09, Greg Kroah-Hartman wrote:

This is the start of the stable review cycle for the 5.4.102 release.
There are 340 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed, 03 Mar 2021 16:09:49 +.
Anything received after that time might be too late.


Our test CI monitored the 5.4.102-rc2, and compile failure:

kernel/rcu/tree.c:617:2: error: implicit declaration of function 
‘IRQ_WORK_INIT’; did you mean ‘IRQ_WORK_BUSY’? 
[-Werror=implicit-function-declaration]

  IRQ_WORK_INIT(late_wakeup_func);
  ^
  IRQ_WORK_BUSY
kernel/rcu/tree.c:617:2: error: invalid initializer

Should be commit e1e41aa31ed1 (rcu/nocb: Trigger self-IPI on late
deferred wake up before user resume) fails the build.

By the way, do you expect us test the 5.4.102-rc1 or the
5.4.102-rc2 in your tree?

Thanks
Hanjun


Re: [PATCH 4.19 000/247] 4.19.178-rc1 review

2021-03-01 Thread Hanjun Guo

On 2021/3/2 0:10, Greg Kroah-Hartman wrote:

This is the start of the stable review cycle for the 4.19.178 release.
There are 247 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed, 03 Mar 2021 16:09:49 +.
Anything received after that time might be too late.


Tested both on x86 [1] and ARM64 [2] server,

Tested-by: Hulk Robot 

Thanks
Hanjun

[1]:
Arch: x86 (confirmed that no kernel failures)

Testcase Result Summary:
total_num: 4681
succeed_num: 4677
failed_num: 4
timeout_num: 1


[2]:
Arch: arm64

Testcase Result Summary:
total_num: 4675
succeed_num: 4675
failed_num: 0
timeout_num: 0





Re: [PATCH 5.10 00/23] 5.10.19-rc1 review

2021-02-26 Thread Hanjun Guo

On 2021/2/26 14:44, Hanjun Guo wrote:

Hi Greg,

On 2021/2/25 17:53, Greg Kroah-Hartman wrote:

This is the start of the stable review cycle for the 5.10.19 release.
There are 23 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Sat, 27 Feb 2021 09:25:06 +.
Anything received after that time might be too late.


It takes longer time to set up our test farm than I expected,
for now we can run test on x86 for stable 5.10, test for
ARM64 server and other stable kernels (4.19 and 5.4) is
in progress (needs more machines and a rack in the data
center), please give us a bit more time to get things ready.

Here is the test results for x86, compiled and booted OK,
also no regressions for the functional test [1] (the failed
ones are the mismatch of the testcase and the test environment,
not the kernel failures),


Although 5.10.19 is released, it's better to report the ARM64
test results as well:

Testcase Result Summary:

total_num: 4732

succeed_num: 4732

failed_num: 0

timeout_num: 0

Thanks
Hanjun


Re: 5.10 LTS Kernel: 2 or 6 years?

2021-02-26 Thread Hanjun Guo

Hi Nick,

Sorry for taking so long to reply you, we had discussions on how to
corporate with KCIDB, please see my comments inline.

On 2021/2/19 22:45, Nikolai Kondrashov wrote:

Hi Hanjun,

On 2/19/21 10:54 AM, Hanjun Guo wrote:
 > In specific, we will start from the testing work, using HULK robot
 > (reports lots of bugs to mainline kernel) testing framework to test
 > compile, reboot, functional testing, and will extend to basic
 > performance regression testing in the future.

I heard about Huawei ramping up kernel testing from someone at FOSDEM
2019. I wonder if it was you :) Nice to see your progress and the company
stepping up to help with testing!


I Cced Yongjun and Jinyue, they are the key persons :)



Would you be interested in working with the Linux Foundation KernelCI 
project

on submitting your build and test results to the common database - KCIDB?


Yes, we are willing to sent the test results to KCIDB.

For now, all the tests are inside the company, which blocks us to
directly sent the test results out of the test machine due to the
security policy, it takes us sometime to discuss how to send out
the test results, and we may need do the test in a public cloud.



We are working on aggregating results from various testing systems so we 
can

provide a dashboard, and a single, aggregated e-mail report to subscribed
maintainers and developers.

We have a prototype dashboard at https://staging.kernelci.org:3000/ and are
working hard on making the e-mail reports good enough to start reaching 
out to

maintainers.

We already have ARM, Google Syzbot, Gentoo GKernelCI, Red Hat CKI, and, of
course, KernelCI native tests sending data to the database. Linaro 
Tuxsuite is
starting sending today. We could use your data, and of course any 
development

help you could spare :)


How can we connect to the KCIDB? Can we send the test data out by email
first?



I wish I could show you my today's KCIDB presentation at DevConf.cz, but 
the
recording is not out yet. Meanwhile you can take a look at our 
presentation at

last year's Linux Plumbers: https://youtu.be/y9Glc90WUN0?t=10739

Or see our intro in an older blog post:
https://foundation.kernelci.org/blog/2020/08/21/introducing-common-reporting/ 



Anyone wishing to contribute to KCIDB gets credentials and permissions to
submit to our "playground" setup where they can send their data, see it 
in a
dashboard, experiment without worrying about breaking anything, and 
decide if

they like it or n >
If you're interested, take a look at our Submission HOWTO:
https://github.com/kernelci/kcidb/blob/v8/SUBMISSION_HOWTO.md
and send an email to kerne...@groups.io (CC'd), or come over to the 
#kernelci

channel on freenode.net!


Thanks!
Hanjun


Re: [PATCH 5.10 00/23] 5.10.19-rc1 review

2021-02-25 Thread Hanjun Guo

Hi Greg,

On 2021/2/25 17:53, Greg Kroah-Hartman wrote:

This is the start of the stable review cycle for the 5.10.19 release.
There are 23 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Sat, 27 Feb 2021 09:25:06 +.
Anything received after that time might be too late.


It takes longer time to set up our test farm than I expected,
for now we can run test on x86 for stable 5.10, test for
ARM64 server and other stable kernels (4.19 and 5.4) is
in progress (needs more machines and a rack in the data
center), please give us a bit more time to get things ready.

Here is the test results for x86, compiled and booted OK,
also no regressions for the functional test [1] (the failed
ones are the mismatch of the testcase and the test environment,
not the kernel failures),

Tested-by: Hulk Robot 

Thanks
Hanjun

[1]:
Kernel repo: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git


Branch: linux-5.10.y

Arch: x86

Version: 5.10.19-rc1+

Commit: 6ffb943c0e01d843a06842f9a7bcfc008e10a6d2

Compiler: gcc version 8.3.1 (GCC)



Testcase Result Summary:

total_num: 4739

succeed_num: 4732

failed_num: 7

timeout_num: 0



Testsuites List:

autotest

crackerjack

kernel_selftests

ltp-aiodio

ltp-aio-stress

ltp-controllers

ltp-openposix

ltp-can

ltp-cap_bounds

ltp-commands

ltp-connectors

ltp-containers

ltp-cpuhotplug

ltp-crashme

ltp-crypto

ltp-cve

ltp-dio

ltp-dma_thread_diotest

ltp-fcntl-locktests

ltp-filecaps

ltp-fs

ltp-fs_bind

ltp-fs_perms_simple

ltp-fs_readonly

ltp-fsx

ltp-hugetlb

ltp-hyperthreading

ltp-ima

ltp-input

ltp-io

ltp-io_cd

ltp-io_floppy

ltp-ipc

ltp-kernel_misc

ltp-fsstress

ltp-fsx-linux

ltp-math

ltp-mm

ltp-nptl

ltp-numa

ltp-power_management_tests

ltp-power_management_tests_exclusive

ltp-pty

ltp-sched

ltp-scsi_debug

ltp-securebits

ltp-smack

ltp-smoketest

ltp-syscalls

ltp-tracing

ltp-uevent

ltp-realtime

memory_ksm

security_audit


Re: [PATCH v1] ACPI: processor: idle: Drop extra prefix from pr_notice()

2021-02-25 Thread Hanjun Guo

On 2021/2/25 2:37, Rafael J. Wysocki wrote:

From: Rafael J. Wysocki 

Drop "ACPI: " from the pr_noitice() instance in
acpi_processor_cstate_first_run_checks(), because pr_fmt() causes
that prefix to be added to the message already.

Reported-by: Hanjun Guo 
Signed-off-by: Rafael J. Wysocki 
---
  drivers/acpi/processor_idle.c |4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-pm/drivers/acpi/processor_idle.c
===
--- linux-pm.orig/drivers/acpi/processor_idle.c
+++ linux-pm/drivers/acpi/processor_idle.c
@@ -780,8 +780,8 @@ static inline void acpi_processor_cstate
dmi_check_system(processor_power_dmi_table);
max_cstate = acpi_processor_cstate_check(max_cstate);
if (max_cstate < ACPI_C_STATES_MAX)
-   pr_notice("ACPI: processor limited to max C-state %d\n",
- max_cstate);
+   pr_notice("processor limited to max C-state %d\n", max_cstate);
+
first_run++;
  
  	if (nocst)


Reviewed-by: Hanjun Guo 

Thanks
Hanjun


Re: [PATCH v1 1/4] ACPI: processor: Get rid of ACPICA message printing

2021-02-25 Thread Hanjun Guo

On 2021/2/25 2:06, Rafael J. Wysocki wrote:

On Tue, Feb 23, 2021 at 3:45 PM Rafael J. Wysocki  wrote:


On Tue, Feb 23, 2021 at 12:31 PM Hanjun Guo  wrote:


On 2021/2/23 2:59, Rafael J. Wysocki wrote:

Index: linux-pm/drivers/acpi/processor_idle.c
===
--- linux-pm.orig/drivers/acpi/processor_idle.c
+++ linux-pm/drivers/acpi/processor_idle.c


In this file, function acpi_processor_cstate_first_run_checks()
has a wrong pr_notice():

pr_notice("ACPI: processor limited to max C-state %d\n",
 max_cstate);

Since we have pr_fmt() for this file, "ACPI:" is duplicate,
we'd better cleanup this as below:

pr_notice("processor limited to max C-state %d\n", max_cstate);


Thanks for pointing this out, I'll make this change when applying the patch.


Actually, this issue is not strictly related to the patch here, so I'm
going to send a separate patch to fix it.


Make sense to me as well.

Thanks
Hanjun


Re: [PATCH v1 0/4] ACPI: Get rid of ACPICA message printing from core

2021-02-23 Thread Hanjun Guo

On 2021/2/23 2:57, Rafael J. Wysocki wrote:

Hi All,

This series replaces ACPI_DEBUG_PRINT() and ACPI_EXCEPTION() in the ACPI
processor driver and sysfs code and drops definitions of related symbols
that are not used for anything meaningful any more.

Please refer to the patch changelogs for details.


Except patch 1/4, others are looking good to me. some
legacy printk(PRIFIX ...) are still there, but we can
clean up them later.

Feel feel to add my review tag with minor issue addressed.

Reviewed-by: Hanjun Guo 

Thanks
Hanjun


Re: [PATCH v1 1/4] ACPI: processor: Get rid of ACPICA message printing

2021-02-23 Thread Hanjun Guo

On 2021/2/23 2:59, Rafael J. Wysocki wrote:

Index: linux-pm/drivers/acpi/processor_idle.c
===
--- linux-pm.orig/drivers/acpi/processor_idle.c
+++ linux-pm/drivers/acpi/processor_idle.c


In this file, function acpi_processor_cstate_first_run_checks()
has a wrong pr_notice():

pr_notice("ACPI: processor limited to max C-state %d\n",
max_cstate);

Since we have pr_fmt() for this file, "ACPI:" is duplicate,
we'd better cleanup this as below:

pr_notice("processor limited to max C-state %d\n", max_cstate);

Thanks
Hanjun


Re: 5.10 LTS Kernel: 2 or 6 years?

2021-02-22 Thread Hanjun Guo

On 2021/2/20 17:53, Greg Kroah-Hartman wrote:

On Sat, Feb 20, 2021 at 03:02:54PM +0800, Hanjun Guo wrote:

On 2021/2/19 17:08, Greg Kroah-Hartman wrote:

On Fri, Feb 19, 2021 at 04:54:24PM +0800, Hanjun Guo wrote:

Hi Greg,

On 2021/1/26 15:29, Greg Kroah-Hartman wrote:
[...]


I want to see companies _using_ the kernel, and most importantly,
_updating_ their devices with it, to know if it is worth to keep around
for longer than 2 years.  I also, hopefully, want to see how those
companies will help me out in the testing and maintenance of that kernel
version in order to make supporting it for 6 years actually possible.

So, are you planning on using 5.10?  Will you will be willing to help
out in testing the -rc releases I make to let me know if there are any
problems, and to help in pointing out and backporting any specific
patches that your platforms need for that kernel release?


We(Huawei) are willing to commit resources to help out in testing the
stable -rc releases, and to help to backport patches for stable kernels.


Wonderful!


5.10 stable kernel will be used for openEuler [1] kernel and also inside
Huawei. From customer's feedback, it's very important to see the stable
kernel we used to be maintained for 6 years in the community, and we
will use 5.10 kernel for at least 6 years, so we are willing to help
you and help ourselves :)

In specific, we will start from the testing work, using HULK robot
(reports lots of bugs to mainline kernel) testing framework to test
compile, reboot, functional testing, and will extend to basic
performance regression testing in the future.


Great!  Do you all need an email notification when the -rc releases come
out for the stable trees, or can you trigger off of the -rc stable git
tree?  Different CI systems work in different ways :)


We can trigger the test when you updated the -rc stable git tree,
by monitoring new commits for the stable branches. So if you push
all the commits at once for -rc stable branches, then our CI system
can work well.


I do push to the -rc branches, but those branches are rebased, and I do
"intermediate" pushes as well.  Meaning I push to have CI systems run on
the existing patch queue at times that are not only the "main" -rc
release periods.

Watch the branches for a few weeks to get an idea of how they work if
you are curious.


Will run our CI system based on your -rc branch to see if anything
doesn't work, then update our system as needed.

Thanks
Hanjun


Re: [PATCH v1 0/4] ACPI: PCI: Unify printing of messages

2021-02-20 Thread Hanjun Guo

On 2021/2/20 2:14, Rafael J. Wysocki wrote:

Hi All,

This series gets rid of ACPICA debugging from non-ACPICA code related to PCI
(patches [1-3/4]) and replaces direct printk() usage in pci_link.c with
pr_*() or acpi_handle_*().

Please refer to the patch changelogs for details.


Looks good to me,

Reviewed-by: Hanjun Guo 

Thanks
Hanjun


Re: 5.10 LTS Kernel: 2 or 6 years?

2021-02-19 Thread Hanjun Guo

On 2021/2/19 17:08, Greg Kroah-Hartman wrote:

On Fri, Feb 19, 2021 at 04:54:24PM +0800, Hanjun Guo wrote:

Hi Greg,

On 2021/1/26 15:29, Greg Kroah-Hartman wrote:
[...]


I want to see companies _using_ the kernel, and most importantly,
_updating_ their devices with it, to know if it is worth to keep around
for longer than 2 years.  I also, hopefully, want to see how those
companies will help me out in the testing and maintenance of that kernel
version in order to make supporting it for 6 years actually possible.

So, are you planning on using 5.10?  Will you will be willing to help
out in testing the -rc releases I make to let me know if there are any
problems, and to help in pointing out and backporting any specific
patches that your platforms need for that kernel release?


We(Huawei) are willing to commit resources to help out in testing the
stable -rc releases, and to help to backport patches for stable kernels.


Wonderful!


5.10 stable kernel will be used for openEuler [1] kernel and also inside
Huawei. From customer's feedback, it's very important to see the stable
kernel we used to be maintained for 6 years in the community, and we
will use 5.10 kernel for at least 6 years, so we are willing to help
you and help ourselves :)

In specific, we will start from the testing work, using HULK robot
(reports lots of bugs to mainline kernel) testing framework to test
compile, reboot, functional testing, and will extend to basic
performance regression testing in the future.


Great!  Do you all need an email notification when the -rc releases come
out for the stable trees, or can you trigger off of the -rc stable git
tree?  Different CI systems work in different ways :)


We can trigger the test when you updated the -rc stable git tree,
by monitoring new commits for the stable branches. So if you push
all the commits at once for -rc stable branches, then our CI system
can work well.



And if you can reply to the -rc release emails with a "Tested-by:" tag,
I will be glad to add that to the release commit when that happens to
show that you all have tested the release.


Thanks, will reply "Tested-by:" with -rc releases. We are working on
setting up the test farm and will report the test results in a week.




And we will start from ARM64 and X86 architecture first, and then extend
to other platforms.


That's a good start, the useful ones :)


For patch backporting, will send the bugfix patches (from mainline)
we spotted, but I think this work may not doing in regular but will
be triggered as needed.


That's fine, it is not something that happens at a regular interval.


Does this sound good to you?


Yes it does, thank you so much.

greg k-h


Thanks
Hanjun


Re: 5.10 LTS Kernel: 2 or 6 years?

2021-02-19 Thread Hanjun Guo

Hi Greg,

On 2021/1/26 15:29, Greg Kroah-Hartman wrote:
[...]


I want to see companies _using_ the kernel, and most importantly,
_updating_ their devices with it, to know if it is worth to keep around
for longer than 2 years.  I also, hopefully, want to see how those
companies will help me out in the testing and maintenance of that kernel
version in order to make supporting it for 6 years actually possible.

So, are you planning on using 5.10?  Will you will be willing to help
out in testing the -rc releases I make to let me know if there are any
problems, and to help in pointing out and backporting any specific
patches that your platforms need for that kernel release?


We(Huawei) are willing to commit resources to help out in testing the
stable -rc releases, and to help to backport patches for stable kernels.

5.10 stable kernel will be used for openEuler [1] kernel and also inside
Huawei. From customer's feedback, it's very important to see the stable
kernel we used to be maintained for 6 years in the community, and we
will use 5.10 kernel for at least 6 years, so we are willing to help
you and help ourselves :)

In specific, we will start from the testing work, using HULK robot
(reports lots of bugs to mainline kernel) testing framework to test
compile, reboot, functional testing, and will extend to basic
performance regression testing in the future.

And we will start from ARM64 and X86 architecture first, and then extend
to other platforms.

For patch backporting, will send the bugfix patches (from mainline)
we spotted, but I think this work may not doing in regular but will
be triggered as needed.

Does this sound good to you?

Thanks
Hanjun

[1]: https://openeuler.org/en/


Re: [PATCH v3 4/5] ACPI: video: Clean up printing messages

2021-02-03 Thread Hanjun Guo

On 2021/2/4 2:48, Rafael J. Wysocki wrote:

From: Rafael J. Wysocki

Replace the ACPI_DEBUG_PRINT() instances in acpi_video.c with
acpi_handle_debug() calls and the ACPI_EXCEPTION()/ACPI_ERROR()/
ACPI_WARNING() instances in there with acpi_handle_info() calls,
which among other things causes the excessive log levels of those
messages to be increased.

Drop the _COMPONENT and ACPI_MODULE_NAME() definitions that are not
used any more from acpi_video.c, drop the no longer needed
ACPI_VIDEO_COMPONENT definition from the headers and update the
documentation accordingly.

While at it, add a pr_fmt() definition to acpi_video.c, replace the
direct printk() invocations in there with acpi_handle_info() or
pr_info() (and reduce the excessive log level where applicable) and
drop the PREFIX sybmbol definition which is not necessary any more
from acpi_video.c.

Also make unrelated janitorial changes to fix up white space and
use ACPI_FAILURE() instead of negating ACPI_SUCCESS().

Signed-off-by: Rafael J. Wysocki


Reviewed-by: Hanjun Guo 

Thanks
Hanjun


Re: [PATCH v3 2/5] ACPI: battery: Clean up printing messages

2021-02-03 Thread Hanjun Guo

On 2021/2/4 2:44, Rafael J. Wysocki wrote:

From: Rafael J. Wysocki

Replace the ACPI_DEBUG_PRINT() and ACPI_EXCEPTION() instances
in battery.c with acpi_handle_debug() and acpi_handle_info() calls,
respectively, which among other things causes the excessive log
level of the messages previously printed via ACPI_EXCEPTION() to
be increased.

Drop the _COMPONENT and ACPI_MODULE_NAME() definitions that are not
used any more, drop the no longer needed ACPI_BATTERY_COMPONENT
definition from the headers and update the documentation accordingly.

While at it, update the pr_fmt() definition and drop the unneeded
PREFIX sybmbol definition from battery.c.  Also adapt the existing
pr_info() calls to the new pr_fmt() definition.

Signed-off-by: Rafael J. Wysocki


Reviewed-by: Hanjun Guo 

Thanks
Hanjun


Re: [PATCH v3 1/5] ACPI: AC: Clean up printing messages

2021-02-03 Thread Hanjun Guo

On 2021/2/4 2:43, Rafael J. Wysocki wrote:

From: Rafael J. Wysocki 

Replace the ACPI_DEBUG_PRINT() and ACPI_EXCEPTION() instances
in ac.c with acpi_handle_debug() and acpi_handle_info() calls,
respectively, which among other things causes the excessive log
level of the messages previously printed via ACPI_EXCEPTION() to
be increased.

Drop the _COMPONENT and ACPI_MODULE_NAME() definitions that are not
used any more, drop the no longer needed ACPI_AC_COMPONENT definition
from the headers and update the documentation accordingly.

While at it, replace the direct printk() invocation with pr_info(),
add a pr_fmt() definition to ac.c and drop the unneeded PREFIX
symbol definition from there.

Signed-off-by: Rafael J. Wysocki 
---

v2 -> v3: Also add a pr_fmt() definition to ac.c and replace direct
   printk() with pr_info (no log level change).


Reviewed-by: Hanjun Guo 


Re: [PATCH v2 5/5] ACPI: thermal: Clean up printing messages

2021-02-02 Thread Hanjun Guo

On 2021/2/3 2:19, Rafael J. Wysocki wrote:

From: Rafael J. Wysocki 

Replace the ACPI_DEBUG_PRINT() instances in thermal.c with
acpi_handle_debug() calls and modify the ACPI_THERMAL_TRIPS_EXCEPTION()
macro in there to use acpi_handle_info() internally,  which among other
things causes the excessive log level of the messages printed by it to
be more adequate.

Drop the _COMPONENT and ACPI_MODULE_NAME() definitions that are not
used any more from thermal.c, drop the no longer needed
ACPI_THERMAL_COMPONENT definition from the headers and update the
documentation accordingly.

While at it, add a pr_fmt() definition to thermal.c, drop the PREFIX
definition from there and replace some pr_warn() calls with pr_info()
or acpi_handle_info() to reduce the excessive log level and (in the
latter case) facilitate easier identification of the message source.

Signed-off-by: Rafael J. Wysocki 
---

v1 -> v2: Changelog update


Reviewed-by: Hanjun Guo 

Thanks
Hanjun


Re: [PATCH v2 4/5] ACPI: video: Clean up printing messages

2021-02-02 Thread Hanjun Guo

On 2021/2/3 2:18, Rafael J. Wysocki wrote:
[...]

Also make one unrelated janitorial change to fix up white space and
use ACPI_FAILURE() instead of negating ACPI_SUCCESS().


[...]


status = acpi_evaluate_object(video->device->handle, "_DOD", NULL, 
);
if (!ACPI_SUCCESS(status)) {


if (ACPI_FAILURE(status))


-   ACPI_EXCEPTION((AE_INFO, status, "Evaluating _DOD"));
+   acpi_handle_info(video->device->handle,
+"_DOD evaluation failed: %s\n",
+acpi_format_exception(status));



Thanks
Hanjun


Re: [PATCH v2 3/5] ACPI: button: Clean up printing messages

2021-02-02 Thread Hanjun Guo

On 2021/2/3 2:17, Rafael J. Wysocki wrote:

From: Rafael J. Wysocki 

Replace the ACPI_DEBUG_PRINT() instance in button.c with an
acpi_handle_debug() call, drop the _COMPONENT and ACPI_MODULE_NAME()
definitions that are not used any more, drop the no longer needed
ACPI_BUTTON_COMPONENT definition from the headers and update the
documentation accordingly.

While at it, replace the direct printk() invocations with pr_info()
(that changes the excessive log level for some of them too) and drop
the unneeded PREFIX sybmbol definition from battery.c.

Signed-off-by: Rafael J. Wysocki 


Reviewed-by: Hanjun Guo 


Re: [PATCH v2 2/5] ACPI: battery: Clean up printing messages

2021-02-02 Thread Hanjun Guo

On 2021/2/3 2:15, Rafael J. Wysocki wrote:

From: Rafael J. Wysocki 

Replace the ACPI_DEBUG_PRINT() and ACPI_EXCEPTION() instances
in battery.c with acpi_handle_debug() and acpi_handle_info() calls,
respectively, which among other things causes the excessive log
level of the messages previously printed via ACPI_EXCEPTION() to
be more adequate.

Drop the _COMPONENT and ACPI_MODULE_NAME() definitions that are not
used any more, drop the no longer needed ACPI_BATTERY_COMPONENT
definition from the headers and update the documentation accordingly.

While at it, update the pr_fmt() definition and drop the unneeded
PREFIX sybmbol definition from battery.c.

Signed-off-by: Rafael J. Wysocki 
---

v1 -> v2: Changelog update

---
  Documentation/firmware-guide/acpi/debug.rst |1
  drivers/acpi/battery.c  |   29 
++--
  drivers/acpi/sysfs.c|1
  include/acpi/acpi_drivers.h |1
  4 files changed, 15 insertions(+), 17 deletions(-)

Index: linux-pm/drivers/acpi/battery.c
===
--- linux-pm.orig/drivers/acpi/battery.c
+++ linux-pm/drivers/acpi/battery.c
@@ -8,7 +8,7 @@
   *  Copyright (C) 2001, 2002 Paul Diefenbaugh 
   */
  
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

+#define pr_fmt(fmt) "ACPI: battery: " fmt
  

[...]
  
-	pr_info(PREFIX "%s Slot [%s] (battery %s)\n",

+   pr_info("%s Slot [%s] (battery %s)\n",
ACPI_BATTERY_DEVICE_NAME, acpi_device_bid(device),
device->status.battery_present ? "present" : "absent");


Will print:
ACPI: battery: Battery Slot ...

How about:
ACPI: battery: Slot ...

  
@@ -1282,7 +1283,7 @@ static void __init acpi_battery_init_asy

if (battery_check_pmic) {
for (i = 0; i < ARRAY_SIZE(acpi_battery_blacklist); i++)
if (acpi_dev_present(acpi_battery_blacklist[i], "1", 
-1)) {
-   pr_info(PREFIX ACPI_BATTERY_DEVICE_NAME
+   pr_info(ACPI_BATTERY_DEVICE_NAME
": found native %s PMIC, not loading\n",
acpi_battery_blacklist[i]);


Will print:
ACPI: battery: Battery: found native...

will be better for
ACPI: battery: found native

So I think we can remove ACPI_BATTERY_DEVICE_NAME in pr_info()

Thanks
Hanjun


Re: [PATCH v2 1/5] ACPI: AC: Clean up printing messages

2021-02-02 Thread Hanjun Guo

Hi Rafael,

On 2021/2/3 2:14, Rafael J. Wysocki wrote:

From: Rafael J. Wysocki 

Replace the ACPI_DEBUG_PRINT() and ACPI_EXCEPTION() instances
in ac.c with acpi_handle_debug() and acpi_handle_info() calls,
respectively, which among other things causes the excessive log
level of the messages previously printed via ACPI_EXCEPTION() to
be more adequate.

Drop the _COMPONENT and ACPI_MODULE_NAME() definitions that are not
used any more, drop the no longer needed ACPI_AC_COMPONENT definition
from the headers and update the documentation accordingly.

Signed-off-by: Rafael J. Wysocki 
---

v1 -> v2: Changelog update

---
  Documentation/firmware-guide/acpi/debug.rst |1 -
  drivers/acpi/ac.c   |   12 +---
  drivers/acpi/sysfs.c|1 -
  include/acpi/acpi_drivers.h |1 -
  4 files changed, 5 insertions(+), 10 deletions(-)

Index: linux-pm/Documentation/firmware-guide/acpi/debug.rst
===
--- linux-pm.orig/Documentation/firmware-guide/acpi/debug.rst
+++ linux-pm/Documentation/firmware-guide/acpi/debug.rst
@@ -52,7 +52,6 @@ shows the supported mask values, current
  ACPI_CA_DISASSEMBLER0x0800
  ACPI_COMPILER   0x1000
  ACPI_TOOLS  0x2000
-ACPI_AC_COMPONENT   0x0002
  ACPI_BATTERY_COMPONENT  0x0004
  ACPI_BUTTON_COMPONENT   0x0008
  ACPI_SBS_COMPONENT  0x0010
Index: linux-pm/drivers/acpi/ac.c
===
--- linux-pm.orig/drivers/acpi/ac.c
+++ linux-pm/drivers/acpi/ac.c


In this file, printk() is still using, how about convert them
all into pr_*? I think patch on top your or update this patch
are both OK.

Thanks
Hanjun


Re: [PATCH] ACPI: configfs: add missing check after configfs_register_default_group

2021-01-13 Thread Hanjun Guo

On 2021/1/13 15:30, Qinglang Miao wrote:

A list_add corruption is reported by Hulk Robot like this:
==
list_add corruption.
Call Trace:
link_obj+0xc0/0x1c0
link_group+0x21/0x140
configfs_register_subsystem+0xdb/0x380
acpi_configfs_init+0x25/0x1000 [acpi_configfs]
do_one_initcall+0x149/0x820
do_init_module+0x1ef/0x720
load_module+0x35c8/0x4380
__do_sys_finit_module+0x10d/0x1a0
do_syscall_64+0x34/0x80

It's because of the missing check after configfs_register_default_group,
where configfs_unregister_subsystem should be called once failure.


I think it's right to call *_unregister_* for error path, but...



Fixes: 612bd01fc6e0 ("ACPI: add support for loading SSDTs via configfs")
Reported-by: Hulk Robot 
Signed-off-by: Qinglang Miao 
---
  drivers/acpi/acpi_configfs.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/acpi_configfs.c b/drivers/acpi/acpi_configfs.c
index cf91f4910..25512ab0e 100644
--- a/drivers/acpi/acpi_configfs.c
+++ b/drivers/acpi/acpi_configfs.c
@@ -268,7 +268,12 @@ static int __init acpi_configfs_init(void)
  
  	acpi_table_group = configfs_register_default_group(root, "table",

   _tables_type);
-   return PTR_ERR_OR_ZERO(acpi_table_group);
+   if (IS_ERR(acpi_table_group)) {
+   configfs_register_subsystem(_configfs);


...

Typo here, s/register/unregister.

Thanks
Hanjun


[PATCH v3 1/2] drm/amdkfd: Move the ignore_crat check before the CRAT table get

2020-11-12 Thread Hanjun Guo
If the ignore_crat is set to non-zero value, it's no point getting
the CRAT table, so just move the ignore_crat check before we get the
CRAT table.

Signed-off-by: Hanjun Guo 
---
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 3de5e14..c23e571 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -780,6 +780,11 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t 
*size)
 
*crat_image = NULL;
 
+   if (kfd_ignore_crat()) {
+   pr_info("CRAT table disabled by module option\n");
+   return -ENODATA;
+   }
+
/* Fetch the CRAT table from ACPI */
status = acpi_get_table(CRAT_SIGNATURE, 0, _table);
if (status == AE_NOT_FOUND) {
@@ -792,11 +797,6 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t 
*size)
return -EINVAL;
}
 
-   if (kfd_ignore_crat()) {
-   pr_info("CRAT table disabled by module option\n");
-   return -ENODATA;
-   }
-
pcrat_image = kvmalloc(crat_table->length, GFP_KERNEL);
if (!pcrat_image)
return -ENOMEM;
-- 
1.7.12.4



[PATCH v3 2/2] drm/amdkfd: Put ACPI table after using it

2020-11-12 Thread Hanjun Guo
The acpi_get_table() should be coupled with acpi_put_table() if
the mapped table is not used at runtime to release the table
mapping which can prevent the memory leak.

In kfd_create_crat_image_acpi(), crat_table is copied to pcrat_image,
and in kfd_create_vcrat_image_cpu(), the acpi_table is only used to
get the OEM information, so those two table mappings need to be released
after using it.

Fixes: 174de876d6d0 ("drm/amdkfd: Group up CRAT related functions")
Fixes: 520b8fb755cc ("drm/amdkfd: Add topology support for CPUs")
Signed-off-by: Hanjun Guo 
---
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index c23e571..0dc8de0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -774,6 +774,7 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t 
*size)
struct acpi_table_header *crat_table;
acpi_status status;
void *pcrat_image;
+   int rc = 0;
 
if (!crat_image)
return -EINVAL;
@@ -798,14 +799,17 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t 
*size)
}
 
pcrat_image = kvmalloc(crat_table->length, GFP_KERNEL);
-   if (!pcrat_image)
-   return -ENOMEM;
+   if (!pcrat_image) {
+   rc = -ENOMEM;
+   goto out;
+   }
 
memcpy(pcrat_image, crat_table, crat_table->length);
*crat_image = pcrat_image;
*size = crat_table->length;
-
-   return 0;
+out:
+   acpi_put_table(crat_table);
+   return rc;
 }
 
 /* Memory required to create Virtual CRAT.
@@ -988,6 +992,7 @@ static int kfd_create_vcrat_image_cpu(void *pcrat_image, 
size_t *size)
CRAT_OEMID_LENGTH);
memcpy(crat_table->oem_table_id, acpi_table->oem_table_id,
CRAT_OEMTABLEID_LENGTH);
+   acpi_put_table(acpi_table);
}
crat_table->total_entries = 0;
crat_table->num_domains = 0;
-- 
1.7.12.4



Re: [PATCH v4 0/6] resource: introduce union(), intersection() API

2020-11-03 Thread Hanjun Guo

On 2020/11/3 16:31, Andy Shevchenko wrote:

On Tue, Nov 3, 2020 at 2:46 AM Hanjun Guo  wrote:


On 2020/11/3 5:00, Andy Shevchenko wrote:

Some users may want to use resource library to manage their own resources,
besides existing users that open code union() and intersection()
implementations.

Provide a generic API for wider use.

Changelog v4:
- added Rb tag (Rafael)
- Cc'ed to LKML and Greg (Rafael)

Changelog v3:
- rebased on top of v5.10-rc1
- dropped upstreamed dependencies
- added Rb tag to the last patch (Mika)

Cc: Kuppuswamy Sathyanarayanan 
Cc: Bjorn Helgaas 
Cc: linux-...@vger.kernel.org

Andy Shevchenko (6):
resource: Simplify region_intersects() by reducing conditionals
resource: Group resource_overlaps() with other inline helpers
resource: Introduce resource_union() for overlapping resources
resource: Introduce resource_intersection() for overlapping resources
PCI/ACPI: Replace open coded variant of resource_union()
ACPI: watchdog: Replace open coded variant of resource_union()

   drivers/acpi/acpi_watchdog.c |  6 +-
   drivers/acpi/pci_root.c  |  4 +---
   include/linux/ioport.h   | 34 +++---
   kernel/resource.c| 10 +-
   4 files changed, 34 insertions(+), 20 deletions(-)


Reviewed-by: Hanjun Guo 


Thanks. Is it for the entire series?


Yes.

By the way, I tested this patch set on a ARM64 machine booting
with ACPI against 5.10-rc2, and no regressions with PCI, so feel
free to add my Tested-by tag for patch [1,2,3,5/6].

Thanks
Hanjun


Re: [PATCH v4 0/6] resource: introduce union(), intersection() API

2020-11-02 Thread Hanjun Guo

On 2020/11/3 5:00, Andy Shevchenko wrote:

Some users may want to use resource library to manage their own resources,
besides existing users that open code union() and intersection()
implementations.

Provide a generic API for wider use.

Changelog v4:
- added Rb tag (Rafael)
- Cc'ed to LKML and Greg (Rafael)

Changelog v3:
- rebased on top of v5.10-rc1
- dropped upstreamed dependencies
- added Rb tag to the last patch (Mika)

Cc: Kuppuswamy Sathyanarayanan 
Cc: Bjorn Helgaas 
Cc: linux-...@vger.kernel.org

Andy Shevchenko (6):
   resource: Simplify region_intersects() by reducing conditionals
   resource: Group resource_overlaps() with other inline helpers
   resource: Introduce resource_union() for overlapping resources
   resource: Introduce resource_intersection() for overlapping resources
   PCI/ACPI: Replace open coded variant of resource_union()
   ACPI: watchdog: Replace open coded variant of resource_union()

  drivers/acpi/acpi_watchdog.c |  6 +-
  drivers/acpi/pci_root.c  |  4 +---
  include/linux/ioport.h   | 34 +++---
  kernel/resource.c| 10 +-
  4 files changed, 34 insertions(+), 20 deletions(-)


Reviewed-by: Hanjun Guo 


Re: [PATCH v4 6/7] arm64: mm: Set ZONE_DMA size based on early IORT scan

2020-10-28 Thread Hanjun Guo

On 2020/10/21 20:34, Nicolas Saenz Julienne wrote:

From: Ard Biesheuvel 

We recently introduced a 1 GB sized ZONE_DMA to cater for platforms
incorporating masters that can address less than 32 bits of DMA, in
particular the Raspberry Pi 4, which has 4 or 8 GB of DRAM, but has
peripherals that can only address up to 1 GB (and its PCIe host
bridge can only access the bottom 3 GB)

Instructing the DMA layer about these limitations is straight-forward,
even though we had to fix some issues regarding memory limits set in
the IORT for named components, and regarding the handling of ACPI _DMA
methods. However, the DMA layer also needs to be able to allocate
memory that is guaranteed to meet those DMA constraints, for bounce
buffering as well as allocating the backing for consistent mappings.

This is why the 1 GB ZONE_DMA was introduced recently. Unfortunately,
it turns out the having a 1 GB ZONE_DMA as well as a ZONE_DMA32 causes
problems with kdump, and potentially in other places where allocations
cannot cross zone boundaries. Therefore, we should avoid having two
separate DMA zones when possible.

So let's do an early scan of the IORT, and only create the ZONE_DMA
if we encounter any devices that need it. This puts the burden on
the firmware to describe such limitations in the IORT, which may be
redundant (and less precise) if _DMA methods are also being provided.
However, it should be noted that this situation is highly unusual for
arm64 ACPI machines. Also, the DMA subsystem still gives precedence to
the _DMA method if implemented, and so we will not lose the ability to
perform streaming DMA outside the ZONE_DMA if the _DMA method permits
it.

Cc: Jeremy Linton 
Cc: Lorenzo Pieralisi 
Cc: Nicolas Saenz Julienne 
Cc: Rob Herring 
Cc: Christoph Hellwig 
Cc: Robin Murphy 
Cc: Hanjun Guo 
Cc: Sudeep Holla 
Cc: Anshuman Khandual 
Signed-off-by: Ard Biesheuvel 
[nsaenz: Rebased, removed documentation change and add declaration in 
acpi_iort.h]
Signed-off-by: Nicolas Saenz Julienne 

---

Changes since v3:
  - Use min_not_zero()
  - Check ACPI revision
  - Remove unnecessary #ifdef in zone_sizes_init()

  arch/arm64/mm/init.c  |  3 ++-
  drivers/acpi/arm64/iort.c | 52 +++
  include/linux/acpi_iort.h |  4 +++


Acked-by: Hanjun Guo 


Re: [PATCH] ACPI: dock: fix enum-conversion warning

2020-10-26 Thread Hanjun Guo

On 2020/10/27 5:48, Arnd Bergmann wrote:

From: Arnd Bergmann 

gcc points out a type mismatch:

drivers/acpi/dock.c: In function 'hot_remove_dock_devices':
drivers/acpi/dock.c:234:53: warning: implicit conversion from 'enum 
' to 'enum dock_callback_type' [-Wenum-conversion]
   234 |   dock_hotplug_event(dd, ACPI_NOTIFY_EJECT_REQUEST, false);

This is harmless because 'false' still has the correct numeric value,
but passing DOCK_CALL_HANDLER documents better what is going on
and avoids the warning.

Fixes: 37f908778f20 ("ACPI / dock: Walk list in reverse order during removal of 
devices")
Fixes: f09ce741a03a ("ACPI / dock / PCI: Drop ACPI dock notifier chain")
Signed-off-by: Arnd Bergmann 
---
  drivers/acpi/dock.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/dock.c b/drivers/acpi/dock.c
index 45d4b7b69de8..24e076f44d23 100644
--- a/drivers/acpi/dock.c
+++ b/drivers/acpi/dock.c
@@ -231,7 +231,8 @@ static void hot_remove_dock_devices(struct dock_station *ds)
 * between them).
 */
list_for_each_entry_reverse(dd, >dependent_devices, list)
-   dock_hotplug_event(dd, ACPI_NOTIFY_EJECT_REQUEST, false);
+   dock_hotplug_event(dd, ACPI_NOTIFY_EJECT_REQUEST,
+  DOCK_CALL_HANDLER);
  
  	list_for_each_entry_reverse(dd, >dependent_devices, list)

acpi_bus_trim(dd->adev);


Reviewed-by: Hanjun Guo 


Re: [PATCH v3 7/8] arm64: mm: Set ZONE_DMA size based on early IORT scan

2020-10-16 Thread Hanjun Guo

On 2020/10/16 15:27, Hanjun Guo wrote:
The patch only takes the address limit field into account if its value 
> 0.


Sorry I missed the if (*->memory_address_limit) check, thanks
for the reminding.



Also, before commit 7fb89e1d44cb6aec ("ACPI/IORT: take _DMA methods
into account for named components"), the _DMA method was not taken
into account for named components at all, and only the IORT limit was
used, so I do not anticipate any problems with that.


Then this patch is fine to me.


Certainly we need to address Lorenzo's comments.

Thanks
Hanjun


Re: [PATCH v3 7/8] arm64: mm: Set ZONE_DMA size based on early IORT scan

2020-10-16 Thread Hanjun Guo

Hi Ard,

On 2020/10/16 14:54, Ard Biesheuvel wrote:

On Fri, 16 Oct 2020 at 08:51, Hanjun Guo  wrote:


On 2020/10/16 2:03, Catalin Marinas wrote:

On Thu, Oct 15, 2020 at 10:26:18PM +0800, Hanjun Guo wrote:

On 2020/10/15 3:12, Nicolas Saenz Julienne wrote:

From: Ard Biesheuvel 

We recently introduced a 1 GB sized ZONE_DMA to cater for platforms
incorporating masters that can address less than 32 bits of DMA, in
particular the Raspberry Pi 4, which has 4 or 8 GB of DRAM, but has
peripherals that can only address up to 1 GB (and its PCIe host
bridge can only access the bottom 3 GB)

Instructing the DMA layer about these limitations is straight-forward,
even though we had to fix some issues regarding memory limits set in
the IORT for named components, and regarding the handling of ACPI _DMA
methods. However, the DMA layer also needs to be able to allocate
memory that is guaranteed to meet those DMA constraints, for bounce
buffering as well as allocating the backing for consistent mappings.

This is why the 1 GB ZONE_DMA was introduced recently. Unfortunately,
it turns out the having a 1 GB ZONE_DMA as well as a ZONE_DMA32 causes
problems with kdump, and potentially in other places where allocations
cannot cross zone boundaries. Therefore, we should avoid having two
separate DMA zones when possible.

So let's do an early scan of the IORT, and only create the ZONE_DMA
if we encounter any devices that need it. This puts the burden on
the firmware to describe such limitations in the IORT, which may be
redundant (and less precise) if _DMA methods are also being provided.
However, it should be noted that this situation is highly unusual for
arm64 ACPI machines. Also, the DMA subsystem still gives precedence to
the _DMA method if implemented, and so we will not lose the ability to
perform streaming DMA outside the ZONE_DMA if the _DMA method permits
it.


Sorry, I'm still a little bit confused. With this patch, if we have
a device which set the right _DMA method (DMA size >= 32), but with the
wrong DMA size in IORT, we still have the ZONE_DMA created which
is actually not needed?


With the current kernel, we get a ZONE_DMA already with an arbitrary
size of 1GB that matches what RPi4 needs. We are trying to eliminate
such unnecessary ZONE_DMA based on some heuristics (well, something that
looks "better" than a OEM ID based quirk). Now, if we learn that IORT
for platforms in the field is that broken as to describe few bits-wide
DMA masks, we may have to go back to the OEM ID quirk.


Some platforms using 0 as the memory size limit, for example D05 [0] and
D06 [1], I think we need to go back to the OEM ID quirk.

For D05/D06, there are multi interrupt controllers named as mbigen,
mbigen is using the named component to describe the mappings with
the ITS controller, and mbigen is using 0 as the memory size limit.

Also since the memory size limit for PCI RC was introduced by later
IORT revision, so firmware people may think it's fine to set that
as 0 because the system works without it.



Hello Hanjun,

The patch only takes the address limit field into account if its value > 0.


Sorry I missed the if (*->memory_address_limit) check, thanks
for the reminding.



Also, before commit 7fb89e1d44cb6aec ("ACPI/IORT: take _DMA methods
into account for named components"), the _DMA method was not taken
into account for named components at all, and only the IORT limit was
used, so I do not anticipate any problems with that.


Then this patch is fine to me.

Thanks
Hanjun


Re: [PATCH 1/1] ACPI/IORT: Fix doc warnings in iort.c

2020-10-16 Thread Hanjun Guo

On 2020/10/14 17:31, Shiju Jose wrote:

Fix following warnings caused by mismatch between
function parameters and function comments.

drivers/acpi/arm64/iort.c:55: warning: Function parameter or member 'iort_node' 
not described in 'iort_set_fwnode'
drivers/acpi/arm64/iort.c:55: warning: Excess function parameter 'node' 
description in 'iort_set_fwnode'
drivers/acpi/arm64/iort.c:682: warning: Function parameter or member 'id' not 
described in 'iort_get_device_domain'
drivers/acpi/arm64/iort.c:682: warning: Function parameter or member 
'bus_token' not described in 'iort_get_device_domain'
drivers/acpi/arm64/iort.c:682: warning: Excess function parameter 'req_id' 
description in 'iort_get_device_domain'
drivers/acpi/arm64/iort.c:1142: warning: Function parameter or member 
'dma_size' not described in 'iort_dma_setup'
drivers/acpi/arm64/iort.c:1142: warning: Excess function parameter 'size' 
description in 'iort_dma_setup'
drivers/acpi/arm64/iort.c:1534: warning: Function parameter or member 'ops' not 
described in 'iort_add_platform_device'

Signed-off-by: Shiju Jose 


Acked-by: Hanjun Guo 


Re: [PATCH v3 7/8] arm64: mm: Set ZONE_DMA size based on early IORT scan

2020-10-16 Thread Hanjun Guo

On 2020/10/16 2:03, Catalin Marinas wrote:

On Thu, Oct 15, 2020 at 10:26:18PM +0800, Hanjun Guo wrote:

On 2020/10/15 3:12, Nicolas Saenz Julienne wrote:

From: Ard Biesheuvel 

We recently introduced a 1 GB sized ZONE_DMA to cater for platforms
incorporating masters that can address less than 32 bits of DMA, in
particular the Raspberry Pi 4, which has 4 or 8 GB of DRAM, but has
peripherals that can only address up to 1 GB (and its PCIe host
bridge can only access the bottom 3 GB)

Instructing the DMA layer about these limitations is straight-forward,
even though we had to fix some issues regarding memory limits set in
the IORT for named components, and regarding the handling of ACPI _DMA
methods. However, the DMA layer also needs to be able to allocate
memory that is guaranteed to meet those DMA constraints, for bounce
buffering as well as allocating the backing for consistent mappings.

This is why the 1 GB ZONE_DMA was introduced recently. Unfortunately,
it turns out the having a 1 GB ZONE_DMA as well as a ZONE_DMA32 causes
problems with kdump, and potentially in other places where allocations
cannot cross zone boundaries. Therefore, we should avoid having two
separate DMA zones when possible.

So let's do an early scan of the IORT, and only create the ZONE_DMA
if we encounter any devices that need it. This puts the burden on
the firmware to describe such limitations in the IORT, which may be
redundant (and less precise) if _DMA methods are also being provided.
However, it should be noted that this situation is highly unusual for
arm64 ACPI machines. Also, the DMA subsystem still gives precedence to
the _DMA method if implemented, and so we will not lose the ability to
perform streaming DMA outside the ZONE_DMA if the _DMA method permits
it.


Sorry, I'm still a little bit confused. With this patch, if we have
a device which set the right _DMA method (DMA size >= 32), but with the
wrong DMA size in IORT, we still have the ZONE_DMA created which
is actually not needed?


With the current kernel, we get a ZONE_DMA already with an arbitrary
size of 1GB that matches what RPi4 needs. We are trying to eliminate
such unnecessary ZONE_DMA based on some heuristics (well, something that
looks "better" than a OEM ID based quirk). Now, if we learn that IORT
for platforms in the field is that broken as to describe few bits-wide
DMA masks, we may have to go back to the OEM ID quirk.


Some platforms using 0 as the memory size limit, for example D05 [0] and
D06 [1], I think we need to go back to the OEM ID quirk.

For D05/D06, there are multi interrupt controllers named as mbigen,
mbigen is using the named component to describe the mappings with
the ITS controller, and mbigen is using 0 as the memory size limit.

Also since the memory size limit for PCI RC was introduced by later
IORT revision, so firmware people may think it's fine to set that
as 0 because the system works without it.

Thanks
Hanjun

[0]:
https://github.com/tianocore/edk2-platforms/blob/master/Silicon/Hisilicon/Hi1616/D05AcpiTables/D05Iort.asl
[1]:
https://github.com/tianocore/edk2-platforms/blob/master/Silicon/Hisilicon/Hi1620/Hi1620AcpiTables/Hi1620Iort.asl


Re: [PATCH v3 7/8] arm64: mm: Set ZONE_DMA size based on early IORT scan

2020-10-15 Thread Hanjun Guo

On 2020/10/15 3:12, Nicolas Saenz Julienne wrote:

From: Ard Biesheuvel 

We recently introduced a 1 GB sized ZONE_DMA to cater for platforms
incorporating masters that can address less than 32 bits of DMA, in
particular the Raspberry Pi 4, which has 4 or 8 GB of DRAM, but has
peripherals that can only address up to 1 GB (and its PCIe host
bridge can only access the bottom 3 GB)

Instructing the DMA layer about these limitations is straight-forward,
even though we had to fix some issues regarding memory limits set in
the IORT for named components, and regarding the handling of ACPI _DMA
methods. However, the DMA layer also needs to be able to allocate
memory that is guaranteed to meet those DMA constraints, for bounce
buffering as well as allocating the backing for consistent mappings.

This is why the 1 GB ZONE_DMA was introduced recently. Unfortunately,
it turns out the having a 1 GB ZONE_DMA as well as a ZONE_DMA32 causes
problems with kdump, and potentially in other places where allocations
cannot cross zone boundaries. Therefore, we should avoid having two
separate DMA zones when possible.

So let's do an early scan of the IORT, and only create the ZONE_DMA
if we encounter any devices that need it. This puts the burden on
the firmware to describe such limitations in the IORT, which may be
redundant (and less precise) if _DMA methods are also being provided.
However, it should be noted that this situation is highly unusual for
arm64 ACPI machines. Also, the DMA subsystem still gives precedence to
the _DMA method if implemented, and so we will not lose the ability to
perform streaming DMA outside the ZONE_DMA if the _DMA method permits
it.


Sorry, I'm still a little bit confused. With this patch, if we have
a device which set the right _DMA method (DMA size >= 32), but with the
wrong DMA size in IORT, we still have the ZONE_DMA created which
is actually not needed?



Cc: Jeremy Linton 
Cc: Lorenzo Pieralisi 
Cc: Nicolas Saenz Julienne 
Cc: Rob Herring 
Cc: Christoph Hellwig 
Cc: Robin Murphy 
Cc: Hanjun Guo 
Cc: Sudeep Holla 
Cc: Anshuman Khandual 
Signed-off-by: Ard Biesheuvel 
[nsaenz: Rebased, removed documentation change, warnings and add
declaration in acpi_iort.h]
Signed-off-by: Nicolas Saenz Julienne 
---
  arch/arm64/mm/init.c  |  6 +
  drivers/acpi/arm64/iort.c | 51 +++
  include/linux/acpi_iort.h |  4 +++
  3 files changed, 61 insertions(+)

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 97b0d2768349..f321761eedb2 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -29,6 +29,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include 

  #include 
@@ -196,6 +197,11 @@ static void __init zone_sizes_init(unsigned long min, 
unsigned long max)
  #ifdef CONFIG_ZONE_DMA
zone_dma_bits = min(zone_dma_bits,
(unsigned 
int)ilog2(of_dma_get_max_cpu_address(NULL)));
+
+   if (IS_ENABLED(CONFIG_ACPI))
+   zone_dma_bits = min(zone_dma_bits,
+   acpi_iort_get_zone_dma_size());
+
arm64_dma_phys_limit = max_zone_phys(zone_dma_bits);
max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
  #endif
diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index 9929ff50c0c0..8f530bf3c03b 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -1718,3 +1718,54 @@ void __init acpi_iort_init(void)
  
  	iort_init_platform_devices();

  }
+
+#ifdef CONFIG_ZONE_DMA
+/*
+ * Check the IORT whether any devices exist whose DMA mask is < 32 bits.
+ * If so, return the smallest value encountered, or 32 otherwise.
+ */
+unsigned int __init acpi_iort_get_zone_dma_size(void)
+{
+   struct acpi_table_iort *iort;
+   struct acpi_iort_node *node, *end;
+   acpi_status status;
+   u8 limit = 32;
+   int i;
+
+   if (acpi_disabled)
+   return limit;
+
+   status = acpi_get_table(ACPI_SIG_IORT, 0,
+   (struct acpi_table_header **));
+   if (ACPI_FAILURE(status))
+   return limit;
+
+   node = ACPI_ADD_PTR(struct acpi_iort_node, iort, iort->node_offset);
+   end = ACPI_ADD_PTR(struct acpi_iort_node, iort, iort->header.length);
+
+   for (i = 0; i < iort->node_count; i++) {
+   if (node >= end)
+   break;
+
+   switch (node->type) {
+   struct acpi_iort_named_component *ncomp;
+   struct acpi_iort_root_complex *rc;
+
+   case ACPI_IORT_NODE_NAMED_COMPONENT:
+   ncomp = (struct acpi_iort_named_component 
*)node->node_data;
+   if (ncomp->memory_address_limit)
+   limit = min(limit, ncomp->memory_address_limit);
+   break;
+
+   case ACPI_IORT_NODE_PCI_ROOT_COMPLEX:
+   rc = (str

Re: [PATCH] ACPI / NUMA: Add stub function for pxm_to_node

2020-09-30 Thread Hanjun Guo

On 2020/9/29 3:45, Nathan Chancellor wrote:

After commit 01feba590cd6 ("ACPI: Do not create new NUMA domains from
ACPI static tables that are not SRAT"):

$ scripts/config --file arch/x86/configs/x86_64_defconfig -d NUMA -e ACPI_NFIT

$ make -skj"$(nproc)" distclean defconfig drivers/acpi/nfit/
drivers/acpi/nfit/core.c: In function ‘acpi_nfit_register_region’:
drivers/acpi/nfit/core.c:3010:27: error: implicit declaration of
function ‘pxm_to_node’; did you mean ‘xa_to_node’?
[-Werror=implicit-function-declaration]
  3010 |   ndr_desc->target_node = pxm_to_node(spa->proximity_domain);
   |   ^~~
   |   xa_to_node
cc1: some warnings being treated as errors
...

Add a stub function like acpi_map_pxm_to_node had so that the build
continues to work.

Fixes: 01feba590cd6 ("ACPI: Do not create new NUMA domains from ACPI static tables 
that are not SRAT")
Signed-off-by: Nathan Chancellor 
---

I am not sure if this is the right place or value for this. It looks
like there is going to be another stub function added here, which is
going through -mm:

https://lkml.kernel.org/r/159643094925.4062302.14979872973043772305.st...@dwillia2-desk3.amr.corp.intel.com

  include/acpi/acpi_numa.h | 5 +
  1 file changed, 5 insertions(+)

diff --git a/include/acpi/acpi_numa.h b/include/acpi/acpi_numa.h
index fdebcfc6c8df..09eb3bc20ff5 100644
--- a/include/acpi/acpi_numa.h
+++ b/include/acpi/acpi_numa.h
@@ -22,5 +22,10 @@ extern int acpi_numa __initdata;
  extern void bad_srat(void);
  extern int srat_disabled(void);
  
+#else/* CONFIG_ACPI_NUMA */

+static inline int pxm_to_node(int pxm)
+{
+   return 0;
+}
  #endif/* CONFIG_ACPI_NUMA */
  #endif/* __ACP_NUMA_H */


Looks good to me,

Reviewed-by: Hanjun Guo 


Re: [PATCH v2 0/2] ACPI/IORT: Code cleanups

2020-09-02 Thread Hanjun Guo

On 2020/9/2 17:48, Will Deacon wrote:

On Wed, Sep 02, 2020 at 05:17:43PM +0800, Hanjun Guo wrote:

+Cc Will

On 2020/8/18 17:16, Hanjun Guo wrote:

On 2020/8/18 14:36, Zenghui Yu wrote:

* From v1 [1]:
    - As pointed out by Hanjun, remove two now unused inline functions.
  Compile tested with CONFIG_IOMMU_API is not selected.

[1] https://lore.kernel.org/r/20200817105946.1511-1-yuzeng...@huawei.com

Zenghui Yu (2):
    ACPI/IORT: Drop the unused @ops of iort_add_device_replay()
    ACPI/IORT: Remove the unused inline functions

   drivers/acpi/arm64/iort.c | 10 ++
   1 file changed, 2 insertions(+), 8 deletions(-)


Nice cleanup.

Acked-by: Hanjun Guo 


Will, would you mind taking this patch set via ARM64 tree?


Sure, no problem. I'll queue this for 5.10 later this week.


Thanks!

Hanjun



Re: [PATCH v2 0/2] ACPI/IORT: Code cleanups

2020-09-02 Thread Hanjun Guo

+Cc Will

On 2020/8/18 17:16, Hanjun Guo wrote:

On 2020/8/18 14:36, Zenghui Yu wrote:

* From v1 [1]:
   - As pointed out by Hanjun, remove two now unused inline functions.
 Compile tested with CONFIG_IOMMU_API is not selected.

[1] https://lore.kernel.org/r/20200817105946.1511-1-yuzeng...@huawei.com

Zenghui Yu (2):
   ACPI/IORT: Drop the unused @ops of iort_add_device_replay()
   ACPI/IORT: Remove the unused inline functions

  drivers/acpi/arm64/iort.c | 10 ++
  1 file changed, 2 insertions(+), 8 deletions(-)


Nice cleanup.

Acked-by: Hanjun Guo 


Will, would you mind taking this patch set via ARM64 tree?

Thanks
Hanjun



Re: [PATCH v2 1/2] drm/amdkfd: Move the ignore_crat check before the CRAT table get

2020-08-27 Thread Hanjun Guo

On 2020/8/27 12:28, Felix Kuehling wrote:


Am 2020-08-26 um 4:29 a.m. schrieb Hanjun Guo:

If the ignore_crat is set to non-zero value, it's no point getting
the CRAT table, so just move the ignore_crat check before we get the
CRAT table.

Signed-off-by: Hanjun Guo 
---
  drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 10 +-
  1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 6a250f8..ed77b109 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -764,6 +764,11 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t 
*size)
  
  	*crat_image = NULL;
  
+	if (ignore_crat) {


A conflicting change in this area was just submitted on Monday to
amd-staging-drm-next. You'll need to rebase your patch. It should be
straight-forward. ignore_crat was replaced with a function call
kfd_ignore_crat().

Other than that, your patch series looks good to me. If I don't see an
update from you in a day or two, I'll fix it up myself and add my R-b.


Please fix it up by yourself, thanks a lot for the help!

Regards
Hanjun



[PATCH v2 2/2] drm/amdkfd: Put ACPI table after using it

2020-08-26 Thread Hanjun Guo
The acpi_get_table() should be coupled with acpi_put_table() if
the mapped table is not used at runtime to release the table
mapping which can prevent the memory leak.

In kfd_create_crat_image_acpi(), crat_table is copied to pcrat_image,
and in kfd_create_vcrat_image_cpu(), the acpi_table is only used to
get the OEM information, so those two table mappings need to be released
after using it.

Fixes: 174de876d6d0 ("drm/amdkfd: Group up CRAT related functions")
Fixes: 520b8fb755cc ("drm/amdkfd: Add topology support for CPUs")
Signed-off-by: Hanjun Guo 
---
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index ed77b109..c3f3ffb6 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -758,6 +758,7 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t 
*size)
struct acpi_table_header *crat_table;
acpi_status status;
void *pcrat_image;
+   int rc = 0;
 
if (!crat_image)
return -EINVAL;
@@ -782,13 +783,16 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t 
*size)
}
 
pcrat_image = kmemdup(crat_table, crat_table->length, GFP_KERNEL);
-   if (!pcrat_image)
-   return -ENOMEM;
+   if (!pcrat_image) {
+   rc = -ENOMEM;
+   goto out;
+   }
 
*crat_image = pcrat_image;
*size = crat_table->length;
-
-   return 0;
+out:
+   acpi_put_table(crat_table);
+   return rc;
 }
 
 /* Memory required to create Virtual CRAT.
@@ -972,6 +976,7 @@ static int kfd_create_vcrat_image_cpu(void *pcrat_image, 
size_t *size)
CRAT_OEMID_LENGTH);
memcpy(crat_table->oem_table_id, acpi_table->oem_table_id,
CRAT_OEMTABLEID_LENGTH);
+   acpi_put_table(acpi_table);
}
crat_table->total_entries = 0;
crat_table->num_domains = 0;
-- 
1.7.12.4



[PATCH v2 1/2] drm/amdkfd: Move the ignore_crat check before the CRAT table get

2020-08-26 Thread Hanjun Guo
If the ignore_crat is set to non-zero value, it's no point getting
the CRAT table, so just move the ignore_crat check before we get the
CRAT table.

Signed-off-by: Hanjun Guo 
---
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 6a250f8..ed77b109 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -764,6 +764,11 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t 
*size)
 
*crat_image = NULL;
 
+   if (ignore_crat) {
+   pr_info("CRAT table disabled by module option\n");
+   return -ENODATA;
+   }
+
/* Fetch the CRAT table from ACPI */
status = acpi_get_table(CRAT_SIGNATURE, 0, _table);
if (status == AE_NOT_FOUND) {
@@ -776,11 +781,6 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t 
*size)
return -EINVAL;
}
 
-   if (ignore_crat) {
-   pr_info("CRAT table disabled by module option\n");
-   return -ENODATA;
-   }
-
pcrat_image = kmemdup(crat_table, crat_table->length, GFP_KERNEL);
if (!pcrat_image)
return -ENOMEM;
-- 
1.7.12.4



Re: [PATCH v2 0/2] ACPI/IORT: Code cleanups

2020-08-18 Thread Hanjun Guo

On 2020/8/18 14:36, Zenghui Yu wrote:

* From v1 [1]:
   - As pointed out by Hanjun, remove two now unused inline functions.
 Compile tested with CONFIG_IOMMU_API is not selected.

[1] https://lore.kernel.org/r/20200817105946.1511-1-yuzeng...@huawei.com

Zenghui Yu (2):
   ACPI/IORT: Drop the unused @ops of iort_add_device_replay()
   ACPI/IORT: Remove the unused inline functions

  drivers/acpi/arm64/iort.c | 10 ++
  1 file changed, 2 insertions(+), 8 deletions(-)


Nice cleanup.

Acked-by: Hanjun Guo 



Re: [PATCH] ACPI/IORT: Drop the unused @ops of iort_add_device_replay()

2020-08-17 Thread Hanjun Guo

On 2020/8/17 18:59, Zenghui Yu wrote:

Since commit d2e1a003af56 ("ACPI/IORT: Don't call iommu_ops->add_device
directly"), we use the IOMMU core API to replace a direct invoke of the
specified callback. The parameter @ops has therefore became unused. Let's
drop it.

Signed-off-by: Zenghui Yu 
---
  drivers/acpi/arm64/iort.c | 8 +++-
  1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index ec782e4a0fe4..a0ece0e201b2 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -811,8 +811,7 @@ static inline const struct iommu_ops 
*iort_fwspec_iommu_ops(struct device *dev)
return (fwspec && fwspec->ops) ? fwspec->ops : NULL;
  }
  
-static inline int iort_add_device_replay(const struct iommu_ops *ops,

-struct device *dev)
+static inline int iort_add_device_replay(struct device *dev)
  {
int err = 0;
  
@@ -1072,7 +1071,7 @@ const struct iommu_ops *iort_iommu_configure_id(struct device *dev,

 */
if (!err) {
ops = iort_fwspec_iommu_ops(dev);
-   err = iort_add_device_replay(ops, dev);
+   err = iort_add_device_replay(dev);
}
  
  	/* Ignore all other errors apart from EPROBE_DEFER */

@@ -1089,8 +1088,7 @@ const struct iommu_ops *iort_iommu_configure_id(struct 
device *dev,
  #else
  static inline const struct iommu_ops *iort_fwspec_iommu_ops(struct device 
*dev)
  { return NULL; }
-static inline int iort_add_device_replay(const struct iommu_ops *ops,
-struct device *dev)
+static inline int iort_add_device_replay(struct device *dev)


inline functions iort_fwspec_iommu_ops() and iort_add_device_replay()
are not needed anymore after commit 8212688600ed ("ACPI/IORT: Fix build
error when IOMMU_SUPPORT is disabled"), could you please add another
patch to remove them as well?

Thanks
Hanjun



Re: [PATCH] drm/amdkfd: Put ACPI table after using it

2020-07-31 Thread Hanjun Guo

On 2020/7/31 10:41, Felix Kuehling wrote:

Hi Hanjun,

Sorry for the late reply.

Thank you for the patch and the explanation. This seems to have been
broken since the first version of KFD in 2014. See one suggestion inline.

Am 2020-07-22 um 5:48 a.m. schrieb Hanjun Guo:

The acpi_get_table() should be coupled with acpi_put_table() if
the mapped table is not used at runtime to release the table
mapping.

In kfd_create_crat_image_acpi(), crat_table is copied to pcrat_image,
and in kfd_create_vcrat_image_cpu(), the acpi_table is only used to
get the OEM info, so those table mappings need to be release after
using it.

Signed-off-by: Hanjun Guo 
---
  drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 16 +++-
  1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 1009a3b..d378e61 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -756,6 +756,7 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t 
*size)
struct acpi_table_header *crat_table;
acpi_status status;
void *pcrat_image;
+   int rc = 0;
  
  	if (!crat_image)

return -EINVAL;
@@ -776,17 +777,21 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t 
*size)
  
  	if (ignore_crat) {

pr_info("CRAT table disabled by module option\n");


We should probably move this check to before we get the CRAT table.
There is not point getting and putting it if we're going to ignore it
anyway.

Do you want to send an updated patch with that change? Or maybe do it as
a 2-patch series?


I will do it as 2-patch series and send a updated patch set.

Thanks
Hanjun



[PATCH] dmaengine: acpi: Put the CSRT table after using it

2020-07-22 Thread Hanjun Guo
The acpi_get_table() should be coupled with acpi_put_table() if
the mapped table is not used at runtime to release the table
mapping, put the CSRT table buf after using it.

Signed-off-by: Hanjun Guo 
---
 drivers/dma/acpi-dma.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/dma/acpi-dma.c b/drivers/dma/acpi-dma.c
index 8a05db3..dcbcb71 100644
--- a/drivers/dma/acpi-dma.c
+++ b/drivers/dma/acpi-dma.c
@@ -135,11 +135,13 @@ static void acpi_dma_parse_csrt(struct acpi_device *adev, 
struct acpi_dma *adma)
if (ret < 0) {
dev_warn(>dev,
 "error in parsing resource group\n");
-   return;
+   break;
}
 
grp = (struct acpi_csrt_group *)((void *)grp + grp->length);
}
+
+   acpi_put_table((struct acpi_table_header *)csrt);
 }
 
 /**
-- 
1.7.12.4



[PATCH] drm/amdkfd: Put ACPI table after using it

2020-07-22 Thread Hanjun Guo
The acpi_get_table() should be coupled with acpi_put_table() if
the mapped table is not used at runtime to release the table
mapping.

In kfd_create_crat_image_acpi(), crat_table is copied to pcrat_image,
and in kfd_create_vcrat_image_cpu(), the acpi_table is only used to
get the OEM info, so those table mappings need to be release after
using it.

Signed-off-by: Hanjun Guo 
---
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 1009a3b..d378e61 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -756,6 +756,7 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t 
*size)
struct acpi_table_header *crat_table;
acpi_status status;
void *pcrat_image;
+   int rc = 0;
 
if (!crat_image)
return -EINVAL;
@@ -776,17 +777,21 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t 
*size)
 
if (ignore_crat) {
pr_info("CRAT table disabled by module option\n");
-   return -ENODATA;
+   rc = -ENODATA;
+   goto out;
}
 
pcrat_image = kmemdup(crat_table, crat_table->length, GFP_KERNEL);
-   if (!pcrat_image)
-   return -ENOMEM;
+   if (!pcrat_image) {
+   rc = -ENOMEM;
+   goto out;
+   }
 
*crat_image = pcrat_image;
*size = crat_table->length;
-
-   return 0;
+out:
+   acpi_put_table(crat_table);
+   return rc;
 }
 
 /* Memory required to create Virtual CRAT.
@@ -970,6 +975,7 @@ static int kfd_create_vcrat_image_cpu(void *pcrat_image, 
size_t *size)
CRAT_OEMID_LENGTH);
memcpy(crat_table->oem_table_id, acpi_table->oem_table_id,
CRAT_OEMTABLEID_LENGTH);
+   acpi_put_table(acpi_table);
}
crat_table->total_entries = 0;
crat_table->num_domains = 0;
-- 
1.7.12.4



[PATCH] mailbox: pcc: Put the PCCT table for error path

2020-07-22 Thread Hanjun Guo
The acpi_get_table() should be coupled with acpi_put_table() if
the mapped table is not used at runtime to release the table
mapping.

In acpi_pcc_probe(), the PCCT table entries will be used as private
data for communication chan at runtime, but the table should be put
for error path.

Signed-off-by: Hanjun Guo 
---
 drivers/mailbox/pcc.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/mailbox/pcc.c b/drivers/mailbox/pcc.c
index 8c7fac3..ef9ecd1 100644
--- a/drivers/mailbox/pcc.c
+++ b/drivers/mailbox/pcc.c
@@ -457,14 +457,17 @@ static int __init acpi_pcc_probe(void)
pr_warn("Error parsing PCC subspaces from PCCT\n");
else
pr_warn("Invalid PCCT: %d PCC subspaces\n", count);
-   return -EINVAL;
+
+   rc = -EINVAL;
+   goto err_put_pcct;
}
 
pcc_mbox_channels = kcalloc(count, sizeof(struct mbox_chan),
GFP_KERNEL);
if (!pcc_mbox_channels) {
pr_err("Could not allocate space for PCC mbox channels\n");
-   return -ENOMEM;
+   rc = -ENOMEM;
+   goto err_put_pcct;
}
 
pcc_doorbell_vaddr = kcalloc(count, sizeof(void *), GFP_KERNEL);
@@ -535,6 +538,8 @@ static int __init acpi_pcc_probe(void)
kfree(pcc_doorbell_vaddr);
 err_free_mbox:
kfree(pcc_mbox_channels);
+err_put_pcct:
+   acpi_put_table(pcct_tbl);
return rc;
 }
 
-- 
1.7.12.4



Re: [Question]: about 'cpuinfo_cur_freq' shown in sysfs when the CPU is in idle state

2020-06-02 Thread Hanjun Guo

On 2020/6/2 11:34, Xiongfeng Wang wrote:

Hi Viresh,

Sorry to disturb you about another problem as follows.

CPPC use the increment of Desired Performance counter and Reference Performance
counter to get the CPU frequency and show it in sysfs through
'cpuinfo_cur_freq'. But ACPI CPPC doesn't specifically define the behavior of
these two counters when the CPU is in idle state, such as stop incrementing when
the CPU is in idle state.

ARMv8.4 Extension inctroduced support for the Activity Monitors Unit (AMU). The
processor frequency cycles and constant frequency cycles in AMU can be used as
Delivered Performance counter and Reference Performance counter. These two
counter in AMU does not increase when the PE is in WFI or WFE. So the increment
is zero when the PE is in WFI/WFE. This cause no issue because
'cppc_get_rate_from_fbctrs()' in cppc_cpufreq driver will check the increment
and return the desired performance if the increment is zero.

But when the CPU goes into power down idle state, accessing these two counters
in AMU by memory-mapped address will return zero. Such as CPU1 went into power
down idle state and CPU0 try to get the frequency of CPU1. In this situation,
will display a very big value for 'cpuinfo_cur_freq' in sysfs. Do you have some
advice about this problem ?


Just a wild guess, how about just return 0 for idle CPUs? which means
the frequency is 0 for idle CPUs.



I was thinking about an idea as follows. We can run 'cppc_cpufreq_get_rate()' on
the CPU to be measured, so that we can make sure the CPU is in C0 state when we
access the two counters. Also we can return the actual frequency rather than
desired performance when the CPU is in WFI/WFE. But this modification will
change the existing logical and I am not sure if this will cause some bad 
effect.


diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index 257d726..ded3bcc 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -396,9 +396,10 @@ static int cppc_get_rate_from_fbctrs(struct cppc_cpudata 
*cpu,
 return cppc_cpufreq_perf_to_khz(cpu, delivered_perf);
  }

-static unsigned int cppc_cpufreq_get_rate(unsigned int cpunum)
+static int cppc_cpufreq_get_rate_cpu(void *info)
  {
 struct cppc_perf_fb_ctrs fb_ctrs_t0 = {0}, fb_ctrs_t1 = {0};
+ unsigned int cpunum = *(unsigned int *)info;
 struct cppc_cpudata *cpu = all_cpu_data[cpunum];
 int ret;

@@ -418,6 +419,22 @@ static unsigned int cppc_cpufreq_get_rate(unsigned int 
cpunum)
 return cppc_get_rate_from_fbctrs(cpu, fb_ctrs_t0, fb_ctrs_t1);
  }

+static unsigned int cppc_cpufreq_get_rate(unsigned int cpunum)
+{
+ unsigned int ret;
+
+ ret = smp_call_on_cpu(cpunum, cppc_cpufreq_get_rate_cpu, , true);
+
+ /*
+  * convert negative error code to zero, otherwise we will display
+  * an odd value for 'cpuinfo_cur_freq' in sysfs
+  */
+ if (ret < 0)
+ ret = 0;
+
+ return ret;
+}
+
  static int cppc_cpufreq_set_boost(struct cpufreq_policy *policy, int state)
  {
 struct cppc_cpudata *cpudata;



It will bring the CPU back if the CPU is in idle state, not friendly to
powersaving :)

Thanks
Hanjun



Re: arm64/acpi: NULL dereference reports from UBSAN at boot

2020-05-22 Thread Hanjun Guo

On 2020/5/22 16:07, Hanjun Guo wrote:

Hi Will,

On 2020/5/21 18:09, Will Deacon wrote:

Hi folks,

I just tried booting the arm64 for-kernelci branch under QEMU (version
4.2.50 (v4.2.0-779-g4354edb6dcc7)) with UBSAN enabled, and I see a couple
of NULL pointer dereferences reported at boot. I think they're both GIC
related (log below). I don't see a panic with UBSAN disabled, so 
something's

fishy here.

Please can you take a look when you get a chance? I haven't had time 
to see
if this is a regression or not, but I don't think it's particularly 
serious

as I have all sorts of horrible stuff enabled in my .config, since I'm
trying to chase down another bug:

https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/plain/arch/arm64/configs/fuzzing.config?h=fuzzing/arm64-kernelci-20200519=c149cf6a51aa4f72d53fc681c6661094e93ef660 



(on top of defconfig)

CONFIG_FAIL_PAGE_ALLOC may be to blame.


I enabled UBSAN and CONFIG_FAIL_PAGE_ALLOC on top of defconfig,
testing against the for-kernelci branch on the D06 board, I
can see some UBSAN warnings from megaraid_sas driver [0], but not
from any other subsystem including ACPI, I will try all your
configs above to see if I can get more warnings.


Enabled all the configs except "CONFIG_MODULES=n" and
"CONFIG_SHADOW_CALL_STACK=y" (can't do that via make menuconfig,
do it manually?) in the link, but still got the same warnings and
no other warnings as before, the system is in good function, please
let me know I can do anything more to help the debug.

Thanks
Hanjun




Re: arm64/acpi: NULL dereference reports from UBSAN at boot

2020-05-22 Thread Hanjun Guo

Hi Will,

On 2020/5/21 18:09, Will Deacon wrote:

Hi folks,

I just tried booting the arm64 for-kernelci branch under QEMU (version
4.2.50 (v4.2.0-779-g4354edb6dcc7)) with UBSAN enabled, and I see a couple
of NULL pointer dereferences reported at boot. I think they're both GIC
related (log below). I don't see a panic with UBSAN disabled, so something's
fishy here.

Please can you take a look when you get a chance? I haven't had time to see
if this is a regression or not, but I don't think it's particularly serious
as I have all sorts of horrible stuff enabled in my .config, since I'm
trying to chase down another bug:

https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/plain/arch/arm64/configs/fuzzing.config?h=fuzzing/arm64-kernelci-20200519=c149cf6a51aa4f72d53fc681c6661094e93ef660

(on top of defconfig)

CONFIG_FAIL_PAGE_ALLOC may be to blame.


I enabled UBSAN and CONFIG_FAIL_PAGE_ALLOC on top of defconfig,
testing against the for-kernelci branch on the D06 board, I
can see some UBSAN warnings from megaraid_sas driver [0], but not
from any other subsystem including ACPI, I will try all your
configs above to see if I can get more warnings.

Thanks
Hanjun

[0]:
[   18.244272] 

[   18.252673] UBSAN: array-index-out-of-bounds in 
drivers/scsi/megaraid/megaraid_sas_fp.c:104:32

[   18.261244] index 1 is out of range for type 'MR_LD_SPAN_MAP [1]'
[   18.267313] CPU: 0 PID: 656 Comm: kworker/0:1 Not tainted 
5.7.0-rc6-1-14703-gf4582661223d-dirty #20
[   18.276314] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 
2280-V2 CS V3.B210.01 03/12/2020

[   18.285151] Workqueue: events work_for_cpu_fn
[   18.289488] Call trace:
[   18.291925]  dump_backtrace+0x0/0x248
[   18.295572]  show_stack+0x18/0x28
[   18.298873]  dump_stack+0xc0/0x10c
[   18.302261]  ubsan_epilogue+0x10/0x58
[   18.305905]  __ubsan_handle_out_of_bounds+0x8c/0xa8
[   18.310763]  mr_update_load_balance_params+0x118/0x120
[   18.315877]  MR_ValidateMapInfo+0x300/0xb00
[   18.320040]  megasas_get_map_info+0x134/0x1f8
[   18.324377]  megasas_init_adapter_fusion+0xba8/0x10a0
[   18.329403]  megasas_probe_one+0x6e0/0x1b70
[   18.333569]  local_pci_probe+0x40/0xb0
[   18.337299]  work_for_cpu_fn+0x1c/0x30
[   18.341031]  process_one_work+0x1f8/0x378
[   18.345022]  worker_thread+0x21c/0x4c0
[   18.348753]  kthread+0x150/0x158
[   18.351967]  ret_from_fork+0x10/0x18
[   18.355529] 



[   18.592274] 

[   18.600672] UBSAN: array-index-out-of-bounds in 
drivers/scsi/megaraid/megaraid_sas_fp.c:141:9

[   18.609155] index 1 is out of range for type 'MR_LD_SPAN_MAP [1]'
[   18.615221] CPU: 0 PID: 656 Comm: kworker/0:1 Not tainted 
5.7.0-rc6-1-14703-gf4582661223d-dirty #20
[   18.624222] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 
2280-V2 CS V3.B210.01 03/12/2020

[   18.633050] Workqueue: events work_for_cpu_fn
[   18.637387] Call trace:
[   18.639822]  dump_backtrace+0x0/0x248
[   18.643467]  show_stack+0x18/0x28
[   18.646767]  dump_stack+0xc0/0x10c
[   18.650152]  ubsan_epilogue+0x10/0x58
[   18.653796]  __ubsan_handle_out_of_bounds+0x8c/0xa8
[   18.658652]  MR_GetLDTgtId+0x58/0x60
[   18.662211]  megasas_sync_map_info+0xd0/0x1c0
[   18.666547]  megasas_init_adapter_fusion+0xd60/0x10a0
[   18.671574]  megasas_probe_one+0x6e0/0x1b70
[   18.675736]  local_pci_probe+0x40/0xb0
[   18.679466]  work_for_cpu_fn+0x1c/0x30
[   18.683197]  process_one_work+0x1f8/0x378
[   18.687188]  worker_thread+0x21c/0x4c0
[   18.690920]  kthread+0x150/0x158
[   18.694123]  ret_from_fork+0x10/0x18
[   18.697683] 





Re: [PATCH v2] ACPI/IORT: Fix PMCG node always look for a single ID mapping.

2020-05-12 Thread Hanjun Guo

On 2020/5/13 7:56, Tuan Phan wrote:

PMCG node can have zero ID mapping if its overflow interrupt
is wire based. The code to parse PMCG node can not assume it will
have a single ID mapping.

Signed-off-by: Tuan Phan 


It's better to add

Fixes: 24e516049360 ("ACPI/IORT: Add support for PMCG")


---
Changes in v2:
- Used pmcg node to detect wired base overflow interrupt.
  
  drivers/acpi/arm64/iort.c | 5 +

  1 file changed, 5 insertions(+)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index ed3d2d1..11a4e8e 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -414,6 +414,7 @@ static struct acpi_iort_node *iort_node_get_id(struct 
acpi_iort_node *node,
  static int iort_get_id_mapping_index(struct acpi_iort_node *node)
  {
struct acpi_iort_smmu_v3 *smmu;
+   struct acpi_iort_pmcg *pmcg;
  
  	switch (node->type) {

case ACPI_IORT_NODE_SMMU_V3:
@@ -441,6 +442,10 @@ static int iort_get_id_mapping_index(struct acpi_iort_node 
*node)
  
  		return smmu->id_mapping_index;

case ACPI_IORT_NODE_PMCG:
+   pmcg = (struct acpi_iort_pmcg *)node->node_data;
+   if (pmcg->overflow_gsiv)
+   return -EINVAL;
+
return 0;
default:
return -EINVAL;


With my comments addressed,

Reviewed-by: Hanjun Guo 



Re: [PATCH] ACPI/IORT: Remove the unused __get_pci_rid()

2020-05-09 Thread Hanjun Guo

On 2020/5/9 17:34, Zenghui Yu wrote:

Since commit bc8648d49a95 ("ACPI/IORT: Handle PCI aliases properly for
IOMMUs"), __get_pci_rid() has become actually unused and can be removed.

Signed-off-by: Zenghui Yu 


Looks good to me,

Acked-by: Hanjun Guo 



Re: [RFC PATCH] cpufreq: add support for HiSilicon SoC HIP09

2020-05-07 Thread Hanjun Guo

Hi Thanu, Souvik, Sudeep,

Thanks a lot for the prompt reply, and it clarify a lot for
us, comment inline below.

On 2020/5/6 22:57, Souvik Chakravarty wrote:

Hi,


From: Thanu Rangarajan 
Sent: Wednesday, May 6, 2020 1:58 PM

Hi,
ACPI CPPC already supports the notion of boost. Not sure we need any
enhancements there.

Regards,
Thanu

On 06/05/2020, 18:19, "Sudeep Holla"  wrote:

 + Thanu, Souvik who work with ASWG

 On Wed, May 06, 2020 at 05:36:51PM +0800, Hanjun Guo wrote:
 > Hi Sudeep,
 >
 > On 2020/4/30 17:55, Sudeep Holla wrote:
 > > On Thu, Apr 30, 2020 at 02:19:59PM +0800, Xiongfeng Wang wrote:
 > > > HiSilicon SoC has a separate System Control Processor(SCP)
dedicated for
 > > > clock frequency adjustment and has been using the cpufreq driver
 > > > 'cppc-cpufreq'. New HiSilicon SoC HIP09 add support for CPU Boost,
but
 > > > ACPI CPPC doesn't support this. In HiSilicon SoC HIP09, each core has
 > > > its own clock domain. It is better for the core itself to adjust its
 > > > frequency when we require fast response. In this patch, we add a
 > > > separate cpufreq driver for HiSilicon SoC HIP09.
 > > >
 > >
 > > I disagree with this approach unless you have tried to extend the CPPC
 > > in ACPI to accommodate this boost feature you need. Until you show
those
 > > efforts and disagreement to do that from ASWG, I am NACKing this
approach.
 >
 > Unfortunately we are not in ASWG at now, could you please give some
 > help about extending CPPC in ACPI to support boost feature?
 >

 You may have to provide more details than the commit log for sure as I
 haven't understood the boost feature and what is missing in ACPI CPPC.


I would agree with Thanu here regarding the ACPI spec part - everything is 
already there.


I take another detail read of the spec and as you said,
everything is already there,thanks!. I was misleading by the
CPPC code which it's using the 'Highest Performance' as
the max performance not the 'Nominal Performance', so seems
that 'Highest Performance' is the normal max performance but
not the boost performance.



It seems to me that the .set_boost is today not handled in cppc_cpufreq.c. In 
fact if a platform provides a value for Highest Performance which is different 
than Nominal Performance, then the entire range of performance is always 
requested, which works well for platforms which do boost enable/disable 
selection at hardware/firmware level.


Yes, so for now, the CPPC code will enable the boost feature in
default if the firmware provides a value for Highest Performance
which is different than Nominal Performance.



.boost hook could potentially be useful in cppc_cpufreq.c for platforms which would 
manage the boost selection in software. But it would be good to keep a common 
implementation which can choose between "software-triggered or auto-boost" 
selection for different platforms.


Thanks for the clarify, it helps a lot, Xiongfeng prepared
some patches to enable .boost hook, but needs to set the
'Nominal Performance' as the max performance, and
'Highest Performance' is the max boost performance, will
send out for comments.

Best Regards,
Hanjun



Re: [RFC PATCH] cpufreq: add support for HiSilicon SoC HIP09

2020-05-06 Thread Hanjun Guo

Hi Sudeep,

On 2020/4/30 17:55, Sudeep Holla wrote:

On Thu, Apr 30, 2020 at 02:19:59PM +0800, Xiongfeng Wang wrote:

HiSilicon SoC has a separate System Control Processor(SCP) dedicated for
clock frequency adjustment and has been using the cpufreq driver
'cppc-cpufreq'. New HiSilicon SoC HIP09 add support for CPU Boost, but
ACPI CPPC doesn't support this. In HiSilicon SoC HIP09, each core has
its own clock domain. It is better for the core itself to adjust its
frequency when we require fast response. In this patch, we add a
separate cpufreq driver for HiSilicon SoC HIP09.



I disagree with this approach unless you have tried to extend the CPPC
in ACPI to accommodate this boost feature you need. Until you show those
efforts and disagreement to do that from ASWG, I am NACKing this approach.


Unfortunately we are not in ASWG at now, could you please give some
help about extending CPPC in ACPI to support boost feature?

Thanks
Hanjun



Re: [PATCH 3/3] arm64: configs: unset CPU_BIG_ENDIAN

2019-10-14 Thread Hanjun Guo
On 2019/10/15 0:24, Will Deacon wrote:
> On Sat, Oct 12, 2019 at 03:50:55PM +0100, Russell King - ARM Linux admin 
> wrote:
>> On Sat, Oct 12, 2019 at 12:47:45AM +0200, Arnd Bergmann wrote:
>>> On Fri, Oct 11, 2019 at 12:33 PM Russell King - ARM Linux admin
>>>  wrote:
 32-bit ARM experience is that telco class users really like big
 endian.
>>>
>>> Right, basically anyone with a large code base migrated over from a
>>> big-endian MIPS or PowerPC legacy that found it cheaper to change
>>> the rest of the world than to fix their own code.
>>
>> I think you need to step off your soap box!  Big endian isn't going
>> away, and it likely has nothing to do with code bases.  Just look at
>> networking and telco protocols.  Everything in that world tends to
>> be big endian.  BE is what is understood in that world, and there's
>> little we can do to change it.
>>
>> Demanding that they switch to LE is tantamount to you demanding that
>> their entire world change - it ain't going to happen.
> 
> Oh, I wasn't demanding anything! Just interested to know if big-endian is
> actually being used because it's not something that I'm able to test
> sensibly and I haven't see anywhere near the amount of (public) effort to
> keep it supported as for little-endian. However, having asked the question,
> it's clear that it does have some users so we'll keep maintaining it on a
> best-effort basis and rely on those users to let us know if anything breaks.

Sure, we (Huawei kernel team) did that and we will do that in the future
as well.

Thanks
Hanjun



Re: [PATCH 3/3] arm64: configs: unset CPU_BIG_ENDIAN

2019-10-14 Thread Hanjun Guo
Hi Arnd,

On 2019/10/12 22:05, Arnd Bergmann wrote:
> On Sat, Oct 12, 2019 at 9:33 AM Hanjun Guo  wrote:
>>
>> On 2019/10/11 18:27, Will Deacon wrote:
>> [...]
>>>
>>> Does anybody use BIG_ENDIAN? If we're not even building it then maybe we
>>> should get rid of it altogether on arm64. I don't know of any supported
>>> userspace that supports it or any CPUs that are unable to run little-endian
>>> binaries.
>>
>> FWIW, massive telecommunication products (based on ARM64) form Huawei are 
>> using
>> BIG_ENDIAN, and will use BIG_ENDIAN in the near future as well.
> 
> Ok, thanks for the information -- that definitely makes it clear that
> it cannot go
> away anytime soon (though it doesn't stop us from change the
> allmodconfig default
> if we decide that's a good idea).

I agree with you.

> 
> Do you know if there are currently any patches against mainline to
> make big-endian
> work in products, or is everything working out of the box?

We are not using mainline kernel for product, but LTS 4.4 is working
fine, and also we tested LTS 4.19 kernel without any other big-endian
patches, the latest big-endian bug we found is this:

a6002ec5a8c6 arm64: opcodes.h: Add arm big-endian config options before 
including arm header

The running kernel code covered but Huawei's telecommunication products
is limited, but I think ARM64 arch code is working fine for big-endian.

Thanks
Hanjun



Re: [PATCH 3/3] arm64: configs: unset CPU_BIG_ENDIAN

2019-10-12 Thread Hanjun Guo
On 2019/10/11 18:27, Will Deacon wrote:
[...]
> 
> Does anybody use BIG_ENDIAN? If we're not even building it then maybe we
> should get rid of it altogether on arm64. I don't know of any supported
> userspace that supports it or any CPUs that are unable to run little-endian
> binaries.

FWIW, massive telecommunication products (based on ARM64) form Huawei are using
BIG_ENDIAN, and will use BIG_ENDIAN in the near future as well.

Thanks
Hanjun



Re: [RFC PATCH 0/3] ACPI, arm64: Backport for ACPI PPTT 6.3 thread flag for stable 4.19.x

2019-10-10 Thread Hanjun Guo
Hi John,

On 2019/10/10 21:29, John Garry wrote:
> This series is a backport of the ACPI PPTT 6.3 thread flag feature for
> supporting arm64 systems.
> 
> The background is that some arm64 implementations are broken, in that they
> incorrectly advertise that a CPU is mutli-threaded, when it is not - the
> HiSilicon Taishanv110 rev 2, aka tsv110, being an example.
> 
> This leads to the system topology being incorrect. The reason being that
> arm64 topology code uses a combination of ACPI PPTT (Processor Properties
> Topology Table) and the system MPIDR (Multiprocessor Affinity Register) MT
> bit to determine the topology.
> 
> Until ACPI 6.3, the PPTT did not have any method to determine whether
> a CPU was multi-threaded, so only the MT bit is used - hence the
> broken topology for some systems.
> 
> In ACPI 6.3, a PPTT thread flag was introduced, which - when supported -
> would be used by the kernel to determine really if a CPU is multi-threaded
> or not, so that we don't get incorrect topology.

Thanks for doing this, and this patch set fix my CPU topology issue
in 4.19 kernel with updated firmware.

> 
> Note: I'm sending this as an RFC before sending to stable proper. I also
> have a 5.2 and 5.3 backport which are almost the same, and only
> significant change being that the ACPICA patch is not required.

5.2 stable was end of life, so only 4.19 and 5.3 are needed I think.

Thanks
Hanjun



Re: [PATCH v3 00/10] Rework REFCOUNT_FULL using atomic_fetch_* operations

2019-10-09 Thread Hanjun Guo
On 2019/10/7 23:46, Will Deacon wrote:
> Hi all,
> 
> This is version three of the patches I previously posted here:
> 
>   v1: https://lkml.kernel.org/r/20190802101000.12958-1-w...@kernel.org
>   v2: https://lkml.kernel.org/r/20190827163204.29903-1-w...@kernel.org
> 
> Changes since v2 include:
> 
>   - Remove the x86 assembly version and enable this code unconditionally
>   - Move saturation warnings out-of-line to reduce image bloat
> 
> Cheers,
> 
> Will
> 
> Cc: Kees Cook 
> Cc: Ingo Molnar 
> Cc: Elena Reshetova 
> Cc: Peter Zijlstra 
> Cc: Ard Biesheuvel 
> Cc: Hanjun Guo 
> Cc: Jan Glauber 
> 

I tested on top of 5.4-rc2 (with Jan's open-read-close file test case), on a 96 
CPU
cores ARM64 server, I can see no much difference under 24 cores (each 24 core is
a NUMA node), but +5.9% performance improve on 48 cores and +8.4% for 96 cores.

For the ARM64 arch,

Tested-by: Hanjun Guo 

Thanks
Hanjun



Re: [PATCH v4 1/5] locking/qspinlock: Rename arch_mcs_spin_unlock_contended to arch_mcs_pass_lock and make it more generic

2019-09-17 Thread Hanjun Guo
Hi Alex,

On 2019/9/6 22:25, Alex Kogan wrote:
> The new macro should accept the value to be stored into the lock argument
> as another argument. This allows using the same macro in cases where the
> value to be stored when passing the lock is different from 1.
> 
> Signed-off-by: Alex Kogan 
> Reviewed-by: Steve Sistare 
> ---
>  arch/arm/include/asm/mcs_spinlock.h | 4 ++--
>  kernel/locking/mcs_spinlock.h   | 6 +++---
>  kernel/locking/qspinlock.c  | 2 +-
>  3 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm/include/asm/mcs_spinlock.h 
> b/arch/arm/include/asm/mcs_spinlock.h
> index 529d2cf4d06f..f3f9efdcd2ca 100644
> --- a/arch/arm/include/asm/mcs_spinlock.h
> +++ b/arch/arm/include/asm/mcs_spinlock.h
> @@ -14,9 +14,9 @@ do {
> \
>   wfe();  \
>  } while (0)  \
>  
> -#define arch_mcs_spin_unlock_contended(lock) \
> +#define arch_mcs_pass_lock(lock, val)
> \

arch_mcs_spin_unlock_contended() has a matching function 
arch_mcs_spin_lock_contended(),
please see include/asm-generic/mcs_spinlock.h, so if we update this function 
name,
should we update the matching one as well? and update the relevant comments as 
well?

>  do { \
> - smp_store_release(lock, 1); \
> + smp_store_release((lock), (val));   \
>   dsb_sev();  \
>  } while (0)
>  
> diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
> index 5e10153b4d3c..84327ca21650 100644
> --- a/kernel/locking/mcs_spinlock.h
> +++ b/kernel/locking/mcs_spinlock.h
> @@ -41,8 +41,8 @@ do {
> \
>   * operations in the critical section has been completed before
>   * unlocking.
>   */
> -#define arch_mcs_spin_unlock_contended(l)\

Before this line of the code, there is:

#ifndef arch_mcs_spin_lock_contended

...

#define arch_mcs_spin_lock_contended(l) \

So #ifndef should be updated too.

Thanks
Hanjun



Re: [PATCH v2 0/6] Rework REFCOUNT_FULL using atomic_fetch_* operations

2019-09-06 Thread Hanjun Guo
On 2019/9/6 21:43, Will Deacon wrote:
> On Wed, Aug 28, 2019 at 02:03:37PM -0700, Kees Cook wrote:
>> On Wed, Aug 28, 2019 at 03:14:40PM +0100, Will Deacon wrote:
>>> On Wed, Aug 28, 2019 at 09:30:52AM +0200, Peter Zijlstra wrote:
 On Tue, Aug 27, 2019 at 05:31:58PM +0100, Will Deacon wrote:
> Will Deacon (6):
>   lib/refcount: Define constants for saturation and max refcount values
>   lib/refcount: Ensure integer operands are treated as signed
>   lib/refcount: Remove unused refcount_*_checked() variants
>   lib/refcount: Move bulk of REFCOUNT_FULL implementation into header
>   lib/refcount: Improve performance of generic REFCOUNT_FULL code
>   lib/refcount: Consolidate REFCOUNT_{MAX,SATURATED} definitions
>> BTW, can you repeat the timing details into the "Improve performance of
>> generic REFCOUNT_FULL code" patch?
> Of course.
> 
 So I'm not a fan; I itch at the whole racy nature of this thing and I
 find the code less than obvious. Yet, I have to agree it is exceedingly
 unlikely the race will ever actually happen, I just don't want to be the
 one having to debug it.
>>> FWIW, I think much the same about the version under arch/x86 ;)
>>>
 I've not looked at the implementation much; does it do all the same
 checks the FULL one does? The x86-asm one misses a few iirc, so if this
 is similarly fast but has all the checks, it is in fact better.
>>> Yes, it passes all of the REFCOUNT_* tests in lkdtm [1] so I agree that
>>> it's an improvement over the asm version.
>>>
 Can't we make this a default !FULL implementation?
>>> My concern with doing that is I think it would make the FULL implementation
>>> entirely pointless. I can't see anybody using it, and it would only exist
>>> as an academic exercise in handling the theoretical races. That's a change
>>> from the current situation where it genuinely handles cases which the
>>> x86-specific code does not and, judging by the Kconfig text, that's the
>>> only reason for its existence.
>> Looking at timing details, the new implementation is close enough to the
>> x86 asm version that I would be fine to drop the x86-specific case
>> entirely as long as we could drop "FULL" entirely too -- we'd have _one_
>> refcount_t implementation: it would be both complete and fast.
> That works for me; I'll spin a new version of this series so you can see
> what it looks like.

I will wait for the new version then do the performance test on ARM64 server.

Thanks
Hanjun



Re: [PATCH v12 2/2] mm: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn

2019-07-24 Thread Hanjun Guo
On 2019/7/23 16:33, Mike Rapoport wrote:
> On Tue, Jul 23, 2019 at 01:51:13PM +0800, Hanjun Guo wrote:
>> From: Jia He 
>>
>> After skipping some invalid pfns in memmap_init_zone(), there is still
>> some room for improvement.
>>
>> E.g. if pfn and pfn+1 are in the same memblock region, we can simply pfn++
>> instead of doing the binary search in memblock_next_valid_pfn.
>>
>> Furthermore, if the pfn is in a gap of two memory region, skip to next
>> region directly to speedup the binary search.
> How much speed up do you see with this improvements relatively to simple
> binary search in memblock_next_valid_pfn()?

The major speedup on my platform is the previous patch in this patch set,
not this one, I think it's related to sparse memory mode for different
platforms.

Thanks
Hanjun

>   



Re: [PATCH v12 1/2] mm: page_alloc: introduce memblock_next_valid_pfn() (again) for arm64

2019-07-24 Thread Hanjun Guo
On 2019/7/23 16:30, Mike Rapoport wrote:
> On Tue, Jul 23, 2019 at 01:51:12PM +0800, Hanjun Guo wrote:
>> From: Jia He 
>>
>> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>> where possible") optimized the loop in memmap_init_zone(). But it causes
>> possible panic on x86 due to specific memory mapping on x86_64 which will
>> skip valid pfns as well, so Daniel Vacek reverted it later.
>>
>> But as suggested by Daniel Vacek, it is fine to using memblock to skip
>> gaps and finding next valid frame with CONFIG_HAVE_ARCH_PFN_VALID.
>>
>> Daniel said:
>> "On arm and arm64, memblock is used by default. But generic version of
>> pfn_valid() is based on mem sections and memblock_next_valid_pfn() does
>> not always return the next valid one but skips more resulting in some
>> valid frames to be skipped (as if they were invalid). And that's why
>> kernel was eventually crashing on some !arm machines."
> 
> I think that the crash on x86 was not related to CONFIG_HAVE_ARCH_PFN_VALID
> but rather to the x86 way to setup memblock.  Some of the x86 reserved
> memory areas were never added to memblock.memory, which makes memblock's
> view of the physical memory incomplete and that's why
> memblock_next_valid_pfn() could skip valid PFNs on x86.

Thank you for kindly clarify, I will update the patch with your comments
in next version.

> 
>> Introduce a new config option CONFIG_HAVE_MEMBLOCK_PFN_VALID and only
>> selected for arm64, using the new config option to guard the
>> memblock_next_valid_pfn().
>  
> As far as I can tell, the memblock_next_valid_pfn() should work on most
> architectures and not only on ARM. For sure there is should be no
> dependency between CONFIG_HAVE_ARCH_PFN_VALID and memblock_next_valid_pfn().
> 
> I believe that the configuration option to guard memblock_next_valid_pfn()
> should be opt-out and that only x86 will require it.

So how about introduce a configuration option, say, 
CONFIG_HAVE_ARCH_PFN_INVALID,
selected by x86 and keep it default unselected for all other architecture?

> 
>> This was tested on a HiSilicon Kunpeng920 based ARM64 server, the speedup
>> is pretty impressive for bootmem_init() at boot:
>>
>> with 384G memory,
>> before: 13310ms
>> after:  1415ms
>>
>> with 1T memory,
>> before: 20s
>> after:  2s
>>
>> Suggested-by: Daniel Vacek 
>> Signed-off-by: Jia He 
>> Signed-off-by: Hanjun Guo 
>> ---
>>  arch/arm64/Kconfig |  1 +
>>  include/linux/mmzone.h |  9 +
>>  mm/Kconfig |  3 +++
>>  mm/memblock.c  | 31 +++
>>  mm/page_alloc.c|  4 +++-
>>  5 files changed, 47 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 697ea0510729..058eb26579be 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -893,6 +893,7 @@ config ARCH_FLATMEM_ENABLE
>>  
>>  config HAVE_ARCH_PFN_VALID
>>  def_bool y
>> +select HAVE_MEMBLOCK_PFN_VALID
>>
>>  config HW_PERF_EVENTS
>>  def_bool y
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 70394cabaf4e..24cb6bdb1759 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -1325,6 +1325,10 @@ static inline int pfn_present(unsigned long pfn)
>>  #endif
>>  
>>  #define early_pfn_valid(pfn)pfn_valid(pfn)
>> +#ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
>> +extern unsigned long memblock_next_valid_pfn(unsigned long pfn);
>> +#define next_valid_pfn(pfn) memblock_next_valid_pfn(pfn)
> 
> Please make it 'static inline' and move out of '#ifdef CONFIG_SPARSEMEM'

Will do.

> 
>> +#endif
>>  void sparse_init(void);
>>  #else
>>  #define sparse_init()   do {} while (0)
>> @@ -1347,6 +1351,11 @@ struct mminit_pfnnid_cache {
>>  #define early_pfn_valid(pfn)(1)
>>  #endif
>>  
>> +/* fallback to default definitions */
>> +#ifndef next_valid_pfn
>> +#define next_valid_pfn(pfn) (pfn + 1)
> 
> static inline as well.

OK.

> 
>> +#endif
>> +
>>  void memory_present(int nid, unsigned long start, unsigned long end);
>>  
>>  /*
>> diff --git a/mm/Kconfig b/mm/Kconfig
>> index f0c76ba47695..c578374b6413 100644
>> --- a/mm/Kconfig
>> +++ b/mm/Kconfig
>> @@ -132,6 +132,9 @@ config HAVE_MEMBLOCK_NODE_MAP
>>  config HAVE_MEMBLOCK_PHYS_MAP
>>  bool
>>  
>> +config HAVE_MEMBLOCK_PFN_VALID
>> +bool
>> +
>>  config HAVE_GENER

[PATCH v12 1/2] mm: page_alloc: introduce memblock_next_valid_pfn() (again) for arm64

2019-07-22 Thread Hanjun Guo
From: Jia He 

Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") optimized the loop in memmap_init_zone(). But it causes
possible panic on x86 due to specific memory mapping on x86_64 which will
skip valid pfns as well, so Daniel Vacek reverted it later.

But as suggested by Daniel Vacek, it is fine to using memblock to skip
gaps and finding next valid frame with CONFIG_HAVE_ARCH_PFN_VALID.

Daniel said:
"On arm and arm64, memblock is used by default. But generic version of
pfn_valid() is based on mem sections and memblock_next_valid_pfn() does
not always return the next valid one but skips more resulting in some
valid frames to be skipped (as if they were invalid). And that's why
kernel was eventually crashing on some !arm machines."

Introduce a new config option CONFIG_HAVE_MEMBLOCK_PFN_VALID and only
selected for arm64, using the new config option to guard the
memblock_next_valid_pfn().

This was tested on a HiSilicon Kunpeng920 based ARM64 server, the speedup
is pretty impressive for bootmem_init() at boot:

with 384G memory,
before: 13310ms
after:  1415ms

with 1T memory,
before: 20s
after:  2s

Suggested-by: Daniel Vacek 
Signed-off-by: Jia He 
Signed-off-by: Hanjun Guo 
---
 arch/arm64/Kconfig |  1 +
 include/linux/mmzone.h |  9 +
 mm/Kconfig |  3 +++
 mm/memblock.c  | 31 +++
 mm/page_alloc.c|  4 +++-
 5 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 697ea0510729..058eb26579be 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -893,6 +893,7 @@ config ARCH_FLATMEM_ENABLE
 
 config HAVE_ARCH_PFN_VALID
def_bool y
+   select HAVE_MEMBLOCK_PFN_VALID
 
 config HW_PERF_EVENTS
def_bool y
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 70394cabaf4e..24cb6bdb1759 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1325,6 +1325,10 @@ static inline int pfn_present(unsigned long pfn)
 #endif
 
 #define early_pfn_valid(pfn)   pfn_valid(pfn)
+#ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
+extern unsigned long memblock_next_valid_pfn(unsigned long pfn);
+#define next_valid_pfn(pfn)memblock_next_valid_pfn(pfn)
+#endif
 void sparse_init(void);
 #else
 #define sparse_init()  do {} while (0)
@@ -1347,6 +1351,11 @@ struct mminit_pfnnid_cache {
 #define early_pfn_valid(pfn)   (1)
 #endif
 
+/* fallback to default definitions */
+#ifndef next_valid_pfn
+#define next_valid_pfn(pfn)(pfn + 1)
+#endif
+
 void memory_present(int nid, unsigned long start, unsigned long end);
 
 /*
diff --git a/mm/Kconfig b/mm/Kconfig
index f0c76ba47695..c578374b6413 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -132,6 +132,9 @@ config HAVE_MEMBLOCK_NODE_MAP
 config HAVE_MEMBLOCK_PHYS_MAP
bool
 
+config HAVE_MEMBLOCK_PFN_VALID
+   bool
+
 config HAVE_GENERIC_GUP
bool
 
diff --git a/mm/memblock.c b/mm/memblock.c
index 7d4f61ae666a..d57ba51bb9cd 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1251,6 +1251,37 @@ int __init_memblock memblock_set_node(phys_addr_t base, 
phys_addr_t size,
return 0;
 }
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
+
+#ifdef CONFIG_HAVE_MEMBLOCK_PFN_VALID
+unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
+{
+   struct memblock_type *type = 
+   unsigned int right = type->cnt;
+   unsigned int mid, left = 0;
+   phys_addr_t addr = PFN_PHYS(++pfn);
+
+   do {
+   mid = (right + left) / 2;
+
+   if (addr < type->regions[mid].base)
+   right = mid;
+   else if (addr >= (type->regions[mid].base +
+ type->regions[mid].size))
+   left = mid + 1;
+   else {
+   /* addr is within the region, so pfn is valid */
+   return pfn;
+   }
+   } while (left < right);
+
+   if (right == type->cnt)
+   return -1UL;
+   else
+   return PHYS_PFN(type->regions[right].base);
+}
+EXPORT_SYMBOL(memblock_next_valid_pfn);
+#endif /* CONFIG_HAVE_MEMBLOCK_PFN_VALID */
+
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
 /**
  * __next_mem_pfn_range_in_zone - iterator for for_each_*_range_in_zone()
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d66bc8abe0af..70933c40380a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5811,8 +5811,10 @@ void __meminit memmap_init_zone(unsigned long size, int 
nid, unsigned long zone,
 * function.  They do not exist on hotplugged memory.
 */
if (context == MEMMAP_EARLY) {
-   if (!early_pfn_valid(pfn))
+   if (!early_pfn_valid(pfn)) {
+   pfn = next_valid_pfn(pfn) - 1;
continue;
+   }
   

[PATCH v12 2/2] mm: page_alloc: reduce unnecessary binary search in memblock_next_valid_pfn

2019-07-22 Thread Hanjun Guo
From: Jia He 

After skipping some invalid pfns in memmap_init_zone(), there is still
some room for improvement.

E.g. if pfn and pfn+1 are in the same memblock region, we can simply pfn++
instead of doing the binary search in memblock_next_valid_pfn.

Furthermore, if the pfn is in a gap of two memory region, skip to next
region directly to speedup the binary search.

Signed-off-by: Jia He 
Signed-off-by: Hanjun Guo 
---
 mm/memblock.c | 37 +++--
 1 file changed, 31 insertions(+), 6 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index d57ba51bb9cd..95d5916716a0 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1256,28 +1256,53 @@ int __init_memblock memblock_set_node(phys_addr_t base, 
phys_addr_t size,
 unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
 {
struct memblock_type *type = 
+   struct memblock_region *regions = type->regions;
unsigned int right = type->cnt;
unsigned int mid, left = 0;
+   unsigned long start_pfn, end_pfn, next_start_pfn;
phys_addr_t addr = PFN_PHYS(++pfn);
+   static int early_region_idx __initdata_memblock = -1;
 
+   /* fast path, return pfn+1 if next pfn is in the same region */
+   if (early_region_idx != -1) {
+   start_pfn = PFN_DOWN(regions[early_region_idx].base);
+   end_pfn = PFN_DOWN(regions[early_region_idx].base +
+   regions[early_region_idx].size);
+
+   if (pfn >= start_pfn && pfn < end_pfn)
+   return pfn;
+
+   /* try slow path */
+   if (++early_region_idx == type->cnt)
+   goto slow_path;
+
+   next_start_pfn = PFN_DOWN(regions[early_region_idx].base);
+
+   if (pfn >= end_pfn && pfn <= next_start_pfn)
+   return next_start_pfn;
+   }
+
+slow_path:
+   /* slow path, do the binary searching */
do {
mid = (right + left) / 2;
 
-   if (addr < type->regions[mid].base)
+   if (addr < regions[mid].base)
right = mid;
-   else if (addr >= (type->regions[mid].base +
- type->regions[mid].size))
+   else if (addr >= (regions[mid].base + regions[mid].size))
left = mid + 1;
else {
-   /* addr is within the region, so pfn is valid */
+   early_region_idx = mid;
return pfn;
}
} while (left < right);
 
if (right == type->cnt)
return -1UL;
-   else
-   return PHYS_PFN(type->regions[right].base);
+
+   early_region_idx = right;
+
+   return PHYS_PFN(regions[early_region_idx].base);
 }
 EXPORT_SYMBOL(memblock_next_valid_pfn);
 #endif /* CONFIG_HAVE_MEMBLOCK_PFN_VALID */
-- 
2.19.1



[PATCH v12 0/2] introduce memblock_next_valid_pfn() (again) for arm64

2019-07-22 Thread Hanjun Guo
Here is new version of "[PATCH v11 0/3] remain and optimize
memblock_next_valid_pfn on arm and arm64" from Jia He, which is suggested
by Ard to respin this patch set [1].

In the new version, I squashed patch 1/3 and patch 2/3 in v11 into
one patch, fixed a bug for possible out of bound accessing the
regions, and just introduce memblock_next_valid_pfn() for arm64 only
as I don't have a arm32 platform to test.

Ard asked to "with the new data points added for documentation, and
crystal clear about how the meaning of PFN validity differs between
ARM and other architectures, and why the assumptions that the
optimization is based on are guaranteed to hold", to be honest, I
didn't see PFN validity differs between ARM and x86 architecture,
but there is a bug in commit b92df1de5d28 ("mm: page_alloc: skip over
regions of invalid pfns where possible") which has a possible out of
bound accessing the regions as well, so not sure that is the root cause.

Testing on a HiSilicon ARM64 server (a 4 sockets system), I can get
pretty much speedup for bootmem_init() at boot:

with 384G memory,
before: 13310ms
after:  1415ms
   
with 1T memory,
before: 20s
after:  2s

[1]: https://lkml.org/lkml/2019/6/10/412

Jia He (2):
  mm: page_alloc: introduce memblock_next_valid_pfn() (again) for arm64
  mm: page_alloc: reduce unnecessary binary search in
memblock_next_valid_pfn

 arch/arm64/Kconfig |  1 +
 include/linux/mmzone.h |  9 +++
 mm/Kconfig |  3 +++
 mm/memblock.c  | 56 ++
 mm/page_alloc.c|  4 ++-
 5 files changed, 72 insertions(+), 1 deletion(-)

-- 
2.19.1



Re: [PATCH 0/3] arm64: Allow early timestamping of kernel log

2019-07-22 Thread Hanjun Guo
On 2019/7/22 18:33, Marc Zyngier wrote:
> So far, we've let the arm64 kernel start its meaningful time stamping
> of the kernel log pretty late, which is caused by sched_clock() being
> initialised rather late compared to other architectures.
> 
> Pavel Tatashin proposed[1] to move the initialisation of sched_clock
> much earlier, which I had objections to. The reason for initialising
> sched_clock late is that a number of systems have broken counters, and
> we need to apply all kind of terrifying workarounds to avoid time
> going backward on the affected platforms. Being able to identify the
> right workaround comes pretty late in the kernel boot, and providing
> an unreliable sched_clock, even for a short period of time, isn't an
> appealing prospect.
> 
> To address this, I'm proposing that we allow an architecture to chose
> to (1) divorce time stamping and sched_clock during the early phase of
> booting, and (2) inherit the time stamping clock as the new epoch the
> first time a sched_sched clock gets registered.
> 
> (1) would allow arm64 to provide a time stamping clock, however
> unreliable it might be, while (2) would allow sched_clock to provide
> time stamps that are continuous with the time-stamping clock.
> 
> The last patch in the series adds the necessary logic to arm64,
> allowing the (potentially unreliable) time stamping of early kernel
> messages.
> 
> Tested on a bunch of arm64 systems, both bare-metal and in VMs. Boot
> tested on a x86 guest.

This makes the boot log more useful and I can debug some time consuming
issue easier before the arch timer initialization, tested on my ARM64
server, I can see the timestamping from the start [1],

Tested-by: Hanjun Guo 

Thanks
Hanjun

[1]:
[0.00] Booting Linux on physical CPU 0x08 [0x481fd010]
[0.00] Linux version 5.2.0+ (root@localhost.localdomain) (gcc version 
9.0.1 20190312 (Red Hat 9.0.1-0.10) (GCC)) #45 SMP Tue Jul 23 09:17:48 CST 2019
[0.00] Using timestamp clock @100MHz
[0.74] efi: Getting EFI parameters from FDT:
[0.82] efi: EFI v2.70 by EDK II
[0.83] efi:  ACPI 2.0=0x3a30  SMBIOS 3.0=0x39f8  
MEMATTR=0x30996018  MEMRESERVE=0x30997e18
[0.000122] crashkernel reserved: 0x0ba0 - 0x2ba0 
(512 MB)
[0.000126] cma: Failed to reserve 512 MiB
[0.185111] ACPI: Early table checksum verification disabled
[0.185115] ACPI: RSDP 0x3A30 24 (v02 HISI  )
[0.185120] ACPI: XSDT 0x3A27 9C (v01 HISI   HIP08
  0113)
[0.185127] ACPI: FACP 0x39B1 000114 (v06 HISI   HIP08
 HISI 20151124)
[0.185134] ACPI: DSDT 0x39AB 0084E4 (v02 HISI   HIP08
 INTL 20181213)
[0.185139] ACPI: PCCT 0x39FB 8A (v01 HISI   HIP08
 HISI 20151124)
[0.185143] ACPI: SSDT 0x39F9 01021A (v02 HISI   HIP07
 INTL 20181213)
[0.185147] ACPI: BERT 0x39F5 30 (v01 HISI   HIP08
 HISI 20151124)
[0.185150] ACPI: HEST 0x39F3 000308 (v01 HISI   HIP08
 HISI 20151124)
[0.185154] ACPI: ERST 0x39EF 000230 (v01 HISI   HIP08
 HISI 20151124)
[0.185158] ACPI: EINJ 0x39EE 000170 (v01 HISI   HIP08
 HISI 20151124)
[0.185162] ACPI: SLIT 0x39B3 3C (v01 HISI   HIP08
 HISI 20151124)
[0.185166] ACPI: GTDT 0x39B0 7C (v02 HISI   HIP08
 HISI 20151124)
[0.185169] ACPI: MCFG 0x39AF 3C (v01 HISI   HIP08
 HISI 20151124)
[0.185173] ACPI: SPCR 0x39AE 50 (v02 HISI   HIP08
 HISI 20151124)
[0.185177] ACPI: SRAT 0x39AD 0007D0 (v03 HISI   HIP08
 HISI 20151124)
[0.185181] ACPI: APIC 0x39AC 001E6C (v04 HISI   HIP08
 HISI 20151124)
[0.185185] ACPI: IORT 0x39AA 001310 (v00 HISI   HIP08
 INTL 20181213)
[0.185189] ACPI: PPTT 0x3097 0031B0 (v01 HISI   HIP08
 HISI 20151124)
[0.185196] ACPI: SPCR: console: pl011,mmio32,0x9408,115200
[0.185208] ACPI: SRAT: Node 0 PXM 0 [mem 0x208000-0x2f]
[0.185210] ACPI: SRAT: Node 1 PXM 1 [mem 0x30-0x3f]
[0.185212] ACPI: SRAT: Node 0 PXM 0 [mem 0x-0x7fff]
[0.185213] ACPI: SRAT: Node 2 PXM 2 [mem 0x2020-0x202f]
[0.185215] ACPI: SRAT: Node 3 PXM 3 [mem 0x2030-0x203f]
[0.185221] NUMA: NODE_DATA [mem 0x2fe3c0-0x2f]
[0.185224] NUMA: NODE_DATA [mem 0x3fe3c0-0x3f]
[0.185226] NUMA: NODE_DATA [mem 0x202fe3c0-0x202f]
[0.185229] NUMA: NODE_DATA [mem 0x203ffdfde3c0-0x203ffdfd]




Re: [PATCH] ACPI/IORT: Fix off-by-one check in iort_dev_find_its_id()

2019-07-22 Thread Hanjun Guo
On 2019/7/23 0:25, Lorenzo Pieralisi wrote:
> Static analysis identified that index comparison against ITS entries in
> iort_dev_find_its_id() is off by one.
> 
> Update the comparison condition and clarify the resulting error
> message.
> 
> Fixes: 4bf2efd26d76 ("ACPI: Add new IORT functions to support MSI domain 
> handling")
> Link: https://lore.kernel.org/linux-arm-kernel/20190613065410.GB16334@mwanda/
> Reported-by: Dan Carpenter 
> Signed-off-by: Lorenzo Pieralisi 
> Cc: Dan Carpenter 
> Cc: Will Deacon 
> Cc: Hanjun Guo 
> Cc: Sudeep Holla 
> Cc: Catalin Marinas 
> Cc: Robin Murphy 

Reviewed-by: Hanjun Guo 



Re: [PATCH] ACPI/IORT: Rename arm_smmu_v3_set_proximity() 'node' local variable

2019-07-22 Thread Hanjun Guo
On 2019/7/23 0:14, Lorenzo Pieralisi wrote:
> Commit 36a2ba07757d ("ACPI/IORT: Reject platform device creation on NUMA
> node mapping failure") introduced a local variable 'node' in
> arm_smmu_v3_set_proximity() that shadows the struct acpi_iort_node
> pointer function parameter.
> 
> Execution was unaffected but it is prone to errors and can lead
> to subtle bugs.
> 
> Rename the local variable to prevent any issue.
> 
> Reported-by: Will Deacon 
> Signed-off-by: Lorenzo Pieralisi 
> Cc: Will Deacon 
> Cc: Hanjun Guo 
> Cc: Sudeep Holla 
> Cc: Catalin Marinas 
> Cc: Robin Murphy 
> Cc: Kefeng Wang 
> ---
>  drivers/acpi/arm64/iort.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Reviewed-by: Hanjun Guo 



Re: [PATCH v2 0/5] Add NUMA-awareness to qspinlock

2019-07-12 Thread Hanjun Guo
On 2019/7/3 19:58, Jan Glauber wrote:
> Hi Alex,
> I've tried this series on arm64 (ThunderX2 with up to SMT=4  and 224 CPUs)
> with the borderline testcase of accessing a single file from all
> threads. With that
> testcase the qspinlock slowpath is the top spot in the kernel.
> 
> The results look really promising:
> 
> CPUsnormalnuma-qspinlocks
> -
> 56149.41  73.90
> 224  576.95  290.31
> 
> Also frontend-stalls are reduced to 50% and interconnect traffic is
> greatly reduced.
> Tested-by: Jan Glauber 

Tested this patchset on Kunpeng920 ARM64 server (96 cores,
4 NUMA nodes), and with the same test case from Jan, I can
see 150%+ boost! (Need to add a patch below [1].)

For the real workload such as Nginx I can see about 10%
performance improvement as well.

Tested-by: Hanjun Guo 

Please cc me for new versions and I'm willing to test it.

Thanks
Hanjun

[1]
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 657bbc5..72c1346 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -792,6 +792,20 @@ config NODES_SHIFT
  Specify the maximum number of NUMA Nodes available on the target
  system.  Increases memory reserved to accommodate various tables.

+config NUMA_AWARE_SPINLOCKS
+ bool "Numa-aware spinlocks"
+ depends on NUMA
+ default y
+ help
+   Introduce NUMA (Non Uniform Memory Access) awareness into
+   the slow path of spinlocks.
+
+   The kernel will try to keep the lock on the same node,
+   thus reducing the number of remote cache misses, while
+   trading some of the short term fairness for better performance.
+
+   Say N if you want absolute first come first serve fairness.
+
 config USE_PERCPU_NUMA_NODE_ID
def_bool y
depends on NUMA
diff --git a/kernel/locking/qspinlock_cna.h b/kernel/locking/qspinlock_cna.h
index 2994167..be5dd44 100644
--- a/kernel/locking/qspinlock_cna.h
+++ b/kernel/locking/qspinlock_cna.h
@@ -4,7 +4,7 @@
 #endif

 #include 
-
+#include 
 /*
  * Implement a NUMA-aware version of MCS (aka CNA, or compact NUMA-aware lock).
  *
@@ -170,7 +170,7 @@ static __always_inline void cna_init_node(struct 
mcs_spinlock *node, int cpuid,
  u32 tail)
 {
if (decode_numa_node(node->node_and_count) == -1)
-   store_numa_node(node, numa_cpu_node(cpuid));
+ store_numa_node(node, cpu_to_node(cpuid));
node->encoded_tail = tail;
 }



Re: [PATCH v3 1/2] x86, arm64: Move ARCH_WANT_HUGE_PMD_SHARE config in arch/Kconfig

2019-07-01 Thread Hanjun Guo
On 2019/7/2 1:58, Alexandre Ghiti wrote:
> ARCH_WANT_HUGE_PMD_SHARE config was declared in both architectures:
> move this declaration in arch/Kconfig and make those architectures
> select it.
> 
> Signed-off-by: Alexandre Ghiti 
> ---
>  arch/Kconfig   | 3 +++
>  arch/arm64/Kconfig | 4 +---
>  arch/x86/Kconfig   | 4 +---
>  3 files changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index c47b328eada0..d2f212dc8e72 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -577,6 +577,9 @@ config HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>  config HAVE_ARCH_HUGE_VMAP
>   bool
>  
> +config ARCH_WANT_HUGE_PMD_SHARE
> + bool
> +
>  config HAVE_ARCH_SOFT_DIRTY
>   bool
>  
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 697ea0510729..c862575decd3 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -71,6 +71,7 @@ config ARM64
>   select ARCH_SUPPORTS_NUMA_BALANCING
>   select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT
>   select ARCH_WANT_FRAME_POINTERS
> + select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES 
> && !ARM64_VA_BITS_36)
>   select ARCH_HAS_UBSAN_SANITIZE_ALL
>   select ARM_AMBA
>   select ARM_ARCH_TIMER
> @@ -901,9 +902,6 @@ config HW_PERF_EVENTS
>  config SYS_SUPPORTS_HUGETLBFS
>   def_bool y
>  
> -config ARCH_WANT_HUGE_PMD_SHARE
> - def_bool y if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)
> -

Reviewed-by: Hanjun Guo 

Thanks
Hanjun



Re: [PATCH REBASE v2 1/2] x86, arm64: Move ARCH_WANT_HUGE_PMD_SHARE config in arch/Kconfig

2019-06-30 Thread Hanjun Guo
On 2019/5/26 20:50, Alexandre Ghiti wrote:
> ARCH_WANT_HUGE_PMD_SHARE config was declared in both architectures:
> move this declaration in arch/Kconfig and make those architectures
> select it.
> 
> Signed-off-by: Alexandre Ghiti 
> Reviewed-by: Palmer Dabbelt 
> ---
>  arch/Kconfig   | 3 +++
>  arch/arm64/Kconfig | 2 +-
>  arch/x86/Kconfig   | 4 +---
>  3 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index c47b328eada0..d2f212dc8e72 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -577,6 +577,9 @@ config HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>  config HAVE_ARCH_HUGE_VMAP
>   bool
>  
> +config ARCH_WANT_HUGE_PMD_SHARE
> + bool
> +
>  config HAVE_ARCH_SOFT_DIRTY
>   bool
>  
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 4780eb7af842..dee7f750c42f 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -70,6 +70,7 @@ config ARM64
>   select ARCH_SUPPORTS_NUMA_BALANCING
>   select ARCH_WANT_COMPAT_IPC_PARSE_VERSION
>   select ARCH_WANT_FRAME_POINTERS
> + select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES 
> && !ARM64_VA_BITS_36)
>   select ARCH_HAS_UBSAN_SANITIZE_ALL
>   select ARM_AMBA
>   select ARM_ARCH_TIMER
> @@ -884,7 +885,6 @@ config SYS_SUPPORTS_HUGETLBFS
>   def_bool y
>  
>  config ARCH_WANT_HUGE_PMD_SHARE
> - def_bool y if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)

Why not remove config ARCH_WANT_HUGE_PMD_SHARE as well?
Did I miss something?

Thanks
Hanjun



Re: [PATCH v8 3/7] cpu-topology: Move cpu topology code to common code.

2019-06-28 Thread Hanjun Guo
On 2019/6/28 3:52, Atish Patra wrote:
> Both RISC-V & ARM64 are using cpu-map device tree to describe
> their cpu topology. It's better to move the relevant code to
> a common place instead of duplicate code.
> 
> To: Will Deacon 
> To: Catalin Marinas 

Using Cc: is better.

> Signed-off-by: Atish Patra 
> [Tested on QDF2400]
> Tested-by: Jeffrey Hugo 
> [Tested on Juno and other embedded platforms.]
> Tested-by: Sudeep Holla 
> Reviewed-by: Sudeep Holla 
> Acked-by: Will Deacon 
> Acked-by: Greg Kroah-Hartman 
> ---
>  arch/arm64/include/asm/topology.h |  23 ---
>  arch/arm64/kernel/topology.c  | 303 +-
>  drivers/base/arch_topology.c  | 296 +
>  include/linux/arch_topology.h |  28 +++
>  include/linux/topology.h  |   1 +
>  5 files changed, 329 insertions(+), 322 deletions(-)

Tested on Kunpeng920 ARM64 server, works good,

# lscpu
Architecture:aarch64
Byte Order:  Little Endian
CPU(s):  96
On-line CPU(s) list: 0-95
Thread(s) per core:  1
Core(s) per socket:  48
Socket(s):   2
NUMA node(s):4
Vendor ID:   0x48
Model:   0
Stepping:0x1
CPU max MHz: 2600.
CPU min MHz: 260.
BogoMIPS:200.00
L1d cache:   64K
L1i cache:   64K
L2 cache:512K
L3 cache:32768K
NUMA node0 CPU(s):   0-23
NUMA node1 CPU(s):   24-47
NUMA node2 CPU(s):   48-71
NUMA node3 CPU(s):   72-95
Flags:   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp 
asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm

Tested-by: Hanjun Guo 

For the ACPI code,

Acked-by: Hanjun Guo 

Thanks
Hanjun



Re: [PATCH v5 1/4] ACPI/PPTT: Modify node flag detection to find last IDENTICAL

2019-06-26 Thread Hanjun Guo
On 2019/6/27 5:37, Jeremy Linton wrote:
> The ACPI specification implies that the IDENTICAL flag should be
> set on all non leaf nodes where the children are identical.
> This means that we need to be searching for the last node with
> the identical flag set rather than the first one.
> 
> Since this flag is also dependent on the table revision, we
> need to add a bit of extra code to verify the table revision,
> and the next node's state in the traversal. Since we want to
> avoid function pointers here, lets just special case
> the IDENTICAL flag.
> 
> Tested-by: Hanjun Gou 

This is wrong, my family name is Guo, and please correct my
email address as well (for all the 4 patches).

Thanks
Hanjun




Re: [PATCH v2] irqchip/mbigen: stop printing kernel addresses

2019-06-18 Thread Hanjun Guo
On 2019/6/18 17:15, Kefeng Wang wrote:
> After commit ad67b74d2469d9b8 ("printk: hash addresses printed with %p"),
> it will print "ptrval" instead of actual addresses when mbigen
> create domain fails,
> 
>   Hisilicon MBIGEN-V2 HISI0152:00: Failed to create mbi-gen@(ptrval) 
> irqdomain
>   Hisilicon MBIGEN-V2: probe of HISI0152:00 failed with error -12
> 
> dev_xxx() helper contains the device info, HISI0152:00, which stands for
> mbigen ACPI HID and its UID, we can identify the failing probed mbigen,
> so just remove the printing "mgn_chip->base", and also add missing "\n".
> 
> Signed-off-by: Kefeng Wang 
> ---
>  drivers/irqchip/irq-mbigen.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/irqchip/irq-mbigen.c b/drivers/irqchip/irq-mbigen.c
> index 98b6e1d4b1a6..c0f65ea0ae0f 100644
> --- a/drivers/irqchip/irq-mbigen.c
> +++ b/drivers/irqchip/irq-mbigen.c
> @@ -355,8 +355,7 @@ static int mbigen_device_probe(struct platform_device 
> *pdev)
>   err = -EINVAL;
>  
>   if (err) {
> - dev_err(>dev, "Failed to create mbi-gen@%p irqdomain",
> -         mgn_chip->base);
> + dev_err(>dev, "Failed to create mbi-gen irqdomain\n");
>   return err;

Reviewed-by: Hanjun Guo 



Re: [PATCH] irqchip/mbigen: stop printing kernel addresses

2019-06-18 Thread Hanjun Guo
On 2019/6/18 11:22, Kefeng Wang wrote:
> After commit ad67b74d2469d9b8 ("printk: hash addresses printed with %p"),
> it will print "ptrval" instead of actual addresses when mbigen
> create domain fails,
> 
>   Hisilicon MBIGEN-V2 HISI0152:00: Failed to create mbi-gen@(ptrval) 
> irqdomain
>   Hisilicon MBIGEN-V2: probe of HISI0152:00 failed with error -12
> 
> Instead of changing the print to "%px", and leaking kernel addresses,
> just remove the print completely.

This is a little bit misleading, as the "base" was used for
identify which mbigen failed, so saying 'remove completely'
will make people think that we will miss some debug information.

In fact, we have HISI0152:00 stands for mbigen ACPI HID and
its UID, so we can identify the failing probed mbigen even
we remove the printing "mgn_chip->base". It's better to add
this clarify in the commit message as well.

Thanks
Hanjun



Re: [PATCH v4 0/4] arm64: SPE ACPI enablement

2019-06-17 Thread Hanjun Guo
On 2019/6/15 9:09, Jeremy Linton wrote:
> This patch series enables the Arm Statistical Profiling
> Extension (SPE) on ACPI platforms.
> 
> This is possible because ACPI 6.3 uses a previously
> reserved field in the MADT to store the SPE interrupt
> number, similarly to how the normal PMU is described.
> If a consistent valid interrupt exists across all the
> cores in the system, a platform device is registered.
> That then triggers the SPE module, which runs as normal.
> 
> We also add the ability to parse the PPTT for IDENTICAL
> cores. We then use this to sanity check the single SPE
> device we create. This creates a bit of a problem with
> respect to the specification though. The specification
> says that its legal for multiple tree's to exist in the
> PPTT. We handle this fine, but what happens in the
> case of multiple tree's is that the lack of a common
> node with IDENTICAL set forces us to assume that there
> are multiple non-IDENTICAL cores in the machine.
> 
> v3->v4: Rebase to 5.2.
>   Minor formatting, patch rearrangement.
>   Add missing `inline` in static header definition.
>   Drop ARM_SPE_ACPI and just use ARM_SPE_PMU.

Tested on top of 5.2-rc1, I can see in the boot log:

arm_spe_pmu arm,spe-v1: probed for CPUs 0-95 [max_record_sz 128, align 4, 
features 0x7]

and I also tested perf record, and works as expected,

Tested-by: Hanjun Guo 

Thanks
Hanjun



Re: [RFC] Disable lockref on arm64

2019-06-13 Thread Hanjun Guo
On 2019/6/12 12:10, Jayachandran Chandrasekharan Nair wrote:
> On Wed, May 22, 2019 at 05:04:17PM +0100, Will Deacon wrote:
>> On Sat, May 18, 2019 at 12:00:34PM +0200, Ard Biesheuvel wrote:
>>> On Sat, 18 May 2019 at 06:25, Jayachandran Chandrasekharan Nair
>>>  wrote:

 On Mon, May 06, 2019 at 07:10:40PM +0100, Will Deacon wrote:
> On Mon, May 06, 2019 at 06:13:12AM +, Jayachandran Chandrasekharan 
> Nair wrote:
>> Perhaps someone from ARM can chime in here how the cas/yield combo
>> is expected to work when there is contention. ThunderX2 does not
>> do much with the yield, but I don't expect any ARM implementation
>> to treat YIELD as a hint not to yield, but to get/keep exclusive
>> access to the last failed CAS location.
>
> Just picking up on this as "someone from ARM".
>
> The yield instruction in our implementation of cpu_relax() is *only* there
> as a scheduling hint to QEMU so that it can treat it as an internal
> scheduling hint and run some other thread; see 1baa82f48030 ("arm64:
> Implement cpu_relax as yield"). We can't use WFE or WFI blindly here, as 
> it
> could be a long time before we see a wake-up event such as an interrupt. 
> Our
> implementation of smp_cond_load_acquire() is much better for that kind of
> thing, but doesn't help at all for a contended CAS loop where the variable
> is actually changing constantly.

 Looking thru the perf output of this case (open/close of a file from
 multiple CPUs), I see that refcount is a significant factor in most
 kernel configurations - and that too uses cmpxchg (without yield).
 x86 has an optimized inline version of refcount that helps
 significantly. Do you think this is worth looking at for arm64?

>>>
>>> I looked into this a while ago [0], but at the time, we decided to
>>> stick with the generic implementation until we encountered a use case
>>> that benefits from it. Worth a try, I suppose ...
>>>
>>> [0] 
>>> https://lore.kernel.org/linux-arm-kernel/20170903101622.12093-1-ard.biesheu...@linaro.org/
>>
>> If JC can show that we benefit from this, it would be interesting to see if
>> we can implement the refcount-full saturating arithmetic using the
>> LDMIN/LDMAX instructions instead of the current cmpxchg() loops.
> 
> Now that the lockref change is mainline, I think we need to take another
> look at this patch.
> 
> Using a fixed up version of Ard's patch above along with Jan's lockref
> change upstream, I get significant improvement in scaling for my file
> open/read/close testcase[1]. Like I wrote earlier, if I take a
> standard Ubuntu arm64 kernel configuration, most of the time for my
> test[1] is spent in refcount operations.
> 
> With Ard's changes applied[2], I see that the lockref CAS code becomes
> the top function and then the retry limit will kick in as expected. In
> my testcase, I see that the queued spinlock case is about 2.5 times
> faster than the unbound CAS loop when 224 CPUs are enabled (SMT 4,
> 28core, 2socket).
> 
> JC
> 
> [1] https://github.com/jchandra-cavm/refcount-test
> [2] https://github.com/jchandra-cavm/linux/commits/refcount-fixes

FWIW, with the patch (Ard's patch plus fixes) above, running the
same testcase on ARM64 Kunpeng920 96 CPU core system, I can see about 50%
performance boost.

I also tested Jan's lockref change without Ard's patch, performance
is almost the same.

Thanks
Hanjun



Re: [PATCH v11 0/3] remain and optimize memblock_next_valid_pfn on arm and arm64

2019-06-12 Thread Hanjun Guo
On 2019/6/12 9:05, Jia He wrote:
>>
>>> So what I would like to see is the patch set being proposed again,
>>> with the new data points added for documentation. Also, the commit
>>> logs need to crystal clear about how the meaning of PFN validity
>>> differs between ARM and other architectures, and why the assumptions
>>> that the optimization is based on are guaranteed to hold.
>> I think Jia He no longer works for HXT, if don't mind, I can repost
>> this patch set with Jia He's authority unchanged.
> Ok, I don't mind that, thanks for your followup :)

That's great, I will prepare a new version with Ard's comments addressed
then repost.

Thanks
Hanjun



Re: [PATCH v11 0/3] remain and optimize memblock_next_valid_pfn on arm and arm64

2019-06-11 Thread Hanjun Guo
Hello Ard,

Thanks for the reply, please see my comments inline.

On 2019/6/10 21:16, Ard Biesheuvel wrote:
> On Sat, 8 Jun 2019 at 06:22, Hanjun Guo  wrote:
>>
>> Hi Ard, Will,
>>
>> This week we were trying to debug an issue of time consuming in mem_init(),
>> and leading to this similar solution form Jia He, so I would like to bring 
>> this
>> thread back, please see my detail test result below.
>>
>> On 2018/9/7 22:44, Will Deacon wrote:
>>> On Thu, Sep 06, 2018 at 01:24:22PM +0200, Ard Biesheuvel wrote:
>>>> On 22 August 2018 at 05:07, Jia He  wrote:
>>>>> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>>>>> where possible") optimized the loop in memmap_init_zone(). But it causes
>>>>> possible panic bug. So Daniel Vacek reverted it later.
>>>>>
>>>>> But as suggested by Daniel Vacek, it is fine to using memblock to skip
>>>>> gaps and finding next valid frame with CONFIG_HAVE_ARCH_PFN_VALID.
>>>>>
>>>>> More from what Daniel said:
>>>>> "On arm and arm64, memblock is used by default. But generic version of
>>>>> pfn_valid() is based on mem sections and memblock_next_valid_pfn() does
>>>>> not always return the next valid one but skips more resulting in some
>>>>> valid frames to be skipped (as if they were invalid). And that's why
>>>>> kernel was eventually crashing on some !arm machines."
>>>>>
>>>>> About the performance consideration:
>>>>> As said by James in b92df1de5,
>>>>> "I have tested this patch on a virtual model of a Samurai CPU with a
>>>>> sparse memory map.  The kernel boot time drops from 109 to 62 seconds."
>>>>> Thus it would be better if we remain memblock_next_valid_pfn on arm/arm64.
>>>>>
>>>>> Besides we can remain memblock_next_valid_pfn, there is still some room
>>>>> for improvement. After this set, I can see the time overhead of 
>>>>> memmap_init
>>>>> is reduced from 27956us to 13537us in my armv8a server(QDF2400 with 96G
>>>>> memory, pagesize 64k). I believe arm server will benefit more if memory is
>>>>> larger than TBs
>>>>>
>>>>
>>>> OK so we can summarize the benefits of this series as follows:
>>>> - boot time on a virtual model of a Samurai CPU drops from 109 to 62 
>>>> seconds
>>>> - boot time on a QDF2400 arm64 server with 96 GB of RAM drops by ~15
>>>> *milliseconds*
>>>>
>>>> Google was not very helpful in figuring out what a Samurai CPU is and
>>>> why we should care about the boot time of Linux running on a virtual
>>>> model of it, and the 15 ms speedup is not that compelling either.
>>
>> Testing this patch set on top of Kunpeng 920 based ARM64 server, with
>> 384G memory in total, we got the time consuming below
>>
>>  without this patch set  with this patch set
>> mem_init()13310ms  1415ms
>>
>> So we got about 8x speedup on this machine, which is very impressive.
>>
> 
> Yes, this is impressive. But does it matter in the grand scheme of
> things? 

It matters for this machine, because it's for storage and there is
a watchdog and the time consuming triggers the watchdog.

> How much time does this system take to arrive at this point
> from power on?

Sorry, I don't have such data, as the arch timer is not initialized
and I didn't see the time stamp at this point, but I read the cycles
from arch timer before and after the time consuming function to get
how much time consumed.

> 
>> The time consuming is related the memory DIMM size and where to locate those
>> memory DIMMs in the slots. In above case, we are using 16G memory DIMM.
>> We also tested 1T memory with 64G size for each memory DIMM on another ARM64
>> machine, the time consuming reduced from 20s to 2s (I think it's related to
>> firmware implementations).
>>
> 
> I agree that this optimization looks good in isolation, but the fact
> that you spotted a bug justifies my skepticism at the time. On the
> other hand, now that we have several independent reports (from you,
> but also from the Renesas folks) that the speedup is worthwhile for
> real world use cases, I think it does make sense to revisit it.

Thank you very much for taking care of this :)

> 
> So what I would like to see is the patch set being proposed again,
> with the new data points added for documentation. Also, the commit
> logs need to crystal clear about how the meaning of PFN validity
> differs between ARM and other architectures, and why the assumptions
> that the optimization is based on are guaranteed to hold.

I think Jia He no longer works for HXT, if don't mind, I can repost
this patch set with Jia He's authority unchanged.

Thanks
Hanjun



Re: [PATCH v11 0/3] remain and optimize memblock_next_valid_pfn on arm and arm64

2019-06-07 Thread Hanjun Guo
Hi Ard, Will,

This week we were trying to debug an issue of time consuming in mem_init(),
and leading to this similar solution form Jia He, so I would like to bring this
thread back, please see my detail test result below.

On 2018/9/7 22:44, Will Deacon wrote:
> On Thu, Sep 06, 2018 at 01:24:22PM +0200, Ard Biesheuvel wrote:
>> On 22 August 2018 at 05:07, Jia He  wrote:
>>> Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
>>> where possible") optimized the loop in memmap_init_zone(). But it causes
>>> possible panic bug. So Daniel Vacek reverted it later.
>>>
>>> But as suggested by Daniel Vacek, it is fine to using memblock to skip
>>> gaps and finding next valid frame with CONFIG_HAVE_ARCH_PFN_VALID.
>>>
>>> More from what Daniel said:
>>> "On arm and arm64, memblock is used by default. But generic version of
>>> pfn_valid() is based on mem sections and memblock_next_valid_pfn() does
>>> not always return the next valid one but skips more resulting in some
>>> valid frames to be skipped (as if they were invalid). And that's why
>>> kernel was eventually crashing on some !arm machines."
>>>
>>> About the performance consideration:
>>> As said by James in b92df1de5,
>>> "I have tested this patch on a virtual model of a Samurai CPU with a
>>> sparse memory map.  The kernel boot time drops from 109 to 62 seconds."
>>> Thus it would be better if we remain memblock_next_valid_pfn on arm/arm64.
>>>
>>> Besides we can remain memblock_next_valid_pfn, there is still some room
>>> for improvement. After this set, I can see the time overhead of memmap_init
>>> is reduced from 27956us to 13537us in my armv8a server(QDF2400 with 96G
>>> memory, pagesize 64k). I believe arm server will benefit more if memory is
>>> larger than TBs
>>>
>>
>> OK so we can summarize the benefits of this series as follows:
>> - boot time on a virtual model of a Samurai CPU drops from 109 to 62 seconds
>> - boot time on a QDF2400 arm64 server with 96 GB of RAM drops by ~15
>> *milliseconds*
>>
>> Google was not very helpful in figuring out what a Samurai CPU is and
>> why we should care about the boot time of Linux running on a virtual
>> model of it, and the 15 ms speedup is not that compelling either.

Testing this patch set on top of Kunpeng 920 based ARM64 server, with
384G memory in total, we got the time consuming below

 without this patch set  with this patch set
mem_init()13310ms  1415ms

So we got about 8x speedup on this machine, which is very impressive.

The time consuming is related the memory DIMM size and where to locate those
memory DIMMs in the slots. In above case, we are using 16G memory DIMM.
We also tested 1T memory with 64G size for each memory DIMM on another ARM64
machine, the time consuming reduced from 20s to 2s (I think it's related to
firmware implementations).

>>
>> Apologies to Jia that it took 11 revisions to reach this conclusion,
>> but in /my/ opinion, tweaking the fragile memblock/pfn handling code
>> for this reason is totally unjustified, and we're better off
>> disregarding these patches.

Indeed this patch set has a bug, For exampe, if we have 3 regions which
is [a, b] [c, d] [e, f] if address of pfn is bigger than the end address of
last region, we will increase early_region_idx to count of region, which is
out of bound of the regions. Fixed by patch below,

 mm/memblock.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index 8279295..8283bf0 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1252,13 +1252,17 @@ unsigned long __init_memblock 
memblock_next_valid_pfn(unsigned long pfn)
if (pfn >= start_pfn && pfn < end_pfn)
return pfn;

-   early_region_idx++;
+   /* try slow path */
+   if (++early_region_idx == type->cnt)
+   goto slow_path;
+
next_start_pfn = PFN_DOWN(regions[early_region_idx].base);

if (pfn >= end_pfn && pfn <= next_start_pfn)
return next_start_pfn;
}

+slow_path:
/* slow path, do the binary searching */
do {
mid = (right + left) / 2;

As the really impressive speedup on our ARM64 server system, could you 
reconsider
this patch set for merge? if you want more data I'm willing to clarify and give
more test.

Thanks
Hanjun



Re: [PATCH] ACPI/IORT: Fix build without CONFIG_IOMMU_API

2019-05-20 Thread Hanjun Guo
On 2019/5/20 14:57, Christoph Hellwig wrote:
> IOMMU_FWSPEC_PCI_RC_ATS is only defined if CONFIG_IOMMU_API is
> enabled.
> 
> Fixes: 5702ee24182f ("ACPI/IORT: Check ATS capability in root complex nodes")
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/acpi/arm64/iort.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> index 9058cb084b91..3e542b5d2a2d 100644
> --- a/drivers/acpi/arm64/iort.c
> +++ b/drivers/acpi/arm64/iort.c
> @@ -1074,9 +1074,10 @@ const struct iommu_ops *iort_iommu_configure(struct 
> device *dev)
>   info.node = node;
>   err = pci_for_each_dma_alias(to_pci_dev(dev),
>iort_pci_iommu_init, );
> -
> +#ifdef CONFIG_IOMMU_API
>   if (!err && iort_pci_rc_supports_ats(node))
>   dev->iommu_fwspec->flags |= IOMMU_FWSPEC_PCI_RC_ATS;
> +#endif
>   } else {
>   int i = 0;

This was reported, please refer to this patch from Lorenzo:

https://patchwork.kernel.org/patch/10946845/

Thanks
Hanjun



Re: [PATCH v2 3/5] locking/qspinlock: Introduce CNA into the slow path of qspinlock

2019-04-03 Thread Hanjun Guo
Hi Alex,

On 2019/3/29 23:20, Alex Kogan wrote:
> +
> +static __always_inline void cna_init_node(struct mcs_spinlock *node, int 
> cpuid,
> +   u32 tail)
> +{
> + if (decode_numa_node(node->node_and_count) == -1)
> + store_numa_node(node, numa_cpu_node(cpuid));

How about using cpu_to_node() here and #include  in this
file, then the code can be reused for other architectures such as ARM64?

Thanks
Hanjun



Re: [PATCH RFC 1/1] iommu: set the default iommu-dma mode as non-strict

2019-02-26 Thread Hanjun Guo
Hi Jean,

On 2019/1/31 22:55, Jean-Philippe Brucker wrote:
> Hi,
> 
> On 31/01/2019 13:52, Zhen Lei wrote:
>> Currently, many peripherals are faster than before. For example, the top
>> speed of the older netcard is 10Gb/s, and now it's more than 25Gb/s. But
>> when iommu page-table mapping enabled, it's hard to reach the top speed
>> in strict mode, because of frequently map and unmap operations. In order
>> to keep abreast of the times, I think it's better to set non-strict as
>> default.
> 
> Most users won't be aware of this relaxation and will have their system
> vulnerable to e.g. thunderbolt hotplug. See for example 4.3 Deferred
> Invalidation in
> http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2018/MSC/MSC-2018-21.pdf
> 
> Why not keep the policy to secure by default, as we do for
> iommu.passthrough? And maybe add something similar to
> CONFIG_IOMMU_DEFAULT_PASSTRHOUGH? It's easy enough for experts to pass a
> command-line argument or change the default config.

Sorry for the late reply, it was Chinese new year, and we had a long discussion
internally, we are fine to add a Kconfig but not sure OS vendors will set it
to default y.

OS vendors seems not happy to pass a command-line argument, to be honest,
this is our motivation to enable non-strict as default. Hope OS vendors
can see this email thread, and give some input here.

Thanks
Hanjun



Re: [PATCH v6 4/4] perf/smmuv3_pmu: Enable HiSilicon Erratum 162001800 quirk

2019-02-13 Thread Hanjun Guo
On 2019/2/4 20:13, Shameer Kolothum wrote:
> HiSilicon erratum 162001800 describes the limitation of
> SMMUv3 PMCG implementation on HiSilicon Hip08 platforms.
> 
> On these platforms, the PMCG event counter registers
> (SMMU_PMCG_EVCNTRn) are read only and as a result it
> is not possible to set the initial counter period value
> on event monitor start.
> 
> To work around this, the current value of the counter
> is read and used for delta calculations. OEM information
> from ACPI header is used to identify the affected hardware
> platforms.
> 
> Signed-off-by: Shameer Kolothum 
> ---
>  drivers/acpi/arm64/iort.c | 16 ++-
>  drivers/perf/arm_smmuv3_pmu.c | 48 
> ---
>  include/linux/acpi_iort.h |  1 +
>  3 files changed, 57 insertions(+), 8 deletions(-)

For this patch,

Reviewed-by: Hanjun Guo 

Thanks
Hanjun



Re: [RFC PATCH] USB: PCI: set 32bit DMA mask for PCI based USB controllers

2019-02-02 Thread Hanjun Guo
On 2019/2/1 17:13, Hanjun Guo wrote:
> On 2019/2/1 13:55, Hanjun Guo wrote:
>> Hi John,
>>
>> On 2019/1/31 17:54, John Garry wrote:
>>> On 30/01/2019 07:01, Hanjun Guo wrote:
>>>> From: Hanjun Guo 
>> [...]
>>>>
>>>>  drivers/usb/core/hcd-pci.c | 4 
>>>>  1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c
>>>> index 0343246..a9c33e6 100644
>>>> --- a/drivers/usb/core/hcd-pci.c
>>>> +++ b/drivers/usb/core/hcd-pci.c
>>>> @@ -188,6 +188,10 @@ int usb_hcd_pci_probe(struct pci_dev *dev, const 
>>>> struct pci_device_id *id)
>>>>  if (pci_enable_device(dev) < 0)
>>>>  return -ENODEV;
>>>>
>>>> +    retval = dma_coerce_mask_and_coherent(>dev, DMA_BIT_MASK(32));
>>>> +    if (retval)
>>>> +    return retval;
>>>
>>> Hi Hanjun,
>>>
>>> You're missing tidy-up upon failure.
>>
>> Good catch, needs to disable pci for failure, I will send
>> a updated version to address the comments from you and Christoph.
> 
> There is a _DMA method which was introduced in ACPI 6.2 to cover
> this, I will try that solution then report back.

We add a _DMA method under the hostbridge which the EHCI/OHCI being
connected to, the calltrace is gone and EHCI works as expected.

No need for kernel change, so this patch is dropped.

Thanks
Hanjun




Re: [RFC PATCH] USB: PCI: set 32bit DMA mask for PCI based USB controllers

2019-02-01 Thread Hanjun Guo
On 2019/2/1 13:55, Hanjun Guo wrote:
> Hi John,
> 
> On 2019/1/31 17:54, John Garry wrote:
>> On 30/01/2019 07:01, Hanjun Guo wrote:
>>> From: Hanjun Guo 
> [...]
>>>
>>>  drivers/usb/core/hcd-pci.c | 4 
>>>  1 file changed, 4 insertions(+)
>>>
>>> diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c
>>> index 0343246..a9c33e6 100644
>>> --- a/drivers/usb/core/hcd-pci.c
>>> +++ b/drivers/usb/core/hcd-pci.c
>>> @@ -188,6 +188,10 @@ int usb_hcd_pci_probe(struct pci_dev *dev, const 
>>> struct pci_device_id *id)
>>>  if (pci_enable_device(dev) < 0)
>>>  return -ENODEV;
>>>
>>> +    retval = dma_coerce_mask_and_coherent(>dev, DMA_BIT_MASK(32));
>>> +    if (retval)
>>> +    return retval;
>>
>> Hi Hanjun,
>>
>> You're missing tidy-up upon failure.
> 
> Good catch, needs to disable pci for failure, I will send
> a updated version to address the comments from you and Christoph.

There is a _DMA method which was introduced in ACPI 6.2 to cover
this, I will try that solution then report back.

Thanks
Hanjun




Re: [RFC PATCH] USB: PCI: set 32bit DMA mask for PCI based USB controllers

2019-01-31 Thread Hanjun Guo
Hi John,

On 2019/1/31 17:54, John Garry wrote:
> On 30/01/2019 07:01, Hanjun Guo wrote:
>> From: Hanjun Guo 
[...]
>>
>>  drivers/usb/core/hcd-pci.c | 4 
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c
>> index 0343246..a9c33e6 100644
>> --- a/drivers/usb/core/hcd-pci.c
>> +++ b/drivers/usb/core/hcd-pci.c
>> @@ -188,6 +188,10 @@ int usb_hcd_pci_probe(struct pci_dev *dev, const struct 
>> pci_device_id *id)
>>  if (pci_enable_device(dev) < 0)
>>  return -ENODEV;
>>
>> +    retval = dma_coerce_mask_and_coherent(>dev, DMA_BIT_MASK(32));
>> +    if (retval)
>> +    return retval;
> 
> Hi Hanjun,
> 
> You're missing tidy-up upon failure.

Good catch, needs to disable pci for failure, I will send
a updated version to address the comments from you and Christoph.

Thanks
Hanjun



Re: [RFC PATCH] USB: PCI: set 32bit DMA mask for PCI based USB controllers

2019-01-30 Thread Hanjun Guo
Hi Christoph,

On 2019/1/30 15:40, Christoph Hellwig wrote:
> On Wed, Jan 30, 2019 at 03:01:54PM +0800, Hanjun Guo wrote:
>> This is the RFC version, I'm not sure this is the best solution,
>> comments are warmly welcomed.
>>
>> Thanks
>> Hanjun
>>
>>  drivers/usb/core/hcd-pci.c | 4 
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c
>> index 0343246..a9c33e6 100644
>> --- a/drivers/usb/core/hcd-pci.c
>> +++ b/drivers/usb/core/hcd-pci.c
>> @@ -188,6 +188,10 @@ int usb_hcd_pci_probe(struct pci_dev *dev, const struct 
>> pci_device_id *id)
>>  if (pci_enable_device(dev) < 0)
>>  return -ENODEV;
>>  
>> +retval = dma_coerce_mask_and_coherent(>dev, DMA_BIT_MASK(32));
>> +if (retval)
>> +return retval;
> 
> dma_coerce_mask_and_coherent is only for platform devices (and I'm
> not sure it is a good idea to start with, but that is a different
> story).

Yes, that's why this is the RFC version.

> 
> PCI device should have the dma_mask pointer set already, so you should
> use dma_set_mask_and_coherent here.

I will wait for a while to see if more comments, if not, I will update
my patch as you suggested.

Thanks
Hanjun




[RFC PATCH] USB: PCI: set 32bit DMA mask for PCI based USB controllers

2019-01-29 Thread Hanjun Guo
From: Hanjun Guo 

We met an issue that when we update the IORT table to revision D,
and the kernel update to 4.19, the USB on D06 (ARM64 based server)
will probe fail:

[   13.495751] CPU: 0 PID: 15 Comm: kworker/0:1 Not tainted 
4.19.0-00115-gb2b5200 #5
[   13.503219] Hardware name: Huawei D06/D06, BIOS Hisilicon D06 UEFI RC0 - 
V1.09.02 12/25/2018
[   13.511645] Workqueue: events work_for_cpu_fn
[   13.515989] pstate: a0c9 (NzCv daif +PAN +UAO)
[   13.520767] pc : dma_pool_alloc+0x218/0x270
[   13.524937] lr : dma_pool_alloc+0xa0/0x270
[   13.529019] sp : 09e23b20
[   13.532320] x29: 09e23b20 x28: 8027c58ad098 
[   13.537619] x27: 1000 x26: 8027d7a790a8 
[   13.542918] x25: 08fa7000 x24: 09e23bc0 
[   13.548216] x23: 006000c0 x22: 8027c58ad010 
[   13.553515] x21: 097e1000 x20: 8027c58ad000 
[   13.558814] x19: 8027c58ad080 x18:  
[   13.564112] x17:  x16: 7fff 
[   13.569411] x15: 097e16c8 x14: 8027c5d39885 
[   13.574709] x13: 8027c5d39884 x12: 0038 
[   13.580008] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f 
[   13.585307] x9 :  x8 : 8027c587c400 
[   13.590605] x7 :  x6 : 003f 
[   13.595904] x5 : 8027dc5b8000 x4 : 8027e09b91e0 
[   13.601202] x3 : 008d2280 x2 : 8027c58ad100 
[   13.606501] x1 : 0028 x0 :  
[   13.611800] Call trace:
[   13.614234]  dma_pool_alloc+0x218/0x270
[   13.617710] ata1: SATA link down (SStatus 0 SControl 300)
[   13.618059]  ehci_qh_alloc+0x5c/0xf8
[   13.627002]  ehci_setup+0x17c/0x4b8
[   13.630478]  ehci_pci_setup+0x18c/0x5b8
[   13.634301]  usb_add_hcd+0x290/0x7a0
[   13.637863]  usb_hcd_pci_probe+0x2cc/0x3e8
[   13.641946]  ehci_pci_probe+0x34/0x48
[   13.645596]  local_pci_probe+0x3c/0xb0
[   13.649331]  work_for_cpu_fn+0x18/0x28
[   13.653067]  process_one_work+0x1e4/0x458
[   13.657063]  worker_thread+0x228/0x450
[   13.660798]  kthread+0x12c/0x130
[   13.664014]  ret_from_fork+0x10/0x18
[   13.667577] ---[ end trace 6f8757456e2ec456 ]---

It turns out the the IORT revision D introduce the DMA address
limit size for PCI RC and in commit 5ac65e8c8941 ("ACPI/IORT: Support
address size limit for root complexes"), will set the DMA mask
for the RC and that will be inherited by device under the RC.

D06 only enables 1 RC but has EPs with different DMA address sizes,
for USB it use 32bit DMA, and 64bit for HNS and SAS, so this will
cause probe failure if we use 64bit DMA for USB controllers.

Set the DMA mask to 32bit for PCI based USB controllers,
EHCI and OHCI USB controllers are using 32bit DMA address,
XHCI will set the DMA mask in its probe after the pci probe,
so it's safe just add dma_coerce_mask_and_coherent() in
usb_hcd_pci_probe().

Signed-off-by: Hanjun Guo 
---
Hi all,

This is the RFC version, I'm not sure this is the best solution,
comments are warmly welcomed.

Thanks
Hanjun

 drivers/usb/core/hcd-pci.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c
index 0343246..a9c33e6 100644
--- a/drivers/usb/core/hcd-pci.c
+++ b/drivers/usb/core/hcd-pci.c
@@ -188,6 +188,10 @@ int usb_hcd_pci_probe(struct pci_dev *dev, const struct 
pci_device_id *id)
if (pci_enable_device(dev) < 0)
return -ENODEV;
 
+   retval = dma_coerce_mask_and_coherent(>dev, DMA_BIT_MASK(32));
+   if (retval)
+   return retval;
+
/*
 * The xHCI driver has its own irq management
 * make sure irq setup is not touched for xhci in generic hcd code
-- 
1.7.12.4



Re: [PATCH v2 0/4] add support for MBIGEN generating message based SPIs

2019-01-09 Thread Hanjun Guo
Hi Marc,

Sorry for ping you...

On 2018/10/26 15:51, Yang Yingliang wrote:
> Now MBIGENs have pins that used to generate SPIs,
> with
> 5052875 ("irqchip/gic-v3: Add support for Message Based Interrupts as an MSI 
> controller"),
> we can support MBIGEN to generate message based SPIs by writing
> GICD_SETSPIR. This patchset add support for MBIGEN generating
> message based SPIs and a bugfix for MBI driver.

What's you suggestion for this patch set? We would like your input
for next steps.

> 
> 
> Patch #1 is a bugfix for MBI driver.

At least this patch can be merged first as it's an obvious bug,
without this patch, the MBI dirver can't work I think because
the mutex lock is not initialized.

Thanks
Hanjun



Re: [PATCH -next] ACPI/IORT: fix build when CONFIG_IOMMU_API=n

2018-12-24 Thread Hanjun Guo
Hi Qian,

Good catch, minor comments below.

On 2018/12/25 1:20, Qian Cai wrote:
> rivers/acpi/arm64/iort.c:880:1: error: expected identifier or '(' before '{' 
> token
  ^^
  drivers

>  { return NULL; }
>  ^
> drivers/acpi/arm64/iort.c:879:39: warning: 'iort_fwspec_iommu_ops' used but 
> never defined
>  static inline const struct iommu_ops *iort_fwspec_iommu_ops(struct device 
> *dev);
>^
> Signed-off-by: Qian Cai 
> ---
>  drivers/acpi/arm64/iort.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> index fdd90ffceb85..5d29783ee5bd 100644
> --- a/drivers/acpi/arm64/iort.c
> +++ b/drivers/acpi/arm64/iort.c
> @@ -876,7 +876,7 @@ int iort_iommu_msi_get_resv_regions(struct device *dev, 
> struct list_head *head)
>   return (resv == its->its_count) ? resv : -ENODEV;
>  }
>  #else
> -static inline const struct iommu_ops *iort_fwspec_iommu_ops(struct device 
> *dev);
> +static inline const struct iommu_ops *iort_fwspec_iommu_ops(struct device 
> *dev)
>  { return NULL; }
>  static inline int iort_add_device_replay(const struct iommu_ops *ops,
>struct device *dev)

Acked-by: Hanjun Guo 

Lorenzo, I think this is 4.21-rc1 material if it's OK for you.

Thanks
Hanjun



Re: [PATCH v11 6/7] ACPI/IORT: Stub out ACS functions when CONFIG_PCI is not set

2018-12-17 Thread Hanjun Guo
On 2018/12/18 10:56, Sinan Kaya wrote:
> Remove PCI dependent code out of iort.c when CONFIG_PCI is not defined.
> A quick search reveals the following functions:
> 1. pci_request_acs()
> 2. pci_domain_nr()
> 3. pci_is_root_bus()
> 4. to_pci_dev()
> 
> Both pci_domain_nr() and pci_is_root_bus() are defined in linux/pci.h.
> pci_domain_nr() is a stub function when CONFIG_PCI is not set and
> pci_is_root_bus() just returns a reference to a structure member which
> is still valid without CONFIG_PCI set.
> 
> to_pci_dev() is a macro that expands to container_of.
> 
> pci_request_acs() is the only code that gets pulled in from drivers/pci/*.c

Actually we have pci_for_each_dma_alias() too which is from
drivers/pci/search.c without stub function in linux/pci.h, but I
didn't get the compile error at link time, I think the compiler
just do some optimization to remove the dead code because
dev_is_pci() is obvious false.

Thanks
Hanjun



Re: [PATCH v10 6/6] ACPI/IORT: Stub out ACS functions when CONFIG_PCI is not set

2018-12-17 Thread Hanjun Guo
Hi Sinan,

On 2018/12/15 9:02, Sinan Kaya wrote:
> Remove PCI dependent code out of iort.c when CONFIG_PCI is not defined.
> 
> Signed-off-by: Sinan Kaya 
> ---
>  drivers/acpi/arm64/iort.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> index 70f4e80b9246..d0f68607efe6 100644
> --- a/drivers/acpi/arm64/iort.c
> +++ b/drivers/acpi/arm64/iort.c
> @@ -1437,6 +1437,7 @@ static int __init iort_add_platform_device(struct 
> acpi_iort_node *node,
>  
>  static bool __init iort_enable_acs(struct acpi_iort_node *iort_node)
>  {
> +#ifdef CONFIG_PCI
>   if (iort_node->type == ACPI_IORT_NODE_PCI_ROOT_COMPLEX) {
>   struct acpi_iort_node *parent;
>   struct acpi_iort_id_mapping *map;
> @@ -1462,6 +1463,7 @@ static bool __init iort_enable_acs(struct 
> acpi_iort_node *iort_node)
>   }
>   }
>   }
> +#endif

I prefer stub function for iort_enable_acs(), not adding #ifdef/#endif pair 
inside
this function.

By the way, there are other pci function called in iort.c, could you explain a
little bit why no need to update other function calls in commit message?

Thanks
Hanjun



Re: [PATCH 4/4] ACPI/IORT: Don't call iommu_ops->add_device directly

2018-12-17 Thread Hanjun Guo
On 2018/12/11 23:05, Joerg Roedel wrote:
> From: Joerg Roedel 
> 
> Make sure to invoke this call-back through the proper
> function of the IOMMU-API.
> 
> Signed-off-by: Joerg Roedel 
> ---
>  drivers/acpi/arm64/iort.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Robin and Lorenzo know this better than me, but it's
pretty straight forward to me,

Acked-by: Hanjun Guo 

Thanks
Hanjun



Re: [PATCH 3/6] ACPI/IORT: Use device_iommu_mapped()

2018-12-17 Thread Hanjun Guo
On 2018/12/11 21:43, Joerg Roedel wrote:
> From: Joerg Roedel 
> 
> Replace the iommu-check with a proper and readable function
> call.
> 
> Cc: Lorenzo Pieralisi 
> Acked-by: Robin Murphy 
> Signed-off-by: Joerg Roedel 

Acked-by: Hanjun Guo 

Thanks
Hanjun



Re: [PATCH 2/9] ACPI/IORT: Use helper functions to access dev->iommu_fwspec

2018-12-17 Thread Hanjun Guo
On 2018/12/11 20:19, Joerg Roedel wrote:
> From: Joerg Roedel 
> 
> Use the new helpers dev_iommu_fwspec_get()/set() to access
> the dev->iommu_fwspec pointer. This makes it easier to move
> that pointer later into another struct.
> 
> Cc: Lorenzo Pieralisi 
> Signed-off-by: Joerg Roedel 
> ---
>  drivers/acpi/arm64/iort.c | 19 ++-
>  1 file changed, 10 insertions(+), 9 deletions(-)

Acked-by: Hanjun Guo 

Thanks
Hanjun



Re: [PATCH 3/4] irqchip/mbigen: add support for a MBIGEN generating SPIs

2018-10-18 Thread Hanjun Guo
On 2018/10/18 19:56, Marc Zyngier wrote:
> Hi Hanjun,
> 
> On 18/10/18 12:20, Hanjun Guo wrote:
>> Hi Marc,
>>
>>>>>>
>>>>> Now, the biggest question of them all: how does it work with ACPI? Last
>>>>> time I checked, there was no DT support for platforms using the MBIGEN.
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/boot/dts/hisilicon/hip07.dtsi
>>>> This DT describes how platform devices using the MBIGEN.
>>>
>>> That's not how my own D05 boots. It is ACPI only. How is that going to
>>> work on this platform?
>>
>> D05 is ACPI only so it has no dtb in the firmware, that's why we can
>> boot D05 without acpi=on in the boot cmdline, but if you want to
>> boot D05 with dtb, you could try to specify the dtb in the grub
>> (seems centos based):
>>
>> menuentry "example" --id example{
>> linux /Image-kernel root=... rdinit=...
>> initrd /example.img
>> devicetree /d05.dtb
>> }
> 
> Sure. But what I'm asking here is how this change in the MBIGEN driver
> can be beneficial to people who need ACPI, for better or worse? For
> example, you cannot get the PCIe SMMU without ACPI. Good-bye VFIO.
> 
> If it is DT only, I seriously doubt anyone will be able to use it.

We have another ARM64 SoC for wireless base station which with the
same CPU DIE but different IO DIE, they share the same mbigen controller.

Since we use big-endian on wireless base station so only DT is available
(EFI stub is little-endian), we need to reuse this mbigen driver
on this SoC (not the one for D05) as well, not sure if this makes sense
to you, please let us know.

Thanks
Hanjun



Re: [PATCH 3/4] irqchip/mbigen: add support for a MBIGEN generating SPIs

2018-10-18 Thread Hanjun Guo
On 2018/10/18 19:56, Marc Zyngier wrote:
> Hi Hanjun,
> 
> On 18/10/18 12:20, Hanjun Guo wrote:
>> Hi Marc,
>>
>>>>>>
>>>>> Now, the biggest question of them all: how does it work with ACPI? Last
>>>>> time I checked, there was no DT support for platforms using the MBIGEN.
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/boot/dts/hisilicon/hip07.dtsi
>>>> This DT describes how platform devices using the MBIGEN.
>>>
>>> That's not how my own D05 boots. It is ACPI only. How is that going to
>>> work on this platform?
>>
>> D05 is ACPI only so it has no dtb in the firmware, that's why we can
>> boot D05 without acpi=on in the boot cmdline, but if you want to
>> boot D05 with dtb, you could try to specify the dtb in the grub
>> (seems centos based):
>>
>> menuentry "example" --id example{
>> linux /Image-kernel root=... rdinit=...
>> initrd /example.img
>> devicetree /d05.dtb
>> }
> 
> Sure. But what I'm asking here is how this change in the MBIGEN driver
> can be beneficial to people who need ACPI, for better or worse? For
> example, you cannot get the PCIe SMMU without ACPI. Good-bye VFIO.
> 
> If it is DT only, I seriously doubt anyone will be able to use it.

We have another ARM64 SoC for wireless base station which with the
same CPU DIE but different IO DIE, they share the same mbigen controller.

Since we use big-endian on wireless base station so only DT is available
(EFI stub is little-endian), we need to reuse this mbigen driver
on this SoC (not the one for D05) as well, not sure if this makes sense
to you, please let us know.

Thanks
Hanjun



  1   2   3   4   5   6   7   8   9   10   >