Re: [PATCH v14 01/11] x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN

2021-03-28 Thread chenzhou



On 2021/3/2 15:43, Baoquan He wrote:
> On 02/26/21 at 09:38am, Eric W. Biederman wrote:
>> chenzhou  writes:
>>
>>> On 2021/2/25 15:25, Baoquan He wrote:
>>>> On 02/24/21 at 02:19pm, Catalin Marinas wrote:
>>>>> On Sat, Jan 30, 2021 at 03:10:15PM +0800, Chen Zhou wrote:
>>>>>> Move CRASH_ALIGN to header asm/kexec.h for later use. Besides, the
>>>>>> alignment of crash kernel regions in x86 is 16M(CRASH_ALIGN), but
>>>>>> function reserve_crashkernel() also used 1M alignment. So just
>>>>>> replace hard-coded alignment 1M with macro CRASH_ALIGN.
>>>>> [...]
>>>>>> @@ -510,7 +507,7 @@ static void __init reserve_crashkernel(void)
>>>>>>  } else {
>>>>>>  unsigned long long start;
>>>>>>  
>>>>>> -start = memblock_phys_alloc_range(crash_size, SZ_1M, 
>>>>>> crash_base,
>>>>>> +start = memblock_phys_alloc_range(crash_size, 
>>>>>> CRASH_ALIGN, crash_base,
>>>>>>crash_base + 
>>>>>> crash_size);
>>>>>>  if (start != crash_base) {
>>>>>>  pr_info("crashkernel reservation failed - 
>>>>>> memory is in use.\n");
>>>>> There is a small functional change here for x86. Prior to this patch,
>>>>> crash_base passed by the user on the command line is allowed to be 1MB
>>>>> aligned. With this patch, such reservation will fail.
>>>>>
>>>>> Is the current behaviour a bug in the current x86 code or it does allow
>>>>> 1MB-aligned reservations?
>>>> Hmm, you are right. Here we should keep 1MB alignment as is because
>>>> users specify the address and size, their intention should be respected.
>>>> The 1MB alignment for fixed memory region reservation was introduced in
>>>> below commit, but it doesn't tell what is Eric's request at that time, I
>>>> guess it meant respecting users' specifying.
>>
>>> I think we could make the alignment unified. Why is the alignment system 
>>> reserved and
>>> user specified different? Besides, there is no document about the 1MB 
>>> alignment.
>>> How about adding the alignment size(16MB) in doc  if user specified
>>> start address as arm64 does.
>> Looking at what the code is doing.  Attempting to reserve a crash region
>> at the location the user specified.  Adding unnecessary alignment
>> constraints is totally broken. 
>>
>> I am not even certain enforcing a 1MB alignment makes sense.  I suspect
>> it was added so that we don't accidentally reserve low memory on x86.
>> Frankly I am not even certain that makes sense.
>>
>> Now in practice there might be an argument for 2MB alignment that goes
>> with huge page sizes on x86.  But until someone finds that there are
>> actual problems with 1MB alignment I would not touch it.
>>
>> The proper response to something that isn't documented and confusing is
>> not to arbitrarily change it and risk breaking users.  Especially in
>> this case where it is clear that adding additional alignment is total
>> nonsense.  The proper response to something that isn't clear and
>> documented is to dig in and document it, or to leave it alone and let it
> Sounds reasonable. Then adding document or code comment around looks
> like a good way to go further so that people can easily get why its
> alignment is different than other reservation.
Hi Baoquan & Eric,

Sorry for late reply, i missed it earlier.

Thanks for your explanation, i will just leave the 1MB alignment here as is.

I will introduce CRASH_ALIGN_SPECIFIED to help make function 
reserve_crashkernel generic.
CRASH_ALIGN_SPECIFIED is used for user specified start address which is 
distinct from
default CRASH_ALIGN.

Thanks,
Chen Zhou
>
>> be the next persons problem.
>>
>> In this case there is no reason for changing this bit of code.
>> All CRASH_ALIGN is about is a default alignment when none is specified.
>> It is not a functional requirement but just something so that things
>> come out nicely.
>>
>>
>> Eric
>>
> .
>



Re: [PATCH v14 08/11] arm64: kdump: reimplement crashkernel=X

2021-02-26 Thread chenzhou



On 2021/2/26 18:31, chenzhou wrote:
>
> On 2021/2/25 0:04, Catalin Marinas wrote:
>> On Sat, Jan 30, 2021 at 03:10:22PM +0800, Chen Zhou wrote:
>>> There are following issues in arm64 kdump:
>>> 1. We use crashkernel=X to reserve crashkernel below 4G, which
>>> will fail when there is no enough low memory.
>>> 2. If reserving crashkernel above 4G, in this case, crash dump
>>> kernel will boot failure because there is no low memory available
>>> for allocation.
>>>
>>> To solve these issues, change the behavior of crashkernel=X and
>>> introduce crashkernel=X,[high,low]. crashkernel=X tries low allocation
>>> in DMA zone, and fall back to high allocation if it fails.
>>> We can also use "crashkernel=X,high" to select a region above DMA zone,
>>> which also tries to allocate at least 256M in DMA zone automatically.
>>> "crashkernel=Y,low" can be used to allocate specified size low memory.
>>>
>>> Another minor change, there may be two regions reserved for crash
>>> dump kernel, in order to distinct from the high region and make no
>>> effect to the use of existing kexec-tools, rename the low region as
>>> "Crash kernel (low)".
>> I think we discussed this but I don't remember the conclusion. Is this
>> only renamed conditionally so that we don't break current kexec-tools?
> Yes.
>> IOW, assuming that the full crashkernel region is reserved below 4GB,
>> does the "(low)" suffix still appear or it's only if a high region is
>> additionally reserved?
> Suffix "low" only appear if a high region is additionally reserved.
>>> diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
>>> index 3f6ecae0bc68..f0caed0cb5e1 100644
>>> --- a/arch/arm64/include/asm/kexec.h
>>> +++ b/arch/arm64/include/asm/kexec.h
>>> @@ -96,6 +96,10 @@ static inline void crash_prepare_suspend(void) {}
>>>  static inline void crash_post_resume(void) {}
>>>  #endif
>>>  
>>> +#ifdef CONFIG_KEXEC_CORE
>>> +extern void __init reserve_crashkernel(void);
>>> +#endif
>> Why not have this in some generic header?
>>
>>> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>>> index c18aacde8bb0..69c592c546de 100644
>>> --- a/arch/arm64/kernel/setup.c
>>> +++ b/arch/arm64/kernel/setup.c
>>> @@ -238,7 +238,18 @@ static void __init request_standard_resources(void)
>>> kernel_data.end <= res->end)
>>> request_resource(res, _data);
>>>  #ifdef CONFIG_KEXEC_CORE
>>> -   /* Userspace will find "Crash kernel" region in /proc/iomem. */
>>> +   /*
>>> +* Userspace will find "Crash kernel" or "Crash kernel (low)"
>>> +* region in /proc/iomem.
>>> +* In order to distinct from the high region and make no effect
>>> +* to the use of existing kexec-tools, rename the low region as
>>> +* "Crash kernel (low)".
>>> +*/
>>> +   if (crashk_low_res.end && crashk_low_res.start >= res->start &&
>>> +   crashk_low_res.end <= res->end) {
>>> +   crashk_low_res.name = "Crash kernel (low)";
>>> +   request_resource(res, _low_res);
>>> +   }
>>> if (crashk_res.end && crashk_res.start >= res->start &&
>>> crashk_res.end <= res->end)
>>> request_resource(res, _res);
>> My reading of the new generic reserve_crashkernel() is that
>> crashk_low_res will only be populated if crask_res is above 4GB. If
>> that's correct, I'm fine with the renaming here since current systems
>> would not get a renamed low reservation (as long as they don't change
>> the kernel cmdline).
>>
>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>>> index 912f64f505f7..d20f5c444ebf 100644
>>> --- a/arch/arm64/mm/init.c
>>> +++ b/arch/arm64/mm/init.c
>>> @@ -35,6 +35,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>  #include 
>>>  #include 
>>>  #include 
>>> @@ -61,66 +62,11 @@ EXPORT_SYMBOL(memstart_addr);
>>>   */
>>>  phys_addr_t arm64_dma_phys_limit __ro_after_init;
>>>  
>>> -#ifdef CONFIG_KEXEC_CORE
>>> -/*

Re: [PATCH v14 08/11] arm64: kdump: reimplement crashkernel=X

2021-02-26 Thread chenzhou



On 2021/2/25 0:04, Catalin Marinas wrote:
> On Sat, Jan 30, 2021 at 03:10:22PM +0800, Chen Zhou wrote:
>> There are following issues in arm64 kdump:
>> 1. We use crashkernel=X to reserve crashkernel below 4G, which
>> will fail when there is no enough low memory.
>> 2. If reserving crashkernel above 4G, in this case, crash dump
>> kernel will boot failure because there is no low memory available
>> for allocation.
>>
>> To solve these issues, change the behavior of crashkernel=X and
>> introduce crashkernel=X,[high,low]. crashkernel=X tries low allocation
>> in DMA zone, and fall back to high allocation if it fails.
>> We can also use "crashkernel=X,high" to select a region above DMA zone,
>> which also tries to allocate at least 256M in DMA zone automatically.
>> "crashkernel=Y,low" can be used to allocate specified size low memory.
>>
>> Another minor change, there may be two regions reserved for crash
>> dump kernel, in order to distinct from the high region and make no
>> effect to the use of existing kexec-tools, rename the low region as
>> "Crash kernel (low)".
> I think we discussed this but I don't remember the conclusion. Is this
> only renamed conditionally so that we don't break current kexec-tools?
Yes.
>
> IOW, assuming that the full crashkernel region is reserved below 4GB,
> does the "(low)" suffix still appear or it's only if a high region is
> additionally reserved?
Suffix "low" only appear if a high region is additionally reserved.
>
>> diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
>> index 3f6ecae0bc68..f0caed0cb5e1 100644
>> --- a/arch/arm64/include/asm/kexec.h
>> +++ b/arch/arm64/include/asm/kexec.h
>> @@ -96,6 +96,10 @@ static inline void crash_prepare_suspend(void) {}
>>  static inline void crash_post_resume(void) {}
>>  #endif
>>  
>> +#ifdef CONFIG_KEXEC_CORE
>> +extern void __init reserve_crashkernel(void);
>> +#endif
> Why not have this in some generic header?
>
>> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>> index c18aacde8bb0..69c592c546de 100644
>> --- a/arch/arm64/kernel/setup.c
>> +++ b/arch/arm64/kernel/setup.c
>> @@ -238,7 +238,18 @@ static void __init request_standard_resources(void)
>>  kernel_data.end <= res->end)
>>  request_resource(res, _data);
>>  #ifdef CONFIG_KEXEC_CORE
>> -/* Userspace will find "Crash kernel" region in /proc/iomem. */
>> +/*
>> + * Userspace will find "Crash kernel" or "Crash kernel (low)"
>> + * region in /proc/iomem.
>> + * In order to distinct from the high region and make no effect
>> + * to the use of existing kexec-tools, rename the low region as
>> + * "Crash kernel (low)".
>> + */
>> +if (crashk_low_res.end && crashk_low_res.start >= res->start &&
>> +crashk_low_res.end <= res->end) {
>> +crashk_low_res.name = "Crash kernel (low)";
>> +request_resource(res, _low_res);
>> +}
>>  if (crashk_res.end && crashk_res.start >= res->start &&
>>  crashk_res.end <= res->end)
>>  request_resource(res, _res);
> My reading of the new generic reserve_crashkernel() is that
> crashk_low_res will only be populated if crask_res is above 4GB. If
> that's correct, I'm fine with the renaming here since current systems
> would not get a renamed low reservation (as long as they don't change
> the kernel cmdline).
>
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index 912f64f505f7..d20f5c444ebf 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -35,6 +35,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  #include 
>>  #include 
>>  #include 
>> @@ -61,66 +62,11 @@ EXPORT_SYMBOL(memstart_addr);
>>   */
>>  phys_addr_t arm64_dma_phys_limit __ro_after_init;
>>  
>> -#ifdef CONFIG_KEXEC_CORE
>> -/*
>> - * reserve_crashkernel() - reserves memory for crash kernel
>> - *
>> - * This function reserves memory area given in "crashkernel=" kernel command
>> - * line parameter. The memory reserved is used by dump capture kernel when
>> - * primary kernel is crashing.
>> - */
>> +#ifndef CONFIG_KEXEC_CORE
>>  static void __init reserve_crashkernel(void)
>>  {
> [...]
>>  }
>> +#endif
> Can we not have the dummy reserve_crashkernel() in the generic code as
> well and avoid the #ifndef here?
You mean put the dummy reserve_crashkernel() and the relate function 
declaration in some generic header?
 
Baoquan also mentioned about this.
Now all the arch that support kdump have the dummy reserve_crashkernel() and
function declaration, such as arm/arm64/ppc/s390..

But currently different arch may have different CONFIG and different function 
declaration about this,
for example,

for s390,
static void __init reserve_crashkernel(void)
{  
#ifdef CONFIG_CRASH_DUMP
...
#endif
}

for 

Re: [PATCH v14 02/11] x86: kdump: make the lower bound of crash kernel reservation consistent

2021-02-25 Thread chenzhou



On 2021/2/25 23:44, Baoquan He wrote:
> On 02/25/21 at 02:42pm, Catalin Marinas wrote:
>> On Thu, Feb 25, 2021 at 03:08:46PM +0800, Baoquan He wrote:
>>> On 02/24/21 at 02:35pm, Catalin Marinas wrote:
 On Sat, Jan 30, 2021 at 03:10:16PM +0800, Chen Zhou wrote:
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index da769845597d..27470479e4a3 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -439,7 +439,8 @@ static int __init reserve_crashkernel_low(void)
>   return 0;
>   }
>  
> - low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 0, 
> CRASH_ADDR_LOW_MAX);
> + low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, CRASH_ALIGN,
> + CRASH_ADDR_LOW_MAX);
>   if (!low_base) {
>   pr_err("Cannot reserve %ldMB crashkernel low memory, please try 
> smaller size.\n",
>  (unsigned long)(low_size >> 20));
 Is there any reason why the lower bound can't be 0 in all low cases
 here? (Sorry if it's been already discussed, I lost track)
>>> Seems like a good question.
>>>
>>> This reserve_crashkernel_low(), paired with reserve_crashkernel_high(), is
>>> used to reserve memory under 4G so that kdump kernel owns memory for dma
>>> buffer allocation. In that case, kernel usually is loaded in high
>>> memory. In x86_64, kernel loading need be aligned to 16M because of
>>> CONFIG_PHYSICAL_START, please see commit 32105f7fd8faa7b ("x86: find
>>> offset for crashkernel reservation automatically"). But for crashkernel
>>> low memory, there seems to be no reason to ask for 16M alignment, if
>>> it's taken as dma buffer memory.
>>>
>>> So we can make a different alignment for low memory only, e.g 2M. But
>>> 16M alignment consistent with crashkernel,high is also fine to me. The
>>> only affect is smaller alignment can increase the possibility of
>>> crashkernel low reservation.
>> I don't mind the 16M alignment in both low and high base. But is there
>> any reason that the lower bound (third argument) cannot be 0 in both
>> reserve_crashkernel() (the low attempt) and reserve_crashkernel_low()
>> cases? The comment in reserve_crashkernel() only talks about the 4G
>> upper bound but not why we need a 16M lower bound.
> Ah, sorry, I must have mixed this one with the alignment of fixed
> memory region reservation in patch 1 when considering comments.
>
> Hmm, in x86 we always have memory reserved in low 1M, lower bound
> being 0 or 16M (kernel alignment) doesn't make difference on crashkernel
> low reservation. But for crashkernel reservation, the reason should be
> kernel loading alignment being 16M, please see commit 32105f7fd8faa7b
> ("x86: find offset for crashkernel reservation automatically").
Sorry, i didn't mention in the commit message about this.

We discussed about this and the CRASH_ALIGN sounds better, so just use 
CRASH_ALIGN.
https://lkml.org/lkml/2020/9/4/82

Thanks,
Chen Zhou
>
> So, for crashkernel low, keeping lower bound as 0 looks good to me, the
> only reason is just as patch log tells. And it can skip the unnecessary
> memblock searching under 16M since it will always fail, even though it
> won't matter much. Or changing it to CRASH_ALIGN as this patch is doing,
> and adding code comment, is also fine to me.
>
> Thanks
> Baoquan
>
> .
>



Re: [PATCH v14 01/11] x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN

2021-02-25 Thread chenzhou



On 2021/2/25 15:25, Baoquan He wrote:
> On 02/24/21 at 02:19pm, Catalin Marinas wrote:
>> On Sat, Jan 30, 2021 at 03:10:15PM +0800, Chen Zhou wrote:
>>> Move CRASH_ALIGN to header asm/kexec.h for later use. Besides, the
>>> alignment of crash kernel regions in x86 is 16M(CRASH_ALIGN), but
>>> function reserve_crashkernel() also used 1M alignment. So just
>>> replace hard-coded alignment 1M with macro CRASH_ALIGN.
>> [...]
>>> @@ -510,7 +507,7 @@ static void __init reserve_crashkernel(void)
>>> } else {
>>> unsigned long long start;
>>>  
>>> -   start = memblock_phys_alloc_range(crash_size, SZ_1M, crash_base,
>>> +   start = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, 
>>> crash_base,
>>>   crash_base + crash_size);
>>> if (start != crash_base) {
>>> pr_info("crashkernel reservation failed - memory is in 
>>> use.\n");
>> There is a small functional change here for x86. Prior to this patch,
>> crash_base passed by the user on the command line is allowed to be 1MB
>> aligned. With this patch, such reservation will fail.
>>
>> Is the current behaviour a bug in the current x86 code or it does allow
>> 1MB-aligned reservations?
> Hmm, you are right. Here we should keep 1MB alignment as is because
> users specify the address and size, their intention should be respected.
> The 1MB alignment for fixed memory region reservation was introduced in
> below commit, but it doesn't tell what is Eric's request at that time, I
> guess it meant respecting users' specifying.
I think we could make the alignment unified. Why is the alignment system 
reserved and
user specified different? Besides, there is no document about the 1MB alignment.
How about adding the alignment size(16MB) in doc  if user specified  start 
address as arm64 does.

Thanks,
Chen Zhou
>
> commit 44280733e71ad15377735b42d8538c109c94d7e3
> Author: Yinghai Lu 
> Date:   Sun Nov 22 17:18:49 2009 -0800
>
> x86: Change crash kernel to reserve via reserve_early()
> 
> use find_e820_area()/reserve_early() instead.
> 
> -v2: address Eric's request, to restore original semantics.
>  will fail, if the provided address can not be used.
>
> .
>



Re: [PATCH v14 11/11] kdump: update Documentation about crashkernel

2021-02-19 Thread chenzhou



On 2021/2/18 16:40, Baoquan He wrote:
> On 01/30/21 at 03:10pm, Chen Zhou wrote:
>> For arm64, the behavior of crashkernel=X has been changed, which
>> tries low allocation in DMA zone and fall back to high allocation
>> if it fails.
>>
>> We can also use "crashkernel=X,high" to select a high region above
>> DMA zone, which also tries to allocate at least 256M low memory in
>> DMA zone automatically and "crashkernel=Y,low" can be used to allocate
>> specified size low memory.
>>
>> So update the Documentation.
> Nice document adding which also takes care of x86 code implementation,
> thanks. By the way, maybe you can remove John's 'Tested-by' since it
> doesn't make much sense to test a document patch.
I will remove the Tested-by in next version.
>
> Acked-by: Baoquan He 
>
>> Signed-off-by: Chen Zhou 
>> Tested-by: John Donnelly 
>> ---
>>  Documentation/admin-guide/kdump/kdump.rst | 22 ---
>>  .../admin-guide/kernel-parameters.txt | 11 --
>>  2 files changed, 28 insertions(+), 5 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/kdump/kdump.rst 
>> b/Documentation/admin-guide/kdump/kdump.rst
>> index 75a9dd98e76e..0877c76f8015 100644
>> --- a/Documentation/admin-guide/kdump/kdump.rst
>> +++ b/Documentation/admin-guide/kdump/kdump.rst
>> @@ -299,7 +299,16 @@ Boot into System Kernel
>> "crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory
>> starting at physical address 0x0100 (16MB) for the dump-capture 
>> kernel.
>>  
>> -   On x86 and x86_64, use "crashkernel=64M@16M".
>> +   On x86 use "crashkernel=64M@16M".
>> +
>> +   On x86_64, use "crashkernel=X" to select a region under 4G first, and
>> +   fall back to reserve region above 4G. And go for high allocation
>> +   directly if the required size is too large.
>> +   We can also use "crashkernel=X,high" to select a region above 4G, which
>> +   also tries to allocate at least 256M below 4G automatically and
>> +   "crashkernel=Y,low" can be used to allocate specified size low memory.
>> +   Use "crashkernel=Y@X" if you really have to reserve memory from specified
>> +   start address X.
>>  
>> On ppc64, use "crashkernel=128M@32M".
>>  
>> @@ -316,8 +325,15 @@ Boot into System Kernel
>> kernel will automatically locate the crash kernel image within the
>> first 512MB of RAM if X is not given.
>>  
>> -   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
>> -   the kernel, X if explicitly specified, must be aligned to 2MiB 
>> (0x20).
>> +   On arm64, use "crashkernel=X" to try low allocation in DMA zone and
>> +   fall back to high allocation if it fails.
>> +   We can also use "crashkernel=X,high" to select a high region above
>> +   DMA zone, which also tries to allocate at least 256M low memory in
>> +   DMA zone automatically.
>> +   "crashkernel=Y,low" can be used to allocate specified size low memory.
>> +   Use "crashkernel=Y@X" if you really have to reserve memory from
>> +   specified start address X. Note that the start address of the kernel,
>> +   X if explicitly specified, must be aligned to 2MiB (0x20).
>>  
>>  Load the Dump-capture Kernel
>>  
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
>> b/Documentation/admin-guide/kernel-parameters.txt
>> index a10b545c2070..908e5c8b61ba 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -738,6 +738,9 @@
>>  [KNL, X86-64] Select a region under 4G first, and
>>  fall back to reserve region above 4G when '@offset'
>>  hasn't been specified.
>> +[KNL, arm64] Try low allocation in DMA zone and fall 
>> back
>> +to high allocation if it fails when '@offset' hasn't 
>> been
>> +specified.
>>  See Documentation/admin-guide/kdump/kdump.rst for 
>> further details.
>>  
>>  crashkernel=range1:size1[,range2:size2,...][@offset]
>> @@ -754,6 +757,8 @@
>>  Otherwise memory region will be allocated below 4G, if
>>  available.
>>  It will be ignored if crashkernel=X is specified.
>> +[KNL, arm64] range in high memory.
>> +Allow kernel to allocate physical memory region from 
>> top.
>>  crashkernel=size[KMG],low
>>  [KNL, X86-64] range under 4G. When crashkernel=X,high
>>  is passed, kernel could allocate physical memory region
>> @@ -762,13 +767,15 @@
>>  requires at least 64M+32K low memory, also enough extra
>>  low memory is needed to make sure DMA buffers for 32-bit
>>  devices won't run out. Kernel would try to allocate at
>> -at least 256M below 4G automatically.
>> +least 256M below 4G 

Re: [PATCH v14 09/11] x86, arm64: Add ARCH_WANT_RESERVE_CRASH_KERNEL config

2021-02-19 Thread chenzhou



On 2021/2/18 16:35, Baoquan He wrote:
> On 01/30/21 at 03:10pm, Chen Zhou wrote:
>> We make the functions reserve_crashkernel[_low]() as generic for
>> x86 and arm64. Since reserve_crashkernel[_low]() implementations
>> are quite similar on other architectures as well, we can have more
>> users of this later.
>>
>> So have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL in arch/Kconfig and
>> select this by X86 and ARM64.
> This looks much better with the help of
> CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL. And please take off the
> 'Suggested-by' tag from me, I just don't like the old CONFIG_X86 and
> CONFIG_ARM64 ifdeffery way in v13, Mike suggested this ARCH_WANT_
> option.
OK, i will delete this.
>
> And the two dummy function reserve_crashkernel() in x86 and arm64 looks
> not so good, but I don't have better idea. Maybe add
> CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL iddeffery in the call site of
> reserve_crashkernel() in each ARCH? Or just leave with it for now if no
> other people has concern or suggestion about it.
>
> Anyway, ack this one.
>
> Acked-by: Baoquan He 
>
> Thanks
> Baoquan
>
>
>> Suggested-by: Mike Rapoport 
>> Suggested-by: Baoquan He 
>> Signed-off-by: Chen Zhou 
>> ---
>>  arch/Kconfig| 3 +++
>>  arch/arm64/Kconfig  | 1 +
>>  arch/x86/Kconfig| 2 ++
>>  kernel/crash_core.c | 7 ++-
>>  4 files changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/Kconfig b/arch/Kconfig
>> index 24862d15f3a3..0ca1ff5bb157 100644
>> --- a/arch/Kconfig
>> +++ b/arch/Kconfig
>> @@ -24,6 +24,9 @@ config KEXEC_ELF
>>  config HAVE_IMA_KEXEC
>>  bool
>>  
>> +config ARCH_WANT_RESERVE_CRASH_KERNEL
>> +bool
>> +
>>  config SET_FS
>>  bool
>>  
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index f39568b28ec1..09365c7ff469 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -82,6 +82,7 @@ config ARM64
>>  select ARCH_WANT_FRAME_POINTERS
>>  select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES 
>> && !ARM64_VA_BITS_36)
>>  select ARCH_WANT_LD_ORPHAN_WARN
>> +select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
>>  select ARCH_HAS_UBSAN_SANITIZE_ALL
>>  select ARM_AMBA
>>  select ARM_ARCH_TIMER
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index 21f851179ff0..e6926fcb4a40 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -12,6 +12,7 @@ config X86_32
>>  depends on !64BIT
>>  # Options that are inherently 32-bit kernel only:
>>  select ARCH_WANT_IPC_PARSE_VERSION
>> +select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
>>  select CLKSRC_I8253
>>  select CLONE_BACKWARDS
>>  select GENERIC_VDSO_32
>> @@ -28,6 +29,7 @@ config X86_64
>>  select ARCH_HAS_GIGANTIC_PAGE
>>  select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
>>  select ARCH_USE_CMPXCHG_LOCKREF
>> +select ARCH_WANT_RESERVE_CRASH_KERNEL if KEXEC_CORE
>>  select HAVE_ARCH_SOFT_DIRTY
>>  select MODULES_USE_ELF_RELA
>>  select NEED_DMA_MAP_STATE
>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>> index 8479be270c0b..2c5783985db5 100644
>> --- a/kernel/crash_core.c
>> +++ b/kernel/crash_core.c
>> @@ -320,9 +320,7 @@ int __init parse_crashkernel_low(char *cmdline,
>>   * - Crashkernel reservation --
>>   */
>>  
>> -#ifdef CONFIG_KEXEC_CORE
>> -
>> -#if defined(CONFIG_X86) || defined(CONFIG_ARM64)
>> +#ifdef CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL
>>  static int __init reserve_crashkernel_low(void)
>>  {
>>  #ifdef CONFIG_64BIT
>> @@ -450,8 +448,7 @@ void __init reserve_crashkernel(void)
>>  crashk_res.start = crash_base;
>>  crashk_res.end   = crash_base + crash_size - 1;
>>  }
>> -#endif
>> -#endif /* CONFIG_KEXEC_CORE */
>> +#endif /* CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL */
>>  
>>  Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
>>void *data, size_t data_len)
>> -- 
>> 2.20.1
>>
> .
>



Re: [PATCH v14 06/11] x86/elf: Move vmcore_elf_check_arch_cross to arch/x86/include/asm/elf.h

2021-02-17 Thread chenzhou



On 2021/2/18 14:31, Baoquan He wrote:
> On 01/30/21 at 03:10pm, Chen Zhou wrote:
>> Move macro vmcore_elf_check_arch_cross from arch/x86/include/asm/kexec.h
>> to arch/x86/include/asm/elf.h to fix the following compiling warning:
>>
>> make ARCH=i386
>> In file included from arch/x86/kernel/setup.c:39:0:
>> ./arch/x86/include/asm/kexec.h:77:0: warning: "vmcore_elf_check_arch_cross" 
>> redefined
>>  # define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64)
>>
>> In file included from arch/x86/kernel/setup.c:9:0:
>> ./include/linux/crash_dump.h:39:0: note: this is the location of the 
>> previous definition
>>  #define vmcore_elf_check_arch_cross(x) 0
>>
>> The root cause is that vmcore_elf_check_arch_cross under CONFIG_CRASH_CORE
>> depend on CONFIG_KEXEC_CORE. Commit 2db65f1db17d ("x86: kdump: move
>> reserve_crashkernel[_low]() into crash_core.c") triggered the issue.
>>
>> Suggested by Mike, simply move vmcore_elf_check_arch_cross from
>> arch/x86/include/asm/kexec.h to arch/x86/include/asm/elf.h to fix
>> the warning.
>>
>> Fixes: 2db65f1db17d ("x86: kdump: move reserve_crashkernel[_low]() into 
>> crash_core.c")
> Where does this commit id '2db65f1db17d' come from? Here you are fixing
> another pathc in the same patchset. Please merge this with patch 05/11.
Yeah, the commit id is invalid, i will merge this patch with patch 05/11.

Thanks,
Chen Zhou
>
>> Reported-by: kernel test robot 
>> Suggested-by: Mike Rapoport 
>> Signed-off-by: Chen Zhou 
>> ---
>>  arch/x86/include/asm/elf.h   | 3 +++
>>  arch/x86/include/asm/kexec.h | 3 ---
>>  2 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
>> index 66bdfe838d61..5333777cc758 100644
>> --- a/arch/x86/include/asm/elf.h
>> +++ b/arch/x86/include/asm/elf.h
>> @@ -94,6 +94,9 @@ extern unsigned int vdso32_enabled;
>>  
>>  #define elf_check_arch(x)   elf_check_arch_ia32(x)
>>  
>> +/* We can also handle crash dumps from 64 bit kernel. */
>> +# define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64)
>> +
>>  /* SVR4/i386 ABI (pages 3-31, 3-32) says that when the program starts %edx
>> contains a pointer to a function which might be registered using 
>> `atexit'.
>> This provides a mean for the dynamic linker to call DT_FINI functions for
>> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
>> index 2b18f918203e..6fcae01a9cca 100644
>> --- a/arch/x86/include/asm/kexec.h
>> +++ b/arch/x86/include/asm/kexec.h
>> @@ -72,9 +72,6 @@ struct kimage;
>>  
>>  /* The native architecture */
>>  # define KEXEC_ARCH KEXEC_ARCH_386
>> -
>> -/* We can also handle crash dumps from 64 bit kernel. */
>> -# define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64)
>>  #else
>>  /* Maximum physical address we can use pages from */
>>  # define KEXEC_SOURCE_MEMORY_LIMIT  (MAXMEM-1)
>> -- 
>> 2.20.1
>>
> .
>



Re: [PATCH v14 00/11] support reserving crashkernel above 4G on arm64 kdump

2021-02-07 Thread chenzhou
Hi all,

Friendly ping...


On 2021/1/30 15:10, Chen Zhou wrote:
> There are following issues in arm64 kdump:
> 1. We use crashkernel=X to reserve crashkernel below 4G, which
> will fail when there is no enough low memory.
> 2. If reserving crashkernel above 4G, in this case, crash dump
> kernel will boot failure because there is no low memory available
> for allocation.
>
> To solve these issues, change the behavior of crashkernel=X.
> crashkernel=X tries low allocation in DMA zone and fall back to high
> allocation if it fails.
>
> We can also use "crashkernel=X,high" to select a high region above
> DMA zone, which also tries to allocate at least 256M low memory in
> DMA zone automatically and "crashkernel=Y,low" can be used to allocate
> specified size low memory.
>
> When reserving crashkernel in high memory, some low memory is reserved
> for crash dump kernel devices. So there may be two regions reserved for
> crash dump kernel.
> In order to distinct from the high region and make no effect to the use
> of existing kexec-tools, rename the low region as "Crash kernel (low)",
> and pass the low region by reusing DT property
> "linux,usable-memory-range". We made the low memory region as the last
> range of "linux,usable-memory-range" to keep compatibility with existing
> user-space and older kdump kernels.
>
> Besides, we need to modify kexec-tools:
> arm64: support more than one crash kernel regions(see [1])
>
> Another update is document about DT property 'linux,usable-memory-range':
> schemas: update 'linux,usable-memory-range' node schema(see [2])
>
> This patchset contains the following eleven patches:
> 0001-x86-kdump-replace-the-hard-coded-alignment-with-macr.patch
> 0002-x86-kdump-make-the-lower-bound-of-crash-kernel-reser.patch
> 0003-x86-kdump-use-macro-CRASH_ADDR_LOW_MAX-in-functions-.patch
> 0004-x86-kdump-move-xen_pv_domain-check-and-insert_resour.patch
> 0005-x86-kdump-move-reserve_crashkernel-_low-into-crash_c.patch
> 0006-x86-elf-Move-vmcore_elf_check_arch_cross-to-arch-x86.patch
> 0007-arm64-kdump-introduce-some-macroes-for-crash-kernel-.patch
> 0008-arm64-kdump-reimplement-crashkernel-X.patch
> 0009-x86-arm64-Add-ARCH_WANT_RESERVE_CRASH_KERNEL-config.patch
> 0010-arm64-kdump-add-memory-for-devices-by-DT-property-li.patch
> 0011-kdump-update-Documentation-about-crashkernel.patch
>
> 0001-0004 are some x86 cleanups which prepares for making
> functionsreserve_crashkernel[_low]() generic.
> 0005 makes functions reserve_crashkernel[_low]() generic.
> 0006 fix compiling warning.
> 0007-0009 reimplements arm64 crashkernel=X.
> 0010 adds memory for devices by DT property linux,usable-memory-range.
> 0011 updates the doc.
>
> Changes since [v13]
> - Rebased on top of 5.11-rc5.
> - Introduce config CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL.
> Since reserve_crashkernel[_low]() implementations are quite similar on
> other architectures, so have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL in
> arch/Kconfig and select this by X86 and ARM64.
> - Some minor cleanup.
>
> Changes since [v12]
> - Rebased on top of 5.10-rc1.
> - Keep CRASH_ALIGN as 16M suggested by Dave.
> - Drop patch "kdump: add threshold for the required memory".
> - Add Tested-by from John.
>
> Changes since [v11]
> - Rebased on top of 5.9-rc4.
> - Make the function reserve_crashkernel() of x86 generic.
> Suggested by Catalin, make the function reserve_crashkernel() of x86 generic
> and arm64 use the generic version to reimplement crashkernel=X.
>
> Changes since [v10]
> - Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin.
>
> Changes since [v9]
> - Patch 1 add Acked-by from Dave.
> - Update patch 5 according to Dave's comments.
> - Update chosen schema.
>
> Changes since [v8]
> - Reuse DT property "linux,usable-memory-range".
> Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the 
> low
> memory region.
> - Fix kdump broken with ZONE_DMA reintroduced.
> - Update chosen schema.
>
> Changes since [v7]
> - Move x86 CRASH_ALIGN to 2M
> Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
> - Update Documentation/devicetree/bindings/chosen.txt.
> Add corresponding documentation to 
> Documentation/devicetree/bindings/chosen.txt
> suggested by Arnd.
> - Add Tested-by from Jhon and pk.
>
> Changes since [v6]
> - Fix build errors reported by kbuild test robot.
>
> Changes since [v5]
> - Move reserve_crashkernel_low() into kernel/crash_core.c.
> - Delete crashkernel=X,high.
> - Modify crashkernel=X,low.
> If crashkernel=X,low is specified simultaneously, reserve spcified size low
> memory for crash kdump kernel devices firstly and then reserve memory above 
> 4G.
> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
> pass to crash dump kernel by DT property "linux,low-memory-range".
> - Update Documentation/admin-guide/kdump/kdump.rst.
>
> Changes since [v4]
> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
>
> Changes since [v3]
> - Add 

Re: [PATCH v14 11/11] kdump: update Documentation about crashkernel

2021-02-03 Thread chenzhou
Hi Randy,


On 2021/1/31 1:53, Randy Dunlap wrote:
> Hi--
>
> On 1/29/21 11:10 PM, Chen Zhou wrote:
>> ---
>>  Documentation/admin-guide/kdump/kdump.rst | 22 ---
>>  .../admin-guide/kernel-parameters.txt | 11 --
>>  2 files changed, 28 insertions(+), 5 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
>> b/Documentation/admin-guide/kernel-parameters.txt
>> index a10b545c2070..908e5c8b61ba 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
> All of the "arm64" instances in [square brackets] should be "ARM64".
Got it, thanks for your review.

Thanks,
Chen Zhou
>
>> @@ -738,6 +738,9 @@
>>  [KNL, X86-64] Select a region under 4G first, and
>>  fall back to reserve region above 4G when '@offset'
>>  hasn't been specified.
>> +[KNL, arm64] Try low allocation in DMA zone and fall 
>> back
> here
>
>> +to high allocation if it fails when '@offset' hasn't 
>> been
>> +specified.
>>  See Documentation/admin-guide/kdump/kdump.rst for 
>> further details.
>>  
>>  crashkernel=range1:size1[,range2:size2,...][@offset]
>> @@ -754,6 +757,8 @@
>>  Otherwise memory region will be allocated below 4G, if
>>  available.
>>  It will be ignored if crashkernel=X is specified.
>> +[KNL, arm64] range in high memory.
> here
>
>> +Allow kernel to allocate physical memory region from 
>> top.
>>  crashkernel=size[KMG],low
>>  [KNL, X86-64] range under 4G. When crashkernel=X,high
>>  is passed, kernel could allocate physical memory region
>> @@ -762,13 +767,15 @@
>>  requires at least 64M+32K low memory, also enough extra
>>  low memory is needed to make sure DMA buffers for 32-bit
>>  devices won't run out. Kernel would try to allocate at
>> -at least 256M below 4G automatically.
>> +least 256M below 4G automatically.
>>  This one let user to specify own low range under 4G
>>  for second kernel instead.
>>  0: to disable low allocation.
>>  It will be ignored when crashkernel=X,high is not used
>>  or memory reserved is below 4G.
>> -
>> +[KNL, arm64] range in low memory.
> here
>
>> +This one let user to specify a low range in DMA zone for
>> +crash dump kernel.
>
> Thanks.
>



Re: [PATCH 0/2] arm64: mm: fix kdump broken with ZONE_DMA reintroduced

2021-01-20 Thread chenzhou
Hi Will, Catalin,


On 2021/1/20 21:07, Catalin Marinas wrote:
> On Wed, Jan 20, 2021 at 12:40:55PM +, Will Deacon wrote:
>> On Sat, Dec 26, 2020 at 11:35:55AM +0800, Chen Zhou wrote:
>>> If the memory reserved for crash dump kernel falled in ZONE_DMA32,
>>> the devices in crash dump kernel need to use ZONE_DMA will alloc fail.
>>>
>>> Fix this by reserving low memory in ZONE_DMA if CONFIG_ZONE_DMA is
>>> enabled, otherwise, reserving in ZONE_DMA32.
>>>
>>> Patch 1 updates the comments about the ZONE_DMA.
>>> Patch 2 fix kdump broken.
>>>
>>> Chen Zhou (2):
>>>   arm64: mm: update the comments about ZONE_DMA
>>>   arm64: mm: fix kdump broken with ZONE_DMA reintroduced
>> Please can you repost this if it is still needed after the changes that
>> landed in -rc4?
> I don't think this series is needed anymore but I'll let Chen confirm.
Yes, we don't need this series, Catalin has solved the issue i mentioned.

I will repost the series "support reserving crashkernel above 4G on arm64 kdump"
based on the latest code.

Thanks,
Chen Zhou



Re: [PATCH v2] cgroup-v1: add disabled controller check in cgroup1_parse_param()

2021-01-15 Thread chenzhou
Hi Michal,

On 2021/1/15 18:08, Michal Koutný wrote:
> On Fri, Jan 15, 2021 at 09:55:43AM +0800, chenzhou  
> wrote:
>> Yeah, this will select all enabled controllers, but which doesn't the 
>> behavior we want.
> I see what the issue is now.
>
>> See above. Just the mount behavior isn't what we what.
> I agree this a bug and your I find your approach correct
>
> Reviewed-by: Michal Koutný 
I have sent the v3, which updates the description and add Fixes. 
[v3]: https://lore.kernel.org/patchwork/patch/1365535/
If it is ok for you to add Reviewed-by there.

Thanks,
Chen Zhou
>
>> The behavior was changed since commit f5dfb5315d34 ("cgroup: take
>> options parsing into ->parse_monolithic()"), will add this as Fixes.
> Thanks.
>
> Michal



Re: [PATCH v2] cgroup-v1: add disabled controller check in cgroup1_parse_param()

2021-01-14 Thread chenzhou



On 2021/1/15 11:17, Tejun Heo wrote:
> Hello,
>
> On Fri, Jan 15, 2021 at 09:55:43AM +0800, chenzhou wrote:
>> Yeah, this will select all enabled controllers, but which doesn't the 
>> behavior we want.
>> I think the case should return error with information "Disabled controller 
>> xx" rather than
>> attaching all the other enabled controllers.
>>
>> For example, boot with cgroup_no_v1=cpu, and then mount with
>> "mount -t cgroup -o cpu cpu /sys/fs/cgroup/cpu", then all enabled 
>> controllers will
>> be attached expect cpu.
> Okay, that explanation actually makes sense. Can you please update the
> description to include what's broken and how it's being fixed? It really
> isn't clear what the patch is trying to achieve from the current
> description.
Ok, will update the description.
>
> Thanks.
>



Re: [PATCH v2] cgroup-v1: add disabled controller check in cgroup1_parse_param()

2021-01-14 Thread chenzhou



On 2021/1/15 0:54, Michal Koutný wrote:
> On Thu, Jan 14, 2021 at 10:08:19PM +0800, chenzhou  
> wrote:
>> In this case, at the beginning of function check_cgroupfs_options(), the mask
>> ctx->subsys_mask will be 0. And if we mount without 'none' and 'name=' 
>> options,
>> then in check_cgroupfs_options(), the flag ctx->all_ss will be set, that is, 
>> select all the subsystems.
> But even then, the line
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/cgroup/cgroup-v1.c?h=v5.11-rc3#n1012
> would select only 'enabled' controllers, wouldn't it?
Yeah, this will select all enabled controllers, but which doesn't the behavior 
we want.
I think the case should return error with information "Disabled controller xx" 
rather than
attaching all the other enabled controllers.

For example, boot with cgroup_no_v1=cpu, and then mount with
"mount -t cgroup -o cpu cpu /sys/fs/cgroup/cpu", then all enabled controllers 
will
be attached expect cpu.
>
> It's possible I missed something but if this means that cgroup_no_v1=
> doesn't hold to its expectations, I'd suggest adding proper Fixes: tag
> to the patch.
See above. Just the mount behavior isn't what we what.
The behavior was changed since commit f5dfb5315d34 ("cgroup: take options 
parsing into ->parse_monolithic()"),
will add this as Fixes.

Thanks,
Chen Zhou
>
> Thanks,
> Michal



Re: [PATCH v2] cgroup-v1: add disabled controller check in cgroup1_parse_param()

2021-01-14 Thread chenzhou
Hi Michal,

On 2021/1/14 21:12, Michal Koutný wrote:
> Hello Chen.
>
> On Fri, Dec 18, 2020 at 02:17:55PM +0800, Chen Zhou  
> wrote:
>> When mounting a cgroup hierarchy with disabled controller in cgroup v1,
>> all available controllers will be attached.
> Not sure if I understand the situation -- have you observed a v1
> controller attached to a hierarchy while specifying cgroup_no_v1= kernel
> cmdline arg?
Yeah, this is the situation.
In this case, at the beginning of function check_cgroupfs_options(), the mask
ctx->subsys_mask will be 0. And if we mount without 'none' and 'name=' options,
then in check_cgroupfs_options(), the flag ctx->all_ss will be set, that is, 
select all the subsystems.

Thanks,
Chen Zhou
>
> AFAICS, the disabled controllers are honored thanks to
> check_cgroupfs_options().
>
> Michal



Re: [PATCH 2/2] arm64: mm: fix kdump broken with ZONE_DMA reintroduced

2021-01-07 Thread chenzhou



On 2021/1/7 22:25, Catalin Marinas wrote:
> On Sat, Dec 26, 2020 at 11:34:58AM +0100, Nicolas Saenz Julienne wrote:
>> On Sat, 2020-12-26 at 11:35 +0800, Chen Zhou wrote:
>>> If the memory reserved for crash dump kernel falled in ZONE_DMA32,
>>> the devices in crash dump kernel need to use ZONE_DMA will alloc fail.
>>>
>>> Fix this by reserving low memory in ZONE_DMA if CONFIG_ZONE_DMA is
>>> enabled, otherwise, reserving in ZONE_DMA32.
>>>
>>> Fixes: bff3b04460a8 ("arm64: mm: reserve CMA and crashkernel in ZONE_DMA32")
>> I'm not so sure this counts as a fix, if someone backports it it'll probably
>> break things as it depends on the series that dynamically sizes DMA zones.
>>
>>> Signed-off-by: Chen Zhou 
>>> ---
>> Why not doing the same with CMA? You'll probably have to move the
>> dma_contiguous_reserve() call into bootmem_init() so as to make sure that
>> arm64_dma_phys_limit is populated.
> Do we need the arm64_dma32_phys_limit at all? I can see the
> (arm64_dma_phys_limit ? : arm64_dma32_phys_limit) pattern in several
> places but I think we can just live with the arm64_dma_phys_limit.
Yes, arm64_dma_phys_limit is enough.
>
> Also, I don't think we need any early ARCH_LOW_ADDRESS_LIMIT. It's only
> used by memblock_alloc_low() and that's called from swiotlb_init()
> after arm64_dma_phys_limit was initialised.
>
> What about something like below (on top of you ARCH_LOW_ADDRESS_LIMIT
> fix but I can revert that)? I haven't tested it in all configurations
> yet.
>
> diff --git a/arch/arm64/include/asm/processor.h 
> b/arch/arm64/include/asm/processor.h
> index 69ad25fbeae4..ca2cd75d3286 100644
> --- a/arch/arm64/include/asm/processor.h
> +++ b/arch/arm64/include/asm/processor.h
> @@ -94,8 +94,7 @@
>  #endif /* CONFIG_ARM64_FORCE_52BIT */
>  
>  extern phys_addr_t arm64_dma_phys_limit;
> -extern phys_addr_t arm64_dma32_phys_limit;
> -#define ARCH_LOW_ADDRESS_LIMIT   ((arm64_dma_phys_limit ? : 
> arm64_dma32_phys_limit) - 1)
> +#define ARCH_LOW_ADDRESS_LIMIT   (arm64_dma_phys_limit - 1)
>  
>  struct debug_info {
>  #ifdef CONFIG_HAVE_HW_BREAKPOINT
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 7deddf56f7c3..596a94bf5ed6 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -59,7 +59,6 @@ EXPORT_SYMBOL(memstart_addr);
>   * bit addressable memory area.
>   */
>  phys_addr_t arm64_dma_phys_limit __ro_after_init;
> -phys_addr_t arm64_dma32_phys_limit __ro_after_init;
>  
>  #ifdef CONFIG_KEXEC_CORE
>  /*
> @@ -84,7 +83,7 @@ static void __init reserve_crashkernel(void)
>  
>   if (crash_base == 0) {
>   /* Current arm64 boot protocol requires 2MB alignment */
> - crash_base = memblock_find_in_range(0, arm64_dma32_phys_limit,
> + crash_base = memblock_find_in_range(0, arm64_dma_phys_limit,
>   crash_size, SZ_2M);
>   if (crash_base == 0) {
>   pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
> @@ -196,6 +195,7 @@ static void __init zone_sizes_init(unsigned long min, 
> unsigned long max)
>   unsigned long max_zone_pfns[MAX_NR_ZONES]  = {0};
>   unsigned int __maybe_unused acpi_zone_dma_bits;
>   unsigned int __maybe_unused dt_zone_dma_bits;
> + phys_addr_t dma32_phys_limit = max_zone_phys(32);
>  
>  #ifdef CONFIG_ZONE_DMA
>   acpi_zone_dma_bits = fls64(acpi_iort_dma_get_max_cpu_address());
> @@ -205,8 +205,12 @@ static void __init zone_sizes_init(unsigned long min, 
> unsigned long max)
>   max_zone_pfns[ZONE_DMA] = PFN_DOWN(arm64_dma_phys_limit);
>  #endif
>  #ifdef CONFIG_ZONE_DMA32
> - max_zone_pfns[ZONE_DMA32] = PFN_DOWN(arm64_dma32_phys_limit);
> + max_zone_pfns[ZONE_DMA32] = PFN_DOWN(dma32_phys_limit);
> + if (!arm64_dma_phys_limit)
> + arm64_dma_phys_limit = dma32_phys_limit;
>  #endif
> + if (!arm64_dma_phys_limit)
> + arm64_dma_phys_limit = PHYS_MASK + 1;
>   max_zone_pfns[ZONE_NORMAL] = max;
>  
>   free_area_init(max_zone_pfns);
> @@ -394,16 +398,9 @@ void __init arm64_memblock_init(void)
>  
>   early_init_fdt_scan_reserved_mem();
>  
> - if (IS_ENABLED(CONFIG_ZONE_DMA32))
> - arm64_dma32_phys_limit = max_zone_phys(32);
> - else
> - arm64_dma32_phys_limit = PHYS_MASK + 1;
> -
>   reserve_elfcorehdr();
>  
>   high_memory = __va(memblock_end_of_DRAM() - 1) + 1;
> -
> - dma_contiguous_reserve(arm64_dma32_phys_limit);
>  }
>  
>  void __init bootmem_init(void)
> @@ -438,6 +435,11 @@ void __init bootmem_init(void)
>   sparse_init();
>   zone_sizes_init(min, max);
>  
> + /*
> +  * Reserve the CMA area after arm64_dma_phys_limit was initialised.
> +  */
> + dma_contiguous_reserve(arm64_dma_phys_limit);
> +
>   /*
>* request_standard_resources() depends on crashkernel's memory being
>* reserved, so do it here.
> @@ -455,7 +457,7 @@ void __init bootmem_init(void)
>  void __init 

Re: [PATCH 2/2] arm64: mm: fix kdump broken with ZONE_DMA reintroduced

2020-12-27 Thread chenzhou
Hi Nicolas,

Thanks for your review.


On 2020/12/26 18:34, Nicolas Saenz Julienne wrote:
> Hi Chen, thanks for looking at this.
>
> On Sat, 2020-12-26 at 11:35 +0800, Chen Zhou wrote:
>> If the memory reserved for crash dump kernel falled in ZONE_DMA32,
>> the devices in crash dump kernel need to use ZONE_DMA will alloc fail.
>>
>> Fix this by reserving low memory in ZONE_DMA if CONFIG_ZONE_DMA is
>> enabled, otherwise, reserving in ZONE_DMA32.
>>
>> Fixes: bff3b04460a8 ("arm64: mm: reserve CMA and crashkernel in ZONE_DMA32")
> I'm not so sure this counts as a fix, if someone backports it it'll probably
> break things as it depends on the series that dynamically sizes DMA zones.
I write this just because kdump is broken from this commit.
>
>> Signed-off-by: Chen Zhou 
>> ---
> Why not doing the same with CMA? You'll probably have to move the
> dma_contiguous_reserve() call into bootmem_init() so as to make sure that
> arm64_dma_phys_limit is populated.
You are right, CMA also need this. I will do this in next version.

Thanks,
Chen Zhou
>
> Regards,
> Nicolas
>
>>  arch/arm64/mm/init.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index 7b9809e39927..5074e945f1a6 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -85,7 +85,8 @@ static void __init reserve_crashkernel(void)
>>  
>>
>>  if (crash_base == 0) {
>>  /* Current arm64 boot protocol requires 2MB alignment */
>> -crash_base = memblock_find_in_range(0, arm64_dma32_phys_limit,
>> +crash_base = memblock_find_in_range(0,
>> +arm64_dma_phys_limit ? : arm64_dma32_phys_limit,
>>  crash_size, SZ_2M);
>>  if (crash_base == 0) {
>>  pr_warn("cannot allocate crashkernel (size:0x%llx)\n",
>



[QUESTION] Dying cgroups in subsystem cpu, pids and named hierarchy systemd

2020-12-14 Thread chenzhou
Hi all,

When i do some tests with kernel-4.19.x,i found some dying cgroups.

There are dying cgroup in subsystem cpu/cpuacct, memory, pids and named 
hierarchy systemd.

The root cause of dying cgroups in memory subsystem is that some charged pages 
aren't gone.

I don't figure out why there are dying cgroup in cpu, pids and named hierarchy 
systemd.

I used to turn on/off Delegate and do operation "systemctl daemon-reload" in 
the machine,

but now can't reproduce it.

Is this normal and what may cause this or any suggestions about this?


The details about the machine with problem is as below:
# mount -t cgroup
cgroup on /sys/fs/cgroup/systemd type cgroup 
(rw,nosuid,nodev,noexec,relatime,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/pids type cgroup 
(rw,nosuid,nodev,noexec,relatime,seclabel,pids)
cgroup on /sys/fs/cgroup/memory type cgroup 
(rw,nosuid,nodev,noexec,relatime,seclabel,memory)
cgroup on /sys/fs/cgroup/blkio type cgroup 
(rw,nosuid,nodev,noexec,relatime,seclabel,blkio)
cgroup on /sys/fs/cgroup/devices type cgroup 
(rw,nosuid,nodev,noexec,relatime,seclabel,devices)
cgroup on /sys/fs/cgroup/perf_event type cgroup 
(rw,nosuid,nodev,noexec,relatime,seclabel,perf_event)
cgroup on /sys/fs/cgroup/rdma type cgroup 
(rw,nosuid,nodev,noexec,relatime,seclabel,rdma)
cgroup on /sys/fs/cgroup/freezer type cgroup 
(rw,nosuid,nodev,noexec,relatime,seclabel,freezer)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup 
(rw,nosuid,nodev,noexec,relatime,seclabel,net_cls,net_prio)
cgroup on /sys/fs/cgroup/cpuset type cgroup 
(rw,nosuid,nodev,noexec,relatime,seclabel,cpuset)
cgroup on /sys/fs/cgroup/hugetlb type cgroup 
(rw,nosuid,nodev,noexec,relatime,seclabel,hugetlb)
memory on /cgroup/memory type cgroup (rw,relatime,seclabel,memory)
cpu,cpuacct on /sys/fs/cgroup/cpu,cpuacct type cgroup 
(rw,relatime,seclabel,cpu,cpuacct)

# cat /proc/cgroups
#subsys_namehierarchynum_cgroupsenabled
cpuset121091
cpu101101
cpuacct101101
blkio41091
memory325711
devices51091
freezer91091
net_cls1111
perf_event611
net_prio1111
hugetlb1311
pids24801
...

# crash vmlinux /proc/kcore
crash> list cgroup_roots
81272558
...

list -H 81272558 -s 
cgroup_root.subsys_mask,hierarchy_id,cgrp.nr_dying_descendants -l 
cgroup_root.root_list
8007ff67c6f8// cpu,cpuacct
  subsys_mask = 6
  hierarchy_id = 10
  cgrp.nr_dying_descendants = 1,
...
8007ff6726f8// memory
  subsys_mask = 16
  hierarchy_id = 3
  cgrp.nr_dying_descendants = 2036,
800fb579c6f8// pids
  subsys_mask = 2048
  hierarchy_id = 2
  cgrp.nr_dying_descendants = 1,
800fb57986f8// named systemd
  subsys_mask = 0
  hierarchy_id = 1
  cgrp.nr_dying_descendants = 1,
...


For the cpu subsystem, the dying cgroup is /sys/fs/cgroup/cpu/user.slice:

crash> p root_task_group.css.cgroup
$4 = (struct cgroup *) 0x8007ff67c010
crash> cgroup_subsys_state 8007ff67c010 -ox
struct cgroup_subsys_state {
  [8007ff67c010] struct cgroup *cgroup;
  [8007ff67c018] struct cgroup_subsys *ss;
  [8007ff67c020] struct percpu_ref refcnt;
  [8007ff67c058] struct list_head sibling;
  [8007ff67c068] struct list_head children;
  [8007ff67c078] struct list_head rstat_css_node;
  [8007ff67c088] int id;
  [8007ff67c08c] unsigned int flags;
  [8007ff67c090] u64 serial_nr;
  [8007ff67c098] atomic_t online_cnt;
  [8007ff67c0a0] struct work_struct destroy_work;
  [8007ff67c0e0] struct rcu_work destroy_rwork;
  [8007ff67c138] struct cgroup_subsys_state *parent;
}
SIZE: 0x130
crash> list -H 8007ff67c068 -s cgroup_subsys_state.cgroup -l 
cgroup_subsys_state.sibling
800faff65848
  cgroup = 0x800faff65800
800fa18b1048
  cgroup = 0x800fa18b1000
800fa22c0848
  cgroup = 0x800fa22c0800
800fa940e048
  cgroup = 0x800fa940e000
crash>
crash> cgroup.self.refcnt,flags,kn 0x800fa18b1000 -x
  self.refcnt = {
count = {
  counter = 0x3
},
percpu_count_ptr = 0xfffefdf041334c83,
release = 0x801c1ea0 ,
confirm_switch = 0x0,
force_atomic = 0x0,
rcu = {
  next = 0x80083365da30,
  func = 0x804fb748 
}
  },
  flags = 0x0
  kn = 0x800bbdb1b110
crash>
crash> kernfs_node.name 0x800bbdb1b110
  name = 0x800fa54d9c00 "user.slice"


Thanks,

Chen Zhou



Re: [PATCH] selinux: Fix error return code in sel_ib_pkey_sid_slow()

2020-11-12 Thread chenzhou



On 2020/11/12 16:55, Chen Zhou wrote:
> Fix to return a negative error code from the error handling case
> instead of 0 in function sel_ib_pkey_sid_slow(), as done elsewhere
> in this function.
>
> Fixes: 409dcf31538a ("selinux: Add a cache for quicker retreival of PKey 
> SIDs")
> Reported-by: Hulk Robot 
> Signed-off-by: Chen Zhou 
> ---
>  security/selinux/ibpkey.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/security/selinux/ibpkey.c b/security/selinux/ibpkey.c
> index f68a7617cfb9..680b2dd1520f 100644
> --- a/security/selinux/ibpkey.c
> +++ b/security/selinux/ibpkey.c
> @@ -151,8 +151,10 @@ static int sel_ib_pkey_sid_slow(u64 subnet_prefix, u16 
> pkey_num, u32 *sid)
>* is valid, it just won't be added to the cache.
>*/
>   new = kzalloc(sizeof(*new), GFP_ATOMIC);
> - if (!new)
> + if (IS_ERR(new)) {
> + ret = PTR_ERR(new);
>   goto out;
> + }
>  
>   new->psec.subnet_prefix = subnet_prefix;
>   new->psec.pkey = pkey_num;
Ignore this, will send v2.


Re: [PATCH] usb: gadget: mass_storage: fix error return code in msg_bind()

2020-11-12 Thread chenzhou



On 2020/11/12 16:53, Chen Zhou wrote:
> Fix to return a negative error code from the error handling case
> instead of 0 in function msg_bind(), as done elsewhere in this
> function.
>
> Fixes: d86788979761 ("usb: gadget: mass_storage: allocate and init otg 
> descriptor by otg capabilities")
> Reported-by: Hulk Robot 
> Signed-off-by: Chen Zhou 
> ---
>  drivers/usb/gadget/legacy/mass_storage.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/usb/gadget/legacy/mass_storage.c 
> b/drivers/usb/gadget/legacy/mass_storage.c
> index 9ed22c5fb7fe..7a88c5282d61 100644
> --- a/drivers/usb/gadget/legacy/mass_storage.c
> +++ b/drivers/usb/gadget/legacy/mass_storage.c
> @@ -175,8 +175,10 @@ static int msg_bind(struct usb_composite_dev *cdev)
>   struct usb_descriptor_header *usb_desc;
>  
>   usb_desc = usb_otg_descriptor_alloc(cdev->gadget);
> - if (!usb_desc)
> + if (IS_ERR(usb_desc)) {
> + status = PTR_ERR(usb_desc);
>   goto fail_string_ids;
> + }
>   usb_otg_descriptor_init(cdev->gadget, usb_desc);
>   otg_desc[0] = usb_desc;
>   otg_desc[1] = NULL;
Ignore this.


Re: [PATCH v13 6/8] arm64: kdump: reimplement crashkernel=X

2020-11-12 Thread chenzhou



On 2020/11/12 16:36, Baoquan He wrote:
> On 11/12/20 at 10:25am, Mike Rapoport wrote:
>> On Wed, Nov 11, 2020 at 09:54:48PM +0800, Baoquan He wrote:
>>> On 11/11/20 at 09:27pm, chenzhou wrote:
>>>> Hi Baoquan,
>>> ...
>>>>>>  #ifdef CONFIG_CRASH_DUMP
>>>>>>  static int __init early_init_dt_scan_elfcorehdr(unsigned long node,
>>>>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>>>>> index 1c0f3e02f731..c55cee290bbb 100644
>>>>>> --- a/arch/arm64/mm/mmu.c
>>>>>> +++ b/arch/arm64/mm/mmu.c
>>>>>> @@ -488,6 +488,10 @@ static void __init map_mem(pgd_t *pgdp)
>>>>>>   */
>>>>>>  memblock_mark_nomap(kernel_start, kernel_end - kernel_start);
>>>>>>  #ifdef CONFIG_KEXEC_CORE
>>>>>> +if (crashk_low_res.end)
>>>>>> +memblock_mark_nomap(crashk_low_res.start,
>>>>>> +resource_size(_low_res));
>>>>>> +
>>>>>>  if (crashk_res.end)
>>>>>>  memblock_mark_nomap(crashk_res.start,
>>>>>>  resource_size(_res));
>>>>>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>>>>>> index d39892bdb9ae..cdef7d8c91a6 100644
>>>>>> --- a/kernel/crash_core.c
>>>>>> +++ b/kernel/crash_core.c
>>>>>> @@ -321,7 +321,7 @@ int __init parse_crashkernel_low(char *cmdline,
>>>>>>  
>>>>>>  int __init reserve_crashkernel_low(void)
>>>>>>  {
>>>>>> -#ifdef CONFIG_X86_64
>>>>>> +#if defined(CONFIG_X86_64) || defined(CONFIG_ARM64)
>>>>> Not very sure if a CONFIG_64BIT checking is better.
>>>> If doing like this, there may be some compiling errors for other 64-bit 
>>>> kernel, such as mips.
>>>>>>  unsigned long long base, low_base = 0, low_size = 0;
>>>>>>  unsigned long low_mem_limit;
>>>>>>  int ret;
>>>>>> @@ -362,12 +362,14 @@ int __init reserve_crashkernel_low(void)
>>>>>>  
>>>>>>  crashk_low_res.start = low_base;
>>>>>>  crashk_low_res.end   = low_base + low_size - 1;
>>>>>> +#ifdef CONFIG_X86_64
>>>>>>  insert_resource(_resource, _low_res);
>>>>>> +#endif
>>>>>>  #endif
>>>>>>  return 0;
>>>>>>  }
>>>>>>  
>>>>>> -#ifdef CONFIG_X86
>>>>>> +#if defined(CONFIG_X86) || defined(CONFIG_ARM64)
>>>>> Should we make this weak default so that we can remove the ARCH config?
>>>> The same as above, some arch may not support kdump, in that case,  
>>>> compiling errors occur.
>>> OK, not sure if other people have better idea, oterwise, we can leave with 
>>> it. 
>>> Thanks for telling.
>> I think it would be better to have CONFIG_ARCH_WANT_RESERVE_CRASH_KERNEL
>> in arch/Kconfig and select this by X86 and ARM64.
>>
>> Since reserve_crashkernel() implementations are quite similart on other
>> architectures as well, we can have more users of this later.
> Yes, this sounds like a nice way.
I will think about this in next version.

Thanks,
Chen Zhou
>
> .
>



Re: [PATCH v13 4/8] x86: kdump: move reserve_crashkernel[_low]() into crash_core.c

2020-11-12 Thread chenzhou



On 2020/11/12 16:11, Mike Rapoport wrote:
> On Sat, Oct 31, 2020 at 03:44:33PM +0800, Chen Zhou wrote:
>> Make the functions reserve_crashkernel[_low]() as generic.
>> Arm64 will use these to reimplement crashkernel=X.
>>
>> Signed-off-by: Chen Zhou 
>> Tested-by: John Donnelly 
>> ---
>>  arch/x86/include/asm/kexec.h |  25 ++
>>  arch/x86/kernel/setup.c  | 151 +---
>>  include/linux/crash_core.h   |   4 +
>>  include/linux/kexec.h|   2 -
>>  kernel/crash_core.c  | 164 +++
>>  kernel/kexec_core.c  |  17 
>>  6 files changed, 195 insertions(+), 168 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
>> index 8cf9d3fd31c7..34afa7b645f9 100644
>> --- a/arch/x86/include/asm/kexec.h
>> +++ b/arch/x86/include/asm/kexec.h
>> @@ -21,6 +21,27 @@
>>  /* 2M alignment for crash kernel regions */
>>  #define CRASH_ALIGN SZ_16M
>>  
>> +/*
>> + * Keep the crash kernel below this limit.
>> + *
>> + * Earlier 32-bits kernels would limit the kernel to the low 512 MB range
>> + * due to mapping restrictions.
>> + *
>> + * 64-bit kdump kernels need to be restricted to be under 64 TB, which is
>> + * the upper limit of system RAM in 4-level paging mode. Since the kdump
>> + * jump could be from 5-level paging to 4-level paging, the jump will fail 
>> if
>> + * the kernel is put above 64 TB, and during the 1st kernel bootup there's
>> + * no good way to detect the paging mode of the target kernel which will be
>> + * loaded for dumping.
>> + */
>> +#ifdef CONFIG_X86_32
>> +# define CRASH_ADDR_LOW_MAX SZ_512M
>> +# define CRASH_ADDR_HIGH_MAXSZ_512M
>> +#else
>> +# define CRASH_ADDR_LOW_MAX SZ_4G
>> +# define CRASH_ADDR_HIGH_MAXSZ_64T
>> +#endif
>> +
>>  #ifndef __ASSEMBLY__
>>  
>>  #include 
>> @@ -200,6 +221,10 @@ typedef void crash_vmclear_fn(void);
>>  extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
>>  extern void kdump_nmi_shootdown_cpus(void);
>>  
>> +#ifdef CONFIG_KEXEC_CORE
>> +extern void __init reserve_crashkernel(void);
>> +#endif
>> +
>>  #endif /* __ASSEMBLY__ */
>>  
>>  #endif /* _ASM_X86_KEXEC_H */
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index 1289f079ad5f..00b3840d30f9 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -25,8 +25,6 @@
>>  
>>  #include 
>>  
>> -#include 
>> -
>>  #include 
>>  #include 
>>  #include 
>> @@ -38,6 +36,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  #include 
>>  #include 
>>  #include 
>> @@ -389,153 +388,7 @@ static void __init 
>> memblock_x86_reserve_range_setup_data(void)
>>  }
>>  }
>>  
>> -/*
>> - * - Crashkernel reservation --
>> - */
>> -
>> -#ifdef CONFIG_KEXEC_CORE
>> -
>> -/*
>> - * Keep the crash kernel below this limit.
>> - *
>> - * Earlier 32-bits kernels would limit the kernel to the low 512 MB range
>> - * due to mapping restrictions.
>> - *
>> - * 64-bit kdump kernels need to be restricted to be under 64 TB, which is
>> - * the upper limit of system RAM in 4-level paging mode. Since the kdump
>> - * jump could be from 5-level paging to 4-level paging, the jump will fail 
>> if
>> - * the kernel is put above 64 TB, and during the 1st kernel bootup there's
>> - * no good way to detect the paging mode of the target kernel which will be
>> - * loaded for dumping.
>> - */
>> -#ifdef CONFIG_X86_32
>> -# define CRASH_ADDR_LOW_MAX SZ_512M
>> -# define CRASH_ADDR_HIGH_MAXSZ_512M
>> -#else
>> -# define CRASH_ADDR_LOW_MAX SZ_4G
>> -# define CRASH_ADDR_HIGH_MAXSZ_64T
>> -#endif
>> -
>> -static int __init reserve_crashkernel_low(void)
>> -{
>> -#ifdef CONFIG_X86_64
>> -unsigned long long base, low_base = 0, low_size = 0;
>> -unsigned long low_mem_limit;
>> -int ret;
>> -
>> -low_mem_limit = min(memblock_phys_mem_size(), CRASH_ADDR_LOW_MAX);
>> -
>> -/* crashkernel=Y,low */
>> -ret = parse_crashkernel_low(boot_command_line, low_mem_limit, 
>> _size, );
>> -if (ret) {
>> -/*
>> - * two parts from kernel/dma/swiotlb.c:
>> - * -swiotlb size: user-specified with swiotlb= or default.
>> - *
>> - * -swiotlb overflow buffer: now hardcoded to 32k. We round it
>> - * to 8M for other buffers that may need to stay low too. Also
>> - * make sure we allocate enough extra low memory so that we
>> - * don't run out of DMA buffers for 32-bit devices.
>> - */
>> -low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL 
>> << 20);
>> -} else {
>> -/* passed with crashkernel=0,low ? */
>> -if (!low_size)
>> -return 0;
>> -}
>> -
>> -low_base = memblock_phys_alloc_range(low_size, CRASH_ALIGN, 
>> CRASH_ALIGN, CRASH_ADDR_LOW_MAX);
>> -if (!low_base) {
>> -pr_err("Cannot 

Re: [PATCH v13 1/8] x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN

2020-11-12 Thread chenzhou



On 2020/11/12 15:58, Mike Rapoport wrote:
> Hi,
>
> On Sat, Oct 31, 2020 at 03:44:30PM +0800, Chen Zhou wrote:
>> Move CRASH_ALIGN to header asm/kexec.h and replace the hard-coded
>> alignment with macro CRASH_ALIGN in function reserve_crashkernel().
>>
>> Suggested-by: Dave Young 
>> Signed-off-by: Chen Zhou 
>> Tested-by: John Donnelly 
>> ---
>>  arch/x86/include/asm/kexec.h | 3 +++
>>  arch/x86/kernel/setup.c  | 5 +
>>  2 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
>> index 6802c59e8252..8cf9d3fd31c7 100644
>> --- a/arch/x86/include/asm/kexec.h
>> +++ b/arch/x86/include/asm/kexec.h
>> @@ -18,6 +18,9 @@
>>  
>>  # define KEXEC_CONTROL_CODE_MAX_SIZE2048
>>  
>> +/* 2M alignment for crash kernel regions */
>> +#define CRASH_ALIGN SZ_16M
> Please update the comment to match the code.
Ok, thanks for pointing this mistake.

Thanks,
Chen Zhou
>
>> +
>>  #ifndef __ASSEMBLY__
>>  
>>  #include 
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index 84f581c91db4..bf373422dc8a 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -395,9 +395,6 @@ static void __init 
>> memblock_x86_reserve_range_setup_data(void)
>>  
>>  #ifdef CONFIG_KEXEC_CORE
>>  
>> -/* 16M alignment for crash kernel regions */
>> -#define CRASH_ALIGN SZ_16M
>> -
>>  /*
>>   * Keep the crash kernel below this limit.
>>   *
>> @@ -515,7 +512,7 @@ static void __init reserve_crashkernel(void)
>>  } else {
>>  unsigned long long start;
>>  
>> -start = memblock_phys_alloc_range(crash_size, SZ_1M, crash_base,
>> +start = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, 
>> crash_base,
>>crash_base + crash_size);
>>  if (start != crash_base) {
>>  pr_info("crashkernel reservation failed - memory is in 
>> use.\n");
>> -- 
>> 2.20.1
>>



Re: [PATCH v13 0/8] support reserving crashkernel above 4G on arm64 kdump

2020-11-11 Thread chenzhou
Hi Baoquan, Bhupesh,


On 2020/11/11 11:01, Baoquan He wrote:
> Hi Zhou, Bhupesh
>
> On 10/31/20 at 03:44pm, Chen Zhou wrote:
>> There are following issues in arm64 kdump:
>> 1. We use crashkernel=X to reserve crashkernel below 4G, which
>> will fail when there is no enough low memory.
>> 2. If reserving crashkernel above 4G, in this case, crash dump
>> kernel will boot failure because there is no low memory available
>> for allocation.
>> 3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32"),
>> if the memory reserved for crash dump kernel falled in ZONE_DMA32,
>> the devices in crash dump kernel need to use ZONE_DMA will alloc
>> fail.
> I went through this patchset, mainly the x86 related and generic
> changes, the changes look great and no risk. And I know Bhupesh is
> following up this and helping review, thanks, both.
>
> So you have also tested crashkernel reservation on x86_64, with the
> normal reservation, and high/low reservation, it is working well,
> right? Asking this because I didn't see the test result description, and
> just note it.

Yeah, i also tested on x86_64 and work well. I did these basic tests before 
sending every
new version.
But Bhupesh may have some review comments(Bhupesh referred one month ago).

Thanks,
Chen Zhou

>
> Thanks
> Baoquan
>
>> To solve these issues, change the behavior of crashkernel=X.
>> crashkernel=X tries low allocation in DMA zone (or the DMA32 zone if
>> CONFIG_ZONE_DMA is disabled), and fall back to high allocation if it fails.
>>
>> We can also use "crashkernel=X,high" to select a high region above
>> DMA zone, which also tries to allocate at least 256M low memory in
>> DMA zone automatically (or the DMA32 zone if CONFIG_ZONE_DMA is disabled).
>> "crashkernel=Y,low" can be used to allocate specified size low memory.
>>
>> When reserving crashkernel in high memory, some low memory is reserved
>> for crash dump kernel devices. So there may be two regions reserved for
>> crash dump kernel.
>> In order to distinct from the high region and make no effect to the use
>> of existing kexec-tools, rename the low region as "Crash kernel (low)",
>> and pass the low region by reusing DT property
>> "linux,usable-memory-range". We made the low memory region as the last
>> range of "linux,usable-memory-range" to keep compatibility with existing
>> user-space and older kdump kernels.
>>
>> Besides, we need to modify kexec-tools:
>> arm64: support more than one crash kernel regions(see [1])
>>
>> Another update is document about DT property 'linux,usable-memory-range':
>> schemas: update 'linux,usable-memory-range' node schema(see [2])
>>
>> This patchset contains the following eight patches:
>> 0001-x86-kdump-replace-the-hard-coded-alignment-with-macr.patch
>> 0002-x86-kdump-make-the-lower-bound-of-crash-kernel-reser.patch
>> 0003-x86-kdump-use-macro-CRASH_ADDR_LOW_MAX-in-functions-.patch
>> 0004-x86-kdump-move-reserve_crashkernel-_low-into-crash_c.patch
>> 0005-arm64-kdump-introduce-some-macroes-for-crash-kernel-.patch
>> 0006-arm64-kdump-reimplement-crashkernel-X.patch
>> 0007-arm64-kdump-add-memory-for-devices-by-DT-property-li.patch
>> 0008-kdump-update-Documentation-about-crashkernel.patch
>>
>> 0001-0003 are some x86 cleanups which prepares for making
>> functionsreserve_crashkernel[_low]() generic.
>> 0004 makes functions reserve_crashkernel[_low]() generic.
>> 0005-0006 reimplements arm64 crashkernel=X.
>> 0007 adds memory for devices by DT property linux,usable-memory-range.
>> 0008 updates the doc.
>>
>> Changes since [v12]
>> - Rebased on top of 5.10-rc1.
>> - Keep CRASH_ALIGN as 16M suggested by Dave.
>> - Drop patch "kdump: add threshold for the required memory".
>> - Add Tested-by from John.
>>
>> Changes since [v11]
>> - Rebased on top of 5.9-rc4.
>> - Make the function reserve_crashkernel() of x86 generic.
>> Suggested by Catalin, make the function reserve_crashkernel() of x86 generic
>> and arm64 use the generic version to reimplement crashkernel=X.
>>
>> Changes since [v10]
>> - Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin.
>>
>> Changes since [v9]
>> - Patch 1 add Acked-by from Dave.
>> - Update patch 5 according to Dave's comments.
>> - Update chosen schema.
>>
>> Changes since [v8]
>> - Reuse DT property "linux,usable-memory-range".
>> Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the 
>> low
>> memory region.
>> - Fix kdump broken with ZONE_DMA reintroduced.
>> - Update chosen schema.
>>
>> Changes since [v7]
>> - Move x86 CRASH_ALIGN to 2M
>> Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
>> - Update Documentation/devicetree/bindings/chosen.txt.
>> Add corresponding documentation to 
>> Documentation/devicetree/bindings/chosen.txt
>> suggested by Arnd.
>> - Add Tested-by from Jhon and pk.
>>
>> Changes since [v6]
>> - Fix build errors reported by kbuild test robot.
>>
>> Changes since [v5]
>> - Move reserve_crashkernel_low() into kernel/crash_core.c.
>> - 

Re: [PATCH v13 6/8] arm64: kdump: reimplement crashkernel=X

2020-11-11 Thread chenzhou
Hi Baoquan,


On 2020/11/11 9:59, Baoquan He wrote:
> On 10/31/20 at 03:44pm, Chen Zhou wrote:
>> There are following issues in arm64 kdump:
>> 1. We use crashkernel=X to reserve crashkernel below 4G, which
>> will fail when there is no enough low memory.
>> 2. If reserving crashkernel above 4G, in this case, crash dump
>> kernel will boot failure because there is no low memory available
>> for allocation.
>> 3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32"),
>> if the memory reserved for crash dump kernel falled in ZONE_DMA32,
>> the devices in crash dump kernel need to use ZONE_DMA will alloc
>> fail.
>>
>> To solve these issues, change the behavior of crashkernel=X and
>> introduce crashkernel=X,[high,low]. crashkernel=X tries low allocation
>> in DMA zone or DMA32 zone if CONFIG_ZONE_DMA is disabled, and fall back
>> to high allocation if it fails.
>> We can also use "crashkernel=X,high" to select a region above DMA zone,
>> which also tries to allocate at least 256M in DMA zone automatically
>> (or the DMA32 zone if CONFIG_ZONE_DMA is disabled).
>> "crashkernel=Y,low" can be used to allocate specified size low memory.
>>
>> Another minor change, there may be two regions reserved for crash
>> dump kernel, in order to distinct from the high region and make no
>> effect to the use of existing kexec-tools, rename the low region as
>> "Crash kernel (low)".
>>
>> Signed-off-by: Chen Zhou 
>> Tested-by: John Donnelly 
>> ---
>>  arch/arm64/include/asm/kexec.h |  9 +
>>  arch/arm64/kernel/setup.c  | 13 +++-
>>  arch/arm64/mm/init.c   | 60 ++
>>  arch/arm64/mm/mmu.c|  4 +++
>>  kernel/crash_core.c|  8 +++--
>>  5 files changed, 34 insertions(+), 60 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
>> index 402d208265a3..79909ae5e22e 100644
>> --- a/arch/arm64/include/asm/kexec.h
>> +++ b/arch/arm64/include/asm/kexec.h
>> @@ -28,7 +28,12 @@
>>  /* 2M alignment for crash kernel regions */
>>  #define CRASH_ALIGN SZ_2M
>>  
>> +#ifdef CONFIG_ZONE_DMA
>> +#define CRASH_ADDR_LOW_MAX  arm64_dma_phys_limit
>> +#else
>>  #define CRASH_ADDR_LOW_MAX  arm64_dma32_phys_limit
>> +#endif
>> +
>>  #define CRASH_ADDR_HIGH_MAX MEMBLOCK_ALLOC_ACCESSIBLE
>>  
>>  #ifndef __ASSEMBLY__
>> @@ -96,6 +101,10 @@ static inline void crash_prepare_suspend(void) {}
>>  static inline void crash_post_resume(void) {}
>>  #endif
>>  
>> +#ifdef CONFIG_KEXEC_CORE
>> +extern void __init reserve_crashkernel(void);
>> +#endif
>> +
>>  #ifdef CONFIG_KEXEC_FILE
>>  #define ARCH_HAS_KIMAGE_ARCH
>>  
>> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>> index 133257ffd859..6aff30de8f47 100644
>> --- a/arch/arm64/kernel/setup.c
>> +++ b/arch/arm64/kernel/setup.c
>> @@ -238,7 +238,18 @@ static void __init request_standard_resources(void)
>>  kernel_data.end <= res->end)
>>  request_resource(res, _data);
>>  #ifdef CONFIG_KEXEC_CORE
>> -/* Userspace will find "Crash kernel" region in /proc/iomem. */
>> +/*
>> + * Userspace will find "Crash kernel" or "Crash kernel (low)"
>> + * region in /proc/iomem.
>> + * In order to distinct from the high region and make no effect
>> + * to the use of existing kexec-tools, rename the low region as
>> + * "Crash kernel (low)".
>> + */
>> +if (crashk_low_res.end && crashk_low_res.start >= res->start &&
>> +crashk_low_res.end <= res->end) {
>> +crashk_low_res.name = "Crash kernel (low)";
>> +request_resource(res, _low_res);
>> +}
>>  if (crashk_res.end && crashk_res.start >= res->start &&
>>  crashk_res.end <= res->end)
>>  request_resource(res, _res);
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index a07fd8e1f926..888c4f7eadc3 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -34,6 +34,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  #include 
>>  #include 
>>  #include 
>> @@ -62,66 +63,11 @@ EXPORT_SYMBOL(memstart_addr);
>>  phys_addr_t arm64_dma_phys_limit __ro_after_init;
>>  phys_addr_t arm64_dma32_phys_limit __ro_after_init;
>>  
>> -#ifdef CONFIG_KEXEC_CORE
>> -/*
>> - * reserve_crashkernel() - reserves memory for crash kernel
>> - *
>> - * This function reserves memory area given in "crashkernel=" kernel command
>> - * line parameter. The memory reserved is used by dump capture kernel when
>> - * primary kernel is crashing.
>> - */
>> -static void __init reserve_crashkernel(void)
>> -{
>> -unsigned long long crash_base, crash_size;
>> -int ret;
>> -
>> -ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
>> -_size, _base);
>> -/* no crashkernel= or invalid value 

Re: [PATCH v13 1/8] x86: kdump: replace the hard-coded alignment with macro CRASH_ALIGN

2020-11-11 Thread chenzhou
Hi Baoquan,


On 2020/11/11 9:38, Baoquan He wrote:
> On 10/31/20 at 03:44pm, Chen Zhou wrote:
>> Move CRASH_ALIGN to header asm/kexec.h and replace the hard-coded
>> alignment with macro CRASH_ALIGN in function reserve_crashkernel().
> Seems you tell what you have done in this patch, but don't like adding
> several more words to tell why it's done like that. Please see below
> inline comments.
>
> In other patches, I can also see this similar problem.
Thanks for your review. I will update relevant commit messages.
>
>> Suggested-by: Dave Young 
>> Signed-off-by: Chen Zhou 
>> Tested-by: John Donnelly 
>> ---
>>  arch/x86/include/asm/kexec.h | 3 +++
>>  arch/x86/kernel/setup.c  | 5 +
>>  2 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
>> index 6802c59e8252..8cf9d3fd31c7 100644
>> --- a/arch/x86/include/asm/kexec.h
>> +++ b/arch/x86/include/asm/kexec.h
>> @@ -18,6 +18,9 @@
>>  
>>  # define KEXEC_CONTROL_CODE_MAX_SIZE2048
>>  
>> +/* 2M alignment for crash kernel regions */
>> +#define CRASH_ALIGN SZ_16M
>> +
>>  #ifndef __ASSEMBLY__
>>  
>>  #include 
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index 84f581c91db4..bf373422dc8a 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -395,9 +395,6 @@ static void __init 
>> memblock_x86_reserve_range_setup_data(void)
>>  
>>  #ifdef CONFIG_KEXEC_CORE
>>  
>> -/* 16M alignment for crash kernel regions */
>> -#define CRASH_ALIGN SZ_16M
>> -
>>  /*
>>   * Keep the crash kernel below this limit.
>>   *
>> @@ -515,7 +512,7 @@ static void __init reserve_crashkernel(void)
>>  } else {
>>  unsigned long long start;
>>  
>> -start = memblock_phys_alloc_range(crash_size, SZ_1M, crash_base,
>> +start = memblock_phys_alloc_range(crash_size, CRASH_ALIGN, 
>> crash_base,
>>crash_base + crash_size);
> Here, SZ_1M is replaced with CRASH_ALIGN which is 16M. I remember I ever
> commented that this had better be told in patch log.
got it.

Thanks,
Chen Zhou
>
>>  if (start != crash_base) {
>>  pr_info("crashkernel reservation failed - memory is in 
>> use.\n");
>> -- 
>> 2.20.1
>>
>>
>> ___
>> kexec mailing list
>> ke...@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/kexec
>>
> .
>



Re: [PATCH v13 0/8] support reserving crashkernel above 4G on arm64 kdump

2020-11-09 Thread chenzhou
Hi all,

Friendly ping...


On 2020/10/31 15:44, Chen Zhou wrote:
> There are following issues in arm64 kdump:
> 1. We use crashkernel=X to reserve crashkernel below 4G, which
> will fail when there is no enough low memory.
> 2. If reserving crashkernel above 4G, in this case, crash dump
> kernel will boot failure because there is no low memory available
> for allocation.
> 3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32"),
> if the memory reserved for crash dump kernel falled in ZONE_DMA32,
> the devices in crash dump kernel need to use ZONE_DMA will alloc
> fail.
>
> To solve these issues, change the behavior of crashkernel=X.
> crashkernel=X tries low allocation in DMA zone (or the DMA32 zone if
> CONFIG_ZONE_DMA is disabled), and fall back to high allocation if it fails.
>
> We can also use "crashkernel=X,high" to select a high region above
> DMA zone, which also tries to allocate at least 256M low memory in
> DMA zone automatically (or the DMA32 zone if CONFIG_ZONE_DMA is disabled).
> "crashkernel=Y,low" can be used to allocate specified size low memory.
>
> When reserving crashkernel in high memory, some low memory is reserved
> for crash dump kernel devices. So there may be two regions reserved for
> crash dump kernel.
> In order to distinct from the high region and make no effect to the use
> of existing kexec-tools, rename the low region as "Crash kernel (low)",
> and pass the low region by reusing DT property
> "linux,usable-memory-range". We made the low memory region as the last
> range of "linux,usable-memory-range" to keep compatibility with existing
> user-space and older kdump kernels.
>
> Besides, we need to modify kexec-tools:
> arm64: support more than one crash kernel regions(see [1])
>
> Another update is document about DT property 'linux,usable-memory-range':
> schemas: update 'linux,usable-memory-range' node schema(see [2])
>
> This patchset contains the following eight patches:
> 0001-x86-kdump-replace-the-hard-coded-alignment-with-macr.patch
> 0002-x86-kdump-make-the-lower-bound-of-crash-kernel-reser.patch
> 0003-x86-kdump-use-macro-CRASH_ADDR_LOW_MAX-in-functions-.patch
> 0004-x86-kdump-move-reserve_crashkernel-_low-into-crash_c.patch
> 0005-arm64-kdump-introduce-some-macroes-for-crash-kernel-.patch
> 0006-arm64-kdump-reimplement-crashkernel-X.patch
> 0007-arm64-kdump-add-memory-for-devices-by-DT-property-li.patch
> 0008-kdump-update-Documentation-about-crashkernel.patch
>
> 0001-0003 are some x86 cleanups which prepares for making
> functionsreserve_crashkernel[_low]() generic.
> 0004 makes functions reserve_crashkernel[_low]() generic.
> 0005-0006 reimplements arm64 crashkernel=X.
> 0007 adds memory for devices by DT property linux,usable-memory-range.
> 0008 updates the doc.
>
> Changes since [v12]
> - Rebased on top of 5.10-rc1.
> - Keep CRASH_ALIGN as 16M suggested by Dave.
> - Drop patch "kdump: add threshold for the required memory".
> - Add Tested-by from John.
>
> Changes since [v11]
> - Rebased on top of 5.9-rc4.
> - Make the function reserve_crashkernel() of x86 generic.
> Suggested by Catalin, make the function reserve_crashkernel() of x86 generic
> and arm64 use the generic version to reimplement crashkernel=X.
>
> Changes since [v10]
> - Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin.
>
> Changes since [v9]
> - Patch 1 add Acked-by from Dave.
> - Update patch 5 according to Dave's comments.
> - Update chosen schema.
>
> Changes since [v8]
> - Reuse DT property "linux,usable-memory-range".
> Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the 
> low
> memory region.
> - Fix kdump broken with ZONE_DMA reintroduced.
> - Update chosen schema.
>
> Changes since [v7]
> - Move x86 CRASH_ALIGN to 2M
> Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
> - Update Documentation/devicetree/bindings/chosen.txt.
> Add corresponding documentation to 
> Documentation/devicetree/bindings/chosen.txt
> suggested by Arnd.
> - Add Tested-by from Jhon and pk.
>
> Changes since [v6]
> - Fix build errors reported by kbuild test robot.
>
> Changes since [v5]
> - Move reserve_crashkernel_low() into kernel/crash_core.c.
> - Delete crashkernel=X,high.
> - Modify crashkernel=X,low.
> If crashkernel=X,low is specified simultaneously, reserve spcified size low
> memory for crash kdump kernel devices firstly and then reserve memory above 
> 4G.
> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
> pass to crash dump kernel by DT property "linux,low-memory-range".
> - Update Documentation/admin-guide/kdump/kdump.rst.
>
> Changes since [v4]
> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
>
> Changes since [v3]
> - Add memblock_cap_memory_ranges back for multiple ranges.
> - Fix some compiling warnings.
>
> Changes since [v2]
> - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
> two. Put "move reserve_crashkernel_low() into kexec_core.c" in a 

Re: [PATCH v12 0/9] support reserving crashkernel above 4G on arm64 kdump

2020-10-18 Thread chenzhou
Hi Bhupesh,


On 2020/10/7 15:07, Bhupesh Sharma wrote:
> Hi Catalin,
>
> On Tue, Oct 6, 2020 at 11:30 PM Catalin Marinas  
> wrote:
>> On Mon, Oct 05, 2020 at 11:12:10PM +0530, Bhupesh Sharma wrote:
>>> I think my earlier email with the test results on this series bounced
>>> off the mailing list server (for some weird reason), but I still see
>>> several issues with this patchset. I will add specific issues in the
>>> review comments for each patch again, but overall, with a crashkernel
>>> size of say 786M, I see the following issue:
>>>
>>> # cat /proc/cmdline
>>> BOOT_IMAGE=(hd7,gpt2)/vmlinuz-5.9.0-rc7+ root=<..snip..> 
>>> rd.lvm.lv=<..snip..> crashkernel=786M
>>>
>>> I see two regions of size 786M and 256M reserved in low and high
>>> regions respectively, So we reserve a total of 1042M of memory, which
>>> is an incorrect behaviour:
>>>
>>> # dmesg | grep -i crash
>>> [0.00] Reserving 256MB of low memory at 2816MB for crashkernel 
>>> (System low RAM: 768MB)
>>> [0.00] Reserving 786MB of memory at 654158MB for crashkernel 
>>> (System RAM: 130816MB)
>>> [0.00] Kernel command line: 
>>> BOOT_IMAGE=(hd2,gpt2)/vmlinuz-5.9.0-rc7+ 
>>> root=/dev/mapper/rhel_ampere--hr330a--03-root ro 
>>> rd.lvm.lv=rhel_ampere-hr330a-03/root rd.lvm.lv=rhel_ampere-hr330a-03/swap 
>>> crashkernel=786M cma=1024M
>>>
>>> # cat /proc/iomem | grep -i crash
>>>   b000-bfff : Crash kernel (low)
>>>   bfcbe0-bffcff : Crash kernel
>> As Chen said, that's the intended behaviour and how x86 works. The
>> requested 768M goes in the high range if there's not enough low memory
>> and an additional buffer for swiotlb is allocated, hence the low 256M.
> I understand, but why 256M (as low) for arm64? x86_64 setups usually
> have more system memory available as compared to several commercially
> available arm64 setups. So is the intent, just to keep the behavior
> similar between arm64 and x86_64?
>
> Should we have a CONFIG option / bootarg to help one select the max
> 'low_size'? Currently the ' low_size' value is calculated as:
>
> /*
>  * two parts from kernel/dma/swiotlb.c:
>  * -swiotlb size: user-specified with swiotlb= or default.
>  *
>  * -swiotlb overflow buffer: now hardcoded to 32k. We round it
>  * to 8M for other buffers that may need to stay low too. Also
>  * make sure we allocate enough extra low memory so that we
>  * don't run out of DMA buffers for 32-bit devices.
>  */
> low_size = max(swiotlb_size_or_default() + (8UL << 20), 256UL << 20);
>
> Since many arm64 boards ship with swiotlb=0 (turned off) via kernel
> bootargs, the low_size, still ends up being 256M in such cases,
> whereas this 256M can be used for some other purposes - so should we
> be limiting this to 64M and failing the crash kernel allocation
> request (gracefully) otherwise?
>
>> We could (as an additional patch), subtract the 256M from the high
>> allocation so that you'd get a low 256M and a high 512M, not sure it's
>> worth it. Note that with a "crashkernel=768M,high" option, you still get
>> the additional low 256M, otherwise the crashkernel won't be able to
>> boot as there's no memory in ZONE_DMA. In the explicit ",high" request
>> case, I'm not sure subtracted the 256M is more intuitive.
>> In 5.11, we also hope to fix the ZONE_DMA layout for non-RPi4 platforms
>> to cover the entire 32-bit address space (i.e. identical to the current
>> ZONE_DMA32).
>>
>>> IMO, we should test this feature more before including this in 5.11
>> Definitely. That's one of the reasons we haven't queued it yet. So any
>> help with testing here is appreciated.
> Sure, I am running more checks on this series. I will be soon back
> with more updates.

Sorry to bother you. I am looking forward to your review comments.


Thanks,
Chen Zhou
>
> Regards,
> Bhupesh
>
> .
>



Re: [PATCH v12 9/9] kdump: update Documentation about crashkernel

2020-10-05 Thread chenzhou
Hi Catalin,


On 2020/10/6 1:19, Catalin Marinas wrote:
> On Mon, Sep 07, 2020 at 09:47:45PM +0800, Chen Zhou wrote:
>> diff --git a/Documentation/admin-guide/kdump/kdump.rst 
>> b/Documentation/admin-guide/kdump/kdump.rst
>> index 2da65fef2a1c..549611abc581 100644
>> --- a/Documentation/admin-guide/kdump/kdump.rst
>> +++ b/Documentation/admin-guide/kdump/kdump.rst
> [...]
>> @@ -316,8 +325,18 @@ Boot into System Kernel
>> kernel will automatically locate the crash kernel image within the
>> first 512MB of RAM if X is not given.
>>  
>> -   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
>> -   the kernel, X if explicitly specified, must be aligned to 2MiB 
>> (0x20).
>> +   On arm64, use "crashkernel=X" to try low allocation in DMA zone, and
>> +   fall back to high allocation if it fails. And go for high allocation
>> +   directly if the required size is too large.
>> +   We can also use "crashkernel=X,high" to select a high region above
>> +   DMA zone, which also tries to allocate at least 256M low memory in
>> +   DMA zone automatically.
>> +   "crashkernel=Y,low" can be used to allocate specified size low memory
>> +   in DMA zone.
>> +   For non-RPi4 platforms, change DMA zone memtioned above to DMA32 zone.
> I don't think we should mention non-RPi4 explicitly here. I don't even
> understand what the suggestion is since the only way is to disable
> ZONE_DMA in the kernel config. I'd just stick to ZONE_DMA description
> here.
How about like this:
If the kernel config ZONE_DMA is disabled, just try low allocation in DMA32 zone
and high allocation above DMA32 zone.

Thanks,
Chen Zhou
>
>> +   Use "crashkernel=Y@X" if you really have to reserve memory from
>> +   specified start address X. Note that the start address of the kernel,
>> +   X if explicitly specified, must be aligned to 2MiB (0x20).
>>  
>>  Load the Dump-capture Kernel
>>  
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
>> b/Documentation/admin-guide/kernel-parameters.txt
>> index a1068742a6df..f7df572d8f64 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -727,6 +727,10 @@
>>  [KNL, X86-64] Select a region under 4G first, and
>>  fall back to reserve region above 4G when '@offset'
>>  hasn't been specified.
>> +[KNL, arm64] Try low allocation in DMA zone, fall back
>> +to high allocation if it fails when '@offset' hasn't 
>> been
>> +specified. For non-RPi4 platforms, change DMA zone to
>> +DMA32 zone.
> Same here, unclear what "change DMA zone to DMA32 zone" means.
>
>>  See Documentation/admin-guide/kdump/kdump.rst for 
>> further details.
>>  
>>  crashkernel=range1:size1[,range2:size2,...][@offset]
>> @@ -743,6 +747,8 @@
>>  Otherwise memory region will be allocated below 4G, if
>>  available.
>>  It will be ignored if crashkernel=X is specified.
>> +[KNL, arm64] range in high memory.
>> +Allow kernel to allocate physical memory region from 
>> top.
>>  crashkernel=size[KMG],low
>>  [KNL, X86-64] range under 4G. When crashkernel=X,high
>>  is passed, kernel could allocate physical memory region
>> @@ -751,13 +757,16 @@
>>  requires at least 64M+32K low memory, also enough extra
>>  low memory is needed to make sure DMA buffers for 32-bit
>>  devices won't run out. Kernel would try to allocate at
>> -at least 256M below 4G automatically.
>> +least 256M below 4G automatically.
>>  This one let user to specify own low range under 4G
>>  for second kernel instead.
>>  0: to disable low allocation.
>>  It will be ignored when crashkernel=X,high is not used
>>  or memory reserved is below 4G.
>> -
>> +[KNL, arm64] range in low memory.
>> +This one let user to specify a low range in DMA zone for
>> +crash dump kernel. For non-RPi4 platforms, change DMA 
>> zone
>> +to DMA32 zone.
> And again here.
>



Re: [PATCH v12 0/9] support reserving crashkernel above 4G on arm64 kdump

2020-10-05 Thread chenzhou
Hi Bhupesh,


On 2020/10/6 1:42, Bhupesh Sharma wrote:
> Hi Catalin, Chen,
>
> On Mon, Oct 5, 2020 at 10:39 PM Catalin Marinas  
> wrote:
>> On Sat, Sep 12, 2020 at 06:44:29AM -0500, John Donnelly wrote:
>>> On 9/7/20 8:47 AM, Chen Zhou wrote:
 Chen Zhou (9):
x86: kdump: move CRASH_ALIGN to 2M
x86: kdump: make the lower bound of crash kernel reservation
  consistent
x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions
  reserve_crashkernel[_low]()
x86: kdump: move reserve_crashkernel[_low]() into crash_core.c
arm64: kdump: introduce some macroes for crash kernel reservation
arm64: kdump: reimplement crashkernel=X
kdump: add threshold for the required memory
arm64: kdump: add memory for devices by DT property
  linux,usable-memory-range
kdump: update Documentation about crashkernel
>> [...]
>>> I did a brief unit-test on 5.9-rc4.
>>>
>>> Please add:
>>>
>>> Tested-by:  John Donnelly 
>> Thanks for testing.
>>
>>> This activity is over a year old. It needs accepted.
>> It's getting there, hopefully in 5.11. There are some minor tweaks to
>> address.
> I think my earlier email with the test results on this series bounced
> off the mailing list server (for some weird reason), but I still see
> several issues with this patchset. I will add specific issues in the
> review comments for each patch again, but overall, with a crashkernel
> size of say 786M, I see the following issue:
>
> # cat /proc/cmdline
> BOOT_IMAGE=(hd7,gpt2)/vmlinuz-5.9.0-rc7+ root=<..snip..>
> rd.lvm.lv=<..snip..> crashkernel=786M
>
> I see two regions of size 786M and 256M reserved in low and high
> regions respectively, So we reserve a total of 1042M of memory, which
> is an incorrect behaviour:
>
> # dmesg | grep -i crash
> [0.00] Reserving 256MB of low memory at 2816MB for crashkernel
> (System low RAM: 768MB)
> [0.00] Reserving 786MB of memory at 654158MB for crashkernel
> (System RAM: 130816MB)
> [0.00] Kernel command line:
> BOOT_IMAGE=(hd2,gpt2)/vmlinuz-5.9.0-rc7+
> root=/dev/mapper/rhel_ampere--hr330a--03-root ro
> rd.lvm.lv=rhel_ampere-hr330a-03/root
> rd.lvm.lv=rhel_ampere-hr330a-03/swap crashkernel=786M cma=1024M
>
> # cat /proc/iomem | grep -i crash
>   b000-bfff : Crash kernel (low)
>   bfcbe0-bffcff : Crash kernel
>
> IMO, we should test this feature more before including this in 5.11
Thanks for you test. This behavior is what we what. What is the correct 
behavior you think?

Besides, this feature is been tested by John and PK, and i test for various 
parameters.
We may miss something, any comments are welcome.

Thanks,
Chen Zhou
>
> Thanks,
> Bhupesh
>
> .
>



Re: [PATCH v12 7/9] kdump: add threshold for the required memory

2020-10-05 Thread chenzhou



On 2020/10/6 1:12, Catalin Marinas wrote:
> On Mon, Sep 07, 2020 at 09:47:43PM +0800, Chen Zhou wrote:
>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>> index 3f735cb37ace..d11d597a470d 100644
>> --- a/kernel/crash_core.c
>> +++ b/kernel/crash_core.c
>> @@ -378,6 +378,15 @@ int __init reserve_crashkernel_low(void)
>>  }
>>  
>>  #if defined(CONFIG_X86) || defined(CONFIG_ARM64)
>> +
>> +/*
>> + * Add a threshold for required memory size of crashkernel. If required 
>> memory
>> + * size is greater than threshold, just go for high allocation directly. The
>> + * value of threshold is set as half of the total low memory.
>> + */
>> +#define REQUIRED_MEMORY_THRESHOLD   (memblock_mem_size(CRASH_ADDR_LOW_MAX 
>> >> \
>> +PAGE_SHIFT) >> 1)
>> +
>>  #ifdef CONFIG_KEXEC_CORE
>>  /*
>>   * reserve_crashkernel() - reserves memory for crash kernel
>> @@ -422,7 +431,7 @@ void __init reserve_crashkernel(void)
>>   * So try low memory first and fall back to high memory
>>   * unless "crashkernel=size[KMG],high" is specified.
>>   */
>> -if (!high)
>> +if (!high && crash_size <= REQUIRED_MEMORY_THRESHOLD)
>>  crash_base = memblock_find_in_range(CRASH_ALIGN,
>>  CRASH_ADDR_LOW_MAX,
>>  crash_size, CRASH_ALIGN);
> Since any change now is affecting the x86 semantics slightly, I'd
> suggest you drop this patch. We can add it later if needed, once the
> core changes are in.
Ok, i will drop this patch in next version.

Thanks,
Chen Zhou
>
> Thinking about this, if one requires a crashkernel reservation that
> allocates all of the ZONE_DMA, it would probably be noticed and explicit
> ,high/,low options can be used.
>
> Note that we are also trying to make ZONE_DMA full 32-bit on non-RPi4
> hardware.
>



Re: [PATCH v12 6/9] arm64: kdump: reimplement crashkernel=X

2020-10-05 Thread chenzhou



On 2020/10/6 1:16, Catalin Marinas wrote:
> On Mon, Sep 07, 2020 at 09:47:42PM +0800, Chen Zhou wrote:
>> diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
>> index 53acbeca4f57..1b24072f2bae 100644
>> --- a/arch/arm64/kernel/setup.c
>> +++ b/arch/arm64/kernel/setup.c
>> @@ -238,7 +238,18 @@ static void __init request_standard_resources(void)
>>  kernel_data.end <= res->end)
>>  request_resource(res, _data);
>>  #ifdef CONFIG_KEXEC_CORE
>> -/* Userspace will find "Crash kernel" region in /proc/iomem. */
>> +/*
>> + * Userspace will find "Crash kernel" or "Crash kernel (low)"
>> + * region in /proc/iomem.
>> + * In order to distinct from the high region and make no effect
>> + * to the use of existing kexec-tools, rename the low region as
>> + * "Crash kernel (low)".
>> + */
>> +if (crashk_low_res.end && crashk_low_res.start >= res->start &&
>> +crashk_low_res.end <= res->end) {
>> +crashk_low_res.name = "Crash kernel (low)";
>> +request_resource(res, _low_res);
>> +}
> With the changes in this series (including the above), how do the
> current kexec-tools behave? Do they pick just the high region and the
> loaded kernel will subsequently fail to boot?
Yes,just pick the high region and will boot fail if low memory is needed.

Thanks,
Chen Zhou
>



Re: [PATCH v12 3/9] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel[_low]()

2020-09-18 Thread chenzhou
Hi Catalin,


On 2020/9/18 16:59, chenzhou wrote:
> Hi Baoquan,
>
> On 2020/9/18 15:25, Baoquan He wrote:
>> Hi,
>>
>> On 09/07/20 at 09:47pm, Chen Zhou wrote:
>>> To make the functions reserve_crashkernel[_low]() as generic,
>>> replace some hard-coded numbers with macro CRASH_ADDR_LOW_MAX.
>>>
>>> Signed-off-by: Chen Zhou 
>>> ---
>>>  arch/x86/kernel/setup.c | 11 ++-
>>>  1 file changed, 6 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>>> index d7fd90c52dae..71a6a6e7ca5b 100644
>>> --- a/arch/x86/kernel/setup.c
>>> +++ b/arch/x86/kernel/setup.c
>>> @@ -430,7 +430,7 @@ static int __init reserve_crashkernel_low(void)
>>> unsigned long total_low_mem;
>>> int ret;
>>>  
>>> -   total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
>>> +   total_low_mem = memblock_mem_size(CRASH_ADDR_LOW_MAX >> PAGE_SHIFT);
>> Just note that the replacement has been done in another patch from Mike
>> Rapoport, partially. He seems to have done reserve_crashkernel_low()
>> part, there's one left in reserve_crashkernel(), you might want to check
>> that. 
>>
>> Mike's patch which is from a patchset has been merged into Andrew's next
>> tree.
>>
>> commit 6e50f7672ffa362e9bd4bc0c0d2524ed872828c5
>> Author: Mike Rapoport 
>> Date:   Wed Aug 26 15:22:32 2020 +1000
>>
>> x86/setup: simplify reserve_crashkernel()
As Baoquan said, some functions have been changed in the next tree,
if i need to rebase on top of the next tree.

Thanks,
Chen Zhou
> Yeah, the function reserve_crashkernel() has been changed in the next tree.
> Thanks for your review and reminder.
>
> Thanks,
> Chen Zhou
>>>  
>>> /* crashkernel=Y,low */
>>> ret = parse_crashkernel_low(boot_command_line, total_low_mem, 
>>> _size, );
>>> @@ -451,7 +451,7 @@ static int __init reserve_crashkernel_low(void)
>>> return 0;
>>> }
>>>  
>>> -   low_base = memblock_find_in_range(CRASH_ALIGN, 1ULL << 32, low_size, 
>>> CRASH_ALIGN);
>>> +   low_base = memblock_find_in_range(CRASH_ALIGN, CRASH_ADDR_LOW_MAX, 
>>> low_size, CRASH_ALIGN);
>>> if (!low_base) {
>>> pr_err("Cannot reserve %ldMB crashkernel low memory, please try 
>>> smaller size.\n",
>>>(unsigned long)(low_size >> 20));
>>> @@ -504,8 +504,9 @@ static void __init reserve_crashkernel(void)
>>> if (!crash_base) {
>>> /*
>>>  * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
>>> -* crashkernel=x,high reserves memory over 4G, also allocates
>>> -* 256M extra low memory for DMA buffers and swiotlb.
>>> +* crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
>>> +* also allocates 256M extra low memory for DMA buffers
>>> +* and swiotlb.
>>>  * But the extra memory is not required for all machines.
>>>  * So try low memory first and fall back to high memory
>>>  * unless "crashkernel=size[KMG],high" is specified.
>>> @@ -539,7 +540,7 @@ static void __init reserve_crashkernel(void)
>>> return;
>>> }
>>>  
>>> -   if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
>>> +   if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
>>> memblock_free(crash_base, crash_size);
>>> return;
>>> }
>>> -- 
>>> 2.20.1
>>>
>> .
>>



Re: [PATCH v12 3/9] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel[_low]()

2020-09-18 Thread chenzhou
Hi Baoquan,

On 2020/9/18 15:25, Baoquan He wrote:
> Hi,
>
> On 09/07/20 at 09:47pm, Chen Zhou wrote:
>> To make the functions reserve_crashkernel[_low]() as generic,
>> replace some hard-coded numbers with macro CRASH_ADDR_LOW_MAX.
>>
>> Signed-off-by: Chen Zhou 
>> ---
>>  arch/x86/kernel/setup.c | 11 ++-
>>  1 file changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index d7fd90c52dae..71a6a6e7ca5b 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -430,7 +430,7 @@ static int __init reserve_crashkernel_low(void)
>>  unsigned long total_low_mem;
>>  int ret;
>>  
>> -total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
>> +total_low_mem = memblock_mem_size(CRASH_ADDR_LOW_MAX >> PAGE_SHIFT);
> Just note that the replacement has been done in another patch from Mike
> Rapoport, partially. He seems to have done reserve_crashkernel_low()
> part, there's one left in reserve_crashkernel(), you might want to check
> that. 
>
> Mike's patch which is from a patchset has been merged into Andrew's next
> tree.
>
> commit 6e50f7672ffa362e9bd4bc0c0d2524ed872828c5
> Author: Mike Rapoport 
> Date:   Wed Aug 26 15:22:32 2020 +1000
>
> x86/setup: simplify reserve_crashkernel()
Yeah, the function reserve_crashkernel() has been changed in the next tree.
Thanks for your review and reminder.

Thanks,
Chen Zhou
>
>>  
>>  /* crashkernel=Y,low */
>>  ret = parse_crashkernel_low(boot_command_line, total_low_mem, 
>> _size, );
>> @@ -451,7 +451,7 @@ static int __init reserve_crashkernel_low(void)
>>  return 0;
>>  }
>>  
>> -low_base = memblock_find_in_range(CRASH_ALIGN, 1ULL << 32, low_size, 
>> CRASH_ALIGN);
>> +low_base = memblock_find_in_range(CRASH_ALIGN, CRASH_ADDR_LOW_MAX, 
>> low_size, CRASH_ALIGN);
>>  if (!low_base) {
>>  pr_err("Cannot reserve %ldMB crashkernel low memory, please try 
>> smaller size.\n",
>> (unsigned long)(low_size >> 20));
>> @@ -504,8 +504,9 @@ static void __init reserve_crashkernel(void)
>>  if (!crash_base) {
>>  /*
>>   * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
>> - * crashkernel=x,high reserves memory over 4G, also allocates
>> - * 256M extra low memory for DMA buffers and swiotlb.
>> + * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
>> + * also allocates 256M extra low memory for DMA buffers
>> + * and swiotlb.
>>   * But the extra memory is not required for all machines.
>>   * So try low memory first and fall back to high memory
>>   * unless "crashkernel=size[KMG],high" is specified.
>> @@ -539,7 +540,7 @@ static void __init reserve_crashkernel(void)
>>  return;
>>  }
>>  
>> -if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
>> +if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
>>  memblock_free(crash_base, crash_size);
>>  return;
>>  }
>> -- 
>> 2.20.1
>>
> .
>



Re: [PATCH v12 3/9] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel[_low]()

2020-09-17 Thread chenzhou
Hi Dave,


On 2020/9/18 11:01, Dave Young wrote:
> On 09/07/20 at 09:47pm, Chen Zhou wrote:
>> To make the functions reserve_crashkernel[_low]() as generic,
>> replace some hard-coded numbers with macro CRASH_ADDR_LOW_MAX.
>>
>> Signed-off-by: Chen Zhou 
>> ---
>>  arch/x86/kernel/setup.c | 11 ++-
>>  1 file changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index d7fd90c52dae..71a6a6e7ca5b 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -430,7 +430,7 @@ static int __init reserve_crashkernel_low(void)
>>  unsigned long total_low_mem;
>>  int ret;
>>  
>> -total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
>> +total_low_mem = memblock_mem_size(CRASH_ADDR_LOW_MAX >> PAGE_SHIFT);
> total_low_mem != CRASH_ADDR_LOW_MAX
I just replace the magic number with macro, no other change.
Besides, function memblock_mem_size(limit_pfn) will compute the memory size
according to the actual system ram.

Thanks,
Chen Zhou
>
>>  
>>  /* crashkernel=Y,low */
>>  ret = parse_crashkernel_low(boot_command_line, total_low_mem, 
>> _size, );
> The param total_low_mem is for dynamically change crash_size according
> to system ram size.
>
> Is above change a must for your arm64 patches?
See above.
>
>> @@ -451,7 +451,7 @@ static int __init reserve_crashkernel_low(void)
>>  return 0;
>>  }
>>  
>> -low_base = memblock_find_in_range(CRASH_ALIGN, 1ULL << 32, low_size, 
>> CRASH_ALIGN);
>> +low_base = memblock_find_in_range(CRASH_ALIGN, CRASH_ADDR_LOW_MAX, 
>> low_size, CRASH_ALIGN);
>>  if (!low_base) {
>>  pr_err("Cannot reserve %ldMB crashkernel low memory, please try 
>> smaller size.\n",
>> (unsigned long)(low_size >> 20));
>> @@ -504,8 +504,9 @@ static void __init reserve_crashkernel(void)
>>  if (!crash_base) {
>>  /*
>>   * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
>> - * crashkernel=x,high reserves memory over 4G, also allocates
>> - * 256M extra low memory for DMA buffers and swiotlb.
>> + * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
>> + * also allocates 256M extra low memory for DMA buffers
>> + * and swiotlb.
>>   * But the extra memory is not required for all machines.
>>   * So try low memory first and fall back to high memory
>>   * unless "crashkernel=size[KMG],high" is specified.
>> @@ -539,7 +540,7 @@ static void __init reserve_crashkernel(void)
>>  return;
>>  }
>>  
>> -if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
>> +if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
>>  memblock_free(crash_base, crash_size);
>>  return;
>>  }
>> -- 
>> 2.20.1
>>
> .
>



Re: [PATCH v12 0/9] support reserving crashkernel above 4G on arm64 kdump

2020-09-15 Thread chenzhou



On 2020/9/7 21:47, Chen Zhou wrote:
> There are following issues in arm64 kdump:
> 1. We use crashkernel=X to reserve crashkernel below 4G, which
> will fail when there is no enough low memory.
> 2. If reserving crashkernel above 4G, in this case, crash dump
> kernel will boot failure because there is no low memory available
> for allocation.
> 3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32"),
> if the memory reserved for crash dump kernel falled in ZONE_DMA32,
> the devices in crash dump kernel need to use ZONE_DMA will alloc
> fail.
>
> To solve these issues, change the behavior of crashkernel=X.
> crashkernel=X tries low allocation in DMA zone, and fall back to
> high allocation if it fails.
> If requized size X is too large and leads to very little low memory
> in DMA zone after low allocation, the system may not work normally.
> So add a threshold and go for high allocation directly if the required
> size is too large. The value of threshold is set as the half of
> the low memory.
>
> We can also use "crashkernel=X,high" to select a high region above
> DMA zone, which also tries to allocate at least 256M low memory in
> DMA zone automatically.
> "crashkernel=Y,low" can be used to allocate specified size low memory.
> For non-RPi4 platforms, change DMA zone memtioned above to DMA32 zone.
>
> When reserving crashkernel in high memory, some low memory is reserved
> for crash dump kernel devices. So there may be two regions reserved for
> crash dump kernel.
> In order to distinct from the high region and make no effect to the use
> of existing kexec-tools, rename the low region as "Crash kernel (low)",
> and pass the low region by reusing DT property
> "linux,usable-memory-range". We made the low memory region as the last
> range of "linux,usable-memory-range" to keep compatibility with existing
> user-space and older kdump kernels.
>
> Besides, we need to modify kexec-tools:
> arm64: support more than one crash kernel regions(see [1])
>
> Another update is document about DT property 'linux,usable-memory-range':
> schemas: update 'linux,usable-memory-range' node schema(see [2])
>
> This patchset contains the following nine patches:
> 0001-x86-kdump-move-CRASH_ALIGN-to-2M.patch
> 0002-x86-kdump-make-the-lower-bound-of-crash-kernel-reser.patch
> 0003-x86-kdump-use-macro-CRASH_ADDR_LOW_MAX-in-functions-.patch
> 0004-x86-kdump-move-reserve_crashkernel-_low-into-crash_c.patch
> 0005-arm64-kdump-introduce-some-macroes-for-crash-kernel-.patch
> 0006-arm64-kdump-reimplement-crashkernel-X.patch
> 0007-kdump-add-threshold-for-the-required-memory.patch
> 0008-arm64-kdump-add-memory-for-devices-by-DT-property-li.patch
> 0009-kdump-update-Documentation-about-crashkernel.patch
>
> 0001-0003 are some x86 cleanups which prepares for making
> functionsreserve_crashkernel[_low]() generic.
>
> 0004 makes functions reserve_crashkernel[_low]() generic.
> 0005-0006 reimplements crashkernel=X.
> 0007 adds threshold for the required memory.
> 0008 adds memory for devices by DT property linux,usable-memory-range.
> 0009 updates the doc.
Hi Catalin and Dave,

Any other suggestions about this patchset? Let me know if you have any 
questions.

Thanks,
Chen Zhou
>
> Changes since [v11]
> - Rebased on top of 5.9-rc4.
> - Make the function reserve_crashkernel() of x86 generic.
> Suggested by Catalin, make the function reserve_crashkernel() of x86 generic
> and arm64 use the generic version to reimplement crashkernel=X.
>
> Changes since [v10]
> - Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin.
>
> Changes since [v9]
> - Patch 1 add Acked-by from Dave.
> - Update patch 5 according to Dave's comments.
> - Update chosen schema.
>
> Changes since [v8]
> - Reuse DT property "linux,usable-memory-range".
> Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the 
> low
> memory region.
> - Fix kdump broken with ZONE_DMA reintroduced.
> - Update chosen schema.
>
> Changes since [v7]
> - Move x86 CRASH_ALIGN to 2M
> Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
> - Update Documentation/devicetree/bindings/chosen.txt.
> Add corresponding documentation to 
> Documentation/devicetree/bindings/chosen.txt
> suggested by Arnd.
> - Add Tested-by from Jhon and pk.
>
> Changes since [v6]
> - Fix build errors reported by kbuild test robot.
>
> Changes since [v5]
> - Move reserve_crashkernel_low() into kernel/crash_core.c.
> - Delete crashkernel=X,high.
> - Modify crashkernel=X,low.
> If crashkernel=X,low is specified simultaneously, reserve spcified size low
> memory for crash kdump kernel devices firstly and then reserve memory above 
> 4G.
> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
> pass to crash dump kernel by DT property "linux,low-memory-range".
> - Update Documentation/admin-guide/kdump/kdump.rst.
>
> Changes since [v4]
> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
>
> Changes since [v3]
> - Add 

Re: [PATCH v12 1/9] x86: kdump: move CRASH_ALIGN to 2M

2020-09-07 Thread chenzhou



On 2020/9/8 9:21, Dave Young wrote:
> Hi,
>
> On 09/07/20 at 09:47pm, Chen Zhou wrote:
>> CONFIG_PHYSICAL_ALIGN can be selected from 2M to 16M and default
>> value is 2M, so move CRASH_ALIGN to 2M, with smaller value reservation
>> can have more chance to succeed.
> Seems still some misunderstanding about the change :(  I'm sorry if I
> did not explain it clearly.
>
> Previously I missed the PHYSICAL_ALIGN can change according to .config
> I mean we should change the value to CONFIG_PHYSICAL_ALIGN for X86
> And I suggest to move back to keep using 16M.  And do not change it in
> this series.
Hi Dave,

Sorry, i misunderstood about this.

Ok, this patch will keep the value of CRASH_ALIGN as it is,
just move CRASH_ALIGN to header asm/kexec.h and replace the hard-coded alignment
with macro CRASH_ALIGN in function reserve_crashkernel().

Thanks,
Chen Zhou
>
>> And replace the hard-coded alignment with macro CRASH_ALIGN in function
>> reserve_crashkernel().
>>
>> Suggested-by: Dave Young 
>> Signed-off-by: Chen Zhou 
>> ---
>>  arch/x86/include/asm/kexec.h | 3 +++
>>  arch/x86/kernel/setup.c  | 5 +
>>  2 files changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
>> index 6802c59e8252..83f200dd54a1 100644
>> --- a/arch/x86/include/asm/kexec.h
>> +++ b/arch/x86/include/asm/kexec.h
>> @@ -18,6 +18,9 @@
>>  
>>  # define KEXEC_CONTROL_CODE_MAX_SIZE2048
>>  
>> +/* 2M alignment for crash kernel regions */
>> +#define CRASH_ALIGN SZ_2M
>> +
>>  #ifndef __ASSEMBLY__
>>  
>>  #include 
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index 3511736fbc74..296294ad0dd8 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -402,9 +402,6 @@ static void __init 
>> memblock_x86_reserve_range_setup_data(void)
>>  
>>  #ifdef CONFIG_KEXEC_CORE
>>  
>> -/* 16M alignment for crash kernel regions */
>> -#define CRASH_ALIGN SZ_16M
>> -
>>  /*
>>   * Keep the crash kernel below this limit.
>>   *
>> @@ -530,7 +527,7 @@ static void __init reserve_crashkernel(void)
>>  
>>  start = memblock_find_in_range(crash_base,
>> crash_base + crash_size,
>> -   crash_size, 1 << 20);
>> +   crash_size, CRASH_ALIGN);
>>  if (start != crash_base) {
>>  pr_info("crashkernel reservation failed - memory is in 
>> use.\n");
>>  return;
>> -- 
>> 2.20.1
>>
> Thanks
> Dave
>
>
> .
>




Re: [PATCH v11 3/5] arm64: kdump: reimplement crashkernel=X

2020-09-04 Thread chenzhou



On 2020/9/4 12:16, Dave Young wrote:
> On 09/04/20 at 12:02pm, chenzhou wrote:
>>
>> On 2020/9/4 11:10, Dave Young wrote:
>>> On 09/04/20 at 11:04am, Dave Young wrote:
>>>> On 09/03/20 at 07:26pm, chenzhou wrote:
>>>>> Hi Catalin,
>>>>>
>>>>>
>>>>> On 2020/9/3 1:09, Catalin Marinas wrote:
>>>>>> On Sat, Aug 01, 2020 at 09:08:54PM +0800, Chen Zhou wrote:
>>>>>>> There are following issues in arm64 kdump:
>>>>>>> 1. We use crashkernel=X to reserve crashkernel below 4G, which
>>>>>>> will fail when there is no enough low memory.
>>>>>>> 2. If reserving crashkernel above 4G, in this case, crash dump
>>>>>>> kernel will boot failure because there is no low memory available
>>>>>>> for allocation.
>>>>>>> 3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and 
>>>>>>> ZONE_DMA32"),
>>>>>>> if the memory reserved for crash dump kernel falled in ZONE_DMA32,
>>>>>>> the devices in crash dump kernel need to use ZONE_DMA will alloc
>>>>>>> fail.
>>>>>>>
>>>>>>> To solve these issues, change the behavior of crashkernel=X.
>>>>>>> crashkernel=X tries low allocation in ZONE_DMA, and fall back to
>>>>>>> high allocation if it fails.
>>>>>>>
>>>>>>> If requized size X is too large and leads to very little free memory
>>>>>>> in ZONE_DMA after low allocation, the system may not work normally.
>>>>>>> So add a threshold and go for high allocation directly if the required
>>>>>>> size is too large. The value of threshold is set as the half of
>>>>>>> the low memory.
>>>>>>>
>>>>>>> If crash_base is outside ZONE_DMA, try to allocate at least 256M in
>>>>>>> ZONE_DMA automatically. "crashkernel=Y,low" can be used to allocate
>>>>>>> specified size low memory.
>>>>>> Except for the threshold to keep zone ZONE_DMA memory,
>>>>>> reserve_crashkernel() looks very close to the x86 version. Shall we try
>>>>>> to make this generic as well? In the first instance, you could avoid the
>>>>>> threshold check if it takes an explicit ",high" option.
>>>>> Ok, i will try to do this.
>>>>>
>>>>> I look into the function reserve_crashkernel() of x86 and found the start 
>>>>> address is
>>>>> CRASH_ALIGN in function memblock_find_in_range(), which is different with 
>>>>> arm64.
>>>>>
>>>>> I don't figure out why is CRASH_ALIGN in x86, is there any specific 
>>>>> reason?
>>>> Hmm, took another look at the option CONFIG_PHYSICAL_ALIGN
>>>> config PHYSICAL_ALIGN
>>>> hex "Alignment value to which kernel should be aligned"
>>>> default "0x20"
>>>> range 0x2000 0x100 if X86_32
>>>> range 0x20 0x100 if X86_64
>>>>
>>>> According to above, I think the 16M should come from the largest value
>>>> But the default value is 2M,  with smaller value reservation can have
>>>> more chance to succeed.
>>>>
>>>> It seems we still need arch specific CRASH_ALIGN, but the initial
>>>> version you added the #ifdef for different arches, can you move the
>>>> macro to arch specific headers?
>>> And just keep the x86 align value as is, I can try to change the x86
>>> value later to CONFIG_PHYSICAL_ALIGN, in this way this series can be
>>> cleaner.
>> Ok. I have no question about the value of macro CRASH_ALIGN,
>> instead the lower bound of memblock_find_in_range().
>>
>> For x86, in reserve_crashkernel(),restrict the lower bound of the range to 
>> CRASH_ALIGN,
>> ...
>> crash_base = memblock_find_in_range(CRASH_ALIGN,
>> CRASH_ADDR_LOW_MAX,
>> crash_size, CRASH_ALIGN);
>> ...
>>
>> in reserve_crashkernel_low(),with no this restriction.
>> ...
>> low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
>> ...
>>
>> How about all making memblock_find_in_range() search from the start of 
>> memory?
>> If it is ok, i will do like this in the generic version.
> I feel starting with CRASH_ALIGN sounds better, can you just search from
> CRASH_ALIGN in generic version?
ok.
>
> Thanks
> Dave
>
>
> .
>




Re: [PATCH v11 3/5] arm64: kdump: reimplement crashkernel=X

2020-09-03 Thread chenzhou



On 2020/9/4 11:10, Dave Young wrote:
> On 09/04/20 at 11:04am, Dave Young wrote:
>> On 09/03/20 at 07:26pm, chenzhou wrote:
>>> Hi Catalin,
>>>
>>>
>>> On 2020/9/3 1:09, Catalin Marinas wrote:
>>>> On Sat, Aug 01, 2020 at 09:08:54PM +0800, Chen Zhou wrote:
>>>>> There are following issues in arm64 kdump:
>>>>> 1. We use crashkernel=X to reserve crashkernel below 4G, which
>>>>> will fail when there is no enough low memory.
>>>>> 2. If reserving crashkernel above 4G, in this case, crash dump
>>>>> kernel will boot failure because there is no low memory available
>>>>> for allocation.
>>>>> 3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32"),
>>>>> if the memory reserved for crash dump kernel falled in ZONE_DMA32,
>>>>> the devices in crash dump kernel need to use ZONE_DMA will alloc
>>>>> fail.
>>>>>
>>>>> To solve these issues, change the behavior of crashkernel=X.
>>>>> crashkernel=X tries low allocation in ZONE_DMA, and fall back to
>>>>> high allocation if it fails.
>>>>>
>>>>> If requized size X is too large and leads to very little free memory
>>>>> in ZONE_DMA after low allocation, the system may not work normally.
>>>>> So add a threshold and go for high allocation directly if the required
>>>>> size is too large. The value of threshold is set as the half of
>>>>> the low memory.
>>>>>
>>>>> If crash_base is outside ZONE_DMA, try to allocate at least 256M in
>>>>> ZONE_DMA automatically. "crashkernel=Y,low" can be used to allocate
>>>>> specified size low memory.
>>>> Except for the threshold to keep zone ZONE_DMA memory,
>>>> reserve_crashkernel() looks very close to the x86 version. Shall we try
>>>> to make this generic as well? In the first instance, you could avoid the
>>>> threshold check if it takes an explicit ",high" option.
>>> Ok, i will try to do this.
>>>
>>> I look into the function reserve_crashkernel() of x86 and found the start 
>>> address is
>>> CRASH_ALIGN in function memblock_find_in_range(), which is different with 
>>> arm64.
>>>
>>> I don't figure out why is CRASH_ALIGN in x86, is there any specific reason?
>> Hmm, took another look at the option CONFIG_PHYSICAL_ALIGN
>> config PHYSICAL_ALIGN
>> hex "Alignment value to which kernel should be aligned"
>> default "0x20"
>> range 0x2000 0x100 if X86_32
>> range 0x20 0x100 if X86_64
>>
>> According to above, I think the 16M should come from the largest value
>> But the default value is 2M,  with smaller value reservation can have
>> more chance to succeed.
>>
>> It seems we still need arch specific CRASH_ALIGN, but the initial
>> version you added the #ifdef for different arches, can you move the
>> macro to arch specific headers?
> And just keep the x86 align value as is, I can try to change the x86
> value later to CONFIG_PHYSICAL_ALIGN, in this way this series can be
> cleaner.
Ok. I have no question about the value of macro CRASH_ALIGN,
instead the lower bound of memblock_find_in_range().

For x86, in reserve_crashkernel(),restrict the lower bound of the range to 
CRASH_ALIGN,
...
crash_base = memblock_find_in_range(CRASH_ALIGN,
CRASH_ADDR_LOW_MAX,
crash_size, CRASH_ALIGN);
...
   
in reserve_crashkernel_low(),with no this restriction.
...
low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
...

How about all making memblock_find_in_range() search from the start of memory?
If it is ok, i will do like this in the generic version.

Thanks,
Chen Zhou
>
>> Thanks
>> Dave
>
> .
>




Re: [PATCH v11 3/5] arm64: kdump: reimplement crashkernel=X

2020-09-03 Thread chenzhou
Hi Catalin,


On 2020/9/3 1:09, Catalin Marinas wrote:
> On Sat, Aug 01, 2020 at 09:08:54PM +0800, Chen Zhou wrote:
>> There are following issues in arm64 kdump:
>> 1. We use crashkernel=X to reserve crashkernel below 4G, which
>> will fail when there is no enough low memory.
>> 2. If reserving crashkernel above 4G, in this case, crash dump
>> kernel will boot failure because there is no low memory available
>> for allocation.
>> 3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32"),
>> if the memory reserved for crash dump kernel falled in ZONE_DMA32,
>> the devices in crash dump kernel need to use ZONE_DMA will alloc
>> fail.
>>
>> To solve these issues, change the behavior of crashkernel=X.
>> crashkernel=X tries low allocation in ZONE_DMA, and fall back to
>> high allocation if it fails.
>>
>> If requized size X is too large and leads to very little free memory
>> in ZONE_DMA after low allocation, the system may not work normally.
>> So add a threshold and go for high allocation directly if the required
>> size is too large. The value of threshold is set as the half of
>> the low memory.
>>
>> If crash_base is outside ZONE_DMA, try to allocate at least 256M in
>> ZONE_DMA automatically. "crashkernel=Y,low" can be used to allocate
>> specified size low memory.
> Except for the threshold to keep zone ZONE_DMA memory,
> reserve_crashkernel() looks very close to the x86 version. Shall we try
> to make this generic as well? In the first instance, you could avoid the
> threshold check if it takes an explicit ",high" option.
Ok, i will try to do this.

I look into the function reserve_crashkernel() of x86 and found the start 
address is
CRASH_ALIGN in function memblock_find_in_range(), which is different with arm64.

I don't figure out why is CRASH_ALIGN in x86, is there any specific reason?

Thanks,
Chen Zhou




Re: [PATCH v11 3/5] arm64: kdump: reimplement crashkernel=X

2020-09-03 Thread chenzhou



On 2020/9/3 19:26, chenzhou wrote:
> Hi Catalin,
>
>
> On 2020/9/3 1:09, Catalin Marinas wrote:
>> On Sat, Aug 01, 2020 at 09:08:54PM +0800, Chen Zhou wrote:
>>> There are following issues in arm64 kdump:
>>> 1. We use crashkernel=X to reserve crashkernel below 4G, which
>>> will fail when there is no enough low memory.
>>> 2. If reserving crashkernel above 4G, in this case, crash dump
>>> kernel will boot failure because there is no low memory available
>>> for allocation.
>>> 3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32"),
>>> if the memory reserved for crash dump kernel falled in ZONE_DMA32,
>>> the devices in crash dump kernel need to use ZONE_DMA will alloc
>>> fail.
>>>
>>> To solve these issues, change the behavior of crashkernel=X.
>>> crashkernel=X tries low allocation in ZONE_DMA, and fall back to
>>> high allocation if it fails.
>>>
>>> If requized size X is too large and leads to very little free memory
>>> in ZONE_DMA after low allocation, the system may not work normally.
>>> So add a threshold and go for high allocation directly if the required
>>> size is too large. The value of threshold is set as the half of
>>> the low memory.
>>>
>>> If crash_base is outside ZONE_DMA, try to allocate at least 256M in
>>> ZONE_DMA automatically. "crashkernel=Y,low" can be used to allocate
>>> specified size low memory.
>> Except for the threshold to keep zone ZONE_DMA memory,
>> reserve_crashkernel() looks very close to the x86 version. Shall we try
>> to make this generic as well? In the first instance, you could avoid the
>> threshold check if it takes an explicit ",high" option.
> Ok, i will try to do this.
>
> I look into the function reserve_crashkernel() of x86 and found the start 
> address is
> CRASH_ALIGN in function memblock_find_in_range(), which is different with 
> arm64.
>
> I don't figure out why is CRASH_ALIGN in x86, is there any specific reason?
Besides, in function reserve_crashkernel_low() of x86, the start address is 0.

>
> Thanks,
> Chen Zhou
>
>
>
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
> .
>




Re: [PATCH v11 5/5] kdump: update Documentation about crashkernel

2020-09-03 Thread chenzhou



On 2020/9/3 1:13, Catalin Marinas wrote:
> On Sat, Aug 01, 2020 at 09:08:56PM +0800, Chen Zhou wrote:
>> diff --git a/Documentation/admin-guide/kdump/kdump.rst 
>> b/Documentation/admin-guide/kdump/kdump.rst
>> index 2da65fef2a1c..4b58f97351d5 100644
>> --- a/Documentation/admin-guide/kdump/kdump.rst
>> +++ b/Documentation/admin-guide/kdump/kdump.rst
>> @@ -299,7 +299,15 @@ Boot into System Kernel
>> "crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory
>> starting at physical address 0x0100 (16MB) for the dump-capture 
>> kernel.
>>  
>> -   On x86 and x86_64, use "crashkernel=64M@16M".
>> +   On x86 use "crashkernel=64M@16M".
>> +
>> +   On x86_64, use "crashkernel=X" to select a region under 4G first, and
>> +   fall back to reserve region above 4G.
>> +   We can also use "crashkernel=X,high" to select a region above 4G, which
>> +   also tries to allocate at least 256M below 4G automatically and
>> +   "crashkernel=Y,low" can be used to allocate specified size low memory.
>> +   Use "crashkernel=Y@X" if you really have to reserve memory from specified
>> +   start address X.
>>  
>> On ppc64, use "crashkernel=128M@32M".
>>  
>> @@ -316,8 +324,15 @@ Boot into System Kernel
>> kernel will automatically locate the crash kernel image within the
>> first 512MB of RAM if X is not given.
>>  
>> -   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
>> -   the kernel, X if explicitly specified, must be aligned to 2MiB 
>> (0x20).
>> +   On arm64, use "crashkernel=X" to try low allocation in ZONE_DMA, and
>> +   fall back to high allocation if it fails. And go for high allocation
>> +   directly if the required size is too large. If crash_base is outside
> I wouldn't mention crash_base in the admin guide. That's an
> implementation detail really and admins are not supposed to read the
> source code to make sense of the documentation. ZONE_DMA is also a
> kernel internal, so you'd need to define what it is for arm64. At least
> the DMA and DMA32 zones are printed during kernel boot.
Ok, i will fix this in next version.
>
>> +   ZONE_DMA, try to allocate at least 256M in ZONE_DMA automatically.
>> +   "crashkernel=Y,low" can be used to allocate specified size low memory.
>> +   For non-RPi4 platforms, change ZONE_DMA memtioned above to ZONE_DMA32.
>> +   Use "crashkernel=Y@X" if you really have to reserve memory from
>> +   specified start address X. Note that the start address of the kernel,
>> +   X if explicitly specified, must be aligned to 2MiB (0x20).




Re: [PATCH v11 5/5] kdump: update Documentation about crashkernel

2020-08-27 Thread chenzhou
Hi Catalin,


On 2020/8/19 20:03, Dave Young wrote:
> On 08/18/20 at 03:07pm, chenzhou wrote:
>>
>> On 2020/8/10 14:03, Dave Young wrote:
>>> Hi,
>>>
>>>>> Previously I remember we talked about to use similar logic as X86, but I
>>>>> remember you mentioned on some arm64 platform there could be no low
>>>>> memory at all.  Is this not a problem now for the fallback?  Just be
>>>>> curious, thanks for the update, for the common part looks good.
>>>> Hi Dave,
>>>>
>>>> Did you mean this discuss: https://lkml.org/lkml/2019/12/27/122?
>>> I meant about this reply instead :)
>>> https://lkml.org/lkml/2020/1/16/616
>> Hi Dave,
>>
>> Sorry for not repley in time, I was on holiday last week.
> Hi, no problem, thanks for following up.
>
>> The platform James mentioned may exist for which have no devices and need no 
>> low memory.
>> For our arm64 server platform, there are some devices and need low memory.
>>
>> I got it. For the platform with no low memory, reserving crashkernel will  
>> always fail.
>> How about like this:
> I think the question should leave to Catalin or James, I have no
> suggestion about this:)
Any suggestions about this?

Thanks,
Chen Zhou
>
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index a8e34d97a894..4df18c7ea438 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -147,7 +147,7 @@ static void __init reserve_crashkernel(void)
>> }
>> memblock_reserve(crash_base, crash_size);
>>  
>> -   if (crash_base >= CRASH_ADDR_LOW_MAX) {
>> +   if (memstart_addr < CRASH_ADDR_LOW_MAX && crash_base >= 
>> CRASH_ADDR_LOW_MAX) {
>> const char *rename = "Crash kernel (low)";
>>  
>> if (reserve_crashkernel_low()) {
>>
>>
>> Thanks,
>> Chen Zhou
>>
>>> Thanks
>>> Dave
>>>
>>>
>>> .
>>>
>>
>
> .
>




Re: [PATCH v11 5/5] kdump: update Documentation about crashkernel

2020-08-18 Thread chenzhou



On 2020/8/10 14:03, Dave Young wrote:
> Hi,
>
>>> Previously I remember we talked about to use similar logic as X86, but I
>>> remember you mentioned on some arm64 platform there could be no low
>>> memory at all.  Is this not a problem now for the fallback?  Just be
>>> curious, thanks for the update, for the common part looks good.
>> Hi Dave,
>>
>> Did you mean this discuss: https://lkml.org/lkml/2019/12/27/122?
> I meant about this reply instead :)
> https://lkml.org/lkml/2020/1/16/616
Hi Dave,

Sorry for not repley in time, I was on holiday last week.

The platform James mentioned may exist for which have no devices and need no 
low memory.
For our arm64 server platform, there are some devices and need low memory.

I got it. For the platform with no low memory, reserving crashkernel will  
always fail.
How about like this:

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index a8e34d97a894..4df18c7ea438 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -147,7 +147,7 @@ static void __init reserve_crashkernel(void)
}
memblock_reserve(crash_base, crash_size);
 
-   if (crash_base >= CRASH_ADDR_LOW_MAX) {
+   if (memstart_addr < CRASH_ADDR_LOW_MAX && crash_base >= 
CRASH_ADDR_LOW_MAX) {
const char *rename = "Crash kernel (low)";
 
if (reserve_crashkernel_low()) {


Thanks,
Chen Zhou

>
> Thanks
> Dave
>
>
> .
>




Re: [PATCH v11 5/5] kdump: update Documentation about crashkernel

2020-08-09 Thread chenzhou
On 2020/8/8 18:02, Dave Young wrote:
> On 08/01/20 at 09:08pm, Chen Zhou wrote:
>> Now the behavior of crashkernel=X has been changed, which tries low
>> allocation in ZONE_DMA, and fall back to high allocation if it fails.
>>
>> If requized size X is too large and leads to very little free memory
>> in ZONE_DMA after low allocation, the system may not work well.
>> So add a threshold and go for high allocation directly if the required
>> size is too large. The threshold is set as the half of low memory.
>>
>> If crash_base is outside ZONE_DMA, try to allocate at least 256M in
>> ZONE_DMA automatically. "crashkernel=Y,low" can be used to allocate
>> specified size low memory. For non-RPi4 platforms, change ZONE_DMA
>> memtioned above to ZONE_DMA32.
>>
>> So update the Documentation.
>>
>> Signed-off-by: Chen Zhou 
>> ---
>>  Documentation/admin-guide/kdump/kdump.rst | 21 ---
>>  .../admin-guide/kernel-parameters.txt | 11 --
>>  2 files changed, 27 insertions(+), 5 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/kdump/kdump.rst 
>> b/Documentation/admin-guide/kdump/kdump.rst
>> index 2da65fef2a1c..4b58f97351d5 100644
>> --- a/Documentation/admin-guide/kdump/kdump.rst
>> +++ b/Documentation/admin-guide/kdump/kdump.rst
>> @@ -299,7 +299,15 @@ Boot into System Kernel
>> "crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory
>> starting at physical address 0x0100 (16MB) for the dump-capture 
>> kernel.
>>  
>> -   On x86 and x86_64, use "crashkernel=64M@16M".
>> +   On x86 use "crashkernel=64M@16M".
>> +
>> +   On x86_64, use "crashkernel=X" to select a region under 4G first, and
>> +   fall back to reserve region above 4G.
>> +   We can also use "crashkernel=X,high" to select a region above 4G, which
>> +   also tries to allocate at least 256M below 4G automatically and
>> +   "crashkernel=Y,low" can be used to allocate specified size low memory.
>> +   Use "crashkernel=Y@X" if you really have to reserve memory from specified
>> +   start address X.
>>  
>> On ppc64, use "crashkernel=128M@32M".
>>  
>> @@ -316,8 +324,15 @@ Boot into System Kernel
>> kernel will automatically locate the crash kernel image within the
>> first 512MB of RAM if X is not given.
>>  
>> -   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
>> -   the kernel, X if explicitly specified, must be aligned to 2MiB 
>> (0x20).
>> +   On arm64, use "crashkernel=X" to try low allocation in ZONE_DMA, and
>> +   fall back to high allocation if it fails. And go for high allocation
>> +   directly if the required size is too large. If crash_base is outside
>> +   ZONE_DMA, try to allocate at least 256M in ZONE_DMA automatically.
>> +   "crashkernel=Y,low" can be used to allocate specified size low memory.
>> +   For non-RPi4 platforms, change ZONE_DMA memtioned above to ZONE_DMA32.
>> +   Use "crashkernel=Y@X" if you really have to reserve memory from
>> +   specified start address X. Note that the start address of the kernel,
>> +   X if explicitly specified, must be aligned to 2MiB (0x20).
>>  
>>  Load the Dump-capture Kernel
>>  
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
>> b/Documentation/admin-guide/kernel-parameters.txt
>> index fb95fad81c79..d1b6016850d6 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -722,6 +722,10 @@
>>  [KNL, x86_64] select a region under 4G first, and
>>  fall back to reserve region above 4G when '@offset'
>>  hasn't been specified.
>> +[KNL, arm64] Try low allocation in ZONE_DMA, fall back
>> +to high allocation if it fails when '@offset' hasn't 
>> been
>> +specified. For non-RPi4 platforms, change ZONE_DMA to
>> +ZONE_DMA32.
>>  See Documentation/admin-guide/kdump/kdump.rst for 
>> further details.
>>  
>>  crashkernel=range1:size1[,range2:size2,...][@offset]
>> @@ -746,13 +750,16 @@
>>  requires at least 64M+32K low memory, also enough extra
>>  low memory is needed to make sure DMA buffers for 32-bit
>>  devices won't run out. Kernel would try to allocate at
>> -at least 256M below 4G automatically.
>> +least 256M below 4G automatically.
>>  This one let user to specify own low range under 4G
>>  for second kernel instead.
>>  0: to disable low allocation.
>>  It will be ignored when crashkernel=X,high is not used
>>  or memory reserved is below 4G.
>> -
>> +[KNL, arm64] range under 4G.
>> +This one let user to specify a low range in ZONE_DMA for
>> +crash 

Re: [PATCH v10 4/5] arm64: kdump: fix kdump broken with ZONE_DMA reintroduced

2020-07-30 Thread chenzhou
Hi Catalin,


On 2020/7/29 23:20, Catalin Marinas wrote:
> On Wed, Jul 29, 2020 at 10:14:32PM +0800, chenzhou wrote:
>> On 2020/7/29 19:58, Catalin Marinas wrote:
>>> On Wed, Jul 29, 2020 at 11:52:39AM +0800, chenzhou wrote:
>>>> How about like this:
>>>> 1. For ZONE_DMA issue, use Bhupesh's solution, keep the crashkernel=
>>>>behaviour to ZONE_DMA allocations.
>>>> 2. For this patch series, make the reserve_crashkernel_low() to
>>>>ZONE_DMA allocations.
>>> So you mean rebasing your series on top of Bhupesh's? I guess you can
>>> combine the two, I really don't care which way as long as we fix both
>>> issues and agree on the crashkernel= semantics. I think with some tweaks
>>> we can go with your series alone.
>>>
>>> IIUC from the x86 code (especially the part you #ifdef'ed out for
>>> arm64), if ",low" is not passed (so just standard crashkernel=X), it
>>> still allocates sufficient low memory for the swiotlb in ZONE_DMA. The
>>> rest can go in a high region. Why can't we do something similar on
>>> arm64? Of course, you can keep the ",low" argument for explicit
>>> allocation but I don't want to mandate it.
>> It is a good idea to combine the two.
>>
>> For parameter crashkernel=X, we do like this:
>> 1. allocate some low memory in ZONE_DMA(or ZONE_DMA32 if CONFIG_ZONE_DMA=n)
>> 2. allocate X size memory in a high region
>>
>> ",low" argument can be used to specify the low memory.
>>
>> Do i understand correctly?
> Yes, although we could follow the x86 approach:
>
> 1. Try low (ZONE_DMA for arm64) allocation, fallback to high allocation
>if it fails.
>
> 2. If crash_base is outside ZONE_DMA, call reserve_crashkernel_low()
>which either honours the ,low option or allocates some small amount
>in ZONE_DMA.
>
> If at some point we have platforms failing step 2, we'll look at
> changing ZONE_DMA to the full 4GB on non-RPi4 platforms.
>
> It looks to me like x86 ignores the ,low option if the first step
> managed to get some low memory. Shall we do the same on arm64?
Yes, we could do like this.
>
>>> So with an implicit ZONE_DMA allocation similar to the x86 one, we
>>> probably don't need Bhupesh's series at all. In addition, we can limit
>>> crashkernel= to the first 4G with a fall-back to high like x86 (not sure
>>> if memblock_find_in_range() is guaranteed to search in ascending order).
>>> I don't think we need an explicit ",high" annotation.
>>>
>>> So with the above, just a crashkernel=1G gives you at least 256MB in
>>> ZONE_DMA followed by the rest anywhere, with a preference for
>>> ZONE_DMA32. This way we can also keep the reserve_crashkernel_low()
>>> mostly intact from x86 (less #ifdef's).
>> Yes. We can let crashkernel=X  try to reserve low memory and fall back to 
>> use high memory
>> if failing to find a low range.
> The only question is whether we need to preserve some more ZONE_DMA on
> the current system. If for example we pass a crashkernel=512M and some
> cma=, we may end up with very little free memory in ZONE_DMA. That's
> mostly an issue for RPi4 since other platforms would work with
> ZONE_DMA32. We could add a threshold and go for high allocation directly
> if the required size is too large.
Ok.  I will think about the threshold in the next version and make the value be 
1/2 or 1/3 of the ZONE_DMA.
>
>> About the function reserve_crashkernel_low(), if we put it in arch/arm64, 
>> there is some common
>> code with x86_64. Some suggestions about this?
> If we can use this function almost intact, just move it in a common
> place. But if it gets sprinkled with #ifdef CONFIG_ARM64, I'd rather
> duplicate it. I'd still prefer to move it to a common place if possible.
>
> You can go a step further and also move the x86 reserve_crashkernel() to
> common code. I don't think there a significant difference between arm64
> and x86 here. You'd have to define arch-specific specific
> CRASH_ADDR_LOW_MAX etc.
I will take these into account and send the next version recently.
>
> Also patches moving code should not have any functional change. The
> CRASH_ALIGN change from 16M to 2M on x86 should be a separate patch as
> it needs to be acked by the x86 maintainers (IIRC, Ingo only acked the
> function move if there was no functional change; CRASH_ALIGN is used for
> the start address, not just alignment, on x86).
>
Thanks,
Chen Zhou



Re: [PATCH v10 4/5] arm64: kdump: fix kdump broken with ZONE_DMA reintroduced

2020-07-29 Thread chenzhou
Hi Catalin,

On 2020/7/29 19:58, Catalin Marinas wrote:
> Hi Chen,
>
> On Wed, Jul 29, 2020 at 11:52:39AM +0800, chenzhou wrote:
>> On 2020/7/28 1:30, Catalin Marinas wrote:
>>> Anyway, there are two series solving slightly different issues with
>>> kdump reservations:
>>>
>>> 1. This series which relaxes the crashkernel= allocation to go anywhere
>>>in the accessible space while having a dedicated crashkernel=X,low
>>>option for ZONE_DMA.
>>>
>>> 2. Bhupesh's series [1] forcing crashkernel=X allocations only from
>>>ZONE_DMA.
>>>
>>> For RPi4 support, we limited ZONE_DMA allocations to the 1st GB.
>>> Existing crashkernel= uses may no longer work, depending on where the
>>> allocation falls. Option (2) above is a quick fix assuming that the
>>> crashkernel reservation is small enough. What's a typical crashkernel
>>> option here? That series is probably more prone to reservation failures.
>>>
>>> Option (1), i.e. this series, doesn't solve the problem raised by
>>> Bhupesh unless one uses the crashkernel=X,low argument. It can actually
>>> make it worse even for ZONE_DMA32 since the allocation can go above 4G
>>> (assuming that we change the ZONE_DMA configuration to only limit it to
>>> 1GB on RPi4).
>>>
>>> I'm more inclined to keep the crashkernel= behaviour to ZONE_DMA
>>> allocations. If this is too small for typical kdump, we can look into
>>> expanding ZONE_DMA to 4G on non-RPi4 hardware (we had patches on the
>>> list). In addition, if Chen thinks allocations above 4G are still needed
>>> or if RPi4 needs a sufficiently large crashkernel=, I'd rather have a
>>> ",high" option to explicitly require such access.
>> Thanks for your reply and exhaustive explanation.
>>
>> In our ARM servers, we need to to reserve a large chunk for kdump(512M
>> or 1G), there is no enough low memory. So we proposed this patch
>> series "support reserving crashkernel above 4G on arm64 kdump" In
>> April 2019.
> Trying to go through the discussions last year, hopefully things get
> clearer.
>
> So prior to the ZONE_DMA change, you still couldn't reserve 1G in the
> first 4GB? It shouldn't be sparsely populated during early boot.
Yes, we prior to the ZONE_DMA change, you still couldn't reserve 1G/512M in the 
first 4GB.
The memory reported by the bios may be splitted by some "reserved" entries.
Like this:
...
2f126000-2fbf : reserved
2fc0-396a : System RAM
  30de8000-30de9fff : reserved
  30dec000-30decfff : reserved
  30df2000-30df2fff : reserved
  30e2-30e4 : reserved
  3962-3968 : reserved
396b-3974 : reserved
3975-397a : System RAM
397b-398f : reserved
3990-3990 : System RAM
  3990-3990 : reserved
...
>
>> I introduce parameters "crashkernel=X,[high,low]" as x86_64 does in earlier 
>> versions.
>> Suggested by James, to simplify, we call reserve_crashkernel_low() at the 
>> beginning of
>> reserve_crashkernel() and then relax the arm64_dma32_phys_limit if 
>> reserve_crashkernel_low()
>> allocated something.
>> That is, just the parameter "crashkernel=X,low" is ok and i deleted 
>> "crashkernel=X,high".
> The problem I see is that with your patches we diverge from x86
> behaviour (and the arm64 behaviour prior to the ZONE_DMA reduction) as
> we now require that crashkernel=X,low is always passed if you want
> something in ZONE_DMA (and you do want, otherwise the crashdump kernel
> fails to boot).
>
> My main requirement is that crashkernel=X, without any suffix, still
> works which I don't think is guaranteed with your patches (well,
> ignoring RPi4 ZONE_DMA). Bhupesh's series is a quick fix but doesn't
> solve your large allocation requirements (that may have worked prior to
> the ZONE_DMA change).
The main purpose of this series is to solve the large allocation requirements.
Before the DMA_ZONE, both the original crashkernel=X and large allocation with 
my  patches
work well.

With the DMA_ZONE, both the original crashkernel=X and large allocation with my 
 patches
may fail to boot. Both need to think about the DMA_ZONE.

>
>> After the ZONE_DMA introduced in December 2019, the issue occurred as
>> you said above. In fact, we didn't have RPi4 machine.
> You don't even need to have a RPi4 machine, ZONE_DMA has been set to 1GB
> unconditionally. And while we could move it back to 4GB on non-RPi4
> hardware, I'd rather have a solution that fixes kdump for RPi4 as well.
>
>> Originally, i suggested to fix this based on this patch series and
>>

Re: [PATCH v10 4/5] arm64: kdump: fix kdump broken with ZONE_DMA reintroduced

2020-07-28 Thread chenzhou
Hi  Catalin,


On 2020/7/28 1:30, Catalin Marinas wrote:
> On Fri, Jul 03, 2020 at 11:58:15AM +0800, Chen Zhou wrote:
>> commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32")
>> broken the arm64 kdump. If the memory reserved for crash dump kernel
>> falled in ZONE_DMA32, the devices in crash dump kernel need to use
>> ZONE_DMA will alloc fail.
>>
>> This patch addressed the above issue based on "reserving crashkernel
>> above 4G". Originally, we reserve low memory below 4G, and now just need
>> to adjust memory limit to arm64_dma_phys_limit in reserve_crashkernel_low
>> if ZONE_DMA is enabled. That is, if there are devices need to use ZONE_DMA
>> in crash dump kernel, it is a good choice to use parameters
>> "crashkernel=X crashkernel=Y,low".
>>
>> Signed-off-by: Chen Zhou 
>> ---
>>  kernel/crash_core.c | 7 ++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
>> index a7580d291c37..e8ecbbc761a3 100644
>> --- a/kernel/crash_core.c
>> +++ b/kernel/crash_core.c
>> @@ -320,6 +320,7 @@ int __init reserve_crashkernel_low(void)
>>  unsigned long long base, low_base = 0, low_size = 0;
>>  unsigned long total_low_mem;
>>  int ret;
>> +phys_addr_t crash_max = 1ULL << 32;
>>  
>>  total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
>>  
>> @@ -352,7 +353,11 @@ int __init reserve_crashkernel_low(void)
>>  return 0;
>>  }
>>  
>> -low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
>> +#ifdef CONFIG_ARM64
>> +if (IS_ENABLED(CONFIG_ZONE_DMA))
>> +crash_max = arm64_dma_phys_limit;
>> +#endif
>> +low_base = memblock_find_in_range(0, crash_max, low_size, CRASH_ALIGN);
>>  if (!low_base) {
>>  pr_err("Cannot reserve %ldMB crashkernel low memory, please try 
>> smaller size.\n",
>> (unsigned long)(low_size >> 20));
> Given the number of #ifdefs we end up with in this function, I think
> it's better to simply copy to the code to arch/arm64 and tailor it
> accordingly.
>
> Anyway, there are two series solving slightly different issues with
> kdump reservations:
>
> 1. This series which relaxes the crashkernel= allocation to go anywhere
>in the accessible space while having a dedicated crashkernel=X,low
>option for ZONE_DMA.
>
> 2. Bhupesh's series [1] forcing crashkernel=X allocations only from
>ZONE_DMA.
>
> For RPi4 support, we limited ZONE_DMA allocations to the 1st GB.
> Existing crashkernel= uses may no longer work, depending on where the
> allocation falls. Option (2) above is a quick fix assuming that the
> crashkernel reservation is small enough. What's a typical crashkernel
> option here? That series is probably more prone to reservation failures.
>
> Option (1), i.e. this series, doesn't solve the problem raised by
> Bhupesh unless one uses the crashkernel=X,low argument. It can actually
> make it worse even for ZONE_DMA32 since the allocation can go above 4G
> (assuming that we change the ZONE_DMA configuration to only limit it to
> 1GB on RPi4).
>
> I'm more inclined to keep the crashkernel= behaviour to ZONE_DMA
> allocations. If this is too small for typical kdump, we can look into
> expanding ZONE_DMA to 4G on non-RPi4 hardware (we had patches on the
> list). In addition, if Chen thinks allocations above 4G are still needed
> or if RPi4 needs a sufficiently large crashkernel=, I'd rather have a
> ",high" option to explicitly require such access.
Thanks for your reply and exhaustive explanation.

In our ARM servers, we need to to reserve a large chunk for kdump(512M or 1G),
there is no enough low memory. So we proposed this patch series
"support reserving crashkernel above 4G on arm64 kdump" In April 2019.

I introduce parameters "crashkernel=X,[high,low]" as x86_64 does in earlier 
versions.
Suggested by James, to simplify, we call reserve_crashkernel_low() at the 
beginning of
reserve_crashkernel() and then relax the arm64_dma32_phys_limit if 
reserve_crashkernel_low()
allocated something.
That is, just the parameter "crashkernel=X,low" is ok and i deleted 
"crashkernel=X,high".

After the ZONE_DMA introduced in December 2019, the issue occurred as you said 
above.
In fact, we didn't have RPi4 machine. Originally, i suggested to fix this based 
on this patch series
and used the dedicated option.

According to your clarify, for typical kdump, there are other solutions. In 
this case,
"keep the crashkernel= behaviour to ZONE_DMA allocations" looks much better.

How about like this:
1. For ZONE_DMA issue, use Bhupesh's solution, keep the crashkernel= behaviour 
to ZONE_DMA allocations.
2. For this patch series, make the reserve_crashkernel_low() to ZONE_DMA 
allocations.

Thanks,
Chen Zhou
> [1] http://lists.infradead.org/pipermail/kexec/2020-July/020777.html
>




Re: [PATCH v10 0/5] support reserving crashkernel above 4G on arm64 kdump

2020-07-03 Thread chenzhou
Hi Bhupesh,


On 2020/7/3 15:26, Bhupesh Sharma wrote:
> Hi Chen,
>
> On Fri, Jul 3, 2020 at 9:24 AM Chen Zhou  wrote:
>> This patch series enable reserving crashkernel above 4G in arm64.
>>
>> There are following issues in arm64 kdump:
>> 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
>> when there is no enough low memory.
>> 2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G,
>> in this case, if swiotlb or DMA buffers are required, crash dump kernel
>> will boot failure because there is no low memory available for allocation.
>> 3. commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32") broken
>> the arm64 kdump. If the memory reserved for crash dump kernel falled in
>> ZONE_DMA32, the devices in crash dump kernel need to use ZONE_DMA will alloc
>> fail.
>>
>> To solve these issues, introduce crashkernel=X,low to reserve specified
>> size low memory.
>> Crashkernel=X tries to reserve memory for the crash dump kernel under
>> 4G. If crashkernel=Y,low is specified simultaneously, reserve spcified
>> size low memory for crash kdump kernel devices firstly and then reserve
>> memory above 4G.
>>
>> When crashkernel is reserved above 4G in memory and crashkernel=X,low
>> is specified simultaneously, kernel should reserve specified size low memory
>> for crash dump kernel devices. So there may be two crash kernel regions, one
>> is below 4G, the other is above 4G.
>> In order to distinct from the high region and make no effect to the use of
>> kexec-tools, rename the low region as "Crash kernel (low)", and pass the
>> low region by reusing DT property "linux,usable-memory-range". We made the 
>> low
>> memory region as the last range of "linux,usable-memory-range" to keep
>> compatibility with existing user-space and older kdump kernels.
>>
>> Besides, we need to modify kexec-tools:
>> arm64: support more than one crash kernel regions(see [1])
>>
>> Another update is document about DT property 'linux,usable-memory-range':
>> schemas: update 'linux,usable-memory-range' node schema(see [2])
>>
>> The previous changes and discussions can be retrieved from:
>>
>> Changes since [v9]
>> - Patch 1 add Acked-by from Dave.
>> - Update patch 5 according to Dave's comments.
>> - Update chosen schema.
>>
>> Changes since [v8]
>> - Reuse DT property "linux,usable-memory-range".
>> Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the 
>> low
>> memory region.
>> - Fix kdump broken with ZONE_DMA reintroduced.
>> - Update chosen schema.
>>
>> Changes since [v7]
>> - Move x86 CRASH_ALIGN to 2M
>> Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
>> - Update Documentation/devicetree/bindings/chosen.txt.
>> Add corresponding documentation to 
>> Documentation/devicetree/bindings/chosen.txt
>> suggested by Arnd.
>> - Add Tested-by from Jhon and pk.
>>
>> Changes since [v6]
>> - Fix build errors reported by kbuild test robot.
>>
>> Changes since [v5]
>> - Move reserve_crashkernel_low() into kernel/crash_core.c.
>> - Delete crashkernel=X,high.
>> - Modify crashkernel=X,low.
>> If crashkernel=X,low is specified simultaneously, reserve spcified size low
>> memory for crash kdump kernel devices firstly and then reserve memory above 
>> 4G.
>> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and 
>> then
>> pass to crash dump kernel by DT property "linux,low-memory-range".
>> - Update Documentation/admin-guide/kdump/kdump.rst.
>>
>> Changes since [v4]
>> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
>>
>> Changes since [v3]
>> - Add memblock_cap_memory_ranges back for multiple ranges.
>> - Fix some compiling warnings.
>>
>> Changes since [v2]
>> - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
>> two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
>> patch.
>>
>> Changes since [v1]:
>> - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
>> - Remove memblock_cap_memory_ranges() i added in v1 and implement that
>> in fdt_enforce_memory_region().
>> There are at most two crash kernel regions, for two crash kernel regions
>> case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
>> and then remove the memory range in the middle.
>>
>> [1]: http://lists.infradead.org/pipermail/kexec/2020-June/020737.html
>> [2]: https://github.com/robherring/dt-schema/pull/19
>> [v1]: https://lkml.org/lkml/2019/4/2/1174
>> [v2]: https://lkml.org/lkml/2019/4/9/86
>> [v3]: https://lkml.org/lkml/2019/4/9/306
>> [v4]: https://lkml.org/lkml/2019/4/15/273
>> [v5]: https://lkml.org/lkml/2019/5/6/1360
>> [v6]: https://lkml.org/lkml/2019/8/30/142
>> [v7]: https://lkml.org/lkml/2019/12/23/411
>> [v8]: https://lkml.org/lkml/2020/5/21/213
>> [v9]: https://lkml.org/lkml/2020/6/28/73
>>
>> Chen Zhou (5):
>>   x86: kdump: move reserve_crashkernel_low() into crash_core.c
>>   arm64: kdump: reserve crashkenel above 4G for crash dump kernel
>>   arm64: kdump: 

Re: [PATCH 2/2] arm64: Allocate crashkernel always in ZONE_DMA

2020-07-02 Thread chenzhou
Hi Bhupesh,


On 2020/7/3 3:22, Bhupesh Sharma wrote:
> Hi Will,
>
> On Thu, Jul 2, 2020 at 1:20 PM Will Deacon  wrote:
>> On Thu, Jul 02, 2020 at 03:44:20AM +0530, Bhupesh Sharma wrote:
>>> commit bff3b04460a8 ("arm64: mm: reserve CMA and crashkernel in
>>> ZONE_DMA32") allocates crashkernel for arm64 in the ZONE_DMA32.
>>>
>>> However as reported by Prabhakar, this breaks kdump kernel booting in
>>> ThunderX2 like arm64 systems. I have noticed this on another ampere
>>> arm64 machine. The OOM log in the kdump kernel looks like this:
>>>
>>>   [0.240552] DMA: preallocated 128 KiB GFP_KERNEL pool for atomic 
>>> allocations
>>>   [0.247713] swapper/0: page allocation failure: order:1, 
>>> mode:0xcc1(GFP_KERNEL|GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
>>>   <..snip..>
>>>   [0.274706] Call trace:
>>>   [0.277170]  dump_backtrace+0x0/0x208
>>>   [0.280863]  show_stack+0x1c/0x28
>>>   [0.284207]  dump_stack+0xc4/0x10c
>>>   [0.287638]  warn_alloc+0x104/0x170
>>>   [0.291156]  __alloc_pages_slowpath.constprop.106+0xb08/0xb48
>>>   [0.296958]  __alloc_pages_nodemask+0x2ac/0x2f8
>>>   [0.301530]  alloc_page_interleave+0x20/0x90
>>>   [0.305839]  alloc_pages_current+0xdc/0xf8
>>>   [0.309972]  atomic_pool_expand+0x60/0x210
>>>   [0.314108]  __dma_atomic_pool_init+0x50/0xa4
>>>   [0.318504]  dma_atomic_pool_init+0xac/0x158
>>>   [0.322813]  do_one_initcall+0x50/0x218
>>>   [0.326684]  kernel_init_freeable+0x22c/0x2d0
>>>   [0.331083]  kernel_init+0x18/0x110
>>>   [0.334600]  ret_from_fork+0x10/0x18
>>>
>>> This patch limits the crashkernel allocation to the first 1GB of
>>> the RAM accessible (ZONE_DMA), as otherwise we might run into OOM
>>> issues when crashkernel is executed, as it might have been originally
>>> allocated from either a ZONE_DMA32 memory or mixture of memory chunks
>>> belonging to both ZONE_DMA and ZONE_DMA32.
>> How does this interact with this ongoing series:
>>
>> https://lore.kernel.org/r/20200628083458.40066-1-chenzho...@huawei.com
>>
>> (patch 4, in particular)
> Many thanks for having a look at this patchset. I was not aware that
> Chen had sent out a new version.
> I had noted in the v9 review of the high/low range allocation
>  that I was working
> on a generic solution (irrespective of the crashkernel, low and high
> range allocation) which resulted in this patchset.
>
> The issue is two-fold: OOPs in memcfg layer (PATCH 1/2, which has been
> Acked-by memcfg maintainer) and OOM in the kdump kernel due to
> crashkernel allocation in ZONE_DMA32 regions(s) which is addressed by
> this PATCH.
>
> I will have a closer look at the v10 patchset Chen shared, but seems
> it needs some rework as per Dave's review comments which he shared
> today.
> IMO, in the meanwhile this patchset  can be used to fix the existing
> kdump issue with upstream kernel.
Thanks for your work.
There is no progress on the issue for long time, so i sent my solution in v8 
comments
and sent v9 recently.

I think direct limiting the crashkernel in ZONE_DMA isn't a good idea:
1. For parameter "crashkernel=Y", reserving crashkernel in first 1G memory will 
increase
the probability of memory allocation failure.
Previous discuss from https://lkml.org/lkml/2019/10/21/725:
"With ZONE_DMA=y, this config will fail to reserve 512M CMA on a server"

2. For parameter "crashkernel=Y@X", limiting the crashkernel in ZONE_DMA is 
unreasonable
for someone really want to reserve crashkernel from specified start address.

I have sent v10: https://www.spinics.net/lists/arm-kernel/msg819408.html, any 
commets are welcome.

Thanks,
Chen Zhou
>
>>> Fixes: bff3b04460a8 ("arm64: mm: reserve CMA and crashkernel in ZONE_DMA32")
>>> Cc: Johannes Weiner 
>>> Cc: Michal Hocko 
>>> Cc: Vladimir Davydov 
>>> Cc: James Morse 
>>> Cc: Mark Rutland 
>>> Cc: Will Deacon 
>>> Cc: Catalin Marinas 
>>> Cc: cgro...@vger.kernel.org
>>> Cc: linux...@kvack.org
>>> Cc: linux-arm-ker...@lists.infradead.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Cc: ke...@lists.infradead.org
>>> Reported-by: Prabhakar Kushwaha 
>>> Signed-off-by: Bhupesh Sharma 
>>> ---
>>>  arch/arm64/mm/init.c | 16 ++--
>>>  1 file changed, 14 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>>> index 1e93cfc7c47a..02ae4d623802 100644
>>> --- a/arch/arm64/mm/init.c
>>> +++ b/arch/arm64/mm/init.c
>>> @@ -91,8 +91,15 @@ static void __init reserve_crashkernel(void)
>>>   crash_size = PAGE_ALIGN(crash_size);
>>>
>>>   if (crash_base == 0) {
>>> - /* Current arm64 boot protocol requires 2MB alignment */
>>> - crash_base = memblock_find_in_range(0, arm64_dma32_phys_limit,
>>> + /* Current arm64 boot protocol requires 2MB alignment.
>>> +  * Also limit the crashkernel allocation to the first
>>> +  * 1GB of the RAM accessible (ZONE_DMA), as 

Re: [PATCH v9 5/5] kdump: update Documentation about crashkernel on arm64

2020-07-02 Thread chenzhou
Hi Dave,


On 2020/7/2 10:59, Dave Young wrote:
> Hi Chen,
> On 06/28/20 at 04:34pm, Chen Zhou wrote:
>> Now we support crashkernel=X,[low] on arm64, update the Documentation.
>> We could use parameters "crashkernel=X crashkernel=Y,low" to reserve
>> memory above 4G.
>>
>> Signed-off-by: Chen Zhou 
>> Tested-by: John Donnelly 
>> Tested-by: Prabhakar Kushwaha 
>> ---
>>  Documentation/admin-guide/kdump/kdump.rst   | 13 +++--
>>  Documentation/admin-guide/kernel-parameters.txt | 17 +++--
>>  2 files changed, 26 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/kdump/kdump.rst 
>> b/Documentation/admin-guide/kdump/kdump.rst
>> index 2da65fef2a1c..6ba294d425c9 100644
>> --- a/Documentation/admin-guide/kdump/kdump.rst
>> +++ b/Documentation/admin-guide/kdump/kdump.rst
>> @@ -299,7 +299,13 @@ Boot into System Kernel
>> "crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory
>> starting at physical address 0x0100 (16MB) for the dump-capture 
>> kernel.
>>  
>> -   On x86 and x86_64, use "crashkernel=64M@16M".
>> +   On x86 use "crashkernel=64M@16M".
>> +
>> +   On x86_64, use "crashkernel=Y[@X]" to select a region under 4G first, and
>> +   fall back to reserve region above 4G when '@offset' hasn't been 
>> specified.
> Actually crashkernel=Y without the offset works well, I do not see why
> we need the offset, it should be some legacy thing.  So it should be
> better just use the Y without offset here, and just leave a note
> somewhere people can use [@X] offset when they really have to.
>
>> +   We can also use "crashkernel=X,high" to select a region above 4G, which
>> +   also tries to allocate at least 256M below 4G automatically and
>> +   "crashkernel=Y,low" can be used to allocate specified size low memory.
>>  
>> On ppc64, use "crashkernel=128M@32M".
>>  
>> @@ -316,8 +322,11 @@ Boot into System Kernel
>> kernel will automatically locate the crash kernel image within the
>> first 512MB of RAM if X is not given.
>>  
>> -   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
>> +   On arm64, use "crashkernel=Y[@X]". Note that the start address of
>> the kernel, X if explicitly specified, must be aligned to 2MiB 
>> (0x20).
>> +   If crashkernel=Z,low is specified simultaneously, reserve spcified size
>> +   low memory for crash kdump kernel devices firstly and then reserve memory
> "devices" seems not very accurate, maybe just drop the "for crash kdump
> kernel devices" since it is clear based on the context.
>
>> +   above 4G.
>>  
>>  Load the Dump-capture Kernel
>>  
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
>> b/Documentation/admin-guide/kernel-parameters.txt
>> index fb95fad81c79..335431a351c0 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -722,6 +722,9 @@
>>  [KNL, x86_64] select a region under 4G first, and
>>  fall back to reserve region above 4G when '@offset'
>>  hasn't been specified.
>> +[KNL, arm64] If crashkernel=X,low is specified, reserve
>> +spcified size low memory for crash kdump kernel devices
> Ditto.
>
>> +firstly, and then reserve memory above 4G.
>>  See Documentation/admin-guide/kdump/kdump.rst for 
>> further details.
>>  
>>  crashkernel=range1:size1[,range2:size2,...][@offset]
>> @@ -746,13 +749,23 @@
>>  requires at least 64M+32K low memory, also enough extra
>>  low memory is needed to make sure DMA buffers for 32-bit
>>  devices won't run out. Kernel would try to allocate at
>> -at least 256M below 4G automatically.
>> +least 256M below 4G automatically.
>>  This one let user to specify own low range under 4G
>>  for second kernel instead.
>>  0: to disable low allocation.
>>  It will be ignored when crashkernel=X,high is not used
>>  or memory reserved is below 4G.
>> -
>> +[KNL, arm64] range under 4G.
>> +This one let user to specify own low range under 4G
>> +for crash dump kernel instead.
>> +Different with x86_64, kernel allocates specified size
> sounds better:
> s/Different with/Be different from
>
> s/allocates/reserves
>
>> +physical memory region only when this parameter is 
>> specified
>> +instead of trying to allocate at least 256M below 4G
> s/allocate/reserve
>
>> +automatically.
>> +This parameter is used along with crashkernel=X when we
> Could change the passive sentence to below:
> "Use this parameter along with"
Thanks 

Re: [PATCH v8 5/5] dt-bindings: chosen: Document linux,low-memory-range for arm64 kdump

2020-06-19 Thread chenzhou
Hi James, Rob,


On 2020/5/30 0:11, James Morse wrote:
> Hi guys,
>
> On 26/05/2020 22:18, Rob Herring wrote:
>> On Fri, May 22, 2020 at 11:24:11AM +0800, chenzhou wrote:
>>> On 2020/5/21 21:29, Rob Herring wrote:
>>>> On Thu, May 21, 2020 at 3:35 AM Chen Zhou  wrote:
>>>>> Add documentation for DT property used by arm64 kdump:
>>>>> linux,low-memory-range.
>>>>> "linux,low-memory-range" is an another memory region used for crash
>>>>> dump kernel devices.
>>>>> diff --git a/Documentation/devicetree/bindings/chosen.txt 
>>>>> b/Documentation/devicetree/bindings/chosen.txt
>>>>> index 45e79172a646..bfe6fb6976e6 100644
>>>>> --- a/Documentation/devicetree/bindings/chosen.txt
>>>>> +++ b/Documentation/devicetree/bindings/chosen.txt
>>>>> +linux,low-memory-range
>>>>> +--
>>>>> +This property (arm64 only) holds a base address and size, describing a
>>>>> +limited region below 4G. Similar to "linux,usable-memory-range", it is
>>>>> +an another memory range which may be considered available for use by the
>>>>> +kernel.
>>>> Why can't you just add a range to "linux,usable-memory-range"? It
>>>> shouldn't be hard to figure out which part is below 4G.
>>> The comments from James:
>>> Won't this break if your kdump kernel doesn't know what the extra 
>>> parameters are?
>>> Or if it expects two ranges, but only gets one? These DT properties should 
>>> be treated as
>>> ABI between kernel versions, we can't really change it like this.
>>>
>>> I think the 'low' region is an optional-extra, that is never mapped by the 
>>> first kernel. I
>>> think the simplest thing to do is to add an 'linux,low-memory-range' that we
>>> memblock_add() after memblock_cap_memory_range() has been called.
>>> If its missing, or the new kernel doesn't know what its for, everything 
>>> keeps working.
>>
>> I don't think there's a compatibility issue here though. The current 
>> kernel doesn't care if the property is longer than 1 base+size. It only 
>> checks if the size is less than 1 base+size.
> Aha! I missed that.
>
>
>> And yes, we can rely on 
>> that implementation detail. It's only an ABI if an existing user 
>> notices.
>>
>> Now, if the low memory is listed first, then an older kdump kernel 
>> would get a different memory range. If that's a problem, then define 
>> that low memory goes last. 
> This first entry would need to be the 'crashkernel' range where the kdump 
> kernel is
> placed, otherwise an older kernel won't boot. The rest can be optional 
> extras, as long as
> we are tolerant of it being missing...
How about like this:

1. The low memory region remained as "Crash kernel (low)".
2. Userspace will find "Crash kernel" and "Crash kernel (low)" region in 
/proc/iomem,
and add "Crash kernel (low)" as the last range of property 
"linux,usable-memory-range".

Thanks,
Chen Zhou
>
> I'll try and look at the rest of this series on Monday,
>
>
> Thanks,
>
> James
>
> .
>




Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump

2020-06-19 Thread chenzhou


On 2020/6/19 10:32, John Donnelly wrote:
>
> On 6/4/20 12:01 PM, Nicolas Saenz Julienne wrote:
>> On Thu, 2020-06-04 at 01:17 +0530, Bhupesh Sharma wrote:
>>> Hi All,
>>>
>>> On Wed, Jun 3, 2020 at 9:03 PM John Donnelly 
>>> wrote:
>>>>
>>>>> On Jun 3, 2020, at 8:20 AM, chenzhou  wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>>
>>>>> On 2020/6/3 19:47, Prabhakar Kushwaha wrote:
>>>>>> Hi Chen,
>>>>>>
>>>>>> On Tue, Jun 2, 2020 at 8:12 PM John Donnelly >>>>>> wrote:
>>>>>>>
>>>>>>>> On Jun 2, 2020, at 12:38 AM, Prabhakar Kushwaha <
>>>>>>>> prabhakar.p...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> On Tue, Jun 2, 2020 at 3:29 AM John Donnelly <
>>>>>>>> john.p.donne...@oracle.com> wrote:
>>>>>>>>> Hi .  See below !
>>>>>>>>>
>>>>>>>>>> On Jun 1, 2020, at 4:02 PM, Bhupesh Sharma 
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi John,
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 2, 2020 at 1:01 AM John Donnelly <
>>>>>>>>>> john.p.donne...@oracle.com> wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 6/1/20 7:02 AM, Prabhakar Kushwaha wrote:
>>>>>>>>>>>> Hi Chen,
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, May 21, 2020 at 3:05 PM Chen Zhou <
>>>>>>>>>>>> chenzho...@huawei.com> wrote:
>>>>>>>>>>>>> This patch series enable reserving crashkernel above 4G in
>>>>>>>>>>>>> arm64.
>>>>>>>>>>>>>
>>>>>>>>>>>>> There are following issues in arm64 kdump:
>>>>>>>>>>>>> 1. We use crashkernel=X to reserve crashkernel below 4G,
>>>>>>>>>>>>> which will fail
>>>>>>>>>>>>> when there is no enough low memory.
>>>>>>>>>>>>> 2. Currently, crashkernel=Y@X can be used to reserve
>>>>>>>>>>>>> crashkernel above 4G,
>>>>>>>>>>>>> in this case, if swiotlb or DMA buffers are required,
>>>>>>>>>>>>> crash dump kernel
>>>>>>>>>>>>> will boot failure because there is no low memory available
>>>>>>>>>>>>> for allocation.
>>>>>>>>>>>>>
>>>>>>>>>>>> We are getting "warn_alloc" [1] warning during boot of kdump
>>>>>>>>>>>> kernel
>>>>>>>>>>>> with bootargs as [2] of primary kernel.
>>>>>>>>>>>> This error observed on ThunderX2  ARM64 platform.
>>>>>>>>>>>>
>>>>>>>>>>>> It is observed with latest upstream tag (v5.7-rc3) with this
>>>>>>>>>>>> patch set
>>>>>>>>>>>> and
>>>>>>>>>>>>
>> https://urldefense.com/v3/__https://lists.infradead.org/pipermail/kexec/2020-May/025128.html__;!!GqivPVa7Brio!LnTSARkCt0V0FozR0KmqooaH5ADtdXvs3mPdP3KRVqALmvSK2VmCkIPIhsaxbiIAAlzu$
>>>>>>>>>>>> Also **without** this patch-set
>>>>>>>>>>>> "
>>>>>>>>>>>>
>> https://urldefense.com/v3/__https://www.spinics.net/lists/arm-kernel/msg806882.html__;!!GqivPVa7Brio!LnTSARkCt0V0FozR0KmqooaH5ADtdXvs3mPdP3KRVqALmvSK2VmCkIPIhsaxbjC6ujMA$
>>>>>>>>>>>> "
>>>>>>>>>>>>
>>>>>>>>>>>> This issue comes whenever crashkernel memory is reserved
>>>>>>>>>>>> after 0xc000_.
>>>>>>>>>>>> More details discussed earlier in
>>>>>>>>>>>>
>> https://urldefense.com/v3/__https://www.spinics.net

Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump

2020-06-03 Thread chenzhou
Hi,


On 2020/6/3 19:47, Prabhakar Kushwaha wrote:
> Hi Chen,
>
> On Tue, Jun 2, 2020 at 8:12 PM John Donnelly  
> wrote:
>>
>>
>>> On Jun 2, 2020, at 12:38 AM, Prabhakar Kushwaha  
>>> wrote:
>>>
>>> On Tue, Jun 2, 2020 at 3:29 AM John Donnelly  
>>> wrote:
 Hi .  See below !

> On Jun 1, 2020, at 4:02 PM, Bhupesh Sharma  wrote:
>
> Hi John,
>
> On Tue, Jun 2, 2020 at 1:01 AM John Donnelly  
> wrote:
>> Hi,
>>
>>
>> On 6/1/20 7:02 AM, Prabhakar Kushwaha wrote:
>>> Hi Chen,
>>>
>>> On Thu, May 21, 2020 at 3:05 PM Chen Zhou  wrote:
 This patch series enable reserving crashkernel above 4G in arm64.

 There are following issues in arm64 kdump:
 1. We use crashkernel=X to reserve crashkernel below 4G, which will 
 fail
 when there is no enough low memory.
 2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 
 4G,
 in this case, if swiotlb or DMA buffers are required, crash dump kernel
 will boot failure because there is no low memory available for 
 allocation.

>>> We are getting "warn_alloc" [1] warning during boot of kdump kernel
>>> with bootargs as [2] of primary kernel.
>>> This error observed on ThunderX2  ARM64 platform.
>>>
>>> It is observed with latest upstream tag (v5.7-rc3) with this patch set
>>> and 
>>> https://urldefense.com/v3/__https://lists.infradead.org/pipermail/kexec/2020-May/025128.html__;!!GqivPVa7Brio!LnTSARkCt0V0FozR0KmqooaH5ADtdXvs3mPdP3KRVqALmvSK2VmCkIPIhsaxbiIAAlzu$
>>> Also **without** this patch-set
>>> "https://urldefense.com/v3/__https://www.spinics.net/lists/arm-kernel/msg806882.html__;!!GqivPVa7Brio!LnTSARkCt0V0FozR0KmqooaH5ADtdXvs3mPdP3KRVqALmvSK2VmCkIPIhsaxbjC6ujMA$;
>>>
>>> This issue comes whenever crashkernel memory is reserved after 
>>> 0xc000_.
>>> More details discussed earlier in
>>> https://urldefense.com/v3/__https://www.spinics.net/lists/arm-kernel/msg806882.html__;!!GqivPVa7Brio!LnTSARkCt0V0FozR0KmqooaH5ADtdXvs3mPdP3KRVqALmvSK2VmCkIPIhsaxbjC6ujMA$
>>>   without any
>>> solution
>>>
>>> This patch-set is expected to solve similar kind of issue.
>>> i.e. low memory is only targeted for DMA, swiotlb; So above mentioned
>>> observation should be considered/fixed. .
>>>
>>> --pk
>>>
>>> [1]
>>> [   30.366695] DMI: Cavium Inc. Saber/Saber, BIOS
>>> TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/
>>> [   30.367696] NET: Registered protocol family 16
>>> [   30.369973] swapper/0: page allocation failure: order:6,
>>> mode:0x1(GFP_DMA), nodemask=(null),cpuset=/,mems_allowed=0
>>> [   30.369980] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc3+ #121
>>> [   30.369981] Hardware name: Cavium Inc. Saber/Saber, BIOS
>>> TX2-FW-Release-3.1-build_01-2803-g74253a541a mm/dd/
>>> [   30.369984] Call trace:
>>> [   30.369989]  dump_backtrace+0x0/0x1f8
>>> [   30.369991]  show_stack+0x20/0x30
>>> [   30.369997]  dump_stack+0xc0/0x10c
>>> [   30.370001]  warn_alloc+0x10c/0x178
>>> [   30.370004]  __alloc_pages_slowpath.constprop.111+0xb10/0xb50
>>> [   30.370006]  __alloc_pages_nodemask+0x2b4/0x300
>>> [   30.370008]  alloc_page_interleave+0x24/0x98
>>> [   30.370011]  alloc_pages_current+0xe4/0x108
>>> [   30.370017]  dma_atomic_pool_init+0x44/0x1a4
>>> [   30.370020]  do_one_initcall+0x54/0x228
>>> [   30.370027]  kernel_init_freeable+0x228/0x2cc
>>> [   30.370031]  kernel_init+0x1c/0x110
>>> [   30.370034]  ret_from_fork+0x10/0x18
>>> [   30.370036] Mem-Info:
>>> [   30.370064] active_anon:0 inactive_anon:0 isolated_anon:0
>>> [   30.370064]  active_file:0 inactive_file:0 isolated_file:0
>>> [   30.370064]  unevictable:0 dirty:0 writeback:0 unstable:0
>>> [   30.370064]  slab_reclaimable:34 slab_unreclaimable:4438
>>> [   30.370064]  mapped:0 shmem:0 pagetables:14 bounce:0
>>> [   30.370064]  free:1537719 free_pcp:219 free_cma:0
>>> [   30.370070] Node 0 active_anon:0kB inactive_anon:0kB
>>> active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB
>>> isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB
>>> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB
>>> unstable:0kB all_unreclaimable? no
>>> [   30.370073] Node 1 active_anon:0kB inactive_anon:0kB
>>> active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB
>>> isolated(file):0kB mapped:0kB dirty:0kB writeback:0kB shmem:0kB
>>> shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB
>>> unstable:0kB all_unreclaimable? no
>>> [   30.370079] Node 0 DMA free:0kB min:0kB low:0kB high:0kB
>>> reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB
>>> active_file:0kB inactive_file:0kB unevictable:0kB 

Re: [PATCH v8 0/5] support reserving crashkernel above 4G on arm64 kdump

2020-05-25 Thread chenzhou
Hi Baoquan,


Thanks for your suggestions.

You are right, some details should be made in the commit log.


Thanks,

Chen Zhou


On 2020/5/26 9:42, Baoquan He wrote:
> On 05/21/20 at 05:38pm, Chen Zhou wrote:
>> This patch series enable reserving crashkernel above 4G in arm64.
>>
>> There are following issues in arm64 kdump:
>> 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
>> when there is no enough low memory.
>> 2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G,
>> in this case, if swiotlb or DMA buffers are required, crash dump kernel
>> will boot failure because there is no low memory available for allocation.
>>
>> To solve these issues, introduce crashkernel=X,low to reserve specified
>> size low memory.
>> Crashkernel=X tries to reserve memory for the crash dump kernel under
>> 4G. If crashkernel=Y,low is specified simultaneously, reserve spcified
>> size low memory for crash kdump kernel devices firstly and then reserve
>> memory above 4G.
>>
>> When crashkernel is reserved above 4G in memory, that is, crashkernel=X,low
>> is specified simultaneously, kernel should reserve specified size low memory
>> for crash dump kernel devices. So there may be two crash kernel regions, one
>> is below 4G, the other is above 4G.
>> In order to distinct from the high region and make no effect to the use of
>> kexec-tools, rename the low region as "Crash kernel (low)", and add DT 
>> property
>> "linux,low-memory-range" to crash dump kernel's dtb to pass the low region.
>>
>> Besides, we need to modify kexec-tools:
>> arm64: kdump: add another DT property to crash dump kernel's dtb(see [1])
>>
>> The previous changes and discussions can be retrieved from:
>>
>> Changes since [v7]
>> - Move x86 CRASH_ALIGN to 2M
>> Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
> OK, moving x86 CRASH_ALIGN to 2M is suggested by Dave. Because
> CONFIG_PHYSICAL_ALIGN can be selected from 2M to 16M. So 2M seems good.
> But, anyway, we should tell the reason why it need be changed in commit
> log.
>
>
> arch/x86/Kconfig:
> config PHYSICAL_ALIGN
> hex "Alignment value to which kernel should be aligned"
> default "0x20"
> range 0x2000 0x100 if X86_32
> range 0x20 0x100 if X86_64
>
>> - Update Documentation/devicetree/bindings/chosen.txt 
>> Add corresponding documentation to 
>> Documentation/devicetree/bindings/chosen.txt suggested by Arnd.
>> - Add Tested-by from Jhon and pk
>>
>> Changes since [v6]
>> - Fix build errors reported by kbuild test robot.
>>
>> Changes since [v5]
>> - Move reserve_crashkernel_low() into kernel/crash_core.c.
>> - Delete crashkernel=X,high.
> And the crashkernel=X,high being deleted need be told too. Otherwise
> people reading the commit have to check why themselves. I didn't follow
> the old version, can't see why ,high can't be specified explicitly.
>
>> - Modify crashkernel=X,low.
>> If crashkernel=X,low is specified simultaneously, reserve spcified size low
>> memory for crash kdump kernel devices firstly and then reserve memory above 
>> 4G.
>> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and 
>> then
>> pass to crash dump kernel by DT property "linux,low-memory-range".
>> - Update Documentation/admin-guide/kdump/kdump.rst.
>>
>> Changes since [v4]
>> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
>>
>> Changes since [v3]
>> - Add memblock_cap_memory_ranges back for multiple ranges.
>> - Fix some compiling warnings.
>>
>> Changes since [v2]
>> - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
>> two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
>> patch.
>>
>> Changes since [v1]:
>> - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
>> - Remove memblock_cap_memory_ranges() i added in v1 and implement that
>> in fdt_enforce_memory_region().
>> There are at most two crash kernel regions, for two crash kernel regions
>> case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
>> and then remove the memory range in the middle.
>>
>> [1]: http://lists.infradead.org/pipermail/kexec/2020-May/025128.html
>> [v1]: https://lkml.org/lkml/2019/4/2/1174
>> [v2]: https://lkml.org/lkml/2019/4/9/86
>> [v3]: https://lkml.org/lkml/2019/4/9/306
>> [v4]: https://lkml.org/lkml/2019/4/15/273
>> [v5]: https://lkml.org/lkml/2019/5/6/1360
>> [v6]: https://lkml.org/lkml/2019/8/30/142
>> [v7]: https://lkml.org/lkml/2019/12/23/411
>>
>> Chen Zhou (5):
>>   x86: kdump: move reserve_crashkernel_low() into crash_core.c
>>   arm64: kdump: reserve crashkenel above 4G for crash dump kernel
>>   arm64: kdump: add memory for devices by DT property, low-memory-range
>>   kdump: update Documentation about crashkernel on arm64
>>   dt-bindings: chosen: Document linux,low-memory-range for arm64 kdump
>>
>>  Documentation/admin-guide/kdump/kdump.rst | 13 ++-
>>  .../admin-guide/kernel-parameters.txt   

Re: [PATCH v8 5/5] dt-bindings: chosen: Document linux,low-memory-range for arm64 kdump

2020-05-21 Thread chenzhou
Hi Rob,

On 2020/5/21 21:29, Rob Herring wrote:
> On Thu, May 21, 2020 at 3:35 AM Chen Zhou  wrote:
>> Add documentation for DT property used by arm64 kdump:
>> linux,low-memory-range.
>> "linux,low-memory-range" is an another memory region used for crash
>> dump kernel devices.
>>
>> Signed-off-by: Chen Zhou 
>> ---
>>  Documentation/devicetree/bindings/chosen.txt | 25 
>>  1 file changed, 25 insertions(+)
> chosen is now a schema documented here[1].
Ok, that is, i don't need to modify the doc in kernel, just create a pull 
request in github [1]?

>
>> diff --git a/Documentation/devicetree/bindings/chosen.txt 
>> b/Documentation/devicetree/bindings/chosen.txt
>> index 45e79172a646..bfe6fb6976e6 100644
>> --- a/Documentation/devicetree/bindings/chosen.txt
>> +++ b/Documentation/devicetree/bindings/chosen.txt
>> @@ -103,6 +103,31 @@ While this property does not represent a real hardware, 
>> the address
>>  and the size are expressed in #address-cells and #size-cells,
>>  respectively, of the root node.
>>
>> +linux,low-memory-range
>> +--
>> +This property (arm64 only) holds a base address and size, describing a
>> +limited region below 4G. Similar to "linux,usable-memory-range", it is
>> +an another memory range which may be considered available for use by the
>> +kernel.
> Why can't you just add a range to "linux,usable-memory-range"? It
> shouldn't be hard to figure out which part is below 4G.
I did like this in my previous version, such as v5. After discussed with James, 
i modified it to the current way.

We think the existing behavior should be unchanged, which helps with keeping 
compatibility with existing
user-space and older kdump kernels.

The comments from James:
> linux,usable-memory-range = .
Won't this break if your kdump kernel doesn't know what the extra parameters 
are?
Or if it expects two ranges, but only gets one? These DT properties should be 
treated as
ABI between kernel versions, we can't really change it like this.

I think the 'low' region is an optional-extra, that is never mapped by the 
first kernel. I
think the simplest thing to do is to add an 'linux,low-memory-range' that we
memblock_add() after memblock_cap_memory_range() has been called.
If its missing, or the new kernel doesn't know what its for, everything keeps 
working.

previous discusses:
https://lkml.org/lkml/2019/6/5/674
https://lkml.org/lkml/2019/6/13/229

Thanks,
Chen Zhou

>
> Rob
>
> [1] 
> https://github.com/devicetree-org/dt-schema/blob/master/schemas/chosen.yaml
>
> .
>




Re: [PATCH v7 0/4] support reserving crashkernel above 4G on arm64 kdump

2020-05-19 Thread chenzhou
Hi Arnd,

On 2020/5/19 18:21, Arnd Bergmann wrote:
> On Thu, Mar 26, 2020 at 4:10 AM Chen Zhou  wrote:
>> Hi all,
>>
>> Friendly ping...
> I was asked about this patch series, and see that you last posted it in
> December. I think you should rebase it to linux-5.7-rc6 and post the
> entire series again to make progress, as it's unlikely that any maintainer
> would pick up the patches from last year.
>
> For the contents, everything seems reasonable to me, but I noticed that
> you are adding a property to the /chosen node without adding the
> corresponding documentation to
> Documentation/devicetree/bindings/chosen.txt
>
> Please add that, and Cc the devicetree maintainers on the updated
> patch.
>
>  Arnd

Thanks for your review and comments, i will rebase it to linux-5.7-rc6 and add 
the
corresponding documentation.

Thanks,
Chen Zhou

>> On 2019/12/23 23:23, Chen Zhou wrote:
>>> This patch series enable reserving crashkernel above 4G in arm64.
>>>
>>> There are following issues in arm64 kdump:
>>> 1. We use crashkernel=X to reserve crashkernel below 4G, which will fail
>>> when there is no enough low memory.
>>> 2. Currently, crashkernel=Y@X can be used to reserve crashkernel above 4G,
>>> in this case, if swiotlb or DMA buffers are required, crash dump kernel
>>> will boot failure because there is no low memory available for allocation.
>>>
>>> The previous changes and discussions can be retrieved from:
>>>
>>> Changes since [v6]
>>> - Fix build errors reported by kbuild test robot.
> ...
>
> .
>




Re: [PATCH -next 0/2] sparc: use snprintf() in show() methods

2020-05-10 Thread chenzhou
Sorry, i made a mistake, should be scnprintf().


On 2020/5/9 19:40, Joe Perches wrote:
> On Sat, 2020-05-09 at 17:18 +0800, Chen Zhou wrote:
>> snprintf() returns the number of bytes that would be written,
>> which may be greater than the the actual length to be written.
> []
>> Chen Zhou (2):
>>   sparc: use snprintf() in show_pciobppath_attr() in pci.c
>>   sparc: use snprintf() in show_pciobppath_attr() in vio.c
> Your subjects are a bit off: snprintf vs scnprintf
>
>
>
>