Re: x86/crash: fix crash_setup_memmap_entries() out-of-bounds access

2021-04-16 Thread Dave Young
On 04/16/21 at 01:28pm, Mike Galbraith wrote:
> On Fri, 2021-04-16 at 19:07 +0800, Dave Young wrote:
> >
> > > We're excluding two ranges, allocate the scratch space we need to do that.
> >
> > I think 1 range should be fine, have you tested 1?
> 
> Have now, and vzalloc(struct_size(cmem, ranges, 1)) worked just fine.

Ok, thanks for your quick response.  Care to resend and cc x86 list and
Andrew?

Andrew usually takes core kexec/kdump fixes, x86 usually go through x86
maintainer.

Thanks
Dave



Re: x86/crash: fix crash_setup_memmap_entries() out-of-bounds access

2021-04-16 Thread Dave Young
Hi Mike,

Thanks for the patch! I suggest always cc kexec list for kexec/kdump
patches.
On 04/15/21 at 07:56pm, Mike Galbraith wrote:
> x86/crash: fix crash_setup_memmap_entries() KASAN vmalloc-out-of-bounds gripe
> 
> [   15.428011] BUG: KASAN: vmalloc-out-of-bounds in 
> crash_setup_memmap_entries+0x17e/0x3a0
> [   15.428018] Write of size 8 at addr c9426008 by task kexec/1187
> 
> (gdb) list *crash_setup_memmap_entries+0x17e
> 0x8107cafe is in crash_setup_memmap_entries 
> (arch/x86/kernel/crash.c:322).
> 317  unsigned long long mend)
> 318 {
> 319 unsigned long start, end;
> 320
> 321 cmem->ranges[0].start = mstart;
> 322 cmem->ranges[0].end = mend;
> 323 cmem->nr_ranges = 1;
> 324
> 325 /* Exclude elf header region */
> 326 start = image->arch.elf_load_addr;
> (gdb)
> 
> We're excluding two ranges, allocate the scratch space we need to do that.

I think 1 range should be fine, have you tested 1?

The code is just excluding the elf header space which will be loaded
first before anything else so I assume it will be just at the start of
the crashkernel resource region.  Thus [a b] after exclude the start
part will be [c b].  But I have not read the code for long time, maybe I
need to double check.

But anyway 2 would be good since the code is obscure we can easily miss
it in the future.  See how other people think.

> 
> Signed-off-by: Mike Galbraith 
> ---
>  arch/x86/kernel/crash.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -337,7 +337,7 @@ int crash_setup_memmap_entries(struct ki
>   struct crash_memmap_data cmd;
>   struct crash_mem *cmem;
> 
> - cmem = vzalloc(sizeof(struct crash_mem));
> + cmem = vzalloc(sizeof(struct crash_mem)+(2*sizeof(struct 
> crash_mem_range)));

Thanks for the patch, can you try below?
vzalloc(struct_size(cmem, ranges, 2));


>   if (!cmem)
>   return -ENOMEM;
> 
> 

Thanks
Dave



Re: [PATCH v4 1/1] kernel/crash_core: Add crashkernel=auto for vmcore creation

2021-02-25 Thread Dave Young
On 02/23/21 at 09:41am, Saeed Mirzamohammadi wrote:
> This adds crashkernel=auto feature to configure reserved memory for
> vmcore creation. CONFIG_CRASH_AUTO_STR is defined to be set for
> different kernel distributions and different archs based on their
> needs.
> 
> Signed-off-by: Saeed Mirzamohammadi 
> Signed-off-by: John Donnelly 
> Tested-by: John Donnelly 
> ---
>  Documentation/admin-guide/kdump/kdump.rst |  3 ++-
>  .../admin-guide/kernel-parameters.txt |  6 ++
>  arch/Kconfig  | 20 +++
>  kernel/crash_core.c   |  7 +++
>  4 files changed, 35 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/kdump/kdump.rst 
> b/Documentation/admin-guide/kdump/kdump.rst
> index 75a9dd98e76e..ae030111e22a 100644
> --- a/Documentation/admin-guide/kdump/kdump.rst
> +++ b/Documentation/admin-guide/kdump/kdump.rst
> @@ -285,7 +285,8 @@ This would mean:
>  2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M
>  3) if the RAM size is larger than 2G, then reserve 128M
>  
> -
> +Or you can use crashkernel=auto to choose the crash kernel memory size
> +based on the recommended configuration set for each arch.
>  
>  Boot into System Kernel
>  ===
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index 9e3cdb271d06..a5deda5c85fe 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -747,6 +747,12 @@
>   a memory unit (amount[KMG]). See also
>   Documentation/admin-guide/kdump/kdump.rst for an 
> example.
>  
> + crashkernel=auto
> + [KNL] This parameter will set the reserved memory for
> + the crash kernel based on the value of the 
> CRASH_AUTO_STR
> + that is the best effort estimation for each arch. See 
> also
> + arch/Kconfig for further details.
> +
>   crashkernel=size[KMG],high
>   [KNL, X86-64] range could be above 4G. Allow kernel
>   to allocate physical memory region from top, so could
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 24862d15f3a3..23d047548772 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -14,6 +14,26 @@ menu "General architecture-dependent options"
>  config CRASH_CORE
>   bool
>  
> +config CRASH_AUTO_STR
> + string "Memory reserved for crash kernel"
> + depends on CRASH_CORE
> + default "1G-64G:128M,64G-1T:256M,1T-:512M"
> + help
> +   This configures the reserved memory dependent
> +   on the value of System RAM. The syntax is:
> +   crashkernel=:[,:,...][@offset]
> +   range=start-[end]
> +
> +   For example:
> +   crashkernel=512M-2G:64M,2G-:128M
> +
> +   This would mean:
> +
> +   1) if the RAM is smaller than 512M, then don't reserve anything
> +  (this is the "rescue" case)
> +   2) if the RAM size is between 512M and 2G (exclusive), then 
> reserve 64M
> +   3) if the RAM size is larger than 2G, then reserve 128M
> +
>  config KEXEC_CORE
>   select CRASH_CORE
>   bool
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 825284baaf46..90f9e4bb6704 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -7,6 +7,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -250,6 +251,12 @@ static int __init __parse_crashkernel(char *cmdline,
>   if (suffix)
>   return parse_crashkernel_suffix(ck_cmdline, crash_size,
>   suffix);
> +#ifdef CONFIG_CRASH_AUTO_STR
> + if (strncmp(ck_cmdline, "auto", 4) == 0) {
> + ck_cmdline = CONFIG_CRASH_AUTO_STR;
> + pr_info("Using crashkernel=auto, the size chosen is a best 
> effort estimation.\n");
> + }
> +#endif
>   /*
>* if the commandline contains a ':', then that's the extended
>* syntax -- if not, it must be the classic syntax
> -- 
> 2.27.0
> 


Acked-by: Dave Young 

Thanks
Dave



Re: [PATCH v3 1/1] kernel/crash_core: Add crashkernel=auto for vmcore creation

2021-02-17 Thread Dave Young
On 02/17/21 at 02:42pm, Vivek Goyal wrote:
> On Wed, Feb 17, 2021 at 02:26:53PM -0500, Steven Rostedt wrote:
> > On Wed, 17 Feb 2021 12:40:43 -0600
> > john.p.donne...@oracle.com wrote:
> > 
> > > Hello.
> > > 
> > > Ping.
> > > 
> > > Can we get this reviewed and staged ?
> > > 
> > > Thank you.
> > 
> > Andrew,
> > 
> > Seems you are the only one pushing patches in for kexec/crash. Is this
> > maintained by anyone?
> 
> Dave Young and Baoquan He still maintain kexec/kdump stuff, AFAIK. I
> don't get time to look into this stuff now a days. 

Vivek, no problem, both Baoquan and me are on holiday leaves previously.

I'm fine with the change. 
This patch benefits distributions and those people who want to deploy a lot of
machines.  It is a good start and we can continue to improve the estimation 
later.

Thanks
Dave 



Re: [PATCH v3 1/1] kernel/crash_core: Add crashkernel=auto for vmcore creation

2021-02-17 Thread Dave Young
On 02/11/21 at 10:08am, Saeed Mirzamohammadi wrote:
> This adds crashkernel=auto feature to configure reserved memory for
> vmcore creation. CONFIG_CRASH_AUTO_STR is defined to be set for
> different kernel distributions and different archs based on their
> needs.
> 
> Signed-off-by: Saeed Mirzamohammadi 
> Signed-off-by: John Donnelly 
> Tested-by: John Donnelly 
> ---
>  Documentation/admin-guide/kdump/kdump.rst |  3 ++-
>  .../admin-guide/kernel-parameters.txt |  6 +
>  arch/Kconfig  | 24 +++
>  kernel/crash_core.c   |  7 ++
>  4 files changed, 39 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/kdump/kdump.rst 
> b/Documentation/admin-guide/kdump/kdump.rst
> index 2da65fef2a1c..e55cdc404c6b 100644
> --- a/Documentation/admin-guide/kdump/kdump.rst
> +++ b/Documentation/admin-guide/kdump/kdump.rst
> @@ -285,7 +285,8 @@ This would mean:
>  2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M
>  3) if the RAM size is larger than 2G, then reserve 128M
>  
> -
> +Or you can use crashkernel=auto to choose the crash kernel memory size
> +based on the recommended configuration set for each arch.
>  
>  Boot into System Kernel
>  ===
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index 7d4e523646c3..aa2099465458 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -736,6 +736,12 @@
>   a memory unit (amount[KMG]). See also
>   Documentation/admin-guide/kdump/kdump.rst for an 
> example.
>  
> + crashkernel=auto
> + [KNL] This parameter will set the reserved memory for
> + the crash kernel based on the value of the 
> CRASH_AUTO_STR
> + that is the best effort estimation for each arch. See 
> also
> + arch/Kconfig for further details.
> +
>   crashkernel=size[KMG],high
>   [KNL, X86-64] range could be above 4G. Allow kernel
>   to allocate physical memory region from top, so could
> diff --git a/arch/Kconfig b/arch/Kconfig
> index af14a567b493..f87c88ffa2f8 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -14,6 +14,30 @@ menu "General architecture-dependent options"
>  config CRASH_CORE
>   bool
>  
> +if CRASH_CORE
> +
> +config CRASH_AUTO_STR
> + string "Memory reserved for crash kernel"
> + depends on CRASH_CORE
> + default "1G-64G:128M,64G-1T:256M,1T-:512M"
> + help
> +   This configures the reserved memory dependent
> +   on the value of System RAM. The syntax is:
> +   crashkernel=:[,:,...][@offset]
> +   range=start-[end]
> +
> +   For example:
> +   crashkernel=512M-2G:64M,2G-:128M
> +
> +   This would mean:
> +
> +   1) if the RAM is smaller than 512M, then don't reserve anything
> +  (this is the "rescue" case)
> +   2) if the RAM size is between 512M and 2G (exclusive), then 
> reserve 64M
> +   3) if the RAM size is larger than 2G, then reserve 128M
> +
> +endif # CRASH_CORE
> +
>  config KEXEC_CORE
>   select CRASH_CORE
>   bool
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 106e4500fd53..ab0a2b4b1ffa 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -7,6 +7,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -250,6 +251,12 @@ static int __init __parse_crashkernel(char *cmdline,
>   if (suffix)
>   return parse_crashkernel_suffix(ck_cmdline, crash_size,
>   suffix);
> +#ifdef CONFIG_CRASH_AUTO_STR
> + if (strncmp(ck_cmdline, "auto", 4) == 0) {
> + ck_cmdline = CONFIG_CRASH_AUTO_STR;
> + pr_info("Using crashkernel=auto, the size chosen is a best 
> effort estimation.\n");
> + }
> +#endif
>   /*
>* if the commandline contains a ':', then that's the extended
>* syntax -- if not, it must be the classic syntax
> -- 
> 2.27.0
> 

Acked-by: Dave Young 

Thanks
Dave



Re: [PATCH v2 1/1] kernel/crash_core: Add crashkernel=auto for vmcore creation

2021-02-06 Thread Dave Young
Hi Saeed,
On 02/03/21 at 04:43pm, Saeed Mirzamohammadi wrote:
> This adds crashkernel=auto feature to configure reserved memory for
> vmcore creation. CONFIG_CRASH_AUTO_STR is defined to be set for
> different kernel distributions and different archs based on their
> needs.
> 
> Signed-off-by: Saeed Mirzamohammadi 
> Signed-off-by: John Donnelly 
> Tested-by: John Donnelly 
> ---
>  Documentation/admin-guide/kdump/kdump.rst |  5 +
>  arch/Kconfig  | 24 +++
>  kernel/crash_core.c   |  7 +++
>  3 files changed, 36 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kdump/kdump.rst 
> b/Documentation/admin-guide/kdump/kdump.rst
> index 75a9dd98e76e..f95a2af64f59 100644
> --- a/Documentation/admin-guide/kdump/kdump.rst
> +++ b/Documentation/admin-guide/kdump/kdump.rst
> @@ -285,7 +285,12 @@ This would mean:
>  2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M
>  3) if the RAM size is larger than 2G, then reserve 128M
>  
> +Or you can use crashkernel=auto if you have enough memory. The threshold
> +is 1G on x86_64 and arm64. If your system memory is less than the threshold,
> +crashkernel=auto will not reserve memory. The size changes according to
> +the system memory size like below:
>  
> +x86_64/arm64: 1G-64G:128M,64G-1T:256M,1T-:512M

This part should be updated since you do not make the default value arch
dependent.

The format of the auto str is documented well in kernel-parameters.txt
below part:
crashkernel=range1:size1[,range2:size2,...][@offset]

The crashkernel=auto should be also documented in kernel-parameters.txt
and do not need to explain the threshold etc again, just refer to the
"crashkernel=range1:size1[,range2:size2,...][@offset]" format would be
fine.

>  
>  Boot into System Kernel
>  ===
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 56b6ccc0e32d..a772eb397d73 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -14,6 +14,30 @@ menu "General architecture-dependent options"
>  config CRASH_CORE
>   bool
>  
> +if CRASH_CORE
> +
> +config CRASH_AUTO_STR
> + string "Memory reserved for crash kernel"
> + depends on CRASH_CORE
> + default "1G-64G:128M,64G-1T:256M,1T-:512M"
> + help
> +   This configures the reserved memory dependent
> +   on the value of System RAM. The syntax is:
> +   crashkernel=:[,:,...][@offset]
> +   range=start-[end]
> +
> +   For example:
> +   crashkernel=512M-2G:64M,2G-:128M
> +
> +   This would mean:
> +
> +   1) if the RAM is smaller than 512M, then don't reserve anything
> +  (this is the "rescue" case)
> +   2) if the RAM size is between 512M and 2G (exclusive), then 
> reserve 64M
> +   3) if the RAM size is larger than 2G, then reserve 128M
> +
> +endif # CRASH_CORE
> +
>  config KEXEC_CORE
>   select CRASH_CORE
>   bool
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 106e4500fd53..ab0a2b4b1ffa 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -7,6 +7,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -250,6 +251,12 @@ static int __init __parse_crashkernel(char *cmdline,
>   if (suffix)
>   return parse_crashkernel_suffix(ck_cmdline, crash_size,
>   suffix);
> +#ifdef CONFIG_CRASH_AUTO_STR
> + if (strncmp(ck_cmdline, "auto", 4) == 0) {
> + ck_cmdline = CONFIG_CRASH_AUTO_STR;
> + pr_info("Using crashkernel=auto, the size chosen is a best 
> effort estimation.\n");
> + }
> +#endif
>   /*
>* if the commandline contains a ':', then that's the extended
>* syntax -- if not, it must be the classic syntax
> -- 
> 2.27.0
> 

Thanks
Dave



Re: [PATCH 1/1] kernel/crash_core.c - Add crashkernel=auto for x86 and ARM

2021-01-22 Thread Dave Young
On 01/23/21 at 11:51am, Dave Young wrote:
> Hi Saeed,
> On 01/22/21 at 05:14pm, Saeed Mirzamohammadi wrote:
> > Hi,
> > 
> > > On Jan 21, 2021, at 7:12 PM, Dave Young  wrote:
> > > 
> > > On 01/22/21 at 09:22am, Dave Young wrote:
> > >> Hi John,
> > >> 
> > >> On 01/21/21 at 09:32am, john.p.donne...@oracle.com wrote:
> > >>> On 11/22/20 9:47 PM, Dave Young wrote:
> > >>>> Hi Guilherme,
> > >>>> On 11/22/20 at 12:32pm, Guilherme Piccoli wrote:
> > >>>>> Hi Dave and Kairui, thanks for your responses! OK, if that makes sense
> > >>>>> to you I'm fine with it. I'd just recommend to test recent kernels in
> > >>>>> multiple distros with the minimum "range" to see if 64M is enough for
> > >>>>> crashkernel, maybe we'd need to bump that.
> > >>>> 
> > >>>> Giving the different kernel configs and the different userspace
> > >>>> initramfs setup it is hard to get an uniform value for all 
> > >>>> distributions,
> > >>>> but we can have an interface/kconfig-option for them to provide a 
> > >>>> value like this patch
> > >>>> is doing. And it could be improved like Kairui said about some known
> > >>>> kernel added extra values later, probably some more improvements if
> > >>>> doable.
> > >>>> 
> > >>>> Thanks
> > >>>> Dave
> > >>>> 
> > >>> 
> > >>> Hi.
> > >>> 
> > >>> Are we going to move forward with implementing this for X86 and Arm ?
> > >>> 
> > >>> If other platform maintainers want to include this CONFIG option in 
> > >>> their
> > >>> configuration settings they have a starting point.
> > >> 
> > >> I would expect this become arch independent.
> > > 
> > > Clarify a bit, it can be a general config option under arch/Kconfig and
> > > just put the code in general arch independent part.
> > 
> > Does this mean that we need to add the option to def_configs in all archs 
> > as well?
> > 
> 
> I think we do not need to add defconfig, something like this will just work?
> 
> BTW, it should depend on CRASH_CORE instead of CRASH_DUMP, the logic of
> parsing crashkernel is in kernel/crash_core.c
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index af14a567b493..fa6efeb52dc5 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -14,6 +14,11 @@ menu "General architecture-dependent options"
>  config CRASH_CORE
>   bool
>  
> +config CRASH_AUTO_STR
> + depends on CRASH_CORE
> + string "Memory reserved for crash kernel"
> + default "1G-:128M"

People do not want to see the default value if they do not need kdump 
so it would be better to add another kconfig option as a switch which is
set default as off in bool state.

> + ... help text [snip] ...
> +
>  config KEXEC_CORE
>   select CRASH_CORE
>   bool
> 
> [...]
> 
> > Thanks,
> > Saeed
> > 
> > > 
> > >> 
> > >> Saeed, Kairui, would any of you like to update the patch?
> > >> 
> > >>> 
> > >>> Thank you,
> > >>> 
> > >>> John.
> > >>> 
> > >>> ( I am not currently on many of the included dist lists  in this email, 
> > >>> so
> > >>> hopefully key contributors are included in this exchange )
> > >>> 
> > >> 
> > >> Thanks
> > >> Dave
> > 
> 
> Thanks
> Dave



Re: [PATCH 1/1] kernel/crash_core.c - Add crashkernel=auto for x86 and ARM

2021-01-22 Thread Dave Young
Hi Saeed,
On 01/22/21 at 05:14pm, Saeed Mirzamohammadi wrote:
> Hi,
> 
> > On Jan 21, 2021, at 7:12 PM, Dave Young  wrote:
> > 
> > On 01/22/21 at 09:22am, Dave Young wrote:
> >> Hi John,
> >> 
> >> On 01/21/21 at 09:32am, john.p.donne...@oracle.com wrote:
> >>> On 11/22/20 9:47 PM, Dave Young wrote:
> >>>> Hi Guilherme,
> >>>> On 11/22/20 at 12:32pm, Guilherme Piccoli wrote:
> >>>>> Hi Dave and Kairui, thanks for your responses! OK, if that makes sense
> >>>>> to you I'm fine with it. I'd just recommend to test recent kernels in
> >>>>> multiple distros with the minimum "range" to see if 64M is enough for
> >>>>> crashkernel, maybe we'd need to bump that.
> >>>> 
> >>>> Giving the different kernel configs and the different userspace
> >>>> initramfs setup it is hard to get an uniform value for all distributions,
> >>>> but we can have an interface/kconfig-option for them to provide a value 
> >>>> like this patch
> >>>> is doing. And it could be improved like Kairui said about some known
> >>>> kernel added extra values later, probably some more improvements if
> >>>> doable.
> >>>> 
> >>>> Thanks
> >>>> Dave
> >>>> 
> >>> 
> >>> Hi.
> >>> 
> >>> Are we going to move forward with implementing this for X86 and Arm ?
> >>> 
> >>> If other platform maintainers want to include this CONFIG option in their
> >>> configuration settings they have a starting point.
> >> 
> >> I would expect this become arch independent.
> > 
> > Clarify a bit, it can be a general config option under arch/Kconfig and
> > just put the code in general arch independent part.
> 
> Does this mean that we need to add the option to def_configs in all archs as 
> well?
> 

I think we do not need to add defconfig, something like this will just work?

BTW, it should depend on CRASH_CORE instead of CRASH_DUMP, the logic of
parsing crashkernel is in kernel/crash_core.c

diff --git a/arch/Kconfig b/arch/Kconfig
index af14a567b493..fa6efeb52dc5 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -14,6 +14,11 @@ menu "General architecture-dependent options"
 config CRASH_CORE
bool
 
+config CRASH_AUTO_STR
+   depends on CRASH_CORE
+   string "Memory reserved for crash kernel"
+   default "1G-:128M"
+   ... help text [snip] ...
+
 config KEXEC_CORE
select CRASH_CORE
bool

[...]

> Thanks,
> Saeed
> 
> > 
> >> 
> >> Saeed, Kairui, would any of you like to update the patch?
> >> 
> >>> 
> >>> Thank you,
> >>> 
> >>> John.
> >>> 
> >>> ( I am not currently on many of the included dist lists  in this email, so
> >>> hopefully key contributors are included in this exchange )
> >>> 
> >> 
> >> Thanks
> >> Dave
> 

Thanks
Dave



Re: [PATCH 1/1] kernel/crash_core.c - Add crashkernel=auto for x86 and ARM

2021-01-21 Thread Dave Young
On 01/22/21 at 09:22am, Dave Young wrote:
> Hi John,
> 
> On 01/21/21 at 09:32am, john.p.donne...@oracle.com wrote:
> > On 11/22/20 9:47 PM, Dave Young wrote:
> > > Hi Guilherme,
> > > On 11/22/20 at 12:32pm, Guilherme Piccoli wrote:
> > > > Hi Dave and Kairui, thanks for your responses! OK, if that makes sense
> > > > to you I'm fine with it. I'd just recommend to test recent kernels in
> > > > multiple distros with the minimum "range" to see if 64M is enough for
> > > > crashkernel, maybe we'd need to bump that.
> > > 
> > > Giving the different kernel configs and the different userspace
> > > initramfs setup it is hard to get an uniform value for all distributions,
> > > but we can have an interface/kconfig-option for them to provide a value 
> > > like this patch
> > > is doing. And it could be improved like Kairui said about some known
> > > kernel added extra values later, probably some more improvements if
> > > doable.
> > > 
> > > Thanks
> > > Dave
> > > 
> > 
> > Hi.
> > 
> > Are we going to move forward with implementing this for X86 and Arm ?
> > 
> > If other platform maintainers want to include this CONFIG option in their
> > configuration settings they have a starting point.
> 
> I would expect this become arch independent.

Clarify a bit, it can be a general config option under arch/Kconfig and
just put the code in general arch independent part.

> 
> Saeed, Kairui, would any of you like to update the patch?
> 
> > 
> > Thank you,
> > 
> > John.
> > 
> > ( I am not currently on many of the included dist lists  in this email, so
> > hopefully key contributors are included in this exchange )
> > 
> 
> Thanks
> Dave



Re: [PATCH 1/1] kernel/crash_core.c - Add crashkernel=auto for x86 and ARM

2021-01-21 Thread Dave Young
Hi John,

On 01/21/21 at 09:32am, john.p.donne...@oracle.com wrote:
> On 11/22/20 9:47 PM, Dave Young wrote:
> > Hi Guilherme,
> > On 11/22/20 at 12:32pm, Guilherme Piccoli wrote:
> > > Hi Dave and Kairui, thanks for your responses! OK, if that makes sense
> > > to you I'm fine with it. I'd just recommend to test recent kernels in
> > > multiple distros with the minimum "range" to see if 64M is enough for
> > > crashkernel, maybe we'd need to bump that.
> > 
> > Giving the different kernel configs and the different userspace
> > initramfs setup it is hard to get an uniform value for all distributions,
> > but we can have an interface/kconfig-option for them to provide a value 
> > like this patch
> > is doing. And it could be improved like Kairui said about some known
> > kernel added extra values later, probably some more improvements if
> > doable.
> > 
> > Thanks
> > Dave
> > 
> 
> Hi.
> 
> Are we going to move forward with implementing this for X86 and Arm ?
> 
> If other platform maintainers want to include this CONFIG option in their
> configuration settings they have a starting point.

I would expect this become arch independent.

Saeed, Kairui, would any of you like to update the patch?

> 
> Thank you,
> 
> John.
> 
> ( I am not currently on many of the included dist lists  in this email, so
> hopefully key contributors are included in this exchange )
> 

Thanks
Dave



Re: [PATCH] kexec: Fix error code in kexec_calculate_store_digests()

2020-12-09 Thread Dave Young
On 12/08/20 at 10:55pm, Dan Carpenter wrote:
> Return -ENOMEM on allocation failure instead of returning success.
> 
> Fixes: a43cac0d9dc2 ("kexec: split kexec_file syscall code to kexec_file.c")
> Signed-off-by: Dan Carpenter 
> ---
>  kernel/kexec_file.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index b02086d70492..9570f380a825 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -735,8 +735,10 @@ static int kexec_calculate_store_digests(struct kimage 
> *image)
>  
>   sha_region_sz = KEXEC_SEGMENT_MAX * sizeof(struct kexec_sha_region);
>   sha_regions = vzalloc(sha_region_sz);
> - if (!sha_regions)
> + if (!sha_regions) {
> + ret = -ENOMEM;
>   goto out_free_desc;
> + }
>  
>   desc->tfm   = tfm;
>  
> -- 
> 2.29.2
> 

Good catch, thanks!

Acked-by: Dave Young 

Thanks
Dave



Re: [PATCH 1/1] kernel/crash_core.c - Add crashkernel=auto for x86 and ARM

2020-11-22 Thread Dave Young
Hi Guilherme,
On 11/22/20 at 12:32pm, Guilherme Piccoli wrote:
> Hi Dave and Kairui, thanks for your responses! OK, if that makes sense
> to you I'm fine with it. I'd just recommend to test recent kernels in
> multiple distros with the minimum "range" to see if 64M is enough for
> crashkernel, maybe we'd need to bump that.

Giving the different kernel configs and the different userspace
initramfs setup it is hard to get an uniform value for all distributions,
but we can have an interface/kconfig-option for them to provide a value like 
this patch
is doing. And it could be improved like Kairui said about some known
kernel added extra values later, probably some more improvements if
doable.

Thanks
Dave



Re: [PATCH 1/1] kernel/crash_core.c - Add crashkernel=auto for x86 and ARM

2020-11-19 Thread Dave Young
Hi Guilherme,
On 11/19/20 at 06:56pm, Guilherme Piccoli wrote:
> Hi Saeed, thanks for your patch/idea! Comments inline, below.
> 
> On Wed, Nov 18, 2020 at 8:29 PM Saeed Mirzamohammadi
>  wrote:
> >
> > This adds crashkernel=auto feature to configure reserved memory for
> > vmcore creation to both x86 and ARM platforms based on the total memory
> > size.
> >
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: John Donnelly 
> > Signed-off-by: Saeed Mirzamohammadi 
> > ---
> >  Documentation/admin-guide/kdump/kdump.rst |  5 +
> >  arch/arm64/Kconfig| 26 ++-
> >  arch/arm64/configs/defconfig  |  1 +
> >  arch/x86/Kconfig  | 26 ++-
> >  arch/x86/configs/x86_64_defconfig |  1 +
> >  kernel/crash_core.c   | 20 +++--
> >  6 files changed, 75 insertions(+), 4 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/kdump/kdump.rst 
> > b/Documentation/admin-guide/kdump/kdump.rst
> > index 75a9dd98e76e..f95a2af64f59 100644
> > --- a/Documentation/admin-guide/kdump/kdump.rst
> > +++ b/Documentation/admin-guide/kdump/kdump.rst
> > @@ -285,7 +285,12 @@ This would mean:
> >  2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M
> >  3) if the RAM size is larger than 2G, then reserve 128M
> >
> > +Or you can use crashkernel=auto if you have enough memory. The threshold
> > +is 1G on x86_64 and arm64. If your system memory is less than the 
> > threshold,
> > +crashkernel=auto will not reserve memory. The size changes according to
> > +the system memory size like below:
> >
> > +x86_64/arm64: 1G-64G:128M,64G-1T:256M,1T-:512M
> 
> As mentioned in the thread, this was tried before and never got merged
> - I'm not sure the all the reasons, but I speculate that a stronger
> reason is that it'd likely fail in many cases. I've seen cases of 256G

Yes, there were a few tries, last time I tried to set a default value, I
do not think people are strongly against it.  We have been using the
auto in Red Hat for long time, it does work for most of usual cases
like Saeed said in the patch. But I think all of us are aligned it is
not possible to satisfy all the user cases.  Anyway I also think this is
good to have.

> servers that require crashkernel=600M (or more), due to the amount of
> devices. Also, the minimum nowadays would likely be 96M or more - I'm
> looping Cascardo and Dann (Debian/Ubuntu maintainers of kdump stuff)
> so they maybe can jump in with even more examples/considerations.

Another reason of people have different feeling about the memory
requirement is currently distributions are doing different on kdump,
especially for the userspace part. Kairui did a lot of work in dracut to
reduce the memory requirements in dracut, for example only add dump
required kernel modules in 2nd kernel initramfs, also we have a lot of
other twicks for dracut to use "hostonly" mode, eg. hostonly multipath
configurations will just bring up necessary paths instead of creating
all of the multipath devices.

> 
> What we've been trying to do in Ubuntu/Debian is using an estimator
> approach [0] - this is purely userspace and tries to infer the amount
> of necessary memory a kdump minimal[1] kernel would take. I'm not
> -1'ing your approach totally, but I think a bit more consideration is
> needed in the ranges, at least accounting the number of devices of the
> machine or something like that.

There are definitely room to improve and make it better in the future,
but I think this is a good start and simple enough proposal for the time
being :)

> 
> Cheers,
> 
> 
> Guilherme
> 
> [0] https://salsa.debian.org/debian/makedumpfile/-/merge_requests/7
> [1] Minimal as having a reduced initrd + "shrinking" parameters (like
> nr_cpus=1).
> 

Thanks
Dave



Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time

2020-09-26 Thread Dave Young
Hi,

On 09/25/20 at 10:56am, Konrad Rzeszutek Wilk wrote:
> On Fri, Sep 25, 2020 at 11:05:58AM +0800, Dave Young wrote:
> > Hi,
> > 
> > On 09/24/20 at 01:16pm, boris.ostrov...@oracle.com wrote:
> > > 
> > > On 9/24/20 12:43 PM, Michael Kelley wrote:
> > > > From: Eric W. Biederman  Sent: Thursday, 
> > > > September 24, 2020 9:26 AM
> > > >> Michael Kelley  writes:
> > > >>
> > > >>>>> Added Hyper-V people and people who created the param, it is below
> > > >>>>> commit, I also want to remove it if possible, let's see how people
> > > >>>>> think, but the least way should be to disable the auto setting in 
> > > >>>>> both systemd
> > > >>>>> and kernel:
> > > >>> Hyper-V uses a notifier to inform the host system that a Linux VM has
> > > >>> panic'ed.  Informing the host is particularly important in a public 
> > > >>> cloud
> > > >>> such as Azure so that the cloud software can alert the customer, and 
> > > >>> can
> > > >>> track cloud-wide reliability statistics.   Whether a kdump is taken 
> > > >>> is controlled
> > > >>> entirely by the customer and how he configures the VM, and we want
> > > >>> the host to be informed either way.
> > > >> Why?
> > > >>
> > > >> Why does the host care?
> > > >> Especially if the VM continues executing into a kdump kernel?
> > > > The host itself doesn't care.  But the host is a convenient out-of-band
> > > > channel for recording that a panic has occurred and to collect basic 
> > > > data
> > > > about the panic.  This out-of-band channel is then used to notify the 
> > > > end
> > > > customer that his VM has panic'ed.  Sure, the customer should be running
> > > > his own monitoring software, but customers don't always do what they
> > > > should.  Equally important, the out-of-band channel allows the cloud
> > > > infrastructure software to notice trends, such as that the rate of Linux
> > > > panics has increased, and that perhaps there is a cloud problem that
> > > > should be investigated.
> > > 
> > > 
> > > In many cases (especially in cloud environment) your dump device is 
> > > remote (e.g. iscsi) and kdump sometimes (often?) gets stuck because of 
> > > connectivity issues (which could be cause of the panic in the first 
> > > place). So it is quite desirable to inform the infrastructure that the VM 
> > > is on its way out without waiting for kdump to complete.
> > 
> > That can probably be done in kdump kernel if it is really needed.  Say
> > informing host that panic happened and a kdump kernel is runnning.
> 
> If kdump kernel gets to that point. Sometimes (sadly) it ends up being
> misconfigured and it chokes up - and hence having multiple ways to emit
> the crash information before running kdump kernel is a life-saver.

If it is done in kernel boot phase before pid 1 comes up then things
should be good enough, specific for kvm/hyper-v guests the kdump kernel.

> 
> > 
> > But I think to set crash_kexec_post_notifiers by default is still bad. 
> 
> Because of the way it is run today I presume? If there was some
> safe/unsafe policy that should work right? I would think that the
> safe ones that work properly all the time are:
> 
>  - HyperV CRASH_MSRs,
>  - KVM PVPANIC_[PANIC,CRASHLOAD] push button knob,
>  - pstore EFI variables
>  - Dumping in memory,
> 
> And then some that depend on firmware version (aka BIOS, and vendor) are:
>  - ACPI ERST,
> 
> And then the unsafe:
>  - s390, PowerPC (I don't actually know what they are but that
> was Dave's primary motivator).

As I said we also got reports of kdump kernel hang with Hyper-V with the
crash_kexec_post_notifiers enabled.

EFI pstore also depends on efi runtime that is in firmware, also we can
not ensure it works well after a panic happened.  Ditto for other pstore
backends we do not prefer to do it before kdump.  But as I said I'm not
saying they are not useful, people can use them by their choose.

As for the virtual machine panic events maybe it is ok to add some other
hooks instead of the notifiers.  But frankly I still feel it is better to do
it in kdump kernel boot path since kdump works well for virt from our
experience.

> 
> > 
> > > 
> > > 
> > > >
> > > >> Further like I have mentioned everytime something like this has come up
> > > >> a call on the kexec on panic code path should be a direct call (That 
> > > >> can
> > > >> be audited) not something hidden in a notifier call chain (which can 
> > > >> not).
> > > >>
> > > 
> > > We btw already have a direct call from panic() to kmsg_dump() which is 
> > > indirectly controlled by crash_kexec_post_notifiers, and it would also be 
> > > preferable to be able to call it before kdump as well.
> > 
> > Right, that is the same thing we are talking about.
> > 
> > Thanks
> > Dave
> > 
> 

Thanks
Dave



Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time

2020-09-24 Thread Dave Young
Hi,

On 09/24/20 at 01:16pm, boris.ostrov...@oracle.com wrote:
> 
> On 9/24/20 12:43 PM, Michael Kelley wrote:
> > From: Eric W. Biederman  Sent: Thursday, September 
> > 24, 2020 9:26 AM
> >> Michael Kelley  writes:
> >>
> > Added Hyper-V people and people who created the param, it is below
> > commit, I also want to remove it if possible, let's see how people
> > think, but the least way should be to disable the auto setting in both 
> > systemd
> > and kernel:
> >>> Hyper-V uses a notifier to inform the host system that a Linux VM has
> >>> panic'ed.  Informing the host is particularly important in a public cloud
> >>> such as Azure so that the cloud software can alert the customer, and can
> >>> track cloud-wide reliability statistics.   Whether a kdump is taken is 
> >>> controlled
> >>> entirely by the customer and how he configures the VM, and we want
> >>> the host to be informed either way.
> >> Why?
> >>
> >> Why does the host care?
> >> Especially if the VM continues executing into a kdump kernel?
> > The host itself doesn't care.  But the host is a convenient out-of-band
> > channel for recording that a panic has occurred and to collect basic data
> > about the panic.  This out-of-band channel is then used to notify the end
> > customer that his VM has panic'ed.  Sure, the customer should be running
> > his own monitoring software, but customers don't always do what they
> > should.  Equally important, the out-of-band channel allows the cloud
> > infrastructure software to notice trends, such as that the rate of Linux
> > panics has increased, and that perhaps there is a cloud problem that
> > should be investigated.
> 
> 
> In many cases (especially in cloud environment) your dump device is remote 
> (e.g. iscsi) and kdump sometimes (often?) gets stuck because of connectivity 
> issues (which could be cause of the panic in the first place). So it is quite 
> desirable to inform the infrastructure that the VM is on its way out without 
> waiting for kdump to complete.

That can probably be done in kdump kernel if it is really needed.  Say
informing host that panic happened and a kdump kernel is runnning.

But I think to set crash_kexec_post_notifiers by default is still bad. 

> 
> 
> >
> >> Further like I have mentioned everytime something like this has come up
> >> a call on the kexec on panic code path should be a direct call (That can
> >> be audited) not something hidden in a notifier call chain (which can not).
> >>
> 
> We btw already have a direct call from panic() to kmsg_dump() which is 
> indirectly controlled by crash_kexec_post_notifiers, and it would also be 
> preferable to be able to call it before kdump as well.

Right, that is the same thing we are talking about.

Thanks
Dave



Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time

2020-09-22 Thread Dave Young
+ more people who may care about this param 
On 09/21/20 at 08:45pm, Eric W. Biederman wrote:
> Konrad Rzeszutek Wilk  writes:
> 
> > On Fri, Sep 18, 2020 at 05:47:43PM -0700, Andrew Morton wrote:
> >> On Fri, 18 Sep 2020 11:25:46 +0800 Dave Young  wrote:
> >> 
> >> > crash_kexec_post_notifiers enables running various panic notifier
> >> > before kdump kernel booting. This increases risks of kdump failure.
> >> > It is well documented in kernel-parameters.txt. We do not suggest
> >> > people to enable it together with kdump unless he/she is really sure.
> >> > This is also not suggested to be enabled by default when users are
> >> > not aware in distributions.
> >> > 
> >> > But unfortunately it is enabled by default in systemd, see below
> >> > discussions in a systemd report, we can not convince systemd to change
> >> > it:
> >> > https://github.com/systemd/systemd/issues/16661
> >> > 
> >> > Actually we have got reports about kdump kernel hangs in both s390x
> >> > and powerpcle cases caused by the systemd change,  also some x86 cases
> >> > could also be caused by the same (although that is in Hyper-V code
> >> > instead of systemd, that need to be addressed separately).
> >
> > Perhaps it may be better to fix the issus on s390x and PowerPC as well?
> >
> >> > 
> >> > Thus to avoid the auto enablement here just disable the param writable
> >> > permission in sysfs.
> >> > 
> >> 
> >> Well.  I don't think this is at all a desirable way of resolving a
> >> disagreement with the systemd developers
> >> 
> >> At the above github address I'm seeing "ryncsn added a commit to
> >> ryncsn/systemd that referenced this issue 9 days ago", "pstore: don't
> >> enable crash_kexec_post_notifiers by default".  So didn't that address
> >> the issue?
> >
> > It does in systemd, but there is a strong interest in making this on
> > by default.
> 
> There is also a strong interest in removing this code entirely from the
> kernel.

Added Hyper-V people and people who created the param, it is below
commit, I also want to remove it if possible, let's see how people
think, but the least way should be to disable the auto setting in both systemd
and kernel:

commit f06e5153f4ae2e2f3b0300f0e260e40cb7fefd45
Author: Masami Hiramatsu 
Date:   Fri Jun 6 14:37:07 2014 -0700

kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump after 
panic_notifers

Add a "crash_kexec_post_notifiers" boot option to run kdump after
running panic_notifiers and dump kmsg.  This can help rare situations
where kdump fails because of unstable crashed kernel or hardware failure
(memory corruption on critical data/code), or the 2nd kernel is already
broken by the 1st kernel (it's a broken behavior, but who can guarantee
that the "crashed" kernel works correctly?).

Usage: add "crash_kexec_post_notifiers" to kernel boot option.

Note that this actually increases risks of the failure of kdump.  This
option should be set only if you worry about the rare case of kdump
failure rather than increasing the chance of success.

> 
> This failure is a case in point.
> 
> I think I am at my I told you so point.  This is what all of the testing
> over all the years has said.  Leaving functionality to the peculiarities
> of firmware when you don't have to, and can actually control what is
> going on doesn't work.
> 
> Eric
> 
> 

Thanks
Dave



Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time

2020-09-22 Thread Dave Young
On 09/21/20 at 04:18pm, Konrad Rzeszutek Wilk wrote:
> On Fri, Sep 18, 2020 at 05:47:43PM -0700, Andrew Morton wrote:
> > On Fri, 18 Sep 2020 11:25:46 +0800 Dave Young  wrote:
> > 
> > > crash_kexec_post_notifiers enables running various panic notifier
> > > before kdump kernel booting. This increases risks of kdump failure.
> > > It is well documented in kernel-parameters.txt. We do not suggest
> > > people to enable it together with kdump unless he/she is really sure.
> > > This is also not suggested to be enabled by default when users are
> > > not aware in distributions.
> > > 
> > > But unfortunately it is enabled by default in systemd, see below
> > > discussions in a systemd report, we can not convince systemd to change
> > > it:
> > > https://github.com/systemd/systemd/issues/16661
> > > 
> > > Actually we have got reports about kdump kernel hangs in both s390x
> > > and powerpcle cases caused by the systemd change,  also some x86 cases
> > > could also be caused by the same (although that is in Hyper-V code
> > > instead of systemd, that need to be addressed separately).
> 
> Perhaps it may be better to fix the issus on s390x and PowerPC as well?
> 
> > > 
> > > Thus to avoid the auto enablement here just disable the param writable
> > > permission in sysfs.
> > > 
> > 
> > Well.  I don't think this is at all a desirable way of resolving a
> > disagreement with the systemd developers
> > 
> > At the above github address I'm seeing "ryncsn added a commit to
> > ryncsn/systemd that referenced this issue 9 days ago", "pstore: don't
> > enable crash_kexec_post_notifiers by default".  So didn't that address
> > the issue?
> 
> It does in systemd, but there is a strong interest in making this on by 
> default.

I understand there could be such interest, but we have to keep in mind
that any extra things after a system crash can cause kdump unreliable.

I do not object people to use pstore, but I do object to enable the
notifiers by default.

BTW, crash notifiers are not limited to pstore, there are quite a log of
other pieces like led trigger etc.

Thanks
Dave



Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time

2020-09-19 Thread Dave Young
On 09/18/20 at 05:47pm, Andrew Morton wrote:
> On Fri, 18 Sep 2020 11:25:46 +0800 Dave Young  wrote:
> 
> > crash_kexec_post_notifiers enables running various panic notifier
> > before kdump kernel booting. This increases risks of kdump failure.
> > It is well documented in kernel-parameters.txt. We do not suggest
> > people to enable it together with kdump unless he/she is really sure.
> > This is also not suggested to be enabled by default when users are
> > not aware in distributions.
> > 
> > But unfortunately it is enabled by default in systemd, see below
> > discussions in a systemd report, we can not convince systemd to change
> > it:
> > https://github.com/systemd/systemd/issues/16661
> > 
> > Actually we have got reports about kdump kernel hangs in both s390x
> > and powerpcle cases caused by the systemd change,  also some x86 cases
> > could also be caused by the same (although that is in Hyper-V code
> > instead of systemd, that need to be addressed separately).
> > 
> > Thus to avoid the auto enablement here just disable the param writable
> > permission in sysfs.
> > 
> 
> Well.  I don't think this is at all a desirable way of resolving a
> disagreement with the systemd developers
> 
> At the above github address I'm seeing "ryncsn added a commit to
> ryncsn/systemd that referenced this issue 9 days ago", "pstore: don't
> enable crash_kexec_post_notifiers by default".  So didn't that address
> the issue?
> 

I hope that commit can be merged in systemd, but we are really not
optimize about that. The discussion is clear there but we did not get
response since Aug 6.

BTW, Kairui sent the systemd pull request 15 days ago, the new update added some
comment.

Thanks
Dave 



Re: [PATCH v12 3/9] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel[_low]()

2020-09-17 Thread Dave Young
On 09/18/20 at 11:57am, chenzhou wrote:
> Hi Dave,
> 
> 
> On 2020/9/18 11:01, Dave Young wrote:
> > On 09/07/20 at 09:47pm, Chen Zhou wrote:
> >> To make the functions reserve_crashkernel[_low]() as generic,
> >> replace some hard-coded numbers with macro CRASH_ADDR_LOW_MAX.
> >>
> >> Signed-off-by: Chen Zhou 
> >> ---
> >>  arch/x86/kernel/setup.c | 11 ++-
> >>  1 file changed, 6 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> >> index d7fd90c52dae..71a6a6e7ca5b 100644
> >> --- a/arch/x86/kernel/setup.c
> >> +++ b/arch/x86/kernel/setup.c
> >> @@ -430,7 +430,7 @@ static int __init reserve_crashkernel_low(void)
> >>unsigned long total_low_mem;
> >>int ret;
> >>  
> >> -  total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
> >> +  total_low_mem = memblock_mem_size(CRASH_ADDR_LOW_MAX >> PAGE_SHIFT);
> > total_low_mem != CRASH_ADDR_LOW_MAX
> I just replace the magic number with macro, no other change.
> Besides, function memblock_mem_size(limit_pfn) will compute the memory size
> according to the actual system ram.
> 

Ok, it is not obvious in patch this is 64bit only, I'm fine with this
then.



[PATCH] Only allow to set crash_kexec_post_notifiers on boot time

2020-09-17 Thread Dave Young
crash_kexec_post_notifiers enables running various panic notifier
before kdump kernel booting. This increases risks of kdump failure.
It is well documented in kernel-parameters.txt. We do not suggest
people to enable it together with kdump unless he/she is really sure.
This is also not suggested to be enabled by default when users are
not aware in distributions.

But unfortunately it is enabled by default in systemd, see below
discussions in a systemd report, we can not convince systemd to change
it:
https://github.com/systemd/systemd/issues/16661

Actually we have got reports about kdump kernel hangs in both s390x
and powerpcle cases caused by the systemd change,  also some x86 cases
could also be caused by the same (although that is in Hyper-V code
instead of systemd, that need to be addressed separately).

Thus to avoid the auto enablement here just disable the param writable
permission in sysfs.

Signed-off-by: Dave Young 
---
 kernel/panic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index aef8872ba843..bea44fc4eb3b 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -695,7 +695,7 @@ core_param(panic, panic_timeout, int, 0644);
 core_param(panic_print, panic_print, ulong, 0644);
 core_param(pause_on_oops, pause_on_oops, int, 0644);
 core_param(panic_on_warn, panic_on_warn, int, 0644);
-core_param(crash_kexec_post_notifiers, crash_kexec_post_notifiers, bool, 0644);
+core_param(crash_kexec_post_notifiers, crash_kexec_post_notifiers, bool, 0444);
 
 static int __init oops_setup(char *s)
 {
-- 
2.26.2



Re: [PATCH v12 3/9] x86: kdump: use macro CRASH_ADDR_LOW_MAX in functions reserve_crashkernel[_low]()

2020-09-17 Thread Dave Young
On 09/07/20 at 09:47pm, Chen Zhou wrote:
> To make the functions reserve_crashkernel[_low]() as generic,
> replace some hard-coded numbers with macro CRASH_ADDR_LOW_MAX.
> 
> Signed-off-by: Chen Zhou 
> ---
>  arch/x86/kernel/setup.c | 11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index d7fd90c52dae..71a6a6e7ca5b 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -430,7 +430,7 @@ static int __init reserve_crashkernel_low(void)
>   unsigned long total_low_mem;
>   int ret;
>  
> - total_low_mem = memblock_mem_size(1UL << (32 - PAGE_SHIFT));
> + total_low_mem = memblock_mem_size(CRASH_ADDR_LOW_MAX >> PAGE_SHIFT);

total_low_mem != CRASH_ADDR_LOW_MAX

>  
>   /* crashkernel=Y,low */
>   ret = parse_crashkernel_low(boot_command_line, total_low_mem, 
> _size, );

The param total_low_mem is for dynamically change crash_size according
to system ram size.

Is above change a must for your arm64 patches?

> @@ -451,7 +451,7 @@ static int __init reserve_crashkernel_low(void)
>   return 0;
>   }
>  
> - low_base = memblock_find_in_range(CRASH_ALIGN, 1ULL << 32, low_size, 
> CRASH_ALIGN);
> + low_base = memblock_find_in_range(CRASH_ALIGN, CRASH_ADDR_LOW_MAX, 
> low_size, CRASH_ALIGN);
>   if (!low_base) {
>   pr_err("Cannot reserve %ldMB crashkernel low memory, please try 
> smaller size.\n",
>  (unsigned long)(low_size >> 20));
> @@ -504,8 +504,9 @@ static void __init reserve_crashkernel(void)
>   if (!crash_base) {
>   /*
>* Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
> -  * crashkernel=x,high reserves memory over 4G, also allocates
> -  * 256M extra low memory for DMA buffers and swiotlb.
> +  * crashkernel=x,high reserves memory over CRASH_ADDR_LOW_MAX,
> +  * also allocates 256M extra low memory for DMA buffers
> +  * and swiotlb.
>* But the extra memory is not required for all machines.
>* So try low memory first and fall back to high memory
>* unless "crashkernel=size[KMG],high" is specified.
> @@ -539,7 +540,7 @@ static void __init reserve_crashkernel(void)
>   return;
>   }
>  
> - if (crash_base >= (1ULL << 32) && reserve_crashkernel_low()) {
> + if (crash_base >= CRASH_ADDR_LOW_MAX && reserve_crashkernel_low()) {
>   memblock_free(crash_base, crash_size);
>   return;
>   }
> -- 
> 2.20.1
> 



Re: [PATCH v12 1/9] x86: kdump: move CRASH_ALIGN to 2M

2020-09-07 Thread Dave Young
Hi,

On 09/07/20 at 09:47pm, Chen Zhou wrote:
> CONFIG_PHYSICAL_ALIGN can be selected from 2M to 16M and default
> value is 2M, so move CRASH_ALIGN to 2M, with smaller value reservation
> can have more chance to succeed.

Seems still some misunderstanding about the change :(  I'm sorry if I
did not explain it clearly.

Previously I missed the PHYSICAL_ALIGN can change according to .config
I mean we should change the value to CONFIG_PHYSICAL_ALIGN for X86
And I suggest to move back to keep using 16M.  And do not change it in
this series.

> And replace the hard-coded alignment with macro CRASH_ALIGN in function
> reserve_crashkernel().
> 
> Suggested-by: Dave Young 
> Signed-off-by: Chen Zhou 
> ---
>  arch/x86/include/asm/kexec.h | 3 +++
>  arch/x86/kernel/setup.c  | 5 +
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> index 6802c59e8252..83f200dd54a1 100644
> --- a/arch/x86/include/asm/kexec.h
> +++ b/arch/x86/include/asm/kexec.h
> @@ -18,6 +18,9 @@
>  
>  # define KEXEC_CONTROL_CODE_MAX_SIZE 2048
>  
> +/* 2M alignment for crash kernel regions */
> +#define CRASH_ALIGN  SZ_2M
> +
>  #ifndef __ASSEMBLY__
>  
>  #include 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 3511736fbc74..296294ad0dd8 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -402,9 +402,6 @@ static void __init 
> memblock_x86_reserve_range_setup_data(void)
>  
>  #ifdef CONFIG_KEXEC_CORE
>  
> -/* 16M alignment for crash kernel regions */
> -#define CRASH_ALIGN  SZ_16M
> -
>  /*
>   * Keep the crash kernel below this limit.
>   *
> @@ -530,7 +527,7 @@ static void __init reserve_crashkernel(void)
>  
>   start = memblock_find_in_range(crash_base,
>  crash_base + crash_size,
> -crash_size, 1 << 20);
> +crash_size, CRASH_ALIGN);
>   if (start != crash_base) {
>   pr_info("crashkernel reservation failed - memory is in 
> use.\n");
>   return;
> -- 
> 2.20.1
> 

Thanks
Dave



Re: [PATCH v11 3/5] arm64: kdump: reimplement crashkernel=X

2020-09-03 Thread Dave Young
On 09/04/20 at 12:02pm, chenzhou wrote:
> 
> 
> On 2020/9/4 11:10, Dave Young wrote:
> > On 09/04/20 at 11:04am, Dave Young wrote:
> >> On 09/03/20 at 07:26pm, chenzhou wrote:
> >>> Hi Catalin,
> >>>
> >>>
> >>> On 2020/9/3 1:09, Catalin Marinas wrote:
> >>>> On Sat, Aug 01, 2020 at 09:08:54PM +0800, Chen Zhou wrote:
> >>>>> There are following issues in arm64 kdump:
> >>>>> 1. We use crashkernel=X to reserve crashkernel below 4G, which
> >>>>> will fail when there is no enough low memory.
> >>>>> 2. If reserving crashkernel above 4G, in this case, crash dump
> >>>>> kernel will boot failure because there is no low memory available
> >>>>> for allocation.
> >>>>> 3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and 
> >>>>> ZONE_DMA32"),
> >>>>> if the memory reserved for crash dump kernel falled in ZONE_DMA32,
> >>>>> the devices in crash dump kernel need to use ZONE_DMA will alloc
> >>>>> fail.
> >>>>>
> >>>>> To solve these issues, change the behavior of crashkernel=X.
> >>>>> crashkernel=X tries low allocation in ZONE_DMA, and fall back to
> >>>>> high allocation if it fails.
> >>>>>
> >>>>> If requized size X is too large and leads to very little free memory
> >>>>> in ZONE_DMA after low allocation, the system may not work normally.
> >>>>> So add a threshold and go for high allocation directly if the required
> >>>>> size is too large. The value of threshold is set as the half of
> >>>>> the low memory.
> >>>>>
> >>>>> If crash_base is outside ZONE_DMA, try to allocate at least 256M in
> >>>>> ZONE_DMA automatically. "crashkernel=Y,low" can be used to allocate
> >>>>> specified size low memory.
> >>>> Except for the threshold to keep zone ZONE_DMA memory,
> >>>> reserve_crashkernel() looks very close to the x86 version. Shall we try
> >>>> to make this generic as well? In the first instance, you could avoid the
> >>>> threshold check if it takes an explicit ",high" option.
> >>> Ok, i will try to do this.
> >>>
> >>> I look into the function reserve_crashkernel() of x86 and found the start 
> >>> address is
> >>> CRASH_ALIGN in function memblock_find_in_range(), which is different with 
> >>> arm64.
> >>>
> >>> I don't figure out why is CRASH_ALIGN in x86, is there any specific 
> >>> reason?
> >> Hmm, took another look at the option CONFIG_PHYSICAL_ALIGN
> >> config PHYSICAL_ALIGN
> >> hex "Alignment value to which kernel should be aligned"
> >> default "0x20"
> >> range 0x2000 0x100 if X86_32
> >> range 0x20 0x100 if X86_64
> >>
> >> According to above, I think the 16M should come from the largest value
> >> But the default value is 2M,  with smaller value reservation can have
> >> more chance to succeed.
> >>
> >> It seems we still need arch specific CRASH_ALIGN, but the initial
> >> version you added the #ifdef for different arches, can you move the
> >> macro to arch specific headers?
> > And just keep the x86 align value as is, I can try to change the x86
> > value later to CONFIG_PHYSICAL_ALIGN, in this way this series can be
> > cleaner.
> Ok. I have no question about the value of macro CRASH_ALIGN,
> instead the lower bound of memblock_find_in_range().
> 
> For x86, in reserve_crashkernel(),restrict the lower bound of the range to 
> CRASH_ALIGN,
> ...
> crash_base = memblock_find_in_range(CRASH_ALIGN,
> CRASH_ADDR_LOW_MAX,
> crash_size, CRASH_ALIGN);
> ...
>
> in reserve_crashkernel_low(),with no this restriction.
> ...
> low_base = memblock_find_in_range(0, 1ULL << 32, low_size, CRASH_ALIGN);
> ...
> 
> How about all making memblock_find_in_range() search from the start of memory?
> If it is ok, i will do like this in the generic version.

I feel starting with CRASH_ALIGN sounds better, can you just search from
CRASH_ALIGN in generic version?

Thanks
Dave



Re: [PATCH v11 3/5] arm64: kdump: reimplement crashkernel=X

2020-09-03 Thread Dave Young
On 09/04/20 at 11:04am, Dave Young wrote:
> On 09/03/20 at 07:26pm, chenzhou wrote:
> > Hi Catalin,
> > 
> > 
> > On 2020/9/3 1:09, Catalin Marinas wrote:
> > > On Sat, Aug 01, 2020 at 09:08:54PM +0800, Chen Zhou wrote:
> > >> There are following issues in arm64 kdump:
> > >> 1. We use crashkernel=X to reserve crashkernel below 4G, which
> > >> will fail when there is no enough low memory.
> > >> 2. If reserving crashkernel above 4G, in this case, crash dump
> > >> kernel will boot failure because there is no low memory available
> > >> for allocation.
> > >> 3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32"),
> > >> if the memory reserved for crash dump kernel falled in ZONE_DMA32,
> > >> the devices in crash dump kernel need to use ZONE_DMA will alloc
> > >> fail.
> > >>
> > >> To solve these issues, change the behavior of crashkernel=X.
> > >> crashkernel=X tries low allocation in ZONE_DMA, and fall back to
> > >> high allocation if it fails.
> > >>
> > >> If requized size X is too large and leads to very little free memory
> > >> in ZONE_DMA after low allocation, the system may not work normally.
> > >> So add a threshold and go for high allocation directly if the required
> > >> size is too large. The value of threshold is set as the half of
> > >> the low memory.
> > >>
> > >> If crash_base is outside ZONE_DMA, try to allocate at least 256M in
> > >> ZONE_DMA automatically. "crashkernel=Y,low" can be used to allocate
> > >> specified size low memory.
> > > Except for the threshold to keep zone ZONE_DMA memory,
> > > reserve_crashkernel() looks very close to the x86 version. Shall we try
> > > to make this generic as well? In the first instance, you could avoid the
> > > threshold check if it takes an explicit ",high" option.
> > Ok, i will try to do this.
> > 
> > I look into the function reserve_crashkernel() of x86 and found the start 
> > address is
> > CRASH_ALIGN in function memblock_find_in_range(), which is different with 
> > arm64.
> > 
> > I don't figure out why is CRASH_ALIGN in x86, is there any specific reason?
> 
> Hmm, took another look at the option CONFIG_PHYSICAL_ALIGN
> config PHYSICAL_ALIGN
> hex "Alignment value to which kernel should be aligned"
> default "0x20"
> range 0x2000 0x100 if X86_32
> range 0x20 0x100 if X86_64
> 
> According to above, I think the 16M should come from the largest value
> But the default value is 2M,  with smaller value reservation can have
> more chance to succeed.
> 
> It seems we still need arch specific CRASH_ALIGN, but the initial
> version you added the #ifdef for different arches, can you move the
> macro to arch specific headers?

And just keep the x86 align value as is, I can try to change the x86
value later to CONFIG_PHYSICAL_ALIGN, in this way this series can be
cleaner.

> 
> Thanks
> Dave



Re: [PATCH v11 3/5] arm64: kdump: reimplement crashkernel=X

2020-09-03 Thread Dave Young
On 09/03/20 at 07:26pm, chenzhou wrote:
> Hi Catalin,
> 
> 
> On 2020/9/3 1:09, Catalin Marinas wrote:
> > On Sat, Aug 01, 2020 at 09:08:54PM +0800, Chen Zhou wrote:
> >> There are following issues in arm64 kdump:
> >> 1. We use crashkernel=X to reserve crashkernel below 4G, which
> >> will fail when there is no enough low memory.
> >> 2. If reserving crashkernel above 4G, in this case, crash dump
> >> kernel will boot failure because there is no low memory available
> >> for allocation.
> >> 3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32"),
> >> if the memory reserved for crash dump kernel falled in ZONE_DMA32,
> >> the devices in crash dump kernel need to use ZONE_DMA will alloc
> >> fail.
> >>
> >> To solve these issues, change the behavior of crashkernel=X.
> >> crashkernel=X tries low allocation in ZONE_DMA, and fall back to
> >> high allocation if it fails.
> >>
> >> If requized size X is too large and leads to very little free memory
> >> in ZONE_DMA after low allocation, the system may not work normally.
> >> So add a threshold and go for high allocation directly if the required
> >> size is too large. The value of threshold is set as the half of
> >> the low memory.
> >>
> >> If crash_base is outside ZONE_DMA, try to allocate at least 256M in
> >> ZONE_DMA automatically. "crashkernel=Y,low" can be used to allocate
> >> specified size low memory.
> > Except for the threshold to keep zone ZONE_DMA memory,
> > reserve_crashkernel() looks very close to the x86 version. Shall we try
> > to make this generic as well? In the first instance, you could avoid the
> > threshold check if it takes an explicit ",high" option.
> Ok, i will try to do this.
> 
> I look into the function reserve_crashkernel() of x86 and found the start 
> address is
> CRASH_ALIGN in function memblock_find_in_range(), which is different with 
> arm64.
> 
> I don't figure out why is CRASH_ALIGN in x86, is there any specific reason?

Hmm, took another look at the option CONFIG_PHYSICAL_ALIGN
config PHYSICAL_ALIGN
hex "Alignment value to which kernel should be aligned"
default "0x20"
range 0x2000 0x100 if X86_32
range 0x20 0x100 if X86_64

According to above, I think the 16M should come from the largest value
But the default value is 2M,  with smaller value reservation can have
more chance to succeed.

It seems we still need arch specific CRASH_ALIGN, but the initial
version you added the #ifdef for different arches, can you move the
macro to arch specific headers?

Thanks
Dave



Re: [PATCH 1/2] kexec: Add quick kexec support for kernel

2020-08-19 Thread Dave Young
On 08/17/20 at 01:14pm, James Morse wrote:
> Hi guys,
> 
> On 14/08/2020 20:22, Pavel Tatashin wrote:
> > On Fri, Aug 14, 2020 at 7:24 AM Dave Young  wrote:
> >> On 08/14/20 at 04:21pm, Sang Yan wrote:
> >>> On 08/14/20 14:58, Dave Young wrote:
> >>>> On 08/14/20 at 01:52am, Sang Yan wrote:
> >>> Yes, it's particularly obvious on arm64. I will add it to the patch log,
> >>> and test how long it takes on x86 and other arch.
> 
> Earlier versions of kexec-tools had the in-purgatory checksum enabled 
> unconditionally.
> More recent versions let you disable it, I think the parameter is called 
> no-checks. This
> saves some time, but the relocations still have to be done.
> 
> 
> >>>> About the arm64 problem, I know Pavel Tatashin is working on a patchset
> >>>> to improve the performance with enabling mmu.
> >>>>
> >>>> I added Pavel in cc, can you try his patches?
> >>>>
> >>> Thanks for your tips, I will try these patches. @Pavel.
> >>> Disable mmu after finishing copying pages?
> 
> >>>>> We introduce quick kexec to save time of copying memory as above,
> >>>>> just like kdump(kexec on crash), by using reserved memory
> >>>>> "Quick Kexec".
> >>>>
> >>>> This approach may have gain, but it also introduce extra requirements to
> >>>> pre-reserve a memory region.  I wonder how Eric thinks about the idea.
> >>>>
> >>>> Anyway the "quick" name sounds not very good, I would suggest do not
> >>>> introduce a new param, and the code can check if pre-reserved region
> >>>> exist then use it, if not then fallback to old way.
> >>>>
> >>> aha. I agree with it, but I thought it may change the old behaviors of
> >>> kexec_load.
> >>>
> >>> I will update a new patch without introducing new flags and new params.
> >>
> >> Frankly I'm still not sure it is worth to introduce a new interface if the
> >> improvement can be done in arch code like Pavel is doing.  Can you try
> >> that first?
> 
> > My patches will fix this issue. This is an ARM64 specific problem and
> > I did not see this to be performance problem on x86 during kexec
> > relocation. This happens because on ARM64 relocation is performed with
> > MMU disabled, and when MMU is disabled the caching is disabled as
> > well.
> 
> > I have a patch series that fixes this entirely, but James Morse
> > (+CCed) and I still have not agreed on the final approach. We had an
> > off-list conversation about it, and we need to continue it in public
> > ML.
> > 
> > Here is some history:
> > 
> > This is the original series that I sent a year ago. It basically
> > proposes the same thing as this series from Sang Yan:
> > https://lore.kernel.org/lkml/20190709182014.16052-1-pasha.tatas...@soleen.com/
> > 
> > Once, I realized that with enabling MMU the relocation is issue is
> > gone completely, I sent a new series, and this is the latest version
> > of that series:
> > https://lore.kernel.org/lkml/20200326032420.27220-1-pasha.tatas...@soleen.com/
> > 
> > It has been tested in production, and several people from different
> > companies commented to me that they are using it as well.
> > 
> > After my patch series was sent out, James created a new branch in his
> > tree with his approach of enabling MMU without having a new VA space,
> > but instead re-use what the kernel  has now. I have not tested that
> > branch yet.
> 
> For context, that is here:
> http://www.linux-arm.org/git?p=linux-jm.git;a=shortlog;h=refs/heads/kexec%2Bmmu/v0
> 
> I think we can maintain this approach, but it doesn't work for Pavel, as he 
> has extra
> requirements. I stopped looking at it because it became a solution no-one 
> needed.
> 
> 
> > Here are some comments from James Morse and the off-list discussion we had:
> > ---
> > It sounds like you are depending on write streaming mode to meet your
> > target performance.
> > This isn't even CPU specific, its cache and firmware configuration specific!
> > I don't think we should optimise a general purpose operating system
> > based on things like this.
> > ..
> > I think the best approach is going to be to eliminate the relocations 
> > entirely.> ...
> > I'm afraid I view this kexec-map thing as high-risk duct-tape over the
> > kexec core code
> > deliberately scattering the kexec payload.
&

Re: [PATCH v11 5/5] kdump: update Documentation about crashkernel

2020-08-19 Thread Dave Young
On 08/18/20 at 03:07pm, chenzhou wrote:
> 
> 
> On 2020/8/10 14:03, Dave Young wrote:
> > Hi,
> >
> >>> Previously I remember we talked about to use similar logic as X86, but I
> >>> remember you mentioned on some arm64 platform there could be no low
> >>> memory at all.  Is this not a problem now for the fallback?  Just be
> >>> curious, thanks for the update, for the common part looks good.
> >> Hi Dave,
> >>
> >> Did you mean this discuss: https://lkml.org/lkml/2019/12/27/122?
> > I meant about this reply instead :)
> > https://lkml.org/lkml/2020/1/16/616
> Hi Dave,
> 
> Sorry for not repley in time, I was on holiday last week.

Hi, no problem, thanks for following up.

> 
> The platform James mentioned may exist for which have no devices and need no 
> low memory.
> For our arm64 server platform, there are some devices and need low memory.
> 
> I got it. For the platform with no low memory, reserving crashkernel will  
> always fail.
> How about like this:

I think the question should leave to Catalin or James, I have no
suggestion about this:)

> 
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index a8e34d97a894..4df18c7ea438 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -147,7 +147,7 @@ static void __init reserve_crashkernel(void)
> }
> memblock_reserve(crash_base, crash_size);
>  
> -   if (crash_base >= CRASH_ADDR_LOW_MAX) {
> +   if (memstart_addr < CRASH_ADDR_LOW_MAX && crash_base >= 
> CRASH_ADDR_LOW_MAX) {
> const char *rename = "Crash kernel (low)";
>  
> if (reserve_crashkernel_low()) {
> 
> 
> Thanks,
> Chen Zhou
> 
> >
> > Thanks
> > Dave
> >
> >
> > .
> >
> 
> 



Re: [PATCH 1/2] kexec: Add quick kexec support for kernel

2020-08-14 Thread Dave Young
Hi,

On 08/14/20 at 04:21pm, Sang Yan wrote:
> 
> 
> On 08/14/20 14:58, Dave Young wrote:
> > On 08/14/20 at 01:52am, Sang Yan wrote:
> >> In normal kexec, relocating kernel may cost 5 ~ 10 seconds, to
> >> copy all segments from vmalloced memory to kernel boot memory,
> >> because of disabled mmu.
> > 
> > It is not the case on all archs, I assume your case is arm64, please
> > describe it in patch log :)
> > 
> Yes, it's particularly obvious on arm64. I will add it to the patch log,
> and test how long it takes on x86 and other arch.
> 
> > About the arm64 problem, I know Pavel Tatashin is working on a patchset
> > to improve the performance with enabling mmu.
> > 
> > I added Pavel in cc, can you try his patches?
> > 
> Thanks for your tips, I will try these patches. @Pavel.
> Disable mmu after finishing copying pages?
> >>
> >> We introduce quick kexec to save time of copying memory as above,
> >> just like kdump(kexec on crash), by using reserved memory
> >> "Quick Kexec".
> > 
> > This approach may have gain, but it also introduce extra requirements to
> > pre-reserve a memory region.  I wonder how Eric thinks about the idea.
> > 
> > Anyway the "quick" name sounds not very good, I would suggest do not
> > introduce a new param, and the code can check if pre-reserved region
> > exist then use it, if not then fallback to old way.
> > 
> aha. I agree with it, but I thought it may change the old behaviors of
> kexec_load.
> 
> I will update a new patch without introducing new flags and new params.

Frankly I'm still not sure it is worth to introduce a new interface if the
improvement can be done in arch code like Pavel is doing.  Can you try
that first?

Thanks
Dave



Re: [PATCH 1/2] kexec: Add quick kexec support for kernel

2020-08-14 Thread Dave Young
On 08/14/20 at 01:52am, Sang Yan wrote:
> In normal kexec, relocating kernel may cost 5 ~ 10 seconds, to
> copy all segments from vmalloced memory to kernel boot memory,
> because of disabled mmu.

It is not the case on all archs, I assume your case is arm64, please
describe it in patch log :)

About the arm64 problem, I know Pavel Tatashin is working on a patchset
to improve the performance with enabling mmu.

I added Pavel in cc, can you try his patches?

> 
> We introduce quick kexec to save time of copying memory as above,
> just like kdump(kexec on crash), by using reserved memory
> "Quick Kexec".

This approach may have gain, but it also introduce extra requirements to
pre-reserve a memory region.  I wonder how Eric thinks about the idea.

Anyway the "quick" name sounds not very good, I would suggest do not
introduce a new param, and the code can check if pre-reserved region
exist then use it, if not then fallback to old way.

> 
> Constructing quick kimage as the same as crash kernel,
> then simply copy all segments of kimage to reserved memroy.
> 
> We also add this support in syscall kexec_load using flags
> of KEXEC_QUICK.
> 
> Signed-off-by: Sang Yan 
> ---
>  arch/Kconfig   | 10 ++
>  include/linux/ioport.h |  3 +++
>  include/linux/kexec.h  | 13 +++-
>  include/uapi/linux/kexec.h |  3 +++
>  kernel/kexec.c | 10 ++
>  kernel/kexec_core.c| 41 +-
>  6 files changed, 70 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 3329fa143637..eca782cb8e29 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -21,6 +21,16 @@ config KEXEC_CORE
>  config KEXEC_ELF
>   bool
>  
> +config QUICK_KEXEC
> + bool "Support for quick kexec"
> + depends on KEXEC_CORE
> + help
> +   Say y here to enable this feature.
> +   It use reserved memory to accelerate kexec, just like crash
> +   kexec, load new kernel and initrd to reserved memory, and
> +   boot new kernel on that memory. It will save the time of
> +   relocating kernel.
> +
>  config HAVE_IMA_KEXEC
>   bool
>  
> diff --git a/include/linux/ioport.h b/include/linux/ioport.h
> index 6c2b06fe8beb..f37c632accbe 100644
> --- a/include/linux/ioport.h
> +++ b/include/linux/ioport.h
> @@ -136,6 +136,9 @@ enum {
>   IORES_DESC_DEVICE_PRIVATE_MEMORY= 6,
>   IORES_DESC_RESERVED = 7,
>   IORES_DESC_SOFT_RESERVED= 8,
> +#ifdef CONFIG_QUICK_KEXEC
> + IORES_DESC_QUICK_KEXEC  = 9,
> +#endif
>  };
>  
>  /*
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 9e93bef52968..976bf9631070 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -269,9 +269,12 @@ struct kimage {
>   unsigned long control_page;
>  
>   /* Flags to indicate special processing */
> - unsigned int type : 1;
> + unsigned int type : 2;
>  #define KEXEC_TYPE_DEFAULT 0
>  #define KEXEC_TYPE_CRASH   1
> +#ifdef CONFIG_QUICK_KEXEC
> +#define KEXEC_TYPE_QUICK   2
> +#endif
>   unsigned int preserve_context : 1;
>   /* If set, we are using file mode kexec syscall */
>   unsigned int file_mode:1;
> @@ -331,6 +334,11 @@ extern int kexec_load_disabled;
>  #define KEXEC_FLAGS(KEXEC_ON_CRASH | KEXEC_PRESERVE_CONTEXT)
>  #endif
>  
> +#ifdef CONFIG_QUICK_KEXEC
> +#undef KEXEC_FLAGS
> +#define KEXEC_FLAGS(KEXEC_ON_CRASH | KEXEC_QUICK)
> +#endif
> +
>  /* List of defined/legal kexec file flags */
>  #define KEXEC_FILE_FLAGS (KEXEC_FILE_UNLOAD | KEXEC_FILE_ON_CRASH | \
>KEXEC_FILE_NO_INITRAMFS)
> @@ -340,6 +348,9 @@ extern int kexec_load_disabled;
>  extern struct resource crashk_res;
>  extern struct resource crashk_low_res;
>  extern note_buf_t __percpu *crash_notes;
> +#ifdef CONFIG_QUICK_KEXEC
> +extern struct resource quick_kexec_res;
> +#endif
>  
>  /* flag to track if kexec reboot is in progress */
>  extern bool kexec_in_progress;
> diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h
> index 05669c87a0af..e3213614b713 100644
> --- a/include/uapi/linux/kexec.h
> +++ b/include/uapi/linux/kexec.h
> @@ -12,6 +12,9 @@
>  /* kexec flags for different usage scenarios */
>  #define KEXEC_ON_CRASH   0x0001
>  #define KEXEC_PRESERVE_CONTEXT   0x0002
> +#ifdef CONFIG_QUICK_KEXEC
> +#define KEXEC_QUICK  0x0004
> +#endif
>  #define KEXEC_ARCH_MASK  0x
>  
>  /*
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index f977786fe498..428af4cd3e1a 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -44,6 +44,9 @@ static int kimage_alloc_init(struct kimage **rimage, 
> unsigned long entry,
>   int ret;
>   struct kimage *image;
>   bool kexec_on_panic = flags & KEXEC_ON_CRASH;
> +#ifdef CONFIG_QUICK_KEXEC
> + bool kexec_on_quick = flags & KEXEC_QUICK;
> +#endif
>  
>   if 

Re: [RFC PATCH] printk: Change timestamp to triplet as mono, boot and real

2020-08-14 Thread Dave Young
On 08/11/20 at 12:40pm, Orson Zhai wrote:
> From: Thomas Gleixner 
> 
> Timestamps printed in kernel log are retrieved by local_clock which reads
> jiffies as a referrence clock source.
> But it is diffcult to be synchronized with logs generated out of kernel,
> say some remote processors (Modem) in the Soc of mobile phones. 
> Jiffies will be unchanged when system is in suspend mode but Modem will
> not.
> 
> So timestamps are required to be compensated with suspend time in some
> complecated scenarios especially for Android system. As an initial
> implementation was rejected by Linus, Thomas made this patch [1] 2 yeas
> ago to replace ts_nsec with a struct ts that consists 3 types of time:
> mono, boot and real.
> 
> This is an updated version which comes from patch [1] written by Thomas
> and suggestion [2] about VMCORE_INFO given by Linus.
> 
> It provides the prerequisite code to print kernel logs with boot or real
> timestamp in the future.
> 
> [1] https://lore.kernel.org/lkml/alpine.DEB.2.20.1711142341130.2221@nanos/
> [2] 
> https://lore.kernel.org/lkml/ca+55afz-vvpbow0rvdw4en5yhwauzcmvsyzrqec41qdfodi...@mail.gmail.com/
> 
> Cc. Thomas Gleixner 
> Cc. Linus Torvalds 
> Cc. Prarit Bhargava 
> Cc. Petr Mladek 
> Signed-off-by: Orson Zhai 
> Tested-by: Cixi Geng 
> ---
> This patch has been tested in qemu-x86-system. One problem is the timestamp
> in kernel log will be printed [0.00] for longer time than before. 
> 
> 1 [0.00] microcode: microcode updated early to revision 0x28, date = 
> 2019-11-12 
> 2 [0.00] Linux version 5.8.0+ (root@spreadtrum) (gcc (Ubuntu 
> 7.5.0-3ubuntu1~18.04) 7.5.0, GNU ld (GNU Binutils for Ubuntu) 2.30) #2 SMP 
> Fri Aug 7 21:30:51 CST 2020
> 3 [0.00] Command line: BOOT_IMAGE=/vmlinuz-5.8.0+ 
> root=UUID=4d889bca-be18-433d-9b86-c1c1714cc293 ro quiet splash vt.handoff=1
> 4 [0.00] KERNEL supported cpus: 
> 5 [0.00]   Intel GenuineIntel
> 
> 223 [0.00]  Tracing variant of Tasks RCU enabled. 
> 224 [0.00] rcu: RCU calculated value of scheduler-enlistment delay is 
> 25 jiffies. 
> 225 [0.00] rcu: Adjusting geometry for rcu_fanout_leaf=16, 
> nr_cpu_ids=4 
> 226 [0.00] NR_IRQS: 524544, nr_irqs: 456, preallocated irqs: 16 
> 227 [0.059662] random: crng done (trusting CPU's manufacturer) 
>  
> 228 [0.059662] Console: colour dummy device 80x25 
> 229 [0.059662] printk: console [tty0] enabled 
> 230 [0.059662] ACPI: Core revision 20200528 
> 
> compared to old log:
> 
> 69 [0.00] reserve setup_data: [mem 
> 0x0001-0x00021fdf] usable
> 70 [0.00] efi: EFI v2.31 by American Megatrends
> 71 [0.00] efi: ESRT=0xdbf69818 ACPI=0xdaef8000 ACPI 2.0=0xdaef8000 
> SMBIOS=0xf0480
> 72 [0.00] SMBIOS 2.8 present.
> 73 [0.00] DMI: LENOVO ThinkCentre M4500t-N000/, BIOS FCKT58AUS 
> 09/17/2014
> 74 [0.00] tsc: Fast TSC calibration using PIT
> 75 [0.00] tsc: Detected 3292.477 MHz processor
> 76 [0.000503] e820: update [mem 0x-0x0fff] usable ==> reserved
> 
> 77 [0.000504] e820: remove [mem 0x000a-0x000f] usable
> 78 [0.000510] last_pfn = 0x21fe00 max_arch_pfn = 0x4
> 79 [0.000513] MTRR default type: uncachable
>  
>  include/linux/crash_core.h  |  6 --
>  include/linux/timekeeping.h | 15 +++
>  kernel/printk/printk.c  | 34 +++---
>  kernel/time/timekeeping.c   | 36 +---
>  4 files changed, 63 insertions(+), 28 deletions(-)
> 
> diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
> index 525510a..85b96cd 100644
> --- a/include/linux/crash_core.h
> +++ b/include/linux/crash_core.h
> @@ -50,9 +50,11 @@ phys_addr_t paddr_vmcoreinfo_note(void);
>  #define VMCOREINFO_STRUCT_SIZE(name) \
>   vmcoreinfo_append_str("SIZE(%s)=%lu\n", #name, \
> (unsigned long)sizeof(struct name))
> -#define VMCOREINFO_OFFSET(name, field) \
> +#define VMCOREINFO_FIELD_OFFSET(name, field, offset) \
>   vmcoreinfo_append_str("OFFSET(%s.%s)=%lu\n", #name, #field, \
> -   (unsigned long)offsetof(struct name, field))
> +   (unsigned long)offsetof(struct name, offset))
> +#define VMCOREINFO_OFFSET(name, field) \
> + VMCOREINFO_FIELD_OFFSET(name, field, field)

We do not regard the VMCOREINFO* as a strict ABI,  makedumpfile and
crash tool can update to check the new field first then fallback to the old 
name.

These crash dump tools are not usual userspace applications, although we
try not to break them, but if we have to and a tool update can make it
work again then I suggest we go with the change instead of doing the
trick.

Thanks
Dave



Re: [PATCH] kexec: Delete an unnecessary comparison

2020-08-13 Thread Dave Young
On 08/13/20 at 08:45pm, Youling Tang wrote:
> Regardless of whether the ret value is zero or non-zero, the trajectory
> of the program execution is the same, so there is no need to compare.
> 
> Signed-off-by: Youling Tang 
> ---
>  kernel/kexec_file.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index 78c0837..3ad0ae2 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -800,8 +800,6 @@ static int kexec_calculate_store_digests(struct kimage 
> *image)
>  
>   ret = kexec_purgatory_get_set_symbol(image, 
> "purgatory_sha256_digest",
>digest, 
> SHA256_DIGEST_SIZE, 0);
> - if (ret)
> - goto out_free_digest;
>   }
>  
>  out_free_digest:
> -- 
> 2.1.0
> 

Acked-by: Dave Young 

Thanks
Dave



Re: [PATCH v11 5/5] kdump: update Documentation about crashkernel

2020-08-10 Thread Dave Young
Hi,

> > Previously I remember we talked about to use similar logic as X86, but I
> > remember you mentioned on some arm64 platform there could be no low
> > memory at all.  Is this not a problem now for the fallback?  Just be
> > curious, thanks for the update, for the common part looks good.
> Hi Dave,
> 
> Did you mean this discuss: https://lkml.org/lkml/2019/12/27/122?

I meant about this reply instead :)
https://lkml.org/lkml/2020/1/16/616

Thanks
Dave



Re: [PATCH v11 5/5] kdump: update Documentation about crashkernel

2020-08-09 Thread Dave Young
On 08/10/20 at 11:28am, chenzhou wrote:
> On 2020/8/8 18:02, Dave Young wrote:
> > On 08/01/20 at 09:08pm, Chen Zhou wrote:
> >> Now the behavior of crashkernel=X has been changed, which tries low
> >> allocation in ZONE_DMA, and fall back to high allocation if it fails.
> >>
> >> If requized size X is too large and leads to very little free memory
> >> in ZONE_DMA after low allocation, the system may not work well.
> >> So add a threshold and go for high allocation directly if the required
> >> size is too large. The threshold is set as the half of low memory.
> >>
> >> If crash_base is outside ZONE_DMA, try to allocate at least 256M in
> >> ZONE_DMA automatically. "crashkernel=Y,low" can be used to allocate
> >> specified size low memory. For non-RPi4 platforms, change ZONE_DMA
> >> memtioned above to ZONE_DMA32.
> >>
> >> So update the Documentation.
> >>
> >> Signed-off-by: Chen Zhou 
> >> ---
> >>  Documentation/admin-guide/kdump/kdump.rst | 21 ---
> >>  .../admin-guide/kernel-parameters.txt | 11 --
> >>  2 files changed, 27 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/Documentation/admin-guide/kdump/kdump.rst 
> >> b/Documentation/admin-guide/kdump/kdump.rst
> >> index 2da65fef2a1c..4b58f97351d5 100644
> >> --- a/Documentation/admin-guide/kdump/kdump.rst
> >> +++ b/Documentation/admin-guide/kdump/kdump.rst
> >> @@ -299,7 +299,15 @@ Boot into System Kernel
> >> "crashkernel=64M@16M" tells the system kernel to reserve 64 MB of 
> >> memory
> >> starting at physical address 0x0100 (16MB) for the dump-capture 
> >> kernel.
> >>  
> >> -   On x86 and x86_64, use "crashkernel=64M@16M".
> >> +   On x86 use "crashkernel=64M@16M".
> >> +
> >> +   On x86_64, use "crashkernel=X" to select a region under 4G first, and
> >> +   fall back to reserve region above 4G.
> >> +   We can also use "crashkernel=X,high" to select a region above 4G, which
> >> +   also tries to allocate at least 256M below 4G automatically and
> >> +   "crashkernel=Y,low" can be used to allocate specified size low memory.
> >> +   Use "crashkernel=Y@X" if you really have to reserve memory from 
> >> specified
> >> +   start address X.
> >>  
> >> On ppc64, use "crashkernel=128M@32M".
> >>  
> >> @@ -316,8 +324,15 @@ Boot into System Kernel
> >> kernel will automatically locate the crash kernel image within the
> >> first 512MB of RAM if X is not given.
> >>  
> >> -   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
> >> -   the kernel, X if explicitly specified, must be aligned to 2MiB 
> >> (0x20).
> >> +   On arm64, use "crashkernel=X" to try low allocation in ZONE_DMA, and
> >> +   fall back to high allocation if it fails. And go for high allocation
> >> +   directly if the required size is too large. If crash_base is outside
> >> +   ZONE_DMA, try to allocate at least 256M in ZONE_DMA automatically.
> >> +   "crashkernel=Y,low" can be used to allocate specified size low memory.
> >> +   For non-RPi4 platforms, change ZONE_DMA memtioned above to ZONE_DMA32.
> >> +   Use "crashkernel=Y@X" if you really have to reserve memory from
> >> +   specified start address X. Note that the start address of the kernel,
> >> +   X if explicitly specified, must be aligned to 2MiB (0x20).
> >>  
> >>  Load the Dump-capture Kernel
> >>  
> >> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> >> b/Documentation/admin-guide/kernel-parameters.txt
> >> index fb95fad81c79..d1b6016850d6 100644
> >> --- a/Documentation/admin-guide/kernel-parameters.txt
> >> +++ b/Documentation/admin-guide/kernel-parameters.txt
> >> @@ -722,6 +722,10 @@
> >>[KNL, x86_64] select a region under 4G first, and
> >>fall back to reserve region above 4G when '@offset'
> >>hasn't been specified.
> >> +  [KNL, arm64] Try low allocation in ZONE_DMA, fall back
> >> +  to high allocation if it fails when '@offset' hasn't 
> >> been
> >> +  specified. For non-RPi4 platforms, change ZONE_DMA to
> &g

Re: [PATCH v11 5/5] kdump: update Documentation about crashkernel

2020-08-08 Thread Dave Young
>   It will be ignored when crashkernel=X,high is not used
>   or memory reserved is below 4G.
> -
> + [KNL, arm64] range under 4G.
> + This one let user to specify a low range in ZONE_DMA for
> + crash dump kernel. For non-RPi4 platforms, change 
> ZONE_DMA
> +     to ZONE_DMA32.
>   cryptomgr.notests
>   [KNL] Disable crypto self-tests
>  
> -- 
> 2.20.1
> 

Hi Chen,

Previously I remember we talked about to use similar logic as X86, but I
remember you mentioned on some arm64 platform there could be no low
memory at all.  Is this not a problem now for the fallback?  Just be
curious, thanks for the update, for the common part looks good.

Acked-by: Dave Young 

Thanks
Dave



Re: [PATCH v11 2/5] x86: kdump: move reserve_crashkernel_low() into crash_core.c

2020-08-08 Thread Dave Young
from lib/swiotlb.c:
> +  * -swiotlb size: user-specified with swiotlb= or default.
> +  *
> +  * -swiotlb overflow buffer: now hardcoded to 32k. We round it
> +  * to 8M for other buffers that may need to stay low too. Also
> +  * make sure we allocate enough extra low memory so that we
> +  * don't run out of DMA buffers for 32-bit devices.
> +  */
> + low_size = max(swiotlb_size_or_default() + (8UL << 20),
> + 256UL << 20);
> + } else {
> + /* passed with crashkernel=0,low ? */
> + if (!low_size)
> + return 0;
> + }
> +
> + low_base = memblock_find_in_range(0, CRASH_ADDR_LOW_MAX, low_size, 
> CRASH_ALIGN);
> + if (!low_base) {
> + pr_err("Cannot reserve %ldMB crashkernel low memory, please try 
> smaller size.\n",
> +(unsigned long)(low_size >> 20));
> + return -ENOMEM;
> + }
> +
> + ret = memblock_reserve(low_base, low_size);
> + if (ret) {
> + pr_err("%s: Error reserving crashkernel low memblock.\n",
> + __func__);
> + return ret;
> + }
> +
> + pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (System 
> low RAM: %ldMB)\n",
> + (unsigned long)(low_size >> 20),
> + (unsigned long)(low_base >> 20),
> + (unsigned long)(total_low_mem >> 20));
> +
> + crashk_low_res.start = low_base;
> + crashk_low_res.end   = low_base + low_size - 1;
> +#endif
> + return 0;
> +}
> +
>  Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
> void *data, size_t data_len)
>  {
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index c19c0dad1ebe..db66bbabfff3 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -53,23 +53,6 @@ note_buf_t __percpu *crash_notes;
>  /* Flag to indicate we are going to kexec a new kernel */
>  bool kexec_in_progress = false;
>  
> -
> -/* Location of the reserved area for the crash kernel */
> -struct resource crashk_res = {
> - .name  = "Crash kernel",
> - .start = 0,
> - .end   = 0,
> - .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> - .desc  = IORES_DESC_CRASH_KERNEL
> -};
> -struct resource crashk_low_res = {
> - .name  = "Crash kernel",
> - .start = 0,
> - .end   = 0,
> - .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> - .desc  = IORES_DESC_CRASH_KERNEL
> -};
> -
>  int kexec_should_crash(struct task_struct *p)
>  {
>   /*
> -- 
> 2.20.1
> 

Acked-by: Dave Young 

Thanks
Dave



Re: [PATCHv2] x86/purgatory: don't generate debug info for purgatory.ro

2020-08-06 Thread Dave Young
On 08/03/20 at 01:49pm, Pingfan Liu wrote:
> Purgatory.ro is a standalone binary that is not linked against the rest of
> the kernel.  Its image is copied into an array that is linked to the
> kernel, and from there kexec relocates it wherever it desires.
> 
> Unlike the debug info for vmlinux, which can be used for analyzing crash
> such info is useless in purgatory.ro. And discarding them can save about
> 200K space.
> 
> Original:
>   259080  kexec-purgatory.o
> Stripped debug info:
>29152  kexec-purgatory.o
> 
> Signed-off-by: Pingfan Liu 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: "H. Peter Anvin" 
> Cc: Hans de Goede 
> Cc: Nick Desaulniers 
> Cc: Arvind Sankar 
> Cc: Steve Wahl 
> Cc: linux-kernel@vger.kernel.org
> Cc: ke...@lists.infradead.org
> To: x...@kernel.org
> ---
>  arch/x86/purgatory/Makefile | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/purgatory/Makefile b/arch/x86/purgatory/Makefile
> index 088bd76..d24b43a 100644
> --- a/arch/x86/purgatory/Makefile
> +++ b/arch/x86/purgatory/Makefile
> @@ -32,7 +32,7 @@ KCOV_INSTRUMENT := n
>  # make up the standalone purgatory.ro
>  
>  PURGATORY_CFLAGS_REMOVE := -mcmodel=kernel
> -PURGATORY_CFLAGS := -mcmodel=large -ffreestanding 
> -fno-zero-initialized-in-bss
> +PURGATORY_CFLAGS := -mcmodel=large -ffreestanding 
> -fno-zero-initialized-in-bss -g0
>  PURGATORY_CFLAGS += $(DISABLE_STACKLEAK_PLUGIN) -DDISABLE_BRANCH_PROFILING
>  PURGATORY_CFLAGS += $(call cc-option,-fno-stack-protector)
>  
> @@ -64,6 +64,9 @@ CFLAGS_sha256.o += $(PURGATORY_CFLAGS)
>  CFLAGS_REMOVE_string.o   += $(PURGATORY_CFLAGS_REMOVE)
>  CFLAGS_string.o  += $(PURGATORY_CFLAGS)
>  
> +AFLAGS_REMOVE_setup-x86_$(BITS).o+= -Wa,-gdwarf-2
> +AFLAGS_REMOVE_entry64.o  += -Wa,-gdwarf-2
> +
>  $(obj)/purgatory.ro: $(PURGATORY_OBJS) FORCE
>   $(call if_changed,ld)
>  
> -- 
> 2.7.5
> 
> 
> ___
> kexec mailing list
> ke...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
> 

Hi Pingfan,

Looks good, thanks for the patch.

Acked-by: Dave Young 

Thanks
Dave



Re: [PATCH v11 0/5] support reserving crashkernel above 4G on arm64 kdump

2020-08-06 Thread Dave Young
Hi Chen,

Thanks for the update. I was busy on other things, I will review your 
x86/common changes
this weekend or early next week.

On 08/01/20 at 09:08pm, Chen Zhou wrote:
> There are following issues in arm64 kdump:
> 1. We use crashkernel=X to reserve crashkernel below 4G, which
> will fail when there is no enough low memory.
> 2. If reserving crashkernel above 4G, in this case, crash dump
> kernel will boot failure because there is no low memory available
> for allocation.
> 3. Since commit 1a8e1cef7603 ("arm64: use both ZONE_DMA and ZONE_DMA32"),
> if the memory reserved for crash dump kernel falled in ZONE_DMA32,
> the devices in crash dump kernel need to use ZONE_DMA will alloc
> fail.
> 
> To solve these issues, change the behavior of crashkernel=X.
> crashkernel=X tries low allocation in ZONE_DMA, and fall back to
> high allocation if it fails.
> 
> If requized size X is too large and leads to very little free memory
> in ZONE_DMA after low allocation, the system may not work normally.
> So add a threshold and go for high allocation directly if the required
> size is too large. The value of threshold is set as the half of
> the low memory.
> 
> If crash_base is outside ZONE_DMA, try to allocate at least 256M in
> ZONE_DMA automatically. "crashkernel=Y,low" can be used to allocate
> specified size low memory.
> For non-RPi4 platforms, change ZONE_DMA memtioned above to ZONE_DMA32.
> 
> When reserving crashkernel in high memory, some low memory is reserved
> for crash dump kernel devices. So there may be two regions reserved for
> crash dump kernel, one is below 4G, the other is above 4G.
> In order to distinct from the high region and make no effect to the use
> of existing kexec-tools, rename the low region as "Crash kernel (low)",
> and pass the low region by reusing DT property
> "linux,usable-memory-range". We made the low memory region as the last
> range of "linux,usable-memory-range" to keep compatibility with existing
> user-space and older kdump kernels.
> 
> Besides, we need to modify kexec-tools:
> arm64: support more than one crash kernel regions(see [1])
> 
> Another update is document about DT property 'linux,usable-memory-range':
> schemas: update 'linux,usable-memory-range' node schema(see [2])
> 
> Changes since [v10]
> - Reimplement crashkernel=X suggested by Catalin, Many thanks to Catalin.
> 
> Changes since [v9]
> - Patch 1 add Acked-by from Dave.
> - Update patch 5 according to Dave's comments.
> - Update chosen schema.
> 
> Changes since [v8]
> - Reuse DT property "linux,usable-memory-range".
> Suggested by Rob, reuse DT property "linux,usable-memory-range" to pass the 
> low
> memory region.
> - Fix kdump broken with ZONE_DMA reintroduced.
> - Update chosen schema.
> 
> Changes since [v7]
> - Move x86 CRASH_ALIGN to 2M
> Suggested by Dave and do some test, move x86 CRASH_ALIGN to 2M.
> - Update Documentation/devicetree/bindings/chosen.txt.
> Add corresponding documentation to 
> Documentation/devicetree/bindings/chosen.txt
> suggested by Arnd.
> - Add Tested-by from Jhon and pk.
> 
> Changes since [v6]
> - Fix build errors reported by kbuild test robot.
> 
> Changes since [v5]
> - Move reserve_crashkernel_low() into kernel/crash_core.c.
> - Delete crashkernel=X,high.
> - Modify crashkernel=X,low.
> If crashkernel=X,low is specified simultaneously, reserve spcified size low
> memory for crash kdump kernel devices firstly and then reserve memory above 
> 4G.
> In addition, rename crashk_low_res as "Crash kernel (low)" for arm64, and then
> pass to crash dump kernel by DT property "linux,low-memory-range".
> - Update Documentation/admin-guide/kdump/kdump.rst.
> 
> Changes since [v4]
> - Reimplement memblock_cap_memory_ranges for multiple ranges by Mike.
> 
> Changes since [v3]
> - Add memblock_cap_memory_ranges back for multiple ranges.
> - Fix some compiling warnings.
> 
> Changes since [v2]
> - Split patch "arm64: kdump: support reserving crashkernel above 4G" as
> two. Put "move reserve_crashkernel_low() into kexec_core.c" in a separate
> patch.
> 
> Changes since [v1]:
> - Move common reserve_crashkernel_low() code into kernel/kexec_core.c.
> - Remove memblock_cap_memory_ranges() i added in v1 and implement that
> in fdt_enforce_memory_region().
> There are at most two crash kernel regions, for two crash kernel regions
> case, we cap the memory range [min(regs[*].start), max(regs[*].end)]
> and then remove the memory range in the middle.
> 
> [1]: http://lists.infradead.org/pipermail/kexec/2020-June/020737.html
> [2]: https://github.com/robherring/dt-schema/pull/19 
> [v1]: https://lkml.org/lkml/2019/4/2/1174
> [v2]: https://lkml.org/lkml/2019/4/9/86
> [v3]: https://lkml.org/lkml/2019/4/9/306
> [v4]: https://lkml.org/lkml/2019/4/15/273
> [v5]: https://lkml.org/lkml/2019/5/6/1360
> [v6]: https://lkml.org/lkml/2019/8/30/142
> [v7]: https://lkml.org/lkml/2019/12/23/411
> [v8]: https://lkml.org/lkml/2020/5/21/213
> [v9]: https://lkml.org/lkml/2020/6/28/73
> [v10]: 

Re: [PATCH 0/3] x86/kexec_file: Fix some corners bugs and improve the crash_exclude_mem_range()

2020-08-05 Thread Dave Young
Hi Lianbo,

Added Andrew in cc.
On 08/04/20 at 12:49pm, Lianbo Jiang wrote:
> This series includes the following patches, it fixes some corners bugs
> and improves the crash_exclude_mem_range().
> 
> [1] [PATCH 1/3] x86/crash: Correct the address boundary of function
> parameters
> [2] [PATCH 2/3] kexec: Improve the crash_exclude_mem_range() to handle
> the overlapping ranges
> [3] [PATCH 3/3] kexec_file: correctly output debugging information for
> the PT_LOAD elf header
> 
> Lianbo Jiang (3):
>   x86/crash: Correct the address boundary of function parameters
>   kexec: Improve the crash_exclude_mem_range() to handle the overlapping
> ranges
>   kexec_file: correctly output debugging information for the PT_LOAD elf
> header
> 
>  arch/x86/kernel/crash.c |  2 +-
>  kernel/kexec_file.c | 33 ++---
>  2 files changed, 23 insertions(+), 12 deletions(-)
> 
> -- 
> 2.17.1

Looks good, thanks for the patches

Acked-by: Dave Young 

Thanks
Dave



Re: [PATCH] Replace HTTP links with HTTPS ones: kdump

2020-07-07 Thread Dave Young
On 07/08/20 at 05:20am, Alexander A. Klimov wrote:
> 
> 
> Am 08.07.20 um 05:17 schrieb Dave Young:
> > On 07/01/20 at 07:33pm, Alexander A. Klimov wrote:
> > > 
> > > 
> > > Am 01.07.20 um 09:58 schrieb Dave Young:
> > > > On 06/27/20 at 12:31pm, Alexander A. Klimov wrote:
> > > > > Rationale:
> > > > > Reduces attack surface on kernel devs opening the links for MITM
> > > > > as HTTPS traffic is much harder to manipulate.
> > > > > 
> > > > > Deterministic algorithm:
> > > > > For each file:
> > > > > If not .svg:
> > > > >   For each line:
> > > > > If doesn't contain `\bxmlns\b`:
> > > > >   For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
> > > > > If both the HTTP and HTTPS versions
> > > > > return 200 OK and serve the same content:
> > > > >   Replace HTTP with HTTPS.
> > > > > 
> > > > > Signed-off-by: Alexander A. Klimov 
> > > > > ---
> > > > >If there are any URLs to be removed completely or at least not 
> > > > > HTTPSified:
> > > > >Just clearly say so and I'll *undo my change*.
> > > > >See also https://lkml.org/lkml/2020/6/27/64
> > > > > 
> > > > >If there are any valid, but yet not changed URLs:
> > > > >See https://lkml.org/lkml/2020/6/26/837
> > > > > 
> > > > >Documentation/admin-guide/kdump/kdump.rst | 10 +-
> > > > >1 file changed, 5 insertions(+), 5 deletions(-)
> > > > > 
> > > > > diff --git a/Documentation/admin-guide/kdump/kdump.rst 
> > > > > b/Documentation/admin-guide/kdump/kdump.rst
> > > > > index 2da65fef2a1c..8cfa35f777f5 100644
> > > > > --- a/Documentation/admin-guide/kdump/kdump.rst
> > > > > +++ b/Documentation/admin-guide/kdump/kdump.rst
> > > > > @@ -65,20 +65,20 @@ Install kexec-tools
> > > > >2) Download the kexec-tools user-space package from the following 
> > > > > URL:
> > > > > -http://kernel.org/pub/linux/utils/kernel/kexec/kexec-tools.tar.gz
> > > > > +https://kernel.org/pub/linux/utils/kernel/kexec/kexec-tools.tar.gz
> > > > >This is a symlink to the latest version.
> > > > >The latest kexec-tools git tree is available at:
> > > > >- git://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
> > > > > -- http://www.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
> > > > > +- https://www.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
> > > > >There is also a gitweb interface available at
> > > > > -http://www.kernel.org/git/?p=utils/kernel/kexec/kexec-tools.git
> > > > > +https://www.kernel.org/git/?p=utils/kernel/kexec/kexec-tools.git
> > > > >More information about kexec-tools can be found at
> > > > > -http://horms.net/projects/kexec/
> > > > > +https://horms.net/projects/kexec/
> > > > >3) Unpack the tarball with the tar command, as follows::
> > > > > @@ -511,7 +511,7 @@ dump kernel.
> > > > >You can also use the Crash utility to analyze dump files in Kdump
> > > > >format. Crash is available on Dave Anderson's site at the 
> > > > > following URL:
> > > > > -   http://people.redhat.com/~anderson/
> > > > > +   https://people.redhat.com/~anderson/
> > > > 
> > > > Would you mind to update above url as well?
> > > I'll update all of the URLs not changed yet, but (please) not in this 
> > > patch
> > > round.
> > > 
> > > See also https://lkml.org/lkml/2020/6/26/837
> > 
> > If this series can be taken soon then we can wait and submit a patch
> > later.
> > 
> > Or just drop this one from your series, I can submit another one to take
> > the https and crash url together later.
> > 
> > Either works.
> Or (if you don't apply as-is) I'll just cover patched+non-patched together
> in round II.

Also good if you will post another round. feel free to add for either
this one or the potential updated one:

Acked-by: Dave Young 

Thanks
Dave

> 
> > 
> > > 
> > > > 
> > > > Dave have moved it to below url instead:
> > > > https://crash-utility.github.io/
> > > > 
> > > > Thanks
> > > > Dave
> > > > 
> > > 
> > 
> 



Re: [PATCH] Replace HTTP links with HTTPS ones: kdump

2020-07-07 Thread Dave Young
On 07/01/20 at 07:33pm, Alexander A. Klimov wrote:
> 
> 
> Am 01.07.20 um 09:58 schrieb Dave Young:
> > On 06/27/20 at 12:31pm, Alexander A. Klimov wrote:
> > > Rationale:
> > > Reduces attack surface on kernel devs opening the links for MITM
> > > as HTTPS traffic is much harder to manipulate.
> > > 
> > > Deterministic algorithm:
> > > For each file:
> > >If not .svg:
> > >  For each line:
> > >If doesn't contain `\bxmlns\b`:
> > >  For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
> > >If both the HTTP and HTTPS versions
> > >return 200 OK and serve the same content:
> > >  Replace HTTP with HTTPS.
> > > 
> > > Signed-off-by: Alexander A. Klimov 
> > > ---
> > >   If there are any URLs to be removed completely or at least not 
> > > HTTPSified:
> > >   Just clearly say so and I'll *undo my change*.
> > >   See also https://lkml.org/lkml/2020/6/27/64
> > > 
> > >   If there are any valid, but yet not changed URLs:
> > >   See https://lkml.org/lkml/2020/6/26/837
> > > 
> > >   Documentation/admin-guide/kdump/kdump.rst | 10 +-
> > >   1 file changed, 5 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/Documentation/admin-guide/kdump/kdump.rst 
> > > b/Documentation/admin-guide/kdump/kdump.rst
> > > index 2da65fef2a1c..8cfa35f777f5 100644
> > > --- a/Documentation/admin-guide/kdump/kdump.rst
> > > +++ b/Documentation/admin-guide/kdump/kdump.rst
> > > @@ -65,20 +65,20 @@ Install kexec-tools
> > >   2) Download the kexec-tools user-space package from the following URL:
> > > -http://kernel.org/pub/linux/utils/kernel/kexec/kexec-tools.tar.gz
> > > +https://kernel.org/pub/linux/utils/kernel/kexec/kexec-tools.tar.gz
> > >   This is a symlink to the latest version.
> > >   The latest kexec-tools git tree is available at:
> > >   - git://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
> > > -- http://www.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
> > > +- https://www.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
> > >   There is also a gitweb interface available at
> > > -http://www.kernel.org/git/?p=utils/kernel/kexec/kexec-tools.git
> > > +https://www.kernel.org/git/?p=utils/kernel/kexec/kexec-tools.git
> > >   More information about kexec-tools can be found at
> > > -http://horms.net/projects/kexec/
> > > +https://horms.net/projects/kexec/
> > >   3) Unpack the tarball with the tar command, as follows::
> > > @@ -511,7 +511,7 @@ dump kernel.
> > >   You can also use the Crash utility to analyze dump files in Kdump
> > >   format. Crash is available on Dave Anderson's site at the following URL:
> > > -   http://people.redhat.com/~anderson/
> > > +   https://people.redhat.com/~anderson/
> > 
> > Would you mind to update above url as well?
> I'll update all of the URLs not changed yet, but (please) not in this patch
> round.
> 
> See also https://lkml.org/lkml/2020/6/26/837

If this series can be taken soon then we can wait and submit a patch
later.

Or just drop this one from your series, I can submit another one to take
the https and crash url together later.

Either works.

> 
> > 
> > Dave have moved it to below url instead:
> > https://crash-utility.github.io/
> > 
> > Thanks
> > Dave
> > 
> 



Re: [PATCH v2 01/12] kexec_file: allow archs to handle special regions while locating memory hole

2020-07-06 Thread Dave Young
On 07/03/20 at 01:24am, Hari Bathini wrote:
> Some architectures may have special memory regions, within the given
> memory range, which can't be used for the buffer in a kexec segment.
> Implement weak arch_kexec_locate_mem_hole() definition which arch code
> may override, to take care of special regions, while trying to locate
> a memory hole.
> 
> Also, add the missing declarations for arch overridable functions and
> and drop the __weak descriptors in the declarations to avoid non-weak
> definitions from becoming weak.
> 
> Reported-by: kernel test robot 
> [lkp: In v1, arch_kimage_file_post_load_cleanup() declaration was missing]
> Signed-off-by: Hari Bathini 
> ---
> 
> Changes in v2:
> * Introduced arch_kexec_locate_mem_hole() for override and dropped
>   weak arch_kexec_add_buffer().
> * Dropped __weak identifier for arch overridable functions.
> * Fixed the missing declaration for arch_kimage_file_post_load_cleanup()
>   reported by lkp. lkp report for reference:
> - https://lore.kernel.org/patchwork/patch/1264418/
> 
> 
>  include/linux/kexec.h |   29 ++---
>  kernel/kexec_file.c   |   16 ++--
>  2 files changed, 32 insertions(+), 13 deletions(-)
> 
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index ea67910..9e93bef 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -183,17 +183,24 @@ int kexec_purgatory_get_set_symbol(struct kimage 
> *image, const char *name,
>  bool get_value);
>  void *kexec_purgatory_get_symbol_addr(struct kimage *image, const char 
> *name);
>  
> -int __weak arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
> -  unsigned long buf_len);
> -void * __weak arch_kexec_kernel_image_load(struct kimage *image);
> -int __weak arch_kexec_apply_relocations_add(struct purgatory_info *pi,
> - Elf_Shdr *section,
> - const Elf_Shdr *relsec,
> - const Elf_Shdr *symtab);
> -int __weak arch_kexec_apply_relocations(struct purgatory_info *pi,
> - Elf_Shdr *section,
> - const Elf_Shdr *relsec,
> - const Elf_Shdr *symtab);
> +/* Architectures may override the below functions */
> +int arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
> +   unsigned long buf_len);
> +void *arch_kexec_kernel_image_load(struct kimage *image);
> +int arch_kexec_apply_relocations_add(struct purgatory_info *pi,
> +  Elf_Shdr *section,
> +  const Elf_Shdr *relsec,
> +  const Elf_Shdr *symtab);
> +int arch_kexec_apply_relocations(struct purgatory_info *pi,
> +  Elf_Shdr *section,
> +  const Elf_Shdr *relsec,
> +  const Elf_Shdr *symtab);
> +int arch_kimage_file_post_load_cleanup(struct kimage *image);
> +#ifdef CONFIG_KEXEC_SIG
> +int arch_kexec_kernel_verify_sig(struct kimage *image, void *buf,
> +  unsigned long buf_len);
> +#endif
> +int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf);
>  
>  extern int kexec_add_buffer(struct kexec_buf *kbuf);
>  int kexec_locate_mem_hole(struct kexec_buf *kbuf);
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index 09cc78d..e89912d 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -636,6 +636,19 @@ int kexec_locate_mem_hole(struct kexec_buf *kbuf)
>  }
>  
>  /**
> + * arch_kexec_locate_mem_hole - Find free memory to place the segments.
> + * @kbuf:   Parameters for the memory search.
> + *
> + * On success, kbuf->mem will have the start address of the memory region 
> found.
> + *
> + * Return: 0 on success, negative errno on error.
> + */
> +int __weak arch_kexec_locate_mem_hole(struct kexec_buf *kbuf)
> +{
> + return kexec_locate_mem_hole(kbuf);
> +}
> +
> +/**
>   * kexec_add_buffer - place a buffer in a kexec segment
>   * @kbuf:Buffer contents and memory parameters.
>   *
> @@ -647,7 +660,6 @@ int kexec_locate_mem_hole(struct kexec_buf *kbuf)
>   */
>  int kexec_add_buffer(struct kexec_buf *kbuf)
>  {
> -
>   struct kexec_segment *ksegment;
>   int ret;
>  
> @@ -675,7 +687,7 @@ int kexec_add_buffer(struct kexec_buf *kbuf)
>   kbuf->buf_align = max(kbuf->buf_align, PAGE_SIZE);
>  
>   /* Walk the RAM ranges and allocate a suitable range for the buffer */
> - ret = kexec_locate_mem_hole(kbuf);
> + ret = arch_kexec_locate_mem_hole(kbuf);
>   if (ret)
>   return ret;
>  
> 

Acked-by: Dave Young 

Thanks
Dave



Re: [PATCH v10 5/5] kdump: update Documentation about crashkernel on arm64

2020-07-03 Thread Dave Young
On 07/03/20 at 12:50pm, Dave Young wrote:
> On 07/03/20 at 12:46pm, Dave Young wrote:
> > Hi,
> > 
> > Thanks for the update, but still some nitpicks :(
> > 
> > I'm sorry I did not catch them previously,  but maybe it is not worth to
> > repost the whole series if no other changes needed.
> 
> Feel free to add my acks for the common kdump part:

Forgot to add "With those typos fixed":)

> 
> Acked-by: Dave Young 
> 
> Thanks
> Dave



Re: [PATCH v10 5/5] kdump: update Documentation about crashkernel on arm64

2020-07-02 Thread Dave Young
On 07/03/20 at 12:46pm, Dave Young wrote:
> Hi,
> 
> Thanks for the update, but still some nitpicks :(
> 
> I'm sorry I did not catch them previously,  but maybe it is not worth to
> repost the whole series if no other changes needed.

Feel free to add my acks for the common kdump part:

Acked-by: Dave Young 

Thanks
Dave



Re: [PATCH v10 5/5] kdump: update Documentation about crashkernel on arm64

2020-07-02 Thread Dave Young
Hi,

Thanks for the update, but still some nitpicks :(

I'm sorry I did not catch them previously,  but maybe it is not worth to
repost the whole series if no other changes needed.
On 07/03/20 at 11:58am, Chen Zhou wrote:
> Now we support crashkernel=X,[low] on arm64, update the Documentation.
> We could use parameters "crashkernel=X crashkernel=Y,low" to reserve
> memory above 4G.
> 
> Signed-off-by: Chen Zhou 
> Tested-by: John Donnelly 
> Tested-by: Prabhakar Kushwaha 
> ---
>  Documentation/admin-guide/kdump/kdump.rst   | 14 --
>  Documentation/admin-guide/kernel-parameters.txt | 17 +++--
>  2 files changed, 27 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kdump/kdump.rst 
> b/Documentation/admin-guide/kdump/kdump.rst
> index 2da65fef2a1c..e80fc9e28a9a 100644
> --- a/Documentation/admin-guide/kdump/kdump.rst
> +++ b/Documentation/admin-guide/kdump/kdump.rst
> @@ -299,7 +299,15 @@ Boot into System Kernel
> "crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory
> starting at physical address 0x0100 (16MB) for the dump-capture 
> kernel.
>  
> -   On x86 and x86_64, use "crashkernel=64M@16M".
> +   On x86 use "crashkernel=64M@16M".
> +
> +   On x86_64, use "crashkernel=Y" to select a region under 4G first, and
> +   fall back to reserve region above 4G.
> +   We can also use "crashkernel=X,high" to select a region above 4G, which
> +   also tries to allocate at least 256M below 4G automatically and
> +   "crashkernel=Y,low" can be used to allocate specified size low memory.
> +   Use "crashkernel=Y@X" if we really have to reserve memory from specified

s/we/you

> +   start address X.
>  
> On ppc64, use "crashkernel=128M@32M".
>  
> @@ -316,8 +324,10 @@ Boot into System Kernel
> kernel will automatically locate the crash kernel image within the
> first 512MB of RAM if X is not given.
>  
> -   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
> +   On arm64, use "crashkernel=Y[@X]". Note that the start address of
> the kernel, X if explicitly specified, must be aligned to 2MiB (0x20).
> +   If crashkernel=Z,low is specified simultaneously, reserve spcified size

s/spcified/specified

> +   low memory firstly and then reserve memory above 4G.
>  
>  Load the Dump-capture Kernel
>  
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index fb95fad81c79..58a731eed011 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -722,6 +722,9 @@
>   [KNL, x86_64] select a region under 4G first, and
>   fall back to reserve region above 4G when '@offset'
>   hasn't been specified.
> + [KNL, arm64] If crashkernel=X,low is specified, reserve
> + spcified size low memory firstly, and then reserve 
> memory

s/spcified/specified

> + above 4G.
>   See Documentation/admin-guide/kdump/kdump.rst for 
> further details.
>  
>   crashkernel=range1:size1[,range2:size2,...][@offset]
> @@ -746,13 +749,23 @@
>   requires at least 64M+32K low memory, also enough extra
>   low memory is needed to make sure DMA buffers for 32-bit
>   devices won't run out. Kernel would try to allocate at
> - at least 256M below 4G automatically.
> + least 256M below 4G automatically.
>   This one let user to specify own low range under 4G
>   for second kernel instead.
>   0: to disable low allocation.
>   It will be ignored when crashkernel=X,high is not used
>   or memory reserved is below 4G.
> -
> + [KNL, arm64] range under 4G.
> + This one let user to specify own low range under 4G

s/own low/a low

> + for crash dump kernel instead.
> + Be different from x86_64, kernel reserves specified size
> + physical memory region only when this parameter is 
> specified
> + instead of trying to reserve at least 256M below 4G
> + automatically.
> + Use this parameter along with crashkernel=X when we want
> + to reserve crashkernel above 4G. If there are devices
> + need to use ZONE_DMA in crash dump kernel, it is also
> + a good choice.
>   cryptomgr.notests
>   [KNL] Disable crypto self-tests
>  
> -- 
> 2.20.1
> 



Re: [PATCH v6 1/2] crash_core, vmcoreinfo: Append 'MAX_PHYSMEM_BITS' to vmcoreinfo

2020-07-02 Thread Dave Young
Hi Catalin,
On 07/02/20 at 12:00pm, Catalin Marinas wrote:
> On Thu, May 14, 2020 at 12:22:36AM +0530, Bhupesh Sharma wrote:
> > diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> > index 9f1557b98468..18175687133a 100644
> > --- a/kernel/crash_core.c
> > +++ b/kernel/crash_core.c
> > @@ -413,6 +413,7 @@ static int __init crash_save_vmcoreinfo_init(void)
> > VMCOREINFO_LENGTH(mem_section, NR_SECTION_ROOTS);
> > VMCOREINFO_STRUCT_SIZE(mem_section);
> > VMCOREINFO_OFFSET(mem_section, section_mem_map);
> > +   VMCOREINFO_NUMBER(MAX_PHYSMEM_BITS);
> >  #endif
> > VMCOREINFO_STRUCT_SIZE(page);
> > VMCOREINFO_STRUCT_SIZE(pglist_data);
> 
> I can queue this patch via the arm64 tree (together with the second one)
> but I'd like an ack from the kernel/crash_core.c maintainers. They don't
> seem to have been cc'ed either (only the kexec list).

For the VMCOREINFO part, I'm fine with the changes, but since I do not
understand the arm64 pieces so I would like to leave to arm64 people to
review.  If arm64 bits are good enough, feel free to add:

Acked-by: Dave Young 

Thanks
Dave



Re: [PATCH 04/11] ppc64/kexec_file: avoid stomping memory used by special regions

2020-07-02 Thread Dave Young
> > I'm confused about the "overlap with crashkernel memory", does that mean
> > those normal kernel used memory could be put in crashkernel reserved
> > memory range?  If so why can't just skip those areas while crashkernel
> > doing the reservation?
> I raised the same question in another mail. As Hari's answer, "kexec -p"
> skips these ranges in user space. And the same logic should be done in
> "kexec -s -p"

See it, thanks!  The confusion also applied to the userspace
implementation though.  Seems they have to be special cases because of 
the powerpc crashkernel reservation implemtation in kernel limitation

Thanks
Dave



Re: [PATCH 04/11] ppc64/kexec_file: avoid stomping memory used by special regions

2020-07-02 Thread Dave Young
On 07/01/20 at 11:48pm, Hari Bathini wrote:
> 
> 
> On 01/07/20 1:10 pm, Dave Young wrote:
> > Hi Hari,
> > On 06/27/20 at 12:35am, Hari Bathini wrote:
> >> crashkernel region could have an overlap with special memory regions
> >> like  opal, rtas, tce-table & such. These regions are referred to as
> >> exclude memory ranges. Setup this ranges during image probe in order
> >> to avoid them while finding the buffer for different kdump segments.
> >> Implement kexec_locate_mem_hole_ppc64() that locates a memory hole
> >> accounting for these ranges. Also, override arch_kexec_add_buffer()
> >> to locate a memory hole & later call __kexec_add_buffer() function
> >> with kbuf->mem set to skip the generic locate memory hole lookup.
> >>
> >> Signed-off-by: Hari Bathini 
> >> ---
> >>  arch/powerpc/include/asm/crashdump-ppc64.h |   10 +
> >>  arch/powerpc/include/asm/kexec.h   |7 -
> >>  arch/powerpc/kexec/elf_64.c|7 +
> >>  arch/powerpc/kexec/file_load_64.c  |  292 
> >> 
> >>  4 files changed, 312 insertions(+), 4 deletions(-)
> >>  create mode 100644 arch/powerpc/include/asm/crashdump-ppc64.h
> >>
> > [snip]
> >>  /**
> >> + * get_exclude_memory_ranges - Get exclude memory ranges. This list 
> >> includes
> >> + * regions like opal/rtas, tce-table, initrd,
> >> + * kernel, htab which should be avoided while
> >> + * setting up kexec load segments.
> >> + * @mem_ranges:Range list to add the memory ranges to.
> >> + *
> >> + * Returns 0 on success, negative errno on error.
> >> + */
> >> +static int get_exclude_memory_ranges(struct crash_mem **mem_ranges)
> >> +{
> >> +  int ret;
> >> +
> >> +  ret = add_tce_mem_ranges(mem_ranges);
> >> +  if (ret)
> >> +  goto out;
> >> +
> >> +  ret = add_initrd_mem_range(mem_ranges);
> >> +  if (ret)
> >> +  goto out;
> >> +
> >> +  ret = add_htab_mem_range(mem_ranges);
> >> +  if (ret)
> >> +  goto out;
> >> +
> >> +  ret = add_kernel_mem_range(mem_ranges);
> >> +  if (ret)
> >> +  goto out;
> >> +
> >> +  ret = add_rtas_mem_range(mem_ranges, false);
> >> +  if (ret)
> >> +  goto out;
> >> +
> >> +  ret = add_opal_mem_range(mem_ranges, false);
> >> +  if (ret)
> >> +  goto out;
> >> +
> >> +  ret = add_reserved_ranges(mem_ranges);
> >> +  if (ret)
> >> +  goto out;
> >> +
> >> +  /* exclude memory ranges should be sorted for easy lookup */
> >> +  sort_memory_ranges(*mem_ranges);
> >> +out:
> >> +  if (ret)
> >> +  pr_err("Failed to setup exclude memory ranges\n");
> >> +  return ret;
> >> +}
> > 
> > I'm confused about the "overlap with crashkernel memory", does that mean
> > those normal kernel used memory could be put in crashkernel reserved
> 
> There are regions that could overlap with crashkernel region but they are
> not normal kernel used memory though. These are regions that kernel and/or
> f/w chose to place at a particular address for real mode accessibility
> and/or memory layout between kernel & f/w kind of thing.
> 
> > memory range?  If so why can't just skip those areas while crashkernel
> > doing the reservation?
> 
> crashkernel region has a dependency to be in the first memory block for it
> to be accessible in real mode. Accommodating this requirement while addressing
> other requirements would mean something like what we have now. A list of
> possible special memory regions in crashkernel region to take care of.
> 
> I have plans to split crashkernel region into low & high to have exclusive
> regions for crashkernel, even if that means to have two of them. But that
> is for another day with its own set of complexities to deal with...

Ok, I was not aware the powerpc crashkernel reservation is not
dynamically reserved.  But seems powerpc need those tricks at least for
the time being like you said.

Thanks
Dave



Re: [PATCH 01/11] kexec_file: allow archs to handle special regions while locating memory hole

2020-07-02 Thread Dave Young
On 07/02/20 at 12:01am, Hari Bathini wrote:
> 
> 
> On 01/07/20 1:16 pm, Dave Young wrote:
> > On 06/29/20 at 05:26pm, Hari Bathini wrote:
> >> Hi Petr,
> >>
> >> On 29/06/20 5:09 pm, Petr Tesarik wrote:
> >>> Hi Hari,
> >>>
> >>> is there any good reason to add two more functions with a very similar
> >>> name to an existing function? AFAICS all you need is a way to call a
> >>> PPC64-specific function from within kexec_add_buffer (PATCH 4/11), so
> >>> you could add something like this:
> >>>
> >>> int __weak arch_kexec_locate_mem_hole(struct kexec_buf *kbuf)
> >>> {
> >>>   return 0;
> >>> }
> >>>
> >>> Call this function from kexec_add_buffer where appropriate and then
> >>> override it for PPC64 (it roughly corresponds to your
> >>> kexec_locate_mem_hole_ppc64() from PATCH 4/11).
> >>>
> >>> FWIW it would make it easier for me to follow the resulting code.
> >>
> >> Right, Petr.
> >>
> >> I was trying out a few things before I ended up with what I sent here.
> >> Bu yeah.. I did realize arch_kexec_locate_mem_hole() would have been better
> >> after sending out v1. Will take care of that in v2.
> > 
> > Another way is use arch private function to locate mem hole, then set
> > kbuf->mem, and then call kexec_add_buf, it will skip the common locate
> > hole function.
> 
> Dave, I did think about it. But there are a couple of places this can get
> tricky. One is ima_add_kexec_buffer() and the other is kexec_elf_load().
> These call sites could be updated to set kbuf->mem before kexec_add_buffer().
> But the current approach seemed like the better option for it creates a
> single point of control in setting up segment buffers and also, makes adding
> any new segments simpler, arch-specific segments or otherwise.
> 

Ok, thanks for the explanation.



Re: [PATCH v9 5/5] kdump: update Documentation about crashkernel on arm64

2020-07-01 Thread Dave Young
Hi Chen,
On 06/28/20 at 04:34pm, Chen Zhou wrote:
> Now we support crashkernel=X,[low] on arm64, update the Documentation.
> We could use parameters "crashkernel=X crashkernel=Y,low" to reserve
> memory above 4G.
> 
> Signed-off-by: Chen Zhou 
> Tested-by: John Donnelly 
> Tested-by: Prabhakar Kushwaha 
> ---
>  Documentation/admin-guide/kdump/kdump.rst   | 13 +++--
>  Documentation/admin-guide/kernel-parameters.txt | 17 +++--
>  2 files changed, 26 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kdump/kdump.rst 
> b/Documentation/admin-guide/kdump/kdump.rst
> index 2da65fef2a1c..6ba294d425c9 100644
> --- a/Documentation/admin-guide/kdump/kdump.rst
> +++ b/Documentation/admin-guide/kdump/kdump.rst
> @@ -299,7 +299,13 @@ Boot into System Kernel
> "crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory
> starting at physical address 0x0100 (16MB) for the dump-capture 
> kernel.
>  
> -   On x86 and x86_64, use "crashkernel=64M@16M".
> +   On x86 use "crashkernel=64M@16M".
> +
> +   On x86_64, use "crashkernel=Y[@X]" to select a region under 4G first, and
> +   fall back to reserve region above 4G when '@offset' hasn't been specified.

Actually crashkernel=Y without the offset works well, I do not see why
we need the offset, it should be some legacy thing.  So it should be
better just use the Y without offset here, and just leave a note
somewhere people can use [@X] offset when they really have to.

> +   We can also use "crashkernel=X,high" to select a region above 4G, which
> +   also tries to allocate at least 256M below 4G automatically and
> +   "crashkernel=Y,low" can be used to allocate specified size low memory.
>  
> On ppc64, use "crashkernel=128M@32M".
>  
> @@ -316,8 +322,11 @@ Boot into System Kernel
> kernel will automatically locate the crash kernel image within the
> first 512MB of RAM if X is not given.
>  
> -   On arm64, use "crashkernel=Y[@X]".  Note that the start address of
> +   On arm64, use "crashkernel=Y[@X]". Note that the start address of
> the kernel, X if explicitly specified, must be aligned to 2MiB (0x20).
> +   If crashkernel=Z,low is specified simultaneously, reserve spcified size
> +   low memory for crash kdump kernel devices firstly and then reserve memory

"devices" seems not very accurate, maybe just drop the "for crash kdump
kernel devices" since it is clear based on the context.

> +   above 4G.
>  
>  Load the Dump-capture Kernel
>  
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index fb95fad81c79..335431a351c0 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -722,6 +722,9 @@
>   [KNL, x86_64] select a region under 4G first, and
>   fall back to reserve region above 4G when '@offset'
>   hasn't been specified.
> + [KNL, arm64] If crashkernel=X,low is specified, reserve
> + spcified size low memory for crash kdump kernel devices

Ditto.

> + firstly, and then reserve memory above 4G.
>   See Documentation/admin-guide/kdump/kdump.rst for 
> further details.
>  
>   crashkernel=range1:size1[,range2:size2,...][@offset]
> @@ -746,13 +749,23 @@
>   requires at least 64M+32K low memory, also enough extra
>   low memory is needed to make sure DMA buffers for 32-bit
>   devices won't run out. Kernel would try to allocate at
> - at least 256M below 4G automatically.
> + least 256M below 4G automatically.
>   This one let user to specify own low range under 4G
>   for second kernel instead.
>   0: to disable low allocation.
>   It will be ignored when crashkernel=X,high is not used
>   or memory reserved is below 4G.
> -
> + [KNL, arm64] range under 4G.
> + This one let user to specify own low range under 4G
> + for crash dump kernel instead.
> + Different with x86_64, kernel allocates specified size

sounds better:
s/Different with/Be different from

s/allocates/reserves

> + physical memory region only when this parameter is 
> specified
> + instead of trying to allocate at least 256M below 4G

s/allocate/reserve

> + automatically.
> + This parameter is used along with crashkernel=X when we

Could change the passive sentence to below:
"Use this parameter along with"

> + want to reserve crashkernel above 4G. If there are 
> devices
> + need to use ZONE_DMA in crash dump 

Re: [PATCH v9 1/5] x86: kdump: move reserve_crashkernel_low() into crash_core.c

2020-07-01 Thread Dave Young
 return ret;
> + }
> +
> + pr_info("Reserving %ldMB of low memory at %ldMB for crashkernel (System 
> low RAM: %ldMB)\n",
> + (unsigned long)(low_size >> 20),
> + (unsigned long)(low_base >> 20),
> + (unsigned long)(total_low_mem >> 20));
> +
> + crashk_low_res.start = low_base;
> + crashk_low_res.end   = low_base + low_size - 1;
> +#endif
> + return 0;
> +}
> +
>  Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
> void *data, size_t data_len)
>  {
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index c19c0dad1ebe..db66bbabfff3 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -53,23 +53,6 @@ note_buf_t __percpu *crash_notes;
>  /* Flag to indicate we are going to kexec a new kernel */
>  bool kexec_in_progress = false;
>  
> -
> -/* Location of the reserved area for the crash kernel */
> -struct resource crashk_res = {
> - .name  = "Crash kernel",
> - .start = 0,
> - .end   = 0,
> - .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> - .desc  = IORES_DESC_CRASH_KERNEL
> -};
> -struct resource crashk_low_res = {
> - .name  = "Crash kernel",
> - .start = 0,
> - .end   = 0,
> - .flags = IORESOURCE_BUSY | IORESOURCE_SYSTEM_RAM,
> - .desc  = IORES_DESC_CRASH_KERNEL
> -};
> -
>  int kexec_should_crash(struct task_struct *p)
>  {
>   /*
> -- 
> 2.20.1
> 
> 
> ___
> kexec mailing list
> ke...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
> 

Acked-by: Dave Young 

Thanks
Dave



Re: [PATCH v2 11/11] ima: Support additional conditionals in the KEXEC_CMDLINE hook function

2020-07-01 Thread Dave Young
Hi,
On 06/26/20 at 05:39pm, Tyler Hicks wrote:
> Take the properties of the kexec kernel's inode and the current task
> ownership into consideration when matching a KEXEC_CMDLINE operation to
> the rules in the IMA policy. This allows for some uniformity when
> writing IMA policy rules for KEXEC_KERNEL_CHECK, KEXEC_INITRAMFS_CHECK,
> and KEXEC_CMDLINE operations.
> 
> Prior to this patch, it was not possible to write a set of rules like
> this:
> 
>  dont_measure func=KEXEC_KERNEL_CHECK obj_type=foo_t
>  dont_measure func=KEXEC_INITRAMFS_CHECK obj_type=foo_t
>  dont_measure func=KEXEC_CMDLINE obj_type=foo_t
>  measure func=KEXEC_KERNEL_CHECK
>  measure func=KEXEC_INITRAMFS_CHECK
>  measure func=KEXEC_CMDLINE
> 
> The inode information associated with the kernel being loaded by a
> kexec_kernel_load(2) syscall can now be included in the decision to
> measure or not
> 
> Additonally, the uid, euid, and subj_* conditionals can also now be
> used in KEXEC_CMDLINE rules. There was no technical reason as to why
> those conditionals weren't being considered previously other than
> ima_match_rules() didn't have a valid inode to use so it immediately
> bailed out for KEXEC_CMDLINE operations rather than going through the
> full list of conditional comparisons.
> 
> Signed-off-by: Tyler Hicks 
> Cc: Eric Biederman 
> Cc: ke...@lists.infradead.org
> ---
> 
> * v2
>   - Moved the inode parameter of process_buffer_measurement() to be the
> first parameter so that it more closely matches process_masurement()
> 
>  include/linux/ima.h  |  4 ++--
>  kernel/kexec_file.c  |  2 +-
>  security/integrity/ima/ima.h |  2 +-
>  security/integrity/ima/ima_api.c |  2 +-
>  security/integrity/ima/ima_appraise.c|  2 +-
>  security/integrity/ima/ima_asymmetric_keys.c |  2 +-
>  security/integrity/ima/ima_main.c| 23 +++-
>  security/integrity/ima/ima_policy.c  | 17 +--
>  security/integrity/ima/ima_queue_keys.c  |  2 +-
>  9 files changed, 31 insertions(+), 25 deletions(-)
> 
> diff --git a/include/linux/ima.h b/include/linux/ima.h
> index 9164e1534ec9..d15100de6cdd 100644
> --- a/include/linux/ima.h
> +++ b/include/linux/ima.h
> @@ -25,7 +25,7 @@ extern int ima_post_read_file(struct file *file, void *buf, 
> loff_t size,
> enum kernel_read_file_id id);
>  extern void ima_post_path_mknod(struct dentry *dentry);
>  extern int ima_file_hash(struct file *file, char *buf, size_t buf_size);
> -extern void ima_kexec_cmdline(const void *buf, int size);
> +extern void ima_kexec_cmdline(int kernel_fd, const void *buf, int size);
>  
>  #ifdef CONFIG_IMA_KEXEC
>  extern void ima_add_kexec_buffer(struct kimage *image);
> @@ -103,7 +103,7 @@ static inline int ima_file_hash(struct file *file, char 
> *buf, size_t buf_size)
>   return -EOPNOTSUPP;
>  }
>  
> -static inline void ima_kexec_cmdline(const void *buf, int size) {}
> +static inline void ima_kexec_cmdline(int kernel_fd, const void *buf, int 
> size) {}
>  #endif /* CONFIG_IMA */
>  
>  #ifndef CONFIG_IMA_KEXEC
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index bb05fd52de85..07df431c1f21 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -287,7 +287,7 @@ kimage_file_prepare_segments(struct kimage *image, int 
> kernel_fd, int initrd_fd,
>   goto out;
>   }
>  
> - ima_kexec_cmdline(image->cmdline_buf,
> + ima_kexec_cmdline(kernel_fd, image->cmdline_buf,
> image->cmdline_buf_len - 1);
>   }
>  
> diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
> index 59ec28f5c117..ff2bf57ff0c7 100644
> --- a/security/integrity/ima/ima.h
> +++ b/security/integrity/ima/ima.h
> @@ -265,7 +265,7 @@ void ima_store_measurement(struct integrity_iint_cache 
> *iint, struct file *file,
>  struct evm_ima_xattr_data *xattr_value,
>  int xattr_len, const struct modsig *modsig, int pcr,
>  struct ima_template_desc *template_desc);
> -void process_buffer_measurement(const void *buf, int size,
> +void process_buffer_measurement(struct inode *inode, const void *buf, int 
> size,
>   const char *eventname, enum ima_hooks func,
>   int pcr, const char *keyring);
>  void ima_audit_measurement(struct integrity_iint_cache *iint,
> diff --git a/security/integrity/ima/ima_api.c 
> b/security/integrity/ima/ima_api.c
> index bf22de8b7ce0..4f39fb93f278 100644
> --- a/security/integrity/ima/ima_api.c
> +++ b/security/integrity/ima/ima_api.c
> @@ -162,7 +162,7 @@ void ima_add_violation(struct file *file, const unsigned 
> char *filename,
>  
>  /**
>   * ima_get_action - appraise & measure decision based on policy.
> - * @inode: pointer to inode to measure
> + * @inode: pointer to the inode 

Re: [PATCH] Replace HTTP links with HTTPS ones: kdump

2020-07-01 Thread Dave Young
On 06/27/20 at 12:31pm, Alexander A. Klimov wrote:
> Rationale:
> Reduces attack surface on kernel devs opening the links for MITM
> as HTTPS traffic is much harder to manipulate.
> 
> Deterministic algorithm:
> For each file:
>   If not .svg:
> For each line:
>   If doesn't contain `\bxmlns\b`:
> For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
>   If both the HTTP and HTTPS versions
>   return 200 OK and serve the same content:
> Replace HTTP with HTTPS.
> 
> Signed-off-by: Alexander A. Klimov 
> ---
>  If there are any URLs to be removed completely or at least not HTTPSified:
>  Just clearly say so and I'll *undo my change*.
>  See also https://lkml.org/lkml/2020/6/27/64
> 
>  If there are any valid, but yet not changed URLs:
>  See https://lkml.org/lkml/2020/6/26/837
> 
>  Documentation/admin-guide/kdump/kdump.rst | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kdump/kdump.rst 
> b/Documentation/admin-guide/kdump/kdump.rst
> index 2da65fef2a1c..8cfa35f777f5 100644
> --- a/Documentation/admin-guide/kdump/kdump.rst
> +++ b/Documentation/admin-guide/kdump/kdump.rst
> @@ -65,20 +65,20 @@ Install kexec-tools
>  
>  2) Download the kexec-tools user-space package from the following URL:
>  
> -http://kernel.org/pub/linux/utils/kernel/kexec/kexec-tools.tar.gz
> +https://kernel.org/pub/linux/utils/kernel/kexec/kexec-tools.tar.gz
>  
>  This is a symlink to the latest version.
>  
>  The latest kexec-tools git tree is available at:
>  
>  - git://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
> -- http://www.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
> +- https://www.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git
>  
>  There is also a gitweb interface available at
> -http://www.kernel.org/git/?p=utils/kernel/kexec/kexec-tools.git
> +https://www.kernel.org/git/?p=utils/kernel/kexec/kexec-tools.git
>  
>  More information about kexec-tools can be found at
> -http://horms.net/projects/kexec/
> +https://horms.net/projects/kexec/
>  
>  3) Unpack the tarball with the tar command, as follows::
>  
> @@ -511,7 +511,7 @@ dump kernel.
>  You can also use the Crash utility to analyze dump files in Kdump
>  format. Crash is available on Dave Anderson's site at the following URL:
>  
> -   http://people.redhat.com/~anderson/
> +   https://people.redhat.com/~anderson/

Would you mind to update above url as well?

Dave have moved it to below url instead:
https://crash-utility.github.io/

Thanks
Dave



Re: [PATCH 01/11] kexec_file: allow archs to handle special regions while locating memory hole

2020-07-01 Thread Dave Young
On 06/29/20 at 05:26pm, Hari Bathini wrote:
> Hi Petr,
> 
> On 29/06/20 5:09 pm, Petr Tesarik wrote:
> > Hi Hari,
> > 
> > is there any good reason to add two more functions with a very similar
> > name to an existing function? AFAICS all you need is a way to call a
> > PPC64-specific function from within kexec_add_buffer (PATCH 4/11), so
> > you could add something like this:
> > 
> > int __weak arch_kexec_locate_mem_hole(struct kexec_buf *kbuf)
> > {
> > return 0;
> > }
> > 
> > Call this function from kexec_add_buffer where appropriate and then
> > override it for PPC64 (it roughly corresponds to your
> > kexec_locate_mem_hole_ppc64() from PATCH 4/11).
> > 
> > FWIW it would make it easier for me to follow the resulting code.
> 
> Right, Petr.
> 
> I was trying out a few things before I ended up with what I sent here.
> Bu yeah.. I did realize arch_kexec_locate_mem_hole() would have been better
> after sending out v1. Will take care of that in v2.

Another way is use arch private function to locate mem hole, then set
kbuf->mem, and then call kexec_add_buf, it will skip the common locate
hole function.

But other than that I have some confusion about those excluded ranges.
Replied a question to patch 4.

Thanks
Dave



Re: [PATCH 04/11] ppc64/kexec_file: avoid stomping memory used by special regions

2020-07-01 Thread Dave Young
Hi Hari,
On 06/27/20 at 12:35am, Hari Bathini wrote:
> crashkernel region could have an overlap with special memory regions
> like  opal, rtas, tce-table & such. These regions are referred to as
> exclude memory ranges. Setup this ranges during image probe in order
> to avoid them while finding the buffer for different kdump segments.
> Implement kexec_locate_mem_hole_ppc64() that locates a memory hole
> accounting for these ranges. Also, override arch_kexec_add_buffer()
> to locate a memory hole & later call __kexec_add_buffer() function
> with kbuf->mem set to skip the generic locate memory hole lookup.
> 
> Signed-off-by: Hari Bathini 
> ---
>  arch/powerpc/include/asm/crashdump-ppc64.h |   10 +
>  arch/powerpc/include/asm/kexec.h   |7 -
>  arch/powerpc/kexec/elf_64.c|7 +
>  arch/powerpc/kexec/file_load_64.c  |  292 
> 
>  4 files changed, 312 insertions(+), 4 deletions(-)
>  create mode 100644 arch/powerpc/include/asm/crashdump-ppc64.h
> 
[snip]
>  /**
> + * get_exclude_memory_ranges - Get exclude memory ranges. This list includes
> + * regions like opal/rtas, tce-table, initrd,
> + * kernel, htab which should be avoided while
> + * setting up kexec load segments.
> + * @mem_ranges:Range list to add the memory ranges to.
> + *
> + * Returns 0 on success, negative errno on error.
> + */
> +static int get_exclude_memory_ranges(struct crash_mem **mem_ranges)
> +{
> + int ret;
> +
> + ret = add_tce_mem_ranges(mem_ranges);
> + if (ret)
> + goto out;
> +
> + ret = add_initrd_mem_range(mem_ranges);
> + if (ret)
> + goto out;
> +
> + ret = add_htab_mem_range(mem_ranges);
> + if (ret)
> + goto out;
> +
> + ret = add_kernel_mem_range(mem_ranges);
> + if (ret)
> + goto out;
> +
> + ret = add_rtas_mem_range(mem_ranges, false);
> + if (ret)
> + goto out;
> +
> + ret = add_opal_mem_range(mem_ranges, false);
> + if (ret)
> + goto out;
> +
> + ret = add_reserved_ranges(mem_ranges);
> + if (ret)
> + goto out;
> +
> + /* exclude memory ranges should be sorted for easy lookup */
> + sort_memory_ranges(*mem_ranges);
> +out:
> + if (ret)
> + pr_err("Failed to setup exclude memory ranges\n");
> + return ret;
> +}

I'm confused about the "overlap with crashkernel memory", does that mean
those normal kernel used memory could be put in crashkernel reserved
memory range?  If so why can't just skip those areas while crashkernel
doing the reservation?

Thanks
Dave



Re: [PATCH v3 0/3] printk: replace ringbuffer

2020-06-25 Thread Dave Young
Hi John,
On 06/18/20 at 04:55pm, John Ogness wrote:
> Hello,
> 
> Here is a v3 for the first series to rework the printk
> subsystem. The v2 and history are here [0]. This first series
> only replaces the existing ringbuffer implementation. No locking
> is removed. No semantics/behavior of printk are changed.
> 
> Reviews on the ringbuffer are still ongoing, but I was asked to
> post this new version since several changes from v2 have been
> already agreed upon.
> 
> The series is based on v5.8-rc1.

Do you have the kdump userspace part link so that people can try
and do some testing?

Eg. some makedumpfile/crash tool git branch etc.

Thanks
Dave



i915/kexec: warning at drivers/gpu/drm/i915/display/intel_psr.c:782 intel_psr_activate+0x3c6/0x440

2020-06-17 Thread Dave Young
Hi,

This warning exists for long time, I did not find time to report, here
is the latest kernel logs, can you please to have a look?

hardware: Thinkpad T480s
lspci: 00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 620 
(rev 07)
--
[0.00] Linux version 5.8.0-rc1+ (dyo...@dhcp-128-65.nay.redhat.com) 
(gcc (GCC) 10.0.1 20200328 (Red Hat 10.0.1-0.11), GNU ld version 2.34-2.fc32) 
#179 SMP Wed Jun 17 14:12:27 CST 2020
[0.00] Command line: ramoops.mem_address=0x2000 
ramoops.mem_size=0x40 hung_task_panic=1 softlockup_panic=1 panic=6 
root=/dev/nvme0n1p9 ro rd.lvm.lv=rhel/swap LANG=zh_CN.UTF-8 audit=0 selinux=0 
no_console_suspend crashkernel=160M printk.devkmsg=off usbcore.autosuspend=-1
[0.00] KERNEL supported cpus:
[0.00]   Intel GenuineIntel
[0.00]   AMD AuthenticAMD
[0.00] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point 
registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
[0.00] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
[0.00] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[0.00] x86/fpu: xstate_offset[3]:  832, xstate_sizes[3]:   64
[0.00] x86/fpu: xstate_offset[4]:  896, xstate_sizes[4]:   64
[0.00] x86/fpu: Enabled xstate features 0x1f, context size is 960 
bytes, using 'compacted' format.
[0.00] BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x00057fff] usable
[0.00] BIOS-e820: [mem 0x00058000-0x00058fff] reserved
[0.00] BIOS-e820: [mem 0x00059000-0x0009cfff] usable
[0.00] BIOS-e820: [mem 0x0009d000-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0x3fff] usable
[0.00] BIOS-e820: [mem 0x4000-0x403f] reserved
[0.00] BIOS-e820: [mem 0x4040-0x7b4b2fff] usable
[0.00] BIOS-e820: [mem 0x7b4b3000-0x7b4b4fff] reserved
[0.00] BIOS-e820: [mem 0x7b4b5000-0x7b51cfff] usable
[0.00] BIOS-e820: [mem 0x7b51d000-0x7b51dfff] reserved
[0.00] BIOS-e820: [mem 0x7b51e000-0xad334fff] usable
[0.00] BIOS-e820: [mem 0xad335000-0xad335fff] ACPI NVS
[0.00] BIOS-e820: [mem 0xad336000-0xad336fff] reserved
[0.00] BIOS-e820: [mem 0xad337000-0xba3e9fff] usable
[0.00] BIOS-e820: [mem 0xba3ea000-0xbb535fff] reserved
[0.00] BIOS-e820: [mem 0xbb536000-0xbb599fff] ACPI NVS
[0.00] BIOS-e820: [mem 0xbb59a000-0xbb5fefff] ACPI data
[0.00] BIOS-e820: [mem 0xbb5ff000-0xbb5f] usable
[0.00] BIOS-e820: [mem 0xbb60-0xbf7f] reserved
[0.00] BIOS-e820: [mem 0xf800-0xfbff] reserved
[0.00] BIOS-e820: [mem 0xfe01-0xfe010fff] reserved
[0.00] BIOS-e820: [mem 0x0001-0x00043e7f] usable
[0.00] NX (Execute Disable) protection: active
[0.00] e820: update [mem 0x00050270-0x000502df] usable ==> usable
[0.00] extended physical RAM map:
[0.00] reserve setup_data: [mem 0x-0x0005026f] 
usable
[0.00] reserve setup_data: [mem 0x00050270-0x000502df] 
usable
[0.00] reserve setup_data: [mem 0x000502e0-0x00057fff] 
usable
[0.00] reserve setup_data: [mem 0x00058000-0x00058fff] 
reserved
[0.00] reserve setup_data: [mem 0x00059000-0x0009cfff] 
usable
[0.00] reserve setup_data: [mem 0x0009d000-0x000f] 
reserved
[0.00] reserve setup_data: [mem 0x0010-0x3fff] 
usable
[0.00] reserve setup_data: [mem 0x4000-0x403f] 
reserved
[0.00] reserve setup_data: [mem 0x4040-0x7b4b2fff] 
usable
[0.00] reserve setup_data: [mem 0x7b4b3000-0x7b4b4fff] 
reserved
[0.00] reserve setup_data: [mem 0x7b4b5000-0x7b51cfff] 
usable
[0.00] reserve setup_data: [mem 0x7b51d000-0x7b51dfff] 
reserved
[0.00] reserve setup_data: [mem 0x7b51e000-0xad334fff] 
usable
[0.00] reserve setup_data: [mem 0xad335000-0xad335fff] 
ACPI NVS
[0.00] reserve setup_data: [mem 0xad336000-0xad336fff] 
reserved
[0.00] reserve setup_data: [mem 0xad337000-0xba3e9fff] 
usable
[0.00] reserve setup_data: [mem 0xba3ea000-0xbb535fff] 
reserved
[0.00] reserve 

Re: [PATCH 0/5] kexec_file_load() for arm

2020-06-11 Thread Dave Young
Cc kexec list in case people may missed the pieces.
On 06/01/20 at 04:27pm, Łukasz Stelmach wrote:
> The following series of patches provides implementation of the
> kexec_file_load() system call form the arm architecture. zImage and uImage
> (legacy format) files are supported. Like on arm64, there is no
> possibility of loading a new DTB and the currently loaded is reused.
> 
> Łukasz Stelmach (5):
>   arm: decompressor: set malloc pool size for the decompressor
>   arm: add image header definitions
>   arm: decompressor: define a new zImage tag
>   arm: Add kexec_image_info
>   arm: kexec_file: load zImage or uImage, initrd and dtb
> 
>  arch/arm/Kconfig   |  15 ++
>  arch/arm/boot/compressed/Makefile  |   2 +
>  arch/arm/boot/compressed/head.S|   9 +-
>  arch/arm/boot/compressed/vmlinux.lds.S |  22 +--
>  arch/arm/include/asm/image.h   |  87 ++
>  arch/arm/include/asm/kexec.h   |  14 ++
>  arch/arm/kernel/Makefile   |   5 +-
>  arch/arm/kernel/kexec_uimage.c |  80 ++
>  arch/arm/kernel/kexec_zimage.c | 199 +++
>  arch/arm/kernel/machine_kexec.c|  39 -
>  arch/arm/kernel/machine_kexec_file.c   | 209 +
>  11 files changed, 662 insertions(+), 19 deletions(-)
>  create mode 100644 arch/arm/include/asm/image.h
>  create mode 100644 arch/arm/kernel/kexec_uimage.c
>  create mode 100644 arch/arm/kernel/kexec_zimage.c
>  create mode 100644 arch/arm/kernel/machine_kexec_file.c
> 
> -- 
> 2.26.2
> 



Re: [PATCH v2] kexec: Do not verify the signature without the lockdown or mandatory signature

2020-06-03 Thread Dave Young
On 06/02/20 at 12:59pm, Lianbo Jiang wrote:
> Signature verification is an important security feature, to protect
> system from being attacked with a kernel of unknown origin. Kexec
> rebooting is a way to replace the running kernel, hence need be
> secured carefully.
> 
> In the current code of handling signature verification of kexec kernel,
> the logic is very twisted. It mixes signature verification, IMA signature
> appraising and kexec lockdown.
> 
> If there is no KEXEC_SIG_FORCE, kexec kernel image doesn't have one of
> signature, the supported crypto, and key, we don't think this is wrong,
> Unless kexec lockdown is executed. IMA is considered as another kind of
> signature appraising method.
> 
> If kexec kernel image has signature/crypto/key, it has to go through the
> signature verification and pass. Otherwise it's seen as verification
> failure, and won't be loaded.
> 
> Seems kexec kernel image with an unqualified signature is even worse than
> those w/o signature at all, this sounds very unreasonable. E.g. If people
> get a unsigned kernel to load, or a kernel signed with expired key, which
> one is more dangerous?
> 
> So, here, let's simplify the logic to improve code readability. If the
> KEXEC_SIG_FORCE enabled or kexec lockdown enabled, signature verification
> is mandated. Otherwise, we lift the bar for any kernel image.
> 
> Signed-off-by: Lianbo Jiang 
> ---
> Changes since v1:
> [1] Modify the log level(suggested by Jiri Bohac)
> 
>  kernel/kexec_file.c | 34 ++
>  1 file changed, 6 insertions(+), 28 deletions(-)
> 
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index faa74d5f6941..fae496958a68 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -181,34 +181,19 @@ void kimage_file_post_load_cleanup(struct kimage *image)
>  static int
>  kimage_validate_signature(struct kimage *image)
>  {
> - const char *reason;
>   int ret;
>  
>   ret = arch_kexec_kernel_verify_sig(image, image->kernel_buf,
>  image->kernel_buf_len);
> - switch (ret) {
> - case 0:
> - break;
> + if (ret) {
>  
> - /* Certain verification errors are non-fatal if we're not
> -  * checking errors, provided we aren't mandating that there
> -  * must be a valid signature.
> -  */
> - case -ENODATA:
> - reason = "kexec of unsigned image";
> - goto decide;
> - case -ENOPKG:
> - reason = "kexec of image with unsupported crypto";
> - goto decide;
> - case -ENOKEY:
> - reason = "kexec of image with unavailable key";
> - decide:
>   if (IS_ENABLED(CONFIG_KEXEC_SIG_FORCE)) {
> - pr_notice("%s rejected\n", reason);
> + pr_notice("Enforced kernel signature verification 
> failed (%d).\n", ret);
>   return ret;
>   }
>  
> - /* If IMA is guaranteed to appraise a signature on the kexec
> + /*
> +  * If IMA is guaranteed to appraise a signature on the kexec
>* image, permit it even if the kernel is otherwise locked
>* down.
>*/
> @@ -216,17 +201,10 @@ kimage_validate_signature(struct kimage *image)
>   security_locked_down(LOCKDOWN_KEXEC))
>   return -EPERM;
>  
> - return 0;
> -
> - /* All other errors are fatal, including nomem, unparseable
> -  * signatures and signature check failures - even if signatures
> -  * aren't required.
> -  */
> - default:
> - pr_notice("kernel signature verification failed (%d).\n", ret);
> + pr_debug("kernel signature verification failed (%d).\n", ret);
>   }
>  
> - return ret;
> + return 0;
>  }
>  #endif
>  
> -- 
> 2.17.1
> 

Acked-by: Dave Young 

Thanks
Dave



Re: [PATCH] kexec: Do not verify the signature without the lockdown or mandatory signature

2020-05-25 Thread Dave Young
On 05/25/20 at 01:23pm, Lianbo Jiang wrote:
> Signature verification is an important security feature, to protect
> system from being attacked with a kernel of unknown origin. Kexec
> rebooting is a way to replace the running kernel, hence need be
> secured carefully.
> 
> In the current code of handling signature verification of kexec kernel,
> the logic is very twisted. It mixes signature verification, IMA signature
> appraising and kexec lockdown.
> 
> If there is no KEXEC_SIG_FORCE, kexec kernel image doesn't have one of
> signature, the supported crypto, and key, we don't think this is wrong,
> Unless kexec lockdown is executed. IMA is considered as another kind of
> signature appraising method.
> 
> If kexec kernel image has signature/crypto/key, it has to go through the
> signature verification and pass. Otherwise it's seen as verification
> failure, and won't be loaded.
> 
> Seems kexec kernel image with an unqualified signature is even worse than
> those w/o signature at all, this sounds very unreasonable. E.g. If people
> get a unsigned kernel to load, or a kernel signed with expired key, which
> one is more dangerous?
> 
> So, here, let's simplify the logic to improve code readability. If the
> KEXEC_SIG_FORCE enabled or kexec lockdown enabled, signature verification
> is mandated. Otherwise, we lift the bar for any kernel image.
> 
> Signed-off-by: Lianbo Jiang 
> ---
>  kernel/kexec_file.c | 37 ++---
>  1 file changed, 6 insertions(+), 31 deletions(-)
> 
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index faa74d5f6941..e4bdf0c42f35 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -181,52 +181,27 @@ void kimage_file_post_load_cleanup(struct kimage *image)
>  static int
>  kimage_validate_signature(struct kimage *image)
>  {
> - const char *reason;
>   int ret;
>  
>   ret = arch_kexec_kernel_verify_sig(image, image->kernel_buf,
>  image->kernel_buf_len);
> - switch (ret) {
> - case 0:
> - break;
> + if (ret) {
> + pr_debug("kernel signature verification failed (%d).\n", ret);
>  
> - /* Certain verification errors are non-fatal if we're not
> -  * checking errors, provided we aren't mandating that there
> -  * must be a valid signature.
> -  */
> - case -ENODATA:
> - reason = "kexec of unsigned image";
> - goto decide;
> - case -ENOPKG:
> - reason = "kexec of image with unsupported crypto";
> - goto decide;
> - case -ENOKEY:
> - reason = "kexec of image with unavailable key";
> - decide:
> - if (IS_ENABLED(CONFIG_KEXEC_SIG_FORCE)) {
> - pr_notice("%s rejected\n", reason);
> + if (IS_ENABLED(CONFIG_KEXEC_SIG_FORCE))
>   return ret;
> - }
>  
> - /* If IMA is guaranteed to appraise a signature on the kexec
> + /*
> +  * If IMA is guaranteed to appraise a signature on the kexec
>* image, permit it even if the kernel is otherwise locked
>* down.
>*/
>   if (!ima_appraise_signature(READING_KEXEC_IMAGE) &&
>   security_locked_down(LOCKDOWN_KEXEC))
>   return -EPERM;
> -
> - return 0;
> -
> - /* All other errors are fatal, including nomem, unparseable
> -  * signatures and signature check failures - even if signatures
> -  * aren't required.
> -  */
> - default:
> - pr_notice("kernel signature verification failed (%d).\n", ret);
>   }
>  
> - return ret;
> + return 0;
>  }
>  #endif
>  
> -- 
> 2.17.1
> 


Acked-by: Dave Young 

Thanks
Dave



[tip: efi/urgent] efi/earlycon: Fix early printk for wider fonts

2020-05-22 Thread tip-bot2 for Dave Young
The following commit has been merged into the efi/urgent branch of tip:

Commit-ID: 8f592ada59b321d248391bae175cd78a12972223
Gitweb:
https://git.kernel.org/tip/8f592ada59b321d248391bae175cd78a12972223
Author:Dave Young 
AuthorDate:Sun, 12 Apr 2020 10:49:27 +08:00
Committer: Ard Biesheuvel 
CommitterDate: Tue, 12 May 2020 12:29:45 +02:00

efi/earlycon: Fix early printk for wider fonts

When I play with terminus fonts I noticed the efi early printk does
not work because the earlycon code assumes font width is 8.

Here add the code to adapt with larger fonts.  Tested with all kinds
of kernel built-in fonts on my laptop. Also tested with a local draft
patch for 14x28 !bold terminus font.

Signed-off-by: Dave Young 
Link: https://lore.kernel.org/r/20200412024927.ga6...@dhcp-128-65.nay.redhat.com
Signed-off-by: Ard Biesheuvel 
---
 drivers/firmware/efi/earlycon.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/firmware/efi/earlycon.c b/drivers/firmware/efi/earlycon.c
index 5d4f847..a52236e 100644
--- a/drivers/firmware/efi/earlycon.c
+++ b/drivers/firmware/efi/earlycon.c
@@ -114,14 +114,16 @@ static void efi_earlycon_write_char(u32 *dst, unsigned 
char c, unsigned int h)
const u32 color_black = 0x;
const u32 color_white = 0x00ff;
const u8 *src;
-   u8 s8;
-   int m;
+   int m, n, bytes;
+   u8 x;
 
-   src = font->data + c * font->height;
-   s8 = *(src + h);
+   bytes = BITS_TO_BYTES(font->width);
+   src = font->data + c * font->height * bytes + h * bytes;
 
-   for (m = 0; m < 8; m++) {
-   if ((s8 >> (7 - m)) & 1)
+   for (m = 0; m < font->width; m++) {
+   n = m % 8;
+   x = *(src + m / 8);
+   if ((x >> (7 - n)) & 1)
*dst = color_white;
else
*dst = color_black;


Re: [PATCH] MAINTAINERS: add files related to kdump

2020-05-20 Thread Dave Young
On 05/20/20 at 05:43pm, Baoquan He wrote:
> On 05/20/20 at 05:14pm, Dave Young wrote:
> > Hi Baoquan,
> > On 05/20/20 at 04:05pm, Baoquan He wrote:
> > > Kdump is implemented based on kexec, however some files are only
> > > related to crash dumping and missing, add them to KDUMP entry.
> > > 
> > > Signed-off-by: Baoquan He 
> > > ---
> > >  MAINTAINERS | 5 +
> > >  1 file changed, 5 insertions(+)
> > > 
> > > diff --git a/MAINTAINERS b/MAINTAINERS
> > > index 83cf5c43242a..2f9eefd33114 100644
> > > --- a/MAINTAINERS
> > > +++ b/MAINTAINERS
> > > @@ -9251,6 +9251,11 @@ L: ke...@lists.infradead.org
> > >  S:   Maintained
> > >  W:   http://lse.sourceforge.net/kdump/
> > >  F:   Documentation/admin-guide/kdump/
> > > +F:   fs/proc/vmcore.c
> > > +F:   include/linux/crash_core.h
> > > +F:   include/linux/crash_dump.h
> > > +F:   include/uapi/linux/vmcore.h
> > > +F:   kernel/crash.*
> > 
> > ls kernel/crash.*
> > ls: cannot access 'kernel/crash.*': No such file or directory
> > 
> > But ls kernel/crash*
> > kernel/crash_core.c  kernel/crash_dump.c
> 
> Is below patten ok? Surely kernel/crash* is also OK to me. Thanks.
> 
> F:kernel/crash_*.c

Yes, looks good, thanks!

Acked-by: Dave Young 

Thanks
Dave



Re: [PATCH] MAINTAINERS: add files related to kdump

2020-05-20 Thread Dave Young
Hi Baoquan,
On 05/20/20 at 04:05pm, Baoquan He wrote:
> Kdump is implemented based on kexec, however some files are only
> related to crash dumping and missing, add them to KDUMP entry.
> 
> Signed-off-by: Baoquan He 
> ---
>  MAINTAINERS | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 83cf5c43242a..2f9eefd33114 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -9251,6 +9251,11 @@ L: ke...@lists.infradead.org
>  S:   Maintained
>  W:   http://lse.sourceforge.net/kdump/
>  F:   Documentation/admin-guide/kdump/
> +F:   fs/proc/vmcore.c
> +F:   include/linux/crash_core.h
> +F:   include/linux/crash_dump.h
> +F:   include/uapi/linux/vmcore.h
> +F:   kernel/crash.*

ls kernel/crash.*
ls: cannot access 'kernel/crash.*': No such file or directory

But ls kernel/crash*
kernel/crash_core.c  kernel/crash_dump.c

Thanks
Dave



Re: [PATCH v2] x86/kdump: Fix 'kmem -s' reported an invalid freepointer when SME was active

2019-10-07 Thread Dave Young
Hi Lianbo,
On 10/07/19 at 03:08pm, Lianbo Jiang wrote:
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204793
> 
> Kdump kernel will reuse the first 640k region because of some reasons,
> for example: the trampline and conventional PC system BIOS region may
> require to allocate memory in this area. Obviously, kdump kernel will
> also overwrite the first 640k region, therefore, kernel has to copy
> the contents of the first 640k area to a backup area, which is done in
> purgatory(), because vmcore may need the old memory. When vmcore is
> dumped, kdump kernel will read the old memory from the backup area of
> the first 640k area.
> 
> Basically, the main reason should be clear, kernel does not correctly
> handle the first 640k region when SME is active, which causes that
> kernel does not properly copy these old memory to the backup area in
> purgatory(). Therefore, kdump kernel reads out the incorrect contents
> from the backup area when dumping vmcore. Finally, the phenomenon is
> as follow:
> 
> [root linux]$ crash vmlinux /var/crash/127.0.0.1-2019-09-19-08\:31\:27/vmcore
> WARNING: kernel relocated [240MB]: patching 97110 gdb minimal_symbol values
> 
>   KERNEL: /var/crash/127.0.0.1-2019-09-19-08:31:27/vmlinux
> DUMPFILE: /var/crash/127.0.0.1-2019-09-19-08:31:27/vmcore  [PARTIAL DUMP]
> CPUS: 128
> DATE: Thu Sep 19 08:31:18 2019
>   UPTIME: 00:01:21
> LOAD AVERAGE: 0.16, 0.07, 0.02
>TASKS: 1343
> NODENAME: amd-ethanol
>  RELEASE: 5.3.0-rc7+
>  VERSION: #4 SMP Thu Sep 19 08:14:00 EDT 2019
>  MACHINE: x86_64  (2195 Mhz)
>   MEMORY: 127.9 GB
>PANIC: "Kernel panic - not syncing: sysrq triggered crash"
>  PID: 9789
>  COMMAND: "bash"
> TASK: "89711894ae80  [THREAD_INFO: 89711894ae80]"
>  CPU: 83
>STATE: TASK_RUNNING (PANIC)
> 
> crash> kmem -s|grep -i invalid
> kmem: dma-kmalloc-512: slab:d77680001c00 invalid 
> freepointer:a6086ac099f0c5a4
> kmem: dma-kmalloc-512: slab:d77680001c00 invalid 
> freepointer:a6086ac099f0c5a4
> crash>
> 
> BTW: I also tried to fix the above problem in purgatory(), but there
> are too many restricts in purgatory() context, for example: i can't
> allocate new memory to create the identity mapping page table for SME
> situation.
> 
> Currently, there are two places where the first 640k area is needed,
> the first one is in the find_trampoline_placement(), another one is
> in the reserve_real_mode(), and their content doesn't matter. To avoid
> the above error, lets occupy the remain memory of the first 640k region
> (expect for the trampoline and real mode) so that the allocated memory
> does not fall into the first 640k area when SME is active, which makes
> us not to worry about whether kernel can correctly copy the contents of
> the first 640k area to a backup region in the purgatory().
> 
> Signed-off-by: Lianbo Jiang 
> ---
> Changes since v1:
> 1. Improve patch log
> 2. Change the checking condition from sme_active() to sme_active()
>&& strstr(boot_command_line, "crashkernel=")
> 
>  arch/x86/kernel/setup.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 77ea96b794bd..bdb1a02a84fd 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1148,6 +1148,9 @@ void __init setup_arch(char **cmdline_p)
>  
>   reserve_real_mode();
>  
> + if (sme_active() && strstr(boot_command_line, "crashkernel="))
> + memblock_reserve(0, 640*1024);
> +

Seems you missed the comment about "unconditionally do it", only check
crashkernel param looks better.

Also I noticed reserve_crashkernel is called after initmem_init, I'm not
sure if memblock_reserve is good enough in early code before
initmem_init. 

>   trim_platform_memory_ranges();
>   trim_low_memory_range();
>  
> -- 
> 2.17.1
> 

Thanks
Dave


Re: [PATCH V38 15/29] acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down

2019-08-13 Thread Dave Young
On 08/07/19 at 05:07pm, Matthew Garrett wrote:
> From: Josh Boyer 
> 
> This option allows userspace to pass the RSDP address to the kernel, which
> makes it possible for a user to modify the workings of hardware. Reject
> the option when the kernel is locked down. This requires some reworking
> of the existing RSDP command line logic, since the early boot code also
> makes use of a command-line passed RSDP when locating the SRAT table
> before the lockdown code has been initialised. This is achieved by
> separating the command line RSDP path in the early boot code from the
> generic RSDP path, and then copying the command line RSDP into boot
> params in the kernel proper if lockdown is not enabled. If lockdown is
> enabled and an RSDP is provided on the command line, this will only be
> used when parsing SRAT (which shouldn't permit kernel code execution)
> and will be ignored in the rest of the kernel.
> 
> (Modified by Matthew Garrett in order to handle the early boot RSDP
> environment)
> 
> Signed-off-by: Josh Boyer 
> Signed-off-by: David Howells 
> Signed-off-by: Matthew Garrett 
> Reviewed-by: Kees Cook 
> cc: Dave Young 
> cc: linux-a...@vger.kernel.org
> ---
>  arch/x86/boot/compressed/acpi.c | 19 +--
>  arch/x86/include/asm/acpi.h |  9 +
>  arch/x86/include/asm/x86_init.h |  2 ++
>  arch/x86/kernel/acpi/boot.c |  5 +
>  arch/x86/kernel/x86_init.c  |  1 +
>  drivers/acpi/osl.c  | 14 +-
>  include/linux/acpi.h|  6 ++
>  7 files changed, 49 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
> index 15255f388a85..149795c369f2 100644
> --- a/arch/x86/boot/compressed/acpi.c
> +++ b/arch/x86/boot/compressed/acpi.c
> @@ -26,7 +26,7 @@ struct mem_vector immovable_mem[MAX_NUMNODES*2];
>   */
>  #define MAX_ADDR_LEN 19
>  
> -static acpi_physical_address get_acpi_rsdp(void)
> +static acpi_physical_address get_cmdline_acpi_rsdp(void)
>  {
>   acpi_physical_address addr = 0;
>  
> @@ -278,10 +278,7 @@ acpi_physical_address get_rsdp_addr(void)
>  {
>   acpi_physical_address pa;
>  
> - pa = get_acpi_rsdp();
> -
> - if (!pa)
> - pa = boot_params->acpi_rsdp_addr;
> + pa = boot_params->acpi_rsdp_addr;
>  
>   /*
>* Try to get EFI data from setup_data. This can happen when we're a
> @@ -311,7 +308,17 @@ static unsigned long get_acpi_srat_table(void)
>   char arg[10];
>   u8 *entry;
>  
> - rsdp = (struct acpi_table_rsdp *)(long)boot_params->acpi_rsdp_addr;
> + /*
> +  * Check whether we were given an RSDP on the command line. We don't
> +  * stash this in boot params because the kernel itself may have
> +  * different ideas about whether to trust a command-line parameter.
> +  */
> + rsdp = (struct acpi_table_rsdp *)get_cmdline_acpi_rsdp();
> +
> + if (!rsdp)
> + rsdp = (struct acpi_table_rsdp *)(long)
> + boot_params->acpi_rsdp_addr;
> +
>   if (!rsdp)
>   return 0;
>  
> diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
> index aac686e1e005..bc9693c9107e 100644
> --- a/arch/x86/include/asm/acpi.h
> +++ b/arch/x86/include/asm/acpi.h
> @@ -117,6 +117,12 @@ static inline bool acpi_has_cpu_in_madt(void)
>   return !!acpi_lapic;
>  }
>  
> +#define ACPI_HAVE_ARCH_SET_ROOT_POINTER
> +static inline void acpi_arch_set_root_pointer(u64 addr)
> +{
> + x86_init.acpi.set_root_pointer(addr);
> +}
> +
>  #define ACPI_HAVE_ARCH_GET_ROOT_POINTER
>  static inline u64 acpi_arch_get_root_pointer(void)
>  {
> @@ -125,6 +131,7 @@ static inline u64 acpi_arch_get_root_pointer(void)
>  
>  void acpi_generic_reduced_hw_init(void);
>  
> +void x86_default_set_root_pointer(u64 addr);
>  u64 x86_default_get_root_pointer(void);
>  
>  #else /* !CONFIG_ACPI */
> @@ -138,6 +145,8 @@ static inline void disable_acpi(void) { }
>  
>  static inline void acpi_generic_reduced_hw_init(void) { }
>  
> +static inline void x86_default_set_root_pointer(u64 addr) { }
> +
>  static inline u64 x86_default_get_root_pointer(void)
>  {
>   return 0;
> diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
> index ac0934189017..19435858df5f 100644
> --- a/arch/x86/include/asm/x86_init.h
> +++ b/arch/x86/include/asm/x86_init.h
> @@ -134,10 +134,12 @@ struct x86_hyper_init {
>  
>  /**
>   * struct x86_init_acpi - x86 ACPI init functions
> + * @set_root_poitner:set RSDP address
>   * @get_root_pointer:get RSDP address
>   *

Re: [PATCH V35 15/29] acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down

2019-07-15 Thread Dave Young
Hi,
On 07/15/19 at 12:59pm, Matthew Garrett wrote:
> From: Josh Boyer 
> 
> This option allows userspace to pass the RSDP address to the kernel, which
> makes it possible for a user to modify the workings of hardware .  Reject
> the option when the kernel is locked down.
> 
> Signed-off-by: Josh Boyer 
> Signed-off-by: David Howells 
> Signed-off-by: Matthew Garrett 
> Reviewed-by: Kees Cook 
> cc: Dave Young 
> cc: linux-a...@vger.kernel.org
> ---
>  drivers/acpi/osl.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index 9c0edf2fc0dd..06e7cffc4386 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -26,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -180,7 +181,7 @@ acpi_physical_address __init 
> acpi_os_get_root_pointer(void)
>   acpi_physical_address pa;
>  
>  #ifdef CONFIG_KEXEC
> - if (acpi_rsdp)
> + if (acpi_rsdp && !security_locked_down(LOCKDOWN_ACPI_TABLES))
>   return acpi_rsdp;

I'm very sorry I noticed this late, but have to say this will not work for
X86 with latest kernel code.

acpi_physical_address __init acpi_os_get_root_pointer(void)
{
acpi_physical_address pa;

#ifdef CONFIG_KEXEC
if (acpi_rsdp)
return acpi_rsdp;
#endif
pa = acpi_arch_get_root_pointer();
if (pa)
return pa;
[snip]

In above code, the later acpi_arch_get_root_pointer still get acpi_rsdp
from cmdline param if provided.

You can check the arch/x86/boot/compressed/acpi.c, and
arch/x86/kernel/acpi/boot.c

In X86 early code, cmdline provided acpi_rsdp pointer will be saved in 
boot_params.acpi_rsdp_addr;
and the used in x86_default_get_root_pointer
 
>  #endif
>   pa = acpi_arch_get_root_pointer();
> -- 
> 2.22.0.510.g264f2c817a-goog
> 

Thanks
Dave


Re: [PATCH v2] kexec: fix warnig of crash_zero_bytes in crash.c

2019-06-23 Thread Dave Young
On 06/24/19 at 09:35am, Dave Young wrote:
> On 06/23/19 at 06:24am, Tiezhu Yang wrote:
> > Fix the following sparse warning:
> > 
> > arch/x86/kernel/crash.c:59:15:
> > warning: symbol 'crash_zero_bytes' was not declared. Should it be static?
> > 
> > First, make crash_zero_bytes static. In addition, crash_zero_bytes
> > is used when CONFIG_KEXEC_FILE is set, so make it only available
> > under CONFIG_KEXEC_FILE. Otherwise, if CONFIG_KEXEC_FILE is not set,
> > the following warning will appear when make crash_zero_bytes static:
> > 
> > arch/x86/kernel/crash.c:59:22:
> > warning: ‘crash_zero_bytes’ defined but not used [-Wunused-variable]
> > 
> > Fixes: dd5f726076cc ("kexec: support for kexec on panic using new system 
> > call")
> > Signed-off-by: Tiezhu Yang 
> > Cc: Vivek Goyal 
> > ---
> >  arch/x86/kernel/crash.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> > index 576b2e1..f13480e 100644
> > --- a/arch/x86/kernel/crash.c
> > +++ b/arch/x86/kernel/crash.c
> > @@ -56,7 +56,9 @@ struct crash_memmap_data {
> >   */
> >  crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss = NULL;
> >  EXPORT_SYMBOL_GPL(crash_vmclear_loaded_vmcss);
> > -unsigned long crash_zero_bytes;
> > +#ifdef CONFIG_KEXEC_FILE
> > +static unsigned long crash_zero_bytes;
> > +#endif
> >  
> >  static inline void cpu_crash_vmclear_loaded_vmcss(void)
> >  {
> > -- 
> > 1.8.3.1
> 
> Acked-by: Dave Young 

BTW, a soft reminder, for kexec patches, it would be better to cc kexec mail
list.

> 
> Thanks
> Dave
> 


Re: [PATCH v2] kexec: fix warnig of crash_zero_bytes in crash.c

2019-06-23 Thread Dave Young
On 06/23/19 at 06:24am, Tiezhu Yang wrote:
> Fix the following sparse warning:
> 
> arch/x86/kernel/crash.c:59:15:
> warning: symbol 'crash_zero_bytes' was not declared. Should it be static?
> 
> First, make crash_zero_bytes static. In addition, crash_zero_bytes
> is used when CONFIG_KEXEC_FILE is set, so make it only available
> under CONFIG_KEXEC_FILE. Otherwise, if CONFIG_KEXEC_FILE is not set,
> the following warning will appear when make crash_zero_bytes static:
> 
> arch/x86/kernel/crash.c:59:22:
> warning: ‘crash_zero_bytes’ defined but not used [-Wunused-variable]
> 
> Fixes: dd5f726076cc ("kexec: support for kexec on panic using new system 
> call")
> Signed-off-by: Tiezhu Yang 
> Cc: Vivek Goyal 
> ---
>  arch/x86/kernel/crash.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index 576b2e1..f13480e 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -56,7 +56,9 @@ struct crash_memmap_data {
>   */
>  crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss = NULL;
>  EXPORT_SYMBOL_GPL(crash_vmclear_loaded_vmcss);
> -unsigned long crash_zero_bytes;
> +#ifdef CONFIG_KEXEC_FILE
> +static unsigned long crash_zero_bytes;
> +#endif
>  
>  static inline void cpu_crash_vmclear_loaded_vmcss(void)
>  {
> -- 
> 1.8.3.1

Acked-by: Dave Young 

Thanks
Dave



Re: [PATCH V8 3/3] Call ima_kexec_cmdline to measure the cmdline args

2019-06-13 Thread Dave Young
On 06/12/19 at 06:31pm, Mimi Zohar wrote:
> [Cc: kexec mailing list]
> 
> Hi Eric, Dave,
> 
> On Wed, 2019-06-12 at 15:15 -0700, Prakhar Srivastava wrote:
> > During soft reboot(kexec_file_load) boot cmdline args
> > are not measured.Thus the new kernel on load boots with
> > an assumption of cold reboot.
> > 
> > This patch makes a call to the ima hook ima_kexec_cmdline,
> > added in "Define a new IMA hook to measure the boot command
> > line arguments"
> > to measure the boot cmdline args into the ima log.
> > 
> > - call ima_kexec_cmdline from kexec_file_load.
> > - move the call ima_add_kexec_buffer after the cmdline
> > args have been measured.
> > 
> > Signed-off-by: Prakhar Srivastava 
> Cc: Eric W. Biederman 
> Cc: Dave Young 
> 
> Any chance we could get some Acks?

The ima_* is blackbox functions to me, looks like this patch is trying
to measure kexec cmdline buffer and save in some ima logs and then add all the
measure results including those for kernel/initrd to a kexec_buf and pass to 2nd
kernel.

It should be good and only take effect when IMA enabled. If all the
assumptions are right:

Acked-by: Dave Young 
> 
> thanks,
> 
> Mimi
> 
> > ---
> >  kernel/kexec_file.c | 9 ++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> > 
> > diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> > index 072b6ee55e3f..b0c724e5d86c 100644
> > --- a/kernel/kexec_file.c
> > +++ b/kernel/kexec_file.c
> > @@ -198,9 +198,6 @@ kimage_file_prepare_segments(struct kimage *image, int 
> > kernel_fd, int initrd_fd,
> > return ret;
> > image->kernel_buf_len = size;
> >  
> > -   /* IMA needs to pass the measurement list to the next kernel. */
> > -   ima_add_kexec_buffer(image);
> > -
> > /* Call arch image probe handlers */
> > ret = arch_kexec_kernel_image_probe(image, image->kernel_buf,
> > image->kernel_buf_len);
> > @@ -241,8 +238,14 @@ kimage_file_prepare_segments(struct kimage *image, int 
> > kernel_fd, int initrd_fd,
> > ret = -EINVAL;
> > goto out;
> > }
> > +
> > +   ima_kexec_cmdline(image->cmdline_buf,
> > + image->cmdline_buf_len - 1);
> > }
> >  
> > +   /* IMA needs to pass the measurement list to the next kernel. */
> > +   ima_add_kexec_buffer(image);
> > +
> > /* Call arch image load handlers */
> > ldata = arch_kexec_kernel_image_load(image);
> >  
> 
> 
> ___
> kexec mailing list
> ke...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

Thanks
Dave


Re: [PATCH] powerpc: Fix loading of kernel + initramfs with kexec_file_load()

2019-05-22 Thread Dave Young
On 05/22/19 at 07:01pm, Thiago Jung Bauermann wrote:
> Commit b6664ba42f14 ("s390, kexec_file: drop arch_kexec_mem_walk()")
> changed kexec_add_buffer() to skip searching for a memory location if
> kexec_buf.mem is already set, and use the address that is there.
> 
> In powerpc code we reuse a kexec_buf variable for loading both the kernel
> and the initramfs by resetting some of the fields between those uses, but
> not mem. This causes kexec_add_buffer() to try to load the kernel at the
> same address where initramfs will be loaded, which is naturally rejected:
> 
>   # kexec -s -l --initrd initramfs vmlinuz
>   kexec_file_load failed: Invalid argument
> 
> Setting the mem field before every call to kexec_add_buffer() fixes this
> regression.
> 
> Fixes: b6664ba42f14 ("s390, kexec_file: drop arch_kexec_mem_walk()")
> Signed-off-by: Thiago Jung Bauermann 
> ---
>  arch/powerpc/kernel/kexec_elf_64.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/kexec_elf_64.c 
> b/arch/powerpc/kernel/kexec_elf_64.c
> index ba4f18a43ee8..52a29fc73730 100644
> --- a/arch/powerpc/kernel/kexec_elf_64.c
> +++ b/arch/powerpc/kernel/kexec_elf_64.c
> @@ -547,6 +547,7 @@ static int elf_exec_load(struct kimage *image, struct 
> elfhdr *ehdr,
>   kbuf.memsz = phdr->p_memsz;
>   kbuf.buf_align = phdr->p_align;
>   kbuf.buf_min = phdr->p_paddr + base;
> + kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
>   ret = kexec_add_buffer();
>   if (ret)
>   goto out;
> @@ -581,7 +582,8 @@ static void *elf64_load(struct kimage *image, char 
> *kernel_buf,
>   struct kexec_buf kbuf = { .image = image, .buf_min = 0,
> .buf_max = ppc64_rma_size };
>   struct kexec_buf pbuf = { .image = image, .buf_min = 0,
> -   .buf_max = ppc64_rma_size, .top_down = true };
> +   .buf_max = ppc64_rma_size, .top_down = true,
> +   .mem = KEXEC_BUF_MEM_UNKNOWN };
>  
>   ret = build_elf_exec_info(kernel_buf, kernel_len, , _info);
>   if (ret)
> @@ -606,6 +608,7 @@ static void *elf64_load(struct kimage *image, char 
> *kernel_buf,
>   kbuf.bufsz = kbuf.memsz = initrd_len;
>   kbuf.buf_align = PAGE_SIZE;
>   kbuf.top_down = false;
> + kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
>   ret = kexec_add_buffer();
>   if (ret)
>   goto out;
> @@ -638,6 +641,7 @@ static void *elf64_load(struct kimage *image, char 
> *kernel_buf,
>   kbuf.bufsz = kbuf.memsz = fdt_size;
>   kbuf.buf_align = PAGE_SIZE;
>   kbuf.top_down = true;
> + kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
>   ret = kexec_add_buffer();
>   if (ret)
>   goto out;
> 
> 
> ___
> kexec mailing list
> ke...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

Reviewed-by: Dave Young 

Thanks
Dave


Re: [PATCH v4 2/3] x86/kexec/64: Error out if try to jump to old 4-level kernel from 5-level kernel

2019-05-21 Thread Dave Young
On 05/22/19 at 11:20am, Dave Young wrote:
> On 05/09/19 at 09:36am, Baoquan He wrote:
> > If the running kernel has 5-level paging activated, the 5-level paging
> > mode is preserved across kexec. If the kexec'ed kernel does not contain
> > support for handling active 5-level paging mode in the decompressor, the
> > decompressor will crash with #GP.
> > 
> > Prevent this situation at load time. If 5-level paging is active, check the
> > xloadflags whether the kexec kernel can handle 5-level paging at least in
> > the decompressor. If not, reject the load attempt and print out error
> > message.
> > 
> > Signed-off-by: Baoquan He 
> > Acked-by: Kirill A. Shutemov 
> > ---
> >  arch/x86/kernel/kexec-bzimage64.c | 5 +
> 
> How about the userspace kexec-tools?  It needs a similar detection, but
> I'm not sure how to detect paging mode, maybe some sysfs entry or
> vmcoreinfo in /proc/vmcore

meant /proc/kcore ...

Thanks
Dave


Re: [PATCH v4 2/3] x86/kexec/64: Error out if try to jump to old 4-level kernel from 5-level kernel

2019-05-21 Thread Dave Young
On 05/09/19 at 09:36am, Baoquan He wrote:
> If the running kernel has 5-level paging activated, the 5-level paging
> mode is preserved across kexec. If the kexec'ed kernel does not contain
> support for handling active 5-level paging mode in the decompressor, the
> decompressor will crash with #GP.
> 
> Prevent this situation at load time. If 5-level paging is active, check the
> xloadflags whether the kexec kernel can handle 5-level paging at least in
> the decompressor. If not, reject the load attempt and print out error
> message.
> 
> Signed-off-by: Baoquan He 
> Acked-by: Kirill A. Shutemov 
> ---
>  arch/x86/kernel/kexec-bzimage64.c | 5 +

How about the userspace kexec-tools?  It needs a similar detection, but
I'm not sure how to detect paging mode, maybe some sysfs entry or
vmcoreinfo in /proc/vmcore


>  1 file changed, 5 insertions(+)
> 
> diff --git a/arch/x86/kernel/kexec-bzimage64.c 
> b/arch/x86/kernel/kexec-bzimage64.c
> index 22f60dd26460..858cc892672f 100644
> --- a/arch/x86/kernel/kexec-bzimage64.c
> +++ b/arch/x86/kernel/kexec-bzimage64.c
> @@ -321,6 +321,11 @@ static int bzImage64_probe(const char *buf, unsigned 
> long len)
>   return ret;
>   }
>  
> + if (!(header->xloadflags & XLF_5LEVEL) && pgtable_l5_enabled()) {
> + pr_err("Can not jump to old 4-level kernel from 5-level 
> kernel.\n");

4-level kernel sounds not very clear, maybe something like below?

"5-level paging enabled, can not kexec into an old kernel without 5-level
paging facility"?

> + return ret;
> + }
> +
>   /* I've got a bzImage */
>   pr_debug("It's a relocatable bzImage64\n");
>   ret = 0;
> -- 
> 2.17.2
> 

Thanks
Dave


Re: [PATCH v4 3/3] x86/kdump/64: Change the upper limit of crashkernel reservation

2019-05-21 Thread Dave Young
Hi Baoquan,

A few nitpicks, otherwise
Acked-by: Dave Young 

On 05/09/19 at 09:36am, Baoquan He wrote:
> Restrict kdump to only reserve crashkernel below 64TB.
> 
> The reaons is that the kdump may jump from 5-level to 4-level, and if
> the kdump kernel is put above 64TB, then the jumping will fail. While the
> 1st kernel reserves crashkernel region during bootup, we don't know yet
> which kind of kernel will be loaded after system bootup, 5-level kernel
> or 5-level kernel.

5-level kernel or 4-level kernel ?
> 
> Signed-off-by: Baoquan He 
> Acked-by: Kirill A. Shutemov 
> ---
>  arch/x86/kernel/setup.c | 17 ++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 905dae880563..efb0934a46f6 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -452,15 +452,26 @@ static void __init 
> memblock_x86_reserve_range_setup_data(void)
>  #define CRASH_ALIGN  SZ_16M
>  
>  /*
> - * Keep the crash kernel below this limit.  On 32 bits earlier kernels
> - * would limit the kernel to the low 512 MiB due to mapping restrictions.
> + * Keep the crash kernel below this limit.
> + *
> + * On 32 bits earlier kernels would limit the kernel to the low
> + * 512 MiB due to mapping restrictions.
> + *
> + * On 64bit, old kexec-tools need to be under 896MiB. The later
> + * supports to put kernel above 4G, up to system RAM top. Here

Above two lines are not reflected in code because we have removed
the 896M limitation, it would be better to drop the two lines to
avoid confusion. 

> + * kdump kernel need be restricted to be under 64TB, which is
> + * the upper limit of system RAM in 4-level paing mode. Since
> + * the kdump jumping could be from 5-level to 4-level, the jumping
> + * will fail if kernel is put above 64TB, and there's no way to
> + * detect the paging mode of the kernel which will be loaded for
> + * dumping during the 1st kernel bootup.
>   */
>  #ifdef CONFIG_X86_32
>  # define CRASH_ADDR_LOW_MAX  SZ_512M
>  # define CRASH_ADDR_HIGH_MAX SZ_512M
>  #else
>  # define CRASH_ADDR_LOW_MAX  SZ_4G
> -# define CRASH_ADDR_HIGH_MAX MAXMEM
> +# define CRASH_ADDR_HIGH_MAX (64UL << 40)

Maybe add a new macro in sizes.h like SZ_64T

>  #endif
>  
>  static int __init reserve_crashkernel_low(void)
> -- 
> 2.17.2
> 

Thanks
Dave


Re: [PATCH] x86/kexec: always ensure EFI systab region is mapped

2019-04-24 Thread Dave Young
On 04/24/19 at 01:41pm, Baoquan He wrote:
> On 04/24/19 at 02:47am, Junichi Nomura wrote:
> > On 4/24/19 2:15 AM, Kairui Song wrote:
> > > On Mon, Apr 22, 2019 at 11:21 PM Junichi Nomura  
> > > wrote:
> > >> Is the mapping of ACPI tables just by luck, too?
> > > 
> > > Good question, they should have same issue with systab, I ignored this 
> > > one.
> > > Then in first kernel when doing kexec it should ensure both ACPI
> > > tables and the EFI systab are mapped, that should cover everything and
> > > make it work.
> > 
> > Right.
> > 
> > > Is there anything else missing?
> > No, as far as I looked around get_rsdp_addr().
> 
> Have made a draft patch to build ident mapping for ACPI tables too, it's
> based on Kairui's patch. Dave has tested on his t400s laptop, and
> passed. Please check if this adding is OK.
> 
> Kairui, you can add this into your patch to make a new one and resend.
> Or I can combine them and send for you today.
> 
> From 7f17fcb12a6fddbb0f5e56e5137cc81f25c04af4 Mon Sep 17 00:00:00 2001
> From: Baoquan He 
> Date: Wed, 24 Apr 2019 09:57:01 +0800
> Subject: [PATCH] x86/kexec: Prepare ACPI table mapping for kexec kernel
> 
> If the kernel decompressing code accesses ACPI tabels in kexec-ed kernel,
> they also need be 1:1 mapped.
> 
> ---
>  arch/x86/kernel/machine_kexec_64.c | 54 --
>  1 file changed, 51 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kernel/machine_kexec_64.c 
> b/arch/x86/kernel/machine_kexec_64.c
> index d5da54893f97..e2948aea27d4 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -30,6 +30,48 @@
>  #include 
>  #include 
>  
> +#ifdef CONFIG_ACPI
> +/**
> + * Used while adding mapping for ACPI tables.
> + * Can be reused when other iomem regions need be mapped
> + */
> +struct init_pgtable_data {
> + struct x86_mapping_info *info;
> + pgd_t *level4p;
> +};
> +
> +static int mem_region_callback(struct resource *res, void *arg)
> +{
> + struct init_pgtable_data *data= arg;
> + unsigned long mstart, mend;
> +
> + mstart = res->start;
> + mend = mstart + resource_size(res) - 1;
> +
> + return kernel_ident_mapping_init(data->info,
> + data->level4p, mstart, mend);
> +}
> +
> +static int init_acpi_pgtable(struct x86_mapping_info *info,
> +pgd_t *level4p)
> +{
> + unsigned long flags = IORESOURCE_MEM | IORESOURCE_BUSY;
> + struct init_pgtable_data data;
> +
> + data.info = info;
> + data.level4p = level4p;
> + flags = IORESOURCE_MEM | IORESOURCE_BUSY;
> + return walk_iomem_res_desc(IORES_DESC_ACPI_TABLES, flags, 0, -1,
> +, mem_region_callback);
> +}
> +#else
> +static int init_acpi_pgtable(struct x86_mapping_info *info,
> +pgd_t *level4p)
> +{
> + return 0;
> +}
> +#endif
> +
>  #ifdef CONFIG_KEXEC_FILE
>  const struct kexec_file_ops * const kexec_file_loaders[] = {
>   _bzImage64_ops,
> @@ -114,6 +156,8 @@ static void *alloc_pgt_page(void *data)
>   return p;
>  }
>  
> +
> +
>  #ifdef CONFIG_EFI
>  static int init_efi_systab_pgtable(struct x86_mapping_info *info,
>  pgd_t *level4p)
> @@ -191,14 +235,18 @@ static int init_pgtable(struct kimage *image, unsigned 
> long start_pgtable)
>   return result;
>   }
>  
> - /*
> -  * Prepare EFI systab mapping for kexec kernel, systab is not
> -  * covered by pfn_mapped.
> + /**
> +  * Prepare EFI systab and ACPI table mapping for kexec kernel,
> +  * since they are not covered by pfn_mapped.
>*/
>   result = init_efi_systab_pgtable(, level4p);
>   if (result)
>   return result;
>  
> + result = init_acpi_pgtable(, level4p);
> + if (result)
> + return result;
> +
>   return init_transition_pgtable(image, level4p);
>  }
>  
> -- 
> 2.17.2
> 

Since I can not reproduce the acpi table accessing fault with Kairui's patch,
the test is just sanity testing on same hardware. But the patch looks
good.

With Kairui's fix+ this acpi fix and Junichi's patch everything works.
Can anyone send them for example patch 1/2: kexec early mapping for
efi/acpi,  patch 2/2: Junichi's previous patch.

With these fixes I think kexec will just work.  Maybe we can go with
these fixes and leave other issues like the loader type flag etc as 
future issue.

Thanks
Dave


Re: [PATCH] x86/kexec: always ensure EFI systab region is mapped

2019-04-23 Thread Dave Young
On 04/23/19 at 06:20am, Junichi Nomura wrote:
> On 4/22/19 6:28 PM, Kairui Song wrote:
> > The reason is the systab region is not mapped by the identity mapping
> > provided by kexec. Currently kexec only create identity mapping for
> > mem regions, wihch won't cover the systab. So second kernel will be
> > accessing a not mapped memory region and cause fault.
> > But as kexec tend to pad the map region up to PUD size, the
> > systab could be included in the map by accident, so it worked on
> > some machines, but that will be broken easily and unstable.
> 
> Is the mapping of ACPI tables just by luck, too?

Hmm, guess it should be mapped by luck, here is the range on the T420: 
da99f000 - dae9efff Reserved (efi systab fall in this region)
daf9f000 - daffefff ACPI tables

Thanks
Dave


[tip:x86/kdump] x86/kdump: Have crashkernel=X reserve under 4G by default

2019-04-22 Thread tip-bot for Dave Young
Commit-ID:  9ca5c8e632ce8f144ec6d00da2dc5e16b41d593c
Gitweb: https://git.kernel.org/tip/9ca5c8e632ce8f144ec6d00da2dc5e16b41d593c
Author: Dave Young 
AuthorDate: Sun, 21 Apr 2019 11:50:59 +0800
Committer:  Borislav Petkov 
CommitDate: Mon, 22 Apr 2019 10:15:16 +0200

x86/kdump: Have crashkernel=X reserve under 4G by default

The kdump crashkernel low reservation is limited to under 896M even for
X86_64. This obscure and miserable limitation exists for compatibility
with old kexec-tools but the reason is not documented anywhere.

Some more tests/investigations about the background:

a) Previously, old kexec-tools could only load purgatory to memory under
   2G. Eric removed that limitation in 2012 in kexec-tools:

 b4f9f8599679 ("kexec x86_64: Make purgatory relocatable anywhere
   in the 64bit address space.")

b) Back in 2013 Yinghai removed all the limitations in new kexec-tools,
   bzImage64 can be loaded anywhere:

 82c3dd2280d2 ("kexec, x86_64: Load bzImage64 above 4G")

c) Test results with old kexec-tools with old and latest kernels:

  1. Old kexec-tools can not build with modern toolchain anymore,
 I built it in a RHEL6 vm.

  2. 2.0.0 kexec-tools does not work with the latest kernel even with
 memory under 896M and gives an error:

 "ELF core (kcore) parse failed"

 For that it needs below kexec-tools fix:

   ed15ba1b9977 ("build_mem_phdrs(): check if p_paddr is invalid")

  3. Even with patched kexec-tools which fixes 2),  it still needs some
 other fixes to work correctly for KASLR-enabled kernels.

So the situation is:

* Old kexec-tools is already broken with latest kernels.

* We can not keep these limitations forever just for compatibility with very
  old kexec-tools.

* If one must use old tools then he/she can choose crashkernel=X@Y.

* People have reported bugs where crashkernel=384M failed because KASLR
  makes the 0-896M space sparse.

* Crashkernel can reserve in low or high area, it is natural to understand
  low as memory under 4G.

Hence drop the 896M limitation and change crashkernel low reservation to
reserve under 4G by default.

Signed-off-by: Dave Young 
Signed-off-by: Borislav Petkov 
Reviewed-by: Ingo Molnar 
Acked-by: Baoquan He 
Cc: Dave Hansen 
Cc: David Howells 
Cc: Eric Biederman 
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: Juergen Gross 
Cc: Petr Tesarik 
Cc: pi...@redhat.com
Cc: Ram Pai 
Cc: Sinan Kaya 
Cc: Thomas Gleixner 
Cc: vgo...@redhat.com
Cc: x86-ml 
Cc: Yinghai Lu 
Cc: Zhimin Gu 
Link: https://lkml.kernel.org/r/20190421035058.943630...@redhat.com
---
 arch/x86/kernel/setup.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 3d872a527cd9..daf7c5650c18 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -71,6 +71,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -448,18 +449,17 @@ static void __init 
memblock_x86_reserve_range_setup_data(void)
 #ifdef CONFIG_KEXEC_CORE
 
 /* 16M alignment for crash kernel regions */
-#define CRASH_ALIGN(16 << 20)
+#define CRASH_ALIGNSZ_16M
 
 /*
  * Keep the crash kernel below this limit.  On 32 bits earlier kernels
  * would limit the kernel to the low 512 MiB due to mapping restrictions.
- * On 64bit, old kexec-tools need to under 896MiB.
  */
 #ifdef CONFIG_X86_32
-# define CRASH_ADDR_LOW_MAX(512 << 20)
-# define CRASH_ADDR_HIGH_MAX   (512 << 20)
+# define CRASH_ADDR_LOW_MAXSZ_512M
+# define CRASH_ADDR_HIGH_MAX   SZ_512M
 #else
-# define CRASH_ADDR_LOW_MAX(896UL << 20)
+# define CRASH_ADDR_LOW_MAXSZ_4G
 # define CRASH_ADDR_HIGH_MAX   MAXMEM
 #endif
 


[tip:x86/kdump] x86/kdump: Fall back to reserve high crashkernel memory

2019-04-22 Thread tip-bot for Dave Young
Commit-ID:  b9ac3849af412fd3887d7652bdbabf29d2aecc16
Gitweb: https://git.kernel.org/tip/b9ac3849af412fd3887d7652bdbabf29d2aecc16
Author: Dave Young 
AuthorDate: Mon, 22 Apr 2019 11:19:05 +0800
Committer:  Borislav Petkov 
CommitDate: Mon, 22 Apr 2019 10:23:05 +0200

x86/kdump: Fall back to reserve high crashkernel memory

crashkernel=xM tries to reserve memory for the crash kernel under 4G,
which is enough, usually. But this could fail sometimes, for example
when one tries to reserve a big chunk like 2G, for example.

So let the crashkernel=xM just fall back to use high memory in case it
fails to find a suitable low range. Do not set the ,high as default
because it allocates extra low memory for DMA buffers and swiotlb, and
this is not always necessary for all machines.

Typically, crashkernel=128M usually works with low reservation under 4G,
so keep <4G as default.

 [ bp: Massage. ]

Signed-off-by: Dave Young 
Signed-off-by: Borislav Petkov 
Reviewed-by: Ingo Molnar 
Acked-by: Baoquan He 
Cc: Dave Young 
Cc: David Howells 
Cc: Eric Biederman 
Cc: Greg Kroah-Hartman 
Cc: "H. Peter Anvin" 
Cc: Ingo Molnar 
Cc: Jiri Kosina 
Cc: Jonathan Corbet 
Cc: Juergen Gross 
Cc: Kees Cook 
Cc: Konrad Rzeszutek Wilk 
Cc: linux-...@vger.kernel.org
Cc: "Paul E. McKenney" 
Cc: Petr Tesarik 
Cc: pi...@redhat.com
Cc: Ram Pai 
Cc: Sinan Kaya 
Cc: Thomas Gleixner 
Cc: Thymo van Beers 
Cc: vgo...@redhat.com
Cc: x86-ml 
Cc: Yinghai Lu 
Cc: Zhimin Gu 
Link: https://lkml.kernel.org/r/20190422031905.ga8...@dhcp-128-65.nay.redhat.com
---
 Documentation/admin-guide/kernel-parameters.txt |  7 +--
 arch/x86/kernel/setup.c | 22 ++
 2 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 2b8ee90bb644..24d01648edeb 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -704,8 +704,11 @@
upon panic. This parameter reserves the physical
memory region [offset, offset + size] for that kernel
image. If '@offset' is omitted, then a suitable offset
-   is selected automatically. Check
-   Documentation/kdump/kdump.txt for further details.
+   is selected automatically.
+   [KNL, x86_64] select a region under 4G first, and
+   fall back to reserve region above 4G when '@offset'
+   hasn't been specified.
+   See Documentation/kdump/kdump.txt for further details.
 
crashkernel=range1:size1[,range2:size2,...][@offset]
[KNL] Same as above, but depends on the memory
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index daf7c5650c18..c15f362a2516 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -541,21 +541,27 @@ static void __init reserve_crashkernel(void)
}
 
/* 0 means: find the address automatically */
-   if (crash_base <= 0) {
+   if (!crash_base) {
/*
 * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
-* as old kexec-tools loads bzImage below that, unless
-* "crashkernel=size[KMG],high" is specified.
+* crashkernel=x,high reserves memory over 4G, also allocates
+* 256M extra low memory for DMA buffers and swiotlb.
+* But the extra memory is not required for all machines.
+* So try low memory first and fall back to high memory
+* unless "crashkernel=size[KMG],high" is specified.
 */
-   crash_base = memblock_find_in_range(CRASH_ALIGN,
-   high ? CRASH_ADDR_HIGH_MAX
-: CRASH_ADDR_LOW_MAX,
-   crash_size, CRASH_ALIGN);
+   if (!high)
+   crash_base = memblock_find_in_range(CRASH_ALIGN,
+   CRASH_ADDR_LOW_MAX,
+   crash_size, CRASH_ALIGN);
+   if (!crash_base)
+   crash_base = memblock_find_in_range(CRASH_ALIGN,
+   CRASH_ADDR_HIGH_MAX,
+   crash_size, CRASH_ALIGN);
if (!crash_base) {
pr_info("crashkernel reservation failed - No suitable 
area found.\n");
return;
}
-
} else {
unsigned long long start;
 


[PATCH 2/2 update] X86/kdump: fall back to reserve high crashkernel memory

2019-04-21 Thread Dave Young
crashkernel=xM tries to reserve crashkernel memory under 4G, which
is enough for usual cases.  But this could fail sometimes, for example
one tries to reserve a big chunk like 2G, it is possible to fail.

So let the crashkernel=xM just fall back to use high memory in case it
fails to find a suitable low range.  Do not set the ,high as default
because it allocates extra low memory for DMA buffers and swiotlb, this is
not always necessary for all machines. Typically like crashkernel=128M
usually work with low reservation under 4G, so still keep <4G as default.

Signed-off-by: Dave Young 
Reviewed-by: Ingo Molnar 
---
 Documentation/admin-guide/kernel-parameters.txt |7 +--
 arch/x86/kernel/setup.c |   22 ++
 2 files changed, 19 insertions(+), 10 deletions(-)

--- linux-x86.orig/arch/x86/kernel/setup.c
+++ linux-x86/arch/x86/kernel/setup.c
@@ -541,21 +541,27 @@ static void __init reserve_crashkernel(v
}
 
/* 0 means: find the address automatically */
-   if (crash_base <= 0) {
+   if (!crash_base) {
/*
 * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
-* as old kexec-tools loads bzImage below that, unless
-* "crashkernel=size[KMG],high" is specified.
+* crashkernel=x,high reserves memory over 4G, also allocates
+* 256M extra low memory for DMA buffers and swiotlb.
+* but the extra memory is not required for all machines.
+* So prefer low memory first, and fall back to high memory
+* unless "crashkernel=size[KMG],high" is specified.
 */
-   crash_base = memblock_find_in_range(CRASH_ALIGN,
-   high ? CRASH_ADDR_HIGH_MAX
-: CRASH_ADDR_LOW_MAX,
-   crash_size, CRASH_ALIGN);
+   if (!high)
+   crash_base = memblock_find_in_range(CRASH_ALIGN,
+   CRASH_ADDR_LOW_MAX,
+   crash_size, CRASH_ALIGN);
+   if (!crash_base)
+   crash_base = memblock_find_in_range(CRASH_ALIGN,
+   CRASH_ADDR_HIGH_MAX,
+   crash_size, CRASH_ALIGN);
if (!crash_base) {
pr_info("crashkernel reservation failed - No suitable 
area found.\n");
return;
}
-
} else {
unsigned long long start;
 
--- linux-x86.orig/Documentation/admin-guide/kernel-parameters.txt
+++ linux-x86/Documentation/admin-guide/kernel-parameters.txt
@@ -704,8 +704,11 @@
upon panic. This parameter reserves the physical
memory region [offset, offset + size] for that kernel
image. If '@offset' is omitted, then a suitable offset
-   is selected automatically. Check
-   Documentation/kdump/kdump.txt for further details.
+   is selected automatically.
+   [KNL, x86_64] select a region under 4G first, and
+   fall back to reserve region above 4G in case without
+   '@offset'.
+   See Documentation/kdump/kdump.txt for further details.
 
crashkernel=range1:size1[,range2:size2,...][@offset]
[KNL] Same as above, but depends on the memory


Re: [PATCH 2/2] X86/kdump: fall back to reserve high crashkernel memory

2019-04-21 Thread Dave Young
On 04/21/19 at 08:26pm, Ingo Molnar wrote:
> 
> * Dave Young  wrote:
> 
> > crashkernel=xM tries to reserve crashkernel memory under 4G, which
> > is enough for usual cases.  But this could fail sometimes, for example
> > one tries to reserve a big chunk like 2G, it is possible to fail.
> > 
> > So let the crashkernel=xM just fall back to use high memory in case it
> > fails to find a suitable low range.  Do not set the ,high as default
> > because it allocs extra low memory for DMA buffers and swiotlb, this is
> > not always necessary for all machines. Typically like crashkernel=128M
> > usually work with low reservation under 4G, so still keep <4G as default.
> > 
> > Signed-off-by: Dave Young 
> > ---
> >  Documentation/admin-guide/kernel-parameters.txt |7 +--
> >  arch/x86/kernel/setup.c |   22 
> > ++
> >  2 files changed, 19 insertions(+), 10 deletions(-)
> > 
> > --- linux-x86.orig/arch/x86/kernel/setup.c
> > +++ linux-x86/arch/x86/kernel/setup.c
> > @@ -541,21 +541,27 @@ static void __init reserve_crashkernel(v
> > }
> >  
> > /* 0 means: find the address automatically */
> > -   if (crash_base <= 0) {
> > +   if (!crash_base) {
> > /*
> >  * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
> > -* as old kexec-tools loads bzImage below that, unless
> > -* "crashkernel=size[KMG],high" is specified.
> > +* as crashkernel=x,high allocs memory over 4G, also allocs
> 
> s/allocs
>  /allocates
> 
> > +* 256M extra low memory for DMA buffers and swiotlb.
> > +* but the extra memory is not required for all machines.
> > +* So prefer low memory first, and fallback to high memory
> 
> s/fallback
>  /fall back
> 
> > +* unless "crashkernel=size[KMG],high" is specified.
> >  */
> > -   crash_base = memblock_find_in_range(CRASH_ALIGN,
> > -   high ? CRASH_ADDR_HIGH_MAX
> > -: CRASH_ADDR_LOW_MAX,
> > -   crash_size, CRASH_ALIGN);
> > +   if (!high)
> > +   crash_base = memblock_find_in_range(CRASH_ALIGN,
> > +   CRASH_ADDR_LOW_MAX,
> > +   crash_size, CRASH_ALIGN);
> > +   if (!crash_base)
> > +   crash_base = memblock_find_in_range(CRASH_ALIGN,
> > +   CRASH_ADDR_HIGH_MAX,
> > +   crash_size, CRASH_ALIGN);
> > if (!crash_base) {
> > pr_info("crashkernel reservation failed - No suitable 
> > area found.\n");
> > return;
> > }
> > -
> > } else {
> > unsigned long long start;
> >  
> > --- linux-x86.orig/Documentation/admin-guide/kernel-parameters.txt
> > +++ linux-x86/Documentation/admin-guide/kernel-parameters.txt
> > @@ -704,8 +704,11 @@
> > upon panic. This parameter reserves the physical
> > memory region [offset, offset + size] for that kernel
> > image. If '@offset' is omitted, then a suitable offset
> > -   is selected automatically. Check
> > -   Documentation/kdump/kdump.txt for further details.
> > +   is selected automatically.
> > +   [KNL, x86_64] select a region under 4G first, and
> > +   fallback to reserve region above 4G in case without
> 
> s/fallback
>  /fall back
> 
> > +   '@offset'.
> > +   See Documentation/kdump/kdump.txt for further details.
> >  
> > crashkernel=range1:size1[,range2:size2,...][@offset]
> > [KNL] Same as above, but depends on the memory
> 
> With the nits fixed:
> 
> Reviewed-by: Ingo Molnar 

Thanks for review, will reply to 2/2 with an update of those spelling
issues.

Dave


[PATCH 2/2] X86/kdump: fall back to reserve high crashkernel memory

2019-04-20 Thread Dave Young
crashkernel=xM tries to reserve crashkernel memory under 4G, which
is enough for usual cases.  But this could fail sometimes, for example
one tries to reserve a big chunk like 2G, it is possible to fail.

So let the crashkernel=xM just fall back to use high memory in case it
fails to find a suitable low range.  Do not set the ,high as default
because it allocs extra low memory for DMA buffers and swiotlb, this is
not always necessary for all machines. Typically like crashkernel=128M
usually work with low reservation under 4G, so still keep <4G as default.

Signed-off-by: Dave Young 
---
 Documentation/admin-guide/kernel-parameters.txt |7 +--
 arch/x86/kernel/setup.c |   22 ++
 2 files changed, 19 insertions(+), 10 deletions(-)

--- linux-x86.orig/arch/x86/kernel/setup.c
+++ linux-x86/arch/x86/kernel/setup.c
@@ -541,21 +541,27 @@ static void __init reserve_crashkernel(v
}
 
/* 0 means: find the address automatically */
-   if (crash_base <= 0) {
+   if (!crash_base) {
/*
 * Set CRASH_ADDR_LOW_MAX upper bound for crash memory,
-* as old kexec-tools loads bzImage below that, unless
-* "crashkernel=size[KMG],high" is specified.
+* as crashkernel=x,high allocs memory over 4G, also allocs
+* 256M extra low memory for DMA buffers and swiotlb.
+* but the extra memory is not required for all machines.
+* So prefer low memory first, and fallback to high memory
+* unless "crashkernel=size[KMG],high" is specified.
 */
-   crash_base = memblock_find_in_range(CRASH_ALIGN,
-   high ? CRASH_ADDR_HIGH_MAX
-: CRASH_ADDR_LOW_MAX,
-   crash_size, CRASH_ALIGN);
+   if (!high)
+   crash_base = memblock_find_in_range(CRASH_ALIGN,
+   CRASH_ADDR_LOW_MAX,
+   crash_size, CRASH_ALIGN);
+   if (!crash_base)
+   crash_base = memblock_find_in_range(CRASH_ALIGN,
+   CRASH_ADDR_HIGH_MAX,
+   crash_size, CRASH_ALIGN);
if (!crash_base) {
pr_info("crashkernel reservation failed - No suitable 
area found.\n");
return;
}
-
} else {
unsigned long long start;
 
--- linux-x86.orig/Documentation/admin-guide/kernel-parameters.txt
+++ linux-x86/Documentation/admin-guide/kernel-parameters.txt
@@ -704,8 +704,11 @@
upon panic. This parameter reserves the physical
memory region [offset, offset + size] for that kernel
image. If '@offset' is omitted, then a suitable offset
-   is selected automatically. Check
-   Documentation/kdump/kdump.txt for further details.
+   is selected automatically.
+   [KNL, x86_64] select a region under 4G first, and
+   fallback to reserve region above 4G in case without
+   '@offset'.
+   See Documentation/kdump/kdump.txt for further details.
 
crashkernel=range1:size1[,range2:size2,...][@offset]
[KNL] Same as above, but depends on the memory




[PATCH 1/2] X86/kdump: move crashkernel=X to reserve under 4G by default

2019-04-20 Thread Dave Young
The kdump crashkernel low reservation is limited to under 896M even for
X86_64. This obscure and miserable limitation exists for old kexec-tools
compatibility, but the reason is not documented anywhere.

Some more tests/investigations about the background:
a) Previously old kexec-tools can only load purgatory to memory under 2G,
   Eric remove that limitation in 2012 in kexec-tools:
   Commit b4f9f8599679 ("kexec x86_64: Make purgatory relocatable anywhere
   in the 64bit address space.")

b) back in 2013 Yinghai removed all the limitations in new kexec-tools,
   bzImage64 can be loaded to anywhere.
   Commit 82c3dd2280d2 ("kexec, x86_64: Load bzImage64 above 4G")

c) test results with old kexec-tools with old and latest kernels.
  1. old kexec-tools can not build with modern toolchain anymore,
 I built it in a RHEL6 vm
  2. 2.0.0 kexec-tools does not work with latest kernel even with
 memory under 896M and give an error:
 "ELF core (kcore) parse failed", it needs below kexec-tools fix 
 Commit ed15ba1b9977 ("build_mem_phdrs(): check if p_paddr is invalid")
  3. even with patched kexec-tools which fixes 2),  it still needs some
 other fixes to work correctly for kaslr enabled kernels.

So the situation is:
* old kexec-tools is already broken with latest kernels
* we can not keep this limitations forever just for compatibility of very
  old kexec-tools.
* If one must use old tools then he/she can choose crashkernel=X@Y
* people have reported bugs crashkernel=384M failed because kaslr makes
  the 0-896M space sparse, 
* crashkernel can reserve in low or high area, it is natural to understand 
  low as memory under 4G

Hence drop the 896M limitation, and change crashkernel low reservation to
reserve under 4G by default.

Signed-off-by: Dave Young 
---
 arch/x86/kernel/setup.c |   10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

--- linux-x86.orig/arch/x86/kernel/setup.c
+++ linux-x86/arch/x86/kernel/setup.c
@@ -71,6 +71,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -448,18 +449,17 @@ static void __init memblock_x86_reserve_
 #ifdef CONFIG_KEXEC_CORE
 
 /* 16M alignment for crash kernel regions */
-#define CRASH_ALIGN(16 << 20)
+#define CRASH_ALIGNSZ_16M
 
 /*
  * Keep the crash kernel below this limit.  On 32 bits earlier kernels
  * would limit the kernel to the low 512 MiB due to mapping restrictions.
- * On 64bit, old kexec-tools need to under 896MiB.
  */
 #ifdef CONFIG_X86_32
-# define CRASH_ADDR_LOW_MAX(512 << 20)
-# define CRASH_ADDR_HIGH_MAX   (512 << 20)
+# define CRASH_ADDR_LOW_MAXSZ_512M
+# define CRASH_ADDR_HIGH_MAX   SZ_512M
 #else
-# define CRASH_ADDR_LOW_MAX(896UL << 20)
+# define CRASH_ADDR_LOW_MAXSZ_4G
 # define CRASH_ADDR_HIGH_MAX   MAXMEM
 #endif
 




[PATCH] checkpatch: mute SOB warning in case SOB exists but without From header

2019-04-20 Thread Dave Young
checkpatch.pl report below warning but the SOB line is correct:
$ ./scripts/checkpatch.pl patches/test.patch
WARNING: Missing Signed-off-by: line by nominal patch author ''

Actually checkpatch.pl assumes every patch includes "From:" line,
this is true for a git formated patch, some saved mail format patch.

But the warning is boring for people who do not use git formated patch.
Change the script only print warning in case "From:" line exists.

Signed-off-by: Dave Young 
---
 scripts/checkpatch.pl |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

--- linux-x86.orig/scripts/checkpatch.pl
+++ linux-x86/scripts/checkpatch.pl
@@ -2283,7 +2283,7 @@ sub process {
our $clean = 1;
my $signoff = 0;
my $author = '';
-   my $authorsignoff = 0;
+   my $authorsignoff = -1;
my $is_patch = 0;
my $is_binding_patch = -1;
my $in_header_lines = $file ? 0 : 1;
@@ -2609,6 +2609,8 @@ sub process {
$l =~ s/"//g;
if ($l =~ /^\s*signed-off-by:\s*\Q$author\E/i) {
$authorsignoff = 1;
+   } else {
+   $authorsignoff = 0;
}
}
}
@@ -6649,7 +6651,7 @@ sub process {
if ($signoff == 0) {
ERROR("MISSING_SIGN_OFF",
  "Missing Signed-off-by: line(s)\n");
-   } elsif (!$authorsignoff) {
+   } elsif ($authorsignoff == 0) {
WARN("NO_AUTHOR_SIGN_OFF",
 "Missing Signed-off-by: line by nominal patch 
author '$author'\n");
}


Re: [PATCH] x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernels

2019-04-17 Thread Dave Young
On 04/17/19 at 02:00pm, Kairui Song wrote:
> On Wed, Apr 17, 2019 at 12:57 PM Dave Young  wrote:
> >
> > On 04/17/19 at 09:38am, Dave Young wrote:
> > > On 04/16/19 at 03:22pm, Borislav Petkov wrote:
> > > > On Tue, Apr 16, 2019 at 07:41:33PM +0800, Dave Young wrote:
> > > > > On 04/16/19 at 11:52am, Borislav Petkov wrote:
> > > > > > I'll queue the below in the next days if there are no more 
> > > > > > complaints:
> > > > >
> > > > > As for the kexec breakage, even with the V3 patch, kexec still hangs 
> > > > > on
> > > > > a Lenovo T420 laptop.  Kairui also reproduced the problem. So can we
> > > > > wait a few days see if we can make some progress to find the cause?
> > > >
> > > > How is applying this patch going to change anything?
> > > >
> > > > I was told that the breakage is there even without it...
> > >
> > > Without this patch, the bug happens in the efi_get_rsdp.. function, this
> > > patch tries to fix that by adding kexec_get.. but the new introduced
> > > kexec_* function does not work on some laptops, so it is not a 100% good
> > > fix, I hoped we can get it working for all known issues.  But if we can
> > > not do it eg. within one week we can go with this version and leave the
> > > laptop issue as a known issue.
> > >
> >
> > Latest debugging status:
> >
> > Kexec boot works with commenting out some code like below, so the guid
> > cmp (memcmp) caused a system reset), still need to find out why:
> >
> > diff --git a/arch/x86/boot/compressed/acpi.c 
> > b/arch/x86/boot/compressed/acpi.c
> > index d9f9abd63c68..13e7a23ae94c 100644
> > --- a/arch/x86/boot/compressed/acpi.c
> > +++ b/arch/x86/boot/compressed/acpi.c
> > @@ -95,10 +95,12 @@ __efi_get_rsdp_addr(unsigned long config_tables, 
> > unsigned int nr_tables,
> > table = tbl->table;
> > }
> >
> > +/*
> > if (!(efi_guidcmp(guid, ACPI_TABLE_GUID)))
> > rsdp_addr = table;
> > else if (!(efi_guidcmp(guid, ACPI_20_TABLE_GUID)))
> > return table;
> > +*/
> > }
> >
> > return rsdp_addr;
> > @@ -291,9 +293,10 @@ acpi_physical_address get_rsdp_addr(void)
> > if (!pa)
> > pa = kexec_get_rsdp_addr();
> >
> > +/*
> > if (!pa)
> > pa = efi_get_rsdp_addr();
> > -
> > +*/
> > if (!pa)
> > pa = bios_get_rsdp_addr();
> >
> >
> 
> Hi Dave, for this case I think it's just because GCC will found the
> loop does nothing, and optimize out the whole loop in
> __efi_get_rsdp_addr and will no longer read the actual nr_table value.
> 
> I can fix the boot error on T420 with your patch, but if I add
> anything, like a hardcode value assignment with the right value for
> acpi_rsdp in the loop, it will reset the machine. But set acpi_rsdp
> with a right initial value out side the loop works fine.
> If the loop condition is false, then there should be no difference
> between just comment out the line you mentioned and add an assignment.
> Else it just assign the value multiple times, not very reasonable but
> shouldn't fail.
> 
> And, I inspected the generated ASM code also suggest the same thing.
> So still, access the systab memory is the cause of the system reset on
> certain machines.

Makse sense, my previous debug also point to some systab accessing.
Probably some early pg table mess up.

Thanks
Dave


Re: [PATCH v2] x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernel

2019-04-03 Thread Dave Young
On 04/03/19 at 04:23pm, Dave Young wrote:
> On 04/03/19 at 03:50pm, Baoquan He wrote:
> > On 04/03/19 at 03:30pm, Chao Fan wrote:
> > > On Wed, Apr 03, 2019 at 02:39:11PM +0800, Dave Young wrote:
> > > >> Actually I got some different kexec test results.
> > > >> 
> > > >> Yesterday, with my installed kernel (based on git head several weeks
> > > >> ago), kexec kernel panics.
> > > >> 
> > > >> Then I tried latest mainline with git pull, everything works, (with or
> > > >> without the patch, and can not reproduce the bug this patch is fixing)
> > > >> 
> > > >> Today, test again, kexec reboot hangs (with or without your patch), but
> > > >> kdump works always (with or without the patch)
> > > >> 
> > > >> It is weird to me. Probably I need find out why I can not reproduce the
> > > >> bug this patch is addressing first.
> > > >> 
> > > >> earlyprintk seems not working for me anymore, it is not easy to debug 
> > > >> on
> > > >> laptop now.
> > > >> 
> > > >> But the patch itself is clear, I think it should be good.  There might 
> > > >> be
> > > >> other things broken.
> > > >
> > > >Disable your immovable mem code then everything works for me.  There
> > > >might be something wrong in the code.  Also "nokaslr" does not help, it
> > > >should be another problem 
> > > 
> > > If "nokaslr" doesn't help, so I think
> > > >+  /*num_immovable_mem = count_immovable_mem_regions();*/
> > > also doesn't help. I think the problem is from get_rsdp_addr().
> > 
> > Yes, seems get_rsdp_addr() has issue in this case. I am wondering if we
> > can adjust the postion of get_rsdp_addr() calling. If nokaslr is
> > specified, no need to get rsdp?
> 
> I assumed the early code is only be useful for kaslr, if this is true
> then no need to get rsdp in case nokaslr.
> 
> But I vaguely remember Boris want the boot_params rsdp pointer be a
> general thing, I did not followed up these patch discussions, need to see how 
> Boris thinks about this.
> 
Personally I would like to have a cmdline to bypass the early acpi code
because early code is hard to debug, if we have something weird happens
we can try to gate out these code, and then get some idea..


Re: [PATCH v2] x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernel

2019-04-03 Thread Dave Young
On 04/03/19 at 01:53pm, Dave Young wrote:
> On 04/03/19 at 01:35pm, Chao Fan wrote:
> > On Tue, Apr 02, 2019 at 08:03:19PM +0800, Dave Young wrote:
> > >On 04/01/19 at 12:08am, Junichi Nomura wrote:
> > >> Commit 3a63f70bf4c3a ("x86/boot: Early parse RSDP and save it in
> > >> boot_params") broke kexec boot on EFI systems.  efi_get_rsdp_addr()
> > >> in the early parsing code tries to search RSDP from EFI table but
> > >> that will crash because the table address is virtual when the kernel
> > >> was booted by kexec.
> > >> 
> > >> In the case of kexec, physical address of EFI tables is provided
> > >> via efi_setup_data in boot_params, which is set up by kexec(1).
> > >> 
> > >> Factor out the table parsing code and use different pointers depending
> > >> on whether the kernel is booted by kexec or not.
> > >> 
> > >> Fixes: 3a63f70bf4c3a ("x86/boot: Early parse RSDP and save it in 
> > >> boot_params")
> > >> Signed-off-by: Jun'ichi Nomura 
> > >> Acked-by: Baoquan He 
> > >> Cc: Chao Fan 
> > >> Cc: Borislav Petkov 
> > >> Cc: Dave Young 
> > [...]
> > >
> > >I failed to kexec reboot on my laptop, kernel panics too quick,  I'm not 
> > >sure this is
> > >caused by your patch though.
> > >
> > >Actually there are something probably i915 changes break kexec,  the
> > >above test is with "nomodeset" which should work.
> > >
> > >Let me do more testing and update here tomorrow.
> > >
> > 
> > Hi Dave,
> > 
> > Last day I was testing the normal kexec, today I have tested the kdump
> > issue. Since the kdump has set "nokaslr" to cmdline, so I drop from
> > KDUMP_COMMANDLINE_APPEND
> > And it booted OK, so the PATCH works in both normal kexec and kdump.
> > 
> 
> Actually I got some different kexec test results.
> 
> Yesterday, with my installed kernel (based on git head several weeks
> ago), kexec kernel panics.
> 
> Then I tried latest mainline with git pull, everything works, (with or
> without the patch, and can not reproduce the bug this patch is fixing)
> 
> Today, test again, kexec reboot hangs (with or without your patch), but
> kdump works always (with or without the patch)
> 
> It is weird to me. Probably I need find out why I can not reproduce the
> bug this patch is addressing first.
> 
> earlyprintk seems not working for me anymore, it is not easy to debug on
> laptop now.
> 
> But the patch itself is clear, I think it should be good.  There might be
> other things broken.

Disable your immovable mem code then everything works for me.  There
might be something wrong in the code.  Also "nokaslr" does not help, it
should be another problem 

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index 2e53c056ba20..e760c9159662 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -416,7 +416,7 @@ static void mem_avoid_init(unsigned long input, unsigned 
long input_size,
handle_mem_options();
 
/* Enumerate the immovable memory regions */
-   num_immovable_mem = count_immovable_mem_regions();
+   /*num_immovable_mem = count_immovable_mem_regions();*/
 
 #ifdef CONFIG_X86_VERBOSE_BOOTUP
/* Make sure video RAM can be used. */
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index c0d6c560df69..1bc6f46d3aa7 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -352,7 +352,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, 
memptr heap,
boot_params->hdr.loadflags &= ~KASLR_FLAG;
 
/* Save RSDP address for later use. */
-   boot_params->acpi_rsdp_addr = get_rsdp_addr();
+/* boot_params->acpi_rsdp_addr = get_rsdp_addr(); */
 
sanitize_boot_params(boot_params);
 


Re: [PATCH v2] x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernel

2019-04-02 Thread Dave Young
On 04/01/19 at 12:08am, Junichi Nomura wrote:
> Commit 3a63f70bf4c3a ("x86/boot: Early parse RSDP and save it in
> boot_params") broke kexec boot on EFI systems.  efi_get_rsdp_addr()
> in the early parsing code tries to search RSDP from EFI table but
> that will crash because the table address is virtual when the kernel
> was booted by kexec.
> 
> In the case of kexec, physical address of EFI tables is provided
> via efi_setup_data in boot_params, which is set up by kexec(1).
> 
> Factor out the table parsing code and use different pointers depending
> on whether the kernel is booted by kexec or not.
> 
> Fixes: 3a63f70bf4c3a ("x86/boot: Early parse RSDP and save it in boot_params")
> Signed-off-by: Jun'ichi Nomura 
> Acked-by: Baoquan He 
> Cc: Chao Fan 
> Cc: Borislav Petkov 
> Cc: Dave Young 
> 
> --
> v2: Added comments above __efi_get_rsdp_addr() and kexec_get_rsdp_addr() 
> 
> diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
> --- a/arch/x86/boot/compressed/acpi.c
> +++ b/arch/x86/boot/compressed/acpi.c
> @@ -44,17 +44,112 @@ static acpi_physical_address get_acpi_rsdp(void)
>   return addr;
>  }
>  
> -/* Search EFI system tables for RSDP. */
> -static acpi_physical_address efi_get_rsdp_addr(void)
> +#ifdef CONFIG_EFI
> +static unsigned long efi_get_kexec_setup_data_addr(void)
> +{
> + struct setup_data *data;
> + u64 pa_data;
> +
> + pa_data = boot_params->hdr.setup_data;
> + while (pa_data) {
> + data = (struct setup_data *) pa_data;
> + if (data->type == SETUP_EFI)
> + return pa_data + sizeof(struct setup_data);
> + pa_data = data->next;
> + }
> + return 0;
> +}
> +
> +/*
> + * Search EFI system tables for RSDP.  If both ACPI_20_TABLE_GUID and
> + * ACPI_TABLE_GUID are found, take the former, which has more features.
> + */
> +static acpi_physical_address
> +__efi_get_rsdp_addr(unsigned long config_tables, unsigned int nr_tables,
> + bool efi_64)
>  {
>   acpi_physical_address rsdp_addr = 0;
> + int i;
> +
> + /* Get EFI tables from systab. */
> + for (i = 0; i < nr_tables; i++) {
> + acpi_physical_address table;
> + efi_guid_t guid;
> +
> + if (efi_64) {
> + efi_config_table_64_t *tbl = (efi_config_table_64_t *) 
> config_tables + i;
> +
> + guid  = tbl->guid;
> + table = tbl->table;
> +
> + if (!IS_ENABLED(CONFIG_X86_64) && table >> 32) {
> + debug_putstr("Error getting RSDP address: EFI 
> config table located above 4GB.\n");
> + return 0;
> + }
> + } else {
> + efi_config_table_32_t *tbl = (efi_config_table_32_t *) 
> config_tables + i;
>  
> + guid  = tbl->guid;
> + table = tbl->table;
> + }
> +
> + if (!(efi_guidcmp(guid, ACPI_TABLE_GUID)))
> + rsdp_addr = table;
> + else if (!(efi_guidcmp(guid, ACPI_20_TABLE_GUID)))
> + return table;
> + }
> +
> + return rsdp_addr;
> +}
> +#endif
> +
> +/*
> + * EFI/kexec support is only added for 64bit. So we don't have to
> + * care 32bit case.
> + */
> +static acpi_physical_address kexec_get_rsdp_addr(void)
> +{
>  #ifdef CONFIG_EFI
> - unsigned long systab, systab_tables, config_tables;
> + efi_system_table_64_t *systab;
> + struct efi_setup_data *esd;
> + struct efi_info *ei;
> + char *sig;
> +
> + esd = (struct efi_setup_data *) efi_get_kexec_setup_data_addr();
> + if (!esd)
> + return 0;
> +
> + if (!esd->tables) {
> + debug_putstr("Wrong kexec SETUP_EFI data.\n");
> + return 0;
> + }
> +
> + ei = _params->efi_info;
> + sig = (char *)>efi_loader_signature;
> + if (strncmp(sig, EFI64_LOADER_SIGNATURE, 4)) {
> + debug_putstr("Wrong kexec EFI loader signature.\n");
> + return 0;
> + }
> +
> + /* Get systab from boot params. */
> + systab = (efi_system_table_64_t *) (ei->efi_systab | 
> ((__u64)ei->efi_systab_hi << 32));
> + if (!systab)
> + error("EFI system table not found in kexec boot_params.");
> +
> + return __efi_get_rsdp_addr((unsigned long) esd->tables,
> +systab->nr_tables, 

Re: [PATCH v2] x86/boot: Use EFI setup data if provided

2019-03-25 Thread Dave Young
On 03/25/19 at 09:54am, Boris Petkov wrote:
> On March 25, 2019 9:27:21 AM GMT+01:00, Junichi Nomura 
>  wrote:
> >On 3/25/19 3:59 PM, Dave Young wrote:
> >> On 03/25/19 at 06:47am, Junichi Nomura wrote:
> >>> On 3/25/19 3:19 PM, Dave Young wrote:
> >>>> On 03/25/19 at 02:01pm, Dave Young wrote:
> >>>> I think normally people do not see this bug, because kernel will
> >set the
> >>>> rsdp in boot_params->acpi_rsdp_addr.  Maybe you are testing with
> >>>
> >>> I think it's only done for file-based kexec interface.
> >> 
> >> Saw Kairui's another reply, yes, kexec-tools need a patch to fill the
> >> value as well then.
> >> 
> >> I would vote for a repost of your old patch with some #ifdef
> >
> >Thanks for comments, Dave, Kairui and Baoquan.
> >
> >The problem for me is it's a regression in v5.1-rc1, that breaks
> >existing setup. If early parsing of RSDP is required only for newly
> >supported configuration, I'm fine such configuration requires
> >new tools or new options.
> >
> >This is the 1st version plus #ifdef around the EFI code.
> 
> I'm going to repeat that again until you get it:
> 
> If the kexec kernel should continue to use efi_systab_init() then you
> should make efi_get_rsdp_addr() exit early in the kexec-ed kernel.

In that way, early parsing will fail in kexeced kernel, am I missing
something?  The early code become complicated but since we have already
the early acpi parsing why not to make it consistent in kexeced kernel?

Thanks
Dave


Re: [PATCH v4a 1/2] selftests/kexec: make tests independent of IMA being enabled

2019-03-25 Thread Dave Young
Hi Mimi
On 03/22/19 at 03:35pm, Mimi Zohar wrote:
> Verify IMA is enabled before failing tests or emitting irrelevant
> messages.  Also, don't skip the test if signatures are not required.
> 
> Suggested-by: Dave Young 
> Signed-off-by: Mimi Zohar 
> ---
> Dave, if this patch resolves the outstanding issues, I can fold these
> changes into the original patches. (Reminder, these patches will need to
> be updated to support the "lockdown" patch set.)

They looks good to me, thanks for the update

Feel free to add my reviewed-by, I did some tests although not cover all
ima cases.

Thanks
Dave

> 
>  .../selftests/kexec/test_kexec_file_load.sh| 27 
> ++
>  tools/testing/selftests/kexec/test_kexec_load.sh   | 24 ---
>  2 files changed, 33 insertions(+), 18 deletions(-)
> 
> diff --git a/tools/testing/selftests/kexec/test_kexec_file_load.sh 
> b/tools/testing/selftests/kexec/test_kexec_file_load.sh
> index 1d2e5e799523..57b636792086 100755
> --- a/tools/testing/selftests/kexec/test_kexec_file_load.sh
> +++ b/tools/testing/selftests/kexec/test_kexec_file_load.sh
> @@ -110,11 +110,20 @@ kexec_file_load_test()
>   log_fail "$succeed_msg (missing IMA sig)"
>   fi
>  
> - if [ $pe_sig_required -eq 0 ] && [ $ima_sig_required -eq 0 ] \
> - && [ $ima_read_policy -eq 0 ] && [ $ima_signed -eq 0 ]; then
> + if [ $pe_sig_required -eq 0 ] && [ $ima_appraise -eq 1 ] \
> + && [ $ima_sig_required -eq 0 ] && [ $ima_signed -eq 0 ] \
> + && [ $ima_read_policy -eq 0 ]; then
>   log_fail "$succeed_msg (possibly missing IMA sig)"
>   fi
>  
> + if [ $pe_sig_required -eq 0 ] && [ $ima_appraise -eq 0 ]; then
> + log_info "No signature verification required"
> + elif [ $pe_sig_required -eq 0 ] && [ $ima_appraise -eq 1 ] \
> + && [ $ima_sig_required -eq 0 ] && [ $ima_signed -eq 0 ] \
> + && [ $ima_read_policy -eq 1 ]; then
> + log_info "No signature verification required"
> + fi
> +
>   log_pass "$succeed_msg"
>   fi
>  
> @@ -136,8 +145,9 @@ kexec_file_load_test()
>   log_pass "$failed_msg (missing IMA sig)"
>   fi
>  
> - if [ $pe_sig_required -eq 0 ] && [ $ima_sig_required -eq 0 ] \
> - && [ $ima_read_policy -eq 0 ] && [ $ima_signed -eq 0 ]; then
> + if [ $pe_sig_required -eq 0 ] && [ $ima_appraise -eq 1 ] \
> + && [ $ima_sig_required -eq 0 ] && [ $ima_read_policy -eq 0 ] \
> + && [ $ima_signed -eq 0 ]; then
>   log_pass "$failed_msg (possibly missing IMA sig)"
>   fi
>  
> @@ -157,6 +167,9 @@ if [ $? -eq 0 ]; then
>  fi
>  
>  # Determine which kernel config options are enabled
> +kconfig_enabled "CONFIG_IMA_APPRAISE=y" "IMA enabled"
> +ima_appraise=$?
> +
>  kconfig_enabled "CONFIG_IMA_ARCH_POLICY=y" \
>   "architecture specific policy enabled"
>  arch_policy=$?
> @@ -178,12 +191,6 @@ ima_sig_required=$?
>  get_secureboot_mode
>  secureboot=$?
>  
> -if [ $secureboot -eq 0 ] && [ $arch_policy -eq 0 ] && \
> -   [ $pe_sig_required -eq 0 ] && [ $ima_sig_required -eq 0 ] && \
> -   [ $ima_read_policy -eq 1 ]; then
> - log_skip "No signature verification required"
> -fi
> -
>  # Are there pe and ima signatures
>  check_for_pesig
>  pe_signed=$?
> diff --git a/tools/testing/selftests/kexec/test_kexec_load.sh 
> b/tools/testing/selftests/kexec/test_kexec_load.sh
> index 2a66c8897f55..49c6aa929137 100755
> --- a/tools/testing/selftests/kexec/test_kexec_load.sh
> +++ b/tools/testing/selftests/kexec/test_kexec_load.sh
> @@ -1,8 +1,8 @@
>  #!/bin/sh
>  # SPDX-License-Identifier: GPL-2.0
> -# Loading a kernel image via the kexec_load syscall should fail
> -# when the kernel is CONFIG_KEXEC_VERIFY_SIG enabled and the system
> -# is booted in secureboot mode.
> +#
> +# Prevent loading a kernel image via the kexec_load syscall when
> +# signatures are required.  (Dependent on CONFIG_IMA_ARCH_POLICY.)
>  
>  TEST="$0"
>  . ./kexec_common_lib.sh
> @@ -18,20 +18,28 @@ if [ $? -eq 0 ]; then
>   log_skip "kexec_load is not enabled"
>  fi
>  
> +kconfig_enabled "CONFIG_IMA_APPRAISE=y" "IMA enabled"
> +ima_appraise=$?
> +
>

Re: [PATCH 2/3] scripts/ima: define a set of common functions

2019-03-07 Thread Dave Young
On 02/28/19 at 10:05am, Mimi Zohar wrote:
> Hi Dave,
> 
> On Thu, 2019-02-28 at 21:41 +0800, Dave Young wrote:
> > Hi Mimi,
> >  
> > Sorry for jumping in late, just noticed this kexec selftests, I think we
> > also need a kexec load test not only for ima, but for general kexec
> 
> The IMA kselftest tests are for the coordination between the different
> methods of verifying file signatures.  In particular, for the kexec
> kernel image and kernel module signatures.
> 
> The initial IMA kselftest just verifies that in an environment
> requiring signed kexec kernel images, the kexec_load syscall fails. 
> 
> This week I posted additional IMA kselftests[1][2], including one for
> the kexec_file_load syscall.  I would really appreciate these
> kselftests being reviewed/acked.
> 
> Mimi
> 
> [1] Subject: [PATCH v2 0/5] selftests/ima: add kexec and kernel module tests
> [2] Patches available from the "next-queued-testing" branch
> https://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity.git/
> 

Hi Mimi,

Still did not get change to have a look at V2,  but seems you missed the
last chunk of comments about the secure boot mode in previous reply?

I just copy it hear:
'''
Do you want to get the Secureboot status here?
I got some advice from Peter Jones previously, thus we have below
in our kdump scripts:
https://src.fedoraproject.org/cgit/rpms/kexec-tools.git/tree/kdump-lib.sh

See the function is_secure_boot_enforced(), probably you can refer to
that function and check setup mode as well.
'''

Thanks
Dave


Re: [PATCH 2/3] scripts/ima: define a set of common functions

2019-02-28 Thread Dave Young
Hi Mimi,
 
Sorry for jumping in late, just noticed this kexec selftests, I think we
also need a kexec load test not only for ima, but for general kexec

On 01/31/19 at 01:55pm, Mimi Zohar wrote:
> Define and move get_secureboot_mode() to a common file for use by other
> tests.
> 
> Signed-off-by: Mimi Zohar 
> ---
>  tools/testing/selftests/ima/common_lib.sh  | 20 
>  tools/testing/selftests/ima/test_kexec_load.sh | 17 +++--
>  2 files changed, 23 insertions(+), 14 deletions(-)
>  create mode 100755 tools/testing/selftests/ima/common_lib.sh
> 
> diff --git a/tools/testing/selftests/ima/common_lib.sh 
> b/tools/testing/selftests/ima/common_lib.sh
> new file mode 100755
> index ..ae097a634da5
> --- /dev/null
> +++ b/tools/testing/selftests/ima/common_lib.sh
> @@ -0,0 +1,20 @@
> +#!/bin/sh
> +# SPDX-License-Identifier: GPL-2.0+
> +
> +get_secureboot_mode()
> +{
> + EFIVARFS="/sys/firmware/efi/efivars"
> + # Make sure that efivars is mounted in the normal location
> + if ! grep -q "^\S\+ $EFIVARFS efivarfs" /proc/mounts; then
> + echo "$TEST: efivars is not mounted on $EFIVARFS" >&2
> + exit $ksft_skip
> + fi
> +
> + # Get secureboot mode
> + file="$EFIVARFS/SecureBoot-*"
> + if [ ! -e $file ]; then
> + echo "$TEST: unknown secureboot mode" >&2
> + exit $ksft_skip
> + fi
> + return `hexdump $file | awk '{print substr($4,length($4),1)}'`
> +}

Do you want to get the Secureboot status here?
I got some advice from Peter Jones previously, thus we have below
in our kdump scripts:
https://src.fedoraproject.org/cgit/rpms/kexec-tools.git/tree/kdump-lib.sh
 
See the function is_secure_boot_enforced(), probably you can refer to
that function and check setup mode as well.

> diff --git a/tools/testing/selftests/ima/test_kexec_load.sh 
> b/tools/testing/selftests/ima/test_kexec_load.sh
> index 74423c4229e2..5e356673 100755
> --- a/tools/testing/selftests/ima/test_kexec_load.sh
> +++ b/tools/testing/selftests/ima/test_kexec_load.sh
> @@ -5,7 +5,7 @@
>  # is booted in secureboot mode.
>  
>  TEST="$0"
> -EFIVARFS="/sys/firmware/efi/efivars"
> +. ./common_lib.sh
>  rc=0
>  
>  # Kselftest framework requirement - SKIP code is 4.
> @@ -17,19 +17,8 @@ if [ $(id -ru) != 0 ]; then
>   exit $ksft_skip
>  fi
>  
> -# Make sure that efivars is mounted in the normal location
> -if ! grep -q "^\S\+ $EFIVARFS efivarfs" /proc/mounts; then
> - echo "$TEST: efivars is not mounted on $EFIVARFS" >&2
> - exit $ksft_skip
> -fi
> -
> -# Get secureboot mode
> -file="$EFIVARFS/SecureBoot-*"
> -if [ ! -e $file ]; then
> - echo "$TEST: unknown secureboot mode" >&2
> - exit $ksft_skip
> -fi
> -secureboot=`hexdump $file | awk '{print substr($4,length($4),1)}'`
> +get_secureboot_mode
> +secureboot=$?
>  
>  # kexec_load should fail in secure boot mode
>  KERNEL_IMAGE="/boot/vmlinuz-`uname -r`"
> -- 
> 2.7.5
> 
Thanks
Dave


Re: [PATCH] x86/boot/KASLR: skip the specified crashkernel reserved region

2019-02-25 Thread Dave Young
On 02/25/19 at 09:05pm, Baoquan He wrote:
> On 02/25/19 at 10:45am, Borislav Petkov wrote:
> > On Mon, Feb 25, 2019 at 03:59:56PM +0800, Pingfan Liu wrote:
> > > crashkernel=x@y option may fail to reserve the required memory region if
> > > KASLR puts kernel into the region. To avoid this uncertainty, making KASLR
> > > skip the required region.
> > 
> > Lemme see if I understand this correctly: supplying crashkernel=X@Y
> > influences where KASLR would put the randomized kernel. And it should be
> > the other way around, IMHO. crashkernel= will have to "work" with KASLR
> > to find a suitable range and if the reservation at Y fails, then we tell
> > the user to try the more relaxed variant crashkernel=M.
> 
> Hmm, asking user to try crashkernel=M is an option. Users may want to
> specify a region for crashkernel reservation, Just I forget in what case
> they want crashkernel=x@y set. In crashkernel=x@y specified case, we may
> truly need to avoid the already specified region.
> 
> Not sure if Dave still remember it. If no need, removing it is also
> good.

I do not know the exact use cases, but long time ago the kernel is not
relocatable this might be a reason.  Even now, there could be some non
linux use cases, if the loaded binary is not relocatable then the param
is still useful.

Also this is a general param instead of x86 only, some other arches
still use it, and no crashkernel=X implemented.

Thanks
Dave


Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr

2019-02-25 Thread Dave Young
On 02/25/19 at 12:00pm, Joerg Roedel wrote:
> On Fri, Feb 22, 2019 at 02:00:26PM +0100, Borislav Petkov wrote:
> > On Fri, Feb 22, 2019 at 09:42:41AM +0100, Joerg Roedel wrote:
> > > The current default of 256MB was found by experiments on a bigger
> > > number of machines, to create a reasonable default that is at least
> > > likely to be sufficient of an average machine.
> > 
> > Exactly, and this is what makes sense.
> > 
> > The code should try the requested reservation and if it fails, it should
> > try high allocation with default swiotlb size because we need to reserve
> > *some* range.
> 
> Right, makes sense. While at it, maybe it is time to move the default
> allocation policy to 'high' again. The change was reverted six years ago
> because it broke old kexec tools, but those are probably out-of-service
> now. I think this change would make the whole crashdump allocation
> process less fragile.

One concern about this is for average cases, one do not need so much
memory for kdump.  For example in RHEL we use crashkernel=auto to
automatically reserve kdump kernel memory, and for x86 the reserved size
is like below now:
1G-64G:160M,64G-1T:256M,1T-:512M

That means for a machine with less than 64G memory we only allocate
160M, it works for most machines in our lab.

If we move to high as default, it will allocate 160M high + 256M low. It
is too much for people who is good with the default 160M.  Especially
for virtual machine with less memory (but > 4G)

To make the process less fragile maybe we can remove the 896M limitation
and only try <4G then go to high.

Thanks
Dave


Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr

2019-02-24 Thread Dave Young
On 02/24/19 at 09:25pm, Pingfan Liu wrote:
> On Fri, Feb 22, 2019 at 9:00 PM Borislav Petkov  wrote:
> >
> > On Fri, Feb 22, 2019 at 09:42:41AM +0100, Joerg Roedel wrote:
> > > The current default of 256MB was found by experiments on a bigger
> > > number of machines, to create a reasonable default that is at least
> > > likely to be sufficient of an average machine.
> >
> > Exactly, and this is what makes sense.
> >
> > The code should try the requested reservation and if it fails, it should
> > try high allocation with default swiotlb size because we need to reserve
> > *some* range.
> >
> > If that reservation succeeds, we should say something along the lines of
> >
> > "... requested range failed, reserved  range instead."
> >
> Maybe I misunderstood you, but does "requested range failed" mean that
> user specify the range? If yes, then it should be the duty of user as
> you said later, not the duty of kernel"

If you go with the changes in your current patch it is needed to say
something like:
"crashkernel: can not find free memory under 4G, reserve XM@.. instead" 

Also need to print the reserved low memory area in case ,high being used.

But for 896M -> 4G, the 896M faulure is not necessary to show in dmesg,
it is some in kernel logic.

Thanks
Dave


Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr

2019-02-21 Thread Dave Young
On 02/21/19 at 06:13pm, Borislav Petkov wrote:
> On Wed, Feb 20, 2019 at 05:41:46PM +0800, Dave Young wrote:
> > Previously Joerg posted below patch, maybe he has some idea. Joerg?
> 
> Isn't it clear from the commit message?

Then, does it answered your question?

256M is set as a default value in the patch, but it is not a predict to
satisfy all use cases, from the description it is also possible that some
people run out of the 256M and the ,low and ,high format is still
necessary to exist even if we make crashkernel=X do the allocation
automatically in high in case failed in low area.

crashkernel=X:  allocate in low first, if not possible, then allocate in
high

In case people have a lot of devices need more swiotlb, then he manually
set the ,high with ,low together.

What's your suggestion then?  remove ,low and ,high and increase default
256M in case we get failure bug report?


Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr

2019-02-20 Thread Dave Young
On 02/20/19 at 09:32am, Borislav Petkov wrote:
> On Mon, Feb 18, 2019 at 09:48:20AM +0800, Dave Young wrote:
> > It is ideal if kernel can do it automatically, but I'm not sure if
> > kernel can predict the swiotlb reserved size automatically.
> 
> Do you see how even more absurd this gets?
> 
> If the kernel cannot know the swiotlb reserved size automatically, how
> is the normal user even supposed to know?!
> 
> I see swiotlb_size_or_default() so we have a sane default which we fall
> back to. Now where's the problem with that?

Good question, I expect some answer from people who know more about the
background.  It would be good to have some actual test results, Pingfan
is trying to do some tests.

Previously Joerg posted below patch, maybe he has some idea. Joerg?

commit 94fb9334182284e8e7e4bcb9125c25dc33af19d4
Author: Joerg Roedel 
Date:   Wed Jun 10 17:49:42 2015 +0200

x86/crash: Allocate enough low memory when crashkernel=high

Thanks
Dave


Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr

2019-02-17 Thread Dave Young
On 02/15/19 at 11:24am, Borislav Petkov wrote:
> On Tue, Feb 12, 2019 at 04:48:16AM +0800, Dave Young wrote:
> > Even we make it automatic in kernel, but we have to have some default
> > value for swiotlb in case crashkernel can not find a free region under 4G.
> > So this default value can not work for every use cases, people need 
> > manually use crashkernel=,low and crashkernel=,high in case
> > crashkernel=X does not work.
> 
> Why would the user need to find swiotlb range? The kernel has all the
> information it requires at its finger tips in order to decide properly.
> 
> The user wants a crashkernel range, the kernel tries the low range =>
> no workie, then it tries the next range => workie but needs to allocate
> swiotlb range so that DMA can happen too. Doh, then the kernel does
> allocate that too.

It is ideal if kernel can do it automatically, but I'm not sure if
kernel can predict the swiotlb reserved size automatically.

Let's add more people to seek for comments. 

> 
> Why would the user need to do anything here?!
> 
> -- 
> Regards/Gruss,
> Boris.
> 
> Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr

2019-02-11 Thread Dave Young
On 02/06/19 at 08:08pm, Dave Young wrote:
> On 02/05/19 at 09:15am, Borislav Petkov wrote:
> > On Mon, Feb 04, 2019 at 03:30:16PM -0700, Jerry Hoemann wrote:
> > > Is your objection only to the second fallback of allocating
> > > memory above >= 4GB?   Or are you objecting to allocating from
> > > (896 .. 4GB) as well?
> > 
> > My problem is why should the user need to specify high or low allocation
> > explicitly when we can handle all that in the kernel automatically.
> > 
> > The presence of crashkernel= on the cmdline sure means that the user
> > wants to allocate memory for a second kernel.
> > 
> > Now, if the requested allocation fails, we say:
> > 
> >   Error reserving crashkernel
> > 
> > So, instead of saying that, we can *try* *again* and say
> > 
> >   Error reserving requested crashkernel at @..., attempting a high range.
> > 
> > and run memblock_find_in_range() on the other regions which we deemed
> > are ok to allocate from.
> > 
> > Why aren't we doing that by default instead of placing all those
> > different options in front of the user and expecting her/him to know
> > something about all those magic ranges?
> 
> As we talked in another reply, for the >4G allocation we can not avoid
> the swiotlb issue,  but if one request for 256M in high region and we
> allocate the low part automatically, it will eat more memory eg. 512M.
> 
> But probably in case allacation failed in low region ,high is a must for kdump
> reservation, since no other choices perhaps we can make that as you said

That is exactly what Pingfan is doing in this patch.

Even we make it automatic in kernel, but we have to have some default
value for swiotlb in case crashkernel can not find a free region under 4G.
So this default value can not work for every use cases, people need 
manually use crashkernel=,low and crashkernel=,high in case
crashkernel=X does not work.  One can tune it for their use:

1) crashkernel=X reservation fails, likely the ,low default value is
still too big, one can shrink the value and manually try other value
2) crashkernel=X reserve successfully on high memory and along with some
default low memory region. But the low region is not enough.  In this
case one can increase the 

This should answer the question why ,high and ,low is still needed.

But for above consumption 1),  KASLR can still cause default ,low memory
failed to reserve.  So I wonder if KASLR can skip the 0-896M if the
system memory is big enough.

Thanks
Dave


Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr

2019-02-06 Thread Dave Young
On 02/05/19 at 09:15am, Borislav Petkov wrote:
> On Mon, Feb 04, 2019 at 03:30:16PM -0700, Jerry Hoemann wrote:
> > Is your objection only to the second fallback of allocating
> > memory above >= 4GB?   Or are you objecting to allocating from
> > (896 .. 4GB) as well?
> 
> My problem is why should the user need to specify high or low allocation
> explicitly when we can handle all that in the kernel automatically.
> 
> The presence of crashkernel= on the cmdline sure means that the user
> wants to allocate memory for a second kernel.
> 
> Now, if the requested allocation fails, we say:
> 
>   Error reserving crashkernel
> 
> So, instead of saying that, we can *try* *again* and say
> 
>   Error reserving requested crashkernel at @..., attempting a high range.
> 
> and run memblock_find_in_range() on the other regions which we deemed
> are ok to allocate from.
> 
> Why aren't we doing that by default instead of placing all those
> different options in front of the user and expecting her/him to know
> something about all those magic ranges?

As we talked in another reply, for the >4G allocation we can not avoid
the swiotlb issue,  but if one request for 256M in high region and we
allocate the low part automatically, it will eat more memory eg. 512M.

But probably in case allacation failed in low region ,high is a must for kdump
reservation, since no other choices perhaps we can make that as you said

> 
> I don't think most of the users care about where the kernel gets
> allocated - all they want is a working kdump setup.
> 
> > Falling back to allocating < 4GB probably satisfes most of the cases
> > where the original allocation fails.
> 
> Yes. Now make that automatic.

For the time being, this should be good enough.

> 
> -- 
> Regards/Gruss,
> Boris.
> 
> Good mailing practices for 400: avoid top-posting and trim the reply.

Thanks
Dave


Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr

2019-01-30 Thread Dave Young
On 01/25/19 at 03:08pm, Borislav Petkov wrote:
> On Fri, Jan 25, 2019 at 09:45:18PM +0800, Dave Young wrote:
> > AFAIK, some people prefer to explictly reserve crash memory at high
> > region even if it is possible to reserve at low area.  May because
> > <4G memory is limited on large server, they want to leave this for other
> > use. 
> > 
> > Yinghai or Vivek should know more about the history, probably they can
> > recall some initial reason.
> 
> Yes, just "prefer" is not good enough. There should be a technical
> reason why that's there.

Got more about this, actually the thing is for large server with very
huge memory and also possible a lot of io devices.  The UEFI IO drivers
could allocate a lot memory under 896M,  so it will leave small room for
crashkernel=X

Also if the memory is huge, then copying the vmcore will be very slow,
people want to use big makedumpfile bitmap buffer, and multithread, then
kdump kernel needs more memory, they can choose ,high to get more
memory.

But for common use cases, if one does not need so much kdump memory
reserved he/she can just use crashkernel=X.


> 
> Also, if the user doesn't care, then the code should be free to force
> "high" and thus probe a different range for allocation.

As Pingfan/me mentioned in another reply, there are two reasons:
1. old kexec-tools can not load kernel to high memory
2. ,high will not work on some systems without some amounts of low
memory so it nees reserve extra low memory, it is bad for people who do
not want it. 

> 
> > Good question, still it may be some historical reason, but it is good to
> > make them clear and rethink about it after long time.
> > 
> > I also want to understand, need dig the log more.
> 
> Good idea. That would be a very nice cleanup. :-)

Let's review these different parameters:
> crashkernel=size[KMG][@offset[KMG]]

Did not get the initial commit for this, it should be the very old
format, and kernel did not find the base address automatically for the
size

Other arches still use this, for example mips code seems needs explict
@offset, see the function mips_parse_crashkernel in
arch/mips/kernel/setup.c

Probably there are other arches as well which only support this format.

> crashkernel=range1:size1[,range2:size2,...][@offset]

This is nice param which can help to dynamically reserve different size
depends on system memory.

> crashkernel=size[KMG],high
> crashkernel=size[KMG],low

We have talked about these two.  It is not graceful, but seems no way to
improve it due to the swiotlb issue.

Thanks
Dave


Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr

2019-01-30 Thread Dave Young
On 01/29/19 at 01:25pm, Pingfan Liu wrote:
> On Fri, Jan 25, 2019 at 10:08 PM Borislav Petkov  wrote:
> >
> > On Fri, Jan 25, 2019 at 09:45:18PM +0800, Dave Young wrote:
> > > AFAIK, some people prefer to explictly reserve crash memory at high
> > > region even if it is possible to reserve at low area.  May because
> > > <4G memory is limited on large server, they want to leave this for other
> > > use.
> > >
> > > Yinghai or Vivek should know more about the history, probably they can
> > > recall some initial reason.
> >
> Go through the git log, and I found the initial introduction of
> crashkernel_high option. Refer to
> commit 55a20ee7804ab64ac90bcdd4e2868a42829e2784
> Author: Yinghai Lu 
> Date:   Mon Apr 15 22:23:47 2013 -0700
> 
> x86, kdump: Retore crashkernel= to allocate under 896M
> 
> Vivek found old kexec-tools does not work new kernel anymore.
> 
> So change back crashkernel= back to old behavoir, and add 
> crashkernel_high=
> to let user decide if buffer could be above 4G, and also new
> kexec-tools will
> be needed.
> 
> But kexec-tools-2.0.3, released at 2012, can run 4.20 kernel with
> crashkernel=256M@5G, so I think only very old kexec-tools requires
> memory under 896M. Due to -1.few people running latest kernel with
> very old kexec-tools to date, -2. crashkernel=X is more popular than
> crashkernel=X.high, it should be time to eliminate this limit of
> crashkernel=X parameter, otherwise we will run into this bug.
> As for crashkernel=,high, I think it is a more professional option for
> who cares about the DMA32. On high-end machine, big reserved region is
> used for crashkernel(e.g. in this case 384M), which make the crowed
> situation under under 4GB memory worse.

This seems answers the question why ,high is needed and why
crashkernel=X can not allocate from high by default.

Another reason for this question is for crashkernel=,high, a separate
region under 4G is still needed.  I searched some history bug/emails,
previously the initial commit does not reserve low memory by default if
,high is used, but later Yinghai saw a regression bug in case iommu
disabled. see below commit:
commit c729de8fcea37a1c444e81857eace12494c804a9
Author: Yinghai Lu 
Date:   Mon Apr 15 22:23:45 2013 -0700

x86, kdump: Set crashkernel_low automatically

Chao said that kdump does does work well on his system on 3.8
without extra parameter, even iommu does not work with kdump.
And now have to append crashkernel_low=Y in first kernel to make
kdump work.

We have now modified crashkernel=X to allocate memory beyong 4G (if
available) and do not allocate low range for crashkernel if the user
does not specify that with crashkernel_low=Y.  This causes regression
if iommu is not enabled.  Without iommu, swiotlb needs to be setup in
first 4G and there is no low memory available to second kernel.
[snip]



> 
> > Yes, just "prefer" is not good enough. There should be a technical
> > reason why that's there.
> >
> > Also, if the user doesn't care, then the code should be free to force
> > "high" and thus probe a different range for allocation.
> >
> Do you suggest to remove crashkernel=X,high parameter?

I do not think it can be removed,  because for ,high we also need extra
low memory, it will cause confusion, most people are just fine without
this.  People can choose to use ,high if they really need it.

Thanks
Dave


Re: [PATCHv7] x86/kdump: bugfix, make the behavior of crashkernel=X consistent with kaslr

2019-01-28 Thread Dave Young
On 01/25/19 at 03:08pm, Borislav Petkov wrote:
> On Fri, Jan 25, 2019 at 09:45:18PM +0800, Dave Young wrote:
> > AFAIK, some people prefer to explictly reserve crash memory at high
> > region even if it is possible to reserve at low area.  May because
> > <4G memory is limited on large server, they want to leave this for other
> > use. 
> > 
> > Yinghai or Vivek should know more about the history, probably they can
> > recall some initial reason.
> 
> Yes, just "prefer" is not good enough. There should be a technical
> reason why that's there.
> 
> Also, if the user doesn't care, then the code should be free to force
> "high" and thus probe a different range for allocation.

Another reason is in case ,high we will need automatically reserve a
region in low area for swiotlb.  So for example one use
crashkernel=256M,high,  actual reserved memory is 256M above 4G and
another 256M under 4G for swiotlb.  Normally it is not necessary for
most people.  Thus we can not make ,high as default.

Thanks
Dave


  1   2   3   4   5   6   7   8   9   10   >