Re: [PATCH] kdump: add default crashkernel reserve kernel config options

2018-05-21 Thread Dave Young
On 05/21/18 at 12:02pm, Andrew Morton wrote:
> On Mon, 21 May 2018 10:53:37 +0800 Dave Young  wrote:
> 
> > This is a rework of the crashkernel=auto patches back to 2009 although
> > I'm not sure if below is the last version of the old effort:
> > https://lkml.org/lkml/2009/8/12/61
> > https://lwn.net/Articles/345344/
> > 
> > I changed the original design, instead of adding the auto reserve logic
> > in code, in this patch just introduce two kernel config options for
> > the default crashkernel value in MB and the threshold of system memory
> > in MB so that only reserve default when system memory is equal or
> > above the threshold.
> > 
> > With the kernel configs distributions can easily change the default
> > values so that people do not need to manually set kernel cmdline
> > for common use cases and one can still overwrite the default value
> > with manual setup or disable it by using crashkernel=0
> > 
> > Signed-off-by: Dave Young 
> > ---
> > Another difference is with original design the crashkernel size scales
> > with system memory, according to test, large machine may need more
> > memory in kdump kernel because of several factors:
> > 1. cpu numbers, because of the percpu memory allocated for cpus.
> >(kdump can use nr_cpus=1 to workaround this, but some
> > arches do not support nr_cpus=X for example powerpc) 
> > 2. IO devices, large system can have a lot of io devices, although we
> >can try to only add those device drivers we needed, it is still a
> >problem because of some built-in drivers, some stacked logical devices
> >eg. device mapper devices, acpi etc.  Even if only considering the
> >meta data for driver model it will still be a big number eg. sysfs
> >files etc.
> > 3. The minimum memory requirement for some device drivers are big, even
> >if some of them have implemented low meory profile.  It is usual to see
> >10M memory use for a storage driver.
> > 4. user space initramfs size growing.  Busybox is not usable if we need
> >to add udev support and some complicate storage support.  Use dracut
> >with systemd, especially networking stuff need more memory.
> > 
> > So probably add another kernel config option to scale the memory size
> > eg.  CRASHKERNEL_DEFAULT_SCALE_RATIO is also good to have,  in RHEL we
> > use base_value + system_mem >> (2^14) for x86.  I'm still hesatating
> > how to describe and add this option. Any suggestions will be appreciated.
> > 
> > ...
> >
> > --- linux-x86.orig/arch/Kconfig
> > +++ linux-x86/arch/Kconfig
> > @@ -10,6 +10,22 @@ config KEXEC_CORE
> > select CRASH_CORE
> > bool
> >  
> > +config CRASHKERNEL_DEFAULT_THRESHOLD_MB
> > +   int "System memory size threshold for kdump memory default reserving"
> > +   depends on CRASH_CORE
> > +   default 0
> > +   help
> > + CRASHKERNEL_DEFAULT_MB is used as default crashkernel value if
> > + the system memory size is equal or bigger than the threshold.
> 
> "the threshold" is rather vague.  Can it be clarified?
> 
> In fact I'm really struggling to understand the logic here

Sorry I missed this comment, the threshold is like this:

if system memory size is lower than the threshold then we will do
nothing and do not reserve crashkernel memory by default.  Eg. if the
threshold is 1024M then default reservation is only used when system has
more than 1024M memory, and for those low mem machine we do not reserve by
default.

Thanks
Dave

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] kdump: add default crashkernel reserve kernel config options

2018-05-21 Thread Dave Young
On 05/21/18 at 12:02pm, Andrew Morton wrote:
> On Mon, 21 May 2018 10:53:37 +0800 Dave Young  wrote:
> 
> > This is a rework of the crashkernel=auto patches back to 2009 although
> > I'm not sure if below is the last version of the old effort:
> > https://lkml.org/lkml/2009/8/12/61
> > https://lwn.net/Articles/345344/
> > 
> > I changed the original design, instead of adding the auto reserve logic
> > in code, in this patch just introduce two kernel config options for
> > the default crashkernel value in MB and the threshold of system memory
> > in MB so that only reserve default when system memory is equal or
> > above the threshold.
> > 
> > With the kernel configs distributions can easily change the default
> > values so that people do not need to manually set kernel cmdline
> > for common use cases and one can still overwrite the default value
> > with manual setup or disable it by using crashkernel=0
> > 
> > Signed-off-by: Dave Young 
> > ---
> > Another difference is with original design the crashkernel size scales
> > with system memory, according to test, large machine may need more
> > memory in kdump kernel because of several factors:
> > 1. cpu numbers, because of the percpu memory allocated for cpus.
> >(kdump can use nr_cpus=1 to workaround this, but some
> > arches do not support nr_cpus=X for example powerpc) 
> > 2. IO devices, large system can have a lot of io devices, although we
> >can try to only add those device drivers we needed, it is still a
> >problem because of some built-in drivers, some stacked logical devices
> >eg. device mapper devices, acpi etc.  Even if only considering the
> >meta data for driver model it will still be a big number eg. sysfs
> >files etc.
> > 3. The minimum memory requirement for some device drivers are big, even
> >if some of them have implemented low meory profile.  It is usual to see
> >10M memory use for a storage driver.
> > 4. user space initramfs size growing.  Busybox is not usable if we need
> >to add udev support and some complicate storage support.  Use dracut
> >with systemd, especially networking stuff need more memory.
> > 
> > So probably add another kernel config option to scale the memory size
> > eg.  CRASHKERNEL_DEFAULT_SCALE_RATIO is also good to have,  in RHEL we
> > use base_value + system_mem >> (2^14) for x86.  I'm still hesatating
> > how to describe and add this option. Any suggestions will be appreciated.
> > 
> > ...
> >
> > --- linux-x86.orig/arch/Kconfig
> > +++ linux-x86/arch/Kconfig
> > @@ -10,6 +10,22 @@ config KEXEC_CORE
> > select CRASH_CORE
> > bool
> >  
> > +config CRASHKERNEL_DEFAULT_THRESHOLD_MB
> > +   int "System memory size threshold for kdump memory default reserving"
> > +   depends on CRASH_CORE
> > +   default 0
> > +   help
> > + CRASHKERNEL_DEFAULT_MB is used as default crashkernel value if
> > + the system memory size is equal or bigger than the threshold.
> 
> "the threshold" is rather vague.  Can it be clarified?
> 
> In fact I'm really struggling to understand the logic here
> 
> 
> > +config CRASHKERNEL_DEFAULT_MB
> > +   int "Default crashkernel memory size reserved for kdump"
> > +   depends on CRASH_CORE
> > +   default 0
> > +   help
> > + This is used as the default kdump reserved memory size in MB.
> > + crashkernel=X kernel cmdline can overwrite this value.
> > +
> >  config HAVE_IMA_KEXEC
> > bool
> >  
> > @@ -143,6 +144,24 @@ static int __init parse_crashkernel_simp
> > return 0;
> >  }
> >  
> > +static int __init get_crashkernel_default(unsigned long long system_ram,
> > + unsigned long long *size)
> > +{
> > +   unsigned long long sz = CONFIG_CRASHKERNEL_DEFAULT_MB;
> > +   unsigned long long thres = CONFIG_CRASHKERNEL_DEFAULT_THRESHOLD_MB;
> > +
> > +   thres *= SZ_1M;
> > +   sz *= SZ_1M;
> > +
> > +   if (sz >= system_ram || system_ram < thres) {
> > +   pr_debug("crashkernel default size can not be used.\n");
> > +   return -EINVAL;
> 
> In other words,
> 
>   if (system_ram <= CONFIG_CRASHKERNEL_DEFAULT_MB ||
>   system_ram < CONFIG_CRASHKERNEL_DEFAULT_THRESHOLD_MB)
>   fail;
> 
> yes?

the first comparison is a sanity check for the default reserved
size, if it is bigger than system ram size it is apparently bad:
if ( CONFIG_CRASHKERNEL_DEFAULT_MB >= system_ram )
fail;

The second comparison is for the threshold setting, it is a designed
logic like:
if ( system_ram >= CONFIG_CRASHKERNEL_DEFAULT_THRESHOLD_MB ) then
go ahead to use the default value of CONFIG_CRASHKERNEL_DEFAULT_MB

> 
> How come?  What's happening here?  Perhaps a (good) explanatory comment
> is needed.  And clearer Kconfig text.
> 
> All confused :(

Hmm, scratch head~, will think about how to describe it better.  If you
have any suggestions just let me know :)

Thanks
Dave


___
kexec mailing list
kexec@l

Proposal

2018-05-21 Thread Miss Zeliha Omer Faruk



Hello

Greetings to you please i have a business proposal for you contact me
for more detailes asap thanks.

Best Regards,
Miss.Zeliha ömer faruk
Esentepe Mahallesi Büyükdere
Caddesi Kristal Kule Binasi
No:215
Sisli - Istanbul, Turkey


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] kdump: add default crashkernel reserve kernel config options

2018-05-21 Thread Andrew Morton
On Mon, 21 May 2018 10:53:37 +0800 Dave Young  wrote:

> This is a rework of the crashkernel=auto patches back to 2009 although
> I'm not sure if below is the last version of the old effort:
> https://lkml.org/lkml/2009/8/12/61
> https://lwn.net/Articles/345344/
> 
> I changed the original design, instead of adding the auto reserve logic
> in code, in this patch just introduce two kernel config options for
> the default crashkernel value in MB and the threshold of system memory
> in MB so that only reserve default when system memory is equal or
> above the threshold.
> 
> With the kernel configs distributions can easily change the default
> values so that people do not need to manually set kernel cmdline
> for common use cases and one can still overwrite the default value
> with manual setup or disable it by using crashkernel=0
> 
> Signed-off-by: Dave Young 
> ---
> Another difference is with original design the crashkernel size scales
> with system memory, according to test, large machine may need more
> memory in kdump kernel because of several factors:
> 1. cpu numbers, because of the percpu memory allocated for cpus.
>(kdump can use nr_cpus=1 to workaround this, but some
> arches do not support nr_cpus=X for example powerpc) 
> 2. IO devices, large system can have a lot of io devices, although we
>can try to only add those device drivers we needed, it is still a
>problem because of some built-in drivers, some stacked logical devices
>eg. device mapper devices, acpi etc.  Even if only considering the
>meta data for driver model it will still be a big number eg. sysfs
>files etc.
> 3. The minimum memory requirement for some device drivers are big, even
>if some of them have implemented low meory profile.  It is usual to see
>10M memory use for a storage driver.
> 4. user space initramfs size growing.  Busybox is not usable if we need
>to add udev support and some complicate storage support.  Use dracut
>with systemd, especially networking stuff need more memory.
> 
> So probably add another kernel config option to scale the memory size
> eg.  CRASHKERNEL_DEFAULT_SCALE_RATIO is also good to have,  in RHEL we
> use base_value + system_mem >> (2^14) for x86.  I'm still hesatating
> how to describe and add this option. Any suggestions will be appreciated.
> 
> ...
>
> --- linux-x86.orig/arch/Kconfig
> +++ linux-x86/arch/Kconfig
> @@ -10,6 +10,22 @@ config KEXEC_CORE
>   select CRASH_CORE
>   bool
>  
> +config CRASHKERNEL_DEFAULT_THRESHOLD_MB
> + int "System memory size threshold for kdump memory default reserving"
> + depends on CRASH_CORE
> + default 0
> + help
> +   CRASHKERNEL_DEFAULT_MB is used as default crashkernel value if
> +   the system memory size is equal or bigger than the threshold.

"the threshold" is rather vague.  Can it be clarified?

In fact I'm really struggling to understand the logic here


> +config CRASHKERNEL_DEFAULT_MB
> + int "Default crashkernel memory size reserved for kdump"
> + depends on CRASH_CORE
> + default 0
> + help
> +   This is used as the default kdump reserved memory size in MB.
> +   crashkernel=X kernel cmdline can overwrite this value.
> +
>  config HAVE_IMA_KEXEC
>   bool
>  
> @@ -143,6 +144,24 @@ static int __init parse_crashkernel_simp
>   return 0;
>  }
>  
> +static int __init get_crashkernel_default(unsigned long long system_ram,
> +   unsigned long long *size)
> +{
> + unsigned long long sz = CONFIG_CRASHKERNEL_DEFAULT_MB;
> + unsigned long long thres = CONFIG_CRASHKERNEL_DEFAULT_THRESHOLD_MB;
> +
> + thres *= SZ_1M;
> + sz *= SZ_1M;
> +
> + if (sz >= system_ram || system_ram < thres) {
> + pr_debug("crashkernel default size can not be used.\n");
> + return -EINVAL;

In other words,

if (system_ram <= CONFIG_CRASHKERNEL_DEFAULT_MB ||
system_ram < CONFIG_CRASHKERNEL_DEFAULT_THRESHOLD_MB)
fail;

yes?

How come?  What's happening here?  Perhaps a (good) explanatory comment
is needed.  And clearer Kconfig text.

All confused :(

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH net-next] vmcore: move get_vmcore_size out of __init

2018-05-21 Thread David Miller
From: Rahul Lakkireddy 
Date: Mon, 21 May 2018 19:07:50 +0530

> Fix below build warning:
> 
> WARNING: vmlinux.o(.text+0x422bb8): Section mismatch in reference from
> the function vmcore_add_device_dump() to the function
> .init.text:get_vmcore_size.constprop.5()
> 
> The function vmcore_add_device_dump() references
> the function __init get_vmcore_size.constprop.5().
> This is often because vmcore_add_device_dump lacks a __init
> annotation or the annotation of get_vmcore_size.constprop.5 is wrong.
> 
> Fixes: 7efe48df8a3d ("vmcore: append device dumps to vmcore as elf notes")
> Signed-off-by: Rahul Lakkireddy 
> Signed-off-by: Ganesh Goudar 

Applied, thank you.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH net-next] vmcore: move get_vmcore_size out of __init

2018-05-21 Thread Rahul Lakkireddy
Fix below build warning:

WARNING: vmlinux.o(.text+0x422bb8): Section mismatch in reference from
the function vmcore_add_device_dump() to the function
.init.text:get_vmcore_size.constprop.5()

The function vmcore_add_device_dump() references
the function __init get_vmcore_size.constprop.5().
This is often because vmcore_add_device_dump lacks a __init
annotation or the annotation of get_vmcore_size.constprop.5 is wrong.

Fixes: 7efe48df8a3d ("vmcore: append device dumps to vmcore as elf notes")
Signed-off-by: Rahul Lakkireddy 
Signed-off-by: Ganesh Goudar 
---
 fs/proc/vmcore.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 247c3499e5bd..cfb6674331fd 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -649,8 +649,8 @@ static struct vmcore* __init get_new_element(void)
return kzalloc(sizeof(struct vmcore), GFP_KERNEL);
 }
 
-static u64 __init get_vmcore_size(size_t elfsz, size_t elfnotesegsz,
- struct list_head *vc_list)
+static u64 get_vmcore_size(size_t elfsz, size_t elfnotesegsz,
+  struct list_head *vc_list)
 {
u64 size;
struct vmcore *m;
-- 
2.14.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 0/2] support kdump for AMD secure memory encryption(sme)

2018-05-21 Thread Tom Lendacky
On 5/20/2018 10:45 PM, lijiang wrote:
> 在 2018年05月17日 21:45, lijiang 写道:
>> 在 2018年05月15日 21:31, Tom Lendacky 写道:
>>> On 5/14/2018 8:51 PM, Lianbo Jiang wrote:
 It is convenient to remap the old memory encrypted to the second kernel by
 calling ioremap_encrypted().

 When sme enabled on AMD server, we also need to support kdump. Because
 the memory is encrypted in the first kernel, we will remap the old memory
 encrypted to the second kernel(crash kernel), and sme is also enabled in
 the second kernel, otherwise the old memory encrypted can not be decrypted.
 Because simply changing the value of a C-bit on a page will not
 automatically encrypt the existing contents of a page, and any data in the
 page prior to the C-bit modification will become unintelligible. A page of
 memory that is marked encrypted will be automatically decrypted when read
 from DRAM and will be automatically encrypted when written to DRAM.

 For the kdump, it is necessary to distinguish whether the memory is
 encrypted. Furthermore, we should also know which part of the memory is
 encrypted or decrypted. We will appropriately remap the memory according
 to the specific situation in order to tell cpu how to deal with the
 data(encrypted or decrypted). For example, when sme enabled, if the old
 memory is encrypted, we will remap the old memory in encrypted way, which
 will automatically decrypt the old memory encrypted when we read those data
 from the remapping address.

  --
 | first-kernel | second-kernel | kdump support |
 |  (mem_encrypt=on|off)|   (yes|no)| 
 |--+---+---|
 | on   | on| yes   |
 | off  | off   | yes   |
 | on   | off   | no|
 | off  | on| no|
 |__|___|___|

 Test tools:
 makedumpfile[v1.6.3]: https://github.com/LianboJ/makedumpfile
 commit e1de103eca8f (A draft for kdump vmcore about AMD SME)
 Author: Lianbo Jiang 
 Date:   Mon May 14 17:02:40 2018 +0800
 Note: This patch can only dump vmcore in the case of SME enabled.

 crash-7.2.1: https://github.com/crash-utility/crash.git
 commit 1e1bd9c4c1be (Fix for the "bpf" command display on Linux 4.17-rc1)
 Author: Dave Anderson 
 Date:   Fri May 11 15:54:32 2018 -0400

 Test environment:
 HP ProLiant DL385Gen10 AMD EPYC 7251
 8-Core Processor
 32768 MB memory
 600 GB disk space

 Linux 4.17-rc4:
 git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
 commit 75bc37fefc44 ("Linux 4.17-rc4")
 Author: Linus Torvalds 
 Date:   Sun May 6 16:57:38 2018 -1000

 Reference:
 AMD64 Architecture Programmer's Manual
 https://support.amd.com/TechDocs/24593.pdf

>>>
>>> Have you also tested this with SEV?  It would be nice if the kdump
>>> changes you make work with both SME and SEV.
>>>
>> Thank you, Tom.
>> This is a great question, we originally plan to implement SEV in subsequent 
>> patches, and
>> we are also working on SEV at present.
>> Furthermore, we have another known issue that the system can't jump into the 
>> second kernel
>> when SME is enabled and kaslr is disabled in kdump mode. It seems that is a 
>> complex problems,
>> maybe it is related to kaslr and SME, currently, i'm not sure the root 
>> cause, but we will
>> also plan to fix it. Can you give me any advice about this issue?
>>
> Based on upstream 4.17-rc5, we have recently found a new issue that the 
> system can't boot to
> use kexec when SME is enabled and kaslr is disabled. Have you ever run into 
> this issue? 
> They have similar reproduction scenarios.
> 
> CC Tom & Baoquan

I haven't encountered this issue.  Can you send the kernel config that you
are using?  And your kernel command line?  I'll try your config and see if
I can reproduce the issue.

Just to be clarify, you perform a power-on boot with SME enabled and KASLR
disabled (e.g. mem_encrypt=on nokaslr), correct, and that won't boot?

Thanks,
Tom

> 
> Thanks.
> Lianbo
> 
>> Thanks.
>> Lianbo
>>> Thanks,
>>> Tom
>>>
 Lianbo Jiang (2):
   add a function(ioremap_encrypted) for kdump when AMD sme enabled.
   support kdump when AMD secure memory encryption is active

  arch/x86/include/asm/dmi.h  | 14 +-
  arch/x86/include/asm/io.h   |  2 ++
  arch/x86/kernel/acpi/boot.c |  8 
  arch/x86/kernel/crash_dump_64.c | 27 +++
  arch/x86/mm/ioremap.c   | 25 +
  drivers/acpi/tables.c   | 14 +-
  drivers/iommu/amd_iommu_init.c  |  9 -
  fs/proc/vmcore.c| 36 +++-
  

Re: [PATCH v9 07/11] arm64: kexec_file: add crash dump support

2018-05-21 Thread AKASHI Takahiro
Hi Rob,

On Fri, May 18, 2018 at 10:35:52AM -0500, Rob Herring wrote:
> On Tue, May 15, 2018 at 06:12:59PM +0100, James Morse wrote:
> > Hi guys,
> > 
> > (CC: +RobH, devicetree list)
> 
> Thanks.
> 
> > On 25/04/18 07:26, AKASHI Takahiro wrote:
> > > Enabling crash dump (kdump) includes
> > > * prepare contents of ELF header of a core dump file, /proc/vmcore,
> > >   using crash_prepare_elf64_headers(), and
> > > * add two device tree properties, "linux,usable-memory-range" and
> > >   "linux,elfcorehdr", which represent repsectively a memory range
> > >   to be used by crash dump kernel and the header's location
> 
> BTW, I intend to move existing parsing these out of the arch code. 
> Please don't add more DT handling to arch/ unless it is *really* arch 
> specific. I'd assume that the next arch to add kexec support will use 
> these bindings instead of the powerpc way.

So do you expect all the fdt-related stuff in my current implementation
for arm64 to be put into libfdt, or at least drivers/of, from the beginning?

I'm not sure how arch-specific the properties here are. For instance,
it is only arm64 that uses "linux,usable-memory-range" right now but
if some other arch follows, it is no more arch-specific.
# I remember that you didn't like this property :)

> > kexec_file_load() on arm64 needs to be able to create a prop encoded array 
> > to
> > the FDT, but there doesn't appear to be a libfdt helper to do this.
> > 
> > Akashi's code below adds fdt_setprop_range() to the arch code, and 
> > duplicates
> > bits of libfdt_internal.h to do the work.
> > 
> > How should this be done? I'm assuming this is something we need a new API in
> > libfdt.h for. How do these come about, and is there an interim step we can 
> > use
> > until then?
> 
> Submit patches to upstream dtc and then we can pull it in. Ahead of that 
> you can add it to drivers/of/fdt.c (or maybe fdt_address.c because 
> that's really what this is dealing with).

OK, I'm going to try to follow your suggestion.

> libfdt has only recently gained the beginnings of address handling.
> 
> > 
> > Thanks!
> > 
> > James
> > 
> > > diff --git a/arch/arm64/kernel/machine_kexec_file.c 
> > > b/arch/arm64/kernel/machine_kexec_file.c
> > > index 37c0a9dc2e47..ec674f4d267c 100644
> > > --- a/arch/arm64/kernel/machine_kexec_file.c
> > > +++ b/arch/arm64/kernel/machine_kexec_file.c
> > > @@ -76,6 +81,78 @@ int arch_kexec_walk_mem(struct kexec_buf *kbuf,
> > >   return ret;
> > >  }
> > >  
> > > +static int __init arch_kexec_file_init(void)
> > > +{
> > > + /* Those values are used later on loading the kernel */
> > > + __dt_root_addr_cells = dt_root_addr_cells;
> > > + __dt_root_size_cells = dt_root_size_cells;
> 
> I intend to make dt_root_*_cells private, so don't add another user 
> outside of drivers/of/.

Once cells_size_fitted() moves to drivers/of, there will be no users.

> > > +
> > > + return 0;
> > > +}
> > > +late_initcall(arch_kexec_file_init);
> > > +
> > > +#define FDT_ALIGN(x, a)  (((x) + (a) - 1) & ~((a) - 1))
> > > +#define FDT_TAGALIGN(x)  (FDT_ALIGN((x), FDT_TAGSIZE))
> > > +
> > > +static int fdt_prop_len(const char *prop_name, int len)
> > > +{
> > > + return (strlen(prop_name) + 1) +
> > > + sizeof(struct fdt_property) +
> > > + FDT_TAGALIGN(len);
> > > +}
> > > +
> > > +static bool cells_size_fitted(unsigned long base, unsigned long size)
> 
> I can't imagine this would happen. However, when this is moved to 
> drivers/of/ or dtc, these need to be u64 types to work on 32-bit.

OK.

> > > + /* if *_cells >= 2, cells can hold 64-bit values anyway */
> > > + if ((__dt_root_addr_cells == 1) && (base >= (1ULL << 32)))
> > > + return false;
> > > +
> > > + if ((__dt_root_size_cells == 1) && (size >= (1ULL << 32)))
> > > + return false;
> > > +
> > > + return true;
> > > +}
> > > +
> > > +static void fill_property(void *buf, u64 val64, int cells)
> > > +{
> > > + u32 val32;
> 
> This should be a __be32 or fdt32 type. So should buf.

OK for val32, but buf is a local pointer address.

> > > +
> > > + if (cells == 1) {
> > > + val32 = cpu_to_fdt32((u32)val64);
> > > + memcpy(buf, &val32, sizeof(val32));
> > > + } else {
> > > + memset(buf, 0, cells * sizeof(u32) - sizeof(u64));
> > > + buf += cells * sizeof(u32) - sizeof(u64);
> > > +
> > > + val64 = cpu_to_fdt64(val64);
> > > + memcpy(buf, &val64, sizeof(val64));
> 
> Look how of_read_number() is implemented. You should be able to do 
> something similar here looping and avoiding the if/else.

Ah, excellent!

> > > + }
> > > +}
> > > +
> > > +static int fdt_setprop_range(void *fdt, int nodeoffset, const char *name,
> > > + unsigned long addr, unsigned long size)
> 
> A very generic sounding function, but really only works on addresses in 
> children of the root node.
> 
> > > +{
> > > + void *buf, *prop;
> > > + size_t buf_size;
> > > + int result;
> > > +
> > > + buf_size = (__dt_ro

Re: [PATCH v9 07/11] arm64: kexec_file: add crash dump support

2018-05-21 Thread AKASHI Takahiro
James,

On Fri, May 18, 2018 at 05:00:55PM +0100, James Morse wrote:
> Hi Akashi,
> 
> On 18/05/18 11:39, AKASHI Takahiro wrote:
> > On Tue, May 15, 2018 at 06:11:15PM +0100, James Morse wrote:
> >> On 25/04/18 07:26, AKASHI Takahiro wrote:
> >>> Enabling crash dump (kdump) includes
> >>> * prepare contents of ELF header of a core dump file, /proc/vmcore,
> >>>   using crash_prepare_elf64_headers(), and
> >>> * add two device tree properties, "linux,usable-memory-range" and
> >>>   "linux,elfcorehdr", which represent repsectively a memory range
> 
> >>> diff --git a/arch/arm64/kernel/machine_kexec_file.c 
> >>> b/arch/arm64/kernel/machine_kexec_file.c
> >>> index 37c0a9dc2e47..ec674f4d267c 100644
> >>> --- a/arch/arm64/kernel/machine_kexec_file.c
> >>> +++ b/arch/arm64/kernel/machine_kexec_file.c
> 
> >>> +static void fill_property(void *buf, u64 val64, int cells)
> >>> +{
> >>> + u32 val32;
> >>> +
> >>> + if (cells == 1) {
> >>> + val32 = cpu_to_fdt32((u32)val64);
> >>> + memcpy(buf, &val32, sizeof(val32));
> >>> + } else {
> >>
> >>> + memset(buf, 0, cells * sizeof(u32) - sizeof(u64));
> >>> + buf += cells * sizeof(u32) - sizeof(u64);
> >>
> >> Is this trying to clear the 'top' cells and shuffle the pointer to point 
> >> at the
> >> 'bottom' 2? I'm pretty sure this isn't endian safe.
> >>
> >> Do we really expect a system to have #address-cells > 2?
> > 
> > I don't know, but just for safety.
> 
> Okay, so this is aiming to be a cover-all-cases library function.
> 
> 
> >>> + val64 = cpu_to_fdt64(val64);
> >>> + memcpy(buf, &val64, sizeof(val64));
> >>> + }
> >>> +}
> >>> +
> >>> +static int fdt_setprop_range(void *fdt, int nodeoffset, const char *name,
> >>> + unsigned long addr, unsigned long size)
> >>
> >> (the device-tree spec describes a 'ranges' property, which had me 
> >> confused. This
> >> is encoding a prop-encoded-array)
> > 
> > Should we rename it to, say, fdt_setprop_reg()?
> 
> Sure, but I'd really like this code to come from libfdt. I'm hoping for some
> temporary workaround, lets see what the DT folk say.

OK, I will follow Rob's suggestion.

> >>> + if (!buf)
> >>> + return -ENOMEM;
> >>> +
> >>> + fill_property(prop, addr, __dt_root_addr_cells);
> >>> + prop += __dt_root_addr_cells * sizeof(u32);
> >>> +
> >>> + fill_property(prop, size, __dt_root_size_cells);
> >>> +
> >>> + result = fdt_setprop(fdt, nodeoffset, name, buf, buf_size);
> >>> +
> >>> + vfree(buf);
> >>> +
> >>> + return result;
> >>> +}
> >>
> >> Doesn't this stuff belong in libfdt? I guess there is no 'add array 
> >> element' api
> >> because this the first time we've wanted to create a node with more than
> >> key=fixed-size-value.
> >>
> >> I don't think this belongs in arch C code. Do we have a plan for getting 
> >> libfdt
> >> to support encoding prop-arrays? Can we put it somewhere anyone else 
> >> duplicating
> >> this will find it, until we can (re)move it?
> > 
> > I will temporarily move all fdt-related stuff to a separate file, but
> > 
> >> I have no idea how that happens... it looks like the devicetree list is the
> >> place to ask.
> > 
> > should we always sync with the original dtc/libfdt repository?
> 
> I thought so, libfdt is one of those external libraries that the kernel
> consumes, like acpica. For acpica at least the rule is changes go upstream, 
> then
> get sync'd back.

Same above.

> >>>  static int setup_dtb(struct kimage *image,
> >>>   unsigned long initrd_load_addr, unsigned long initrd_len,
> >>>   char *cmdline, unsigned long cmdline_len,
> >>> @@ -88,10 +165,26 @@ static int setup_dtb(struct kimage *image,
> >>>   int range_len;
> >>>   int ret;
> >>>  
> >>> + /* check ranges against root's #address-cells and #size-cells */
> >>> + if (image->type == KEXEC_TYPE_CRASH &&
> >>> + (!cells_size_fitted(image->arch.elf_load_addr,
> >>> + image->arch.elf_headers_sz) ||
> >>> +  !cells_size_fitted(crashk_res.start,
> >>> + crashk_res.end - crashk_res.start + 1))) {
> >>> + pr_err("Crash memory region doesn't fit into DT's root cell 
> >>> sizes.\n");
> >>> + ret = -EINVAL;
> >>> + goto out_err;
> >>> + }
> >>
> >> To check I've understood this properly: This can happen if the firmware 
> >> provided
> >> a DTB with 32bit address/size cells, but at least some of the memory 
> >> requires 64
> >> bit address/size cells. This could only happen on a UEFI system where the
> >> firmware-DTB doesn't describe memory. ACPI-only systems would have the 
> >> EFIstub DT.
> > 
> > Probably, yes. I assumed the case where #address-cells and #size-cells
> > were just missing in fdt.
> 
> Ah, that's another one. I just wanted to check we could boot on a system where
> this can happen.
> 
> 
> >>>   /* duplicate dt blob */
> >>>   buf_size = fdt_totalsize(initial_boot_params);
> >>>   range_len = (__dt_root_addr_cells + __dt

Re: [PATCH v9 06/11] arm64: kexec_file: allow for loading Image-format kernel

2018-05-21 Thread AKASHI Takahiro
James,

I haven't commented on this email.

On Tue, May 15, 2018 at 06:14:37PM +0100, James Morse wrote:
> Hi Akashi,
> 
> On 15/05/18 06:13, AKASHI Takahiro wrote:
> > On Fri, May 11, 2018 at 06:07:06PM +0100, James Morse wrote:
> >> On 07/05/18 08:21, AKASHI Takahiro wrote:
> >>> On Tue, May 01, 2018 at 06:46:11PM +0100, James Morse wrote:
>  On 25/04/18 07:26, AKASHI Takahiro wrote:
> > This patch provides kexec_file_ops for "Image"-format kernel. In this
> > implementation, a binary is always loaded with a fixed offset identified
> > in text_offset field of its header.
> >>
> > diff --git a/arch/arm64/include/asm/kexec.h 
> > b/arch/arm64/include/asm/kexec.h
> > index e4de1223715f..3cba4161818a 100644
> > --- a/arch/arm64/include/asm/kexec.h
> > +++ b/arch/arm64/include/asm/kexec.h
> 
>  Could we check branch_code is non-zero, and text-offset points within 
>  image-size?
> >>>
> >>> We could do it, but I don't think this check is very useful.
> >>>
> 
>  We could check that this platform supports the page-size/endian config 
>  that this
>  Image was built with... We get a message from the EFI stub if the 
>  page-size
>  can't be supported, it would be nice to do the same here (as we can).
> >>>
> >>> There is no restriction on page-size or endianness for kexec.
> >>
> >> No, but it won't boot if the hardware doesn't support it. The kernel will 
> >> spin
> >> at a magic address that is, difficult, to debug without JTAG. The bug 
> >> report
> >> will be "it didn't boot".
> > 
> > OK.
> > Added sanity checks for cpu features, endianness as well as page size.
> > 
> >>
> >>> What will be the purpose of this check?
> >>
> >> These values are in the header so that the bootloader can check them, then 
> >> print
> >> a meaningful error. Here, kexec_file_load() is playing the part of the 
> >> bootloader.
> 
> >> I'm assuming kexec_file_load() can only be used to kexec linux... unlike 
> >> regular
> >> kexec. Is this where I'm going wrong?
> 
> Trying to work this out for myself: we can't support any UEFI application as 
> we
> can't give it the boot-services environment, so I'm pretty sure
> kexec_file_load() must be linux-specific.
> 
> Can we state somewhere that we only expect arm64 linux to be booted with
> kexec_file_load()? Its not clear from the kconfig text, which refers to kexec,
> which explicitly states it can boot other OS. But for kexec_file_load() we're
> following the kernel's booting.txt.

While I don't know anything about requirements in booting other OS's nor
if we can boot them even with kexec, I agree that kexec_file_load is a more
limited form of booting mechanism. I will add some statement in Kconfig.

> > diff --git a/arch/arm64/kernel/kexec_image.c 
> > b/arch/arm64/kernel/kexec_image.c
> > new file mode 100644
> > index ..4dd524ad6611
> > --- /dev/null
> > +++ b/arch/arm64/kernel/kexec_image.c
> > @@ -0,0 +1,79 @@
> 
> > +static void *image_load(struct kimage *image,
> > +   char *kernel, unsigned long kernel_len,
> > +   char *initrd, unsigned long initrd_len,
> > +   char *cmdline, unsigned long 
> > cmdline_len)
> > +{
> > +   struct kexec_buf kbuf;
> > +   struct arm64_image_header *h = (struct arm64_image_header 
> > *)kernel;
> > +   unsigned long text_offset;
> > +   int ret;
> > +
> > +   /* Load the kernel */
> > +   kbuf.image = image;
> > +   kbuf.buf_min = 0;
> > +   kbuf.buf_max = ULONG_MAX;
> > +   kbuf.top_down = false;
> > +
> > +   kbuf.buffer = kernel;
> > +   kbuf.bufsz = kernel_len;
> > +   kbuf.memsz = le64_to_cpu(h->image_size);
> > +   text_offset = le64_to_cpu(h->text_offset);
> > +   kbuf.buf_align = SZ_2M;
> 
> > +   /* Adjust kernel segment with TEXT_OFFSET */
> > +   kbuf.memsz += text_offset;
> > +
> > +   ret = kexec_add_buffer(&kbuf);
> > +   if (ret)
> > +   goto out;
> > +
> > +   image->arch.kern_segment = image->nr_segments - 1;
> 
>  You only seem to use kern_segment here, and in load_other_segments() 
>  called
>  below. Could it not be a local variable passed in? Instead of 
>  arch-specific data
>  we keep forever?
> >>>
> >>> No, kern_segment is also used in load_other_segments() in 
> >>> machine_kexec_file.c.
> >>> To optimize memory hole allocation logic in locate_mem_hole_callback(),
> >>> we need to know the exact range of kernel image (start and end).
> >>
> >> That's the second user. My badly-made point is one calls the other, but 
> >> passes
> >> the data via some until-kexec lifetime struct. (its not important, just an
> >> indicator this worked differently in the past and hasn't been cleaned up).
> >>