Re: [PATCH v9 1/2] Documentation: kdump: remind user of nr_cpus

2016-08-23 Thread Baoquan He
On 08/22/16 at 09:14am, "Zhou, Wenjian/周文剑" wrote:
> On 08/19/2016 11:57 PM, Jonathan Corbet wrote:
> >On Fri, 19 Aug 2016 08:33:21 +0800
> >"Zhou, Wenjian/周文剑"  wrote:
> >
> >>I was also confused by maxcpus and nr_cpus before writing this patch.
> >>I think it is a good choice to describe it in kernel-parameters.txt.
> >>
> >>Then I think only two things need to be done.
> >>One is to move the above description to maxcpus= in kernel-parameters.txt.
> >>The other is to replace maxcpus with maxcpus/nr_cpus in kdump.txt.
> >>
> >>What do you think?
> >
> >That is not quite what I had in mind, sorry.  What I would really like to
> >see in kernel-parameters.txt is an explanation of how those two parameters
> >differ - what do they do differently and how should a user choose one over
> >the other?  What we have now offers no guidance in that matter.
> >
> 
> I thought about it. I think users may not need this.
> What users really want to know is how to choose.
> And that is not hard: if nr_cpus is not supported by the ARCH, use
> maxcpus; otherwise, use nr_cpus. The reason maxcpus still exists is that
> nr_cpus can't be supported by some ARCHes.

I think Jon is suggesting that a note be added to kernel-parameters.txt
explaining the difference between nr_cpus and maxcpus. I checked the code
and discussed it within our kdump team: maxcpus limits how many 'present'
cpus are allowed to be brought up during system bootup, while nr_cpus
sets the upper limit of 'possible' cpus. E.g. on my laptop there are 4
present cpus plus 4 hotplug cpus, altogether 8 possible cpus. The possible
cpu slots exist for cpu hotplug: during bootup you bring up the 4 present
cpus, but later you could physically hot plug 4 others. Because of the
nature of some static percpu variables, we need to pre-allocate memory
for all possible cpus, even though some of them may never be used if no
extra cpu is physically hot plugged after system bootup.

In a kdump kernel, people never want to do cpu hotplug. That's why we
want to use nr_cpus to limit the number of possible cpus and save memory.
E.g. still on my laptop, if I want to do a kdump, the number of possible
cpus is still 8, but I may want to use only 1 cpu to dump, maybe 2 or 3
for parallel dumping. You absolutely don't want to set nr_cpus=8 in your
kdump kernel cmdline: it doesn't cause a failure, but memory is wasted
because of percpu pre-allocation. So specifying nr_cpus=1 is much better,
whereas with maxcpus=1 the number of possible cpus is still 8. That's the
reason. On x86_64 and s390 there's another kernel parameter,
"possible_cpus=xx", which can be used to set the possible cpus for cpu
hotplug. Only when "possible_cpus=0" is specified is smp disabled.
I am not very sure why this was introduced; the number of possible cpus
is decided by the min value of nr_cpus= and possible_cpus=.
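
To make the rule above concrete, here is a minimal sketch in plain C. It
is only a model of the description in this mail, not kernel code: the
function names and the per-cpu size are made up for illustration.

```c
#include <stddef.h>

/* Smaller of two unsigned values. */
static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Model of the rule described above: the possible-cpu count is bounded
 * by both nr_cpus= and possible_cpus=. maxcpus= does not appear here
 * because, per this mail, it only limits how many cpus are brought up
 * at boot and leaves the possible count unchanged.
 */
static unsigned int possible_cpus_count(unsigned int nr_cpus,
					unsigned int possible_cpus)
{
	return min_uint(nr_cpus, possible_cpus);
}

/*
 * Hypothetical estimate: static percpu data is replicated once per
 * possible cpu, so the cost scales with the possible count.
 */
static size_t percpu_bytes(unsigned int possible, size_t bytes_per_cpu)
{
	return (size_t)possible * bytes_per_cpu;
}
```

With 8 possible cpus on the machine, nr_cpus=1 brings the model's
possible count down to 1, so the percpu estimate shrinks by a factor of 8.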

The descriptions of nr_cpus and maxcpus in
Documentation/kernel-parameters.txt might not be very clear to people.

Hi Jon, do you think the change below is OK?


From 8b940193a29acf0857d4975d77f4b9f48e2d6cb8 Mon Sep 17 00:00:00 2001
From: Baoquan He 
Date: Wed, 24 Aug 2016 11:14:34 +0800
Subject: [PATCH] docs: kernel-parameter : Improve the description of nr_cpus
 and maxcpus

From the old description people still can't tell what the exact
difference between nr_cpus and maxcpus is. In a kdump kernel in
particular, nr_cpus is always suggested if it's implemented in the
ARCH. The reason is that nr_cpus limits the max number of possible
cpus in the system: the sum of already plugged cpus and hot plug cpus
can't exceed its value. maxcpus, however, limits how many cpus are
allowed to be brought up during bootup.

Signed-off-by: Baoquan He 
---
 Documentation/kernel-parameters.txt | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 46c030a..25d3b36 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2161,10 +2161,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
than or equal to this physical address is ignored.
 
maxcpus=[SMP] Maximum number of processors that an SMP kernel
-   should make use of.  maxcpus=n : n >= 0 limits the
-   kernel to using 'n' processors.  n=0 is a special case,
-   it is equivalent to "nosmp", which also disables
-   the IO APIC.
+   will bring up during bootup.  maxcpus=n : n >= 0 limits
+   the kernel to bring up 'n' processors. Surely after
+   bootup you can bring up the other plugged cpu by executing
+   "echo 1 > /sys/devices/system/cpu/cpuX/online". So maxcpus
+   only 

Re: [PATCH v2 1/2] kexec: Introduce "/sys/kernel/kexec_crash_low_size"

2016-08-23 Thread Yinghai Lu
On Wed, Aug 17, 2016 at 1:20 AM, Dave Young  wrote:
> On 08/17/16 at 09:50am, Xunlei Pang wrote:
>> "/sys/kernel/kexec_crash_size" only handles crashk_res, it
>> is fine in most cases, but sometimes we have crashk_low_res.
>> For example, when "crashkernel=size[KMG],high" combined with
>> "crashkernel=size[KMG],low" is used for 64-bit x86.
>>
>> Like crashk_res, we introduce the corresponding sysfs file
>> "/sys/kernel/kexec_crash_low_size" for crashk_low_res.
>>
>> So, the exact total reserved memory is the sum of the two.
>>
>> crashk_low_res can also be shrunk via this new interface,
>> and users should be aware of what they are doing.
...
>> @@ -218,6 +238,7 @@ static struct attribute * kernel_attrs[] = {
>>  #ifdef CONFIG_KEXEC_CORE
>>   &kexec_loaded_attr.attr,
>>   &kexec_crash_loaded_attr.attr,
>> + &kexec_crash_low_size_attr.attr,
>>   &kexec_crash_size_attr.attr,
>>   &vmcoreinfo_attr.attr,
>>  #endif

It would be better to use the attribute_group .is_visible callback so that
crash_low_size is shown only when crash_base is above 4G.
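
As a rough userspace model of that suggestion (not the kernel API itself:
the struct, the global and the callback signature below are simplified
stand-ins invented for illustration), the visibility decision could look
like this:

```c
#include <stdint.h>
#include <string.h>

/* Userspace stand-in; the kernel's real type lives in <linux/sysfs.h>. */
struct attribute {
	const char *name;
	unsigned short mode;
};

#define FOUR_GIB (1ULL << 32)

/* Hypothetical: start address of the main crash kernel reservation. */
static uint64_t crash_base;

/*
 * Modelled on the is_visible idea: return 0 to hide an attribute, or
 * its mode to expose it. Only the low-size file depends on crash_base.
 */
static unsigned short kexec_attr_is_visible(const struct attribute *attr)
{
	if (strcmp(attr->name, "kexec_crash_low_size") == 0 &&
	    crash_base < FOUR_GIB)
		return 0;
	return attr->mode;
}
```

The point of the design is that the file simply does not appear on
systems without a separate low reservation, instead of appearing and
reporting zero.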

Thanks

Yinghai

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v5 04/13] powerpc: Factor out relocation code from module_64.c to elf_util_64.c.

2016-08-23 Thread Thiago Jung Bauermann
On Wednesday, 24 August 2016, 10:50:26, Oliver O'Halloran wrote:
> On Tue, Aug 23, 2016 at 1:21 PM, Balbir Singh wrote:
> >> zImage on ppc64 BE is an ELF32 file. This patch set only supports
> >> loading
> >> ELF files of the same class as the kernel, so a 64 bit kernel can't
> >> load an ELF32 file. It would be possible to add such support, but it
> >> would be a new feature.
> >> 
> >> The distros I was able to check on ppc64 LE and BE all use vmlinux.
> >> kexec-tools with kexec_load also doesn't support zImage. Do you think
> >> it is important to support zImage?
> > 
> > Well, if it didn't work already, I think it's low priority. Michael should
> > be able to confirm this. Oliver's been trying to clean up the zImage to
> > get rid of the old zImage limitation, cc'ing him
> 
> I don't think it's ever worked so I wouldn't worry too much about
> supporting it. Fixing kexec-into-zImage and fixing the 32bit wrapper
> on 64bit BE kernel problem has been on my TODO list for a while, but
> it's not a priority.

Ok, thanks for your input.

-- 
[]'s
Thiago Jung Bauermann
IBM Linux Technology Center




Re: [PATCH v5 04/13] powerpc: Factor out relocation code from module_64.c to elf_util_64.c.

2016-08-23 Thread Oliver O'Halloran
On Tue, Aug 23, 2016 at 1:21 PM, Balbir Singh  wrote:
>
>> zImage on ppc64 BE is an ELF32 file. This patch set only supports loading
>> ELF files of the same class as the kernel, so a 64 bit kernel can't load an
>> ELF32 file. It would be possible to add such support, but it would be a new
>> feature.
>>
>> The distros I was able to check on ppc64 LE and BE all use vmlinux.
>> kexec-tools with kexec_load also doesn't support zImage. Do you think it is
>> important to support zImage?
>
> Well, if it didn't work already, I think it's low priority. Michael should be
> able to confirm this. Oliver's been trying to clean up the zImage to get rid
> of the old zImage limitation, cc'ing him

I don't think it's ever worked so I wouldn't worry too much about
supporting it. Fixing kexec-into-zImage and fixing the 32bit wrapper
on 64bit BE kernel problem has been on my TODO list for a while, but
it's not a priority.

oliver



Re: [PATCH v1] kexec/arch/i386: Add support for KASLR memory randomization

2016-08-23 Thread Thomas Garnier
On Wed, Aug 17, 2016 at 9:59 PM, Baoquan He  wrote:
> On 08/17/16 at 09:47am, Thomas Garnier wrote:
>> Multiple changes were made on KASLR (right now in linux-next). One of
>> them is randomizing the virtual address of the physical mapping, vmalloc
>> and vmemmap memory sections. It breaks kdump's ability to read physical
>> memory.
>>
>> This change identifies if KASLR memory randomization is used by
>> checking if the page_offset_base variable exists. It searches for the
>> correct PAGE_OFFSET value by looking at the loaded memory sections and
>> finding the lowest one aligned on PUD (the randomization level).
>>
>> Related commits on linux-next:
>>  - 0483e1fa6e09d4948272680f691dccb1edb9677f: Base for randomization
>>  - 021182e52fe01c1f7b126f97fd6ba048dc4234fd: Enable for PAGE_OFFSET
>
> It seems the above two commits are already in Linus's tree, while the
> vmemmap one is not yet.
>
>>
>> Signed-off-by: Thomas Garnier 
>> ---
>>  kexec/arch/i386/crashdump-x86.c | 29 ++---
>>  1 file changed, 22 insertions(+), 7 deletions(-)
>>
>> diff --git a/kexec/arch/i386/crashdump-x86.c b/kexec/arch/i386/crashdump-x86.c
>> index bbc0f35..ab833d4 100644
>> --- a/kexec/arch/i386/crashdump-x86.c
>> +++ b/kexec/arch/i386/crashdump-x86.c
>> @@ -102,11 +102,10 @@ static int get_kernel_paddr(struct kexec_info *UNUSED(info),
>>   return -1;
>>  }
>>
>> -/* Retrieve kernel _stext symbol virtual address from /proc/kallsyms */
>> -static unsigned long long get_kernel_stext_sym(void)
>> +/* Retrieve kernel symbol virtual address from /proc/kallsyms */
>> +static unsigned long long get_kernel_sym(const char *symbol)
>>  {
>>   const char *kallsyms = "/proc/kallsyms";
>> - const char *stext = "_stext";
>>   char sym[128];
>>   char line[128];
>>   FILE *fp;
>> @@ -122,13 +121,13 @@ static unsigned long long get_kernel_stext_sym(void)
>>   while(fgets(line, sizeof(line), fp) != NULL) {
>>   if (sscanf(line, "%Lx %c %s", &vaddr, &type, sym) != 3)
>>   continue;
>> - if (strcmp(sym, stext) == 0) {
>> - dbgprintf("kernel symbol %s vaddr = %16llx\n", stext, vaddr);
>> + if (strcmp(sym, symbol) == 0) {
>> + dbgprintf("kernel symbol %s vaddr = %16llx\n", symbol, vaddr);
>>   return vaddr;
>>   }
>>   }
>>
>> - fprintf(stderr, "Cannot get kernel %s symbol address\n", stext);
>> + fprintf(stderr, "Cannot get kernel %s symbol address\n", symbol);
>>   return 0;
>>  }
>>
>> @@ -151,6 +150,8 @@ static int get_kernel_vaddr_and_size(struct kexec_info *UNUSED(info),
>>   off_t size;
>>   uint32_t elf_flags = 0;
>>   uint64_t stext_sym;
>> + const unsigned long long pud_mask = ~((1 << 30) - 1);
>> + unsigned long long vaddr, lowest_vaddr = 0;
>>
>>   if (elf_info->machine != EM_X86_64)
>>   return 0;
>> @@ -180,9 +181,23 @@ static int get_kernel_vaddr_and_size(struct kexec_info *UNUSED(info),
>>
>>   end_phdr = &ehdr.e_phdr[ehdr.e_phnum];
>>
>> + /* Search for the real PAGE_OFFSET when KASLR memory randomization
>> +  * is enabled */
>
> Yeah, this is necessary. It would be great if it could be put into
> get_kernel_page_offset. But then it would need to parse the kcore ELF
> file again; there seems to be no better way.
>

I agree.

Simon: Do you have any comments?

>> + if (get_kernel_sym("page_offset_base") != 0) {
>> + for(phdr = ehdr.e_phdr; phdr != end_phdr; phdr++) {
>> + if (phdr->p_type == PT_LOAD) {
>> + vaddr = phdr->p_vaddr & pud_mask;
>> + if (lowest_vaddr == 0 || lowest_vaddr > vaddr)
>> + lowest_vaddr = vaddr;
>> + }
>> + }
>> + if (lowest_vaddr != 0)
>> + elf_info->page_offset = lowest_vaddr;
>> + }
>> +
>>   /* Traverse through the Elf headers and find the region where
>>* _stext symbol is located in. That's where kernel is mapped */
>> - stext_sym = get_kernel_stext_sym();
>> + stext_sym = get_kernel_sym("_stext");
>>   for(phdr = ehdr.e_phdr; stext_sym && phdr != end_phdr; phdr++) {
>>   if (phdr->p_type == PT_LOAD) {
>>   unsigned long long saddr = phdr->p_vaddr;
>> --
>> 2.8.0.rc3.226.g39d4020
>>
>>
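
The PAGE_OFFSET search in the quoted patch can be sketched on its own as
follows. This is a standalone model, not the kexec-tools code: it takes a
plain array of PT_LOAD virtual addresses instead of walking ELF program
headers, and the function name is invented for the example. The 1 GiB PUD
size follows the patch's (1 << 30).

```c
#include <stddef.h>
#include <stdint.h>

#define PUD_SHIFT 30 /* assumption: 1 GiB PUD, as in the patch */
#define PUD_MASK  (~((1ULL << PUD_SHIFT) - 1))

/*
 * Return the lowest PUD-aligned base among the given PT_LOAD virtual
 * addresses, or 0 if the list is empty. As in the patch, 0 doubles as
 * the "nothing found yet" marker.
 */
static uint64_t lowest_pud_aligned_vaddr(const uint64_t *load_vaddrs,
					 size_t n)
{
	uint64_t lowest = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t vaddr = load_vaddrs[i] & PUD_MASK;

		if (lowest == 0 || vaddr < lowest)
			lowest = vaddr;
	}
	return lowest;
}
```

Building the mask from 1ULL keeps the arithmetic in 64 bits throughout,
rather than relying on sign extension of an int expression.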


Re: [integrity:next-restore-kexec 18/31] include/linux/kexec.h:400:52: warning: 'struct kexec_buf' declared inside parameter list will not be visible outside of this definition or declaration

2016-08-23 Thread Thiago Jung Bauermann
On Tuesday, 23 August 2016, 22:17:59, kbuild test robot wrote:
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity.git next-restore-kexec
> head:   62bc4b565254de4796a0835f6f67569eb4835f9f
> commit: f9f57350e53441210120931fc4e0163cf833e648 [18/31] kexec_file: Add buffer hand-over support for the next kernel
> config: i386-defconfig (attached as .config)
> compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
> reproduce:
> git checkout f9f57350e53441210120931fc4e0163cf833e648
> # save the attached .config to linux build tree
> make ARCH=i386
> 
> All warnings (new ones prefixed by >>):
> 
>In file included from drivers/pci/pci-driver.c:22:0:
> >> include/linux/kexec.h:400:52: warning: 'struct kexec_buf' declared
> >> inside parameter list will not be visible outside of this definition
> >> or declaration
> static inline int kexec_add_handover_buffer(struct kexec_buf *kbuf)

This happens when CONFIG_KEXEC=y but CONFIG_KEXEC_FILE=n, and is fixed by 
the following change, which will be in my next revision of the kexec
buffer hand-over series:

Fix for "kexec_file: Add buffer hand-over support for the next kernel"

Declare stub struct kexec_buf when CONFIG_KEXEC_FILE=n.

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 6ec09e85efd9..29f98f816e92 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -404,6 +404,8 @@ int kexec_add_handover_buffer(struct kexec_buf *kbuf);
 int __weak kexec_get_handover_buffer(void **addr, unsigned long *size);
 int __weak kexec_free_handover_buffer(void);
 #else
+struct kexec_buf;
+
 static inline bool kexec_can_hand_over_buffer(void)
 {
return false;
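
For readers unfamiliar with this warning, here is a minimal standalone
illustration of why the forward declaration helps. It is a sketch, not
the kernel header: the stub body and its return value are placeholders.

```c
#include <stddef.h>

/*
 * Forward declaration: gives 'struct kexec_buf' file scope, so the
 * prototype below refers to the same type as any later use. Without
 * this line the struct would be declared inside the parameter list
 * only, which is what GCC warns about.
 */
struct kexec_buf;

/* Stub in the spirit of the CONFIG_KEXEC_FILE=n branch (illustrative). */
static inline int kexec_add_handover_buffer(struct kexec_buf *kbuf)
{
	(void)kbuf;
	return -1; /* placeholder value; the real stub is not shown here */
}
```

An incomplete type is enough here because the stub only passes the
pointer around and never dereferences it.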


-- 
[]'s
Thiago Jung Bauermann
IBM Linux Technology Center




Re: [PATCH v24 5/9] arm64: kdump: add kdump support

2016-08-23 Thread Pratyush Anand
On 23/08/2016:09:38:16 AM, AKASHI Takahiro wrote:
> On Mon, Aug 22, 2016 at 02:47:30PM +0100, James Morse wrote:
> > On 22/08/16 02:29, AKASHI Takahiro wrote:
> > > On Fri, Aug 19, 2016 at 04:52:17PM +0530, Pratyush Anand wrote:
> > >> It will help kexec-tools to prevent copying of any unnecessary data. I
> > >> think you then also need to change the phys_offset calculation in
> > >> kexec-tools. That should be the start of either the first "reserved"
> > >> or the first "System RAM" block.
> > > 
> > > Good point, but I'm not sure this is always true.
> > 
> > > Is there any system whose ACPI memory is *not* part of DRAM
> > 
> > From the spec, it looks like this is allowed.
> > 
> > What do you mean by 'DRAM'? Any ACPI region will be in the UEFI memory
> > map, so the question is what is its type and memory attributes?
> 
> Yes.
> 
> > The UEFI spec[0] says ACPI regions can have a type of
> > EfiACPIReclaimMemory or EfiACPIMemoryNVS, the memory attributes aren't
> > specified, so are chosen by the firmware.
> > 
> > It is possible these regions have to be mapped non-cacheable, page 40
> > has a couple of:
> > > If no information about the table location exists in the UEFI memory
> > > map or ACPI memory descriptors, the table is assumed to be non-cached.
> > 
> > reserve_regions() in drivers/firmware/efi/arm-init.c will add any entry
> > in the memory map that has a 'WB' attribute to the memblock.memory list
> > (via early_init_dt_add_memory_arch()), it will also mark as no-map
> > regions that have this attribute and aren't in the is_reserve_region()
> > list.
> > 
> > If these ACPI regions have the 'WB' attribute, we add them as memory
> > and mark them nomap. These show up as either a hole, or 'reserved' in
> > /proc/iomem.
> > If they don't have the 'WB' attribute, then they are left out of
> > memblock and aren't part of DRAM, I don't think these will show up in
> > /proc/iomem at all.
> 
> Let's say,
> 0x1000-0x1fff: reserved (SRAM for UEFI, WB)
> 0x8000-0x: System RAM (DRAM)

Maybe slightly more complicated:
0x8000-0x80001fff: System RAM (DRAM) for UEFI, WB
0x80002000-0x: System RAM (DRAM)

Kernel will have phys_offset 0x8000, however kexec-tools will calculate it
as 0x80002000.

> 
> If, as Pratyush suggested, "reserved" resources are added to phys_offset
> calculation, the kernel linear mapping area starts at PAGE_OFFSET, but
> there is no actual mapping around PAGE_OFFSET.
> It won't hurt anything, but looks funny.
> So we'd better not include "reserved" in phys_offset calculation anyway.
> -> Pratyush

My only concern is that we will then have different values of phys_offset in
the kernel and kexec-tools, which might lead to further confusion.
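
The disagreement can be modelled with a small sketch. This is not the
kexec-tools implementation: the region struct, the function name and the
addresses in the usage note are invented for illustration, and it assumes
the firmware-owned block is listed as "reserved" in /proc/iomem.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One /proc/iomem style entry (hypothetical layout). */
struct iomem_region {
	uint64_t start;
	uint64_t end;
	const char *name;
};

/*
 * Lowest start address among "System RAM" entries, optionally also
 * considering "reserved" entries. Returns UINT64_MAX if none match.
 */
static uint64_t lowest_region_start(const struct iomem_region *r,
				    size_t n, bool include_reserved)
{
	uint64_t lowest = UINT64_MAX;

	for (size_t i = 0; i < n; i++) {
		bool is_ram = strcmp(r[i].name, "System RAM") == 0;
		bool is_reserved = strcmp(r[i].name, "reserved") == 0;

		if (!is_ram && !(include_reserved && is_reserved))
			continue;
		if (r[i].start < lowest)
			lowest = r[i].start;
	}
	return lowest;
}
```

For a made-up map with "reserved" at 0x80000000-0x80001fff followed by
"System RAM" from 0x80002000, the two modes give different answers, which
is exactly the mismatch between kernel and kexec-tools described above.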

~Pratyush



Re: [PATCH v2 0/7] (kexec-tools) arm64: add kdump support

2016-08-23 Thread AKASHI Takahiro
On Tue, Aug 23, 2016 at 01:34:21PM +0530, Pratyush Anand wrote:
> On 23/08/2016:02:29:03 PM, AKASHI Takahiro wrote:
> > Pratyush,
> > 
> > On Wed, Aug 10, 2016 at 11:26:48PM +0530, Pratyush Anand wrote:
> > > Hi Geoff and Takahiro,
> > > 
> > > I am having some issues with kexec+kdump while working with the Seattle
> > > platform. At the top level, the kernel crashes in copy_oldmem_page(),
> > > because it gets a wrong offset for log_buf during vmcore-dmesg save.
> > > 
> > > Here is the detail:
> > > 
> > > (1) From /proc/iomem, these are the "System RAM" Components:
> > > 
> > > 80-8001e7 : System RAM
> > > 8001e8-83ff17 : System RAM
> > > 800208-8002b3 : Kernel code
> > > 8002c4-800348 : Kernel data
> > > 807fe0-80ffdf : Crash kernel
> > > 83ff18-83ff1c : System RAM
> > > 83ff1d-83ff21 : System RAM
> > > 83ff22-83ffe4 : System RAM
> > > 83ffe5-83 : System RAM
> > > 
> > > (2) From kexec-tools debug print I see following:
> > > elf_arm64_load: e_entry:   fc000808 -> 0088
> > > elf_arm64_load: p_vaddr:   fc000808 -> 0088
> > > elf_arm64_load: header_offset: 
> > > elf_arm64_load: text_offset:   0008
> > > elf_arm64_load: image_size:0141
> > > elf_arm64_load: phys_offset:   0080
> > > elf_arm64_load: page_offset:   fc000800
> > >
> > > I understand that "Kernel Code start physical address" 0x800208
> > > should map to e_entry vaddr which is 0xfc000808. However, the
> > > kexec-tools debug print shows that e_entry vaddr maps to PA 88, which
> > > seems wrong.
> > 
> > Who specifies the kernel load address, 0x800208 and why?
> 
> Maybe I did not get the question. This load address is coming from the 1st
> kernel.

My question is why we need to use this value, 0x800208,
as the kernel load address.
I guess that, on Seattle platform, 0x80-0x8001e8 is
used for a specific purpose and the kernel must be loaded above
0x8001e8.
Since PHYS_OFFSET must be 2MB aligned, the lowest kernel load address
should be:
0x800200 + 0x8(default TEXT_OFFSET).
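
The alignment arithmetic described here can be sketched as below. It is
an illustration only: the helper names are invented, and the values in
the usage note are made up rather than taken from the Seattle platform.

```c
#include <stdint.h>

#define SZ_2M 0x200000ULL

/* Round addr up to the next multiple of align (a power of two). */
static uint64_t round_up_pow2(uint64_t addr, uint64_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Lowest load address at or above mem_min, assuming PHYS_OFFSET must
 * be 2MB aligned and the image sits text_offset bytes above it.
 */
static uint64_t lowest_load_addr(uint64_t mem_min, uint64_t text_offset)
{
	return round_up_pow2(mem_min, SZ_2M) + text_offset;
}
```

For example, with a hypothetical mem_min of 0x40180000 and a text offset
of 0x80000, the 2MB boundary is 0x40200000 and the load address is
0x40280000.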

> > 
> > Since image_arm64_load() also use
> > get_phys_offset() + arm64_mem.text_offset (== 0x88)
> > as the load address unconditionally, doesn't kexec fail on Seattle?
> 
> Yes, I hadn't tried kexec with a binary image. It fails.
> I think image_base should be virt_to_phys(get_kernel_sym("_text")) for
> !KEXEC_ON_CRASH.

So, since all we expect from kexec-tools is to act as a sort of boot loader,
we should be able to specify a command line like:
$ kexec --load vmlinux (or Image) --mem-min=0x8001e8 ...

(We could use get_kernel_sym("_text"), but don't have to.)

I'm going to modify our arm64 port in this way.

Thanks,
-Takahiro AKASHI

> ~Pratyush
> 
> > 
> > -Takahiro AKASHI
> > 
> > > (3) further page_offset (or vp_offset in your new code) is calculated
> > > as:arm64_mem.page_offset = ehdr.e_entry - arm64_mem.text_offset;
> > > 
> > > The current calculation of page_offset leads to a wrong configuration of
> > > the VA of all PT_LOAD segments (see below). Ultimately, this also leads
> > > to a kernel crash during vmcore-dmesg and vmcore save operations, because
> > > we pass an offset to the pread() system call which maps to the wrong
> > > physical address.
> > > 
> > > Elf header: p_type = 1, p_offset = 0x80 p_paddr = 0x80
> > > p_vaddr = 0xfc000800 p_filesz = 0x1e8 p_memsz = 0x1e8
> > > [0xfc000800 should be mapping to 0x800200 and not 0x80]
> > > Elf header: p_type = 1, p_offset = 0x8001e8 p_paddr = 0x8001e8
> > > p_vaddr = 0xfc0009e8 p_filesz = 0x7df8 p_memsz = 0x7df8
> > > Elf header: p_type = 1, p_offset = 0x80ffe0 p_paddr = 0x80ffe0
> > > p_vaddr = 0xfc0107e0 p_filesz = 0x2ff38 p_memsz = 0x2ff38
> > > Elf header: p_type = 1, p_offset = 0x83ff18 p_paddr = 0x83ff18
> > > p_vaddr = 0xfc040718 p_filesz = 0x5 p_memsz = 0x5
> > > Elf header: p_type = 1, p_offset = 0x83ff1d p_paddr = 0x83ff1d
> > > p_vaddr = 0xfc04071d p_filesz = 0x5 p_memsz = 0x5
> > > Elf header: p_type = 1, p_offset = 0x83ff22 p_paddr = 0x83ff22
> > > p_vaddr = 0xfc040722 p_filesz = 0xc3 p_memsz = 0xc3
> > > Elf header: p_type = 1, p_offset = 0x83ffe5 p_paddr = 0x83ffe5
> > > p_vaddr = 0xfc0407e5 p_filesz = 0x1b p_memsz = 0x1b
> > > 
> > > Maybe the following would be better:
> > > arm64_mem.page_offset = ehdr.e_entry - "kernel Code Start PA" + phys_offset.
> > > 
> > > (4) Furthermore, vmcore must have the first PT_LOAD segment as the kernel
> > > text area. On this platform the first "System RAM" area is 80-8001e7,
> > > which does not match the "Kernel code" area. Therefore, we should provide
> > 

Re: [PATCH v2 0/7] (kexec-tools) arm64: add kdump support

2016-08-23 Thread Pratyush Anand
On 22/08/2016:05:10:17 PM, AKASHI Takahiro wrote:
> On Fri, Aug 19, 2016 at 03:44:23PM +0530, Pratyush Anand wrote:
> > page_offset is the virtual address of phys_offset, right?
> > 
> > ehdr.e_entry is the virtual address of "kernel Code Start PA", right?
> > 
> > If yes, then why shouldn't the above be correct for all linear regions?
> 
> What I was thinking of is that we'd better fake arm64_mem.text_offset
> instead of calculating a page_offset because text_offset is also used
> in arm64_load_other_segments(). Here, if the first segment doesn't
> contain kernel text, image_base will be incorrect.

IMHO, we should keep all definitions in kexec-tools as close as possible to
the corresponding definitions in the kernel. This will help us avoid
confusion in the future.
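
The two page_offset calculations compared in this thread can be put side
by side in a short sketch. It assumes a purely linear mapping
(va = pa - phys_offset + page_offset); the function names and the values
in the usage note are made up for illustration.

```c
#include <stdint.h>

/* Calculation quoted from the current code: e_entry - text_offset. */
static uint64_t page_offset_current(uint64_t e_entry, uint64_t text_offset)
{
	return e_entry - text_offset;
}

/*
 * Calculation proposed earlier in the thread:
 * e_entry - "kernel Code Start PA" + phys_offset.
 */
static uint64_t page_offset_proposed(uint64_t e_entry,
				     uint64_t kernel_code_pa,
				     uint64_t phys_offset)
{
	return e_entry - kernel_code_pa + phys_offset;
}
```

The two agree only when the kernel code starts exactly text_offset bytes
above phys_offset. If the kernel is loaded higher, say 0x280000 above
phys_offset with a text offset of 0x80000, the first result is off by
0x200000 while the second still recovers the linear-map base.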

~Pratyush



Re: [PATCH v2 0/7] (kexec-tools) arm64: add kdump support

2016-08-23 Thread Pratyush Anand
On 23/08/2016:02:29:03 PM, AKASHI Takahiro wrote:
> Pratyush,
> 
> On Wed, Aug 10, 2016 at 11:26:48PM +0530, Pratyush Anand wrote:
> > Hi Geoff and Takahiro,
> > 
> > I am having some issues with kexec+kdump while working with the Seattle
> > platform. At the top level, the kernel crashes in copy_oldmem_page(),
> > because it gets a wrong offset for log_buf during vmcore-dmesg save.
> > 
> > Here is the detail:
> > 
> > (1) From /proc/iomem, these are the "System RAM" Components:
> > 
> > 80-8001e7 : System RAM
> > 8001e8-83ff17 : System RAM
> > 800208-8002b3 : Kernel code
> > 8002c4-800348 : Kernel data
> > 807fe0-80ffdf : Crash kernel
> > 83ff18-83ff1c : System RAM
> > 83ff1d-83ff21 : System RAM
> > 83ff22-83ffe4 : System RAM
> > 83ffe5-83 : System RAM
> > 
> > (2) From kexec-tools debug print I see following:
> > elf_arm64_load: e_entry:   fc000808 -> 0088
> > elf_arm64_load: p_vaddr:   fc000808 -> 0088
> > elf_arm64_load: header_offset: 
> > elf_arm64_load: text_offset:   0008
> > elf_arm64_load: image_size:0141
> > elf_arm64_load: phys_offset:   0080
> > elf_arm64_load: page_offset:   fc000800
> >
> > I understand that "Kernel Code start physical address" 0x800208 should
> > map to e_entry vaddr which is 0xfc000808. However, the kexec-tools
> > debug print shows that e_entry vaddr maps to PA 88, which seems wrong.
> 
> Who specifies the kernel load address, 0x800208 and why?

Maybe I did not get the question. This load address is coming from the 1st
kernel.

> 
> Since image_arm64_load() also use
> get_phys_offset() + arm64_mem.text_offset (== 0x88)
> as the load address unconditionally, doesn't kexec fail on Seattle?

Yes, I hadn't tried kexec with a binary image. It fails.
I think image_base should be virt_to_phys(get_kernel_sym("_text")) for
!KEXEC_ON_CRASH.

~Pratyush

> 
> -Takahiro AKASHI
> 
> > (3) further page_offset (or vp_offset in your new code) is calculated
> > as:arm64_mem.page_offset = ehdr.e_entry - arm64_mem.text_offset;
> > 
> > The current calculation of page_offset leads to a wrong configuration of
> > the VA of all PT_LOAD segments (see below). Ultimately, this also leads
> > to a kernel crash during vmcore-dmesg and vmcore save operations, because
> > we pass an offset to the pread() system call which maps to the wrong
> > physical address.
> > 
> > Elf header: p_type = 1, p_offset = 0x80 p_paddr = 0x80
> > p_vaddr = 0xfc000800 p_filesz = 0x1e8 p_memsz = 0x1e8
> > [0xfc000800 should be mapping to 0x800200 and not 0x80]
> > Elf header: p_type = 1, p_offset = 0x8001e8 p_paddr = 0x8001e8
> > p_vaddr = 0xfc0009e8 p_filesz = 0x7df8 p_memsz = 0x7df8
> > Elf header: p_type = 1, p_offset = 0x80ffe0 p_paddr = 0x80ffe0
> > p_vaddr = 0xfc0107e0 p_filesz = 0x2ff38 p_memsz = 0x2ff38
> > Elf header: p_type = 1, p_offset = 0x83ff18 p_paddr = 0x83ff18
> > p_vaddr = 0xfc040718 p_filesz = 0x5 p_memsz = 0x5
> > Elf header: p_type = 1, p_offset = 0x83ff1d p_paddr = 0x83ff1d
> > p_vaddr = 0xfc04071d p_filesz = 0x5 p_memsz = 0x5
> > Elf header: p_type = 1, p_offset = 0x83ff22 p_paddr = 0x83ff22
> > p_vaddr = 0xfc040722 p_filesz = 0xc3 p_memsz = 0xc3
> > Elf header: p_type = 1, p_offset = 0x83ffe5 p_paddr = 0x83ffe5
> > p_vaddr = 0xfc0407e5 p_filesz = 0x1b p_memsz = 0x1b
> > 
> > Maybe the following would be better:
> > arm64_mem.page_offset = ehdr.e_entry - "kernel Code Start PA" + phys_offset.
> > 
> > (4) Furthermore, vmcore must have the first PT_LOAD segment as the kernel
> > text area. On this platform the first "System RAM" area is 80-8001e7,
> > which does not match the "Kernel code" area. Therefore, we should provide
> > support for "kern_size" so that the first PT_LOAD is the kernel text area.
> > 
> > ~Pratyush
> > On 09/08/2016:11:00:25 AM, AKASHI Takahiro wrote:
> > > My kernel patches for kdump support on arm64 are currently under review [1].
> > > 
> > > This patchset is synced with them (v24) and provides necessary changes for
> > > kexec-tools. It should be applied on top of Geoff's kexec-tools patches
> > > v3[2] along with a bugfix[3].
> > > 
> > > [1] 
> > > http://lists.infradead.org/pipermail/linux-arm-kernel/2016-August/447597.html
> > > [2] http://lists.infradead.org/pipermail/kexec/2016-August/016768.html
> > > [3] http://lists.infradead.org/pipermail/kexec/2016-July/016664.html
> > > 
> > > Changes for v2:
> > >  - Trim a temporary buffer in setup_2nd_dtb()
> > >  - Add patch#6("kexec: generalize and rename get_kernel_stext_sym()")
> > >  - Update patch#7 from Pratyush
> > >(re-worked by akashi)
> > > 
> > > AKASHI Takahiro (5):

Re: [RFC 0/4] Kexec: Enable run time memory resrvation of crash kernel

2016-08-23 Thread Xunlei Pang
On 2016/08/22 at 18:59, Pratyush Anand wrote:
> On 12/08/2016:07:48:38 PM, Ronit Halder wrote:
>> Currently the Linux kernel reserves memory for the crash kernel at boot time.
>> It would be very useful if we could reserve memory at run time. The user
>> could then reserve the memory whenever needed instead of at boot time.
>>
>> It is possible to reserve memory for the crash kernel at run time using
>> CMA (Contiguous Memory Allocator). CMA is capable of allocating big chunks
>> of memory. At boot time we will create one (if only low memory is used)
>> or two (if we also use high memory, in the case of x86_64) CMA areas of the
>> size given in the "crashkernel" boot time command line parameter. The memory
>> in the CMA areas can be used for movable pages (disk caches, process pages
>> etc.) while not allocated. The user can then reserve or free memory from
>> those CMA areas using the "/sys/kernel/kexec_crash_size" sysfs entry. If the user
> But cma_alloc() is not a guaranteed allocation function, whereas the memblock
> API guarantees that crashkernel memory is available.
> Moreover, most systems start the kdump service at boot time, so I'm not sure
> whether it would be useful enough. Let's see what others say.

Maybe this is useful for debugging: after you shrink the memory and realize
you made a mistake, you can use this function to expand it again without
rebooting to modify the cmdline. Otherwise, I can't think of other use cases.

But it still relies on the "crashkernel" cmdline, and I think it would be more
useful (at least for me) if you could throw away "crashkernel" and use the
sysfs entry directly to reserve or expand the memory. Sometimes when I want to
debug a kdump issue, I find that the system I am using doesn't specify the
right "crashkernel" cmdline (none, or too small), so I must reboot it.

Regards,
Xunlei

>
>> uses high memory, at least 256MB of low memory (needed for swiotlb and DMA
>> buffers) will automatically be reserved when the user allocates memory using
>> the mentioned sysfs entry. In the case of high memory reservation the user
>> controls the size of the reserved region in high memory with the
>> "/sys/kernel/kexec_crash_size" entry. If the size is set to zero, the
>> memory allocated in low memory will automatically be freed.
>>
>> As the pages in a CMA area (when not allocated by CMA) can only be used for
>> movable pages, they won't be used for DMA. So, after allocating pages
>> from the CMA area for loading the crash kernel, there won't be any chance of
>> DMA on the memory.
>>
>> This is a prototype patch. Please share your opinions on my approach. This
>> patch is only for x86 and x86_64. Please note, this patch is only a
>> prototype just to explain my approach and get review. This patch is on
>> kernel version v4.4.11.
>>
>> CMA depends on page migration and only uses movable pages. But movable
>> pages become unmovable momentarily when pinned, and CMA fails for this
>> reason. I don't have any solution for that right now. This approach will
>> work once these problems with CMA are fixed. The patch is enabled
>> by the kernel configuration option CONFIG_KEXEC_CMA.
>>
>> Ronit Halder (4):
>>   Creating one or two CMA area at Boot time
>>   Functions for memory reservation and release
>>   Adding a new kernel configuration to enable the feature
>>   Enable memory allocation through sysfs interface
>>
>>  arch/x86/kernel/setup.c | 44 --
>>  include/linux/kexec.h   | 11 ++-
>>  kernel/kexec_core.c | 83 
>> +
>>  kernel/ksysfs.c | 23 +-
>>  mm/Kconfig  |  6 
>>  5 files changed, 162 insertions(+), 5 deletions(-)
> ~Pratyush
>


Re: [PATCH v2 2/6] powerpc: kexec_file: Add buffer hand-over support for the next kernel

2016-08-23 Thread Thiago Jung Bauermann
On Monday, 22 August 2016, 15:22:00, Dave Young wrote:
> On 08/22/16 at 12:38am, Thiago Jung Bauermann wrote:
> > On Monday, 22 August 2016, 11:21:35, Dave Young wrote:
> > > On 08/13/16 at 12:18am, Thiago Jung Bauermann wrote:
> > > > diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
> > > > index a484a6346146..190c652e49b7 100644
> > > > --- a/arch/powerpc/kernel/machine_kexec_64.c
> > > > +++ b/arch/powerpc/kernel/machine_kexec_64.c
> > > > @@ -490,6 +490,60 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image)
> > > > 
> > > > return image->fops->cleanup(image->image_loader_data);
> > > >  
> > > >  }
> > > > 
> > > > +bool kexec_can_hand_over_buffer(void)
> > > > +{
> > > > +   return true;
> > > > +}
> > > > +
> > > > +int arch_kexec_add_handover_buffer(struct kimage *image,
> > > > +  unsigned long load_addr, unsigned long size)
> > > > +{
> > > > +   image->arch.handover_buffer_addr = load_addr;
> > > > +   image->arch.handover_buffer_size = size;
> > > > +
> > > > +   return 0;
> > > > +}
> > > > +
> > > > +int kexec_get_handover_buffer(void **addr, unsigned long *size)
> > > > +{
> > > > +   int ret;
> > > > +   u64 start_addr, end_addr;
> > > > +
> > > > +   ret = of_property_read_u64(of_chosen,
> > > > +  "linux,kexec-handover-buffer-start",
> > > > +  &start_addr);
> > > > +   if (ret == -EINVAL)
> > > > +   return -ENOENT;
> > > > +   else if (ret)
> > > > +   return -EINVAL;
> > > > +
> > > > +   ret = of_property_read_u64(of_chosen,
> > > > +  "linux,kexec-handover-buffer-end",
> > > > +  &end_addr);
> > > > +   if (ret == -EINVAL)
> > > > +   return -ENOENT;
> > > > +   else if (ret)
> > > > +   return -EINVAL;
> > > > +
> > > > +   *addr =  __va(start_addr);
> > > > +   /* -end is the first address after the buffer. */
> > > > +   *size = end_addr - start_addr;
> > > > +
> > > > +   return 0;
> > > > +}
> > > 
> > > > This depends on dtb, so if IMA wants to extend it to arches like x86 in
> > > > the future you will have to think about another way to pass it.
> > > > 
> > > > How about thinking about a general way now?
> > 
> > The only general way I can think of is by adding a kernel command line
> > parameter which the first kernel would pass to the second kernel, but
> > IMHO that is ugly, because such parameter wouldn't be useful to a user,
> > and it would also be something that, from the perspective of the user,
> > would magically appear in the kernel command line of the second
> > kernel...
> Sorry I just brought up the question, actually I have no idea either.
> Maybe we have to do this with arch specific ways..

Actually, I don't think it's possible to avoid arch-specific code because 
the first kernel has to put the buffer memory region in a reserved memory 
map, and that is arch-specific.

On powerpc, this is done by adding it to the device tree memory reservation
map. On x86, I believe this would be done by adding it to the e820 map.

-- 
[]'s
Thiago Jung Bauermann
IBM Linux Technology Center

