Re: [PATCHv5 00/11] CONFIG_DEBUG_VIRTUAL for arm64

2016-12-13 Thread Mark Rutland
On Tue, Dec 06, 2016 at 03:50:46PM -0800, Laura Abbott wrote:
> Hi,
> 
> This is v5 of the series to add CONFIG_DEBUG_VIRTUAL for arm64. This mostly
> contains minor fixups including adding a few extra headers around and
> splitting things out into a few more sub-patches.
> 
> With a few more acks I think this should be ready to go. More testing is
> always appreciated though.

I've given the whole series a go with kasan, kexec, and hibernate (using
test_resume with the disk target), and everything looks happy. So FWIW,
for the series:

Reviewed-by: Mark Rutland 
Tested-by: Mark Rutland 

Hopefully this can be queued soon for v4.11!

Thanks,
Mark.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH 1/2] kexec: add a dummy note for each offline cpu

2016-12-13 Thread Baoquan He
On 12/14/16 at 02:11pm, Pingfan Liu wrote:
> kexec-tools always allocates program headers for each possible cpu. This
> incurs zero PT_NOTE for offline cpu. We mark this case so that later,
> the capture kernel can distinguish it from the mistake of allocated
> program header.
> The counterpart of the capture kernel comes in next patch.

When you execute dmesg on your testing machine and grep nr_cpu_ids,
what's the value of nr_cpu_ids?

> 
> Signed-off-by: Pingfan Liu 
> ---
> This unnecessary warning buzz on all archs when there is offline cpu
> 
>  include/uapi/linux/elf.h | 1 +
>  kernel/kexec_core.c  | 9 +
>  2 files changed, 10 insertions(+)
> 
> diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
> index b59ee07..9744f1e 100644
> --- a/include/uapi/linux/elf.h
> +++ b/include/uapi/linux/elf.h
> @@ -367,6 +367,7 @@ typedef struct elf64_shdr {
>   * using the corresponding note types via the PTRACE_GETREGSET and
>   * PTRACE_SETREGSET requests.
>   */
> +#define NT_DUMMY 0
>  #define NT_PRSTATUS  1
>  #define NT_PRFPREG   2
>  #define NT_PRPSINFO  3
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 5616755..aeac16e 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -891,9 +891,12 @@ void __crash_kexec(struct pt_regs *regs)
>  	if (mutex_trylock(&kexec_mutex)) {
>  		if (kexec_crash_image) {
>  			struct pt_regs fixed_regs;
> +			unsigned int cpu;
>  
>  			crash_setup_regs(&fixed_regs, regs);
>  			crash_save_vmcoreinfo();
> +			for_each_cpu_not(cpu, cpu_online_mask)
> +				crash_save_cpu(NULL, cpu);
>  			machine_crash_shutdown(&fixed_regs);
>  			machine_kexec(kexec_crash_image);
>  		}
> @@ -1040,6 +1043,12 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
>  	buf = (u32 *)per_cpu_ptr(crash_notes, cpu);
>  	if (!buf)
>  		return;
> +	if (regs == NULL) {
> +		buf = append_elf_note(buf, KEXEC_CORE_NOTE_NAME, NT_DUMMY,
> +				      NULL, 0);
> +		final_note(buf);
> +		return;
> +	}
>  	memset(&prstatus, 0, sizeof(prstatus));
>  	prstatus.pr_pid = current->pid;
>  	elf_core_copy_kernel_regs(&prstatus.pr_reg, regs);
> -- 
> 2.7.4
> 



Re: [PATCH 1/2] kexec: add a dummy note for each offline cpu

2016-12-13 Thread Liu ping fan
On Wed, Dec 14, 2016 at 2:11 PM, Pingfan Liu  wrote:
> kexec-tools always allocates program headers for each possible cpu. This

The code is in kexec-tools/kexec/crashdump-elf.c:
   nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
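
To see on a given machine whether the configured count differs from the online count, the two sysconf() queries can be compared. This is a minimal user-space sketch, not code from kexec-tools: the helper name cpu_counts() is hypothetical, and the comparison with _SC_NPROCESSORS_ONLN is an addition for illustration (kexec-tools, per the line above, uses only _SC_NPROCESSORS_CONF).

```c
#include <unistd.h>

/*
 * Hypothetical helper: report the configured and the online CPU counts.
 * Returns 0 on success, -1 if either sysconf() query fails.
 */
int cpu_counts(long *configured, long *online)
{
	long conf = sysconf(_SC_NPROCESSORS_CONF);	/* what kexec-tools sizes for */
	long onln = sysconf(_SC_NPROCESSORS_ONLN);	/* CPUs currently online */

	if (conf < 1 || onln < 1)
		return -1;
	*configured = conf;
	*online = onln;
	return 0;
}
```

When the configured count is larger than the online count, each CPU in the gap presumably gets a program header whose note area nothing fills in at crash time, which would be the case this patch marks.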

> incurs zero PT_NOTE for offline cpu. We mark this case so that later,
> the capture kernel can distinguish it from the mistake of allocated
> program header.
> The counterpart of the capture kernel comes in next patch.
>
> Signed-off-by: Pingfan Liu 
> ---
> This unnecessary warning buzz on all archs when there is offline cpu
>
>  include/uapi/linux/elf.h | 1 +
>  kernel/kexec_core.c  | 9 +
>  2 files changed, 10 insertions(+)
>
> diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
> index b59ee07..9744f1e 100644
> --- a/include/uapi/linux/elf.h
> +++ b/include/uapi/linux/elf.h
> @@ -367,6 +367,7 @@ typedef struct elf64_shdr {
>   * using the corresponding note types via the PTRACE_GETREGSET and
>   * PTRACE_SETREGSET requests.
>   */
> +#define NT_DUMMY	0
>  #define NT_PRSTATUS	1
>  #define NT_PRFPREG	2
>  #define NT_PRPSINFO	3
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 5616755..aeac16e 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -891,9 +891,12 @@ void __crash_kexec(struct pt_regs *regs)
>  	if (mutex_trylock(&kexec_mutex)) {
>  		if (kexec_crash_image) {
>  			struct pt_regs fixed_regs;
> +			unsigned int cpu;
>
>  			crash_setup_regs(&fixed_regs, regs);
>  			crash_save_vmcoreinfo();
> +			for_each_cpu_not(cpu, cpu_online_mask)
> +				crash_save_cpu(NULL, cpu);
>  			machine_crash_shutdown(&fixed_regs);
>  			machine_kexec(kexec_crash_image);
>  		}
> @@ -1040,6 +1043,12 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
>  	buf = (u32 *)per_cpu_ptr(crash_notes, cpu);
>  	if (!buf)
>  		return;
> +	if (regs == NULL) {
> +		buf = append_elf_note(buf, KEXEC_CORE_NOTE_NAME, NT_DUMMY,
> +				      NULL, 0);
> +		final_note(buf);
> +		return;
> +	}
>  	memset(&prstatus, 0, sizeof(prstatus));
>  	prstatus.pr_pid = current->pid;
>  	elf_core_copy_kernel_regs(&prstatus.pr_reg, regs);
> --
> 2.7.4
>



[PATCH 1/2] kexec: add a dummy note for each offline cpu

2016-12-13 Thread Pingfan Liu
kexec-tools always allocates a program header for each possible cpu. This
leaves an empty PT_NOTE for every offline cpu. Mark this case with a dummy
note so that the capture kernel can later distinguish it from a program
header that was allocated by mistake.
The capture-kernel counterpart comes in the next patch.

Signed-off-by: Pingfan Liu 
---
This unnecessary warning fires on all archs whenever a cpu is offline

 include/uapi/linux/elf.h | 1 +
 kernel/kexec_core.c  | 9 +
 2 files changed, 10 insertions(+)

diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index b59ee07..9744f1e 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -367,6 +367,7 @@ typedef struct elf64_shdr {
  * using the corresponding note types via the PTRACE_GETREGSET and
  * PTRACE_SETREGSET requests.
  */
+#define NT_DUMMY	0
 #define NT_PRSTATUS	1
 #define NT_PRFPREG	2
 #define NT_PRPSINFO	3
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 5616755..aeac16e 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -891,9 +891,12 @@ void __crash_kexec(struct pt_regs *regs)
 	if (mutex_trylock(&kexec_mutex)) {
 		if (kexec_crash_image) {
 			struct pt_regs fixed_regs;
+			unsigned int cpu;
 
 			crash_setup_regs(&fixed_regs, regs);
 			crash_save_vmcoreinfo();
+			for_each_cpu_not(cpu, cpu_online_mask)
+				crash_save_cpu(NULL, cpu);
 			machine_crash_shutdown(&fixed_regs);
 			machine_kexec(kexec_crash_image);
 		}
@@ -1040,6 +1043,12 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
 	buf = (u32 *)per_cpu_ptr(crash_notes, cpu);
 	if (!buf)
 		return;
+	if (regs == NULL) {
+		buf = append_elf_note(buf, KEXEC_CORE_NOTE_NAME, NT_DUMMY,
+				      NULL, 0);
+		final_note(buf);
+		return;
+	}
 	memset(&prstatus, 0, sizeof(prstatus));
 	prstatus.pr_pid = current->pid;
 	elf_core_copy_kernel_regs(&prstatus.pr_reg, regs);
-- 
2.7.4
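
For readers unfamiliar with the note layout, the dummy note written by the patch can be sketched in user space. This is only an illustration under stated assumptions: append_note() and build_dummy_note() are hypothetical stand-ins for the kernel's append_elf_note() and final_note(), the name "CORE" is assumed to be the value of KEXEC_CORE_NOTE_NAME, and the caller is assumed to pass a zeroed buffer of at least eight 32-bit words.

```c
#include <stdint.h>
#include <string.h>

#define NOTE_NAME "CORE"	/* assumed value of KEXEC_CORE_NOTE_NAME */
#define NT_DUMMY  0		/* note type proposed by the patch */

struct note_hdr {		/* same layout as Elf64_Nhdr: three 32-bit words */
	uint32_t n_namesz;
	uint32_t n_descsz;
	uint32_t n_type;
};

/* Append one note at buf and return the next free 32-bit word. */
static uint32_t *append_note(uint32_t *buf, const char *name, uint32_t type,
			     const void *desc, size_t desc_len)
{
	struct note_hdr hdr = {
		.n_namesz = (uint32_t)strlen(name) + 1,	/* includes the NUL */
		.n_descsz = (uint32_t)desc_len,
		.n_type = type,
	};

	memcpy(buf, &hdr, sizeof(hdr));
	buf += sizeof(hdr) / 4;
	memcpy(buf, name, hdr.n_namesz);
	buf += (hdr.n_namesz + 3) / 4;		/* name padded to 4 bytes */
	if (desc_len)				/* skip the copy for an empty payload */
		memcpy(buf, desc, desc_len);
	buf += (desc_len + 3) / 4;		/* descriptor padded to 4 bytes */
	return buf;
}

/* Build the dummy note plus an all-zero terminator; return the bytes used. */
size_t build_dummy_note(uint32_t *buf)
{
	uint32_t *end = append_note(buf, NOTE_NAME, NT_DUMMY, NULL, 0);

	memset(end, 0, sizeof(struct note_hdr));	/* terminating empty header */
	end += sizeof(struct note_hdr) / 4;
	return (size_t)(end - buf) * sizeof(uint32_t);
}
```

Under these assumptions the dummy note itself is 20 bytes: a 12-byte header, the 5-byte name padded to 8, and no descriptor. The sketch guards the descriptor copy so that memcpy() is never handed a NULL source, which is a choice made here and not something the patch does.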




[PATCH 2/2] [fs] proc/vmcore: check the dummy place holder for offline cpu to avoid warning

2016-12-13 Thread Pingfan Liu
kexec-tools always allocates program headers for all possible cpus, but
at crash time offline cpus only have dummy notes. Do not copy these
dummy notes into the ELF file, and do not warn about them.

Signed-off-by: Pingfan Liu 
---
 fs/proc/vmcore.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 8ab782d..bbc9dad 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -526,9 +526,10 @@ static u64 __init get_vmcore_size(size_t elfsz, size_t elfnotesegsz,
  */
 static int __init update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
 {
-	int i, rc=0;
+	int i, j, rc = 0;
 	Elf64_Phdr *phdr_ptr;
 	Elf64_Nhdr *nhdr_ptr;
+	bool warn;
 
 	phdr_ptr = (Elf64_Phdr *)(ehdr_ptr + 1);
 	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
@@ -536,6 +537,7 @@ static int __init update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
 		u64 offset, max_sz, sz, real_sz = 0;
 		if (phdr_ptr->p_type != PT_NOTE)
 			continue;
+		warn = true;
 		max_sz = phdr_ptr->p_memsz;
 		offset = phdr_ptr->p_offset;
 		notes_section = kmalloc(max_sz, GFP_KERNEL);
@@ -547,7 +549,7 @@ static int __init update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
 			return rc;
 		}
 		nhdr_ptr = notes_section;
-		while (nhdr_ptr->n_namesz != 0) {
+		for (j = 0; nhdr_ptr->n_namesz != 0; j++) {
 			sz = sizeof(Elf64_Nhdr) +
 				(((u64)nhdr_ptr->n_namesz + 3) & ~3) +
 				(((u64)nhdr_ptr->n_descsz + 3) & ~3);
@@ -559,11 +561,22 @@ static int __init update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
 			real_sz += sz;
 			nhdr_ptr = (Elf64_Nhdr*)((char*)nhdr_ptr + sz);
 		}
+		if (real_sz != 0)
+			warn = false;
+		if (j == 1) {
+			nhdr_ptr = notes_section;
+			if ((nhdr_ptr->n_type == NT_DUMMY)
+			  && !strncmp(KEXEC_CORE_NOTE_NAME,
+				(char *)nhdr_ptr + sizeof(Elf64_Nhdr),
+				strlen(KEXEC_CORE_NOTE_NAME))) {
+				/* do not copy this dummy note */
+				real_sz = 0;
+			}
+		}
 		kfree(notes_section);
 		phdr_ptr->p_memsz = real_sz;
-		if (real_sz == 0) {
+		if (warn)
 			pr_warn("Warning: Zero PT_NOTE entries found\n");
-		}
 	}
 
 	return 0;
-- 
2.7.4
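
The note walk in this patch can be tried outside the kernel. The following is a rough user-space sketch and not the vmcore code itself: usable_note_bytes() and struct note_hdr are hypothetical names, "CORE" is assumed to be the value of KEXEC_CORE_NOTE_NAME, and, unlike the patch, the sketch also checks n_namesz before comparing the name.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NOTE_NAME "CORE"	/* assumed value of KEXEC_CORE_NOTE_NAME */
#define NT_DUMMY  0		/* note type proposed in patch 1/2 */

struct note_hdr {		/* same layout as Elf64_Nhdr */
	uint32_t n_namesz;
	uint32_t n_descsz;
	uint32_t n_type;
};

/* Size of one note: header plus name and descriptor, each padded to 4 bytes. */
static uint64_t note_size(const struct note_hdr *n)
{
	return sizeof(*n) + (((uint64_t)n->n_namesz + 3) & ~3ULL) +
	       (((uint64_t)n->n_descsz + 3) & ~3ULL);
}

/*
 * Return the number of bytes worth copying from a note segment of max_sz
 * bytes. *dummy is set when the segment holds exactly one NT_DUMMY note.
 */
uint64_t usable_note_bytes(const void *seg, uint64_t max_sz, bool *dummy)
{
	const char *p = seg;
	struct note_hdr n;
	uint64_t real_sz = 0;
	unsigned int count = 0;

	*dummy = false;
	while (real_sz + sizeof(n) <= max_sz) {
		memcpy(&n, p + real_sz, sizeof(n));
		/* An empty name ends the list; an oversized note is dropped. */
		if (n.n_namesz == 0 || real_sz + note_size(&n) > max_sz)
			break;
		real_sz += note_size(&n);
		count++;
	}
	if (count == 1) {
		memcpy(&n, p, sizeof(n));
		if (n.n_type == NT_DUMMY && n.n_namesz == sizeof(NOTE_NAME) &&
		    memcmp(p + sizeof(n), NOTE_NAME, sizeof(NOTE_NAME)) == 0) {
			*dummy = true;	/* placeholder for an offline cpu */
			return 0;	/* nothing to copy, nothing to warn about */
		}
	}
	return real_sz;
}
```

A caller would warn only when the function returns 0 with *dummy left false, which corresponds to the "Zero PT_NOTE entries found" case that the patch keeps.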




Re: [PATCH] Add +~800M crashkernel explaination

2016-12-13 Thread Xunlei Pang
On 12/14/2016 at 11:08 AM, Xunlei Pang wrote:
> On 12/10/2016 at 01:20 PM, Robert LeBlanc wrote:
>> On Fri, Dec 9, 2016 at 7:49 PM, Baoquan He  wrote:
>>> On 12/09/16 at 05:22pm, Robert LeBlanc wrote:
>>>> When trying to configure crashkernel greater than about 800 MB, the
>>>> kernel fails to allocate memory on x86 and x86_64. This is due to an
>>>> undocumented limit that the crashkernel and other low memory items must
>>>> be allocated below 896 MB unless the ",high" option is given. This
>>>> updates the documentation to explain this and what I understand the
>>>> limitations to be on the option.
>>> This is true, but not very accurate. You found it's about 800M because
>>> the current kernel usually needs about 40M of space to run, plus some
>>> extra reservations made before reserve_crashkernel is invoked, another
>>> ~10M. However, that is only the normal case; people may build modules in
>>> or have special code that bloats the kernel. This patch makes sense for
>>> addressing the low|high issue, but it might not be good to state ~800M
>>> so definitively.
>> My testing showed that I could go anywhere from about 830M to 880M,
>> depending on distro, kernel version, and stuff that you mentioned. I
>> just thought some rule of thumb of when to consider using high would
>> be good. People may not think that 800 MB is 'large' when you have 512
>> GB of RAM for instance. I thought about making 512 MB be the rule of
>> thumb, but you can do a lot with ~300 MB.
> Hi Robert,
>
> I think you are correct.
>
> For x86, the kernel uses memblock to find a suitable range that starts at
> 16MB and extends to some "end". Without the ",high" option, "end" is
> CRASH_ADDR_LOW_MAX; otherwise it is CRASH_ADDR_HIGH_MAX.
>
> You can find the definitions for both 32-bit and 64-bit:
> #ifdef CONFIG_X86_32
> # define CRASH_ADDR_LOW_MAX	(512 << 20)
> # define CRASH_ADDR_HIGH_MAX	(512 << 20)
> #else
> # define CRASH_ADDR_LOW_MAX	(896UL << 20)
> # define CRASH_ADDR_HIGH_MAX	MAXMEM
> #endif
>
> Since some memory has already been allocated by the kernel, a reservation
> failure is highly likely when the crashkernel value is near 800MB (on
> x86_64), which is what you ran into. We can't give an exact threshold, but
> it would be better to have some explanation of this in the document.

But there is another point:
If you specify the base using crashkernel=size[KMG][@offset[KMG]], for example
"crashkernel=1024M@0x1000", there is no such limitation, and you may get
a successful reservation. I have no idea why the design is so different.

Regards,
Xunlei

>
>> I'm happy to adjust the wording, what would you recommend? Also, I'm
>> not 100% sure that I got the cases covered correctly. I was surprised
>> that I could not get it to work with the "new" format with the
>> multiple ranges, and that specifying an offset wouldn't work either,
>> although the offset kind of makes sense. Do you know for sure that it
>> doesn't work with ranges?
>>
>> I tried,
>>
>> crashkernel=256M-1G:128M,high,1G-4G:256M,high,4G-:512M,high
>>
>> and
>>
>> crashkernel=256M-1G:128M,1G-4G:256M,4G-:512M,high
>>
>> and neither worked. It seems that a better separator would be ';'
>> instead of ',' for ranges, then you could specify options better. Kind
>> of hard to change now.
> For "crashkernel=range1:size1[,range2:size2,...][@offset]"
> I'm afraid the current implementation doesn't support the ",high" option, so
> there is no guarantee that it works.
> I guess we can add a note to clear up the confusion.
>
> Regards,
> Xunlei
>
>>>> Signed-off-by: Robert LeBlanc 
>>>> ---
>>>>  Documentation/kdump/kdump.txt | 22 +-
>>>>  1 file changed, 17 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
>>>> index b0eb27b..aa3efa8 100644
>>>> --- a/Documentation/kdump/kdump.txt
>>>> +++ b/Documentation/kdump/kdump.txt
>>>> @@ -256,7 +256,9 @@ While the "crashkernel=size[@offset]" syntax is sufficient for most
>>>>  configurations, sometimes it's handy to have the reserved memory dependent
>>>>  on the value of System RAM -- that's mostly for distributors that pre-setup
>>>>  the kernel command line to avoid a unbootable system after some memory has
>>>> -been removed from the machine.
>>>> +been removed from the machine. If you need to allocate more than ~800M
>>>> +for x86 or x86_64 then you must use the simple format as the format
>>>> +',high' conflicts with the separators of ranges.
>>>>
>>>>  The syntax is:
>>>>
>>>> @@ -282,11 +284,21 @@ Boot into System Kernel
>>>>  1) Update the boot loader (such as grub, yaboot, or lilo) configuration
>>>> files as necessary.
>>>>
>>>> -2) Boot the system kernel with the boot parameter "crashkernel=Y@X",
>>>> +2) Boot the system kernel with the boot parameter "crashkernel=Y[@X | ,high]",
>>>> where Y specifies how much memory to reserve for the dump-capture kernel
>>>> -   and X specifies the 

Re: [PATCH] Add +~800M crashkernel explaination

2016-12-13 Thread Xunlei Pang
On 12/10/2016 at 01:20 PM, Robert LeBlanc wrote:
> On Fri, Dec 9, 2016 at 7:49 PM, Baoquan He  wrote:
>> On 12/09/16 at 05:22pm, Robert LeBlanc wrote:
>>> When trying to configure crashkernel greater than about 800 MB, the
>>> kernel fails to allocate memory on x86 and x86_64. This is due to an
>>> undocumented limit that the crashkernel and other low memory items must
>>> be allocated below 896 MB unless the ",high" option is given. This
>>> updates the documentation to explain this and what I understand the
>>> limitations to be on the option.
>> This is true, but not very accurate. You found it's about 800M because
>> the current kernel usually needs about 40M of space to run, plus some
>> extra reservations made before reserve_crashkernel is invoked, another
>> ~10M. However, that is only the normal case; people may build modules in
>> or have special code that bloats the kernel. This patch makes sense for
>> addressing the low|high issue, but it might not be good to state ~800M
>> so definitively.
> My testing showed that I could go anywhere from about 830M to 880M,
> depending on distro, kernel version, and stuff that you mentioned. I
> just thought some rule of thumb of when to consider using high would
> be good. People may not think that 800 MB is 'large' when you have 512
> GB of RAM for instance. I thought about making 512 MB be the rule of
> thumb, but you can do a lot with ~300 MB.

Hi Robert,

I think you are correct.

For x86, the kernel uses memblock to find a suitable range that starts at
16MB and extends to some "end". Without the ",high" option, "end" is
CRASH_ADDR_LOW_MAX; otherwise it is CRASH_ADDR_HIGH_MAX.

You can find the definitions for both 32-bit and 64-bit:
#ifdef CONFIG_X86_32
# define CRASH_ADDR_LOW_MAX	(512 << 20)
# define CRASH_ADDR_HIGH_MAX	(512 << 20)
#else
# define CRASH_ADDR_LOW_MAX	(896UL << 20)
# define CRASH_ADDR_HIGH_MAX	MAXMEM
#endif

Since some memory has already been allocated by the kernel, a reservation
failure is highly likely when the crashkernel value is near 800MB (on
x86_64), which is what you ran into. We can't give an exact threshold, but
it would be better to have some explanation of this in the document.
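
As a back-of-the-envelope check of the numbers in this thread, the low ceiling can be turned into an upper bound on the reservation size. This is a sketch under stated assumptions, not how the kernel computes anything: max_low_crashkernel() and the MB() macro are hypothetical, the 16MB search start and the 896MB ceiling are taken from the text above, and fragmentation and alignment are ignored, so the real limit can only be lower.

```c
#include <stdint.h>

#define MB(x) ((uint64_t)(x) << 20)

/* x86_64 values quoted in the message above. */
#define CRASH_ADDR_LOW_MAX MB(896)
#define SEARCH_START       MB(16)	/* the search starts at 16MB */

/*
 * Upper bound on a crashkernel= size that can still fit below the low
 * ceiling, given `used` bytes already reserved inside the search window.
 */
uint64_t max_low_crashkernel(uint64_t used)
{
	uint64_t window = CRASH_ADDR_LOW_MAX - SEARCH_START;

	return used >= window ? 0 : window - used;
}
```

With nothing else reserved the window is 880MB, and with the roughly 50MB of kernel and early reservations estimated earlier in the thread it shrinks to about 830MB, which is at least consistent with the 830M to 880M range Robert reported.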

>
> I'm happy to adjust the wording, what would you recommend? Also, I'm
> not 100% sure that I got the cases covered correctly. I was surprised
> that I could not get it to work with the "new" format with the
> multiple ranges, and that specifying an offset wouldn't work either,
> although the offset kind of makes sense. Do you know for sure that it
> doesn't work with ranges?
>
> I tried,
>
> crashkernel=256M-1G:128M,high,1G-4G:256M,high,4G-:512M,high
>
> and
>
> crashkernel=256M-1G:128M,1G-4G:256M,4G-:512M,high
>
> and neither worked. It seems that a better separator would be ';'
> instead of ',' for ranges, then you could specify options better. Kind
> of hard to change now.

For "crashkernel=range1:size1[,range2:size2,...][@offset]"
I'm afraid the current implementation doesn't support the ",high" option, so
there is no guarantee that it works.
I guess we can add a note to clear up the confusion.
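
Robert's point about the separator can be illustrated with a naive tokenizer. This is a hypothetical check written for this note, not the kernel's crashkernel= parser: count_non_range_tokens() simply splits the argument on ',' and counts the pieces that lack the ':' of a "range:size" entry.

```c
#include <string.h>

/*
 * Hypothetical check, not the kernel's parser: split a crashkernel= range
 * list on ',' and count the pieces that are not of the form "range:size".
 */
int count_non_range_tokens(const char *arg)
{
	int bad = 0;

	while (*arg) {
		size_t len = strcspn(arg, ",");	/* length of the next token */

		if (len && !memchr(arg, ':', len))
			bad++;			/* e.g. a stray "high" */
		arg += len;
		if (*arg == ',')
			arg++;			/* step over the separator */
	}
	return bad;
}
```

Both strings Robert tried contain at least one such piece, because a trailing ",high" is indistinguishable from one more list entry when ',' is also the range separator.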

Regards,
Xunlei

>>> Signed-off-by: Robert LeBlanc 
>>> ---
>>>  Documentation/kdump/kdump.txt | 22 +-
>>>  1 file changed, 17 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
>>> index b0eb27b..aa3efa8 100644
>>> --- a/Documentation/kdump/kdump.txt
>>> +++ b/Documentation/kdump/kdump.txt
>>> @@ -256,7 +256,9 @@ While the "crashkernel=size[@offset]" syntax is sufficient for most
>>>  configurations, sometimes it's handy to have the reserved memory dependent
>>>  on the value of System RAM -- that's mostly for distributors that pre-setup
>>>  the kernel command line to avoid a unbootable system after some memory has
>>> -been removed from the machine.
>>> +been removed from the machine. If you need to allocate more than ~800M
>>> +for x86 or x86_64 then you must use the simple format as the format
>>> +',high' conflicts with the separators of ranges.
>>>
>>>  The syntax is:
>>>
>>> @@ -282,11 +284,21 @@ Boot into System Kernel
>>>  1) Update the boot loader (such as grub, yaboot, or lilo) configuration
>>> files as necessary.
>>>
>>> -2) Boot the system kernel with the boot parameter "crashkernel=Y@X",
>>> +2) Boot the system kernel with the boot parameter "crashkernel=Y[@X | ,high]",
>>> where Y specifies how much memory to reserve for the dump-capture kernel
>>> -   and X specifies the beginning of this reserved memory. For example,
>>> -   "crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory
>>> -   starting at physical address 0x01000000 (16MB) for the dump-capture kernel.
>>> +   and X specifies the beginning of this reserved memory or ',high' to load in
>>> +   high memory. For example, "crashkernel=64M@16M" tells the system
>>> +   kernel to reserve 64 MB of memory starting at physical address
>>> +