Re: [PATCH 1/2] kexec: add a dummy note for each offline cpu

2016-12-14 Thread Dave Young
Hi, Pingfan
On 12/14/16 at 02:11pm, Pingfan Liu wrote:
> kexec-tools always allocates program headers for each possible cpu. This
> incurs zero PT_NOTE for offline cpu. We mark this case so that later,
> the capture kernel can distinguish it from the mistake of allocated
> program header.
> The counterpart of the capture kernel comes in next patch.

I thought you saw the warnings on ppc64 and that it might be a ppc64 issue.
But if this is instead a general issue, can we think about whether this is
really necessary?

Does it have any side effect other than the warning messages? If there
is nothing bad beyond the warnings, maybe leaving it as is would be
the better way.

> 
> Signed-off-by: Pingfan Liu 
> ---
> This unnecessary warning shows up on all archs when there are offline cpus
> 
>  include/uapi/linux/elf.h | 1 +
>  kernel/kexec_core.c  | 9 +
>  2 files changed, 10 insertions(+)
> 
> diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
> index b59ee07..9744f1e 100644
> --- a/include/uapi/linux/elf.h
> +++ b/include/uapi/linux/elf.h
> @@ -367,6 +367,7 @@ typedef struct elf64_shdr {
>   * using the corresponding note types via the PTRACE_GETREGSET and
>   * PTRACE_SETREGSET requests.
>   */
> +#define NT_DUMMY 0
>  #define NT_PRSTATUS  1
>  #define NT_PRFPREG   2
>  #define NT_PRPSINFO  3
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 5616755..aeac16e 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -891,9 +891,12 @@ void __crash_kexec(struct pt_regs *regs)
>   if (mutex_trylock(&kexec_mutex)) {
>   if (kexec_crash_image) {
>   struct pt_regs fixed_regs;
> + unsigned int cpu;
>  
>   crash_setup_regs(&fixed_regs, regs);
>   crash_save_vmcoreinfo();
> + for_each_cpu_not(cpu, cpu_online_mask)
> + crash_save_cpu(NULL, cpu);
>   machine_crash_shutdown(&fixed_regs);
>   machine_kexec(kexec_crash_image);
>   }
> @@ -1040,6 +1043,12 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
>   buf = (u32 *)per_cpu_ptr(crash_notes, cpu);
>   if (!buf)
>   return;
> + if (regs == NULL) {
> + buf = append_elf_note(buf, KEXEC_CORE_NOTE_NAME, NT_DUMMY,
> + NULL, 0);
> + final_note(buf);
> + return;
> + }
>   memset(&prstatus, 0, sizeof(prstatus));
>   prstatus.pr_pid = current->pid;
>   elf_core_copy_kernel_regs(&prstatus.pr_reg, regs);
> -- 
> 2.7.4
> 
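For context, the dummy note that crash_save_cpu(NULL, cpu) above would leave in
that cpu's crash_notes buffer lays out roughly as follows — an illustration
only, assuming the usual append_elf_note()/final_note() behaviour, not part of
the posted patch:

#include <elf.h>

/* Illustration only: a 12-byte note header, the 4-byte-padded name, no
 * descriptor data, and the all-zero terminator from final_note(). */
struct offline_cpu_dummy_note {
	Elf64_Nhdr hdr;		/* { .n_namesz = 5, .n_descsz = 0, .n_type = NT_DUMMY } */
	char name[8];		/* "CORE\0", padded to a 4-byte boundary */
	Elf64_Nhdr terminator;	/* all zeroes, written by final_note() */
};

So the capture kernel sees a note with a non-zero name size but zero descriptor
size and type NT_DUMMY, which is what patch 2/2 keys on.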

Thanks
Dave



Re: [PATCH 2/2] [fs] proc/vmcore: check the dummy place holder for offline cpu to avoid warning

2016-12-14 Thread Liu ping fan
On Thu, Dec 15, 2016 at 7:56 AM, Xunlei Pang  wrote:
> On 12/14/2016 at 02:11 PM, Pingfan Liu wrote:
>> kexec-tools always allocates program headers for possible cpus. But
>> when crashing, offline cpus have dummy headers. We do not copy these
>> dummy notes into the ELF file, and there is no need to warn about them.
>>
>> Signed-off-by: Pingfan Liu 
>> ---
>>  fs/proc/vmcore.c | 21 +
>>  1 file changed, 17 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
>> index 8ab782d..bbc9dad 100644
>> --- a/fs/proc/vmcore.c
>> +++ b/fs/proc/vmcore.c
>> @@ -526,9 +526,10 @@ static u64 __init get_vmcore_size(size_t elfsz, size_t elfnotesegsz,
>>   */
>>  static int __init update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
>>  {
>> - int i, rc=0;
>> + int i, j, rc = 0;
>>   Elf64_Phdr *phdr_ptr;
>>   Elf64_Nhdr *nhdr_ptr;
>> + bool warn;
>>
>>   phdr_ptr = (Elf64_Phdr *)(ehdr_ptr + 1);
>>   for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
>> @@ -536,6 +537,7 @@ static int __init update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
>>   u64 offset, max_sz, sz, real_sz = 0;
>>   if (phdr_ptr->p_type != PT_NOTE)
>>   continue;
>> + warn = true;
>>   max_sz = phdr_ptr->p_memsz;
>>   offset = phdr_ptr->p_offset;
>>   notes_section = kmalloc(max_sz, GFP_KERNEL);
>> @@ -547,7 +549,7 @@ static int __init update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
>>   return rc;
>>   }
>>   nhdr_ptr = notes_section;
>> - while (nhdr_ptr->n_namesz != 0) {
>> + for (j = 0; nhdr_ptr->n_namesz != 0; j++) {
>
> Hi Pingfan,
>
> I think we don't need to be this complex. How about simply checking before the
> while loop: if it is the cpu dummy note (initialize it with some magic), then
> handle it differently, e.g. set a "nowarn" flag to use afterwards and make sure
> it has zero p_memsz?
>
I had been thinking about how the percpu note section is filled. But you are
right, we can assume that for all archs, cpus just overwrite the note rather
than appending to it.

> Also do the similar thing for update_note_header_size_elf32()?
>
Yes, will fix it.

Thx,



Re: [PATCH 2/2] [fs] proc/vmcore: check the dummy place holder for offline cpu to avoid warning

2016-12-14 Thread Xunlei Pang
On 12/14/2016 at 02:11 PM, Pingfan Liu wrote:
> kexec-tools always allocates program headers for possible cpus. But
> when crashing, offline cpus have dummy headers. We do not copy these
> dummy notes into the ELF file, and there is no need to warn about them.
>
> Signed-off-by: Pingfan Liu 
> ---
>  fs/proc/vmcore.c | 21 +
>  1 file changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 8ab782d..bbc9dad 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -526,9 +526,10 @@ static u64 __init get_vmcore_size(size_t elfsz, size_t elfnotesegsz,
>   */
>  static int __init update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
>  {
> - int i, rc=0;
> + int i, j, rc = 0;
>   Elf64_Phdr *phdr_ptr;
>   Elf64_Nhdr *nhdr_ptr;
> + bool warn;
>  
>   phdr_ptr = (Elf64_Phdr *)(ehdr_ptr + 1);
>   for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
> @@ -536,6 +537,7 @@ static int __init update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
>   u64 offset, max_sz, sz, real_sz = 0;
>   if (phdr_ptr->p_type != PT_NOTE)
>   continue;
> + warn = true;
>   max_sz = phdr_ptr->p_memsz;
>   offset = phdr_ptr->p_offset;
>   notes_section = kmalloc(max_sz, GFP_KERNEL);
> @@ -547,7 +549,7 @@ static int __init update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
>   return rc;
>   }
>   nhdr_ptr = notes_section;
> - while (nhdr_ptr->n_namesz != 0) {
> + for (j = 0; nhdr_ptr->n_namesz != 0; j++) {

Hi Pingfan,

I think we don't need to be this complex. How about simply checking before the
while loop: if it is the cpu dummy note (initialize it with some magic), then
handle it differently, e.g. set a "nowarn" flag to use afterwards and make sure
it has zero p_memsz?

Also do the similar thing for update_note_header_size_elf32()?

Regards,
Xunlei

>   sz = sizeof(Elf64_Nhdr) +
>   (((u64)nhdr_ptr->n_namesz + 3) & ~3) +
>   (((u64)nhdr_ptr->n_descsz + 3) & ~3);
> @@ -559,11 +561,22 @@ static int __init update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
>   real_sz += sz;
>   nhdr_ptr = (Elf64_Nhdr*)((char*)nhdr_ptr + sz);
>   }
> + if (real_sz != 0)
> + warn = false;
> + if (j == 1) {
> + nhdr_ptr = notes_section;
> + if ((nhdr_ptr->n_type == NT_DUMMY)
> +   && !strncmp(KEXEC_CORE_NOTE_NAME,
> + (char *)nhdr_ptr + sizeof(Elf64_Nhdr),
> + strlen(KEXEC_CORE_NOTE_NAME))) {
> + /* do not copy this dummy note */
> + real_sz = 0;
> + }
> + }
>   kfree(notes_section);
>   phdr_ptr->p_memsz = real_sz;
> - if (real_sz == 0) {
> + if (warn)
>   pr_warn("Warning: Zero PT_NOTE entries found\n");
> - }
>   }
>  
>   return 0;
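A minimal sketch of the up-front check being suggested, written as a
hypothetical helper next to update_note_header_size_elf64() in fs/proc/vmcore.c
(not a posted patch; it assumes the NT_DUMMY type and KEXEC_CORE_NOTE_NAME
("CORE") from patch 1/2):

/* Hypothetical helper: recognize the offline-cpu dummy placeholder up
 * front so the caller can drop it without warning about it. */
static bool is_offline_cpu_dummy_note(const Elf64_Nhdr *nhdr)
{
	return nhdr->n_namesz == sizeof(KEXEC_CORE_NOTE_NAME) &&
	       nhdr->n_descsz == 0 &&
	       nhdr->n_type == NT_DUMMY &&
	       !strcmp((const char *)nhdr + sizeof(*nhdr),
		       KEXEC_CORE_NOTE_NAME);
}

The PT_NOTE loop could then set p_memsz to 0 and continue silently for such
notes, instead of counting entries with an extra index.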




Re: [PATCH 1/2] kexec: add a dummy note for each offline cpu

2016-12-14 Thread Xunlei Pang
On 12/14/2016 at 05:13 PM, Liu ping fan wrote:
> [...]
>>> No. This patch just place a mark on these offline cpu. The next patch
>>> for capture kernel will recognize this case, and ignore this kind of
>>> pt_note by the code:
>>> real_sz = 0; // although the size of this kind of PT_NOTE is not zero,
>>> but it contains nothing useful, so just ignore it
>>> phdr_ptr->p_memsz = real_sz
>> If there is any other vmcore functional issue besides throwing "Warning: 
>> Zero PT_NOTE entries found"?
>>
> Not at present when I debugged.

Well, I agree that we should fix it, given that it produces many unnecessary
warnings on some machines.

> I just think we can not suppose the behaviour of different archs, so
> just mark out the dummy pt_note. If some archs want to use these notes
> memory,
> they will just overwrite the dummy.

For cpu crash_notes, it should be arch-independent, and related to elf format.

>
> Thx,
> Pingfan
>




Re: [PATCH] Add +~800M crashkernel explaination

2016-12-14 Thread Xunlei Pang
On 12/15/2016 at 01:50 AM, Robert LeBlanc wrote:
> On Tue, Dec 13, 2016 at 8:08 PM, Xunlei Pang  wrote:
>> On 12/10/2016 at 01:20 PM, Robert LeBlanc wrote:
>>> On Fri, Dec 9, 2016 at 7:49 PM, Baoquan He  wrote:
 On 12/09/16 at 05:22pm, Robert LeBlanc wrote:
> When trying to configure crashkernel greater than about 800 MB, the
> kernel fails to allocate memory on x86 and x86_64. This is due to an
> undocumented limit that the crashkernel and other low memory items must
> be allocated below 896 MB unless the ",high" option is given. This
> updates the documentation to explain this and what I understand the
> limitations to be on the option.
 This is true, but not very accurate. You found it's about 800M; that's
 because the current kernel usually needs about 40M of space to run, plus some
 extra reservation before the reserve_crashkernel invocation, another ~10M.
 But that's the normal case; people may build modules in or have some
 special code that bloats the kernel. This patch makes sense to address the
 low|high issue, but it might not be good to state ~800M so definitively.
>>> My testing showed that I could go anywhere from about 830M to 880M,
>>> depending on distro, kernel version, and stuff that you mentioned. I
>>> just thought some rule of thumb of when to consider using high would
>>> be good. People may not think that 800 MB is 'large' when you have 512
>>> GB of RAM for instance. I thought about making 512 MB be the rule of
>>> thumb, but you can do a lot with ~300 MB.
>> Hi Robert,
>>
>> I think you are correct.
>>
>> For x86, the kernel uses memblock to locate the proper range, starting from
>> 16MB up to some "end"; without the "high" prefix, "end" is CRASH_ADDR_LOW_MAX,
>> otherwise CRASH_ADDR_HIGH_MAX.
>>
>> You can find the definition for both 32-bit and 64-bit:
>> #ifdef CONFIG_X86_32
>> # define CRASH_ADDR_LOW_MAX (512 << 20)
>> # define CRASH_ADDR_HIGH_MAX (512 << 20)
>> #else
>> # define CRASH_ADDR_LOW_MAX (896UL << 20)
>> # define CRASH_ADDR_HIGH_MAX MAXMEM
>> #endif
>>
>> As some memory was already allocated by the kernel, it's highly likely to get
>> a reservation failure after specifying a crashkernel value near 800MB (for
>> x86_64), which is what you met. We can't give the exact threshold, but it
>> would be better if there were some explanation of this in the document.
> To make sure I'm understanding what you are saying, you want me to go
> into a bit more detail about the limitation and specify the
> differences between x86 and x86_64, right?

Yeah, it would be better to have one, at least to mention the different upper 
bounds.

As I replied in another post, if you really want to detail the behaviour, you
should mention "crashkernel=size[KMG][@offset[KMG]]" with @offset[KMG] specified
explicitly. After all, it's handled differently, with no upper-bound limitation.
But doing this may put the first kernel at risk of lacking low memory (some
devices require 32-bit DMA), so it must be used with care: the kernel will
assume users are aware of what they are doing and will make the reservation as
long as the given range is available.
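For reference, this is roughly the call the x86 reservation boils down to,
paraphrased from the 2016-era reserve_crashkernel() in
arch/x86/kernel/setup.c (simplified; exact code may differ by kernel version):

	/* Without ",high" the search is capped at CRASH_ADDR_LOW_MAX
	 * (896MB on x86_64, 512MB on x86_32), which is why requests
	 * approaching 800MB can fail once the kernel's own early
	 * allocations are subtracted. */
	crash_base = memblock_find_in_range(CRASH_ALIGN,	/* 16MB */
					    high ? CRASH_ADDR_HIGH_MAX
						 : CRASH_ADDR_LOW_MAX,
					    crash_size, CRASH_ALIGN);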

>
>>> I'm happy to adjust the wording, what would you recommend? Also, I'm
>>> not 100% sure that I got the cases covered correctly. I was surprised
>>> that I could not get it to work with the "new" format with the
>>> multiple ranges, and that specifying an offset wouldn't work either,
>>> although the offset kind of makes sense. Do you know for sure that it
>>> doesn't work with ranges?
>>>
>>> I tried,
>>>
>>> crashkernel=256M-1G:128M,high,1G-4G:256M,high,4G-:512M,high
>>>
>>> and
>>>
>>> crashkernel=256M-1G:128M,1G-4G:256M,4G-:512M,high
>>>
>>> and neither worked. It seems that a better separator would be ';'
>>> instead of ',' for ranges, then you could specify options better. Kind
>>> of hard to change now.
>> For "crashkernel=range1:size1[,range2:size2,...][@offset]"
>> I'm afraid it doesn't support "high" prefix in the current implementation, 
>> so there is no guarantee.
>> I guess we can drop a note to eliminate the confusion.
> I tried to express in the extended syntax section that ',high' is not
> available and you have to use the 'simple' format. Do you think this

ditto

> needs to be expanded as well?

If you really have good reasons or use cases, please try it :-)

Regards,
Xunlei

>
>
> 
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
> Signed-off-by: Robert LeBlanc 
> ---
>  Documentation/kdump/kdump.txt | 22 +-
>  1 file changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
> index b0eb27b..aa3efa8 100644
> --- a/Documentation/kdump/kdump.txt
> +++ b/Documentation/kdump/kdump.txt
> @@ -256,7 +256,9 @@ While the "crashkernel=size[@offset]" 

Re: [PATCH] Add +~800M crashkernel explaination

2016-12-14 Thread Robert LeBlanc
On Tue, Dec 13, 2016 at 8:08 PM, Xunlei Pang  wrote:
> On 12/10/2016 at 01:20 PM, Robert LeBlanc wrote:
>> On Fri, Dec 9, 2016 at 7:49 PM, Baoquan He  wrote:
>>> On 12/09/16 at 05:22pm, Robert LeBlanc wrote:
 When trying to configure crashkernel greater than about 800 MB, the
 kernel fails to allocate memory on x86 and x86_64. This is due to an
 undocumented limit that the crashkernel and other low memory items must
 be allocated below 896 MB unless the ",high" option is given. This
 updates the documentation to explain this and what I understand the
 limitations to be on the option.
>>> This is true, but not very accurate. You found it's about 800M; that's
>>> because the current kernel usually needs about 40M of space to run, plus some
>>> extra reservation before the reserve_crashkernel invocation, another ~10M.
>>> But that's the normal case; people may build modules in or have some
>>> special code that bloats the kernel. This patch makes sense to address the
>>> low|high issue, but it might not be good to state ~800M so definitively.
>> My testing showed that I could go anywhere from about 830M to 880M,
>> depending on distro, kernel version, and stuff that you mentioned. I
>> just thought some rule of thumb of when to consider using high would
>> be good. People may not think that 800 MB is 'large' when you have 512
>> GB of RAM for instance. I thought about making 512 MB be the rule of
>> thumb, but you can do a lot with ~300 MB.
>
> Hi Robert,
>
> I think you are correct.
>
> For x86, the kernel uses memblock to locate the proper range, starting from
> 16MB up to some "end"; without the "high" prefix, "end" is CRASH_ADDR_LOW_MAX,
> otherwise CRASH_ADDR_HIGH_MAX.
>
> You can find the definition for both 32-bit and 64-bit:
> #ifdef CONFIG_X86_32
> # define CRASH_ADDR_LOW_MAX (512 << 20)
> # define CRASH_ADDR_HIGH_MAX (512 << 20)
> #else
> # define CRASH_ADDR_LOW_MAX (896UL << 20)
> # define CRASH_ADDR_HIGH_MAX MAXMEM
> #endif
>
> As some memory was already allocated by the kernel, it's highly likely to get
> a reservation failure after specifying a crashkernel value near 800MB (for
> x86_64), which is what you met. We can't give the exact threshold, but it
> would be better if there were some explanation of this in the document.

To make sure I'm understanding what you are saying, you want me to go
into a bit more detail about the limitation and specify the
differences between x86 and x86_64, right?

>> I'm happy to adjust the wording, what would you recommend? Also, I'm
>> not 100% sure that I got the cases covered correctly. I was surprised
>> that I could not get it to work with the "new" format with the
>> multiple ranges, and that specifying an offset wouldn't work either,
>> although the offset kind of makes sense. Do you know for sure that it
>> doesn't work with ranges?
>>
>> I tried,
>>
>> crashkernel=256M-1G:128M,high,1G-4G:256M,high,4G-:512M,high
>>
>> and
>>
>> crashkernel=256M-1G:128M,1G-4G:256M,4G-:512M,high
>>
>> and neither worked. It seems that a better separator would be ';'
>> instead of ',' for ranges, then you could specify options better. Kind
>> of hard to change now.
>
> For "crashkernel=range1:size1[,range2:size2,...][@offset]"
> I'm afraid it doesn't support "high" prefix in the current implementation, so 
> there is no guarantee.
> I guess we can drop a note to eliminate the confusion.

I tried to express in the extended syntax section that ',high' is not
available and you have to use the 'simple' format. Do you think this
needs to be expanded as well?



Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

 Signed-off-by: Robert LeBlanc 
 ---
  Documentation/kdump/kdump.txt | 22 +-
  1 file changed, 17 insertions(+), 5 deletions(-)

 diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
 index b0eb27b..aa3efa8 100644
 --- a/Documentation/kdump/kdump.txt
 +++ b/Documentation/kdump/kdump.txt
 @@ -256,7 +256,9 @@ While the "crashkernel=size[@offset]" syntax is sufficient for most
  configurations, sometimes it's handy to have the reserved memory dependent
  on the value of System RAM -- that's mostly for distributors that pre-setup
  the kernel command line to avoid a unbootable system after some memory has
 -been removed from the machine.
 +been removed from the machine. If you need to allocate more than ~800M
 +for x86 or x86_64 then you must use the simple format as the format
 +',high' conflicts with the separators of ranges.

  The syntax is:

 @@ -282,11 +284,21 @@ Boot into System Kernel
  1) Update the boot loader (such as grub, yaboot, or lilo) configuration
 files as necessary.

 -2) Boot the system kernel with the boot parameter "crashkernel=Y@X",
 +2) Boot the system kernel with the 

Re: Need help for arm64

2016-12-14 Thread Geoff Levand

Hi,

On 12/14/2016 01:42 AM, He Zhe wrote:

Hi,

I notice the function below in kexec/arch/arm64/crashdump-arm64.c. It looks it always 
causes "kexec -p" to return:
"Memory for crashkernel is not reserved

Do we have plan to implement this?
Do we have workaround for arm64?


kexec does not yet support kdump.  Patches for it are in
review.  Check the kexec ML.

-Geoff



Re: [PATCH 1/2] arm64: Add enable/disable d-cache support for purgatory

2016-12-14 Thread Pratyush Anand



On Wednesday 14 December 2016 07:14 PM, Mark Rutland wrote:

Even for the non-kdump ie `kexec -l` case we do not have a
> functionality to bypass sha verification in kexec-tools. --lite
> option with the kexec-tools was discouraged and not accepted.

Ok. Do you have a pointer to the thread regarding that, for context?



https://lists.ozlabs.org/pipermail/petitboot/2015-October/000141.html
https://lists.ozlabs.org/pipermail/petitboot/2015-October/000136.html

~Pratyush




Re: [PATCH 1/2] arm64: Add enable/disable d-cache support for purgatory

2016-12-14 Thread Mark Rutland
Hi,

On Wed, Dec 14, 2016 at 05:51:05PM +0530, Pratyush Anand wrote:
> 
> On Wednesday 14 December 2016 05:07 PM, Mark Rutland wrote:
> >I see in an earlier message that the need for sha256 was being discussed
> >in another thread. Do either of you happen to have a pointer to that.
> 
> patch 0/2 of this series.

AFAICT, that just says that the existing sha256 check is slow, not *why*
a sha256 check of some description is necessary. I'm still at a loss as
to why it is considered necessary, rather than being a debugging aid or
sanity check.

> >To me, it seems like it doesn't come with much benefit for the kdump
> >case given that's best-effort anyway, and as above the verification code
> >could have been be corrupted. In the non-kdump case it's not strictly
> >necessary and seems like a debugging aid rather than a necessary piece
> >of functionality -- if that's the case, a 20 second delay isn't the end
> >of the world...
> 
> Even for the non-kdump ie `kexec -l` case we do not have a
> functionality to bypass sha verification in kexec-tools. --lite
> option with the kexec-tools was discouraged and not accepted.

Ok. Do you have a pointer to the thread regarding that, for context?

> So,it is 20s for both `kexec -l` and `kexec -p`.

Well, unless we can have a --{no-,}sha-check, and make the default NO
for arm64.

> Also other arch like x86_64 takes negligible time in sha verification.

That's certainly an argument for not changing the other architectures,
but given it's slow for arm64, we could have a different default...
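For reference, the check being discussed is roughly the following in
kexec-tools' purgatory (a sketch modeled on purgatory/purgatory.c; names and
details are approximate):

int verify_sha256_digest(void)
{
	struct sha256_region *ptr, *end;
	sha256_context ctx;
	sha256_digest_t digest;
	size_t i;

	/* Hash every loaded segment region and compare against the digest
	 * computed by kexec-tools at load time. */
	sha256_starts(&ctx);
	end = &sha256_regions[SHA256_REGIONS];
	for (ptr = sha256_regions; ptr < end; ptr++)
		if (ptr->len)
			sha256_update(&ctx, (uint8_t *)(uintptr_t)ptr->start,
				      ptr->len);
	sha256_finish(&ctx, digest);
	/* The ~20s on arm64 is spent in the loop above, with caches off. */
	for (i = 0; i < sizeof(digest); i++)
		if (digest[i] != sha256_digest[i])
			return 1;
	return 0;
}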

Thanks,
Mark.



[PATCH 1/2] eppic: vhost_net_buffers - Adopt to struct sk_buff changes

2016-12-14 Thread Kamalesh Babulal
Linux kernel commit 56b174256b69 ("net: add rbnode to struct sk_buff")
moves sk_buff->next into a union of sk_buff->{next/prev/tstamp/rb_node}.
Introduce this structure member change while traversing the socket
buffer list.

Signed-off-by: Kamalesh Babulal 
---
 eppic_scripts/vhost_net_buffers.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/eppic_scripts/vhost_net_buffers.c b/eppic_scripts/vhost_net_buffers.c
index 39ae595..1260acb 100644
--- a/eppic_scripts/vhost_net_buffers.c
+++ b/eppic_scripts/vhost_net_buffers.c
@@ -45,7 +45,10 @@ vhost_net(struct vhost_net *net)
memset((char *)&(buff->data_len), 'L', 0x4);
}
 
-   next = buff->next;
+   /*
+* .next is the first entry.
+*/
+   next = (struct sk_buff *)(unsigned long)*buff;
}
 
head = (struct sk_buff_head *)&(sk->sk_write_queue);
@@ -60,8 +63,10 @@ vhost_net(struct vhost_net *net)
memset((char *)&(buff->data_len), 'L', 0x4);
}
 
-   next = buff->next;
-
+   /*
+* .next is the first entry.
+*/
+   next = (struct sk_buff *)(unsigned long)*buff;
}
}
 }
-- 
2.7.4
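A small stand-alone illustration of what the "*buff" dereference in the script
relies on: with the anonymous union described in the commit, the next pointer
sits at offset 0 of the object. The struct below is a hypothetical, simplified
stand-in, not the real struct sk_buff.

#include <assert.h>

struct skb_demo {
	union {
		struct {
			struct skb_demo *next;
			struct skb_demo *prev;
		};
		void *rbnode_placeholder;	/* stands in for struct rb_node */
	};
	unsigned int data_len;
};

int main(void)
{
	struct skb_demo a = { 0 }, b = { 0 };

	a.next = &b;
	/* Reading the first pointer-sized word of the object yields ->next. */
	assert(*(struct skb_demo **)&a == a.next);
	return 0;
}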




[PATCH 2/2] eppic: dir_names - Convert mnt_hash to hlist from list

2016-12-14 Thread Kamalesh Babulal
Linux kernel commit 38129a13e6e7 ("switch mnt_hash to hlist")
moves mnt_hash from list_head to hlist_node. Introduce these
list type changes to iterate over the mounted filesystem and
walk dentries.

Signed-off-by: Kamalesh Babulal 
---
 eppic_scripts/dir_names.c | 38 +-
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/eppic_scripts/dir_names.c b/eppic_scripts/dir_names.c
index dbe6d00..6f6eb41 100644
--- a/eppic_scripts/dir_names.c
+++ b/eppic_scripts/dir_names.c
@@ -26,9 +26,12 @@ void
 rm_names(struct dentry *dir)
 {
struct list_head *next, *head;
+   unsigned int hash_len;
+   int i;
 
memset(dir->d_iname, 0, 0x20);
-   memset(dir->d_name.name, 0, 0x20);
+   hash_len = *((unsigned int *)&dir->d_name);
+   memset(dir->d_name.name, 0, hash_len);
 
head = (struct list_head *)&(dir->d_subdirs);
next = (struct list_head *)dir->d_subdirs.next;
@@ -37,9 +40,9 @@ rm_names(struct dentry *dir)
{
struct dentry *child, *off = 0;
 
-   child = (struct dentry *)((unsigned long)next - (unsigned long)&(off->d_u));
+   child = (struct dentry *)((unsigned long)next - (unsigned long)&(off->d_child));
rm_names(child);
-   next = child->d_u.d_child.next;
+   next = child->d_child.next;
}
 
return;
@@ -49,29 +52,30 @@ int
 vfs()
 {
int i;
-   struct list_head *tab;
+   struct hlist_bl_head *tab;
+   unsigned int d_hash_size = d_hash_mask;
 
-   tab = (struct list_head *)mount_hashtable;
+   tab = (struct hlist_bl_head *)dentry_hashtable;
 
-   for (i = 0; i < 256; i++)
+   for (i = 0; i < d_hash_size; i++)
{
-   struct list_head *head, *next;
-
-   head = (struct list_head *) (tab + i);
-   next = (struct list_head *) head->next;
+   struct hlist_bl_head *head;
+   struct hlist_bl_node *head_node, *next;
 
-   if (!next)
+   head = (struct hlist_bl_head *) (tab + i);
+   head_node = head->first;
+   if (!head_node)
continue;
 
-   while (next != head)
+   next = head_node;
+
+   while (next)
{
-   struct mount *mntfs;
-   struct dentry *root;
+   struct dentry *root, *off = 0;
 
-   mntfs = (struct mount *)((unsigned long)next);
-   root = (struct dentry *)mntfs->mnt.mnt_root;
+   root = (struct dentry *)((unsigned long)next - (unsigned long)&(off->d_hash));
rm_names(root);
-   next = mntfs->mnt_hash.next;
+   next = next->next;
}
}
return 1;
-- 
2.7.4




Re: [PATCH 1/2] arm64: Add enable/disable d-cache support for purgatory

2016-12-14 Thread Pratyush Anand



On Wednesday 14 December 2016 05:07 PM, Mark Rutland wrote:

On Wed, Dec 14, 2016 at 11:16:17AM +, James Morse wrote:

Hi Pratyush,

On 14/12/16 10:12, Pratyush Anand wrote:

On Wednesday 14 December 2016 03:08 PM, Pratyush Anand wrote:

I would go as far as to generate the page tables at 'kexec -l' time,
and only if


Ok..So you mean that I create a new section which will have page table
entries mapping physicalmemory represented by remaining section, and
then purgatory can just enable mmu with page table from that section,
right? Seems doable. can do that.


I see a problem here. If we create  page table as a new segment then, how can we
verify in purgatory that sha for page table is correct? We need page table
before sha verification start,and we can not rely the page table created by
first kernel until it's sha is verified. So a chicken-egg problem.


There is more than one of those! What happens if your sha256 calculation code is
corrupted? You have to run it before you know. The same goes for all the
purgatory code.

This is why I think its better to do this in the kernel before we exit to
purgatory, but obviously that doesn't work for kdump.


I see in an earlier message that the need for sha256 was being discussed
in another thread. Do either of you happen to have a pointer to that.



patch 0/2 of this series.


To me, it seems like it doesn't come with much benefit for the kdump
case given that's best-effort anyway, and as above the verification code
could have been corrupted. In the non-kdump case it's not strictly
necessary and seems like a debugging aid rather than a necessary piece
of functionality -- if that's the case, a 20 second delay isn't the end
of the world...



Even for the non-kdump, i.e. `kexec -l`, case we do not have functionality
to bypass sha verification in kexec-tools. A --lite option for kexec-tools
was discouraged and not accepted. So, it is 20s for both
`kexec -l` and `kexec -p`.

Also other arch like x86_64 takes negligible time in sha verification.

~Pratyush



Re: [PATCH 1/2] arm64: Add enable/disable d-cache support for purgatory

2016-12-14 Thread James Morse
Hi Mark,

On 14/12/16 11:37, Mark Rutland wrote:
> On Wed, Dec 14, 2016 at 11:16:17AM +, James Morse wrote:
>> On 14/12/16 10:12, Pratyush Anand wrote:
>>> On Wednesday 14 December 2016 03:08 PM, Pratyush Anand wrote:
> I would go as far as to generate the page tables at 'kexec -l' time,
> and only if

 Ok..So you mean that I create a new section which will have page table
 entries mapping physicalmemory represented by remaining section, and
 then purgatory can just enable mmu with page table from that section,
 right? Seems doable. can do that.
>>>
>>> I see a problem here. If we create  page table as a new segment then, how 
>>> can we
>>> verify in purgatory that sha for page table is correct? We need page table
>>> before sha verification start,and we can not rely the page table created by
>>> first kernel until it's sha is verified. So a chicken-egg problem.
>>
>> There is more than one of those! What happens if your sha256 calculation 
>> code is
>> corrupted? You have to run it before you know. The same goes for all the
>> purgatory code.
>>
>> This is why I think its better to do this in the kernel before we exit to
>> purgatory, but obviously that doesn't work for kdump.
> 
> I see in an earlier message that the need for sha256 was being discussed
> in another thread. Do either of you happen to have a pointer to that.

https://www.spinics.net/lists/arm-kernel/msg544472.html


Thanks,

James



Re: [PATCH 1/2] arm64: Add enable/disable d-cache support for purgatory

2016-12-14 Thread Pratyush Anand

Hi James,

Thanks for your input !!

On Wednesday 14 December 2016 04:46 PM, James Morse wrote:

Hi Pratyush,

On 14/12/16 10:12, Pratyush Anand wrote:

> On Wednesday 14 December 2016 03:08 PM, Pratyush Anand wrote:

>>> I would go as far as to generate the page tables at 'kexec -l' time,
>>> and only if

>>
>> Ok..So you mean that I create a new section which will have page table
>> entries mapping physicalmemory represented by remaining section, and
>> then purgatory can just enable mmu with page table from that section,
>> right? Seems doable. can do that.

>
> I see a problem here. If we create  page table as a new segment then, how can 
we
> verify in purgatory that sha for page table is correct? We need page table
> before sha verification start,and we can not rely the page table created by
> first kernel until it's sha is verified. So a chicken-egg problem.

There is more than one of those! What happens if your sha256 calculation code is
corrupted? You have to run it before you know. The same goes for all the
purgatory code.



OK, seems reasonable... will do it in kexec code.


This is why I think its better to do this in the kernel before we exit to
purgatory, but obviously that doesn't work for kdump.



> I think, creating page table will just take fraction of second and should be
> good even in purgatory, What do you say?

If it's for kdump its best-effort. I think its easier/simpler to generate and
debug them at 'kexec -l' time, but if you're worried about the increased area
that could be corrupted then do it in purgatory.



~Pratyush



Re: [PATCH 1/2] arm64: Add enable/disable d-cache support for purgatory

2016-12-14 Thread Mark Rutland
On Wed, Dec 14, 2016 at 11:16:17AM +, James Morse wrote:
> Hi Pratyush,
> 
> On 14/12/16 10:12, Pratyush Anand wrote:
> > On Wednesday 14 December 2016 03:08 PM, Pratyush Anand wrote:
> >>> I would go as far as to generate the page tables at 'kexec -l' time,
> >>> and only if
> >>
> >> Ok..So you mean that I create a new section which will have page table
> >> entries mapping physicalmemory represented by remaining section, and
> >> then purgatory can just enable mmu with page table from that section,
> >> right? Seems doable. can do that.
> > 
> > I see a problem here. If we create  page table as a new segment then, how 
> > can we
> > verify in purgatory that sha for page table is correct? We need page table
> > before sha verification start,and we can not rely the page table created by
> > first kernel until it's sha is verified. So a chicken-egg problem.
> 
> There is more than one of those! What happens if your sha256 calculation code 
> is
> corrupted? You have to run it before you know. The same goes for all the
> purgatory code.
> 
> This is why I think its better to do this in the kernel before we exit to
> purgatory, but obviously that doesn't work for kdump.

I see in an earlier message that the need for sha256 was being discussed
in another thread. Do either of you happen to have a pointer to that.

To me, it seems like it doesn't come with much benefit for the kdump
case given that's best-effort anyway, and as above the verification code
could have been corrupted. In the non-kdump case it's not strictly
necessary and seems like a debugging aid rather than a necessary piece
of functionality -- if that's the case, a 20 second delay isn't the end
of the world...

Thanks,
Mark.



Re: [PATCH 1/2] arm64: Add enable/disable d-cache support for purgatory

2016-12-14 Thread James Morse
Hi Pratyush,

On 14/12/16 09:38, Pratyush Anand wrote:
> On Saturday 26 November 2016 12:00 AM, James Morse wrote:
>> On 22/11/16 04:32, Pratyush Anand wrote:
>>> This patch adds support to enable/disable d-cache, which can be used for
>>> faster purgatory sha256 verification.
>>
>> (I'm not clear why we want the sha256, but that is being discussed elsewhere 
>> on
>>  the thread)
>>
>>
>>> We are supporting only 4K and 64K page sizes. This code will not work if a
>>> hardware is not supporting at least one of these page sizes.  Therefore,
>>> D-cache is disabled by default and enabled only when "enable-dcache" is
>>> passed to the kexec().
>>
>> I don't think the maybe-4K/maybe-64K/maybe-neither logic is needed. It would 
>> be
>> a lot simpler to only support one page size, which should be 4K as that is 
>> what
>> UEFI requires. (If there are CPUs that only support one size, I bet its 4K!)
> 
> Ok.. So, I will implement a new version after considering that 4K will always 
> be
> supported. If 4K is not supported by hw(which is very unlikely) then there 
> would
> be no d-cache enabling feature.

Sounds good to me. I think it's important to keep the purgatory code as small
and as simple as possible as it's very hard to debug. If we do get bug reports
they are likely to be 'it did nothing', with no further details. If it only
fails on some platform we don't have access to, it's basically impossible.


>> I would go as far as to generate the page tables at 'kexec -l' time, and 
>> only if
> 
> Ok..So you mean that I create a new section which will have page table entries
> mapping physicalmemory represented by remaining section, and then purgatory 
> can
> just enable mmu with page table from that section, right? Seems doable. can do
> that.
> 
>> '/sys/firmware/efi' exists to indicate we booted via UEFI. (and therefore 
>> must
>> support 4K pages). This would keep the purgatory code as simple as possible.
> 
> What about reading ID_AA64MMFR0_EL1 instead of /sys/firmware/efi? That can 
> also
> tell us that whether 4K is supported or not?

If you're doing it at EL1/EL2 in the purgatory code, sure. But if you generate
the page tables at 'kexec -l' time you can't read this register from EL0 so you
need another way to guess if 4K pages are supported (or just assume they are and
test that register once you're in purgatory).
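(A hedged sketch of that in-purgatory test — hypothetical helper, not from the
posted series; ID_AA64MMFR0_EL1 is only readable at EL1/EL2, not from
kexec-tools at EL0:)

static int supports_4k_granule(void)
{
	unsigned long mmfr0;

	/* ID_AA64MMFR0_EL1.TGran4 is bits [31:28]; 0b0000 = 4KB supported. */
	asm volatile("mrs %0, ID_AA64MMFR0_EL1" : "=r" (mmfr0));
	return ((mmfr0 >> 28) & 0xf) == 0;
}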

I was looking for some way to print a message at 'kexec -l' time that the sha256
would be slow as 4K wasn't supported. (a message printed at any other time won't
get seen).


>>> +/*
>>> + *disable_dcache: Disable D-cache and flush RAM locations
>>> + *ram_start - Start address of RAM
>>> + *ram_end - End address of RAM
>>> + */
>>> +void disable_dcache(uint64_t ram_start, uint64_t ram_end)
>>> +{
>>> +switch(get_current_el()) {
>>> +case 2:
>>> +reset_sctlr_el2();
>>> +break;
>>> +case 1:
>>> +reset_sctlr_el1();
>>
>> You have C code running between disabling the MMU and cleaning the cache. The
>> compiler is allowed to move data on and off the stack in here, but after
>> disabling the MMU it will see whatever was on the stack before we turned the 
>> MMU
>> on. Any data written at the beginning of this function is left in the caches.
>>
>> I'm afraid this sort of stuff needs to be done in assembly!
> 
> All these routines are self coded in assembly even though they are called
> from C, so should be safe I think. Anyway, I can keep all of them in
> assembly as well.

You can't tell the compiler that the stack data is inaccessible until the dcache
clean call completes. Some future version may do really crazy things in here.
You can decompile what your compiler version produces to check it doesn't
load/store to the stack, but that doesn't mean my compiler version does the
same. This is the kind of thing that is extremely difficult to debug, its best
not to take the risk.


Thanks,

James




Re: [PATCH 1/2] arm64: Add enable/disable d-cache support for purgatory

2016-12-14 Thread Pratyush Anand



On Wednesday 14 December 2016 03:08 PM, Pratyush Anand wrote:




I would go as far as to generate the page tables at 'kexec -l' time,
and only if


Ok..So you mean that I create a new section which will have page table
entries mapping physicalmemory represented by remaining section, and
then purgatory can just enable mmu with page table from that section,
right? Seems doable. can do that.


I see a problem here. If we create the page table as a new segment, then
how can we verify in purgatory that the sha for the page table is correct? We
need the page table before sha verification starts, and we cannot rely on the
page table created by the first kernel until its sha is verified. So it is a
chicken-and-egg problem.


I think creating the page table will just take a fraction of a second and
should be fine even in purgatory. What do you say?


~Pratyush



Need help for arm64

2016-12-14 Thread He Zhe
Hi,

I notice the function below in kexec/arch/arm64/crashdump-arm64.c. It looks like
it always causes "kexec -p" to return:
"Memory for crashkernel is not reserved
Please reserve memory by passing"crashkernel=X@Y" parameter to kernel
Then try to loading kdump kernel"

int is_crashkernel_mem_reserved(void)
{
	return 0;
}

Do we have plan to implement this?
Do we have workaround for arm64?

Thanks,
Zhe




Re: [PATCH 1/2] arm64: Add enable/disable d-cache support for purgatory

2016-12-14 Thread Pratyush Anand

Hi James,

Thanks a lot for your review. It's helpful.

On Saturday 26 November 2016 12:00 AM, James Morse wrote:

Hi Pratyush,

(CC: Mark, mismatched memory attributes in paragraph 3?)

On 22/11/16 04:32, Pratyush Anand wrote:

This patch adds support to enable/disable d-cache, which can be used for
faster purgatory sha256 verification.


(I'm not clear why we want the sha256, but that is being discussed elsewhere on
 the thread)



We are supporting only 4K and 64K page sizes. This code will not work if a
hardware is not supporting at least one of these page sizes.  Therefore,
D-cache is disabled by default and enabled only when "enable-dcache" is
passed to the kexec().


I don't think the maybe-4K/maybe-64K/maybe-neither logic is needed. It would be
a lot simpler to only support one page size, which should be 4K as that is what
UEFI requires. (If there are CPUs that only support one size, I bet its 4K!)


Ok.. So, I will implement a new version after considering that 4K will 
always be supported. If 4K is not supported by hw (which is very 
unlikely) then there would be no d-cache enabling feature.




I would go as far as to generate the page tables at 'kexec -l' time, and only if


Ok..So you mean that I create a new section which will have page table 
entries mapping physicalmemory represented by remaining section, and 
then purgatory can just enable mmu with page table from that section, 
right? Seems doable. can do that.



'/sys/firmware/efi' exists to indicate we booted via UEFI. (and therefore must
support 4K pages). This would keep the purgatory code as simple as possible.


What about reading ID_AA64MMFR0_EL1 instead of /sys/firmware/efi? That
can also tell us whether 4K is supported or not?




I don't think the performance difference between 4K and 64K page sizes will be
measurable, is purgatory really performance sensitive code?


I agree, implementing only 4K will make it very simple.





Since this is an identity mapped system, so VA_BITS will be same as max PA
bits supported. If VA_BITS <= 42 for 64K and <= 39 for 4K then only one
level of page table will be there with block descriptor entries.
Otherwise, For 4K mapping, TTBR points to level 0 lookups, which will have
only table entries pointing to a level 1 lookup. Level 1 will have only
block entries which will map 1GB block. For 64K mapping, TTBR points to
level 1 lookups, which will have only table entries pointing to a level 2
lookup. Level 2 will have only block entries which will map 512MB block. If


This is more complexity to pick a VA size. Why not always use the maximum 48bit
VA? The cost is negligible compared to having simpler (easier to review!)
purgatory code.

By always using 1GB blocks you may be creating aliases with mismatched 
attributes:
* If kdump only reserves 128MB, your 1GB mapping will alias whatever else was
  in the same 1GB of address space. This could be a reserved region with some
  other memory attributes.
* With kdump, we may have failed to park the other CPUs if they are executing
  with interrupts masked and haven't yet handled the smp_send_stop() IPI.
* One of these other CPUs could be reading/writing in this area as it doesn't
  belong to the kdump reserved area, just happens to be in the same 1GB.

I need to dig through the ARM-ARM to find out what happens next, but I'm pretty
sure this is well into the "don't do that" territory.


It would be much better to force the memory areas to be a multiple of 2MB and
2MB aligned, which will allow you to use 2M section mappings for memory, (but
not the uart). This way we only map regions we had reserved and know are memory.



OK. So, 48 bit VA, 4K page size, 3 level page table with entries in 3rd 
level representing 2M block size.
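For illustration, a minimal sketch of the block descriptor such a 2MB identity
mapping would use — the attribute bit values here are assumptions for normal,
inner-shareable cacheable memory and are not taken from the posted patch:

#define PTE_TYPE_BLOCK	(1UL << 0)	/* descriptor bits[1:0] = 0b01: block */
#define PTE_ATTRINDX(i)	((unsigned long)(i) << 2)
#define PTE_SH_INNER	(3UL << 8)	/* inner shareable */
#define PTE_AF		(1UL << 10)	/* access flag set, no faulting */

/* Build a 2MB block descriptor for an identity mapping; mair_idx is
 * whatever MAIR slot purgatory programs for Normal memory. */
static unsigned long block_desc_2m(unsigned long pa, unsigned int mair_idx)
{
	return (pa & ~((1UL << 21) - 1)) | PTE_ATTRINDX(mair_idx) |
	       PTE_SH_INNER | PTE_AF | PTE_TYPE_BLOCK;
}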







UART base address and RAM addresses are not at least 1GB and 512MB apart
for 4K and 64K respectively, then mapping result could be unpredictable. In
that case we need to support one more level of granularity, but until
someone needs that keep it like this only.

We can not allocate dynamic memory in purgatory. Therefore we keep page
table allocation size fixed as (3 * MAX_PAGE_SIZE). (page_table) points to
first level (having only table entries) and (page_table + MAX_PAGE_SIZE)
points to table at next level (having block entries).  If index for RAM
area and UART area in first table is not same, then we will need another
next level table which will be located at (page_table + 2 * MAX_PAGE_SIZE).




diff --git a/purgatory/arch/arm64/cache-asm.S b/purgatory/arch/arm64/cache-asm.S
new file mode 100644
index ..bef97ef4
--- /dev/null
+++ b/purgatory/arch/arm64/cache-asm.S
@@ -0,0 +1,186 @@
+/*
+ * Some of the routines have been copied from Linux Kernel, therefore
+ * copying the license as well.
+ *
+ * Copyright (C) 2001 Deep Blue Solutions Ltd.
+ * Copyright (C) 2012 ARM Ltd.
+ * Copyright (C) 2015 Pratyush Anand 
+ *
+ * This program is free software; you can redistribute it and/or 

Re: [PATCH 1/2] kexec: add a dummy note for each offline cpu

2016-12-14 Thread Liu ping fan
[...]
>>>
>> No. This patch just place a mark on these offline cpu. The next patch
>> for capture kernel will recognize this case, and ignore this kind of
>> pt_note by the code:
>> real_sz = 0; // although the size of this kind of PT_NOTE is not zero,
>> but it contains nothing useful, so just ignore it
>> phdr_ptr->p_memsz = real_sz
>
> If there is any other vmcore functional issue besides throwing "Warning: Zero 
> PT_NOTE entries found"?
>
Not at present, from what I saw while debugging.
I just think we cannot assume the behaviour of different archs, so
just mark the dummy pt_note. If some archs want to use this note
memory, they will simply overwrite the dummy.

Thx,
Pingfan



Re: [PATCH 1/2] kexec: add a dummy note for each offline cpu

2016-12-14 Thread Xunlei Pang
On 12/14/2016 at 04:56 PM, Liu ping fan wrote:
> On Wed, Dec 14, 2016 at 4:48 PM, Xunlei Pang  wrote:
>> On 12/14/2016 at 02:11 PM, Pingfan Liu wrote:
>>> kexec-tools always allocates program headers for each possible cpu. This
>>> incurs zero PT_NOTE for offline cpu. We mark this case so that later,
>>> the capture kernel can distinguish it from the mistake of allocated
>>> program header.
>>> The counterpart of the capture kernel comes in next patch.
>> Hmm, we can initialize the cpu crash note buf in crash_notes_memory_init(), 
>> needless
>> to do it at the crash moment, right?
>>
> The cpus can be on-off-on.., We can not know the user's action.

I meant we can add the fake note into the cpu note buf; then, when the crash
happens, the online ones will be overwritten with the real note data, while the
others (!online) will still hold the fake note.
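A rough sketch of that alternative (simplified and hypothetical, not a posted
patch — the real crash_notes_memory_init() in kernel/kexec_core.c sizes the
per-cpu buffer differently):

/* Hypothetical variant of crash_notes_memory_init(): pre-fill every
 * possible cpu's note buffer with the dummy note, so at crash time only
 * online cpus overwrite it with real register data. */
static int __init crash_notes_memory_init(void)
{
	unsigned int cpu;
	u32 *buf;

	crash_notes = alloc_percpu(note_buf_t);
	if (!crash_notes) {
		pr_warn("Memory allocation for saving cpu register states failed\n");
		return -ENOMEM;
	}
	for_each_possible_cpu(cpu) {
		buf = (u32 *)per_cpu_ptr(crash_notes, cpu);
		buf = append_elf_note(buf, KEXEC_CORE_NOTE_NAME, NT_DUMMY,
				      NULL, 0);
		final_note(buf);
	}
	return 0;
}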

>
>> BTW, does this cause any issue, for example the crash utility can't parse 
>> the vmcore
>> properly? or just reproduce lots of warnings after offline multiple cpus?
>>
> No. This patch just place a mark on these offline cpu. The next patch
> for capture kernel will recognize this case, and ignore this kind of
> pt_note by the code:
> real_sz = 0; // although the size of this kind of PT_NOTE is not zero,
> but it contains nothing useful, so just ignore it
> phdr_ptr->p_memsz = real_sz

Is there any other vmcore functional issue besides throwing "Warning: Zero
PT_NOTE entries found"?

Regards,
Xunlei



Re: [PATCH 1/2] kexec: add a dummy note for each offline cpu

2016-12-14 Thread Liu ping fan
[...]
>> >> >
>> >> > When you execute dmesg on your testing machine and grep nr_cpu_ids,
>> >> > what's the value of nr_cpu_ids?
>> >> >
>> >> nr_cpu_ids=128
>> >
>> > And what's the cpu number of in "lscpu" command?
>>
>> NUMA node1 CPU(s): 0-7
>>
>> The system booted up with 128 possible cpu and only 8 online.
>> Also I tested on x86 guest, after bootup with 8 cpus, then  offline 4
>> of them, the zero PT_NOTE warning buzz too.
>
> Yes, this is what I think not quite appropriate using
> for_each_cpu_not(cpu, cpu_online_mask). Maybe it need try to save on
> those cpus which is present but not online. not online seems not good,
> it's not reasonable to save those getting apic but no cpu plugged.
>
In the file kexec-tools/kexec/crashdump-elf.c:
   nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
And this is why this patch needs to make a mark on these offline cpus.
There is no "_SC_NPROCESSORS_PRESENT"-like option, so just
work around it on the kernel side.
Anyway, for the crash kernel we only write "core" in the percpu notes with
no more info, and it costs nothing when the capture kernel gathers the
PT_NOTE.

Thx,
Pingfan



Re: [PATCH 1/2] kexec: add a dummy note for each offline cpu

2016-12-14 Thread Liu ping fan
On Wed, Dec 14, 2016 at 4:48 PM, Xunlei Pang  wrote:
> On 12/14/2016 at 02:11 PM, Pingfan Liu wrote:
>> kexec-tools always allocates program headers for each possible cpu. This
>> incurs zero PT_NOTE for offline cpu. We mark this case so that later,
>> the capture kernel can distinguish it from the mistake of allocated
>> program header.
>> The counterpart of the capture kernel comes in next patch.
>
> Hmm, we can initialize the cpu crash note buf in crash_notes_memory_init(), 
> needless
> to do it at the crash moment, right?
>
The cpus can go on-off-on...; we cannot know the user's actions.

> BTW, does this cause any issue, for example the crash utility can't parse the 
> vmcore
> properly? or just reproduce lots of warnings after offline multiple cpus?
>
No. This patch just places a mark on these offline cpus. The next patch
for the capture kernel will recognize this case, and ignore this kind of
pt_note by the code:
real_sz = 0; // although the size of this kind of PT_NOTE is not zero,
// it contains nothing useful, so just ignore it
phdr_ptr->p_memsz = real_sz;

Thx,
Pingfan



Re: [PATCH 1/2] kexec: add a dummy note for each offline cpu

2016-12-14 Thread Xunlei Pang
On 12/14/2016 at 02:11 PM, Pingfan Liu wrote:
> kexec-tools always allocates program headers for each possible cpu. This
> incurs zero PT_NOTE for offline cpu. We mark this case so that later,
> the capture kernel can distinguish it from the mistake of allocated
> program header.
> The counterpart of the capture kernel comes in next patch.

Hmm, we can initialize the cpu crash note buf in crash_notes_memory_init();
there is no need to do it at the crash moment, right?

BTW, does this cause any issue, for example the crash utility can't parse the
vmcore properly? Or does it just produce lots of warnings after offlining
multiple cpus?

Regards,
Xunlei

>
> Signed-off-by: Pingfan Liu 
> ---
> This unnecessary warning shows up on all archs when there are offline cpus
>
>  include/uapi/linux/elf.h | 1 +
>  kernel/kexec_core.c  | 9 +
>  2 files changed, 10 insertions(+)
>
> diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
> index b59ee07..9744f1e 100644
> --- a/include/uapi/linux/elf.h
> +++ b/include/uapi/linux/elf.h
> @@ -367,6 +367,7 @@ typedef struct elf64_shdr {
>   * using the corresponding note types via the PTRACE_GETREGSET and
>   * PTRACE_SETREGSET requests.
>   */
> +#define NT_DUMMY 0
>  #define NT_PRSTATUS  1
>  #define NT_PRFPREG   2
>  #define NT_PRPSINFO  3
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 5616755..aeac16e 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -891,9 +891,12 @@ void __crash_kexec(struct pt_regs *regs)
>   if (mutex_trylock(&kexec_mutex)) {
>   if (kexec_crash_image) {
>   struct pt_regs fixed_regs;
> + unsigned int cpu;
>  
>   crash_setup_regs(&fixed_regs, regs);
>   crash_save_vmcoreinfo();
> + for_each_cpu_not(cpu, cpu_online_mask)
> + crash_save_cpu(NULL, cpu);
>   machine_crash_shutdown(&fixed_regs);
>   machine_kexec(kexec_crash_image);
>   }
> @@ -1040,6 +1043,12 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
>   buf = (u32 *)per_cpu_ptr(crash_notes, cpu);
>   if (!buf)
>   return;
> + if (regs == NULL) {
> + buf = append_elf_note(buf, KEXEC_CORE_NOTE_NAME, NT_DUMMY,
> + NULL, 0);
> + final_note(buf);
> + return;
> + }
>   memset(&prstatus, 0, sizeof(prstatus));
>   prstatus.pr_pid = current->pid;
>   elf_core_copy_kernel_regs(&prstatus.pr_reg, regs);




Re: [PATCH 1/2] kexec: add a dummy note for each offline cpu

2016-12-14 Thread Baoquan He
On 12/14/16 at 04:39pm, Liu ping fan wrote:
> On Wed, Dec 14, 2016 at 4:25 PM, Baoquan He  wrote:
> > On 12/14/16 at 04:15pm, Liu ping fan wrote:
> >> On Wed, Dec 14, 2016 at 3:40 PM, Baoquan He  wrote:
> >> > On 12/14/16 at 02:11pm, Pingfan Liu wrote:
> >> >> kexec-tools always allocates program headers for each possible cpu. This
> >> >> incurs zero PT_NOTE for offline cpu. We mark this case so that later,
> >> >> the capture kernel can distinguish it from the mistake of allocated
> >> >> program header.
> >> >> The counterpart of the capture kernel comes in next patch.
> >> >
> >> > When you execute dmesg on your testing machine and grep nr_cpu_ids,
> >> > what's the value of nr_cpu_ids?
> >> >
> >> nr_cpu_ids=128
> >
> > And what's the cpu number of in "lscpu" command?
> 
> NUMA node1 CPU(s): 0-7
> 
> The system booted up with 128 possible cpu and only 8 online.
> Also I tested on x86 guest, after bootup with 8 cpus, then  offline 4
> of them, the zero PT_NOTE warning buzz too.

Yes, this is why I think using for_each_cpu_not(cpu, cpu_online_mask) is
not quite appropriate. Maybe it needs to try to save on those cpus which are
present but not online. "Not online" does not seem good; it's not reasonable
to save for those that got an apic entry but have no cpu plugged in.

Thanks
Baoquan




Re: [PATCH 1/2] kexec: add a dummy note for each offline cpu

2016-12-14 Thread Liu ping fan
On Wed, Dec 14, 2016 at 4:25 PM, Baoquan He  wrote:
> On 12/14/16 at 04:15pm, Liu ping fan wrote:
>> On Wed, Dec 14, 2016 at 3:40 PM, Baoquan He  wrote:
>> > On 12/14/16 at 02:11pm, Pingfan Liu wrote:
>> >> kexec-tools always allocates program headers for each possible cpu. This
>> >> incurs zero PT_NOTE for offline cpu. We mark this case so that later,
>> >> the capture kernel can distinguish it from the mistake of allocated
>> >> program header.
>> >> The counterpart of the capture kernel comes in next patch.
>> >
>> > When you execute dmesg on your testing machine and grep nr_cpu_ids,
>> > what's the value of nr_cpu_ids?
>> >
>> nr_cpu_ids=128
>
> And what's the cpu number of in "lscpu" command?

NUMA node1 CPU(s): 0-7

The system booted up with 128 possible cpus and only 8 online.
I also tested on an x86 guest: after booting up with 8 cpus and then offlining
4 of them, the zero PT_NOTE warning shows up too.



Re: [PATCH 1/2] kexec: add a dummy note for each offline cpu

2016-12-14 Thread Baoquan He
On 12/14/16 at 04:15pm, Liu ping fan wrote:
> On Wed, Dec 14, 2016 at 3:40 PM, Baoquan He  wrote:
> > On 12/14/16 at 02:11pm, Pingfan Liu wrote:
> >> kexec-tools always allocates program headers for each possible cpu. This
> >> incurs zero PT_NOTE for offline cpu. We mark this case so that later,
> >> the capture kernel can distinguish it from the mistake of allocated
> >> program header.
> >> The counterpart of the capture kernel comes in next patch.
> >
> > When you execute dmesg on your testing machine and grep nr_cpu_ids,
> > what's the value of nr_cpu_ids?
> >
> nr_cpu_ids=128

And what's the cpu number of in "lscpu" command?



Re: [PATCH 1/2] kexec: add a dummy note for each offline cpu

2016-12-14 Thread Liu ping fan
On Wed, Dec 14, 2016 at 3:40 PM, Baoquan He  wrote:
> On 12/14/16 at 02:11pm, Pingfan Liu wrote:
>> kexec-tools always allocates program headers for each possible cpu. This
>> incurs zero PT_NOTE for offline cpu. We mark this case so that later,
>> the capture kernel can distinguish it from the mistake of allocated
>> program header.
>> The counterpart of the capture kernel comes in next patch.
>
> When you execute dmesg on your testing machine and grep nr_cpu_ids,
> what's the value of nr_cpu_ids?
>
nr_cpu_ids=128
