Re: [PATCH 1/2] kexec: add a dummy note for each offline cpu
Hi, Pingfan

On 12/14/16 at 02:11pm, Pingfan Liu wrote:
> kexec-tools always allocates program headers for each possible cpu. This
> incurs zero PT_NOTE for offline cpu. We mark this case so that later,
> the capture kernel can distinguish it from the mistake of allocated
> program header.
> The counterpart of the capture kernel comes in next patch.

I thought you saw the warnings on ppc64 and it might be a ppc64 issue. But if
this is instead a general issue, can we think about whether this is really
necessary? Does it have any side effect other than the warning messages? If
there is nothing bad other than the warnings, maybe leaving it as is would be
a better way.

>
> Signed-off-by: Pingfan Liu
> ---
> This unnecessary warning buzzes on all archs when there is an offline cpu
>
>  include/uapi/linux/elf.h | 1 +
>  kernel/kexec_core.c      | 9 +++++++++
>  2 files changed, 10 insertions(+)
>
> diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
> index b59ee07..9744f1e 100644
> --- a/include/uapi/linux/elf.h
> +++ b/include/uapi/linux/elf.h
> @@ -367,6 +367,7 @@ typedef struct elf64_shdr {
>   * using the corresponding note types via the PTRACE_GETREGSET and
>   * PTRACE_SETREGSET requests.
>   */
> +#define NT_DUMMY	0
>  #define NT_PRSTATUS	1
>  #define NT_PRFPREG	2
>  #define NT_PRPSINFO	3
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 5616755..aeac16e 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -891,9 +891,12 @@ void __crash_kexec(struct pt_regs *regs)
>  	if (mutex_trylock(&kexec_mutex)) {
>  		if (kexec_crash_image) {
>  			struct pt_regs fixed_regs;
> +			unsigned int cpu;
>
>  			crash_setup_regs(&fixed_regs, regs);
>  			crash_save_vmcoreinfo();
> +			for_each_cpu_not(cpu, cpu_online_mask)
> +				crash_save_cpu(NULL, cpu);
>  			machine_crash_shutdown(&fixed_regs);
>  			machine_kexec(kexec_crash_image);
>  		}
> @@ -1040,6 +1043,12 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
>  	buf = (u32 *)per_cpu_ptr(crash_notes, cpu);
>  	if (!buf)
>  		return;
> +	if (regs == NULL) {
> +		buf = append_elf_note(buf, KEXEC_CORE_NOTE_NAME, NT_DUMMY,
> +				      NULL, 0);
> +		final_note(buf);
> +		return;
> +	}
>  	memset(&prstatus, 0, sizeof(prstatus));
>  	prstatus.pr_pid = current->pid;
>  	elf_core_copy_kernel_regs(&prstatus.pr_reg, regs);
> --
> 2.7.4

Thanks
Dave

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH 2/2] [fs] proc/vmcore: check the dummy place holder for offline cpu to avoid warning
On Thu, Dec 15, 2016 at 7:56 AM, Xunlei Pang wrote:
> On 12/14/2016 at 02:11 PM, Pingfan Liu wrote:
>> kexec-tools always allocates program headers for possible cpus. But
>> when crashing, offline cpus have dummy headers. We do not copy these
>> dummy notes into the ELF file, and also have no need to warn about them.
>>
>> Signed-off-by: Pingfan Liu
>> ---
>>  fs/proc/vmcore.c | 21 +++++++++++++++++----
>>  1 file changed, 17 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
>> index 8ab782d..bbc9dad 100644
>> --- a/fs/proc/vmcore.c
>> +++ b/fs/proc/vmcore.c
>> @@ -526,9 +526,10 @@ static u64 __init get_vmcore_size(size_t elfsz, size_t elfnotesegsz,
>>   */
>>  static int __init update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
>>  {
>> -	int i, rc=0;
>> +	int i, j, rc = 0;
>>  	Elf64_Phdr *phdr_ptr;
>>  	Elf64_Nhdr *nhdr_ptr;
>> +	bool warn;
>>
>>  	phdr_ptr = (Elf64_Phdr *)(ehdr_ptr + 1);
>>  	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
>> @@ -536,6 +537,7 @@ static int __init update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
>>  		u64 offset, max_sz, sz, real_sz = 0;
>>  		if (phdr_ptr->p_type != PT_NOTE)
>>  			continue;
>> +		warn = true;
>>  		max_sz = phdr_ptr->p_memsz;
>>  		offset = phdr_ptr->p_offset;
>>  		notes_section = kmalloc(max_sz, GFP_KERNEL);
>> @@ -547,7 +549,7 @@ static int __init update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
>>  			return rc;
>>  		}
>>  		nhdr_ptr = notes_section;
>> -		while (nhdr_ptr->n_namesz != 0) {
>> +		for (j = 0; nhdr_ptr->n_namesz != 0; j++) {
>
> Hi Pingfan,
>
> I think we don't need to be this complex. How about simply checking before
> the while loop: if it is the cpu dummy note (initialize it with some magic),
> then handle it differently, e.g. set a "nowarn" flag to use afterwards and
> make sure it has zero p_memsz?
>

I had been thinking about how the percpu note section is filled. But you are
right, we can suppose that for all archs, cpus just overwrite the note, not
append to it.

> Also do the similar thing for update_note_header_size_elf32()?
>

Yes, will fix it.

Thx,
Re: [PATCH 2/2] [fs] proc/vmcore: check the dummy place holder for offline cpu to avoid warning
On 12/14/2016 at 02:11 PM, Pingfan Liu wrote:
> kexec-tools always allocates program headers for possible cpus. But
> when crashing, offline cpus have dummy headers. We do not copy these
> dummy notes into the ELF file, and also have no need to warn about them.
>
> Signed-off-by: Pingfan Liu
> ---
>  fs/proc/vmcore.c | 21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 8ab782d..bbc9dad 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -526,9 +526,10 @@ static u64 __init get_vmcore_size(size_t elfsz, size_t elfnotesegsz,
>   */
>  static int __init update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
>  {
> -	int i, rc=0;
> +	int i, j, rc = 0;
>  	Elf64_Phdr *phdr_ptr;
>  	Elf64_Nhdr *nhdr_ptr;
> +	bool warn;
>
>  	phdr_ptr = (Elf64_Phdr *)(ehdr_ptr + 1);
>  	for (i = 0; i < ehdr_ptr->e_phnum; i++, phdr_ptr++) {
> @@ -536,6 +537,7 @@ static int __init update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
>  		u64 offset, max_sz, sz, real_sz = 0;
>  		if (phdr_ptr->p_type != PT_NOTE)
>  			continue;
> +		warn = true;
>  		max_sz = phdr_ptr->p_memsz;
>  		offset = phdr_ptr->p_offset;
>  		notes_section = kmalloc(max_sz, GFP_KERNEL);
> @@ -547,7 +549,7 @@ static int __init update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
>  			return rc;
>  		}
>  		nhdr_ptr = notes_section;
> -		while (nhdr_ptr->n_namesz != 0) {
> +		for (j = 0; nhdr_ptr->n_namesz != 0; j++) {

Hi Pingfan,

I think we don't need to be this complex. How about simply checking before
the while loop: if it is the cpu dummy note (initialize it with some magic),
then handle it differently, e.g. set a "nowarn" flag to use afterwards and
make sure it has zero p_memsz?

Also do the similar thing for update_note_header_size_elf32()?

Regards,
Xunlei

>  			sz = sizeof(Elf64_Nhdr) +
>  			     (((u64)nhdr_ptr->n_namesz + 3) & ~3) +
>  			     (((u64)nhdr_ptr->n_descsz + 3) & ~3);
> @@ -559,11 +561,22 @@ static int __init update_note_header_size_elf64(const Elf64_Ehdr *ehdr_ptr)
>  			real_sz += sz;
>  			nhdr_ptr = (Elf64_Nhdr *)((char *)nhdr_ptr + sz);
>  		}
> +		if (real_sz != 0)
> +			warn = false;
> +		if (j == 1) {
> +			nhdr_ptr = notes_section;
> +			if ((nhdr_ptr->n_type == NT_DUMMY)
> +			    && !strncmp(KEXEC_CORE_NOTE_NAME,
> +					(char *)nhdr_ptr + sizeof(Elf64_Nhdr),
> +					strlen(KEXEC_CORE_NOTE_NAME))) {
> +				/* do not copy this dummy note */
> +				real_sz = 0;
> +			}
> +		}
>  		kfree(notes_section);
>  		phdr_ptr->p_memsz = real_sz;
> -		if (real_sz == 0) {
> +		if (warn)
>  			pr_warn("Warning: Zero PT_NOTE entries found\n");
> -		}
>  	}
>
>  	return 0;
Re: [PATCH 1/2] kexec: add a dummy note for each offline cpu
On 12/14/2016 at 05:13 PM, Liu ping fan wrote:
> [...]
>>> No. This patch just places a mark on these offline cpus. The next patch
>>> for the capture kernel will recognize this case, and ignore this kind of
>>> pt_note with the code:
>>> real_sz = 0; /* although the size of this kind of PT_NOTE is not zero,
>>>               * it contains nothing useful, so just ignore it */
>>> phdr_ptr->p_memsz = real_sz;
>> Is there any other vmcore functional issue besides throwing "Warning:
>> Zero PT_NOTE entries found"?
>>
> Not at present when I debugged.

Well, agreed that we should fix it, given that it produces many unnecessary
warnings on some machines.

> I just think we can not suppose the behaviour of different archs, so
> just mark out the dummy pt_note. If some archs want to use these notes
> memory, they will just overwrite the dummy.

For cpu crash_notes, it should be arch-independent, and related to the elf
format.

>
> Thx,
> Pingfan
Re: [PATCH] Add +~800M crashkernel explaination
On 12/15/2016 at 01:50 AM, Robert LeBlanc wrote:
> On Tue, Dec 13, 2016 at 8:08 PM, Xunlei Pang wrote:
>> On 12/10/2016 at 01:20 PM, Robert LeBlanc wrote:
>>> On Fri, Dec 9, 2016 at 7:49 PM, Baoquan He wrote:
>>>> On 12/09/16 at 05:22pm, Robert LeBlanc wrote:
>>>>> When trying to configure crashkernel greater than about 800 MB, the
>>>>> kernel fails to allocate memory on x86 and x86_64. This is due to an
>>>>> undocumented limit that the crashkernel and other low memory items must
>>>>> be allocated below 896 MB unless the ",high" option is given. This
>>>>> updates the documentation to explain this and what I understand the
>>>>> limitations to be on the option.
>>>> This is true, but not very accurate. You found it's about 800M; that's
>>>> because usually the current kernel needs about 40M of space to run, plus
>>>> some extra reservation before the reserve_crashkernel invocation,
>>>> another ~10M. However, that's the normal case; people may build modules
>>>> in or have some special code that bloats the kernel. This patch makes
>>>> sense to address the low|high issue, but it might not be good to state
>>>> ~800M so definitively.
>>> My testing showed that I could go anywhere from about 830M to 880M,
>>> depending on distro, kernel version, and stuff that you mentioned. I
>>> just thought some rule of thumb of when to consider using high would
>>> be good. People may not think that 800 MB is 'large' when you have 512
>>> GB of RAM for instance. I thought about making 512 MB be the rule of
>>> thumb, but you can do a lot with ~300 MB.
>>
>> Hi Robert,
>>
>> I think you are correct.
>>
>> For x86, the kernel uses memblock to locate the proper range starting
>> from 16MB up to some "end"; without the "high" suffix, "end" is
>> CRASH_ADDR_LOW_MAX, otherwise CRASH_ADDR_HIGH_MAX.
>>
>> You can find the definition for both 32-bit and 64-bit:
>> #ifdef CONFIG_X86_32
>> # define CRASH_ADDR_LOW_MAX	(512 << 20)
>> # define CRASH_ADDR_HIGH_MAX	(512 << 20)
>> #else
>> # define CRASH_ADDR_LOW_MAX	(896UL << 20)
>> # define CRASH_ADDR_HIGH_MAX	MAXMEM
>> #endif
>>
>> As some memory was already allocated by the kernel, it's highly likely to
>> get a reservation failure after specifying a crashkernel value near 800MB
>> (for x86_64), which is what you met. We can't give the exact threshold,
>> but it would be better if there were some explanation accordingly in the
>> document.
> To make sure I'm understanding what you are saying, you want me to go
> into a bit more detail about the limitation and specify the
> differences between x86 and x86_64, right?

Yeah, it would be better to have one, at least to mention the different
upper bounds.

As I replied in another post, if you really want to detail the behaviour,
you should mention "crashkernel=size[KMG][@offset[KMG]]" with @offset[KMG]
specified explicitly; after all, it's handled differently, with no upper
bound limitation. But doing this may put the first kernel at risk of lacking
low memory (some devices require 32-bit DMA), so it must be used with care:
the kernel will assume users are aware of what they are doing and make the
reservation as long as the given range is available.

>>> I'm happy to adjust the wording, what would you recommend? Also, I'm
>>> not 100% sure that I got the cases covered correctly. I was surprised
>>> that I could not get it to work with the "new" format with the
>>> multiple ranges, and that specifying an offset wouldn't work either,
>>> although the offset kind of makes sense. Do you know for sure that it
>>> doesn't work with ranges?
>>>
>>> I tried,
>>>
>>> crashkernel=256M-1G:128M,high,1G-4G:256M,high,4G-:512M,high
>>>
>>> and
>>>
>>> crashkernel=256M-1G:128M,1G-4G:256M,4G-:512M,high
>>>
>>> and neither worked. It seems that a better separator would be ';'
>>> instead of ',' for ranges, then you could specify options better. Kind
>>> of hard to change now.
>> For "crashkernel=range1:size1[,range2:size2,...][@offset]"
>> I'm afraid it doesn't support the "high" suffix in the current
>> implementation, so there is no guarantee.
>> I guess we can drop a note to eliminate the confusion.
> I tried to express in the extended syntax section that ',high' is not
> available and you have to use the 'simple' format. Do you think this
> needs to be expanded as well?

Ditto. If you really have good reasons or use cases, please try it :-)

Regards,
Xunlei

>
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
>
> Signed-off-by: Robert LeBlanc
> ---
>  Documentation/kdump/kdump.txt | 22 +++++++++++++++++-----
>  1 file changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
> index b0eb27b..aa3efa8 100644
> --- a/Documentation/kdump/kdump.txt
> +++ b/Documentation/kdump/kdump.txt
> @@ -256,7 +256,9 @@ While the "crashkernel=size[@offset]"
Re: [PATCH] Add +~800M crashkernel explaination
On Tue, Dec 13, 2016 at 8:08 PM, Xunlei Pang wrote:
> On 12/10/2016 at 01:20 PM, Robert LeBlanc wrote:
>> On Fri, Dec 9, 2016 at 7:49 PM, Baoquan He wrote:
>>> On 12/09/16 at 05:22pm, Robert LeBlanc wrote:
>>>> When trying to configure crashkernel greater than about 800 MB, the
>>>> kernel fails to allocate memory on x86 and x86_64. This is due to an
>>>> undocumented limit that the crashkernel and other low memory items must
>>>> be allocated below 896 MB unless the ",high" option is given. This
>>>> updates the documentation to explain this and what I understand the
>>>> limitations to be on the option.
>>> This is true, but not very accurate. You found it's about 800M; that's
>>> because usually the current kernel needs about 40M of space to run, plus
>>> some extra reservation before the reserve_crashkernel invocation,
>>> another ~10M. However, that's the normal case; people may build modules
>>> in or have some special code that bloats the kernel. This patch makes
>>> sense to address the low|high issue, but it might not be good to state
>>> ~800M so definitively.
>> My testing showed that I could go anywhere from about 830M to 880M,
>> depending on distro, kernel version, and stuff that you mentioned. I
>> just thought some rule of thumb of when to consider using high would
>> be good. People may not think that 800 MB is 'large' when you have 512
>> GB of RAM for instance. I thought about making 512 MB be the rule of
>> thumb, but you can do a lot with ~300 MB.
>
> Hi Robert,
>
> I think you are correct.
>
> For x86, the kernel uses memblock to locate the proper range starting
> from 16MB up to some "end"; without the "high" suffix, "end" is
> CRASH_ADDR_LOW_MAX, otherwise CRASH_ADDR_HIGH_MAX.
>
> You can find the definition for both 32-bit and 64-bit:
> #ifdef CONFIG_X86_32
> # define CRASH_ADDR_LOW_MAX	(512 << 20)
> # define CRASH_ADDR_HIGH_MAX	(512 << 20)
> #else
> # define CRASH_ADDR_LOW_MAX	(896UL << 20)
> # define CRASH_ADDR_HIGH_MAX	MAXMEM
> #endif
>
> As some memory was already allocated by the kernel, it's highly likely to
> get a reservation failure after specifying a crashkernel value near 800MB
> (for x86_64), which is what you met. We can't give the exact threshold,
> but it would be better if there were some explanation accordingly in the
> document.

To make sure I'm understanding what you are saying, you want me to go
into a bit more detail about the limitation and specify the
differences between x86 and x86_64, right?

>> I'm happy to adjust the wording, what would you recommend? Also, I'm
>> not 100% sure that I got the cases covered correctly. I was surprised
>> that I could not get it to work with the "new" format with the
>> multiple ranges, and that specifying an offset wouldn't work either,
>> although the offset kind of makes sense. Do you know for sure that it
>> doesn't work with ranges?
>>
>> I tried,
>>
>> crashkernel=256M-1G:128M,high,1G-4G:256M,high,4G-:512M,high
>>
>> and
>>
>> crashkernel=256M-1G:128M,1G-4G:256M,4G-:512M,high
>>
>> and neither worked. It seems that a better separator would be ';'
>> instead of ',' for ranges, then you could specify options better. Kind
>> of hard to change now.
>
> For "crashkernel=range1:size1[,range2:size2,...][@offset]"
> I'm afraid it doesn't support the "high" suffix in the current
> implementation, so there is no guarantee.
> I guess we can drop a note to eliminate the confusion.

I tried to express in the extended syntax section that ',high' is not
available and you have to use the 'simple' format. Do you think this
needs to be expanded as well?

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1

Signed-off-by: Robert LeBlanc
---
 Documentation/kdump/kdump.txt | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
index b0eb27b..aa3efa8 100644
--- a/Documentation/kdump/kdump.txt
+++ b/Documentation/kdump/kdump.txt
@@ -256,7 +256,9 @@ While the "crashkernel=size[@offset]" syntax is sufficient for most
 configurations, sometimes it's handy to have the reserved memory dependent
 on the value of System RAM -- that's mostly for distributors that pre-setup
 the kernel command line to avoid a unbootable system after some memory has
-been removed from the machine.
+been removed from the machine. If you need to allocate more than ~800M
+for x86 or x86_64 then you must use the simple format as the format
+',high' conflicts with the separators of ranges.
 
 The syntax is:
 
@@ -282,11 +284,21 @@ Boot into System Kernel
 1) Update the boot loader (such as grub, yaboot, or lilo) configuration
    files as necessary.
 
-2) Boot the system kernel with the boot parameter "crashkernel=Y@X",
+2) Boot the system kernel with the
Re: Need help for arm64
Hi,

On 12/14/2016 01:42 AM, He Zhe wrote:
> Hi,
>
> I notice the function below in kexec/arch/arm64/crashdump-arm64.c. It looks
> like it always causes "kexec -p" to return:
> "Memory for crashkernel is not reserved"
>
> Do we have a plan to implement this? Do we have a workaround for arm64?

kexec does not yet support kdump. Patches for it are in review. Check the
kexec ML.

-Geoff
Re: [PATCH 1/2] arm64: Add enable/disable d-cache support for purgatory
On Wednesday 14 December 2016 07:14 PM, Mark Rutland wrote:
>> Even for the non-kdump ie `kexec -l` case we do not have a
>> functionality to bypass sha verification in kexec-tools. The --lite
>> option for kexec-tools was discouraged and not accepted.
>
> Ok. Do you have a pointer to the thread regarding that, for context?

https://lists.ozlabs.org/pipermail/petitboot/2015-October/000141.html
https://lists.ozlabs.org/pipermail/petitboot/2015-October/000136.html

~Pratyush
Re: [PATCH 1/2] arm64: Add enable/disable d-cache support for purgatory
Hi,

On Wed, Dec 14, 2016 at 05:51:05PM +0530, Pratyush Anand wrote:
>
> On Wednesday 14 December 2016 05:07 PM, Mark Rutland wrote:
>> I see in an earlier message that the need for sha256 was being discussed
>> in another thread. Do either of you happen to have a pointer to that.
>
> patch 0/2 of this series.

AFAICT, that just says the existing sha256 check is slow, not *why* a
sha256 check of some description is necessary. I'm still at a loss as to
why it is considered necessary, rather than being a debugging aid or
sanity check.

>> To me, it seems like it doesn't come with much benefit for the kdump
>> case given that's best-effort anyway, and as above the verification code
>> could have been corrupted. In the non-kdump case it's not strictly
>> necessary and seems like a debugging aid rather than a necessary piece
>> of functionality -- if that's the case, a 20 second delay isn't the end
>> of the world...
>
> Even for the non-kdump ie `kexec -l` case we do not have a
> functionality to bypass sha verification in kexec-tools. The --lite
> option for kexec-tools was discouraged and not accepted.

Ok. Do you have a pointer to the thread regarding that, for context?

> So, it is 20s for both `kexec -l` and `kexec -p`.

Well, unless we can have a --{no-,}sha-check, and make the default NO for
arm64.

> Also other archs like x86_64 take negligible time in sha verification.

That's certainly an argument for not changing the other architectures, but
given it's slow for arm64, we could have a different default...

Thanks,
Mark.
[PATCH 1/2] eppic: vhost_net_buffers - Adopt to struct sk_buff changes
Linux kernel commit 56b174256b69 ("net: add rbnode to struct sk_buff")
moves sk_buff->next into a union of sk_buff->{next/prev/tstamp/rbnode}.
Account for this structure member change while traversing the socket
buffer list.

Signed-off-by: Kamalesh Babulal
---
 eppic_scripts/vhost_net_buffers.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/eppic_scripts/vhost_net_buffers.c b/eppic_scripts/vhost_net_buffers.c
index 39ae595..1260acb 100644
--- a/eppic_scripts/vhost_net_buffers.c
+++ b/eppic_scripts/vhost_net_buffers.c
@@ -45,7 +45,10 @@ vhost_net(struct vhost_net *net)
 			memset((char *)&(buff->data_len), 'L', 0x4);
 		}
 
-		next = buff->next;
+		/*
+		 * .next is the first entry.
+		 */
+		next = (struct sk_buff *)(unsigned long)*buff;
 	}
 
 	head = (struct sk_buff_head *)&(sk->sk_write_queue);
@@ -60,8 +63,10 @@ vhost_net(struct vhost_net *net)
 			memset((char *)&(buff->data_len), 'L', 0x4);
 		}
 
-		next = buff->next;
-
+		/*
+		 * .next is the first entry.
+		 */
+		next = (struct sk_buff *)(unsigned long)*buff;
 	}
 }
-- 
2.7.4
[PATCH 2/2] eppic: dir_names - Convert mnt_hash to hlist from list
Linux kernel commit 38129a13e6e7 ("switch mnt_hash to hlist") moves
mnt_hash from a list_head to an hlist_node. Account for these list type
changes when iterating over the mounted filesystems and walking dentries.

Signed-off-by: Kamalesh Babulal
---
 eppic_scripts/dir_names.c | 38 +++++++++++++++++++++-----------------
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/eppic_scripts/dir_names.c b/eppic_scripts/dir_names.c
index dbe6d00..6f6eb41 100644
--- a/eppic_scripts/dir_names.c
+++ b/eppic_scripts/dir_names.c
@@ -26,9 +26,12 @@ void
 rm_names(struct dentry *dir)
 {
 	struct list_head *next, *head;
+	unsigned int hash_len;
+	int i;
 
 	memset(dir->d_iname, 0, 0x20);
-	memset(dir->d_name.name, 0, 0x20);
+	hash_len = *((unsigned int *)&dir->d_name);
+	memset(dir->d_name.name, 0, hash_len);
 
 	head = (struct list_head *)&(dir->d_subdirs);
 	next = (struct list_head *)dir->d_subdirs.next;
@@ -37,9 +40,9 @@ rm_names(struct dentry *dir)
 	{
 		struct dentry *child, *off = 0;
 
-		child = (struct dentry *)((unsigned long)next - (unsigned long)&(off->d_u));
+		child = (struct dentry *)((unsigned long)next - (unsigned long)&(off->d_child));
 		rm_names(child);
-		next = child->d_u.d_child.next;
+		next = child->d_child.next;
 	}
 
 	return;
@@ -49,29 +52,30 @@ int
 vfs()
 {
 	int i;
-	struct list_head *tab;
+	struct hlist_bl_head *tab;
+	unsigned int d_hash_size = d_hash_mask;
 
-	tab = (struct list_head *)mount_hashtable;
+	tab = (struct hlist_bl_head *)dentry_hashtable;
 
-	for (i = 0; i < 256; i++)
+	for (i = 0; i < d_hash_size; i++)
 	{
-		struct list_head *head, *next;
-
-		head = (struct list_head *) (tab + i);
-		next = (struct list_head *) head->next;
+		struct hlist_bl_head *head;
+		struct hlist_bl_node *head_node, *next;
 
-		if (!next)
+		head = (struct hlist_bl_head *) (tab + i);
+		head_node = head->first;
+		if (!head_node)
 			continue;
 
-		while (next != head)
+		next = head_node;
+
+		while (next)
 		{
-			struct mount *mntfs;
-			struct dentry *root;
+			struct dentry *root, *off = 0;
 
-			mntfs = (struct mount *)((unsigned long)next);
-			root = (struct dentry *)mntfs->mnt.mnt_root;
+			root = (struct dentry *)((unsigned long)next - (unsigned long)&(off->d_hash));
 			rm_names(root);
 
-			next = mntfs->mnt_hash.next;
+			next = next->next;
 		}
 	}
 	return 1;
-- 
2.7.4
Re: [PATCH 1/2] arm64: Add enable/disable d-cache support for purgatory
On Wednesday 14 December 2016 05:07 PM, Mark Rutland wrote:
> On Wed, Dec 14, 2016 at 11:16:17AM +0000, James Morse wrote:
>> Hi Pratyush,
>>
>> On 14/12/16 10:12, Pratyush Anand wrote:
>>> On Wednesday 14 December 2016 03:08 PM, Pratyush Anand wrote:
>>>>> I would go as far as to generate the page tables at 'kexec -l' time,
>>>>> and only if
>>>>
>>>> Ok.. So you mean that I create a new section which will have page table
>>>> entries mapping the physical memory represented by the remaining
>>>> sections, and then purgatory can just enable the MMU with the page
>>>> table from that section, right? Seems doable, can do that.
>>>
>>> I see a problem here. If we create the page table as a new segment then,
>>> how can we verify in purgatory that the sha for the page table is
>>> correct? We need the page table before sha verification starts, and we
>>> can not rely on the page table created by the first kernel until its sha
>>> is verified. So a chicken-and-egg problem.
>>
>> There is more than one of those! What happens if your sha256 calculation
>> code is corrupted? You have to run it before you know. The same goes for
>> all the purgatory code.
>>
>> This is why I think it's better to do this in the kernel before we exit
>> to purgatory, but obviously that doesn't work for kdump.
>
> I see in an earlier message that the need for sha256 was being discussed
> in another thread. Do either of you happen to have a pointer to that.

patch 0/2 of this series.

> To me, it seems like it doesn't come with much benefit for the kdump
> case given that's best-effort anyway, and as above the verification code
> could have been corrupted. In the non-kdump case it's not strictly
> necessary and seems like a debugging aid rather than a necessary piece
> of functionality -- if that's the case, a 20 second delay isn't the end
> of the world...

Even for the non-kdump ie `kexec -l` case we do not have a functionality to
bypass sha verification in kexec-tools. The --lite option for kexec-tools
was discouraged and not accepted. So, it is 20s for both `kexec -l` and
`kexec -p`. Also other archs like x86_64 take negligible time in sha
verification.

~Pratyush
Re: [PATCH 1/2] arm64: Add enable/disable d-cache support for purgatory
Hi Mark,

On 14/12/16 11:37, Mark Rutland wrote:
> On Wed, Dec 14, 2016 at 11:16:17AM +0000, James Morse wrote:
>> On 14/12/16 10:12, Pratyush Anand wrote:
>>> On Wednesday 14 December 2016 03:08 PM, Pratyush Anand wrote:
>>>>> I would go as far as to generate the page tables at 'kexec -l' time,
>>>>> and only if
>>>>
>>>> Ok.. So you mean that I create a new section which will have page table
>>>> entries mapping the physical memory represented by the remaining
>>>> sections, and then purgatory can just enable the MMU with the page
>>>> table from that section, right? Seems doable, can do that.
>>>
>>> I see a problem here. If we create the page table as a new segment then,
>>> how can we verify in purgatory that the sha for the page table is
>>> correct? We need the page table before sha verification starts, and we
>>> can not rely on the page table created by the first kernel until its sha
>>> is verified. So a chicken-and-egg problem.
>>
>> There is more than one of those! What happens if your sha256 calculation
>> code is corrupted? You have to run it before you know. The same goes for
>> all the purgatory code.
>>
>> This is why I think it's better to do this in the kernel before we exit
>> to purgatory, but obviously that doesn't work for kdump.
>
> I see in an earlier message that the need for sha256 was being discussed
> in another thread. Do either of you happen to have a pointer to that.

https://www.spinics.net/lists/arm-kernel/msg544472.html

Thanks,
James
Re: [PATCH 1/2] arm64: Add enable/disable d-cache support for purgatory
Hi James,

Thanks for your input !!

On Wednesday 14 December 2016 04:46 PM, James Morse wrote:
> Hi Pratyush,
>
> On 14/12/16 10:12, Pratyush Anand wrote:
>> On Wednesday 14 December 2016 03:08 PM, Pratyush Anand wrote:
>>>> I would go as far as to generate the page tables at 'kexec -l' time,
>>>> and only if
>>>
>>> Ok.. So you mean that I create a new section which will have page table
>>> entries mapping the physical memory represented by the remaining
>>> sections, and then purgatory can just enable the MMU with the page table
>>> from that section, right? Seems doable, can do that.
>>
>> I see a problem here. If we create the page table as a new segment then,
>> how can we verify in purgatory that the sha for the page table is
>> correct? We need the page table before sha verification starts, and we
>> can not rely on the page table created by the first kernel until its sha
>> is verified. So a chicken-and-egg problem.
>
> There is more than one of those! What happens if your sha256 calculation
> code is corrupted? You have to run it before you know. The same goes for
> all the purgatory code.

OK, seems reasonable... will do it in the kexec code.

> This is why I think it's better to do this in the kernel before we exit to
> purgatory, but obviously that doesn't work for kdump.
>
>> I think creating the page table will just take a fraction of a second and
>> should be good even in purgatory. What do you say?
>
> If it's for kdump it's best-effort. I think it's easier/simpler to generate
> and debug them at 'kexec -l' time, but if you're worried about the
> increased area that could be corrupted then do it in purgatory.

~Pratyush
Re: [PATCH 1/2] arm64: Add enable/disable d-cache support for purgatory
On Wed, Dec 14, 2016 at 11:16:17AM +0000, James Morse wrote:
> Hi Pratyush,
>
> On 14/12/16 10:12, Pratyush Anand wrote:
> > On Wednesday 14 December 2016 03:08 PM, Pratyush Anand wrote:
> >>> I would go as far as to generate the page tables at 'kexec -l' time,
> >>> and only if
> >>
> >> Ok.. So you mean that I create a new section which will have page table
> >> entries mapping the physical memory represented by the remaining
> >> sections, and then purgatory can just enable the MMU with the page
> >> table from that section, right? Seems doable, can do that.
> >
> > I see a problem here. If we create the page table as a new segment then,
> > how can we verify in purgatory that the sha for the page table is
> > correct? We need the page table before sha verification starts, and we
> > can not rely on the page table created by the first kernel until its sha
> > is verified. So a chicken-and-egg problem.
>
> There is more than one of those! What happens if your sha256 calculation
> code is corrupted? You have to run it before you know. The same goes for
> all the purgatory code.
>
> This is why I think it's better to do this in the kernel before we exit to
> purgatory, but obviously that doesn't work for kdump.

I see in an earlier message that the need for sha256 was being discussed
in another thread. Do either of you happen to have a pointer to that.

To me, it seems like it doesn't come with much benefit for the kdump
case given that's best-effort anyway, and as above the verification code
could have been corrupted. In the non-kdump case it's not strictly
necessary and seems like a debugging aid rather than a necessary piece
of functionality -- if that's the case, a 20 second delay isn't the end
of the world...

Thanks,
Mark.
Re: [PATCH 1/2] arm64: Add enable/disable d-cache support for purgatory
Hi Pratyush,

On 14/12/16 09:38, Pratyush Anand wrote:
> On Saturday 26 November 2016 12:00 AM, James Morse wrote:
>> On 22/11/16 04:32, Pratyush Anand wrote:
>>> This patch adds support to enable/disable the d-cache, which can be used
>>> for faster purgatory sha256 verification.
>>
>> (I'm not clear why we want the sha256, but that is being discussed
>> elsewhere on the thread)
>>
>>> We are supporting only 4K and 64K page sizes. This code will not work if
>>> the hardware does not support at least one of these page sizes.
>>> Therefore, the d-cache is disabled by default and enabled only when
>>> "enable-dcache" is passed to kexec().
>>
>> I don't think the maybe-4K/maybe-64K/maybe-neither logic is needed. It
>> would be a lot simpler to only support one page size, which should be 4K
>> as that is what UEFI requires. (If there are CPUs that only support one
>> size, I bet it's 4K!)
>
> Ok.. So, I will implement a new version after considering that 4K will
> always be supported. If 4K is not supported by hw (which is very unlikely)
> then there would be no d-cache enabling feature.

Sounds good to me. I think it's important to keep the purgatory code as
small and as simple as possible as it's very hard to debug. If we do get
bug reports they are likely to be 'it did nothing', with no further
details. If it only fails on some platform we don't have access to, it's
basically impossible.

>> I would go as far as to generate the page tables at 'kexec -l' time, and
>> only if
>
> Ok.. So you mean that I create a new section which will have page table
> entries mapping the physical memory represented by the remaining sections,
> and then purgatory can just enable the MMU with the page table from that
> section, right? Seems doable, can do that.
>
>> '/sys/firmware/efi' exists to indicate we booted via UEFI (and therefore
>> must support 4K pages). This would keep the purgatory code as simple as
>> possible.
>
> What about reading ID_AA64MMFR0_EL1 instead of /sys/firmware/efi? That can
> also tell us whether 4K is supported or not?

If you're doing it at EL1/EL2 in the purgatory code, sure. But if you
generate the page tables at 'kexec -l' time you can't read this register
from EL0, so you need another way to guess if 4K pages are supported (or
just assume they are and test that register once you're in purgatory).

I was looking for some way to print a message at 'kexec -l' time that the
sha256 would be slow as 4K wasn't supported. (A message printed at any
other time won't get seen.)

>>> +/*
>>> + *	disable_dcache: Disable D-cache and flush RAM locations
>>> + *	ram_start - Start address of RAM
>>> + *	ram_end - End address of RAM
>>> + */
>>> +void disable_dcache(uint64_t ram_start, uint64_t ram_end)
>>> +{
>>> +	switch (get_current_el()) {
>>> +	case 2:
>>> +		reset_sctlr_el2();
>>> +		break;
>>> +	case 1:
>>> +		reset_sctlr_el1();
>>
>> You have C code running between disabling the MMU and cleaning the cache.
>> The compiler is allowed to move data on and off the stack in here, but
>> after disabling the MMU it will see whatever was on the stack before we
>> turned the MMU on. Any data written at the beginning of this function is
>> left in the caches.
>>
>> I'm afraid this sort of stuff needs to be done in assembly!
>
> All these routines are self-coded in assembly even though they are called
> from C, so they should be safe I think. Anyway, I can keep all of them in
> assembly as well.

You can't tell the compiler that the stack data is inaccessible until the
dcache clean call completes. Some future version may do really crazy things
in here. You can decompile what your compiler version produces to check it
doesn't load/store to the stack, but that doesn't mean my compiler version
does the same. This is the kind of thing that is extremely difficult to
debug; it's best not to take the risk.

Thanks,

James
Re: [PATCH 1/2] arm64: Add enable/disable d-cache support for purgatory
On Wednesday 14 December 2016 03:08 PM, Pratyush Anand wrote:
>> I would go as far as to generate the page tables at 'kexec -l' time, and only if
>
> Ok.. So you mean that I create a new section which will have page table entries mapping the physical memory represented by the remaining sections, and then purgatory can just enable the mmu with the page table from that section, right? Seems doable. I can do that.

I see a problem here. If we create the page table as a new segment, then how can we verify in purgatory that the sha for the page table is correct? We need the page table before sha verification starts, and we cannot rely on the page table created by the first kernel until its sha is verified. So a chicken-and-egg problem.

I think creating the page table will just take a fraction of a second and should be fine even in purgatory. What do you say?

~Pratyush
Need help for arm64
Hi,

I notice the function below in kexec/arch/arm64/crashdump-arm64.c. It looks like it always causes "kexec -p" to return:

    Memory for crashkernel is not reserved
    Please reserve memory by passing "crashkernel=X@Y" parameter to kernel
    Then try loading kdump kernel

    int is_crashkernel_mem_reserved(void)
    {
        return 0;
    }

Do we have a plan to implement this? Do we have a workaround for arm64?

Thanks,
Zhe
Re: [PATCH 1/2] arm64: Add enable/disable d-cache support for purgatory
Hi James,

Thanks a lot for your review. It's helpful.

On Saturday 26 November 2016 12:00 AM, James Morse wrote:
> Hi Pratyush,
>
> (CC: Mark, mismatched memory attributes in paragraph 3?)
>
> On 22/11/16 04:32, Pratyush Anand wrote:
>> This patch adds support to enable/disable d-cache, which can be used for faster purgatory sha256 verification.
>
> (I'm not clear why we want the sha256, but that is being discussed elsewhere on the thread)
>
>> We are supporting only 4K and 64K page sizes. This code will not work if a hardware is not supporting at least one of these page sizes. Therefore, D-cache is disabled by default and enabled only when "enable-dcache" is passed to the kexec().
>
> I don't think the maybe-4K/maybe-64K/maybe-neither logic is needed. It would be a lot simpler to only support one page size, which should be 4K as that is what UEFI requires. (If there are CPUs that only support one size, I bet it's 4K!)

Ok.. So, I will implement a new version after considering that 4K will always be supported. If 4K is not supported by hw (which is very unlikely) then there would be no d-cache enabling feature.

> I would go as far as to generate the page tables at 'kexec -l' time, and only if

Ok.. So you mean that I create a new section which will have page table entries mapping the physical memory represented by the remaining sections, and then purgatory can just enable the mmu with the page table from that section, right? Seems doable. I can do that.

> '/sys/firmware/efi' exists to indicate we booted via UEFI. (and therefore must support 4K pages). This would keep the purgatory code as simple as possible.

What about reading ID_AA64MMFR0_EL1 instead of /sys/firmware/efi? That can also tell us whether 4K is supported or not?

> I don't think the performance difference between 4K and 64K page sizes will be measurable, is purgatory really performance sensitive code?

I agree, implementing only 4K will make it very simple.

>> Since this is an identity mapped system, VA_BITS will be the same as the max PA bits supported. If VA_BITS <= 42 for 64K and <= 39 for 4K then there will be only one level of page table, with block descriptor entries. Otherwise: for 4K mapping, TTBR points to level 0 lookups, which will have only table entries pointing to a level 1 lookup. Level 1 will have only block entries, each mapping a 1GB block. For 64K mapping, TTBR points to level 1 lookups, which will have only table entries pointing to a level 2 lookup. Level 2 will have only block entries, each mapping a 512MB block.
>
> This is more complexity to pick a VA size. Why not always use the maximum 48-bit VA? The cost is negligible compared to having simpler (easier to review!) purgatory code.
>
> By always using 1GB blocks you may be creating aliases with mismatched attributes:
> * If kdump only reserves 128MB, your 1GB mapping will alias whatever else was in the same 1GB of address space. This could be a reserved region with some other memory attributes.
> * With kdump, we may have failed to park the other CPUs if they are executing with interrupts masked and haven't yet handled the smp_send_stop() IPI.
> * One of these other CPUs could be reading/writing in this area, as it doesn't belong to the kdump reserved area, it just happens to be in the same 1GB.
>
> I need to dig through the ARM-ARM to find out what happens next, but I'm pretty sure this is well into the "don't do that" territory. It would be much better to force the memory areas to be a multiple of 2MB and 2MB aligned, which will allow you to use 2M section mappings for memory (but not the uart). This way we only map regions we had reserved and know are memory.

OK. So, 48-bit VA, 4K page size, 3-level page table with entries in the 3rd level representing a 2M block size.

>> If the UART base address and RAM addresses are not at least 1GB and 512MB apart for 4K and 64K respectively, then the mapping result could be unpredictable.

In that case we need to support one more level of granularity, but until someone needs that, keep it like this only.

We cannot allocate dynamic memory in purgatory. Therefore we keep the page table allocation size fixed as (3 * MAX_PAGE_SIZE). (page_table) points to the first level (having only table entries) and (page_table + MAX_PAGE_SIZE) points to the table at the next level (having block entries). If the index for the RAM area and the UART area in the first table is not the same, then we will need another next-level table, which will be located at (page_table + 2 * MAX_PAGE_SIZE).

>> diff --git a/purgatory/arch/arm64/cache-asm.S b/purgatory/arch/arm64/cache-asm.S
>> new file mode 100644
>> index ..bef97ef4
>> --- /dev/null
>> +++ b/purgatory/arch/arm64/cache-asm.S
>> @@ -0,0 +1,186 @@
>> +/*
>> + * Some of the routines have been copied from Linux Kernel, therefore
>> + * copying the license as well.
>> + *
>> + * Copyright (C) 2001 Deep Blue Solutions Ltd.
>> + * Copyright (C) 2012 ARM Ltd.
>> + * Copyright (C) 2015 Pratyush Anand
>> + *
>> + * This program is free software; you can redistribute it and/or
Re: [PATCH 1/2] kexec: add a dummy note for each offline cpu
[...]
>> No. This patch just places a mark on these offline cpus. The next patch for the capture kernel will recognize this case, and ignore this kind of pt_note with the code:
>>
>>     real_sz = 0;  /* although the size of this kind of PT_NOTE is not zero, it contains nothing useful, so just ignore it */
>>     phdr_ptr->p_memsz = real_sz;
>
> Is there any other vmcore functional issue besides throwing "Warning: Zero PT_NOTE entries found"?

Not at present, when I debugged. I just think we cannot assume the behaviour of different archs, so just mark out the dummy pt_note. If some archs want to use these notes' memory, they will just overwrite the dummy.

Thx,
Pingfan
Re: [PATCH 1/2] kexec: add a dummy note for each offline cpu
On 12/14/2016 at 04:56 PM, Liu ping fan wrote:
> On Wed, Dec 14, 2016 at 4:48 PM, Xunlei Pang wrote:
>> On 12/14/2016 at 02:11 PM, Pingfan Liu wrote:
>>> kexec-tools always allocates program headers for each possible cpu. This incurs a zero PT_NOTE for each offline cpu. We mark this case so that later the capture kernel can distinguish it from the mistake of an allocated program header. The counterpart for the capture kernel comes in the next patch.
>>
>> Hmm, we can initialize the cpu crash note buf in crash_notes_memory_init(), needless to do it at the crash moment, right?
>
> The cpus can go on-off-on...; we cannot know the user's action.

I meant we can add the fake note into the cpu note buf; then when the crash happens, the online ones will be overwritten with real note data, while the others (!online) will still have the fake note.

>> BTW, does this cause any issue, for example the crash utility can't parse the vmcore properly? Or does it just produce lots of warnings after offlining multiple cpus?
>
> No. This patch just places a mark on these offline cpus. The next patch for the capture kernel will recognize this case, and ignore this kind of pt_note with the code:
>
>     real_sz = 0;  /* although the size of this kind of PT_NOTE is not zero, it contains nothing useful, so just ignore it */
>     phdr_ptr->p_memsz = real_sz;

Is there any other vmcore functional issue besides throwing "Warning: Zero PT_NOTE entries found"?

Regards,
Xunlei
Re: [PATCH 1/2] kexec: add a dummy note for each offline cpu
[...]
>>>>> When you execute dmesg on your testing machine and grep nr_cpu_ids, what's the value of nr_cpu_ids?
>>>>
>>>> nr_cpu_ids=128
>>>
>>> And what's the cpu number in the "lscpu" command?
>>
>> NUMA node1 CPU(s): 0-7
>>
>> The system booted up with 128 possible cpus and only 8 online. Also I tested on an x86 guest: after bootup with 8 cpus, then offlining 4 of them, the zero PT_NOTE warning buzzes too.
>
> Yes, this is why I think using for_each_cpu_not(cpu, cpu_online_mask) is not quite appropriate. Maybe it needs to try to save on those cpus which are present but not online. Saving on all not-online cpus does not seem good; it's not reasonable to save on those that get an apic but have no cpu plugged in.

In kexec-tools/kexec/crashdump-elf.c:

    nr_cpus = sysconf(_SC_NPROCESSORS_CONF);

And this is why this patch needs to make a mark on these offline cpus. There is nothing like an "_SC_NPROCESSORS_PRESENT" option, so just work around it on the kernel side. Anyway, for the crash kernel we only write "core" in the per-cpu notes with no more info, and it costs nothing when the capture kernel gathers the PT_NOTEs.

Thx,
Pingfan
Re: [PATCH 1/2] kexec: add a dummy note for each offline cpu
On Wed, Dec 14, 2016 at 4:48 PM, Xunlei Pang wrote:
> On 12/14/2016 at 02:11 PM, Pingfan Liu wrote:
>> kexec-tools always allocates program headers for each possible cpu. This incurs a zero PT_NOTE for each offline cpu. We mark this case so that later the capture kernel can distinguish it from the mistake of an allocated program header. The counterpart for the capture kernel comes in the next patch.
>
> Hmm, we can initialize the cpu crash note buf in crash_notes_memory_init(), needless to do it at the crash moment, right?

The cpus can go on-off-on...; we cannot know the user's action.

> BTW, does this cause any issue, for example the crash utility can't parse the vmcore properly? Or does it just produce lots of warnings after offlining multiple cpus?

No. This patch just places a mark on these offline cpus. The next patch for the capture kernel will recognize this case, and ignore this kind of pt_note with the code:

    real_sz = 0;  /* although the size of this kind of PT_NOTE is not zero, it contains nothing useful, so just ignore it */
    phdr_ptr->p_memsz = real_sz;

Thx,
Pingfan
Re: [PATCH 1/2] kexec: add a dummy note for each offline cpu
On 12/14/2016 at 02:11 PM, Pingfan Liu wrote:
> kexec-tools always allocates program headers for each possible cpu. This incurs a zero PT_NOTE for each offline cpu. We mark this case so that later the capture kernel can distinguish it from the mistake of an allocated program header. The counterpart for the capture kernel comes in the next patch.

Hmm, we can initialize the cpu crash note buf in crash_notes_memory_init(), needless to do it at the crash moment, right?

BTW, does this cause any issue, for example the crash utility can't parse the vmcore properly? Or does it just produce lots of warnings after offlining multiple cpus?

Regards,
Xunlei

> Signed-off-by: Pingfan Liu
> ---
> This unnecessary warning buzzes on all archs when there is an offline cpu.
>
>  include/uapi/linux/elf.h | 1 +
>  kernel/kexec_core.c      | 9 +++++++++
>  2 files changed, 10 insertions(+)
>
> diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
> index b59ee07..9744f1e 100644
> --- a/include/uapi/linux/elf.h
> +++ b/include/uapi/linux/elf.h
> @@ -367,6 +367,7 @@ typedef struct elf64_shdr {
>   * using the corresponding note types via the PTRACE_GETREGSET and
>   * PTRACE_SETREGSET requests.
>   */
> +#define NT_DUMMY	0
>  #define NT_PRSTATUS	1
>  #define NT_PRFPREG	2
>  #define NT_PRPSINFO	3
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index 5616755..aeac16e 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -891,9 +891,12 @@ void __crash_kexec(struct pt_regs *regs)
>  	if (mutex_trylock(&kexec_mutex)) {
>  		if (kexec_crash_image) {
>  			struct pt_regs fixed_regs;
> +			unsigned int cpu;
>
>  			crash_setup_regs(&fixed_regs, regs);
>  			crash_save_vmcoreinfo();
> +			for_each_cpu_not(cpu, cpu_online_mask)
> +				crash_save_cpu(NULL, cpu);
>  			machine_crash_shutdown(&fixed_regs);
>  			machine_kexec(kexec_crash_image);
>  		}
> @@ -1040,6 +1043,12 @@ void crash_save_cpu(struct pt_regs *regs, int cpu)
>  	buf = (u32 *)per_cpu_ptr(crash_notes, cpu);
>  	if (!buf)
>  		return;
> +	if (regs == NULL) {
> +		buf = append_elf_note(buf, KEXEC_CORE_NOTE_NAME, NT_DUMMY,
> +				      NULL, 0);
> +		final_note(buf);
> +		return;
> +	}
>  	memset(&prstatus, 0, sizeof(prstatus));
>  	prstatus.pr_pid = current->pid;
>  	elf_core_copy_kernel_regs(&prstatus.pr_reg, regs);
> --
> 2.7.4
Re: [PATCH 1/2] kexec: add a dummy note for each offline cpu
On 12/14/16 at 04:39pm, Liu ping fan wrote:
> On Wed, Dec 14, 2016 at 4:25 PM, Baoquan He wrote:
>> On 12/14/16 at 04:15pm, Liu ping fan wrote:
>>> On Wed, Dec 14, 2016 at 3:40 PM, Baoquan He wrote:
>>>> On 12/14/16 at 02:11pm, Pingfan Liu wrote:
>>>>> kexec-tools always allocates program headers for each possible cpu. This incurs a zero PT_NOTE for each offline cpu. We mark this case so that later the capture kernel can distinguish it from the mistake of an allocated program header. The counterpart for the capture kernel comes in the next patch.
>>>>
>>>> When you execute dmesg on your testing machine and grep nr_cpu_ids, what's the value of nr_cpu_ids?
>>>
>>> nr_cpu_ids=128
>>
>> And what's the cpu number in the "lscpu" command?
>
> NUMA node1 CPU(s): 0-7
>
> The system booted up with 128 possible cpus and only 8 online. Also I tested on an x86 guest: after bootup with 8 cpus, then offlining 4 of them, the zero PT_NOTE warning buzzes too.

Yes, this is why I think using for_each_cpu_not(cpu, cpu_online_mask) is not quite appropriate. Maybe it needs to try to save on those cpus which are present but not online. Saving on all not-online cpus does not seem good; it's not reasonable to save on those that get an apic but have no cpu plugged in.

Thanks
Baoquan
Re: [PATCH 1/2] kexec: add a dummy note for each offline cpu
On Wed, Dec 14, 2016 at 4:25 PM, Baoquan He wrote:
> On 12/14/16 at 04:15pm, Liu ping fan wrote:
>> On Wed, Dec 14, 2016 at 3:40 PM, Baoquan He wrote:
>>> On 12/14/16 at 02:11pm, Pingfan Liu wrote:
>>>> kexec-tools always allocates program headers for each possible cpu. This incurs a zero PT_NOTE for each offline cpu. We mark this case so that later the capture kernel can distinguish it from the mistake of an allocated program header. The counterpart for the capture kernel comes in the next patch.
>>>
>>> When you execute dmesg on your testing machine and grep nr_cpu_ids, what's the value of nr_cpu_ids?
>>
>> nr_cpu_ids=128
>
> And what's the cpu number in the "lscpu" command?

NUMA node1 CPU(s): 0-7

The system booted up with 128 possible cpus and only 8 online. Also I tested on an x86 guest: after bootup with 8 cpus, then offlining 4 of them, the zero PT_NOTE warning buzzes too.
Re: [PATCH 1/2] kexec: add a dummy note for each offline cpu
On 12/14/16 at 04:15pm, Liu ping fan wrote:
> On Wed, Dec 14, 2016 at 3:40 PM, Baoquan He wrote:
>> On 12/14/16 at 02:11pm, Pingfan Liu wrote:
>>> kexec-tools always allocates program headers for each possible cpu. This incurs a zero PT_NOTE for each offline cpu. We mark this case so that later the capture kernel can distinguish it from the mistake of an allocated program header. The counterpart for the capture kernel comes in the next patch.
>>
>> When you execute dmesg on your testing machine and grep nr_cpu_ids, what's the value of nr_cpu_ids?
>
> nr_cpu_ids=128

And what's the cpu number in the "lscpu" command?
Re: [PATCH 1/2] kexec: add a dummy note for each offline cpu
On Wed, Dec 14, 2016 at 3:40 PM, Baoquan He wrote:
> On 12/14/16 at 02:11pm, Pingfan Liu wrote:
>> kexec-tools always allocates program headers for each possible cpu. This incurs a zero PT_NOTE for each offline cpu. We mark this case so that later the capture kernel can distinguish it from the mistake of an allocated program header. The counterpart for the capture kernel comes in the next patch.
>
> When you execute dmesg on your testing machine and grep nr_cpu_ids, what's the value of nr_cpu_ids?

nr_cpu_ids=128