Re: [PATCH] x86_64, kexec: Avoid unnecessary identity mappings for kdump
Xunlei Pang writes:
> kexec sets up identity mappings for all the memory mapped in the 1st
> kernel, but this is not necessary for the kdump case. It can actually
> cause extra memory consumption for paging structures, which is quite
> considerable on modern machines with huge memory.
>
> E.g. on our 24TB machine, it will waste around 96MB (around 4MB/TB)
> from the reserved memory range if all the identity mappings are set up.
>
> It also causes some trouble for distributions that use an intelligent
> policy to evaluate the proper "crashkernel=X" for users.
>
> To solve it, in the kdump case we only set up identity mappings for the
> crash memory and the ISA memory (which may be needed by purgatory/kdump
> boot).

How about instead we detect the presence of 1GiB pages and use them if
they are available?  We already use 2MiB pages.  If we can do that we
will only need about 192K for page tables in the case you have
described, and this all becomes a non-issue.  I strongly suspect that
the presence of 24TiB of memory in an x86 system strongly correlates
with the presence of 1GiB pages.

In principle we certainly can use a less extensive mapping, but that
should not be something that differs between the two kexec cases.

I can see forcing the low 1MiB range in.  But calling it the ISA range
is very wrong and misleading.  The reasons that range is special during
boot-up have nothing to do with ISA, and everything to do with where
legacy page tables are mapped, and where we need identity pages to start
other cpus.  I think the only user that actually cares is purgatory,
where it plays swapping games with the low 1MiB because we can't preload
what we need down there without messing up the running kernel.

So saying anything about the old ISA bus is wrong and misleading.  At
the very least we need accurate comments.
Eric

> Signed-off-by: Xunlei Pang
> ---
>  arch/x86/kernel/machine_kexec_64.c | 34 ++
>  1 file changed, 30 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index 857cdbd..db77a76 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -112,14 +112,40 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
>
>  	level4p = (pgd_t *)__va(start_pgtable);
>  	clear_page(level4p);
> -	for (i = 0; i < nr_pfn_mapped; i++) {
> -		mstart = pfn_mapped[i].start << PAGE_SHIFT;
> -		mend = pfn_mapped[i].end << PAGE_SHIFT;
>
> +	if (image->type == KEXEC_TYPE_CRASH) {
> +		/* Always map the ISA range */
>  		result = kernel_ident_mapping_init(&info,
> -				level4p, mstart, mend);
> +				level4p, 0, ISA_END_ADDRESS);
>  		if (result)
>  			return result;
> +
> +		/* crashk_low_res may not be initialized when reaching here */
> +		if (crashk_low_res.end) {
> +			mstart = crashk_low_res.start;
> +			mend = crashk_low_res.end + 1;
> +			result = kernel_ident_mapping_init(&info,
> +					level4p, mstart, mend);
> +			if (result)
> +				return result;
> +		}
> +
> +		mstart = crashk_res.start;
> +		mend = crashk_res.end + 1;
> +		result = kernel_ident_mapping_init(&info,
> +				level4p, mstart, mend);
> +		if (result)
> +			return result;
> +	} else {
> +		for (i = 0; i < nr_pfn_mapped; i++) {
> +			mstart = pfn_mapped[i].start << PAGE_SHIFT;
> +			mend = pfn_mapped[i].end << PAGE_SHIFT;
> +
> +			result = kernel_ident_mapping_init(&info,
> +					level4p, mstart, mend);
> +			if (result)
> +				return result;
> +		}
>  	}
>
>  	/*

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
Re: [PATCH v2] kexec: Introduce vmcoreinfo signature verification
Xunlei Pang writes:
> Currently vmcoreinfo data is updated at boot time via subsys_initcall(),
> so it runs the risk of being modified by some errant code while the
> system is running.
>
> As a result, the dumped vmcore may contain the wrong vmcoreinfo. Later,
> when using the "crash" or "makedumpfile" (etc.) utilities to parse this
> vmcore, we will probably get "Segmentation fault" or other
> unexpected/confusing errors.

If this is a real concern — and the previous discussion sounds like it
is — part of what we need to do is move the variable vmcoreinfo_note out
of the kernel's .bss section, and modify the code to regenerate and keep
this information in something like the control page.

Definitely something like this needs a page all to itself, and ideally
far away from any other kernel data structures.  I clearly was not
watching closely the day someone decided to keep this silly thing in the
kernel's .bss section.

> As vmcoreinfo is the most fundamental information for vmcore, we had
> better double-check its correctness. Here we generate a signature
> (using crc32) after it is saved, then verify it in
> crash_save_vmcoreinfo() to see if the signature was broken; if so, we
> re-save the vmcoreinfo data to get correct vmcoreinfo for kdump as best
> we can.

Sigh.  We already have a sha256 that is supposed to cover this sort of
thing.  The bug rather is that apparently it isn't covering this data.
That sounds like what we should be fixing.

Please let's not invent new mechanisms we have to maintain.  Let's
reorganize this so this static data is protected like all other static
data in the kexec-on-panic path.  We have good mechanisms and good
strategies for avoiding and detecting corruption; we just need to use
them.

Eric

> Signed-off-by: Xunlei Pang
> ---
> v1->v2:
> - Keep crash_save_vmcoreinfo_init() because "makedumpfile --mem-usage"
>   uses the information.
> - Add crc32 verification for vmcoreinfo, re-save on failure.
>  arch/Kconfig        |  1 +
>  kernel/kexec_core.c | 43 +++
>  2 files changed, 36 insertions(+), 8 deletions(-)
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index c4d6833..66eb296 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -4,6 +4,7 @@
>
>  config KEXEC_CORE
>  	bool
> +	select CRC32
>
>  config HAVE_IMA_KEXEC
>  	bool
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index bfe62d5..012acbe 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -38,6 +38,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include
>  #include
> @@ -53,9 +54,10 @@
>
>  /* vmcoreinfo stuff */
>  static unsigned char vmcoreinfo_data[VMCOREINFO_BYTES];
> -u32 vmcoreinfo_note[VMCOREINFO_NOTE_SIZE/4];
> +static u32 vmcoreinfo_sig;
>  size_t vmcoreinfo_size;
>  size_t vmcoreinfo_max_size = sizeof(vmcoreinfo_data);
> +u32 vmcoreinfo_note[VMCOREINFO_NOTE_SIZE/4];
>
>  /* Flag to indicate we are going to kexec a new kernel */
>  bool kexec_in_progress = false;
> @@ -1367,12 +1369,6 @@ static void update_vmcoreinfo_note(void)
>  	final_note(buf);
>  }
>
> -void crash_save_vmcoreinfo(void)
> -{
> -	vmcoreinfo_append_str("CRASHTIME=%ld\n", get_seconds());
> -	update_vmcoreinfo_note();
> -}
> -
>  void vmcoreinfo_append_str(const char *fmt, ...)
>  {
>  	va_list args;
> @@ -1402,7 +1398,7 @@ phys_addr_t __weak paddr_vmcoreinfo_note(void)
>  	return __pa_symbol((unsigned long)(char *)&vmcoreinfo_note);
>  }
>
> -static int __init crash_save_vmcoreinfo_init(void)
> +static void do_crash_save_vmcoreinfo_init(void)
>  {
>  	VMCOREINFO_OSRELEASE(init_uts_ns.name.release);
>  	VMCOREINFO_PAGESIZE(PAGE_SIZE);
> @@ -1474,6 +1470,37 @@ static int __init crash_save_vmcoreinfo_init(void)
>  #endif
>
>  	arch_crash_save_vmcoreinfo();
> +}
> +
> +static u32 crash_calc_vmcoreinfo_sig(void)
> +{
> +	return crc32(~0, vmcoreinfo_data, vmcoreinfo_size);
> +}
> +
> +static bool crash_verify_vmcoreinfo(void)
> +{
> +	if (crash_calc_vmcoreinfo_sig() == vmcoreinfo_sig)
> +		return true;
> +
> +	return false;
> +}
> +
> +void crash_save_vmcoreinfo(void)
> +{
> +	/* Re-save if verification fails */
> +	if (!crash_verify_vmcoreinfo()) {
> +		vmcoreinfo_size = 0;
> +		do_crash_save_vmcoreinfo_init();
> +	}
> +
> +	vmcoreinfo_append_str("CRASHTIME=%ld\n", get_seconds());
> +	update_vmcoreinfo_note();
> +}
> +
> +static int __init crash_save_vmcoreinfo_init(void)
> +{
> +	do_crash_save_vmcoreinfo_init();
> +	vmcoreinfo_sig = crash_calc_vmcoreinfo_sig();
>  	update_vmcoreinfo_note();
>
>  	return 0;
Re: [PATCH v33 00/14] add kdump support
On 17/03/17 16:59, Marc Zyngier wrote:
> On 17/03/17 16:24, Mark Rutland wrote:
>> On Fri, Mar 17, 2017 at 03:47:08PM +, David Woodhouse wrote:
>>> On Fri, 2017-03-17 at 15:33 +, Mark Rutland wrote:
>>> No, in this case the CPUs *were* offlined correctly, or at least "as
>>> designed", by smp_send_crash_stop(). And if that hadn't worked, as
>>> verified by *its* synchronisation method based on the atomic_t
>>> waiting_for_crash_ipi, then *it* would have complained for itself:
>>>
>>> 	if (atomic_read(&waiting_for_crash_ipi) > 0)
>>> 		pr_warning("SMP: failed to stop secondary CPUs %*pbl\n",
>>> 			   cpumask_pr_args(cpu_online_mask));
>>>
>>> It's just that smp_send_crash_stop() (or more specifically
>>> ipi_cpu_crash_stop()) doesn't touch the online cpu mask. Unlike the
>>> ARM32 equivalent function machine_crash_nonpanic_core(), which does.
>>>
>>> It wasn't clear if that was *intentional*, to allow the original
>>> contents of the online mask before the crash to be seen in the
>>> resulting vmcore... or purely an accident.
>>
>> Looking at this, there's a larger mess.
>>
>> The waiting_for_crash_ipi dance only tells us if CPUs have taken the
>> IPI, not whether they've been offlined (i.e. actually left the kernel).
>> We need something closer to the usual cpu_{disable,die,kill} dance,
>> clearing online as appropriate.
>>
>> If CPUs haven't left the kernel, we still need to warn about that.
>>
>>> FWIW if I trigger a crash on CPU 1 my kdump (still 4.9.8+v32) doesn't
>>> work. I end up booting the kdump kernel on CPU#1 and then it gets
>>> distinctly unhappy...
>>>
>>> [0.00] Booting Linux on physical CPU 0x1
>>> ...
>>> [0.017125] Detected PIPT I-cache on CPU1
>>> [0.017138] GICv3: CPU1: found redistributor 0 region 0:0xf028
>>> [0.017147] CPU1: Booted secondary processor [411fd073]
>>> [0.017339] Detected PIPT I-cache on CPU2
>>> [0.017347] GICv3: CPU2: found redistributor 2 region 0:0xf02c
>>> [0.017354] CPU2: Booted secondary processor [411fd073]
>>> [0.017537] Detected PIPT I-cache on CPU3
>>> [0.017545] GICv3: CPU3: found redistributor 3 region 0:0xf02e
>>> [0.017551] CPU3: Booted secondary processor [411fd073]
>>> [0.017576] Brought up 4 CPUs
>>> [0.017587] SMP: Total of 4 processors activated.
>>> ...
>>> [ 31.745809] INFO: rcu_sched detected stalls on CPUs/tasks:
>>> [ 31.751299] 1-...: (30 GPs behind) idle=c90/0/0 softirq=0/0 fqs=0
>>> [ 31.757557] 2-...: (30 GPs behind) idle=608/0/0 softirq=0/0 fqs=0
>>> [ 31.763814] 3-...: (30 GPs behind) idle=604/0/0 softirq=0/0 fqs=0
>>> [ 31.770069] (detected by 0, t=5252 jiffies, g=-270, c=-271, q=0)
>>> [ 31.776161] Task dump for CPU 1:
>>> [ 31.779381] swapper/1 R running task 0 0 1 0x0080
>>> [ 31.786446] Task dump for CPU 2:
>>> [ 31.789666] swapper/2 R running task 0 0 1 0x0080
>>> [ 31.796725] Task dump for CPU 3:
>>> [ 31.799945] swapper/3 R running task 0 0 1 0x0080
>>>
>>> Is some of that platform-specific?
>>
>> That sounds like timer interrupts aren't being taken.
>>
>> Given that the CPUs have come up, my suspicion would be that the GIC's
>> been left in some odd state, that the kdump kernel hasn't managed to
>> recover from.
>>
>> Marc may have an idea.
>
> I thought kdump was UP only? Anyway, this doesn't look too good.
>
> It would be interesting to find out whether we're still taking
> interrupts. Also, being able to reproduce this on mainline would be
> useful.
>
> I wonder if we don't have a bug when booting on something other than
> CPU#0, possibly on a GICv3 platform... I'll give it a go.

Went ahead and tried a couple of kexecs with various CPUs disabled in
order to force kexec not to boot on CPU#0, and the VM did boot just
fine. So I'd really appreciate a mainline reproducer.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
Re: [PATCH v33 00/14] add kdump support
On 17/03/17 16:24, Mark Rutland wrote:
> On Fri, Mar 17, 2017 at 03:47:08PM +, David Woodhouse wrote:
>> On Fri, 2017-03-17 at 15:33 +, Mark Rutland wrote:
>> No, in this case the CPUs *were* offlined correctly, or at least "as
>> designed", by smp_send_crash_stop(). And if that hadn't worked, as
>> verified by *its* synchronisation method based on the atomic_t
>> waiting_for_crash_ipi, then *it* would have complained for itself:
>>
>> 	if (atomic_read(&waiting_for_crash_ipi) > 0)
>> 		pr_warning("SMP: failed to stop secondary CPUs %*pbl\n",
>> 			   cpumask_pr_args(cpu_online_mask));
>>
>> It's just that smp_send_crash_stop() (or more specifically
>> ipi_cpu_crash_stop()) doesn't touch the online cpu mask. Unlike the
>> ARM32 equivalent function machine_crash_nonpanic_core(), which does.
>>
>> It wasn't clear if that was *intentional*, to allow the original
>> contents of the online mask before the crash to be seen in the
>> resulting vmcore... or purely an accident.
>
> Looking at this, there's a larger mess.
>
> The waiting_for_crash_ipi dance only tells us if CPUs have taken the
> IPI, not whether they've been offlined (i.e. actually left the kernel).
> We need something closer to the usual cpu_{disable,die,kill} dance,
> clearing online as appropriate.
>
> If CPUs haven't left the kernel, we still need to warn about that.
>
>> FWIW if I trigger a crash on CPU 1 my kdump (still 4.9.8+v32) doesn't
>> work. I end up booting the kdump kernel on CPU#1 and then it gets
>> distinctly unhappy...
>>
>> [0.00] Booting Linux on physical CPU 0x1
>> ...
>> [0.017125] Detected PIPT I-cache on CPU1
>> [0.017138] GICv3: CPU1: found redistributor 0 region 0:0xf028
>> [0.017147] CPU1: Booted secondary processor [411fd073]
>> [0.017339] Detected PIPT I-cache on CPU2
>> [0.017347] GICv3: CPU2: found redistributor 2 region 0:0xf02c
>> [0.017354] CPU2: Booted secondary processor [411fd073]
>> [0.017537] Detected PIPT I-cache on CPU3
>> [0.017545] GICv3: CPU3: found redistributor 3 region 0:0xf02e
>> [0.017551] CPU3: Booted secondary processor [411fd073]
>> [0.017576] Brought up 4 CPUs
>> [0.017587] SMP: Total of 4 processors activated.
>> ...
>> [ 31.745809] INFO: rcu_sched detected stalls on CPUs/tasks:
>> [ 31.751299] 1-...: (30 GPs behind) idle=c90/0/0 softirq=0/0 fqs=0
>> [ 31.757557] 2-...: (30 GPs behind) idle=608/0/0 softirq=0/0 fqs=0
>> [ 31.763814] 3-...: (30 GPs behind) idle=604/0/0 softirq=0/0 fqs=0
>> [ 31.770069] (detected by 0, t=5252 jiffies, g=-270, c=-271, q=0)
>> [ 31.776161] Task dump for CPU 1:
>> [ 31.779381] swapper/1 R running task 0 0 1 0x0080
>> [ 31.786446] Task dump for CPU 2:
>> [ 31.789666] swapper/2 R running task 0 0 1 0x0080
>> [ 31.796725] Task dump for CPU 3:
>> [ 31.799945] swapper/3 R running task 0 0 1 0x0080
>>
>> Is some of that platform-specific?
>
> That sounds like timer interrupts aren't being taken.
>
> Given that the CPUs have come up, my suspicion would be that the GIC's
> been left in some odd state, that the kdump kernel hasn't managed to
> recover from.
>
> Marc may have an idea.

I thought kdump was UP only? Anyway, this doesn't look too good.

It would be interesting to find out whether we're still taking
interrupts. Also, being able to reproduce this on mainline would be
useful.

I wonder if we don't have a bug when booting on something other than
CPU#0, possibly on a GICv3 platform... I'll give it a go.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
Re: [PATCH v33 00/14] add kdump support
On Fri, Mar 17, 2017 at 03:47:08PM +, David Woodhouse wrote:
> On Fri, 2017-03-17 at 15:33 +, Mark Rutland wrote:
> No, in this case the CPUs *were* offlined correctly, or at least "as
> designed", by smp_send_crash_stop(). And if that hadn't worked, as
> verified by *its* synchronisation method based on the atomic_t
> waiting_for_crash_ipi, then *it* would have complained for itself:
>
> 	if (atomic_read(&waiting_for_crash_ipi) > 0)
> 		pr_warning("SMP: failed to stop secondary CPUs %*pbl\n",
> 			   cpumask_pr_args(cpu_online_mask));
>
> It's just that smp_send_crash_stop() (or more specifically
> ipi_cpu_crash_stop()) doesn't touch the online cpu mask. Unlike the
> ARM32 equivalent function machine_crash_nonpanic_core(), which does.
>
> It wasn't clear if that was *intentional*, to allow the original
> contents of the online mask before the crash to be seen in the
> resulting vmcore... or purely an accident.

Looking at this, there's a larger mess.

The waiting_for_crash_ipi dance only tells us if CPUs have taken the
IPI, not whether they've been offlined (i.e. actually left the kernel).
We need something closer to the usual cpu_{disable,die,kill} dance,
clearing online as appropriate.

If CPUs haven't left the kernel, we still need to warn about that.

> FWIW if I trigger a crash on CPU 1 my kdump (still 4.9.8+v32) doesn't
> work. I end up booting the kdump kernel on CPU#1 and then it gets
> distinctly unhappy...
>
> [0.00] Booting Linux on physical CPU 0x1
> ...
> [0.017125] Detected PIPT I-cache on CPU1
> [0.017138] GICv3: CPU1: found redistributor 0 region 0:0xf028
> [0.017147] CPU1: Booted secondary processor [411fd073]
> [0.017339] Detected PIPT I-cache on CPU2
> [0.017347] GICv3: CPU2: found redistributor 2 region 0:0xf02c
> [0.017354] CPU2: Booted secondary processor [411fd073]
> [0.017537] Detected PIPT I-cache on CPU3
> [0.017545] GICv3: CPU3: found redistributor 3 region 0:0xf02e
> [0.017551] CPU3: Booted secondary processor [411fd073]
> [0.017576] Brought up 4 CPUs
> [0.017587] SMP: Total of 4 processors activated.
> ...
> [ 31.745809] INFO: rcu_sched detected stalls on CPUs/tasks:
> [ 31.751299] 1-...: (30 GPs behind) idle=c90/0/0 softirq=0/0 fqs=0
> [ 31.757557] 2-...: (30 GPs behind) idle=608/0/0 softirq=0/0 fqs=0
> [ 31.763814] 3-...: (30 GPs behind) idle=604/0/0 softirq=0/0 fqs=0
> [ 31.770069] (detected by 0, t=5252 jiffies, g=-270, c=-271, q=0)
> [ 31.776161] Task dump for CPU 1:
> [ 31.779381] swapper/1 R running task 0 0 1 0x0080
> [ 31.786446] Task dump for CPU 2:
> [ 31.789666] swapper/2 R running task 0 0 1 0x0080
> [ 31.796725] Task dump for CPU 3:
> [ 31.799945] swapper/3 R running task 0 0 1 0x0080
>
> Is some of that platform-specific?

That sounds like timer interrupts aren't being taken.

Given that the CPUs have come up, my suspicion would be that the GIC's
been left in some odd state, that the kdump kernel hasn't managed to
recover from.

Marc may have an idea.

Thanks,
Mark.
Re: [PATCH v33 00/14] add kdump support
On Fri, Mar 17, 2017 at 02:02:53PM +, David Woodhouse wrote:
> On Fri, 2017-03-17 at 11:43 +, David Woodhouse wrote:
>>
>> Is this one going to be my fault too?
>
> Looks like it isn't my fault. In ipi_cpu_crash_stop() we don't modify
> the online mask. Which is reasonable enough if we want to preserve its
> original contents from before the crash, but it does make that
> WARN_ON() in machine_kexec() a false positive.

I'd say it's not so much a false positive, but rather an uninformative
message. Some warning here is completely appropriate. Even if the CPUs
are stopped in the kernel, there are a number of cases where the HW can
corrupt system state in the background.

We can certainly log a better message, e.g.

	bool kdump = (image == kexec_crash_image);
	bool stuck_cpus = cpus_are_stuck_in_kernel() ||
			  num_online_cpus() > 1;

	BUG_ON(stuck_cpus && !kdump);
	WARN(stuck_cpus, "Unable to offline CPUs, kdump will be unreliable.\n");

Thanks,
Mark.
Re: [PATCH v33 00/14] add kdump support
On Fri, Mar 17, 2017 at 02:02:53PM +, David Woodhouse wrote:
> On Fri, 2017-03-17 at 11:43 +, David Woodhouse wrote:
>>
>> Is this one going to be my fault too?
>
> Looks like it isn't my fault. In ipi_cpu_crash_stop() we don't modify
> the online mask. Which is reasonable enough if we want to preserve its
> original contents from before the crash, but it does make that
> WARN_ON() in machine_kexec() a false positive.
>
> Btw, why is this a normal IPI and not something... less maskable?
> On x86 we use NMI for that...

Architecturally, arm64 does not have an NMI. There's been some work to
try to get pseudo-NMIs using GICv3 priorities, but that's about the
closest we can get today.

Thanks,
Mark.
Re: [PATCH v33 00/14] add kdump support
On Wed, 2017-03-15 at 18:56 +0900, AKASHI Takahiro wrote:
> This patch series adds kdump support on arm64.
>
> To load a crash-dump kernel to the systems, a series of patches to
> kexec-tools [1] are also needed. Please use the latest one, v6 [2].
> For your convenience, you can pick them up from:
> https://git.linaro.org/people/takahiro.akashi/linux-aarch64.git arm64/kdump
> https://git.linaro.org/people/takahiro.akashi/kexec-tools.git arm64/kdump
>
> To examine vmcore (/proc/vmcore) on a crash-dump kernel, you can use
> - crash utility (v7.1.8 or later) [3]
>
> I tested this patchset on fast model and hikey.

Is this one going to be my fault too?

[ 38.947305] SMP: stopping secondary CPUs
[ 38.951291] Starting crashdump kernel...
[ 38.955207] [ cut here ]
[ 38.959818] WARNING: CPU: 0 PID: 312 at arch/arm64/kernel/machine_kexec.c:169 machine_kexec+0x220/0x298

That's the cpus_are_stuck_in_kernel() || num_online_cpus() one.
Again, v32 applied to a 4.9 kernel. With the stray __init removed this
time ;)

Taking secondary CPUs down for a normal kexec does work via PSCI. I'll
see if I can enable enough hardware support to let me test a newer
kernel...
Re: [PATCH v33 00/14] add kdump support
On Fri, 2017-03-17 at 11:43 +, David Woodhouse wrote:
>
> Is this one going to be my fault too?

Looks like it isn't my fault. In ipi_cpu_crash_stop() we don't modify
the online mask. Which is reasonable enough if we want to preserve its
original contents from before the crash, but it does make that
WARN_ON() in machine_kexec() a false positive.

Btw, why is this a normal IPI and not something... less maskable?
On x86 we use NMI for that...
Re: [PATCH v33 04/14] arm64: kdump: reserve memory for crash dump kernel
On Fri, 2017-03-17 at 20:31 +0900, AKASHI Takahiro wrote:
>
> Yes and no.
> This notation is consistent with other places like mem_init()
> in mm/init.c.

Well, perhaps we should fix those too then. But we certainly shouldn't
add *more* errors.
Re: [PATCH v33 04/14] arm64: kdump: reserve memory for crash dump kernel
On Wed, 2017-03-15 at 18:59 +0900, AKASHI Takahiro wrote:
>
> +	pr_info("crashkernel reserved: 0x%016llx - 0x%016llx (%lld MB)\n",
> +		crash_base, crash_base + crash_size, crash_size >> 20);

There's a typo there — it says MB but you mean MiB.

Unless you meant crash_size / 1000000 and not crash_size >> 20?
Re: kexec regression since 4.9 caused by efi
On Fri, 17 Mar, at 10:09:51AM, Dave Young wrote:
>
> Matt, I think it should be fine although I think the md type checking in
> efi_mem_desc_lookup() is causing confusion and not easy to understand..

Could you make that a separate patch if you think of improvements there?

> How about move the if chunk early like below because it seems no need
> to sanity check the addr + size any more if the md is still RUNTIME?

My original version did as you suggest, but I changed it because we
*really* want to know if someone tries to reserve a range that spans
regions. That would be totally unexpected and a warning about a
potential bug/issue.
Re: kexec regression since 4.9 caused by efi
On 17 March 2017 at 02:09, Dave Young wrote:
> On 03/16/17 at 12:41pm, Matt Fleming wrote:
>> On Mon, 13 Mar, at 03:37:48PM, Dave Young wrote:
>>>
>>> Omar, could you try the patch below? Looking at efi_mem_desc_lookup(),
>>> it is not correct to be used in efi_arch_mem_reserve(); if it passes
>>> your test, I can rewrite the patch log with more background and send
>>> it out:
>>>
>>> 	for_each_efi_memory_desc(md) {
>>> 		[snip]
>>> 		if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
>>> 		    md->type != EFI_BOOT_SERVICES_DATA &&
>>> 		    md->type != EFI_RUNTIME_SERVICES_DATA) {
>>> 			continue;
>>> 		}
>>>
>>> The above code is meant to get an md with EFI_MEMORY_RUNTIME that is
>>> either boot data or runtime data. This is wrong for efi_mem_reserve(),
>>> because we are reserving boot data which has no EFI_MEMORY_RUNTIME
>>> attribute at running time. It just happened to work and we did not
>>> catch the error.
>>
>> Wouldn't something like this be simpler?
>>
>> ---
>>
>> diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
>> index 30031d5293c4..cdfe8c628959 100644
>> --- a/arch/x86/platform/efi/quirks.c
>> +++ b/arch/x86/platform/efi/quirks.c
>> @@ -201,6 +201,10 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
>>  		return;
>>  	}
>>
>> +	/* No need to reserve regions that will never be freed. */
>> +	if (md.attribute & EFI_MEMORY_RUNTIME)
>> +		return;
>> +
>
> Matt, I think it should be fine although I think the md type checking in
> efi_mem_desc_lookup() is causing confusion and not easy to understand..
>
> How about move the if chunk early like below because it seems no need
> to sanity check the addr + size any more if the md is still RUNTIME?
>
> --- linux-x86.orig/arch/x86/platform/efi/quirks.c
> +++ linux-x86/arch/x86/platform/efi/quirks.c
> @@ -196,6 +196,10 @@ void __init efi_arch_mem_reserve(phys_ad
>  		return;
>  	}
>
> +	/* No need to reserve regions that will never be freed. */
> +	if (md.attribute & EFI_MEMORY_RUNTIME)
> +		return;
> +
>  	if (addr + size > md.phys_addr + (md.num_pages << EFI_PAGE_SHIFT)) {
>  		pr_err("Region spans EFI memory descriptors, %pa\n", &addr);
>  		return;
>

This way, we suppress the error if the region spans multiple
descriptors and only the first one has the runtime attribute. So the
two patches are not equivalent. I don't have a strong preference either
way, though.
Re: [PATCH v33 04/14] arm64: kdump: reserve memory for crash dump kernel
On Fri, Mar 17, 2017 at 10:46:06AM +, David Woodhouse wrote:
> On Wed, 2017-03-15 at 18:59 +0900, AKASHI Takahiro wrote:
>>
>> +	pr_info("crashkernel reserved: 0x%016llx - 0x%016llx (%lld MB)\n",
>> +		crash_base, crash_base + crash_size, crash_size >> 20);
>
> There's a typo there — it says MB but you mean MiB.
>
> Unless you meant crash_size / 1000000 and not crash_size >> 20?

Yes and no.
This notation is consistent with other places like mem_init()
in mm/init.c.

Thanks,
-Takahiro AKASHI
[PATCH] x86_64, kexec: Avoid unnecessary identity mappings for kdump
kexec sets up identity mappings for all the memory mapped in the 1st
kernel, but this is not necessary for the kdump case. It can actually
cause extra memory consumption for paging structures, which is quite
considerable on modern machines with huge memory.

E.g. on our 24TB machine, it will waste around 96MB (around 4MB/TB)
from the reserved memory range if all the identity mappings are set up.

It also causes some trouble for distributions that use an intelligent
policy to evaluate the proper "crashkernel=X" for users.

To solve it, in the kdump case we only set up identity mappings for the
crash memory and the ISA memory (which may be needed by purgatory/kdump
boot).

Signed-off-by: Xunlei Pang
---
 arch/x86/kernel/machine_kexec_64.c | 34 ++
 1 file changed, 30 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 857cdbd..db77a76 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -112,14 +112,40 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)

 	level4p = (pgd_t *)__va(start_pgtable);
 	clear_page(level4p);
-	for (i = 0; i < nr_pfn_mapped; i++) {
-		mstart = pfn_mapped[i].start << PAGE_SHIFT;
-		mend = pfn_mapped[i].end << PAGE_SHIFT;

+	if (image->type == KEXEC_TYPE_CRASH) {
+		/* Always map the ISA range */
 		result = kernel_ident_mapping_init(&info,
-				level4p, mstart, mend);
+				level4p, 0, ISA_END_ADDRESS);
 		if (result)
 			return result;
+
+		/* crashk_low_res may not be initialized when reaching here */
+		if (crashk_low_res.end) {
+			mstart = crashk_low_res.start;
+			mend = crashk_low_res.end + 1;
+			result = kernel_ident_mapping_init(&info,
+					level4p, mstart, mend);
+			if (result)
+				return result;
+		}
+
+		mstart = crashk_res.start;
+		mend = crashk_res.end + 1;
+		result = kernel_ident_mapping_init(&info,
+				level4p, mstart, mend);
+		if (result)
+			return result;
+	} else {
+		for (i = 0; i < nr_pfn_mapped; i++) {
+			mstart = pfn_mapped[i].start << PAGE_SHIFT;
+			mend = pfn_mapped[i].end << PAGE_SHIFT;
+
+			result = kernel_ident_mapping_init(&info,
+					level4p, mstart, mend);
+			if (result)
+				return result;
+		}
 	}

 	/*
-- 
1.8.3.1