Re: [PATCH] x86_64, kexec: Avoid unnecessary identity mappings for kdump

2017-03-17 Thread Eric W. Biederman
Xunlei Pang  writes:

> kexec sets up identity mappings for all the memory mapped in the 1st kernel,
> but this is not necessary for the kdump case. Actually it can cause extra
> memory consumption for paging structures, which is quite considerable
> on modern machines with huge memory.
>
> E.g. On our 24TB machine, it will waste around 96MB (around 4MB/TB)
> from the reserved memory range if setting all the identity mappings.
>
> It also causes some trouble for distributions that use an intelligent
> policy to evaluate the proper "crashkernel=X" for users.
>
> To solve it, in the kdump case we only set up identity mappings for the
> crash memory and the ISA memory (which may be needed by purgatory/kdump
> boot).

How about instead we detect the presence of 1GiB pages and use them
if they are available.  We already use 2MiB pages.  If we can do that
we will only need about 192K for page tables in the case you have
described and this all becomes a non-issue.
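For reference, Eric's numbers check out on the back of an envelope. A quick sketch, assuming classic x86_64 4-level paging with 4 KiB paging-structure pages of 512 eight-byte entries each (variable names are illustrative, not kernel symbols):

```python
# Back-of-envelope page-table cost for an identity map of 24 TiB.
PAGE = 4096          # bytes per paging-structure page
ENTRIES = 512        # 8-byte entries per page

mem = 24 * 2**40     # 24 TiB of physical memory

# With 2 MiB pages: one PMD page maps 512 * 2 MiB = 1 GiB, one PUD
# page maps 512 GiB, plus a single PGD page on top.
pmd_pages = mem // (ENTRIES * 2**21)          # 24576 PMD pages
pud_pages = mem // (ENTRIES * 2**30)          # 48 PUD pages
cost_2m = (pmd_pages + pud_pages + 1) * PAGE

# With 1 GiB pages the whole PMD level disappears: just the 48 PUD
# pages (192 KiB) plus the PGD page remain.
cost_1g = (pud_pages + 1) * PAGE

print(cost_2m // 2**20, "MiB")   # matches the reported ~4MB/TB
print(cost_1g // 2**10, "KiB")
```

This reproduces both figures in the thread: about 96 MiB of paging structures with 2 MiB mappings of 24 TiB, versus roughly 192 KiB of PUD pages when 1 GiB mappings are used.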

I strongly suspect that the presence of 24TiB of memory in an x86 system
strongly correlates to the presence of 1GiB pages.

In principle we certainly can use a less extensive mapping but that
should not be something that differs between the two kexec cases.

I can see forcing the low 1MiB range in.  But calling it the ISA range is
very wrong and misleading.  The reasons that range is special during
boot-up have nothing to do with ISA, but have everything to do with
where legacy page tables are mapped, and where we need identity pages to
start other cpus.  I think the only user that actually cares is
purgatory, where it plays swapping games with the low 1MiB because we
can't preload what we need down there or it would mess up the running
kernel.  So saying anything about the old ISA bus is wrong and
misleading.  At the very least we need accurate comments.

Eric


>
> Signed-off-by: Xunlei Pang 
> ---
>  arch/x86/kernel/machine_kexec_64.c | 34 ++
>  1 file changed, 30 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index 857cdbd..db77a76 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -112,14 +112,40 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
>  
>   level4p = (pgd_t *)__va(start_pgtable);
>   clear_page(level4p);
> - for (i = 0; i < nr_pfn_mapped; i++) {
> - mstart = pfn_mapped[i].start << PAGE_SHIFT;
> - mend   = pfn_mapped[i].end << PAGE_SHIFT;
>  
> + if (image->type == KEXEC_TYPE_CRASH) {
> + /* Always map the ISA range */
>   result = kernel_ident_mapping_init(&info,
> -  level4p, mstart, mend);
> + level4p, 0, ISA_END_ADDRESS);
>   if (result)
>   return result;
> +
> + /* crashk_low_res may not be initialized when reaching here */
> + if (crashk_low_res.end) {
> + mstart = crashk_low_res.start;
> + mend = crashk_low_res.end + 1;
> + result = kernel_ident_mapping_init(&info,
> + level4p, mstart, mend);
> + if (result)
> + return result;
> + }
> +
> + mstart = crashk_res.start;
> + mend = crashk_res.end + 1;
> + result = kernel_ident_mapping_init(&info,
> + level4p, mstart, mend);
> + if (result)
> + return result;
> + } else {
> + for (i = 0; i < nr_pfn_mapped; i++) {
> + mstart = pfn_mapped[i].start << PAGE_SHIFT;
> + mend   = pfn_mapped[i].end << PAGE_SHIFT;
> +
> + result = kernel_ident_mapping_init(&info,
> +  level4p, mstart, mend);
> + if (result)
> + return result;
> + }
>   }
>  
>   /*

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH v2] kexec: Introduce vmcoreinfo signature verification

2017-03-17 Thread Eric W. Biederman
Xunlei Pang  writes:

> Currently vmcoreinfo data is updated at boot time in subsys_initcall(),
> so it runs the risk of being modified by some errant code while the
> system is running.
>
> As a result, the dumped vmcore may contain wrong vmcoreinfo. Later on,
> when using the "crash" or "makedumpfile" (etc.) utilities to parse this
> vmcore, we will probably get "Segmentation fault" or other
> unexpected/confusing errors.

If this is a real concern (and the previous discussion sounds like it
is), part of what we need to do is move the variable vmcoreinfo_note out
of the kernel's .bss section, and modify the code to regenerate
and keep this information in something like the control page.

Definitely something like this needs a page all to itself, and ideally
far away from any other kernel data structures.  I clearly was not
watching closely the day someone decided to keep this silly thing
in the kernel's .bss section.

> As vmcoreinfo is the most fundamental information for vmcore, we had
> better double-check its correctness. Here we generate a signature (using
> crc32) after it is saved, then verify it in crash_save_vmcoreinfo() to
> see if the signature was broken; if so, we re-save the vmcoreinfo data
> to get correct vmcoreinfo for kdump as best we can.

Sigh.  We already have a sha256 that is supposed to cover this sort of
thing.  The bug rather is that apparently it isn't covering this data.
That sounds like what we should be fixing.

Please let's not invent new mechanisms we have to maintain.  Let's
reorganize this so this static data is protected like all other static
data in the kexec-on-panic path.  We have good mechanisms and good
strategies for avoiding and detecting corruption; we just need to use
them.
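For context, the detect-and-regenerate scheme the patch implements can be modeled in a few lines. `zlib.crc32` stands in for the kernel's crc32(), and the data contents and helper name are purely illustrative:

```python
# Toy model of the patch's scheme: checksum the blob once at init,
# re-check at crash time, and rebuild the data if the check fails.
import zlib

def build_vmcoreinfo():
    # Stand-in for do_crash_save_vmcoreinfo_init(); contents invented.
    return bytearray(b"OSRELEASE=4.11.0\nPAGESIZE=4096\n")

data = build_vmcoreinfo()
sig = zlib.crc32(bytes(data))        # saved at subsys_initcall time

data[0] ^= 0xFF                      # a stray write corrupts the blob

if zlib.crc32(bytes(data)) != sig:   # crash_verify_vmcoreinfo() fails
    data = build_vmcoreinfo()        # so re-save before dumping

assert zlib.crc32(bytes(data)) == sig
```

Eric's objection, to be clear, is not that such a check fails to work, but that the existing sha256 covering the kexec-on-panic path should protect this data instead of a new ad hoc mechanism.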

Eric



> Signed-off-by: Xunlei Pang 
> ---
> v1->v2:
> - Keep crash_save_vmcoreinfo_init() because "makedumpfile --mem-usage"
>   uses the information.
> - Add crc32 verification for vmcoreinfo, re-save when failure.
>
>  arch/Kconfig|  1 +
>  kernel/kexec_core.c | 43 +++
>  2 files changed, 36 insertions(+), 8 deletions(-)
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index c4d6833..66eb296 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -4,6 +4,7 @@
>  
>  config KEXEC_CORE
>   bool
> + select CRC32
>  
>  config HAVE_IMA_KEXEC
>   bool
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index bfe62d5..012acbe 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -38,6 +38,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -53,9 +54,10 @@
>  
>  /* vmcoreinfo stuff */
>  static unsigned char vmcoreinfo_data[VMCOREINFO_BYTES];
> -u32 vmcoreinfo_note[VMCOREINFO_NOTE_SIZE/4];
> +static u32 vmcoreinfo_sig;
>  size_t vmcoreinfo_size;
>  size_t vmcoreinfo_max_size = sizeof(vmcoreinfo_data);
> +u32 vmcoreinfo_note[VMCOREINFO_NOTE_SIZE/4];
>  
>  /* Flag to indicate we are going to kexec a new kernel */
>  bool kexec_in_progress = false;
> @@ -1367,12 +1369,6 @@ static void update_vmcoreinfo_note(void)
>   final_note(buf);
>  }
>  
> -void crash_save_vmcoreinfo(void)
> -{
> - vmcoreinfo_append_str("CRASHTIME=%ld\n", get_seconds());
> - update_vmcoreinfo_note();
> -}
> -
>  void vmcoreinfo_append_str(const char *fmt, ...)
>  {
>   va_list args;
> @@ -1402,7 +1398,7 @@ phys_addr_t __weak paddr_vmcoreinfo_note(void)
>   return __pa_symbol((unsigned long)(char *)&vmcoreinfo_note);
>  }
>  
> -static int __init crash_save_vmcoreinfo_init(void)
> +static void do_crash_save_vmcoreinfo_init(void)
>  {
>   VMCOREINFO_OSRELEASE(init_uts_ns.name.release);
>   VMCOREINFO_PAGESIZE(PAGE_SIZE);
> @@ -1474,6 +1470,37 @@ static int __init crash_save_vmcoreinfo_init(void)
>  #endif
>  
>   arch_crash_save_vmcoreinfo();
> +}
> +
> +static u32 crash_calc_vmcoreinfo_sig(void)
> +{
> + return crc32(~0, vmcoreinfo_data, vmcoreinfo_size);
> +}
> +
> +static bool crash_verify_vmcoreinfo(void)
> +{
> + if (crash_calc_vmcoreinfo_sig() == vmcoreinfo_sig)
> + return true;
> +
> + return false;
> +}
> +
> +void crash_save_vmcoreinfo(void)
> +{
> + /* Re-save if verification fails */
> + if (!crash_verify_vmcoreinfo()) {
> + vmcoreinfo_size = 0;
> + do_crash_save_vmcoreinfo_init();
> + }
> +
> + vmcoreinfo_append_str("CRASHTIME=%ld\n", get_seconds());
> + update_vmcoreinfo_note();
> +}
> +
> +static int __init crash_save_vmcoreinfo_init(void)
> +{
> + do_crash_save_vmcoreinfo_init();
> + vmcoreinfo_sig = crash_calc_vmcoreinfo_sig();
>   update_vmcoreinfo_note();
>  
>   return 0;



Re: [PATCH v33 00/14] add kdump support

2017-03-17 Thread Marc Zyngier
On 17/03/17 16:59, Marc Zyngier wrote:
> On 17/03/17 16:24, Mark Rutland wrote:
>> On Fri, Mar 17, 2017 at 03:47:08PM +, David Woodhouse wrote:
>>> On Fri, 2017-03-17 at 15:33 +, Mark Rutland wrote:
>>> No, in this case the CPUs *were* offlined correctly, or at least "as
>>> designed", by smp_send_crash_stop(). And if that hadn't worked, as
>>> verified by *its* synchronisation method based on the atomic_t
>>> waiting_for_crash_ipi, then *it* would have complained for itself:
>>>
>>> if (atomic_read(&waiting_for_crash_ipi) > 0)
>>> pr_warning("SMP: failed to stop secondary CPUs %*pbl\n",
>>>cpumask_pr_args(cpu_online_mask));
>>>
>>> It's just that smp_send_crash_stop() (or more specifically
>>> ipi_cpu_crash_stop()) doesn't touch the online cpu mask. Unlike the
>>> ARM32 equivalent function machine_crash_nonpanic_core(), which does.
>>>
>>> It wasn't clear if that was *intentional*, to allow the original
>>> contents of the online mask before the crash to be seen in the
>>> resulting vmcore... or purely an accident. 
>>
>> Looking at this, there's a larger mess.
>>
>> The waiting_for_crash_ipi dance only tells us if CPUs have taken the
>> IPI, not whether they've been offlined (i.e. actually left the kernel).
>> We need something closer to the usual cpu_{disable,die,kill} dance,
>> clearing online as appropriate.
>>
>> If CPUs haven't left the kernel, we still need to warn about that.
>>
>>> FWIW if I trigger a crash on CPU 1 my kdump (still 4.9.8+v32) doesn't work.
>>> I end up booting the kdump kernel on CPU#1 and then it gets distinctly 
>>> unhappy...
>>>
>>> [0.00] Booting Linux on physical CPU 0x1
>>> ...
>>> [0.017125] Detected PIPT I-cache on CPU1
>>> [0.017138] GICv3: CPU1: found redistributor 0 region 0:0xf028
>>> [0.017147] CPU1: Booted secondary processor [411fd073]
>>> [0.017339] Detected PIPT I-cache on CPU2
>>> [0.017347] GICv3: CPU2: found redistributor 2 region 0:0xf02c
>>> [0.017354] CPU2: Booted secondary processor [411fd073]
>>> [0.017537] Detected PIPT I-cache on CPU3
>>> [0.017545] GICv3: CPU3: found redistributor 3 region 0:0xf02e
>>> [0.017551] CPU3: Booted secondary processor [411fd073]
>>> [0.017576] Brought up 4 CPUs
>>> [0.017587] SMP: Total of 4 processors activated.
>>> ...
>>> [   31.745809] INFO: rcu_sched detected stalls on CPUs/tasks:
>>> [   31.751299]  1-...: (30 GPs behind) idle=c90/0/0 softirq=0/0 fqs=0 
>>> [   31.757557]  2-...: (30 GPs behind) idle=608/0/0 softirq=0/0 fqs=0 
>>> [   31.763814]  3-...: (30 GPs behind) idle=604/0/0 softirq=0/0 fqs=0 
>>> [   31.770069]  (detected by 0, t=5252 jiffies, g=-270, c=-271, q=0)
>>> [   31.776161] Task dump for CPU 1:
>>> [   31.779381] swapper/1   R  running task0 0  1 
>>> 0x0080
>>> [   31.786446] Task dump for CPU 2:
>>> [   31.789666] swapper/2   R  running task0 0  1 
>>> 0x0080
>>> [   31.796725] Task dump for CPU 3:
>>> [   31.799945] swapper/3   R  running task0 0  1 
>>> 0x0080
>>>
>>> Is some of that platform-specific?
>>
>> That sounds like timer interrupts aren't being taken.
>>
>> Given that the CPUs have come up, my suspicion would be that the GIC's
>> been left in some odd state, that the kdump kernel hasn't managed to
>> recover from.
>>
>> Marc may have an idea.
> 
> I thought kdump was UP only? Anyway, this doesn't look too good.
> 
> It would be interesting to find out whether we're still taking
> interrupts. Also, being able to reproduce this on mainline would be useful.
> 
> I wonder if we don't have a bug when booting on something other than
> CPU#0, possibly on a GICv3 platform... I'll give it a go.

Went ahead and tried a couple of kexecs with various CPUs disabled in
order to force kexec not to boot on CPU#0, and the VM did boot just fine.

So I'd really appreciate a mainline reproducer.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...



Re: [PATCH v33 00/14] add kdump support

2017-03-17 Thread Marc Zyngier
On 17/03/17 16:24, Mark Rutland wrote:
> On Fri, Mar 17, 2017 at 03:47:08PM +, David Woodhouse wrote:
>> On Fri, 2017-03-17 at 15:33 +, Mark Rutland wrote:
>> No, in this case the CPUs *were* offlined correctly, or at least "as
>> designed", by smp_send_crash_stop(). And if that hadn't worked, as
>> verified by *its* synchronisation method based on the atomic_t
>> waiting_for_crash_ipi, then *it* would have complained for itself:
>>
>>  if (atomic_read(&waiting_for_crash_ipi) > 0)
>>  pr_warning("SMP: failed to stop secondary CPUs %*pbl\n",
>> cpumask_pr_args(cpu_online_mask));
>>
>> It's just that smp_send_crash_stop() (or more specifically
>> ipi_cpu_crash_stop()) doesn't touch the online cpu mask. Unlike the
>> ARM32 equivalent function machine_crash_nonpanic_core(), which does.
>>
>> It wasn't clear if that was *intentional*, to allow the original
>> contents of the online mask before the crash to be seen in the
>> resulting vmcore... or purely an accident. 
> 
> Looking at this, there's a larger mess.
> 
> The waiting_for_crash_ipi dance only tells us if CPUs have taken the
> IPI, not whether they've been offlined (i.e. actually left the kernel).
> We need something closer to the usual cpu_{disable,die,kill} dance,
> clearing online as appropriate.
> 
> If CPUs haven't left the kernel, we still need to warn about that.
> 
>> FWIW if I trigger a crash on CPU 1 my kdump (still 4.9.8+v32) doesn't work.
>> I end up booting the kdump kernel on CPU#1 and then it gets distinctly 
>> unhappy...
>>
>> [0.00] Booting Linux on physical CPU 0x1
>> ...
>> [0.017125] Detected PIPT I-cache on CPU1
>> [0.017138] GICv3: CPU1: found redistributor 0 region 0:0xf028
>> [0.017147] CPU1: Booted secondary processor [411fd073]
>> [0.017339] Detected PIPT I-cache on CPU2
>> [0.017347] GICv3: CPU2: found redistributor 2 region 0:0xf02c
>> [0.017354] CPU2: Booted secondary processor [411fd073]
>> [0.017537] Detected PIPT I-cache on CPU3
>> [0.017545] GICv3: CPU3: found redistributor 3 region 0:0xf02e
>> [0.017551] CPU3: Booted secondary processor [411fd073]
>> [0.017576] Brought up 4 CPUs
>> [0.017587] SMP: Total of 4 processors activated.
>> ...
>> [   31.745809] INFO: rcu_sched detected stalls on CPUs/tasks:
>> [   31.751299]   1-...: (30 GPs behind) idle=c90/0/0 softirq=0/0 fqs=0 
>> [   31.757557]   2-...: (30 GPs behind) idle=608/0/0 softirq=0/0 fqs=0 
>> [   31.763814]   3-...: (30 GPs behind) idle=604/0/0 softirq=0/0 fqs=0 
>> [   31.770069]   (detected by 0, t=5252 jiffies, g=-270, c=-271, q=0)
>> [   31.776161] Task dump for CPU 1:
>> [   31.779381] swapper/1   R  running task0 0  1 
>> 0x0080
>> [   31.786446] Task dump for CPU 2:
>> [   31.789666] swapper/2   R  running task0 0  1 
>> 0x0080
>> [   31.796725] Task dump for CPU 3:
>> [   31.799945] swapper/3   R  running task0 0  1 
>> 0x0080
>>
>> Is some of that platform-specific?
> 
> That sounds like timer interrupts aren't being taken.
> 
> Given that the CPUs have come up, my suspicion would be that the GIC's
> been left in some odd state, that the kdump kernel hasn't managed to
> recover from.
> 
> Marc may have an idea.

I thought kdump was UP only? Anyway, this doesn't look too good.

It would be interesting to find out whether we're still taking
interrupts. Also, being able to reproduce this on mainline would be useful.

I wonder if we don't have a bug when booting on something other than
CPU#0, possibly on a GICv3 platform... I'll give it a go.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...



Re: [PATCH v33 00/14] add kdump support

2017-03-17 Thread Mark Rutland
On Fri, Mar 17, 2017 at 03:47:08PM +, David Woodhouse wrote:
> On Fri, 2017-03-17 at 15:33 +, Mark Rutland wrote:
> No, in this case the CPUs *were* offlined correctly, or at least "as
> designed", by smp_send_crash_stop(). And if that hadn't worked, as
> verified by *its* synchronisation method based on the atomic_t
> waiting_for_crash_ipi, then *it* would have complained for itself:
> 
>   if (atomic_read(&waiting_for_crash_ipi) > 0)
>   pr_warning("SMP: failed to stop secondary CPUs %*pbl\n",
>      cpumask_pr_args(cpu_online_mask));
> 
> It's just that smp_send_crash_stop() (or more specifically
> ipi_cpu_crash_stop()) doesn't touch the online cpu mask. Unlike the
> ARM32 equivalent function machine_crash_nonpanic_core(), which does.
> 
> It wasn't clear if that was *intentional*, to allow the original
> contents of the online mask before the crash to be seen in the
> resulting vmcore... or purely an accident. 

Looking at this, there's a larger mess.

The waiting_for_crash_ipi dance only tells us if CPUs have taken the
IPI, not whether they've been offlined (i.e. actually left the kernel).
We need something closer to the usual cpu_{disable,die,kill} dance,
clearing online as appropriate.

If CPUs haven't left the kernel, we still need to warn about that.

> FWIW if I trigger a crash on CPU 1 my kdump (still 4.9.8+v32) doesn't work.
> I end up booting the kdump kernel on CPU#1 and then it gets distinctly 
> unhappy...
> 
> [0.00] Booting Linux on physical CPU 0x1
> ...
> [0.017125] Detected PIPT I-cache on CPU1
> [0.017138] GICv3: CPU1: found redistributor 0 region 0:0xf028
> [0.017147] CPU1: Booted secondary processor [411fd073]
> [0.017339] Detected PIPT I-cache on CPU2
> [0.017347] GICv3: CPU2: found redistributor 2 region 0:0xf02c
> [0.017354] CPU2: Booted secondary processor [411fd073]
> [0.017537] Detected PIPT I-cache on CPU3
> [0.017545] GICv3: CPU3: found redistributor 3 region 0:0xf02e
> [0.017551] CPU3: Booted secondary processor [411fd073]
> [0.017576] Brought up 4 CPUs
> [0.017587] SMP: Total of 4 processors activated.
> ...
> [   31.745809] INFO: rcu_sched detected stalls on CPUs/tasks:
> [   31.751299]    1-...: (30 GPs behind) idle=c90/0/0 softirq=0/0 fqs=0 
> [   31.757557]    2-...: (30 GPs behind) idle=608/0/0 softirq=0/0 fqs=0 
> [   31.763814]    3-...: (30 GPs behind) idle=604/0/0 softirq=0/0 fqs=0 
> [   31.770069]    (detected by 0, t=5252 jiffies, g=-270, c=-271, q=0)
> [   31.776161] Task dump for CPU 1:
> [   31.779381] swapper/1   R  running task0 0  1 
> 0x0080
> [   31.786446] Task dump for CPU 2:
> [   31.789666] swapper/2   R  running task0 0  1 
> 0x0080
> [   31.796725] Task dump for CPU 3:
> [   31.799945] swapper/3   R  running task0 0  1 
> 0x0080
> 
> Is some of that platform-specific?

That sounds like timer interrupts aren't being taken.

Given that the CPUs have come up, my suspicion would be that the GIC's
been left in some odd state, that the kdump kernel hasn't managed to
recover from.

Marc may have an idea.

Thanks,
Mark.



Re: [PATCH v33 00/14] add kdump support

2017-03-17 Thread Mark Rutland
On Fri, Mar 17, 2017 at 02:02:53PM +, David Woodhouse wrote:
> On Fri, 2017-03-17 at 11:43 +, David Woodhouse wrote:
> > 
> > Is this one going to be my fault too?
> 
> Looks like it isn't my fault. In ipi_cpu_crash_stop() we don't modify
> the online mask. Which is reasonable enough if we want to preserve its
> original contents from before the crash, but it does make that
> WARN_ON() in machine_kexec() a false positive.

I'd say it's not so much a false positive, but rather an uninformative
message.

Some warning here is completely appropriate. Even if the CPUs are
stopped in the kernel, there are a number of cases where the HW can
corrupt system state in the background.

We can certainly log a better message, e.g.

bool kdump = (image == kexec_crash_image);
bool stuck_cpus = cpus_are_stuck_in_kernel() ||
  num_online_cpus() > 1;

BUG_ON(stuck_cpus && !kdump);
WARN(stuck_cpus, "Unable to offline CPUs, kdump will be unreliable.\n");

Thanks,
Mark.



Re: [PATCH v33 00/14] add kdump support

2017-03-17 Thread Mark Rutland
On Fri, Mar 17, 2017 at 02:02:53PM +, David Woodhouse wrote:
> On Fri, 2017-03-17 at 11:43 +, David Woodhouse wrote:
> > 
> > Is this one going to be my fault too?
> 
> Looks like it isn't my fault. In ipi_cpu_crash_stop() we don't modify
> the online mask. Which is reasonable enough if we want to preserve its
> original contents from before the crash, but it does make that
> WARN_ON() in machine_kexec() a false positive.
> 
> Btw, why is this a normal IPI and not something... less maskable?
> On x86 we use NMI for that...

Architecturally, arm64 does not have an NMI.

There's been some work to try to get pseudo-NMIs using GICv3 priorities,
but that's about the closest we can get today.

Thanks,
Mark.



Re: [PATCH v33 00/14] add kdump support

2017-03-17 Thread David Woodhouse
On Wed, 2017-03-15 at 18:56 +0900, AKASHI Takahiro wrote:
> This patch series adds kdump support on arm64.
> 
> To load a crash-dump kernel to the systems, a series of patches to
> kexec-tools[1] are also needed. Please use the latest one, v6 [2].
> For your convenience, you can pick them up from:
>    https://git.linaro.org/people/takahiro.akashi/linux-aarch64.git arm64/kdump
>    https://git.linaro.org/people/takahiro.akashi/kexec-tools.git arm64/kdump
> 
> To examine vmcore (/proc/vmcore) on a crash-dump kernel, you can use
>   - crash utility (v7.1.8 or later) [3]
> 
> I tested this patchset on fast model and hikey.

Is this one going to be my fault too?

[   38.947305] SMP: stopping secondary CPUs
[   38.951291] Starting crashdump kernel...
[   38.955207] [ cut here ]
[   38.959818] WARNING: CPU: 0 PID: 312 at arch/arm64/kernel/machine_kexec.c:169 machine_kexec+0x220/0x298

That's the cpus_are_stuck_in_kernel() || num_online_cpus() one. Again,
v32 applied to a 4.9 kernel. With the stray __init removed this time ;)

Taking secondary CPUs down for a normal kexec does work via PSCI.

I'll see if I can enable enough hardware support to let me test a newer
kernel...



Re: [PATCH v33 00/14] add kdump support

2017-03-17 Thread David Woodhouse
On Fri, 2017-03-17 at 11:43 +, David Woodhouse wrote:
> 
> Is this one going to be my fault too?

Looks like it isn't my fault. In ipi_cpu_crash_stop() we don't modify
the online mask. Which is reasonable enough if we want to preserve its
original contents from before the crash, but it does make that
WARN_ON() in machine_kexec() a false positive.

Btw, why is this a normal IPI and not something... less maskable?
On x86 we use NMI for that...



Re: [PATCH v33 04/14] arm64: kdump: reserve memory for crash dump kernel

2017-03-17 Thread David Woodhouse
On Fri, 2017-03-17 at 20:31 +0900, AKASHI Takahiro wrote:
> 
> Yes and no.
> This notation is consistent with other places like mem_init()
> in mm/init.c.

Well, perhaps we should fix those too then. But we certainly shouldn't
add *more* errors.



Re: [PATCH v33 04/14] arm64: kdump: reserve memory for crash dump kernel

2017-03-17 Thread David Woodhouse
On Wed, 2017-03-15 at 18:59 +0900, AKASHI Takahiro wrote:
> 
> +   pr_info("crashkernel reserved: 0x%016llx - 0x%016llx (%lld MB)\n",
> +   crash_base, crash_base + crash_size, crash_size >> 20);

There's a typo there — it says MB but you mean MiB.

Unless you meant crash_size / 1000000 and not crash_size >> 20?
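For what it's worth, the two units differ by roughly 5% at this scale, as a quick sketch shows (the reservation size here is made up for illustration):

```python
# >> 20 yields MiB (2**20 bytes), while "MB" strictly means 10**6 bytes.
crash_size = 512 * 2**20     # a hypothetical 512 MiB crashkernel reservation

print(crash_size >> 20)      # 512  (MiB, what the pr_info computes)
print(crash_size // 10**6)   # 536  (MB, what the printed label claims)
```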



Re: kexec regression since 4.9 caused by efi

2017-03-17 Thread Matt Fleming
On Fri, 17 Mar, at 10:09:51AM, Dave Young wrote:
> 
> Matt, I think it should be fine although I think the md type checking in
> efi_mem_desc_lookup() is causing confusion and not easy to understand..
 
Could you make that a separate patch if you think of improvements
there?

> How about move the if chunk early like below because it seems no need
> to sanity check the addr + size any more if the md is still RUNTIME?

My original version did as you suggest, but I changed it because we
*really* want to know if someone tries to reserve a range that spans
regions. That would be totally unexpected and a warning about a
potential bug/issue.



Re: kexec regression since 4.9 caused by efi

2017-03-17 Thread Ard Biesheuvel
On 17 March 2017 at 02:09, Dave Young  wrote:
> On 03/16/17 at 12:41pm, Matt Fleming wrote:
>> On Mon, 13 Mar, at 03:37:48PM, Dave Young wrote:
>> >
> >> > Omar, could you try the patch below? Looking at efi_mem_desc_lookup(), it
> >> > is not correct to be used in efi_arch_mem_reserve(); if it passes your
> >> > test, I can rewrite the patch log with more background and send it out:
>> >
>> > for_each_efi_memory_desc(md) {
>> > [snip]
>> > if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
>> > md->type != EFI_BOOT_SERVICES_DATA &&
>> > md->type != EFI_RUNTIME_SERVICES_DATA) {
>> > continue;
>> > }
>> >
> >> > In the above code, it is meant to get an md with EFI_MEMORY_RUNTIME that
> >> > is either boot data or runtime data. This is wrong for efi_mem_reserve,
> >> > because we are reserving boot data which has no EFI_MEMORY_RUNTIME
> >> > attribute at run time. It just happened to work and we did not catch the
> >> > error.
>>
>> Wouldn't something like this be simpler?
>>
>> ---
>>
>> diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
>> index 30031d5293c4..cdfe8c628959 100644
>> --- a/arch/x86/platform/efi/quirks.c
>> +++ b/arch/x86/platform/efi/quirks.c
>> @@ -201,6 +201,10 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 
>> size)
>>   return;
>>   }
>>
>> + /* No need to reserve regions that will never be freed. */
>> + if (md.attribute & EFI_MEMORY_RUNTIME)
>> + return;
>> +
>
> Matt, I think it should be fine although I think the md type checking in
> efi_mem_desc_lookup() is causing confusion and not easy to understand..
>
> How about move the if chunk early like below because it seems no need
> to sanity check the addr + size any more if the md is still RUNTIME?
>
> --- linux-x86.orig/arch/x86/platform/efi/quirks.c
> +++ linux-x86/arch/x86/platform/efi/quirks.c
> @@ -196,6 +196,10 @@ void __init efi_arch_mem_reserve(phys_ad
> return;
> }
>
> +   /* No need to reserve regions that will never be freed. */
> +   if (md.attribute & EFI_MEMORY_RUNTIME)
> +   return;
> +
> if (addr + size > md.phys_addr + (md.num_pages << EFI_PAGE_SHIFT)) {
> pr_err("Region spans EFI memory descriptors, %pa\n", );
> return;
>

This way, we suppress the error if the region spans multiple
descriptors and only the first one has the runtime attribute. So the
two patches are not equivalent. I don't have a strong preference
either way, though.
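The ordering difference can be seen with a toy model; the field names only loosely mirror the EFI memory descriptor, and this is a sketch, not kernel code:

```python
# A reservation spanning two descriptors, where the first descriptor
# carries EFI_MEMORY_RUNTIME. Matt's ordering (span check first) warns;
# Dave's ordering (runtime check first) returns silently.
EFI_MEMORY_RUNTIME = 1 << 0

def reserve(md, addr, size, runtime_check_first):
    if runtime_check_first and (md["attribute"] & EFI_MEMORY_RUNTIME):
        return "silent return"
    if addr + size > md["phys_addr"] + md["num_bytes"]:
        return "warn: region spans descriptors"
    if md["attribute"] & EFI_MEMORY_RUNTIME:
        return "silent return"
    return "reserved"

md = {"phys_addr": 0x1000, "num_bytes": 0x1000,
      "attribute": EFI_MEMORY_RUNTIME}   # first of two spanned descriptors

print(reserve(md, 0x1000, 0x2000, runtime_check_first=False))  # warns
print(reserve(md, 0x1000, 0x2000, runtime_check_first=True))   # silent
```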



Re: [PATCH v33 04/14] arm64: kdump: reserve memory for crash dump kernel

2017-03-17 Thread AKASHI Takahiro
On Fri, Mar 17, 2017 at 10:46:06AM +, David Woodhouse wrote:
> On Wed, 2017-03-15 at 18:59 +0900, AKASHI Takahiro wrote:
> > 
> > +pr_info("crashkernel reserved: 0x%016llx - 0x%016llx (%lld MB)\n",
> > +crash_base, crash_base + crash_size, crash_size >> 20);
> 
> There's a typo there — it says MB but you mean MiB.
> 
> Unless you meant crash_size / 1000000 and not crash_size >> 20?

Yes and no.
This notation is consistent with other places like mem_init()
in mm/init.c.

Thanks,
-Takahiro AKASHI




[PATCH] x86_64, kexec: Avoid unnecessary identity mappings for kdump

2017-03-17 Thread Xunlei Pang
kexec sets up identity mappings for all the memory mapped in the 1st kernel,
but this is not necessary for the kdump case. Actually it can cause extra
memory consumption for paging structures, which is quite considerable
on modern machines with huge memory.

E.g. On our 24TB machine, it will waste around 96MB (around 4MB/TB)
from the reserved memory range if setting all the identity mappings.

It also causes some trouble for distributions that use an intelligent
policy to evaluate the proper "crashkernel=X" for users.

To solve it, in the kdump case we only set up identity mappings for the
crash memory and the ISA memory (which may be needed by purgatory/kdump boot).

Signed-off-by: Xunlei Pang 
---
 arch/x86/kernel/machine_kexec_64.c | 34 ++
 1 file changed, 30 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 857cdbd..db77a76 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -112,14 +112,40 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
 
level4p = (pgd_t *)__va(start_pgtable);
clear_page(level4p);
-   for (i = 0; i < nr_pfn_mapped; i++) {
-   mstart = pfn_mapped[i].start << PAGE_SHIFT;
-   mend   = pfn_mapped[i].end << PAGE_SHIFT;
 
+   if (image->type == KEXEC_TYPE_CRASH) {
+   /* Always map the ISA range */
   result = kernel_ident_mapping_init(&info,
-level4p, mstart, mend);
+   level4p, 0, ISA_END_ADDRESS);
if (result)
return result;
+
+   /* crashk_low_res may not be initialized when reaching here */
+   if (crashk_low_res.end) {
+   mstart = crashk_low_res.start;
+   mend = crashk_low_res.end + 1;
+   result = kernel_ident_mapping_init(&info,
+   level4p, mstart, mend);
+   if (result)
+   return result;
+   }
+
+   mstart = crashk_res.start;
+   mend = crashk_res.end + 1;
+   result = kernel_ident_mapping_init(&info,
+   level4p, mstart, mend);
+   if (result)
+   return result;
+   } else {
+   for (i = 0; i < nr_pfn_mapped; i++) {
+   mstart = pfn_mapped[i].start << PAGE_SHIFT;
+   mend   = pfn_mapped[i].end << PAGE_SHIFT;
+
+   result = kernel_ident_mapping_init(&info,
+level4p, mstart, mend);
+   if (result)
+   return result;
+   }
}
 
/*
-- 
1.8.3.1

