Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-02-16 Thread Xunlei Pang
On 02/16/2017 at 08:22 PM, Borislav Petkov wrote:
> On Thu, Feb 16, 2017 at 07:52:09PM +0800, Xunlei Pang wrote:
>> then mce will be broadcast to the other cpus which are still running
>> in the first kernel (i.e. looping in crash_nmi_callback).
> Simple: the crash code should really mark CPUs as not being online:
>
> void do_machine_check(struct pt_regs *regs, long error_code)
>
> 	...
>
> 	/* If this CPU is offline, just bail out. */
> 	if (cpu_is_offline(smp_processor_id())) {
> 		u64 mcgstatus;
>
> 		mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
> 		if (mcgstatus & MCG_STATUS_RIPV) {
> 			mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
> 			return;
> 		}
> 	}
>
> because looping in crash_nmi_callback() does not really denote them as
> CPUs being online.
>
> And just so that you don't disturb the machine too much during crashing,
> you could simply clear them from the online masks, i.e., perhaps call
> remove_cpu_from_maps() with the proper locking around it instead of
> doing a full cpu_down().

It changes the value of cpu_online_mask etc., which will cause confusion
in vmcore analysis.
Moreover, regarding this code (see the inline comment):

if (cpu_is_offline(smp_processor_id())) {
	u64 mcgstatus;

	mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
	/*
	 * This condition may not be true: the MCE triggered on the kdump
	 * CPU doesn't need to have this bit set for the other CPUs that
	 * remain in the 1st kernel.
	 */
	if (mcgstatus & MCG_STATUS_RIPV) {
		mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
		return;
	}
}


Regards,
Xunlei

>
> The machine will be killed anyway after kdump is done writing out
> memory.
>


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-02-16 Thread Borislav Petkov
On Thu, Feb 16, 2017 at 07:52:09PM +0800, Xunlei Pang wrote:
> then mce will be broadcast to the other cpus which are still running
> in the first kernel (i.e. looping in crash_nmi_callback).

Simple: the crash code should really mark CPUs as not being online:

void do_machine_check(struct pt_regs *regs, long error_code)

	...

	/* If this CPU is offline, just bail out. */
	if (cpu_is_offline(smp_processor_id())) {
		u64 mcgstatus;

		mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
		if (mcgstatus & MCG_STATUS_RIPV) {
			mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
			return;
		}
	}

because looping in crash_nmi_callback() does not really denote them as
CPUs being online.

And just so that you don't disturb the machine too much during crashing,
you could simply clear them from the online masks, i.e., perhaps call
remove_cpu_from_maps() with the proper locking around it instead of
doing a full cpu_down().

The machine will be killed anyway after kdump is done writing out
memory.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.



Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-02-16 Thread Xunlei Pang
On 02/16/2017 at 06:18 PM, Borislav Petkov wrote:
> On Thu, Feb 16, 2017 at 01:36:37PM +0800, Xunlei Pang wrote:
>> I tried to use qemu to inject SRAO("mce -b 0 0 0xb100 0x5 0x0 0x0"),
>> it works well in 1st kernel, but it doesn't work for 1st kernel after kdump boots(seems
>> the cpus remain in 1st kernel don't respond to the simulated broadcasting mce).
>>
>> But in theory, we know cpus belong to kdump kernel can't respond to the
>> old mce handler, so a single SRAO injection in 1st kernel should be similar.
>> For example, I used "... -smp 2 -cpu Haswell" to launch a simulation with broadcast
>> mce supported, and inject SRAO to cpu0 only through qemu monitor
>> "mce 0 0 0xb100 0x5 0x0 0x0", cpu0 will timeout/panic and reboot
>> the machine as follows(running on linux-4.9):
>>   Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
> Sounds to me like you're trying hard to prove some point of yours which
> doesn't make much sense to me. And when you say "in theory", that makes
> it even less believable. So I remember asking you for exact steps. That
> above doesn't read like steps but like some babbling and I've actually
> tried to make sense of it for a couple of minutes but failed.
>
> So lemme spell it out for ya. I'd like for you to give me this:
>
> 1. Build kernel with this config
> 2. Boot it in kvm with these settings
> 3. Do this in the guest
> 4. Do that in the guest
> 5. ...
> 6. ...
>
>
> And all should be exact commands so that I can do them here on my machine.
>

Sorry, I missed your point.

The steps should be as follows:
1. Prepare a multi-core Intel machine with broadcast MCE support.
   Enable kdump (crashkernel=256M) and configure the kdump kernel to boot
   with "nr_cpus=1".
2. Activate kdump and crash the first kernel on some CPU, say cpu1
   (taskset -c 1 echo c > /proc/sysrq-trigger); kdump will then boot on cpu1.
3. After kdump boots up (let it enter the shell), trigger an SRAO on cpu1
   (QEMU monitor cmd: mce -b 1 0 0xb100 0x5 0x0 0x0); the MCE will then be
   broadcast to the other CPUs, which are still running in the first kernel
   (i.e. looping in crash_nmi_callback).
   If you have hardware that can inject MCEs, that would be great, as QEMU
   does not work correctly for me.
4. Then something like the following is expected to happen:

[1.468556] tsc: Refined TSC clocksource calibration: 2933.437 MHz
 Starting Kdump Vmcore Save Service...
kdump: saving to /sysroot//var/crash/127.0.0.1-2015-09-01-05:07:03/
kdump: saving vmcore-dmesg.txt
[   39.10] mce: [Hardware Error]: CPU 0: Machine Check Exception: 0 Bank 2: bd00017a
[   39.10] mce: [Hardware Error]: TSC 0 ADDR 6160 MISC 8c
[   39.10] mce: [Hardware Error]: PROCESSOR 0:106a3 TIME 1441083980 SOCKET 0 APIC 0 microcode 1
[   39.10] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[   39.10] Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
[   39.10] Shutting down cpus with NMI
[1.758463] Uhhuh. NMI received for unknown reason 20 on CPU 0.
[1.758463] Do you have a strange power saving mode enabled?
[1.758463] Dazed and confused, but trying to continue
[   39.10] Rebooting in 30 seconds..

Regards,
Xunlei



Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

2017-02-16 Thread Borislav Petkov
On Thu, Feb 16, 2017 at 01:36:37PM +0800, Xunlei Pang wrote:
> I tried to use qemu to inject SRAO("mce -b 0 0 0xb100 0x5 0x0 0x0"),
> it works well in 1st kernel, but it doesn't work for 1st kernel after kdump boots(seems
> the cpus remain in 1st kernel don't respond to the simulated broadcasting mce).
> 
> But in theory, we know cpus belong to kdump kernel can't respond to the
> old mce handler, so a single SRAO injection in 1st kernel should be similar.
> For example, I used "... -smp 2 -cpu Haswell" to launch a simulation with broadcast
> mce supported, and inject SRAO to cpu0 only through qemu monitor
> "mce 0 0 0xb100 0x5 0x0 0x0", cpu0 will timeout/panic and reboot
> the machine as follows(running on linux-4.9):
>   Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler

Sounds to me like you're trying hard to prove some point of yours which
doesn't make much sense to me. And when you say "in theory", that makes
it even less believable. So I remember asking you for exact steps. That
above doesn't read like steps but like some babbling and I've actually
tried to make sense of it for a couple of minutes but failed.

So lemme spell it out for ya. I'd like for you to give me this:

1. Build kernel with this config
2. Boot it in kvm with these settings
3. Do this in the guest
4. Do that in the guest
5. ...
6. ...


And all should be exact commands so that I can do them here on my machine.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.



Re: [PATCH v32 07/13] arm64: hibernate: preserve kdump image around hibernation

2017-02-16 Thread AKASHI Takahiro
On Wed, Feb 15, 2017 at 12:12:35PM +, James Morse wrote:
> Hi Akashi,
> 
> On 07/02/17 08:08, AKASHI Takahiro wrote:
> > Since arch_kexec_protect_crashkres() removes a mapping for crash dump
> > kernel memory, the loaded contents won't be preserved around hibernation.
> > 
> > In this patch, arch_kexec_(un)protect_crashkres() are additionally called
> > before/after hibernation so that the relevant region will be mapped again
> > and restored just as the other memory regions are.
> 
> Reviewed-by: James Morse 

Thank you very much.

> A quick test of this took longer than expected (writing to a slow usb device),

Really? I use a uSD card on HiKey as a swap device, and it takes only
a few moments to save a hibernation image, although I do the test right
after the system comes up.

> I
> suspect it is save/restoring the whole crash region (which I don't think is a
> problem).

Now that we have only page-level mappings for the crash region,
it might be possible to mark all the unused pages "reserved"
in arch_kexec_unprotect_crashkres() when it is called for hibernation.

-Takahiro AKASHI

> If someone turns out to use this combination of features I will look
> at improving this, (almost certainly requires core-code changes).
> 
> 
> Thanks,
> 
> James



Re: EFI stub kexec problem

2017-02-16 Thread Dave Young
On 02/16/17 at 03:20pm, Dave Young wrote:
> On 02/16/17 at 07:35pm, Eric W. Biederman wrote:
> > Paweł Lenkow  writes:
> > 
> > > Hi!
> > >
> > > I am trying to run an EFI stub kernel using kexec, and unfortunately the
> > > 2nd kernel crashes.
> > 
> > Adding the kexec list as there may be someone there who may be more
> > knowledgeable about our problem.
> > 
> > > The first kernel is loaded by UEFI and starts a simple init with a shell;
> > > then I load the same image using kexec.
> > >
> > > It looks like a lack of correct mappings for EFI tables because it
> > > calls address zero (RDI register) in efi_call.
> > >
> > > Does the kernel support such a configuration?

Yes, it *should* work; at least it worked for me previously on Lenovo
laptops. What is the hardware in this case?

> > >
> > > If yes, what could be wrong?
> > 
> > I don't know exactly where the support is; I haven't looked recently.
> > 
> > There is a major challenge with EFI.  The call to place EFI in virtual
> > mode may be made exactly once.  There has been work on at least ia64
> > to make it possible to capture the EFI virtual address base and pass it
> > through to the kernel calling kexec.  I don't remember seeing the work
> > to do that happen on other platforms.
> > 
> > I personally think we should just abandon running EFI in virtual mode
> > and always run EFI in physical mode, but the feedback when I suggested
> > that was that EFI didn't actually work if you did that.  *Shrug*
> > 
> > Given that efi_enter_virtual_mode is in your call trace I am guessing
> > that the problem is the historical problem I have mentioned above.

In the latest kernel we have a stable VA address space for the EFI runtime:
kexec-tools passes the 1st kernel's runtime maps along so that EFI runtime
services are still usable after the kexec kernel boots up, without entering
virtual mode again.

There might be a bug either in the kernel or some firmware-specific
problem.

> > 
> > Eric
> > 
> > 
> > 
> > > Using grub-efi everything works fine, but that is not enough for me.
> > >
> > > See log below:
> > >
> > >
> > > / # kexec -l /mnt/INSTALL/vmlinuz -s
> 
> What is the kexec-tools version? And does kexec -l work (without -s)?
> 
> A test with the efi=debug parameter may help; full logs of both the
> pre-kexec and post-kexec boots could also help to find something.

These questions are obviously for Paweł Lenkow; adding him to the "To" list.

> 
> Thanks
> Dave
