Re: [Xenomai-core] Fwd: Re: [RFC][PATCH] x86: Fix root domain state restoring on exception return

2010-01-25 Thread Jan Kiszka
Wolfgang Mauerer wrote:
 (sorry if you get this twice, the cc addresses were screwed up
 by copy and paste last time)
 
 Kiszka, Jan wrote:
 
 If we enter __ipipe_handle_exception over a non-root domain and leave it
 due to migration in the event handler over root, we must not restore the
 root domain state so far saved on entry. This caused subtle pipeline
 state corruptions. Actually, we only need to save the state if we enter
 over the root domain and have to align its state to the hardware
 interrupt mask.

 Moreover, the x86-32 regs.eflags fix-up must happen based on the current
 root domain state to avoid more spurious corruptions.
 
 unfortunately, this won't boot on x86-32: During CPU bug checking
 in check_hlt, the kernel will really go into the halt state and
 never recover. By modifying __ipipe_handle_exception to use
 raw_irqs_disabled_flags as argument to __fixup_if instead of
 raw_irqs_disabled, everything is back to normal again. However,

That would reintroduce the bug we saw on 64 bit to 32 bit (in a slight
variation). So we need to understand what goes wrong here. Can you
generate an ipipe panic trace?

 I'm not sure if this is a) the proper solution or b) won't cause
 problems somewhere else, so a discussion would be highly welcome...
 
 Cheers, Wolfgang
 

Jan



signature.asc
Description: OpenPGP digital signature
___
Xenomai-core mailing list
Xenomai-core@gna.org
https://mail.gna.org/listinfo/xenomai-core


Re: [Xenomai-core] Fwd: Re: [RFC][PATCH] x86: Fix root domain state restoring on exception return

2010-01-25 Thread Wolfgang Mauerer
Jan Kiszka wrote:
 Wolfgang Mauerer wrote:
 Kiszka, Jan wrote:
 
 If we enter __ipipe_handle_exception over a non-root domain and leave it
 due to migration in the event handler over root, we must not restore the
 root domain state so far saved on entry. This caused subtle pipeline
 state corruptions. Actually, we only need to save the state if we enter
 over the root domain and have to align its state to the hardware
 interrupt mask.

 Moreover, the x86-32 regs.eflags fix-up must happen based on the current
 root domain state to avoid more spurious corruptions.
 
 unfortunately, this won't boot on x86-32: During CPU bug checking
 in check_hlt, the kernel will really go into the halt state and
 never recover. By modifying __ipipe_handle_exception to use
 raw_irqs_disabled_flags as argument to __fixup_if instead of
 raw_irqs_disabled, everything is back to normal again. However,
 
 That would reintroduce the bug we saw on 64 bit to 32 bit (in a slight
 variation). So we need to understand what goes wrong here. Can you
 generate an ipipe panic trace?

the problem is that when Linux does the hlt check
in check_hlt, it relies on interrupts to occur after the hlt
instructions have been issued - otherwise, the computer will really
halt. This happens during early boot before ftrace is enabled, which
makes tracing a bit hard... Besides, since we are stuck after the first
hlt, its also hard to find a good point where to start the trace ;-)

One missing part of the problem seems to be to figure out when
to _really_ use __fixup_if, I did some more testing on the
effects wrt. the bug handled in Jan's recent patch.

1.) With

if (restore_flags)
__fixup_regs(raw_irqs_disabled_flags(regs) ,regs);

being used in __ipipe_handle_exception, the kernel boots properly
(so it seems to do as well if __fixup_regs is called
unconditionally)

2.) With __fixup_regs(raw_irq_disable(), regs), the boot process
stops in check_hlt() in the Linux kernel.

There seems to be a disagreement about the state of the hardware
interrupt flag right before the kernel executes the tests in
check_halt(). With the following code in __ipipe_handle_exception

if (restore_flags) {
rawflags = __raw_local_save_flags();

printk(before fixup: rawflags: %lx, flags: %lx, 
   regs-flags: %lx\n, rawflags, flags,
   regs-flags);

__fixup_if(raw_irqs_disabled_flags(flags), regs);

printk(after fixup: rawflags: %lx, flags: %lx, 
   regs-flags: %lx\n, rawflags, flags,
   regs-flags);
}

we can distinguish between case 1 and 2 (X86_FLAGS_IF=0x200) right
before check_hlt():

case 1 (raw_irqs_disable_flags):
[0.339060] before fixup: rawflags: 0, flags: 200, regs-flags: 246
[0.340001] after fixup: rawflags: 0, flags: 200, regs-flags: 246

case 2 (raw_irqs_disable):
[0.345352] before fixup: rawflags: 0, flags: 200, regs-flags: 246
[0.346582] after fixup: rawflags: 0, flags: 200, regs-flags: 46

Consider __ipipe_unstall_iret_root. When regs-flags | X86_FLAGS_IF
- is 0, the pipeline is stalled, but regs.flags =| X86_EFLAGS_IF
  = Ipipe recives interrupts, but does not pass them on to the Linux
 domain.
- is X86_FLAGS_IF, then the pipeline is unstalled and interrupts are
  implicitely enabled by iret. Everything works as expected even if
  rawflags and regs.flags have disagreed upon entry to
  __ipipe_handle_exception

This explains why everything works fine in case 1 (despite
rawflags and regs-flags disagree), but fails in case 2.

However, since the IRQ state obtained from the raw hardware settings
via pushf in raw_irqs_disabled() disagrees with what Ipipe believes
to be the hardware state in regs-flags, it would seem that the
consistency between both is destroyed somewhere in the early boot
process, and inadvertently  fixed by __fixup_regs.

Does that make sense? Or does anyone else have another idea what might
be going on? For the sake of completeness, I've also attached the
patch used for testing as it is right now.

Thanks, Wolfgang
commit 86e5bcfd0a39960e68475a5a6108088176d98aae
Author: Wolfgang Mauerer wolfgang.maue...@siemens.com
Date:   Sun Jan 24 23:13:30 2010 +0100

x86: Fix root domain state restoring on exception return

diff --git a/arch/x86/kernel/ipipe.c b/arch/x86/kernel/ipipe.c
index 4442d96..b8a6ec7 100644
--- a/arch/x86/kernel/ipipe.c
+++ b/arch/x86/kernel/ipipe.c
@@ -702,19 +702,22 @@ static int __ipipe_xlate_signo[] = {
 
 int __ipipe_handle_exception(struct pt_regs *regs, long error_code, int vector)
 {
-	unsigned long flags;
+	bool restore_flags = false;
+	unsigned long flags = 0;
+	unsigned long regflags;
+	unsigned long rawflags;
 
 	/* Pick up the root domain state of the interrupted context. */
 	local_save_flags(flags);
 
-	if (ipipe_root_domain_p) {
+	if (ipipe_root_domain_p  irqs_disabled_hw()) {
 		/*
 		 * Replicate hw interrupt state into the virtual