subject:"Re\: Another preempt folding issue\?"

Re: Another preempt folding issue?

2014-03-25 Thread Stefan Bader

On 24.03.2014 18:39, Paolo Bonzini wrote:
> Il 20/02/2014 16:50, Peter Zijlstra ha scritto:
> > >> One thing I likely should do is to reinstall the exact same laptop
> with 64bit
> > >> kernel and userspace... maybe only 64bit kernel first... and make 
> > >> sure
> on my
> > >> side that this does not show up on 64bit, too. I took the word of
> reporters for
> > >> that (and the impression that otherwise many more people would have
> complained).
 > >
 > > Yeha, I'm going to try and install some 32bit userspace on a usb
 > > harddisk I've got and see if I can boot my Core2 laptop from that to 
 > > try
 > > and reproduce.
 > >
 > > But all that is probably going to be Monday :/
 > >
>>> > *sigh* Already Thursday...
>>> >
>>> > Peter, did you get to reproduce this locally? Unfortunately I had some
>>> > interruption and have not more Information than on last Friday (which is 
>>> > that
>>> > the same hw but 64bit kernel does not show it).
>> I got side-tracked as well, someone reported crashes, which come above
>> weird behaviour :/
>>
> 
> Stefan, Peter, any news here?
> 
> Paolo

No, unfortunately not. :( I thought I had some idea as i386 seems to have soft
and hard irqs using a seperate stack and thought they maybe loose the process
flag there (x86_64 seems to always do that for soft irqs and if it does for hard
irqs it was hidden well enough for me not to see). But trial and error showed
showed no improvement when I made a copy of the original TIF_NEED_RSCHED before
executing on stack. So not really any further.
And then the usual distractions hit...

-Stefan



signature.asc
Description: OpenPGP digital signature

Re: Another preempt folding issue?

2014-03-25 Thread Stefan Bader

On 24.03.2014 18:39, Paolo Bonzini wrote:
 Il 20/02/2014 16:50, Peter Zijlstra ha scritto:
   One thing I likely should do is to reinstall the exact same laptop
 with 64bit
   kernel and userspace... maybe only 64bit kernel first... and make 
   sure
 on my
   side that this does not show up on 64bit, too. I took the word of
 reporters for
   that (and the impression that otherwise many more people would have
 complained).
  
   Yeha, I'm going to try and install some 32bit userspace on a usb
   harddisk I've got and see if I can boot my Core2 laptop from that to 
   try
   and reproduce.
  
   But all that is probably going to be Monday :/
  
  *sigh* Already Thursday...
 
  Peter, did you get to reproduce this locally? Unfortunately I had some
  interruption and have not more Information than on last Friday (which is 
  that
  the same hw but 64bit kernel does not show it).
 I got side-tracked as well, someone reported crashes, which come above
 weird behaviour :/

 
 Stefan, Peter, any news here?
 
 Paolo

No, unfortunately not. :( I thought I had some idea as i386 seems to have soft
and hard irqs using a seperate stack and thought they maybe loose the process
flag there (x86_64 seems to always do that for soft irqs and if it does for hard
irqs it was hidden well enough for me not to see). But trial and error showed
showed no improvement when I made a copy of the original TIF_NEED_RSCHED before
executing on stack. So not really any further.
And then the usual distractions hit...

-Stefan



signature.asc
Description: OpenPGP digital signature

Re: Another preempt folding issue?

2014-03-24 Thread Paolo Bonzini


Il 20/02/2014 16:50, Peter Zijlstra ha scritto:

> >> One thing I likely should do is to reinstall the exact same laptop with 
64bit
> >> kernel and userspace... maybe only 64bit kernel first... and make sure on 
my
> >> side that this does not show up on 64bit, too. I took the word of 
reporters for
> >> that (and the impression that otherwise many more people would have 
complained).

> >
> > Yeha, I'm going to try and install some 32bit userspace on a usb
> > harddisk I've got and see if I can boot my Core2 laptop from that to try
> > and reproduce.
> >
> > But all that is probably going to be Monday :/
> >

> *sigh* Already Thursday...
>
> Peter, did you get to reproduce this locally? Unfortunately I had some
> interruption and have not more Information than on last Friday (which is that
> the same hw but 64bit kernel does not show it).

I got side-tracked as well, someone reported crashes, which come above
weird behaviour :/



Stefan, Peter, any news here?

Paolo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue?

2014-03-24 Thread Paolo Bonzini


Il 20/02/2014 16:50, Peter Zijlstra ha scritto:

  One thing I likely should do is to reinstall the exact same laptop with 
64bit
  kernel and userspace... maybe only 64bit kernel first... and make sure on 
my
  side that this does not show up on 64bit, too. I took the word of 
reporters for
  that (and the impression that otherwise many more people would have 
complained).

 
  Yeha, I'm going to try and install some 32bit userspace on a usb
  harddisk I've got and see if I can boot my Core2 laptop from that to try
  and reproduce.
 
  But all that is probably going to be Monday :/
 

 *sigh* Already Thursday...

 Peter, did you get to reproduce this locally? Unfortunately I had some
 interruption and have not more Information than on last Friday (which is that
 the same hw but 64bit kernel does not show it).

I got side-tracked as well, someone reported crashes, which come above
weird behaviour :/



Stefan, Peter, any news here?

Paolo
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue?

2014-02-20 Thread Peter Zijlstra

On Thu, Feb 20, 2014 at 04:38:13PM +0100, Stefan Bader wrote:
> On 14.02.2014 18:21, Peter Zijlstra wrote:
> > On Fri, Feb 14, 2014 at 06:02:32PM +0100, Stefan Bader wrote:
> >> One thing I likely should do is to reinstall the exact same laptop with 
> >> 64bit
> >> kernel and userspace... maybe only 64bit kernel first... and make sure on 
> >> my
> >> side that this does not show up on 64bit, too. I took the word of 
> >> reporters for
> >> that (and the impression that otherwise many more people would have 
> >> complained).
> > 
> > Yeha, I'm going to try and install some 32bit userspace on a usb
> > harddisk I've got and see if I can boot my Core2 laptop from that to try
> > and reproduce.
> > 
> > But all that is probably going to be Monday :/
> > 
> *sigh* Already Thursday...
> 
> Peter, did you get to reproduce this locally? Unfortunately I had some
> interruption and have not more Information than on last Friday (which is that
> the same hw but 64bit kernel does not show it).

I got side-tracked as well, someone reported crashes, which come above
weird behaviour :/

> Meanwhile I wonder whether it would make sense to push the following (or 
> more?)
> to stable for 3.13.y:
> 
> 1) 16824255394f55adf31b9a96a9965d8c15bdac4c
>x86, acpi, idle: Restructure the mwait idle routines
> 2) 7e98b71920464b8d15fa95c74366416cd3c88861
>x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers
> 3) 8cb75e0c4ec9786b81439761eac1d18d4a931af3
>sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding
> 4) 215393bc1fab3d61a5a296838bdffce22f27ffda
>sched/preempt/x86: Fix voluntary preempt for x86
> 
> 1+2 would be to avoid touching 3 too much and looked to be improvements on 
> their
> own. 3+4 would be cherry-picks if not for some fuzz 2.

I'll not object; but if someone wants a smaller set you could do with
just the idle.c hunk from 3.

> I saw a few more things labelled preempt between 3.13 and current HEAD but am
> not sure whether or which of those are strictly required. Namely some fixing 
> to
> preempt_enable_no_resched() mis-usage and maybe one fixing an issue of ftrace
> locking up.

Yeah, that's all fresh breakage and doesn't go back to 3.13 I think.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue?

2014-02-20 Thread Stefan Bader

On 14.02.2014 18:21, Peter Zijlstra wrote:
> On Fri, Feb 14, 2014 at 06:02:32PM +0100, Stefan Bader wrote:
>> One thing I likely should do is to reinstall the exact same laptop with 64bit
>> kernel and userspace... maybe only 64bit kernel first... and make sure on my
>> side that this does not show up on 64bit, too. I took the word of reporters 
>> for
>> that (and the impression that otherwise many more people would have 
>> complained).
> 
> Yeha, I'm going to try and install some 32bit userspace on a usb
> harddisk I've got and see if I can boot my Core2 laptop from that to try
> and reproduce.
> 
> But all that is probably going to be Monday :/
> 
*sigh* Already Thursday...

Peter, did you get to reproduce this locally? Unfortunately I had some
interruption and have not more Information than on last Friday (which is that
the same hw but 64bit kernel does not show it).

Meanwhile I wonder whether it would make sense to push the following (or more?)
to stable for 3.13.y:

1) 16824255394f55adf31b9a96a9965d8c15bdac4c
   x86, acpi, idle: Restructure the mwait idle routines
2) 7e98b71920464b8d15fa95c74366416cd3c88861
   x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers
3) 8cb75e0c4ec9786b81439761eac1d18d4a931af3
   sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding
4) 215393bc1fab3d61a5a296838bdffce22f27ffda
   sched/preempt/x86: Fix voluntary preempt for x86

1+2 would be to avoid touching 3 too much and looked to be improvements on their
own. 3+4 would be cherry-picks if not for some fuzz 2.
I saw a few more things labelled preempt between 3.13 and current HEAD but am
not sure whether or which of those are strictly required. Namely some fixing to
preempt_enable_no_resched() mis-usage and maybe one fixing an issue of ftrace
locking up.

-Stefan
3

signature.asc
Description: OpenPGP digital signature

Re: Another preempt folding issue?

2014-02-20 Thread Stefan Bader

On 14.02.2014 18:21, Peter Zijlstra wrote:
 On Fri, Feb 14, 2014 at 06:02:32PM +0100, Stefan Bader wrote:
 One thing I likely should do is to reinstall the exact same laptop with 64bit
 kernel and userspace... maybe only 64bit kernel first... and make sure on my
 side that this does not show up on 64bit, too. I took the word of reporters 
 for
 that (and the impression that otherwise many more people would have 
 complained).
 
 Yeha, I'm going to try and install some 32bit userspace on a usb
 harddisk I've got and see if I can boot my Core2 laptop from that to try
 and reproduce.
 
 But all that is probably going to be Monday :/
 
*sigh* Already Thursday...

Peter, did you get to reproduce this locally? Unfortunately I had some
interruption and have not more Information than on last Friday (which is that
the same hw but 64bit kernel does not show it).

Meanwhile I wonder whether it would make sense to push the following (or more?)
to stable for 3.13.y:

1) 16824255394f55adf31b9a96a9965d8c15bdac4c
   x86, acpi, idle: Restructure the mwait idle routines
2) 7e98b71920464b8d15fa95c74366416cd3c88861
   x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers
3) 8cb75e0c4ec9786b81439761eac1d18d4a931af3
   sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding
4) 215393bc1fab3d61a5a296838bdffce22f27ffda
   sched/preempt/x86: Fix voluntary preempt for x86

1+2 would be to avoid touching 3 too much and looked to be improvements on their
own. 3+4 would be cherry-picks if not for some fuzz 2.
I saw a few more things labelled preempt between 3.13 and current HEAD but am
not sure whether or which of those are strictly required. Namely some fixing to
preempt_enable_no_resched() mis-usage and maybe one fixing an issue of ftrace
locking up.

-Stefan
3



signature.asc
Description: OpenPGP digital signature

Re: Another preempt folding issue?

2014-02-20 Thread Peter Zijlstra

On Thu, Feb 20, 2014 at 04:38:13PM +0100, Stefan Bader wrote:
 On 14.02.2014 18:21, Peter Zijlstra wrote:
  On Fri, Feb 14, 2014 at 06:02:32PM +0100, Stefan Bader wrote:
  One thing I likely should do is to reinstall the exact same laptop with 
  64bit
  kernel and userspace... maybe only 64bit kernel first... and make sure on 
  my
  side that this does not show up on 64bit, too. I took the word of 
  reporters for
  that (and the impression that otherwise many more people would have 
  complained).
  
  Yeha, I'm going to try and install some 32bit userspace on a usb
  harddisk I've got and see if I can boot my Core2 laptop from that to try
  and reproduce.
  
  But all that is probably going to be Monday :/
  
 *sigh* Already Thursday...
 
 Peter, did you get to reproduce this locally? Unfortunately I had some
 interruption and have not more Information than on last Friday (which is that
 the same hw but 64bit kernel does not show it).

I got side-tracked as well, someone reported crashes, which come above
weird behaviour :/

 Meanwhile I wonder whether it would make sense to push the following (or 
 more?)
 to stable for 3.13.y:
 
 1) 16824255394f55adf31b9a96a9965d8c15bdac4c
x86, acpi, idle: Restructure the mwait idle routines
 2) 7e98b71920464b8d15fa95c74366416cd3c88861
x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers
 3) 8cb75e0c4ec9786b81439761eac1d18d4a931af3
sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding
 4) 215393bc1fab3d61a5a296838bdffce22f27ffda
sched/preempt/x86: Fix voluntary preempt for x86
 
 1+2 would be to avoid touching 3 too much and looked to be improvements on 
 their
 own. 3+4 would be cherry-picks if not for some fuzz 2.

I'll not object; but if someone wants a smaller set you could do with
just the idle.c hunk from 3.

 I saw a few more things labelled preempt between 3.13 and current HEAD but am
 not sure whether or which of those are strictly required. Namely some fixing 
 to
 preempt_enable_no_resched() mis-usage and maybe one fixing an issue of ftrace
 locking up.

Yeah, that's all fresh breakage and doesn't go back to 3.13 I think.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue?

2014-02-14 Thread Stefan Bader

On 14.02.2014 19:23, Stefan Bader wrote:
> On 14.02.2014 18:33, Borislav Petkov wrote:
>> On Fri, Feb 14, 2014 at 06:02:32PM +0100, Stefan Bader wrote:
>>> Okaaay, I think I did what you asked. So yes, there is sse2 in the cpu 
>>> info. And
>>> there is a mfence in the disassembly:
>>
>> Btw, I just realized booting the kernel in the guest was a dumb idea,
>> because, doh, the guest is not baremetal. The only reliable thing we
>> can say is that sse2 is present and that MFENCE alternative replacement
>> works :)
>>
>> But for simplicity's sake let's just assume the machine can do MFENCE
>> just fine and it gets replaced by the alternatives code.
>>
>> Besides, if that weren't true, we'd have a whole lot of other problems
>> on those boxes.
>>
>>> Thinking about it, I guess Peter is quite right saying that I likely
>>> will end on the patch that converted preempt_count to percpu.
>>
>> Yeah, c2daa3bed53a81171cf8c1a36db798e82b91afe8 et al.
>>
>>> One thing I likely should do is to reinstall the exact same laptop
>>> with 64bit kernel and userspace... maybe only 64bit kernel first...
>>> and make sure on my side that this does not show up on 64bit, too. I
>>> took the word of reporters for that (and the impression that otherwise
>>> many more people would have complained).
>>
>> Yeah, that should be a prudent thing to do.
>>
>> Also, Paolo and I were wondering whether you can trigger this thing
>> without kvm, i.e. virtualization involved... do you have any data on
>> that?
> 
> Unfortunately no hard evidence. Kvm just happens to be such a good way to 
> notice
> this as it is using the reschedule interrupt itself and has this exit before
> running the guest vcpu to hadnle it in the outer loop by calling 
> cond_resched()
> and repeat.
> I find running kvm seems to make that laptop quite sluggish in responding to
> other tasks (in that install) and I got some oddness going on when lightdm 
> quite
> often refuses to take keyboard input without opening some menu with the mouse
> first... But I could not be sure whether that is the kernel or some new
> user-space ... errr "feature".
> At least Marcello (iirc that other report came from him directly or 
> indirectly)
> has seen it, too. And he likely has complete different user-space.
> 
> So I will go and do that different (64bit) kernel and kernel + user-space 
> test.
> But like fo Peter, it likely is a Monday thing...
> 

Ok, it is still Friday... So a quick test (2 boots of a 32bit guest, same as
before) on the 32bit user-space, with the same kernel source, but compiled as
64bit (obviously not 100% same config but close). While I see the false
inconsistency messages (I modified the in kernel-test to trigger the trace stop
only if the __vcpu_run loop encounters an inconsistent state three times in a
row), I do not see the final stop message. Also (but that is rather feeling) the
system seems to remain more responsive (switching to other windows, opening
terminal windows,...) compared to 32bit kernel.




signature.asc
Description: OpenPGP digital signature

Re: Another preempt folding issue?

2014-02-14 Thread Stefan Bader

On 14.02.2014 18:33, Borislav Petkov wrote:
> On Fri, Feb 14, 2014 at 06:02:32PM +0100, Stefan Bader wrote:
>> Okaaay, I think I did what you asked. So yes, there is sse2 in the cpu info. 
>> And
>> there is a mfence in the disassembly:
> 
> Btw, I just realized booting the kernel in the guest was a dumb idea,
> because, doh, the guest is not baremetal. The only reliable thing we
> can say is that sse2 is present and that MFENCE alternative replacement
> works :)
> 
> But for simplicity's sake let's just assume the machine can do MFENCE
> just fine and it gets replaced by the alternatives code.
> 
> Besides, if that weren't true, we'd have a whole lot of other problems
> on those boxes.
> 
>> Thinking about it, I guess Peter is quite right saying that I likely
>> will end on the patch that converted preempt_count to percpu.
> 
> Yeah, c2daa3bed53a81171cf8c1a36db798e82b91afe8 et al.
> 
>> One thing I likely should do is to reinstall the exact same laptop
>> with 64bit kernel and userspace... maybe only 64bit kernel first...
>> and make sure on my side that this does not show up on 64bit, too. I
>> took the word of reporters for that (and the impression that otherwise
>> many more people would have complained).
> 
> Yeah, that should be a prudent thing to do.
> 
> Also, Paolo and I were wondering whether you can trigger this thing
> without kvm, i.e. virtualization involved... do you have any data on
> that?

Unfortunately no hard evidence. Kvm just happens to be such a good way to notice
this as it is using the reschedule interrupt itself and has this exit before
running the guest vcpu to hadnle it in the outer loop by calling cond_resched()
and repeat.
I find running kvm seems to make that laptop quite sluggish in responding to
other tasks (in that install) and I got some oddness going on when lightdm quite
often refuses to take keyboard input without opening some menu with the mouse
first... But I could not be sure whether that is the kernel or some new
user-space ... errr "feature".
At least Marcello (iirc that other report came from him directly or indirectly)
has seen it, too. And he likely has complete different user-space.

So I will go and do that different (64bit) kernel and kernel + user-space test.
But like fo Peter, it likely is a Monday thing...

signature.asc
Description: OpenPGP digital signature

Re: Another preempt folding issue?

2014-02-14 Thread Borislav Petkov

On Fri, Feb 14, 2014 at 06:02:32PM +0100, Stefan Bader wrote:
> Okaaay, I think I did what you asked. So yes, there is sse2 in the cpu info. 
> And
> there is a mfence in the disassembly:

Btw, I just realized booting the kernel in the guest was a dumb idea,
because, doh, the guest is not baremetal. The only reliable thing we
can say is that sse2 is present and that MFENCE alternative replacement
works :)

But for simplicity's sake let's just assume the machine can do MFENCE
just fine and it gets replaced by the alternatives code.

Besides, if that weren't true, we'd have a whole lot of other problems
on those boxes.

> Thinking about it, I guess Peter is quite right saying that I likely
> will end on the patch that converted preempt_count to percpu.

Yeah, c2daa3bed53a81171cf8c1a36db798e82b91afe8 et al.

> One thing I likely should do is to reinstall the exact same laptop
> with 64bit kernel and userspace... maybe only 64bit kernel first...
> and make sure on my side that this does not show up on 64bit, too. I
> took the word of reporters for that (and the impression that otherwise
> many more people would have complained).

Yeah, that should be a prudent thing to do.

Also, Paolo and I were wondering whether you can trigger this thing
without kvm, i.e. virtualization involved... do you have any data on
that?

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue?

2014-02-14 Thread Peter Zijlstra

On Fri, Feb 14, 2014 at 06:02:32PM +0100, Stefan Bader wrote:
> One thing I likely should do is to reinstall the exact same laptop with 64bit
> kernel and userspace... maybe only 64bit kernel first... and make sure on my
> side that this does not show up on 64bit, too. I took the word of reporters 
> for
> that (and the impression that otherwise many more people would have 
> complained).

Yeha, I'm going to try and install some 32bit userspace on a usb
harddisk I've got and see if I can boot my Core2 laptop from that to try
and reproduce.

But all that is probably going to be Monday :/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue?

2014-02-14 Thread Stefan Bader

On 14.02.2014 15:47, Borislav Petkov wrote:
> On Fri, Feb 14, 2014 at 03:24:09PM +0100, Stefan Bader wrote:
>> Actually, this code just makes so much more sense if I let objdump do
>> relocation info...
> 
> Ok, we're pretty sure you have an MFENCE there in resched_task but can
> you confirm it please.
> 
> First, does /proc/cpuinfo have the "sse2" string? It should but nowadays
> I don't trust anything.
> 
> Then, can you boot that kernel in qemu with the -gdb flag so that it
> starts the gdb stub, here's the manpage about it:
> 
>-gdb dev
>Wait for gdb connection on device dev. Typical connections will 
> likely be
>TCP-based, but also UDP, pseudo TTY, or even stdio are reasonable 
> use case.
>The latter is allowing to start QEMU from within gdb and establish 
> the
>connection via a pipe:
> 
>(gdb) target remote | exec qemu-system-i386 -gdb stdio ...
> 
>-s  Shorthand for -gdb tcp::1234, i.e. open a gdbserver on TCP port 
> 1234.
> 
> then boot the guest and when it is up, do
> 
> $ gdb ./vmlinux
> $ target remote localhost:1234
> 
> and type in the prompt
> 
> $ (gdb) x/50i resched_task
> 
> It gives here:
> 
> (gdb) x/50i resched_task
>0x810836f0 :   data32 data32 data32 xchg %ax,%ax
>0x810836f5 : push   %rbp
>0x810836f6 : mov0x85e123(%rip),%r10d# 
> 0x818e1820 
>0x810836fd :mov%rsp,%rbp
>0x81083700 :push   %r12
>0x81083702 :test   %r10d,%r10d
>0x81083705 :push   %rbx
>0x81083706 :mov%rdi,%rbx
>0x81083709 :jne0x81083760 
> 
>0x8108370b :mov0x8(%rbx),%rax
>0x8108370f :mov0x10(%rax),%rdx
>0x81083713 :and$0x8,%edx
>0x81083716 :jne0x8108373c 
> 
>0x81083718 :lock orb $0x8,0x10(%rax)
>0x8108371d :mov0x8(%rbx),%rax
>0x81083721 :mov0x18(%rax),%r12d
>0x81083725 :callq  0x812d8fc0 
> 
>0x8108372a :cmp%r12d,%eax
>0x8108372d :je 0x810837a0 
> 
>0x8108372f :mfence
>   ^^
> I want to make sure the smp_mb() is really replaced with an MFENCE there.
> 
> Thanks!
> 
Okaaay, I think I did what you asked. So yes, there is sse2 in the cpu info. And
there is a mfence in the disassembly:

   0xc107dcb0 :   push   %ebp
   0xc107dcb1 : mov%esp,%ebp
   0xc107dcb3 : lea%ds:0x0(%esi,%eiz,1),%esi
   0xc107dcb8 : mov0x4(%eax),%edx
   0xc107dcbb :mov0x8(%edx),%ecx
   0xc107dcbe :and$0x8,%ecx
   0xc107dcc1 :jne0xc107dce7 
   0xc107dcc3 :orb$0x8,%ds:0x8(%edx)
   0xc107dcc8 :mov0x4(%eax),%edx
   0xc107dccb :mov%fs:0xc1a71010,%ecx
   0xc107dcd2 :mov0x10(%edx),%edx
   0xc107dcd5 :cmp%ecx,%edx
   0xc107dcd7 :je 0xc107dd00 
   0xc107dcd9 :mfence
   0xc107dcdc :mov%esi,%esi
   0xc107dcde :mov0x4(%eax),%eax
   0xc107dce1 :testb  $0x4,0xc(%eax)
   0xc107dce5 :je 0xc107dcf0 
   0xc107dce7 :pop%ebp
   0xc107dce8 :ret
   0xc107dce9 :lea0x0(%esi,%eiz,1),%esi
   0xc107dcf0 :mov%edx,%eax
   0xc107dcf2 :call   *0xc1917950
   0xc107dcf8 :pop%ebp

Thinking about it, I guess Peter is quite right saying that I likely will end on
the patch that converted preempt_count to percpu.

One thing I likely should do is to reinstall the exact same laptop with 64bit
kernel and userspace... maybe only 64bit kernel first... and make sure on my
side that this does not show up on 64bit, too. I took the word of reporters for
that (and the impression that otherwise many more people would have complained).




signature.asc
Description: OpenPGP digital signature

Re: Another preempt folding issue? (maybe bisect)

2014-02-14 Thread Peter Zijlstra

On Fri, Feb 14, 2014 at 04:21:32PM +0100, Borislav Petkov wrote:
> Oh, and just in case this is relatively easy to reproduce and in case we
> don't have any other idea, bisection might be another option. I'm not
> saying you should do it right away - I'm just putting it on the table...

I'm fairly sure you're going to hit the patch that introduced the whole
per-cpu preemption thing. Before that not receiving the IPI wasn't as
bad as it is now.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue? (maybe bisect)

2014-02-14 Thread Borislav Petkov

On Fri, Feb 14, 2014 at 04:28:16PM +0100, Stefan Bader wrote:
> Oh yeah, bisection is nearly as entertaining as doing my tax records.
> Hm, on the other hand those will have to be done at some point too...
> :-P

Ah, tax records, crap, I have those lying around in the corner since
forever... I need to do those too :-(

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue? (maybe bisect)

2014-02-14 Thread Stefan Bader

On 14.02.2014 16:21, Borislav Petkov wrote:
> Oh, and just in case this is relatively easy to reproduce and in case we
> don't have any other idea, bisection might be another option. I'm not
> saying you should do it right away - I'm just putting it on the table...
> 
> :-)
> 
> :-)
> 

Oh yeah, bisection is nearly as entertaining as doing my tax records. Hm, on the
other hand those will have to be done at some point too... :-P



signature.asc
Description: OpenPGP digital signature

Re: Another preempt folding issue? (maybe bisect)

2014-02-14 Thread Borislav Petkov

Oh, and just in case this is relatively easy to reproduce and in case we
don't have any other idea, bisection might be another option. I'm not
saying you should do it right away - I'm just putting it on the table...

:-)

:-)

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue?

2014-02-14 Thread Borislav Petkov

On Fri, Feb 14, 2014 at 03:24:09PM +0100, Stefan Bader wrote:
> Actually, this code just makes so much more sense if I let objdump do
> relocation info...

Ok, we're pretty sure you have an MFENCE there in resched_task but can
you confirm it please.

First, does /proc/cpuinfo have the "sse2" string? It should but nowadays
I don't trust anything.

Then, can you boot that kernel in qemu with the -gdb flag so that it
starts the gdb stub, here's the manpage about it:

   -gdb dev
   Wait for gdb connection on device dev. Typical connections will 
likely be
   TCP-based, but also UDP, pseudo TTY, or even stdio are reasonable 
use case.
   The latter is allowing to start QEMU from within gdb and establish 
the
   connection via a pipe:

   (gdb) target remote | exec qemu-system-i386 -gdb stdio ...

   -s  Shorthand for -gdb tcp::1234, i.e. open a gdbserver on TCP port 1234.

then boot the guest and when it is up, do

$ gdb ./vmlinux
$ target remote localhost:1234

and type in the prompt

$ (gdb) x/50i resched_task

It gives here:

(gdb) x/50i resched_task
   0x810836f0 :   data32 data32 data32 xchg %ax,%ax
   0x810836f5 : push   %rbp
   0x810836f6 : mov0x85e123(%rip),%r10d# 
0x818e1820 
   0x810836fd :mov%rsp,%rbp
   0x81083700 :push   %r12
   0x81083702 :test   %r10d,%r10d
   0x81083705 :push   %rbx
   0x81083706 :mov%rdi,%rbx
   0x81083709 :jne0x81083760 

   0x8108370b :mov0x8(%rbx),%rax
   0x8108370f :mov0x10(%rax),%rdx
   0x81083713 :and$0x8,%edx
   0x81083716 :jne0x8108373c 

   0x81083718 :lock orb $0x8,0x10(%rax)
   0x8108371d :mov0x8(%rbx),%rax
   0x81083721 :mov0x18(%rax),%r12d
   0x81083725 :callq  0x812d8fc0 

   0x8108372a :cmp%r12d,%eax
   0x8108372d :je 0x810837a0 

   0x8108372f :mfence
^^
I want to make sure the smp_mb() is really replaced with an MFENCE there.

Thanks!

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue?

2014-02-14 Thread Borislav Petkov

On Thu, Feb 13, 2014 at 07:26:05PM +0100, Peter Zijlstra wrote:
> On Thu, Feb 13, 2014 at 07:03:56PM +0100, Stefan Bader wrote:
> > Yeah... not sure the interleaved source helps or not ...
> 
> It did, thanks!

Stefan, can you also send sched/core.s? I'm particularly interested in
what resched_task() has been turned into by your build env.

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue?

2014-02-14 Thread Peter Zijlstra

On Fri, Feb 14, 2014 at 11:55:25AM +0100, Stefan Bader wrote:
> Ok, I think I now got a log of the actual issue. It seems cpu#1 missed out on
> handling a reschedule interrupt but did send one to cpu#0 and on cpu#0 while
> handling the interrupt the tif flag was not set (yet?) but then when it is,
> there is no more interrupt to do the folding... (sorry for the wrapping)...
> 
>  qemu-system-x86-2503  [000] d.h.  1145.887872: default_send_IPI_mask_logical 
> <-native_smp_send_reschedule
>  qemu-system-x86-2503  [000] d...  1145.887888: resched_task 
> <-check_preempt_curr
>   -0 [000] d.h.  1145.888001: resched_task 
> <-check_preempt_curr

>   -0 [001] dn..  1145.888788: smp_reschedule_interrupt 
> <-reschedule_interrupt
>   -0 [001] dn..  1145.888789: scheduler_ipi 
> <-smp_reschedule_interrupt

>   gnome-terminal-2440  [000] d...  1145.888957: resched_task 
> <-check_preempt_curr
>   gnome-terminal-2440  [000] d...  1145.888959: native_smp_send_reschedule 
> <-resched_task
>   gnome-terminal-2440  [000] d...  1145.888959: default_send_IPI_mask_logical 
> <-native_smp_send_reschedule

>   -0 [001] dn..  1145.889296: smp_reschedule_interrupt 
> <-reschedule_interrupt
>   -0 [001] dn..  1145.889297: scheduler_ipi 
> <-smp_reschedule_interrupt

>   gnome-terminal-2440  [000] d...  1145.889465: resched_task 
> <-check_preempt_curr
>   gnome-terminal-2440  [000] d...  1145.889467: native_smp_send_reschedule 
> <-resched_task
>   gnome-terminal-2440  [000] d...  1145.889467: default_send_IPI_mask_logical 
> <-native_smp_send_reschedule

> Xorg-1054  [001] d.h.  1145.889469: resched_task 
> <-check_preempt_wakeup
> Xorg-1054  [001] d.h.  1145.889469: native_smp_send_reschedule 
> <-resched_task
> Xorg-1054  [001] d.h.  1145.889469: default_send_IPI_mask_logical 
> <-native_smp_send_reschedule

>  qemu-system-x86-2503  [000] d.s.  1145.889643: smp_reschedule_interrupt 
> <-reschedule_interrupt
>  qemu-system-x86-2503  [000] d.s.  1145.889643: scheduler_ipi 
> <-smp_reschedule_interrupt
>  qemu-system-x86-2503  [000] .n..  1145.889647: kvm_arch_vcpu_ioctl_run: 
> sched inconsistency
>  qemu-system-x86-2503  [000] .n..  1145.889662: kvm_arch_vcpu_ioctl_run: 
> sched inconsistency
>  qemu-system-x86-2503  [000] .n..  1145.889664: kvm_arch_vcpu_ioctl_run: 
> sched inconsistency
>  qemu-system-x86-2503  [000] .n..  1145.889668: kvm_arch_vcpu_ioctl_run: 
> need(1) != should(0)
>  qemu-system-x86-2503  [000] .n..  1145.889669: kvm_arch_vcpu_ioctl_run: 
> exit_reason=2
>  qemu-system-x86-2503  [000] .n..  1145.889669: kvm_arch_vcpu_ioctl_run: 
> preempt_count=8000
> 

So afaiu you're running this on an AMD fam 0xf which doesn't have
synchronized TSC; so again the timing might be off.. however:

The first block from cpu 0, sends an IPI to cpu 1, cpu 1 sees 'n'.

Then again, cpu 0 sends to cpu 1, we observe 'n'.

So far so good and timing seems to reflect causality.

Then however, cpu 0 sends, but nothing in this trace answers on cpu 1..
odd.

Then cpu 1 sends to cpu 0, cpu 0 does receive the IPI but does _not_
observe 'n', which is mighty odd. However after the IPI is handled cpu 0
magically gets the 'n'.

So one thing I did wonder about; if cpu 0 sends an IPI while cpu 1 is
already processing the same interrupt; cpu 1 should latch the interrupt
and it would re-trigger with when we re-enable interrupts after we're
done with the first.

The only thing that would ruin that is late-ack, eg. ack (clear the
latch) when exiting the interrupt; but I checked and we ack before we
call scheduler_ipi().

Now; if this were to happen (but I don't exactly see how on a 2 CPU
system, nor do I see this in the trace), there might be a hole...

CPU 0   |   CPU 1
+
|
/* ! resched_task */|
smp_send_reschedule(1)  |L
|L  smp_reschedule_interrupt()
resched_task()  |L
  set_tsk_need_resched()|L
  smp_mb()  |L
  smp_send_reschedule(1)|L
|Lack_APIC_irq()
| scheduler_ipi()
|   need_resched();

Now strictly speaking that need_resched() load could creep up before
the store and we'd fail to observe 'n'.

However, x86 forbids reads from passing an earlier write to the same
location (it allows passing earlier writes in general). At which point
the only other option is GCC emitting that load far earlier, in fact
before ack_APIC_irq(), but that would be a violation of the C std
because scheduler_ipi() is a (real) function call and acts as a compiler
barrier AFAIK (as opposed to inlining which doesn't act as a compiler
barrier).

So colour me confused..

You could try putting an smp_mb() at the very start of scheduler_ipi()

Re: Another preempt folding issue?

2014-02-14 Thread Peter Zijlstra

On Fri, Feb 14, 2014 at 12:24:42PM +0100, Stefan Bader wrote:
> Oh and one thing I was wondering. Not sure I do understand it right... When
> initially converting to percpu counts, you changed the 32bit assembly like 
> that:
> 
> --- a/arch/x86/kernel/entry_32.S
> +++ b/arch/x86/kernel/entry_32.S
> @@ -362,12 +362,9 @@ END(ret_from_exception)
>  #ifdef CONFIG_PREEMPT
>  ENTRY(resume_kernel)
> DISABLE_INTERRUPTS(CLBR_ANY)
> -   cmpl $0,TI_preempt_count(%ebp)  # non-zero preempt_count ?
> -   jnz restore_all
>  need_resched:
> -   movl TI_flags(%ebp), %ecx   # need_resched set ?
> -   testb $_TIF_NEED_RESCHED, %cl
> -   jz restore_all
> +   cmpl $0,PER_CPU_VAR(__preempt_count)
> +   jnz restore_all
> testl $X86_EFLAGS_IF,PT_EFLAGS(%esp)# interrupts off (exception 
> path
> jz restore_all
> call preempt_schedule_irq
> 
> This seems to say if preempt_count was 0 then then if the thread flag was set
> and interrupts were not off(?) it would do a preempt ipi and then come back to
> re-check the thread flag.

No not an IPI; it would reschedule.

So the old code:

  if preempt_count != 0; continue out
  if !TIF_NEED_RESCHED; continue out
  if IRQs-off in calling context; continue out
  preempt_schedule_irq

The new code:

  if preempt_count != 0; continue out
  if IRQs-off in calling context; continue out
  preempt_schedule_irq

> This would now be if preempt_count is 0 only... and I wonder whether that 
> would
> change from doing that loop...

We can do away with the TIF_NEED_RESCHED test because that state is
folded into the preempt_count by means of PREEMPT_NEED_RESCHED.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue?

2014-02-14 Thread Stefan Bader

On 13.02.2014 19:25, Peter Zijlstra wrote:
> On Thu, Feb 13, 2014 at 06:00:19PM +0100, Stefan Bader wrote:
>> On 12.02.2014 12:54, Peter Zijlstra wrote:
>>> On Wed, Feb 12, 2014 at 12:09:29PM +0100, Stefan Bader wrote:
 Something else here I run a kernel with CONFIG_PREEMPT not set and NR_CPUS
 limited to 8 (for the 32bit kernel). So the default apic driver is used. 
 Since
 default_send_IPI_mask_logical is only used from there, I assume the trace 
 you
 got does the same. Maybe something there is wrong which would explain why 
 we
 only see it on 32bit hosts.
>>>
>>> Can you try with a different APIC driver to test this?
>>>
>> I don't think I can. And I think the statement about this only be used for 
>> 32bit
>> could be wrong. I got mislead to think so because those are only defined in
>> probe_32 but the 64bit counterpart isn't containing much aside that.
>>
>> Anyway, I played around with tracing a bit more. So with this change:
>>
>> if (need_resched()) {
>> srcu_read_unlock(>srcu, vcpu->srcu_idx);
>> if (need_resched() != should_resched()) {
>> +   trace_printk("need(%i) != should(%i)\n",
>> +   need_resched(), should_resched());
>> +   trace_printk("exit_reason=%u\n",
>> +   vcpu->run->exit_reason);
>> +trace_printk("preempt_count=%lx\n",
>> +__this_cpu_read_4(__preempt_count));
>> +tracing_stop();
>> +printk(KERN_ERR "Stopped tracing, due to
>> inconsistent state.\n");
>> }
>>  +  schedule();
>>  -  cond_reschedule();
>> vcpu->srcu_idx = srcu_read_lock(>srcu);
>> }
>>
>> I get the following (weird) output:
>>
>> Xorg-1078  [001] d...71.270251: native_smp_send_reschedule
>> <-resched_task
>> Xorg-1078  [001] d...71.270251: default_send_IPI_mask_logical
>> <-native_smp_send_reschedule
>>   bamfdaemon-2318  [001] d...71.270465: resched_task 
>> <-check_preempt_wakeup
>>   bamfdaemon-2318  [001] d...71.270539: resched_task 
>> <-check_preempt_wakeup
>>   compiz-2365  [001] d...71.270689: resched_task 
>> <-check_preempt_wakeup
>>   compiz-2365  [001] d...71.270827: resched_task 
>> <-check_preempt_wakeup
>>   compiz-2365  [001] d...71.270940: resched_task 
>> <-check_preempt_wakeup
>>  qemu-system-x86-2679  [000] dn..71.270999: smp_reschedule_interrupt
>> <-reschedule_interrupt
>>  qemu-system-x86-2679  [000] dn..71.270999: scheduler_ipi
>> <-smp_reschedule_interrupt
>>  qemu-system-x86-2679  [000] .N..71.271001: kvm_arch_vcpu_ioctl_run: 
>> need(1)
>> != should(0)
>>  qemu-system-x86-2679  [000] .N..71.271002: kvm_arch_vcpu_ioctl_run:
>> exit_reason=2
>>  qemu-system-x86-2679  [000] .N..71.271003: kvm_arch_vcpu_ioctl_run:
>> preempt_count=0
>>
>> So am I reading this right, that the interrupt did get delivered to cpu#0 
>> while
>> the thread info already had the resched flag set. So this really should have
>> cleared the bit in preempt_count. But while the trace info shows 'N' for some
>> reason should_reschedule returns false but at the same time reading the 
>> preempt
>> count manually shows it 0?
> 
> So the assembly merges the first and second should_resched(), so its
> possible that load got before the interrupt().
> 
> The 3rd preempt_count load gets re-issued and so that would show the
> 'true' value again.
> 
> If you want to force a reload after the condition; put in a barrier().
> 
> In any case; this looks like a false-positive. Please try again until
> you get one where the interrupt doesn't happen and we stay in 'n' state.
> 

Oh and one thing I was wondering. Not sure I do understand it right... When
initially converting to percpu counts, you changed the 32bit assembly like that:

--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -362,12 +362,9 @@ END(ret_from_exception)
 #ifdef CONFIG_PREEMPT
 ENTRY(resume_kernel)
DISABLE_INTERRUPTS(CLBR_ANY)
-   cmpl $0,TI_preempt_count(%ebp)  # non-zero preempt_count ?
-   jnz restore_all
 need_resched:
-   movl TI_flags(%ebp), %ecx   # need_resched set ?
-   testb $_TIF_NEED_RESCHED, %cl
-   jz restore_all
+   cmpl $0,PER_CPU_VAR(__preempt_count)
+   jnz restore_all
testl $X86_EFLAGS_IF,PT_EFLAGS(%esp)# interrupts off (exception path
jz restore_all
call preempt_schedule_irq

This seems to say if preempt_count was 0 then then if the thread flag was set
and interrupts were not off(?) it would do a preempt ipi and then come back to
re-check the thread flag.
This would now be if preempt_count is 0 only...

Re: Another preempt folding issue?

2014-02-14 Thread Stefan Bader

On 13.02.2014 19:25, Peter Zijlstra wrote:
> On Thu, Feb 13, 2014 at 06:00:19PM +0100, Stefan Bader wrote:
>> On 12.02.2014 12:54, Peter Zijlstra wrote:
>>> On Wed, Feb 12, 2014 at 12:09:29PM +0100, Stefan Bader wrote:
 Something else here I run a kernel with CONFIG_PREEMPT not set and NR_CPUS
 limited to 8 (for the 32bit kernel). So the default apic driver is used. 
 Since
 default_send_IPI_mask_logical is only used from there, I assume the trace 
 you
 got does the same. Maybe something there is wrong which would explain why 
 we
 only see it on 32bit hosts.
>>>
>>> Can you try with a different APIC driver to test this?
>>>
>> I don't think I can. And I think the statement about this only be used for 
>> 32bit
>> could be wrong. I got mislead to think so because those are only defined in
>> probe_32 but the 64bit counterpart isn't containing much aside that.
>>
>> Anyway, I played around with tracing a bit more. So with this change:
>>
>> if (need_resched()) {
>> srcu_read_unlock(>srcu, vcpu->srcu_idx);
>> if (need_resched() != should_resched()) {
>> +   trace_printk("need(%i) != should(%i)\n",
>> +   need_resched(), should_resched());
>> +   trace_printk("exit_reason=%u\n",
>> +   vcpu->run->exit_reason);
>> +trace_printk("preempt_count=%lx\n",
>> +__this_cpu_read_4(__preempt_count));
>> +tracing_stop();
>> +printk(KERN_ERR "Stopped tracing, due to
>> inconsistent state.\n");
>> }
>>  +  schedule();
>>  -  cond_reschedule();
>> vcpu->srcu_idx = srcu_read_lock(>srcu);
>> }
>>
>> I get the following (weird) output:
>>
>> Xorg-1078  [001] d...71.270251: native_smp_send_reschedule
>> <-resched_task
>> Xorg-1078  [001] d...71.270251: default_send_IPI_mask_logical
>> <-native_smp_send_reschedule
>>   bamfdaemon-2318  [001] d...71.270465: resched_task 
>> <-check_preempt_wakeup
>>   bamfdaemon-2318  [001] d...71.270539: resched_task 
>> <-check_preempt_wakeup
>>   compiz-2365  [001] d...71.270689: resched_task 
>> <-check_preempt_wakeup
>>   compiz-2365  [001] d...71.270827: resched_task 
>> <-check_preempt_wakeup
>>   compiz-2365  [001] d...71.270940: resched_task 
>> <-check_preempt_wakeup
>>  qemu-system-x86-2679  [000] dn..71.270999: smp_reschedule_interrupt
>> <-reschedule_interrupt
>>  qemu-system-x86-2679  [000] dn..71.270999: scheduler_ipi
>> <-smp_reschedule_interrupt
>>  qemu-system-x86-2679  [000] .N..71.271001: kvm_arch_vcpu_ioctl_run: 
>> need(1)
>> != should(0)
>>  qemu-system-x86-2679  [000] .N..71.271002: kvm_arch_vcpu_ioctl_run:
>> exit_reason=2
>>  qemu-system-x86-2679  [000] .N..71.271003: kvm_arch_vcpu_ioctl_run:
>> preempt_count=0
>>
>> So am I reading this right, that the interrupt did get delivered to cpu#0 
>> while
>> the thread info already had the resched flag set. So this really should have
>> cleared the bit in preempt_count. But while the trace info shows 'N' for some
>> reason should_reschedule returns false but at the same time reading the 
>> preempt
>> count manually shows it 0?
> 
> So the assembly merges the first and second should_resched(), so its
> possible that load got before the interrupt().
> 
> The 3rd preempt_count load gets re-issued and so that would show the
> 'true' value again.
> 
> If you want to force a reload after the condition; put in a barrier().
> 
> In any case; this looks like a false-positive. Please try again until
> you get one where the interrupt doesn't happen and we stay in 'n' state.
> 
Ok, I think I now got a log of the actual issue. It seems cpu#1 missed out on
handling a reschedule interrupt but did send one to cpu#0 and on cpu#0 while
handling the interrupt the tif flag was not set (yet?) but then when it is,
there is no more interrupt to do the folding... (sorry for the wrapping)...

 qemu-system-x86-2503  [000] d.h.  1145.887872: default_send_IPI_mask_logical
<-native_smp_send_reschedule
 qemu-system-x86-2503  [000] d...  1145.887888: resched_task 
<-check_preempt_curr
  -0 [000] d.h.  1145.888001: resched_task 
<-check_preempt_curr
  -0 [001] dn..  1145.888788: smp_reschedule_interrupt
<-reschedule_interrupt
  -0 [001] dn..  1145.888789: scheduler_ipi
<-smp_reschedule_interrupt
  gnome-terminal-2440  [000] d...  1145.888957: resched_task 
<-check_preempt_curr
  gnome-terminal-2440  [000] d...  1145.888959: native_smp_send_reschedule
<-resched_task
  gnome-terminal-2440  [000] d...  1145.888959: default_send_IPI_mask_logical

Re: Another preempt folding issue?

2014-02-14 Thread Stefan Bader

On 13.02.2014 19:25, Peter Zijlstra wrote:
 On Thu, Feb 13, 2014 at 06:00:19PM +0100, Stefan Bader wrote:
 On 12.02.2014 12:54, Peter Zijlstra wrote:
 On Wed, Feb 12, 2014 at 12:09:29PM +0100, Stefan Bader wrote:
 Something else here I run a kernel with CONFIG_PREEMPT not set and NR_CPUS
 limited to 8 (for the 32bit kernel). So the default apic driver is used. 
 Since
 default_send_IPI_mask_logical is only used from there, I assume the trace 
 you
 got does the same. Maybe something there is wrong which would explain why 
 we
 only see it on 32bit hosts.

 Can you try with a different APIC driver to test this?

 I don't think I can. And I think the statement about this only be used for 
 32bit
 could be wrong. I got mislead to think so because those are only defined in
 probe_32 but the 64bit counterpart isn't containing much aside that.

 Anyway, I played around with tracing a bit more. So with this change:

 if (need_resched()) {
 srcu_read_unlock(kvm-srcu, vcpu-srcu_idx);
 if (need_resched() != should_resched()) {
 +   trace_printk(need(%i) != should(%i)\n,
 +   need_resched(), should_resched());
 +   trace_printk(exit_reason=%u\n,
 +   vcpu-run-exit_reason);
 +trace_printk(preempt_count=%lx\n,
 +__this_cpu_read_4(__preempt_count));
 +tracing_stop();
 +printk(KERN_ERR Stopped tracing, due to
 inconsistent state.\n);
 }
  +  schedule();
  -  cond_reschedule();
 vcpu-srcu_idx = srcu_read_lock(kvm-srcu);
 }

 I get the following (weird) output:

 Xorg-1078  [001] d...71.270251: native_smp_send_reschedule
 -resched_task
 Xorg-1078  [001] d...71.270251: default_send_IPI_mask_logical
 -native_smp_send_reschedule
   bamfdaemon-2318  [001] d...71.270465: resched_task 
 -check_preempt_wakeup
   bamfdaemon-2318  [001] d...71.270539: resched_task 
 -check_preempt_wakeup
   compiz-2365  [001] d...71.270689: resched_task 
 -check_preempt_wakeup
   compiz-2365  [001] d...71.270827: resched_task 
 -check_preempt_wakeup
   compiz-2365  [001] d...71.270940: resched_task 
 -check_preempt_wakeup
  qemu-system-x86-2679  [000] dn..71.270999: smp_reschedule_interrupt
 -reschedule_interrupt
  qemu-system-x86-2679  [000] dn..71.270999: scheduler_ipi
 -smp_reschedule_interrupt
  qemu-system-x86-2679  [000] .N..71.271001: kvm_arch_vcpu_ioctl_run: 
 need(1)
 != should(0)
  qemu-system-x86-2679  [000] .N..71.271002: kvm_arch_vcpu_ioctl_run:
 exit_reason=2
  qemu-system-x86-2679  [000] .N..71.271003: kvm_arch_vcpu_ioctl_run:
 preempt_count=0

 So am I reading this right, that the interrupt did get delivered to cpu#0 
 while
 the thread info already had the resched flag set. So this really should have
 cleared the bit in preempt_count. But while the trace info shows 'N' for some
 reason should_reschedule returns false but at the same time reading the 
 preempt
 count manually shows it 0?
 
 So the assembly merges the first and second should_resched(), so its
 possible that load got before the interrupt().
 
 The 3rd preempt_count load gets re-issued and so that would show the
 'true' value again.
 
 If you want to force a reload after the condition; put in a barrier().
 
 In any case; this looks like a false-positive. Please try again until
 you get one where the interrupt doesn't happen and we stay in 'n' state.
 
Ok, I think I now got a log of the actual issue. It seems cpu#1 missed out on
handling a reschedule interrupt but did send one to cpu#0 and on cpu#0 while
handling the interrupt the tif flag was not set (yet?) but then when it is,
there is no more interrupt to do the folding... (sorry for the wrapping)...

 qemu-system-x86-2503  [000] d.h.  1145.887872: default_send_IPI_mask_logical
-native_smp_send_reschedule
 qemu-system-x86-2503  [000] d...  1145.887888: resched_task 
-check_preempt_curr
  idle-0 [000] d.h.  1145.888001: resched_task 
-check_preempt_curr
  idle-0 [001] dn..  1145.888788: smp_reschedule_interrupt
-reschedule_interrupt
  idle-0 [001] dn..  1145.888789: scheduler_ipi
-smp_reschedule_interrupt
  gnome-terminal-2440  [000] d...  1145.888957: resched_task 
-check_preempt_curr
  gnome-terminal-2440  [000] d...  1145.888959: native_smp_send_reschedule
-resched_task
  gnome-terminal-2440  [000] d...  1145.888959: default_send_IPI_mask_logical
-native_smp_send_reschedule
  idle-0 [001] dn..  1145.889296: smp_reschedule_interrupt
-reschedule_interrupt
  idle-0 [001] dn..  1145.889297: scheduler_ipi
-smp_reschedule_interrupt

Re: Another preempt folding issue?

2014-02-14 Thread Stefan Bader

On 13.02.2014 19:25, Peter Zijlstra wrote:
 On Thu, Feb 13, 2014 at 06:00:19PM +0100, Stefan Bader wrote:
 On 12.02.2014 12:54, Peter Zijlstra wrote:
 On Wed, Feb 12, 2014 at 12:09:29PM +0100, Stefan Bader wrote:
 Something else here I run a kernel with CONFIG_PREEMPT not set and NR_CPUS
 limited to 8 (for the 32bit kernel). So the default apic driver is used. 
 Since
 default_send_IPI_mask_logical is only used from there, I assume the trace 
 you
 got does the same. Maybe something there is wrong which would explain why 
 we
 only see it on 32bit hosts.

 Can you try with a different APIC driver to test this?

 I don't think I can. And I think the statement about this only be used for 
 32bit
 could be wrong. I got mislead to think so because those are only defined in
 probe_32 but the 64bit counterpart isn't containing much aside that.

 Anyway, I played around with tracing a bit more. So with this change:

 if (need_resched()) {
 srcu_read_unlock(kvm-srcu, vcpu-srcu_idx);
 if (need_resched() != should_resched()) {
 +   trace_printk(need(%i) != should(%i)\n,
 +   need_resched(), should_resched());
 +   trace_printk(exit_reason=%u\n,
 +   vcpu-run-exit_reason);
 +trace_printk(preempt_count=%lx\n,
 +__this_cpu_read_4(__preempt_count));
 +tracing_stop();
 +printk(KERN_ERR Stopped tracing, due to
 inconsistent state.\n);
 }
  +  schedule();
  -  cond_reschedule();
 vcpu-srcu_idx = srcu_read_lock(kvm-srcu);
 }

 I get the following (weird) output:

 Xorg-1078  [001] d...71.270251: native_smp_send_reschedule
 -resched_task
 Xorg-1078  [001] d...71.270251: default_send_IPI_mask_logical
 -native_smp_send_reschedule
   bamfdaemon-2318  [001] d...71.270465: resched_task 
 -check_preempt_wakeup
   bamfdaemon-2318  [001] d...71.270539: resched_task 
 -check_preempt_wakeup
   compiz-2365  [001] d...71.270689: resched_task 
 -check_preempt_wakeup
   compiz-2365  [001] d...71.270827: resched_task 
 -check_preempt_wakeup
   compiz-2365  [001] d...71.270940: resched_task 
 -check_preempt_wakeup
  qemu-system-x86-2679  [000] dn..71.270999: smp_reschedule_interrupt
 -reschedule_interrupt
  qemu-system-x86-2679  [000] dn..71.270999: scheduler_ipi
 -smp_reschedule_interrupt
  qemu-system-x86-2679  [000] .N..71.271001: kvm_arch_vcpu_ioctl_run: 
 need(1)
 != should(0)
  qemu-system-x86-2679  [000] .N..71.271002: kvm_arch_vcpu_ioctl_run:
 exit_reason=2
  qemu-system-x86-2679  [000] .N..71.271003: kvm_arch_vcpu_ioctl_run:
 preempt_count=0

 So am I reading this right, that the interrupt did get delivered to cpu#0 
 while
 the thread info already had the resched flag set. So this really should have
 cleared the bit in preempt_count. But while the trace info shows 'N' for some
 reason should_reschedule returns false but at the same time reading the 
 preempt
 count manually shows it 0?
 
 So the assembly merges the first and second should_resched(), so its
 possible that load got before the interrupt().
 
 The 3rd preempt_count load gets re-issued and so that would show the
 'true' value again.
 
 If you want to force a reload after the condition; put in a barrier().
 
 In any case; this looks like a false-positive. Please try again until
 you get one where the interrupt doesn't happen and we stay in 'n' state.
 

Oh and one thing I was wondering. Not sure I do understand it right... When
initially converting to percpu counts, you changed the 32bit assembly like that:

--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -362,12 +362,9 @@ END(ret_from_exception)
 #ifdef CONFIG_PREEMPT
 ENTRY(resume_kernel)
DISABLE_INTERRUPTS(CLBR_ANY)
-   cmpl $0,TI_preempt_count(%ebp)  # non-zero preempt_count ?
-   jnz restore_all
 need_resched:
-   movl TI_flags(%ebp), %ecx   # need_resched set ?
-   testb $_TIF_NEED_RESCHED, %cl
-   jz restore_all
+   cmpl $0,PER_CPU_VAR(__preempt_count)
+   jnz restore_all
testl $X86_EFLAGS_IF,PT_EFLAGS(%esp)# interrupts off (exception path
jz restore_all
call preempt_schedule_irq

This seems to say if preempt_count was 0 then then if the thread flag was set
and interrupts were not off(?) it would do a preempt ipi and then come back to
re-check the thread flag.
This would now be if preempt_count is 0 only... and I wonder whether that would
change from doing that loop...



signature.asc
Description: OpenPGP digital signature

Re: Another preempt folding issue?

2014-02-14 Thread Peter Zijlstra

On Fri, Feb 14, 2014 at 12:24:42PM +0100, Stefan Bader wrote:
 Oh and one thing I was wondering. Not sure I do understand it right... When
 initially converting to percpu counts, you changed the 32bit assembly like 
 that:
 
 --- a/arch/x86/kernel/entry_32.S
 +++ b/arch/x86/kernel/entry_32.S
 @@ -362,12 +362,9 @@ END(ret_from_exception)
  #ifdef CONFIG_PREEMPT
  ENTRY(resume_kernel)
 DISABLE_INTERRUPTS(CLBR_ANY)
 -   cmpl $0,TI_preempt_count(%ebp)  # non-zero preempt_count ?
 -   jnz restore_all
  need_resched:
 -   movl TI_flags(%ebp), %ecx   # need_resched set ?
 -   testb $_TIF_NEED_RESCHED, %cl
 -   jz restore_all
 +   cmpl $0,PER_CPU_VAR(__preempt_count)
 +   jnz restore_all
 testl $X86_EFLAGS_IF,PT_EFLAGS(%esp)# interrupts off (exception 
 path
 jz restore_all
 call preempt_schedule_irq
 
 This seems to say if preempt_count was 0 then then if the thread flag was set
 and interrupts were not off(?) it would do a preempt ipi and then come back to
 re-check the thread flag.

No not an IPI; it would reschedule.

So the old code:

  if preempt_count != 0; continue out
  if !TIF_NEED_RESCHED; continue out
  if IRQs-off in calling context; continue out
  preempt_schedule_irq

The new code:

  if preempt_count != 0; continue out
  if IRQs-off in calling context; continue out
  preempt_schedule_irq

 This would now be if preempt_count is 0 only... and I wonder whether that 
 would
 change from doing that loop...

We can do away with the TIF_NEED_RESCHED test because that state is
folded into the preempt_count by means of PREEMPT_NEED_RESCHED.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue?

2014-02-14 Thread Peter Zijlstra

On Fri, Feb 14, 2014 at 11:55:25AM +0100, Stefan Bader wrote:
 Ok, I think I now got a log of the actual issue. It seems cpu#1 missed out on
 handling a reschedule interrupt but did send one to cpu#0 and on cpu#0 while
 handling the interrupt the tif flag was not set (yet?) but then when it is,
 there is no more interrupt to do the folding... (sorry for the wrapping)...
 
  qemu-system-x86-2503  [000] d.h.  1145.887872: default_send_IPI_mask_logical 
 -native_smp_send_reschedule
  qemu-system-x86-2503  [000] d...  1145.887888: resched_task 
 -check_preempt_curr
   idle-0 [000] d.h.  1145.888001: resched_task 
 -check_preempt_curr

   idle-0 [001] dn..  1145.888788: smp_reschedule_interrupt 
 -reschedule_interrupt
   idle-0 [001] dn..  1145.888789: scheduler_ipi 
 -smp_reschedule_interrupt

   gnome-terminal-2440  [000] d...  1145.888957: resched_task 
 -check_preempt_curr
   gnome-terminal-2440  [000] d...  1145.888959: native_smp_send_reschedule 
 -resched_task
   gnome-terminal-2440  [000] d...  1145.888959: default_send_IPI_mask_logical 
 -native_smp_send_reschedule

   idle-0 [001] dn..  1145.889296: smp_reschedule_interrupt 
 -reschedule_interrupt
   idle-0 [001] dn..  1145.889297: scheduler_ipi 
 -smp_reschedule_interrupt

   gnome-terminal-2440  [000] d...  1145.889465: resched_task 
 -check_preempt_curr
   gnome-terminal-2440  [000] d...  1145.889467: native_smp_send_reschedule 
 -resched_task
   gnome-terminal-2440  [000] d...  1145.889467: default_send_IPI_mask_logical 
 -native_smp_send_reschedule

 Xorg-1054  [001] d.h.  1145.889469: resched_task 
 -check_preempt_wakeup
 Xorg-1054  [001] d.h.  1145.889469: native_smp_send_reschedule 
 -resched_task
 Xorg-1054  [001] d.h.  1145.889469: default_send_IPI_mask_logical 
 -native_smp_send_reschedule

  qemu-system-x86-2503  [000] d.s.  1145.889643: smp_reschedule_interrupt 
 -reschedule_interrupt
  qemu-system-x86-2503  [000] d.s.  1145.889643: scheduler_ipi 
 -smp_reschedule_interrupt
  qemu-system-x86-2503  [000] .n..  1145.889647: kvm_arch_vcpu_ioctl_run: 
 sched inconsistency
  qemu-system-x86-2503  [000] .n..  1145.889662: kvm_arch_vcpu_ioctl_run: 
 sched inconsistency
  qemu-system-x86-2503  [000] .n..  1145.889664: kvm_arch_vcpu_ioctl_run: 
 sched inconsistency
  qemu-system-x86-2503  [000] .n..  1145.889668: kvm_arch_vcpu_ioctl_run: 
 need(1) != should(0)
  qemu-system-x86-2503  [000] .n..  1145.889669: kvm_arch_vcpu_ioctl_run: 
 exit_reason=2
  qemu-system-x86-2503  [000] .n..  1145.889669: kvm_arch_vcpu_ioctl_run: 
 preempt_count=8000
 

So afaiu you're running this on an AMD fam 0xf which doesn't have
synchronized TSC; so again the timing might be off.. however:

The first block from cpu 0, sends an IPI to cpu 1, cpu 1 sees 'n'.

Then again, cpu 0 sends to cpu 1, we observe 'n'.

So far so good and timing seems to reflect causality.

Then however, cpu 0 sends, but nothing in this trace answers on cpu 1..
odd.

Then cpu 1 sends to cpu 0, cpu 0 does receive the IPI but does _not_
observe 'n', which is mighty odd. However after the IPI is handled cpu 0
magically gets the 'n'.



So one thing I did wonder about; if cpu 0 sends an IPI while cpu 1 is
already processing the same interrupt; cpu 1 should latch the interrupt
and it would re-trigger with when we re-enable interrupts after we're
done with the first.

The only thing that would ruin that is late-ack, eg. ack (clear the
latch) when exiting the interrupt; but I checked and we ack before we
call scheduler_ipi().

Now; if this were to happen (but I don't exactly see how on a 2 CPU
system, nor do I see this in the trace), there might be a hole...

CPU 0   |   CPU 1
+
|
/* ! resched_task */|
smp_send_reschedule(1)  |L
|L  smp_reschedule_interrupt()
resched_task()  |L
  set_tsk_need_resched()|L
  smp_mb()  |L
  smp_send_reschedule(1)|L
|Lack_APIC_irq()
| scheduler_ipi()
|   need_resched();

Now strictly speaking that need_resched() load could creep up before
the store and we'd fail to observe 'n'.

However, x86 forbids reads from passing an earlier write to the same
location (it allows passing earlier writes in general). At which point
the only other option is GCC emitting that load far earlier, in fact
before ack_APIC_irq(), but that would be a violation of the C std
because scheduler_ipi() is a (real) function call and acts as a compiler
barrier AFAIK (as opposed to inlining which doesn't act as a compiler
barrier).

So colour me confused..


You could try putting an smp_mb() at the very start of scheduler_ipi()
just to make absolutely sure, but it _should_ not

Re: Another preempt folding issue?

2014-02-14 Thread Borislav Petkov

On Thu, Feb 13, 2014 at 07:26:05PM +0100, Peter Zijlstra wrote:
 On Thu, Feb 13, 2014 at 07:03:56PM +0100, Stefan Bader wrote:
  Yeah... not sure the interleaved source helps or not ...
 
 It did, thanks!

Stefan, can you also send sched/core.s? I'm particularly interested in
what resched_task() has been turned into by your build env.

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue?

2014-02-14 Thread Borislav Petkov

On Fri, Feb 14, 2014 at 03:24:09PM +0100, Stefan Bader wrote:
 Actually, this code just makes so much more sense if I let objdump do
 relocation info...

Ok, we're pretty sure you have an MFENCE there in resched_task but can
you confirm it please.

First, does /proc/cpuinfo have the sse2 string? It should but nowadays
I don't trust anything.

Then, can you boot that kernel in qemu with the -gdb flag so that it
starts the gdb stub, here's the manpage about it:

   -gdb dev
   Wait for gdb connection on device dev. Typical connections will 
likely be
   TCP-based, but also UDP, pseudo TTY, or even stdio are reasonable 
use case.
   The latter is allowing to start QEMU from within gdb and establish 
the
   connection via a pipe:

   (gdb) target remote | exec qemu-system-i386 -gdb stdio ...

   -s  Shorthand for -gdb tcp::1234, i.e. open a gdbserver on TCP port 1234.

then boot the guest and when it is up, do

$ gdb ./vmlinux
$ target remote localhost:1234

and type in the prompt

$ (gdb) x/50i resched_task

It gives here:

(gdb) x/50i resched_task
   0x810836f0 resched_task:   data32 data32 data32 xchg %ax,%ax
   0x810836f5 resched_task+5: push   %rbp
   0x810836f6 resched_task+6: mov0x85e123(%rip),%r10d# 
0x818e1820 debug_locks
   0x810836fd resched_task+13:mov%rsp,%rbp
   0x81083700 resched_task+16:push   %r12
   0x81083702 resched_task+18:test   %r10d,%r10d
   0x81083705 resched_task+21:push   %rbx
   0x81083706 resched_task+22:mov%rdi,%rbx
   0x81083709 resched_task+25:jne0x81083760 
resched_task+112
   0x8108370b resched_task+27:mov0x8(%rbx),%rax
   0x8108370f resched_task+31:mov0x10(%rax),%rdx
   0x81083713 resched_task+35:and$0x8,%edx
   0x81083716 resched_task+38:jne0x8108373c 
resched_task+76
   0x81083718 resched_task+40:lock orb $0x8,0x10(%rax)
   0x8108371d resched_task+45:mov0x8(%rbx),%rax
   0x81083721 resched_task+49:mov0x18(%rax),%r12d
   0x81083725 resched_task+53:callq  0x812d8fc0 
debug_smp_processor_id
   0x8108372a resched_task+58:cmp%r12d,%eax
   0x8108372d resched_task+61:je 0x810837a0 
resched_task+176
   0x8108372f resched_task+63:mfence
^^
I want to make sure the smp_mb() is really replaced with an MFENCE there.

Thanks!

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue? (maybe bisect)

2014-02-14 Thread Borislav Petkov

Oh, and just in case this is relatively easy to reproduce and in case we
don't have any other idea, bisection might be another option. I'm not
saying you should do it right away - I'm just putting it on the table...

:-)

:-)

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue? (maybe bisect)

2014-02-14 Thread Stefan Bader

On 14.02.2014 16:21, Borislav Petkov wrote:
 Oh, and just in case this is relatively easy to reproduce and in case we
 don't have any other idea, bisection might be another option. I'm not
 saying you should do it right away - I'm just putting it on the table...
 
 :-)
 
 :-)
 

Oh yeah, bisection is nearly as entertaining as doing my tax records. Hm, on the
other hand those will have to be done at some point too... :-P



signature.asc
Description: OpenPGP digital signature

Re: Another preempt folding issue? (maybe bisect)

2014-02-14 Thread Borislav Petkov

On Fri, Feb 14, 2014 at 04:28:16PM +0100, Stefan Bader wrote:
 Oh yeah, bisection is nearly as entertaining as doing my tax records.
 Hm, on the other hand those will have to be done at some point too...
 :-P

Ah, tax records, crap, I have those lying around in the corner since
forever... I need to do those too :-(

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue? (maybe bisect)

2014-02-14 Thread Peter Zijlstra

On Fri, Feb 14, 2014 at 04:21:32PM +0100, Borislav Petkov wrote:
 Oh, and just in case this is relatively easy to reproduce and in case we
 don't have any other idea, bisection might be another option. I'm not
 saying you should do it right away - I'm just putting it on the table...

I'm fairly sure you're going to hit the patch that introduced the whole
per-cpu preemption thing. Before that not receiving the IPI wasn't as
bad as it is now.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue?

2014-02-14 Thread Stefan Bader

On 14.02.2014 15:47, Borislav Petkov wrote:
 On Fri, Feb 14, 2014 at 03:24:09PM +0100, Stefan Bader wrote:
 Actually, this code just makes so much more sense if I let objdump do
 relocation info...
 
 Ok, we're pretty sure you have an MFENCE there in resched_task but can
 you confirm it please.
 
 First, does /proc/cpuinfo have the sse2 string? It should but nowadays
 I don't trust anything.
 
 Then, can you boot that kernel in qemu with the -gdb flag so that it
 starts the gdb stub, here's the manpage about it:
 
-gdb dev
Wait for gdb connection on device dev. Typical connections will 
 likely be
TCP-based, but also UDP, pseudo TTY, or even stdio are reasonable 
 use case.
The latter is allowing to start QEMU from within gdb and establish 
 the
connection via a pipe:
 
(gdb) target remote | exec qemu-system-i386 -gdb stdio ...
 
-s  Shorthand for -gdb tcp::1234, i.e. open a gdbserver on TCP port 
 1234.
 
 then boot the guest and when it is up, do
 
 $ gdb ./vmlinux
 $ target remote localhost:1234
 
 and type in the prompt
 
 $ (gdb) x/50i resched_task
 
 It gives here:
 
 (gdb) x/50i resched_task
0x810836f0 resched_task:   data32 data32 data32 xchg %ax,%ax
0x810836f5 resched_task+5: push   %rbp
0x810836f6 resched_task+6: mov0x85e123(%rip),%r10d# 
 0x818e1820 debug_locks
0x810836fd resched_task+13:mov%rsp,%rbp
0x81083700 resched_task+16:push   %r12
0x81083702 resched_task+18:test   %r10d,%r10d
0x81083705 resched_task+21:push   %rbx
0x81083706 resched_task+22:mov%rdi,%rbx
0x81083709 resched_task+25:jne0x81083760 
 resched_task+112
0x8108370b resched_task+27:mov0x8(%rbx),%rax
0x8108370f resched_task+31:mov0x10(%rax),%rdx
0x81083713 resched_task+35:and$0x8,%edx
0x81083716 resched_task+38:jne0x8108373c 
 resched_task+76
0x81083718 resched_task+40:lock orb $0x8,0x10(%rax)
0x8108371d resched_task+45:mov0x8(%rbx),%rax
0x81083721 resched_task+49:mov0x18(%rax),%r12d
0x81083725 resched_task+53:callq  0x812d8fc0 
 debug_smp_processor_id
0x8108372a resched_task+58:cmp%r12d,%eax
0x8108372d resched_task+61:je 0x810837a0 
 resched_task+176
0x8108372f resched_task+63:mfence
   ^^
 I want to make sure the smp_mb() is really replaced with an MFENCE there.
 
 Thanks!
 
Okaaay, I think I did what you asked. So yes, there is sse2 in the cpu info. And
there is a mfence in the disassembly:

   0xc107dcb0 resched_task:   push   %ebp
   0xc107dcb1 resched_task+1: mov%esp,%ebp
   0xc107dcb3 resched_task+3: lea%ds:0x0(%esi,%eiz,1),%esi
   0xc107dcb8 resched_task+8: mov0x4(%eax),%edx
   0xc107dcbb resched_task+11:mov0x8(%edx),%ecx
   0xc107dcbe resched_task+14:and$0x8,%ecx
   0xc107dcc1 resched_task+17:jne0xc107dce7 resched_task+55
   0xc107dcc3 resched_task+19:orb$0x8,%ds:0x8(%edx)
   0xc107dcc8 resched_task+24:mov0x4(%eax),%edx
   0xc107dccb resched_task+27:mov%fs:0xc1a71010,%ecx
   0xc107dcd2 resched_task+34:mov0x10(%edx),%edx
   0xc107dcd5 resched_task+37:cmp%ecx,%edx
   0xc107dcd7 resched_task+39:je 0xc107dd00 resched_task+80
   0xc107dcd9 resched_task+41:mfence
   0xc107dcdc resched_task+44:mov%esi,%esi
   0xc107dcde resched_task+46:mov0x4(%eax),%eax
   0xc107dce1 resched_task+49:testb  $0x4,0xc(%eax)
   0xc107dce5 resched_task+53:je 0xc107dcf0 resched_task+64
   0xc107dce7 resched_task+55:pop%ebp
   0xc107dce8 resched_task+56:ret
   0xc107dce9 resched_task+57:lea0x0(%esi,%eiz,1),%esi
   0xc107dcf0 resched_task+64:mov%edx,%eax
   0xc107dcf2 resched_task+66:call   *0xc1917950
   0xc107dcf8 resched_task+72:pop%ebp

Thinking about it, I guess Peter is quite right saying that I likely will end on
the patch that converted preempt_count to percpu.

One thing I likely should do is to reinstall the exact same laptop with 64bit
kernel and userspace... maybe only 64bit kernel first... and make sure on my
side that this does not show up on 64bit, too. I took the word of reporters for
that (and the impression that otherwise many more people would have complained).




signature.asc
Description: OpenPGP digital signature

Re: Another preempt folding issue?

2014-02-14 Thread Peter Zijlstra

On Fri, Feb 14, 2014 at 06:02:32PM +0100, Stefan Bader wrote:
 One thing I likely should do is to reinstall the exact same laptop with 64bit
 kernel and userspace... maybe only 64bit kernel first... and make sure on my
 side that this does not show up on 64bit, too. I took the word of reporters 
 for
 that (and the impression that otherwise many more people would have 
 complained).

Yeha, I'm going to try and install some 32bit userspace on a usb
harddisk I've got and see if I can boot my Core2 laptop from that to try
and reproduce.

But all that is probably going to be Monday :/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue?

2014-02-14 Thread Borislav Petkov

On Fri, Feb 14, 2014 at 06:02:32PM +0100, Stefan Bader wrote:
 Okaaay, I think I did what you asked. So yes, there is sse2 in the cpu info. 
 And
 there is a mfence in the disassembly:

Btw, I just realized booting the kernel in the guest was a dumb idea,
because, doh, the guest is not baremetal. The only reliable thing we
can say is that sse2 is present and that MFENCE alternative replacement
works :)

But for simplicity's sake let's just assume the machine can do MFENCE
just fine and it gets replaced by the alternatives code.

Besides, if that weren't true, we'd have a whole lot of other problems
on those boxes.

 Thinking about it, I guess Peter is quite right saying that I likely
 will end on the patch that converted preempt_count to percpu.

Yeah, c2daa3bed53a81171cf8c1a36db798e82b91afe8 et al.

 One thing I likely should do is to reinstall the exact same laptop
 with 64bit kernel and userspace... maybe only 64bit kernel first...
 and make sure on my side that this does not show up on 64bit, too. I
 took the word of reporters for that (and the impression that otherwise
 many more people would have complained).

Yeah, that should be a prudent thing to do.

Also, Paolo and I were wondering whether you can trigger this thing
without kvm, i.e. virtualization involved... do you have any data on
that?

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue?

2014-02-14 Thread Stefan Bader

On 14.02.2014 18:33, Borislav Petkov wrote:
 On Fri, Feb 14, 2014 at 06:02:32PM +0100, Stefan Bader wrote:
 Okaaay, I think I did what you asked. So yes, there is sse2 in the cpu info. 
 And
 there is a mfence in the disassembly:
 
 Btw, I just realized booting the kernel in the guest was a dumb idea,
 because, doh, the guest is not baremetal. The only reliable thing we
 can say is that sse2 is present and that MFENCE alternative replacement
 works :)
 
 But for simplicity's sake let's just assume the machine can do MFENCE
 just fine and it gets replaced by the alternatives code.
 
 Besides, if that weren't true, we'd have a whole lot of other problems
 on those boxes.
 
 Thinking about it, I guess Peter is quite right saying that I likely
 will end on the patch that converted preempt_count to percpu.
 
 Yeah, c2daa3bed53a81171cf8c1a36db798e82b91afe8 et al.
 
 One thing I likely should do is to reinstall the exact same laptop
 with 64bit kernel and userspace... maybe only 64bit kernel first...
 and make sure on my side that this does not show up on 64bit, too. I
 took the word of reporters for that (and the impression that otherwise
 many more people would have complained).
 
 Yeah, that should be a prudent thing to do.
 
 Also, Paolo and I were wondering whether you can trigger this thing
 without kvm, i.e. virtualization involved... do you have any data on
 that?

Unfortunately no hard evidence. Kvm just happens to be such a good way to notice
this as it is using the reschedule interrupt itself and has this exit before
running the guest vcpu to hadnle it in the outer loop by calling cond_resched()
and repeat.
I find running kvm seems to make that laptop quite sluggish in responding to
other tasks (in that install) and I got some oddness going on when lightdm quite
often refuses to take keyboard input without opening some menu with the mouse
first... But I could not be sure whether that is the kernel or some new
user-space ... errr feature.
At least Marcello (iirc that other report came from him directly or indirectly)
has seen it, too. And he likely has complete different user-space.

So I will go and do that different (64bit) kernel and kernel + user-space test.
But like fo Peter, it likely is a Monday thing...






signature.asc
Description: OpenPGP digital signature

Re: Another preempt folding issue?

2014-02-14 Thread Stefan Bader

On 14.02.2014 19:23, Stefan Bader wrote:
 On 14.02.2014 18:33, Borislav Petkov wrote:
 On Fri, Feb 14, 2014 at 06:02:32PM +0100, Stefan Bader wrote:
 Okaaay, I think I did what you asked. So yes, there is sse2 in the cpu 
 info. And
 there is a mfence in the disassembly:

 Btw, I just realized booting the kernel in the guest was a dumb idea,
 because, doh, the guest is not baremetal. The only reliable thing we
 can say is that sse2 is present and that MFENCE alternative replacement
 works :)

 But for simplicity's sake let's just assume the machine can do MFENCE
 just fine and it gets replaced by the alternatives code.

 Besides, if that weren't true, we'd have a whole lot of other problems
 on those boxes.

 Thinking about it, I guess Peter is quite right saying that I likely
 will end on the patch that converted preempt_count to percpu.

 Yeah, c2daa3bed53a81171cf8c1a36db798e82b91afe8 et al.

 One thing I likely should do is to reinstall the exact same laptop
 with 64bit kernel and userspace... maybe only 64bit kernel first...
 and make sure on my side that this does not show up on 64bit, too. I
 took the word of reporters for that (and the impression that otherwise
 many more people would have complained).

 Yeah, that should be a prudent thing to do.

 Also, Paolo and I were wondering whether you can trigger this thing
 without kvm, i.e. virtualization involved... do you have any data on
 that?
 
 Unfortunately no hard evidence. Kvm just happens to be such a good way to 
 notice
 this as it is using the reschedule interrupt itself and has this exit before
 running the guest vcpu to hadnle it in the outer loop by calling 
 cond_resched()
 and repeat.
 I find running kvm seems to make that laptop quite sluggish in responding to
 other tasks (in that install) and I got some oddness going on when lightdm 
 quite
 often refuses to take keyboard input without opening some menu with the mouse
 first... But I could not be sure whether that is the kernel or some new
 user-space ... errr feature.
 At least Marcello (iirc that other report came from him directly or 
 indirectly)
 has seen it, too. And he likely has complete different user-space.
 
 So I will go and do that different (64bit) kernel and kernel + user-space 
 test.
 But like fo Peter, it likely is a Monday thing...
 

Ok, it is still Friday... So a quick test (2 boots of a 32bit guest, same as
before) on the 32bit user-space, with the same kernel source, but compiled as
64bit (obviously not 100% same config but close). While I see the false
inconsistency messages (I modified the in kernel-test to trigger the trace stop
only if the __vcpu_run loop encounters an inconsistent state three times in a
row), I do not see the final stop message. Also (but that is rather feeling) the
system seems to remain more responsive (switching to other windows, opening
terminal windows,...) compared to 32bit kernel.




signature.asc
Description: OpenPGP digital signature

Re: Another preempt folding issue?

2014-02-13 Thread Peter Zijlstra

On Thu, Feb 13, 2014 at 07:03:56PM +0100, Stefan Bader wrote:
> Yeah... not sure the interleaved source helps or not ...

It did, thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue?

2014-02-13 Thread Peter Zijlstra

On Thu, Feb 13, 2014 at 06:00:19PM +0100, Stefan Bader wrote:
> On 12.02.2014 12:54, Peter Zijlstra wrote:
> > On Wed, Feb 12, 2014 at 12:09:29PM +0100, Stefan Bader wrote:
> >> Something else here I run a kernel with CONFIG_PREEMPT not set and NR_CPUS
> >> limited to 8 (for the 32bit kernel). So the default apic driver is used. 
> >> Since
> >> default_send_IPI_mask_logical is only used from there, I assume the trace 
> >> you
> >> got does the same. Maybe something there is wrong which would explain why 
> >> we
> >> only see it on 32bit hosts.
> > 
> > Can you try with a different APIC driver to test this?
> > 
> I don't think I can. And I think the statement about this only be used for 
> 32bit
> could be wrong. I got mislead to think so because those are only defined in
> probe_32 but the 64bit counterpart isn't containing much aside that.
> 
> Anyway, I played around with tracing a bit more. So with this change:
> 
> if (need_resched()) {
> srcu_read_unlock(>srcu, vcpu->srcu_idx);
> if (need_resched() != should_resched()) {
> +   trace_printk("need(%i) != should(%i)\n",
> +   need_resched(), should_resched());
> +   trace_printk("exit_reason=%u\n",
> +   vcpu->run->exit_reason);
> +trace_printk("preempt_count=%lx\n",
> +__this_cpu_read_4(__preempt_count));
> +tracing_stop();
> +printk(KERN_ERR "Stopped tracing, due to
> inconsistent state.\n");
> }
>  +  schedule();
>  -  cond_reschedule();
> vcpu->srcu_idx = srcu_read_lock(>srcu);
> }
> 
> I get the following (weird) output:
> 
> Xorg-1078  [001] d...71.270251: native_smp_send_reschedule
> <-resched_task
> Xorg-1078  [001] d...71.270251: default_send_IPI_mask_logical
> <-native_smp_send_reschedule
>   bamfdaemon-2318  [001] d...71.270465: resched_task 
> <-check_preempt_wakeup
>   bamfdaemon-2318  [001] d...71.270539: resched_task 
> <-check_preempt_wakeup
>   compiz-2365  [001] d...71.270689: resched_task 
> <-check_preempt_wakeup
>   compiz-2365  [001] d...71.270827: resched_task 
> <-check_preempt_wakeup
>   compiz-2365  [001] d...71.270940: resched_task 
> <-check_preempt_wakeup
>  qemu-system-x86-2679  [000] dn..71.270999: smp_reschedule_interrupt
> <-reschedule_interrupt
>  qemu-system-x86-2679  [000] dn..71.270999: scheduler_ipi
> <-smp_reschedule_interrupt
>  qemu-system-x86-2679  [000] .N..71.271001: kvm_arch_vcpu_ioctl_run: 
> need(1)
> != should(0)
>  qemu-system-x86-2679  [000] .N..71.271002: kvm_arch_vcpu_ioctl_run:
> exit_reason=2
>  qemu-system-x86-2679  [000] .N..71.271003: kvm_arch_vcpu_ioctl_run:
> preempt_count=0
> 
> So am I reading this right, that the interrupt did get delivered to cpu#0 
> while
> the thread info already had the resched flag set. So this really should have
> cleared the bit in preempt_count. But while the trace info shows 'N' for some
> reason should_reschedule returns false but at the same time reading the 
> preempt
> count manually shows it 0?

So the assembly merges the first and second should_resched(), so its
possible that load got before the interrupt().

The 3rd preempt_count load gets re-issued and so that would show the
'true' value again.

If you want to force a reload after the condition; put in a barrier().

In any case; this looks like a false-positive. Please try again until
you get one where the interrupt doesn't happen and we stay in 'n' state.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue?

2014-02-13 Thread Peter Zijlstra

On Thu, Feb 13, 2014 at 06:00:19PM +0100, Stefan Bader wrote:
> On 12.02.2014 12:54, Peter Zijlstra wrote:
> > On Wed, Feb 12, 2014 at 12:09:29PM +0100, Stefan Bader wrote:
> >> Something else here I run a kernel with CONFIG_PREEMPT not set and NR_CPUS
> >> limited to 8 (for the 32bit kernel). So the default apic driver is used. 
> >> Since
> >> default_send_IPI_mask_logical is only used from there, I assume the trace 
> >> you
> >> got does the same. Maybe something there is wrong which would explain why 
> >> we
> >> only see it on 32bit hosts.
> > 
> > Can you try with a different APIC driver to test this?
> > 
> I don't think I can. And I think the statement about this only be used for 
> 32bit
> could be wrong. I got mislead to think so because those are only defined in
> probe_32 but the 64bit counterpart isn't containing much aside that.
> 
> Anyway, I played around with tracing a bit more. So with this change:
> 
> if (need_resched()) {
> srcu_read_unlock(>srcu, vcpu->srcu_idx);
> if (need_resched() != should_resched()) {
> +   trace_printk("need(%i) != should(%i)\n",
> +   need_resched(), should_resched());
> +   trace_printk("exit_reason=%u\n",
> +   vcpu->run->exit_reason);
> +trace_printk("preempt_count=%lx\n",
> +__this_cpu_read_4(__preempt_count));
> +tracing_stop();
> +printk(KERN_ERR "Stopped tracing, due to
> inconsistent state.\n");
> }
>  +  schedule();
>  -  cond_reschedule();
> vcpu->srcu_idx = srcu_read_lock(>srcu);
> }
> 
> I get the following (weird) output:
> 
> Xorg-1078  [001] d...71.270251: native_smp_send_reschedule
> <-resched_task
> Xorg-1078  [001] d...71.270251: default_send_IPI_mask_logical
> <-native_smp_send_reschedule
>   bamfdaemon-2318  [001] d...71.270465: resched_task 
> <-check_preempt_wakeup
>   bamfdaemon-2318  [001] d...71.270539: resched_task 
> <-check_preempt_wakeup
>   compiz-2365  [001] d...71.270689: resched_task 
> <-check_preempt_wakeup
>   compiz-2365  [001] d...71.270827: resched_task 
> <-check_preempt_wakeup
>   compiz-2365  [001] d...71.270940: resched_task 
> <-check_preempt_wakeup
>  qemu-system-x86-2679  [000] dn..71.270999: smp_reschedule_interrupt
> <-reschedule_interrupt
>  qemu-system-x86-2679  [000] dn..71.270999: scheduler_ipi
> <-smp_reschedule_interrupt
>  qemu-system-x86-2679  [000] .N..71.271001: kvm_arch_vcpu_ioctl_run: 
> need(1)
> != should(0)
>  qemu-system-x86-2679  [000] .N..71.271002: kvm_arch_vcpu_ioctl_run:
> exit_reason=2
>  qemu-system-x86-2679  [000] .N..71.271003: kvm_arch_vcpu_ioctl_run:
> preempt_count=0
> 
> So am I reading this right, that the interrupt did get delivered to cpu#0 
> while
> the thread info already had the resched flag set. So this really should have
> cleared the bit in preempt_count. But while the trace info shows 'N' for some
> reason should_reschedule returns false but at the same time reading the 
> preempt
> count manually shows it 0?

*blink*... That's weird indeed... do you have the asm that goes along
with that?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue?

2014-02-13 Thread Stefan Bader

On 12.02.2014 12:54, Peter Zijlstra wrote:
> On Wed, Feb 12, 2014 at 12:09:29PM +0100, Stefan Bader wrote:
>> Something else here I run a kernel with CONFIG_PREEMPT not set and NR_CPUS
>> limited to 8 (for the 32bit kernel). So the default apic driver is used. 
>> Since
>> default_send_IPI_mask_logical is only used from there, I assume the trace you
>> got does the same. Maybe something there is wrong which would explain why we
>> only see it on 32bit hosts.
> 
> Can you try with a different APIC driver to test this?
> 
I don't think I can. And I think the statement about this only be used for 32bit
could be wrong. I got mislead to think so because those are only defined in
probe_32 but the 64bit counterpart isn't containing much aside that.

Anyway, I played around with tracing a bit more. So with this change:

if (need_resched()) {
srcu_read_unlock(>srcu, vcpu->srcu_idx);
if (need_resched() != should_resched()) {
+   trace_printk("need(%i) != should(%i)\n",
+   need_resched(), should_resched());
+   trace_printk("exit_reason=%u\n",
+   vcpu->run->exit_reason);
+trace_printk("preempt_count=%lx\n",
+__this_cpu_read_4(__preempt_count));
+tracing_stop();
+printk(KERN_ERR "Stopped tracing, due to
inconsistent state.\n");
}
 +  schedule();
 -  cond_reschedule();
vcpu->srcu_idx = srcu_read_lock(>srcu);
}

I get the following (weird) output:

Xorg-1078  [001] d...71.270251: native_smp_send_reschedule
<-resched_task
Xorg-1078  [001] d...71.270251: default_send_IPI_mask_logical
<-native_smp_send_reschedule
  bamfdaemon-2318  [001] d...71.270465: resched_task 
<-check_preempt_wakeup
  bamfdaemon-2318  [001] d...71.270539: resched_task 
<-check_preempt_wakeup
  compiz-2365  [001] d...71.270689: resched_task 
<-check_preempt_wakeup
  compiz-2365  [001] d...71.270827: resched_task 
<-check_preempt_wakeup
  compiz-2365  [001] d...71.270940: resched_task 
<-check_preempt_wakeup
 qemu-system-x86-2679  [000] dn..71.270999: smp_reschedule_interrupt
<-reschedule_interrupt
 qemu-system-x86-2679  [000] dn..71.270999: scheduler_ipi
<-smp_reschedule_interrupt
 qemu-system-x86-2679  [000] .N..71.271001: kvm_arch_vcpu_ioctl_run: need(1)
!= should(0)
 qemu-system-x86-2679  [000] .N..71.271002: kvm_arch_vcpu_ioctl_run:
exit_reason=2
 qemu-system-x86-2679  [000] .N..71.271003: kvm_arch_vcpu_ioctl_run:
preempt_count=0

So am I reading this right, that the interrupt did get delivered to cpu#0 while
the thread info already had the resched flag set. So this really should have
cleared the bit in preempt_count. But while the trace info shows 'N' for some
reason should_reschedule returns false but at the same time reading the preempt
count manually shows it 0?




signature.asc
Description: OpenPGP digital signature

Re: Another preempt folding issue?

2014-02-13 Thread Stefan Bader

On 12.02.2014 12:54, Peter Zijlstra wrote:
 On Wed, Feb 12, 2014 at 12:09:29PM +0100, Stefan Bader wrote:
 Something else here I run a kernel with CONFIG_PREEMPT not set and NR_CPUS
 limited to 8 (for the 32bit kernel). So the default apic driver is used. 
 Since
 default_send_IPI_mask_logical is only used from there, I assume the trace you
 got does the same. Maybe something there is wrong which would explain why we
 only see it on 32bit hosts.
 
 Can you try with a different APIC driver to test this?
 
I don't think I can. And I think the statement about this only be used for 32bit
could be wrong. I got mislead to think so because those are only defined in
probe_32 but the 64bit counterpart isn't containing much aside that.

Anyway, I played around with tracing a bit more. So with this change:

if (need_resched()) {
srcu_read_unlock(kvm-srcu, vcpu-srcu_idx);
if (need_resched() != should_resched()) {
+   trace_printk(need(%i) != should(%i)\n,
+   need_resched(), should_resched());
+   trace_printk(exit_reason=%u\n,
+   vcpu-run-exit_reason);
+trace_printk(preempt_count=%lx\n,
+__this_cpu_read_4(__preempt_count));
+tracing_stop();
+printk(KERN_ERR Stopped tracing, due to
inconsistent state.\n);
}
 +  schedule();
 -  cond_reschedule();
vcpu-srcu_idx = srcu_read_lock(kvm-srcu);
}

I get the following (weird) output:

Xorg-1078  [001] d...71.270251: native_smp_send_reschedule
-resched_task
Xorg-1078  [001] d...71.270251: default_send_IPI_mask_logical
-native_smp_send_reschedule
  bamfdaemon-2318  [001] d...71.270465: resched_task 
-check_preempt_wakeup
  bamfdaemon-2318  [001] d...71.270539: resched_task 
-check_preempt_wakeup
  compiz-2365  [001] d...71.270689: resched_task 
-check_preempt_wakeup
  compiz-2365  [001] d...71.270827: resched_task 
-check_preempt_wakeup
  compiz-2365  [001] d...71.270940: resched_task 
-check_preempt_wakeup
 qemu-system-x86-2679  [000] dn..71.270999: smp_reschedule_interrupt
-reschedule_interrupt
 qemu-system-x86-2679  [000] dn..71.270999: scheduler_ipi
-smp_reschedule_interrupt
 qemu-system-x86-2679  [000] .N..71.271001: kvm_arch_vcpu_ioctl_run: need(1)
!= should(0)
 qemu-system-x86-2679  [000] .N..71.271002: kvm_arch_vcpu_ioctl_run:
exit_reason=2
 qemu-system-x86-2679  [000] .N..71.271003: kvm_arch_vcpu_ioctl_run:
preempt_count=0

So am I reading this right, that the interrupt did get delivered to cpu#0 while
the thread info already had the resched flag set. So this really should have
cleared the bit in preempt_count. But while the trace info shows 'N' for some
reason should_reschedule returns false but at the same time reading the preempt
count manually shows it 0?




signature.asc
Description: OpenPGP digital signature

Re: Another preempt folding issue?

2014-02-13 Thread Peter Zijlstra

On Thu, Feb 13, 2014 at 06:00:19PM +0100, Stefan Bader wrote:
 On 12.02.2014 12:54, Peter Zijlstra wrote:
  On Wed, Feb 12, 2014 at 12:09:29PM +0100, Stefan Bader wrote:
  Something else here I run a kernel with CONFIG_PREEMPT not set and NR_CPUS
  limited to 8 (for the 32bit kernel). So the default apic driver is used. 
  Since
  default_send_IPI_mask_logical is only used from there, I assume the trace 
  you
  got does the same. Maybe something there is wrong which would explain why 
  we
  only see it on 32bit hosts.
  
  Can you try with a different APIC driver to test this?
  
 I don't think I can. And I think the statement about this only be used for 
 32bit
 could be wrong. I got mislead to think so because those are only defined in
 probe_32 but the 64bit counterpart isn't containing much aside that.
 
 Anyway, I played around with tracing a bit more. So with this change:
 
 if (need_resched()) {
 srcu_read_unlock(kvm-srcu, vcpu-srcu_idx);
 if (need_resched() != should_resched()) {
 +   trace_printk(need(%i) != should(%i)\n,
 +   need_resched(), should_resched());
 +   trace_printk(exit_reason=%u\n,
 +   vcpu-run-exit_reason);
 +trace_printk(preempt_count=%lx\n,
 +__this_cpu_read_4(__preempt_count));
 +tracing_stop();
 +printk(KERN_ERR Stopped tracing, due to
 inconsistent state.\n);
 }
  +  schedule();
  -  cond_reschedule();
 vcpu-srcu_idx = srcu_read_lock(kvm-srcu);
 }
 
 I get the following (weird) output:
 
 Xorg-1078  [001] d...71.270251: native_smp_send_reschedule
 -resched_task
 Xorg-1078  [001] d...71.270251: default_send_IPI_mask_logical
 -native_smp_send_reschedule
   bamfdaemon-2318  [001] d...71.270465: resched_task 
 -check_preempt_wakeup
   bamfdaemon-2318  [001] d...71.270539: resched_task 
 -check_preempt_wakeup
   compiz-2365  [001] d...71.270689: resched_task 
 -check_preempt_wakeup
   compiz-2365  [001] d...71.270827: resched_task 
 -check_preempt_wakeup
   compiz-2365  [001] d...71.270940: resched_task 
 -check_preempt_wakeup
  qemu-system-x86-2679  [000] dn..71.270999: smp_reschedule_interrupt
 -reschedule_interrupt
  qemu-system-x86-2679  [000] dn..71.270999: scheduler_ipi
 -smp_reschedule_interrupt
  qemu-system-x86-2679  [000] .N..71.271001: kvm_arch_vcpu_ioctl_run: 
 need(1)
 != should(0)
  qemu-system-x86-2679  [000] .N..71.271002: kvm_arch_vcpu_ioctl_run:
 exit_reason=2
  qemu-system-x86-2679  [000] .N..71.271003: kvm_arch_vcpu_ioctl_run:
 preempt_count=0
 
 So am I reading this right, that the interrupt did get delivered to cpu#0 
 while
 the thread info already had the resched flag set. So this really should have
 cleared the bit in preempt_count. But while the trace info shows 'N' for some
 reason should_reschedule returns false but at the same time reading the 
 preempt
 count manually shows it 0?

*blink*... That's weird indeed... do you have the asm that goes along
with that?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue?

2014-02-13 Thread Peter Zijlstra

On Thu, Feb 13, 2014 at 06:00:19PM +0100, Stefan Bader wrote:
 On 12.02.2014 12:54, Peter Zijlstra wrote:
  On Wed, Feb 12, 2014 at 12:09:29PM +0100, Stefan Bader wrote:
  Something else here I run a kernel with CONFIG_PREEMPT not set and NR_CPUS
  limited to 8 (for the 32bit kernel). So the default apic driver is used. 
  Since
  default_send_IPI_mask_logical is only used from there, I assume the trace 
  you
  got does the same. Maybe something there is wrong which would explain why 
  we
  only see it on 32bit hosts.
  
  Can you try with a different APIC driver to test this?
  
 I don't think I can. And I think the statement about this only be used for 
 32bit
 could be wrong. I got mislead to think so because those are only defined in
 probe_32 but the 64bit counterpart isn't containing much aside that.
 
 Anyway, I played around with tracing a bit more. So with this change:
 
 if (need_resched()) {
 srcu_read_unlock(kvm-srcu, vcpu-srcu_idx);
 if (need_resched() != should_resched()) {
 +   trace_printk(need(%i) != should(%i)\n,
 +   need_resched(), should_resched());
 +   trace_printk(exit_reason=%u\n,
 +   vcpu-run-exit_reason);
 +trace_printk(preempt_count=%lx\n,
 +__this_cpu_read_4(__preempt_count));
 +tracing_stop();
 +printk(KERN_ERR Stopped tracing, due to
 inconsistent state.\n);
 }
  +  schedule();
  -  cond_reschedule();
 vcpu-srcu_idx = srcu_read_lock(kvm-srcu);
 }
 
 I get the following (weird) output:
 
 Xorg-1078  [001] d...71.270251: native_smp_send_reschedule
 -resched_task
 Xorg-1078  [001] d...71.270251: default_send_IPI_mask_logical
 -native_smp_send_reschedule
   bamfdaemon-2318  [001] d...71.270465: resched_task 
 -check_preempt_wakeup
   bamfdaemon-2318  [001] d...71.270539: resched_task 
 -check_preempt_wakeup
   compiz-2365  [001] d...71.270689: resched_task 
 -check_preempt_wakeup
   compiz-2365  [001] d...71.270827: resched_task 
 -check_preempt_wakeup
   compiz-2365  [001] d...71.270940: resched_task 
 -check_preempt_wakeup
  qemu-system-x86-2679  [000] dn..71.270999: smp_reschedule_interrupt
 -reschedule_interrupt
  qemu-system-x86-2679  [000] dn..71.270999: scheduler_ipi
 -smp_reschedule_interrupt
  qemu-system-x86-2679  [000] .N..71.271001: kvm_arch_vcpu_ioctl_run: 
 need(1)
 != should(0)
  qemu-system-x86-2679  [000] .N..71.271002: kvm_arch_vcpu_ioctl_run:
 exit_reason=2
  qemu-system-x86-2679  [000] .N..71.271003: kvm_arch_vcpu_ioctl_run:
 preempt_count=0
 
 So am I reading this right, that the interrupt did get delivered to cpu#0 
 while
 the thread info already had the resched flag set. So this really should have
 cleared the bit in preempt_count. But while the trace info shows 'N' for some
 reason should_reschedule returns false but at the same time reading the 
 preempt
 count manually shows it 0?

So the assembly merges the first and second should_resched(), so its
possible that load got before the interrupt().

The 3rd preempt_count load gets re-issued and so that would show the
'true' value again.

If you want to force a reload after the condition; put in a barrier().

In any case; this looks like a false-positive. Please try again until
you get one where the interrupt doesn't happen and we stay in 'n' state.

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue?

2014-02-13 Thread Peter Zijlstra

On Thu, Feb 13, 2014 at 07:03:56PM +0100, Stefan Bader wrote:
 Yeah... not sure the interleaved source helps or not ...

It did, thanks!
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue?

2014-02-12 Thread Peter Zijlstra

On Wed, Feb 12, 2014 at 12:09:29PM +0100, Stefan Bader wrote:
> Something else here I run a kernel with CONFIG_PREEMPT not set and NR_CPUS
> limited to 8 (for the 32bit kernel). So the default apic driver is used. Since
> default_send_IPI_mask_logical is only used from there, I assume the trace you
> got does the same. Maybe something there is wrong which would explain why we
> only see it on 32bit hosts.

Can you try with a different APIC driver to test this?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue?

2014-02-12 Thread Joerg Roedel

On Wed, Feb 12, 2014 at 11:40:17AM +0100, Borislav Petkov wrote:
> Also what I'm wondering about and what's not clear from Stefan's reply
> is whether this is purely a 32-bit issue, i.e. a 32-bit host running a
> 64-bit qemu running a 32-bit iso or what is it?
> 
> Or do we have reports for both 32-bit and 64-bit hosts?

As I  understand it, the problem reproduces only on a 32bit host, which
also limits the guests to 32bit, at least when ran with KVM. This is
true even if a x86-64 qemu is used. This is also independent on whether
the host-cpu supports 64bit or not.

Joerg

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue?

2014-02-12 Thread Stefan Bader

On 12.02.2014 11:40, Borislav Petkov wrote:
> On Wed, Feb 12, 2014 at 11:37:13AM +0100, Peter Zijlstra wrote:
>>> Another reporter also saw this on an AMD and said it could not be 
>>> reproduced on
>>> the same hardware and the same software versions when using 64bit instead 
>>> of 32.
>>>
>>> In my case on a 32bit installation I will see this on every invocation of
>>
>> Curious.. so its both AMD and Intel.
>>
>> That makes funny hardware less likely... but how can the vmexit loose an
>> interrupt like this.
> 
> Also what I'm wondering about and what's not clear from Stefan's reply
> is whether this is purely a 32-bit issue, i.e. a 32-bit host running a
> 64-bit qemu running a 32-bit iso or what is it?
> 
> Or do we have reports for both 32-bit and 64-bit hosts?

Only for 32bit hosts. I have not tested a 64bit kernel plus 32bit userspace,
though. But 64bit kernel and 64bit user-space seemed ok. And one detail I should
add is that while the reproduction is possible in all attempts, the timing can
vary. In some of the cases this was even before running a Linux kernel in the 
guest.

Something else here I run a kernel with CONFIG_PREEMPT not set and NR_CPUS
limited to 8 (for the 32bit kernel). So the default apic driver is used. Since
default_send_IPI_mask_logical is only used from there, I assume the trace you
got does the same. Maybe something there is wrong which would explain why we
only see it on 32bit hosts.

> 
> Thanks.
> 

signature.asc
Description: OpenPGP digital signature

Re: Another preempt folding issue?

2014-02-12 Thread Borislav Petkov

On Wed, Feb 12, 2014 at 11:37:13AM +0100, Peter Zijlstra wrote:
> > Another reporter also saw this on an AMD and said it could not be 
> > reproduced on
> > the same hardware and the same software versions when using 64bit instead 
> > of 32.
> > 
> > In my case on a 32bit installation I will see this on every invocation of
> 
> Curious.. so its both AMD and Intel.
> 
> That makes funny hardware less likely... but how can the vmexit loose an
> interrupt like this.

Also what I'm wondering about and what's not clear from Stefan's reply
is whether this is purely a 32-bit issue, i.e. a 32-bit host running a
64-bit qemu running a 32-bit iso or what is it?

Or do we have reports for both 32-bit and 64-bit hosts?

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue?

2014-02-12 Thread Peter Zijlstra

On Wed, Feb 12, 2014 at 09:20:24AM +0100, Stefan Bader wrote:
> On 11.02.2014 20:45, Peter Zijlstra wrote:
> > On Tue, Feb 11, 2014 at 07:34:51PM +0100, Stefan Bader wrote:
> >> Hi Peter,
> >>
> >> I am currently looking at a weird issue that manifest itself when trying 
> >> to run
> >> kvm enabled qemu on a i386 host (v3.13 kernel, oh and potentially 
> >> important the
> >> cpu is 64bit capable, so qemu-system-x86_64 is called). 
> > 
> > AMD or Intel machine?
> 
> Personally I am reproducing this on an AMD:
> 
> cpu family: 15
> model : 72
> model name: AMD Turion(tm) 64 X2 Mobile Technology TL-64
> 
> but the reports I got are showing the same issue on an Intel i7
> 
> cpu family: 6
> model : 37
> model name: Intel(R) Core(TM) i7 CPU   M 620  @ 2.67GH
> 
> Another reporter also saw this on an AMD and said it could not be reproduced 
> on
> the same hardware and the same software versions when using 64bit instead of 
> 32.
> 
> In my case on a 32bit installation I will see this on every invocation of

Curious.. so its both AMD and Intel.

That makes funny hardware less likely... but how can the vmexit loose an
interrupt like this.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue?

2014-02-12 Thread Stefan Bader

On 11.02.2014 20:45, Peter Zijlstra wrote:
> On Tue, Feb 11, 2014 at 07:34:51PM +0100, Stefan Bader wrote:
>> Hi Peter,
>>
>> I am currently looking at a weird issue that manifest itself when trying to 
>> run
>> kvm enabled qemu on a i386 host (v3.13 kernel, oh and potentially important 
>> the
>> cpu is 64bit capable, so qemu-system-x86_64 is called). 
> 
> AMD or Intel machine?

Personally I am reproducing this on an AMD:

cpu family  : 15
model   : 72
model name  : AMD Turion(tm) 64 X2 Mobile Technology TL-64

but the reports I got are showing the same issue on an Intel i7

cpu family  : 6
model   : 37
model name  : Intel(R) Core(TM) i7 CPU   M 620  @ 2.67GH

Another reporter also saw this on an AMD and said it could not be reproduced on
the same hardware and the same software versions when using 64bit instead of 32.

In my case on a 32bit installation I will see this on every invocation of

qemu-system-x86_64 -cdrom <32bit live iso> -boot d -m 512

> 
>> Sooner or later this
>> causes softlockup messages on the host. I tracked this down to __vcpu_run in
>> arch/x86/kvm/x86.c which does a loop which in that case never seems to make
>> progress or exit.
>>
>> What I found is that vcpu_enter_guest will exit quickly without causing the 
>> loop
>> to exit when need_resched() is true. Looking at a crash dump I took, this was
>> the case (thread_info->flags had TIF_NEED_RESCHED set). So after immediately
>> returning __vcpu_run has the following code:
>>
>> if (need_resched()) {
>> srcu_read_unlock(>srcu, vcpu->srcu_idx);
>> kvm_resched(vcpu); // now cond_resched();
>> vcpu->srcu_idx = srcu_read_lock(>srcu);
>> }
>>
>> The kvm_resched basically would end up doing a cond_resched() which now 
>> checks
>> preempt_count() to be 0. If that is zero it will do the reschedule, 
>> otherwise it
>> just does nothing. Looking at the percpu variables in the dump, I saw that
>> the preempt_count was 0x800 (actually it was 0x8011 but that was me
>> triggering the kexec crashdump with sysrq-c).
>>
>> I saw that there have been some changes in the upstream kernel and have 
>> picked
>> the following:
>> 1) x86, acpi, idle: Restructure the mwait idle routines
>> 2) x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers
>> 3) sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding
>> 4) sched/preempt/x86: Fix voluntary preempt for x86
>>
>> Patch 1) and 2) as dependencies of 3) (to get the mwait function correct and 
>> to
>> the other file). Finally 4) is fixing up 3). [maybe worth suggesting to do 
>> for
>> 3.13.y stable].
>>
>> Still, with all those I got the softlockup. Since I knew from the dump info 
>> that
>> something is wrong with the folding, I made the pragmatic approach and added 
>> the
>> following:
>>
>> if (need_resched()) {
>> srcu_read_unlock(>srcu, vcpu->srcu_idx);
>> +   preempt_fold_need_resched();
>> kvm_resched(vcpu); // now cond_resched();
>> vcpu->srcu_idx = srcu_read_lock(>srcu);
>> }
>>
>> And this lets the kvm guest run without the softlockups! However I am less 
>> than
>> convinced that this is the right thing to do. Somehow something done when
>> converting the preempt_count into percpu has caused at least the i386 side to
>> get into this mess (as there has not been any whining about 64bit). Just 
>> fail to
>> see what.
> 
> I've been looking at the same thing too; I've got a trace from someone
> who can reproduce and its just not making sense.
> 
> Looks like the hardware is loosing an interrupt (or worse).
> 
> With the below patch on top of current -git (might be whitespace damaged
> due to copy-paste).
> 
> ---
>  arch/x86/kvm/x86.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 39c28f09dfd5..f8b2fc7ce491 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6115,7 +6115,11 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
> }
> if (need_resched()) {
> srcu_read_unlock(>srcu, vcpu->srcu_idx);
> -   cond_resched();
> +   if (need_resched() != should_resched()) {
> +   tracing_stop();
> +   printk(KERN_ERR "Stopped tracing, due to 
> inconsistent state.\n");
> +   }
> +   schedule();
> vcpu->srcu_idx = srcu_read_lock(>srcu);
> }
> }
> 
> ---
> 
> I got the following from an Athlon X2 (fam 0xf):
> 
> On Mon, Feb 10, 2014 at 05:23:15PM -0200, Marcelo Tosatti wrote:
>> What is the problem exactly?
> 
> From the trace:
> 
>  qemu-system-x86-2455  [001] d.s3   393.630586: smp_reschedule_interrupt 
> <-reschedule_interrupt
>  qemu-system-x86-2455  [001] d.s3   393.630586:

Re: Another preempt folding issue?

2014-02-12 Thread Stefan Bader

On 11.02.2014 20:45, Peter Zijlstra wrote:
 On Tue, Feb 11, 2014 at 07:34:51PM +0100, Stefan Bader wrote:
 Hi Peter,

 I am currently looking at a weird issue that manifest itself when trying to 
 run
 kvm enabled qemu on a i386 host (v3.13 kernel, oh and potentially important 
 the
 cpu is 64bit capable, so qemu-system-x86_64 is called). 
 
 AMD or Intel machine?

Personally I am reproducing this on an AMD:

cpu family  : 15
model   : 72
model name  : AMD Turion(tm) 64 X2 Mobile Technology TL-64

but the reports I got are showing the same issue on an Intel i7

cpu family  : 6
model   : 37
model name  : Intel(R) Core(TM) i7 CPU   M 620  @ 2.67GH

Another reporter also saw this on an AMD and said it could not be reproduced on
the same hardware and the same software versions when using 64bit instead of 32.

In my case on a 32bit installation I will see this on every invocation of

qemu-system-x86_64 -cdrom 32bit live iso -boot d -m 512

 
 Sooner or later this
 causes softlockup messages on the host. I tracked this down to __vcpu_run in
 arch/x86/kvm/x86.c which does a loop which in that case never seems to make
 progress or exit.

 What I found is that vcpu_enter_guest will exit quickly without causing the 
 loop
 to exit when need_resched() is true. Looking at a crash dump I took, this was
 the case (thread_info-flags had TIF_NEED_RESCHED set). So after immediately
 returning __vcpu_run has the following code:

 if (need_resched()) {
 srcu_read_unlock(kvm-srcu, vcpu-srcu_idx);
 kvm_resched(vcpu); // now cond_resched();
 vcpu-srcu_idx = srcu_read_lock(kvm-srcu);
 }

 The kvm_resched basically would end up doing a cond_resched() which now 
 checks
 preempt_count() to be 0. If that is zero it will do the reschedule, 
 otherwise it
 just does nothing. Looking at the percpu variables in the dump, I saw that
 the preempt_count was 0x800 (actually it was 0x8011 but that was me
 triggering the kexec crashdump with sysrq-c).

 I saw that there have been some changes in the upstream kernel and have 
 picked
 the following:
 1) x86, acpi, idle: Restructure the mwait idle routines
 2) x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers
 3) sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding
 4) sched/preempt/x86: Fix voluntary preempt for x86

 Patch 1) and 2) as dependencies of 3) (to get the mwait function correct and 
 to
 the other file). Finally 4) is fixing up 3). [maybe worth suggesting to do 
 for
 3.13.y stable].

 Still, with all those I got the softlockup. Since I knew from the dump info 
 that
 something is wrong with the folding, I made the pragmatic approach and added 
 the
 following:

 if (need_resched()) {
 srcu_read_unlock(kvm-srcu, vcpu-srcu_idx);
 +   preempt_fold_need_resched();
 kvm_resched(vcpu); // now cond_resched();
 vcpu-srcu_idx = srcu_read_lock(kvm-srcu);
 }

 And this lets the kvm guest run without the softlockups! However I am less 
 than
 convinced that this is the right thing to do. Somehow something done when
 converting the preempt_count into percpu has caused at least the i386 side to
 get into this mess (as there has not been any whining about 64bit). Just 
 fail to
 see what.
 
 I've been looking at the same thing too; I've got a trace from someone
 who can reproduce and its just not making sense.
 
 Looks like the hardware is loosing an interrupt (or worse).
 
 With the below patch on top of current -git (might be whitespace damaged
 due to copy-paste).
 
 ---
  arch/x86/kvm/x86.c | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)
 
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 39c28f09dfd5..f8b2fc7ce491 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -6115,7 +6115,11 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
 }
 if (need_resched()) {
 srcu_read_unlock(kvm-srcu, vcpu-srcu_idx);
 -   cond_resched();
 +   if (need_resched() != should_resched()) {
 +   tracing_stop();
 +   printk(KERN_ERR Stopped tracing, due to 
 inconsistent state.\n);
 +   }
 +   schedule();
 vcpu-srcu_idx = srcu_read_lock(kvm-srcu);
 }
 }
 
 ---
 
 I got the following from an Athlon X2 (fam 0xf):
 
 On Mon, Feb 10, 2014 at 05:23:15PM -0200, Marcelo Tosatti wrote:
 What is the problem exactly?
 
 From the trace:
 
  qemu-system-x86-2455  [001] d.s3   393.630586: smp_reschedule_interrupt 
 -reschedule_interrupt
  qemu-system-x86-2455  [001] d.s3   393.630586: scheduler_ipi 
 -smp_reschedule_interrupt
  qemu-system-x86-2455  [001] ..s3   393.630586: preempt_count_sub 
 -_raw_spin_unlock_irqrestore
  qemu-system-x86-2455  [001] ..s2

Re: Another preempt folding issue?

2014-02-12 Thread Peter Zijlstra

On Wed, Feb 12, 2014 at 09:20:24AM +0100, Stefan Bader wrote:
 On 11.02.2014 20:45, Peter Zijlstra wrote:
  On Tue, Feb 11, 2014 at 07:34:51PM +0100, Stefan Bader wrote:
  Hi Peter,
 
  I am currently looking at a weird issue that manifest itself when trying 
  to run
  kvm enabled qemu on a i386 host (v3.13 kernel, oh and potentially 
  important the
  cpu is 64bit capable, so qemu-system-x86_64 is called). 
  
  AMD or Intel machine?
 
 Personally I am reproducing this on an AMD:
 
 cpu family: 15
 model : 72
 model name: AMD Turion(tm) 64 X2 Mobile Technology TL-64
 
 but the reports I got are showing the same issue on an Intel i7
 
 cpu family: 6
 model : 37
 model name: Intel(R) Core(TM) i7 CPU   M 620  @ 2.67GH
 
 Another reporter also saw this on an AMD and said it could not be reproduced 
 on
 the same hardware and the same software versions when using 64bit instead of 
 32.
 
 In my case on a 32bit installation I will see this on every invocation of

Curious.. so its both AMD and Intel.

That makes funny hardware less likely... but how can the vmexit loose an
interrupt like this.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue?

2014-02-12 Thread Borislav Petkov

On Wed, Feb 12, 2014 at 11:37:13AM +0100, Peter Zijlstra wrote:
  Another reporter also saw this on an AMD and said it could not be 
  reproduced on
  the same hardware and the same software versions when using 64bit instead 
  of 32.
  
  In my case on a 32bit installation I will see this on every invocation of
 
 Curious.. so its both AMD and Intel.
 
 That makes funny hardware less likely... but how can the vmexit loose an
 interrupt like this.

Also what I'm wondering about and what's not clear from Stefan's reply
is whether this is purely a 32-bit issue, i.e. a 32-bit host running a
64-bit qemu running a 32-bit iso or what is it?

Or do we have reports for both 32-bit and 64-bit hosts?

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue?

2014-02-12 Thread Stefan Bader

On 12.02.2014 11:40, Borislav Petkov wrote:
 On Wed, Feb 12, 2014 at 11:37:13AM +0100, Peter Zijlstra wrote:
 Another reporter also saw this on an AMD and said it could not be 
 reproduced on
 the same hardware and the same software versions when using 64bit instead 
 of 32.

 In my case on a 32bit installation I will see this on every invocation of

 Curious.. so its both AMD and Intel.

 That makes funny hardware less likely... but how can the vmexit loose an
 interrupt like this.
 
 Also what I'm wondering about and what's not clear from Stefan's reply
 is whether this is purely a 32-bit issue, i.e. a 32-bit host running a
 64-bit qemu running a 32-bit iso or what is it?
 
 Or do we have reports for both 32-bit and 64-bit hosts?

Only for 32bit hosts. I have not tested a 64bit kernel plus 32bit userspace,
though. But 64bit kernel and 64bit user-space seemed ok. And one detail I should
add is that while the reproduction is possible in all attempts, the timing can
vary. In some of the cases this was even before running a Linux kernel in the 
guest.

Something else here I run a kernel with CONFIG_PREEMPT not set and NR_CPUS
limited to 8 (for the 32bit kernel). So the default apic driver is used. Since
default_send_IPI_mask_logical is only used from there, I assume the trace you
got does the same. Maybe something there is wrong which would explain why we
only see it on 32bit hosts.

 
 Thanks.
 




signature.asc
Description: OpenPGP digital signature

Re: Another preempt folding issue?

2014-02-12 Thread Joerg Roedel

On Wed, Feb 12, 2014 at 11:40:17AM +0100, Borislav Petkov wrote:
 Also what I'm wondering about and what's not clear from Stefan's reply
 is whether this is purely a 32-bit issue, i.e. a 32-bit host running a
 64-bit qemu running a 32-bit iso or what is it?
 
 Or do we have reports for both 32-bit and 64-bit hosts?

As I  understand it, the problem reproduces only on a 32bit host, which
also limits the guests to 32bit, at least when ran with KVM. This is
true even if a x86-64 qemu is used. This is also independent on whether
the host-cpu supports 64bit or not.


Joerg


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue?

2014-02-12 Thread Peter Zijlstra

On Wed, Feb 12, 2014 at 12:09:29PM +0100, Stefan Bader wrote:
 Something else here I run a kernel with CONFIG_PREEMPT not set and NR_CPUS
 limited to 8 (for the 32bit kernel). So the default apic driver is used. Since
 default_send_IPI_mask_logical is only used from there, I assume the trace you
 got does the same. Maybe something there is wrong which would explain why we
 only see it on 32bit hosts.

Can you try with a different APIC driver to test this?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Another preempt folding issue?

2014-02-11 Thread Peter Zijlstra

On Tue, Feb 11, 2014 at 07:34:51PM +0100, Stefan Bader wrote:
> Hi Peter,
> 
> I am currently looking at a weird issue that manifest itself when trying to 
> run
> kvm enabled qemu on a i386 host (v3.13 kernel, oh and potentially important 
> the
> cpu is 64bit capable, so qemu-system-x86_64 is called). 

AMD or Intel machine?

> Sooner or later this
> causes softlockup messages on the host. I tracked this down to __vcpu_run in
> arch/x86/kvm/x86.c which does a loop which in that case never seems to make
> progress or exit.
> 
> What I found is that vcpu_enter_guest will exit quickly without causing the 
> loop
> to exit when need_resched() is true. Looking at a crash dump I took, this was
> the case (thread_info->flags had TIF_NEED_RESCHED set). So after immediately
> returning __vcpu_run has the following code:
> 
> if (need_resched()) {
> srcu_read_unlock(>srcu, vcpu->srcu_idx);
> kvm_resched(vcpu); // now cond_resched();
> vcpu->srcu_idx = srcu_read_lock(>srcu);
> }
> 
> The kvm_resched basically would end up doing a cond_resched() which now checks
> preempt_count() to be 0. If that is zero it will do the reschedule, otherwise 
> it
> just does nothing. Looking at the percpu variables in the dump, I saw that
> the preempt_count was 0x800 (actually it was 0x8011 but that was me
> triggering the kexec crashdump with sysrq-c).
> 
> I saw that there have been some changes in the upstream kernel and have picked
> the following:
> 1) x86, acpi, idle: Restructure the mwait idle routines
> 2) x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers
> 3) sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding
> 4) sched/preempt/x86: Fix voluntary preempt for x86
> 
> Patch 1) and 2) as dependencies of 3) (to get the mwait function correct and 
> to
> the other file). Finally 4) is fixing up 3). [maybe worth suggesting to do for
> 3.13.y stable].
> 
> Still, with all those I got the softlockup. Since I knew from the dump info 
> that
> something is wrong with the folding, I made the pragmatic approach and added 
> the
> following:
> 
> if (need_resched()) {
> srcu_read_unlock(>srcu, vcpu->srcu_idx);
> +   preempt_fold_need_resched();
> kvm_resched(vcpu); // now cond_resched();
> vcpu->srcu_idx = srcu_read_lock(>srcu);
> }
> 
> And this lets the kvm guest run without the softlockups! However I am less 
> than
> convinced that this is the right thing to do. Somehow something done when
> converting the preempt_count into percpu has caused at least the i386 side to
> get into this mess (as there has not been any whining about 64bit). Just fail 
> to
> see what.

I've been looking at the same thing too; I've got a trace from someone
who can reproduce and its just not making sense.

Looks like the hardware is loosing an interrupt (or worse).

With the below patch on top of current -git (might be whitespace damaged
due to copy-paste).

---
 arch/x86/kvm/x86.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 39c28f09dfd5..f8b2fc7ce491 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6115,7 +6115,11 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
}
if (need_resched()) {
srcu_read_unlock(>srcu, vcpu->srcu_idx);
-   cond_resched();
+   if (need_resched() != should_resched()) {
+   tracing_stop();
+   printk(KERN_ERR "Stopped tracing, due to 
inconsistent state.\n");
+   }
+   schedule();
vcpu->srcu_idx = srcu_read_lock(>srcu);
}
}

---

I got the following from an Athlon X2 (fam 0xf):

On Mon, Feb 10, 2014 at 05:23:15PM -0200, Marcelo Tosatti wrote:
> What is the problem exactly?

>From the trace:

 qemu-system-x86-2455  [001] d.s3   393.630586: smp_reschedule_interrupt 
<-reschedule_interrupt
 qemu-system-x86-2455  [001] d.s3   393.630586: scheduler_ipi 
<-smp_reschedule_interrupt
 qemu-system-x86-2455  [001] ..s3   393.630586: preempt_count_sub 
<-_raw_spin_unlock_irqrestore
 qemu-system-x86-2455  [001] ..s2   393.630586: _raw_spin_lock_irq 
<-run_timer_softirq
 qemu-system-x86-2455  [001] d.s2   393.630586: preempt_count_add 
<-_raw_spin_lock_irq
 qemu-system-x86-2455  [001] d.s3   393.630586: _raw_spin_unlock_irq 
<-run_timer_softirq
 qemu-system-x86-2455  [001] ..s3   393.630586: preempt_count_sub 
<-_raw_spin_unlock_irq
 qemu-system-x86-2455  [001] ..s2   393.630586: rcu_bh_qs <-__do_softirq
 qemu-system-x86-2455  [001] d.s2   393.630586: irqtime_account_irq 
<-__do_softirq
 qemu-system-x86-2455  [001] d.s2   393.630586: __local_bh_enable <-__do_softirq
 qemu-system-x86-2455  [001] d.s2   393.630586: preempt_count_sub 
<-__local_bh_enable

Re: Another preempt folding issue?

2014-02-11 Thread Peter Zijlstra

On Tue, Feb 11, 2014 at 07:34:51PM +0100, Stefan Bader wrote:
 Hi Peter,
 
 I am currently looking at a weird issue that manifest itself when trying to 
 run
 kvm enabled qemu on a i386 host (v3.13 kernel, oh and potentially important 
 the
 cpu is 64bit capable, so qemu-system-x86_64 is called). 

AMD or Intel machine?

 Sooner or later this
 causes softlockup messages on the host. I tracked this down to __vcpu_run in
 arch/x86/kvm/x86.c which does a loop which in that case never seems to make
 progress or exit.
 
 What I found is that vcpu_enter_guest will exit quickly without causing the 
 loop
 to exit when need_resched() is true. Looking at a crash dump I took, this was
 the case (thread_info-flags had TIF_NEED_RESCHED set). So after immediately
 returning __vcpu_run has the following code:
 
 if (need_resched()) {
 srcu_read_unlock(kvm-srcu, vcpu-srcu_idx);
 kvm_resched(vcpu); // now cond_resched();
 vcpu-srcu_idx = srcu_read_lock(kvm-srcu);
 }
 
 The kvm_resched basically would end up doing a cond_resched() which now checks
 preempt_count() to be 0. If that is zero it will do the reschedule, otherwise 
 it
 just does nothing. Looking at the percpu variables in the dump, I saw that
 the preempt_count was 0x800 (actually it was 0x8011 but that was me
 triggering the kexec crashdump with sysrq-c).
 
 I saw that there have been some changes in the upstream kernel and have picked
 the following:
 1) x86, acpi, idle: Restructure the mwait idle routines
 2) x86, idle: Use static_cpu_has() for CLFLUSH workaround, add barriers
 3) sched/preempt: Fix up missed PREEMPT_NEED_RESCHED folding
 4) sched/preempt/x86: Fix voluntary preempt for x86
 
 Patch 1) and 2) as dependencies of 3) (to get the mwait function correct and 
 to
 the other file). Finally 4) is fixing up 3). [maybe worth suggesting to do for
 3.13.y stable].
 
 Still, with all those I got the softlockup. Since I knew from the dump info 
 that
 something is wrong with the folding, I made the pragmatic approach and added 
 the
 following:
 
 if (need_resched()) {
 srcu_read_unlock(kvm-srcu, vcpu-srcu_idx);
 +   preempt_fold_need_resched();
 kvm_resched(vcpu); // now cond_resched();
 vcpu-srcu_idx = srcu_read_lock(kvm-srcu);
 }
 
 And this lets the kvm guest run without the softlockups! However I am less 
 than
 convinced that this is the right thing to do. Somehow something done when
 converting the preempt_count into percpu has caused at least the i386 side to
 get into this mess (as there has not been any whining about 64bit). Just fail 
 to
 see what.

I've been looking at the same thing too; I've got a trace from someone
who can reproduce and its just not making sense.

Looks like the hardware is loosing an interrupt (or worse).

With the below patch on top of current -git (might be whitespace damaged
due to copy-paste).

---
 arch/x86/kvm/x86.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 39c28f09dfd5..f8b2fc7ce491 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6115,7 +6115,11 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
}
if (need_resched()) {
srcu_read_unlock(kvm-srcu, vcpu-srcu_idx);
-   cond_resched();
+   if (need_resched() != should_resched()) {
+   tracing_stop();
+   printk(KERN_ERR Stopped tracing, due to 
inconsistent state.\n);
+   }
+   schedule();
vcpu-srcu_idx = srcu_read_lock(kvm-srcu);
}
}

---

I got the following from an Athlon X2 (fam 0xf):

On Mon, Feb 10, 2014 at 05:23:15PM -0200, Marcelo Tosatti wrote:
 What is the problem exactly?

From the trace:

 qemu-system-x86-2455  [001] d.s3   393.630586: smp_reschedule_interrupt 
-reschedule_interrupt
 qemu-system-x86-2455  [001] d.s3   393.630586: scheduler_ipi 
-smp_reschedule_interrupt
 qemu-system-x86-2455  [001] ..s3   393.630586: preempt_count_sub 
-_raw_spin_unlock_irqrestore
 qemu-system-x86-2455  [001] ..s2   393.630586: _raw_spin_lock_irq 
-run_timer_softirq
 qemu-system-x86-2455  [001] d.s2   393.630586: preempt_count_add 
-_raw_spin_lock_irq
 qemu-system-x86-2455  [001] d.s3   393.630586: _raw_spin_unlock_irq 
-run_timer_softirq
 qemu-system-x86-2455  [001] ..s3   393.630586: preempt_count_sub 
-_raw_spin_unlock_irq
 qemu-system-x86-2455  [001] ..s2   393.630586: rcu_bh_qs -__do_softirq
 qemu-system-x86-2455  [001] d.s2   393.630586: irqtime_account_irq 
-__do_softirq
 qemu-system-x86-2455  [001] d.s2   393.630586: __local_bh_enable -__do_softirq
 qemu-system-x86-2455  [001] d.s2   393.630586: preempt_count_sub 
-__local_bh_enable


   dmesg-2458  [000] d.s5   393.630614: resched_task

60 matches

Mail list logo