On Wed, Nov 30, 2016 at 07:19:12AM +0000, Peter Maydell wrote:
> On 29 November 2016 at 19:38, Andrew Jones <drjo...@redhat.com> wrote:
> > Thanks for making me look, I was simply assuming we were in the while
> > loops above.
> >
> > I couldn't get the problem to reproduce with access to the monitor,
> > but by adding '-d exec' I was able to see cpu0 was on the wfe in
> > smp_boot_secondary. It should only stay there until cpu1 executes the
> > sev in secondary_cinit, but it looks like TCG doesn't yet implement sev
> >
> >  $ grep SEV target-arm/translate.c
> >         /* TODO: Implement SEV, SEVL and WFE.  May help SMP performance.
> Yes, we currently NOP SEV. We only implement WFE as "yield back
> to TCG top level loop", though, so this is fine. The idea is
> that WFE gets used in busy loops so it's a helpful hint to
> try running some other TCG vCPU instead of just spinning in
> the guest on this one. Implementing SEV as a NOP and WFE as
> a more-or-less NOP is architecturally permitted (guest code
> is required to cope with WFE returning "early"). If something
> is not working correctly then it's either buggy guest code
> or a problem with the generic TCG scheduling of CPUs.

The problem is indeed with the scheduling. The way it currently works
is to depend on the iothread to kick a reschedule once in a while, or
on a cpu to issue an instruction that does so (wfe/wfi). However, if
there's no I/O and the running cpu never issues such an instruction,
then the reschedule never happens. We need either a sched tick or an
iothread ppoll timeout that is never infinite (basically using the
ppoll timeout as a tick).
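
To make the starvation concrete, here is a toy model in C. It is not QEMU
code: primary_finishes(), NCPUS and the slice counts are all made up for
illustration. cpu1 spins and never yields, so the only thing that can hand
control to cpu0 is a periodic kick.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS 2 /* hypothetical: one primary (cpu0), one secondary (cpu1) */

/*
 * Toy round-robin model, not QEMU's scheduler. cpu1 is running and never
 * yields; cpu0 needs three slices of work to finish. A switch happens
 * only when a kick arrives, every `tick_every` slices (0 = never).
 * Returns true if cpu0 finishes within max_slices.
 */
static bool primary_finishes(int tick_every, int max_slices)
{
    int current = 1;        /* the spinning secondary holds the cpu */
    int cpu0_work_left = 3;

    assert(max_slices >= 0);

    for (int slice = 1; slice <= max_slices; slice++) {
        if (current == 0 && --cpu0_work_left == 0) {
            return true;    /* primary reached its exit */
        }
        /* Nobody yields voluntarily, so only a kick can switch cpus. */
        if (tick_every > 0 && slice % tick_every == 0) {
            current = (current + 1) % NCPUS;
        }
    }
    return false;           /* primary was starved */
}
```

With tick_every set to 0 the primary never gets to run, however large
max_slices is; any positive tick lets it finish.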

As for being buggy guest code, I don't think so. Here's another
unit test that illustrates the issue with wfe/sev taken out of the
picture.

 #include <asm/smp.h>
 void secondary(void) {
     printf("secondary running\n");

     /* A "real" guest cpu shouldn't do this, but even if it
      * does, that shouldn't stop other cpus from running.
      */
     while (1)
         ;
 }
 int main(void) {
     smp_boot_secondary(1, secondary);
     printf("primary running\n");
     return 0;
 }

With that test we get the two print statements, but it never exits.

Now that I understand the problem much better, I think I may be
coming full circle and advocating again that the iothread's ppoll
never be allowed to have an infinite timeout, but now only for tcg.
Something like

 if (timeout < 0 && tcg_enabled())
    timeout = TCG_SCHED_TICK;
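
A self-contained sketch of the same idea, assuming a tick length that the
snippet above doesn't specify. It uses plain poll() with a millisecond
timeout instead of ppoll() to stay portable, and clamp_timeout_ms(),
wait_for_io() and TCG_SCHED_TICK_MS are hypothetical names, not QEMU's.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>

/* Assumed tick length; the mail only names TCG_SCHED_TICK, not a value. */
#define TCG_SCHED_TICK_MS 10

/* Turn an infinite (negative) timeout into one tick when TCG is in use. */
static int clamp_timeout_ms(int timeout_ms, bool tcg)
{
    if (timeout_ms < 0 && tcg) {
        return TCG_SCHED_TICK_MS;
    }
    return timeout_ms;
}

/*
 * One wait of a toy main loop. Returns poll()'s result: 0 means the
 * timeout expired, which the caller can treat as a scheduling tick.
 */
static int wait_for_io(struct pollfd *fds, nfds_t nfds, int timeout_ms,
                       bool tcg)
{
    int t = clamp_timeout_ms(timeout_ms, tcg);

    assert(!tcg || t >= 0); /* with TCG the wait is never infinite */
    return poll(fds, nfds, t);
}
```

With tcg set, wait_for_io(NULL, 0, -1, true) comes back after one tick
instead of blocking forever, so the caller gets a regular chance to kick
a reschedule. Without tcg the infinite timeout is left alone.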

