David wrote:

> On Wed, Jan 30, 2008 at 08:35:13AM -0800, Jürgen Keil wrote:
> > Can anyone else reproduce opensolaris PV domU hangs
> > during domU boot, when the domU is using a root filesystem
> > on an nfs server and the domU is configured with more than
> > one vcpu?
> 
> I don't have this set up at the moment, but it definitely worked ~6
> months ago (even 32 way).

Hmm, this could be a generic S-x86 mp architecture bug (?).

Under xVM, this happens:

- mp_startup() is called to start up cpu#1

- in mp_startup(), "(*ap_mlsetup)()" is called,
  which calls xen's xen_psm_post_cpu_start()

- in xen_psm_post_cpu_start() we have this:

        /*
         * Re-distribute interrupts to include the newly added cpu.
         */
        xen_psm_enable_intr(cpun);


   In my setup, this re-binds netfront's interrupt handler
   xnf`xnf_intr() from cpu0 to the new cpu1.

   (This might have changed in snv_77, with the fix for
   6611846 "after boot, all dom0 interrupts are targeting
   CPU 0 in a MP system" - this could explain why it
   did work for you ~6 months ago).

- later on, mp_startup() raises the spl for the new cpu1
  to LOCK_LEVEL and enables interrupts. But at
  spl == LOCK_LEVEL, xnf_intr (IPL 6) is masked.

  add_cpunode2devtree(cp->cpu_id, cp->cpu_m.mcpu_cpi)
  is called.  This tries to load & attach the "cpudrv" kernel
  module (while we're still at spl == LOCK_LEVEL on cpu1).
  With an NFS root, loading the module sends packets out of
  the domU, but the replies from the NFS server are never
  seen by xnf`xnf_intr, which is masked.
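
The masking argument in the steps above can be sketched as a toy
model in C. This is not kernel code: struct toy_cpu, toy_deliverable()
and toy_try_deliver() are hypothetical names invented for illustration,
and the rule "an interrupt is delivered only if its IPL is above the
CPU's current spl" is a simplification. The two constants are taken
from the debugger output in this mail (BSPL 10 on CPU 1, IPL 6 for
xnf`xnf_intr).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy values, taken from the ::cpuinfo and ::interrupts output. */
enum {
	TOY_LOCK_LEVEL = 10,	/* BSPL reported for CPU 1 */
	TOY_XNF_IPL = 6		/* IPL reported for xnf`xnf_intr */
};

/* Hypothetical, minimal per-CPU state; not the kernel's cpu_t. */
struct toy_cpu {
	int	spl;		/* current priority level */
	bool	pending;	/* an event is waiting for its handler */
	int	handled;	/* how often the handler has run */
};

/* Simplified rule: only interrupts above the current spl get through. */
static bool
toy_deliverable(int ipl, int spl)
{
	return (ipl > spl);
}

/* Run the handler once if an event is pending and not masked. */
static void
toy_try_deliver(struct toy_cpu *cpu, int ipl)
{
	assert(cpu != NULL);
	if (cpu->pending && toy_deliverable(ipl, cpu->spl)) {
		cpu->pending = false;
		cpu->handled++;
	}
}
```

With spl set to TOY_LOCK_LEVEL, a call to toy_try_deliver() for
TOY_XNF_IPL leaves the event pending; once spl is lowered to 0, the
same call runs the handler.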

When the domU is hung, I see this:

[1]> ::cpuinfo -v
 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  0 fffffffffbc3fff0  1b    0    0  -1   no    no t-0    ffffff0001005c80
 (idle)
                       |
            RUNNING <--+
              READY
             EXISTS
             ENABLE

 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  1 ffffff0086199ac0  1b    0   10  60   no    no t-0    ffffff00010cbc80
                       |
            RUNNING <--+
              READY
             EXISTS
             ENABLE
[1]> ::interrupts
IRQ  Vect Evtchn IPL Bus    Trg Type   CPU Share APIC/INT# ISR(s)
256  -    I      15  -      Edg ipi    all -     -         xc_serv
257  -    I      13  -      Edg ipi    all -     -         xc_serv
258  -    I      11  -      Edg ipi    all -     -         poke_cpu
259  -    1      15  -      Edg virq   all -     -         xen_debug_handler
260  -    1      1   -      Edg evtchn 0   -     -         xenbus_intr
261  -    T      14  -      Edg virq   all -     -         cbe_fire
262  -    I      14  -      Edg ipi    all -     -         cbe_fire
263  -    9      6   xpvd   Edg evtchn 1   -     -         xnf`xnf_intr
264  -    2      9   xpvd   Edg evtchn 0   -     -         xencons`xenconsintr
[1]> ::evtchns
Type          Evtchn IRQ IPL CPU Masked Pending ISR(s)
evtchn        1      260 1   0   0      0       xenbus_intr
evtchn        2      264 9   0   0      1       xencons`xenconsintr
ipi           3      256 15  0   1      0       xc_serv
ipi           4      257 13  0   0      0       xc_serv
ipi           5      258 11  0   0      0       poke_cpu
virq:debug    6      259 15  0   0      0       xen_debug_handler
virq:timer    7      261 14  0   1      1       cbe_fire
ipi           8      262 14  0   0      0       cbe_fire
evtchn        9      263 6   1   1      1       xnf`xnf_intr
ipi           10     258 11  1   0      0       poke_cpu
ipi           11     257 13  1   0      0       xc_serv
ipi           12     262 14  1   0      0       cbe_fire
ipi           13     256 15  1   0      0       xc_serv
virq:timer    14     261 14  1   1      1       cbe_fire


A possible fix could be to move the add_cpunode2devtree()
call down a few lines in mp_startup(), after the spl0():

diff -r f6814e9b7def usr/src/uts/i86pc/os/mp_startup.c
--- a/usr/src/uts/i86pc/os/mp_startup.c Wed Jan 30 09:01:17 2008 -0800
+++ b/usr/src/uts/i86pc/os/mp_startup.c Thu Jan 31 01:00:58 2008 +0100
@@ -1518,13 +1518,15 @@ mp_startup(void)
         */
        curthread->t_preempt = 0;

-       add_cpunode2devtree(cp->cpu_id, cp->cpu_m.mcpu_cpi);
+       /* add_cpunode2devtree(cp->cpu_id, cp->cpu_m.mcpu_cpi); */

        /* The base spl should still be at LOCK LEVEL here */
        ASSERT(cp->cpu_base_spl == ipltospl(LOCK_LEVEL));
        set_base_spl();         /* Restore the spl to its proper value */

        (void) spl0();                          /* enable interrupts */
+
+       add_cpunode2devtree(cp->cpu_id, cp->cpu_m.mcpu_cpi);

 #ifndef __xpv
        {
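
Why the ordering in the patch matters can be shown with a small,
self-contained C sketch. It is only an illustration under stated
assumptions: toy_load_module() and toy_mp_startup() are hypothetical
stand-ins, the NFS reply is reduced to a single comparison of
interrupt level against spl, and the real hang (a wait that never
ends) is represented by a false return value.

```c
#include <assert.h>
#include <stdbool.h>

enum {
	TOY_LOCK_LEVEL = 10,	/* BSPL reported for CPU 1 */
	TOY_XNF_IPL = 6		/* IPL reported for xnf`xnf_intr */
};

/*
 * Hypothetical stand-in for loading a module from an NFS root.
 * The load can only finish if the network interrupt is not masked
 * at the current spl; false stands in for the hang.
 */
static bool
toy_load_module(int spl)
{
	return (TOY_XNF_IPL > spl);
}

/*
 * Hypothetical outline of the tail of mp_startup(): "patched"
 * selects whether the module load happens before or after the
 * spl is dropped to 0.
 */
static bool
toy_mp_startup(bool patched)
{
	int spl = TOY_LOCK_LEVEL;
	bool loaded = false;

	/* Mirrors the ASSERT in the diff: still at LOCK_LEVEL here. */
	assert(spl == TOY_LOCK_LEVEL);

	if (!patched)
		loaded = toy_load_module(spl);	/* original position */

	spl = 0;				/* stands in for spl0() */

	if (patched)
		loaded = toy_load_module(spl);	/* position after the patch */

	return (loaded);
}
```

In this model toy_mp_startup(false) reports that the load did not
complete, while toy_mp_startup(true) reports success, which is the
behaviour the patch is aiming for.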
 
 
This message posted from opensolaris.org
_______________________________________________
xen-discuss mailing list