On 01/03/18 14:41, Vitaly Kuznetsov wrote:
> QEMU/KVM guests running nested on top of Hyper-V fail to boot with
> virtio-blk-pci disks; the debug log ends with
>
>   Booting from Hard Disk...
>   call32_smm 0x000edd01 e97a0
>   handle_smi cmd=b5 smbase=0x000a0000
>   vp notify fe007000 (2) -- 0x0
>   vp read fe005000 (1) -> 0x0
>   handle_smi cmd=b5 smbase=0x000a0000
>   call32_smm done 0x000edd01 0
>   Booting from 0000:7c00
>   call32_smm 0x000edd01 e97a4
>   handle_smi cmd=b5 smbase=0x000a0000
>   vp notify fe007000 (2) -- 0x0
>   In resume (status=0)
>   In 32bit resume
>   Attempting a hard reboot
>   ...
>
> I bisected the breakage to the following commit:
>
>   commit f46739b1a819750c63fb5849844d99cc2ab001e8
>   Author: Kevin O'Connor <ke...@koconnor.net>
>   Date:   Tue Feb 2 22:34:27 2016 -0500
>
>       virtio: Convert to new PCI BAR helper functions
>
> But the commit itself appears to be correct. The problem is in how the
> writew() function compiles into vp_notify(). For example, if we drop
> the 'volatile' qualifier from the current writew() implementation,
> everything starts to work. If we disassemble the two versions (as of
> f46739b1a), the difference is:
>
> 00000000 <vp_notify>:
>
>    With 'volatile' (current)           Without 'volatile'
>
>     0: push %ebx                        0: push %ebx
>     1: mov  %eax,%ecx                   1: mov  %eax,%ecx
>     3: mov  0x1518(%edx),%eax           3: mov  0x1518(%edx),%eax
>     9: cmpb $0x0,0x2c(%ecx)             9: cmpb $0x0,0x2c(%ecx)
>     d: je   2f <vp_notify+0x2f>         d: je   2e <vp_notify+0x2e>
>     f: mov  0x151c(%edx),%edx           f: mov  0x151c(%edx),%edx
>    15: mov  0x28(%ecx),%ebx
>    18: imul %edx,%ebx                  15: imul 0x28(%ecx),%edx
>    1b: mov  0x8(%ecx),%edx             19: mov  0x8(%ecx),%ebx
>    1e: add  %ebx,%edx
>    20: cmpb $0x0,0xe(%ecx)             1c: cmpb $0x0,0xe(%ecx)
>    24: je   2a <vp_notify+0x2a>        20: je   28 <vp_notify+0x28>
>                                        22: add  %ebx,%edx
>    26: out  %ax,(%dx)                  24: out  %ax,(%dx)
>    28: jmp  48 <vp_notify+0x48>        26: jmp  47 <vp_notify+0x47>
>    2a: mov  %ax,(%edx)                 28: mov  %ax,(%ebx,%edx,1)
>    2d: jmp  48 <vp_notify+0x48>        2c: jmp  47 <vp_notify+0x47>
>    2f: lea  0x20(%ecx),%ebx            2e: lea  0x20(%ecx),%ebx
>    32: cltd                            31: cltd
>    33: push %edx                       32: push %edx
>    34: push %eax                       33: push %eax
>    35: mov  $0x2,%ecx                  34: mov  $0x2,%ecx
>    3a: mov  $0x10,%edx                 39: mov  $0x10,%edx
>    3f: mov  %ebx,%eax                  3e: mov  %ebx,%eax
>    41: call 42 <vp_notify+0x42>        40: call 41 <vp_notify+0x41>
>    46: pop  %eax                       45: pop  %eax
>    47: pop  %edx                       46: pop  %edx
>    48: pop  %ebx                       47: pop  %ebx
>    49: ret                             48: ret
>
> My eyes fail to see an obvious compiler flaw here, but probably the
> mov difference (at '2a' old, '28' new) is to blame. Making some other
> subtle changes (e.g. adding local variables to the function) helps in
> some cases too. At this point I got a bit lost in my debugging, so I
> looked at how Linux does this, and it seems it does not use
> '*(volatile u16 *)addr = val' there. Rewriting write/read{b,w,l} with
> volatile asm helps.
>
> Signed-off-by: Vitaly Kuznetsov <vkuzn...@redhat.com>
> ---
> RFC: This is more of an ongoing debugging effort, as I'm not yet able
> to point a finger at the real culprit; I'd be grateful for any help
> and suggestions. In particular, I don't quite understand why nested
> virtualization makes a difference here.
> ---
>  src/x86.h | 21 +++++++++------------
>  1 file changed, 9 insertions(+), 12 deletions(-)
>
> diff --git a/src/x86.h b/src/x86.h
> index 53378e9..d45122c 100644
> --- a/src/x86.h
> +++ b/src/x86.h
> @@ -199,30 +199,27 @@ static inline void smp_wmb(void) {
>  }
>
>  static inline void writel(void *addr, u32 val) {
> -    barrier();
> -    *(volatile u32 *)addr = val;
> +    asm volatile("movl %0, %1" : : "d"(val), "m"(*(u32 *)addr) : "memory");
>  }
>  static inline void writew(void *addr, u16 val) {
> -    barrier();
> -    *(volatile u16 *)addr = val;
> +    asm volatile("movw %0, %1" : : "d"(val), "m"(*(u16 *)addr) : "memory");
>  }
>  static inline void writeb(void *addr, u8 val) {
> -    barrier();
> -    *(volatile u8 *)addr = val;
> +    asm volatile("movb %0, %1" : : "d"(val), "m"(*(u8 *)addr) : "memory");
>  }
>  static inline u32 readl(const void *addr) {
> -    u32 val = *(volatile const u32 *)addr;
> -    barrier();
> +    u32 val;
> +    asm volatile("movl %1, %0" : "=d"(val) : "m"(*(u32 *)addr) : "memory");
>      return val;
>  }
>  static inline u16 readw(const void *addr) {
> -    u16 val = *(volatile const u16 *)addr;
> -    barrier();
> +    u16 val;
> +    asm volatile("movw %1, %0" : "=d"(val) : "m"(*(u16 *)addr) : "memory");
>      return val;
>  }
>  static inline u8 readb(const void *addr) {
> -    u8 val = *(volatile const u8 *)addr;
> -    barrier();
> +    u8 val;
> +    asm volatile("movb %1, %0" : "=d"(val) : "m"(*(u8 *)addr) : "memory");
>      return val;
>  }
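As a side-by-side illustration, here is a self-contained sketch of the
same inline-asm accessor idea, compilable outside SeaBIOS. The "=m"
output operand and the "r" register constraint below are editorial
choices, not what the patch above uses -- the patch pins the value to
%edx with "d" and passes the store target as an "m" input instead:

  #include <stdint.h>

  typedef uint16_t u16;

  /* Store exactly one 16-bit value. The asm statement is opaque to the
   * optimizer, so the access cannot be widened, split, or folded into
   * a different addressing mode, and the "memory" clobber keeps it
   * ordered against surrounding accesses. */
  static inline void writew(void *addr, u16 val)
  {
      asm volatile("movw %1, %0"
                   : "=m" (*(u16 *)addr)
                   : "r" (val)
                   : "memory");
  }

  /* Load exactly one 16-bit value, with the same guarantees. */
  static inline u16 readw(const void *addr)
  {
      u16 val;
      asm volatile("movw %1, %0"
                   : "=r" (val)
                   : "m" (*(const u16 *)addr)
                   : "memory");
      return val;
  }

Writing the store target as an "=m" output is the conventional way to
tell the compiler that *addr is modified; with the "memory" clobber in
place, the patch's input-operand spelling should behave the same.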
barrier() is defined like this, in "src/types.h":

> #define barrier() __asm__ __volatile__("": : :"memory")

and "src/x86.h" has some interesting stuff too, just above the code that
you modify:

> /* Compiler barrier is enough as an x86 CPU does not reorder reads or writes
>  */
> static inline void smp_rmb(void) {
>     barrier();
> }
> static inline void smp_wmb(void) {
>     barrier();
> }

Is it possible that the current barrier() is not sufficient for the
intended purpose in an L2 guest?

What happens if you drop your current patch, but replace

  __asm__ __volatile__("": : :"memory")

in the barrier() macro definition with a real, heavy-weight barrier,
such as

  __asm__ __volatile__("mfence": : :"memory")

(See mb() in "arch/x86/include/asm/barrier.h" in the kernel.)

... I think running in L2 could play a role here; see
"Documentation/memory-barriers.txt", section "VIRTUAL MACHINE GUESTS",
from kernel commit 6a65d26385bf ("asm-generic: implement virt_xxx memory
barriers", 2016-01-12). See also the commit message.

Thanks,
Laszlo
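For concreteness, the experiment Laszlo suggests amounts to a one-line
change in "src/types.h" (a sketch of the suggestion, not a tested
patch):

  /* Current definition -- a pure compiler barrier:
   *     #define barrier() __asm__ __volatile__("": : :"memory")
   * Experiment: make it a real hardware fence. MFENCE orders all
   * earlier loads and stores before all later ones, mirroring mb()
   * in the kernel's "arch/x86/include/asm/barrier.h". */
  #define barrier() __asm__ __volatile__("mfence": : :"memory")

If the mfence variant makes the L2 guest boot again, that would point
at memory ordering across the nested hypervisors rather than at a
compiler flaw.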