Re: [PATCH 14/27] Add book3s_64 specific opcode emulation
On Tuesday 03 November 2009, Benjamin Herrenschmidt wrote: (Though glibc can be nasty, afaik it might load up optimized variants of some routines with hard wired cache line sizes based on the CPU type) You can also get application with hand-coded cache optimizations that are even harder, if not impossible, to fix. Arnd ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 14/27] Add book3s_64 specific opcode emulation
On Wed, 2009-11-04 at 09:43 +0100, Arnd Bergmann wrote: On Tuesday 03 November 2009, Benjamin Herrenschmidt wrote: (Though glibc can be nasty, afaik it might load up optimized variants of some routines with hard wired cache line sizes based on the CPU type) You can also get application with hand-coded cache optimizations that are even harder, if not impossible, to fix. Right. But those are already broken across CPU variants anyways. Cheers, Ben ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 14/27] Add book3s_64 specific opcode emulation
On 04.11.2009, at 09:47, Benjamin Herrenschmidt wrote: On Wed, 2009-11-04 at 09:43 +0100, Arnd Bergmann wrote: On Tuesday 03 November 2009, Benjamin Herrenschmidt wrote: (Though glibc can be nasty, afaik it might load up optimized variants of some routines with hard wired cache line sizes based on the CPU type) You can also get application with hand-coded cache optimizations that are even harder, if not impossible, to fix. Right. But those are already broken across CPU variants anyways. ... which might be the reason you're using KVM in the first place. Alex ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 14/27] Add book3s_64 specific opcode emulation
+ case OP_31_XOP_EIOIO: + break; Have you always executed an eieio or sync when you get here, or do you just not allow direct access to I/O devices? Other context synchronising insns are not enough, they do not broadcast on the bus. There is no device passthrough yet :-). It's theoretically possible, but nothing for it is implemented so far. You could just always do an eieio here, it's not expensive at all compared to the emulation trap itself. However -- eieio is a Book II insn, it will never trap anyway! + case OP_31_XOP_DCBZ: + { + ulong rb = vcpu-arch.gpr[get_rb(inst)]; + ulong ra = 0; + ulong addr; + u32 zeros[8] = { 0, 0, 0, 0, 0, 0, 0, 0 }; + + if (get_ra(inst)) + ra = vcpu-arch.gpr[get_ra(inst)]; + + addr = (ra + rb) ~31ULL; + if (!(vcpu-arch.msr MSR_SF)) + addr = 0x; + + if (kvmppc_st(vcpu, addr, 32, zeros)) { DCBZ zeroes out a cache line, not 32 bytes; except on 970, where there are HID bits to make it work on 32 bytes only, and an extra DCBZL insn that always clears a full cache line (128 bytes). Yes. We only come here when we patched the dcbz opcodes to invalid instructions Ah yes, I forgot. Could you rename it to OP_31_XOP_FAKE_32BIT_DCBZ or such? because cache line size of target == 32. On 970 with MSR_HV = 0 we actually use the dcbz 32-bytes mode. Admittedly though, this could be a lot more clever. + /* guest HID5 set can change is_dcbz32 */ + if (vcpu-arch.mmu.is_dcbz32(vcpu) + (mfmsr() MSR_HV)) + vcpu-arch.hflags |= BOOK3S_HFLAG_DCBZ32; + break; Wait, does this mean you allow other HID writes when MSR[HV] isn't set? All HIDs (and many other SPRs) cannot be read or written in supervisor mode. When we're running in MSR_HV=0 mode on a 970 we can use the 32 byte dcbz HID flag. So all we need to do is tell our entry/exit code to set this bit. Which patch contains that entry/exit code? If we're on 970 on a hypervisor or on a non-970 though we can't use the HID5 bit, so we need to binary patch the opcodes. So in order to emulate real 970 behavior, we need to be able to emulate that HID5 bit too! That's what this chunk of code does - it basically sets us in dcbz32 mode when allowed on 970 guests. But when MSR[HV]=0 and MSR[PR]=0, mtspr to a hypervisor resource will not trap but be silently ignored. Sorry for not being more clear. ...Oh. You run your guest as MSR[PR]=1 anyway! Tricky. Segher ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 14/27] Add book3s_64 specific opcode emulation
Nice patchset. Some comments on the emulation part: +#define OP_31_XOP_EIOIO854 You mean EIEIO. + case 19: + switch (get_xop(inst)) { + case OP_19_XOP_RFID: + case OP_19_XOP_RFI: + vcpu-arch.pc = vcpu-arch.srr0; + kvmppc_set_msr(vcpu, vcpu-arch.srr1); + *advance = 0; + break; I think you should only emulate the insns that exist on whatever the guest pretends to be. RFID exist only on 64-bit implementations. Same comment everywhere else. + case OP_31_XOP_EIOIO: + break; Have you always executed an eieio or sync when you get here, or do you just not allow direct access to I/O devices? Other context synchronising insns are not enough, they do not broadcast on the bus. + case OP_31_XOP_DCBZ: + { + ulong rb = vcpu-arch.gpr[get_rb(inst)]; + ulong ra = 0; + ulong addr; + u32 zeros[8] = { 0, 0, 0, 0, 0, 0, 0, 0 }; + + if (get_ra(inst)) + ra = vcpu-arch.gpr[get_ra(inst)]; + + addr = (ra + rb) ~31ULL; + if (!(vcpu-arch.msr MSR_SF)) + addr = 0x; + + if (kvmppc_st(vcpu, addr, 32, zeros)) { DCBZ zeroes out a cache line, not 32 bytes; except on 970, where there are HID bits to make it work on 32 bytes only, and an extra DCBZL insn that always clears a full cache line (128 bytes). + switch (sprn) { + case SPRN_IBAT0U ... SPRN_IBAT3L: + bat = vcpu_book3s-ibat[(sprn - SPRN_IBAT0U) / 2]; + break; + case SPRN_IBAT4U ... SPRN_IBAT7L: + bat = vcpu_book3s-ibat[(sprn - SPRN_IBAT4U) / 2]; + break; + case SPRN_DBAT0U ... SPRN_DBAT3L: + bat = vcpu_book3s-dbat[(sprn - SPRN_DBAT0U) / 2]; + break; + case SPRN_DBAT4U ... SPRN_DBAT7L: + bat = vcpu_book3s-dbat[(sprn - SPRN_DBAT4U) / 2]; + break; Do xBAT4..7 have the same SPR numbers on all CPUs? They are CPU- specific SPRs, after all. Some CPUs have only six, some only four, some none, btw. + case SPRN_HID0: + to_book3s(vcpu)-hid[0] = vcpu-arch.gpr[rs]; + break; + case SPRN_HID1: + to_book3s(vcpu)-hid[1] = vcpu-arch.gpr[rs]; + break; + case SPRN_HID2: + to_book3s(vcpu)-hid[2] = vcpu-arch.gpr[rs]; + break; + case SPRN_HID4: + to_book3s(vcpu)-hid[4] = vcpu-arch.gpr[rs]; + break; + case SPRN_HID5: + to_book3s(vcpu)-hid[5] = vcpu-arch.gpr[rs]; HIDs are different per CPU; and worse, different CPUs have different registers (SPR #s) for the same register name! + /* guest HID5 set can change is_dcbz32 */ + if (vcpu-arch.mmu.is_dcbz32(vcpu) + (mfmsr() MSR_HV)) + vcpu-arch.hflags |= BOOK3S_HFLAG_DCBZ32; + break; Wait, does this mean you allow other HID writes when MSR[HV] isn't set? All HIDs (and many other SPRs) cannot be read or written in supervisor mode. Segher ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 14/27] Add book3s_64 specific opcode emulation
On 03.11.2009, at 09:47, Segher Boessenkool wrote: Nice patchset. Some comments on the emulation part: Cool, thanks for looking though them! +#define OP_31_XOP_EIOIO854 You mean EIEIO. Probably, yeah. + case 19: + switch (get_xop(inst)) { + case OP_19_XOP_RFID: + case OP_19_XOP_RFI: + vcpu-arch.pc = vcpu-arch.srr0; + kvmppc_set_msr(vcpu, vcpu-arch.srr1); + *advance = 0; + break; I think you should only emulate the insns that exist on whatever the guest pretends to be. RFID exist only on 64-bit implementations. Same comment everywhere else. True. + case OP_31_XOP_EIOIO: + break; Have you always executed an eieio or sync when you get here, or do you just not allow direct access to I/O devices? Other context synchronising insns are not enough, they do not broadcast on the bus. There is no device passthrough yet :-). It's theoretically possible, but nothing for it is implemented so far. + case OP_31_XOP_DCBZ: + { + ulong rb = vcpu-arch.gpr[get_rb(inst)]; + ulong ra = 0; + ulong addr; + u32 zeros[8] = { 0, 0, 0, 0, 0, 0, 0, 0 }; + + if (get_ra(inst)) + ra = vcpu-arch.gpr[get_ra(inst)]; + + addr = (ra + rb) ~31ULL; + if (!(vcpu-arch.msr MSR_SF)) + addr = 0x; + + if (kvmppc_st(vcpu, addr, 32, zeros)) { DCBZ zeroes out a cache line, not 32 bytes; except on 970, where there are HID bits to make it work on 32 bytes only, and an extra DCBZL insn that always clears a full cache line (128 bytes). Yes. We only come here when we patched the dcbz opcodes to invalid instructions because cache line size of target == 32. On 970 with MSR_HV = 0 we actually use the dcbz 32-bytes mode. Admittedly though, this could be a lot more clever. + switch (sprn) { + case SPRN_IBAT0U ... SPRN_IBAT3L: + bat = vcpu_book3s-ibat[(sprn - SPRN_IBAT0U) / 2]; + break; + case SPRN_IBAT4U ... SPRN_IBAT7L: + bat = vcpu_book3s-ibat[(sprn - SPRN_IBAT4U) / 2]; + break; + case SPRN_DBAT0U ... SPRN_DBAT3L: + bat = vcpu_book3s-dbat[(sprn - SPRN_DBAT0U) / 2]; + break; + case SPRN_DBAT4U ... SPRN_DBAT7L: + bat = vcpu_book3s-dbat[(sprn - SPRN_DBAT4U) / 2]; + break; Do xBAT4..7 have the same SPR numbers on all CPUs? They are CPU- specific SPRs, after all. Some CPUs have only six, some only four, some none, btw. For now only Linux runs which only uses the first 3(?) IIRC. But yes, it's probably worth looking into at one point or the other. + case SPRN_HID0: + to_book3s(vcpu)-hid[0] = vcpu-arch.gpr[rs]; + break; + case SPRN_HID1: + to_book3s(vcpu)-hid[1] = vcpu-arch.gpr[rs]; + break; + case SPRN_HID2: + to_book3s(vcpu)-hid[2] = vcpu-arch.gpr[rs]; + break; + case SPRN_HID4: + to_book3s(vcpu)-hid[4] = vcpu-arch.gpr[rs]; + break; + case SPRN_HID5: + to_book3s(vcpu)-hid[5] = vcpu-arch.gpr[rs]; HIDs are different per CPU; and worse, different CPUs have different registers (SPR #s) for the same register name! Sigh :-( + /* guest HID5 set can change is_dcbz32 */ + if (vcpu-arch.mmu.is_dcbz32(vcpu) + (mfmsr() MSR_HV)) + vcpu-arch.hflags |= BOOK3S_HFLAG_DCBZ32; + break; Wait, does this mean you allow other HID writes when MSR[HV] isn't set? All HIDs (and many other SPRs) cannot be read or written in supervisor mode. When we're running in MSR_HV=0 mode on a 970 we can use the 32 byte dcbz HID flag. So all we need to do is tell our entry/exit code to set this bit. If we're on 970 on a hypervisor or on a non-970 though we can't use the HID5 bit, so we need to binary patch the opcodes. So in order to emulate real 970 behavior, we need to be able to emulate that HID5 bit too! That's what this chunk of code does - it basically sets us in dcbz32 mode when allowed on 970 guests. Alex ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 14/27] Add book3s_64 specific opcode emulation
On Tue, 2009-11-03 at 10:06 +0100, Alexander Graf wrote: DCBZ zeroes out a cache line, not 32 bytes; except on 970, where there are HID bits to make it work on 32 bytes only, and an extra DCBZL insn that always clears a full cache line (128 bytes). Yes. We only come here when we patched the dcbz opcodes to invalid instructions because cache line size of target == 32. On 970 with MSR_HV = 0 we actually use the dcbz 32-bytes mode. Admittedly though, this could be a lot more clever. Yeah well, we also really need to fix ppc32 Linux to use the device-tree provided cache line size :-) For 64-bits, that should already be the case, and thus the emulation trick shouldn't be useful as long as you properly provide the guest with the right size in the device-tree. (Though glibc can be nasty, afaik it might load up optimized variants of some routines with hard wired cache line sizes based on the CPU type) + switch (sprn) { + case SPRN_IBAT0U ... SPRN_IBAT3L: + bat = vcpu_book3s-ibat[(sprn - SPRN_IBAT0U) / 2]; + break; + case SPRN_IBAT4U ... SPRN_IBAT7L: + bat = vcpu_book3s-ibat[(sprn - SPRN_IBAT4U) / 2]; + break; + case SPRN_DBAT0U ... SPRN_DBAT3L: + bat = vcpu_book3s-dbat[(sprn - SPRN_DBAT0U) / 2]; + break; + case SPRN_DBAT4U ... SPRN_DBAT7L: + bat = vcpu_book3s-dbat[(sprn - SPRN_DBAT4U) / 2]; + break; Do xBAT4..7 have the same SPR numbers on all CPUs? They are CPU- specific SPRs, after all. Some CPUs have only six, some only four, some none, btw. For now only Linux runs which only uses the first 3(?) IIRC. But yes, it's probably worth looking into at one point or the other. + case SPRN_HID0: + to_book3s(vcpu)-hid[0] = vcpu-arch.gpr[rs]; + break; + case SPRN_HID1: + to_book3s(vcpu)-hid[1] = vcpu-arch.gpr[rs]; + break; + case SPRN_HID2: + to_book3s(vcpu)-hid[2] = vcpu-arch.gpr[rs]; + break; + case SPRN_HID4: + to_book3s(vcpu)-hid[4] = vcpu-arch.gpr[rs]; + break; + case SPRN_HID5: + to_book3s(vcpu)-hid[5] = vcpu-arch.gpr[rs]; HIDs are different per CPU; and worse, different CPUs have different registers (SPR #s) for the same register name! Sigh :-( On the other hand, you can probably just Swallow all of these and Linux won't even notice, except for the case of the sleep state maybe on 6xx/7xx/7xxx. Just a matter of knowing what your are emulating as guest. Cheers, Ben. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev