Re: [PATCH 14/27] Add book3s_64 specific opcode emulation

2009-11-04 Thread Arnd Bergmann
On Tuesday 03 November 2009, Benjamin Herrenschmidt wrote:
 (Though glibc can be nasty, afaik it might load up optimized variants of
 some routines with hard wired cache line sizes based on the CPU type)

You can also get application with hand-coded cache optimizations
that are even harder, if not impossible, to fix.

Arnd 
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 14/27] Add book3s_64 specific opcode emulation

2009-11-04 Thread Benjamin Herrenschmidt
On Wed, 2009-11-04 at 09:43 +0100, Arnd Bergmann wrote:
 On Tuesday 03 November 2009, Benjamin Herrenschmidt wrote:
  (Though glibc can be nasty, afaik it might load up optimized
 variants of
  some routines with hard wired cache line sizes based on the CPU
 type)
 
 You can also get application with hand-coded cache optimizations
 that are even harder, if not impossible, to fix. 

Right. But those are already broken across CPU variants anyways.

Cheers,
Ben

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 14/27] Add book3s_64 specific opcode emulation

2009-11-04 Thread Alexander Graf


On 04.11.2009, at 09:47, Benjamin Herrenschmidt wrote:


On Wed, 2009-11-04 at 09:43 +0100, Arnd Bergmann wrote:

On Tuesday 03 November 2009, Benjamin Herrenschmidt wrote:

(Though glibc can be nasty, afaik it might load up optimized

variants of

some routines with hard wired cache line sizes based on the CPU

type)

You can also get application with hand-coded cache optimizations
that are even harder, if not impossible, to fix.


Right. But those are already broken across CPU variants anyways.


... which might be the reason you're using KVM in the first place.

Alex

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 14/27] Add book3s_64 specific opcode emulation

2009-11-04 Thread Segher Boessenkool

+   case OP_31_XOP_EIOIO:
+   break;


Have you always executed an eieio or sync when you get here, or
do you just not allow direct access to I/O devices?  Other context
synchronising insns are not enough, they do not broadcast on the
bus.


There is no device passthrough yet :-). It's theoretically  
possible, but nothing for it is implemented so far.


You could just always do an eieio here, it's not expensive at all
compared to the emulation trap itself.

However -- eieio is a Book II insn, it will never trap anyway!


+   case OP_31_XOP_DCBZ:
+   {
+   ulong rb =  vcpu-arch.gpr[get_rb(inst)];
+   ulong ra = 0;
+   ulong addr;
+   u32 zeros[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
+
+   if (get_ra(inst))
+   ra = vcpu-arch.gpr[get_ra(inst)];
+
+   addr = (ra + rb)  ~31ULL;
+   if (!(vcpu-arch.msr  MSR_SF))
+   addr = 0x;
+
+   if (kvmppc_st(vcpu, addr, 32, zeros)) {


DCBZ zeroes out a cache line, not 32 bytes; except on 970, where  
there
are HID bits to make it work on 32 bytes only, and an extra DCBZL  
insn

that always clears a full cache line (128 bytes).


Yes. We only come here when we patched the dcbz opcodes to invalid  
instructions


Ah yes, I forgot.  Could you rename it to OP_31_XOP_FAKE_32BIT_DCBZ
or such?


because cache line size of target == 32.
On 970 with MSR_HV = 0 we actually use the dcbz 32-bytes mode.

Admittedly though, this could be a lot more clever.



+   /* guest HID5 set can change is_dcbz32 */
+   if (vcpu-arch.mmu.is_dcbz32(vcpu) 
+   (mfmsr()  MSR_HV))
+   vcpu-arch.hflags |= BOOK3S_HFLAG_DCBZ32;
+   break;


Wait, does this mean you allow other HID writes when MSR[HV] isn't
set?  All HIDs (and many other SPRs) cannot be read or written in
supervisor mode.


When we're running in MSR_HV=0 mode on a 970 we can use the 32 byte  
dcbz HID flag. So all we need to do is tell our entry/exit code to  
set this bit.


Which patch contains that entry/exit code?

If we're on 970 on a hypervisor or on a non-970 though we can't use  
the HID5 bit, so we need to binary patch the opcodes.


So in order to emulate real 970 behavior, we need to be able to  
emulate that HID5 bit too! That's what this chunk of code does - it  
basically sets us in dcbz32 mode when allowed on 970 guests.


But when MSR[HV]=0 and MSR[PR]=0, mtspr to a hypervisor resource
will not trap but be silently ignored.  Sorry for not being more clear.
...Oh.  You run your guest as MSR[PR]=1 anyway!  Tricky.


Segher

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 14/27] Add book3s_64 specific opcode emulation

2009-11-03 Thread Segher Boessenkool

Nice patchset.  Some comments on the emulation part:


+#define OP_31_XOP_EIOIO854


You mean EIEIO.


+   case 19:
+   switch (get_xop(inst)) {
+   case OP_19_XOP_RFID:
+   case OP_19_XOP_RFI:
+   vcpu-arch.pc = vcpu-arch.srr0;
+   kvmppc_set_msr(vcpu, vcpu-arch.srr1);
+   *advance = 0;
+   break;


I think you should only emulate the insns that exist on whatever the  
guest
pretends to be.  RFID exist only on 64-bit implementations.  Same  
comment

everywhere else.


+   case OP_31_XOP_EIOIO:
+   break;


Have you always executed an eieio or sync when you get here, or
do you just not allow direct access to I/O devices?  Other context
synchronising insns are not enough, they do not broadcast on the
bus.


+   case OP_31_XOP_DCBZ:
+   {
+   ulong rb =  vcpu-arch.gpr[get_rb(inst)];
+   ulong ra = 0;
+   ulong addr;
+   u32 zeros[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
+
+   if (get_ra(inst))
+   ra = vcpu-arch.gpr[get_ra(inst)];
+
+   addr = (ra + rb)  ~31ULL;
+   if (!(vcpu-arch.msr  MSR_SF))
+   addr = 0x;
+
+   if (kvmppc_st(vcpu, addr, 32, zeros)) {


DCBZ zeroes out a cache line, not 32 bytes; except on 970, where there
are HID bits to make it work on 32 bytes only, and an extra DCBZL insn
that always clears a full cache line (128 bytes).


+   switch (sprn) {
+   case SPRN_IBAT0U ... SPRN_IBAT3L:
+   bat = vcpu_book3s-ibat[(sprn - SPRN_IBAT0U) / 2];
+   break;
+   case SPRN_IBAT4U ... SPRN_IBAT7L:
+   bat = vcpu_book3s-ibat[(sprn - SPRN_IBAT4U) / 2];
+   break;
+   case SPRN_DBAT0U ... SPRN_DBAT3L:
+   bat = vcpu_book3s-dbat[(sprn - SPRN_DBAT0U) / 2];
+   break;
+   case SPRN_DBAT4U ... SPRN_DBAT7L:
+   bat = vcpu_book3s-dbat[(sprn - SPRN_DBAT4U) / 2];
+   break;


Do xBAT4..7 have the same SPR numbers on all CPUs?  They are CPU- 
specific
SPRs, after all.  Some CPUs have only six, some only four, some none,  
btw.



+   case SPRN_HID0:
+   to_book3s(vcpu)-hid[0] = vcpu-arch.gpr[rs];
+   break;
+   case SPRN_HID1:
+   to_book3s(vcpu)-hid[1] = vcpu-arch.gpr[rs];
+   break;
+   case SPRN_HID2:
+   to_book3s(vcpu)-hid[2] = vcpu-arch.gpr[rs];
+   break;
+   case SPRN_HID4:
+   to_book3s(vcpu)-hid[4] = vcpu-arch.gpr[rs];
+   break;
+   case SPRN_HID5:
+   to_book3s(vcpu)-hid[5] = vcpu-arch.gpr[rs];


HIDs are different per CPU; and worse, different CPUs have different
registers (SPR #s) for the same register name!


+   /* guest HID5 set can change is_dcbz32 */
+   if (vcpu-arch.mmu.is_dcbz32(vcpu) 
+   (mfmsr()  MSR_HV))
+   vcpu-arch.hflags |= BOOK3S_HFLAG_DCBZ32;
+   break;


Wait, does this mean you allow other HID writes when MSR[HV] isn't
set?  All HIDs (and many other SPRs) cannot be read or written in
supervisor mode.


Segher

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 14/27] Add book3s_64 specific opcode emulation

2009-11-03 Thread Alexander Graf


On 03.11.2009, at 09:47, Segher Boessenkool wrote:


Nice patchset.  Some comments on the emulation part:


Cool, thanks for looking though them!


+#define OP_31_XOP_EIOIO854


You mean EIEIO.


Probably, yeah.


+   case 19:
+   switch (get_xop(inst)) {
+   case OP_19_XOP_RFID:
+   case OP_19_XOP_RFI:
+   vcpu-arch.pc = vcpu-arch.srr0;
+   kvmppc_set_msr(vcpu, vcpu-arch.srr1);
+   *advance = 0;
+   break;


I think you should only emulate the insns that exist on whatever the  
guest
pretends to be.  RFID exist only on 64-bit implementations.  Same  
comment

everywhere else.


True.




+   case OP_31_XOP_EIOIO:
+   break;


Have you always executed an eieio or sync when you get here, or
do you just not allow direct access to I/O devices?  Other context
synchronising insns are not enough, they do not broadcast on the
bus.


There is no device passthrough yet :-). It's theoretically possible,  
but nothing for it is implemented so far.





+   case OP_31_XOP_DCBZ:
+   {
+   ulong rb =  vcpu-arch.gpr[get_rb(inst)];
+   ulong ra = 0;
+   ulong addr;
+   u32 zeros[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
+
+   if (get_ra(inst))
+   ra = vcpu-arch.gpr[get_ra(inst)];
+
+   addr = (ra + rb)  ~31ULL;
+   if (!(vcpu-arch.msr  MSR_SF))
+   addr = 0x;
+
+   if (kvmppc_st(vcpu, addr, 32, zeros)) {


DCBZ zeroes out a cache line, not 32 bytes; except on 970, where there
are HID bits to make it work on 32 bytes only, and an extra DCBZL insn
that always clears a full cache line (128 bytes).


Yes. We only come here when we patched the dcbz opcodes to invalid  
instructions because cache line size of target == 32.

On 970 with MSR_HV = 0 we actually use the dcbz 32-bytes mode.

Admittedly though, this could be a lot more clever.


+   switch (sprn) {
+   case SPRN_IBAT0U ... SPRN_IBAT3L:
+   bat = vcpu_book3s-ibat[(sprn - SPRN_IBAT0U) / 2];
+   break;
+   case SPRN_IBAT4U ... SPRN_IBAT7L:
+   bat = vcpu_book3s-ibat[(sprn - SPRN_IBAT4U) / 2];
+   break;
+   case SPRN_DBAT0U ... SPRN_DBAT3L:
+   bat = vcpu_book3s-dbat[(sprn - SPRN_DBAT0U) / 2];
+   break;
+   case SPRN_DBAT4U ... SPRN_DBAT7L:
+   bat = vcpu_book3s-dbat[(sprn - SPRN_DBAT4U) / 2];
+   break;


Do xBAT4..7 have the same SPR numbers on all CPUs?  They are CPU- 
specific
SPRs, after all.  Some CPUs have only six, some only four, some  
none, btw.


For now only Linux runs which only uses the first 3(?) IIRC. But yes,  
it's probably worth looking into at one point or the other.





+   case SPRN_HID0:
+   to_book3s(vcpu)-hid[0] = vcpu-arch.gpr[rs];
+   break;
+   case SPRN_HID1:
+   to_book3s(vcpu)-hid[1] = vcpu-arch.gpr[rs];
+   break;
+   case SPRN_HID2:
+   to_book3s(vcpu)-hid[2] = vcpu-arch.gpr[rs];
+   break;
+   case SPRN_HID4:
+   to_book3s(vcpu)-hid[4] = vcpu-arch.gpr[rs];
+   break;
+   case SPRN_HID5:
+   to_book3s(vcpu)-hid[5] = vcpu-arch.gpr[rs];


HIDs are different per CPU; and worse, different CPUs have different
registers (SPR #s) for the same register name!


Sigh :-(


+   /* guest HID5 set can change is_dcbz32 */
+   if (vcpu-arch.mmu.is_dcbz32(vcpu) 
+   (mfmsr()  MSR_HV))
+   vcpu-arch.hflags |= BOOK3S_HFLAG_DCBZ32;
+   break;


Wait, does this mean you allow other HID writes when MSR[HV] isn't
set?  All HIDs (and many other SPRs) cannot be read or written in
supervisor mode.


When we're running in MSR_HV=0 mode on a 970 we can use the 32 byte  
dcbz HID flag. So all we need to do is tell our entry/exit code to set  
this bit.


If we're on 970 on a hypervisor or on a non-970 though we can't use  
the HID5 bit, so we need to binary patch the opcodes.


So in order to emulate real 970 behavior, we need to be able to  
emulate that HID5 bit too! That's what this chunk of code does - it  
basically sets us in dcbz32 mode when allowed on 970 guests.


Alex

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 14/27] Add book3s_64 specific opcode emulation

2009-11-03 Thread Benjamin Herrenschmidt
On Tue, 2009-11-03 at 10:06 +0100, Alexander Graf wrote:

  DCBZ zeroes out a cache line, not 32 bytes; except on 970, where there
  are HID bits to make it work on 32 bytes only, and an extra DCBZL insn
  that always clears a full cache line (128 bytes).
 
 Yes. We only come here when we patched the dcbz opcodes to invalid  
 instructions because cache line size of target == 32.
 On 970 with MSR_HV = 0 we actually use the dcbz 32-bytes mode.
 
 Admittedly though, this could be a lot more clever.

Yeah well, we also really need to fix ppc32 Linux to use the device-tree
provided cache line size :-) For 64-bits, that should already be the
case, and thus the emulation trick shouldn't be useful as long as you
properly provide the guest with the right size in the device-tree.

(Though glibc can be nasty, afaik it might load up optimized variants of
some routines with hard wired cache line sizes based on the CPU type)

  +  switch (sprn) {
  +  case SPRN_IBAT0U ... SPRN_IBAT3L:
  +  bat = vcpu_book3s-ibat[(sprn - SPRN_IBAT0U) / 2];
  +  break;
  +  case SPRN_IBAT4U ... SPRN_IBAT7L:
  +  bat = vcpu_book3s-ibat[(sprn - SPRN_IBAT4U) / 2];
  +  break;
  +  case SPRN_DBAT0U ... SPRN_DBAT3L:
  +  bat = vcpu_book3s-dbat[(sprn - SPRN_DBAT0U) / 2];
  +  break;
  +  case SPRN_DBAT4U ... SPRN_DBAT7L:
  +  bat = vcpu_book3s-dbat[(sprn - SPRN_DBAT4U) / 2];
  +  break;
 
  Do xBAT4..7 have the same SPR numbers on all CPUs?  They are CPU- 
  specific
  SPRs, after all.  Some CPUs have only six, some only four, some  
  none, btw.
 
 For now only Linux runs which only uses the first 3(?) IIRC. But yes,  
 it's probably worth looking into at one point or the other.
 
 
  +  case SPRN_HID0:
  +  to_book3s(vcpu)-hid[0] = vcpu-arch.gpr[rs];
  +  break;
  +  case SPRN_HID1:
  +  to_book3s(vcpu)-hid[1] = vcpu-arch.gpr[rs];
  +  break;
  +  case SPRN_HID2:
  +  to_book3s(vcpu)-hid[2] = vcpu-arch.gpr[rs];
  +  break;
  +  case SPRN_HID4:
  +  to_book3s(vcpu)-hid[4] = vcpu-arch.gpr[rs];
  +  break;
  +  case SPRN_HID5:
  +  to_book3s(vcpu)-hid[5] = vcpu-arch.gpr[rs];
 
  HIDs are different per CPU; and worse, different CPUs have different
  registers (SPR #s) for the same register name!
 
 Sigh :-(

On the other hand, you can probably just Swallow all of these and
Linux won't even notice, except for the case of the sleep state maybe on
6xx/7xx/7xxx. Just a matter of knowing what your are emulating as guest.

Cheers,
Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev