Re: [Qemu-devel] [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-28 Thread Avi Kivity

On 01/28/2014 01:27 AM, Benjamin Herrenschmidt wrote:

On Wed, 2014-01-22 at 17:29 +, Peter Maydell wrote:

Basically, if it were on a real bus: take the byte value
that corresponds to phys_addr + 0 and place
it into data[0], take the byte value that corresponds to
phys_addr + 1 and place it into data[1], etc.

This just isn't how real buses work.

Actually it can be :-)


  There is no
address + 1, address + 2. There is a single address
for the memory transaction and a set of data on
data lines and some separate size information.
How the device at the far end of the bus chooses
to respond to 32 bit accesses to address X versus
8 bit accesses to addresses X through X+3 is entirely
its own business and unrelated to the CPU.

However the bus has a definition of which byte lane is the lowest in
address order. Byte order invariance is an important function of
all busses.

I think that trying to treat it as anything other than an
address-ordered series of bytes is going to turn into a complete
and inextricable mess.


I agree.

The two options are:

 (address, byte array, length)

and

 (address, value, word size, endianness)

the first is the KVM ABI, the second is how MemoryRegions work. Both are 
valid, but the first is more general (supports the 3-byte accesses 
sometimes generated on x86).
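For concreteness, a minimal sketch of the two shapes; the struct and field
names below are invented for illustration and are not the actual kvm_run or
MemoryRegion definitions:

/* (address, byte array, length): the KVM ABI style.  General enough to
 * carry a 3-byte x86 access; the array itself implies no byte order. */
struct mmio_exit_style {
    unsigned long long phys_addr;
    unsigned char      data[8];   /* data[i] is the byte at phys_addr + i */
    unsigned int       len;       /* need not be a power of two */
};

/* (address, value, word size, endianness): the MemoryRegion style.
 * The value is an integer, so a byte order has to be declared. */
struct mr_access_style {
    unsigned long long addr;
    unsigned long long value;     /* interpreted per big_endian */
    unsigned int       size;      /* 1, 2, 4 or 8 only */
    int                big_endian;
};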






  (It would
be perfectly possible to have a device which when
you read from address X as 32 bits returned 0x12345678,
when you read from address X as 16 bits returned
0x9abc, returned 0x42 for an 8 bit read from X+1,
and so on. Having byte reads from X..X+3 return
values corresponding to parts of the 32 bit access
is purely a convention.)

Right, it's possible. It's also stupid and not how most modern devices
and busses work. Besides, there is no reason why that can't be
implemented with Victor's proposal anyway.


Right.



Re: [Qemu-devel] [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-28 Thread Christoffer Dall
On Tue, Jan 28, 2014 at 03:47:32PM +1100, Benjamin Herrenschmidt wrote:
 On Mon, 2014-01-27 at 16:44 -0800, Christoffer Dall wrote:
 
  I'm losing track of this discussion, Ben, can you explain a bit?  You
  wrote:
  
Having a byte array coming in that represents what the CPU does in its
current byte order means you do *NOT* need to query the endianness of
the guest CPU from userspace.
  
  What does a byte array that represents what the CPU does in its current
  byte order mean in this context?  Do you mean the VCPU or the physical
  CPU when you say CPU?
 
 It doesn't matter once it's a byte array in address order. Again this is
 the *right* abstraction for the kernel ABI, because you do not care
 about the endianness of either side, guest or host.
 
 It makes no sense to treat a modern CPU data bus as having an MSB and an
 LSB (even if they have it sometimes on the block diagram). Only when
 *interpreting a value* on that bus, such as an *address*, does the
 endianness come into play.
 
 Treat the bus instead as an ordered sequence of bytes in ascending
 address order and most of the complexity goes away.
 
 From there, for a given device, it all depends which bytes *that device*
 chooses to consider as being the MSB vs. LSB. It's not even a bus thing,
 though of course some busses suggest an endianness, and some, like PCI,
 mandate it for configuration space.
 
 But it remains a device-side choice.
 
  I read your text as saying "just do a store of the register into the
  data pointer and don't worry about endianness", but somebody, somewhere,
  has to check the VCPU endianness setting.
  
  I'm probably wrong, and you are probably the right person to clear this
  up, but can you formulate exactly what you think the KVM ABI is and how
  you would put it in Documentation/virtual/kvm/api.txt?
  
  My point of view is that it is KVM that needs to do this, and it should
  emulate the CPU by performing a byteswap in the case where the CPU
  E-bit is set on ARM, but this is an ARM-centric way of looking at
  things.
 
 The ABI going to qemu should be (and inside qemu from TCG to the
 emulation) that the CPU did an access of N bytes wide at address A
 whose value is the byte array data[] in ascending address order.
 
OK, I've sent a v3 of the ABI clarification patch following the wording
from you and Scott.  I think we all agree on what the format should look
like at this point, and hopefully we can quickly agree on a text to
describe it.

Thanks,
-Christoffer
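A minimal sketch of those semantics (all names invented; this is not the
actual kvm/arm code): the byte-array form lets the device side consume the
access without consulting either side's endianness.

#include <stdint.h>

static uint8_t dev_mem[4096];           /* toy device backing store */

/* The CPU did an access of 'len' bytes at address 'a'; data[] holds
 * those bytes in ascending address order, exactly as they would land
 * in guest memory.  No guest or host endianness appears anywhere. */
static void mmio_write(uint64_t a, const uint8_t *data, unsigned len)
{
    for (unsigned i = 0; i < len; i++)
        dev_mem[(a + i) % sizeof(dev_mem)] = data[i];  /* data[i] = byte at a+i */
}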



Re: [Qemu-devel] [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-27 Thread Benjamin Herrenschmidt
On Wed, 2014-01-22 at 17:29 +, Peter Maydell wrote:
 
  Basically, if it were on a real bus: take the byte value
  that corresponds to phys_addr + 0 and place
  it into data[0], take the byte value that corresponds to
  phys_addr + 1 and place it into data[1], etc.
 
 This just isn't how real buses work.

Actually it can be :-)

  There is no
 address + 1, address + 2. There is a single address
 for the memory transaction and a set of data on
 data lines and some separate size information.
 How the device at the far end of the bus chooses
 to respond to 32 bit accesses to address X versus
 8 bit accesses to addresses X through X+3 is entirely
 its own business and unrelated to the CPU.

However the bus has a definition of which byte lane is the lowest in
address order. Byte order invariance is an important function of
all busses.

I think that trying to treat it as anything other than an
address-ordered series of bytes is going to turn into a complete
and inextricable mess.

  (It would
 be perfectly possible to have a device which when
 you read from address X as 32 bits returned 0x12345678,
 when you read from address X as 16 bits returned
 0x9abc, returned 0x42 for an 8 bit read from X+1,
 and so on. Having byte reads from X..X+3 return
 values corresponding to parts of the 32 bit access
 is purely a convention.)

Right, it's possible. It's also stupid and not how most modern devices
and busses work. Besides, there is no reason why that can't be
implemented with Victor's proposal anyway.

Ben.






Re: [Qemu-devel] [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-27 Thread Benjamin Herrenschmidt
On Wed, 2014-01-22 at 11:29 -0800, Victor Kamensky wrote:
 I don't see why you are so attached to describing the
 data part of a memory transaction as just one of the int
 types. If we are talking about a bunch of hypothetical
 cases, imagine a bus that allows transactions with a
 size of 6 bytes. How do you describe such data in
 your ints speak? What endianness can you assign to a
 sequence of 6 bytes? Note that describing such a
 transaction as a set of 6 byte values at address
 $whatever makes perfect sense.

Absolutely. For example, the real bus out of a POWER8 core is
something like 128 bytes wide, though I wouldn't be surprised if it were
serialized; I don't actually know the details, it's all inside the chip.
The interconnect between chips is a multi-lane elastic interface whose
width has nothing to do with the payload size. The same goes for PCIe.

The only thing that can more or less sanely represent all of these things
is a series of bytes ordered by address, with attributes such as the
access size (or byte enables, if that makes more sense or we want to
emulate really funky stuff) and possibly other decoration that some
architectures might want to have (such as caching/combining attributes
etc., which *might* be useful under some circumstances).

Cheers,
Ben.
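A hypothetical descriptor for that representation (every name below is
invented for illustration):

#include <stdint.h>

/* Bytes in ascending address order plus side-band attributes.  Odd
 * sizes (a 6-byte access) and wide bursts fit naturally because the
 * payload is never treated as an integer. */
struct bus_txn {
    uint64_t addr;              /* base address of the transaction */
    uint8_t  data[128];         /* data[i] belongs to addr + i */
    uint32_t len;               /* 1..128; need not be a power of two */
    uint8_t  byte_enables[16];  /* optional bitmap: bit i => data[i] valid */
    uint32_t attrs;             /* optional caching/combining decoration */
};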





Re: [Qemu-devel] [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-27 Thread Benjamin Herrenschmidt
On Wed, 2014-01-22 at 20:02 +, Peter Maydell wrote:
 
 Defining it as being always guest-order would mean that
 userspace had to continually look at the guest CPU
 endianness bit, which is annoying and awkward.
 
 Defining it as always host-endian order is the most
 reasonable option available. It also happens to work
 for the current QEMU code, which is nice.

No.

Having a byte array coming in that represents what the CPU does in its
current byte order means you do *NOT* need to query the endianness of
the guest CPU from userspace.

Ben.





Re: [Qemu-devel] [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-27 Thread Peter Maydell
On 27 January 2014 23:34, Benjamin Herrenschmidt
b...@kernel.crashing.org wrote:
 On Wed, 2014-01-22 at 20:02 +, Peter Maydell wrote:

 Defining it as being always guest-order would mean that
 userspace had to continually look at the guest CPU
 endianness bit, which is annoying and awkward.

 Defining it as always host-endian order is the most
 reasonable option available. It also happens to work
 for the current QEMU code, which is nice.

 No.

 Having a byte array coming in that represents what the CPU does in its
 current byte order means you do *NOT* need to query the endianness of
 the guest CPU from userspace.

Er, what? If we make the array be guest's current order
then by definition userspace has to look at the guest's
current endianness. I agree that would be bad. Either
of the two current proposals (host kernel order; guest
CPU's native/natural/default-byte-order) avoid it, though.

-- PMM



Re: [Qemu-devel] [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-27 Thread Benjamin Herrenschmidt
On Thu, 2014-01-23 at 15:33 +, Peter Maydell wrote:
  (4) PPC CPUs in BE mode and ARM CPUs in BE mode are not
  the same, because in the ARM case it is doing an
  internal-to-CPU byteswap, and in the PPC case it is not

Aren't they both byte-order invariant?

In that case they are the same.

Ben.





Re: [Qemu-devel] [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-27 Thread Benjamin Herrenschmidt
On Tue, 2014-01-28 at 11:07 +1100, Benjamin Herrenschmidt wrote:
 On Thu, 2014-01-23 at 15:33 +, Peter Maydell wrote:
   (4) PPC CPUs in BE mode and ARM CPUs in BE mode are not
   the same, because in the ARM case it is doing an
   internal-to-CPU byteswap, and in the PPC case it is not
 
 Aren't they both byte-order invariant?

I meant byte-address...

 In that case they are the same.
 
 Ben.
 





Re: [Qemu-devel] [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-27 Thread Benjamin Herrenschmidt

 The point is simple, and Peter has made it over and over:
 Any consumer of a memory operation sees value, len, address.
 
 This is what KVM_EXIT_MMIO emulates.  So just by knowing the ABI
 definition and having a pointer to the structure you need to be able to
 tell me value, len, address.

But that's useless, because it doesn't tell you the byte order
of the value, which is critical for emulation, unless you *defined* in
your ABI the byte order of the value, and thus included an artificial
swap when the guest is in a different endian mode than the host.


My understanding is that ARM is byte-address invariant, as is powerpc,
so it makes a LOT more sense to carry a sequence of address-ordered
bytes instead, which will correspond to what the guest code thinks
it's writing, and have the device respond appropriately based on
the endianness of the bus it sits on or of the device itself.


(2) the API between kernel and userspace needs to define
the semantics of mmio.data, ie how to map between
x byte wide transaction with value v and the array,
and that is primarily what this conversation is about
(3) the only choice which is both (a) sensible and (b)
not breaking existing usage is to say the array is
in host-kernel-byte-order
(4) PPC CPUs in BE mode and ARM CPUs in BE mode are not
the same, because in the ARM case it is doing an
internal-to-CPU byteswap, and in the PPC case it is not

I very much doubt that there is a difference here; I'm not sure about
that business with the internal byteswap.

The order in which the bytes of a word are presented on the bus changes
depending on the core endianness. If that's what you call a byteswap,
then both ARM and PPC do it when they are in their other endianness,
but that confusion comes from assuming that a data bus has an endianness
at all to begin with.

I was hoping that by 2014, such ideas were things of the past.

  That is one of the key disconnects. I'll go find real examples
  in ARM LE, ARM BE, and PPC BE Linux kernel. Just for
  everybody's sake, here is a summary of the disconnect:
  
  If we have the same h/w connected to the memory bus in ARM
  and PPC systems, and we have the following three pieces
  of code that work with r0 holding the same register
  address of the same device:
  
  1. ARM LE word write of 0x04030201:
  setend le
  mov r1, #0x04030201
  str r1, [r0]
  
  2. ARM BE word write of 0x01020304:
  setend be
  mov r1, #0x01020304
  str r1, [r0]
  
  3. PPC BE word write of 0x01020304:
  lis r1,0x102
  ori r1,r1,0x304
  stw r1,0(r0)
  
  I claim that h/w will see the same data on the bus lines in all
  three cases, and h/w would act the same in all three
  cases. Peter says that in the ARM BE and PPC BE cases h/w
  would act differently.
  
  If anyone else can offer an opinion on that while I am looking
  for real examples, that would be great.
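(A small, compilable sketch of that claim, as an editorial illustration
rather than anything from the thread: modelling the store as "register
value laid out in the CPU's current byte order", all three snippets leave
identical bytes in memory.)

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Lay out a 32-bit register value at p[] in the CPU's current order. */
static void store32(uint8_t *p, uint32_t v, int cpu_is_be)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (cpu_is_be ? 8 * (3 - i) : 8 * i));
}

int main(void)
{
    uint8_t a[4], b[4], c[4];
    store32(a, 0x04030201, 0);   /* 1. ARM LE word write */
    store32(b, 0x01020304, 1);   /* 2. ARM BE word write */
    store32(c, 0x01020304, 1);   /* 3. PPC BE word write */
    printf("%d %d\n", !memcmp(a, b, 4), !memcmp(b, c, 4));  /* prints: 1 1 */
    return 0;
}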
  
 
 I really don't think listing all these examples helps.  You need to focus
 on the key points that Peter listed in his previous mail.
 
 I tried in our chat to ask you this question:
 
 vcpu_data_host_to_guest() is handling a read from an emulated device.
 All the info you have is:
 (1) len of memory access
 (2) mmio.data pointer
 (3) destination register
 (4) host CPU endianness
 (5) guest CPU endianness
 
 Based on this information alone, you need to decide whether you do a
 byteswap or not before loading the hardware register upon returning to
 the guest.
 
 You will find it impossible to answer, because you don't know the layout
 of mmio.data, and that is the thing we are trying to solve.
 
 If you cannot reply to this point in less than 50 lines or mention
 anything about devices being LE or BE or come with examples, I am
 probably not going to read your reply, sorry.
 
 -Christoffer
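Under the layout the thread later settles on (mmio.data in host kernel
byte order), the decision reduces to one comparison. A minimal sketch,
with invented names and a GCC-style builtin; not the actual kvm/arm code:

#include <stdint.h>
#include <string.h>

/* Reconstruct the register value for a 4-byte emulated-device read.
 * Inputs are exactly items (1)-(5) above; the data layout is assumed
 * to be host kernel byte order. */
static uint32_t mmio_to_reg32(const uint8_t *data,
                              int host_is_be, int guest_is_be /* E-bit */)
{
    uint32_t v;
    memcpy(&v, data, sizeof(v));      /* a plain host-order load */
    if (host_is_be != guest_is_be)
        v = __builtin_bswap32(v);     /* emulate the endianness flip */
    return v;                         /* goes into the destination register */
}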





Re: [Qemu-devel] [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-27 Thread Benjamin Herrenschmidt
On Thu, 2014-01-23 at 20:11 -0800, Victor Kamensky wrote:
  I would take 50 byteswaps with a clear ABI any day over an obscure
  standard that can avoid a single hardware-on-register instruction.
 This
  is about designing a clean software interface, not about building an
  optimized integrated stack.
 
  Unfortunately, this is going nowhere, so I think we need to stop
 this
  thread.  As you can see I have sent a patch as a clarification to
 the
  ABI, if it's merged we can move on with more important tasks.
 
 OK, that is fine. I still believe it is not the best choice,
 but I agree that we need to move on. I will respin my
 V7 KVM BE patches according to these new semantics, I will
 integrate the comments that you (thanks!) and others gave me
 on the mailing list, and post my series again when it is ready.

Right, the whole host-endian thing is a horrible choice any way you
look at it, but I'm afraid it's unfixable since it's already ABI :-(

Ben.





Re: [Qemu-devel] [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-27 Thread Benjamin Herrenschmidt
On Mon, 2014-01-27 at 23:49 +, Peter Maydell wrote:
 
 Er, what? If we make the array be guest's current order
 then by definition userspace has to look at the guest's
 current endianness. I agree that would be bad. Either
 of the two current proposals (host kernel order; guest
 CPU's native/natural/default-byte-order) avoid it, though.

No, this has nothing to do with the guest endianness, and
everything to do with the (hopefully) byte-address-invariant bus we have
on the processor.

Anyway, the existing crap is ABI so I suspect we have to stick with it,
just maybe document it better.

Ben.





Re: [Qemu-devel] [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-27 Thread Christoffer Dall
On Tue, Jan 28, 2014 at 11:32:41AM +1100, Benjamin Herrenschmidt wrote:
 On Thu, 2014-01-23 at 20:11 -0800, Victor Kamensky wrote:
   I would take 50 byteswaps with a clear ABI any day over an obscure
   standard that can avoid a single hardware-on-register instruction.
  This
   is about designing a clean software interface, not about building an
   optimized integrated stack.
  
   Unfortunately, this is going nowhere, so I think we need to stop
  this
   thread.  As you can see I have sent a patch as a clarification to
  the
   ABI, if it's merged we can move on with more important tasks.
  
  OK, that is fine. I still believe it is not the best choice,
  but I agree that we need to move on. I will respin my
  V7 KVM BE patches according to these new semantics, I will
  integrate the comments that you (thanks!) and others gave me
  on the mailing list, and post my series again when it is ready.
 
 Right, the whole host-endian thing is a horrible choice any way you
 look at it, but I'm afraid it's unfixable since it's already ABI :-(
 
Why is it a horrible choice?

I don't think it's actually ABI at this point, it's undefined.

The only thing fixed is PPC BE host and ARM LE host, and in both cases
we currently perform a byteswap in KVM if the guest is a different
endianness.

Honestly I don't care which way it's defined, as long as it's defined
somehow, and I have not yet seen anyone formulate how the ABI
specification should be worded, so people clearly understand what's
going on.

If you take a look at the v2 patch "KVM: Specify byte order for
KVM_EXIT_MMIO", that's where it ended up.

If you can formulate something with your experience in endianness that
makes this clear, it would be extremely helpful.

-Christoffer



Re: [Qemu-devel] [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-27 Thread Christoffer Dall
On Tue, Jan 28, 2014 at 11:36:13AM +1100, Benjamin Herrenschmidt wrote:
 On Mon, 2014-01-27 at 23:49 +, Peter Maydell wrote:
  
  Er, what? If we make the array be guest's current order
  then by definition userspace has to look at the guest's
  current endianness. I agree that would be bad. Either
  of the two current proposals (host kernel order; guest
  CPU's native/natural/default-byte-order) avoid it, though.
 
 No, this has nothing to do with the guest endianness, and
 everything to do with the (hopefully) byte-address-invariant bus we have
 on the processor.
 
 Anyway, the existing crap is ABI so I suspect we have to stick with it,
 just maybe document it better.
 

I'm losing track of this discussion, Ben, can you explain a bit?  You
wrote:

  Having a byte array coming in that represents what the CPU does in its
  current byte order means you do *NOT* need to query the endianness of
  the guest CPU from userspace.

What does a byte array that represents what the CPU does in its current
byte order mean in this context?  Do you mean the VCPU or the physical
CPU when you say CPU?

I read your text as saying "just do a store of the register into the
data pointer and don't worry about endianness", but somebody, somewhere,
has to check the VCPU endianness setting.

I'm probably wrong, and you are probably the right person to clear this
up, but can you formulate exactly what you think the KVM ABI is and how
you would put it in Documentation/virtual/kvm/api.txt?

My point of view is that it is KVM that needs to do this, and it should
emulate the CPU by performing a byteswap in the case where the CPU
E-bit is set on ARM, but this is an ARM-centric way of looking at
things.

Thanks,
-Christoffer



Re: [Qemu-devel] [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-27 Thread Benjamin Herrenschmidt
On Mon, 2014-01-27 at 16:44 -0800, Christoffer Dall wrote:

 I'm losing track of this discussion, Ben, can you explain a bit?  You
 wrote:
 
   Having a byte array coming in that represents what the CPU does in its
   current byte order means you do *NOT* need to query the endianness of
   the guest CPU from userspace.
 
 What does a byte array that represents what the CPU does in its current
 byte order mean in this context?  Do you mean the VCPU or the physical
 CPU when you say CPU?

It doesn't matter once it's a byte array in address order. Again this is
the *right* abstraction for the kernel ABI, because you do not care
about the endianness of either side, guest or host.

It makes no sense to treat a modern CPU data bus as having an MSB and an
LSB (even if they have it sometimes on the block diagram). Only when
*interpreting a value* on that bus, such as an *address*, does the
endianness come into play.

Treat the bus instead as an ordered sequence of bytes in ascending
address order and most of the complexity goes away.

From there, for a given device, it all depends which bytes *that device*
chooses to consider as being the MSB vs. LSB. It's not even a bus thing,
though of course some busses suggest an endianness, and some, like PCI,
mandate it for configuration space.

But it remains a device-side choice.

 I read your text as saying "just do a store of the register into the
 data pointer and don't worry about endianness", but somebody, somewhere,
 has to check the VCPU endianness setting.
 
 I'm probably wrong, and you are probably the right person to clear this
 up, but can you formulate exactly what you think the KVM ABI is and how
 you would put it in Documentation/virtual/kvm/api.txt?
 
 My point of view is that it is KVM that needs to do this, and it should
 emulate the CPU by performing a byteswap in the case where the CPU
 E-bit is set on ARM, but this is an ARM-centric way of looking at
 things.

The ABI going to qemu should be (and inside qemu from TCG to the
emulation) that the CPU did an access of N bytes wide at address A
whose value is the byte array data[] in ascending address order.

Ben.





Re: [Qemu-devel] [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-23 Thread Alexander Graf

On 23.01.2014, at 05:25, Victor Kamensky victor.kamen...@linaro.org wrote:

 Hi Alex,
 
 Sorry for the delayed reply, I was focusing on the discussion
 with Peter. Hope you and other folks may get something
 out of it :).
 
 Please see responses inline
 
 On 22 January 2014 02:52, Alexander Graf ag...@suse.de wrote:
 
 On 22.01.2014, at 08:26, Victor Kamensky victor.kamen...@linaro.org wrote:
 
 On 21 January 2014 22:41, Alexander Graf ag...@suse.de wrote:
 
 
 Native endian really is just a shortcut for target endian
 which is LE for ARM and BE for PPC. There shouldn't be
 a qemu-system-armeb or qemu-system-ppc64le.
 
 I disagree. A fully functional ARM BE system is what we've
 been working on for the last few months. 'We' is the Linaro
 Networking Group, Endian subteam, and some other guys
 in ARM and across the community. Why we do that is a bit
 beyond this discussion.
 
 ARM BE patches for both V7 and V8 are already in the mainline
 kernel. But the ARM BE KVM host is broken now. It is a known
 deficiency that I am trying to fix. Please look at [1]. Patches
 for V7 BE KVM were proposed and are currently under active
 discussion. Currently I work on ARM V8 BE KVM changes.
 
 So native endian in ARM is the value of the CPSR register E bit.
 If it is off, native endian is LE; if it is on, it is BE.
 
 Once and if we agree on the ARM BE KVM host changes, the
 next step would be patches in qemu, one of which introduces
 qemu-system-armeb. Please see [2].
 
 I think we're facing an ideology conflict here. Yes, there
 should be a qemu-system-arm that is BE capable.
 
 Maybe it is not an ideology conflict but rather a terminology
 clarity issue :). I am not sure what you mean by a qemu-system-arm
 that is BE capable. In the qemu build system there is just the target
 name 'arm', which is an ARM V7 cpu in LE mode, and the 'armeb'
 target, which is an ARM V7 cpu in BE mode. That is true for a lot
 of open source packages. You could check the [1] patch that
 introduces the armeb target into qemu. A build for the
 arm target produces a qemu-system-arm executable that is
 marked 'ELF 32-bit LSB executable' and it can run on
 traditional LE ARM Linux. A build for the armeb target produces
 a qemu-system-armeb executable that is marked 'ELF 32-bit
 MSB executable' and can run on BE ARM Linux. armeb is
 nothing special here, just a build option for qemu that should run
 on BE ARM Linux.

But why should it be called armeb then? What actual difference does the model
have compared to the qemu-system-arm model?

 
 Both qemu-system-arm and qemu-system-armeb should
 be BE/LE capable. I.e. either of them, along with KVM, could
 run either an LE or a BE guest. MarcZ demonstrated that this
 is possible. I've tested both LE and BE guests with
 qemu-system-arm running on traditional LE ARM Linux,
 effectively repeating Marc's setup but with qemu.
 And I did test, with my patches, both BE and LE guests with
 qemu-system-armeb running on BE ARM Linux.
 
  There
  should also be a qemu-system-ppc64 that is LE capable.
  But there is no point in changing the default endianness
  for the virtual CPUs that we plug in there. Both CPUs are
  perfectly capable of running in LE or BE mode, the
  question is just what we declare the default.
 
 I am not sure what you mean by default? Is it the initial
 setting of the CPSR E bit and the 'cp15 c1, c0, 0' EE bit? Yes,
 the way it is currently implemented by the committed
 qemu-system-arm and the proposed qemu-system-armeb
 patches, they are both off. I.e. even qemu-system-armeb
 starts running the vcpu in LE mode, for a reason very
 similar to the one described in your next paragraph:
 qemu-system-armeb has a tiny bootloader that starts
 in LE mode and jumps to the kernel; the kernel switches the cpu to
 run in BE mode with 'setend be', and the EE bit is set just
 before the mmu is enabled.

You're proving my point even more. If both targets are LE/BE capable and both
targets start execution in LE mode, then why do we need a qemu-system-armeb at
all? Just use qemu-system-arm.
 

 Think about the PPC bootstrap. We start off with a
 BE firmware, then boot into the Linux kernel which
 calls a hypercall to set the LE bit on every interrupt.
 
 We have a very similar situation with BE ARM Linux.
 When we run ARM BE Linux we start with a bootloader
 which is LE, and then the CPU issues 'setend be' very
 soon after it starts executing kernel code; all secondary
 CPUs issue 'setend be' when they come out of the reset pen
 or bootmonitor sleep.
 
 But there's no reason this little endian kernel
 couldn't theoretically have big endian user space running
 with access to emulated device registers.
 
 I don't want to go there, it is very, very messy ...
 
 -- Just a side note: --
 Interestingly, half a year before I joined Linaro, at Cisco I and
 my colleague implemented a kernel patch that allowed running
 BE user-space processes as a sort of separate personality on
 top of an LE ARM kernel ... treated as a kind of multi-abi system.
 Effectively we had to do byteswaps on all non-trivial
 system calls and ioctls inside the kernel. We converted
 around 30 system calls and around 10 

Re: [Qemu-devel] [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-23 Thread Greg Kurz
On Wed, 22 Jan 2014 20:25:05 -0800
Victor Kamensky victor.kamen...@linaro.org wrote:

 Hi Alex,
 
 Sorry for the delayed reply, I was focusing on the discussion
 with Peter. Hope you and other folks may get something
 out of it :).
 
 Please see responses inline
 
 On 22 January 2014 02:52, Alexander Graf ag...@suse.de wrote:
 
  On 22.01.2014, at 08:26, Victor Kamensky victor.kamen...@linaro.org
  wrote:
 
  On 21 January 2014 22:41, Alexander Graf ag...@suse.de wrote:
 
 
  Native endian really is just a shortcut for target endian
  which is LE for ARM and BE for PPC. There shouldn't be
  a qemu-system-armeb or qemu-system-ppc64le.
 
  I disagree. A fully functional ARM BE system is what we've
  been working on for the last few months. 'We' is the Linaro
  Networking Group, Endian subteam, and some other guys
  in ARM and across the community. Why we do that is a bit
  beyond this discussion.
 
  ARM BE patches for both V7 and V8 are already in the mainline
  kernel. But the ARM BE KVM host is broken now. It is a known
  deficiency that I am trying to fix. Please look at [1]. Patches
  for V7 BE KVM were proposed and are currently under active
  discussion. Currently I work on ARM V8 BE KVM changes.
 
  So native endian in ARM is the value of the CPSR register E bit.
  If it is off, native endian is LE; if it is on, it is BE.
 
  Once and if we agree on the ARM BE KVM host changes, the
  next step would be patches in qemu, one of which introduces
  qemu-system-armeb. Please see [2].
 
  I think we're facing an ideology conflict here. Yes, there
  should be a qemu-system-arm that is BE capable.
 
 Maybe it is not an ideology conflict but rather a terminology
 clarity issue :). I am not sure what you mean by a qemu-system-arm
 that is BE capable. In the qemu build system there is just the target
 name 'arm', which is an ARM V7 cpu in LE mode, and the 'armeb'
 target, which is an ARM V7 cpu in BE mode. That is true for a lot
 of open source packages. You could check the [1] patch that
 introduces the armeb target into qemu. A build for the
 arm target produces a qemu-system-arm executable that is
 marked 'ELF 32-bit LSB executable' and it can run on
 traditional LE ARM Linux. A build for the armeb target produces
 a qemu-system-armeb executable that is marked 'ELF 32-bit
 MSB executable' and can run on BE ARM Linux. armeb is
 nothing special here, just a build option for qemu that should run
 on BE ARM Linux.
 

Hmmm... it looks like there is some confusion about the qemu command naming.
The <target> suffix in qemu-system-<target> has nothing to do with the ELF
information of the command itself.

[greg@bahia ~]$ file `which qemu-system-arm`
/bin/qemu-system-arm: ELF 64-bit LSB shared object, x86-64, version 1
(SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32,
BuildID[sha1]=0xbcb974847daa8159c17ed74906cd5351387d4097, stripped

It is valid to create a new target if it is substantially different from
existing ones (ppc64 versus ppc, for example). This is not the case with ARM,
since it is the very same CPU that can switch endianness with the 'setend'
instruction (which needs to be emulated anyway when running in TCG mode).

qemu-system-arm is THE command that should be able to emulate an ARM cpu,
whether the guest does 'setend le' or 'setend be'.

 Both qemu-system-arm and qemu-system-armeb should
 be BE/LE capable. I.e. either of them, along with KVM, could
 run either an LE or a BE guest. MarcZ demonstrated that this
 is possible. I've tested both LE and BE guests with
 qemu-system-arm running on traditional LE ARM Linux,
 effectively repeating Marc's setup but with qemu.
 And I did test, with my patches, both BE and LE guests with
 qemu-system-armeb running on BE ARM Linux.
 
  There
  should also be a qemu-system-ppc64 that is LE capable.
  But there is no point in changing the default endianness
  for the virtual CPUs that we plug in there. Both CPUs are
  perfectly capable of running in LE or BE mode, the
  question is just what we declare the default.
 
 I am not sure what you mean by default? Is it the initial
 setting of the CPSR E bit and the 'cp15 c1, c0, 0' EE bit? Yes,
 the way it is currently implemented by the committed
 qemu-system-arm and the proposed qemu-system-armeb
 patches, they are both off. I.e. even qemu-system-armeb
 starts running the vcpu in LE mode, for a reason very
 similar to the one described in your next paragraph:
 qemu-system-armeb has a tiny bootloader that starts
 in LE mode and jumps to the kernel; the kernel switches the cpu to
 run in BE mode with 'setend be', and the EE bit is set just
 before the mmu is enabled.
 
  Think about the PPC bootstrap. We start off with a
  BE firmware, then boot into the Linux kernel which
  calls a hypercall to set the LE bit on every interrupt.
 
 We have a very similar situation with BE ARM Linux.
 When we run ARM BE Linux we start with a bootloader
 which is LE, and then the CPU issues 'setend be' very
 soon after it starts executing kernel code; all secondary
 CPUs issue 'setend be' when they come out of the reset pen
 or bootmonitor sleep.
 
  But there's no reason this little endian kernel
  couldn't 

Re: [Qemu-devel] [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-23 Thread Christoffer Dall
On Wed, Jan 22, 2014 at 02:27:29PM +0530, Anup Patel wrote:

[...]

 
 Thanks for the info on the QEMU-side handling of MMIO data.
 
 I was not aware that we would only have target endian = LE
 for ARM/ARM64 in QEMU. I think Marc Z had mentioned a similar
 thing about MMIO in our previous discussions on his patches.
 (Please refer to http://www.spinics.net/lists/arm-kernel/msg283313.html)
 
 This clearly means MMIO data passed to user space (QEMU) has
 to be of host endianness so that QEMU can take care of the
 bus-device endian map.

Hmmm, I'm not sure what you mean exactly, but the fact remains that we
simply need to decide on a layout of mmio.data that (1) doesn't break
existing userspace and (2) is clearly defined for mixed-mmio use cases.

 
 The current vcpu_data_guest_to_host() and vcpu_data_host_to_guest()
 do not perform endianness conversion of MMIO data to LE when
 we are running an LE guest on a BE host, so we do need Victor's patch
 fixing vcpu_data_guest_to_host() and vcpu_data_host_to_guest().
 (Already reported a long time back by me,
 http://www.spinics.net/lists/arm-kernel/msg283308.html)
 

The problem is that we cannot decide on what the patch should look like
before the endianness of mmio.data is decided.

Alex, Peter, and I agree that it should be that of the host endianness
and represent what the architecture in question would put on the memory
bus.  In the case of ARM, it's the register value when the VCPU E-bit is
clear, and it's the byteswapped register value when the VCPU E-bit is
set.

Therefore, the patch needs to do an unconditional byteswap when the VCPU
E-bit is set, instead of the beXX_to_cpu and cpu_to_beXX.
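A minimal sketch of that rule, with invented names and GCC-style builtins
(not the actual patch):

#include <stdint.h>

/* Swap the trapped register value exactly when the VCPU E-bit is set,
 * i.e. unconditionally with respect to host endianness; this replaces
 * the beXX_to_cpu()/cpu_to_beXX() calls, which swap conditionally. */
static uint64_t adjust_for_e_bit(uint64_t reg, unsigned len, int e_bit_set)
{
    if (!e_bit_set || len == 1)
        return reg;
    switch (len) {
    case 2:  return __builtin_bswap16((uint16_t)reg);
    case 4:  return __builtin_bswap32((uint32_t)reg);
    default: return __builtin_bswap64(reg);
    }
}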

I'm sending out a patch to clarify the KVM API so we can move on.

-Christoffer



Re: [Qemu-devel] [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-22 Thread Anup Patel
Hi Alex,

On Wed, Jan 22, 2014 at 12:11 PM, Alexander Graf ag...@suse.de wrote:


 On 22.01.2014 at 07:31, Anup Patel a...@brainfault.org wrote:

 On Wed, Jan 22, 2014 at 11:09 AM, Victor Kamensky
 victor.kamen...@linaro.org wrote:
 Hi Guys,

 Christoffer and I had a bit of a heated chat :) on this
 subject last night. Christoffer, I really appreciate
 your time! We did not really reach agreement
 during the chat, and Christoffer asked me to follow
 up on this thread.
 Here it goes. Sorry, it is a very long email.
 
 I don't believe we can assign any endianness to the
 mmio.data[] byte array. I believe mmio.data[] and
 mmio.len act just like memcpy, and that is all. As
 memcpy does not imply any endianness of the underlying
 data, mmio.data[] should not either.

 Here is my definition:
 
 mmio.data[] is an array of bytes that contains memory
 bytes in such form, for the read case, that if those
 bytes are placed in guest memory and the guest executes
 the same read access instruction with the address of this
 memory, the result would be the same as a real h/w device
 memory access. The rest of the KVM host and hypervisor
 code should take care of mmio.data[]
 memory so it is delivered to the vcpu registers and
 restored by the hypervisor part in such a way that the guest CPU
 register value is the same as it would be for a real,
 non-emulated h/w read access (that is the emulation part).
 The same goes for write access: if the guest writes into
 memory and those bytes are just copied to the emulated
 h/w register, it has the same effect as a real
 mapped h/w register write.
 
 In shorter form, i.e. for a len=4 access: the endianness of the integer
 at the mmio.data[0] address should match the endianness
 of the emulated h/w device behind the phys_addr address,
 regardless of the endianness of the emulator, KVM host,
 hypervisor, and guest

 Examples that illustrate my definition
 --
 
 1) An LE guest (E bit is off in ARM speak) reads an integer
 (4 bytes) from a mapped h/w LE device register -
 mmio.data[3] contains the MSB, mmio.data[0] contains the LSB.
 
 2) A BE guest (E bit is on in ARM speak) reads an integer
 from a mapped h/w LE device register - mmio.data[3]
 contains the MSB, mmio.data[0] contains the LSB. Note that
 if the mmio.data[0] memory were placed in the guest
 address space and the instruction restarted with the new
 address, it would meet the BE guest's expectations
 - the guest knows that it reads LE h/w, so it will byteswap the
 register before processing it further. This is the BE guest ARM
 case (regardless of what the KVM host endianness is).
 
 3) A BE guest reads an integer from a mapped h/w BE device
 register - mmio.data[0] contains the MSB, mmio.data[3]
 contains the LSB. Note that if the mmio.data[0] memory were
 placed in the guest address space and the instruction
 restarted with the new address, it would meet the BE
 guest's expectation - the guest knows that it reads
 BE h/w, so it will proceed further without any other
 work. I guess, it is the BE ppc case.
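(A sketch of how an emulator would fill data[] under this definition;
names are invented. The layout follows the device, not the guest, host,
or hypervisor:)

#include <stdint.h>

/* For a 4-byte read that returned dev_val from the emulated device,
 * lay the value out in the device's own byte order. */
static void fill_mmio_read(uint8_t data[4], uint32_t dev_val, int dev_is_be)
{
    for (int i = 0; i < 4; i++) {
        int shift = dev_is_be ? 8 * (3 - i) : 8 * i;
        data[i] = (uint8_t)(dev_val >> shift);  /* LE device: data[0] = LSB */
    }
}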


 Arguments in favor of memcpy semantics of mmio.data[]
 --
 
 x) What are the possible values of 'len'? Previous discussions
 imply it is always a power of 2. Why is that? Maybe
 there will be a CPU that needs to do a 5-byte mmio
 access, or 6 bytes. How do you assign endianness to
 such a case? A 'len' of 5 or 6, or anything, works fine with
 memcpy semantics. I admit it is a hypothetical case, but
 IMHO it tests how clean the ABI definition is.

 x) A byte array does not have endianness, because it
 does not have any structure. If one wanted to
 imply structure, why is mmio not defined in such a way
 that the structure is reflected in the mmio definition?
 Something like:


/* KVM_EXIT_MMIO */
struct {
        __u64 phys_addr;
        union {
                __u8  byte;
                __u16 hword;
                __u32 word;
                __u64 dword;
        } data;
        __u32 len;
        __u8  is_write;
} mmio;

 where len really serves as the union discriminator, and the
 only allowed len values are 1, 2, 4, 8.
 In this case, I agree, the endianness of the integer types
 should be defined. I believe the use of a byte array strongly
 implies that the original intent was to have the semantics of a
 byte stream copy, just like memcpy does.

 x) Note there is nothing wrong with a user/kernel ABI
 using just a byte stream as a parameter. There are already
 precedents, like the 'read' and 'write' system calls :).

 x) Consider the case where KVM works with emulated memory-mapped
 h/w devices, where some devices operate in LE mode and others
 operate in BE mode. Which one it is is defined by the semantics
 of the real h/w device, and should be emulated by the emulator and
 KVM given all the other context. As far as the mmio.data[] array is
 concerned, if the same integer value is read from these devices'
 registers, mmio.data[] memory should contain the integer in opposite
 endianness for the two cases, i.e. the MSB is data[0] in 

Re: [Qemu-devel] [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-22 Thread Alexander Graf

On 22.01.2014, at 08:26, Victor Kamensky victor.kamen...@linaro.org wrote:

 On 21 January 2014 22:41, Alexander Graf ag...@suse.de wrote:
 
 
 Native endian really is just a shortcut for target endian
 which is LE for ARM and BE for PPC. There shouldn't be
 a qemu-system-armeb or qemu-system-ppc64le.
 
 I disagree. A fully functional ARM BE system is what we've
 been working on for the last few months. 'We' is the Linaro
 Networking Group, Endian subteam, and some other guys
 in ARM and across the community. Why we do that is a bit
 beyond this discussion.
 
 ARM BE patches for both V7 and V8 are already in the mainline
 kernel. But the ARM BE KVM host is broken now. It is a known
 deficiency that I am trying to fix. Please look at [1]. Patches
 for V7 BE KVM were proposed and are currently under active
 discussion. Currently I work on ARM V8 BE KVM changes.
 
 So native endian in ARM is the value of the CPSR register E bit.
 If it is off, native endian is LE; if it is on, it is BE.
 
 Once and if we agree on the ARM BE KVM host changes, the
 next step would be patches in qemu, one of which introduces
 qemu-system-armeb. Please see [2].

I think we're facing an ideology conflict here. Yes, there should be a 
qemu-system-arm that is BE capable. There should also be a qemu-system-ppc64 
that is LE capable. But there is no point in changing the default endianness 
for the virtual CPUs that we plug in there. Both CPUs are perfectly capable of 
running in LE or BE mode; the question is just what we declare the default.

Think about the PPC bootstrap. We start off with a BE firmware, then boot into 
the Linux kernel which calls a hypercall to set the LE bit on every interrupt. 
But there's no reason this little endian kernel couldn't theoretically have big 
endian user space running with access to emulated device registers.

As Peter already pointed out, the actual breakage behind this is that we have a 
default endianness at all. But that's a very difficult thing to resolve and I 
don't think it should be our primary goal. Just live with the fact that we declare 
ARM little endian in QEMU and swap things accordingly - then everyone's happy.

This really only ever becomes a problem if you have devices that have awareness 
of the CPU's endian mode. The only one on PPC that I'm aware of that falls into 
this category is virtio, and there are patches pending to solve that. I don't 
know if there are any QEMU-emulated devices outside of virtio with this issue 
on ARM, but you'll have to make the emulation code for those look at the CPU 
state then.

 
 QEMU emulates everything that comes after the CPU, so
 imagine the ioctl struct as a bus package. Your bus
 doesn't care what endianness the CPU is in - it just
 gets data from the CPU.
 
 I am not sure that I follow the above. Suppose I have
 
 mov r1, #1
 str r1, [r0]
 
 where r0 is a device address. Now, depending on the CPSR
 E bit value, the device address will receive 1 as an integer either
 in LE order or in BE order. That is how the ARM v7 CPU
 works, regardless of whether it is emulated or not.
 
 So if the E bit is off (LE case), after str is executed
 the byte at r0 will get 1
 the byte at r0 + 1 will get 0
 the byte at r0 + 2 will get 0
 the byte at r0 + 3 will get 0
 
 If the E bit is on (BE case), after str is executed
 the byte at r0 will get 0
 the byte at r0 + 1 will get 0
 the byte at r0 + 2 will get 0
 the byte at r0 + 3 will get 1
 
 My point is that mmio.data[] just carries the bytes for phys_addr:
 mmio.data[0] would be the value of the byte at phys_addr,
 mmio.data[1] would be the value of the byte at phys_addr + 1, and
 so on.
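(A compilable sketch of those two byte layouts, as an editorial
illustration of the memcpy semantics described above:)

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t r1 = 1;
    uint8_t le[4], be[4];   /* what mmio.data[] would carry */
    for (int i = 0; i < 4; i++) {
        le[i] = (uint8_t)(r1 >> (8 * i));        /* E off: 01 00 00 00 */
        be[i] = (uint8_t)(r1 >> (8 * (3 - i)));  /* E on:  00 00 00 01 */
    }
    printf("E off: %02x %02x %02x %02x\n", le[0], le[1], le[2], le[3]);
    printf("E on:  %02x %02x %02x %02x\n", be[0], be[1], be[2], be[3]);
    return 0;
}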

What we get is an instruction that traps because it wants to write r1 (which 
has value=1) to address x. So at that point we get the register value.

Then we need to take a look at the E bit to see whether the write was supposed 
to be in non-host endianness, because we need to emulate exactly the LE/BE 
difference you're indicating above. The way we implement this on PPC is that we 
simply byte swap the register value when guest_endian != host_endian.

With this in place, QEMU can just memcpy() the value into a local register and 
feed it into its emulation code, which expects as a parameter a register value as 
if the CPU were running in native endianness - with native meaning little 
endian for qemu-system-arm. Device emulation code doesn't know what to do with 
a byte array.

Take a look at QEMU's MMIO handler:

case KVM_EXIT_MMIO:
    DPRINTF("handle_mmio\n");
    cpu_physical_memory_rw(run->mmio.phys_addr,
                           run->mmio.data,
                           run->mmio.len,
                           run->mmio.is_write);
    ret = 0;
    break;

which translates to

switch (l) {
case 8:
    /* 64 bit write access */
    val = ldq_p(buf);
    error |= io_mem_write(mr, addr1, val, 8);
    break;
case 4:
  

Re: [Qemu-devel] [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-22 Thread Victor Kamensky
Hi Alex,

Sorry for the delayed reply, I was focusing on the discussion
with Peter. Hope you and other folks may get something
out of it :).

Please see responses inline

On 22 January 2014 02:52, Alexander Graf ag...@suse.de wrote:

 On 22.01.2014, at 08:26, Victor Kamensky victor.kamen...@linaro.org wrote:

 On 21 January 2014 22:41, Alexander Graf ag...@suse.de wrote:


 Native endian really is just a shortcut for target endian
 which is LE for ARM and BE for PPC. There shouldn't be
 a qemu-system-armeb or qemu-system-ppc64le.

 I disagree. A fully functional ARM BE system is what we've
 been working on for the last few months. 'We' is the Linaro
 Networking Group, Endian subteam, and some other guys
 in ARM and across the community. Why we do that is a bit
 beyond this discussion.
 
 ARM BE patches for both V7 and V8 are already in the mainline
 kernel. But the ARM BE KVM host is broken now. It is a known
 deficiency that I am trying to fix. Please look at [1]. Patches
 for V7 BE KVM were proposed and are currently under active
 discussion. Currently I work on ARM V8 BE KVM changes.
 
 So native endian in ARM is the value of the CPSR register E bit.
 If it is off, native endian is LE; if it is on, it is BE.
 
 Once and if we agree on the ARM BE KVM host changes, the
 next step would be patches in qemu, one of which introduces
 qemu-system-armeb. Please see [2].

 I think we're facing an ideology conflict here. Yes, there
 should be a qemu-system-arm that is BE capable.

Maybe it is not an ideology conflict but rather a terminology
clarity issue :). I am not sure what you mean by a qemu-system-arm
that is BE capable. In the qemu build system there is just the target
name 'arm', which is an ARM V7 cpu in LE mode, and the 'armeb'
target, which is an ARM V7 cpu in BE mode. That is true for a lot
of open source packages. You could check the [1] patch that
introduces the armeb target into qemu. A build for the
arm target produces a qemu-system-arm executable that is
marked 'ELF 32-bit LSB executable' and it can run on
traditional LE ARM Linux. A build for the armeb target produces
a qemu-system-armeb executable that is marked 'ELF 32-bit
MSB executable' and can run on BE ARM Linux. armeb is
nothing special here, just a build option for qemu that should run
on BE ARM Linux.

Both qemu-system-arm and qemu-system-armeb should
be BE/LE capable. I.e. either of them, along with KVM, could
run either an LE or a BE guest. MarcZ demonstrated that this
is possible. I've tested both LE and BE guests with
qemu-system-arm running on traditional LE ARM Linux,
effectively repeating Marc's setup but with qemu.
And I did test, with my patches, both BE and LE guests with
qemu-system-armeb running on BE ARM Linux.

 There
 should also be a qemu-system-ppc64 that is LE capable.
 But there is no point in changing the default endianness
 for the virtual CPUs that we plug in there. Both CPUs are
 perfectly capable of running in LE or BE mode, the
 question is just what we declare the default.

I am not sure what you mean by default? Is it the initial
setting of the CPSR E bit and the 'cp15 c1, c0, 0' EE bit? Yes,
the way it is currently implemented by the committed
qemu-system-arm and the proposed qemu-system-armeb
patches, they are both off. I.e. even qemu-system-armeb
starts running the vcpu in LE mode, for a reason very
similar to the one described in your next paragraph:
qemu-system-armeb has a tiny bootloader that starts
in LE mode and jumps to the kernel; the kernel switches the cpu to
run in BE mode with 'setend be', and the EE bit is set just
before the mmu is enabled.

 Think about the PPC bootstrap. We start off with a
 BE firmware, then boot into the Linux kernel which
 calls a hypercall to set the LE bit on every interrupt.

We have a very similar situation with BE ARM Linux.
When we run ARM BE Linux we start with a bootloader
which is LE, and then the CPU issues 'setend be' very
soon after it starts executing kernel code; all secondary
CPUs issue 'setend be' when they come out of the reset pen
or bootmonitor sleep.

 But there's no reason this little endian kernel
 couldn't theoretically have big endian user space running
 with access to emulated device registers.

I don't want to go there, it is very, very messy ...

-- Just a side note: --
Interestingly, half a year before I joined Linaro, at Cisco I and
my colleague implemented a kernel patch that allowed running
BE user-space processes as a sort of separate personality on
top of an LE ARM kernel ... treated as a kind of multi-abi system.
Effectively we had to do byteswaps on all non-trivial
system calls and ioctls inside the kernel. We converted
around 30 system calls and around 10 ioctls. Our target process
was using just those and it was working, but the patch was
very intrusive and unnatural. I think in Linaro there was
some public version of my presentation circulated that
explained all this mess. I don't seriously want to consider it.

The only robust mixed mode, as MarcZ demonstrated,
can be done on VM boundaries only. I.e. an LE host can
run a BE guest fine, and a BE host can run an LE guest fine.
Everything else would be a huge mess. If we want to

Re: [Qemu-devel] [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-21 Thread Alexander Graf


 On 22.01.2014 at 07:31, Anup Patel a...@brainfault.org wrote:
 
 On Wed, Jan 22, 2014 at 11:09 AM, Victor Kamensky
 victor.kamen...@linaro.org wrote:
 Hi Guys,
 
  Christoffer and I had a bit of a heated chat :) on this
  subject last night. Christoffer, I really appreciate
  your time! We did not really reach agreement
  during the chat, and Christoffer asked me to follow
  up on this thread.
  Here it goes. Sorry, it is a very long email.
  
  I don't believe we can assign any endianness to the
  mmio.data[] byte array. I believe mmio.data[] and
  mmio.len act just like memcpy, and that is all. As
  memcpy does not imply any endianness of the underlying
  data, mmio.data[] should not either.
 
  Here is my definition:
  
  mmio.data[] is an array of bytes that contains memory
  bytes in such form, for the read case, that if those
  bytes are placed in guest memory and the guest executes
  the same read access instruction with the address of this
  memory, the result would be the same as a real h/w device
  memory access. The rest of the KVM host and hypervisor
  code should take care of mmio.data[]
  memory so it is delivered to the vcpu registers and
  restored by the hypervisor part in such a way that the guest CPU
  register value is the same as it would be for a real,
  non-emulated h/w read access (that is the emulation part).
  The same goes for write access: if the guest writes into
  memory and those bytes are just copied to the emulated
  h/w register, it has the same effect as a real
  mapped h/w register write.
  
  In shorter form, i.e. for a len=4 access: the endianness of the integer
  at the mmio.data[0] address should match the endianness
  of the emulated h/w device behind the phys_addr address,
  regardless of the endianness of the emulator, KVM host,
  hypervisor, and guest
 
  Examples that illustrate my definition
  --
  
  1) An LE guest (E bit is off in ARM speak) reads an integer
  (4 bytes) from a mapped h/w LE device register -
  mmio.data[3] contains the MSB, mmio.data[0] contains the LSB.
  
  2) A BE guest (E bit is on in ARM speak) reads an integer
  from a mapped h/w LE device register - mmio.data[3]
  contains the MSB, mmio.data[0] contains the LSB. Note that
  if the mmio.data[0] memory were placed in the guest
  address space and the instruction restarted with the new
  address, it would meet the BE guest's expectations
  - the guest knows that it reads LE h/w, so it will byteswap the
  register before processing it further. This is the BE guest ARM
  case (regardless of what the KVM host endianness is).
  
  3) A BE guest reads an integer from a mapped h/w BE device
  register - mmio.data[0] contains the MSB, mmio.data[3]
  contains the LSB. Note that if the mmio.data[0] memory were
  placed in the guest address space and the instruction
  restarted with the new address, it would meet the BE
  guest's expectation - the guest knows that it reads
  BE h/w, so it will proceed further without any other
  work. I guess, it is the BE ppc case.
 
 
  Arguments in favor of memcpy semantics of mmio.data[]
  --
  
  x) What are the possible values of 'len'? Previous discussions
  imply it is always a power of 2. Why is that? Maybe
  there will be a CPU that needs to do a 5-byte mmio
  access, or 6 bytes. How do you assign endianness to
  such a case? A 'len' of 5 or 6, or anything, works fine with
  memcpy semantics. I admit it is a hypothetical case, but
  IMHO it tests how clean the ABI definition is.
  
  x) A byte array does not have endianness, because it
  does not have any structure. If one wanted to
  imply structure, why is mmio not defined in such a way
  that the structure is reflected in the mmio definition?
  Something like:
 
 
/* KVM_EXIT_MMIO */
struct {
        __u64 phys_addr;
        union {
                __u8  byte;
                __u16 hword;
                __u32 word;
                __u64 dword;
        } data;
        __u32 len;
        __u8  is_write;
} mmio;
 
  where len really serves as the union discriminator, and the
  only allowed len values are 1, 2, 4, 8.
  In this case, I agree, the endianness of the integer types
  should be defined. I believe the use of a byte array strongly
  implies that the original intent was to have the semantics of a
  byte stream copy, just like memcpy does.
  
  x) Note there is nothing wrong with a user/kernel ABI
  using just a byte stream as a parameter. There are already
  precedents, like the 'read' and 'write' system calls :).
 
  x) Consider the case where KVM works with emulated memory-mapped
  h/w devices, where some devices operate in LE mode and others
  operate in BE mode. Which one it is is defined by the semantics
  of the real h/w device, and should be emulated by the emulator and
  KVM given all the other context. As far as the mmio.data[] array is
  concerned, if the same integer value is read from these devices'
  registers, mmio.data[] memory should contain the integer in opposite
  endianness for the two cases, i.e. the MSB is data[0] in one case
  and the MSB is data[3] in the other case. It cannot 

Re: [Qemu-devel] [Qemu-ppc] KVM and variable-endianness guest CPUs

2014-01-21 Thread Victor Kamensky
On 21 January 2014 22:41, Alexander Graf ag...@suse.de wrote:


 On 22.01.2014 at 07:31, Anup Patel a...@brainfault.org wrote:

 On Wed, Jan 22, 2014 at 11:09 AM, Victor Kamensky
 victor.kamen...@linaro.org wrote:
 Hi Guys,

 Christoffer and I had a bit of a heated chat :) on this
 subject last night. Christoffer, I really appreciate
 your time! We did not really reach agreement
 during the chat, and Christoffer asked me to follow
 up on this thread.
 Here it goes. Sorry, it is a very long email.

 I don't believe we can assign any endianness to the
 mmio.data[] byte array. I believe mmio.data[] and
 mmio.len act just like memcpy, and that is all. As
 memcpy does not imply any endianness of the underlying
 data, mmio.data[] should not either.

 Here is my definition:

 mmio.data[] is an array of bytes that contains memory
 bytes in such form, for the read case, that if those
 bytes are placed in guest memory and the guest executes
 the same read access instruction with the address of this
 memory, the result would be the same as a real h/w device
 memory access. The rest of the KVM host and hypervisor
 code should take care of mmio.data[]
 memory so it is delivered to the vcpu registers and
 restored by the hypervisor part in such a way that the guest CPU
 register value is the same as it would be for a real,
 non-emulated h/w read access (that is the emulation part).
 The same goes for write access: if the guest writes into
 memory and those bytes are just copied to the emulated
 h/w register, it has the same effect as a real
 mapped h/w register write.

 In shorter form, i.e. for a len=4 access: the endianness of the integer
 at the mmio.data[0] address should match the endianness
 of the emulated h/w device behind the phys_addr address,
 regardless of the endianness of the emulator, KVM host,
 hypervisor, and guest

 Examples that illustrate my definition
 --

 1) An LE guest (E bit is off in ARM speak) reads an integer
 (4 bytes) from a mapped h/w LE device register -
 mmio.data[3] contains the MSB, mmio.data[0] contains the LSB.

 2) A BE guest (E bit is on in ARM speak) reads an integer
 from a mapped h/w LE device register - mmio.data[3]
 contains the MSB, mmio.data[0] contains the LSB. Note that
 if the mmio.data[0] memory were placed in the guest
 address space and the instruction restarted with the new
 address, it would meet the BE guest's expectations
 - the guest knows that it reads LE h/w, so it will byteswap the
 register before processing it further. This is the BE guest ARM
 case (regardless of what the KVM host endianness is).

 3) A BE guest reads an integer from a mapped h/w BE device
 register - mmio.data[0] contains the MSB, mmio.data[3]
 contains the LSB. Note that if the mmio.data[0] memory were
 placed in the guest address space and the instruction
 restarted with the new address, it would meet the BE
 guest's expectation - the guest knows that it reads
 BE h/w, so it will proceed further without any other
 work. I guess, it is the BE ppc case.


 Arguments in favor of memcpy semantics of mmio.data[]
 --

 x) What are the possible values of 'len'? Previous discussions
 imply it is always a power of 2. Why is that? Maybe
 there will be a CPU that needs to do a 5-byte mmio
 access, or 6 bytes. How do you assign endianness to
 such a case? A 'len' of 5 or 6, or anything, works fine with
 memcpy semantics. I admit it is a hypothetical case, but
 IMHO it tests how clean the ABI definition is.

 x) A byte array does not have endianness, because it
 does not have any structure. If one wanted to
 imply structure, why is mmio not defined in such a way
 that the structure is reflected in the mmio definition?
 Something like:


/* KVM_EXIT_MMIO */
struct {
        __u64 phys_addr;
        union {
                __u8  byte;
                __u16 hword;
                __u32 word;
                __u64 dword;
        } data;
        __u32 len;
        __u8  is_write;
} mmio;

 where len really serves as the union discriminator, and the
 only allowed len values are 1, 2, 4, 8.
 In this case, I agree, the endianness of the integer types
 should be defined. I believe the use of a byte array strongly
 implies that the original intent was to have the semantics of a
 byte stream copy, just like memcpy does.

 x) Note there is nothing wrong with a user/kernel ABI
 using just a byte stream as a parameter. There are already
 precedents, like the 'read' and 'write' system calls :).

 x) Consider the case where KVM works with emulated memory-mapped
 h/w devices, where some devices operate in LE mode and others
 operate in BE mode. Which one it is is defined by the semantics
 of the real h/w device, and should be emulated by the emulator and
 KVM given all the other context. As far as the mmio.data[] array is
 concerned, if the same integer value is read from these devices'
 registers, mmio.data[] memory should contain the integer in opposite
 endianness for the two cases, i.e. the MSB is data[0] in one case and the MSB