On Wed, Nov 12, 2014 at 10:08:19AM +0100, Alexander Graf wrote: > > > On 12.11.14 09:49, Frank Blaschka wrote: > > On Tue, Nov 11, 2014 at 04:24:24PM +0100, Alexander Graf wrote: > >> > >> > >> On 11.11.14 15:08, Frank Blaschka wrote: > >>> On Tue, Nov 11, 2014 at 01:51:25PM +0100, Alexander Graf wrote: > >>>> > >>>> > >>>> > >>>>> Am 11.11.2014 um 13:39 schrieb Frank Blaschka > >>>>> <blasc...@linux.vnet.ibm.com>: > >>>>> > >>>>>> On Tue, Nov 11, 2014 at 01:16:04PM +0100, Alexander Graf wrote: > >>>>>> > >>>>>> > >>>>>>> On 11.11.14 13:10, Frank Blaschka wrote: > >>>>>>>> On Mon, Nov 10, 2014 at 04:56:21PM +0100, Alexander Graf wrote: > >>>>>>>> > >>>>>>>> > >>>>>>>>> On 10.11.14 15:20, Frank Blaschka wrote: > >>>>>>>>> From: Frank Blaschka <frank.blasc...@de.ibm.com> > >>>>>>>>> > >>>>>>>>> This patch implements the s390 pci instructions in qemu. It allows > >>>>>>>>> to access and drive pci devices attached to the s390 pci bus. > >>>>>>>>> Because of platform constrains devices using IO BARs are not > >>>>>>>>> supported. Also a device has to support MSI/MSI-X to run on s390. > >>>>>>>>> > >>>>>>>>> Signed-off-by: Frank Blaschka <frank.blasc...@de.ibm.com> > >>>>>>>>> --- > >>>>>>>>> target-s390x/Makefile.objs | 2 +- > >>>>>>>>> target-s390x/kvm.c | 52 ++++ > >>>>>>>>> target-s390x/pci_ic.c | 753 > >>>>>>>>> +++++++++++++++++++++++++++++++++++++++++++++ > >>>>>>>>> target-s390x/pci_ic.h | 335 ++++++++++++++++++++ > >>>>>>>>> 4 files changed, 1141 insertions(+), 1 deletion(-) > >>>>>>>>> create mode 100644 target-s390x/pci_ic.c > >>>>>>>>> create mode 100644 target-s390x/pci_ic.h > >>>>>>>>> > >>>>>> > >>>>>> [...] > >>>>>> > >>>>>>>>> +int kvm_pcilg_service_call(S390CPU *cpu, struct kvm_run *run) > >>>>>>>>> +{ > >>>>>>>>> + CPUS390XState *env = &cpu->env; > >>>>>>>>> + S390PCIBusDevice *pbdev; > >>>>>>>>> + uint8_t r1 = (run->s390_sieic.ipb & 0x00f00000) >> 20; > >>>>>>>>> + uint8_t r2 = (run->s390_sieic.ipb & 0x000f0000) >> 16; > >>>>>>>>> + PciLgStg *rp; > >>>>>>>>> + uint64_t offset; > >>>>>>>>> + uint64_t data; > >>>>>>>>> + uint8_t len; > >>>>>>>>> + > >>>>>>>>> + cpu_synchronize_state(CPU(cpu)); > >>>>>>>>> + > >>>>>>>>> + if (env->psw.mask & PSW_MASK_PSTATE) { > >>>>>>>>> + program_interrupt(env, PGM_PRIVILEGED, 4); > >>>>>>>>> + return 0; > >>>>>>>>> + } > >>>>>>>>> + > >>>>>>>>> + if (r2 & 0x1) { > >>>>>>>>> + program_interrupt(env, PGM_SPECIFICATION, 4); > >>>>>>>>> + return 0; > >>>>>>>>> + } > >>>>>>>>> + > >>>>>>>>> + rp = (PciLgStg *)&env->regs[r2]; > >>>>>>>>> + offset = env->regs[r2 + 1]; > >>>>>>>>> + > >>>>>>>>> + pbdev = s390_pci_find_dev_by_fh(rp->fh); > >>>>>>>>> + if (!pbdev) { > >>>>>>>>> + DPRINTF("pcilg no pci dev\n"); > >>>>>>>>> + setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE); > >>>>>>>>> + return 0; > >>>>>>>>> + } > >>>>>>>>> + > >>>>>>>>> + len = rp->len & 0xF; > >>>>>>>>> + if (rp->pcias < 6) { > >>>>>>>>> + if ((8 - (offset & 0x7)) < len) { > >>>>>>>>> + program_interrupt(env, PGM_OPERAND, 4); > >>>>>>>>> + return 0; > >>>>>>>>> + } > >>>>>>>>> + MemoryRegion *mr = > >>>>>>>>> pbdev->pdev->io_regions[rp->pcias].memory; > >>>>>>>>> + io_mem_read(mr, offset, &data, len); > >>>>>>>>> + } else if (rp->pcias == 15) { > >>>>>>>>> + if ((4 - (offset & 0x3)) < len) { > >>>>>>>>> + program_interrupt(env, PGM_OPERAND, 4); > >>>>>>>>> + return 0; > >>>>>>>>> + } > >>>>>>>>> + data = pci_host_config_read_common( > >>>>>>>>> + pbdev->pdev, offset, > >>>>>>>>> pci_config_size(pbdev->pdev), len); > >>>>>>>>> + > >>>>>>>>> + switch (len) { > >>>>>>>>> + case 1: > >>>>>>>>> + break; > >>>>>>>>> + case 2: > >>>>>>>>> + data = cpu_to_le16(data); > >>>>>>>>> + break; > >>>>>>>>> + case 4: > >>>>>>>>> + data = cpu_to_le32(data); > >>>>>>>>> + break; > >>>>>>>>> + case 8: > >>>>>>>>> + data = cpu_to_le64(data); > >>>>>>>>> + break; > >>>>>>>> > >>>>>>>> Why? Also, this is wrong. cpu_to_le64 convert between host endianness > >>>>>>>> and LE. So if you're running this on an LE host, you won't swap the > >>>>>>>> value and get a broken result. > >>>>>>>> > >>>>>>>> If you know that the value is always swapped, use bswapxx(). > >>>>>>>> > >>>>>>> > >>>>>>> Actually the code is right and required for a big endian host :-) > >>>>>>> pcilg/pcistg provide access to the PCI config space which is defined > >>>>>>> as PCI byte order (little endian). Since pci_host_config_read_common > >>>>>>> does > >>>>>>> already a le to cpu conversion we have to convert back to PCI byte > >>>>>>> order. > >>>>>>> Doing an unconditional swap would be a bug on a little endian host. > >>>>>> > >>>>>> Why would it be a bug? The value you end up writing is contents of a > >>>>>> register and thus doesn't have endianness. So if QEMU was an LE > >>>>>> process, > >>>>> > >>>>> No, the s390 guest executing pcilg instruction expects to receive > >>>>> config space data > >>>>> in PCI byte order. > >>>>> > >>>>>> the value of data would be identical as on a BE QEMU before your swab. > >>>>>> After the swab, it would be bswap'ed on BE, but not LE. So LE hosts > >>>>>> break. > >>>>>> > >>>>> > >>>>> Again on BE endian host we do the swap because of > >>>>> pci_host_config_read_common does > >>>>> read the value and do a byte swap for that value, but we need PCI byte > >>>>> order not BE here. > >>>>> > >>>>> On LE host pci_host_config_read_common does not do a byte swap so we do > >>>>> not have to > >>>>> convert back to PCI byte order. > >>>> > >>>> We maintain the PCI config space always in LE byte order in memory, > >>>> that's why there is a bwap in its read function. The return result of > >>>> the read function however is always the same, regardless of LE or BE > >>>> host. If I do a read of size 4, I will always get 0x1, not 0x01000000 > >>>> returned. > >>>> > >>>> So now you need to convert that 0x1 into a 0x01000000 manually here > >>>> because some architect thought that registers have endianness (which > >>>> they don't). But you need to do it always, even on an LE host, because > >>>> the pci config space return value is identical on LE and BE. > >>>> > >>> so you tell me pci_host_config_read_common does not end up in > >>> pci_default_read_config? > >>> > >>> uint32_t pci_default_read_config(PCIDevice *d, > >>> uint32_t address, int len) > >>> { > >>> uint32_t val = 0; > >>> > >>> memcpy(&val, d->config + address, len); > >>> return le32_to_cpu(val); > >>> } > >>> > >>> What did I miss? > >> > >> That's exactly where you end up in - and it's there to convert from the > >> PCI config space backing storage to a native number. > >> > >> Imagine you write 0x12345678 at offset 0. Because PCI config space is > >> defined to be LE, in the PCI config space memory this gets stored as > >> > >> 78 56 34 12 > >> > >> The reason we do the internal storage of the config space that way is > >> that it's (in some PCI implementations) legal to access with single byte > >> granularities. So you could do a pci_config_read(offset = 1) which > >> should return 0x56. > >> > >> However, that means we completely nullify any effect of host endianness > >> in the PCI config layer already. So if you do pci_config_write(offset = > >> 0, size = 4, value = 0x12345678), the contents of d->config will always > >> be identical, regardless of host endianness. The same holds true for > >> pci_config_read(offset = 0, size = 4). It will always return 0x12345678. > >> > > > > I understood this from the beginning and I completely agree to this. > > > >> In your code, you swab that value again. I assume there's a reason > >> you're swapping it and that it's the way the architecture expects it > > > > Yes, s390 pcilg architecture states: > > Data in the PCI configuration space are treated > > as being in little-endian byte ordering > > > >> (mind to point me to the respective spec so I can verify?). But if the > >> architecture expects it, then it expects it regardless of host > >> endianness. The contents of regs[r1] should always be 0x78563412, no > >> matter whether we're in an LE or a BE environment. > >> > >> Does that make sense now? > >> > > Absolutely lets make an example for qemu running on BE and LE > > > > byte order config space backing pci_default_read_config pcilg (with > > cpu_to_le) > > BE 0x78563412 0x12345678 0x78563412 > > LE 0x78563412 0x78563412 0x78563412 > > No, pci_default_read_config() always returns 0x12345678 because it > returns a register, not memory. >
You mean implementation of pci_default_read_config is broken? If it should return a register it should not do "return le32_to_cpu(val);" > > Alex > > > > > So what is the problem with my code? Adding unconditional byte swap instead > > of > > cpu_to_le in pcilg would break architecture for pcilg if qemu is running on > > LE > > platform. > > > >> > >> Alex > >> > > >