Re: [PATCH v7 25/35] nvdimm acpi: init the resource used by NVDIMM ACPI

2015-11-05 Thread Igor Mammedov
On Mon,  2 Nov 2015 17:13:27 +0800
Xiao Guangrong  wrote:

> A page staring from 0xFF0 and IO port 0x0a18 - 0xa1b in guest are
   ^^ missing one 0???

> reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
> for detailed design
> 
> A parameter, 'nvdimm-support', is introduced for PIIX4_PM and ICH9-LPC
> that controls if nvdimm support is enabled, it is true on default and
> it is false on 2.4 and its earlier version to keep compatibility
> 
> Signed-off-by: Xiao Guangrong 
[...]

> @@ -33,6 +33,15 @@
>   */
>  #define MIN_NAMESPACE_LABEL_SIZE  (128UL << 10)
>  
> +/*
> + * A page staring from 0xFF0 and IO port 0x0a18 - 0xa1b in guest are
 ^^^ missing 0 or value in define below has an extra 0

> + * reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
> + * for detailed design.
> + */
> +#define NVDIMM_ACPI_MEM_BASE  0xFF00ULL
It still maps RAM at an arbitrary place; that's the reason why the
VMGenID patches haven't been merged for several months.
I'm not against using direct mapping (more exactly, I'm for it),
but we should first reach consensus on when and how to use it.

I wouldn't use addresses below 4G, as they may be used by firmware or
legacy hardware, and I won't bet that 0xFF00ULL won't conflict
with anything.
An alternative place to allocate the reserved range from could be high
memory. For pc we have "reserved-memory-end", which currently makes sure
that the hotpluggable memory range isn't used by firmware.

How about making an API that allows mapping additional memory
ranges before reserved-memory-end and pushes it up as mappings are
added?
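
As a rough illustration of that idea only (this is not existing QEMU
code): keep one moving boundary and hand out page-aligned ranges from
it, pushing the boundary up each time. The names ReservedArea and
reserved_area_alloc(), and the 4 KiB alignment, are assumptions made up
for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define RESERVED_ALIGN 0x1000ULL  /* assumed 4 KiB page alignment */

/* Hypothetical bookkeeping: 'end' plays the role of reserved-memory-end. */
typedef struct ReservedArea {
    uint64_t end;
} ReservedArea;

/* Round 'v' up to the next multiple of RESERVED_ALIGN. */
uint64_t reserved_align_up(uint64_t v)
{
    return (v + RESERVED_ALIGN - 1) & ~(RESERVED_ALIGN - 1);
}

/*
 * Reserve 'size' bytes at the current boundary and push the boundary up.
 * Returns the guest-physical base of the newly reserved range.
 */
uint64_t reserved_area_alloc(ReservedArea *area, uint64_t size)
{
    uint64_t base;

    assert(area != 0 && size > 0);
    base = reserved_align_up(area->end);
    area->end = base + reserved_align_up(size);
    return base;
}
```

With such a helper, firmware would only ever see the final value of
'end', so it would not matter how many devices asked for a range.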

Michael, Paolo what do you think about it?


> +#define NVDIMM_ACPI_IO_BASE   0x0a18
> +#define NVDIMM_ACPI_IO_LEN    4
> +
>  #define TYPE_NVDIMM  "nvdimm"
>  #define NVDIMM(obj)  OBJECT_CHECK(NVDIMMDevice, (obj), TYPE_NVDIMM)
>  #define NVDIMM_CLASS(oc) OBJECT_CLASS_CHECK(NVDIMMClass, (oc), TYPE_NVDIMM)
> @@ -80,4 +89,29 @@ struct NVDIMMClass {
>  };
>  typedef struct NVDIMMClass NVDIMMClass;
>  
> +/*
> + * AcpiNVDIMMState:
> + * @is_enabled: detect if NVDIMM support is enabled.
> + *
> + * @fit: fit buffer which will be accessed via ACPI _FIT method. It is
> + *   dynamically built based on current NVDIMM devices so that it does
> + *   not require to keep consistent during live migration.
> + *
> + * @ram_mr: RAM-based memory region which is mapped into guest address
> + *  space and used to transfer data between OSPM and QEMU.
> + * @io_mr: the IO region used by OSPM to transfer control to QEMU.
> + */
> +struct AcpiNVDIMMState {
> +bool is_enabled;
> +
> +GArray *fit;
> +
> +MemoryRegion ram_mr;
> +MemoryRegion io_mr;
> +};
> +typedef struct AcpiNVDIMMState AcpiNVDIMMState;
> +
> +/* Initialize the memory and IO region needed by NVDIMM ACPI emulation.*/
> +void nvdimm_init_acpi_state(MemoryRegion *memory, MemoryRegion *io,
> +Object *owner, AcpiNVDIMMState *state);
>  #endif

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/3] target-i386: calculate vcpu's TSC rate to be migrated

2015-11-05 Thread Haozhong Zhang
On 11/05/15 09:05, Christian Borntraeger wrote:
> Am 02.11.2015 um 10:40 schrieb James Hogan:
> > On Mon, Nov 02, 2015 at 05:26:42PM +0800, Haozhong Zhang wrote:
> >> The value of the migrated vcpu's TSC rate is determined as below.
> >>  1. If a TSC rate is specified by the cpu option 'tsc-freq', then this
> >> user-specified value will be used.
> >>  2. If neither a user-specified TSC rate nor a migrated TSC rate is
> >> present, we will use the TSC rate from KVM (returned by
> >> KVM_GET_TSC_KHZ).
> >>  3. Otherwise, we will use the migrated TSC rate.
> >>
> >> Signed-off-by: Haozhong Zhang 
> >> ---
> >>  include/sysemu/kvm.h |  2 ++
> >>  kvm-all.c|  1 +
> >>  target-arm/kvm.c |  5 +
> >>  target-i386/kvm.c| 33 +
> >>  target-mips/kvm.c|  5 +
> >>  target-ppc/kvm.c |  5 +
> >>  target-s390x/kvm.c   |  5 +
> >>  7 files changed, 56 insertions(+)
> >>
> >> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> >> index 461ef65..0ec8b98 100644
> >> --- a/include/sysemu/kvm.h
> >> +++ b/include/sysemu/kvm.h
> >> @@ -328,6 +328,8 @@ int kvm_arch_fixup_msi_route(struct 
> >> kvm_irq_routing_entry *route,
> >>  
> >>  int kvm_arch_msi_data_to_gsi(uint32_t data);
> >>  
> >> +int kvm_arch_setup_tsc_khz(CPUState *cpu);
> >> +
> >>  int kvm_set_irq(KVMState *s, int irq, int level);
> >>  int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg);
> >>  
> >> diff --git a/kvm-all.c b/kvm-all.c
> >> index c442838..1ecaf04 100644
> >> --- a/kvm-all.c
> >> +++ b/kvm-all.c
> >> @@ -1757,6 +1757,7 @@ static void do_kvm_cpu_synchronize_post_init(void 
> >> *arg)
> >>  {
> >>  CPUState *cpu = arg;
> >>  
> >> +kvm_arch_setup_tsc_khz(cpu);
> > 
> > Sorry if this is a stupid question, but why aren't you doing this from
> > the i386 kvm_arch_put_registers when level == KVM_PUT_FULL_STATE, rather
> > than introducing x86 specifics to the generic KVM api?
> > 
> > Cheers
> > James
> 
> I agree. We should try to keep this in x86 code.
> 
> 

As in another reply, I'm going to move the above line to
kvm_arch_put_registers() of target-i386 so that it will not pollute
other targets.
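
For reference, the selection order from the quoted commit message can
be written as a small pure function. This is only a sketch:
pick_tsc_khz() and its parameters are hypothetical names standing in
for the 'tsc-freq' option, the rate loaded from the migration stream
and the value returned by KVM_GET_TSC_KHZ; 0 means "not set".

```c
#include <assert.h>
#include <stdint.h>

/* Pick the TSC rate (in kHz) to be migrated; 0 means "not set". */
int64_t pick_tsc_khz(int64_t user_khz, int64_t migrated_khz, int64_t kvm_khz)
{
    /* Rates are frequencies, so negative inputs are a caller bug. */
    assert(user_khz >= 0 && migrated_khz >= 0 && kvm_khz >= 0);

    if (user_khz) {
        return user_khz;      /* 1. a user-specified value wins */
    }
    if (!migrated_khz) {
        return kvm_khz;       /* 2. nothing migrated: use KVM's rate */
    }
    return migrated_khz;      /* 3. otherwise keep the migrated rate */
}
```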

Haozhong


Re: [PATCH v7 20/35] dimm: get mapped memory region from DIMMDeviceClass->get_memory_region

2015-11-05 Thread Vladimir Sementsov-Ogievskiy

On 03.11.2015 17:47, Xiao Guangrong wrote:



On 11/03/2015 12:16 AM, Vladimir Sementsov-Ogievskiy wrote:

On 02.11.2015 18:06, Xiao Guangrong wrote:



On 11/02/2015 10:26 PM, Vladimir Sementsov-Ogievskiy wrote:

On 02.11.2015 16:08, Xiao Guangrong wrote:



On 11/02/2015 08:19 PM, Vladimir Sementsov-Ogievskiy wrote:

On 02.11.2015 12:13, Xiao Guangrong wrote:

Currently, the memory region of backed memory is directly mapped to
guest's address space, however, it is not true for nvdimm device

This patch let dimm device realize this fact and use
DIMMDeviceClass->get_memory_region method to get the mapped memory
region

Current code did not check the return value of get_memory_region as it
assumed the backend memory of pc-dimm is always properly initialized,
we make get_memory_region internally catch the case if something is
wrong


but here you call not pc-dimm's get_memory_region, but common
ddc->get_memory_region, which may be nvdimm or possibly other future
dimm, so, why not check it here? And then pc_dimm_get_memory_region
may be left untouched (error_abort is ok, because errp is unused).


Hmm, because 'here' is not the only place calling ->get_memory_region,
this method has multiple callers:

$ git grep "\->get_memory_region"
hw/i386/pc.c:MemoryRegion *mr = ddc->get_memory_region(dimm);
hw/i386/pc.c:MemoryRegion *mr = ddc->get_memory_region(dimm);
hw/mem/dimm.c:mr = ddc->get_memory_region(dimm);
hw/mem/nvdimm.c:ddc->get_memory_region = nvdimm_get_memory_region;
hw/mem/pc-dimm.c:ddc->get_memory_region = pc_dimm_get_memory_region;
hw/ppc/spapr.c:MemoryRegion *mr = ddc->get_memory_region(dimm);

memory region validation is also done for NVDIMM in nvdimm device.

Ok, then it should be documented by a comment in dimm.h, where
DIMMDeviceClass is defined, that this function should not fail



Okay, how about this comment:

/*
 * get the memory region which will be mapped into guest's address
 * space. It is called after dimm device realized so it is never
 * failed.
 */
MemoryRegion *(*get_memory_region)(DIMMDevice *dimm);


if you don't mind:
s/it is never failed/it should never fail and assumed to return valid not-NULL address


I'm ok with this if others don't mind, but personally I prefer explicit
error handling for such functions.
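
To make the trade-off concrete, here is a minimal sketch of the
documented "should never fail" contract being checked in one accessor
instead of at every caller. Region, Dimm, dimm_own_region() and
dimm_region() are hypothetical stand-ins for illustration, not the QEMU
types or functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for MemoryRegion and DIMMDevice. */
typedef struct Region {
    size_t size;
} Region;

typedef struct Dimm Dimm;
struct Dimm {
    /* Contract: called after realize, must return a valid non-NULL region. */
    Region *(*get_memory_region)(Dimm *dimm);
    Region region;
};

/* Example callback for a device that simply exposes its own region. */
Region *dimm_own_region(Dimm *dimm)
{
    return &dimm->region;
}

/* Single accessor for all callers; the contract is asserted here once. */
Region *dimm_region(Dimm *dimm)
{
    Region *mr = dimm->get_memory_region(dimm);

    assert(mr != NULL);
    return mr;
}
```

The alternative discussed above, explicit error handling, would instead
pass an error out of the accessor and make each caller deal with it.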





--
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.



Re: [PATCH v3 2/3] target-i386: calculate vcpu's TSC rate to be migrated

2015-11-05 Thread Christian Borntraeger
Am 02.11.2015 um 10:40 schrieb James Hogan:
> On Mon, Nov 02, 2015 at 05:26:42PM +0800, Haozhong Zhang wrote:
>> The value of the migrated vcpu's TSC rate is determined as below.
>>  1. If a TSC rate is specified by the cpu option 'tsc-freq', then this
>> user-specified value will be used.
>>  2. If neither a user-specified TSC rate nor a migrated TSC rate is
>> present, we will use the TSC rate from KVM (returned by
>> KVM_GET_TSC_KHZ).
>>  3. Otherwise, we will use the migrated TSC rate.
>>
>> Signed-off-by: Haozhong Zhang 
>> ---
>>  include/sysemu/kvm.h |  2 ++
>>  kvm-all.c|  1 +
>>  target-arm/kvm.c |  5 +
>>  target-i386/kvm.c| 33 +
>>  target-mips/kvm.c|  5 +
>>  target-ppc/kvm.c |  5 +
>>  target-s390x/kvm.c   |  5 +
>>  7 files changed, 56 insertions(+)
>>
>> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
>> index 461ef65..0ec8b98 100644
>> --- a/include/sysemu/kvm.h
>> +++ b/include/sysemu/kvm.h
>> @@ -328,6 +328,8 @@ int kvm_arch_fixup_msi_route(struct 
>> kvm_irq_routing_entry *route,
>>  
>>  int kvm_arch_msi_data_to_gsi(uint32_t data);
>>  
>> +int kvm_arch_setup_tsc_khz(CPUState *cpu);
>> +
>>  int kvm_set_irq(KVMState *s, int irq, int level);
>>  int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg);
>>  
>> diff --git a/kvm-all.c b/kvm-all.c
>> index c442838..1ecaf04 100644
>> --- a/kvm-all.c
>> +++ b/kvm-all.c
>> @@ -1757,6 +1757,7 @@ static void do_kvm_cpu_synchronize_post_init(void *arg)
>>  {
>>  CPUState *cpu = arg;
>>  
>> +kvm_arch_setup_tsc_khz(cpu);
> 
> Sorry if this is a stupid question, but why aren't you doing this from
> the i386 kvm_arch_put_registers when level == KVM_PUT_FULL_STATE, rather
> than introducing x86 specifics to the generic KVM api?
> 
> Cheers
> James

I agree. We should try to keep this in x86 code.




Re: [PATCH v4 0/3] KVM: arm/arm64: Clean up some obsolete code

2015-11-05 Thread Peter Maydell
On 5 November 2015 at 06:50, Pavel Fedin wrote:
>  You know, since we are talking about this...  This definitely
> has something to do with the reset, and... Looks like nobody
> resets vGIC/vTimer, unless the userland does it explicitly by
> resetting every register by hand.

This is how KVM in-kernel device reset is supposed to work, yes.

thanks
-- PMM


Re: [PATCH/RFC 0/4] dma ops and virtio

2015-11-05 Thread Cornelia Huck
On Wed, 4 Nov 2015 10:14:11 -0800
Andy Lutomirski  wrote:

> On Wed, Nov 4, 2015 at 9:52 AM, Cornelia Huck  
> wrote:
> > On Wed, 4 Nov 2015 15:29:23 +0100
> > Cornelia Huck  wrote:
> >
> >> I'm currently suspecting some endianness issues, probably with the ecw
> >> accesses, which is why I'd be interested in your trace information (as
> >> I currently don't have a LE test setup at hand.)
> >
> > I think I've got it. We have sense_data as a byte array, which
> > implicitly makes it BE already. When we copy to the ecws while building
> > the irb, the data ends up in 32 bit values. The conversion from host
> > endianness to BE now treats them as LE on your system...
> >
> > Could you please give the following qemu patch a try?
> 
> Tested-by: Andy Lutomirski 
> 
> Now my test script panics for the right reason (init isn't actually an
> s390 binary).  Thanks!

Cool, thanks for testing! I'll get this into qemu as a proper patch.
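
For readers following along, the class of bug described above can be
sketched as follows. This is an illustration under assumptions, not the
actual qemu patch: load_be32(), store_be32() and sense_to_words() are
made-up helper names. The point is that bytes already in big-endian
order must be assembled into 32-bit values by value, so that a later
host-to-BE conversion reproduces the original bytes on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble a 32-bit value from 4 bytes stored in big-endian order. */
uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Store a 32-bit value as 4 big-endian bytes (host-to-BE conversion). */
void store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/*
 * Copy byte-array sense data into 32-bit words so that converting each
 * word back to big-endian yields the original byte sequence.
 */
void sense_to_words(const uint8_t *sense, uint32_t *words, size_t nwords)
{
    assert(sense != NULL && words != NULL);
    for (size_t i = 0; i < nwords; i++) {
        words[i] = load_be32(sense + 4 * i);
    }
}
```

A plain memcpy() into the words followed by a host-to-BE swap would
give the same result on a BE host but scramble the bytes on an LE one.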



Re: [PATCH 3/3] s390/dma: Allow per device dma ops

2015-11-05 Thread Christian Borntraeger
Am 03.11.2015 um 13:26 schrieb Cornelia Huck:
> On Tue,  3 Nov 2015 12:54:39 +0100
> Christian Borntraeger  wrote:
> 
>> As virtio-ccw now has dma ops, we can no longer default to the PCI ones.
>> Make use of dev_archdata to keep the dma_ops per device. The pci devices
>> now use that to override the default, and the default is changed to use
>> the noop ops for everything that is not PCI. To compile without PCI
>> support we also have to enable the DMA api with virtio.
> 
> Not only with virtio, but generally, right?

Yes, will update the patch description.
> 
>> Signed-off-by: Christian Borntraeger 
>> Reviewed-by: Joerg Roedel 
>> Acked-by: Sebastian Ott 
>> ---
>>  arch/s390/Kconfig   | 3 ++-
>>  arch/s390/include/asm/device.h  | 6 +-
>>  arch/s390/include/asm/dma-mapping.h | 6 --
>>  arch/s390/pci/pci.c | 1 +
>>  arch/s390/pci/pci_dma.c | 4 ++--
>>  5 files changed, 14 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
>> index 1d57000..04f0e02 100644
>> --- a/arch/s390/Kconfig
>> +++ b/arch/s390/Kconfig
>> @@ -113,6 +113,7 @@ config S390
>>  select GENERIC_FIND_FIRST_BIT
>>  select GENERIC_SMP_IDLE_THREAD
>>  select GENERIC_TIME_VSYSCALL
>> +select HAS_DMA
>>  select HAVE_ALIGNED_STRUCT_PAGE if SLUB
>>  select HAVE_ARCH_AUDITSYSCALL
>>  select HAVE_ARCH_EARLY_PFN_TO_NID
>> @@ -124,6 +125,7 @@ config S390
>>  select HAVE_CMPXCHG_DOUBLE
>>  select HAVE_CMPXCHG_LOCAL
>>  select HAVE_DEBUG_KMEMLEAK
>> +select HAVE_DMA_ATTRS
>>  select HAVE_DYNAMIC_FTRACE
>>  select HAVE_DYNAMIC_FTRACE_WITH_REGS
>>  select HAVE_FTRACE_MCOUNT_RECORD
>> @@ -580,7 +582,6 @@ config QDIO
>>
>>  menuconfig PCI
>>  bool "PCI support"
>> -select HAVE_DMA_ATTRS
>>  select PCI_MSI
>>  help
>>Enable PCI support.
> 
> Hm. Further down in this file, there's
> 
> config HAS_DMA
>   
> def_bool PCI  
>   
> select HAVE_DMA_API_DEBUG
> 
> Should we maybe select HAVE_DMA_API_DEBUG above, drop the HAS_DMA
> config option and rely on not defining NO_DMA instead?

Hmm, yes. That would simplify things a lot.  Right now we include
lib/Kconfig (which defines HAS_DMA) and define it ourselves in
arch/s390/Kconfig. Whoever comes first wins. Adding a select statement
would make this even more complicated.

Andy, I will simply send you a respin of this patch.



Re: [v2] KVM: VMX: Fix commit which broke PML

2015-11-05 Thread Paolo Bonzini


On 05/11/2015 03:04, Kai Huang wrote:
> 
> Thanks for applying! I am really sorry that I forgot to delete the line
> that clears SECONDARY_EXEC_ENABLE_PML bit in vmx_disable_pml, which is
> renamed to vmx_destroy_pml_buffer now.
> It won't impact functionality but to make the function consistent, would
> you also do below? Sorry for such negligence!
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 89f4fa2..ef4ca76 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -7826,8 +7826,6 @@ static void vmx_destroy_pml_buffer(struct vcpu_vmx
> *vmx)
> ASSERT(vmx->pml_pg);
> __free_page(vmx->pml_pg);
> vmx->pml_pg = NULL;
> -
> -   vmcs_clear_bits(SECONDARY_VM_EXEC_CONTROL,
> SECONDARY_EXEC_ENABLE_PML);
>  }

No problem.  I haven't yet pushed to kvm/next, so I can change this commit.

Thanks for the quick response.

Paolo


Re: [PATCH v7 25/35] nvdimm acpi: init the resource used by NVDIMM ACPI

2015-11-05 Thread Xiao Guangrong



On 11/05/2015 05:58 PM, Igor Mammedov wrote:
> On Mon,  2 Nov 2015 17:13:27 +0800
> Xiao Guangrong  wrote:
>
>> A page staring from 0xFF0 and IO port 0x0a18 - 0xa1b in guest are
> ^^ missing one 0???
>
>> reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
>> for detailed design
>>
>> A parameter, 'nvdimm-support', is introduced for PIIX4_PM and ICH9-LPC
>> that controls if nvdimm support is enabled, it is true on default and
>> it is false on 2.4 and its earlier version to keep compatibility
>>
>> Signed-off-by: Xiao Guangrong 
> [...]
>
>> @@ -33,6 +33,15 @@
>>    */
>>   #define MIN_NAMESPACE_LABEL_SIZE  (128UL << 10)
>>
>> +/*
>> + * A page staring from 0xFF0 and IO port 0x0a18 - 0xa1b in guest are
>   ^^^ missing 0 or value in define below has an extra 0
>
>> + * reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
>> + * for detailed design.
>> + */
>> +#define NVDIMM_ACPI_MEM_BASE  0xFF00ULL
> it still maps RAM at arbitrary place,
> that's the reason why VMGenID patches hasn't been merged for
> more than several months.
> I'm not against of using (more exactly I'm for it) direct mapping
> but we should reach consensus when and how to use it first.
>
> I'd wouldn't use addresses below 4G as it may be used firmware or
> legacy hardware and I won't bet that 0xFF00ULL won't conflict
> with anything.
> An alternative place to allocate reserve from could be high memory.
> For pc we have "reserved-memory-end" which currently makes sure
> that hotpluggable memory range isn't used by firmware.
>
> How about making API that allows to map additional memory
> ranges before reserved-memory-end and pushes it up as mappings are
> added.


That is what I did in the v1/v2 versions; however, as you noticed, using
64-bit addresses in ACPI in QEMU is not easy work. We cannot simply set
SSDT.rev = 2 to get 64-bit addresses, for the reason I documented in
v3's changelog:

  3) we figure out a unused memory hole below 4G that is 0xFF0 ~
 0xFFF0, this range is large enough for NVDIMM ACPI as build 64-bit
 ACPI SSDT/DSDT table will break windows XP.
 BTW, only make SSDT.rev = 2 can not work since the width is only depended
 on DSDT.rev based on 19.6.28 DefinitionBlock (Declare Definition Block)
 in ACPI spec:
| Note: For compatibility with ACPI versions before ACPI 2.0, the bit
| width of Integer objects is dependent on the ComplianceRevision of the DSDT.
| If the ComplianceRevision is less than 2, all integers are restricted to 32
| bits. Otherwise, full 64-bit integers are used. The version of the DSDT sets
| the global integer width for all integers, including integers in SSDTs.
  4) use the lowest ACPI spec version to document AML terms.

The only way to introduce 64-bit addresses is to add XSDT support, which
is what Michael did before; however, it seems to need great effort, as
it will break OVMF. It's a long-term workload. :(

The lucky thing is that the ACPI part is not ABI; we can move it to high
memory once XSDT support in QEMU is ready in future development.

Thanks!


Re: [PATCH v7 25/35] nvdimm acpi: init the resource used by NVDIMM ACPI

2015-11-05 Thread Igor Mammedov
On Thu, 5 Nov 2015 18:15:31 +0800
Xiao Guangrong  wrote:

> 
> 
> On 11/05/2015 05:58 PM, Igor Mammedov wrote:
> > On Mon,  2 Nov 2015 17:13:27 +0800
> > Xiao Guangrong  wrote:
> >
> >> A page staring from 0xFF0 and IO port 0x0a18 - 0xa1b in guest are
> > ^^ missing one 0???
> >
> >> reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
> >> for detailed design
> >>
> >> A parameter, 'nvdimm-support', is introduced for PIIX4_PM and ICH9-LPC
> >> that controls if nvdimm support is enabled, it is true on default and
> >> it is false on 2.4 and its earlier version to keep compatibility
> >>
> >> Signed-off-by: Xiao Guangrong 
> > [...]
> >
> >> @@ -33,6 +33,15 @@
> >>*/
> >>   #define MIN_NAMESPACE_LABEL_SIZE  (128UL << 10)
> >>
> >> +/*
> >> + * A page staring from 0xFF0 and IO port 0x0a18 - 0xa1b in guest are
> >   ^^^ missing 0 or value in define below 
> > has an extra 0
> >
> >> + * reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
> >> + * for detailed design.
> >> + */
> >> +#define NVDIMM_ACPI_MEM_BASE  0xFF00ULL
> > it still maps RAM at arbitrary place,
> > that's the reason why VMGenID patches hasn't been merged for
> > more than several months.
> > I'm not against of using (more exactly I'm for it) direct mapping
> > but we should reach consensus when and how to use it first.
> >
> > I'd wouldn't use addresses below 4G as it may be used firmware or
> > legacy hardware and I won't bet that 0xFF00ULL won't conflict
> > with anything.
> > An alternative place to allocate reserve from could be high memory.
> > For pc we have "reserved-memory-end" which currently makes sure
> > that hotpluggable memory range isn't used by firmware.
> >
> > How about making API that allows to map additional memory
> > ranges before reserved-memory-end and pushes it up as mappings are
> > added.
> 
> That what i did in the v1/v2 versions, however, as you noticed, using 64-bit
> address in ACPI in QEMU is not a easy work - we can not simply make
> SSDT.rev = 2 to apply 64 bit address, the reason i have documented in
> v3's changelog:
> 
>3) we figure out a unused memory hole below 4G that is 0xFF0 ~
>   0xFFF0, this range is large enough for NVDIMM ACPI as build 64-bit
>   ACPI SSDT/DSDT table will break windows XP.
>   BTW, only make SSDT.rev = 2 can not work since the width is only 
> depended
>   on DSDT.rev based on 19.6.28 DefinitionBlock (Declare Definition Block)
>   in ACPI spec:
> | Note: For compatibility with ACPI versions before ACPI 2.0, the bit
> | width of Integer objects is dependent on the ComplianceRevision of the DSDT.
> | If the ComplianceRevision is less than 2, all integers are restricted to 32
> | bits. Otherwise, full 64-bit integers are used. The version of the DSDT sets
> | the global integer width for all integers, including integers in SSDTs.
>4) use the lowest ACPI spec version to document AML terms.
> 
> The only way introducing 64 bit address is adding XSDT support that what
> Michael did before, however, it seems it need great efforts to do it as
> it will break OVMF. It's a long term workload. :(
To enable 64-bit integers in AML it's sufficient to change the DSDT
revision to 2; I already have a patch that switches DSDT/SSDT to rev2.
Tests show it doesn't break Windows XP (which is rev1) and uses 64-bit
integers on Linux and later Windows versions.
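
The rule from the quoted ACPI spec note boils down to a one-line check.
The sketch below is only an illustration; aml_integer_bits() and
aml_integer_can_hold() are made-up names, not QEMU or ACPICA functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Width in bits of every AML Integer, including those in SSDTs,
 * as determined by the ComplianceRevision of the DSDT.
 */
unsigned aml_integer_bits(uint8_t dsdt_revision)
{
    return dsdt_revision < 2 ? 32 : 64;
}

/* Can 'addr' be held in an AML Integer without truncation? */
bool aml_integer_can_hold(uint8_t dsdt_revision, uint64_t addr)
{
    unsigned bits = aml_integer_bits(dsdt_revision);

    assert(bits == 32 || bits == 64);
    return bits == 64 || addr <= UINT32_MAX;
}
```

So an address above 4G only fits once the DSDT itself is at revision 2
or later, whatever the revision of the SSDT that uses it.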

> 
> The luck thing is, the ACPI part is not ABI, we can move it to the high
> memory if QEMU supports XSDT is ready in future development.
But the mapped control region is ABI, and we can't change it if we find
out later that it breaks something.

> 
> Thanks!



Re: [PATCH v3 3/3] target-i386: load the migrated vcpu's TSC rate

2015-11-05 Thread Eduardo Habkost
On Mon, Nov 02, 2015 at 05:26:43PM +0800, Haozhong Zhang wrote:
> Set vcpu's TSC rate to the migrated value if the user does not specify a
> TSC rate by cpu option 'tsc-freq' and a migrated TSC rate does exist. If
> KVM supports TSC scaling, guest programs will observe TSC increasing in
> the migrated rate other than the host TSC rate.
> 
> Signed-off-by: Haozhong Zhang 
> ---
>  target-i386/kvm.c | 21 +
>  1 file changed, 21 insertions(+)
> 
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index aae5e58..2be70df 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -3042,6 +3042,27 @@ int kvm_arch_setup_tsc_khz(CPUState *cs)
>  int r;
>  
>  /*
> + * If a TSC rate is migrated and the user does not specify the
> + * vcpu's TSC rate on the destination, the migrated TSC rate will
> + * be used on the destination after the migration.
> + */
> +if (env->tsc_khz_saved && !env->tsc_khz) {
> +if (kvm_check_extension(cs->kvm_state, KVM_CAP_TSC_CONTROL)) {
> +r = kvm_vcpu_ioctl(cs, KVM_SET_TSC_KHZ, env->tsc_khz_saved);

Why are you duplicating the existing KVM_SET_TSC_KHZ code in
kvm_arch_init_vcpu()?

> +if (r < 0) {
> +fprintf(stderr, "KVM_SET_TSC_KHZ failed\n");

If you want to report errors, please use error_report().

(But I don't think we want to print those warnings. See below.)

> +}
> +} else {
> +r = -1;
> +fprintf(stderr, "KVM doesn't support TSC scaling\n");
> +}
> +if (r < 0) {
> +fprintf(stderr, "Use host TSC frequency instead. "

Did you mean "Using host TSC frequency instead."?

> +"Guest TSC may be inaccurate.\n");
> +}
> +}

This will make QEMU print a warning every single time when migrating to
hosts that don't support TSC scaling, even if the source and destination
hosts already have the same TSC frequency. That means most users will
see a bogus warning on today's hardware.

Maybe it will be acceptable to print a warning if (and only if) we know
that the host TSC is different from the original TSC frequency.

Considering that we already have code to handle tsc_khz that prints an
error, you don't need to duplicate it. You could handle both
user-provided and migration tsc_khz cases with the same code. With
something like this:

    /* may be set by the user, or loaded from incoming migration */
    if (env->tsc_khz) {
        r = kvm_check_extension(cs->kvm_state, KVM_CAP_TSC_CONTROL) ?
            kvm_vcpu_ioctl(cs, KVM_SET_TSC_KHZ, env->tsc_khz) :
            -ENOTSUP;
        if (r < 0) {
            int64_t cur_freq =
                kvm_check_extension(cs->kvm_state, KVM_CAP_GET_TSC_KHZ) ?
                kvm_vcpu_ioctl(cs, KVM_GET_TSC_KHZ) :
                0;
            /* If we know the host frequency, print a warning every time
             * there's a mismatch.
             * If we don't know the host frequency, print a warning only
             * if the user asked for a specific TSC frequency.
             */
            if ((cur_freq <= 0 && env->tsc_freq_requested_by_user) ||
                (cur_freq > 0 && cur_freq != env->tsc_khz)) {
                error_report("warning: TSC frequency mismatch between VM and "
                             "host, and TSC scaling unavailable");
                if (env->tsc_freq_requested_by_user) {
                    return r;
                }
            }
        }
    }

You will just need a new tsc_freq_requested_by_user field to track if
the TSC frequency was explicitly requested by the user.
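
The warning condition above can also be isolated as a pure predicate,
which makes the cases easy to check. This is a standalone sketch with a
made-up name, tsc_mismatch_should_warn(); it mirrors only the condition
and does not touch KVM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether to warn after the requested TSC rate could not be set.
 * host_khz <= 0 means the host frequency is unknown.
 */
bool tsc_mismatch_should_warn(int64_t host_khz, int64_t wanted_khz,
                              bool requested_by_user)
{
    assert(wanted_khz > 0);

    if (host_khz <= 0) {
        /* Unknown host rate: warn only for an explicit user request. */
        return requested_by_user;
    }
    /* Known host rate: warn whenever it differs from the wanted rate. */
    return host_khz != wanted_khz;
}
```

In particular, migrating between hosts with identical TSC frequencies
and no TSC scaling would stay silent.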

-- 
Eduardo


Re: [PATCH] Fixed KVM problems with old DOS programs. Compatibility can be forced by module parameter.

2015-11-05 Thread Gerhard Wiesinger

On 04.11.2015 23:04, Paolo Bonzini wrote:


On 04/11/2015 22:33, Gerhard Wiesinger wrote:

What is the problem you are seeing?  KVM can emulate task switches; the
intercept is set here because of a processor erratum that can mess them
up even though, in theory, AMD supports task switching from guest mode.

See old thread:
https://lists.nongnu.org/archive/html/qemu-devel/2012-04/msg01506.html

Can you obtain the traces you were asked for at the time?


./trace-cmd record -b 2 -e kvm
./trace-cmd report | grep -i task_switch
 qemu-system-x86-6024  [001] 792774.719297: kvm_exit: reason task_switch rip 0x4883 info 158 40


But I can't interpret it. I do know that my patch works well. Since it
is just a module parameter, it is fully backward compatible by default,
and because it is a one-liner no side effects are possible. So an
integration would be good.


Ciao,
Gerhard



Re: [PATCH] Fixed KVM problems with old DOS programs. Compatibility can be forced by module parameter.

2015-11-05 Thread Paolo Bonzini


On 05/11/2015 17:07, Gerhard Wiesinger wrote:
>>
> 
> ./trace-cmd record -b 2 -e kvm
> ./trace-cmd report | grep -i task_switch
>  qemu-system-x86-6024  [001] 792774.719297: kvm_exit: reason task_switch
> rip 0x4883 info 158 40

0x158 is the segment selector of the incoming TSS, and the task switch
was caused by a far jump.
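
For anyone decoding such values by hand: an x86 segment selector packs
a table index, a table indicator and a privilege level. A small sketch
(SegSelector and decode_selector() are names made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an x86 segment selector. */
typedef struct SegSelector {
    unsigned index; /* bits 15..3: descriptor table index */
    unsigned ti;    /* bit 2: 0 = GDT, 1 = LDT */
    unsigned rpl;   /* bits 1..0: requested privilege level */
} SegSelector;

SegSelector decode_selector(uint16_t sel)
{
    SegSelector s;

    s.index = sel >> 3;
    s.ti = (sel >> 2) & 1;
    s.rpl = sel & 3;
    assert(s.index < 8192); /* the index field is 13 bits wide */
    return s;
}
```

Decoding 0x158 this way gives GDT entry 43 with RPL 0.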

> But I can't interpret it.

Neither can I; you have to send the whole trace.

> But I know my patch works well. Since it is
> just a module parameter it is fully backward compatible by default and
> because of the one liner no side effects are possible. So a intergration
> would be good.

It's also papering over a bug, and likely the bug still triggers on
Intel systems.  So it's not acceptable.

Can you provide reproduction instructions please?

Paolo


Re: [PATCH 2/2] kvmtool: assume dead vcpus are paused too

2015-11-05 Thread Sasha Levin
On 11/05/2015 09:41 AM, Will Deacon wrote:
> On Wed, Nov 04, 2015 at 06:51:12PM -0500, Sasha Levin wrote:
>> > On 11/04/2015 06:51 AM, Will Deacon wrote:
>>> > > +   mutex_lock(&pause_lock);
>>> > > +
>>> > > +   /* The kvm->cpus array contains a null pointer in the last 
>>> > > location */
>>> > > +   for (i = 0; ; i++) {
>>> > > +   if (kvm->cpus[i])
>>> > > +   pthread_kill(kvm->cpus[i]->thread, SIGKVMEXIT);
>>> > > +   else
>>> > > +   break;
>>> > > +   }
>>> > > +
>>> > > +   kvm__continue(kvm);
>> > 
>> > In this scenario: if we grabbed pause_lock, signaled vcpu0 to exit, and it 
>> > did
>> > before we called kvm__continue(), we won't end up releasing pause_lock, 
>> > which
>> > might cause a lockup later, no?
> Hmm, yeah, maybe that should be an explicit mutex_unlock rather than a
> call to kvm__continue.

Yeah, that should do the trick.


Thanks,
Sasha


Re: [PATCH] Fixed KVM problems with old DOS programs. Compatibility can be forced by module parameter.

2015-11-05 Thread Paolo Bonzini


On 05/11/2015 17:15, Paolo Bonzini wrote:
> 
> 
> On 05/11/2015 17:07, Gerhard Wiesinger wrote:
>>>
>>
>> ./trace-cmd record -b 2 -e kvm
>> ./trace-cmd report | grep -i task_switch
>>  qemu-system-x86-6024  [001] 792774.719297: kvm_exit: reason task_switch
>> rip 0x4883 info 158 40
> 
> 0x158 is the segment selector of the incoming TSS, and the task switch
> was caused by a far jump.
> 
>> But I can't interpret it.
> 
> Neither can I; you have to send the whole trace.
> 
>> But I know my patch works well. Since it is
>> just a module parameter it is fully backward compatible by default and
> >> because of the one-liner no side effects are possible. So an integration
>> would be good.
> 
> It's also papering over a bug, and likely the bug still triggers on
> Intel systems.  So it's not acceptable.
> 
> Can you provide reproduction instructions please?

At the very least, does it reproduce without KVM?

Paolo


Re: [PATCH v3 2/3] target-i386: calculate vcpu's TSC rate to be migrated

2015-11-05 Thread Eduardo Habkost
On Thu, Nov 05, 2015 at 09:30:51AM +0800, Haozhong Zhang wrote:
> On 11/04/15 19:42, Eduardo Habkost wrote:
> > On Mon, Nov 02, 2015 at 05:26:42PM +0800, Haozhong Zhang wrote:
> > > The value of the migrated vcpu's TSC rate is determined as below.
> > >  1. If a TSC rate is specified by the cpu option 'tsc-freq', then this
> > > user-specified value will be used.
> > >  2. If neither a user-specified TSC rate nor a migrated TSC rate is
> > > present, we will use the TSC rate from KVM (returned by
> > > KVM_GET_TSC_KHZ).
> > >  3. Otherwise, we will use the migrated TSC rate.
> > > 
> > > Signed-off-by: Haozhong Zhang 
> > [...]
> > > diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> > > index 64046cb..aae5e58 100644
> > > --- a/target-i386/kvm.c
> > > +++ b/target-i386/kvm.c
> > > @@ -3034,3 +3034,36 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
> > >  {
> > >  abort();
> > >  }
> > > +
> > > +int kvm_arch_setup_tsc_khz(CPUState *cs)
> > > +{
> > > +X86CPU *cpu = X86_CPU(cs);
> > > +CPUX86State *env = &cpu->env;
> > > +int r;
> > > +
> > > +/*
> > > + * Prepare vcpu's TSC rate to be migrated.
> > > + *
> > > + * - If the user specifies the TSC rate by cpu option 'tsc-freq',
> > > + *   we will use the user-specified value.
> > > + *
> > > + * - If there is neither user-specified TSC rate nor migrated TSC
> > > + *   rate, we will ask KVM for the TSC rate by calling
> > > + *   KVM_GET_TSC_KHZ.
> > > + *
> > > + * - Otherwise, if there is a migrated TSC rate, we will use the
> > > + *   migrated value.
> > > + */
> > > +if (env->tsc_khz) {
> > > +env->tsc_khz_saved = env->tsc_khz;
> > > +} else if (!env->tsc_khz_saved) {
> > > +r = kvm_vcpu_ioctl(cs, KVM_GET_TSC_KHZ);
> > > +if (r < 0) {
> > > +fprintf(stderr, "KVM_GET_TSC_KHZ failed\n");
> > > +return r;
> > > +}
> > 
> > The lack of KVM_CAP_GET_TSC_KHZ should make QEMU abort, unless the user
> > is explicitly requesting a more strict mode where the TSC frequency will
> > be guaranteed to never change.
> >
> 
> I agree KVM_CAP_GET_TSC_KHZ should be checked before KVM_GET_TSC_KHZ,
> but I don't think the lack of it should abort QEMU.


Oops, I meant to write: "the lack of KVM_CAP_GET_TSC_KHZ should not
abort QEMU".

> This piece of code
> on the source machine is just to get the TSC frequency to be
> migrated. If it fails, it will leave env->tsc_khz_saved be 0. And
> according to tsc_khz_needed() in patch 1, if env->tsc_khz_saved == 0,
> no TSC frequency will be migrated. So the lack of KVM_CAP_GET_TSC_KHZ
> only hurts the migration and does not need to abort QEMU on the source
> machine.

The lack of KVM_CAP_GET_TSC_KHZ shouldn't prevent migration either, but
it looks like your code is not doing that: errors from
kvm_arch_setup_tsc_khz() are being ignored by
do_kvm_cpu_synchronize_post_init(). Sorry for the noise.

> 
> > > +env->tsc_khz_saved = r;
> > > +}
> > 
> > Why do you need a separate tsc_khz_saved field, and don't simply use
> > tsc_khz? It would have the additional feature of letting QMP clients
> > query the current TSC rate by asking for the tsc-freq property on CPU
> > objects.
> >
> 
> It's to avoid overriding env->tsc_khz on the destination in the
> migration. I can change this line to
>  env->tsc_khz = env->tsc_khz_saved = r;

You are already avoiding overriding env->tsc_khz, because you use
KVM_GET_TSC_KHZ only if tsc_khz is not set yet. I still don't see why
you need a tsc_khz_saved field that requires duplicating the SET_TSC_KHZ
code, if you could just do this:

    if (!env->tsc_khz) {
        env->tsc_khz = kvm_vcpu_ioctl(cs, KVM_GET_TSC_KHZ);
    }


> 
> For the additional QMP feature, will the value of tsc-freq property be
> env->tsc_khz? If yes, I guess the above change would be fine?

It may work, but I still don't see the point of duplicating the field
and duplicating the existing SET_TSC_KHZ code.
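
To make the comparison concrete, here is a small sketch of the selection
order from the patch description, reduced to a pure function. It is only
an illustration: pick_tsc_khz() and its parameters are hypothetical and
not part of QEMU, kvm_khz stands in for whatever KVM_GET_TSC_KHZ would
return, and 0 means "not set".

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the three rules in the commit message:
 *  1. a user-specified rate ('tsc-freq') always wins;
 *  2. with neither a user-specified nor a migrated rate, use KVM's value;
 *  3. otherwise use the migrated rate.
 */
static int64_t pick_tsc_khz(int64_t user_khz, int64_t migrated_khz,
                            int64_t kvm_khz)
{
    if (user_khz) {
        return user_khz;
    }
    if (!migrated_khz) {
        return kvm_khz;
    }
    return migrated_khz;
}
```

Seen this way, the result is a single value, which is why one field
looks sufficient to me.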

-- 
Eduardo


Link kvm with Xed library: Unknown symbol error

2015-11-05 Thread Yacine HEBBAL
Hi, 
I'm a student working on a monitoring application on top of KVM in which I
need to disassemble some VM instructions from the hypervisor level.
To do so, I want to use the Xed library from the code I added to KVM.
The problem is that the Xed library comes with header files and a static
library file. When I compile kvm-kmod with Xed (with some changes in the
Kbuild file to include Xed), I get the following output:

compilation OK with some warnings:

WARNING: "xed_encode_order_limit" [/home/yacine/vmi/kvm-kmod/x86/kvm-intel]
is COMMON symbol
 // omitted
WARNING: "xed_chip_features2" [/home/yacine/vmi/kvm-kmod/x86/kvm-intel] is
COMMON symbol
WARNING: "stderr" [/home/yacine/vmi/kvm-kmod/x86/kvm-intel.ko] undefined!
WARNING: "fprintf" [/home/yacine/vmi/kvm-kmod/x86/kvm-intel.ko] undefined!
WARNING: "abort" [/home/yacine/vmi/kvm-kmod/x86/kvm-intel.ko] undefined!
WARNING: "__umoddi3" [/home/yacine/vmi/kvm-kmod/x86/kvm-intel.ko] undefined!

insmod: error inserting 'kvm-intel.ko': -1 Invalid module format

when running dmesg:

[17591.101368] kvm_intel: please compile with -fno-common
... // omitted
[17591.101527] kvm_intel: Unknown symbol __umoddi3 (err 0)
[17591.101553] kvm_intel: Unknown symbol abort (err 0)
... // omitted
[17591.101763] kvm_intel: please compile with -fno-common

If anyone has an idea how to solve this, I would be very grateful :)







Re: [Qemu-devel] [PATCH] target-i386: enable cflushopt/clwb/pcommit instructions

2015-11-05 Thread Eduardo Habkost
On Thu, Nov 05, 2015 at 08:51:24AM +0100, Richard Henderson wrote:
> On 11/04/2015 08:35 PM, Eduardo Habkost wrote:
> >On Fri, Oct 30, 2015 at 01:54:33PM -0700, Richard Henderson wrote:
> >>On 10/29/2015 12:31 AM, Xiao Guangrong wrote:
> >>>These instructions are used by NVDIMM drivers and the specification
> >>>is located at:
> >>>https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
> >>>
> >>>These instructions are available on Skylake Server
> >>>
> >>>Signed-off-by: Xiao Guangrong 
> >>>---
> >>>  target-i386/cpu.c | 8 +---
> >>>  target-i386/cpu.h | 3 +++
> >>>  2 files changed, 8 insertions(+), 3 deletions(-)
> >>
> >>Reviewed-by: Richard Henderson 
> >>
> >>Although it would be nice to update the comments in translate.c to include
> >>the new insns, since they overlap mfence and sfence.  At present we only
> >>check for SSE enabled when accepting these; I suppose it's easiest to
> >>consider it invalid to specify +clwb,-sse?
> >
> >I assume you want to add the extra SSE requirement to TCG code, not to
> >generic x86 code, then I have no objections.
> 
> I don't really want to add any requirement, just point and laugh at anyone
> who reports a bug for the above condition.
> 
> >But in the case of clwb (/6 with a memory operand, modrm != 0xc0), we
> >are not just requiring SSE2: we are rejecting the instruction unless
> >modrm == 0xc0. That means TCG is rejecting the clwb instruction, so I
> >believe we shouldn't add CLWB to TCG_7_0_EBX_FEATURES yet.
> 
> Hmm, yes.
> 
> I've cleaned up some of this code on a branch, but it didn't get enough
> testing or review this cycle, so it's going to wait for the next.  I see
> you've posted a patch for this, which should be good enough until then.

I will apply this patch without the TCG_*_FEATURES changes until we
change TCG, then. That's OK?

About the TCG patches I have sent, please let me know if they look good
and appropriate for 2.5. This is the first time I have touched TCG code.

-- 
Eduardo


Re: [PATCH v7 20/35] dimm: get mapped memory region from DIMMDeviceClass->get_memory_region

2015-11-05 Thread Eduardo Habkost
On Mon, Nov 02, 2015 at 05:13:22PM +0800, Xiao Guangrong wrote:
[...]
>  static MemoryRegion *pc_dimm_get_memory_region(DIMMDevice *dimm)
>  {
> -return host_memory_backend_get_memory(dimm->hostmem, &error_abort);
> +Error *local_err = NULL;
> +MemoryRegion *mr;
> +
> +mr = host_memory_backend_get_memory(dimm->hostmem, &local_err);
> +
> +/*
> + * plug a pc-dimm device whose backend memory was not properly
> + * initialized?
> + */
> +assert(!local_err && mr);

I don't know if you are going to remove the errp parameter in the next
version, but if you want to simply abort in case an error is reported by
a function, you can use &error_abort.

-- 
Eduardo


Re: [PATCH 3/3] s390/dma: Allow per device dma ops

2015-11-05 Thread Andy Lutomirski

On 11/05/2015 01:33 AM, Christian Borntraeger wrote:

Am 03.11.2015 um 13:26 schrieb Cornelia Huck:

On Tue,  3 Nov 2015 12:54:39 +0100
Christian Borntraeger  wrote:


As virtio-ccw now has dma ops, we can no longer default to the PCI ones.
Make use of dev_archdata to keep the dma_ops per device. The pci devices
now use that to override the default, and the default is changed to use
the noop ops for everything that is not PCI. To compile without PCI
support we also have to enable the DMA api with virtio.


Not only with virtio, but generally, right?


Yes, will update the patch description.



Signed-off-by: Christian Borntraeger 
Reviewed-by: Joerg Roedel 
Acked-by: Sebastian Ott 
---
  arch/s390/Kconfig   | 3 ++-
  arch/s390/include/asm/device.h  | 6 +-
  arch/s390/include/asm/dma-mapping.h | 6 --
  arch/s390/pci/pci.c | 1 +
  arch/s390/pci/pci_dma.c | 4 ++--
  5 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 1d57000..04f0e02 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -113,6 +113,7 @@ config S390
select GENERIC_FIND_FIRST_BIT
select GENERIC_SMP_IDLE_THREAD
select GENERIC_TIME_VSYSCALL
+   select HAS_DMA
select HAVE_ALIGNED_STRUCT_PAGE if SLUB
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_EARLY_PFN_TO_NID
@@ -124,6 +125,7 @@ config S390
select HAVE_CMPXCHG_DOUBLE
select HAVE_CMPXCHG_LOCAL
select HAVE_DEBUG_KMEMLEAK
+   select HAVE_DMA_ATTRS
select HAVE_DYNAMIC_FTRACE
select HAVE_DYNAMIC_FTRACE_WITH_REGS
select HAVE_FTRACE_MCOUNT_RECORD
@@ -580,7 +582,6 @@ config QDIO

  menuconfig PCI
bool "PCI support"
-   select HAVE_DMA_ATTRS
select PCI_MSI
help
  Enable PCI support.


Hm. Further down in this file, there's

config HAS_DMA
 def_bool PCI
 select HAVE_DMA_API_DEBUG

Should we maybe select HAVE_DMA_API_DEBUG above, drop the HAS_DMA
config option and rely on not defining NO_DMA instead?


Hmm, yes. That would simplify things a lot.  Right now we include
lib/Kconfig (which defines HAS_DMA) and define it ourselves in
arch/s390/Kconfig. Whoever comes first wins. Adding a select statement
would make this even more complicated.

Andy, I will simply send you a respin of this patch.



I'm slightly concerned that I'm going to screw this all up and apply the 
wrong version.  Could you resend the whole series with git format-patch 
-vN for some appropriate N (or similar)?


Thanks,
Andy




[GIT PULL v4 2/3] alpha/dma: use common noop dma ops

2015-11-05 Thread Christian Borntraeger
Some of the alpha pci noop dma ops are identical to the common ones.
Use them.

Signed-off-by: Christian Borntraeger 
Reviewed-by: Joerg Roedel 
---
 arch/alpha/kernel/pci-noop.c | 46 
 1 file changed, 4 insertions(+), 42 deletions(-)

diff --git a/arch/alpha/kernel/pci-noop.c b/arch/alpha/kernel/pci-noop.c
index 2b1f4a1..8e735b5e 100644
--- a/arch/alpha/kernel/pci-noop.c
+++ b/arch/alpha/kernel/pci-noop.c
@@ -123,44 +123,6 @@ static void *alpha_noop_alloc_coherent(struct device *dev, size_t size,
return ret;
 }
 
-static void alpha_noop_free_coherent(struct device *dev, size_t size,
-void *cpu_addr, dma_addr_t dma_addr,
-struct dma_attrs *attrs)
-{
-   free_pages((unsigned long)cpu_addr, get_order(size));
-}
-
-static dma_addr_t alpha_noop_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size,
- enum dma_data_direction dir,
- struct dma_attrs *attrs)
-{
-   return page_to_pa(page) + offset;
-}
-
-static int alpha_noop_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
-enum dma_data_direction dir, struct dma_attrs *attrs)
-{
-   int i;
-   struct scatterlist *sg;
-
-   for_each_sg(sgl, sg, nents, i) {
-   void *va;
-
-   BUG_ON(!sg_page(sg));
-   va = sg_virt(sg);
-   sg_dma_address(sg) = (dma_addr_t)virt_to_phys(va);
-   sg_dma_len(sg) = sg->length;
-   }
-
-   return nents;
-}
-
-static int alpha_noop_mapping_error(struct device *dev, dma_addr_t dma_addr)
-{
-   return 0;
-}
-
 static int alpha_noop_supported(struct device *dev, u64 mask)
 {
return mask < 0x00ffUL ? 0 : 1;
@@ -168,10 +130,10 @@ static int alpha_noop_supported(struct device *dev, u64 mask)
 
 struct dma_map_ops alpha_noop_ops = {
.alloc  = alpha_noop_alloc_coherent,
-   .free   = alpha_noop_free_coherent,
-   .map_page   = alpha_noop_map_page,
-   .map_sg = alpha_noop_map_sg,
-   .mapping_error  = alpha_noop_mapping_error,
+   .free   = dma_noop_free_coherent,
+   .map_page   = dma_noop_map_page,
+   .map_sg = dma_noop_map_sg,
+   .mapping_error  = dma_noop_mapping_error,
.dma_supported  = alpha_noop_supported,
 };
 
-- 
2.4.3



[GIT PULL v4 0/3] dma and virtio prep patches

2015-11-05 Thread Christian Borntraeger
Andy,

to make it obvious which version is the latest, here is a branch

The following changes since commit 6a13feb9c82803e2b815eca72fa7a9f5561d7861:

  Linux 4.3 (2015-11-01 16:05:25 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux.git  dma

for you to fetch changes up to fc7f9754db6ce0c12281da4055281f731d36bdee:

  s390/dma: Allow per device dma ops (2015-11-05 21:02:40 +0100)


Christian Borntraeger (3):
  dma: Provide simple noop dma ops
  alpha/dma: use common noop dma ops
  s390/dma: Allow per device dma ops

 arch/alpha/kernel/pci-noop.c| 46 ++-
 arch/s390/Kconfig   |  7 +---
 arch/s390/include/asm/device.h  |  6 ++-
 arch/s390/include/asm/dma-mapping.h |  6 ++-
 arch/s390/pci/pci.c |  1 +
 arch/s390/pci/pci_dma.c |  4 +-
 include/linux/dma-mapping.h |  2 +
 lib/Makefile|  1 +
 lib/dma-noop.c  | 75 +
 9 files changed, 96 insertions(+), 52 deletions(-)
 create mode 100644 lib/dma-noop.c



Re: [PATCH] KVM: arm: Don't try to flush hyp-mode device mappings

2015-11-05 Thread kbuild test robot
Hi Christoffer,

[auto build test WARNING on: kvmarm/next]
[also build test WARNING on: v4.3 next-20151105]

url:    https://github.com/0day-ci/linux/commits/Christoffer-Dall/KVM-arm-Don-t-try-to-flush-hyp-mode-device-mappings/20151105-232548
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next
config: arm64-defconfig (attached as .config)
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm64

All warnings (new ones prefixed by >>):

   arch/arm64/kvm/../../../arch/arm/kvm/mmu.c: In function 'unmap_ptes':
>> arch/arm64/kvm/../../../arch/arm/kvm/mmu.c:212:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
  pte_t old_pte = *pte;
  ^

vim +212 arch/arm64/kvm/../../../arch/arm/kvm/mmu.c

363ef89f8 Marc Zyngier 2014-12-19  196   *
363ef89f8 Marc Zyngier 2014-12-19  197   * This is why right after unmapping a page/section and invalidating
363ef89f8 Marc Zyngier 2014-12-19  198   * the corresponding TLBs, we call kvm_flush_dcache_p*() to make sure
363ef89f8 Marc Zyngier 2014-12-19  199   * the IO subsystem will never hit in the cache.
363ef89f8 Marc Zyngier 2014-12-19  200   */
4f853a714 Christoffer Dall 2014-05-09  201  static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
4f853a714 Christoffer Dall 2014-05-09  202 phys_addr_t addr, phys_addr_t end)
4f728276f Marc Zyngier 2013-04-12  203  {
4f853a714 Christoffer Dall 2014-05-09  204  phys_addr_t start_addr = addr;
4f853a714 Christoffer Dall 2014-05-09  205  pte_t *pte, *start_pte;
4f853a714 Christoffer Dall 2014-05-09  206  
4f853a714 Christoffer Dall 2014-05-09  207  start_pte = pte = pte_offset_kernel(pmd, addr);
4f853a714 Christoffer Dall 2014-05-09  208  do {
8e9a96138 Christoffer Dall 2015-11-05  209  if (pte_none(*pte))
8e9a96138 Christoffer Dall 2015-11-05  210  continue;
8e9a96138 Christoffer Dall 2015-11-05  211  
363ef89f8 Marc Zyngier 2014-12-19 @212  pte_t old_pte = *pte;
363ef89f8 Marc Zyngier 2014-12-19  213  
4f728276f Marc Zyngier 2013-04-12  214  kvm_set_pte(pte, __pte(0));
d4cb9df5d Marc Zyngier 2013-05-14  215  kvm_tlb_flush_vmid_ipa(kvm, addr);
363ef89f8 Marc Zyngier 2014-12-19  216  
363ef89f8 Marc Zyngier 2014-12-19  217  /* No need to invalidate the cache for device mappings */
8e9a96138 Christoffer Dall 2015-11-05  218  if ((pte_val(old_pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE &&
8e9a96138 Christoffer Dall 2015-11-05  219  (pte_val(old_pte) & PAGE_HYP_DEVICE) != PAGE_HYP_DEVICE)
363ef89f8 Marc Zyngier 2014-12-19  220  kvm_flush_dcache_pte(old_pte);

:: The code at line 212 was first introduced by commit
:: 363ef89f8e9bcedc28b976d0fe2d858fe139c122 arm/arm64: KVM: Invalidate data cache on unmap

:: TO: Marc Zyngier <marc.zyng...@arm.com>
:: CC: Christoffer Dall <christoffer.d...@linaro.org>
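
One common way to silence -Wdeclaration-after-statement is to move the
declaration to the top of the enclosing block and assign it after the
early "continue". The sketch below shows that shape on a plain array;
clear_present() and its entries are hypothetical stand-ins and not the
mmu.c code.

```c
#include <stddef.h>

/*
 * Clear every non-zero entry and return the sum of the values that were
 * cleared. The loop body mirrors the "skip empty, then save old value"
 * pattern flagged in the warning above.
 */
static unsigned long clear_present(unsigned long *entries, size_t n)
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long old;	/* declared first, so C90 is satisfied */

		if (entries[i] == 0)
			continue;

		old = entries[i];	/* assigned only after the early exit */
		entries[i] = 0;
		sum += old;
	}

	return sum;
}
```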

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation




[GIT PULL v4 1/3] dma: Provide simple noop dma ops

2015-11-05 Thread Christian Borntraeger
We are going to require dma_ops for several common drivers, even for
systems that do have an identity mapping. Let's provide some minimal
no-op dma_ops that can be used for that purpose.

Signed-off-by: Christian Borntraeger 
Reviewed-by: Joerg Roedel 
---
 include/linux/dma-mapping.h |  2 ++
 lib/Makefile|  1 +
 lib/dma-noop.c  | 75 +
 3 files changed, 78 insertions(+)
 create mode 100644 lib/dma-noop.c

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index ac07ff0..7912f54 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -66,6 +66,8 @@ struct dma_map_ops {
int is_phys;
 };
 
+extern struct dma_map_ops dma_noop_ops;
+
 #define DMA_BIT_MASK(n)(((n) == 64) ? ~0ULL : ((1ULL<<(n))-1))
 
 #define DMA_MASK_NONE  0x0ULL
diff --git a/lib/Makefile b/lib/Makefile
index 13a7c6a..92d6135 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -18,6 +18,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
 obj-$(CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS) += usercopy.o
 lib-$(CONFIG_MMU) += ioremap.o
 lib-$(CONFIG_SMP) += cpumask.o
+lib-$(CONFIG_HAS_DMA) += dma-noop.o
 
 lib-y  += kobject.o klist.o
 obj-y  += lockref.o
diff --git a/lib/dma-noop.c b/lib/dma-noop.c
new file mode 100644
index 000..7214564
--- /dev/null
+++ b/lib/dma-noop.c
@@ -0,0 +1,75 @@
+/*
+ * lib/dma-noop.c
+ *
+ * Simple DMA noop-ops that map 1:1 with memory
+ */
+#include 
+#include 
+#include 
+#include 
+
+static void *dma_noop_alloc(struct device *dev, size_t size,
+   dma_addr_t *dma_handle, gfp_t gfp,
+   struct dma_attrs *attrs)
+{
+   void *ret;
+
+   ret = (void *)__get_free_pages(gfp, get_order(size));
+   if (ret)
+   *dma_handle = virt_to_phys(ret);
+   return ret;
+}
+
+static void dma_noop_free(struct device *dev, size_t size,
+ void *cpu_addr, dma_addr_t dma_addr,
+ struct dma_attrs *attrs)
+{
+   free_pages((unsigned long)cpu_addr, get_order(size));
+}
+
+static dma_addr_t dma_noop_map_page(struct device *dev, struct page *page,
+ unsigned long offset, size_t size,
+ enum dma_data_direction dir,
+ struct dma_attrs *attrs)
+{
+   return page_to_phys(page) + offset;
+}
+
+static int dma_noop_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
+enum dma_data_direction dir, struct dma_attrs *attrs)
+{
+   int i;
+   struct scatterlist *sg;
+
+   for_each_sg(sgl, sg, nents, i) {
+   void *va;
+
+   BUG_ON(!sg_page(sg));
+   va = sg_virt(sg);
+   sg_dma_address(sg) = (dma_addr_t)virt_to_phys(va);
+   sg_dma_len(sg) = sg->length;
+   }
+
+   return nents;
+}
+
+static int dma_noop_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+   return 0;
+}
+
+static int dma_noop_supported(struct device *dev, u64 mask)
+{
+   return 1;
+}
+
+struct dma_map_ops dma_noop_ops = {
+   .alloc  = dma_noop_alloc,
+   .free   = dma_noop_free,
+   .map_page   = dma_noop_map_page,
+   .map_sg = dma_noop_map_sg,
+   .mapping_error  = dma_noop_mapping_error,
+   .dma_supported  = dma_noop_supported,
+};
+
+EXPORT_SYMBOL(dma_noop_ops);
-- 
2.4.3



[GIT PULL v4 3/3] s390/dma: Allow per device dma ops

2015-11-05 Thread Christian Borntraeger
As virtio-ccw will have dma ops, we can no longer default to the
zPCI ones. Make use of dev_archdata to keep the dma_ops per device.
The pci devices now use that to override the default, and the
default is changed to use the noop ops for everything that does not
specify a device specific one.
To compile without PCI support we will enable HAS_DMA all the time,
via the default config in lib/Kconfig.

Signed-off-by: Christian Borntraeger 
Reviewed-by: Joerg Roedel 
Acked-by: Sebastian Ott 
---
 arch/s390/Kconfig   | 7 ++-
 arch/s390/include/asm/device.h  | 6 +-
 arch/s390/include/asm/dma-mapping.h | 6 --
 arch/s390/pci/pci.c | 1 +
 arch/s390/pci/pci_dma.c | 4 ++--
 5 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 1d57000..e2a885b 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -124,6 +124,8 @@ config S390
select HAVE_CMPXCHG_DOUBLE
select HAVE_CMPXCHG_LOCAL
select HAVE_DEBUG_KMEMLEAK
+   select HAVE_DMA_ATTRS
+   select HAVE_DMA_API_DEBUG
select HAVE_DYNAMIC_FTRACE
select HAVE_DYNAMIC_FTRACE_WITH_REGS
select HAVE_FTRACE_MCOUNT_RECORD
@@ -580,7 +582,6 @@ config QDIO
 
 menuconfig PCI
bool "PCI support"
-   select HAVE_DMA_ATTRS
select PCI_MSI
help
  Enable PCI support.
@@ -620,10 +621,6 @@ config HAS_IOMEM
 config IOMMU_HELPER
def_bool PCI
 
-config HAS_DMA
-   def_bool PCI
-   select HAVE_DMA_API_DEBUG
-
 config NEED_SG_DMA_LENGTH
def_bool PCI
 
diff --git a/arch/s390/include/asm/device.h b/arch/s390/include/asm/device.h
index d8f9872..4a9f35e 100644
--- a/arch/s390/include/asm/device.h
+++ b/arch/s390/include/asm/device.h
@@ -3,5 +3,9 @@
  *
  * This file is released under the GPLv2
  */
-#include 
+struct dev_archdata {
+   struct dma_map_ops *dma_ops;
+};
 
+struct pdev_archdata {
+};
diff --git a/arch/s390/include/asm/dma-mapping.h b/arch/s390/include/asm/dma-mapping.h
index b3fd54d..cb05f5c 100644
--- a/arch/s390/include/asm/dma-mapping.h
+++ b/arch/s390/include/asm/dma-mapping.h
@@ -11,11 +11,13 @@
 
 #define DMA_ERROR_CODE (~(dma_addr_t) 0x0)
 
-extern struct dma_map_ops s390_dma_ops;
+extern struct dma_map_ops s390_pci_dma_ops;
 
 static inline struct dma_map_ops *get_dma_ops(struct device *dev)
 {
-   return &s390_dma_ops;
+   if (dev && dev->archdata.dma_ops)
+   return dev->archdata.dma_ops;
+   return &dma_noop_ops;
 }
 
 static inline void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index 7ef12a3..fa41605 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -649,6 +649,7 @@ int pcibios_add_device(struct pci_dev *pdev)
 
zdev->pdev = pdev;
pdev->dev.groups = zpci_attr_groups;
+   pdev->dev.archdata.dma_ops = &s390_pci_dma_ops;
zpci_map_resources(pdev);
 
for (i = 0; i < PCI_BAR_COUNT; i++) {
diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
index 37505b8..ea39c3f 100644
--- a/arch/s390/pci/pci_dma.c
+++ b/arch/s390/pci/pci_dma.c
@@ -495,7 +495,7 @@ static int __init dma_debug_do_init(void)
 }
 fs_initcall(dma_debug_do_init);
 
-struct dma_map_ops s390_dma_ops = {
+struct dma_map_ops s390_pci_dma_ops = {
.alloc  = s390_dma_alloc,
.free   = s390_dma_free,
.map_sg = s390_dma_map_sg,
@@ -506,7 +506,7 @@ struct dma_map_ops s390_dma_ops = {
.is_phys= 0,
/* dma_supported is unconditionally true without a callback */
 };
-EXPORT_SYMBOL_GPL(s390_dma_ops);
+EXPORT_SYMBOL_GPL(s390_pci_dma_ops);
 
 static int __init s390_iommu_setup(char *str)
 {
-- 
2.4.3



[kvm-unit-tests PATCH 11/18] lib/printf: support the %u unsigned fmt field

2015-11-05 Thread Andrew Jones
From: Alex Bennée 

Signed-off-by: Alex Bennée 
Reviewed-by: Andrew Jones 
---
 lib/printf.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/lib/printf.c b/lib/printf.c
index 89308fb26b7d2..5d83605afe829 100644
--- a/lib/printf.c
+++ b/lib/printf.c
@@ -180,6 +180,19 @@ int vsnprintf(char *buf, int size, const char *fmt, va_list va)
break;
}
break;
+   case 'u':
+   switch (nlong) {
+   case 0:
+   print_unsigned(&s, va_arg(va, unsigned), 10, props);
+   break;
+   case 1:
+   print_unsigned(&s, va_arg(va, unsigned long), 10, props);
+   break;
+   default:
+   print_unsigned(&s, va_arg(va, unsigned long long), 10, props);
+   break;
+   }
+   break;
case 'x':
switch (nlong) {
case 0:
-- 
2.4.3



[kvm-unit-tests PATCH 16/18] lib: link in linux kernel headers (uapi)

2015-11-05 Thread Andrew Jones
Rather than import uapi headers, e.g. lib/arm/asm/uapi-psci.h, just
include them. For cross compilation we'll need the headers explicitly
added to the include path, but doing -I /usr/include won't work, as
some of our local header names will collide with /usr/include names.
Even just doing -I /usr/include/linux would leave too many potential
name collisions. So we create a lib/linux link, and do *not*
add -I lib/linux. Doing it this way requires code to include the uapi
headers with , putting them in their own "linux"
namespace.

Signed-off-by: Andrew Jones 
---
 .gitignore |  1 +
 Makefile   |  4 ++--
 configure  |  9 +
 lib/arm/asm/page.h |  2 +-
 lib/arm/asm/psci.h |  2 +-
 lib/arm64/asm/page.h   |  2 +-
 lib/arm64/asm/psci.h   |  2 +-
 lib/asm-generic/page.h |  2 +-
 lib/const.h| 11 ---
 9 files changed, 17 insertions(+), 18 deletions(-)
 delete mode 100644 lib/const.h

diff --git a/.gitignore b/.gitignore
index acbb9055212aa..b193802488003 100644
--- a/.gitignore
+++ b/.gitignore
@@ -11,6 +11,7 @@ patches
 cscope.*
 *.swp
 /lib/asm
+/lib/linux
 /config.mak
 /*-run
 /test.log
diff --git a/Makefile b/Makefile
index 3e60b4f8e4a57..2e76c90bdcea5 100644
--- a/Makefile
+++ b/Makefile
@@ -82,10 +82,10 @@ libfdt_clean:
$(LIBFDT_objdir)/.*.d
 
 distclean: clean libfdt_clean
-   $(RM) lib/asm config.mak $(TEST_DIR)-run test.log msr.out cscope.*
+   $(RM) lib/linux lib/asm config.mak $(TEST_DIR)-run test.log msr.out cscope.*
$(RM) -r tests
 
-cscope: common_dirs = lib lib/libfdt lib/asm lib/asm-generic
+cscope: common_dirs = lib lib/libfdt lib/linux lib/asm lib/asm-generic
 cscope:
$(RM) ./cscope.*
find -L $(TEST_DIR) lib/$(TEST_DIR) lib/$(ARCH) $(common_dirs) -maxdepth 1 \
diff --git a/configure b/configure
index 078b70ce096a6..667cc1b30e119 100755
--- a/configure
+++ b/configure
@@ -91,6 +91,15 @@ if [ -f $testdir/run ]; then
 ln -fs $testdir/run $testdir-run
 fi
 
+# link uapi/linux
+rm -f lib/linux
+if [ ! -d /usr/include/linux ]; then
+echo kernel-headers not installed, aborting...
+exit 1
+else
+ln -s /usr/include/linux lib/linux
+fi
+
 # check for dependent 32 bit libraries
 if [ "$arch" != "arm" ]; then
 cat << EOF > lib_test.c
diff --git a/lib/arm/asm/page.h b/lib/arm/asm/page.h
index 039e2ddfb8e0f..df76969964ed3 100644
--- a/lib/arm/asm/page.h
+++ b/lib/arm/asm/page.h
@@ -6,7 +6,7 @@
  * This work is licensed under the terms of the GNU LGPL, version 2.
  */
 
-#include 
+#include 
 
 #define PAGE_SHIFT 12
 #define PAGE_SIZE  (_AC(1,UL) << PAGE_SHIFT)
diff --git a/lib/arm/asm/psci.h b/lib/arm/asm/psci.h
index c5fe78184b5ac..11ac45028d787 100644
--- a/lib/arm/asm/psci.h
+++ b/lib/arm/asm/psci.h
@@ -1,7 +1,7 @@
 #ifndef _ASMARM_PSCI_H_
 #define _ASMARM_PSCI_H_
 #include 
-#include 
+#include 
 
 #define PSCI_INVOKE_ARG_TYPE   u32
 #define PSCI_FN_CPU_ON PSCI_0_2_FN_CPU_ON
diff --git a/lib/arm64/asm/page.h b/lib/arm64/asm/page.h
index 29ad1f1f720c4..3144e8efcc7ae 100644
--- a/lib/arm64/asm/page.h
+++ b/lib/arm64/asm/page.h
@@ -11,7 +11,7 @@
  * This work is licensed under the terms of the GNU LGPL, version 2.
  */
 
-#include 
+#include 
 
 #define PGTABLE_LEVELS 2
 #define VA_BITS42
diff --git a/lib/arm64/asm/psci.h b/lib/arm64/asm/psci.h
index 940d61d34c05d..0a7d7c854e2b3 100644
--- a/lib/arm64/asm/psci.h
+++ b/lib/arm64/asm/psci.h
@@ -1,7 +1,7 @@
 #ifndef _ASMARM64_PSCI_H_
 #define _ASMARM64_PSCI_H_
 #include 
-#include 
+#include 
 
 #define PSCI_INVOKE_ARG_TYPE   u64
 #define PSCI_FN_CPU_ON PSCI_0_2_FN64_CPU_ON
diff --git a/lib/asm-generic/page.h b/lib/asm-generic/page.h
index 66c72a62bb0f7..f872f6fa0dad2 100644
--- a/lib/asm-generic/page.h
+++ b/lib/asm-generic/page.h
@@ -9,7 +9,7 @@
  * This work is licensed under the terms of the GNU LGPL, version 2.
  */
 
-#include "const.h"
+#include 
 
 #define PAGE_SHIFT 12
 #define PAGE_SIZE  (_AC(1,UL) << PAGE_SHIFT)
diff --git a/lib/const.h b/lib/const.h
deleted file mode 100644
index 5cd94d7067541..0
--- a/lib/const.h
+++ /dev/null
@@ -1,11 +0,0 @@
-#ifndef _CONST_H_
-#define _CONST_H_
-#ifdef __ASSEMBLY__
-#define _AC(X,Y)   X
-#define _AT(T,X)   X
-#else
-#define __AC(X,Y)  (X##Y)
-#define _AC(X,Y)   __AC(X,Y)
-#define _AT(T,X)   ((T)(X))
-#endif
-#endif
-- 
2.4.3



[kvm-unit-tests PATCH 18/18] arm/arm64: uart0_init: check /chosen/stdout-path

2015-11-05 Thread Andrew Jones
Arguably all of uart0_init() is unnecessary, as we're pretty sure
that the address we initialize uart0_base to is correct. We go
through the motions of finding the uart anyway though, because it's
easy. It's also easy to check chosen/stdout-path first, so let's do
that too. But, just to make all this stuff a little less unnecessary,
let's add a warning when we do actually find an address that doesn't
match our initializer.
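
The lookup order described above can be sketched roughly as follows. This is
only an illustration under stated assumptions: lookup_fn, find_uart(),
NOT_FOUND, UART_EARLY_BASE_SKETCH and the stub lookups are hypothetical
stand-ins, not the libfdt or kvm-unit-tests API.

```c
#include <stdint.h>
#include <stdio.h>

#define NOT_FOUND (-1)                  /* hypothetical error code */
#define UART_EARLY_BASE_SKETCH 0x1000u  /* hypothetical early guess */

/* Hypothetical lookup callback: returns 0 and fills *addr, or NOT_FOUND. */
typedef int (*lookup_fn)(uintptr_t *addr);

/*
 * Try the preferred lookup (e.g. /chosen/stdout-path) first and fall back
 * to the second one (e.g. a compatible-string search) only when the first
 * reports NOT_FOUND. Returns 0 on success, NOT_FOUND if both fail.
 * *mismatch is set to 1 when the discovered address differs from the guess.
 */
int find_uart(lookup_fn preferred, lookup_fn fallback,
              uintptr_t *addr, int *mismatch)
{
    int ret = preferred(addr);

    if (ret == NOT_FOUND)
        ret = fallback(addr);
    if (ret != 0)
        return ret;

    *mismatch = (*addr != UART_EARLY_BASE_SKETCH);
    if (*mismatch)
        fprintf(stderr, "WARNING: early print support may not work\n");
    return 0;
}

/* Stub lookups used only to exercise the sketch. */
int lookup_missing(uintptr_t *addr) { (void)addr; return NOT_FOUND; }
int lookup_guess(uintptr_t *addr) { *addr = UART_EARLY_BASE_SKETCH; return 0; }
int lookup_other(uintptr_t *addr) { *addr = 0x2000u; return 0; }
```

Calling find_uart(lookup_missing, lookup_guess, ...) exercises the fallback
path, while find_uart(lookup_other, lookup_guess, ...) takes the preferred
path and reports a mismatch against the early guess.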

Signed-off-by: Andrew Jones 
---
 lib/arm/io.c | 36 +++-
 1 file changed, 27 insertions(+), 9 deletions(-)

diff --git a/lib/arm/io.c b/lib/arm/io.c
index 8b1501886736a..a08d394e4aa1c 100644
--- a/lib/arm/io.c
+++ b/lib/arm/io.c
@@ -19,12 +19,14 @@ extern void halt(int code);
 /*
  * Use this guess for the pl011 base in order to make an attempt at
  * having earlier printf support. We'll overwrite it with the real
- * base address that we read from the device tree later.
+ * base address that we read from the device tree later. This is
+ * the address we expect QEMU's mach-virt machine type to put in
+ * its generated device tree.
  */
-#define QEMU_MACH_VIRT_PL011_BASE 0x0900UL
+#define UART_EARLY_BASE 0x0900UL
 
 static struct spinlock uart_lock;
-static volatile u8 *uart0_base = (u8 *)QEMU_MACH_VIRT_PL011_BASE;
+static volatile u8 *uart0_base = (u8 *)UART_EARLY_BASE;
 
 static void uart0_init(void)
 {
@@ -32,16 +34,32 @@ static void uart0_init(void)
struct dt_pbus_reg base;
int ret;
 
-   ret = dt_pbus_get_base_compatible(compatible, );
-   assert(ret == 0 || ret == -FDT_ERR_NOTFOUND);
+   ret = dt_get_default_console_node();
+   assert(ret >= 0 || ret == -FDT_ERR_NOTFOUND);
 
-   if (ret) {
-   printf("%s: %s not found in the device tree, aborting...\n",
-   __func__, compatible);
-   abort();
+   if (ret == -FDT_ERR_NOTFOUND) {
+
+   ret = dt_pbus_get_base_compatible(compatible, );
+   assert(ret == 0 || ret == -FDT_ERR_NOTFOUND);
+
+   if (ret) {
+   printf("%s: %s not found in the device tree, "
+   "aborting...\n",
+   __func__, compatible);
+   abort();
+   }
+
+   } else {
+   assert(dt_pbus_translate_node(ret, 0, ) == 0);
}
 
uart0_base = ioremap(base.addr, base.size);
+
+   if (uart0_base != (u8 *)UART_EARLY_BASE) {
+   printf("WARNING: early print support may not work. "
+  "Found uart at %p, but early base is %p.\n",
+   uart0_base, (u8 *)UART_EARLY_BASE);
+   }
 }
 
 void io_init(void)
-- 
2.4.3



[kvm-unit-tests PATCH 05/18] README: add pointer to new wiki page

2015-11-05 Thread Andrew Jones
Signed-off-by: Andrew Jones 
---
 README | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/README b/README
index eab5ea28f7fab..45587f2a97ec6 100644
--- a/README
+++ b/README
@@ -1,3 +1,9 @@
+Welcome to kvm-unit-tests
+
+See http://www.linux-kvm.org/page/KVM-unit-tests for a high-level
+description of this project, as well as running tests and adding
+tests HOWTOs.
+
 This directory contains sources for a kvm test suite.
 
 To create the test images do
-- 
2.4.3



[kvm-unit-tests PATCH 02/18] trivial: lib: fail hard on failed mallocs

2015-11-05 Thread Andrew Jones
It's pretty safe to not even bother checking for NULL when
using malloc and friends, but if we do check, then fail
hard.
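
For illustration, "fail hard" could also be expressed as a small wrapper. This
is a sketch for hosted C, not code from this series: xcalloc_sketch() is a
hypothetical name. It keeps the allocation outside of assert(), so the
allocation still happens if assertions are compiled out with NDEBUG.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: allocate zeroed memory or abort.
 * Callers are assumed to request a non-zero size, since calloc() may
 * legitimately return NULL for a zero-sized request.
 */
void *xcalloc_sketch(size_t nmemb, size_t size)
{
    void *p = calloc(nmemb, size);

    if (!p) {
        fprintf(stderr, "out of memory\n");
        abort();
    }
    return p;
}
```

A caller can then use the returned pointer without any NULL check, e.g.
"vq = xcalloc_sketch(1, sizeof(*vq));".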

Signed-off-by: Andrew Jones 
---
 lib/virtio-mmio.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/lib/virtio-mmio.c b/lib/virtio-mmio.c
index 043832299174e..1b6f0cc378b79 100644
--- a/lib/virtio-mmio.c
+++ b/lib/virtio-mmio.c
@@ -54,8 +54,7 @@ static struct virtqueue *vm_setup_vq(struct virtio_device *vdev,
 
vq = calloc(1, sizeof(*vq));
queue = memalign(PAGE_SIZE, VIRTIO_MMIO_QUEUE_SIZE_MIN);
-   if (!vq || !queue)
-   return NULL;
+   assert(vq && queue);
 
writel(index, vm_dev->base + VIRTIO_MMIO_QUEUE_SEL);
 
@@ -161,9 +160,7 @@ static struct virtio_device *virtio_mmio_dt_bind(u32 devid)
if (node == -FDT_ERR_NOTFOUND)
return NULL;
 
-   vm_dev = calloc(1, sizeof(*vm_dev));
-   if (!vm_dev)
-   return NULL;
+   assert((vm_dev = calloc(1, sizeof(*vm_dev))) != NULL);
 
vm_dev->base = info.base;
vm_device_init(vm_dev);
-- 
2.4.3



[kvm-unit-tests PATCH 10/18] run_tests: probe for max-smp

2015-11-05 Thread Andrew Jones
KVM can be configured to only support a few vcpus. ARM and AArch64
currently have a default config of only 4. While it's nice to be
able to write tests that use the maximum recommended, nr-host-cpus,
we can't assume that nr-host-cpus == kvm-max-vcpus. This patch allows
one to put $MAX_SMP in the smp =  line of a unittests.cfg file.
That variable will then expand to the number of host cpus, or to the
maximum vcpus allowed by KVM.
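
The probing idea can be sketched in C as a simple count-down. The names below
(accepts_fn, probe_max_smp(), accepts_up_to_4()) are hypothetical; the patch
itself does this in shell by grepping QEMU's output for 'exceeds max cpus'.
Unlike the shell loop, the sketch stops at 1 so that it always terminates.

```c
/* Hypothetical predicate: returns non-zero if a VM can start with n vcpus. */
typedef int (*accepts_fn)(int n);

/*
 * Start from the number of host cpus and count down until the hypervisor
 * accepts the value. The lower bound of 1 guarantees termination.
 */
int probe_max_smp(int host_cpus, accepts_fn accepts)
{
    int n = host_cpus;

    while (n > 1 && !accepts(n))
        n--;
    return n;
}

/* Stub standing in for a KVM configured with at most 4 vcpus. */
int accepts_up_to_4(int n)
{
    return n <= 4;
}
```

With the stub, a host with 8 cpus yields 4 and a host with 2 cpus yields 2,
i.e. the smaller of nr-host-cpus and kvm-max-vcpus.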

[Inspired by a patch from Alex Bennée solving the same issue.]

Signed-off-by: Andrew Jones 
---
 arm/unittests.cfg   | 3 ++-
 run_tests.sh| 9 +
 scripts/mkstandalone.sh | 9 -
 x86/unittests.cfg   | 1 +
 4 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index 243c13301811b..5e26da1a8c1bc 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -2,6 +2,7 @@
 # [unittest_name]
 # file = foo.flat # Name of the flat file to be used
 # smp  = 2# Number of processors the VM will use during this test
+# # Use $MAX_SMP to use the maximum the host supports.
 # extra_params = -append  # Additional parameters used
 # arch = arm|arm64   # Only if test case is specific to one
 # groups = group1 group2 # Used to identify test cases with run_tests -g ...
@@ -34,6 +35,6 @@ groups = selftest
 # Test SMP support
 [selftest-smp]
 file = selftest.flat
-smp = `getconf _NPROCESSORS_CONF`
+smp = $MAX_SMP
 extra_params = -append 'smp'
 groups = selftest
diff --git a/run_tests.sh b/run_tests.sh
index b1b4c541ecaea..fad22a935b007 100755
--- a/run_tests.sh
+++ b/run_tests.sh
@@ -98,4 +98,13 @@ while getopts "g:hv" opt; do
 esac
 done
 
+#
+# Probe for MAX_SMP
+#
+MAX_SMP=$(getconf _NPROCESSORS_CONF)
+while ./$TEST_DIR-run _NO_FILE_4Uhere_ -smp $MAX_SMP \
+   |& grep -q 'exceeds max cpus'; do
+   ((--MAX_SMP))
+done
+
 for_each_unittest $config run
diff --git a/scripts/mkstandalone.sh b/scripts/mkstandalone.sh
index 0c39451e538c9..3ce244aff67b9 100755
--- a/scripts/mkstandalone.sh
+++ b/scripts/mkstandalone.sh
@@ -95,12 +95,19 @@ qemu="$qemu"
 if [ "\$QEMU" ]; then
qemu="\$QEMU"
 fi
+
+MAX_SMP="MAX_SMP"
 echo \$qemu $cmdline -smp $smp $opts
 
 cmdline="\`echo '$cmdline' | sed s%$kernel%_NO_FILE_4Uhere_%\`"
 if \$qemu \$cmdline 2>&1 | grep 'No accelerator found'; then
-ret=2
+   ret=2
 else
+   MAX_SMP=\`getconf _NPROCESSORS_CONF\`
+   while \$qemu \$cmdline -smp \$MAX_SMP 2>&1 | grep 'exceeds max cpus' > /dev/null; do
+   MAX_SMP=\`expr \$MAX_SMP - 1\`
+   done
+
cmdline="\`echo '$cmdline' | sed s%$kernel%\$bin%\`"
\$qemu \$cmdline -smp $smp $opts
ret=\$?
diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index a38544f77c056..337cc19d3d19d 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -2,6 +2,7 @@
 # [unittest_name]
 # file = foo.flat # Name of the flat file to be used
 # smp = 2 # Number of processors the VM will use during this test
+# # Use $MAX_SMP to use the maximum the host supports.
 # extra_params = -cpu qemu64,+x2apic # Additional parameters used
 # arch = i386/x86_64 # Only if the test case works only on one of them
 # groups = group1 group2 # Used to identify test cases with run_tests -g ...
-- 
2.4.3



[kvm-unit-tests PATCH 15/18] arm/arm64: generate map files

2015-11-05 Thread Andrew Jones
Signed-off-by: Andrew Jones 
---
 .gitignore   | 1 +
 config/config-arm-common.mak | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/.gitignore b/.gitignore
index 242fae475094c..acbb9055212aa 100644
--- a/.gitignore
+++ b/.gitignore
@@ -4,6 +4,7 @@
 *.o
 *.flat
 *.elf
+*.map
 .pc
 patches
 .stgit-*
diff --git a/config/config-arm-common.mak b/config/config-arm-common.mak
index 937d408574751..54cca5663d275 100644
--- a/config/config-arm-common.mak
+++ b/config/config-arm-common.mak
@@ -55,6 +55,7 @@ FLATLIBS = $(libcflat) $(LIBFDT_archive) $(libgcc) $(libeabi)
 %.elf: %.o $(FLATLIBS) arm/flat.lds
$(CC) $(LDFLAGS) -o $@ \
-Wl,-T,arm/flat.lds,--build-id=none,-Ttext=$(start_addr) \
+   -Wl,-Map,$(basename $@).map \
$(filter %.o, $^) $(FLATLIBS)
 
 %.flat: %.elf
@@ -64,7 +65,7 @@ $(libeabi): $(eabiobjs)
$(AR) rcs $@ $^
 
 arm_clean: libfdt_clean asm_offsets_clean
-   $(RM) $(TEST_DIR)/*.{o,flat,elf} $(libeabi) $(eabiobjs) \
+   $(RM) $(TEST_DIR)/*.{o,flat,elf,map} $(libeabi) $(eabiobjs) \
  $(TEST_DIR)/.*.d lib/arm/.*.d
 
 ##
-- 
2.4.3



[kvm-unit-tests PATCH 14/18] arm/arm64: allow building a single test

2015-11-05 Thread Andrew Jones
This is mostly useful for building new tests that don't yet (and
may never) have entries in the makefiles (config-arm*.mak). Of course
it can be used to build tests that do have entries as well, in order
to avoid building all tests, if the plan is to run just the one.

Just do 'make TEST=some-test' to use it, where "some-test" matches
the name of the source file, i.e. arm/some-test.c

Signed-off-by: Andrew Jones 
---
 config/config-arm-common.mak | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/config/config-arm-common.mak b/config/config-arm-common.mak
index 698555d6a676f..937d408574751 100644
--- a/config/config-arm-common.mak
+++ b/config/config-arm-common.mak
@@ -13,6 +13,11 @@ tests-common = \
$(TEST_DIR)/selftest.flat \
$(TEST_DIR)/spinlock-test.flat
 
+ifneq ($(TEST),)
+   tests = $(TEST_DIR)/$(TEST).flat
+   tests-common =
+endif
+
 all: test_cases
 
 ##
@@ -68,5 +73,6 @@ generated_files = $(asm-offsets)
 
 test_cases: $(generated_files) $(tests-common) $(tests)
 
+$(TEST_DIR)/$(TEST).elf: $(cstart.o) $(TEST_DIR)/$(TEST).o
 $(TEST_DIR)/selftest.elf: $(cstart.o) $(TEST_DIR)/selftest.o
 $(TEST_DIR)/spinlock-test.elf: $(cstart.o) $(TEST_DIR)/spinlock-test.o
-- 
2.4.3



[kvm-unit-tests PATCH 17/18] Revert "arm/arm64: import include/uapi/linux/psci.h"

2015-11-05 Thread Andrew Jones
The previous patch allows us to "unimport" this header now.

This reverts commit 7bc9f5e757bfa5c5a520281640fcf47a14b3.

Signed-off-by: Andrew Jones 
---
 lib/arm/asm/uapi-psci.h   | 73 ---
 lib/arm64/asm/uapi-psci.h |  1 -
 2 files changed, 74 deletions(-)
 delete mode 100644 lib/arm/asm/uapi-psci.h
 delete mode 100644 lib/arm64/asm/uapi-psci.h

diff --git a/lib/arm/asm/uapi-psci.h b/lib/arm/asm/uapi-psci.h
deleted file mode 100644
index 5c6fada2b5105..0
--- a/lib/arm/asm/uapi-psci.h
+++ /dev/null
@@ -1,73 +0,0 @@
-#ifndef _ASMARM_UAPI_PSCI_H_
-#define _ASMARM_UAPI_PSCI_H_
-/*
- * From include/uapi/linux/psci.h
- */
-
-/* PSCI v0.2 interface */
-#define PSCI_0_2_FN_BASE   0x8400
-#define PSCI_0_2_FN(n) (PSCI_0_2_FN_BASE + (n))
-#define PSCI_0_2_64BIT 0x4000
-#define PSCI_0_2_FN64_BASE \
-   (PSCI_0_2_FN_BASE + PSCI_0_2_64BIT)
-#define PSCI_0_2_FN64(n)   (PSCI_0_2_FN64_BASE + (n))
-
-#define PSCI_0_2_FN_PSCI_VERSION   PSCI_0_2_FN(0)
-#define PSCI_0_2_FN_CPU_SUSPENDPSCI_0_2_FN(1)
-#define PSCI_0_2_FN_CPU_OFFPSCI_0_2_FN(2)
-#define PSCI_0_2_FN_CPU_ON PSCI_0_2_FN(3)
-#define PSCI_0_2_FN_AFFINITY_INFO  PSCI_0_2_FN(4)
-#define PSCI_0_2_FN_MIGRATEPSCI_0_2_FN(5)
-#define PSCI_0_2_FN_MIGRATE_INFO_TYPE  PSCI_0_2_FN(6)
-#define PSCI_0_2_FN_MIGRATE_INFO_UP_CPUPSCI_0_2_FN(7)
-#define PSCI_0_2_FN_SYSTEM_OFF PSCI_0_2_FN(8)
-#define PSCI_0_2_FN_SYSTEM_RESET   PSCI_0_2_FN(9)
-
-#define PSCI_0_2_FN64_CPU_SUSPEND  PSCI_0_2_FN64(1)
-#define PSCI_0_2_FN64_CPU_ON   PSCI_0_2_FN64(3)
-#define PSCI_0_2_FN64_AFFINITY_INFOPSCI_0_2_FN64(4)
-#define PSCI_0_2_FN64_MIGRATE  PSCI_0_2_FN64(5)
-#define PSCI_0_2_FN64_MIGRATE_INFO_UP_CPU  PSCI_0_2_FN64(7)
-
-/* PSCI v0.2 power state encoding for CPU_SUSPEND function */
-#define PSCI_0_2_POWER_STATE_ID_MASK   0x
-#define PSCI_0_2_POWER_STATE_ID_SHIFT  0
-#define PSCI_0_2_POWER_STATE_TYPE_SHIFT16
-#define PSCI_0_2_POWER_STATE_TYPE_MASK \
-   (0x1 << PSCI_0_2_POWER_STATE_TYPE_SHIFT)
-#define PSCI_0_2_POWER_STATE_AFFL_SHIFT24
-#define PSCI_0_2_POWER_STATE_AFFL_MASK \
-   (0x3 << PSCI_0_2_POWER_STATE_AFFL_SHIFT)
-
-/* PSCI v0.2 affinity level state returned by AFFINITY_INFO */
-#define PSCI_0_2_AFFINITY_LEVEL_ON 0
-#define PSCI_0_2_AFFINITY_LEVEL_OFF1
-#define PSCI_0_2_AFFINITY_LEVEL_ON_PENDING 2
-
-/* PSCI v0.2 multicore support in Trusted OS returned by MIGRATE_INFO_TYPE */
-#define PSCI_0_2_TOS_UP_MIGRATE0
-#define PSCI_0_2_TOS_UP_NO_MIGRATE 1
-#define PSCI_0_2_TOS_MP2
-
-/* PSCI version decoding (independent of PSCI version) */
-#define PSCI_VERSION_MAJOR_SHIFT   16
-#define PSCI_VERSION_MINOR_MASK\
-   ((1U << PSCI_VERSION_MAJOR_SHIFT) - 1)
-#define PSCI_VERSION_MAJOR_MASK~PSCI_VERSION_MINOR_MASK
-#define PSCI_VERSION_MAJOR(ver)\
-   (((ver) & PSCI_VERSION_MAJOR_MASK) >> PSCI_VERSION_MAJOR_SHIFT)
-#define PSCI_VERSION_MINOR(ver)\
-   ((ver) & PSCI_VERSION_MINOR_MASK)
-
-/* PSCI return values (inclusive of all PSCI versions) */
-#define PSCI_RET_SUCCESS   0
-#define PSCI_RET_NOT_SUPPORTED -1
-#define PSCI_RET_INVALID_PARAMS-2
-#define PSCI_RET_DENIED-3
-#define PSCI_RET_ALREADY_ON-4
-#define PSCI_RET_ON_PENDING-5
-#define PSCI_RET_INTERNAL_FAILURE  -6
-#define PSCI_RET_NOT_PRESENT   -7
-#define PSCI_RET_DISABLED  -8
-
-#endif /* _ASMARM_UAPI_PSCI_H_ */
diff --git a/lib/arm64/asm/uapi-psci.h b/lib/arm64/asm/uapi-psci.h
deleted file mode 100644
index 83d018f954e4c..0
--- a/lib/arm64/asm/uapi-psci.h
+++ /dev/null
@@ -1 +0,0 @@
-#include "../../arm/asm/uapi-psci.h"
-- 
2.4.3



[kvm-unit-tests PATCH 08/18] run_tests: pass test name to run script

2015-11-05 Thread Andrew Jones
With this, $TEST_DIR/run can output test-specific error messages.

Signed-off-by: Andrew Jones 
---
 run_tests.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/run_tests.sh b/run_tests.sh
index ebb7e9fe6fdfc..80b87823c3358 100755
--- a/run_tests.sh
+++ b/run_tests.sh
@@ -46,7 +46,7 @@ function run()
 fi
 done
 
-cmdline="./$TEST_DIR-run $kernel -smp $smp $opts"
+cmdline="TESTNAME=$testname ./$TEST_DIR-run $kernel -smp $smp $opts"
 if [ $verbose != 0 ]; then
 echo $cmdline
 fi
-- 
2.4.3



[kvm-unit-tests PATCH 12/18] lib/arm: add flush_tlb_page mmu function

2015-11-05 Thread Andrew Jones
From: Alex Bennée 

This introduces a new flush_tlb_page function which does exactly what
you expect. It's going to be useful for the future TLB torture test.

Signed-off-by: Alex Bennée 
Reviewed-by: Andrew Jones 
---
 lib/arm/asm/mmu.h   | 11 +++
 lib/arm64/asm/mmu.h |  8 
 2 files changed, 19 insertions(+)

diff --git a/lib/arm/asm/mmu.h b/lib/arm/asm/mmu.h
index c1bd01c9ee1b9..2bb0cde820f8a 100644
--- a/lib/arm/asm/mmu.h
+++ b/lib/arm/asm/mmu.h
@@ -14,8 +14,11 @@
 #define PTE_AF PTE_EXT_AF
 #define PTE_WBWA   L_PTE_MT_WRITEALLOC
 
+/* See B3.18.7 TLB maintenance operations */
+
 static inline void local_flush_tlb_all(void)
 {
+   /* TLBIALL */
asm volatile("mcr p15, 0, %0, c8, c7, 0" :: "r" (0));
dsb();
isb();
@@ -27,6 +30,14 @@ static inline void flush_tlb_all(void)
local_flush_tlb_all();
 }
 
+static inline void flush_tlb_page(unsigned long vaddr)
+{
+   /* TLBIMVAA */
+   asm volatile("mcr p15, 0, %0, c8, c7, 3" :: "r" (vaddr));
+   dsb();
+   isb();
+}
+
 #include 
 
 #endif /* __ASMARM_MMU_H_ */
diff --git a/lib/arm64/asm/mmu.h b/lib/arm64/asm/mmu.h
index 18b4d6be18fae..3bc31c91c36f8 100644
--- a/lib/arm64/asm/mmu.h
+++ b/lib/arm64/asm/mmu.h
@@ -19,6 +19,14 @@ static inline void flush_tlb_all(void)
isb();
 }
 
+static inline void flush_tlb_page(unsigned long vaddr)
+{
+   unsigned long page = vaddr >> 12;
+   dsb(ishst);
+   asm("tlbi   vaae1is, %0" :: "r" (page));
+   dsb(ish);
+}
+
 #include 
 
 #endif /* __ASMARM64_MMU_H_ */
-- 
2.4.3



[kvm-unit-tests PATCH 01/18] makefiles: use bash

2015-11-05 Thread Andrew Jones
Use bash in the makefiles, like we do in the scripts. Without
this some platforms using dash fail to execute make targets
that use bash-isms.

Signed-off-by: Andrew Jones 
---
 Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Makefile b/Makefile
index 0d5933474cd8c..3e60b4f8e4a57 100644
--- a/Makefile
+++ b/Makefile
@@ -1,4 +1,6 @@
 
+SHELL := /bin/bash
+
 ifeq ($(wildcard config.mak),)
 $(error run ./configure first. See ./configure -h)
 endif
-- 
2.4.3



[kvm-unit-tests PATCH 00/18] bunch of mostly trivial patches

2015-11-05 Thread Andrew Jones
Many of these patches were posted once. Some weren't, but anyway
almost everything is pretty trivial. I'd like to get these in, or
at least get definitive nacks on them (and then drop them) in order
to clean my queue before more patches (coming from Alex Bennée and
Christopher) are reposted.

All patches also available here
https://github.com/rhdrjones/kvm-unit-tests/commits/queue

Thanks,
drew


Alex Bennée (4):
  README: add some CONTRIBUTING notes
  configure: emit HOST=$host to config.mak
  lib/printf: support the %u unsigned fmt field
  lib/arm: add flush_tlb_page mmu function

Andrew Jones (13):
  makefiles: use bash
  trivial: lib: fail hard on failed mallocs
  trivial: alloc: don't use 'top' outside spinlock
  trivial: lib: missing extern in string.h
  README: add pointer to new wiki page
  run_tests: pass test name to run script
  arm/run: use ACCEL to choose between kvm and tcg
  run_tests: probe for max-smp
  arm/arm64: allow building a single test
  arm/arm64: generate map files
  lib: link in linux kernel headers (uapi)
  Revert "arm/arm64: import include/uapi/linux/psci.h"
  arm/arm64: uart0_init: check /chosen/stdout-path

Christopher Covington (1):
  arm: Fail on unknown subtest

 .gitignore   |  2 ++
 Makefile |  6 ++--
 README   | 32 +++
 arm/run  | 43 ++
 arm/selftest.c   |  3 ++
 arm/unittests.cfg|  7 +++--
 config/config-arm-common.mak |  9 +-
 configure| 11 +++
 lib/alloc.c  |  8 +++--
 lib/arm/asm/mmu.h| 11 +++
 lib/arm/asm/page.h   |  2 +-
 lib/arm/asm/psci.h   |  2 +-
 lib/arm/asm/uapi-psci.h  | 73 
 lib/arm/io.c | 36 --
 lib/arm64/asm/mmu.h  |  8 +
 lib/arm64/asm/page.h |  2 +-
 lib/arm64/asm/psci.h |  2 +-
 lib/arm64/asm/uapi-psci.h|  1 -
 lib/asm-generic/page.h   |  2 +-
 lib/const.h  | 11 ---
 lib/printf.c | 13 
 lib/string.h |  2 +-
 lib/virtio-mmio.c|  7 ++---
 run_tests.sh | 12 +++-
 scripts/functions.bash   |  8 +++--
 scripts/mkstandalone.sh  | 22 ++---
 x86/unittests.cfg|  1 +
 27 files changed, 210 insertions(+), 126 deletions(-)
 delete mode 100644 lib/arm/asm/uapi-psci.h
 delete mode 100644 lib/arm64/asm/uapi-psci.h
 delete mode 100644 lib/const.h

-- 
2.4.3



[kvm-unit-tests PATCH 07/18] configure: emit HOST=$host to config.mak

2015-11-05 Thread Andrew Jones
From: Alex Bennée 

This is useful information for the run scripts to know, especially if
they want to drop to using TCG.

Signed-off-by: Alex Bennée 
Reviewed-by: Andrew Jones 
---
 configure | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/configure b/configure
index b2ad32a3e3a52..078b70ce096a6 100755
--- a/configure
+++ b/configure
@@ -7,6 +7,7 @@ ld=ld
 objcopy=objcopy
 ar=ar
 arch=`uname -m | sed -e s/i.86/i386/ | sed -e 's/arm.*/arm/'`
+host=$arch
 cross_prefix=
 
 usage() {
@@ -122,6 +123,7 @@ ln -s $asm lib/asm
 cat < config.mak
 PREFIX=$prefix
 KERNELDIR=$(readlink -f $kerneldir)
+HOST=$host
 ARCH=$arch
 ARCH_NAME=$arch_name
 PROCESSOR=$processor
-- 
2.4.3



[kvm-unit-tests PATCH 09/18] arm/run: use ACCEL to choose between kvm and tcg

2015-11-05 Thread Andrew Jones
Inspired by a patch by Alex Bennée. This version uses a new
unittests.cfg variable and includes support for DRYRUN.

Signed-off-by: Andrew Jones 
---
 arm/run | 43 +--
 arm/unittests.cfg   |  4 +++-
 run_tests.sh|  3 ++-
 scripts/functions.bash  |  8 ++--
 scripts/mkstandalone.sh | 15 +++
 5 files changed, 59 insertions(+), 14 deletions(-)

diff --git a/arm/run b/arm/run
index 8cc2fa2571967..4a648697d7fb5 100755
--- a/arm/run
+++ b/arm/run
@@ -7,6 +7,42 @@ fi
 source config.mak
 processor="$PROCESSOR"
 
+if [ -c /dev/kvm ]; then
+   if [ "$HOST" = "arm" ] && [ "$ARCH" = "arm" ]; then
+   kvm_available=yes
+   elif [ "$HOST" = "aarch64" ]; then
+   kvm_available=yes
+   fi
+fi
+
+if [ "$ACCEL" = "kvm" ] && [ "$kvm_available" != "yes" ] &&
+   [ "$DRYRUN" != "yes" ]; then
+   printf "skip $TESTNAME (kvm only)\n\n"
+   exit 2
+fi
+
+if [ -z "$ACCEL" ]; then
+   if [ "$DRYRUN" = "yes" ]; then
+   # Output kvm with tcg fallback for dryrun (when both are
+   # allowed), since the command line we output may get used
+   # elsewhere.
+   ACCEL="kvm:tcg"
+   elif [ "$kvm_available" = "yes" ]; then
+   ACCEL="kvm"
+   else
+   ACCEL="tcg"
+   fi
+fi
+
+if [ "$ARCH" = "arm64" ]; then
+   if [[ $ACCEL =~ kvm ]]; then
+   # arm64 must use '-cpu host' with kvm, and we can't use
+   # '-cpu host' with tcg, so we force kvm-only (no fallback)
+   ACCEL="kvm"
+   processor="host"
+   fi
+fi
+
 qemu="${QEMU:-qemu-system-$ARCH_NAME}"
 qpath=$(which $qemu 2>/dev/null)
 
@@ -33,15 +69,10 @@ if $qemu $M -chardev testdev,id=id -initrd . 2>&1 \
exit 2
 fi
 
-M='-machine virt,accel=kvm:tcg'
 chr_testdev='-device virtio-serial-device'
 chr_testdev+=' -device virtconsole,chardev=ctd -chardev testdev,id=ctd'
 
-# arm64 must use '-cpu host' with kvm
-if [ "$(arch)" = "aarch64" ] && [ "$ARCH" = "arm64" ] && [ -c /dev/kvm ]; then
-   processor="host"
-fi
-
+M+=",accel=$ACCEL"
 command="$qemu $M -cpu $processor $chr_testdev"
 command+=" -display none -serial stdio -kernel"
 echo $command "$@"
diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index e068a0cdd9c1f..243c13301811b 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -3,8 +3,10 @@
 # file = foo.flat # Name of the flat file to be used
 # smp  = 2# Number of processors the VM will use during this test
 # extra_params = -append  # Additional parameters used
-# arch = arm/arm64   # Only if test case is specific to one
+# arch = arm|arm64   # Only if test case is specific to one
 # groups = group1 group2 # Used to identify test cases with run_tests -g ...
+# accel = kvm|tcg # Optionally specify if test must run with kvm or tcg.
+# # If not specified, then kvm will be used when available.
 
 #
 # Test that the configured number of processors (smp = ), and
diff --git a/run_tests.sh b/run_tests.sh
index 80b87823c3358..b1b4c541ecaea 100755
--- a/run_tests.sh
+++ b/run_tests.sh
@@ -20,6 +20,7 @@ function run()
 local opts="$5"
 local arch="$6"
 local check="$7"
+local accel="$8"
 
 if [ -z "$testname" ]; then
 return
@@ -46,7 +47,7 @@ function run()
 fi
 done
 
-cmdline="TESTNAME=$testname ./$TEST_DIR-run $kernel -smp $smp $opts"
+cmdline="TESTNAME=$testname ACCEL=$accel ./$TEST_DIR-run $kernel -smp $smp $opts"
 if [ $verbose != 0 ]; then
 echo $cmdline
 fi
diff --git a/scripts/functions.bash b/scripts/functions.bash
index 7ed5a517250bc..f13fe6f88f23d 100644
--- a/scripts/functions.bash
+++ b/scripts/functions.bash
@@ -10,12 +10,13 @@ function for_each_unittest()
local groups
local arch
local check
+   local accel
 
exec {fd}<"$unittests"
 
while read -u $fd line; do
if [[ "$line" =~ ^\[(.*)\]$ ]]; then
-   "$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$check"
+   "$cmd" "$testname" "$groups" "$smp" "$kernel" "$opts" "$arch" "$check" "$accel"
testname=${BASH_REMATCH[1]}
smp=1
kernel=""
@@ -23,6 +24,7 @@ function for_each_unittest()
groups=""
arch=""
check=""
+   accel=""
elif [[ $line =~ ^file\ *=\ *(.*)$ ]]; then
kernel=$TEST_DIR/${BASH_REMATCH[1]}
elif [[ $line =~ ^smp\ *=\ *(.*)$ ]]; then
@@ -35,8 +37,10 @@ function for_each_unittest()
arch=${BASH_REMATCH[1]}
elif [[ $line =~ ^check\ *=\ *(.*)$ ]]; then
check=${BASH_REMATCH[1]}
+   elif 

[kvm-unit-tests PATCH 04/18] trivial: lib: missing extern in string.h

2015-11-05 Thread Andrew Jones
Signed-off-by: Andrew Jones 
---
 lib/string.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/string.h b/lib/string.h
index 7820db86ee4e0..4e24f54d9e231 100644
--- a/lib/string.h
+++ b/lib/string.h
@@ -6,7 +6,7 @@ extern char *strcat(char *dest, const char *src);
 extern char *strcpy(char *dest, const char *src);
 extern int strcmp(const char *a, const char *b);
 extern char *strchr(const char *s, int c);
-char *strstr(const char *haystack, const char *needle);
+extern char *strstr(const char *haystack, const char *needle);
 extern void *memset(void *s, int c, size_t n);
 extern void *memcpy(void *dest, const void *src, size_t n);
 extern int memcmp(const void *s1, const void *s2, size_t n);
-- 
2.4.3



[kvm-unit-tests PATCH 06/18] README: add some CONTRIBUTING notes

2015-11-05 Thread Andrew Jones
From: Alex Bennée 

Signed-off-by: Alex Bennée 
Reviewed-by: Andrew Jones 
---
 README | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/README b/README
index 45587f2a97ec6..457bd797362cf 100644
--- a/README
+++ b/README
@@ -41,3 +41,29 @@ Directory structure:
 ./:  the sources of the tests and the created objects/images
 
 See /README for architecture specific documentation.
+
+CONTRIBUTING:
+=
+
+Style
+-
+
+Currently there is a mix of indentation styles so any changes to
+existing files should be consistent with the existing style. For new
+files:
+
+  - C: please use standard linux-with-tabs
+  - Shell: use TABs for indentation
+
+Patches
+---
+
+Patches are welcome at the KVM mailing list .
+
+Please prefix messages with: [kvm-unit-tests PATCH]
+
+You can add the following to .git/config to do this automatically for you:
+
+[format]
+   subjectprefix = kvm-unit-tests PATCH
+
-- 
2.4.3



[kvm-unit-tests PATCH 03/18] trivial: alloc: don't use 'top' outside spinlock

2015-11-05 Thread Andrew Jones
This is a fix just due to being too much of a type-A person.
I noticed the issue while reading over the function, and
decided to fix it, even though it's unlikely to be a problem
ever because top is read-mostly (like written once, then only
read, type of mostly).
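
The pattern of snapshotting a shared value only after taking the lock can be
sketched like this. It is an illustration using C11 atomics, not the
allocator's actual code: lock_sketch, top_sketch and read_top_clamped() are
hypothetical names, and the initial value of top_sketch is arbitrary.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for the allocator's lock and shared limit. */
static atomic_flag lock_sketch = ATOMIC_FLAG_INIT;
static uint64_t top_sketch = 1ULL << 33;

static void lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&lock_sketch,
                                             memory_order_acquire))
        ; /* spin until the flag was previously clear */
}

static void lock_release(void)
{
    atomic_flag_clear_explicit(&lock_sketch, memory_order_release);
}

/*
 * Read the shared limit only while holding the lock, then clamp the
 * local snapshot to 4G if requested.
 */
uint64_t read_top_clamped(int clamp_to_4g)
{
    uint64_t top_safe;

    lock_acquire();
    top_safe = top_sketch;
    if (clamp_to_4g && top_safe > (1ULL << 32))
        top_safe = 1ULL << 32;
    lock_release();
    return top_safe;
}
```

Every access to the shared variable sits between lock_acquire() and
lock_release(), which is the property the patch restores for 'top'.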

Signed-off-by: Andrew Jones 
---
 lib/alloc.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/lib/alloc.c b/lib/alloc.c
index ad6761430c965..34f71a337d868 100644
--- a/lib/alloc.c
+++ b/lib/alloc.c
@@ -61,15 +61,17 @@ static phys_addr_t phys_alloc_aligned_safe(phys_addr_t size,
 {
static bool warned = false;
phys_addr_t addr, size_orig = size;
-   u64 top_safe = top;
+   u64 top_safe;
+
+   spin_lock();
+
+   top_safe = top;
 
if (safe && sizeof(long) == 4)
top_safe = MIN(top, 1ULL << 32);
 
align = MAX(align, align_min);
 
-   spin_lock();
-
addr = ALIGN(base, align);
size += addr - base;
 
-- 
2.4.3



Re: [PATCH] KVM: arm: Don't try to flush hyp-mode device mappings

2015-11-05 Thread kbuild test robot
Hi Christoffer,

[auto build test WARNING on: kvmarm/next]
[also build test WARNING on: v4.3 next-20151105]

url:
https://github.com/0day-ci/linux/commits/Christoffer-Dall/KVM-arm-Don-t-try-to-flush-hyp-mode-device-mappings/20151105-232548
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm.git next
config: arm-axm55xx_defconfig (attached as .config)
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm 

All warnings (new ones prefixed by >>):

   arch/arm/kvm/mmu.c: In function 'unmap_ptes':
>> arch/arm/kvm/mmu.c:212:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
  pte_t old_pte = *pte;
  ^
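
The warning fires because a declaration now follows a statement (the early
"continue") inside the same block. One common way to satisfy C90, shown here
on a self-contained toy function rather than on the kernel code, is to declare
the variable at the top of the block and assign it later. count_nonzero() is a
hypothetical example, not a proposed fix for mmu.c.

```c
/*
 * Count non-zero entries. 'old' is declared at the top of the loop body,
 * before any statement, so the code is valid C90 and does not trigger
 * -Wdeclaration-after-statement.
 */
int count_nonzero(const int *entries, int n)
{
    int i, count = 0;

    for (i = 0; i < n; i++) {
        int old;            /* declaration comes first ... */

        if (entries[i] == 0)
            continue;

        old = entries[i];   /* ... assignment after the early continue */
        if (old != 0)
            count++;
    }
    return count;
}
```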

vim +212 arch/arm/kvm/mmu.c

363ef89f8 Marc Zyngier 2014-12-19  196   *
363ef89f8 Marc Zyngier 2014-12-19  197   * This is why right after unmapping a page/section and invalidating
363ef89f8 Marc Zyngier 2014-12-19  198   * the corresponding TLBs, we call kvm_flush_dcache_p*() to make sure
363ef89f8 Marc Zyngier 2014-12-19  199   * the IO subsystem will never hit in the cache.
363ef89f8 Marc Zyngier 2014-12-19  200   */
4f853a714 Christoffer Dall 2014-05-09  201  static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
4f853a714 Christoffer Dall 2014-05-09  202 phys_addr_t addr, phys_addr_t end)
4f728276f Marc Zyngier 2013-04-12  203  {
4f853a714 Christoffer Dall 2014-05-09  204  phys_addr_t start_addr = addr;
4f853a714 Christoffer Dall 2014-05-09  205  pte_t *pte, *start_pte;
4f853a714 Christoffer Dall 2014-05-09  206  
4f853a714 Christoffer Dall 2014-05-09  207  start_pte = pte = pte_offset_kernel(pmd, addr);
4f853a714 Christoffer Dall 2014-05-09  208  do {
8e9a96138 Christoffer Dall 2015-11-05  209  if (pte_none(*pte))
8e9a96138 Christoffer Dall 2015-11-05  210  continue;
8e9a96138 Christoffer Dall 2015-11-05  211  
363ef89f8 Marc Zyngier 2014-12-19 @212  pte_t old_pte = *pte;
363ef89f8 Marc Zyngier 2014-12-19  213  
4f728276f Marc Zyngier 2013-04-12  214  kvm_set_pte(pte, __pte(0));
d4cb9df5d Marc Zyngier 2013-05-14  215  kvm_tlb_flush_vmid_ipa(kvm, addr);
363ef89f8 Marc Zyngier 2014-12-19  216  
363ef89f8 Marc Zyngier 2014-12-19  217  /* No need to invalidate the cache for device mappings */
8e9a96138 Christoffer Dall 2015-11-05  218  if ((pte_val(old_pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE &&
8e9a96138 Christoffer Dall 2015-11-05  219  (pte_val(old_pte) & PAGE_HYP_DEVICE) != PAGE_HYP_DEVICE)
363ef89f8 Marc Zyngier 2014-12-19  220  kvm_flush_dcache_pte(old_pte);

:: The code at line 212 was first introduced by commit
:: 363ef89f8e9bcedc28b976d0fe2d858fe139c122 arm/arm64: KVM: Invalidate data cache on unmap

:: TO: Marc Zyngier <marc.zyng...@arm.com>
:: CC: Christoffer Dall <christoffer.d...@linaro.org>

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data


Re: [PATCH v1 2/2] dma-mapping-common: add DMA attribute - DMA_ATTR_IOMMU_BYPASS

2015-11-05 Thread David Miller
From: Joerg Roedel 
Date: Thu, 5 Nov 2015 14:42:06 +0100

> Contended IOMMU locks are not only a problem on SPARC, but on x86 and
> various other IOMMU drivers too. But I have some ideas on how to improve
> the situation there.

And for the record Sowmini fixed a lot of the lock contention:

commit ff7d37a502022149655c18035b99a53391be0383
Author: Sowmini Varadhan 
Date:   Thu Apr 9 15:33:30 2015 -0400

Break up monolithic iommu table/lock into finer graularity pools and lock

Investigation of multithreaded iperf experiments on an ethernet
interface show the iommu->lock as the hottest lock identified by
lockstat, with something of the order of  21M contentions out of
27M acquisitions, and an average wait time of 26 us for the lock.
This is not efficient. A more scalable design is to follow the ppc
model, where the iommu_map_table has multiple pools, each stretching
over a segment of the map, and with a separate lock for each pool.
This model allows for better parallelization of the iommu map search.

This patch adds the iommu range alloc/free function infrastructure.

Signed-off-by: Sowmini Varadhan 
Acked-by: Benjamin Herrenschmidt 
Signed-off-by: David S. Miller 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
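
For readers following the thread, the pool-based scheme described in the commit message above can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel's iommu code: the names map_table, map_table_alloc, NPOOLS and POOL_ENTRIES are hypothetical, and a C11 atomic_flag spinlock stands in for the kernel spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NPOOLS        4    /* hypothetical number of pools */
#define POOL_ENTRIES 64    /* hypothetical entries per pool */

struct pool {
    atomic_flag lock;              /* protects only this pool's segment */
    bool used[POOL_ENTRIES];
};

struct map_table {
    struct pool pools[NPOOLS];     /* each pool covers one segment of the map */
};

static void pool_lock(struct pool *p)
{
    while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
        ;                          /* spin until the flag was previously clear */
}

static void pool_unlock(struct pool *p)
{
    atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

static void map_table_init(struct map_table *t)
{
    for (size_t p = 0; p < NPOOLS; p++) {
        atomic_flag_clear(&t->pools[p].lock);
        for (size_t i = 0; i < POOL_ENTRIES; i++)
            t->pools[p].used[i] = false;
    }
}

/*
 * Allocate one entry, starting with the pool selected by 'hint' (for
 * example a CPU id), so that concurrent callers mostly take different
 * locks. Returns a global entry index, or -1 if every pool is full.
 */
static long map_table_alloc(struct map_table *t, unsigned int hint)
{
    for (size_t n = 0; n < NPOOLS; n++) {
        size_t p = (hint + n) % NPOOLS;
        struct pool *pool = &t->pools[p];
        long found = -1;

        pool_lock(pool);
        for (size_t i = 0; i < POOL_ENTRIES; i++) {
            if (!pool->used[i]) {
                pool->used[i] = true;
                found = (long)(p * POOL_ENTRIES + i);
                break;
            }
        }
        pool_unlock(pool);
        if (found >= 0)
            return found;
    }
    return -1;
}

static void map_table_free(struct map_table *t, long entry)
{
    assert(entry >= 0 && entry < (long)(NPOOLS * POOL_ENTRIES));
    struct pool *pool = &t->pools[entry / POOL_ENTRIES];

    pool_lock(pool);
    pool->used[entry % POOL_ENTRIES] = false;
    pool_unlock(pool);
}
```

With a single lock, every allocation serializes on it; here two callers with different hints only contend when one pool is exhausted and the search spills into a neighbour.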


[kvm-unit-tests PATCH 13/18] arm: Fail on unknown subtest

2015-11-05 Thread Andrew Jones
From: Christopher Covington 

Signed-off-by: Christopher Covington 
Reviewed-by: Andrew Jones 
---
 arm/selftest.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arm/selftest.c b/arm/selftest.c
index fc9ec609d875e..f4a503079e464 100644
--- a/arm/selftest.c
+++ b/arm/selftest.c
@@ -376,6 +376,9 @@ int main(int argc, char **argv)
 		cpumask_set_cpu(0, &smp_reported);
 		while (!cpumask_full(&smp_reported))
 			cpu_relax();
+	} else {
+		printf("Unknown subtest\n");
+		abort();
 	}
 
 	return report_summary();
-- 
2.4.3



Re: [PATCH v3 2/3] KVM/arm/arm64: enable enhanced armv7 fp/simd lazy switch

2015-11-05 Thread Mario Smarduch


On 11/5/2015 6:48 AM, Christoffer Dall wrote:
> On Fri, Oct 30, 2015 at 02:56:32PM -0700, Mario Smarduch wrote:
>> This patch tracks vfp/simd hardware state with a vcpu lazy flag. vCPU lazy
>> flag is set on guest access and traps to vfp/simd hardware switch handler.
>> On vm-enter if lazy flag is set skip trap enable and save host fpexc. On
>> vm-exit if flag is set skip hardware context switch and return to host with
>> guest context. In vcpu_put check if vcpu lazy flag is set, and execute a
>> hardware context switch to restore host.
>>
>> Also some arm64 field and empty function are added to compile for arm64.
>>
>> Signed-off-by: Mario Smarduch 
>> ---
>>  arch/arm/include/asm/kvm_host.h   |  1 +
>>  arch/arm/kvm/arm.c                |  6
>>  arch/arm/kvm/interrupts.S         | 60
>>  arch/arm/kvm/interrupts_head.S    | 14
>>  arch/arm64/include/asm/kvm_host.h |  4 +++
>>  5 files changed, 63 insertions(+), 22 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>> index f1bf551..a9e86e0 100644
>> --- a/arch/arm/include/asm/kvm_host.h
>> +++ b/arch/arm/include/asm/kvm_host.h
>> @@ -227,6 +227,7 @@ int kvm_perf_teardown(void);
>>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>>  
>>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
>> +void kvm_restore_host_vfp_state(struct kvm_vcpu *);
>>  
>>  static inline void kvm_arch_hardware_disable(void) {}
>>  static inline void kvm_arch_hardware_unsetup(void) {}
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index dc017ad..11a56fe 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -296,6 +296,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>  {
>>  /*
>> + * If fp/simd registers are dirty save guest, restore host before
> 
> If the fp/simd registers are dirty, then restore the host state before
I'd drop 'releasing the cpu', the vcpu thread may be returning to
user mode.
> 
>> + * releasing the cpu.
>> + */
>> +if (vcpu->arch.vfp_dirty)
>> +kvm_restore_host_vfp_state(vcpu);
>> +/*
>>   * The arch-generic KVM code expects the cpu field of a vcpu to be -1
>>   * if the vcpu is no longer assigned to a cpu.  This is used for the
>>   * optimized make_all_cpus_request path.
>> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
>> index 900ef6d..ca25314 100644
>> --- a/arch/arm/kvm/interrupts.S
>> +++ b/arch/arm/kvm/interrupts.S
>> @@ -28,6 +28,32 @@
>>  #include "interrupts_head.S"
>>  
>>  .text
>> +/**
>> + * void kvm_restore_host_vfp_state(struct vcpu *vcpu) - Executes lazy
> 
> nit: Can you move the multi-line description of the function into a
> separate paragraph?
Sure.
> 
>> + *  fp/simd switch, saves the guest, restores host. Called from host
>> + *  mode, placed outside of hyp region start/end.
> 
> Put the description in a separate paragraph and get rid of the "executes
> lazy fp/simd switch" part, that doesn't help understanding.  Just say
> that this function restores the host state.
Sure.
> 
>> + */
>> +ENTRY(kvm_restore_host_vfp_state)
>> +#ifdef CONFIG_VFPv3
>> +	push	{r4-r7}
>> +
>> +	add	r7, vcpu, #VCPU_VFP_GUEST
>> +	store_vfp_state r7
>> +
>> +	add	r7, vcpu, #VCPU_VFP_HOST
>> +	ldr	r7, [r7]
>> +	restore_vfp_state r7
>> +
>> +	ldr	r3, [vcpu, #VCPU_VFP_HOST_FPEXC]
>> +	VFPFMXR FPEXC, r3
>> +
>> +	mov	r3, #0
>> +	strb	r3, [vcpu, #VCPU_VFP_DIRTY]
>> +
>> +	pop	{r4-r7}
>> +#endif
>> +	bx	lr
>> +ENDPROC(kvm_restore_host_vfp_state)
>>  
>>  __kvm_hyp_code_start:
>>  .globl __kvm_hyp_code_start
>> @@ -119,11 +145,16 @@ ENTRY(__kvm_vcpu_run)
>>  @ If the host kernel has not been configured with VFPv3 support,
>>  @ then it is safer if we deny guests from using it as well.
>>  #ifdef CONFIG_VFPv3
>> -@ Set FPEXC_EN so the guest doesn't trap floating point instructions
>> +@ fp/simd register file has already been accessed, so skip host fpexc
>> +@ save and access trap enable.
>> +vfp_inlazy_mode r7, skip_guest_vfp_trap
> 
> So, why do we need to touch this register at all on every CPU exit?
> 
> Is it not true that we can only be in one of two states:
>  1) The register file is not dirty (not touched by the guest) and we
> should trap
>  2) The register file is dirty, and we should not trap to EL2?
> 
> Only in the first case do we need to set the FPEXC, and couldn't we just
> do that on vcpu_load and get rid of all this?  (except HCPTR_TCP which
> we still need to adjust).

I'm trying to think what happens if you're preempted after you saved
the FPEXC and set the FPEXC_EN bit in kvm_arch_vcpu_load(). Could some
thread pick up a bad FPEXC? It may be possible to undo this in vcpu_put().
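
To make the lazy-switch states under discussion easier to follow, here is a toy C model of the flag handling described in the patch description. It is a sketch under heavy assumptions, not the patch's assembly: a single integer stands in for the whole fp/simd register file, and fp_hw, vcpu_model and the model_* functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy hardware: one integer stands in for the fp/simd register file. */
struct fp_hw {
    int regs;           /* current contents of the hardware registers */
    bool trap_enabled;  /* guest fp/simd accesses trap to the hypervisor */
};

struct vcpu_model {
    int guest_fp;       /* saved guest fp/simd state */
    int host_fp;        /* saved host fp/simd state */
    bool vfp_dirty;     /* lazy flag: guest state is live in hardware */
};

/* vm-enter: arm the trap only while the guest has not touched fp/simd. */
static void model_vm_enter(struct fp_hw *hw, const struct vcpu_model *v)
{
    hw->trap_enabled = !v->vfp_dirty;
}

/* First guest fp/simd access: the trap handler switches to guest state. */
static void model_guest_fp_access(struct fp_hw *hw, struct vcpu_model *v)
{
    if (hw->trap_enabled) {
        v->host_fp = hw->regs;     /* save host state */
        hw->regs = v->guest_fp;    /* load guest state */
        v->vfp_dirty = true;
        hw->trap_enabled = false;
    }
}

/* vcpu_put: restore the host only if the guest actually used fp/simd. */
static void model_vcpu_put(struct fp_hw *hw, struct vcpu_model *v)
{
    if (v->vfp_dirty) {
        v->guest_fp = hw->regs;    /* save guest state */
        hw->regs = v->host_fp;     /* restore host state */
        v->vfp_dirty = false;
    }
}
```

The model deliberately leaves out FPEXC and preemption, which is exactly the part being questioned here.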

> 
>> +
>>  VFPFMRX r2, FPEXC   

Re: [PATCH v1 2/2] dma-mapping-common: add DMA attribute - DMA_ATTR_IOMMU_BYPASS

2015-11-05 Thread David Miller
From: David Woodhouse 
Date: Thu, 29 Oct 2015 22:35:25 +

> For the receive side, it shouldn't be beyond the wit of man to
> introduce an API which allocates *and* DMA-maps a skb. Pass it to
> netif_rx() still mapped, with a destructor that just shoves it back in
> a pool for re-use.

For forwarding, the SKB is going to another device to be DMA'd,
perhaps via another IOMMU.

For local connections, it's going to sit for an unpredictable and
unbounded amount of time in the socket queue.

We've been through this thought process before, believe me :)


Re: [GIT PULL v4 0/3] dma and virtio prep patches

2015-11-05 Thread Andy Lutomirski
On Thu, Nov 5, 2015 at 12:08 PM, Christian Borntraeger
 wrote:
> Andy,
>
> to make it obvious which version is the latest, here is a branch
>
> The following changes since commit 6a13feb9c82803e2b815eca72fa7a9f5561d7861:
>
>   Linux 4.3 (2015-11-01 16:05:25 -0800)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux.git  dma
>
> for you to fetch changes up to fc7f9754db6ce0c12281da4055281f731d36bdee:
>
>   s390/dma: Allow per device dma ops (2015-11-05 21:02:40 +0100)

Pulled, thanks.

--Andy


Re: [PATCH v7 20/35] dimm: get mapped memory region from DIMMDeviceClass->get_memory_region

2015-11-05 Thread Xiao Guangrong



On 11/06/2015 01:29 AM, Eduardo Habkost wrote:
> On Mon, Nov 02, 2015 at 05:13:22PM +0800, Xiao Guangrong wrote:
> [...]
>>   static MemoryRegion *pc_dimm_get_memory_region(DIMMDevice *dimm)
>>   {
>> -    return host_memory_backend_get_memory(dimm->hostmem, &error_abort);
>> +    Error *local_err = NULL;
>> +    MemoryRegion *mr;
>> +
>> +    mr = host_memory_backend_get_memory(dimm->hostmem, &local_err);
>> +
>> +    /*
>> +     * plug a pc-dimm device whose backend memory was not properly
>> +     * initialized?
>> +     */
>> +    assert(!local_err && mr);
>
> I don't know if you are going to remove the errp parameter in the next
> version, but if you want to simply abort in case an error is reported by
> a function, you can use &error_abort.

Thank you, Eduardo! Let's happily drop the unused errp parameter in
host_memory_backend_get_memory in the next version. :)


[PATCH] KVM: PPC: Book3S HV: Don't dynamically split core when already split

2015-11-05 Thread Paul Mackerras
In static micro-threading modes, the dynamic micro-threading code
is supposed to be disabled, because subcores can't make independent
decisions about what micro-threading mode to put the core in - there is
only one micro-threading mode for the whole core.  The code that
implements dynamic micro-threading checks for this, except that the
check was missed in one case.  This means that it is possible for a
subcore in static 2-way micro-threading mode to try to put the core
into 4-way micro-threading mode, which usually leads to stuck CPUs,
spinlock lockups, and other stalls in the host.

The problem was in the can_split_piggybacked_subcores() function, which
should always return false if the system is in a static micro-threading
mode.  This fixes the problem by making can_split_piggybacked_subcores()
use subcore_config_ok() for its checks, as subcore_config_ok() includes
the necessary check for the static micro-threading modes.

Credit to Gautham Shenoy for working out that the reason for the hangs
and stalls we were seeing was that we were trying to do dynamic 4-way
micro-threading while we were in static 2-way mode.

Fixes: b4deba5c41e9
Cc: v...@stable.kernel.org # v4.3
Signed-off-by: Paul Mackerras 
---
 arch/powerpc/kvm/book3s_hv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2280497..becad3a 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2060,7 +2060,7 @@ static bool can_split_piggybacked_subcores(struct core_info *cip)
 			return false;
 		n_subcores += (cip->subcore_threads[sub] - 1) >> 1;
 	}
-	if (n_subcores > 3 || large_sub < 0)
+	if (large_sub < 0 || !subcore_config_ok(n_subcores + 1, 2))
 		return false;
 
 	/*
-- 
2.6.2

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
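
The effect of the one-line change can be illustrated with a small standalone model. The body of subcore_config_ok() is not shown in the patch, so model_subcore_config_ok() below is a simplified stand-in written for this example; MAX_SMT_THREADS, MAX_SUBCORES and the threads_per_subcore parameter (a value below MAX_SMT_THREADS models a static micro-threading mode) are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SMT_THREADS 8   /* assumed number of threads per core */
#define MAX_SUBCORES    4   /* assumed maximum number of subcores */

/*
 * Simplified stand-in for subcore_config_ok(). The key property taken
 * from the commit message: a dynamic split is refused whenever the core
 * is already statically split.
 */
static bool model_subcore_config_ok(int n_subcores, int n_threads,
                                    int threads_per_subcore)
{
    if (n_subcores > 1 && threads_per_subcore < MAX_SMT_THREADS)
        return false;       /* already in a static micro-threading mode */
    if (n_subcores > MAX_SUBCORES)
        return false;
    return n_subcores * n_threads <= MAX_SMT_THREADS;
}

/* Final check before the patch: no test for static modes. */
static bool model_can_split_old(int n_subcores, int large_sub)
{
    return !(n_subcores > 3 || large_sub < 0);
}

/* Final check after the patch: delegates to subcore_config_ok(). */
static bool model_can_split_new(int n_subcores, int large_sub,
                                int threads_per_subcore)
{
    return !(large_sub < 0 ||
             !model_subcore_config_ok(n_subcores + 1, 2, threads_per_subcore));
}
```

In this model the old check accepts a split with three subcores even when threads_per_subcore is 4 (static 2-way mode), while the new check rejects it and still accepts the same request on an unsplit core.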


Re: kvm: irqchip: fix memory leak in -stable

2015-11-05 Thread Greg KH
On Wed, Nov 04, 2015 at 02:11:54PM +0100, William Dauchy wrote:
> Hello stable release team,
> 
> The commit ba60c41 kvm: irqchip: fix memory leak
> is fixing commit e73f61e kvm: irqchip: Break up high order allocations of 
> kvm_irq_routing_table
> 
> I believe commit ba60c41 kvm: irqchip: fix memory leak
> is a good candidate for -stable. I also  got an agreement from Paolo.

Now queued up.


Re: [PATCH 3/3] KVM/arm64: enable enhanced armv8 fp/simd lazy switch

2015-11-05 Thread Mario Smarduch


On 11/5/2015 7:02 AM, Christoffer Dall wrote:
> On Fri, Oct 30, 2015 at 02:56:33PM -0700, Mario Smarduch wrote:
>> This patch enables arm64 lazy fp/simd switch, similar to arm described in
>> second patch. Change from previous version - restore function is moved to
>> host. 
>>
>> Signed-off-by: Mario Smarduch 
>> ---
>>  arch/arm64/include/asm/kvm_host.h |  2 +-
>>  arch/arm64/kernel/asm-offsets.c   |  1 +
>>  arch/arm64/kvm/hyp.S  | 37 +++--
>>  3 files changed, 33 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 26a2347..dcecf92 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -251,11 +251,11 @@ static inline void kvm_arch_hardware_unsetup(void) {}
>>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>> -static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
>>  
>>  void kvm_arm_init_debug(void);
>>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
>>  void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
>>  void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
>> +void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
>>  
>>  #endif /* __ARM64_KVM_HOST_H__ */
>> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
>> index 8d89cf8..c9c5242 100644
>> --- a/arch/arm64/kernel/asm-offsets.c
>> +++ b/arch/arm64/kernel/asm-offsets.c
>> @@ -124,6 +124,7 @@ int main(void)
>>    DEFINE(VCPU_HCR_EL2,          offsetof(struct kvm_vcpu, arch.hcr_el2));
>>    DEFINE(VCPU_MDCR_EL2,         offsetof(struct kvm_vcpu, arch.mdcr_el2));
>>    DEFINE(VCPU_IRQ_LINES,        offsetof(struct kvm_vcpu, arch.irq_lines));
>> +  DEFINE(VCPU_VFP_DIRTY,        offsetof(struct kvm_vcpu, arch.vfp_dirty));
>>    DEFINE(VCPU_HOST_CONTEXT,     offsetof(struct kvm_vcpu, arch.host_cpu_context));
>>    DEFINE(VCPU_HOST_DEBUG_STATE, offsetof(struct kvm_vcpu, arch.host_debug_state));
>>    DEFINE(VCPU_TIMER_CNTV_CTL,   offsetof(struct kvm_vcpu, arch.timer_cpu.cntv_ctl));
>> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
>> index e583613..ed2c4cf 100644
>> --- a/arch/arm64/kvm/hyp.S
>> +++ b/arch/arm64/kvm/hyp.S
>> @@ -36,6 +36,28 @@
>>  #define CPU_SYSREG_OFFSET(x)(CPU_SYSREGS + 8*x)
>>  
>>  .text
>> +
>> +/**
>> + * void kvm_restore_host_vfp_state(struct vcpu *vcpu) - Executes lazy
>> + *  fp/simd switch, saves the guest, restores host. Called from host
>> + *  mode, placed outside of hyp section.
> 
> same comments on style as previous patch
Got it.
> 
>> + */
>> +ENTRY(kvm_restore_host_vfp_state)
>> +	push	xzr, lr
>> +
>> +	add	x2, x0, #VCPU_CONTEXT
>> +	mov	w3, #0
>> +	strb	w3, [x0, #VCPU_VFP_DIRTY]
> 
> I've been discussing with myself if it would make more sense to clear
> the dirty flag in the C-code...
Since all the work is done here I placed it here.
> 
>> +
>> +	bl __save_fpsimd
>> +
>> +	ldr	x2, [x0, #VCPU_HOST_CONTEXT]
>> +	bl __restore_fpsimd
>> +
>> +	pop	xzr, lr
>> +	ret
>> +ENDPROC(kvm_restore_host_vfp_state)
>> +
>>  	.pushsection	.hyp.text, "ax"
>>  	.align	PAGE_SHIFT
>>  
>> @@ -482,7 +504,11 @@
>>  99:
>>  msr hcr_el2, x2
>>  mov x2, #CPTR_EL2_TTA
>> +
>> +	ldrb	w3, [x0, #VCPU_VFP_DIRTY]
>> +	tbnz	w3, #0, 98f
>>  orr x2, x2, #CPTR_EL2_TFP
>> +98:
> 
> mmm, don't you need to only set the fpexc32 when you're actually going
> to trap the guest accesses?

My understanding is you always need to set the enable bit in fpexc32 for
32-bit guests, otherwise EL1 would get the trap instead of EL2. Not sure if
that's the point you're making.

> 
> also, you can consider only setting this in vcpu_load (jumping quickly
> to EL2 to do so) if we're running a 32-bit guest.  Probably worth
> measuring the difference between the extra EL2 jump on vcpu_load
> compared to hitting this register on every entry/exit.

Sure, makes sense since this is a hot code path.
> 
> Code-wise, it will be nicer to do it on vcpu_load.
> 
>>  msr cptr_el2, x2
>>  
>>  mov x2, #(1 << 15)  // Trap CP15 Cr=15
>> @@ -669,14 +695,12 @@ __restore_debug:
>>  ret
>>  
>>  __save_fpsimd:
>> -skip_fpsimd_state x3, 1f
>>  save_fpsimd
>> -1:  ret
>> +ret
>>  
>>  __restore_fpsimd:
>> -skip_fpsimd_state x3, 1f
>>  restore_fpsimd
>> -1:  ret
>> +ret
>>  
>>  switch_to_guest_fpsimd:
>>  	push	x4, lr
>> @@ -688,6 +712,9 @@ switch_to_guest_fpsimd:
>>  
>>  mrs x0, tpidr_el2
>>  
>> +	mov	w2, #1
>> +	strb	w2, [x0, #VCPU_VFP_DIRTY]
> 
> hmm, just noticing this.  Are you not writing a 32-bit value to a
> potentially 8-bit field (ignoring padding in the struct), as the dirty
> flag is declared a 

Re: [PATCH v3 2/3] target-i386: calculate vcpu's TSC rate to be migrated

2015-11-05 Thread haozhong . zhang
On 11/05/15 14:05, Eduardo Habkost wrote:
> On Thu, Nov 05, 2015 at 09:30:51AM +0800, Haozhong Zhang wrote:
> > On 11/04/15 19:42, Eduardo Habkost wrote:
> > > On Mon, Nov 02, 2015 at 05:26:42PM +0800, Haozhong Zhang wrote:
> > > > The value of the migrated vcpu's TSC rate is determined as below.
> > > >  1. If a TSC rate is specified by the cpu option 'tsc-freq', then this
> > > > user-specified value will be used.
> > > >  2. If neither a user-specified TSC rate nor a migrated TSC rate is
> > > > present, we will use the TSC rate from KVM (returned by
> > > > KVM_GET_TSC_KHZ).
> > > >  3. Otherwise, we will use the migrated TSC rate.
> > > > 
> > > > Signed-off-by: Haozhong Zhang 
> > > [...]
> > > > diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> > > > index 64046cb..aae5e58 100644
> > > > --- a/target-i386/kvm.c
> > > > +++ b/target-i386/kvm.c
> > > > @@ -3034,3 +3034,36 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
> > > >  {
> > > >  abort();
> > > >  }
> > > > +
> > > > +int kvm_arch_setup_tsc_khz(CPUState *cs)
> > > > +{
> > > > +X86CPU *cpu = X86_CPU(cs);
> > > > +CPUX86State *env = &cpu->env;
> > > > +int r;
> > > > +
> > > > +/*
> > > > + * Prepare vcpu's TSC rate to be migrated.
> > > > + *
> > > > + * - If the user specifies the TSC rate by cpu option 'tsc-freq',
> > > > + *   we will use the user-specified value.
> > > > + *
> > > > + * - If there is neither user-specified TSC rate nor migrated TSC
> > > > + *   rate, we will ask KVM for the TSC rate by calling
> > > > + *   KVM_GET_TSC_KHZ.
> > > > + *
> > > > + * - Otherwise, if there is a migrated TSC rate, we will use the
> > > > + *   migrated value.
> > > > + */
> > > > +if (env->tsc_khz) {
> > > > +env->tsc_khz_saved = env->tsc_khz;
> > > > +} else if (!env->tsc_khz_saved) {
> > > > +r = kvm_vcpu_ioctl(cs, KVM_GET_TSC_KHZ);
> > > > +if (r < 0) {
> > > > +fprintf(stderr, "KVM_GET_TSC_KHZ failed\n");
> > > > +return r;
> > > > +}
> > > 
> > > The lack of KVM_CAP_GET_TSC_KHZ should make QEMU abort, unless the user
> > > is explicitly requesting a more strict mode where the TSC frequency will
> > > be guaranteed to never change.
> > >
> > 
> > I agree KVM_CAP_GET_TSC_KHZ should be checked before KVM_GET_TSC_KHZ,
> > but I don't think the lack of it should abort QEMU.
> 
> 
> Oops, I meant to write: "the lack of KVM_CAP_GET_TSC_KHZ should not
> abort QEMU".
> 
> > This piece of code
> > on the source machine is just to get the TSC frequency to be
> > migrated. If it fails, it will leave env->tsc_khz_saved be 0. And
> > according to tsc_khz_needed() in patch 1, if env->tsc_khz_saved == 0,
> > no TSC frequency will be migrated. So the lack of KVM_CAP_GET_TSC_KHZ
> > only hurts the migration and does not need to abort QEMU on the source
> > machine.
> 
> The lack of KVM_CAP_GET_TSC_KHZ shouldn't prevent migration either. but
> it looks your code is not doing that: errors from
> kvm_arch_setup_tsc_khz() are being ignored by
> do_kvm_cpu_synchronize_post_init(), sorry for the noise.
>
> > 
> > > > +env->tsc_khz_saved = r;
> > > > +}
> > > 
> > > Why do you need a separate tsc_khz_saved field, and don't simply use
> > > tsc_khz? It would have the additional feature of letting QMP clients
> > > query the current TSC rate by asking for the tsc-freq property on CPU
> > > objects.
> > >
> > 
> > It's to avoid overriding env->tsc_khz on the destination in the
> > migration. I can change this line to
> >  env->tsc_khz = env->tsc_khz_saved = r;
> 
> You are already avoiding overriding env->tsc_khz, because you use
> KVM_GET_TSC_KHZ only if tsc_khz is not set yet. I still don't see why
> you need a tsc_khz_saved field that requires duplicating the SET_TSC_KHZ
> code, if you could just do this:
> 
> if (!env->tsc_khz) {
> env->tsc_khz = kvm_vcpu_ioctl(cs, KVM_GET_TSC_KHZ);
> }
>

Consider an example in which we migrate a VM from machine A to machine B
and then to machine C, and QEMU on machine B is launched with the cpu
option 'tsc-freq' (i.e. env->tsc_khz on B is non-zero at the
beginning):
 1) In the migration from B to C, the TSC frequency specified by the
    user through 'tsc-freq' on B is expected to be migrated to C. That
    is, the value of env->tsc_khz on B is migrated.
 2) If the TSC frequency is migrated through env->tsc_khz, then
    env->tsc_khz on B will be overridden in the migration from A to B
    before kvm_arch_setup_tsc_khz(). If the guest TSC frequency is
    different from the user-specified TSC frequency on B, the
    expectation in 1) will no longer be satisfied.

So, I introduce the field tsc_khz_saved to migrate the TSC frequency and
(in patch 3) let the destination decide whether this migrated one will be
used.

And adding a flag like tsc_freq_requested_by_user, as in your comments on
patch 3, does not solve the problem.
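
The A -> B -> C scenario can be checked with a small standalone model of the selection logic in kvm_arch_setup_tsc_khz(). This is a sketch, not the QEMU code: struct tsc_state and model_setup_tsc_khz() are hypothetical names, and the host_khz argument stands in for the result of KVM_GET_TSC_KHZ (a value <= 0 models a failed ioctl, which the model simply ignores).

```c
#include <assert.h>
#include <stdint.h>

struct tsc_state {
    int64_t tsc_khz;        /* set by the user via 'tsc-freq', 0 if unset */
    int64_t tsc_khz_saved;  /* value that will be migrated, 0 if none */
};

/* Pick the TSC rate to migrate, following the three rules in the patch. */
static void model_setup_tsc_khz(struct tsc_state *s, int64_t host_khz)
{
    if (s->tsc_khz) {
        /* A user-specified rate always wins. */
        s->tsc_khz_saved = s->tsc_khz;
    } else if (!s->tsc_khz_saved && host_khz > 0) {
        /* Neither user-specified nor migrated: use the host rate. */
        s->tsc_khz_saved = host_khz;
    }
    /* Otherwise keep the rate that arrived with the migration. */
}
```

On machine B in the example, tsc_khz holds the user's value and tsc_khz_saved holds the rate that arrived from A; after the call, tsc_khz_saved carries the user's value on to C while tsc_khz itself was never overwritten by the incoming migration.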

> 
> > 
> > For the additional 

Re: [PATCH v3 3/3] target-i386: load the migrated vcpu's TSC rate

2015-11-05 Thread Haozhong Zhang
On 11/05/15 14:10, Eduardo Habkost wrote:
> On Mon, Nov 02, 2015 at 05:26:43PM +0800, Haozhong Zhang wrote:
> > Set vcpu's TSC rate to the migrated value if the user does not specify a
> > TSC rate by cpu option 'tsc-freq' and a migrated TSC rate does exist. If
> > KVM supports TSC scaling, guest programs will observe TSC increasing in
> > the migrated rate other than the host TSC rate.
> > 
> > Signed-off-by: Haozhong Zhang 
> > ---
> >  target-i386/kvm.c | 21 +
> >  1 file changed, 21 insertions(+)
> > 
> > diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> > index aae5e58..2be70df 100644
> > --- a/target-i386/kvm.c
> > +++ b/target-i386/kvm.c
> > @@ -3042,6 +3042,27 @@ int kvm_arch_setup_tsc_khz(CPUState *cs)
> >  int r;
> >  
> >  /*
> > + * If a TSC rate is migrated and the user does not specify the
> > + * vcpu's TSC rate on the destination, the migrated TSC rate will
> > + * be used on the destination after the migration.
> > + */
> > +if (env->tsc_khz_saved && !env->tsc_khz) {
> > +if (kvm_check_extension(cs->kvm_state, KVM_CAP_TSC_CONTROL)) {
> > +r = kvm_vcpu_ioctl(cs, KVM_SET_TSC_KHZ, env->tsc_khz_saved);
> 
> Why are you duplicating the existing KVM_SET_TSC_KHZ code in
> kvm_arch_init_vcpu()?
>

Because they are called in different cases and their behaviors on
failure are different:
 1) KVM_SET_TSC_KHZ in kvm_arch_init_vcpu() is called only when a VM
is created and a user-specified TSC frequency is given. If it
fails, QEMU will abort.
 2) KVM_SET_TSC_KHZ in kvm_arch_setup_tsc_khz() is called on the
destination only when TSC frequency is migrated and no
user-specified TSC frequency is given. If it fails, QEMU as well
as the migration will not be aborted.

However, after reading your comment at the end, they really could be
merged.

> > +if (r < 0) {
> > +fprintf(stderr, "KVM_SET_TSC_KHZ failed\n");
> 
> If you want to report errors, please use error_report().
> 
> (But I don't think we want to print those warnings. See below.)
> 
> > +}
> > +} else {
> > +r = -1;
> > +fprintf(stderr, "KVM doesn't support TSC scaling\n");
> > +}
> > +if (r < 0) {
> > +fprintf(stderr, "Use host TSC frequency instead. "
> 
> Did you mean "Using host TSC frequency instead."?
>

Yes.

> > +"Guest TSC may be inaccurate.\n");
> > +}
> > +}
> 
> This will make QEMU print a warning every single time when migrating to
> hosts that don't support TSC scaling, even if the source and destination
> hosts already have the same TSC frequency. That means most users will
> see a bogus warning, in today's hardware.
> 
> Maybe it will be acceptable to print a warning if (and only if) we know
> that the host TSC is different from the original TSC frequency.
>

Agree, I should add such a check to avoid bogus warnings.

> Considering that we already have code to handle tsc_khz that prints an
> error, you don't need to duplicate it. You could handle both
> user-provided and migration tsc_khz cases with the same code. With
> something like this:
>

Mostly, but as tsc_khz_saved in patch 2 is really needed, I'll make
some minor changes.

- if (env->tsc_khz) { /* may be set by the user, or loaded from incoming migration */
+ if (env->tsc_khz || env->tsc_khz_saved) { /* may be set by the user, or loaded from incoming migration */
+ int64_t tgt_tsc_khz = env->tsc_khz ? : env->tsc_khz_saved;
> r = kvm_check_extension(cs->kvm_state, KVM_CAP_TSC_CONTROL) ?
- kvm_vcpu_ioctl(cs, KVM_SET_TSC_KHZ, env->tsc_khz) :
+ kvm_vcpu_ioctl(cs, KVM_SET_TSC_KHZ, tgt_tsc_khz) :
> -ENOTSUP;
> if (r < 0) {
> int64_t cur_freq = kvm_check_extension(KVM_CAP_GET_TSC_KHZ)) ?
>kvm_vcpu_ioctl(KVM_GET_TSC_KHZ) :
>0;
> /* If we know the host frequency, print a warning every time
>  * there's a mismatch.
>  * If we don't know the host frequency, print a warning only
>  * if the user asked for a specific TSC frequency.
>  */
- if ((cur_freq <= 0 && env->tsc_freq_requested_by_user) ||
+ if ((cur_freq <= 0 && env->tsc_khz) ||
- (cur_freq > 0 && cur_freq != env->tsc_khz)) {
+ (cur_freq > 0 && cur_freq != tgt_tsc_khz)) {
> error_report("warning: TSC frequency mismatch between VM and host, and TSC scaling unavailable");
- if (env->tsc_freq_set_by_user) {
+ if (env->tsc_khz) {
> return r;
> }
> }
> }
> }
>

Haozhong
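
As a cross-check of the condition in the modified snippet above, the warning decision can be isolated into two pure functions. This is a sketch with hypothetical names (model_target_khz, model_should_warn); host_khz stands in for the value obtained through KVM_GET_TSC_KHZ, with <= 0 meaning the host frequency is unknown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The rate we try to set: the user's value if present, else the migrated one. */
static int64_t model_target_khz(int64_t user_khz, int64_t saved_khz)
{
    return user_khz ? user_khz : saved_khz;
}

/*
 * Called only after setting the target rate failed (no TSC scaling).
 * Host frequency known:   warn on every mismatch with the target.
 * Host frequency unknown: warn only if the user asked for a rate.
 */
static bool model_should_warn(int64_t user_khz, int64_t target_khz,
                              int64_t host_khz)
{
    if (host_khz <= 0)
        return user_khz != 0;
    return host_khz != target_khz;
}
```

With this split, migrating between hosts that share the same TSC frequency produces no warning even when TSC scaling is unavailable, which addresses the bogus-warning concern raised earlier in this message.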

> You will just need a new tsc_freq_requested_by_user field to track if
> the TSC frequency was explicitly requested by the user.
> 
> -- 
> Eduardo
--
To 

[PATCH 0/5] KVM: x86: MMU: Clean up x86's mmu code for future work

2015-11-05 Thread Takuya Yoshikawa
Patch 1/2/3 are easy ones.

Following two, patch 4/5, may not be ideal solutions, but at least
explain, or try to explain, the problems.

Takuya Yoshikawa (5):
  KVM: x86: MMU: Remove unused parameter of __direct_map()
  KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap
  KVM: x86: MMU: Make mmu_set_spte() return emulate value
  KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte()
  KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes

 Documentation/virtual/kvm/mmu.txt |   4 +-
 arch/x86/kvm/mmu.c                | 118 --
 arch/x86/kvm/mmu_audit.c          |   2 +-
 arch/x86/kvm/paging_tmpl.h        |  10 ++--
 4 files changed, 70 insertions(+), 64 deletions(-)

-- 
2.1.0



[PATCH 2/5] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap

2015-11-05 Thread Takuya Yoshikawa
Both __mmu_unsync_walk() and mmu_pages_clear_parents() contain three lines
of code that clear a bit in the unsync child bitmap; the former places them
inside a loop block and uses a few goto statements to jump to them.

A new helper function, clear_unsync_child_bit(), makes the code cleaner.

Signed-off-by: Takuya Yoshikawa 
---
 arch/x86/kvm/mmu.c | 36 ++--
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a76bc04..a9622a2 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1806,6 +1806,13 @@ static int mmu_pages_add(struct kvm_mmu_pages *pvec, struct kvm_mmu_page *sp,
 	return (pvec->nr == KVM_PAGE_ARRAY_NR);
 }
 
+static inline void clear_unsync_child_bit(struct kvm_mmu_page *sp, int idx)
+{
+	--sp->unsync_children;
+	WARN_ON((int)sp->unsync_children < 0);
+	__clear_bit(idx, sp->unsync_child_bitmap);
+}
+
 static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
 			   struct kvm_mmu_pages *pvec)
 {
@@ -1815,8 +1822,10 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
 		struct kvm_mmu_page *child;
 		u64 ent = sp->spt[i];
 
-		if (!is_shadow_present_pte(ent) || is_large_pte(ent))
-			goto clear_child_bitmap;
+		if (!is_shadow_present_pte(ent) || is_large_pte(ent)) {
+			clear_unsync_child_bit(sp, i);
+			continue;
+		}
 
 		child = page_header(ent & PT64_BASE_ADDR_MASK);
 
@@ -1825,28 +1834,21 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
 				return -ENOSPC;
 
 			ret = __mmu_unsync_walk(child, pvec);
-			if (!ret)
-				goto clear_child_bitmap;
-			else if (ret > 0)
+			if (!ret) {
+				clear_unsync_child_bit(sp, i);
+				continue;
+			} else if (ret > 0) {
 				nr_unsync_leaf += ret;
-			else
+			} else
 				return ret;
 		} else if (child->unsync) {
 			nr_unsync_leaf++;
 			if (mmu_pages_add(pvec, child, i))
 				return -ENOSPC;
 		} else
-			 goto clear_child_bitmap;
-
-		continue;
-
-clear_child_bitmap:
-		__clear_bit(i, sp->unsync_child_bitmap);
-		sp->unsync_children--;
-		WARN_ON((int)sp->unsync_children < 0);
+			clear_unsync_child_bit(sp, i);
 	}
 
-
 	return nr_unsync_leaf;
 }
 
@@ -2009,9 +2011,7 @@ static void mmu_pages_clear_parents(struct mmu_page_path *parents)
 		if (!sp)
 			return;
 
-		--sp->unsync_children;
-		WARN_ON((int)sp->unsync_children < 0);
-		__clear_bit(idx, sp->unsync_child_bitmap);
+		clear_unsync_child_bit(sp, idx);
 		level++;
 	} while (level < PT64_ROOT_LEVEL-1 && !sp->unsync_children);
 }
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/5] KVM: x86: MMU: Remove unused parameter of __direct_map()

2015-11-05 Thread Takuya Yoshikawa
Signed-off-by: Takuya Yoshikawa 
---
 arch/x86/kvm/mmu.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7d85bca..a76bc04 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2708,9 +2708,8 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
__direct_pte_prefetch(vcpu, sp, sptep);
 }
 
-static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
-   int map_writable, int level, gfn_t gfn, pfn_t pfn,
-   bool prefault)
+static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable,
+   int level, gfn_t gfn, pfn_t pfn, bool prefault)
 {
struct kvm_shadow_walk_iterator iterator;
struct kvm_mmu_page *sp;
@@ -3018,8 +3017,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, u32 error_code,
make_mmu_pages_available(vcpu);
if (likely(!force_pt_level))
transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level);
-   r = __direct_map(vcpu, v, write, map_writable, level, gfn, pfn,
-prefault);
+   r = __direct_map(vcpu, write, map_writable, level, gfn, pfn, prefault);
spin_unlock(&vcpu->kvm->mmu_lock);
 
 
@@ -3541,8 +3539,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
make_mmu_pages_available(vcpu);
if (likely(!force_pt_level))
transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level);
-   r = __direct_map(vcpu, gpa, write, map_writable,
-level, gfn, pfn, prefault);
+   r = __direct_map(vcpu, write, map_writable, level, gfn, pfn, prefault);
spin_unlock(&vcpu->kvm->mmu_lock);
 
return r;
-- 
2.1.0



[PATCH 3/5] KVM: x86: MMU: Make mmu_set_spte() return emulate value

2015-11-05 Thread Takuya Yoshikawa
mmu_set_spte()'s code is based on the assumption that the emulate
parameter has a valid pointer value if set_spte() returns true and
write_fault is not zero.  In other cases, emulate may be NULL, so a
NULL-check is needed.

Stop passing emulate pointer and make mmu_set_spte() return the emulate
value instead to clean up this complex interface.  Prefetch functions
can just throw away the return value.
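
As a rough illustration of the interface change only (a sketch with
hypothetical model_* names, not the real mmu.c code), the two shapes
compare as follows.  The old one needs callers and callee to agree on
when the pointer may be NULL; the new one has nothing to agree on:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for set_spte(): reports true when asked to. */
static bool model_set_spte(bool result)
{
	return result;
}

/*
 * Old shape: the result goes through a pointer.  Prefetch-style callers
 * pass NULL, so one store is guarded by a NULL check and the other
 * relies on write_fault being zero whenever the pointer is NULL.
 */
static void model_set_spte_out_param(bool write_fault, bool spte_result,
				     bool mmio, int *emulate)
{
	if (model_set_spte(spte_result)) {
		if (write_fault)
			*emulate = 1;
	}

	if (mmio && emulate)
		*emulate = 1;
}

/* New shape: the value is returned and callers may simply ignore it. */
static bool model_set_spte_returning(bool write_fault, bool spte_result,
				     bool mmio)
{
	bool emulate = false;

	if (model_set_spte(spte_result) && write_fault)
		emulate = true;

	if (mmio)
		emulate = true;

	return emulate;
}
```

A prefetch-style caller of the second form just drops the return value,
which is what the prefetch hunks below do.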

Signed-off-by: Takuya Yoshikawa 
---
 arch/x86/kvm/mmu.c | 27 ++-
 arch/x86/kvm/paging_tmpl.h | 10 +-
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a9622a2..69e7d20 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2564,13 +2564,13 @@ done:
return ret;
 }
 
-static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
-unsigned pte_access, int write_fault, int *emulate,
-int level, gfn_t gfn, pfn_t pfn, bool speculative,
-bool host_writable)
+static bool mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access,
+int write_fault, int level, gfn_t gfn, pfn_t pfn,
+bool speculative, bool host_writable)
 {
int was_rmapped = 0;
int rmap_count;
+   bool emulate = false;
 
pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
 *sptep, write_fault, gfn);
@@ -2600,12 +2600,12 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
if (set_spte(vcpu, sptep, pte_access, level, gfn, pfn, speculative,
  true, host_writable)) {
if (write_fault)
-   *emulate = 1;
+   emulate = true;
kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
}
 
-   if (unlikely(is_mmio_spte(*sptep) && emulate))
-   *emulate = 1;
+   if (unlikely(is_mmio_spte(*sptep)))
+   emulate = true;
 
pgprintk("%s: setting spte %llx\n", __func__, *sptep);
pgprintk("instantiating %s PTE (%s) at %llx (%llx) addr %p\n",
@@ -2624,6 +2624,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
}
 
kvm_release_pfn_clean(pfn);
+
+   return emulate;
 }
 
 static pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
@@ -2658,9 +2660,8 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
return -1;
 
for (i = 0; i < ret; i++, gfn++, start++)
-   mmu_set_spte(vcpu, start, access, 0, NULL,
-sp->role.level, gfn, page_to_pfn(pages[i]),
-true, true);
+   mmu_set_spte(vcpu, start, access, 0, sp->role.level, gfn,
+page_to_pfn(pages[i]), true, true);
 
return 0;
 }
@@ -2721,9 +2722,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable,
 
for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
if (iterator.level == level) {
-   mmu_set_spte(vcpu, iterator.sptep, ACC_ALL,
-write, &emulate, level, gfn, pfn,
-prefault, map_writable);
+   emulate = mmu_set_spte(vcpu, iterator.sptep, ACC_ALL,
+  write, level, gfn, pfn, prefault,
+  map_writable);
direct_pte_prefetch(vcpu, iterator.sptep);
++vcpu->stat.pf_fixed;
break;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index b41faa9..de24499 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -475,8 +475,8 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 * we call mmu_set_spte() with host_writable = true because
 * pte_prefetch_gfn_to_pfn always gets a writable pfn.
 */
-   mmu_set_spte(vcpu, spte, pte_access, 0, NULL, PT_PAGE_TABLE_LEVEL,
-gfn, pfn, true, true);
+   mmu_set_spte(vcpu, spte, pte_access, 0, PT_PAGE_TABLE_LEVEL, gfn, pfn,
+true, true);
 
return true;
 }
@@ -556,7 +556,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
struct kvm_mmu_page *sp = NULL;
struct kvm_shadow_walk_iterator it;
unsigned direct_access, access = gw->pt_access;
-   int top_level, emulate = 0;
+   int top_level, emulate;
 
direct_access = gw->pte_access;
 
@@ -622,8 +622,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
}
 
clear_sp_write_flooding_count(it.sptep);
-   mmu_set_spte(vcpu, it.sptep, gw->pte_access, write_fault, &emulate,
-it.level, gw->gfn, pfn, prefault, map_writable);
+   emulate = mmu_set_spte(vcpu, 

[PATCH 5/5] KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes

2015-11-05 Thread Takuya Yoshikawa
At some call sites of rmap_get_first() and rmap_get_next(), BUG_ON is
placed right after the call to detect unrelated sptes which should not
be found in the reverse-mapping list.

Move this check into rmap_get_first/next() so that all call sites, not
just the users of the for_each_rmap_spte() macro, will be checked the
same way.  In addition, change the BUG_ON to WARN_ON since killing the
whole host is the last thing that KVM should try.

One thing to keep in mind is that kvm_mmu_unlink_parents() also uses
rmap_get_first() to handle parent sptes.  The change will not break it
because parent sptes are present, at least until drop_parent_pte()
actually unlinks them, and not mmio-sptes.

Signed-off-by: Takuya Yoshikawa 
---
 Documentation/virtual/kvm/mmu.txt |  4 ++--
 arch/x86/kvm/mmu.c| 31 ++-
 2 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
index 3a4d681..daf9c0f 100644
--- a/Documentation/virtual/kvm/mmu.txt
+++ b/Documentation/virtual/kvm/mmu.txt
@@ -203,10 +203,10 @@ Shadow pages contain the following information:
 page cannot be destroyed.  See role.invalid.
   parent_ptes:
 The reverse mapping for the pte/ptes pointing at this page's spt. If
-parent_ptes bit 0 is zero, only one spte points at this pages and
+parent_ptes bit 0 is zero, only one spte points at this page and
 parent_ptes points at this single spte, otherwise, there exists multiple
 sptes pointing at this page and (parent_ptes & ~0x1) points at a data
-structure with a list of parent_ptes.
+structure with a list of parent sptes.
   unsync:
 If true, then the translations in this page may not match the guest's
 translation.  This is equivalent to the state of the tlb when a pte is
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c5e2363..353d752 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1099,17 +1099,28 @@ struct rmap_iterator {
  */
 static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter)
 {
+   u64 *sptep;
+
if (!rmap)
return NULL;
 
if (!(rmap & 1)) {
iter->desc = NULL;
-   return (u64 *)rmap;
+   sptep = (u64 *)rmap;
+   goto out;
}
 
iter->desc = (struct pte_list_desc *)(rmap & ~1ul);
iter->pos = 0;
-   return iter->desc->sptes[iter->pos];
+   sptep = iter->desc->sptes[iter->pos];
+out:
+   /*
+* Parent sptes found in sp->parent_ptes lists are also checked here
+* since kvm_mmu_unlink_parents() uses this function.  If the condition
+* needs to be changed for them, make another wrapper function.
+*/
+   WARN_ON(!is_shadow_present_pte(*sptep));
+   return sptep;
 }
 
 /*
@@ -1119,14 +1130,14 @@ static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter)
  */
 static u64 *rmap_get_next(struct rmap_iterator *iter)
 {
+   u64 *sptep;
+
if (iter->desc) {
if (iter->pos < PTE_LIST_EXT - 1) {
-   u64 *sptep;
-
++iter->pos;
sptep = iter->desc->sptes[iter->pos];
if (sptep)
-   return sptep;
+   goto out;
}
 
iter->desc = iter->desc->more;
@@ -1134,17 +1145,20 @@ static u64 *rmap_get_next(struct rmap_iterator *iter)
if (iter->desc) {
iter->pos = 0;
/* desc->sptes[0] cannot be NULL */
-   return iter->desc->sptes[iter->pos];
+   sptep = iter->desc->sptes[iter->pos];
+   goto out;
}
}
 
return NULL;
+out:
+   WARN_ON(!is_shadow_present_pte(*sptep));
+   return sptep;
 }
 
 #define for_each_rmap_spte(_rmap_, _iter_, _spte_) \
   for (_spte_ = rmap_get_first(*_rmap_, _iter_);   \
-   _spte_ && ({BUG_ON(!is_shadow_present_pte(*_spte_)); 1;});  \
-   _spte_ = rmap_get_next(_iter_))
+   _spte_; _spte_ = rmap_get_next(_iter_))
 
 static void drop_spte(struct kvm *kvm, u64 *sptep)
 {
@@ -1358,7 +1372,6 @@ static bool kvm_zap_rmapp(struct kvm *kvm, unsigned long *rmapp)
bool flush = false;
 
while ((sptep = rmap_get_first(*rmapp, &iter))) {
-   BUG_ON(!(*sptep & PT_PRESENT_MASK));
rmap_printk("%s: spte %p %llx.\n", __func__, sptep, *sptep);
 
drop_spte(kvm, sptep);
-- 
2.1.0



[PATCH 4/5] KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte()

2015-11-05 Thread Takuya Yoshikawa
is_rmap_spte(), originally named is_rmap_pte(), was introduced when the
simple reverse mapping was implemented by commit cd4a4e5374110444
("[PATCH] KVM: MMU: Implement simple reverse mapping").  At that point,
its role was clear and only rmap_add() and rmap_remove() were using it
to select sptes that need to be reverse-mapped.

Independently of that, is_shadow_present_pte() was first introduced by
commit c7addb902054195b ("KVM: Allow not-present guest page faults to
bypass kvm") to do bypass_guest_pf optimization, which does not exist
any more.

These two seem to have changed their roles somewhat, and is_rmap_spte()
just calls is_shadow_present_pte() now.

Since using both of them without any clear distinction just makes the
code confusing, remove is_rmap_spte().

Signed-off-by: Takuya Yoshikawa 
---
 arch/x86/kvm/mmu.c   | 13 -
 arch/x86/kvm/mmu_audit.c |  2 +-
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 69e7d20..c5e2363 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -311,11 +311,6 @@ static int is_large_pte(u64 pte)
return pte & PT_PAGE_SIZE_MASK;
 }
 
-static int is_rmap_spte(u64 pte)
-{
-   return is_shadow_present_pte(pte);
-}
-
 static int is_last_spte(u64 pte, int level)
 {
if (level == PT_PAGE_TABLE_LEVEL)
@@ -540,7 +535,7 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
u64 old_spte = *sptep;
bool ret = false;
 
-   WARN_ON(!is_rmap_spte(new_spte));
+   WARN_ON(!is_shadow_present_pte(new_spte));
 
if (!is_shadow_present_pte(old_spte)) {
mmu_spte_set(sptep, new_spte);
@@ -595,7 +590,7 @@ static int mmu_spte_clear_track_bits(u64 *sptep)
else
old_spte = __update_clear_spte_slow(sptep, 0ull);
 
-   if (!is_rmap_spte(old_spte))
+   if (!is_shadow_present_pte(old_spte))
return 0;
 
pfn = spte_to_pfn(old_spte);
@@ -2575,7 +2570,7 @@ static bool mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access,
pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
 *sptep, write_fault, gfn);
 
-   if (is_rmap_spte(*sptep)) {
+   if (is_shadow_present_pte(*sptep)) {
/*
 * If we overwrite a PTE page pointer with a 2MB PMD, unlink
 * the parent of the now unreachable PTE.
@@ -2919,7 +2914,7 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
 * If the mapping has been changed, let the vcpu fault on the
 * same address again.
 */
-   if (!is_rmap_spte(spte)) {
+   if (!is_shadow_present_pte(spte)) {
ret = true;
goto exit;
}
diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c
index 03d518e..90ee420 100644
--- a/arch/x86/kvm/mmu_audit.c
+++ b/arch/x86/kvm/mmu_audit.c
@@ -183,7 +183,7 @@ static void check_mappings_rmap(struct kvm *kvm, struct kvm_mmu_page *sp)
return;
 
for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
-   if (!is_rmap_spte(sp->spt[i]))
+   if (!is_shadow_present_pte(sp->spt[i]))
continue;
 
inspect_spte_has_rmap(kvm, sp->spt + i);
-- 
2.1.0



Re: [PATCH 2/2] kvmtool: assume dead vcpus are paused too

2015-11-05 Thread Will Deacon
On Wed, Nov 04, 2015 at 06:51:12PM -0500, Sasha Levin wrote:
> On 11/04/2015 06:51 AM, Will Deacon wrote:
> > +   mutex_lock(&pause_lock);
> > +
> > +   /* The kvm->cpus array contains a null pointer in the last location */
> > +   for (i = 0; ; i++) {
> > +   if (kvm->cpus[i])
> > +   pthread_kill(kvm->cpus[i]->thread, SIGKVMEXIT);
> > +   else
> > +   break;
> > +   }
> > +
> > +   kvm__continue(kvm);
> 
> In this scenario: if we grabbed pause_lock, signaled vcpu0 to exit, and it did
> before we called kvm__continue(), we won't end up releasing pause_lock, which
> might cause a lockup later, no?

Hmm, yeah, maybe that should be an explicit mutex_unlock rather than a
call to kvm__continue.
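
Something along these lines, as an untested sketch.  The *_model types
and names are stand-ins made up for illustration and are not kvmtool's
real structures; the per-vcpu flag takes the place of the
pthread_kill(..., SIGKVMEXIT) call:

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for kvmtool's types; only the shape matters. */
struct vcpu_model {
	int signalled;
};

struct kvm_model {
	struct vcpu_model **cpus;	/* NULL-terminated, as in the quoted loop */
};

static pthread_mutex_t pause_lock_model = PTHREAD_MUTEX_INITIALIZER;

/* Signal every vcpu, then drop the lock directly. */
static int signal_all_vcpus(struct kvm_model *kvm)
{
	int i, n = 0;

	pthread_mutex_lock(&pause_lock_model);

	for (i = 0; kvm->cpus[i]; i++) {
		kvm->cpus[i]->signalled = 1;	/* models the SIGKVMEXIT kick */
		n++;
	}

	/* Explicit unlock: does not depend on any vcpu still running. */
	pthread_mutex_unlock(&pause_lock_model);

	return n;
}
```

The point is only that the unlock is unconditional once the loop is
done, whatever the vcpu threads do in the meantime.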

Will


Re: [PATCH] KVM: arm: Fix crash in free_hyp_pgds() if timer initialization fails

2015-11-05 Thread Christoffer Dall
Hi Pavel,

On Tue, Oct 27, 2015 at 10:40:08AM +0300, Pavel Fedin wrote:
> After vGIC initialization succeeded, and timer initialization failed,
> the following crash can be observed on ARM32:
> 
> kvm [1]: interrupt-controller@10484000 IRQ57
> kvm [1]: kvm_arch_timer: can't find DT node
> Unable to handle kernel paging request at virtual address 90484000
> pgd = c0003000
> [90484000] *pgd=8040006003, *pmd=
> Internal error: Oops: 2a06 [#1] PREEMPT SMP ARM
> ...
> [] (v7_flush_kern_dcache_area) from [] 
> (kvm_flush_dcache_pte+0x48/0x5c)
> [] (kvm_flush_dcache_pte) from [] 
> (unmap_range+0x24c/0x460)
> [] (unmap_range) from [] (free_hyp_pgds+0x84/0x160)
> [] (free_hyp_pgds) from [] (kvm_arch_init+0x254/0x41c)
> [] (kvm_arch_init) from [] (kvm_init+0x28/0x2b4)
> [] (kvm_init) from [] (do_one_initcall+0x9c/0x200)
> 
> This happens when unmapping reaches mapped vGIC control registers. The
> problem root seems to be combination of two facts:
> 1. vGIC control region is defined in device trees as having size of
>0x2000. But the specification defines only registers up to 0x1FC,
>therefore it is only one page, not two.
> 2. unmap_ptes() is expected to recognize device memory and omit cache
>flushing. However, it tests only for PAGE_S2_DEVICE, while devices
>mapped for HYP mode have PAGE_HYP_DEVICE, which is different.
>Therefore, cache flush is attempted, and it dies when hitting the
>nonexistent second page.
> 
> This patch fixes the problem by adding missing recognition of
> PAGE_HYP_DEVICE protection value.
> 
> The crash can be observed on Exynos 5410 (and probably on all Exynos5
> family) with stock device trees (using MCT) and CONFIG_KVM enabled.
> 
> Signed-off-by: Pavel Fedin 
> ---
>  arch/arm/kvm/mmu.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 7b42012..839dd970 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -213,7 +213,10 @@ static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
>   kvm_tlb_flush_vmid_ipa(kvm, addr);
>  
>   /* No need to invalidate the cache for device mappings */
> - if ((pte_val(old_pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
> + if (((pte_val(old_pte) & PAGE_S2_DEVICE)
> +  != PAGE_S2_DEVICE) &&
> + ((pte_val(old_pte) & PAGE_HYP_DEVICE)
> +  != PAGE_HYP_DEVICE))
>   kvm_flush_dcache_pte(old_pte);
>  
>   put_page(virt_to_page(pte));
> -- 
> 2.4.4
> 

Did you check if PAGE_HYP_DEVICE can mean something sane on a stage-2
page table entry and vice versa?
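
To make the concern concrete, here is a small sketch.  The MODEL_*
encodings are invented for illustration and are NOT the real
PAGE_S2_DEVICE or PAGE_HYP_DEVICE values; they are picked so that the
hyp pattern is a bit-subset of an unrelated stage-2 entry:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical attribute encodings, chosen only to show an overlap. */
#define MODEL_S2_DEVICE   UINT64_C(0x0c)
#define MODEL_HYP_DEVICE  UINT64_C(0x04)
#define MODEL_S2_NORMAL   UINT64_C(0x34)	/* normal memory in this model */

/* The style of test used in the patch: all bits of prot set in the pte. */
static bool pte_has_prot(uint64_t pte, uint64_t prot)
{
	return (pte & prot) == prot;
}

/* Flush unless the entry looks like either kind of device mapping. */
static bool model_should_flush(uint64_t pte)
{
	return !pte_has_prot(pte, MODEL_S2_DEVICE) &&
	       !pte_has_prot(pte, MODEL_HYP_DEVICE);
}
```

With these made-up values, MODEL_S2_NORMAL does not match the stage-2
device pattern but does match the hyp one, so the flush would be skipped
for normal memory.  Whether the real encodings can overlap like this is
exactly what I'm asking.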

Also, the commit message and formatting here is horrible, see this
reworked version:

From e15700dd24419bb0e7ddc79feaa4efdf20304f2c Mon Sep 17 00:00:00 2001
From: Pavel Fedin 
Date: Tue, 27 Oct 2015 10:40:08 +0300
Subject: [PATCH] KVM: arm: Don't try to flush hyp-mode device mappings

The unmap_ptes function is currently called to unmap both Stage-2 and
Hyp mode page table entries.  Since calling clean and invalidate on
device memory may raise exceptions, we currently test against
PAGE_S2_DEVICE and do not flush for such mappings.  However, we should
also be testing against PAGE_HYP_DEVICE.

This fixes an issue observed on some 32-bit platforms, for example the
Exynos 5410.

Signed-off-by: Pavel Fedin 
Signed-off-by: Christoffer Dall 
---
 arch/arm/kvm/mmu.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 6984342..f0c3aef 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -206,18 +206,20 @@ static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
 
start_pte = pte = pte_offset_kernel(pmd, addr);
do {
-   if (!pte_none(*pte)) {
-   pte_t old_pte = *pte;
+   if (pte_none(*pte))
+   continue;
 
-   kvm_set_pte(pte, __pte(0));
-   kvm_tlb_flush_vmid_ipa(kvm, addr);
+   pte_t old_pte = *pte;
 
-   /* No need to invalidate the cache for device mappings */
-   if ((pte_val(old_pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
-   kvm_flush_dcache_pte(old_pte);
+   kvm_set_pte(pte, __pte(0));
+   kvm_tlb_flush_vmid_ipa(kvm, addr);
 
-   put_page(virt_to_page(pte));
-   }
+   /* No need to invalidate the cache for device mappings */
+   if ((pte_val(old_pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE &&
+   (pte_val(old_pte) & PAGE_HYP_DEVICE) != PAGE_HYP_DEVICE)
+   kvm_flush_dcache_pte(old_pte);
+
+   

Re: [PATCH v7 25/35] nvdimm acpi: init the resource used by NVDIMM ACPI

2015-11-05 Thread Igor Mammedov
On Thu, 5 Nov 2015 21:33:39 +0800
Xiao Guangrong  wrote:

> 
> 
> On 11/05/2015 09:03 PM, Igor Mammedov wrote:
> > On Thu, 5 Nov 2015 18:15:31 +0800
> > Xiao Guangrong  wrote:
> >
> >>
> >>
> >> On 11/05/2015 05:58 PM, Igor Mammedov wrote:
> >>> On Mon,  2 Nov 2015 17:13:27 +0800
> >>> Xiao Guangrong  wrote:
> >>>
>  A page staring from 0xFF0 and IO port 0x0a18 - 0xa1b in guest are
> >>>  ^^ missing one 0???
> >>>
>  reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
>  for detailed design
> 
>  A parameter, 'nvdimm-support', is introduced for PIIX4_PM and ICH9-LPC
>  that controls if nvdimm support is enabled, it is true on default and
>  it is false on 2.4 and its earlier version to keep compatibility
> 
>  Signed-off-by: Xiao Guangrong 
> >>> [...]
> >>>
>  @@ -33,6 +33,15 @@
>  */
> #define MIN_NAMESPACE_LABEL_SIZE  (128UL << 10)
> 
>  +/*
>  + * A page staring from 0xFF0 and IO port 0x0a18 - 0xa1b in guest are
> >>>^^^ missing 0 or value in define below 
> >>> has an extra 0
> >>>
>  + * reserved for NVDIMM ACPI emulation, refer to 
>  docs/specs/acpi_nvdimm.txt
>  + * for detailed design.
>  + */
>  +#define NVDIMM_ACPI_MEM_BASE  0xFF00ULL
> >>> it still maps RAM at arbitrary place,
> >>> that's the reason why VMGenID patches hasn't been merged for
> >>> more than several months.
> >>> I'm not against of using (more exactly I'm for it) direct mapping
> >>> but we should reach consensus when and how to use it first.
> >>>
> >>> I'd wouldn't use addresses below 4G as it may be used firmware or
> >>> legacy hardware and I won't bet that 0xFF00ULL won't conflict
> >>> with anything.
> >>> An alternative place to allocate reserve from could be high memory.
> >>> For pc we have "reserved-memory-end" which currently makes sure
> >>> that hotpluggable memory range isn't used by firmware.
> >>>
> >>> How about making API that allows to map additional memory
> >>> ranges before reserved-memory-end and pushes it up as mappings are
> >>> added.
> >>
> >> That what i did in the v1/v2 versions, however, as you noticed, using
> >> 64-bit address in ACPI in QEMU is not a easy work - we can not simply make
> >> SSDT.rev = 2 to apply 64 bit address, the reason i have documented in
> >> v3's changelog:
> >>
> >> 3) we figure out a unused memory hole below 4G that is 0xFF0 ~
> >>0xFFF0, this range is large enough for NVDIMM ACPI as build
> >>64-bit ACPI SSDT/DSDT table will break windows XP.
> >>BTW, only make SSDT.rev = 2 can not work since the width is only
> >>depended on DSDT.rev based on 19.6.28 DefinitionBlock (Declare
> >>Definition Block) in ACPI spec:
> >> | Note: For compatibility with ACPI versions before ACPI 2.0, the bit
> >> | width of Integer objects is dependent on the ComplianceRevision of the
> >> | DSDT. If the ComplianceRevision is less than 2, all integers are
> >> | restricted to 32 bits. Otherwise, full 64-bit integers are used. The
> >> | version of the DSDT sets the global integer width for all integers,
> >> | including integers in SSDTs.
> >> 4) use the lowest ACPI spec version to document AML terms.
> >> 4) use the lowest ACPI spec version to document AML terms.
> >>
> >> The only way introducing 64 bit address is adding XSDT support that what
> >> Michael did before, however, it seems it need great efforts to do it as
> >> it will break OVMF. It's a long term workload. :(
> > to enable 64-bit integers in AML it's sufficient to change DSDT revision
> > to 2, I already have a patch that switches DSDT/SSDT to rev2.
> > Tests show it doesn't break WindowsXP (which is rev1) and uses 64-bit
> > integers on linux & later Windows versions.
> 
> Great, i remembered i did the similar test (directly change DSDT to rev2)
> and it caused winXP blue screen. Could you please tell me where i can find
> your patch?
https://github.com/imammedo/qemu/commits/mhpt_table_v2
The following commit changes the revision:
 pc: acpi: bump DSDT/SSDT compliance revision to v2
and here is a user of it:
 acpi: memhp: simplify MCRS() using 64-bit math

When writing ASL, make sure that only XP-supported features are in the
global scope, which is evaluated when the tables are loaded, and that
features of rev2 and higher are kept inside methods.
That way XP doesn't crash as long as it doesn't evaluate an unsupported
opcode, and those opcodes can be guarded by checking the _REV object if
necessary.


> >>
> >> The luck thing is, the ACPI part is not ABI, we can move it to the high
> >> memory if QEMU supports XSDT is ready in future development.
> > But mapped control region is ABI and we can't change it if we find out later
> > that it breaks something.
> 
> But the ACPI code is completely built by QEMU, 

Re: [PATCH v3 2/3] KVM/arm/arm64: enable enhanced armv7 fp/simd lazy switch

2015-11-05 Thread Christoffer Dall
On Fri, Oct 30, 2015 at 02:56:32PM -0700, Mario Smarduch wrote:
> This patch tracks vfp/simd hardware state with a vcpu lazy flag. vCPU lazy 
> flag is set on guest access and traps to vfp/simd hardware switch handler. On 
> vm-enter if lazy flag is set skip trap enable and save host fpexc. On 
> vm-exit if flag is set skip hardware context switch and return to host with 
> guest context. In vcpu_put check if vcpu lazy flag is set, and execute a 
> hardware context switch to restore host.
> 
> Also some arm64 field and empty function are added to compile for arm64.
> 
> Signed-off-by: Mario Smarduch 
> ---
>  arch/arm/include/asm/kvm_host.h   |  1 +
>  arch/arm/kvm/arm.c|  6 
>  arch/arm/kvm/interrupts.S | 60 
> ---
>  arch/arm/kvm/interrupts_head.S| 14 +
>  arch/arm64/include/asm/kvm_host.h |  4 +++
>  5 files changed, 63 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index f1bf551..a9e86e0 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -227,6 +227,7 @@ int kvm_perf_teardown(void);
>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>  
>  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
> +void kvm_restore_host_vfp_state(struct kvm_vcpu *);
>  
>  static inline void kvm_arch_hardware_disable(void) {}
>  static inline void kvm_arch_hardware_unsetup(void) {}
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index dc017ad..11a56fe 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -296,6 +296,12 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  {
>   /*
> +  * If fp/simd registers are dirty save guest, restore host before

If the fp/simd registers are dirty, then restore the host state before

> +  * releasing the cpu.
> +  */
> + if (vcpu->arch.vfp_dirty)
> + kvm_restore_host_vfp_state(vcpu);
> + /*
>* The arch-generic KVM code expects the cpu field of a vcpu to be -1
>* if the vcpu is no longer assigned to a cpu.  This is used for the
>* optimized make_all_cpus_request path.
> diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
> index 900ef6d..ca25314 100644
> --- a/arch/arm/kvm/interrupts.S
> +++ b/arch/arm/kvm/interrupts.S
> @@ -28,6 +28,32 @@
>  #include "interrupts_head.S"
>  
>   .text
> +/**
> + * void kvm_restore_host_vfp_state(struct vcpu *vcpu) - Executes lazy

nit: Can you move the multi-line description of the function into a
separate paragraph?

> + *   fp/simd switch, saves the guest, restores host. Called from host
> + *   mode, placed outside of hyp region start/end.

Put the description in a separate paragraph and get rid of the "executes
lazy fp/simd switch" part, that doesn't help understanding.  Just say
that this function restores the host state.

> + */
> +ENTRY(kvm_restore_host_vfp_state)
> +#ifdef CONFIG_VFPv3
> + push{r4-r7}
> +
> + add r7, vcpu, #VCPU_VFP_GUEST
> + store_vfp_state r7
> +
> + add r7, vcpu, #VCPU_VFP_HOST
> + ldr r7, [r7]
> + restore_vfp_state r7
> +
> + ldr r3, [vcpu, #VCPU_VFP_HOST_FPEXC]
> + VFPFMXR FPEXC, r3
> +
> + mov r3, #0
> + strbr3, [vcpu, #VCPU_VFP_DIRTY]
> +
> + pop {r4-r7}
> +#endif
> + bx  lr
> +ENDPROC(kvm_restore_host_vfp_state)
>  
>  __kvm_hyp_code_start:
>   .globl __kvm_hyp_code_start
> @@ -119,11 +145,16 @@ ENTRY(__kvm_vcpu_run)
>   @ If the host kernel has not been configured with VFPv3 support,
>   @ then it is safer if we deny guests from using it as well.
>  #ifdef CONFIG_VFPv3
> - @ Set FPEXC_EN so the guest doesn't trap floating point instructions
> + @ fp/simd register file has already been accessed, so skip host fpexc
> + @ save and access trap enable.
> + vfp_inlazy_mode r7, skip_guest_vfp_trap

So, why do we need to touch this register at all on every CPU exit?

Is it not true that we can only be in one of two states:
 1) The register file is not dirty (not touched by the guest) and we
should trap
 2) The register file is dirty, and we should not trap to EL2?

Only in the first case do we need to set the FPEXC, and couldn't we just
do that on vcpu_load and get rid of all this?  (except HCPTR_TCP which
we still need to adjust).
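
Roughly what I have in mind, as a plain C model rather than real KVM
code.  The struct and the model_* helpers are hypothetical and only
track the state transitions, no registers are touched:

```c
#include <stdbool.h>

/* Hypothetical per-vcpu model; not the real struct kvm_vcpu. */
struct vfp_model {
	bool dirty;		/* guest has touched the fp/simd registers */
	bool guest_fpexc_set;	/* FPEXC has been set up for the guest */
	bool host_regs_live;	/* host fp/simd context is in the hardware */
};

/* vcpu_load: start in state 1 (not dirty) and do the FPEXC setup once. */
static void model_vcpu_load(struct vfp_model *v)
{
	v->dirty = false;
	v->guest_fpexc_set = true;
	v->host_regs_live = true;
}

/* VM entry: only the trap bits depend on the state, nothing else. */
static bool model_entry_should_trap(const struct vfp_model *v)
{
	return !v->dirty;
}

/* First guest fp/simd access traps: switch context, move to state 2. */
static void model_guest_fp_trap(struct vfp_model *v)
{
	v->dirty = true;
	v->host_regs_live = false;
}

/* vcpu_put: restore the host only if the guest dirtied the registers. */
static void model_vcpu_put(struct vfp_model *v)
{
	if (v->dirty) {
		v->host_regs_live = true;
		v->dirty = false;
	}
	v->guest_fpexc_set = false;
}
```

In this model the FPEXC handling happens once per load/put pair, and the
entry path only has to look at the dirty flag to decide on the trap bits.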

> +
>   VFPFMRX r2, FPEXC   @ VMRS
> - push{r2}
> + str r2, [vcpu, #VCPU_VFP_HOST_FPEXC]
>   orr r2, r2, #FPEXC_EN
>   VFPFMXR FPEXC, r2   @ VMSR
> + set_hcptr vmentry, (HCPTR_TCP(10) | HCPTR_TCP(11))
> +skip_guest_vfp_trap:
>  #endif
>  
>   @ Configure Hyp-role
> @@ -131,7 +162,7 @@ ENTRY(__kvm_vcpu_run)
>  
>   @ Trap coprocessor CRx accesses
>   set_hstr vmentry
> - set_hcptr vmentry, (HCPTR_TTA | HCPTR_TCP(10) | 

Re: [PATCH 3/3] KVM/arm64: enable enhanced armv8 fp/simd lazy switch

2015-11-05 Thread Christoffer Dall
On Fri, Oct 30, 2015 at 02:56:33PM -0700, Mario Smarduch wrote:
> This patch enables arm64 lazy fp/simd switch, similar to arm described in
> second patch. Change from previous version - restore function is moved to
> host. 
> 
> Signed-off-by: Mario Smarduch 
> ---
>  arch/arm64/include/asm/kvm_host.h |  2 +-
>  arch/arm64/kernel/asm-offsets.c   |  1 +
>  arch/arm64/kvm/hyp.S  | 37 +++--
>  3 files changed, 33 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 26a2347..dcecf92 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -251,11 +251,11 @@ static inline void kvm_arch_hardware_unsetup(void) {}
>  static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>  static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
> -static inline void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu) {}
>  
>  void kvm_arm_init_debug(void);
>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
>  void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
>  void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
> +void kvm_restore_host_vfp_state(struct kvm_vcpu *vcpu);
>  
>  #endif /* __ARM64_KVM_HOST_H__ */
> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> index 8d89cf8..c9c5242 100644
> --- a/arch/arm64/kernel/asm-offsets.c
> +++ b/arch/arm64/kernel/asm-offsets.c
> @@ -124,6 +124,7 @@ int main(void)
>DEFINE(VCPU_HCR_EL2,   offsetof(struct kvm_vcpu, arch.hcr_el2));
>DEFINE(VCPU_MDCR_EL2,  offsetof(struct kvm_vcpu, arch.mdcr_el2));
>DEFINE(VCPU_IRQ_LINES, offsetof(struct kvm_vcpu, arch.irq_lines));
> +  DEFINE(VCPU_VFP_DIRTY, offsetof(struct kvm_vcpu, arch.vfp_dirty));
>DEFINE(VCPU_HOST_CONTEXT,  offsetof(struct kvm_vcpu, arch.host_cpu_context));
>DEFINE(VCPU_HOST_DEBUG_STATE, offsetof(struct kvm_vcpu, arch.host_debug_state));
>DEFINE(VCPU_TIMER_CNTV_CTL,offsetof(struct kvm_vcpu, arch.timer_cpu.cntv_ctl));
> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
> index e583613..ed2c4cf 100644
> --- a/arch/arm64/kvm/hyp.S
> +++ b/arch/arm64/kvm/hyp.S
> @@ -36,6 +36,28 @@
>  #define CPU_SYSREG_OFFSET(x) (CPU_SYSREGS + 8*x)
>  
>   .text
> +
> +/**
> + * void kvm_restore_host_vfp_state(struct vcpu *vcpu) - Executes lazy
> + *   fp/simd switch, saves the guest, restores host. Called from host
> + *   mode, placed outside of hyp section.

same comments on style as previous patch

> + */
> +ENTRY(kvm_restore_host_vfp_state)
> + pushxzr, lr
> +
> + add x2, x0, #VCPU_CONTEXT
> + mov w3, #0
> + strbw3, [x0, #VCPU_VFP_DIRTY]

I've been discussing with myself if it would make more sense to clear
the dirty flag in the C-code...

> +
> + bl __save_fpsimd
> +
> + ldr x2, [x0, #VCPU_HOST_CONTEXT]
> + bl __restore_fpsimd
> +
> + pop xzr, lr
> + ret
> +ENDPROC(kvm_restore_host_vfp_state)
> +
>   .pushsection.hyp.text, "ax"
>   .align  PAGE_SHIFT
>  
> @@ -482,7 +504,11 @@
>  99:
>   msr hcr_el2, x2
>   mov x2, #CPTR_EL2_TTA
> +
> + ldrbw3, [x0, #VCPU_VFP_DIRTY]
> + tbnzw3, #0, 98f
>   orr x2, x2, #CPTR_EL2_TFP
> +98:

mmm, don't you need to only set the fpexc32 when you're actually going
to trap the guest accesses?

also, you can consider only setting this in vcpu_load (jumping quickly
to EL2 to do so) if we're running a 32-bit guest.  Probably worth
measuring the difference between the extra EL2 jump on vcpu_load
compared to hitting this register on every entry/exit.

Code-wise, it will be nicer to do it on vcpu_load.

>   msr cptr_el2, x2
>  
>   mov x2, #(1 << 15)  // Trap CP15 Cr=15
> @@ -669,14 +695,12 @@ __restore_debug:
>   ret
>  
>  __save_fpsimd:
> - skip_fpsimd_state x3, 1f
>   save_fpsimd
> -1:   ret
> + ret
>  
>  __restore_fpsimd:
> - skip_fpsimd_state x3, 1f
>   restore_fpsimd
> -1:   ret
> + ret
>  
>  switch_to_guest_fpsimd:
>   push x4, lr
> @@ -688,6 +712,9 @@ switch_to_guest_fpsimd:
>  
>   mrs x0, tpidr_el2
>  
> + mov w2, #1
> + strb w2, [x0, #VCPU_VFP_DIRTY]

hmm, just noticing this.  Are you not writing a 32-bit value to a
potentially 8-bit field (ignoring padding in the struct), as the dirty
flag is declared a bool?

Are you also doing this on the 32-bit side?
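
On the width question, note that strb stores only the low byte of its
W source register, so the store itself is 8 bits wide. A small C sketch
of the same idea follows. The struct layout and helper names are
hypothetical (this is not the real kvm_vcpu_arch), and it assumes
sizeof(bool) == 1, which holds for the AArch64 Linux ABI but is not
guaranteed by C in general.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: a bool flag directly followed by another byte. */
struct arch_sketch {
	bool vfp_dirty;
	uint8_t neighbour;
};

/* Emulate `strb wN, [base, #off]`: store only the low 8 bits of w. */
static void store_byte(void *base, size_t off, uint32_t w)
{
	uint8_t low = (uint8_t)w;

	memcpy((char *)base + off, &low, sizeof(low));
}

/* Set the flag with a one-byte store, as the assembly does. */
static void mark_vfp_dirty(struct arch_sketch *a)
{
	assert(sizeof(a->vfp_dirty) == 1);
	store_byte(a, offsetof(struct arch_sketch, vfp_dirty), 1);
}
```

After mark_vfp_dirty() the flag reads as true and the neighbouring byte
keeps its previous value.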

> +
>   ldr x2, [x0, #VCPU_HOST_CONTEXT]
>   kern_hyp_va x2
>   bl __save_fpsimd
> @@ -763,7 +790,6 @@ __kvm_vcpu_return:
>   add x2, x0, #VCPU_CONTEXT
>  
>   save_guest_regs
> - bl __save_fpsimd
>   bl __save_sysregs
>  
>   skip_debug_state x3, 1f
> @@ -784,7 +810,6 @@ __kvm_vcpu_return:
>   kern_hyp_va x2
>  
>   bl __restore_sysregs
> - bl 

Re: [PATCH] KVM/arm: kernel low level debug support for ARM32 virtual platforms

2015-11-05 Thread Christoffer Dall
On Wed, Nov 04, 2015 at 07:51:58PM +0100, Ard Biesheuvel wrote:
> On 4 November 2015 at 19:49, Christopher Covington wrote:
> > On 11/04/2015 08:31 AM, Christoffer Dall wrote:
> >> On Tue, Nov 03, 2015 at 01:39:44PM -0600, Rob Herring wrote:
> >>> On Tue, Nov 3, 2015 at 1:17 PM, Mario Smarduch wrote:
> >>>> On 11/3/2015 9:55 AM, Will Deacon wrote:
> >>>>> On Tue, Nov 03, 2015 at 09:44:52AM -0800, Mario Smarduch wrote:
> >>>>>> On 11/3/2015 8:33 AM, Christopher Covington wrote:
> >>>>>>> On 11/02/2015 06:51 PM, Mario Smarduch wrote:
> >>>>>>>> this is a re-post from couple weeks ago, please take time to review this
> >>>>>>>> simple patch which simplifies DEBUG_LL and prevents kernel crash on virtual
> >>>>>>>> platforms.
> >>>>>>>>
> >>>>>>>> Before this patch DEBUG_LL for 'dummy virtual machine':
> >>>>>>>>
> >>>>>>>> ( ) Kernel low-level debugging via EmbeddedICE DCC channel
> >>>>>>>> ( ) Kernel low-level debug output via semihosting I/O
> >>>>>>>> ( ) Kernel low-level debugging via 8250 UART
> >>>>>>>> ( ) Kernel low-level debugging via ARM Ltd PL01x Primecell
> >>>>>>>>
> >>>>>>>> In summary if debug uart is not emulated kernel crashes.
> >>>>>>>> And once you pass that hurdle, uart physical/virtual addresses are unknown.
> >>>>>>>> DEBUG_LL comes in handy on many occasions and should be somewhat
> >>>>>>>> intuitive to use like it is for physical platforms. For virtual platforms
> >>>>>>>> user may start doubting the host and get into a bigger mess.
> >>>>>>>>
> >>>>>>>> After this patch is applied user gets:
> >>>>>>>>
> >>>>>>>> (X) Kernel low-level debugging on QEMU Virtual Platform
> >>>>>>>> ( ) Kernel low-level debugging on Kvmtool Virtual Platform
> >>>>>>>> . above repeated
> >>>>>>>>
> >>>>>>>> The virtual addresses selected follow arm reference models, high in vmalloc
> >>>>>>>> section with high mem enabled and guest running with >= 1GB of memory. The
> >>>>>>>> offset is leftover from arm reference models.
> >>>>>>>
> >>>>>>> Which model? It doesn't appear to match the vexpress AEM/RTSM/FVP/whatever
> >>>>>>> which used 0x1c09 for UART0.
> >>>>>>
> >>>>>> I recall QEMU virt model had its own physical address map, for sure I saw the
> >>>>>> virtio-mmio regions assigned in some ARM document. Peter would you know?
> >>>>>>
> >>>>>> As far as kvmtool I'm not sure, currently PC1 COM1 port is used? Andre will that
> >>>>>> stay fixed?
> >>>>>
> >>>>> We make absolutely no guarantees about the memory map provided by kvmtool.
> >>>>
> >>>> If that's also the case for qemu, then I guess the best you can do is find a way
> >>>> to dump the device tree. Find the uart, physical address and try figure out the
> >>>> virtual address.
> >>>>
> >>>> Pretty involved, hoped for something more automated since that's a handy feature.
> >>>
> >>> You really only need LL_DEBUG now if you are debugging very early code
> >>> before memory is setup and/or bad memory. Use earlycon instead which
> >>> should already be supported both via the pl011 or semihosting. I used
> >>> it with QEMU semihosting support.
> >>>
> >> Then we should really document how to use that with qemu's virt platform
> >> and kvmtool's platform on both 32-bit and 64-bit so that users can
> >> easily figure out what they're doing wrong when they get no output.
> >>
> >> In practice, the address for the pl011 is quite unlikely to change, I
> >> dare speculate, so that documentation shouldn't need frequent updating.
> >
> > Is it not on by default since the following change?
> >
> > http://git.qemu.org/?p=qemu.git;a=commitdiff;h=f022b8e95379b0433d13509706b66f38fc15dde8
> >
> 
> Yes, but it still requires the plain 'earlycon' argument (i.e, without
> '=pl011,...') to be passed on the kernel command line if you want
> early output.
> 
I didn't notice this. Cool!

-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL] First batch of KVM changes for 4.4

2015-11-05 Thread Paolo Bonzini
Linus,

The following changes since commit 0d997491f814c87310a6ad7be30a9049c7150489:

  arm/arm64: KVM: Fix disabled distributor operation (2015-10-20 18:09:13 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus

for you to fetch changes up to a3eaa8649e4c6a6afdafaa04b9114fb230617bb1:

  KVM: VMX: Fix commit which broke PML (2015-11-05 11:34:11 +0100)


s390: A bunch of fixes and optimizations for interrupt and time
handling.

PPC: Mostly bug fixes.

ARM: No big features, but many small fixes and prerequisites including:
- a number of fixes for the arch-timer
- introducing proper level-triggered semantics for the arch-timers
- a series of patches to synchronously halt a guest (prerequisite for
  IRQ forwarding)
- some tracepoint improvements
- a tweak for the EL2 panic handlers
- some more VGIC cleanups getting rid of redundant state

x86: quite a few changes:

- support for VT-d posted interrupts (i.e. PCI devices can inject
interrupts directly into vCPUs).  This introduces a new component (in
virt/lib/) that connects VFIO and KVM together.  The same infrastructure
will be used for ARM interrupt forwarding as well.

- more Hyper-V features, though the main one Hyper-V synthetic interrupt
controller will have to wait for 4.5.  These will let KVM expose Hyper-V
devices.

- nested virtualization now supports VPID (same as PCID but for vCPUs)
which makes it quite a bit faster

- for future hardware that supports NVDIMM, there is support for clflushopt,
clwb, pcommit

- support for "split irqchip", i.e. LAPIC in kernel + IOAPIC/PIC/PIT in
userspace, which reduces the attack surface of the hypervisor

- obligatory smattering of SMM fixes

- on the guest side, stable scheduler clock support was rewritten to not
require help from the hypervisor.


Alex Williamson (1):
  virt: IRQ bypass manager

Andrey Smetanin (8):
  kvm/x86: Hyper-V HV_X64_MSR_RESET msr
  kvm/x86: Hyper-V HV_X64_MSR_VP_INDEX export for QEMU.
  kvm/x86: Hyper-V HV_X64_MSR_VP_RUNTIME support
  kvm/eventfd: avoid loop inside irqfd_update()
  kvm/eventfd: factor out kvm_notify_acked_gsi()
  kvm/eventfd: add arch-specific set_irq
  kvm/irqchip: allow only multiple irqchip routes per GSI
  drivers/hv: share Hyper-V SynIC constants with userspace

Andrzej Hajda (1):
  KVM: PPC: e500: fix handling local_sid_lookup result

Christian Borntraeger (3):
  KVM: s390: remove unused variable in __inject_vm
  KVM: s390: drop useless newline in debugging data
  KVM: s390: use simple switch statement as multiplexer

Christoffer Dall (10):
  KVM: Add kvm_arch_vcpu_{un}blocking callbacks
  arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block
  arm/arm64: KVM: vgic: Factor out level irq processing on guest exit
  arm/arm64: KVM: Implement GICD_ICFGR as RO for PPIs
  arm/arm64: KVM: Use appropriate define in VGIC reset code
  arm/arm64: KVM: Add forwarded physical interrupts documentation
  arm/arm64: KVM: Rework the arch timer to use level-triggered semantics
  arm/arm64: KVM: Support edge-triggered forwarded interrupts
  arm/arm64: KVM: Improve kvm_exit tracepoint
  arm/arm64: KVM: Add tracepoints for vgic and timer

David Hildenbrand (10):
  KVM: s390: disabled wait cares about machine checks, not PER
  KVM: s390: set interception requests for all floating irqs
  KVM: s390: kvm_arch_vcpu_runnable already cares about timer interrupts
  KVM: s390: drop out early in kvm_s390_has_irq()
  KVM: s390: simplify in-kernel program irq injection
  KVM: s390: correctly handle injection of pgm irqs and per events
  KVM: s390: switch to get_tod_clock() and fix STP sync races
  KVM: s390: factor out and fix setting of guest TOD clock
  KVM: s390: factor out reading of the guest TOD clock
  KVM: s390: SCA must not cross page boundaries

Eric Auger (7):
  KVM: create kvm_irqfd.h
  KVM: introduce kvm_arch functions for IRQ bypass
  KVM: eventfd: add irq bypass consumer management
  KVM: arm/arm64: rename pause into power_off
  KVM: arm/arm64: check power_off in kvm_arch_vcpu_runnable
  KVM: arm/arm64: check power_off in critical section before VCPU run
  KVM: arm/arm64: implement kvm_arm_[halt,resume]_guest

Feng Wu (12):
  virt: Add virt directory to the top Makefile
  KVM: Add an arch specific hooks in 'struct kvm_kernel_irqfd'
  KVM: Extend struct pi_desc for VT-d Posted-Interrupts
  KVM: Add some helper functions for Posted-Interrupts
  KVM: Define a new interface kvm_intr_is_single_vcpu()
  KVM: make kvm_set_msi_irq() public
  vfio: Register/unregister irq_bypass_producer
  KVM: x86: Update IRTE for posted-interrupts
  KVM: x86: select IRQ_BYPASS_MANAGER
  KVM: Update 

Re: [PATCH v1 2/2] dma-mapping-common: add DMA attribute - DMA_ATTR_IOMMU_BYPASS

2015-11-05 Thread Joerg Roedel
On Mon, Nov 02, 2015 at 07:32:19PM +0200, Shamir Rabinovitch wrote:
> Correct. This issue is one of the concerns here in the previous replies.
> I will take different approach which will not require the IOMMU bypass
> per mapping. Will try to shift to the x86 'iommu=pt' approach.

Yeah, it doesn't really make sense to have an extra remappable area when
the device can access all physical memory anyway.

> We had a bunch of issues around SPARC IOMMU. Not all of them relate to
> performance. The first issue was that on SPARC, currently, we only have 
> limited address space to IOMMU so we had issue to do large DMA mappings
> for Infiniband. Second issue was that we identified high contention on 
> the IOMMU locks even in ETH driver.

Contended IOMMU locks are not only a problem on SPARC, but on x86 and
various other IOMMU drivers too. But I have some ideas on how to improve
the situation there.

> I do not want to put too much information here but you can see some results:
> 
> rds-stress test from sparc t5-2 -> x86:
> 
> with iommu bypass:
> -
> sparc->x86 cmdline = -r XXX -s XXX -q 256 -a 8192 -T 10 -d 10 -t 3 -o XXX
> tsks    tx/s  rx/s   tx+rx K/s  mbi K/s  mbo K/s  tx us/c  rtt us  cpu %
>    3  141278     0  1165565.81     0.00     0.00     8.93  376.60  -1.00  (average)
> 
> without iommu bypass:
> -
> sparc->x86 cmdline = -r XXX -s XXX -q 256 -a 8192 -T 10 -d 10 -t 3 -o XXX
> tsks    tx/s  rx/s   tx+rx K/s  mbi K/s  mbo K/s  tx us/c  rtt us  cpu %
>    3   78558     0   648101.41     0.00     0.00    15.05  876.72  -1.00  (average)
> 
> + RDMA tests are totally not working (might be due to failure to DMA map all 
> the memory).
> 
> So IOMMU bypass give ~80% performance boost.

Interesting. Have you looked more closely on what causes the performance
degradation? Is it the lock contention or something else?
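
The ~80% figure follows from the two tx/s averages quoted above (141278
with bypass, 78558 without). A minimal sketch of the arithmetic, with a
hypothetical helper name:

```c
/* Relative gain of `with` over `without`, expressed in percent. */
static double gain_percent(double with, double without)
{
	return (with / without - 1.0) * 100.0;
}
```

Calling gain_percent(141278.0, 78558.0) gives a value just under 80.
The same helper applied to the rtt column (376.60 us versus 876.72 us)
would need its arguments swapped, since lower is better there.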


Joerg



Re: [PATCH 1/3] dma: Provide simple noop dma ops

2015-11-05 Thread Joerg Roedel
On Tue, Nov 03, 2015 at 12:54:37PM +0100, Christian Borntraeger wrote:
> We are going to require dma_ops for several common drivers, even for
> systems that do have an identity mapping. Lets provide some minimal
> no-op dma_ops that can be used for that purpose.
> 
> Signed-off-by: Christian Borntraeger 
> ---
>  include/linux/dma-mapping.h |  2 ++
>  lib/Makefile                |  1 +
>  lib/dma-noop.c              | 75 +
>  3 files changed, 78 insertions(+)
>  create mode 100644 lib/dma-noop.c

Reviewed-by: Joerg Roedel 



Re: [PATCH v7 25/35] nvdimm acpi: init the resource used by NVDIMM ACPI

2015-11-05 Thread Xiao Guangrong



On 11/05/2015 09:03 PM, Igor Mammedov wrote:
> On Thu, 5 Nov 2015 18:15:31 +0800
> Xiao Guangrong wrote:
>
>> On 11/05/2015 05:58 PM, Igor Mammedov wrote:
>>> On Mon,  2 Nov 2015 17:13:27 +0800
>>> Xiao Guangrong wrote:
>>>
>>>> A page staring from 0xFF0 and IO port 0x0a18 - 0xa1b in guest are
>>>     ^^ missing one 0???
>>>
>>>> reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
>>>> for detailed design
>>>>
>>>> A parameter, 'nvdimm-support', is introduced for PIIX4_PM and ICH9-LPC
>>>> that controls if nvdimm support is enabled, it is true on default and
>>>> it is false on 2.4 and its earlier version to keep compatibility
>>>>
>>>> Signed-off-by: Xiao Guangrong
>>> [...]
>>>
>>>> @@ -33,6 +33,15 @@
>>>>    */
>>>>   #define MIN_NAMESPACE_LABEL_SIZE  (128UL << 10)
>>>>
>>>> +/*
>>>> + * A page staring from 0xFF0 and IO port 0x0a18 - 0xa1b in guest are
>>>     ^^^ missing 0 or value in define below has an extra 0
>>>
>>>> + * reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
>>>> + * for detailed design.
>>>> + */
>>>> +#define NVDIMM_ACPI_MEM_BASE  0xFF00ULL
>>> it still maps RAM at arbitrary place,
>>> that's the reason why VMGenID patches hasn't been merged for
>>> more than several months.
>>> I'm not against of using (more exactly I'm for it) direct mapping
>>> but we should reach consensus when and how to use it first.
>>>
>>> I'd wouldn't use addresses below 4G as it may be used firmware or
>>> legacy hardware and I won't bet that 0xFF00ULL won't conflict
>>> with anything.
>>> An alternative place to allocate reserve from could be high memory.
>>> For pc we have "reserved-memory-end" which currently makes sure
>>> that hotpluggable memory range isn't used by firmware.
>>>
>>> How about making API that allows to map additional memory
>>> ranges before reserved-memory-end and pushes it up as mappings are
>>> added.
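
One possible shape for such an API, purely as a sketch: every name
below (struct reserved_region, reserve_high_range(), align_up()) is
hypothetical and does not exist in QEMU. The idea shown is a simple
bump allocation: a new range is placed at the current end of the
reserved area, and the end is then pushed up past it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical allocator state; QEMU has no structure with this name. */
struct reserved_region {
	uint64_t end;   /* current value of "reserved-memory-end" */
};

/* Round addr up to a power-of-two alignment. */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Reserve `size` bytes at the current end and push the end up.
 * Returns the guest-physical base address of the new range.
 */
static uint64_t reserve_high_range(struct reserved_region *r,
				   uint64_t size, uint64_t align)
{
	uint64_t base;

	/* The alignment must be a non-zero power of two. */
	assert(align != 0 && (align & (align - 1)) == 0);

	base = align_up(r->end, align);
	r->end = base + size;
	return base;
}
```

Because the base is derived from state that only grows, two callers can
never be handed overlapping ranges, and firmware only has to honour the
final value of the end marker.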


>> That what i did in the v1/v2 versions, however, as you noticed, using 64-bit
>> address in ACPI in QEMU is not a easy work - we can not simply make
>> SSDT.rev = 2 to apply 64 bit address, the reason i have documented in
>> v3's changelog:
>>
>> 3) we figure out a unused memory hole below 4G that is 0xFF0 ~
>>    0xFFF0, this range is large enough for NVDIMM ACPI as build 64-bit
>>    ACPI SSDT/DSDT table will break windows XP.
>>    BTW, only make SSDT.rev = 2 can not work since the width is only depended
>>    on DSDT.rev based on 19.6.28 DefinitionBlock (Declare Definition Block)
>>    in ACPI spec:
>> | Note: For compatibility with ACPI versions before ACPI 2.0, the bit
>> | width of Integer objects is dependent on the ComplianceRevision of the DSDT.
>> | If the ComplianceRevision is less than 2, all integers are restricted to 32
>> | bits. Otherwise, full 64-bit integers are used. The version of the DSDT sets
>> | the global integer width for all integers, including integers in SSDTs.
>> 4) use the lowest ACPI spec version to document AML terms.
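
The integer-width rule in the note quoted above can be written down as
a short sketch. The function names are hypothetical and this is not
QEMU or ACPICA code; it only restates the rule that the DSDT
ComplianceRevision alone selects the width for every table, SSDTs
included.

```c
#include <stdint.h>

/* Integer width in bits, selected by the DSDT ComplianceRevision. */
static unsigned aml_integer_bits(uint8_t dsdt_compliance_revision)
{
	return dsdt_compliance_revision < 2 ? 32 : 64;
}

/* Truncate a value to the integer width the DSDT revision allows. */
static uint64_t aml_truncate(uint64_t value, uint8_t dsdt_compliance_revision)
{
	if (aml_integer_bits(dsdt_compliance_revision) == 32)
		return value & 0xFFFFFFFFULL;
	return value;
}
```

Note that the revision of an SSDT never appears as an input, which is
why setting only SSDT.rev = 2 does not widen integers.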

>> The only way introducing 64 bit address is adding XSDT support that what
>> Michael did before, however, it seems it need great efforts to do it as
>> it will break OVMF. It's a long term workload. :(
>
> to enable 64-bit integers in AML it's sufficient to change DSDT revision to 2,
> I already have a patch that switches DSDT/SSDT to rev2.
> Tests show it doesn't break WindowsXP (which is rev1) and uses 64-bit integers
> on linux & later Windows versions.

Great. I remember doing a similar test (directly changing the DSDT to rev2) and
it caused a WinXP blue screen. Could you please tell me where I can find your
patch?

>> The luck thing is, the ACPI part is not ABI, we can move it to the high
>> memory if QEMU supports XSDT is ready in future development.
>
> But mapped control region is ABI and we can't change it if we find out later
> that it breaks something.

But the ACPI code is completely built by QEMU, which is transparent to the
guest, and the guest should not depend on it, no?
