On Thu, Sep 05, 2013 at 02:05:09PM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2013-09-03 at 13:53 +0300, Gleb Natapov wrote:
> > > Or supporting all IOMMU links (and leaving emulated stuff as is) in one
> > > "device" is the last thing I have to do and then you'll ack the patch?
> > >
> > I am
On Fri, Sep 06, 2013 at 10:58:16AM +0530, Aneesh Kumar K.V wrote:
> Paul Mackerras writes:
>
> > This enables us to use the Processor Compatibility Register (PCR) on
> > POWER7 to put the processor into architecture 2.05 compatibility mode
> > when running a guest. In this mode the new instructi
On 09/06/2013 04:01 PM, Gleb Natapov wrote:
> On Fri, Sep 06, 2013 at 09:38:21AM +1000, Alexey Kardashevskiy wrote:
>> On 09/06/2013 04:10 AM, Gleb Natapov wrote:
>>> On Wed, Sep 04, 2013 at 02:01:28AM +1000, Alexey Kardashevskiy wrote:
On 09/03/2013 08:53 PM, Gleb Natapov wrote:
> On Mon,
On Fri, Sep 06, 2013 at 09:38:21AM +1000, Alexey Kardashevskiy wrote:
> On 09/06/2013 04:10 AM, Gleb Natapov wrote:
> > On Wed, Sep 04, 2013 at 02:01:28AM +1000, Alexey Kardashevskiy wrote:
> >> On 09/03/2013 08:53 PM, Gleb Natapov wrote:
> >>> On Mon, Sep 02, 2013 at 01:14:29PM +1000, Alexey Karda
Paul Mackerras writes:
> This enables us to use the Processor Compatibility Register (PCR) on
> POWER7 to put the processor into architecture 2.05 compatibility mode
> when running a guest. In this mode the new instructions and registers
> that were introduced on POWER7 are disabled in user mode
POWER8 has 512 sets in the TLB, compared to 128 for POWER7, so we need
to do more tlbiel instructions when flushing the TLB on POWER8.
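The sizing rule above can be sketched as follows (a trivial user-space model for illustration, not the actual rmhandlers.S assembly loop): flushing the whole TLB takes one tlbiel per congruence class, so 512 iterations on POWER8 versus 128 on POWER7.

```c
#include <assert.h>

/* Model of the flush-count selection described above: the number of
 * tlbiel iterations equals the number of TLB sets on the CPU. */
static int tlb_flush_iterations(int is_power8)
{
    return is_power8 ? 512 : 128;
}
```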
Signed-off-by: Paul Mackerras
---
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 8 +++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc
From: Michael Ellerman
At present this should never happen, since the host kernel sets
HFSCR to allow access to all facilities. It's better to be prepared
to handle it cleanly if it does ever happen, though.
Signed-off-by: Michael Ellerman
Signed-off-by: Paul Mackerras
---
arch/powerpc/inclu
This series adds support for the POWER8 CPU in HV KVM. POWER8 adds
several new guest-accessible instructions, special-purpose registers,
and other features such as doorbell interrupts and hardware
transactional memory. It also adds new hypervisor-controlled features
such as relocation-on interrup
From: Michael Neuling
POWER8 doesn't have the DABR and DABRX registers; instead it has
new DAWR/DAWRX registers, which will be handled in a later patch.
Signed-off-by: Michael Neuling
Signed-off-by: Paul Mackerras
---
arch/powerpc/kvm/book3s_hv_interrupts.S | 2 ++
arch/powerpc/kvm/book3s_hv
POWER8 has support for hypervisor doorbell interrupts. Though the
kernel doesn't use them for IPIs on the powernv platform yet, it
probably will in future, so this makes KVM cope gracefully if a
hypervisor doorbell interrupt arrives while in a guest.
Signed-off-by: Paul Mackerras
---
arch/power
From: Michael Neuling
This adds fields to the struct kvm_vcpu_arch to store the new
guest-accessible SPRs on POWER8, adds code to the get/set_one_reg
functions to allow userspace to access this state, and adds code to
the guest entry and exit to context-switch these SPRs between host
and guest.
On a threaded processor such as POWER7, we group VCPUs into virtual
cores and arrange that the VCPUs in a virtual core run on the same
physical core. Currently we don't enforce any correspondence between
virtual thread numbers within a virtual core and physical thread
numbers. Physical threads ar
POWER8 has a bit in the LPCR to enable or disable the PURR and SPURR
registers to count when in the guest. Set this bit.
POWER8 has a field in the LPCR called AIL (Alternate Interrupt Location)
which is used to enable relocation-on interrupts. Allow userspace to
set this field.
Signed-off-by: P
This allows us to select architecture 2.05 (POWER6) or 2.06 (POWER7)
compatibility modes on a POWER8 processor.
Signed-off-by: Paul Mackerras
---
arch/powerpc/include/asm/reg.h | 2 ++
arch/powerpc/kvm/book3s_hv.c | 16 +++-
2 files changed, 17 insertions(+), 1 deletion(-)
diff -
Currently in book3s_hv_rmhandlers.S we have three places where we
have woken up from nap mode and we check the reason field in SRR1
to see what event woke us up. This consolidates them into a new
function, kvmppc_check_wake_reason. It looks at the wake reason
field in SRR1, and if it indicates th
* SRR1 wake reason field for system reset interrupt on wakeup from nap
is now a 4-bit field on P8, compared to 3 bits on P7.
* Set PECEDP in LPCR when napping because of H_CEDE so guest doorbells
will wake us up.
* Waking up from nap because of a guest doorbell interrupt is not a
reason to
From: Michael Ellerman
This means that if we do happen to get a trap that we don't know
about, we abort the guest rather than crashing the host kernel.
Signed-off-by: Michael Ellerman
Signed-off-by: Paul Mackerras
---
arch/powerpc/kvm/book3s_hv.c | 3 +--
1 file changed, 1 insertion(+), 2 del
This moves the code in book3s_hv_rmhandlers.S that reads any pending
interrupt from the XICS interrupt controller, and works out whether
it is an IPI for the guest, an IPI for the host, or a device interrupt,
into a new function called kvmppc_read_intr. Later patches will
need this.
Signed-off-by
The yield count in the VPA is supposed to be incremented every time
we enter the guest, and every time we exit the guest, so that its
value is even when the vcpu is running in the guest and odd when it
isn't. However, it's currently possible that we increment the yield
count on the way into the gu
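The invariant being protected can be modeled in a few lines (a user-space sketch for illustration, not the kernel code): the yield count is bumped once on every guest entry and once on every exit, so its parity tells whether the vcpu is currently in the guest.

```c
#include <assert.h>
#include <stdint.h>

/* Per the description: the count is even while the vcpu runs in the
 * guest and odd while it does not, so it starts odd. */
struct vpa { uint32_t yield_count; };

static void guest_entry(struct vpa *v) { v->yield_count++; }
static void guest_exit(struct vpa *v)  { v->yield_count++; }

static int in_guest(const struct vpa *v)
{
    return (v->yield_count & 1) == 0;
}
```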
We have two paths into and out of the low-level guest entry and exit
code: from a vcpu task via kvmppc_hv_entry_trampoline, and from the
system reset vector for an offline secondary thread on POWER7 via
kvm_start_guest. Currently both just branch to kvmppc_hv_entry to
enter the guest, and on guest
The H_CONFER hypercall is used when a guest vcpu is spinning on a lock
held by another vcpu which has been preempted, and the spinning vcpu
wishes to give its timeslice to the lock holder. We implement this
in the straightforward way using kvm_vcpu_yield_to().
Signed-off-by: Paul Mackerras
---
Currently we are not saving and restoring the SIAR and SDAR registers in
the PMU (performance monitor unit) on guest entry and exit. The result
is that performance monitoring tools in the guest could get false
information about where a program was executing and what data it was
accessing at the ti
This adds the ability for userspace to read and write the LPCR
(Logical Partitioning Control Register) value relating to a guest
via the GET/SET_ONE_REG interface. There is only one LPCR value
for the guest, which can be accessed through any vcpu. Userspace
can only modify the following fields of
The VRSAVE register value for a vcpu is accessible through the
GET/SET_SREGS interface for Book E processors, but not for Book 3S
processors. In order to make this accessible for Book 3S processors,
this adds a new register identifier for GET/SET_ONE_REG, and adds
the code to implement it.
Signed
This enables us to use the Processor Compatibility Register (PCR) on
POWER7 to put the processor into architecture 2.05 compatibility mode
when running a guest. In this mode the new instructions and registers
that were introduced on POWER7 are disabled in user mode. This
includes all the VSX faci
POWER7 and later IBM server processors have a register called the
Program Priority Register (PPR), which controls the priority of
each hardware CPU SMT thread, and affects how fast it runs compared
to other SMT threads. This priority can be controlled by writing to
the PPR or by use of a set of in
This allows guests to have a different timebase origin from the host.
This is needed for migration, where a guest can migrate from one host
to another and the two hosts might have a different timebase origin.
However, the timebase seen by the guest must not go backwards, and
should go forwards only
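The offset arithmetic can be sketched as below (an illustrative user-space model under the assumption that the guest timebase is simply host timebase plus a per-guest offset; function names are made up for the sketch): on migration the destination recomputes the offset so the guest clock continues from the value it had on the source.

```c
#include <assert.h>
#include <stdint.h>

/* Guest timebase = host timebase + signed per-guest offset
 * (unsigned wraparound gives the right result for negative offsets). */
static uint64_t guest_tb(uint64_t host_tb, int64_t tb_offset)
{
    return host_tb + (uint64_t)tb_offset;
}

/* Pick the offset so the guest clock on the new host resumes at the
 * last value seen on the old host, i.e. it never goes backwards. */
static int64_t offset_after_migration(uint64_t new_host_tb,
                                      uint64_t last_guest_tb)
{
    return (int64_t)(last_guest_tb - new_host_tb);
}
```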
This series of patches is based on Alex Graf's kvm-ppc-queue branch.
It fixes some bugs, makes some more registers accessible through the
one_reg interface, and implements some missing features such as
support for the compatibility modes in recent POWER cpus and support
for the guest having a diffe
Test cases for the preemption timer in nested VMX. Two aspects are tested:
1. Save the preemption timer on VMEXIT if the relevant bit is set in EXIT_CONTROL.
2. Test a related bug in KVM. The bug is that the preemption timer
value is not saved if we exit L2->L0 for some reason and then enter L0->L2. Thus the preemption
timer will never
This patch contains the following two changes:
1. Fix the bug in nested preemption timer support. If a vmexit L2->L0
occurs for a reason not emulated by L1, the preemption timer value
should be saved on such exits.
2. Add support for the "Save VMX-preemption timer value" VM-Exit control
to nVMX.
With this patc
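The two changes can be condensed into a sketch (a user-space model, not the KVM code; the constant follows the SDM's VM-exit controls layout, where bit 22 is "save VMX-preemption timer value"): whenever L1 set the save bit, every L2->L0 exit must refresh the timer value L1 will later read, even for exits L0 handles itself.

```c
#include <assert.h>
#include <stdint.h>

#define VM_EXIT_SAVE_PREEMPT_TIMER (1u << 22)

/* Returns the timer value vmcs12 should hold after an L2->L0 exit:
 * the live hardware value if L1 asked for it to be saved, otherwise
 * the previously stored value is left untouched. */
static uint32_t sync_preempt_timer(uint32_t exit_ctls,
                                   uint32_t hw_timer_value,
                                   uint32_t vmcs12_timer_value)
{
    if (exit_ctls & VM_EXIT_SAVE_PREEMPT_TIMER)
        return hw_timer_value;
    return vmcs12_timer_value;
}
```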
On 09/06/2013 04:10 AM, Gleb Natapov wrote:
> On Wed, Sep 04, 2013 at 02:01:28AM +1000, Alexey Kardashevskiy wrote:
>> On 09/03/2013 08:53 PM, Gleb Natapov wrote:
>>> On Mon, Sep 02, 2013 at 01:14:29PM +1000, Alexey Kardashevskiy wrote:
On 09/01/2013 10:06 PM, Gleb Natapov wrote:
> On Wed,
On 06.09.2013, at 00:09, Scott Wood wrote:
> On Thu, 2013-07-11 at 01:09 +0200, Alexander Graf wrote:
>> On 11.07.2013, at 01:08, Scott Wood wrote:
>>
>>> On 07/10/2013 06:04:53 PM, Alexander Graf wrote:
On 11.07.2013, at 01:01, Benjamin Herrenschmidt wrote:
> On Thu, 2013-07-11 at 00:5
On Thu, 2013-09-05 at 15:22 -0700, Zi Shen Lim wrote:
> Signed-off-by: Zi Shen Lim
> ---
Applied. Thanks!
Alex
> Documentation/vfio.txt | 8
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
> index d7993dc..b9ca023 10
Signed-off-by: Zi Shen Lim
---
Documentation/vfio.txt | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
index d7993dc..b9ca023 100644
--- a/Documentation/vfio.txt
+++ b/Documentation/vfio.txt
@@ -167,8 +167,8 @@ group and c
On Thu, 2013-07-11 at 01:09 +0200, Alexander Graf wrote:
> On 11.07.2013, at 01:08, Scott Wood wrote:
>
> > On 07/10/2013 06:04:53 PM, Alexander Graf wrote:
> >> On 11.07.2013, at 01:01, Benjamin Herrenschmidt wrote:
> >> > On Thu, 2013-07-11 at 00:57 +0200, Alexander Graf wrote:
> >> >>> #ifdef C
Hi, Gleb.
On Thursday 05 September 2013 21:00:50 +0300,
Gleb Natapov wrote:
> > > > Has someone had this problem and managed to solve it somehow? Is there
> > > > any debug information I can provide to help solve this?
> > > For simple troubleshooting try "info status" from the QEMU monitor.
> > ss01:~# te
On Wed, Sep 04, 2013 at 02:01:28AM +1000, Alexey Kardashevskiy wrote:
> On 09/03/2013 08:53 PM, Gleb Natapov wrote:
> > On Mon, Sep 02, 2013 at 01:14:29PM +1000, Alexey Kardashevskiy wrote:
> >> On 09/01/2013 10:06 PM, Gleb Natapov wrote:
> >>> On Wed, Aug 28, 2013 at 06:50:41PM +1000, Alexey Karda
On Thu, Sep 05, 2013 at 12:44:58PM -0300, Daniel Bareiro wrote:
> Hi, Pablo.
>
> On Thursday 05 September 2013 10:30:12 +0200,
> Paolo Bonzini wrote:
>
> > > Has someone had this problem and managed to solve it somehow? Is there
> > > any debug information I can provide to help solve this?
>
> > For simple
Hi Sasha,
On 04/09/13 19:01, Sasha Levin wrote:
On 09/04/2013 01:48 PM, Pekka Enberg wrote:
On Wed, Sep 4, 2013 at 8:40 PM, Jonathan Austin wrote:
'top' works on ARM with virtio console. I've just done some new testing
with the serial console emulation, and I see the same as you're reporti
On 04/09/2013 22:32, Radim Krčmář wrote:
> I did not reproduce the bug fixed in [1/2], but there are not that many
> reasons why we could not unload a module, so the spot is quite obvious.
>
>
> Radim Krčmář (2):
> kvm: free resources after canceling async_pf
> kvm: remove .done from str
Hi, Pablo.
On Thursday 05 September 2013 10:30:12 +0200,
Paolo Bonzini wrote:
> > Has someone had this problem and managed to solve it somehow? Is there
> > any debug information I can provide to help solve this?
> For simple troubleshooting try "info status" from the QEMU monitor.
ss01:~# telnet localhost
On Thu, Sep 5, 2013 at 7:05 PM, Zhang, Yang Z wrote:
> Arthur Chunqi Li wrote on 2013-09-05:
>> > Arthur Chunqi Li wrote on 2013-09-05:
>> >> On Thu, Sep 5, 2013 at 3:45 PM, Zhang, Yang Z
>> >>
>> >> wrote:
>> >> > Arthur Chunqi Li wrote on 2013-09-04:
>> >> >> This patch contains the following t
On Tue, 3 Sep 2013, Michael S. Tsirkin wrote:
> On Tue, Sep 03, 2013 at 09:40:48AM +0100, Wei Liu wrote:
> > On Tue, Sep 03, 2013 at 09:28:11AM +0800, Qin Chuanyu wrote:
> > > On 2013/9/2 15:57, Wei Liu wrote:
> > > >On Sat, Aug 31, 2013 at 12:45:11PM +0800, Qin Chuanyu wrote:
> > > >>On 2013/8/30
On 09/05/2013 08:21 PM, Paolo Bonzini wrote:
> Page tables in a read-only memory slot will currently cause a triple
> fault when running with shadow paging, because the page walker uses
> gfn_to_hva and it fails on such a slot.
>
> TianoCore uses such a page table. The idea is that, on real hardw
QEMU moves state from CPUArchState to struct kvm_xsave and back when it
invokes the KVM_*_XSAVE ioctls. Because it doesn't treat the XSAVE
region as an opaque blob, it might be impossible to set some state on
the destination if migrating to an older version.
This patch blocks migration if it find
This series fixes two migration bugs concerning KVM's XSAVE ioctls,
both found by code inspection (the second in fact is just theoretical
until AVX512 or MPX support is added to KVM).
Please review.
Paolo Bonzini (2):
x86: fix migration from pre-version 12
KVM: make XSAVE support more robust
On KVM, KVM_SET_XSAVE would be executed with a zero xstate_bv,
and would not restore anything.
Since FP and SSE data are always valid, set them in xstate_bv at reset
time. In fact, that value is the same that KVM_GET_XSAVE returns on
pre-XSAVE hosts.
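The reset rule above amounts to a one-liner (a hedged sketch with assumed constant names; the XSAVE state-component bits for x87 and SSE are bits 0 and 1): since FP and SSE data are always valid, their bits are set in xstate_bv at reset so KVM_SET_XSAVE restores them.

```c
#include <assert.h>
#include <stdint.h>

#define XSTATE_FP  (1ULL << 0)  /* x87 state, always valid */
#define XSTATE_SSE (1ULL << 1)  /* SSE state, always valid */

/* If xstate_bv stayed 0, KVM_SET_XSAVE would restore nothing; this is
 * also the value KVM_GET_XSAVE reports on pre-XSAVE hosts. */
static uint64_t xstate_bv_at_reset(void)
{
    return XSTATE_FP | XSTATE_SSE;
}
```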
Signed-off-by: Paolo Bonzini
---
target-i386/cpu
A guest can still attempt to save and restore XSAVE states even if they
have been masked in CPUID leaf 0Dh. This usually is not visible to
the guest, but is still wrong: "Any attempt to set a reserved bit (as
determined by the contents of EAX and EDX after executing CPUID with
EAX=0DH, ECX= 0H) in
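The check being quoted can be sketched as a mask test (an illustrative helper, not KVM's actual function name): any bit of xstate_bv that CPUID leaf 0Dh does not report as supported is reserved, and an attempt to set it must fault.

```c
#include <assert.h>
#include <stdint.h>

/* Valid iff no reserved bit (outside the CPUID-reported mask) is set. */
static int xstate_bv_valid(uint64_t xstate_bv, uint64_t cpuid_0d_mask)
{
    return (xstate_bv & ~cpuid_0d_mask) == 0;
}
```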
Page tables in a read-only memory slot will currently cause a triple
fault when running with shadow paging, because the page walker uses
gfn_to_hva and it fails on such a slot.
TianoCore uses such a page table. The idea is that, on real hardware,
the firmware can already run in 64-bit flat mode w
Arthur Chunqi Li wrote on 2013-09-05:
> > Arthur Chunqi Li wrote on 2013-09-05:
> >> On Thu, Sep 5, 2013 at 3:45 PM, Zhang, Yang Z
> >>
> >> wrote:
> >> > Arthur Chunqi Li wrote on 2013-09-04:
> >> >> This patch contains the following two changes:
> >> >> 1. Fix the bug in nested preemption timer
It is like a nulls list: we use the pte-list as the nulls value, which helps us
detect whether the "desc" has been moved to another rmap, in which case we can
re-walk the rmap.
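The nulls idea can be modeled with a tagged terminator pointer (a simplified user-space sketch, not the kernel's pte_list_desc layout; it assumes rmap heads are at least 2-byte aligned so the low bit is free): the list ends not in NULL but in a marker that identifies the owning rmap, so a lockless walker reaching the end can tell whether the chain still belongs to the rmap it started from.

```c
#include <assert.h>
#include <stdint.h>

#define NULLS_MARK 1UL

/* Encode the owning rmap head as the list terminator. */
static void *make_nulls(void *rmap_head)
{
    return (void *)((uintptr_t)rmap_head | NULLS_MARK);
}

/* A tagged pointer is a terminator, not a real desc. */
static int is_nulls(void *p)
{
    return ((uintptr_t)p & NULLS_MARK) != 0;
}

/* Recover which rmap the terminator belongs to; if it differs from
 * the rmap the walk started on, the desc was moved and the walker
 * must re-walk. */
static void *nulls_owner(void *p)
{
    return (void *)((uintptr_t)p & ~NULLS_MARK);
}
```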
kvm->slots_lock is held when we do the lockless walking, which prevents the rmap
from being reused (freeing an rmap requires holding that lock) so that
kvm_vm_ioctl_get_dirty_log() write-protects the spte based on its dirty
bitmap, so we should ensure the writable spte can be found in the rmap before the
dirty bitmap is visible. Otherwise, we clear the dirty bitmap but fail to
write-protect the page, as detailed in the comments in this patch
Currently, kvm zaps the large spte if write protection is needed, so a later
read can fault on that spte. Actually, we can make the large spte readonly
instead of making it non-present, so the page fault caused by a read access can
be avoided
The idea is from Avi:
| As I mentioned before, write-protecti
Changelog v2:
- the changes from Gleb's review:
1) fix calculating the number of sptes in pte_list_add()
2) set iter->desc to NULL if a nulls desc is met, to clean up the code of
rmap_get_next()
3) fix hlist corruption due to accessing sp->hlist outside of the mmu-lock
4) use rcu functions to
Change the algorithm to:
1) always add the new desc to the first desc (pointed to by parent_ptes/rmap);
that is good for implementing rcu-nulls-list-like lockless rmap
walking
2) always move the entry in the first desc to the position we want
to remove when deleting a spte in the parent_ptes/rmap (
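The two rules above can be modeled in miniature (a simplified user-space sketch with fixed-size descs, not the kernel structures): new descs are only ever linked at the head, and a deleted slot is refilled with an entry taken from the head desc, so only the head desc ever shrinks.

```c
#include <assert.h>
#include <stddef.h>

#define DESC_ENTRIES 3

struct desc {
    unsigned long sptes[DESC_ENTRIES];
    int nr;                 /* entries used in this desc */
    struct desc *more;      /* next desc in the chain */
};

/* Rule 1: add to the head desc; link a fresh desc at the head when
 * the current head is full (or the list is empty). */
static struct desc *desc_add(struct desc *head, struct desc *spare,
                             unsigned long spte)
{
    if (head && head->nr < DESC_ENTRIES) {
        head->sptes[head->nr++] = spte;
        return head;
    }
    spare->nr = 1;
    spare->sptes[0] = spte;
    spare->more = head;
    return spare;
}

/* Rule 2: overwrite the deleted slot (entry i of desc d) with the last
 * entry of the head desc, then shrink the head; unlink it if empty. */
static struct desc *desc_remove(struct desc *head, struct desc *d, int i)
{
    d->sptes[i] = head->sptes[--head->nr];
    if (head->nr == 0)
        return head->more;
    return head;
}
```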
Relax the tlb flush condition since we will write-protect the spte outside of the mmu
lock. Note that lockless write-protection only marks the writable spte readonly,
and the spte can be writable only if both SPTE_HOST_WRITEABLE and
SPTE_MMU_WRITEABLE are set (as tested by spte_is_locklessly_modifiable)
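The test named above reduces to a two-bit mask check (the bit positions below are assumptions for illustration, not the actual kernel values; only the rule matters): a spte qualifies for lockless modification only when both software-writable bits are set.

```c
#include <assert.h>
#include <stdint.h>

#define SPTE_HOST_WRITEABLE (1ULL << 52)  /* assumed position */
#define SPTE_MMU_WRITEABLE  (1ULL << 53)  /* assumed position */

/* True only when both software bits are present in the spte. */
static int spte_is_locklessly_modifiable(uint64_t spte)
{
    return (spte & (SPTE_HOST_WRITEABLE | SPTE_MMU_WRITEABLE)) ==
           (SPTE_HOST_WRITEABLE | SPTE_MMU_WRITEABLE);
}
```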
If the desc is the last one and it is full, its sptes are not counted
Signed-off-by: Xiao Guangrong
---
arch/x86/kvm/mmu.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 6e2d2c8..7714fd8 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@
Now we can flush all the TLBs outside of the mmu lock without TLB corruption when
write-protecting the sptes, because:
- we have marked large sptes readonly instead of dropping them, which means we
just change the spte from writable to readonly, so that we only need to care
about the case of changing spte
Use sp->role.level instead of @level, since @level is not obtained from the
page table hierarchy.
There is no issue in the current code, since the fast page fault currently only
fixes faults caused by dirty logging, which is always on the last level
(level = 1).
This patch makes the code more readable and avoi
The basic idea is from the nulls list, which uses a nulls value to indicate
whether the desc has been moved to a different pte-list.
Note that we should do a bottom-up walk in the desc, since we always move
the bottom entry to the deleted position. A desc only has 3 entries
in the current code so it is not a problem now, bu
Since pte_list_desc will be accessed locklessly, we need to atomically initialize
its pointers so that the lockless walker cannot read a partial value from a
pointer.
In this patch we initialize the pointers by plain assignment, which is always
atomic, instead of using kmem_cache_zal
It is easy if the handler is in the vcpu context; in that case we can use
walk_shadow_page_lockless_begin() and walk_shadow_page_lockless_end(), which
disable interrupts to stop shadow pages from being freed. But we are in the ioctl
context, and the paths we are optimizing for have a heavy workload; disabling
Currently, when we mark a memslot as dirty-logged or get its dirty pages, we need to
write-protect a large amount of guest memory, which is heavy work; especially, we
need to hold the mmu-lock, which is also required by vcpus to fix their page table
faults and by the mmu-notifier when a host page is being changed. In an extreme
cpu / memo
Now the only user of spte_write_protect is rmap_write_protect, which
always calls spte_write_protect with pt_protect = true, so drop
that parameter along with the unused parameter @kvm
Signed-off-by: Xiao Guangrong
---
arch/x86/kvm/mmu.c | 19 ---
1 file changed, 8 insertions(+), 11 deletions(-)
d
It was removed by commit 834be0d83. Now we will need it to do lockless shadow
page walking protected by rcu, so reintroduce it
Signed-off-by: Xiao Guangrong
---
arch/x86/kvm/mmu.c | 23 ---
1 file changed, 20 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/ar
Use rcu_assign_pointer() to update all the pointers in the desc,
and rcu_dereference() to read the pointers locklessly.
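A user-space analogue of that pattern, using C11 atomics in place of the RCU primitives (an illustrative sketch, not the kernel code): rcu_assign_pointer() behaves like a store-release and rcu_dereference() like a dependency-ordered load, so a lockless reader that observes the new pointer also observes the fully initialized object it points to.

```c
#include <assert.h>
#include <stdatomic.h>

struct node { int val; };

static _Atomic(struct node *) head;

/* Publisher: initialize the object first, then publish it with a
 * release store (the rcu_assign_pointer() analogue). */
static void publish(struct node *n, int val)
{
    n->val = val;
    atomic_store_explicit(&head, n, memory_order_release);
}

/* Lockless reader: dependency-ordered load (the rcu_dereference()
 * analogue) guarantees the pointed-to data is visible. */
static struct node *lockless_read(void)
{
    return atomic_load_explicit(&head, memory_order_consume);
}
```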
Signed-off-by: Xiao Guangrong
---
arch/x86/kvm/mmu.c | 46 --
1 file changed, 28 insertions(+), 18 deletions(-)
diff --git a/arch/x86/kvm
I have made some experiments as you suggested. Migrating the VM back to
the node where it worked fine does not help. I found this in the VM's logs:
Clocksource tsc unstable (delta = 123652847 ns)
It was logged after the first live migration but did not break the VM. I
continued to migrate it back and forth and afte
On Thu, Sep 5, 2013 at 5:24 PM, Zhang, Yang Z wrote:
> Arthur Chunqi Li wrote on 2013-09-05:
>> On Thu, Sep 5, 2013 at 3:45 PM, Zhang, Yang Z
>> wrote:
>> > Arthur Chunqi Li wrote on 2013-09-04:
>> >> This patch contains the following two changes:
>> >> 1. Fix the bug in nested preemption timer s
Arthur Chunqi Li wrote on 2013-09-05:
> On Thu, Sep 5, 2013 at 3:45 PM, Zhang, Yang Z
> wrote:
> > Arthur Chunqi Li wrote on 2013-09-04:
> >> This patch contains the following two changes:
> >> 1. Fix the bug in nested preemption timer support. If vmexit L2->L0
> >> with some reasons not emulated
Hi Jan, Gleb and Paolo,
It suddenly occurred to me: if the guest's PIN_PREEMPT is disabled while
EXI_SAVE_PREEMPT_VALUE is enabled, what will happen? The preemption timer value in
the vmcs will not be affected, yes?
This case is not tested by this patch.
Arthur
On Wed, Sep 4, 2013 at 11:26 PM, Arthur Chunqi Li
On Thu, Sep 5, 2013 at 3:45 PM, Zhang, Yang Z wrote:
> Arthur Chunqi Li wrote on 2013-09-04:
>> This patch contains the following two changes:
>> 1. Fix the bug in nested preemption timer support. If vmexit L2->L0 with some
>> reasons not emulated by L1, preemption timer value should be saved in su
On 05/09/2013 01:31, Daniel Bareiro wrote:
> Has someone had this problem and managed to solve it somehow? Is there
> any debug information I can provide to help solve this?
For simple troubleshooting try "info status" from the QEMU monitor.
You can also tr
https://bugzilla.kernel.org/show_bug.cgi?id=60850
Gleb changed:
           What      |Removed        |Added
           CC        |               |g...@redhat.com
--- Comment #1 from Gleb ---
This
Arthur Chunqi Li wrote on 2013-09-04:
> This patch contains the following two changes:
> 1. Fix the bug in nested preemption timer support. If vmexit L2->L0 with some
> reasons not emulated by L1, preemption timer value should be saved in such
> exits.
> 2. Add support of "Save VMX-preemption timer