Re: [RFC Patch 03/12] IXGBE: Add sysfs interface for Qemu to migrate VF status in the PF driver

2015-10-25 Thread Lan, Tianyu



On 10/22/2015 4:45 AM, Alexander Duyck wrote:

+/* Record states hold by PF */
+memcpy(&state->vf_data, &adapter->vfinfo[vfn],
+   sizeof(struct vf_data_storage));
+
+vf_shift = vfn % 32;
+reg_offset = vfn / 32;
+
+reg = IXGBE_READ_REG(hw, IXGBE_VFTE(reg_offset));
+reg &= ~(1 << vf_shift);
+IXGBE_WRITE_REG(hw, IXGBE_VFTE(reg_offset), reg);
+
+reg = IXGBE_READ_REG(hw, IXGBE_VFRE(reg_offset));
+reg &= ~(1 << vf_shift);
+IXGBE_WRITE_REG(hw, IXGBE_VFRE(reg_offset), reg);
+
+reg = IXGBE_READ_REG(hw, IXGBE_VMECM(reg_offset));
+reg &= ~(1 << vf_shift);
+IXGBE_WRITE_REG(hw, IXGBE_VMECM(reg_offset), reg);
+
+return sizeof(struct state_in_pf);
+}
+


This is a read.  Why does it need to switch off the VF?  Also, why turn
off the anti-spoof?  It doesn't make much sense.


This is to prevent packets targeted at the VM from being delivered to the
original VF after migration. E.g. after migration, if the VM pings the PF
of the original machine, the ping reply packet would be forwarded to the
original VF if it wasn't disabled.

BTW, the read is done when VM has been stopped on the source machine.
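For reference, the VFTE/VFRE/VMECM registers in the snippet above pack one
enable bit per VF, 32 VFs per 32-bit register, which is what the
`vfn % 32` / `vfn / 32` arithmetic selects. A minimal, hardware-free sketch
of just that mask arithmetic (names are illustrative, not from the driver):

```c
#include <stdint.h>

/* Each 32-bit register holds enable bits for 32 VFs: VF n maps to
 * bit (n % 32) of register (n / 32), mirroring the vf_shift /
 * reg_offset computation in the patch. */
static inline void vf_bit_clear(uint32_t *regs, unsigned int vfn)
{
    regs[vfn / 32] &= ~(1u << (vfn % 32));
}

static inline int vf_bit_is_set(const uint32_t *regs, unsigned int vfn)
{
    return (regs[vfn / 32] >> (vfn % 32)) & 1;
}
```

In the driver the read-modify-write goes through IXGBE_READ_REG/IXGBE_WRITE_REG
instead of a plain array, but the bit selection is the same.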





+static ssize_t ixgbe_store_state_in_pf(struct device *dev,
+   struct device_attribute *attr,
+   const char *buf, size_t count)
+{
+struct ixgbe_adapter *adapter = to_adapter(dev);
+struct pci_dev *pdev = adapter->pdev, *vdev;
+struct pci_dev *vf_pdev = to_pci_dev(dev);
+struct state_in_pf *state = (struct state_in_pf *)buf;
+int vfn = vf_pdev->virtfn_index;
+
+/* Check struct size */
+if (count != sizeof(struct state_in_pf)) {
+printk(KERN_ERR "State in PF size does not fit.\n");
+goto out;
+}
+
+/* Restore PCI configurations */
+vdev = ixgbe_get_virtfn_dev(pdev, vfn);
+if (vdev) {
+pci_write_config_word(vdev, IXGBE_PCI_VFCOMMAND,
+  state->command);
+pci_write_config_word(vdev, IXGBE_PCI_VFMSIXMC,
+  state->msix_message_control);
+}
+
+/* Restore states hold by PF */
+memcpy(&adapter->vfinfo[vfn], &state->vf_data,
+   sizeof(struct vf_data_storage));
+
+  out:
+return count;
+}


Just doing a memcpy to move the vfinfo over adds no value.  The fact is
there are a number of filters that have to be configured in hardware
after, and it isn't as simple as just migrating the values stored.


Restoring VF status in the PF is triggered by the VF driver via a new
mailbox msg, which calls ixgbe_restore_setting(). Here we just copy the
data into vfinfo. If we configured the hardware early, the state would be
cleared by the FLR that is triggered by the restore operation in the VF
driver.



 As I
mentioned in the case of the 82598 there is also jumbo frames to take
into account.  If the first PF didn't have it enabled, but the second
one does that implies the state of the VF needs to change to account for
that.


Yes, that will be a problem, and the VF driver also needs to know about
this change after migration and reconfigure jumbo frames.




I really think you would be better off only migrating the data related
to what can be configured using the ip link command and leaving other
values such as clear_to_send at the reset value of 0. Then you can at
least restore state from the VF after just a couple of quick messages.


This sounds good. I will try it later.




+static struct device_attribute ixgbe_per_state_in_pf_attribute =
+__ATTR(state_in_pf, S_IRUGO | S_IWUSR,
+ixgbe_show_state_in_pf, ixgbe_store_state_in_pf);
+
+void ixgbe_add_vf_attrib(struct ixgbe_adapter *adapter)
+{
+struct pci_dev *pdev = adapter->pdev;
+struct pci_dev *vfdev;
+unsigned short vf_id;
+int pos, ret;
+
+pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_SRIOV);
+if (!pos)
+return;
+
+/* get the device ID for the VF */
+pci_read_config_word(pdev, pos + PCI_SRIOV_VF_DID, &vf_id);
+
+vfdev = pci_get_device(pdev->vendor, vf_id, NULL);
+
+while (vfdev) {
+if (vfdev->is_virtfn) {
+ret = device_create_file(&vfdev->dev,
+ &ixgbe_per_state_in_pf_attribute);
+if (ret)
+pr_warn("Unable to add VF attribute for dev %s,\n",
+dev_name(&vfdev->dev));
+}
+
+vfdev = pci_get_device(pdev->vendor, vf_id, vfdev);
+}
+}


Driver specific sysfs is a no-go.  Otherwise we will end up with a
different implementation of this for every driver.  You will need to
find a way to make this generic in order to have a hope of getting this
to be acceptable.


Yes, Alex Williamson proposed to get/put data via a VFIO interface. That
will be more general. I will do more research on how to communicate
between the PF driver and the VFIO subsystem.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: sanitizing kvmtool

2015-10-25 Thread Dmitry Vyukov
On Thu, Oct 22, 2015 at 1:07 AM, Sasha Levin  wrote:
> On 10/19/2015 11:15 AM, Dmitry Vyukov wrote:
>> On Mon, Oct 19, 2015 at 5:08 PM, Sasha Levin  wrote:
>>> On 10/19/2015 10:47 AM, Dmitry Vyukov wrote:
>>>>> Right, the memory areas that are accessed both by the hypervisor
>>>>> and the guest should be treated as untrusted input, but the
>>>>> hypervisor is supposed to validate the input carefully before
>>>>> using it - so I'm not sure how data races would introduce
>>>>> anything new that we didn't catch during validation.
>>>>
>>>> One possibility would be: if result of a racy read is passed to
>>>> guest, that can leak arbitrary host data into guest. Does not sound
>>>> good. Also, without usage of proper atomic operations, it is
>>>> basically impossible to verify untrusted data, as it can be changing
>>>> under your feet. And storing data into a local variable does not
>>>> prevent the data from changing.
>>>
>>> What's missing here is that the guest doesn't directly read/write the
>>> memory: every time it accesses a memory that is shared with the host
>>> it will trigger an exit, which will stop the vcpu thread that made
>>> the access and kernel side kvm will pass the hypervisor the value the
>>> guest wrote (or the memory address it attempted to read). The
>>> value/address can't change under us in that scenario.
>>
>> But still: if result of a racy read is passed to guest, that can leak
>> arbitrary host data into guest.
>
> I see what you're saying. I need to think about it a bit, maybe we do
> need locking for each of the virtio devices we emulate.
>
>
> On an unrelated note, a few of the reports are pointing to 
> ioport__unregister():
>
> ==
> WARNING: ThreadSanitizer: data race (pid=109228)
>   Write of size 8 at 0x7d1cdf40 by main thread:
> #0 free tsan/rtl/tsan_interceptors.cc:570 (lkvm+0x00443376)
> #1 ioport__unregister ioport.c:138:2 (lkvm+0x004a9ff9)
> #2 pci__exit pci.c:247:2 (lkvm+0x004ac857)
> #3 init_list__exit util/init.c:59:8 (lkvm+0x004bca6e)
> #4 kvm_cmd_run_exit builtin-run.c:645:2 (lkvm+0x004a68a7)
> #5 kvm_cmd_run builtin-run.c:661 (lkvm+0x004a68a7)
> #6 handle_command kvm-cmd.c:84:8 (lkvm+0x004bc40c)
> #7 handle_kvm_command main.c:11:9 (lkvm+0x004ac0b4)
> #8 main main.c:18 (lkvm+0x004ac0b4)
>
>   Previous read of size 8 at 0x7d1cdf40 by thread T55:
> #0 rb_int_search_single util/rbtree-interval.c:14:17 (lkvm+0x004bf968)
> #1 ioport_search ioport.c:41:9 (lkvm+0x004aa05f)
> #2 kvm__emulate_io ioport.c:186 (lkvm+0x004aa05f)
> #3 kvm_cpu__emulate_io x86/include/kvm/kvm-cpu-arch.h:41:9 
> (lkvm+0x004aa718)
> #4 kvm_cpu__start kvm-cpu.c:126 (lkvm+0x004aa718)
> #5 kvm_cpu_thread builtin-run.c:174:6 (lkvm+0x004a6e3e)
>
>   Thread T55 'kvm-vcpu-2' (tid=109285, finished) created by main thread at:
> #0 pthread_create tsan/rtl/tsan_interceptors.cc:848 (lkvm+0x004478a3)
> #1 kvm_cmd_run_work builtin-run.c:633:7 (lkvm+0x004a683f)
> #2 kvm_cmd_run builtin-run.c:660 (lkvm+0x004a683f)
> #3 handle_command kvm-cmd.c:84:8 (lkvm+0x004bc40c)
> #4 handle_kvm_command main.c:11:9 (lkvm+0x004ac0b4)
> #5 main main.c:18 (lkvm+0x004ac0b4)
>
> SUMMARY: ThreadSanitizer: data race ioport.c:138:2 in ioport__unregister
> ==
>
> I think this is because we don't perform locking using pthread, but rather 
> pause
> the vm entirely - so the cpu threads it's pointing to aren't actually running 
> when
> we unregister ioports. Is there a way to annotate that for tsan?


I've looked at brlock and I think I understand it now. Reader threads
write to the eventfd to notify that they have stopped, the writer reads
from the eventfd, and tsan considers this write->read pair as
synchronization. I suspect that this report can be caused by the same
use-after-free on the cpu array; probably kvm__pause takes the fast path
when it should not.


Re: [RFC Patch 05/12] IXGBE: Add new sysfs interface of "notify_vf"

2015-10-25 Thread Lan, Tianyu



On 10/25/2015 2:03 PM, Alexander Duyck wrote:

On 10/24/2015 08:43 AM, Lan, Tianyu wrote:


On 10/22/2015 4:52 AM, Alexander Duyck wrote:

Also have you even considered the MSI-X configuration on the VF?  I
haven't seen anything anywhere that would have migrated the VF's MSI-X
configuration from BAR 3 on one system to the new system.


MSI-X migration is done by the hypervisor (Qemu).
The following link is my Qemu patch to do that:
http://marc.info/?l=kvm&m=144544706530484&w=2


I really don't like the idea of trying to migrate the MSI-X across from
host to host while it is still active.  I really think Qemu shouldn't be
moving this kind of data over in a migration.


Hi Alex:

VF MSI-X regs in the VM are faked by Qemu, and Qemu maps the host
vectors of the VF to the guest's vectors. The MSI-X data migrated are
for the faked regs rather than the ones on the host. After migration,
Qemu will remap guest vectors to host vectors on the new machine.
Moreover, the VM is stopped while the MSI-X data is migrated.




I think that having the VF do a suspend/resume is the best way to go.
Then it simplifies things as all you have to deal with is the dirty page
tracking for the Rx DMA and you should be able to do this without making
things too difficult.



Yes, that will be simple, and the main concern is service downtime. I
will test it later.



- Alex



Re: [RFC Patch 05/12] IXGBE: Add new sysfs interface of "notify_vf"

2015-10-25 Thread Alexander Duyck

On 10/24/2015 08:43 AM, Lan, Tianyu wrote:


On 10/22/2015 4:52 AM, Alexander Duyck wrote:

Also have you even considered the MSI-X configuration on the VF?  I
haven't seen anything anywhere that would have migrated the VF's MSI-X
configuration from BAR 3 on one system to the new system.


MSI-X migration is done by the hypervisor (Qemu).
The following link is my Qemu patch to do that:
http://marc.info/?l=kvm&m=144544706530484&w=2


I really don't like the idea of trying to migrate the MSI-X across from 
host to host while it is still active.  I really think Qemu shouldn't be 
moving this kind of data over in a migration.


I think that having the VF do a suspend/resume is the best way to go.  
Then it simplifies things as all you have to deal with is the dirty page 
tracking for the Rx DMA and you should be able to do this without making 
things too difficult.


- Alex


Re: [PATCH v2 0/3] target-i386: save/restore vcpu's TSC rate during migration

2015-10-25 Thread haozhong . zhang
On Fri, Oct 23, 2015 at 12:45:13PM -0200, Eduardo Habkost wrote:
> On Fri, Oct 23, 2015 at 10:27:27AM +0800, Haozhong Zhang wrote:
> > On Thu, Oct 22, 2015 at 04:45:21PM -0200, Eduardo Habkost wrote:
> > > On Tue, Oct 20, 2015 at 03:22:51PM +0800, Haozhong Zhang wrote:
> > > > This patchset enables QEMU to save/restore vcpu's TSC rate during the
> > > > migration. When cooperating with KVM which supports TSC scaling, guest
> > > > programs can observe a consistent guest TSC rate even though they are
> > > > migrated among machines with different host TSC rates.
> > > > 
> > > > A pair of cpu options 'save-tsc-freq' and 'load-tsc-freq' are added to
> > > > control the migration of vcpu's TSC rate.
> > > 
> > > The requirements and goals aren't clear to me. I see two possible use
> > > cases, here:
> > > 
> > > 1) Best effort to keep TSC frequency constant if possible (but not
> > >aborting migration if not possible). This would be an interesting
> > >default, but a bit unpredictable.
> > > 2) Strictly ensuring TSC frequency stays constant on migration (and
> > >aborting migration if not possible). This would be an useful feature,
> > >but can't be enabled by default unless both hosts have the same TSC
> > >frequency or support TSC scaling.
> > > 
> > > Which one(s) you are trying to implement?
> > >
> > 
> > The former. I agree that it's unpredictable if setting vcpu's TSC
> > frequency to the migrated value is enabled by default (but not in this
> > patchset). The cpu option 'load-tsc-freq' is introduced to allow users
> > to enable this behavior if they do know the underlying KVM and CPU
> > support TSC scaling. In this way, I think the behavior is predictable
> > as users do know what they are doing.
> 
> I'm confused. If load-tsc-freq doesn't abort when TSC scaling isn't
> available (use case #1), why isn't it enabled by default? On the other
> hand, if you expect the user to enable it only if the host supports TSC
> scaling, why doesn't it abort if TSC scaling isn't available?
>

Sorry for the confusion. For use case #1, load-tsc-freq is really not
needed and the migrated TSC frequency should be set if possible
(i.e. if TSC scaling is supported and KVM_SET_TSC_KHZ succeeds). If
setting the TSC frequency fails, the migration will not be aborted.
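That best-effort policy can be sketched as plain decision logic. This is a
hypothetical model, not QEMU code — `struct host_caps` and
`resolve_tsc_khz` are illustrative names; the real implementation would
query KVM_CAP_TSC_CONTROL and call the KVM_SET_TSC_KHZ ioctl:

```c
#include <stdbool.h>

/* Hypothetical model of use case #1: apply the incoming TSC rate on a
 * best-effort basis, never failing the migration. */
struct host_caps {
    bool tsc_scaling;     /* KVM_CAP_TSC_CONTROL available */
    int  tsc_khz;         /* host TSC rate in kHz */
};

/* Returns the TSC rate (kHz) the vCPU ends up running at. */
static int resolve_tsc_khz(const struct host_caps *host, int migrated_khz)
{
    if (migrated_khz > 0 &&
        (host->tsc_scaling || migrated_khz == host->tsc_khz))
        return migrated_khz;   /* KVM_SET_TSC_KHZ would be attempted here */
    return host->tsc_khz;      /* fall back silently: don't abort */
}
```

The key property of use case #1 is in the final line: when scaling is
unavailable and the rates differ, the guest simply keeps the host rate.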

> I mean, we can implement both use cases above this way:
> 
> 1) If the user didn't ask for anything explicitly:
>   * If the tsc-freq value is available in the migration stream, try to
> set it (but don't abort if it can't be set). (use case #1 above)
> * Rationale: it won't hurt to try to make the VM behave nicely if
>   possible, without blocking migration if TSC scaling isn't
>   available.
> 2) If the user asked for the TSC frequency to be enforced, set it and
>   abort if it couldn't be set (use case #2 above). This could apply to
>   both cases:
>   2.1) If tsc-freq is explicitly set in the command-line.
> * Rationale: if the user asked for a specific frequency, we
>   should do what was requested and not ignore errors silently.
>   2.2) If tsc-freq is available in the migration stream, and the
> user asked explicitly for it to be enforced.
> * Rationale: the user is telling us that the incoming tsc-freq
>   is important, so we shouldn't ignore it silently.
> * Open question: how should we name the new option?
>   "load-tsc-freq" would be misleading because it won't be just about
>   _loading_ tsc-freq (we would be loading it on use case #1, too),
>   but about making sure it is enforced. "strict-tsc-freq"?
>   "enforce-tsc-freq"?
> 
> We don't need to implement both #1 and #2 at the same time. But if you
> just want to implement #1 first, I don't see the need for the
> "load-tsc-freq" option.
> 
> On the migration source, we need another option or internal machine flag
> for #1. I am not sure it should be an user-visible option. If
> user-visible, I don't know how to name it. "save-tsc-freq" describes it
> correctly, but it doesn't make its purpose very clear. Any suggestions?
> It can also be implemented first as an internal machine class flag (set
> in pc >= 2.5 only), and possibly become a user-visible option later.
>

Because of the way I implemented 'save-tsc-freq' in patch 1, it's
exposed to users. I'm not familiar with the way to make a feature only
available for newer machine types. Could you provide some suggestions to
hide 'save-tsc-freq' from users?

For the name, if we make the option internal-only, could we still use
'save-tsc-freq', as it does mean saving the TSC frequency?

> > 
> > > In other words, what is the right behavior when KVM_SET_TSC_KHZ fails or
> > > KVM_CAP_TSC_CONTROL is not available? We can't answer that question if
> > > the requirements and goals are not clear.
> > >
> > 
> > If KVM_CAP_TSC_CONTROL is unavailable, QEMU and KVM will use the host
> > TSC frequency as vcpu's TSC frequency.
> > 
> > If KVM_CAP_TSC_CONTROL is available and KVM_SET_TSC_KHZ fails, 

Re: [PATCH v2 3/3] target-i386: load the migrated vcpu's TSC rate

2015-10-25 Thread haozhong . zhang
On Fri, Oct 23, 2015 at 12:58:02PM -0200, Eduardo Habkost wrote:
> On Fri, Oct 23, 2015 at 11:14:48AM +0800, Haozhong Zhang wrote:
> > On Thu, Oct 22, 2015 at 04:11:37PM -0200, Eduardo Habkost wrote:
> > > On Tue, Oct 20, 2015 at 03:22:54PM +0800, Haozhong Zhang wrote:
> > > > Set vcpu's TSC rate to the migrated value (if any). If KVM supports TSC
> > > > scaling, guest programs will observe TSC increasing in the migrated rate
> > > > other than the host TSC rate.
> > > > 
> > > > The loading is controlled by a new cpu option 'load-tsc-freq'. If it is
> > > > present, then the loading will be enabled and the migrated vcpu's TSC
> > > > rate will override the value specified by the cpu option
> > > > 'tsc-freq'. Otherwise, the loading will be disabled.
> > > 
> > > Why do we need an option? Why can't we enable loading unconditionally?
> > >
> > 
> > If TSC scaling is not supported by KVM and CPU, unconditionally
> > enabling this loading will not take effect which would be different
> > from users' expectation. 'load-tsc-freq' is introduced to allow users
> > to enable the loading of migrated TSC frequency if they do know the
> > underlying KVM and CPU have TSC scaling support.
> > 
>

Sorry for the confusion; I have changed my mind. The semantics of
'load-tsc-freq' are really confusing and we should not need it at all.

Now, what I want to implement is to migrate the TSC frequency whenever
possible. If it cannot be migrated, QEMU does not abort the migration.

> I don't get your argument about user expectations. We can't read the
> user's mind, but let's enumerate all possible scenarios:
>
> * Host has TSC scaling, user expect TSC frequency to be set:
>   * We set it. The user is happy.
Agree.

> * Host has TSC scaling, user doesn't expect TSC frequency to be
>   set:
>   * We still set it. VM behaves better, guest doesn't see changing TSC
> frequency. User didn't expect it but won't be unhappy.
Agree.

> * No TSC scaling, user expect TSC frequency to be set:
>   * We won't set it, user will be unhappy. But I believe we all agree
> we shouldn't make QEMU abort migration by default on all hosts that
> don't support TSC scaling.
Agree and display warning messages.

> * No TSC scaling, user doesn't expect TSC frequency to be set:
>   * We don't set it. User is happy.
Agree. This is the current QEMU's behavior, so it's still acceptable.

Thanks,
Haozhong

> 
> Could you clarify on which items you disagree above, exactly?
> 
> -- 
> Eduardo


[GIT PULL] Please pull my kvm-ppc-next branch

2015-10-25 Thread Paul Mackerras
Paolo,

Here is my current patch queue for KVM on PPC.  There's nothing much
in the way of new features this time; it's mostly bug fixes, plus
Nikunj has implemented support for KVM_CAP_NR_MEMSLOTS.  These are
intended for the "next" branch of the KVM tree.  Please pull.

Thanks,
Paul.

The following changes since commit 9ffecb10283508260936b96022d4ee43a7798b4c:

  Linux 4.3-rc3 (2015-09-27 07:50:08 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc.git kvm-ppc-next

for you to fetch changes up to 70aa3961a196ac32baf54032b2051bac9a941118:

  KVM: PPC: Book3S HV: Handle H_DOORBELL on the guest exit path (2015-10-21 
16:31:52 +1100)


Andrzej Hajda (1):
  KVM: PPC: e500: fix handling local_sid_lookup result

Gautham R. Shenoy (1):
  KVM: PPC: Book3S HV: Handle H_DOORBELL on the guest exit path

Mahesh Salgaonkar (1):
  KVM: PPC: Book3S HV: Deliver machine check with MSR(RI=0) to guest as MCE

Nikunj A Dadhania (1):
  KVM: PPC: Implement extension to report number of memslots

Paul Mackerras (2):
  KVM: PPC: Book3S HV: Don't fall back to smaller HPT size in allocation 
ioctl
  KVM: PPC: Book3S HV: Make H_REMOVE return correct HPTE value for absent 
HPTEs

Tudor Laurentiu (3):
  powerpc/e6500: add TMCFG0 register definition
  KVM: PPC: e500: Emulate TMCFG0 TMRN register
  KVM: PPC: e500: fix couple of shift operations on 64 bits

 arch/powerpc/include/asm/disassemble.h  |  5 +
 arch/powerpc/include/asm/reg_booke.h|  6 ++
 arch/powerpc/kvm/book3s_64_mmu_hv.c |  3 ++-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c |  2 ++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 29 ++---
 arch/powerpc/kvm/e500.c |  3 ++-
 arch/powerpc/kvm/e500_emulate.c | 19 +++
 arch/powerpc/kvm/e500_mmu_host.c|  4 ++--
 arch/powerpc/kvm/powerpc.c  |  3 +++
 9 files changed, 63 insertions(+), 11 deletions(-)




[kvm-unit-tests PATCH] KVM: VMX: add tests for MSR_IA32_DEBUGCTLMSR register

2015-10-25 Thread Jian Zhou
Test bit 0(LBR), bit 1(BTF) and bit 11(FREEZE_LBRS_ON_PMI)
of MSR_IA32_DEBUGCTLMSR register.
Bit 11 depends on bit 0, so I tested five combinations:
(1) bit 0
(2) bit 1
(3) bit 0 | bit 1
(4) bit 0 | bit 11
(5) bit 0 | bit 1 | bit 11

Pentium4, Atom and Skylake are not defined in Qemu.
I have tested core2duo, Nehalem, Westmere, SandyBridge,
IvyBridge, Haswell and Broadwell.
Because LBR support depends on the CPU model, '-cpu ...' needs to be
added to the command line of the x86-run bash script.

Signed-off-by: Jian Zhou 
---
 x86/msr.c | 88 ++-
 1 file changed, 87 insertions(+), 1 deletion(-)

diff --git a/x86/msr.c b/x86/msr.c
index ec4710e..8e118b2 100644
--- a/x86/msr.c
+++ b/x86/msr.c
@@ -58,8 +58,15 @@ struct msr_info msr_info[] =
 { .index = 0xc084, .name = "MSR_SYSCALL_MASK",
   .val_pairs = {{ .valid = 1, .value = 0x, .expected = 0x}}
 },
+// MSR_IA32_DEBUGCTLMSR needs feature LBRV
+{ .index = 0x01d9, .name = "MSR_IA32_DEBUGCTLMSR",
+  .val_pairs = {{ .valid = 1, .value = 0x1, .expected = 0x1},
+{ .valid = 1, .value = 0x2, .expected = 0x2},
+{ .valid = 1, .value = 0x3, .expected = 0x3},
+{ .valid = 1, .value = 0x801, .expected = 0x801},
+{ .valid = 1, .value = 0x803, .expected = 0x803}}
+},

-//MSR_IA32_DEBUGCTLMSR needs svm feature LBRV
 //MSR_VM_HSAVE_PA only AMD host
 };

@@ -77,14 +84,93 @@ static int find_msr_info(int msr_index)
static void test_msr_rw(int msr_index, unsigned long long input, unsigned long long expected)
 {
 unsigned long long r = 0;
+int lbr_supported = 0;
+u32 eax;
+u8 family;
+u8 model;
 int index;
 char *sptr;
+
 if ((index = find_msr_info(msr_index)) != -1) {
 sptr = msr_info[index].name;
 } else {
 printf("couldn't find name for msr # 0x%x, skipping\n", msr_index);
 return;
 }
+
+if (msr_index == MSR_IA32_DEBUGCTLMSR) {
+eax = raw_cpuid(0x1, 0).a;
+family = (eax >> 8) & 0xf;
+model = (eax >> 4) & 0xf;
+
+if (family == 15)
+family += (eax >> 20) & 0xff;
+if (family >= 6)
+model += ((eax >> 16) & 0xf) << 4;
+// test for Intel CPUs
+if (family == 6) {
+switch (model)
+{
+case 15: /* 65nm Core2 "Merom" */
+case 22: /* 65nm Core2 "Merom-L" */
+case 23: /* 45nm Core2 "Penryn" */
+case 29: /* 45nm Core2 "Dunnington (MP) */
+case 28: /* 45nm Atom "Pineview" */
+case 38: /* 45nm Atom "Lincroft" */
+case 39: /* 32nm Atom "Penwell" */
+case 53: /* 32nm Atom "Cloverview" */
+case 54: /* 32nm Atom "Cedarview" */
+case 55: /* 22nm Atom "Silvermont" */
+case 76: /* 14nm Atom "Airmont" */
+case 77: /* 22nm Atom "Silvermont Avoton/Rangely" */
+case 30: /* 45nm Nehalem */
+case 26: /* 45nm Nehalem-EP */
+case 46: /* 45nm Nehalem-EX */
+case 37: /* 32nm Westmere */
+case 44: /* 32nm Westmere-EP */
+case 47: /* 32nm Westmere-EX */
+case 42: /* 32nm SandyBridge */
+case 45: /* 32nm SandyBridge-E/EN/EP */
+case 58: /* 22nm IvyBridge */
+case 62: /* 22nm IvyBridge-EP/EX */
+case 60: /* 22nm Haswell Core */
+case 63: /* 22nm Haswell Server */
+case 69: /* 22nm Haswell ULT */
+case 70: /* 22nm Haswell + GT3e */
+case 61: /* 14nm Broadwell Core-M */
+case 86: /* 14nm Broadwell Xeon D */
+case 71: /* 14nm Broadwell + GT3e */
+case 79: /* 14nm Broadwell Server */
+case 78: /* 14nm Skylake Mobile */
+case 94: /* 14nm Skylake Desktop */
+lbr_supported = 1;
+break;
+}
+}
+if (family == 15) {
+switch (model)
+{
+/* Pentium4/Xeon(based on NetBurst) */
+case 3:
+case 4:
+case 6:
+/* don't support bit 11, skipping */
+if (input & (1 << 11))
+return;
+lbr_supported = 1;
+break;
+}
+}
+
+if (!lbr_supported) {
+printf("WARNING: LBR is not supported; add a specific CPU model "
+   "(-cpu ...) to enable it. Skipping %s test.\n", sptr);
+return;
+}
+}
+
 wrmsr(msr_index, input);
 r = rdmsr(msr_index);
 if 
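(The message is truncated in the archive here.) The family/model decode at
the top of the hunk follows the standard CPUID.01H:EAX layout: extended
family is added when the base family is 15, and the extended model is
prepended for family 6 and above. As a standalone sketch of just that
decode, matching the patch's arithmetic:

```c
#include <stdint.h>

/* Decode display family/model from CPUID.01H:EAX, using the same
 * arithmetic as the patch above. */
static void cpuid_family_model(uint32_t eax,
                               unsigned int *family, unsigned int *model)
{
    *family = (eax >> 8) & 0xf;
    *model  = (eax >> 4) & 0xf;
    if (*family == 0xf)
        *family += (eax >> 20) & 0xff;      /* extended family */
    if (*family >= 6)
        *model += ((eax >> 16) & 0xf) << 4; /* extended model */
}
```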

Re: [RFC Patch 00/12] IXGBE: Add live migration support for SRIOV NIC

2015-10-25 Thread Lan Tianyu
On 2015年10月24日 02:36, Alexander Duyck wrote:
> I was thinking about it and I am pretty sure the dummy write approach is
> problematic at best.  Specifically the issue is that while you are
> performing a dummy write you risk pulling in descriptors for data that
> hasn't been dummy written to yet.  So when you resume and restore your
> descriptors you will have once that may contain Rx descriptors
> indicating they contain data when after the migration they don't.

How about changing the sequence: dummy write the Rx packet data first
and then its descriptor? This can ensure that Rx data is migrated before
its descriptor and prevent such a case.
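The ordering idea above can be sketched as follows. This is purely
illustrative — the names and the descriptor layout are invented, not ixgbe
code — but it shows the invariant: the buffer pages are dirtied before the
descriptor, with a release fence between the two, so dirty-page tracking
never sees a completed descriptor whose data pages are not yet dirty:

```c
#include <stddef.h>
#include <stdint.h>

struct rx_desc {
    uint64_t addr;
    uint32_t status;     /* e.g. a "descriptor done" bit */
};

/* Re-write (dirty) the packet buffer first, then the descriptor.
 * Self-assignment through a volatile pointer forces real stores,
 * which is all a dummy write needs to mark the pages dirty. */
static void dummy_write_rx(volatile uint8_t *pkt, size_t len,
                           volatile struct rx_desc *desc)
{
    for (size_t i = 0; i < len; i += 64)     /* touch each cacheline */
        pkt[i] = pkt[i];
    __atomic_thread_fence(__ATOMIC_RELEASE); /* buffer before descriptor */
    desc->status = desc->status;             /* dirty the descriptor last */
}
```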

-- 
Best regards
Tianyu Lan


Re: [PATCH] KVM: PPC: Implement extension to report number of memslots

2015-10-25 Thread Paul Mackerras
On Fri, Oct 16, 2015 at 08:41:31AM +0200, Thomas Huth wrote:
> Yes, we'll likely need this soon! 32 slots are not enough...

Would anyone object if I raised the limit for PPC to 512 slots?
Would that cause problems on embedded PPC, for instance?

Paul.




[Bug 102301] Shutting down a Windowvs 10 virtual machine (with VGA passthrough) causes a hard crash, every time

2015-10-25 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=102301

Balázs László Batári  changed:

   What|Removed |Added

 CC||b...@bayi.hu

--- Comment #2 from Balázs László Batári  ---
I can - occasionally - reproduce it.

Exact same thing: QEMU-KVM with a Windows 10 guest and with GPU passed through.

My kernel version is 4.2.4
Qemu: 2.4.0.1
Libvirt: 1.2.20

To reproduce it:
 - First shutdown is normally fine
 - Second shutdown triggers it
 - Reboot is always fine

Oh, and it is not a "hard freeze": if I'm on TeamSpeak3, for example, the
others can still hear me and I can hear them, but keyboard/mouse/screen
are not working.
I couldn't copy out the dmesg, but I made screenshots:
 https://goo.gl/photos/fqK7Z5gom9FxwVfe9
 https://goo.gl/photos/CwtSX4MBktmEP7J57

The strange thing is that we have very similar motherboards to the
reporter's (Z87 Pro3 vs Z87 Pro4); maybe it's something with Asus.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


Re: sanitizing kvmtool

2015-10-25 Thread Paolo Bonzini


On 21/10/2015 19:07, Sasha Levin wrote:
> On 10/19/2015 11:15 AM, Dmitry Vyukov wrote:
>> But still: if result of a racy read is passed to guest, that can leak
>> arbitrary host data into guest.
> 
> I see what you're saying.

I don't... how can it leak arbitrary host data?  The memcpy cannot write
out of bounds.

> I need to think about it a bit, maybe we do need locking
> for each of the virtio devices we emulate.

No, it's unnecessary.  The guest is racing against itself.  Races like
this one do mean that the MSIX PBA and table are untrusted data, but as
long as you do not use the untrusted data to e.g. index an array it's fine.

Paolo


Re: sanitizing kvmtool

2015-10-25 Thread Sasha Levin
On 10/25/2015 11:19 AM, Paolo Bonzini wrote:
> 
> 
> On 21/10/2015 19:07, Sasha Levin wrote:
>> On 10/19/2015 11:15 AM, Dmitry Vyukov wrote:
>>> But still: if result of a racy read is passed to guest, that can leak
>>> arbitrary host data into guest.
>>
>> I see what you're saying.
> 
> I don't... how can it leak arbitrary host data?  The memcpy cannot write
> out of bounds.

The issue I had in mind (simplified) is:

vcpu1   vcpu2

guest writes idx
check if idx is valid
guest writes new idx
access (guest mem + idx)


So I'm not sure we cover both the locking and the potential compiler
tricks sufficiently to prevent that scenario.
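The standard fix for the check-then-use race sketched above is to snapshot
the shared index with a single read and then validate and use only the
snapshot, so a guest write between the check and the access can no longer
widen the access. A hedged sketch (illustrative names, not kvmtool code):

```c
#include <stddef.h>
#include <stdint.h>

/* Read the guest-controlled index exactly once; every later check and
 * access uses the local copy, never the shared location again. */
static int read_ring_entry(const uint32_t *shared_idx,
                           const int *ring, size_t ring_len, int *out)
{
    uint32_t idx = __atomic_load_n(shared_idx, __ATOMIC_ACQUIRE);
    if (idx >= ring_len)
        return -1;        /* reject out-of-range index */
    *out = ring[idx];     /* safe: idx cannot change under us */
    return 0;
}
```

The atomic load also stops the compiler from re-reading `*shared_idx`
after the bounds check, which a plain read would permit.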


Thanks,
Sasha