Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
On Mon, Oct 26, 2015 at 09:37:14PM -0600, Alex Williamson wrote:
> On Mon, 2015-10-26 at 18:20 -0700, Yunhong Jiang wrote:
> > An option to force the VFIO PCI MSI/MSI-X handler as a non-threaded IRQ,
> > even when CONFIG_IRQ_FORCED_THREADING=y. This is useful when assigning
> > a device to a guest with low latency requirements, since it reduces the
> > context switches to/from the IRQ thread.
>
> Is there any way we can do this automatically? Perhaps detecting that
> we're on a RT kernel or maybe that the user is running with RT priority?
> I find that module options are mostly misunderstood and misused.

Alex, thanks for the review.

It's not easy to detect whether the user is running with RT priority, since
sometimes the user starts the thread first and sets the scheduler priority
later.

Also, should we do this only for the in-kernel irqchip scenario and not for
a user-space handler, since the in-kernel irqchip has lower overhead?

> > An experiment was conducted on a HSW platform for 1 minute, with the
> > guest vCPU bound to an isolated pCPU. The assigned device triggered the
> > interrupt every 1ms. The average EXTERNAL_INTERRUPT exit handling time
> > dropped from 5.3us to 2.2us.
> >
> > Another choice is to change the VFIO_DEVICE_SET_IRQS ioctl, to apply
> > this option only to specific devices when the in-kernel irq_chip is
> > enabled. It provides more flexibility but is more complex; not sure if
> > we need to go that way.
>
> Allowing the user to decide whether or not to use a threaded IRQ seems
> like a privilege violation; a chance for the user to game the system and
> give themselves better latency, maybe at the cost of others. I think

Yes, you are right. One benefit of the ioctl change is to have a per-device
option, which is more flexible.

> we're better off trying to infer the privilege from the task priority or

I'd think the system admin may make the decision after some tuning; like you
said, it is "maybe at the cost of others", and I'm not sure if we should make
the decision based on task priority or kernel config.

Thanks
--jyh

> kernel config or, if we run out of options, make a module option as you
> have here requiring the system admin to provide the privilege. Thanks,
>
> Alex
>
> > Signed-off-by: Yunhong Jiang
> > ---
> >  drivers/vfio/pci/vfio_pci_intrs.c | 10 +-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> > index 1f577b4..ca1f95a 100644
> > --- a/drivers/vfio/pci/vfio_pci_intrs.c
> > +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> > @@ -22,9 +22,13 @@
> >  #include
> >  #include
> >  #include
> > +#include
> >
> >  #include "vfio_pci_private.h"
> >
> > +static bool nonthread_msi = 1;
> > +module_param(nonthread_msi, bool, 0444);
> > +
> >  /*
> >   * INTx
> >   */
> > @@ -313,6 +317,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
> >  	char *name = msix ? "vfio-msix" : "vfio-msi";
> >  	struct eventfd_ctx *trigger;
> >  	int ret;
> > +	unsigned long irqflags = 0;
> >
> >  	if (vector >= vdev->num_ctx)
> >  		return -EINVAL;
> > @@ -352,7 +357,10 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
> >  		pci_write_msi_msg(irq, );
> >  	}
> >
> > -	ret = request_irq(irq, vfio_msihandler, 0,
> > +	if (nonthread_msi)
> > +		irqflags = IRQF_NO_THREAD;
> > +
> > +	ret = request_irq(irq, vfio_msihandler, irqflags,
> >  			  vdev->ctx[vector].name, trigger);
> >  	if (ret) {
> >  		kfree(vdev->ctx[vector].name);

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
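As an aside on the "detect RT priority automatically" idea raised above: in
userspace terms, the check would amount to looking at the caller's scheduling
policy, as in the sketch below. This is only an illustrative model (the
function name and the policy test are mine, not from the patch), and it
exhibits exactly the race discussed in the thread: the answer is valid only
at the moment of the call, so an application that raises its priority after
setting up the IRQ would be misclassified.

```c
#include <assert.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical userspace model of the "infer from RT priority" idea:
 * treat the caller as latency-sensitive iff its scheduling policy is
 * SCHED_FIFO or SCHED_RR at the time the check runs. Racy by design,
 * as noted in the thread. */
bool caller_wants_nonthreaded_irq(void)
{
	int policy = sched_getscheduler(0);	/* 0 = calling process */

	return policy == SCHED_FIFO || policy == SCHED_RR;
}
```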
RE: [PATCH v3 2/3] KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr()
Hello!

> > --- cut ---
> > Additionally, remove unnecessary vgic_set_lr() and LR_STATE_PENDING check
> > in vgic_unqueue_irqs(), because all these things are now done by the
> > following vgic_retire_lr().
> > --- cut ---
>
> This does not explain the question I'm raising.
>
> After applying this patch, and before applying your next patch,
> unqueueing an IRQ will not restore the pending state on the
> distributor, but just throw that piece of state away

It will restore the state and not throw it away. I guess I'm just not being
clear enough, and you are misunderstanding me. This check in
vgic_unqueue_irqs() has been redundant from the beginning. Let's look at the
current vgic_retire_lr():

https://git.kernel.org/cgit/linux/kernel/git/kvmarm/kvmarm.git/tree/virt/kvm/arm/vgic.c?h=next#n1099

It already does the LR_STATE_PENDING check and pushback by itself, since
cff9211eb1a1f58ce7f5a2d596b617928fd4be0e (your commit, BTW), so this check:

https://git.kernel.org/cgit/linux/kernel/git/kvmarm/kvmarm.git/tree/virt/kvm/arm/vgic.c?h=next#n728

is already redundant. So this is actually a separate change, and perhaps it
was my mistake to squash it in.

> which breaks bisectability and makes it impossible to understand the logic
> by looking at this commit in isolation.

Would this be easier to understand if I made this particular refactoring a
separate commit, with a better explanation?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia
[PATCH v4 1/3] KVM: arm/arm64: Optimize away redundant LR tracking
Currently we use vgic_irq_lr_map in order to track which LRs hold which
IRQs, and the lr_used bitmap in order to track which LRs are used or free.

vgic_irq_lr_map is actually used only for the piggy-back optimization, and
can be easily replaced by iteration over lr_used. This is good because in
the future, when LPI support is introduced, the number of IRQs will grow to
at least 16384, while numbers from 1024 to 8192 are never going to be used.
This would be a huge memory waste.

In turn, lr_used is also completely redundant since
ae705930fca6322600690df9dc1c7d0516145a93 ("arm/arm64: KVM: Keep elrsr/aisr
in sync with software model"), because together with lr_used we also update
elrsr. This allows us to easily replace lr_used with elrsr, inverting all
conditions (because in elrsr '1' means 'free').

Signed-off-by: Pavel Fedin
---
 include/kvm/arm_vgic.h |  6 --
 virt/kvm/arm/vgic-v2.c |  1 +
 virt/kvm/arm/vgic-v3.c |  1 +
 virt/kvm/arm/vgic.c    | 53 ++
 4 files changed, 17 insertions(+), 44 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 8065801..3936bf8 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -295,9 +295,6 @@ struct vgic_v3_cpu_if {
 };

 struct vgic_cpu {
-	/* per IRQ to LR mapping */
-	u8		*vgic_irq_lr_map;
-
 	/* Pending/active/both interrupts on this VCPU */
 	DECLARE_BITMAP(pending_percpu, VGIC_NR_PRIVATE_IRQS);
 	DECLARE_BITMAP(active_percpu, VGIC_NR_PRIVATE_IRQS);
@@ -308,9 +305,6 @@ struct vgic_cpu {
 	unsigned long	*active_shared;
 	unsigned long	*pend_act_shared;

-	/* Bitmap of used/free list registers */
-	DECLARE_BITMAP(lr_used, VGIC_V2_MAX_LRS);
-
 	/* Number of list registers on this CPU */
 	int		nr_lr;

diff --git a/virt/kvm/arm/vgic-v2.c b/virt/kvm/arm/vgic-v2.c
index 8d7b04d..c0f5d7f 100644
--- a/virt/kvm/arm/vgic-v2.c
+++ b/virt/kvm/arm/vgic-v2.c
@@ -158,6 +158,7 @@ static void vgic_v2_enable(struct kvm_vcpu *vcpu)
 	 * anyway.
 	 */
 	vcpu->arch.vgic_cpu.vgic_v2.vgic_vmcr = 0;
+	vcpu->arch.vgic_cpu.vgic_v2.vgic_elrsr = ~0;

 	/* Get the show on the road... */
 	vcpu->arch.vgic_cpu.vgic_v2.vgic_hcr = GICH_HCR_EN;
diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
index 7dd5d62..92003cb 100644
--- a/virt/kvm/arm/vgic-v3.c
+++ b/virt/kvm/arm/vgic-v3.c
@@ -193,6 +193,7 @@ static void vgic_v3_enable(struct kvm_vcpu *vcpu)
 	 * anyway.
 	 */
 	vgic_v3->vgic_vmcr = 0;
+	vgic_v3->vgic_elrsr = ~0;

 	/*
 	 * If we are emulating a GICv3, we do it in an non-GICv2-compatible
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index d4669eb..265a410 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -108,6 +108,7 @@ static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
 static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
 static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
 static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr lr_desc);
+static u64 vgic_get_elrsr(struct kvm_vcpu *vcpu);
 static struct irq_phys_map *vgic_irq_map_search(struct kvm_vcpu *vcpu,
 						int virt_irq);
 static int compute_pending_for_cpu(struct kvm_vcpu *vcpu);
@@ -691,9 +692,11 @@ bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio *mmio,
 void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
 {
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+	u64 elrsr = vgic_get_elrsr(vcpu);
+	unsigned long *elrsr_ptr = u64_to_bitmask(&elrsr);
 	int i;

-	for_each_set_bit(i, vgic_cpu->lr_used, vgic_cpu->nr_lr) {
+	for_each_clear_bit(i, elrsr_ptr, vgic_cpu->nr_lr) {
 		struct vgic_lr lr = vgic_get_lr(vcpu, i);

 		/*
@@ -1098,7 +1101,6 @@ static inline void vgic_enable(struct kvm_vcpu *vcpu)

 static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu)
 {
-	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 	struct vgic_lr vlr = vgic_get_lr(vcpu, lr_nr);

 	/*
@@ -1112,8 +1114,6 @@ static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu)
 	vlr.state = 0;
 	vgic_set_lr(vcpu, lr_nr, vlr);
-	clear_bit(lr_nr, vgic_cpu->lr_used);
-	vgic_cpu->vgic_irq_lr_map[irq] = LR_EMPTY;
 	vgic_sync_lr_elrsr(vcpu, lr_nr, vlr);
 }

@@ -1128,10 +1128,11 @@
  */
 static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu)
 {
-	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+	u64 elrsr = vgic_get_elrsr(vcpu);
+	unsigned long *elrsr_ptr = u64_to_bitmask(&elrsr);
 	int lr;

-	for_each_set_bit(lr, vgic_cpu->lr_used, vgic->nr_lr) {
+	for_each_clear_bit(lr, elrsr_ptr, vgic->nr_lr) {
 		struct vgic_lr vlr = vgic_get_lr(vcpu, lr);
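The invariant this patch relies on can be modeled in plain C: an LR is in
use exactly when its elrsr bit is clear ('1' in elrsr means 'free'), so
iterating over the clear bits fully replaces the lr_used bitmap. The helper
below is an illustrative model, not kernel code; the name is mine.

```c
#include <assert.h>
#include <stdint.h>

/* Model: walk the in-use LRs by iterating over the *zero* bits of elrsr,
 * mirroring what for_each_clear_bit() does in the patch. */
static int count_used_lrs(uint64_t elrsr, int nr_lr)
{
	int lr, used = 0;

	for (lr = 0; lr < nr_lr; lr++)
		if (!(elrsr & (1ULL << lr)))	/* bit clear => LR in use */
			used++;
	return used;
}
```

With elrsr initialized to ~0 (as vgic_v2_enable()/vgic_v3_enable() now do),
all LRs start out free, which is why no separate lr_used initialization is
needed.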
[PATCH v4 2/3] KVM: arm/arm64: Clean up vgic_retire_lr() and surroundings
1. Remove the unnecessary 'irq' argument, because the irq number can be
   retrieved from the LR.

2. Since cff9211eb1a1f58ce7f5a2d596b617928fd4be0e ("arm/arm64: KVM: Fix
   arch timer behavior for disabled interrupts"), LR_STATE_PENDING is
   queued back by vgic_retire_lr() itself. Also, it clears vlr.state
   itself. Therefore, we remove the same, now duplicated, check with all
   accompanying bit manipulations from vgic_unqueue_irqs().

3. vgic_retire_lr() is always accompanied by vgic_irq_clear_queued().
   Since it already does more than just clearing the LR, move
   vgic_irq_clear_queued() inside of it.

Signed-off-by: Pavel Fedin
---
 virt/kvm/arm/vgic.c | 37 ++---
 1 file changed, 10 insertions(+), 27 deletions(-)

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 265a410..96e45f3 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -105,7 +105,7 @@
 #include "vgic.h"

 static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
-static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
+static void vgic_retire_lr(int lr_nr, struct kvm_vcpu *vcpu);
 static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
 static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr lr_desc);
 static u64 vgic_get_elrsr(struct kvm_vcpu *vcpu);
@@ -717,30 +717,14 @@ void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
 		 * interrupt then move the active state to the
 		 * distributor tracking bit.
 		 */
-		if (lr.state & LR_STATE_ACTIVE) {
+		if (lr.state & LR_STATE_ACTIVE)
 			vgic_irq_set_active(vcpu, lr.irq);
-			lr.state &= ~LR_STATE_ACTIVE;
-		}

 		/*
 		 * Reestablish the pending state on the distributor and the
-		 * CPU interface.  It may have already been pending, but that
-		 * is fine, then we are only setting a few bits that were
-		 * already set.
+		 * CPU interface and mark the LR as free for other use.
 		 */
-		if (lr.state & LR_STATE_PENDING) {
-			vgic_dist_irq_set_pending(vcpu, lr.irq);
-			lr.state &= ~LR_STATE_PENDING;
-		}
-
-		vgic_set_lr(vcpu, i, lr);
-
-		/*
-		 * Mark the LR as free for other use.
-		 */
-		BUG_ON(lr.state & LR_STATE_MASK);
-		vgic_retire_lr(i, lr.irq, vcpu);
-		vgic_irq_clear_queued(vcpu, lr.irq);
+		vgic_retire_lr(i, vcpu);

 		/* Finally update the VGIC state. */
 		vgic_update_state(vcpu->kvm);
@@ -1099,16 +1083,18 @@ static inline void vgic_enable(struct kvm_vcpu *vcpu)
 	vgic_ops->enable(vcpu);
 }

-static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu)
+static void vgic_retire_lr(int lr_nr, struct kvm_vcpu *vcpu)
 {
 	struct vgic_lr vlr = vgic_get_lr(vcpu, lr_nr);

+	vgic_irq_clear_queued(vcpu, vlr.irq);
+
 	/*
 	 * We must transfer the pending state back to the distributor before
 	 * retiring the LR, otherwise we may loose edge-triggered interrupts.
 	 */
 	if (vlr.state & LR_STATE_PENDING) {
-		vgic_dist_irq_set_pending(vcpu, irq);
+		vgic_dist_irq_set_pending(vcpu, vlr.irq);
 		vlr.hwirq = 0;
 	}

@@ -1135,11 +1121,8 @@ static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu)
 	for_each_clear_bit(lr, elrsr_ptr, vgic->nr_lr) {
 		struct vgic_lr vlr = vgic_get_lr(vcpu, lr);

-		if (!vgic_irq_is_enabled(vcpu, vlr.irq)) {
-			vgic_retire_lr(lr, vlr.irq, vcpu);
-			if (vgic_irq_is_queued(vcpu, vlr.irq))
-				vgic_irq_clear_queued(vcpu, vlr.irq);
-		}
+		if (!vgic_irq_is_enabled(vcpu, vlr.irq))
+			vgic_retire_lr(lr, vcpu);
 	}
 }
-- 
2.4.4
[PATCH v4 3/3] KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr()
Now we see that vgic_set_lr() and vgic_sync_lr_elrsr() are always used
together. Merge them into one function, saving us from a second vgic_ops
dereference every time.

Signed-off-by: Pavel Fedin
---
 include/kvm/arm_vgic.h |  1 -
 virt/kvm/arm/vgic-v2.c |  5 -
 virt/kvm/arm/vgic-v3.c |  5 -
 virt/kvm/arm/vgic.c    | 14 ++
 4 files changed, 2 insertions(+), 23 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 3936bf8..f62addc 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -112,7 +112,6 @@ struct vgic_vmcr {
 struct vgic_ops {
 	struct vgic_lr	(*get_lr)(const struct kvm_vcpu *, int);
 	void	(*set_lr)(struct kvm_vcpu *, int, struct vgic_lr);
-	void	(*sync_lr_elrsr)(struct kvm_vcpu *, int, struct vgic_lr);
 	u64	(*get_elrsr)(const struct kvm_vcpu *vcpu);
 	u64	(*get_eisr)(const struct kvm_vcpu *vcpu);
 	void	(*clear_eisr)(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/arm/vgic-v2.c b/virt/kvm/arm/vgic-v2.c
index c0f5d7f..ff02f08 100644
--- a/virt/kvm/arm/vgic-v2.c
+++ b/virt/kvm/arm/vgic-v2.c
@@ -79,11 +79,7 @@ static void vgic_v2_set_lr(struct kvm_vcpu *vcpu, int lr,
 		lr_val |= (lr_desc.source << GICH_LR_PHYSID_CPUID_SHIFT);

 	vcpu->arch.vgic_cpu.vgic_v2.vgic_lr[lr] = lr_val;
-}
-
-static void vgic_v2_sync_lr_elrsr(struct kvm_vcpu *vcpu, int lr,
-				  struct vgic_lr lr_desc)
-{
 	if (!(lr_desc.state & LR_STATE_MASK))
 		vcpu->arch.vgic_cpu.vgic_v2.vgic_elrsr |= (1ULL << lr);
 	else
@@ -167,7 +163,6 @@ static void vgic_v2_enable(struct kvm_vcpu *vcpu)
 static const struct vgic_ops vgic_v2_ops = {
 	.get_lr			= vgic_v2_get_lr,
 	.set_lr			= vgic_v2_set_lr,
-	.sync_lr_elrsr		= vgic_v2_sync_lr_elrsr,
 	.get_elrsr		= vgic_v2_get_elrsr,
 	.get_eisr		= vgic_v2_get_eisr,
 	.clear_eisr		= vgic_v2_clear_eisr,
diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
index 92003cb..487d635 100644
--- a/virt/kvm/arm/vgic-v3.c
+++ b/virt/kvm/arm/vgic-v3.c
@@ -112,11 +112,7 @@ static void vgic_v3_set_lr(struct kvm_vcpu *vcpu, int lr,
 	}

 	vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[LR_INDEX(lr)] = lr_val;
-}
-
-static void vgic_v3_sync_lr_elrsr(struct kvm_vcpu *vcpu, int lr,
-				  struct vgic_lr lr_desc)
-{
 	if (!(lr_desc.state & LR_STATE_MASK))
 		vcpu->arch.vgic_cpu.vgic_v3.vgic_elrsr |= (1U << lr);
 	else
@@ -212,7 +208,6 @@ static void vgic_v3_enable(struct kvm_vcpu *vcpu)
 static const struct vgic_ops vgic_v3_ops = {
 	.get_lr			= vgic_v3_get_lr,
 	.set_lr			= vgic_v3_set_lr,
-	.sync_lr_elrsr		= vgic_v3_sync_lr_elrsr,
 	.get_elrsr		= vgic_v3_get_elrsr,
 	.get_eisr		= vgic_v3_get_eisr,
 	.clear_eisr		= vgic_v3_clear_eisr,
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 96e45f3..fe451d4 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1032,12 +1032,6 @@ static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr,
 	vgic_ops->set_lr(vcpu, lr, vlr);
 }

-static void vgic_sync_lr_elrsr(struct kvm_vcpu *vcpu, int lr,
-			       struct vgic_lr vlr)
-{
-	vgic_ops->sync_lr_elrsr(vcpu, lr, vlr);
-}
-
 static inline u64 vgic_get_elrsr(struct kvm_vcpu *vcpu)
 {
 	return vgic_ops->get_elrsr(vcpu);
@@ -1100,7 +1094,6 @@ static void vgic_retire_lr(int lr_nr, struct kvm_vcpu *vcpu)
 	vlr.state = 0;
 	vgic_set_lr(vcpu, lr_nr, vlr);
-	vgic_sync_lr_elrsr(vcpu, lr_nr, vlr);
 }

 /*
@@ -1162,7 +1155,6 @@ static void vgic_queue_irq_to_lr(struct kvm_vcpu *vcpu, int irq,
 	}

 	vgic_set_lr(vcpu, lr_nr, vlr);
-	vgic_sync_lr_elrsr(vcpu, lr_nr, vlr);
 }

 /*
@@ -1340,8 +1332,6 @@ static int process_queued_irq(struct kvm_vcpu *vcpu,
 	vlr.hwirq = 0;
 	vgic_set_lr(vcpu, lr, vlr);
-	vgic_sync_lr_elrsr(vcpu, lr, vlr);
-
 	return pending;
 }

@@ -1442,8 +1432,6 @@ static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
 	bool level_pending;

 	level_pending = vgic_process_maintenance(vcpu);
-	elrsr = vgic_get_elrsr(vcpu);
-	elrsr_ptr = u64_to_bitmask(&elrsr);

 	/* Deal with HW interrupts, and clear mappings for empty LRs */
 	for (lr = 0; lr < vgic->nr_lr; lr++) {
@@ -1454,6 +1442,8 @@ static void __kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
 	}

 	/* Check if we still have something up our sleeve... */
+	elrsr = vgic_get_elrsr(vcpu);
+	elrsr_ptr = u64_to_bitmask(&elrsr);
 	pending = find_first_zero_bit(elrsr_ptr, vgic->nr_lr);
 	if (level_pending || pending < vgic->nr_lr)
 		set_bit(vcpu->vcpu_id, dist->irq_pending_on_cpu);
-- 
2.4.4
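The effect of this merge can be modeled in a few lines of C: every LR write
now recomputes the corresponding elrsr bit from the LR state in the same
call, so there is no separate sync step left to forget. The struct, names,
and state encoding below are illustrative, not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

#define LR_STATE_MASK 0x3	/* illustrative pending|active encoding */

struct model_vgic {
	uint32_t lr[64];	/* toy list registers */
	uint64_t elrsr;		/* '1' = LR free */
};

/* Model of the merged vgic_v2_set_lr(): write the LR *and* update its
 * elrsr bit atomically from the caller's point of view, so the two can
 * never drift apart. */
static void model_set_lr(struct model_vgic *v, int lr, uint32_t val)
{
	v->lr[lr] = val;
	if (!(val & LR_STATE_MASK))
		v->elrsr |= (1ULL << lr);	/* no state => LR free */
	else
		v->elrsr &= ~(1ULL << lr);	/* has state => LR in use */
}
```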
[PATCH v4 0/3] KVM: arm/arm64: Clean up some obsolete code
Current KVM code has lots of old redundancies, which can be cleaned up.

This patchset is actually a better alternative to
http://www.spinics.net/lists/arm-kernel/msg430726.html, which allows keeping
piggy-backed LRs. The idea is based on the fact that our code also maintains
LR state in elrsr, and this information is enough to track LR usage.

In case of problems this series can be applied partially; each patch is a
complete refactoring step on its own.

Thanks to Andre Przywara for pinpointing some 4.3+ specifics.

This version has been tested on an SMDK5410 development board (Exynos 5410
SoC).

v3 => v4:
- Reordered changes for the purpose of better understanding and bisection.
  All changes related to vgic_retire_lr() are gathered in one patch now.

v2 => v3:
- Removed two unused variables in __kvm_vgic_flush_hwstate(), an overlooked
  leftover from v1.

v1 => v2:
- Rebased to kvmarm/next of 23.10.2015.
- Do not use vgic_retire_lr() for initializing the ELRSR bitmask, because
  now it also handles pushback of the PENDING state; use direct
  initialization instead (copied from Andre's patchset).
- Took more care about vgic_retire_lr(), which has deserved its own patch.

Pavel Fedin (3):
  KVM: arm/arm64: Optimize away redundant LR tracking
  KVM: arm/arm64: Clean up vgic_retire_lr() and surroundings
  KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr()

 include/kvm/arm_vgic.h |   7
 virt/kvm/arm/vgic-v2.c |   6 +--
 virt/kvm/arm/vgic-v3.c |   6 +--
 virt/kvm/arm/vgic.c    | 104 +
 4 files changed, 29 insertions(+), 94 deletions(-)

-- 
2.4.4
[PATCH] KVM: arm: Fix crash in free_hyp_pgds() if timer initialization fails
After vGIC initialization succeeded and timer initialization failed, the
following crash can be observed on ARM32:

kvm [1]: interrupt-controller@10484000 IRQ57
kvm [1]: kvm_arch_timer: can't find DT node
Unable to handle kernel paging request at virtual address 90484000
pgd = c0003000
[90484000] *pgd=8040006003, *pmd=
Internal error: Oops: 2a06 [#1] PREEMPT SMP ARM
...
[] (v7_flush_kern_dcache_area) from [] (kvm_flush_dcache_pte+0x48/0x5c)
[] (kvm_flush_dcache_pte) from [] (unmap_range+0x24c/0x460)
[] (unmap_range) from [] (free_hyp_pgds+0x84/0x160)
[] (free_hyp_pgds) from [] (kvm_arch_init+0x254/0x41c)
[] (kvm_arch_init) from [] (kvm_init+0x28/0x2b4)
[] (kvm_init) from [] (do_one_initcall+0x9c/0x200)

This happens when unmapping reaches the mapped vGIC control registers. The
root of the problem seems to be a combination of two facts:

1. The vGIC control region is defined in device trees as having a size of
   0x2000. But the specification defines registers only up to 0x1FC, so it
   is only one page, not two.

2. unmap_ptes() is expected to recognize device memory and omit the cache
   flush. However, it tests only for PAGE_S2_DEVICE, while devices mapped
   for HYP mode have PAGE_HYP_DEVICE, which is different. Therefore, a
   cache flush is attempted, and it dies when hitting the nonexistent
   second page.

This patch fixes the problem by adding the missing recognition of the
PAGE_HYP_DEVICE protection value.

The crash can be observed on Exynos 5410 (and probably on the whole Exynos5
family) with stock device trees (using MCT) and CONFIG_KVM enabled.

Signed-off-by: Pavel Fedin
---
 arch/arm/kvm/mmu.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 7b42012..839dd970 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -213,7 +213,10 @@ static void unmap_ptes(struct kvm *kvm, pmd_t *pmd,
 			kvm_tlb_flush_vmid_ipa(kvm, addr);

 			/* No need to invalidate the cache for device mappings */
-			if ((pte_val(old_pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
+			if (((pte_val(old_pte) & PAGE_S2_DEVICE)
+			     != PAGE_S2_DEVICE) &&
+			    ((pte_val(old_pte) & PAGE_HYP_DEVICE)
+			     != PAGE_HYP_DEVICE))
 				kvm_flush_dcache_pte(old_pte);

 			put_page(virt_to_page(pte));
-- 
2.4.4
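The fixed condition in unmap_ptes() can be modeled in userspace C. Note that
the flag values below are made up for illustration only; the real
PAGE_S2_DEVICE and PAGE_HYP_DEVICE are arch-specific pte protection
combinations, not single bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the real protection values. */
#define FAKE_PAGE_S2_DEVICE  0x05
#define FAKE_PAGE_HYP_DEVICE 0x0a

/* Model of the fixed check: flush the dcache only when the pte carries
 * neither device protection value (all bits of the value must match,
 * hence the mask-and-compare rather than a single-bit test). */
static bool needs_dcache_flush(uint64_t pte)
{
	return ((pte & FAKE_PAGE_S2_DEVICE) != FAKE_PAGE_S2_DEVICE) &&
	       ((pte & FAKE_PAGE_HYP_DEVICE) != FAKE_PAGE_HYP_DEVICE);
}
```

Before the fix, only the first clause existed, so a HYP-mode device mapping
fell through to the flush and faulted on the unbacked second page.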
Re: [ANNOUNCE][RFC] KVMGT - the implementation of Intel GVT-g(full GPU virtualization) for KVM
Hi all,

We are pleased to announce another update of Intel GVT-g for KVM.

Intel GVT-g is a full GPU virtualization solution with mediated
pass-through, starting from 4th generation Intel Core(TM) processors with
Intel Graphics processors. A virtual GPU instance is maintained for each
VM, with part of the performance-critical resources directly assigned. The
capability of running a native graphics driver inside a VM, without
hypervisor intervention in performance-critical paths, achieves a good
balance among performance, features, and sharing capability. KVM is
supported by Intel GVT-g (a.k.a. KVMGT).

Repositories:
  Kernel: https://github.com/01org/igvtg-kernel (2015q3-3.18.0 branch)
  Qemu:   https://github.com/01org/igvtg-qemu (kvmgt_public2015q3 branch)

This update consists of:
- KVMGT is now merged with XenGT in unified repositories (kernel and qemu),
  but currently in different branches for qemu. KVMGT and XenGT share the
  same iGVT-g core logic.
- PPGTT is supported, hence the Windows guest support.
- KVMGT now supports both 4th generation (Haswell) and 5th generation
  (Broadwell) Intel Core(TM) processors.
- 2D/3D/media decoding have been validated on Ubuntu 14.04 and
  Windows 7/Windows 8.1.

The next update will be around early January 2016.

Known issues:
- At least 2GB of memory is suggested for a VM to run most 3D workloads.
- 3DMark06 running in a Windows VM may have some stability issues.
- Using VLC to play an .ogg file may cause mosaic or slow response.

Please subscribe to the mailing list to report bugs, discuss, and/or
contribute:
https://lists.01.org/mailman/listinfo/igvt-g

More information about the Intel GVT-g background, architecture, etc. can
be found at (may not be up-to-date):
https://01.org/igvt-g
http://www.linux-kvm.org/images/f/f3/01x08b-KVMGT-a.pdf
https://www.usenix.org/conference/atc14/technical-sessions/presentation/tian

Note: The KVMGT project should be considered a work in progress. As such it
is not a complete product, nor should it be considered one. Extra care
should be taken when testing and configuring a system to use the KVMGT
project.

--
Thanks,
Jike

On 12/04/2014 10:24 AM, Jike Song wrote:

Hi all,

We are pleased to announce the first release of the KVMGT project. KVMGT is
the implementation of Intel GVT-g technology, a full GPU virtualization
solution. Under Intel GVT-g, a virtual GPU instance is maintained for each
VM, with part of the performance-critical resources directly assigned. The
capability of running a native graphics driver inside a VM, without
hypervisor intervention in performance-critical paths, achieves a good
balance of performance, features, and sharing capability.

KVMGT is still in an early stage:

- Basic functions of full GPU virtualization work; the guest can see a
  full-featured vGPU. We ran several 3D workloads such as lightsmark,
  nexuiz, urbanterror and warsow.
- Only Linux guests are supported so far, and PPGTT must be disabled in the
  guest through a kernel parameter (see README.kvmgt in QEMU).
- This drop also includes some Xen-specific changes, which will be cleaned
  up later.
- Our end goal is to upstream both XenGT and KVMGT, which share ~90% of the
  logic for the vGPU device model (it will be part of the i915 driver),
  with the only difference being hypervisor-specific services.
- Insufficient test coverage, so please bear with stability issues :)

There are things that need to be improved, esp. the KVM interfacing part:

1. A domid was added to each KVMGT guest.
   An ID is needed for foreground OS switching, e.g.

     # echo > /sys/kernel/vgt/control/foreground_vm

   domid 0 is reserved for the host OS.

2. SRCU workarounds.
   Some KVM functions, such as:

     kvm_io_bus_register_dev
     install_new_memslots

   must be called *without* &kvm->srcu read-locked. Otherwise it hangs.

   In KVMGT, we need to register an iodev only *after* BAR registers are
   written by the guest. That means we already hold &kvm->srcu, because
   trapping/emulating PIO (BAR registers) puts us in such a condition.
   That will make kvm_io_bus_register_dev hang.

   Currently we have to disable rcu_assign_pointer() in such functions.
   These are dirty workarounds; your suggestions are highly welcome!

3. Syscalls were called to access "/dev/mem" from the kernel.
   An in-kernel memslot was added for the aperture, but we use syscalls
   like open and mmap to open and access the character device "/dev/mem"
   for pass-through.

The source code (kernel, qemu as well as seabios) is available at github:

git://github.com/01org/KVMGT-kernel
git://github.com/01org/KVMGT-qemu
git://github.com/01org/KVMGT-seabios
[PATCH] vhost: fix performance on LE hosts
commit 2751c9882b947292fcfb084c4f604e01724af804 ("vhost: cross-endian
support for legacy devices") introduced a minor regression: even with
cross-endian disabled, and even on an LE host, vhost_is_little_endian is
checking the is_le flag, so there's always a branch.

To fix, simply check virtio_legacy_is_little_endian first.

Cc: Greg Kurz
Signed-off-by: Michael S. Tsirkin
---
 drivers/vhost/vhost.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 4772862..d3f7674 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -183,10 +183,17 @@ static inline bool vhost_has_feature(struct vhost_virtqueue *vq, int bit)
 	return vq->acked_features & (1ULL << bit);
 }

+#ifdef CONFIG_VHOST_CROSS_ENDIAN_LEGACY
 static inline bool vhost_is_little_endian(struct vhost_virtqueue *vq)
 {
 	return vq->is_le;
 }
+#else
+static inline bool vhost_is_little_endian(struct vhost_virtqueue *vq)
+{
+	return virtio_legacy_is_little_endian() || vq->is_le;
+}
+#endif

 /* Memory accessors */
 static inline u16 vhost16_to_cpu(struct vhost_virtqueue *vq, __virtio16 val)
-- 
MST
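The point of the fix is that virtio_legacy_is_little_endian() is a
compile-time constant, so in a little-endian build without
CONFIG_VHOST_CROSS_ENDIAN_LEGACY the `||` short-circuits at compile time
and the load of is_le disappears entirely. A userspace model of that shape
(all names here are illustrative, not the vhost API):

```c
#include <assert.h>
#include <stdbool.h>

/* Constant-folded on any given build: true on LE targets, false on BE. */
static inline bool legacy_is_little_endian(void)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	return true;
#else
	return false;
#endif
}

struct model_vq {
	bool is_le;	/* stands in for vq->is_le */
};

/* Model of the fixed helper: on an LE build the compiler can reduce this
 * to 'return true' and never touch vq->is_le. */
static bool model_vhost_is_le(const struct model_vq *vq)
{
	return legacy_is_little_endian() || vq->is_le;
}
```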
Re: [PATCH] vhost: fix performance on LE hosts
On Tue, 27 Oct 2015 11:37:39 +0200
"Michael S. Tsirkin" wrote:

> commit 2751c9882b947292fcfb084c4f604e01724af804 ("vhost: cross-endian
> support for legacy devices") introduced a minor regression: even with
> cross-endian disabled, and even on LE host, vhost_is_little_endian is
> checking is_le flag so there's always a branch.
>
> To fix, simply check virtio_legacy_is_little_endian first.
>
> Cc: Greg Kurz
> Signed-off-by: Michael S. Tsirkin

Oops, my bad for this oversight...

Reviewed-by: Greg Kurz

> ---
>  drivers/vhost/vhost.h | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 4772862..d3f7674 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -183,10 +183,17 @@ static inline bool vhost_has_feature(struct vhost_virtqueue *vq, int bit)
>  	return vq->acked_features & (1ULL << bit);
>  }
>
> +#ifdef CONFIG_VHOST_CROSS_ENDIAN_LEGACY
>  static inline bool vhost_is_little_endian(struct vhost_virtqueue *vq)
>  {
>  	return vq->is_le;
>  }
> +#else
> +static inline bool vhost_is_little_endian(struct vhost_virtqueue *vq)
> +{
> +	return virtio_legacy_is_little_endian() || vq->is_le;
> +}
> +#endif
>
>  /* Memory accessors */
>  static inline u16 vhost16_to_cpu(struct vhost_virtqueue *vq, __virtio16 val)
Re: Network hangs when communicating with host
[apologies for the delay -- I've been off for a week and am catching up on
email]

On Tue, Oct 20, 2015 at 09:58:51AM -0400, Sasha Levin wrote:
> On 10/20/2015 09:42 AM, Dmitry Vyukov wrote:
> > I now have another issue. My binary fails to mmap a file within the lkvm
> > sandbox. The same binary works fine on the host and in qemu. I've added
> > strace into the sandbox script, and here is the output:
> >
> > [pid 837] openat(AT_FDCWD, "syzkaller-shm048878722", O_RDWR|O_CLOEXEC) = 5
> > [pid 837] mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_SHARED, 5, 0) = -1 EINVAL (Invalid argument)
> >
> > I don't see anything that can potentially cause EINVAL here. Is it
> > possible that lkvm somehow affects kernel behavior here?
> >
> > I run lkvm as:
> >
> > $ taskset 1 /kvmtool/lkvm sandbox --disk syz-0 --mem=2048 --cpus=2
> >     --kernel /arch/x86/boot/bzImage --network mode=user --sandbox
> >     /workdir/kvm/syz-0.sh
>
> It's possible that something in the virtio-9p layer is broken. I'll give
> it a look in the evening.

I ended up with the patch below, but it's really ugly and I didn't get
round to posting it.

Will

--->8

From 7cbcdfef1b9f094db4bf75676f22339f3164e103 Mon Sep 17 00:00:00 2001
From: Will Deacon
Date: Fri, 17 Apr 2015 17:31:36 +0100
Subject: [PATCH] kvmtool: virtio-net: fix VIRTIO_NET_F_MRG_RXBUF usage in rx
 thread

When merging virtio-net buffers using the VIRTIO_NET_F_MRG_RXBUF feature,
the first buffer added to the used ring should indicate the total number
of buffers used to hold the packet. Unfortunately, kvmtool has a number
of issues when constructing these merged buffers:

- Commit 5131332e3f1a ("kvmtool: convert net backend to support
  bi-endianness") introduced a strange loop counter, which resulted in
  hdr->num_buffers being set redundantly the first time round

- When adding the buffers to the ring, we actually add them one-by-one,
  allowing the guest to see the header before we've inserted the rest of
  the data buffers...

- ... which is made worse because we non-atomically increment the
  num_buffers count in the header each time we insert a new data buffer

Consequently, the guest quickly becomes confused in its net rx code and
the whole thing grinds to a halt. This is easily exemplified by trying
to boot a root filesystem over NFS, which seldom succeeds.

This patch resolves the issues by allowing us to insert items into the
used ring without updating the index. Once the full payload has been
added and num_buffers corresponds to the total size, we *then* publish
the buffers to the guest.

Cc: Marc Zyngier
Cc: Sasha Levin
Signed-off-by: Will Deacon
---
 include/kvm/virtio.h |  2 ++
 virtio/core.c        | 32 +---
 virtio/net.c         | 31 +++
 3 files changed, 42 insertions(+), 23 deletions(-)

diff --git a/include/kvm/virtio.h b/include/kvm/virtio.h
index 768ee9668d44..8324ba7d38be 100644
--- a/include/kvm/virtio.h
+++ b/include/kvm/virtio.h
@@ -112,6 +112,8 @@ static inline bool virt_queue__available(struct virt_queue *vq)
 	return virtio_guest_to_host_u16(vq, vq->vring.avail->idx) != vq->last_avail_idx;
 }

+void virt_queue__used_idx_advance(struct virt_queue *queue, u16 jump);
+struct vring_used_elem * virt_queue__set_used_elem_no_update(struct virt_queue *queue, u32 head, u32 len, u16 offset);
 struct vring_used_elem *virt_queue__set_used_elem(struct virt_queue *queue, u32 head, u32 len);

 bool virtio_queue__should_signal(struct virt_queue *vq);
diff --git a/virtio/core.c b/virtio/core.c
index 3b6e4d7cd045..d6ac289d450e 100644
--- a/virtio/core.c
+++ b/virtio/core.c
@@ -21,22 +21,17 @@ const char* virtio_trans_name(enum virtio_trans trans)
 	return "unknown";
 }

-struct vring_used_elem *virt_queue__set_used_elem(struct virt_queue *queue, u32 head, u32 len)
+void virt_queue__used_idx_advance(struct virt_queue *queue, u16 jump)
 {
-	struct vring_used_elem *used_elem;
 	u16 idx = virtio_guest_to_host_u16(queue, queue->vring.used->idx);

-	used_elem	= &queue->vring.used->ring[idx % queue->vring.num];
-	used_elem->id	= virtio_host_to_guest_u32(queue, head);
-	used_elem->len	= virtio_host_to_guest_u32(queue, len);
-
 	/*
 	 * Use wmb to assure that used elem was updated with head and len.
 	 * We need a wmb here since we can't advance idx unless we're ready
 	 * to pass the used element to the guest.
 	 */
 	wmb();
-	idx++;
+	idx += jump;
 	queue->vring.used->idx = virtio_host_to_guest_u16(queue, idx);

 	/*
@@ -45,6 +40,29 @@
 	 * an updated idx.
 	 */
 	wmb();
+}
+
+struct vring_used_elem *
+virt_queue__set_used_elem_no_update(struct virt_queue *queue, u32 head,
+				    u32
Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
On 27/10/2015 07:35, Yunhong Jiang wrote:
> On Mon, Oct 26, 2015 at 09:37:14PM -0600, Alex Williamson wrote:
>> On Mon, 2015-10-26 at 18:20 -0700, Yunhong Jiang wrote:
>>> An option to force the VFIO PCI MSI/MSI-X handler to be a non-threaded
>>> IRQ, even when CONFIG_IRQ_FORCED_THREADING=y. This is useful when
>>> assigning a device to a guest with low latency requirements, since it
>>> reduces the context switches to/from the IRQ thread.
>>
>> Is there any way we can do this automatically? Perhaps detecting that
>> we're on a RT kernel or maybe that the user is running with RT priority?
>> I find that module options are mostly misunderstood and misused.
>
> Alex, thanks for the review.
>
> It's not easy to detect whether the user is running with RT priority,
> since sometimes the user starts the thread and only sets the scheduler
> priority later.
>
> Also, should we do this only for the in-kernel irqchip scenario and not
> for a userspace handler, since the in-kernel irqchip has lower overhead?

The overhead of the non-threaded IRQ handler is the same for kernel or
userspace irqchip, since the handler just writes 1 to the eventfd.

On RT kernels, however, can you call eventfd_signal from interrupt
context? You cannot call spin_lock_irqsave (which can sleep on RT) from a
non-threaded interrupt handler, can you? You would need a raw spin lock.

Paolo
Re: [PATCH] vhost: fix performance on LE hosts
From: "Michael S. Tsirkin"
Date: Tue, 27 Oct 2015 11:37:39 +0200

> commit 2751c9882b947292fcfb084c4f604e01724af804 ("vhost: cross-endian
> support for legacy devices") introduced a minor regression: even with
> cross-endian disabled, and even on an LE host, vhost_is_little_endian
> checks the is_le flag, so there's always a branch.
>
> To fix, simply check virtio_legacy_is_little_endian first.
>
> Cc: Greg Kurz
> Signed-off-by: Michael S. Tsirkin

Michael, do you want me to take this via the 'net' tree?
Re: [PATCH] KVM: PPC: Implement extension to report number of memslots
On 26/10/15 06:15, Paul Mackerras wrote:
> On Fri, Oct 16, 2015 at 08:41:31AM +0200, Thomas Huth wrote:
>> Yes, we'll likely need this soon! 32 slots are not enough...
>
> Would anyone object if I raised the limit for PPC to 512 slots?

In the long run we should really make this dynamic instead. But as a
first step, that should IMHO be fine. The question is whether we
immediately need 512 on PPC, too. QEMU for x86 features 256 pluggable
memory DIMM slots ("#define ACPI_MAX_RAM_SLOTS 256"), while PPC only has
32 ("#define SPAPR_MAX_RAM_SLOTS 32"). So maybe we are fine with fewer
memory slots on PPC, e.g. 300 or so?

 Thomas
Re: [PATCH] vhost: fix performance on LE hosts
On Tue, Oct 27, 2015 at 06:32:55AM -0700, David Miller wrote:
> From: "Michael S. Tsirkin"
> Date: Tue, 27 Oct 2015 11:37:39 +0200
>
> > commit 2751c9882b947292fcfb084c4f604e01724af804 ("vhost: cross-endian
> > support for legacy devices") introduced a minor regression: even with
> > cross-endian disabled, and even on an LE host, vhost_is_little_endian
> > checks the is_le flag, so there's always a branch.
> >
> > To fix, simply check virtio_legacy_is_little_endian first.
> >
> > Cc: Greg Kurz
> > Signed-off-by: Michael S. Tsirkin
>
> Michael, do you want me to take this via the 'net' tree?

Sure - especially since it looks like I have nothing else for this merge
window (working on fixing shutdown races, but it's slow going).

Thanks a lot!

-- 
MST
Re: [RFC PATCH] vfio/type1: Do not support IOMMUs that allow bypass
[cc +iommu]

On Tue, 2015-10-27 at 15:40 +0000, Will Deacon wrote:
> On Fri, Oct 16, 2015 at 09:51:22AM -0600, Alex Williamson wrote:
> > On Fri, 2015-10-16 at 16:03 +0200, Eric Auger wrote:
> > > Hi Alex,
> > > On 10/15/2015 10:52 PM, Alex Williamson wrote:
> > > > We can only provide isolation if DMA is forced through the IOMMU
> > > > aperture. Don't allow type1 to be used if this is not the case.
> > > >
> > > > Signed-off-by: Alex Williamson
> > > > ---
> > > >
> > > > Eric, I see a number of IOMMU drivers enable this, do the ones you
> > > > care about for ARM set geometry.force_aperture? Thanks,
> > > I am currently using arm-smmu.c. I don't see force_aperture being set.
> >
> > Hi Will,
>
> Hi Alex,
>
> > Would it be possible to add iommu_domain_geometry support to arm-smmu.c?
> > In addition to this test to verify that DMA cannot bypass the IOMMU, I'd
> > eventually like to pass the aperture information out through the VFIO
> > API. Thanks,
>
> The slight snag here is that we don't know which SMMU in the system the
> domain is attached to at the point when the geometry is queried, so I
> can't give you an upper bound on the aperture. For example, if there is
> an SMMU with a 32-bit input address and another with a 48-bit input
> address.
>
> We could play the same horrible games that we do with the pgsize bitmap,
> and truncate some global aperture every time we probe an SMMU device, but
> I'd really like to have fewer hacks like that if possible. The root of
> the problem is still that domains are allocated for a bus, rather than
> an IOMMU instance.

Hi Will,

Yes, Intel VT-d has this issue as well. In theory we can have
heterogeneous IOMMU hardware units (DRHDs) in a system, and the upper
bound of the geometry could be diminished if we add a less capable DRHD
into the domain. I suspect this is more a theoretical problem than a
practical one, though, as we typically mix similar DRHDs, and I think
we're still capable of 39-bit addressing in the least capable version
per the spec.

In any case, I really want to start testing geometry.force_aperture,
even if we're not yet comfortable exposing the IOMMU limits to the
user. The vfio type1 backend shouldn't be enabled at all for underlying
hardware that allows DMA bypass. Thanks,

Alex
Re: [RFC PATCH] vfio/type1: Do not support IOMMUs that allow bypass
On Fri, Oct 16, 2015 at 09:51:22AM -0600, Alex Williamson wrote:
> On Fri, 2015-10-16 at 16:03 +0200, Eric Auger wrote:
> > Hi Alex,
> > On 10/15/2015 10:52 PM, Alex Williamson wrote:
> > > We can only provide isolation if DMA is forced through the IOMMU
> > > aperture. Don't allow type1 to be used if this is not the case.
> > >
> > > Signed-off-by: Alex Williamson
> > > ---
> > >
> > > Eric, I see a number of IOMMU drivers enable this, do the ones you
> > > care about for ARM set geometry.force_aperture? Thanks,
> > I am currently using arm-smmu.c. I don't see force_aperture being set.
>
> Hi Will,

Hi Alex,

> Would it be possible to add iommu_domain_geometry support to arm-smmu.c?
> In addition to this test to verify that DMA cannot bypass the IOMMU, I'd
> eventually like to pass the aperture information out through the VFIO
> API. Thanks,

The slight snag here is that we don't know which SMMU in the system the
domain is attached to at the point when the geometry is queried, so I
can't give you an upper bound on the aperture. For example, if there is
an SMMU with a 32-bit input address and another with a 48-bit input
address.

We could play the same horrible games that we do with the pgsize bitmap,
and truncate some global aperture every time we probe an SMMU device, but
I'd really like to have fewer hacks like that if possible. The root of
the problem is still that domains are allocated for a bus, rather than
an IOMMU instance.

Will
Re: [PATCH 0/3] kvmtool: don't overwrite /virt/init
Hi Oleg,

On Thu, Oct 22, 2015 at 05:59:21PM +0200, Oleg Nesterov wrote:
> On top of "[PATCH 0/3] kvmtool: tiny init for x86_64" I sent.
>
> I simply can't understand why kvmtool always overwrites /virt/init.
> This doesn't look right, especially because you can't pass an "init="
> kernel argument without another patch "[PATCH] kvmtool/run: append
> cfg.kernel_cmdline at the end of real_cmdline" I sent. And so far
> it is not clear to me if you are going to accept that patch or not.

Sorry, I was off last week and didn't have email access. I'm going
through my inbox now and your patches look pretty good to me. I'll
apply/test and then push out if I don't see any issues.

Thanks!

Will
[PATCH 1/4] Provide simple noop dma ops
We are going to require dma_ops for several common drivers, even for systems that do have an identity mapping. Lets provide some minimal no-op dma_ops that can be used for that purpose. Signed-off-by: Christian Borntraeger--- include/linux/dma-mapping.h | 2 ++ lib/Makefile| 2 +- lib/dma-noop.c | 77 + 3 files changed, 80 insertions(+), 1 deletion(-) create mode 100644 lib/dma-noop.c diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index ac07ff0..7912f54 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -66,6 +66,8 @@ struct dma_map_ops { int is_phys; }; +extern struct dma_map_ops dma_noop_ops; + #define DMA_BIT_MASK(n)(((n) == 64) ? ~0ULL : ((1ULL<<(n))-1)) #define DMA_MASK_NONE 0x0ULL diff --git a/lib/Makefile b/lib/Makefile index 13a7c6a..b04ba71 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -13,7 +13,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \ sha1.o md5.o irq_regs.o argv_split.o \ proportions.o flex_proportions.o ratelimit.o show_mem.o \ is_single_threaded.o plist.o decompress.o kobject_uevent.o \ -earlycpio.o seq_buf.o nmi_backtrace.o +earlycpio.o seq_buf.o nmi_backtrace.o dma-noop.o obj-$(CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS) += usercopy.o lib-$(CONFIG_MMU) += ioremap.o diff --git a/lib/dma-noop.c b/lib/dma-noop.c new file mode 100644 index 000..3ce31302 --- /dev/null +++ b/lib/dma-noop.c @@ -0,0 +1,77 @@ +/* + * lib/dma-noop.c + * + * Stub DMA noop-ops + */ +#include +#include +#include +#include + +static void *dma_noop_alloc(struct device *dev, size_t size, + dma_addr_t *dma_handle, gfp_t gfp, + struct dma_attrs *attrs) +{ + void *ret; + + ret = (void *)__get_free_pages(gfp, get_order(size)); + if (ret) { + memset(ret, 0, size); + *dma_handle = virt_to_phys(ret); + } + return ret; +} + +static void dma_noop_free(struct device *dev, size_t size, + void *cpu_addr, dma_addr_t dma_addr, + struct dma_attrs *attrs) +{ + free_pages((unsigned long)cpu_addr, get_order(size)); +} + +static 
dma_addr_t dma_noop_map_page(struct device *dev, struct page *page, + unsigned long offset, size_t size, + enum dma_data_direction dir, + struct dma_attrs *attrs) +{ + return page_to_phys(page) + offset; +} + +static int dma_noop_map_sg(struct device *dev, struct scatterlist *sgl, int nents, +enum dma_data_direction dir, struct dma_attrs *attrs) +{ + int i; + struct scatterlist *sg; + + for_each_sg(sgl, sg, nents, i) { + void *va; + + BUG_ON(!sg_page(sg)); + va = sg_virt(sg); + sg_dma_address(sg) = (dma_addr_t)virt_to_phys(va); + sg_dma_len(sg) = sg->length; + } + + return nents; +} + +static int dma_noop_mapping_error(struct device *dev, dma_addr_t dma_addr) +{ + return 0; +} + +static int dma_noop_supported(struct device *dev, u64 mask) +{ + return 1; +} + +struct dma_map_ops dma_noop_ops = { + .alloc = dma_noop_alloc, + .free = dma_noop_free, + .map_page = dma_noop_map_page, + .map_sg = dma_noop_map_sg, + .mapping_error = dma_noop_mapping_error, + .dma_supported = dma_noop_supported, +}; + +EXPORT_SYMBOL(dma_noop_ops); -- 2.4.3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4] alpha: use common noop dma ops
Some of the alpha pci noop dma ops are identical to the common ones. Use them. Signed-off-by: Christian Borntraeger--- arch/alpha/kernel/pci-noop.c | 46 1 file changed, 4 insertions(+), 42 deletions(-) diff --git a/arch/alpha/kernel/pci-noop.c b/arch/alpha/kernel/pci-noop.c index 2b1f4a1..8e735b5e 100644 --- a/arch/alpha/kernel/pci-noop.c +++ b/arch/alpha/kernel/pci-noop.c @@ -123,44 +123,6 @@ static void *alpha_noop_alloc_coherent(struct device *dev, size_t size, return ret; } -static void alpha_noop_free_coherent(struct device *dev, size_t size, -void *cpu_addr, dma_addr_t dma_addr, -struct dma_attrs *attrs) -{ - free_pages((unsigned long)cpu_addr, get_order(size)); -} - -static dma_addr_t alpha_noop_map_page(struct device *dev, struct page *page, - unsigned long offset, size_t size, - enum dma_data_direction dir, - struct dma_attrs *attrs) -{ - return page_to_pa(page) + offset; -} - -static int alpha_noop_map_sg(struct device *dev, struct scatterlist *sgl, int nents, -enum dma_data_direction dir, struct dma_attrs *attrs) -{ - int i; - struct scatterlist *sg; - - for_each_sg(sgl, sg, nents, i) { - void *va; - - BUG_ON(!sg_page(sg)); - va = sg_virt(sg); - sg_dma_address(sg) = (dma_addr_t)virt_to_phys(va); - sg_dma_len(sg) = sg->length; - } - - return nents; -} - -static int alpha_noop_mapping_error(struct device *dev, dma_addr_t dma_addr) -{ - return 0; -} - static int alpha_noop_supported(struct device *dev, u64 mask) { return mask < 0x00ffUL ? 
0 : 1; @@ -168,10 +130,10 @@ static int alpha_noop_supported(struct device *dev, u64 mask) struct dma_map_ops alpha_noop_ops = { .alloc = alpha_noop_alloc_coherent, - .free = alpha_noop_free_coherent, - .map_page = alpha_noop_map_page, - .map_sg = alpha_noop_map_sg, - .mapping_error = alpha_noop_mapping_error, + .free = dma_noop_free_coherent, + .map_page = dma_noop_map_page, + .map_sg = dma_noop_map_sg, + .mapping_error = dma_noop_mapping_error, .dma_supported = alpha_noop_supported, }; -- 2.4.3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] s390/dma: Allow per device dma ops
As virtio-ccw now has dma ops, we can no longer default to the PCI ones. Make use of dev_archdata to keep the dma_ops per device. Signed-off-by: Christian Borntraeger--- arch/s390/include/asm/device.h | 6 +- arch/s390/include/asm/dma-mapping.h | 2 +- arch/s390/pci/pci.c | 1 + 3 files changed, 7 insertions(+), 2 deletions(-) diff --git a/arch/s390/include/asm/device.h b/arch/s390/include/asm/device.h index d8f9872..4a9f35e 100644 --- a/arch/s390/include/asm/device.h +++ b/arch/s390/include/asm/device.h @@ -3,5 +3,9 @@ * * This file is released under the GPLv2 */ -#include +struct dev_archdata { + struct dma_map_ops *dma_ops; +}; +struct pdev_archdata { +}; diff --git a/arch/s390/include/asm/dma-mapping.h b/arch/s390/include/asm/dma-mapping.h index b3fd54d..71404f1 100644 --- a/arch/s390/include/asm/dma-mapping.h +++ b/arch/s390/include/asm/dma-mapping.h @@ -15,7 +15,7 @@ extern struct dma_map_ops s390_dma_ops; static inline struct dma_map_ops *get_dma_ops(struct device *dev) { - return _dma_ops; + return dev->archdata.dma_ops; } static inline void dma_cache_sync(struct device *dev, void *vaddr, size_t size, diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c index 7ef12a3..9913915 100644 --- a/arch/s390/pci/pci.c +++ b/arch/s390/pci/pci.c @@ -649,6 +649,7 @@ int pcibios_add_device(struct pci_dev *pdev) zdev->pdev = pdev; pdev->dev.groups = zpci_attr_groups; + pdev->dev.archdata.dma_ops = _dma_ops; zpci_map_resources(pdev); for (i = 0; i < PCI_BAR_COUNT; i++) { -- 2.4.3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH/RFC 0/4] dma ops and virtio
This is an RFC to check if I am on the right track. There are some
attempts to unify the dma ops (Christoph) as well as some attempts to
make virtio use the DMA API (Andy).

At today's discussion at the kernel summit, we concluded that we want to
use the same code on all platforms wherever possible, so having a dummy
dma_op might be the easiest solution to keep virtio-ccw as similar as
possible to virtio-pci. Andy Lutomirski will rework his patchset to
unconditionally use the dma ops. We will also need a compatibility quirk
for powerpc to bypass the iommu mappings on older QEMU versions (which
translates to all versions as of today) and per device, e.g. via device
tree. Ben Herrenschmidt will look into that.

Here is a very quick (still untested) shot at providing the s390 part:
- patch 1: dummy dma ops, inspired by the alpha code
- patch 2: replace some of the alpha functions with the dummy ones
- patch 3: allow per-device dma ops for s390
- patch 4: wire up virtio dma ops

TODOs:
- test (s390 virtio-ccw with Andy's changes, s390 pci) and review
- check i386 nommu dma ops for potential improvements in the noop handlers
- make dma-noop only compile when necessary

Christian Borntraeger (4):
  Provide simple noop dma ops
  alpha: use common noop dma ops
  s390/dma: Allow per device dma ops
  s390/virtio: use noop dma ops

 arch/alpha/kernel/pci-noop.c        | 46 ++
 arch/s390/include/asm/device.h      |  6 ++-
 arch/s390/include/asm/dma-mapping.h |  2 +-
 arch/s390/pci/pci.c                 |  1 +
 drivers/s390/virtio/kvm_virtio.c    |  2 +
 drivers/s390/virtio/virtio_ccw.c    |  2 +
 include/linux/dma-mapping.h         |  2 +
 lib/Makefile                        |  2 +-
 lib/dma-noop.c                      | 77 +
 9 files changed, 95 insertions(+), 45 deletions(-)
 create mode 100644 lib/dma-noop.c
[PATCH 4/4] s390/virtio: use noop dma ops
With all infrastructure in place, lets provide dma_ops for virtio devices on s390. Signed-off-by: Christian Borntraeger--- drivers/s390/virtio/kvm_virtio.c | 2 ++ drivers/s390/virtio/virtio_ccw.c | 2 ++ 2 files changed, 4 insertions(+) diff --git a/drivers/s390/virtio/kvm_virtio.c b/drivers/s390/virtio/kvm_virtio.c index 53fb975..05adaa9 100644 --- a/drivers/s390/virtio/kvm_virtio.c +++ b/drivers/s390/virtio/kvm_virtio.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include @@ -318,6 +319,7 @@ static void add_kvm_device(struct kvm_device_desc *d, unsigned int offset) return; } + kdev->vdev.dev.archdata.dma_ops = _noop_ops; kdev->vdev.dev.parent = kvm_root; kdev->vdev.id.device = d->type; kdev->vdev.config = _vq_configspace_ops; diff --git a/drivers/s390/virtio/virtio_ccw.c b/drivers/s390/virtio/virtio_ccw.c index 1cda784..8fb7a6b 100644 --- a/drivers/s390/virtio/virtio_ccw.c +++ b/drivers/s390/virtio/virtio_ccw.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include @@ -1093,6 +1094,7 @@ static void virtio_ccw_auto_online(void *data, async_cookie_t cookie) struct ccw_device *cdev = data; int ret; + cdev->dev.archdata.dma_ops = _noop_ops; ret = ccw_device_set_online(cdev); if (ret) dev_warn(>dev, "Failed to set online: %d\n", ret); -- 2.4.3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
On Tue, Oct 27, 2015 at 10:29:28AM +0100, Paolo Bonzini wrote:
> On 27/10/2015 07:35, Yunhong Jiang wrote:
> > On Mon, Oct 26, 2015 at 09:37:14PM -0600, Alex Williamson wrote:
> >> On Mon, 2015-10-26 at 18:20 -0700, Yunhong Jiang wrote:
> >>> An option to force the VFIO PCI MSI/MSI-X handler to be a non-threaded
> >>> IRQ, even when CONFIG_IRQ_FORCED_THREADING=y. This is useful when
> >>> assigning a device to a guest with low latency requirements, since it
> >>> reduces the context switches to/from the IRQ thread.
> >>
> >> Is there any way we can do this automatically? Perhaps detecting that
> >> we're on a RT kernel or maybe that the user is running with RT priority?
> >> I find that module options are mostly misunderstood and misused.
> >
> > Alex, thanks for the review.
> >
> > It's not easy to detect whether the user is running with RT priority,
> > since sometimes the user starts the thread and only sets the scheduler
> > priority later.
> >
> > Also, should we do this only for the in-kernel irqchip scenario and not
> > for a userspace handler, since the in-kernel irqchip has lower overhead?
>
> The overhead of the non-threaded IRQ handler is the same for kernel or
> userspace irqchip, since the handler just writes 1 to the eventfd.

IIUC, the handler not only writes 1 to the eventfd, it also invokes the
wait queue function, and the in-kernel irqchip has a different callback
from the userspace irqchip, am I right? But I should not state that the
in-kernel irqchip has lower overhead, since I have no data for it.

> On RT kernels however can you call eventfd_signal from interrupt
> context? You cannot call spin_lock_irqsave (which can sleep) from a
> non-threaded interrupt handler, can you? You would need a raw spin lock.

Thanks for pointing this out. Yes, we can't call spin_lock_irqsave on an
RT kernel. Will do it this way in the next patch. But I'm not sure if
it's overkill to use a raw_spinlock there, since eventfd_signal is used
by other callers also.

Thanks
--jyh
Re: [PATCH 1/4] Provide simple noop dma ops
Hi Christian,

On Tue, Oct 27, 2015 at 11:48:48PM +0100, Christian Borntraeger wrote:
> +static dma_addr_t dma_noop_map_page(struct device *dev, struct page *page,
> +				    unsigned long offset, size_t size,
> +				    enum dma_data_direction dir,
> +				    struct dma_attrs *attrs)
> +{
> +	return page_to_phys(page) + offset;
> +}

X86 also has its own version of these noop dma_ops, see
arch/x86/kernel/pci-nommu.c. This one also checks the dma_mask and
prints a warning if the physical address doesn't fit into the mask. I
think this would make sense here too, and that we can also make x86 use
the same generic noop-dma-ops you are introducing.

	Joerg
[PATCH 1/3] virtio_net: Stop doing DMA from the stack
From: Andy LutomirskiOnce virtio starts using the DMA API, we won't be able to safely DMA from the stack. virtio-net does a couple of config DMA requests from small stack buffers -- switch to using dynamically-allocated memory. This should have no effect on any performance-critical code paths. Signed-off-by: Andy Lutomirski --- drivers/net/virtio_net.c | 53 1 file changed, 36 insertions(+), 17 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index d8838dedb7a4..4f10f8a58811 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -976,31 +976,43 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd, struct scatterlist *out) { struct scatterlist *sgs[4], hdr, stat; - struct virtio_net_ctrl_hdr ctrl; - virtio_net_ctrl_ack status = ~0; + + struct { + struct virtio_net_ctrl_hdr ctrl; + virtio_net_ctrl_ack status; + } *buf; + unsigned out_num = 0, tmp; + bool ret; /* Caller should know better */ BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)); - ctrl.class = class; - ctrl.cmd = cmd; + buf = kmalloc(sizeof(*buf), GFP_ATOMIC); + if (!buf) + return false; + buf->status = ~0; + + buf->ctrl.class = class; + buf->ctrl.cmd = cmd; /* Add header */ - sg_init_one(, , sizeof(ctrl)); + sg_init_one(, >ctrl, sizeof(buf->ctrl)); sgs[out_num++] = if (out) sgs[out_num++] = out; /* Add return status. */ - sg_init_one(, , sizeof(status)); + sg_init_one(, >status, sizeof(buf->status)); sgs[out_num] = BUG_ON(out_num + 1 > ARRAY_SIZE(sgs)); virtqueue_add_sgs(vi->cvq, sgs, out_num, 1, vi, GFP_ATOMIC); - if (unlikely(!virtqueue_kick(vi->cvq))) - return status == VIRTIO_NET_OK; + if (unlikely(!virtqueue_kick(vi->cvq))) { + ret = (buf->status == VIRTIO_NET_OK); + goto out; + } /* Spin for a response, the kick causes an ioport write, trapping * into the hypervisor, so the request should be handled immediately. 
@@ -1009,7 +1021,11 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd, !virtqueue_is_broken(vi->cvq)) cpu_relax(); - return status == VIRTIO_NET_OK; + ret = (buf->status == VIRTIO_NET_OK); + +out: + kfree(buf); + return ret; } static int virtnet_set_mac_address(struct net_device *dev, void *p) @@ -1151,7 +1167,7 @@ static void virtnet_set_rx_mode(struct net_device *dev) { struct virtnet_info *vi = netdev_priv(dev); struct scatterlist sg[2]; - u8 promisc, allmulti; + u8 *cmdbyte; struct virtio_net_ctrl_mac *mac_data; struct netdev_hw_addr *ha; int uc_count; @@ -1163,22 +1179,25 @@ static void virtnet_set_rx_mode(struct net_device *dev) if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_RX)) return; - promisc = ((dev->flags & IFF_PROMISC) != 0); - allmulti = ((dev->flags & IFF_ALLMULTI) != 0); + cmdbyte = kmalloc(sizeof(*cmdbyte), GFP_ATOMIC); + if (!cmdbyte) + return; - sg_init_one(sg, , sizeof(promisc)); + sg_init_one(sg, cmdbyte, sizeof(*cmdbyte)); + *cmdbyte = ((dev->flags & IFF_PROMISC) != 0); if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RX, VIRTIO_NET_CTRL_RX_PROMISC, sg)) dev_warn(>dev, "Failed to %sable promisc mode.\n", -promisc ? "en" : "dis"); - - sg_init_one(sg, , sizeof(allmulti)); +*cmdbyte ? "en" : "dis"); + *cmdbyte = ((dev->flags & IFF_ALLMULTI) != 0); if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RX, VIRTIO_NET_CTRL_RX_ALLMULTI, sg)) dev_warn(>dev, "Failed to %sable allmulti mode.\n", -allmulti ? "en" : "dis"); +*cmdbyte ? "en" : "dis"); + + kfree(cmdbyte); uc_count = netdev_uc_count(dev); mc_count = netdev_mc_count(dev); -- 2.4.3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/3] virtio DMA API core stuff
This switches virtio to use the DMA API unconditionally. I'm sure it
breaks things, but it seems to work on x86 using virtio-pci, with and
without Xen, and using both the modern 1.0 variant and the legacy
variant.

Andy Lutomirski (3):
  virtio_net: Stop doing DMA from the stack
  virtio_ring: Support DMA APIs
  virtio_pci: Use the DMA API

 drivers/net/virtio_net.c           |  53 +++
 drivers/virtio/Kconfig             |   2 +-
 drivers/virtio/virtio_pci_common.h |   3 +-
 drivers/virtio/virtio_pci_legacy.c |  19 +++-
 drivers/virtio/virtio_pci_modern.c |  34 ---
 drivers/virtio/virtio_ring.c       | 179 ++---
 tools/virtio/linux/dma-mapping.h   |  17 
 7 files changed, 241 insertions(+), 66 deletions(-)
 create mode 100644 tools/virtio/linux/dma-mapping.h
[PATCH 2/3] virtio_ring: Support DMA APIs
From: Andy Lutomirskivirtio_ring currently sends the device (usually a hypervisor) physical addresses of its I/O buffers. This is okay when DMA addresses and physical addresses are the same thing, but this isn't always the case. For example, this never works on Xen guests, and it is likely to fail if a physical "virtio" device ever ends up behind an IOMMU or swiotlb. The immediate use case for me is to enable virtio on Xen guests. For that to work, we need vring to support DMA address translation as well as a corresponding change to virtio_pci or to another driver. With this patch, if enabled, virtfs survives kmemleak and CONFIG_DMA_API_DEBUG. Signed-off-by: Andy Lutomirski --- drivers/virtio/Kconfig | 2 +- drivers/virtio/virtio_ring.c | 179 +++ tools/virtio/linux/dma-mapping.h | 17 3 files changed, 164 insertions(+), 34 deletions(-) create mode 100644 tools/virtio/linux/dma-mapping.h diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig index cab9f3f63a38..77590320d44c 100644 --- a/drivers/virtio/Kconfig +++ b/drivers/virtio/Kconfig @@ -60,7 +60,7 @@ config VIRTIO_INPUT config VIRTIO_MMIO tristate "Platform bus driver for memory mapped virtio devices" - depends on HAS_IOMEM + depends on HAS_IOMEM && HAS_DMA select VIRTIO ---help--- This drivers provides support for memory mapped virtio diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 096b857e7b75..911fbaa7a260 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -24,6 +24,7 @@ #include #include #include +#include #ifdef DEBUG /* For development, we want to crash whenever the ring is screwed. */ @@ -54,7 +55,14 @@ #define END_USE(vq) #endif -struct vring_virtqueue { +struct vring_desc_state +{ + void *data; /* Data for callback. */ + struct vring_desc *indir_desc; /* Indirect descriptor, if any. 
*/ +}; + +struct vring_virtqueue +{ struct virtqueue vq; /* Actual memory layout for this queue */ @@ -92,12 +100,71 @@ struct vring_virtqueue { ktime_t last_add_time; #endif - /* Tokens for callbacks. */ - void *data[]; + /* Per-descriptor state. */ + struct vring_desc_state desc_state[]; }; #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq) +/* + * The DMA ops on various arches are rather gnarly right now, and + * making all of the arch DMA ops work on the vring device itself + * is a mess. For now, we use the parent device for DMA ops. + */ +struct device *vring_dma_dev(const struct vring_virtqueue *vq) +{ + return vq->vq.vdev->dev.parent; +} + +/* Map one sg entry. */ +static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq, + struct scatterlist *sg, + enum dma_data_direction direction) +{ + /* +* We can't use dma_map_sg, because we don't use scatterlists in +* the way it expects (we don't guarantee that the scatterlist +* will exist for the lifetime of the mapping). +*/ + return dma_map_page(vring_dma_dev(vq), + sg_page(sg), sg->offset, sg->length, + direction); +} + +static dma_addr_t vring_map_single(const struct vring_virtqueue *vq, + void *cpu_addr, size_t size, + enum dma_data_direction direction) +{ + return dma_map_single(vring_dma_dev(vq), + cpu_addr, size, direction); +} + +static void vring_unmap_one(const struct vring_virtqueue *vq, + struct vring_desc *desc) +{ + u16 flags = virtio16_to_cpu(vq->vq.vdev, desc->flags); + + if (flags & VRING_DESC_F_INDIRECT) { + dma_unmap_single(vring_dma_dev(vq), +virtio64_to_cpu(vq->vq.vdev, desc->addr), +virtio32_to_cpu(vq->vq.vdev, desc->len), +(flags & VRING_DESC_F_WRITE) ? +DMA_FROM_DEVICE : DMA_TO_DEVICE); + } else { + dma_unmap_page(vring_dma_dev(vq), + virtio64_to_cpu(vq->vq.vdev, desc->addr), + virtio32_to_cpu(vq->vq.vdev, desc->len), + (flags & VRING_DESC_F_WRITE) ? 
+ DMA_FROM_DEVICE : DMA_TO_DEVICE); + } +} + +static int vring_mapping_error(const struct vring_virtqueue *vq, + dma_addr_t addr) +{ + return dma_mapping_error(vring_dma_dev(vq), addr); +} + static struct vring_desc *alloc_indirect(struct virtqueue *_vq, unsigned int total_sg, gfp_t gfp) { @@ -131,7 +198,7 @@ static inline int
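The bookkeeping change at the heart of this patch, replacing the bare `data[]` token array with a per-descriptor `desc_state` array so that `detach_buf()` can unmap each descriptor while walking the chain, can be modeled in plain userspace C. This is an illustrative sketch only: the ring size, flag value, and function names are simplified stand-ins for the vring code.

```c
#include <stdint.h>
#include <stddef.h>

#define RING_SIZE 8
#define F_NEXT    0x1		/* stand-in for VRING_DESC_F_NEXT */

struct desc {
	uint16_t flags;
	uint16_t next;		/* link to the next descriptor in a chain */
};

struct desc_state {
	void *data;		/* token for the completion callback */
	void *indir_desc;	/* indirect table, if any (unused here) */
};

struct ring {
	struct desc desc[RING_SIZE];
	struct desc_state state[RING_SIZE];
	uint16_t free_head;	/* free list threaded through desc[].next */
	uint16_t num_free;
};

static void ring_init(struct ring *r)
{
	int i;

	r->free_head = 0;
	r->num_free = RING_SIZE;
	for (i = 0; i < RING_SIZE; i++) {
		r->desc[i].next = (uint16_t)(i + 1);
		r->desc[i].flags = 0;
		r->state[i].data = NULL;
		r->state[i].indir_desc = NULL;
	}
}

/* Claim a chain of n descriptors for a buffer identified by token;
 * returns the head index, or -1 if the ring is too full. */
static int ring_add_buf(struct ring *r, int n, void *token)
{
	int head, i, k;

	if (n < 1 || r->num_free < n)
		return -1;
	head = r->free_head;
	i = head;
	for (k = 0; k < n - 1; k++) {
		r->desc[i].flags = F_NEXT;
		i = r->desc[i].next;
	}
	r->desc[i].flags = 0;			/* last descriptor of chain */
	r->free_head = r->desc[i].next;
	r->num_free -= (uint16_t)n;
	r->state[head].data = token;
	return head;
}

/* Walk the chain (the point where the real patch calls
 * vring_unmap_one() on each descriptor), then splice it back onto
 * the free list, mirroring detach_buf(). */
static void ring_detach_buf(struct ring *r, int head)
{
	int i = head;

	r->state[head].data = NULL;
	while (r->desc[i].flags & F_NEXT) {
		/* per-descriptor unmap would happen here */
		i = r->desc[i].next;
		r->num_free++;
	}
	r->desc[i].next = (uint16_t)r->free_head;
	r->free_head = (uint16_t)head;
	r->num_free++;				/* plus final descriptor */
}
```

The per-slot state is what lets the detach path find both the callback token and the (un)mapping information without a separate lookup.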
[PATCH 3/3] virtio_pci: Use the DMA API
From: Andy LutomirskiThis fixes virtio-pci on platforms and busses that have IOMMUs. This will break the experimental QEMU Q35 IOMMU support until QEMU is fixed. In exchange, it fixes physical virtio hardware as well as virtio-pci running under Xen. We should clean up the virtqueue API to do its own allocation and teach virtqueue_get_avail and virtqueue_get_used to return DMA addresses directly. Signed-off-by: Andy Lutomirski --- drivers/virtio/virtio_pci_common.h | 3 ++- drivers/virtio/virtio_pci_legacy.c | 19 +++ drivers/virtio/virtio_pci_modern.c | 34 -- 3 files changed, 41 insertions(+), 15 deletions(-) diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h index b976d968e793..cd6196b513ad 100644 --- a/drivers/virtio/virtio_pci_common.h +++ b/drivers/virtio/virtio_pci_common.h @@ -38,8 +38,9 @@ struct virtio_pci_vq_info { /* the number of entries in the queue */ int num; - /* the virtual address of the ring queue */ + /* the ring queue */ void *queue; + dma_addr_t queue_dma_addr; /* bus address */ /* the list node for the virtqueues list */ struct list_head node; diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c index 48bc9797e530..b5293e5f2af4 100644 --- a/drivers/virtio/virtio_pci_legacy.c +++ b/drivers/virtio/virtio_pci_legacy.c @@ -135,12 +135,14 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev, info->msix_vector = msix_vec; size = PAGE_ALIGN(vring_size(num, VIRTIO_PCI_VRING_ALIGN)); - info->queue = alloc_pages_exact(size, GFP_KERNEL|__GFP_ZERO); + info->queue = dma_zalloc_coherent(_dev->pci_dev->dev, size, + >queue_dma_addr, + GFP_KERNEL); if (info->queue == NULL) return ERR_PTR(-ENOMEM); /* activate the queue */ - iowrite32(virt_to_phys(info->queue) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT, + iowrite32(info->queue_dma_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN); /* create the vring */ @@ -169,7 +171,8 @@ out_assign: vring_del_virtqueue(vq); 
out_activate_queue: iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN); - free_pages_exact(info->queue, size); + dma_free_coherent(_dev->pci_dev->dev, size, + info->queue, info->queue_dma_addr); return ERR_PTR(err); } @@ -194,7 +197,8 @@ static void del_vq(struct virtio_pci_vq_info *info) iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN); size = PAGE_ALIGN(vring_size(info->num, VIRTIO_PCI_VRING_ALIGN)); - free_pages_exact(info->queue, size); + dma_free_coherent(_dev->pci_dev->dev, size, + info->queue, info->queue_dma_addr); } static const struct virtio_config_ops virtio_pci_config_ops = { @@ -227,6 +231,13 @@ int virtio_pci_legacy_probe(struct virtio_pci_device *vp_dev) return -ENODEV; } + rc = dma_set_mask_and_coherent(_dev->dev, DMA_BIT_MASK(64)); + if (rc) + rc = dma_set_mask_and_coherent(_dev->dev, + DMA_BIT_MASK(32)); + if (rc) + dev_warn(_dev->dev, "Failed to enable 64-bit or 32-bit DMA. Trying to continue, but this might not work.\n"); + rc = pci_request_region(pci_dev, 0, "virtio-pci-legacy"); if (rc) return rc; diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c index 8e5cf194cc0b..fbe0bd1c4881 100644 --- a/drivers/virtio/virtio_pci_modern.c +++ b/drivers/virtio/virtio_pci_modern.c @@ -293,14 +293,16 @@ static size_t vring_pci_size(u16 num) return PAGE_ALIGN(vring_size(num, SMP_CACHE_BYTES)); } -static void *alloc_virtqueue_pages(int *num) +static void *alloc_virtqueue_pages(struct virtio_pci_device *vp_dev, + int *num, dma_addr_t *dma_addr) { void *pages; /* TODO: allocate each queue chunk individually */ for (; *num && vring_pci_size(*num) > PAGE_SIZE; *num /= 2) { - pages = alloc_pages_exact(vring_pci_size(*num), - GFP_KERNEL|__GFP_ZERO|__GFP_NOWARN); + pages = dma_zalloc_coherent( + _dev->pci_dev->dev, vring_pci_size(*num), + dma_addr, GFP_KERNEL|__GFP_NOWARN); if (pages) return pages; } @@ -309,7 +311,9 @@ static void *alloc_virtqueue_pages(int *num) return NULL; /* Try to get a single page. You are my only hope! 
*/ - return alloc_pages_exact(vring_pci_size(*num), GFP_KERNEL|__GFP_ZERO); + return dma_zalloc_coherent( +
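The retry loop in `alloc_virtqueue_pages()` has a simple shape: halve the entry count until a contiguous allocation succeeds, then make one last attempt at the minimum size. A userspace sketch of that shape, with `calloc()` plus an artificial size limit standing in for `dma_zalloc_coherent()` failing under memory pressure (all sizes and names here are illustrative):

```c
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SIZE 4096

/* Bytes needed for a ring of num entries; 16 bytes per entry is an
 * illustrative stand-in for vring_pci_size(). */
static size_t ring_bytes(int num)
{
	return (size_t)num * 16;
}

/* Pretend contiguous allocations above this limit fail, the way the
 * real coherent allocator can under fragmentation. */
static size_t alloc_limit = 8 * 1024;

static void *try_alloc(size_t size)
{
	return size <= alloc_limit ? calloc(1, size) : NULL;
}

/* Mirror the patch's loop: halve the entry count until the ring fits,
 * then try once more at the single-page minimum. */
static void *alloc_queue(int *num)
{
	void *pages;

	for (; *num && ring_bytes(*num) > PAGE_SIZE; *num /= 2) {
		pages = try_alloc(ring_bytes(*num));
		if (pages)
			return pages;
	}
	if (!*num)
		return NULL;
	/* Try to get a single page. You are my only hope! */
	return try_alloc(ring_bytes(*num));
}
```

Note that `*num` is an in/out parameter: the caller learns how large a ring it actually got, which is why the kernel code passes `int *num` rather than a value.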
[PATCH 1/3] context_tracking: remove duplicate enabled check
All calls to context_tracking_enter and context_tracking_exit are already checking context_tracking_is_enabled, except the context_tracking_user_enter and context_tracking_user_exit functions left in for the benefit of assembly calls. Pull the check up to those functions, by making them simple wrappers around the user_enter and user_exit inline functions. Cc: Andy LutomirskiCc: Frederic Weisbecker Cc: Rik van Riel Cc: Paul McKenney Signed-off-by: Paolo Bonzini --- include/linux/context_tracking.h | 4 ++-- kernel/context_tracking.c| 16 ++-- 2 files changed, 4 insertions(+), 16 deletions(-) diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h index 008fc67d0d96..6ef136ff0897 100644 --- a/include/linux/context_tracking.h +++ b/include/linux/context_tracking.h @@ -18,13 +18,13 @@ extern void context_tracking_user_exit(void); static inline void user_enter(void) { if (context_tracking_is_enabled()) - context_tracking_user_enter(); + context_tracking_enter(CONTEXT_USER); } static inline void user_exit(void) { if (context_tracking_is_enabled()) - context_tracking_user_exit(); + context_tracking_exit(CONTEXT_USER); } static inline enum ctx_state exception_enter(void) diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c index 0a495ab35bc7..6d4c6ce21275 100644 --- a/kernel/context_tracking.c +++ b/kernel/context_tracking.c @@ -63,15 +63,6 @@ void context_tracking_enter(enum ctx_state state) unsigned long flags; /* -* Repeat the user_enter() check here because some archs may be calling -* this from asm and if no CPU needs context tracking, they shouldn't -* go further. Repeat the check here until they support the inline static -* key check. 
-*/ - if (!context_tracking_is_enabled()) - return; - - /* * Some contexts may involve an exception occuring in an irq, * leading to that nesting: * rcu_irq_enter() rcu_user_exit() rcu_user_exit() rcu_irq_exit() @@ -128,7 +119,7 @@ EXPORT_SYMBOL_GPL(context_tracking_enter); void context_tracking_user_enter(void) { - context_tracking_enter(CONTEXT_USER); + user_enter(); } NOKPROBE_SYMBOL(context_tracking_user_enter); @@ -148,9 +139,6 @@ void context_tracking_exit(enum ctx_state state) { unsigned long flags; - if (!context_tracking_is_enabled()) - return; - if (in_interrupt()) return; @@ -181,7 +169,7 @@ EXPORT_SYMBOL_GPL(context_tracking_exit); void context_tracking_user_exit(void) { - context_tracking_exit(CONTEXT_USER); + user_exit(); } NOKPROBE_SYMBOL(context_tracking_user_exit); -- 2.5.0 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
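The shape of this refactoring, hoisting the cheap "enabled" check into an inline wrapper so that the out-of-line function stops repeating it, is easy to see in a minimal userspace sketch. A plain int stands in for the context-tracking static key, and the names are illustrative:

```c
/* A plain int stands in for the context-tracking static key. */
static int tracking_enabled;

/* Counts how many times the expensive out-of-line path actually ran. */
static int slowpath_calls;

/* Out-of-line body: after the patch it no longer re-checks the key,
 * because every caller is a wrapper that already did. */
static void context_enter_slowpath(void)
{
	slowpath_calls++;
}

/* Inline wrapper, analogous to user_enter(): now the single place
 * where the enabled check lives. */
static inline void user_enter_sketch(void)
{
	if (tracking_enabled)
		context_enter_slowpath();
}
```

When tracking is disabled, callers pay only the inline test and never take the function call, which is exactly the duplicate check the patch removes from `context_tracking_enter()` and `context_tracking_exit()`.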
[PATCH 2/3] context_tracking: avoid irq_save/irq_restore on guest entry and exit
guest_enter and guest_exit must be called with interrupts disabled, since they take the vtime_seqlock with write_seq{lock,unlock}. Therefore, it is not necessary to check for exceptions, nor to save/restore the IRQ state, when context tracking functions are called by guest_enter and guest_exit. Split the body of context_tracking_entry and context_tracking_exit out to __-prefixed functions, and use them from KVM. Rik van Riel has measured this to speed up a tight vmentry/vmexit loop by about 2%. Cc: Andy LutomirskiCc: Frederic Weisbecker Cc: Rik van Riel Cc: Paul McKenney Signed-off-by: Paolo Bonzini --- include/linux/context_tracking.h | 8 +++-- kernel/context_tracking.c| 64 2 files changed, 44 insertions(+), 28 deletions(-) diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h index 6ef136ff0897..68b575afe5f5 100644 --- a/include/linux/context_tracking.h +++ b/include/linux/context_tracking.h @@ -10,6 +10,10 @@ #ifdef CONFIG_CONTEXT_TRACKING extern void context_tracking_cpu_set(int cpu); +/* Called with interrupts disabled. 
*/ +extern void __context_tracking_enter(enum ctx_state state); +extern void __context_tracking_exit(enum ctx_state state); + extern void context_tracking_enter(enum ctx_state state); extern void context_tracking_exit(enum ctx_state state); extern void context_tracking_user_enter(void); @@ -88,13 +92,13 @@ static inline void guest_enter(void) current->flags |= PF_VCPU; if (context_tracking_is_enabled()) - context_tracking_enter(CONTEXT_GUEST); + __context_tracking_enter(CONTEXT_GUEST); } static inline void guest_exit(void) { if (context_tracking_is_enabled()) - context_tracking_exit(CONTEXT_GUEST); + __context_tracking_exit(CONTEXT_GUEST); if (vtime_accounting_enabled()) vtime_guest_exit(current); diff --git a/kernel/context_tracking.c b/kernel/context_tracking.c index 6d4c6ce21275..d8560ee3bab7 100644 --- a/kernel/context_tracking.c +++ b/kernel/context_tracking.c @@ -58,27 +58,13 @@ static void context_tracking_recursion_exit(void) * instructions to execute won't use any RCU read side critical section * because this function sets RCU in extended quiescent state. */ -void context_tracking_enter(enum ctx_state state) +void __context_tracking_enter(enum ctx_state state) { - unsigned long flags; - - /* -* Some contexts may involve an exception occuring in an irq, -* leading to that nesting: -* rcu_irq_enter() rcu_user_exit() rcu_user_exit() rcu_irq_exit() -* This would mess up the dyntick_nesting count though. And rcu_irq_*() -* helpers are enough to protect RCU uses inside the exception. So -* just return immediately if we detect we are in an IRQ. 
-*/ - if (in_interrupt()) - return; - /* Kernel threads aren't supposed to go to userspace */ WARN_ON_ONCE(!current->mm); - local_irq_save(flags); if (!context_tracking_recursion_enter()) - goto out_irq_restore; + return; if ( __this_cpu_read(context_tracking.state) != state) { if (__this_cpu_read(context_tracking.active)) { @@ -111,7 +97,27 @@ void context_tracking_enter(enum ctx_state state) __this_cpu_write(context_tracking.state, state); } context_tracking_recursion_exit(); -out_irq_restore: +} +NOKPROBE_SYMBOL(__context_tracking_enter); +EXPORT_SYMBOL_GPL(__context_tracking_enter); + +void context_tracking_enter(enum ctx_state state) +{ + unsigned long flags; + + /* +* Some contexts may involve an exception occuring in an irq, +* leading to that nesting: +* rcu_irq_enter() rcu_user_exit() rcu_user_exit() rcu_irq_exit() +* This would mess up the dyntick_nesting count though. And rcu_irq_*() +* helpers are enough to protect RCU uses inside the exception. So +* just return immediately if we detect we are in an IRQ. +*/ + if (in_interrupt()) + return; + + local_irq_save(flags); + __context_tracking_enter(state); local_irq_restore(flags); } NOKPROBE_SYMBOL(context_tracking_enter); @@ -135,16 +141,10 @@ NOKPROBE_SYMBOL(context_tracking_user_enter); * This call supports re-entrancy. This way it can be called from any exception * handler without needing to know if we came from userspace or not. */ -void context_tracking_exit(enum ctx_state state) +void __context_tracking_exit(enum ctx_state state) { - unsigned long flags; - - if (in_interrupt()) - return; - - local_irq_save(flags); if (!context_tracking_recursion_enter()) - goto out_irq_restore; +
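The structure of this change, an outer function that saves and restores IRQ state around a `__`-prefixed body while `guest_enter`/`guest_exit` call the body directly because interrupts are already off, can be sketched in userspace C with a flag standing in for the CPU interrupt state (all names below are illustrative):

```c
/* Stand-ins: a flag models the CPU interrupt-enable state, and
 * counters let us observe which path ran. */
static int irqs_disabled_state;
static int save_restore_ops;
static int body_calls;

/* The __-prefixed body: callers guarantee interrupts are disabled. */
static void __ctx_enter(void)
{
	body_calls++;
}

/* Slow path for callers with unknown IRQ state, like the asm entry
 * hooks: save, disable, run the body, restore. */
static void ctx_enter(void)
{
	int saved = irqs_disabled_state;	/* local_irq_save() stand-in */

	irqs_disabled_state = 1;
	save_restore_ops++;
	__ctx_enter();
	irqs_disabled_state = saved;		/* local_irq_restore() stand-in */
}

/* guest_enter() analogue: interrupts are known to be disabled here,
 * so the save/restore pair is skipped entirely. */
static void guest_enter_sketch(void)
{
	__ctx_enter();
}
```

The measured ~2% vmentry/vmexit win comes from the fast path eliding the save/restore pair on every guest transition while the general-purpose entry point keeps it.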
[PATCH 3/3] x86: context_tracking: avoid irq_save/irq_restore on kernel entry and exit
x86 always calls user_enter and user_exit with interrupt disabled. Therefore, it is not necessary to check for exceptions, nor to save/restore the IRQ state, when context tracking functions are called by guest_enter and guest_exit. Use the previously introduced __context_tracking_entry and __context_tracking_exit. Cc: Andy LutomirskiCc: Frederic Weisbecker Cc: Rik van Riel Cc: Paul McKenney Signed-off-by: Paolo Bonzini --- arch/x86/entry/common.c | 4 ++-- include/linux/context_tracking.h | 42 ++-- 2 files changed, 30 insertions(+), 16 deletions(-) diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c index 80dcc9261ca3..ba707c6da818 100644 --- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c @@ -33,7 +33,7 @@ __visible void enter_from_user_mode(void) { CT_WARN_ON(ct_state() != CONTEXT_USER); - user_exit(); + __user_exit(); } #endif @@ -262,7 +262,7 @@ __visible void prepare_exit_to_usermode(struct pt_regs *regs) local_irq_disable(); } - user_enter(); + __user_enter(); } /* diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h index 68b575afe5f5..d36a5c8085b7 100644 --- a/include/linux/context_tracking.h +++ b/include/linux/context_tracking.h @@ -19,18 +19,6 @@ extern void context_tracking_exit(enum ctx_state state); extern void context_tracking_user_enter(void); extern void context_tracking_user_exit(void); -static inline void user_enter(void) -{ - if (context_tracking_is_enabled()) - context_tracking_enter(CONTEXT_USER); - -} -static inline void user_exit(void) -{ - if (context_tracking_is_enabled()) - context_tracking_exit(CONTEXT_USER); -} - static inline enum ctx_state exception_enter(void) { enum ctx_state prev_ctx; @@ -67,13 +55,39 @@ static inline enum ctx_state ct_state(void) this_cpu_read(context_tracking.state) : CONTEXT_DISABLED; } #else -static inline void user_enter(void) { } -static inline void user_exit(void) { } +static inline void __context_tracking_enter(enum ctx_state state) { } +static inline void 
__context_tracking_exit(enum ctx_state state) { } +static inline void context_tracking_enter(enum ctx_state state) { } +static inline void context_tracking_exit(enum ctx_state state) { } static inline enum ctx_state exception_enter(void) { return 0; } static inline void exception_exit(enum ctx_state prev_ctx) { } static inline enum ctx_state ct_state(void) { return CONTEXT_DISABLED; } #endif /* !CONFIG_CONTEXT_TRACKING */ +static inline void __user_enter(void) +{ + if (context_tracking_is_enabled()) + __context_tracking_enter(CONTEXT_USER); + +} +static inline void __user_exit(void) +{ + if (context_tracking_is_enabled()) + __context_tracking_exit(CONTEXT_USER); +} + +static inline void user_enter(void) +{ + if (context_tracking_is_enabled()) + context_tracking_enter(CONTEXT_USER); + +} +static inline void user_exit(void) +{ + if (context_tracking_is_enabled()) + context_tracking_exit(CONTEXT_USER); +} + #define CT_WARN_ON(cond) WARN_ON(context_tracking_is_enabled() && (cond)) #ifdef CONFIG_CONTEXT_TRACKING_FORCE -- 2.5.0 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/RFC 0/4] dma ops and virtio
Hi Christian,

On Tue, Oct 27, 2015 at 11:48:47PM +0100, Christian Borntraeger wrote:
> Here is a very quick (still untested) shot at providing the s390 part:
> - patch1: dummy dma ops, inspired by the alpha code
> - patch2: replace some of the alpha functions with the dummy ones
> - patch3: allow per device dma ops for s390
> - patch4: wire up virtio dma ops

Thanks for the patches, I think they are a good start. I sent you
comments for two of the patches, the others look good to me.

	Joerg
Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
On 27/10/2015 22:26, Yunhong Jiang wrote:
>> > On RT kernels however can you call eventfd_signal from interrupt
>> > context? You cannot call spin_lock_irqsave (which can sleep) from a
>> > non-threaded interrupt handler, can you? You would need a raw spin lock.
> Thanks for pointing this out. Yes, we can't call spin_lock_irqsave on RT
> kernel. Will do this way on next patch. But not sure if it's overkill to
> use raw_spinlock there since the eventfd_signal is used by other callers
> also.

No, I don't think you can use raw_spinlock there. The problem is not
just eventfd_signal, it is especially wake_up_locked_poll. You cannot
convert the whole workqueue infrastructure to use raw_spinlock.

Alex, would it make sense to use the IRQ bypass infrastructure always,
not just for VT-d, to do the MSI injection directly from the VFIO
interrupt handler and bypass the eventfd?

Basically this would add an RCU-protected list of consumers matching the
token to struct irq_bypass_producer, and an

	int (*inject)(struct irq_bypass_consumer *);

callback to struct irq_bypass_consumer. If any callback returns true,
the eventfd is not signaled. The KVM implementation would be like this
(compare with virt/kvm/eventfd.c):

	/* Extracted out of irqfd_wakeup */
	static int irqfd_wakeup_pollin(struct kvm_kernel_irqfd *irqfd)
	{
		...
	}

	/* Extracted out of irqfd_wakeup */
	static int irqfd_wakeup_pollhup(struct kvm_kernel_irqfd *irqfd)
	{
		...
	}

	static int irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync,
				void *key)
	{
		struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
		unsigned long flags = (unsigned long)key;

		if (flags & POLLIN)
			irqfd_wakeup_pollin(irqfd);
		if (flags & POLLHUP)
			irqfd_wakeup_pollhup(irqfd);

		return 0;
	}

	static int kvm_arch_irq_bypass_inject(struct irq_bypass_consumer *cons)
	{
		struct kvm_kernel_irqfd *irqfd =
			container_of(cons, struct kvm_kernel_irqfd, consumer);

		irqfd_wakeup_pollin(irqfd);
	}

Or do you think it would be a hack?
The latency improvement might actually be even better than what Yunhong
is already reporting.

Paolo
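Stripped of RCU and the kernel plumbing, the producer/consumer contract Paolo proposes above (try every registered inject() callback, and fall back to signaling the eventfd only if none of them claims the interrupt) can be sketched in plain userspace C. All of the names below are illustrative stand-ins, not the real irq_bypass API:

```c
#include <stddef.h>

#define MAX_CONSUMERS 4

struct consumer {
	/* Returns nonzero if the interrupt was injected directly. */
	int (*inject)(struct consumer *c);
};

struct producer {
	struct consumer *consumers[MAX_CONSUMERS];
	int nconsumers;
	int eventfd_signals;	/* fallback path, counts eventfd_signal() */
};

static int producer_add_consumer(struct producer *p, struct consumer *c)
{
	if (p->nconsumers >= MAX_CONSUMERS)
		return -1;
	p->consumers[p->nconsumers++] = c;
	return 0;
}

/* Interrupt-handler path: offer the interrupt to each consumer, and
 * only signal the eventfd if nobody handled it. */
static void producer_fire(struct producer *p)
{
	int i;

	for (i = 0; i < p->nconsumers; i++)
		if (p->consumers[i]->inject(p->consumers[i]))
			return;			/* handled: skip the eventfd */
	p->eventfd_signals++;
}

/* A KVM-like consumer: struct consumer is the first member, so a cast
 * works as a poor man's container_of(). */
struct kvm_consumer {
	struct consumer c;
	int injected;
};

static int kvm_inject(struct consumer *c)
{
	struct kvm_consumer *k = (struct kvm_consumer *)c;

	k->injected++;		/* irqfd_wakeup_pollin() would run here */
	return 1;
}
```

The point of the design is visible in the fallback counter: once a consumer is registered, the eventfd is never touched for that interrupt.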
[PATCH 0/3] context_tracking: streamline code, avoid IRQ save/restore
The first two of these patches were posted last February, the last one
is new. Rik's old measurements were that it shaved around .3 microseconds
on each iteration of his KVM benchmark.

I guess three days before the start of the merge window is not the best
time to post patches. However, I brought this series up at kernel summit
yesterday, and Andy's cleanups actually make it trivial to apply this to
syscall entry. So here it is, perhaps it's worth it.

Assuming it works, of course, because this is compile-tested only. :)

Paolo

Paolo Bonzini (3):
  context_tracking: remove duplicate enabled check
  context_tracking: avoid irq_save/irq_restore on guest entry and exit
  x86: context_tracking: avoid irq_save/irq_restore on kernel entry and
    exit

 arch/x86/entry/common.c          |  4 +-
 include/linux/context_tracking.h | 50 +
 kernel/context_tracking.c        | 80
 3 files changed, 76 insertions(+), 58 deletions(-)

-- 
2.5.0
Re: [PATCH 4/4] s390/virtio: use noop dma ops
On Tue, Oct 27, 2015 at 11:48:51PM +0100, Christian Borntraeger wrote:
> @@ -1093,6 +1094,7 @@ static void virtio_ccw_auto_online(void *data, async_cookie_t cookie)
> 	struct ccw_device *cdev = data;
> 	int ret;
> 
> +	cdev->dev.archdata.dma_ops = &dma_noop_ops;
> 	ret = ccw_device_set_online(cdev);
> 	if (ret)
> 		dev_warn(&cdev->dev, "Failed to set online: %d\n", ret);

Hmm, drivers usually don't deal with setting the dma_ops for their
devices, as they depend on the platform and not so much on the device
itself. Can you do this special-case handling from device independent
platform code, where you also setup dma_ops for other devices?

	Joerg
Re: [PATCH 2/3] virtio_ring: Support DMA APIs
Am 28.10.2015 um 10:17 schrieb Andy Lutomirski: @@ -423,27 +522,42 @@ EXPORT_SYMBOL_GPL(virtqueue_kick); > > static void detach_buf(struct vring_virtqueue *vq, unsigned int head) > { > - unsigned int i; > + unsigned int i, j; > + u16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT); > > /* Clear data ptr. */ > - vq->data[head] = NULL; > + vq->desc_state[head].data = NULL; > > - /* Put back on free list: find end */ > + /* Put back on free list: unmap first-level descriptors and find end */ > i = head; > > - /* Free the indirect table */ > - if (vq->vring.desc[i].flags & cpu_to_virtio16(vq->vq.vdev, > VRING_DESC_F_INDIRECT)) > - kfree(phys_to_virt(virtio64_to_cpu(vq->vq.vdev, > vq->vring.desc[i].addr))); > - > - while (vq->vring.desc[i].flags & cpu_to_virtio16(vq->vq.vdev, > VRING_DESC_F_NEXT)) { > + while (vq->vring.desc[i].flags & nextflag) { > + vring_unmap_one(vq, >vring.desc[i]); > i = virtio16_to_cpu(vq->vq.vdev, vq->vring.desc[i].next); > vq->vq.num_free++; > } > > + vring_unmap_one(vq, >vring.desc[i]); > vq->vring.desc[i].next = cpu_to_virtio16(vq->vq.vdev, vq->free_head); > vq->free_head = head; > + > /* Plus final descriptor */ > vq->vq.num_free++; > + > + /* Free the indirect table, if any, now that it's unmapped. */ > + if (vq->desc_state[head].indir_desc) { > + struct vring_desc *indir_desc = vq->desc_state[head].indir_desc; > + u32 len = vq->vring.desc[head].len; > + > + BUG_ON(!(vq->vring.desc[head].flags & VRING_DESC_F_INDIRECT)); > + BUG_ON(len == 0 || len % sizeof(struct vring_desc)); > + > + for (j = 0; j < len / sizeof(struct vring_desc); j++) > + vring_unmap_one(vq, _desc[j]); > + > + kfree(vq->desc_state[head].indir_desc); > + vq->desc_state[head].indir_desc = NULL; > + } > } something seems to be broken with indirect descriptors with that change [1.885451] [ cut here ] [1.885454] kernel BUG at drivers/virtio/virtio_ring.c:552! 
[1.885487] illegal operation: 0001 ilc:1 [#1] SMP [1.885489] Modules linked in: [1.885491] CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.3.0-rc3+ #250 [1.885493] task: 0be49988 ti: 0be54000 task.ti: 0be54000 [1.885514] Krnl PSW : 0404c0018000 0059b9d2 (detach_buf+0x1ca/0x1d0) [1.885519]R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3 Krnl GPRS: 3000 0c0c8c00 0a89c000 [1.885522]0059b8fa 0001 [1.885523] 0100 0a89c000 [1.885525]0c082000 0c082000 0059b8fa 0c7fbc88 [1.885531] Krnl Code: 0059b9c6: a7f4ff40brc 15,59b846 0059b9ca: a7f40001 brc 15,59b9cc #0059b9ce: a7f40001 brc 15,59b9d0 >0059b9d2: 0707 bcr 0,%r7 0059b9d4: 0707 bcr 0,%r7 0059b9d6: 0707 bcr 0,%r7 0059b9d8: c004 brcl0,59b9d8 0059b9de: ebcff0780024 stmg%r12,%r15,120(%r15) [1.885542] Call Trace: [1.885544] ([<0059b8fa>] detach_buf+0xf2/0x1d0) [1.885546] [<0059bbb4>] virtqueue_get_buf+0xcc/0x218 [1.885549] [<005dd8fe>] virtblk_done+0xa6/0x120 [1.885550] [<0059b66e>] vring_interrupt+0x7e/0x108 [1.885552] [<006c21a4>] virtio_airq_handler+0x7c/0x120 [1.885554] [<00657e54>] do_airq_interrupt+0xa4/0xc8 [1.885558] [<001b79a0>] handle_irq_event_percpu+0x60/0x1f0 [1.885560] [<001bbbea>] handle_percpu_irq+0x72/0xa0 [1.885561] [<001b6fa4>] generic_handle_irq+0x4c/0x78 [1.885564] [<0010cc7c>] do_IRQ+0x64/0x88 [1.885566] Btrfs loaded [1.885600] [<0080e30a>] io_int_handler+0x10a/0x218 [1.885603] [<00186b9e>] wake_up_new_task+0x1be/0x260 [1.885604] ([<00186b6a>] wake_up_new_task+0x18a/0x260) [1.885606] [<00156182>] _do_fork+0xfa/0x338 [1.885608] [<0015644e>] kernel_thread+0x4e/0x60 [1.885609] [<00179202>] kthreadd+0x1a2/0x238 [1.885611] [<0080e056>] kernel_thread_starter+0x6/0xc [1.885613] [<0080e050>] kernel_thread_starter+0x0/0xc [1.885614] Last Breaking-Event-Address: [1.885615] [<0059b9ce>] detach_buf+0x1c6/0x1d0 [1.885617] [1.885618] Kernel panic - not syncing: Fatal exception in interrupt -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a
Re: [PATCH 2/3] virtio_ring: Support DMA APIs
Hi Andy,

On Tue, Oct 27, 2015 at 06:17:09PM -0700, Andy Lutomirski wrote:
> From: Andy Lutomirski
>
> virtio_ring currently sends the device (usually a hypervisor)
> physical addresses of its I/O buffers. This is okay when DMA
> addresses and physical addresses are the same thing, but this isn't
> always the case. For example, this never works on Xen guests, and
> it is likely to fail if a physical "virtio" device ever ends up
> behind an IOMMU or swiotlb.

The overall code looks good, but I haven't seen any dma_sync* calls.
When swiotlb=force is in use this would break.

> +	vq->vring.desc[head].addr = cpu_to_virtio64(_vq->vdev, vring_map_single(
> +		vq,
> +		desc, total_sg * sizeof(struct vring_desc),
> +		DMA_TO_DEVICE));

Nit: I think readability could be improved here by using a temporary
variable for the return value of vring_map_single().

	Joerg
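Joerg's dma_sync point is easiest to see with a toy bounce-buffer model: under swiotlb=force the device only ever reads a bounce copy of the buffer, so CPU writes made after the mapping stay invisible until a sync_for_device. The following is a userspace caricature of that mechanism (the struct and function names are invented for illustration, not the real DMA API):

```c
#include <string.h>

#define BUF_LEN 4

/* A streaming mapping through a bounce buffer: the "device" sees only
 * the bounce copy, never the CPU-side buffer. */
struct mapping {
	char *cpu_buf;
	char bounce[BUF_LEN];
};

static void dma_map_for_device(struct mapping *m, char *cpu_buf)
{
	m->cpu_buf = cpu_buf;
	memcpy(m->bounce, cpu_buf, BUF_LEN);	/* copied once, at map time */
}

/* What a dma_sync_single_for_device() call provides: re-copy CPU-side
 * updates into the bounce buffer before the device looks again. */
static void dma_sync_for_device(struct mapping *m)
{
	memcpy(m->bounce, m->cpu_buf, BUF_LEN);
}

static char device_read(const struct mapping *m, int off)
{
	return m->bounce[off];		/* device DMA hits the bounce copy */
}
```

Without the sync call, a driver that reuses a mapped buffer hands the device stale data, which is exactly the breakage Joerg warns about.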
Re: [PATCH 1/3] virtio_net: Stop doing DMA from the stack
On Tue, Oct 27, 2015 at 06:17:08PM -0700, Andy Lutomirski wrote:
> From: Andy Lutomirski
>
> Once virtio starts using the DMA API, we won't be able to safely DMA
> from the stack. virtio-net does a couple of config DMA requests
> from small stack buffers -- switch to using dynamically-allocated
> memory.
>
> This should have no effect on any performance-critical code paths.
>
> Signed-off-by: Andy Lutomirski
> ---
>  drivers/net/virtio_net.c | 53
>  1 file changed, 36 insertions(+), 17 deletions(-)

Reviewed-by: Joerg Roedel
Re: [PATCH] KVM: x86: fix eflags state following processor init/reset
Ping, :-)

On 10/21/15 2:28 PM, Wanpeng Li wrote:
> Reference SDM 3.4.3: Following initialization of the processor (either
> by asserting the RESET pin or the INIT pin), the state of the EFLAGS
> register is 00000002H. However, the EFLAGS fixed bit is not set and
> other bits are also not cleared during init/reset in KVM. This patch
> fixes it by setting the EFLAGS register to 00000002H following
> initialization of the processor.
>
> Signed-off-by: Wanpeng Li
> ---
>  arch/x86/kvm/vmx.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index b680c2e..326f6ea 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -4935,6 +4935,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>  	vmx_set_efer(vcpu, 0);
>  	vmx_fpu_activate(vcpu);
>  	update_exception_bitmap(vcpu);
> +	vmx_set_rflags(vcpu, X86_EFLAGS_FIXED);
>  	vpid_sync_context(vmx->vpid);
>  }
Re: [PATCH 2/3] virtio_ring: Support DMA APIs
On Tue, Oct 27, 2015 at 7:27 PM, Christian Borntraegerwrote: > Am 28.10.2015 um 10:17 schrieb Andy Lutomirski: > @@ -423,27 +522,42 @@ EXPORT_SYMBOL_GPL(virtqueue_kick); >> >> static void detach_buf(struct vring_virtqueue *vq, unsigned int head) >> { >> - unsigned int i; >> + unsigned int i, j; >> + u16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT); >> >> /* Clear data ptr. */ >> - vq->data[head] = NULL; >> + vq->desc_state[head].data = NULL; >> >> - /* Put back on free list: find end */ >> + /* Put back on free list: unmap first-level descriptors and find end */ >> i = head; >> >> - /* Free the indirect table */ >> - if (vq->vring.desc[i].flags & cpu_to_virtio16(vq->vq.vdev, >> VRING_DESC_F_INDIRECT)) >> - kfree(phys_to_virt(virtio64_to_cpu(vq->vq.vdev, >> vq->vring.desc[i].addr))); >> - >> - while (vq->vring.desc[i].flags & cpu_to_virtio16(vq->vq.vdev, >> VRING_DESC_F_NEXT)) { >> + while (vq->vring.desc[i].flags & nextflag) { >> + vring_unmap_one(vq, >vring.desc[i]); >> i = virtio16_to_cpu(vq->vq.vdev, vq->vring.desc[i].next); >> vq->vq.num_free++; >> } >> >> + vring_unmap_one(vq, >vring.desc[i]); >> vq->vring.desc[i].next = cpu_to_virtio16(vq->vq.vdev, vq->free_head); >> vq->free_head = head; >> + >> /* Plus final descriptor */ >> vq->vq.num_free++; >> + >> + /* Free the indirect table, if any, now that it's unmapped. */ >> + if (vq->desc_state[head].indir_desc) { >> + struct vring_desc *indir_desc = >> vq->desc_state[head].indir_desc; >> + u32 len = vq->vring.desc[head].len; >> + >> + BUG_ON(!(vq->vring.desc[head].flags & VRING_DESC_F_INDIRECT)); >> + BUG_ON(len == 0 || len % sizeof(struct vring_desc)); >> + >> + for (j = 0; j < len / sizeof(struct vring_desc); j++) >> + vring_unmap_one(vq, _desc[j]); >> + >> + kfree(vq->desc_state[head].indir_desc); >> + vq->desc_state[head].indir_desc = NULL; >> + } >> } > > something seems to be broken with indirect descriptors with that change You're big endian, right? 
I had an endian handling bug, and I'll fix it in v2.

--Andy
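The class of bug Andy is owning up to is easy to reproduce: vring fields are stored in virtio (little-endian) byte order, so a flag test that forgets to convert one side passes silently on little-endian x86 and only blows up on big-endian s390, exactly as Christian's crash shows. A minimal sketch, with a boolean standing in for the CPU endianness and simplified helper names:

```c
#include <stdint.h>

#define VRING_DESC_F_NEXT 0x1

/* Pretend we are on a big-endian CPU such as s390. */
static int cpu_is_big_endian = 1;

/* virtio 1.0 fields are little-endian on the wire: swap on BE CPUs. */
static uint16_t cpu_to_virtio16_sketch(uint16_t v)
{
	return cpu_is_big_endian ? (uint16_t)((v << 8) | (v >> 8)) : v;
}

/* Buggy test: masks the stored (wire-order) field with a native-endian
 * constant. Works on x86, misses the flag on s390. */
static int flag_set_buggy(uint16_t stored_flags)
{
	return (stored_flags & VRING_DESC_F_NEXT) != 0;
}

/* Fixed test: convert the constant to the stored byte order first,
 * the way the patch does with cpu_to_virtio16(vq->vq.vdev, ...). */
static int flag_set_fixed(uint16_t stored_flags)
{
	return (stored_flags & cpu_to_virtio16_sketch(VRING_DESC_F_NEXT)) != 0;
}
```

On a BE CPU the stored flag byte lands in the high half of the u16, so the un-converted mask sees zero and detach_buf-style chain walking goes wrong.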
Re: [PATCH 1/3] context_tracking: remove duplicate enabled check
On Tue, Oct 27, 2015 at 6:39 PM, Paolo Bonzini wrote:
> All calls to context_tracking_enter and context_tracking_exit
> are already checking context_tracking_is_enabled, except the
> context_tracking_user_enter and context_tracking_user_exit
> functions left in for the benefit of assembly calls.
>
> Pull the check up to those functions, by making them simple
> wrappers around the user_enter and user_exit inline functions.

This makes my brain hurt. Assuming that this survives a boot with
CONFIG_CONTEXT_TRACKING_FORCE=y and CONFIG_PROVE_LOCKING=y (with the
implied CONFIG_PROVE_RCU), then:

Acked-by: Andy Lutomirski
Re: [PATCH 2/3] context_tracking: avoid irq_save/irq_restore on guest entry and exit
On Tue, Oct 27, 2015 at 6:39 PM, Paolo Bonzini wrote:
> guest_enter and guest_exit must be called with interrupts disabled,
> since they take the vtime_seqlock with write_seq{lock,unlock}.
> Therefore, it is not necessary to check for exceptions, nor to
> save/restore the IRQ state, when context tracking functions are
> called by guest_enter and guest_exit.
>
> Split the body of context_tracking_enter and context_tracking_exit
> out to __-prefixed functions, and use them from KVM.
>
> Rik van Riel has measured this to speed up a tight vmentry/vmexit
> loop by about 2%.

Looks generally sensible. I'm not familiar enough with the code to
call it reviewed-by while sitting on the airport shuttle.

--Andy
Re: [PATCH 0/3] context_tracking: streamline code, avoid IRQ save/restore
On 28/10/2015 02:39, Paolo Bonzini wrote: > The first two of these patches were posted last February, the last one > is new. Rik's old measurements were that it shaved around .3 microseconds > on each iteration of his KVM benchmark. > > I guess three days before the start of the merge window is not > the best time to post patches. However, I brought this series up at > kernel summit yesterday, and Andy's cleanups actually make it trivial > to apply this to syscall entry. So here it is, perhaps it's worth it. > > Assuming it works, of course, because this is compile-tested only. :) Heh, Andy said it doesn't. :) However, we could still merge the first two patches for 4.4 since they have been tested. I can even take them myself if I get Acked-by. The third one will have to wait for 4.5. Thanks, Paolo
Re: [PATCH 1/3] virtio_net: Stop doing DMA from the stack
On Tue, Oct 27, 2015 at 7:07 PM, Joerg Roedel wrote: > On Tue, Oct 27, 2015 at 06:17:08PM -0700, Andy Lutomirski wrote: >> From: Andy Lutomirski >> >> Once virtio starts using the DMA API, we won't be able to safely DMA >> from the stack. virtio-net does a couple of config DMA requests >> from small stack buffers -- switch to using dynamically-allocated >> memory. >> >> This should have no effect on any performance-critical code paths. >> >> Signed-off-by: Andy Lutomirski >> --- >> drivers/net/virtio_net.c | 53 >> >> 1 file changed, 36 insertions(+), 17 deletions(-) > > Reviewed-by: Joerg Roedel Should I just send this to netdev now for 4.4? We could get this one out of the way, maybe. --Andy -- Andy Lutomirski AMA Capital Management, LLC
Re: [PATCH 3/3] virtio_pci: Use the DMA API
On Tue, Oct 27, 2015 at 06:17:10PM -0700, Andy Lutomirski wrote: > From: Andy Lutomirski > > This fixes virtio-pci on platforms and busses that have IOMMUs. This > will break the experimental QEMU Q35 IOMMU support until QEMU is > fixed. In exchange, it fixes physical virtio hardware as well as > virtio-pci running under Xen. > > We should clean up the virtqueue API to do its own allocation and > teach virtqueue_get_avail and virtqueue_get_used to return DMA > addresses directly. > > Signed-off-by: Andy Lutomirski > --- > drivers/virtio/virtio_pci_common.h | 3 ++- > drivers/virtio/virtio_pci_legacy.c | 19 +++ > drivers/virtio/virtio_pci_modern.c | 34 -- > 3 files changed, 41 insertions(+), 15 deletions(-) Same here, you need to call the dma_sync* functions when passing data from/to the virtio-device. I think a good test for that is to boot a virtio kvm-guest with swiotlb=force and see if it still works. Joerg
Re: [PATCH 2/3] virtio_ring: Support DMA APIs
On Tue, Oct 27, 2015 at 07:13:56PM -0700, Andy Lutomirski wrote: > On Tue, Oct 27, 2015 at 7:06 PM, Joerg Roedel wrote: > > Hi Andy, > > > > On Tue, Oct 27, 2015 at 06:17:09PM -0700, Andy Lutomirski wrote: > >> From: Andy Lutomirski > >> > >> virtio_ring currently sends the device (usually a hypervisor) > >> physical addresses of its I/O buffers. This is okay when DMA > >> addresses and physical addresses are the same thing, but this isn't > >> always the case. For example, this never works on Xen guests, and > >> it is likely to fail if a physical "virtio" device ever ends up > >> behind an IOMMU or swiotlb. > > > > The overall code looks good, but I haven't seen any dma_sync* calls. > > When swiotlb=force is in use this would break. > > > >> + vq->vring.desc[head].addr = cpu_to_virtio64(_vq->vdev, > >> vring_map_single( > >> + vq, > >> + desc, total_sg * sizeof(struct vring_desc), > >> + DMA_TO_DEVICE)); > > > > Are you talking about a dma_sync call on the descriptor ring itself? > Isn't dma_alloc_coherent supposed to make that unnecessary? I should > move the allocation into the virtqueue code. > > The docs suggest that I might need to "flush the processor's write > buffers before telling devices to read that memory". I'm not sure how > to do that. The write buffers should be flushed by the dma-api functions if necessary. For dma_alloc_coherent allocations you don't need to call dma_sync*, but for the map_single/map_page/map_sg ones, as these might be bounce-buffered. Joerg
Re: [PATCH 3/3] virtio_pci: Use the DMA API
On Wed, 2015-10-28 at 11:15 +0900, Joerg Roedel wrote: > On Tue, Oct 27, 2015 at 06:17:10PM -0700, Andy Lutomirski wrote: > > From: Andy Lutomirski > > > > This fixes virtio-pci on platforms and busses that have IOMMUs. > > This > > will break the experimental QEMU Q35 IOMMU support until QEMU is > > fixed. In exchange, it fixes physical virtio hardware as well as > > virtio-pci running under Xen. > > > > We should clean up the virtqueue API to do its own allocation and > > teach virtqueue_get_avail and virtqueue_get_used to return DMA > > addresses directly. > > > > Signed-off-by: Andy Lutomirski > > --- > > drivers/virtio/virtio_pci_common.h | 3 ++- > > drivers/virtio/virtio_pci_legacy.c | 19 +++ > > drivers/virtio/virtio_pci_modern.c | 34 -- > > > > 3 files changed, 41 insertions(+), 15 deletions(-) > > Same here, you need to call the dma_sync* functions when passing data > from/to the virtio-device. > > I think a good test for that is to boot a virtio kvm-guest with > swiotlb=force and see if it still works. That's useful but doesn't cover the cases where dma_wmb() is needed, right? We should make sure we're handling descriptors properly as we would for real hardware, and ensuring that addresses etc. are written to the descriptor (and a write barrier occurs) *before* the bit is set which tells the 'hardware' that it owns that descriptor. AFAICT we're currently setting the flags *first* in virtqueue_add(), let alone the missing write barrier between the two. -- dwmw2
[PATCH v2 0/3] virtio DMA API core stuff
This switches virtio to use the DMA API unconditionally. I'm sure it breaks things, but it seems to work on x86 using virtio-pci, with and without Xen, and using both the modern 1.0 variant and the legacy variant. Changes from v1: - Fix an endian conversion error causing a BUG to hit. - Fix a DMA ordering issue (swiotlb=force works now). - Minor cleanups. Andy Lutomirski (3): virtio_net: Stop doing DMA from the stack virtio_ring: Support DMA APIs virtio_pci: Use the DMA API drivers/net/virtio_net.c | 53 +++ drivers/virtio/Kconfig | 2 +- drivers/virtio/virtio_pci_common.h | 3 +- drivers/virtio/virtio_pci_legacy.c | 19 +++- drivers/virtio/virtio_pci_modern.c | 34 +-- drivers/virtio/virtio_ring.c | 187 ++--- tools/virtio/linux/dma-mapping.h | 17 7 files changed, 246 insertions(+), 69 deletions(-) create mode 100644 tools/virtio/linux/dma-mapping.h -- 2.4.3
Re: [PATCH 2/3] virtio_ring: Support DMA APIs
On Tue, Oct 27, 2015 at 7:06 PM, Joerg Roedel wrote: > Hi Andy, > > On Tue, Oct 27, 2015 at 06:17:09PM -0700, Andy Lutomirski wrote: >> From: Andy Lutomirski >> >> virtio_ring currently sends the device (usually a hypervisor) >> physical addresses of its I/O buffers. This is okay when DMA >> addresses and physical addresses are the same thing, but this isn't >> always the case. For example, this never works on Xen guests, and >> it is likely to fail if a physical "virtio" device ever ends up >> behind an IOMMU or swiotlb. > > The overall code looks good, but I haven't seen any dma_sync* calls. > When swiotlb=force is in use this would break. > >> + vq->vring.desc[head].addr = cpu_to_virtio64(_vq->vdev, >> vring_map_single( >> + vq, >> + desc, total_sg * sizeof(struct vring_desc), >> + DMA_TO_DEVICE)); > Are you talking about a dma_sync call on the descriptor ring itself? Isn't dma_alloc_coherent supposed to make that unnecessary? I should move the allocation into the virtqueue code. The docs suggest that I might need to "flush the processor's write buffers before telling devices to read that memory". I'm not sure how to do that. > Nit: I think readability could be improved here by using a temporary > variable for the return value of vring_map_single(). > I'll do something like that for v2. --Andy
Re: [PATCH 3/3] virtio_pci: Use the DMA API
On Wed, Oct 28, 2015 at 11:15:30AM +0900, Joerg Roedel wrote: > Same here, you need to call the dma_sync* functions when passing data > from/to the virtio-device. Okay, forget about this comment. This patch only converts to dma_coherent allocations, which don't need synchronization. > I think a good test for that is to boot a virtio kvm-guest with > swiotlb=force and see if it still works. But this still holds. It's a good way to test whether your changes work with bounce-buffering. Together with DMA_API_DEBUG you also see if your specified dma_directions are right. Joerg
Re: [PATCH 3/3] virtio_pci: Use the DMA API
On Wed, Oct 28, 2015 at 11:22:52AM +0900, David Woodhouse wrote: > On Wed, 2015-10-28 at 11:15 +0900, Joerg Roedel wrote: > > I think a good test for that is to boot a virtio kvm-guest with > > swiotlb=force and see if it still works. > > That's useful but doesn't cover the cases where dma_wmb() is needed, > right? > > We should make sure we're handling descriptors properly as we would for > real hardware, and ensuring that addresses etc. are written to the > descriptor (and a write barrier occurs) *before* the bit is set which > tells the 'hardware' that it owns that descriptor. Hmm, good question. The virtio code has virtio_rmb/virtio_wmb and should already call them in the right places. The virtio barriers default to rmb()/wmb() unless weak_barriers is set, in which case they call dma_rmb()/dma_wmb(), just a compiler barrier on x86. And weak_barriers is set by the virtio drivers when creating the queue, and the drivers should know what they are doing. That said, I think the virtio drivers should already get this right. Joerg
Re: [PATCH] vhost: fix performance on LE hosts
From: "Michael S. Tsirkin" Date: Tue, 27 Oct 2015 11:37:39 +0200 > commit 2751c9882b947292fcfb084c4f604e01724af804 ("vhost: cross-endian > support for legacy devices") introduced a minor regression: even with > cross-endian disabled, and even on an LE host, vhost_is_little_endian is > checking the is_le flag so there's always a branch. > > To fix, simply check virtio_legacy_is_little_endian first. > > Cc: Greg Kurz > Signed-off-by: Michael S. Tsirkin Applied.
Re: [PATCH 2/3] virtio_ring: Support DMA APIs
On Tue, Oct 27, 2015 at 7:21 PM, Joerg Roedel wrote: > On Tue, Oct 27, 2015 at 07:13:56PM -0700, Andy Lutomirski wrote: >> On Tue, Oct 27, 2015 at 7:06 PM, Joerg Roedel wrote: >> > Hi Andy, >> > >> > On Tue, Oct 27, 2015 at 06:17:09PM -0700, Andy Lutomirski wrote: >> >> From: Andy Lutomirski >> >> >> >> virtio_ring currently sends the device (usually a hypervisor) >> >> physical addresses of its I/O buffers. This is okay when DMA >> >> addresses and physical addresses are the same thing, but this isn't >> >> always the case. For example, this never works on Xen guests, and >> >> it is likely to fail if a physical "virtio" device ever ends up >> >> behind an IOMMU or swiotlb. >> > >> > The overall code looks good, but I haven't seen any dma_sync* calls. >> > When swiotlb=force is in use this would break. >> > >> >> + vq->vring.desc[head].addr = cpu_to_virtio64(_vq->vdev, >> >> vring_map_single( >> >> + vq, >> >> + desc, total_sg * sizeof(struct vring_desc), >> >> + DMA_TO_DEVICE)); >> > >> >> Are you talking about a dma_sync call on the descriptor ring itself? >> Isn't dma_alloc_coherent supposed to make that unnecessary? I should >> move the allocation into the virtqueue code. >> >> The docs suggest that I might need to "flush the processor's write >> buffers before telling devices to read that memory". I'm not sure how >> to do that. > > The write buffers should be flushed by the dma-api functions if > necessary. For dma_alloc_coherent allocations you don't need to call > dma_sync*, but for the map_single/map_page/map_sg ones, as these might > be bounce-buffered. I think that all the necessary barriers are already there. I had a nasty bug that swiotlb=force exposed, though, which I've fixed. --Andy
Re: [PATCH 3/3] x86: context_tracking: avoid irq_save/irq_restore on kernel entry and exit
On Tue, Oct 27, 2015 at 6:39 PM, Paolo Bonzini wrote: > x86 always calls user_enter and user_exit with interrupts disabled. > Therefore, it is not necessary to check for exceptions, nor to > save/restore the IRQ state, when context tracking functions are > called by guest_enter and guest_exit. > > Use the previously introduced __context_tracking_enter and > __context_tracking_exit. x86 isn't ready for this yet. We could do a quick-and-dirty fix with explicit IRQs-on-and-off mostly protected by the static key, or we could just wait until I finish the syscall cleanup. I favor the latter, but you're all welcome to do the former and I'll review it. BTW, Frederic, I have a static key patch coming that I think you'll like. I'll send it tomorrow once I'm in front of a real computer. --Andy
[PATCH v2 2/3] virtio_ring: Support DMA APIs
virtio_ring currently sends the device (usually a hypervisor) physical addresses of its I/O buffers. This is okay when DMA addresses and physical addresses are the same thing, but this isn't always the case. For example, this never works on Xen guests, and it is likely to fail if a physical "virtio" device ever ends up behind an IOMMU or swiotlb. The immediate use case for me is to enable virtio on Xen guests. For that to work, we need vring to support DMA address translation as well as a corresponding change to virtio_pci or to another driver. With this patch, if enabled, virtfs survives kmemleak and CONFIG_DMA_API_DEBUG. Signed-off-by: Andy Lutomirski--- drivers/virtio/Kconfig | 2 +- drivers/virtio/virtio_ring.c | 187 +++ tools/virtio/linux/dma-mapping.h | 17 3 files changed, 169 insertions(+), 37 deletions(-) create mode 100644 tools/virtio/linux/dma-mapping.h diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig index cab9f3f63a38..77590320d44c 100644 --- a/drivers/virtio/Kconfig +++ b/drivers/virtio/Kconfig @@ -60,7 +60,7 @@ config VIRTIO_INPUT config VIRTIO_MMIO tristate "Platform bus driver for memory mapped virtio devices" - depends on HAS_IOMEM + depends on HAS_IOMEM && HAS_DMA select VIRTIO ---help--- This drivers provides support for memory mapped virtio diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 096b857e7b75..e71bf0ca59a2 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -24,6 +24,7 @@ #include #include #include +#include #ifdef DEBUG /* For development, we want to crash whenever the ring is screwed. */ @@ -54,7 +55,14 @@ #define END_USE(vq) #endif -struct vring_virtqueue { +struct vring_desc_state +{ + void *data; /* Data for callback. */ + struct vring_desc *indir_desc; /* Indirect descriptor, if any. 
*/ +}; + +struct vring_virtqueue +{ struct virtqueue vq; /* Actual memory layout for this queue */ @@ -92,12 +100,71 @@ struct vring_virtqueue { ktime_t last_add_time; #endif - /* Tokens for callbacks. */ - void *data[]; + /* Per-descriptor state. */ + struct vring_desc_state desc_state[]; }; #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq) +/* + * The DMA ops on various arches are rather gnarly right now, and + * making all of the arch DMA ops work on the vring device itself + * is a mess. For now, we use the parent device for DMA ops. + */ +struct device *vring_dma_dev(const struct vring_virtqueue *vq) +{ + return vq->vq.vdev->dev.parent; +} + +/* Map one sg entry. */ +static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq, + struct scatterlist *sg, + enum dma_data_direction direction) +{ + /* +* We can't use dma_map_sg, because we don't use scatterlists in +* the way it expects (we don't guarantee that the scatterlist +* will exist for the lifetime of the mapping). +*/ + return dma_map_page(vring_dma_dev(vq), + sg_page(sg), sg->offset, sg->length, + direction); +} + +static dma_addr_t vring_map_single(const struct vring_virtqueue *vq, + void *cpu_addr, size_t size, + enum dma_data_direction direction) +{ + return dma_map_single(vring_dma_dev(vq), + cpu_addr, size, direction); +} + +static void vring_unmap_one(const struct vring_virtqueue *vq, + struct vring_desc *desc) +{ + u16 flags = virtio16_to_cpu(vq->vq.vdev, desc->flags); + + if (flags & VRING_DESC_F_INDIRECT) { + dma_unmap_single(vring_dma_dev(vq), +virtio64_to_cpu(vq->vq.vdev, desc->addr), +virtio32_to_cpu(vq->vq.vdev, desc->len), +(flags & VRING_DESC_F_WRITE) ? +DMA_FROM_DEVICE : DMA_TO_DEVICE); + } else { + dma_unmap_page(vring_dma_dev(vq), + virtio64_to_cpu(vq->vq.vdev, desc->addr), + virtio32_to_cpu(vq->vq.vdev, desc->len), + (flags & VRING_DESC_F_WRITE) ? 
+ DMA_FROM_DEVICE : DMA_TO_DEVICE); + } +} + +static int vring_mapping_error(const struct vring_virtqueue *vq, + dma_addr_t addr) +{ + return dma_mapping_error(vring_dma_dev(vq), addr); +} + static struct vring_desc *alloc_indirect(struct virtqueue *_vq, unsigned int total_sg, gfp_t gfp) { @@ -131,7 +198,7 @@ static inline int virtqueue_add(struct virtqueue *_vq, struct
[PATCH v2 1/3] virtio_net: Stop doing DMA from the stack
From: Andy LutomirskiOnce virtio starts using the DMA API, we won't be able to safely DMA from the stack. virtio-net does a couple of config DMA requests from small stack buffers -- switch to using dynamically-allocated memory. This should have no effect on any performance-critical code paths. Cc: net...@vger.kernel.org Cc: "Michael S. Tsirkin" Cc: virtualizat...@lists.linux-foundation.org Reviewed-by: Joerg Roedel Signed-off-by: Andy Lutomirski --- Hi Michael and DaveM- This is a prerequisite for the virtio DMA fixing project. It works as a standalone patch, though. Would it make sense to apply it to an appropriate networking tree now? drivers/net/virtio_net.c | 53 1 file changed, 36 insertions(+), 17 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index d8838dedb7a4..4f10f8a58811 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -976,31 +976,43 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd, struct scatterlist *out) { struct scatterlist *sgs[4], hdr, stat; - struct virtio_net_ctrl_hdr ctrl; - virtio_net_ctrl_ack status = ~0; + + struct { + struct virtio_net_ctrl_hdr ctrl; + virtio_net_ctrl_ack status; + } *buf; + unsigned out_num = 0, tmp; + bool ret; /* Caller should know better */ BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ)); - ctrl.class = class; - ctrl.cmd = cmd; + buf = kmalloc(sizeof(*buf), GFP_ATOMIC); + if (!buf) + return false; + buf->status = ~0; + + buf->ctrl.class = class; + buf->ctrl.cmd = cmd; /* Add header */ - sg_init_one(, , sizeof(ctrl)); + sg_init_one(, >ctrl, sizeof(buf->ctrl)); sgs[out_num++] = if (out) sgs[out_num++] = out; /* Add return status. 
*/ - sg_init_one(, , sizeof(status)); + sg_init_one(, >status, sizeof(buf->status)); sgs[out_num] = BUG_ON(out_num + 1 > ARRAY_SIZE(sgs)); virtqueue_add_sgs(vi->cvq, sgs, out_num, 1, vi, GFP_ATOMIC); - if (unlikely(!virtqueue_kick(vi->cvq))) - return status == VIRTIO_NET_OK; + if (unlikely(!virtqueue_kick(vi->cvq))) { + ret = (buf->status == VIRTIO_NET_OK); + goto out; + } /* Spin for a response, the kick causes an ioport write, trapping * into the hypervisor, so the request should be handled immediately. @@ -1009,7 +1021,11 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd, !virtqueue_is_broken(vi->cvq)) cpu_relax(); - return status == VIRTIO_NET_OK; + ret = (buf->status == VIRTIO_NET_OK); + +out: + kfree(buf); + return ret; } static int virtnet_set_mac_address(struct net_device *dev, void *p) @@ -1151,7 +1167,7 @@ static void virtnet_set_rx_mode(struct net_device *dev) { struct virtnet_info *vi = netdev_priv(dev); struct scatterlist sg[2]; - u8 promisc, allmulti; + u8 *cmdbyte; struct virtio_net_ctrl_mac *mac_data; struct netdev_hw_addr *ha; int uc_count; @@ -1163,22 +1179,25 @@ static void virtnet_set_rx_mode(struct net_device *dev) if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_RX)) return; - promisc = ((dev->flags & IFF_PROMISC) != 0); - allmulti = ((dev->flags & IFF_ALLMULTI) != 0); + cmdbyte = kmalloc(sizeof(*cmdbyte), GFP_ATOMIC); + if (!cmdbyte) + return; - sg_init_one(sg, , sizeof(promisc)); + sg_init_one(sg, cmdbyte, sizeof(*cmdbyte)); + *cmdbyte = ((dev->flags & IFF_PROMISC) != 0); if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RX, VIRTIO_NET_CTRL_RX_PROMISC, sg)) dev_warn(>dev, "Failed to %sable promisc mode.\n", -promisc ? "en" : "dis"); - - sg_init_one(sg, , sizeof(allmulti)); +*cmdbyte ? "en" : "dis"); + *cmdbyte = ((dev->flags & IFF_ALLMULTI) != 0); if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_RX, VIRTIO_NET_CTRL_RX_ALLMULTI, sg)) dev_warn(>dev, "Failed to %sable allmulti mode.\n", -allmulti ? 
"en" : "dis"); +*cmdbyte ? "en" : "dis"); + + kfree(cmdbyte); uc_count = netdev_uc_count(dev); mc_count = netdev_mc_count(dev); -- 2.4.3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 3/3] virtio_pci: Use the DMA API
From: Andy LutomirskiThis fixes virtio-pci on platforms and busses that have IOMMUs. This will break the experimental QEMU Q35 IOMMU support until QEMU is fixed. In exchange, it fixes physical virtio hardware as well as virtio-pci running under Xen. We should clean up the virtqueue API to do its own allocation and teach virtqueue_get_avail and virtqueue_get_used to return DMA addresses directly. Signed-off-by: Andy Lutomirski --- drivers/virtio/virtio_pci_common.h | 3 ++- drivers/virtio/virtio_pci_legacy.c | 19 +++ drivers/virtio/virtio_pci_modern.c | 34 -- 3 files changed, 41 insertions(+), 15 deletions(-) diff --git a/drivers/virtio/virtio_pci_common.h b/drivers/virtio/virtio_pci_common.h index b976d968e793..cd6196b513ad 100644 --- a/drivers/virtio/virtio_pci_common.h +++ b/drivers/virtio/virtio_pci_common.h @@ -38,8 +38,9 @@ struct virtio_pci_vq_info { /* the number of entries in the queue */ int num; - /* the virtual address of the ring queue */ + /* the ring queue */ void *queue; + dma_addr_t queue_dma_addr; /* bus address */ /* the list node for the virtqueues list */ struct list_head node; diff --git a/drivers/virtio/virtio_pci_legacy.c b/drivers/virtio/virtio_pci_legacy.c index 48bc9797e530..b5293e5f2af4 100644 --- a/drivers/virtio/virtio_pci_legacy.c +++ b/drivers/virtio/virtio_pci_legacy.c @@ -135,12 +135,14 @@ static struct virtqueue *setup_vq(struct virtio_pci_device *vp_dev, info->msix_vector = msix_vec; size = PAGE_ALIGN(vring_size(num, VIRTIO_PCI_VRING_ALIGN)); - info->queue = alloc_pages_exact(size, GFP_KERNEL|__GFP_ZERO); + info->queue = dma_zalloc_coherent(_dev->pci_dev->dev, size, + >queue_dma_addr, + GFP_KERNEL); if (info->queue == NULL) return ERR_PTR(-ENOMEM); /* activate the queue */ - iowrite32(virt_to_phys(info->queue) >> VIRTIO_PCI_QUEUE_ADDR_SHIFT, + iowrite32(info->queue_dma_addr >> VIRTIO_PCI_QUEUE_ADDR_SHIFT, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN); /* create the vring */ @@ -169,7 +171,8 @@ out_assign: vring_del_virtqueue(vq); 
out_activate_queue: iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN); - free_pages_exact(info->queue, size); + dma_free_coherent(_dev->pci_dev->dev, size, + info->queue, info->queue_dma_addr); return ERR_PTR(err); } @@ -194,7 +197,8 @@ static void del_vq(struct virtio_pci_vq_info *info) iowrite32(0, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN); size = PAGE_ALIGN(vring_size(info->num, VIRTIO_PCI_VRING_ALIGN)); - free_pages_exact(info->queue, size); + dma_free_coherent(_dev->pci_dev->dev, size, + info->queue, info->queue_dma_addr); } static const struct virtio_config_ops virtio_pci_config_ops = { @@ -227,6 +231,13 @@ int virtio_pci_legacy_probe(struct virtio_pci_device *vp_dev) return -ENODEV; } + rc = dma_set_mask_and_coherent(_dev->dev, DMA_BIT_MASK(64)); + if (rc) + rc = dma_set_mask_and_coherent(_dev->dev, + DMA_BIT_MASK(32)); + if (rc) + dev_warn(_dev->dev, "Failed to enable 64-bit or 32-bit DMA. Trying to continue, but this might not work.\n"); + rc = pci_request_region(pci_dev, 0, "virtio-pci-legacy"); if (rc) return rc; diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c index 8e5cf194cc0b..fbe0bd1c4881 100644 --- a/drivers/virtio/virtio_pci_modern.c +++ b/drivers/virtio/virtio_pci_modern.c @@ -293,14 +293,16 @@ static size_t vring_pci_size(u16 num) return PAGE_ALIGN(vring_size(num, SMP_CACHE_BYTES)); } -static void *alloc_virtqueue_pages(int *num) +static void *alloc_virtqueue_pages(struct virtio_pci_device *vp_dev, + int *num, dma_addr_t *dma_addr) { void *pages; /* TODO: allocate each queue chunk individually */ for (; *num && vring_pci_size(*num) > PAGE_SIZE; *num /= 2) { - pages = alloc_pages_exact(vring_pci_size(*num), - GFP_KERNEL|__GFP_ZERO|__GFP_NOWARN); + pages = dma_zalloc_coherent( + _dev->pci_dev->dev, vring_pci_size(*num), + dma_addr, GFP_KERNEL|__GFP_NOWARN); if (pages) return pages; } @@ -309,7 +311,9 @@ static void *alloc_virtqueue_pages(int *num) return NULL; /* Try to get a single page. You are my only hope! 
*/ - return alloc_pages_exact(vring_pci_size(*num), GFP_KERNEL|__GFP_ZERO); + return dma_zalloc_coherent( +
Re: [PATCH v2 2/3] virtio_ring: Support DMA APIs
Am 28.10.2015 um 14:30 schrieb Andy Lutomirski: > +static void vring_unmap_one(const struct vring_virtqueue *vq, > + struct vring_desc *desc) > +{ > + u16 flags = virtio16_to_cpu(vq->vq.vdev, desc->flags); > + > + if (flags & VRING_DESC_F_INDIRECT) { > + dma_unmap_single(vring_dma_dev(vq), > + virtio64_to_cpu(vq->vq.vdev, desc->addr), > + virtio32_to_cpu(vq->vq.vdev, desc->len), > + (flags & VRING_DESC_F_WRITE) ? > + DMA_FROM_DEVICE : DMA_TO_DEVICE); > + } else { > + dma_unmap_page(vring_dma_dev(vq), > +virtio64_to_cpu(vq->vq.vdev, desc->addr), > +virtio32_to_cpu(vq->vq.vdev, desc->len), > +(flags & VRING_DESC_F_WRITE) ? > +DMA_FROM_DEVICE : DMA_TO_DEVICE); > + } > +} > + I still have a failure in this: [1.913040] Unable to handle kernel pointer dereference in virtual kernel address space [1.913044] failing address: 0c80 TEID: 0c800803 [1.913045] Fault in home space mode while using kernel ASCE. [1.913048] AS:00d56007 R3:0c7f0007 S:0020 [1.913099] Oops: 0010 ilc:2 [#1] SMP [1.913142] Modules linked in: [1.913144] CPU: 4 PID: 50 Comm: kworker/u18:1 Not tainted 4.3.0-rc3+ #252 [1.913150] Workqueue: events_unbound call_usermodehelper_exec_work [1.913152] task: 0bb8b310 ti: 0bba8000 task.ti: 0bba8000 [1.913154] Krnl PSW : 0404e0018000 0059b266 (vring_unmap_one+0x46/0x8d0) [1.913158]R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 EA:3 Krnl GPRS: 0c80 0c036800 0c80 [1.913161]005a21a8 [1.913162]0704c0018000 04000bdbfce8 0c082000 0bdbf6e8 [1.913164]0400 0bdbf988 0bdbf6e8 [1.913170] Krnl Code: 0059b254: e310b0a80004lg %r1,168(%r11) 0059b25a: e3201024 lg %r2,32(%r1) #0059b260: e310b0a4 lg %r1,160(%r11) >0059b266: 4810100c lh %r1,12(%r1) 0059b26a: e320b2a80024 stg %r2,680(%r11) 0059b270: 4010b2a6 sth %r1,678(%r11) 0059b274: e310b2a80004 lg %r1,680(%r11) 0059b27a: e310b2980024 stg %r1,664(%r11) [1.913183] Call Trace: [1.913185] ([<0059b74c>] vring_unmap_one+0x52c/0x8d0) [1.913187] [<005a21a8>] detach_buf+0x720/0x788 [1.913188] [<005a2830>] virtqueue_get_buf+0x620/0x908 
[1.913191] [<005e5336>] virtblk_done+0xa6/0x120 [1.913192] [<005a3e46>] vring_interrupt+0x2a6/0x2c0 [1.913224] [<006c9bdc>] virtio_airq_handler+0x7c/0x120 [1.913226] [<0065f88c>] do_airq_interrupt+0xa4/0xc8 [1.913229] [<001b79a0>] handle_irq_event_percpu+0x60/0x1f0 [1.913230] [<001bbbea>] handle_percpu_irq+0x72/0xa0 [1.913232] [<001b6fa4>] generic_handle_irq+0x4c/0x78 [1.913234] [<0010cc7c>] do_IRQ+0x64/0x88 [1.913236] [<00815d42>] io_int_handler+0x10a/0x218 [1.913238] [<00104268>] copy_thread+0x78/0x1a0 [1.913240] ([<001548f8>] copy_process.isra.11+0x750/0x1a80) [1.913242] [<00156122>] _do_fork+0x9a/0x338 [1.913243] [<0015644e>] kernel_thread+0x4e/0x60 [1.913245] [<0016da7a>] call_usermodehelper_exec_work+0x7a/0xf0 [1.913247] [<00171c06>] process_one_work+0x1b6/0x490 [1.913248] [<00171f38>] worker_thread+0x58/0x588 [1.913250] [<001788fa>] kthread+0x10a/0x110 [1.913252] [<00815a8e>] kernel_thread_starter+0x6/0xc [1.913254] [<00815a88>] kernel_thread_starter+0x0/0xc [1.913255] Last Breaking-Event-Address: [1.913256] [<005a21a2>] detach_buf+0x71a/0x788 [1.913258] [1.913263] Kernel panic - not syncing: Fatal exception in interrupt -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html