Re: [PATCH 3/8] KVM: Handle device assignment to guests
Ben-Ami Yassour wrote: On Thu, 2008-07-17 at 11:31 +0300, Avi Kivity wrote: Ben-Ami Yassour wrote: + +/* FIXME: Implement the OR logic needed to make shared interrupts on + * this line behave properly + */ Isn't this a showstopper? There is no easy way for a user to avoid sharing, especially as we have only three pci irqs at present. What do you mean by only 3 (for passthrough we use whatever interrupt the host is using for that device)? There are two issue: - shared host interrupts - shared guest interrupts For shared host interrupts, I don't think there's a cost-effective solution, especially as hosts are transitioning to MSI. But the problem the comment describes is shared guest interrupts, where the assigned device's interrupt is shared with another device (assigned or virtual). And currently we only have three shareable interrupts: 5, 10, and 11. See qemu's pci_set_irq() for the logic used to share interrupts. I think we'll leave this issue until we fix the more basic comments. In any case, how can we make qemu configure the assigned device to share an interrupt with another device (assigned or virtual)? It's the guest's responsibility to assign irq lines, when running in acpi mode. In no-acpi mode, the bios assigns lines according to an algorithm in bios/rombios32.c:pci_bios_init_device(). -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/8] KVM: Handle device assignment to guests
Ben-Ami Yassour wrote: On Thu, 2008-07-17 at 12:50 +0300, Avi Kivity wrote: Ben-Ami Yassour wrote: On Wed, 2008-07-16 at 18:04 +0300, Avi Kivity wrote: If a level triggered interrupt remains active after the eoi, the ioapic has to inject it. This is used to support shared interrupts, or when the device has re-raised the line by the time the ack arrives. I don't see why it should behave differently for assigned devices. The difference is that for emulated devices, qemu is resetting the irr bit. For assigned devices it does not, and that's the difference. The first chance that we can clear the irr bit for real devices is the eoi function, and actually this is what the ack notify handler is doing (by calling pci_set_irq(kvm,irq,0) ). I was able to remove the code in ioapic by calling the ack notify handler before the irr check, and it seems to work fine. Okay. (to make it work, I also had to remove the queuing of the ack handler which was not necessary, as you mentioned in earlier comment) The eoi function now looks like this: static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int gsi) { union ioapic_redir_entry *ent; ent = &ioapic->redirtbl[gsi]; ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG); ent->fields.remote_irr = 0; if (ioapic->ack_notifier) ioapic->ack_notifier(ioapic->kvm, gsi); if (!ent->fields.mask && (ioapic->irr & (1 << gsi))) ioapic_service(ioapic, gsi); } Any comments on such an approach? I think it's fine. The point where the ack notifier is called is between the end of service of the old interrupt, and the beginning of service of a potential new interrupt (from the same device or some other device on the same guest line). -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/8] KVM: Handle device assignment to guests
On Thu, 2008-07-17 at 11:31 +0300, Avi Kivity wrote: > Ben-Ami Yassour wrote: > > + > > +/* FIXME: Implement the OR logic needed to make shared interrupts > > on + * this line behave properly + */ > > > > > > > Isn't this a showstopper? There is no easy way for a user to avoid > sharing, especially as we have only three pci irqs at present. > > > What do you mean by only 3 (for passthrough we use whatever interrupt > > the host is using for that device)? > > > > > > There are two issue: > - shared host interrupts > - shared guest interrupts > > For shared host interrupts, I don't think there's a cost-effective > solution, especially as hosts are transitioning to MSI. But the problem > the comment describes is shared guest interrupts, where the assigned > device's interrupt is shared with another device (assigned or virtual). > > And currently we only have three shareable interrupts: 5, 10, and 11. > > See qemu's pci_set_irq() for the logic used to share interrupts. I think we'll leave this issue until we fix the more basic comments. In any case, how can we make qemu configure the assigned device to share an interrupt with another device (assigned or virtual)? Thanks, Ben -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/8] KVM: Handle device assignment to guests
On Thu, 2008-07-17 at 12:50 +0300, Avi Kivity wrote: > Ben-Ami Yassour wrote: > > On Wed, 2008-07-16 at 18:04 +0300, Avi Kivity wrote: > > > > If a level triggered interrupt remains active after the eoi, the ioapic > has to inject it. This is used to support shared interrupts, or when > the device has re-raised the line by the time the ack arrives. > > I don't see why it should behave differently for assigned devices. > The difference is that for emulated devices, qemu is resetting the irr bit. For assigned devices it does not, and that's the difference. The first chance that we can clear the irr bit for real devices is the eoi function, and actually this is what the ack notify handler is doing (by calling pci_set_irq(kvm,irq,0) ). I was able to remove the code in ioapic by calling the ack notify handler before the irr check, and it seems to work fine. (to make it work, I also had to remove the queuing of the ack handler which was not necessary, as you mentioned in earlier comment) The eoi function now looks like this: static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int gsi) { union ioapic_redir_entry *ent; ent = &ioapic->redirtbl[gsi]; ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG); ent->fields.remote_irr = 0; if (ioapic->ack_notifier) ioapic->ack_notifier(ioapic->kvm, gsi); if (!ent->fields.mask && (ioapic->irr & (1 << gsi))) ioapic_service(ioapic, gsi); } Any comments on such an approach? Thanks, Ben -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/8] KVM: Handle device assignment to guests
Ben-Ami Yassour wrote: On Wed, 2008-07-16 at 18:04 +0300, Avi Kivity wrote: Ben-Ami Yassour wrote: +/* Stores information for identifying host PCI devices assigned to the + * guest: this is used in the host kernel and in the userspace. + */ +struct kvm_pci_pt_info { + unsigned char busnr; + unsigned int devfn; + __u32 irq; +}; Badly aligned. Please use __u32 for all fields, and add a __u32 padding field at the end so we have the same ABI for i386 and x86_64. Some pci devices support more than one irq. Should we make irq an array? Alternatively, decouple irq assignment from pci information and let userspace handle everything. This lets us assign non-pci devices (like serial ports, etc.). Can we limit the first version to single irq devices, and handle this after the merge? Yes. But we need to get at least the userspace interface right (and perhaps only partially implement it). Breaking the userspace interface later is possible, but I'd rather avoid it. * ioctls for vcpu fds diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index 8ce93c7..0b5bc40 100644 --- a/virt/kvm/ioapic.c +++ b/virt/kvm/ioapic.c @@ -288,13 +288,22 @@ void kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int level) static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int gsi) { union ioapic_redir_entry *ent; + struct kvm_pci_pt_dev_list *match; + unsigned long flags; ent = &ioapic->redirtbl[gsi]; ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG); ent->fields.remote_irr = 0; - if (!ent->fields.mask && (ioapic->irr & (1 << gsi))) - ioapic_service(ioapic, gsi); + + read_lock_irqsave(&kvm_pci_pt_lock, flags); + match = kvm_find_pci_pt_dev(&ioapic->kvm->arch.pci_pt_dev_head, NULL, + gsi, KVM_PT_SOURCE_IRQ_ACK); + read_unlock_irqrestore(&kvm_pci_pt_lock, flags); + if (!match) { + if (!ent->fields.mask && (ioapic->irr & (1 << gsi))) + ioapic_service(ioapic, gsi); + } What's device assignment code doing in the ioapic? This should be done through the notifier (if it needs to return a value, great). This purpose of this code is to avoid the call to ioapic_service for assigned devices, because it is not required and causes a double injection of the interrupt. Can you please explain why as part of eoi we inject a new interrupt? If a level triggered interrupt remains active after the eoi, the ioapic has to inject it. This is used to support shared interrupts, or when the device has re-raised the line by the time the ack arrives. I don't see why it should behave differently for assigned devices. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/8] KVM: Handle device assignment to guests
On Wed, 2008-07-16 at 18:04 +0300, Avi Kivity wrote: > Ben-Ami Yassour wrote: > > +/* Stores information for identifying host PCI devices assigned to the > > + * guest: this is used in the host kernel and in the userspace. > > + */ > > +struct kvm_pci_pt_info { > > + unsigned char busnr; > > + unsigned int devfn; > > + __u32 irq; > > +}; > > > > Badly aligned. Please use __u32 for all fields, and add a __u32 padding > field at the end so we have the same ABI for i386 and x86_64. > > Some pci devices support more than one irq. Should we make irq an > array? Alternatively, decouple irq assignment from pci information > and > let userspace handle everything. This lets us assign non-pci devices > (like serial ports, etc.). Can we limit the first version to single irq devices, and handle this after the merge? > > * ioctls for vcpu fds > > diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c > > index 8ce93c7..0b5bc40 100644 > > --- a/virt/kvm/ioapic.c > > +++ b/virt/kvm/ioapic.c > > @@ -288,13 +288,22 @@ void kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, > > int irq, int level) > > static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int gsi) > > { > > union ioapic_redir_entry *ent; > > + struct kvm_pci_pt_dev_list *match; > > + unsigned long flags; > > > > ent = &ioapic->redirtbl[gsi]; > > ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG); > > > > ent->fields.remote_irr = 0; > > - if (!ent->fields.mask && (ioapic->irr & (1 << gsi))) > > - ioapic_service(ioapic, gsi); > > + > > + read_lock_irqsave(&kvm_pci_pt_lock, flags); > > + match = kvm_find_pci_pt_dev(&ioapic->kvm->arch.pci_pt_dev_head, NULL, > > + gsi, KVM_PT_SOURCE_IRQ_ACK); > > + read_unlock_irqrestore(&kvm_pci_pt_lock, flags); > > + if (!match) { > > + if (!ent->fields.mask && (ioapic->irr & (1 << gsi))) > > + ioapic_service(ioapic, gsi); > > + } > > > > What's device assignment code doing in the ioapic? This should be done > through the notifier (if it needs to return a value, great). > This purpose of this code is to avoid the call to ioapic_service for assigned devices, because it is not required and causes a double injection of the interrupt. Can you please explain why as part of eoi we inject a new interrupt? Thanks, Ben -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/8] KVM: Handle device assignment to guests
Ben-Ami Yassour wrote: + +/* FIXME: Implement the OR logic needed to make shared interrupts on + * this line behave properly + */ Isn't this a showstopper? There is no easy way for a user to avoid sharing, especially as we have only three pci irqs at present. What do you mean by only 3 (for passthrough we use whatever interrupt the host is using for that device)? There are two issue: - shared host interrupts - shared guest interrupts For shared host interrupts, I don't think there's a cost-effective solution, especially as hosts are transitioning to MSI. But the problem the comment describes is shared guest interrupts, where the assigned device's interrupt is shared with another device (assigned or virtual). And currently we only have three shareable interrupts: 5, 10, and 11. See qemu's pci_set_irq() for the logic used to share interrupts. Currently it's not easy to avoid sharing. I think we can support MSI for assgined device to solve sharing problem. MSI is definitely the right direction, but we also need to support the OR logic for guests that do not support MSI. The main problems that I am ware of with shared interrupt are as follows: 1. Protection A bad guest can keep the interrupt line up, and if its shared then it interferes with other devices that might not be assigned to this guest. 2. Performance The current implementation disables the irq when the interrupt is received on the host and re-enables it when the guest acks the interrupt on the virtual ioapic. Clearly its not nice to do that when other devices uses the same irq... When removing the enable-disable of the irq, it causes serious performance degradation. The throughput goes down from 700 to 400 Mb/sec. The number of interrupts per second on the host (from the NIC) is 13 (compared to 22200 before). These are all shared host interrupt problems. Regardless, I think that the right way to go would be to merge the current code (with the non-shared interrupt limitation) and then continue to work on additional features like shared interrupts support. Do you agree? The current code will be incredibly easy to break; all you need is more than three pci devices. I think the sharing code can be easily added (as a separate patch); all that's needed is to make sure the API is good. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/8] KVM: Handle device assignment to guests
On Thu, 2008-07-17 at 09:02 +0300, Avi Kivity wrote: > Han, Weidong wrote: > > Avi Kivity wrote: > > > >>> +static void kvm_pci_pt_work_fn(struct work_struct *work) +{ > >>> + struct kvm_pci_pt_dev_list *match; > >>> + struct kvm_pci_pt_work *int_work; > >>> + int source; > >>> + unsigned long flags; > >>> + int guest_irq; > >>> + int host_irq; > >>> + > >>> + int_work = container_of(work, struct kvm_pci_pt_work, work); + > >>> + source = int_work->source ? KVM_PT_SOURCE_IRQ_ACK : > >>> KVM_PT_SOURCE_IRQ; + +/* This is taken to safely inject irq > >>> > > inside > > > >>> the guest. When + * the interrupt injection (or the ioapic code) > >>> uses a + * finer-grained lock, update this > >>> + */ > >>> + mutex_lock(&int_work->kvm->lock); > >>> + read_lock_irqsave(&kvm_pci_pt_lock, flags); > >>> + match = > >>> > > kvm_find_pci_pt_dev(&int_work->kvm->arch.pci_pt_dev_head, > > > >>> NULL, + int_work->irq, source); > >>> + if (!match) { > >>> + printk(KERN_ERR "%s: no matching device assigned to > >>> > > guest " > > > >>> +"found for irq %d, source = %d!\n", > >>> +__func__, int_work->irq, int_work->source); > >>> + read_unlock_irqrestore(&kvm_pci_pt_lock, flags); + > >>> > > goto out; > > > >>> + } > >>> + guest_irq = match->pt_dev.guest.irq; > >>> + host_irq = match->pt_dev.host.irq; > >>> + read_unlock_irqrestore(&kvm_pci_pt_lock, flags); > >>> + > >>> + if (source == KVM_PT_SOURCE_IRQ) > >>> + kvm_set_irq(int_work->kvm, guest_irq, 1); > >>> + else { > >>> + kvm_set_irq(int_work->kvm, int_work->irq, 0); > >>> + enable_irq(host_irq); > >>> + } > >>> +out: > >>> + mutex_unlock(&int_work->kvm->lock); > >>> + kvm_put_kvm(int_work->kvm); > >>> +} > >>> > >>> + > >>> +/* FIXME: Implement the OR logic needed to make shared interrupts > >>> on + * this line behave properly + */ > >>> > >>> > >> Isn't this a showstopper? There is no easy way for a user to avoid > >> sharing, especially as we have only three pci irqs at present. What do you mean by only 3 (for passthrough we use whatever interrupt the host is using for that device)? > >> > >> > > > > Currently it's not easy to avoid sharing. I think we can support MSI for > > assgined device to solve sharing problem. > > > > MSI is definitely the right direction, but we also need to support > the > OR logic for guests that do not support MSI. > The main problems that I am ware of with shared interrupt are as follows: 1. Protection A bad guest can keep the interrupt line up, and if its shared then it interferes with other devices that might not be assigned to this guest. 2. Performance The current implementation disables the irq when the interrupt is received on the host and re-enables it when the guest acks the interrupt on the virtual ioapic. Clearly its not nice to do that when other devices uses the same irq... When removing the enable-disable of the irq, it causes serious performance degradation. The throughput goes down from 700 to 400 Mb/sec. The number of interrupts per second on the host (from the NIC) is 13 (compared to 22200 before). Regardless, I think that the right way to go would be to merge the current code (with the non-shared interrupt limitation) and then continue to work on additional features like shared interrupts support. Do you agree? Regards, Ben -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/8] KVM: Handle device assignment to guests
Han, Weidong wrote: Avi Kivity wrote: +static void kvm_pci_pt_work_fn(struct work_struct *work) +{ + struct kvm_pci_pt_dev_list *match; + struct kvm_pci_pt_work *int_work; + int source; + unsigned long flags; + int guest_irq; + int host_irq; + + int_work = container_of(work, struct kvm_pci_pt_work, work); + + source = int_work->source ? KVM_PT_SOURCE_IRQ_ACK : KVM_PT_SOURCE_IRQ; + + /* This is taken to safely inject irq inside the guest. When +* the interrupt injection (or the ioapic code) uses a + * finer-grained lock, update this +*/ + mutex_lock(&int_work->kvm->lock); + read_lock_irqsave(&kvm_pci_pt_lock, flags); + match = kvm_find_pci_pt_dev(&int_work->kvm->arch.pci_pt_dev_head, NULL, + int_work->irq, source); + if (!match) { + printk(KERN_ERR "%s: no matching device assigned to guest " + "found for irq %d, source = %d!\n", + __func__, int_work->irq, int_work->source); + read_unlock_irqrestore(&kvm_pci_pt_lock, flags); + goto out; + } + guest_irq = match->pt_dev.guest.irq; + host_irq = match->pt_dev.host.irq; + read_unlock_irqrestore(&kvm_pci_pt_lock, flags); + + if (source == KVM_PT_SOURCE_IRQ) + kvm_set_irq(int_work->kvm, guest_irq, 1); + else { + kvm_set_irq(int_work->kvm, int_work->irq, 0); + enable_irq(host_irq); + } +out: + mutex_unlock(&int_work->kvm->lock); + kvm_put_kvm(int_work->kvm); +} + +/* FIXME: Implement the OR logic needed to make shared interrupts on + * this line behave properly + */ Isn't this a showstopper? There is no easy way for a user to avoid sharing, especially as we have only three pci irqs at present. Currently it's not easy to avoid sharing. I think we can support MSI for assgined device to solve sharing problem. MSI is definitely the right direction, but we also need to support the OR logic for guests that do not support MSI. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/8] KVM: Handle device assignment to guests
On Thursday 17 July 2008 10:09:57 Han, Weidong wrote: > Avi Kivity wrote: > >> +static void kvm_pci_pt_work_fn(struct work_struct *work) +{ > >> + struct kvm_pci_pt_dev_list *match; > >> + struct kvm_pci_pt_work *int_work; > >> + int source; > >> + unsigned long flags; > >> + int guest_irq; > >> + int host_irq; > >> + > >> + int_work = container_of(work, struct kvm_pci_pt_work, work); + > >> + source = int_work->source ? KVM_PT_SOURCE_IRQ_ACK : > >> KVM_PT_SOURCE_IRQ; + + /* This is taken to safely inject irq > > inside > > >> the guest. When + * the interrupt injection (or the ioapic > >> code) uses a + * finer-grained lock, update this > >> + */ > >> + mutex_lock(&int_work->kvm->lock); > >> + read_lock_irqsave(&kvm_pci_pt_lock, flags); > >> + match = > > kvm_find_pci_pt_dev(&int_work->kvm->arch.pci_pt_dev_head, > > >> NULL, +int_work->irq, source); > >> + if (!match) { > >> + printk(KERN_ERR "%s: no matching device assigned to > > guest " > > >> + "found for irq %d, source = %d!\n", > >> + __func__, int_work->irq, int_work->source); > >> + read_unlock_irqrestore(&kvm_pci_pt_lock, flags); + > > goto out; > > >> + } > >> + guest_irq = match->pt_dev.guest.irq; > >> + host_irq = match->pt_dev.host.irq; > >> + read_unlock_irqrestore(&kvm_pci_pt_lock, flags); > >> + > >> + if (source == KVM_PT_SOURCE_IRQ) > >> + kvm_set_irq(int_work->kvm, guest_irq, 1); > >> + else { > >> + kvm_set_irq(int_work->kvm, int_work->irq, 0); > >> + enable_irq(host_irq); > >> + } > >> +out: > >> + mutex_unlock(&int_work->kvm->lock); > >> + kvm_put_kvm(int_work->kvm); > >> +} > >> > >> + > >> +/* FIXME: Implement the OR logic needed to make shared > >> interrupts on + * this line behave properly + */ > > > > Isn't this a showstopper? There is no easy way for a user to > > avoid sharing, especially as we have only three pci irqs at > > present. > > Currently it's not easy to avoid sharing. I think we can support > MSI for assgined device to solve sharing problem. > I am working on the MSI support now. -- regards Yang, Sheng -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 3/8] KVM: Handle device assignment to guests
Avi Kivity wrote: >> +static void kvm_pci_pt_work_fn(struct work_struct *work) +{ >> +struct kvm_pci_pt_dev_list *match; >> +struct kvm_pci_pt_work *int_work; >> +int source; >> +unsigned long flags; >> +int guest_irq; >> +int host_irq; >> + >> +int_work = container_of(work, struct kvm_pci_pt_work, work); + >> +source = int_work->source ? KVM_PT_SOURCE_IRQ_ACK : >> KVM_PT_SOURCE_IRQ; + + /* This is taken to safely inject irq inside >> the guest. When + * the interrupt injection (or the ioapic code) >> uses a + * finer-grained lock, update this >> + */ >> +mutex_lock(&int_work->kvm->lock); >> +read_lock_irqsave(&kvm_pci_pt_lock, flags); >> +match = kvm_find_pci_pt_dev(&int_work->kvm->arch.pci_pt_dev_head, >> NULL, + int_work->irq, source); >> +if (!match) { >> +printk(KERN_ERR "%s: no matching device assigned to guest " >> + "found for irq %d, source = %d!\n", >> + __func__, int_work->irq, int_work->source); >> +read_unlock_irqrestore(&kvm_pci_pt_lock, flags); + goto out; >> +} >> +guest_irq = match->pt_dev.guest.irq; >> +host_irq = match->pt_dev.host.irq; >> +read_unlock_irqrestore(&kvm_pci_pt_lock, flags); >> + >> +if (source == KVM_PT_SOURCE_IRQ) >> +kvm_set_irq(int_work->kvm, guest_irq, 1); >> +else { >> +kvm_set_irq(int_work->kvm, int_work->irq, 0); >> +enable_irq(host_irq); >> +} >> +out: >> +mutex_unlock(&int_work->kvm->lock); >> +kvm_put_kvm(int_work->kvm); >> +} >> >> + >> +/* FIXME: Implement the OR logic needed to make shared interrupts >> on + * this line behave properly + */ >> > > Isn't this a showstopper? There is no easy way for a user to avoid > sharing, especially as we have only three pci irqs at present. > Currently it's not easy to avoid sharing. I think we can support MSI for assgined device to solve sharing problem. Randy (Weidong) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/8] KVM: Handle device assignment to guests
Ben-Ami Yassour wrote: From: Han, Weidong <[EMAIL PROTECTED]> This patch adds support for handling PCI devices that are assigned to the guest ("PCI passthrough"). + +/* + * Used to find a registered host PCI device (a "passthrough" device) + * during ioctls, interrupts or EOI + */ +struct kvm_pci_pt_dev_list * +kvm_find_pci_pt_dev(struct list_head *head, + struct kvm_pci_pt_info *pt_pci_info, int irq, int source) +{ + struct list_head *ptr; + struct kvm_pci_pt_dev_list *match; + + list_for_each(ptr, head) { + match = list_entry(ptr, struct kvm_pci_pt_dev_list, list); + + switch (source) { + case KVM_PT_SOURCE_IRQ: + /* +* Used to find a registered host device +* during interrupt context on host +*/ + if (match->pt_dev.host.irq == irq) + return match; + break; + case KVM_PT_SOURCE_IRQ_ACK: + /* +* Used to find a registered host device when +* the guest acks an interrupt +*/ + if (match->pt_dev.guest.irq == irq) + return match; + break; + case KVM_PT_SOURCE_UPDATE: + if ((match->pt_dev.host.busnr == pt_pci_info->busnr) && + (match->pt_dev.host.devfn == pt_pci_info->devfn)) + return match; + break; + } + } + return NULL; +} This monster is best split into three functions each handling a separate case, without the 'source' argument. +static void kvm_pci_pt_work_fn(struct work_struct *work) +{ + struct kvm_pci_pt_dev_list *match; + struct kvm_pci_pt_work *int_work; + int source; + unsigned long flags; + int guest_irq; + int host_irq; + + int_work = container_of(work, struct kvm_pci_pt_work, work); + + source = int_work->source ? KVM_PT_SOURCE_IRQ_ACK : KVM_PT_SOURCE_IRQ; + + /* This is taken to safely inject irq inside the guest. When +* the interrupt injection (or the ioapic code) uses a +* finer-grained lock, update this +*/ + mutex_lock(&int_work->kvm->lock); + read_lock_irqsave(&kvm_pci_pt_lock, flags); + match = kvm_find_pci_pt_dev(&int_work->kvm->arch.pci_pt_dev_head, NULL, + int_work->irq, source); + if (!match) { + printk(KERN_ERR "%s: no matching device assigned to guest " + "found for irq %d, source = %d!\n", + __func__, int_work->irq, int_work->source); + read_unlock_irqrestore(&kvm_pci_pt_lock, flags); + goto out; + } + guest_irq = match->pt_dev.guest.irq; + host_irq = match->pt_dev.host.irq; + read_unlock_irqrestore(&kvm_pci_pt_lock, flags); + + if (source == KVM_PT_SOURCE_IRQ) + kvm_set_irq(int_work->kvm, guest_irq, 1); + else { + kvm_set_irq(int_work->kvm, int_work->irq, 0); + enable_irq(host_irq); + } +out: + mutex_unlock(&int_work->kvm->lock); + kvm_put_kvm(int_work->kvm); +} + +/* FIXME: Implement the OR logic needed to make shared interrupts on + * this line behave properly + */ Isn't this a showstopper? There is no easy way for a user to avoid sharing, especially as we have only three pci irqs at present. +static irqreturn_t kvm_pci_pt_dev_intr(int irq, void *dev_id) +{ + struct kvm *kvm = (struct kvm *) dev_id; + struct kvm_pci_pt_dev_list *pci_pt_dev; + + if (!test_bit(irq, pt_irq_handled)) + return IRQ_NONE; + + read_lock(&kvm_pci_pt_lock); + pci_pt_dev = kvm_find_pci_pt_dev(&kvm->arch.pci_pt_dev_head, NULL, +irq, KVM_PT_SOURCE_IRQ); + if (!pci_pt_dev) { + read_unlock(&kvm_pci_pt_lock); + return IRQ_NONE; + } I see we don't reuse the result of the search. I guess we can't, since the list may change between the interrupt and the execution of the work function. + + pci_pt_dev->pt_dev.int_work.irq = irq; + pci_pt_dev->pt_dev.int_work.kvm = kvm; + pci_pt_dev->pt_dev.int_work.source = 0; + For a bool, use false, not 0. But 'source' isn't really a good name for a boolean. Perhaps 'is_ack'? + +/* Ack the irq line for a passthrough device */ +static void kvm_pci_pt_ack_irq(void *opaque, int irq) +{ + struct kvm *kvm = opaque; + struct kvm_pci_pt_dev_list *pci_pt_dev; + unsigned long flags; + + if (irq == -1) + return; + + read_lock_irqsave(&kvm_pci_pt_lock, flags); + pci_pt_dev = kvm_find_pci_pt_d