Re: [PATCH 3/8] KVM: Handle device assignment to guests

2008-07-17 Thread Avi Kivity

Ben-Ami Yassour wrote:

On Thu, 2008-07-17 at 11:31 +0300, Avi Kivity wrote:
  

Ben-Ami Yassour wrote:


+
+/* FIXME: Implement the OR logic needed to make shared interrupts
on + * this line behave properly + */

  
  
  

Isn't this a showstopper?  There is no easy way for a user to avoid
sharing, especially as we have only three pci irqs at present.



What do you mean by only 3 (for passthrough we use whatever interrupt
the host is using for that device)?

  
  

There are two issue:
- shared host interrupts
- shared guest interrupts

For shared host interrupts, I don't think there's a cost-effective 
solution, especially as hosts are transitioning to MSI.  But the problem 
the comment describes is shared guest interrupts, where the assigned 
device's interrupt is shared with another device (assigned or virtual).


And currently we only have three shareable interrupts: 5, 10, and 11.

See qemu's pci_set_irq() for the logic used to share interrupts.



I think we'll leave this issue until we fix the more basic comments. In
any case, how can we make qemu configure the assigned device to share an
interrupt with another device (assigned or virtual)?

  


It's the guest's responsibility to assign irq lines, when running in 
acpi mode.  In no-acpi mode, the bios assigns lines according to an 
algorithm in bios/rombios32.c:pci_bios_init_device().


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/8] KVM: Handle device assignment to guests

2008-07-17 Thread Avi Kivity

Ben-Ami Yassour wrote:

On Thu, 2008-07-17 at 12:50 +0300, Avi Kivity wrote:
  

Ben-Ami Yassour wrote:


On Wed, 2008-07-16 at 18:04 +0300, Avi Kivity wrote:
  
  
If a level triggered interrupt remains active after the eoi, the ioapic 
has to inject it.  This is used to support shared interrupts, or when 
the device has re-raised the line by the time the ack arrives.


I don't see why it should behave differently for assigned devices.




The difference is that for emulated devices, qemu is resetting the irr
bit. For assigned devices it does not, and that's the difference.
The first chance that we can clear the irr bit for real devices is the
eoi function, and actually this is what the ack notify handler is doing
(by calling pci_set_irq(kvm,irq,0) ).
I was able to remove the code in ioapic by calling the ack notify
handler before the irr check, and it seems to work fine.
  


Okay.


(to make it work, I also had to remove the queuing of the ack handler
which was not necessary, as you mentioned in earlier comment)

The eoi function now looks like this:
static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int gsi)
{
union ioapic_redir_entry *ent;

ent = &ioapic->redirtbl[gsi];
ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);

ent->fields.remote_irr = 0;

if (ioapic->ack_notifier)
ioapic->ack_notifier(ioapic->kvm, gsi);

if (!ent->fields.mask && (ioapic->irr & (1 << gsi)))
ioapic_service(ioapic, gsi);
}

Any comments on such an approach?

  


I think it's fine.  The point where the ack notifier is called is 
between the end of service of the old interrupt, and the beginning of 
service of a potential new interrupt (from the same device or some other 
device on the same guest line).



--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/8] KVM: Handle device assignment to guests

2008-07-17 Thread Ben-Ami Yassour
On Thu, 2008-07-17 at 11:31 +0300, Avi Kivity wrote:
> Ben-Ami Yassour wrote:
> > +
> > +/* FIXME: Implement the OR logic needed to make shared interrupts
> > on + * this line behave properly + */
> >
> >   
> >   
>  Isn't this a showstopper?  There is no easy way for a user to avoid
>  sharing, especially as we have only three pci irqs at present.
>  
> > What do you mean by only 3 (for passthrough we use whatever interrupt
> > the host is using for that device)?
> >
> >   
> 
> There are two issue:
> - shared host interrupts
> - shared guest interrupts
> 
> For shared host interrupts, I don't think there's a cost-effective 
> solution, especially as hosts are transitioning to MSI.  But the problem 
> the comment describes is shared guest interrupts, where the assigned 
> device's interrupt is shared with another device (assigned or virtual).
> 
> And currently we only have three shareable interrupts: 5, 10, and 11.
> 
> See qemu's pci_set_irq() for the logic used to share interrupts.

I think we'll leave this issue until we fix the more basic comments. In
any case, how can we make qemu configure the assigned device to share an
interrupt with another device (assigned or virtual)?

Thanks,
Ben

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/8] KVM: Handle device assignment to guests

2008-07-17 Thread Ben-Ami Yassour
On Thu, 2008-07-17 at 12:50 +0300, Avi Kivity wrote:
> Ben-Ami Yassour wrote:
> > On Wed, 2008-07-16 at 18:04 +0300, Avi Kivity wrote:
> >   
> 
> If a level triggered interrupt remains active after the eoi, the ioapic 
> has to inject it.  This is used to support shared interrupts, or when 
> the device has re-raised the line by the time the ack arrives.
> 
> I don't see why it should behave differently for assigned devices.
> 

The difference is that for emulated devices, qemu is resetting the irr
bit. For assigned devices it does not, and that's the difference.
The first chance that we can clear the irr bit for real devices is the
eoi function, and actually this is what the ack notify handler is doing
(by calling pci_set_irq(kvm,irq,0) ).
I was able to remove the code in ioapic by calling the ack notify
handler before the irr check, and it seems to work fine.
(to make it work, I also had to remove the queuing of the ack handler
which was not necessary, as you mentioned in earlier comment)

The eoi function now looks like this:
static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int gsi)
{
union ioapic_redir_entry *ent;

ent = &ioapic->redirtbl[gsi];
ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);

ent->fields.remote_irr = 0;

if (ioapic->ack_notifier)
ioapic->ack_notifier(ioapic->kvm, gsi);

if (!ent->fields.mask && (ioapic->irr & (1 << gsi)))
ioapic_service(ioapic, gsi);
}

Any comments on such an approach?

Thanks,
Ben


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/8] KVM: Handle device assignment to guests

2008-07-17 Thread Avi Kivity

Ben-Ami Yassour wrote:

On Wed, 2008-07-16 at 18:04 +0300, Avi Kivity wrote:
  

Ben-Ami Yassour wrote:


+/* Stores information for identifying host PCI devices assigned to the
+ * guest: this is used in the host kernel and in the userspace.
+ */
+struct kvm_pci_pt_info {
+   unsigned char busnr;
+   unsigned int devfn;
+   __u32 irq;
+};
  
  
Badly aligned.  Please use __u32 for all fields, and add a __u32 padding 
field at the end so we have the same ABI for i386 and x86_64.


Some pci devices support more than one irq.  Should we make irq an 
array?  Alternatively, decouple irq assignment from pci information
and 
let userspace handle everything.  This lets us assign non-pci devices 
(like serial ports, etc.).


Can we limit the first version to single irq devices, and handle this
after the merge?

  


Yes.  But we need to get at least the userspace interface right (and 
perhaps only partially implement it).  Breaking the userspace interface 
later is possible, but I'd rather avoid it.



  * ioctls for vcpu fds
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 8ce93c7..0b5bc40 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -288,13 +288,22 @@ void kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int 
irq, int level)
 static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int gsi)
 {
union ioapic_redir_entry *ent;
+   struct kvm_pci_pt_dev_list *match;
+   unsigned long flags;
 
 	ent = &ioapic->redirtbl[gsi];

ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
 
 	ent->fields.remote_irr = 0;

-   if (!ent->fields.mask && (ioapic->irr & (1 << gsi)))
-   ioapic_service(ioapic, gsi);
+
+   read_lock_irqsave(&kvm_pci_pt_lock, flags);
+   match = kvm_find_pci_pt_dev(&ioapic->kvm->arch.pci_pt_dev_head, NULL,
+   gsi, KVM_PT_SOURCE_IRQ_ACK);
+   read_unlock_irqrestore(&kvm_pci_pt_lock, flags);
+   if (!match) {
+   if (!ent->fields.mask && (ioapic->irr & (1 << gsi)))
+   ioapic_service(ioapic, gsi);
+   }
  
  
What's device assignment code doing in the ioapic?  This should be done 
through the notifier (if it needs to return a value, great).





This purpose of this code is to avoid the call to ioapic_service for
assigned devices, because it is not required and causes a double
injection of the interrupt.

Can you please explain why as part of eoi we inject a new interrupt?
  


If a level triggered interrupt remains active after the eoi, the ioapic 
has to inject it.  This is used to support shared interrupts, or when 
the device has re-raised the line by the time the ack arrives.


I don't see why it should behave differently for assigned devices.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/8] KVM: Handle device assignment to guests

2008-07-17 Thread Ben-Ami Yassour
On Wed, 2008-07-16 at 18:04 +0300, Avi Kivity wrote:
> Ben-Ami Yassour wrote:
> > +/* Stores information for identifying host PCI devices assigned to the
> > + * guest: this is used in the host kernel and in the userspace.
> > + */
> > +struct kvm_pci_pt_info {
> > +   unsigned char busnr;
> > +   unsigned int devfn;
> > +   __u32 irq;
> > +};
> >   
> 
> Badly aligned.  Please use __u32 for all fields, and add a __u32 padding 
> field at the end so we have the same ABI for i386 and x86_64.
> 
> Some pci devices support more than one irq.  Should we make irq an 
> array?  Alternatively, decouple irq assignment from pci information
> and 
> let userspace handle everything.  This lets us assign non-pci devices 
> (like serial ports, etc.).
Can we limit the first version to single irq devices, and handle this
after the merge?

> >   * ioctls for vcpu fds
> > diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
> > index 8ce93c7..0b5bc40 100644
> > --- a/virt/kvm/ioapic.c
> > +++ b/virt/kvm/ioapic.c
> > @@ -288,13 +288,22 @@ void kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, 
> > int irq, int level)
> >  static void __kvm_ioapic_update_eoi(struct kvm_ioapic *ioapic, int gsi)
> >  {
> > union ioapic_redir_entry *ent;
> > +   struct kvm_pci_pt_dev_list *match;
> > +   unsigned long flags;
> >  
> > ent = &ioapic->redirtbl[gsi];
> > ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
> >  
> > ent->fields.remote_irr = 0;
> > -   if (!ent->fields.mask && (ioapic->irr & (1 << gsi)))
> > -   ioapic_service(ioapic, gsi);
> > +
> > +   read_lock_irqsave(&kvm_pci_pt_lock, flags);
> > +   match = kvm_find_pci_pt_dev(&ioapic->kvm->arch.pci_pt_dev_head, NULL,
> > +   gsi, KVM_PT_SOURCE_IRQ_ACK);
> > +   read_unlock_irqrestore(&kvm_pci_pt_lock, flags);
> > +   if (!match) {
> > +   if (!ent->fields.mask && (ioapic->irr & (1 << gsi)))
> > +   ioapic_service(ioapic, gsi);
> > +   }
> >   
> 
> What's device assignment code doing in the ioapic?  This should be done 
> through the notifier (if it needs to return a value, great).
> 

This purpose of this code is to avoid the call to ioapic_service for
assigned devices, because it is not required and causes a double
injection of the interrupt.

Can you please explain why as part of eoi we inject a new interrupt?

Thanks,
Ben

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/8] KVM: Handle device assignment to guests

2008-07-17 Thread Avi Kivity

Ben-Ami Yassour wrote:

+
+/* FIXME: Implement the OR logic needed to make shared interrupts
on + * this line behave properly + */

  
  

Isn't this a showstopper?  There is no easy way for a user to avoid
sharing, especially as we have only three pci irqs at present.


What do you mean by only 3 (for passthrough we use whatever interrupt
the host is using for that device)?

  


There are two issue:
- shared host interrupts
- shared guest interrupts

For shared host interrupts, I don't think there's a cost-effective 
solution, especially as hosts are transitioning to MSI.  But the problem 
the comment describes is shared guest interrupts, where the assigned 
device's interrupt is shared with another device (assigned or virtual).


And currently we only have three shareable interrupts: 5, 10, and 11.

See qemu's pci_set_irq() for the logic used to share interrupts.



Currently it's not easy to avoid sharing. I think we can support MSI for
assgined device to solve sharing problem. 
  
  

MSI is definitely the right direction, but we also need to support
the 
OR logic for guests that do not support MSI.





The main problems that I am ware of with shared interrupt are as
follows:

1. Protection
A bad guest can keep the interrupt line up, and if its shared then it
interferes with other devices that might not be assigned to this guest.

2. Performance
The current implementation disables the irq when the interrupt is
received on the host and re-enables it when the guest acks the interrupt
on the virtual ioapic.

Clearly its not nice to do that when other devices uses the same irq...

When removing the enable-disable of the irq, it causes serious
performance degradation.
The throughput goes down from 700 to 400 Mb/sec.
The number of interrupts per second on the host (from the NIC) is 13
(compared to 22200 before).

  


These are all shared host interrupt problems.


Regardless, I think that the right way to go would be to merge the
current code (with the non-shared interrupt limitation) and then
continue to work on additional features like shared interrupts support. 
Do you agree?
  


The current code will be incredibly easy to break; all you need is more 
than three pci devices.  I think the sharing code can be easily added 
(as a separate patch); all that's needed is to make sure the API is good.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/8] KVM: Handle device assignment to guests

2008-07-17 Thread Ben-Ami Yassour
On Thu, 2008-07-17 at 09:02 +0300, Avi Kivity wrote:
> Han, Weidong wrote:
> > Avi Kivity wrote:
> >   
> >>> +static void kvm_pci_pt_work_fn(struct work_struct *work) +{
> >>> + struct kvm_pci_pt_dev_list *match;
> >>> + struct kvm_pci_pt_work *int_work;
> >>> + int source;
> >>> + unsigned long flags;
> >>> + int guest_irq;
> >>> + int host_irq;
> >>> +
> >>> + int_work = container_of(work, struct kvm_pci_pt_work, work); +
> >>> + source = int_work->source ? KVM_PT_SOURCE_IRQ_ACK :
> >>> KVM_PT_SOURCE_IRQ; + +/* This is taken to safely inject irq
> >>>   
> > inside
> >   
> >>> the guest. When +  * the interrupt injection (or the ioapic code)
> >>> uses a +   * finer-grained lock, update this
> >>> +  */
> >>> + mutex_lock(&int_work->kvm->lock);
> >>> + read_lock_irqsave(&kvm_pci_pt_lock, flags);
> >>> + match =
> >>>   
> > kvm_find_pci_pt_dev(&int_work->kvm->arch.pci_pt_dev_head,
> >   
> >>> NULL, +   int_work->irq, source);
> >>> + if (!match) {
> >>> + printk(KERN_ERR "%s: no matching device assigned to
> >>>   
> > guest "
> >   
> >>> +"found for irq %d, source = %d!\n",
> >>> +__func__, int_work->irq, int_work->source);
> >>> + read_unlock_irqrestore(&kvm_pci_pt_lock, flags); +
> >>>   
> > goto out;
> >   
> >>> + }
> >>> + guest_irq = match->pt_dev.guest.irq;
> >>> + host_irq = match->pt_dev.host.irq;
> >>> + read_unlock_irqrestore(&kvm_pci_pt_lock, flags);
> >>> +
> >>> + if (source == KVM_PT_SOURCE_IRQ)
> >>> + kvm_set_irq(int_work->kvm, guest_irq, 1);
> >>> + else {
> >>> + kvm_set_irq(int_work->kvm, int_work->irq, 0);
> >>> + enable_irq(host_irq);
> >>> + }
> >>> +out:
> >>> + mutex_unlock(&int_work->kvm->lock);
> >>> + kvm_put_kvm(int_work->kvm);
> >>> +}
> >>>
> >>> +
> >>> +/* FIXME: Implement the OR logic needed to make shared interrupts
> >>> on + * this line behave properly + */
> >>>
> >>>   
> >> Isn't this a showstopper?  There is no easy way for a user to avoid
> >> sharing, especially as we have only three pci irqs at present.
What do you mean by only 3 (for passthrough we use whatever interrupt
the host is using for that device)?

> >>
> >> 
> >
> > Currently it's not easy to avoid sharing. I think we can support MSI for
> > assgined device to solve sharing problem. 
> >   
> 
> MSI is definitely the right direction, but we also need to support
> the 
> OR logic for guests that do not support MSI.
> 

The main problems that I am ware of with shared interrupt are as
follows:

1. Protection
A bad guest can keep the interrupt line up, and if its shared then it
interferes with other devices that might not be assigned to this guest.

2. Performance
The current implementation disables the irq when the interrupt is
received on the host and re-enables it when the guest acks the interrupt
on the virtual ioapic.

Clearly its not nice to do that when other devices uses the same irq...

When removing the enable-disable of the irq, it causes serious
performance degradation.
The throughput goes down from 700 to 400 Mb/sec.
The number of interrupts per second on the host (from the NIC) is 13
(compared to 22200 before).

Regardless, I think that the right way to go would be to merge the
current code (with the non-shared interrupt limitation) and then
continue to work on additional features like shared interrupts support. 
Do you agree?

Regards,
Ben


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/8] KVM: Handle device assignment to guests

2008-07-16 Thread Avi Kivity

Han, Weidong wrote:

Avi Kivity wrote:
  

+static void kvm_pci_pt_work_fn(struct work_struct *work) +{
+   struct kvm_pci_pt_dev_list *match;
+   struct kvm_pci_pt_work *int_work;
+   int source;
+   unsigned long flags;
+   int guest_irq;
+   int host_irq;
+
+   int_work = container_of(work, struct kvm_pci_pt_work, work); +
+   source = int_work->source ? KVM_PT_SOURCE_IRQ_ACK :
KVM_PT_SOURCE_IRQ; + +  /* This is taken to safely inject irq
  

inside
  

the guest. When +* the interrupt injection (or the ioapic code)
uses a + * finer-grained lock, update this
+*/
+   mutex_lock(&int_work->kvm->lock);
+   read_lock_irqsave(&kvm_pci_pt_lock, flags);
+   match =
  

kvm_find_pci_pt_dev(&int_work->kvm->arch.pci_pt_dev_head,
  

NULL, + int_work->irq, source);
+   if (!match) {
+   printk(KERN_ERR "%s: no matching device assigned to
  

guest "
  

+  "found for irq %d, source = %d!\n",
+  __func__, int_work->irq, int_work->source);
+   read_unlock_irqrestore(&kvm_pci_pt_lock, flags); +
  

goto out;
  

+   }
+   guest_irq = match->pt_dev.guest.irq;
+   host_irq = match->pt_dev.host.irq;
+   read_unlock_irqrestore(&kvm_pci_pt_lock, flags);
+
+   if (source == KVM_PT_SOURCE_IRQ)
+   kvm_set_irq(int_work->kvm, guest_irq, 1);
+   else {
+   kvm_set_irq(int_work->kvm, int_work->irq, 0);
+   enable_irq(host_irq);
+   }
+out:
+   mutex_unlock(&int_work->kvm->lock);
+   kvm_put_kvm(int_work->kvm);
+}

+
+/* FIXME: Implement the OR logic needed to make shared interrupts
on + * this line behave properly + */

  

Isn't this a showstopper?  There is no easy way for a user to avoid
sharing, especially as we have only three pci irqs at present.




Currently it's not easy to avoid sharing. I think we can support MSI for
assgined device to solve sharing problem. 
  


MSI is definitely the right direction, but we also need to support the 
OR logic for guests that do not support MSI.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/8] KVM: Handle device assignment to guests

2008-07-16 Thread Yang, Sheng
On Thursday 17 July 2008 10:09:57 Han, Weidong wrote:
> Avi Kivity wrote:
> >> +static void kvm_pci_pt_work_fn(struct work_struct *work) +{
> >> +  struct kvm_pci_pt_dev_list *match;
> >> +  struct kvm_pci_pt_work *int_work;
> >> +  int source;
> >> +  unsigned long flags;
> >> +  int guest_irq;
> >> +  int host_irq;
> >> +
> >> +  int_work = container_of(work, struct kvm_pci_pt_work, work); +
> >> +  source = int_work->source ? KVM_PT_SOURCE_IRQ_ACK :
> >> KVM_PT_SOURCE_IRQ; + + /* This is taken to safely inject irq
>
> inside
>
> >> the guest. When +   * the interrupt injection (or the ioapic
> >> code) uses a +  * finer-grained lock, update this
> >> +   */
> >> +  mutex_lock(&int_work->kvm->lock);
> >> +  read_lock_irqsave(&kvm_pci_pt_lock, flags);
> >> +  match =
>
> kvm_find_pci_pt_dev(&int_work->kvm->arch.pci_pt_dev_head,
>
> >> NULL, +int_work->irq, source);
> >> +  if (!match) {
> >> +  printk(KERN_ERR "%s: no matching device assigned to
>
> guest "
>
> >> + "found for irq %d, source = %d!\n",
> >> + __func__, int_work->irq, int_work->source);
> >> +  read_unlock_irqrestore(&kvm_pci_pt_lock, flags); +
>
> goto out;
>
> >> +  }
> >> +  guest_irq = match->pt_dev.guest.irq;
> >> +  host_irq = match->pt_dev.host.irq;
> >> +  read_unlock_irqrestore(&kvm_pci_pt_lock, flags);
> >> +
> >> +  if (source == KVM_PT_SOURCE_IRQ)
> >> +  kvm_set_irq(int_work->kvm, guest_irq, 1);
> >> +  else {
> >> +  kvm_set_irq(int_work->kvm, int_work->irq, 0);
> >> +  enable_irq(host_irq);
> >> +  }
> >> +out:
> >> +  mutex_unlock(&int_work->kvm->lock);
> >> +  kvm_put_kvm(int_work->kvm);
> >> +}
> >>
> >> +
> >> +/* FIXME: Implement the OR logic needed to make shared
> >> interrupts on + * this line behave properly + */
> >
> > Isn't this a showstopper?  There is no easy way for a user to
> > avoid sharing, especially as we have only three pci irqs at
> > present.
>
> Currently it's not easy to avoid sharing. I think we can support
> MSI for assgined device to solve sharing problem.
>

I am working on the MSI support now.

-- 
regards
Yang, Sheng


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 3/8] KVM: Handle device assignment to guests

2008-07-16 Thread Han, Weidong
Avi Kivity wrote:
>> +static void kvm_pci_pt_work_fn(struct work_struct *work) +{
>> +struct kvm_pci_pt_dev_list *match;
>> +struct kvm_pci_pt_work *int_work;
>> +int source;
>> +unsigned long flags;
>> +int guest_irq;
>> +int host_irq;
>> +
>> +int_work = container_of(work, struct kvm_pci_pt_work, work); +
>> +source = int_work->source ? KVM_PT_SOURCE_IRQ_ACK :
>> KVM_PT_SOURCE_IRQ; + +   /* This is taken to safely inject irq
inside
>> the guest. When + * the interrupt injection (or the ioapic code)
>> uses a +  * finer-grained lock, update this
>> + */
>> +mutex_lock(&int_work->kvm->lock);
>> +read_lock_irqsave(&kvm_pci_pt_lock, flags);
>> +match =
kvm_find_pci_pt_dev(&int_work->kvm->arch.pci_pt_dev_head,
>> NULL, +  int_work->irq, source);
>> +if (!match) {
>> +printk(KERN_ERR "%s: no matching device assigned to
guest "
>> +   "found for irq %d, source = %d!\n",
>> +   __func__, int_work->irq, int_work->source);
>> +read_unlock_irqrestore(&kvm_pci_pt_lock, flags); +
goto out;
>> +}
>> +guest_irq = match->pt_dev.guest.irq;
>> +host_irq = match->pt_dev.host.irq;
>> +read_unlock_irqrestore(&kvm_pci_pt_lock, flags);
>> +
>> +if (source == KVM_PT_SOURCE_IRQ)
>> +kvm_set_irq(int_work->kvm, guest_irq, 1);
>> +else {
>> +kvm_set_irq(int_work->kvm, int_work->irq, 0);
>> +enable_irq(host_irq);
>> +}
>> +out:
>> +mutex_unlock(&int_work->kvm->lock);
>> +kvm_put_kvm(int_work->kvm);
>> +}
>> 
>> +
>> +/* FIXME: Implement the OR logic needed to make shared interrupts
>> on + * this line behave properly + */
>> 
> 
> Isn't this a showstopper?  There is no easy way for a user to avoid
> sharing, especially as we have only three pci irqs at present.
> 

Currently it's not easy to avoid sharing. I think we can support MSI for
assgined device to solve sharing problem. 

Randy (Weidong)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/8] KVM: Handle device assignment to guests

2008-07-16 Thread Avi Kivity

Ben-Ami Yassour wrote:

From: Han, Weidong <[EMAIL PROTECTED]>

This patch adds support for handling PCI devices that are assigned to
the guest ("PCI passthrough").
+
+/*
+ * Used to find a registered host PCI device (a "passthrough" device)
+ * during ioctls, interrupts or EOI
+ */
+struct kvm_pci_pt_dev_list *
+kvm_find_pci_pt_dev(struct list_head *head,
+   struct kvm_pci_pt_info *pt_pci_info, int irq, int source)
+{
+   struct list_head *ptr;
+   struct kvm_pci_pt_dev_list *match;
+
+   list_for_each(ptr, head) {
+   match = list_entry(ptr, struct kvm_pci_pt_dev_list, list);
+
+   switch (source) {
+   case KVM_PT_SOURCE_IRQ:
+   /*
+* Used to find a registered host device
+* during interrupt context on host
+*/
+   if (match->pt_dev.host.irq == irq)
+   return match;
+   break;
+   case KVM_PT_SOURCE_IRQ_ACK:
+   /*
+* Used to find a registered host device when
+* the guest acks an interrupt
+*/
+   if (match->pt_dev.guest.irq == irq)
+   return match;
+   break;
+   case KVM_PT_SOURCE_UPDATE:
+   if ((match->pt_dev.host.busnr == pt_pci_info->busnr) &&
+   (match->pt_dev.host.devfn == pt_pci_info->devfn))
+   return match;
+   break;
+   }
+   }
+   return NULL;
+}
  


This monster is best split into three functions each handling a separate 
case, without the 'source' argument.



+static void kvm_pci_pt_work_fn(struct work_struct *work)
+{
+   struct kvm_pci_pt_dev_list *match;
+   struct kvm_pci_pt_work *int_work;
+   int source;
+   unsigned long flags;
+   int guest_irq;
+   int host_irq;
+
+   int_work = container_of(work, struct kvm_pci_pt_work, work);
+
+   source = int_work->source ? KVM_PT_SOURCE_IRQ_ACK : KVM_PT_SOURCE_IRQ;
+
+   /* This is taken to safely inject irq inside the guest. When
+* the interrupt injection (or the ioapic code) uses a
+* finer-grained lock, update this
+*/
+   mutex_lock(&int_work->kvm->lock);
+   read_lock_irqsave(&kvm_pci_pt_lock, flags);
+   match = kvm_find_pci_pt_dev(&int_work->kvm->arch.pci_pt_dev_head, NULL,
+   int_work->irq, source);
+   if (!match) {
+   printk(KERN_ERR "%s: no matching device assigned to guest "
+  "found for irq %d, source = %d!\n",
+  __func__, int_work->irq, int_work->source);
+   read_unlock_irqrestore(&kvm_pci_pt_lock, flags);
+   goto out;
+   }
+   guest_irq = match->pt_dev.guest.irq;
+   host_irq = match->pt_dev.host.irq;
+   read_unlock_irqrestore(&kvm_pci_pt_lock, flags);
+
+   if (source == KVM_PT_SOURCE_IRQ)
+   kvm_set_irq(int_work->kvm, guest_irq, 1);
+   else {
+   kvm_set_irq(int_work->kvm, int_work->irq, 0);
+   enable_irq(host_irq);
+   }
+out:
+   mutex_unlock(&int_work->kvm->lock);
+   kvm_put_kvm(int_work->kvm);
+}
  
+

+/* FIXME: Implement the OR logic needed to make shared interrupts on
+ * this line behave properly
+ */
  


Isn't this a showstopper?  There is no easy way for a user to avoid 
sharing, especially as we have only three pci irqs at present.



+static irqreturn_t kvm_pci_pt_dev_intr(int irq, void *dev_id)
+{
+   struct kvm *kvm = (struct kvm *) dev_id;
+   struct kvm_pci_pt_dev_list *pci_pt_dev;
+
+   if (!test_bit(irq, pt_irq_handled))
+   return IRQ_NONE;
+
+   read_lock(&kvm_pci_pt_lock);
+   pci_pt_dev = kvm_find_pci_pt_dev(&kvm->arch.pci_pt_dev_head, NULL,
+irq, KVM_PT_SOURCE_IRQ);
+   if (!pci_pt_dev) {
+   read_unlock(&kvm_pci_pt_lock);
+   return IRQ_NONE;
+   }
  



I see we don't reuse the result of the search.  I guess we can't, since 
the list may change between the interrupt and the execution of the work 
function.



+
+   pci_pt_dev->pt_dev.int_work.irq = irq;
+   pci_pt_dev->pt_dev.int_work.kvm = kvm;
+   pci_pt_dev->pt_dev.int_work.source = 0;
+
  


For a bool, use false, not 0.  But 'source' isn't really a good name for 
a boolean.  Perhaps 'is_ack'?



+
+/* Ack the irq line for a passthrough device */
+static void kvm_pci_pt_ack_irq(void *opaque, int irq)
+{
+   struct kvm *kvm = opaque;
+   struct kvm_pci_pt_dev_list *pci_pt_dev;
+   unsigned long flags;
+
+   if (irq == -1)
+   return;
+
+   read_lock_irqsave(&kvm_pci_pt_lock, flags);
+   pci_pt_dev = kvm_find_pci_pt_d