RE: [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts

2014-12-23 Thread Zhang, Yang Z
Wu, Feng wrote on 2014-12-24:
> 
> 
> Zhang, Yang Z wrote on 2014-12-24:
>> Cc: iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
>> KVM list; Eric Auger
>> Subject: RE: [v3 06/26] iommu, x86: No need to migrating irq for
>> VT-d Posted-Interrupts
>> 
>> Jiang Liu wrote on 2014-12-24:
>>> On 2014/12/24 9:38, Zhang, Yang Z wrote:
>>>> Paolo Bonzini wrote on 2014-12-23:
>>>>> 
>>>>> 
>>>>> On 23/12/2014 10:07, Wu, Feng wrote:
>>>>>>> On 23/12/2014 01:37, Zhang, Yang Z wrote:
>>>>>>>> I don't quite understand it. If user set an interrupt's affinity
>>>>>>>> to a CPU, but he still see the interrupt delivers to other CPUs
>>>>>>>> in host. Do you think it is a right behavior?
>>>>>>> 
>>>>>>> No, the interrupt is not delivered at all in the host. Normally
>>>>>>> you'd have:
>>>>>>> 
>>>>>>> - interrupt delivered to CPU from host affinity
>>>>>>> 
>>>>>>> - VFIO interrupt handler writes to irqfd
>>>>>>> 
>>>>>>> - interrupt delivered to vCPU from guest affinity
>>>>>>> 
>>>>>>> Here, you just skip the first two steps.  The interrupt is
>>>>>>> delivered to the thread that is running the vCPU directly, so
>>>>>>> the host affinity is bypassed entirely.
>>>>>>> 
>>>>>>> ... unless you are considering the case where the vCPU is
>>>>>>> blocked and the host is processing the posted interrupt wakeup vector.
>>>>>>> In that case yes, it would be better to set NDST to a CPU
>>>>>>> matching the host
>>> affinity.
>>>>>> 
>>>>>> In my understanding, wakeup vector should have no relationship
>>>>>> with the host affinity of the irq. Wakeup notification event
>>>>>> should be delivered to the pCPU which the vCPU was blocked on.
>>>>>> And in kernel's point of view, the irq is not associated with
>>>>>> the wakeup vector,
>> right?
>>>>> 
>>>>> That is correct indeed.  It is not associated to the wakeup
>>>>> vector, hence this patch is right, I think.
>>>>> 
>>>>> However, the wakeup vector has the same function as the VFIO
>>>>> interrupt handler, so you could argue that it is tied to the
>>>>> host affinity rather than the guest.  Let's wait for Yang to answer.
>>>> 
>>>> Actually, that's my original question too. I am wondering what
>>>> happens if the
>>> user changes the assigned device's affinity in host's /proc/irq/? If
>>> ignore it is acceptable, then this patch is ok. But it seems the
>>> discussion out of my scope, need some experts to tell us their idea
>>> since it will impact the user experience. Hi Yang,
>> 
>> Hi Jiang,
>> 
>>> Originally we have a proposal to return failure when user sets
>>> IRQ affinity through native OS interfaces if an IRQ is in PI mode.
>>> But that proposal will break CPU hot-removal because OS needs to
>>> migrate away all IRQs binding to the CPU to be offlined. Then we
>>> propose saving user IRQ affinity setting without changing hardware
>>> configuration (keeping PI configuration). Later when PI mode is
>>> disabled, the cached affinity setting will be used to setup IRQ
>>> destination for native OS. On the other hand, for IRQ in PI mode,
>>> it won't be delivered to native OS, so user may not sense that the
>>> IRQ is
>> delivered to CPUs other than those in the affinity set.
>> 
>> The IRQ is still there but will be delivered to host in the form of
>> PI event(if the VCPU is running in root-mode). I am not sure whether
>> those interrupts should be reflected in /proc/interrupts? If the
>> answer is yes, then which entries should be used, a new PI entry or
>> use the
> original IRQ entry?
> 
> Even though, setting the affinity of the IRQ in host should not affect
> the destination of the PI event (normal notification event of wakeup

This is your implementation. To me, disable PI if the VCPU is going to 
run in the CPU out of IRQ affinity bitmap also is acceptable. And it will 
keep the user interface looks the same as before. 

Hi Thomas, Ingo, Peter

Can you guys help to review this patch? Really appreciate if you can give
some feedbacks.

> notification event), because the destination of the PI event is
> determined in NDST field of Posted-interrupts descriptor and PI
> notification vector is global. Just had a discussion with Jiang
> offline, maybe we can add the statistics information for the notification 
> vector in /proc/interrupts just like any other global interrupts.
> 
> Thanks,
> Feng
> 
>> 
>>> In that aspect, I think it's acceptable:) Regards!
>> 
>> Yes, if all of you guys(especially the IRQ maintainer) are think it
>> is acceptable then we can follow current implementation and document it.
>> 
>>> Gerry
>>>> 
>>>>> 
>>>>> Paolo
>>>> 
>>>> 
>>>> Best regards,
>>>> Yang
>>>> 
>>>> 
>> 
>> 
>> Best regards,
>> Yang
>>


Best regards,
Yang


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts

2014-12-23 Thread Zhang, Yang Z
Jiang Liu wrote on 2014-12-24:
> On 2014/12/24 9:38, Zhang, Yang Z wrote:
>> Paolo Bonzini wrote on 2014-12-23:
>>> 
>>> 
>>> On 23/12/2014 10:07, Wu, Feng wrote:
>>>>> On 23/12/2014 01:37, Zhang, Yang Z wrote:
>>>>>> I don't quite understand it. If user set an interrupt's affinity
>>>>>> to a CPU, but he still see the interrupt delivers to other CPUs in host.
>>>>>> Do you think it is a right behavior?
>>>>> 
>>>>> No, the interrupt is not delivered at all in the host.  Normally you'd 
>>>>> have:
>>>>> 
>>>>> - interrupt delivered to CPU from host affinity
>>>>> 
>>>>> - VFIO interrupt handler writes to irqfd
>>>>> 
>>>>> - interrupt delivered to vCPU from guest affinity
>>>>> 
>>>>> Here, you just skip the first two steps.  The interrupt is
>>>>> delivered to the thread that is running the vCPU directly, so the
>>>>> host affinity is bypassed entirely.
>>>>> 
>>>>> ... unless you are considering the case where the vCPU is blocked
>>>>> and the host is processing the posted interrupt wakeup vector.
>>>>> In that case yes, it would be better to set NDST to a CPU
>>>>> matching the host
> affinity.
>>>> 
>>>> In my understanding, wakeup vector should have no relationship
>>>> with the host affinity of the irq. Wakeup notification event
>>>> should be delivered to the pCPU which the vCPU was blocked on. And
>>>> in kernel's point of view, the irq is not associated with the wakeup 
>>>> vector, right?
>>> 
>>> That is correct indeed.  It is not associated to the wakeup vector,
>>> hence this patch is right, I think.
>>> 
>>> However, the wakeup vector has the same function as the VFIO
>>> interrupt handler, so you could argue that it is tied to the host
>>> affinity rather than the guest.  Let's wait for Yang to answer.
>> 
>> Actually, that's my original question too. I am wondering what
>> happens if the
> user changes the assigned device's affinity in host's /proc/irq/? If
> ignore it is acceptable, then this patch is ok. But it seems the
> discussion out of my scope, need some experts to tell us their idea since it 
> will impact the user experience.
> Hi Yang,

Hi Jiang,

>   Originally we have a proposal to return failure when user sets IRQ
> affinity through native OS interfaces if an IRQ is in PI mode. But
> that proposal will break CPU hot-removal because OS needs to migrate
> away all IRQs binding to the CPU to be offlined. Then we propose
> saving user IRQ affinity setting without changing hardware
> configuration (keeping PI configuration). Later when PI mode is
> disabled, the cached affinity setting will be used to setup IRQ
> destination for native OS. On the other hand, for IRQ in PI mode, it
> won't be delivered to native OS, so user may not sense that the IRQ is 
> delivered to CPUs other than those in the affinity set.

The IRQ is still there but will be delivered to host in the form of PI event(if 
the VCPU is running in root-mode). I am not sure whether those interrupts 
should be reflected in /proc/interrupts? If the answer is yes, then which 
entries should be used, a new PI entry or use the original IRQ entry?

> In that aspect, I think it's acceptable:) Regards!

Yes, if all of you guys(especially the IRQ maintainer) are think it is 
acceptable then we can follow current implementation and document it.

> Gerry
>> 
>>> 
>>> Paolo
>> 
>> 
>> Best regards,
>> Yang
>> 
>>


Best regards,
Yang


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts

2014-12-23 Thread Zhang, Yang Z
Paolo Bonzini wrote on 2014-12-23:
> 
> 
> On 23/12/2014 10:07, Wu, Feng wrote:
>>> On 23/12/2014 01:37, Zhang, Yang Z wrote:
>>>> I don't quite understand it. If user set an interrupt's affinity
>>>> to a CPU, but he still see the interrupt delivers to other CPUs in host.
>>>> Do you think it is a right behavior?
>>> 
>>> No, the interrupt is not delivered at all in the host.  Normally you'd have:
>>> 
>>> - interrupt delivered to CPU from host affinity
>>> 
>>> - VFIO interrupt handler writes to irqfd
>>> 
>>> - interrupt delivered to vCPU from guest affinity
>>> 
>>> Here, you just skip the first two steps.  The interrupt is
>>> delivered to the thread that is running the vCPU directly, so the
>>> host affinity is bypassed entirely.
>>> 
>>> ... unless you are considering the case where the vCPU is blocked
>>> and the host is processing the posted interrupt wakeup vector.  In
>>> that case yes, it would be better to set NDST to a CPU matching the host 
>>> affinity.
>> 
>> In my understanding, wakeup vector should have no relationship with
>> the host affinity of the irq. Wakeup notification event should be
>> delivered to the pCPU which the vCPU was blocked on. And in kernel's
>> point of view, the irq is not associated with the wakeup vector, right?
> 
> That is correct indeed.  It is not associated to the wakeup vector,
> hence this patch is right, I think.
> 
> However, the wakeup vector has the same function as the VFIO interrupt
> handler, so you could argue that it is tied to the host affinity
> rather than the guest.  Let's wait for Yang to answer.

Actually, that's my original question too. I am wondering what happens if the 
user changes the assigned device's affinity in host's /proc/irq/? If ignore it 
is acceptable, then this patch is ok. But it seems the discussion out of my 
scope, need some experts to tell us their idea since it will impact the user 
experience. 

> 
> Paolo


Best regards,
Yang


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI

2014-12-22 Thread Zhang, Yang Z
Paolo Bonzini wrote on 2014-12-23:
>> The problem is we still need to support PI with lowest priority
>> delivery mode
> even if guest does not configure irq affinity via /proc/irq/. Don't we?
> 
> Yes, but we can get the basic support working first.
> 
> I and Feng talked on irc and agreed to start with a simple
> implementation. Then he can add support for vector hashing for all
> interrupts, both vt-d pi and normal LAPIC interrupts.

Agree. Obviously, current PI has some limitations to achieve highest 
performance. We can start with a simple implementation but must ensure we don't 
change hardware's behavior(From guest's point). Feng's current implementation 
or use the way I suggested are both ok since both of them cannot solve the 
problem well.

> 
> Paolo
>


Best regards,
Yang


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts

2014-12-22 Thread Zhang, Yang Z
Paolo Bonzini wrote on 2014-12-19:
> 
> 
> On 19/12/2014 02:46, Zhang, Yang Z wrote:
>>> If the IRQ is posted, its affinity is controlled by guest (irq
>>> <---> vCPU <> pCPU), it has no effect when host changes its affinity.
>> 
>> That's the problem: User is able to changes it in host but it never
>> takes effect since it is actually controlled by guest. I guess it
>> will break the IRQ balance too.
> 
> I don't think that's a problem.
> 
> Controlling the affinity in the host affects which CPU in the host
> takes care of signaling the guest.
> 
> If this signaling is done directly by the chipset, there is no need to
> do anything in the host and thus the host affinity can be bypassed.

I don't quite understand it. If user set an interrupt's affinity to a CPU, but 
he still see the interrupt delivers to other CPUs in host. Do you think it is a 
right behavior?

> 
> Paolo


Best regards,
Yang

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set

2014-12-18 Thread Zhang, Yang Z
Wu, Feng wrote on 2014-12-19:
> 
> 
> Zhang, Yang Z wrote on 2014-12-19:
>> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is
>> set
>> 
>> Wu, Feng wrote on 2014-12-19:
>>> 
>>> 
>>> Zhang, Yang Z wrote on 2014-12-19:
>>>> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN'
>>>> is set
>>>> 
>>>> Wu, Feng wrote on 2014-12-19:
>>>>> 
>>>>> 
>>>>> Zhang, Yang Z wrote on 2014-12-19:
>>>>>> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN'
>>>>>> is set
>>>>>> 
>>>>>> Wu, Feng wrote on 2014-12-19:
>>>>>>> 
>>>>>>> 
>>>>>>> iommu-boun...@lists.linux-foundation.org wrote on
>>>>>> mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of:
>>>>>>>> Cc: iommu@lists.linux-foundation.org;
>>>>>>>> linux-ker...@vger.kernel.org; k...@vger.kernel.org
>>>>>>>> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN'
>>>>>>>> is set
>>>>>>>> 
>>>>>>>> Paolo Bonzini wrote on 2014-12-18:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 18/12/2014 04:14, Wu, Feng wrote:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> linux-kernel-ow...@vger.kernel.org wrote on
>>>>>>>> mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Paolo:
>>>>>>>>>>> x...@kernel.org; Gleb Natapov; Paolo Bonzini;
>>>>>>>>>>> dw...@infradead.org;
>>>>>>>>>>> joro-zlv9swrftaidnm+yrof...@public.gmane.org; Alex Williamson;
>>>>>>>>>>> joro-zLv9SwRftAIdnm+Jiang Liu Cc:
>>>>>>>>>>> iommu@lists.linux-foundation.org;
>>>>>>>>>>> linux-kernel-u79uwxl29ty76z2rm5m...@public.gmane.org;
> KVM
>> list;
>>>>>>>>>>> Eric Auger Subject: Re: [v3 25/26] KVM: Suppress
>>>>>>>>>>> posted-interrupt when 'SN' is set
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On 12/12/2014 16:14, Feng Wu wrote:
>>>>>>>>>>>> Currently, we don't support urgent interrupt, all
>>>>>>>>>>>> interrupts are recognized as non-urgent interrupt, so we
>>>>>>>>>>>> cannot send posted-interrupt when 'SN' is set.
>>>>>>>>>>> 
>>>>>>>>>>> Can this happen?  If the vcpu is in guest mode, it cannot
>>>>>>>>>>> have been scheduled out, and that's the only case when SN is set.
>>>>>>>>>>> 
>>>>>>>>>>> Paolo
>>>>>>>>>> 
>>>>>>>>>> Currently, the only place where SN is set is vCPU is
>>>>>>>>>> preempted and
>>>>>>>> 
>>>>>>>> If the vCPU is preempted, shouldn't the subsequent be ignored?
>>>>>>>> What happens if a PI is occurs when vCPU is preempted?
>>>>>>> 
>>>>>>> If a vCPU is preempted, the 'SN' bit is set, the subsequent
>>>>>>> interrupts are suppressed for posting.
>>>>>> 
>>>>>> I mean what happens if we don't set SN bit. From my point, if
>>>>>> preempter already disabled the interrupt, it is ok to leave SN
>>>>>> bit as zero. But if preempter enabled the interrupt, doesn't
>>>>>> this mean he allow interrupt to happen? BTW, since there
>>>>>> already has ON bit, so this means there only have one interrupt
>>>>>> arrived at most and it doesn't hurt performance. Do we really need to 
>>>>>> set SN bit?
>>>>> 
>>>>> 
>>>>> See this scenario:
>>>>> vCPU0 is running on pCPU0
>>>>> --> vCPU0 is preempted by vCPU1
>>>>> --> Then vCPU1 is running on pCPU0 and vCPU0 is waiting for
>>>>> --&g

RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set

2014-12-18 Thread Zhang, Yang Z
Wu, Feng wrote on 2014-12-19:
> 
> 
> Zhang, Yang Z wrote on 2014-12-19:
>> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is
>> set
>> 
>> Wu, Feng wrote on 2014-12-19:
>>> 
>>> 
>>> Zhang, Yang Z wrote on 2014-12-19:
>>>> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN'
>>>> is set
>>>> 
>>>> Wu, Feng wrote on 2014-12-19:
>>>>> 
>>>>> 
>>>>> iommu-boun...@lists.linux-foundation.org wrote on
>>>> mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of:
>>>>>> Cc: iommu@lists.linux-foundation.org;
>>>>>> linux-ker...@vger.kernel.org; k...@vger.kernel.org
>>>>>> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN'
>>>>>> is set
>>>>>> 
>>>>>> Paolo Bonzini wrote on 2014-12-18:
>>>>>>> 
>>>>>>> 
>>>>>>> On 18/12/2014 04:14, Wu, Feng wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> linux-kernel-ow...@vger.kernel.org wrote on
>>>>>> mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Paolo:
>>>>>>>>> x...@kernel.org; Gleb Natapov; Paolo Bonzini;
>>>>>>>>> dw...@infradead.org;
>>>>>>>>> joro-zlv9swrftaidnm+yrof...@public.gmane.org; Alex Williamson;
>>>>>>>>> joro-zLv9SwRftAIdnm+Jiang Liu Cc:
>>>>>>>>> iommu@lists.linux-foundation.org;
>>>>>>>>> linux-kernel-u79uwxl29ty76z2rm5m...@public.gmane.org; KVM list;
>>>>>>>>> Eric Auger Subject: Re: [v3 25/26] KVM: Suppress
>>>>>>>>> posted-interrupt when 'SN' is set
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 12/12/2014 16:14, Feng Wu wrote:
>>>>>>>>>> Currently, we don't support urgent interrupt, all
>>>>>>>>>> interrupts are recognized as non-urgent interrupt, so we
>>>>>>>>>> cannot send posted-interrupt when 'SN' is set.
>>>>>>>>> 
>>>>>>>>> Can this happen?  If the vcpu is in guest mode, it cannot
>>>>>>>>> have been scheduled out, and that's the only case when SN is set.
>>>>>>>>> 
>>>>>>>>> Paolo
>>>>>>>> 
>>>>>>>> Currently, the only place where SN is set is vCPU is
>>>>>>>> preempted and
>>>>>> 
>>>>>> If the vCPU is preempted, shouldn't the subsequent be ignored?
>>>>>> What happens if a PI is occurs when vCPU is preempted?
>>>>> 
>>>>> If a vCPU is preempted, the 'SN' bit is set, the subsequent
>>>>> interrupts are suppressed for posting.
>>>> 
>>>> I mean what happens if we don't set SN bit. From my point, if
>>>> preempter already disabled the interrupt, it is ok to leave SN
>>>> bit as zero. But if preempter enabled the interrupt, doesn't this
>>>> mean he allow interrupt to happen? BTW, since there already has
>>>> ON bit, so this means there only have one interrupt arrived at
>>>> most and it doesn't hurt performance. Do we really need to set SN bit?
>>> 
>>> 
>>> See this scenario:
>>> vCPU0 is running on pCPU0
>>> --> vCPU0 is preempted by vCPU1
>>> --> Then vCPU1 is running on pCPU0 and vCPU0 is waiting for
>>> --> schedule in runqueue
>>> 
>>> If the we don't set SN for vCPU0, then all subsequent interrupts
>>> for
>>> vCPU0 is posted to vCPU1, this will consume hardware and software
>> 
>> The PI vector for vCPU1 is notification vector, but the PI vector
>> for
>> vCPU0 should be wakeup vector. Why vCPU1 will consume this PI event?
> 
> Wakeup vector is only used for blocking case, when vCPU is preempted
> and waiting in the runqueue, the NV is the notification vector.

I see your point. But from performance point, if we can schedule the vCPU to 
another PCPU to handle the interrupt, it would helpful. But I remember current 
KVM will not schedule the vCPU in run queue (even though it got preempted) to 
another pCPU to run(Am I right?). So it may hard to do it.

> 
> Thanks,
> Feng
> 
>> 
>>> efforts and in fact it is not needed at all. If SN is set for
>>> vCPU0, VT-d hardware will not issue Notification Event for vCPU0
>>> when an interrupt is for it, but just setting the related PIR bit.
>>> 
>>> Thanks,
>>> Feng
>>> 
>>>> 
>>>>> 
>>>>> Thanks,
>>>>> Feng
>>>>> 
>>>>>> 
>>>>>>>> waiting for the next scheduling in the runqueue. But I am not
>>>>>>>> sure whether we need to set SN for other purpose in future.
>>>>>>>> Adding SN checking here is just to follow the Spec.
>>>>>>>> non-urgent interrupts are suppressed
>>>>>>> when SN is set.
>>>>>>> 
>>>>>>> I would change that to a WARN_ON_ONCE then.
>>>>>> 
>>>>>> 
>>>>>> Best regards,
>>>>>> Yang
>>>>>> 
>>>>>> 
>>>>>> ___
>>>>>> iommu mailing list
>>>>>> iommu@lists.linux-foundation.org
>>>>>> https://lists.linuxfoundation.org/mailman/listinfo/iommu
>>>> 
>>>> 
>>>> Best regards,
>>>> Yang
>>>> 
>> 
>> 
>> Best regards,
>> Yang
>>


Best regards,
Yang


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set

2014-12-18 Thread Zhang, Yang Z
Wu, Feng wrote on 2014-12-19:
> 
> 
> Zhang, Yang Z wrote on 2014-12-19:
>> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is
>> set
>> 
>> Wu, Feng wrote on 2014-12-19:
>>> 
>>> 
>>> iommu-boun...@lists.linux-foundation.org wrote on
>> mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of:
>>>> Cc: iommu@lists.linux-foundation.org;
>>>> linux-ker...@vger.kernel.org; k...@vger.kernel.org
>>>> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN'
>>>> is set
>>>> 
>>>> Paolo Bonzini wrote on 2014-12-18:
>>>>> 
>>>>> 
>>>>> On 18/12/2014 04:14, Wu, Feng wrote:
>>>>>> 
>>>>>> 
>>>>>> linux-kernel-ow...@vger.kernel.org wrote on
>>>> mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Paolo:
>>>>>>> x...@kernel.org; Gleb Natapov; Paolo Bonzini; dw...@infradead.org;
>>>>>>> joro-zlv9swrftaidnm+yrof...@public.gmane.org; Alex Williamson;
>>>>>>> joro-zLv9SwRftAIdnm+Jiang Liu Cc:
>>>>>>> iommu@lists.linux-foundation.org;
>>>>>>> linux-kernel-u79uwxl29ty76z2rm5m...@public.gmane.org; KVM list;
>>>>>>> Eric Auger Subject: Re: [v3 25/26] KVM: Suppress posted-interrupt
>>>>>>> when 'SN' is set
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On 12/12/2014 16:14, Feng Wu wrote:
>>>>>>>> Currently, we don't support urgent interrupt, all interrupts
>>>>>>>> are recognized as non-urgent interrupt, so we cannot send
>>>>>>>> posted-interrupt when 'SN' is set.
>>>>>>> 
>>>>>>> Can this happen?  If the vcpu is in guest mode, it cannot have
>>>>>>> been scheduled out, and that's the only case when SN is set.
>>>>>>> 
>>>>>>> Paolo
>>>>>> 
>>>>>> Currently, the only place where SN is set is vCPU is preempted
>>>>>> and
>>>> 
>>>> If the vCPU is preempted, shouldn't the subsequent be ignored?
>>>> What happens if a PI is occurs when vCPU is preempted?
>>> 
>>> If a vCPU is preempted, the 'SN' bit is set, the subsequent
>>> interrupts are suppressed for posting.
>> 
>> I mean what happens if we don't set SN bit. From my point, if
>> preempter already disabled the interrupt, it is ok to leave SN bit
>> as zero. But if preempter enabled the interrupt, doesn't this mean
>> he allow interrupt to happen? BTW, since there already has ON bit,
>> so this means there only have one interrupt arrived at most and it
>> doesn't hurt performance. Do we really need to set SN bit?
> 
> 
> See this scenario:
> vCPU0 is running on pCPU0
> --> vCPU0 is preempted by vCPU1
> --> Then vCPU1 is running on pCPU0 and vCPU0 is waiting for schedule
> --> in runqueue
> 
> If the we don't set SN for vCPU0, then all subsequent interrupts for
> vCPU0 is posted to vCPU1, this will consume hardware and software

The PI vector for vCPU1 is notification vector, but the PI vector for vCPU0 
should be wakeup vector. Why vCPU1 will consume this PI event?

> efforts and in fact it is not needed at all. If SN is set for vCPU0,
> VT-d hardware will not issue Notification Event for vCPU0 when an
> interrupt is for it, but just setting the related PIR bit.
> 
> Thanks,
> Feng
> 
>> 
>>> 
>>> Thanks,
>>> Feng
>>> 
>>>> 
>>>>>> waiting for the next scheduling in the runqueue. But I am not
>>>>>> sure whether we need to set SN for other purpose in future.
>>>>>> Adding SN checking here is just to follow the Spec. non-urgent
>>>>>> interrupts are suppressed
>>>>> when SN is set.
>>>>> 
>>>>> I would change that to a WARN_ON_ONCE then.
>>>> 
>>>> 
>>>> Best regards,
>>>> Yang
>>>> 
>>>> 
>>>> ___
>>>> iommu mailing list
>>>> iommu@lists.linux-foundation.org
>>>> https://lists.linuxfoundation.org/mailman/listinfo/iommu
>> 
>> 
>> Best regards,
>> Yang
>>


Best regards,
Yang


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set

2014-12-18 Thread Zhang, Yang Z
Wu, Feng wrote on 2014-12-19:
> 
> 
> iommu-boun...@lists.linux-foundation.org wrote on 
> mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of:
>> Cc: iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
>> k...@vger.kernel.org
>> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is
>> set
>> 
>> Paolo Bonzini wrote on 2014-12-18:
>>> 
>>> 
>>> On 18/12/2014 04:14, Wu, Feng wrote:
 
 
 linux-kernel-ow...@vger.kernel.org wrote on
>> mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Paolo:
> x...@kernel.org; Gleb Natapov; Paolo Bonzini;
> dw...@infradead.org;
> joro-zlv9swrftaidnm+yrof...@public.gmane.org; Alex Williamson;
> joro-zLv9SwRftAIdnm+Jiang
> Liu
> Cc: iommu@lists.linux-foundation.org;
> linux-kernel-u79uwxl29ty76z2rm5m...@public.gmane.org; KVM list;
> Eric Auger
> Subject: Re: [v3 25/26] KVM: Suppress posted-interrupt when 'SN'
> is set
> 
> 
> 
> On 12/12/2014 16:14, Feng Wu wrote:
>> Currently, we don't support urgent interrupt, all interrupts
>> are recognized as non-urgent interrupt, so we cannot send
>> posted-interrupt when 'SN' is set.
> 
> Can this happen?  If the vcpu is in guest mode, it cannot have
> been scheduled out, and that's the only case when SN is set.
> 
> Paolo
 
 Currently, the only place where SN is set is vCPU is preempted
 and
>> 
>> If the vCPU is preempted, shouldn't the subsequent be ignored? What
>> happens if a PI is occurs when vCPU is preempted?
> 
> If a vCPU is preempted, the 'SN' bit is set, the subsequent interrupts
> are suppressed for posting.

I mean what happens if we don't set SN bit. From my point, if preempter already 
disabled the interrupt, it is ok to leave SN bit as zero. But if preempter 
enabled the interrupt, doesn't this mean he allow interrupt to happen? BTW, 
since there already has ON bit, so this means there only have one interrupt 
arrived at most and it doesn't hurt performance. Do we really need to set SN 
bit?

> 
> Thanks,
> Feng
> 
>> 
 waiting for the next scheduling in the runqueue. But I am not
 sure whether we need to set SN for other purpose in future.
 Adding SN checking here is just to follow the Spec. non-urgent
 interrupts are suppressed
>>> when SN is set.
>>> 
>>> I would change that to a WARN_ON_ONCE then.
>> 
>> 
>> Best regards,
>> Yang
>> 
>> 
>> ___
>> iommu mailing list
>> iommu@lists.linux-foundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/iommu


Best regards,
Yang


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI

2014-12-18 Thread Zhang, Yang Z
Wu, Feng wrote on 2014-12-19:
> 
> 
> Paolo Bonzini wrote on 2014-12-19:
>> jiang@linux.intel.com
>> Cc: eric.au...@linaro.org; linux-ker...@vger.kernel.org;
>> iommu@lists.linux-foundation.org; k...@vger.kernel.org
>> Subject: Re: [v3 13/26] KVM: Define a new interface
>> kvm_find_dest_vcpu() for VT-d PI
>> 
>> 
>> 
>> On 18/12/2014 15:49, Zhang, Yang Z wrote:
>>>>> Here, we introduce a similar way with 'apic_arb_prio' to handle
>>>>> guest lowest priority interrtups when VT-d PI is used. Here is
>>>>> the
>>>>> ideas: - Each vCPU has a counter 'round_robin_counter'. - When
>>>>> guests sets an interrupts to lowest priority, we choose the vCPU
>>>>> with smallest 'round_robin_counter' as the destination, then
>>>>> increase it.
>>> 
>>> How this can work well? All subsequent interrupts are delivered to
>>> one vCPU? It shouldn't be the best solution, need more consideration.
>> 
>> Well, it's a hardware limitation.  The alternative (which is easy to
>> implement) is to only do PI for single-CPU interrupts.  This should
>> work well for multiqueue NICs (and of course for UP guests :)), so
>> perhaps it's a good idea to only support that as a first attempt.
>> 
>> Paolo
> 
> Paolo, what do you mean by "single-CPU interrupts"? Do you mean we

It should be same idea as I mentioned on another thread: deliver the interrupt 
to a single CPU(maybe the first matched VCPU?)

> don't support lowest priority interrupts for PI? But Linux OS uses
> lowest priority for most of the case? If so, we can hardly get benefit
> from this feature for Linux guest OS.
> 
> Thanks,
> Feng
> 
>> 
>>> Also, I think you should take the apic_arb_prio into consider
>>> since the priority is for the whole vCPU not for one interrupt.


Best regards,
Yang


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts

2014-12-18 Thread Zhang, Yang Z
Wu, Feng wrote on 2014-12-19:
> 
> 
> Zhang, Yang Z wrote on 2014-12-18:
>> jiang@linux.intel.com
>> Cc: eric.au...@linaro.org; linux-ker...@vger.kernel.org;
>> iommu@lists.linux-foundation.org; k...@vger.kernel.org; Wu, Feng
>> Subject: RE: [v3 06/26] iommu, x86: No need to migrating irq for
>> VT-d Posted-Interrupts
>> 
>> Feng Wu wrote on 2014-12-12:
>>> We don't need to migrate the irqs for VT-d Posted-Interrupts here.
>>> When 'pst' is set in IRTE, the associated irq will be posted to
>>> guests instead of interrupt remapping. The destination of the
>>> interrupt is set in Posted-Interrupts Descriptor, and the
>>> migration happens during vCPU scheduling.
>>> 
>>> However, we still update the cached irte here, which can be used
>>> when changing back to remapping mode.
>>> 
>>> Signed-off-by: Feng Wu 
>>> Reviewed-by: Jiang Liu 
>>> ---
>>>  drivers/iommu/intel_irq_remapping.c | 6 +-
>>>  1 file changed, 5 insertions(+), 1 deletion(-) diff --git
>>> a/drivers/iommu/intel_irq_remapping.c
>>> b/drivers/iommu/intel_irq_remapping.c index 48c2051..ab9057a
>>> 100644
>>> --- a/drivers/iommu/intel_irq_remapping.c +++
>>> b/drivers/iommu/intel_irq_remapping.c @@ -977,6 +977,7 @@
>>> intel_ir_set_affinity(struct irq_data *data, const struct cpumask
>>> *mask,  {
>>> struct intel_ir_data *ir_data = data->chip_data;struct irte 
>>> *irte =
>>>  &ir_data->irte_entry; +struct irte_pi *irte_pi = (struct irte_pi
>>>  *)irte;struct irq_cfg *cfg = irqd_cfg(data);   struct irq_data *parent
>>>  = data->parent_data;   int ret;
>>> @@ -991,7 +992,10 @@ intel_ir_set_affinity(struct irq_data *data,
>>> const struct cpumask *mask,
>>>  */
>>> irte->vector = cfg->vector;
>>> irte->dest_id = IRTE_DEST(cfg->dest_apicid);
>>> -   modify_irte(&ir_data->irq_2_iommu, irte);
>>> +
>>> +   /* We don't need to modify irte if the interrupt is for posting. */
>>> +   if (irte_pi->pst != 1)
>>> +   modify_irte(&ir_data->irq_2_iommu, irte);
>> 
>> What happens if user changes the IRQ affinity manually?
> 
> If the IRQ is posted, its affinity is controlled by guest (irq <--->
> vCPU <> pCPU), it has no effect when host changes its affinity.

That's the problem: User is able to changes it in host but it never takes 
effect since it is actually controlled by guest. I guess it will break the IRQ 
balance too.

> 
> Thanks,
> Feng
> 
>> 
>>> 
>>> /*
>>>  * After this point, all the interrupts will start arriving
>> 
>> 
>> Best regards,
>> Yang
>>


Best regards,
Yang


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI

2014-12-18 Thread Zhang, Yang Z
Paolo Bonzini wrote on 2014-12-19:
> 
> 
> On 18/12/2014 15:49, Zhang, Yang Z wrote:
>>>> Here, we introduce a similar way with 'apic_arb_prio' to handle
>>>> guest lowest priority interrtups when VT-d PI is used. Here is the
>>>> ideas: - Each vCPU has a counter 'round_robin_counter'. - When
>>>> guests sets an interrupts to lowest priority, we choose the vCPU
>>>> with smallest 'round_robin_counter' as the destination, then
>>>> increase it.
>> 
>> How this can work well? All subsequent interrupts are delivered to
>> one vCPU? It shouldn't be the best solution, need more consideration.
> 
> Well, it's a hardware limitation.  The alternative (which is easy to

Agree, it is limited by hardware. But lowest priority distributes the interrupt 
more efficient than fixed mode. And current implementation more likes to switch 
the lowest priority mode to fixed mode. In case of interrupt intensive 
environment, this may be a bottleneck and VM may not benefit greatly from VT-d 
PI. But agree again, it is really a hardware limitation.

> implement) is to only do PI for single-CPU interrupts.  This should
> work well for multiqueue NICs (and of course for UP guests :)), so
> perhaps it's a good idea to only support that as a first attempt.

The more easy way is to deliver the interrupt to the first matched VCPU we 
find. The round_robin_counter really helps nothing here since the interrupt is 
delivered by hardware directly.

> 
> Paolo
> 
>> Also, I think you should take the apic_arb_prio into consider since
>> the priority is for the whole vCPU not for one interrupt.


Best regards,
Yang


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 12/26] KVM: Initialize VT-d Posted-Interrupts Descriptor

2014-12-18 Thread Zhang, Yang Z
Feng Wu wrote on 2014-12-12:
> This patch initializes the VT-d Posted-Interrupts Descriptor.
> 
> Signed-off-by: Feng Wu 
> ---
>  arch/x86/kvm/vmx.c | 27 +++
>  1 file changed, 27 insertions(+)
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index
> 0b1383e..66ca275 100644 --- a/arch/x86/kvm/vmx.c +++
> b/arch/x86/kvm/vmx.c @@ -45,6 +45,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include "trace.h"
> @@ -4433,6 +4434,30 @@ static void ept_set_mmio_spte_mask(void)
>   kvm_mmu_set_mmio_spte_mask((0x3ull << 62) | 0x6ull);  }
> +static void pi_desc_init(struct vcpu_vmx *vmx) {
> + unsigned int dest;
> +
> + if (!irq_remapping_cap(IRQ_POSTING_CAP))
> + return;
> +
> + /*
> +  * Initialize Posted-Interrupt Descriptor
> +  */
> +
> + pi_clear_sn(&vmx->pi_desc);
> + vmx->pi_desc.nv = POSTED_INTR_VECTOR;

Here.

> +
> + /* Physical mode for Notificaiton Event */
> + vmx->pi_desc.ndm = 0;

And from here..

> + dest = cpu_physical_id(vmx->vcpu.cpu);
> +
> + if (x2apic_enabled())
> + vmx->pi_desc.ndst = dest;
> + else
> + vmx->pi_desc.ndst = (dest << 8) & 0xFF00; }
> +

..to here are useless. The right place to update PI descriptor is where vcpu 
got loaded not in initialization.

>  /*
>   * Sets up the vmcs for emulated real mode.
>   */
> @@ -4476,6 +4501,8 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
> 
>   vmcs_write64(POSTED_INTR_NV, POSTED_INTR_VECTOR);
>   vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((&vmx->pi_desc)));
> +
> + pi_desc_init(vmx);
>   }
>  
>   if (ple_gap) {


Best regards,
Yang


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set

2014-12-18 Thread Zhang, Yang Z
Paolo Bonzini wrote on 2014-12-18:
> 
> 
> On 18/12/2014 04:14, Wu, Feng wrote:
>> 
>> 
>> linux-kernel-ow...@vger.kernel.org wrote on 
>> mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Paolo:
>>> x...@kernel.org; Gleb Natapov; Paolo Bonzini; dw...@infradead.org;
>>> joro-zlv9swrftaidnm+yrof...@public.gmane.org; Alex Williamson;
>>> joro-zLv9SwRftAIdnm+Jiang
>>> Liu
>>> Cc: iommu@lists.linux-foundation.org;
>>> linux-kernel-u79uwxl29ty76z2rm5m...@public.gmane.org; KVM list;
>>> Eric Auger
>>> Subject: Re: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is
>>> set
>>> 
>>> 
>>> 
>>> On 12/12/2014 16:14, Feng Wu wrote:
 Currently, we don't support urgent interrupt, all interrupts are
 recognized as non-urgent interrupt, so we cannot send
 posted-interrupt when 'SN' is set.
>>> 
>>> Can this happen?  If the vcpu is in guest mode, it cannot have been
>>> scheduled out, and that's the only case when SN is set.
>>> 
>>> Paolo
>> 
>> Currently, the only place where SN is set is vCPU is preempted and

If the vCPU is preempted, shouldn't the subsequent be ignored? What happens if 
a PI is occurs when vCPU is preempted?

>> waiting for the next scheduling in the runqueue. But I am not sure
>> whether we need to set SN for other purpose in future. Adding SN
>> checking here is just to follow the Spec. non-urgent interrupts are
>> suppressed
> when SN is set.
> 
> I would change that to a WARN_ON_ONCE then.


Best regards,
Yang


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 21/26] x86, irq: Define a global vector for VT-d Posted-Interrupts

2014-12-18 Thread Zhang, Yang Z
Feng Wu wrote on 2014-12-12:
> Currently, we use a global vector as the Posted-Interrupts
> Notification Event for all the vCPUs in the system. We need to
> introduce another global vector for VT-d Posted-Interrtups, which will
> be used to wakeup the sleep vCPU when an external interrupt from a 
> direct-assigned device happens for that vCPU.
> 

Hi Feng, 

Since the idea of two global vectors mechanism is from me, please add me to the 
comments.

> Signed-off-by: Feng Wu 
> ---
>  arch/x86/include/asm/entry_arch.h  |  2 ++
>  arch/x86/include/asm/hardirq.h |  1 +
>  arch/x86/include/asm/hw_irq.h  |  2 ++
>  arch/x86/include/asm/irq_vectors.h |  1 +
>  arch/x86/kernel/entry_64.S |  2 ++
>  arch/x86/kernel/irq.c  | 27 +++
>  arch/x86/kernel/irqinit.c  |  2 ++
>  7 files changed, 37 insertions(+)
> diff --git a/arch/x86/include/asm/entry_arch.h
> b/arch/x86/include/asm/entry_arch.h index dc5fa66..27ca0af 100644 ---
> a/arch/x86/include/asm/entry_arch.h +++
> b/arch/x86/include/asm/entry_arch.h @@ -23,6 +23,8 @@
> BUILD_INTERRUPT(x86_platform_ipi, X86_PLATFORM_IPI_VECTOR)  #ifdef
> CONFIG_HAVE_KVM BUILD_INTERRUPT3(kvm_posted_intr_ipi, POSTED_INTR_VECTOR,
>smp_kvm_posted_intr_ipi)
> +BUILD_INTERRUPT3(kvm_posted_intr_wakeup_ipi, POSTED_INTR_WAKEUP_VECTOR,
> +  smp_kvm_posted_intr_wakeup_ipi)
>  #endif
>  
>  /*
> diff --git a/arch/x86/include/asm/hardirq.h
> b/arch/x86/include/asm/hardirq.h index 0f5fb6b..9866065 100644
> --- a/arch/x86/include/asm/hardirq.h
> +++ b/arch/x86/include/asm/hardirq.h
> @@ -14,6 +14,7 @@ typedef struct {
>  #endif #ifdef CONFIG_HAVE_KVMunsigned int kvm_posted_intr_ipis;
>  +unsigned int kvm_posted_intr_wakeup_ipis; #endifunsigned int
>  x86_platform_ipis;   /* arch dependent */unsigned int apic_perf_irqs;
> diff --git a/arch/x86/include/asm/hw_irq.h
> b/arch/x86/include/asm/hw_irq.h index e7ae6eb..38fac9b 100644
> --- a/arch/x86/include/asm/hw_irq.h
> +++ b/arch/x86/include/asm/hw_irq.h
> @@ -29,6 +29,7 @@
>  extern asmlinkage void apic_timer_interrupt(void);  extern asmlinkage
> void x86_platform_ipi(void);  extern asmlinkage void
> kvm_posted_intr_ipi(void); +extern asmlinkage void
> kvm_posted_intr_wakeup_ipi(void);
>  extern asmlinkage void error_interrupt(void);  extern asmlinkage void
> irq_work_interrupt(void);
> 
> @@ -92,6 +93,7 @@ extern void
> trace_call_function_single_interrupt(void);
>  #define trace_irq_move_cleanup_interrupt  irq_move_cleanup_interrupt
> #define trace_reboot_interrupt  reboot_interrupt  #define
> trace_kvm_posted_intr_ipi kvm_posted_intr_ipi
> +#define trace_kvm_posted_intr_wakeup_ipi kvm_posted_intr_wakeup_ipi
>  #endif /* CONFIG_TRACING */
>  
>  struct irq_domain;
> diff --git a/arch/x86/include/asm/irq_vectors.h
> b/arch/x86/include/asm/irq_vectors.h index b26cb12..dca94f2 100644 ---
> a/arch/x86/include/asm/irq_vectors.h +++
> b/arch/x86/include/asm/irq_vectors.h @@ -105,6 +105,7 @@
>  /* Vector for KVM to deliver posted interrupt IPI */  #ifdef
>  CONFIG_HAVE_KVM #define POSTED_INTR_VECTOR   0xf2 +#define
>  POSTED_INTR_WAKEUP_VECTOR0xf1 #endif
>  
>  /*
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index e61c14a..a598447 100644 --- a/arch/x86/kernel/entry_64.S +++
> b/arch/x86/kernel/entry_64.S @@ -960,6 +960,8 @@ apicinterrupt
> X86_PLATFORM_IPI_VECTOR \  #ifdef CONFIG_HAVE_KVM
>  apicinterrupt3 POSTED_INTR_VECTOR \
>   kvm_posted_intr_ipi smp_kvm_posted_intr_ipi
> +apicinterrupt3 POSTED_INTR_WAKEUP_VECTOR \
> + kvm_posted_intr_wakeup_ipi smp_kvm_posted_intr_wakeup_ipi
>  #endif
>  
>  #ifdef CONFIG_X86_MCE_THRESHOLD
> diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c index
> 922d285..47408c3 100644
> --- a/arch/x86/kernel/irq.c
> +++ b/arch/x86/kernel/irq.c
> @@ -237,6 +237,9 @@ __visible void smp_x86_platform_ipi(struct pt_regs
> *regs)  }
> 
>  #ifdef CONFIG_HAVE_KVM
> +void (*wakeup_handler_callback)(void) = NULL;
> +EXPORT_SYMBOL_GPL(wakeup_handler_callback); +
>  /*
>   * Handler for POSTED_INTERRUPT_VECTOR.
>   */
> @@ -256,6 +259,30 @@ __visible void smp_kvm_posted_intr_ipi(struct
> pt_regs *regs)
> 
>   set_irq_regs(old_regs);
>  }
> +
> +/*
> + * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
> + */
> +__visible void smp_kvm_posted_intr_wakeup_ipi(struct pt_regs *regs) {
> + struct pt_regs *old_regs = set_irq_regs(regs);
> +
> + ack_APIC_irq();
> +
> + irq_enter();
> +
> + exit_idle();
> +
> + inc_irq_stat(kvm_posted_intr_wakeup_ipis);
> +
> + if (wakeup_handler_callback)
> + wakeup_handler_callback();
> +
> + irq_exit();
> +
> + set_irq_regs(old_regs);
> +}
> +
>  #endif
>  
>  __visible void smp_trace_x86_platform_ipi(struct pt_regs *regs) diff
> --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c index
> 70e181e..844673c 100644 --- a/arch/x86/kernel/irqinit.c +++
> b/arch/x86/kernel/irqinit.c @@ -144,6 +144,8 

RE: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI

2014-12-18 Thread Zhang, Yang Z
Feng Wu wrote on 2014-12-12:
> This patch defines a new interface kvm_find_dest_vcpu for
> VT-d PI, which can returns the destination vCPU of the
> interrupt for guests.
> 
> Since VT-d PI cannot handle broadcast/multicast interrupt,
> Here we only handle Fixed and Lowest priority interrupts.
> 
> The current method of handling guest lowest priority interrtups
> is to use a counter 'apic_arb_prio' for each vCPU, we choose the
> vCPU with smallest 'apic_arb_prio' and then increase it by 1.
> However, for VT-d PI, we cannot re-use this, since we no longer
> have control to 'apic_arb_prio' with posted interrupt direct
> delivery by Hardware.
> 
> Here, we introduce a similar way with 'apic_arb_prio' to handle guest
> lowest priority interrtups when VT-d PI is used. Here is the ideas: -
> Each vCPU has a counter 'round_robin_counter'. - When guests sets an
> interrupts to lowest priority, we choose the vCPU with smallest
> 'round_robin_counter' as the destination, then increase it.
 
How this can work well? All subsequent interrupts are delivered to one vCPU? It 
shouldn't be the best solution, need more consideration. Also, I think you 
should take the apic_arb_prio into consider since the priority is for the whole 
vCPU not for one interrupt.

Best regards,
Yang


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts

2014-12-18 Thread Zhang, Yang Z
Feng Wu wrote on 2014-12-12:
> We don't need to migrate the irqs for VT-d Posted-Interrupts here.
> When 'pst' is set in IRTE, the associated irq will be posted to guests
> instead of interrupt remapping. The destination of the interrupt is
> set in Posted-Interrupts Descriptor, and the migration happens during
> vCPU scheduling.
> 
> However, we still update the cached irte here, which can be used when
> changing back to remapping mode.
> 
> Signed-off-by: Feng Wu 
> Reviewed-by: Jiang Liu 
> ---
>  drivers/iommu/intel_irq_remapping.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> diff --git a/drivers/iommu/intel_irq_remapping.c
> b/drivers/iommu/intel_irq_remapping.c index 48c2051..ab9057a 100644 ---
> a/drivers/iommu/intel_irq_remapping.c +++
> b/drivers/iommu/intel_irq_remapping.c @@ -977,6 +977,7 @@
> intel_ir_set_affinity(struct irq_data *data, const struct cpumask *mask,
>  {
>   struct intel_ir_data *ir_data = data->chip_data;struct irte 
> *irte =
>  &ir_data->irte_entry; +  struct irte_pi *irte_pi = (struct irte_pi
>  *)irte;  struct irq_cfg *cfg = irqd_cfg(data);   struct irq_data *parent
>  = data->parent_data; int ret;
> @@ -991,7 +992,10 @@ intel_ir_set_affinity(struct irq_data *data,
> const struct cpumask *mask,
>*/
>   irte->vector = cfg->vector;
>   irte->dest_id = IRTE_DEST(cfg->dest_apicid);
> - modify_irte(&ir_data->irq_2_iommu, irte);
> +
> + /* We don't need to modify irte if the interrupt is for posting. */
> + if (irte_pi->pst != 1)
> + modify_irte(&ir_data->irq_2_iommu, irte);

What happens if user changes the IRQ affinity manually?

> 
>   /*
>* After this point, all the interrupts will start arriving


Best regards,
Yang


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes

2014-11-12 Thread Zhang, Yang Z
Wu, Feng wrote on 2014-11-13:
> 
> 
> Zhang, Yang Z wrote on 2014-11-13:
>> k...@vger.kernel.org; iommu@lists.linux-foundation.org;
>> linux-ker...@vger.kernel.org
>> Subject: RE: [PATCH 05/13] KVM: Update IRTE according to guest
>> interrupt configuration changes
>> 
>> Wu, Feng wrote on 2014-11-13:
>>> 
>>> 
>>> kvm-ow...@vger.kernel.org wrote on 2014-11-12:
>>>> k...@vger.kernel.org; iommu@lists.linux-foundation.org;
>>>> linux-ker...@vger.kernel.org
>>>> Subject: Re: [PATCH 05/13] KVM: Update IRTE according to guest
>>>> interrupt configuration changes
>>>> 
>>>> 
>>>> 
>>>> On 12/11/2014 10:19, Wu, Feng wrote:
>>>>>> You can certainly backport these patches to distros that do not
>>>>>> have VFIO.  But upstream we should work on VFIO first.  VFIO
>>>>>> has feature parity with legacy device assignment, and adding a
>>>>>> new feature that is not in VFIO would be a bad idea.
>>>>>> 
>>>>>> By the way, do you have benchmark results for it?  We have not
>>>>>> been able to see any performance improvement for APICv on e.g.
> netperf.
>>>>> 
>>>>> Do you mean benchmark results for APICv itself or VT-d
> Posted-Interrtups?
>>>> 
>>>> Especially for VT-d posted interrupts---but it'd be great to know
>>>> which workloads see the biggest speedup from APICv.
>>> 
>>> We have some draft performance data internally, please see the
>>> attached. For VT-d PI, I think we can get the biggest performance gain
>>> if the VCPU is running in non-root mode for most of the time (not in
>>> HLT state), since external interrupt from assigned devices will be
>>> delivered by guest directly in this case. That means we can run some
>>> cpu intensive workload in the guests.
>> 
>> Have you check that the CPU side posted interrupt is taking effect
>> in w/o VT-D PI case? Per my understanding, the performance gap
>> should be so large if you use CPU side posted interrupt. This data
>> more like the VT-d PI vs non PI(both VT-d and CPU).
> 
> Yes, this data is VT-d PI vs Non VT-d PI. The CPU side APICv mechanism
> (including CPU side Posted-Interrtups) is enabled.

>From the CPU utilization data, it seems the environment of APICv is not 
>reasonable to me. with current APICv, the interrupt should not deliver to the 
>PCPU where vcpu is running. Otherwise, it will force the vcpu vmexit and the 
>CPU side posted interrupt cannot take effect. Do you set the interrupt 
>affinity manually?

> 
> Thanks,
> Feng
> 
>> 
>>> 
>>> Thanks,
>>> Feng
>>> 
>>>> 
>>>> Paolo
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>>>> the body of a message to majord...@vger.kernel.org More majordomo
>>>> info at http://vger.kernel.org/majordomo-info.html
>> 
>> 
>> Best regards,
>> Yang
>>


Best regards,
Yang


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes

2014-11-12 Thread Zhang, Yang Z
Wu, Feng wrote on 2014-11-13:
> 
> 
> kvm-ow...@vger.kernel.org wrote on 2014-11-12:
>> k...@vger.kernel.org; iommu@lists.linux-foundation.org; 
>> linux-ker...@vger.kernel.org
>> Subject: Re: [PATCH 05/13] KVM: Update IRTE according to guest 
>> interrupt configuration changes
>> 
>> 
>> 
>> On 12/11/2014 10:19, Wu, Feng wrote:
 You can certainly backport these patches to distros that do not 
 have VFIO.  But upstream we should work on VFIO first.  VFIO has 
 feature parity with legacy device assignment, and adding a new 
 feature that is not in VFIO would be a bad idea.
 
 By the way, do you have benchmark results for it?  We have not been 
 able to see any performance improvement for APICv on e.g. netperf.
>>> 
>>> Do you mean benchmark results for APICv itself or VT-d Posted-Interrtups?
>> 
>> Especially for VT-d posted interrupts---but it'd be great to know 
>> which workloads see the biggest speedup from APICv.
> 
> We have some draft performance data internally, please see the 
> attached. For VT-d PI, I think we can get the biggest performance gain 
> if the VCPU is running in non-root mode for most of the time (not in 
> HLT state), since external interrupt from assigned devices will be delivered 
> by guest directly in this case.
> That means we can run some cpu intensive workload in the guests.

Have you check that the CPU side posted interrupt is taking effect in w/o VT-D 
PI case? Per my understanding, the performance gap should be so large if you 
use CPU side posted interrupt. This data more like the VT-d PI vs non PI(both 
VT-d and CPU).

> 
> Thanks,
> Feng
> 
>> 
>> Paolo
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in the 
>> body of a message to majord...@vger.kernel.org More majordomo info at 
>> http://vger.kernel.org/majordomo-info.html


Best regards,
Yang


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH 05/13] KVM: Update IRTE according to guest interrupt configuration changes

2014-11-11 Thread Zhang, Yang Z
Paolo Bonzini wrote on 2014-11-11:
> 
> 
> On 11/11/2014 10:20, Wu, Feng wrote:
>>> Since legacy KVM device assignment is effectively deprecated, have
>>> you considered how we might do this with VFIO?  Thanks,
>> 
>> I haven't thought about how to enable this in VFIO so far. I think I
>> can continue to implement that if needed after this patch set is finished.
> What do you think of this?
> 
> Hi Feng,
> 
> we are not applying new features to legacy KVM device assignment,
> since it is unsafe (it does not honor ACS).

Personally, I think this feature will be helpful to the legacy device 
assignment. Agree, vfio is the right solution for future feature enabling. But 
the old kvm without the good vfio supporting is still used largely today. The 
user really looking for this feature but they will not upgrade their kernel. 
It's easy for us to backport this feature to old kvm with the legacy device 
assignment, but it is impossible to backport the whole vfio. So I think you 
guys can take a consider to add this feature to both vfio and legacy device 
assignment.

> 
> I and Alex can help you with designing a way to interface VFIO with
> KVM posted interrupts.  Give us a few days to study these patches
> more, or feel free to request comments if you have ideas about it yourself.
> 
> Paolo


Best regards,
Yang

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [PATCH] iommu/intel: SNP bit is not dependent on iommu domain coherency

2014-01-06 Thread Zhang, Yang Z
Alex Williamson wrote on 2013-12-24:
> David,
> 
> Any comments on this patch?  Thanks,
> 

Hi Alex,

There do have some IOMMUs will treat SNP bit in the PTE as reserved (0) and 
will cause a reserved field violation fault if it is set but hardware not 
support snoop-control(bit 7 in ECAP_REG is 0). So your patch seems wrong to me.

> Alex
> 
> On Tue, 2013-10-29 at 10:21 -0600, Alex Williamson wrote:
>> The setting of the SNP bit in the intel-iommu page tables should not
>> be dependent on the current capability of the iommu domain.  The
>> current VT-d spec (2.2) indicates the SNP bit is "treated as
>> reserved[0] by hardware implementations not supporting Snoop Control".
>> Furthermore, section 3.7.3 indicates:
>> 
>>   If the Snoop Control (SC) field in extended capability Register is
>>   reported as 0, snoop behavior for access to the page mapped through
>>   second-level translation is determined by the no-snoop attribute in
>>   the request.
>> This all seems to indicate that hardware incapable of Snoop Control
>> will handle the SNP bit as zero regardless of the value stored in
>> the PTE.
>> 
>> The trouble with the current implementation is that mapping flags
>> depend on the state of the iommu domain at the time of the mapping,
>> yet no attempt is made to update existing mappings when the iommu
>> domain composition changes.  This leaves the iommu domain in a state
>> where some mappings may enforce coherency, others do not, and the
>> user of the IOMMU API has no ability to later enable the desired
>> flags atomically with respect to DMA.
>> 
>> If we always honor the IOMMU_CACHE flag then an IOMMU API user who
>> specifies IOMMU_CACHE for all mappings can assume that the coherency
>> of the mappings within a domain follow the coherency capability of
>> the domain itself.
>> 
>> Signed-off-by: Alex Williamson 
>> ---
>>  drivers/iommu/intel-iommu.c |2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> diff --git a/drivers/iommu/intel-iommu.c
>> b/drivers/iommu/intel-iommu.c index 15e9b57..c46c6a6 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -4084,7 +4084,7 @@ static int intel_iommu_map(struct iommu_domain
> *domain,
>>  prot |= DMA_PTE_READ;
>>  if (iommu_prot & IOMMU_WRITE)
>>  prot |= DMA_PTE_WRITE;
>> -if ((iommu_prot & IOMMU_CACHE) && dmar_domain->iommu_snooping)
>> +if (iommu_prot & IOMMU_CACHE)
>>  prot |= DMA_PTE_SNP;
>>  
>>  max_addr = iova + size;
> 
>


Best regards,
Yang


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu