RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set

2014-12-18 Thread Zhang, Yang Z
Wu, Feng wrote on 2014-12-19:
> 
> 
> Zhang, Yang Z wrote on 2014-12-19:
>> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is
>> set
>> 
>> Wu, Feng wrote on 2014-12-19:
>>> 
>>> 
>>> Zhang, Yang Z wrote on 2014-12-19:
 Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN'
 is set
 
 Wu, Feng wrote on 2014-12-19:
> 
> 
> Zhang, Yang Z wrote on 2014-12-19:
>> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN'
>> is set
>> 
>> Wu, Feng wrote on 2014-12-19:
>>> 
>>> 
>>> iommu-boun...@lists.linux-foundation.org wrote on
>> mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of:
 Cc: iommu@lists.linux-foundation.org;
 linux-ker...@vger.kernel.org; k...@vger.kernel.org
 Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN'
 is set
 
 Paolo Bonzini wrote on 2014-12-18:
> 
> 
> On 18/12/2014 04:14, Wu, Feng wrote:
>> 
>> 
>> linux-kernel-ow...@vger.kernel.org wrote on
 mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Paolo:
>>> x...@kernel.org; Gleb Natapov; Paolo Bonzini;
>>> dw...@infradead.org;
>>> joro-zlv9swrftaidnm+yrof...@public.gmane.org; Alex Williamson;
>>> joro-zLv9SwRftAIdnm+Jiang Liu Cc:
>>> iommu@lists.linux-foundation.org;
>>> linux-kernel-u79uwxl29ty76z2rm5m...@public.gmane.org;
> KVM
>> list;
>>> Eric Auger Subject: Re: [v3 25/26] KVM: Suppress
>>> posted-interrupt when 'SN' is set
>>> 
>>> 
>>> 
>>> On 12/12/2014 16:14, Feng Wu wrote:
 Currently, we don't support urgent interrupt, all
 interrupts are recognized as non-urgent interrupt, so we
 cannot send posted-interrupt when 'SN' is set.
>>> 
>>> Can this happen?  If the vcpu is in guest mode, it cannot
>>> have been scheduled out, and that's the only case when SN is set.
>>> 
>>> Paolo
>> 
>> Currently, the only place where SN is set is vCPU is
>> preempted and
 
 If the vCPU is preempted, shouldn't the subsequent be ignored?
 What happens if a PI is occurs when vCPU is preempted?
>>> 
>>> If a vCPU is preempted, the 'SN' bit is set, the subsequent
>>> interrupts are suppressed for posting.
>> 
>> I mean what happens if we don't set SN bit. From my point, if
>> preempter already disabled the interrupt, it is ok to leave SN
>> bit as zero. But if preempter enabled the interrupt, doesn't
>> this mean he allow interrupt to happen? BTW, since there
>> already has ON bit, so this means there only have one interrupt
>> arrived at most and it doesn't hurt performance. Do we really need to 
>> set SN bit?
> 
> 
> See this scenario:
> vCPU0 is running on pCPU0
> --> vCPU0 is preempted by vCPU1
> --> Then vCPU1 is running on pCPU0 and vCPU0 is waiting for
> --> schedule in runqueue
> 
> If the we don't set SN for vCPU0, then all subsequent interrupts
> for
> vCPU0 is posted to vCPU1, this will consume hardware and
> software
 
 The PI vector for vCPU1 is notification vector, but the PI vector
 for
 vCPU0 should be wakeup vector. Why vCPU1 will consume this PI event?
>>> 
>>> Wakeup vector is only used for blocking case, when vCPU is
>>> preempted and waiting in the runqueue, the NV is the notification vector.
>> 
>> I see your point. But from performance point, if we can schedule the
>> vCPU to another PCPU to handle the interrupt, it would helpful. But I
>> remember current KVM will not schedule the vCPU in run queue (even
>> though it got preempted) to another pCPU to run(Am I right?). So it may
>> hard to do it.
>> 
> 
> KVM is using the Linux scheduler, when the preempted vCPU (in
> runqueue) is scheduled again depends on the scheduling algorithm
> itself, I think it is a little hard for us to get involved.
> 
> I think what you mentioned is a little like the urgent interrupt in VT-d PI 
> Spec.
> For this kind of interrupts, if an interrupt is coming for an
> preempted vCPU (waiting in the run queue), we need to schedule the
> vCPU immediately. This is some real time things. And we don't support urgent 
> interrupt so far.

Yes. IIRC, if we use two global vectors mechanism properly, there should no 
need to use hardware urgent interrupt mechanism. :)

> 
> Thanks,
> Feng
> 
>>> 
>>> Thanks,
>>> Feng
>>> 
 
> efforts and in fact it is not needed at all. If SN is set for
> vCPU0, VT-d hardware will not issue Notification Event for vCPU0
> when an interrupt is for it, but just setting the related PIR bit.
> 
> Thanks,
> Feng
> 
>> 
>>> 
>>> Thanks,
>>> Feng
>>> 
 
>> waiting for the next scheduling in the runqueue. But I am
>> 

RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set

2014-12-18 Thread Wu, Feng


> -Original Message-
> From: Zhang, Yang Z
> Sent: Friday, December 19, 2014 1:26 PM
> To: Wu, Feng; Paolo Bonzini; k...@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
> k...@vger.kernel.org
> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
> 
> Wu, Feng wrote on 2014-12-19:
> >
> >
> > Zhang, Yang Z wrote on 2014-12-19:
> >> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is
> >> set
> >>
> >> Wu, Feng wrote on 2014-12-19:
> >>>
> >>>
> >>> Zhang, Yang Z wrote on 2014-12-19:
>  Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN'
>  is set
> 
>  Wu, Feng wrote on 2014-12-19:
> >
> >
> > iommu-boun...@lists.linux-foundation.org wrote on
>  mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of:
> >> Cc: iommu@lists.linux-foundation.org;
> >> linux-ker...@vger.kernel.org; k...@vger.kernel.org
> >> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN'
> >> is set
> >>
> >> Paolo Bonzini wrote on 2014-12-18:
> >>>
> >>>
> >>> On 18/12/2014 04:14, Wu, Feng wrote:
> 
> 
>  linux-kernel-ow...@vger.kernel.org wrote on
> >> mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Paolo:
> > x...@kernel.org; Gleb Natapov; Paolo Bonzini;
> > dw...@infradead.org;
> > joro-zlv9swrftaidnm+yrof...@public.gmane.org; Alex
> Williamson;
> > joro-zLv9SwRftAIdnm+Jiang Liu Cc:
> > iommu@lists.linux-foundation.org;
> > linux-kernel-u79uwxl29ty76z2rm5m...@public.gmane.org; KVM
> list;
> > Eric Auger Subject: Re: [v3 25/26] KVM: Suppress
> > posted-interrupt when 'SN' is set
> >
> >
> >
> > On 12/12/2014 16:14, Feng Wu wrote:
> >> Currently, we don't support urgent interrupt, all
> >> interrupts are recognized as non-urgent interrupt, so we
> >> cannot send posted-interrupt when 'SN' is set.
> >
> > Can this happen?  If the vcpu is in guest mode, it cannot
> > have been scheduled out, and that's the only case when SN is set.
> >
> > Paolo
> 
>  Currently, the only place where SN is set is vCPU is
>  preempted and
> >>
> >> If the vCPU is preempted, shouldn't the subsequent be ignored?
> >> What happens if a PI is occurs when vCPU is preempted?
> >
> > If a vCPU is preempted, the 'SN' bit is set, the subsequent
> > interrupts are suppressed for posting.
> 
>  I mean what happens if we don't set SN bit. From my point, if
>  preempter already disabled the interrupt, it is ok to leave SN
>  bit as zero. But if preempter enabled the interrupt, doesn't this
>  mean he allow interrupt to happen? BTW, since there already has
>  ON bit, so this means there only have one interrupt arrived at
>  most and it doesn't hurt performance. Do we really need to set SN bit?
> >>>
> >>>
> >>> See this scenario:
> >>> vCPU0 is running on pCPU0
> >>> --> vCPU0 is preempted by vCPU1
> >>> --> Then vCPU1 is running on pCPU0 and vCPU0 is waiting for
> >>> --> schedule in runqueue
> >>>
> >>> If the we don't set SN for vCPU0, then all subsequent interrupts
> >>> for
> >>> vCPU0 is posted to vCPU1, this will consume hardware and software
> >>
> >> The PI vector for vCPU1 is notification vector, but the PI vector
> >> for
> >> vCPU0 should be wakeup vector. Why vCPU1 will consume this PI event?
> >
> > Wakeup vector is only used for blocking case, when vCPU is preempted
> > and waiting in the runqueue, the NV is the notification vector.
> 
> I see your point. But from performance point, if we can schedule the vCPU to
> another PCPU to handle the interrupt, it would helpful. But I remember current
> KVM will not schedule the vCPU in run queue (even though it got preempted) to
> another pCPU to run(Am I right?). So it may hard to do it.
> 

KVM is using the Linux scheduler, when the preempted vCPU (in runqueue) is
scheduled again depends on the scheduling algorithm itself, I think it is a 
little
hard for us to get involved.

I think what you mentioned is a little like the urgent interrupt in VT-d PI 
Spec.
For this kind of interrupts, if an interrupt is coming for an preempted vCPU
(waiting in the run queue), we need to schedule the vCPU immediately. This
is some real time things. And we don't support urgent interrupt so far.

Thanks,
Feng

> >
> > Thanks,
> > Feng
> >
> >>
> >>> efforts and in fact it is not needed at all. If SN is set for
> >>> vCPU0, VT-d hardware will not issue Notification Event for vCPU0
> >>> when an interrupt is for it, but just setting the related PIR bit.
> >>>
> >>> Thanks,
> >>> Feng
> >>>
> 
> >
> > Thanks,
> > Feng
> >
> >>
>  waiting for the next scheduling in the runqueue. But I am not
>  sure whether we need to set SN for other purpose in futu

RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set

2014-12-18 Thread Zhang, Yang Z
Wu, Feng wrote on 2014-12-19:
> 
> 
> Zhang, Yang Z wrote on 2014-12-19:
>> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is
>> set
>> 
>> Wu, Feng wrote on 2014-12-19:
>>> 
>>> 
>>> Zhang, Yang Z wrote on 2014-12-19:
 Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN'
 is set
 
 Wu, Feng wrote on 2014-12-19:
> 
> 
> iommu-boun...@lists.linux-foundation.org wrote on
 mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of:
>> Cc: iommu@lists.linux-foundation.org;
>> linux-ker...@vger.kernel.org; k...@vger.kernel.org
>> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN'
>> is set
>> 
>> Paolo Bonzini wrote on 2014-12-18:
>>> 
>>> 
>>> On 18/12/2014 04:14, Wu, Feng wrote:
 
 
 linux-kernel-ow...@vger.kernel.org wrote on
>> mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Paolo:
> x...@kernel.org; Gleb Natapov; Paolo Bonzini;
> dw...@infradead.org;
> joro-zlv9swrftaidnm+yrof...@public.gmane.org; Alex Williamson;
> joro-zLv9SwRftAIdnm+Jiang Liu Cc:
> iommu@lists.linux-foundation.org;
> linux-kernel-u79uwxl29ty76z2rm5m...@public.gmane.org; KVM list;
> Eric Auger Subject: Re: [v3 25/26] KVM: Suppress
> posted-interrupt when 'SN' is set
> 
> 
> 
> On 12/12/2014 16:14, Feng Wu wrote:
>> Currently, we don't support urgent interrupt, all
>> interrupts are recognized as non-urgent interrupt, so we
>> cannot send posted-interrupt when 'SN' is set.
> 
> Can this happen?  If the vcpu is in guest mode, it cannot
> have been scheduled out, and that's the only case when SN is set.
> 
> Paolo
 
 Currently, the only place where SN is set is vCPU is
 preempted and
>> 
>> If the vCPU is preempted, shouldn't the subsequent be ignored?
>> What happens if a PI is occurs when vCPU is preempted?
> 
> If a vCPU is preempted, the 'SN' bit is set, the subsequent
> interrupts are suppressed for posting.
 
 I mean what happens if we don't set SN bit. From my point, if
 preempter already disabled the interrupt, it is ok to leave SN
 bit as zero. But if preempter enabled the interrupt, doesn't this
 mean he allow interrupt to happen? BTW, since there already has
 ON bit, so this means there only have one interrupt arrived at
 most and it doesn't hurt performance. Do we really need to set SN bit?
>>> 
>>> 
>>> See this scenario:
>>> vCPU0 is running on pCPU0
>>> --> vCPU0 is preempted by vCPU1
>>> --> Then vCPU1 is running on pCPU0 and vCPU0 is waiting for
>>> --> schedule in runqueue
>>> 
>>> If the we don't set SN for vCPU0, then all subsequent interrupts
>>> for
>>> vCPU0 is posted to vCPU1, this will consume hardware and software
>> 
>> The PI vector for vCPU1 is notification vector, but the PI vector
>> for
>> vCPU0 should be wakeup vector. Why vCPU1 will consume this PI event?
> 
> Wakeup vector is only used for blocking case, when vCPU is preempted
> and waiting in the runqueue, the NV is the notification vector.

I see your point. But from performance point, if we can schedule the vCPU to 
another PCPU to handle the interrupt, it would helpful. But I remember current 
KVM will not schedule the vCPU in run queue (even though it got preempted) to 
another pCPU to run(Am I right?). So it may hard to do it.

> 
> Thanks,
> Feng
> 
>> 
>>> efforts and in fact it is not needed at all. If SN is set for
>>> vCPU0, VT-d hardware will not issue Notification Event for vCPU0
>>> when an interrupt is for it, but just setting the related PIR bit.
>>> 
>>> Thanks,
>>> Feng
>>> 
 
> 
> Thanks,
> Feng
> 
>> 
 waiting for the next scheduling in the runqueue. But I am not
 sure whether we need to set SN for other purpose in future.
 Adding SN checking here is just to follow the Spec.
 non-urgent interrupts are suppressed
>>> when SN is set.
>>> 
>>> I would change that to a WARN_ON_ONCE then.
>> 
>> 
>> Best regards,
>> Yang
>> 
>> 
>> ___
>> iommu mailing list
>> iommu@lists.linux-foundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/iommu
 
 
 Best regards,
 Yang
 
>> 
>> 
>> Best regards,
>> Yang
>>


Best regards,
Yang


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set

2014-12-18 Thread Wu, Feng


> -Original Message-
> From: Zhang, Yang Z
> Sent: Friday, December 19, 2014 12:44 PM
> To: Wu, Feng; Paolo Bonzini; k...@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
> k...@vger.kernel.org
> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
> 
> Wu, Feng wrote on 2014-12-19:
> >
> >
> > Zhang, Yang Z wrote on 2014-12-19:
> >> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is
> >> set
> >>
> >> Wu, Feng wrote on 2014-12-19:
> >>>
> >>>
> >>> iommu-boun...@lists.linux-foundation.org wrote on
> >> mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of:
>  Cc: iommu@lists.linux-foundation.org;
>  linux-ker...@vger.kernel.org; k...@vger.kernel.org
>  Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN'
>  is set
> 
>  Paolo Bonzini wrote on 2014-12-18:
> >
> >
> > On 18/12/2014 04:14, Wu, Feng wrote:
> >>
> >>
> >> linux-kernel-ow...@vger.kernel.org wrote on
>  mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Paolo:
> >>> x...@kernel.org; Gleb Natapov; Paolo Bonzini;
> dw...@infradead.org;
> >>> joro-zlv9swrftaidnm+yrof...@public.gmane.org; Alex Williamson;
> >>> joro-zLv9SwRftAIdnm+Jiang Liu Cc:
> >>> iommu@lists.linux-foundation.org;
> >>> linux-kernel-u79uwxl29ty76z2rm5m...@public.gmane.org; KVM
> list;
> >>> Eric Auger Subject: Re: [v3 25/26] KVM: Suppress posted-interrupt
> >>> when 'SN' is set
> >>>
> >>>
> >>>
> >>> On 12/12/2014 16:14, Feng Wu wrote:
>  Currently, we don't support urgent interrupt, all interrupts
>  are recognized as non-urgent interrupt, so we cannot send
>  posted-interrupt when 'SN' is set.
> >>>
> >>> Can this happen?  If the vcpu is in guest mode, it cannot have
> >>> been scheduled out, and that's the only case when SN is set.
> >>>
> >>> Paolo
> >>
> >> Currently, the only place where SN is set is vCPU is preempted
> >> and
> 
>  If the vCPU is preempted, shouldn't the subsequent be ignored?
>  What happens if a PI is occurs when vCPU is preempted?
> >>>
> >>> If a vCPU is preempted, the 'SN' bit is set, the subsequent
> >>> interrupts are suppressed for posting.
> >>
> >> I mean what happens if we don't set SN bit. From my point, if
> >> preempter already disabled the interrupt, it is ok to leave SN bit
> >> as zero. But if preempter enabled the interrupt, doesn't this mean
> >> he allow interrupt to happen? BTW, since there already has ON bit,
> >> so this means there only have one interrupt arrived at most and it
> >> doesn't hurt performance. Do we really need to set SN bit?
> >
> >
> > See this scenario:
> > vCPU0 is running on pCPU0
> > --> vCPU0 is preempted by vCPU1
> > --> Then vCPU1 is running on pCPU0 and vCPU0 is waiting for schedule
> > --> in runqueue
> >
> > If the we don't set SN for vCPU0, then all subsequent interrupts for
> > vCPU0 is posted to vCPU1, this will consume hardware and software
> 
> The PI vector for vCPU1 is notification vector, but the PI vector for vCPU0
> should be wakeup vector. Why vCPU1 will consume this PI event?

Wakeup vector is only used for blocking case, when vCPU is preempted
and waiting in the runqueue, the NV is the notification vector.

Thanks,
Feng

> 
> > efforts and in fact it is not needed at all. If SN is set for vCPU0,
> > VT-d hardware will not issue Notification Event for vCPU0 when an
> > interrupt is for it, but just setting the related PIR bit.
> >
> > Thanks,
> > Feng
> >
> >>
> >>>
> >>> Thanks,
> >>> Feng
> >>>
> 
> >> waiting for the next scheduling in the runqueue. But I am not
> >> sure whether we need to set SN for other purpose in future.
> >> Adding SN checking here is just to follow the Spec. non-urgent
> >> interrupts are suppressed
> > when SN is set.
> >
> > I would change that to a WARN_ON_ONCE then.
> 
> 
>  Best regards,
>  Yang
> 
> 
>  ___
>  iommu mailing list
>  iommu@lists.linux-foundation.org
>  https://lists.linuxfoundation.org/mailman/listinfo/iommu
> >>
> >>
> >> Best regards,
> >> Yang
> >>
> 
> 
> Best regards,
> Yang
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set

2014-12-18 Thread Zhang, Yang Z
Wu, Feng wrote on 2014-12-19:
> 
> 
> Zhang, Yang Z wrote on 2014-12-19:
>> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is
>> set
>> 
>> Wu, Feng wrote on 2014-12-19:
>>> 
>>> 
>>> iommu-boun...@lists.linux-foundation.org wrote on
>> mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of:
 Cc: iommu@lists.linux-foundation.org;
 linux-ker...@vger.kernel.org; k...@vger.kernel.org
 Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN'
 is set
 
 Paolo Bonzini wrote on 2014-12-18:
> 
> 
> On 18/12/2014 04:14, Wu, Feng wrote:
>> 
>> 
>> linux-kernel-ow...@vger.kernel.org wrote on
 mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Paolo:
>>> x...@kernel.org; Gleb Natapov; Paolo Bonzini; dw...@infradead.org;
>>> joro-zlv9swrftaidnm+yrof...@public.gmane.org; Alex Williamson;
>>> joro-zLv9SwRftAIdnm+Jiang Liu Cc:
>>> iommu@lists.linux-foundation.org;
>>> linux-kernel-u79uwxl29ty76z2rm5m...@public.gmane.org; KVM list;
>>> Eric Auger Subject: Re: [v3 25/26] KVM: Suppress posted-interrupt
>>> when 'SN' is set
>>> 
>>> 
>>> 
>>> On 12/12/2014 16:14, Feng Wu wrote:
 Currently, we don't support urgent interrupt, all interrupts
 are recognized as non-urgent interrupt, so we cannot send
 posted-interrupt when 'SN' is set.
>>> 
>>> Can this happen?  If the vcpu is in guest mode, it cannot have
>>> been scheduled out, and that's the only case when SN is set.
>>> 
>>> Paolo
>> 
>> Currently, the only place where SN is set is vCPU is preempted
>> and
 
 If the vCPU is preempted, shouldn't the subsequent be ignored?
 What happens if a PI is occurs when vCPU is preempted?
>>> 
>>> If a vCPU is preempted, the 'SN' bit is set, the subsequent
>>> interrupts are suppressed for posting.
>> 
>> I mean what happens if we don't set SN bit. From my point, if
>> preempter already disabled the interrupt, it is ok to leave SN bit
>> as zero. But if preempter enabled the interrupt, doesn't this mean
>> he allow interrupt to happen? BTW, since there already has ON bit,
>> so this means there only have one interrupt arrived at most and it
>> doesn't hurt performance. Do we really need to set SN bit?
> 
> 
> See this scenario:
> vCPU0 is running on pCPU0
> --> vCPU0 is preempted by vCPU1
> --> Then vCPU1 is running on pCPU0 and vCPU0 is waiting for schedule
> --> in runqueue
> 
> If the we don't set SN for vCPU0, then all subsequent interrupts for
> vCPU0 is posted to vCPU1, this will consume hardware and software

The PI vector for vCPU1 is notification vector, but the PI vector for vCPU0 
should be wakeup vector. Why vCPU1 will consume this PI event?

> efforts and in fact it is not needed at all. If SN is set for vCPU0,
> VT-d hardware will not issue Notification Event for vCPU0 when an
> interrupt is for it, but just setting the related PIR bit.
> 
> Thanks,
> Feng
> 
>> 
>>> 
>>> Thanks,
>>> Feng
>>> 
 
>> waiting for the next scheduling in the runqueue. But I am not
>> sure whether we need to set SN for other purpose in future.
>> Adding SN checking here is just to follow the Spec. non-urgent
>> interrupts are suppressed
> when SN is set.
> 
> I would change that to a WARN_ON_ONCE then.
 
 
 Best regards,
 Yang
 
 
 ___
 iommu mailing list
 iommu@lists.linux-foundation.org
 https://lists.linuxfoundation.org/mailman/listinfo/iommu
>> 
>> 
>> Best regards,
>> Yang
>>


Best regards,
Yang


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set

2014-12-18 Thread Wu, Feng


> -Original Message-
> From: Zhang, Yang Z
> Sent: Friday, December 19, 2014 11:33 AM
> To: Wu, Feng; Paolo Bonzini; k...@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
> k...@vger.kernel.org
> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
> 
> Wu, Feng wrote on 2014-12-19:
> >
> >
> > iommu-boun...@lists.linux-foundation.org wrote on
> mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of:
> >> Cc: iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
> >> k...@vger.kernel.org
> >> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is
> >> set
> >>
> >> Paolo Bonzini wrote on 2014-12-18:
> >>>
> >>>
> >>> On 18/12/2014 04:14, Wu, Feng wrote:
> 
> 
>  linux-kernel-ow...@vger.kernel.org wrote on
> >> mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Paolo:
> > x...@kernel.org; Gleb Natapov; Paolo Bonzini;
> > dw...@infradead.org;
> > joro-zlv9swrftaidnm+yrof...@public.gmane.org; Alex Williamson;
> > joro-zLv9SwRftAIdnm+Jiang
> > Liu
> > Cc: iommu@lists.linux-foundation.org;
> > linux-kernel-u79uwxl29ty76z2rm5m...@public.gmane.org; KVM list;
> > Eric Auger
> > Subject: Re: [v3 25/26] KVM: Suppress posted-interrupt when 'SN'
> > is set
> >
> >
> >
> > On 12/12/2014 16:14, Feng Wu wrote:
> >> Currently, we don't support urgent interrupt, all interrupts
> >> are recognized as non-urgent interrupt, so we cannot send
> >> posted-interrupt when 'SN' is set.
> >
> > Can this happen?  If the vcpu is in guest mode, it cannot have
> > been scheduled out, and that's the only case when SN is set.
> >
> > Paolo
> 
>  Currently, the only place where SN is set is vCPU is preempted
>  and
> >>
> >> If the vCPU is preempted, shouldn't the subsequent be ignored? What
> >> happens if a PI is occurs when vCPU is preempted?
> >
> > If a vCPU is preempted, the 'SN' bit is set, the subsequent interrupts
> > are suppressed for posting.
> 
> I mean what happens if we don't set SN bit. From my point, if preempter
> already disabled the interrupt, it is ok to leave SN bit as zero. But if 
> preempter
> enabled the interrupt, doesn't this mean he allow interrupt to happen? BTW,
> since there already has ON bit, so this means there only have one interrupt
> arrived at most and it doesn't hurt performance. Do we really need to set SN
> bit?


See this scenario:
vCPU0 is running on pCPU0
--> vCPU0 is preempted by vCPU1
--> Then vCPU1 is running on pCPU0 and vCPU0 is waiting for schedule in runqueue

If the we don't set SN for vCPU0, then all subsequent interrupts for vCPU0 is 
posted
to vCPU1, this will consume hardware and software efforts and in fact it is not 
needed
at all. If SN is set for vCPU0, VT-d hardware will not issue Notification Event 
for vCPU0
when an interrupt is for it, but just setting the related PIR bit.

Thanks,
Feng

> 
> >
> > Thanks,
> > Feng
> >
> >>
>  waiting for the next scheduling in the runqueue. But I am not
>  sure whether we need to set SN for other purpose in future.
>  Adding SN checking here is just to follow the Spec. non-urgent
>  interrupts are suppressed
> >>> when SN is set.
> >>>
> >>> I would change that to a WARN_ON_ONCE then.
> >>
> >>
> >> Best regards,
> >> Yang
> >>
> >>
> >> ___
> >> iommu mailing list
> >> iommu@lists.linux-foundation.org
> >> https://lists.linuxfoundation.org/mailman/listinfo/iommu
> 
> 
> Best regards,
> Yang
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set

2014-12-18 Thread Zhang, Yang Z
Wu, Feng wrote on 2014-12-19:
> 
> 
> iommu-boun...@lists.linux-foundation.org wrote on 
> mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of:
>> Cc: iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
>> k...@vger.kernel.org
>> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is
>> set
>> 
>> Paolo Bonzini wrote on 2014-12-18:
>>> 
>>> 
>>> On 18/12/2014 04:14, Wu, Feng wrote:
 
 
 linux-kernel-ow...@vger.kernel.org wrote on
>> mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Paolo:
> x...@kernel.org; Gleb Natapov; Paolo Bonzini;
> dw...@infradead.org;
> joro-zlv9swrftaidnm+yrof...@public.gmane.org; Alex Williamson;
> joro-zLv9SwRftAIdnm+Jiang
> Liu
> Cc: iommu@lists.linux-foundation.org;
> linux-kernel-u79uwxl29ty76z2rm5m...@public.gmane.org; KVM list;
> Eric Auger
> Subject: Re: [v3 25/26] KVM: Suppress posted-interrupt when 'SN'
> is set
> 
> 
> 
> On 12/12/2014 16:14, Feng Wu wrote:
>> Currently, we don't support urgent interrupt, all interrupts
>> are recognized as non-urgent interrupt, so we cannot send
>> posted-interrupt when 'SN' is set.
> 
> Can this happen?  If the vcpu is in guest mode, it cannot have
> been scheduled out, and that's the only case when SN is set.
> 
> Paolo
 
 Currently, the only place where SN is set is vCPU is preempted
 and
>> 
>> If the vCPU is preempted, shouldn't the subsequent be ignored? What
>> happens if a PI is occurs when vCPU is preempted?
> 
> If a vCPU is preempted, the 'SN' bit is set, the subsequent interrupts
> are suppressed for posting.

I mean what happens if we don't set SN bit. From my point, if preempter already 
disabled the interrupt, it is ok to leave SN bit as zero. But if preempter 
enabled the interrupt, doesn't this mean he allow interrupt to happen? BTW, 
since there already has ON bit, so this means there only have one interrupt 
arrived at most and it doesn't hurt performance. Do we really need to set SN 
bit?

> 
> Thanks,
> Feng
> 
>> 
 waiting for the next scheduling in the runqueue. But I am not
 sure whether we need to set SN for other purpose in future.
 Adding SN checking here is just to follow the Spec. non-urgent
 interrupts are suppressed
>>> when SN is set.
>>> 
>>> I would change that to a WARN_ON_ONCE then.
>> 
>> 
>> Best regards,
>> Yang
>> 
>> 
>> ___
>> iommu mailing list
>> iommu@lists.linux-foundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/iommu


Best regards,
Yang


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set

2014-12-18 Thread Wu, Feng


> -Original Message-
> From: iommu-boun...@lists.linux-foundation.org
> [mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of Zhang, Yang Z
> Sent: Thursday, December 18, 2014 11:10 PM
> To: Paolo Bonzini; k...@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org;
> k...@vger.kernel.org
> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
> 
> Paolo Bonzini wrote on 2014-12-18:
> >
> >
> > On 18/12/2014 04:14, Wu, Feng wrote:
> >>
> >>
> >> linux-kernel-ow...@vger.kernel.org wrote on
> mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Paolo:
> >>> x...@kernel.org; Gleb Natapov; Paolo Bonzini; dw...@infradead.org;
> >>> joro-zlv9swrftaidnm+yrof...@public.gmane.org; Alex Williamson;
> >>> joro-zLv9SwRftAIdnm+Jiang
> >>> Liu
> >>> Cc: iommu@lists.linux-foundation.org;
> >>> linux-kernel-u79uwxl29ty76z2rm5m...@public.gmane.org; KVM list;
> >>> Eric Auger
> >>> Subject: Re: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is
> >>> set
> >>>
> >>>
> >>>
> >>> On 12/12/2014 16:14, Feng Wu wrote:
>  Currently, we don't support urgent interrupt, all interrupts are
>  recognized as non-urgent interrupt, so we cannot send
>  posted-interrupt when 'SN' is set.
> >>>
> >>> Can this happen?  If the vcpu is in guest mode, it cannot have been
> >>> scheduled out, and that's the only case when SN is set.
> >>>
> >>> Paolo
> >>
> >> Currently, the only place where SN is set is vCPU is preempted and
> 
> If the vCPU is preempted, shouldn't the subsequent be ignored? What happens
> if a PI is occurs when vCPU is preempted?

If a vCPU is preempted, the 'SN' bit is set, the subsequent interrupts are
suppressed for posting.

Thanks,
Feng

> 
> >> waiting for the next scheduling in the runqueue. But I am not sure
> >> whether we need to set SN for other purpose in future. Adding SN
> >> checking here is just to follow the Spec. non-urgent interrupts are
> >> suppressed
> > when SN is set.
> >
> > I would change that to a WARN_ON_ONCE then.
> 
> 
> Best regards,
> Yang
> 
> 
> ___
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked

2014-12-18 Thread Wu, Feng


> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org
> [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Paolo Bonzini
> Sent: Thursday, December 18, 2014 4:37 PM
> To: linux-ker...@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org; k...@vger.kernel.org;
> linux-ker...@vger.kernel.org; k...@vger.kernel.org
> Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU
> is blocked
> 
> 
> 
> On 18/12/2014 04:16, Wu, Feng wrote:
> >>> pre-block:
> >>> - Add the vCPU to the blocked per-CPU list
> >>> - Clear 'SN'
> >>
> >> Should SN be already clear (and NV set to POSTED_INTR_VECTOR)?
> >
> > I think the SN bit should be clear here, Adding it here is just to make sure
> > SN is clear when vCPU is blocked, so it can receive wakeup notification 
> > event
> later.
> 
> Then, please, WARN if the SN bit is set inside the if (vcpu->blocked).
> Inside that if you can just add the vCPU to the blocked list on vcpu_put.
> 
> >> Can it
> >> happen that you go from sched-out to blocked without doing a sched-in
> first?
> >>
> >
> > I cannot imagine this scenario, can you please be more specific? Thanks a 
> > lot!
> 
> I cannot either. :)  But it would be the case where SN is not cleared.
> So we agree that it cannot happen.
> 
> >> In fact, if this is possible, what happens if vcpu->preempted &&
> >> vcpu->blocked?
> >
> > In fact, vcpu->preempted && vcpu->blocked happens sometimes, but I think
> there is
> > no issues. Please refer to the following case:
> 
> I agree that there should be no issues.  But if it can happen, it's better:
> 
> 1) to separate the handling of preemption and blocking: preemption
> handles SN/NV/NDST, blocking handles the wakeup list.
> 
Sorry, I don't quite understand this.

I think handling of preemption and blocking is separated in vmx_vcpu_put().
For vmx_vcpu_load(), the handling of SN/NV/NDST is common for preemption
and blocking.

Thanks,
Feng

> 2) to change this
> 
> + } else if (vcpu->blocked) {
> + /*
> +  * The vcpu is blocked on the wait queue.
> +  * Store the blocked vCPU on the list of the
> +  * vcpu->wakeup_cpu, which is the destination
> +  * of the wake-up notification event.
> 
> to just
> 
>   }
>   if (vcpu->blocked) {
>   ...
>   }
> > kvm_vcpu_block()
> > -> vcpu->blocked = true;
> > -> prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
> >
> > before schedule() is called, this vcpu is woken up by another guy, so
> > the state of the vcpu associated thread is changed to TASK_RUNNING,
> > then preemption happens after interrupts or the following schedule() is
> > hit, this will call kvm_sched_out(), in which current->state ==
> TASK_RUNNING
> > and vcpu->preempted is set to true. So now vcpu->preempted and
> vcpu->blocked
> > are both true. In vmx_vcpu_put(), we will check vcpu->preempted first, 
> > so
> > the vCPU will not be blocked, and the vcpu->blocked will be set the 
> > false in
> > vmx_vcpu_load().
> >
> > But maybe I need do a little change to the vmx_vcpu_load() like below:
> >
> > /*
> >  * Delete the vCPU from the related wakeup queue
> >  * if we are resuming from blocked state
> >  */
> > if (vcpu->blocked) {
> > vcpu->blocked = false;
> > +   /* if wakeup_cpu == -1, the 
> > vcpu is currently not
> blocked on any
> > + pCPU, don't need dequeue here 
> > */
> > +   if (vcpu->wakeup_cpu != -1) {
> >
> spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > vcpu->wakeup_cpu), flags);
> >  list_del(&vcpu->blocked_vcpu_list);
> >
> spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> > vcpu->wakeup_cpu), flags);
> >  vcpu->wakeup_cpu = -1;
> > +   }
> > }
> 
> Good idea.
> 
> Paolo
> 
> > Any ideas about this? Thanks a lot!
> >
> > Thanks,
> > Feng
> >
> >
> > -> schedule();
> >
> >
> >>
> >>> - Set 'NV' to POSTED_INTR_WAKEUP_VECTOR
> >>>
> >>> post-block:
> >>> - Remove the vCPU from the per-CPU list
> >>
> >> Paolo
> >>
> >>> Signed-off-by: Feng Wu 
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe kvm" in
> >> the body of a message to majord...@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http:

RE: [v3 16/26] KVM: Make struct kvm_irq_routing_table accessible

2014-12-18 Thread Wu, Feng


> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org
> [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Paolo Bonzini
> Sent: Thursday, December 18, 2014 12:18 AM
> To: linux-ker...@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org; k...@vger.kernel.org;
> linux-ker...@vger.kernel.org; k...@vger.kernel.org
> Subject: Re: [v3 16/26] KVM: Make struct kvm_irq_routing_table accessible
> 
> 
> 
> On 12/12/2014 16:14, Feng Wu wrote:
> > Move struct kvm_irq_routing_table from irqchip.c to kvm_host.h,
> > so we can use it outside of irqchip.c.
> >
> > Signed-off-by: Feng Wu 
> > ---
> >  include/linux/kvm_host.h | 19 +++
> >  virt/kvm/irqchip.c   | 11 ---
> >  2 files changed, 19 insertions(+), 11 deletions(-)
> >
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 0b9659d..cfa85ac 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -335,6 +335,25 @@ struct kvm_kernel_irq_routing_entry {
> > struct hlist_node link;
> >  };
> >
> > +#ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
> > +
> > +struct kvm_irq_routing_table {
> > +   int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
> > +   struct kvm_kernel_irq_routing_entry *rt_entries;
> > +   u32 nr_rt_entries;
> > +   /*
> > +* Array indexed by gsi. Each entry contains list of irq chips
> > +* the gsi is connected to.
> > +*/
> > +   struct hlist_head map[0];
> > +};
> > +
> > +#else
> > +
> > +struct kvm_irq_routing_table {};
> 
> If possible, just make this "struct kvm_irq_routing_table;" and pull
> this line to include/linux/kvm_types.h.
> 
> Paolo

Do you mean move the definition of struct kvm_irq_routing_table
to include/linux/kvm_types.h and add a declaration here?

Thanks,
Feng

> 
> > +
> > +#endif
> > +
> >  #ifndef KVM_PRIVATE_MEM_SLOTS
> >  #define KVM_PRIVATE_MEM_SLOTS 0
> >  #endif
> > diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
> > index 7f256f3..cdf29a6 100644
> > --- a/virt/kvm/irqchip.c
> > +++ b/virt/kvm/irqchip.c
> > @@ -31,17 +31,6 @@
> >  #include 
> >  #include "irq.h"
> >
> > -struct kvm_irq_routing_table {
> > -   int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
> > -   struct kvm_kernel_irq_routing_entry *rt_entries;
> > -   u32 nr_rt_entries;
> > -   /*
> > -* Array indexed by gsi. Each entry contains list of irq chips
> > -* the gsi is connected to.
> > -*/
> > -   struct hlist_head map[0];
> > -};
> > -
> >  int kvm_irq_map_gsi(struct kvm *kvm,
> > struct kvm_kernel_irq_routing_entry *entries, int gsi)
> >  {
> >
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 23/26] KVM: Update Posted-Interrupts Descriptor when vCPU is preempted

2014-12-18 Thread Wu, Feng


> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org
> [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Paolo Bonzini
> Sent: Thursday, December 18, 2014 4:32 PM
> To: linux-ker...@vger.kernel.org
> Cc: iommu@lists.linux-foundation.org; k...@vger.kernel.org;
> linux-ker...@vger.kernel.org; k...@vger.kernel.org
> Subject: Re: [v3 23/26] KVM: Update Posted-Interrupts Descriptor when vCPU
> is preempted
> 
> 
> 
> On 18/12/2014 04:15, Wu, Feng wrote:
> > Thanks for your comments, Paolo!
> >
> > If we use u64 new_control, we cannot use new.sn any more.
> > Maybe we can change the struct pi_desc {} like this:
> >
> > typedef struct pid_control{
> > u64 on  : 1,
> > sn  : 1,
> > rsvd_1  : 13,
> > ndm : 1,
> > nv  : 8,
> > rsvd_2  : 8,
> > ndst: 32;
> > }pid_control_t;
> >
> > struct pi_desc {
> > u32 pir[8]; /* Posted interrupt requested */
> > pid_control_t control;
> 
> Probably something like this to keep the union:
> 
> typedef union pid_control {
>   u64 full;
>   struct {
>   u64 on : 1,
>   ...
>   } fields;
> };
> 
> > u32 rsvd[6];
> > } __aligned(64);
> >
> >
> > Then we can define pid_control_t new_control, old_control. And use
> new_control.sn = 0.
> >
> > What is your opinon?
> 
> Sure.  Alternatively, keep using struct pi_desc new; just
> do not zero it, nor access any field outide the control word.
> 
> Paolo

Yes, this is also a good idea. Thanks!

Thanks,
Feng

> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 2/2] iommu: rockchip: Handle system-wide and runtime PM

2014-12-18 Thread Rafael J. Wysocki
On Thursday, December 18, 2014 11:28:58 PM Laurent Pinchart wrote:
> Hi Kevin,
> 

[cut]

> > >> 
> > >> It would be better to be able to reference count the DMA engine from the
> > >> bus master IMO and arguably you can use the runtime PM framework for
> > >> that. Namely, give bus masters someting like
> > >> 
> > >>  pm_runtime_get_my_DMA_engine(bus_master_device)
> > >>  pm_runtime_put_my_DMA_engine(bus_master_device)
> > >> 
> > >> and let them call these as they see fit.
> > > 
> > > Please note that we're not talking about DMA engines here, but about
> > > IOMMUs. DMA is involved through the DMA mapping API which hides the IOMMU
> > > completely from the bus master drivers, not the DMA engine API.
> > > 
> > > Exposing the IOMMU is something we want to avoid, but DMA mapping
> > > start/stop operations could certainly be implemented.
> > 
> > The problem with that is it only solves the IOMMU problem.  We have a
> > more generic PM dependency problem of which this IOMMU example is only a
> > subset, so I think we need a more generic solution.
> 
> I agree that a more generic solution is needed at least to support ACPI _DEP, 
> but that might not be optimal in the IOMMU use case as explained above.

Well, since we need it anyway, why don't we implement it and then figure out
if anything more specific needs to be done for the IOMMU case?


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI

2014-12-18 Thread Zhang, Yang Z
Wu, Feng wrote on 2014-12-19:
> 
> 
> Paolo Bonzini wrote on 2014-12-19:
>> jiang@linux.intel.com
>> Cc: eric.au...@linaro.org; linux-ker...@vger.kernel.org;
>> iommu@lists.linux-foundation.org; k...@vger.kernel.org
>> Subject: Re: [v3 13/26] KVM: Define a new interface
>> kvm_find_dest_vcpu() for VT-d PI
>> 
>> 
>> 
>> On 18/12/2014 15:49, Zhang, Yang Z wrote:
> Here, we introduce a similar way with 'apic_arb_prio' to handle
> guest lowest priority interrtups when VT-d PI is used. Here is
> the
> ideas: - Each vCPU has a counter 'round_robin_counter'. - When
> guests sets an interrupts to lowest priority, we choose the vCPU
> with smallest 'round_robin_counter' as the destination, then
> increase it.
>>> 
>>> How this can work well? All subsequent interrupts are delivered to
>>> one vCPU? It shouldn't be the best solution, need more consideration.
>> 
>> Well, it's a hardware limitation.  The alternative (which is easy to
>> implement) is to only do PI for single-CPU interrupts.  This should
>> work well for multiqueue NICs (and of course for UP guests :)), so
>> perhaps it's a good idea to only support that as a first attempt.
>> 
>> Paolo
> 
> Paolo, what do you mean by "single-CPU interrupts"? Do you mean we

It should be same idea as I mentioned on another thread: deliver the interrupt 
to a single CPU(maybe the first matched VCPU?)

> don't support lowest priority interrupts for PI? But Linux OS uses
> lowest priority for most of the case? If so, we can hardly get benefit
> from this feature for Linux guest OS.
> 
> Thanks,
> Feng
> 
>> 
>>> Also, I think you should take the apic_arb_prio into consider
>>> since the priority is for the whole vCPU not for one interrupt.


Best regards,
Yang


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts

2014-12-18 Thread Zhang, Yang Z
Wu, Feng wrote on 2014-12-19:
> 
> 
> Zhang, Yang Z wrote on 2014-12-18:
>> jiang@linux.intel.com
>> Cc: eric.au...@linaro.org; linux-ker...@vger.kernel.org;
>> iommu@lists.linux-foundation.org; k...@vger.kernel.org; Wu, Feng
>> Subject: RE: [v3 06/26] iommu, x86: No need to migrating irq for
>> VT-d Posted-Interrupts
>> 
>> Feng Wu wrote on 2014-12-12:
>>> We don't need to migrate the irqs for VT-d Posted-Interrupts here.
>>> When 'pst' is set in IRTE, the associated irq will be posted to
>>> guests instead of interrupt remapping. The destination of the
>>> interrupt is set in Posted-Interrupts Descriptor, and the
>>> migration happens during vCPU scheduling.
>>> 
>>> However, we still update the cached irte here, which can be used
>>> when changing back to remapping mode.
>>> 
>>> Signed-off-by: Feng Wu 
>>> Reviewed-by: Jiang Liu 
>>> ---
>>>  drivers/iommu/intel_irq_remapping.c | 6 +-
>>>  1 file changed, 5 insertions(+), 1 deletion(-) diff --git
>>> a/drivers/iommu/intel_irq_remapping.c
>>> b/drivers/iommu/intel_irq_remapping.c index 48c2051..ab9057a
>>> 100644
>>> --- a/drivers/iommu/intel_irq_remapping.c +++
>>> b/drivers/iommu/intel_irq_remapping.c @@ -977,6 +977,7 @@
>>> intel_ir_set_affinity(struct irq_data *data, const struct cpumask
>>> *mask,  {
>>> struct intel_ir_data *ir_data = data->chip_data;struct irte 
>>> *irte =
>>>  &ir_data->irte_entry; +struct irte_pi *irte_pi = (struct irte_pi
>>>  *)irte;struct irq_cfg *cfg = irqd_cfg(data);   struct irq_data *parent
>>>  = data->parent_data;   int ret;
>>> @@ -991,7 +992,10 @@ intel_ir_set_affinity(struct irq_data *data,
>>> const struct cpumask *mask,
>>>  */
>>> irte->vector = cfg->vector;
>>> irte->dest_id = IRTE_DEST(cfg->dest_apicid);
>>> -   modify_irte(&ir_data->irq_2_iommu, irte);
>>> +
>>> +   /* We don't need to modify irte if the interrupt is for posting. */
>>> +   if (irte_pi->pst != 1)
>>> +   modify_irte(&ir_data->irq_2_iommu, irte);
>> 
>> What happens if user changes the IRQ affinity manually?
> 
> If the IRQ is posted, its affinity is controlled by guest (irq <--->
> vCPU <> pCPU), it has no effect when host changes its affinity.

That's the problem: User is able to changes it in host but it never takes 
effect since it is actually controlled by guest. I guess it will break the IRQ 
balance too.

> 
> Thanks,
> Feng
> 
>> 
>>> 
>>> /*
>>>  * After this point, all the interrupts will start arriving
>> 
>> 
>> Best regards,
>> Yang
>>


Best regards,
Yang


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts

2014-12-18 Thread Wu, Feng


> -Original Message-
> From: Zhang, Yang Z
> Sent: Thursday, December 18, 2014 10:26 PM
> To: Wu, Feng; t...@linutronix.de; mi...@redhat.com; h...@zytor.com;
> x...@kernel.org; g...@kernel.org; pbonz...@redhat.com;
> dw...@infradead.org; j...@8bytes.org; alex.william...@redhat.com;
> jiang@linux.intel.com
> Cc: eric.au...@linaro.org; linux-ker...@vger.kernel.org;
> iommu@lists.linux-foundation.org; k...@vger.kernel.org; Wu, Feng
> Subject: RE: [v3 06/26] iommu, x86: No need to migrating irq for VT-d
> Posted-Interrupts
> 
> Feng Wu wrote on 2014-12-12:
> > We don't need to migrate the irqs for VT-d Posted-Interrupts here.
> > When 'pst' is set in IRTE, the associated irq will be posted to guests
> > instead of interrupt remapping. The destination of the interrupt is
> > set in Posted-Interrupts Descriptor, and the migration happens during
> > vCPU scheduling.
> >
> > However, we still update the cached irte here, which can be used when
> > changing back to remapping mode.
> >
> > Signed-off-by: Feng Wu 
> > Reviewed-by: Jiang Liu 
> > ---
> >  drivers/iommu/intel_irq_remapping.c | 6 +-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > diff --git a/drivers/iommu/intel_irq_remapping.c
> > b/drivers/iommu/intel_irq_remapping.c index 48c2051..ab9057a 100644 ---
> > a/drivers/iommu/intel_irq_remapping.c +++
> > b/drivers/iommu/intel_irq_remapping.c @@ -977,6 +977,7 @@
> > intel_ir_set_affinity(struct irq_data *data, const struct cpumask *mask,
> >  {
> > struct intel_ir_data *ir_data = data->chip_data;struct irte 
> > *irte =
> >  &ir_data->irte_entry; +struct irte_pi *irte_pi = (struct irte_pi
> >  *)irte;struct irq_cfg *cfg = irqd_cfg(data);   struct irq_data *parent
> >  = data->parent_data;   int ret;
> > @@ -991,7 +992,10 @@ intel_ir_set_affinity(struct irq_data *data,
> > const struct cpumask *mask,
> >  */
> > irte->vector = cfg->vector;
> > irte->dest_id = IRTE_DEST(cfg->dest_apicid);
> > -   modify_irte(&ir_data->irq_2_iommu, irte);
> > +
> > +   /* We don't need to modify irte if the interrupt is for posting. */
> > +   if (irte_pi->pst != 1)
> > +   modify_irte(&ir_data->irq_2_iommu, irte);
> 
> What happens if user changes the IRQ affinity manually?

If the IRQ is posted, its affinity is controlled by guest (irq <---> vCPU 
<> pCPU),
it has no effect when host changes its affinity.

Thanks,
Feng

> 
> >
> > /*
> >  * After this point, all the interrupts will start arriving
> 
> 
> Best regards,
> Yang
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI

2014-12-18 Thread Wu, Feng


> -Original Message-
> From: Zhang, Yang Z
> Sent: Friday, December 19, 2014 9:14 AM
> To: Paolo Bonzini; Wu, Feng; t...@linutronix.de; mi...@redhat.com;
> h...@zytor.com; x...@kernel.org; g...@kernel.org; dw...@infradead.org;
> j...@8bytes.org; alex.william...@redhat.com; jiang@linux.intel.com
> Cc: eric.au...@linaro.org; linux-ker...@vger.kernel.org;
> iommu@lists.linux-foundation.org; k...@vger.kernel.org
> Subject: RE: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for
> VT-d PI
> 
> Paolo Bonzini wrote on 2014-12-19:
> >
> >
> > On 18/12/2014 15:49, Zhang, Yang Z wrote:
>  Here, we introduce a similar way with 'apic_arb_prio' to handle
>  guest lowest priority interrtups when VT-d PI is used. Here is the
>  ideas: - Each vCPU has a counter 'round_robin_counter'. - When
>  guests sets an interrupts to lowest priority, we choose the vCPU
>  with smallest 'round_robin_counter' as the destination, then
>  increase it.
> >>
> >> How this can work well? All subsequent interrupts are delivered to
> >> one vCPU? It shouldn't be the best solution, need more consideration.
> >
> > Well, it's a hardware limitation.  The alternative (which is easy to
> 
> Agree, it is limited by hardware. But lowest priority distributes the 
> interrupt
> more efficient than fixed mode. And current implementation more likes to
> switch the lowest priority mode to fixed mode. In case of interrupt intensive
> environment, this may be a bottleneck and VM may not benefit greatly from
> VT-d PI. But agree again, it is really a hardware limitation.
> 
> > implement) is to only do PI for single-CPU interrupts.  This should
> > work well for multiqueue NICs (and of course for UP guests :)), so
> > perhaps it's a good idea to only support that as a first attempt.
> 
> The more easy way is to deliver the interrupt to the first matched VCPU we 
> find.
> The round_robin_counter really helps nothing here since the interrupt is
> delivered by hardware directly.
> 
> >
> > Paolo
> >
> >> Also, I think you should take the apic_arb_prio into consider since
> >> the priority is for the whole vCPU not for one interrupt.
> 
> 
> Best regards,
> Yang

In fact, the current solution was discussed with Rajesh in the cc List, here is 
Rajesh's original words:

"When you see a guest requesting a lowest priority interrupts (by programming 
the virtual IOAPIC, or by programming the virtual MSI/MSI-X registers), have 
KVM associate it to a vCPU.  Or, put another way, use the 'apic_arb_prio' 
method you describe below, but instead of using it at time of interrupt (which 
you no longer have control with posted interrupt direct delivery), do it at 
time of initializing the interrupt resource.  This way, if the guest asks for 4 
lowest priority interrupts, and say you a guest with two vCPUs, the first 
interrupt request will be serviced by KVM by assigning it through posting to 
vCPU0, the next one goes to vCPU1, the next one would go back to vCPU0, and so 
forth..  You could also choose to do this based on vector hashing instead of 
round-robin."

Thanks,
Feng

> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI

2014-12-18 Thread Wu, Feng


> -Original Message-
> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
> Sent: Friday, December 19, 2014 12:58 AM
> To: Zhang, Yang Z; Wu, Feng; t...@linutronix.de; mi...@redhat.com;
> h...@zytor.com; x...@kernel.org; g...@kernel.org; dw...@infradead.org;
> j...@8bytes.org; alex.william...@redhat.com; jiang@linux.intel.com
> Cc: eric.au...@linaro.org; linux-ker...@vger.kernel.org;
> iommu@lists.linux-foundation.org; k...@vger.kernel.org
> Subject: Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for
> VT-d PI
> 
> 
> 
> On 18/12/2014 15:49, Zhang, Yang Z wrote:
> >>> Here, we introduce a similar way with 'apic_arb_prio' to handle
> >>> guest lowest priority interrtups when VT-d PI is used. Here is
> >>> the ideas: - Each vCPU has a counter 'round_robin_counter'. -
> >>> When guests sets an interrupts to lowest priority, we choose the
> >>> vCPU with smallest 'round_robin_counter' as the destination, then
> >>> increase it.
> >
> > How this can work well? All subsequent interrupts are delivered to
> > one vCPU? It shouldn't be the best solution, need more consideration.
> 
> Well, it's a hardware limitation.  The alternative (which is easy to
> implement) is to only do PI for single-CPU interrupts.  This should work
> well for multiqueue NICs (and of course for UP guests :)), so perhaps
> it's a good idea to only support that as a first attempt.
> 
> Paolo

Paolo, what do you mean by "single-CPU interrupts"? Do you mean we don't
support lowest priority interrupts for PI? But Linux OS uses lowest priority
for most of the case? If so, we can hardly get benefit from this feature for
Linux guest OS.

Thanks,
Feng

> 
> > Also, I think you should take the apic_arb_prio into consider since
> > the priority is for the whole vCPU not for one interrupt.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI

2014-12-18 Thread Zhang, Yang Z
Paolo Bonzini wrote on 2014-12-19:
> 
> 
> On 18/12/2014 15:49, Zhang, Yang Z wrote:
 Here, we introduce a similar way with 'apic_arb_prio' to handle
 guest lowest priority interrtups when VT-d PI is used. Here is the
 ideas: - Each vCPU has a counter 'round_robin_counter'. - When
 guests sets an interrupts to lowest priority, we choose the vCPU
 with smallest 'round_robin_counter' as the destination, then
 increase it.
>> 
>> How this can work well? All subsequent interrupts are delivered to
>> one vCPU? It shouldn't be the best solution, need more consideration.
> 
> Well, it's a hardware limitation.  The alternative (which is easy to

Agree, it is limited by hardware. But lowest priority distributes the interrupt 
more efficient than fixed mode. And current implementation more likes to switch 
the lowest priority mode to fixed mode. In case of interrupt intensive 
environment, this may be a bottleneck and VM may not benefit greatly from VT-d 
PI. But agree again, it is really a hardware limitation.

> implement) is to only do PI for single-CPU interrupts.  This should
> work well for multiqueue NICs (and of course for UP guests :)), so
> perhaps it's a good idea to only support that as a first attempt.

The more easy way is to deliver the interrupt to the first matched VCPU we 
find. The round_robin_counter really helps nothing here since the interrupt is 
delivered by hardware directly.

> 
> Paolo
> 
>> Also, I think you should take the apic_arb_prio into consider since
>> the priority is for the whole vCPU not for one interrupt.


Best regards,
Yang


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 21/26] x86, irq: Define a global vector for VT-d Posted-Interrupts

2014-12-18 Thread Wu, Feng


> -Original Message-
> From: Zhang, Yang Z
> Sent: Thursday, December 18, 2014 10:55 PM
> To: Wu, Feng; t...@linutronix.de; mi...@redhat.com; h...@zytor.com;
> x...@kernel.org; g...@kernel.org; pbonz...@redhat.com;
> dw...@infradead.org; j...@8bytes.org; alex.william...@redhat.com;
> jiang@linux.intel.com
> Cc: eric.au...@linaro.org; linux-ker...@vger.kernel.org;
> iommu@lists.linux-foundation.org; k...@vger.kernel.org; Wu, Feng
> Subject: RE: [v3 21/26] x86, irq: Define a global vector for VT-d
> Posted-Interrupts
> 
> Feng Wu wrote on 2014-12-12:
> > Currently, we use a global vector as the Posted-Interrupts
> > Notification Event for all the vCPUs in the system. We need to
> > introduce another global vector for VT-d Posted-Interrtups, which will
> > be used to wakeup the sleep vCPU when an external interrupt from a
> direct-assigned device happens for that vCPU.
> >
> 
> Hi Feng,
> 
> Since the idea of two global vectors mechanism is from me, please add me to
> the comments.

No problem, Yang, I will add a "suggested-by Yang Zhang 
"
in this patch. Thanks a lot!

Thanks,
Feng

> 
> > Signed-off-by: Feng Wu 
> > ---
> >  arch/x86/include/asm/entry_arch.h  |  2 ++
> >  arch/x86/include/asm/hardirq.h |  1 +
> >  arch/x86/include/asm/hw_irq.h  |  2 ++
> >  arch/x86/include/asm/irq_vectors.h |  1 +
> >  arch/x86/kernel/entry_64.S |  2 ++
> >  arch/x86/kernel/irq.c  | 27
> +++
> >  arch/x86/kernel/irqinit.c  |  2 ++
> >  7 files changed, 37 insertions(+)
> > diff --git a/arch/x86/include/asm/entry_arch.h
> > b/arch/x86/include/asm/entry_arch.h index dc5fa66..27ca0af 100644 ---
> > a/arch/x86/include/asm/entry_arch.h +++
> > b/arch/x86/include/asm/entry_arch.h @@ -23,6 +23,8 @@
> > BUILD_INTERRUPT(x86_platform_ipi, X86_PLATFORM_IPI_VECTOR)  #ifdef
> > CONFIG_HAVE_KVM BUILD_INTERRUPT3(kvm_posted_intr_ipi,
> POSTED_INTR_VECTOR,
> >  smp_kvm_posted_intr_ipi)
> > +BUILD_INTERRUPT3(kvm_posted_intr_wakeup_ipi,
> POSTED_INTR_WAKEUP_VECTOR,
> > +smp_kvm_posted_intr_wakeup_ipi)
> >  #endif
> >
> >  /*
> > diff --git a/arch/x86/include/asm/hardirq.h
> > b/arch/x86/include/asm/hardirq.h index 0f5fb6b..9866065 100644
> > --- a/arch/x86/include/asm/hardirq.h
> > +++ b/arch/x86/include/asm/hardirq.h
> > @@ -14,6 +14,7 @@ typedef struct {
> >  #endif #ifdef CONFIG_HAVE_KVM  unsigned int kvm_posted_intr_ipis;
> >  +  unsigned int kvm_posted_intr_wakeup_ipis; #endifunsigned int
> >  x86_platform_ipis; /* arch dependent */unsigned int apic_perf_irqs;
> > diff --git a/arch/x86/include/asm/hw_irq.h
> > b/arch/x86/include/asm/hw_irq.h index e7ae6eb..38fac9b 100644
> > --- a/arch/x86/include/asm/hw_irq.h
> > +++ b/arch/x86/include/asm/hw_irq.h
> > @@ -29,6 +29,7 @@
> >  extern asmlinkage void apic_timer_interrupt(void);  extern asmlinkage
> > void x86_platform_ipi(void);  extern asmlinkage void
> > kvm_posted_intr_ipi(void); +extern asmlinkage void
> > kvm_posted_intr_wakeup_ipi(void);
> >  extern asmlinkage void error_interrupt(void);  extern asmlinkage void
> > irq_work_interrupt(void);
> >
> > @@ -92,6 +93,7 @@ extern void
> > trace_call_function_single_interrupt(void);
> >  #define trace_irq_move_cleanup_interrupt  irq_move_cleanup_interrupt
> > #define trace_reboot_interrupt  reboot_interrupt  #define
> > trace_kvm_posted_intr_ipi kvm_posted_intr_ipi
> > +#define trace_kvm_posted_intr_wakeup_ipi kvm_posted_intr_wakeup_ipi
> >  #endif /* CONFIG_TRACING */
> >
> >  struct irq_domain;
> > diff --git a/arch/x86/include/asm/irq_vectors.h
> > b/arch/x86/include/asm/irq_vectors.h index b26cb12..dca94f2 100644 ---
> > a/arch/x86/include/asm/irq_vectors.h +++
> > b/arch/x86/include/asm/irq_vectors.h @@ -105,6 +105,7 @@
> >  /* Vector for KVM to deliver posted interrupt IPI */  #ifdef
> >  CONFIG_HAVE_KVM #define POSTED_INTR_VECTOR 0xf2 +#define
> >  POSTED_INTR_WAKEUP_VECTOR  0xf1 #endif
> >
> >  /*
> > diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> > index e61c14a..a598447 100644 --- a/arch/x86/kernel/entry_64.S +++
> > b/arch/x86/kernel/entry_64.S @@ -960,6 +960,8 @@ apicinterrupt
> > X86_PLATFORM_IPI_VECTOR \  #ifdef CONFIG_HAVE_KVM
> >  apicinterrupt3 POSTED_INTR_VECTOR \
> > kvm_posted_intr_ipi smp_kvm_posted_intr_ipi
> > +apicinterrupt3 POSTED_INTR_WAKEUP_VECTOR \
> > +   kvm_posted_intr_wakeup_ipi smp_kvm_posted_intr_wakeup_ipi
> >  #endif
> >
> >  #ifdef CONFIG_X86_MCE_THRESHOLD
> > diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c index
> > 922d285..47408c3 100644
> > --- a/arch/x86/kernel/irq.c
> > +++ b/arch/x86/kernel/irq.c
> > @@ -237,6 +237,9 @@ __visible void smp_x86_platform_ipi(struct pt_regs
> > *regs)  }
> >
> >  #ifdef CONFIG_HAVE_KVM
> > +void (*wakeup_handler_callback)(void) = NULL;
> > +EXPORT_SYMBOL_GPL(wakeup_handler_callback); +
> >  /*
> >   * Handler for POSTED_INTERRUPT_VECTOR.
> >   */
> > @@ -256,6 +259,30 @@ __visible void smp_kvm_posted_intr_ipi(

Re: [Linaro-mm-sig] [PATCH v3 18/19] iommu: exynos: init from dt-specific callback instead of initcall

2014-12-18 Thread Arnd Bergmann
On Thursday 18 December 2014 22:36:14 Laurent Pinchart wrote:
> 
> > (currently only on ARM64, not ARM32, until someone adds support). I can see
> > your point regarding machines that have a mandatory IOMMU with no fallback
> > when there is no driver, but we can support them by making the iommu driver
> > selected through Kconfig for that platform, while still allowing other
> > platforms to work with drivers left out of the kernel.
> 
> The question is how to tell the kernel not to wait for an IOMMU that will 
> never be there. Would a kernel command line argument be an acceptable 
> solution 
> or do we need something more automatic ?

I would hope that we can find an automatic solution without relying on
the command line.

Unfortunately, I also remembered one case in support of what Lucas mentioned
and that would break this: There is at least one SoC that uses cache-coherent
DMA only when the IOMMU (ARM SMMU IIRC)is enabled, but is non-coherent
otherwise. We can't really express that in DT, so we actually do rely on
the IOMMU driver to be present on this machine when any "iommus" properties
are used, as they would always come in combination with "dma-coherent"
properties that are otherwise wrong.

We can still enforce this by requiring the smmu driver to be built into
the kernel on this platform, but simply saying that the device cannot
support DMA as long as there is an iommus property but no driver for
it does make a lot of sense.

Note that there are more than two ways that the kernel could treat
the situation of probing a device with a valid iommus reference but
no driver loaded for the iommu:

a) assume all iommu drivers are initialized early, so use linear or
   swiotlb dma_map_ops, and probe the driver normally. This breaks
   the scenario with conditionally coherent devices, and requires doing
   the iommu init early
b) assume all iommu drivers are initialized early, so disallow any DMA
   by setting a zero dma_mask but allow the driver to be probed using
   polling I/O mode (useful for slow devices like UART or SPI)
   This breaks devices that require DMA but can fall back to linear
   or swiotlb mappings, and requires doing the iommu init early.
c) defer probing until an iommu driver is gets initialized. This breaks
   both the cases where we could fall back to a sane behavior without
   iommu, but it would let us use a proper driver with regular power
   management etc.

Arnd
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 2/2] iommu: rockchip: Handle system-wide and runtime PM

2014-12-18 Thread Laurent Pinchart
Hi Kevin,

On Thursday 18 December 2014 13:14:24 Kevin Hilman wrote:
> Laurent Pinchart  writes:
> > On Thursday 18 December 2014 02:32:30 Rafael J. Wysocki wrote:
> >> On Wednesday, December 17, 2014 02:15:31 AM Laurent Pinchart wrote:
> >>> On Tuesday 16 December 2014 11:18:33 Tomasz Figa wrote:
>  On Tue, Dec 16, 2014 at 4:53 AM, Laurent Pinchart wrote:
> > On Monday 15 December 2014 11:39:01 Tomasz Figa wrote:
> >> On Sat, Dec 13, 2014 at 5:47 AM, Laurent Pinchart wrote:
> >>> On Friday 12 December 2014 13:15:51 Tomasz Figa wrote:
>  On Fri, Dec 12, 2014 at 5:48 AM, Rafael J. Wysocki wrote:
> > On Thursday, December 11, 2014 04:51:37 PM Ulf Hansson wrote:
> >> On 11 December 2014 at 16:31, Kevin Hilman wrote:

[...]

> >>> From a high-level, the IOMMU is just another device inside the
> >>> PM domain, so ideally it should be doing it's own _get() and
> >>> _put() calls so the PM domain code would just do the right
> >>> thing without the need for notifiers.
> >> 
> >> As I understand it, the IOMMU (or for other similar cases)
> >> shouldn't be doing any get() and put() at all because there are
> >> no IO API to serve request from.
> >>> 
> >>> Speaking purely from an IOMMU point of view that's not entirely
> >>> tree. IOMMU drivers expose map and unmap operations, so they can
> >>> track whether any memory is mapped. From a bus master point of view
> >>> the IOMMU map and unmap operations are hidden by the DMA mapping
> >>> API. The IOMMU can thus track the existence of mappings without any
> >>> IOMMU awareness in the bus master driver.
> >>> 
> >>> If no mapping exist the IOMMU shouldn't receive any translation
> >>> request. An IOMMU driver could thus call pm_runtime_get_sync() in
> >>> the map handler when creating the first mapping, and
> >>> pm_runtime_put() in the unmap handler when tearing the last mapping
> >>> down.
> >>> 
> >>> One could argue that the IOMMU would end up being powered more
> >>> often than strictly needed, as bus masters drivers, even when
> >>> written properly, could keep mappings around at times they don't
> >>> perform bus access. This is true, and that's an argument I've raised
> >>> during the last kernel summit. The general response (including Linus
> >>> Torvald's) was that micro-optimizing power management might not be
> >>> worth it, and that measurements proving that the gain is worth it
> >>> are required before introducing new APIs to solve the problem. I
> >>> can't disagree with that argument.
> >> 
> >> This would be a micro optimization if the IOMMU was located in its
> >> own power domain. Unfortunately in most cases it is not, so keeping
> >> all the devices in the domain powered on, because one of them have a
> >> mapping created doesn't sound like a good idea.
> >> 
> >> Moreover, most of the drivers will keep the mapping for much longer
> >> than one run cycle. Please take a look at V4L2's videobuf2 subsystem
> >> (which I guess you are more familiar with than me;)), which will
> >> keep MMAP buffers mapped in IOMMU address space for their whole
> >> lifetime. I believe similar is the case for DRM drivers.
> > 
> > Yes, but that doesn't mean it gets out of control. Buffers shouldn't
> > be allocated if they won't be used. Granted, they could be
> > preallocated (or rather premapped) slightly before being used, but in
> > sane use cases that shouldn't be long before the hardware needs to be
> > turned on.
>  
>  Assuming that we don't have a third party, called "user", involved.
> >>> 
> >>> Who needs that ? :-D
> >>> 
>  A simple use case is video playback pause. Usually all the software
>  state (including output buffers that can be used as reference for
>  decoding next frames) needs to be preserved to continue decoding after
>  resume, but it would be nice to power off the decoder, if it is unused
>  for some period. In addition, we would like the pause/resume operation
>  to be fast, so unmapping/freeing buffers and then exactly the opposite
>  on resume doesn't sound like a good idea.
> >>> 
> >>> OK, then we have one possible use case. I expect people to still want
> >>> to see power consumption numbers though.
> >> 
> >> Well, we have them, kind of.
> >> 
> >> In the ACPI world there's something called _DEP which gives us a list of
> >> devices depended on by the given one.  Those may be devices whose drivers
> >> provide so called "operation region" handling which means that an ACPI
> >> method executed for the dependent device may access a device it depends
> >> on indirectly.  Because of that indirection we basically need the devices
> >> listed by _DEP to be "on" whenever the dependent device is "on" or things
> >> may break in nasty ways otherwise

Re: [RFC PATCH 2/2] iommu: rockchip: Handle system-wide and runtime PM

2014-12-18 Thread Kevin Hilman
Laurent Pinchart  writes:

> Hi Rafael,
>
> On Thursday 18 December 2014 02:32:30 Rafael J. Wysocki wrote:
>> On Wednesday, December 17, 2014 02:15:31 AM Laurent Pinchart wrote:
>> > On Tuesday 16 December 2014 11:18:33 Tomasz Figa wrote:
>> >> On Tue, Dec 16, 2014 at 4:53 AM, Laurent Pinchart wrote:
>> >>> On Monday 15 December 2014 11:39:01 Tomasz Figa wrote:
>>  On Sat, Dec 13, 2014 at 5:47 AM, Laurent Pinchart wrote:
>> > On Friday 12 December 2014 13:15:51 Tomasz Figa wrote:
>> >> On Fri, Dec 12, 2014 at 5:48 AM, Rafael J. Wysocki wrote:
>> >>> On Thursday, December 11, 2014 04:51:37 PM Ulf Hansson wrote:
>>  On 11 December 2014 at 16:31, Kevin Hilman wrote:
>> > Tomasz Figa  writes:
>> >> On Thu, Dec 11, 2014 at 8:58 PM, Ulf Hansson wrote:
>> > [...]
>> > 
>>  @@ -988,11 +1107,28 @@ static int rk_iommu_probe(struct
>>  platform_device *pdev)
>>  return -ENXIO;
>>  }
>>  
>>  +   pm_runtime_no_callbacks(dev);
>>  +   pm_runtime_enable(dev);
>>  +
>>  +   /* Synchronize state of the domain with driver data.
>>  */
>>  +   pm_runtime_get_sync(dev);
>>  +   iommu->is_powered = true;
>> >>> 
>> >>> Doesn't the runtime PM status reflect the value of
>> >>> "is_powered", thus why do you need to have a copy of it? Could
>> >>> it perpahps be that you try to cope with the case when
>> >>> CONFIG_PM is unset?
>> >> 
>> >> It's worth noting that this driver fully relies on status of
>> >> other devices in the power domain the IOMMU is in and does not
>> >> enforce the status on its own. So in general, as far as my
>> >> understanding of PM runtime subsystem, the status of the IOMMU
>> >> device will be always suspended, because nobody will call
>> >> pm_runtime_get() on it (except the get and put pair in probe).
>> >> So is_powered is here to track status of the domain, not the
>> >> device. Feel free to suggest a better way, though.
>> > 
>> > I still don't like these notifiers.  I think they add ways to
>> > bypass having proper runtime PM implemented for
>> > devices/subsystems.
>>  
>>  I do agree, but I haven't found another good solution to the
>>  problem.
>> >>> 
>> >>> For the record, I'm not liking this mostly because it "fixes" a
>> >>> generic problem in a way that's hidden in the genpd code and very
>> >>> indirect.
>> >> 
>> >> Well, that's true. This is indeed a generic problem of PM
>> >> dependencies between devices (other than those represented by
>> >> parent-child relation), which in fact doesn't have much to do with
>> >> genpd, but rather with those devices directly. It is just that
>> >> genpd is the most convenient location to solve this in current code
>> >> and in a simple way. In other words, I see this solution as a
>> >> reasonable way to get the problem solved quickly for now, so that
>> >> we can start thinking about a more elegant solution.
>> >> 
>> > From a high-level, the IOMMU is just another device inside the
>> > PM domain, so ideally it should be doing it's own _get() and
>> > _put() calls so the PM domain code would just do the right thing
>> > without the need for notifiers.
>>  
>>  As I understand it, the IOMMU (or for other similar cases)
>>  shouldn't be doing any get() and put() at all because there are
>>  no IO API to serve request from.
>> > 
>> > Speaking purely from an IOMMU point of view that's not entirely
>> > tree. IOMMU drivers expose map and unmap operations, so they can
>> > track whether any memory is mapped. From a bus master point of view
>> > the IOMMU map and unmap operations are hidden by the DMA mapping
>> > API. The IOMMU can thus track the existence of mappings without any
>> > IOMMU awareness in the bus master driver.
>> > 
>> > If no mapping exist the IOMMU shouldn't receive any translation
>> > request. An IOMMU driver could thus call pm_runtime_get_sync() in
>> > the map handler when creating the first mapping, and
>> > pm_runtime_put() in the unmap handler when tearing the last mapping
>> > down.
>> > 
>> > One could argue that the IOMMU would end up being powered more often
>> > than strictly needed, as bus masters drivers, even when written
>> > properly, could keep mappings around at times they don't perform bus
>> > access. This is true, and that's an argument I've raised during the
>> > last kernel summit. The general response (including Linus Torvald's)
>> > was that micro-optimizing power management might not be worth it,
>> > an

Re: [Linaro-mm-sig] [PATCH v3 18/19] iommu: exynos: init from dt-specific callback instead of initcall

2014-12-18 Thread Laurent Pinchart
Hi Arnd,

On Wednesday 17 December 2014 16:56:53 Arnd Bergmann wrote:
> On Wednesday 17 December 2014 15:53:25 Lucas Stach wrote:
> > Am Mittwoch, den 17.12.2014, 15:27 +0100 schrieb Arnd Bergmann:
> >> On Wednesday 17 December 2014 01:24:42 Laurent Pinchart wrote:
> >>> If we forbid the IOMMU driver from being compiled as a module can't we
> >>> just rely on deferred probing ? The bus master driver will just be
> >>> reprobed after the IOMMU gets probed, like for other devices.
> >>> 
> >>> This could fail in case the IOMMU device permanently fails to probe.
> >>> There would be something very wrong in the system in that case,
> >>> Enabling the bus masters totally transparently without IOMMU support
> >>> could not be such a good idea.
> >> 
> >> I believe in the majority of cases today, the IOMMU is entirely
> >> optional. There are valid reasons for not including the IOMMU driver in
> >> the kernel, e.g. when you know that all the machines with that driver
> >> can DMA to all of their RAM and you want to avoid the overhead of IOTLB
> >> misses.
> > 
> > This is similar to the problems we discussed with the componentized
> > device framework and in the end everybody agreed on a simple rule:
> > if the device node is enabled in the DT there must be a driver bound to
> > it before other devices depending on this node can be probed.
> > 
> > If we apply the same logic to the IOMMU situation we get two
> > possibilities to allow bypassing the IOMMU:
> > 1. completely disable the IOMMU node in the DT
> > 2. leave node enabled in DT, but add a bypass property
> > 
> > Obviously the second option still requires to have the IOMMU driver
> > available, but is more flexible. Right now everything depends on the
> > IOMMU being in passthrough mode at reset with no driver bound. If a
> > device comes around where this isn't the case thing will break with the
> > current assumptions or the first option.
> > 
> > Having the IOMMU node enabled in DT with no driver available to the
> > kernel seems like an invalid configuration which should be expected to
> > break. Exactly the same thing as with componentized devices...
> 
> One differences here is that for the IOMMU, we should generally be able
> to fall back to swiotlb

Or to nothing at all if the device can DMA to the allocated memory directly.

> (currently only on ARM64, not ARM32, until someone adds support). I can see
> your point regarding machines that have a mandatory IOMMU with no fallback
> when there is no driver, but we can support them by making the iommu driver
> selected through Kconfig for that platform, while still allowing other
> platforms to work with drivers left out of the kernel.

The question is how to tell the kernel not to wait for an IOMMU that will 
never be there. Would a kernel command line argument be an acceptable solution 
or do we need something more automatic ?

-- 
Regards,

Laurent Pinchart

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH 2/2] iommu: rockchip: Handle system-wide and runtime PM

2014-12-18 Thread Laurent Pinchart
Hi Rafael,

On Thursday 18 December 2014 02:32:30 Rafael J. Wysocki wrote:
> On Wednesday, December 17, 2014 02:15:31 AM Laurent Pinchart wrote:
> > On Tuesday 16 December 2014 11:18:33 Tomasz Figa wrote:
> >> On Tue, Dec 16, 2014 at 4:53 AM, Laurent Pinchart wrote:
> >>> On Monday 15 December 2014 11:39:01 Tomasz Figa wrote:
>  On Sat, Dec 13, 2014 at 5:47 AM, Laurent Pinchart wrote:
> > On Friday 12 December 2014 13:15:51 Tomasz Figa wrote:
> >> On Fri, Dec 12, 2014 at 5:48 AM, Rafael J. Wysocki wrote:
> >>> On Thursday, December 11, 2014 04:51:37 PM Ulf Hansson wrote:
>  On 11 December 2014 at 16:31, Kevin Hilman wrote:
> > Tomasz Figa  writes:
> >> On Thu, Dec 11, 2014 at 8:58 PM, Ulf Hansson wrote:
> > [...]
> > 
>  @@ -988,11 +1107,28 @@ static int rk_iommu_probe(struct
>  platform_device *pdev)
>  return -ENXIO;
>  }
>  
>  +   pm_runtime_no_callbacks(dev);
>  +   pm_runtime_enable(dev);
>  +
>  +   /* Synchronize state of the domain with driver data.
>  */
>  +   pm_runtime_get_sync(dev);
>  +   iommu->is_powered = true;
> >>> 
> >>> Doesn't the runtime PM status reflect the value of
> >>> "is_powered", thus why do you need to have a copy of it? Could
> >>> it perpahps be that you try to cope with the case when
> >>> CONFIG_PM is unset?
> >> 
> >> It's worth noting that this driver fully relies on status of
> >> other devices in the power domain the IOMMU is in and does not
> >> enforce the status on its own. So in general, as far as my
> >> understanding of PM runtime subsystem, the status of the IOMMU
> >> device will be always suspended, because nobody will call
> >> pm_runtime_get() on it (except the get and put pair in probe).
> >> So is_powered is here to track status of the domain, not the
> >> device. Feel free to suggest a better way, though.
> > 
> > I still don't like these notifiers.  I think they add ways to
> > bypass having proper runtime PM implemented for
> > devices/subsystems.
>  
>  I do agree, but I haven't found another good solution to the
>  problem.
> >>> 
> >>> For the record, I'm not liking this mostly because it "fixes" a
> >>> generic problem in a way that's hidden in the genpd code and very
> >>> indirect.
> >> 
> >> Well, that's true. This is indeed a generic problem of PM
> >> dependencies between devices (other than those represented by
> >> parent-child relation), which in fact doesn't have much to do with
> >> genpd, but rather with those devices directly. It is just that
> >> genpd is the most convenient location to solve this in current code
> >> and in a simple way. In other words, I see this solution as a
> >> reasonable way to get the problem solved quickly for now, so that
> >> we can start thinking about a more elegant solution.
> >> 
> > From a high-level, the IOMMU is just another device inside the
> > PM domain, so ideally it should be doing it's own _get() and
> > _put() calls so the PM domain code would just do the right thing
> > without the need for notifiers.
>  
>  As I understand it, the IOMMU (or for other similar cases)
>  shouldn't be doing any get() and put() at all because there are
>  no IO API to serve request from.
> > 
> > Speaking purely from an IOMMU point of view that's not entirely
> > tree. IOMMU drivers expose map and unmap operations, so they can
> > track whether any memory is mapped. From a bus master point of view
> > the IOMMU map and unmap operations are hidden by the DMA mapping
> > API. The IOMMU can thus track the existence of mappings without any
> > IOMMU awareness in the bus master driver.
> > 
> > If no mapping exist the IOMMU shouldn't receive any translation
> > request. An IOMMU driver could thus call pm_runtime_get_sync() in
> > the map handler when creating the first mapping, and
> > pm_runtime_put() in the unmap handler when tearing the last mapping
> > down.
> > 
> > One could argue that the IOMMU would end up being powered more often
> > than strictly needed, as bus masters drivers, even when written
> > properly, could keep mappings around at times they don't perform bus
> > access. This is true, and that's an argument I've raised during the
> > last kernel summit. The general response (including Linus Torvald's)
> > was that micro-optimizing power management might not be worth it,
> > and that measurements proving that the gain is worth it are required
> > before introducing new APIs to solve the probl

Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI

2014-12-18 Thread Paolo Bonzini


On 18/12/2014 15:49, Zhang, Yang Z wrote:
>>> Here, we introduce a similar way with 'apic_arb_prio' to handle
>>> guest lowest priority interrtups when VT-d PI is used. Here is
>>> the ideas: - Each vCPU has a counter 'round_robin_counter'. -
>>> When guests sets an interrupts to lowest priority, we choose the
>>> vCPU with smallest 'round_robin_counter' as the destination, then
>>> increase it.
> 
> How this can work well? All subsequent interrupts are delivered to
> one vCPU? It shouldn't be the best solution, need more consideration.

Well, it's a hardware limitation.  The alternative (which is easy to
implement) is to only do PI for single-CPU interrupts.  This should work
well for multiqueue NICs (and of course for UP guests :)), so perhaps
it's a good idea to only support that as a first attempt.

Paolo

> Also, I think you should take the apic_arb_prio into consider since
> the priority is for the whole vCPU not for one interrupt.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 12/26] KVM: Initialize VT-d Posted-Interrupts Descriptor

2014-12-18 Thread Zhang, Yang Z
Feng Wu wrote on 2014-12-12:
> This patch initializes the VT-d Posted-Interrupts Descriptor.
> 
> Signed-off-by: Feng Wu 
> ---
>  arch/x86/kvm/vmx.c | 27 +++
>  1 file changed, 27 insertions(+)
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index
> 0b1383e..66ca275 100644 --- a/arch/x86/kvm/vmx.c +++
> b/arch/x86/kvm/vmx.c @@ -45,6 +45,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include "trace.h"
> @@ -4433,6 +4434,30 @@ static void ept_set_mmio_spte_mask(void)
>   kvm_mmu_set_mmio_spte_mask((0x3ull << 62) | 0x6ull);  }
> +static void pi_desc_init(struct vcpu_vmx *vmx) {
> + unsigned int dest;
> +
> + if (!irq_remapping_cap(IRQ_POSTING_CAP))
> + return;
> +
> + /*
> +  * Initialize Posted-Interrupt Descriptor
> +  */
> +
> + pi_clear_sn(&vmx->pi_desc);
> + vmx->pi_desc.nv = POSTED_INTR_VECTOR;

Here.

> +
> + /* Physical mode for Notificaiton Event */
> + vmx->pi_desc.ndm = 0;

And from here..

> + dest = cpu_physical_id(vmx->vcpu.cpu);
> +
> + if (x2apic_enabled())
> + vmx->pi_desc.ndst = dest;
> + else
> + vmx->pi_desc.ndst = (dest << 8) & 0xFF00; }
> +

..to here are useless. The right place to update PI descriptor is where vcpu 
got loaded not in initialization.

>  /*
>   * Sets up the vmcs for emulated real mode.
>   */
> @@ -4476,6 +4501,8 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
> 
>   vmcs_write64(POSTED_INTR_NV, POSTED_INTR_VECTOR);
>   vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((&vmx->pi_desc)));
> +
> + pi_desc_init(vmx);
>   }
>  
>   if (ple_gap) {


Best regards,
Yang


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set

2014-12-18 Thread Zhang, Yang Z
Paolo Bonzini wrote on 2014-12-18:
> 
> 
> On 18/12/2014 04:14, Wu, Feng wrote:
>> 
>> 
>> linux-kernel-ow...@vger.kernel.org wrote on 
>> mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Paolo:
>>> x...@kernel.org; Gleb Natapov; Paolo Bonzini; dw...@infradead.org;
>>> joro-zlv9swrftaidnm+yrof...@public.gmane.org; Alex Williamson;
>>> joro-zLv9SwRftAIdnm+Jiang
>>> Liu
>>> Cc: iommu@lists.linux-foundation.org;
>>> linux-kernel-u79uwxl29ty76z2rm5m...@public.gmane.org; KVM list;
>>> Eric Auger
>>> Subject: Re: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is
>>> set
>>> 
>>> 
>>> 
>>> On 12/12/2014 16:14, Feng Wu wrote:
 Currently, we don't support urgent interrupt, all interrupts are
 recognized as non-urgent interrupt, so we cannot send
 posted-interrupt when 'SN' is set.
>>> 
>>> Can this happen?  If the vcpu is in guest mode, it cannot have been
>>> scheduled out, and that's the only case when SN is set.
>>> 
>>> Paolo
>> 
>> Currently, the only place where SN is set is vCPU is preempted and

If the vCPU is preempted, shouldn't the subsequent be ignored? What happens if 
a PI is occurs when vCPU is preempted?

>> waiting for the next scheduling in the runqueue. But I am not sure
>> whether we need to set SN for other purpose in future. Adding SN
>> checking here is just to follow the Spec. non-urgent interrupts are
>> suppressed
> when SN is set.
> 
> I would change that to a WARN_ON_ONCE then.


Best regards,
Yang


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 21/26] x86, irq: Define a global vector for VT-d Posted-Interrupts

2014-12-18 Thread Zhang, Yang Z
Feng Wu wrote on 2014-12-12:
> Currently, we use a global vector as the Posted-Interrupts
> Notification Event for all the vCPUs in the system. We need to
> introduce another global vector for VT-d Posted-Interrtups, which will
> be used to wakeup the sleep vCPU when an external interrupt from a 
> direct-assigned device happens for that vCPU.
> 

Hi Feng, 

Since the idea of two global vectors mechanism is from me, please add me to the 
comments.

> Signed-off-by: Feng Wu 
> ---
>  arch/x86/include/asm/entry_arch.h  |  2 ++
>  arch/x86/include/asm/hardirq.h |  1 +
>  arch/x86/include/asm/hw_irq.h  |  2 ++
>  arch/x86/include/asm/irq_vectors.h |  1 +
>  arch/x86/kernel/entry_64.S |  2 ++
>  arch/x86/kernel/irq.c  | 27 +++
>  arch/x86/kernel/irqinit.c  |  2 ++
>  7 files changed, 37 insertions(+)
> diff --git a/arch/x86/include/asm/entry_arch.h
> b/arch/x86/include/asm/entry_arch.h index dc5fa66..27ca0af 100644 ---
> a/arch/x86/include/asm/entry_arch.h +++
> b/arch/x86/include/asm/entry_arch.h @@ -23,6 +23,8 @@
> BUILD_INTERRUPT(x86_platform_ipi, X86_PLATFORM_IPI_VECTOR)  #ifdef
> CONFIG_HAVE_KVM BUILD_INTERRUPT3(kvm_posted_intr_ipi, POSTED_INTR_VECTOR,
>smp_kvm_posted_intr_ipi)
> +BUILD_INTERRUPT3(kvm_posted_intr_wakeup_ipi, POSTED_INTR_WAKEUP_VECTOR,
> +  smp_kvm_posted_intr_wakeup_ipi)
>  #endif
>  
>  /*
> diff --git a/arch/x86/include/asm/hardirq.h
> b/arch/x86/include/asm/hardirq.h index 0f5fb6b..9866065 100644
> --- a/arch/x86/include/asm/hardirq.h
> +++ b/arch/x86/include/asm/hardirq.h
> @@ -14,6 +14,7 @@ typedef struct {
>  #endif #ifdef CONFIG_HAVE_KVMunsigned int kvm_posted_intr_ipis;
>  +unsigned int kvm_posted_intr_wakeup_ipis; #endifunsigned int
>  x86_platform_ipis;   /* arch dependent */unsigned int apic_perf_irqs;
> diff --git a/arch/x86/include/asm/hw_irq.h
> b/arch/x86/include/asm/hw_irq.h index e7ae6eb..38fac9b 100644
> --- a/arch/x86/include/asm/hw_irq.h
> +++ b/arch/x86/include/asm/hw_irq.h
> @@ -29,6 +29,7 @@
>  extern asmlinkage void apic_timer_interrupt(void);  extern asmlinkage
> void x86_platform_ipi(void);  extern asmlinkage void
> kvm_posted_intr_ipi(void); +extern asmlinkage void
> kvm_posted_intr_wakeup_ipi(void);
>  extern asmlinkage void error_interrupt(void);  extern asmlinkage void
> irq_work_interrupt(void);
> 
> @@ -92,6 +93,7 @@ extern void
> trace_call_function_single_interrupt(void);
>  #define trace_irq_move_cleanup_interrupt  irq_move_cleanup_interrupt
> #define trace_reboot_interrupt  reboot_interrupt  #define
> trace_kvm_posted_intr_ipi kvm_posted_intr_ipi
> +#define trace_kvm_posted_intr_wakeup_ipi kvm_posted_intr_wakeup_ipi
>  #endif /* CONFIG_TRACING */
>  
>  struct irq_domain;
> diff --git a/arch/x86/include/asm/irq_vectors.h
> b/arch/x86/include/asm/irq_vectors.h index b26cb12..dca94f2 100644 ---
> a/arch/x86/include/asm/irq_vectors.h +++
> b/arch/x86/include/asm/irq_vectors.h @@ -105,6 +105,7 @@
>  /* Vector for KVM to deliver posted interrupt IPI */  #ifdef
>  CONFIG_HAVE_KVM #define POSTED_INTR_VECTOR   0xf2 +#define
>  POSTED_INTR_WAKEUP_VECTOR0xf1 #endif
>  
>  /*
> diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> index e61c14a..a598447 100644 --- a/arch/x86/kernel/entry_64.S +++
> b/arch/x86/kernel/entry_64.S @@ -960,6 +960,8 @@ apicinterrupt
> X86_PLATFORM_IPI_VECTOR \  #ifdef CONFIG_HAVE_KVM
>  apicinterrupt3 POSTED_INTR_VECTOR \
>   kvm_posted_intr_ipi smp_kvm_posted_intr_ipi
> +apicinterrupt3 POSTED_INTR_WAKEUP_VECTOR \
> + kvm_posted_intr_wakeup_ipi smp_kvm_posted_intr_wakeup_ipi
>  #endif
>  
>  #ifdef CONFIG_X86_MCE_THRESHOLD
> diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c index
> 922d285..47408c3 100644
> --- a/arch/x86/kernel/irq.c
> +++ b/arch/x86/kernel/irq.c
> @@ -237,6 +237,9 @@ __visible void smp_x86_platform_ipi(struct pt_regs
> *regs)  }
> 
>  #ifdef CONFIG_HAVE_KVM
> +void (*wakeup_handler_callback)(void) = NULL;
> +EXPORT_SYMBOL_GPL(wakeup_handler_callback); +
>  /*
>   * Handler for POSTED_INTERRUPT_VECTOR.
>   */
> @@ -256,6 +259,30 @@ __visible void smp_kvm_posted_intr_ipi(struct
> pt_regs *regs)
> 
>   set_irq_regs(old_regs);
>  }
> +
> +/*
> + * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR.
> + */
> +__visible void smp_kvm_posted_intr_wakeup_ipi(struct pt_regs *regs) {
> + struct pt_regs *old_regs = set_irq_regs(regs);
> +
> + ack_APIC_irq();
> +
> + irq_enter();
> +
> + exit_idle();
> +
> + inc_irq_stat(kvm_posted_intr_wakeup_ipis);
> +
> + if (wakeup_handler_callback)
> + wakeup_handler_callback();
> +
> + irq_exit();
> +
> + set_irq_regs(old_regs);
> +}
> +
>  #endif
>  
>  __visible void smp_trace_x86_platform_ipi(struct pt_regs *regs) diff
> --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c index
> 70e181e..844673c 100644 --- a/arch/x86/kernel/irqinit.c +++
> b/arch/x86/kernel/irqinit.c @@ -144,6 +144,8 

RE: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI

2014-12-18 Thread Zhang, Yang Z
Feng Wu wrote on 2014-12-12:
> This patch defines a new interface kvm_find_dest_vcpu for
> VT-d PI, which can returns the destination vCPU of the
> interrupt for guests.
> 
> Since VT-d PI cannot handle broadcast/multicast interrupt,
> Here we only handle Fixed and Lowest priority interrupts.
> 
> The current method of handling guest lowest priority interrtups
> is to use a counter 'apic_arb_prio' for each vCPU, we choose the
> vCPU with smallest 'apic_arb_prio' and then increase it by 1.
> However, for VT-d PI, we cannot re-use this, since we no longer
> have control to 'apic_arb_prio' with posted interrupt direct
> delivery by Hardware.
> 
> Here, we introduce a similar way with 'apic_arb_prio' to handle guest
> lowest priority interrtups when VT-d PI is used. Here is the ideas: -
> Each vCPU has a counter 'round_robin_counter'. - When guests sets an
> interrupts to lowest priority, we choose the vCPU with smallest
> 'round_robin_counter' as the destination, then increase it.
 
How this can work well? All subsequent interrupts are delivered to one vCPU? It 
shouldn't be the best solution, need more consideration. Also, I think you 
should take the apic_arb_prio into consider since the priority is for the whole 
vCPU not for one interrupt.

Best regards,
Yang


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts

2014-12-18 Thread Zhang, Yang Z
Feng Wu wrote on 2014-12-12:
> We don't need to migrate the irqs for VT-d Posted-Interrupts here.
> When 'pst' is set in IRTE, the associated irq will be posted to guests
> instead of interrupt remapping. The destination of the interrupt is
> set in Posted-Interrupts Descriptor, and the migration happens during
> vCPU scheduling.
> 
> However, we still update the cached irte here, which can be used when
> changing back to remapping mode.
> 
> Signed-off-by: Feng Wu 
> Reviewed-by: Jiang Liu 
> ---
>  drivers/iommu/intel_irq_remapping.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> diff --git a/drivers/iommu/intel_irq_remapping.c
> b/drivers/iommu/intel_irq_remapping.c index 48c2051..ab9057a 100644 ---
> a/drivers/iommu/intel_irq_remapping.c +++
> b/drivers/iommu/intel_irq_remapping.c @@ -977,6 +977,7 @@
> intel_ir_set_affinity(struct irq_data *data, const struct cpumask *mask,
>  {
>   struct intel_ir_data *ir_data = data->chip_data;struct irte 
> *irte =
>  &ir_data->irte_entry; +  struct irte_pi *irte_pi = (struct irte_pi
>  *)irte;  struct irq_cfg *cfg = irqd_cfg(data);   struct irq_data *parent
>  = data->parent_data; int ret;
> @@ -991,7 +992,10 @@ intel_ir_set_affinity(struct irq_data *data,
> const struct cpumask *mask,
>*/
>   irte->vector = cfg->vector;
>   irte->dest_id = IRTE_DEST(cfg->dest_apicid);
> - modify_irte(&ir_data->irq_2_iommu, irte);
> +
> + /* We don't need to modify irte if the interrupt is for posting. */
> + if (irte_pi->pst != 1)
> + modify_irte(&ir_data->irq_2_iommu, irte);

What happens if user changes the IRQ affinity manually?

> 
>   /*
>* After this point, all the interrupts will start arriving


Best regards,
Yang


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu