RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
Wu, Feng wrote on 2014-12-19: > > > Zhang, Yang Z wrote on 2014-12-19: >> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is >> set >> >> Wu, Feng wrote on 2014-12-19: >>> >>> >>> Zhang, Yang Z wrote on 2014-12-19: Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set Wu, Feng wrote on 2014-12-19: > > > Zhang, Yang Z wrote on 2014-12-19: >> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' >> is set >> >> Wu, Feng wrote on 2014-12-19: >>> >>> >>> iommu-boun...@lists.linux-foundation.org wrote on >> mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of: Cc: iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org; k...@vger.kernel.org Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set Paolo Bonzini wrote on 2014-12-18: > > > On 18/12/2014 04:14, Wu, Feng wrote: >> >> >> linux-kernel-ow...@vger.kernel.org wrote on mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Paolo: >>> x...@kernel.org; Gleb Natapov; Paolo Bonzini; >>> dw...@infradead.org; >>> joro-zlv9swrftaidnm+yrof...@public.gmane.org; Alex Williamson; >>> joro-zLv9SwRftAIdnm+Jiang Liu Cc: >>> iommu@lists.linux-foundation.org; >>> linux-kernel-u79uwxl29ty76z2rm5m...@public.gmane.org; > KVM >> list; >>> Eric Auger Subject: Re: [v3 25/26] KVM: Suppress >>> posted-interrupt when 'SN' is set >>> >>> >>> >>> On 12/12/2014 16:14, Feng Wu wrote: Currently, we don't support urgent interrupt, all interrupts are recognized as non-urgent interrupt, so we cannot send posted-interrupt when 'SN' is set. >>> >>> Can this happen? If the vcpu is in guest mode, it cannot >>> have been scheduled out, and that's the only case when SN is set. >>> >>> Paolo >> >> Currently, the only place where SN is set is vCPU is >> preempted and If the vCPU is preempted, shouldn't the subsequent be ignored? What happens if a PI is occurs when vCPU is preempted? >>> >>> If a vCPU is preempted, the 'SN' bit is set, the subsequent >>> interrupts are suppressed for posting. >> >> I mean what happens if we don't set SN bit. From my point, if >> preempter already disabled the interrupt, it is ok to leave SN >> bit as zero. But if preempter enabled the interrupt, doesn't >> this mean he allow interrupt to happen? BTW, since there >> already has ON bit, so this means there only have one interrupt >> arrived at most and it doesn't hurt performance. Do we really need to >> set SN bit? > > > See this scenario: > vCPU0 is running on pCPU0 > --> vCPU0 is preempted by vCPU1 > --> Then vCPU1 is running on pCPU0 and vCPU0 is waiting for > --> schedule in runqueue > > If the we don't set SN for vCPU0, then all subsequent interrupts > for > vCPU0 is posted to vCPU1, this will consume hardware and > software The PI vector for vCPU1 is notification vector, but the PI vector for vCPU0 should be wakeup vector. Why vCPU1 will consume this PI event? >>> >>> Wakeup vector is only used for blocking case, when vCPU is >>> preempted and waiting in the runqueue, the NV is the notification vector. >> >> I see your point. But from performance point, if we can schedule the >> vCPU to another PCPU to handle the interrupt, it would helpful. But I >> remember current KVM will not schedule the vCPU in run queue (even >> though it got preempted) to another pCPU to run(Am I right?). So it may >> hard to do it. >> > > KVM is using the Linux scheduler, when the preempted vCPU (in > runqueue) is scheduled again depends on the scheduling algorithm > itself, I think it is a little hard for us to get involved. > > I think what you mentioned is a little like the urgent interrupt in VT-d PI > Spec. > For this kind of interrupts, if an interrupt is coming for an > preempted vCPU (waiting in the run queue), we need to schedule the > vCPU immediately. This is some real time things. And we don't support urgent > interrupt so far. Yes. IIRC, if we use two global vectors mechanism properly, there should no need to use hardware urgent interrupt mechanism. :) > > Thanks, > Feng > >>> >>> Thanks, >>> Feng >>> > efforts and in fact it is not needed at all. If SN is set for > vCPU0, VT-d hardware will not issue Notification Event for vCPU0 > when an interrupt is for it, but just setting the related PIR bit. > > Thanks, > Feng > >> >>> >>> Thanks, >>> Feng >>> >> waiting for the next scheduling in the runqueue. But I am >>
RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
> -Original Message- > From: Zhang, Yang Z > Sent: Friday, December 19, 2014 1:26 PM > To: Wu, Feng; Paolo Bonzini; k...@vger.kernel.org > Cc: iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org; > k...@vger.kernel.org > Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set > > Wu, Feng wrote on 2014-12-19: > > > > > > Zhang, Yang Z wrote on 2014-12-19: > >> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is > >> set > >> > >> Wu, Feng wrote on 2014-12-19: > >>> > >>> > >>> Zhang, Yang Z wrote on 2014-12-19: > Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' > is set > > Wu, Feng wrote on 2014-12-19: > > > > > > iommu-boun...@lists.linux-foundation.org wrote on > mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of: > >> Cc: iommu@lists.linux-foundation.org; > >> linux-ker...@vger.kernel.org; k...@vger.kernel.org > >> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' > >> is set > >> > >> Paolo Bonzini wrote on 2014-12-18: > >>> > >>> > >>> On 18/12/2014 04:14, Wu, Feng wrote: > > > linux-kernel-ow...@vger.kernel.org wrote on > >> mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Paolo: > > x...@kernel.org; Gleb Natapov; Paolo Bonzini; > > dw...@infradead.org; > > joro-zlv9swrftaidnm+yrof...@public.gmane.org; Alex > Williamson; > > joro-zLv9SwRftAIdnm+Jiang Liu Cc: > > iommu@lists.linux-foundation.org; > > linux-kernel-u79uwxl29ty76z2rm5m...@public.gmane.org; KVM > list; > > Eric Auger Subject: Re: [v3 25/26] KVM: Suppress > > posted-interrupt when 'SN' is set > > > > > > > > On 12/12/2014 16:14, Feng Wu wrote: > >> Currently, we don't support urgent interrupt, all > >> interrupts are recognized as non-urgent interrupt, so we > >> cannot send posted-interrupt when 'SN' is set. > > > > Can this happen? If the vcpu is in guest mode, it cannot > > have been scheduled out, and that's the only case when SN is set. > > > > Paolo > > Currently, the only place where SN is set is vCPU is > preempted and > >> > >> If the vCPU is preempted, shouldn't the subsequent be ignored? > >> What happens if a PI is occurs when vCPU is preempted? > > > > If a vCPU is preempted, the 'SN' bit is set, the subsequent > > interrupts are suppressed for posting. > > I mean what happens if we don't set SN bit. From my point, if > preempter already disabled the interrupt, it is ok to leave SN > bit as zero. But if preempter enabled the interrupt, doesn't this > mean he allow interrupt to happen? BTW, since there already has > ON bit, so this means there only have one interrupt arrived at > most and it doesn't hurt performance. Do we really need to set SN bit? > >>> > >>> > >>> See this scenario: > >>> vCPU0 is running on pCPU0 > >>> --> vCPU0 is preempted by vCPU1 > >>> --> Then vCPU1 is running on pCPU0 and vCPU0 is waiting for > >>> --> schedule in runqueue > >>> > >>> If the we don't set SN for vCPU0, then all subsequent interrupts > >>> for > >>> vCPU0 is posted to vCPU1, this will consume hardware and software > >> > >> The PI vector for vCPU1 is notification vector, but the PI vector > >> for > >> vCPU0 should be wakeup vector. Why vCPU1 will consume this PI event? > > > > Wakeup vector is only used for blocking case, when vCPU is preempted > > and waiting in the runqueue, the NV is the notification vector. > > I see your point. But from performance point, if we can schedule the vCPU to > another PCPU to handle the interrupt, it would helpful. But I remember current > KVM will not schedule the vCPU in run queue (even though it got preempted) to > another pCPU to run(Am I right?). So it may hard to do it. > KVM is using the Linux scheduler, when the preempted vCPU (in runqueue) is scheduled again depends on the scheduling algorithm itself, I think it is a little hard for us to get involved. I think what you mentioned is a little like the urgent interrupt in VT-d PI Spec. For this kind of interrupts, if an interrupt is coming for an preempted vCPU (waiting in the run queue), we need to schedule the vCPU immediately. This is some real time things. And we don't support urgent interrupt so far. Thanks, Feng > > > > Thanks, > > Feng > > > >> > >>> efforts and in fact it is not needed at all. If SN is set for > >>> vCPU0, VT-d hardware will not issue Notification Event for vCPU0 > >>> when an interrupt is for it, but just setting the related PIR bit. > >>> > >>> Thanks, > >>> Feng > >>> > > > > > Thanks, > > Feng > > > >> > waiting for the next scheduling in the runqueue. But I am not > sure whether we need to set SN for other purpose in futu
RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
Wu, Feng wrote on 2014-12-19: > > > Zhang, Yang Z wrote on 2014-12-19: >> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is >> set >> >> Wu, Feng wrote on 2014-12-19: >>> >>> >>> Zhang, Yang Z wrote on 2014-12-19: Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set Wu, Feng wrote on 2014-12-19: > > > iommu-boun...@lists.linux-foundation.org wrote on mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of: >> Cc: iommu@lists.linux-foundation.org; >> linux-ker...@vger.kernel.org; k...@vger.kernel.org >> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' >> is set >> >> Paolo Bonzini wrote on 2014-12-18: >>> >>> >>> On 18/12/2014 04:14, Wu, Feng wrote: linux-kernel-ow...@vger.kernel.org wrote on >> mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Paolo: > x...@kernel.org; Gleb Natapov; Paolo Bonzini; > dw...@infradead.org; > joro-zlv9swrftaidnm+yrof...@public.gmane.org; Alex Williamson; > joro-zLv9SwRftAIdnm+Jiang Liu Cc: > iommu@lists.linux-foundation.org; > linux-kernel-u79uwxl29ty76z2rm5m...@public.gmane.org; KVM list; > Eric Auger Subject: Re: [v3 25/26] KVM: Suppress > posted-interrupt when 'SN' is set > > > > On 12/12/2014 16:14, Feng Wu wrote: >> Currently, we don't support urgent interrupt, all >> interrupts are recognized as non-urgent interrupt, so we >> cannot send posted-interrupt when 'SN' is set. > > Can this happen? If the vcpu is in guest mode, it cannot > have been scheduled out, and that's the only case when SN is set. > > Paolo Currently, the only place where SN is set is vCPU is preempted and >> >> If the vCPU is preempted, shouldn't the subsequent be ignored? >> What happens if a PI is occurs when vCPU is preempted? > > If a vCPU is preempted, the 'SN' bit is set, the subsequent > interrupts are suppressed for posting. I mean what happens if we don't set SN bit. From my point, if preempter already disabled the interrupt, it is ok to leave SN bit as zero. But if preempter enabled the interrupt, doesn't this mean he allow interrupt to happen? BTW, since there already has ON bit, so this means there only have one interrupt arrived at most and it doesn't hurt performance. Do we really need to set SN bit? >>> >>> >>> See this scenario: >>> vCPU0 is running on pCPU0 >>> --> vCPU0 is preempted by vCPU1 >>> --> Then vCPU1 is running on pCPU0 and vCPU0 is waiting for >>> --> schedule in runqueue >>> >>> If the we don't set SN for vCPU0, then all subsequent interrupts >>> for >>> vCPU0 is posted to vCPU1, this will consume hardware and software >> >> The PI vector for vCPU1 is notification vector, but the PI vector >> for >> vCPU0 should be wakeup vector. Why vCPU1 will consume this PI event? > > Wakeup vector is only used for blocking case, when vCPU is preempted > and waiting in the runqueue, the NV is the notification vector. I see your point. But from performance point, if we can schedule the vCPU to another PCPU to handle the interrupt, it would helpful. But I remember current KVM will not schedule the vCPU in run queue (even though it got preempted) to another pCPU to run(Am I right?). So it may hard to do it. > > Thanks, > Feng > >> >>> efforts and in fact it is not needed at all. If SN is set for >>> vCPU0, VT-d hardware will not issue Notification Event for vCPU0 >>> when an interrupt is for it, but just setting the related PIR bit. >>> >>> Thanks, >>> Feng >>> > > Thanks, > Feng > >> waiting for the next scheduling in the runqueue. But I am not sure whether we need to set SN for other purpose in future. Adding SN checking here is just to follow the Spec. non-urgent interrupts are suppressed >>> when SN is set. >>> >>> I would change that to a WARN_ON_ONCE then. >> >> >> Best regards, >> Yang >> >> >> ___ >> iommu mailing list >> iommu@lists.linux-foundation.org >> https://lists.linuxfoundation.org/mailman/listinfo/iommu Best regards, Yang >> >> >> Best regards, >> Yang >> Best regards, Yang ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
> -Original Message- > From: Zhang, Yang Z > Sent: Friday, December 19, 2014 12:44 PM > To: Wu, Feng; Paolo Bonzini; k...@vger.kernel.org > Cc: iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org; > k...@vger.kernel.org > Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set > > Wu, Feng wrote on 2014-12-19: > > > > > > Zhang, Yang Z wrote on 2014-12-19: > >> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is > >> set > >> > >> Wu, Feng wrote on 2014-12-19: > >>> > >>> > >>> iommu-boun...@lists.linux-foundation.org wrote on > >> mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of: > Cc: iommu@lists.linux-foundation.org; > linux-ker...@vger.kernel.org; k...@vger.kernel.org > Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' > is set > > Paolo Bonzini wrote on 2014-12-18: > > > > > > On 18/12/2014 04:14, Wu, Feng wrote: > >> > >> > >> linux-kernel-ow...@vger.kernel.org wrote on > mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Paolo: > >>> x...@kernel.org; Gleb Natapov; Paolo Bonzini; > dw...@infradead.org; > >>> joro-zlv9swrftaidnm+yrof...@public.gmane.org; Alex Williamson; > >>> joro-zLv9SwRftAIdnm+Jiang Liu Cc: > >>> iommu@lists.linux-foundation.org; > >>> linux-kernel-u79uwxl29ty76z2rm5m...@public.gmane.org; KVM > list; > >>> Eric Auger Subject: Re: [v3 25/26] KVM: Suppress posted-interrupt > >>> when 'SN' is set > >>> > >>> > >>> > >>> On 12/12/2014 16:14, Feng Wu wrote: > Currently, we don't support urgent interrupt, all interrupts > are recognized as non-urgent interrupt, so we cannot send > posted-interrupt when 'SN' is set. > >>> > >>> Can this happen? If the vcpu is in guest mode, it cannot have > >>> been scheduled out, and that's the only case when SN is set. > >>> > >>> Paolo > >> > >> Currently, the only place where SN is set is vCPU is preempted > >> and > > If the vCPU is preempted, shouldn't the subsequent be ignored? > What happens if a PI is occurs when vCPU is preempted? > >>> > >>> If a vCPU is preempted, the 'SN' bit is set, the subsequent > >>> interrupts are suppressed for posting. > >> > >> I mean what happens if we don't set SN bit. From my point, if > >> preempter already disabled the interrupt, it is ok to leave SN bit > >> as zero. But if preempter enabled the interrupt, doesn't this mean > >> he allow interrupt to happen? BTW, since there already has ON bit, > >> so this means there only have one interrupt arrived at most and it > >> doesn't hurt performance. Do we really need to set SN bit? > > > > > > See this scenario: > > vCPU0 is running on pCPU0 > > --> vCPU0 is preempted by vCPU1 > > --> Then vCPU1 is running on pCPU0 and vCPU0 is waiting for schedule > > --> in runqueue > > > > If the we don't set SN for vCPU0, then all subsequent interrupts for > > vCPU0 is posted to vCPU1, this will consume hardware and software > > The PI vector for vCPU1 is notification vector, but the PI vector for vCPU0 > should be wakeup vector. Why vCPU1 will consume this PI event? Wakeup vector is only used for blocking case, when vCPU is preempted and waiting in the runqueue, the NV is the notification vector. Thanks, Feng > > > efforts and in fact it is not needed at all. If SN is set for vCPU0, > > VT-d hardware will not issue Notification Event for vCPU0 when an > > interrupt is for it, but just setting the related PIR bit. > > > > Thanks, > > Feng > > > >> > >>> > >>> Thanks, > >>> Feng > >>> > > >> waiting for the next scheduling in the runqueue. But I am not > >> sure whether we need to set SN for other purpose in future. > >> Adding SN checking here is just to follow the Spec. non-urgent > >> interrupts are suppressed > > when SN is set. > > > > I would change that to a WARN_ON_ONCE then. > > > Best regards, > Yang > > > ___ > iommu mailing list > iommu@lists.linux-foundation.org > https://lists.linuxfoundation.org/mailman/listinfo/iommu > >> > >> > >> Best regards, > >> Yang > >> > > > Best regards, > Yang > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
Wu, Feng wrote on 2014-12-19: > > > Zhang, Yang Z wrote on 2014-12-19: >> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is >> set >> >> Wu, Feng wrote on 2014-12-19: >>> >>> >>> iommu-boun...@lists.linux-foundation.org wrote on >> mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of: Cc: iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org; k...@vger.kernel.org Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set Paolo Bonzini wrote on 2014-12-18: > > > On 18/12/2014 04:14, Wu, Feng wrote: >> >> >> linux-kernel-ow...@vger.kernel.org wrote on mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Paolo: >>> x...@kernel.org; Gleb Natapov; Paolo Bonzini; dw...@infradead.org; >>> joro-zlv9swrftaidnm+yrof...@public.gmane.org; Alex Williamson; >>> joro-zLv9SwRftAIdnm+Jiang Liu Cc: >>> iommu@lists.linux-foundation.org; >>> linux-kernel-u79uwxl29ty76z2rm5m...@public.gmane.org; KVM list; >>> Eric Auger Subject: Re: [v3 25/26] KVM: Suppress posted-interrupt >>> when 'SN' is set >>> >>> >>> >>> On 12/12/2014 16:14, Feng Wu wrote: Currently, we don't support urgent interrupt, all interrupts are recognized as non-urgent interrupt, so we cannot send posted-interrupt when 'SN' is set. >>> >>> Can this happen? If the vcpu is in guest mode, it cannot have >>> been scheduled out, and that's the only case when SN is set. >>> >>> Paolo >> >> Currently, the only place where SN is set is vCPU is preempted >> and If the vCPU is preempted, shouldn't the subsequent be ignored? What happens if a PI is occurs when vCPU is preempted? >>> >>> If a vCPU is preempted, the 'SN' bit is set, the subsequent >>> interrupts are suppressed for posting. >> >> I mean what happens if we don't set SN bit. From my point, if >> preempter already disabled the interrupt, it is ok to leave SN bit >> as zero. But if preempter enabled the interrupt, doesn't this mean >> he allow interrupt to happen? BTW, since there already has ON bit, >> so this means there only have one interrupt arrived at most and it >> doesn't hurt performance. Do we really need to set SN bit? > > > See this scenario: > vCPU0 is running on pCPU0 > --> vCPU0 is preempted by vCPU1 > --> Then vCPU1 is running on pCPU0 and vCPU0 is waiting for schedule > --> in runqueue > > If the we don't set SN for vCPU0, then all subsequent interrupts for > vCPU0 is posted to vCPU1, this will consume hardware and software The PI vector for vCPU1 is notification vector, but the PI vector for vCPU0 should be wakeup vector. Why vCPU1 will consume this PI event? > efforts and in fact it is not needed at all. If SN is set for vCPU0, > VT-d hardware will not issue Notification Event for vCPU0 when an > interrupt is for it, but just setting the related PIR bit. > > Thanks, > Feng > >> >>> >>> Thanks, >>> Feng >>> >> waiting for the next scheduling in the runqueue. But I am not >> sure whether we need to set SN for other purpose in future. >> Adding SN checking here is just to follow the Spec. non-urgent >> interrupts are suppressed > when SN is set. > > I would change that to a WARN_ON_ONCE then. Best regards, Yang ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu >> >> >> Best regards, >> Yang >> Best regards, Yang ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
> -Original Message- > From: Zhang, Yang Z > Sent: Friday, December 19, 2014 11:33 AM > To: Wu, Feng; Paolo Bonzini; k...@vger.kernel.org > Cc: iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org; > k...@vger.kernel.org > Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set > > Wu, Feng wrote on 2014-12-19: > > > > > > iommu-boun...@lists.linux-foundation.org wrote on > mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of: > >> Cc: iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org; > >> k...@vger.kernel.org > >> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is > >> set > >> > >> Paolo Bonzini wrote on 2014-12-18: > >>> > >>> > >>> On 18/12/2014 04:14, Wu, Feng wrote: > > > linux-kernel-ow...@vger.kernel.org wrote on > >> mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Paolo: > > x...@kernel.org; Gleb Natapov; Paolo Bonzini; > > dw...@infradead.org; > > joro-zlv9swrftaidnm+yrof...@public.gmane.org; Alex Williamson; > > joro-zLv9SwRftAIdnm+Jiang > > Liu > > Cc: iommu@lists.linux-foundation.org; > > linux-kernel-u79uwxl29ty76z2rm5m...@public.gmane.org; KVM list; > > Eric Auger > > Subject: Re: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' > > is set > > > > > > > > On 12/12/2014 16:14, Feng Wu wrote: > >> Currently, we don't support urgent interrupt, all interrupts > >> are recognized as non-urgent interrupt, so we cannot send > >> posted-interrupt when 'SN' is set. > > > > Can this happen? If the vcpu is in guest mode, it cannot have > > been scheduled out, and that's the only case when SN is set. > > > > Paolo > > Currently, the only place where SN is set is vCPU is preempted > and > >> > >> If the vCPU is preempted, shouldn't the subsequent be ignored? What > >> happens if a PI is occurs when vCPU is preempted? > > > > If a vCPU is preempted, the 'SN' bit is set, the subsequent interrupts > > are suppressed for posting. > > I mean what happens if we don't set SN bit. From my point, if preempter > already disabled the interrupt, it is ok to leave SN bit as zero. But if > preempter > enabled the interrupt, doesn't this mean he allow interrupt to happen? BTW, > since there already has ON bit, so this means there only have one interrupt > arrived at most and it doesn't hurt performance. Do we really need to set SN > bit? See this scenario: vCPU0 is running on pCPU0 --> vCPU0 is preempted by vCPU1 --> Then vCPU1 is running on pCPU0 and vCPU0 is waiting for schedule in runqueue If the we don't set SN for vCPU0, then all subsequent interrupts for vCPU0 is posted to vCPU1, this will consume hardware and software efforts and in fact it is not needed at all. If SN is set for vCPU0, VT-d hardware will not issue Notification Event for vCPU0 when an interrupt is for it, but just setting the related PIR bit. Thanks, Feng > > > > > Thanks, > > Feng > > > >> > waiting for the next scheduling in the runqueue. But I am not > sure whether we need to set SN for other purpose in future. > Adding SN checking here is just to follow the Spec. non-urgent > interrupts are suppressed > >>> when SN is set. > >>> > >>> I would change that to a WARN_ON_ONCE then. > >> > >> > >> Best regards, > >> Yang > >> > >> > >> ___ > >> iommu mailing list > >> iommu@lists.linux-foundation.org > >> https://lists.linuxfoundation.org/mailman/listinfo/iommu > > > Best regards, > Yang > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
Wu, Feng wrote on 2014-12-19: > > > iommu-boun...@lists.linux-foundation.org wrote on > mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of: >> Cc: iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org; >> k...@vger.kernel.org >> Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is >> set >> >> Paolo Bonzini wrote on 2014-12-18: >>> >>> >>> On 18/12/2014 04:14, Wu, Feng wrote: linux-kernel-ow...@vger.kernel.org wrote on >> mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Paolo: > x...@kernel.org; Gleb Natapov; Paolo Bonzini; > dw...@infradead.org; > joro-zlv9swrftaidnm+yrof...@public.gmane.org; Alex Williamson; > joro-zLv9SwRftAIdnm+Jiang > Liu > Cc: iommu@lists.linux-foundation.org; > linux-kernel-u79uwxl29ty76z2rm5m...@public.gmane.org; KVM list; > Eric Auger > Subject: Re: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' > is set > > > > On 12/12/2014 16:14, Feng Wu wrote: >> Currently, we don't support urgent interrupt, all interrupts >> are recognized as non-urgent interrupt, so we cannot send >> posted-interrupt when 'SN' is set. > > Can this happen? If the vcpu is in guest mode, it cannot have > been scheduled out, and that's the only case when SN is set. > > Paolo Currently, the only place where SN is set is vCPU is preempted and >> >> If the vCPU is preempted, shouldn't the subsequent be ignored? What >> happens if a PI is occurs when vCPU is preempted? > > If a vCPU is preempted, the 'SN' bit is set, the subsequent interrupts > are suppressed for posting. I mean what happens if we don't set SN bit. From my point, if preempter already disabled the interrupt, it is ok to leave SN bit as zero. But if preempter enabled the interrupt, doesn't this mean he allow interrupt to happen? BTW, since there already has ON bit, so this means there only have one interrupt arrived at most and it doesn't hurt performance. Do we really need to set SN bit? > > Thanks, > Feng > >> waiting for the next scheduling in the runqueue. But I am not sure whether we need to set SN for other purpose in future. Adding SN checking here is just to follow the Spec. non-urgent interrupts are suppressed >>> when SN is set. >>> >>> I would change that to a WARN_ON_ONCE then. >> >> >> Best regards, >> Yang >> >> >> ___ >> iommu mailing list >> iommu@lists.linux-foundation.org >> https://lists.linuxfoundation.org/mailman/listinfo/iommu Best regards, Yang ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
> -Original Message- > From: iommu-boun...@lists.linux-foundation.org > [mailto:iommu-boun...@lists.linux-foundation.org] On Behalf Of Zhang, Yang Z > Sent: Thursday, December 18, 2014 11:10 PM > To: Paolo Bonzini; k...@vger.kernel.org > Cc: iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org; > k...@vger.kernel.org > Subject: RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set > > Paolo Bonzini wrote on 2014-12-18: > > > > > > On 18/12/2014 04:14, Wu, Feng wrote: > >> > >> > >> linux-kernel-ow...@vger.kernel.org wrote on > mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Paolo: > >>> x...@kernel.org; Gleb Natapov; Paolo Bonzini; dw...@infradead.org; > >>> joro-zlv9swrftaidnm+yrof...@public.gmane.org; Alex Williamson; > >>> joro-zLv9SwRftAIdnm+Jiang > >>> Liu > >>> Cc: iommu@lists.linux-foundation.org; > >>> linux-kernel-u79uwxl29ty76z2rm5m...@public.gmane.org; KVM list; > >>> Eric Auger > >>> Subject: Re: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is > >>> set > >>> > >>> > >>> > >>> On 12/12/2014 16:14, Feng Wu wrote: > Currently, we don't support urgent interrupt, all interrupts are > recognized as non-urgent interrupt, so we cannot send > posted-interrupt when 'SN' is set. > >>> > >>> Can this happen? If the vcpu is in guest mode, it cannot have been > >>> scheduled out, and that's the only case when SN is set. > >>> > >>> Paolo > >> > >> Currently, the only place where SN is set is vCPU is preempted and > > If the vCPU is preempted, shouldn't the subsequent be ignored? What happens > if a PI is occurs when vCPU is preempted? If a vCPU is preempted, the 'SN' bit is set, the subsequent interrupts are suppressed for posting. Thanks, Feng > > >> waiting for the next scheduling in the runqueue. But I am not sure > >> whether we need to set SN for other purpose in future. Adding SN > >> checking here is just to follow the Spec. non-urgent interrupts are > >> suppressed > > when SN is set. > > > > I would change that to a WARN_ON_ONCE then. > > > Best regards, > Yang > > > ___ > iommu mailing list > iommu@lists.linux-foundation.org > https://lists.linuxfoundation.org/mailman/listinfo/iommu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU is blocked
> -Original Message- > From: linux-kernel-ow...@vger.kernel.org > [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Paolo Bonzini > Sent: Thursday, December 18, 2014 4:37 PM > To: linux-ker...@vger.kernel.org > Cc: iommu@lists.linux-foundation.org; k...@vger.kernel.org; > linux-ker...@vger.kernel.org; k...@vger.kernel.org > Subject: Re: [v3 24/26] KVM: Update Posted-Interrupts Descriptor when vCPU > is blocked > > > > On 18/12/2014 04:16, Wu, Feng wrote: > >>> pre-block: > >>> - Add the vCPU to the blocked per-CPU list > >>> - Clear 'SN' > >> > >> Should SN be already clear (and NV set to POSTED_INTR_VECTOR)? > > > > I think the SN bit should be clear here, Adding it here is just to make sure > > SN is clear when vCPU is blocked, so it can receive wakeup notification > > event > later. > > Then, please, WARN if the SN bit is set inside the if (vcpu->blocked). > Inside that if you can just add the vCPU to the blocked list on vcpu_put. > > >> Can it > >> happen that you go from sched-out to blocked without doing a sched-in > first? > >> > > > > I cannot imagine this scenario, can you please be more specific? Thanks a > > lot! > > I cannot either. :) But it would be the case where SN is not cleared. > So we agree that it cannot happen. > > >> In fact, if this is possible, what happens if vcpu->preempted && > >> vcpu->blocked? > > > > In fact, vcpu->preempted && vcpu->blocked happens sometimes, but I think > there is > > no issues. Please refer to the following case: > > I agree that there should be no issues. But if it can happen, it's better: > > 1) to separate the handling of preemption and blocking: preemption > handles SN/NV/NDST, blocking handles the wakeup list. > Sorry, I don't quite understand this. I think handling of preemption and blocking is separated in vmx_vcpu_put(). For vmx_vcpu_load(), the handling of SN/NV/NDST is common for preemption and blocking. Thanks, Feng > 2) to change this > > + } else if (vcpu->blocked) { > + /* > + * The vcpu is blocked on the wait queue. > + * Store the blocked vCPU on the list of the > + * vcpu->wakeup_cpu, which is the destination > + * of the wake-up notification event. > > to just > > } > if (vcpu->blocked) { > ... > } > > kvm_vcpu_block() > > -> vcpu->blocked = true; > > -> prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE); > > > > before schedule() is called, this vcpu is woken up by another guy, so > > the state of the vcpu associated thread is changed to TASK_RUNNING, > > then preemption happens after interrupts or the following schedule() is > > hit, this will call kvm_sched_out(), in which current->state == > TASK_RUNNING > > and vcpu->preempted is set to true. So now vcpu->preempted and > vcpu->blocked > > are both true. In vmx_vcpu_put(), we will check vcpu->preempted first, > > so > > the vCPU will not be blocked, and the vcpu->blocked will be set the > > false in > > vmx_vcpu_load(). > > > > But maybe I need do a little change to the vmx_vcpu_load() like below: > > > > /* > > * Delete the vCPU from the related wakeup queue > > * if we are resuming from blocked state > > */ > > if (vcpu->blocked) { > > vcpu->blocked = false; > > + /* if wakeup_cpu == -1, the > > vcpu is currently not > blocked on any > > + pCPU, don't need dequeue here > > */ > > + if (vcpu->wakeup_cpu != -1) { > > > spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock, > > vcpu->wakeup_cpu), flags); > > list_del(&vcpu->blocked_vcpu_list); > > > spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock, > > vcpu->wakeup_cpu), flags); > > vcpu->wakeup_cpu = -1; > > + } > > } > > Good idea. > > Paolo > > > Any ideas about this? Thanks a lot! > > > > Thanks, > > Feng > > > > > > -> schedule(); > > > > > >> > >>> - Set 'NV' to POSTED_INTR_WAKEUP_VECTOR > >>> > >>> post-block: > >>> - Remove the vCPU from the per-CPU list > >> > >> Paolo > >> > >>> Signed-off-by: Feng Wu > >> -- > >> To unsubscribe from this list: send the line "unsubscribe kvm" in > >> the body of a message to majord...@vger.kernel.org > >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http:
RE: [v3 16/26] KVM: Make struct kvm_irq_routing_table accessible
> -Original Message- > From: linux-kernel-ow...@vger.kernel.org > [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Paolo Bonzini > Sent: Thursday, December 18, 2014 12:18 AM > To: linux-ker...@vger.kernel.org > Cc: iommu@lists.linux-foundation.org; k...@vger.kernel.org; > linux-ker...@vger.kernel.org; k...@vger.kernel.org > Subject: Re: [v3 16/26] KVM: Make struct kvm_irq_routing_table accessible > > > > On 12/12/2014 16:14, Feng Wu wrote: > > Move struct kvm_irq_routing_table from irqchip.c to kvm_host.h, > > so we can use it outside of irqchip.c. > > > > Signed-off-by: Feng Wu > > --- > > include/linux/kvm_host.h | 19 +++ > > virt/kvm/irqchip.c | 11 --- > > 2 files changed, 19 insertions(+), 11 deletions(-) > > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > > index 0b9659d..cfa85ac 100644 > > --- a/include/linux/kvm_host.h > > +++ b/include/linux/kvm_host.h > > @@ -335,6 +335,25 @@ struct kvm_kernel_irq_routing_entry { > > struct hlist_node link; > > }; > > > > +#ifdef CONFIG_HAVE_KVM_IRQ_ROUTING > > + > > +struct kvm_irq_routing_table { > > + int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS]; > > + struct kvm_kernel_irq_routing_entry *rt_entries; > > + u32 nr_rt_entries; > > + /* > > +* Array indexed by gsi. Each entry contains list of irq chips > > +* the gsi is connected to. > > +*/ > > + struct hlist_head map[0]; > > +}; > > + > > +#else > > + > > +struct kvm_irq_routing_table {}; > > If possible, just make this "struct kvm_irq_routing_table;" and pull > this line to include/linux/kvm_types.h. > > Paolo Do you mean move the definition of struct kvm_irq_routing_table to include/linux/kvm_types.h and add a declaration here? Thanks, Feng > > > + > > +#endif > > + > > #ifndef KVM_PRIVATE_MEM_SLOTS > > #define KVM_PRIVATE_MEM_SLOTS 0 > > #endif > > diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c > > index 7f256f3..cdf29a6 100644 > > --- a/virt/kvm/irqchip.c > > +++ b/virt/kvm/irqchip.c > > @@ -31,17 +31,6 @@ > > #include > > #include "irq.h" > > > > -struct kvm_irq_routing_table { > > - int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS]; > > - struct kvm_kernel_irq_routing_entry *rt_entries; > > - u32 nr_rt_entries; > > - /* > > -* Array indexed by gsi. Each entry contains list of irq chips > > -* the gsi is connected to. > > -*/ > > - struct hlist_head map[0]; > > -}; > > - > > int kvm_irq_map_gsi(struct kvm *kvm, > > struct kvm_kernel_irq_routing_entry *entries, int gsi) > > { > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [v3 23/26] KVM: Update Posted-Interrupts Descriptor when vCPU is preempted
> -Original Message- > From: linux-kernel-ow...@vger.kernel.org > [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Paolo Bonzini > Sent: Thursday, December 18, 2014 4:32 PM > To: linux-ker...@vger.kernel.org > Cc: iommu@lists.linux-foundation.org; k...@vger.kernel.org; > linux-ker...@vger.kernel.org; k...@vger.kernel.org > Subject: Re: [v3 23/26] KVM: Update Posted-Interrupts Descriptor when vCPU > is preempted > > > > On 18/12/2014 04:15, Wu, Feng wrote: > > Thanks for your comments, Paolo! > > > > If we use u64 new_control, we cannot use new.sn any more. > > Maybe we can change the struct pi_desc {} like this: > > > > typedef struct pid_control{ > > u64 on : 1, > > sn : 1, > > rsvd_1 : 13, > > ndm : 1, > > nv : 8, > > rsvd_2 : 8, > > ndst: 32; > > }pid_control_t; > > > > struct pi_desc { > > u32 pir[8]; /* Posted interrupt requested */ > > pid_control_t control; > > Probably something like this to keep the union: > > typedef union pid_control { > u64 full; > struct { > u64 on : 1, > ... > } fields; > }; > > > u32 rsvd[6]; > > } __aligned(64); > > > > > > Then we can define pid_control_t new_control, old_control. And use > new_control.sn = 0. > > > > What is your opinon? > > Sure. Alternatively, keep using struct pi_desc new; just > do not zero it, nor access any field outide the control word. > > Paolo Yes, this is also a good idea. Thanks! Thanks, Feng > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH 2/2] iommu: rockchip: Handle system-wide and runtime PM
On Thursday, December 18, 2014 11:28:58 PM Laurent Pinchart wrote: > Hi Kevin, > [cut] > > >> > > >> It would be better to be able to reference count the DMA engine from the > > >> bus master IMO and arguably you can use the runtime PM framework for > > >> that. Namely, give bus masters someting like > > >> > > >> pm_runtime_get_my_DMA_engine(bus_master_device) > > >> pm_runtime_put_my_DMA_engine(bus_master_device) > > >> > > >> and let them call these as they see fit. > > > > > > Please note that we're not talking about DMA engines here, but about > > > IOMMUs. DMA is involved through the DMA mapping API which hides the IOMMU > > > completely from the bus master drivers, not the DMA engine API. > > > > > > Exposing the IOMMU is something we want to avoid, but DMA mapping > > > start/stop operations could certainly be implemented. > > > > The problem with that is it only solves the IOMMU problem. We have a > > more generic PM dependency problem of which this IOMMU example is only a > > subset, so I think we need a more generic solution. > > I agree that a more generic solution is needed at least to support ACPI _DEP, > but that might not be optimal in the IOMMU use case as explained above. Well, since we need it anyway, why don't we implement it and then figure out if anything more specific needs to be done for the IOMMU case? -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
Wu, Feng wrote on 2014-12-19: > > > Paolo Bonzini wrote on 2014-12-19: >> jiang@linux.intel.com >> Cc: eric.au...@linaro.org; linux-ker...@vger.kernel.org; >> iommu@lists.linux-foundation.org; k...@vger.kernel.org >> Subject: Re: [v3 13/26] KVM: Define a new interface >> kvm_find_dest_vcpu() for VT-d PI >> >> >> >> On 18/12/2014 15:49, Zhang, Yang Z wrote: > Here, we introduce a similar way with 'apic_arb_prio' to handle > guest lowest priority interrtups when VT-d PI is used. Here is > the > ideas: - Each vCPU has a counter 'round_robin_counter'. - When > guests sets an interrupts to lowest priority, we choose the vCPU > with smallest 'round_robin_counter' as the destination, then > increase it. >>> >>> How this can work well? All subsequent interrupts are delivered to >>> one vCPU? It shouldn't be the best solution, need more consideration. >> >> Well, it's a hardware limitation. The alternative (which is easy to >> implement) is to only do PI for single-CPU interrupts. This should >> work well for multiqueue NICs (and of course for UP guests :)), so >> perhaps it's a good idea to only support that as a first attempt. >> >> Paolo > > Paolo, what do you mean by "single-CPU interrupts"? Do you mean we It should be same idea as I mentioned on another thread: deliver the interrupt to a single CPU(maybe the first matched VCPU?) > don't support lowest priority interrupts for PI? But Linux OS uses > lowest priority for most of the case? If so, we can hardly get benefit > from this feature for Linux guest OS. > > Thanks, > Feng > >> >>> Also, I think you should take the apic_arb_prio into consider >>> since the priority is for the whole vCPU not for one interrupt. Best regards, Yang ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts
Wu, Feng wrote on 2014-12-19: > > > Zhang, Yang Z wrote on 2014-12-18: >> jiang@linux.intel.com >> Cc: eric.au...@linaro.org; linux-ker...@vger.kernel.org; >> iommu@lists.linux-foundation.org; k...@vger.kernel.org; Wu, Feng >> Subject: RE: [v3 06/26] iommu, x86: No need to migrating irq for >> VT-d Posted-Interrupts >> >> Feng Wu wrote on 2014-12-12: >>> We don't need to migrate the irqs for VT-d Posted-Interrupts here. >>> When 'pst' is set in IRTE, the associated irq will be posted to >>> guests instead of interrupt remapping. The destination of the >>> interrupt is set in Posted-Interrupts Descriptor, and the >>> migration happens during vCPU scheduling. >>> >>> However, we still update the cached irte here, which can be used >>> when changing back to remapping mode. >>> >>> Signed-off-by: Feng Wu >>> Reviewed-by: Jiang Liu >>> --- >>> drivers/iommu/intel_irq_remapping.c | 6 +- >>> 1 file changed, 5 insertions(+), 1 deletion(-) diff --git >>> a/drivers/iommu/intel_irq_remapping.c >>> b/drivers/iommu/intel_irq_remapping.c index 48c2051..ab9057a >>> 100644 >>> --- a/drivers/iommu/intel_irq_remapping.c +++ >>> b/drivers/iommu/intel_irq_remapping.c @@ -977,6 +977,7 @@ >>> intel_ir_set_affinity(struct irq_data *data, const struct cpumask >>> *mask, { >>> struct intel_ir_data *ir_data = data->chip_data;struct irte >>> *irte = >>> &ir_data->irte_entry; +struct irte_pi *irte_pi = (struct irte_pi >>> *)irte;struct irq_cfg *cfg = irqd_cfg(data); struct irq_data *parent >>> = data->parent_data; int ret; >>> @@ -991,7 +992,10 @@ intel_ir_set_affinity(struct irq_data *data, >>> const struct cpumask *mask, >>> */ >>> irte->vector = cfg->vector; >>> irte->dest_id = IRTE_DEST(cfg->dest_apicid); >>> - modify_irte(&ir_data->irq_2_iommu, irte); >>> + >>> + /* We don't need to modify irte if the interrupt is for posting. */ >>> + if (irte_pi->pst != 1) >>> + modify_irte(&ir_data->irq_2_iommu, irte); >> >> What happens if user changes the IRQ affinity manually? > > If the IRQ is posted, its affinity is controlled by guest (irq <---> > vCPU <> pCPU), it has no effect when host changes its affinity. That's the problem: User is able to changes it in host but it never takes effect since it is actually controlled by guest. I guess it will break the IRQ balance too. > > Thanks, > Feng > >> >>> >>> /* >>> * After this point, all the interrupts will start arriving >> >> >> Best regards, >> Yang >> Best regards, Yang ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts
> -Original Message- > From: Zhang, Yang Z > Sent: Thursday, December 18, 2014 10:26 PM > To: Wu, Feng; t...@linutronix.de; mi...@redhat.com; h...@zytor.com; > x...@kernel.org; g...@kernel.org; pbonz...@redhat.com; > dw...@infradead.org; j...@8bytes.org; alex.william...@redhat.com; > jiang@linux.intel.com > Cc: eric.au...@linaro.org; linux-ker...@vger.kernel.org; > iommu@lists.linux-foundation.org; k...@vger.kernel.org; Wu, Feng > Subject: RE: [v3 06/26] iommu, x86: No need to migrating irq for VT-d > Posted-Interrupts > > Feng Wu wrote on 2014-12-12: > > We don't need to migrate the irqs for VT-d Posted-Interrupts here. > > When 'pst' is set in IRTE, the associated irq will be posted to guests > > instead of interrupt remapping. The destination of the interrupt is > > set in Posted-Interrupts Descriptor, and the migration happens during > > vCPU scheduling. > > > > However, we still update the cached irte here, which can be used when > > changing back to remapping mode. > > > > Signed-off-by: Feng Wu > > Reviewed-by: Jiang Liu > > --- > > drivers/iommu/intel_irq_remapping.c | 6 +- > > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/iommu/intel_irq_remapping.c > > b/drivers/iommu/intel_irq_remapping.c index 48c2051..ab9057a 100644 --- > > a/drivers/iommu/intel_irq_remapping.c +++ > > b/drivers/iommu/intel_irq_remapping.c @@ -977,6 +977,7 @@ > > intel_ir_set_affinity(struct irq_data *data, const struct cpumask *mask, > > { > > struct intel_ir_data *ir_data = data->chip_data;struct irte > > *irte = > > &ir_data->irte_entry; +struct irte_pi *irte_pi = (struct irte_pi > > *)irte;struct irq_cfg *cfg = irqd_cfg(data); struct irq_data *parent > > = data->parent_data; int ret; > > @@ -991,7 +992,10 @@ intel_ir_set_affinity(struct irq_data *data, > > const struct cpumask *mask, > > */ > > irte->vector = cfg->vector; > > irte->dest_id = IRTE_DEST(cfg->dest_apicid); > > - modify_irte(&ir_data->irq_2_iommu, irte); > > + > > + /* We don't need to modify irte if the interrupt is for posting. */ > > + if (irte_pi->pst != 1) > > + modify_irte(&ir_data->irq_2_iommu, irte); > > What happens if user changes the IRQ affinity manually? If the IRQ is posted, its affinity is controlled by guest (irq <---> vCPU <> pCPU), it has no effect when host changes its affinity. Thanks, Feng > > > > > /* > > * After this point, all the interrupts will start arriving > > > Best regards, > Yang > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
> -Original Message- > From: Zhang, Yang Z > Sent: Friday, December 19, 2014 9:14 AM > To: Paolo Bonzini; Wu, Feng; t...@linutronix.de; mi...@redhat.com; > h...@zytor.com; x...@kernel.org; g...@kernel.org; dw...@infradead.org; > j...@8bytes.org; alex.william...@redhat.com; jiang@linux.intel.com > Cc: eric.au...@linaro.org; linux-ker...@vger.kernel.org; > iommu@lists.linux-foundation.org; k...@vger.kernel.org > Subject: RE: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for > VT-d PI > > Paolo Bonzini wrote on 2014-12-19: > > > > > > On 18/12/2014 15:49, Zhang, Yang Z wrote: > Here, we introduce a similar way with 'apic_arb_prio' to handle > guest lowest priority interrtups when VT-d PI is used. Here is the > ideas: - Each vCPU has a counter 'round_robin_counter'. - When > guests sets an interrupts to lowest priority, we choose the vCPU > with smallest 'round_robin_counter' as the destination, then > increase it. > >> > >> How this can work well? All subsequent interrupts are delivered to > >> one vCPU? It shouldn't be the best solution, need more consideration. > > > > Well, it's a hardware limitation. The alternative (which is easy to > > Agree, it is limited by hardware. But lowest priority distributes the > interrupt > more efficient than fixed mode. And current implementation more likes to > switch the lowest priority mode to fixed mode. In case of interrupt intensive > environment, this may be a bottleneck and VM may not benefit greatly from > VT-d PI. But agree again, it is really a hardware limitation. > > > implement) is to only do PI for single-CPU interrupts. This should > > work well for multiqueue NICs (and of course for UP guests :)), so > > perhaps it's a good idea to only support that as a first attempt. > > The more easy way is to deliver the interrupt to the first matched VCPU we > find. > The round_robin_counter really helps nothing here since the interrupt is > delivered by hardware directly. > > > > > Paolo > > > >> Also, I think you should take the apic_arb_prio into consider since > >> the priority is for the whole vCPU not for one interrupt. > > > Best regards, > Yang In fact, the current solution was discussed with Rajesh in the cc List, here is Rajesh's original words: "When you see a guest requesting a lowest priority interrupts (by programming the virtual IOAPIC, or by programming the virtual MSI/MSI-X registers), have KVM associate it to a vCPU. Or, put another way, use the 'apic_arb_prio' method you describe below, but instead of using it at time of interrupt (which you no longer have control with posted interrupt direct delivery), do it at time of initializing the interrupt resource. This way, if the guest asks for 4 lowest priority interrupts, and say you a guest with two vCPUs, the first interrupt request will be serviced by KVM by assigning it through posting to vCPU0, the next one goes to vCPU1, the next one would go back to vCPU0, and so forth.. You could also choose to do this based on vector hashing instead of round-robin." Thanks, Feng > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
> -Original Message- > From: Paolo Bonzini [mailto:pbonz...@redhat.com] > Sent: Friday, December 19, 2014 12:58 AM > To: Zhang, Yang Z; Wu, Feng; t...@linutronix.de; mi...@redhat.com; > h...@zytor.com; x...@kernel.org; g...@kernel.org; dw...@infradead.org; > j...@8bytes.org; alex.william...@redhat.com; jiang@linux.intel.com > Cc: eric.au...@linaro.org; linux-ker...@vger.kernel.org; > iommu@lists.linux-foundation.org; k...@vger.kernel.org > Subject: Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for > VT-d PI > > > > On 18/12/2014 15:49, Zhang, Yang Z wrote: > >>> Here, we introduce a similar way with 'apic_arb_prio' to handle > >>> guest lowest priority interrtups when VT-d PI is used. Here is > >>> the ideas: - Each vCPU has a counter 'round_robin_counter'. - > >>> When guests sets an interrupts to lowest priority, we choose the > >>> vCPU with smallest 'round_robin_counter' as the destination, then > >>> increase it. > > > > How this can work well? All subsequent interrupts are delivered to > > one vCPU? It shouldn't be the best solution, need more consideration. > > Well, it's a hardware limitation. The alternative (which is easy to > implement) is to only do PI for single-CPU interrupts. This should work > well for multiqueue NICs (and of course for UP guests :)), so perhaps > it's a good idea to only support that as a first attempt. > > Paolo Paolo, what do you mean by "single-CPU interrupts"? Do you mean we don't support lowest priority interrupts for PI? But Linux OS uses lowest priority for most of the case? If so, we can hardly get benefit from this feature for Linux guest OS. Thanks, Feng > > > Also, I think you should take the apic_arb_prio into consider since > > the priority is for the whole vCPU not for one interrupt. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
Paolo Bonzini wrote on 2014-12-19: > > > On 18/12/2014 15:49, Zhang, Yang Z wrote: Here, we introduce a similar way with 'apic_arb_prio' to handle guest lowest priority interrtups when VT-d PI is used. Here is the ideas: - Each vCPU has a counter 'round_robin_counter'. - When guests sets an interrupts to lowest priority, we choose the vCPU with smallest 'round_robin_counter' as the destination, then increase it. >> >> How this can work well? All subsequent interrupts are delivered to >> one vCPU? It shouldn't be the best solution, need more consideration. > > Well, it's a hardware limitation. The alternative (which is easy to Agree, it is limited by hardware. But lowest priority distributes the interrupt more efficient than fixed mode. And current implementation more likes to switch the lowest priority mode to fixed mode. In case of interrupt intensive environment, this may be a bottleneck and VM may not benefit greatly from VT-d PI. But agree again, it is really a hardware limitation. > implement) is to only do PI for single-CPU interrupts. This should > work well for multiqueue NICs (and of course for UP guests :)), so > perhaps it's a good idea to only support that as a first attempt. The more easy way is to deliver the interrupt to the first matched VCPU we find. The round_robin_counter really helps nothing here since the interrupt is delivered by hardware directly. > > Paolo > >> Also, I think you should take the apic_arb_prio into consider since >> the priority is for the whole vCPU not for one interrupt. Best regards, Yang ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [v3 21/26] x86, irq: Define a global vector for VT-d Posted-Interrupts
> -Original Message- > From: Zhang, Yang Z > Sent: Thursday, December 18, 2014 10:55 PM > To: Wu, Feng; t...@linutronix.de; mi...@redhat.com; h...@zytor.com; > x...@kernel.org; g...@kernel.org; pbonz...@redhat.com; > dw...@infradead.org; j...@8bytes.org; alex.william...@redhat.com; > jiang@linux.intel.com > Cc: eric.au...@linaro.org; linux-ker...@vger.kernel.org; > iommu@lists.linux-foundation.org; k...@vger.kernel.org; Wu, Feng > Subject: RE: [v3 21/26] x86, irq: Define a global vector for VT-d > Posted-Interrupts > > Feng Wu wrote on 2014-12-12: > > Currently, we use a global vector as the Posted-Interrupts > > Notification Event for all the vCPUs in the system. We need to > > introduce another global vector for VT-d Posted-Interrtups, which will > > be used to wakeup the sleep vCPU when an external interrupt from a > direct-assigned device happens for that vCPU. > > > > Hi Feng, > > Since the idea of two global vectors mechanism is from me, please add me to > the comments. No problem, Yang, I will add a "suggested-by Yang Zhang " in this patch. Thanks a lot! Thanks, Feng > > > Signed-off-by: Feng Wu > > --- > > arch/x86/include/asm/entry_arch.h | 2 ++ > > arch/x86/include/asm/hardirq.h | 1 + > > arch/x86/include/asm/hw_irq.h | 2 ++ > > arch/x86/include/asm/irq_vectors.h | 1 + > > arch/x86/kernel/entry_64.S | 2 ++ > > arch/x86/kernel/irq.c | 27 > +++ > > arch/x86/kernel/irqinit.c | 2 ++ > > 7 files changed, 37 insertions(+) > > diff --git a/arch/x86/include/asm/entry_arch.h > > b/arch/x86/include/asm/entry_arch.h index dc5fa66..27ca0af 100644 --- > > a/arch/x86/include/asm/entry_arch.h +++ > > b/arch/x86/include/asm/entry_arch.h @@ -23,6 +23,8 @@ > > BUILD_INTERRUPT(x86_platform_ipi, X86_PLATFORM_IPI_VECTOR) #ifdef > > CONFIG_HAVE_KVM BUILD_INTERRUPT3(kvm_posted_intr_ipi, > POSTED_INTR_VECTOR, > > smp_kvm_posted_intr_ipi) > > +BUILD_INTERRUPT3(kvm_posted_intr_wakeup_ipi, > POSTED_INTR_WAKEUP_VECTOR, > > +smp_kvm_posted_intr_wakeup_ipi) > > #endif > > > > /* > > diff --git a/arch/x86/include/asm/hardirq.h > > b/arch/x86/include/asm/hardirq.h index 0f5fb6b..9866065 100644 > > --- a/arch/x86/include/asm/hardirq.h > > +++ b/arch/x86/include/asm/hardirq.h > > @@ -14,6 +14,7 @@ typedef struct { > > #endif #ifdef CONFIG_HAVE_KVM unsigned int kvm_posted_intr_ipis; > > + unsigned int kvm_posted_intr_wakeup_ipis; #endifunsigned int > > x86_platform_ipis; /* arch dependent */unsigned int apic_perf_irqs; > > diff --git a/arch/x86/include/asm/hw_irq.h > > b/arch/x86/include/asm/hw_irq.h index e7ae6eb..38fac9b 100644 > > --- a/arch/x86/include/asm/hw_irq.h > > +++ b/arch/x86/include/asm/hw_irq.h > > @@ -29,6 +29,7 @@ > > extern asmlinkage void apic_timer_interrupt(void); extern asmlinkage > > void x86_platform_ipi(void); extern asmlinkage void > > kvm_posted_intr_ipi(void); +extern asmlinkage void > > kvm_posted_intr_wakeup_ipi(void); > > extern asmlinkage void error_interrupt(void); extern asmlinkage void > > irq_work_interrupt(void); > > > > @@ -92,6 +93,7 @@ extern void > > trace_call_function_single_interrupt(void); > > #define trace_irq_move_cleanup_interrupt irq_move_cleanup_interrupt > > #define trace_reboot_interrupt reboot_interrupt #define > > trace_kvm_posted_intr_ipi kvm_posted_intr_ipi > > +#define trace_kvm_posted_intr_wakeup_ipi kvm_posted_intr_wakeup_ipi > > #endif /* CONFIG_TRACING */ > > > > struct irq_domain; > > diff --git a/arch/x86/include/asm/irq_vectors.h > > b/arch/x86/include/asm/irq_vectors.h index b26cb12..dca94f2 100644 --- > > a/arch/x86/include/asm/irq_vectors.h +++ > > b/arch/x86/include/asm/irq_vectors.h @@ -105,6 +105,7 @@ > > /* Vector for KVM to deliver posted interrupt IPI */ #ifdef > > CONFIG_HAVE_KVM #define POSTED_INTR_VECTOR 0xf2 +#define > > POSTED_INTR_WAKEUP_VECTOR 0xf1 #endif > > > > /* > > diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S > > index e61c14a..a598447 100644 --- a/arch/x86/kernel/entry_64.S +++ > > b/arch/x86/kernel/entry_64.S @@ -960,6 +960,8 @@ apicinterrupt > > X86_PLATFORM_IPI_VECTOR \ #ifdef CONFIG_HAVE_KVM > > apicinterrupt3 POSTED_INTR_VECTOR \ > > kvm_posted_intr_ipi smp_kvm_posted_intr_ipi > > +apicinterrupt3 POSTED_INTR_WAKEUP_VECTOR \ > > + kvm_posted_intr_wakeup_ipi smp_kvm_posted_intr_wakeup_ipi > > #endif > > > > #ifdef CONFIG_X86_MCE_THRESHOLD > > diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c index > > 922d285..47408c3 100644 > > --- a/arch/x86/kernel/irq.c > > +++ b/arch/x86/kernel/irq.c > > @@ -237,6 +237,9 @@ __visible void smp_x86_platform_ipi(struct pt_regs > > *regs) } > > > > #ifdef CONFIG_HAVE_KVM > > +void (*wakeup_handler_callback)(void) = NULL; > > +EXPORT_SYMBOL_GPL(wakeup_handler_callback); + > > /* > > * Handler for POSTED_INTERRUPT_VECTOR. > > */ > > @@ -256,6 +259,30 @@ __visible void smp_kvm_posted_intr_ipi(
Re: [Linaro-mm-sig] [PATCH v3 18/19] iommu: exynos: init from dt-specific callback instead of initcall
On Thursday 18 December 2014 22:36:14 Laurent Pinchart wrote: > > > (currently only on ARM64, not ARM32, until someone adds support). I can see > > your point regarding machines that have a mandatory IOMMU with no fallback > > when there is no driver, but we can support them by making the iommu driver > > selected through Kconfig for that platform, while still allowing other > > platforms to work with drivers left out of the kernel. > > The question is how to tell the kernel not to wait for an IOMMU that will > never be there. Would a kernel command line argument be an acceptable > solution > or do we need something more automatic ? I would hope that we can find an automatic solution without relying on the command line. Unfortunately, I also remembered one case in support of what Lucas mentioned and that would break this: There is at least one SoC that uses cache-coherent DMA only when the IOMMU (ARM SMMU IIRC)is enabled, but is non-coherent otherwise. We can't really express that in DT, so we actually do rely on the IOMMU driver to be present on this machine when any "iommus" properties are used, as they would always come in combination with "dma-coherent" properties that are otherwise wrong. We can still enforce this by requiring the smmu driver to be built into the kernel on this platform, but simply saying that the device cannot support DMA as long as there is an iommus property but no driver for it does make a lot of sense. Note that there are more than two ways that the kernel could treat the situation of probing a device with a valid iommus reference but no driver loaded for the iommu: a) assume all iommu drivers are initialized early, so use linear or swiotlb dma_map_ops, and probe the driver normally. This breaks the scenario with conditionally coherent devices, and requires doing the iommu init early b) assume all iommu drivers are initialized early, so disallow any DMA by setting a zero dma_mask but allow the driver to be probed using polling I/O mode (useful for slow devices like UART or SPI) This breaks devices that require DMA but can fall back to linear or swiotlb mappings, and requires doing the iommu init early. c) defer probing until an iommu driver is gets initialized. This breaks both the cases where we could fall back to a sane behavior without iommu, but it would let us use a proper driver with regular power management etc. Arnd ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH 2/2] iommu: rockchip: Handle system-wide and runtime PM
Hi Kevin, On Thursday 18 December 2014 13:14:24 Kevin Hilman wrote: > Laurent Pinchart writes: > > On Thursday 18 December 2014 02:32:30 Rafael J. Wysocki wrote: > >> On Wednesday, December 17, 2014 02:15:31 AM Laurent Pinchart wrote: > >>> On Tuesday 16 December 2014 11:18:33 Tomasz Figa wrote: > On Tue, Dec 16, 2014 at 4:53 AM, Laurent Pinchart wrote: > > On Monday 15 December 2014 11:39:01 Tomasz Figa wrote: > >> On Sat, Dec 13, 2014 at 5:47 AM, Laurent Pinchart wrote: > >>> On Friday 12 December 2014 13:15:51 Tomasz Figa wrote: > On Fri, Dec 12, 2014 at 5:48 AM, Rafael J. Wysocki wrote: > > On Thursday, December 11, 2014 04:51:37 PM Ulf Hansson wrote: > >> On 11 December 2014 at 16:31, Kevin Hilman wrote: [...] > >>> From a high-level, the IOMMU is just another device inside the > >>> PM domain, so ideally it should be doing it's own _get() and > >>> _put() calls so the PM domain code would just do the right > >>> thing without the need for notifiers. > >> > >> As I understand it, the IOMMU (or for other similar cases) > >> shouldn't be doing any get() and put() at all because there are > >> no IO API to serve request from. > >>> > >>> Speaking purely from an IOMMU point of view that's not entirely > >>> tree. IOMMU drivers expose map and unmap operations, so they can > >>> track whether any memory is mapped. From a bus master point of view > >>> the IOMMU map and unmap operations are hidden by the DMA mapping > >>> API. The IOMMU can thus track the existence of mappings without any > >>> IOMMU awareness in the bus master driver. > >>> > >>> If no mapping exist the IOMMU shouldn't receive any translation > >>> request. An IOMMU driver could thus call pm_runtime_get_sync() in > >>> the map handler when creating the first mapping, and > >>> pm_runtime_put() in the unmap handler when tearing the last mapping > >>> down. > >>> > >>> One could argue that the IOMMU would end up being powered more > >>> often than strictly needed, as bus masters drivers, even when > >>> written properly, could keep mappings around at times they don't > >>> perform bus access. This is true, and that's an argument I've raised > >>> during the last kernel summit. The general response (including Linus > >>> Torvald's) was that micro-optimizing power management might not be > >>> worth it, and that measurements proving that the gain is worth it > >>> are required before introducing new APIs to solve the problem. I > >>> can't disagree with that argument. > >> > >> This would be a micro optimization if the IOMMU was located in its > >> own power domain. Unfortunately in most cases it is not, so keeping > >> all the devices in the domain powered on, because one of them have a > >> mapping created doesn't sound like a good idea. > >> > >> Moreover, most of the drivers will keep the mapping for much longer > >> than one run cycle. Please take a look at V4L2's videobuf2 subsystem > >> (which I guess you are more familiar with than me;)), which will > >> keep MMAP buffers mapped in IOMMU address space for their whole > >> lifetime. I believe similar is the case for DRM drivers. > > > > Yes, but that doesn't mean it gets out of control. Buffers shouldn't > > be allocated if they won't be used. Granted, they could be > > preallocated (or rather premapped) slightly before being used, but in > > sane use cases that shouldn't be long before the hardware needs to be > > turned on. > > Assuming that we don't have a third party, called "user", involved. > >>> > >>> Who needs that ? :-D > >>> > A simple use case is video playback pause. Usually all the software > state (including output buffers that can be used as reference for > decoding next frames) needs to be preserved to continue decoding after > resume, but it would be nice to power off the decoder, if it is unused > for some period. In addition, we would like the pause/resume operation > to be fast, so unmapping/freeing buffers and then exactly the opposite > on resume doesn't sound like a good idea. > >>> > >>> OK, then we have one possible use case. I expect people to still want > >>> to see power consumption numbers though. > >> > >> Well, we have them, kind of. > >> > >> In the ACPI world there's something called _DEP which gives us a list of > >> devices depended on by the given one. Those may be devices whose drivers > >> provide so called "operation region" handling which means that an ACPI > >> method executed for the dependent device may access a device it depends > >> on indirectly. Because of that indirection we basically need the devices > >> listed by _DEP to be "on" whenever the dependent device is "on" or things > >> may break in nasty ways otherwise
Re: [RFC PATCH 2/2] iommu: rockchip: Handle system-wide and runtime PM
Laurent Pinchart writes: > Hi Rafael, > > On Thursday 18 December 2014 02:32:30 Rafael J. Wysocki wrote: >> On Wednesday, December 17, 2014 02:15:31 AM Laurent Pinchart wrote: >> > On Tuesday 16 December 2014 11:18:33 Tomasz Figa wrote: >> >> On Tue, Dec 16, 2014 at 4:53 AM, Laurent Pinchart wrote: >> >>> On Monday 15 December 2014 11:39:01 Tomasz Figa wrote: >> On Sat, Dec 13, 2014 at 5:47 AM, Laurent Pinchart wrote: >> > On Friday 12 December 2014 13:15:51 Tomasz Figa wrote: >> >> On Fri, Dec 12, 2014 at 5:48 AM, Rafael J. Wysocki wrote: >> >>> On Thursday, December 11, 2014 04:51:37 PM Ulf Hansson wrote: >> On 11 December 2014 at 16:31, Kevin Hilman wrote: >> > Tomasz Figa writes: >> >> On Thu, Dec 11, 2014 at 8:58 PM, Ulf Hansson wrote: >> > [...] >> > >> @@ -988,11 +1107,28 @@ static int rk_iommu_probe(struct >> platform_device *pdev) >> return -ENXIO; >> } >> >> + pm_runtime_no_callbacks(dev); >> + pm_runtime_enable(dev); >> + >> + /* Synchronize state of the domain with driver data. >> */ >> + pm_runtime_get_sync(dev); >> + iommu->is_powered = true; >> >>> >> >>> Doesn't the runtime PM status reflect the value of >> >>> "is_powered", thus why do you need to have a copy of it? Could >> >>> it perpahps be that you try to cope with the case when >> >>> CONFIG_PM is unset? >> >> >> >> It's worth noting that this driver fully relies on status of >> >> other devices in the power domain the IOMMU is in and does not >> >> enforce the status on its own. So in general, as far as my >> >> understanding of PM runtime subsystem, the status of the IOMMU >> >> device will be always suspended, because nobody will call >> >> pm_runtime_get() on it (except the get and put pair in probe). >> >> So is_powered is here to track status of the domain, not the >> >> device. Feel free to suggest a better way, though. >> > >> > I still don't like these notifiers. I think they add ways to >> > bypass having proper runtime PM implemented for >> > devices/subsystems. >> >> I do agree, but I haven't found another good solution to the >> problem. >> >>> >> >>> For the record, I'm not liking this mostly because it "fixes" a >> >>> generic problem in a way that's hidden in the genpd code and very >> >>> indirect. >> >> >> >> Well, that's true. This is indeed a generic problem of PM >> >> dependencies between devices (other than those represented by >> >> parent-child relation), which in fact doesn't have much to do with >> >> genpd, but rather with those devices directly. It is just that >> >> genpd is the most convenient location to solve this in current code >> >> and in a simple way. In other words, I see this solution as a >> >> reasonable way to get the problem solved quickly for now, so that >> >> we can start thinking about a more elegant solution. >> >> >> > From a high-level, the IOMMU is just another device inside the >> > PM domain, so ideally it should be doing it's own _get() and >> > _put() calls so the PM domain code would just do the right thing >> > without the need for notifiers. >> >> As I understand it, the IOMMU (or for other similar cases) >> shouldn't be doing any get() and put() at all because there are >> no IO API to serve request from. >> > >> > Speaking purely from an IOMMU point of view that's not entirely >> > tree. IOMMU drivers expose map and unmap operations, so they can >> > track whether any memory is mapped. From a bus master point of view >> > the IOMMU map and unmap operations are hidden by the DMA mapping >> > API. The IOMMU can thus track the existence of mappings without any >> > IOMMU awareness in the bus master driver. >> > >> > If no mapping exist the IOMMU shouldn't receive any translation >> > request. An IOMMU driver could thus call pm_runtime_get_sync() in >> > the map handler when creating the first mapping, and >> > pm_runtime_put() in the unmap handler when tearing the last mapping >> > down. >> > >> > One could argue that the IOMMU would end up being powered more often >> > than strictly needed, as bus masters drivers, even when written >> > properly, could keep mappings around at times they don't perform bus >> > access. This is true, and that's an argument I've raised during the >> > last kernel summit. The general response (including Linus Torvald's) >> > was that micro-optimizing power management might not be worth it, >> > an
Re: [Linaro-mm-sig] [PATCH v3 18/19] iommu: exynos: init from dt-specific callback instead of initcall
Hi Arnd, On Wednesday 17 December 2014 16:56:53 Arnd Bergmann wrote: > On Wednesday 17 December 2014 15:53:25 Lucas Stach wrote: > > Am Mittwoch, den 17.12.2014, 15:27 +0100 schrieb Arnd Bergmann: > >> On Wednesday 17 December 2014 01:24:42 Laurent Pinchart wrote: > >>> If we forbid the IOMMU driver from being compiled as a module can't we > >>> just rely on deferred probing ? The bus master driver will just be > >>> reprobed after the IOMMU gets probed, like for other devices. > >>> > >>> This could fail in case the IOMMU device permanently fails to probe. > >>> There would be something very wrong in the system in that case, > >>> Enabling the bus masters totally transparently without IOMMU support > >>> could not be such a good idea. > >> > >> I believe in the majority of cases today, the IOMMU is entirely > >> optional. There are valid reasons for not including the IOMMU driver in > >> the kernel, e.g. when you know that all the machines with that driver > >> can DMA to all of their RAM and you want to avoid the overhead of IOTLB > >> misses. > > > > This is similar to the problems we discussed with the componentized > > device framework and in the end everybody agreed on a simple rule: > > if the device node is enabled in the DT there must be a driver bound to > > it before other devices depending on this node can be probed. > > > > If we apply the same logic to the IOMMU situation we get two > > possibilities to allow bypassing the IOMMU: > > 1. completely disable the IOMMU node in the DT > > 2. leave node enabled in DT, but add a bypass property > > > > Obviously the second option still requires to have the IOMMU driver > > available, but is more flexible. Right now everything depends on the > > IOMMU being in passthrough mode at reset with no driver bound. If a > > device comes around where this isn't the case thing will break with the > > current assumptions or the first option. > > > > Having the IOMMU node enabled in DT with no driver available to the > > kernel seems like an invalid configuration which should be expected to > > break. Exactly the same thing as with componentized devices... > > One differences here is that for the IOMMU, we should generally be able > to fall back to swiotlb Or to nothing at all if the device can DMA to the allocated memory directly. > (currently only on ARM64, not ARM32, until someone adds support). I can see > your point regarding machines that have a mandatory IOMMU with no fallback > when there is no driver, but we can support them by making the iommu driver > selected through Kconfig for that platform, while still allowing other > platforms to work with drivers left out of the kernel. The question is how to tell the kernel not to wait for an IOMMU that will never be there. Would a kernel command line argument be an acceptable solution or do we need something more automatic ? -- Regards, Laurent Pinchart ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH 2/2] iommu: rockchip: Handle system-wide and runtime PM
Hi Rafael, On Thursday 18 December 2014 02:32:30 Rafael J. Wysocki wrote: > On Wednesday, December 17, 2014 02:15:31 AM Laurent Pinchart wrote: > > On Tuesday 16 December 2014 11:18:33 Tomasz Figa wrote: > >> On Tue, Dec 16, 2014 at 4:53 AM, Laurent Pinchart wrote: > >>> On Monday 15 December 2014 11:39:01 Tomasz Figa wrote: > On Sat, Dec 13, 2014 at 5:47 AM, Laurent Pinchart wrote: > > On Friday 12 December 2014 13:15:51 Tomasz Figa wrote: > >> On Fri, Dec 12, 2014 at 5:48 AM, Rafael J. Wysocki wrote: > >>> On Thursday, December 11, 2014 04:51:37 PM Ulf Hansson wrote: > On 11 December 2014 at 16:31, Kevin Hilman wrote: > > Tomasz Figa writes: > >> On Thu, Dec 11, 2014 at 8:58 PM, Ulf Hansson wrote: > > [...] > > > @@ -988,11 +1107,28 @@ static int rk_iommu_probe(struct > platform_device *pdev) > return -ENXIO; > } > > + pm_runtime_no_callbacks(dev); > + pm_runtime_enable(dev); > + > + /* Synchronize state of the domain with driver data. > */ > + pm_runtime_get_sync(dev); > + iommu->is_powered = true; > >>> > >>> Doesn't the runtime PM status reflect the value of > >>> "is_powered", thus why do you need to have a copy of it? Could > >>> it perpahps be that you try to cope with the case when > >>> CONFIG_PM is unset? > >> > >> It's worth noting that this driver fully relies on status of > >> other devices in the power domain the IOMMU is in and does not > >> enforce the status on its own. So in general, as far as my > >> understanding of PM runtime subsystem, the status of the IOMMU > >> device will be always suspended, because nobody will call > >> pm_runtime_get() on it (except the get and put pair in probe). > >> So is_powered is here to track status of the domain, not the > >> device. Feel free to suggest a better way, though. > > > > I still don't like these notifiers. I think they add ways to > > bypass having proper runtime PM implemented for > > devices/subsystems. > > I do agree, but I haven't found another good solution to the > problem. > >>> > >>> For the record, I'm not liking this mostly because it "fixes" a > >>> generic problem in a way that's hidden in the genpd code and very > >>> indirect. > >> > >> Well, that's true. This is indeed a generic problem of PM > >> dependencies between devices (other than those represented by > >> parent-child relation), which in fact doesn't have much to do with > >> genpd, but rather with those devices directly. It is just that > >> genpd is the most convenient location to solve this in current code > >> and in a simple way. In other words, I see this solution as a > >> reasonable way to get the problem solved quickly for now, so that > >> we can start thinking about a more elegant solution. > >> > > From a high-level, the IOMMU is just another device inside the > > PM domain, so ideally it should be doing it's own _get() and > > _put() calls so the PM domain code would just do the right thing > > without the need for notifiers. > > As I understand it, the IOMMU (or for other similar cases) > shouldn't be doing any get() and put() at all because there are > no IO API to serve request from. > > > > Speaking purely from an IOMMU point of view that's not entirely > > tree. IOMMU drivers expose map and unmap operations, so they can > > track whether any memory is mapped. From a bus master point of view > > the IOMMU map and unmap operations are hidden by the DMA mapping > > API. The IOMMU can thus track the existence of mappings without any > > IOMMU awareness in the bus master driver. > > > > If no mapping exist the IOMMU shouldn't receive any translation > > request. An IOMMU driver could thus call pm_runtime_get_sync() in > > the map handler when creating the first mapping, and > > pm_runtime_put() in the unmap handler when tearing the last mapping > > down. > > > > One could argue that the IOMMU would end up being powered more often > > than strictly needed, as bus masters drivers, even when written > > properly, could keep mappings around at times they don't perform bus > > access. This is true, and that's an argument I've raised during the > > last kernel summit. The general response (including Linus Torvald's) > > was that micro-optimizing power management might not be worth it, > > and that measurements proving that the gain is worth it are required > > before introducing new APIs to solve the probl
Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
On 18/12/2014 15:49, Zhang, Yang Z wrote: >>> Here, we introduce a similar way with 'apic_arb_prio' to handle >>> guest lowest priority interrtups when VT-d PI is used. Here is >>> the ideas: - Each vCPU has a counter 'round_robin_counter'. - >>> When guests sets an interrupts to lowest priority, we choose the >>> vCPU with smallest 'round_robin_counter' as the destination, then >>> increase it. > > How this can work well? All subsequent interrupts are delivered to > one vCPU? It shouldn't be the best solution, need more consideration. Well, it's a hardware limitation. The alternative (which is easy to implement) is to only do PI for single-CPU interrupts. This should work well for multiqueue NICs (and of course for UP guests :)), so perhaps it's a good idea to only support that as a first attempt. Paolo > Also, I think you should take the apic_arb_prio into consider since > the priority is for the whole vCPU not for one interrupt. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [v3 12/26] KVM: Initialize VT-d Posted-Interrupts Descriptor
Feng Wu wrote on 2014-12-12: > This patch initializes the VT-d Posted-Interrupts Descriptor. > > Signed-off-by: Feng Wu > --- > arch/x86/kvm/vmx.c | 27 +++ > 1 file changed, 27 insertions(+) > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index > 0b1383e..66ca275 100644 --- a/arch/x86/kvm/vmx.c +++ > b/arch/x86/kvm/vmx.c @@ -45,6 +45,7 @@ > #include > #include > #include > +#include > > #include "trace.h" > @@ -4433,6 +4434,30 @@ static void ept_set_mmio_spte_mask(void) > kvm_mmu_set_mmio_spte_mask((0x3ull << 62) | 0x6ull); } > +static void pi_desc_init(struct vcpu_vmx *vmx) { > + unsigned int dest; > + > + if (!irq_remapping_cap(IRQ_POSTING_CAP)) > + return; > + > + /* > + * Initialize Posted-Interrupt Descriptor > + */ > + > + pi_clear_sn(&vmx->pi_desc); > + vmx->pi_desc.nv = POSTED_INTR_VECTOR; Here. > + > + /* Physical mode for Notificaiton Event */ > + vmx->pi_desc.ndm = 0; And from here.. > + dest = cpu_physical_id(vmx->vcpu.cpu); > + > + if (x2apic_enabled()) > + vmx->pi_desc.ndst = dest; > + else > + vmx->pi_desc.ndst = (dest << 8) & 0xFF00; } > + ..to here are useless. The right place to update PI descriptor is where vcpu got loaded not in initialization. > /* > * Sets up the vmcs for emulated real mode. > */ > @@ -4476,6 +4501,8 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx) > > vmcs_write64(POSTED_INTR_NV, POSTED_INTR_VECTOR); > vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((&vmx->pi_desc))); > + > + pi_desc_init(vmx); > } > > if (ple_gap) { Best regards, Yang ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is set
Paolo Bonzini wrote on 2014-12-18: > > > On 18/12/2014 04:14, Wu, Feng wrote: >> >> >> linux-kernel-ow...@vger.kernel.org wrote on >> mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Paolo: >>> x...@kernel.org; Gleb Natapov; Paolo Bonzini; dw...@infradead.org; >>> joro-zlv9swrftaidnm+yrof...@public.gmane.org; Alex Williamson; >>> joro-zLv9SwRftAIdnm+Jiang >>> Liu >>> Cc: iommu@lists.linux-foundation.org; >>> linux-kernel-u79uwxl29ty76z2rm5m...@public.gmane.org; KVM list; >>> Eric Auger >>> Subject: Re: [v3 25/26] KVM: Suppress posted-interrupt when 'SN' is >>> set >>> >>> >>> >>> On 12/12/2014 16:14, Feng Wu wrote: Currently, we don't support urgent interrupt, all interrupts are recognized as non-urgent interrupt, so we cannot send posted-interrupt when 'SN' is set. >>> >>> Can this happen? If the vcpu is in guest mode, it cannot have been >>> scheduled out, and that's the only case when SN is set. >>> >>> Paolo >> >> Currently, the only place where SN is set is vCPU is preempted and If the vCPU is preempted, shouldn't the subsequent be ignored? What happens if a PI is occurs when vCPU is preempted? >> waiting for the next scheduling in the runqueue. But I am not sure >> whether we need to set SN for other purpose in future. Adding SN >> checking here is just to follow the Spec. non-urgent interrupts are >> suppressed > when SN is set. > > I would change that to a WARN_ON_ONCE then. Best regards, Yang ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [v3 21/26] x86, irq: Define a global vector for VT-d Posted-Interrupts
Feng Wu wrote on 2014-12-12: > Currently, we use a global vector as the Posted-Interrupts > Notification Event for all the vCPUs in the system. We need to > introduce another global vector for VT-d Posted-Interrtups, which will > be used to wakeup the sleep vCPU when an external interrupt from a > direct-assigned device happens for that vCPU. > Hi Feng, Since the idea of two global vectors mechanism is from me, please add me to the comments. > Signed-off-by: Feng Wu > --- > arch/x86/include/asm/entry_arch.h | 2 ++ > arch/x86/include/asm/hardirq.h | 1 + > arch/x86/include/asm/hw_irq.h | 2 ++ > arch/x86/include/asm/irq_vectors.h | 1 + > arch/x86/kernel/entry_64.S | 2 ++ > arch/x86/kernel/irq.c | 27 +++ > arch/x86/kernel/irqinit.c | 2 ++ > 7 files changed, 37 insertions(+) > diff --git a/arch/x86/include/asm/entry_arch.h > b/arch/x86/include/asm/entry_arch.h index dc5fa66..27ca0af 100644 --- > a/arch/x86/include/asm/entry_arch.h +++ > b/arch/x86/include/asm/entry_arch.h @@ -23,6 +23,8 @@ > BUILD_INTERRUPT(x86_platform_ipi, X86_PLATFORM_IPI_VECTOR) #ifdef > CONFIG_HAVE_KVM BUILD_INTERRUPT3(kvm_posted_intr_ipi, POSTED_INTR_VECTOR, >smp_kvm_posted_intr_ipi) > +BUILD_INTERRUPT3(kvm_posted_intr_wakeup_ipi, POSTED_INTR_WAKEUP_VECTOR, > + smp_kvm_posted_intr_wakeup_ipi) > #endif > > /* > diff --git a/arch/x86/include/asm/hardirq.h > b/arch/x86/include/asm/hardirq.h index 0f5fb6b..9866065 100644 > --- a/arch/x86/include/asm/hardirq.h > +++ b/arch/x86/include/asm/hardirq.h > @@ -14,6 +14,7 @@ typedef struct { > #endif #ifdef CONFIG_HAVE_KVMunsigned int kvm_posted_intr_ipis; > +unsigned int kvm_posted_intr_wakeup_ipis; #endifunsigned int > x86_platform_ipis; /* arch dependent */unsigned int apic_perf_irqs; > diff --git a/arch/x86/include/asm/hw_irq.h > b/arch/x86/include/asm/hw_irq.h index e7ae6eb..38fac9b 100644 > --- a/arch/x86/include/asm/hw_irq.h > +++ b/arch/x86/include/asm/hw_irq.h > @@ -29,6 +29,7 @@ > extern asmlinkage void apic_timer_interrupt(void); extern asmlinkage > void x86_platform_ipi(void); extern asmlinkage void > kvm_posted_intr_ipi(void); +extern asmlinkage void > kvm_posted_intr_wakeup_ipi(void); > extern asmlinkage void error_interrupt(void); extern asmlinkage void > irq_work_interrupt(void); > > @@ -92,6 +93,7 @@ extern void > trace_call_function_single_interrupt(void); > #define trace_irq_move_cleanup_interrupt irq_move_cleanup_interrupt > #define trace_reboot_interrupt reboot_interrupt #define > trace_kvm_posted_intr_ipi kvm_posted_intr_ipi > +#define trace_kvm_posted_intr_wakeup_ipi kvm_posted_intr_wakeup_ipi > #endif /* CONFIG_TRACING */ > > struct irq_domain; > diff --git a/arch/x86/include/asm/irq_vectors.h > b/arch/x86/include/asm/irq_vectors.h index b26cb12..dca94f2 100644 --- > a/arch/x86/include/asm/irq_vectors.h +++ > b/arch/x86/include/asm/irq_vectors.h @@ -105,6 +105,7 @@ > /* Vector for KVM to deliver posted interrupt IPI */ #ifdef > CONFIG_HAVE_KVM #define POSTED_INTR_VECTOR 0xf2 +#define > POSTED_INTR_WAKEUP_VECTOR0xf1 #endif > > /* > diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S > index e61c14a..a598447 100644 --- a/arch/x86/kernel/entry_64.S +++ > b/arch/x86/kernel/entry_64.S @@ -960,6 +960,8 @@ apicinterrupt > X86_PLATFORM_IPI_VECTOR \ #ifdef CONFIG_HAVE_KVM > apicinterrupt3 POSTED_INTR_VECTOR \ > kvm_posted_intr_ipi smp_kvm_posted_intr_ipi > +apicinterrupt3 POSTED_INTR_WAKEUP_VECTOR \ > + kvm_posted_intr_wakeup_ipi smp_kvm_posted_intr_wakeup_ipi > #endif > > #ifdef CONFIG_X86_MCE_THRESHOLD > diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c index > 922d285..47408c3 100644 > --- a/arch/x86/kernel/irq.c > +++ b/arch/x86/kernel/irq.c > @@ -237,6 +237,9 @@ __visible void smp_x86_platform_ipi(struct pt_regs > *regs) } > > #ifdef CONFIG_HAVE_KVM > +void (*wakeup_handler_callback)(void) = NULL; > +EXPORT_SYMBOL_GPL(wakeup_handler_callback); + > /* > * Handler for POSTED_INTERRUPT_VECTOR. > */ > @@ -256,6 +259,30 @@ __visible void smp_kvm_posted_intr_ipi(struct > pt_regs *regs) > > set_irq_regs(old_regs); > } > + > +/* > + * Handler for POSTED_INTERRUPT_WAKEUP_VECTOR. > + */ > +__visible void smp_kvm_posted_intr_wakeup_ipi(struct pt_regs *regs) { > + struct pt_regs *old_regs = set_irq_regs(regs); > + > + ack_APIC_irq(); > + > + irq_enter(); > + > + exit_idle(); > + > + inc_irq_stat(kvm_posted_intr_wakeup_ipis); > + > + if (wakeup_handler_callback) > + wakeup_handler_callback(); > + > + irq_exit(); > + > + set_irq_regs(old_regs); > +} > + > #endif > > __visible void smp_trace_x86_platform_ipi(struct pt_regs *regs) diff > --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c index > 70e181e..844673c 100644 --- a/arch/x86/kernel/irqinit.c +++ > b/arch/x86/kernel/irqinit.c @@ -144,6 +144,8
RE: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
Feng Wu wrote on 2014-12-12: > This patch defines a new interface kvm_find_dest_vcpu for > VT-d PI, which can returns the destination vCPU of the > interrupt for guests. > > Since VT-d PI cannot handle broadcast/multicast interrupt, > Here we only handle Fixed and Lowest priority interrupts. > > The current method of handling guest lowest priority interrtups > is to use a counter 'apic_arb_prio' for each vCPU, we choose the > vCPU with smallest 'apic_arb_prio' and then increase it by 1. > However, for VT-d PI, we cannot re-use this, since we no longer > have control to 'apic_arb_prio' with posted interrupt direct > delivery by Hardware. > > Here, we introduce a similar way with 'apic_arb_prio' to handle guest > lowest priority interrtups when VT-d PI is used. Here is the ideas: - > Each vCPU has a counter 'round_robin_counter'. - When guests sets an > interrupts to lowest priority, we choose the vCPU with smallest > 'round_robin_counter' as the destination, then increase it. How this can work well? All subsequent interrupts are delivered to one vCPU? It shouldn't be the best solution, need more consideration. Also, I think you should take the apic_arb_prio into consider since the priority is for the whole vCPU not for one interrupt. Best regards, Yang ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [v3 06/26] iommu, x86: No need to migrating irq for VT-d Posted-Interrupts
Feng Wu wrote on 2014-12-12: > We don't need to migrate the irqs for VT-d Posted-Interrupts here. > When 'pst' is set in IRTE, the associated irq will be posted to guests > instead of interrupt remapping. The destination of the interrupt is > set in Posted-Interrupts Descriptor, and the migration happens during > vCPU scheduling. > > However, we still update the cached irte here, which can be used when > changing back to remapping mode. > > Signed-off-by: Feng Wu > Reviewed-by: Jiang Liu > --- > drivers/iommu/intel_irq_remapping.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > diff --git a/drivers/iommu/intel_irq_remapping.c > b/drivers/iommu/intel_irq_remapping.c index 48c2051..ab9057a 100644 --- > a/drivers/iommu/intel_irq_remapping.c +++ > b/drivers/iommu/intel_irq_remapping.c @@ -977,6 +977,7 @@ > intel_ir_set_affinity(struct irq_data *data, const struct cpumask *mask, > { > struct intel_ir_data *ir_data = data->chip_data;struct irte > *irte = > &ir_data->irte_entry; + struct irte_pi *irte_pi = (struct irte_pi > *)irte; struct irq_cfg *cfg = irqd_cfg(data); struct irq_data *parent > = data->parent_data; int ret; > @@ -991,7 +992,10 @@ intel_ir_set_affinity(struct irq_data *data, > const struct cpumask *mask, >*/ > irte->vector = cfg->vector; > irte->dest_id = IRTE_DEST(cfg->dest_apicid); > - modify_irte(&ir_data->irq_2_iommu, irte); > + > + /* We don't need to modify irte if the interrupt is for posting. */ > + if (irte_pi->pst != 1) > + modify_irte(&ir_data->irq_2_iommu, irte); What happens if user changes the IRQ affinity manually? > > /* >* After this point, all the interrupts will start arriving Best regards, Yang ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu