Re: [Xen-devel] [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue

2016-09-23 Thread Xuquan (Euler)
On September 24, 2016 7:34 AM, Tian Kevin < kevin.t...@intel.com > wrote:
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Friday, September 23, 2016 11:34 PM
>>
>> >>> On 20.09.16 at 15:30,  wrote:
>> > --- a/xen/arch/x86/hvm/vlapic.c
>> > +++ b/xen/arch/x86/hvm/vlapic.c
>> > @@ -433,6 +433,12 @@ void vlapic_EOI_set(struct vlapic *vlapic)
>> > void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)  {
>> >  struct domain *d = vlapic_domain(vlapic);
>> > +struct vcpu *v = vlapic_vcpu(vlapic);
>> > +struct hvm_intack pt_intack;
>> > +
>> > +pt_intack.vector = vector;
>> > +pt_intack.source = hvm_intsrc_lapic;
>> > +pt_intr_post(v, pt_intack);
>>
>> This also sits on the EOI LAPIC register write path, i.e. the change
>> then also affects non-apicv environments.
>
>The new logic should be entered only when EOI-induced exit happens.
>

Yes, more that the EOI-induced exit is conditional, specifically, the bitmap is 
set by vmx_set_eoi_exit_bitmap().
Jan, what do you think? While I recall from v1 discussion, you have the same 
comment. We can dig it deep..

>>
>> Furthermore - don't we have the same problem as with v1 again then?
>> What prevents multiple EOIs to come here before the timer interrupt
>> actually gets handled? You'd then clear ->irq_issued each time,
>> rendering your change to pt_update_irq() ineffective.
>
>based on this patch, one irq_issued should cause only one EOI on related pt
>vector and this EOI exit clears irq_issued then next EOI would be seen only 
>upon
>another injection when irq_issued is set again... However there might be an
>issue if this pt vector is shared with another device interrupt, which 
>although is
>not a typical case...
>
Agreed.

>>
>> In any event please use hvm_intack_lapic() instead of open coding it
>> (and then I don't think you need a local variable at all).
>>

Indeed.

>> > --- a/xen/arch/x86/hvm/vmx/intr.c
>> > +++ b/xen/arch/x86/hvm/vmx/intr.c
>> > @@ -333,8 +333,6 @@ void vmx_intr_assist(void)
>
>> >  clear_bit(i, >arch.hvm_vmx.eoi_exitmap_changed);
>> >  __vmwrite(EOI_EXIT_BITMAP(i),
>v->arch.hvm_vmx.eoi_exit_bitmap[i]);
>> >  }
>> > -
>> > -pt_intr_post(v, intack);
>> >  }
>>
>> I'll defer to the VMX maintainers to determine whether removing this
>> but not the other one is correct.
>
>This is correct. The other one is for non-apicv scenario.
>
>>
>> > --- a/xen/arch/x86/hvm/vpt.c
>> > +++ b/xen/arch/x86/hvm/vpt.c
>> > @@ -252,7 +252,8 @@ int pt_update_irq(struct vcpu *v)
>> >  }
>> >  else
>> >  {
>> > -if ( (pt->last_plt_gtime + pt->period) < max_lag )
>> > +if ( (pt->last_plt_gtime + pt->period) < max_lag &&
>> > + !pt->irq_issued )
>> >  {
>> >  max_lag = pt->last_plt_gtime + pt->period;
>> >  earliest_pt = pt;
>>
>> Looking at the rest of the code I really wonder why this check hadn't
>> been there from the beginning. Tim, do recall whether this was
>> intentional (as opposed to simply having been an oversight)?
>>
>
>Possibly simplify the emulation by relying on IRR/ISR to handling pending
>situation?

I think IRR is enough. But there is a challenge that the pt interrupt is 
__not_only__ handled by LAPIC ( see the bottom of pt_update_irq() )..

Quan






___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [ vTPM ] ownership disappear

2016-09-22 Thread Xuquan (Euler)
On September 20, 2016 9:14,
> Hi everyone,
> when I create a vTPM instance and attached to it a VM, I use tpm-tools to 
> take thw ownership of the vTPM. Then,
> if I destroy the vTPM and the VM, my expectation is that when I re-create the 
> same vTPM attached to the same VM,
> the owership is already taken, but this is not the case: if I re-run 
> tpm_takeownership, the command is successfull.
> Why this happens?

Ernesto,

I think the emulated TPM is always started with 'ST_CLEAR' status..

in vtpm.c:
|main()
|-main_loop() -- tpm_emulator_init(opt_args.startup ...)


in tpm_emulator_init(), the opt_args.startup is with ''ST_CLEAR''


struct Opt_args opt_args = {
   .startup = ST_CLEAR,
   ...
};


btw, in my experience, when tpm_takeownership is successfull and you re-run it 
again, it also print out
'''success'''


cc:
Daniel De Graaf 
Emil Condrea 



Huawei 2012 lab
Quan Xu


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue

2016-09-20 Thread Xuquan (Euler)
>From 97760602b5c94745e76ed78d23e8fdf9988d234e Mon Sep 17 00:00:00 2001
From: Quan Xu 
Date: Tue, 20 Sep 2016 21:12:54 +0800
Subject: [PATCH v2] x86/apicv: fix RTC periodic timer and apicv issue

When Xen apicv is enabled, wall clock time is faster on Windows7-32
guest with high payload (with 2vCPU, captured from xentrace, in
high payload, the count of IPI interrupt increases rapidly between
these vCPUs).

If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1)
are both pending (index of bit set in vIRR), unfortunately, the IPI
intrrupt is high priority than periodic timer interrupt. Xen updates
IPI interrupt bit set in vIRR to guest interrupt status (RVI) as a high
priority and apicv (Virtual-Interrupt Delivery) delivers IPI interrupt
within VMX non-root operation without a VM-Exit. Within VMX non-root
operation, if periodic timer interrupt index of bit is set in vIRR and
highest, the apicv delivers periodic timer interrupt within VMX non-root
operation as well.

But in current code, if Xen doesn't update periodic timer interrupt bit
set in vIRR to guest interrupt status (RVI) directly, Xen is not aware
of this case to decrease the count (pending_intr_nr) of pending periodic
timer interrupt, then Xen will deliver a periodic timer interrupt again.

And that we update periodic timer interrupt in every VM-entry, there is
a chance that already-injected instance (before EOI-induced exit happens)
will incur another pending IRR setting if there is a VM-exit happens
between virtual interrupt injection (vIRR->0, vISR->1) and EOI-induced
exit (vISR->0), since pt_intr_post hasn't been invoked yet, then the
guest receives more periodic timer interrupt.

So change to pt_intr_post in EOI-induced exit handler and skip periodic
timer when it is not be completely consumed (irq_issued is ture).

Signed-off-by: Yifei Jiang 
Signed-off-by: Rongguang He 
Signed-off-by: Quan Xu 

---
v2:
  -change to pt_intr_post in EOI-induced exit handler.
  -skip periodic timer when it is not be completely consumed
   (irq_issued is ture).
---
 xen/arch/x86/hvm/vlapic.c   | 6 ++
 xen/arch/x86/hvm/vmx/intr.c | 2 --
 xen/arch/x86/hvm/vpt.c  | 3 ++-
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 1d5d287..f83d6ab 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -433,6 +433,12 @@ void vlapic_EOI_set(struct vlapic *vlapic)
 void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)
 {
 struct domain *d = vlapic_domain(vlapic);
+struct vcpu *v = vlapic_vcpu(vlapic);
+struct hvm_intack pt_intack;
+
+pt_intack.vector = vector;
+pt_intack.source = hvm_intsrc_lapic;
+pt_intr_post(v, pt_intack);

 if ( vlapic_test_and_clear_vector(vector, >regs->data[APIC_TMR]) )
 vioapic_update_EOI(d, vector);
diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
index 8fca08c..29d9bbf 100644
--- a/xen/arch/x86/hvm/vmx/intr.c
+++ b/xen/arch/x86/hvm/vmx/intr.c
@@ -333,8 +333,6 @@ void vmx_intr_assist(void)
 clear_bit(i, >arch.hvm_vmx.eoi_exitmap_changed);
 __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]);
 }
-
-pt_intr_post(v, intack);
 }
 else
 {
diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
index 5c48fdb..a9da436 100644
--- a/xen/arch/x86/hvm/vpt.c
+++ b/xen/arch/x86/hvm/vpt.c
@@ -252,7 +252,8 @@ int pt_update_irq(struct vcpu *v)
 }
 else
 {
-if ( (pt->last_plt_gtime + pt->period) < max_lag )
+if ( (pt->last_plt_gtime + pt->period) < max_lag &&
+ !pt->irq_issued )
 {
 max_lag = pt->last_plt_gtime + pt->period;
 earliest_pt = pt;
--
1.8.3.4



0001-x86-apicv-fix-RTC-periodic-timer-and-apicv-issue.patch
Description: 0001-x86-apicv-fix-RTC-periodic-timer-and-apicv-issue.patch
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue

2016-09-19 Thread Xuquan (Euler)
On September 19, 2016 5:25 PM, Tian Kevin <kevin.t...@intel.com> wrote:
>> From: Xuquan (Euler) [mailto:xuqu...@huawei.com]
>> Sent: Monday, September 12, 2016 5:08 PM
>>
>> On September 12, 2016 3:58 PM, Tian, Kevin <kevin.t...@intel.com> wrote:
>> >> From: Xuquan (Euler) [mailto:xuqu...@huawei.com]
>> >> Sent: Friday, September 09, 2016 11:02 AM
>> >>
>> >> On August 30, 2016 1:58 PM, Tian Kevin < kevin.t...@intel.com > wrote:
>> >> >> From: Xuquan (Euler) [mailto:xuqu...@huawei.com]
>> >> >> Sent: Friday, August 19, 2016 8:59 PM
>> >> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
>> >> index 1d5d287..cc247c3 100644
>> >> --- a/xen/arch/x86/hvm/vlapic.c
>> >> +++ b/xen/arch/x86/hvm/vlapic.c
>> >> @@ -433,6 +433,11 @@ void vlapic_EOI_set(struct vlapic *vlapic)
>> >> void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)  {
>> >>  struct domain *d = vlapic_domain(vlapic);
>> >> +struct hvm_intack pt_intack;
>> >> +
>> >> +pt_intack.vector = vector;
>> >> +pt_intack.source = hvm_intsrc_lapic;
>> >> +pt_intr_post(vlapic_vcpu(vlapic), pt_intack);
>> >>
>> >>  if ( vlapic_test_and_clear_vector(vector,
>> >>regs->data[APIC_TMR]) )
>> >>  vioapic_update_EOI(d, vector); diff --git
>> >> a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c index
>> >> 8fca08c..29d9bbf 100644
>> >> --- a/xen/arch/x86/hvm/vmx/intr.c
>> >> +++ b/xen/arch/x86/hvm/vmx/intr.c
>> >> @@ -333,8 +333,6 @@ void vmx_intr_assist(void)
>> >>  clear_bit(i, >arch.hvm_vmx.eoi_exitmap_changed);
>> >>  __vmwrite(EOI_EXIT_BITMAP(i),
>> >v->arch.hvm_vmx.eoi_exit_bitmap[i]);
>> >>  }
>> >> -
>> >> -pt_intr_post(v, intack);
>> >>  }
>> >>  else
>> >>  {
>> >>
>> >
>> >Because we update pt irq in every vmentry, there is a chance that
>> >already-injected instance (before EOI-induced exit happens) will
>> >incur another pending IRR setting if there is a VM-exit happens
>> >between HW virtual interrupt injection (vIRR->0, vISR->1) and
>> >EOI-induced exit (vISR->0), since pt_intr_post hasn't been invoked
>> >yet. I guess this is the reason why you still see faster wallclock.
>> >
>>
>> Agreed. A good description. My bad description is from another aspect.
>>
>> >I think you need mark this pending_intr_post situation explicitly.
>> >Then pt_update_irq should skip such pt timer when pending_intr_post
>> >of that timer is true (otherwise the update is meaningless since
>> >previous one hasn't been posted yet). Then with your change to post
>> >in EOI-induced exit handler, it should work correctly to meet the
>> >goal
>> >- one virtual interrupt delivery for one pending pt intr...
>> >
>> I think we are at least on the right track.
>> But I can't follow ' pending_intr_post ', a new parameter? Thanks.
>>
>>
>
>yes, a new parameter to record whether a intr_post operation is pending


The existing parameter ' irq_issued ' looks good. I have tested with below 
modification last night, and it is working. 
If it is okay, I will send out v2..

Quan

 modification =
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index 1d5d287..cc247c3 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -433,6 +433,11 @@ void vlapic_EOI_set(struct vlapic *vlapic)
 void vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)
 {
 struct domain *d = vlapic_domain(vlapic);
+struct hvm_intack pt_intack;
+
+pt_intack.vector = vector;
+pt_intack.source = hvm_intsrc_lapic;
+pt_intr_post(vlapic_vcpu(vlapic), pt_intack);

 if ( vlapic_test_and_clear_vector(vector, >regs->data[APIC_TMR]) )
 vioapic_update_EOI(d, vector);
diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
index 8fca08c..29d9bbf 100644
--- a/xen/arch/x86/hvm/vmx/intr.c
+++ b/xen/arch/x86/hvm/vmx/intr.c
@@ -333,8 +333,6 @@ void vmx_intr_assist(void)
 clear_bit(i, >arch.hvm_vmx.eoi_exitmap_changed);
 __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]);
 }
-
-pt_intr_post(v, intack);
 }
 else
 {
diff --git a/xen/arch/x86/hvm/vpt.c b/xen/arch/x86/hvm/vpt.c
index 5c48fdb..620ca68 100644
--- a/xen/arch/x86/hvm/vpt.c
+++ b/xen/arch/x86/hvm/vpt.c
@@ -267,6 +267,11 @@ int pt_update_irq(struct vcpu *v)
 return -1;
 }

+if ( earliest_pt->irq_issued )
+{
+spin_unlock(>arch.hvm_vcpu.tm_lock);
+return -1;
+}
 earliest_pt->irq_issued = 1;
 irq = earliest_pt->irq;
 is_lapic = (earliest_pt->source == PTSRC_lapic);


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue

2016-09-12 Thread Xuquan (Euler)
On September 12, 2016 3:58 PM, Tian, Kevin <kevin.t...@intel.com> wrote:
>> From: Xuquan (Euler) [mailto:xuqu...@huawei.com]
>> Sent: Friday, September 09, 2016 11:02 AM
>>
>> On August 30, 2016 1:58 PM, Tian Kevin < kevin.t...@intel.com > wrote:
>> >> From: Xuquan (Euler) [mailto:xuqu...@huawei.com]
>> >> Sent: Friday, August 19, 2016 8:59 PM
>> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
>> index 1d5d287..cc247c3 100644
>> --- a/xen/arch/x86/hvm/vlapic.c
>> +++ b/xen/arch/x86/hvm/vlapic.c
>> @@ -433,6 +433,11 @@ void vlapic_EOI_set(struct vlapic *vlapic)  void
>> vlapic_handle_EOI(struct vlapic *vlapic, u8 vector)  {
>>  struct domain *d = vlapic_domain(vlapic);
>> +struct hvm_intack pt_intack;
>> +
>> +pt_intack.vector = vector;
>> +pt_intack.source = hvm_intsrc_lapic;
>> +pt_intr_post(vlapic_vcpu(vlapic), pt_intack);
>>
>>  if ( vlapic_test_and_clear_vector(vector,
>>regs->data[APIC_TMR]) )
>>  vioapic_update_EOI(d, vector); diff --git
>> a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c index
>> 8fca08c..29d9bbf 100644
>> --- a/xen/arch/x86/hvm/vmx/intr.c
>> +++ b/xen/arch/x86/hvm/vmx/intr.c
>> @@ -333,8 +333,6 @@ void vmx_intr_assist(void)
>>  clear_bit(i, >arch.hvm_vmx.eoi_exitmap_changed);
>>  __vmwrite(EOI_EXIT_BITMAP(i),
>v->arch.hvm_vmx.eoi_exit_bitmap[i]);
>>  }
>> -
>> -pt_intr_post(v, intack);
>>  }
>>  else
>>  {
>>
>
>Because we update pt irq in every vmentry, there is a chance that
>already-injected instance (before EOI-induced exit happens) will incur another
>pending IRR setting if there is a VM-exit happens between HW virtual interrupt
>injection (vIRR->0, vISR->1) and EOI-induced exit (vISR->0), since pt_intr_post
>hasn't been invoked yet. I guess this is the reason why you still see faster
>wallclock.
>

Agreed. A good description. My bad description is from another aspect.

>I think you need mark this pending_intr_post situation explicitly.
>Then pt_update_irq should skip such pt timer when pending_intr_post of that
>timer is true (otherwise the update is meaningless since previous one hasn't
>been posted yet). Then with your change to post in EOI-induced exit handler, it
>should work correctly to meet the goal
>- one virtual interrupt delivery for one pending pt intr...
>
I think we are at least on the right track.
But I can't follow ' pending_intr_post ', a new parameter? Thanks.


Quan



___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue

2016-09-08 Thread Xuquan (Euler)
On August 30, 2016 1:58 PM, Tian Kevin < kevin.t...@intel.com > wrote:
>> From: Xuquan (Euler) [mailto:xuqu...@huawei.com]
>> Sent: Friday, August 19, 2016 8:59 PM
>>
>> When Xen apicv is enabled, wall clock time is faster on Windows7-32
>> guest with high payload (with 2vCPU, captured from xentrace, in high
>> payload, the count of IPI interrupt increases rapidly between these vCPUs).
>>
>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector
>> 0xd1) are both pending (index of bit set in VIRR), unfortunately, the
>> IPI intrrupt is high priority than periodic timer interrupt. Xen
>> updates IPI interrupt bit set in VIRR to guest interrupt status (RVI)
>> as a high priority and apicv (Virtual-Interrupt Delivery) delivers IPI
>> interrupt within VMX non-root operation without a VM exit. Within VMX
>> non-root operation, if periodic timer interrupt index of bit is set in
>> VIRR and highest, the apicv delivers periodic timer interrupt within VMX
>non-root operation as well.
>>
>> But in current code, if Xen doesn't update periodic timer interrupt
>> bit set in VIRR to guest interrupt status (RVI) directly, Xen is not
>> aware of this case to decrease the count (pending_intr_nr) of pending
>> periodic timer interrupt, then Xen will deliver a periodic timer
>> interrupt again. The guest receives more periodic timer interrupt.
>>
>> If the periodic timer interrut is delivered and not the highest
>> priority, make Xen be aware of this case to decrease the count of
>> pending periodic timer interrupt.
>>
>> Signed-off-by: Yifei Jiang <jiangyi...@huawei.com>
>> Signed-off-by: Rongguang He <herongguang...@huawei.com>
>> Signed-off-by: Quan Xu <xuqu...@huawei.com>
>> ---
>> Why RFC:
>> 1. I am not quite sure for other cases, such as nested case.
>> 2. Within VMX non-root operation, an Asynchronous Enclave Exit (including
>external
>>interrupts, non-maskable interrupt system-management interrrupts,
>exceptions
>>and VM exit) may occur before delivery of a periodic timer interrupt, the
>periodic
>>timer interrupt may be lost when a coming periodic timer interrupt is
>delivered.
>>Actually, and so current code is.
>> ---
>>  xen/arch/x86/hvm/vmx/intr.c | 16 +++-
>>  1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
>> index 8fca08c..d3a034e 100644
>> --- a/xen/arch/x86/hvm/vmx/intr.c
>> +++ b/xen/arch/x86/hvm/vmx/intr.c
>> @@ -334,7 +334,21 @@ void vmx_intr_assist(void)
>>  __vmwrite(EOI_EXIT_BITMAP(i),
>v->arch.hvm_vmx.eoi_exit_bitmap[i]);
>>  }
>>
>> -pt_intr_post(v, intack);
>> +   /*
>> +* If the periodic timer interrut is delivered and not the highest
>priority,
>> +* make Xen be aware of this case to decrease the count of pending
>periodic
>> +* timer interrupt.
>> +*/
>> +if ( pt_vector != -1 && intack.vector > pt_vector )
>> +{
>> +struct hvm_intack pt_intack;
>> +
>> +pt_intack.vector = pt_vector;
>> +pt_intack.source = hvm_intsrc_lapic;
>> +pt_intr_post(v, pt_intack);
>> +}
>> +else
>> +pt_intr_post(v, intack);
>>  }
>
>Here is my thought. w/o above patch one pending pt interrupt may result in
>multiple injections of guest timer interrupt, as you identified, when pt_vector
>happens to be not the highest pending vector. Then w/ your patch, multiple
>pending pt interrupt instances may be combined as one injection of guest timer
>interrupt. Especially intack.vector
>>pt_vector doesn't mean pt_intr is eligible for injection, which might
>be blocked due to other reasons. As Jan pointed out, this path is very 
>fragile, so
>better we can have a fix to sustain the original behavior with one pending pt
>instance causing one actual injection.
>
>What about doing pt_intr_post in EOI-induced VM-exit instead? EOI exit bit for
>pt_intr is already set to 1 which means every injection would incur an
>EOI-induced VM-exit:
>
>   /*
>* Set eoi_exit_bitmap for periodic timer interrup to cause EOI-induced
>VM
>* exit, then pending periodic time interrups have the chance to be
>injected
>* for compensation
>*/
>if (pt_vector != -1)
>vmx_set_eoi_exit_bitmap(v, pt_vector);
>
>I don't think delaying post makes much difference here. Even we do post earlier
>like your p

Re: [Xen-devel] [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue

2016-09-06 Thread Xuquan (Euler)
On September 06, 2016 9:21 PM, Jan Beulich  wrote:
 On 06.09.16 at 13:22,  wrote:
>> On September 06, 2016 5:05 PM, Jan Beulich 
>> On 06.09.16 at 10:53,  wrote:
 to be honest, my fix is also a tradeoff.. w/ my patch, multiple
 pending pt interrupt instances may be combined as one injection of
 guest timer interrupt..
  but as Yang said, It should be ok since it happens rarely and even
 native OS will encounter the same problem if it disable the
 interrupt for too
>>>long.
>>>
>>>And one OS may properly deal with that situation while another might
>>>not. For example, let me remind you of issues the hypervisor had in
>>>earlier days when no event/softirq processing happened for long
>>>enough, resulting in time management issues due to missed platform
>>>timer overflow handling. IOW an OS may avoid that situation by simply not
>disabling IRQs for too long a time.
>>>
>> Thanks for your reminding,
>> Jan, do you have any suggestion to fix this PT and apicv issue?
>> What do you think about Kevin's?
>
>Kevin's suggestion sounded reasonable, iirc, and I'd rely on his and your 
>better
>VMX knowledge for the low level details.
>

I'll try to verify Kevin's later.
Kevin is really a master but I am just a rookie. I always appreciate both of 
your education..

Quan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue

2016-09-06 Thread Xuquan (Euler)
On September 06, 2016 5:05 PM, Jan Beulich 
 On 06.09.16 at 10:53,  wrote:
>> to be honest, my fix is also a tradeoff.. w/ my patch, multiple
>> pending pt interrupt instances may be combined as one injection of
>> guest timer interrupt..
>>  but as Yang said, It should be ok since it happens rarely and even
>> native OS will encounter the same problem if it disable the interrupt for too
>long.
>
>And one OS may properly deal with that situation while another might not. For
>example, let me remind you of issues the hypervisor had in earlier days when no
>event/softirq processing happened for long enough, resulting in time
>management issues due to missed platform timer overflow handling. IOW an OS
>may avoid that situation by simply not disabling IRQs for too long a time.
>
Thanks for your reminding,
Jan, do you have any suggestion to fix this PT and apicv issue?
What do you think about Kevin's?

Quan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue

2016-09-06 Thread Xuquan (Euler)
On August 30, 2016 1:58 PM, Tian, Kevin <kevin.t...@intel.com> wrote:
>> From: Xuquan (Euler) [mailto:xuqu...@huawei.com]
>> Sent: Friday, August 19, 2016 8:59 PM
>>
>> When Xen apicv is enabled, wall clock time is faster on Windows7-32
>> guest with high payload (with 2vCPU, captured from xentrace, in high
>> payload, the count of IPI interrupt increases rapidly between these vCPUs).
>>
>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector
>> 0xd1) are both pending (index of bit set in VIRR), unfortunately, the
>> IPI intrrupt is high priority than periodic timer interrupt. Xen
>> updates IPI interrupt bit set in VIRR to guest interrupt status (RVI)
>> as a high priority and apicv (Virtual-Interrupt Delivery) delivers IPI
>> interrupt within VMX non-root operation without a VM exit. Within VMX
>> non-root operation, if periodic timer interrupt index of bit is set in
>> VIRR and highest, the apicv delivers periodic timer interrupt within VMX
>non-root operation as well.
>>
>> But in current code, if Xen doesn't update periodic timer interrupt
>> bit set in VIRR to guest interrupt status (RVI) directly, Xen is not
>> aware of this case to decrease the count (pending_intr_nr) of pending
>> periodic timer interrupt, then Xen will deliver a periodic timer
>> interrupt again. The guest receives more periodic timer interrupt.
>>
>> If the periodic timer interrut is delivered and not the highest
>> priority, make Xen be aware of this case to decrease the count of
>> pending periodic timer interrupt.
>>
>> Signed-off-by: Yifei Jiang <jiangyi...@huawei.com>
>> Signed-off-by: Rongguang He <herongguang...@huawei.com>
>> Signed-off-by: Quan Xu <xuqu...@huawei.com>
>> ---
>> Why RFC:
>> 1. I am not quite sure for other cases, such as nested case.
>> 2. Within VMX non-root operation, an Asynchronous Enclave Exit (including
>external
>>interrupts, non-maskable interrupt system-management interrrupts,
>exceptions
>>and VM exit) may occur before delivery of a periodic timer interrupt, the
>periodic
>>timer interrupt may be lost when a coming periodic timer interrupt is
>delivered.
>>Actually, and so current code is.
>> ---
>>  xen/arch/x86/hvm/vmx/intr.c | 16 +++-
>>  1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
>> index 8fca08c..d3a034e 100644
>> --- a/xen/arch/x86/hvm/vmx/intr.c
>> +++ b/xen/arch/x86/hvm/vmx/intr.c
>> @@ -334,7 +334,21 @@ void vmx_intr_assist(void)
>>  __vmwrite(EOI_EXIT_BITMAP(i),
>v->arch.hvm_vmx.eoi_exit_bitmap[i]);
>>  }
>>
>> -pt_intr_post(v, intack);
>> +   /*
>> +* If the periodic timer interrut is delivered and not the highest
>priority,
>> +* make Xen be aware of this case to decrease the count of pending
>periodic
>> +* timer interrupt.
>> +*/
>> +if ( pt_vector != -1 && intack.vector > pt_vector )
>> +{
>> +struct hvm_intack pt_intack;
>> +
>> +pt_intack.vector = pt_vector;
>> +pt_intack.source = hvm_intsrc_lapic;
>> +pt_intr_post(v, pt_intack);
>> +}
>> +else
>> +pt_intr_post(v, intack);
>>  }
>
>Here is my thought. w/o above patch one pending pt interrupt may result in
>multiple injections of guest timer interrupt, as you identified, when pt_vector
>happens to be not the highest pending vector. Then w/ your patch, multiple
>pending pt interrupt instances may be combined as one injection of guest timer
>interrupt. Especially intack.vector
>>pt_vector doesn't mean pt_intr is eligible for injection, which might
>be blocked due to other reasons. As Jan pointed out, this path is very 
>fragile, so
>better we can have a fix to sustain the original behavior with one pending pt
>instance causing one actual injection.
>

Agreed. Good summary..

>What about doing pt_intr_post in EOI-induced VM-exit instead? EOI exit bit for
>pt_intr is already set to 1 which means every injection would incur an
>EOI-induced VM-exit:
>
>   /*
>* Set eoi_exit_bitmap for periodic timer interrup to cause EOI-induced
>VM
>* exit, then pending periodic time interrups have the chance to be
>injected
>* for compensation
>*/
>if (pt_vector != -1)
>vmx_set_eoi_exit_bitmap(v, pt_vector);
>
>I don't think delaying post makes much difference here. Even we do 

Re: [Xen-devel] [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue

2016-08-22 Thread Xuquan (Euler)
On August 22, 2016 11:12 PM,  wrote:
 On 22.08.16 at 16:02,  wrote:
>> On August 22, 2016 8:04 PM,  wrote:
>> On 22.08.16 at 13:41,  wrote:
 On August 22, 2016 6:36 PM,  wrote:
 On 19.08.16 at 14:58,  wrote:
>> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17
>00:00:00
>> 2001
>> From: Quan Xu 
>> Date: Fri, 19 Aug 2016 20:40:31 +0800
>> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv
>> issue
>>
>> When Xen apicv is enabled, wall clock time is faster on
>> Windows7-32 guest with high payload (with 2vCPU, captured from
>> xentrace, in high payload, the count of IPI interrupt increases rapidly
>between these vCPUs).
>>
>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector
>> 0xd1) are both pending (index of bit set in VIRR), unfortunately,
>> the IPI intrrupt is high priority than periodic timer interrupt.
>> Xen updates IPI interrupt bit set in VIRR to guest interrupt
>> status
>> (RVI) as a high priority and apicv (Virtual-Interrupt Delivery)
>> delivers IPI interrupt within VMX non-root operation without a VM
>> exit. Within VMX non-root operation, if periodic timer interrupt
>> index of bit is set in VIRR and highest, the apicv delivers
>> periodic timer
>>>interrupt within VMX non-root operation as well.
>>
>> But in current code, if Xen doesn't update periodic timer
>> interrupt bit set in VIRR to guest interrupt status (RVI)
>> directly, Xen is not aware of this case to decrease the count
>> (pending_intr_nr) of pending periodic timer interrupt, then Xen
>> will deliver a periodic timer interrupt again. The guest receives
>> more periodic timer interrupt.
>>
>> If the periodic timer interrut is delivered and not the highest
>> priority, make Xen be aware of this case to decrease the count of
>> pending periodic timer interrupt.
>
>I can see the issue you're trying to address, but for one - doesn't
>this lead to other double accounting, namely once the pt irq becomes
>the highest priority one?
>

 It is does NOT lead to other double accounting..
 As if the pt irq becomes the highest priority one, the intack is pt one..
 Then:

  +else
  +pt_intr_post(v, intack);
>>>
>>>As just said in reply to Yang: If this is still the same interrupt
>>>instance
>> as in a
>>>prior run through here which took the if() branch, this change looks
>>>like
>> having
>>>the potential of double accounting.
>>>
>>
>> I very appreciate your detail review. It looks like, but actually it
>> doesn't happen.
>>
>>  As the key parameter 'pt->irq_issued'..
>>
>> In pt_update_irq(), once the PT irq is issued, set the pt->irq_issued..
>> In pt_intr_post(), clear the pt->irq_issued before touching the count
>> 'pt->pending_intr_nr'..
>>
>> According to your assumption, at the second call to pt_intr_post(), As
>> if 'pt->irq_issued' is clear, pt is NULL in is_pt_irq() check, then
>> return, there is no chance to touch the count 'pt->pending_intr_nr'..
>
>Hmm, wait: That second pass would also get us through
>pt_update_irq() a second time, which might cause irq_issued to get set again.
>

 iiuc this case is IRQ combination, issue twice but delivery once..
Another enhancement may be required here,
 IF  hvm_intsrc_lapic && the PT IRQ index bit in VIRR is not clear
THEN don't issue PT IRQ in pt_update_irq()
 FI

>Granted this code is fragile, therefore please excuse that I'm trying to be 
>extra
>careful with accepting changes to it.

Understood :):)...

Quan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue

2016-08-22 Thread Xuquan (Euler)
On August 22, 2016 8:04 PM,  wrote: 
 On 22.08.16 at 13:41,  wrote:
>> On August 22, 2016 6:36 PM,  wrote:
>> On 19.08.16 at 14:58,  wrote:
 From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00
 2001
 From: Quan Xu 
 Date: Fri, 19 Aug 2016 20:40:31 +0800
 Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv
 issue

 When Xen apicv is enabled, wall clock time is faster on Windows7-32
 guest with high payload (with 2vCPU, captured from xentrace, in high
 payload, the count of IPI interrupt increases rapidly between these vCPUs).

 If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector
 0xd1) are both pending (index of bit set in VIRR), unfortunately,
 the IPI intrrupt is high priority than periodic timer interrupt. Xen
 updates IPI interrupt bit set in VIRR to guest interrupt status
 (RVI) as a high priority and apicv (Virtual-Interrupt Delivery)
 delivers IPI interrupt within VMX non-root operation without a VM
 exit. Within VMX non-root operation, if periodic timer interrupt
 index of bit is set in VIRR and highest, the apicv delivers periodic timer
>interrupt within VMX non-root operation as well.

 But in current code, if Xen doesn't update periodic timer interrupt
 bit set in VIRR to guest interrupt status (RVI) directly, Xen is not
 aware of this case to decrease the count (pending_intr_nr) of
 pending periodic timer interrupt, then Xen will deliver a periodic
 timer interrupt again. The guest receives more periodic timer
 interrupt.

 If the periodic timer interrut is delivered and not the highest
 priority, make Xen be aware of this case to decrease the count of
 pending periodic timer interrupt.
>>>
>>>I can see the issue you're trying to address, but for one - doesn't
>>>this lead to other double accounting, namely once the pt irq becomes
>>>the highest priority one?
>>>
>>
>> It is does NOT lead to other double accounting..
>> As if the pt irq becomes the highest priority one, the intack is pt one..
>> Then:
>>
>>  +else
>>  +pt_intr_post(v, intack);
>
>As just said in reply to Yang: If this is still the same interrupt instance as 
>in a
>prior run through here which took the if() branch, this change looks like 
>having
>the potential of double accounting.
>

I very appreciate your detail review. It looks like, but actually it doesn't 
happen.

 As the key parameter 'pt->irq_issued'..

In pt_update_irq(), once the PT irq is issued, set the pt->irq_issued..
In pt_intr_post(), clear the pt->irq_issued before touching the count 
'pt->pending_intr_nr'..

According to your assumption, at the second call to pt_intr_post(), As if 
'pt->irq_issued' is clear, pt is NULL in is_pt_irq() check,
then return, there is no chance to touch the count 'pt->pending_intr_nr'..
--
void pt_intr_post(struct vcpu *v, struct hvm_intack intack)
{
...
pt = is_pt_irq(v, intack);
if ( pt == NULL )
{
spin_unlock(>arch.hvm_vcpu.tm_lock);
return;
}
...

 
...
}


static struct periodic_time *is_pt_irq()
{
 
list_for_each_entry ( pt, head, list )
{
if ( pt->pending_intr_nr && pt->irq_issued___ &&
 (intack.vector == pt_irq_vector(pt, intack.source)) )
return pt;
}

return NULL;
}





__IIUC__, this question is based on the following pseudocode detail the 
behavior of virtual-interrupt delivery is __not__ atomic:

Vector <- RVI;
VISR[Vector] <- 1;
SVI <- Vector;
VPPR<- Vector & F0H;
VIRR[Vector] <- 0;
IF any bits set in VIRR
   Then RVI<-highest index of bit set in VIRR
   ELSE RVI <-0
FI
Deliver interrupt with Vector through IDT
...


We'd better check this first, as Yang said, this is atomic..

Quan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue

2016-08-22 Thread Xuquan (Euler)
On August 22, 2016 9:10 PM, Yang Zhang wrote:
>On 2016/8/22 20:04, Jan Beulich wrote:
> On 22.08.16 at 13:41,  wrote:
>>> On August 22, 2016 6:36 PM,  wrote:
>>> On 19.08.16 at 14:58,  wrote:
> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00
> 2001
> From: Quan Xu 
> Date: Fri, 19 Aug 2016 20:40:31 +0800
> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv
> issue
>
> When Xen apicv is enabled, wall clock time is faster on Windows7-32
> guest with high payload (with 2vCPU, captured from xentrace, in
> high payload, the count of IPI interrupt increases rapidly between these
>vCPUs).
>
> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector
> 0xd1) are both pending (index of bit set in VIRR), unfortunately,
> the IPI intrrupt is high priority than periodic timer interrupt.
> Xen updates IPI interrupt bit set in VIRR to guest interrupt status
> (RVI) as a high priority and apicv (Virtual-Interrupt Delivery)
> delivers IPI interrupt within VMX non-root operation without a VM
> exit. Within VMX non-root operation, if periodic timer interrupt
> index of bit is set in VIRR and highest, the apicv delivers periodic timer
>interrupt within VMX non-root operation as well.
>
> But in current code, if Xen doesn't update periodic timer interrupt
> bit set in VIRR to guest interrupt status (RVI) directly, Xen is
> not aware of this case to decrease the count (pending_intr_nr) of
> pending periodic timer interrupt, then Xen will deliver a periodic
> timer interrupt again. The guest receives more periodic timer
> interrupt.
>
> If the periodic timer interrut is delivered and not the highest
> priority, make Xen be aware of this case to decrease the count of
> pending periodic timer interrupt.

 I can see the issue you're trying to address, but for one - doesn't
 this lead to other double accounting, namely once the pt irq becomes
 the highest priority one?

>>>
>>> It is does NOT lead to other double accounting..
>>> As if the pt irq becomes the highest priority one, the intack is pt one..
>>> Then:
>>>
>>>  +else
>>>  +pt_intr_post(v, intack);
>>
>> As just said in reply to Yang: If this is still the same interrupt
>> instance as in a prior run through here which took the if() branch,
>> this change looks like having the potential of double accounting.
>
>This cannot happen: if pt intr is the second priority one, then corresponding 
>vIRR
>bit will be cleared by hardware while the highest priority intr is delivered to
>guest. And the next vmx_intr_assit cannot aware of it since the vIRR bit is 
>zero.
>

I am afraid that this may happen.. as I mentioned as 2nd RFC reason, 

Within VMX non-root operation, an Asynchronous Enclave Exit (including external
interrupts, non-maskable interrupt system-management interrrupts, exceptions 
and VM exit)
may occur before delivery of a periodic timer interrupt, the pt bit of VIRR is 
still there...

but I will explain more why it doesn't have the potential of double accounting 
in current code in next email.

Quan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue

2016-08-22 Thread Xuquan (Euler)
On August 22, 2016 7:40 PM, Yang Zhang wrote:
>On 2016/8/22 18:35, Jan Beulich wrote:
> On 19.08.16 at 14:58,  wrote:
>>> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00
>>> 2001
>>> From: Quan Xu 
>>> Date: Fri, 19 Aug 2016 20:40:31 +0800
>>> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv
>>> issue
>>>
>>> When Xen apicv is enabled, wall clock time is faster on Windows7-32
>>> guest with high payload (with 2vCPU, captured from xentrace, in high
>>> payload, the count of IPI interrupt increases rapidly between these vCPUs).
>>>
>>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector
>>> 0xd1) are both pending (index of bit set in VIRR), unfortunately, the
>>> IPI intrrupt is high priority than periodic timer interrupt. Xen
>>> updates IPI interrupt bit set in VIRR to guest interrupt status (RVI)
>>> as a high priority and apicv (Virtual-Interrupt Delivery) delivers
>>> IPI interrupt within VMX non-root operation without a VM exit. Within
>>> VMX non-root operation, if periodic timer interrupt index of bit is
>>> set in VIRR and highest, the apicv delivers periodic timer interrupt within
>VMX non-root operation as well.
>>>
>>> But in current code, if Xen doesn't update periodic timer interrupt
>>> bit set in VIRR to guest interrupt status (RVI) directly, Xen is not
>>> aware of this case to decrease the count (pending_intr_nr) of pending
>>> periodic timer interrupt, then Xen will deliver a periodic timer
>>> interrupt again. The guest receives more periodic timer interrupt.
>>>
>>> If the periodic timer interrut is delivered and not the highest
>>> priority, make Xen be aware of this case to decrease the count of
>>> pending periodic timer interrupt.
>>
>> I can see the issue you're trying to address, but for one - doesn't
>> this lead to other double accounting, namely once the pt irq becomes
>> the highest priority one?
>
>"intack.vector > pt_vector"
>I think above check in code can avoid the double accounting problem. It
>recalculate pending timer interrupt only when the highest priority interrupt is
>not timer.
>

Yes, 

>>
>>> --- a/xen/arch/x86/hvm/vmx/intr.c
>>> +++ b/xen/arch/x86/hvm/vmx/intr.c
>>> @@ -334,7 +334,21 @@ void vmx_intr_assist(void)
>>>  __vmwrite(EOI_EXIT_BITMAP(i),
>v->arch.hvm_vmx.eoi_exit_bitmap[i]);
>>>  }
>>>
>>> -pt_intr_post(v, intack);
>>> +   /*
>>> +* If the periodic timer interrut is delivered and not the highest
>priority,
>>> +* make Xen be aware of this case to decrease the count of
>pending periodic
>>> +* timer interrupt.
>>> +*/
>>> +if ( pt_vector != -1 && intack.vector > pt_vector )
>>> +{
>>> +struct hvm_intack pt_intack;
>>> +
>>> +pt_intack.vector = pt_vector;
>>> +pt_intack.source = hvm_intsrc_lapic;
>>> +pt_intr_post(v, pt_intack);
>>> +}
>>> +else
>>> +pt_intr_post(v, intack);
>>
>> And then I'm having two problems with this change: Processing other
>> than the highest priority irq here looks to be non-architectural. From
>> looking over pt_intr_post() there's only pt related accounting and no
>> actual interrupt handling there, but I'd like this to be (a) confirmed
>> and (b) called out in the comment.
>>
>> Plus the change above is in no way apicv specific, so I'd also like to
>> see clarification why this change is valid (i.e. at least not making
>> things worse) even without apicv in use.
>
>It is apicv specific case. Above code will execute only when cpu has
>virtual interrupt delivery which is part of apicv feature.
>

Yes, I think so too. Thanks for your clarification!!

Quan


>>
>> And of course in any event VMX maintainer feedback is going to be
>> needed here.
>>
>> Jan
>>
>
>
>--
>Yang
>Alibaba Cloud Computing

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue

2016-08-22 Thread Xuquan (Euler)
On August 22, 2016 6:36 PM,  wrote:
 On 19.08.16 at 14:58,  wrote:
>> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00 2001
>> From: Quan Xu 
>> Date: Fri, 19 Aug 2016 20:40:31 +0800
>> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue
>>
>> When Xen apicv is enabled, wall clock time is faster on Windows7-32 guest
>> with high payload (with 2vCPU, captured from xentrace, in high payload, the
>> count of IPI interrupt increases rapidly between these vCPUs).
>>
>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1)
>> are both pending (index of bit set in VIRR), unfortunately, the IPI intrrupt
>> is high priority than periodic timer interrupt. Xen updates IPI interrupt bit
>> set in VIRR to guest interrupt status (RVI) as a high priority and apicv
>> (Virtual-Interrupt Delivery) delivers IPI interrupt within VMX non-root
>> operation without a VM exit. Within VMX non-root operation, if periodic timer
>> interrupt index of bit is set in VIRR and highest, the apicv delivers 
>> periodic
>> timer interrupt within VMX non-root operation as well.
>>
>> But in current code, if Xen doesn't update periodic timer interrupt bit set
>> in VIRR to guest interrupt status (RVI) directly, Xen is not aware of this
>> case to decrease the count (pending_intr_nr) of pending periodic timer
>> interrupt,
>> then Xen will deliver a periodic timer interrupt again. The guest receives
>> more
>> periodic timer interrupt.
>>
>> If the periodic timer interrut is delivered and not the highest priority, 
>> make
>> Xen be aware of this case to decrease the count of pending periodic timer
>> interrupt.
>
>I can see the issue you're trying to address, but for one - doesn't
>this lead to other double accounting, namely once the pt irq becomes
>the highest priority one?
>

It is does NOT lead to other double accounting..
As if the pt irq becomes the highest priority one, the intack is pt one..
Then:

 +else
 +pt_intr_post(v, intack);


>> --- a/xen/arch/x86/hvm/vmx/intr.c
>> +++ b/xen/arch/x86/hvm/vmx/intr.c
>> @@ -334,7 +334,21 @@ void vmx_intr_assist(void)
>>  __vmwrite(EOI_EXIT_BITMAP(i),
>> v->arch.hvm_vmx.eoi_exit_bitmap[i]);
>>  }
>>
>> -pt_intr_post(v, intack);
>> +   /*
>> +* If the periodic timer interrut is delivered and not the highest
>> priority,
>> +* make Xen be aware of this case to decrease the count of pending
>> periodic
>> +* timer interrupt.
>> +*/
>> +if ( pt_vector != -1 && intack.vector > pt_vector )
>> +{
>> +struct hvm_intack pt_intack;
>> +
>> +pt_intack.vector = pt_vector;
>> +pt_intack.source = hvm_intsrc_lapic;
>> +pt_intr_post(v, pt_intack);
>> +}
>> +else
>> +pt_intr_post(v, intack);
>
>And then I'm having two problems with this change: Processing other
>than the highest priority irq here looks to be non-architectural. From
>looking over pt_intr_post() there's only pt related accounting and no
>actual interrupt handling there, but I'd like this to be
> (a) confirmed

To me, YES.. but I hope Kevin could confirm it again.

>and (b) called out in the comment.

Agreed. Good idea.

>
>Plus the change above is in no way apicv specific, so I'd also like to
>see clarification why this change is valid (i.e. at least not making
>things worse) even without apicv in use.
>

__iiuc__, it is apicv specific, as these change is under condition 
'cpu_has_vmx_virtual_intr_delivery' (Virtual Interrupt Delivery is one of apicv 
features) as below:

if ( cpu_has_vmx_virtual_intr_delivery ...) {  ___change___ }
 



>And of course in any event VMX maintainer feedback is going to be
>needed here.


Thanks for your comments!!
Quan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue

2016-08-22 Thread Xuquan (Euler)
On August 22, 2016 5:38 PM, Yang Zhang wrote:
>On 2016/8/19 20:58, Xuquan (Euler) wrote:
>> From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00 2001
>> From: Quan Xu <xuqu...@huawei.com>
>> Date: Fri, 19 Aug 2016 20:40:31 +0800
>> Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue
>>
>> When Xen apicv is enabled, wall clock time is faster on Windows7-32
>> guest with high payload (with 2vCPU, captured from xentrace, in high
>> payload, the count of IPI interrupt increases rapidly between these vCPUs).
>>
>> If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector
>> 0xd1) are both pending (index of bit set in VIRR), unfortunately, the
>> IPI intrrupt is high priority than periodic timer interrupt. Xen
>> updates IPI interrupt bit set in VIRR to guest interrupt status (RVI)
>> as a high priority and apicv (Virtual-Interrupt Delivery) delivers IPI
>> interrupt within VMX non-root operation without a VM exit. Within VMX
>> non-root operation, if periodic timer interrupt index of bit is set in
>> VIRR and highest, the apicv delivers periodic timer interrupt within VMX 
>> non-root operation as well.
>>
>> But in current code, if Xen doesn't update periodic timer interrupt
>> bit set in VIRR to guest interrupt status (RVI) directly, Xen is not
>> aware of this case to decrease the count (pending_intr_nr) of pending
>> periodic timer interrupt, then Xen will deliver a periodic timer
>> interrupt again. The guest receives more periodic timer interrupt.
>>
>> If the periodic timer interrut is delivered and not the highest
>> priority, make Xen be aware of this case to decrease the count of
>> pending periodic timer interrupt.
>
>Nice catch!
>
>>
>> Signed-off-by: Yifei Jiang <jiangyi...@huawei.com>
>> Signed-off-by: Rongguang He <herongguang...@huawei.com>
>> Signed-off-by: Quan Xu <xuqu...@huawei.com>
>> ---
>> Why RFC:
>> 1. I am not quite sure for other cases, such as nested case.
>
>We don't support the nested APICv now. So it is ok for this patch.
>
>> 2. Within VMX non-root operation, an Asynchronous Enclave Exit (including 
>> external
>>interrupts, non-maskable interrupt system-management interrrupts, 
>> exceptions
>>and VM exit) may occur before delivery of a periodic timer interrupt, the 
>> periodic
>>timer interrupt may be lost when a coming periodic timer interrupt is 
>> delivered.
>>Actually, and so current code is.
>
>I think you mean the combination not interrupt lost. It should be ok since it 
>happens rarely and even native OS will encounter the same problem if it 
>disable the interrupt for too long.
>

Yes, it is 'combination'.

>> ---
>>  xen/arch/x86/hvm/vmx/intr.c | 16 +++-
>>  1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
>> index 8fca08c..d3a034e 100644
>> --- a/xen/arch/x86/hvm/vmx/intr.c
>> +++ b/xen/arch/x86/hvm/vmx/intr.c
>> @@ -334,7 +334,21 @@ void vmx_intr_assist(void)
>>  __vmwrite(EOI_EXIT_BITMAP(i), 
>> v->arch.hvm_vmx.eoi_exit_bitmap[i]);
>>  }
>>
>> -pt_intr_post(v, intack);
>> +   /*
>> +* If the periodic timer interrut is delivered and not the highest 
>> priority,
>> +* make Xen be aware of this case to decrease the count of pending 
>> periodic
>> +* timer interrupt.
>> +*/
>> +if ( pt_vector != -1 && intack.vector > pt_vector )
>> +{
>> +struct hvm_intack pt_intack;
>> +
>> +pt_intack.vector = pt_vector;
>> +pt_intack.source = hvm_intsrc_lapic;
>> +pt_intr_post(v, pt_intack);
>> +}
>> +else
>> +pt_intr_post(v, intack);
>>  }
>>  else
>>  {
>> --
>> 1.7.12.4
>>
>
>Looks good to me.
>
>Reviewed-by: Yang Zhang <yang.zhang...@gmail.com>
>


Thanks for your comments!!
Quan


>--
>Yang
>Alibaba Cloud Computing

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue

2016-08-19 Thread Xuquan (Euler)
>From 9b2df963c13ad27e2cffbeddfa3267782ac3da2a Mon Sep 17 00:00:00 2001
From: Quan Xu 
Date: Fri, 19 Aug 2016 20:40:31 +0800
Subject: [RFC PATCH] x86/apicv: fix RTC periodic timer and apicv issue

When Xen apicv is enabled, wall clock time is faster on Windows7-32 guest
with high payload (with 2vCPU, captured from xentrace, in high payload, the
count of IPI interrupt increases rapidly between these vCPUs).

If IPI intrrupt (vector 0xe1) and periodic timer interrupt (vector 0xd1)
are both pending (index of bit set in VIRR), unfortunately, the IPI intrrupt
is high priority than periodic timer interrupt. Xen updates IPI interrupt bit
set in VIRR to guest interrupt status (RVI) as a high priority and apicv
(Virtual-Interrupt Delivery) delivers IPI interrupt within VMX non-root
operation without a VM exit. Within VMX non-root operation, if periodic timer
interrupt index of bit is set in VIRR and highest, the apicv delivers periodic
timer interrupt within VMX non-root operation as well.

But in current code, if Xen doesn't update periodic timer interrupt bit set
in VIRR to guest interrupt status (RVI) directly, Xen is not aware of this
case to decrease the count (pending_intr_nr) of pending periodic timer 
interrupt,
then Xen will deliver a periodic timer interrupt again. The guest receives more
periodic timer interrupt.

If the periodic timer interrut is delivered and not the highest priority, make
Xen be aware of this case to decrease the count of pending periodic timer
interrupt.

Signed-off-by: Yifei Jiang 
Signed-off-by: Rongguang He 
Signed-off-by: Quan Xu 
---
Why RFC:
1. I am not quite sure for other cases, such as nested case.
2. Within VMX non-root operation, an Asynchronous Enclave Exit (including 
external
   interrupts, non-maskable interrupt system-management interrrupts, exceptions
   and VM exit) may occur before delivery of a periodic timer interrupt, the 
periodic
   timer interrupt may be lost when a coming periodic timer interrupt is 
delivered.
   Actually, and so current code is.
---
 xen/arch/x86/hvm/vmx/intr.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/vmx/intr.c b/xen/arch/x86/hvm/vmx/intr.c
index 8fca08c..d3a034e 100644
--- a/xen/arch/x86/hvm/vmx/intr.c
+++ b/xen/arch/x86/hvm/vmx/intr.c
@@ -334,7 +334,21 @@ void vmx_intr_assist(void)
 __vmwrite(EOI_EXIT_BITMAP(i), v->arch.hvm_vmx.eoi_exit_bitmap[i]);
 }

-pt_intr_post(v, intack);
+   /*
+* If the periodic timer interrut is delivered and not the highest 
priority,
+* make Xen be aware of this case to decrease the count of pending 
periodic
+* timer interrupt.
+*/
+if ( pt_vector != -1 && intack.vector > pt_vector )
+{
+struct hvm_intack pt_intack;
+
+pt_intack.vector = pt_vector;
+pt_intack.source = hvm_intsrc_lapic;
+pt_intr_post(v, pt_intack);
+}
+else
+pt_intr_post(v, intack);
 }
 else
 {
--
1.7.12.4


0001-x86-apicv-fix-RTC-periodic-timer-and-apicv-issue.patch
Description: 0001-x86-apicv-fix-RTC-periodic-timer-and-apicv-issue.patch
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] help -- A question about ' XENFEAT_auto_translated_physmap':

2016-08-10 Thread Xuquan (Euler)
On August 09, 2016 9:02 PM, < jbeul...@suse.com > wrote:
> >>> On 09.08.16 at 14:36,  wrote:
> > Hi Jan,
> >
> > A question about ' XENFEAT_auto_translated_physmap':
> >
> > In linux code, in arch/x86/xen/mmu.c,
>
> I assume you know that I'm not a maintainer of the Linux code.
>
> >  __xen_pgd_walk()
> > {
> > 
> >
> > if (xen_feature(XENFEAT_auto_translated_physmap))
> > return 0;
> > 
> > }
> >
> >
> >
> > Why xen_feature(XENFEAT_auto_translated_physmap) is true, then return
> > directly?
> > If not return directly, is there any potential risk?
>
> Well, the function is specifically there for operations (pinning/unpinning) 
> which are required only for the not-auto-translated case.
> Why would anyone want to traverse a page table tree just to do nothing on 
> each of the entries?

Jan, thank you!!
As I am struggling with a dom0 crash, the kernel is old, 3.0.X..
Now there is a crash in 

[] _pin_lock+0x165/0x2a0<*crash*

unable to handle kernel paging request at 8b1021826000

static void _pin_lock(struct mm_struct *mm, int lock)
{
171 pgd_t *pgd = mm->pgd;
172 unsigned g;
173
174 for (g = 0; g <= ((TASK_SIZE_MAX-1) / PGDIR_SIZE); g++, 
pgd++) {
175 pud_t *pud;
176 unsigned u;
177
178 if (pgd_none(*pgd))
179 continue;
180 pud = pud_offset(pgd, 0);
181 for (u = 0; u < PTRS_PER_PUD; u++, pud++) {
182 pmd_t *pmd;
183 unsigned m;
184
185 if (pud_none(*pud))
186 continue;
187 pmd = pmd_offset(pud, 0);
188 for (m = 0; m < PTRS_PER_PMD; m++, pmd++) {
189 spinlock_t *ptl;
190
191 if (pmd_none(*pmd))  
<-*crash*
192 continue;
193 ptl = pte_lockptr(0, pmd);
194 if (lock)
195 spin_lock(ptl);
196 else
197 spin_unlock(ptl);
198 }
199 }
200 }
201 }
202 #endif
}



Quan




___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] help -- A question about ' XENFEAT_auto_translated_physmap':

2016-08-09 Thread Xuquan (Euler)
Hi Jan,

A question about ' XENFEAT_auto_translated_physmap':

In linux code, in arch/x86/xen/mmu.c,

 __xen_pgd_walk()
{


if (xen_feature(XENFEAT_auto_translated_physmap))
return 0;

}



Why xen_feature(XENFEAT_auto_translated_physmap) is true, then return directly?
If not return directly, is there any potential risk?






Detail for  __xen_pgd_walk():

static int __xen_pgd_walk(struct mm_struct *mm, pgd_t *pgd,
  int (*func)(struct mm_struct *mm, struct page *,
  enum pt_level),
  unsigned long limit)
{
int flush = 0;
unsigned hole_low, hole_high;
unsigned pgdidx_limit, pudidx_limit, pmdidx_limit;
unsigned pgdidx, pudidx, pmdidx;

/* The limit is the last byte to be touched */
limit--;
BUG_ON(limit >= FIXADDR_TOP);

if (xen_feature(XENFEAT_auto_translated_physmap))
return 0;

/*
 * 64-bit has a great big hole in the middle of the address
 * space, which contains the Xen mappings.  On 32-bit these
 * will end up making a zero-sized hole and so is a no-op.
 */
hole_low = pgd_index(USER_LIMIT);
hole_high = pgd_index(PAGE_OFFSET);

pgdidx_limit = pgd_index(limit);
#if PTRS_PER_PUD > 1
pudidx_limit = pud_index(limit);
#else
pudidx_limit = 0;
#endif
#if PTRS_PER_PMD > 1
pmdidx_limit = pmd_index(limit);
#else
pmdidx_limit = 0;
#endif

for (pgdidx = 0; pgdidx <= pgdidx_limit; pgdidx++) {
pud_t *pud;

if (pgdidx >= hole_low && pgdidx < hole_high)
continue;

if (!pgd_val(pgd[pgdidx]))
continue;

pud = pud_offset([pgdidx], 0);

if (PTRS_PER_PUD > 1) /* not folded */
flush |= (*func)(mm, virt_to_page(pud), PT_PUD);

for (pudidx = 0; pudidx < PTRS_PER_PUD; pudidx++) {
pmd_t *pmd;

if (pgdidx == pgdidx_limit &&
pudidx > pudidx_limit)
goto out;

if (pud_none(pud[pudidx]))
continue;

pmd = pmd_offset([pudidx], 0);

if (PTRS_PER_PMD > 1) /* not folded */
flush |= (*func)(mm, virt_to_page(pmd), PT_PMD);

for (pmdidx = 0; pmdidx < PTRS_PER_PMD; pmdidx++) {
struct page *pte;

if (pgdidx == pgdidx_limit &&
pudidx == pudidx_limit &&
pmdidx > pmdidx_limit)
goto out;

if (pmd_none(pmd[pmdidx]))
continue;

pte = pmd_page(pmd[pmdidx]);
flush |= (*func)(mm, pte, PT_PTE);
}
}
}

out:
/* Do the top level last, so that the callbacks can use it as
   a cue to do final things like tlb flushes. */
flush |= (*func)(mm, virt_to_page(pgd), PT_PGD);

return flush;
}



Quan






___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 15/19] Qemu-Xen-vTPM: Xen frontend driver infrastructure

2016-08-09 Thread Xuquan (Euler)
On August 07, 2016 7:39 PM, Emil Condrea  wrote:
On Mon, Jul 25, 2016 at 7:01 PM, Anthony PERARD  
wrote:
> >
> > > +{
> > > +xs_transaction_t xbt = XBT_NULL;
> > > +
> > > +if (xendev->fe_state == xbus) {
> > > +return 0;
> > > +}
> > > +
> > > +xendev->fe_state = xbus;
> > > +if (xendev->fe == NULL) {
> > > +xen_pv_printf(NULL, 0, "xendev->fe is NULL\n");
> > > +return -1;
> > > +}
> > > +
> > > +retry_transaction:

Add a space here, then
   s/retry_transaction:/ retry_transaction:/
the same for the other cases..


> > > +xbt = xs_transaction_start(xenstore);
> > > +if (xbt == XBT_NULL) {
> > > +goto abort_transaction;
> > > +}
> >
> > There is a transaction started, but I don't think it is used by the
> > function below. Could you remove the transaction?
>
> I will remove it. For current version I don't see a direct usage of this 
> transaction.
> Quan, did you have a specific reason for past versions for this transaction?

No specific reason, maybe I copied these code from xen / libxl .. 
btw, why does libxl use ' retry_transaction: ' logic ... but QEMU doesn't ?


Thank you both.

Quan

> >
> >
> > > +if (xenstore_write_int(xendev->fe, "state", xbus)) {
> > > +goto abort_transaction;
> > > +}
> > > +
> > > +if (!xs_transaction_end(xenstore, xbt, 0)) {
> > > +if (errno == EAGAIN) {
> > > +goto retry_transaction;
> > > +}
> > > +}
> > > +
> > > +return 0;
> > > +
> > > +abort_transaction:
> > > +xs_transaction_end(xenstore, xbt, 1);
> > > +return -1;
> > > +}
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH RESEND] MAINTAINERS: update email

2016-08-01 Thread Xuquan (Euler)
Wei, thanks!!
This email is sent out from outlook (the Linux can't send out email). the 
format may be not correct. 

Quan

On Aug 01 2016 at 18:46:00pm, Wei Liu <wei.l...@citrix.com> wrote:
On Mon, Aug 01, 2016 at 10:36:43AM +, Xuquan (Euler) wrote:
> Signed-off-by: Quan Xu <xuqu...@huawei.com>
> Reviewed-by: Kevin Tian <kevin.t...@intel.com>

Thank you both.

Quan, I think there is something wrong with your email setup.

When I tried to git am this patch, I got:

fatal: corrupt patch at line 11
Patch failed at 0001 MAINTAINERS: update email The copy of the patch that 
failed is found in:
   /local/work/COMMITTER/xen.git/.git/rebase-apply/patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

Looking at the email it seems that it is encoded somehow.

No need to resend though. This patch is simple enough that I've fixed it for 
you.

And I changed the subject line to:

   MAINTAINERS: update Quan Xu's email address

before pushing to staging.

> ---
> MAINTAINERS | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS index 65c2447..97720a8 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -415,7 +415,7 @@ F:  xen/include/asm-x86/hvm/monitor.h
> 
> VTPM
> M: Daniel De Graaf <dgde...@tycho.nsa.gov>
> -M: Quan Xu <quan...@intel.com>
> +M: Quan Xu <xuqu...@huawei.com>
> S: Supported
> F: extras/mini-os/tpm*
> F: extras/mini-os/include/tpm*
> --
> 1.7.12.4

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH RESEND] MAINTAINERS: update email

2016-08-01 Thread Xuquan (Euler)
Signed-off-by: Quan Xu 
Reviewed-by: Kevin Tian 
---
MAINTAINERS | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 65c2447..97720a8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -415,7 +415,7 @@ F:  xen/include/asm-x86/hvm/monitor.h

VTPM
M: Daniel De Graaf 
-M: Quan Xu 
+M: Quan Xu 
S: Supported
F: extras/mini-os/tpm*
F: extras/mini-os/include/tpm*
--
1.7.12.4
___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel