Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-02-20 Thread Tian, Kevin
> From: Xuquan (Quan Xu) [mailto:xuqu...@huawei.com]
> Sent: Monday, February 20, 2017 8:04 PM
> 
> On February 13, 2017 4:21 PM, Tian, Kevin wrote:
> >> From: Jan Beulich [mailto:jbeul...@suse.com]
> >> Sent: Wednesday, February 08, 2017 4:52 PM
> >>
> >> >>> On 08.02.17 at 09:27,  wrote:
> >> > Assumed vCPU is in guest_mode..
> >> > When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(),
> >> > then
> >> > __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit
> >> > (also no
> >> > vcpu_kick() )..
> >> > In __vmx_deliver_posted_interrupt(), it is __conditional__ to
> >> > deliver posted interrupt. if posted interrupt is not delivered, the
> >> > posted interrupt is pending until next VM entry -- by PIR to vIRR..
> >> >
> >> > one condition is :
> >> > In __vmx_deliver_posted_interrupt(),  ' if (
> >> > !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' ..
> >> >
> >> > Specifically, we did verify it by RES interrupt, which is used for
> >> > smp_reschedule_interrupt..
> >> > We even cost more time to deliver RES interrupt than no-apicv in
> >average..
> >> >
> >> > If RES interrupt (no. 1) is delivered by posted way (the vcpu is
> >> > still guest_mode).. when tries to deliver next-coming RES interrupt
> >> > (no. 2) by posted way, The next-coming RES interrupt (no. 2) is not
> >> > delivered, as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES
> >> > interrupt (no. 1)..
> >> >
> >> > Then the next-coming RES interrupt (no. 2) is pending until next VM
> >> > entry -- by PIR to vIRR..
> >> >
> >> >
> >> > We can fix it as below(I don't think this is a best one, it is
> >> > better to set the VCPU_KICK_SOFTIRQ bit, but not test it):
> >> >
> >> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> >> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> >> > @@ -1846,7 +1846,7 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
> >> >  {
> >> >      unsigned int cpu = v->processor;
> >> >
> >> > -    if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
> >> > +    if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
> >> >           && (cpu != smp_processor_id()) )
> >> >          send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
> >> >  }
> >>
> >> While I don't think I fully understand your description, the line you
> >> change here has always been puzzling me: If we were to raise a softirq
> >> here, we ought to call cpu_raise_softirq() instead of partly open
> >> coding what it does.
> >
> >We require posted_intr_vector for target CPU to ack/deliver virtual
> >interrupt in non-root mode. cpu_raise_softirq uses a different vector, which
> >cannot trigger such effect.
> >
> 
> 
> Kevin,
> 
> I can't follow this 'to ack'..
> As I understand, the posted_intr_vector is to call event_check_interrupt()
> [ or pi_notification_interrupt() ] to write zero to the EOI register in the
> local APIC -- this dismisses the interrupt with the posted interrupt
> notification vector from the local APIC.
> 
> What does this ack refer to?
> 

Please look at SDM. 'ack' means evaluation of pending vIRRs when CPU is
in non-root mode which results in direct virtual interrupt delivery w/o
incurring VM-exit.

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-02-20 Thread Xuquan (Quan Xu)
On February 13, 2017 4:21 PM, Tian, Kevin wrote:
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Wednesday, February 08, 2017 4:52 PM
>>
>> >>> On 08.02.17 at 09:27,  wrote:
>> > Assumed vCPU is in guest_mode..
>> > When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(),
>> > then
>> > __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit
>> > (also no
>> > vcpu_kick() )..
>> > In __vmx_deliver_posted_interrupt(), it is __conditional__ to
>> > deliver posted interrupt. if posted interrupt is not delivered, the
>> > posted interrupt is pending until next VM entry -- by PIR to vIRR..
>> >
>> > one condition is :
>> > In __vmx_deliver_posted_interrupt(),  ' if (
>> > !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' ..
>> >
>> > Specifically, we did verify it by RES interrupt, which is used for
>> > smp_reschedule_interrupt..
>> > We even cost more time to deliver RES interrupt than no-apicv in
>average..
>> >
>> > If RES interrupt (no. 1) is delivered by posted way (the vcpu is
>> > still guest_mode).. when tries to deliver next-coming RES interrupt
>> > (no. 2) by posted way, The next-coming RES interrupt (no. 2) is not
>> > delivered, as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES
>> > interrupt (no. 1)..
>> >
>> > Then the next-coming RES interrupt (no. 2) is pending until next VM
>> > entry -- by PIR to vIRR..
>> >
>> >
>> > We can fix it as below(I don't think this is a best one, it is
>> > better to set the VCPU_KICK_SOFTIRQ bit, but not test it):
>> >
>> > --- a/xen/arch/x86/hvm/vmx/vmx.c
>> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> > @@ -1846,7 +1846,7 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
>> >  {
>> >      unsigned int cpu = v->processor;
>> >
>> > -    if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
>> > +    if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
>> >           && (cpu != smp_processor_id()) )
>> >          send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
>> >  }
>>
>> While I don't think I fully understand your description, the line you
>> change here has always been puzzling me: If we were to raise a softirq
>> here, we ought to call cpu_raise_softirq() instead of partly open
>> coding what it does.
>
>We require posted_intr_vector for target CPU to ack/deliver virtual
>interrupt in non-root mode. cpu_raise_softirq uses a different vector, which
>cannot trigger such effect.
>


Kevin,

I can't follow this 'to ack'..
As I understand, the posted_intr_vector is to call event_check_interrupt()
[ or pi_notification_interrupt() ] to write zero to the EOI register in the
local APIC -- this dismisses the interrupt with the posted interrupt
notification vector from the local APIC.

What does this ack refer to?






>> So I think not marking that softirq
>> pending (but doing this incompletely) is a valid change in any case.
>> But I'll have to defer to Kevin in the hopes that he fully understands
>> what you explain above as well as him knowing why this was a
>> test-and-set here in the first place.
>>
>
>I agree we have a misuse of softirq mechanism here. If guest is already in
>non-root mode, the 1st posted interrupt will be directly delivered to guest
>(leaving softirq being set w/o actually incurring a VM-exit - breaking desired
>softirq behavior). Then further posted interrupts will skip the IPI, stay in
>PIR and not noted until another VM-exit happens. Looks Quan observes such
>delay of delivery in his experiments.
>
>I'm OK to remove the set here. Actually since it's an optimization for less
>IPIs, we'd better check softirq_pending(cpu) directly instead of sticking to
>one bit only.
>
>Thanks
>Kevin
>
>
>




Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-02-13 Thread Xuquan (Quan Xu)
On February 13, 2017 4:24 PM, Tian, Kevin wrote:
>> From: Tian, Kevin
>> Sent: Monday, February 13, 2017 4:21 PM
>>
>> > From: Jan Beulich [mailto:jbeul...@suse.com]
>> > Sent: Wednesday, February 08, 2017 4:52 PM
>> >
>> > >>> On 08.02.17 at 09:27,  wrote:
>> > > Assumed vCPU is in guest_mode..
>> > > When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(),
>> > > then
>> > > __vmx_deliver_posted_interrupt() to deliver interrupt, but no
>> > > vmexit (also no
>> > > vcpu_kick() )..
>> > > In __vmx_deliver_posted_interrupt(), it is __conditional__ to
>> > > deliver posted interrupt. if posted interrupt is not delivered,
>> > > the posted interrupt is pending until next VM entry -- by PIR to vIRR..
>> > >
>> > > one condition is :
>> > > In __vmx_deliver_posted_interrupt(),  ' if (
>> > > !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' ..
>> > >
>> > > Specifically, we did verify it by RES interrupt, which is used for
>> > > smp_reschedule_interrupt..
>> > > We even cost more time to deliver RES interrupt than no-apicv in
>average..
>> > >
>> > > If RES interrupt (no. 1) is delivered by posted way (the vcpu is
>> > > still guest_mode).. when tries to deliver next-coming RES
>> > > interrupt (no. 2) by posted way, The next-coming RES interrupt
>> > > (no. 2) is not delivered, as we set the VCPU_KICK_SOFTIRQ bit when
>> > > we deliver RES interrupt (no. 1)..
>> > >
>> > > Then the next-coming RES interrupt (no. 2) is pending until next
>> > > VM entry -- by PIR to vIRR..
>> > >
>> > >
>> > > We can fix it as below(I don't think this is a best one, it is
>> > > better to set the VCPU_KICK_SOFTIRQ bit, but not test it):
>> > >
>> > > --- a/xen/arch/x86/hvm/vmx/vmx.c
>> > > +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> > > @@ -1846,7 +1846,7 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
>> > >  {
>> > >      unsigned int cpu = v->processor;
>> > >
>> > > -    if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
>> > > +    if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
>> > >           && (cpu != smp_processor_id()) )
>> > >          send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
>> > >  }
>> >
>> > While I don't think I fully understand your description, the line
>> > you change here has always been puzzling me: If we were to raise a
>> > softirq here, we ought to call cpu_raise_softirq() instead of partly
>> > open coding what it does.
>>
>> We require posted_intr_vector for target CPU to ack/deliver virtual
>> interrupt in non-root mode. cpu_raise_softirq uses a different vector,
>> which cannot trigger such effect.
>>
>> > So I think not marking that softirq
>> > pending (but doing this incompletely) is a valid change in any case.
>> > But I'll have to defer to Kevin in the hopes that he fully
>> > understands what you explain above as well as him knowing why this
>> > was a test-and-set here in the first place.
>> >
>>
>> I agree we have a misuse of softirq mechanism here. If guest is
>> already in non-root mode, the 1st posted interrupt will be directly
>> delivered to guest (leaving softirq being set w/o actually incurring a
>> VM-exit - breaking desired softirq behavior). Then further posted
>> interrupts will skip the IPI, stay in PIR and not noted until another
>> VM-exit happens. Looks Quan observes such delay of delivery in his
>> experiments.
>>
>> I'm OK to remove the set here. Actually since it's an optimization for
>> less IPIs, we'd better check softirq_pending(cpu) directly instead of
>> sticking to one bit only.
>>
>
>sent too fast... Quan, can you work out a patch following this suggestion
>and see whether your slow-delivery issue is solved?
>(hope I understand your issue correctly here).
>

Cool, Very Correct!! 
Sure, I will send out a patch in coming days..

Quan



Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-02-13 Thread Tian, Kevin
> From: Tian, Kevin
> Sent: Monday, February 13, 2017 4:21 PM
> 
> > From: Jan Beulich [mailto:jbeul...@suse.com]
> > Sent: Wednesday, February 08, 2017 4:52 PM
> >
> > >>> On 08.02.17 at 09:27,  wrote:
> > > Assumed vCPU is in guest_mode..
> > > When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), then
> > > > __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit
> > > > (also no vcpu_kick() )..
> > > > In __vmx_deliver_posted_interrupt(), it is __conditional__ to deliver
> > > > posted interrupt. if posted interrupt is not delivered, the posted
> > > > interrupt is pending until next VM entry -- by PIR to vIRR..
> > >
> > > one condition is :
> > > > In __vmx_deliver_posted_interrupt(),  ' if (
> > > > !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' ..
> > >
> > > Specifically, we did verify it by RES interrupt, which is used for
> > > smp_reschedule_interrupt..
> > > We even cost more time to deliver RES interrupt than no-apicv in average..
> > >
> > > If RES interrupt (no. 1) is delivered by posted way (the vcpu is still
> > > guest_mode).. when tries to deliver next-coming RES interrupt (no. 2) by
> > > posted way,
> > > The next-coming RES interrupt (no. 2) is not delivered, as we set the
> > > VCPU_KICK_SOFTIRQ bit when we deliver RES interrupt (no. 1)..
> > >
> > > > Then the next-coming RES interrupt (no. 2) is pending until next VM
> > > > entry -- by PIR to vIRR..
> > > >
> > > >
> > > > We can fix it as below(I don't think this is a best one, it is better
> > > > to set the VCPU_KICK_SOFTIRQ bit, but not test it):
> > >
> > > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > > @@ -1846,7 +1846,7 @@ static void __vmx_deliver_posted_interrupt(struct 
> > > vcpu *v)
> > >  {
> > >  unsigned int cpu = v->processor;
> > >
> > > -if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, _pending(cpu))
> > > +if ( !test_bit(VCPU_KICK_SOFTIRQ, _pending(cpu))
> > >   && (cpu != smp_processor_id()) )
> > >  send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
> > >  }
> >
> > While I don't think I fully understand your description, the line you
> > change here has always been puzzling me: If we were to raise a
> > softirq here, we ought to call cpu_raise_softirq() instead of partly
> > open coding what it does.
> 
> We require posted_intr_vector for target CPU to ack/deliver virtual
> interrupt in non-root mode. cpu_raise_softirq uses a different vector,
> which cannot trigger such effect.
> 
> > So I think not marking that softirq
> > pending (but doing this incompletely) is a valid change in any case.
> > But I'll have to defer to Kevin in the hopes that he fully
> > understands what you explain above as well as him knowing why
> > this was a test-and-set here in the first place.
> >
> 
> I agree we have a misuse of softirq mechanism here. If guest
> is already in non-root mode, the 1st posted interrupt will be directly
> delivered to guest (leaving softirq being set w/o actually incurring a
> VM-exit - breaking desired softirq behavior). Then further posted
> interrupts will skip the IPI, stay in PIR and not noted until another
> VM-exit happens. Looks Quan observes such delay of delivery in
> his experiments.
> 
> I'm OK to remove the set here. Actually since it's an optimization
> for less IPIs, we'd better check softirq_pending(cpu) directly
> instead of sticking to one bit only.
>

sent too fast... Quan, can you work out a patch following this
suggestion and see whether your slow-delivery issue is solved?
(hope I understand your issue correctly here).

Thanks
Kevin



Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-02-13 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Wednesday, February 08, 2017 4:52 PM
> 
> >>> On 08.02.17 at 09:27,  wrote:
> > Assumed vCPU is in guest_mode..
> > When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), then
> > > __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit
> > > (also no vcpu_kick() )..
> > In __vmx_deliver_posted_interrupt(), it is __conditional__ to deliver posted
> > interrupt. if posted interrupt is not delivered, the posted interrupt is
> > pending until next VM entry -- by PIR to vIRR..
> >
> > one condition is :
> > > In __vmx_deliver_posted_interrupt(),  ' if (
> > > !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' ..
> >
> > Specifically, we did verify it by RES interrupt, which is used for
> > smp_reschedule_interrupt..
> > We even cost more time to deliver RES interrupt than no-apicv in average..
> >
> > If RES interrupt (no. 1) is delivered by posted way (the vcpu is still
> > guest_mode).. when tries to deliver next-coming RES interrupt (no. 2) by
> > posted way,
> > The next-coming RES interrupt (no. 2) is not delivered, as we set the
> > VCPU_KICK_SOFTIRQ bit when we deliver RES interrupt (no. 1)..
> >
> > > Then the next-coming RES interrupt (no. 2) is pending until next VM entry
> > > -- by PIR to vIRR..
> >
> >
> > We can fix it as below(I don't think this is a best one, it is better to set
> > the VCPU_KICK_SOFTIRQ bit, but not test it):
> >
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -1846,7 +1846,7 @@ static void __vmx_deliver_posted_interrupt(struct 
> > vcpu *v)
> >  {
> >  unsigned int cpu = v->processor;
> >
> > -if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, _pending(cpu))
> > +if ( !test_bit(VCPU_KICK_SOFTIRQ, _pending(cpu))
> >   && (cpu != smp_processor_id()) )
> >  send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
> >  }
> 
> While I don't think I fully understand your description, the line you
> change here has always been puzzling me: If we were to raise a
> softirq here, we ought to call cpu_raise_softirq() instead of partly
> open coding what it does. 

We require posted_intr_vector for target CPU to ack/deliver virtual 
interrupt in non-root mode. cpu_raise_softirq uses a different vector,
which cannot trigger such effect. 

> So I think not marking that softirq
> pending (but doing this incompletely) is a valid change in any case.
> But I'll have to defer to Kevin in the hopes that he fully
> understands what you explain above as well as him knowing why
> this was a test-and-set here in the first place.
> 

I agree we are misusing the softirq mechanism here. If the guest
is already in non-root mode, the 1st posted interrupt is delivered
directly to the guest (leaving the softirq bit set w/o actually incurring
a VM-exit, which breaks the desired softirq behavior). Further posted
interrupts then skip the IPI, stay in PIR and go unnoticed until another
VM-exit happens. It looks like Quan observes exactly this delivery delay
in his experiments.

I'm OK with removing the set here. Actually, since the check is an
optimization to send fewer IPIs, we'd better check softirq_pending(cpu)
as a whole instead of sticking to one bit only.

Thanks
Kevin
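
For readers following the thread, the delayed delivery described above can be reproduced in a small user-space model. This is only a sketch and not Xen code: struct cpu_model, deliver_test_and_set(), deliver_check_only(), ipis_for_burst() and the bit index 3 are all made up for illustration, and the model assumes the target vCPU stays in non-root mode for the whole burst, so nothing ever clears the pending bit.

```c
#include <stdbool.h>

#define VCPU_KICK_SOFTIRQ 3 /* bit index chosen for this model only */

/* Hypothetical model of one remote CPU running a guest in non-root mode. */
struct cpu_model {
    unsigned long softirq_pending; /* stands in for softirq_pending(cpu) */
    unsigned int ipis_sent;        /* notification IPIs sent to this CPU */
};

/* Behaviour under discussion: set the kick bit, send an IPI only if the
 * bit was previously clear. */
static void deliver_test_and_set(struct cpu_model *c)
{
    bool was_set = (c->softirq_pending & (1UL << VCPU_KICK_SOFTIRQ)) != 0;

    c->softirq_pending |= 1UL << VCPU_KICK_SOFTIRQ;
    if (!was_set)
        c->ipis_sent++;
}

/* Suggested behaviour: look at the whole pending mask, never modify it. */
static void deliver_check_only(struct cpu_model *c)
{
    if (c->softirq_pending == 0)
        c->ipis_sent++;
}

/* Post n interrupts back to back while the guest never exits and return
 * how many notification IPIs were sent. */
static unsigned int ipis_for_burst(void (*deliver)(struct cpu_model *),
                                   unsigned int n)
{
    struct cpu_model c = { 0, 0 };

    for (unsigned int i = 0; i < n; i++)
        deliver(&c);
    return c.ipis_sent;
}
```

In this model, ipis_for_burst(deliver_test_and_set, 2) gives 1 (the second interrupt gets no IPI and waits in PIR), while ipis_for_burst(deliver_check_only, 2) gives 2.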







Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-02-09 Thread Chao Gao
On Thu, Feb 09, 2017 at 08:51:46AM +, Xuquan (Quan Xu) wrote:
>On February 08, 2017 4:22 PM, Chao Gao wrote:
>>On Wed, Feb 08, 2017 at 10:15:28AM +, Xuquan (Quan Xu) wrote:
>>>On February 08, 2017 4:52 PM, Jan Beulich wrote:
>>>>>>> On 08.02.17 at 09:27,  wrote:
>>>>> Assumed vCPU is in guest_mode..
>>>>> When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(),
>>>>> then
>>>>> __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit
>>>>> (also no
>>>>> vcpu_kick() )..
>>>>> In __vmx_deliver_posted_interrupt(), it is __conditional__ to
>>>>> deliver posted interrupt. if posted interrupt is not delivered, the
>>>>> posted interrupt is pending until next VM entry -- by PIR to vIRR..
>>>>>
>>>>> one condition is :
>>>>> In __vmx_deliver_posted_interrupt(),  ' if (
>>>>> !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' ..
>>>>>
>>>>> Specifically, we did verify it by RES interrupt, which is used for
>>>>> smp_reschedule_interrupt..
>>>>> We even cost more time to deliver RES interrupt than no-apicv in
>>>>> average..
>>>>>
>>>>> If RES interrupt (no. 1) is delivered by posted way (the vcpu is
>>>>> still guest_mode).. when tries to deliver next-coming RES interrupt
>>>>> (no. 2) by posted way, The next-coming RES interrupt (no. 2) is not
>>>>> delivered, as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES
>>>>> interrupt (no. 1)..
>>>>>
>>>>> Then the next-coming RES interrupt (no. 2) is pending until next VM
>>>>> entry -- by PIR to vIRR..
>>>>>
>>>>>
>>>>> We can fix it as below(I don't think this is a best one, it is
>>>>> better to set the VCPU_KICK_SOFTIRQ bit, but not test it):
>>>>>
>>>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>>>> @@ -1846,7 +1846,7 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
>>>>>  {
>>>>>      unsigned int cpu = v->processor;
>>>>>
>>>>> -    if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
>>>>> +    if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
>>>>>           && (cpu != smp_processor_id()) )
>>>>>          send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
>>>>>  }
>>>>
>>>>While I don't think I fully understand your description,
>>>
>>>Sorry!!
>>>
>>>>the line you change
>>>>here has always been puzzling me: If we were to raise a softirq here,
>>>>we ought to call cpu_raise_softirq() instead of partly open coding what it
>>>>does.
>>>>So I think not marking that softirq pending (but doing this
>>>>incompletely) is a valid change in any case.
>>>
>>>As comments in pi_notification_interrupt()  --
>>>xen/arch/x86/hvm/vmx/vmx.c 
>>> *
>>> * we need to set VCPU_KICK_SOFTIRQ for the current cpu, just like
>>> * __vmx_deliver_posted_interrupt(). So the pending interrupt in
>>PIRR will
>>> * be synced to vIRR before VM-Exit in time.
>>> *
>>>
>>>
>>>I think setting VCPU_KICK_SOFTIRQ bit -- the pending interrupt in PIRR will
>>be synced to vIRR before VM-Exit in time.
>>>That's also why i said it is better to set the VCPU_KICK_SOFTIRQ bit, but
>>not test it..
>>>
>>
>>I think there is a typo. It should be "before VM-Entry in time". It set
>>VCPU_KICK_SOFTIRQ bit only to jump to vmx_do_vmentry again instead of
>>entering guest directly. Jumping to vmx_do_vmentry again can re-sync the
>>PIR to vIRR in vmx_intr_assist(). 
>
>impressive analysis..
>chao, could you show the related code?
>

In xen/arch/x86/hvm/vmx/entry.S,

.Lvmx_do_vmentry:
        call vmx_intr_assist
        ...
        cli
        cmp  %ecx,(%rdx,%rax,1)
        jnz  .Lvmx_process_softirqs

and

.Lvmx_process_softirqs:
        sti
        call do_softirq
        jmp  .Lvmx_do_vmentry

In vmx_intr_assist(), PIR is synced to vIRR. If further interrupts are
posted in PIR after vmx_intr_assist() has run, pi_notification_interrupt()
sets VCPU_KICK_SOFTIRQ, so the code jumps to .Lvmx_process_softirqs and
then back to .Lvmx_do_vmentry, where vmx_intr_assist() runs again.

Thanks
Chao
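
The retry loop described above can be mimicked in plain C to make the control flow easier to follow. This is a rough user-space sketch under stated assumptions, not the Xen implementation: struct vcpu_model, intr_assist_model(), late_notification(), vmentry_model() and the 32-bit PIR/vIRR bitmaps are hypothetical, and the "late" interrupt is injected artificially between the sync and the softirq check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vCPU state; one bit per vector, 32 vectors only. */
struct vcpu_model {
    uint32_t pir;              /* posted-interrupt requests          */
    uint32_t virr;             /* virtual IRR seen by the guest      */
    uint32_t late_pir;         /* vectors posted after the first sync */
    bool kick_softirq;         /* stands in for VCPU_KICK_SOFTIRQ    */
    unsigned int assist_calls; /* how often the sync step ran        */
};

/* Sync step: move everything pending in PIR into vIRR. */
static void intr_assist_model(struct vcpu_model *v)
{
    v->virr |= v->pir;
    v->pir = 0;
    v->assist_calls++;
}

/* A notification arriving in root mode after the sync: the handler only
 * marks the softirq as pending, it does not touch vIRR itself. */
static void late_notification(struct vcpu_model *v)
{
    if (v->late_pir) {
        v->pir |= v->late_pir;
        v->late_pir = 0;
        v->kick_softirq = true;
    }
}

/* Entry path: sync, check for pending softirqs, and go round again if
 * one is pending instead of entering the guest. */
static void vmentry_model(struct vcpu_model *v)
{
    for (;;) {
        intr_assist_model(v);
        late_notification(v);
        if (!v->kick_softirq)
            break;               /* nothing pending: enter the guest */
        v->kick_softirq = false; /* softirq processed, then retry    */
    }
}
```

Starting from pir = 0x1 and late_pir = 0x4, the loop runs the sync step twice and ends with virr = 0x5, i.e. the late vector reaches vIRR before the (modelled) VM entry.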

>Quan
>
>
>>In root-mode, cpu treat the pi notification
>>interrupt as normal interrupt, so cpu will run the interrupt handler
>>pi_notification_interrupt() instead of syncing PIR to vIRR automatically.
>>Receiving a pi notification interrupt means some interrupts have been
>>posted in PIR. Setting that bit is to deliver these new arrival interrupts
>>before this VM-entry.
>>
>



Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-02-09 Thread Xuquan (Quan Xu)
On February 08, 2017 4:22 PM, Chao Gao wrote:
>On Wed, Feb 08, 2017 at 10:15:28AM +, Xuquan (Quan Xu) wrote:
>>On February 08, 2017 4:52 PM, Jan Beulich wrote:
>>>>>> On 08.02.17 at 09:27,  wrote:
>>>> Assumed vCPU is in guest_mode..
>>>> When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(),
>>>> then
>>>> __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit
>>>> (also no
>>>> vcpu_kick() )..
>>>> In __vmx_deliver_posted_interrupt(), it is __conditional__ to
>>>> deliver posted interrupt. if posted interrupt is not delivered, the
>>>> posted interrupt is pending until next VM entry -- by PIR to vIRR..
>>>>
>>>> one condition is :
>>>> In __vmx_deliver_posted_interrupt(),  ' if (
>>>> !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' ..
>>>>
>>>> Specifically, we did verify it by RES interrupt, which is used for
>>>> smp_reschedule_interrupt..
>>>> We even cost more time to deliver RES interrupt than no-apicv in
>>>> average..
>>>>
>>>> If RES interrupt (no. 1) is delivered by posted way (the vcpu is
>>>> still guest_mode).. when tries to deliver next-coming RES interrupt
>>>> (no. 2) by posted way, The next-coming RES interrupt (no. 2) is not
>>>> delivered, as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES
>>>> interrupt (no. 1)..
>>>>
>>>> Then the next-coming RES interrupt (no. 2) is pending until next VM
>>>> entry -- by PIR to vIRR..
>>>>
>>>>
>>>> We can fix it as below(I don't think this is a best one, it is
>>>> better to set the VCPU_KICK_SOFTIRQ bit, but not test it):
>>>>
>>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>>> @@ -1846,7 +1846,7 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
>>>>  {
>>>>      unsigned int cpu = v->processor;
>>>>
>>>> -    if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
>>>> +    if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
>>>>           && (cpu != smp_processor_id()) )
>>>>          send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
>>>>  }
>>>
>>>While I don't think I fully understand your description,
>>
>>Sorry!!
>>
>>>the line you change
>>>here has always been puzzling me: If we were to raise a softirq here,
>>>we ought to call cpu_raise_softirq() instead of partly open coding what it
>does.
>>>So I think not marking that softirq pending (but doing this
>>>incompletely) is a valid change in any case.
>>
>>As comments in pi_notification_interrupt()  --
>>xen/arch/x86/hvm/vmx/vmx.c 
>> *
>> * we need to set VCPU_KICK_SOFTIRQ for the current cpu, just like
>> * __vmx_deliver_posted_interrupt(). So the pending interrupt in
>PIRR will
>> * be synced to vIRR before VM-Exit in time.
>> *
>>
>>
>>I think setting VCPU_KICK_SOFTIRQ bit -- the pending interrupt in PIRR will
>be synced to vIRR before VM-Exit in time.
>>That's also why i said it is better to set the VCPU_KICK_SOFTIRQ bit, but
>not test it..
>>
>
>I think there is a typo. It should be "before VM-Entry in time". It set
>VCPU_KICK_SOFTIRQ bit only to jump to vmx_do_vmentry again instead of
>entering guest directly. Jumping to vmx_do_vmentry again can re-sync the
>PIR to vIRR in vmx_intr_assist(). 

impressive analysis..
chao, could you show the related code?

Quan


>In root-mode, cpu treat the pi notification
>interrupt as normal interrupt, so cpu will run the interrupt handler
>pi_notification_interrupt() instead of syncing PIR to vIRR automatically.
>Receiving a pi notification interrupt means some interrupts have been
>posted in PIR. Setting that bit is to deliver these new arrival interrupts
>before this VM-entry.
>




Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-02-08 Thread Chao Gao
On Wed, Feb 08, 2017 at 10:15:28AM +, Xuquan (Quan Xu) wrote:
>On February 08, 2017 4:52 PM, Jan Beulich wrote:
>>>>> On 08.02.17 at 09:27,  wrote:
>>> Assumed vCPU is in guest_mode..
>>> When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(),
>>> then
>>> __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit
>>> (also no
>>> vcpu_kick() )..
>>> In __vmx_deliver_posted_interrupt(), it is __conditional__ to deliver
>>> posted interrupt. if posted interrupt is not delivered, the posted
>>> interrupt is pending until next VM entry -- by PIR to vIRR..
>>>
>>> one condition is :
>>> In __vmx_deliver_posted_interrupt(),  ' if (
>>> !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' ..
>>>
>>> Specifically, we did verify it by RES interrupt, which is used for
>>> smp_reschedule_interrupt..
>>> We even cost more time to deliver RES interrupt than no-apicv in
>>average..
>>>
>>> If RES interrupt (no. 1) is delivered by posted way (the vcpu is still
>>> guest_mode).. when tries to deliver next-coming RES interrupt (no. 2)
>>> by posted way, The next-coming RES interrupt (no. 2) is not delivered,
>>> as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES interrupt (no.
>>> 1)..
>>>
>>> Then the next-coming RES interrupt (no. 2) is pending until next VM
>>> entry -- by PIR to vIRR..
>>>
>>>
>>> We can fix it as below(I don't think this is a best one, it is better
>>> to set the VCPU_KICK_SOFTIRQ bit, but not test it):
>>>
>>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>>> @@ -1846,7 +1846,7 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
>>>  {
>>>      unsigned int cpu = v->processor;
>>>
>>> -    if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
>>> +    if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
>>>           && (cpu != smp_processor_id()) )
>>>          send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
>>>  }
>>
>>While I don't think I fully understand your description, 
>
>Sorry!!
>
>>the line you change
>>here has always been puzzling me: If we were to raise a softirq here, we
>>ought to call cpu_raise_softirq() instead of partly open coding what it does.
>>So I think not marking that softirq pending (but doing this incompletely) is
>>a valid change in any case.
>
>As comments in pi_notification_interrupt()  -- xen/arch/x86/hvm/vmx/vmx.c
>
> *
> * we need to set VCPU_KICK_SOFTIRQ for the current cpu, just like
> * __vmx_deliver_posted_interrupt(). So the pending interrupt in PIRR will
> * be synced to vIRR before VM-Exit in time.
> *
>
>
>I think setting VCPU_KICK_SOFTIRQ bit -- the pending interrupt in PIRR will be 
>synced to vIRR before VM-Exit in time.
>That's also why i said it is better to set the VCPU_KICK_SOFTIRQ bit, but not 
>test it..
>

I think there is a typo. It should be "before VM-Entry in time". The
VCPU_KICK_SOFTIRQ bit is set only to jump to vmx_do_vmentry again instead of
entering the guest directly. Jumping to vmx_do_vmentry again re-syncs PIR to
vIRR in vmx_intr_assist(). In root mode, the CPU treats the PI notification
interrupt as a normal interrupt, so it runs the interrupt handler
pi_notification_interrupt() instead of syncing PIR to vIRR automatically.
Receiving a PI notification interrupt means some interrupts have been posted
in PIR. Setting that bit ensures these newly arrived interrupts are delivered
before this VM-entry.

Thanks
chao
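
To make the root-mode versus non-root-mode distinction above concrete, here is a very small C sketch. It is only an illustration under assumptions: struct pcpu_model and notification_arrives() are hypothetical names, the bitmaps are reduced to 32 bits, and the non-root branch stands in for what the hardware does on its own rather than for any hypervisor code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a physical CPU that receives the PI notification
 * vector while it is either running the guest or running the hypervisor. */
struct pcpu_model {
    bool in_non_root;  /* true while the guest is executing          */
    uint32_t pir;      /* posted-interrupt requests                  */
    uint32_t virr;     /* virtual IRR                                */
    bool kick_softirq; /* stands in for the VCPU_KICK_SOFTIRQ bit    */
};

static void notification_arrives(struct pcpu_model *p)
{
    if (p->in_non_root) {
        /* Non-root mode: PIR is merged into vIRR without a VM exit. */
        p->virr |= p->pir;
        p->pir = 0;
    } else {
        /* Root mode: the vector is an ordinary interrupt, so the handler
         * runs and only marks the softirq; PIR is synced later on the
         * VM-entry path. */
        p->kick_softirq = true;
    }
}
```

With pir = 0x2, a CPU in non-root mode ends up with virr = 0x2 and no softirq marked, whereas a CPU in root mode keeps pir = 0x2 and has kick_softirq set.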

>
>>But I'll have to defer to Kevin in the hopes that he fully understands what
>>you explain above as well as him knowing why this was a test-and-set here
>>in the first place.
>>
>
>To me, this test-and-set is a bug.
>
>Quan
>



Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-02-08 Thread Xuquan (Quan Xu)
On February 08, 2017 4:52 PM, Jan Beulich wrote:
>>>> On 08.02.17 at 09:27,  wrote:
>> Assumed vCPU is in guest_mode..
>> When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(),
>> then
>> __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit
>> (also no
>> vcpu_kick() )..
>> In __vmx_deliver_posted_interrupt(), it is __conditional__ to deliver
>> posted interrupt. if posted interrupt is not delivered, the posted
>> interrupt is pending until next VM entry -- by PIR to vIRR..
>>
>> one condition is :
>> In __vmx_deliver_posted_interrupt(),  ' if (
>> !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' ..
>>
>> Specifically, we did verify it by RES interrupt, which is used for
>> smp_reschedule_interrupt..
>> We even cost more time to deliver RES interrupt than no-apicv in
>average..
>>
>> If RES interrupt (no. 1) is delivered by posted way (the vcpu is still
>> guest_mode).. when tries to deliver next-coming RES interrupt (no. 2)
>> by posted way, The next-coming RES interrupt (no. 2) is not delivered,
>> as we set the VCPU_KICK_SOFTIRQ bit when we deliver RES interrupt (no.
>> 1)..
>>
>> Then the next-coming RES interrupt (no. 2) is pending until next VM
>> entry -- by PIR to vIRR..
>>
>>
>> We can fix it as below(I don't think this is a best one, it is better
>> to set the VCPU_KICK_SOFTIRQ bit, but not test it):
>>
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -1846,7 +1846,7 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
>>  {
>>      unsigned int cpu = v->processor;
>>
>> -    if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
>> +    if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
>>           && (cpu != smp_processor_id()) )
>>          send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
>>  }
>
>While I don't think I fully understand your description, 

Sorry!!

>the line you change
>here has always been puzzling me: If we were to raise a softirq here, we
>ought to call cpu_raise_softirq() instead of partly open coding what it does.
>So I think not marking that softirq pending (but doing this incompletely) is
>a valid change in any case.

As comments in pi_notification_interrupt()  -- xen/arch/x86/hvm/vmx/vmx.c

 *
 * we need to set VCPU_KICK_SOFTIRQ for the current cpu, just like
 * __vmx_deliver_posted_interrupt(). So the pending interrupt in PIRR will
 * be synced to vIRR before VM-Exit in time.
 *


I think setting the VCPU_KICK_SOFTIRQ bit ensures that the pending interrupt
in PIRR is synced to vIRR before VM-Exit in time.
That's also why I said it is better to set the VCPU_KICK_SOFTIRQ bit, but not
test it.


>But I'll have to defer to Kevin in the hopes that he fully understands what
>you explain above as well as him knowing why this was a test-and-set here
>in the first place.
>

To me, this test-and-set is a bug.

Quan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-02-08 Thread Jan Beulich
>>> On 08.02.17 at 09:27,  wrote:
> Assumed vCPU is in guest_mode..
> When apicv is enabled, hypervisor calls vmx_deliver_posted_intr(), then 
> __vmx_deliver_posted_interrupt() to deliver interrupt, but no vmexit (also no 
> vcpu_kick() )..
> In __vmx_deliver_posted_interrupt(), it is __conditional__ to deliver posted 
> interrupt. if posted interrupt is not delivered, the posted interrupt is 
> pending until next VM entry -- by PIR to vIRR.. 
> 
> one condition is :
> In __vmx_deliver_posted_interrupt(),  ' if ( 
> !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' ..
> 
> Specifically, we did verify it by RES interrupt, which is used for 
> smp_reschedule_interrupt..
> We even cost more time to deliver RES interrupt than no-apicv in average..
> 
> If RES interrupt (no. 1) is delivered by posted way (the vcpu is still 
> guest_mode).. when tries to deliver next-coming RES interrupt (no. 2) by 
> posted way,
> The next-coming RES interrupt (no. 2) is not delivered, as we set the 
> VCPU_KICK_SOFTIRQ bit when we deliver RES interrupt (no. 1)..
> 
> Then the next-coming RES interrupt (no. 2) is pending until next VM entry -- 
> by 
> PIR to vIRR..
> 
> 
> We can fix it as below(I don't think this is a best one, it is better to set 
> the VCPU_KICK_SOFTIRQ bit, but not test it):
> 
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -1846,7 +1846,7 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
>  {
>      unsigned int cpu = v->processor;
>
> -    if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
> +    if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
>           && (cpu != smp_processor_id()) )
>          send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
>  }

While I don't think I fully understand your description, the line you
change here has always been puzzling me: If we were to raise a
softirq here, we ought to call cpu_raise_softirq() instead of partly
open coding what it does. So I think not marking that softirq
pending (but doing this incompletely) is a valid change in any case.
But I'll have to defer to Kevin in the hopes that he fully
understands what you explain above as well as him knowing why
this was a test-and-set here in the first place.

Jan




Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-02-08 Thread Xuquan (Quan Xu)
On February 08, 2017 2:51 PM, Tian, Kevin wrote:
>> From: Xuquan (Quan Xu) [mailto:xuqu...@huawei.com]
>> Sent: Monday, January 23, 2017 6:57 PM
>>
>> On January 20, 2017 5:09 PM, Quan Xu wrote:
>> >btw, for PIR.. I find that there might be a bug in
>> >__vmx_deliver_posted_interrupt()...
>> >why test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) ??
>> >
>> >static void __vmx_deliver_posted_interrupt(struct vcpu *v) { ...
>> >if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) ...
>> >}
>> >
>> >Suppose that vCPUx is in guest mode, there are two (even more)
>> >interrupts to vCPUx..
>> >As the bit is set when delivers the first interrupt... the second
>> >interrupt is pending until next VM entry -- by PIR to vIRR..
>> >
>>
>> Jan , Kevin
>> Correct me if I am wrong...
>>
>> Quan
>
>I don't quite understand the point here. Can you elaborate?
>

Assume the vCPU is in guest mode.
When APICv is enabled, the hypervisor calls vmx_deliver_posted_intr(), which
then calls __vmx_deliver_posted_interrupt() to deliver the interrupt, with no
VM exit (and no vcpu_kick()).
In __vmx_deliver_posted_interrupt(), delivery of the posted interrupt is
__conditional__. If the posted interrupt is not delivered, it stays pending
until the next VM entry, when PIR is synced to vIRR.

One of the conditions is this check in __vmx_deliver_posted_interrupt():
' if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))' ..

Specifically, we verified this with the RES interrupt, which is used for
smp_reschedule_interrupt.
On average, delivering the RES interrupt even took longer with APICv than
without it.

If RES interrupt no. 1 is delivered the posted way (the vCPU is still in
guest mode), then when we try to deliver the next RES interrupt (no. 2) the
posted way, it is not delivered, because we already set the
VCPU_KICK_SOFTIRQ bit when delivering RES interrupt no. 1.

RES interrupt no. 2 then stays pending until the next VM entry, when PIR is
synced to vIRR.


We can fix it as below (I don't think this is the best fix; it would be
better to set the VCPU_KICK_SOFTIRQ bit, but not test it):

--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1846,7 +1846,7 @@ static void __vmx_deliver_posted_interrupt(struct vcpu *v)
 {
     unsigned int cpu = v->processor;

-    if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
+    if ( !test_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
          && (cpu != smp_processor_id()) )
         send_IPI_mask(cpumask_of(cpu), posted_intr_vector);
 }
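
To make the scenario easier to follow, here is a minimal user-space sketch of
the two variants. It is not Xen code: kick_pending, ipis_sent, test_and_set()
and both deliver_*() helpers are hypothetical stand-ins. It assumes the target
vCPU stays in guest mode, so nothing clears the bit between the two
deliveries, and it ignores both atomicity and the
(cpu != smp_processor_id()) part of the condition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the VCPU_KICK_SOFTIRQ bit of the target CPU. */
static bool kick_pending;
/* Counts the notification IPIs that would have been sent. */
static unsigned int ipis_sent;

/* Non-atomic model of test_and_set_bit(): set the flag, return the old value. */
static bool test_and_set(bool *flag)
{
    bool old = *flag;

    *flag = true;
    return old;
}

/* Current logic: send the IPI only if the bit was previously clear. */
static void deliver_test_and_set(void)
{
    if ( !test_and_set(&kick_pending) )
        ipis_sent++;
}

/* Logic from the patch above: only test the bit, never set it. */
static void deliver_test_only(void)
{
    if ( !kick_pending )
        ipis_sent++;
}

/* Deliver two back-to-back interrupts and return how many IPIs went out. */
static unsigned int two_interrupts(void (*deliver)(void))
{
    assert(deliver);
    kick_pending = false;
    ipis_sent = 0;
    deliver();
    deliver();
    return ipis_sent;
}
```

Under these assumptions, two_interrupts(deliver_test_and_set) counts a single
IPI, while two_interrupts(deliver_test_only) counts two.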



To be honest, I really spent several days on the original awkward
description :)

Happy Spring Festival!!
Quan








Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-02-07 Thread Tian, Kevin
> From: Xuquan (Quan Xu) [mailto:xuqu...@huawei.com]
> Sent: Monday, January 23, 2017 6:57 PM
> 
> On January 20, 2017 5:09 PM, Quan Xu wrote:
> >btw, for PIR.. I find that there might be a bug in
> >__vmx_deliver_posted_interrupt()...
> >why test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) ??
> >
> >static void __vmx_deliver_posted_interrupt(struct vcpu *v) { ...
> >if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) ...
> >}
> >
> >Suppose that vCPUx is in guest mode, there are two (even more) interrupts
> >to vCPUx..
> >As the bit is set when delivers the first interrupt... the second interrupt 
> >is
> >pending until next VM entry -- by PIR to vIRR..
> >
> 
> Jan , Kevin
> Correct me if I am wrong...
> 
> Quan

I don't quite understand the point here. Can you elaborate?

Thanks
Kevin



Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-01-23 Thread Xuquan (Quan Xu)
On January 20, 2017 5:09 PM, Quan Xu wrote:
>btw, for PIR.. I find that there might be a bug in
>__vmx_deliver_posted_interrupt()...
>why test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) ??
>
>static void __vmx_deliver_posted_interrupt(struct vcpu *v) { ...
>if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) ...
>}
>
>Suppose that vCPUx is in guest mode, there are two (even more) interrupts
>to vCPUx..
>As the bit is set when delivers the first interrupt... the second interrupt is
>pending until next VM entry -- by PIR to vIRR..
>

Jan , Kevin
Correct me if I am wrong...

Quan




Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-01-23 Thread Jan Beulich
>>> On 22.01.17 at 05:35,  wrote:
> Yes, I asked Chao to add some debug info in that case. The problem now
> is when we will reproduce the bug to see meaningful hint...

If written reasonably, feel free to submit the patch for considering to
put it into staging for a while. The bigger problem is that we'd then
need to constantly scan logs for incidents.

Jan




Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-01-21 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Friday, January 20, 2017 7:49 PM
> 
> >>> On 18.01.17 at 11:23,  wrote:
> >>  From: Jan Beulich [mailto:jbeul...@suse.com]
> >> Sent: Wednesday, January 18, 2017 5:38 PM
> >>
> >> >>> On 18.01.17 at 05:57,  wrote:
> >> > Attached was my earlier comment:
> >> >
> >> > --
> >> >> >>> On 20.12.16 at 06:37,  wrote:
> >> >> >>  From: Xuquan (Quan Xu) [mailto:xuqu...@huawei.com]
> >> >> >> Sent: Friday, December 16, 2016 5:40 PM
> >> >> >> -if (pt_vector != -1)
> >> >> >> -vmx_set_eoi_exit_bitmap(v, pt_vector);
> >> >> >> +if ( pt_vector != -1 ) {
> >> >> >> +if ( intack.vector > pt_vector )
> >> >> >> +vmx_set_eoi_exit_bitmap(v, intack.vector);
> >> >> >> +else
> >> >> >> +vmx_set_eoi_exit_bitmap(v, pt_vector);
> >> >> >> +}
> >> >> >
> >> >> > Above can be simplified as one line change:
> >> >> >   if ( pt_vector != -1 )
> >> >> >   vmx_set_eoi_exit_bitmap(v, intack.vector);
> >> >>
> >> >> Hmm, I don't understand. Did you mean to use max() here? Or
> >> >> else how is this an equivalent of the originally proposed code?
> >> >>
> >> >
> >> > Original code is not 100% correct. The purpose is to set EOI exit
> >> > bitmap for any vector which may block injection of pt_vector -
> >> > give chance to recognize pt_vector in future intack and then do pt
> >> > intr post. The simplified code achieves this effect same as original
> >> > code if intack.vector >= vector. I cannot come up a case why
> >> > intack.vector might be smaller than vector. If this case happens,
> >> > we still need enable exit bitmap for intack.vector instead of
> >> > pt_vector for said purpose while original code did it wrong.
> >> >
> >> > Thanks
> >> > Kevin
> >> > --
> >> >
> >> > Using intack.vector is always expected here regardless of the
> >> > comparison result between intack.vector and pt_vector. The
> >> > reason why I was OK adding an ASSERT was simply to test
> >> > whether intack.vector < pt_vector [...]
> >> > orthogonal to the fix itself.
> >>
> >> Well, a vector lower than pt_vector can't block delivery. Or wait:
> >
> > There are two points here:
> >
> > a) We need enable EOI exit bitmap when pt_vector is blocked.
> >
> > b) As you said, ideally a vector lower than pt_vector cannot block
> >
> > The patch fixed a) and then added an ASSERT to verify b). Strictly
> > speaking they are separate issues.
> 
> Okay, I think I finally understand your argumentation here.
> 
> >> Don't we need to consider vector classes here, i.e.
> >>
> >> ASSERT((intack.vector >> 4) >= (pt_vector >> 4));
> >>
> >> ?
> >>
> >
> > However it still doesn't explain why original ASSERT is triggered.
> > vlapic_find_highest_vector actually finds the highest vector, instead
> > of highest class...
> >
> > static int vlapic_find_highest_vector(const void *bitmap)
> > {
> > const uint32_t *word = bitmap;
> > unsigned int word_offset = NR_VECTORS / 32;
> >
> > /* Work backwards through the bitmap (first 32-bit word in every four). */
> > while ( (word_offset != 0) && (word[(--word_offset)*4] == 0) )
> > continue;
> >
> > return (fls(word[word_offset*4]) - 1) + (word_offset * 32);
> > }
> 
> Well, perhaps a PIR -> IRR syncing issue then (I in particular note
> the early bailing from vmx_sync_pir_to_irr())? I guess we'd want
> to see the entire IRR array (and perhaps also PI state) if the check
> in the assertion fails.
> 

Yes, I asked Chao to add some debug info in that case. The problem now
is when we will reproduce the bug to see meaningful hint...

Thanks
Kevin



Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-01-20 Thread Jan Beulich
>>> On 18.01.17 at 11:23,  wrote:
>>  From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Wednesday, January 18, 2017 5:38 PM
>> 
>> >>> On 18.01.17 at 05:57,  wrote:
>> > Attached was my earlier comment:
>> >
>> > --
>> >> >>> On 20.12.16 at 06:37,  wrote:
>> >> >>  From: Xuquan (Quan Xu) [mailto:xuqu...@huawei.com]
>> >> >> Sent: Friday, December 16, 2016 5:40 PM
>> >> >> -if (pt_vector != -1)
>> >> >> -vmx_set_eoi_exit_bitmap(v, pt_vector);
>> >> >> +if ( pt_vector != -1 ) {
>> >> >> +if ( intack.vector > pt_vector )
>> >> >> +vmx_set_eoi_exit_bitmap(v, intack.vector);
>> >> >> +else
>> >> >> +vmx_set_eoi_exit_bitmap(v, pt_vector);
>> >> >> +}
>> >> >
>> >> > Above can be simplified as one line change:
>> >> > if ( pt_vector != -1 )
>> >> > vmx_set_eoi_exit_bitmap(v, intack.vector);
>> >>
>> >> Hmm, I don't understand. Did you mean to use max() here? Or
>> >> else how is this an equivalent of the originally proposed code?
>> >>
>> >
>> > Original code is not 100% correct. The purpose is to set EOI exit
>> > bitmap for any vector which may block injection of pt_vector -
>> > give chance to recognize pt_vector in future intack and then do pt
>> > intr post. The simplified code achieves this effect same as original
>> > code if intack.vector >= vector. I cannot come up a case why
>> > intack.vector might be smaller than vector. If this case happens,
>> > we still need enable exit bitmap for intack.vector instead of
>> > pt_vector for said purpose while original code did it wrong.
>> >
>> > Thanks
>> > Kevin
>> > --
>> >
>> > Using intack.vector is always expected here regardless of the
>> > comparison result between intack.vector and pt_vector. The
>> > reason why I was OK adding an ASSERT was simply to test
>> > whether intack.vector < pt_vector [...]
>> > orthogonal to the fix itself.
>> 
>> Well, a vector lower than pt_vector can't block delivery. Or wait:
> 
> There are two points here:
> 
> a) We need enable EOI exit bitmap when pt_vector is blocked.
> 
> b) As you said, ideally a vector lower than pt_vector cannot block
> 
> The patch fixed a) and then added an ASSERT to verify b). Strictly
> speaking they are separate issues.

Okay, I think I finally understand your argumentation here.

>> Don't we need to consider vector classes here, i.e.
>> 
>> ASSERT((intack.vector >> 4) >= (pt_vector >> 4));
>> 
>> ?
>> 
> 
> However it still doesn't explain why original ASSERT is triggered.
> vlapic_find_highest_vector actually finds the highest vector, instead
> of highest class...
> 
> static int vlapic_find_highest_vector(const void *bitmap)
> {
> const uint32_t *word = bitmap;
> unsigned int word_offset = NR_VECTORS / 32;
> 
> /* Work backwards through the bitmap (first 32-bit word in every four). */
> while ( (word_offset != 0) && (word[(--word_offset)*4] == 0) )
> continue;
> 
> return (fls(word[word_offset*4]) - 1) + (word_offset * 32);
> }

Well, perhaps a PIR -> IRR syncing issue then (I in particular note
the early bailing from vmx_sync_pir_to_irr())? I guess we'd want
to see the entire IRR array (and perhaps also PI state) if the check
in the assertion fails.

Jan




Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-01-20 Thread Xuquan (Quan Xu)
On January 16, 2017 1:26 PM, Tian, Kevin wrote:
>I cannot come up a valid reason for such situation (intack.vector is 0x30 while
>pt_vector is 0x38 from Chao's data). pt_update_irq is invoked before checking
>highest pending IRRs so pt_vector should be honored anyway.
>One possible reason is that being some reason pt_vector is not in vIRR at that
>point (due to some bug in the path from PIR to vIRR). 

btw, for PIR.. I find that there might be a bug in
__vmx_deliver_posted_interrupt()...
why test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu)) ??

static void __vmx_deliver_posted_interrupt(struct vcpu *v)
{
    ...
    if ( !test_and_set_bit(VCPU_KICK_SOFTIRQ, &softirq_pending(cpu))
    ...
}

Suppose that vCPUx is in guest mode and there are two (or even more)
interrupts for vCPUx.
As the bit is set when the first interrupt is delivered, the second interrupt
stays pending until the next VM entry, when PIR is synced to vIRR.


Quan




>However I didn't catch
>such bug simply by looking at code. We need reproduce this problem in
>developer side to find out actual reason. Andrew it'd be helpful if you can
>help Quan/Chao to find out more test environment info.
>
>One thing noted though. The original patch from Quan is actually orthogonal to
>this ASSERT. Regardless of whether intack.vector is larger or smaller than
>pt_vector, we always require the trick as long as pt_vector is not the one
>being currently programmed to RVI. Then do we want to revert the whole commit
>until the problem is finally fixed, or OK to just remove ASSERT (or replace
>with WARN_ON with more debug info) to unblock test system before the fix is
>ready?
>
>Thanks
>Kevin



Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-01-20 Thread Jan Beulich
>>> On 20.01.17 at 09:47,  wrote:
> Jan, I can't follow vector classes.. could you explain more? Thanks..

For determining vector priority, the LAPIC uses only the high 4 bits.
Iirc this is well documented in the SDM.
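
As a rough illustration only (not Xen code; priority_class(), vector_check()
and class_check() are hypothetical helper names), the difference between the
original ASSERT and the class-based one can be sketched like this, using the
0x30 / 0x38 values reported earlier in this thread:

```c
#include <assert.h>
#include <stdbool.h>

/* Priority class of an interrupt vector: its upper four bits. */
static unsigned int priority_class(unsigned int vector)
{
    assert(vector < 256);
    return vector >> 4;
}

/* The original check: compare the full vector numbers. */
static bool vector_check(unsigned int intack_vector, unsigned int pt_vector)
{
    return intack_vector >= pt_vector;
}

/* The class-based variant: compare the priority classes only. */
static bool class_check(unsigned int intack_vector, unsigned int pt_vector)
{
    return priority_class(intack_vector) >= priority_class(pt_vector);
}
```

Both 0x30 and 0x38 fall into class 3, so vector_check(0x30, 0x38) fails while
class_check(0x30, 0x38) holds.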

Jan




Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-01-20 Thread Xuquan (Quan Xu)
On January 18, 2017 5:38 PM, Jan Beulich wrote:
 On 18.01.17 at 05:57,  wrote:
>> Attached was my earlier comment:
>>
>> --
>>> >>> On 20.12.16 at 06:37,  wrote:
>>> >>  From: Xuquan (Quan Xu) [mailto:xuqu...@huawei.com]
>>> >> Sent: Friday, December 16, 2016 5:40 PM
>>> >> -if (pt_vector != -1)
>>> >> -vmx_set_eoi_exit_bitmap(v, pt_vector);
>>> >> +if ( pt_vector != -1 ) {
>>> >> +if ( intack.vector > pt_vector )
>>> >> +vmx_set_eoi_exit_bitmap(v, intack.vector);
>>> >> +else
>>> >> +vmx_set_eoi_exit_bitmap(v, pt_vector);
>>> >> +}
>>> >
>>> > Above can be simplified as one line change:
>>> >   if ( pt_vector != -1 )
>>> >   vmx_set_eoi_exit_bitmap(v, intack.vector);
>>>
>>> Hmm, I don't understand. Did you mean to use max() here? Or else how
>>> is this an equivalent of the originally proposed code?
>>>
>>
>> Original code is not 100% correct. The purpose is to set EOI exit
>> bitmap for any vector which may block injection of pt_vector - give
>> chance to recognize pt_vector in future intack and then do pt intr
>> post. The simplified code achieves this effect same as original code
>> if intack.vector >= vector. I cannot come up a case why intack.vector
>> might be smaller than vector. If this case happens, we still need
>> enable exit bitmap for intack.vector instead of pt_vector for said
>> purpose while original code did it wrong.
>>
>> Thanks
>> Kevin
>> --
>>
>> Using intack.vector is always expected here regardless of the
>> comparison result between intack.vector and pt_vector. The reason why
>> I was OK adding an ASSERT was simply to test whether
>> intack.vector < pt_vector [...] orthogonal to the fix
>> itself.
>
>Well, a vector lower than pt_vector can't block delivery. Or wait:
>Don't we need to consider vector classes here, i.e.
>
>ASSERT((intack.vector >> 4) >= (pt_vector >> 4));
>

Jan, I can't follow vector classes.. could you explain more? Thanks..


Quan



Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-01-18 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Wednesday, January 18, 2017 5:38 PM
> 
> >>> On 18.01.17 at 05:57,  wrote:
> > Attached was my earlier comment:
> >
> > --
> >> >>> On 20.12.16 at 06:37,  wrote:
> >> >>  From: Xuquan (Quan Xu) [mailto:xuqu...@huawei.com]
> >> >> Sent: Friday, December 16, 2016 5:40 PM
> >> >> -if (pt_vector != -1)
> >> >> -vmx_set_eoi_exit_bitmap(v, pt_vector);
> >> >> +if ( pt_vector != -1 ) {
> >> >> +if ( intack.vector > pt_vector )
> >> >> +vmx_set_eoi_exit_bitmap(v, intack.vector);
> >> >> +else
> >> >> +vmx_set_eoi_exit_bitmap(v, pt_vector);
> >> >> +}
> >> >
> >> > Above can be simplified as one line change:
> >> >  if ( pt_vector != -1 )
> >> >  vmx_set_eoi_exit_bitmap(v, intack.vector);
> >>
> >> Hmm, I don't understand. Did you mean to use max() here? Or
> >> else how is this an equivalent of the originally proposed code?
> >>
> >
> > Original code is not 100% correct. The purpose is to set EOI exit
> > bitmap for any vector which may block injection of pt_vector -
> > give chance to recognize pt_vector in future intack and then do pt
> > intr post. The simplified code achieves this effect same as original
> > code if intack.vector >= vector. I cannot come up a case why
> > intack.vector might be smaller than vector. If this case happens,
> > we still need enable exit bitmap for intack.vector instead of
> > pt_vector for said purpose while original code did it wrong.
> >
> > Thanks
> > Kevin
> > --
> >
> > Using intack.vector is always expected here regardless of the
> > comparison result between intack.vector and pt_vector. The
> > reason why I was OK adding an ASSERT was simply to test
> > whether intack.vector < pt_vector [...]
> > orthogonal to the fix itself.
> 
> Well, a vector lower than pt_vector can't block delivery. Or wait:

There are two points here:

a) We need to enable the EOI exit bitmap when pt_vector is blocked.

b) As you said, ideally a vector lower than pt_vector cannot block delivery.

The patch fixed a) and then added an ASSERT to verify b). Strictly
speaking they are separate issues.

> Don't we need to consider vector classes here, i.e.
> 
> ASSERT((intack.vector >> 4) >= (pt_vector >> 4));
> 
> ?
> 

However it still doesn't explain why original ASSERT is triggered.
vlapic_find_highest_vector actually finds the highest vector, instead
of highest class...

static int vlapic_find_highest_vector(const void *bitmap)
{
    const uint32_t *word = bitmap;
    unsigned int word_offset = NR_VECTORS / 32;

    /* Work backwards through the bitmap (first 32-bit word in every four). */
    while ( (word_offset != 0) && (word[(--word_offset)*4] == 0) )
        continue;

    return (fls(word[word_offset*4]) - 1) + (word_offset * 32);
}
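
For anyone who wants to try this outside the hypervisor, below is a
stand-alone sketch of the same scan. It is an approximation, not the Xen
source: regs, fls32(), set_vector() and find_highest_vector() are
hypothetical names, NR_VECTORS is assumed to be 256, and fls32() is a plain
loop standing in for fls().

```c
#include <assert.h>
#include <stdint.h>

#define NR_VECTORS 256

/*
 * Model of the register layout: eight 32-bit words, each one sitting at the
 * start of a group of four words.
 */
static uint32_t regs[(NR_VECTORS / 32) * 4];

/* Find-last-set: 1-based index of the highest set bit, 0 if none is set. */
static int fls32(uint32_t x)
{
    int pos = 0;

    while ( x != 0 )
    {
        pos++;
        x >>= 1;
    }
    return pos;
}

/* Mark one vector as pending in the model bitmap. */
static void set_vector(unsigned int vector)
{
    assert(vector < NR_VECTORS);
    regs[(vector / 32) * 4] |= UINT32_C(1) << (vector % 32);
}

/* Same backwards scan as above; returns -1 when no bit is set. */
static int find_highest_vector(const uint32_t *word)
{
    unsigned int word_offset = NR_VECTORS / 32;

    while ( (word_offset != 0) && (word[(--word_offset) * 4] == 0) )
        continue;

    return (fls32(word[word_offset * 4]) - 1) + (int)(word_offset * 32);
}
```

With both 0x30 and 0x38 set in the same bitmap, the scan returns 0x38, i.e.
the highest vector number and not merely the highest class.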

Thanks
Kevin



Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-01-18 Thread Jan Beulich
>>> On 18.01.17 at 05:57,  wrote:
> Attached was my earlier comment:
> 
> --
>> >>> On 20.12.16 at 06:37,  wrote:
>> >>  From: Xuquan (Quan Xu) [mailto:xuqu...@huawei.com]
>> >> Sent: Friday, December 16, 2016 5:40 PM
>> >> -if (pt_vector != -1)
>> >> -vmx_set_eoi_exit_bitmap(v, pt_vector);
>> >> +if ( pt_vector != -1 ) {
>> >> +if ( intack.vector > pt_vector )
>> >> +vmx_set_eoi_exit_bitmap(v, intack.vector);
>> >> +else
>> >> +vmx_set_eoi_exit_bitmap(v, pt_vector);
>> >> +}
>> >
>> > Above can be simplified as one line change:
>> >if ( pt_vector != -1 )
>> >vmx_set_eoi_exit_bitmap(v, intack.vector);
>> 
>> Hmm, I don't understand. Did you mean to use max() here? Or
>> else how is this an equivalent of the originally proposed code?
>> 
> 
> Original code is not 100% correct. The purpose is to set EOI exit
> bitmap for any vector which may block injection of pt_vector - 
> give chance to recognize pt_vector in future intack and then do pt 
> intr post. The simplified code achieves this effect same as original
> code if intack.vector >= vector. I cannot come up a case why
> intack.vector might be smaller than vector. If this case happens,
> we still need enable exit bitmap for intack.vector instead of
> pt_vector for said purpose while original code did it wrong.
> 
> Thanks
> Kevin
> --
> 
> Using intack.vector is always expected here regardless of the 
> comparison result between intack.vector and pt_vector. The 
> reason why I was OK adding an ASSERT was simply to test 
> whether intack.vector < pt_vector [...]
> orthogonal to the fix itself.

Well, a vector lower than pt_vector can't block delivery. Or wait:
Don't we need to consider vector classes here, i.e.

ASSERT((intack.vector >> 4) >= (pt_vector >> 4));

?

Jan




Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-01-17 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Monday, January 16, 2017 7:00 PM
> 
> >>> On 16.01.17 at 06:25,  wrote:
> > One thing noted though. The original patch from Quan is actually orthogonal
> > to this ASSERT. Regardless of whether intack.vector is larger or smaller
> > than pt_vector, we always require the trick as long as pt_vector is not the
> > one being currently programmed to RVI.
> 
> I don't think the ASSERT() addition is orthogonal: It exchanges
> intack.vector for pt_vector in the invocation of
> vmx_set_eoi_exit_bitmap(), and during discussion of the patch
> there at least intermediately was max() of the two used instead.
> It was - iirc - one of you who suggested that the use of max()
> there is unnecessary, which the ASSERT() triggering has now
> shown is wrong.

Attached was my earlier comment:

--
> >>> On 20.12.16 at 06:37,  wrote:
> >>  From: Xuquan (Quan Xu) [mailto:xuqu...@huawei.com]
> >> Sent: Friday, December 16, 2016 5:40 PM
> >> -if (pt_vector != -1)
> >> -vmx_set_eoi_exit_bitmap(v, pt_vector);
> >> +if ( pt_vector != -1 ) {
> >> +if ( intack.vector > pt_vector )
> >> +vmx_set_eoi_exit_bitmap(v, intack.vector);
> >> +else
> >> +vmx_set_eoi_exit_bitmap(v, pt_vector);
> >> +}
> >
> > Above can be simplified as one line change:
> > if ( pt_vector != -1 )
> > vmx_set_eoi_exit_bitmap(v, intack.vector);
> 
> Hmm, I don't understand. Did you mean to use max() here? Or
> else how is this an equivalent of the originally proposed code?
> 

The original code is not 100% correct. The purpose is to set the EOI exit
bitmap for any vector which may block injection of pt_vector, to
give a chance to recognize pt_vector in a future intack and then do the pt
intr post. The simplified code achieves the same effect as the original
code if intack.vector >= vector. I cannot come up with a case where
intack.vector might be smaller than vector. If this case happens,
we still need to enable the exit bitmap for intack.vector instead of
pt_vector for said purpose, while the original code did it wrong.
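
To spell out where the two versions differ, here is a small illustrative
sketch (hypothetical helpers, not the actual patch) that only models which
vector would be passed to vmx_set_eoi_exit_bitmap() once pt_vector != -1:

```c
#include <assert.h>

/* Vector chosen by the originally proposed code: the larger of the two. */
static int eoi_vector_original(int intack_vector, int pt_vector)
{
    assert(pt_vector != -1);
    return (intack_vector > pt_vector) ? intack_vector : pt_vector;
}

/* Vector chosen by the simplified code: always intack.vector. */
static int eoi_vector_simplified(int intack_vector, int pt_vector)
{
    assert(pt_vector != -1);
    return intack_vector;
}
```

The two agree whenever intack_vector >= pt_vector. They differ only in the
unexpected case, e.g. intack_vector 0x30 with pt_vector 0x38, where the
original picks 0x38 and the simplified version picks 0x30.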

Thanks
Kevin
--

Using intack.vector is always expected here regardless of the 
comparison result between intack.vector and pt_vector. The 
reason why I was OK adding an ASSERT was simply to test 
whether intack.vector < pt_vector [...] orthogonal to the fix itself.

> > Then do we want to revert the whole
> > commit until the problem is finally fixed, or OK to just remove ASSERT
> > (or replace with WARN_ON with more debug info) to unblock test system
> > before the fix is ready?
> 
> Well, as the VMX maintainer I think the proposal of whether to
> revert or wait should really come from you.
> 
> Jan

Andrew, how long do you usually tolerate a failure case in osstest?
I'm not sure how long it may take for a developer to reproduce this
situation. If it has a blocking impact on your side, I'd suggest
replacing the ASSERT with a more informative warning before the final
root cause is found, if Quan cannot reproduce it in a short time (say
1 or 2 weeks or so).

Thanks
Kevin



Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-01-16 Thread Jan Beulich
>>> On 16.01.17 at 06:25,  wrote:
> One thing noted though. The original patch from Quan is actually orthogonal
> to this ASSERT. Regardless of whether intack.vector is larger or smaller
> than pt_vector, we always require the trick as long as pt_vector is not the
> one being currently programmed to RVI.

I don't think the ASSERT() addition is orthogonal: It exchanges
intack.vector for pt_vector in the invocation of
vmx_set_eoi_exit_bitmap(), and during discussion of the patch
there at least intermediately was max() of the two used instead.
It was - iirc - one of you who suggested that the use of max()
there is unnecessary, which the ASSERT() triggering has now
shown is wrong.

> Then do we want to revert the whole
> commit until the problem is finally fixed, or OK to just remove ASSERT 
> (or replace with WARN_ON with more debug info) to unblock test system
> before the fix is ready?

Well, as the VMX maintainer I think the proposal of whether to
revert or wait should really come from you.

Jan




Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-01-15 Thread Chao Gao
On Mon, Jan 16, 2017 at 06:27:23AM +0000, Xuquan (Quan Xu) wrote:
>On January 16, 2017 1:26 PM, Tian, Kevin wrote:
>>> From: Jan Beulich [mailto:jbeul...@suse.com]
>>> Sent: Thursday, January 12, 2017 8:26 PM
>>>
>>> >>> On 12.01.17 at 13:15,  wrote:
>>> > On 12/01/17 12:07, Xuquan (Quan Xu) wrote:
>>> >> On January 12, 2017 5:14 PM, Andrew Cooper wrote:
>>> >>> On 12/01/2017 06:46, osstest service owner wrote:
>>>  flight 104131 xen-unstable real [real]
>>>  http://logs.test-lab.xenproject.org/osstest/logs/104131/
>>> 
>>>  Regressions :-(
>>> 
>>>  Tests which did not succeed and are blocking, including tests
>>>  which could not be run:
>>>   test-amd64-i386-xl-qemuu-debianhvm-amd64 16 guest-stop   fail
>>> >>> REGR. vs. 104119
>>> >>>
>>> >>> Jan 12 01:25:17.397607 (XEN) Assertion 'intack.vector >=
>>> >>> pt_vector' failed at
>>> >>> intr.c:321
>>> >>> Jan 12 01:25:37.133596 (XEN) [ Xen-4.9-unstable  x86_64
>>> >>> debug=y Not tainted ]
>>> >>> Jan 12 01:25:37.141577 (XEN) CPU:14
>>> >>> Jan 12 01:25:37.141607 (XEN) RIP:e008:[]
>>> >>> vmx_intr_assist+0x35e/0x51d
>>> >>> Jan 12 01:25:37.149617 (XEN) RFLAGS: 00010202
>>CONTEXT:
>>> >>> hypervisor (d15v0)
>>> >>> Jan 12 01:25:37.149655 (XEN) rax: 0038   rbx:
>>> >>> 830079e1e000   rcx: 0030
>>> >>> Jan 12 01:25:37.157582 (XEN) rdx:    rsi:
>>> >>> 0030   rdi: 830079e1e000
>>> >>> Jan 12 01:25:37.165584 (XEN) rbp: 83047de2ff08   rsp:
>>83047de2fea8
>>> >>> r8:  82c00022f000
>>> >>> Jan 12 01:25:37.173579 (XEN) r9:  8301b63ede80   r10:
>>> >>> 830176386560   r11: 01955ee79bd0
>>> >>> Jan 12 01:25:37.181582 (XEN) r12: 3002   r13:
>>> >>> 3002   r14: 0030
>>> >>> Jan 12 01:25:37.189584 (XEN) r15: 83023fec2000   cr0:
>>> >>> 80050033   cr4: 003526e0
>>> >>> Jan 12 01:25:37.197572 (XEN) cr3: 000232edb000   cr2:
>>> >>> 02487034
>>> >>> Jan 12 01:25:37.205569 (XEN) ds:    es:    fs:    gs:
>>
>>> >>> ss:    cs: e008
>>> >>> Jan 12 01:25:37.205606 (XEN) Xen code around 
>>> >>> (vmx_intr_assist+0x35e/0x51d):
>>> >>> Jan 12 01:25:37.213575 (XEN)  41 0f b6 f6 39 f0 7e 02 <0f> 0b 48
>>> >>> 89 df e8 51
>>> >>> 20 00 00 b8 10 08 00 00 0f Jan 12 01:25:37.221561 (XEN) Xen stack
>>> >>> trace
>>> >> >from rsp=83047de2fea8:
>>> >>> Jan 12 01:25:37.229600 (XEN)82d08031aa80 0038
>>> >>> 83047de2 83023fec2000
>>> >>> Jan 12 01:25:37.237594 (XEN)83047de2fef8 82d080130cb6
>>> >>> 830079e1e000 830079e1e000
>>> >>> Jan 12 01:25:37.245588 (XEN)83007bae2000
>>000e
>>> >>> 830233117000 83023fec2000
>>> >>> Jan 12 01:25:37.253594 (XEN)83047de2fdc0 82d0801fdeb1
>>> >>> 0004 00c2
>>> >>> Jan 12 01:25:37.261584 (XEN)0020
>>0007
>>> >>> 8800e8d28000 81add0a0
>>> >>> Jan 12 01:25:37.269607 (XEN)0246
>>
>>> >>> 88014248 0004
>>> >>> Jan 12 01:25:37.277580 (XEN)0036
>>
>>> >>> 03f8 03f8
>>> >>> Jan 12 01:25:37.285584 (XEN)81add0a0 beefbeef
>>> >>> 813899a4 00bfbeef
>>> >>> Jan 12 01:25:37.293567 (XEN)0002
>>880147c03e08
>>> >>> beef 1cec835356e5beef
>>> >>> Jan 12 01:25:37.293606 (XEN)085d8b002674beef
>>01dcb38b000cbeef
>>> >>> 8914458d3174beef 2444c71e
>>> >>> Jan 12 01:25:37.301586 (XEN)830079e1e000
>>0031bfc37600
>>> >>> 003526e0
>>> >>> Jan 12 01:25:37.309607 (XEN) Xen call trace:
>>> >>> Jan 12 01:25:37.309639 (XEN)[]
>>> >>> vmx_intr_assist+0x35e/0x51d
>>> >>> Jan 12 01:25:37.317591 (XEN)[]
>>> >>> vmx_asm_vmexit_handler+0x41/0x120
>>> >>> Jan 12 01:25:37.325598 (XEN)
>>> >>> Jan 12 01:25:37.325624 (XEN)
>>> >>> Jan 12 01:25:37.325647 (XEN)
>>> >>> 
>>> >>> Jan 12 01:25:37.333653 (XEN) Panic on CPU 14:
>>> >>> Jan 12 01:25:37.333684 (XEN) Assertion 'intack.vector >=
>>> >>> pt_vector' failed at
>>> >>> intr.c:321 Jan 12 01:25:37.341571 (XEN)
>>> >>> 
>>> >>> Jan 12 01:25:37.341603 (XEN)
>>> >>> Jan 12 01:25:37.341626 (XEN) Reboot in five seconds...
>>> >>> Jan 12 01:25:37.349566 (XEN) Resetting with ACPI MEMORY or I/O
>>> >>> RESET_REG.
>>> >>>
>>> >>> This is caused by "x86/apicv: fix RTC periodic timer and apicv
>>> >>> issue".  It is not a deterministic issue, as it appears to have
>>> >>> survived a week of testing already, but there is clearly something still
>>problematic with the code.
>>> >>>
>>> >>
>>> >> Andrew,
>>> >> If you have, could you give more information?
>>> >
>>> > No further information sorry.  This was found by the automated test
>>system.
>>>
>>> But some can 

Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-01-15 Thread Xuquan (Quan Xu)
On January 16, 2017 1:26 PM, Tian, Kevin wrote:
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Thursday, January 12, 2017 8:26 PM
>>
>> >>> On 12.01.17 at 13:15,  wrote:
>> > On 12/01/17 12:07, Xuquan (Quan Xu) wrote:
>> >> On January 12, 2017 5:14 PM, Andrew Cooper wrote:
>> >>> On 12/01/2017 06:46, osstest service owner wrote:
>>  flight 104131 xen-unstable real [real]
>>  http://logs.test-lab.xenproject.org/osstest/logs/104131/
>> 
>>  Regressions :-(
>> 
>>  Tests which did not succeed and are blocking, including tests
>>  which could not be run:
>>   test-amd64-i386-xl-qemuu-debianhvm-amd64 16 guest-stop   fail
>> >>> REGR. vs. 104119
>> >>>
>> >>> Jan 12 01:25:17.397607 (XEN) Assertion 'intack.vector >=
>> >>> pt_vector' failed at
>> >>> intr.c:321
>> >>> Jan 12 01:25:37.133596 (XEN) [ Xen-4.9-unstable  x86_64
>> >>> debug=y Not tainted ]
>> >>> Jan 12 01:25:37.141577 (XEN) CPU:14
>> >>> Jan 12 01:25:37.141607 (XEN) RIP:e008:[]
>> >>> vmx_intr_assist+0x35e/0x51d
>> >>> Jan 12 01:25:37.149617 (XEN) RFLAGS: 00010202
>CONTEXT:
>> >>> hypervisor (d15v0)
>> >>> Jan 12 01:25:37.149655 (XEN) rax: 0038   rbx:
>> >>> 830079e1e000   rcx: 0030
>> >>> Jan 12 01:25:37.157582 (XEN) rdx:    rsi:
>> >>> 0030   rdi: 830079e1e000
>> >>> Jan 12 01:25:37.165584 (XEN) rbp: 83047de2ff08   rsp:
>83047de2fea8
>> >>> r8:  82c00022f000
>> >>> Jan 12 01:25:37.173579 (XEN) r9:  8301b63ede80   r10:
>> >>> 830176386560   r11: 01955ee79bd0
>> >>> Jan 12 01:25:37.181582 (XEN) r12: 3002   r13:
>> >>> 3002   r14: 0030
>> >>> Jan 12 01:25:37.189584 (XEN) r15: 83023fec2000   cr0:
>> >>> 80050033   cr4: 003526e0
>> >>> Jan 12 01:25:37.197572 (XEN) cr3: 000232edb000   cr2:
>> >>> 02487034
>> >>> Jan 12 01:25:37.205569 (XEN) ds:    es:    fs:    gs:
>
>> >>> ss:    cs: e008
>> >>> Jan 12 01:25:37.205606 (XEN) Xen code around 
>> >>> (vmx_intr_assist+0x35e/0x51d):
>> >>> Jan 12 01:25:37.213575 (XEN)  41 0f b6 f6 39 f0 7e 02 <0f> 0b 48
>> >>> 89 df e8 51
>> >>> 20 00 00 b8 10 08 00 00 0f Jan 12 01:25:37.221561 (XEN) Xen stack
>> >>> trace
>> >> >from rsp=83047de2fea8:
>> >>> Jan 12 01:25:37.229600 (XEN)82d08031aa80 0038
>> >>> 83047de2 83023fec2000
>> >>> Jan 12 01:25:37.237594 (XEN)83047de2fef8 82d080130cb6
>> >>> 830079e1e000 830079e1e000
>> >>> Jan 12 01:25:37.245588 (XEN)83007bae2000
>000e
>> >>> 830233117000 83023fec2000
>> >>> Jan 12 01:25:37.253594 (XEN)83047de2fdc0 82d0801fdeb1
>> >>> 0004 00c2
>> >>> Jan 12 01:25:37.261584 (XEN)0020
>0007
>> >>> 8800e8d28000 81add0a0
>> >>> Jan 12 01:25:37.269607 (XEN)0246
>
>> >>> 88014248 0004
>> >>> Jan 12 01:25:37.277580 (XEN)0036
>
>> >>> 03f8 03f8
>> >>> Jan 12 01:25:37.285584 (XEN)81add0a0 beefbeef
>> >>> 813899a4 00bfbeef
>> >>> Jan 12 01:25:37.293567 (XEN)0002
>880147c03e08
>> >>> beef 1cec835356e5beef
>> >>> Jan 12 01:25:37.293606 (XEN)085d8b002674beef
>01dcb38b000cbeef
>> >>> 8914458d3174beef 2444c71e
>> >>> Jan 12 01:25:37.301586 (XEN)830079e1e000
>0031bfc37600
>> >>> 003526e0
>> >>> Jan 12 01:25:37.309607 (XEN) Xen call trace:
>> >>> Jan 12 01:25:37.309639 (XEN)[]
>> >>> vmx_intr_assist+0x35e/0x51d
>> >>> Jan 12 01:25:37.317591 (XEN)[]
>> >>> vmx_asm_vmexit_handler+0x41/0x120
>> >>> Jan 12 01:25:37.325598 (XEN)
>> >>> Jan 12 01:25:37.325624 (XEN)
>> >>> Jan 12 01:25:37.325647 (XEN)
>> >>> 
>> >>> Jan 12 01:25:37.333653 (XEN) Panic on CPU 14:
>> >>> Jan 12 01:25:37.333684 (XEN) Assertion 'intack.vector >=
>> >>> pt_vector' failed at
>> >>> intr.c:321 Jan 12 01:25:37.341571 (XEN)
>> >>> 
>> >>> Jan 12 01:25:37.341603 (XEN)
>> >>> Jan 12 01:25:37.341626 (XEN) Reboot in five seconds...
>> >>> Jan 12 01:25:37.349566 (XEN) Resetting with ACPI MEMORY or I/O
>> >>> RESET_REG.
>> >>>
>> >>> This is caused by "x86/apicv: fix RTC periodic timer and apicv
>> >>> issue".  It is not a deterministic issue, as it appears to have
>> >>> survived a week of testing already, but there is clearly something still
>problematic with the code.
>> >>>
>> >>
>> >> Andrew,
>> >> If you have, could you give more information?
>> >
>> > No further information sorry.  This was found by the automated test
>system.
>>
>> But some can be gathered:
>>
>> > Full logs are available from
>> > http://logs.test-lab.xenproject.org/osstest/logs/104131/test-amd64-i
>> > 386-xl-q
>> > emuu-debianhvm-amd64/
>> > but I 

Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-01-15 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Thursday, January 12, 2017 8:26 PM
> 
> >>> On 12.01.17 at 13:15,  wrote:
> > On 12/01/17 12:07, Xuquan (Quan Xu) wrote:
> >> On January 12, 2017 5:14 PM, Andrew Cooper wrote:
> >>> On 12/01/2017 06:46, osstest service owner wrote:
>  flight 104131 xen-unstable real [real]
>  http://logs.test-lab.xenproject.org/osstest/logs/104131/
> 
>  Regressions :-(
> 
>  Tests which did not succeed and are blocking, including tests which
>  could not be run:
>   test-amd64-i386-xl-qemuu-debianhvm-amd64 16 guest-stop   fail
> >>> REGR. vs. 104119
> >>>
> >>> Jan 12 01:25:17.397607 (XEN) Assertion 'intack.vector >= pt_vector' 
> >>> failed at
> >>> intr.c:321
> >>> Jan 12 01:25:37.133596 (XEN) [ Xen-4.9-unstable  x86_64  debug=y
> >>> Not tainted ]
> >>> Jan 12 01:25:37.141577 (XEN) CPU:14
> >>> Jan 12 01:25:37.141607 (XEN) RIP:e008:[]
> >>> vmx_intr_assist+0x35e/0x51d
> >>> Jan 12 01:25:37.149617 (XEN) RFLAGS: 00010202   CONTEXT:
> >>> hypervisor (d15v0)
> >>> Jan 12 01:25:37.149655 (XEN) rax: 0038   rbx:
> >>> 830079e1e000   rcx: 0030
> >>> Jan 12 01:25:37.157582 (XEN) rdx:    rsi:
> >>> 0030   rdi: 830079e1e000
> >>> Jan 12 01:25:37.165584 (XEN) rbp: 83047de2ff08   rsp: 83047de2fea8
> >>> r8:  82c00022f000
> >>> Jan 12 01:25:37.173579 (XEN) r9:  8301b63ede80   r10:
> >>> 830176386560   r11: 01955ee79bd0
> >>> Jan 12 01:25:37.181582 (XEN) r12: 3002   r13:
> >>> 3002   r14: 0030
> >>> Jan 12 01:25:37.189584 (XEN) r15: 83023fec2000   cr0:
> >>> 80050033   cr4: 003526e0
> >>> Jan 12 01:25:37.197572 (XEN) cr3: 000232edb000   cr2:
> >>> 02487034
> >>> Jan 12 01:25:37.205569 (XEN) ds:    es:    fs:    gs: 
> >>> ss:    cs: e008
> >>> Jan 12 01:25:37.205606 (XEN) Xen code around 
> >>> (vmx_intr_assist+0x35e/0x51d):
> >>> Jan 12 01:25:37.213575 (XEN)  41 0f b6 f6 39 f0 7e 02 <0f> 0b 48 89 df e8 
> >>> 51
> >>> 20 00 00 b8 10 08 00 00 0f Jan 12 01:25:37.221561 (XEN) Xen stack trace
> >> >from rsp=83047de2fea8:
> >>> Jan 12 01:25:37.229600 (XEN)82d08031aa80 0038
> >>> 83047de2 83023fec2000
> >>> Jan 12 01:25:37.237594 (XEN)83047de2fef8 82d080130cb6
> >>> 830079e1e000 830079e1e000
> >>> Jan 12 01:25:37.245588 (XEN)83007bae2000 000e
> >>> 830233117000 83023fec2000
> >>> Jan 12 01:25:37.253594 (XEN)83047de2fdc0 82d0801fdeb1
> >>> 0004 00c2
> >>> Jan 12 01:25:37.261584 (XEN)0020 0007
> >>> 8800e8d28000 81add0a0
> >>> Jan 12 01:25:37.269607 (XEN)0246 
> >>> 88014248 0004
> >>> Jan 12 01:25:37.277580 (XEN)0036 
> >>> 03f8 03f8
> >>> Jan 12 01:25:37.285584 (XEN)81add0a0 beefbeef
> >>> 813899a4 00bfbeef
> >>> Jan 12 01:25:37.293567 (XEN)0002 880147c03e08
> >>> beef 1cec835356e5beef
> >>> Jan 12 01:25:37.293606 (XEN)085d8b002674beef 01dcb38b000cbeef
> >>> 8914458d3174beef 2444c71e
> >>> Jan 12 01:25:37.301586 (XEN)830079e1e000 0031bfc37600
> >>> 003526e0
> >>> Jan 12 01:25:37.309607 (XEN) Xen call trace:
> >>> Jan 12 01:25:37.309639 (XEN)[]
> >>> vmx_intr_assist+0x35e/0x51d
> >>> Jan 12 01:25:37.317591 (XEN)[]
> >>> vmx_asm_vmexit_handler+0x41/0x120
> >>> Jan 12 01:25:37.325598 (XEN)
> >>> Jan 12 01:25:37.325624 (XEN)
> >>> Jan 12 01:25:37.325647 (XEN)
> >>> 
> >>> Jan 12 01:25:37.333653 (XEN) Panic on CPU 14:
> >>> Jan 12 01:25:37.333684 (XEN) Assertion 'intack.vector >= pt_vector' 
> >>> failed at
> >>> intr.c:321 Jan 12 01:25:37.341571 (XEN)
> >>> 
> >>> Jan 12 01:25:37.341603 (XEN)
> >>> Jan 12 01:25:37.341626 (XEN) Reboot in five seconds...
> >>> Jan 12 01:25:37.349566 (XEN) Resetting with ACPI MEMORY or I/O
> >>> RESET_REG.
> >>>
> >>> This is caused by "x86/apicv: fix RTC periodic timer and apicv issue".  
> >>> It is
> >>> not a deterministic issue, as it appears to have survived a week of 
> >>> testing
> >>> already, but there is clearly something still problematic with the code.
> >>>
> >>
> >> Andrew,
> >> If you have, could you give more information?
> >
> > No further information sorry.  This was found by the automated test system.
> 
> But some can be gathered:
> 
> > Full logs are available from
> > http://logs.test-lab.xenproject.org/osstest/logs/104131/test-amd64-i386-xl-q
> > emuu-debianhvm-amd64/
> > but I doubt any of them will help in diagnosing the issue any further.
> >
> >> Such as the value of intack.vector / pt_vector..
> 
> At least one of the two values 

Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-01-12 Thread Chao Gao
According to the code around the assert:
movzbl %r14b, %esi      41 0f b6 f6
cmp    %esi, %eax       39 f0
jle    ...              7e 02
ud2                     <0f> 0b
mov    %rbx, %rdi       48 89 df
callq  ...              e8 51 20 00 00
mov    $0x810, %eax     b8 10 08 00 00

So I think one value is 0x38 (%eax) and the other is 0x30 (%esi).
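
To make that reading concrete, here is a minimal C sketch of what the three instructions before ud2 compute, using the register values from the dump (rax = 0x38, r14 = 0x30). The function name assertion_holds is hypothetical. Mapping %esi to intack.vector and %eax to pt_vector is an inference from the zero-extending byte load and the direction of the jump, not something the log states.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the instruction sequence:
 *   movzbl %r14b, %esi   ; zero-extend the low byte of r14
 *   cmp    %esi, %eax    ; compare eax with esi
 *   jle    ...           ; signed eax <= esi: jump over ud2
 *   ud2                  ; otherwise the assertion fires
 * Returns 1 when the jump is taken (assertion holds), 0 when ud2 is hit.
 */
static int assertion_holds(uint64_t rax, uint64_t r14)
{
    uint32_t esi = (uint8_t)r14;            /* movzbl %r14b, %esi */
    int32_t  eax = (int32_t)(uint32_t)rax;  /* low 32 bits of rax */

    return eax <= (int32_t)esi;             /* jle: skip ud2 */
}
```

With the dumped values, assertion_holds(0x38, 0x30) returns 0, which is consistent with the assertion firing: under the assumed mapping, pt_vector would be 0x38 and intack.vector 0x30.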

On Thu, Jan 12, 2017 at 12:07:53PM +, Xuquan (Quan Xu) wrote:
>On January 12, 2017 5:14 PM, Andrew Cooper wrote:
>>On 12/01/2017 06:46, osstest service owner wrote:
>>> flight 104131 xen-unstable real [real]
>>> http://logs.test-lab.xenproject.org/osstest/logs/104131/
>>>
>>> Regressions :-(
>>>
>>> Tests which did not succeed and are blocking, including tests which
>>> could not be run:
>>>  test-amd64-i386-xl-qemuu-debianhvm-amd64 16 guest-stop   fail
>>REGR. vs. 104119
>>
>>Jan 12 01:25:17.397607 (XEN) Assertion 'intack.vector >= pt_vector' failed at
>>intr.c:321
>>Jan 12 01:25:37.133596 (XEN) [ Xen-4.9-unstable  x86_64  debug=y
>>Not tainted ]
>>Jan 12 01:25:37.141577 (XEN) CPU:14
>>Jan 12 01:25:37.141607 (XEN) RIP:e008:[]
>>vmx_intr_assist+0x35e/0x51d
>>Jan 12 01:25:37.149617 (XEN) RFLAGS: 00010202   CONTEXT:
>>hypervisor (d15v0)
>>Jan 12 01:25:37.149655 (XEN) rax: 0038   rbx:
>>830079e1e000   rcx: 0030
>>Jan 12 01:25:37.157582 (XEN) rdx:    rsi:
>>0030   rdi: 830079e1e000
>>Jan 12 01:25:37.165584 (XEN) rbp: 83047de2ff08   rsp: 83047de2fea8
>>r8:  82c00022f000
>>Jan 12 01:25:37.173579 (XEN) r9:  8301b63ede80   r10:
>>830176386560   r11: 01955ee79bd0
>>Jan 12 01:25:37.181582 (XEN) r12: 3002   r13:
>>3002   r14: 0030
>>Jan 12 01:25:37.189584 (XEN) r15: 83023fec2000   cr0:
>>80050033   cr4: 003526e0
>>Jan 12 01:25:37.197572 (XEN) cr3: 000232edb000   cr2:
>>02487034
>>Jan 12 01:25:37.205569 (XEN) ds:    es:    fs:    gs: 
>>ss:    cs: e008
>>Jan 12 01:25:37.205606 (XEN) Xen code around 
>>(vmx_intr_assist+0x35e/0x51d):
>>Jan 12 01:25:37.213575 (XEN)  41 0f b6 f6 39 f0 7e 02 <0f> 0b 48 89 df e8 51
>>20 00 00 b8 10 08 00 00 0f Jan 12 01:25:37.221561 (XEN) Xen stack trace
>>from rsp=83047de2fea8:
>>Jan 12 01:25:37.229600 (XEN)82d08031aa80 0038
>>83047de2 83023fec2000
>>Jan 12 01:25:37.237594 (XEN)83047de2fef8 82d080130cb6
>>830079e1e000 830079e1e000
>>Jan 12 01:25:37.245588 (XEN)83007bae2000 000e
>>830233117000 83023fec2000
>>Jan 12 01:25:37.253594 (XEN)83047de2fdc0 82d0801fdeb1
>>0004 00c2
>>Jan 12 01:25:37.261584 (XEN)0020 0007
>>8800e8d28000 81add0a0
>>Jan 12 01:25:37.269607 (XEN)0246 
>>88014248 0004
>>Jan 12 01:25:37.277580 (XEN)0036 
>>03f8 03f8
>>Jan 12 01:25:37.285584 (XEN)81add0a0 beefbeef
>>813899a4 00bfbeef
>>Jan 12 01:25:37.293567 (XEN)0002 880147c03e08
>>beef 1cec835356e5beef
>>Jan 12 01:25:37.293606 (XEN)085d8b002674beef 01dcb38b000cbeef
>>8914458d3174beef 2444c71e
>>Jan 12 01:25:37.301586 (XEN)830079e1e000 0031bfc37600
>>003526e0
>>Jan 12 01:25:37.309607 (XEN) Xen call trace:
>>Jan 12 01:25:37.309639 (XEN)[]
>>vmx_intr_assist+0x35e/0x51d
>>Jan 12 01:25:37.317591 (XEN)[]
>>vmx_asm_vmexit_handler+0x41/0x120
>>Jan 12 01:25:37.325598 (XEN)
>>Jan 12 01:25:37.325624 (XEN)
>>Jan 12 01:25:37.325647 (XEN)
>>
>>Jan 12 01:25:37.333653 (XEN) Panic on CPU 14:
>>Jan 12 01:25:37.333684 (XEN) Assertion 'intack.vector >= pt_vector' failed at
>>intr.c:321 Jan 12 01:25:37.341571 (XEN)
>>
>>Jan 12 01:25:37.341603 (XEN)
>>Jan 12 01:25:37.341626 (XEN) Reboot in five seconds...
>>Jan 12 01:25:37.349566 (XEN) Resetting with ACPI MEMORY or I/O
>>RESET_REG.
>>
>>This is caused by "x86/apicv: fix RTC periodic timer and apicv issue".  It is
>>not a deterministic issue, as it appears to have survived a week of testing
>>already, but there is clearly something still problematic with the code.
>>
>
>
>Andrew,
>If you have, could you give more information? Such as the value of 
>intack.vector / pt_vector..
>I guess, the reason may be that the intack.vector is ' uint8_t ' and the 
>pt_vector is 'int'..
>
>Or there is a corner case that intack.vector is __not__ the highest priority 
>vector..
>
>Kevin / Jan,  any thoughts?
>
>Quan
>

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-01-12 Thread Jan Beulich
>>> On 12.01.17 at 13:15,  wrote:
> On 12/01/17 12:07, Xuquan (Quan Xu) wrote:
>> On January 12, 2017 5:14 PM, Andrew Cooper wrote:
>>> On 12/01/2017 06:46, osstest service owner wrote:
 flight 104131 xen-unstable real [real]
 http://logs.test-lab.xenproject.org/osstest/logs/104131/ 

 Regressions :-(

 Tests which did not succeed and are blocking, including tests which
 could not be run:
  test-amd64-i386-xl-qemuu-debianhvm-amd64 16 guest-stop   fail
>>> REGR. vs. 104119
>>>
>>> Jan 12 01:25:17.397607 (XEN) Assertion 'intack.vector >= pt_vector' failed 
>>> at
>>> intr.c:321
>>> Jan 12 01:25:37.133596 (XEN) [ Xen-4.9-unstable  x86_64  debug=y
>>> Not tainted ]
>>> Jan 12 01:25:37.141577 (XEN) CPU:14
>>> Jan 12 01:25:37.141607 (XEN) RIP:e008:[]
>>> vmx_intr_assist+0x35e/0x51d
>>> Jan 12 01:25:37.149617 (XEN) RFLAGS: 00010202   CONTEXT:
>>> hypervisor (d15v0)
>>> Jan 12 01:25:37.149655 (XEN) rax: 0038   rbx:
>>> 830079e1e000   rcx: 0030
>>> Jan 12 01:25:37.157582 (XEN) rdx:    rsi:
>>> 0030   rdi: 830079e1e000
>>> Jan 12 01:25:37.165584 (XEN) rbp: 83047de2ff08   rsp: 83047de2fea8
>>> r8:  82c00022f000
>>> Jan 12 01:25:37.173579 (XEN) r9:  8301b63ede80   r10:
>>> 830176386560   r11: 01955ee79bd0
>>> Jan 12 01:25:37.181582 (XEN) r12: 3002   r13:
>>> 3002   r14: 0030
>>> Jan 12 01:25:37.189584 (XEN) r15: 83023fec2000   cr0:
>>> 80050033   cr4: 003526e0
>>> Jan 12 01:25:37.197572 (XEN) cr3: 000232edb000   cr2:
>>> 02487034
>>> Jan 12 01:25:37.205569 (XEN) ds:    es:    fs:    gs: 
>>> ss:    cs: e008
>>> Jan 12 01:25:37.205606 (XEN) Xen code around 
>>> (vmx_intr_assist+0x35e/0x51d):
>>> Jan 12 01:25:37.213575 (XEN)  41 0f b6 f6 39 f0 7e 02 <0f> 0b 48 89 df e8 51
>>> 20 00 00 b8 10 08 00 00 0f Jan 12 01:25:37.221561 (XEN) Xen stack trace
>> >from rsp=83047de2fea8:
>>> Jan 12 01:25:37.229600 (XEN)82d08031aa80 0038
>>> 83047de2 83023fec2000
>>> Jan 12 01:25:37.237594 (XEN)83047de2fef8 82d080130cb6
>>> 830079e1e000 830079e1e000
>>> Jan 12 01:25:37.245588 (XEN)83007bae2000 000e
>>> 830233117000 83023fec2000
>>> Jan 12 01:25:37.253594 (XEN)83047de2fdc0 82d0801fdeb1
>>> 0004 00c2
>>> Jan 12 01:25:37.261584 (XEN)0020 0007
>>> 8800e8d28000 81add0a0
>>> Jan 12 01:25:37.269607 (XEN)0246 
>>> 88014248 0004
>>> Jan 12 01:25:37.277580 (XEN)0036 
>>> 03f8 03f8
>>> Jan 12 01:25:37.285584 (XEN)81add0a0 beefbeef
>>> 813899a4 00bfbeef
>>> Jan 12 01:25:37.293567 (XEN)0002 880147c03e08
>>> beef 1cec835356e5beef
>>> Jan 12 01:25:37.293606 (XEN)085d8b002674beef 01dcb38b000cbeef
>>> 8914458d3174beef 2444c71e
>>> Jan 12 01:25:37.301586 (XEN)830079e1e000 0031bfc37600
>>> 003526e0
>>> Jan 12 01:25:37.309607 (XEN) Xen call trace:
>>> Jan 12 01:25:37.309639 (XEN)[]
>>> vmx_intr_assist+0x35e/0x51d
>>> Jan 12 01:25:37.317591 (XEN)[]
>>> vmx_asm_vmexit_handler+0x41/0x120
>>> Jan 12 01:25:37.325598 (XEN)
>>> Jan 12 01:25:37.325624 (XEN)
>>> Jan 12 01:25:37.325647 (XEN)
>>> 
>>> Jan 12 01:25:37.333653 (XEN) Panic on CPU 14:
>>> Jan 12 01:25:37.333684 (XEN) Assertion 'intack.vector >= pt_vector' failed 
>>> at
>>> intr.c:321 Jan 12 01:25:37.341571 (XEN)
>>> 
>>> Jan 12 01:25:37.341603 (XEN)
>>> Jan 12 01:25:37.341626 (XEN) Reboot in five seconds...
>>> Jan 12 01:25:37.349566 (XEN) Resetting with ACPI MEMORY or I/O
>>> RESET_REG.
>>>
>>> This is caused by "x86/apicv: fix RTC periodic timer and apicv issue".  It 
>>> is
>>> not a deterministic issue, as it appears to have survived a week of testing
>>> already, but there is clearly something still problematic with the code.
>>>
>>
>> Andrew,
>> If you have, could you give more information?
> 
> No further information sorry.  This was found by the automated test system.

But some can be gathered:

> Full logs are available from
> http://logs.test-lab.xenproject.org/osstest/logs/104131/test-amd64-i386-xl-q 
> emuu-debianhvm-amd64/
> but I doubt any of them will help in diagnosing the issue any further.
> 
>> Such as the value of intack.vector / pt_vector..

At least one of the two values is likely to live in a register, and
hence its value would be available in the dump. It just takes looking
at the disassembly.

>> I guess, the reason may be that the intack.vector is ' uint8_t ' and the 
>> pt_vector is 'int'..

That would be odd.
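
As a hedged illustration of why a 'uint8_t' versus 'int' mismatch on its own would be a surprising cause: in C the uint8_t operand is promoted to int before the comparison, so the check behaves like a plain signed comparison. The function name vector_ge is hypothetical; only the two types come from the discussion above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the asserted condition
 * 'intack.vector >= pt_vector' with the types mentioned in the thread.
 * The uint8_t argument is promoted to int, so no wrap-around or
 * sign surprise is introduced by the mixed types themselves.
 */
static int vector_ge(uint8_t intack_vector, int pt_vector)
{
    return intack_vector >= pt_vector;
}
```

For example, vector_ge(0x30, 0x38) is 0 and vector_ge(0xff, 0x38) is 1, exactly as the numeric values suggest, so the failure points at the values rather than at the types.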

>> Or there is a corner case that intack.vector is __not__ the highest 

Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-01-12 Thread Andrew Cooper
On 12/01/17 12:07, Xuquan (Quan Xu) wrote:
> On January 12, 2017 5:14 PM, Andrew Cooper wrote:
>> On 12/01/2017 06:46, osstest service owner wrote:
>>> flight 104131 xen-unstable real [real]
>>> http://logs.test-lab.xenproject.org/osstest/logs/104131/
>>>
>>> Regressions :-(
>>>
>>> Tests which did not succeed and are blocking, including tests which
>>> could not be run:
>>>  test-amd64-i386-xl-qemuu-debianhvm-amd64 16 guest-stop   fail
>> REGR. vs. 104119
>>
>> Jan 12 01:25:17.397607 (XEN) Assertion 'intack.vector >= pt_vector' failed at
>> intr.c:321
>> Jan 12 01:25:37.133596 (XEN) [ Xen-4.9-unstable  x86_64  debug=y
>> Not tainted ]
>> Jan 12 01:25:37.141577 (XEN) CPU:14
>> Jan 12 01:25:37.141607 (XEN) RIP:e008:[]
>> vmx_intr_assist+0x35e/0x51d
>> Jan 12 01:25:37.149617 (XEN) RFLAGS: 00010202   CONTEXT:
>> hypervisor (d15v0)
>> Jan 12 01:25:37.149655 (XEN) rax: 0038   rbx:
>> 830079e1e000   rcx: 0030
>> Jan 12 01:25:37.157582 (XEN) rdx:    rsi:
>> 0030   rdi: 830079e1e000
>> Jan 12 01:25:37.165584 (XEN) rbp: 83047de2ff08   rsp: 83047de2fea8
>> r8:  82c00022f000
>> Jan 12 01:25:37.173579 (XEN) r9:  8301b63ede80   r10:
>> 830176386560   r11: 01955ee79bd0
>> Jan 12 01:25:37.181582 (XEN) r12: 3002   r13:
>> 3002   r14: 0030
>> Jan 12 01:25:37.189584 (XEN) r15: 83023fec2000   cr0:
>> 80050033   cr4: 003526e0
>> Jan 12 01:25:37.197572 (XEN) cr3: 000232edb000   cr2:
>> 02487034
>> Jan 12 01:25:37.205569 (XEN) ds:    es:    fs:    gs: 
>> ss:    cs: e008
>> Jan 12 01:25:37.205606 (XEN) Xen code around 
>> (vmx_intr_assist+0x35e/0x51d):
>> Jan 12 01:25:37.213575 (XEN)  41 0f b6 f6 39 f0 7e 02 <0f> 0b 48 89 df e8 51
>> 20 00 00 b8 10 08 00 00 0f Jan 12 01:25:37.221561 (XEN) Xen stack trace
> >from rsp=83047de2fea8:
>> Jan 12 01:25:37.229600 (XEN)82d08031aa80 0038
>> 83047de2 83023fec2000
>> Jan 12 01:25:37.237594 (XEN)83047de2fef8 82d080130cb6
>> 830079e1e000 830079e1e000
>> Jan 12 01:25:37.245588 (XEN)83007bae2000 000e
>> 830233117000 83023fec2000
>> Jan 12 01:25:37.253594 (XEN)83047de2fdc0 82d0801fdeb1
>> 0004 00c2
>> Jan 12 01:25:37.261584 (XEN)0020 0007
>> 8800e8d28000 81add0a0
>> Jan 12 01:25:37.269607 (XEN)0246 
>> 88014248 0004
>> Jan 12 01:25:37.277580 (XEN)0036 
>> 03f8 03f8
>> Jan 12 01:25:37.285584 (XEN)81add0a0 beefbeef
>> 813899a4 00bfbeef
>> Jan 12 01:25:37.293567 (XEN)0002 880147c03e08
>> beef 1cec835356e5beef
>> Jan 12 01:25:37.293606 (XEN)085d8b002674beef 01dcb38b000cbeef
>> 8914458d3174beef 2444c71e
>> Jan 12 01:25:37.301586 (XEN)830079e1e000 0031bfc37600
>> 003526e0
>> Jan 12 01:25:37.309607 (XEN) Xen call trace:
>> Jan 12 01:25:37.309639 (XEN)[]
>> vmx_intr_assist+0x35e/0x51d
>> Jan 12 01:25:37.317591 (XEN)[]
>> vmx_asm_vmexit_handler+0x41/0x120
>> Jan 12 01:25:37.325598 (XEN)
>> Jan 12 01:25:37.325624 (XEN)
>> Jan 12 01:25:37.325647 (XEN)
>> 
>> Jan 12 01:25:37.333653 (XEN) Panic on CPU 14:
>> Jan 12 01:25:37.333684 (XEN) Assertion 'intack.vector >= pt_vector' failed at
>> intr.c:321 Jan 12 01:25:37.341571 (XEN)
>> 
>> Jan 12 01:25:37.341603 (XEN)
>> Jan 12 01:25:37.341626 (XEN) Reboot in five seconds...
>> Jan 12 01:25:37.349566 (XEN) Resetting with ACPI MEMORY or I/O
>> RESET_REG.
>>
>> This is caused by "x86/apicv: fix RTC periodic timer and apicv issue".  It is
>> not a deterministic issue, as it appears to have survived a week of testing
>> already, but there is clearly something still problematic with the code.
>>
>
> Andrew,
> If you have, could you give more information?

No further information sorry.  This was found by the automated test system.

Full logs are available from
http://logs.test-lab.xenproject.org/osstest/logs/104131/test-amd64-i386-xl-qemuu-debianhvm-amd64/
but I doubt any of them will help in diagnosing the issue any further.

> Such as the value of intack.vector / pt_vector..
> I guess, the reason may be that the intack.vector is ' uint8_t ' and the 
> pt_vector is 'int'..
>
> Or there is a corner case that intack.vector is __not__ the highest priority 
> vector..
>
> Kevin / Jan,  any thoughts?

It happened during domain shutdown.  It might be an edge case
interaction with qemu-raised interrupts via hypercall?

~Andrew


Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-01-12 Thread Xuquan (Quan Xu)
On January 12, 2017 5:14 PM, Andrew Cooper wrote:
>On 12/01/2017 06:46, osstest service owner wrote:
>> flight 104131 xen-unstable real [real]
>> http://logs.test-lab.xenproject.org/osstest/logs/104131/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking, including tests which
>> could not be run:
>>  test-amd64-i386-xl-qemuu-debianhvm-amd64 16 guest-stop   fail
>REGR. vs. 104119
>
>Jan 12 01:25:17.397607 (XEN) Assertion 'intack.vector >= pt_vector' failed at
>intr.c:321
>Jan 12 01:25:37.133596 (XEN) [ Xen-4.9-unstable  x86_64  debug=y
>Not tainted ]
>Jan 12 01:25:37.141577 (XEN) CPU:14
>Jan 12 01:25:37.141607 (XEN) RIP:e008:[]
>vmx_intr_assist+0x35e/0x51d
>Jan 12 01:25:37.149617 (XEN) RFLAGS: 00010202   CONTEXT:
>hypervisor (d15v0)
>Jan 12 01:25:37.149655 (XEN) rax: 0038   rbx:
>830079e1e000   rcx: 0030
>Jan 12 01:25:37.157582 (XEN) rdx:    rsi:
>0030   rdi: 830079e1e000
>Jan 12 01:25:37.165584 (XEN) rbp: 83047de2ff08   rsp: 83047de2fea8
>r8:  82c00022f000
>Jan 12 01:25:37.173579 (XEN) r9:  8301b63ede80   r10:
>830176386560   r11: 01955ee79bd0
>Jan 12 01:25:37.181582 (XEN) r12: 3002   r13:
>3002   r14: 0030
>Jan 12 01:25:37.189584 (XEN) r15: 83023fec2000   cr0:
>80050033   cr4: 003526e0
>Jan 12 01:25:37.197572 (XEN) cr3: 000232edb000   cr2:
>02487034
>Jan 12 01:25:37.205569 (XEN) ds:    es:    fs:    gs: 
>ss:    cs: e008
>Jan 12 01:25:37.205606 (XEN) Xen code around 
>(vmx_intr_assist+0x35e/0x51d):
>Jan 12 01:25:37.213575 (XEN)  41 0f b6 f6 39 f0 7e 02 <0f> 0b 48 89 df e8 51
>20 00 00 b8 10 08 00 00 0f Jan 12 01:25:37.221561 (XEN) Xen stack trace
>from rsp=83047de2fea8:
>Jan 12 01:25:37.229600 (XEN)82d08031aa80 0038
>83047de2 83023fec2000
>Jan 12 01:25:37.237594 (XEN)83047de2fef8 82d080130cb6
>830079e1e000 830079e1e000
>Jan 12 01:25:37.245588 (XEN)83007bae2000 000e
>830233117000 83023fec2000
>Jan 12 01:25:37.253594 (XEN)83047de2fdc0 82d0801fdeb1
>0004 00c2
>Jan 12 01:25:37.261584 (XEN)0020 0007
>8800e8d28000 81add0a0
>Jan 12 01:25:37.269607 (XEN)0246 
>88014248 0004
>Jan 12 01:25:37.277580 (XEN)0036 
>03f8 03f8
>Jan 12 01:25:37.285584 (XEN)81add0a0 beefbeef
>813899a4 00bfbeef
>Jan 12 01:25:37.293567 (XEN)0002 880147c03e08
>beef 1cec835356e5beef
>Jan 12 01:25:37.293606 (XEN)085d8b002674beef 01dcb38b000cbeef
>8914458d3174beef 2444c71e
>Jan 12 01:25:37.301586 (XEN)830079e1e000 0031bfc37600
>003526e0
>Jan 12 01:25:37.309607 (XEN) Xen call trace:
>Jan 12 01:25:37.309639 (XEN)[]
>vmx_intr_assist+0x35e/0x51d
>Jan 12 01:25:37.317591 (XEN)[]
>vmx_asm_vmexit_handler+0x41/0x120
>Jan 12 01:25:37.325598 (XEN)
>Jan 12 01:25:37.325624 (XEN)
>Jan 12 01:25:37.325647 (XEN)
>
>Jan 12 01:25:37.333653 (XEN) Panic on CPU 14:
>Jan 12 01:25:37.333684 (XEN) Assertion 'intack.vector >= pt_vector' failed at
>intr.c:321 Jan 12 01:25:37.341571 (XEN)
>
>Jan 12 01:25:37.341603 (XEN)
>Jan 12 01:25:37.341626 (XEN) Reboot in five seconds...
>Jan 12 01:25:37.349566 (XEN) Resetting with ACPI MEMORY or I/O
>RESET_REG.
>
>This is caused by "x86/apicv: fix RTC periodic timer and apicv issue".  It is
>not a deterministic issue, as it appears to have survived a week of testing
>already, but there is clearly something still problematic with the code.
>


Andrew,
If you have any, could you give more information, such as the values of
intack.vector and pt_vector?
I guess the reason may be that intack.vector is 'uint8_t' while pt_vector
is 'int'.

Or there may be a corner case in which intack.vector is __not__ the highest
priority vector.

Kevin / Jan, any thoughts?

Quan


Re: [Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-01-12 Thread Andrew Cooper
On 12/01/2017 06:46, osstest service owner wrote:
> flight 104131 xen-unstable real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/104131/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-amd64-i386-xl-qemuu-debianhvm-amd64 16 guest-stop   fail REGR. vs. 
> 104119

Jan 12 01:25:17.397607 (XEN) Assertion 'intack.vector >= pt_vector' failed at 
intr.c:321
Jan 12 01:25:37.133596 (XEN) [ Xen-4.9-unstable  x86_64  debug=y   Not 
tainted ]
Jan 12 01:25:37.141577 (XEN) CPU:14
Jan 12 01:25:37.141607 (XEN) RIP:e008:[] 
vmx_intr_assist+0x35e/0x51d
Jan 12 01:25:37.149617 (XEN) RFLAGS: 00010202   CONTEXT: hypervisor 
(d15v0)
Jan 12 01:25:37.149655 (XEN) rax: 0038   rbx: 830079e1e000   
rcx: 0030
Jan 12 01:25:37.157582 (XEN) rdx:    rsi: 0030   
rdi: 830079e1e000
Jan 12 01:25:37.165584 (XEN) rbp: 83047de2ff08   rsp: 83047de2fea8   
r8:  82c00022f000
Jan 12 01:25:37.173579 (XEN) r9:  8301b63ede80   r10: 830176386560   
r11: 01955ee79bd0
Jan 12 01:25:37.181582 (XEN) r12: 3002   r13: 3002   
r14: 0030
Jan 12 01:25:37.189584 (XEN) r15: 83023fec2000   cr0: 80050033   
cr4: 003526e0
Jan 12 01:25:37.197572 (XEN) cr3: 000232edb000   cr2: 02487034
Jan 12 01:25:37.205569 (XEN) ds:    es:    fs:    gs:    ss: 
   cs: e008
Jan 12 01:25:37.205606 (XEN) Xen code around  
(vmx_intr_assist+0x35e/0x51d):
Jan 12 01:25:37.213575 (XEN)  41 0f b6 f6 39 f0 7e 02 <0f> 0b 48 89 df e8 51 20 
00 00 b8 10 08 00 00 0f
Jan 12 01:25:37.221561 (XEN) Xen stack trace from rsp=83047de2fea8:
Jan 12 01:25:37.229600 (XEN)82d08031aa80 0038 
83047de2 83023fec2000
Jan 12 01:25:37.237594 (XEN)83047de2fef8 82d080130cb6 
830079e1e000 830079e1e000
Jan 12 01:25:37.245588 (XEN)83007bae2000 000e 
830233117000 83023fec2000
Jan 12 01:25:37.253594 (XEN)83047de2fdc0 82d0801fdeb1 
0004 00c2
Jan 12 01:25:37.261584 (XEN)0020 0007 
8800e8d28000 81add0a0
Jan 12 01:25:37.269607 (XEN)0246  
88014248 0004
Jan 12 01:25:37.277580 (XEN)0036  
03f8 03f8
Jan 12 01:25:37.285584 (XEN)81add0a0 beefbeef 
813899a4 00bfbeef
Jan 12 01:25:37.293567 (XEN)0002 880147c03e08 
beef 1cec835356e5beef
Jan 12 01:25:37.293606 (XEN)085d8b002674beef 01dcb38b000cbeef 
8914458d3174beef 2444c71e
Jan 12 01:25:37.301586 (XEN)830079e1e000 0031bfc37600 
003526e0
Jan 12 01:25:37.309607 (XEN) Xen call trace:
Jan 12 01:25:37.309639 (XEN)[] vmx_intr_assist+0x35e/0x51d
Jan 12 01:25:37.317591 (XEN)[] 
vmx_asm_vmexit_handler+0x41/0x120
Jan 12 01:25:37.325598 (XEN) 
Jan 12 01:25:37.325624 (XEN) 
Jan 12 01:25:37.325647 (XEN) 
Jan 12 01:25:37.333653 (XEN) Panic on CPU 14:
Jan 12 01:25:37.333684 (XEN) Assertion 'intack.vector >= pt_vector' failed at 
intr.c:321
Jan 12 01:25:37.341571 (XEN) 
Jan 12 01:25:37.341603 (XEN) 
Jan 12 01:25:37.341626 (XEN) Reboot in five seconds...
Jan 12 01:25:37.349566 (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.

This is caused by "x86/apicv: fix RTC periodic timer and apicv issue".  It is 
not a deterministic issue, as it appears to have survived a week of testing 
already, but there is clearly something still problematic with the code.

~Andrew



[Xen-devel] [xen-unstable test] 104131: regressions - FAIL

2017-01-11 Thread osstest service owner
flight 104131 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/104131/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-qemuu-debianhvm-amd64 16 guest-stop   fail REGR. vs. 104119

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds     15 guest-start/debian.repeat    fail REGR. vs. 104119
 test-armhf-armhf-libvirt-xsm 13 saverestore-support-check    fail  like 104104
 test-armhf-armhf-libvirt     13 saverestore-support-check    fail  like 104119
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop            fail like 104119
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop            fail like 104119
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop           fail like 104119
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop           fail like 104119
 test-armhf-armhf-libvirt-qcow2 12 saverestore-support-check  fail like 104119
 test-armhf-armhf-libvirt-raw 12 saverestore-support-check    fail  like 104119
 test-amd64-amd64-xl-rtds      9 debian-install               fail  like 104119

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-amd  11 guest-start                  fail   never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start                 fail  never pass
 test-amd64-i386-libvirt      12 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt-xsm  12 migrate-support-check        fail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-check        fail   never pass
 test-amd64-amd64-libvirt     12 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 10 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale  12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-arndale  13 saverestore-support-check    fail   never pass
 test-amd64-amd64-libvirt-vhd 11 migrate-support-check        fail   never pass
 test-amd64-amd64-qemuu-nested-amd 16 debian-hvm-install/l1/l2  fail never pass
 test-armhf-armhf-xl-xsm      12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-xsm      13 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-check      fail never pass
 test-armhf-armhf-libvirt     12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit2  13 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-cubietruck 13 saverestore-support-check  fail never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-check        fail   never pass
 test-armhf-armhf-libvirt-qcow2 11 migrate-support-check      fail never pass
 test-armhf-armhf-xl-rtds     12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-rtds     13 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt-raw 11 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-vhd      11 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-vhd      12 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-check       fail  never pass
 test-armhf-armhf-xl-multivcpu 13 saverestore-support-check   fail  never pass
 test-armhf-armhf-xl          12 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          13 saverestore-support-check    fail   never pass

version targeted for testing:
 xen  0d045d65c19ac48b31344b566cbf82a0270e6e44
baseline version:
 xen  ffc103c223a6d12e5221f66b7e96396a61ba1b20

Last test of basis   104119  2017-01-11 06:45:46 Z    0 days
Failing since        104126  2017-01-11 16:44:54 Z    0 days    2 attempts
Testing same since   104131  2017-01-11 22:43:41 Z    0 days    1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Jan Beulich 
  Kevin Tian 
  Stefano Stabellini 
  Wei Liu 

jobs:
 build-amd64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64-xtf  pass
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt