Re: [Xen-devel] panic("queue invalidate wait descriptor was not executed\n")

2016-05-13 Thread Wu, Feng
> > > > This is what I've been remembering:
> > > > http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=84c340ba4c3eb99278b6ba885616bb183b88ad67
> > >
> > > The comment on this link describes exactly what I am experiencing.
> > > Thanks so much.
> >
> > Thanks Jan for providing the information above. Kelly, if you still met the
> > same issue after applying the patches, let us know, maybe I can consult some
> > hardware expert internally.
> 
> Turns out this was exactly my problem.  The description matched my symptoms
> and when I applied the patch the problem has gone away.
> Thanks,
> Kelly
> 

Good to hear this! :)

Thanks,
Feng

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] panic("queue invalidate wait descriptor was not executed\n")

2016-05-13 Thread Zytaruk, Kelly


> -----Original Message-----
> From: Wu, Feng [mailto:feng...@intel.com]
> Sent: Friday, May 13, 2016 3:11 AM
> To: Zytaruk, Kelly; Jan Beulich
> Cc: Tian, Kevin; xen-devel@lists.xen.org; Wu, Feng
> Subject: RE: [Xen-devel] panic("queue invalidate wait descriptor was not
> executed\n")
> 
> 
> 
> > -----Original Message-----
> > From: Zytaruk, Kelly [mailto:kelly.zyta...@amd.com]
> > Sent: Thursday, May 12, 2016 10:21 PM
> > To: Jan Beulich <jbeul...@suse.com>
> > Cc: Wu, Feng <feng...@intel.com>; Tian, Kevin <kevin.t...@intel.com>;
> > xen-de...@lists.xen.org
> > Subject: RE: [Xen-devel] panic("queue invalidate wait descriptor was not
> > executed\n")
> >
> >
> >
> > > -----Original Message-----
> > > From: Jan Beulich [mailto:jbeul...@suse.com]
> > > Sent: Thursday, May 12, 2016 9:51 AM
> > > To: Zytaruk, Kelly
> > > Cc: Feng Wu; Kevin Tian; xen-devel@lists.xen.org
> > > Subject: RE: [Xen-devel] panic("queue invalidate wait descriptor was not
> > > executed\n")
> > >
> > > >>> On 12.05.16 at 14:36, <kelly.zyta...@amd.com> wrote:
> > > >> From: Jan Beulich [mailto:jbeul...@suse.com]
> > > >> Sent: Thursday, May 12, 2016 5:49 AM
> > > >> >>> On 11.05.16 at 15:51, <kelly.zyta...@amd.com> wrote:
> > > >> > During Xen boot I am seeing the panic in the subject line from
> > > >> > .../xen/drivers/passthrough/vgt/qinval.c
> > > >>
> > > >> And this is with current staging, or some much older version of Xen?
> > > >> (ISTR some issue with the invalidation request getting sent to
> > > >> the wrong IOMMU, leading to a timeout.)
> > > >
> > > > No this is not current Xen, it is with 4.2.
> > > >
> > > > Can you tell me more about the invalidation request getting sent
> > > > to the wrong IOMMU problem and approximately when it was fixed?
> > > > If you could identify the patch I could back port it into my copy of
> > > > Xen for testing.
> > >
> > > Note that 4.2.5 has said change, and also note that you could have
> > > done exactly what I have done now - go through the list of commits
> > > altering files in the vtd/ subtree.
> >
> > Unfortunately GIT is not my strong suit :( I am still learning to
> > navigate with it. I guess part of my problem with GIT is that I don't yet
> > know what I don't know.
> >
> > > This is what I've been remembering:
> > > http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=84c340ba4c3eb99278b6ba885616bb183b88ad67
> >
> > The comment on this link describes exactly what I am experiencing.
> > Thanks so much.
> 
> Thanks Jan for providing the information above. Kelly, if you still met the
> same issue after applying the patches, let us know, maybe I can consult some
> hardware expert internally.

It turns out this was exactly my problem.  The description matched my symptoms,
and once I applied the patch the problem went away.
Thanks,
Kelly

> 
> Thanks,
> Feng
> 
> >
> > >
> > > Jan




Re: [Xen-devel] panic("queue invalidate wait descriptor was not executed\n")

2016-05-13 Thread Wu, Feng


> -----Original Message-----
> From: Zytaruk, Kelly [mailto:kelly.zyta...@amd.com]
> Sent: Thursday, May 12, 2016 10:21 PM
> To: Jan Beulich <jbeul...@suse.com>
> Cc: Wu, Feng <feng...@intel.com>; Tian, Kevin <kevin.t...@intel.com>;
> xen-de...@lists.xen.org
> Subject: RE: [Xen-devel] panic("queue invalidate wait descriptor was not
> executed\n")
> 
> 
> 
> > -----Original Message-----
> > From: Jan Beulich [mailto:jbeul...@suse.com]
> > Sent: Thursday, May 12, 2016 9:51 AM
> > To: Zytaruk, Kelly
> > Cc: Feng Wu; Kevin Tian; xen-devel@lists.xen.org
> > Subject: RE: [Xen-devel] panic("queue invalidate wait descriptor was not
> > executed\n")
> >
> > >>> On 12.05.16 at 14:36, <kelly.zyta...@amd.com> wrote:
> > >> From: Jan Beulich [mailto:jbeul...@suse.com]
> > >> Sent: Thursday, May 12, 2016 5:49 AM
> > >> >>> On 11.05.16 at 15:51, <kelly.zyta...@amd.com> wrote:
> > >> > During Xen boot I am seeing the panic in the subject line from
> > >> > .../xen/drivers/passthrough/vgt/qinval.c
> > >>
> > >> And this is with current staging, or some much older version of Xen?
> > >> (ISTR some issue with the invalidation request getting sent to the
> > >> wrong IOMMU, leading to a timeout.)
> > >
> > > No this is not current Xen, it is with 4.2.
> > >
> > > Can you tell me more about the invalidation request getting sent to
> > > the wrong IOMMU problem and approximately when it was fixed?  If you
> > > could identify the patch I could back port it into my copy of Xen for
> > > testing.
> >
> > Note that 4.2.5 has said change, and also note that you could have done
> > exactly what I have done now - go through the list of commits altering
> > files in the vtd/ subtree.
> 
> Unfortunately GIT is not my strong suit :( I am still learning to navigate
> with it. I guess part of my problem with GIT is that I don't yet know what
> I don't know.
>
> > This is what I've been remembering:
> > http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=84c340ba4c3eb99278b6ba885616bb183b88ad67
> 
> The comment on this link describes exactly what I am experiencing.
> Thanks so much.

Thanks, Jan, for providing the information above. Kelly, if you still see the
same issue after applying the patch, let us know; maybe I can consult a hardware
expert internally.

Thanks,
Feng

> 
> >
> > Jan




Re: [Xen-devel] panic("queue invalidate wait descriptor was not executed\n")

2016-05-12 Thread Zytaruk, Kelly


> -----Original Message-----
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Thursday, May 12, 2016 9:51 AM
> To: Zytaruk, Kelly
> Cc: Feng Wu; Kevin Tian; xen-devel@lists.xen.org
> Subject: RE: [Xen-devel] panic("queue invalidate wait descriptor was not
> executed\n")
> 
> >>> On 12.05.16 at 14:36, <kelly.zyta...@amd.com> wrote:
> >> From: Jan Beulich [mailto:jbeul...@suse.com]
> >> Sent: Thursday, May 12, 2016 5:49 AM
> >> >>> On 11.05.16 at 15:51, <kelly.zyta...@amd.com> wrote:
> >> > During Xen boot I am seeing the panic in the subject line from
> >> > .../xen/drivers/passthrough/vgt/qinval.c
> >>
> >> And this is with current staging, or some much older version of Xen?
> >> (ISTR some issue with the invalidation request getting sent to the
> >> wrong IOMMU, leading to a timeout.)
> >
> > No this is not current Xen, it is with 4.2.
> >
> > Can you tell me more about the invalidation request getting sent to
> > the wrong IOMMU problem and approximately when it was fixed?  If you
> > could identify the patch I could back port it into my copy of Xen for
> > testing.
>
> Note that 4.2.5 has said change, and also note that you could have done
> exactly what I have done now - go through the list of commits altering files
> in the vtd/ subtree.

Unfortunately, Git is not my strong suit :( I am still learning to navigate it.
I guess part of my problem with Git is that I don't yet know what I don't
know.

> This is what I've been remembering:
> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=84c340ba4c3eb99278b6ba885616bb183b88ad67

The comment on this link describes exactly what I am experiencing.
Thanks so much.

> 
> Jan




Re: [Xen-devel] panic("queue invalidate wait descriptor was not executed\n")

2016-05-12 Thread Jan Beulich
>>> On 12.05.16 at 14:36, <kelly.zyta...@amd.com> wrote:
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Thursday, May 12, 2016 5:49 AM
>> >>> On 11.05.16 at 15:51, <kelly.zyta...@amd.com> wrote:
>> > During Xen boot I am seeing the panic in the subject line from
>> > .../xen/drivers/passthrough/vgt/qinval.c
>> 
>> And this is with current staging, or some much older version of Xen?
>> (ISTR some issue with the invalidation request getting sent to the wrong
>> IOMMU, leading to a timeout.)
> 
> No this is not current Xen, it is with 4.2.
> 
> Can you tell me more about the invalidation request getting sent to the 
> wrong IOMMU problem and approximately when it was fixed?  If you could 
> identify the patch I could back port it into my copy of Xen for testing.

Note that 4.2.5 has said change, and also note that you could have
done exactly what I have done now - go through the list of commits
altering files in the vtd/ subtree. This is what I've been remembering:
http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=84c340ba4c3eb99278b6ba885616bb183b88ad67

Jan
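
The suggestion above, going through the list of commits that alter files in the vtd/ subtree, can be sketched as follows. This is a minimal sketch and not part of the original exchange: it assumes a local clone of xen.git, that `git` is on the PATH, and that the VT-d code sits under xen/drivers/passthrough/vtd/ in the tree. The helper names and the revision range in the usage note are illustrative assumptions.

```python
import subprocess

# Assumed location of the VT-d code inside a xen.git checkout.
VTD_PATH = "xen/drivers/passthrough/vtd/"

# Commit referenced in the gitweb link above.
FIX_COMMIT = "84c340ba4c3eb99278b6ba885616bb183b88ad67"


def vtd_log_command(revision_range=None, path=VTD_PATH):
    """Build the `git log` argv listing commits that altered files under path."""
    argv = ["git", "log", "--oneline"]
    if revision_range:
        argv.append(revision_range)
    # "--" separates revisions from paths so neither is misread as the other.
    argv += ["--", path]
    return argv


def tags_containing_command(commit=FIX_COMMIT):
    """Build the `git tag` argv listing tags whose history includes commit."""
    return ["git", "tag", "--contains", commit]


def run_in_repo(argv, repo_dir):
    """Run a git command inside repo_dir and return its output lines."""
    result = subprocess.run(
        argv, cwd=repo_dir, check=True, capture_output=True, text=True
    )
    return result.stdout.splitlines()
```

For example, `run_in_repo(vtd_log_command("RELEASE-4.2.0..HEAD"), "/path/to/xen")` would list the VT-d commits made after that (assumed) tag on the checked-out branch. `run_in_repo(tags_containing_command(), "/path/to/xen")` lists the tags that contain that exact commit; a backport to a stable branch would normally carry a different hash, so it may not show up this way.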




Re: [Xen-devel] panic("queue invalidate wait descriptor was not executed\n")

2016-05-12 Thread Zytaruk, Kelly


> -----Original Message-----
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Thursday, May 12, 2016 5:49 AM
> To: Zytaruk, Kelly
> Cc: Feng Wu; Kevin Tian; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] panic("queue invalidate wait descriptor was not
> executed\n")
> 
> >>> On 11.05.16 at 15:51, <kelly.zyta...@amd.com> wrote:
> > During Xen boot I am seeing the panic in the subject line from
> > .../xen/drivers/passthrough/vgt/qinval.c
> 
> And this is with current staging, or some much older version of Xen?
> (ISTR some issue with the invalidation request getting sent to the wrong
> IOMMU, leading to a timeout.)

No this is not current Xen, it is with 4.2.

Can you tell me more about the invalidation request getting sent to the wrong 
IOMMU problem and approximately when it was fixed?  If you could identify the 
patch I could back port it into my copy of Xen for testing.

This is a NUMA system with 2 IOMMUs.
I have 4 devices on 2 PCIe cards (2 per card).
They reside at the following locations: 3:0.0, 5:0.0, 83:0.0 and 85:0.0.
From what I understand about NUMA, based on the BDFs, 2 devices should be on
one IOMMU and the other 2 should be on the other IOMMU.

I put in some more print statements last night and discovered that during boot
Xen attaches all 4 devices to the same IOMMU structure. Xen sends out a flush
to all 4 devices on the first IOMMU and then follows it with a Wait
invalidation packet to the same IOMMU.  Below is what I am seeing:

(XEN) IOMMU LIST - List of defined IOMMU structures
(XEN) iommu[00] @ 83103fffa5c0, Q=2060c04002, HEAD=90, TAIL=90
(XEN) Seq Num = 0, pt_levels = 4, cap = 0x00d2078c106f0466, ecap = 0x00f020df, domid_bitmap = 1, domid_map=0x0
(XEN) iommu[01] @ 83103fffa790, Q=103ffec002, HEAD=bd0, TAIL=bd0
(XEN) Seq Num = 1, pt_levels = 4, cap = 0x00d2078c106f0466, ecap = 0x00f020df, domid_bitmap = 1, domid_map=0x0

(XEN) gen_dev_iotlb_inv_dsc - DEVICE IOTLB Descriptor 0x7001 0x8303 for 83:00.0 (index = 9), iommu = 83103fffa5c0, fault = 0x
(XEN) gen_dev_iotlb_inv_dsc - DEVICE IOTLB Descriptor 0x7001 0x8103 for 81:00.0 (index = 10), iommu = 83103fffa5c0, fault = 0x
(XEN) gen_dev_iotlb_inv_dsc - DEVICE IOTLB Descriptor 0x7001 0x0503 for 05:00.0 (index = 11), iommu = 83103fffa5c0, fault = 0x
(XEN) gen_dev_iotlb_inv_dsc - DEVICE IOTLB Descriptor 0x7001 0x0303 for 03:00.0 (index = 12), iommu = 83103fffa5c0, fault = 0x
(XEN) queue_invalidate_wait (iommu = 83103fffa5c0)
(XEN) 
(XEN) Panic on CPU 0:
(XEN) queue invalidate wait descriptor was not executed
(XEN) 

Is it a bug to have all 4 devices on the same IOMMU?  Is this why the Wait
Invalidation is failing?
Actually, I am not sure whether Xen is attaching all 4 devices to the same IOMMU
or whether it is generating the dev iotlb descriptors incorrectly.
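
The question of which IOMMU structure each descriptor was queued on can be checked mechanically from the print statements above. The following is a minimal sketch, assuming the log format shown in this message with wrapped lines rejoined; the parsing helper and its name are hypothetical. It only tallies what the log reports and says nothing about which IOMMU the devices ought to be behind.

```python
import re

# Lines taken from the boot log above, with wrapped lines rejoined.
# Fields that the list archive truncated are left exactly as they appear.
BOOT_LOG = """\
(XEN) iommu[00] @ 83103fffa5c0, Q=2060c04002, HEAD=90, TAIL=90
(XEN) iommu[01] @ 83103fffa790, Q=103ffec002, HEAD=bd0, TAIL=bd0
(XEN) gen_dev_iotlb_inv_dsc - DEVICE IOTLB Descriptor 0x7001 0x8303 for 83:00.0 (index = 9), iommu = 83103fffa5c0, fault = 0x
(XEN) gen_dev_iotlb_inv_dsc - DEVICE IOTLB Descriptor 0x7001 0x8103 for 81:00.0 (index = 10), iommu = 83103fffa5c0, fault = 0x
(XEN) gen_dev_iotlb_inv_dsc - DEVICE IOTLB Descriptor 0x7001 0x0503 for 05:00.0 (index = 11), iommu = 83103fffa5c0, fault = 0x
(XEN) gen_dev_iotlb_inv_dsc - DEVICE IOTLB Descriptor 0x7001 0x0303 for 03:00.0 (index = 12), iommu = 83103fffa5c0, fault = 0x
(XEN) queue_invalidate_wait (iommu = 83103fffa5c0)
"""

# "iommu[NN] @ <address>" lines declare the IOMMU structures.
IOMMU_RE = re.compile(r"iommu\[(\d+)\] @ ([0-9a-f]+)")
# Descriptor lines name a BDF and the IOMMU structure the descriptor went to.
DESC_RE = re.compile(
    r"for ([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) \(index = \d+\), iommu = ([0-9a-f]+)"
)


def devices_by_iommu(log):
    """Map each IOMMU structure address to the BDFs whose descriptors used it."""
    usage = {addr: [] for _, addr in IOMMU_RE.findall(log)}
    for bdf, addr in DESC_RE.findall(log):
        usage.setdefault(addr, []).append(bdf)
    return usage


for addr, bdfs in devices_by_iommu(BOOT_LOG).items():
    print(addr, bdfs or "no descriptors queued")
```

On this log every descriptor lands on the structure at 83103fffa5c0 and none on 83103fffa790, which matches the observation that all four devices are attached to one IOMMU structure.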

> 
> > From the Fault Status Register (= 0x40 (ITE)). I am seeing a hardware
> > timeout on the invalidate
> >
> > Disabling queued invalidation is not an option.  I need to find out
> > why the operation is timing out and fix it.
> >
> > I found two timeouts; one in software and one in hardware.
> > After the invalidate is submitted there is a wait packet submitted and
> > the boot software waits for the wait packet to complete in a loop with
> > a software timeout.  At the end of the software timeout it issues the
> > panic.  I can increase the software timeout but it still doesn't solve
> > the problem.  Just before the panic I dump the value of the Fault
> > Status Register and I see that the hardware has already timed out
> > (FSTS_REG = 0x40 = ITE = "Invalidation Timeout Error").  As a first
> > step in solving this I would like to increase the hardware timeout value.
> >
> > I have the Intel spec and I was reading from the spec...
> >
> > " Hardware starts an invalidation completion timer for this ITag, and
> > issues the invalidation request message to the specified endpoint. If
> > the invalidation command from software is for a first-level mapping,
> > the invalidation request message is generated with the appropriate
> > PASID prefix to identify the target PASID. The invalidation completion
> > time-out value is recommended to be sufficiently larger than the
> > PCI-Express read completion time-outs. "
> >
> > The above leads me to believe that there should be some way of setting
> > the invalidation completion time-out value.  Unfortunately I couldn't
> > find anything in the Intel spec that tells me how to set the "invalidation
> > com

Re: [Xen-devel] panic("queue invalidate wait descriptor was not executed\n")

2016-05-12 Thread Jan Beulich
>>> On 11.05.16 at 15:51, <kelly.zyta...@amd.com> wrote:
> During Xen boot I am seeing the panic in the subject line from 
> .../xen/drivers/passthrough/vgt/qinval.c

And this is with current staging, or some much older version of Xen?
(ISTR some issue with the invalidation request getting sent to the
wrong IOMMU, leading to a timeout.)

> From the Fault Status Register (= 0x40 (ITE)). I am seeing a hardware 
> timeout on the invalidate
> 
> Disabling queued invalidation is not an option.  I need to find out why the 
> operation is timing out and fix it.  
> 
> I found two timeouts; one in software and one in hardware. 
> After the invalidate is submitted there is a wait packet submitted and the 
> boot software waits for the wait packet to complete in a loop with a software 
> timeout.  At the end of the software timeout it issues the panic.  I can 
> increase the software timeout but it still doesn't solve the problem.  Just 
> before the panic I dump the value of the Fault Status Register and I see that 
> the hardware has already timed out (FSTS_REG = 0x40 = ITE = "Invalidation 
> Timeout Error").  As a first step in solving this I would like to increase 
> the hardware timeout value.
> 
> I have the Intel spec and I was reading from the spec...
> 
> " Hardware starts an invalidation completion timer for this ITag, and issues 
> the invalidation request message to the specified endpoint. If the 
> invalidation command from software is for a first-level mapping, the 
> invalidation request message is generated with the appropriate PASID prefix 
> to identify the target PASID. The invalidation completion time-out value is 
> recommended to be sufficiently larger than the PCI-Express read completion 
> time-outs. "
> 
> The above leads me to believe that there should be some way of setting the 
> invalidation completion time-out value.  Unfortunately I couldn't find 
> anything in the Intel spec that tells me how to set the "invalidation 
> completion time-out".   Can someone point me in the right direction to 
> setting the completion timer?

For this I guess you should have Cc-ed the VT-d maintainers, which
I have now done.

Jan




[Xen-devel] panic("queue invalidate wait descriptor was not executed\n")

2016-05-11 Thread Zytaruk, Kelly
During Xen boot I am seeing the panic in the subject line from 
.../xen/drivers/passthrough/vgt/qinval.c

From the Fault Status Register (= 0x40 (ITE)), I am seeing a hardware timeout
on the invalidate.

Disabling queued invalidation is not an option.  I need to find out why the 
operation is timing out and fix it.  

I found two timeouts: one in software and one in hardware.
After the invalidate is submitted there is a wait packet submitted and the boot 
software waits for the wait packet to complete in a loop with a software 
timeout.  At the end of the software timeout it issues the panic.  I can 
increase the software timeout but it still doesn't solve the problem.  Just 
before the panic I dump the value of the Fault Status Register and I see that 
the hardware has already timed out (FSTS_REG = 0x40 = ITE = "Invalidation 
Timeout Error").  As a first step in solving this I would like to increase the 
hardware timeout value.

I have the Intel spec and I was reading from the spec...

" Hardware starts an invalidation completion timer for this ITag, and issues 
the invalidation request message to the specified endpoint. If the invalidation 
command from software is for a first-level mapping, the invalidation request 
message is generated with the appropriate PASID prefix to identify the target 
PASID. The invalidation completion time-out value is recommended to be 
sufficiently larger than the PCI-Express read completion time-outs. "

The above leads me to believe that there should be some way of setting the 
invalidation completion time-out value.  Unfortunately I couldn't find anything 
in the Intel spec that tells me how to set the "invalidation completion 
time-out".   Can someone point me in the right direction to setting the 
completion timer?

Thanks,
Kelly
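
The sequence described in this message (poll for the wait descriptor under a software timeout, then inspect the Fault Status Register) can be illustrated with a small simulation. This is a hedged sketch, not Xen's code: the function names, the callbacks and the injectable clock are hypothetical, and the bit positions are taken from the Fault Status Register layout in the VT-d specification and should be checked against the revision in use. The only value taken from the report is FSTS = 0x40 decoding to ITE.

```python
import time

# Fault Status Register bits (assumed layout; verify against the spec).
FSTS_BITS = {
    0: "PFO (Primary Fault Overflow)",
    1: "PPF (Primary Pending Fault)",
    2: "AFO (Advanced Fault Overflow)",
    3: "APF (Advanced Pending Fault)",
    4: "IQE (Invalidation Queue Error)",
    5: "ICE (Invalidation Completion Error)",
    6: "ITE (Invalidation Time-out Error)",
}


def decode_fsts(value):
    """Return the names of the known status bits set in an FSTS value."""
    return [name for bit, name in sorted(FSTS_BITS.items()) if value & (1 << bit)]


class WaitDescriptorTimeout(RuntimeError):
    """Raised when the software timeout expires before the wait completes."""


def wait_for_completion(read_status, read_fsts, timeout_s, clock=time.monotonic):
    """Poll read_status() until it is true or the software timeout expires.

    read_status and read_fsts are hypothetical callbacks standing in for
    reading the wait descriptor's status word and the fault status register.
    """
    deadline = clock() + timeout_s
    while not read_status():
        if clock() >= deadline:
            fsts = read_fsts()
            raise WaitDescriptorTimeout(
                "wait descriptor was not executed; FSTS = %#x %s"
                % (fsts, decode_fsts(fsts))
            )
```

Because the hardware has already latched ITE by the time the loop gives up, lengthening `timeout_s` alone only delays the error, which is consistent with the observation above that increasing the software timeout does not solve the problem.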
