Re: [Xen-devel] panic("queue invalidate wait descriptor was not executed\n")

2016-05-13 Thread Zytaruk, Kelly


> -----Original Message-----
> From: Wu, Feng [mailto:feng...@intel.com]
> Sent: Friday, May 13, 2016 3:11 AM
> To: Zytaruk, Kelly; Jan Beulich
> Cc: Tian, Kevin; xen-devel@lists.xen.org; Wu, Feng
> Subject: RE: [Xen-devel] panic("queue invalidate wait descriptor was not
> executed\n")
> 
> 
> 
> > -----Original Message-----
> > From: Zytaruk, Kelly [mailto:kelly.zyta...@amd.com]
> > Sent: Thursday, May 12, 2016 10:21 PM
> > To: Jan Beulich <jbeul...@suse.com>
> > Cc: Wu, Feng <feng...@intel.com>; Tian, Kevin <kevin.t...@intel.com>;
> > xen- de...@lists.xen.org
> > Subject: RE: [Xen-devel] panic("queue invalidate wait descriptor was
> > not
> > executed\n")
> >
> >
> >
> > > -----Original Message-----
> > > From: Jan Beulich [mailto:jbeul...@suse.com]
> > > Sent: Thursday, May 12, 2016 9:51 AM
> > > To: Zytaruk, Kelly
> > > Cc: Feng Wu; Kevin Tian; xen-devel@lists.xen.org
> > > Subject: RE: [Xen-devel] panic("queue invalidate wait descriptor was
> > > not
> > > executed\n")
> > >
> > > >>> On 12.05.16 at 14:36, <kelly.zyta...@amd.com> wrote:
> > > >> From: Jan Beulich [mailto:jbeul...@suse.com]
> > > >> Sent: Thursday, May 12, 2016 5:49 AM
> > > >> >>> On 11.05.16 at 15:51, <kelly.zyta...@amd.com> wrote:
> > > >> > During Xen boot I am seeing the panic in the subject line from
> > > > >> > .../xen/drivers/passthrough/vtd/qinval.c
> > > >>
> > > >> And this is with current staging, or some much older version of Xen?
> > > >> (ISTR some issue with the invalidation request getting sent to
> > > >> the wrong IOMMU, leading to a timeout.)
> > > >
> > > > No this is not current Xen, it is with 4.2.
> > > >
> > > > Can you tell me more about the invalidation request getting sent
> > > > to the wrong IOMMU problem and approximately when it was fixed?
> > > > > If you could identify the patch I could back port it into my copy of
> > > > > Xen for testing.
> > >
> > > > Note that 4.2.5 has said change, and also note that you could have
> > > > done exactly what I have done now - go through the list of commits
> > > > altering files in the vtd/ subtree.
> >
> > Unfortunately GIT is not my strong suit :( I am still learning to
> > navigate with it. I guess part of my problem with GIT is that I don't yet
> > know what I don't know.
> >
> > > This is what I've been remembering:
> > >
> > > http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=84c340ba4c3eb99278b6ba885616bb183b88ad67
> >
> > The comment on this link describes exactly what I am experiencing.
> > Thanks so much.
> 
> Thanks Jan for providing the information above. Kelly, if you still meet the
> same issue after applying the patches, let us know; maybe I can consult a
> hardware expert internally.

Turns out this was exactly my problem.  The description matched my symptoms, and
when I applied the patch the problem went away.
Thanks,
Kelly

> 
> Thanks,
> Feng
> 
> >
> > >
> > > Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] panic("queue invalidate wait descriptor was not executed\n")

2016-05-12 Thread Zytaruk, Kelly


> -----Original Message-----
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Thursday, May 12, 2016 9:51 AM
> To: Zytaruk, Kelly
> Cc: Feng Wu; Kevin Tian; xen-devel@lists.xen.org
> Subject: RE: [Xen-devel] panic("queue invalidate wait descriptor was not
> executed\n")
> 
> >>> On 12.05.16 at 14:36, <kelly.zyta...@amd.com> wrote:
> >> From: Jan Beulich [mailto:jbeul...@suse.com]
> >> Sent: Thursday, May 12, 2016 5:49 AM
> >> >>> On 11.05.16 at 15:51, <kelly.zyta...@amd.com> wrote:
> >> > During Xen boot I am seeing the panic in the subject line from
> >> > .../xen/drivers/passthrough/vtd/qinval.c
> >>
> >> And this is with current staging, or some much older version of Xen?
> >> (ISTR some issue with the invalidation request getting sent to the
> >> wrong IOMMU, leading to a timeout.)
> >
> > No this is not current Xen, it is with 4.2.
> >
> > Can you tell me more about the invalidation request getting sent to
> > the wrong IOMMU problem and approximately when it was fixed?  If you
> > could identify the patch I could back port it into my copy of Xen for 
> > testing.
> 
> Note that 4.2.5 has said change, and also note that you could have done
> exactly what I have done now - go through the list of commits altering files
> in the vtd/ subtree.

Unfortunately GIT is not my strong suit :( I am still learning to navigate with 
it. I guess part of my problem with GIT is that I don't yet know what I don't 
know.

> This is what I've been remembering:
> http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=84c340ba4c3eb99278b6ba885616bb183b88ad67

The comment on this link describes exactly what I am experiencing.
Thanks so much.

> 
> Jan




Re: [Xen-devel] panic("queue invalidate wait descriptor was not executed\n")

2016-05-12 Thread Zytaruk, Kelly


> -----Original Message-----
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Thursday, May 12, 2016 5:49 AM
> To: Zytaruk, Kelly
> Cc: Feng Wu; Kevin Tian; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] panic("queue invalidate wait descriptor was not
> executed\n")
> 
> >>> On 11.05.16 at 15:51, <kelly.zyta...@amd.com> wrote:
> > During Xen boot I am seeing the panic in the subject line from
> > .../xen/drivers/passthrough/vtd/qinval.c
> 
> And this is with current staging, or some much older version of Xen?
> (ISTR some issue with the invalidation request getting sent to the wrong
> IOMMU, leading to a timeout.)

No this is not current Xen, it is with 4.2.

Can you tell me more about the invalidation request getting sent to the wrong 
IOMMU problem and approximately when it was fixed?  If you could identify the 
patch I could back port it into my copy of Xen for testing.

This is a NUMA system with 2 IOMMUs.
I have 4 devices on 2 PCIe cards (2 per card).
They reside at the following locations: 3:0.0, 5:0.0, 83:0.0 and 85:0.0.
From what I understand about NUMA, based on the BDFs, 2 devices should be on
one IOMMU and the other 2 should be on the other IOMMU.
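
For readers following along, here is a minimal sketch of how a BDF string such as the ones above splits into bus, device and function, and how those pack into the 16-bit PCI requester ID. `parse_bdf` and `bdf_to_rid` are hypothetical helpers written for this illustration, not Xen functions. Note that on VT-d the assignment of a device to a remapping unit comes from the firmware's ACPI DMAR table, so grouping by bus number is only a first sanity check.

```c
#include <stdint.h>
#include <stdio.h>

/* A PCI function address split into its bus/device/function parts. */
struct bdf {
    uint8_t bus;   /* 0x00-0xff */
    uint8_t dev;   /* 0x00-0x1f */
    uint8_t fn;    /* 0x0-0x7  */
};

/*
 * Parse "bus:dev.fn" written in hex, e.g. "83:0.0".
 * Returns 0 on success, -1 on malformed or out-of-range input.
 * (Hypothetical helper for this sketch only.)
 */
static int parse_bdf(const char *s, struct bdf *out)
{
    unsigned int bus, dev, fn;
    char trailing;

    /* Exactly three fields must convert; a fourth means trailing junk. */
    if (sscanf(s, "%x:%x.%x%c", &bus, &dev, &fn, &trailing) != 3)
        return -1;
    if (bus > 0xff || dev > 0x1f || fn > 0x7)
        return -1;

    out->bus = (uint8_t)bus;
    out->dev = (uint8_t)dev;
    out->fn = (uint8_t)fn;
    return 0;
}

/* Requester ID: bus in bits 15:8, device in bits 7:3, function in bits 2:0. */
static uint16_t bdf_to_rid(struct bdf b)
{
    return (uint16_t)((b.bus << 8) | (b.dev << 3) | b.fn);
}
```

With this, "83:0.0" parses to bus 0x83 and requester ID 0x8300, which is the value to compare against whatever the debug prints report for each device.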

I put in some more print statements last night and discovered that during boot 
Xen attaches all 4 devices to the same IOMMU structure. Xen sends out a flush 
to all 4 devices on the first IOMMU and then follows it with a Wait 
invalidation packet to the same IOMMU.  Below is what I am seeing:

(XEN) IOMMU LIST - List of defined IOMMU structures
(XEN) iommu[00] @ 83103fffa5c0, Q=2060c04002, HEAD=90, TAIL=90
(XEN) Seq Num = 0, pt_levels = 4, cap = 0x00d2078c106f0466, ecap = 0x00f020df, domid_bitmap = 1, domid_map=0x0
(XEN) iommu[01] @ 83103fffa790, Q=103ffec002, HEAD=bd0, TAIL=bd0
(XEN) Seq Num = 1, pt_levels = 4, cap = 0x00d2078c106f0466, ecap = 0x00f020df, domid_bitmap = 1, domid_map=0x0

(XEN) gen_dev_iotlb_inv_dsc - DEVICE IOTLB Descriptor 0x7001 0x8303 for 83:00.0 (index = 9), iommu = 83103fffa5c0, fault = 0x
(XEN) gen_dev_iotlb_inv_dsc - DEVICE IOTLB Descriptor 0x7001 0x8103 for 81:00.0 (index = 10), iommu = 83103fffa5c0, fault = 0x
(XEN) gen_dev_iotlb_inv_dsc - DEVICE IOTLB Descriptor 0x7001 0x0503 for 05:00.0 (index = 11), iommu = 83103fffa5c0, fault = 0x
(XEN) gen_dev_iotlb_inv_dsc - DEVICE IOTLB Descriptor 0x7001 0x0303 for 03:00.0 (index = 12), iommu = 83103fffa5c0, fault = 0x
(XEN) queue_invalidate_wait (iommu = 83103fffa5c0)
(XEN) 
(XEN) Panic on CPU 0:
(XEN) queue invalidate wait descriptor was not executed
(XEN) 

Is it a bug to have all 4 devices on the same IOMMU?  Is this why the Wait 
Invalidation is failing?
Actually I am not sure whether Xen is attaching all 4 devices to the same IOMMU
or whether it is generating the dev iotlb descriptors incorrectly.

> 
> > From the Fault Status Register (= 0x40 (ITE)), I am seeing a hardware
> > timeout on the invalidate.
> >
> > Disabling queued invalidation is not an option.  I need to find out
> > why the operation is timing out and fix it.
> >
> > I found two timeouts; one in software and one in hardware.
> > After the invalidate is submitted there is a wait packet submitted and
> > the boot software waits for the wait packet to complete in a loop with
> > a software timeout.  At the end of the software timeout it issues the
> > panic.  I can increase the software timeout but it still doesn't solve
> > the problem.  Just before the panic I dump the value of the Fault
> > Status Register and I see that the hardware has already timed out
> > (FSTS_REG = 0x40 = ITE = "Invalidation Timeout Error").  As a first
> > step in solving this I would like to increase the hardware timeout value.
> >
> > I have the Intel spec and I was reading from the spec...
> >
> > " Hardware starts an invalidation completion timer for this ITag, and
> > issues the invalidation request message to the specified endpoint. If
> > the invalidation command from software is for a first-level mapping,
> > the invalidation request message is generated with the appropriate
> > PASID prefix to identify the target PASID. The invalidation completion
> > time-out value is recommended to be sufficiently larger than the
> > PCI-Express read completion time-outs. "
> >
> > The above leads me to believe that there should be some way of setting
> > the invalidation completion time-out value.  Unfortunately I couldn't
> > find anything in the Intel spec that tells me how to set the "invalidation
> > com

[Xen-devel] panic("queue invalidate wait descriptor was not executed\n")

2016-05-11 Thread Zytaruk, Kelly
During Xen boot I am seeing the panic in the subject line from
.../xen/drivers/passthrough/vtd/qinval.c

From the Fault Status Register (= 0x40 (ITE)), I am seeing a hardware timeout
on the invalidate.

Disabling queued invalidation is not an option.  I need to find out why the 
operation is timing out and fix it.  

I found two timeouts; one in software and one in hardware. 
After the invalidate is submitted there is a wait packet submitted and the boot 
software waits for the wait packet to complete in a loop with a software 
timeout.  At the end of the software timeout it issues the panic.  I can 
increase the software timeout but it still doesn't solve the problem.  Just 
before the panic I dump the value of the Fault Status Register and I see that 
the hardware has already timed out (FSTS_REG = 0x40 = ITE = "Invalidation 
Timeout Error").  As a first step in solving this I would like to increase the 
hardware timeout value.
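
The two timeouts described above can be sketched as a small polling loop. This is an illustration only, not the code in qinval.c: `struct fake_iommu`, `STATUS_DONE`, `poll_wait_descriptor` and `fsts_inval_error` are hypothetical names, and the poll count stands in for the software timeout. The ITE bit position (0x40) is taken from the description above; IQE and ICE are the neighbouring invalidation bits as I read the VT-d spec.

```c
#include <stdint.h>

/* Invalidation-related Fault Status Register bits (0x40 is ITE, as above). */
#define FSTS_IQE (1u << 4)  /* Invalidation Queue Error */
#define FSTS_ICE (1u << 5)  /* Invalidation Completion Error */
#define FSTS_ITE (1u << 6)  /* Invalidation Time-out Error */

#define STATUS_DONE 1u      /* hypothetical "wait descriptor executed" value */

enum wait_result { WAIT_DONE, WAIT_HW_TIMEOUT, WAIT_SW_TIMEOUT };

/* Hypothetical stand-in for the two things the loop looks at. */
struct fake_iommu {
    volatile uint32_t status;  /* written when the wait descriptor executes */
    volatile uint32_t fsts;    /* Fault Status Register contents */
};

/* Name the first invalidation error flagged in fsts, or "none". */
static const char *fsts_inval_error(uint32_t fsts)
{
    if (fsts & FSTS_IQE) return "IQE";
    if (fsts & FSTS_ICE) return "ICE";
    if (fsts & FSTS_ITE) return "ITE";
    return "none";
}

/*
 * Poll for completion of the wait descriptor. max_polls plays the role of
 * the software timeout. If the hardware has already flagged ITE, report
 * that separately, because polling longer cannot make the wait complete.
 */
static enum wait_result poll_wait_descriptor(const struct fake_iommu *hw,
                                             unsigned int max_polls)
{
    for (unsigned int i = 0; i < max_polls; i++) {
        if (hw->status == STATUS_DONE)
            return WAIT_DONE;
        if (hw->fsts & FSTS_ITE)
            return WAIT_HW_TIMEOUT;
    }
    return WAIT_SW_TIMEOUT;
}
```

Separating the two outcomes mirrors the observation above: once FSTS_REG reads 0x40, raising the software timeout only delays the panic.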

I have the Intel spec and I was reading from the spec...

" Hardware starts an invalidation completion timer for this ITag, and issues 
the invalidation request message to the specified endpoint. If the invalidation 
command from software is for a first-level mapping, the invalidation request 
message is generated with the appropriate PASID prefix to identify the target 
PASID. The invalidation completion time-out value is recommended to be 
sufficiently larger than the PCI-Express read completion time-outs. "

The above leads me to believe that there should be some way of setting the 
invalidation completion time-out value.  Unfortunately I couldn't find anything 
in the Intel spec that tells me how to set the "invalidation completion 
time-out".   Can someone point me in the right direction to setting the 
completion timer?

Thanks,
Kelly



Re: [Xen-devel] dumping Xen stack

2016-05-09 Thread Zytaruk, Kelly


> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Monday, May 09, 2016 1:14 PM
> To: Zytaruk, Kelly; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] dumping Xen stack
> 
> On 09/05/16 18:11, Zytaruk, Kelly wrote:
> >
> >> -----Original Message-----
> >> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> >> Sent: Monday, May 09, 2016 12:40 PM
> >> To: Zytaruk, Kelly; xen-devel@lists.xen.org
> >> Subject: Re: [Xen-devel] dumping Xen stack
> >>
> >> On 09/05/16 17:37, Zytaruk, Kelly wrote:
> >>> Does Xen have an equivalent function to the Linux dump_stack() function?
> >>>
> >>> I am hitting a panic followed by a reboot and would like to find out
> >>> where I am coming from.
> >>
> >> At the point of a crash, the stack should be printed on the console.
> > The only thing I am seeing on the console is the panic message followed by
> > the system will reboot in 5 sec.
> 
> Ah - plain panic()s don't automatically dump register/stack information.
> 
> >> Alternatively, show_execution_state() at any point should dump the
> >> Xen register/stack.
> > I looked up show_execution_state() and show_trace() and they both require a
> > pointer to registers.
> > I am hitting a panic during boot in queue_invalidate_wait() in
> > .../drivers/passthrough/vtd/qinval.c .  I am not sure how to get the register
> > pointer from here.
> 
> Oops sorry - I meant dump_execution_state() which takes no parameters.

That is exactly what I am looking for.  Thx.
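
A minimal sketch of the ordering this implies: call dump_execution_state() just before the panic(), since a plain panic() does not print register or stack information itself. The two functions below are userspace stand-ins that only record the call order so the snippet builds outside Xen; they are not the hypervisor's implementations, and `report_wait_failure` is a hypothetical name.

```c
#include <stdio.h>

/* Records which stand-in ran, in order, so the sequence can be checked. */
static const char *calls[4];
static unsigned int n_calls;

/* Stand-in for Xen's dump_execution_state(), which takes no parameters. */
static void dump_execution_state(void)
{
    if (n_calls < 4)
        calls[n_calls++] = "dump_execution_state";
    puts("(stand-in) register and stack dump would appear here");
}

/* Stand-in for panic(); the real one does not return. */
static void panic(const char *msg)
{
    if (n_calls < 4)
        calls[n_calls++] = "panic";
    fputs(msg, stderr);
}

/* Hypothetical failure path: dump state first, then panic. */
static void report_wait_failure(void)
{
    dump_execution_state();
    panic("queue invalidate wait descriptor was not executed\n");
}
```

The dump has to come first because nothing placed after the real panic() would run.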

> 
> ~Andrew


Re: [Xen-devel] dumping Xen stack

2016-05-09 Thread Zytaruk, Kelly


> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Monday, May 09, 2016 12:40 PM
> To: Zytaruk, Kelly; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] dumping Xen stack
> 
> On 09/05/16 17:37, Zytaruk, Kelly wrote:
> > Does Xen have an equivalent function to the Linux dump_stack() function?
> >
> > I am hitting a panic followed by a reboot and would like to find out where
> > I am coming from.
> 
> At the point of a crash, the stack should be printed on the console.

The only thing I am seeing on the console is the panic message, followed by a
notice that the system will reboot in 5 sec.
> 
> Alternatively, show_execution_state() at any point should dump the Xen
> register/stack.

I looked up show_execution_state() and show_trace() and they both require a 
pointer to registers.
I am hitting a panic during boot in queue_invalidate_wait() in 
.../drivers/passthrough/vtd/qinval.c .  I am not sure how to get the register 
pointer from here.

> 
> ~Andrew


[Xen-devel] dumping Xen stack

2016-05-09 Thread Zytaruk, Kelly
Does Xen have an equivalent function to the Linux dump_stack() function?

I am hitting a panic followed by a reboot and would like to find out where I am 
coming from.

Thanks,
Kelly



Re: [Xen-devel] Debugging Xen early boot

2016-05-06 Thread Zytaruk, Kelly


> -----Original Message-----
> From: David Vrabel [mailto:david.vra...@citrix.com]
> Sent: Friday, May 06, 2016 9:04 AM
> To: Zytaruk, Kelly; Konrad Rzeszutek Wilk
> Cc: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] Debugging Xen early boot
> 
> On 06/05/16 13:31, Zytaruk, Kelly wrote:
> >
> > As for the other question I am still curious as to how to debug early
> > Xen. How was __start_xen written and tested, what sort of tools were
> > used to debug it and get it working right? What do I do in situations
> > where I absolutely can't get serial working properly?
> 
> When I was working on the kexec code in Xen I had some success with using an
> ICE.  They're not cheap though and no off-the-shelf x86 motherboard has the
> correct header to connect an ICE...

Looks like my options are limited as I am on a budget :(
I think I will pass on trying to resolve the issue with the bad PCIe port and 
use the one that works.

> 
> David


Re: [Xen-devel] Debugging Xen early boot

2016-05-06 Thread Zytaruk, Kelly


> -----Original Message-----
> From: Xen-devel [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of Jan
> Beulich
> Sent: Friday, May 06, 2016 8:54 AM
> To: Zytaruk, Kelly
> Cc: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] Debugging Xen early boot
> 
> >>> On 06.05.16 at 14:31, <kelly.zyta...@amd.com> wrote:
> > As for the other question I am still curious as to how to debug early Xen.
> > How was __start_xen written and tested, what sort of tools were used
> > to debug it and get it working right?
> 
> I don't know how it was done here, but having been through the exercise of
> debugging early boot code in other environments, I can only say: Get creative.
> E.g. leverage whatever storage you have that persists across at least a warm
> reboot. Depending on system I have found e.g. video RAM or some of the I/O
> ports in the 0x80-0x8f range usable. I also recall that on one specific box I
> had to resort to using some of the CMOS non-volatile RAM or even the RTC
> registers
> that don't change rapidly.

Ouch! Sounds painful.  Looks like I had better hope that everything works fine 
up until the end of the init of the 16550s :(
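
As a rough illustration of the "breadcrumb" idea quoted above, early code can write a one-byte progress code at each checkpoint so the last code written shows how far boot got. This is a sketch under assumptions: `checkpoint`, `port_write` and `record_port_write` are hypothetical names, and the backend here only records the write so the snippet runs in userspace. In ring 0 the backend would be an `outb` to the chosen port, and whether the value can be read back after a warm reboot depends on the system, as noted above.

```c
#include <stdint.h>

#define POST_PORT 0x80  /* one of the I/O ports in the 0x80-0x8f range */

/* Backend signature: write one byte to an I/O port. */
typedef void (*port_write_fn)(uint16_t port, uint8_t value);

/* Userspace stand-in: remember the last write instead of touching hardware. */
static uint16_t last_port;
static uint8_t last_value;

static void record_port_write(uint16_t port, uint8_t value)
{
    last_port = port;
    last_value = value;
}

/* Swap this for an outb-based function when running in ring 0. */
static port_write_fn port_write = record_port_write;

/* Leave a breadcrumb: the last code written marks the last point reached. */
static void checkpoint(uint8_t code)
{
    port_write(POST_PORT, code);
}
```

Calling checkpoint(0x10), checkpoint(0x11), and so on at successive points in early boot narrows a hang down to the interval after the last code observed.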

> 
> >  What do I do in situations where I absolutely can't get serial
> > working properly?
> 
> Get serial working properly ;-) ? Or try USB (EHCI). Or use text mode video
> output.
> 
> Jan
> 
> 


Re: [Xen-devel] Debugging Xen early boot

2016-05-06 Thread Zytaruk, Kelly
Konrad, the immediate problem is solved, although I don't know why.  I tried putting 
the serial card into a different PCIe slot and it works properly.  Must be 
something wrong with the slot.  I will leave that as a problem for another day.

As for the other question I am still curious as to how to debug early Xen.  How 
was __start_xen written and tested, what sort of tools were used to debug it 
and get it working right?  What do I do in situations where I absolutely can't 
get serial working properly?

Thanks,
Kelly

> -----Original Message-----
> From: Konrad Rzeszutek Wilk [mailto:konrad.w...@oracle.com]
> Sent: Thursday, May 05, 2016 10:20 AM
> To: Zytaruk, Kelly
> Cc: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] Debugging Xen early boot
> 
> On Thu, May 05, 2016 at 01:24:26PM +, Zytaruk, Kelly wrote:
> > I am having problems getting XEN boot messages over a serial console.
> >
> > I have two systems.  On one system I can see the XEN boot message over a
> > serial console whereas on the other system I cannot.
> >
> > I physically move the same PCIe serial card and hard drive from one system
> > to the other.  One system shows me the Xen boot messages and the other
> > doesn't.
> >
> > My config in /etc/grub.d/20_linux_xen is
> > xen_args="dom0_max_vcpus=4 dom0_mem=2048M,max:3072M iommu=1 conring_size=16384 loglvl=all guest_loglvl=all com1=115200,8n1,pci console=com1"
> >
> > I put a printk at the beginning of the Linux boot and I see the Dom0 Linux
> > boot messages on both systems.  So I know that the serial port works on both
> > systems but for some reason I don't see the Xen messages.
> 
> Also, are there any other Serial or Communication devices on the other
> machine? Could you provide the lspci -v from both machines?
> It may be that there is a built-in on the motherboard (like an AMT type
> thing, or IPMI SoL?)
>
> And if you boot Linux on it, can you do 'dmesg | grep tty' to see what it
> finds in terms of serial cards?
> 
> >
> > I would like to debug through __start_xen to see how the 16550 is being
> > initialized.  Is there an easy way to debug __start_xen?
> 
> >
> > Thanks,
> > Kelly
> >
> >


[Xen-devel] Debugging Xen early boot

2016-05-05 Thread Zytaruk, Kelly
I am having problems getting XEN boot messages over a serial console.

I have two systems.  On one system I can see the XEN boot message over a serial 
console whereas on the other system I cannot.

I physically move the same PCIe serial card and hard drive from one system to 
the other.  One system shows me the Xen boot messages and the other doesn't.

My config in /etc/grub.d/20_linux_xen is
xen_args="dom0_max_vcpus=4 dom0_mem=2048M,max:3072M iommu=1 conring_size=16384 loglvl=all guest_loglvl=all com1=115200,8n1,pci console=com1"

I put a printk at the beginning of the Linux boot and I see the Dom0 Linux boot 
messages on both systems.  So I know that the serial port works on both systems 
but for some reason I don't see the Xen messages.

I would like to debug through __start_xen to see how the 16550 is being 
initialized.  Is there an easy way to debug __start_xen? 

Thanks,
Kelly

