Re: Boot hang on Xen after r318347/(310418)

2017-05-25 Thread Adam McDougall
On Thu, May 25, 2017 at 10:41:03AM +0100, Roger Pau Monné wrote:

> On Wed, May 24, 2017 at 06:33:07PM -0400, Adam McDougall wrote:
> > Hello,
> > 
> > Recently I made a new build of 11-STABLE but encountered a boot hang
> > at this state:
> > http://www.egr.msu.edu/~mcdouga9/pics/r318347-smp-hang.png
> > 
> > It is easy to reproduce: I can just boot from any 11 or 12 ISO that
> > contains the commit.
> 
> I have just tested latest HEAD (r318861) and stable/11 (r318854) and
> they both work fine on my environment (a VM with 4 vCPUs and 2GB of
> RAM on OSS Xen 4.9). I'm also adding Colin in case he has some input,
> he has been doing some tests on HEAD and AFAIK he hasn't seen any
> issues.
> 
> > I compiled various svn revisions to confirm that r318347 caused the 
> > issue and r318346 is fine. With r318347 or later including the latest 
> > 11-STABLE, the system will only boot with one virtual CPU in XenServer. 
> > Any more cpus and it hangs. I also tried a 12 kernel from head this 
> > afternoon and I have the same hang. I had this issue on XenServer 7 
> > (Xen 4.7) and XenServer 6.5 (Xen 4.4). I did most of my testing on 7. I 
> > also did much of my testing with a GENERIC kernel to try to rule out 
> > kernel configuration mistakes. When it hangs, the performance 
> > monitoring in Xen tells me at least one CPU is pegged. r318674 boots 
> > fine on physical hardware without Xen involved.
> > 
> > Looking at r318347, which mentions EARLY_AP_STARTUP, and later seeing
> > r318763, which enables EARLY_AP_STARTUP in GENERIC, I tried adding it to
> > my kernel, but that turned the hang into a panic that occurs with any
> > number of CPUs:
> > http://www.egr.msu.edu/~mcdouga9/pics/r318347-early-ap-startup-panic.png
> 
> I guess this is on stable/11, right? The panic looks easier to debug
> than the hang, so let's start with this one. Can you enable the serial
> console and kernel debug options in order to get a trace? With just
> this it's almost impossible to know what went wrong.

Yes, this was on stable/11 amd64.

> If you still have that kernel around (and its debug symbols), can you
> do:
> 
> $ addr2line -e /usr/lib/debug/boot/kernel/kernel.debug 0x80793344
> 
> (The address is the instruction pointer on the crash image, I think I
> got it right)

I'll reproduce this soon and get the results from that command.

> In order to compile a stable/11 kernel with full debugging support you
> will have to add:
> 
> # For full debugger support use (turn off in stable branch):
> options   BUF_TRACKING             # Track buffer history
> options   DDB                      # Support DDB.
> options   FULL_BUF_TRACKING        # Track more buffer history
> options   GDB                      # Support remote GDB.
> options   DEADLKRES                # Enable the deadlock resolver
> options   INVARIANTS               # Enable calls of extra sanity checking
> options   INVARIANT_SUPPORT        # Extra sanity checks of internal structures, required by INVARIANTS
> options   WITNESS                  # Enable checks to detect deadlocks and cycles
> options   WITNESS_SKIPSPIN         # Don't run witness on spinlocks for speed
> options   MALLOC_DEBUG_MAXZONES=8  # Separate malloc(9) zones
> 
> To your kernel config file.

I'll work on that soon too when I get a chance, thanks.

> 
> Just to be sure, this is an amd64 kernel right?

yes

> 
> Roger.
> ___
> freebsd-stable@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
>  


Re: Boot hang on Xen after r318347/(310418)

2017-05-25 Thread Roger Pau Monné
On Wed, May 24, 2017 at 06:33:07PM -0400, Adam McDougall wrote:
> Hello,
> 
> Recently I made a new build of 11-STABLE but encountered a boot hang
> at this state:
> http://www.egr.msu.edu/~mcdouga9/pics/r318347-smp-hang.png
> 
> It is easy to reproduce: I can just boot from any 11 or 12 ISO that
> contains the commit.

I have just tested latest HEAD (r318861) and stable/11 (r318854) and
they both work fine on my environment (a VM with 4 vCPUs and 2GB of
RAM on OSS Xen 4.9). I'm also adding Colin in case he has some input,
he has been doing some tests on HEAD and AFAIK he hasn't seen any
issues.

> I compiled various svn revisions to confirm that r318347 caused the 
> issue and r318346 is fine. With r318347 or later including the latest 
> 11-STABLE, the system will only boot with one virtual CPU in XenServer. 
> Any more cpus and it hangs. I also tried a 12 kernel from head this 
> afternoon and I have the same hang. I had this issue on XenServer 7 
> (Xen 4.7) and XenServer 6.5 (Xen 4.4). I did most of my testing on 7. I 
> also did much of my testing with a GENERIC kernel to try to rule out 
> kernel configuration mistakes. When it hangs, the performance 
> monitoring in Xen tells me at least one CPU is pegged. r318674 boots 
> fine on physical hardware without Xen involved.
> 
> Looking at r318347, which mentions EARLY_AP_STARTUP, and later seeing
> r318763, which enables EARLY_AP_STARTUP in GENERIC, I tried adding it to
> my kernel, but that turned the hang into a panic that occurs with any
> number of CPUs:
> http://www.egr.msu.edu/~mcdouga9/pics/r318347-early-ap-startup-panic.png

I guess this is on stable/11, right? The panic looks easier to debug
than the hang, so let's start with this one. Can you enable the serial
console and kernel debug options in order to get a trace? With just
this it's almost impossible to know what went wrong.
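
For the serial console request above, a minimal sketch of the guest-side
loader settings might look like the following. It assumes the VM exposes
an emulated COM1 port and that 115200 baud is acceptable; neither is
stated in this thread, and the hypervisor side has to be configured
separately. The function name print_serial_loader_conf is hypothetical
and exists only for illustration.

```shell
#!/bin/sh
# print_serial_loader_conf: hypothetical helper that prints loader.conf(5)
# lines enabling both the serial and the video console.
# Assumption: the guest has an emulated COM1 and 115200 baud is wanted.
print_serial_loader_conf() {
    cat <<'EOF'
boot_multicons="YES"
boot_serial="YES"
comconsole_speed="115200"
console="comconsole,vidconsole"
EOF
}
```

The output would be appended to /boot/loader.conf by hand after review,
for example: print_serial_loader_conf >> /boot/loader.conf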

If you still have that kernel around (and its debug symbols), can you
do:

$ addr2line -e /usr/lib/debug/boot/kernel/kernel.debug 0x80793344

(The address is the instruction pointer on the crash image, I think I
got it right)
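
The lookup above can be wrapped in a small helper that first checks
whether the debug file is readable, which is an easy thing to miss if the
kernel was rebuilt in the meantime. This is only a sketch: resolve_addr
is a hypothetical name, and the extra -f flag (print the enclosing
function name as well as file:line) is an addition not present in the
command above.

```shell
#!/bin/sh
# resolve_addr: hypothetical wrapper around addr2line(1).
# Usage: resolve_addr <debug-file> <address>
resolve_addr() {
    debug_file=$1
    addr=$2
    # Fail early if the debug symbols are missing or unreadable.
    if [ ! -r "$debug_file" ]; then
        echo "debug symbols not readable: $debug_file" >&2
        return 1
    fi
    # -e selects the file carrying debug info, -f also prints the function.
    addr2line -f -e "$debug_file" "$addr"
}
```

Example invocation, using the path and address from the command above:
resolve_addr /usr/lib/debug/boot/kernel/kernel.debug 0x80793344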

In order to compile a stable/11 kernel with full debugging support you
will have to add:

# For full debugger support use (turn off in stable branch):
options BUF_TRACKING             # Track buffer history
options DDB                      # Support DDB.
options FULL_BUF_TRACKING        # Track more buffer history
options GDB                      # Support remote GDB.
options DEADLKRES                # Enable the deadlock resolver
options INVARIANTS               # Enable calls of extra sanity checking
options INVARIANT_SUPPORT        # Extra sanity checks of internal structures, required by INVARIANTS
options WITNESS                  # Enable checks to detect deadlocks and cycles
options WITNESS_SKIPSPIN         # Don't run witness on spinlocks for speed
options MALLOC_DEBUG_MAXZONES=8  # Separate malloc(9) zones

To your kernel config file.
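
One way to apply the options above without editing GENERIC itself is a
separate config file that includes GENERIC and adds the debug options on
top. The sketch below writes such a file; the helper name
write_debug_kernconf and the ident XENDEBUG are hypothetical, and it
assumes GENERIC on the branch being built does not already set these
options.

```shell
#!/bin/sh
# write_debug_kernconf: hypothetical helper that writes a kernel config
# layering the debug options from this thread on top of GENERIC.
# Usage: write_debug_kernconf <output-file>
write_debug_kernconf() {
    conf=$1
    cat > "$conf" <<'EOF'
include GENERIC
ident   XENDEBUG

options BUF_TRACKING             # Track buffer history
options DDB                      # Support DDB.
options FULL_BUF_TRACKING        # Track more buffer history
options GDB                      # Support remote GDB.
options DEADLKRES                # Enable the deadlock resolver
options INVARIANTS               # Enable calls of extra sanity checking
options INVARIANT_SUPPORT        # Required by INVARIANTS
options WITNESS                  # Detect deadlocks and cycles
options WITNESS_SKIPSPIN         # Don't run witness on spinlocks
options MALLOC_DEBUG_MAXZONES=8  # Separate malloc(9) zones
EOF
}
```

With the file written to /usr/src/sys/amd64/conf/XENDEBUG, the kernel
would then be built from /usr/src with make buildkernel KERNCONF=XENDEBUG
and installed with make installkernel KERNCONF=XENDEBUG.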

Just to be sure, this is an amd64 kernel right?

Roger.