Re: EARLY_AP_STARTUP hangs during boot

2016-08-04 Thread John Baldwin
On Thursday, August 04, 2016 08:59:06 AM Gary Jennejohn wrote:
> On Tue, 02 Aug 2016 10:41:23 -0700
> John Baldwin  wrote:
> 
> > On Tuesday, August 02, 2016 09:03:10 AM Gary Jennejohn wrote:
> > > On Mon, 01 Aug 2016 13:19:16 -0700
> > > John Baldwin  wrote:
> > >   
> > > > On Monday, August 01, 2016 03:31:11 PM Gary Jennejohn wrote:  
> > > > > On Mon, 1 Aug 2016 09:34:34 +0200
> > > > > Gary Jennejohn  wrote:
> > > > > 
> > > > > > On Sun, 31 Jul 2016 14:22:35 -0700
> > > > > > John Baldwin  wrote:
> > > > > > 
> > > > > > > On Sunday, July 31, 2016 11:29:14 AM Gary Jennejohn wrote:  
> > > > > > > > On Sat, 30 Jul 2016 12:03:59 -0700
> > > > > > > > John Baldwin  wrote:
> > > > > > > > 
> > > > > > > > > On Saturday, July 30, 2016 09:44:22 AM Gary Jennejohn wrote:  
> > > > > > > > >   
> > > > > > > > > > On Fri, 29 Jul 2016 13:17:42 -0700
> > > > > > > > > > John Baldwin  wrote:
> > > > > > > > > >   
> > > > > > > > > > > On Thursday, July 28, 2016 12:31:31 AM Gary Jennejohn 
> > > > > > > > > > > wrote:  
> > > > > > > > > > > > Well, now I know that ULE is a prerequisite for
> > > > > > > > > > > > EARLY_AP_STARTUP!  I
> > > > > > > > > > > > wasn't aware of that.  I prefer BSD and that's the 
> > > > > > > > > > > > scheduler I did
> > > > > > > > > > > > the first tests with.
> > > > > > > > > > > > 
> > > > > > > > > > > > But with the ULE scheduler the system comes up all the 
> > > > > > > > > > > > way.
> > > > > > > > > > > > 
> > > > > > > > > > > > It would be nice if the BSD scheduler could also be 
> > > > > > > > > > > > modified to
> > > > > > > > > > > > work with EARLY_AP_STARTUP.
> > > > > > > > > > > 
> > > > > > > > > > > I wasn't able to reproduce your hang with 4BSD, but I 
> > > > > > > > > > > think I see a
> > > > > > > > > > > possible problem.  Try this:
> > > > > > > > > > > 
> > > > > > > > > > > diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
> > > > > > > > > > > index 7de56b6..d53331a 100644
> > > > > > > > > > > --- a/sys/kern/sched_4bsd.c
> > > > > > > > > > > +++ b/sys/kern/sched_4bsd.c
> > > > > > > > > > > @@ -327,7 +327,6 @@ maybe_preempt(struct thread *td)
> > > > > > > > > > >*  - The current thread has a higher (numerically 
> > > > > > > > > > > lower) or
> > > > > > > > > > >*equivalent priority.  Note that this prevents 
> > > > > > > > > > > curthread from
> > > > > > > > > > >*trying to preempt to itself.
> > > > > > > > > > > -  *  - It is too early in the boot for context switches 
> > > > > > > > > > > (cold is set).
> > > > > > > > > > >*  - The current thread has an inhibitor set or is in 
> > > > > > > > > > > the process of
> > > > > > > > > > >*exiting.  In this case, the current thread is 
> > > > > > > > > > > about to switch
> > > > > > > > > > >*out anyways, so there's no point in preempting.  
> > > > > > > > > > > If we did,
> > > > > > > > > > > @@ -348,7 +347,7 @@ maybe_preempt(struct thread *td)
> > > > > > > > > > >   ("maybe_preempt: trying to run 
> > > > > > > > > > > inhibited thread"));
> > > > > > > > > > >   pri = td->td_priority;
> > > > > > > > > > >   cpri = ctd->td_priority;
> > > > > > > > > > > - if (panicstr != NULL || pri >= cpri || cold /* || 
> > > > > > > > > > > dumping */ ||
> > > > > > > > > > > + if (panicstr != NULL || pri >= cpri /* || dumping */ ||
> > > > > > > > > > >   TD_IS_INHIBITED(ctd))
> > > > > > > > > > >   return (0);
> > > > > > > > > > >  #ifndef FULL_PREEMPTION
> > > > > > > > > > > @@ -1127,7 +1126,7 @@ forward_wakeup(int cpunum)
> > > > > > > > > > >   if ((!forward_wakeup_enabled) ||
> > > > > > > > > > >(forward_wakeup_use_mask == 0 && 
> > > > > > > > > > > forward_wakeup_use_loop == 0))
> > > > > > > > > > >   return (0);
> > > > > > > > > > > - if (!smp_started || cold || panicstr)
> > > > > > > > > > > + if (!smp_started || panicstr)
> > > > > > > > > > >   return (0);
> > > > > > > > > > >  
> > > > > > > > > > >   forward_wakeups_requested++;
> > > > > > > > > > >   
> > > > > > > > > > 
> > > > > > > > > > Thanks, but with this patch the kernel hangs in exactly the 
> > > > > > > > > > same
> > > > > > > > > > place as before - after the HPET output.
> > > > > > > > > > 
> > > > > > > > > > Maybe I'm missing some kernel option which ULE works 
> > > > > > > > > > around, or
> > > > > > > > > > something like that.  
> > > > > > > > > 
> > > > > > > > > Hmm, ok.  Please add KTR_RUNQ and KTR_SMP to the KTR masks, 
> > > > > > > > > that is
> > > > > > > > > 'options KTR_COMPILE=(KTR_PROC|KTR_RUNQ|KTR_SMP)' and
> > > > > > > > > 'options KTR_MASK=(KTR_PROC|KTR_RUNQ|KTR_SMP)'
> > > > > > > > > 
> > > > > > > > > Please also add this patch (on top of the previous patch):
> > > > > > > > > 
> > > > > > > > > diff 

Re: EARLY_AP_STARTUP hangs during boot

2016-08-04 Thread Gary Jennejohn
On Tue, 02 Aug 2016 10:41:23 -0700
John Baldwin  wrote:

> On Tuesday, August 02, 2016 09:03:10 AM Gary Jennejohn wrote:
> > On Mon, 01 Aug 2016 13:19:16 -0700
> > John Baldwin  wrote:
> >   
> > > On Monday, August 01, 2016 03:31:11 PM Gary Jennejohn wrote:  
> > > > On Mon, 1 Aug 2016 09:34:34 +0200
> > > > Gary Jennejohn  wrote:
> > > > 
> > > > > On Sun, 31 Jul 2016 14:22:35 -0700
> > > > > John Baldwin  wrote:
> > > > > 
> > > > > > On Sunday, July 31, 2016 11:29:14 AM Gary Jennejohn wrote:  
> > > > > > > On Sat, 30 Jul 2016 12:03:59 -0700
> > > > > > > John Baldwin  wrote:
> > > > > > > 
> > > > > > > > On Saturday, July 30, 2016 09:44:22 AM Gary Jennejohn wrote:
> > > > > > > > 
> > > > > > > > > On Fri, 29 Jul 2016 13:17:42 -0700
> > > > > > > > > John Baldwin  wrote:
> > > > > > > > >   
> > > > > > > > > > On Thursday, July 28, 2016 12:31:31 AM Gary Jennejohn 
> > > > > > > > > > wrote:  
> > > > > > > > > > > Well, now I know that ULE is a prerequisite for
> > > > > > > > > > > EARLY_AP_STARTUP!  I
> > > > > > > > > > > wasn't aware of that.  I prefer BSD and that's the 
> > > > > > > > > > > scheduler I did
> > > > > > > > > > > the first tests with.
> > > > > > > > > > > 
> > > > > > > > > > > But with the ULE scheduler the system comes up all the 
> > > > > > > > > > > way.
> > > > > > > > > > > 
> > > > > > > > > > > It would be nice if the BSD scheduler could also be 
> > > > > > > > > > > modified to
> > > > > > > > > > > work with EARLY_AP_STARTUP.
> > > > > > > > > > 
> > > > > > > > > > I wasn't able to reproduce your hang with 4BSD, but I think 
> > > > > > > > > > I see a
> > > > > > > > > > possible problem.  Try this:
> > > > > > > > > > 
> > > > > > > > > > diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
> > > > > > > > > > index 7de56b6..d53331a 100644
> > > > > > > > > > --- a/sys/kern/sched_4bsd.c
> > > > > > > > > > +++ b/sys/kern/sched_4bsd.c
> > > > > > > > > > @@ -327,7 +327,6 @@ maybe_preempt(struct thread *td)
> > > > > > > > > >  *  - The current thread has a higher (numerically 
> > > > > > > > > > lower) or
> > > > > > > > > >  *equivalent priority.  Note that this prevents 
> > > > > > > > > > curthread from
> > > > > > > > > >  *trying to preempt to itself.
> > > > > > > > > > -*  - It is too early in the boot for context switches 
> > > > > > > > > > (cold is set).
> > > > > > > > > >  *  - The current thread has an inhibitor set or is in 
> > > > > > > > > > the process of
> > > > > > > > > >  *exiting.  In this case, the current thread is 
> > > > > > > > > > about to switch
> > > > > > > > > >  *out anyways, so there's no point in preempting.  
> > > > > > > > > > If we did,
> > > > > > > > > > @@ -348,7 +347,7 @@ maybe_preempt(struct thread *td)
> > > > > > > > > > ("maybe_preempt: trying to run 
> > > > > > > > > > inhibited thread"));
> > > > > > > > > > pri = td->td_priority;
> > > > > > > > > > cpri = ctd->td_priority;
> > > > > > > > > > -   if (panicstr != NULL || pri >= cpri || cold /* || 
> > > > > > > > > > dumping */ ||
> > > > > > > > > > +   if (panicstr != NULL || pri >= cpri /* || dumping */ ||
> > > > > > > > > > TD_IS_INHIBITED(ctd))
> > > > > > > > > > return (0);
> > > > > > > > > >  #ifndef FULL_PREEMPTION
> > > > > > > > > > @@ -1127,7 +1126,7 @@ forward_wakeup(int cpunum)
> > > > > > > > > > if ((!forward_wakeup_enabled) ||
> > > > > > > > > >  (forward_wakeup_use_mask == 0 && 
> > > > > > > > > > forward_wakeup_use_loop == 0))
> > > > > > > > > > return (0);
> > > > > > > > > > -   if (!smp_started || cold || panicstr)
> > > > > > > > > > +   if (!smp_started || panicstr)
> > > > > > > > > > return (0);
> > > > > > > > > >  
> > > > > > > > > > forward_wakeups_requested++;
> > > > > > > > > >   
> > > > > > > > > 
> > > > > > > > > Thanks, but with this patch the kernel hangs in exactly the 
> > > > > > > > > same
> > > > > > > > > place as before - after the HPET output.
> > > > > > > > > 
> > > > > > > > > Maybe I'm missing some kernel option which ULE works around, 
> > > > > > > > > or
> > > > > > > > > something like that.  
> > > > > > > > 
> > > > > > > > Hmm, ok.  Please add KTR_RUNQ and KTR_SMP to the KTR masks, 
> > > > > > > > that is
> > > > > > > > 'options KTR_COMPILE=(KTR_PROC|KTR_RUNQ|KTR_SMP)' and
> > > > > > > > 'options KTR_MASK=(KTR_PROC|KTR_RUNQ|KTR_SMP)'
> > > > > > > > 
> > > > > > > > Please also add this patch (on top of the previous patch):
> > > > > > > > 
> > > > > > > > diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
> > > > > > > > index 2973a23..bab2278 100644
> > > > > > > > --- a/sys/kern/sched_4bsd.c
> > > > > > > > +++ b/sys/kern/sched_4bsd.c
> > > > > > > > @@ -1278,6 

Re: EARLY_AP_STARTUP hangs during boot

2016-08-02 Thread John Baldwin
On Tuesday, August 02, 2016 09:03:10 AM Gary Jennejohn wrote:
> On Mon, 01 Aug 2016 13:19:16 -0700
> John Baldwin  wrote:
> 
> > On Monday, August 01, 2016 03:31:11 PM Gary Jennejohn wrote:
> > > On Mon, 1 Aug 2016 09:34:34 +0200
> > > Gary Jennejohn  wrote:
> > >   
> > > > On Sun, 31 Jul 2016 14:22:35 -0700
> > > > John Baldwin  wrote:
> > > >   
> > > > > On Sunday, July 31, 2016 11:29:14 AM Gary Jennejohn wrote:
> > > > > > On Sat, 30 Jul 2016 12:03:59 -0700
> > > > > > John Baldwin  wrote:
> > > > > >   
> > > > > > > On Saturday, July 30, 2016 09:44:22 AM Gary Jennejohn wrote:  
> > > > > > > > On Fri, 29 Jul 2016 13:17:42 -0700
> > > > > > > > John Baldwin  wrote:
> > > > > > > > 
> > > > > > > > > On Thursday, July 28, 2016 12:31:31 AM Gary Jennejohn wrote:  
> > > > > > > > >   
> > > > > > > > > > Well, now I know that ULE is a prerequisite for
> > > > > > > > > > EARLY_AP_STARTUP!  I
> > > > > > > > > > wasn't aware of that.  I prefer BSD and that's the 
> > > > > > > > > > scheduler I did
> > > > > > > > > > the first tests with.
> > > > > > > > > > 
> > > > > > > > > > But with the ULE scheduler the system comes up all the way.
> > > > > > > > > > 
> > > > > > > > > > It would be nice if the BSD scheduler could also be 
> > > > > > > > > > modified to
> > > > > > > > > > work with EARLY_AP_STARTUP.  
> > > > > > > > > 
> > > > > > > > > I wasn't able to reproduce your hang with 4BSD, but I think I 
> > > > > > > > > see a
> > > > > > > > > possible problem.  Try this:
> > > > > > > > > 
> > > > > > > > > diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
> > > > > > > > > index 7de56b6..d53331a 100644
> > > > > > > > > --- a/sys/kern/sched_4bsd.c
> > > > > > > > > +++ b/sys/kern/sched_4bsd.c
> > > > > > > > > @@ -327,7 +327,6 @@ maybe_preempt(struct thread *td)
> > > > > > > > >*  - The current thread has a higher (numerically 
> > > > > > > > > lower) or
> > > > > > > > >*equivalent priority.  Note that this prevents 
> > > > > > > > > curthread from
> > > > > > > > >*trying to preempt to itself.
> > > > > > > > > -  *  - It is too early in the boot for context switches 
> > > > > > > > > (cold is set).
> > > > > > > > >*  - The current thread has an inhibitor set or is in 
> > > > > > > > > the process of
> > > > > > > > >*exiting.  In this case, the current thread is 
> > > > > > > > > about to switch
> > > > > > > > >*out anyways, so there's no point in preempting.  
> > > > > > > > > If we did,
> > > > > > > > > @@ -348,7 +347,7 @@ maybe_preempt(struct thread *td)
> > > > > > > > >   ("maybe_preempt: trying to run 
> > > > > > > > > inhibited thread"));
> > > > > > > > >   pri = td->td_priority;
> > > > > > > > >   cpri = ctd->td_priority;
> > > > > > > > > - if (panicstr != NULL || pri >= cpri || cold /* || 
> > > > > > > > > dumping */ ||
> > > > > > > > > + if (panicstr != NULL || pri >= cpri /* || dumping */ ||
> > > > > > > > >   TD_IS_INHIBITED(ctd))
> > > > > > > > >   return (0);
> > > > > > > > >  #ifndef FULL_PREEMPTION
> > > > > > > > > @@ -1127,7 +1126,7 @@ forward_wakeup(int cpunum)
> > > > > > > > >   if ((!forward_wakeup_enabled) ||
> > > > > > > > >(forward_wakeup_use_mask == 0 && 
> > > > > > > > > forward_wakeup_use_loop == 0))
> > > > > > > > >   return (0);
> > > > > > > > > - if (!smp_started || cold || panicstr)
> > > > > > > > > + if (!smp_started || panicstr)
> > > > > > > > >   return (0);
> > > > > > > > >  
> > > > > > > > >   forward_wakeups_requested++;
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > Thanks, but with this patch the kernel hangs in exactly the same
> > > > > > > > place as before - after the HPET output.
> > > > > > > > 
> > > > > > > > Maybe I'm missing some kernel option which ULE works around, or
> > > > > > > > something like that.
> > > > > > > 
> > > > > > > Hmm, ok.  Please add KTR_RUNQ and KTR_SMP to the KTR masks, that 
> > > > > > > is
> > > > > > > 'options KTR_COMPILE=(KTR_PROC|KTR_RUNQ|KTR_SMP)' and
> > > > > > > 'options KTR_MASK=(KTR_PROC|KTR_RUNQ|KTR_SMP)'
> > > > > > > 
> > > > > > > Please also add this patch (on top of the previous patch):
> > > > > > > 
> > > > > > > diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
> > > > > > > index 2973a23..bab2278 100644
> > > > > > > --- a/sys/kern/sched_4bsd.c
> > > > > > > +++ b/sys/kern/sched_4bsd.c
> > > > > > > @@ -1278,6 +1278,8 @@ sched_add(struct thread *td, int flags)
> > > > > > > KASSERT(td->td_flags & TDF_INMEM,
> > > > > > > ("sched_add: thread swapped out"));
> > > > > > >  
> > > > > > > +   CTR2(KTR_PROC, "sched_add: thread %d (%s)", td->td_tid,
> > > > > > > +   sched_tdname(td));
> > > > > > > 

Re: EARLY_AP_STARTUP hangs during boot

2016-08-02 Thread Gary Jennejohn
On Mon, 01 Aug 2016 13:19:16 -0700
John Baldwin  wrote:

> On Monday, August 01, 2016 03:31:11 PM Gary Jennejohn wrote:
> > On Mon, 1 Aug 2016 09:34:34 +0200
> > Gary Jennejohn  wrote:
> >   
> > > On Sun, 31 Jul 2016 14:22:35 -0700
> > > John Baldwin  wrote:
> > >   
> > > > On Sunday, July 31, 2016 11:29:14 AM Gary Jennejohn wrote:
> > > > > On Sat, 30 Jul 2016 12:03:59 -0700
> > > > > John Baldwin  wrote:
> > > > >   
> > > > > > On Saturday, July 30, 2016 09:44:22 AM Gary Jennejohn wrote:  
> > > > > > > On Fri, 29 Jul 2016 13:17:42 -0700
> > > > > > > John Baldwin  wrote:
> > > > > > > 
> > > > > > > > On Thursday, July 28, 2016 12:31:31 AM Gary Jennejohn wrote:
> > > > > > > > 
> > > > > > > > > Well, now I know that ULE is a prerequisite for
> > > > > > > > > EARLY_AP_STARTUP!  I
> > > > > > > > > wasn't aware of that.  I prefer BSD and that's the scheduler 
> > > > > > > > > I did
> > > > > > > > > the first tests with.
> > > > > > > > > 
> > > > > > > > > But with the ULE scheduler the system comes up all the way.
> > > > > > > > > 
> > > > > > > > > It would be nice if the BSD scheduler could also be modified 
> > > > > > > > > to
> > > > > > > > > work with EARLY_AP_STARTUP.  
> > > > > > > > 
> > > > > > > > I wasn't able to reproduce your hang with 4BSD, but I think I 
> > > > > > > > see a
> > > > > > > > possible problem.  Try this:
> > > > > > > > 
> > > > > > > > diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
> > > > > > > > index 7de56b6..d53331a 100644
> > > > > > > > --- a/sys/kern/sched_4bsd.c
> > > > > > > > +++ b/sys/kern/sched_4bsd.c
> > > > > > > > @@ -327,7 +327,6 @@ maybe_preempt(struct thread *td)
> > > > > > > >  *  - The current thread has a higher (numerically 
> > > > > > > > lower) or
> > > > > > > >  *equivalent priority.  Note that this prevents 
> > > > > > > > curthread from
> > > > > > > >  *trying to preempt to itself.
> > > > > > > > -*  - It is too early in the boot for context switches 
> > > > > > > > (cold is set).
> > > > > > > >  *  - The current thread has an inhibitor set or is in 
> > > > > > > > the process of
> > > > > > > >  *exiting.  In this case, the current thread is 
> > > > > > > > about to switch
> > > > > > > >  *out anyways, so there's no point in preempting.  
> > > > > > > > If we did,
> > > > > > > > @@ -348,7 +347,7 @@ maybe_preempt(struct thread *td)
> > > > > > > > ("maybe_preempt: trying to run 
> > > > > > > > inhibited thread"));
> > > > > > > > pri = td->td_priority;
> > > > > > > > cpri = ctd->td_priority;
> > > > > > > > -   if (panicstr != NULL || pri >= cpri || cold /* || 
> > > > > > > > dumping */ ||
> > > > > > > > +   if (panicstr != NULL || pri >= cpri /* || dumping */ ||
> > > > > > > > TD_IS_INHIBITED(ctd))
> > > > > > > > return (0);
> > > > > > > >  #ifndef FULL_PREEMPTION
> > > > > > > > @@ -1127,7 +1126,7 @@ forward_wakeup(int cpunum)
> > > > > > > > if ((!forward_wakeup_enabled) ||
> > > > > > > >  (forward_wakeup_use_mask == 0 && 
> > > > > > > > forward_wakeup_use_loop == 0))
> > > > > > > > return (0);
> > > > > > > > -   if (!smp_started || cold || panicstr)
> > > > > > > > +   if (!smp_started || panicstr)
> > > > > > > > return (0);
> > > > > > > >  
> > > > > > > > forward_wakeups_requested++;
> > > > > > > > 
> > > > > > > 
> > > > > > > Thanks, but with this patch the kernel hangs in exactly the same
> > > > > > > place as before - after the HPET output.
> > > > > > > 
> > > > > > > Maybe I'm missing some kernel option which ULE works around, or
> > > > > > > something like that.
> > > > > > 
> > > > > > Hmm, ok.  Please add KTR_RUNQ and KTR_SMP to the KTR masks, that is
> > > > > > 'options KTR_COMPILE=(KTR_PROC|KTR_RUNQ|KTR_SMP)' and
> > > > > > 'options KTR_MASK=(KTR_PROC|KTR_RUNQ|KTR_SMP)'
> > > > > > 
> > > > > > Please also add this patch (on top of the previous patch):
> > > > > > 
> > > > > > diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
> > > > > > index 2973a23..bab2278 100644
> > > > > > --- a/sys/kern/sched_4bsd.c
> > > > > > +++ b/sys/kern/sched_4bsd.c
> > > > > > @@ -1278,6 +1278,8 @@ sched_add(struct thread *td, int flags)
> > > > > > KASSERT(td->td_flags & TDF_INMEM,
> > > > > > ("sched_add: thread swapped out"));
> > > > > >  
> > > > > > +   CTR2(KTR_PROC, "sched_add: thread %d (%s)", td->td_tid,
> > > > > > +   sched_tdname(td));
> > > > > > KTR_STATE2(KTR_SCHED, "thread", sched_tdname(td), "runq 
> > > > > > add",
> > > > > > "prio:%d", td->td_priority, KTR_ATTR_LINKED,
> > > > > > sched_tdname(curthread));
> > > > > > diff --git 

Re: EARLY_AP_STARTUP hangs during boot

2016-08-01 Thread John Baldwin
On Monday, August 01, 2016 03:31:11 PM Gary Jennejohn wrote:
> On Mon, 1 Aug 2016 09:34:34 +0200
> Gary Jennejohn  wrote:
> 
> > On Sun, 31 Jul 2016 14:22:35 -0700
> > John Baldwin  wrote:
> > 
> > > On Sunday, July 31, 2016 11:29:14 AM Gary Jennejohn wrote:  
> > > > On Sat, 30 Jul 2016 12:03:59 -0700
> > > > John Baldwin  wrote:
> > > > 
> > > > > On Saturday, July 30, 2016 09:44:22 AM Gary Jennejohn wrote:
> > > > > > On Fri, 29 Jul 2016 13:17:42 -0700
> > > > > > John Baldwin  wrote:
> > > > > >   
> > > > > > > On Thursday, July 28, 2016 12:31:31 AM Gary Jennejohn wrote:  
> > > > > > > > Well, now I know that ULE is a prerequisite for
> > > > > > > > EARLY_AP_STARTUP!  I
> > > > > > > > wasn't aware of that.  I prefer BSD and that's the scheduler I 
> > > > > > > > did
> > > > > > > > the first tests with.
> > > > > > > > 
> > > > > > > > But with the ULE scheduler the system comes up all the way.
> > > > > > > > 
> > > > > > > > It would be nice if the BSD scheduler could also be modified to
> > > > > > > > work with EARLY_AP_STARTUP.
> > > > > > > 
> > > > > > > I wasn't able to reproduce your hang with 4BSD, but I think I see 
> > > > > > > a
> > > > > > > possible problem.  Try this:
> > > > > > > 
> > > > > > > diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
> > > > > > > index 7de56b6..d53331a 100644
> > > > > > > --- a/sys/kern/sched_4bsd.c
> > > > > > > +++ b/sys/kern/sched_4bsd.c
> > > > > > > @@ -327,7 +327,6 @@ maybe_preempt(struct thread *td)
> > > > > > >*  - The current thread has a higher (numerically lower) or
> > > > > > >*equivalent priority.  Note that this prevents curthread 
> > > > > > > from
> > > > > > >*trying to preempt to itself.
> > > > > > > -  *  - It is too early in the boot for context switches (cold is 
> > > > > > > set).
> > > > > > >*  - The current thread has an inhibitor set or is in the 
> > > > > > > process of
> > > > > > >*exiting.  In this case, the current thread is about to 
> > > > > > > switch
> > > > > > >*out anyways, so there's no point in preempting.  If we 
> > > > > > > did,
> > > > > > > @@ -348,7 +347,7 @@ maybe_preempt(struct thread *td)
> > > > > > >   ("maybe_preempt: trying to run inhibited 
> > > > > > > thread"));
> > > > > > >   pri = td->td_priority;
> > > > > > >   cpri = ctd->td_priority;
> > > > > > > - if (panicstr != NULL || pri >= cpri || cold /* || dumping */ ||
> > > > > > > + if (panicstr != NULL || pri >= cpri /* || dumping */ ||
> > > > > > >   TD_IS_INHIBITED(ctd))
> > > > > > >   return (0);
> > > > > > >  #ifndef FULL_PREEMPTION
> > > > > > > @@ -1127,7 +1126,7 @@ forward_wakeup(int cpunum)
> > > > > > >   if ((!forward_wakeup_enabled) ||
> > > > > > >(forward_wakeup_use_mask == 0 && forward_wakeup_use_loop 
> > > > > > > == 0))
> > > > > > >   return (0);
> > > > > > > - if (!smp_started || cold || panicstr)
> > > > > > > + if (!smp_started || panicstr)
> > > > > > >   return (0);
> > > > > > >  
> > > > > > >   forward_wakeups_requested++;
> > > > > > >   
> > > > > > 
> > > > > > Thanks, but with this patch the kernel hangs in exactly the same
> > > > > > place as before - after the HPET output.
> > > > > > 
> > > > > > Maybe I'm missing some kernel option which ULE works around, or
> > > > > > something like that.  
> > > > > 
> > > > > Hmm, ok.  Please add KTR_RUNQ and KTR_SMP to the KTR masks, that is
> > > > > 'options KTR_COMPILE=(KTR_PROC|KTR_RUNQ|KTR_SMP)' and
> > > > > 'options KTR_MASK=(KTR_PROC|KTR_RUNQ|KTR_SMP)'
> > > > > 
> > > > > Please also add this patch (on top of the previous patch):
> > > > > 
> > > > > diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
> > > > > index 2973a23..bab2278 100644
> > > > > --- a/sys/kern/sched_4bsd.c
> > > > > +++ b/sys/kern/sched_4bsd.c
> > > > > @@ -1278,6 +1278,8 @@ sched_add(struct thread *td, int flags)
> > > > > KASSERT(td->td_flags & TDF_INMEM,
> > > > > ("sched_add: thread swapped out"));
> > > > >  
> > > > > +   CTR2(KTR_PROC, "sched_add: thread %d (%s)", td->td_tid,
> > > > > +   sched_tdname(td));
> > > > > KTR_STATE2(KTR_SCHED, "thread", sched_tdname(td), "runq add",
> > > > > "prio:%d", td->td_priority, KTR_ATTR_LINKED,
> > > > > sched_tdname(curthread));
> > > > > diff --git a/sys/x86/x86/cpu_machdep.c b/sys/x86/x86/cpu_machdep.c
> > > > > index f07b97e..1f418f1 100644
> > > > > --- a/sys/x86/x86/cpu_machdep.c
> > > > > +++ b/sys/x86/x86/cpu_machdep.c
> > > > > @@ -440,6 +440,7 @@ cpu_idle_wakeup(int cpu)
> > > > > return (0);
> > > > > if (*state == STATE_MWAIT)
> > > > > *state = STATE_RUNNING;
> > > > > +   CTR1(KTR_PROC, "cpu_idle_wakeup: wokeup CPU %d", cpu);
> > > > > return (1);
> > > > >  }
> > > > > 
> > > > > (I haven't 

Re: EARLY_AP_STARTUP hangs during boot

2016-08-01 Thread Gary Jennejohn
On Mon, 1 Aug 2016 09:34:34 +0200
Gary Jennejohn  wrote:

> On Sun, 31 Jul 2016 14:22:35 -0700
> John Baldwin  wrote:
> 
> > On Sunday, July 31, 2016 11:29:14 AM Gary Jennejohn wrote:  
> > > On Sat, 30 Jul 2016 12:03:59 -0700
> > > John Baldwin  wrote:
> > > 
> > > > On Saturday, July 30, 2016 09:44:22 AM Gary Jennejohn wrote:
> > > > > On Fri, 29 Jul 2016 13:17:42 -0700
> > > > > John Baldwin  wrote:
> > > > >   
> > > > > > On Thursday, July 28, 2016 12:31:31 AM Gary Jennejohn wrote:  
> > > > > > > Well, now I know that ULE is a prerequisite for EARLY_AP_STARTUP!
> > > > > > > I
> > > > > > > wasn't aware of that.  I prefer BSD and that's the scheduler I did
> > > > > > > the first tests with.
> > > > > > > 
> > > > > > > But with the ULE scheduler the system comes up all the way.
> > > > > > > 
> > > > > > > It would be nice if the BSD scheduler could also be modified to
> > > > > > > work with EARLY_AP_STARTUP.
> > > > > > 
> > > > > > I wasn't able to reproduce your hang with 4BSD, but I think I see a
> > > > > > possible problem.  Try this:
> > > > > > 
> > > > > > diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
> > > > > > index 7de56b6..d53331a 100644
> > > > > > --- a/sys/kern/sched_4bsd.c
> > > > > > +++ b/sys/kern/sched_4bsd.c
> > > > > > @@ -327,7 +327,6 @@ maybe_preempt(struct thread *td)
> > > > > >  *  - The current thread has a higher (numerically lower) or
> > > > > >  *equivalent priority.  Note that this prevents curthread 
> > > > > > from
> > > > > >  *trying to preempt to itself.
> > > > > > -*  - It is too early in the boot for context switches (cold is 
> > > > > > set).
> > > > > >  *  - The current thread has an inhibitor set or is in the 
> > > > > > process of
> > > > > >  *exiting.  In this case, the current thread is about to 
> > > > > > switch
> > > > > >  *out anyways, so there's no point in preempting.  If we 
> > > > > > did,
> > > > > > @@ -348,7 +347,7 @@ maybe_preempt(struct thread *td)
> > > > > > ("maybe_preempt: trying to run inhibited 
> > > > > > thread"));
> > > > > > pri = td->td_priority;
> > > > > > cpri = ctd->td_priority;
> > > > > > -   if (panicstr != NULL || pri >= cpri || cold /* || dumping */ ||
> > > > > > +   if (panicstr != NULL || pri >= cpri /* || dumping */ ||
> > > > > > TD_IS_INHIBITED(ctd))
> > > > > > return (0);
> > > > > >  #ifndef FULL_PREEMPTION
> > > > > > @@ -1127,7 +1126,7 @@ forward_wakeup(int cpunum)
> > > > > > if ((!forward_wakeup_enabled) ||
> > > > > >  (forward_wakeup_use_mask == 0 && forward_wakeup_use_loop 
> > > > > > == 0))
> > > > > > return (0);
> > > > > > -   if (!smp_started || cold || panicstr)
> > > > > > +   if (!smp_started || panicstr)
> > > > > > return (0);
> > > > > >  
> > > > > > forward_wakeups_requested++;
> > > > > >   
> > > > > 
> > > > > Thanks, but with this patch the kernel hangs in exactly the same
> > > > > place as before - after the HPET output.
> > > > > 
> > > > > Maybe I'm missing some kernel option which ULE works around, or
> > > > > something like that.  
> > > > 
> > > > Hmm, ok.  Please add KTR_RUNQ and KTR_SMP to the KTR masks, that is
> > > > 'options KTR_COMPILE=(KTR_PROC|KTR_RUNQ|KTR_SMP)' and
> > > > 'options KTR_MASK=(KTR_PROC|KTR_RUNQ|KTR_SMP)'
> > > > 
> > > > Please also add this patch (on top of the previous patch):
> > > > 
> > > > diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
> > > > index 2973a23..bab2278 100644
> > > > --- a/sys/kern/sched_4bsd.c
> > > > +++ b/sys/kern/sched_4bsd.c
> > > > @@ -1278,6 +1278,8 @@ sched_add(struct thread *td, int flags)
> > > > KASSERT(td->td_flags & TDF_INMEM,
> > > > ("sched_add: thread swapped out"));
> > > >  
> > > > +   CTR2(KTR_PROC, "sched_add: thread %d (%s)", td->td_tid,
> > > > +   sched_tdname(td));
> > > > KTR_STATE2(KTR_SCHED, "thread", sched_tdname(td), "runq add",
> > > > "prio:%d", td->td_priority, KTR_ATTR_LINKED,
> > > > sched_tdname(curthread));
> > > > diff --git a/sys/x86/x86/cpu_machdep.c b/sys/x86/x86/cpu_machdep.c
> > > > index f07b97e..1f418f1 100644
> > > > --- a/sys/x86/x86/cpu_machdep.c
> > > > +++ b/sys/x86/x86/cpu_machdep.c
> > > > @@ -440,6 +440,7 @@ cpu_idle_wakeup(int cpu)
> > > > return (0);
> > > > if (*state == STATE_MWAIT)
> > > > *state = STATE_RUNNING;
> > > > +   CTR1(KTR_PROC, "cpu_idle_wakeup: wokeup CPU %d", cpu);
> > > > return (1);
> > > >  }
> > > > 
> > > > (I haven't tried compiling it, you might have to add the sys/ktr.h
> > > > header to cpu_machdep.c if it doesn't build.)
> > > > 
> > > > Hopefully we will get some better trace messages before it hangs
> > > > with this added info.  The root issue seems to be that 4BSD 

Re: EARLY_AP_STARTUP hangs during boot

2016-08-01 Thread Gary Jennejohn
On Sun, 31 Jul 2016 14:22:35 -0700
John Baldwin  wrote:

> On Sunday, July 31, 2016 11:29:14 AM Gary Jennejohn wrote:
> > On Sat, 30 Jul 2016 12:03:59 -0700
> > John Baldwin  wrote:
> >   
> > > On Saturday, July 30, 2016 09:44:22 AM Gary Jennejohn wrote:  
> > > > On Fri, 29 Jul 2016 13:17:42 -0700
> > > > John Baldwin  wrote:
> > > > 
> > > > > On Thursday, July 28, 2016 12:31:31 AM Gary Jennejohn wrote:
> > > > > > Well, now I know that ULE is a prerequisite for EARLY_AP_STARTUP!  I
> > > > > > wasn't aware of that.  I prefer BSD and that's the scheduler I did
> > > > > > the first tests with.
> > > > > > 
> > > > > > But with the ULE scheduler the system comes up all the way.
> > > > > > 
> > > > > > It would be nice if the BSD scheduler could also be modified to
> > > > > > work with EARLY_AP_STARTUP.  
> > > > > 
> > > > > I wasn't able to reproduce your hang with 4BSD, but I think I see a
> > > > > possible problem.  Try this:
> > > > > 
> > > > > diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
> > > > > index 7de56b6..d53331a 100644
> > > > > --- a/sys/kern/sched_4bsd.c
> > > > > +++ b/sys/kern/sched_4bsd.c
> > > > > @@ -327,7 +327,6 @@ maybe_preempt(struct thread *td)
> > > > >  	 *  - The current thread has a higher (numerically lower) or
> > > > >  	 *    equivalent priority.  Note that this prevents curthread from
> > > > >  	 *    trying to preempt to itself.
> > > > > -	 *  - It is too early in the boot for context switches (cold is set).
> > > > >  	 *  - The current thread has an inhibitor set or is in the process of
> > > > >  	 *    exiting.  In this case, the current thread is about to switch
> > > > >  	 *    out anyways, so there's no point in preempting.  If we did,
> > > > > @@ -348,7 +347,7 @@ maybe_preempt(struct thread *td)
> > > > >  			("maybe_preempt: trying to run inhibited thread"));
> > > > >  	pri = td->td_priority;
> > > > >  	cpri = ctd->td_priority;
> > > > > -	if (panicstr != NULL || pri >= cpri || cold /* || dumping */ ||
> > > > > +	if (panicstr != NULL || pri >= cpri /* || dumping */ ||
> > > > >  	    TD_IS_INHIBITED(ctd))
> > > > >  		return (0);
> > > > >  #ifndef FULL_PREEMPTION
> > > > > @@ -1127,7 +1126,7 @@ forward_wakeup(int cpunum)
> > > > >  	if ((!forward_wakeup_enabled) ||
> > > > >  	     (forward_wakeup_use_mask == 0 && forward_wakeup_use_loop == 0))
> > > > >  		return (0);
> > > > > -	if (!smp_started || cold || panicstr)
> > > > > +	if (!smp_started || panicstr)
> > > > >  		return (0);
> > > > >  
> > > > >  	forward_wakeups_requested++;
> > > > > 
> > > > 
> > > > Thanks, but with this patch the kernel hangs in exactly the same
> > > > place as before - after the HPET output.
> > > > 
> > > > Maybe I'm missing some kernel option which ULE works around, or
> > > > something like that.
> > > 
> > > Hmm, ok.  Please add KTR_RUNQ and KTR_SMP to the KTR masks, that is
> > > 'options KTR_COMPILE=(KTR_PROC|KTR_RUNQ|KTR_SMP)' and
> > > 'options KTR_MASK=(KTR_PROC|KTR_RUNQ|KTR_SMP)'
> > > 
> > > Please also add this patch (on top of the previous patch):
> > > 
> > > diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
> > > index 2973a23..bab2278 100644
> > > --- a/sys/kern/sched_4bsd.c
> > > +++ b/sys/kern/sched_4bsd.c
> > > @@ -1278,6 +1278,8 @@ sched_add(struct thread *td, int flags)
> > >  	KASSERT(td->td_flags & TDF_INMEM,
> > >  	    ("sched_add: thread swapped out"));
> > >  
> > > +	CTR2(KTR_PROC, "sched_add: thread %d (%s)", td->td_tid,
> > > +	    sched_tdname(td));
> > >  	KTR_STATE2(KTR_SCHED, "thread", sched_tdname(td), "runq add",
> > >  	    "prio:%d", td->td_priority, KTR_ATTR_LINKED,
> > >  	    sched_tdname(curthread));
> > > diff --git a/sys/x86/x86/cpu_machdep.c b/sys/x86/x86/cpu_machdep.c
> > > index f07b97e..1f418f1 100644
> > > --- a/sys/x86/x86/cpu_machdep.c
> > > +++ b/sys/x86/x86/cpu_machdep.c
> > > @@ -440,6 +440,7 @@ cpu_idle_wakeup(int cpu)
> > >  		return (0);
> > >  	if (*state == STATE_MWAIT)
> > >  		*state = STATE_RUNNING;
> > > +	CTR1(KTR_PROC, "cpu_idle_wakeup: wokeup CPU %d", cpu);
> > >  	return (1);
> > >  }
> > > 
> > > (I haven't tried compiling it, you might have to add the sys/ktr.h
> > > header to cpu_machdep.c if it doesn't build.)
> > > 
> > > Hopefully we will get some better trace messages before it hangs
> > > with this added info.  The root issue seems to be that 4BSD is
> > > pinning thread0 to some other CPU (due to sched_bind that happens
> > > inside of bus_bind_intr() when the HPET driver pins IRQs to CPUs)
> > > and that other CPU isn't waking up to realize it needs to run thread0.
> > >   
> > 
> > It compiled with no changes needed.
> 

Re: EARLY_AP_STARTUP hangs during boot

2016-07-31 Thread John Baldwin
On Sunday, July 31, 2016 11:29:14 AM Gary Jennejohn wrote:
> On Sat, 30 Jul 2016 12:03:59 -0700
> John Baldwin  wrote:
> 
> > On Saturday, July 30, 2016 09:44:22 AM Gary Jennejohn wrote:
> > > On Fri, 29 Jul 2016 13:17:42 -0700
> > > John Baldwin  wrote:
> > >   
> > > > On Thursday, July 28, 2016 12:31:31 AM Gary Jennejohn wrote:  
> > > > > Well, now I know that ULE is a prerequisite for EARLY_AP_STARTUP!  I
> > > > > wasn't aware of that.  I prefer BSD and that's the scheduler I did
> > > > > the first tests with.
> > > > > 
> > > > > But with the ULE scheduler the system comes up all the way.
> > > > > 
> > > > > It would be nice if the BSD scheduler could also be modified to
> > > > > work with EARLY_AP_STARTUP.
> > > > 
> > > > I wasn't able to reproduce your hang with 4BSD, but I think I see a
> > > > possible problem.  Try this:
> > > > 
> > > > diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
> > > > index 7de56b6..d53331a 100644
> > > > --- a/sys/kern/sched_4bsd.c
> > > > +++ b/sys/kern/sched_4bsd.c
> > > > @@ -327,7 +327,6 @@ maybe_preempt(struct thread *td)
> > > >  *  - The current thread has a higher (numerically lower) or
> > > >  *equivalent priority.  Note that this prevents curthread 
> > > > from
> > > >  *trying to preempt to itself.
> > > > -*  - It is too early in the boot for context switches (cold is 
> > > > set).
> > > >  *  - The current thread has an inhibitor set or is in the 
> > > > process of
> > > >  *exiting.  In this case, the current thread is about to 
> > > > switch
> > > >  *out anyways, so there's no point in preempting.  If we 
> > > > did,
> > > > @@ -348,7 +347,7 @@ maybe_preempt(struct thread *td)
> > > > ("maybe_preempt: trying to run inhibited 
> > > > thread"));
> > > > pri = td->td_priority;
> > > > cpri = ctd->td_priority;
> > > > -   if (panicstr != NULL || pri >= cpri || cold /* || dumping */ ||
> > > > +   if (panicstr != NULL || pri >= cpri /* || dumping */ ||
> > > > TD_IS_INHIBITED(ctd))
> > > > return (0);
> > > >  #ifndef FULL_PREEMPTION
> > > > @@ -1127,7 +1126,7 @@ forward_wakeup(int cpunum)
> > > > if ((!forward_wakeup_enabled) ||
> > > >  (forward_wakeup_use_mask == 0 && forward_wakeup_use_loop 
> > > > == 0))
> > > > return (0);
> > > > -   if (!smp_started || cold || panicstr)
> > > > +   if (!smp_started || panicstr)
> > > > return (0);
> > > >  
> > > > forward_wakeups_requested++;
> > > >   
> > > 
> > > Thanks, but with this patch the kernel hangs in exactly the same
> > > place as before - after the HPET output.
> > > 
> > > Maybe I'm missing some kernel option which ULE works around, or
> > > something like that.  
> > 
> > Hmm, ok.  Please add KTR_RUNQ and KTR_SMP to the KTR masks, that is
> > 'options KTR_COMPILE=(KTR_PROC|KTR_RUNQ|KTR_SMP)' and
> > 'options KTR_MASK=(KTR_PROC|KTR_RUNQ|KTR_SMP)'
> > 
> > Please also add this patch (on top of the previous patch):
> > 
> > diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
> > index 2973a23..bab2278 100644
> > --- a/sys/kern/sched_4bsd.c
> > +++ b/sys/kern/sched_4bsd.c
> > @@ -1278,6 +1278,8 @@ sched_add(struct thread *td, int flags)
> > KASSERT(td->td_flags & TDF_INMEM,
> > ("sched_add: thread swapped out"));
> >  
> > +   CTR2(KTR_PROC, "sched_add: thread %d (%s)", td->td_tid,
> > +   sched_tdname(td));
> > KTR_STATE2(KTR_SCHED, "thread", sched_tdname(td), "runq add",
> > "prio:%d", td->td_priority, KTR_ATTR_LINKED,
> > sched_tdname(curthread));
> > diff --git a/sys/x86/x86/cpu_machdep.c b/sys/x86/x86/cpu_machdep.c
> > index f07b97e..1f418f1 100644
> > --- a/sys/x86/x86/cpu_machdep.c
> > +++ b/sys/x86/x86/cpu_machdep.c
> > @@ -440,6 +440,7 @@ cpu_idle_wakeup(int cpu)
> > return (0);
> > if (*state == STATE_MWAIT)
> > *state = STATE_RUNNING;
> > +   CTR1(KTR_PROC, "cpu_idle_wakeup: wokeup CPU %d", cpu);
> > return (1);
> >  }
> > 
> > (I haven't tried compiling it, you might have to add the sys/ktr.h
> > header to cpu_machdep.c if it doesn't build.)
> > 
> > Hopefully we will get some better trace messages before it hangs
> > with this added info.  The root issue seems to be that 4BSD is
> > pinning thread0 to some other CPU (due to sched_bind that happens
> > inside of bus_bind_intr() when the HPET driver pins IRQs to CPUs)
> > and that other CPU isn't waking up to realize it needs to run thread0.
> > 
> 
> It compiled with no changes needed.
> 
> Even though I set MAXCPU to a mere 2, the boot still hadn't
> completed after 90 minutes and I broke it off.  I still have
> the kernel, so I can try it another time when I have less need
> for my FreeBSD box.

Did you have the KTR options enabled from before?  

Re: EARLY_AP_STARTUP hangs during boot

2016-07-31 Thread Gary Jennejohn
On Sat, 30 Jul 2016 12:03:59 -0700
John Baldwin  wrote:

> On Saturday, July 30, 2016 09:44:22 AM Gary Jennejohn wrote:
> > On Fri, 29 Jul 2016 13:17:42 -0700
> > John Baldwin  wrote:
> >   
> > > On Thursday, July 28, 2016 12:31:31 AM Gary Jennejohn wrote:  
> > > > Well, now I know that ULE is a prerequisite for EARLY_AP_STARTUP!  I
> > > > wasn't aware of that.  I prefer BSD and that's the scheduler I did
> > > > the first tests with.
> > > > 
> > > > But with the ULE scheduler the system comes up all the way.
> > > > 
> > > > It would be nice if the BSD scheduler could also be modified to
> > > > work with EARLY_AP_STARTUP.
> > > 
> > > I wasn't able to reproduce your hang with 4BSD, but I think I see a
> > > possible problem.  Try this:
> > > 
> > > diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
> > > index 7de56b6..d53331a 100644
> > > --- a/sys/kern/sched_4bsd.c
> > > +++ b/sys/kern/sched_4bsd.c
> > > @@ -327,7 +327,6 @@ maybe_preempt(struct thread *td)
> > >*  - The current thread has a higher (numerically lower) or
> > >*equivalent priority.  Note that this prevents curthread from
> > >*trying to preempt to itself.
> > > -  *  - It is too early in the boot for context switches (cold is set).
> > >*  - The current thread has an inhibitor set or is in the process of
> > >*exiting.  In this case, the current thread is about to switch
> > >*out anyways, so there's no point in preempting.  If we did,
> > > @@ -348,7 +347,7 @@ maybe_preempt(struct thread *td)
> > >   ("maybe_preempt: trying to run inhibited thread"));
> > >   pri = td->td_priority;
> > >   cpri = ctd->td_priority;
> > > - if (panicstr != NULL || pri >= cpri || cold /* || dumping */ ||
> > > + if (panicstr != NULL || pri >= cpri /* || dumping */ ||
> > >   TD_IS_INHIBITED(ctd))
> > >   return (0);
> > >  #ifndef FULL_PREEMPTION
> > > @@ -1127,7 +1126,7 @@ forward_wakeup(int cpunum)
> > >   if ((!forward_wakeup_enabled) ||
> > >(forward_wakeup_use_mask == 0 && forward_wakeup_use_loop == 0))
> > >   return (0);
> > > - if (!smp_started || cold || panicstr)
> > > + if (!smp_started || panicstr)
> > >   return (0);
> > >  
> > >   forward_wakeups_requested++;
> > >   
> > 
> > Thanks, but with this patch the kernel hangs in exactly the same
> > place as before - after the HPET output.
> > 
> > Maybe I'm missing some kernel option which ULE works around, or
> > something like that.  
> 
> Hmm, ok.  Please add KTR_RUNQ and KTR_SMP to the KTR masks, that is
> 'options KTR_COMPILE=(KTR_PROC|KTR_RUNQ|KTR_SMP)' and
> 'options KTR_MASK=(KTR_PROC|KTR_RUNQ|KTR_SMP)'
> 
> Please also add this patch (on top of the previous patch):
> 
> diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
> index 2973a23..bab2278 100644
> --- a/sys/kern/sched_4bsd.c
> +++ b/sys/kern/sched_4bsd.c
> @@ -1278,6 +1278,8 @@ sched_add(struct thread *td, int flags)
> KASSERT(td->td_flags & TDF_INMEM,
> ("sched_add: thread swapped out"));
>  
> +   CTR2(KTR_PROC, "sched_add: thread %d (%s)", td->td_tid,
> +   sched_tdname(td));
> KTR_STATE2(KTR_SCHED, "thread", sched_tdname(td), "runq add",
> "prio:%d", td->td_priority, KTR_ATTR_LINKED,
> sched_tdname(curthread));
> diff --git a/sys/x86/x86/cpu_machdep.c b/sys/x86/x86/cpu_machdep.c
> index f07b97e..1f418f1 100644
> --- a/sys/x86/x86/cpu_machdep.c
> +++ b/sys/x86/x86/cpu_machdep.c
> @@ -440,6 +440,7 @@ cpu_idle_wakeup(int cpu)
> return (0);
> if (*state == STATE_MWAIT)
> *state = STATE_RUNNING;
> +   CTR1(KTR_PROC, "cpu_idle_wakeup: wokeup CPU %d", cpu);
> return (1);
>  }
> 
> (I haven't tried compiling it, you might have to add the sys/ktr.h
> header to cpu_machdep.c if it doesn't build.)
> 
> Hopefully we will get some better trace messages before it hangs
> with this added info.  The root issue seems to be that 4BSD is
> pinning thread0 to some other CPU (due to sched_bind that happens
> inside of bus_bind_intr() when the HPET driver pins IRQs to CPUs)
> and that other CPU isn't waking up to realize it needs to run thread0.
> 

It compiled with no changes needed.

Even though I set MAXCPU to a mere 2, the boot still hadn't
completed after 90 minutes and I broke it off.  I still have
the kernel, so I can try it another time when I have less need
for my FreeBSD box.

-- 
Gary Jennejohn
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: EARLY_AP_STARTUP hangs during boot

2016-07-30 Thread John Baldwin
On Saturday, July 30, 2016 09:44:22 AM Gary Jennejohn wrote:
> On Fri, 29 Jul 2016 13:17:42 -0700
> John Baldwin  wrote:
> 
> > On Thursday, July 28, 2016 12:31:31 AM Gary Jennejohn wrote:
> > > Well, now I know that ULE is a prerequisite for EARLY_AP_STARTUP!  I
> > > wasn't aware of that.  I prefer BSD and that's the scheduler I did
> > > the first tests with.
> > > 
> > > But with the ULE scheduler the system comes up all the way.
> > > 
> > > It would be nice if the BSD scheduler could also be modified to
> > > work with EARLY_AP_STARTUP.  
> > 
> > I wasn't able to reproduce your hang with 4BSD, but I think I see a
> > possible problem.  Try this:
> > 
> > diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
> > index 7de56b6..d53331a 100644
> > --- a/sys/kern/sched_4bsd.c
> > +++ b/sys/kern/sched_4bsd.c
> > @@ -327,7 +327,6 @@ maybe_preempt(struct thread *td)
> >  *  - The current thread has a higher (numerically lower) or
> >  *equivalent priority.  Note that this prevents curthread from
> >  *trying to preempt to itself.
> > -*  - It is too early in the boot for context switches (cold is set).
> >  *  - The current thread has an inhibitor set or is in the process of
> >  *exiting.  In this case, the current thread is about to switch
> >  *out anyways, so there's no point in preempting.  If we did,
> > @@ -348,7 +347,7 @@ maybe_preempt(struct thread *td)
> > ("maybe_preempt: trying to run inhibited thread"));
> > pri = td->td_priority;
> > cpri = ctd->td_priority;
> > -   if (panicstr != NULL || pri >= cpri || cold /* || dumping */ ||
> > +   if (panicstr != NULL || pri >= cpri /* || dumping */ ||
> > TD_IS_INHIBITED(ctd))
> > return (0);
> >  #ifndef FULL_PREEMPTION
> > @@ -1127,7 +1126,7 @@ forward_wakeup(int cpunum)
> > if ((!forward_wakeup_enabled) ||
> >  (forward_wakeup_use_mask == 0 && forward_wakeup_use_loop == 0))
> > return (0);
> > -   if (!smp_started || cold || panicstr)
> > +   if (!smp_started || panicstr)
> > return (0);
> >  
> > forward_wakeups_requested++;
> > 
> 
> Thanks, but with this patch the kernel hangs in exactly the same
> place as before - after the HPET output.
> 
> Maybe I'm missing some kernel option which ULE works around, or
> something like that.

Hmm, ok.  Please add KTR_RUNQ and KTR_SMP to the KTR masks, that is
'options KTR_COMPILE=(KTR_PROC|KTR_RUNQ|KTR_SMP)' and
'options KTR_MASK=(KTR_PROC|KTR_RUNQ|KTR_SMP)'
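
As a rough illustration of why both options need the extra classes, here
is a small user-space sketch (not kernel code) of how a class mask gates a
tracepoint: an event is only logged if its class bit is present in the
compile-time mask and in the run-time mask.  The DEMO_* names and bit
values and demo_would_trace() are made up for this example; the real class
values come from the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; the real ones are defined by the kernel. */
#define DEMO_KTR_SMP	0x1u
#define DEMO_KTR_PROC	0x2u
#define DEMO_KTR_RUNQ	0x4u
#define DEMO_KTR_SCHED	0x8u

/*
 * Models one tracepoint: it is logged only if its class was compiled in
 * (compile_mask, like KTR_COMPILE) and is enabled at run time (run_mask,
 * like KTR_MASK).
 */
bool
demo_would_trace(uint32_t compile_mask, uint32_t run_mask, uint32_t event_class)
{
	assert(event_class != 0);
	if ((compile_mask & event_class) == 0)
		return (false);		/* tracepoint compiled out */
	return ((run_mask & event_class) != 0);
}
```

With both masks set to (DEMO_KTR_PROC | DEMO_KTR_RUNQ | DEMO_KTR_SMP), a
RUNQ event passes the check, while a class left out of either mask stays
silent.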

Please also add this patch (on top of the previous patch):

diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
index 2973a23..bab2278 100644
--- a/sys/kern/sched_4bsd.c
+++ b/sys/kern/sched_4bsd.c
@@ -1278,6 +1278,8 @@ sched_add(struct thread *td, int flags)
 	KASSERT(td->td_flags & TDF_INMEM,
 	    ("sched_add: thread swapped out"));
 
+	CTR2(KTR_PROC, "sched_add: thread %d (%s)", td->td_tid,
+	    sched_tdname(td));
 	KTR_STATE2(KTR_SCHED, "thread", sched_tdname(td), "runq add",
 	    "prio:%d", td->td_priority, KTR_ATTR_LINKED,
 	    sched_tdname(curthread));
diff --git a/sys/x86/x86/cpu_machdep.c b/sys/x86/x86/cpu_machdep.c
index f07b97e..1f418f1 100644
--- a/sys/x86/x86/cpu_machdep.c
+++ b/sys/x86/x86/cpu_machdep.c
@@ -440,6 +440,7 @@ cpu_idle_wakeup(int cpu)
 		return (0);
 	if (*state == STATE_MWAIT)
 		*state = STATE_RUNNING;
+	CTR1(KTR_PROC, "cpu_idle_wakeup: wokeup CPU %d", cpu);
 	return (1);
 }

(I haven't tried compiling it, so you might have to add the sys/ktr.h
header to cpu_machdep.c if it doesn't build.)

Hopefully we will get some better trace messages before it hangs
with this added info.  The root issue seems to be that 4BSD is
pinning thread0 to some other CPU (due to sched_bind that happens
inside of bus_bind_intr() when the HPET driver pins IRQs to CPUs)
and that other CPU isn't waking up to realize it needs to run thread0.
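
To spell out the failure mode I have in mind, here is a toy user-space
model (a sketch only, nothing like the real scheduler): a thread is queued
on the CPU it is bound to, and that CPU only looks at its run queue if
something kicks it out of its idle sleep.  All of the demo_* names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NCPU 2

/* Toy per-CPU state: a one-slot run queue and an "asleep in idle" flag. */
struct demo_cpu {
	int	queued_tid;	/* -1 when nothing is queued */
	bool	idle_sleeping;	/* true while parked in the idle loop */
	int	last_ran_tid;	/* -1 until something has run */
};

void
demo_init(struct demo_cpu cpus[DEMO_NCPU])
{
	for (int i = 0; i < DEMO_NCPU; i++) {
		cpus[i].queued_tid = -1;
		cpus[i].idle_sleeping = true;
		cpus[i].last_ran_tid = -1;
	}
}

/* Queue a thread bound to 'cpu'; optionally wake that CPU from idle. */
void
demo_bind_and_queue(struct demo_cpu cpus[DEMO_NCPU], int cpu, int tid,
    bool send_wakeup)
{
	assert(cpu >= 0 && cpu < DEMO_NCPU);
	cpus[cpu].queued_tid = tid;
	if (send_wakeup)
		cpus[cpu].idle_sleeping = false;
}

/* One scheduling pass: a sleeping CPU never examines its run queue. */
void
demo_step(struct demo_cpu cpus[DEMO_NCPU])
{
	for (int i = 0; i < DEMO_NCPU; i++) {
		if (cpus[i].idle_sleeping || cpus[i].queued_tid < 0)
			continue;
		cpus[i].last_ran_tid = cpus[i].queued_tid;
		cpus[i].queued_tid = -1;
	}
}
```

Binding a thread to CPU 1 without the wakeup leaves it queued forever in
this model, which is the sort of hang the new trace messages should either
confirm or rule out.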

-- 
John Baldwin


Re: EARLY_AP_STARTUP hangs during boot

2016-07-30 Thread Gary Jennejohn
On Fri, 29 Jul 2016 13:17:42 -0700
John Baldwin  wrote:

> On Thursday, July 28, 2016 12:31:31 AM Gary Jennejohn wrote:
> > Well, now I know that ULE is a prerequisite for EARLY_AP_STARTUP!  I
> > wasn't aware of that.  I prefer BSD and that's the scheduler I did
> > the first tests with.
> > 
> > But with the ULE scheduler the system comes up all the way.
> > 
> > It would be nice if the BSD scheduler could also be modified to
> > work with EARLY_AP_STARTUP.  
> 
> I wasn't able to reproduce your hang with 4BSD, but I think I see a
> possible problem.  Try this:
> 
> diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
> index 7de56b6..d53331a 100644
> --- a/sys/kern/sched_4bsd.c
> +++ b/sys/kern/sched_4bsd.c
> @@ -327,7 +327,6 @@ maybe_preempt(struct thread *td)
>*  - The current thread has a higher (numerically lower) or
>*equivalent priority.  Note that this prevents curthread from
>*trying to preempt to itself.
> -  *  - It is too early in the boot for context switches (cold is set).
>*  - The current thread has an inhibitor set or is in the process of
>*exiting.  In this case, the current thread is about to switch
>*out anyways, so there's no point in preempting.  If we did,
> @@ -348,7 +347,7 @@ maybe_preempt(struct thread *td)
>   ("maybe_preempt: trying to run inhibited thread"));
>   pri = td->td_priority;
>   cpri = ctd->td_priority;
> - if (panicstr != NULL || pri >= cpri || cold /* || dumping */ ||
> + if (panicstr != NULL || pri >= cpri /* || dumping */ ||
>   TD_IS_INHIBITED(ctd))
>   return (0);
>  #ifndef FULL_PREEMPTION
> @@ -1127,7 +1126,7 @@ forward_wakeup(int cpunum)
>   if ((!forward_wakeup_enabled) ||
>(forward_wakeup_use_mask == 0 && forward_wakeup_use_loop == 0))
>   return (0);
> - if (!smp_started || cold || panicstr)
> + if (!smp_started || panicstr)
>   return (0);
>  
>   forward_wakeups_requested++;
> 

Thanks, but with this patch the kernel hangs in exactly the same
place as before - after the HPET output.

Maybe I'm missing some kernel option which ULE works around, or
something like that.

-- 
Gary Jennejohn


Re: EARLY_AP_STARTUP hangs during boot

2016-07-29 Thread John Baldwin
On Thursday, July 28, 2016 12:31:31 AM Gary Jennejohn wrote:
> Well, now I know that ULE is a prerequisite for EARLY_AP_STARTUP!  I
> wasn't aware of that.  I prefer BSD and that's the scheduler I did
> the first tests with.
> 
> But with the ULE scheduler the system comes up all the way.
> 
> It would be nice if the BSD scheduler could also be modified to
> work with EARLY_AP_STARTUP.

I wasn't able to reproduce your hang with 4BSD, but I think I see a
possible problem.  Try this:

diff --git a/sys/kern/sched_4bsd.c b/sys/kern/sched_4bsd.c
index 7de56b6..d53331a 100644
--- a/sys/kern/sched_4bsd.c
+++ b/sys/kern/sched_4bsd.c
@@ -327,7 +327,6 @@ maybe_preempt(struct thread *td)
 	 *  - The current thread has a higher (numerically lower) or
 	 *    equivalent priority.  Note that this prevents curthread from
 	 *    trying to preempt to itself.
-	 *  - It is too early in the boot for context switches (cold is set).
 	 *  - The current thread has an inhibitor set or is in the process of
 	 *    exiting.  In this case, the current thread is about to switch
 	 *    out anyways, so there's no point in preempting.  If we did,
@@ -348,7 +347,7 @@ maybe_preempt(struct thread *td)
 	    ("maybe_preempt: trying to run inhibited thread"));
 	pri = td->td_priority;
 	cpri = ctd->td_priority;
-	if (panicstr != NULL || pri >= cpri || cold /* || dumping */ ||
+	if (panicstr != NULL || pri >= cpri /* || dumping */ ||
 	    TD_IS_INHIBITED(ctd))
 		return (0);
 #ifndef FULL_PREEMPTION
@@ -1127,7 +1126,7 @@ forward_wakeup(int cpunum)
 	if ((!forward_wakeup_enabled) ||
 	     (forward_wakeup_use_mask == 0 && forward_wakeup_use_loop == 0))
 		return (0);
-	if (!smp_started || cold || panicstr)
+	if (!smp_started || panicstr)
 		return (0);
 
 	forward_wakeups_requested++;
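
For what it's worth, here is the forward_wakeup() guard from the patch
above written out as a tiny user-space sketch (not kernel code), before
and after the change.  struct boot_state and the wakeup_blocked_*() names
are made up for illustration, and the premise is an assumption on my part:
that with EARLY_AP_STARTUP the APs can already be running (smp_started
set) while cold is still set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel globals the guard tests. */
struct boot_state {
	bool	smp_started;	/* APs have been launched */
	bool	cold;		/* still in early boot */
	bool	panicking;	/* models panicstr != NULL */
};

/* Guard before the patch: returns true if the wakeup is skipped. */
bool
wakeup_blocked_old(const struct boot_state *bs)
{
	assert(bs != NULL);
	return (!bs->smp_started || bs->cold || bs->panicking);
}

/* Guard after the patch: cold no longer suppresses the wakeup. */
bool
wakeup_blocked_new(const struct boot_state *bs)
{
	assert(bs != NULL);
	return (!bs->smp_started || bs->panicking);
}
```

In the state { smp_started = true, cold = true, panicking = false } the
old guard skips the wakeup and the new one lets it through; in every state
with cold clear the two agree.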


-- 
John Baldwin


Re: EARLY_AP_STARTUP hangs during boot

2016-07-28 Thread John Baldwin
On Thursday, July 28, 2016 12:31:31 AM Gary Jennejohn wrote:
> On Wed, 27 Jul 2016 14:43:36 -0700
> John Baldwin  wrote:
> 
> > On Tuesday, June 07, 2016 12:06:54 PM Gary Jennejohn wrote:
> > > On Tue, 31 May 2016 13:10:06 -0700
> > > John Baldwin  wrote:
> > >   
> > > > On Saturday, May 28, 2016 02:11:41 PM Gary Jennejohn wrote:  
> > > > > On Fri, 27 May 2016 09:50:05 +0200
> > > > > Gary Jennejohn  wrote:
> > > > > 
> > > > > > On Thu, 26 May 2016 16:54:35 -0700
> > > > > > John Baldwin  wrote:
> > > > > > 
> > > > > > > On Tuesday, May 17, 2016 06:47:41 PM Gary Jennejohn wrote:  
> > > > > > > > On Mon, 16 May 2016 10:54:19 -0700
> > > > > > > > John Baldwin  wrote:
> > > > > > > > 
> > > > > > > > > On Monday, May 16, 2016 12:22:42 PM Gary Jennejohn wrote: 
> > > > > > > > >
> > > > > > > > > > I tried out EARLY_AP_STARTUP, but the kernel hangs and I 
> > > > > > > > > > can't
> > > > > > > > > > break into DDB.
> > > > > > > > > > 
> > > > > > > > > > I did a verbose boot and the last lines I see are related 
> > > > > > > > > > to routing
> > > > > > > > > > MSI-X to various local APIC vectors.  I copied the last few 
> > > > > > > > > > lines and
> > > > > > > > > > they look like this:
> > > > > > > > > > 
> > > > > > > > > > msi: routing MSI-X IRQ 256 to local APIC 2 vector 48
> > > > > > > > > > msi: routing MSI-X IRQ 257 to local APIC 3 vector 48
> > > > > > > > > > msi: routing MSI-X IRQ 258 to local APIC 4 vector 48
> > > > > > > > > > msi: routing MSI-X IRQ 256 to local APIC 0 vector 49
> > > > > > > >  ^^^ Assigning
> > > > > > > > > > 
> > > > > > > > > > I tried disabling msi and msix in /boot/loader.conf, but 
> > > > > > > > > > the settings
> > > > > > > > > > were ignored (probably too early).
> > > > > > > > > 
> > > > > > > > > No, those settings are not too early.  However, the routing 
> > > > > > > > > to different
> > > > > > > > > CPUs now happens earlier than it used to.  What is the line 
> > > > > > > > > before the
> > > > > > > > > MSI lines?  You can take a picture with your phone/camera if 
> > > > > > > > > that's simplest.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > Here are a few lines before the MSI routing happens:
> > > > > > > > 
> > > > > > > > hpet0:  iomem 0xfed0-0xfed003ff 
> > > > > > > > irq 0,8 on acpi0
> > > > > > > > hpet0: vendor 0x4353, rev 0x1, 14318180 Hz, 3 timers, legacy 
> > > > > > > > route
> > > > > > > > hpet0: t0 : irqs 0x00c0ff (0), MSI, periodic
> > > > > > > > hpet0: t1 : irqs 0x00c0ff (0), MSI, periodic
> > > > > > > > hpet0: t2 : irqs 0x00c0ff (0), MSI, periodic
> > > > > > > > Timecounter "HPET" frequency 14318180 Hz quality 950
> > > > > > > 
> > > > > > > The assigning message means it is in the loop using
> > > > > > > bus_bind_intr() to setup per-CPU timers.  Can you please try
> > > > > > > setting 'hint.hpet.0.per_cpu=0' at the loader prompt to see if
> > > > > > > disabling the use of per-CPU timers allows you to boot?
> > > > > > >   
> > > > > > 
> > > > > > Something has changed since the last time I generated a kernel with
> > > > > > this option.
> > > > > > 
> > > > > > Now I get a NULL-pointer dereference in the kernel, doesn't matter
> > > > > > whether I set the hint or not.
> > > > > > 
> > > > > 
> > > > > OK, now that the startup has been fixed, I tried setting the hint at
> > > > > the loader prompt, but the kernel hangs in exactly the same place as
> > > > > before.  I actually booted twice to make certain I hadn't made a
> > > > > typo when setting the hint.
> > > > 
> > > > Humm, it shouldn't be calling bus_bind_intr() if the hint is set.  
> > > > Actually,
> > > > I guess it just binds them all to first CPU if per-CPU timers aren't 
> > > > set.
> > > > Can you add debug printfs to hpet_attach() in 
> > > > sys/dev/acpica/acpi_hpet.c to
> > > > narrow down which line in that function it hangs after?
> > > > 
> > > > Another option to try is to add the following to your kernel config:
> > > > 
> > > > options KTR
> > > > options KTR_COMPILE=KTR_PROC
> > > > options KTR_MASK=KTR_PROC
> > > > options KTR_VERBOSE=1
> > > > 
> > > > this will spew a lot of crap to the screen, but if it stops spewing 
> > > > when it
> > > > hangs then it might tell us where the system is hung.  If you have
> > > > any way
> > > > to configure a serial console then this would also be useful even if it 
> > > > spews
> > > > constantly when it is hung (assuming you could log the output of the 
> > > > serial
> > > > console).
> > > >   
> > > 
> > > I used the KTR options.
> > > 
> > > After the Timecounter "HPET" frequency 14318180 Hz quality 950 I see
> > > 
> > > cpu0 mi_switch: old thread 1 (swapper)
> > > cpu0 mi_switch: new thread 10022 (if_config_tqg_0)
> > > cpu0 

Re: EARLY_AP_STARTUP hangs during boot

2016-07-27 Thread Gary Jennejohn
On Wed, 27 Jul 2016 14:43:36 -0700
John Baldwin  wrote:

> On Tuesday, June 07, 2016 12:06:54 PM Gary Jennejohn wrote:
> > On Tue, 31 May 2016 13:10:06 -0700
> > John Baldwin  wrote:
> >   
> > > On Saturday, May 28, 2016 02:11:41 PM Gary Jennejohn wrote:  
> > > > On Fri, 27 May 2016 09:50:05 +0200
> > > > Gary Jennejohn  wrote:
> > > > 
> > > > > On Thu, 26 May 2016 16:54:35 -0700
> > > > > John Baldwin  wrote:
> > > > > 
> > > > > > On Tuesday, May 17, 2016 06:47:41 PM Gary Jennejohn wrote:  
> > > > > > > On Mon, 16 May 2016 10:54:19 -0700
> > > > > > > John Baldwin  wrote:
> > > > > > > 
> > > > > > > > On Monday, May 16, 2016 12:22:42 PM Gary Jennejohn wrote:   
> > > > > > > >  
> > > > > > > > > I tried out EARLY_AP_STARTUP, but the kernel hangs and I can't
> > > > > > > > > break into DDB.
> > > > > > > > > 
> > > > > > > > > I did a verbose boot and the last lines I see are related to 
> > > > > > > > > routing
> > > > > > > > > MSI-X to various local APIC vectors.  I copied the last few 
> > > > > > > > > lines and
> > > > > > > > > they look like this:
> > > > > > > > > 
> > > > > > > > > msi: routing MSI-X IRQ 256 to local APIC 2 vector 48
> > > > > > > > > msi: routing MSI-X IRQ 257 to local APIC 3 vector 48
> > > > > > > > > msi: routing MSI-X IRQ 258 to local APIC 4 vector 48
> > > > > > > > > msi: routing MSI-X IRQ 256 to local APIC 0 vector 49
> > > > > > >  ^^^ Assigning
> > > > > > > > > 
> > > > > > > > > I tried disabling msi and msix in /boot/loader.conf, but the 
> > > > > > > > > settings
> > > > > > > > > were ignored (probably too early).
> > > > > > > > 
> > > > > > > > No, those settings are not too early.  However, the routing to 
> > > > > > > > different
> > > > > > > > CPUs now happens earlier than it used to.  What is the line 
> > > > > > > > before the
> > > > > > > > MSI lines?  You can take a picture with your phone/camera if 
> > > > > > > > that's simplest.
> > > > > > > > 
> > > > > > > 
> > > > > > > Here are a few lines before the MSI routing happens:
> > > > > > > 
> > > > > > > hpet0:  iomem 0xfed0-0xfed003ff 
> > > > > > > irq 0,8 on acpi0
> > > > > > > hpet0: vendor 0x4353, rev 0x1, 14318180 Hz, 3 timers, legacy route
> > > > > > > hpet0: t0 : irqs 0x00c0ff (0), MSI, periodic
> > > > > > > hpet0: t1 : irqs 0x00c0ff (0), MSI, periodic
> > > > > > > hpet0: t2 : irqs 0x00c0ff (0), MSI, periodic
> > > > > > > Timecounter "HPET" frequency 14318180 Hz quality 950
> > > > > > 
> > > > > > The assigning message means it is in the loop using
> > > > > > bus_bind_intr() to setup per-CPU timers.  Can you please try
> > > > > > setting 'hint.hpet.0.per_cpu=0' at the loader prompt to see if
> > > > > > disabling the use of per-CPU timers allows you to boot?
> > > > > >   
> > > > > 
> > > > > Something has changed since the last time I generated a kernel with
> > > > > this option.
> > > > > 
> > > > > Now I get a NULL-pointer dereference in the kernel, doesn't matter
> > > > > whether I set the hint or not.
> > > > > 
> > > > 
> > > > OK, now that the startup has been fixed, I tried setting the hint at
> > > > the loader prompt, but the kernel hangs in exactly the same place as
> > > > before.  I actually booted twice to make certain I hadn't made a
> > > > typo when setting the hint.
> > > 
> > > Humm, it shouldn't be calling bus_bind_intr() if the hint is set.  
> > > Actually,
> > > I guess it just binds them all to first CPU if per-CPU timers aren't set.
> > > Can you add debug printfs to hpet_attach() in sys/dev/acpica/acpi_hpet.c 
> > > to
> > > narrow down which line in that function it hangs after?
> > > 
> > > Another option to try is to add the following to your kernel config:
> > > 
> > > options   KTR
> > > options   KTR_COMPILE=KTR_PROC
> > > options   KTR_MASK=KTR_PROC
> > > options   KTR_VERBOSE=1
> > > 
> > > this will spew a lot of crap to the screen, but if it stops spewing when 
> > > it
> > > hangs then it might tell us where the system is hung.  If you have any
> > > way
> > > to configure a serial console then this would also be useful even if it 
> > > spews
> > > constantly when it is hung (assuming you could log the output of the 
> > > serial
> > > console).
> > >   
> > 
> > I used the KTR options.
> > 
> > After the Timecounter "HPET" frequency 14318180 Hz quality 950 I see
> > 
> > cpu0 mi_switch: old thread 1 (swapper)
> > cpu0 mi_switch: new thread 10022 (if_config_tqg_0)
> > cpu0 sleep_broadcast(0x80002f9a600, 0)
> > cpu0 msleep_spin: old thread 100022
> > cpu0 mi_switch: old thread 10022
> > cpu0 mi_switch: new thread 10016 (if_io_tqg_0)
> > cpu0 sleep_broadcast(0x80002f9a780, 0)
> > cpu0 msleep_spin: old thread 10016
> > cpu0 mi_switch: old thread 10016
> > cpu0 fork_exit: new thread 0x80004239510 (td_sched 0x842399d8, pid
> > 10, idle: 

Re: EARLY_AP_STARTUP hangs during boot

2016-07-27 Thread John Baldwin
On Tuesday, June 07, 2016 12:06:54 PM Gary Jennejohn wrote:
> On Tue, 31 May 2016 13:10:06 -0700
> John Baldwin  wrote:
> 
> > On Saturday, May 28, 2016 02:11:41 PM Gary Jennejohn wrote:
> > > On Fri, 27 May 2016 09:50:05 +0200
> > > Gary Jennejohn  wrote:
> > >   
> > > > On Thu, 26 May 2016 16:54:35 -0700
> > > > John Baldwin  wrote:
> > > >   
> > > > > On Tuesday, May 17, 2016 06:47:41 PM Gary Jennejohn wrote:
> > > > > > On Mon, 16 May 2016 10:54:19 -0700
> > > > > > John Baldwin  wrote:
> > > > > >   
> > > > > > > On Monday, May 16, 2016 12:22:42 PM Gary Jennejohn wrote:  
> > > > > > > > I tried out EARLY_AP_STARTUP, but the kernel hangs and I can't
> > > > > > > > break into DDB.
> > > > > > > > 
> > > > > > > > I did a verbose boot and the last lines I see are related to 
> > > > > > > > routing
> > > > > > > > MSI-X to various local APIC vectors.  I copied the last few 
> > > > > > > > lines and
> > > > > > > > they look like this:
> > > > > > > > 
> > > > > > > > msi: routing MSI-X IRQ 256 to local APIC 2 vector 48
> > > > > > > > msi: routing MSI-X IRQ 257 to local APIC 3 vector 48
> > > > > > > > msi: routing MSI-X IRQ 258 to local APIC 4 vector 48
> > > > > > > > msi: routing MSI-X IRQ 256 to local APIC 0 vector 49  
> > > > > >  ^^^ Assigning  
> > > > > > > > 
> > > > > > > > I tried disabling msi and msix in /boot/loader.conf, but the 
> > > > > > > > settings
> > > > > > > > were ignored (probably too early).
> > > > > > > 
> > > > > > > No, those settings are not too early.  However, the routing to 
> > > > > > > different
> > > > > > > CPUs now happens earlier than it used to.  What is the line 
> > > > > > > before the
> > > > > > > MSI lines?  You can take a picture with your phone/camera if 
> > > > > > > that's simplest.
> > > > > > >   
> > > > > > 
> > > > > > Here are a few lines before the MSI routing happens:
> > > > > > 
> > > > > > hpet0:  iomem 0xfed0-0xfed003ff irq 
> > > > > > 0,8 on acpi0
> > > > > > hpet0: vendor 0x4353, rev 0x1, 14318180 Hz, 3 timers, legacy route
> > > > > > hpet0: t0 : irqs 0x00c0ff (0), MSI, periodic
> > > > > > hpet0: t1 : irqs 0x00c0ff (0), MSI, periodic
> > > > > > hpet0: t2 : irqs 0x00c0ff (0), MSI, periodic
> > > > > > Timecounter "HPET" frequency 14318180 Hz quality 950  
> > > > > 
> > > > > The assigning message means it is in the loop using
> > > > > bus_bind_intr() to setup per-CPU timers.  Can you please try
> > > > > setting 'hint.hpet.0.per_cpu=0' at the loader prompt to see if
> > > > > disabling the use of per-CPU timers allows you to boot?
> > > > > 
> > > > 
> > > > Something has changed since the last time I generated a kernel with
> > > > this option.
> > > > 
> > > > Now I get a NULL-pointer dereference in the kernel, doesn't matter
> > > > whether I set the hint or not.
> > > >   
> > > 
> > > OK, now that the startup has been fixed, I tried setting the hint at
> > > the loader prompt, but the kernel hangs in exactly the same place as
> > > before.  I actually booted twice to make certain I hadn't made a
> > > typo when setting the hint.  
> > 
> > Humm, it shouldn't be calling bus_bind_intr() if the hint is set.  Actually,
> > I guess it just binds them all to first CPU if per-CPU timers aren't set.
> > Can you add debug printfs to hpet_attach() in sys/dev/acpica/acpi_hpet.c to
> > narrow down which line in that function it hangs after?
> > 
> > Another option to try is to add the following to your kernel config:
> > 
> > options KTR
> > options KTR_COMPILE=KTR_PROC
> > options KTR_MASK=KTR_PROC
> > options KTR_VERBOSE=1
> > 
> > this will spew a lot of crap to the screen, but if it stops spewing when it
> > hangs then it might tell us where the system is hung.  If you have any way
> > to configure a serial console then this would also be useful even if it spews
> > constantly when it is hung (assuming you could log the output of the serial
> > console).
> > 
> 
> I used the KTR options.
> 
> After the Timecounter "HPET" frequency 14318180 Hz quality 950 I see
> 
> cpu0 mi_switch: old thread 1 (swapper)
> cpu0 mi_switch: new thread 10022 (if_config_tqg_0)
> cpu0 sleep_broadcast(0x80002f9a600, 0)
> cpu0 msleep_spin: old thread 100022
> cpu0 mi_switch: old thread 10022
> cpu0 mi_switch: new thread 10016 (if_io_tqg_0)
> cpu0 sleep_broadcast(0x80002f9a780, 0)
> cpu0 msleep_spin: old thread 10016
> cpu0 mi_switch: old thread 10016
> cpu0 fork_exit: new thread 0x80004239510 (td_sched 0x842399d8, pid
> 10, idle: cpu0)
> 
> And that's all that came out, really not very much at all.

Ok, that seems odd.

Can you apply this patch and run with the KTR output still:

Index: sched_ule.c
===
--- sched_ule.c (revision 303397)
+++ sched_ule.c (working copy)
@@ -1904,6 +1904,13 @@ sched_switch(struct thread 

Re: EARLY_AP_STARTUP hangs during boot

2016-06-07 Thread Gary Jennejohn
On Tue, 31 May 2016 13:10:06 -0700
John Baldwin  wrote:

> On Saturday, May 28, 2016 02:11:41 PM Gary Jennejohn wrote:
> > On Fri, 27 May 2016 09:50:05 +0200
> > Gary Jennejohn  wrote:
> >   
> > > On Thu, 26 May 2016 16:54:35 -0700
> > > John Baldwin  wrote:
> > >   
> > > > On Tuesday, May 17, 2016 06:47:41 PM Gary Jennejohn wrote:
> > > > > On Mon, 16 May 2016 10:54:19 -0700
> > > > > John Baldwin  wrote:
> > > > >   
> > > > > > On Monday, May 16, 2016 12:22:42 PM Gary Jennejohn wrote:  
> > > > > > > I tried out EARLY_AP_STARTUP, but the kernel hangs and I can't
> > > > > > > break into DDB.
> > > > > > > 
> > > > > > > I did a verbose boot and the last lines I see are related to 
> > > > > > > routing
> > > > > > > MSI-X to various local APIC vectors.  I copied the last few lines 
> > > > > > > and
> > > > > > > they look like this:
> > > > > > > 
> > > > > > > msi: routing MSI-X IRQ 256 to local APIC 2 vector 48
> > > > > > > msi: routing MSI-X IRQ 257 to local APIC 3 vector 48
> > > > > > > msi: routing MSI-X IRQ 258 to local APIC 4 vector 48
> > > > > > > msi: routing MSI-X IRQ 256 to local APIC 0 vector 49  
> > > > >  ^^^ Assigning  
> > > > > > > 
> > > > > > > I tried disabling msi and msix in /boot/loader.conf, but the 
> > > > > > > settings
> > > > > > > were ignored (probably too early).
> > > > > > 
> > > > > > No, those settings are not too early.  However, the routing to 
> > > > > > different
> > > > > > CPUs now happens earlier than it used to.  What is the line before 
> > > > > > the
> > > > > > MSI lines?  You can take a picture with your phone/camera if that's 
> > > > > > simplest.
> > > > > >   
> > > > > 
> > > > > Here are a few lines before the MSI routing happens:
> > > > > 
> > > > > hpet0:  iomem 0xfed0-0xfed003ff irq 
> > > > > 0,8 on acpi0
> > > > > hpet0: vendor 0x4353, rev 0x1, 14318180 Hz, 3 timers, legacy route
> > > > > hpet0: t0 : irqs 0x00c0ff (0), MSI, periodic
> > > > > hpet0: t1 : irqs 0x00c0ff (0), MSI, periodic
> > > > > hpet0: t2 : irqs 0x00c0ff (0), MSI, periodic
> > > > > Timecounter "HPET" frequency 14318180 Hz quality 950  
> > > > 
> > > > The assigning message means it is in the loop using
> > > > bus_bind_intr() to set up per-CPU timers.  Can you please try
> > > > setting 'hint.hpet.0.per_cpu=0' at the loader prompt to see if
> > > > disabling the use of per-CPU timers allows you to boot?
> > > > 
> > > 
> > > Something has changed since the last time I generated a kernel with
> > > this option.
> > > 
> > > Now I get a NULL-pointer dereference in the kernel, doesn't matter
> > > whether I set the hint or not.
> > >   
> > 
> > OK, now that the startup has been fixed, I tried setting the hint at
> > the loader prompt, but the kernel hangs in exactly the same place as
> > before.  I actually booted twice to make certain I hadn't made a
> > typo when setting the hint.  
> 
> Humm, it shouldn't be calling bus_bind_intr() if the hint is set.  Actually,
> I guess it just binds them all to first CPU if per-CPU timers aren't set.
> Can you add debug printfs to hpet_attach() in sys/dev/acpica/acpi_hpet.c to
> narrow down which line in that function it hangs after?
> 
> Another option to try is to add the following to your kernel config:
> 
> options   KTR
> options   KTR_COMPILE=KTR_PROC
> options   KTR_MASK=KTR_PROC
> options   KTR_VERBOSE=1
> 
> this will spew a lot of crap to the screen, but if it stops spewing when it
> hangs then it might tell us where the system is hung.  If you have any way
> to configure a serial console then this would also be useful even if it spews
> constantly when it is hung (assuming you could log the output of the serial
> console).
> 

I used the KTR options.

After the Timecounter "HPET" frequency 14318180 Hz quality 950 I see

cpu0 mi_switch: old thread 1 (swapper)
cpu0 mi_switch: new thread 10022 (if_config_tqg_0)
cpu0 sleep_broadcast(0x80002f9a600, 0)
cpu0 msleep_spin: old thread 100022
cpu0 mi_switch: old thread 10022
cpu0 mi_switch: new thread 10016 (if_io_tqg_0)
cpu0 sleep_broadcast(0x80002f9a780, 0)
cpu0 msleep_spin: old thread 10016
cpu0 mi_switch: old thread 10016
cpu0 fork_exit: new thread 0x80004239510 (td_sched 0x842399d8, pid
10, idle: cpu0)

And that's all that came out, really not very much at all.

-- 
Gary Jennejohn
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: EARLY_AP_STARTUP hangs during boot

2016-05-31 Thread John Baldwin
On Saturday, May 28, 2016 02:11:41 PM Gary Jennejohn wrote:
> On Fri, 27 May 2016 09:50:05 +0200
> Gary Jennejohn  wrote:
> 
> > On Thu, 26 May 2016 16:54:35 -0700
> > John Baldwin  wrote:
> > 
> > > On Tuesday, May 17, 2016 06:47:41 PM Gary Jennejohn wrote:  
> > > > On Mon, 16 May 2016 10:54:19 -0700
> > > > John Baldwin  wrote:
> > > > 
> > > > > On Monday, May 16, 2016 12:22:42 PM Gary Jennejohn wrote:
> > > > > > I tried out EARLY_AP_STARTUP, but the kernel hangs and I can't
> > > > > > break into DDB.
> > > > > > 
> > > > > > I did a verbose boot and the last lines I see are related to routing
> > > > > > MSI-X to various local APIC vectors.  I copied the last few lines 
> > > > > > and
> > > > > > they look like this:
> > > > > > 
> > > > > > msi: routing MSI-X IRQ 256 to local APIC 2 vector 48
> > > > > > msi: routing MSI-X IRQ 257 to local APIC 3 vector 48
> > > > > > msi: routing MSI-X IRQ 258 to local APIC 4 vector 48
> > > > > > msi: routing MSI-X IRQ 256 to local APIC 0 vector 49
> > > >  ^^^ Assigning
> > > > > > 
> > > > > > I tried disabling msi and msix in /boot/loader.conf, but the 
> > > > > > settings
> > > > > > were ignored (probably too early).
> > > > > 
> > > > > No, those settings are not too early.  However, the routing to 
> > > > > different
> > > > > CPUs now happens earlier than it used to.  What is the line before the
> > > > > MSI lines?  You can take a picture with your phone/camera if that's 
> > > > > simplest.
> > > > > 
> > > > 
> > > > Here are a few lines before the MSI routing happens:
> > > > 
> > > > hpet0:  iomem 0xfed0-0xfed003ff irq 0,8 
> > > > on acpi0
> > > > hpet0: vendor 0x4353, rev 0x1, 14318180 Hz, 3 timers, legacy route
> > > > hpet0: t0 : irqs 0x00c0ff (0), MSI, periodic
> > > > hpet0: t1 : irqs 0x00c0ff (0), MSI, periodic
> > > > hpet0: t2 : irqs 0x00c0ff (0), MSI, periodic
> > > > Timecounter "HPET" frequency 14318180 Hz quality 950
> > > 
> > > The assigning message means it is in the loop using
> > > bus_bind_intr() to set up per-CPU timers.  Can you please try
> > > setting 'hint.hpet.0.per_cpu=0' at the loader prompt to see if
> > > disabling the use of per-CPU timers allows you to boot?
> > >   
> > 
> > Something has changed since the last time I generated a kernel with
> > this option.
> > 
> > Now I get a NULL-pointer dereference in the kernel, doesn't matter
> > whether I set the hint or not.
> > 
> 
> OK, now that the startup has been fixed, I tried setting the hint at
> the loader prompt, but the kernel hangs in exactly the same place as
> before.  I actually booted twice to make certain I hadn't made a
> typo when setting the hint.

Humm, it shouldn't be calling bus_bind_intr() if the hint is set.  Actually,
I guess it just binds them all to first CPU if per-CPU timers aren't set.
Can you add debug printfs to hpet_attach() in sys/dev/acpica/acpi_hpet.c to
narrow down which line in that function it hangs after?

Another option to try is to add the following to your kernel config:

options KTR
options KTR_COMPILE=KTR_PROC
options KTR_MASK=KTR_PROC
options KTR_VERBOSE=1

this will spew a lot of crap to the screen, but if it stops spewing when it
hangs then it might tell us where the system is hung.  If you have any way
to configure a serial console then this would also be useful even if it spews
constantly when it is hung (assuming you could log the output of the serial
console).
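
A minimal sketch of the debug-printf approach, written as a userland C
stand-in rather than actual driver code.  The names TRACE_HERE() and
fake_attach() are hypothetical; in the kernel the markers would go into
hpet_attach() and use printf(9) or device_printf(9) instead of fprintf():

```c
#include <assert.h>
#include <stdio.h>

/*
 * Userland stand-in for a kernel debug marker (assumption: in the driver
 * this would call printf() or device_printf() rather than fprintf()).
 */
#define TRACE_HERE() \
	fprintf(stderr, "%s: reached line %d\n", __func__, __LINE__)

/* Hypothetical stand-in for a long attach routine such as hpet_attach(). */
int
fake_attach(int ntimers)
{
	int i, bound = 0;

	assert(ntimers >= 0);
	TRACE_HERE();			/* entered the function */
	for (i = 0; i < ntimers; i++) {
		TRACE_HERE();		/* about to handle timer i */
		bound++;		/* placeholder for the real work */
	}
	TRACE_HERE();			/* loop finished */
	return (bound);
}
```

Calling fake_attach(3) from a main() prints one marker per step to stderr;
the last marker printed before a hang brackets the line where it stopped.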

-- 
John Baldwin


Re: EARLY_AP_STARTUP hangs during boot

2016-05-28 Thread Gary Jennejohn
On Fri, 27 May 2016 09:50:05 +0200
Gary Jennejohn  wrote:

> On Thu, 26 May 2016 16:54:35 -0700
> John Baldwin  wrote:
> 
> > On Tuesday, May 17, 2016 06:47:41 PM Gary Jennejohn wrote:  
> > > On Mon, 16 May 2016 10:54:19 -0700
> > > John Baldwin  wrote:
> > > 
> > > > On Monday, May 16, 2016 12:22:42 PM Gary Jennejohn wrote:
> > > > > I tried out EARLY_AP_STARTUP, but the kernel hangs and I can't
> > > > > break into DDB.
> > > > > 
> > > > > I did a verbose boot and the last lines I see are related to routing
> > > > > MSI-X to various local APIC vectors.  I copied the last few lines and
> > > > > they look like this:
> > > > > 
> > > > > msi: routing MSI-X IRQ 256 to local APIC 2 vector 48
> > > > > msi: routing MSI-X IRQ 257 to local APIC 3 vector 48
> > > > > msi: routing MSI-X IRQ 258 to local APIC 4 vector 48
> > > > > msi: routing MSI-X IRQ 256 to local APIC 0 vector 49
> > >  ^^^ Assigning
> > > > > 
> > > > > I tried disabling msi and msix in /boot/loader.conf, but the settings
> > > > > were ignored (probably too early).
> > > > 
> > > > No, those settings are not too early.  However, the routing to different
> > > > CPUs now happens earlier than it used to.  What is the line before the
> > > > MSI lines?  You can take a picture with your phone/camera if that's 
> > > > simplest.
> > > > 
> > > 
> > > Here are a few lines before the MSI routing happens:
> > > 
> > > hpet0:  iomem 0xfed0-0xfed003ff irq 0,8 
> > > on acpi0
> > > hpet0: vendor 0x4353, rev 0x1, 14318180 Hz, 3 timers, legacy route
> > > hpet0: t0 : irqs 0x00c0ff (0), MSI, periodic
> > > hpet0: t1 : irqs 0x00c0ff (0), MSI, periodic
> > > hpet0: t2 : irqs 0x00c0ff (0), MSI, periodic
> > > Timecounter "HPET" frequency 14318180 Hz quality 950
> > 
> > The assigning message means it is in the loop using
> > bus_bind_intr() to set up per-CPU timers.  Can you please try
> > setting 'hint.hpet.0.per_cpu=0' at the loader prompt to see if
> > disabling the use of per-CPU timers allows you to boot?
> >   
> 
> Something has changed since the last time I generated a kernel with
> this option.
> 
> Now I get a NULL-pointer dereference in the kernel, doesn't matter
> whether I set the hint or not.
> 

OK, now that the startup has been fixed, I tried setting the hint at
the loader prompt, but the kernel hangs in exactly the same place as
before.  I actually booted twice to make certain I hadn't made a
typo when setting the hint.

-- 
Gary Jennejohn


Re: EARLY_AP_STARTUP hangs during boot

2016-05-27 Thread John Baldwin
On Friday, May 27, 2016 09:50:05 AM Gary Jennejohn wrote:
> On Thu, 26 May 2016 16:54:35 -0700
> John Baldwin  wrote:
> 
> > On Tuesday, May 17, 2016 06:47:41 PM Gary Jennejohn wrote:
> > > On Mon, 16 May 2016 10:54:19 -0700
> > > John Baldwin  wrote:
> > >   
> > > > On Monday, May 16, 2016 12:22:42 PM Gary Jennejohn wrote:  
> > > > > I tried out EARLY_AP_STARTUP, but the kernel hangs and I can't
> > > > > break into DDB.
> > > > > 
> > > > > I did a verbose boot and the last lines I see are related to routing
> > > > > MSI-X to various local APIC vectors.  I copied the last few lines and
> > > > > they look like this:
> > > > > 
> > > > > msi: routing MSI-X IRQ 256 to local APIC 2 vector 48
> > > > > msi: routing MSI-X IRQ 257 to local APIC 3 vector 48
> > > > > msi: routing MSI-X IRQ 258 to local APIC 4 vector 48
> > > > > msi: routing MSI-X IRQ 256 to local APIC 0 vector 49  
> > >  ^^^ Assigning  
> > > > > 
> > > > > I tried disabling msi and msix in /boot/loader.conf, but the settings
> > > > > were ignored (probably too early).
> > > > 
> > > > No, those settings are not too early.  However, the routing to different
> > > > CPUs now happens earlier than it used to.  What is the line before the
> > > > MSI lines?  You can take a picture with your phone/camera if that's 
> > > > simplest.
> > > >   
> > > 
> > > Here are a few lines before the MSI routing happens:
> > > 
> > > hpet0:  iomem 0xfed0-0xfed003ff irq 0,8 
> > > on acpi0
> > > hpet0: vendor 0x4353, rev 0x1, 14318180 Hz, 3 timers, legacy route
> > > hpet0: t0 : irqs 0x00c0ff (0), MSI, periodic
> > > hpet0: t1 : irqs 0x00c0ff (0), MSI, periodic
> > > hpet0: t2 : irqs 0x00c0ff (0), MSI, periodic
> > > Timecounter "HPET" frequency 14318180 Hz quality 950  
> > 
> > The assigning message means it is in the loop using
> > bus_bind_intr() to set up per-CPU timers.  Can you please try
> > setting 'hint.hpet.0.per_cpu=0' at the loader prompt to see if
> > disabling the use of per-CPU timers allows you to boot?
> > 
> 
> Something has changed since the last time I generated a kernel with
> this option.
> 
> Now I get a NULL-pointer dereference in the kernel, doesn't matter
> whether I set the hint or not.
> 
> No crash dump is created.
> 
> Here is some trace copied from the console:
> 
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address = 0x1818
> fault code = supervisor write data, page not present
> instruction pointer = 0x20:0x805492ef
> [some stack trace]
> taskgroup_adjust() at taskgroup_adjust+0x2f; frame 0x8196c90
> mi_startup() at mi_startup+0x118; frame 0x8196fcb0

Yeah, I have the same on my laptop here.  I'll fix that and get back to you.

-- 
John Baldwin


Re: EARLY_AP_STARTUP hangs during boot

2016-05-27 Thread Gary Jennejohn
On Thu, 26 May 2016 16:54:35 -0700
John Baldwin  wrote:

> On Tuesday, May 17, 2016 06:47:41 PM Gary Jennejohn wrote:
> > On Mon, 16 May 2016 10:54:19 -0700
> > John Baldwin  wrote:
> >   
> > > On Monday, May 16, 2016 12:22:42 PM Gary Jennejohn wrote:  
> > > > I tried out EARLY_AP_STARTUP, but the kernel hangs and I can't
> > > > break into DDB.
> > > > 
> > > > I did a verbose boot and the last lines I see are related to routing
> > > > MSI-X to various local APIC vectors.  I copied the last few lines and
> > > > they look like this:
> > > > 
> > > > msi: routing MSI-X IRQ 256 to local APIC 2 vector 48
> > > > msi: routing MSI-X IRQ 257 to local APIC 3 vector 48
> > > > msi: routing MSI-X IRQ 258 to local APIC 4 vector 48
> > > > msi: routing MSI-X IRQ 256 to local APIC 0 vector 49  
> >  ^^^ Assigning  
> > > > 
> > > > I tried disabling msi and msix in /boot/loader.conf, but the settings
> > > > were ignored (probably too early).
> > > 
> > > No, those settings are not too early.  However, the routing to different
> > > CPUs now happens earlier than it used to.  What is the line before the
> > > MSI lines?  You can take a picture with your phone/camera if that's 
> > > simplest.
> > >   
> > 
> > Here are a few lines before the MSI routing happens:
> > 
> > hpet0:  iomem 0xfed0-0xfed003ff irq 0,8 on 
> > acpi0
> > hpet0: vendor 0x4353, rev 0x1, 14318180 Hz, 3 timers, legacy route
> > hpet0: t0 : irqs 0x00c0ff (0), MSI, periodic
> > hpet0: t1 : irqs 0x00c0ff (0), MSI, periodic
> > hpet0: t2 : irqs 0x00c0ff (0), MSI, periodic
> > Timecounter "HPET" frequency 14318180 Hz quality 950  
> 
> The assigning message means it is in the loop using
> bus_bind_intr() to set up per-CPU timers.  Can you please try
> setting 'hint.hpet.0.per_cpu=0' at the loader prompt to see if
> disabling the use of per-CPU timers allows you to boot?
> 

Something has changed since the last time I generated a kernel with
this option.

Now I get a NULL-pointer dereference in the kernel, doesn't matter
whether I set the hint or not.

No crash dump is created.

Here is some trace copied from the console:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x1818
fault code = supervisor write data, page not present
instruction pointer = 0x20:0x805492ef
[some stack trace]
taskgroup_adjust() at taskgroup_adjust+0x2f; frame 0x8196c90
mi_startup() at mi_startup+0x118; frame 0x8196fcb0
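
As an illustration of why a small fault address such as 0x1818 reads as a
NULL-pointer dereference: a store to a structure member through a NULL
base pointer faults at the member's offset.  The sketch below uses a
hypothetical layout (struct example is not the real kernel structure; the
padding is chosen only to reproduce the offset):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout, NOT the real kernel structure: the padding is
 * sized so that 'counter' lands at offset 0x1818.
 */
struct example {
	char	pad[0x1818];
	int	counter;
};

static_assert(offsetof(struct example, counter) == 0x1818,
    "layout assumption for this sketch");

/*
 * Address that a store to p->counter would touch for a given base,
 * computed arithmetically so that nothing is actually dereferenced.
 */
uintptr_t
store_address(uintptr_t base)
{
	return (base + offsetof(struct example, counter));
}
```

With a base of 0, store_address() yields 0x1818, the same shape as the
"fault virtual address" line above; which structure and member are really
involved would have to come from the code around taskgroup_adjust+0x2f.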

-- 
Gary Jennejohn


Re: EARLY_AP_STARTUP hangs during boot

2016-05-26 Thread John Baldwin
On Tuesday, May 17, 2016 06:47:41 PM Gary Jennejohn wrote:
> On Mon, 16 May 2016 10:54:19 -0700
> John Baldwin  wrote:
> 
> > On Monday, May 16, 2016 12:22:42 PM Gary Jennejohn wrote:
> > > I tried out EARLY_AP_STARTUP, but the kernel hangs and I can't
> > > break into DDB.
> > > 
> > > I did a verbose boot and the last lines I see are related to routing
> > > MSI-X to various local APIC vectors.  I copied the last few lines and
> > > they look like this:
> > > 
> > > msi: routing MSI-X IRQ 256 to local APIC 2 vector 48
> > > msi: routing MSI-X IRQ 257 to local APIC 3 vector 48
> > > msi: routing MSI-X IRQ 258 to local APIC 4 vector 48
> > > msi: routing MSI-X IRQ 256 to local APIC 0 vector 49
>  ^^^ Assigning
> > > 
> > > I tried disabling msi and msix in /boot/loader.conf, but the settings
> > > were ignored (probably too early).
> > 
> > No, those settings are not too early.  However, the routing to different
> > CPUs now happens earlier than it used to.  What is the line before the
> > MSI lines?  You can take a picture with your phone/camera if that's 
> > simplest.
> > 
> 
> Here are a few lines before the MSI routing happens:
> 
> hpet0:  iomem 0xfed0-0xfed003ff irq 0,8 on 
> acpi0
> hpet0: vendor 0x4353, rev 0x1, 14318180 Hz, 3 timers, legacy route
> hpet0: t0 : irqs 0x00c0ff (0), MSI, periodic
> hpet0: t1 : irqs 0x00c0ff (0), MSI, periodic
> hpet0: t2 : irqs 0x00c0ff (0), MSI, periodic
> Timecounter "HPET" frequency 14318180 Hz quality 950

The assigning message means it is in the loop using bus_bind_intr() to set up
per-CPU timers.  Can you please try setting 'hint.hpet.0.per_cpu=0' at the
loader prompt to see if disabling the use of per-CPU timers allows you to
boot?

-- 
John Baldwin


Re: EARLY_AP_STARTUP hangs during boot

2016-05-17 Thread Gary Jennejohn
On Mon, 16 May 2016 10:54:19 -0700
John Baldwin  wrote:

> On Monday, May 16, 2016 12:22:42 PM Gary Jennejohn wrote:
> > I tried out EARLY_AP_STARTUP, but the kernel hangs and I can't
> > break into DDB.
> > 
> > I did a verbose boot and the last lines I see are related to routing
> > MSI-X to various local APIC vectors.  I copied the last few lines and
> > they look like this:
> > 
> > msi: routing MSI-X IRQ 256 to local APIC 2 vector 48
> > msi: routing MSI-X IRQ 257 to local APIC 3 vector 48
> > msi: routing MSI-X IRQ 258 to local APIC 4 vector 48
> > msi: routing MSI-X IRQ 256 to local APIC 0 vector 49
 ^^^ Assigning
> > 
> > I tried disabling msi and msix in /boot/loader.conf, but the settings
> > were ignored (probably too early).
> 
> No, those settings are not too early.  However, the routing to different
> CPUs now happens earlier than it used to.  What is the line before the
> MSI lines?  You can take a picture with your phone/camera if that's simplest.
> 

Here are a few lines before the MSI routing happens:

hpet0:  iomem 0xfed0-0xfed003ff irq 0,8 on acpi0
hpet0: vendor 0x4353, rev 0x1, 14318180 Hz, 3 timers, legacy route
hpet0: t0 : irqs 0x00c0ff (0), MSI, periodic
hpet0: t1 : irqs 0x00c0ff (0), MSI, periodic
hpet0: t2 : irqs 0x00c0ff (0), MSI, periodic
Timecounter "HPET" frequency 14318180 Hz quality 950

-- 
Gary Jennejohn


Re: EARLY_AP_STARTUP hangs during boot

2016-05-16 Thread John Baldwin
On Monday, May 16, 2016 12:22:42 PM Gary Jennejohn wrote:
> I tried out EARLY_AP_STARTUP, but the kernel hangs and I can't
> break into DDB.
> 
> I did a verbose boot and the last lines I see are related to routing
> MSI-X to various local APIC vectors.  I copied the last few lines and
> they look like this:
> 
> msi: routing MSI-X IRQ 256 to local APIC 2 vector 48
> msi: routing MSI-X IRQ 257 to local APIC 3 vector 48
> msi: routing MSI-X IRQ 258 to local APIC 4 vector 48
> msi: routing MSI-X IRQ 256 to local APIC 0 vector 49
> 
> I tried disabling msi and msix in /boot/loader.conf, but the settings
> were ignored (probably too early).

No, those settings are not too early.  However, the routing to different
CPUs now happens earlier than it used to.  What is the line before the
MSI lines?  You can take a picture with your phone/camera if that's simplest.

> I'm running on an AMD Phenom(tm) II X6 1090T Processor.
> 
> So, maybe this option only really works correctly on Intel CPUs?

No, there is absolutely zero/zilch/nada about this that is specific to
Intel CPUs.  Very, very little in FreeBSD is specific to AMD vs Intel
CPUs.  It is, OTOH, quite likely that this is specific to the device driver
for some piece of hardware.

-- 
John Baldwin


Re: EARLY_AP_STARTUP hangs during boot

2016-05-16 Thread Gary Jennejohn
On Mon, 16 May 2016 13:38:35 +0300
Konstantin Belousov  wrote:

> On Mon, May 16, 2016 at 12:27:31PM +0200, Gary Jennejohn wrote:
> > On Mon, 16 May 2016 12:22:42 +0200
> > Gary Jennejohn  wrote:
> >   
> > > I tried out EARLY_AP_STARTUP, but the kernel hangs and I can't
> > > break into DDB.
> > > 
> > > I did a verbose boot and the last lines I see are related to routing
> > > MSI-X to various local APIC vectors.  I copied the last few lines and
> > > they look like this:
> > > 
> > > msi: routing MSI-X IRQ 256 to local APIC 2 vector 48
> > > msi: routing MSI-X IRQ 257 to local APIC 3 vector 48
> > > msi: routing MSI-X IRQ 258 to local APIC 4 vector 48
> > > msi: routing MSI-X IRQ 256 to local APIC 0 vector 49  
> > 
> > Oops, the last line should read
> > msi: Assigning MSI-X IRQ 256 to local APIC 0 vector 49  
> You should be able to enter ddb at this point. Methods depend on the
> console used, serial break for serial console, ctrl-alt-esc for sc/vt
> AFAIR. If you have IPMI/DRAC/ILO, send nmi.
> 
> After getting at ddb> prompt, do 'bt' then 'ps' then 'alltrace' and
> show the output.

I tried ctrl-alt-esc, nothing happens.  Maybe because USB isn't
up yet.

Judging from the way the case fan starts ramping up it seems like
the CPU is in a tight loop.

-- 
Gary Jennejohn


Re: EARLY_AP_STARTUP hangs during boot

2016-05-16 Thread Konstantin Belousov
On Mon, May 16, 2016 at 12:27:31PM +0200, Gary Jennejohn wrote:
> On Mon, 16 May 2016 12:22:42 +0200
> Gary Jennejohn  wrote:
> 
> > I tried out EARLY_AP_STARTUP, but the kernel hangs and I can't
> > break into DDB.
> > 
> > I did a verbose boot and the last lines I see are related to routing
> > MSI-X to various local APIC vectors.  I copied the last few lines and
> > they look like this:
> > 
> > msi: routing MSI-X IRQ 256 to local APIC 2 vector 48
> > msi: routing MSI-X IRQ 257 to local APIC 3 vector 48
> > msi: routing MSI-X IRQ 258 to local APIC 4 vector 48
> > msi: routing MSI-X IRQ 256 to local APIC 0 vector 49
> 
> Oops, the last line should read
> msi: Assigning MSI-X IRQ 256 to local APIC 0 vector 49
You should be able to enter ddb at this point. Methods depend on the
console used, serial break for serial console, ctrl-alt-esc for sc/vt
AFAIR. If you have IPMI/DRAC/ILO, send nmi.

After getting at ddb> prompt, do 'bt' then 'ps' then 'alltrace' and
show the output.


Re: EARLY_AP_STARTUP hangs during boot

2016-05-16 Thread Gary Jennejohn
On Mon, 16 May 2016 12:22:42 +0200
Gary Jennejohn  wrote:

> I tried out EARLY_AP_STARTUP, but the kernel hangs and I can't
> break into DDB.
> 
> I did a verbose boot and the last lines I see are related to routing
> MSI-X to various local APIC vectors.  I copied the last few lines and
> they look like this:
> 
> msi: routing MSI-X IRQ 256 to local APIC 2 vector 48
> msi: routing MSI-X IRQ 257 to local APIC 3 vector 48
> msi: routing MSI-X IRQ 258 to local APIC 4 vector 48
> msi: routing MSI-X IRQ 256 to local APIC 0 vector 49

Oops, the last line should read
msi: Assigning MSI-X IRQ 256 to local APIC 0 vector 49

> I tried disabling msi and msix in /boot/loader.conf, but the settings
> were ignored (probably too early).
> 
> I'm running on an AMD Phenom(tm) II X6 1090T Processor.
> 
> So, maybe this option only really works correctly on Intel CPUs?
> 

-- 
Gary Jennejohn