subject:"stable\/11 debugging kernel unable to produce crashdump again"

Re: stable/11 debugging kernel unable to produce crashdump again

2017-07-24 Thread Eugene Grosbein

On 25.07.2017 00:22, Mark Johnston wrote:
> On Tue, Jul 25, 2017 at 12:03:05AM +0700, Eugene Grosbein wrote:
>> Thanks, this helped:
>>
>> $ addr2line -f -e kernel.debug 0x80919c00
>> g_raid_shutdown_post_sync
>> /home/src/sys/geom/raid/g_raid.c:2458
>>
>> That is GEOM_RAID's g_raid_shutdown_post_sync() that hangs if called just 
>> before
>> crashdump generation but works just fine during normal system shutdown.
> 
> I think graid probably needs a treatment similar to r301173/r316032.
> g_raid_shutdown_post_sync() appears to be quite similar to the
> corresponding gmirror handler. In particular, it just attempts to mark
> the individual components as clean and destroy the GEOM, which is not
> really safe after a panic.
> 
> diff --git a/sys/geom/raid/g_raid.c b/sys/geom/raid/g_raid.c
> index 7a1fd8c5ce2e..aa2529d5466a 100644
> --- a/sys/geom/raid/g_raid.c
> +++ b/sys/geom/raid/g_raid.c
> @@ -2461,6 +2461,9 @@ g_raid_shutdown_post_sync(void *arg, int howto)
>   struct g_raid_softc *sc;
>   struct g_raid_volume *vol;
>  
> + if (panicstr != NULL)
> + return;
> +
>   mp = arg;
>   g_topology_lock();
>   g_raid_shutdown = 1;
> 

I'r rather leave this to Alexander.

Funny thing is that it's not 100% hangs if I add some debugging printfs:
more printfs added, more probability that it does not hang and proceeds
to successfull crashdump generation. I use old "sc" console (not vt), if that 
matters.


___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: stable/11 debugging kernel unable to produce crashdump again

2017-07-24 Thread Warner Losh

I've often wondered why, for CAM at least, we don't automatically fall back
to the dump way when scheduling is stopped rather than have two different
interfaces and special knowledge of this in a lot of places...

Warner

On Mon, Jul 24, 2017 at 11:25 AM, Alexander Motin  wrote:

> I guess that problem of g_raid_shutdown_post_sync in case of panic can
> be explained by the fact it tries to write clean metadata in regular
> (not dumping) way while system is already in panic mode and there is no
> proper scheduling.  May be it could be just bypassed in case of dumping
> (should be trivial and probably OK), or use g_raid_subdisk_kerneldump()
> in that case instead of normal GEOM I/O.
>
> On 24.07.2017 20:03, Eugene Grosbein wrote:
> > CCing mav@ as graid expert.
> >
> > On 24.07.2017 08:44, Mark Johnston wrote:
> >
> >>> Sadly, this time 11.1-STABLE r321371 SMP hangs instead of doing
> crashdump:
> >>>
> >>> - "call doadump" from DDB prompt works just fine;
> >>> - "shutdown -r now" reboots the system without problems;
> >>> - "sysctl debug.kdb.panic=1" triggers a panic just fine but system
> hangs just afer showing uptime
> >>> instead of continuing with crashdump generation; same if "real" panic
> occurs.
> >>>
> >>> Same for debug.minidump set to 1 or 0. How do I debug this?
> >>
> >> I'm not able to reproduce the problem in bhyve using r321401. Looking
> >> at the code, the culprits might be cngrab(), or one of the
> >> shutdown_post_sync eventhandlers. Since you're apparently able to see
> >> the console output at the time of the panic, I guess it's probably the
> >> latter. Could you try your test with the patch below applied? It'll
> >> print a bunch of "entering post_sync"/"leaving post_sync" messages with
> >> addresses that can be resolved using kgdb. That'll help determine where
> >> we're getting stuck.
> >>
> >> Index: sys/sys/eventhandler.h
> >> ===
> >> --- sys/sys/eventhandler.h   (revision 321401)
> >> +++ sys/sys/eventhandler.h   (working copy)
> >> @@ -85,7 +85,11 @@
> >>  _t = (struct eventhandler_entry_ ## name *)_ep; \
> >>  CTR1(KTR_EVH, "eventhandler_invoke: executing %p",
> \
> >>  (void *)_t->eh_func);   \
> >> +if (strcmp(__STRING(name), "shutdown_post_sync")
> == 0) \
> >> +printf("entering post_sync %p\n", (void
> *)_t->eh_func); \
> >>  _t->eh_func(_ep->ee_arg , ## __VA_ARGS__);  \
> >> +if (strcmp(__STRING(name), "shutdown_post_sync")
> == 0) \
> >> +printf("leaving post_sync %p\n", (void
> *)_t->eh_func); \
> >>  EHL_LOCK((list));   \
> >>  }   \
> >>  }   \
> >>
> >
> > Thanks, this helped:
> >
> > $ addr2line -f -e kernel.debug 0x80919c00
> > g_raid_shutdown_post_sync
> > /home/src/sys/geom/raid/g_raid.c:2458
> >
> > That is GEOM_RAID's g_raid_shutdown_post_sync() that hangs if called
> just before
> > crashdump generation but works just fine during normal system shutdown.
> >
> > I should note my graid's RAID1 is running in degraded state currently
> > due to dead SSD module that does not respond. Here is part of boot log:
> >
> > ahcich5: AHCI reset: device not ready after 31000ms (tfd = 0080)
> > ahcich5: Poll timeout on slot 2 port 0
> > ahcich5: is  cs 0004 ss  rs 0004 tfd 80 serr
>  cmd c217
> > (aprobe2:ahcich5:0:0:0): NOP FLUSHQUEUE. ACB: 00 00 00 00 00 00 00 00 00
> 00 00 00
> > (aprobe2:ahcich5:0:0:0): CAM status: Command timeout
> > (aprobe2:ahcich5:0:0:0): Error 5, Retries exhausted
> > run_interrupt_driven_hooks: still waiting after 60 seconds for xpt_config
> > ahcich5: Poll timeout on slot 3 port 0
> > ahcich5: is  cs 0008 ss  rs 0008 tfd 80 serr
>  cmd c317
> > (aprobe2:ahcich5:0:0:0): SOFT_RESET. ACB: 00 00 00 00 00 00 00 00 00 00
> 00 00
> > (aprobe2:ahcich5:0:0:0): CAM status: Command timeout
> > (aprobe2:ahcich5:0:0:0): Error 5, Retries exhausted
> > [skip]
> > Trying to mount root from ufs:/dev/raid/r0s4a [rw,noatime]...
> > Root mount waiting for: GRAID-Intel
> > Root mount waiting for: GRAID-Intel
> > Root mount waiting for: GRAID-Intel
> > Root mount waiting for: GRAID-Intel
> > Root mount waiting for: GRAID-Intel
> > GEOM_RAID: Intel-c291fe96: Force array start due to timeout.
> > GEOM_RAID: Intel-c291fe96: Disk ada0 state changed from NONE to ACTIVE.
> > GEOM_RAID: Intel-c291fe96: Subdisk r0:0-ada0 state changed from NONE to
> STALE.
> > GEOM_RAID: Intel-c291fe96: Array started.
> > GEOM_RAID: Intel-c291fe96: Subdisk r0:0-ada0 state changed from STALE to
> ACTIVE.
> > GEOM_RAID: Intel-c291fe96: Volume r0 state

Re: stable/11 debugging kernel unable to produce crashdump again

2017-07-24 Thread Alexander Motin

I guess that problem of g_raid_shutdown_post_sync in case of panic can
be explained by the fact it tries to write clean metadata in regular
(not dumping) way while system is already in panic mode and there is no
proper scheduling.  May be it could be just bypassed in case of dumping
(should be trivial and probably OK), or use g_raid_subdisk_kerneldump()
in that case instead of normal GEOM I/O.

On 24.07.2017 20:03, Eugene Grosbein wrote:
> CCing mav@ as graid expert.
> 
> On 24.07.2017 08:44, Mark Johnston wrote:
> 
>>> Sadly, this time 11.1-STABLE r321371 SMP hangs instead of doing crashdump:
>>>
>>> - "call doadump" from DDB prompt works just fine;
>>> - "shutdown -r now" reboots the system without problems;
>>> - "sysctl debug.kdb.panic=1" triggers a panic just fine but system hangs 
>>> just afer showing uptime
>>> instead of continuing with crashdump generation; same if "real" panic 
>>> occurs.
>>>
>>> Same for debug.minidump set to 1 or 0. How do I debug this?
>>
>> I'm not able to reproduce the problem in bhyve using r321401. Looking
>> at the code, the culprits might be cngrab(), or one of the
>> shutdown_post_sync eventhandlers. Since you're apparently able to see
>> the console output at the time of the panic, I guess it's probably the
>> latter. Could you try your test with the patch below applied? It'll
>> print a bunch of "entering post_sync"/"leaving post_sync" messages with
>> addresses that can be resolved using kgdb. That'll help determine where
>> we're getting stuck.
>>
>> Index: sys/sys/eventhandler.h
>> ===
>> --- sys/sys/eventhandler.h   (revision 321401)
>> +++ sys/sys/eventhandler.h   (working copy)
>> @@ -85,7 +85,11 @@
>>  _t = (struct eventhandler_entry_ ## name *)_ep; \
>>  CTR1(KTR_EVH, "eventhandler_invoke: executing %p", \
>>  (void *)_t->eh_func);   \
>> +if (strcmp(__STRING(name), "shutdown_post_sync") == 0) \
>> +printf("entering post_sync %p\n", (void 
>> *)_t->eh_func); \
>>  _t->eh_func(_ep->ee_arg , ## __VA_ARGS__);  \
>> +if (strcmp(__STRING(name), "shutdown_post_sync") == 0) \
>> +printf("leaving post_sync %p\n", (void 
>> *)_t->eh_func); \
>>  EHL_LOCK((list));   \
>>  }   \
>>  }   \
>>
> 
> Thanks, this helped:
> 
> $ addr2line -f -e kernel.debug 0x80919c00
> g_raid_shutdown_post_sync
> /home/src/sys/geom/raid/g_raid.c:2458
> 
> That is GEOM_RAID's g_raid_shutdown_post_sync() that hangs if called just 
> before
> crashdump generation but works just fine during normal system shutdown.
> 
> I should note my graid's RAID1 is running in degraded state currently
> due to dead SSD module that does not respond. Here is part of boot log:
> 
> ahcich5: AHCI reset: device not ready after 31000ms (tfd = 0080)
> ahcich5: Poll timeout on slot 2 port 0
> ahcich5: is  cs 0004 ss  rs 0004 tfd 80 serr  
> cmd c217
> (aprobe2:ahcich5:0:0:0): NOP FLUSHQUEUE. ACB: 00 00 00 00 00 00 00 00 00 00 
> 00 00
> (aprobe2:ahcich5:0:0:0): CAM status: Command timeout
> (aprobe2:ahcich5:0:0:0): Error 5, Retries exhausted
> run_interrupt_driven_hooks: still waiting after 60 seconds for xpt_config
> ahcich5: Poll timeout on slot 3 port 0
> ahcich5: is  cs 0008 ss  rs 0008 tfd 80 serr  
> cmd c317
> (aprobe2:ahcich5:0:0:0): SOFT_RESET. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
> (aprobe2:ahcich5:0:0:0): CAM status: Command timeout
> (aprobe2:ahcich5:0:0:0): Error 5, Retries exhausted
> [skip]
> Trying to mount root from ufs:/dev/raid/r0s4a [rw,noatime]...
> Root mount waiting for: GRAID-Intel
> Root mount waiting for: GRAID-Intel
> Root mount waiting for: GRAID-Intel
> Root mount waiting for: GRAID-Intel
> Root mount waiting for: GRAID-Intel
> GEOM_RAID: Intel-c291fe96: Force array start due to timeout.
> GEOM_RAID: Intel-c291fe96: Disk ada0 state changed from NONE to ACTIVE.
> GEOM_RAID: Intel-c291fe96: Subdisk r0:0-ada0 state changed from NONE to STALE.
> GEOM_RAID: Intel-c291fe96: Array started.
> GEOM_RAID: Intel-c291fe96: Subdisk r0:0-ada0 state changed from STALE to 
> ACTIVE.
> GEOM_RAID: Intel-c291fe96: Volume r0 state changed from STARTING to DEGRADED.
> GEOM_RAID: Intel-c291fe96: Provider raid/r0 for volume r0 created.
> 
>  
> 

-- 
Alexander Motin
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: stable/11 debugging kernel unable to produce crashdump again

2017-07-24 Thread Mark Johnston

On Tue, Jul 25, 2017 at 12:03:05AM +0700, Eugene Grosbein wrote:
> Thanks, this helped:
> 
> $ addr2line -f -e kernel.debug 0x80919c00
> g_raid_shutdown_post_sync
> /home/src/sys/geom/raid/g_raid.c:2458
> 
> That is GEOM_RAID's g_raid_shutdown_post_sync() that hangs if called just 
> before
> crashdump generation but works just fine during normal system shutdown.

I think graid probably needs a treatment similar to r301173/r316032.
g_raid_shutdown_post_sync() appears to be quite similar to the
corresponding gmirror handler. In particular, it just attempts to mark
the individual components as clean and destroy the GEOM, which is not
really safe after a panic.

diff --git a/sys/geom/raid/g_raid.c b/sys/geom/raid/g_raid.c
index 7a1fd8c5ce2e..aa2529d5466a 100644
--- a/sys/geom/raid/g_raid.c
+++ b/sys/geom/raid/g_raid.c
@@ -2461,6 +2461,9 @@ g_raid_shutdown_post_sync(void *arg, int howto)
struct g_raid_softc *sc;
struct g_raid_volume *vol;
 
+   if (panicstr != NULL)
+   return;
+
mp = arg;
g_topology_lock();
g_raid_shutdown = 1;
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: stable/11 debugging kernel unable to produce crashdump again

2017-07-24 Thread Eugene Grosbein

CCing mav@ as graid expert.

On 24.07.2017 08:44, Mark Johnston wrote:

>> Sadly, this time 11.1-STABLE r321371 SMP hangs instead of doing crashdump:
>>
>> - "call doadump" from DDB prompt works just fine;
>> - "shutdown -r now" reboots the system without problems;
>> - "sysctl debug.kdb.panic=1" triggers a panic just fine but system hangs 
>> just afer showing uptime
>> instead of continuing with crashdump generation; same if "real" panic occurs.
>>
>> Same for debug.minidump set to 1 or 0. How do I debug this?
> 
> I'm not able to reproduce the problem in bhyve using r321401. Looking
> at the code, the culprits might be cngrab(), or one of the
> shutdown_post_sync eventhandlers. Since you're apparently able to see
> the console output at the time of the panic, I guess it's probably the
> latter. Could you try your test with the patch below applied? It'll
> print a bunch of "entering post_sync"/"leaving post_sync" messages with
> addresses that can be resolved using kgdb. That'll help determine where
> we're getting stuck.
> 
> Index: sys/sys/eventhandler.h
> ===
> --- sys/sys/eventhandler.h(revision 321401)
> +++ sys/sys/eventhandler.h(working copy)
> @@ -85,7 +85,11 @@
>   _t = (struct eventhandler_entry_ ## name *)_ep; \
>   CTR1(KTR_EVH, "eventhandler_invoke: executing %p", \
>   (void *)_t->eh_func);   \
> + if (strcmp(__STRING(name), "shutdown_post_sync") == 0) \
> + printf("entering post_sync %p\n", (void 
> *)_t->eh_func); \
>   _t->eh_func(_ep->ee_arg , ## __VA_ARGS__);  \
> + if (strcmp(__STRING(name), "shutdown_post_sync") == 0) \
> + printf("leaving post_sync %p\n", (void 
> *)_t->eh_func); \
>   EHL_LOCK((list));   \
>   }   \
>   }   \
> 

Thanks, this helped:

$ addr2line -f -e kernel.debug 0x80919c00
g_raid_shutdown_post_sync
/home/src/sys/geom/raid/g_raid.c:2458

That is GEOM_RAID's g_raid_shutdown_post_sync() that hangs if called just before
crashdump generation but works just fine during normal system shutdown.

I should note my graid's RAID1 is running in degraded state currently
due to dead SSD module that does not respond. Here is part of boot log:

ahcich5: AHCI reset: device not ready after 31000ms (tfd = 0080)
ahcich5: Poll timeout on slot 2 port 0
ahcich5: is  cs 0004 ss  rs 0004 tfd 80 serr  
cmd c217
(aprobe2:ahcich5:0:0:0): NOP FLUSHQUEUE. ACB: 00 00 00 00 00 00 00 00 00 00 00 
00
(aprobe2:ahcich5:0:0:0): CAM status: Command timeout
(aprobe2:ahcich5:0:0:0): Error 5, Retries exhausted
run_interrupt_driven_hooks: still waiting after 60 seconds for xpt_config
ahcich5: Poll timeout on slot 3 port 0
ahcich5: is  cs 0008 ss  rs 0008 tfd 80 serr  
cmd c317
(aprobe2:ahcich5:0:0:0): SOFT_RESET. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
(aprobe2:ahcich5:0:0:0): CAM status: Command timeout
(aprobe2:ahcich5:0:0:0): Error 5, Retries exhausted
[skip]
Trying to mount root from ufs:/dev/raid/r0s4a [rw,noatime]...
Root mount waiting for: GRAID-Intel
Root mount waiting for: GRAID-Intel
Root mount waiting for: GRAID-Intel
Root mount waiting for: GRAID-Intel
Root mount waiting for: GRAID-Intel
GEOM_RAID: Intel-c291fe96: Force array start due to timeout.
GEOM_RAID: Intel-c291fe96: Disk ada0 state changed from NONE to ACTIVE.
GEOM_RAID: Intel-c291fe96: Subdisk r0:0-ada0 state changed from NONE to STALE.
GEOM_RAID: Intel-c291fe96: Array started.
GEOM_RAID: Intel-c291fe96: Subdisk r0:0-ada0 state changed from STALE to ACTIVE.
GEOM_RAID: Intel-c291fe96: Volume r0 state changed from STARTING to DEGRADED.
GEOM_RAID: Intel-c291fe96: Provider raid/r0 for volume r0 created.

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: stable/11 debugging kernel unable to produce crashdump again

2017-07-24 Thread Eugene Grosbein

On 24.07.2017 08:44, Mark Johnston wrote:

>> Sadly, this time 11.1-STABLE r321371 SMP hangs instead of doing crashdump:
> 
> Is this amd64 GENERIC, or something else?

Custom kernel, amd64.

> 
>>
>> - "call doadump" from DDB prompt works just fine;
>> - "shutdown -r now" reboots the system without problems;
>> - "sysctl debug.kdb.panic=1" triggers a panic just fine but system hangs 
>> just afer showing uptime
>> instead of continuing with crashdump generation; same if "real" panic occurs.
>>
>> Same for debug.minidump set to 1 or 0. How do I debug this?
> 
> I'm not able to reproduce the problem in bhyve using r321401. Looking
> at the code, the culprits might be cngrab(), or one of the
> shutdown_post_sync eventhandlers. Since you're apparently able to see
> the console output at the time of the panic, I guess it's probably the
> latter. Could you try your test with the patch below applied? It'll
> print a bunch of "entering post_sync"/"leaving post_sync" messages with
> addresses that can be resolved using kgdb. That'll help determine where
> we're getting stuck.
> 
> Index: sys/sys/eventhandler.h
> ===
> --- sys/sys/eventhandler.h(revision 321401)
> +++ sys/sys/eventhandler.h(working copy)
> @@ -85,7 +85,11 @@
>   _t = (struct eventhandler_entry_ ## name *)_ep; \
>   CTR1(KTR_EVH, "eventhandler_invoke: executing %p", \
>   (void *)_t->eh_func);   \
> + if (strcmp(__STRING(name), "shutdown_post_sync") == 0) \
> + printf("entering post_sync %p\n", (void 
> *)_t->eh_func); \
>   _t->eh_func(_ep->ee_arg , ## __VA_ARGS__);  \
> + if (strcmp(__STRING(name), "shutdown_post_sync") == 0) \
> + printf("leaving post_sync %p\n", (void 
> *)_t->eh_func); \
>   EHL_LOCK((list));   \
>   }   \
>   }   \
> 

I'l try this evening.


___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: stable/11 debugging kernel unable to produce crashdump again

2017-07-23 Thread Mark Johnston

On Sun, Jul 23, 2017 at 04:26:45PM +0700, Eugene Grosbein wrote:
> On 14.01.2017 18:40, Eugene Grosbein wrote:
> > 
> >> I suspect that this is because we only stop the scheduler upon a panic
> >> if SMP is configured. Can you retest with the patch below applied?
> >>
> >> Index: sys/kern/kern_shutdown.c
> >> ===
> >> --- sys/kern/kern_shutdown.c   (revision 312082)
> >> +++ sys/kern/kern_shutdown.c   (working copy)
> >> @@ -713,6 +713,7 @@
> >>CPU_CLR(PCPU_GET(cpuid), _cpus);
> >>stop_cpus_hard(other_cpus);
> >>}
> >> +#endif
> >>  
> >>/*
> >> * Ensure that the scheduler is stopped while panicking, even if panic
> >> @@ -719,7 +720,6 @@
> >> * has been entered from kdb.
> >> */
> >>td->td_stopsched = 1;
> >> -#endif
> >>  
> >>bootopt = RB_AUTOBOOT;
> >>newpanic = 0;
> >>
> >>
> > 
> > Indeed, my router is uniprocessor system and your patch really solves the 
> > problem.
> > Now kernel generates crashdump just fine in case of panic. Please commit 
> > the fix, thanks!
> 
> Sadly, this time 11.1-STABLE r321371 SMP hangs instead of doing crashdump:

Is this amd64 GENERIC, or something else?

> 
> - "call doadump" from DDB prompt works just fine;
> - "shutdown -r now" reboots the system without problems;
> - "sysctl debug.kdb.panic=1" triggers a panic just fine but system hangs just 
> afer showing uptime
> instead of continuing with crashdump generation; same if "real" panic occurs.
> 
> Same for debug.minidump set to 1 or 0. How do I debug this?

I'm not able to reproduce the problem in bhyve using r321401. Looking
at the code, the culprits might be cngrab(), or one of the
shutdown_post_sync eventhandlers. Since you're apparently able to see
the console output at the time of the panic, I guess it's probably the
latter. Could you try your test with the patch below applied? It'll
print a bunch of "entering post_sync"/"leaving post_sync" messages with
addresses that can be resolved using kgdb. That'll help determine where
we're getting stuck.

Index: sys/sys/eventhandler.h
===
--- sys/sys/eventhandler.h  (revision 321401)
+++ sys/sys/eventhandler.h  (working copy)
@@ -85,7 +85,11 @@
_t = (struct eventhandler_entry_ ## name *)_ep; \
CTR1(KTR_EVH, "eventhandler_invoke: executing %p", \
(void *)_t->eh_func);   \
+   if (strcmp(__STRING(name), "shutdown_post_sync") == 0) \
+   printf("entering post_sync %p\n", (void 
*)_t->eh_func); \
_t->eh_func(_ep->ee_arg , ## __VA_ARGS__);  \
+   if (strcmp(__STRING(name), "shutdown_post_sync") == 0) \
+   printf("leaving post_sync %p\n", (void 
*)_t->eh_func); \
EHL_LOCK((list));   \
}   \
}   \
___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

stable/11 debugging kernel unable to produce crashdump again

2017-07-23 Thread Eugene Grosbein

On 14.01.2017 18:40, Eugene Grosbein wrote:
> 
>> I suspect that this is because we only stop the scheduler upon a panic
>> if SMP is configured. Can you retest with the patch below applied?
>>
>> Index: sys/kern/kern_shutdown.c
>> ===
>> --- sys/kern/kern_shutdown.c (revision 312082)
>> +++ sys/kern/kern_shutdown.c (working copy)
>> @@ -713,6 +713,7 @@
>>  CPU_CLR(PCPU_GET(cpuid), _cpus);
>>  stop_cpus_hard(other_cpus);
>>  }
>> +#endif
>>  
>>  /*
>>   * Ensure that the scheduler is stopped while panicking, even if panic
>> @@ -719,7 +720,6 @@
>>   * has been entered from kdb.
>>   */
>>  td->td_stopsched = 1;
>> -#endif
>>  
>>  bootopt = RB_AUTOBOOT;
>>  newpanic = 0;
>>
>>
> 
> Indeed, my router is uniprocessor system and your patch really solves the 
> problem.
> Now kernel generates crashdump just fine in case of panic. Please commit the 
> fix, thanks!

Sadly, this time 11.1-STABLE r321371 SMP hangs instead of doing crashdump:

- "call doadump" from DDB prompt works just fine;
- "shutdown -r now" reboots the system without problems;
- "sysctl debug.kdb.panic=1" triggers a panic just fine but system hangs just 
afer showing uptime
instead of continuing with crashdump generation; same if "real" panic occurs.

Same for debug.minidump set to 1 or 0. How do I debug this?

Eugene Grosbein

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: stable/11 debugging kernel unable to produce crashdump again

Re: stable/11 debugging kernel unable to produce crashdump again

Re: stable/11 debugging kernel unable to produce crashdump again

Re: stable/11 debugging kernel unable to produce crashdump again

Re: stable/11 debugging kernel unable to produce crashdump again

Re: stable/11 debugging kernel unable to produce crashdump again

Re: stable/11 debugging kernel unable to produce crashdump again

stable/11 debugging kernel unable to produce crashdump again

8 matches

Site Navigation

Mail list logo

Footer information