Re: [j-nsp] mx960 crashed
It does not seem to be documented by Juniper; at least I couldn't find any (MX-related) info. However, it's a basic Linux procedure, see: https://en.wikipedia.org/wiki/Magic_SysRq_key

Juniper only has some info regarding SysRq and the IDP series at: https://kb.juniper.net/InfoCenter/index?page=content&id=KB6660&actp=METADATA

-Jonas

On Thursday, 05.04.2018, 09:34 -0500, Aaron Gould wrote:
> Is a break followed by c within 5 seconds a documented way to crash a RE-S-X6-64G?

___
juniper-nsp mailing list juniper-nsp@puck.nether.net
https://puck.nether.net/mailman/listinfo/juniper-nsp
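For reference, on a Linux host the SysRq functions that may be triggered from a keyboard or serial console are filtered by the `kernel.sysrq` setting (`/proc/sys/kernel/sysrq`), described in the upstream kernel documentation that the Magic SysRq page above summarizes. The sketch below decodes that value. It assumes the upstream bit values and that the crash (`c`) command is gated by the 8 ("debugging dumps") bit; whether the RE-S-X6-64G hypervisor ships with this setting enabled is not confirmed anywhere in this thread. The function names are hypothetical.

```python
from __future__ import annotations

# Bit values for kernel.sysrq as listed in the Linux sysrq documentation.
# 0 disables SysRq, 1 enables every function, any larger value is a bitmask.
SYSRQ_FUNCTIONS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes etc.",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}

# Assumption: the crash ('c') handler is gated by the "debugging dumps" bit,
# as in the upstream kernel's crash operation.
CRASH_BIT = 8


def enabled_functions(value: int) -> list[str]:
    """Return the SysRq function groups a given kernel.sysrq value enables."""
    if value == 0:
        return []
    if value == 1:
        return list(SYSRQ_FUNCTIONS.values())
    return [name for bit, name in SYSRQ_FUNCTIONS.items() if value & bit]


def crash_key_allowed(value: int) -> bool:
    """True if SysRq 'c' from a keyboard/serial console would be honoured.

    Note: writes to /proc/sysrq-trigger by root are not filtered by this mask.
    """
    return value == 1 or (value > 1 and bool(value & CRASH_BIT))


def read_sysrq(path: str = "/proc/sys/kernel/sysrq") -> int:
    """Read the current setting on a Linux host."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    try:
        current = read_sysrq()
    except OSError:
        current = 176  # illustrative value when /proc is unavailable
    print(current, crash_key_allowed(current), enabled_functions(current))
```

With a value such as 176 (sync, remount read-only, reboot), `c` would be ignored from the console; any value with the 8 bit set, or the value 1, would let it through.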
Re: [j-nsp] mx960 crashed
Thanks Rob,

Is a break followed by c within 5 seconds a documented way to crash a RE-S-X6-64G?

By the way, JTAC couldn't find the dump.

RMA'ing the RE ... JTAC said it has a bad SSD.

-Aaron

-----Original Message-----
From: Rob Foehl [mailto:r...@loonybin.net]
Sent: Wednesday, April 4, 2018 2:11 PM
To: Aaron Gould
Cc: juniper-nsp@puck.nether.net
Subject: Re: [j-nsp] mx960 crashed
Re: [j-nsp] mx960 crashed
On Wed, 4 Apr 2018, Aaron Gould wrote:

> Any idea why this happened, and how do I troubleshoot the cause?
>
> login: root
> Password:SysRq : Trigger a crash

Looks like you're running a RE-S-X6-64G, and somehow sent it SysRq c (which is a break followed by c within 5 seconds on a serial console), and the hypervisor dutifully crashed and wrote out a dump. Can't really blame it for doing what it's told.

-Rob
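Rob's "break followed by c within 5 seconds" can be modelled as a tiny state machine: a BREAK on the serial console arms SysRq, and the next character is treated as a SysRq command only if it arrives inside the window. This is a simplified toy based on the Linux sysrq documentation, not the kernel's UART driver code, and the `Event`/`sysrq_commands` names are hypothetical. It shows how a stray BREAK (for example from a console client or terminal server) followed by an ordinary keystroke can be interpreted as a command.

```python
from __future__ import annotations

from dataclasses import dataclass

SYSRQ_WINDOW_S = 5.0  # a key must follow the BREAK within this many seconds


@dataclass
class Event:
    t: float       # arrival time in seconds
    kind: str      # "break" or "char"
    ch: str = ""   # the received character when kind == "char"


def sysrq_commands(events: list[Event]) -> list[tuple[float, str]]:
    """Return (time, key) pairs this toy model treats as SysRq commands."""
    armed_at: float | None = None
    commands: list[tuple[float, str]] = []
    for ev in events:
        if ev.kind == "break":
            # A BREAK arms the window (this toy ignores what real drivers
            # do with back-to-back BREAKs).
            armed_at = ev.t
        elif ev.kind == "char":
            if armed_at is not None and ev.t - armed_at < SYSRQ_WINDOW_S:
                commands.append((ev.t, ev.ch))
            # Only the first character after a BREAK is considered.
            armed_at = None
    return commands


if __name__ == "__main__":
    # BREAK, then 'c' two seconds later: interpreted as SysRq 'c' (crash).
    print(sysrq_commands([Event(0.0, "break"), Event(2.0, "char", "c")]))
```

Only the first character after the BREAK is examined, so in this model a late or second keystroke is passed through as normal input.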
Re: [j-nsp] mx960 crashed
Thanks.

JTAC says “DMA errors indicate a faulty SSD”; processing RMA of RE1.

agould@mx960> show chassis routing-engine | grep "reason|slot|uptime"
Slot 0:
    Uptime 19 days, 23 hours, 37 minutes, 40 seconds
    Last reboot reason 0x4000:VJUNOS reboot
Slot 1:
    Uptime 5 hours, 52 minutes, 22 seconds
    Last reboot reason 0x800:reboot due to exception

No /dev/mapper/ seen anywhere.

RE0:
…
/dev/kmem       Last changed: Mar 15 14:14:49
/dev/ksyms      Last changed: Mar 15 14:15:16
/dev/led/       Last changed: Mar 15 14:14:52
/dev/mch        Last changed: Mar 15 14:23:50
/dev/md0        Last changed: Mar 15 14:23:50
/dev/md0.uzip   Last changed: Mar 15 14:14:49
/dev/md1        Last changed: Mar 15 14:14:53
/dev/md1.uzip   Last changed: Mar 15 14:14:53
…

Backup RE1:
…
/dev/ksyms      Last changed: Apr 04 08:08:38
/dev/led/       Last changed: Apr 04 08:08:16
/dev/mch        Last changed: Apr 04 08:09:06
/dev/md0        Last changed: Apr 04 08:09:06
/dev/md0.uzip   Last changed: Apr 04 08:08:13
…

From: Graham Brown [mailto:juniper-...@grahambrown.info]
Sent: Wednesday, April 4, 2018 12:55 PM
To: Aaron Gould
Cc: juniper-nsp@puck.nether.net
Subject: Re: [j-nsp] mx960 crashed

Hi Aaron,

Looks like you have a core file there; raise a TAC case and they'll be able to determine the root cause for you. Also see if there is anything readable in /dev/mapper/jvg_P-jlvmrootrw/var/crash/override_reboot_reason that may point you to the root cause.

HTH,
Graham

On Thu, 5 Apr 2018 at 01:13, Aaron Gould wrote:
I'm still in the process of turning up this new 5-node MX960 100G ring, and I went to the backup RE console to log in and saw this happen.

Any idea why this happened, and how do I troubleshoot the cause?
FreeBSD/amd64 (mx960) (ttyu0)

login: root
Password:SysRq : Trigger a crash
dmar: DRHD: handling fault status reg 2
dmar: DMAR:[DMA Write] Request device [02:10.1] fault addr 151d9000
DMAR:[fault reason 01] Present bit in root entry is clear
dmar: DRHD: handling fault status reg 102
dmar: DMAR:[DMA Write] Request device [02:10.1] fault addr 1540c000
DMAR:[fault reason 01] Present bit in root entry is clear
dmar: DRHD: handling fault status reg 202
dmar: DMAR:[DMA Write] Request device [02:10.1] fault addr 152fc000
DMAR:[fault reason 01] Present bit in root entry is clear
dmar: DRHD: handling fault status reg 302
dmar: DMAR:[DMA Write] Request device [06:0a.0] fault addr 1ed10d000
DMAR:[fault reason 01] Present bit in root entry is clear
dmar: DRHD: handling fault status reg 402
dmar: DMAR:[DMA Read] Request device [06:0a.0] fault addr ad4a000
DMAR:[fault reason 01] Present bit in root entry is clear
dmar: INTR-REMAP: Request device [[05:00.1] fault index 40
INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear
dmar: DRHD: handling fault status reg 602
dmar: DMAR:[DMA Read] Request device [06:0a.0] fault addr ad4a000
DMAR:[fault reason 01] Present bit in root entry is clear
dmar: DRHD: handling fault status reg 702
dmar: DMAR:[DMA Read] Request device [06:0a.0] fault addr ad4a000
DMAR:[fault reason 01] Present bit in root entry is clear
4 logical volume(s) in volume group "jvg_S" now active
4 logical volume(s) in volume group "jvg_P" now active
Override SW Exception reboot reason saved to /dev/mapper/jvg_P-jlvmrootrw/var/crash/override_reboot_reason
Compressed level 31 vmhost kernel core will be dumped to jvg_P-jlvmrootrw.
Please use crash utility to analyze the core
Copying data : [ 13 %]
.
.
.
Copying data : [100 %]

The dumpfile is saved to vmcore-0-compressed-201804041305.

makedumpfile Completed.
(many more boot messages seen)

-Aaron
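The two Routing Engines report different reboot reasons in the `show chassis routing-engine` output above: RE0 `0x4000:VJUNOS reboot`, RE1 `0x800:reboot due to exception`. When checking several chassis on the ring, a small parser can pull those fields out of the filtered output. This is a minimal sketch that assumes only the `Slot N:`, `Uptime ...` and `Last reboot reason 0x...:text` line shapes seen here; `parse_re_status` is a hypothetical helper, and the code does not interpret what the reason codes mean.

```python
from __future__ import annotations

import re

# Line shapes as they appear in the thread; whitespace between label and value may vary.
SLOT_RE = re.compile(r"^\s*Slot\s+(\d+):")
UPTIME_RE = re.compile(r"^\s*Uptime\s+(.+?)\s*$")
REASON_RE = re.compile(r"^\s*Last reboot reason\s+(0x[0-9a-fA-F]+):(.*?)\s*$")


def parse_re_status(text: str) -> dict[int, dict]:
    """Return {slot: {"uptime": str, "reason_code": int, "reason_text": str}}."""
    slots: dict[int, dict] = {}
    current: dict | None = None
    for line in text.splitlines():
        if m := SLOT_RE.match(line):
            current = slots.setdefault(int(m.group(1)), {})
        elif current is None:
            continue  # skip anything before the first "Slot N:" (e.g. the CLI prompt)
        elif m := REASON_RE.match(line):
            current["reason_code"] = int(m.group(1), 16)
            current["reason_text"] = m.group(2).strip()
        elif m := UPTIME_RE.match(line):
            current["uptime"] = m.group(1)
    return slots


SAMPLE = """agould@mx960> show chassis routing-engine | grep "reason|slot|uptime"
Slot 0:
    Uptime 19 days, 23 hours, 37 minutes, 40 seconds
    Last reboot reason 0x4000:VJUNOS reboot
Slot 1:
    Uptime 5 hours, 52 minutes, 22 seconds
    Last reboot reason 0x800:reboot due to exception
"""

if __name__ == "__main__":
    for slot, info in sorted(parse_re_status(SAMPLE).items()):
        print(slot, hex(info["reason_code"]), info["reason_text"], "| up", info["uptime"])
```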
Re: [j-nsp] mx960 crashed
Hi Aaron,

Looks like you have a core file there; raise a TAC case and they'll be able to determine the root cause for you. Also see if there is anything readable in /dev/mapper/jvg_P-jlvmrootrw/var/crash/override_reboot_reason that may point you to the root cause.

HTH,
Graham

--
-sent from my iPhone; please excuse spelling, grammar and brevity-
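Before or alongside the TAC case, the `dmar:` lines in the console log Aaron posted can be tallied per PCI requester to see which devices were faulting; JTAC later attributed these DMA errors to a faulty SSD. This is a minimal sketch assuming the `DMAR:[DMA Read|Write] Request device [bb:dd.f] fault addr ...` line followed by a `DMAR:[fault reason NN] ...` line, as in that log. `summarize_dmar` is a hypothetical helper, and it does not tell you which PCI address is the SSD; that mapping (for example with `lspci` on the host) is left out.

```python
from __future__ import annotations

import re
from collections import Counter

# "dmar: DMAR:[DMA Write] Request device [02:10.1] fault addr 151d9000"
REQ_RE = re.compile(
    r"DMAR:\[DMA (Read|Write)\] Request device "
    r"\[([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\] fault addr ([0-9a-f]+)"
)
# "DMAR:[fault reason 01] Present bit in root entry is clear"
# (INTR-REMAP fault reasons are deliberately not matched.)
REASON_RE = re.compile(r"DMAR:\[fault reason (\d+)\] (.*)")


def summarize_dmar(log: str) -> Counter:
    """Count DMA remapping faults keyed by (PCI device, direction, fault reason)."""
    counts: Counter = Counter()
    pending: tuple[str, str] | None = None  # (device, direction) awaiting its reason line
    for line in log.splitlines():
        req = REQ_RE.search(line)
        if req:
            pending = (req.group(2), req.group(1))
            continue
        reason = REASON_RE.search(line)
        if reason and pending:
            counts[(*pending, reason.group(2).strip())] += 1
            pending = None
    return counts


SAMPLE_LOG = """\
dmar: DRHD: handling fault status reg 2
dmar: DMAR:[DMA Write] Request device [02:10.1] fault addr 151d9000
DMAR:[fault reason 01] Present bit in root entry is clear
dmar: DRHD: handling fault status reg 102
dmar: DMAR:[DMA Write] Request device [02:10.1] fault addr 1540c000
DMAR:[fault reason 01] Present bit in root entry is clear
dmar: DRHD: handling fault status reg 202
dmar: DMAR:[DMA Write] Request device [02:10.1] fault addr 152fc000
DMAR:[fault reason 01] Present bit in root entry is clear
dmar: DRHD: handling fault status reg 302
dmar: DMAR:[DMA Write] Request device [06:0a.0] fault addr 1ed10d000
DMAR:[fault reason 01] Present bit in root entry is clear
dmar: DRHD: handling fault status reg 402
dmar: DMAR:[DMA Read] Request device [06:0a.0] fault addr ad4a000
DMAR:[fault reason 01] Present bit in root entry is clear
dmar: INTR-REMAP: Request device [[05:00.1] fault index 40
INTR-REMAP:[fault reason 34] Present field in the IRTE entry is clear
dmar: DRHD: handling fault status reg 602
dmar: DMAR:[DMA Read] Request device [06:0a.0] fault addr ad4a000
DMAR:[fault reason 01] Present bit in root entry is clear
dmar: DRHD: handling fault status reg 702
dmar: DMAR:[DMA Read] Request device [06:0a.0] fault addr ad4a000
DMAR:[fault reason 01] Present bit in root entry is clear
"""

if __name__ == "__main__":
    for (dev, direction, reason), n in summarize_dmar(SAMPLE_LOG).most_common():
        print(f"{dev} {direction:5} x{n}: {reason}")
```

On this excerpt the faults come from two requesters, 02:10.1 (writes) and 06:0a.0 (mostly reads), all with fault reason 01.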