Re: [osol-code] Queries regarding crash dump .....

Gavin Maltby Tue, 31 Oct 2006 02:20:27 -0800

Hi,

On 10/29/06 08:54, Gaurav Dhiman wrote:

Hi,


I am studying the crash dump routines of OpenSolaris these days, mainly the
flow of the panic() function.
Can someone answer the following questions?

- What can be configured as a dump device?
    - A swap partition on a local disk is OK; that is the default.


Yes.

    - A dedicated raw partition on a local disk is OK, but I don't know how
to set it up.


dumpadm -d <block-device>

    - A partition across the network (I mean through NFS): is it possible? I
went through the code of nfs_dump(), and when I make my kernel crash and
follow the flow in KMDB, the flow goes as far as nfs_dump() ---> nd_init() and
then returns with an error, so the dump is not written across the network.
I figured out the cause of that error code. In nd_init(), the first thing we
do is check the version number of the vnode structure of our configured dump
file; in my case that comes out to be 0, whereas the only NFS versions
supported for dumping are 2 and 3, per the code in nd_init(). Can someone
explain what I should do to get the dump across the network? One more thing:
is it possible to have the dump in a normal partition, or only in a swap
partition?


I have not tried a dump across nfs for a very long time.  I believe it
only exists for diskless clients - you would not want to use it on
any regular systems.

    - Can I output minimal debugging messages, such as processor state and
the panic stack, to the local or serial console at panic time? How would
that be possible, if at all?


Panic info does go to the console already.  Of course it may then disappear
on system reset, which is why logging serial consoles are nice to have.

    - What is the purpose of putting the other CPUs (all but the CPU that
panicked) in an infinite loop? As far as I understand, this is done to keep
the other CPUs busy so that they do not disturb the physical memory of which
the current panic is taking a snapshot in the dump.


Correct, we want to dump as close to a snapshot of system state as possible
and so we don't want other cpus continuing to run.

I don't think we report the state of the other CPUs in our panic dump, do
we? If we do, I could not find the code which dumps the other CPUs' state.


The short answer is that we do dump the cpu structures of all cpus
during panic, so we can see their state from that.  That is what
::cpuinfo in mdb uses, for example.

On SPARC there is an exception: if you drop to the OBP prompt (via an OBP
breakpoint or break sequence) and 'sync' for a dump, then the state of
the non-panic cpus is incomplete.  That is because they have "parked"
themselves in OBP and their final state info is held in OBP buffers;
the 'sync' callback does not unpark the other cpus so they do not
have their latest state reflected in the dump.  We had a fix for this
once but it lost some other functionality (can't remember what now).

    - If someone has already gone through the kernel panic and crash dump
code, can you tell me what the low-level arch-dependent APIs in the crash
dump code are? I have figured out a few of them: vpanic() (which dumps the
panic CPU state on the panic stack), panic_stop_cpus() (which sends IPIs and
puts the other CPUs in an infinite loop; sending IPIs is arch-dependent),
panic_savetrap() (I don't know what it does), and panic_saveregs() (which
copies the processor state from the panic stack to the panic buffer).


These are not APIs, they are implementation-private detail.  Kernel code
can elect to panic by calling cmn_err(CE_PANIC, ...) or
panic() directly; thereafter it is in the hands of the panic code.
The cmn_err and panic functions call vpanic to do the work.
The vpanic function is in assembler, and its main job is to decide
which cpu will be the primary panic cpu - the one that grabs the
panic_quiesce trigger first (sometimes multiple cpus try to panic at once;
only one will be allowed through as the primary panic cpu, and
it will dump the system).

The big block comment in panic.c explains a lot of this.
The panic triggers - quiesce, sync, dump - track the state of the
system as panic progresses.  We record this state in case the
main panic thread panics again: we reenter panicsys and can skip
over filesystem sync, for instance, if we panicked again during the initial
attempt at filesystem sync.

From vpanic we call panicsys.  The "on_panic_stack" argument is nonzero
if this is the primary panic thread, i.e. the one that atomically set the
panic_quiesce trigger.  So the if (on_panic_stack) chunk only applies the
first time the primary panic thread enters panicsys (if it panics again it
will reenter panicsys but with on_panic_stack zero).  In the on_panic_stack
block we record primary panic info, stop other cpus (if they'll listen),
print out the panic stack, and then go on to sync the filesystems.
It also sets panicstr non-NULL - importantly, only after stopping the other
cpus.

A thread entering panicsys without on_panic_stack set (a secondary panic
thread, or the primary thread re-entering) sees this:

if (on_panic_stack) {
        /* not this */
} else if (panic_dump || panic_sync || panicstr) {
        /* print secondary panic message and fall through to sync code */
} else {
        goto spin;
}

Since the primary panic cpu stops the others before setting panicstr
or the panic_sync/panic_dump triggers, it is only the primary panic
thread that can see that else-if condition, if it has re-entered the
panic code.  Any thread that did not win the original panic_quiesce
trigger will go and spin.

Next the panic thread initiates sync, but only once (through
the panic_sync trigger).  That attempt may panic and re-enter,
as described above, in which case the second pass will not
try to sync again.

Finally, again under the protection of a trigger (panic_dump), we move on
to dump the system using dumpsys, and then call mdboot to reset the
system.

    - Why are we maintaining two separate buffers, panicbuf and dumpbuf?
Right now my understanding is that panicbuf is only used to save the CPU
state and panic stack, whereas dumpbuf is used to save the system dump
information, like the kernel symbol table, page tables and physical memory.
Once these buffers are filled they are sent to dumpvp_flush() to put the
dump on disk or across the network. Why are two buffers maintained
separately? Is it because, in live dump cases, the CPU state and panic stack
are not part of the dump and we only save the system information?


They have separate intentions.  dumpbuf is used as a staging buffer in
copying memory data out to the dump device.  panicbuf serves two
purposes.  Firstly it is a well-known data structure which debuggers
can look at for panic summary information.  Secondly, and only
on sparc systems, panicbuf is held within a page of memory that
is retained (via an OBP interface) across warm reset so anything
we write there will still be readable after we reboot (even if the
dump attempt fails, for instance).

Cheers

Gavin
_______________________________________________
opensolaris-code mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/opensolaris-code
