On Thu, 13 Dec 2007, Richard L. Hamilton wrote:

>> On Thu, 13 Dec 2007, Richard L. Hamilton wrote:
>>
>>> Sometimes a large system, despite precautions (or in the absence of them),
>>> runs out of resources (VM, mainly) to the degree that no useful progress
>>> is being made: that is, one can't even log in and kill the hogging processes.
>>>
>>> (At least on SPARC) the usual workaround would be to break to the boot PROM
>>> and sync; however, this invariably causes a crash dump to be taken.  In the
>>
>> Workaround:
>>
>> If you have "obpdebug" defined, you can do:
>>
>>      0 dumpvp x! sync
>> at the ok prompt to force skipping the dump.
>
> Thanks - _that's_ the sort of thing I was looking for; although it
> presupposes that obpdebug support is loaded, which I suppose it usually
> isn't by default.

If you know the address of the 'dumpvp' (or 'dumphdr') global, then you 
can just do "0 <addr> x!" (Forth takes the value first, then the address). 
I have used this technique in the past, equipped with a directory 
containing a set of kernel patches to look the addresses up in. 
Cumbersome, though.
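
To illustrate why zeroing that pointer has the desired effect, here is a toy 
user-space model in Python. It is not kernel code: the ToyKernel class, the 
prom_sync() function, the device path and the messages are all made up for 
illustration, and the order (flush first, then dump, then reboot) is an 
assumption based on the 'syncing filesystems ...' and 'dumping ...' phases 
mentioned in this thread.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyKernel:
    """Hypothetical stand-in for the state a PROM 'sync' looks at."""
    # Assumed dump device; None models the pointer after storing 0 into it.
    dumpvp: Optional[str] = "/dev/dsk/c0t0d0s1"
    # Made-up count of buffers not yet written to disk.
    dirty_buffers: int = 3


def prom_sync(k: ToyKernel) -> List[str]:
    """Return the steps the modeled 'sync' performs, in order."""
    steps = [f"syncing filesystems ... flushed {k.dirty_buffers} buffers"]
    k.dirty_buffers = 0

    # The dump phase only runs if a dump device is still configured.
    if k.dumpvp is None:
        steps.append("dump skipped (no dump device)")
    else:
        steps.append(f"dumping to {k.dumpvp}")

    steps.append("rebooting")
    return steps


if __name__ == "__main__":
    for line in prom_sync(ToyKernel(dumpvp=None)):
        print(line)
```

The point of the model is only that the flush still happens either way; 
clearing the pointer removes the dump step and nothing else.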

It's possible to code an "OBP add-on" like the 'sync' command. Search the 
source for "add_vx_handler()"; it's technically simple.
A case can be made for or against a SPARC-specific method versus a generic 
one via dumpadm, but that discussion would be something for ARC level.


[ ... ]
> Because I'd really rather the buffers got flushed, to minimize data loss?
> Great if everything was on zfs (and great to ignore SPARC-specific features
> if you live your life only on x86, I guess).

Well, minimize. Thing is, without a recovery/rollback/checkpointing 
mechanism, you can't really know whether you've lost something, and/or if 
you've lost something critical. It's like returning from holiday and 
finding your front door broken. You look inside and nothing _seems_ amiss. 
But then, do you remember where Granny left her money jar?

I'd think "rely on sync" is the wrong wording. It's more like uttering 
a prayer - calms the soul, and won't do any harm, and there are believers 
who will strongly claim it did them good. You don't really _know_, though.

But there's nothing wrong with a good belief, mind you :)

My experience there is rather that if the 'syncing filesystems ...' part 
works, then the dump will not hang either. They tend to go through the 
same I/O drivers/devices. In fact, the 'syncing ...' part accesses more 
I/O devs than the 'dumping ...' part does (the former goes for everything 
unflushed, while the latter only attempts to get at the dump device). We 
do have some service documents explaining how to get a dump if the box 
hangs during 'syncing filesystems ...' - but none to my knowledge that do 
the opposite.
The time the dump takes, though, is known to be "high".

FrankH.
_______________________________________________
opensolaris-discuss mailing list
opensolaris-discuss@opensolaris.org
