melbogia wrote:
There is no "panic" or "syncing filesystems" in
/var/adm/messages. Here is what I see after I execute
the command "mkfile 100g /datapool/testfile"
I was on the machine at 1600 hours yesterday, and the next entry
after that, at 8:43 AM, is system startup information. It seems it
can't even write anything to /var/adm/messages before it reboots?
Also, this may be relevant: the datapool pool doesn't exist after
the system comes back up from the crash. I have to do a
"zpool import -f datapool" to import it.
This may be multiple issues. The fact that a reboot happens with no
trace in /var/adm/messages is somewhat worrying to me. However, we
need to frame the issue. Things like:
When you were using the system on the 10th, how were you connected
to it? Were you logged in remotely (via ssh/telnet/etc...)?
Were you on the console (graphics console, or text mode)?
What did you see on the screen around 16:19?
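Also, if the box really is panicking too quickly to log anything, a crash dump may still get captured. A minimal sketch, assuming the stock OpenSolaris dump setup (the /var/crash/dirt path is just the default savecore directory for a host named "dirt" and may differ on your system):

```
# dumpadm              # show the dump device, savecore directory, and whether savecore is enabled
# dumpadm -y           # enable savecore if it reports "Savecore enabled: no"
# ls /var/crash/dirt   # after the next reset, look here for a saved dump
```

If a dump appears after the next crash, that would at least confirm it is a panic rather than a hardware reset, and give something to analyse.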
I was connected to the server via ssh and I didn't see anything at 16:19.
I can't imagine you saw absolutely nothing from your ssh session. I'm
guessing that it hung, and eventually gave some error like "timeout" or
"connection reset by peer" or something? Even if you reset it, did you
try ssh again, ping, etc...? What were the results here?
Did you go (if able) to the system itself? What state was it in? etc...
It wasn't a power cut was it? ;-)
I am not sure about the hardware reset issues, I'll poke around and see what I
find. But there is another machine with the Opensolaris 2009.06 as well, it has
the exact same hardware with the exception of hard disks. It has been working
fine.
That's kind of my point. If there is an "identical" system (and beware -
they are rarely exactly identical with no differences whatsoever), and
it is functioning fine, then (assuming similar usage patterns) it may
help point to a hardware/firmware issue, etc... Don't get me wrong - I'm
not saying your hardware is bust - just something to bear in mind if the
fault always stays with the one system and no other system is affected.
Dec 11 08:43:36 dirt zfs: [ID 427000 kern.warning] WARNING: pool
'datapool' could not be loaded as it was last accessed by another
system (host: dirt hostid: 0x409a4c). See:
http://www.sun.com/msg/ZFS-8000-EY
Whilst I can't comment on this line myself, it does (IMO) explain
why you had to force import the pool after the reboot. Assuming
there is only one host with the name "dirt", then this would seem
to be a bug.
That is correct; there is only one host with that name in the environment.
You might want to ask this question on zfs-discuss, and give them the
background.
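Before posting there, it might be worth confirming that the hostid in the warning really is this machine's own (the 0x409a4c value below is taken from your messages excerpt):

```
# hostid          # compare the value printed here against the 0x409a4c in the warning
# zpool import    # with no arguments: lists pools available for import, and why they need -f
```

If hostid matches and the pool still claims to have been "last accessed by another system", that strengthens the bug theory for the zfs-discuss folks.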
Just a thought (assuming the system will let you do this), who else was
logged in at the time? Could somebody have done a forced export of all
zpools on the system? Could this explain why the messages file stopped
being written to?
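Two things might answer that directly, if the system lets you. `last` reads the wtmpx login records, which survive a reboot, and `zpool history` replays the administrative commands run against the pool (it is stored in the pool itself, so it survives the re-import):

```
# last | head              # login/logout/reboot records around Dec 10 16:19
# zpool history datapool   # timestamped zpool/zfs admin commands, including any export
```

A forced export would show up in the pool history; an abrupt reset would not.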
Regards,
Brian
--
Brian Ruthven
Solaris Revenue Product Engineering
Sun Microsystems UK
Sparc House, Guillemont Park, Camberley, GU17 9QG
_______________________________________________
opensolaris-discuss mailing list
[email protected]