Set up netdump/netconsole. We print more messages after the
write_timeout, and those will provide more clues. Because the node is
panicking, these messages are captured only by the netdump server.
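
If a netdump server is not available right away, a small UDP listener on
another machine can at least capture plain netconsole text. The sketch
below is only an illustration: it assumes the panicking node sends
netconsole-style UDP datagrams to port 6666 (an assumed port, adjust to
your setup), it does not speak the netdump protocol, and the function
names are made up for this example.

```python
import socket


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket that netconsole-style senders can log to.

    The port is an assumption; use whatever the sending node targets.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_messages(sock, count, timeout=5.0):
    """Collect up to `count` datagrams as (sender_ip, text) tuples.

    Stops early if no datagram arrives within `timeout` seconds.
    """
    sock.settimeout(timeout)
    messages = []
    for _ in range(count):
        try:
            data, (sender_ip, _sender_port) = sock.recvfrom(4096)
        except socket.timeout:
            break
        text = data.decode("utf-8", errors="replace").rstrip("\n")
        messages.append((sender_ip, text))
    return messages


if __name__ == "__main__":
    listener = open_listener()
    # Print whatever arrives until interrupted with Ctrl-C.
    while True:
        for sender_ip, text in receive_messages(listener, 1, timeout=60.0):
            print(sender_ip, text)
```

Run it on the receiving machine before reproducing the crash, so the
messages printed after the write timeout have somewhere to land.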

Google "redhat netdump rhel4" for details on setting it up.
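
For reference, the 12000 milliseconds in the quoted log below matches the
default heartbeat threshold of 7, assuming the usual o2cb relationship of
one heartbeat every 2 seconds and a timeout of (threshold - 1) intervals.
A rough sketch of that arithmetic, with helper names invented for
illustration:

```python
# Assumption: o2hb writes one heartbeat every 2 seconds.
HEARTBEAT_INTERVAL_MS = 2000


def write_timeout_ms(threshold):
    """Timeout implied by an O2CB_HEARTBEAT_THRESHOLD value."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    return (threshold - 1) * HEARTBEAT_INTERVAL_MS


def threshold_for_timeout(seconds):
    """Smallest threshold whose timeout is at least `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    intervals = -(-seconds * 1000 // HEARTBEAT_INTERVAL_MS)  # ceiling division
    return intervals + 1


if __name__ == "__main__":
    print(write_timeout_ms(7))        # default threshold
    print(threshold_for_timeout(60))  # threshold for a 60 second timeout
```

If the iSCSI path can stall for longer than that, a higher threshold in
/etc/sysconfig/o2cb gives the heartbeat more room before the node fences.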

Stephan A. Rickauer wrote:
> Stephan A. Rickauer wrote:
>   
>>> When the hb thread panics, it dumps messages indicating
>>> the times it took to perform the tasks. Could you share
>>> those messages?
>>>       
>> Actually, I have not seen those messages. Give me a couple of minutes
>> and I will reproduce the crash to post the numbers here.
>>     
>
> Ok, this is what I get when reducing the heartbeat threshold to the
> default in /etc/sysconfig/o2cb:
>
> ---snip---
> (3,0):o2hb_write_timeout: 164 ERROR: Heartbeat write timeout to device
> sdb1 after 12000 milliseconds
> (3,0):o2hb_stop_all_regions: 1727 ERROR: stopping heartbeat on all
> active regions
> Kernel panic - not syncing: ocfs2 is very sorry to be fencing this
> system by panicing
>
> <3>iscsi-sfnet:host1: ping timeout of 5 secs expired, last rx
> 4296316431, last ping 4296321431, now 4296326431
> ---snip---
>
> I didn't report the iscsi-sfnet message the first time, since I
> believed it was a follow-up error of the ocfs2 crash. However, this is
> all I have on the screen.
>
>
> Apart from that, here is what I get when I mount my ocfs2 fs (before the
> crash, of course). May be irrelevant:
>
> ---snip---
> [EMAIL PROTECTED] ~]# mount /dev/sdb1 /mnt/iscsi
> (2943,0):ocfs2_initialize_super:1354 max_slots for this device: 4
> (2943,0):ocfs2_fill_local_node_info:1031 I am node 0
> (2943,0):__dlm_print_nodes:384 Nodes in my domain
> ("6862E40BCE3F4A0CBB047A5ADF8FA2E6"):
> (2943,0):__dlm_print_nodes:388  node 0
> (2943,0):ocfs2_find_slot:267 taking node slot 0
> ocfs2: Mounting device (8,17) on (node 0, slot 0)
> ---snip---
>
>
> And the proof of using deadline plus some additional info:
>
> ---snip---
> [EMAIL PROTECTED] ~]# dmesg | grep sched
> Using deadline io scheduler
>
> [EMAIL PROTECTED] ~]# lspci | grep Broadcom
> 02:03.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704
> Gigabit Ethernet (rev 10)
>
> [EMAIL PROTECTED] ~]# uname -a
> Linux lvs02.lan.ini.unizh.ch 2.6.9-34.EL #1 Thu Mar 9 06:03:30 GMT 2006
> x86_64 x86_64 x86_64 GNU/Linux
>
> [EMAIL PROTECTED] ~]# rpm -qa | grep ocfs2
> ocfs2console-1.2.0-1
> ocfs2-2.6.9-34.EL-1.2.0-1
> ocfs2-tools-1.2.0-1
>
> [EMAIL PROTECTED] ~]# cat /proc/cpuinfo | grep name
> model name      : AMD Opteron(tm) Processor 254
> ---snip---
>
>
> Let me know if you need more, or how I can help.
>
> Thanks!
>
>   
> ------------------------------------------------------------------------
>
> _______________________________________________
> Ocfs2-users mailing list
> [email protected]
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>   
