Hi all, I am running a very simple DRBD primary/primary configuration. I ran all my tests a few weeks ago and everything worked very well (shutting down the nodes, etc.).
I repeated the tests yesterday and again today :( I don't know what is happening, but once again, every time I stop one node (a clean shutdown, not a power-off) the cluster breaks :-( I took the filesystem offline and ran fsck.ocfs2, and it reported many errors on the cluster filesystem. Is there no way to verify that the ocfs2 filesystem is OK? I can stop the cluster at night, but this seems crazy to me: every two months the filesystem is broken, and if I stop one node, the node that is still running goes down. All systems run Debian squeeze with ocfs2 1.6.3. Any ideas?

Feb 7 13:58:33 servidoradantra2 kernel: [1864496.744051] block drbd0: conn( Unconnected -> WFConnection )
Feb 7 13:59:24 servidoradantra2 kernel: [1864547.064015] o2net: connection to node servidoradantra1 (num 0) at 192.168.2.1:7777 has been idle for 60.0 seconds, shutting it down.
Feb 7 13:59:24 servidoradantra2 kernel: [1864547.064025] (0,0):o2net_idle_timer:1495 here are some times that might help debug the situation: (tmr 1328619504.71832 now 1328619564.71605 dr 1328619504.71815 adv 1328619504.71839:1328619504.71840 func (18797194:507) 1328619488.80748:1328619488.80749)
Feb 7 13:59:24 servidoradantra2 kernel: [1864547.064048] o2net: no longer connected to node servidoradantra1 (num 0) at 192.168.2.1:7777
Feb 7 13:59:31 servidoradantra2 kernel: [1864554.860190] (2950,0):o2dlm_eviction_cb:269 o2dlm has evicted node 0 from group F0E244E5687046DBAAF6A928CCDEEEF1
Feb 7 13:59:31 servidoradantra2 kernel: [1864554.874012] (28219,0):dlm_get_lock_resource:839 F0E244E5687046DBAAF6A928CCDEEEF1:M00000000000000000000120766ee68: at least one node (0) to recover before lock mastery can begin
Feb 7 13:59:32 servidoradantra2 kernel: [1864555.876011] (28219,0):dlm_get_lock_resource:893 F0E244E5687046DBAAF6A928CCDEEEF1:M00000000000000000000120766ee68: at least one node (0) to recover before lock mastery can begin
Feb 7 13:59:35 servidoradantra2 kernel: [1864558.309527] (3132,3):dlm_get_lock_resource:839 F0E244E5687046DBAAF6A928CCDEEEF1:$RECOVERY: at least one node (0) to recover before lock mastery can begin
Feb 7 13:59:35 servidoradantra2 kernel: [1864558.309533] (3132,3):dlm_get_lock_resource:873 F0E244E5687046DBAAF6A928CCDEEEF1: recovery map is not empty, but must master $RECOVERY lock now
Feb 7 13:59:35 servidoradantra2 kernel: [1864558.309549] (3132,3):dlm_do_recovery:523 (3132) Node 1 is the Recovery Master for the Dead Node 0 for Domain F0E244E5687046DBAAF6A928CCDEEEF1
Feb 7 13:59:43 servidoradantra2 kernel: [1864566.880235] (28219,0):ocfs2_replay_journal:1607 Recovering node 0 from slot 0 on device (147,0)
Feb 7 13:59:47 servidoradantra2 kernel: [1864570.884880] ------------[ cut here ]------------
Feb 7 13:59:47 servidoradantra2 kernel: [1864570.884902] kernel BUG at /build/buildd-linux-2.6_2.6.32-39squeeze1-i386-F5tMlP/linux-2.6-2.6.32/debian/build/source_i386_none/fs/ocfs2/journal.c:1702!
Feb 7 13:59:47 servidoradantra2 kernel: [1864570.884938] invalid opcode: 0000 [#1] SMP
Feb 7 13:59:47 servidoradantra2 kernel: [1864570.884960] last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host5/target5:0:0/5:0:0:0/model
Feb 7 13:59:47 servidoradantra2 kernel: [1864570.884991] Modules linked in: ocfs2 jbd2 quota_tree crc32c drbd lru_cache cn pci_stub vboxpci vboxnetadp vboxnetflt vboxdrv cls_u32 sch_htb sch_ingress sch_sfq xt_time xt_connlimit xt_realm iptable_raw xt_TPROXY nf_tproxy_core xt_hashlimit xt_comment xt_owner xt_recent xt_iprange xt_policy xt_multiport ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype xt_tcpmss xt_pkttype xt_physdev xt_NFQUEUE xt_MARK xt_mark xt_mac xt_limit xt_length xt_helper xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY xt_tcpudp xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle nfnetlink iptable_filter ip_tables x_tables ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs xfs exportfs it87 hwmon_vid coretemp loop firewire_sbp2 snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep nouveau ttm drm_kms_helper snd_pcm drm snd_timer snd soundcore i2c_i801 i2c_
Feb 7 13:59:47 servidoradantra2 kernel: algo_bit parport_pc i2c_core snd_page_alloc parport psmouse evdev button pcspkr serio_raw processor ext3 jbd mbcache dm_mod sg usbhid hid sr_mod cdrom ata_generic sd_mod crc_t10dif uhci_hcd pata_jmicron firewire_ohci thermal ahci firewire_core floppy crc_itu_t libata r8169 mii ehci_hcd scsi_mod thermal_sys sky2 usbcore nls_base [last unloaded: scsi_wait_scan]
Feb 7 13:59:47 servidoradantra2 kernel: [1864570.886462]
Feb 7 13:59:47 servidoradantra2 kernel: [1864570.886477] Pid: 28219, comm: ocfs2rec Not tainted (2.6.32-5-686-bigmem #1) 965P-DS4
Feb 7 13:59:47 servidoradantra2 kernel: [1864570.886505] EIP: 0060:[<fd01d47a>] EFLAGS: 00010246 CPU: 0
Feb 7 13:59:47 servidoradantra2 kernel: [1864570.886532] EIP is at __ocfs2_recovery_thread+0x3af/0x146d [ocfs2]
Feb 7 13:59:47 servidoradantra2 kernel: [1864570.886550] EAX: 00000001 EBX: f5da6800 ECX: 00000001 EDX: 00000001
Feb 7 13:59:47 servidoradantra2 kernel: [1864570.886569] ESI: 00000001 EDI: f6ade038 EBP: 00000000 ESP: e0cb9ed4
Feb 7 13:59:47 servidoradantra2 kernel: [1864570.886587] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Feb 7 13:59:47 servidoradantra2 kernel: [1864570.886605] Process ocfs2rec (pid: 28219, ti=e0cb8000 task=c91c0440 task.ti=e0cb8000)
Feb 7 13:59:47 servidoradantra2 kernel: [1864570.886633] Stack:
Feb 7 13:59:47 servidoradantra2 kernel: [1864570.886647] c91c0440 c91c0440 f5da689c 00000001 00000001 f5da6800 f6ade038 f6b21930
Feb 7 13:59:47 servidoradantra2 kernel: [1864570.886682] <0> 00000002 00010000 00000000 00010000 00000000 e6f91000 d2baa848 00000000
Feb 7 13:59:47 servidoradantra2 kernel: [1864570.886731] <0> 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Feb 7 13:59:47 servidoradantra2 kernel: [1864570.886790] Call Trace:
Feb 7 13:59:47 servidoradantra2 kernel: [1864570.886814] [<fd01d0cb>] ? __ocfs2_recovery_thread+0x0/0x146d [ocfs2]
Feb 7 13:59:47 servidoradantra2 kernel: [1864570.886835] [<c104a420>] ? kthread+0x61/0x66
Feb 7 13:59:47 servidoradantra2 kernel: [1864570.886853] [<c104a3bf>] ? kthread+0x0/0x66
Feb 7 13:59:47 servidoradantra2 kernel: [1864570.886871] [<c1008d87>] ? kernel_thread_helper+0x7/0x10
Feb 7 13:59:47 servidoradantra2 kernel: [1864570.886888] Code: 00 00 68 24 b7 05 fd 50 ff b2 2c 01 00 00 68 c9 47 06 fd e8 99 10 26 c4 83 c4 20 8b 5c 24 14 8b 44 24 0c 39 83 bc 00 00 00 75 04 <0f> 0b eb fe 8d 84 24 d0 00 00 00 c7 84 24 d0 00 00 00 00 00 00
Feb 7 13:59:47 servidoradantra2 kernel: [1864570.887102] EIP: [<fd01d47a>] __ocfs2_recovery_thread+0x3af/0x146d [ocfs2] SS:ESP 0068:e0cb9ed4
Feb 7 13:59:47 servidoradantra2 kernel: [1864570.887413] ---[ end trace 22961f2e1f624b7d ]---
Feb 7 14:07:19 servidoradantra2 kernel: imklog 4.6.4, log source = /proc/kmsg started.

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users