Hello everyone, once again. We are running into a problem that has now shown up 2 times, possibly 3 (on one occasion the systems looked different).
The environment is 6 HP DL360/380 G5 servers. eth0 is the public interface; eth1 and bond0 (eth2 and eth3) are used for the clusterware, and bond0 is also used for OCFS2. bond0 runs in active/passive mode. None of the network error counters show any errors, and even while the problem is occurring we can still communicate over the bond0 interface. This setup has been running for more than 2 months, but last Wednesday morning, and again today, we had 2 nodes causing locking problems. The problem starts with messages like these:

  Jan 23 03:20:44 dbprd01 kernel: o2net: no longer connected to node dbprd02 (num 1) at 192.168.202.2:7777
  Jan 23 03:20:46 dbprd01 kernel: (5172,0):dlm_send_proxy_ast_msg:459 ERROR: status = -107
  Jan 23 03:20:46 dbprd01 kernel: (5172,0):dlm_flush_asts:600 ERROR: status = -107
  Jan 23 03:20:46 dbprd01 kernel: (5172,0):dlm_send_proxy_ast_msg:459 ERROR: status = -107
  Jan 23 03:20:46 dbprd01 kernel: (5172,0):dlm_flush_asts:600 ERROR: status = -107

  Jan 23 03:20:44 dbprd02 kernel: (5096,0):o2net_sendpage:868 ERROR: sendpage of size 24 to node dbprd01 (num 0) at 192.168.202.1:7777 failed with -11
  Jan 23 03:20:44 dbprd02 kernel: o2net: no longer connected to node dbprd01 (num 0) at 192.168.202.1:7777

After these there are plenty more messages, such as "dlm_wait_for_node_death" and "dlm_send_remote_convert_request" on dbprd02, and "dlm_send_proxy_ast_msg" and "dlm_flush_asts" on dbprd01.

We are currently running OCFS2 1.2.5 on EL4 Update 5 x86_64 (kernel 2.6.9-55.ELsmp). I see there is one DLM-related bug fixed in 1.2.6/1.2.7, and I was wondering whether the above problem could be related to it or is something different.

Ulf Zimmermann | Senior System Architect
ATC-Onlane, Inc.
Web: www.atc-onlane.com
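The status values in these kernel messages are negative Linux errno codes. On Linux, 107 is ENOTCONN ("Transport endpoint is not connected") and 11 is EAGAIN ("Resource temporarily unavailable"), which fits the "no longer connected" messages. The following is a minimal sketch, not an OCFS2 tool, for pulling those values out of syslog lines and decoding them when comparing logs across nodes. The regular expression assumes only the two formats quoted above ("status = N" and "failed with N"), and the helper names extract_status and decode_status are hypothetical.

```python
import errno
import os
import re
from typing import Optional

# Matches the two status formats quoted in the report:
#   "... ERROR: status = -107"
#   "... failed with -11"
# Other o2net/dlm messages may use different wording (assumption).
STATUS_RE = re.compile(r"(?:status = |failed with )(-\d+)")


def extract_status(line: str) -> Optional[int]:
    """Return the negative status code from an o2net/dlm log line, or None."""
    match = STATUS_RE.search(line)
    return int(match.group(1)) if match else None


def decode_status(status: int) -> str:
    """Translate a negative kernel status (e.g. -107) into errno name and text.

    Uses the errno table of the host running this script, so the result only
    matches the cluster's kernel when the script is run on Linux.
    """
    code = abs(status)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


if __name__ == "__main__":
    sample_lines = [
        "Jan 23 03:20:46 dbprd01 kernel: (5172,0):dlm_send_proxy_ast_msg:459 "
        "ERROR: status = -107",
        "Jan 23 03:20:44 dbprd02 kernel: (5096,0):o2net_sendpage:868 ERROR: "
        "sendpage of size 24 to node dbprd01 (num 0) at 192.168.202.1:7777 "
        "failed with -11",
    ]
    for line in sample_lines:
        status = extract_status(line)
        if status is not None:
            print(status, decode_status(status))
```

Lines without a status value (such as the plain "o2net: no longer connected" messages) are skipped, so the script can be pointed at a whole syslog excerpt.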
_______________________________________________
Ocfs2-users mailing list
http://oss.oracle.com/mailman/listinfo/ocfs2-users
