[Linux-cluster] GFS fail with iozone

2012-09-20 Thread Andrew Holway
Hello, I have set up a 4-node cluster. The nodes are interconnected with IPoIB (connected mode). Whilst running a benchmark with IOzone I got the following errors, and IO seems to have halted. Thanks, Andrew Sep 20 16:01:57 node001 kernel: INFO: task iozone:15816 blocked for more than 120 seconds.

Re: [Linux-cluster] GFS fail with iozone

2012-09-20 Thread Andrew Holway
Also, IOzone gave this error: Error writing block 29813, fd= 3 GFS2: fsid=nimble_cluster:gfs_test.0: jid=3: Trying to acquire journal lock... GFS2: fsid=nimble_cluster:gfs_test.0: jid=3: Looking at journal... GFS2: fsid=nimble_cluster:gfs_test.0: jid=3: Acquiring the transaction lock... GFS2:

Re: [Linux-cluster] GFS fail with iozone

2012-09-20 Thread Andrew Holway
It seems that my node004 is the problem. I cannot kill the iozone processes, and I found this in the logs: Sep 20 15:59:09 node004 kernel: sd 3:0:0:0: timing out command, waited 180s Sep 20 15:59:09 node004 kernel: sd 3:0:0:0: [sdi] Unhandled error code Sep 20 15:59:09 node004 kernel: sd 3:0:0:0:

[Linux-cluster] Problem with rgmanager / rgmanager #37: Error receiving header from 2 sz=0 CTX 0x1f5d420

2012-09-20 Thread Ralf Aumueller
Hello, we have a two-node CentOS 6.2 cluster (rgmanager-3.0.12.1-5). After a reboot of node2 the cluster no longer works as expected. On node2, clustat just says: clustat: Cluster Status for cluster1 @ Thu Sep 20 17:06:02 2012 Member Status: Quorate Member Name

Re: [Linux-cluster] Problem with rgmanager / rgmanager #37: Error receiving header from 2 sz=0 CTX 0x1f5d420

2012-09-20 Thread Heiko Nardmann
When I hear "BIOS update" and "Dell" together, alarm bells start ringing in my head... Okay, what I want to say is: did they also check the NIC firmware version? We had really big trouble here with the Broadcom NICs in our R610 machines. Probably that has nothing to do with your problem

Re: [Linux-cluster] Problem with rgmanager / rgmanager #37: Error receiving header from 2 sz=0 CTX 0x1f5d420

2012-09-20 Thread Digimer
On 09/20/2012 12:21 PM, Ralf Aumueller wrote: Hello, we have a two-node CentOS 6.2 cluster (rgmanager-3.0.12.1-5). After a reboot of node2 the cluster no longer works as expected. On node2, clustat just says: clustat: Cluster Status for cluster1 @ Thu Sep 20 17:06:02 2012 Member Status: Quorate