Hello,
I have set up a 4-node cluster. The nodes are interconnected with IPoIB
(connected mode).
While running a benchmark with IOzone I got the following errors:
IO seems to have halted.
Thanks,
Andrew
Sep 20 16:01:57 node001 kernel: INFO: task iozone:15816 blocked for more than
120 seconds.
Also,
IOzone gave this error: Error writing block 29813, fd= 3
GFS2: fsid=nimble_cluster:gfs_test.0: jid=3: Trying to acquire journal lock...
GFS2: fsid=nimble_cluster:gfs_test.0: jid=3: Looking at journal...
GFS2: fsid=nimble_cluster:gfs_test.0: jid=3: Acquiring the transaction lock...
GFS2:
It seems that node004 is the problem.
I cannot kill the iozone processes, and I find this in the logs:
Sep 20 15:59:09 node004 kernel: sd 3:0:0:0: timing out command, waited 180s
Sep 20 15:59:09 node004 kernel: sd 3:0:0:0: [sdi] Unhandled error code
Sep 20 15:59:09 node004 kernel: sd 3:0:0:0:
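
To see which node reported trouble first, kernel messages like the ones quoted above can be grouped by host. What follows is only a minimal sketch in Python, assuming the usual syslog prefix ("Sep 20 15:59:09 node004 kernel: ...") and wrapped lines already joined back together. The function name `summarize` and the three patterns are illustrative choices taken from the messages in this thread, not part of any cluster tool.

```python
import re
from collections import defaultdict

# Assumed syslog layout: "Sep 20 15:59:09 node004 kernel: <message>"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+[\d:]{8})\s+(\S+)\s+kernel:\s+(.*)$")

# Illustrative patterns, copied from the messages quoted in this thread
PATTERNS = {
    "hung_task": re.compile(r"blocked for more than\s+\d+\s+seconds"),
    "scsi_timeout": re.compile(r"timing out command"),
    "scsi_error": re.compile(r"Unhandled error code"),
}


def summarize(lines):
    """Return {host: {category: count}} for matching kernel messages."""
    summary = defaultdict(lambda: defaultdict(int))
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # not a kernel syslog line in the assumed layout
        _timestamp, host, message = match.groups()
        for name, pattern in PATTERNS.items():
            if pattern.search(message):
                summary[host][name] += 1
    # Convert nested defaultdicts to plain dicts for easier printing
    return {host: dict(counts) for host, counts in summary.items()}


if __name__ == "__main__":
    sample = [
        "Sep 20 16:01:57 node001 kernel: INFO: task iozone:15816 "
        "blocked for more than 120 seconds.",
        "Sep 20 15:59:09 node004 kernel: sd 3:0:0:0: timing out command, waited 180s",
        "Sep 20 15:59:09 node004 kernel: sd 3:0:0:0: [sdi] Unhandled error code",
    ]
    for host, counts in sorted(summarize(sample).items()):
        print(host, counts)
```

On the lines quoted here, this would attribute the SCSI timeouts to node004 and the hung iozone task to node001, which fits the guess that the storage path on node004 stalled first and the other nodes blocked behind it.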
Hello,
we have a two-node CentOS 6.2 cluster (rgmanager-3.0.12.1-5). After a reboot of
node2 the cluster does not work as expected. On node2, clustat just says:
clustat:
Cluster Status for cluster1 @ Thu Sep 20 17:06:02 2012
Member Status: Quorate
Member Name
When I hear "BIOS update" and "Dell", alarm bells start ringing in
my head ...
Okay, what I want to say is: did they also check the NIC firmware
version? We had really big trouble here with the Broadcom NICs in our
R610 machines.
That probably has nothing to do with your problem.
On 09/20/2012 12:21 PM, Ralf Aumueller wrote:
Hello,
we have a two-node CentOS 6.2 cluster (rgmanager-3.0.12.1-5). After a reboot of
node2 the cluster does not work as expected. On node2, clustat just says:
clustat:
Cluster Status for cluster1 @ Thu Sep 20 17:06:02 2012
Member Status: Quorate