Re: [Linux-cluster] 3 node cluster problems

2008-03-27 Thread Bennie Thomas
How are your cluster connections made? (i.e. are you using a hub, a
switch, or directly connected heartbeat cables?)


Dalton, Maurice wrote:
Still having the problem. I can't figure it out. 


I just upgraded to the latest 5.1 cman. No help!


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bennie Thomas
Sent: Tuesday, March 25, 2008 10:57 AM
To: linux clustering
Subject: Re: [Linux-cluster] 3 node cluster problems


Glad they are working. I have not used LVM with our clusters. You have now
piqued my curiosity, and I will have to try building one. So were you also
using GFS?

Dalton, Maurice wrote:

Sorry but security here will not allow me to send host files

BUT.


I was getting this in /var/log/messages on csarcsys3

Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate.
Refusing connection.
Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Error while processing
connect: Connection refused
Mar 25 15:26:12 csarcsys3-eth0 dlm_controld[7476]: connect to ccs error
-111, check ccsd or cluster status
Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate.
Refusing connection.
Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Error while processing
connect: Connection refused


I had /dev/vg0/gfsvol on these systems.

I did an lvremove.


Restarted cman on all systems and for some strange reason my clusters
are working.

It doesn't make any sense.

I can't thank you enough for your help...!!


Thanks.


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bennie Thomas
Sent: Tuesday, March 25, 2008 10:27 AM
To: linux clustering
Subject: Re: [Linux-cluster] 3 node cluster problems

I am currently running several 3-node clusters without a quorum disk.
However, if you want your cluster to run when only one node is up, you
will need a quorum disk. Can you send your /etc/hosts file for all
systems? Also, could there be another node named csarcsys3-eth0 in your
NIS or DNS?


I configured some using Conga and some with system-config-cluster. When
using system-config-cluster, I basically run the config on all nodes, just
adding the node names and cluster name. I reboot all nodes to make sure
they see each other, then go back and modify the config files.

The file /var/log/messages should also shed some light on the problem.

Dalton, Maurice wrote:


Same problem.

I now have qdiskd running.

I have run diffs on all three cluster.conf files; all are the same.

[EMAIL PROTECTED] cluster]# more cluster.conf

<?xml version="1.0"?>
<cluster config_version="6" name="csarcsys5">
    <fence_daemon post_fail_delay="0" post_join_delay="3"/>
    <clusternodes>
        <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
            <fence/>
        </clusternode>
    </clusternodes>
    <cman/>
    <fencedevices/>
    <rm>
        <failoverdomains>
            <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
                <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
                <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
                <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <resources>
            <ip address="172.24.86.177" monitor_link="1"/>
            <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
        </resources>
    </rm>
    <quorumd interval="4" label="csarcsysQ" min_score="1" tko="30" votes="2"/>
</cluster>

More info from csarcsys3

[EMAIL PROTECTED] cluster]# clustat

msg_open: No such file or directory

Member Status: Inquorate

Member Name ID Status

--   --

csarcsys1-eth0 1 Offline

csarcsys2-eth0 2 Offline

csarcsys3-eth0 3 Online, Local

/dev/sdd1 0 Offline

[EMAIL PROTECTED] cluster]# mkqdisk -L

mkqdisk v0.5.1

/dev/sdd1:

Magic: eb7a62c2

Label: csarcsysQ

Created: Wed Feb 13 13:44:35 2008

Host: csarcsys1-eth0.xxx.xxx.nasa.gov

[EMAIL PROTECTED] cluster]# ls -l /dev/sdd1

brw-r----- 1 root disk 8, 49 Mar 25 14:09 /dev/sdd1

clustat from csarcsys1

msg_open: No such file or directory

Member Status: Quorate

Member Name ID Status

--   --

csarcsys1-eth0 1 Online, Local

csarcsys2-eth0 2 Online

csarcsys3-eth0 3 Offline

/dev/sdd1 0 Offline, Quorum Disk

[EMAIL PROTECTED] cluster]# ls -l /dev/sdd1

brw-r----- 1 root disk 8, 49 Mar 25 14:19 /dev/sdd1

mkqdisk v0.5.1

/dev/sdd1:

Magic: eb7a62c2

Label: csarcsysQ

Created: Wed Feb 13 13:44:35 2008

Host: csarcsys1-eth0.xxx.xxx.nasa.gov

Info from csarcsys2

[EMAIL PROTECTED] cluster]# clustat

msg_open: No such file or directory

Member Status: Quorate

Member Name ID Status

--   --

csarcsys1-eth0 1 Offline

csarcsys2-eth0 2 Online, Local

csarcsys3-eth0 3 Offline

/dev/sdd1 0 Online, Quorum Disk

*From:* [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] *On Behalf Of *Panigrahi, 
Santosh Kumar

*Sent:* Tuesday, March

RE: [Linux-cluster] 3 node cluster problems

2008-03-27 Thread Dalton, Maurice
Cisco 3550


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bennie Thomas
Sent: Thursday, March 27, 2008 9:53 AM
To: linux clustering
Subject: Re: [Linux-cluster] 3 node cluster problems

What is the switch brand? I have read that RHCS has problems with
certain switches.

Dalton, Maurice wrote:
 Switches

 Storage is fiber


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Bennie Thomas
 Sent: Thursday, March 27, 2008 9:04 AM
 To: linux clustering
 Subject: Re: [Linux-cluster] 3 node cluster problems

 How are your cluster connections made? (i.e. are you using a hub, a
 switch, or directly connected heartbeat cables?)

 Dalton, Maurice wrote:
 Still having the problem. I can't figure it out. 

 I just upgraded to the latest 5.1 cman. No help!


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Bennie Thomas
 Sent: Tuesday, March 25, 2008 10:57 AM
 To: linux clustering
 Subject: Re: [Linux-cluster] 3 node cluster problems


 Glad they are working. I have not used LVM with our clusters. You have
 now piqued my curiosity, and I will have to try building one. So were
 you also using GFS?

 Dalton, Maurice wrote:
 Sorry but security here will not allow me to send host files

 BUT.


 I was getting this in /var/log/messages on csarcsys3

 Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate.
 Refusing connection.
 Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Error while processing
 connect: Connection refused
 Mar 25 15:26:12 csarcsys3-eth0 dlm_controld[7476]: connect to ccs error
 -111, check ccsd or cluster status
 Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate.
 Refusing connection.
 Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Error while processing
 connect: Connection refused


 I had /dev/vg0/gfsvol on these systems.

 I did an lvremove.

 Restarted cman on all systems and for some strange reason my clusters
 are working.

 It doesn't make any sense.

 I can't thank you enough for your help...!!


 Thanks.


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Bennie Thomas
 Sent: Tuesday, March 25, 2008 10:27 AM
 To: linux clustering
 Subject: Re: [Linux-cluster] 3 node cluster problems

 I am currently running several 3-node clusters without a quorum disk.
 However, if you want your cluster to run when only one node is up, you
 will need a quorum disk. Can you send your /etc/hosts file for all
 systems? Also, could there be another node named csarcsys3-eth0 in your
 NIS or DNS?

 I configured some using Conga and some with system-config-cluster. When
 using system-config-cluster, I basically run the config on all nodes,
 just adding the node names and cluster name. I reboot all nodes to make
 sure they see each other, then go back and modify the config files.

 The file /var/log/messages should also shed some light on the problem.
 Dalton, Maurice wrote:
 Same problem.

 I now have qdiskd running.

 I have run diffs on all three cluster.conf files; all are the same.

 [EMAIL PROTECTED] cluster]# more cluster.conf

 <?xml version="1.0"?>
 <cluster config_version="6" name="csarcsys5">
     <fence_daemon post_fail_delay="0" post_join_delay="3"/>
     <clusternodes>
         <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
             <fence/>
         </clusternode>
         <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
             <fence/>
         </clusternode>
         <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
             <fence/>
         </clusternode>
     </clusternodes>
     <cman/>
     <fencedevices/>
     <rm>
         <failoverdomains>
             <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
                 <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
                 <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
                 <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
             </failoverdomain>
         </failoverdomains>
         <resources>
             <ip address="172.24.86.177" monitor_link="1"/>
             <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
         </resources>
     </rm>
     <quorumd interval="4" label="csarcsysQ" min_score="1" tko="30" votes="2"/>
 </cluster>

 More info from csarcsys3

 [EMAIL PROTECTED] cluster]# clustat

 msg_open: No such file or directory

 Member Status: Inquorate

 Member Name ID Status

 --   --

 csarcsys1-eth0 1 Offline

 csarcsys2-eth0 2 Offline

 csarcsys3-eth0 3 Online, Local

 /dev/sdd1 0 Offline

 [EMAIL PROTECTED] cluster]# mkqdisk -L

 mkqdisk v0.5.1

 /dev/sdd1:

 Magic: eb7a62c2

 Label: csarcsysQ

 Created: Wed Feb 13 13:44:35 2008

 Host: csarcsys1-eth0.xxx.xxx.nasa.gov

 [EMAIL PROTECTED] cluster]# ls -l /dev/sdd1

 brw-r----- 1 root disk 8, 49 Mar 25 14:09 /dev/sdd1

 clustat from csarcsys1

Re: [Linux-cluster] 3 node cluster problems

2008-03-27 Thread Bennie Thomas
Are you using a private VLAN for your cluster communications? If not,
you should be. The communications between the clustered nodes are very
chatty. Just my opinion.




Dalton, Maurice wrote:

Cisco 3550


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bennie Thomas
Sent: Thursday, March 27, 2008 9:53 AM
To: linux clustering
Subject: Re: [Linux-cluster] 3 node cluster problems

What is the switch brand? I have read that RHCS has problems with
certain switches.

Dalton, Maurice wrote:

Switches

Storage is fiber


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bennie Thomas
Sent: Thursday, March 27, 2008 9:04 AM
To: linux clustering
Subject: Re: [Linux-cluster] 3 node cluster problems

How are your cluster connections made? (i.e. are you using a hub, a
switch, or directly connected heartbeat cables?)


Dalton, Maurice wrote:

Still having the problem. I can't figure it out. 


I just upgraded to the latest 5.1 cman. No help!


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bennie Thomas
Sent: Tuesday, March 25, 2008 10:57 AM
To: linux clustering
Subject: Re: [Linux-cluster] 3 node cluster problems


Glad they are working. I have not used LVM with our clusters. You have now
piqued my curiosity, and I will have to try building one. So were you also
using GFS?

Dalton, Maurice wrote:

Sorry but security here will not allow me to send host files

BUT.


I was getting this in /var/log/messages on csarcsys3

Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate.
Refusing connection.
Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Error while processing
connect: Connection refused
Mar 25 15:26:12 csarcsys3-eth0 dlm_controld[7476]: connect to ccs error
-111, check ccsd or cluster status
Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate.
Refusing connection.
Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Error while processing
connect: Connection refused


I had /dev/vg0/gfsvol on these systems.

I did an lvremove.

Restarted cman on all systems and for some strange reason my clusters
are working.

It doesn't make any sense.

I can't thank you enough for your help...!!


Thanks.


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bennie Thomas
Sent: Tuesday, March 25, 2008 10:27 AM
To: linux clustering
Subject: Re: [Linux-cluster] 3 node cluster problems

I am currently running several 3-node clusters without a quorum disk.
However, if you want your cluster to run when only one node is up, you
will need a quorum disk. Can you send your /etc/hosts file for all
systems? Also, could there be another node named csarcsys3-eth0 in your
NIS or DNS?


I configured some using Conga and some with system-config-cluster. When
using system-config-cluster, I basically run the config on all nodes, just
adding the node names and cluster name. I reboot all nodes to make sure
they see each other, then go back and modify the config files.

The file /var/log/messages should also shed some light on the problem.
Dalton, Maurice wrote:

Same problem.

I now have qdiskd running.

I have run diffs on all three cluster.conf files; all are the same.

[EMAIL PROTECTED] cluster]# more cluster.conf

 <?xml version="1.0"?>
 <cluster config_version="6" name="csarcsys5">
     <fence_daemon post_fail_delay="0" post_join_delay="3"/>
     <clusternodes>
         <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
             <fence/>
         </clusternode>
         <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
             <fence/>
         </clusternode>
         <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
             <fence/>
         </clusternode>
     </clusternodes>
     <cman/>
     <fencedevices/>
     <rm>
         <failoverdomains>
             <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
                 <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
                 <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
                 <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
             </failoverdomain>
         </failoverdomains>
         <resources>
             <ip address="172.24.86.177" monitor_link="1"/>
             <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
         </resources>
     </rm>
     <quorumd interval="4" label="csarcsysQ" min_score="1" tko="30" votes="2"/>

Re: [Linux-cluster] 3 node cluster problems

2008-03-27 Thread John Ruemker
I believe some of the Cisco switches do not have multicast enabled by
default, which would prevent some of the cluster communications from
getting through properly.


   http://kbase.redhat.com/faq/FAQ_51_11755
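
One quick sanity check (a rough sketch, not from the KB article; the
239.192.0.0/16 range below is the usual cman/openais default and may
differ on your cluster) is to watch the heartbeat interface while cman
starts:

# show the multicast address cman is actually using
cman_tool status | grep -i multicast

# watch for that traffic arriving on the cluster interface
tcpdump -n -i eth0 net 239.192.0.0/16

If the packets leave one node but never show up on the others, the switch
is eating them.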

John

Bennie Thomas wrote:
Are you using a private VLAN for your cluster communications? If not,
you should be. The communications between the clustered nodes are very
chatty. Just my opinion.




Dalton, Maurice wrote:

Cisco 3550


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bennie Thomas
Sent: Thursday, March 27, 2008 9:53 AM
To: linux clustering
Subject: Re: [Linux-cluster] 3 node cluster problems

What is the switch brand? I have read that RHCS has problems with
certain switches.

Dalton, Maurice wrote:

Switches

Storage is fiber


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bennie Thomas
Sent: Thursday, March 27, 2008 9:04 AM
To: linux clustering
Subject: Re: [Linux-cluster] 3 node cluster problems

How are your cluster connections made? (i.e. are you using a hub, a
switch, or directly connected heartbeat cables?)


Dalton, Maurice wrote:

Still having the problem. I can't figure it out.
I just upgraded to the latest 5.1 cman. No help!


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bennie Thomas
Sent: Tuesday, March 25, 2008 10:57 AM
To: linux clustering
Subject: Re: [Linux-cluster] 3 node cluster problems


Glad they are working. I have not used LVM with our clusters. You have now
piqued my curiosity, and I will have to try building one. So were you also
using GFS?

Dalton, Maurice wrote:

Sorry but security here will not allow me to send host files

BUT.


I was getting this in /var/log/messages on csarcsys3

Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate.
Refusing connection.
Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Error while processing
connect: Connection refused
Mar 25 15:26:12 csarcsys3-eth0 dlm_controld[7476]: connect to ccs error
-111, check ccsd or cluster status
Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate.
Refusing connection.
Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Error while processing
connect: Connection refused


I had /dev/vg0/gfsvol on these systems.

I did an lvremove.
Restarted cman on all systems and for some strange reason my clusters
are working.

It doesn't make any sense.

I can't thank you enough for your help...!!


Thanks.


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bennie Thomas
Sent: Tuesday, March 25, 2008 10:27 AM
To: linux clustering
Subject: Re: [Linux-cluster] 3 node cluster problems

I am currently running several 3-node clusters without a quorum disk.
However, if you want your cluster to run when only one node is up, you
will need a quorum disk. Can you send your /etc/hosts file for all
systems? Also, could there be another node named csarcsys3-eth0 in your
NIS or DNS?


I configured some using Conga and some with system-config-cluster. When
using system-config-cluster, I basically run the config on all nodes, just
adding the node names and cluster name. I reboot all nodes to make sure
they see each other, then go back and modify the config files.

The file /var/log/messages should also shed some light on the problem.

Dalton, Maurice wrote:

Same problem.

I now have qdiskd running.

I have run diffs on all three cluster.conf files; all are the same.

[EMAIL PROTECTED] cluster]# more cluster.conf

<?xml version="1.0"?>
<cluster config_version="6" name="csarcsys5">
    <fence_daemon post_fail_delay="0" post_join_delay="3"/>
    <clusternodes>
        <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
            <fence/>
        </clusternode>
    </clusternodes>
    <cman/>
    <fencedevices/>
    <rm>
        <failoverdomains>
            <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
                <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
                <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
                <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <resources>
            <ip address="172.24.86.177" monitor_link="1"/>
            <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>

RE: [Linux-cluster] 3 node cluster problems

2008-03-27 Thread Dalton, Maurice
Ours are enabled. I have verified that.
Thanks


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of John Ruemker
Sent: Thursday, March 27, 2008 10:41 AM
To: linux clustering
Subject: Re: [Linux-cluster] 3 node cluster problems

I believe some of the Cisco switches do not have multicast enabled by
default, which would prevent some of the cluster communications from
getting through properly.

http://kbase.redhat.com/faq/FAQ_51_11755

John

Bennie Thomas wrote:
 Are you using a private VLAN for your cluster communications? If not,
 you should be. The communications between the clustered nodes are very
 chatty. Just my opinion.



 Dalton, Maurice wrote:
 Cisco 3550


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Bennie Thomas
 Sent: Thursday, March 27, 2008 9:53 AM
 To: linux clustering
 Subject: Re: [Linux-cluster] 3 node cluster problems

 What is the switch brand? I have read that RHCS has problems with
 certain switches.

 Dalton, Maurice wrote:
 Switches

 Storage is fiber


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Bennie Thomas
 Sent: Thursday, March 27, 2008 9:04 AM
 To: linux clustering
 Subject: Re: [Linux-cluster] 3 node cluster problems

 How are your cluster connections made? (i.e. are you using a hub, a
 switch, or directly connected heartbeat cables?)

 Dalton, Maurice wrote:
 Still having the problem. I can't figure it out.
 I just upgraded to the latest 5.1 cman. No help!


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Bennie
Thomas
 Sent: Tuesday, March 25, 2008 10:57 AM
 To: linux clustering
 Subject: Re: [Linux-cluster] 3 node cluster problems


 Glad they are working. I have not used LVM with our clusters. You have
 now piqued my curiosity, and I will have to try building one. So were
 you also using GFS?

 Dalton, Maurice wrote:
 Sorry but security here will not allow me to send host files

 BUT.


 I was getting this in /var/log/messages on csarcsys3

 Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate.
 Refusing connection.
 Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Error while processing
 connect: Connection refused
 Mar 25 15:26:12 csarcsys3-eth0 dlm_controld[7476]: connect to ccs error
 -111, check ccsd or cluster status
 Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate.
 Refusing connection.
 Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Error while processing
 connect: Connection refused


 I had /dev/vg0/gfsvol on these systems.

 I did an lvremove.
 Restarted cman on all systems and for some strange reason my clusters
 are working.

 It doesn't make any sense.

 I can't thank you enough for your help...!!


 Thanks.


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Bennie
Thomas
 Sent: Tuesday, March 25, 2008 10:27 AM
 To: linux clustering
 Subject: Re: [Linux-cluster] 3 node cluster problems

 I am currently running several 3-node clusters without a quorum disk.
 However, if you want your cluster to run when only one node is up, you
 will need a quorum disk. Can you send your /etc/hosts file for all
 systems? Also, could there be another node named csarcsys3-eth0 in your
 NIS or DNS?

 I configured some using Conga and some with system-config-cluster. When
 using system-config-cluster, I basically run the config on all nodes,
 just adding the node names and cluster name. I reboot all nodes to make
 sure they see each other, then go back and modify the config files.

 The file /var/log/messages should also shed some light on the problem.
 Dalton, Maurice wrote:

 Same problem.

 I now have qdiskd running.

 I have run diffs on all three cluster.conf files; all are the same.
 [EMAIL PROTECTED] cluster]# more cluster.conf

 <?xml version="1.0"?>
 <cluster config_version="6" name="csarcsys5">
     <fence_daemon post_fail_delay="0" post_join_delay="3"/>
     <clusternodes>
         <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
             <fence/>
         </clusternode>
         <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
             <fence/>
         </clusternode>
         <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
             <fence/>
         </clusternode>
     </clusternodes>
     <cman/>
     <fencedevices/>
     <rm>
         <failoverdomains>
             <failoverdomain name="csarcsysfo" ordered="0" restricted="1">

RE: [Linux-cluster] 3 node cluster problems

2008-03-27 Thread Dalton, Maurice
I have removed the 3rd server. As long as I am running with 2 nodes and
qdisk, I am not seeing any problems.

I add the 3rd server and my problems begin.


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bennie Thomas
Sent: Thursday, March 27, 2008 10:28 AM
To: linux clustering
Subject: Re: [Linux-cluster] 3 node cluster problems

Are you using a private VLAN for your cluster communications? If not,
you should be. The communications between the clustered nodes are very
chatty. Just my opinion.



Dalton, Maurice wrote:
 Cisco 3550


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Bennie Thomas
 Sent: Thursday, March 27, 2008 9:53 AM
 To: linux clustering
 Subject: Re: [Linux-cluster] 3 node cluster problems

 What is the switch brand? I have read that RHCS has problems with
 certain switches.

 Dalton, Maurice wrote:
 Switches

 Storage is fiber


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Bennie Thomas
 Sent: Thursday, March 27, 2008 9:04 AM
 To: linux clustering
 Subject: Re: [Linux-cluster] 3 node cluster problems

 How are your cluster connections made? (i.e. are you using a hub, a
 switch, or directly connected heartbeat cables?)

 Dalton, Maurice wrote:
 Still having the problem. I can't figure it out. 

 I just upgraded to the latest 5.1 cman. No help!


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Bennie Thomas
 Sent: Tuesday, March 25, 2008 10:57 AM
 To: linux clustering
 Subject: Re: [Linux-cluster] 3 node cluster problems


 Glad they are working. I have not used LVM with our clusters. You have
 now piqued my curiosity, and I will have to try building one. So were
 you also using GFS?

 Dalton, Maurice wrote:
 Sorry but security here will not allow me to send host files

 BUT.


 I was getting this in /var/log/messages on csarcsys3

 Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate.
 Refusing connection.
 Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Error while processing
 connect: Connection refused
 Mar 25 15:26:12 csarcsys3-eth0 dlm_controld[7476]: connect to ccs error
 -111, check ccsd or cluster status
 Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate.
 Refusing connection.
 Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Error while processing
 connect: Connection refused


 I had /dev/vg0/gfsvol on these systems.

 I did an lvremove.

 Restarted cman on all systems and for some strange reason my clusters
 are working.

 It doesn't make any sense.

 I can't thank you enough for your help...!!


 Thanks.


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Bennie
Thomas
 Sent: Tuesday, March 25, 2008 10:27 AM
 To: linux clustering
 Subject: Re: [Linux-cluster] 3 node cluster problems

 I am currently running several 3-node clusters without a quorum disk.
 However, if you want your cluster to run when only one node is up, you
 will need a quorum disk. Can you send your /etc/hosts file for all
 systems? Also, could there be another node named csarcsys3-eth0 in your
 NIS or DNS?

 I configured some using Conga and some with system-config-cluster. When
 using system-config-cluster, I basically run the config on all nodes,
 just adding the node names and cluster name. I reboot all nodes to make
 sure they see each other, then go back and modify the config files.

 The file /var/log/messages should also shed some light on the problem.
 Dalton, Maurice wrote:
 Same problem.

 I now have qdiskd running.

 I have run diffs on all three cluster.conf files; all are the same.
 [EMAIL PROTECTED] cluster]# more cluster.conf

 <?xml version="1.0"?>
 <cluster config_version="6" name="csarcsys5">
     <fence_daemon post_fail_delay="0" post_join_delay="3"/>
     <clusternodes>
         <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
             <fence/>
         </clusternode>
         <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
             <fence/>
         </clusternode>
         <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
             <fence/>
         </clusternode>
     </clusternodes>
     <cman/>
     <fencedevices/>
     <rm>
         <failoverdomains>
             <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
                 <failoverdomainnode name="csarcsys1-eth0" priority="1"/>

RE: [Linux-cluster] 3 node cluster problems

2008-03-25 Thread Dalton, Maurice
Still no change. Same as below. 

I completely rebuilt the cluster using system-config-cluster.
The cluster software was installed from RHN; luci and ricci are running.

This is the new config file, and it has been copied to the two other
systems:



[EMAIL PROTECTED] cluster]# more cluster.conf
<?xml version="1.0"?>
<cluster config_version="5" name="csarcsys5">
    <fence_daemon post_fail_delay="0" post_join_delay="3"/>
    <clusternodes>
        <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
            <fence/>
        </clusternode>
    </clusternodes>
    <cman/>
    <fencedevices/>
    <rm>
        <failoverdomains>
            <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
                <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
                <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
                <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <resources>
            <ip address="172.xx.xx.xxx" monitor_link="1"/>
            <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
        </resources>
    </rm>
</cluster>

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bennie Thomas
Sent: Monday, March 24, 2008 4:17 PM
To: linux clustering
Subject: Re: [Linux-cluster] 3 node cluster problems

Did you load the cluster software via Conga or manually? You would have
had to load luci on one node and ricci on all three.

Try copying the modified /etc/cluster/cluster.conf from csarcsys1 to the
other two nodes. Make sure you can ping the private interface to/from all
nodes and reboot. If this does not work, post your
/etc/cluster/cluster.conf file again.


Dalton, Maurice wrote:
 Yes
 I also rebooted again just now to be sure.


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Bennie Thomas
 Sent: Monday, March 24, 2008 3:33 PM
 To: linux clustering
 Subject: Re: [Linux-cluster] 3 node cluster problems

 When you changed the node names in /etc/cluster/cluster.conf and made
 sure the /etc/hosts file had the correct node names
 (i.e. 10.0.0.100  csarcsys1-eth0  csarcsys1-eth0...xxx.),
 did you reboot all the nodes at the same time?
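
 For illustration (the 10.0.0.x addresses below are made up, not taken
 from your systems), an /etc/hosts that is identical on all three nodes
 would look something like:

 # /etc/hosts -- same on every node; 10.0.0.0/24 is a hypothetical
 # private heartbeat subnet
 127.0.0.1    localhost.localdomain localhost
 10.0.0.100   csarcsys1-eth0
 10.0.0.101   csarcsys2-eth0
 10.0.0.102   csarcsys3-eth0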

 Dalton, Maurice wrote:
 No luck. It seems as if csarcsys3 thinks it's in its own cluster.
 I renamed all config files and rebuilt from system-config-cluster.

 Clustat command from csarcsys3


 [EMAIL PROTECTED] cluster]# clustat
 msg_open: No such file or directory
 Member Status: Inquorate

   Member NameID   Status
   --  --
   csarcsys1-eth01 Offline
   csarcsys2-eth02 Offline
   csarcsys3-eth03 Online, Local

 clustat command from csarcsys2 

 [EMAIL PROTECTED] cluster]# clustat
 msg_open: No such file or directory
 Member Status: Quorate

   Member NameID   Status
   --  --
   csarcsys1-eth01 Online
   csarcsys2-eth02 Online, Local
   csarcsys3-eth03 Offline


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Bennie Thomas
 Sent: Monday, March 24, 2008 2:25 PM
 To: linux clustering
 Subject: Re: [Linux-cluster] 3 node cluster problems

You will also need to make sure the clustered node names are in your
/etc/hosts file. Also, make sure your cluster network interface is up on
all nodes and that the /etc/cluster/cluster.conf is the same on all nodes.



 Dalton, Maurice wrote:
 The last post is incorrect.

 Fence is still hanging at start up.

 Here's another log message.

 Mar 24 19:03:14 csarcsys3-eth0 ccsd[6425]: Error while processing 
 connect: Connection refused

 Mar 24 19:03:15 csarcsys3-eth0 dlm_controld[6453]: connect to ccs 
 error -111, check ccsd or cluster status

 *From:* [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] *On Behalf Of *Bennie Thomas
 *Sent:* Monday, March 24, 2008 11:22 AM
 *To:* linux clustering
 *Subject:* Re: [Linux-cluster] 3 node cluster problems

 Try removing the fully qualified hostname from the cluster.conf file.


 Dalton, Maurice wrote:

 I have NO fencing equipment.

 I have been tasked to set up a 3-node cluster.

 Currently I am having problems getting cman (fence) to start.

 Fence will try to start up during cman startup but will fail.

 I tried

Re: [Linux-cluster] 3 node cluster problems

2008-03-25 Thread Bennie Thomas
I am currently running several 3-node clusters without a quorum disk.
However, if you want your cluster to run when only one node is up, you
will need a quorum disk. Can you send your /etc/hosts file for all
systems? Also, could there be another node named csarcsys3-eth0 in your
NIS or DNS?
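
Roughly, the vote arithmetic with the quorumd settings you posted
(votes="2" on the quorum disk) works out like this:

  3 nodes x 1 vote each       = 3
  quorum disk votes           = 2
  expected votes              = 5
  quorum = expected/2 + 1     = 5/2 + 1 = 3   (integer division)

  1 node up + quorum disk     = 1 + 2 = 3  -> quorate
  1 node up, no quorum disk   = 1          -> inquorate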


I configured some using Conga and some with system-config-cluster. When
using system-config-cluster, I basically run the config on all nodes, just
adding the node names and cluster name. I reboot all nodes to make sure
they see each other, then go back and modify the config files.

The file /var/log/messages should also shed some light on the problem.

Dalton, Maurice wrote:


Same problem.

I now have qdiskd running.

I have run diffs on all three cluster.conf files; all are the same.

[EMAIL PROTECTED] cluster]# more cluster.conf

<?xml version="1.0"?>
<cluster config_version="6" name="csarcsys5">
    <fence_daemon post_fail_delay="0" post_join_delay="3"/>
    <clusternodes>
        <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
            <fence/>
        </clusternode>
    </clusternodes>
    <cman/>
    <fencedevices/>
    <rm>
        <failoverdomains>
            <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
                <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
                <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
                <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <resources>
            <ip address="172.24.86.177" monitor_link="1"/>
            <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
        </resources>
    </rm>
    <quorumd interval="4" label="csarcsysQ" min_score="1" tko="30" votes="2"/>
</cluster>

More info from csarcsys3

[EMAIL PROTECTED] cluster]# clustat

msg_open: No such file or directory

Member Status: Inquorate

Member Name ID Status

--   --

csarcsys1-eth0 1 Offline

csarcsys2-eth0 2 Offline

csarcsys3-eth0 3 Online, Local

/dev/sdd1 0 Offline

[EMAIL PROTECTED] cluster]# mkqdisk -L

mkqdisk v0.5.1

/dev/sdd1:

Magic: eb7a62c2

Label: csarcsysQ

Created: Wed Feb 13 13:44:35 2008

Host: csarcsys1-eth0.xxx.xxx.nasa.gov

[EMAIL PROTECTED] cluster]# ls -l /dev/sdd1

brw-r----- 1 root disk 8, 49 Mar 25 14:09 /dev/sdd1

clustat from csarcsys1

msg_open: No such file or directory

Member Status: Quorate

Member Name ID Status

--   --

csarcsys1-eth0 1 Online, Local

csarcsys2-eth0 2 Online

csarcsys3-eth0 3 Offline

/dev/sdd1 0 Offline, Quorum Disk

[EMAIL PROTECTED] cluster]# ls -l /dev/sdd1

brw-r----- 1 root disk 8, 49 Mar 25 14:19 /dev/sdd1

mkqdisk v0.5.1

/dev/sdd1:

Magic: eb7a62c2

Label: csarcsysQ

Created: Wed Feb 13 13:44:35 2008

Host: csarcsys1-eth0.xxx.xxx.nasa.gov

Info from csarcsys2

[EMAIL PROTECTED] cluster]# clustat

msg_open: No such file or directory

Member Status: Quorate

Member Name ID Status

--   --

csarcsys1-eth0 1 Offline

csarcsys2-eth0 2 Online, Local

csarcsys3-eth0 3 Offline

/dev/sdd1 0 Online, Quorum Disk

*From:* [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] *On Behalf Of *Panigrahi, 
Santosh Kumar

*Sent:* Tuesday, March 25, 2008 7:33 AM
*To:* linux clustering
*Subject:* RE: [Linux-cluster] 3 node cluster problems

If you are configuring your cluster with system-config-cluster, then there
is no need to run ricci/luci. Ricci/luci are needed for configuring the
cluster using Conga. You can configure it either way.


Looking at your clustat command outputs, it seems the cluster is
partitioned (split brain) into two sub-clusters [Sub 1: (csarcsys1-eth0,
csarcsys2-eth0); Sub 2: csarcsys3-eth0]. Without a quorum device you can
face this situation more often. To avoid this you can configure a quorum
device with a heuristic such as a ping. Use the link
(http://www.redhatmagazine.com/2007/12/19/enhancing-cluster-quorum-with-qdisk/)
for configuring a quorum disk in RHCS.
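
A minimal sketch of such a heuristic in cluster.conf (the gateway address
10.0.0.1 and the score/interval values here are only illustrative, not
taken from your configuration):

<quorumd interval="4" label="csarcsysQ" min_score="1" tko="30" votes="2">
    <heuristic program="ping -c1 -t1 10.0.0.1" score="1" interval="2"/>
</quorumd>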


Thanks,

S

-Original Message-
From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of Dalton, Maurice

Sent: Tuesday, March 25, 2008 5:18 PM
To: linux clustering
Subject: RE: [Linux-cluster] 3 node cluster problems

Still no change. Same as below.

I completely rebuilt the cluster using system-config-cluster.

The cluster software was installed from RHN; luci and ricci are running.

This is the new config file, and it has been copied to the two other
systems:

[EMAIL PROTECTED] cluster]# more cluster.conf

<?xml version="1.0"?>
<cluster config_version="5" name="csarcsys5">
    <fence_daemon post_fail_delay="0" post_join_delay="3"/>
    <clusternodes>
        <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
            <fence/>
        </clusternode>
    </clusternodes>
    <cman/>
    <fencedevices/>
    <rm>
        <failoverdomains>
            <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
                <failoverdomainnode name="csarcsys1-eth0" priority="1"/>

RE: [Linux-cluster] 3 node cluster problems

2008-03-25 Thread Dalton, Maurice
Sorry but security here will not allow me to send host files

BUT.


I was getting this in /var/log/messages on csarcsys3

Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate.
Refusing connection.
Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Error while processing
connect: Connection refused
Mar 25 15:26:12 csarcsys3-eth0 dlm_controld[7476]: connect to ccs error
-111, check ccsd or cluster status
Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate.
Refusing connection.
Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Error while processing
connect: Connection refused


I had /dev/vg0/gfsvol on these systems.

I did an lvremove.

Restarted cman on all systems and for some strange reason my clusters
are working.

It doesn't make any sense.

I can't thank you enough for your help...!!


Thanks.


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bennie Thomas
Sent: Tuesday, March 25, 2008 10:27 AM
To: linux clustering
Subject: Re: [Linux-cluster] 3 node cluster problems

I am currently running several 3-node clusters without a quorum disk.
However, if you want your cluster to run when only one node is up, you
will need a quorum disk. Can you send your /etc/hosts file for all
systems? Also, could there be another node named csarcsys3-eth0 in your
NIS or DNS?

I configured some using Conga and some with system-config-cluster. When
using system-config-cluster, I basically run the config on all nodes, just
adding the node names and cluster name. I reboot all nodes to make sure
they see each other, then go back and modify the config files.

The file /var/log/messages should also shed some light on the problem.

Dalton, Maurice wrote:

 Same problem.

 I now have qdiskd running.

 I have run diffs on all three cluster.conf files; all are the same.

 [EMAIL PROTECTED] cluster]# more cluster.conf

 <?xml version="1.0"?>
 <cluster config_version="6" name="csarcsys5">
     <fence_daemon post_fail_delay="0" post_join_delay="3"/>
     <clusternodes>
         <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
             <fence/>
         </clusternode>
         <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
             <fence/>
         </clusternode>
         <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
             <fence/>
         </clusternode>
     </clusternodes>
     <cman/>
     <fencedevices/>
     <rm>
         <failoverdomains>
             <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
                 <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
                 <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
                 <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
             </failoverdomain>
         </failoverdomains>
         <resources>
             <ip address="172.24.86.177" monitor_link="1"/>
             <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
         </resources>
     </rm>
     <quorumd interval="4" label="csarcsysQ" min_score="1" tko="30" votes="2"/>
 </cluster>

 More info from csarcsys3

 [EMAIL PROTECTED] cluster]# clustat

 msg_open: No such file or directory

 Member Status: Inquorate

 Member Name ID Status

 --   --

 csarcsys1-eth0 1 Offline

 csarcsys2-eth0 2 Offline

 csarcsys3-eth0 3 Online, Local

 /dev/sdd1 0 Offline

 [EMAIL PROTECTED] cluster]# mkqdisk -L

 mkqdisk v0.5.1

 /dev/sdd1:

 Magic: eb7a62c2

 Label: csarcsysQ

 Created: Wed Feb 13 13:44:35 2008

 Host: csarcsys1-eth0.xxx.xxx.nasa.gov

 [EMAIL PROTECTED] cluster]# ls -l /dev/sdd1

 brw-r----- 1 root disk 8, 49 Mar 25 14:09 /dev/sdd1

 clustat from csarcsys1

 msg_open: No such file or directory

 Member Status: Quorate

 Member Name ID Status

 --   --

 csarcsys1-eth0 1 Online, Local

 csarcsys2-eth0 2 Online

 csarcsys3-eth0 3 Offline

 /dev/sdd1 0 Offline, Quorum Disk

 [EMAIL PROTECTED] cluster]# ls -l /dev/sdd1

 brw-r----- 1 root disk 8, 49 Mar 25 14:19 /dev/sdd1

 mkqdisk v0.5.1

 /dev/sdd1:

 Magic: eb7a62c2

 Label: csarcsysQ

 Created: Wed Feb 13 13:44:35 2008

 Host: csarcsys1-eth0.xxx.xxx.nasa.gov

 Info from csarcsys2

 [EMAIL PROTECTED] cluster]# clustat

 msg_open: No such file or directory

 Member Status: Quorate

 Member Name ID Status

 --   --

 csarcsys1-eth0 1 Offline

 csarcsys2-eth0 2 Online, Local

 csarcsys3-eth0 3 Offline

 /dev/sdd1 0 Online, Quorum Disk

 *From:* [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] *On Behalf Of *Panigrahi, 
 Santosh Kumar
 *Sent:* Tuesday, March 25, 2008 7:33 AM
 *To:* linux clustering
 *Subject:* RE: [Linux-cluster] 3 node cluster problems

 If you are configuring your cluster with system-config-cluster, then
 there is no need to run ricci/luci. Ricci/luci are needed for configuring
 the cluster using Conga. You can configure it either way.

 Looking at your clustat command outputs, it seems the cluster is
 partitioned (split brain) into two sub-clusters [Sub 1: (csarcsys1-eth0,
 csarcsys2-eth0); Sub 2: csarcsys3-eth0]. Without a quorum device you can
 face this situation more often. To avoid this you can configure a quorum
 device with a heuristic such as a ping. Use the link

(http

Re: [Linux-cluster] 3 node cluster problems

2008-03-25 Thread Bennie Thomas


Glad they are working. I have not used LVM with our clusters. You have now
piqued my curiosity, and I will have to try building one. So were you also
using GFS?


Dalton, Maurice wrote:

Sorry but security here will not allow me to send host files

BUT.


I was getting this in /var/log/messages on csarcsys3

Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate.
Refusing connection.
Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Error while processing
connect: Connection refused
Mar 25 15:26:12 csarcsys3-eth0 dlm_controld[7476]: connect to ccs error
-111, check ccsd or cluster status
Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate.
Refusing connection.
Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Error while processing
connect: Connection refused


I had /dev/vg0/gfsvol on these systems.

I did an lvremove.


Restarted cman on all systems and for some strange reason my clusters
are working.

It doesn't make any sense.

I can't thank you enough for your help...!!


Thanks.


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bennie Thomas
Sent: Tuesday, March 25, 2008 10:27 AM
To: linux clustering
Subject: Re: [Linux-cluster] 3 node cluster problems

I am currently running several 3-node clusters without a quorum disk.
However, if you want your cluster to run when only one node is up, you
will need a quorum disk. Can you send your /etc/hosts file for all
systems? Also, could there be another node named csarcsys3-eth0 in your
NIS or DNS?


I configured some using Conga and some with system-config-cluster. When
using system-config-cluster, I basically run the config on all nodes, just
adding the node names and cluster name. I reboot all nodes to make sure
they see each other, then go back and modify the config files.

The file /var/log/messages should also shed some light on the problem.

Dalton, Maurice wrote:

Same problem.

I now have qdiskd running.

I have run diffs on all three cluster.conf files; all are the same.

[EMAIL PROTECTED] cluster]# more cluster.conf

<?xml version="1.0"?>
<cluster config_version="6" name="csarcsys5">
    <fence_daemon post_fail_delay="0" post_join_delay="3"/>
    <clusternodes>
        <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
            <fence/>
        </clusternode>
    </clusternodes>
    <cman/>
    <fencedevices/>
    <rm>
        <failoverdomains>
            <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
                <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
                <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
                <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <resources>
            <ip address="172.24.86.177" monitor_link="1"/>
            <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
        </resources>
    </rm>
    <quorumd interval="4" label="csarcsysQ" min_score="1" tko="30" votes="2"/>
</cluster>

More info from csarcsys3

[EMAIL PROTECTED] cluster]# clustat

msg_open: No such file or directory

Member Status: Inquorate

Member Name ID Status

--   --

csarcsys1-eth0 1 Offline

csarcsys2-eth0 2 Offline

csarcsys3-eth0 3 Online, Local

/dev/sdd1 0 Offline

[EMAIL PROTECTED] cluster]# mkqdisk -L

mkqdisk v0.5.1

/dev/sdd1:

Magic: eb7a62c2

Label: csarcsysQ

Created: Wed Feb 13 13:44:35 2008

Host: csarcsys1-eth0.xxx.xxx.nasa.gov

[EMAIL PROTECTED] cluster]# ls -l /dev/sdd1

brw-r----- 1 root disk 8, 49 Mar 25 14:09 /dev/sdd1

clustat from csarcsys1

msg_open: No such file or directory

Member Status: Quorate

Member Name ID Status

--   --

csarcsys1-eth0 1 Online, Local

csarcsys2-eth0 2 Online

csarcsys3-eth0 3 Offline

/dev/sdd1 0 Offline, Quorum Disk

[EMAIL PROTECTED] cluster]# ls -l /dev/sdd1

brw-r----- 1 root disk 8, 49 Mar 25 14:19 /dev/sdd1

mkqdisk v0.5.1

/dev/sdd1:

Magic: eb7a62c2

Label: csarcsysQ

Created: Wed Feb 13 13:44:35 2008

Host: csarcsys1-eth0.xxx.xxx.nasa.gov

Info from csarcsys2

[EMAIL PROTECTED] cluster]# clustat

msg_open: No such file or directory

Member Status: Quorate

Member Name ID Status

--   --

csarcsys1-eth0 1 Offline

csarcsys2-eth0 2 Online, Local

csarcsys3-eth0 3 Offline

/dev/sdd1 0 Online, Quorum Disk

*From:* [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] *On Behalf Of *Panigrahi, 
Santosh Kumar

*Sent:* Tuesday, March 25, 2008 7:33 AM
*To:* linux clustering
*Subject:* RE: [Linux-cluster] 3 node cluster problems

If you are configuring your cluster with system-config-cluster, then there
is no need to run ricci/luci. Ricci/luci are needed for configuring the
cluster using Conga. You can configure it either way.


Looking at your clustat command outputs, it seems the cluster is
partitioned (split brain) into two sub-clusters [Sub 1: (csarcsys1-eth0,
csarcsys2-eth0); Sub 2: csarcsys3-eth0]. Without a quorum device you can
face this situation more often

RE: [Linux-cluster] 3 node cluster problems

2008-03-25 Thread Dalton, Maurice
I have just begun to test GFS.

Sadly, I rebooted csarcsys1 and now I am back in the same situation.

This is weird.


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bennie Thomas
Sent: Tuesday, March 25, 2008 10:57 AM
To: linux clustering
Subject: Re: [Linux-cluster] 3 node cluster problems


Glad they are working. I have not used LVM with our clusters. You have now
piqued my curiosity, and I will have to try building one. So were you also
using GFS?

Dalton, Maurice wrote:
 Sorry but security here will not allow me to send host files

 BUT.


 I was getting this in /var/log/messages on csarcsys3

 Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate.
 Refusing connection.
 Mar 25 15:26:11 csarcsys3-eth0 ccsd[7448]: Error while processing
 connect: Connection refused
 Mar 25 15:26:12 csarcsys3-eth0 dlm_controld[7476]: connect to ccs error
 -111, check ccsd or cluster status
 Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Cluster is not quorate.
 Refusing connection.
 Mar 25 15:26:12 csarcsys3-eth0 ccsd[7448]: Error while processing
 connect: Connection refused


 I had /dev/vg0/gfsvol on these systems.

 I did an lvremove.

 Restarted cman on all systems and for some strange reason my clusters
 are working.

 It doesn't make any sense.

 I can't thank you enough for your help...!!


 Thanks.


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Bennie Thomas
 Sent: Tuesday, March 25, 2008 10:27 AM
 To: linux clustering
 Subject: Re: [Linux-cluster] 3 node cluster problems

 I am currently running several 3-node clusters without a quorum disk.
 However, if you want your cluster to run when only one node is up, you
 will need a quorum disk. Can you send your /etc/hosts file for all
 systems? Also, could there be another node named csarcsys3-eth0 in your
 NIS or DNS?

 I configured some using Conga and some with system-config-cluster. When
 using system-config-cluster, I basically run the config on all nodes,
 just adding the node names and cluster name. I reboot all nodes to make
 sure they see each other, then go back and modify the config files.

 The file /var/log/messages should also shed some light on the problem.

 Dalton, Maurice wrote:
 Same problem.

 I now have qdiskd running.

 I have run diffs on all three cluster.conf files; all are the same.

 [EMAIL PROTECTED] cluster]# more cluster.conf

 <?xml version="1.0"?>
 <cluster config_version="6" name="csarcsys5">
     <fence_daemon post_fail_delay="0" post_join_delay="3"/>
     <clusternodes>
         <clusternode name="csarcsys1-eth0" nodeid="1" votes="1">
             <fence/>
         </clusternode>
         <clusternode name="csarcsys2-eth0" nodeid="2" votes="1">
             <fence/>
         </clusternode>
         <clusternode name="csarcsys3-eth0" nodeid="3" votes="1">
             <fence/>
         </clusternode>
     </clusternodes>
     <cman/>
     <fencedevices/>
     <rm>
         <failoverdomains>
             <failoverdomain name="csarcsysfo" ordered="0" restricted="1">
                 <failoverdomainnode name="csarcsys1-eth0" priority="1"/>
                 <failoverdomainnode name="csarcsys2-eth0" priority="1"/>
                 <failoverdomainnode name="csarcsys3-eth0" priority="1"/>
             </failoverdomain>
         </failoverdomains>
         <resources>
             <ip address="172.24.86.177" monitor_link="1"/>
             <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
         </resources>
     </rm>
     <quorumd interval="4" label="csarcsysQ" min_score="1" tko="30" votes="2"/>
 </cluster>

 More info from csarcsys3

 [EMAIL PROTECTED] cluster]# clustat

 msg_open: No such file or directory

 Member Status: Inquorate

 Member Name ID Status

 --   --

 csarcsys1-eth0 1 Offline

 csarcsys2-eth0 2 Offline

 csarcsys3-eth0 3 Online, Local

 /dev/sdd1 0 Offline

 [EMAIL PROTECTED] cluster]# mkqdisk -L

 mkqdisk v0.5.1

 /dev/sdd1:

 Magic: eb7a62c2

 Label: csarcsysQ

 Created: Wed Feb 13 13:44:35 2008

 Host: csarcsys1-eth0.xxx.xxx.nasa.gov

 [EMAIL PROTECTED] cluster]# ls -l /dev/sdd1

 brw-r----- 1 root disk 8, 49 Mar 25 14:09 /dev/sdd1

 clustat from csarcsys1

 msg_open: No such file or directory

 Member Status: Quorate

 Member Name ID Status

 --   --

 csarcsys1-eth0 1 Online, Local

 csarcsys2-eth0 2 Online

 csarcsys3-eth0 3 Offline

 /dev/sdd1 0 Offline, Quorum Disk

 [EMAIL PROTECTED] cluster]# ls -l /dev/sdd1

 brw-r----- 1 root disk 8, 49 Mar 25 14:19 /dev/sdd1

 mkqdisk v0.5.1

 /dev/sdd1:

 Magic: eb7a62c2

 Label: csarcsysQ

 Created: Wed Feb 13 13:44:35 2008

 Host: csarcsys1-eth0.xxx.xxx.nasa.gov

 Info from csarcsys2

 [EMAIL PROTECTED] cluster]# clustat

 msg_open: No such file or directory

 Member Status: Quorate

 Member Name ID Status

 --   --

 csarcsys1-eth0 1 Offline

 csarcsys2-eth0 2 Online, Local

 csarcsys3-eth0 3 Offline

 /dev/sdd1 0 Online, Quorum Disk

 *From:* [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] *On Behalf Of *Panigrahi, 
 Santosh Kumar
 *Sent:* Tuesday, March 25, 2008 7:33 AM
 *To:* linux clustering
 *Subject:* RE: [Linux

Re: [Linux-cluster] 3 node cluster problems

2008-03-24 Thread Bennie Thomas
You will also need to make sure the clustered node names are in your
/etc/hosts file. Also, make sure your cluster network interface is up on
all nodes and that the /etc/cluster/cluster.conf is the same on all nodes.



Dalton, Maurice wrote:


The last post is incorrect.

Fence is still hanging at start up.

Here’s another log message.

Mar 24 19:03:14 csarcsys3-eth0 ccsd[6425]: Error while processing 
connect: Connection refused


Mar 24 19:03:15 csarcsys3-eth0 dlm_controld[6453]: connect to ccs 
error -111, check ccsd or cluster status


*From:* [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] *On Behalf Of *Bennie Thomas

*Sent:* Monday, March 24, 2008 11:22 AM
*To:* linux clustering
*Subject:* Re: [Linux-cluster] 3 node cluster problems

try removing the fully qualified hostname from the cluster.conf file.


Dalton, Maurice wrote:

I have NO fencing equipment.

I have been tasked to set up a 3-node cluster.

Currently I am having problems getting cman (fence) to start.

Fence will try to start up during cman startup but will fail.

I tried to run /sbin/fenced -D and I get the following:

1206373475 cman_init error 0 111
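
As an aside (my inference, not from the logs): error 111 is ECONNREFUSED
on Linux, the same "Connection refused" that ccsd reports below, so fenced
simply could not reach ccsd/cman. One way to confirm the errno mapping:

python -c 'import errno; print errno.errorcode[111]'
# prints: ECONNREFUSED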

Here’s my cluster.conf file

<?xml version="1.0"?>
<cluster alias="csarcsys51" config_version="26" name="csarcsys51">
    <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
    <clusternodes>
        <clusternode name="csarcsys1-eth0.xxx..nasa.gov" nodeid="1" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys2-eth0.xxx..nasa.gov" nodeid="2" votes="1">
            <fence/>
        </clusternode>
        <clusternode name="csarcsys3-eth0.xxx.nasa.gov" nodeid="3" votes="1">
            <fence/>
        </clusternode>
    </clusternodes>
    <cman/>
    <fencedevices/>
    <rm>
        <failoverdomains>
            <failoverdomain name="csarcsys-fo" ordered="1" restricted="0">
                <failoverdomainnode name="csarcsys1-eth0.xxx..nasa.gov" priority="1"/>
                <failoverdomainnode name="csarcsys2-eth0.xxx..nasa.gov" priority="1"/>
                <failoverdomainnode name="csarcsys2-eth0.xxx..nasa.gov" priority="1"/>
            </failoverdomain>
        </failoverdomains>
        <resources>
            <ip address="xxx.xxx.xxx.xxx" monitor_link="1"/>
            <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
            <nfsexport name="csarcsys-export"/>
            <nfsclient name="csarcsys-nfs-client" options="no_root_squash,rw" path="/csarc-test" target="xxx.xxx.xxx.*"/>
        </resources>
    </rm>
</cluster>

Messages from the logs

Mar 24 13:24:19 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. 
Refusing connection.


Mar 24 13:24:19 csarcsys2-eth0 ccsd[24888]: Error while processing 
connect: Connection refused


Mar 24 13:24:20 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. 
Refusing connection.


Mar 24 13:24:20 csarcsys2-eth0 ccsd[24888]: Error while processing 
connect: Connection refused


Mar 24 13:24:21 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. 
Refusing connection.


Mar 24 13:24:21 csarcsys2-eth0 ccsd[24888]: Error while processing 
connect: Connection refused


Mar 24 13:24:22 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. 
Refusing connection.


Mar 24 13:24:22 csarcsys2-eth0 ccsd[24888]: Error while processing 
connect: Connection refused


Mar 24 13:24:23 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. 
Refusing connection.


Mar 24 13:24:23 csarcsys2-eth0 ccsd[24888]: Error while processing 
connect: Connection refused


 




  
 


RE: [Linux-cluster] 3 node cluster problems

2008-03-24 Thread Dalton, Maurice
No luck. It seems as if csarcsys3 thinks it's in its own cluster.
I renamed all the config files and rebuilt them with system-config-cluster.

Clustat command from csarcsys3


[EMAIL PROTECTED] cluster]# clustat
msg_open: No such file or directory
Member Status: Inquorate

  Member Name          ID   Status
  ------ ----          ---- ------
  csarcsys1-eth0       1    Offline
  csarcsys2-eth0       2    Offline
  csarcsys3-eth0       3    Online, Local

clustat command from csarcsys2

[EMAIL PROTECTED] cluster]# clustat
msg_open: No such file or directory
Member Status: Quorate

  Member Name          ID   Status
  ------ ----          ---- ------
  csarcsys1-eth0       1    Online
  csarcsys2-eth0       2    Online, Local
  csarcsys3-eth0       3    Offline
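A quick way to compare what each node believes about membership (a sketch,
assuming the stock RHEL5 cman tools are installed) is cman_tool, run on
every node:

[EMAIL PROTECTED] ~]# cman_tool status
[EMAIL PROTECTED] ~]# cman_tool nodes

The node count, expected votes, and member list should agree everywhere;
here they evidently do not, since csarcsys3 sees only itself.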


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bennie Thomas
Sent: Monday, March 24, 2008 2:25 PM
To: linux clustering
Subject: Re: [Linux-cluster] 3 node cluster problems

You will also need to make sure the clustered nodenames are in your 
/etc/hosts file.
Also, make sure your cluster network interface is up on all nodes and 
that the
/etc/cluster/cluster.conf is the same on all nodes.



--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

Re: [Linux-cluster] 3 node cluster problems

2008-03-24 Thread Bennie Thomas
When you changed the nodenames in the /etc/cluster/cluster.conf and made 
sure the /etc/hosts
file had the correct nodenames (i.e. 10.0.0.100  csarcsys1-eth0   
csarcsys1-eth0...xxx.)

Did you reboot all the nodes at the same time?
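One quick check (a sketch; getent is standard on RHEL5) is to confirm on
every node that each name resolves to the intended private address:

[EMAIL PROTECTED] ~]# getent hosts csarcsys1-eth0 csarcsys2-eth0 csarcsys3-eth0

getent follows /etc/nsswitch.conf, so this also catches a conflicting
entry coming back from another name service instead of /etc/hosts.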

Dalton, Maurice wrote:

No luck. It seems as if csarcsys3 thinks it's in its own cluster.
I renamed all the config files and rebuilt them with system-config-cluster.

--
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster

Re: [Linux-cluster] 3 node cluster problems

2008-03-24 Thread Bennie Thomas
Did you load the Cluster software via Conga or manually? You would have 
had to load
luci on one node and ricci on all three.

Try copying the modified /etc/cluster/cluster.conf from csarcsys1 to the 
other two nodes.
Make sure you can ping the private interface to/from all nodes and 
reboot; something like the commands below. If this does not work,
post your /etc/cluster/cluster.conf file again.
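A sketch of that, run from csarcsys1 (host names as used earlier in the
thread):

[EMAIL PROTECTED] ~]# scp /etc/cluster/cluster.conf csarcsys2-eth0:/etc/cluster/
[EMAIL PROTECTED] ~]# scp /etc/cluster/cluster.conf csarcsys3-eth0:/etc/cluster/
[EMAIL PROTECTED] ~]# ping -c 3 csarcsys2-eth0
[EMAIL PROTECTED] ~]# ping -c 3 csarcsys3-eth0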


Dalton, Maurice wrote:

Yes
I also rebooted again just now to be sure.


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bennie Thomas
Sent: Monday, March 24, 2008 3:33 PM
To: linux clustering
Subject: Re: [Linux-cluster] 3 node cluster problems

When you changed the nodenames in the /etc/cluster/cluster.conf and made
sure the /etc/hosts
file had the correct nodenames (i.e. 10.0.0.100  csarcsys1-eth0   
csarcsys1-eth0...xxx.)

Did you reboot all the nodes at the same time?

Dalton, Maurice wrote:
  

No luck. It seems as if csarcsys3 thinks it's in its own cluster.
I renamed all the config files and rebuilt them with system-config-cluster.

Clustat command from csarcsys3


[EMAIL PROTECTED] cluster]# clustat
msg_open: No such file or directory
Member Status: Inquorate

  Member Name          ID   Status
  ------ ----          ---- ------
  csarcsys1-eth0       1    Offline
  csarcsys2-eth0       2    Offline
  csarcsys3-eth0       3    Online, Local

clustat command from csarcsys2


[EMAIL PROTECTED] cluster]# clustat
msg_open: No such file or directory
Member Status: Quorate

  Member Name          ID   Status
  ------ ----          ---- ------
  csarcsys1-eth0       1    Online
  csarcsys2-eth0       2    Online, Local
  csarcsys3-eth0       3    Offline


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bennie Thomas
Sent: Monday, March 24, 2008 2:25 PM
To: linux clustering
Subject: Re: [Linux-cluster] 3 node cluster problems

You will also need to make sure the clustered nodenames are in your 
/etc/hosts file.
Also, make sure your cluster network interface is up on all nodes and 
that the
/etc/cluster/cluster.conf is the same on all nodes.



Dalton, Maurice wrote:
  


The last post is incorrect.

Fence is still hanging at start up.

Here's another log message.

Mar 24 19:03:14 csarcsys3-eth0 ccsd[6425]: Error while processing 
connect: Connection refused


Mar 24 19:03:15 csarcsys3-eth0 dlm_controld[6453]: connect to ccs 
error -111, check ccsd or cluster status


*From:* [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] *On Behalf Of *Bennie Thomas

*Sent:* Monday, March 24, 2008 11:22 AM
*To:* linux clustering
*Subject:* Re: [Linux-cluster] 3 node cluster problems

try removing the fully qualified hostname from the cluster.conf file.


Dalton, Maurice wrote:

I have NO fencing equipment

I have been tasked to set up a 3 node cluster

Currently I am having problems getting cman (fence) to start

Fence will try to start during cman startup, but it fails

I tried to run /sbin/fenced -D and I get the following

1206373475 cman_init error 0 111

Here's my cluster.conf file

<?xml version="1.0"?>
<cluster alias="csarcsys51" config_version="26" name="csarcsys51">
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="csarcsys1-eth0.xxx..nasa.gov" nodeid="1" votes="1">
      <fence/>
    </clusternode>
    <clusternode name="csarcsys2-eth0.xxx..nasa.gov" nodeid="2" votes="1">
      <fence/>
    </clusternode>
    <clusternode name="csarcsys3-eth0.xxx.nasa.gov" nodeid="3" votes="1">
      <fence/>
    </clusternode>
  </clusternodes>
  <cman/>
  <fencedevices/>
  <rm>
    <failoverdomains>
      <failoverdomain name="csarcsys-fo" ordered="1" restricted="0">
        <failoverdomainnode name="csarcsys1-eth0.xxx..nasa.gov" priority="1"/>
        <failoverdomainnode name="csarcsys2-eth0.xxx..nasa.gov" priority="1"/>
        <failoverdomainnode name="csarcsys2-eth0.xxx..nasa.gov" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="xxx.xxx.xxx.xxx" monitor_link="1"/>
      <fs device="/dev/sdc1" force_fsck="0" force_unmount="1" fsid="57739" fstype="ext3" mountpoint="/csarc-test" name="csarcsys-fs" options="rw" self_fence="0"/>
      <nfsexport name="csarcsys-export"/>
      <nfsclient name="csarcsys-nfs-client" options="no_root_squash,rw" path="/csarc-test" target="xxx.xxx.xxx.*"/>
    </resources>
  </rm>
</cluster>

Messages from the logs

Mar 24 13:24:19 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. 
Refusing connection.


Mar 24 13:24:19 csarcsys2-eth0 ccsd[24888]: Error while processing 
connect: Connection refused


Mar 24 13:24:20 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate. 
Refusing connection.


Mar 24 13:24:20 csarcsys2-eth0 ccsd[24888]: Error while processing 
connect: Connection refused


Mar 24 13:24:21 csarcsys2-eth0 ccsd[24888]: Cluster is not quorate