Re: [xcat-user] xCAT HA documentation

2014-02-05 Thread Arif Ali
Further to this thread from last year: I am in the process of doing some rhels6.5
installs, and Red Hat has changed things again.

Now they officially support pcs.

But there are docs suggesting that they will be moving away from corosync,
and that we therefore need to use ccs and cman.

I have spent a couple of days on this, and have now updated the (below
mentioned) documentation a bit, adding an Appendix B with the relevant
configs for cman and the config changes for pcs.
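For anyone hitting this before the wiki update lands, the general shape of a two-MN cman setup is a cluster.conf like the following. This is purely a sketch: mn1/mn2 are placeholder node names, not the actual Appendix B content.

```xml
<?xml version="1.0"?>
<!-- Sketch only: minimal two-node cman cluster; mn1/mn2 are placeholders -->
<cluster config_version="1" name="xcat">
  <!-- two_node/expected_votes let a 2-node cluster keep quorum
       when one member is down -->
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="mn1" nodeid="1"/>
    <clusternode name="mn2" nodeid="2"/>
  </clusternodes>
</cluster>
```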

FYI, I have already successfully done HA with pcs at at least three customer
sites (rhel6.4), and at one with crm (rhel6.3). One customer even installed the
primary MN with the secondary, and it was successful. So thanks again to
the xCAT team for all the original docs.

regards,
Arif

--
Arif Ali

IRC: arif-ali at freenode
LinkedIn: http://uk.linkedin.com/in/arifali


On 21 August 2013 02:29, Guang Cheng Li ligua...@cn.ibm.com wrote:

  HI Arif,

 Thanks for the configuration listed below. I updated the xCAT doc
 http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Setup_HA_Mgmt_Node_With_DRBD_Pacemaker_Corosync
 to reflect this. If you have any further updates to the configuration or
 procedure, please let me know and I can update the doc. Thanks.

 -
 Li,Guang Cheng (李光成)
 IBM China System Technology Laboratory

 Email: ligua...@cn.ibm.com
 Address: Building 28, ZhongGuanCun Software Park,
  No.8, Dong Bei Wang West Road, Haidian District Beijing 100193,
 PRC



 From: Arif Ali m...@arif-ali.co.uk
 To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
 Date: 2013-08-21 05:18
 Subject: Re: [xcat-user] xCAT HA documentation
 --



 Hi Lindsay,

 Thanks for the info, that is good to know.

 But after diagnosing for the day, I have found the relevant pieces to get
 it all working. So, with respect to the documentation at
 http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Setup_HA_Mgmt_Node_With_DRBD_Pacemaker_Corosync,
 the following are the current changes; this is now working successfully
 for me. I will be doing more testing tomorrow. If the devs are happy, I
 will make the relevant changes to reflect the change in rhels6.4.

 pcs property set stonith-enabled=false
 pcs property set no-quorum-policy=ignore
 pcs resource op defaults timeout=120s

 pcs resource create ip_xCAT ocf:heartbeat:IPaddr2 ip=10.1.0.1 \
   iflabel=xCAT cidr_netmask=24 \
   op monitor interval=37s
 pcs resource create NFS_xCAT lsb:nfs \
   op monitor interval=41s
 pcs resource create NFSlock_xCAT lsb:nfslock \
   op monitor interval=43s
 pcs resource create apache_xCAT ocf:heartbeat:apache \
   configfile=/etc/httpd/conf/httpd.conf \
   statusurl=http://localhost/icons/README.html testregex=/html \
   op monitor interval=57s
 pcs resource create db_xCAT ocf:heartbeat:mysql \
   config=/xCATdrbd/etc/my.cnf test_user=mysql \
   binary=/usr/bin/mysqld_safe pid=/var/run/mysqld/mysqld.pid \
   socket=/var/lib/mysql/mysql.sock \
   op monitor interval=57s
 pcs resource create dhcpd lsb:dhcpd \
   op monitor interval=37s
 pcs resource create drbd_xCAT ocf:linbit:drbd drbd_resource=xCAT
 pcs resource master ms_drbd_xCAT drbd_xCAT master-max=1 \
   master-node-max=1 clone-max=2 clone-node-max=1 notify=true
 pcs resource create dummy ocf:heartbeat:Dummy
 pcs resource create fs_xCAT ocf:heartbeat:Filesystem \
   device=/dev/drbd/by-res/xCAT directory=/xCATdrbd fstype=ext4 \
   op monitor interval=57s
 pcs resource create named lsb:named \
   op monitor interval=37s
 pcs resource create symlinks_xCAT ocf:tummy:drbdlinks \
   configfile=/xCATdrbd/etc/drbdlinks.xCAT.conf \
   op monitor interval=31s
 pcs resource create xCAT lsb:xcatd \
   op monitor interval=42s
 pcs resource clone clone_named named clone-max=2 clone-node-max=1 \
   notify=false
 pcs resource group add grp_xCAT fs_xCAT symlinks_xCAT
 pcs constraint colocation add NFS_xCAT grp_xCAT
 pcs constraint colocation add NFSlock_xCAT grp_xCAT
 pcs constraint colocation add apache_xCAT grp_xCAT
 pcs constraint colocation add dhcpd grp_xCAT
 pcs constraint colocation add db_xCAT grp_xCAT
 pcs constraint colocation add dummy grp_xCAT
 pcs constraint colocation add xCAT grp_xCAT
 pcs constraint colocation add grp_xCAT ms_drbd_xCAT INFINITY \
   with-rsc-role=Master
 pcs constraint colocation add ip_xCAT ms_drbd_xCAT INFINITY \
   with-rsc-role=Master
 pcs constraint order list xCAT dummy
 pcs constraint order list 
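After loading a configuration like the above, the resulting state can be sanity-checked with standard pcs commands. A sketch, guarded only so the snippet degrades gracefully on machines without pcs installed:

```shell
# Verify the cluster state after applying the configuration above.
# Guarded so the snippet is harmless where pcs is not installed.
if command -v pcs >/dev/null 2>&1; then
  pcs status            # node and resource state
  pcs constraint --full # stored colocation/order constraints, with ids
else
  echo "pcs not installed"
fi
```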

Re: [xcat-user] How to create and deploy an xCAT Service Node

2014-02-05 Thread Josh Nielsen
Okay, I guess I need to revive this again now that I have the SNs deployed
and am trying to snmove some nodes onto them. The Hierarchical
Cluster wiki page is oriented toward those setting up a brand-new cluster,
not migrating an established cluster to include SNs, so it does not
include clear instructions on what commands to run after you have created
groups of CNs for SNs to manage. I am assuming that to get nodes to
initially look away from the MN and put them on an SN for the first time,
you must execute snmove with -d and -D pointing to the SN.

My config follows:

I am testing on just two of the nodes in my cluster for now. So first I did
this:

mkdef -t group -o serv1_compute members=node0001,node0002

Then following the documentation for creating service pools I did this:

chdef -t group serv1_compute servicenode=xcat-serv1,xcat-serv2

# lsdef -t group serv1_compute
Object name: serv1_compute
grouptype=static
members=node0001,node0002
servicenode=xcat-serv1,xcat-serv2

And noderes looks like this now:

#node,servicenode,netboot,tftpserver,tftpdir,nfsserver,monserver,nfsdir,installnic,primarynic,discoverynics,cmdinterface,xcatmaster,current_osimage,next_osimage,nimserver,routenames,nameservers,comments,disable
user,,xnba,MN_IP,,MN_IP,,,eth0,eth0,,,MN_IP,,,
service,,xnba,MN_IP,,MN_IP,,,mac,mac,,,MN_IP,,,
storage,,xnba,MN_IP,,MN_IP,,,eth1,eth1,,,MN_IP,,,
compute,,xnba,MN_IP,,MN_IP,,,eth0,eth0,,,MN_IP,,,
login,,xnba,MN_IP,,MN_IP,,,eth0,eth0,,,MN_IP,,,
node0059,,xnba,
hinode01,,xnba,
serv1_compute,xcat-serv1,xcat-serv2,,
node0001,xcat-serv1,xcat-serv2,,xcat-serv1,,xcat-serv1,,,xcat-serv1,,,
node0002,xcat-serv1,xcat-serv2,,xcat-serv1,,xcat-serv1,,,xcat-serv1,,,
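As I understand xCAT's attribute resolution (worth confirming with lsdef, which shows the effective values, e.g. `lsdef node0001 -i servicenode,xcatmaster`), a node's own row wins over its groups' rows, and an empty node value falls through to the groups. The following is a toy illustration of that lookup order only, not xCAT's actual code, against a cut-down two-column table (object name, then a stand-in for xcatmaster):

```shell
# Toy illustration of node-over-group attribute precedence (NOT xCAT's
# real lookup). Two columns: object name, xcatmaster stand-in.
cat > /tmp/noderes-toy.csv <<'EOF'
compute,MN_IP
serv1_compute,xcat-serv1
node0001,xcat-serv1
node0002,
EOF

lookup() {  # $1 = node, remaining args = its groups in priority order
  node=$1; shift
  # The node's own row is consulted first...
  val=$(awk -F',' -v k="$node" '$1==k {print $2}' /tmp/noderes-toy.csv)
  # ...and an empty value falls through to the groups, in order.
  for g in "$@"; do
    [ -n "$val" ] && break
    val=$(awk -F',' -v k="$g" '$1==k {print $2}' /tmp/noderes-toy.csv)
  done
  echo "$val"
}

lookup node0001 serv1_compute compute   # node row wins: xcat-serv1
lookup node0002 serv1_compute compute   # empty node value falls through: xcat-serv1
```

If a node's own row ever ends up empty, whichever of its groups supplies a value first wins, which is exactly why the compute group's MN_IP entries are worth watching.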

I may have a conflict problem, though, in that the established compute
group which node0001 and node0002 are in points to MN_IP (the MN's IP
address) while serv1_compute points to xcat-serv1. I was hoping that since
noderes FURTHER defined the servicenode and xcatmaster for them, it
would override the settings from compute. Will that work, or do I have to
remove node0001 & node0002 from compute altogether?

Their nodelist entries look like this:

node0001,compute,compute-profile,ipmi,dx360m2,rack01,all,serv1_compute,booting,11-24-2013
13:55:00,synced,02-05-2014 08:59:57,,
node0002,compute,compute-profile,ipmi,dx360m2,rack01,all,serv1_compute,booting,11-24-2013
13:55:00



Then after all the configuration, I tried an snmove on just node0001:


# snmove serv1_compute -d xcat-serv1 -D xcat-serv1
Moving nodes to their backup service nodes.

Setting new values in the xCAT database.


node0001: install centos6.4-x86_64-compute
node0002: install centos6.4-x86_64-compute
node0001: install centos6.4-x86_64-compute
node0002: install centos6.4-x86_64-compute
Running postscripts on the nodes.
If you specify the -s flag you must not specify either the -S or -k or -P
 flags

In /var/log/messages I saw: Allowing nodeset to node0001,node0002 install
for x3650-head01.haib.org from x3650-head01

Firstly, why was a nodeset done when I typed snmove? The nodes are already
installed; I don't want to reinstall them.

Secondly, according to the wiki documentation: If the CNs are up at the
time the snmove command is run, then snmove will run postscripts on the
CNs to reconfigure them for the new SN.

However, I checked files on node0001 like /etc/ntp.conf, and their
timestamps had not changed (therefore I deduce the postscripts did not
run). So I ran the postscripts manually with updatenode node0001
syslog,setupntp. I checked /etc/ntp.conf again, and this time the
timestamp was updated, but the file's contents were identical to before:
it pointed to MN_IP and not xcat-serv1, as it should based on the
xcatmaster setting in the noderes table.
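A quick way to see which master an ntp.conf actually points at is to pull out its server lines. A trivial sketch against a sample file; on the CN you would run the awk line against the real /etc/ntp.conf:

```shell
# Sample file standing in for a CN's /etc/ntp.conf.
cat > /tmp/ntp-sample.conf <<'EOF'
server MN_IP
driftfile /var/lib/ntp/drift
EOF
# Print every NTP server the file points at.
awk '$1 == "server" {print $2}' /tmp/ntp-sample.conf   # prints MN_IP
```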

What am I doing wrong here?

Thanks,
Josh




On Fri, Jan 10, 2014 at 1:48 PM, Josh Nielsen jniel...@hudsonalpha.org wrote:

 Thank you Lissa, that is helpful.

 -Josh


 On Fri, Jan 10, 2014 at 1:25 PM, Lissa Valletta lis...@us.ibm.com wrote:

  DNS and DHCP will still work from the service node, if set up
 correctly. In other words, you have configured the service node as the
 DNS server and/or DHCP server for the nodes, and there is no requirement
 on the management node for DNS or DHCP. You will not be able to run
 any xCAT commands on the service node if the management node is down.
 xCAT requires access to the database configured on the MN for the xCAT
 cluster (mysql, postgresql) to run most xCAT commands, even to recognize
 that the node is in the xCAT cluster.

 Lissa K. Valletta
 8-3/B10
 Poughkeepsie, NY 12601
 (tie 293) 433-3102


