Re: [Pacemaker] DRBD with Pacemaker on CentOs 6.5
Hi,

getenforce returns Enforcing.

ls -dZ /var/www/html returns

drwxr-xr-x. root root system_u:object_r:httpd_sys_content_t:s0 /var/www/html

on both nodes. Running restorecon doesn't change the ls -dZ output.

On Wed, Nov 12, 2014 at 2:24 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote:

11.11.2014 07:27, Sihan Goi wrote:

Hi,

DocumentRoot is still set to /var/www/html.
ls -al /var/www/html shows different things on the 2 nodes.

node01:

total 28
drwxr-xr-x. 3 root root  4096 Nov 11 12:25 .
drwxr-xr-x. 6 root root  4096 Jul 23 22:18 ..
-rw-r--r--. 1 root root    50 Oct 28 18:00 index.html
drwx------. 2 root root 16384 Oct 28 17:59 lost+found

node02 only has index.html, no lost+found, and it's a different version of the file.

It looks like apache is unable to stat its document root. Could you please show the output of two commands:

getenforce
ls -dZ /var/www/html

on both nodes when the fs is mounted on one of them? If you see 'Enforcing', and the last part of the selinux context of a mounted fs root is not httpd_sys_content_t, then run 'restorecon -R /var/www/html' on that node.

Status URL is enabled in both nodes.

On Oct 30, 2014 11:14 AM, Andrew Beekhof and...@beekhof.net wrote:

On 29 Oct 2014, at 1:01 pm, Sihan Goi gois...@gmail.com wrote:

Hi,

I've never used crm_report before. I just read the man file and generated a tarball from 1-2 hours before I reconfigured all the DRBD related resources. I've put the tarball here - https://www.dropbox.com/s/suj9pttjp403msv/unexplained-apache-failure.tar.bz2?dl=0

Hope you can help figure out what I'm doing wrong. Thanks for the help!

Oct 28 18:13:38 node02 Filesystem(WebFS)[29940]: INFO: Running start for /dev/drbd/by-res/wwwdata on /var/www/html
Oct 28 18:13:39 node02 kernel: EXT4-fs (drbd1): mounted filesystem with ordered data mode. Opts:
Oct 28 18:13:39 node02 crmd[9870]: notice: process_lrm_event: LRM operation WebFS_start_0 (call=164, rc=0, cib-update=298, confirmed=true) ok
Oct 28 18:13:39 node02 crmd[9870]: notice: te_rsc_command: Initiating action 7: start WebSite_start_0 on node02 (local)
Oct 28 18:13:39 node02 apache(WebSite)[30007]: ERROR: Syntax error on line 292 of /etc/httpd/conf/httpd.conf: DocumentRoot must be a directory

Is DocumentRoot still set to /var/www/html? If so, what happens if you run 'ls -al /var/www/html' in a shell?

Oct 28 18:13:39 node02 apache(WebSite)[30007]: INFO: apache not running
Oct 28 18:13:39 node02 apache(WebSite)[30007]: INFO: waiting for apache /etc/httpd/conf/httpd.conf to come up

Did you enable the status url?

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/_enable_the_apache_status_url.html

--
- Goi Sihan
gois...@gmail.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
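The SELinux check suggested in this message can be sketched as a small shell helper. This is only a sketch under assumptions: the function names selinux_type and check_docroot_label are made up for illustration, the context is assumed to have the usual user:role:type:level layout seen in the ls -dZ output above, and an SELinux-aware ls is assumed.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the type field of an
# SELinux context string of the form user:role:type:level.
selinux_type() {
    printf '%s\n' "$1" | cut -d: -f3
}

# Hypothetical helper: compare the type of a directory's label with the one
# Apache content is expected to carry (httpd_sys_content_t, as named above).
check_docroot_label() {
    dir=$1
    # Pick the first field of the ls -dZ output that looks like a context,
    # i.e. contains at least two colons.
    ctx=$(ls -dZ "$dir" | tr ' ' '\n' | grep -m1 '.*:.*:.*')
    if [ "$(selinux_type "$ctx")" = "httpd_sys_content_t" ]; then
        echo "label ok: $ctx"
    else
        echo "unexpected label: $ctx (consider: restorecon -R $dir)"
    fi
}
```

It would be run as "getenforce; check_docroot_label /var/www/html" on the node where the filesystem is currently mounted, since the label of the mounted filesystem root is what matters there.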
Re: [Pacemaker] DRBD with Pacemaker on CentOs 6.5
Hi,

I'm fluent in English so I doubt it's a language barrier. I have reasonable user experience in Linux, though not extensive experience in the various system commands, and I have zero experience in HA. I'm in fact trying to make things as simple as possible by simply following the Clusters from Scratch guide step by step, and only modifying/omitting steps when they don't work.

I know a block device (like /dev/sda) is simply a device (such as a hard disk) that appears like a file in Linux, allowing users buffered access to the device.
I know a file system is like FAT/NTFS/ext2/etc.
I know a mount point is a directory that you can mount an image file with a file system onto. Once mounted, it would be as if the entire file system has the mount point as its root directory.

I set up DRBD almost exactly like the instructions from Chapter 7 of Clusters from Scratch. The only differences are in our setups. The guide assumes Fedora 13 and DRBD 8.3, while I'm using CentOS 6.5 and DRBD 8.4.

Since I was following the guide from start to finish, /var/www/html already has index.html in there. node01 has its own index.html, and node02 has its own index.html, both with different content. The guide did not instruct me to delete these files, and seems to configure the mount point to be /var/www/html (Chapter 7.4) with an ext4 file system, hence mounting the image onto a directory that already has files in it. Is this a problem?

On Tue, Nov 11, 2014 at 6:07 PM, Lars Ellenberg lars.ellenb...@linbit.com wrote:

On Tue, Nov 11, 2014 at 12:27:23PM +0800, Sihan Goi wrote:

Hi,

DocumentRoot is still set to /var/www/html.
ls -al /var/www/html shows different things on the 2 nodes.

node01:

total 28
drwxr-xr-x. 3 root root  4096 Nov 11 12:25 .
drwxr-xr-x. 6 root root  4096 Jul 23 22:18 ..
-rw-r--r--. 1 root root    50 Oct 28 18:00 index.html
drwx------. 2 root root 16384 Oct 28 17:59 lost+found

node02 only has index.html, no lost+found, and it's a different version of the file.

I'm unsure if there is just a language barrier, or if you just have not enough experience with linux in general, or if you try to make things more complicated than they are.

Do you know
* what a block device is?
* what a file system is?
* what a mount point is?
* that a mount point may not be empty, even though it typically is?
* what it means to mount a file system to a mount point?

Assuming you set up DRBD in a sane way, and it is mounted on *one* node (the node where it is Primary), then on the *other* node, where it is NOT mounted, you will only see the mount point, and whatever happens to be in there.

You probably should clear out the contents of that mount point, so that you'd have an empty mount point. Or, if you like, replace it with some dummy content that clearly shows that this is the mount point, and not the file system that is intended to be mounted there.

Status URL is enabled in both nodes.

As for the DocumentRoot must be a directory, please double check for typos...

On Oct 30, 2014 11:14 AM, Andrew Beekhof and...@beekhof.net wrote:

On 29 Oct 2014, at 1:01 pm, Sihan Goi gois...@gmail.com wrote:

Hi,

I've never used crm_report before. I just read the man file and generated a tarball from 1-2 hours before I reconfigured all the DRBD related resources. I've put the tarball here - https://www.dropbox.com/s/suj9pttjp403msv/unexplained-apache-failure.tar.bz2?dl=0

Hope you can help figure out what I'm doing wrong. Thanks for the help!

Oct 28 18:13:38 node02 Filesystem(WebFS)[29940]: INFO: Running start for /dev/drbd/by-res/wwwdata on /var/www/html
Oct 28 18:13:39 node02 kernel: EXT4-fs (drbd1): mounted filesystem with ordered data mode. Opts:
Oct 28 18:13:39 node02 crmd[9870]: notice: process_lrm_event: LRM operation WebFS_start_0 (call=164, rc=0, cib-update=298, confirmed=true) ok
Oct 28 18:13:39 node02 crmd[9870]: notice: te_rsc_command: Initiating action 7: start WebSite_start_0 on node02 (local)
Oct 28 18:13:39 node02 apache(WebSite)[30007]: ERROR: Syntax error on line 292 of /etc/httpd/conf/httpd.conf: DocumentRoot must be a directory

Is DocumentRoot still set to /var/www/html? If so, what happens if you run 'ls -al /var/www/html' in a shell?

Oct 28 18:13:39 node02 apache(WebSite)[30007]: INFO: apache not running
Oct 28 18:13:39 node02 apache(WebSite)[30007]: INFO: waiting for apache /etc/httpd/conf/httpd.conf to come up

Did you enable the status url?

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/_enable_the_apache_status_url.html

--
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA and Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
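Lars's point, that on the node where the DRBD filesystem is not mounted you only see the bare mount point and whatever local files are in it, can be checked with a short sketch like the following. The function names is_mounted_at and describe_dir are hypothetical, and the optional second argument (a mount table file) is an assumption added only so the check can be tried against a sample file; by default it reads /proc/mounts.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): succeed if some filesystem is
# mounted exactly at directory $1. Field 2 of each /proc/mounts line is the
# mount point.
is_mounted_at() {
    awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Say what a listing of the directory reflects on this node.
describe_dir() {
    if is_mounted_at "$1" "${2:-/proc/mounts}"; then
        echo "$1: mounted filesystem"
    else
        echo "$1: bare mount point (local directory contents)"
    fi
}
```

Running "describe_dir /var/www/html" on both nodes should, if the cluster behaves as described above, report a mounted filesystem only on the node where DRBD is Primary, which would explain why ls -al shows lost+found on one node and only a local index.html on the other.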
Re: [Pacemaker] DRBD with Pacemaker on CentOs 6.5
Hi,

DocumentRoot is still set to /var/www/html.
ls -al /var/www/html shows different things on the 2 nodes.

node01:

total 28
drwxr-xr-x. 3 root root  4096 Nov 11 12:25 .
drwxr-xr-x. 6 root root  4096 Jul 23 22:18 ..
-rw-r--r--. 1 root root    50 Oct 28 18:00 index.html
drwx------. 2 root root 16384 Oct 28 17:59 lost+found

node02 only has index.html, no lost+found, and it's a different version of the file.

Status URL is enabled in both nodes.

On Oct 30, 2014 11:14 AM, Andrew Beekhof and...@beekhof.net wrote:

On 29 Oct 2014, at 1:01 pm, Sihan Goi gois...@gmail.com wrote:

Hi,

I've never used crm_report before. I just read the man file and generated a tarball from 1-2 hours before I reconfigured all the DRBD related resources. I've put the tarball here - https://www.dropbox.com/s/suj9pttjp403msv/unexplained-apache-failure.tar.bz2?dl=0

Hope you can help figure out what I'm doing wrong. Thanks for the help!

Oct 28 18:13:38 node02 Filesystem(WebFS)[29940]: INFO: Running start for /dev/drbd/by-res/wwwdata on /var/www/html
Oct 28 18:13:39 node02 kernel: EXT4-fs (drbd1): mounted filesystem with ordered data mode. Opts:
Oct 28 18:13:39 node02 crmd[9870]: notice: process_lrm_event: LRM operation WebFS_start_0 (call=164, rc=0, cib-update=298, confirmed=true) ok
Oct 28 18:13:39 node02 crmd[9870]: notice: te_rsc_command: Initiating action 7: start WebSite_start_0 on node02 (local)
Oct 28 18:13:39 node02 apache(WebSite)[30007]: ERROR: Syntax error on line 292 of /etc/httpd/conf/httpd.conf: DocumentRoot must be a directory

Is DocumentRoot still set to /var/www/html? If so, what happens if you run 'ls -al /var/www/html' in a shell?

Oct 28 18:13:39 node02 apache(WebSite)[30007]: INFO: apache not running
Oct 28 18:13:39 node02 apache(WebSite)[30007]: INFO: waiting for apache /etc/httpd/conf/httpd.conf to come up

Did you enable the status url?

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/_enable_the_apache_status_url.html
Re: [Pacemaker] DRBD with Pacemaker on CentOs 6.5
Hi,

No, I did not do this. I followed the Pacemaker 1.1 - Clusters from scratch edition 5 for Fedora 13, and in section 7.3.4 it instructed me to run the following commands, which I did:

mkfs.ext4 /dev/drbd1
mount /dev/drbd1 /mnt
create index.html file in /mnt
umount /dev/drbd1

Subsequently, after unmounting, there were no further instructions to mount any other directories.

So, how should I mount /dev/mapper/vg_node02-drbd--demo to /var/www/html? Should I be mounting /dev/mapper/vg_node02-drbd--demo, or /dev/drbd1? Since I've already created index.html in /dev/drbd1, should I be mounting that? I'm a little confused here.

On Tue, Oct 28, 2014 at 11:41 AM, Andrew Beekhof and...@beekhof.net wrote:

On 27 Oct 2014, at 6:05 pm, Sihan Goi gois...@gmail.com wrote:

Hi,

That offending line is as follows:

DocumentRoot /var/www/html

I'm guessing it needs to be updated to the DRBD block device, but I'm not sure how to do that, or even what the block device is.

fdisk -l shows the following, which I'm guessing is the block device?

/dev/mapper/vg_node02-drbd--demo

lvs shows the following:

drbd-demo vg_node02 -wi-ao 1.00g

btw I'm running the commands on node02 (secondary) rather than node01 (primary). It's just a matter of convenience due to the physical location of the machine. Does it matter?

Um, you need to mount /dev/mapper/vg_node02-drbd--demo to /var/www/html with a FileSystem resource. Have you not done this?

Thanks.

On Mon, Oct 27, 2014 at 11:35 AM, Andrew Beekhof and...@beekhof.net wrote:

Oct 27 10:28:44 node02 apache(WebSite)[10515]: ERROR: Syntax error on line 292 of /etc/httpd/conf/httpd.conf: DocumentRoot must be a directory

On 27 Oct 2014, at 1:36 pm, Sihan Goi gois...@gmail.com wrote:

Hi Andrew,

Logs in /var/log/httpd/ are empty, but here's a snippet of /var/log/messages right after I start pacemaker and do a crm status:

http://pastebin.com/ivQdyV4u

Seems like the Apache service doesn't come up. This only happens after I run the commands in the guide to configure DRBD.

On Fri, Oct 24, 2014 at 8:29 AM, Andrew Beekhof and...@beekhof.net wrote:

logs?

On 23 Oct 2014, at 1:08 pm, Sihan Goi gois...@gmail.com wrote:

Hi, can anyone help? Really stuck here...

On Mon, Oct 20, 2014 at 9:46 AM, Sihan Goi gois...@gmail.com wrote:

Hi,

I'm following the Clusters from Scratch guide for Fedora 13, and I've managed to get a 2 node cluster working with Apache. However, once I tried to add DRBD 8.4 to the mix, it stopped working.

I've followed the DRBD steps in the guide all the way till cib commit fs in Section 7.4, right before Testing Migration. However, when I do a crm_mon, I get the following failed actions.

Last updated: Thu Oct 16 17:28:34 2014
Last change: Thu Oct 16 17:26:04 2014 via crm_shadow on node01
Stack: cman
Current DC: node02 - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
2 Nodes configured
5 Resources configured

Online: [ node01 node02 ]

ClusterIP (ocf::heartbeat:IPaddr2): Started node02
Master/Slave Set: WebDataClone [WebData]
    Masters: [ node02 ]
    Slaves: [ node01 ]
WebFS (ocf::heartbeat:Filesystem): Started node02

Failed actions:
    WebSite_start_0 on node02 'unknown error' (1): call=278, status=Timed Out, last-rc-change='Thu Oct 16 17:26:28 2014', queued=2ms, exec=0ms
    WebSite_start_0 on node01 'unknown error' (1): call=203, status=Timed Out, last-rc-change='Thu Oct 16 17:26:09 2014', queued=2ms, exec=0ms

Seems like the apache Website resource isn't starting up. Apache was working just fine before I configured DRBD. What did I do wrong?

--
- Goi Sihan
gois...@gmail.com
Re: [Pacemaker] DRBD with Pacemaker on CentOs 6.5
Hi,

I followed those steps previously. I just tried it again, but I'm still getting the same error. My crm configure show shows the following:

node node01 \
    attributes standby=off
node node02
primitive ClusterIP IPaddr2 \
    params ip=192.168.1.110 cidr_netmask=24 \
    op monitor interval=30s
primitive WebData ocf:linbit:drbd \
    params drbd_resource=wwwdata \
    op monitor interval=60s
primitive WebFS Filesystem \
    params device=/dev/drbd/by-res/wwwdata directory=/var/www/html fstype=ext4
primitive WebSite apache \
    params configfile=/etc/httpd/conf/httpd.conf \
    op monitor interval=1min
ms WebDataClone WebData \
    meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
location prefer-node01 WebSite 50: node01
colocation WebSite-with-WebFS inf: WebSite WebFS
colocation fs_on_drbd inf: WebFS WebDataClone:Master
colocation website-with-ip inf: WebSite ClusterIP
order WebFS-after-WebData inf: WebDataClone:promote WebFS:start
order WebSite-after-WebFS inf: WebFS WebSite
order apache-after-ip Mandatory: ClusterIP WebSite
property cib-bootstrap-options: \
    dc-version=1.1.10-14.el6_5.3-368c726 \
    cluster-infrastructure=cman \
    stonith-enabled=false \
    no-quorum-policy=ignore
rsc_defaults rsc_defaults-options: \
    migration-threshold=1

What am I doing wrong?

On Tue, Oct 28, 2014 at 5:11 PM, Andrew Beekhof and...@beekhof.net wrote:

On 28 Oct 2014, at 6:26 pm, Sihan Goi gois...@gmail.com wrote:

Hi,

No, I did not do this. I followed the Pacemaker 1.1 - Clusters from scratch edition 5 for Fedora 13, and in section 7.3.4 it instructed me to run the following commands, which I did:

mkfs.ext4 /dev/drbd1
mount /dev/drbd1 /mnt
create index.html file in /mnt
umount /dev/drbd1

Subsequently, after unmounting, there were no further instructions to mount any other directories.

So, how should I mount /dev/mapper/vg_node02-drbd--demo to /var/www/html? Should I be mounting /dev/mapper/vg_node02-drbd--demo, or /dev/drbd1? Since I've already created index.html in /dev/drbd1, should I be mounting that? I'm a little confused here.

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/_configure_the_cluster_for_drbd.html

Look for "Now that DRBD is functioning we can configure a Filesystem resource to use it"

On Tue, Oct 28, 2014 at 11:41 AM, Andrew Beekhof and...@beekhof.net wrote:

On 27 Oct 2014, at 6:05 pm, Sihan Goi gois...@gmail.com wrote:

Hi,

That offending line is as follows:

DocumentRoot /var/www/html

I'm guessing it needs to be updated to the DRBD block device, but I'm not sure how to do that, or even what the block device is.

fdisk -l shows the following, which I'm guessing is the block device?

/dev/mapper/vg_node02-drbd--demo

lvs shows the following:

drbd-demo vg_node02 -wi-ao 1.00g

btw I'm running the commands on node02 (secondary) rather than node01 (primary). It's just a matter of convenience due to the physical location of the machine. Does it matter?

Um, you need to mount /dev/mapper/vg_node02-drbd--demo to /var/www/html with a FileSystem resource. Have you not done this?

Thanks.

On Mon, Oct 27, 2014 at 11:35 AM, Andrew Beekhof and...@beekhof.net wrote:

Oct 27 10:28:44 node02 apache(WebSite)[10515]: ERROR: Syntax error on line 292 of /etc/httpd/conf/httpd.conf: DocumentRoot must be a directory

On 27 Oct 2014, at 1:36 pm, Sihan Goi gois...@gmail.com wrote:

Hi Andrew,

Logs in /var/log/httpd/ are empty, but here's a snippet of /var/log/messages right after I start pacemaker and do a crm status:

http://pastebin.com/ivQdyV4u

Seems like the Apache service doesn't come up. This only happens after I run the commands in the guide to configure DRBD.

On Fri, Oct 24, 2014 at 8:29 AM, Andrew Beekhof and...@beekhof.net wrote:

logs?

On 23 Oct 2014, at 1:08 pm, Sihan Goi gois...@gmail.com wrote:

Hi, can anyone help? Really stuck here...

On Mon, Oct 20, 2014 at 9:46 AM, Sihan Goi gois...@gmail.com wrote:

Hi,

I'm following the Clusters from Scratch guide for Fedora 13, and I've managed to get a 2 node cluster working with Apache. However, once I tried to add DRBD 8.4 to the mix, it stopped working.

I've followed the DRBD steps in the guide all the way till cib commit fs in Section 7.4, right before Testing Migration. However, when I do a crm_mon, I get the following failed actions.

Last updated: Thu Oct 16 17:28:34 2014
Last change: Thu Oct 16 17:26:04 2014 via crm_shadow on node01
Stack: cman
Current DC: node02 - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
2 Nodes configured
5 Resources configured

Online: [ node01 node02 ]

ClusterIP (ocf::heartbeat:IPaddr2): Started
Re: [Pacemaker] DRBD with Pacemaker on CentOs 6.5
Hi,

I've never used crm_report before. I just read the man file and generated a tarball from 1-2 hours before I reconfigured all the DRBD related resources. I've put the tarball here - https://www.dropbox.com/s/suj9pttjp403msv/unexplained-apache-failure.tar.bz2?dl=0

Hope you can help figure out what I'm doing wrong. Thanks for the help!

On Wed, Oct 29, 2014 at 9:24 AM, Andrew Beekhof and...@beekhof.net wrote:

Can you run crm_report so we can see the logs and PE files?

On 28 Oct 2014, at 9:16 pm, Sihan Goi gois...@gmail.com wrote:

Hi,

I followed those steps previously. I just tried it again, but I'm still getting the same error. My crm configure show shows the following:

node node01 \
    attributes standby=off
node node02
primitive ClusterIP IPaddr2 \
    params ip=192.168.1.110 cidr_netmask=24 \
    op monitor interval=30s
primitive WebData ocf:linbit:drbd \
    params drbd_resource=wwwdata \
    op monitor interval=60s
primitive WebFS Filesystem \
    params device=/dev/drbd/by-res/wwwdata directory=/var/www/html fstype=ext4
primitive WebSite apache \
    params configfile=/etc/httpd/conf/httpd.conf \
    op monitor interval=1min
ms WebDataClone WebData \
    meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
location prefer-node01 WebSite 50: node01
colocation WebSite-with-WebFS inf: WebSite WebFS
colocation fs_on_drbd inf: WebFS WebDataClone:Master
colocation website-with-ip inf: WebSite ClusterIP
order WebFS-after-WebData inf: WebDataClone:promote WebFS:start
order WebSite-after-WebFS inf: WebFS WebSite
order apache-after-ip Mandatory: ClusterIP WebSite
property cib-bootstrap-options: \
    dc-version=1.1.10-14.el6_5.3-368c726 \
    cluster-infrastructure=cman \
    stonith-enabled=false \
    no-quorum-policy=ignore
rsc_defaults rsc_defaults-options: \
    migration-threshold=1

What am I doing wrong?

On Tue, Oct 28, 2014 at 5:11 PM, Andrew Beekhof and...@beekhof.net wrote:

On 28 Oct 2014, at 6:26 pm, Sihan Goi gois...@gmail.com wrote:

Hi,

No, I did not do this. I followed the Pacemaker 1.1 - Clusters from scratch edition 5 for Fedora 13, and in section 7.3.4 it instructed me to run the following commands, which I did:

mkfs.ext4 /dev/drbd1
mount /dev/drbd1 /mnt
create index.html file in /mnt
umount /dev/drbd1

Subsequently, after unmounting, there were no further instructions to mount any other directories.

So, how should I mount /dev/mapper/vg_node02-drbd--demo to /var/www/html? Should I be mounting /dev/mapper/vg_node02-drbd--demo, or /dev/drbd1? Since I've already created index.html in /dev/drbd1, should I be mounting that? I'm a little confused here.

http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/_configure_the_cluster_for_drbd.html

Look for "Now that DRBD is functioning we can configure a Filesystem resource to use it"

On Tue, Oct 28, 2014 at 11:41 AM, Andrew Beekhof and...@beekhof.net wrote:

On 27 Oct 2014, at 6:05 pm, Sihan Goi gois...@gmail.com wrote:

Hi,

That offending line is as follows:

DocumentRoot /var/www/html

I'm guessing it needs to be updated to the DRBD block device, but I'm not sure how to do that, or even what the block device is.

fdisk -l shows the following, which I'm guessing is the block device?

/dev/mapper/vg_node02-drbd--demo

lvs shows the following:

drbd-demo vg_node02 -wi-ao 1.00g

btw I'm running the commands on node02 (secondary) rather than node01 (primary). It's just a matter of convenience due to the physical location of the machine. Does it matter?

Um, you need to mount /dev/mapper/vg_node02-drbd--demo to /var/www/html with a FileSystem resource. Have you not done this?

Thanks.

On Mon, Oct 27, 2014 at 11:35 AM, Andrew Beekhof and...@beekhof.net wrote:

Oct 27 10:28:44 node02 apache(WebSite)[10515]: ERROR: Syntax error on line 292 of /etc/httpd/conf/httpd.conf: DocumentRoot must be a directory

On 27 Oct 2014, at 1:36 pm, Sihan Goi gois...@gmail.com wrote:

Hi Andrew,

Logs in /var/log/httpd/ are empty, but here's a snippet of /var/log/messages right after I start pacemaker and do a crm status:

http://pastebin.com/ivQdyV4u

Seems like the Apache service doesn't come up. This only happens after I run the commands in the guide to configure DRBD.

On Fri, Oct 24, 2014 at 8:29 AM, Andrew Beekhof and...@beekhof.net wrote:

logs?

On 23 Oct 2014, at 1:08 pm, Sihan Goi gois...@gmail.com wrote:

Hi, can anyone help? Really stuck here...

On Mon, Oct 20, 2014 at 9:46 AM, Sihan Goi gois...@gmail.com wrote:

Hi,

I'm following the Clusters from Scratch guide for Fedora 13, and I've managed
Re: [Pacemaker] DRBD with Pacemaker on CentOs 6.5
Hi,

That offending line is as follows:

DocumentRoot /var/www/html

I'm guessing it needs to be updated to the DRBD block device, but I'm not sure how to do that, or even what the block device is.

fdisk -l shows the following, which I'm guessing is the block device?

/dev/mapper/vg_node02-drbd--demo

lvs shows the following:

drbd-demo vg_node02 -wi-ao 1.00g

btw I'm running the commands on node02 (secondary) rather than node01 (primary). It's just a matter of convenience due to the physical location of the machine. Does it matter?

Thanks.

On Mon, Oct 27, 2014 at 11:35 AM, Andrew Beekhof and...@beekhof.net wrote:

Oct 27 10:28:44 node02 apache(WebSite)[10515]: ERROR: Syntax error on line 292 of /etc/httpd/conf/httpd.conf: DocumentRoot must be a directory

On 27 Oct 2014, at 1:36 pm, Sihan Goi gois...@gmail.com wrote:

Hi Andrew,

Logs in /var/log/httpd/ are empty, but here's a snippet of /var/log/messages right after I start pacemaker and do a crm status:

http://pastebin.com/ivQdyV4u

Seems like the Apache service doesn't come up. This only happens after I run the commands in the guide to configure DRBD.

On Fri, Oct 24, 2014 at 8:29 AM, Andrew Beekhof and...@beekhof.net wrote:

logs?

On 23 Oct 2014, at 1:08 pm, Sihan Goi gois...@gmail.com wrote:

Hi, can anyone help? Really stuck here...

On Mon, Oct 20, 2014 at 9:46 AM, Sihan Goi gois...@gmail.com wrote:

Hi,

I'm following the Clusters from Scratch guide for Fedora 13, and I've managed to get a 2 node cluster working with Apache. However, once I tried to add DRBD 8.4 to the mix, it stopped working.

I've followed the DRBD steps in the guide all the way till cib commit fs in Section 7.4, right before Testing Migration. However, when I do a crm_mon, I get the following failed actions.

Last updated: Thu Oct 16 17:28:34 2014
Last change: Thu Oct 16 17:26:04 2014 via crm_shadow on node01
Stack: cman
Current DC: node02 - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
2 Nodes configured
5 Resources configured

Online: [ node01 node02 ]

ClusterIP (ocf::heartbeat:IPaddr2): Started node02
Master/Slave Set: WebDataClone [WebData]
    Masters: [ node02 ]
    Slaves: [ node01 ]
WebFS (ocf::heartbeat:Filesystem): Started node02

Failed actions:
    WebSite_start_0 on node02 'unknown error' (1): call=278, status=Timed Out, last-rc-change='Thu Oct 16 17:26:28 2014', queued=2ms, exec=0ms
    WebSite_start_0 on node01 'unknown error' (1): call=203, status=Timed Out, last-rc-change='Thu Oct 16 17:26:09 2014', queued=2ms, exec=0ms

Seems like the apache Website resource isn't starting up. Apache was working just fine before I configured DRBD. What did I do wrong?

--
- Goi Sihan
gois...@gmail.com
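The error quoted in this message, "DocumentRoot must be a directory", is about the path named on the DocumentRoot line, not about a block device. A minimal sketch for checking that path from a shell follows. The function names docroot_of and docroot_is_dir are hypothetical, the sketch assumes a path without spaces, and a root shell is not confined the way httpd is under SELinux, so this check can pass while Apache still fails.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the first DocumentRoot
# value found in an Apache configuration file, with optional quotes removed.
# Commented-out lines are skipped because their first field is not exactly
# "DocumentRoot".
docroot_of() {
    awk '$1 == "DocumentRoot" { gsub(/"/, "", $2); print $2; exit }' "$1"
}

# Succeed only if that path exists and is a directory as seen by the
# current user.
docroot_is_dir() {
    root=$(docroot_of "$1")
    [ -n "$root" ] && [ -d "$root" ]
}
```

For example, "docroot_is_dir /etc/httpd/conf/httpd.conf && echo ok" could be run on the node where WebFS is started, right after the filesystem has been mounted.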
Re: [Pacemaker] DRBD with Pacemaker on CentOs 6.5
Hi Andrew,

Logs in /var/log/httpd/ are empty, but here's a snippet of /var/log/messages right after I start pacemaker and do a crm status:

http://pastebin.com/ivQdyV4u

Seems like the Apache service doesn't come up. This only happens after I run the commands in the guide to configure DRBD.

On Fri, Oct 24, 2014 at 8:29 AM, Andrew Beekhof and...@beekhof.net wrote:

logs?

On 23 Oct 2014, at 1:08 pm, Sihan Goi gois...@gmail.com wrote:

Hi, can anyone help? Really stuck here...

On Mon, Oct 20, 2014 at 9:46 AM, Sihan Goi gois...@gmail.com wrote:

Hi,

I'm following the Clusters from Scratch guide for Fedora 13, and I've managed to get a 2 node cluster working with Apache. However, once I tried to add DRBD 8.4 to the mix, it stopped working.

I've followed the DRBD steps in the guide all the way till cib commit fs in Section 7.4, right before Testing Migration. However, when I do a crm_mon, I get the following failed actions.

Last updated: Thu Oct 16 17:28:34 2014
Last change: Thu Oct 16 17:26:04 2014 via crm_shadow on node01
Stack: cman
Current DC: node02 - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
2 Nodes configured
5 Resources configured

Online: [ node01 node02 ]

ClusterIP (ocf::heartbeat:IPaddr2): Started node02
Master/Slave Set: WebDataClone [WebData]
    Masters: [ node02 ]
    Slaves: [ node01 ]
WebFS (ocf::heartbeat:Filesystem): Started node02

Failed actions:
    WebSite_start_0 on node02 'unknown error' (1): call=278, status=Timed Out, last-rc-change='Thu Oct 16 17:26:28 2014', queued=2ms, exec=0ms
    WebSite_start_0 on node01 'unknown error' (1): call=203, status=Timed Out, last-rc-change='Thu Oct 16 17:26:09 2014', queued=2ms, exec=0ms

Seems like the apache Website resource isn't starting up. Apache was working just fine before I configured DRBD. What did I do wrong?

--
- Goi Sihan
gois...@gmail.com
Re: [Pacemaker] DRBD with Pacemaker on CentOs 6.5
Hi, can anyone help? Really stuck here...

On Mon, Oct 20, 2014 at 9:46 AM, Sihan Goi gois...@gmail.com wrote:

Hi,

I'm following the Clusters from Scratch guide for Fedora 13, and I've managed to get a 2 node cluster working with Apache. However, once I tried to add DRBD 8.4 to the mix, it stopped working.

I've followed the DRBD steps in the guide all the way till cib commit fs in Section 7.4, right before Testing Migration. However, when I do a crm_mon, I get the following failed actions.

Last updated: Thu Oct 16 17:28:34 2014
Last change: Thu Oct 16 17:26:04 2014 via crm_shadow on node01
Stack: cman
Current DC: node02 - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726
2 Nodes configured
5 Resources configured

Online: [ node01 node02 ]

ClusterIP (ocf::heartbeat:IPaddr2): Started node02
Master/Slave Set: WebDataClone [WebData]
    Masters: [ node02 ]
    Slaves: [ node01 ]
WebFS (ocf::heartbeat:Filesystem): Started node02

Failed actions:
    WebSite_start_0 on node02 'unknown error' (1): call=278, status=Timed Out, last-rc-change='Thu Oct 16 17:26:28 2014', queued=2ms, exec=0ms
    WebSite_start_0 on node01 'unknown error' (1): call=203, status=Timed Out, last-rc-change='Thu Oct 16 17:26:09 2014', queued=2ms, exec=0ms

Seems like the apache Website resource isn't starting up. Apache was working just fine before I configured DRBD. What did I do wrong?

--
- Goi Sihan
gois...@gmail.com
[Pacemaker] DRBD with Pacemaker on CentOs 6.5
Hi,

I'm following the Clusters from Scratch guide for Fedora 13, and I've managed to get a 2 node cluster working with Apache. However, once I tried to add DRBD 8.4 to the mix, it stopped working.

I've followed the DRBD steps in the guide all the way till cib commit fs in Section 7.4, right before Testing Migration. However, when I do a crm_mon, I get the following failed actions.

    Last updated: Thu Oct 16 17:28:34 2014
    Last change: Thu Oct 16 17:26:04 2014 via crm_shadow on node01
    Stack: cman
    Current DC: node02 - partition with quorum
    Version: 1.1.10-14.el6_5.3-368c726
    2 Nodes configured
    5 Resources configured

    Online: [ node01 node02 ]

    ClusterIP (ocf::heartbeat:IPaddr2): Started node02
    Master/Slave Set: WebDataClone [WebData]
        Masters: [ node02 ]
        Slaves: [ node01 ]
    WebFS (ocf::heartbeat:Filesystem): Started node02

    Failed actions:
        WebSite_start_0 on node02 'unknown error' (1): call=278, status=Timed Out, last-rc-change='Thu Oct 16 17:26:28 2014', queued=2ms, exec=0ms
        WebSite_start_0 on node01 'unknown error' (1): call=203, status=Timed Out, last-rc-change='Thu Oct 16 17:26:09 2014', queued=2ms, exec=0ms

Seems like the apache Website resource isn't starting up. Apache was working just fine before I configured DRBD. What did I do wrong?

--
Goi Sihan
gois...@gmail.com
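One way to narrow down a WebSite start failure like the one above is to confirm that Apache's DocumentRoot really is a directory on the node where WebFS is mounted, which is what the manual ls -al /var/www/html check suggested elsewhere in this thread does. The following is only a minimal sketch: the helper names docroot_from_conf and check_docroot are hypothetical, and it assumes a DocumentRoot path without spaces.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Pacemaker or Apache).

# Print the first active DocumentRoot value found in an Apache config file.
# Commented-out lines are skipped because their first field is not exactly
# "DocumentRoot".
docroot_from_conf() {
    awk '$1 == "DocumentRoot" { gsub(/"/, "", $2); print $2; exit }' "$1"
}

# Report whether the configured DocumentRoot is currently a directory.
check_docroot() {
    root=$(docroot_from_conf "$1")
    if [ -z "$root" ]; then
        echo "no DocumentRoot found in $1"
        return 2
    fi
    if [ -d "$root" ]; then
        echo "$root: ok"
    else
        echo "$root: not a directory"
        return 1
    fi
}

# Usage (config path as seen in the log lines quoted in this thread):
#   check_docroot /etc/httpd/conf/httpd.conf
```

Running this on each node while WebFS is mounted there shows whether the mount point is usable at the moment Apache is started; it does not say anything about permissions or SELinux labels.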
Re: [Pacemaker] Linux HA setup for CentOS 6.5
Thanks!

OK, so I've followed the DRBD steps in the guide all the way till cib commit fs in Section 7.4, right before Testing Migration. However, when I do a crm_mon, I get the following failed actions.

    Last updated: Thu Oct 16 17:28:34 2014
    Last change: Thu Oct 16 17:26:04 2014 via crm_shadow on node01
    Stack: cman
    Current DC: node02 - partition with quorum
    Version: 1.1.10-14.el6_5.3-368c726
    2 Nodes configured
    5 Resources configured

    Online: [ node01 node02 ]

    ClusterIP (ocf::heartbeat:IPaddr2): Started node02
    Master/Slave Set: WebDataClone [WebData]
        Masters: [ node02 ]
        Slaves: [ node01 ]
    WebFS (ocf::heartbeat:Filesystem): Started node02

    Failed actions:
        WebSite_start_0 on node02 'unknown error' (1): call=278, status=Timed Out, last-rc-change='Thu Oct 16 17:26:28 2014', queued=2ms, exec=0ms
        WebSite_start_0 on node01 'unknown error' (1): call=203, status=Timed Out, last-rc-change='Thu Oct 16 17:26:09 2014', queued=2ms, exec=0ms

Seems like the apache Website resource isn't starting up. Apache was working just fine before I configured DRBD. What did I do wrong?

On Thu, Oct 16, 2014 at 1:49 PM, Digimer li...@alteeve.ca wrote:
> On 16/10/14 12:14 AM, Sihan Goi wrote:
>> After following the guide, I've successfully managed to get Apache server up and running in the cluster as an active/passive setup, but with some differences. My cluster stack is stated as being cman while the guide's is openais. Not sure if that's a problem. Also, some commands in the guide don't seem to work.
>
> If you can provide examples of what issues you're having, I will be happy to try and help.
>
>> I'm moving on to DRBD installation now, but when I do a yum install drbd-pacemaker drbd-udev, these packages are not available. After some googling, it seems that drbd83-utils/kmod-drbd83 or drbd84-utils/kmod-drbd84 is available via another repo. Does this work with the guide?
>
> You need to get them from a 3rd party repo (or install from source). I personally still use 8.3.16 (consistency during Anvil! generations), but I know that 8.4 is fine on EL6 (and EL7, to address an earlier comment).
>
> I have my own repos with these packages, but you would likely be better served using the ELRepo ones.
>
> https://alteeve.ca/w/AN!Cluster_Tutorial_2#Installing_DRBD
>
> The only real difference is to s/83/84/:
>
>     + yum install drbd84-utils kmod-drbd84
>     - yum install drbd83-utils kmod-drbd83
>
> If you run into any troubles, please share details and I am sure we'll get you sorted out in no time.
>
> Cheers
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without access to education?

--
Goi Sihan
gois...@gmail.com
Re: [Pacemaker] Linux HA setup for CentOS 6.5
Hi,

So I've decided to make things simpler and go with a wired network instead of wireless. I connected both boxes to a router, manually edited the ifcfg-eth0 files to set static IP addresses for both boxes (not before downloading and building a driver for the nic of 1 of the boxes), did a chkconfig NetworkManager off, service NetworkManager stop, and service network restart. I'm able to ping each other via IP address and hostname. I also already have corosync, pacemaker, crmsh and cman installed.

I then did the following as per the guide at http://geekpeek.net/linux-cluster-corosync-pacemaker

    service corosync start - success.
    service pacemaker start - I get a Starting cman...corosync cluster engine is already running [FAILED]

What's up? :(

On Oct 15, 2014 12:23 PM, Sihan Goi gois...@gmail.com wrote:
> No typo.
>
>     [root@node02 network-scripts]# ls -lah /etc/sysconfig/network-scripts/ifcfg-*
>     -rw-r--r--. 1 root root 254 Oct 10 2013 /etc/sysconfig/network-scripts/ifcfg-lo
>
> I installed CentOS 6.5 with the LiveDVD. I found it weird as well that these files were missing.
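The static-address setup described in this message can be sketched as a small generator for an ifcfg file. This is an illustration only: make_ifcfg is a hypothetical helper, the device name and addresses are placeholders, and the key set is a minimal one for the EL6 network service with NetworkManager kept away from the interface.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal static ifcfg file for the EL6
# 'network' service. Arguments: device name, IPv4 address, netmask.
make_ifcfg() {
    cat <<EOF
DEVICE=$1
BOOTPROTO=none
ONBOOT=yes
IPADDR=$2
NETMASK=$3
NM_CONTROLLED=no
EOF
}

# Example with placeholder values (the address is an assumption):
make_ifcfg eth0 192.168.0.111 255.255.255.0
```

The output would be written as root to /etc/sysconfig/network-scripts/ifcfg-eth0, followed by the same chkconfig NetworkManager off, service NetworkManager stop and service network restart sequence mentioned in the message.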
Re: [Pacemaker] Linux HA setup for CentOS 6.5
Hi,

Thanks for the guide! I thought I had the same exact version...mine is also named Pacemaker 1.1 Clusters from Scratch Creating Active/Passive and Active/Active Clusters on Fedora Edition 5, but my version of the document is meant for Fedora 17, and uses pcs and systemctl calls which don't exist on CentOS 6.5. I was trying to get it to work on CentOS 7 but realized support for DRBD on CentOS 7 is really lacking. I'll refer to the version you posted from hereon.

On Wed, Oct 15, 2014 at 11:43 PM, Digimer li...@alteeve.ca wrote:
> Let pacemaker start cman/corosync on EL6. This is the guide that covers it, written by Pacemaker's author:
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Clusters_from_Scratch/index.html
>
> It notes that it's based on Fedora 13, but that maps to EL6 almost perfectly. A very slightly altered approach is here, in my *very* unfinished tutorial:
>
> https://alteeve.ca/w/Anvil!_Tutorial_3_on_EL6#Configuring_the_Anvil.21
>
> The main difference is that Andrew's approach (see section 8.2.2) is to disable quorum via editing /etc/sysconfig/cman, where my approach handles it in the main /etc/cluster/cluster.conf (cman's main config file).
>
> In any case, from then on, start pacemaker and let it handle everything else.
>
> Cheers
>
> digimer
>
> On 15/10/14 04:27 AM, Sihan Goi wrote:
>> Hi,
>>
>> So I've decided to make things simpler and go with a wired network instead of wireless. I connected both boxes to a router, manually edited the ifcfg-eth0 files to set static IP addresses for both boxes (not before downloading and building a driver for the nic of 1 of the boxes), did a chkconfig NetworkManager off, service NetworkManager stop, and service network restart. I'm able to ping each other via IP address and hostname. I also already have corosync, pacemaker, crmsh and cman installed.
>>
>> I then did the following as per the guide at http://geekpeek.net/linux-cluster-corosync-pacemaker
>>
>>     service corosync start - success.
>>     service pacemaker start - I get a Starting cman...corosync cluster engine is already running [FAILED]
>>
>> What's up? :(
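The advice to let pacemaker start cman/corosync can be sketched as the start order below. It is a hedged illustration, not a tested procedure: start_cluster_el6 and run are hypothetical helper names, and it assumes a valid /etc/cluster/cluster.conf already exists. By default the script only prints the commands.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 (as root) to actually run them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop a hand-started corosync, keep it from starting at boot, then let the
# pacemaker init script bring up cman (and corosync through it).
start_cluster_el6() {
    run service corosync stop
    run chkconfig corosync off
    run service pacemaker start
}

start_cluster_el6
```

The point of the ordering is that cman refuses to start while a separately started corosync is already running, which matches the "corosync cluster engine is already running [FAILED]" message quoted above.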
Re: [Pacemaker] Linux HA setup for CentOS 6.5
After following the guide, I've successfully managed to get Apache server up and running in the cluster as an active/passive setup, but with some differences. My cluster stack is stated as being cman while the guide's is openais. Not sure if that's a problem. Also, some commands in the guide don't seem to work.

I'm moving on to DRBD installation now, but when I do a yum install drbd-pacemaker drbd-udev, these packages are not available. After some googling, it seems that drbd83-utils/kmod-drbd83 or drbd84-utils/kmod-drbd84 is available via another repo. Does this work with the guide?

On Thu, Oct 16, 2014 at 9:35 AM, Sihan Goi gois...@gmail.com wrote:
> Hi,
>
> Thanks for the guide! I thought I had the same exact version...mine is also named Pacemaker 1.1 Clusters from Scratch Creating Active/Passive and Active/Active Clusters on Fedora Edition 5, but my version of the document is meant for Fedora 17, and uses pcs and systemctl calls which don't exist on CentOS 6.5. I was trying to get it to work on CentOS 7 but realized support for DRBD on CentOS 7 is really lacking. I'll refer to the version you posted from hereon.
[Pacemaker] Linux HA setup for CentOS 6.5
Hi,

Is there a tutorial showing how to get a basic Linux HA setup with replicated storage (via DRBD) working on CentOS 6.5? I want to have mySQL as the HA resource with the database replicated across the nodes. I've scoured the web for one but it seems that I get stuck in each one somewhere.

To elaborate, I have 2 CentOS 6.5 nodes configured with distinct hostnames and static IPs. They are connected to a wireless AP, and can ping each other.

I tried following this guide - http://clusterlabs.org/quickstart-redhat.html

However, cman will not start when NetworkManager is running, and my nodes cannot connect to the wireless AP without NetworkManager running. Am I missing something or is that the stupidest dependency ever? How is a cluster supposed to work when the nodes aren't connected to one another?

I also tried following the clusters from scratch guide but that seems to rely on systemctl calls which aren't available on CentOS 6.5.

Any help?

--
Goi Sihan
gois...@gmail.com
Re: [Pacemaker] Linux HA setup for CentOS 6.5
No typo.

    [root@node02 network-scripts]# ls -lah /etc/sysconfig/network-scripts/ifcfg-*
    -rw-r--r--. 1 root root 254 Oct 10 2013 /etc/sysconfig/network-scripts/ifcfg-lo

I installed CentOS 6.5 with the LiveDVD. I found it weird as well that these files were missing.

On Wed, Oct 15, 2014 at 11:54 AM, Digimer li...@alteeve.ca wrote:
> Sure there isn't a typo there?
>
>     an-c05n01:~# ls -lah /etc/sysconfig/network-scripts/ifcfg-*
>     -rw-r--r--. 1 root root 225 Jan 16 2013 /etc/sysconfig/network-scripts/ifcfg-bond0
>     -rw-r--r--. 1 root root 220 Jan 16 2013 /etc/sysconfig/network-scripts/ifcfg-bond1
>     -rw-r--r--. 1 root root 198 Jan 16 2013 /etc/sysconfig/network-scripts/ifcfg-bond2
>     -rw-r--r--. 1 root root 149 Jan 16 2013 /etc/sysconfig/network-scripts/ifcfg-eth0
>     -rw-r--r--. 1 root root 144 Jan 16 2013 /etc/sysconfig/network-scripts/ifcfg-eth1
>     -rw-r--r--. 1 root root 152 Mar 14 2013 /etc/sysconfig/network-scripts/ifcfg-eth2
>     -rw-r--r--. 1 root root 149 Jan 16 2013 /etc/sysconfig/network-scripts/ifcfg-eth3
>     -rw-r--r--. 1 root root 144 Jan 16 2013 /etc/sysconfig/network-scripts/ifcfg-eth4
>     -rw-r--r--. 1 root root 152 Mar 14 2013 /etc/sysconfig/network-scripts/ifcfg-eth5
>     -rw-r--r--. 1 root root 254 Jul 22 09:56 /etc/sysconfig/network-scripts/ifcfg-lo
>     -rw-r--r--. 1 root root 213 Mar 13 2013 /etc/sysconfig/network-scripts/ifcfg-vbr2
>
> I've never seen an EL6 install without the files there, 'network' or NetworkManager aside.
>
> digimer
>
> On 14/10/14 11:32 PM, Sihan Goi wrote:
>> There aren't any config files in /etc/sysconfig/network-scripts. When I was using CentOS 7, the config files were there (ifcfg-something) but in this CentOS 6.5 installation, they are missing.
>>
>> Is it possible to not use cman, and just use corosync and pacemaker? If so, how?
>>
>> On Wed, Oct 15, 2014 at 11:22 AM, Digimer li...@alteeve.ca wrote:
>>> You can manually configure the wireless LAN without NetworkManager. If you take a look, there should be existing config files in /etc/sysconfig/network-scripts/ for the wireless connection. I've not done it myself since many Fedoras ago, but I believe you can change NM_CONTROLLED=no and then start it up with /etc/init.d/network start. I could be a bit wrong, but I am sure you can make wireless work without NM.
>>>
>>> Question; Servers with WLAN? I assume these won't be used for corosync?
>>>
>>> digimer
>>>
>>> On 14/10/14 11:17 PM, Sihan Goi wrote:
>>>> Hi,
>>>>
>>>> Is there a tutorial showing how to get a basic Linux HA setup with replicated storage (via DRBD) working on CentOS 6.5? I want to have mySQL as the HA resource with the database replicated across the nodes. I've scoured the web for one but it seems that I get stuck in each one somewhere.
>>>>
>>>> To elaborate, I have 2 CentOS 6.5 nodes configured with distinct hostnames and static IPs. They are connected to a wireless AP, and can ping each other.
>>>>
>>>> I tried following this guide - http://clusterlabs.org/quickstart-redhat.html
>>>>
>>>> However, cman will not start when NetworkManager is running, and my nodes cannot connect to the wireless AP without NetworkManager running. Am I missing something or is that the stupidest dependency ever? How is a cluster supposed to work when the nodes aren't connected to one another?
>>>>
>>>> I also tried following the clusters from scratch guide but that seems to rely on systemctl calls which aren't available on CentOS 6.5.
>>>>
>>>> Any help?
Re: [Pacemaker] ERROR: Unable to find nic or netmask.
Figured out the problem - the firewall rules are somehow not persistent. After running the following commands:

    iptables -I INPUT -m state --state NEW -p udp -m multiport --dports 5404,5405 -j ACCEPT
    iptables -I INPUT -p tcp -m state --state NEW -m tcp --dport 2224 -j ACCEPT
    iptables -I INPUT -p igmp -j ACCEPT
    iptables -I INPUT -m addrtype --dst-type MULTICAST -j ACCEPT
    service iptables save

both nodes are able to communicate with each other.

Seems like several things aren't persistent upon reboots, and need to be restarted/reconfigured. Is this the intended behavior?

On Tue, Sep 2, 2014 at 2:05 PM, Nikita Michalko michalko.sys...@a-i-p.com wrote:
> Hi,
>
> maybe the following is helpful:
>
> http://httpd.apache.org/docs/trunk/bind.html
> http://ubuntuforums.org/showthread.php?t=1636667
>
> HTH
>
> Nikita
>
> On 02.09.2014 07:47, Sihan Goi wrote:
>> Hi,
>>
>> After some investigation, it seems that my Apache is having trouble starting on both nodes. I get the following error message when I try to restart the service:
>>
>>     Job for httpd.service failed. See 'systemctl status httpd.service' and 'journalctl -xn' for details.
>>
>> systemctl status httpd.service shows the following output:
>>
>>     httpd.service - The Apache HTTP Server
>>        Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled)
>>        Active: failed (Result: exit-code) since Tue 2014-09-02 13:45:52 SGT; 8s ago
>>       Process: 26095 ExecStop=/bin/kill -WINCH ${MAINPID} (code=exited, status=0/SUCCESS)
>>       Process: 26093 ExecStart=/usr/sbin/httpd $OPTIONS -DFOREGROUND (code=exited, status=1/FAILURE)
>>      Main PID: 26093 (code=exited, status=1/FAILURE)
>>
>>     Sep 02 13:45:52 node02 httpd[26093]: AH00558: httpd: Could not reliably det...ge
>>     Sep 02 13:45:52 node02 httpd[26093]: (98)Address already in use: AH00072: m...80
>>     Sep 02 13:45:52 node02 httpd[26093]: no listening sockets available, shutti...wn
>>     Sep 02 13:45:52 node02 httpd[26093]: AH00015: Unable to open logs
>>     Sep 02 13:45:52 node02 systemd[1]: httpd.service: main process exited, code...RE
>>     Sep 02 13:45:52 node02 systemd[1]: Failed to start The Apache HTTP Server.
>>     Sep 02 13:45:52 node02 systemd[1]: Unit httpd.service entered failed state.
>>     Hint: Some lines were ellipsized, use -l to show in full.
>>
>> /var/log/messages also shows similar messages:
>>
>>     Sep 2 13:41:12 node02 systemd: Starting The Apache HTTP Server...
>>     Sep 2 13:41:12 node02 httpd: AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 192.168.0.112. Set the 'ServerName' directive globally to suppress this message
>>     Sep 2 13:41:12 node02 httpd: (98)Address already in use: AH00072: make_sock: could not bind to address 127.0.0.1:80
>>     Sep 2 13:41:12 node02 httpd: no listening sockets available, shutting down
>>     Sep 2 13:41:12 node02 httpd: AH00015: Unable to open logs
>>     Sep 2 13:41:12 node02 systemd: httpd.service: main process exited, code=exited, status=1/FAILURE
>>     Sep 2 13:41:12 node02 systemd: Failed to start The Apache HTTP Server.
>>     Sep 2 13:41:12 node02 systemd: Unit httpd.service entered failed state.
>>
>> Is this related to the problem?
>>
>> On Tue, Sep 2, 2014 at 12:42 PM, Teerapatr Kittiratanachai maillist...@gmail.com wrote:
>>> Try to set cidr_netmask=32 for the resource only, and let the physical interface's netmask be 24.
>>>
>>> On Tue, Sep 2, 2014 at 11:27 AM, Sihan Goi gois...@gmail.com wrote:
>>>> Got it. Changed the netmask for both PCs to 255.255.255.0 and changed cidr_netmask to 24 and it works...sort of. It was working for a while, and then I rebooted both PCs, and now each thinks it's online and the other is offline.
>>>>
>>>> pcs status on my node01 gives the following output:
>>>>
>>>>     Cluster name: cluster_web
>>>>     Last updated: Tue Sep 2 12:21:25 2014
>>>>     Last change: Tue Sep 2 12:13:27 2014 via cibadmin on node02
>>>>     Stack: corosync
>>>>     Current DC: node01 (1) - partition WITHOUT quorum
>>>>     Version: 1.1.10-32.el7_0-368c726
>>>>     2 Nodes configured
>>>>     2 Resources configured
>>>>
>>>>     Online: [ node01 ]
>>>>     OFFLINE: [ node02 ]
>>>>
>>>>     Full list of resources:
>>>>
>>>>     virtual_ip (ocf::heartbeat:IPaddr2): Started node01
>>>>     webserver (ocf::heartbeat:apache): Started node01
>>>>
>>>>     PCSD Status:
>>>>       node01: Offline
>>>>       node02: Online
>>>>
>>>>     Daemon Status:
>>>>       corosync: active/disabled
>>>>       pacemaker: active/disabled
>>>>       pcsd: active/disabled
>>>>
>>>> However, pcs status on node02 shows the following output:
>>>>
>>>>     Cluster name: cluster_web
>>>>     Last updated: Tue Sep 2 12:20:41 2014
>>>>     Last change: Tue Sep 2 11:59:03 2014 via cibadmin on node02
>>>>     Stack: corosync
>>>>     Current DC: node02 (2) - partition WITHOUT quorum
>>>>     Version: 1.1.10-32.el7_0-368c726
>>>>     2 Nodes
Re: [Pacemaker] ERROR: Unable to find nic or netmask.
I mean things like firewall settings, as well as services like pcsd, pacemaker and corosync not starting up automatically sometimes. On Tue, Sep 16, 2014 at 5:10 PM, Nikita Michalko michalko.sys...@a-i-p.com wrote: On 16.09.2014 10:31, Sihan Goi wrote: Figured out the problem - the firewall rules are somehow not persistent. After running the following commands: iptables -I INPUT -m state --state NEW -p udp -m multiport --dports 5404,5405 -j ACCEPT iptables -I INPUT -p tcp -m state --state NEW -m tcp --dport 2224 -j ACCEPT iptables -I INPUT -p igmp -j ACCEPT iptables -I INPUT -m addrtype --dst-type MULTICAST -j ACCEPT service iptables save Both nodes are able to communicate with each other. Seems like several things aren't persistent upon reboots, and need to be restarted/reconfigured. Is this the intended behavior? What do you mean with several things ? Firewall/iptables on CentOS 7? Or Pacemaker/Corosync/pcs ? Nikita On Tue, Sep 2, 2014 at 2:05 PM, Nikita Michalko michalko.sys...@a-i-p.com michalko.sys...@a-i-p.com wrote: Hi, maybe is following helpfull:https://www.google.at/url?sa=trct=jq=esrc=ssource=webcd=2cad=rjauact=8ved=0CDEQFjABurl=http%3A%2F%2Fhttpd.apache.org%2Fdocs%2Ftrunk%2Fbind.htmlei=QV0FVK2YBYHO0QXPxYHQDwusg=AFQjCNGCErofEEVtclS_x6ZXA3bXvJiawwsig2=hR8kUWRcpmN4PE1V42t9kgbvm=bv.74115972,d.bGEhttps://www.google.at/url?sa=trct=jq=esrc=ssource=webcd=1cad=rjauact=8ved=0CC0QrAIwAAurl=http%3A%2F%2Fubuntuforums.org%2Fshowthread.php%3Ft%3D1636667ei=QV0FVK2YBYHO0QXPxYHQDwusg=AFQjCNHcs7alJ_RwBc4tWq2X7ew4ynEmzgsig2=ra1qjZ8nly8opwawrACidwbvm=bv.74115972,d.bGE HTH Nikita On 02.09.2014 07:47, Sihan Goi wrote: Hi, After some investigation, it seems that my Apache is having trouble starting in both nodes. I get the following error message when I try to restart the service: Job for httpd.service failed. See 'systemctl status httpd.service' and 'journalctl -xn' for details. 
systemctl status httpd.service shows the following output: httpd.service - The Apache HTTP Server Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled) Active: failed (Result: exit-code) since Tue 2014-09-02 13:45:52 SGT; 8s ago Process: 26095 ExecStop=/bin/kill -WINCH ${MAINPID} (code=exited, status=0/SUCCESS) Process: 26093 ExecStart=/usr/sbin/httpd $OPTIONS -DFOREGROUND (code=exited, status=1/FAILURE) Main PID: 26093 (code=exited, status=1/FAILURE) Sep 02 13:45:52 node02 httpd[26093]: AH00558: httpd: Could not reliably det...ge Sep 02 13:45:52 node02 httpd[26093]: (98)Address already in use: AH00072: m...80 Sep 02 13:45:52 node02 httpd[26093]: no listening sockets available, shutti...wn Sep 02 13:45:52 node02 httpd[26093]: AH00015: Unable to open logs Sep 02 13:45:52 node02 systemd[1]: httpd.service: main process exited, code...RE Sep 02 13:45:52 node02 systemd[1]: Failed to start The Apache HTTP Server. Sep 02 13:45:52 node02 systemd[1]: Unit httpd.service entered failed state. Hint: Some lines were ellipsized, use -l to show in full. /var/log/messages also shows similar messages Sep 2 13:41:12 node02 systemd: Starting The Apache HTTP Server... Sep 2 13:41:12 node02 httpd: AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 192.168.0.112. Set the 'ServerName' directive globally to suppress this message Sep 2 13:41:12 node02 httpd: (98)Address already in use: AH00072: make_sock: could not bind to address 127.0.0.1:80 Sep 2 13:41:12 node02 httpd: no listening sockets available, shutting down Sep 2 13:41:12 node02 httpd: AH00015: Unable to open logs Sep 2 13:41:12 node02 systemd: httpd.service: main process exited, code=exited, status=1/FAILURE Sep 2 13:41:12 node02 systemd: Failed to start The Apache HTTP Server. Sep 2 13:41:12 node02 systemd: Unit httpd.service entered failed state. Is this related to the problem? 
On Tue, Sep 2, 2014 at 12:42 PM, Teerapatr Kittiratanachai maillist...@gmail.com wrote:

Try to set cidr_netmask=32 for the resource only, and let the physical interface's netmask be 24.

On Tue, Sep 2, 2014 at 11:27 AM, Sihan Goi gois...@gmail.com wrote:

Got it. Changed the netmask for both PCs to 255.255.255.0 and changed cidr_netmask to 24 and it works...sort of. It was working for a while, and then I rebooted both PCs, and now each thinks it's online and the other is offline.

pcs status on my node01 gives the following output:

Cluster name: cluster_web
Last updated: Tue Sep 2 12:21:25 2014
Last change: Tue Sep 2 12:13:27 2014 via cibadmin on node02
Stack: corosync
Current DC: node01 (1) - partition WITHOUT quorum
Version: 1.1.10-32.el7_0-368c726
2 Nodes configured
2 Resources configured

Online: [ node01 ]
OFFLINE: [ node02 ]

Full list of resources:

 virtual_ip (ocf::heartbeat:IPaddr2): Started node01
 webserver (ocf::heartbeat:apache
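The "(98)Address already in use ... could not bind to address 127.0.0.1:80" lines in the quoted log mean that something already held that address and port when httpd tried to start. A minimal Python sketch for checking this from a shell on the node is given here; the helper name port_in_use is hypothetical and not part of any tool mentioned in the thread, and it only reports that the port is taken, not which process holds it.

```python
import errno
import socket


def port_in_use(host, port):
    """Return True if a TCP bind to (host, port) fails with EADDRINUSE.

    Hypothetical helper for illustration. Binding to ports below 1024
    (such as 80) normally needs root; any other error, including a
    permission error, is re-raised rather than guessed at.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise
    # The bind succeeded, so nothing else was holding the address.
    return False
```

Run as root, port_in_use("127.0.0.1", 80) returning True before httpd is started would suggest another listener (possibly an httpd instance started some other way) is already bound there.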
[Pacemaker] Notification when a node is down
Hi, Is there any way for a Pacemaker/Corosync/PCS setup to send a notification when it detects that a node in a cluster is down? I read that Pacemaker and Corosync log events to syslog, but where is the syslog file in CentOS? Do they log events such as a failover occurrence? Thanks.

-- Goi Sihan gois...@gmail.com
[Pacemaker] pcs cluster auth shows Error: Unable to communicate with node message
Hi, I had a basic HA setup working with 2 nodes previously, running a simple Apache web server on a private local network. However, I'm having trouble getting it to work right now, and I haven't changed anything other than rebooting a few times.

Firstly, I've noticed that I need to start the pcsd service manually after every reboot with systemctl start pcsd. Corosync seems to start automatically.

After starting pcsd and restarting the cluster, the HA cluster used to work. However, now it doesn't seem to. pcs status on node01 would show node01 as online and node02 as offline, and vice versa.

When I try pcs cluster auth node02 from node01, I'd get Error: Unable to communicate with node02, even though I'm able to ping both the IP address and hostname of node02 from node01.

node01 and node02 would both serve their own web page when I enter the virtual IP address in the browser URL bar. However, a 3rd device connected to the same network is unable to load the webpage from the virtual IP address.

What's wrong? Thanks!
Re: [Pacemaker] pcs cluster auth shows Error: Unable to communicate with node message
Tried that, same problem.

On Sep 9, 2014 3:44 PM, emmanuel segura emi2f...@gmail.com wrote:

systemctl enable pcsd.service ?

2014-09-09 9:37 GMT+02:00 Sihan Goi gois...@gmail.com:

Hi, I had a basic HA setup working with 2 nodes previously, running a simple Apache web server on a private local network. However, I'm having trouble getting it to work right now, and I haven't changed anything other than rebooting a few times.

Firstly, I've noticed that I need to start the pcsd service manually after every reboot with systemctl start pcsd. Corosync seems to start automatically.

After starting pcsd and restarting the cluster, the HA cluster used to work. However, now it doesn't seem to. pcs status on node01 would show node01 as online and node02 as offline, and vice versa.

When I try pcs cluster auth node02 from node01, I'd get Error: Unable to communicate with node02, even though I'm able to ping both the IP address and hostname of node02 from node01.

node01 and node02 would both serve their own web page when I enter the virtual IP address in the browser URL bar. However, a 3rd device connected to the same network is unable to load the webpage from the virtual IP address.

What's wrong? Thanks!
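Whether a daemon comes back after a reboot depends on it being enabled, not merely started, which is what the systemctl enable suggestion above is getting at. pcs status prints a "Daemon Status:" block of "name: state/enablement" pairs (for example "pcsd: active/disabled"). A small Python sketch that lists the daemons not enabled at boot is given here, assuming that format; the function name and the SAMPLE text are illustrative only.

```python
import re

# Illustrative sample in the "name: state/enablement" form that
# pcs status prints under "Daemon Status:".
SAMPLE = """\
Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/disabled
"""


def not_enabled_at_boot(pcs_status_text):
    """Return daemon names whose enablement field is not 'enabled'.

    Hypothetical helper: only the text after 'Daemon Status:' is
    scanned, so it also works on output flattened onto one line.
    """
    tail = pcs_status_text.partition("Daemon Status:")[2]
    pairs = re.findall(r"(\w+):\s*(\w+)/(\w+)", tail)
    return [name for name, _state, boot in pairs if boot != "enabled"]


if __name__ == "__main__":
    for daemon in not_enabled_at_boot(SAMPLE):
        print("not enabled at boot:", daemon)
```

Each name it reports is a candidate for systemctl enable on that node; the sketch does not run any command itself.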
[Pacemaker] ERROR: Unable to find nic or netmask.
Hi, I'm trying to create an HA cluster with 2 CentOS 7 PCs connected to a wireless AP. The PCs have the static IP addresses 192.168.0.111 and 192.168.0.112 and the hostnames node01 and node02 respectively.

I've tried to create a virtual IP address of 192.168.0.110 using the following command:

    pcs resource create virtual_ip ocf:heartbeat:IPaddr2 ip=192.168.0.110 cidr_netmask=32 op monitor interval=30s

However, when I do a pcs status resources I get the following output:

    virtual_ip (ocf::heartbeat:IPaddr2): Stopped

The virtual IP is stopped rather than started. I looked into /var/log/messages and /var/log/pacemaker.log and I find the following error messages:

    node02 IPaddr2(virtual_ip)[25451]: ERROR: Unable to find nic or netmask.
    node02 IPaddr2(virtual_ip)[25451]: ERROR: [findif] failed

It seems that it's unable to find my nic. How can I fix this? Thanks.
Re: [Pacemaker] ERROR: Unable to find nic or netmask.
Got it. Changed the netmask for both PCs to 255.255.255.0 and changed cidr_netmask to 24 and it works...sort of. It was working for a while, and then I rebooted both PCs, and now each thinks it's online and the other is offline.

pcs status on my node01 gives the following output:

Cluster name: cluster_web
Last updated: Tue Sep 2 12:21:25 2014
Last change: Tue Sep 2 12:13:27 2014 via cibadmin on node02
Stack: corosync
Current DC: node01 (1) - partition WITHOUT quorum
Version: 1.1.10-32.el7_0-368c726
2 Nodes configured
2 Resources configured

Online: [ node01 ]
OFFLINE: [ node02 ]

Full list of resources:

 virtual_ip (ocf::heartbeat:IPaddr2): Started node01
 webserver (ocf::heartbeat:apache): Started node01

PCSD Status:
  node01: Offline
  node02: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/disabled

However, pcs status on node02 shows the following output:

Cluster name: cluster_web
Last updated: Tue Sep 2 12:20:41 2014
Last change: Tue Sep 2 11:59:03 2014 via cibadmin on node02
Stack: corosync
Current DC: node02 (2) - partition WITHOUT quorum
Version: 1.1.10-32.el7_0-368c726
2 Nodes configured
2 Resources configured

Online: [ node02 ]
OFFLINE: [ node01 ]

Full list of resources:

 virtual_ip (ocf::heartbeat:IPaddr2): Started node02
 webserver (ocf::heartbeat:apache): Started node02

PCSD Status:
  node01: Offline
  node02: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/disabled

Seems like each node thinks it's online and the other is not. I'm running HA on an apache webserver, and if I access the webpage on node01, I get node01's index.html. If I access it on node02, I get node02's index.html. If I access it via another PC connected to the same AP, the webpage is unavailable. What could be wrong?

On Mon, Sep 1, 2014 at 9:09 PM, John Lauro john.la...@covenanteyes.com wrote:

ip=192.168.0.110 cidr_netmask=32

/32 leaves no room for any other IP addresses on that interface and so you have to specify the nic. Are you certain 192.168.0.111 and 192.168.0.112 do not have a different netmask from 255.255.255.255, like 255.255.255.0 for /24 or 255.255.0.0 for /16? If they do have 255.255.255.255 too, then they are probably not set up correctly...

PS: cidr_netmask is optional. Assuming a proper netmask (not 255.255.255.255) is on 192.168.0.111 and 192.168.0.112 it should work without specifying cidr_netmask.

From: Sihan Goi gois...@gmail.com
To: pacemaker@oss.clusterlabs.org
Sent: Monday, September 1, 2014 4:17:20 AM
Subject: [Pacemaker] ERROR: Unable to find nic or netmask.

Hi, I'm trying to create an HA cluster with 2 CentOS 7 PCs connected to a wireless AP. The PCs have the static IP addresses 192.168.0.111 and 192.168.0.112 and the hostnames node01 and node02 respectively.

I've tried to create a virtual IP address of 192.168.0.110 using the following command:

    pcs resource create virtual_ip ocf:heartbeat:IPaddr2 ip=192.168.0.110 cidr_netmask=32 op monitor interval=30s

However, when I do a pcs status resources I get the following output:

    virtual_ip (ocf::heartbeat:IPaddr2): Stopped

The virtual IP is stopped rather than started. I looked into /var/log/messages and /var/log/pacemaker.log and I find the following error messages:

    node02 IPaddr2(virtual_ip)[25451]: ERROR: Unable to find nic or netmask.
    node02 IPaddr2(virtual_ip)[25451]: ERROR: [findif] failed

It seems that it's unable to find my nic. How can I fix this? Thanks.
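John's point that /32 "leaves no room for any other IP addresses" can be checked with plain subnet arithmetic. The Python sketch below uses the standard ipaddress module; it only illustrates the address math with the addresses from this thread and is not how the IPaddr2 agent's findif lookup is implemented. The function names shares_subnet and prefix_len are hypothetical.

```python
import ipaddress


def shares_subnet(vip_with_prefix, node_addr):
    """True if node_addr falls inside the network implied by the VIP/prefix."""
    network = ipaddress.ip_interface(vip_with_prefix).network
    return ipaddress.ip_address(node_addr) in network


def prefix_len(dotted_netmask):
    """Convert a dotted netmask such as 255.255.255.0 to its prefix length."""
    return ipaddress.ip_network("0.0.0.0/" + dotted_netmask).prefixlen


if __name__ == "__main__":
    # With /32 the virtual IP's "network" contains only the VIP itself,
    # so neither node address can be matched to it.
    print(shares_subnet("192.168.0.110/32", "192.168.0.111"))
    # With /24 the node address and the VIP sit in the same network.
    print(shares_subnet("192.168.0.110/24", "192.168.0.111"))
    print(prefix_len("255.255.255.0"), prefix_len("255.255.255.255"))
```

This is consistent with the thread: once the nodes carried a 255.255.255.0 netmask and the resource used cidr_netmask=24, the virtual IP started.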
Re: [Pacemaker] ERROR: Unable to find nic or netmask.
Hi, After some investigation, it seems that my Apache is having trouble starting on both nodes. I get the following error message when I try to restart the service:

Job for httpd.service failed. See 'systemctl status httpd.service' and 'journalctl -xn' for details.

systemctl status httpd.service shows the following output:

httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled)
   Active: failed (Result: exit-code) since Tue 2014-09-02 13:45:52 SGT; 8s ago
  Process: 26095 ExecStop=/bin/kill -WINCH ${MAINPID} (code=exited, status=0/SUCCESS)
  Process: 26093 ExecStart=/usr/sbin/httpd $OPTIONS -DFOREGROUND (code=exited, status=1/FAILURE)
 Main PID: 26093 (code=exited, status=1/FAILURE)

Sep 02 13:45:52 node02 httpd[26093]: AH00558: httpd: Could not reliably det...ge
Sep 02 13:45:52 node02 httpd[26093]: (98)Address already in use: AH00072: m...80
Sep 02 13:45:52 node02 httpd[26093]: no listening sockets available, shutti...wn
Sep 02 13:45:52 node02 httpd[26093]: AH00015: Unable to open logs
Sep 02 13:45:52 node02 systemd[1]: httpd.service: main process exited, code...RE
Sep 02 13:45:52 node02 systemd[1]: Failed to start The Apache HTTP Server.
Sep 02 13:45:52 node02 systemd[1]: Unit httpd.service entered failed state.
Hint: Some lines were ellipsized, use -l to show in full.

/var/log/messages also shows similar messages:

Sep 2 13:41:12 node02 systemd: Starting The Apache HTTP Server...
Sep 2 13:41:12 node02 httpd: AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 192.168.0.112. Set the 'ServerName' directive globally to suppress this message
Sep 2 13:41:12 node02 httpd: (98)Address already in use: AH00072: make_sock: could not bind to address 127.0.0.1:80
Sep 2 13:41:12 node02 httpd: no listening sockets available, shutting down
Sep 2 13:41:12 node02 httpd: AH00015: Unable to open logs
Sep 2 13:41:12 node02 systemd: httpd.service: main process exited, code=exited, status=1/FAILURE
Sep 2 13:41:12 node02 systemd: Failed to start The Apache HTTP Server.
Sep 2 13:41:12 node02 systemd: Unit httpd.service entered failed state.

Is this related to the problem?

On Tue, Sep 2, 2014 at 12:42 PM, Teerapatr Kittiratanachai maillist...@gmail.com wrote:

Try to set cidr_netmask=32 for the resource only, and let the physical interface's netmask be 24.

On Tue, Sep 2, 2014 at 11:27 AM, Sihan Goi gois...@gmail.com wrote:

Got it. Changed the netmask for both PCs to 255.255.255.0 and changed cidr_netmask to 24 and it works...sort of. It was working for a while, and then I rebooted both PCs, and now each thinks it's online and the other is offline.
pcs status on my node01 gives the following output:

Cluster name: cluster_web
Last updated: Tue Sep 2 12:21:25 2014
Last change: Tue Sep 2 12:13:27 2014 via cibadmin on node02
Stack: corosync
Current DC: node01 (1) - partition WITHOUT quorum
Version: 1.1.10-32.el7_0-368c726
2 Nodes configured
2 Resources configured

Online: [ node01 ]
OFFLINE: [ node02 ]

Full list of resources:

 virtual_ip (ocf::heartbeat:IPaddr2): Started node01
 webserver (ocf::heartbeat:apache): Started node01

PCSD Status:
  node01: Offline
  node02: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/disabled

However, pcs status on node02 shows the following output:

Cluster name: cluster_web
Last updated: Tue Sep 2 12:20:41 2014
Last change: Tue Sep 2 11:59:03 2014 via cibadmin on node02
Stack: corosync
Current DC: node02 (2) - partition WITHOUT quorum
Version: 1.1.10-32.el7_0-368c726
2 Nodes configured
2 Resources configured

Online: [ node02 ]
OFFLINE: [ node01 ]

Full list of resources:

 virtual_ip (ocf::heartbeat:IPaddr2): Started node02
 webserver (ocf::heartbeat:apache): Started node02

PCSD Status:
  node01: Offline
  node02: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/disabled

Seems like each node thinks it's online and the other is not. I'm running HA on apache webserver, and if I access the webpage on node01, I get node01's index.html. If I access it on node02, I get node02's index.html. If I access it via another PC connected to the same AP, the webpage is unavailable. What could be wrong?

On Mon, Sep 1, 2014 at 9:09 PM, John Lauro john.la...@covenanteyes.com wrote:

ip=192.168.0.110 cidr_netmask=32

/32 leaves no room for any other IP addresses on that interface and so you have to specify the nic. Are you certain 192.168.0.111 and 192.168.0.112 do not have a different netmask from 255.255.255.255, like 255.255.255.0 for /24 or 255.255.0.0 for /16? If they do have 255.255.255.255 too, then they are probably not setup correctly...
PS: cidr_netmask is optional. Assuming a proper netmask (not 255.255.255.2555