Re: [Linux-ha-dev] Monitoring Process Death
On 2010-05-21T12:12:12, Bob Schatz bsch...@yahoo.com wrote:

> I think the basic requirements are:
> 1. When a process starts it registers itself with a kernel component.
>    This registration also gets passed an action.

The easiest way would be for the RA to register pids to be monitored to lrmd, and have lrmd generate failed monitor messages if one of them goes away. (Of course, the RA needs to unregister them as part of the stop operation!)

No kernel changes are necessary; this can all be implemented via a patch to lrmd and some supporting shell funcs.

Regards,
    Lars

-- 
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] Monitoring Process Death
Thanks Lars and Dejan for your feedback. I have started reading the lrmd source.

One thing I am worried about is that if I give a PID to lrmd, how will lrmd monitor it? My RA is a shell script that forks off a daemon. If I give this daemon PID to lrmd, does lrmd start a thread to monitor the PID? I don't think that lrmd would get a SIGCHLD in my case.

Thanks,

Bob

----- Original Message -----
From: Lars Marowsky-Bree l...@novell.com
To: High-Availability Linux Development List linux-ha-dev@lists.linux-ha.org
Sent: Fri, May 28, 2010 8:49:28 AM
Subject: Re: [Linux-ha-dev] Monitoring Process Death

> The easiest way would be for the RA to register pids to be monitored to
> lrmd, and have lrmd generate failed monitor messages if one of them goes
> away. (Of course, the RA needs to unregister them as part of the stop
> operation!)
>
> No kernel changes are necessary; this can all be implemented via a patch
> to lrmd and some supporting shell funcs.
[...]
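Absent SIGCHLD, the usual fallback is polling: `kill -0` asks the kernel whether a PID still exists without delivering a signal. A minimal sketch of how an RA's monitor action could test a detached daemon's PID (the pidfile path and helper name here are illustrative, not from the thread):

```shell
# Hypothetical monitor helper: returns 0 if the daemon behind the pidfile
# is still alive, non-zero otherwise. The pidfile path is illustrative only.
daemon_alive() {
    pidfile="$1"
    [ -r "$pidfile" ] || return 1          # no pidfile: treat as not running
    pid=$(cat "$pidfile")
    kill -0 "$pid" 2>/dev/null             # signal 0: existence check only
}

# Example: track a detached background process the way an RA would.
sleep 300 &
echo $! > /tmp/example.pid
daemon_alive /tmp/example.pid && echo "running"
```

Note this only proves a process with that PID exists, not that it is still the daemon you started; production RAs usually cross-check the process name as well.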
Re: [Linux-HA] stonith: failure using expect+ssh (solved)
On Thu, May 27, 2010 at 09:48:15PM -0600, Tim Serong wrote:
> On 5/27/2010 at 10:24 PM, Lars Ellenberg lars.ellenb...@linbit.com wrote:
> > On Thu, May 27, 2010 at 12:46:14AM +0200, Matthias Ferdinand wrote:
> > > --On Wednesday, May 26, 2010 12:00:02 -0600 linux-ha-requ...@lists.linux-ha.org wrote:
> > > > OK, but it'd still be better/easier to just use ssh with public key
> > > > authentication. For telnet, there is a python plugin ibmrsa-telnet
> > > > which could be modified for iLO.
> > >
> > > DISPLAY=dummy SSH_ASKPASS=/bin/my_cat_passwd_file.sh ssh somewhere
> > >
> > > my_cat_passwd_file.sh:
> > >     #!/bin/sh
> > >     cat /etc/passwd_file
> > >
> > > /etc/passwd_file: mode 0600, owner root root, containing your password ;-)
> >
> > Thank you for your hints. SSH_ASKPASS did not work for me (using password
> > auth); ssh keeps prompting for the password. Apparently SSH_ASKPASS is
> > for passphrases only.

No. But as long as ssh _does_ have a tty, it will ask for the password on the tty ;-) Only if it does not find a tty will it use the askpass hook.

> > but as the script now does the job I think I will just leave it at that.

It needs to work for you, that is what matters.

> If you still want to fiddle around with SSH_ASKPASS, it might help to
> redirect stdin from /dev/null...

Nope. ssh explicitly opens /dev/tty. So for it to not use the tty, it needs to have no tty ;-) To get rid of a tty, you usually do setsid.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
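Putting Matthias's trick and the setsid point together: ssh only consults SSH_ASKPASS when DISPLAY is set and it has no controlling terminal, so the call has to be wrapped in setsid. A hedged sketch (all paths and the target host are examples, not taken from the thread):

```shell
# Create an askpass helper that just prints the stored password.
# Paths are illustrative; the password file should be mode 0600, root-owned.
cat > /tmp/askpass.sh <<'EOF'
#!/bin/sh
cat /tmp/passwd_file
EOF
chmod 700 /tmp/askpass.sh
echo 'secret' > /tmp/passwd_file
chmod 600 /tmp/passwd_file

# setsid starts ssh in a new session with no controlling tty, so it cannot
# open /dev/tty and falls back to the askpass hook instead of prompting:
# DISPLAY=dummy SSH_ASKPASS=/tmp/askpass.sh setsid ssh root@ilo-host poweroff
```

As the thread concludes, this is a workaround; key-based authentication remains the cleaner option.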
Re: [Linux-HA] Problem with migration on a nfs/exportfs setup while copying via rsync
Hi,

On Thu, May 27, 2010 at 11:23:16AM +0200, RaSca wrote:
> Hi all,
> I've got some problems with my setup and I'm trying to understand whether
> I am missing something or it is a bug. Here is how to reproduce the error:
>
> node debian-lenny-nodo1
> node debian-lenny-nodo2
> primitive drbd0 ocf:linbit:drbd \
>         params drbd_resource=r0 \
>         op monitor interval=20s timeout=40s \
>         op start interval=0 timeout=240s \
>         op stop interval=0 timeout=100s
> primitive nfs-common lsb:nfs-common
> primitive nfs-kernel-server lsb:nfs-kernel-server
> primitive ping ocf:pacemaker:ping \
>         params host_list=192.168.1.1 name=ping \
>         op monitor interval=60s timeout=60s \
>         op start interval=0 timeout=60s
> primitive portmap lsb:portmap
> primitive store-LVM ocf:heartbeat:LVM \
>         params volgrpname=vg_drbd \
>         op monitor interval=10s timeout=30s \
>         op start interval=0 timeout=30s \
>         op stop interval=0 timeout=30s
> primitive store-exportfs ocf:heartbeat:exportfs \
>         params directory=/store/share clientspec=192.168.1.0/24 options=rw,sync,no_subtree_check,no_root_squash fsid=1 \
>         op monitor interval=10s timeout=30s \
>         op start interval=0 timeout=40s \
>         op stop interval=0 timeout=40s \
>         meta target-role=Started
> primitive store-fs ocf:heartbeat:Filesystem \
>         params device=/dev/vg_drbd/lv_store directory=/store fstype=ext3 \
>         op monitor interval=20s timeout=40s \
>         op start interval=0 timeout=60s \
>         op stop interval=0 timeout=60s \
>         meta is-managed=true
> primitive store-ip ocf:heartbeat:IPaddr2 \
>         params ip=192.168.1.53 nic=bond0 \
>         op monitor interval=20s timeout=40s
> group nfs portmap nfs-common nfs-kernel-server
> group store store-ip store-LVM store-fs store-exportfs
> ms ms-drbd0 drbd0 \
>         meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
> clone nfs_clone nfs \
>         meta globally-unique=false
> clone ping_clone ping \
>         meta globally-unique=false
> location cli-prefer-store store \
>         rule $id=cli-prefer-rule-store inf: #uname eq debian-lenny-nodo1
> location store_on_connected_node store \
>         rule $id=store_on_connected_node-rule -inf: not_defined ping or ping lte 0
> colocation store_on_ms-drbd0 inf: store ms-drbd0:Master
> order store_after_ms-drbd0 inf: ms-drbd0:promote store:start
> property $id=cib-bootstrap-options \
>         dc-version=1.0.8-2c98138c2f070fcb6ddeab1084154cffbf44ba75 \
>         no-quorum-policy=ignore \
>         stonith-enabled=false \
>         cluster-infrastructure=openais \
>         expected-quorum-votes=2 \
>         last-lrm-refresh=1274949951
>
> Everything comes up smoothly:
>
> Online: [ debian-lenny-nodo1 debian-lenny-nodo2 ]
>
> Clone Set: ping_clone
>     Started: [ debian-lenny-nodo1 debian-lenny-nodo2 ]
> Master/Slave Set: ms-drbd0
>     Masters: [ debian-lenny-nodo1 ]
>     Slaves: [ debian-lenny-nodo2 ]
> Resource Group: store
>     store-ip        (ocf::heartbeat:IPaddr2):       Started debian-lenny-nodo1
>     store-LVM       (ocf::heartbeat:LVM):           Started debian-lenny-nodo1
>     store-fs        (ocf::heartbeat:Filesystem):    Started debian-lenny-nodo1
>     store-exportfs  (ocf::heartbeat:exportfs):      Started debian-lenny-nodo1
> Clone Set: nfs_clone
>     Started: [ debian-lenny-nodo2 debian-lenny-nodo1 ]
>
> I mount the share on a network client, with default options, and then begin
> to copy with the cp command. The copy goes on, and after a while I migrate
> the group store to the second node:
>
> crm resource migrate store debian-lenny-nodo2
>
> Everything goes smoothly; on the client the copy hangs for a minute or two,
> then restarts. After that, from the client I copy something else onto the
> nfs storage, this time with the rsync command. The copy starts, and after a
> while I launch the migration command. This time the cluster hangs, giving a
> failure on the filesystem resource:
>
> store-fs (ocf::heartbeat:Filesystem): Started debian-lenny-nodo2 (unmanaged) FAILED
>
> The only way to make things work again is to clean up the nfs_clone
> resource (or restart the nfs-kernel-server daemon) and then clean up the
> store group. It seems that the filesystem is kept open by the nfs daemon.

Which it shouldn't be. The lsb:nfs-kernel-server should've exited only once the server was really stopped.

> So, what's the difference between a simple copy and an rsync? Why with
> rsync is the fs resource unable to unmount the filesystem?

Did you try strace with rsync to see what is different?

> Is there something I am missing, or should this be an fs resource agent bug?

Not strictly a Filesystem RA bug, though it could behave better. Currently, the stop operation fails quickly (in six seconds) in case there's something using the filesystem which won't go away, as is the case with kernel threads. You can try the attached patch, with which the Filesystem RA is going to wait until the defined stop timeout for the filesystem to be unmounted. Short instructions: set the fast_stop parameter to no, and set the timeout for the stop operation of the filesystem to however long the nfsd takes to exit.

Thanks,

Dejan
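The behaviour Dejan describes — retrying the unmount until the stop timeout instead of giving up after six seconds — boils down to a deadline loop. A generic sketch of that idea (not the actual patch):

```shell
# Generic deadline loop: retry a command until it succeeds or the given
# number of seconds has elapsed. Illustrates the patch's idea, not its code.
retry_until() {
    deadline=$(( $(date +%s) + $1 ))
    shift
    until "$@"; do
        [ "$(date +%s)" -ge "$deadline" ] && return 1
        sleep 1
    done
}

# e.g. inside a stop action, with the stop timeout (minus a margin) as budget:
# retry_until 55 umount /store
```

Leaving a safety margin below the configured stop timeout matters: if the loop itself runs until the lrmd timeout, the operation is killed before it can report failure cleanly.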
Re: [Linux-HA] stonith: failure using expect+ssh (solved)
On Fri, May 28, 2010 at 11:07:53AM +0200, Lars Ellenberg wrote:
[...]
> Nope. ssh explicitly opens /dev/tty. So for it to not use the tty, it
> needs to have no tty ;-) To get rid of a tty, you usually do setsid.

Isn't there an ssh option for this, i.e. don't allocate a tty? Though this is really getting out of control ;-)
Re: [Linux-HA] Problem with migration on a nfs/exportfs setup while copying via rsync
On Fri, 28 May 2010 12:29:24 CET, Dejan Muhamedagic wrote:
> Hi,

Hi Dejan, thanks for your answer.

[...]
> > The only way to make things work again is to clean up the nfs_clone
> > resource (or restart the nfs-kernel-server daemon) and then clean up the
> > store group. It seems that the filesystem is kept open by the nfs daemon.
>
> Which it shouldn't be. The lsb:nfs-kernel-server should've exited only
> once the server was really stopped.

Note that the nfs-kernel-server isn't connected to the exportfs, but is only a cloned resource, so it isn't touched by the migration process.

> > So, what's the difference between a simple copy and an rsync? Why with
> > rsync is the fs resource unable to unmount the filesystem?
>
> Did you try strace with rsync to see what is different?

I made some other tests and the problem appears also with cp, so it is not an rsync-related issue.

> > Is there something I am missing, or should this be an fs resource agent bug?
>
> Not strictly a Filesystem RA bug, though it could behave better.
> Currently, the stop operation fails quickly (in six seconds) in case
> there's something using the filesystem which won't go away, as is the
> case with kernel threads. You can try the attached patch, with which the
> Filesystem RA is going to wait until the defined stop timeout for the
> filesystem to be unmounted. Short instructions: set the fast_stop
> parameter to no, and set the timeout for the stop operation of the
> filesystem to however long the nfsd takes to exit.
>
> Thanks,
> Dejan

I will give the patch an immediate try and let you know. Thanks again!

-- 
RaSca
Mia Mamma Usa Linux: Niente è impossibile da capire, se lo spieghi bene!
ra...@miamammausalinux.org
http://www.miamammausalinux.org
Re: [Linux-HA] stonith: failure using expect+ssh (solved)
On Fri, May 28, 2010 at 12:32:33PM +0200, Dejan Muhamedagic wrote:
[...]
> > Nope. ssh explicitly opens /dev/tty. So for it to not use the tty, it
> > needs to have no tty ;-) To get rid of a tty, you usually do setsid.
>
> Isn't there an ssh option for this, i.e. don't allocate a tty?

Well, yes. That's one way I tested my claim that it works ;-) ssh -T into another box, causing me to not have a tty there, and then doing the DISPLAY=dummy SSH_ASKPASS=script trick.

But it won't help for the described (and already solved) problem. The best solution is to use key-based auth.

> Though this is really getting out of control ;-)

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
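Since key-based auth is named as the best solution, a quick sketch of that setup (the key path and the target host below are examples, not from the thread):

```shell
# Generate a passphrase-less key dedicated to the fencing device.
# The key path is illustrative; pick a root-only location in practice.
rm -f /tmp/stonith_id_rsa /tmp/stonith_id_rsa.pub
ssh-keygen -t rsa -b 2048 -N "" -f /tmp/stonith_id_rsa -q

# Install /tmp/stonith_id_rsa.pub in the device's authorized_keys, then:
# ssh -i /tmp/stonith_id_rsa -o BatchMode=yes root@ilo-host poweroff
# BatchMode=yes makes ssh fail instead of prompting if the key is rejected,
# which is what you want from an unattended stonith plugin.
```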
Re: [Linux-HA] Problem with migration on a nfs/exportfs setup while copying via rsync
On Fri, 28 May 2010 12:34:06 CET, RaSca wrote:
[...]
> Note that the nfs-kernel-server isn't connected to the exportfs, but is
> only a cloned resource, so it isn't touched by the migration process.
[...]

Ok Dejan, I've patched the Filesystem RA, and here are the configuration changes:

primitive share-a-fs ocf:heartbeat:Filesystem \
        params device=/dev/drbd0 directory=/share-a fstype=ext3 fast_stop=no \
        op monitor interval=20s timeout=40s \
        op start interval=0 timeout=60s \
        op stop interval=0 timeout=60s

I made the same test and the problem remains. From the log I can see a lot of unmount attempts by the RA, all unsuccessful:

...
May 28 14:09:51 ubuntu-nodo1 lrmd: [704]: info: RA output: (share-a-fs:stop:stderr)
May 28 14:09:51 ubuntu-nodo1 Filesystem[9651]: ERROR: Couldn't unmount /share-a; trying cleanup with KILL
May 28 14:09:51 ubuntu-nodo1 Filesystem[9651]: INFO: No processes on /share-a were signalled
May 28 14:09:52 ubuntu-nodo1 lrmd: [704]: info: RA output: (share-a-fs:stop:stderr) umount: /share-a: device is busy.#012  (In some cases useful info about processes that use#012  the device is found by lsof(8) or fuser(1))
...

And then:

May 28 14:10:10 ubuntu-nodo1 lrmd: [704]: WARN: share-a-fs:stop process (PID 9651) timed out (try 1). Killing with signal SIGTERM (15).
May 28 14:10:10 ubuntu-nodo1 lrmd: [704]: WARN: operation stop[191] on ocf::Filesystem::share-a-fs for client 707, its parameters: CRM_meta_name=[stop] crm_feature_set=[3.0.1] device=[/dev/drbd0] CRM_meta_timeout=[6] directory=[/share-a] fstype=[ext3] fast_stop=[no] : pid [9651] timed out
May 28 14:10:10 ubuntu-nodo1 crmd: [707]: ERROR: process_lrm_event: LRM operation share-a-fs_stop_0 (191) Timed Out (timeout=6ms)
May 28 14:10:10 ubuntu-nodo1 crmd: [707]: WARN: status_from_rc: Action 16 (share-a-fs_stop_0) on ubuntu-nodo1 failed (target: 0 vs. rc: -2): Error
May 28 14:10:10 ubuntu-nodo1 crmd: [707]: WARN: update_failcount: Updating failcount for share-a-fs on ubuntu-nodo1 after failed stop: rc=-2 (update=INFINITY, time=1275048610)
May 28 14:10:10 ubuntu-nodo1 crmd: [707]: info: abort_transition_graph: match_graph_event:272 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=share-a-fs_stop_0, magic=2:-2;16:105:0:bd1ff2a9-427b-49a1-9845-5e3e0b91d824, cib=0.579.6) : Event failed

The situation is in the end the same as before:

...
Resource Group: share-a
    share-a-ip        (ocf::heartbeat:IPaddr2):     Started ubuntu-nodo1
    share-a-fs        (ocf::heartbeat:Filesystem):  Started ubuntu-nodo1 (unmanaged) FAILED
    share-a-exportfs  (ocf::heartbeat:exportfs):    Stopped
...

What else can I try? Thanks a lot,

-- 
RaSca
Mia Mamma Usa Linux: Niente è impossibile da capire, se lo spieghi bene!
ra...@miamammausalinux.org
http://www.miamammausalinux.org
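The stop loop above hinges on one repeated check: is the directory still listed as a mount point? A minimal stand-alone version of that check, assuming a Linux /proc filesystem (this mirrors the idea, not the RA's actual code):

```shell
# Minimal mount-point check, assuming Linux /proc. Kernel threads holding
# the filesystem (like nfsd) keep this true even after fuser finds nothing
# to kill, which is exactly the failure mode in the logs above.
is_mounted() {
    # field 2 of /proc/mounts is the mount point; match it exactly
    awk -v m="$1" '$2 == m { found = 1 } END { exit !found }' /proc/mounts
}

is_mounted / && echo "/ is mounted"
is_mounted /no/such/mountpoint || echo "/no/such/mountpoint is not"
```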
Re: [Linux-HA] SLES10 RPM Package
2010/1/26 Andrew Beekhof and...@beekhof.net:
> On Mon, Jan 25, 2010 at 5:04 PM, Ciro Iriarte cyru...@gmail.com wrote:
>> 2010/1/25 Andrew Beekhof and...@beekhof.net:
>>> On Mon, Jan 25, 2010 at 4:15 PM, Ciro Iriarte cyru...@gmail.com wrote:
>>>> 2010/1/25 Andrew Beekhof and...@beekhof.net:
>>>>>> Also, in the Advanced Build Target Selection you have these options:
>>>>>>   SUSE:SLE-10/standard
>>>>>>   SUSE:SLE-10:SDK/standard
>>>>>>   SUSE:SLE-10:SP2/standard
>>>>>>   SUSE:SLE-10:SP2:SDK/standard
>>>>>>   SUSE:SLE-10:SP3/standard
>>>>>>   SUSE:SLE-10:SP3:SDK/standard
>>>>>>   SUSE:SLE-11/standard
>>>>>>   SUSE:SLE-11:SP1/standard
>>>>>>   SUSE:SLES-9/standard
>>>>> Ah, that seems to be new.
>>>> Not really :) Would be really nice to have it available on SLES also,
>>>> but your time is yours. Thanks a lot for your work and effort in this
>>>> project.
>>> If there is a public repo that I can get the rpms from, as I can for
>>> EPEL, then adding SLES is easy. There is also the option of an
>>> interested third-party dropping the tarballs into the build service
>>> instead of me :-)
>> Hmmm, not sure what you mean.
> I mean if someone else wants to jump through the hoops to keep
> server:/ha-clustering up-to-date, that's great. I just won't be doing it
> myself :-)

>> The RPMs are available from
>> http://download.opensuse.org/repositories/server:/ha-clustering/SLES_10/
>> and the SPEC files from
>> https://build.opensuse.org/project/show?project=server:ha-clustering
>> after creating a free account. In fact I see you as a member of the
>> project, so probably you knew all this :D
>>
>> I can create a subproject in my home and drop the tarball there, but I
>> would rather keep things in server:/ha-clustering; spreading packages
>> everywhere would only confuse users.
>>
>> Now that you mention EPEL, I see updated RHEL packages on the OBS, how
>> do they compare?
> On OBS? Wasn't me, I don't use OBS at all anymore.

>> It's sad that RHEL packages are being updated but SLES aren't.
> Anyone in the world can build against the very latest EPEL repos and be
> compatible with CentOS and RHEL. The same isn't possible for SLES; you
> have to use OBS (and hope that it's up and gets to your package some time
> this century). OBS is a nice idea, it's just too under-staffed and
> under-resourced to be useful.

Well, apparently http://download.opensuse.org/repositories/server:/ha-clustering/ was wiped out. Lars, is this permanent? Can I help with that repo?

Regards,

-- 
Ciro Iriarte
http://cyruspy.wordpress.com
--
Re: [Linux-HA] SLES10 RPM Package
Ciro Iriarte [28.05.2010 15:30]:
[...]
> Well, apparently
> http://download.opensuse.org/repositories/server:/ha-clustering/ was
> wiped out. Lars, is this permanent? Can I help with that repo?

I stumbled on that, too. The new repo is http://download.opensuse.org/repositories/network:/ha-clustering/. Don't know who had this idea...

HTH
Werner
Re: [Linux-HA] SLES10 RPM Package
2010/5/28 Werner Flamme werner.fla...@ufz.de:
[...]
> I stumbled on that, too. The new repo is
> http://download.opensuse.org/repositories/network:/ha-clustering/. Don't
> know who had this idea...
>
> HTH
> Werner

Thanks!

-- 
Ciro Iriarte
http://cyruspy.wordpress.com
--
[Linux-HA] odd issues with LinuxHA/ldirector
Anyone ever seen an issue where ldirectord would not pass requests to 2 backend real servers on a certain port (in my case 8080), but if you change that to port 22, it works flawlessly? It's really strange that it would work on one port but not another. Any hints?
Re: [Linux-HA] odd issues with LinuxHA/ldirector
From: linux-ha-boun...@lists.linux-ha.org on behalf of mike
Sent: Fri 5/28/2010 10:01 AM
To: General Linux-HA mailing list
Subject: [Linux-HA] odd issues with LinuxHA/ldirector

> Anyone ever seen an issue where ldirectord would not pass requests to 2
> backend real servers on a certain port (in my case 8080), but if you
> change that to port 22, it works flawlessly? It's really strange that it
> would work on one port but not another. Any hints?

Mike,
The port number must be a configuration parameter somewhere (some config file).

pushkar
Re: [Linux-HA] odd issues with LinuxHA/ldirector
Pushkar Pradhan wrote:
> Mike,
> The port number must be a configuration parameter somewhere (some config
> file).
>
> pushkar

Thanks Pushkar. I have the port number in ldirectord.cf, is that what you mean? Here it is:

# Global Directives
checktimeout=2
checkinterval=2
logfile=/var/log/ldirectord

# heartbeat.example.com
virtual=172.28.185.49:389
        protocol=tcp
        scheduler=lc
        checktype=connect
        checkport=389
        #negotiatetimeout=10
        real=172.28.185.37:389 ipip
        real=172.28.185.38:389 ipip
        service=ldap
        protocol=tcp
        checktimeout=10
        checkinterval=10

virtual=172.28.185.50:8080
        protocol=tcp
        scheduler=lc
        checktype=connect
        checkport=8080
        real=172.28.185.12:8080 ipip
        real=172.28.185.13:8080 ipip
        checktimeout=10
        checkinterval=10
Re: [Linux-HA] odd issues with LinuxHA/ldirector
From: linux-ha-boun...@lists.linux-ha.org on behalf of mike
Sent: Fri 5/28/2010 12:08 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] odd issues with LinuxHA/ldirector

> Thanks Pushkar. I have the port number in ldirectord.cf, is that what you
> mean? Here it is:
[...]

Mike,
I misunderstood you earlier. I didn't know you had the port numbers in the config file. Is your firewall on either machine blocking traffic on port 8080?

pushkar
Re: [Linux-HA] odd issues with LinuxHA/ldirector
Pushkar Pradhan wrote:
[...]
> Mike,
> I misunderstood you earlier. I didn't know you had the port numbers in
> the config file. Is your firewall on either machine blocking traffic on
> port 8080?
>
> pushkar

Thanks again Pushkar. All firewalls are off on LVS and the backend servers.
Re: [Linux-HA] odd issues with LinuxHA/ldirector
> Anyone ever seen an issue where ldirectord would not pass requests to 2
> backend real servers on a certain port

I saw that once. I checked the ldirectord perl code and it turned out that certain ports were reserved. The port I was trying to use was one of them. Can't imagine that being the case with 8080, but just sayin'.

-- 
Eric Robinson
[Linux-HA] Colocation, location, auto-failback=off
Hi,

* I have three nodes: ha1, ha2 and ha3.
* Three resources: sfex, xfs_fs, ip.
* sfex and xfs_fs are members of a group called xfs_grp.
* xfs_grp can run on any node, but the ip resource can run on ha1 or ha2 only.
* When xfs_grp is running on ha1 or ha2, ip must run on the same node.
* One last thing: I need manual failback.

My current configuration works except for the manual failback (a.k.a. auto_failback off).

node $id=0ace77ab-600a-4541-a682-ab0534bb3fc4 ha3
node $id=3d1f07b5-a79b-478f-b07c-02a7a5c5106c ha2
node $id=c44a3a26-35d4-476e-a1e6-49f03f068f12 ha1
primitive ip ocf:heartbeat:IPaddr \
        params ip=192.168.1.147
primitive sfex ocf:heartbeat:sfex \
        params device=/dev/sdb1 \
        op monitor interval=10 timeout=10 depth=0
primitive xfs_fs ocf:heartbeat:Filesystem \
        params device=/dev/sdb2 directory=/shared fstype=xfs \
        op monitor interval=20 timeout=40 depth=0
group xfs_grp sfex xfs_fs
location srv_loc ip -inf: ha3
colocation srv_col inf: ip xfs_grp
property $id=cib-bootstrap-options \
        no-quorum-policy=ignore \
        expected-quorum-votes=1 \
        stonith-enabled=0 \
        default-resource-stickiness=INFINITY

When xfs_grp is running on ha3 and ha1 or ha2 comes alive again, the resources (xfs_grp and ip) move to one of them. Any ideas?

Thanks!

-- 
Diego Woitasen
XTECH