Re: [Linux-ha-dev] Monitoring Process Death

2010-05-28 Thread Lars Marowsky-Bree
On 2010-05-21T12:12:12, Bob Schatz bsch...@yahoo.com wrote:

 I think the basic requirements are:
 
 1. When a process starts, it registers itself with a kernel component. This 
 registration also gets passed an action.

The easiest way would be for the RA to register pids to be monitored with
lrmd, and have lrmd generate failed monitor messages if one of them goes
away.

(Of course, the RA needs to unregister them as part of the stop
operation!)

No kernel changes are necessary, this can all be implemented via a patch
to lrmd and some supporting shell funcs.
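
For illustration, the supporting shell funcs could look roughly like
this -- lrmd_register_pid/lrmd_unregister_pid are made-up names, since
the interface doesn't exist yet, and my_daemon is a placeholder:

    start() {
        my_daemon --pidfile /var/run/my_daemon.pid
        # tell lrmd to fail this resource's monitor if the pid vanishes
        lrmd_register_pid "$OCF_RESOURCE_INSTANCE" "$(cat /var/run/my_daemon.pid)"
    }

    stop() {
        # unregister first, so the expected death isn't flagged as a failure
        lrmd_unregister_pid "$OCF_RESOURCE_INSTANCE"
        kill "$(cat /var/run/my_daemon.pid)"
    }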


Regards,
Lars

-- 
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] Monitoring Process Death

2010-05-28 Thread Bob Schatz
Thanks Lars and Dejan for your feedback.

I have started reading the lrmd source.

One thing I am worried about: if I give a PID to lrmd, how will lrmd
monitor it?

My RA is a shell script that forks off a daemon. If I give this daemon's PID to
lrmd, does lrmd start a thread to monitor the PID?

I don't think that lrmd would get a SIGCHLD in my case.
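
I guess lrmd could simply poll a registered pid -- kill -0 checks for
existence without needing SIGCHLD. A minimal sketch of that standard
trick:

    # signal 0 delivers nothing; it only tests that the pid still exists
    if kill -0 "$pid" 2>/dev/null; then
        : # still running
    else
        : # gone -- this is where lrmd would report a failed monitor
    fi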


Thanks,

Bob

----- Original Message -----
From: Lars Marowsky-Bree l...@novell.com
To: High-Availability Linux Development List linux-ha-dev@lists.linux-ha.org
Sent: Fri, May 28, 2010 8:49:28 AM
Subject: Re: [Linux-ha-dev] Monitoring Process Death

On 2010-05-21T12:12:12, Bob Schatz bsch...@yahoo.com wrote:

 I think the basic requirements are:
 
 1. When a process starts, it registers itself with a kernel component. This 
 registration also gets passed an action.

The easiest way would be for the RA to register pids to be monitored with
lrmd, and have lrmd generate failed monitor messages if one of them goes
away.

(Of course, the RA needs to unregister them as part of the stop
operation!)

No kernel changes are necessary, this can all be implemented via a patch
to lrmd and some supporting shell funcs.


Regards,
Lars

-- 
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/



___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-HA] stonith: failure using expect+ssh (solved)

2010-05-28 Thread Lars Ellenberg
On Thu, May 27, 2010 at 09:48:15PM -0600, Tim Serong wrote:
 On 5/27/2010 at 10:24 PM, Lars Ellenberg lars.ellenb...@linbit.com wrote: 
  On Thu, May 27, 2010 at 12:46:14AM +0200, Matthias Ferdinand wrote: 
   --On Wednesday, May 26, 2010 12:00:02 -0600 
   linux-ha-requ...@lists.linux-ha.org wrote: 
    OK, but it'd still be better/easier to just use ssh with public 
key authentication. For telnet, there is a python plugin 
ibmrsa-telnet which could be modified for iLO. 
 
DISPLAY=dummy SSH_ASKPASS=/bin/my_cat_passwd_file.sh ssh somewhere 
my_cat_passwd_file.sh: 
   #!/bin/sh 
cat /etc/passwd_file 
 
/etc/passwd_file: 0600 root root containing your password ;-) 
   
   thank you for your hints. SSH_ASKPASS did not work for me (using password 
   auth), ssh keeps prompting for the password. Apparently SSH_ASKPASS is for 
   passphrases only. 
   
  No.  But as long as ssh _does_ have a tty, it will ask for the password 
  on the tty ;-) 
  Only if it does not find a tty will it use the askpass hook. 
   
   but as the script now does the job I think I will just leave it at that. 
   
  It needs to work for you, that is what matters. 
 
 If you still want to fiddle around with SSH_ASKPASS, it might help to
 redirect stdin from /dev/null...

Nope.
ssh explicitly opens /dev/tty.
So for it to not use the tty, it needs to have no tty ;-)
To get rid of a tty, you usually do setsid.
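
Putting it together, something like this (script path and host are just
examples):

    # /bin/my_cat_passwd_file.sh is the two-liner quoted above
    # (#!/bin/sh + cat /etc/passwd_file), mode 0700.
    # setsid drops the controlling tty; DISPLAY must be set or ssh
    # won't even consider the askpass hook.
    setsid sh -c 'DISPLAY=dummy SSH_ASKPASS=/bin/my_cat_passwd_file.sh \
        ssh somewhere true' < /dev/null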

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Problem with migration on a nfs/exportfs setup while copying via rsync

2010-05-28 Thread Dejan Muhamedagic
Hi,

On Thu, May 27, 2010 at 11:23:16AM +0200, RaSca wrote:
 Hi all,
 I've got some problems with my setup and I'm trying to understand whether I 
 am missing something or it is a bug; here is how to reproduce the error:
 
 node debian-lenny-nodo1
 node debian-lenny-nodo2
 primitive drbd0 ocf:linbit:drbd \
   params drbd_resource=r0 \
   op monitor interval=20s timeout=40s \
   op start interval=0 timeout=240s \
   op stop interval=0 timeout=100s
 primitive nfs-common lsb:nfs-common
 primitive nfs-kernel-server lsb:nfs-kernel-server
 primitive ping ocf:pacemaker:ping \
   params host_list=192.168.1.1 name=ping \
   op monitor interval=60s timeout=60s \
   op start interval=0 timeout=60s
 primitive portmap lsb:portmap
 primitive store-LVM ocf:heartbeat:LVM \
   params volgrpname=vg_drbd \
   op monitor interval=10s timeout=30s \
   op start interval=0 timeout=30s \
   op stop interval=0 timeout=30s
 primitive store-exportfs ocf:heartbeat:exportfs \
   params directory=/store/share clientspec=192.168.1.0/24 options=rw,sync,no_subtree_check,no_root_squash fsid=1 \
   op monitor interval=10s timeout=30s \
   op start interval=0 timeout=40s \
   op stop interval=0 timeout=40s \
   meta target-role=Started
 primitive store-fs ocf:heartbeat:Filesystem \
   params device=/dev/vg_drbd/lv_store directory=/store fstype=ext3 \
   op monitor interval=20s timeout=40s \
   op start interval=0 timeout=60s \
   op stop interval=0 timeout=60s \
   meta is-managed=true
 primitive store-ip ocf:heartbeat:IPaddr2 \
   params ip=192.168.1.53 nic=bond0 \
   op monitor interval=20s timeout=40s
 group nfs portmap nfs-common nfs-kernel-server
 group store store-ip store-LVM store-fs store-exportfs
 ms ms-drbd0 drbd0 \
   meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
 clone nfs_clone nfs \
   meta globally-unique=false
 clone ping_clone ping \
   meta globally-unique=false
 location cli-prefer-store store \
   rule $id=cli-prefer-rule-store inf: #uname eq debian-lenny-nodo1
 location store_on_connected_node store \
   rule $id=store_on_connected_node-rule -inf: not_defined ping or ping lte 0
 colocation store_on_ms-drbd0 inf: store ms-drbd0:Master
 order store_after_ms-drbd0 inf: ms-drbd0:promote store:start
 property $id=cib-bootstrap-options \
   dc-version=1.0.8-2c98138c2f070fcb6ddeab1084154cffbf44ba75 \
   no-quorum-policy=ignore \
   stonith-enabled=false \
   cluster-infrastructure=openais \
   expected-quorum-votes=2 \
   last-lrm-refresh=1274949951
 
 Everything comes up smoothly:
 
 Online: [ debian-lenny-nodo1 debian-lenny-nodo2 ]
 
   Clone Set: ping_clone
   Started: [ debian-lenny-nodo1 debian-lenny-nodo2 ]
   Master/Slave Set: ms-drbd0
   Masters: [ debian-lenny-nodo1 ]
   Slaves: [ debian-lenny-nodo2 ]
   Resource Group: store
   store-ip   (ocf::heartbeat:IPaddr2):   Started debian-lenny-nodo1
   store-LVM  (ocf::heartbeat:LVM):   Started debian-lenny-nodo1
   store-fs   (ocf::heartbeat:Filesystem):Started debian-lenny-nodo1
   store-exportfs (ocf::heartbeat:exportfs):  Started debian-lenny-nodo1
   Clone Set: nfs_clone
   Started: [ debian-lenny-nodo2 debian-lenny-nodo1 ]
 
 I mount the share on a network client, with default options, and then 
 begin to copy with the cp command.
 The copy goes on, and after a while I migrate the store group to the 
 second node:
 
 crm resource migrate store debian-lenny-nodo2
 
 Everything goes smoothly, and on the client the copy hangs for a minute 
 or two and then restarts.
 After that, from the client I copy something else onto the nfs storage, 
 this time with the rsync command.
 The copy starts and after a while I launch the migration command.
 This time the cluster hangs, giving a failure on the filesystem resource:
 
 store-fs   (ocf::heartbeat:Filesystem):Started debian-lenny-nodo2 (unmanaged) FAILED
 
 the only way to make things work again is to clean up the nfs_clone 
 resource (or restart the nfs-kernel-server daemon) and then clean up the 
 store group. It seems that the filesystem is kept open by the nfs daemon.

Which it shouldn't be. The lsb:nfs-kernel-server should've exited
only once the server was really stopped.

 So, what's the difference between a simple copy and an rsync? Why, with 
 rsync, is the fs resource unable to unmount the filesystem?

Did you try strace with rsync to see what is different?
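
Something along these lines (paths only as an example):

    # follow child processes, log just the file-related syscalls
    strace -f -e trace=file -o /tmp/rsync.trace rsync -a src/ /mnt/store/dst/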

 Is there 
 something I am missing, or is this an fs resource agent bug?

Not strictly a Filesystem RA bug, though it could behave better.
Currently, the stop operation fails quickly (in six seconds) if
there's something using the filesystem which won't go away, as is
the case with kernel threads. You can try the attached
patch with which the Filesystem RA is going to wait until the
defined stop timeout for the filesystem to be unmounted. Short
instructions: set the fast_stop parameter to no and set the
timeout for the stop operation of the filesystem to however long
the nfsd takes to exit.

Thanks,

Dejan
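
The gist of the patch, roughly -- a sketch, not the actual code: the
stop action keeps retrying the umount instead of bailing out after a
few quick attempts, so the stop timeout becomes the real limit:

    # retry until the umount succeeds; if the stop timeout expires
    # first, lrmd kills the RA and the stop fails just as before
    while ! umount $MOUNTPOINT; do
        sleep 1
    done
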
Re: [Linux-HA] stonith: failure using expect+ssh (solved)

2010-05-28 Thread Dejan Muhamedagic
On Fri, May 28, 2010 at 11:07:53AM +0200, Lars Ellenberg wrote:
 On Thu, May 27, 2010 at 09:48:15PM -0600, Tim Serong wrote:
  On 5/27/2010 at 10:24 PM, Lars Ellenberg lars.ellenb...@linbit.com wrote: 
   On Thu, May 27, 2010 at 12:46:14AM +0200, Matthias Ferdinand wrote: 
--On Wednesday, May 26, 2010 12:00:02 -0600 
linux-ha-requ...@lists.linux-ha.org wrote: 
 OK, but it'd still be better/easier to just use ssh with public 
 key authentication. For telnet, there is a python plugin 
 ibmrsa-telnet which could be modified for iLO. 
  
 DISPLAY=dummy SSH_ASKPASS=/bin/my_cat_passwd_file.sh ssh somewhere 
 my_cat_passwd_file.sh: 
 #!/bin/sh 
 cat /etc/passwd_file 
  
 /etc/passwd_file: 0600 root root containing your password ;-) 

 thank you for your hints. SSH_ASKPASS did not work for me (using password 
 auth), ssh keeps prompting for the password. Apparently SSH_ASKPASS is for 
 passphrases only. 

   No.  But as long as ssh _does_ have a tty, it will ask for the password 
   on the tty ;-) 
   Only if it does not find a tty will it use the askpass hook. 

 but as the script now does the job I think I will just leave it at that. 

   It needs to work for you, that is what matters. 
  
  If you still want to fiddle around with SSH_ASKPASS, it might help to
  redirect stdin from /dev/null...
 
 Nope.
 ssh explicitly opens /dev/tty.
 So for it to not use the tty, it needs to have no tty ;-)
 To get rid of a tty, you usually do setsid.

Isn't there an ssh option for this, i.e. don't allocate a tty?
Though this is really getting out of control ;-)

 -- 
 : Lars Ellenberg
 : LINBIT | Your Way to High Availability
 : DRBD/HA support and consulting http://www.linbit.com
 
 DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Problem with migration on a nfs/exportfs setup while copying via rsync

2010-05-28 Thread RaSca
On Fri, 28 May 2010 12:29:24 CET, Dejan Muhamedagic wrote:
 Hi,

Hi Dejan, thanks for your answer.

[...]
 the only way to make things work again is to clean up the nfs_clone
 resource (or restart the nfs-kernel-server daemon) and then clean up the
 store group. It seems that the filesystem is kept open by the nfs daemon.
 Which it shouldn't be. The lsb:nfs-kernel-server should've exited
 only once the server was really stopped.

Note that the nfs-kernel-server isn't connected to the exportfs, but is 
only a cloned resource, so it isn't touched by the migration process.

 So, what's the difference between a simple copy and an rsync? Why, with
 rsync, is the fs resource unable to unmount the filesystem?
 Did you try strace with rsync to see what is different?

I did some more tests and the problem appears with cp as well, so it is 
not an rsync-related issue.


 Is there
 something I am missing, or is this an fs resource agent bug?
 Not strictly a Filesystem RA bug, though it could behave better.
 Currently, the stop operation fails quickly (in six seconds) if
 there's something using the filesystem which won't go away, as is
 the case with kernel threads. You can try the attached
 patch with which the Filesystem RA is going to wait until the
 defined stop timeout for the filesystem to be unmounted. Short
 instructions: set the fast_stop parameter to no and set the
 timeout for the stop operation of the filesystem to however long
 the nfsd takes to exit.
 Thanks,
 Dejan

I will give the patch an immediate try and let you know.

Thanks again!

-- 
RaSca
Mia Mamma Usa Linux: Niente è impossibile da capire, se lo spieghi bene!
ra...@miamammausalinux.org
http://www.miamammausalinux.org
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] stonith: failure using expect+ssh (solved)

2010-05-28 Thread Lars Ellenberg
On Fri, May 28, 2010 at 12:32:33PM +0200, Dejan Muhamedagic wrote:
 On Fri, May 28, 2010 at 11:07:53AM +0200, Lars Ellenberg wrote:
  On Thu, May 27, 2010 at 09:48:15PM -0600, Tim Serong wrote:
   On 5/27/2010 at 10:24 PM, Lars Ellenberg lars.ellenb...@linbit.com 
   wrote: 
On Thu, May 27, 2010 at 12:46:14AM +0200, Matthias Ferdinand wrote: 
 --On Wednesday, May 26, 2010 12:00:02 -0600 
 linux-ha-requ...@lists.linux-ha.org wrote: 
  OK, but it'd still be better/easier to just use ssh with public 
  key authentication. For telnet, there is a python plugin 
  ibmrsa-telnet which could be modified for iLO. 
   
  DISPLAY=dummy SSH_ASKPASS=/bin/my_cat_passwd_file.sh ssh somewhere 
  my_cat_passwd_file.sh: 
 #!/bin/sh 
  cat /etc/passwd_file 
   
  /etc/passwd_file: 0600 root root containing your password ;-) 
 
 thank you for your hints. SSH_ASKPASS did not work for me (using password 
 auth), ssh keeps prompting for the password. Apparently SSH_ASKPASS is for 
 passphrases only. 
 
No.  But as long as ssh _does_ have a tty, it will ask for the password 
on the tty ;-) 
Only if it does not find a tty will it use the askpass hook. 
 
 but as the script now does the job I think I will just leave it at that. 
 
It needs to work for you, that is what matters. 
   
   If you still want to fiddle around with SSH_ASKPASS, it might help to
   redirect stdin from /dev/null...
  
  Nope.
  ssh explicitly opens /dev/tty.
  So for it to not use the tty, it needs to have no tty ;-)
  To get rid of a tty, you usually do setsid.
 
 Isn't there an ssh option for this, i.e. don't allocate a tty?

Well, yes.
That's one way I tested my claim that it works ;-)
ssh -T into another box, so that I have no tty there,
and then doing the DISPLAY=dummy SSH_ASKPASS=script trick.
But it won't help for the described (and already solved) problem. 
The best solution is to use key-based auth.
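
(Which, for the record, is just:

    ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa  # passphrase-less, for unattended use
    ssh-copy-id root@somewhere                # if the far end accepts public keys

-- whether an iLO-class management processor takes public keys depends
on the firmware.)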

 Though this is really getting out of control ;-)


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Problem with migration on a nfs/exportfs setup while copying via rsync

2010-05-28 Thread RaSca
On Fri, 28 May 2010 12:34:06 CET, RaSca wrote:
[...]
 Note that the nfs-kernel-server isn't connected to the exportfs, but is
 only a cloned resource, so it isn't touched by the migration process.
[...]

Ok Dejan,
I've patched the Filesystem RA, and here are the configuration changes:

primitive share-a-fs ocf:heartbeat:Filesystem \
 params device=/dev/drbd0 directory=/share-a fstype=ext3 fast_stop=no \
 op monitor interval=20s timeout=40s \
 op start interval=0 timeout=60s \
 op stop interval=0 timeout=60s

I ran the same test and the problem remains; in the log I can see a 
lot of umount attempts by the RA, all unsuccessful:

...
...
May 28 14:09:51 ubuntu-nodo1 lrmd: [704]: info: RA output: (share-a-fs:stop:stderr)
May 28 14:09:51 ubuntu-nodo1 Filesystem[9651]: ERROR: Couldn't unmount /share-a; trying cleanup with KILL
May 28 14:09:51 ubuntu-nodo1 Filesystem[9651]: INFO: No processes on /share-a were signalled
May 28 14:09:52 ubuntu-nodo1 lrmd: [704]: info: RA output: (share-a-fs:stop:stderr) umount: /share-a: device is busy.#012 (In some cases useful info about processes that use#012 the device is found by lsof(8) or fuser(1))
...
...

And then:

May 28 14:10:10 ubuntu-nodo1 lrmd: [704]: WARN: share-a-fs:stop process (PID 9651) timed out (try 1).  Killing with signal SIGTERM (15).
May 28 14:10:10 ubuntu-nodo1 lrmd: [704]: WARN: operation stop[191] on ocf::Filesystem::share-a-fs for client 707, its parameters: CRM_meta_name=[stop] crm_feature_set=[3.0.1] device=[/dev/drbd0] CRM_meta_timeout=[60000] directory=[/share-a] fstype=[ext3] fast_stop=[no] : pid [9651] timed out
May 28 14:10:10 ubuntu-nodo1 crmd: [707]: ERROR: process_lrm_event: LRM operation share-a-fs_stop_0 (191) Timed Out (timeout=60000ms)
May 28 14:10:10 ubuntu-nodo1 crmd: [707]: WARN: status_from_rc: Action 16 (share-a-fs_stop_0) on ubuntu-nodo1 failed (target: 0 vs. rc: -2): Error
May 28 14:10:10 ubuntu-nodo1 crmd: [707]: WARN: update_failcount: Updating failcount for share-a-fs on ubuntu-nodo1 after failed stop: rc=-2 (update=INFINITY, time=1275048610)
May 28 14:10:10 ubuntu-nodo1 crmd: [707]: info: abort_transition_graph: match_graph_event:272 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=share-a-fs_stop_0, magic=2:-2;16:105:0:bd1ff2a9-427b-49a1-9845-5e3e0b91d824, cib=0.579.6) : Event failed

In the end, the situation is the same as before:

...
...
  Resource Group: share-a
  share-a-ip(ocf::heartbeat:IPaddr2):   Started ubuntu-nodo1
  share-a-fs(ocf::heartbeat:Filesystem):Started ubuntu-nodo1 (unmanaged) FAILED
  share-a-exportfs  (ocf::heartbeat:exportfs):  Stopped
...
...

What else can I try?

Thanks a lot,

-- 
RaSca
Mia Mamma Usa Linux: Niente è impossibile da capire, se lo spieghi bene!
ra...@miamammausalinux.org
http://www.miamammausalinux.org
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] SLES10 RPM Package

2010-05-28 Thread Ciro Iriarte
2010/1/26 Andrew Beekhof and...@beekhof.net:
 On Mon, Jan 25, 2010 at 5:04 PM, Ciro Iriarte cyru...@gmail.com wrote:
 2010/1/25 Andrew Beekhof and...@beekhof.net:
 On Mon, Jan 25, 2010 at 4:15 PM, Ciro Iriarte cyru...@gmail.com wrote:
 2010/1/25 Andrew Beekhof and...@beekhof.net:

 Also, in the Advanced Build Target Selection you have these options:

 SUSE:SLE-10/standard
 SUSE:SLE-10:SDK/standard
 SUSE:SLE-10:SP2/standard
 SUSE:SLE-10:SP2:SDK/standard
 SUSE:SLE-10:SP3/standard
 SUSE:SLE-10:SP3:SDK/standard
 SUSE:SLE-11/standard
 SUSE:SLE-11:SP1/standard
 SUSE:SLES-9/standard

 Ah, that seems to be new.

 Not really :)

 Would be really nice to have it available on SLES also, but your time
 is yours. Thanks a lot for your work and effort in this project.

 If there is a public repo that I can get the rpms from, as I can for
 EPEL, then adding SLES is easy.
 There is also the option of an interested third-party dropping the
 tarballs into the build service instead of me :-)

 Hmmm, not sure what you mean.

 I mean if someone else wants to jump through the hoops to keep
 server:/ha-clustering up-to-date, that's great.
 I just won't be doing it myself :-)

 The RPMs are available from
 http://download.opensuse.org/repositories/server:/ha-clustering/SLES_10/
 and the SPEC files from
 https://build.opensuse.org/project/show?project=server:ha-clustering
 creating a free account. In fact I see you as a member of the project,
 so probably you knew all this :D

 I can create a subproject in my home and drop the tarball there, but I
 would rather like to keep things in the server:/ha-clustering,
 spreading packages everywhere would only confuse users. Now that you
 mention EPEL, I see updated RHEL packages on the OBS, how do they
 compare?

 On OBS?
 Wasn't me, I don't use OBS at all anymore.

 it's sad that RHEL packages are being updated but SLES
 aren't.

 Anyone in the world can build against the very latest EPEL repos and
 be compatible with CentOS and RHEL.
 The same isn't possible for SLES, you have to use OBS (and hope that
 it's up and gets to your package some time this century).

 OBS is a nice idea, it's just too under-staffed and under-resourced to be 
 useful.

Well, apparently
http://download.opensuse.org/repositories/server:/ha-clustering/ was
wiped out. Lars, is this permanent? Can I help with that repo?

Regards,

-- 
Ciro Iriarte
http://cyruspy.wordpress.com
--
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] SLES10 RPM Package

2010-05-28 Thread Werner Flamme

Ciro Iriarte [28.05.2010 15:30]:
 2010/1/26 Andrew Beekhof and...@beekhof.net:
 On Mon, Jan 25, 2010 at 5:04 PM, Ciro Iriarte cyru...@gmail.com wrote:
 2010/1/25 Andrew Beekhof and...@beekhof.net:
 On Mon, Jan 25, 2010 at 4:15 PM, Ciro Iriarte cyru...@gmail.com wrote:
 2010/1/25 Andrew Beekhof and...@beekhof.net:

 Also, in the Advanced Build Target Selection you have these options:

 SUSE:SLE-10/standard
 SUSE:SLE-10:SDK/standard
 SUSE:SLE-10:SP2/standard
 SUSE:SLE-10:SP2:SDK/standard
 SUSE:SLE-10:SP3/standard
 SUSE:SLE-10:SP3:SDK/standard
 SUSE:SLE-11/standard
 SUSE:SLE-11:SP1/standard
 SUSE:SLES-9/standard

 Ah, that seems to be new.

 Not really :)

 Would be really nice to have it available on SLES also, but your time
 is yours. Thanks a lot for your work and effort in this project.

 If there is a public repo that I can get the rpms from, as I can for
 EPEL, then adding SLES is easy.
 There is also the option of an interested third-party dropping the
 tarballs into the build service instead of me :-)

 Hmmm, not sure what you mean.

 I mean if someone else wants to jump through the hoops to keep
 server:/ha-clustering up-to-date, that's great.
 I just won't be doing it myself :-)

 The RPMs are available from
 http://download.opensuse.org/repositories/server:/ha-clustering/SLES_10/
 and the SPEC files from
 https://build.opensuse.org/project/show?project=server:ha-clustering
 creating a free account. In fact I see you as a member of the project,
 so probably you knew all this :D

 I can create a subproject in my home and drop the tarball there, but I
 would rather like to keep things in the server:/ha-clustering,
 spreading packages everywhere would only confuse users. Now that you
 mention EPEL, I see updated RHEL packages on the OBS, how do they
 compare?

 On OBS?
 Wasn't me, I don't use OBS at all anymore.

 it's sad that RHEL packages are being updated but SLES
 aren't.

 Anyone in the world can build against the very latest EPEL repos and
 be compatible with CentOS and RHEL.
 The same isn't possible for SLES, you have to use OBS (and hope that
 it's up and gets to your package some time this century).

 OBS is a nice idea, it's just too under-staffed and under-resourced to be 
 useful.
 
 Well, apparently
 http://download.opensuse.org/repositories/server:/ha-clustering/ was
 wiped out. Lars, is this permanent? Can I help with that repo?

I stumbled on that, too. The new repo is
http://download.opensuse.org/repositories/network:/ha-clustering/. Don't
know who had this idea...

HTH
Werner
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] SLES10 RPM Package

2010-05-28 Thread Ciro Iriarte
2010/5/28 Werner Flamme werner.fla...@ufz.de:

 Ciro Iriarte [28.05.2010 15:30]:
 2010/1/26 Andrew Beekhof and...@beekhof.net:
 On Mon, Jan 25, 2010 at 5:04 PM, Ciro Iriarte cyru...@gmail.com wrote:
 2010/1/25 Andrew Beekhof and...@beekhof.net:
 On Mon, Jan 25, 2010 at 4:15 PM, Ciro Iriarte cyru...@gmail.com wrote:
 2010/1/25 Andrew Beekhof and...@beekhof.net:

 Also, in the Advanced Build Target Selection you have these options:

 SUSE:SLE-10/standard
 SUSE:SLE-10:SDK/standard
 SUSE:SLE-10:SP2/standard
 SUSE:SLE-10:SP2:SDK/standard
 SUSE:SLE-10:SP3/standard
 SUSE:SLE-10:SP3:SDK/standard
 SUSE:SLE-11/standard
 SUSE:SLE-11:SP1/standard
 SUSE:SLES-9/standard

 Ah, that seems to be new.

 Not really :)

 Would be really nice to have it available on SLES also, but your time
 is yours. Thanks a lot for your work and effort in this project.

 If there is a public repo that I can get the rpms from, as I can for
 EPEL, then adding SLES is easy.
 There is also the option of an interested third-party dropping the
 tarballs into the build service instead of me :-)

 Hmmm, not sure what you mean.

 I mean if someone else wants to jump through the hoops to keep
 server:/ha-clustering up-to-date, that's great.
 I just won't be doing it myself :-)

 The RPMs are available from
 http://download.opensuse.org/repositories/server:/ha-clustering/SLES_10/
 and the SPEC files from
 https://build.opensuse.org/project/show?project=server:ha-clustering
 creating a free account. In fact I see you as a member of the project,
 so probably you knew all this :D

 I can create a subproject in my home and drop the tarball there, but I
 would rather like to keep things in the server:/ha-clustering,
 spreading packages everywhere would only confuse users. Now that you
 mention EPEL, I see updated RHEL packages on the OBS, how do they
 compare?

 On OBS?
 Wasn't me, I don't use OBS at all anymore.

 it's sad that RHEL packages are being updated but SLES
 aren't.

 Anyone in the world can build against the very latest EPEL repos and
 be compatible with CentOS and RHEL.
 The same isn't possible for SLES, you have to use OBS (and hope that
 it's up and gets to your package some time this century).

 OBS is a nice idea, it's just too under-staffed and under-resourced to be 
 useful.

 Well, apparently
 http://download.opensuse.org/repositories/server:/ha-clustering/ was
 wiped out. Lars, is this permanent? Can I help with that repo?

 I stumbled on that, too. The new repo is
 http://download.opensuse.org/repositories/network:/ha-clustering/. Don't
 know who had this idea...

 HTH
 Werner

Thanks!

-- 
Ciro Iriarte
http://cyruspy.wordpress.com
--
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] odd issues with LinuxHA/ldirector

2010-05-28 Thread mike

Anyone ever see an issue where ldirectord would not pass requests to 2 
backend real servers on a certain port (in my case 8080) but if you 
change that to port 22, it works flawlessly?

It's really strange that it would work on one port but not another. Any 
hints?
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] odd issues with LinuxHA/ldirector

2010-05-28 Thread Pushkar Pradhan
From: linux-ha-boun...@lists.linux-ha.org on behalf of mike
Sent: Fri 5/28/2010 10:01 AM
To: General Linux-HA mailing list
Subject: [Linux-HA] odd issues with LinuxHA/ldirector




Anyone ever see an issue where ldirectord would not pass requests to 2
backend real servers on a certain port (in my case 8080) but if you
change that to port 22, it works flawlessly?

It's really strange that it would work on one port but not another. Any
hints?

Mike,

The port number must be a configuration parameter somewhere (some config file).

pushkar

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: [Linux-HA] odd issues with LinuxHA/ldirector

2010-05-28 Thread mike
Pushkar Pradhan wrote:
  

 

 From: linux-ha-boun...@lists.linux-ha.org on behalf of mike
 Sent: Fri 5/28/2010 10:01 AM
 To: General Linux-HA mailing list
 Subject: [Linux-HA] odd issues with LinuxHA/ldirector




 Anyone ever see an issue where ldirectord would not pass requests to 2
 backend real servers on a certain port (in my case 8080) but if you
 change that to port 22, it works flawlessly?

 It's really strange that it would work on one port but not another. Any
 hints?

 Mike,

 The port number must be a configuration parameter somewhere (some config 
 file).

 pushkar

   
 

 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems

Thanks Pushkar. I have the port number in ldirectord.cf, is that what 
you mean? Here it is:
# Global Directives
checktimeout=2
checkinterval=2
logfile=/var/log/ldirectord

# heartbeat.example.com
virtual=172.28.185.49:389
protocol=tcp
scheduler=lc
checktype=connect
checkport=389
#negotiatetimeout=10
real=172.28.185.37:389 ipip
real=172.28.185.38:389 ipip
service=ldap
protocol=tcp
checktimeout=10
checkinterval=10

virtual=172.28.185.50:8080
protocol=tcp
scheduler=lc
checktype=connect
checkport=8080
real=172.28.185.12:8080 ipip
real=172.28.185.13:8080 ipip
checktimeout=10
checkinterval=10


___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] odd issues with LinuxHA/ldirector

2010-05-28 Thread Pushkar Pradhan
From: linux-ha-boun...@lists.linux-ha.org on behalf of mike
Sent: Fri 5/28/2010 12:08 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] odd issues with LinuxHA/ldirector



Pushkar Pradhan wrote:
 

 

 From: linux-ha-boun...@lists.linux-ha.org on behalf of mike
 Sent: Fri 5/28/2010 10:01 AM
 To: General Linux-HA mailing list
 Subject: [Linux-HA] odd issues with LinuxHA/ldirector




 Anyone ever see an issue where ldirectord would not pass requests to 2
 backend real servers on a certain port (in my case 8080) but if you
 change that to port 22, it works flawlessly?

 It's really strange that it would work on one port but not another. Any
 hints?

 Mike,

 The port number must be a configuration parameter somewhere (some config 
 file).

 pushkar

  
 

 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems

Thanks Pushkar. I have the port number in ldirectord.cf, is that what
you mean? Here it is:
# Global Directives
checktimeout=2
checkinterval=2
logfile=/var/log/ldirectord

# heartbeat.example.com
virtual=172.28.185.49:389
protocol=tcp
scheduler=lc
checktype=connect
checkport=389
#negotiatetimeout=10
real=172.28.185.37:389 ipip
real=172.28.185.38:389 ipip
service=ldap
protocol=tcp
checktimeout=10
checkinterval=10

virtual=172.28.185.50:8080
protocol=tcp
scheduler=lc
checktype=connect
checkport=8080
real=172.28.185.12:8080 ipip
real=172.28.185.13:8080 ipip
checktimeout=10
checkinterval=10

Mike,

I misunderstood you earlier. I didn't know you had the port numbers in the 
config file. Is your firewall on either machine blocking traffic on port 8080? 
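
A quick check on the director itself (just a suggestion):

    # the 172.28.185.50:8080 virtual service should list both reals,
    # and the connection counters should move once you send traffic
    ipvsadm -L -n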

pushkar


 

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: [Linux-HA] odd issues with LinuxHA/ldirector

2010-05-28 Thread mike
Pushkar Pradhan wrote:
  

 

 From: linux-ha-boun...@lists.linux-ha.org on behalf of mike
 Sent: Fri 5/28/2010 12:08 PM
 To: General Linux-HA mailing list
 Subject: Re: [Linux-HA] odd issues with LinuxHA/ldirector



 Pushkar Pradhan wrote:
   
 

 From: linux-ha-boun...@lists.linux-ha.org on behalf of mike
 Sent: Fri 5/28/2010 10:01 AM
 To: General Linux-HA mailing list
 Subject: [Linux-HA] odd issues with LinuxHA/ldirector




 Anyone ever see an issue where ldirectord would not pass requests to 2
 backend real servers on a certain port (in my case 8080) but if you
 change that to port 22, it works flawlessly?

 It's really strange that it would work on one port but not another. Any
 hints?

 Mike,

 The port number must be a configuration parameter somewhere (some config 
 file).

 pushkar

  
 

 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 

 Thanks Pushkar. I have the port number in ldirectord.cf, is that what
 you mean? Here it is:
 # Global Directives
 checktimeout=2
 checkinterval=2
 logfile=/var/log/ldirectord

 # heartbeat.example.com
 virtual=172.28.185.49:389
 protocol=tcp
 scheduler=lc
 checktype=connect
 checkport=389
 #negotiatetimeout=10
 real=172.28.185.37:389 ipip
 real=172.28.185.38:389 ipip
 service=ldap
 protocol=tcp
 checktimeout=10
 checkinterval=10

 virtual=172.28.185.50:8080
 protocol=tcp
 scheduler=lc
 checktype=connect
 checkport=8080
 real=172.28.185.12:8080 ipip
 real=172.28.185.13:8080 ipip
 checktimeout=10
 checkinterval=10

 Mike,

 I misunderstood you earlier. I didn't know you had the port numbers in the 
 config file. Is your firewall on either machine blocking traffic on port 
 8080? 

 pushkar


  

   
 

 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
Thanks again Pushkar,

All firewalls are off on the LVS box and the backend servers.
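
Next I'll watch the wire on one of the real servers, something like
(interface name is a guess):

    # with ipip forwarding, requests arrive IPIP-encapsulated, so
    # match the tunnel protocol as well as the plain port
    tcpdump -ni eth0 'ip proto 4 or port 8080'

to see whether the director forwards anything at all.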


___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] odd issues with LinuxHA/ldirector

2010-05-28 Thread Robinson, Eric
 Anyone ever see an issue where ldirector would not pass requests 
 to 2 backend real servers on a certain port 

I saw that once. I checked the ldirectord perl code and it turned out
that certain ports were reserved. The port I was trying to use was one
of them. Can't imagine that being the case with 8080, but just sayin'. 

--
Eric Robinson


___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] Colocation, location, auto-failback=off

2010-05-28 Thread Diego Woitasen
Hi,
  * I have three nodes: ha1, ha2 and ha3.
  * Three resources: sfex, xfs_fs, ip.
  * sfex and xfs_fs are members of a group called xfs_grp.
  * xfs_grp can run on any node but the ip resource can run on ha1 or ha2 only.
  * When xfs_grp is running on ha1 or ha2, ip must run on the same node.
  * One last thing: I need manual failback.

My current configuration works except for the manual failback (a.k.a.
auto_failback off).

node $id=0ace77ab-600a-4541-a682-ab0534bb3fc4 ha3
node $id=3d1f07b5-a79b-478f-b07c-02a7a5c5106c ha2
node $id=c44a3a26-35d4-476e-a1e6-49f03f068f12 ha1
primitive ip ocf:heartbeat:IPaddr \
params ip=192.168.1.147
primitive sfex ocf:heartbeat:sfex \
params device=/dev/sdb1 \
op monitor interval=10 timeout=10 depth=0
primitive xfs_fs ocf:heartbeat:Filesystem \
params device=/dev/sdb2 directory=/shared fstype=xfs \
op monitor interval=20 timeout=40 depth=0
group xfs_grp sfex xfs_fs
location srv_loc ip -inf: ha3
colocation srv_col inf: ip xfs_grp
property $id=cib-bootstrap-options \
no-quorum-policy=ignore \
expected-quorum-votes=1 \
stonith-enabled=0 \
default-resource-stickiness=INFINITY

When xfs_grp is running on ha3 and ha1 or ha2 come back online, the
resources (xfs_grp and ip) move to one of them.

Any ideas?

Thanks!

-- 
Diego Woitasen
XTECH
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems