Re: [Linux-HA] How to painlessly change depended upon resource groups?

2013-08-24 Thread Ferenc Wagner
Ferenc Wagner wf...@niif.hu writes:

 Arnold Krille arn...@arnoldarts.de writes:

 If I understand you correctly, the problem only arises when adding new
 bridges while the cluster is running. And your vms will (rightfully)
 get restarted when you add a non-running bridge-resource to the
 cloned dependency-group.

 Exactly.

 You might be able to circumvent this problem: Define the bridge as a
 single cloned resource and start it. When it runs on all nodes,

I wonder if there's a better way to check this than doing periodic set
compares between the output of crm_node --partition and crm_resource
--quiet --resource=temporary-bridge-clone --locate...
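
Something like this (untested) shell comparison is what I have in mind,
using only the two commands above:

  # is the temporary clone active on every member of the current partition?
  diff <(crm_node --partition | tr ' ' '\n' | sed '/^$/d' | sort) \
       <(crm_resource --quiet --resource=temporary-bridge-clone --locate | sort) \
    && echo "clone runs on all partition members"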

 remove the clones for the single resource and add the resource to
 your dependency-group in one single edit. With commit the cluster
 should see that the new resource in the group is already running and
 thus not affect the vms.

 Thanks, this sounds like a plan, I'll test it!

It turns out this indeed works, but the interface is still unfortunate: after
removing the temporary clone and moving the new bridge into the group,
crm_simulate still predicts the restart of all VMs.  This does not
happen, though, because the new resource is already running, but it's a
pity that's not shown beforehand.  Anyway, this trick at least gets rid
of the manual starting work.
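
For the archives, the whole dance looks roughly like this in the crm
shell (br152 and br152-tmp are made-up names, and the primitive
definition of the new bridge is omitted):

  crm configure clone br152-tmp br152          # temporary standalone clone
  crm_resource --resource br152-tmp --locate   # wait until it runs on all nodes
  crm configure edit    # then, in this single editing session:
                        #   - delete the "clone br152-tmp br152" line
                        #   - append br152 to "group network br150 br151"
                        # saving the file commits both changes at once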
-- 
Cheers,
Feri.
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] How to painlessly change depended upon resource groups?

2013-08-24 Thread Arnold Krille
Hi,

On Fri, 23 Aug 2013 10:41:21 +0200 Ferenc Wagner wf...@niif.hu wrote:
 Arnold Krille arn...@arnoldarts.de writes:
  On a side-note: my (sad) experience is that it's easier to configure
  such stuff outside of pacemaker/corosync and use the cluster only for
  the reliable HA things.
 What do you mean by reliable?  What did you experience (if you can
 put it in a few sentences)?

I used pacemaker to manage several things:
 1) drbd and VMs and the dependencies between them, plus the cloned
 libvirt service.
 2) drbd for a directory of configuration for central services like
 dhcp, named and an apache2. These were in one group depending on the
 drbd master.

The second turned out to be less optimal than I thought: every time you
want to change something in the dhcp or named configuration, you first
have to check on which node it is active. Restarting dhcp then has to
be done through pacemaker. And your shell had better not sit in the
shared-config directory, because otherwise pacemaker might decide to
move the resource on the restart, fail because the directory can't be
unmounted, and then fail the cluster and/or fence the node, with
disastrous results for the terminal-server VM running on that node,
affecting all the co-workers...
I have already dropped the central apache2 and named and replaced them
with instances configured by chef to be the same on all nodes. These
services aren't controlled by pacemaker anymore, so named is still
available when I shut down the cluster for maintenance. And being able
to run named on all three nodes is better than running it only on the
two nodes sharing the configuration drbd.

The next thing to do is to have dhcp configured by chef and taken out
of this group as well. Then the group will be empty :-)

Additionally, pacemaker gets slower the more resources you have. And
with 10-15 virtual machines, each with one or more drbd resources,
well, currently pacemaker is watching 63 resources...

tl;dr: pacemaker is pretty cool at making sure the defined resources
are running, but the simpler your resources are, the better. One VM
depending on one or two drbd masters is great. Synchronizing
configuration and managing complicated dependencies can be done with
pacemaker, but there are better things to spend your time on.

  Configuring several systems into a sane state is more a job for
  configuration-management such as chef, puppet or at least csync2 (to
  sync the configs).
 I'm not a big fan of configuration management systems, but they
 probably have their place.  None is present in the current setup,
 though, so setting one up for bridge configuration seemed more
 complicated than extending the cluster.  We'll see...

While automation has its advantages just by virtue of being automation,
what made it appealing for us is the repeatability. Once the automation
has proven to work in your setup, it is easily ported to a client's
setup. Even more so if it was first proven to work in your test setup
and then re-used in your own production setup. And then re-used on the
client's network...

Take a look at Chef, Puppet or Ansible; it's worth the time.

Have fun,

Arnold

PS: Sorry, this got a bit long.



Re: [Linux-HA] How to painlessly change depended upon resource groups?

2013-08-23 Thread Ferenc Wagner
Arnold Krille arn...@arnoldarts.de writes:

 If I understand you correctly, the problem only arises when adding new
 bridges while the cluster is running. And your vms will (rightfully)
 get restarted when you add a non-running bridge-resource to the
 cloned dependency-group.

Exactly.

 You might be able to circumvent this problem: Define the bridge as a
 single cloned resource and start it. When it runs on all nodes, remove
 the clones for the single resource and add the resource to your
 dependency-group in one single edit. With commit the cluster should see
 that the new resource in the group is already running and thus not
 affect the vms.

Thanks, this sounds like a plan, I'll test it!

 On a side-note: my (sad) experience is that it's easier to configure
 such stuff outside of pacemaker/corosync and use the cluster only for
 the reliable HA things.

What do you mean by reliable?  What did you experience (if you can put
it in a few sentences)?

 Configuring several systems into a sane state is more a job for
 configuration-management such as chef, puppet or at least csync2 (to
 sync the configs).

I'm not a big fan of configuration management systems, but they probably
have their place.  None is present in the current setup, though, so
setting one up for bridge configuration seemed more complicated than
extending the cluster.  We'll see...
-- 
Regards,
Feri.


[Linux-HA] How to painlessly change depended upon resource groups?

2013-08-22 Thread Ferenc Wagner
Hi,

I built a Pacemaker cluster to manage virtual machines (VMs).  Storage
is provided by cLVM volume groups, network access is provided by
software bridges.  I wanted to avoid maintaining precise VG and bridge
dependencies, so I created two cloned resource groups:

group storage dlm clvmd vg-vm vg-data
group network br150 br151

I cloned these groups, and thus every VM resource uniformly got only these
two dependencies, which makes it easy to add new VM resources:

colocation cl-elm-network inf: vm-elm network-clone
colocation cl-elm-storage inf: vm-elm storage-clone
order o-elm-network inf: network-clone vm-elm
order o-elm-storage inf: storage-clone vm-elm
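
The clone definitions referenced above are simply the following (meta
attributes omitted):

clone storage-clone storage
clone network-clone network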

Of course the network and storage groups do not even model their
internal dependencies correctly, as the different VGs and bridges are
independent and unordered, but this is not a serious limitation in my
case.

The problem is, if I want to extend for example the network group by a
new bridge, the cluster wants to restart all running VM resources while
starting the new bridge.  I get this info by changing a shadow copy of
the CIB and running crm_simulate --run --live-check on it.  This is perfectly
understandable due to the strict ordering and colocation constraints
above, but undesirable in these cases.
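
Concretely, the check goes something like this (the shadow name is
arbitrary):

  crm_shadow --create test          # subsequent commands act on the shadow copy
  crm configure edit                # e.g. append the new bridge to "group network"
  crm_simulate --run --live-check   # predicts a restart of all VMs
  crm_shadow --delete test          # discard the experiment (--commit would apply it)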

The actual restarts are avoidable by putting the cluster in maintenance
mode beforehand, starting the bridge on each node manually, changing the
configuration and moving the cluster out of maintenance mode, but this
is quite a chore, and I did not find a way to make sure everything would
be fine, like seeing the planned cluster actions after the probes for
the new bridge resource are run (when there should not be anything left
to do).  Is there a way to regain my peace of mind during such
operations?  Or is there at least a way to order the cluster to start
the new bridge clones so that I don't have to invoke the resource agent
by hand on each node, thus reducing possible human mistakes?
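
For reference, the maintenance-mode dance spelled out looks roughly like
this (the resource agent invocation is only schematic, as it depends on
the bridge RA in use):

  crm configure property maintenance-mode=true
  # on every node, start the new bridge by invoking its resource agent by
  # hand, schematically (agent path and parameters depend on the RA used):
  #   OCF_ROOT=/usr/lib/ocf OCF_RESKEY_PARAM=VALUE \
  #     /usr/lib/ocf/resource.d/PROVIDER/AGENT start
  crm configure edit    # append the new bridge to "group network"
  # probes for the new resource run here; this is the point where I would
  # like to review the planned actions before re-enabling management
  crm configure property maintenance-mode=false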

The bridge configuration was moved into the cluster to avoid having to
maintain it in each node's OS separately.  The network and storage
resource groups provide a great concise status output with only the VM
resources expanded.  These are bonuses, but not requirements; if
sensible maintenance is not achievable with this setup, everything is
subject to change.  Actually, I'm starting to feel that simplifying the
VM dependencies may not be viable in the end, but wanted to ask for
outsider ideas before overhauling the whole configuration.
-- 
Thanks in advance,
Feri.


Re: [Linux-HA] How to painlessly change depended upon resource groups?

2013-08-22 Thread Arnold Krille
Hi,

On Thu, 22 Aug 2013 18:22:50 +0200 Ferenc Wagner wf...@niif.hu wrote:
 I built a Pacemaker cluster to manage virtual machines (VMs).  Storage
 is provided by cLVM volume groups, network access is provided by
 software bridges.  I wanted to avoid maintaining precise VG and bridge
 dependencies, so I created two cloned resource groups:
 
 group storage dlm clvmd vg-vm vg-data
 group network br150 br151
 
 I cloned these groups, and thus every VM resource uniformly got only
 these two dependencies, which makes it easy to add new VM resources:
 
 colocation cl-elm-network inf: vm-elm network-clone
 colocation cl-elm-storage inf: vm-elm storage-clone
 order o-elm-network inf: network-clone vm-elm
 order o-elm-storage inf: storage-clone vm-elm
 
 Of course the network and storage groups do not even model their
 internal dependencies correctly, as the different VGs and bridges are
 independent and unordered, but this is not a serious limitation in my
 case.
 
 The problem is, if I want to extend for example the network group by a
 new bridge, the cluster wants to restart all running VM resources
 while starting the new bridge.  I get this info by changing a shadow
 copy of the CIB and running crm_simulate --run --live-check on it.  This is
 perfectly understandable due to the strict ordering and colocation
 constraints above, but undesirable in these cases.
 
 The actual restarts are avoidable by putting the cluster in
 maintenance mode beforehand, starting the bridge on each node
 manually, changing the configuration and moving the cluster out of
 maintenance mode, but this is quite a chore, and I did not find a way
 to make sure everything would be fine, like seeing the planned
 cluster actions after the probes for the new bridge resource are run
 (when there should not be anything left to do).  Is there a way to
 regain my peace of mind during such operations?  Or is there at least
 a way to order the cluster to start the new bridge clones so that I
 don't have to invoke the resource agent by hand on each node, thus
 reducing possible human mistakes?
 
 The bridge configuration was moved into the cluster to avoid having to
 maintain it in each node's OS separately.  The network and storage
 resource groups provide a great concise status output with only the VM
 resources expanded.  These are bonuses, but not requirements; if
 sensible maintenance is not achievable with this setup, everything is
 subject to change.  Actually, I'm starting to feel that simplifying
 the VM dependencies may not be viable in the end, but wanted to ask
 for outsider ideas before overhauling the whole configuration.

If I understand you correctly, the problem only arises when adding new
bridges while the cluster is running. And your vms will (rightfully)
get restarted when you add a non-running bridge-resource to the
cloned dependency-group.
You might be able to circumvent this problem: Define the bridge as a
single cloned resource and start it. When it runs on all nodes, remove
the clones for the single resource and add the resource to your
dependency-group in one single edit. With commit the cluster should see
that the new resource in the group is already running and thus not
affect the vms.


On a side-note: my (sad) experience is that it's easier to configure
such stuff outside of pacemaker/corosync and use the cluster only for
the reliable HA things. Configuring several systems into a sane state
is more a job for configuration-management such as chef, puppet or at
least csync2 (to sync the configs).

Have fun,

Arnold

