Re: [Linux-HA] Antw: Re: crmsh fails to stop already stopped resource

2015-02-18 Thread Vladislav Bogdanov

18.02.2015 11:03, Ulrich Windl wrote:

Lars Marowsky-Bree l...@suse.com wrote on 16.02.2015 at 12:34 in

message
20150216113433.gb4...@suse.de:

On 2015-02-16T09:20:22, Kristoffer Grönlund kgronl...@suse.com wrote:


Actually, I decided that it does make sense to return 0 as the error
code even if the resource to delete doesn't exist, so I pushed a commit
to change this. The error message is still printed, though.


I'm not sure I agree, for once.

Idempotency is for resource agent operations, not necessarily all
operations everywhere. Especially because crmsh doesn't know whether the
object doesn't exist because it was deleted, or because it was
misspelled.


So far I was assuming we are talking about stopping a stopped resource, not
stopping a non-existing resource. I thought that crm shell would clearly
distinguish between those. The latter should be an error, of course.


That is my fault; I should have started a new thread, but I thought 
that the issue with deleting a non-existent object was really very minor.


Best,
Vladislav

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: [Linux-HA] crmsh fails to stop already stopped resource

2015-02-16 Thread Vladislav Bogdanov

16.02.2015 14:34, Lars Marowsky-Bree wrote:

On 2015-02-16T09:20:22, Kristoffer Grönlund kgronl...@suse.com wrote:


Actually, I decided that it does make sense to return 0 as the error
code even if the resource to delete doesn't exist, so I pushed a commit
to change this. The error message is still printed, though.


I'm not sure I agree, for once.

Idempotency is for resource agent operations, not necessarily all
operations everywhere. Especially because crmsh doesn't know whether the
object doesn't exist because it was deleted, or because it was
misspelled.

Compare the Unix-as-little-else rm command; rm -f /tmp/idontexist will
give an error code.


btw with '-f' it won't. ;) And it would be enough for me if 'crm -F' 
behaved the same.


Best,
Vladislav



Now a caller of crmsh has to *parse the output* to know whether the
delete command succeeded or not, which is rather non-trivial.

If the caller doesn't care whether the command succeeded or not, it
should be the caller that ignores the error code.

Or if you want to get real fancy, return different exit codes for
'referenced object does not exist' and for a generic syntax error.
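
As a sketch, a shell caller could then branch on the code; the dedicated
value 2 is hypothetical, not current crmsh behavior:

crm configure delete myrsc
case $? in
0) ;;                              # deleted (or already gone)
2) echo "no such object" >&2 ;;    # hypothetical dedicated exit code
*) echo "syntax or other error" >&2 ;;
esac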


Following fails with the current crmsh (e4b10ee).
# crm resource stop cl-http-lv
# crm resource stop cl-http-lv
ERROR: crm_diff apparently failed to produce the diff (rc=0)
ERROR: Failed to commit updates to cl-http-lv
# echo $?
1


And, yeah, well, this shouldn't happen. Here idempotency applies ;-)



Regards,
 Lars



___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] [PATCH] low: cibconfig: Do not fail on deletion of non-existing objects

2015-02-16 Thread Vladislav Bogdanov
---
 modules/cibconfig.py |2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/modules/cibconfig.py b/modules/cibconfig.py
index 8689c1b..8680f33 100644
--- a/modules/cibconfig.py
+++ b/modules/cibconfig.py
@@ -3463,8 +3463,6 @@ class CibFactory(object):
 for obj_id in args:
 obj = self.find_object(obj_id)
 if not obj:
-no_object_err(obj_id)
-rc = False
 continue
 if not rscstat.can_delete(obj_id):
common_err("resource %s is running, can't delete it" % obj_id)
-- 
1.7.1
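
An alternative, if the error should stay by default and be silenced only
under force (the '-F' idea raised in the discussion), might look roughly
like this; a sketch only, assuming config.core.force is the flag crmsh
consults:

 if not obj:
-    no_object_err(obj_id)
-    rc = False
+    if not config.core.force:
+        no_object_err(obj_id)
+        rc = False
     continue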

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] crmsh fails to stop already stopped resource

2015-02-16 Thread Vladislav Bogdanov

16.02.2015 11:15, Kristoffer Grönlund wrote:

Vladislav Bogdanov bub...@hoster-ok.com writes:


Hi Kristoffer,

maybe it is worth silently (or at least with rc=0) allowing deletion of
non-existing or already-deleted configuration statements?

Background for that is that I keep track of all the configuration
statements myself, and, when I delete some resources (together with
accompanying constraints), they may go out of order to 'crm configure
delete'; thus some constraints are automatically deleted when deleting a
lower resource before the upper one. That makes the whole crm
script fail.


Hmm, I am not sure about doing this by default, since we would want to
show some kind of indication that a resource name may have been
misspelled for example... But I can imagine having a command line flag
for being more flexible in this regard.


Reuse '-F'?



I will look at how it works now.

BTW, I suspect that passing the --wait flag to crm while running
commands in this way may help you. Although I am not sure I entirely
understand what it is you are doing :)


Look:
crm configure
primitive a ...
primitive b ...
colocation b-with-a inf: b a
commit
exit

crm configure
delete a
delete b-with-a <= fails because it was already deleted automatically
delete b
commit

Best,
Vladislav



Cheers,
Kristoffer



Best,
Vladislav

13.02.2015 17:03, Vladislav Bogdanov wrote:

Hi,

Following fails with the current crmsh (e4b10ee).
# crm resource stop cl-http-lv
# crm resource stop cl-http-lv
ERROR: crm_diff apparently failed to produce the diff (rc=0)
ERROR: Failed to commit updates to cl-http-lv
# echo $?
1


Best,
Vladislav
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems







___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: [Linux-HA] crmsh fails to stop already stopped resource

2015-02-16 Thread Vladislav Bogdanov

Hi Dejan,

16.02.2015 13:47, Dejan Muhamedagic wrote:

Hi,

On Mon, Feb 16, 2015 at 11:20:16AM +0300, Vladislav Bogdanov wrote:

16.02.2015 11:15, Kristoffer Grönlund wrote:

Vladislav Bogdanov bub...@hoster-ok.com writes:


Hi Kristoffer,

maybe it is worth silently (or at least with rc=0) allowing deletion of
non-existing or already-deleted configuration statements?

Background for that is that I keep track of all the configuration
statements myself, and, when I delete some resources (together with
accompanying constraints), they may go out of order to 'crm configure
delete'; thus some constraints are automatically deleted when deleting a
lower resource before the upper one. That makes the whole crm
script fail.


crmsh tries hard to preserve CIB sanity when removing elements.
It would be best if you just put all the elements you want to
delete on one line.
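
For example, for the a/b script discussed later in this thread, that
would be:

crm configure
delete a b-with-a b
commit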


That's a really good idea, I'll look into this.




Hmm, I am not sure about doing this by default, since we would want to
show some kind of indication that a resource name may have been
misspelled for example... But I can imagine having a command line flag
for being more flexible in this regard.


Reuse '-F'?



I will look at how it works now.

BTW, I suspect that passing the --wait flag to crm while running
commands in this way may help you.


The --wait option effectively waits for the PE to settle. It is
normally useful only at the resource/node levels and on configure
commit.


Although I am not sure I entirely
understand what it is you are doing :)


Look:
crm configure
primitive a ...
primitive b ...
colocation b-with-a inf: b a
commit
exit

crm configure
delete a
delete b-with-a <= fails because it was already deleted automatically


You can also omit removing constraints as they are going to be
removed with the resources they reference.


Unless the same function is used to remove just constraints too (like in 
my case - I compare the old and new definitions of an object with constraints 
and remove stale ones).


Anyway, thanks for the pointer to multi-object deletes!

Best,
Vladislav



Cheers,

Dejan


delete b
commit

Best,
Vladislav



Cheers,
Kristoffer



Best,
Vladislav

13.02.2015 17:03, Vladislav Bogdanov wrote:

Hi,

Following fails with the current crmsh (e4b10ee).
# crm resource stop cl-http-lv
# crm resource stop cl-http-lv
ERROR: crm_diff apparently failed to produce the diff (rc=0)
ERROR: Failed to commit updates to cl-http-lv
# echo $?
1


Best,
Vladislav
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems







___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems



___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] crmsh fails to stop already stopped resource

2015-02-13 Thread Vladislav Bogdanov
Hi Kristoffer,

13.02.2015 17:20, Kristoffer Grönlund wrote:
 Vladislav Bogdanov bub...@hoster-ok.com writes:
 
 Hi,

 Following fails with the current crmsh (e4b10ee).
 # crm resource stop cl-http-lv
 # crm resource stop cl-http-lv
 ERROR: crm_diff apparently failed to produce the diff (rc=0)
 ERROR: Failed to commit updates to cl-http-lv
 # echo $?
 1

 
 Hi,
 
 What would you expect to see when stopping an already stopped resource?

I'd expect crmsh to behave similarly to
crm_resource --resource cl-http-lv --set-parameter target-role --meta 
--parameter-value Stopped

At least it should not exit with a failure return code.

Best,
Vladislav

 
 Cheers,
 Kristoffer
 

 Best,
 Vladislav
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems

 

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: [Linux-HA] crmsh fails to stop already stopped resource

2015-02-13 Thread Vladislav Bogdanov

13.02.2015 18:04, Kristoffer Grönlund wrote:

Vladislav Bogdanov bub...@hoster-ok.com writes:


Hi Kristoffer,

13.02.2015 17:20, Kristoffer Grönlund wrote:

Vladislav Bogdanov bub...@hoster-ok.com writes:


Hi,

Following fails with the current crmsh (e4b10ee).
# crm resource stop cl-http-lv
# crm resource stop cl-http-lv
ERROR: crm_diff apparently failed to produce the diff (rc=0)
ERROR: Failed to commit updates to cl-http-lv
# echo $?
1



Hi,

What would you expect to see when stopping an already stopped resource?


I'd expect crmsh to behave similarly to
crm_resource --resource cl-http-lv --set-parameter target-role --meta 
--parameter-value Stopped

At least it should not exit with a failure return code.


Yeah, I see what you mean. I have fixed this upstream now.


Thanks Kristoffer!



Thanks!

// Kristoffer




Best,
Vladislav



Cheers,
Kristoffer



Best,
Vladislav
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems










___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: [Linux-HA] [Patch] Collection of patches for crmsh

2015-01-19 Thread Vladislav Bogdanov

Hi Dejan,

19.01.2015 16:30, Dejan Muhamedagic wrote:

Hi Vladislav,


[...]
Fix transition start detection.

--- a/modules/constants.py  2014-12-22 08:48:26.0 +
+++ b/modules/constants.py  2014-12-22 13:07:43.945077805 +
@@ -272,7 +272,7 @@
  # r.group(3) file number
  transition_patt = [
  # transition start
-"crmd.* do_te_invoke: Processing graph ([0-9]+) .*derived from (.*/pe-[^-]+-(%%)[.]bz2)",
+"pengine.* process_pe_message: Calculated Transition ([0-9]+): (.*/pe-[^-]+-(%%)[.]bz2)",


Do you know when this changed?


The original message (from do_te_invoke) was downgraded to 'info' 
priority long ago (probably during Andrew's massive logging 
cleanup), while the process_pe_message one still remains at the 'notice' 
level. The first version of my patch is dated 2012-12-26 (for crmsh-1.2.4), 
so the change was made before that. IIRC the process_pe_message message 
was always there; both messages were printed before that cleanup.




The reason I'm asking is that crmsh tries to support multiple
pacemaker versions, so I'm not sure if we can just replace this
pattern.


Make tar follow symlinks.

--- a/modules/crm_pssh.py   2013-08-12 12:52:11.0 +
+++ b/modules/crm_pssh.py   2013-08-12 12:53:32.666444069 +
@@ -170,7 +170,7 @@
 dir = "/%s" % r.group(1)
 red_pe_l = [x.replace("%s/" % r.group(1), "") for x in pe_l]
 common_debug("getting new PE inputs %s from %s" % (red_pe_l, node))
-cmdline = "tar -C %s -cf - %s" % (dir, ' '.join(red_pe_l))
+cmdline = "tar -C %s -chf - %s" % (dir, ' '.join(red_pe_l))


Just curious: where did you find links in the PE input
directories?


Ahm, you know, systems are so different around the world ;)
And system administrators sometimes want to do weird things ;)

Actually that one is specific to my diskless clusters, but it won't hurt 
anyway.




And many thanks for the patches!

Cheers,

Dejan



Best,
Vladislav

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems



___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] [Patch] Collection of patches for crmsh

2015-01-19 Thread Vladislav Bogdanov

19.01.2015 16:27, Kristoffer Grönlund wrote:

Vladislav Bogdanov bub...@hoster-ok.com writes:


Hi Kristoffer,


there are two patches, one for crmsh and one for parallax.
They make history commands work.


Thanks!

I have created a pull request with the patches for crmsh here:

https://github.com/crmsh/crmsh/pull/77


Thank you very much, I do not have enough will to make myself use those 
web-2.0 tools ;)




Cheers,
Kristoffer



Best,
Vladislav






___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

[Linux-HA] [Patch] Collection of patches for crmsh

2015-01-19 Thread Vladislav Bogdanov
Hi Kristoffer,


there are two patches, one for crmsh and one for parallax.
They make history commands work.

--- a/modules/crm_pssh.py   2015-01-19 11:42:02.0 +
+++ b/modules/crm_pssh.py   2015-01-19 12:17:46.328000847 +
@@ -85,14 +85,14 @@ def do_pssh(l, opts):
'-o', 'PasswordAuthentication=no',
'-o', 'SendEnv=PSSH_NODENUM',
'-o', 'StrictHostKeyChecking=no']
-if opts.options:
+if hasattr(opts, 'options'):
 for opt in opts.options:
 cmd += ['-o', opt]
 if user:
 cmd += ['-l', user]
 if port:
 cmd += ['-p', port]
-if opts.extra:
+if hasattr(opts, 'extra'):
 cmd.extend(opts.extra)
 if cmdline:
 cmd.append(cmdline)
---

--- a/parallax/manager.py   2014-10-15 13:40:04.0 +
+++ b/parallax/manager.py   2015-01-19 12:15:47.911000236 +
@@ -47,11 +47,26 @@ class Manager(object):
 # Backwards compatibility with old __init__
 # format: Only argument is an options dict
 if not isinstance(limit, int):
-self.limit = limit.par
-self.timeout = limit.timeout
-self.askpass = limit.askpass
-self.outdir = limit.outdir
-self.errdir = limit.errdir
+if hasattr(limit, 'par'):
+self.limit = limit.par
+else:
+self.limit = DEFAULT_PARALLELISM
+if hasattr(limit, 'timeout'):
+self.timeout = limit.timeout
+else:
+self.timeout = DEFAULT_TIMEOUT
+if hasattr(limit, 'askpass'):
+self.askpass = limit.askpass
+else:
+self.askpass = False
+if hasattr(limit, 'outdir'):
+self.outdir = limit.outdir
+else:
+self.outdir = None
+if hasattr(limit, 'errdir'):
+self.errdir = limit.errdir
+else:
+self.errdir = None
 else:
 self.limit = limit
 self.timeout = timeout
---
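
A side note on the hasattr guards in both hunks: Python's getattr with a
default would express the same fallbacks more compactly (a sketch,
equivalent in behavior, not part of the patches):

self.limit = getattr(limit, 'par', DEFAULT_PARALLELISM)
self.timeout = getattr(limit, 'timeout', DEFAULT_TIMEOUT)
self.askpass = getattr(limit, 'askpass', False)
self.outdir = getattr(limit, 'outdir', None)
self.errdir = getattr(limit, 'errdir', None)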

Two more patches I have used for ages in my builds are:

Fix transition start detection.

--- a/modules/constants.py  2014-12-22 08:48:26.0 +
+++ b/modules/constants.py  2014-12-22 13:07:43.945077805 +
@@ -272,7 +272,7 @@
 # r.group(3) file number
 transition_patt = [
 # transition start
-"crmd.* do_te_invoke: Processing graph ([0-9]+) .*derived from (.*/pe-[^-]+-(%%)[.]bz2)",
+"pengine.* process_pe_message: Calculated Transition ([0-9]+): (.*/pe-[^-]+-(%%)[.]bz2)",
 # r.group(1) transition number (a different thing from file number)
 # r.group(2) contains full path
 # r.group(3) transition status
---
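
For reference, the new pattern is meant to match pengine log lines of
roughly this shape (pid and numbers illustrative):

pengine[2831]: notice: process_pe_message: Calculated Transition 42: /var/lib/pacemaker/pengine/pe-input-17.bz2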

Make tar follow symlinks.

--- a/modules/crm_pssh.py   2013-08-12 12:52:11.0 +
+++ b/modules/crm_pssh.py   2013-08-12 12:53:32.666444069 +
@@ -170,7 +170,7 @@
 dir = "/%s" % r.group(1)
 red_pe_l = [x.replace("%s/" % r.group(1), "") for x in pe_l]
 common_debug("getting new PE inputs %s from %s" % (red_pe_l, node))
-cmdline = "tar -C %s -cf - %s" % (dir, ' '.join(red_pe_l))
+cmdline = "tar -C %s -chf - %s" % (dir, ' '.join(red_pe_l))
 opts = parse_args(outdir, errdir)
 l.append([node, cmdline])
 if not l:
---


Best,
Vladislav

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] crm configure show to a pipe

2014-11-17 Thread Vladislav Bogdanov
17.11.2014 14:00, Dejan Muhamedagic wrote:
 Hi,
 
 On Mon, Nov 17, 2014 at 10:05:59AM +0300, Vladislav Bogdanov wrote:
 Hi Kristoffer, all,

running 'crm configure show > file' appends non-printable chars at the
 end (at least if op_defaults is used):
 
Best to use "crm configure save" for filtering (I guess that you
 don't want colors in that case). As for strange codes output,

Great! How did I miss that? :)
The only noticeable difference is that it is impossible to save a partial
CIB, filtering by object ids (like 'show' allows).

 they're most likely due to some libreadline bug and TERM set to
 xterm. I found some information at the time here:
 
 https://bugs.gentoo.org/show_bug.cgi?id=246091
 
 We dealt with that then by not importing readline unless
 absolutely necessary. The changeset is 4d11007. My bad for not
 commenting that in the code.
 
 readline probably gets imported in non-interactive mode again.
 
 Thanks,
 
 Dejan
 
 
 ...
 property cib-bootstrap-options: \
 dc-version=1.1.12-c191bf3 \
 cluster-infrastructure=corosync \
 cluster-recheck-interval=10m \
 stonith-enabled=false \
 no-quorum-policy=freeze \
 last-lrm-refresh=1415955398 \
 maintenance-mode=false \
 stop-all-resources=false \
 stop-orphan-resources=true \
 have-watchdog=false
 rsc_defaults rsc_options: \
 allow-migrate=false \
 failure-timeout=10m \
 migration-threshold=INFINITY \
 multiple-active=stop_start \
 priority=0
 op_defaults op-options: \
 record-pending=true.[?1034h


 Best,
 Vladislav
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: [Linux-HA] crm configure show to a pipe

2014-11-17 Thread Vladislav Bogdanov
17.11.2014 15:39, Kristoffer Grönlund wrote:
 Dejan Muhamedagic deja...@fastmail.fm writes:
 
 Hi,

 On Mon, Nov 17, 2014 at 10:05:59AM +0300, Vladislav Bogdanov wrote:
 Hi Kristoffer, all,

 running 'crm configure show > file' appends non-printable chars at the
 end (at least if op_defaults is used):

 Best to use "crm configure save" for filtering (I guess that you
 don't want colors in that case). As for strange codes output,
 they're most likely due to some libreadline bug and TERM set to
 xterm. I found some information at the time here:

 https://bugs.gentoo.org/show_bug.cgi?id=246091

 We dealt with that then by not importing readline unless
 absolutely necessary. The changeset is 4d11007. My bad for not
 commenting that in the code.

 readline probably gets imported in non-interactive mode again.

 
 I can confirm that yes, it does. My apologies for reintroducing this
 issue! I will change this.
 
 I will also look at adding optional filtering to the save command just
 like for show and edit. This seems like a useful feature to me.
 

Thank you for your extremely productive work!

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

[Linux-HA] crmsh and 'no such resource agent' error

2014-11-17 Thread Vladislav Bogdanov
Hi Kristoffer, all,

It seems that with the introduction of 'resource-discovery',
'symmetric-cluster=true' becomes less strict about resource
agent sets across nodes.

Maybe it is possible to add a config option to disable error messages
like:

got no meta-data, does this RA exist?
no such resource agent

Best,
Vladislav
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] crm configure show to a pipe

2014-11-16 Thread Vladislav Bogdanov
Hi Kristoffer, all,

running 'crm configure show > file' appends non-printable chars at the
end (at least if op_defaults is used):

...
property cib-bootstrap-options: \
dc-version=1.1.12-c191bf3 \
cluster-infrastructure=corosync \
cluster-recheck-interval=10m \
stonith-enabled=false \
no-quorum-policy=freeze \
last-lrm-refresh=1415955398 \
maintenance-mode=false \
stop-all-resources=false \
stop-orphan-resources=true \
have-watchdog=false
rsc_defaults rsc_options: \
allow-migrate=false \
failure-timeout=10m \
migration-threshold=INFINITY \
multiple-active=stop_start \
priority=0
op_defaults op-options: \
record-pending=true.[?1034h


Best,
Vladislav
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Antw: Re: Pending state support

2014-11-13 Thread Vladislav Bogdanov
13.11.2014 12:20, Ulrich Windl wrote:
 I realized that older versions of crm_mon don't have it (-j); thus it will 
 spit out a usage message. Try to avoid that problem, please.

Yes, it appeared IIRC in 1.1.10 or 1.1.11, so a simple version check
should be enough. And that check is already implemented and used for
other features.
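
A minimal sketch of such a gate (the crm_mon_version helper and the exact
cut-off are assumptions):

def crm_mon_args():
    # add -j only where crm_mon is new enough to know the flag
    args = ["crm_mon", "-1"]
    if crm_mon_version() >= (1, 1, 10):  # assumed helper returning a tuple
        args.append("-j")
    return args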

 
 Vladislav Bogdanov bub...@hoster-ok.com wrote on 13.11.2014 at 07:26 in
 message 54644f2c.3020...@hoster-ok.com:
 Hi Kristoffer!

 May I bump this one?

 Best,
 Vladislav

 04.11.2014 11:15, Vladislav Bogdanov wrote:
 Hi Kristoffer, Dejan, all.

 Maybe it is time to add the '-j' param to 'crm_mon -1' by default (if
 supported)?

 Best,
 Vladislav


 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org 
 http://lists.linux-ha.org/mailman/listinfo/linux-ha 
 See also: http://linux-ha.org/ReportingProblems 
 
 
 
 
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] crmsh and 'resource-discovery'

2014-11-12 Thread Vladislav Bogdanov
Hi Kristoffer, Dejan.

Do you have plans to add support to crmsh for the 'resource-discovery'
location constraint option (added to pacemaker by David in pull requests
#589 and #605) as well as for the 'pacemaker-next' schema (this one
seems to be trivial)?

Best,
Vladislav
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] crmsh and 'resource-discovery'

2014-11-12 Thread Vladislav Bogdanov
12.11.2014 23:32, Kristoffer Grönlund wrote:
 Vladislav Bogdanov bub...@hoster-ok.com writes:
 
 Hi Kristoffer, Dejan.

 Do you have plans to add support to crmsh for 'resource-discovery'
 location constraint option (added to pacemaker by David in pull requests
 #589 and #605) as well as for the 'pacemaker-next' schema (this one
 seems to be trivial)?

 Best,
 Vladislav

 
 I haven't had time to look closer at resource-discovery, but yes, I
 certainly intend to support every option that makes it into a released
 version of pacemaker at least.

Great. Can't wait for that to happen :)

 
 As for the pacemaker-next schema, I thought I had added support for it
 already, but I haven't actually tested it :) But yes, it should be
 usable in theory at least, and if it is not, that is a bug that I will
 fix.

It is not supported in the crmsh-2.1.1-1.1 rpm for EL7 in OBS. Regexps in
three places match only pacemaker-[[:digit:]]\.[[:digit:]] and can be
trivially fixed.

Best,
Vladislav

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: [Linux-HA] crmsh and 'resource-discovery'

2014-11-12 Thread Vladislav Bogdanov
13.11.2014 00:12, Kristoffer Grönlund wrote:
 Vladislav Bogdanov bub...@hoster-ok.com writes:
 
 I haven't had time to look closer at resource-discovery, but yes, I
 certainly intend to support every option that makes it into a released
 version of pacemaker at least.

 Great. Can't wait for that to happen :)


 As for the pacemaker-next schema, I thought I had added support for it
 already, but I haven't actually tested it :) But yes, it should be
 usable in theory at least, and if it is not, that is a bug that I will
 fix.

 It is not supported in crmsh-2.1.1-1.1 rpm for EL7 in OBS. Regexps in
 three places match only pacemaker-[[:digit:]]\.[[:digit:]] and can
 trivially be fixed.
 
 Alright, I have added tentative support for both resource-discovery and
 the pacemaker-next schema in the master branch for crmsh.

Yep!

Will test tomorrow morning.

One more place for pacemaker-next is cibconfig.py, CibFactory.__init__:
self.supported_cib_re = "^pacemaker-([12][.][0123]|next)$"

Thank you,
Vladislav

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: [Linux-HA] Pending state support

2014-11-12 Thread Vladislav Bogdanov
Hi Kristoffer!

May I bump this one?

Best,
Vladislav

04.11.2014 11:15, Vladislav Bogdanov wrote:
 Hi Kristoffer, Dejan, all.
 
  Maybe it is time to add the '-j' param to 'crm_mon -1' by default (if
 supported)?
 
 Best,
 Vladislav
 

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] Pending state support

2014-11-04 Thread Vladislav Bogdanov
Hi Kristoffer, Dejan, all.

Maybe it is time to add the '-j' param to 'crm_mon -1' by default (if
supported)?

Best,
Vladislav
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Remote node attributes support in crmsh

2014-10-29 Thread Vladislav Bogdanov
29.10.2014 12:49, Dejan Muhamedagic wrote:

...

 On the other hand, this feature is relatively new (has it ever
 been released?) so it is much simpler to fix that breakage in pacemaker.
 
 It's not pacemaker, it's just a resource agent. Which makes it
 much easier to fix, just by introducing one parameter which would
 hold the remote node name.

In this case some pacemaker internals are also involved. The RA is just a
stub with a well-known name.


___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Remote node attributes support in crmsh

2014-10-29 Thread Vladislav Bogdanov
29.10.2014 13:55, Dejan Muhamedagic wrote:
 On Wed, Oct 29, 2014 at 01:03:50PM +0300, Vladislav Bogdanov wrote:
 29.10.2014 12:49, Dejan Muhamedagic wrote:

 ...

 On the other hand, this feature is relatively new (has it ever
 been released?) so it is much simpler to fix that breakage in pacemaker.

 It's not pacemaker, it's just a resource agent. Which makes it
 much easier to fix, just by introducing one parameter which would
 hold the remote node name.

 In this case some pacemaker internals are also involved. RA is just a
 stub with a well-known name.
 
 Really? Oops ;-)
 
 At any rate, Kristoffer did some small patch which makes this
 work for the most part (and as long as the node ID is different
 from its uname; sigh). It's available with the latest release
 2.1.1.

Great!

Thanks for the info.

Best,
Vladislav
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Remote node attributes support in crmsh

2014-10-28 Thread Vladislav Bogdanov
28.10.2014 21:15, David Vossel wrote:
 
 
 - Original Message -
 22.10.2014 12:02, Dejan Muhamedagic wrote:
 On Mon, Oct 20, 2014 at 07:12:23PM +0300, Vladislav Bogdanov wrote:
 20.10.2014 18:23, Dejan Muhamedagic wrote:
 Hi Vladislav,

 Hi Dejan!


 On Mon, Oct 20, 2014 at 09:03:40AM +0300, Vladislav Bogdanov wrote:
 Hi Kristoffer,

 do you plan to add support for recently added remote node attributes
 feature to crmsh?

 Currently (at least as of 2.1, and I do not see anything relevant in the
 git log) crmsh fails to update CIB if it contains node attributes for
 remote (bare-metal) node, complaining that duplicate element is found.

 No wonder :) The uname effectively doubles as an element id.

 But for bare-metal nodes it is natural to have ocf:pacemaker:remote
 resource with name equal to remote node uname (I doubt it can be
 configured differently).

 Is that required?

 Didn't look in code, but seems like yes, :remote resource name is the
 only place where pacemaker can obtain that node name.

 I find it surprising that the id is used to carry information.
 I'm not sure if we had a similar case (apart from attributes).

 If I comment check for 'obj_id in id_set', then it fails to update CIB
 because it inserts above primitive definition into the node section.

 Could you please show what would the CIB look like with such a
 remote resource (in crmsh notation).



 node 1: node01
 node rnode001:remote \
attributes attr=value
 primitive rnode001 ocf:pacemaker:remote \
 params server=192.168.168.20 \
 op monitor interval=10 \
 meta target-role=Started

 What do you expect to happen when you reference rnode001, in say:

 That is not me ;) I just want to be able to use crmsh to assign remote
 node operational and utilization (?) attributes and to work with it
 after that.

 Probably that is not yet set in stone, and David may change that,
 allowing e.g. a new 'node_name' parameter to ocf:pacemaker:remote to
 override the remote node name guessed from the primitive name.

 David, could you comment please?
 
 why would we want to separate the remote-node from the resource's primitive
 instance name?

It breaks the existing crmsh internal concept that every object in a CIB
has a unique name. This also breaks the syntax of some existing commands,
as Dejan says, e.g.

crm configure show rnode001

or

crm configure edit rnode001 (?)

From what I see it is very hard to modify crmsh to support objects with
different types but equal names, and that will definitely hurt its
maturity. On the other hand, this feature is relatively new (has it ever
been released?), so it is much simpler to fix that breakage in pacemaker.

Best,
Vladislav

 
 -- David
 

 Best,
 Vladislav


 crm configure show rnode001

 I'm still trying to digest having hostname used to name some
 other element. Wonder what/where else will we have issues for
 this reason.

 Cheers,

 Dejan

 Best,
 Vladislav

 Given that nodes are for the most part referenced by uname
 (instead of by id), do you think that a configuration where
 a primitive element is named the same as a node, the user can
 handle that in an efficient manner? (NB: No experience here with
 ocf:pacemaker:remote :)




 Cheers,

 Dejan



 Best,
 Vladislav
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems


 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems




___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Remote node attributes support in crmsh

2014-10-22 Thread Vladislav Bogdanov
22.10.2014 12:02, Dejan Muhamedagic wrote:
 On Mon, Oct 20, 2014 at 07:12:23PM +0300, Vladislav Bogdanov wrote:
 20.10.2014 18:23, Dejan Muhamedagic wrote:
 Hi Vladislav,

 Hi Dejan!


 On Mon, Oct 20, 2014 at 09:03:40AM +0300, Vladislav Bogdanov wrote:
 Hi Kristoffer,

 do you plan to add support for recently added remote node attributes
 feature to crmsh?

 Currently (at least as of 2.1, and I do not see anything relevant in the
 git log) crmsh fails to update CIB if it contains node attributes for
 remote (bare-metal) node, complaining that duplicate element is found.

 No wonder :) The uname effectively doubles as an element id.

 But for bare-metal nodes it is natural to have ocf:pacemaker:remote
 resource with name equal to remote node uname (I doubt it can be
 configured differently).

 Is that required?

 Didn't look in code, but seems like yes, :remote resource name is the
 only place where pacemaker can obtain that node name.
 
 I find it surprising that the id is used to carry information.
 I'm not sure if we had a similar case (apart from attributes).
 
 If I comment check for 'obj_id in id_set', then it fails to update CIB
 because it inserts above primitive definition into the node section.

 Could you please show what would the CIB look like with such a
 remote resource (in crmsh notation).



 node 1: node01
 node rnode001:remote \
  attributes attr=value
 primitive rnode001 ocf:pacemaker:remote \
 params server=192.168.168.20 \
 op monitor interval=10 \
 meta target-role=Started
 
 What do you expect to happen when you reference rnode001, in say:

That is not me ;) I just want to be able to use crmsh to assign remote
node operational and utilization (?) attributes and to work with it
after that.

Probably that is not yet set in stone, and David may change that,
allowing e.g. a new 'node_name' parameter to ocf:pacemaker:remote to
override the remote node name guessed from the primitive name.

David, could you comment please?

Best,
Vladislav

 
 crm configure show rnode001
 
 I'm still trying to digest having hostname used to name some
 other element. Wonder what/where else will we have issues for
 this reason.
 
 Cheers,
 
 Dejan
 
 Best,
 Vladislav

 Given that nodes are for the most part referenced by uname
 (instead of by id), do you think that a configuration where
 a primitive element is named the same as a node, the user can
 handle that in an efficient manner? (NB: No experience here with
 ocf:pacemaker:remote :)




 Cheers,

 Dejan



 Best,
 Vladislav
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems


 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] Remote node attributes support in crmsh

2014-10-20 Thread Vladislav Bogdanov
Hi Kristoffer,

do you plan to add support for the recently added remote node attributes
feature to crmsh?

Currently (at least as of 2.1, and I do not see anything relevant in the
git log) crmsh fails to update the CIB if it contains node attributes for
a remote (bare-metal) node, complaining that a duplicate element is found.
But for bare-metal nodes it is natural to have an ocf:pacemaker:remote
resource with a name equal to the remote node uname (I doubt it can be
configured differently).
If I comment out the check for 'obj_id in id_set', then it fails to update
the CIB because it inserts the above primitive definition into the node
section.

Best,
Vladislav
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Remote node attributes support in crmsh

2014-10-20 Thread Vladislav Bogdanov
20.10.2014 18:23, Dejan Muhamedagic wrote:
 Hi Vladislav,

Hi Dejan!

 
 On Mon, Oct 20, 2014 at 09:03:40AM +0300, Vladislav Bogdanov wrote:
 Hi Kristoffer,

 do you plan to add support for recently added remote node attributes
 feature to crmsh?

 Currently (at least as of 2.1, and I do not see anything relevant in the
 git log) crmsh fails to update CIB if it contains node attributes for
 remote (bare-metal) node, complaining that duplicate element is found.
 
 No wonder :) The uname effectively doubles as an element id.
 
 But for bare-metal nodes it is natural to have ocf:pacemaker:remote
 resource with name equal to remote node uname (I doubt it can be
 configured differently).
 
 Is that required?

Didn't look in the code, but it seems so: the :remote resource name is the
only place where pacemaker can obtain that node name.

 
 If I comment check for 'obj_id in id_set', then it fails to update CIB
 because it inserts above primitive definition into the node section.
 
 Could you please show what would the CIB look like with such a
 remote resource (in crmsh notation).
 


node 1: node01
node rnode001:remote \
attributes attr=value
primitive rnode001 ocf:pacemaker:remote \
params server=192.168.168.20 \
op monitor interval=10 \
meta target-role=Started


Best,
Vladislav

 Given that nodes are for the most part referenced by uname
 (instead of by id), do you think that a configuration where
 a primitive element is named the same as a node, the user can
 handle that in an efficient manner? (NB: No experience here with
 ocf:pacemaker:remote :)



 
 Cheers,
 
 Dejan
 
 

 Best,
 Vladislav
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-ha-dev] crmsh 2.0 released, and moving to Github

2014-05-26 Thread Vladislav Bogdanov
26.05.2014 15:01, Kristoffer Grönlund wrote:
 On Tue, 13 May 2014 11:42:16 +0300
 Vladislav Bogdanov bub...@hoster-ok.com wrote:
 
 crmsh 2.0 as released unfortunately does not support rules in
 attribute lists. However, I am working on this specific feature
 right now, and it is almost ready to be merged into the mainline
 development branch. I should have it ready some time this week.
 Once that is in, I will also release crmsh 2.1, so there will be
 packages available that supports this feature.  

 Awesome!
 Thank you for info.
 
 Hi again,
 
 Unfortunately due to some unrelated changes in crmsh I am not quite
 ready to release 2.1 just yet, but support for rules in attribute lists
 has been added to the github master branch now:
 
 https://github.com/crmsh/crmsh
 
 The release of the new version is coming soon, but until then, it
 should be possible to build updated rpms for all platforms from source.
 

Thanks Kristoffer!

Are there any known deficiencies which may affect operation?

Vladislav

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] crmsh 2.0 released, and moving to Github

2014-05-13 Thread Vladislav Bogdanov
13.05.2014 11:30, Kristoffer Grönlund wrote:
 On Tue, 13 May 2014 08:26:27 +0300
 Vladislav Bogdanov bub...@hoster-ok.com wrote:
 
 Hi Kristoffer,

 I may be missing something, but anyway:
 crmsh did not support Using Rules to Control Resource Options
 (http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_using_rules_to_control_resource_options.html)
 in the past.
 Is it supported now, or, if not, do you have plans to implement such
 support?

 
 Hi Vladislav,
 
 crmsh 2.0 as released unfortunately does not support rules in
 attribute lists. However, I am working on this specific feature right
 now, and it is almost ready to be merged into the mainline development
 branch. I should have it ready some time this week. Once that is in, I
 will also release crmsh 2.1, so there will be packages available that
 supports this feature.

Awesome!
Thank you for info.

 
 The syntax will be something like the following:
 
 primitive mySpecialRsc me:Special \
 params 3: rule #uname eq node1 interface=eth1 \
 params 2: rule #uname eq node2 interface=eth2 port= \
 params 1: interface=eth0 port=
 
 Cheers,
 Kristoffer
 
 Best,
 Vladislav

 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/

 
 
 

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] crmsh 2.0 released, and moving to Github

2014-05-12 Thread Vladislav Bogdanov
Hi Kristoffer,

I may be missing something, but anyway:
crmsh did not support Using Rules to Control Resource Options
(http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_using_rules_to_control_resource_options.html)
in the past.
Is it supported now, or, if not, do you have plans to implement such
support?

Best,
Vladislav

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-HA] drbd/pacemaker multiple tgt targets, portblock, and race conditions (long-ish)

2013-11-19 Thread Vladislav Bogdanov
19.11.2013 13:48, Lars Ellenberg wrote:
 On Wed, Nov 13, 2013 at 09:02:47AM +0300, Vladislav Bogdanov wrote:
 13.11.2013 04:46, Jefferson Ogata wrote:
 ...

 In practice i ran into failover problems under load almost immediately.
 Under load, when i would initiate a failover, there was a race
 condition: the iSCSILogicalUnit RA will take down the LUNs one at a
 time, waiting for each connection to terminate, and if the initiators
 reconnect quickly enough, they get pissed off at finding that the target
 still exists but the LUN they were using no longer does, which is often
 the case during this transient takedown process. On the initiator, it
 looks something like this, and it's fatal (here LUN 4 has gone away but
 the target is still alive, maybe working on disconnecting LUN 3):

 Nov  7 07:39:29 s01c kernel: sd 6:0:0:4: [sde] Sense Key : Illegal
 Request [current]
 Nov  7 07:39:29 s01c kernel: sd 6:0:0:4: [sde] Add. Sense: Logical unit
 not supported
 Nov  7 07:39:29 s01c kernel: Buffer I/O error on device sde, logical
 block 16542656

 One solution to this is using the portblock RA to block all initiator

 In addition I force use of multipath on initiators with no_path_retry=queue

 ...


 1. Lack of support for multiple targets using the same tgt account. This
 is a problem because the iSCSITarget RA defines the user and the target
 at the same time. If it allowed multiple targets to use the same user,
 it wouldn't know when it is safe to delete the user in a stop operation,
 because some other target might still be using it.

 To solve this i did two things: first i wrote a new RA that manages a
 
 Did I miss it, or did you post it somewhere?
 Fork on Github and push there, so we can have a look?
 
 tgt user; this is instantiated as a clone so it runs along with the tgtd
 clone. Second i tweaked the iSCSITarget RA so that on start, if
 incoming_username is defined but incoming_password is not, the RA skips
 the account creation step and simply binds the new target to
 incoming_username. On stop, it similarly no longer deletes the account
 if incoming_password is unset. I also had to relax the uniqueness
 constraint on incoming_username in the RA metadata.

 2. Disappearing LUNs during failover cause initiators to blow chunks.
 For this i used portblock, but had to modify it because the TCP Send-Q
 would never drain.

 3. portblock preventing TCP Send-Q from draining, causing tgtd
 connections to hang. I modified portblock to reverse the sense of the
 iptables rules it was adding: instead of blocking traffic from the
 initiator on the INPUT chain, it now blocks traffic from the target on
 the OUTPUT chain with a tcp-reset response. With this setup, as soon as
 portblock goes active, the next packet tgtd attempts to send to a given
 initiator will get a TCP RST response, causing tgtd to hang up the
 connection immediately. This configuration allows the connections to
 terminate promptly under load.

 I'm not totally satisfied with this workaround. It means
 acknowledgements of operations tgtd has actually completed never make it
 back to the initiator. I suspect this could cause problems in some
 scenarios. I don't think it causes a problem the way i'm using it, with
 each LUN as backing store for a distinct VM--when the LUN is back up on
 the other node, the outstanding operations are re-sent by the initiator.
 Maybe with a clustered filesystem this would cause problems; it
 certainly would cause problems if the target device were, for example, a
 tape drive.
 
 Maybe only block new incoming connection attempts?
 

That may cause issues on the initiator side in some circumstances (IIRC):
* connection is established
* pacemaker fires target move
* target is destroyed, connection breaks (TCP RST is sent to initiator)
* initiator connects again
* target is not available on iSCSI level (but portals answer either on
old or on new node) or portals are not available
* initiator *returns an error* to an upper layer - this one is important
* target is configured on other node then

I was hit by this, but that was several years ago, so I may be missing
some details.

My experience with IET and LIO shows it is better (safer) to block all
iSCSI traffic to the target's portals, in both directions.
* connection is established
* pacemaker fires target move
* both directions are blocked (DROP) on both target nodes
* target is destroyed, connection stays established on initiator side,
just TCP packets timeout
* target is configured on other node (VIPs are moved too)
* firewall rules are removed
* initiator (re)sends request
* target sends RST (?) back - it doesn't have that connection
* initiator reconnects and continues to use target
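
A sketch of those rules (the portal VIP and the default iSCSI port 3260
are illustrative):

iptables -A INPUT  -d 192.168.168.10 -p tcp --dport 3260 -j DROP
iptables -A OUTPUT -s 192.168.168.10 -p tcp --sport 3260 -j DROP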


___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] iSCSI corruption during interconnect failure with pacemaker+tgt+drbd+protocol C

2013-11-12 Thread Vladislav Bogdanov
13.11.2013 06:10, Jefferson Ogata wrote:
 Here's a problem i don't understand, and i'd like a solution to if
 possible, or at least i'd like to understand why it's a problem, because
 i'm clearly not getting something.
 
 I have an iSCSI target cluster using CentOS 6.4 with stock
 pacemaker/CMAN/corosync and tgt, and DRBD 8.4 which i've built from source.
 
 Both DRBD and cluster comms use a dedicated crossover link.
 
 The target storage is battery-backed RAID.
 
 DRBD resources all use protocol C.
 
 stonith is configured and working.
 
 tgtd write cache is disabled using mode_page in additional_params. This
 is correctly reported using sdparm --get WCE on initiators.
 
 Here's the question: if i am writing from an iSCSI initiator, and i take
 down the crossover link between the nodes of my cluster, i end up with
 corrupt data on the target disk.
 
 I know this isn't the formal way to test pacemaker failover.
 Everything's fine if i fence a node or do a manual migration or
 shutdown. But i don't understand why taking the crossover down results
 in corrupted write operations.
 
 In greater detail, assuming the initiator sends a write request for some
 block, here's the normal sequence as i understand it:
 
 - tgtd receives it and queues it straight for the device backing the LUN
 (write cache is disabled).
 - drbd receives it, commits it to disk, sends it to the other node, and
 waits for an acknowledgement (protocol C).
 - the remote node receives it, commits it to disk, and sends an
 acknowledgement.
 - the initial node receives the drbd acknowledgement, and acknowledges
 the write to tgtd.
 - tgtd acknowledges the write to the initiator.
 
 Now, suppose an initiator is writing when i take the crossover link
 down, and pacemaker reacts to the loss in comms by fencing the node with
 the currently active target. It then brings up the target on the
 surviving, formerly inactive, node. This results in a drbd split brain,
 since some writes have been queued on the fenced node but never made it
 to the surviving node, and must be retransmitted by the initiator; once
 the surviving node becomes active it starts committing these writes to
 its copy of the mirror. I'm fine with a split brain; i can resolve it by
 discarding outstanding data on the fenced node.
 
 But in practice, the actual written data is lost, and i don't understand
 why. AFAICS, none of the outstanding writes should have been
 acknowledged by tgtd on the fenced node, so when the surviving node
 becomes active, the initiator should simply re-send all of them. But
 this isn't what happens; instead most of the outstanding writes are
 lost. No i/o error is reported on the initiator; stuff just vanishes.
 
 I'm writing directly to a block device for these tests, so the lost data
 isn't the result of filesystem corruption; it simply never gets written
 to the target disk on the survivor.
 
 What am i missing?

Do you have handlers (fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";) configured in
drbd.conf?
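
For reference, that configuration looks roughly like this in drbd.conf;
the fencing policy line is the usual companion and is shown here as an
assumption:

handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
}
disk {
    fencing resource-only;
}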

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] drbd/pacemaker multiple tgt targets, portblock, and race conditions (long-ish)

2013-11-12 Thread Vladislav Bogdanov
13.11.2013 04:46, Jefferson Ogata wrote:
...
 
 In practice i ran into failover problems under load almost immediately.
 Under load, when i would initiate a failover, there was a race
 condition: the iSCSILogicalUnit RA will take down the LUNs one at a
 time, waiting for each connection to terminate, and if the initiators
 reconnect quickly enough, they get pissed off at finding that the target
 still exists but the LUN they were using no longer does, which is often
 the case during this transient takedown process. On the initiator, it
 looks something like this, and it's fatal (here LUN 4 has gone away but
 the target is still alive, maybe working on disconnecting LUN 3):
 
 Nov  7 07:39:29 s01c kernel: sd 6:0:0:4: [sde] Sense Key : Illegal
 Request [current]
 Nov  7 07:39:29 s01c kernel: sd 6:0:0:4: [sde] Add. Sense: Logical unit
 not supported
 Nov  7 07:39:29 s01c kernel: Buffer I/O error on device sde, logical
 block 16542656
 
 One solution to this is using the portblock RA to block all initiator

In addition I force use of multipath on initiators with no_path_retry=queue

...

 
 1. Lack of support for multiple targets using the same tgt account. This
 is a problem because the iSCSITarget RA defines the user and the target
 at the same time. If it allowed multiple targets to use the same user,
 it wouldn't know when it is safe to delete the user in a stop operation,
 because some other target might still be using it.
 
 To solve this i did two things: first i wrote a new RA that manages a
 tgt user; this is instantiated as a clone so it runs along with the tgtd
 clone. Second i tweaked the iSCSITarget RA so that on start, if
 incoming_username is defined but incoming_password is not, the RA skips
 the account creation step and simply binds the new target to
 incoming_username. On stop, it similarly no longer deletes the account
 if incoming_password is unset. I also had to relax the uniqueness
 constraint on incoming_username in the RA metadata.
 
 2. Disappearing LUNs during failover cause initiators to blow chunks.
 For this i used portblock, but had to modify it because the TCP Send-Q
 would never drain.
 
 3. portblock preventing TCP Send-Q from draining, causing tgtd
 connections to hang. I modified portblock to reverse the sense of the
 iptables rules it was adding: instead of blocking traffic from the
 initiator on the INPUT chain, it now blocks traffic from the target on
 the OUTPUT chain with a tcp-reset response. With this setup, as soon as
 portblock goes active, the next packet tgtd attempts to send to a given
 initiator will get a TCP RST response, causing tgtd to hang up the
 connection immediately. This configuration allows the connections to
 terminate promptly under load.
 
 I'm not totally satisfied with this workaround. It means
 acknowledgements of operations tgtd has actually completed never make it
 back to the initiator. I suspect this could cause problems in some
 scenarios. I don't think it causes a problem the way i'm using it, with
 each LUN as backing store for a distinct VM--when the LUN is back up on
 the other node, the outstanding operations are re-sent by the initiator.
 Maybe with a clustered filesystem this would cause problems; it
 certainly would cause problems if the target device were, for example, a
 tape drive.
 
 4. "Insufficient privileges" faults in the portblock RA. This was
 another race condition that occurred because i was using multiple
 targets, meaning that without a mutex, multiple portblock invocations
 would be running in parallel during a failover. If you try to run
 iptables while another iptables is running, you get "Resource not
 available" and this was coming back to pacemaker as "insufficient
 privileges". This is simply a bug in the portblock RA; it should have a
 mutex to prevent parallel iptables invocations. I fixed this by adding
 an ocf_release_lock_on_exit at the top, and adding an ocf_take_lock for
 start, stop, monitor, and status operations.
 
 I'm not sure why more people haven't run into these problems before. I
 hope it's not that i'm doing things wrong, but rather that few others
 have earnestly tried to build anything quite like this setup. If
 anyone out there has set up a similar cluster and *not* had these
 problems, i'd like to know about it. Meanwhile, if others *have* had
 these problems, i'd also like to know, especially if they've found
 alternate solutions.

Can't say about 1; I use IET, and it doesn't seem to have that limitation.
2 - I use an alternative home-brew ms RA which blocks (DROP) both input and
output for a specified VIP on demote (targets are configured to be bound
to those VIPs). I also export one big LUN per target and then set up a clvm
VG on top of it (all initiators are in the same, separate cluster).
3 - can't say either; IET is probably not affected.
4 - That is true, iptables doesn't have atomic rules management, so you
definitely need a mutex or a dispatcher like firewalld (didn't try it though).
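
A sketch of the mutex fix described in item 4 above, using the stock
ocf-shellfuncs helpers (the lock file name is illustrative):

. ${OCF_FUNCTIONS_DIR:-${OCF_ROOT}/lib/heartbeat}/ocf-shellfuncs

# serialize iptables invocations across parallel portblock operations
ocf_release_lock_on_exit "${HA_RSCTMP}/portblock.lock"
case "$1" in
start|stop|monitor|status)
    ocf_take_lock "${HA_RSCTMP}/portblock.lock"
    ;;
esac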


Re: [Linux-HA] Xen RA and rebooting

2013-09-17 Thread Vladislav Bogdanov
17.09.2013 20:51, Tom Parker wrote:
 
 On 09/17/2013 01:13 AM, Vladislav Bogdanov wrote:
 14.09.2013 07:28, Tom Parker wrote:
 Hello All

 Does anyone know of a good way to prevent pacemaker from declaring a vm
 dead if it's rebooted from inside the vm.  It seems to be detecting the
 vm as stopped for the brief moment between shutting down and starting
 up.  Often this causes the cluster to have two copies of the same vm if
 the locks are not set properly (which I have found to be unreliable), one
 that is managed and one that is abandoned.

 If anyone has any suggestions or parameters that I should be tweaking
 that would be appreciated.
 I use the following in libvirt VM definitions to prevent this:
   <on_poweroff>destroy</on_poweroff>
   <on_reboot>destroy</on_reboot>
   <on_crash>destroy</on_crash>

 Vladislav
 Does this not show as a lot of failed operations?  I guess they will
 clean themselves up after the failure expires.

Exactly. And this is much better than data corruption.

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Xen RA and rebooting

2013-09-16 Thread Vladislav Bogdanov
14.09.2013 07:28, Tom Parker wrote:
 Hello All
 
 Does anyone know of a good way to prevent pacemaker from declaring a vm
 dead if it's rebooted from inside the vm.  It seems to be detecting the
 vm as stopped for the brief moment between shutting down and starting
 up.  Often this causes the cluster to have two copies of the same vm if
 the locks are not set properly (which I have found to be unreliable), one
 that is managed and one that is abandoned.
 
 If anyone has any suggestions or parameters that I should be tweaking
 that would be appreciated.

I use the following in libvirt VM definitions to prevent this:
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>destroy</on_reboot>
  <on_crash>destroy</on_crash>

Vladislav

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Using rules for resource options control question

2013-09-13 Thread Vladislav Bogdanov
12.09.2013 11:57, Dejan Muhamedagic wrote:
 Hi Vladislav,
 
 On Wed, Sep 11, 2013 at 02:06:12PM +0300, Vladislav Bogdanov wrote:
 Hi Dejan, all,

 Didn't find the way to configure rule-controlled resource options
 (http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_using_rules_to_control_resource_options.html)
 in crmsh manual.

 Is it implemented and, if yes, how to use it?
 
 No, crmsh supports rules only in location constraints. I guess
 that it shouldn't be such a huge undertaking to support rules for
 attributes if we only knew how to represent them.

Hi Dejan,

Do you mean something like the following (multiple definitions are
allowed if all of them have a score, error otherwise; only one
definition may have an empty expression)?
===
params [score: [expression]] \
   parameters themselves
===
That is, if it is technically possible to always correctly detect the
end of the expression and the beginning of the parameters (I think it
is). Expression parsing could be reused from location constraints.
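
Purely as an illustration of this proposed (not implemented) syntax,
with made-up scores and an expression in the location-constraint style:

===
primitive www-ip ocf:heartbeat:IPaddr2 \
    params 100: #uname eq node-a ip=192.168.1.150 \
    params 10: ip=192.168.1.151
===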

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] Using rules for resource options control question

2013-09-11 Thread Vladislav Bogdanov
Hi Dejan, all,

I didn't find a way to configure rule-controlled resource options
(http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_using_rules_to_control_resource_options.html)
in the crmsh manual.

Is it implemented and, if yes, how do I use it?

Thanks,
Vladislav
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Max number of resources under Pacemaker ?

2013-09-04 Thread Vladislav Bogdanov
04.09.2013 07:16, Andrew Beekhof wrote:
 
 On 03/09/2013, at 9:20 PM, Moullé Alain alain.mou...@bull.net
 wrote:
 
 Hello,
 
 A simple question: is there a maximum number of resources (let's
 say simple primitives) that Pacemaker can support, first at
 configuration of resources via crm, and of course after
 configuration when Pacemaker has to monitor all the primitives?
 
 Simple answer: it depends
 
 (more precisely, could we envisage around 500 or 600 primitives, or
 is it completely mad ? ;-) )
 
 (I know it is dependant on  node power, CPU, mem, etc., but I'm
 speaking here only of eventual Pacemaker limitations)
 
 There is no inherent limit, the policy engine can cope with many
 thousands.
 
 The CIB is less able to cope - for which batch-limit is useful (to
 throttle the number of operation updates being thrown at the CIB
 which limits its CPU usage). The other limit is local and cluster
 messaging sizes - once the compressed cib gets too big for either or
 both transports you can no longer even run 'cibadmin -Q'.
 
 For IPC, the limit is tuneable via the environment. For corosync, it's
 high (1 MB) but (I think) only tuneable at compile time.

Are there any possibilities/plans to implement partial messages?
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Quick 'death match cycle' question.

2013-09-03 Thread Vladislav Bogdanov
03.09.2013 07:04, Digimer wrote:
...
 To solve problem 1, you can set a delay against one of the nodes. Say
 you set the fence primitive for node 01 to have 'delay=15'. When node
 1 goes to fence node 2, it starts immediately. When node 2 starts to
 fence node 1, it sees the 15 second delay and pauses. Node 1 will power
 off node 2 long before node 2 finishes the pause. You can further help
 this problem by disabling acpid on the nodes. Without it, the power-off
 signal from the BMC will be nearly instant, shortening up the window
 where both nodes can initiate a fence.

Does anybody know for sure how and *why* this works? I mean, why does
disabling the userspace ACPI event reader (which reads just what the
kernel sends after hardware events) affect how the hardware behaves?

 
 To solve problem 2, simply disable corosync/pacemaker from starting on
 boot. This way, the fenced node will be (hopefully) back up and running,
 so you can ssh into it and look at what happened. It won't try to rejoin
 the cluster though, so no risk of a fence loop.

An enhancement to this would be re-enabling corosync/pacemaker during
the clean shutdown and disabling them again after boot.
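
With SysV tooling, a rough sketch (service names as on RHEL-like
systems; where exactly to hook the two halves is the open question):

# as part of a *clean* shutdown, re-arm cluster autostart:
chkconfig corosync on; chkconfig pacemaker on
# once the node has booted (and can be inspected), disarm it again:
chkconfig corosync off; chkconfig pacemaker off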

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Quick 'death match cycle' question.

2013-09-03 Thread Vladislav Bogdanov
03.09.2013 21:45, Digimer wrote:
 On 03/09/13 14:14, Vladislav Bogdanov wrote:
 03.09.2013 07:04, Digimer wrote:
 ...
 To solve problem 1, you can set a delay against one of the nodes. Say
 you set the fence primitive for node 01 to have 'delay=15'. When node
 1 goes to fence node 2, it starts immediately. When node 2 starts to
 fence node 1, it sees the 15 second delay and pauses. Node 1 will power
 off node 2 long before node 2 finishes the pause. You can further help
 this problem by disabling acpid on the nodes. Without it, the power-off
 signal from the BMC will be nearly instant, shortening up the window
 where both nodes can initiate a fence.

 Does anybody know for sure how and *why* does it work? I mean why
 disabling userspace ACPI event reader (which reads just what kernel
 sends after hardware events) affects how hardware behaves?
 
 Disabling acpid causes, in my experience, the node to instantly power
 down when it receives a power-button event. How/why this happens is
 probably buried in the kernel source and/or ACPI definitions.

This assumes some kind of back-events, which are not part of ACPI,
iirc. And the kernel just translates forward ACPI events (bits in a hw
port???) to userspace.

Interesting; I wonder how they do it...

 
 To solve problem 2, simply disable corosync/pacemaker from starting on
 boot. This way, the fenced node will be (hopefully) back up and running,
 so you can ssh into it and look at what happened. It won't try to rejoin
 the cluster though, so no risk of a fence loop.

 Enhancement to this would be enabling corosync/pacemaker back during the
 clean shutdown and disabling it after boot.
 
 That would be a good idea, actually. I like that.
 

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Quick 'death match cycle' question.

2013-09-03 Thread Vladislav Bogdanov
03.09.2013 21:36, Lars Marowsky-Bree wrote:
 On 2013-09-03T21:14:02, Vladislav Bogdanov bub...@hoster-ok.com wrote:
 
 To solve problem 2, simply disable corosync/pacemaker from starting on
 boot. This way, the fenced node will be (hopefully) back up and running,
 so you can ssh into it and look at what happened. It won't try to rejoin
 the cluster though, so no risk of a fence loop.
 Enhancement to this would be enabling corosync/pacemaker back during the
 clean shutdown and disabling it after boot.
 
 There's something in sbd which does this. See
 https://github.com/l-mb/sbd/blob/master/man/sbd.8.pod and the -S option.

Yes, but I thought it was a no-go with just DRBD-replicated disks (the
usual case for 2-node clusters).

 I'm contemplating how to do this in a generic fashion.

It is quite straightforward with SysVinit and its upstart emulation,
but could be tricky with native upstart and systemd. Need to
investigate...

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-ha-dev] crmsh error : cib-bootstrap-options already exist

2013-08-27 Thread Vladislav Bogdanov
23.08.2013 16:48, Kristoffer Grönlund wrote:
 Hi,
 
 On Fri, 23 Aug 2013 16:33:28 +0300
 Vladislav Bogdanov bub...@hoster-ok.com wrote:
 
 No-no, it was before that fix too, at least with 19a3f1e5833c.
 Should I still try?

 
 Ah, in that case, it has not been fixed.
 
 No need to try. I will investigate further.

I verified that crm_diff produces a correct XML diff if I change just
one property, so the problem should really be in crmsh.

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] crmsh error : cib-bootstrap-options already exist

2013-08-27 Thread Vladislav Bogdanov
27.08.2013 19:11, Dejan Muhamedagic wrote:
 Hi,
 
 On Tue, Aug 27, 2013 at 12:06:40PM +0300, Vladislav Bogdanov wrote:
 23.08.2013 16:48, Kristoffer Grönlund wrote:
 Hi,

 On Fri, 23 Aug 2013 16:33:28 +0300
 Vladislav Bogdanov bub...@hoster-ok.com wrote:

 No-no, it was before that fix too, at least with 19a3f1e5833c.
 Should I still try?


 Ah, in that case, it has not been fixed.

 No need to try. I will investigate further.

 I verified that crm_diff produces correct xml diff if I change just one
 property, so problem should really be in crmsh.
 
 Yes, just found where it is. The fix will be pushed tomorrow.

Yeees!
Thank you for the info.
___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] crmsh error : cib-bootstrap-options already exist

2013-08-23 Thread Vladislav Bogdanov
22.08.2013 13:57, Kristoffer Grönlund wrote:
 Hi Takatoshi-san,
 
 On Wed, 21 Aug 2013 13:56:34 +0900
 Takatoshi MATSUO matsuo@gmail.com wrote:
 
 Hi Kristoffer

 I reproduced the error with the latest changeset (b5ffd99e).
 
 Thank you, with your description I was able to reproduce and create a
 test case for the problem. I have pushed a workaround for the issue in
 the crm shell which stops the crm shell from adding comments to the
 CIB. (changeset e35236439b8e)

Kristoffer, Dejan, could you please also look at why I lose the whole
rsc_defaults $id="rsc_options"
section when I do 'crm configure edit' and edit one of the
properties under $id="cib-bootstrap-options"?

pacemaker is 1.1.10, crmsh is latest tip.

Relevant log lines from the cib process are:
Aug 23 08:44:23 mgmt01 crm_verify[5891]:   notice: crm_log_args: Invoked: crm_verify -V -p
Aug 23 08:44:24 mgmt01 cibadmin[5897]:   notice: crm_log_args: Invoked: cibadmin -p -P
Aug 23 08:44:25 mgmt01 crmd[10180]:   notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Aug 23 08:44:25 mgmt01 cib[10175]:   notice: log_cib_diff: cib:diff: Local-only Change: 0.772.1
Aug 23 08:44:25 mgmt01 cib[10175]:   notice: cib:diff: -- <nvpair value="100" id="cib-bootstrap-options-default-resource-stickiness"/>
Aug 23 08:44:25 mgmt01 cib[10175]:   notice: cib:diff: --   <meta_attributes id="rsc_options">
Aug 23 08:44:25 mgmt01 cib[10175]:   notice: cib:diff: -- <nvpair name="allow-migrate" value="false" id="rsc_options-allow-migrate"/>
Aug 23 08:44:25 mgmt01 cib[10175]:   notice: cib:diff: -- <nvpair name="failure-timeout" value="10m" id="rsc_options-failure-timeout"/>
Aug 23 08:44:25 mgmt01 cib[10175]:   notice: cib:diff: -- <nvpair name="migration-threshold" value="INFINITY" id="rsc_options-migration-threshold"/>
Aug 23 08:44:25 mgmt01 cib[10175]:   notice: cib:diff: -- <nvpair name="multiple-active" value="stop_start" id="rsc_options-multiple-active"/>
Aug 23 08:44:25 mgmt01 cib[10175]:   notice: cib:diff: -- <nvpair name="priority" value="0" id="rsc_options-priority"/>
Aug 23 08:44:25 mgmt01 cib[10175]:   notice: cib:diff: --   </meta_attributes>
Aug 23 08:44:25 mgmt01 cib[10175]:   notice: cib:diff: ++ <nvpair name="default-resource-stickiness" value="10" id="cib-bootstrap-options-default-resource-stickiness"/>
Aug 23 08:44:28 mgmt01 crmd[10180]:   notice: run_graph: Transition 84 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-28.bz2): Complete
Aug 23 08:44:28 mgmt01 crmd[10180]:   notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Aug 23 08:44:28 mgmt01 pengine[10179]:   notice: process_pe_message: Calculated Transition 84: /var/lib/pacemaker/pengine/pe-input-28.bz2

What I edited is default-resource-stickiness, but the whole
<meta_attributes id="rsc_options"> section is gone too.

Vladislav

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-ha-dev] crmsh error : cib-bootstrap-options already exist

2013-08-23 Thread Vladislav Bogdanov
23.08.2013 16:10, Kristoffer Grönlund wrote:
 Hi Vladislav,
 
 On Fri, 23 Aug 2013 11:50:54 +0300
 Vladislav Bogdanov bub...@hoster-ok.com wrote:
 
 Kristoffer, Dejan, could you please also look at why I lose the whole
 rsc_defaults $id="rsc_options"
 section when I do 'crm configure edit' and edit one of the
 properties under $id="cib-bootstrap-options"?

 
 Hm, that is not good. I suspect that this may be a regression that I
 caused when creating the workaround for the previously reported error.

No-no, it was before that fix too, at least with 19a3f1e5833c.
Should I still try?

 I have narrowed the fix to be more precise in the crmsh repository
 (commit 8a539c209eb0), it would be great if you could try using that
 version of crmsh instead and see if that solves your issue.
 
 Thank you,
 

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/


Re: [Linux-HA] Storing arbitrary metadata in the CIB

2013-08-22 Thread Vladislav Bogdanov
22.08.2013 15:08, Ferenc Wagner wrote:
 Hi,
 
 Our setup uses some cluster wide pieces of meta information.  Think
 access control lists for resource instances used by some utilities or
 some common configuration data used by the resource agents.  Currently
 this info is stored in local files on the nodes or replicated in each
 primitive as parameters.  I find this suboptimal, as keeping them in
 sync is a hassle.  It is possible to store such stuff in the fake
 parameter of unmanaged Dummy resources, but that clutters the status
 output.  Can somebody offer some advice in this direction?  Or is this
 idea a pure heresy?
 

You may use meta attributes of any primitive for that. Although crmsh
does not like that very much, it can be switched to a relaxed mode.
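
For example, a sketch of such (ab)use; the attribute name is made up,
and pacemaker simply ignores meta attributes it does not recognize:

===
primitive p-web ocf:heartbeat:apache \
    params configfile="/etc/httpd/conf/httpd.conf" \
    meta my-acl="alice:rw bob:ro"
===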

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Adding node in advance

2013-07-15 Thread Vladislav Bogdanov
15.07.2013 12:36, Dejan Muhamedagic wrote:
 Hi Vladislav,
 
 On Fri, Jul 12, 2013 at 01:48:34PM +0300, Vladislav Bogdanov wrote:
 Hi,

 I wanted to add new node into CIB in advance, before it is powered on
 (to power it on in a standby mode while cl#5169 is not implemented).

 So, I did
 ==
 [root@vd01-a tmp]# cat u
 node $id=4 vd01-d \
 attributes standby=on virtualization=true
 [root@vd01-a tmp]# crm configure load update u
 ERROR: 4: invalid object id
 ==
 
 According to the w3c recommendation, an id cannot start with a
 digit. However, I missed that node ids are actually defined as
 text. The test for node ids is now relaxed.
 
 Exactly the same syntax is accepted for already-known node.
 
 The id test happens only for cli snippets. It is assumed that the
 XML has already been validated.
 

Thank you, will test.

Vladislav


___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] crm resource restart is broken (d4de3af6dd33)

2013-07-12 Thread Vladislav Bogdanov
Hi Dejan,

It seems like resource restart does not work any longer.

# crm resource restart test01-vm
INFO: ordering test01-vm to stop
Traceback (most recent call last):
  File "/usr/sbin/crm", line 44, in <module>
    main.run()
  File "/usr/lib64/python2.6/site-packages/crmsh/main.py", line 442, in run
  File "/usr/lib64/python2.6/site-packages/crmsh/main.py", line 349, in do_work
  File "/usr/lib64/python2.6/site-packages/crmsh/main.py", line 150, in parse_line
  File "/usr/lib64/python2.6/site-packages/crmsh/main.py", line 149, in <lambda>
  File "/usr/lib64/python2.6/site-packages/crmsh/ui.py", line 894, in restart
  File "/usr/lib64/python2.6/site-packages/crmsh/utils.py", line 429, in wait4dc
  File "/usr/lib64/python2.6/site-packages/crmsh/utils.py", line 544, in crm_msec
  File "/usr/lib64/python2.6/re.py", line 137, in match
TypeError: expected string or buffer


Vladislav
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] crm resource restart is broken (d4de3af6dd33)

2013-07-12 Thread Vladislav Bogdanov
12.07.2013 12:06, Vladislav Bogdanov wrote:
 Hi Dejan,
 
 It seems like resource restart does not work any longer.

Ah, this seems to be fixed by bb39cce17f20. Sorry for the noise.

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] crm node delete

2013-07-12 Thread Vladislav Bogdanov
01.07.2013 17:29, Vladislav Bogdanov wrote:
 Hi,
 
 I'm trying to look if it is now safe to delete non-running nodes
 (corosync 2.3, pacemaker HEAD, crmsh tip).
 
 # crm node delete v02-d
 WARNING: 2: crm_node bad format: 7 v02-c
 WARNING: 2: crm_node bad format: 8 v02-d
 WARNING: 2: crm_node bad format: 5 v02-a
 WARNING: 2: crm_node bad format: 6 v02-b
 INFO: 2: node v02-d not found by crm_node
 INFO: 2: node v02-d deleted
 #
 
 So, I expect that crmsh still doesn't follow latest changes to 'crm_node
 -l'. Although node seems to be deleted correctly.
 
 For reference, output of crm_node -l is:
 7 v02-c
 8 v02-d
 5 v02-a
 6 v02-b
 

With the latest merge of Andrew's public and private trees and crmsh
tip, everything works as expected.
The only (minor but confusing) issue is:

[root@vd01-a ~]# crm_node -l
3 vd01-c
4 vd01-d
1 vd01-a
2 vd01-b
[root@vd01-a ~]# crm_node -p
vd01-c vd01-a vd01-b
[root@vd01-a ~]# crm node delete vd01-d
WARNING: crm_node --force -R vd01-d failed, rc=1

Looks like a missing crm_exit(pcmk_ok) for -R in try_corosync().

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] Adding node in advance

2013-07-12 Thread Vladislav Bogdanov
Hi,

I wanted to add a new node into the CIB in advance, before it is
powered on (to power it on in standby mode while cl#5169 is not
implemented).

So, I did
==
[root@vd01-a tmp]# cat u
node $id=4 vd01-d \
attributes standby=on virtualization=true
[root@vd01-a tmp]# crm configure load update u
ERROR: 4: invalid object id
==

Exactly the same syntax is accepted for an already-known node.

This is corosync 2.3.1 with nodelist/udpu, pacemaker master, and crmsh tip.

Vladislav
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] crm node delete

2013-07-10 Thread Vladislav Bogdanov
10.07.2013 18:14, Dejan Muhamedagic wrote:
...
 [root@v02-b ~]# crm node delete v02-d
 ERROR: according to crm_node, node v02-d is still active
 
 You can now:
 
 # crm --force node delete ...
 

Thanks,

Vladislav

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] crm node delete

2013-07-09 Thread Vladislav Bogdanov
03.07.2013 19:31, Dejan Muhamedagic wrote:
 On Tue, Jul 02, 2013 at 07:53:52AM +0300, Vladislav Bogdanov wrote:
 01.07.2013 18:29, Dejan Muhamedagic wrote:
 Hi,

 On Mon, Jul 01, 2013 at 05:29:31PM +0300, Vladislav Bogdanov wrote:
 Hi,

 I'm trying to look if it is now safe to delete non-running nodes
 (corosync 2.3, pacemaker HEAD, crmsh tip).

 # crm node delete v02-d
 WARNING: 2: crm_node bad format: 7 v02-c
 WARNING: 2: crm_node bad format: 8 v02-d
 WARNING: 2: crm_node bad format: 5 v02-a
 WARNING: 2: crm_node bad format: 6 v02-b
 INFO: 2: node v02-d not found by crm_node
 INFO: 2: node v02-d deleted
 #

 So, I expect that crmsh still doesn't follow latest changes to 'crm_node
 -l'. Although node seems to be deleted correctly.

 For reference, output of crm_node -l is:
 7 v02-c
 8 v02-d
 5 v02-a
 6 v02-b

 This time the node state was empty. Or it's missing altogether.
 I'm not sure how's that supposed to be interpreted. We test the
 output of crm_node -l just to make sure that the node is not
 online. Perhaps we need to use some other command.

 Likely it shows everything from a corosync nodelist.
 After I deleted the node from everywhere except corosync, list is still
 the same.
 
 OK. This patch changes the interface to crm_node to use the
 list partition option (-p). Could you please test it?

Nope. Not enough. Even worse than before. I tested today's tip, as it
includes that patch with the merge of Andrew's public and private master heads.
=
[root@v02-b ~]# crm node show
v02-a(5): normal
standby: off
virtualization: true
$id: nodes-5
v02-b(6): normal
standby: off
virtualization: true
v02-c(7): normal
standby: off
virtualization: true
v02-d(8): normal(offline)
standby: off
virtualization: true
[root@v02-b ~]# crm node delete v02-d
ERROR: according to crm_node, node v02-d is still active
[root@v02-b ~]# crm_node -p
v02-c v02-d v02-a v02-b
[root@v02-b ~]# crm_node -l
7 v02-c
8 v02-d
5 v02-a
6 v02-b
[root@v02-b ~]#
=

That is after I stopped the node, lowered votequorum expected_votes
(with corosync-quorumtool) and deleted v02-d from the cmap nodelist.
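
Roughly, the steps described above were (the nodelist index 3 for
v02-d is an assumption; it need not match the nodeid):

corosync-quorumtool -e 3               # lower expected_votes
corosync-cmapctl -D nodelist.node.3.   # drop v02-d's nodelist keys by prefix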

corosync-cmapctl still shows runtime info about the deleted node as well:
runtime.totem.pg.mrp.srp.members.8.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.8.ip (str) = r(0) ip(10.5.4.55)
runtime.totem.pg.mrp.srp.members.8.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.8.status (str) = left
And it is not allowed to delete those keys.

crm_node -R did the job (nothing left in the CIB), but v02-d still
appears in its output for both -p and -l.

Andrew, I copy you directly because the above is probably for you.
Shouldn't crm_node somehow show that a stopped node has been deleted
from the corosync nodelist?

Also, for some reason one node (v02-c) still had expected_votes set to
4, while the other two remaining nodes had it set to the correct 3.
That is of course another story and needs additional investigation.
Maybe I just missed something.


Best,
Vladislav

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] crm node delete

2013-07-09 Thread Vladislav Bogdanov
10.07.2013 03:39, Andrew Beekhof wrote:
 
 On 10/07/2013, at 1:51 AM, Vladislav Bogdanov bub...@hoster-ok.com wrote:
 
 03.07.2013 19:31, Dejan Muhamedagic wrote:
 On Tue, Jul 02, 2013 at 07:53:52AM +0300, Vladislav Bogdanov wrote:
 01.07.2013 18:29, Dejan Muhamedagic wrote:
 Hi,

 On Mon, Jul 01, 2013 at 05:29:31PM +0300, Vladislav Bogdanov wrote:
 Hi,

 I'm trying to look if it is now safe to delete non-running nodes
 (corosync 2.3, pacemaker HEAD, crmsh tip).

 # crm node delete v02-d
 WARNING: 2: crm_node bad format: 7 v02-c
 WARNING: 2: crm_node bad format: 8 v02-d
 WARNING: 2: crm_node bad format: 5 v02-a
 WARNING: 2: crm_node bad format: 6 v02-b
 INFO: 2: node v02-d not found by crm_node
 INFO: 2: node v02-d deleted
 #

 So, I expect that crmsh still doesn't follow latest changes to 'crm_node
 -l'. Although node seems to be deleted correctly.

 For reference, output of crm_node -l is:
 7 v02-c
 8 v02-d
 5 v02-a
 6 v02-b

 This time the node state was empty. Or it's missing altogether.
 I'm not sure how's that supposed to be interpreted. We test the
 output of crm_node -l just to make sure that the node is not
 online. Perhaps we need to use some other command.

 Likely it shows everything from a corosync nodelist.
 After I deleted the node from everywhere except corosync, list is still
 the same.

 OK. This patch changes the interface to crm_node to use the
 list partition option (-p). Could you please test it?

 Nope. Not enough. Even worse than before. I tested todays tip as it
 includes that patch with merge of Andrew's public and private master heads.
 =
 [root@v02-b ~]# crm node show
 v02-a(5): normal
standby: off
virtualization: true
$id: nodes-5
 v02-b(6): normal
standby: off
virtualization: true
 v02-c(7): normal
standby: off
virtualization: true
 v02-d(8): normal(offline)
standby: off
virtualization: true
 [root@v02-b ~]# crm node delete v02-d
 ERROR: according to crm_node, node v02-d is still active
 [root@v02-b ~]# crm_node -p
 v02-c v02-d v02-a v02-b
 [root@v02-b ~]# crm_node -l
 7 v02-c
 8 v02-d
 5 v02-a
 6 v02-b
 [root@v02-b ~]#
 =

 That is after I stopped node, lowered votequorum expected_votes (with
 corosync-quorumtool) and deleted v02-d from a cmap nodelist.

 corosync-cmapctl still shows runtime info about deleted node as well:
 runtime.totem.pg.mrp.srp.members.8.config_version (u64) = 0
 runtime.totem.pg.mrp.srp.members.8.ip (str) = r(0) ip(10.5.4.55)
 runtime.totem.pg.mrp.srp.members.8.join_count (u32) = 1
 runtime.totem.pg.mrp.srp.members.8.status (str) = left
 And it is not allowed to delete that keys.

 crm_node -R did the job (nothing left in the CIB), but, v02-d still
 appears in its output for both -p and -l.

 Andrew, I copy you directly because above is probably to you. Shouldn't
 crm_node some-how show that stopped node is deleted from a corosync
 nodelist?
 
 Which stack is this?

corosync 2.3 with nodelist and udpu.

 

 Also, for some reason one node (v02-c) still had expected_votes set to
 4, while other two remaining had it set to correct 3. That is of course
 another story and need additional investigations. May be I just missed
 something.


 Best,
 Vladislav

 

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] crm node delete

2013-07-09 Thread Vladislav Bogdanov
10.07.2013 07:05, Andrew Beekhof wrote:
 
 On 10/07/2013, at 2:04 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote:
 
 10.07.2013 03:39, Andrew Beekhof wrote:

 On 10/07/2013, at 1:51 AM, Vladislav Bogdanov bub...@hoster-ok.com wrote:

 03.07.2013 19:31, Dejan Muhamedagic wrote:
 On Tue, Jul 02, 2013 at 07:53:52AM +0300, Vladislav Bogdanov wrote:
 01.07.2013 18:29, Dejan Muhamedagic wrote:
 Hi,

 On Mon, Jul 01, 2013 at 05:29:31PM +0300, Vladislav Bogdanov wrote:
 Hi,

 I'm trying to look if it is now safe to delete non-running nodes
 (corosync 2.3, pacemaker HEAD, crmsh tip).

 # crm node delete v02-d
 WARNING: 2: crm_node bad format: 7 v02-c
 WARNING: 2: crm_node bad format: 8 v02-d
 WARNING: 2: crm_node bad format: 5 v02-a
 WARNING: 2: crm_node bad format: 6 v02-b
 INFO: 2: node v02-d not found by crm_node
 INFO: 2: node v02-d deleted
 #

 So, I expect that crmsh still doesn't follow latest changes to 
 'crm_node
 -l'. Although node seems to be deleted correctly.

 For reference, output of crm_node -l is:
 7 v02-c
 8 v02-d
 5 v02-a
 6 v02-b

 This time the node state was empty. Or it's missing altogether.
 I'm not sure how's that supposed to be interpreted. We test the
 output of crm_node -l just to make sure that the node is not
 online. Perhaps we need to use some other command.

 Likely it shows everything from a corosync nodelist.
 After I deleted the node from everywhere except corosync, list is still
 the same.

 OK. This patch changes the interface to crm_node to use the
 list partition option (-p). Could you please test it?

 Nope. Not enough. Even worse than before. I tested todays tip as it
 includes that patch with merge of Andrew's public and private master heads.
 =
 [root@v02-b ~]# crm node show
 v02-a(5): normal
   standby: off
   virtualization: true
   $id: nodes-5
 v02-b(6): normal
   standby: off
   virtualization: true
 v02-c(7): normal
   standby: off
   virtualization: true
 v02-d(8): normal(offline)
   standby: off
   virtualization: true
 [root@v02-b ~]# crm node delete v02-d
 ERROR: according to crm_node, node v02-d is still active
 [root@v02-b ~]# crm_node -p
 v02-c v02-d v02-a v02-b
 [root@v02-b ~]# crm_node -l
 7 v02-c
 8 v02-d
 5 v02-a
 6 v02-b
 [root@v02-b ~]#
 =

 That is after I stopped node, lowered votequorum expected_votes (with
 corosync-quorumtool) and deleted v02-d from a cmap nodelist.

 corosync-cmapctl still shows runtime info about deleted node as well:
 runtime.totem.pg.mrp.srp.members.8.config_version (u64) = 0
 runtime.totem.pg.mrp.srp.members.8.ip (str) = r(0) ip(10.5.4.55)
 runtime.totem.pg.mrp.srp.members.8.join_count (u32) = 1
 runtime.totem.pg.mrp.srp.members.8.status (str) = left
 And it is not allowed to delete that keys.

 crm_node -R did the job (nothing left in the CIB), but, v02-d still
 appears in its output for both -p and -l.

 Andrew, I copy you directly because above is probably to you. Shouldn't
 crm_node some-how show that stopped node is deleted from a corosync
 nodelist?

 Which stack is this?

 corosync 2.3 with nodelist and udpu.
 
 I assume its possible, but crm_node isn't smart enough to do that yet.
 Feel like writing a patch? :)

Shouldn't it just skip offline nodes for -p?

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] crm node delete

2013-07-09 Thread Vladislav Bogdanov
10.07.2013 08:13, Andrew Beekhof wrote:
 
 On 10/07/2013, at 2:15 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote:
 
 10.07.2013 07:05, Andrew Beekhof wrote:

 On 10/07/2013, at 2:04 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote:

 10.07.2013 03:39, Andrew Beekhof wrote:

 On 10/07/2013, at 1:51 AM, Vladislav Bogdanov bub...@hoster-ok.com 
 wrote:

 03.07.2013 19:31, Dejan Muhamedagic wrote:
 On Tue, Jul 02, 2013 at 07:53:52AM +0300, Vladislav Bogdanov wrote:
 01.07.2013 18:29, Dejan Muhamedagic wrote:
 Hi,

 On Mon, Jul 01, 2013 at 05:29:31PM +0300, Vladislav Bogdanov wrote:
 Hi,

 I'm trying to look if it is now safe to delete non-running nodes
 (corosync 2.3, pacemaker HEAD, crmsh tip).

 # crm node delete v02-d
 WARNING: 2: crm_node bad format: 7 v02-c
 WARNING: 2: crm_node bad format: 8 v02-d
 WARNING: 2: crm_node bad format: 5 v02-a
 WARNING: 2: crm_node bad format: 6 v02-b
 INFO: 2: node v02-d not found by crm_node
 INFO: 2: node v02-d deleted
 #

 So, I expect that crmsh still doesn't follow latest changes to 
 'crm_node
 -l'. Although node seems to be deleted correctly.

 For reference, output of crm_node -l is:
 7 v02-c
 8 v02-d
 5 v02-a
 6 v02-b

 This time the node state was empty. Or it's missing altogether.
 I'm not sure how's that supposed to be interpreted. We test the
 output of crm_node -l just to make sure that the node is not
 online. Perhaps we need to use some other command.

 Likely it shows everything from a corosync nodelist.
 After I deleted the node from everywhere except corosync, list is still
 the same.

 OK. This patch changes the interface to crm_node to use the
 list partition option (-p). Could you please test it?

 Nope. Not enough. Even worse than before. I tested todays tip as it
 includes that patch with merge of Andrew's public and private master 
 heads.
 =
 [root@v02-b ~]# crm node show
 v02-a(5): normal
  standby: off
  virtualization: true
  $id: nodes-5
 v02-b(6): normal
  standby: off
  virtualization: true
 v02-c(7): normal
  standby: off
  virtualization: true
 v02-d(8): normal(offline)
  standby: off
  virtualization: true
 [root@v02-b ~]# crm node delete v02-d
 ERROR: according to crm_node, node v02-d is still active
 [root@v02-b ~]# crm_node -p
 v02-c v02-d v02-a v02-b
 [root@v02-b ~]# crm_node -l
 7 v02-c
 8 v02-d
 5 v02-a
 6 v02-b
 [root@v02-b ~]#
 =

 That is after I stopped node, lowered votequorum expected_votes (with
 corosync-quorumtool) and deleted v02-d from a cmap nodelist.

 corosync-cmapctl still shows runtime info about deleted node as well:
 runtime.totem.pg.mrp.srp.members.8.config_version (u64) = 0
 runtime.totem.pg.mrp.srp.members.8.ip (str) = r(0) ip(10.5.4.55)
 runtime.totem.pg.mrp.srp.members.8.join_count (u32) = 1
 runtime.totem.pg.mrp.srp.members.8.status (str) = left
 And it is not allowed to delete that keys.

 crm_node -R did the job (nothing left in the CIB), but, v02-d still
 appears in its output for both -p and -l.

 Andrew, I copy you directly because above is probably to you. Shouldn't
 crm_node some-how show that stopped node is deleted from a corosync
 nodelist?

 Which stack is this?

 corosync 2.3 with nodelist and udpu.

 I assume its possible, but crm_node isn't smart enough to do that yet.
 Feel like writing a patch? :)

 Shouldn't it just skip offline nodes for -p?

 
 Worse. It appears to be asking pacemakerd instead of corosync or crmd.
 

Hm. I do not believe I'm able to refactor it then...

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] crm node delete

2013-07-09 Thread Vladislav Bogdanov
10.07.2013 08:38, Andrew Beekhof wrote:
 
 On 10/07/2013, at 3:37 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote:
 
 10.07.2013 08:13, Andrew Beekhof wrote:

 On 10/07/2013, at 2:15 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote:

 10.07.2013 07:05, Andrew Beekhof wrote:

 On 10/07/2013, at 2:04 PM, Vladislav Bogdanov bub...@hoster-ok.com 
 wrote:

 10.07.2013 03:39, Andrew Beekhof wrote:

 On 10/07/2013, at 1:51 AM, Vladislav Bogdanov bub...@hoster-ok.com 
 wrote:

 03.07.2013 19:31, Dejan Muhamedagic wrote:
 On Tue, Jul 02, 2013 at 07:53:52AM +0300, Vladislav Bogdanov wrote:
 01.07.2013 18:29, Dejan Muhamedagic wrote:
 Hi,

 On Mon, Jul 01, 2013 at 05:29:31PM +0300, Vladislav Bogdanov wrote:
 Hi,

 I'm trying to look if it is now safe to delete non-running nodes
 (corosync 2.3, pacemaker HEAD, crmsh tip).

 # crm node delete v02-d
 WARNING: 2: crm_node bad format: 7 v02-c
 WARNING: 2: crm_node bad format: 8 v02-d
 WARNING: 2: crm_node bad format: 5 v02-a
 WARNING: 2: crm_node bad format: 6 v02-b
 INFO: 2: node v02-d not found by crm_node
 INFO: 2: node v02-d deleted
 #

 So, I expect that crmsh still doesn't follow latest changes to 
 'crm_node
 -l'. Although node seems to be deleted correctly.

 For reference, output of crm_node -l is:
 7 v02-c
 8 v02-d
 5 v02-a
 6 v02-b

 This time the node state was empty. Or it's missing altogether.
 I'm not sure how's that supposed to be interpreted. We test the
 output of crm_node -l just to make sure that the node is not
 online. Perhaps we need to use some other command.

 Likely it shows everything from a corosync nodelist.
 After I deleted the node from everywhere except corosync, list is 
 still
 the same.

 OK. This patch changes the interface to crm_node to use the
 list partition option (-p). Could you please test it?

 Nope. Not enough. Even worse than before. I tested todays tip as it
 includes that patch with merge of Andrew's public and private master 
 heads.
 =
 [root@v02-b ~]# crm node show
 v02-a(5): normal
 standby: off
 virtualization: true
 $id: nodes-5
 v02-b(6): normal
 standby: off
 virtualization: true
 v02-c(7): normal
 standby: off
 virtualization: true
 v02-d(8): normal(offline)
 standby: off
 virtualization: true
 [root@v02-b ~]# crm node delete v02-d
 ERROR: according to crm_node, node v02-d is still active
 [root@v02-b ~]# crm_node -p
 v02-c v02-d v02-a v02-b
 [root@v02-b ~]# crm_node -l
 7 v02-c
 8 v02-d
 5 v02-a
 6 v02-b
 [root@v02-b ~]#
 =

 That is after I stopped node, lowered votequorum expected_votes (with
 corosync-quorumtool) and deleted v02-d from a cmap nodelist.

 corosync-cmapctl still shows runtime info about deleted node as well:
 runtime.totem.pg.mrp.srp.members.8.config_version (u64) = 0
 runtime.totem.pg.mrp.srp.members.8.ip (str) = r(0) ip(10.5.4.55)
 runtime.totem.pg.mrp.srp.members.8.join_count (u32) = 1
 runtime.totem.pg.mrp.srp.members.8.status (str) = left
 And it is not allowed to delete that keys.

 crm_node -R did the job (nothing left in the CIB), but, v02-d still
 appears in its output for both -p and -l.

 Andrew, I copy you directly because above is probably to you. Shouldn't
 crm_node some-how show that stopped node is deleted from a corosync
 nodelist?

 Which stack is this?

 corosync 2.3 with nodelist and udpu.

 I assume its possible, but crm_node isn't smart enough to do that yet.
 Feel like writing a patch? :)

 Shouldn't it just skip offline nodes for -p?


 Worse. It appears to be asking pacemakerd instead of corosync or crmd.


 Hm. I do not believe I'm able to refactor it then...

 
 Yeah, I'm looking at it.
 The hard part is that going to corosync directly only gives you a nodeid :-(
 

Don't you need to get info from both sources anyway (for the case of a
node offline in crmd but joined in corosync - corosync started, but
pacemaker not)?

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-07-05 Thread Vladislav Bogdanov
04.07.2013 19:09, Dejan Muhamedagic wrote:
 On Thu, Jul 04, 2013 at 05:40:07PM +0300, Vladislav Bogdanov wrote:
 04.07.2013 17:25, Dejan Muhamedagic wrote:
 On Wed, Jul 03, 2013 at 04:33:20PM +0300, Vladislav Bogdanov wrote:
 03.07.2013 15:43, Dejan Muhamedagic wrote:
 Hi Lars,

 On Wed, Jul 03, 2013 at 12:05:17PM +0200, Lars Marowsky-Bree wrote:
 On 2013-07-03T10:26:09, Dejan Muhamedagic deja...@fastmail.fm wrote:

 Not sure that is expected by most people.
 How you then delete attributes?
 Tough call :) Ideas welcome.

 Set them to an empty string, or a magic #undef value.

 It's not only for the nodes. Attributes of resources should be
 merged as well. Perhaps to introduce another load method, say
 merge, which would merge attributes of elements instead of
 replacing them. Though the use would then get more complex (which
 seems to be justified here).

 Well, that leaves open the question of how higher-level objects
 (primitives, clones, groups, constraints ...) would be affected/deleted.

 I'm not sure the complexity is really worth it. Merge rules get *really*
 complex, quickly. And eventually, one ends with the need to annotate the
 input with how one wants a merge to be resolved (such as #undef
 values).

 Perhaps I misunderstood the original intention, but the idea was
 more simple:

   primitive r1 params p1=v1 p2=v2 meta m1=mv1

   primitive r1 params p1=nv1 p3=v3 - merge

   ---

   primitive r1 params p1=nv1 p2=v2 p3=v3 meta m1=mv1

 If the attribute already exists, then it is overwritten. The
 existing attributes are otherwise left intact. New attributes are
 added.

 I'd simplify that logic to sections.

 Must say that it seems to me simpler and easier to grasp on the
 attribute level. Besides, code for that already exists, so it
 would reduce effort and code size/complexity :) Of course,
 I can also see some use cases for the attribute-set level
 operations.

 Ok, you know it much better ;)


 * node attributes (except pacemaker internal ones like $id?)
 * node utilization
 * primitive params
 * primitive meta
 * primitive utilization
 * clone/ms meta

 If whole section is missing in the update, then leave it as-is.
 Otherwise (also if it exists but empty) replace the whole section.

 The only unclear thing is 'op', but this one can be replaced
 unconditionally (like in the current logic).

 I guess that it can be merged just like any other set of
 attributes. Note that the user is free to specify the
 operation fully if they really want to replace it completely.

 The only question is how to remove existing attributes.

 Not many choices here I think... Either set to empty or better
 predefined value (empty value may be still valid and used to override
 not-empty default one) like Lars suggested
 
 Yes, and that would be up to the users.
 
 or use some additional
 formatting ( -param_name ). Second way probably requires new load method.
 
 That's the one I'd be interested in, but for now most of the
 possibilities seem to come from the kludge domain :)

You may also look at ldif(5) (part of openldap) to see how this is
solved in the LDAP world. Maybe that could give some valuable pointers
(although I do not see how to apply it directly). There is a trick to
replace only one value from a set (or to ensure that a record's
virtual ID is not modified): use delete/add instead of replace.

 
 BTW, did you ever try the configure filter command?

 Hm..
 Not yet :)
 But how that can help?
 
 filter is to edit what sed to ed is. It got actually introduced
 so that we can do automatic regression testing of the edit
 command. You could conceivably, depending on your use case,
 produce a script to get what you need. Just not sure how
 difficult it would be to get the processing right.

Thank you, but this is too complicated for my case.


___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-07-05 Thread Vladislav Bogdanov
05.07.2013 14:38, Dejan Muhamedagic wrote:
 On Fri, Jul 05, 2013 at 09:31:07AM +0300, Vladislav Bogdanov wrote:
 04.07.2013 19:09, Dejan Muhamedagic wrote:
 On Thu, Jul 04, 2013 at 05:40:07PM +0300, Vladislav Bogdanov wrote:
 04.07.2013 17:25, Dejan Muhamedagic wrote:
 On Wed, Jul 03, 2013 at 04:33:20PM +0300, Vladislav Bogdanov wrote:
 03.07.2013 15:43, Dejan Muhamedagic wrote:
 Hi Lars,

 On Wed, Jul 03, 2013 at 12:05:17PM +0200, Lars Marowsky-Bree wrote:
 On 2013-07-03T10:26:09, Dejan Muhamedagic deja...@fastmail.fm wrote:

 Not sure that is expected by most people.
 How you then delete attributes?
 Tough call :) Ideas welcome.

 Set them to an empty string, or a magic #undef value.

 It's not only for the nodes. Attributes of resources should be
 merged as well. Perhaps to introduce another load method, say
 merge, which would merge attributes of elements instead of
 replacing them. Though the use would then get more complex (which
 seems to be justified here).

 Well, that leaves open the question of how higher-level objects
 (primitives, clones, groups, constraints ...) would be 
 affected/deleted.

 I'm not sure the complexity is really worth it. Merge rules get 
 *really*
 complex, quickly. And eventually, one ends with the need to annotate 
 the
 input with how one wants a merge to be resolved (such as #undef
 values).

 Perhaps I misunderstood the original intention, but the idea was
 more simple:

 primitive r1 params p1=v1 p2=v2 meta m1=mv1

 primitive r1 params p1=nv1 p3=v3 - merge

 ---

 primitive r1 params p1=nv1 p2=v2 p3=v3 meta m1=mv1

 If the attribute already exists, then it is overwritten. The
 existing attributes are otherwise left intact. New attributes are
 added.

 I'd simplify that logic to sections.

 Must say that it seems to me simpler and easier to grasp on the
 attribute level. Besides, code for that already exists, so it
 would reduce effort and code size/complexity :) Of course,
 I can also see some use cases for the attribute-set level
 operations.

 Ok, you know it much better ;)


 * node attributes (except pacemaker internal ones like $id?)
 * node utilization
 * primitive params
 * primitive meta
 * primitive utilization
 * clone/ms meta

 If whole section is missing in the update, then leave it as-is.
 Otherwise (also if it exists but empty) replace the whole section.

 The only unclear thing is 'op', but this one can be replaced
 unconditionally (like in the current logic).

 I guess that it can be merged just like any other set of
 attributes. Note that the user is free to specify the
 operation fully if they really want to replace it completely.

 The only question is how to remove existing attributes.

 Not many choices here I think... Either set to empty or better
 predefined value (empty value may be still valid and used to override
 not-empty default one) like Lars suggested

 Yes, and that would be up to the users.

 or use some additional
 formatting ( -param_name ). Second way probably requires new load method.

 That's the one I'd be interested in, but for now most of the
 possibilities seem to come from the kludge domain :)

 You may also look at ldif(5) (part of openldap) to see how this is
 solved in the LDAP world. May be that could give some valuable pointers
 (although I do not see how apply that directly). There are trick to
 replace only one value from a set (or to ensure that records virtual ID
 is not modified) - use delete/add instead of replace.
 
 Yes, something similar crossed my mind, but I wanted to avoid
 too much verbosity.

Understood.

Maybe it is possible to start with just not touching the whole section
(like params, meta, utilization, node attributes) if it does not exist
in the update, or if it contains just a pre-defined value (e.g. #keep)?

Ughm...

What if we introduce an *optional* replacement policy for the whole
section? I mean:

params #merge param1=value1 param2=value2

meta #replace ...

utilization #keep

and so on. With the default being #replace?
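
Purely illustrative (this syntax was never finalized), a 'load update'
snippet using those hints might look like:

===
primitive r1 ocf:heartbeat:Dummy \
    params #merge p1=v1 p3=v3 \
    meta #keep \
    utilization #replace cpu=2
===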

 
 BTW, did you ever try the configure filter command?

 Hm..
 Not yet :)
 But how that can help?

 filter is to edit what sed to ed is. It got actually introduced
 so that we can do automatic regression testing of the edit
 command. You could conceivably, depending on your use case,
 produce a script to get what you need. Just not sure how
 difficult it would be to get the processing right.

 Thank you, but this is too complicated for my case.
 
 It seems to be too complicated for most uses :( Though something
 similar with more use potential could most probably be designed.
 
 Thanks,
 
 Dejan
 
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also

Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-07-05 Thread Vladislav Bogdanov
05.07.2013 16:25, Vladislav Bogdanov wrote:
 05.07.2013 14:38, Dejan Muhamedagic wrote:
 On Fri, Jul 05, 2013 at 09:31:07AM +0300, Vladislav Bogdanov wrote:
 04.07.2013 19:09, Dejan Muhamedagic wrote:
 On Thu, Jul 04, 2013 at 05:40:07PM +0300, Vladislav Bogdanov wrote:
 04.07.2013 17:25, Dejan Muhamedagic wrote:
 On Wed, Jul 03, 2013 at 04:33:20PM +0300, Vladislav Bogdanov wrote:
 03.07.2013 15:43, Dejan Muhamedagic wrote:
 Hi Lars,

 On Wed, Jul 03, 2013 at 12:05:17PM +0200, Lars Marowsky-Bree wrote:
 On 2013-07-03T10:26:09, Dejan Muhamedagic deja...@fastmail.fm wrote:

 Not sure that is expected by most people.
 How you then delete attributes?
 Tough call :) Ideas welcome.

 Set them to an empty string, or a magic #undef value.

 It's not only for the nodes. Attributes of resources should be
 merged as well. Perhaps to introduce another load method, say
 merge, which would merge attributes of elements instead of
 replacing them. Though the use would then get more complex (which
 seems to be justified here).

 Well, that leaves open the question of how higher-level objects
 (primitives, clones, groups, constraints ...) would be 
 affected/deleted.

 I'm not sure the complexity is really worth it. Merge rules get 
 *really*
 complex, quickly. And eventually, one ends with the need to annotate 
 the
 input with how one wants a merge to be resolved (such as #undef
 values).

 Perhaps I misunderstood the original intention, but the idea was
 more simple:

primitive r1 params p1=v1 p2=v2 meta m1=mv1

primitive r1 params p1=nv1 p3=v3 - merge

---

primitive r1 params p1=nv1 p2=v2 p3=v3 meta m1=mv1

 If the attribute already exists, then it is overwritten. The
 existing attributes are otherwise left intact. New attributes are
 added.

 I'd simplify that logic to sections.

 Must say that it seems to me simpler and easier to grasp on the
 attribute level. Besides, code for that already exists, so it
 would reduce effort and code size/complexity :) Of course,
 I can also see some use cases for the attribute-set level
 operations.

 Ok, you know it much better ;)


 * node attributes (except pacemaker internal ones like $id?)
 * node utilization
 * primitive params
 * primitive meta
 * primitive utilization
 * clone/ms meta

 If whole section is missing in the update, then leave it as-is.
 Otherwise (also if it exists but empty) replace the whole section.

 The only unclear thing is 'op', but this one can be replaced
 unconditionally (like in the current logic).

 I guess that it can be merged just like any other set of
 attributes. Note that the user is free to specify the
 operation fully if they really want to replace it completely.

 The only question is how to remove existing attributes.

 Not many choices here I think... Either set to empty or better
 predefined value (empty value may be still valid and used to override
 not-empty default one) like Lars suggested

 Yes, and that would be up to the users.

 or use some additional
 formatting ( -param_name ). Second way probably requires new load method.

 That's the one I'd be interested in, but for now most of the
 possibilities seem to come from the kludge domain :)

 You may also look at ldif(5) (part of openldap) to see how this is
 solved in the LDAP world. May be that could give some valuable pointers
 (although I do not see how apply that directly). There are trick to
 replace only one value from a set (or to ensure that records virtual ID
 is not modified) - use delete/add instead of replace.

 Yes, something similar crossed my mind, but I wanted to avoid
 too much verbosity.
 
 Understand.
 
 May be it is possible to start with just not touch the whole section
 (like params, meta, utilization, node attributes) if it does not exist
 in the update, or if it contains just pre-defined value (f.e. #keep)?
 
 Ughm...
 
 What if to introduce *optional* replacement policy for the whole section?
 I mean:
 
 params #merge param1=value1 param2=value2
 
 meta #replace ...
 
 utilization #keep
 
 and so on. With default to #replace?

Even more: if we allow such meta lexemes anywhere (not only at the very
beginning), then they may be applied only to the rest of the string (or
up to the next meta lexeme).

The best thing I see about this idea is that it is fully backwards
compatible.

 

 BTW, did you ever try the configure filter command?

 Hm..
 Not yet :)
 But how that can help?

 filter is to edit what sed to ed is. It got actually introduced
 so that we can do automatic regression testing of the edit
 command. You could conceivably, depending on your use case,
 produce a script to get what you need. Just not sure how
 difficult it would be to get the processing right.

 Thank you, but this is too complicated for my case.

 It seems to be too complicated for most uses :( Though something
 similar with more use potential could most probably be designed.

 Thanks,

 Dejan

 ___
 Linux-HA mailing

Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-07-05 Thread Vladislav Bogdanov
05.07.2013 19:46, Lars Marowsky-Bree wrote:
 On 2013-07-05T19:06:54, Vladislav Bogdanov bub...@hoster-ok.com wrote:
 
 params #merge param1=value1 param2=value2

 meta #replace ...

 utilization #keep

 and so on. With default to #replace?

 Even more.
 If we allow such meta lexems anywhere (not only at the very
 beginning), then they may be applied only to the rest of string (or
 before other meta lexem).

 The best thing I see about this idea is this is fully backwards compatible.
 
 From a language aesthetics point of view, this gives me the utter
 creeps. Don't make me switch to pcs! ;-)

;)

 
 I could live with a proper merge/update, replace command as a
 prefix, just like we now have delete, though. Similar to what we do
 for groups.

delete is a command, not part of the configuration syntax.
What I propose is a fully optional, hint-like language extension that
defaults to the current behavior.
What I'm interested in myself is just the #keep part, but, yes, I
prefer complete, general solutions.


___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-07-04 Thread Vladislav Bogdanov
03.07.2013 02:24, Andrew Beekhof wrote:

...


 I don't even know what I'm thinking half the time, I'd not recommend trying 
 to guess :)
 No fundamental objection to such a feature, but I'd be reluctant to add it 
 until we get an attrd that was truly atomic.
 That code is mostly bandages and sticky tape.

I filed cl#5165 for that.


___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-07-04 Thread Vladislav Bogdanov
03.07.2013 16:28, Vladislav Bogdanov wrote:
...
 
 So I'd probably just hack crmsh to not touch node utilization attributes
 if whole 'utilization' part is missing in the update.

Unfortunately this doesn't seem to be possible with my Python
programming level (near zero)... :(

It is clear to me that I need to conditionally redirect calls from the
'replace' family of methods to the newly-implemented 'merge' ones, but
I do not like adding hacks (and do not see how to do that) to the
otherwise generic code, and I think that only a general reevaluation of
the concept may help to do that properly. S.O.S. :)


___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-07-04 Thread Vladislav Bogdanov
04.07.2013 17:25, Dejan Muhamedagic wrote:
 On Wed, Jul 03, 2013 at 04:33:20PM +0300, Vladislav Bogdanov wrote:
 03.07.2013 15:43, Dejan Muhamedagic wrote:
 Hi Lars,

 On Wed, Jul 03, 2013 at 12:05:17PM +0200, Lars Marowsky-Bree wrote:
 On 2013-07-03T10:26:09, Dejan Muhamedagic deja...@fastmail.fm wrote:

 Not sure that is expected by most people.
 How you then delete attributes?
 Tough call :) Ideas welcome.

 Set them to an empty string, or a magic #undef value.

 It's not only for the nodes. Attributes of resources should be
 merged as well. Perhaps to introduce another load method, say
 merge, which would merge attributes of elements instead of
 replacing them. Though the use would then get more complex (which
 seems to be justified here).

 Well, that leaves open the question of how higher-level objects
 (primitives, clones, groups, constraints ...) would be affected/deleted.

 I'm not sure the complexity is really worth it. Merge rules get *really*
 complex, quickly. And eventually, one ends with the need to annotate the
 input with how one wants a merge to be resolved (such as #undef
 values).

 Perhaps I misunderstood the original intention, but the idea was
 more simple:

 primitive r1 params p1=v1 p2=v2 meta m1=mv1

 primitive r1 params p1=nv1 p3=v3  <- merge

 ---

 primitive r1 params p1=nv1 p2=v2 p3=v3 meta m1=mv1

 If the attribute already exists, then it is overwritten. The
 existing attributes are otherwise left intact. New attributes are
 added.

 I'd simplify that logic to sections.
 
 Must say that it seems to me simpler and easier to grasp on the
 attribute level. Besides, code for that already exists, so it
 would reduce effort and code size/complexity :) Of course,
 I can also see some use cases for the attribute-set level
 operations.

Ok, you know it much better ;)

 
 * node attributes (except pacemaker internal ones like $id?)
 * node utilization
 * primitive params
 * primitive meta
 * primitive utilization
 * clone/ms meta

 If a whole section is missing in the update, then leave it as-is.
 Otherwise (also if it exists but is empty) replace the whole section.

 The only unclear thing is 'op', but this one can be replaced
 unconditionally (like in the current logic).
 
 I guess that it can be merged just like any other set of
 attributes. Note that the user is free to specify the
 operation fully if they really want to replace it completely.
 
 The only question is how to remove existing attributes.

Not many choices here, I think... Either set the attribute to an empty
or, better, a predefined magic value (an empty value may still be valid
and used to override a non-empty default), like Lars suggested, or use
some additional formatting ( -param_name ). The second way probably
requires a new load method.

 
 BTW, did you ever try the configure filter command?

Hm..
Not yet :)
But how can that help?

Vladislav

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-07-04 Thread Vladislav Bogdanov
04.07.2013 17:40, Vladislav Bogdanov wrote:
...
 The only question is how to remove existing attributes.

Another one is how to forcibly replace the whole section or the whole
object definition, without caring about its original content.

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-07-03 Thread Vladislav Bogdanov
03.07.2013 13:00, Lars Marowsky-Bree wrote:
 On 2013-07-03T00:20:19, Vladislav Bogdanov bub...@hoster-ok.com wrote:
 
 I do not edit them. In my setup I generate the full crm config with a
 template-based framework.
 
 And then you do a load/replace? Tough; yes, that'll clearly overwrite

Actually 'load update'.
'replace' doesn't work when resources are running.

 what is already there and added by scripts that more dynamically modify
 the CIB.
 
 Since we don't know your complete merging rules, it's probably easier if
 your template engine gains hooks to first read the CIB for setting those
 utilization values.

Probably. But not the template framework itself (it is actually a
combination of make and m4, so it is too stupid to look up the CIB). So
I'd need to move that onto the next model level (human or controlling
framework, which I'm in the process of implementing) - but that is
actually what I do not want to happen (it breaks the whole idea).

So I'd probably just hack crmsh to not touch node utilization attributes
if the whole 'utilization' part is missing in the update.
If/when pacemaker has support for transient utilization attributes, I
will move to that.


 
 That is a very convenient way to e.g. stop a dozen resources in one
 shot for some maintenance. I have a special RA which creates a ticket on
 cluster start and deletes it on cluster stop. And many resources may
 depend on that ticket. If you request the resource handled by that RA to
 stop, the ticket is revoked and all dependent resources stop.

 I wouldn't write that RA if I had cluster-wide attributes (which
 perform like node attributes but for a whole cluster).
 
 Right. But. Tickets *are* cluster wide attributes that are meant to
 control the target-role of many resources depending on them. So you're
 getting exactly what you need, no? What is missing?

They are volatile.

And, I'd prefer cluster attributes to have free-form values. I was
already hit by the fact that the two-state 'granted/revoked' value is
too limited for me. I then expanded the logic to also use the
'non-existent' ticket state (it worked for some time), but then support
for active/standby came in and I switched to that.

That all was in the lustre-server RA, which needs to control the order
in which parts of a whole lustre fs are tuned/activated when it moves to
another cluster on a ticket revocation. I use an additional
internally-controlled ticket there.

Best,
Vladislav

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-07-03 Thread Vladislav Bogdanov
03.07.2013 15:43, Dejan Muhamedagic wrote:
 Hi Lars,
 
 On Wed, Jul 03, 2013 at 12:05:17PM +0200, Lars Marowsky-Bree wrote:
 On 2013-07-03T10:26:09, Dejan Muhamedagic deja...@fastmail.fm wrote:

 Not sure that is expected by most people.
 How you then delete attributes?
 Tough call :) Ideas welcome.

 Set them to an empty string, or a magic #undef value.

 It's not only for the nodes. Attributes of resources should be
 merged as well. Perhaps to introduce another load method, say
 merge, which would merge attributes of elements instead of
 replacing them. Though the use would then get more complex (which
 seems to be justified here).

 Well, that leaves open the question of how higher-level objects
 (primitives, clones, groups, constraints ...) would be affected/deleted.

 I'm not sure the complexity is really worth it. Merge rules get *really*
 complex, quickly. And eventually, one ends with the need to annotate the
 input with how one wants a merge to be resolved (such as #undef
 values).
 
 Perhaps I misunderstood the original intention, but the idea was
 more simple:
 
   primitive r1 params p1=v1 p2=v2 meta m1=mv1
 
  primitive r1 params p1=nv1 p3=v3  <- merge
 
   ---
 
   primitive r1 params p1=nv1 p2=v2 p3=v3 meta m1=mv1
 
 If the attribute already exists, then it is overwritten. The
 existing attributes are otherwise left intact. New attributes are
 added.

I'd simplify that logic to sections.
* node attributes (except pacemaker internal ones like $id?)
* node utilization
* primitive params
* primitive meta
* primitive utilization
* clone/ms meta

If a whole section is missing in the update, then leave it as-is.
Otherwise (also if it exists but is empty) replace the whole section.

The only unclear thing is 'op', but this one can be replaced
unconditionally (like in the current logic).
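
To illustrate the proposed section-level semantics (a sketch of the
proposal, not current crmsh behavior):

node v02-a attributes standby=off
# no utilization section in the update: existing utilization
# attributes are kept as-is

node v02-a attributes standby=off utilization
# utilization section present but empty: existing utilization
# attributes are replaced, i.e. cleared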

Vladislav

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-07-02 Thread Vladislav Bogdanov
28.06.2013 17:47, Dejan Muhamedagic wrote:
...
 If you want to test here's a new patch. It does work with
 unrelated changes happening in the meantime. I didn't test yet
 really concurrent updates.
 

One thing I see immediately is that node utilization attributes are
deleted after I do 'load update' with empty node utilization sections.
That is probably not specific to this patch.

I have those attributes dynamic, set from an RA (as node configuration
may vary, I prefer to detect how much CPU and RAM I have and set
utilization accordingly rather than put every hardware change into the
CIB).

On the one hand, I would agree that crmsh does what is intended - if no
utilization attributes are set in a config update, then they should be
removed.
On the other, I would prefer to delete node utilization attributes on
update only if the new definition contains a 'utilization' section but
without those attributes.

Or maybe it is possible to use transient utilization attributes?
I don't think so... Ugh, that would be nice.

Everything else works fine, I was able to do:
# crm configure
crm(live)configure# edit
edit target-role attribute on a clone
In other shell: crm resource stop...
crm(live)configure# commit

And that didn't produce any errors.
Both actions completed correctly.

Vladislav

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-07-02 Thread Vladislav Bogdanov
02.07.2013 12:27, Lars Marowsky-Bree wrote:
 On 2013-07-02T11:05:01, Vladislav Bogdanov bub...@hoster-ok.com wrote:
 
 One thing I see immediately, is that node utilization attributes are
 deleted after I do 'load update' with empty node utilization sections.
 That is probably not specific to this patch.
 
 Yes, that isn't specific to that.
 
 I have that attributes dynamic, set from a RA (as node configuration may
 vary, I prefer to detect how much CPU and RAM I have and set utilization
 accordingly rather then put every hardware change into CIB).
 
 Or maybe it is possible to use transient utilization attributes?
 I don't think so... Ugh, that would be nice.
 
 Yes, that's exactly what you need here.

I know, but I do not expect that to be implemented soon. Together with
cluster-wide attributes, for which I use a hack with tickets now. But
tickets are currently quite limited - they have only 4 states, so it is
impossible to put e.g. a number there.

I fully understand Andrew's point when he is unwilling to implement
features for just two setups, so... Probably I need to extend crmsh with
a site-specific patch until that is implemented. That would be an
acceptable work-around for me... And a chance to learn Python
nevertheless ;)



___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-07-02 Thread Vladislav Bogdanov
02.07.2013 14:55, Andrew Beekhof wrote:
 
 On 02/07/2013, at 8:14 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote:
 
 02.07.2013 12:27, Lars Marowsky-Bree wrote:
 On 2013-07-02T11:05:01, Vladislav Bogdanov bub...@hoster-ok.com wrote:

 One thing I see immediately, is that node utilization attributes are
 deleted after I do 'load update' with empty node utilization sections.
 That is probably not specific to this patch.

 Yes, that isn't specific to that.

 I have that attributes dynamic, set from a RA (as node configuration may
 vary, I prefer to detect how much CPU and RAM I have and set utilization
 accordingly rather then put every hardware change into CIB).

 Or maybe it is possible to use transient utilization attributes?
 I don't think so... Ugh, that would be nice.

 Yes, that's exactly what you need here.

 I know, but I do not expect that to be implemented soon. Together with
 cluster-wide attributes for which I use hack with tickets now. But
 tickets currently are quite limited - they have only 4 states, so it is
 impossible to put f.e. number there.

 I fully understand Andrew's point when he is unwilling to implement
 features for just two setups, so...
 
 What feature am I not considering here?  I don't follow.

I didn't ask about that yet. Just assuming what your possible reaction
could be. :)
Support for transient utilization attributes, which do not go to the
config section but to the status section. I would say it is overkill to
implement that (and to somehow merge the two sections when doing the
utilization calculation) if nobody except me is affected by its absence.

E.g. I need to do a CIB update (think of it as a full replace), because
I generate the crmsh configuration with a custom template-based system.
And I have some RAs which set utilization attributes on nodes.
Now, when I apply my brand new full config to a cluster after making
some changes here and there, those attributes are lost.

Transient utilization attributes would help me (I would use them in my
RAs). But I wouldn't say that is a common setup. That's why I assume you
won't be a fan of implementing them.
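
For illustration, the difference would be roughly this (a sketch only:
a -z/--utilization switch with reboot lifetime is precisely the
enhancement discussed here, not something that works today):

# permanent utilization attribute: stored in the configuration
# section, so a template-driven full update can wipe it
crm_attribute -N v02-a -z -n ram -v 4096

# hypothetical transient variant: stored in the status section
# with reboot lifetime, surviving config replaces
crm_attribute -N v02-a -z -l reboot -n ram -v 4096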

 
 Probably I need to extend crmsh with
 site-specific patch until that is implemented. That would be acceptable
 work-around for me... And chance to learn python nevertheless ;)

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-07-02 Thread Vladislav Bogdanov
02.07.2013 15:13, Lars Marowsky-Bree wrote:
 On 2013-07-02T13:14:48, Vladislav Bogdanov bub...@hoster-ok.com wrote:
 
 Yes, that's exactly what you need here.
 I know, but I do not expect that to be implemented soon.
 
 crm_attribute -l reboot -z doesn't strike me as an unlikely request.
 You could file an enhancement request for that.
 
 But with the XML diff feature, as long as you're not editing the node
 section, that shouldn't be a problem - unrelated changes shouldn't
 overwrite those attributes, right? That being the whole point?

I do not edit them. In my setup I generate the full crm config with a
template-based framework. That's why nodes go there too. And I can't
skip them, because I heavily use ordinary node attributes and they
change sometimes.

 
 (Of course, if you remove them in the copy, that'd remove them.)
 
 But tickets currently are quite limited - they have only 4 states, so
 it is impossible to put f.e. number there.
 
 What are you trying to do with that?

That is a very convenient way to e.g. stop a dozen resources in one
shot for some maintenance. I have a special RA which creates a ticket on
cluster start and deletes it on cluster stop. And many resources may
depend on that ticket. If you request the resource handled by that RA to
stop, the ticket is revoked and all dependent resources stop.

I wouldn't write that RA if I had cluster-wide attributes (which
perform like node attributes but for a whole cluster).
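
A minimal sketch of the scheme (ticket and resource names are made up,
and crm_ticket may need --force for the revoke depending on version):

# done by the special RA on cluster start / stop
crm_ticket --ticket site-up --grant
crm_ticket --ticket site-up --revoke --force

# dependent resources reference the ticket, so revoking it stops them
rsc_ticket web-needs-site site-up: p_web loss-policy=stop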


___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-07-02 Thread Vladislav Bogdanov
03.07.2013 00:16, Lars Marowsky-Bree wrote:
 On 2013-07-03T00:11:53, Vladislav Bogdanov bub...@hoster-ok.com wrote:
 
 F.e. I need to do CIB update (think of it as of full replace), because I
 generate crmsh configuration with custom template-based system. And I
 have some RAs which set utilization attributes on nodes.
 
 The template system should insert the *diff*, not do a full replace,
 obviously, when new resources are added or previous ones removed.

Yes. And I rely on crmsh to do that. With additional hooks for stale
resource deletion.

 
 And transient load attributes also seem to suggest that you're doing a
 whole lot of that. That's probably beyond what utilization was
 originally spec'ed for (simplifying location constraints for a large
 number of pretty similar resources, e.g., VMs). Can I ask what you're
 doing?

With utilization I just set node utilization attributes to what node X
has right now. I'm free to replace hardware in a cluster, am I not? And
that way I always have utilization attributes consistent with the real
hardware, independently of the CIB configuration I produce with my
template framework. And I use utilization for what it was originally
intended - for VMs.
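
Such an RA boils down to something like this (a rough sketch; the
attribute names are whatever your placement logic expects):

# probe the hardware...
cpus=$(grep -c ^processor /proc/cpuinfo)
ram=$(awk '/^MemTotal:/ {print int($2/1024)}' /proc/meminfo)
# ...and publish it as node utilization
crm node utilization $(uname -n) set cpu $cpus
crm node utilization $(uname -n) set memory $ram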

 
 And what are you doing with tickets? ;-)

Answered in another message.

 
 
 Regards,
 Lars
 

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-07-02 Thread Vladislav Bogdanov
03.07.2013 00:20, Vladislav Bogdanov wrote:
...
 But tickets currently are quite limited - they have only 4 states, so
 it is impossible to put f.e. number there.

 What are you trying to do with that?
 
 That is a very convenient way to e.g. stop a dozen resources in one
 shot for some maintenance. I have a special RA which creates a ticket on
 cluster start and deletes it on cluster stop. And many resources may
 depend on that ticket. If you request the resource handled by that RA to
 stop, the ticket is revoked and all dependent resources stop.

Ah, and in one setup (lustre fs on top of geo-clustered two-layer drbd)
I also use ticket revocation to cause a transition abort in a
controllable way, so that advisory ordering constraints work. The idea
is to do a CIB modification after some event, so that the
advisory-ordered resources are stopped in the same transition, and they
are stopped in the order I want.

IMHO a nice hack ;)

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-07-02 Thread Vladislav Bogdanov
02.07.2013 20:05, Dejan Muhamedagic wrote:
 On Tue, Jul 02, 2013 at 11:05:01AM +0300, Vladislav Bogdanov wrote:
 28.06.2013 17:47, Dejan Muhamedagic wrote:
 ...
 If you want to test here's a new patch. It does work with
 unrelated changes happening in the meantime. I didn't test yet
 really concurrent updates.


 One thing I see immediately, is that node utilization attributes are
 deleted after I do 'load update' with empty node utilization sections.
 That is probably not specific to this patch.
 
 Right.
 
 I have that attributes dynamic, set from a RA (as node configuration may
 vary, I prefer to detect how much CPU and RAM I have and set utilization
 accordingly rather then put every hardware change into CIB).

 On the one hand, I would agree that crmsh does what is intended - if no
 utilization attributes are set in a config update, then they should be
 removed.
 
 Well, thinking more about it, the attributes should be merged.
 The only trouble is that that would then change the command
 semantically.

Not sure that is expected by most people.
How do you then delete attributes?

If you are really thinking about implementing that merging, I would
introduce a crmsh config option for it, e.g. node_attr_policy
(replace|merge).

And the default value should be the current behavior.
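
Something along these lines (purely hypothetical - no such option
exists in crmsh at the moment):

# in the crmsh user preferences
crm options node-attr-policy merge

with 'replace' staying the default, so existing setups keep the current
semantics.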

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] crm node delete

2013-07-01 Thread Vladislav Bogdanov
Hi,

I'm trying to check whether it is now safe to delete non-running nodes
(corosync 2.3, pacemaker HEAD, crmsh tip).

# crm node delete v02-d
WARNING: 2: crm_node bad format: 7 v02-c
WARNING: 2: crm_node bad format: 8 v02-d
WARNING: 2: crm_node bad format: 5 v02-a
WARNING: 2: crm_node bad format: 6 v02-b
INFO: 2: node v02-d not found by crm_node
INFO: 2: node v02-d deleted
#

So, I expect that crmsh still doesn't follow the latest changes to
'crm_node -l'. The node seems to be deleted correctly, though.

For reference, the output of crm_node -l is:
7 v02-c
8 v02-d
5 v02-a
6 v02-b
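
A parser tolerant of both the old 'id name state' and the new 'id name'
format would avoid those warnings; a sketch:

# print only node names, with or without a third (state) column
crm_node -l | awk 'NF >= 2 {print $2}'

# 'is offline' check: fail only if the node is explicitly a member
crm_node -l | awk -v n=v02-d '$2 == n && $3 == "member" {exit 1}'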


Best,
Vladislav


___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] crm node delete

2013-07-01 Thread Vladislav Bogdanov
01.07.2013 18:29, Dejan Muhamedagic wrote:
 Hi,
 
 On Mon, Jul 01, 2013 at 05:29:31PM +0300, Vladislav Bogdanov wrote:
 Hi,

 I'm trying to look if it is now safe to delete non-running nodes
 (corosync 2.3, pacemaker HEAD, crmsh tip).

 # crm node delete v02-d
 WARNING: 2: crm_node bad format: 7 v02-c
 WARNING: 2: crm_node bad format: 8 v02-d
 WARNING: 2: crm_node bad format: 5 v02-a
 WARNING: 2: crm_node bad format: 6 v02-b
 INFO: 2: node v02-d not found by crm_node
 INFO: 2: node v02-d deleted
 #

 So, I expect that crmsh still doesn't follow latest changes to 'crm_node
 -l'. Although node seems to be deleted correctly.

 For reference, output of crm_node -l is:
 7 v02-c
 8 v02-d
 5 v02-a
 6 v02-b
 
 This time the node state was empty. Or it's missing altogether.
 I'm not sure how that's supposed to be interpreted. We test the
 output of crm_node -l just to make sure that the node is not
 online. Perhaps we need to use some other command.

Likely it shows everything from the corosync nodelist.
After I deleted the node from everywhere except corosync, the list is
still the same.
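
For reference, with corosync 2.x these statically configured nodes come
from a corosync.conf nodelist roughly like this (reconstructed from the
ids above):

nodelist {
        node {
                ring0_addr: v02-a
                nodeid: 5
        }
        node {
                ring0_addr: v02-d
                nodeid: 8
        }
}

crm_node -l then reports every node in that list, whether or not it has
been deleted from the CIB.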

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-06-27 Thread Vladislav Bogdanov
26.06.2013 18:30, Dejan Muhamedagic wrote:
 On Wed, Jun 26, 2013 at 06:13:33PM +0300, Vladislav Bogdanov wrote:
 26.06.2013 15:57, Dejan Muhamedagic wrote:
 On Thu, Jun 06, 2013 at 05:19:03PM +0200, Dejan Muhamedagic wrote:
 Hi,

 On Thu, Jun 06, 2013 at 03:11:16PM +0300, Vladislav Bogdanov wrote:
 06.06.2013 08:43, Vladislav Bogdanov wrote:
 [...]
 I recall that LDAP has similar problem, which is easily worked around
 with specifying two values, one is original, second is new.
 That way you tell LDAP server:
 Replace value Y in attribute X to value Z. And if value is not Y at 
 the
 moment of modification request, then command fails.

 cibadmin --patch works this way

 Who is baking new CIB in that case, cibadmin or cib?

 The patch is applied on the server - so cib

 Then that is safe way to go, assuming that cib daemon serializes
 modification requests.


 It would be great if crmsh use that trick.

 Hope to have something soon. Stay tuned.

 The patch for crmsh is attached and you'll need the very latest
 pacemaker (because cibadmin needed some fixing). Unfortunately,
 I cannot push this yet to the repository, as the current
 pacemaker 1.1.10-rc still identifies itself as 1.1.9. I'd
 appreciate if you could test it.

 Seems to work during preliminary testing (stop clone with crm configure
 edit and then start it with crm resource start).
 cib process on the DC reports it received the diff and handles that
 perfectly.

 Thank you!

 I'll build updated package with this patch tomorrow and try to put that
 into real work.
 I mean to try concurrent updates.
 What would be the best way to achieve them?

 Is it enough to start editing with crm configure edit, run some
 concurrent command during that editing, and save afterwards?
 

I meant to ask when crmsh gets the original epoch to construct the
diff: at the very beginning of editing, or right before committing -
there can be a rather big timeframe between those points.

It would be nice to have an intelligent patcher which takes one CIB
snapshot at the beginning of the edit, then generates a diff and checks
whether it applies cleanly to the current CIB (everything except the
epoch). Then it would be possible to use the current epoch in the diff
which goes to the cib daemon.
I do not know whether that makes sense.
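
Expressed as shell steps, the idea would be something like this (a
sketch; the exact crm_diff/cibadmin flags may differ by version):

# snapshot at the start of the editing session
cibadmin -Q > orig.xml
# ... a long edit happens on a copy ...
crm_diff --original orig.xml --new edited.xml > diff.xml
# the cib daemon applies the patch atomically and rejects it if
# intervening changes make it no longer apply
cibadmin --patch --xml-file diff.xml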

Maybe there is a better way to not lose big edits because some small
unrelated changes were made in the meantime?

Or maybe you can describe the algorithm you use, for those who do not
know Python?

 I didn't remove the check for the changes and anyway cib is
 going to refuse to apply the patch if the epoch is older. Of
 course, crmsh can set the epoch attribute to something greater
 than the current epoch.

Didn't get this, sorry. Could you please reword?

 
 What would you suggest?
 
 Cheers,
 
 Dejan
 
 Vladislav

 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-06-26 Thread Vladislav Bogdanov
26.06.2013 15:57, Dejan Muhamedagic wrote:
 On Thu, Jun 06, 2013 at 05:19:03PM +0200, Dejan Muhamedagic wrote:
 Hi,

 On Thu, Jun 06, 2013 at 03:11:16PM +0300, Vladislav Bogdanov wrote:
 06.06.2013 08:43, Vladislav Bogdanov wrote:
 [...]
 I recall that LDAP has similar problem, which is easily worked around
 with specifying two values, one is original, second is new.
 That way you tell LDAP server:
 Replace value Y in attribute X to value Z. And if value is not Y at the
 moment of modification request, then command fails.

 cibadmin --patch works this way

 Who is baking new CIB in that case, cibadmin or cib?

 The patch is applied on the server - so cib

 Then that is safe way to go, assuming that cib daemon serializes
 modification requests.


 It would be great if crmsh use that trick.

 Hope to have something soon. Stay tuned.
 
 The patch for crmsh is attached and you'll need the very latest
 pacemaker (because cibadmin needed some fixing). Unfortunately,
 I cannot push this yet to the repository, as the current
 pacemaker 1.1.10-rc still identifies itself as 1.1.9. I'd
 appreciate if you could test it.

Seems to work during preliminary testing (stopping a clone with crm
configure edit and then starting it with crm resource start).
The cib process on the DC reports it received the diff and handles it
perfectly.

Thank you!

I'll build an updated package with this patch tomorrow and try to put
it into real work. I mean to try concurrent updates.
What would be the best way to achieve them?

Is it enough to start editing with crm configure edit, run some
concurrent command during that editing, and save afterwards?

Vladislav

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] crmsh and fencing_topology

2013-06-13 Thread Vladislav Bogdanov
Dejan,

here is the patch to fix parsing of fencing_topology:
--- a/modules/xmlutil.py	2013-06-07 07:21:10.000000000 +0000
+++ b/modules/xmlutil.py	2013-06-13 07:51:09.704924693 +0000
@@ -937,7 +937,7 @@ def get_set_nodes(e,setname,create = 0):
 
 def xml_noorder_hash(n):
     return sorted([ hash(etree.tostring(x)) \
-                    for x in n.iterchildren() if is_element(c) ])
+                    for x in n.iterchildren() if is_element(x) ])
 xml_hash_d = {
     "fencing-topology": xml_noorder_hash,
 }
---

Unfortunately, that still doesn't fully fix the problem, because
<fencing-topology/> is inserted into an extra <configuration/> node:

<cib ...>
  <configuration>
    ...
    <configuration>
      <fencing-topology/>
    </configuration>
  </configuration>
</cib>

Can you please look at this? I expect the fix to be a one-line patch as
well ;)

Best,
Vladislav

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-06-06 Thread Vladislav Bogdanov
06.06.2013 09:02, Andrew Beekhof wrote:
 
 On 06/06/2013, at 3:45 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote:
 
 06.06.2013 08:14, Andrew Beekhof wrote:

 On 06/06/2013, at 2:50 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote:

 06.06.2013 07:31, Andrew Beekhof wrote:

 On 06/06/2013, at 2:27 PM, Vladislav Bogdanov bub...@hoster-ok.com 
 wrote:

 05.06.2013 02:04, Andrew Beekhof wrote:

 On 05/06/2013, at 5:08 AM, Ferenc Wagner wf...@niif.hu wrote:

 Dejan Muhamedagic deja...@fastmail.fm writes:

 On Mon, Jun 03, 2013 at 06:19:06PM +0200, Ferenc Wagner wrote:

 I've got a script for resource creation, which puts the new resource 
 in
 a shadow CIB together with the necessary constraints, runs a 
 simulation
 and finally offers to commit the shadow CIB into the live config (by
 invoking an interactive crm).  This works well.  My concern is that 
 if
 somebody else (another cluster administrator) changes anything in the
 cluster configuration between creation of the shadow copy and the
 commit, those changes will be silently reverted (lost) by the commit.
 Is there any way to avoid the possibility of this?  According to
 http://article.gmane.org/gmane.linux.highavailability.pacemaker/11021,
 crm provides this functionality for its configure sessions [*], but 
 the
 shadow CIB route has good points as well (easier to script via 
 cibadmin,
 simulation), which I'd like to use.  Any ideas?

 Record the two epoch attributes of the cib tag at the beginning
 and check if they changed just before applying the changes.

 Maybe I don't understand you right, but isn't this just narrowing the
 time window of the race?  After all, that concurrent change can happen
 between the epoch check and the commit, can't it?

 The CIB will refuse to accept any update with a lower version:

 http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/_configuration_version.html

 I recall that LDAP has similar problem, which is easily worked around
 with specifying two values, one is original, second is new.
 That way you tell LDAP server:
 Replace value Y in attribute X to value Z. And if value is not Y at the
 moment of modification request, then command fails.

 cibadmin --patch works this way

 Who is baking new CIB in that case, cibadmin or cib?

 The patch is applied on the server - so cib

Ah, one more question: the whole modification request is rejected if
any of the patch hunks fails, correct?
 
 Correct (and yes everything is serialized _unless_ you start using the -l 
 cibadmin option)

Great. Thanks.

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-06-06 Thread Vladislav Bogdanov
06.06.2013 08:43, Vladislav Bogdanov wrote:
[...]
 I recall that LDAP has similar problem, which is easily worked around
 with specifying two values, one is original, second is new.
 That way you tell LDAP server:
 Replace value Y in attribute X to value Z. And if value is not Y at the
 moment of modification request, then command fails.

 cibadmin --patch works this way

 Who is baking new CIB in that case, cibadmin or cib?

 The patch is applied on the server - so cib
 
 Then that is safe way to go, assuming that cib daemon serializes
 modification requests.
 

It would be great if crmsh used that trick.

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-06-05 Thread Vladislav Bogdanov
05.06.2013 02:04, Andrew Beekhof wrote:
 
 On 05/06/2013, at 5:08 AM, Ferenc Wagner wf...@niif.hu wrote:
 
 Dejan Muhamedagic deja...@fastmail.fm writes:

 On Mon, Jun 03, 2013 at 06:19:06PM +0200, Ferenc Wagner wrote:

 I've got a script for resource creation, which puts the new resource in
 a shadow CIB together with the necessary constraints, runs a simulation
 and finally offers to commit the shadow CIB into the live config (by
 invoking an interactive crm).  This works well.  My concern is that if
 somebody else (another cluster administrator) changes anything in the
 cluster configuration between creation of the shadow copy and the
 commit, those changes will be silently reverted (lost) by the commit.
 Is there any way to avoid the possibility of this?  According to
 http://article.gmane.org/gmane.linux.highavailability.pacemaker/11021,
 crm provides this functionality for its configure sessions [*], but the
 shadow CIB route has good points as well (easier to script via cibadmin,
 simulation), which I'd like to use.  Any ideas?

 Record the two epoch attributes of the cib tag at the beginning
 and check if they changed just before applying the changes.

 Maybe I don't understand you right, but isn't this just narrowing the
 time window of the race?  After all, that concurrent change can happen
 between the epoch check and the commit, can't it?
 
 The CIB will refuse to accept any update with a lower version:
 

 http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/_configuration_version.html

I recall that LDAP has a similar problem, which is easily worked around
by specifying two values: one is the original, the second is the new
one. That way you tell the LDAP server:
Replace value Y in attribute X with value Z. And if the value is not Y
at the moment of the modification request, then the command fails.

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-06-05 Thread Vladislav Bogdanov
06.06.2013 07:31, Andrew Beekhof wrote:
 
 On 06/06/2013, at 2:27 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote:
 
 05.06.2013 02:04, Andrew Beekhof wrote:

 On 05/06/2013, at 5:08 AM, Ferenc Wagner wf...@niif.hu wrote:

 Dejan Muhamedagic deja...@fastmail.fm writes:

 On Mon, Jun 03, 2013 at 06:19:06PM +0200, Ferenc Wagner wrote:

 I've got a script for resource creation, which puts the new resource in
 a shadow CIB together with the necessary constraints, runs a simulation
 and finally offers to commit the shadow CIB into the live config (by
 invoking an interactive crm).  This works well.  My concern is that if
 somebody else (another cluster administrator) changes anything in the
 cluster configuration between creation of the shadow copy and the
 commit, those changes will be silently reverted (lost) by the commit.
 Is there any way to avoid the possibility of this?  According to
 http://article.gmane.org/gmane.linux.highavailability.pacemaker/11021,
 crm provides this functionality for its configure sessions [*], but the
 shadow CIB route has good points as well (easier to script via cibadmin,
 simulation), which I'd like to use.  Any ideas?

 Record the two epoch attributes of the cib tag at the beginning
 and check if they changed just before applying the changes.

 Maybe I don't understand you right, but isn't this just narrowing the
 time window of the race?  After all, that concurrent change can happen
 between the epoch check and the commit, can't it?

 The CIB will refuse to accept any update with a lower version:

   
 http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/_configuration_version.html

 I recall that LDAP has similar problem, which is easily worked around
 with specifying two values, one is original, second is new.
 That way you tell LDAP server:
 Replace value Y in attribute X to value Z. And if value is not Y at the
 moment of modification request, then command fails.
 
 cibadmin --patch works this way

Who is baking new CIB in that case, cibadmin or cib?

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-06-05 Thread Vladislav Bogdanov
06.06.2013 08:14, Andrew Beekhof wrote:
 
 On 06/06/2013, at 2:50 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote:
 
 06.06.2013 07:31, Andrew Beekhof wrote:

 On 06/06/2013, at 2:27 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote:

 05.06.2013 02:04, Andrew Beekhof wrote:

 On 05/06/2013, at 5:08 AM, Ferenc Wagner wf...@niif.hu wrote:

 Dejan Muhamedagic deja...@fastmail.fm writes:

 On Mon, Jun 03, 2013 at 06:19:06PM +0200, Ferenc Wagner wrote:

 I've got a script for resource creation, which puts the new resource in
 a shadow CIB together with the necessary constraints, runs a simulation
 and finally offers to commit the shadow CIB into the live config (by
 invoking an interactive crm).  This works well.  My concern is that if
 somebody else (another cluster administrator) changes anything in the
 cluster configuration between creation of the shadow copy and the
 commit, those changes will be silently reverted (lost) by the commit.
 Is there any way to avoid the possibility of this?  According to
 http://article.gmane.org/gmane.linux.highavailability.pacemaker/11021,
 crm provides this functionality for its configure sessions [*], but the
 shadow CIB route has good points as well (easier to script via 
 cibadmin,
 simulation), which I'd like to use.  Any ideas?

 Record the two epoch attributes of the cib tag at the beginning
 and check if they changed just before applying the changes.

 Maybe I don't understand you right, but isn't this just narrowing the
 time window of the race?  After all, that concurrent change can happen
 between the epoch check and the commit, can't it?

 The CIB will refuse to accept any update with a lower version:

  
 http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/_configuration_version.html

 I recall that LDAP has similar problem, which is easily worked around
 with specifying two values, one is original, second is new.
 That way you tell LDAP server:
 Replace value Y in attribute X to value Z. And if value is not Y at the
 moment of modification request, then command fails.

 cibadmin --patch works this way

 Who is baking new CIB in that case, cibadmin or cib?
 
 The patch is applied on the server - so cib

Then that is a safe way to go, assuming that the cib daemon serializes
modification requests.

Thanks for sharing the info.

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)

2013-06-05 Thread Vladislav Bogdanov
06.06.2013 08:14, Andrew Beekhof wrote:
 
 On 06/06/2013, at 2:50 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote:
 
 06.06.2013 07:31, Andrew Beekhof wrote:

 On 06/06/2013, at 2:27 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote:

 05.06.2013 02:04, Andrew Beekhof wrote:

 On 05/06/2013, at 5:08 AM, Ferenc Wagner wf...@niif.hu wrote:

 Dejan Muhamedagic deja...@fastmail.fm writes:

 On Mon, Jun 03, 2013 at 06:19:06PM +0200, Ferenc Wagner wrote:

 I've got a script for resource creation, which puts the new resource in
 a shadow CIB together with the necessary constraints, runs a simulation
 and finally offers to commit the shadow CIB into the live config (by
 invoking an interactive crm).  This works well.  My concern is that if
 somebody else (another cluster administrator) changes anything in the
 cluster configuration between creation of the shadow copy and the
 commit, those changes will be silently reverted (lost) by the commit.
 Is there any way to avoid the possibility of this?  According to
 http://article.gmane.org/gmane.linux.highavailability.pacemaker/11021,
 crm provides this functionality for its configure sessions [*], but the
 shadow CIB route has good points as well (easier to script via 
 cibadmin,
 simulation), which I'd like to use.  Any ideas?

 Record the two epoch attributes of the cib tag at the beginning
 and check if they changed just before applying the changes.

 Maybe I don't understand you right, but isn't this just narrowing the
 time window of the race?  After all, that concurrent change can happen
 between the epoch check and the commit, can't it?

 The CIB will refuse to accept any update with a lower version:

  
 http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/_configuration_version.html

 I recall that LDAP has similar problem, which is easily worked around
 with specifying two values, one is original, second is new.
 That way you tell LDAP server:
 Replace value Y in attribute X to value Z. And if value is not Y at the
 moment of modification request, then command fails.

 cibadmin --patch works this way

 Who is baking new CIB in that case, cibadmin or cib?
 
 The patch is applied on the server - so cib

Ah, one more question: the whole modification request is rejected if
any of the patch hunks fails, correct?

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Behaviour of fence/stonith device fence_imm

2013-04-17 Thread Vladislav Bogdanov
16.04.2013 12:47, Andreas Mock wrote:
 Hi Marek, hi all,
 
 we just investigated this problem a little further while
 looking at the sources of fence_imm.
 
 It seems that the IMM device does a soft shutdown despite
 documented differently. I can reproduce this with the
 ipmitool directly and also using ssh access.
 
 The only thing which seems to work in the expected rigorous
 way is the ipmi-command 'power reset'. But with this
 command I can't shutdown the server.

Do you have acpid running?
If yes, try to stop/disable it. IIRC that should help.
Of course, what you see is a bug in the BMC: it should do a hard off on
a 'chassis power off' command. For a graceful shutdown (with proper ACPI
signalling) there is a 'chassis power soft' command. I'd report that to
the vendor (although it may have been implemented that way deliberately).
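
For a quick check outside the cluster (host and credentials are
placeholders):

# hard off - what a fence agent needs; the machine must drop instantly
ipmitool -I lanplus -H bmc-host -U admin -P secret chassis power off
# graceful ACPI shutdown - what this BMC apparently does for 'off'
ipmitool -I lanplus -H bmc-host -U admin -P secret chassis power soft

If 'power off' still shuts the OS down gracefully with acpid stopped,
that strongly points at the BMC firmware.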

 
 I'll offer more informations when I get feedback to this
 behaviour.
 
 Best regards
 Andreas
 
 
 -----Original Message-----
 From: linux-ha-boun...@lists.linux-ha.org
 [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Marek Grac
 Sent: Monday, 15 April 2013 11:02
 To: Andrew Beekhof
 Cc: General Linux-HA mailing list
 Subject: Re: [Linux-HA] Behaviour of fence/stonith device fence_imm
 
 Hi,
 
 On 04/15/2013 04:17 AM, Andrew Beekhof wrote:
 On 13/04/2013, at 12:21 AM, Andreas Mock andreas.m...@web.de wrote:

 Hi all,

 just played with the fence/stonith device fence_imm.
 (as part of pacemaker on RHEL6.x and clones)

 It is configured to use the action 'reboot'.
 This action seems to cause a graceful reboot of the node.

 My question. Is this graceful reboot feasible when the node
 gets unreliable or would it be better to power cycle the
 machine (off/on)?
 Yes, it will. For fence_imm the standard IPMILAN fence agent is used
 without additional options. It uses a method described by you: power
 off / check status / power on; it looks like there are some changes in
 IMM we are not aware of. Please file a bugzilla for this issue; if you
 can do a proper non-graceful power off using ipmitool, please add it too.
 
 How can I achieve that the fence_imm is making a power cycle
 (off/on) instead of a soft reboot?

 Yes, you can use -M (method in STDIN/cluster configuration) with values 
 'onoff' (default) or 'cycle' (use reboot command on IPMI)
 
 m,
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] master/slave drbd resource STILL will not failover

2012-12-01 Thread Vladislav Bogdanov
02.12.2012 00:34, Robinson, Eric wrote:

 Try to set 'target-role=Started' in both of them.

 
 Okay, but how does that address the problem of error code 11 from drbdadm?

Well, you have an error promoting resources. 11 is EAGAIN, usually
meaning you did not demote the other side.

Your logs contain
Nov 27 15:32:15 [24609] ha09a   crmd:debug: do_lrm_rsc_op:
Performing key=13:750:0:8267fa3b-4f5f-45f6-89c9-fb7540f471b3
op=p_drbd0_demote_0

And that is the first mention of the word 'demote' in the log you
provided (frankly speaking, a log from the DC would help much more).

drbd(p_drbd1)[12814]:   2012/11/27_15:31:59 ERROR: ha02_mysql: Exit code 11

That happened 16 seconds before the demote was initiated.

I recall that pacemaker operates differently (I'd say wrongly) if you
have target-role=Master for a ms resource, as opposed to a normal
target-role=Started.

Is that enough? ;)

Vladislav
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] master/slave drbd resource STILL will not failover

2012-11-30 Thread Vladislav Bogdanov
30.11.2012 00:14, Robinson, Eric wrote:
 Bump... does anyone have some insight on this? Google is not turning up 
 anything useful.
 
 Our newest cluster will not failover master/slave drbd resources. It works 
 fine manually using drbdadm from a shell prompt, but when we try it using 
 'crm node standby' and letting the cluster manage the resource, crm_mon just 
 keeps saying the resource FAILED.
 
 We see a lot of these messages in the corosync.log file:
 
 drbd(p_drbd1)[12814]:   2012/11/27_15:31:59 DEBUG: ha02_mysql: Calling 
 drbdadm -c /etc/drbd.conf primary ha02_mysql
 drbd(p_drbd1)[12814]:   2012/11/27_15:31:59 ERROR: ha02_mysql: Called drbdadm 
 -c /etc/drbd.conf primary ha02_mysql
 drbd(p_drbd1)[12814]:   2012/11/27_15:31:59 ERROR: ha02_mysql: Exit code 11
 
 There is no indication of what may be causing the 'Exit code 11'
 
 Here is a link to the corosync log, taken from the standby server (ha09a) 
 where we are trying to fail the resource to...
 
 www.psmnv.com/downloads/corosync1.loghttp://www.psmnv.com/downloads/corosync1.log
 
 Here is what I have installed...
 
 corosync-1.4.1-7.el6_3.1.x86_64
 corosynclib-1.4.1-7.el6_3.1.x86_64
 pacemaker-1.1.8-4.el6.x86_64
 pacemaker-cli-1.1.8-4.el6.x86_64
 pacemaker-cluster-libs-1.1.8-4.el6.x86_64
 pacemaker-libs-1.1.8-4.el6.x86_64
 
 Following is my crm config. It's pretty basic.
 
 
 node ha09a \
 attributes standby=off
 node ha09b \
 attributes standby=off
 primitive p_drbd0 ocf:linbit:drbd \
 params drbd_resource=ha01_mysql \
 op monitor interval=60s
 primitive p_drbd1 ocf:linbit:drbd \
 params drbd_resource=ha02_mysql \
 op monitor interval=45s
 primitive p_vip_clust08 ocf:heartbeat:IPaddr2 \
 params ip=192.168.10.210 cidr_netmask=32 \
 op monitor interval=30s
 primitive p_vip_clust09 ocf:heartbeat:IPaddr2 \
 params ip=192.168.10.211 cidr_netmask=32 \
 op monitor interval=30s
 ms ms_drbd0 p_drbd0 \
 meta master-max=1 master-node-max=1 clone-max=2 
 clone-node-max=1 notify=true target-role=Master
 ms ms_drbd1 p_drbd1 \
 meta master-max=1 master-node-max=1 clone-max=2 
 clone-node-max=1 notify=true target-role=Master

Try to set 'target-role=Started' in both of them.
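
That is, keeping everything else as it is:

ms ms_drbd0 p_drbd0 \
        meta master-max=1 master-node-max=1 clone-max=2 \
        clone-node-max=1 notify=true target-role=Started
ms ms_drbd1 p_drbd1 \
        meta master-max=1 master-node-max=1 clone-max=2 \
        clone-node-max=1 notify=true target-role=Started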

Vladislav

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] cib_replace failed?

2012-11-01 Thread Vladislav Bogdanov
31.10.2012 20:55, Robinson, Eric wrote:
 Okay, the two node names are ha09a and ha09b. Starting clean with all 
 services turned off.
 
 This is what I get in /var/log/corosync.log on ha09a when I start corosync...
 
 Oct 31 10:22:43 corosync [MAIN  ] Corosync Cluster Engine ('1.4.3'): started 
 and ready to provide service.
 Oct 31 10:22:43 corosync [MAIN  ] Corosync built-in features: nss
 Oct 31 10:22:43 corosync [MAIN  ] Successfully read main configuration file 
 '/etc/corosync/corosync.conf'.
 Oct 31 10:22:43 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
 Oct 31 10:22:43 corosync [TOTEM ] Initializing transmit/receive security: 
 libtomcrypt SOBER128/SHA1HMAC (mode 0).
 Oct 31 10:22:43 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
 Oct 31 10:22:43 corosync [TOTEM ] Initializing transmit/receive security: 
 libtomcrypt SOBER128/SHA1HMAC (mode 0).
 Set r/w permissions for uid=0, gid=0 on /var/log/corosync.log
 Oct 31 10:22:43 corosync [TOTEM ] The network interface [192.168.10.58] is 
 now up.
 Oct 31 10:22:43 corosync [pcmk  ] Logging: Initialized pcmk_startup
 Oct 31 10:22:43 corosync [SERV  ] Service engine loaded: Pacemaker Cluster 
 Manager 1.1.7
 Oct 31 10:22:43 corosync [SERV  ] Service engine loaded: corosync extended 
 virtual synchrony service
 Oct 31 10:22:43 corosync [SERV  ] Service engine loaded: corosync 
 configuration service
 Oct 31 10:22:43 corosync [SERV  ] Service engine loaded: corosync cluster 
 closed process group service v1.01
 Oct 31 10:22:43 corosync [SERV  ] Service engine loaded: corosync cluster 
 config database access v1.01
 Oct 31 10:22:43 corosync [SERV  ] Service engine loaded: corosync profile 
 loading service
 Oct 31 10:22:43 corosync [SERV  ] Service engine loaded: corosync cluster 
 quorum service v0.1
 Oct 31 10:22:43 corosync [MAIN  ] Compatibility mode set to whitetank.  Using 
 V1 and V2 of the synchronization engine.
 Oct 31 10:22:43 corosync [TOTEM ] The network interface [198.51.100.58] is 
 now up.
 Oct 31 10:22:44 corosync [TOTEM ] Incrementing problem counter for seqid 1 
 iface 198.51.100.58 to [1 of 10]
 Oct 31 10:22:44 corosync [TOTEM ] A processor joined or left the membership 
 and a new membership was formed.
 Oct 31 10:22:44 corosync [CPG   ] chosen downlist: sender r(0) 
 ip(192.168.10.58) r(1) ip(198.51.100.58) ; members(old:0 left:0)
 Oct 31 10:22:44 corosync [MAIN  ] Completed service synchronization, ready to 
 provide service.
 Oct 31 10:22:44 corosync [TOTEM ] A processor joined or left the membership 
 and a new membership was formed.
 Oct 31 10:22:44 corosync [CPG   ] chosen downlist: sender r(0) 
 ip(192.168.10.58) r(1) ip(198.51.100.58) ; members(old:1 left:0)
 Oct 31 10:22:44 corosync [MAIN  ] Completed service synchronization, ready to 
 provide service.
 Oct 31 10:22:46 corosync [TOTEM ] ring 1 active with no faults
 
 
 Some things seem to be missing from the log. According to the ClusterLabs 
 docs, I should be seeing entries similar to the following, but I am NOT. (The 
 following are adapted from the ClusterLabs documentation. They are NOT 
 showing up in my logs.)
 
 
 Aug 27 09:05:35 ha09a corosync[1540]: [pcmk ] info: pcmk_startup: CRM: 
 Initialized
 Aug 27 09:05:35 ha09a corosync[1540]: [pcmk ] Logging: Initialized 
 pcmk_startup
 Aug 27 09:05:35 ha09a corosync[1540]: [pcmk ] info: pcmk_startup: Maximum 
 core file size is: 18446744073709551615
 Aug 27 09:05:35 ha09a corosync[1540]: [pcmk ] info: pcmk_startup: Service: 9
 Aug 27 09:05:35 ha09a corosync[1540]: [pcmk ] info: pcmk_startup: Local 
 hostname: ha09a
 
 
 One thing that does stand out to me is that we are seeing the following line 
 in the log...
 
 Oct 31 10:22:43 corosync [SERV  ] Service engine loaded: Pacemaker Cluster 
 Manager 1.1.7
 
 ..however we have Pacemaker 1.1.8 installed, not 1.1.7.
 
 Where is that 1.1.7 coming from?
 
 Here is what we have installed...
 
 [root@ha09a log]# rpm -qa | egrep 'pacem|coros'
 pacemaker-1.1.8-0.901.eedc0cc.git.el6.x86_64
 pacemaker-cluster-libs-1.1.8-0.901.eedc0cc.git.el6.x86_64

I suspect that the version you run (pre-1.1.8,
https://github.com/ClusterLabs/pacemaker/commit/eedc0cc9601d563a38ff3185414694bfbeb7ff76)
actually has problems with corosync1 (plugin-based) setups. I think the
relevant fix was
https://github.com/ClusterLabs/pacemaker/commit/89c817d795da535fca667a848d6b0503a120129a,
which was committed two days later.

Why not try the official 1.1.8, which should have all of this fixed?

Vladislav

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] clvm/dlm/gfs2 hangs if a node crashes

2012-03-15 Thread Vladislav Bogdanov
14.03.2012 00:42, William Seligman wrote:
[snip]
 These were the log messages, which show that stonith_admin did its job and 
 CMAN
 was notified of the fencing: http://pastebin.com/jaH820Bv.

Could you please look at the output of 'dlm_tool ls' and 'dlm_tool dump'?

You probably have the 'kern_stop' and 'fencing' flags there. That means
that dlm is unaware that the node has been fenced.
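
While a fence is pending, the output looks roughly like this
(illustrative only - exact fields vary with the dlm version):

# dlm_tool ls
dlm lockspaces
name          clvmd
id            0x4104eefa
flags         0x00000004 kern_stop
change        member 1 joined 0 remove 1 failed 1 seq 2,2
members       1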

 
 Unfortunately, I still got the gfs2 freeze, so this is not the complete story.

Both clvmd and gfs2 use dlm. If the dlm layer thinks fencing has not
completed, both of them freeze.

Best,
Vladislav
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] clvm/dlm/gfs2 hangs if a node crashes

2012-03-15 Thread Vladislav Bogdanov
15.03.2012 18:43, William Seligman wrote:
 On 3/15/12 3:43 AM, Vladislav Bogdanov wrote:
 14.03.2012 00:42, William Seligman wrote:
 [snip]
 These were the log messages, which show that stonith_admin did its job and 
 CMAN
 was notified of the fencing: http://pastebin.com/jaH820Bv.

 Could you please look at the output of 'dlm_tool ls' and 'dlm_tool dump'?

 You probably have 'kern_stop' and 'fencing' flags there. That means that
 dlm is unaware that node is fenced.
 
 Here's 'dlm_tool ls' with both nodes running cman+clvmd+gfs2:
 http://pastebin.com/QrZtm1Ue
 
 'dlm_tool dump': http://pastebin.com/UKWxx9Y4
 
 For comparison, I crashed one node and looked at the same output on the
 remaining node:
 dlm_tool ls: http://pastebin.com/cKVAGxsd
 dlm_tool dump: http://pastebin.com/c0h0p22Q (the post-crash lines begin at
 1331824940)

Everything is fine there: dlm correctly understands that the node is
fenced and returns to a normal state.

The only minor issue I see is that fencing took a long time - 21 seconds.

 
 I don't see the kern_stop or fencing flags. There's another thing I don't
 see: at the top of 'dlm_tool dump' it displays most of the contents of my
 cluster.conf file, except for the fencing sections. Here's my cluster.conf for
 comparison: http://pastebin.com/w5XNYyAX

It also looks correct (I mean fence_pcmk), but I may be wrong here, as
I do not use cman.

 
 cman doesn't see anything wrong in my cluster.conf file:
 
 # ccs_config_validate
 Configuration validates
 
 But could there be something that's causing the fencing sections to be 
 ignored?
 

 Unfortunately, I still got the gfs2 freeze, so this is not the complete 
 story.

 Both clvmd and gfs2 use dlm. If dlm layer thinks fencing is not
 completed, both of them freeze.
 
 I did 'grep -E (dlm|clvm|fenc) /var/log/messages' and looked at the time I
 crashed the node: http://pastebin.com/dvBtdLUs. I see lines that indicate 
 that
 pacemaker and drbd are fencing the node, but nothing from dlm or clvmd. Does
 this indicate what you suggest: Could dlm somehow be ignoring or overlooking 
 the
 fencing I put in? Is there any other way to check this?

No, dlm_controld (and friends) mostly uses a different logging method -
that is what you see in 'dlm_tool dump'.
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-ha-dev] [ha-wg-technical] Proposal to drop vgck in LVM RA

2012-02-20 Thread Vladislav Bogdanov
Hi Dejan,

20.02.2012 20:23, Dejan Muhamedagic wrote:
 Hi Vladislav,
 
 On Fri, Feb 03, 2012 at 10:33:52AM +0300, Vladislav Bogdanov wrote:
 Hi Dejan, all,

 02.02.2012 19:44, Dejan Muhamedagic wrote:
 Hello all,

 Sorry for crossposting, but can anybody comment on the matter
 bellow? Thanks!

 Running LVM operations can fail the monitor op due to timeouts. I
 experienced that many times before I switched to a home-brew RA for LVM.
 There I only check for existence of /dev/VG[/LV].
 Of course you need to obtain real status from LVM stack for start and
 stop ops.

 Please look at the attached stripped-down version of the RA I actually
 use (I quickly removed bits of code which are very site-specific or too
 experimental and of no interest to anyone). I wanted to send it long
 ago, but you all know, some guys need activation to do something they
 wish but do not actually need ;)

 I'd say that RA is (near) production-quality and had been extensively
 tested on several clusters. What is attached is probably twentieth
 revision/rewrite, and it runs almost a year without modifications, not
 causing any problems (unlike stock LVM RA for me).

 If you wish, you may include some ideas from it into LVM RA or just
 include attached as an alternative implementation (after some light
 testing because of removed code).

 It has enough comments/logs in critical sections, so I hope it should be
 clear to the reader.

 Main ideas lying behind that RA are:
 * Do not run LVM commands on monitor (they are simply not needed). This
 also helps with tolerating iSCSI link failures.
 
 Thanks for confirming this.
 
 * Skip LVM locking where it is not needed (borrowed from RedHat's
 lvm.sh). Useful when clvm waits for fencing (it would not allow any
 command to succeed until fencing is done, so the RA may time out on
 monitor even when the LV is actually available to the system).
 
 Good point.
 
 * Use timeout so as not to hang forever. It is better to try again.
 
 I'm sure it's better, but it would be good to know why. timeout
 from coreutils is still fairly new.

At least to print something sane to the logs. "RA timed out" is not very
informative. "LVM command XXX timed out" has much more info.
I also suspect that some cLVM commands highly depend on the cluster and
DLM state, which may change over time (between two command runs). I
can't tell for sure, but I suspect possible deadlocks if a command arrives
at the wrong time (when the cluster state is wrong). Please note that
pacemaker's and DLM's points of view on the cluster state may differ (e.g.
pacemaker with the openais plugin allows a quick node leave-join while DLM
does not). So pacemaker may run commands while dlm waits for fencing. I
fixed this for my setups, but upstream dlm_controld.pcmk (and the version in
SUSE) is affected by this. This should change with corosync 2.0, where
pacemaker will use CPG too.
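To illustrate that logging point, a hedged sketch of such a wrapper (the
helper name is made up, not taken from the posted RA; it assumes
ocf-shellfuncs is sourced and coreutils timeout is available):

run_lvm() {
    # Run an LVM command under coreutils 'timeout' so the log can name
    # the exact command that hung instead of a generic "RA timed out".
    local cmd="$1"; shift
    timeout -s KILL 30 "$cmd" "$@"
    local rc=$?
    if [ $rc -eq 124 ] || [ $rc -eq 137 ]; then
        ocf_log err "LVM command '$cmd $*' timed out"
        return $OCF_ERR_GENERIC
    fi
    return $rc
}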

 
 * Use realtime scheduling priority (because otherwise LVM commands may
 run for ages under high load, even with a well-tuned filter in lvm.conf).
 This can cut run time by a factor of up to 20 (from 60 secs down to 3) in
 some circumstances.
 
 In case monitors are this light, how does this help? For start/stop?

Just for all LVM commands, and yes, vgchange/lvchange for start/stop are
the critical ones.
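For illustration (an assumed invocation, not necessarily how the posted
RA does it), util-linux chrt can give such a command a realtime slice:

# Run vgchange under SCHED_RR so it is not starved under heavy I/O load;
# the priority value here is only an example.
chrt -r 10 vgchange -aly "$OCF_RESKEY_vg_name"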

 
 * Allow VG/LV management to be separated:
 ** Allow a VG to be just made known to the system without activating any
 LVs in it.
 ** Allow per-LV management (managing single LVs requires the operation
 from the previous item to be done first).

 The only limitation is that empty VGs are not supported (there is a
 comment in the code describing why).

 And, it requires bash.

 Hope you find that useful,
 
 Probably, but just as an example. LVM2 is out of the question:
 IPaddr2/IPaddr turned out not to be such a great idea.

It's up to you.
BTW, the software package this RA works with is named LVM2 ;)

Best,
Vladislav

 
 Thanks,
 
 Dejan
 
 Best,
 Vladislav


 Dejan

 On Tue, Jan 10, 2012 at 02:22:35PM +0100, Dejan Muhamedagic wrote:
 Hi Hideo-san,

 On Tue, Jan 10, 2012 at 11:28:12AM +0900, renayama19661...@ybb.ne.jp wrote:
 Hi Dejan,

 How do you think about this matter?

 I'm still inclined to drop vgck from monitor and use it just
 before start. I wouldn't even consider that a regression.

 I'm also not sure what vgck offers in comparison with
 vgdisplay, and whether both actually work with the on-disk LVM
 metadata. In that case we should drop vgdisplay as well and find
 another (and better) way to monitor VGs.

 Anybody with deeper knowledge on LVM?

 Cheers,

 Dejan

 Best Regards,
 Hideo Yamauchi.


 --- On Thu, 2011/12/8, renayama19661...@ybb.ne.jp 
 renayama19661...@ybb.ne.jp wrote:

 Hi Dejan,

 Thank you for comment.
 We are examining a correction of LVM_validate_all.
 Because the handling of vgck influences it, I am going to follow the
 decision reached in this discussion.

 For example, even the following simple choice may be good.
  * Add an exec_vgck parameter
    * true (default): run the vgck command.
    * false: do not run the vgck command.

 Best Regards,
 Hideo

Re: [Linux-ha-dev] [ha-wg-technical] Proposal to drop vgck in LVM RA

2012-02-02 Thread Vladislav Bogdanov
 https://lists.linuxfoundation.org/mailman/listinfo/ha-wg-technical
 ___
 Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
 Home Page: http://linux-ha.org/

#!/bin/bash
#
# LV management RA
#
# Copyright (c) 2011 Vladislav Bogdanov bub...@hoster-ok.com
#
# Partially based on LVM RA by Alan Robertson (Copyright: (C) 2002 - 2005
# International Business Machines, Inc.) and lvm.sh RA by Redhat.
#
###
# Initialization:

: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
. ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs

###

OCF_RESKEY_activation_mode_default=auto

: ${OCF_RESKEY_activation_mode=${OCF_RESKEY_activation_mode_default}}
: ${OCF_RESKEY_force_stop=0}
: ${OCF_RESKEY_verify_stopped_on_stop=0}

need_real_status=0

usage() {
  cat <<EOF
usage: $0 {start|stop|reload|monitor|validate-all|meta-data}
EOF
}

meta_data() {
cat <<EOF
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="LVM2">
<version>1.0</version>

<longdesc lang="en">
Resource script for LVM. It manages a Linux Volume Manager volume (LVM)
as an HA resource.
</longdesc>
<shortdesc lang="en">Controls the availability of an LVM Volume or Group</shortdesc>

<parameters>
<parameter name="vg_name" unique="0" required="1">
<longdesc lang="en">
The name of the volume group.
</longdesc>
<shortdesc lang="en">Volume group name</shortdesc>
<content type="string" default="" />
</parameter>

<parameter name="lv_name" unique="0">
<longdesc lang="en">
The name of the only logical volume to activate.
If empty, then all volumes will be activated unless activation_mode
is set to none.
</longdesc>
<shortdesc lang="en">Logical volume name</shortdesc>
<content type="string" default="" />
</parameter>

<parameter name="activation_mode" unique="0" required="0">
<longdesc lang="en">
Specifies activation mode for VG (LV).
Could be one of:
auto      - Activate all volumes if none is specified by 'lv_name',
            otherwise activate only the specified volume. Clustered
            volumes and groups are activated in local mode if the
            resource is running as a clone and in exclusive mode
            otherwise.
none      - only for VGs, do not activate/deactivate volumes,
            just make sure the VG is known to the kernel on start, and
            look for active volumes on monitor. Useful if one
            wants to separate VG and LV monitoring.
local     - only for (volumes in) clustered groups. Make local
            activation (-aly). Default if the resource is run as a clone.
exclusive - only for (volumes in) clustered groups. Make exclusive
            activation (-aey). Default if the resource is not run as a
            clone. The RA will complain if this is specified for a cloned
            resource.
</longdesc>
<shortdesc lang="en">VG/LV activation mode</shortdesc>
<content type="string" default="${OCF_RESKEY_activation_mode_default}" />
</parameter>

<parameter name="force_stop" unique="0">
<longdesc lang="en">
Force all logical volumes in the group to be deactivated if
activation_mode is set to none. The RA will fail in this case if
deactivation fails.
Only for VG-level resources (lv_name is empty).
</longdesc>
<shortdesc lang="en">Force deactivation of all volumes</shortdesc>
<content type="boolean" default="0" />
</parameter>

<parameter name="verify_stopped_on_stop" unique="0">
<longdesc lang="en">
Fail on stop if activation_mode is set to none and the VG has active volumes.
Only for VG-level resources (lv_name is empty).
</longdesc>
<shortdesc lang="en">Fail on stop if VG has active volumes</shortdesc>
<content type="boolean" default="0" />
</parameter>
</parameters>

<actions>
<action name="start" timeout="240" />
<action name="stop" timeout="240" />
<action name="reload" timeout="120" />
<action name="monitor" depth="0" timeout="60" interval="30" />
<action name="meta-data" timeout="5" />
<action name="validate-all" timeout="5" />
</actions>
</resource-agent>
EOF
}

# Global vars
clustered=
activation_modifier=

check_activation_mode() {

    case "${OCF_RESKEY_activation_mode}" in
    local)
        if [ "${clustered}" -eq 0 ] ; then
            ocf_log err "Rejecting to operate in local activation mode for non-clustered volume, use activation_mode={auto|none} instead."
            return $OCF_ERR_CONFIGURED
        fi
        activation_modifier=l
        ;;
    exclusive)
        if [ "${clustered}" -eq 0 ] ; then
            ocf_log err "Rejecting to operate in exclusive activation mode for non-clustered volume, use activation_mode={auto|none} instead."
            return $OCF_ERR_CONFIGURED
        elif [ -n "${OCF_RESKEY_CRM_meta_clone}" ] ; then
            ocf_log err "Rejecting to operate in exclusive activation mode for clone resource."
            return $OCF_ERR_CONFIGURED
        fi
        activation_modifier=e
        ;;
    none)
        if [ -n "${OCF_RESKEY_lv_name}" ] ; then
            ocf_log err "activation_mode=none cannot be used

[Linux-HA] crmsh property management regression

2012-01-16 Thread Vladislav Bogdanov
Hi Dejan,

I'm evaluating crmsh in place of pacemaker bundled crm (because of
rsc_ticket support).

With current crmsh (b4b063507de0) it is impossible (ok, very hard) to
manage cluster properties:
# crm configure
crm(live)configure# property [tab] ERROR: crmd:metadata: could not parse
meta-data:
ERROR: crmd:metadata: could not parse meta-data:
ERROR: crmd:metadata: could not parse meta-data:
ERROR: crmd:metadata: could not parse meta-data:

Every subsequent [tab] press results in two more such lines printed.
The same is after changing properties with crm configure edit.

Pacemaker is 41dedc0 (Dec 16). Bundled crm works perfectly (except
rsc_ticket support ;) )

Hoping this can be easily fixed (something in ra.py.in?),

Best regards,
Vladislav
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] crmsh property management regression

2012-01-16 Thread Vladislav Bogdanov
Hi Dejan,

thank you very much for a good pointer, you saved me much time.

16.01.2012 16:20, Dejan Muhamedagic wrote:
 Hi Vladislav,
 
 On Mon, Jan 16, 2012 at 02:14:29PM +0300, Vladislav Bogdanov wrote:
 Hi Dejan,

 I'm evaluating crmsh in place of pacemaker bundled crm (because of
 rsc_ticket support).

 With current crmsh (b4b063507de0) it is impossible (ok, very hard) to
 manage cluster properties:
 # crm configure
 crm(live)configure# property [tab] ERROR: crmd:metadata: could not parse
 meta-data:
 ERROR: crmd:metadata: could not parse meta-data:
 ERROR: crmd:metadata: could not parse meta-data:
 ERROR: crmd:metadata: could not parse meta-data:

 Every subsequent [tab] press results in two more such lines printed.
 The same is after changing properties with crm configure edit.
 
 How did you build crmsh? In particular, is this thing properly
 replaced by autofoo:
 
 crm_daemon_dir = @GLUE_DAEMON_DIR@

It was /usr/lib64/heartbeat, while my build of pacemaker already has
its daemons installed in /usr/libexec/pacemaker (it is actually built from
the master branch of Andrew's private repo).

The following patch solved the issue for me:

--- a/configure.ac  2012-01-12 14:32:47.000000000 +0000
+++ b/configure.ac  2012-01-16 15:39:03.413650410 +0000
@@ -187,8 +187,8 @@
 AC_SUBST(CRM_CONFIG_DIR)
 
 dnl Eventually move out of the heartbeat dir tree and create compatability code
-dnl CRM_DAEMON_DIR=$libdir/pacemaker
-GLUE_DAEMON_DIR=`extract_header_define $GLUE_HEADER GLUE_DAEMON_DIR`
+GLUE_DAEMON_DIR=${libexecdir}/pacemaker
+dnl GLUE_DAEMON_DIR=`extract_header_define $GLUE_HEADER GLUE_DAEMON_DIR`
 AC_DEFINE_UNQUOTED(GLUE_DAEMON_DIR,"$GLUE_DAEMON_DIR", Location for Pacemaker daemons)
 AC_SUBST(GLUE_DAEMON_DIR)

Thank you again very much,
Vladislav

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] crmsh property management regression

2012-01-16 Thread Vladislav Bogdanov
16.01.2012 20:56, Dejan Muhamedagic wrote:
 On Mon, Jan 16, 2012 at 06:47:54PM +0300, Vladislav Bogdanov wrote:
 Hi Dejan,

 thank you very much for a good pointer, you saved me much time.

 16.01.2012 16:20, Dejan Muhamedagic wrote:
 Hi Vladislav,

 On Mon, Jan 16, 2012 at 02:14:29PM +0300, Vladislav Bogdanov wrote:
 Hi Dejan,

 I'm evaluating crmsh in place of pacemaker bundled crm (because of
 rsc_ticket support).

 With current crmsh (b4b063507de0) it is impossible (ok, very hard) to
 manage cluster properties:
 # crm configure
 crm(live)configure# property [tab] ERROR: crmd:metadata: could not parse
 meta-data:
 ERROR: crmd:metadata: could not parse meta-data:
 ERROR: crmd:metadata: could not parse meta-data:
 ERROR: crmd:metadata: could not parse meta-data:

 Every subsequent [tab] press results in two more such lines printed.
 The same is after changing properties with crm configure edit.

 How did you build crmsh? In particular, is this thing properly
 replaced by autofoo:

 crm_daemon_dir = @GLUE_DAEMON_DIR@

 It was /usr/lib64/heartbeat, while my build of pacemaker already has
 its daemons installed in /usr/libexec/pacemaker (it is actually built from
 the master branch of Andrew's private repo).
 
 OK, that's bleeding edge source and pacemaker daemons were moved
 in the meantime. I'll make the error message more specific.
 
 The following patch solved the issue for me:

 --- a/configure.ac  2012-01-12 14:32:47.000000000 +0000
 +++ b/configure.ac  2012-01-16 15:39:03.413650410 +0000
 @@ -187,8 +187,8 @@
  AC_SUBST(CRM_CONFIG_DIR)
 
  dnl Eventually move out of the heartbeat dir tree and create compatability code
 -dnl CRM_DAEMON_DIR=$libdir/pacemaker
 -GLUE_DAEMON_DIR=`extract_header_define $GLUE_HEADER GLUE_DAEMON_DIR`
 +GLUE_DAEMON_DIR=${libexecdir}/pacemaker
 +dnl GLUE_DAEMON_DIR=`extract_header_define $GLUE_HEADER GLUE_DAEMON_DIR`
  AC_DEFINE_UNQUOTED(GLUE_DAEMON_DIR,"$GLUE_DAEMON_DIR", Location for Pacemaker daemons)
  AC_SUBST(GLUE_DAEMON_DIR)
 
 Not the correct way, i.e. we should introduce CRM_DAEMON_DIR and
 then extract the right location from the Pacemaker include file.

I didn't find it there; that's why it's just a quick hack.


Cheers,
Vladislav
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Resended : Understanding how heartbeat and pacemaker work together

2012-01-14 Thread Vladislav Bogdanov
13.01.2012 13:04, Niclas Müller wrote:
 I've grouped both as www-services and now it is running like I want.
 Takeover time is 4-6 sec. It's good, but I want to get to 1-3 sec as
 far as possible. Much process last will there not because I only made a

Pacemaker runs monitor actions at the rate you configured.
Just change that rate.
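For example (hypothetical resource name and address), in crmsh syntax:

# A 2-second monitor interval so a failure is noticed, and takeover
# started, much sooner than with the usual 10-30s intervals.
primitive www-ip ocf:heartbeat:IPaddr2 \
    params ip=192.168.1.100 \
    op monitor interval=2s timeout=20s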

Best,
Vladislav
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Antw: limited usefulness of ocf_take_lock()

2011-11-28 Thread Vladislav Bogdanov
28.11.2011 13:09, Ulrich Windl wrote:
 Hi!
 

I posted one more implementation in
http://www.mail-archive.com/linux-ha@lists.linux-ha.org/msg19760.html as
part of a bigger code snippet.

It just uses the -C (noclobber) shell option to create lock files (it
exists at least in bash and dash).
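The idea in a minimal sketch (a hedged illustration, not the posted
snippet itself):

# With noclobber (set -C), '>' refuses to overwrite an existing file,
# so creating the lockfile becomes an atomic test-and-set in plain sh.
lock() {
    ( set -C; echo $$ > "$1" ) 2>/dev/null
}

unlock() {
    rm -f "$1"
}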


 Here is a locking sample (potential replacement functions) that
 seems
 to work: Just start the script more than once as a background process
 and watch the output
 ---snip locks.sh---
 MASTERLOCKFILE=/tmp/blabla
 
 lock() {
     (flock -e 123 &&
      if [ -e $1 ]; then
          if ! kill -0 $(<$1) 2>&1 >/dev/null; then
              # stale lock
              echo $$ > $1
          else
              false
          fi
      else
          echo $$ > $1
      fi) 123>$MASTERLOCKFILE
 }
 
 unlock() {
     (flock -e 124 && test -e $1 && rm $1) 124>$MASTERLOCKFILE
 }
 
 # application
 while true
 do
     while ! lock /tmp/foobar; do
         echo waiting for lock $$
         sleep 0.2
     done
     echo lock OK $$
     sleep 0.1
     if unlock /tmp/foobar; then
         echo unlock OK $$
     else
         echo unlock FAIL $$
     fi
     sleep 0.1
 done
 ---snip---
 
 Regards,
 Ulrich
 
 
 Ulrich Windl ulrich.wi...@rz.uni-regensburg.de schrieb am 28.11.2011 um
 10:07 in Nachricht 4ed35d7e02a18...@gwsmtp1.uni-regensburg.de:
 Hi!

 I was requested to work around a kernel bug by adding locks to my RA.
 Reading the docs I found that it is supposed to be done via

 ocf_take_lock $LOCKFILE
 and
 ocf_release_lock_on_exit $LOCKFILE

 Out of curiosity I inspected the implementation in SLES11 SP1. To me the 
 functions are improperly implemented (unless I'm wrong) because:

 1) you can have only one lock per RA, no matter what $LOCKFILE you provide.
 This is because it is not the $LOCKFILE that actually serves as the lock,
 but the process ID of the shell.

 2) the implementation does not guarantee mutual exclusion:

 ocf_pidfile_status() is used to query for an unowned lock. ocf_take_lock()
 in turn waits until either the specified lockfile does not exist, or the PID
 in the lockfile has vanished.

 Then the PID of the RA's shell is written into the lockfile. As can be seen,
 multiple processes can do that if no lock exists.

 If you had parallel execution of RAs before, you'll have parallel execution 
 even with those locks.

 Finally you can only release the lock using ocf_release_lock_on_exit().
 Unfortunately that function will only release the last lock passed to it,
 as trap does not accumulate the commands you give to it.
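For illustration, a generic shell pattern that does accumulate traps by
reading the current EXIT trap back and appending to it (a hedged sketch,
not claiming this is how ocf_release_lock_on_exit works):

add_exit_cmd() {
    # Append to any existing EXIT trap instead of replacing it, so
    # several locks can all be released on exit.
    local prev
    prev=$(trap -p EXIT | sed "s/^trap -- '\(.*\)' EXIT$/\1/")
    if [ -n "$prev" ]; then
        trap "$prev; $1" EXIT
    else
        trap "$1" EXIT
    fi
}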

 Maybe an approach using flock(1) instead might be better (untested, just 
 from reading the docs):

 lock() {
     (flock -e 123; test -e $LOCKFILE || touch $LOCKFILE) 123>$MASTERLOCKFILE
 }
 
 unlock() {
     (flock -e 124; test -e $LOCKFILE && rm $LOCKFILE) 124>$MASTERLOCKFILE
 }

 Regards,
 Ulrich


 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org 
 http://lists.linux-ha.org/mailman/listinfo/linux-ha 
 See also: http://linux-ha.org/ReportingProblems 

 
  
  
 
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Error running corosync

2011-11-20 Thread Vladislav Bogdanov
21.11.2011 04:18, Nick Khamis wrote:
 Correction!
 
 Some of the ocfs2_controld.pcmk errors posted earlier were due to pacemaker
 not running with /service.d/pcmk. The error is actually:
 http://pastebin.com/XCiuhU20.
 If I can get the standard dlm working it will all come together!

You can't, point blank.
You need dlm_controld.pcmk for a cman-free stack.

 This part of the project
 is so close to completion.

The occasional kernel panics attest to that. ;)
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Error running corosync

2011-11-20 Thread Vladislav Bogdanov
21.11.2011 01:18, Andrew Beekhof wrote:
 On Sat, Nov 19, 2011 at 1:02 AM, Nick Khamis sym...@gmail.com wrote:
 Hello Andrew,

 Thank you so much for your response. My concern was eliminating as much
 of cman as possible,
 
 Then don't use it at all.
 
 since the goal was to run pacemaker on top of corosync/openais; however,
 from Vladislav's last email, this is only possible with yet more hacks.
 
 Not true.
 SLES/openSUSE has supported cman-free clusters and cluster filesystems
 for many years.
 

I spoke to Dinar, who does QA for cluster-related things at SUSE, and he
promised to try to reproduce that flaw with CPG_NODEDOWN and fencing.

Otherwise, yes, you are absolutely correct. And I use the same things on
fedora.

Vladislav
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Error running corosync

2011-11-18 Thread Vladislav Bogdanov
Hi,

18.11.2011 17:02, Nick Khamis wrote:
[snip]

 Vladislav, was this the ocfs2 stack kernel crash you were experiencing one
 year ago:

To be frank, I do not remember.
It was a year ago ;)

 
  Starting ocfs2_controld... [  OK  ]
 
 Message from syslogd@astdrbd1 at Nov 18 08:51:59 ...
  kernel:[  724.636106] Oops:  [#1] SMP
 
 Message from syslogd@astdrbd1 at Nov 18 08:51:59 ...
  kernel:[  724.636106] last sysfs file: /sys/fs/ocfs2/max_locking_protocol
Unfencing self...
 Message from syslogd@astdrbd1 at Nov 18 08:51:59 ...
  kernel:[  724.636106] Process ocfs2_controld. (pid: 6579, ti=c5ec
 task=c565c880 task.ti=c5ec)
 
 Message from syslogd@astdrbd1 at Nov 18 08:51:59 ...
  kernel:[  724.636106] Stack:
 
 Message from syslogd@astdrbd1 at Nov 18 08:51:59 ...
  kernel:[  724.636106] Call Trace:
 
 Message from syslogd@astdrbd1 at Nov 18 08:51:59 ...
  kernel:[  724.636106] Code: 9e 47 c8 75 c3 fe 05 d0 a1 47 c8 89 f8 5b
 5e 5f 5d c3 53 b8 d0 a1 47 c8 89 cb e8 e8 50 df f8 8b 15 d8 a1 47 c8
 31 c0 85 d2 74 1c 0f b6 42 01 50 0f b6 02 50 68 8f 98 47 c8 68 00 10
 00 00 53 e8
 
 Message from syslogd@astdrbd1 at Nov 18 08:51:59 ...
  kernel:[  724.636106] EIP: [c8479274]
 ocfs2_max_locking_protocol_show+0x19/0x3d [ocfs2_stackglue] SS:ESP
 0068:c5ec1f48
 
 Message from syslogd@astdrbd1 at Nov 18 08:51:59 ...
  kernel:[  724.636106] CR2: c861ef65
 [  OK  ]
Joining fence domain... [  OK  ]
 
 Cheers,
 
 Nick.
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

