Re: [Linux-HA] Antw: Re: crmsh fails to stop already stopped resource
18.02.2015 11:03, Ulrich Windl wrote:
> Lars Marowsky-Bree <l...@suse.com> schrieb am 16.02.2015 um 12:34 in
> Nachricht <20150216113433.gb4...@suse.de>:
>> On 2015-02-16T09:20:22, Kristoffer Grönlund <kgronl...@suse.com> wrote:
>>> Actually, I decided that it does make sense to return 0 as the error
>>> code even if the resource to delete doesn't exist, so I pushed a
>>> commit to change this. The error message is still printed, though.
>> I'm not sure I agree, for once. Idempotency is for resource agent
>> operations, not necessarily all operations everywhere. Especially
>> because crmsh doesn't know whether the object doesn't exist because it
>> was deleted, or because it was misspelled.
> So far I was assuming we are talking about stopping a stopped resource,
> not stopping a non-existing resource. I thought that crm shell would
> clearly distinguish between those. The latter should be an error, of
> course.

That is my fault, I should have started a new thread, but I thought the
issue with deleting a non-existent object was really very minor.

Best,
Vladislav

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] crmsh fails to stop already stopped resource
16.02.2015 14:34, Lars Marowsky-Bree wrote:
> On 2015-02-16T09:20:22, Kristoffer Grönlund <kgronl...@suse.com> wrote:
>> Actually, I decided that it does make sense to return 0 as the error
>> code even if the resource to delete doesn't exist, so I pushed a
>> commit to change this. The error message is still printed, though.
> I'm not sure I agree, for once. Idempotency is for resource agent
> operations, not necessarily all operations everywhere. Especially
> because crmsh doesn't know whether the object doesn't exist because it
> was deleted, or because it was misspelled.
> Compare the Unix-as-little-else rm command; "rm -f /tmp/idontexist"
> will give an error code.

btw, with '-f' it won't. ;) And it would be enough for me if 'crm -F'
behaved the same.

Best,
Vladislav

> Now a caller of crmsh has to *parse the output* to know whether the
> delete command succeeded or not, which is rather non-trivial. If the
> caller doesn't care whether the command succeeded or not, it should be
> the caller that ignores the error code. Or if you want to get real
> fancy, return different exit codes for "referenced object does not
> exist" or a generic syntax error.
>> Following fails with the current crmsh (e4b10ee).
>> # crm resource stop cl-http-lv
>> # crm resource stop cl-http-lv
>> ERROR: crm_diff apparently failed to produce the diff (rc=0)
>> ERROR: Failed to commit updates to cl-http-lv
>> # echo $?
>> 1
> And, yeah, well, this shouldn't happen. Here idempotency applies ;-)
> Regards,
>     Lars
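The `rm` behaviour being debated above is easy to verify. A minimal sketch (Python, assuming GNU rm on a Linux box; the `/tmp` filename is just an illustrative path that must not exist):

```python
import subprocess

# GNU rm: without -f, removing a missing file is an error (rc != 0);
# with -f, the missing file is silently ignored (rc == 0) -- which is
# the semantics Vladislav points out in the thread.
missing = "/tmp/idontexist-crmsh-demo"

rc_strict = subprocess.run(["rm", missing],
                           capture_output=True).returncode
rc_forced = subprocess.run(["rm", "-f", missing],
                           capture_output=True).returncode

print(rc_strict != 0, rc_forced == 0)
```

This mirrors the distinction the thread asks for in crmsh: a strict mode that reports missing objects, and a forced mode ('crm -F') that treats them as already gone.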
[Linux-HA] [PATCH] low: cibconfig: Do not fail on deletion of non-existing objects
---
 modules/cibconfig.py |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/modules/cibconfig.py b/modules/cibconfig.py
index 8689c1b..8680f33 100644
--- a/modules/cibconfig.py
+++ b/modules/cibconfig.py
@@ -3463,8 +3463,6 @@ class CibFactory(object):
         for obj_id in args:
             obj = self.find_object(obj_id)
             if not obj:
-                no_object_err(obj_id)
-                rc = False
                 continue
             if not rscstat.can_delete(obj_id):
                 common_err("resource %s is running, can't delete it" % obj_id)
-- 
1.7.1
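The effect of this patch is simply that unknown ids are skipped instead of flagged. A stripped-down sketch of the loop's logic (names and the `strict` switch are simplified stand-ins, not the real cibconfig.py code):

```python
def delete_objects(known, args, strict=True):
    """Mimic the delete loop: return False if any id is rejected.

    strict=True models the pre-patch behaviour (unknown ids fail the
    whole command); strict=False models the patched behaviour, where
    unknown ids are silently skipped and the result stays True.
    """
    rc = True
    deleted = []
    for obj_id in args:
        if obj_id not in known:
            if strict:        # the two lines removed by the patch
                rc = False
            continue
        deleted.append(obj_id)
    for obj_id in deleted:
        known.remove(obj_id)
    return rc

print(delete_objects({"a", "b"}, ["a", "ghost"], strict=True))   # False
print(delete_objects({"a", "b"}, ["a", "ghost"], strict=False))  # True
```

The later discussion (whether a misspelled id should still be an error) is exactly the choice between these two modes.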
Re: [Linux-HA] crmsh fails to stop already stopped resource
16.02.2015 11:15, Kristoffer Grönlund wrote:
> Vladislav Bogdanov <bub...@hoster-ok.com> writes:
>> Hi Kristoffer,
>> may be it is worth to silently (or at least with rc=0) allow deletion
>> of non-existing or already-deleted configuration statements?
>> Background for that is that I keep track of all the configuration
>> statements myself, and, when I delete some resources (together with
>> accompanying constraints), they may go out-of-order to 'crm configure
>> delete', thus some constraints are automatically deleted when deleting
>> a lower resource before the upper one. That leads the whole crm script
>> to fail.
> Hmm, I am not sure about doing this by default, since we would want to
> show some kind of indication that a resource name may have been
> misspelled, for example... But I can imagine having a command line
> flag for being more flexible in this regard.

Reuse '-F'? I will look at how it works now.

> BTW, I suspect that passing the --wait flag to crm while running
> commands in this way may help you. Although I am not sure I entirely
> understand what it is you are doing :)

Look:

crm configure
primitive a ...
primitive b ...
colocation b-with-a inf: b a
commit
exit

crm configure
delete a
delete b-with-a   <= fails because it is already deleted automatically
delete b
commit

Best,
Vladislav

> Cheers,
> Kristoffer
>> 13.02.2015 17:03, Vladislav Bogdanov wrote:
>>> Hi,
>>> Following fails with the current crmsh (e4b10ee).
>>> # crm resource stop cl-http-lv
>>> # crm resource stop cl-http-lv
>>> ERROR: crm_diff apparently failed to produce the diff (rc=0)
>>> ERROR: Failed to commit updates to cl-http-lv
>>> # echo $?
>>> 1
>>> Best,
>>> Vladislav
Re: [Linux-HA] crmsh fails to stop already stopped resource
Hi Dejan,

16.02.2015 13:47, Dejan Muhamedagic wrote:
> Hi,
> On Mon, Feb 16, 2015 at 11:20:16AM +0300, Vladislav Bogdanov wrote:
>> 16.02.2015 11:15, Kristoffer Grönlund wrote:
>>> Vladislav Bogdanov <bub...@hoster-ok.com> writes:
>>>> Hi Kristoffer,
>>>> may be it is worth to silently (or at least with rc=0) allow
>>>> deletion of non-existing or already-deleted configuration
>>>> statements? Background for that is that I keep track of all the
>>>> configuration statements myself, and, when I delete some resources
>>>> (together with accompanying constraints), they may go out-of-order
>>>> to 'crm configure delete', thus some constraints are automatically
>>>> deleted when deleting a lower resource before the upper one. That
>>>> leads the whole crm script to fail.
> crmsh tries hard to preserve the CIB sanity on removing elements. It
> would be best that you just put all the elements you want to delete on
> one line.

That's a really good idea, I'll look into this.

>>> Hmm, I am not sure about doing this by default, since we would want
>>> to show some kind of indication that a resource name may have been
>>> misspelled, for example... But I can imagine having a command line
>>> flag for being more flexible in this regard.
>> Reuse '-F'? I will look at how it works now.
>>> BTW, I suspect that passing the --wait flag to crm while running
>>> commands in this way may help you.
> The --wait option effectively waits for the PE to settle. It is
> normally useful only in resource/node levels and on configure commit.
>>> Although I am not sure I entirely understand what it is you are
>>> doing :)
>> Look:
>> crm configure
>> primitive a ...
>> primitive b ...
>> colocation b-with-a inf: b a
>> commit
>> exit
>> crm configure
>> delete a
>> delete b-with-a   <= fails because it is already deleted automatically
> You can also omit removing constraints as they are going to be removed
> with the resources they reference.

Unless the same function is used to remove just constraints too (like
in my case - I compare the old and new definitions of an object with
constraints and remove stale ones).

Anyways, thanks for the pointer to multi-object deletes!

Best,
Vladislav

> Cheers,
> Dejan
>> delete b
>> commit
>> Best,
>> Vladislav
>>> Cheers,
>>> Kristoffer
>>>> 13.02.2015 17:03, Vladislav Bogdanov wrote:
>>>>> Hi,
>>>>> Following fails with the current crmsh (e4b10ee).
>>>>> # crm resource stop cl-http-lv
>>>>> # crm resource stop cl-http-lv
>>>>> ERROR: crm_diff apparently failed to produce the diff (rc=0)
>>>>> ERROR: Failed to commit updates to cl-http-lv
>>>>> # echo $?
>>>>> 1
>>>>> Best,
>>>>> Vladislav
Re: [Linux-HA] crmsh fails to stop already stopped resource
Hi Kristoffer,

13.02.2015 17:20, Kristoffer Grönlund wrote:
> Vladislav Bogdanov <bub...@hoster-ok.com> writes:
>> Hi,
>> Following fails with the current crmsh (e4b10ee).
>> # crm resource stop cl-http-lv
>> # crm resource stop cl-http-lv
>> ERROR: crm_diff apparently failed to produce the diff (rc=0)
>> ERROR: Failed to commit updates to cl-http-lv
>> # echo $?
>> 1
> Hi,
> What would you expect to see when stopping an already stopped resource?

I'd expect crmsh to behave similarly to

crm_resource --resource cl-http-lv --set-parameter target-role \
    --meta --parameter-value Stopped

At least it should not exit with a failure return code.

Best,
Vladislav

> Cheers,
> Kristoffer
>> Best,
>> Vladislav
Re: [Linux-HA] crmsh fails to stop already stopped resource
13.02.2015 18:04, Kristoffer Grönlund wrote:
> Vladislav Bogdanov <bub...@hoster-ok.com> writes:
>> Hi Kristoffer,
>> 13.02.2015 17:20, Kristoffer Grönlund wrote:
>>> Vladislav Bogdanov <bub...@hoster-ok.com> writes:
>>>> Hi,
>>>> Following fails with the current crmsh (e4b10ee).
>>>> # crm resource stop cl-http-lv
>>>> # crm resource stop cl-http-lv
>>>> ERROR: crm_diff apparently failed to produce the diff (rc=0)
>>>> ERROR: Failed to commit updates to cl-http-lv
>>>> # echo $?
>>>> 1
>>> Hi,
>>> What would you expect to see when stopping an already stopped
>>> resource?
>> I'd expect crmsh to behave similarly to
>> crm_resource --resource cl-http-lv --set-parameter target-role \
>>     --meta --parameter-value Stopped
>> At least it should not exit with a failure return code.
> Yeah, I see what you mean. I have fixed this upstream now.

Thanks Kristoffer!

> Thanks!
> // Kristoffer

Best,
Vladislav
Re: [Linux-HA] [Patch] Collection of patches for crmsh
Hi Dejan,

19.01.2015 16:30, Dejan Muhamedagic wrote:
> Hi Vladislav,
> [...]
>> Fix transition start detection.
>>
>> --- a/modules/constants.py	2014-12-22 08:48:26.0 +
>> +++ b/modules/constants.py	2014-12-22 13:07:43.945077805 +
>> @@ -272,7 +272,7 @@
>>  # r.group(3) file number
>>  transition_patt = [
>>      # transition start
>> -    "crmd.* do_te_invoke: Processing graph ([0-9]+) .*derived from (.*/pe-[^-]+-(%%)[.]bz2)",
>> +    "pengine.* process_pe_message: Calculated Transition ([0-9]+): (.*/pe-[^-]+-(%%)[.]bz2)",
> Do you know when this changed?

The original message (from do_te_invoke) was downgraded to the 'info'
priority long ago (probably during that Andrew's massive logging
cleanup), while process_pe_message's one still remains at the 'notice'
level. First, my patch has 2012-12-26 as its date (for crmsh-1.2.4), so
the change was done before that. iirc process_pe_message's message was
always there; both messages were printed before that cleanup.

> The reason I'm asking is that crmsh tries to support multiple
> pacemaker versions, so I'm not sure if we can just replace this
> pattern.
>> Make tar follow symlinks.
>>
>> --- a/modules/crm_pssh.py	2013-08-12 12:52:11.0 +
>> +++ b/modules/crm_pssh.py	2013-08-12 12:53:32.666444069 +
>> @@ -170,7 +170,7 @@
>>      dir = "/%s" % r.group(1)
>>      red_pe_l = [x.replace("%s/" % r.group(1), "") for x in pe_l]
>>      common_debug("getting new PE inputs %s from %s" % (red_pe_l, node))
>> -    cmdline = "tar -C %s -cf - %s" % (dir, ' '.join(red_pe_l))
>> +    cmdline = "tar -C %s -chf - %s" % (dir, ' '.join(red_pe_l))
> Just curious: where did you find links in the PE input directories?

Ahm, you know, systems are so different around the world ;) And system
administrators sometimes want to do weird things ;) Actually that one
is specific to my diskless clusters, but it won't hurt anyways.

> And many thanks for the patches!
> Cheers,
> Dejan

Best,
Vladislav
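The two log patterns discussed above can be compared against a sample pengine line. A sketch (pure Python; the syslog line below is constructed for illustration, and the `%%` placeholder is filled with a number group the way crmsh does when it searches for a PE file):

```python
import re

# New-style pattern from the patch, with %% replaced by a digit group.
patt = (r"pengine.* process_pe_message: "
        r"Calculated Transition ([0-9]+): (.*/pe-[^-]+-([0-9]+)[.]bz2)")

# Hypothetical log line in the post-cleanup pacemaker format.
line = ("Feb 16 12:34:56 node1 pengine[2345]:   notice: "
        "process_pe_message: Calculated Transition 7: "
        "/var/lib/pacemaker/pengine/pe-input-42.bz2")

m = re.search(patt, line)
# group(1) = transition number, group(2) = full path, group(3) = file number
print(m.group(1), m.group(3))
```

The old `crmd.* do_te_invoke` pattern would not match this line, which is why history commands miss transition starts on clusters that only log the pengine message at 'notice'.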
Re: [Linux-HA] [Patch] Collection of patches for crmsh
19.01.2015 16:27, Kristoffer Grönlund wrote:
> Vladislav Bogdanov <bub...@hoster-ok.com> writes:
>> Hi Kristoffer,
>> there are two patches, one for crmsh and one for parallax. They make
>> history commands work.
> Thanks! I have created a pull request with the patches for crmsh here:
> https://github.com/crmsh/crmsh/pull/77

Thank you very much, I do not have enough will to make myself ride
those web-2.0 tools ;)

> Cheers,
> Kristoffer

Best,
Vladislav
[Linux-HA] [Patch] Collection of patches for crmsh
Hi Kristoffer,

there are two patches, one for crmsh and one for parallax. They make
history commands work.

--- a/modules/crm_pssh.py	2015-01-19 11:42:02.0 +
+++ b/modules/crm_pssh.py	2015-01-19 12:17:46.328000847 +
@@ -85,14 +85,14 @@ def do_pssh(l, opts):
            '-o', 'PasswordAuthentication=no',
            '-o', 'SendEnv=PSSH_NODENUM',
            '-o', 'StrictHostKeyChecking=no']
-    if opts.options:
+    if hasattr(opts, 'options'):
         for opt in opts.options:
             cmd += ['-o', opt]
     if user:
         cmd += ['-l', user]
     if port:
         cmd += ['-p', port]
-    if opts.extra:
+    if hasattr(opts, 'extra'):
         cmd.extend(opts.extra)
     if cmdline:
         cmd.append(cmdline)
---

--- a/parallax/manager.py	2014-10-15 13:40:04.0 +
+++ b/parallax/manager.py	2015-01-19 12:15:47.911000236 +
@@ -47,11 +47,26 @@ class Manager(object):
         # Backwards compatibility with old __init__
         # format: Only argument is an options dict
         if not isinstance(limit, int):
-            self.limit = limit.par
-            self.timeout = limit.timeout
-            self.askpass = limit.askpass
-            self.outdir = limit.outdir
-            self.errdir = limit.errdir
+            if hasattr(limit, 'par'):
+                self.limit = limit.par
+            else:
+                self.limit = DEFAULT_PARALLELISM
+            if hasattr(limit, 'timeout'):
+                self.timeout = limit.timeout
+            else:
+                self.timeout = DEFAULT_TIMEOUT
+            if hasattr(limit, 'askpass'):
+                self.askpass = limit.askpass
+            else:
+                self.askpass = False
+            if hasattr(limit, 'outdir'):
+                self.outdir = limit.outdir
+            else:
+                self.outdir = None
+            if hasattr(limit, 'errdir'):
+                self.errdir = limit.errdir
+            else:
+                self.errdir = None
         else:
             self.limit = limit
             self.timeout = timeout
---

Two more patches I use for ages in my builds are:

Fix transition start detection.

--- a/modules/constants.py	2014-12-22 08:48:26.0 +
+++ b/modules/constants.py	2014-12-22 13:07:43.945077805 +
@@ -272,7 +272,7 @@
 # r.group(3) file number
 transition_patt = [
     # transition start
-    "crmd.* do_te_invoke: Processing graph ([0-9]+) .*derived from (.*/pe-[^-]+-(%%)[.]bz2)",
+    "pengine.* process_pe_message: Calculated Transition ([0-9]+): (.*/pe-[^-]+-(%%)[.]bz2)",
 # r.group(1) transition number (a different thing from file number)
 # r.group(2) contains full path
 # r.group(3) transition status
---

Make tar follow symlinks.

--- a/modules/crm_pssh.py	2013-08-12 12:52:11.0 +
+++ b/modules/crm_pssh.py	2013-08-12 12:53:32.666444069 +
@@ -170,7 +170,7 @@
     dir = "/%s" % r.group(1)
     red_pe_l = [x.replace("%s/" % r.group(1), "") for x in pe_l]
     common_debug("getting new PE inputs %s from %s" % (red_pe_l, node))
-    cmdline = "tar -C %s -cf - %s" % (dir, ' '.join(red_pe_l))
+    cmdline = "tar -C %s -chf - %s" % (dir, ' '.join(red_pe_l))
     opts = parse_args(outdir, errdir)
     l.append([node, cmdline])
 if not l:
---

Best,
Vladislav
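The `hasattr` guards in the first patch protect against option objects that simply lack the optional attributes. An equivalent and slightly tighter idiom is `getattr` with a default. A sketch (the `Options` class and `build_cmd` helper are stand-ins for illustration, not the real parallax/crm_pssh code):

```python
class Options(object):
    """Stand-in for an optparse/parallax options object that may or
    may not carry the optional attributes."""
    def __init__(self, **kw):
        self.__dict__.update(kw)

def build_cmd(opts):
    cmd = ["ssh"]
    # getattr with a default collapses the hasattr-then-access pair
    # and also covers the case where the attribute is present but None.
    for opt in getattr(opts, "options", None) or []:
        cmd += ["-o", opt]
    cmd.extend(getattr(opts, "extra", None) or [])
    return cmd

print(build_cmd(Options(options=["BatchMode=yes"], extra=["-v"])))
print(build_cmd(Options()))  # missing attributes fall back to no-ops
```

Either form fixes the crash; `hasattr` keeps the diff minimal, which matters for a patch meant to be reviewed on a mailing list.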
Re: [Linux-HA] crm configure show to a pipe
17.11.2014 14:00, Dejan Muhamedagic wrote:
> Hi,
> On Mon, Nov 17, 2014 at 10:05:59AM +0300, Vladislav Bogdanov wrote:
>> Hi Kristoffer, all,
>> running 'crm configure show > file' appends non-printable chars at
>> the end (at least if op_defaults is used):
> Best to use "crm configure save" for filtering (I guess that you don't
> want colors in that case).

Great! How did I miss that? :) The only noticeable difference is that
it is impossible to save a partial CIB, filtering by object ids (like
'show' allows).

> As for the strange codes in the output, they're most likely due to
> some libreadline bug and TERM set to xterm. I found some information
> at the time here:
> https://bugs.gentoo.org/show_bug.cgi?id=246091
> We dealt with that then by not importing readline unless absolutely
> necessary. The changeset is 4d11007. My bad for not commenting that in
> the code. readline probably gets imported in non-interactive mode
> again.
> Thanks,
> Dejan
>> ...
>> property cib-bootstrap-options: \
>>     dc-version=1.1.12-c191bf3 \
>>     cluster-infrastructure=corosync \
>>     cluster-recheck-interval=10m \
>>     stonith-enabled=false \
>>     no-quorum-policy=freeze \
>>     last-lrm-refresh=1415955398 \
>>     maintenance-mode=false \
>>     stop-all-resources=false \
>>     stop-orphan-resources=true \
>>     have-watchdog=false
>> rsc_defaults rsc_options: \
>>     allow-migrate=false \
>>     failure-timeout=10m \
>>     migration-threshold=INFINITY \
>>     multiple-active=stop_start \
>>     priority=0
>> op_defaults op-options: \
>>     record-pending=true.[?1034h
>> Best,
>> Vladislav
Re: [Linux-HA] crm configure show to a pipe
17.11.2014 15:39, Kristoffer Grönlund wrote:
> Dejan Muhamedagic <deja...@fastmail.fm> writes:
>> Hi,
>> On Mon, Nov 17, 2014 at 10:05:59AM +0300, Vladislav Bogdanov wrote:
>>> Hi Kristoffer, all,
>>> running 'crm configure show > file' appends non-printable chars at
>>> the end (at least if op_defaults is used):
>> Best to use "crm configure save" for filtering (I guess that you
>> don't want colors in that case). As for the strange codes in the
>> output, they're most likely due to some libreadline bug and TERM set
>> to xterm. I found some information at the time here:
>> https://bugs.gentoo.org/show_bug.cgi?id=246091
>> We dealt with that then by not importing readline unless absolutely
>> necessary. The changeset is 4d11007. My bad for not commenting that
>> in the code. readline probably gets imported in non-interactive mode
>> again.
> I can confirm that yes, it does. My apologies for reintroducing this
> issue! I will change this. I will also look at adding optional
> filtering to the save command just like for show and edit. This seems
> like a useful feature to me.

Thank you for your extremely productive work!
[Linux-HA] crmsh and 'no such resource agent' error
Hi Kristoffer, all,

It seems like with the introduction of 'resource-discovery',
'symmetric-cluster=true' becomes not so strict in the sense of resource
agent sets across nodes. May be it is possible to add a config option
to disable error messages like:

got no meta-data, does this RA exist?
no such resource agent

Best,
Vladislav
[Linux-HA] crm configure show to a pipe
Hi Kristoffer, all,

running 'crm configure show > file' appends non-printable chars at the
end (at least if op_defaults is used):

...
property cib-bootstrap-options: \
    dc-version=1.1.12-c191bf3 \
    cluster-infrastructure=corosync \
    cluster-recheck-interval=10m \
    stonith-enabled=false \
    no-quorum-policy=freeze \
    last-lrm-refresh=1415955398 \
    maintenance-mode=false \
    stop-all-resources=false \
    stop-orphan-resources=true \
    have-watchdog=false
rsc_defaults rsc_options: \
    allow-migrate=false \
    failure-timeout=10m \
    migration-threshold=INFINITY \
    multiple-active=stop_start \
    priority=0
op_defaults op-options: \
    record-pending=true.[?1034h

Best,
Vladislav
Re: [Linux-HA] Antw: Re: Pending state support
13.11.2014 12:20, Ulrich Windl wrote:
> I realized that older versions of crm_mon don't have it (-j); thus it
> will spit out a usage message. Try to avoid that problem, please.

Yes, it appeared iirc in 1.1.10 or 1.1.11, so a simple version check
should be enough. And that check is already implemented and used for
other features.

>> Vladislav Bogdanov <bub...@hoster-ok.com> schrieb am 13.11.2014 um
>> 07:26 in Nachricht <54644f2c.3020...@hoster-ok.com>:
>> Hi Kristoffer!
>> May I bump this one?
>> Best,
>> Vladislav
>> 04.11.2014 11:15, Vladislav Bogdanov wrote:
>>> Hi Kristoffer, Dejan, all.
>>> May be it is time to add the '-j' param to 'crm_mon -1' by default
>>> (if supported)?
>>> Best,
>>> Vladislav
[Linux-HA] crmsh and 'resource-discovery'
Hi Kristoffer, Dejan.

Do you have plans to add support to crmsh for the 'resource-discovery'
location constraint option (added to pacemaker by David in pull
requests #589 and #605), as well as for the 'pacemaker-next' schema
(this one seems to be trivial)?

Best,
Vladislav
Re: [Linux-HA] crmsh and 'resource-discovery'
12.11.2014 23:32, Kristoffer Grönlund wrote:
> Vladislav Bogdanov <bub...@hoster-ok.com> writes:
>> Hi Kristoffer, Dejan.
>> Do you have plans to add support to crmsh for the
>> 'resource-discovery' location constraint option (added to pacemaker
>> by David in pull requests #589 and #605), as well as for the
>> 'pacemaker-next' schema (this one seems to be trivial)?
>> Best,
>> Vladislav
> I haven't had time to look closer at resource-discovery, but yes, I
> certainly intend to support every option that makes it into a released
> version of pacemaker at least.

Great. Can't wait for that to happen :)

> As for the pacemaker-next schema, I thought I had added support for it
> already, but I haven't actually tested it :) But yes, it should be
> usable in theory at least, and if it is not, that is a bug that I will
> fix.

It is not supported in the crmsh-2.1.1-1.1 rpm for EL7 in OBS. Regexps
in three places match only pacemaker-[[:digit:]]\.[[:digit:]] and can
trivially be fixed.

Best,
Vladislav
Re: [Linux-HA] crmsh and 'resource-discovery'
13.11.2014 00:12, Kristoffer Grönlund wrote:
> Vladislav Bogdanov <bub...@hoster-ok.com> writes:
>>> I haven't had time to look closer at resource-discovery, but yes, I
>>> certainly intend to support every option that makes it into a
>>> released version of pacemaker at least.
>> Great. Can't wait for that to happen :)
>>> As for the pacemaker-next schema, I thought I had added support for
>>> it already, but I haven't actually tested it :) But yes, it should
>>> be usable in theory at least, and if it is not, that is a bug that I
>>> will fix.
>> It is not supported in the crmsh-2.1.1-1.1 rpm for EL7 in OBS.
>> Regexps in three places match only pacemaker-[[:digit:]]\.[[:digit:]]
>> and can trivially be fixed.
> Alright, I have added tentative support for both resource-discovery
> and the pacemaker-next schema in the master branch for crmsh.

Yep! Will test tomorrow morning. One more place for pacemaker-next is
cibconfig.py, CibFactory.__init__:

self.supported_cib_re = "^pacemaker-([12][.][0123]|next)$"

Thank you,
Vladislav
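The widened version-check regexp quoted in this message can be exercised directly; a quick sketch (pure Python, pattern copied from the message, validator names chosen for illustration):

```python
import re

# Pattern from the message: accepts pacemaker-1.x/2.x validators and
# the special "pacemaker-next" development schema.
supported_cib_re = r"^pacemaker-([12][.][0123]|next)$"

for name in ["pacemaker-1.2", "pacemaker-2.0",
             "pacemaker-next", "pacemaker-3.9"]:
    print(name, bool(re.match(supported_cib_re, name)))
```

The old `pacemaker-[[:digit:]]\.[[:digit:]]`-style patterns rejected `pacemaker-next` outright, which is why the CIB was reported as unsupported.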
Re: [Linux-HA] Pending state support
Hi Kristoffer!

May I bump this one?

Best,
Vladislav

04.11.2014 11:15, Vladislav Bogdanov wrote:
> Hi Kristoffer, Dejan, all.
> May be it is time to add the '-j' param to 'crm_mon -1' by default (if
> supported)?
> Best,
> Vladislav
[Linux-HA] Pending state support
Hi Kristoffer, Dejan, all.

May be it is time to add the '-j' param to 'crm_mon -1' by default (if
supported)?

Best,
Vladislav
Re: [Linux-HA] Remote node attributes support in crmsh
29.10.2014 12:49, Dejan Muhamedagic wrote:
> ...
>> On the other hand, this feature is relatively new (has it ever been
>> released?) so it is much simpler to fix that breakage in pacemaker.
> It's not pacemaker, it's just a resource agent. Which makes it much
> easier to fix, just by introducing one parameter which would hold the
> remote node name.

In this case some pacemaker internals are also involved. The RA is just
a stub with a well-known name.
Re: [Linux-HA] Remote node attributes support in crmsh
29.10.2014 13:55, Dejan Muhamedagic wrote:
> On Wed, Oct 29, 2014 at 01:03:50PM +0300, Vladislav Bogdanov wrote:
>> 29.10.2014 12:49, Dejan Muhamedagic wrote:
>> ...
>>>> On the other hand, this feature is relatively new (has it ever been
>>>> released?) so it is much simpler to fix that breakage in pacemaker.
>>> It's not pacemaker, it's just a resource agent. Which makes it much
>>> easier to fix, just by introducing one parameter which would hold
>>> the remote node name.
>> In this case some pacemaker internals are also involved. The RA is
>> just a stub with a well-known name.
> Really? Oops ;-) At any rate, Kristoffer did some small patch which
> makes this work for the most part (and as long as the node ID is
> different from its uname; sigh). It's available with the latest
> release 2.1.1.

Great! Thanks for the info.

Best,
Vladislav
Re: [Linux-HA] Remote node attributes support in crmsh
28.10.2014 21:15, David Vossel wrote:
> ----- Original Message -----
>> 22.10.2014 12:02, Dejan Muhamedagic wrote:
>>> On Mon, Oct 20, 2014 at 07:12:23PM +0300, Vladislav Bogdanov wrote:
>>>> 20.10.2014 18:23, Dejan Muhamedagic wrote:
>>>>> Hi Vladislav,
>>>> Hi Dejan!
>>>>> On Mon, Oct 20, 2014 at 09:03:40AM +0300, Vladislav Bogdanov wrote:
>>>>>> Hi Kristoffer,
>>>>>> do you plan to add support for the recently added remote node
>>>>>> attributes feature to crmsh? Currently (at least as of 2.1, and I
>>>>>> do not see anything relevant in the git log) crmsh fails to
>>>>>> update the CIB if it contains node attributes for a remote
>>>>>> (bare-metal) node, complaining that a duplicate element is found.
>>>>> No wonder :) The uname effectively dubs as an element id.
>>>>>> But for bare-metal nodes it is natural to have an
>>>>>> ocf:pacemaker:remote resource with name equal to the remote node
>>>>>> uname (I doubt it can be configured differently).
>>>>> Is that required?
>>>> Didn't look in the code, but seems like yes, the :remote resource
>>>> name is the only place where pacemaker can obtain that node name.
>>> I find it surprising that the id is used to carry information. I'm
>>> not sure if we had a similar case (apart from attributes).
>>>>>> If I comment out the check for 'obj_id in id_set', then it fails
>>>>>> to update the CIB because it inserts the above primitive
>>>>>> definition into the node section.
>>>>> Could you please show what would the CIB look like with such a
>>>>> remote resource (in crmsh notation).
>>>> node 1: node01
>>>> node rnode001:remote \
>>>>     attributes attr=value
>>>> primitive rnode001 ocf:pacemaker:remote \
>>>>     params server=192.168.168.20 \
>>>>     op monitor interval=10 \
>>>>     meta target-role=Started
>>> What do you expect to happen when you reference rnode001, in say:
>> That is not me ;) I just want to be able to use crmsh to assign
>> remote node operational and utilization (?) attributes and to work
>> with it after that. Probably that is not yet set in stone, and David
>> may change that, allowing f.e. a new 'node_name' parameter to
>> ocf:pacemaker:remote to override the remote node name guessed from
>> the primitive name. David, could you comment please?
> why would we want to separate the remote-node from the resource's
> primitive instance name?

It breaks the existing crmsh internal concept that every object in a
CIB has a unique name. This also breaks the syntax of some existing
commands, as Dejan says, f.e.

crm configure show rnode001

or

crm configure edit rnode001 (?)

From what I see it is very hard to modify crmsh to support objects with
different types but with equal names, and that will definitely break
its maturity. On the other hand, this feature is relatively new (has it
ever been released?) so it is much simpler to fix that breakage in
pacemaker.

Best,
Vladislav

> -- David
>>> crm configure show rnode001
>>> I'm still trying to digest having a hostname used to name some other
>>> element. Wonder what/where else we will have issues for this reason.
>>> Cheers,
>>> Dejan
>>>>> Given that nodes are for the most part referenced by uname
>>>>> (instead of by id), do you think that a configuration where a
>>>>> primitive element is named the same as a node, the user can handle
>>>>> that in an efficient manner? (NB: No experience here with
>>>>> ocf:pacemaker:remote :)
>>>>> Cheers,
>>>>> Dejan
Re: [Linux-HA] Remote node attributes support in crmsh
22.10.2014 12:02, Dejan Muhamedagic wrote: On Mon, Oct 20, 2014 at 07:12:23PM +0300, Vladislav Bogdanov wrote: 20.10.2014 18:23, Dejan Muhamedagic wrote: Hi Vladislav, Hi Dejan! On Mon, Oct 20, 2014 at 09:03:40AM +0300, Vladislav Bogdanov wrote: Hi Kristoffer, do you plan to add support for recently added remote node attributes feature to chmsh? Currently (at least as of 2.1, and I do not see anything relevant in the git log) crmsh fails to update CIB if it contains node attributes for remote (bare-metal) node, complaining that duplicate element is found. No wonder :) The uname effectively dubs as an element id. But for bare-metal nodes it is natural to have ocf:pacemaker:remote resource with name equal to remote node uname (I doubt it can be configured differently). Is that required? Didn't look in code, but seems like yes, :remote resource name is the only place where pacemaker can obtain that node name. I find it surprising that the id is used to carry information. I'm not sure if we had a similar case (apart from attributes). If I comment check for 'obj_id in id_set', then it fails to update CIB because it inserts above primitive definition into the node section. Could you please show what would the CIB look like with such a remote resource (in crmsh notation). node 1: node01 node rnode001:remote \ attributes attr=value primitive rnode001 ocf:pacemaker:remote \ params server=192.168.168.20 \ op monitor interval=10 \ meta target-role=Started What do you expect to happen when you reference rnode001, in say: That is not me ;) I just want to be able to use crmsh to assign remote node operational and utilization (?) attributes and to work with it after that. Probably that is not yet set in stone, and David may change that allowing to f.e. new 'node_name' parameter to ocf:pacemaker:remote override remote node name guessed from the primitive name. David, could you comment please? 
Best, Vladislav crm configure show rnode001 I'm still trying to digest having a hostname used to name some other element. Wonder what/where else we will have issues for this reason. Cheers, Dejan Best, Vladislav Given that nodes are for the most part referenced by uname (instead of by id), do you think that in a configuration where a primitive element is named the same as a node, the user can handle that in an efficient manner? (NB: No experience here with ocf:pacemaker:remote :) Cheers, Dejan Best, Vladislav ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
[Linux-HA] Remote node attributes support in crmsh
Hi Kristoffer, do you plan to add support for the recently added remote node attributes feature to crmsh? Currently (at least as of 2.1, and I do not see anything relevant in the git log) crmsh fails to update the CIB if it contains node attributes for a remote (bare-metal) node, complaining that a duplicate element is found. But for bare-metal nodes it is natural to have an ocf:pacemaker:remote resource with a name equal to the remote node uname (I doubt it can be configured differently). If I comment out the check for 'obj_id in id_set', then it fails to update the CIB because it inserts the above primitive definition into the node section. Best, Vladislav ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Remote node attributes support in crmsh
20.10.2014 18:23, Dejan Muhamedagic wrote: Hi Vladislav, Hi Dejan! On Mon, Oct 20, 2014 at 09:03:40AM +0300, Vladislav Bogdanov wrote: Hi Kristoffer, do you plan to add support for the recently added remote node attributes feature to crmsh? Currently (at least as of 2.1, and I do not see anything relevant in the git log) crmsh fails to update the CIB if it contains node attributes for a remote (bare-metal) node, complaining that a duplicate element is found. No wonder :) The uname effectively doubles as an element id. But for bare-metal nodes it is natural to have an ocf:pacemaker:remote resource with a name equal to the remote node uname (I doubt it can be configured differently). Is that required? Didn't look in the code, but seems like yes, the :remote resource name is the only place where pacemaker can obtain that node name. If I comment out the check for 'obj_id in id_set', then it fails to update the CIB because it inserts the above primitive definition into the node section. Could you please show what the CIB would look like with such a remote resource (in crmsh notation).

node 1: node01
node rnode001:remote \
    attributes attr=value
primitive rnode001 ocf:pacemaker:remote \
    params server=192.168.168.20 \
    op monitor interval=10 \
    meta target-role=Started

Best, Vladislav Given that nodes are for the most part referenced by uname (instead of by id), do you think that in a configuration where a primitive element is named the same as a node, the user can handle that in an efficient manner? 
(NB: No experience here with ocf:pacemaker:remote :) Cheers, Dejan Best, Vladislav ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
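The clash being discussed can be seen directly in CIB XML: the remote node entry in the nodes section and the ocf:pacemaker:remote primitive end up sharing the same name. A rough sketch only; the exact ids and the type attribute follow the usual Pacemaker CIB conventions and are assumptions here, not a dump from a real cluster:

```xml
<nodes>
  <node id="1" uname="node01"/>
  <!-- remote node entry: id/uname match the primitive name below -->
  <node id="rnode001" uname="rnode001" type="remote">
    <instance_attributes id="nodes-rnode001">
      <nvpair id="nodes-rnode001-attr" name="attr" value="value"/>
    </instance_attributes>
  </node>
</nodes>
<resources>
  <!-- primitive with the same name "rnode001", the duplicate crmsh complains about -->
  <primitive id="rnode001" class="ocf" provider="pacemaker" type="remote">
    <instance_attributes id="rnode001-params">
      <nvpair id="rnode001-server" name="server" value="192.168.168.20"/>
    </instance_attributes>
  </primitive>
</resources>
```

Since crmsh keys its object model on these names, both elements map to the same key, which is why the duplicate check fires.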
Re: [Linux-ha-dev] crmsh 2.0 released, and moving to Github
26.05.2014 15:01, Kristoffer Grönlund wrote: On Tue, 13 May 2014 11:42:16 +0300 Vladislav Bogdanov bub...@hoster-ok.com wrote: crmsh 2.0 as released unfortunately does not support rules in attribute lists. However, I am working on this specific feature right now, and it is almost ready to be merged into the mainline development branch. I should have it ready some time this week. Once that is in, I will also release crmsh 2.1, so there will be packages available that support this feature. Awesome! Thank you for the info. Hi again, Unfortunately due to some unrelated changes in crmsh I am not quite ready to release 2.1 just yet, but support for rules in attribute lists has been added to the github master branch now: https://github.com/crmsh/crmsh The release of the new version is coming soon, but until then, it should be possible to build updated rpms for all platforms from source. Thanks Kristoffer! Are there any known deficiencies which may affect operation? Vladislav ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] crmsh 2.0 released, and moving to Github
13.05.2014 11:30, Kristoffer Grönlund wrote: On Tue, 13 May 2014 08:26:27 +0300 Vladislav Bogdanov bub...@hoster-ok.com wrote: Hi Kristoffer, I may be missing something, but anyways. crmsh did not support Using Rules to Control Resource Options (http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_using_rules_to_control_resource_options.html) in the past. Is it supported now, or, if not, do you have plans to implement such support? Hi Vladislav, crmsh 2.0 as released unfortunately does not support rules in attribute lists. However, I am working on this specific feature right now, and it is almost ready to be merged into the mainline development branch. I should have it ready some time this week. Once that is in, I will also release crmsh 2.1, so there will be packages available that support this feature. Awesome! Thank you for the info. The syntax will be something like the following:

primitive mySpecialRsc me:Special \
    params 3: rule #uname eq node1 interface=eth1 \
    params 2: rule #uname eq node2 interface=eth2 port= \
    params 1: interface=eth0 port=

Cheers, Kristoffer Best, Vladislav ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] crmsh 2.0 released, and moving to Github
Hi Kristoffer, I may be missing something, but anyways. crmsh did not support Using Rules to Control Resource Options (http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_using_rules_to_control_resource_options.html) in the past. Is it supported now, or, if not, do you have plans to implement such support? Best, Vladislav ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-HA] drbd/pacemaker multiple tgt targets, portblock, and race conditions (long-ish)
19.11.2013 13:48, Lars Ellenberg wrote: On Wed, Nov 13, 2013 at 09:02:47AM +0300, Vladislav Bogdanov wrote: 13.11.2013 04:46, Jefferson Ogata wrote: ... In practice i ran into failover problems under load almost immediately. Under load, when i would initiate a failover, there was a race condition: the iSCSILogicalUnit RA will take down the LUNs one at a time, waiting for each connection to terminate, and if the initiators reconnect quickly enough, they get pissed off at finding that the target still exists but the LUN they were using no longer does, which is often the case during this transient takedown process. On the initiator, it looks something like this, and it's fatal (here LUN 4 has gone away but the target is still alive, maybe working on disconnecting LUN 3):

Nov 7 07:39:29 s01c kernel: sd 6:0:0:4: [sde] Sense Key : Illegal Request [current]
Nov 7 07:39:29 s01c kernel: sd 6:0:0:4: [sde] Add. Sense: Logical unit not supported
Nov 7 07:39:29 s01c kernel: Buffer I/O error on device sde, logical block 16542656

One solution to this is using the portblock RA to block all initiator In addition I force use of multipath on initiators with no_path_retry=queue ... 1. Lack of support for multiple targets using the same tgt account. This is a problem because the iSCSITarget RA defines the user and the target at the same time. If it allowed multiple targets to use the same user, it wouldn't know when it is safe to delete the user in a stop operation, because some other target might still be using it. To solve this i did two things: first i wrote a new RA that manages a Did I miss it, or did you post it somewhere? Fork on Github and push there, so we can have a look? tgt user; this is instantiated as a clone so it runs along with the tgtd clone. Second i tweaked the iSCSITarget RA so that on start, if incoming_username is defined but incoming_password is not, the RA skips the account creation step and simply binds the new target to incoming_username. 
On stop, it similarly no longer deletes the account if incoming_password is unset. I also had to relax the uniqueness constraint on incoming_username in the RA metadata. 2. Disappearing LUNs during failover cause initiators to blow chunks. For this i used portblock, but had to modify it because the TCP Send-Q would never drain. 3. portblock preventing TCP Send-Q from draining, causing tgtd connections to hang. I modified portblock to reverse the sense of the iptables rules it was adding: instead of blocking traffic from the initiator on the INPUT chain, it now blocks traffic from the target on the OUTPUT chain with a tcp-reset response. With this setup, as soon as portblock goes active, the next packet tgtd attempts to send to a given initiator will get a TCP RST response, causing tgtd to hang up the connection immediately. This configuration allows the connections to terminate promptly under load. I'm not totally satisfied with this workaround. It means acknowledgements of operations tgtd has actually completed never make it back to the initiator. I suspect this could cause problems in some scenarios. I don't think it causes a problem the way i'm using it, with each LUN as backing store for a distinct VM--when the LUN is back up on the other node, the outstanding operations are re-sent by the initiator. Maybe with a clustered filesystem this would cause problems; it certainly would cause problems if the target device were, for example, a tape drive. Maybe only block new incoming connection attempts? 
That may cause issues on the initiator side in some circumstances (IIRC):

* connection is established
* pacemaker fires target move
* target is destroyed, connection breaks (TCP RST is sent to initiator)
* initiator connects again
* target is not available at the iSCSI level (but portals answer either on the old or on the new node) or portals are not available
* initiator *returns error* to an upper layer - this one is important
* target is configured on the other node

I was hit by this, but that was several years ago, so I may miss some details. My experience with IET and LIO shows it is better (safer) to block all iSCSI traffic to the target's portals, in both directions:

* connection is established
* pacemaker fires target move
* both directions are blocked (DROP) on both target nodes
* target is destroyed, connection stays established on the initiator side, just TCP packets time out
* target is configured on the other node (VIPs are moved too)
* firewall rules are removed
* initiator (re)sends request
* target sends RST (?) back - it doesn't have that connection
* initiator reconnects and continues to use the target

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
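As a sketch, the two firewalling approaches discussed in this thread might look like the following iptables rules. This is illustrative only: the VIP address and iSCSI port are placeholders, and the real portblock RA manages its own rules rather than these exact commands.

```sh
# Approach 1 (modified portblock): reject target->initiator traffic on the
# OUTPUT chain with a TCP reset, so tgtd tears down connections immediately.
iptables -I OUTPUT -p tcp --sport 3260 -j REJECT --reject-with tcp-reset

# Approach 2 (block both directions with DROP, as described for IET/LIO):
# the initiator's TCP just times out and retransmits until the target and
# its VIP (placeholder 192.168.168.10) come up on the other node.
iptables -I INPUT  -p tcp -d 192.168.168.10 --dport 3260 -j DROP
iptables -I OUTPUT -p tcp -s 192.168.168.10 --sport 3260 -j DROP
```

The trade-off matches the discussion: the RST variant terminates connections promptly but loses in-flight acknowledgements, while the DROP variant keeps the initiator-side connection nominally alive until the move completes.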
Re: [Linux-HA] iSCSI corruption during interconnect failure with pacemaker+tgt+drbd+protocol C
13.11.2013 06:10, Jefferson Ogata wrote: Here's a problem i don't understand, and i'd like a solution to if possible, or at least i'd like to understand why it's a problem, because i'm clearly not getting something. I have an iSCSI target cluster using CentOS 6.4 with stock pacemaker/CMAN/corosync and tgt, and DRBD 8.4 which i've built from source. Both DRBD and cluster comms use a dedicated crossover link. The target storage is battery-backed RAID. DRBD resources all use protocol C. stonith is configured and working. tgtd write cache is disabled using mode_page in additional_params. This is correctly reported using sdparm --get WCE on initiators. Here's the question: if i am writing from an iSCSI initiator, and i take down the crossover link between the nodes of my cluster, i end up with corrupt data on the target disk. I know this isn't the formal way to test pacemaker failover. Everything's fine if i fence a node or do a manual migration or shutdown. But i don't understand why taking the crossover down results in corrupted write operations. In greater detail, assuming the initiator sends a write request for some block, here's the normal sequence as i understand it: - tgtd receives it and queues it straight for the device backing the LUN (write cache is disabled). - drbd receives it, commits it to disk, sends it to the other node, and waits for an acknowledgement (protocol C). - the remote node receives it, commits it to disk, and sends an acknowledgement. - the initial node receives the drbd acknowledgement, and acknowledges the write to tgtd. - tgtd acknowledges the write to the initiator. Now, suppose an initiator is writing when i take the crossover link down, and pacemaker reacts to the loss in comms by fencing the node with the currently active target. It then brings up the target on the surviving, formerly inactive, node. 
This results in a drbd split brain, since some writes have been queued on the fenced node but never made it to the surviving node, and must be retransmitted by the initiator; once the surviving node becomes active it starts committing these writes to its copy of the mirror. I'm fine with a split brain; i can resolve it by discarding outstanding data on the fenced node. But in practice, the actual written data is lost, and i don't understand why. AFAICS, none of the outstanding writes should have been acknowledged by tgtd on the fenced node, so when the surviving node becomes active, the initiator should simply re-send all of them. But this isn't what happens; instead most of the outstanding writes are lost. No i/o error is reported on the initiator; stuff just vanishes. I'm writing directly to a block device for these tests, so the lost data isn't the result of filesystem corruption; it simply never gets written to the target disk on the survivor. What am i missing? Do you have handlers (fence-peer /usr/lib/drbd/crm-fence-peer.sh; after-resync-target /usr/lib/drbd/crm-unfence-peer.sh;) configured in drbd.conf? ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
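The handler configuration being asked about would look roughly like this in drbd.conf. The handler paths are the ones named in the question; the `fencing` policy line is an assumption (without some fencing policy the fence-peer handler is never invoked, and resource-and-stonith is the usual choice when stonith is configured, as it is here):

```text
resource r0 {
  disk {
    # assumed: required for the fence-peer handler to fire on replication loss
    fencing resource-and-stonith;
  }
  handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```

With resource-and-stonith, DRBD freezes I/O while the peer is being fenced, which is exactly the window in which unreplicated-but-acknowledged writes could otherwise be lost.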
Re: [Linux-HA] drbd/pacemaker multiple tgt targets, portblock, and race conditions (long-ish)
13.11.2013 04:46, Jefferson Ogata wrote: ... In practice i ran into failover problems under load almost immediately. Under load, when i would initiate a failover, there was a race condition: the iSCSILogicalUnit RA will take down the LUNs one at a time, waiting for each connection to terminate, and if the initiators reconnect quickly enough, they get pissed off at finding that the target still exists but the LUN they were using no longer does, which is often the case during this transient takedown process. On the initiator, it looks something like this, and it's fatal (here LUN 4 has gone away but the target is still alive, maybe working on disconnecting LUN 3):

Nov 7 07:39:29 s01c kernel: sd 6:0:0:4: [sde] Sense Key : Illegal Request [current]
Nov 7 07:39:29 s01c kernel: sd 6:0:0:4: [sde] Add. Sense: Logical unit not supported
Nov 7 07:39:29 s01c kernel: Buffer I/O error on device sde, logical block 16542656

One solution to this is using the portblock RA to block all initiator In addition I force use of multipath on initiators with no_path_retry=queue ... 1. Lack of support for multiple targets using the same tgt account. This is a problem because the iSCSITarget RA defines the user and the target at the same time. If it allowed multiple targets to use the same user, it wouldn't know when it is safe to delete the user in a stop operation, because some other target might still be using it. To solve this i did two things: first i wrote a new RA that manages a tgt user; this is instantiated as a clone so it runs along with the tgtd clone. Second i tweaked the iSCSITarget RA so that on start, if incoming_username is defined but incoming_password is not, the RA skips the account creation step and simply binds the new target to incoming_username. On stop, it similarly no longer deletes the account if incoming_password is unset. I also had to relax the uniqueness constraint on incoming_username in the RA metadata. 2. 
Disappearing LUNs during failover cause initiators to blow chunks. For this i used portblock, but had to modify it because the TCP Send-Q would never drain. 3. portblock preventing TCP Send-Q from draining, causing tgtd connections to hang. I modified portblock to reverse the sense of the iptables rules it was adding: instead of blocking traffic from the initiator on the INPUT chain, it now blocks traffic from the target on the OUTPUT chain with a tcp-reset response. With this setup, as soon as portblock goes active, the next packet tgtd attempts to send to a given initiator will get a TCP RST response, causing tgtd to hang up the connection immediately. This configuration allows the connections to terminate promptly under load. I'm not totally satisfied with this workaround. It means acknowledgements of operations tgtd has actually completed never make it back to the initiator. I suspect this could cause problems in some scenarios. I don't think it causes a problem the way i'm using it, with each LUN as backing store for a distinct VM--when the LUN is back up on the other node, the outstanding operations are re-sent by the initiator. Maybe with a clustered filesystem this would cause problems; it certainly would cause problems if the target device were, for example, a tape drive. 4. Insufficient privileges faults in the portblock RA. This was another race condition that occurred because i was using multiple targets, meaning that without a mutex, multiple portblock invocations would be running in parallel during a failover. If you try to run iptables while another iptables is running, you get Resource not available and this was coming back to pacemaker as insufficient privileges. This is simply a bug in the portblock RA; it should have a mutex to prevent parallel iptables invocations. I fixed this by adding an ocf_release_lock_on_exit at the top, and adding an ocf_take_lock for start, stop, monitor, and status operations. 
I'm not sure why more people haven't run into these problems before. I hope it's not that i'm doing things wrong, but rather that few others have earnestly tried to build anything quite like this setup. If anyone out there has set up a similar cluster and *not* had these problems, i'd like to know about it. Meanwhile, if others *have* had these problems, i'd also like to know, especially if they've found alternate solutions.

Can't say about 1, I use IET, it doesn't seem to have that limitation.

2 - I use an alternative home-brew ms RA which blocks (DROP) both input and output for a specified VIP on demote (targets are configured to be bound to those VIPs). I also export one big LUN per target and then set up a clvm VG on top of it (all initiators are in the same, separate cluster).

3 - can't say either, IET is probably not affected.

4 - That is true, iptables doesn't have atomic rules management, so you definitely need a mutex or a dispatcher like firewalld (didn't try it though).
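The mutex fix described in item 4 could be sketched roughly as follows, using the `ocf_take_lock`/`ocf_release_lock_on_exit` helpers that resource-agents' ocf-shellfuncs provides. The lock file path and the exact placement in the RA are assumptions, not the actual patch:

```sh
# near the top of the portblock RA, after sourcing ocf-shellfuncs:
LOCKFILE="${HA_RSCTMP}/portblock.lock"   # assumed lock location
ocf_release_lock_on_exit "$LOCKFILE"     # drop the lock whenever the RA exits

case "$__OCF_ACTION" in
  start|stop|monitor|status)
    # serialize iptables invocations across parallel portblock instances,
    # avoiding the "Resource not available" / insufficient-privileges failures
    ocf_take_lock "$LOCKFILE"
    ;;
esac
```

Releasing on exit rather than per-action keeps the critical section covering every iptables call the action makes.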
Re: [Linux-HA] Xen RA and rebooting
17.09.2013 20:51, Tom Parker wrote: On 09/17/2013 01:13 AM, Vladislav Bogdanov wrote: 14.09.2013 07:28, Tom Parker wrote: Hello All Does anyone know of a good way to prevent pacemaker from declaring a vm dead if it's rebooted from inside the vm. It seems to be detecting the vm as stopped for the brief moment between shutting down and starting up. Often this causes the cluster to have two copies of the same vm if the locks are not set properly (which I have found to be unreliable) one that is managed and one that is abandoned. If anyone has any suggestions or parameters that I should be tweaking that would be appreciated. I use the following in libvirt VM definitions to prevent this:

<on_poweroff>destroy</on_poweroff>
<on_reboot>destroy</on_reboot>
<on_crash>destroy</on_crash>

Vladislav Does this not show as a lot of failed operations? I guess they will clean themselves up after the failure expires. Exactly. And this is much better than data corruption. ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Xen RA and rebooting
14.09.2013 07:28, Tom Parker wrote: Hello All Does anyone know of a good way to prevent pacemaker from declaring a vm dead if it's rebooted from inside the vm. It seems to be detecting the vm as stopped for the brief moment between shutting down and starting up. Often this causes the cluster to have two copies of the same vm if the locks are not set properly (which I have found to be unreliable) one that is managed and one that is abandoned. If anyone has any suggestions or parameters that I should be tweaking that would be appreciated. I use the following in libvirt VM definitions to prevent this:

<on_poweroff>destroy</on_poweroff>
<on_reboot>destroy</on_reboot>
<on_crash>destroy</on_crash>

Vladislav ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Using rules for resource options control question
12.09.2013 11:57, Dejan Muhamedagic wrote: Hi Vladislav, On Wed, Sep 11, 2013 at 02:06:12PM +0300, Vladislav Bogdanov wrote: Hi Dejan, all, Didn't find the way to configure rule-controlled resource options (http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_using_rules_to_control_resource_options.html) in the crmsh manual. Is it implemented and, if yes, how to use it? No, crmsh supports rules only in location constraints. I guess that it shouldn't be such a huge undertaking to support rules for attributes if we only knew how to represent them. Hi Dejan, Do you mean something like (multiple definitions are allowed if all of them have a score, error otherwise; only one of the definitions could have an empty expression)

===
params [score: [expression]] \
    parameters themselves
===

? If it is technically possible to always correctly detect the end of the expression and the beginning of the parameters (I think it is), expression parsing could be reused from the location constraint. ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
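Concretely, a primitive under the proposed scheme might read like this. Purely illustrative, since the syntax was only being discussed at this point; the resource, scores, and expressions are invented for the example:

```text
primitive myRsc ocf:heartbeat:Dummy \
    params 100: #uname eq node1 state=/var/run/dummy-node1 \
    params 1: state=/var/run/dummy-default
```

The higher-scored, expression-carrying attribute list wins on node1, and the single expression-less list acts as the default, which is the semantics of rule-controlled instance attributes in Pacemaker.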
[Linux-HA] Using rules for resource options control question
Hi Dejan, all, Didn't find the way to configure rule-controlled resource options (http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/_using_rules_to_control_resource_options.html) in crmsh manual. Is it implemented and, if yes, how to use it? Thanks, Vladislav ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Max number of resources under Pacemaker ?
04.09.2013 07:16, Andrew Beekhof wrote: On 03/09/2013, at 9:20 PM, Moullé Alain alain.mou...@bull.net wrote: Hello, A simple question: is there a maximum number of resources (let's say simple primitives) that Pacemaker can support, first at configuration of resources via crm, and of course after configuration when Pacemaker has to monitor all the primitives? Simple answer: it depends (more precisely, could we envisage around 500 or 600 primitives, or is it completely mad? ;-) ) (I know it is dependent on node power, CPU, mem, etc., but I'm speaking here only of eventual Pacemaker limitations) There is no inherent limit, the policy engine can cope with many thousands. The CIB is less able to cope - for which batch-limit is useful (to throttle the number of operation updates being thrown at the CIB, which limits its CPU usage). The other limit is local and cluster messaging sizes - once the compressed cib gets too big for either or both transports you can no longer even run 'cibadmin -Q' For IPC, the limit is tuneable via the environment. For corosync, it's high (1Mb) but (I think) only tuneable at compile time. Are there any possibilities/plans to implement partial messages? ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
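As a sketch, the two tunables mentioned above are set in different places: batch-limit is an ordinary cluster property, while the IPC limit is the environment tunable Pacemaker reads at daemon startup (the variable name PCMK_ipc_buffer and the values below are assumptions for illustration):

```sh
# throttle how many operation updates are thrown at the CIB at once
crm configure property batch-limit=30

# raise the IPC buffer for clusters whose compressed CIB grows large;
# typically set in /etc/sysconfig/pacemaker (or /etc/default/pacemaker)
export PCMK_ipc_buffer=2097152   # bytes
```

The corosync-side message size, per the message above, is not runtime-tunable, so on very large CIBs the IPC setting only helps with local tools like cibadmin.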
Re: [Linux-HA] Quick 'death match cycle' question.
03.09.2013 07:04, Digimer wrote: ... To solve problem 1, you can set a delay against one of the nodes. Say you set the fence primitive for node 01 to have 'delay=15'. When node 1 goes to fence node 2, it starts immediately. When node 2 starts to fence node 1, it sees the 15 second delay and pauses. Node 1 will power off node 2 long before node 2 finishes the pause. You can further help this problem by disabling acpid on the nodes. Without it, the power-off signal from the BMC will be nearly instant, shortening up the window where both nodes can initiate a fence. Does anybody know for sure how and *why* it works? I mean, why does disabling the userspace ACPI event reader (which reads just what the kernel sends after hardware events) affect how the hardware behaves? To solve problem 2, simply disable corosync/pacemaker from starting on boot. This way, the fenced node will be (hopefully) back up and running, so you can ssh into it and look at what happened. It won't try to rejoin the cluster though, so no risk of a fence loop. An enhancement to this would be enabling corosync/pacemaker back during the clean shutdown and disabling it after boot. ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
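The delay arrangement described above might look like this in crmsh. Only the delay=15 on node 1's fence primitive comes from the discussion; the agent choice (fence_ipmilan) and the connection parameters are placeholders:

```text
# node 1's fence device carries the delay, so in a dual-fence race
# node 1 shoots first and node 2's attempt is pre-empted
primitive fence-node01 stonith:fence_ipmilan \
    params pcmk_host_list=node01 delay=15 \
        ipaddr=10.0.0.101 login=admin passwd=secret   # placeholders
primitive fence-node02 stonith:fence_ipmilan \
    params pcmk_host_list=node02 \
        ipaddr=10.0.0.102 login=admin passwd=secret   # placeholders
location l-fence-node01 fence-node01 -inf: node01
location l-fence-node02 fence-node02 -inf: node02
```

The location constraints keeping each fence device off the node it kills are the conventional companion to such a setup.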
Re: [Linux-HA] Quick 'death match cycle' question.
03.09.2013 21:45, Digimer wrote: On 03/09/13 14:14, Vladislav Bogdanov wrote: 03.09.2013 07:04, Digimer wrote: ... To solve problem 1, you can set a delay against one of the nodes. Say you set the fence primitive for node 01 to have 'delay=15'. When node 1 goes to fence node 2, it starts immediately. When node 2 starts to fence node 1, it sees the 15 second delay and pauses. Node 1 will power off node 2 long before node 2 finishes the pause. You can further help this problem by disabling acpid on the nodes. Without it, the power-off signal from the BMC will be nearly instant, shortening up the window where both nodes can initiate a fence. Does anybody know for sure how and *why* it works? I mean, why does disabling the userspace ACPI event reader (which reads just what the kernel sends after hardware events) affect how the hardware behaves? Disabling acpid causes, in my experience, the node to instantly power down when it receives a power-button event. How/why this happens is probably buried in the kernel source and/or ACPI definitions. This assumes some kind of back-events, which are not part of ACPI iirc. And the kernel just translates forward ACPI events (bits in a hw port???) to userspace. Interesting enough, how do they do it... To solve problem 2, simply disable corosync/pacemaker from starting on boot. This way, the fenced node will be (hopefully) back up and running, so you can ssh into it and look at what happened. It won't try to rejoin the cluster though, so no risk of a fence loop. An enhancement to this would be enabling corosync/pacemaker back during the clean shutdown and disabling it after boot. That would be a good idea, actually. I like that. ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Quick 'death match cycle' question.
03.09.2013 21:36, Lars Marowsky-Bree wrote: On 2013-09-03T21:14:02, Vladislav Bogdanov bub...@hoster-ok.com wrote: To solve problem 2, simply disable corosync/pacemaker from starting on boot. This way, the fenced node will be (hopefully) back up and running, so you can ssh into it and look at what happened. It won't try to rejoin the cluster though, so no risk of a fence loop. An enhancement to this would be enabling corosync/pacemaker back during the clean shutdown and disabling it after boot. There's something in sbd which does this. See https://github.com/l-mb/sbd/blob/master/man/sbd.8.pod and the -S option. Yes, but I thought it was a no-go with just drbd-replicated disks (the usual case for 2-node clusters). I'm contemplating how to do this in a generic fashion. It is quite straightforward with SysVinit and its emulation with upstart, but could be tricky with native upstart and systemd. Need to investigate... ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
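A minimal sketch of the "disable after boot, re-enable on clean shutdown" idea on a systemd-based node follows. The unit names and hook placement are assumptions; a SysVinit or upstart node would use chkconfig/update-rc.d equivalents, which is the straightforward case mentioned above:

```sh
# run late in boot (e.g. from a one-shot service): after an unexpected
# reboot or fence, the cluster stack stays down for post-mortem access
systemctl disable corosync pacemaker

# run from a clean-shutdown hook only (never reached on crash/fence):
# re-arm autostart so the next orderly boot rejoins the cluster
systemctl enable corosync pacemaker
```

The asymmetry is the whole point: a fenced node comes back quiet, while a node that shut down cleanly rejoins on its own.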
Re: [Linux-ha-dev] crmsh error : cib-bootstrap-options already exist
23.08.2013 16:48, Kristoffer Grönlund wrote: Hi, On Fri, 23 Aug 2013 16:33:28 +0300 Vladislav Bogdanov bub...@hoster-ok.com wrote: No-no, it was before that fix too, at least with 19a3f1e5833c. Should I still try? Ah, in that case, it has not been fixed. No need to try. I will investigate further. I verified that crm_diff produces correct xml diff if I change just one property, so problem should really be in crmsh. ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] crmsh error : cib-bootstrap-options already exist
27.08.2013 19:11, Dejan Muhamedagic wrote: Hi, On Tue, Aug 27, 2013 at 12:06:40PM +0300, Vladislav Bogdanov wrote: 23.08.2013 16:48, Kristoffer Grönlund wrote: Hi, On Fri, 23 Aug 2013 16:33:28 +0300 Vladislav Bogdanov bub...@hoster-ok.com wrote: No-no, it was before that fix too, at least with 19a3f1e5833c. Should I still try? Ah, in that case, it has not been fixed. No need to try. I will investigate further. I verified that crm_diff produces correct xml diff if I change just one property, so problem should really be in crmsh. Yes, just found where it is. The fix will be pushed tomorrow. Yeees! Thank you for info. ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] crmsh error : cib-bootstrap-options already exist
22.08.2013 13:57, Kristoffer Grönlund wrote: Hi Takatoshi-san, On Wed, 21 Aug 2013 13:56:34 +0900 Takatoshi MATSUO matsuo@gmail.com wrote: Hi Kristoffer I reproduced the error with the latest changeset (b5ffd99e). Thank you, with your description I was able to reproduce and create a test case for the problem. I have pushed a workaround for the issue in the crm shell which stops the crm shell from adding comments to the CIB. (changeset e35236439b8e) Kristoffer, Dejan, could you please also look into why I lose the whole rsc_defaults $id=rsc_options section when I do 'crm configure edit' and edit one of the properties in $id=cib-bootstrap-options? pacemaker is 1.1.10, crmsh is the latest tip. Relevant log lines from the cib process are:

Aug 23 08:44:23 mgmt01 crm_verify[5891]: notice: crm_log_args: Invoked: crm_verify -V -p
Aug 23 08:44:24 mgmt01 cibadmin[5897]: notice: crm_log_args: Invoked: cibadmin -p -P
Aug 23 08:44:25 mgmt01 crmd[10180]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Aug 23 08:44:25 mgmt01 cib[10175]: notice: log_cib_diff: cib:diff: Local-only Change: 0.772.1
Aug 23 08:44:25 mgmt01 cib[10175]: notice: cib:diff: -- <nvpair value="100" id="cib-bootstrap-options-default-resource-stickiness"/>
Aug 23 08:44:25 mgmt01 cib[10175]: notice: cib:diff: -- <meta_attributes id="rsc_options">
Aug 23 08:44:25 mgmt01 cib[10175]: notice: cib:diff: -- <nvpair name="allow-migrate" value="false" id="rsc_options-allow-migrate"/>
Aug 23 08:44:25 mgmt01 cib[10175]: notice: cib:diff: -- <nvpair name="failure-timeout" value="10m" id="rsc_options-failure-timeout"/>
Aug 23 08:44:25 mgmt01 cib[10175]: notice: cib:diff: -- <nvpair name="migration-threshold" value="INFINITY" id="rsc_options-migration-threshold"/>
Aug 23 08:44:25 mgmt01 cib[10175]: notice: cib:diff: -- <nvpair name="multiple-active" value="stop_start" id="rsc_options-multiple-active"/>
Aug 23 08:44:25 mgmt01 cib[10175]: notice: cib:diff: -- <nvpair name="priority" value="0" id="rsc_options-priority"/>
Aug 23 08:44:25 mgmt01 cib[10175]: notice: cib:diff: -- </meta_attributes>
Aug 23 08:44:25 mgmt01 cib[10175]: notice: cib:diff: ++ <nvpair name="default-resource-stickiness" value="10" id="cib-bootstrap-options-default-resource-stickiness"/>
Aug 23 08:44:28 mgmt01 crmd[10180]: notice: run_graph: Transition 84 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-28.bz2): Complete
Aug 23 08:44:28 mgmt01 crmd[10180]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Aug 23 08:44:28 mgmt01 pengine[10179]: notice: process_pe_message: Calculated Transition 84: /var/lib/pacemaker/pengine/pe-input-28.bz2

What I edited is default-resource-stickiness, but the whole meta_attributes id="rsc_options" is gone too. Vladislav ___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] crmsh error : cib-bootstrap-options already exist
23.08.2013 16:10, Kristoffer Grönlund wrote:
Hi Vladislav,
On Fri, 23 Aug 2013 11:50:54 +0300 Vladislav Bogdanov bub...@hoster-ok.com wrote:
Kristoffer, Dejan, could you please also look into why I lose the whole rsc_defaults $id=rsc_options section when I do 'crm configure edit' and edit one of the properties in $id=cib-bootstrap-options?
Hm, that is not good. I suspect that this may be a regression that I caused when creating the workaround for the previously reported error.

No-no, it was there before that fix too, at least with 19a3f1e5833c. Should I still try?

I have narrowed the fix to be more precise in the crmsh repository (commit 8a539c209eb0); it would be great if you could try using that version of crmsh instead and see if that solves your issue.

Thank you,
Re: [Linux-HA] Storing arbitrary metadata in the CIB
22.08.2013 15:08, Ferenc Wagner wrote:
Hi, our setup uses some cluster-wide pieces of meta information. Think access control lists for resource instances used by some utilities, or some common configuration data used by the resource agents. Currently this info is stored in local files on the nodes or replicated in each primitive as parameters. I find this suboptimal, as keeping them in sync is a hassle. It is possible to store such stuff in a fake parameter of unmanaged Dummy resources, but that clutters the status output. Can somebody offer some advice in this direction? Or is this idea pure heresy?

You may use meta attributes of any primitives for that. Although crmsh does not like that very much, it can be switched to a relaxed mode.

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Adding node in advance
15.07.2013 12:36, Dejan Muhamedagic wrote:
Hi Vladislav,
On Fri, Jul 12, 2013 at 01:48:34PM +0300, Vladislav Bogdanov wrote:
...
ERROR: 4: invalid object id
According to the w3c recommendation, an id cannot start with a digit. However, I missed that node ids are actually defined as text. The test for node ids is now relaxed. The id test happens only for cli snippets; it is assumed that the XML has already been validated.

Thank you, will test.

Vladislav
[Linux-HA] crm resource restart is broken (d4de3af6dd33)
Hi Dejan,

It seems like resource restart does not work any longer.

# crm resource restart test01-vm
INFO: ordering test01-vm to stop
Traceback (most recent call last):
  File "/usr/sbin/crm", line 44, in <module>
    main.run()
  File "/usr/lib64/python2.6/site-packages/crmsh/main.py", line 442, in run
  File "/usr/lib64/python2.6/site-packages/crmsh/main.py", line 349, in do_work
  File "/usr/lib64/python2.6/site-packages/crmsh/main.py", line 150, in parse_line
  File "/usr/lib64/python2.6/site-packages/crmsh/main.py", line 149, in <lambda>
  File "/usr/lib64/python2.6/site-packages/crmsh/ui.py", line 894, in restart
  File "/usr/lib64/python2.6/site-packages/crmsh/utils.py", line 429, in wait4dc
  File "/usr/lib64/python2.6/site-packages/crmsh/utils.py", line 544, in crm_msec
  File "/usr/lib64/python2.6/re.py", line 137, in match
TypeError: expected string or buffer

Vladislav
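The TypeError in the traceback is exactly what `re.match` raises when handed None instead of a string, so the crm_msec helper was evidently passed an unset timeout value. The sketch below is illustrative only (it is not crmsh's actual implementation, and the default unit is an assumption); it shows the failure mode and a defensive variant:

```python
import re

def crm_msec(t):
    """Convert a time spec like '60s' or '2min' to milliseconds.

    Illustrative sketch: returns -1 for None or unparsable input instead
    of letting re.match raise "TypeError: expected string or buffer",
    which is the crash seen in the traceback above.
    """
    if t is None:  # re.match(pattern, None) raises TypeError
        return -1
    m = re.match(r"\s*(\d+)\s*(ms|msec|s|sec|m|min|h|hr)?\s*$", str(t))
    if not m:
        return -1
    value = int(m.group(1))
    mult = {"ms": 1, "msec": 1, "s": 1000, "sec": 1000,
            "m": 60000, "min": 60000, "h": 3600000, "hr": 3600000}
    # Assumption: a bare number is treated as seconds.
    return value * mult.get(m.group(2) or "s")
```

The fix referenced later in the thread (bb39cce17f20) presumably guards the call site rather than the helper; either way the point is the same: the value must be checked before it reaches `re.match`.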
Re: [Linux-HA] crm resource restart is broken (d4de3af6dd33)
12.07.2013 12:06, Vladislav Bogdanov wrote:
Hi Dejan, it seems like resource restart does not work any longer.

Ah, this seems to be fixed by bb39cce17f20. Sorry for the noise.
Re: [Linux-HA] crm node delete
01.07.2013 17:29, Vladislav Bogdanov wrote:
Hi, I'm trying to see if it is now safe to delete non-running nodes (corosync 2.3, pacemaker HEAD, crmsh tip).

# crm node delete v02-d
WARNING: 2: crm_node bad format: 7 v02-c
WARNING: 2: crm_node bad format: 8 v02-d
WARNING: 2: crm_node bad format: 5 v02-a
WARNING: 2: crm_node bad format: 6 v02-b
INFO: 2: node v02-d not found by crm_node
INFO: 2: node v02-d deleted
#

So I expect that crmsh still doesn't follow the latest changes to 'crm_node -l', although the node seems to be deleted correctly. For reference, the output of crm_node -l is:

7 v02-c
8 v02-d
5 v02-a
6 v02-b

With the latest merge of Andrew's public and private trees and crmsh tip, everything works as expected. The only (minor but confusing) issue is:

[root@vd01-a ~]# crm_node -l
3 vd01-c
4 vd01-d
1 vd01-a
2 vd01-b
[root@vd01-a ~]# crm_node -p
vd01-c vd01-a vd01-b
[root@vd01-a ~]# crm node delete vd01-d
WARNING: crm_node --force -R vd01-d failed, rc=1

Looks like a missing crm_exit(pcmk_ok) for -R in try_corosync().
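The "crm_node bad format" warnings come from crmsh expecting three fields per `crm_node -l` line (id, name, state) while the newer crm_node prints only two (id, name). A tolerant parser might look like this (a sketch, not crmsh's actual code; the "unknown" placeholder is my own):

```python
def parse_crm_node_list(output):
    """Parse 'crm_node -l' output into a {name: state} dict.

    Accepts both the old three-field form ('<id> <name> <state>') and
    the newer two-field form ('<id> <name>'), treating a missing state
    as unknown rather than a format error.
    """
    nodes = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or truly malformed lines
        name = fields[1]
        state = fields[2] if len(fields) > 2 else "unknown"
        nodes[name] = state
    return nodes
```

With the two-field output quoted above, the parser simply records every node with an unknown state instead of emitting a warning per line.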
[Linux-HA] Adding node in advance
Hi,

I wanted to add a new node into the CIB in advance, before it is powered on (to power it on in standby mode while cl#5169 is not implemented). So I did:

==
[root@vd01-a tmp]# cat u
node $id=4 vd01-d \
    attributes standby=on virtualization=true
[root@vd01-a tmp]# crm configure load update u
ERROR: 4: invalid object id
==

Exactly the same syntax is accepted for an already-known node. This is corosync-2.3.1 with nodelist/udpu, pacemaker master and crmsh tip.

Vladislav
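The rejection illustrates the rule Dejan cites in his reply: a w3c XML id (an NCName) may not start with a digit, while pacemaker node ids are plain text, so a purely numeric corosync node id like "4" is legitimate. A quick sketch of the two checks (simplified ASCII NCName; helper names are my own):

```python
import re

# Simplified ASCII subset of the NCName production:
# must start with a letter or underscore.
NCNAME_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_.\-]*$")

def valid_xml_id(s):
    """True if s would pass the strict w3c-style id check."""
    return bool(NCNAME_RE.match(s))

def valid_node_id(s):
    """Relaxed rule for pacemaker node ids: any non-empty text,
    so numeric corosync node ids like '4' are accepted."""
    return bool(s)
```

Under the strict check '4' fails while 'vd01-d' passes; the relaxed check accepts both, which is what the later crmsh fix implements for node ids in cli snippets.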
Re: [Linux-HA] crm node delete
10.07.2013 18:14, Dejan Muhamedagic wrote:
...
[root@v02-b ~]# crm node delete v02-d
ERROR: according to crm_node, node v02-d is still active
You can now:
# crm --force node delete ...

Thanks, Vladislav
Re: [Linux-HA] crm node delete
03.07.2013 19:31, Dejan Muhamedagic wrote:
On Tue, Jul 02, 2013 at 07:53:52AM +0300, Vladislav Bogdanov wrote:
01.07.2013 18:29, Dejan Muhamedagic wrote:
Hi,
On Mon, Jul 01, 2013 at 05:29:31PM +0300, Vladislav Bogdanov wrote:
...
This time the node state was empty. Or it's missing altogether; I'm not sure how that's supposed to be interpreted. We test the output of crm_node -l just to make sure that the node is not online. Perhaps we need to use some other command.
Likely it shows everything from a corosync nodelist. After I deleted the node from everywhere except corosync, the list is still the same.
OK. This patch changes the interface to crm_node to use the list partition option (-p). Could you please test it?

Nope. Not enough. Even worse than before. I tested today's tip, as it includes that patch, with the merge of Andrew's public and private master heads.

=
[root@v02-b ~]# crm node show
v02-a(5): normal
    standby: off
    virtualization: true
    $id: nodes-5
v02-b(6): normal
    standby: off
    virtualization: true
v02-c(7): normal
    standby: off
    virtualization: true
v02-d(8): normal(offline)
    standby: off
    virtualization: true
[root@v02-b ~]# crm node delete v02-d
ERROR: according to crm_node, node v02-d is still active
[root@v02-b ~]# crm_node -p
v02-c v02-d v02-a v02-b
[root@v02-b ~]# crm_node -l
7 v02-c
8 v02-d
5 v02-a
6 v02-b
[root@v02-b ~]#
=

That is after I stopped the node, lowered the votequorum expected_votes (with corosync-quorumtool) and deleted v02-d from the cmap nodelist. corosync-cmapctl still shows runtime info about the deleted node as well:

runtime.totem.pg.mrp.srp.members.8.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.8.ip (str) = r(0) ip(10.5.4.55)
runtime.totem.pg.mrp.srp.members.8.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.8.status (str) = left

And it is not allowed to delete those keys. crm_node -R did the job (nothing left in the CIB), but v02-d still appears in its output for both -p and -l.

Andrew, I copy you directly because the above is probably for you. Shouldn't crm_node somehow show that a stopped node is deleted from the corosync nodelist?

Also, for some reason one node (v02-c) still had expected_votes set to 4, while the other two remaining nodes had it set to the correct 3. That is of course another story and needs additional investigation. Maybe I just missed something.

Best, Vladislav
Re: [Linux-HA] crm node delete
10.07.2013 03:39, Andrew Beekhof wrote:
On 10/07/2013, at 1:51 AM, Vladislav Bogdanov bub...@hoster-ok.com wrote:
...
Shouldn't crm_node somehow show that a stopped node is deleted from the corosync nodelist?
Which stack is this?

corosync 2.3 with nodelist and udpu.
Re: [Linux-HA] crm node delete
10.07.2013 07:05, Andrew Beekhof wrote:
On 10/07/2013, at 2:04 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote:
...
corosync 2.3 with nodelist and udpu.
I assume it's possible, but crm_node isn't smart enough to do that yet. Feel like writing a patch? :)

Shouldn't it just skip offline nodes for -p?
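The suggested `-p` behaviour — list only partition members that are actually online — amounts to intersecting the names corosync knows about with the crmd's view of node state. A sketch of that filtering with illustrative data structures (this is not crm_node's real internal API):

```python
def partition_members(corosync_members, crmd_state):
    """Return the names a fixed 'crm_node -p' would print: nodes known
    to corosync whose cluster state is 'member', i.e. skipping nodes
    that are offline or have left.

    corosync_members: iterable of node names from the corosync nodelist.
    crmd_state: dict of name -> state as reported by the crmd.
    """
    return [n for n in corosync_members
            if crmd_state.get(n) == "member"]
```

Applied to the v02-d example above, the stopped node would be filtered out even though it still lingers in the corosync nodelist — which is precisely the two-source lookup (corosync plus crmd) discussed below.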
Re: [Linux-HA] crm node delete
10.07.2013 08:13, Andrew Beekhof wrote:
On 10/07/2013, at 2:15 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote:
...
Shouldn't it just skip offline nodes for -p?
Worse. It appears to be asking pacemakerd instead of corosync or crmd.

Hm. I do not believe I'm able to refactor it then...
Re: [Linux-HA] crm node delete
10.07.2013 08:38, Andrew Beekhof wrote:
On 10/07/2013, at 3:37 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote:
...
Hm. I do not believe I'm able to refactor it then...
Yeah, I'm looking at it. The hard part is that going to corosync directly only gives you a nodeid :-(

Don't you need to get info from both sources anyway (offline in the crmd case, and joined in the corosync case - the node has corosync started, but pacemaker is not)?
Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)
04.07.2013 19:09, Dejan Muhamedagic wrote:
On Thu, Jul 04, 2013 at 05:40:07PM +0300, Vladislav Bogdanov wrote:
04.07.2013 17:25, Dejan Muhamedagic wrote:
On Wed, Jul 03, 2013 at 04:33:20PM +0300, Vladislav Bogdanov wrote:
03.07.2013 15:43, Dejan Muhamedagic wrote:
Hi Lars,
On Wed, Jul 03, 2013 at 12:05:17PM +0200, Lars Marowsky-Bree wrote:
On 2013-07-03T10:26:09, Dejan Muhamedagic deja...@fastmail.fm wrote:

Not sure that is expected by most people. How do you then delete attributes? Tough call :) Ideas welcome.

Set them to an empty string, or a magic #undef value.

It's not only for the nodes. Attributes of resources should be merged as well. Perhaps introduce another load method, say merge, which would merge attributes of elements instead of replacing them. Though usage would then get more complex (which seems to be justified here).

Well, that leaves open the question of how higher-level objects (primitives, clones, groups, constraints ...) would be affected/deleted. I'm not sure the complexity is really worth it. Merge rules get *really* complex, quickly. And eventually one ends up needing to annotate the input with how one wants a merge to be resolved (such as #undef values).

Perhaps I misunderstood the original intention, but the idea was more simple:

primitive r1 params p1=v1 p2=v2 meta m1=mv1
primitive r1 params p1=nv1 p3=v3
-- merge -->
primitive r1 params p1=nv1 p2=v2 p3=v3 meta m1=mv1

If the attribute already exists, then it is overwritten. The existing attributes are otherwise left intact. New attributes are added.

I'd simplify that logic to sections.

Must say that it seems to me simpler and easier to grasp at the attribute level. Besides, code for that already exists, so it would reduce effort and code size/complexity :) Of course, I can also see some use cases for attribute-set level operations.

Ok, you know it much better ;)

* node attributes (except pacemaker internal ones like $id?)
* node utilization
* primitive params
* primitive meta
* primitive utilization
* clone/ms meta

If the whole section is missing in the update, then leave it as-is. Otherwise (also if it exists but is empty) replace the whole section. The only unclear thing is 'op', but this one can be replaced unconditionally (like in the current logic).

I guess that it can be merged just like any other set of attributes. Note that the user is free to specify the operation fully if they really want to replace it completely.

The only question is how to remove existing attributes.

Not many choices here, I think... Either set to empty or, better, to a predefined value (an empty value may still be valid and used to override a non-empty default) like Lars suggested

Yes, and that would be up to the users.

or use some additional formatting ( -param_name ). The second way probably requires a new load method.

That's the one I'd be interested in, but for now most of the possibilities seem to come from the kludge domain :)

You may also look at ldif(5) (part of openldap) to see how this is solved in the LDAP world. Maybe that could give some valuable pointers (although I do not see how to apply that directly). There is a trick to replace only one value from a set (or to ensure that the record's virtual ID is not modified) - use delete/add instead of replace.

BTW, did you ever try the configure filter command?

Hm.. Not yet :) But how can that help?

filter is to edit what sed is to ed. It got actually introduced so that we can do automatic regression testing of the edit command. You could conceivably, depending on your use case, produce a script to get what you need. Just not sure how difficult it would be to get the processing right.

Thank you, but this is too complicated for my case.
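The attribute-level merge discussed above is simple to state precisely: a section missing from the update is kept as-is; inside a present section, attributes named in the update overwrite existing ones, the rest are kept, and new ones are added. A sketch under those rules (the section names come from the thread; nothing here is crmsh's real code):

```python
def merge_sections(current, update):
    """Merge an update into a resource definition at the attribute level.

    current/update: dicts mapping a section name ('params', 'meta',
    'utilization', ...) to a dict of attributes. Sections absent from
    the update are left untouched; within a section present in the
    update, existing attributes are overwritten and new ones added.
    """
    merged = {sec: dict(attrs) for sec, attrs in current.items()}
    for sec, attrs in update.items():
        merged.setdefault(sec, {}).update(attrs)
    return merged

# The thread's example:
#   primitive r1 params p1=v1 p2=v2 meta m1=mv1
#   primitive r1 params p1=nv1 p3=v3
r1 = {"params": {"p1": "v1", "p2": "v2"}, "meta": {"m1": "mv1"}}
upd = {"params": {"p1": "nv1", "p3": "v3"}}
```

Merging `upd` into `r1` yields params p1=nv1 p2=v2 p3=v3 with meta m1=mv1 untouched, matching the example in the thread; deletion of attributes is the open question, which is what the #undef / -param_name proposals address.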
Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)
05.07.2013 14:38, Dejan Muhamedagic wrote: On Fri, Jul 05, 2013 at 09:31:07AM +0300, Vladislav Bogdanov wrote: 04.07.2013 19:09, Dejan Muhamedagic wrote: On Thu, Jul 04, 2013 at 05:40:07PM +0300, Vladislav Bogdanov wrote: 04.07.2013 17:25, Dejan Muhamedagic wrote: On Wed, Jul 03, 2013 at 04:33:20PM +0300, Vladislav Bogdanov wrote: 03.07.2013 15:43, Dejan Muhamedagic wrote: Hi Lars, On Wed, Jul 03, 2013 at 12:05:17PM +0200, Lars Marowsky-Bree wrote: On 2013-07-03T10:26:09, Dejan Muhamedagic deja...@fastmail.fm wrote: Not sure that is expected by most people. How you then delete attributes? Tough call :) Ideas welcome. Set them to an empty string, or a magic #undef value. It's not only for the nodes. Attributes of resources should be merged as well. Perhaps to introduce another load method, say merge, which would merge attributes of elements instead of replacing them. Though the use would then get more complex (which seems to be justified here). Well, that leaves open the question of how higher-level objects (primitives, clones, groups, constraints ...) would be affected/deleted. I'm not sure the complexity is really worth it. Merge rules get *really* complex, quickly. And eventually, one ends with the need to annotate the input with how one wants a merge to be resolved (such as #undef values). Perhaps I misunderstood the original intention, but the idea was more simple: primitive r1 params p1=v1 p2=v2 meta m1=mv1 primitive r1 params p1=nv1 p3=v3 - merge --- primitive r1 params p1=nv1 p2=v2 p3=v3 meta m1=mv1 If the attribute already exists, then it is overwritten. The existing attributes are otherwise left intact. New attributes are added. I'd simplify that logic to sections. Must say that it seems to me simpler and easier to grasp on the attribute level. Besides, code for that already exists, so it would reduce effort and code size/complexity :) Of course, I can also see some use cases for the attribute-set level operations. 
Ok, you know it much better ;) * node attributes (except pacemaker internal ones like $id?) * node utilization * primitive params * primitive meta * primitive utilization * clone/ms meta If whole section is missing in the update, then leave it as-is. Otherwise (also if it exists but empty) replace the whole section. The only unclear thing is 'op', but this one can be replaced unconditionally (like in the current logic). I guess that it can be merged just like any other set of attributes. Note that the user is free to specify the operation fully if they really want to replace it completely. The only question is how to remove existing attributes. Not many choices here I think... Either set to empty or better predefined value (empty value may be still valid and used to override not-empty default one) like Lars suggested Yes, and that would be up to the users. or use some additional formatting ( -param_name ). Second way probably requires new load method. That's the one I'd be interested in, but for now most of the possibilities seem to come from the kludge domain :) You may also look at ldif(5) (part of openldap) to see how this is solved in the LDAP world. May be that could give some valuable pointers (although I do not see how apply that directly). There are trick to replace only one value from a set (or to ensure that records virtual ID is not modified) - use delete/add instead of replace. Yes, something similar crossed my mind, but I wanted to avoid too much verbosity. Understand. May be it is possible to start with just not touch the whole section (like params, meta, utilization, node attributes) if it does not exist in the update, or if it contains just pre-defined value (f.e. #keep)? Ughm... What if to introduce *optional* replacement policy for the whole section? I mean: params #merge param1=value1 param2=value2 meta #replace ... utilization #keep and so on. With default to #replace? BTW, did you ever try the configure filter command? Hm.. 
Not yet :) But how can that help? filter is to edit what sed is to ed. It actually got introduced so that we can do automatic regression testing of the edit command. You could conceivably, depending on your use case, produce a script to get what you need. Just not sure how difficult it would be to get the processing right. Thank you, but this is too complicated for my case. It seems to be too complicated for most uses :( Though something similar with more use potential could most probably be designed. Thanks, Dejan ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
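The attribute-level merge semantics discussed in this thread (an attribute present in the update overwrites the old value, untouched attributes are kept, new ones are added) can be sketched as a plain dict merge. This is an illustration of the proposed behavior only, not crmsh code; merge_attrs is a hypothetical name:

```python
def merge_attrs(current, update):
    """Merge 'update' into 'current': existing keys are overwritten,
    keys absent from the update are left intact, new keys are added."""
    merged = dict(current)
    merged.update(update)
    return merged

# The r1 example from the thread:
current = {"p1": "v1", "p2": "v2"}
update = {"p1": "nv1", "p3": "v3"}
print(merge_attrs(current, update))  # {'p1': 'nv1', 'p2': 'v2', 'p3': 'v3'}
```

This matches the "primitive r1" example above: p1 is overwritten, p2 survives, p3 is added.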
Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)
05.07.2013 16:25, Vladislav Bogdanov wrote: 05.07.2013 14:38, Dejan Muhamedagic wrote: On Fri, Jul 05, 2013 at 09:31:07AM +0300, Vladislav Bogdanov wrote: 04.07.2013 19:09, Dejan Muhamedagic wrote: On Thu, Jul 04, 2013 at 05:40:07PM +0300, Vladislav Bogdanov wrote: 04.07.2013 17:25, Dejan Muhamedagic wrote: On Wed, Jul 03, 2013 at 04:33:20PM +0300, Vladislav Bogdanov wrote: 03.07.2013 15:43, Dejan Muhamedagic wrote: Hi Lars, On Wed, Jul 03, 2013 at 12:05:17PM +0200, Lars Marowsky-Bree wrote: On 2013-07-03T10:26:09, Dejan Muhamedagic deja...@fastmail.fm wrote: Not sure that is expected by most people. How you then delete attributes? Tough call :) Ideas welcome. Set them to an empty string, or a magic #undef value. It's not only for the nodes. Attributes of resources should be merged as well. Perhaps to introduce another load method, say merge, which would merge attributes of elements instead of replacing them. Though the use would then get more complex (which seems to be justified here). Well, that leaves open the question of how higher-level objects (primitives, clones, groups, constraints ...) would be affected/deleted. I'm not sure the complexity is really worth it. Merge rules get *really* complex, quickly. And eventually, one ends with the need to annotate the input with how one wants a merge to be resolved (such as #undef values). Perhaps I misunderstood the original intention, but the idea was more simple: primitive r1 params p1=v1 p2=v2 meta m1=mv1 primitive r1 params p1=nv1 p3=v3 - merge --- primitive r1 params p1=nv1 p2=v2 p3=v3 meta m1=mv1 If the attribute already exists, then it is overwritten. The existing attributes are otherwise left intact. New attributes are added. I'd simplify that logic to sections. Must say that it seems to me simpler and easier to grasp on the attribute level. 
Besides, code for that already exists, so it would reduce effort and code size/complexity :) Of course, I can also see some use cases for the attribute-set level operations. Ok, you know it much better ;) * node attributes (except pacemaker internal ones like $id?) * node utilization * primitive params * primitive meta * primitive utilization * clone/ms meta If whole section is missing in the update, then leave it as-is. Otherwise (also if it exists but empty) replace the whole section. The only unclear thing is 'op', but this one can be replaced unconditionally (like in the current logic). I guess that it can be merged just like any other set of attributes. Note that the user is free to specify the operation fully if they really want to replace it completely. The only question is how to remove existing attributes. Not many choices here I think... Either set to empty or better predefined value (empty value may be still valid and used to override not-empty default one) like Lars suggested Yes, and that would be up to the users. or use some additional formatting ( -param_name ). Second way probably requires new load method. That's the one I'd be interested in, but for now most of the possibilities seem to come from the kludge domain :) You may also look at ldif(5) (part of openldap) to see how this is solved in the LDAP world. May be that could give some valuable pointers (although I do not see how apply that directly). There are trick to replace only one value from a set (or to ensure that records virtual ID is not modified) - use delete/add instead of replace. Yes, something similar crossed my mind, but I wanted to avoid too much verbosity. Understand. May be it is possible to start with just not touch the whole section (like params, meta, utilization, node attributes) if it does not exist in the update, or if it contains just pre-defined value (f.e. #keep)? Ughm... What if to introduce *optional* replacement policy for the whole section? 
I mean: params #merge param1=value1 param2=value2 meta #replace ... utilization #keep and so on. With default to #replace? Even more. If we allow such meta lexemes anywhere (not only at the very beginning), then they may be applied only to the rest of the string (or up to the next meta lexeme). The best thing I see about this idea is that it is fully backwards compatible. BTW, did you ever try the configure filter command? Hm.. Not yet :) But how can that help? filter is to edit what sed is to ed. It actually got introduced so that we can do automatic regression testing of the edit command. You could conceivably, depending on your use case, produce a script to get what you need. Just not sure how difficult it would be to get the processing right. Thank you, but this is too complicated for my case. It seems to be too complicated for most uses :( Though something similar with more use potential could most probably be designed. Thanks, Dejan
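The proposed #merge/#replace/#keep section lexemes could behave roughly as follows. This is a toy model of the proposal, not an implementation; apply_section is a hypothetical name, and #replace is assumed to be the default as suggested above:

```python
def apply_section(policy, current, update):
    """Apply one section update according to the proposed policy lexeme.
    Assumed semantics: '#keep' ignores the update, '#merge' overlays the
    update on the old values, anything else (the default) replaces the
    section wholesale."""
    if policy == "#keep":
        return dict(current)
    if policy == "#merge":
        merged = dict(current)
        merged.update(update)
        return merged
    # default, i.e. "#replace"
    return dict(update)

# utilization #keep -> dynamic attributes set by an RA survive the update
print(apply_section("#keep", {"cpu": "4", "memory": "8192"}, {}))
```

With this model, the "utilization #keep" case from the thread would preserve RA-managed node utilization attributes even when the update omits them.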
Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)
05.07.2013 19:46, Lars Marowsky-Bree wrote: On 2013-07-05T19:06:54, Vladislav Bogdanov bub...@hoster-ok.com wrote: params #merge param1=value1 param2=value2 meta #replace ... utilization #keep and so on. With default to #replace? Even more. If we allow such meta lexemes anywhere (not only at the very beginning), then they may be applied only to the rest of the string (or up to the next meta lexeme). The best thing I see about this idea is that it is fully backwards compatible. From a language aesthetics point of view, this gives me the utter creeps. Don't make me switch to pcs! ;-) ;) I could live with a proper merge/update, replace command as a prefix, just like we now have delete, though. Similar to what we do for groups. delete is a command, not a configuration syntax part. What I propose is a fully optional hint-like language extension, defaulting to the current behavior. What I'm interested in myself is just the #keep part, but, yes, I prefer complete general solutions.
Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)
03.07.2013 02:24, Andrew Beekhof wrote: ... I don't even know what I'm thinking half the time, I'd not recommend trying to guess :) No fundamental objection to such a feature, but I'd be reluctant to add it until we get an attrd that was truly atomic. That code is mostly bandages and sticky tape. I filed cl#5165 for that.
Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)
03.07.2013 16:28, Vladislav Bogdanov wrote: ... So I'd probably just hack crmsh to not touch node utilization attributes if the whole 'utilization' part is missing in the update. Unfortunately this doesn't seem to be possible with my python programming level (near zero)... :( It is clear to me that I need to conditionally redirect calls from the 'replace' family of methods to newly-implemented 'merge' ones, but I do not like adding hacks (and do not see how to do that) to the otherwise generic code, and I think that only a general reevaluation of the concept may help to do that properly. S.O.S. :)
Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)
04.07.2013 17:25, Dejan Muhamedagic wrote: On Wed, Jul 03, 2013 at 04:33:20PM +0300, Vladislav Bogdanov wrote: 03.07.2013 15:43, Dejan Muhamedagic wrote: Hi Lars, On Wed, Jul 03, 2013 at 12:05:17PM +0200, Lars Marowsky-Bree wrote: On 2013-07-03T10:26:09, Dejan Muhamedagic deja...@fastmail.fm wrote: Not sure that is expected by most people. How you then delete attributes? Tough call :) Ideas welcome. Set them to an empty string, or a magic #undef value. It's not only for the nodes. Attributes of resources should be merged as well. Perhaps to introduce another load method, say merge, which would merge attributes of elements instead of replacing them. Though the use would then get more complex (which seems to be justified here). Well, that leaves open the question of how higher-level objects (primitives, clones, groups, constraints ...) would be affected/deleted. I'm not sure the complexity is really worth it. Merge rules get *really* complex, quickly. And eventually, one ends with the need to annotate the input with how one wants a merge to be resolved (such as #undef values). Perhaps I misunderstood the original intention, but the idea was more simple: primitive r1 params p1=v1 p2=v2 meta m1=mv1 primitive r1 params p1=nv1 p3=v3 - merge --- primitive r1 params p1=nv1 p2=v2 p3=v3 meta m1=mv1 If the attribute already exists, then it is overwritten. The existing attributes are otherwise left intact. New attributes are added. I'd simplify that logic to sections. Must say that it seems to me simpler and easier to grasp on the attribute level. Besides, code for that already exists, so it would reduce effort and code size/complexity :) Of course, I can also see some use cases for the attribute-set level operations. Ok, you know it much better ;) * node attributes (except pacemaker internal ones like $id?) * node utilization * primitive params * primitive meta * primitive utilization * clone/ms meta If whole section is missing in the update, then leave it as-is. 
Otherwise (also if it exists but empty) replace the whole section. The only unclear thing is 'op', but this one can be replaced unconditionally (like in the current logic). I guess that it can be merged just like any other set of attributes. Note that the user is free to specify the operation fully if they really want to replace it completely. The only question is how to remove existing attributes. Not many choices here I think... Either set to empty or better predefined value (empty value may be still valid and used to override not-empty default one) like Lars suggested or use some additional formatting ( -param_name ). Second way probably requires new load method. BTW, did you ever try the configure filter command? Hm.. Not yet :) But how that can help? Vladislav
Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)
04.07.2013 17:40, Vladislav Bogdanov wrote: ... The only question is how to remove existing attributes. Another one is how to forcibly replace the whole section or the whole object definition, without caring about its original content.
Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)
03.07.2013 13:00, Lars Marowsky-Bree wrote: On 2013-07-03T00:20:19, Vladislav Bogdanov bub...@hoster-ok.com wrote: I do not edit them. In my setup I generate the full crm config with a template-based framework. And then you do a load/replace? Tough; yes, that'll clearly overwrite Actually 'load update'. 'replace' doesn't work when resources are running. what is already there and added by scripts that more dynamically modify the CIB. Since we don't know your complete merging rules, it's probably easier if your template engine gains hooks to first read the CIB for setting those utilization values. Probably. But not the template framework itself (it is a combination of make and m4 actually, so it is too stupid to look up the CIB). So I'd need to move that onto the next model level (human or controlling framework, which I'm in the process of implementing) - but that is actually what I wanted to avoid (it breaks the whole idea). So I'd probably just hack crmsh to not touch node utilization attributes if the whole 'utilization' part is missing in the update. If/when pacemaker has support for transient utilization attributes, I will move to that. That is a very convenient way to, e.g., stop a dozen resources in one shot for some maintenance. I have a special RA which creates a ticket on cluster start and deletes it on cluster stop. And many resources may depend on that ticket. If you request the resource handled by that RA to stop, the ticket is revoked and all dependent resources stop. I wouldn't have written that RA if I had cluster-wide attributes (which behave like node attributes but for the whole cluster). Right. But. Tickets *are* cluster wide attributes that are meant to control the target-role of many resources depending on them. So you're getting exactly what you need, no? What is missing? They are volatile. And, I'd prefer cluster attributes to have free-form values. I was already hit by the fact that the two-state 'granted/revoked' value is too limited for me.
I then expanded the logic to also use the 'non-existent' ticket state (it worked for some time), but then support for active/standby came in and I switched to that. That all was in the lustre-server RA, which needs to control the order in which parts of a whole lustre fs are tuned/activated when it moves to another cluster on a ticket revocation. I use an additional internally-controlled ticket there. Best, Vladislav
Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)
03.07.2013 15:43, Dejan Muhamedagic wrote: Hi Lars, On Wed, Jul 03, 2013 at 12:05:17PM +0200, Lars Marowsky-Bree wrote: On 2013-07-03T10:26:09, Dejan Muhamedagic deja...@fastmail.fm wrote: Not sure that is expected by most people. How do you then delete attributes? Tough call :) Ideas welcome. Set them to an empty string, or a magic #undef value. It's not only for the nodes. Attributes of resources should be merged as well. Perhaps introduce another load method, say merge, which would merge attributes of elements instead of replacing them. Though the usage would then get more complex (which seems to be justified here). Well, that leaves open the question of how higher-level objects (primitives, clones, groups, constraints ...) would be affected/deleted. I'm not sure the complexity is really worth it. Merge rules get *really* complex, quickly. And eventually, one ends with the need to annotate the input with how one wants a merge to be resolved (such as #undef values). Perhaps I misunderstood the original intention, but the idea was simpler:

primitive r1 params p1=v1 p2=v2 meta m1=mv1
primitive r1 params p1=nv1 p3=v3
-- merge -->
primitive r1 params p1=nv1 p2=v2 p3=v3 meta m1=mv1

If the attribute already exists, then it is overwritten. The existing attributes are otherwise left intact. New attributes are added. I'd simplify that logic to sections.

* node attributes (except pacemaker internal ones like $id?)
* node utilization
* primitive params
* primitive meta
* primitive utilization
* clone/ms meta

If the whole section is missing in the update, then leave it as-is. Otherwise (also if it exists but is empty) replace the whole section. The only unclear thing is 'op', but this one can be replaced unconditionally (like in the current logic). Vladislav
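Vladislav's section-level rule (a section missing from the update is left alone; a section present in the update, even empty, replaces the old one wholesale) can be modeled in a few lines. Names here are illustrative, not crmsh internals:

```python
def update_object(current, update):
    """Section-level rule from the thread: a section absent from the
    update is kept as-is; a section present in the update (even if
    empty) replaces the old one wholesale."""
    result = {name: dict(attrs) for name, attrs in current.items()}
    for section, attrs in update.items():
        result[section] = dict(attrs)
    return result

current = {"params": {"p1": "v1"}, "utilization": {"cpu": "4"}}
update = {"params": {}}  # 'utilization' omitted -> kept; 'params' present but empty -> wiped
print(update_object(current, update))  # {'params': {}, 'utilization': {'cpu': '4'}}
```

This is exactly the behavior that would let RA-managed utilization attributes survive a 'load update' that simply omits the utilization section.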
Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)
28.06.2013 17:47, Dejan Muhamedagic wrote: ... If you want to test here's a new patch. It does work with unrelated changes happening in the meantime. I didn't yet test really concurrent updates. One thing I see immediately is that node utilization attributes are deleted after I do 'load update' with empty node utilization sections. That is probably not specific to this patch. I have those attributes dynamic, set from an RA (as node configuration may vary, I prefer to detect how much CPU and RAM I have and set utilization accordingly rather than put every hardware change into the CIB). On the one hand, I would agree that crmsh does what is intended - if no utilization attributes are set in a config update, then they should be removed. On the other, I would prefer to delete node utilization attributes on update only if the new definition contains a 'utilization' section but without those attributes. Or maybe it is possible to use transient utilization attributes? I don't think so... Ugh, that would be nice. Everything else works fine; I was able to do:

# crm configure
crm(live)configure# edit
(edit the target-role attribute on a clone; in another shell: crm resource stop...)
crm(live)configure# commit

And that didn't produce any errors. Both actions completed correctly. Vladislav
Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)
02.07.2013 12:27, Lars Marowsky-Bree wrote: On 2013-07-02T11:05:01, Vladislav Bogdanov bub...@hoster-ok.com wrote: One thing I see immediately is that node utilization attributes are deleted after I do 'load update' with empty node utilization sections. That is probably not specific to this patch. Yes, that isn't specific to that. I have those attributes dynamic, set from an RA (as node configuration may vary, I prefer to detect how much CPU and RAM I have and set utilization accordingly rather than put every hardware change into the CIB). Or maybe it is possible to use transient utilization attributes? I don't think so... Ugh, that would be nice. Yes, that's exactly what you need here. I know, but I do not expect that to be implemented soon. Together with cluster-wide attributes, for which I use a hack with tickets now. But tickets currently are quite limited - they have only 4 states, so it is impossible to put e.g. a number there. I fully understand Andrew's point when he is unwilling to implement features for just two setups, so... Probably I need to extend crmsh with a site-specific patch until that is implemented. That would be an acceptable work-around for me... And a chance to learn python nevertheless ;)
Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)
02.07.2013 14:55, Andrew Beekhof wrote: On 02/07/2013, at 8:14 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote: 02.07.2013 12:27, Lars Marowsky-Bree wrote: On 2013-07-02T11:05:01, Vladislav Bogdanov bub...@hoster-ok.com wrote: One thing I see immediately is that node utilization attributes are deleted after I do 'load update' with empty node utilization sections. That is probably not specific to this patch. Yes, that isn't specific to that. I have those attributes dynamic, set from an RA (as node configuration may vary, I prefer to detect how much CPU and RAM I have and set utilization accordingly rather than put every hardware change into the CIB). Or maybe it is possible to use transient utilization attributes? I don't think so... Ugh, that would be nice. Yes, that's exactly what you need here. I know, but I do not expect that to be implemented soon. Together with cluster-wide attributes, for which I use a hack with tickets now. But tickets currently are quite limited - they have only 4 states, so it is impossible to put e.g. a number there. I fully understand Andrew's point when he is unwilling to implement features for just two setups, so... What feature am I not considering here? I don't follow. I didn't ask about that yet. Just assuming what your possible reaction could be. :) Support for transient utilization attributes, which do not go to the config section but to the status section. I would say it is overkill to implement that (and somehow merge two sections when doing the utilization calculation) if nobody except me is affected by the absence of it. E.g. I need to do a CIB update (think of it as a full replace), because I generate the crmsh configuration with a custom template-based system. And I have some RAs which set utilization attributes on nodes. Now, when I apply my full brand-new config to a cluster after making some changes here and there, those attributes are lost. Transient utilization attributes would help me (I would use them in my RAs).
But I wouldn't say that is a common setup. That's why I assume you won't be a fan of implementing them. Probably I need to extend crmsh with a site-specific patch until that is implemented. That would be an acceptable work-around for me... And a chance to learn python nevertheless ;)
Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)
02.07.2013 15:13, Lars Marowsky-Bree wrote: On 2013-07-02T13:14:48, Vladislav Bogdanov bub...@hoster-ok.com wrote: Yes, that's exactly what you need here. I know, but I do not expect that to be implemented soon. crm_attribute -l reboot -z doesn't strike me as an unlikely request. You could file an enhancement request for that. But with the XML diff feature, as long as you're not editing the node section, that shouldn't be a problem - unrelated changes shouldn't overwrite those attributes, right? That being the whole point? I do not edit them. In my setup I generate the full crm config with a template-based framework. That's why nodes go there too. And I can't skip them, because I heavily use ordinary node attributes and they change sometimes. (Of course, if you remove them in the copy, that'd remove them.) But tickets currently are quite limited - they have only 4 states, so it is impossible to put e.g. a number there. What are you trying to do with that? That is a very convenient way to, e.g., stop a dozen resources in one shot for some maintenance. I have a special RA which creates a ticket on cluster start and deletes it on cluster stop. And many resources may depend on that ticket. If you request the resource handled by that RA to stop, the ticket is revoked and all dependent resources stop. I wouldn't have written that RA if I had cluster-wide attributes (which behave like node attributes but for the whole cluster).
Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)
03.07.2013 00:16, Lars Marowsky-Bree wrote: On 2013-07-03T00:11:53, Vladislav Bogdanov bub...@hoster-ok.com wrote: E.g. I need to do a CIB update (think of it as a full replace), because I generate the crmsh configuration with a custom template-based system. And I have some RAs which set utilization attributes on nodes. The template system should insert the *diff*, not do a full replace, obviously, when new resources are added or previous ones removed. Yes. And I rely on crmsh to do that. With additional hooks for stale resource deletion. And transient load attributes also seem to suggest that you're doing a whole lot of that. That's probably beyond what utilization was originally spec'ed for (simplifying location constraints for a large number of pretty similar resources, e.g., VMs). Can I ask what you're doing? With utilization I just set node utilization attributes to what node X has right now. I'm free to replace hardware in a cluster, am I not? And that way I always have utilization attributes consistent with the real hardware, independently of the CIB configuration I produce with my template framework. And I use utilization for what it was originally intended - for VMs. And what are you doing with tickets? ;-) Answered in another message. Regards, Lars
Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)
03.07.2013 00:20, Vladislav Bogdanov wrote: ... But tickets currently are quite limited - they have only 4 states, so it is impossible to put e.g. a number there. What are you trying to do with that? That is a very convenient way to, e.g., stop a dozen resources in one shot for some maintenance. I have a special RA which creates a ticket on cluster start and deletes it on cluster stop. And many resources may depend on that ticket. If you request the resource handled by that RA to stop, the ticket is revoked and all dependent resources stop. Ah, and in one setup (lustre fs on top of geo-clustered two-layer drbd) I also use ticket revocation to cause a transition abort in a controllable way, so advisory ordering constraints work. The idea is to do a CIB modification after some event, so that advisory-ordered resources are stopped in the same transition, and in the order I want. IMHO a nice hack ;)
Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)
02.07.2013 20:05, Dejan Muhamedagic wrote: On Tue, Jul 02, 2013 at 11:05:01AM +0300, Vladislav Bogdanov wrote: 28.06.2013 17:47, Dejan Muhamedagic wrote: ... If you want to test here's a new patch. It does work with unrelated changes happening in the meantime. I didn't yet test really concurrent updates. One thing I see immediately is that node utilization attributes are deleted after I do 'load update' with empty node utilization sections. That is probably not specific to this patch. Right. I have those attributes dynamic, set from an RA (as node configuration may vary, I prefer to detect how much CPU and RAM I have and set utilization accordingly rather than put every hardware change into the CIB). On the one hand, I would agree that crmsh does what is intended - if no utilization attributes are set in a config update, then they should be removed. Well, thinking more about it, the attributes should be merged. The only trouble is that that would then change the command semantically. Not sure that is expected by most people. How do you then delete attributes? If you really think about implementing that merging, I would introduce a crmsh config option for it, e.g. node_attr_policy (replace|merge). And the default value should be the current one.
[Linux-HA] crm node delete
Hi,

I'm trying to check whether it is now safe to delete non-running nodes (corosync 2.3, pacemaker HEAD, crmsh tip).

# crm node delete v02-d
WARNING: 2: crm_node bad format: 7 v02-c
WARNING: 2: crm_node bad format: 8 v02-d
WARNING: 2: crm_node bad format: 5 v02-a
WARNING: 2: crm_node bad format: 6 v02-b
INFO: 2: node v02-d not found by crm_node
INFO: 2: node v02-d deleted
#

So, I expect that crmsh still doesn't follow the latest changes to 'crm_node -l', although the node seems to be deleted correctly. For reference, the output of crm_node -l is:

7 v02-c
8 v02-d
5 v02-a
6 v02-b

Best, Vladislav
Re: [Linux-HA] crm node delete
01.07.2013 18:29, Dejan Muhamedagic wrote: Hi, On Mon, Jul 01, 2013 at 05:29:31PM +0300, Vladislav Bogdanov wrote: Hi, I'm trying to check whether it is now safe to delete non-running nodes (corosync 2.3, pacemaker HEAD, crmsh tip). # crm node delete v02-d WARNING: 2: crm_node bad format: 7 v02-c WARNING: 2: crm_node bad format: 8 v02-d WARNING: 2: crm_node bad format: 5 v02-a WARNING: 2: crm_node bad format: 6 v02-b INFO: 2: node v02-d not found by crm_node INFO: 2: node v02-d deleted # So, I expect that crmsh still doesn't follow the latest changes to 'crm_node -l', although the node seems to be deleted correctly. For reference, the output of crm_node -l is: 7 v02-c 8 v02-d 5 v02-a 6 v02-b This time the node state was empty. Or it's missing altogether. I'm not sure how that's supposed to be interpreted. We test the output of crm_node -l just to make sure that the node is not online. Perhaps we need to use some other command. Likely it shows everything from the corosync nodelist. After I deleted the node from everywhere except corosync, the list is still the same.
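A parser tolerant of both crm_node -l formats (the older name-only output and the newer "id name [state]" output shown above) might look like this. This is a sketch, not the actual crmsh code:

```python
def parse_crm_node_list(output):
    """Parse 'crm_node -l' output tolerantly. Newer pacemaker prints
    '<id> <name> [<state>]'; older versions printed just '<name>'.
    Returns a list of (name, state) tuples; state is None if absent."""
    nodes = []
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) == 1:        # old format: name only
            nodes.append((fields[0], None))
        else:                       # new format: id name [state]
            state = fields[2] if len(fields) > 2 else None
            nodes.append((fields[1], state))
    return nodes

# The output from the thread, with no state column:
sample = "7 v02-c\n8 v02-d\n5 v02-a\n6 v02-b"
print(parse_crm_node_list(sample))
# [('v02-c', None), ('v02-d', None), ('v02-a', None), ('v02-b', None)]
```

Checking whether a node is online would then mean looking at the state field when present, rather than rejecting the line as "bad format".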
Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)
26.06.2013 18:30, Dejan Muhamedagic wrote: On Wed, Jun 26, 2013 at 06:13:33PM +0300, Vladislav Bogdanov wrote: 26.06.2013 15:57, Dejan Muhamedagic wrote: On Thu, Jun 06, 2013 at 05:19:03PM +0200, Dejan Muhamedagic wrote: Hi, On Thu, Jun 06, 2013 at 03:11:16PM +0300, Vladislav Bogdanov wrote: 06.06.2013 08:43, Vladislav Bogdanov wrote: [...] I recall that LDAP has a similar problem, which is easily worked around by specifying two values, one the original, the second the new one. That way you tell the LDAP server: Replace value Y in attribute X with value Z. And if the value is not Y at the moment of the modification request, then the command fails. cibadmin --patch works this way Who is baking the new CIB in that case, cibadmin or cib? The patch is applied on the server - so cib Then that is a safe way to go, assuming that the cib daemon serializes modification requests. It would be great if crmsh used that trick. Hope to have something soon. Stay tuned. The patch for crmsh is attached and you'll need the very latest pacemaker (because cibadmin needed some fixing). Unfortunately, I cannot push this yet to the repository, as the current pacemaker 1.1.10-rc still identifies itself as 1.1.9. I'd appreciate it if you could test it. Seems to work during preliminary testing (stop a clone with crm configure edit and then start it with crm resource start). The cib process on the DC reports it received the diff and handles that perfectly. Thank you! I'll build an updated package with this patch tomorrow and try to put it into real work. I mean to try concurrent updates. What would be the best way to achieve them? Is starting an edit with crm configure edit, running some concurrent command during the editing, and saving afterwards enough? I meant to ask when crmsh gets the original epoch to construct the diff - at the very beginning of editing, or right before committing? There can be a rather big timeframe between those points.
It would be nice to have an intelligent patcher which takes one CIB snapshot at the beginning of the edit, then generates a diff and checks whether it applies to the current CIB cleanly (everything except the epoch). Then it would be possible to use the current epoch in the diff which goes to the cib daemon. I do not know whether that makes sense. Maybe there is a better way not to lose big edits because some small unrelated changes were made in the meantime? Or maybe you can describe the algorithm you use, for those who do not know python? I didn't remove the check for the changes, and anyway cib is going to refuse to apply the patch if the epoch is older. Of course, crmsh can set the epoch attribute to something greater than the current epoch. Didn't get this, sorry. Could you please reword? What would you suggest? Cheers, Dejan Vladislav
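The behaviour Dejan describes, cib refusing a patch whose base epoch is older than the current one, is optimistic concurrency control. A toy model of that idea (ToyCib is purely illustrative, not pacemaker's API):

```python
class ToyCib:
    """Toy model of epoch-checked patching: a patch records the epoch it
    was diffed against and is rejected if the CIB has moved on since."""

    def __init__(self):
        self.epoch = 0
        self.config = {}

    def snapshot(self):
        # What an editor would take at the beginning of 'configure edit'.
        return self.epoch, dict(self.config)

    def apply_patch(self, base_epoch, changes):
        # Reject stale patches: a concurrent update bumped the epoch.
        if base_epoch != self.epoch:
            return False
        self.config.update(changes)
        self.epoch += 1
        return True

cib = ToyCib()
epoch, _ = cib.snapshot()
assert cib.apply_patch(epoch, {"target-role": "Stopped"})      # clean apply
assert not cib.apply_patch(epoch, {"target-role": "Started"})  # stale epoch, refused
```

The "intelligent patcher" Vladislav sketches would, on refusal, re-read the CIB, re-check that the diff still applies cleanly, and retry with the fresh epoch instead of discarding the edit.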
Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)
26.06.2013 15:57, Dejan Muhamedagic wrote: On Thu, Jun 06, 2013 at 05:19:03PM +0200, Dejan Muhamedagic wrote: Hi, On Thu, Jun 06, 2013 at 03:11:16PM +0300, Vladislav Bogdanov wrote: 06.06.2013 08:43, Vladislav Bogdanov wrote: [...] I recall that LDAP has a similar problem, which is easily worked around by specifying two values: one is the original, the second is new. That way you tell the LDAP server: replace value Y in attribute X with value Z. And if the value is not Y at the moment of the modification request, the command fails. cibadmin --patch works this way Who is baking the new CIB in that case, cibadmin or cib? The patch is applied on the server - so cib Then that is a safe way to go, assuming that the cib daemon serializes modification requests. It would be great if crmsh used that trick. Hope to have something soon. Stay tuned. The patch for crmsh is attached, and you'll need the very latest pacemaker (because cibadmin needed some fixing). Unfortunately, I cannot push this to the repository yet, as the current pacemaker 1.1.10-rc still identifies itself as 1.1.9. I'd appreciate it if you could test it. Seems to work during preliminary testing (stop a clone with crm configure edit and then start it with crm resource start). The cib process on the DC reports it received the diff and handles it perfectly. Thank you! I'll build an updated package with this patch tomorrow and try to put it into real work. I mean to try concurrent updates. What would be the best way to achieve them? Is it enough to start editing with crm configure edit, run some concurrent command during that editing, and then save? Vladislav ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
[Linux-HA] crmsh and fencing_topology
Dejan, here is the patch to fix parsing of fencing_topology:

--- a/modules/xmlutil.py	2013-06-07 07:21:10.0 +
+++ b/modules/xmlutil.py	2013-06-13 07:51:09.704924693 +
@@ -937,7 +937,7 @@ def get_set_nodes(e,setname,create = 0):
 def xml_noorder_hash(n):
     return sorted([ hash(etree.tostring(x)) \
-        for x in n.iterchildren() if is_element(c) ])
+        for x in n.iterchildren() if is_element(x) ])
 xml_hash_d = {
     "fencing-topology": xml_noorder_hash,
 }

Unfortunately, that still doesn't fully fix the problem, because <fencing-topology/> is inserted into an extra <configuration/> node:

<cib ...>
  <configuration>
    ...
    <configuration>
      <fencing-topology/>
    </configuration>
  </configuration>
</cib>

Can you please look at this? I expect the fix to be a one-line patch as well ;) Best, Vladislav ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
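For reference, the one-character fix in that patch (is_element(c) → is_element(x)) makes the comprehension filter on the variable it actually iterates over; with the undefined name c it would raise a NameError at runtime. A stand-alone sketch of such an order-insensitive child hash, using the stdlib ElementTree instead of crmsh's lxml (plain iteration over an Element yields its child elements, so no is_element filter is needed in this simplified version):

```python
# Order-insensitive hash of an element's children: two nodes with the
# same children in a different order compare equal. Sample XML below is
# illustrative fencing-topology content, not taken from a real cluster.

import xml.etree.ElementTree as etree

def xml_noorder_hash(node):
    # Hash each child's serialized form, then sort, so child order
    # does not affect the result.
    return sorted(hash(etree.tostring(x)) for x in node)

a = etree.fromstring(
    "<fencing-topology>"
    "<fencing-level devices='st1' index='1' target='node1'/>"
    "<fencing-level devices='st2' index='2' target='node1'/>"
    "</fencing-topology>")
b = etree.fromstring(
    "<fencing-topology>"
    "<fencing-level devices='st2' index='2' target='node1'/>"
    "<fencing-level devices='st1' index='1' target='node1'/>"
    "</fencing-topology>")

same = xml_noorder_hash(a) == xml_noorder_hash(b)
```

Here `same` is true even though the two fencing-level children are listed in opposite order, which is exactly why crmsh uses this hash for fencing-topology comparison.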
Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)
06.06.2013 09:02, Andrew Beekhof wrote: On 06/06/2013, at 3:45 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote: 06.06.2013 08:14, Andrew Beekhof wrote: On 06/06/2013, at 2:50 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote: 06.06.2013 07:31, Andrew Beekhof wrote: On 06/06/2013, at 2:27 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote: 05.06.2013 02:04, Andrew Beekhof wrote: On 05/06/2013, at 5:08 AM, Ferenc Wagner wf...@niif.hu wrote: Dejan Muhamedagic deja...@fastmail.fm writes: On Mon, Jun 03, 2013 at 06:19:06PM +0200, Ferenc Wagner wrote: I've got a script for resource creation, which puts the new resource in a shadow CIB together with the necessary constraints, runs a simulation and finally offers to commit the shadow CIB into the live config (by invoking an interactive crm). This works well. My concern is that if somebody else (another cluster administrator) changes anything in the cluster configuration between creation of the shadow copy and the commit, those changes will be silently reverted (lost) by the commit. Is there any way to avoid the possibility of this? According to http://article.gmane.org/gmane.linux.highavailability.pacemaker/11021, crm provides this functionality for its configure sessions [*], but the shadow CIB route has good points as well (easier to script via cibadmin, simulation), which I'd like to use. Any ideas? Record the two epoch attributes of the cib tag at the beginning and check if they changed just before applying the changes. Maybe I don't understand you right, but isn't this just narrowing the time window of the race? After all, that concurrent change can happen between the epoch check and the commit, can't it? The CIB will refuse to accept any update with a lower version: http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/_configuration_version.html I recall that LDAP has similar problem, which is easily worked around with specifying two values, one is original, second is new. 
That way you tell LDAP server: Replace value Y in attribute X to value Z. And if value is not Y at the moment of modification request, then command fails. cibadmin --patch works this way Who is baking new CIB in that case, cibadmin or cib? The patch is applied on the server - so cib Ah, one more question. The whole modification request is rejected if any of patch hunks fail, correct? Correct (and yes everything is serialized _unless_ you start using the -l cibadmin option) Great. Thanks. ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)
06.06.2013 08:43, Vladislav Bogdanov wrote: [...] I recall that LDAP has a similar problem, which is easily worked around by specifying two values: one is the original, the second is new. That way you tell the LDAP server: replace value Y in attribute X with value Z. And if the value is not Y at the moment of the modification request, the command fails. cibadmin --patch works this way Who is baking the new CIB in that case, cibadmin or cib? The patch is applied on the server - so cib Then that is a safe way to go, assuming that the cib daemon serializes modification requests. It would be great if crmsh used that trick. ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)
05.06.2013 02:04, Andrew Beekhof wrote: On 05/06/2013, at 5:08 AM, Ferenc Wagner wf...@niif.hu wrote: Dejan Muhamedagic deja...@fastmail.fm writes: On Mon, Jun 03, 2013 at 06:19:06PM +0200, Ferenc Wagner wrote: I've got a script for resource creation, which puts the new resource in a shadow CIB together with the necessary constraints, runs a simulation and finally offers to commit the shadow CIB into the live config (by invoking an interactive crm). This works well. My concern is that if somebody else (another cluster administrator) changes anything in the cluster configuration between creation of the shadow copy and the commit, those changes will be silently reverted (lost) by the commit. Is there any way to avoid the possibility of this? According to http://article.gmane.org/gmane.linux.highavailability.pacemaker/11021, crm provides this functionality for its configure sessions [*], but the shadow CIB route has good points as well (easier to script via cibadmin, simulation), which I'd like to use. Any ideas? Record the two epoch attributes of the cib tag at the beginning and check if they changed just before applying the changes. Maybe I don't understand you right, but isn't this just narrowing the time window of the race? After all, that concurrent change can happen between the epoch check and the commit, can't it? The CIB will refuse to accept any update with a lower version: http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/_configuration_version.html I recall that LDAP has similar problem, which is easily worked around with specifying two values, one is original, second is new. That way you tell LDAP server: Replace value Y in attribute X to value Z. And if value is not Y at the moment of modification request, then command fails. ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)
06.06.2013 07:31, Andrew Beekhof wrote: On 06/06/2013, at 2:27 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote: 05.06.2013 02:04, Andrew Beekhof wrote: On 05/06/2013, at 5:08 AM, Ferenc Wagner wf...@niif.hu wrote: Dejan Muhamedagic deja...@fastmail.fm writes: On Mon, Jun 03, 2013 at 06:19:06PM +0200, Ferenc Wagner wrote: I've got a script for resource creation, which puts the new resource in a shadow CIB together with the necessary constraints, runs a simulation and finally offers to commit the shadow CIB into the live config (by invoking an interactive crm). This works well. My concern is that if somebody else (another cluster administrator) changes anything in the cluster configuration between creation of the shadow copy and the commit, those changes will be silently reverted (lost) by the commit. Is there any way to avoid the possibility of this? According to http://article.gmane.org/gmane.linux.highavailability.pacemaker/11021, crm provides this functionality for its configure sessions [*], but the shadow CIB route has good points as well (easier to script via cibadmin, simulation), which I'd like to use. Any ideas? Record the two epoch attributes of the cib tag at the beginning and check if they changed just before applying the changes. Maybe I don't understand you right, but isn't this just narrowing the time window of the race? After all, that concurrent change can happen between the epoch check and the commit, can't it? The CIB will refuse to accept any update with a lower version: http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/_configuration_version.html I recall that LDAP has similar problem, which is easily worked around with specifying two values, one is original, second is new. That way you tell LDAP server: Replace value Y in attribute X to value Z. And if value is not Y at the moment of modification request, then command fails. cibadmin --patch works this way Who is baking new CIB in that case, cibadmin or cib? 
___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)
06.06.2013 08:14, Andrew Beekhof wrote: On 06/06/2013, at 2:50 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote: 06.06.2013 07:31, Andrew Beekhof wrote: On 06/06/2013, at 2:27 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote: 05.06.2013 02:04, Andrew Beekhof wrote: On 05/06/2013, at 5:08 AM, Ferenc Wagner wf...@niif.hu wrote: Dejan Muhamedagic deja...@fastmail.fm writes: On Mon, Jun 03, 2013 at 06:19:06PM +0200, Ferenc Wagner wrote: I've got a script for resource creation, which puts the new resource in a shadow CIB together with the necessary constraints, runs a simulation and finally offers to commit the shadow CIB into the live config (by invoking an interactive crm). This works well. My concern is that if somebody else (another cluster administrator) changes anything in the cluster configuration between creation of the shadow copy and the commit, those changes will be silently reverted (lost) by the commit. Is there any way to avoid the possibility of this? According to http://article.gmane.org/gmane.linux.highavailability.pacemaker/11021, crm provides this functionality for its configure sessions [*], but the shadow CIB route has good points as well (easier to script via cibadmin, simulation), which I'd like to use. Any ideas? Record the two epoch attributes of the cib tag at the beginning and check if they changed just before applying the changes. Maybe I don't understand you right, but isn't this just narrowing the time window of the race? After all, that concurrent change can happen between the epoch check and the commit, can't it? The CIB will refuse to accept any update with a lower version: http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/_configuration_version.html I recall that LDAP has similar problem, which is easily worked around with specifying two values, one is original, second is new. That way you tell LDAP server: Replace value Y in attribute X to value Z. 
And if value is not Y at the moment of modification request, then command fails. cibadmin --patch works this way Who is baking new CIB in that case, cibadmin or cib? The patch is applied on the server - so cib Then that is safe way to go, assuming that cib daemon serializes modification requests. Thanks for sharing info. ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] disallowing concurrent configuration (CIB modifications)
06.06.2013 08:14, Andrew Beekhof wrote: On 06/06/2013, at 2:50 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote: 06.06.2013 07:31, Andrew Beekhof wrote: On 06/06/2013, at 2:27 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote: 05.06.2013 02:04, Andrew Beekhof wrote: On 05/06/2013, at 5:08 AM, Ferenc Wagner wf...@niif.hu wrote: Dejan Muhamedagic deja...@fastmail.fm writes: On Mon, Jun 03, 2013 at 06:19:06PM +0200, Ferenc Wagner wrote: I've got a script for resource creation, which puts the new resource in a shadow CIB together with the necessary constraints, runs a simulation and finally offers to commit the shadow CIB into the live config (by invoking an interactive crm). This works well. My concern is that if somebody else (another cluster administrator) changes anything in the cluster configuration between creation of the shadow copy and the commit, those changes will be silently reverted (lost) by the commit. Is there any way to avoid the possibility of this? According to http://article.gmane.org/gmane.linux.highavailability.pacemaker/11021, crm provides this functionality for its configure sessions [*], but the shadow CIB route has good points as well (easier to script via cibadmin, simulation), which I'd like to use. Any ideas? Record the two epoch attributes of the cib tag at the beginning and check if they changed just before applying the changes. Maybe I don't understand you right, but isn't this just narrowing the time window of the race? After all, that concurrent change can happen between the epoch check and the commit, can't it? The CIB will refuse to accept any update with a lower version: http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Pacemaker_Explained/_configuration_version.html I recall that LDAP has similar problem, which is easily worked around with specifying two values, one is original, second is new. That way you tell LDAP server: Replace value Y in attribute X to value Z. 
And if value is not Y at the moment of modification request, then command fails. cibadmin --patch works this way Who is baking new CIB in that case, cibadmin or cib? The patch is applied on the server - so cib Ah, one more question. The whole modification request is rejected if any of patch hunks fail, correct? ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Behaviour of fence/stonith device fence_imm
16.04.2013 12:47, Andreas Mock wrote: Hi Marek, hi all, we just investigated this problem a little further while looking at the sources of fence_imm. It seems that the IMM device does a soft shutdown despite being documented otherwise. I can reproduce this with ipmitool directly and also using ssh access. The only thing which seems to work in the expected rigorous way is the IPMI command 'power reset'. But with this command I can't shut down the server. Do you have acpid running? If yes, try to stop/disable it. iirc that should help. Of course, what you see is a bug in the BMC; it should do a hard off on a 'chassis power off' command. For graceful shutdown (with proper ACPI signalling) there is a 'chassis power soft' command. I'd report that to the vendor (although it may have been implemented that way intentionally). I'll offer more information when I get feedback on this behaviour. Best regards Andreas -----Original Message----- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Marek Grac Sent: Monday, 15 April 2013 11:02 To: Andrew Beekhof Cc: General Linux-HA mailing list Subject: Re: [Linux-HA] Behaviour of fence/stonith device fence_imm Hi, On 04/15/2013 04:17 AM, Andrew Beekhof wrote: On 13/04/2013, at 12:21 AM, Andreas Mock andreas.m...@web.de wrote: Hi all, just played with the fence/stonith device fence_imm (as part of pacemaker on RHEL6.x and clones). It is configured to use the action 'reboot'. This action seems to cause a graceful reboot of the node. My question: is this graceful reboot feasible when the node gets unreliable, or would it be better to power cycle the machine (off/on)? Yes, it will. For fence_imm the standard IPMILAN fence agent is used without additional options. It uses a method described by you: power off / check status / power on; it looks like there are some changes in IMM we are not aware of. 
Please file a bugzilla for this issue; if you can do a proper non-graceful power off using ipmitool, please add that too. How can I achieve that fence_imm does a power cycle (off/on) instead of a soft reboot? Yes, you can use -M (method in STDIN/cluster configuration) with the values 'onoff' (default) or 'cycle' (use the reboot command on IPMI). m, ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] master/slave drbd resource STILL will not failover
02.12.2012 00:34, Robinson, Eric wrote: Try to set 'target-role=Started' in both of them. Okay, but how does that address the problem of error code 11 from drbdadm? Well, you have an error promoting resources. 11 is EAGAIN, usually meaning you did not demote the other side. Your logs contain Nov 27 15:32:15 [24609] ha09a crmd:debug: do_lrm_rsc_op: Performing key=13:750:0:8267fa3b-4f5f-45f6-89c9-fb7540f471b3 op=p_drbd0_demote_0 And that is the first mention of the word 'demote' in the log you provided (frankly speaking, a log from the DC would help much more). drbd(p_drbd1)[12814]: 2012/11/27_15:31:59 ERROR: ha02_mysql: Exit code 11 That happened 16 seconds before the demote was started. I recall pacemaker operates differently (I'd say wrongly) if you have target-role=Master for a ms resource as opposed to a normal target-role=Started. Is that enough? ;) Vladislav ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
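Exit code 11 from drbdadm corresponds to EAGAIN on Linux: the promote cannot succeed while the peer is still Primary, which is exactly the demote-ordering problem diagnosed above. A toy sketch of the retry-until-demoted behaviour — FakePeer stands in for the real drbdadm invocation; none of this is the actual linbit resource agent:

```python
# Model of "promote fails with EAGAIN until the other side is demoted".

import errno

class FakePeer:
    """Pretends to be drbdadm: returns EAGAIN until the peer demotes."""
    def __init__(self, demote_after):
        self.calls = 0
        self.demote_after = demote_after

    def drbdadm_primary(self):
        # Returns the command's exit code: EAGAIN while the peer
        # is still Primary, 0 once it has been demoted.
        self.calls += 1
        return 0 if self.calls > self.demote_after else errno.EAGAIN

def promote(peer, max_tries=5):
    """Retry only on EAGAIN; any other non-zero code is a hard error."""
    for _ in range(max_tries):
        rc = peer.drbdadm_primary()
        if rc == 0:
            return True
        if rc != errno.EAGAIN:
            return False
    return False

peer = FakePeer(demote_after=2)   # peer gets demoted after two attempts
promoted = promote(peer)
```

The point of the model: as long as the cluster orders the demote before the promote (as the crmd log line above eventually shows), the EAGAIN window disappears entirely.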
Re: [Linux-HA] master/slave drbd resource STILL will not failover
30.11.2012 00:14, Robinson, Eric wrote: Bump... does anyone have some insight on this? Google is not turning up anything useful. Our newest cluster will not failover master/slave drbd resources. It works fine manually using drbdadm from a shell prompt, but when we try it using 'crm node standby' and letting the cluster manage the resource, crm_mon just keeps saying the resource FAILED. We see a lot of these messages in the corosync.log file: drbd(p_drbd1)[12814]: 2012/11/27_15:31:59 DEBUG: ha02_mysql: Calling drbdadm -c /etc/drbd.conf primary ha02_mysql drbd(p_drbd1)[12814]: 2012/11/27_15:31:59 ERROR: ha02_mysql: Called drbdadm -c /etc/drbd.conf primary ha02_mysql drbd(p_drbd1)[12814]: 2012/11/27_15:31:59 ERROR: ha02_mysql: Exit code 11 There is no indication of what may be causing the 'Exit code 11' Here is a link to the corosync log, taken from the standby server (ha09a) where we are trying to fail the resource to... www.psmnv.com/downloads/corosync1.loghttp://www.psmnv.com/downloads/corosync1.log Here is what I have installed... corosync-1.4.1-7.el6_3.1.x86_64 corosynclib-1.4.1-7.el6_3.1.x86_64 pacemaker-1.1.8-4.el6.x86_64 pacemaker-cli-1.1.8-4.el6.x86_64 pacemaker-cluster-libs-1.1.8-4.el6.x86_64 pacemaker-libs-1.1.8-4.el6.x86_64 Following is my crm config. It's pretty basic. 
node ha09a \
        attributes standby=off
node ha09b \
        attributes standby=off
primitive p_drbd0 ocf:linbit:drbd \
        params drbd_resource=ha01_mysql \
        op monitor interval=60s
primitive p_drbd1 ocf:linbit:drbd \
        params drbd_resource=ha02_mysql \
        op monitor interval=45s
primitive p_vip_clust08 ocf:heartbeat:IPaddr2 \
        params ip=192.168.10.210 cidr_netmask=32 \
        op monitor interval=30s
primitive p_vip_clust09 ocf:heartbeat:IPaddr2 \
        params ip=192.168.10.211 cidr_netmask=32 \
        op monitor interval=30s
ms ms_drbd0 p_drbd0 \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true target-role=Master
ms ms_drbd1 p_drbd1 \
        meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true target-role=Master

Try to set 'target-role=Started' in both of them. Vladislav ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] cib_replace failed?
31.10.2012 20:55, Robinson, Eric wrote: Okay, the two node names are ha09a and ha09b. Starting clean with all services turned off. This is what I get in /var/log/corosync.log on ha09a when I start corosync... Oct 31 10:22:43 corosync [MAIN ] Corosync Cluster Engine ('1.4.3'): started and ready to provide service. Oct 31 10:22:43 corosync [MAIN ] Corosync built-in features: nss Oct 31 10:22:43 corosync [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'. Oct 31 10:22:43 corosync [TOTEM ] Initializing transport (UDP/IP Multicast). Oct 31 10:22:43 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0). Oct 31 10:22:43 corosync [TOTEM ] Initializing transport (UDP/IP Multicast). Oct 31 10:22:43 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0). Set r/w permissions for uid=0, gid=0 on /var/log/corosync.log Oct 31 10:22:43 corosync [TOTEM ] The network interface [192.168.10.58] is now up. Oct 31 10:22:43 corosync [pcmk ] Logging: Initialized pcmk_startup Oct 31 10:22:43 corosync [SERV ] Service engine loaded: Pacemaker Cluster Manager 1.1.7 Oct 31 10:22:43 corosync [SERV ] Service engine loaded: corosync extended virtual synchrony service Oct 31 10:22:43 corosync [SERV ] Service engine loaded: corosync configuration service Oct 31 10:22:43 corosync [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 Oct 31 10:22:43 corosync [SERV ] Service engine loaded: corosync cluster config database access v1.01 Oct 31 10:22:43 corosync [SERV ] Service engine loaded: corosync profile loading service Oct 31 10:22:43 corosync [SERV ] Service engine loaded: corosync cluster quorum service v0.1 Oct 31 10:22:43 corosync [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine. Oct 31 10:22:43 corosync [TOTEM ] The network interface [198.51.100.58] is now up. 
Oct 31 10:22:44 corosync [TOTEM ] Incrementing problem counter for seqid 1 iface 198.51.100.58 to [1 of 10] Oct 31 10:22:44 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed. Oct 31 10:22:44 corosync [CPG ] chosen downlist: sender r(0) ip(192.168.10.58) r(1) ip(198.51.100.58) ; members(old:0 left:0) Oct 31 10:22:44 corosync [MAIN ] Completed service synchronization, ready to provide service. Oct 31 10:22:44 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed. Oct 31 10:22:44 corosync [CPG ] chosen downlist: sender r(0) ip(192.168.10.58) r(1) ip(198.51.100.58) ; members(old:1 left:0) Oct 31 10:22:44 corosync [MAIN ] Completed service synchronization, ready to provide service. Oct 31 10:22:46 corosync [TOTEM ] ring 1 active with no faults Some things seem to be missing from the log. According to the ClusterLabs docs, I should be seeing entries similar to the following, but I am NOT. (The following are adapted from the ClusterLabs documentation. They are NOT showing up in my logs.) Aug 27 09:05:35 ha09a corosync[1540]: [pcmk ] info: pcmk_startup: CRM: Initialized Aug 27 09:05:35 ha09a corosync[1540]: [pcmk ] Logging: Initialized pcmk_startup Aug 27 09:05:35 ha09a corosync[1540]: [pcmk ] info: pcmk_startup: Maximum core file size is: 18446744073709551615 Aug 27 09:05:35 ha09a corosync[1540]: [pcmk ] info: pcmk_startup: Service: 9 Aug 27 09:05:35 ha09a corosync[1540]: [pcmk ] info: pcmk_startup: Local hostname: ha09a One thing that does stand out to me is that we are seeing the following line in the log... Oct 31 10:22:43 corosync [SERV ] Service engine loaded: Pacemaker Cluster Manager 1.1.7 ..however we have Pacemaker 1.1.8 installed, not 1.1.7. Where is that 1.1.7 coming from? Here is what we have installed... 
[root@ha09a log]# rpm -qa | egrep "pacem|coros"
pacemaker-1.1.8-0.901.eedc0cc.git.el6.x86_64
pacemaker-cluster-libs-1.1.8-0.901.eedc0cc.git.el6.x86_64

I suspect that the version you run (pre-1.1.8, https://github.com/ClusterLabs/pacemaker/commit/eedc0cc9601d563a38ff3185414694bfbeb7ff76) actually has problems with corosync1 (plugin-based) setups. I think that the relevant fix was https://github.com/ClusterLabs/pacemaker/commit/89c817d795da535fca667a848d6b0503a120129a, which was committed two days later. Why not try the official 1.1.8, which should have all these fixed? Vladislav ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] clvm/dlm/gfs2 hangs if a node crashes
14.03.2012 00:42, William Seligman wrote: [snip] These were the log messages, which show that stonith_admin did its job and CMAN was notified of the fencing: http://pastebin.com/jaH820Bv. Could you please look at the output of 'dlm_tool ls' and 'dlm_tool dump'? You probably have the 'kern_stop' and 'fencing' flags there. That means that dlm is unaware that the node is fenced. Unfortunately, I still got the gfs2 freeze, so this is not the complete story. Both clvmd and gfs2 use dlm. If the dlm layer thinks fencing is not complete, both of them freeze. Best, Vladislav ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
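Checking for the 'kern_stop' and 'fencing' flags mentioned above can be automated by scanning `dlm_tool ls` output per lockspace. The sample output below is abbreviated and illustrative — it is not captured from a real cluster, and field layout may differ between dlm versions:

```python
# Find lockspaces that dlm still considers blocked on fencing, by
# scanning for 'kern_stop'/'fencing' markers in dlm_tool ls output.

SAMPLE = """\
dlm lockspaces
name          clvmd
id            0x4104eefa
flags         0x00000004 kern_stop
change        member 1 joined 0 remove 1 failed 1 seq 2,2
new change    member 1 joined 0 remove 1 failed 1 wait_messages 0 wait_condition 1 fencing
"""

def stuck_lockspaces(dlm_tool_ls_output):
    """Return names of lockspaces still waiting on fencing."""
    stuck, name = [], None
    for line in dlm_tool_ls_output.splitlines():
        fields = line.split()
        if line.startswith("name"):
            name = fields[1]                 # current lockspace name
        elif fields and fields[0] in ("flags", "new") and \
                ("kern_stop" in fields or "fencing" in fields):
            if name and name not in stuck:
                stuck.append(name)
    return stuck
```

If this returns a non-empty list after a node was supposedly fenced, dlm never learned about the fence result, which matches the clvmd/gfs2 freeze described in the thread.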
Re: [Linux-HA] clvm/dlm/gfs2 hangs if a node crashes
15.03.2012 18:43, William Seligman wrote: On 3/15/12 3:43 AM, Vladislav Bogdanov wrote: 14.03.2012 00:42, William Seligman wrote: [snip] These were the log messages, which show that stonith_admin did its job and CMAN was notified of the fencing: http://pastebin.com/jaH820Bv. Could you please look at the output of 'dlm_tool ls' and 'dlm_tool dump'? You probably have 'kern_stop' and 'fencing' flags there. That means that dlm is unaware that node is fenced. Here's 'dlm_tool ls' with both nodes running cman+clvmd+gfs2: http://pastebin.com/QrZtm1Ue 'dlm_tool dump': http://pastebin.com/UKWxx9Y4 For comparison, I crashed one node and looked at the same output on the remaining node: dlm_tool ls: http://pastebin.com/cKVAGxsd dlm_tool dump: http://pastebin.com/c0h0p22Q (the post-crash lines begin at 1331824940) Everything is fine there, dlm correctly understands that node is fenced and returns to a normal state. The only minor issue I see is that fencing took much time - 21 sec. I don't see the kern_stop or fencing flags. There's another thing I don't see: at the top of 'dlm_tool dump' it displays most of the contents of my cluster.conf file, except for the fencing sections. Here's my cluster.conf for comparison: http://pastebin.com/w5XNYyAX It also looks correct (I mean fence_pcmk), but I can be wrong here, I do not use cman. cman doesn't see anything wrong in my cluster.conf file: # ccs_config_validate Configuration validates But could there be something that's causing the fencing sections to be ignored? Unfortunately, I still got the gfs2 freeze, so this is not the complete story. Both clvmd and gfs2 use dlm. If dlm layer thinks fencing is not completed, both of them freeze. I did 'grep -E (dlm|clvm|fenc) /var/log/messages' and looked at the time I crashed the node: http://pastebin.com/dvBtdLUs. I see lines that indicate that pacemaker and drbd are fencing the node, but nothing from dlm or clvmd. 
Does this indicate what you suggest: could dlm somehow be ignoring or overlooking the fencing I put in? Is there any other way to check this? No, dlm_controld (and friends) mostly uses a different logging method - that is what you see in dlm_tool dump. ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-ha-dev] [ha-wg-technical] Proposal to drop vgck in LVM RA
Hi Dejan, 20.02.2012 20:23, Dejan Muhamedagic wrote: Hi Vladislav, On Fri, Feb 03, 2012 at 10:33:52AM +0300, Vladislav Bogdanov wrote: Hi Dejan, all, 02.02.2012 19:44, Dejan Muhamedagic wrote: Hello all, Sorry for crossposting, but can anybody comment on the matter below? Thanks! Running LVM operations can fail the monitor op due to timeouts. I experienced that many times before I switched to a home-brew RA for LVM. There I only check for the existence of /dev/VG[/LV]. Of course you need to obtain the real status from the LVM stack for start and stop ops. Please look at the attached stripped-down version of the RA I actually use (I quickly removed bits of code which are very site-specific or too experimental and of no interest to anyone). I wanted to send it long ago, but you all know, some guys need activation to do something they wish but do not actually need ;) I'd say that the RA is (near) production-quality and has been extensively tested on several clusters. What is attached is probably the twentieth revision/rewrite, and it has run for almost a year without modifications, not causing any problems (unlike the stock LVM RA, for me). If you wish, you may include some ideas from it in the LVM RA or just include the attached as an alternative implementation (after some light testing, because of the removed code). It has enough comments/logs in critical sections, so I hope it should be clear to the reader. The main ideas behind that RA are: * Do not run LVM commands on monitor (they are simply not needed). This also helps to be tolerant to iSCSI link failures. Thanks for confirming this. * Skip LVM locking where it is not needed (borrowed from RedHat's lvm.sh). Useful when clvm waits for fencing (it would not allow any command to succeed until fencing is done, so the RA may time out on monitor even when the LV is actually available to the system). Good point. * Use a timeout so as not to hang forever. Better to try again. I'm sure it's better, but it would be good to know why. timeout from coreutils is still fairly new. 
At least to print something sane to the logs. "RA timed out" is not very informative; "LVM command XXX timed out" carries much more information. I also suspect that some cLVM commands highly depend on cluster and DLM state, which may change over time (between two command runs). I can't tell for sure, but I suspect possible deadlocks if a command arrives at the wrong time (when the cluster state is wrong). Please note that pacemaker's and DLM's points of view on the cluster state may differ (e.g. pacemaker with the openais plugin allows a quick node leave-join while DLM does not). So pacemaker may run commands while dlm waits for fencing. I fixed this for my setups, but upstream dlm_controld.pcmk (and the version in SUSE) is affected by this. This should change with corosync 2.0, where pacemaker will use CPG too. * Use realtime scheduling priority (because otherwise LVM commands may run for ages under high load, even with a well-tuned filter in lvm.conf). This helps to reduce run time from the 20-60 second range to about 3 seconds in some circumstances. In case monitors are this light, how does this help? For start/stop? Just for all LVM commands, and yes, vgchange/lvchange for start/stop are the critical ones. * Allow separating VG/LV management: ** Allow a VG to just be made known to the system without activating any LVs in it. ** Allow per-LV management (managing single LVs requires the operation from the previous item to be done first). The only limitation is that empty VGs are not supported (there is a comment in the code describing why). And, it requires bash. Hope you find that useful, Probably, but just as an example. LVM2 is out of the question: IPaddr2/IPaddr turned out not to be such a great idea. It's up to you. BTW, the software package this RA works with is named LVM2 ;) Best, Vladislav Thanks, Dejan Best, Vladislav Dejan On Tue, Jan 10, 2012 at 02:22:35PM +0100, Dejan Muhamedagic wrote: Hi Hideo-san, On Tue, Jan 10, 2012 at 11:28:12AM +0900, renayama19661...@ybb.ne.jp wrote: Hi Dejan, What do you think about this matter? 
I'm still inclined to drop vgck from monitor and use it just before start. I wouldn't even consider that a regression. I'm also not sure what vgck offers in comparison with vgdisplay, and whether both actually work with the on-disk LVM metadata. In that case we should drop vgdisplay as well and find another (and better) way to monitor VGs. Anybody with deeper knowledge of LVM?

Cheers, Dejan

Best Regards, Hideo Yamauchi.

--- On Thu, 2011/12/8, renayama19661...@ybb.ne.jp wrote:

Hi Dejan,
Thank you for the comment. We are examining a correction of LVM_validate_all. Because the handling of vgck influences it, I am going to follow the decision of this discussion. For example, even the following simple choice may be good:

* Add an exec_vgck parameter:
  * true (default): run the vgck command.
  * false: do not run the vgck command.

Best Regards, Hideo
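The light-weight monitoring idea from this thread (check device nodes instead of running vgck/vgdisplay, so the monitor cannot block on LVM metadata operations or cluster locking) could be sketched like this. It is a simplified illustration, not the actual RA; the DEV_ROOT override is only there so the check can be exercised outside /dev:

```shell
#!/bin/sh
# Sketch only: monitor a VG/LV by device-node existence, so the periodic
# check cannot block on LVM metadata operations or cluster locking.
: "${DEV_ROOT:=/dev}"   # overridable root, purely for illustration/testing

lvm_cheap_monitor() {
    vg="$1" lv="$2"
    if [ -n "$lv" ]; then
        # A single LV: its device node exists once the LV is active.
        [ -e "$DEV_ROOT/$vg/$lv" ]
    else
        # Whole VG: the directory appears once the VG has active LVs.
        [ -d "$DEV_ROOT/$vg" ]
    fi
}
```

A real agent still needs proper LVM commands for start and stop; this only replaces the periodic monitor, as the thread suggests.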
Re: [Linux-ha-dev] [ha-wg-technical] Proposal to drop vgck in LVM RA
https://lists.linuxfoundation.org/mailman/listinfo/ha-wg-technical
___ Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev Home Page: http://linux-ha.org/

#!/bin/bash
#
# LV management RA
#
# Copyright (c) 2011 Vladislav Bogdanov bub...@hoster-ok.com
#
# Partially based on LVM RA by Alan Robertson (Copyright: (C) 2002 - 2005
# International Business Machines, Inc.) and lvm.sh RA by Redhat.
#
###
# Initialization:
: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
. ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs
###

OCF_RESKEY_activation_mode_default="auto"

: ${OCF_RESKEY_activation_mode=${OCF_RESKEY_activation_mode_default}}
: ${OCF_RESKEY_force_stop=0}
: ${OCF_RESKEY_verify_stopped_on_stop=0}

need_real_status=0

usage() {
    cat <<EOF
usage: $0 {start|stop|reload|monitor|validate-all|meta-data}
EOF
}

meta_data() {
    cat <<EOF
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="LVM2">
<version>1.0</version>

<longdesc lang="en">
Resource script for LVM. It manages a Linux Volume Manager volume (LVM)
as an HA resource.
</longdesc>
<shortdesc lang="en">Controls the availability of an LVM Volume or Group</shortdesc>

<parameters>
<parameter name="vg_name" unique="0" required="1">
<longdesc lang="en">
The name of the volume group.
</longdesc>
<shortdesc lang="en">Volume group name</shortdesc>
<content type="string" default="" />
</parameter>

<parameter name="lv_name" unique="0">
<longdesc lang="en">
The name of the only logical volume to activate. If empty, then all
volumes will be activated unless activation_mode is set to "none".
</longdesc>
<shortdesc lang="en">Logical volume name</shortdesc>
<content type="string" default="" />
</parameter>

<parameter name="activation_mode" unique="0" required="0">
<longdesc lang="en">
Specifies activation mode for VG (LV). Could be one of:
auto - Activate all volumes if none is specified by 'lv_name', otherwise
activate only the specified volume. Clustered volumes and groups are
activated in local mode if the resource is running as a clone and in
exclusive mode otherwise.
none - only for VGs, do not activate/deactivate volumes, just make sure
the VG is known to the kernel on start, and look for active volumes on
monitor. Useful if one wants to separate VG and LV monitoring.
local - only for (volumes in) clustered groups. Make local activation
(-aly). Default if the resource is run as a clone.
exclusive - only for (volumes in) clustered groups. Make exclusive
activation (-aey). Default if the resource is not run as a clone. The RA
will complain if this is specified for a cloned resource.
</longdesc>
<shortdesc lang="en">VG/LV activation mode</shortdesc>
<content type="string" default="${OCF_RESKEY_activation_mode_default}" />
</parameter>

<parameter name="force_stop" unique="0">
<longdesc lang="en">
Force all logical volumes in the group to be deactivated if
activation_mode is set to "none". The RA will fail in this case if
deactivation failed. Only for VG-level resources (lv_name is empty).
</longdesc>
<shortdesc lang="en">Force deactivation of all volumes</shortdesc>
<content type="boolean" default="0" />
</parameter>

<parameter name="verify_stopped_on_stop" unique="0">
<longdesc lang="en">
Fail on stop if activation_mode is set to "none" and the VG has active
volumes. Only for VG-level resources (lv_name is empty).
</longdesc>
<shortdesc lang="en">Fail on stop if VG has active volumes</shortdesc>
<content type="boolean" default="0" />
</parameter>
</parameters>

<actions>
<action name="start" timeout="240" />
<action name="stop" timeout="240" />
<action name="reload" timeout="120" />
<action name="monitor" depth="0" timeout="60" interval="30" />
<action name="meta-data" timeout="5" />
<action name="validate-all" timeout="5" />
</actions>
</resource-agent>
EOF
}

# Global vars
clustered=""
activation_modifier=""

check_activation_mode() {
    case "${OCF_RESKEY_activation_mode}" in
    local)
        if [ "${clustered}" -eq 0 ] ; then
            ocf_log err "Rejecting to operate in local activation mode for non-clustered volume, use activation_mode={auto|none} instead."
            return $OCF_ERR_CONFIGURED
        fi
        activation_modifier="l"
        ;;
    exclusive)
        if [ "${clustered}" -eq 0 ] ; then
            ocf_log err "Rejecting to operate in exclusive activation mode for non-clustered volume, use activation_mode={auto|none} instead."
            return $OCF_ERR_CONFIGURED
        elif [ -n "${OCF_RESKEY_CRM_meta_clone}" ] ; then
            ocf_log err "Rejecting to operate in exclusive activation mode for clone resource."
            return $OCF_ERR_CONFIGURED
        fi
        activation_modifier="e"
        ;;
    none)
        if [ -n "${OCF_RESKEY_lv_name}" ] ; then
            ocf_log err "activation_mode=none cannot be used
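Two of the ideas argued for in this thread, bounding LVM command runtime with timeout(1) and raising scheduling priority with chrt(1), could be combined in a wrapper along these lines. This is an illustrative sketch only: run_lvm, the RR priority 10, and the 30-second default are invented example values, not the actual RA's code:

```shell
#!/bin/sh
# Sketch: run an LVM command with a bounded runtime and, where permitted,
# realtime scheduling so it is not starved under high load.
: "${LVM_CMD_TIMEOUT:=30}"   # seconds; example value

run_lvm() {
    # Use chrt only if we are allowed to (it needs CAP_SYS_NICE).
    if chrt --rr 10 true 2>/dev/null; then
        set -- chrt --rr 10 "$@"
    fi
    if ! timeout "$LVM_CMD_TIMEOUT" "$@"; then
        # Name the command in the log instead of a bare "RA timed out".
        echo "LVM command '$*' failed or timed out after ${LVM_CMD_TIMEOUT}s" >&2
        return 1
    fi
}
```

With GNU coreutils, timeout exits with status 124 when the command is killed, so the caller can still distinguish "slow" from "failed" if it wants to.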
[Linux-HA] crmsh property management regression
Hi Dejan,

I'm evaluating crmsh in place of the pacemaker-bundled crm (because of rsc_ticket support). With current crmsh (b4b063507de0) it is impossible (ok, very hard) to manage cluster properties:

# crm configure
crm(live)configure# property [tab]
ERROR: crmd:metadata: could not parse meta-data:
ERROR: crmd:metadata: could not parse meta-data:
ERROR: crmd:metadata: could not parse meta-data:
ERROR: crmd:metadata: could not parse meta-data:

Every subsequent [tab] press results in two more such lines printed. The same happens after changing properties with crm configure edit.

Pacemaker is 41dedc0 (Dec 16). The bundled crm works perfectly (except rsc_ticket support ;) )

Hoping this can be easily fixed (something in ra.py.in?),

Best regards,
Vladislav
___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] crmsh property management regression
Hi Dejan,

thank you very much for a good pointer, you saved me much time.

16.01.2012 16:20, Dejan Muhamedagic wrote:
Hi Vladislav,
On Mon, Jan 16, 2012 at 02:14:29PM +0300, Vladislav Bogdanov wrote:
Hi Dejan,
I'm evaluating crmsh in place of the pacemaker-bundled crm (because of rsc_ticket support). With current crmsh (b4b063507de0) it is impossible (ok, very hard) to manage cluster properties:

# crm configure
crm(live)configure# property [tab]
ERROR: crmd:metadata: could not parse meta-data:
ERROR: crmd:metadata: could not parse meta-data:
ERROR: crmd:metadata: could not parse meta-data:
ERROR: crmd:metadata: could not parse meta-data:

Every subsequent [tab] press results in two more such lines printed. The same happens after changing properties with crm configure edit.

How did you build crmsh? In particular, is this properly replaced by autofoo:

crm_daemon_dir = @GLUE_DAEMON_DIR@

It was /usr/lib64/heartbeat, while my build of pacemaker already has daemons installed in /usr/libexec/pacemaker (actually it is built from the master branch of Andrew's private repo).

The following patch solved the issue for me:

--- a/configure.ac	2012-01-12 14:32:47.0 +
+++ b/configure.ac	2012-01-16 15:39:03.413650410 +
@@ -187,8 +187,8 @@ AC_SUBST(CRM_CONFIG_DIR)

 dnl Eventually move out of the heartbeat dir tree and create compatability code
-dnl CRM_DAEMON_DIR=$libdir/pacemaker
-GLUE_DAEMON_DIR=`extract_header_define $GLUE_HEADER GLUE_DAEMON_DIR`
+GLUE_DAEMON_DIR=${libexecdir}/pacemaker
+dnl GLUE_DAEMON_DIR=`extract_header_define $GLUE_HEADER GLUE_DAEMON_DIR`
 AC_DEFINE_UNQUOTED(GLUE_DAEMON_DIR,$GLUE_DAEMON_DIR, Location for Pacemaker daemons)
 AC_SUBST(GLUE_DAEMON_DIR)

Thank you again very much,
Vladislav
Re: [Linux-HA] crmsh property management regression
16.01.2012 20:56, Dejan Muhamedagic wrote:
On Mon, Jan 16, 2012 at 06:47:54PM +0300, Vladislav Bogdanov wrote:
Hi Dejan,
thank you very much for a good pointer, you saved me much time.
16.01.2012 16:20, Dejan Muhamedagic wrote:
Hi Vladislav,
On Mon, Jan 16, 2012 at 02:14:29PM +0300, Vladislav Bogdanov wrote:
Hi Dejan,
I'm evaluating crmsh in place of the pacemaker-bundled crm (because of rsc_ticket support). With current crmsh (b4b063507de0) it is impossible (ok, very hard) to manage cluster properties:

# crm configure
crm(live)configure# property [tab]
ERROR: crmd:metadata: could not parse meta-data:
ERROR: crmd:metadata: could not parse meta-data:
ERROR: crmd:metadata: could not parse meta-data:
ERROR: crmd:metadata: could not parse meta-data:

Every subsequent [tab] press results in two more such lines printed. The same happens after changing properties with crm configure edit.

How did you build crmsh? In particular, is this properly replaced by autofoo:

crm_daemon_dir = @GLUE_DAEMON_DIR@

It was /usr/lib64/heartbeat, while my build of pacemaker already has daemons installed in /usr/libexec/pacemaker (actually it is built from the master branch of Andrew's private repo).

OK, that's bleeding-edge source and the pacemaker daemons were moved in the meantime. I'll make the error message more specific.

The following patch solved the issue for me:

--- a/configure.ac	2012-01-12 14:32:47.0 +
+++ b/configure.ac	2012-01-16 15:39:03.413650410 +
@@ -187,8 +187,8 @@ AC_SUBST(CRM_CONFIG_DIR)

 dnl Eventually move out of the heartbeat dir tree and create compatability code
-dnl CRM_DAEMON_DIR=$libdir/pacemaker
-GLUE_DAEMON_DIR=`extract_header_define $GLUE_HEADER GLUE_DAEMON_DIR`
+GLUE_DAEMON_DIR=${libexecdir}/pacemaker
+dnl GLUE_DAEMON_DIR=`extract_header_define $GLUE_HEADER GLUE_DAEMON_DIR`
 AC_DEFINE_UNQUOTED(GLUE_DAEMON_DIR,$GLUE_DAEMON_DIR, Location for Pacemaker daemons)
 AC_SUBST(GLUE_DAEMON_DIR)

Not the correct way, i.e. we should introduce CRM_DAEMON_DIR and then extract the right location from the Pacemaker include file.

I didn't find it there, that's why I just did a quick hack.

Cheers,
Vladislav
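For reference, extracting a value such as CRM_DAEMON_DIR from an installed header could be done along these lines. This is a hedged sketch: the awk helper below is illustrative and is not the actual extract_header_define from the glue build system (it also only handles simple, single-token values):

```shell
#!/bin/sh
# Sketch: pull the value of a #define out of a C header, the way a
# configure script could discover CRM_DAEMON_DIR from Pacemaker headers.
extract_header_define() {
    # $1 = header file, $2 = macro name
    awk -v name="$2" '$1 == "#define" && $2 == name {
        v = $3
        gsub(/"/, "", v)   # strip surrounding quotes
        print v
    }' "$1"
}
```

Usage would then be something like `CRM_DAEMON_DIR=$(extract_header_define "$header" CRM_DAEMON_DIR)`, with the header path being whatever the installed Pacemaker provides.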
Re: [Linux-HA] Resended : Understanding how heartbeat and pacemaker work together
13.01.2012 13:04, Niclas Müller wrote:
I've grouped both as www-services and now it is running like I want. Takeover time is 4-6 sec. That's good, but I want to get to 1-3 sec as far as possible. There won't be much process load, because I only made a

Pacemaker runs monitor actions at the rate you configured. Just change that rate.

Best, Vladislav
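For example, the monitor rate is set per operation on the resource; a crm configuration fragment might look like this (placeholder names and values; whether 1-3 second failover is achievable also depends on resource stop/start and fencing times, not just the monitor interval):

```
primitive www-ip IPaddr2 \
    params ip=192.168.122.100 cidr_netmask=24 \
    op monitor interval=2s timeout=20s
```

The same change can be made on an existing resource with `crm configure edit`.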
Re: [Linux-HA] Antw: limited usefulness of ocf_take_lock()
28.11.2011 13:09, Ulrich Windl wrote:
Hi!

I posted one more implementation in http://www.mail-archive.com/linux-ha@lists.linux-ha.org/msg19760.html as a part of a bigger code snippet. It just uses the -C shell option to create lock files (it exists at least in bash and dash).

Here is a locking sample (potential replacement functions) that seems to work. Just start the script more than once as a background process and watch the output:

---snip locks.sh
MASTERLOCKFILE=/tmp/blabla

lock() {
    (flock -e 123
    if [ -e "$1" ]; then
        if ! kill -0 $(<"$1") 2>&1 >/dev/null; then
            # stale lock
            echo $$ >"$1"
        else
            false
        fi
    else
        echo $$ >"$1"
    fi) 123>"$MASTERLOCKFILE"
}

unlock() {
    (flock -e 124
    test -e "$1" && rm "$1") 124>"$MASTERLOCKFILE"
}

# application
while true
do
    while ! lock /tmp/foobar; do
        echo "waiting for lock $$"
        sleep 0.2
    done
    echo "lock OK $$"
    sleep 0.1
    if unlock /tmp/foobar; then
        echo "unlock OK $$"
    else
        echo "unlock FAIL $$"
    fi
    sleep 0.1
done
---snip

Regards,
Ulrich

Ulrich Windl ulrich.wi...@rz.uni-regensburg.de schrieb am 28.11.2011 um 10:07 in Nachricht 4ed35d7e02a18...@gwsmtp1.uni-regensburg.de:
Hi!

I was requested to work around a kernel bug by adding locks to my RA. Reading the docs I found that it's supposed to be done via

ocf_take_lock $LOCKFILE
and
ocf_release_lock_on_exit $LOCKFILE

Out of curiosity I inspected the implementation in SLES11 SP1. To me the functions are improperly implemented (unless I'm wrong), because:

1) You can have only one lock per RA, no matter what $LOCKFILE you provide. This is because the lock is actually not the $LOCKFILE, but the process ID of the shell.

2) The implementation does not guarantee mutual exclusion: ocf_pidfile_status() is used to query for an unowned lock. ocf_take_lock() in turn waits until either the specified lockfile does not exist, or the PID in the lockfile has vanished. Then the PID of the RA's shell is written into the lockfile. As can be seen, multiple processes can do that if no lock exists. If you had parallel execution of RAs before, you'll have parallel execution even with those locks.

Finally, you can only release the lock using ocf_release_lock_on_exit(). Unfortunately that function will only release the last lock passed to it, as trap does not accumulate the commands you give to it.

Maybe an approach using flock(1) instead might be better (untested, just from reading the docs):

lock() {
    (flock -e 123; test -e $LOCKFILE || touch $LOCKFILE) 123>$MASTERLOCKFILE
}

unlock() {
    (flock -e 124; test -e $LOCKFILE && rm $LOCKFILE) 124>$MASTERLOCKFILE
}

Regards,
Ulrich
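For what it's worth, a flock(1) held on a file descriptor does provide real mutual exclusion, unlike the PID-file scheme criticized above. A minimal, self-contained demonstration (file names are illustrative):

```shell
#!/bin/sh
# Five concurrent writers increment a counter; the flock(1) region makes
# the read-modify-write atomic, so no update is lost.
LOCKFILE=$(mktemp)
COUNTER=$(mktemp)
echo 0 > "$COUNTER"

bump() {
    (
        flock -x 9                 # blocks until the lock is free
        n=$(cat "$COUNTER")
        echo $((n + 1)) > "$COUNTER"
    ) 9>"$LOCKFILE"                # fd 9 carries the lock
}

for i in 1 2 3 4 5; do bump & done
wait
result=$(cat "$COUNTER")
echo "$result"                     # prints 5: no increments lost
rm -f "$LOCKFILE" "$COUNTER"
```

The lock is released automatically when fd 9 is closed (i.e. when the subshell exits), so there is no separate unlock step to forget.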
Re: [Linux-HA] Error running corosync
21.11.2011 04:18, Nick Khamis wrote:
Correction! Some of the ocfs2_controld.pcmk errors posted earlier were due to pacemaker not running with /service.d/pcmk. The error is actually: http://pastebin.com/XCiuhU20. If I can get the standard dlm working it will all come together!

You can't, point blank. You need dlm_controld.pcmk for a cman-free stack.

This part of the project is so close to completion.

Especially the occasional kernel panics attest to that. ;)
Re: [Linux-HA] Error running corosync
21.11.2011 01:18, Andrew Beekhof wrote:
On Sat, Nov 19, 2011 at 1:02 AM, Nick Khamis sym...@gmail.com wrote:
Hello Andrew,
Thank you so much for your response. My concern was eliminating as much of cman as possible,

Then don't use it at all.

since the goal was to run pacemaker on top of corosync/openais; however, from Vladislav's last email, this is only possible with yet even more hacks.

Not true. SLES/openSUSE has supported cman-free clusters and cluster filesystems for many years.

I spoke to Dinar, who does QA for cluster-related things at SUSE, and he promised to try to reproduce that flaw with CPG_NODEDOWN and fencing. Otherwise, yes, you are absolutely correct. And I use the same things on Fedora.

Vladislav
Re: [Linux-HA] Error running corosync
Hi,

18.11.2011 17:02, Nick Khamis wrote:
[snip]
Vladislav, was this the ocfs2 stack kernel crash you were experiencing one year ago:

To be frank, I do not remember. It was a year ago ;)

Starting ocfs2_controld... [ OK ]
Message from syslogd@astdrbd1 at Nov 18 08:51:59 ...
 kernel:[ 724.636106] Oops: [#1] SMP
Message from syslogd@astdrbd1 at Nov 18 08:51:59 ...
 kernel:[ 724.636106] last sysfs file: /sys/fs/ocfs2/max_locking_protocol
Unfencing self...
Message from syslogd@astdrbd1 at Nov 18 08:51:59 ...
 kernel:[ 724.636106] Process ocfs2_controld. (pid: 6579, ti=c5ec task=c565c880 task.ti=c5ec)
Message from syslogd@astdrbd1 at Nov 18 08:51:59 ...
 kernel:[ 724.636106] Stack:
Message from syslogd@astdrbd1 at Nov 18 08:51:59 ...
 kernel:[ 724.636106] Call Trace:
Message from syslogd@astdrbd1 at Nov 18 08:51:59 ...
 kernel:[ 724.636106] Code: 9e 47 c8 75 c3 fe 05 d0 a1 47 c8 89 f8 5b 5e 5f 5d c3 53 b8 d0 a1 47 c8 89 cb e8 e8 50 df f8 8b 15 d8 a1 47 c8 31 c0 85 d2 74 1c 0f b6 42 01 50 0f b6 02 50 68 8f 98 47 c8 68 00 10 00 00 53 e8
Message from syslogd@astdrbd1 at Nov 18 08:51:59 ...
 kernel:[ 724.636106] EIP: [c8479274] ocfs2_max_locking_protocol_show+0x19/0x3d [ocfs2_stackglue] SS:ESP 0068:c5ec1f48
Message from syslogd@astdrbd1 at Nov 18 08:51:59 ...
 kernel:[ 724.636106] CR2: c861ef65
[ OK ]
Joining fence domain... [ OK ]

Cheers,
Nick.