Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-03-31 Yusuke Iida
Hi, Andrew. crm_mon still has processing that refreshes the CIB to the newest version when pcmk_err_old_data is received. Since this processing seems to be unnecessary, like the processing that was changed in stonithd, I have corrected it. Please merge the following if it is satisfactory.

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-03-17 Andrew Beekhof
On 12 Mar 2014, at 1:45 pm, Yusuke Iida yusk.i...@gmail.com wrote: Hi, Andrew 2014-03-12 6:37 GMT+09:00 Andrew Beekhof and...@beekhof.net: Mar 07 13:24:14 [2528] vm01 crmd: (te_callbacks:493 ) error: te_update_diff: Ingoring create operation for /cib 0xf91c10, configuration

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-03-12 Vladislav Bogdanov
12.03.2014 00:40, Andrew Beekhof wrote: On 11 Mar 2014, at 6:23 pm, Vladislav Bogdanov bub...@hoster-ok.com wrote: 07.03.2014 10:30, Vladislav Bogdanov wrote: 07.03.2014 05:43, Andrew Beekhof wrote: On 6 Mar 2014, at 10:39 pm, Vladislav Bogdanov bub...@hoster-ok.com wrote: 18.02.2014

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-03-11 Vladislav Bogdanov
07.03.2014 10:30, Vladislav Bogdanov wrote: 07.03.2014 05:43, Andrew Beekhof wrote: On 6 Mar 2014, at 10:39 pm, Vladislav Bogdanov bub...@hoster-ok.com wrote: 18.02.2014 03:49, Andrew Beekhof wrote: On 31 Jan 2014, at 6:20 pm, yusuke iida yusk.i...@gmail.com wrote: Hi, all. I am measuring the

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-03-11 Yusuke Iida
Hi, Andrew 2014-03-11 14:21 GMT+09:00 Andrew Beekhof and...@beekhof.net: On 11 Mar 2014, at 4:14 pm, Andrew Beekhof and...@beekhof.net wrote: [snip] If I do this however: # cp start.xml 1.xml; tools/cibadmin --replace -o configuration --xml-file replace.some -V I start to see

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-03-11 Andrew Beekhof
On 11 Mar 2014, at 6:51 pm, Yusuke Iida yusk.i...@gmail.com wrote: Hi, Andrew 2014-03-11 14:21 GMT+09:00 Andrew Beekhof and...@beekhof.net: On 11 Mar 2014, at 4:14 pm, Andrew Beekhof and...@beekhof.net wrote: [snip] If I do this however: # cp start.xml 1.xml; tools/cibadmin

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-03-11 Andrew Beekhof
On 11 Mar 2014, at 6:23 pm, Vladislav Bogdanov bub...@hoster-ok.com wrote: 07.03.2014 10:30, Vladislav Bogdanov wrote: 07.03.2014 05:43, Andrew Beekhof wrote: On 6 Mar 2014, at 10:39 pm, Vladislav Bogdanov bub...@hoster-ok.com wrote: 18.02.2014 03:49, Andrew Beekhof wrote: On 31 Jan

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-03-11 Andrew Beekhof
On 12 Mar 2014, at 8:40 am, Andrew Beekhof and...@beekhof.net wrote: On 11 Mar 2014, at 6:23 pm, Vladislav Bogdanov bub...@hoster-ok.com wrote: 07.03.2014 10:30, Vladislav Bogdanov wrote: 07.03.2014 05:43, Andrew Beekhof wrote: On 6 Mar 2014, at 10:39 pm, Vladislav Bogdanov

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-03-11 Yusuke Iida
Hi, Andrew 2014-03-12 6:37 GMT+09:00 Andrew Beekhof and...@beekhof.net: Mar 07 13:24:14 [2528] vm01 crmd: (te_callbacks:493 ) error: te_update_diff: Ingoring create operation for /cib 0xf91c10, configuration That's interesting... is that with the fixes mentioned above? I'm sorry. The

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-03-11 Vladislav Bogdanov
12.03.2014 00:37, Andrew Beekhof wrote: ... I'm somewhat confused at this point: if crmsh is using --replace, then why is it doing diff calculations? Or are replace operations only for the load operation? It uses one of two methods, depending on the Pacemaker version.

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-03-10 Andrew Beekhof
On 7 Mar 2014, at 5:35 pm, Yusuke Iida yusk.i...@gmail.com wrote: Hi, Andrew 2014-03-07 11:43 GMT+09:00 Andrew Beekhof and...@beekhof.net: I don't understand... crm_mon doesn't look for changes to resources or constraints and it should already be using the new faster diff format. [/me

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-03-10 Yusuke Iida
Hi, Andrew. I have attached the CLI file that was loaded. Although the loaded XML does not exist as a file, I think from the log that it takes the following form. This log is extracted from the following report. https://drive.google.com/file/d/0BwMFJItoO-fVWEw4Qnp0aHIzSm8/edit?usp=sharing Mar 07 13:24:14 [2523]

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-03-10 Andrew Beekhof
I tried replacing pe-input-2.bz2 with pe-input-3.bz2 and saw: # cp start.xml 1.xml; tools/cibadmin --replace --xml-file replace.xml -V ( cib_file.c:268 )info: cib_file_perform_op_delegate:cib_replace on (null) ( cib_utils.c:338 ) trace: cib_perform_op: Begin cib_replace op (

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-03-10 Andrew Beekhof
On 11 Mar 2014, at 4:14 pm, Andrew Beekhof and...@beekhof.net wrote: [snip] If I do this however: # cp start.xml 1.xml; tools/cibadmin --replace -o configuration --xml-file replace.some -V I start to see what you see: ( xml.c:4985 )info: validate_with_relaxng:

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-03-07 Kristoffer Grönlund
On Fri, 07 Mar 2014 10:30:13 +0300 Vladislav Bogdanov bub...@hoster-ok.com wrote: Andrew, current git master (ee094a2) almost works; the only issue is that crm_diff calculates an incorrect diff digest. If I replace the digest in the diff by hand with what cib calculates as expected, it applies

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-03-06 Vladislav Bogdanov
18.02.2014 03:49, Andrew Beekhof wrote: On 31 Jan 2014, at 6:20 pm, yusuke iida yusk.i...@gmail.com wrote: Hi, all. I am measuring the performance of Pacemaker with the following combination: Pacemaker-1.1.11.rc1, libqb-0.16.0, corosync-2.3.2. All nodes are KVM virtual machines. I stopped the

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-03-06 Kristoffer Grönlund
On Thu, 06 Mar 2014 14:39:46 +0300 Vladislav Bogdanov bub...@hoster-ok.com wrote: Probably best to poke the corosync guys about this. However, <= .11 is known to cause significant CPU usage with that many nodes. I can easily imagine this starving corosync of resources and causing

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-03-06 Andrew Beekhof
On 6 Mar 2014, at 10:39 pm, Vladislav Bogdanov bub...@hoster-ok.com wrote: 18.02.2014 03:49, Andrew Beekhof wrote: On 31 Jan 2014, at 6:20 pm, yusuke iida yusk.i...@gmail.com wrote: Hi, all. I am measuring the performance of Pacemaker with the following combination: Pacemaker-1.1.11.rc1

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-03-06 Yusuke Iida
Hi, Andrew 2014-03-07 11:43 GMT+09:00 Andrew Beekhof and...@beekhof.net: I don't understand... crm_mon doesn't look for changes to resources or constraints and it should already be using the new faster diff format. [/me reads attachment] Ah, but perhaps I do understand after all :-) This

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-03-06 Vladislav Bogdanov
07.03.2014 05:43, Andrew Beekhof wrote: On 6 Mar 2014, at 10:39 pm, Vladislav Bogdanov bub...@hoster-ok.com wrote: 18.02.2014 03:49, Andrew Beekhof wrote: On 31 Jan 2014, at 6:20 pm, yusuke iida yusk.i...@gmail.com wrote: Hi, all. I am measuring the performance of Pacemaker with the following

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-02-20 Andrew Beekhof
On 20 Feb 2014, at 6:06 pm, yusuke iida yusk.i...@gmail.com wrote: Hi, Andrew. I tested in the following environment: 16 KVM virtual machines (CPU: 1, memory: 2048MB), OS: RHEL6.4, Pacemaker-1.1.11(709b36b), corosync-2.3.2, libqb-0.16.0. It looks like performance is much better on the

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-02-20 yusuke iida
Hi, Andrew 2014-02-20 17:28 GMT+09:00 Andrew Beekhof and...@beekhof.net: Who was pid 16243? Doesn't look like a pacemaker daemon. pid 16243 is crm_mon. I started crm_mon on vm01 and checked the state with it. If any other information is required for the analysis, I will collect it. Regards, Yusuke

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-02-20 Andrew Beekhof
On 20 Feb 2014, at 8:39 pm, yusuke iida yusk.i...@gmail.com wrote: Hi, Andrew 2014-02-20 17:28 GMT+09:00 Andrew Beekhof and...@beekhof.net: Who was pid 16243? Doesn't look like a pacemaker daemon. pid 16243 is crm_mon. That means that the state displayed by crm_mon was 500 updates

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-02-18 Vladislav Bogdanov
18.02.2014 03:49, Andrew Beekhof wrote: On 31 Jan 2014, at 6:20 pm, yusuke iida yusk.i...@gmail.com wrote: Hi, all. I am measuring the performance of Pacemaker with the following combination: Pacemaker-1.1.11.rc1, libqb-0.16.0, corosync-2.3.2. All nodes are KVM virtual machines. I stopped the

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-02-18 Andrew Beekhof
On 18 Feb 2014, at 7:40 pm, Vladislav Bogdanov bub...@hoster-ok.com wrote: 18.02.2014 03:49, Andrew Beekhof wrote: On 31 Jan 2014, at 6:20 pm, yusuke iida yusk.i...@gmail.com wrote: Hi, all. I am measuring the performance of Pacemaker with the following combination: Pacemaker-1.1.11.rc1

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-02-18 Andrew Beekhof
On 18 Feb 2014, at 8:18 pm, Andrew Beekhof and...@beekhof.net wrote: On 18 Feb 2014, at 7:40 pm, Vladislav Bogdanov bub...@hoster-ok.com wrote: 18.02.2014 03:49, Andrew Beekhof wrote: On 31 Jan 2014, at 6:20 pm, yusuke iida yusk.i...@gmail.com wrote: Hi, all. I am measuring the

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-02-18 yusuke iida
Hi, Andrew and Digimer. Thank you for your comments. I solved this problem by referring to another mailing list discussion about it: https://bugzilla.redhat.com/show_bug.cgi?id=880035 In conclusion, it seems that the kernel in my environment was old. I have now updated to the newest kernel.

Re: [Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-02-17 Andrew Beekhof
On 31 Jan 2014, at 6:20 pm, yusuke iida yusk.i...@gmail.com wrote: Hi, all. I am measuring the performance of Pacemaker with the following combination: Pacemaker-1.1.11.rc1, libqb-0.16.0, corosync-2.3.2. All nodes are KVM virtual machines. I forcibly stopped the vm01 node from the

[Pacemaker] What is the reason which the node in which failure has not occurred carries out lost?

2014-01-30 yusuke iida
Hi, all. I am measuring the performance of Pacemaker with the following combination: Pacemaker-1.1.11.rc1, libqb-0.16.0, corosync-2.3.2. All nodes are KVM virtual machines. I forcibly stopped the vm01 node from the inside after starting 14 nodes, using virsh destroy vm01 to stop it. Then, in