Hi Andrew,

Thank you for your comment.
> More likely of the underlying messaging infrastructure, but I'll take a look.
> Perhaps the default cib operation timeouts are too low for larger clusters.

> > The log is attached to the following Bugzilla entry.
> >  * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2443
>
> Ok, I'll follow up there.

If there is anything we can do to help resolve the problem, please let us know.

Best Regards,
Hideo Yamauchi.

--- Andrew Beekhof <and...@beekhof.net> wrote:

> On Mon, Jun 14, 2010 at 4:46 AM, <renayama19661...@ybb.ne.jp> wrote:
> > We tested a 16-node configuration (15+1).
> >
> > We carried out the following procedure:
> >
> > Step 1) Start all 16 nodes.
> > Step 2) Send the cib after the DC node was elected.
> >
> > An error occurs on the update of the pingd attribute after probe processing has finished.
> >
> > ------------------------------------------------------------
> > Jun 14 10:58:03 hb0102 pingd: [2465]: info: ping_read: Retrying...
> > Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update 337 for default_ping_set=1600 failed: Remote node did not respond
> > Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update 340 for default_ping_set=1600 failed: Remote node did not respond
> > Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update 343 for default_ping_set=1600 failed: Remote node did not respond
> > Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update 346 for default_ping_set=1600 failed: Remote node did not respond
> > Jun 14 10:58:13 hb0102 attrd: [2155]: WARN: attrd_cib_callback: Update 349 for default_ping_set=1600 failed: Remote node did not respond
> > ------------------------------------------------------------
> >
> > While this error was occurring, we ran a cibadmin (-Q option) command, but it timed out.
> > In addition, according to the top command, the cib process on the DC node seemed to be very busy.
> >
> > Furthermore, a communication error with cib occurs on the DC node, and crmd restarts.
> > ------------------------------------------------------------
> > Jun 14 10:58:09 hb0101 attrd: [2278]: WARN: xmlfromIPC: No message received in the required interval (120s)
> > Jun 14 10:58:09 hb0101 attrd: [2278]: info: attrd_perform_update: Sent update -41: default_ping_set=1600
> > (snip)
> > Jun 14 10:59:07 hb0101 crmd: [2280]: info: do_exit: [crmd] stopped (2)
> > Jun 14 10:59:07 hb0101 corosync[2269]: [pcmk ] plugin.c:858 info: pcmk_ipc_exit: Client crmd (conn=0x106a2bf0, async-conn=0x106a2bf0) left
> > Jun 14 10:59:08 hb0101 corosync[2269]: [pcmk ] plugin.c:481 ERROR: pcmk_wait_dispatch: Child process crmd exited (pid=2280, rc=2)
> > Jun 14 10:59:08 hb0101 corosync[2269]: [pcmk ] plugin.c:498 notice: pcmk_wait_dispatch: Respawning failed child process: crmd
> > Jun 14 10:59:08 hb0101 corosync[2269]: [pcmk ] utils.c:131 info: spawn_child: Forked child 2680 for process crmd
> > Jun 14 10:59:08 hb0101 crmd: [2680]: info: Invoked: /usr/lib64/heartbeat/crmd
> > Jun 14 10:59:08 hb0101 crmd: [2680]: info: main: CRM Hg Version: 9f04fa88cfd3da553e977cc79983d1c494c8b502
> > Jun 14 10:59:08 hb0101 crmd: [2680]: info: crmd_init: Starting crmd
> > Jun 14 10:59:08 hb0101 crmd: [2680]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> > ------------------------------------------------------------
> >
> > There seems to be some problem in the cib process on the DC node.
> > We would like attribute changes to complete reliably on all 16 nodes.
> >  * Is this phenomenon a limit of the current cib process?
>
> More likely of the underlying messaging infrastructure, but I'll take a look.
> Perhaps the default cib operation timeouts are too low for larger clusters.
>
> > The log is attached to the following Bugzilla entry.
> >  * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2443
>
> Ok, I'll follow up there.
>
> > Best Regards,
> > Hideo Yamauchi.

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
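For the `cibadmin -Q` timeout mentioned in the report, one workaround while the underlying issue is investigated is to raise the per-call timeout and retry. A minimal sketch, assuming cibadmin's standard `-Q` (query) and `-t` (timeout, in seconds) options; the 120-second timeout, retry count, and output path are illustrative assumptions, not recommendations from this thread:

```shell
# Query the full CIB with an extended call timeout, retrying on failure.
# The timeout (120s), retry count (3), and snapshot path are assumed
# values to tune per cluster; they are not from the original report.
query_cib() {
    tries=3
    i=1
    while [ "$i" -le "$tries" ]; do
        # -Q: query the whole CIB; -t: per-call timeout in seconds
        if cibadmin -Q -t 120 > /tmp/cib-snapshot.xml; then
            echo "CIB query succeeded on attempt $i"
            return 0
        fi
        i=$((i + 1))
        sleep 5
    done
    echo "CIB query failed after $tries attempts" >&2
    return 1
}
```

On a busy DC this at least distinguishes "slow but answering" from "not answering at all", which matches Andrew's suggestion that the default operation timeouts may simply be too low for larger clusters.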