Hi Andrew,

> Do you know if this behaviour still exists?
> A LOT of work went into the remote node logic in the last couple of
> months, it's possible this was fixed as a side-effect.
I have not confirmed it with the latest version. I will confirm it.

Many Thanks!
Hideo Yamauchi.

----- Original Message -----
> From: Andrew Beekhof <and...@beekhof.net>
> To: renayama19661...@ybb.ne.jp; Cluster Labs - All topics related to
> open-source clustering welcomed <users@clusterlabs.org>
> Cc:
> Date: 2015/8/4, Tue 13:16
> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: [Question] About movement
> of pacemaker_remote.
>
>> On 12 May 2015, at 12:12 pm, renayama19661...@ybb.ne.jp wrote:
>>
>> Hi All,
>>
>> The problem seems to be that the buffer somehow becomes NULL after
>> running crm_resource -C once the remote node has been rebooted.
>>
>> I inserted a log into the source code and confirmed it.
>>
>> ------------------------------------------------
>> crm_remote_recv_once(crm_remote_t * remote)
>> {
>> (snip)
>>     /* automatically grow the buffer when needed */
>>     if (remote->buffer_size < read_len) {
>>         remote->buffer_size = 2 * read_len;
>>         crm_trace("Expanding buffer to %u bytes", remote->buffer_size);
>>
>>         remote->buffer = realloc_safe(remote->buffer,
>>                                       remote->buffer_size + 1);
>>         CRM_ASSERT(remote->buffer != NULL);
>>     }
>>
>> #ifdef HAVE_GNUTLS_GNUTLS_H
>>     if (remote->tls_session) {
>>         if (remote->buffer == NULL) {
>>             crm_info("### YAMAUCHI buffer is NULL [buffer_size[%d] readlen[%d]]",
>>                      remote->buffer_size, read_len);
>>         }
>>         rc = gnutls_record_recv(*(remote->tls_session),
>>                                 remote->buffer + remote->buffer_offset,
>>                                 remote->buffer_size - remote->buffer_offset);
>> (snip)
>> ------------------------------------------------
>>
>> May 12 10:54:01 sl7-01 crmd[30447]: info: crm_remote_recv_once: ### YAMAUCHI buffer is NULL [buffer_size[1326] readlen[40]]
>> May 12 10:54:02 sl7-01 crmd[30447]: info: crm_remote_recv_once: ### YAMAUCHI buffer is NULL [buffer_size[1326] readlen[40]]
>> May 12 10:54:04 sl7-01 crmd[30447]: info: crm_remote_recv_once: ### YAMAUCHI buffer is NULL [buffer_size[1326] readlen[40]]
>
> Do you know if this behaviour still exists?
> A LOT of work went into the remote node logic in the last couple of
> months, it's possible this was fixed as a side-effect.
>
>>
>> ------------------------------------------------
>>
>> gnutls_record_recv is handed the empty buffer and returns the error.
>>
>> ------------------------------------------------
>> (snip)
>> ssize_t
>> _gnutls_recv_int(gnutls_session_t session, content_type_t type,
>>                  gnutls_handshake_description_t htype,
>>                  gnutls_packet_t *packet,
>>                  uint8_t * data, size_t data_size, void *seq,
>>                  unsigned int ms)
>> {
>>     int ret;
>>
>>     if (packet == NULL && (type != GNUTLS_ALERT && type != GNUTLS_HEARTBEAT)
>>         && (data_size == 0 || data == NULL))
>>         return gnutls_assert_val(GNUTLS_E_INVALID_REQUEST);
>>
>> (snip)
>> ssize_t
>> gnutls_record_recv(gnutls_session_t session, void *data, size_t data_size)
>> {
>>     return _gnutls_recv_int(session, GNUTLS_APPLICATION_DATA, -1, NULL,
>>                             data, data_size, NULL,
>>                             session->internals.record_timeout_ms);
>> }
>> (snip)
>> ------------------------------------------------
>>
>> Best Regards,
>> Hideo Yamauchi.
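The logged numbers above explain why the "grow" branch never fires: with
buffer_size=1326 and read_len=40, "buffer_size < read_len" is false, so a
buffer that was freed when the old connection was destroyed, while its
recorded size stayed non-zero, is never reallocated and remains NULL. A
minimal defensive sketch of a guard before the gnutls_record_recv() call
(an illustration only, not necessarily the actual upstream fix; the field
names, realloc_safe() and CRM_ASSERT() are taken from the snippet above):

------------------------------------------------
if (remote->buffer == NULL) {
    /* The buffer was freed but buffer_size was not reset, so the
     * "grow when needed" branch above was skipped (1326 < 40 is false).
     * Re-allocate to the recorded size so gnutls_record_recv() is not
     * handed a NULL pointer, which gnutls rejects with
     * GNUTLS_E_INVALID_REQUEST (-50, "The request is invalid."). */
    remote->buffer_offset = 0;   /* a fresh buffer holds no data yet */
    remote->buffer = realloc_safe(remote->buffer, remote->buffer_size + 1);
    CRM_ASSERT(remote->buffer != NULL);
}
------------------------------------------------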
>>
>> ----- Original Message -----
>>> From: "renayama19661...@ybb.ne.jp" <renayama19661...@ybb.ne.jp>
>>> To: "users@clusterlabs.org" <users@clusterlabs.org>
>>> Cc:
>>> Date: 2015/5/11, Mon 16:45
>>> Subject: Re: [ClusterLabs] Antw: Re: Antw: Re: [Question] About
>>> movement of pacemaker_remote.
>>>
>>> Hi Ulrich,
>>>
>>> Thank you for the comments.
>>>
>>>> So your host and your resource are both named "snmp1"? I also don't
>>>> have much experience with cleaning up resources for a node that is
>>>> offline. What change should it make (while the node is offline)?
>>>
>>> Yes, the name of the remote resource and the name of the remote node
>>> are the same, "snmp1".
>>>
>>> (snip)
>>> primitive snmp1 ocf:pacemaker:remote \
>>>     params \
>>>         server="snmp1" \
>>>     op start interval="0s" timeout="60s" on-fail="ignore" \
>>>     op monitor interval="3s" timeout="15s" \
>>>     op stop interval="0s" timeout="60s" on-fail="ignore"
>>>
>>> primitive Host-rsc1 ocf:heartbeat:Dummy \
>>>     op start interval="0s" timeout="60s" on-fail="restart" \
>>>     op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>     op stop interval="0s" timeout="60s" on-fail="ignore"
>>>
>>> primitive Remote-rsc1 ocf:heartbeat:Dummy \
>>>     op start interval="0s" timeout="60s" on-fail="restart" \
>>>     op monitor interval="10s" timeout="60s" on-fail="restart" \
>>>     op stop interval="0s" timeout="60s" on-fail="ignore"
>>>
>>> location loc1 Remote-rsc1 \
>>>     rule 200: #uname eq snmp1
>>> location loc3 Host-rsc1 \
>>>     rule 200: #uname eq bl460g8n1
>>> (snip)
>>>
>>> pacemaker_remoted on the snmp1 node is stopped with SIGTERM. I restart
>>> pacemaker_remoted on the snmp1 node afterwards. Then I execute the
>>> crm_resource command, but the snmp1 node remains offline.
>>>
>>> I think the correct behaviour would be for the snmp1 node to come back
>>> online once the crm_resource command has been executed.
>>>
>>> Best Regards,
>>> Hideo Yamauchi.
>>>
>>> ----- Original Message -----
>>>> From: Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de>
>>>> To: users@clusterlabs.org; renayama19661...@ybb.ne.jp
>>>> Cc:
>>>> Date: 2015/5/11, Mon 15:39
>>>> Subject: Antw: Re: [ClusterLabs] Antw: Re: [Question] About movement
>>>> of pacemaker_remote.
>>>>
>>>> >>> <renayama19661...@ybb.ne.jp> wrote on 11.05.2015 at 06:22 in
>>>> message <361916.15877...@web200006.mail.kks.yahoo.co.jp>:
>>>>> Hi All,
>>>>>
>>>>> I matched the OS version of the remote node with the host once again
>>>>> and confirmed it with Pacemaker 1.1.13-rc2.
>>>>>
>>>>> It was the same even when I made the host RHEL7.1 (bl460g8n1).
>>>>> I made the remote host RHEL7.1 as well (snmp1).
>>>>>
>>>>> The first crm_resource -C fails.
>>>>> --------------------------------
>>>>> [root@bl460g8n1 ~]# crm_resource -C -r snmp1
>>>>> Cleaning up snmp1 on bl460g8n1
>>>>> Waiting for 1 replies from the CRMd. OK
>>>>>
>>>>> [root@bl460g8n1 ~]# crm_mon -1 -Af
>>>>> Last updated: Mon May 11 12:44:31 2015
>>>>> Last change: Mon May 11 12:43:30 2015
>>>>> Stack: corosync
>>>>> Current DC: bl460g8n1 - partition WITHOUT quorum
>>>>> Version: 1.1.12-7a2e3ae
>>>>> 2 Nodes configured
>>>>> 3 Resources configured
>>>>>
>>>>>
>>>>> Online: [ bl460g8n1 ]
>>>>> RemoteOFFLINE: [ snmp1 ]
>>>>
>>>> So your host and your resource are both named "snmp1"? I also don't
>>>> have much experience with cleaning up resources for a node that is
>>>> offline. What change should it make (while the node is offline)?
>>>>
>>>>>
>>>>> Host-rsc1 (ocf::heartbeat:Dummy): Started bl460g8n1
>>>>> Remote-rsc1 (ocf::heartbeat:Dummy): Started bl460g8n1 (failure ignored)
>>>>>
>>>>> Node Attributes:
>>>>> * Node bl460g8n1:
>>>>>     + ringnumber_0 : 192.168.101.21 is UP
>>>>>     + ringnumber_1 : 192.168.102.21 is UP
>>>>>
>>>>> Migration summary:
>>>>> * Node bl460g8n1:
>>>>>    snmp1: migration-threshold=1 fail-count=1000000 last-failure='Mon
>>>>> May 11 12:44:28 2015'
>>>>>
>>>>> Failed actions:
>>>>>     snmp1_start_0 on bl460g8n1 'unknown error' (1): call=5,
>>>>> status=Timed Out, exit-reason='none', last-rc-change='Mon May 11
>>>>> 12:43:31 2015', queued=0ms, exec=0ms
>>>>> --------------------------------
>>>>>
>>>>>
>>>>> The second crm_resource -C succeeded and was connected to the remote
>>>>> host.
>>>>
>>>> Then the node was online, it seems.
>>>>
>>>> Regards,
>>>> Ulrich
>>>>
>>>>> --------------------------------
>>>>> [root@bl460g8n1 ~]# crm_mon -1 -Af
>>>>> Last updated: Mon May 11 12:44:54 2015
>>>>> Last change: Mon May 11 12:44:48 2015
>>>>> Stack: corosync
>>>>> Current DC: bl460g8n1 - partition WITHOUT quorum
>>>>> Version: 1.1.12-7a2e3ae
>>>>> 2 Nodes configured
>>>>> 3 Resources configured
>>>>>
>>>>>
>>>>> Online: [ bl460g8n1 ]
>>>>> RemoteOnline: [ snmp1 ]
>>>>>
>>>>> Host-rsc1 (ocf::heartbeat:Dummy): Started bl460g8n1
>>>>> Remote-rsc1 (ocf::heartbeat:Dummy): Started snmp1
>>>>> snmp1 (ocf::pacemaker:remote): Started bl460g8n1
>>>>>
>>>>> Node Attributes:
>>>>> * Node bl460g8n1:
>>>>>     + ringnumber_0 : 192.168.101.21 is UP
>>>>>     + ringnumber_1 : 192.168.102.21 is UP
>>>>> * Node snmp1:
>>>>>
>>>>> Migration summary:
>>>>> * Node bl460g8n1:
>>>>> * Node snmp1:
>>>>> --------------------------------
>>>>>
>>>>> The gnutls on the host and the remote node was the following version:
>>>>>
>>>>> gnutls-devel-3.3.8-12.el7.x86_64
>>>>> gnutls-dane-3.3.8-12.el7.x86_64
>>>>> gnutls-c++-3.3.8-12.el7.x86_64
>>>>> gnutls-3.3.8-12.el7.x86_64
>>>>> gnutls-utils-3.3.8-12.el7.x86_64
>>>>>
>>>>>
>>>>> Best Regards,
>>>>> Hideo Yamauchi.
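Identical package versions do not rule out a difference in what the
daemons actually load at run time, so when comparing the host and the
remote node it can also help to print both the compile-time and the
run-time gnutls version. A small stand-alone check (a sketch using only
the plain gnutls API; nothing Pacemaker-specific is assumed):

------------------------------------------------
#include <stdio.h>
#include <gnutls/gnutls.h>

int main(void)
{
    /* gnutls_check_version(NULL) requests no minimum version and simply
     * returns the version string of the gnutls library loaded at run
     * time; GNUTLS_VERSION is the version of the headers built against. */
    printf("compiled against gnutls %s, running gnutls %s\n",
           GNUTLS_VERSION, gnutls_check_version(NULL));
    return 0;
}
------------------------------------------------

Compile with, for example, "gcc check.c $(pkg-config --cflags --libs
gnutls)" (the file name is arbitrary); differing output on the two nodes
would support the gnutls suspicion discussed below.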
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "renayama19661...@ybb.ne.jp" <renayama19661...@ybb.ne.jp>
>>>>>> To: Cluster Labs - All topics related to open-source clustering
>>>>>> welcomed <users@clusterlabs.org>
>>>>>> Cc:
>>>>>> Date: 2015/4/28, Tue 14:06
>>>>>> Subject: Re: [ClusterLabs] Antw: Re: [Question] About movement of
>>>>>> pacemaker_remote.
>>>>>>
>>>>>> Hi David,
>>>>>>
>>>>>> Even when I changed the remote node to RHEL7.1, the result was the
>>>>>> same.
>>>>>>
>>>>>> I will try it with the Pacemaker host node on RHEL7.1 this time.
>>>>>>
>>>>>> I noticed an interesting phenomenon.
>>>>>> The remote node fails to reconnect on the first crm_resource.
>>>>>> However, the remote node succeeds in reconnecting on the second
>>>>>> crm_resource.
>>>>>>
>>>>>> I think there is some problem at the point where the connection
>>>>>> with the remote node is first cut.
>>>>>>
>>>>>> Best Regards,
>>>>>> Hideo Yamauchi.
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: "renayama19661...@ybb.ne.jp" <renayama19661...@ybb.ne.jp>
>>>>>>> To: Cluster Labs - All topics related to open-source clustering
>>>>>>> welcomed <users@clusterlabs.org>
>>>>>>> Cc:
>>>>>>> Date: 2015/4/28, Tue 11:52
>>>>>>> Subject: Re: [ClusterLabs] Antw: Re: [Question] About movement of
>>>>>>> pacemaker_remote.
>>>>>>>
>>>>>>> Hi David,
>>>>>>>
>>>>>>> Thank you for the comments.
>>>>>>>
>>>>>>>> At first glance this looks gnutls related. GNUTLS is returning
>>>>>>>> -50 during receive on the client side (pacemaker's side). -50
>>>>>>>> maps to 'invalid request'.
>>>>>>>> > debug: crm_remote_recv_once: TLS receive failed: The request
>>>>>>>> > is invalid.
>>>>>>>> We treat this error as fatal and destroy the connection. I've
>>>>>>>> never encountered this error and I don't know what causes it.
>>>>>>>> It's possible there's a bug in our gnutls usage... it's also
>>>>>>>> possible there's a bug in the version of gnutls that is in use
>>>>>>>> as well.
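Tying David's observation to the buffer analysis above: -50 is
GNUTLS_E_INVALID_REQUEST, which is exactly what _gnutls_recv_int()
returns when the destination buffer is NULL or its size is 0, and
gnutls_strerror() maps it to the very text seen in the debug log. A
minimal check of that mapping (a sketch; no TLS session is needed for
this):

------------------------------------------------
#include <stdio.h>
#include <gnutls/gnutls.h>

int main(void)
{
    /* GNUTLS_E_INVALID_REQUEST is -50; its error string is the one that
     * crm_remote_recv_once() logged: "The request is invalid." */
    printf("%d -> %s\n", GNUTLS_E_INVALID_REQUEST,
           gnutls_strerror(GNUTLS_E_INVALID_REQUEST));
    return 0;
}
------------------------------------------------

In other words, a NULL remote->buffer handed to gnutls_record_recv() is
sufficient to produce the fatal "The request is invalid" receive error
that destroys the connection.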
>>>>>>>
>>>>>>> We built the remote node on RHEL6.5.
>>>>>>> Because it may be a problem of gnutls, I will confirm it on
>>>>>>> RHEL7.1.
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Hideo Yamauchi.

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org