Hi Mahesh,

Would it be possible to provide the fix for this defect as early as possible? We have a release in June; could the fix be available before then? Could I at least get the patch before it is officially released?

Regards,
Girish

*From:* Girish Nagaraj [mailto:[email protected]]
*Sent:* Thursday, March 26, 2015 3:26 PM
*To:* 'A V Mahesh'; '[email protected]'
*Subject:* RE: [users] Issues with CPSv

Hi Mahesh,

I tested with OpenSAF 4.5 using TIPC as the MDS transport, and this issue is not seen. I have raised a ticket: “*#1285 MDS TCP: zero bytes recvd results in application exit*”

Regards,
Girish

*From:* A V Mahesh [mailto:[email protected] <[email protected]>]
*Sent:* Monday, February 23, 2015 10:04 AM
*To:* Girish Nagaraj; [email protected]
*Subject:* Re: [users] Issues with CPSv

Hi,

To confirm and isolate the problem further, test your application with the TIPC transport with the ticket #1227 fix (on both 4.3 and 4.5) and report your observations.

If the issue is NOT reproducible with the TIPC transport, then as a workaround prevent your ckpt application from sending ZERO-size (hack) messages over the TCP transport, and raise a ticket with all the details, as Mathi explained.

-AVM

On 2/20/2015 3:33 PM, Girish Nagaraj wrote:

Hi,

Yes, similar issue in TCP also: exits with message:

Feb 20 15:24:59 fedvm1 RIB[28549]: MDTM:socket_recv() = 0, conn lost with dh server, exiting library err :Success
Feb 20 15:24:59 fedvm1 osafamfnd[28263]: NO 'safSu=SU1,safSg=zebos-simplex,safApp=zebos' component restart probation timer started (timeout: 4000000000 ns)
Feb 20 15:24:59 fedvm1 osafamfnd[28263]: NO Restarting a component of 'safSu=SU1,safSg=zebos-simplex,safApp=zebos' (comp restart count: 1)
Feb 20 15:24:59 fedvm1 osafamfnd[28263]: NO 'safComp=ribd,safSu=SU1,safSg=zebos-simplex,safApp=zebos' faulted due to 'avaDown' : Recovery is 'componentRestart'

I experimented with code changes:

    recd_bytes = recv(tcp_cb->DBSRsock, tcp_cb->len_buff, 2, MSG_NOSIGNAL);
    if (0 == recd_bytes) {
        syslog(LOG_ERR, "MDTM:socket_recv() = %d, conn lost with dh server, exiting library err 111:%d", recd_bytes, errno);
        close(tcp_cb->DBSRsock);
        exit(0);
    } else if (2 == recd_bytes) {
        uint16_t local_len_buf = 0;

        data = tcp_cb->len_buff;
        local_len_buf = ncs_decode_16bit(&data);

        /* MY CHANGE START */
        if (0 == local_len_buf)
            return;
        /* MY CHANGE END */

        tcp_cb->buff_total_len = local_len_buf;
        tcp_cb->num_by_read_for_len_buff = 2;

        if (NULL == (tcp_cb->buffer = calloc(1, (local_len_buf + 1)))) {
            /* Length + 2 is done to reuse the same buffer
               while sending to other nodes */
            syslog(LOG_ERR, "Memory allocation failed in dtm_intranode_processing");
            return;
        }
        recd_bytes = recv(tcp_cb->DBSRsock, tcp_cb->buffer, local_len_buf, 0);
        if (recd_bytes < 0) {
            return;
        } else if (0 == recd_bytes) {
            syslog(LOG_ERR, "MDTM:socket_recv() = %d, conn lost with dh server, exiting library err 222:%d len:%d", recd_bytes, errno, local_len_buf);
            close(tcp_cb->DBSRsock);
            exit(0);

This caused many other issues, so I think just returning won’t work.

Regards,
Girish

-----Original Message-----
From: A V Mahesh [mailto:[email protected]]
Sent: Friday, February 20, 2015 1:38 PM
To: Girish Nagaraj; [email protected]
Subject: Re: [users] Issues with CPSv

Hi,

On 2/20/2015 1:19 PM, Girish Nagaraj wrote:
> Hi,
>
> I think this is not a connection loss. We are passing 0 (the number of
> bytes to be read) to the recv() function, which returns 0 received bytes.

Do you mean that you are seeing an issue similar to `TIPC ticket #1227 mds/tipc : protect mds application form zero bytes hacking messages` for TCP as well?

-AVM

>
> local_len_buf = ncs_decode_16bit(&data);
>
> Is there a mistake in decoding local_len_buf?
>
> Regards,
> Girish
>
> -----Original Message-----
> From: A V Mahesh [mailto:[email protected] <[email protected]> ]
> Sent: Friday, February 20, 2015 11:03 AM
> To: [email protected]
> Subject: Re: [users] Issues with CPSv
>
> Hi,
>
> On 2/19/2015 3:42 PM, Girish Nagaraj wrote:
>> local_len_buf turns out to be 0; this causes recv() to return 0 and
>> the application exits. Is this a programming bug?
> This is expected behavior: if a connection loss happens on a TCP
> socket, it receives ZERO bytes. This is not related to CPSv.
>
> -AVM
>
>
> On 2/19/2015 3:42 PM, Girish Nagaraj wrote:
>> Hi,
>>
>> *Background*:
>>
>> OpenSAF version: 4.5
>> Number of checkpoints used: 2
>>
>> In our application we use CPSv to save application data. When the
>> application faults, it is restarted and its state is restored by
>> reading the data back from the checkpoints.
>>
>> Model: Simplex
>>
>> *Issue faced:*
>>
>> The application sometimes crashes, with the stack trace below:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> search (pTree=pTree@entry=0x8f733e4, key=key@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:94
>> 94 patricia.c: No such file or directory.
>>
>> (gdb) bt
>> #0 search (pTree=pTree@entry=0x8f733e4, key=key@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:94
>> #1 0xb76d0bef in ncs_patricia_tree_get (pTree=pTree@entry=0x8f733e4, pKey=pKey@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:434
>> #2 0xb7738493 in cpa_lcl_ckpt_node_get (lcl_ckpt_tree=lcl_ckpt_tree@entry=0x8f733e4, lc_hdl=lc_hdl@entry=0xbfa0cdf8, lc_node=lc_node@entry=0xbfa0ce10) at cpa_db.c:195
>> #3 0xb7734d76 in saCkptCheckpointWrite (checkpointHandle=150466120, ioVector=0x92c6d28, numberOfElements=1320, erroneousVectorIndex=erroneousVectorIndex@entry=0xbfa0d35c) at cpa_api.c:3134
>>
>> (gdb) p pNode
>> $2 = (NCS_PATRICIA_NODE *) 0x5e
>> (gdb) p *pTree
>> $4 = {root_node = {bit = -1, left = 0x8f7e9c0, right = 0x8f733e4, key_info = 0x8f734b8 ""}, params = {key_size = 8, info_size = 0, actual_key_size = 0, node_size = 0}, n_nodes = 3}
>>
>> Sometimes the application exits with the message below:
>>
>> Feb 19 15:13:31 controller2 RIB[28395]: MDTM:socket_recv() = 0, conn lost with dh server, exiting library err:0 len:0
>> Feb 19 15:13:31 controller2 osafamfnd[28110]: NO 'safSu=SU1,safSg=zebos-simplex,safApp=zebos' component restart probation timer started (timeout: 4000000000 ns)
>> Feb 19 15:13:31 controller2 osafamfnd[28110]: NO Restarting a component of 'safSu=SU1,safSg=zebos-simplex,safApp=zebos' (comp restart count: 1)
>>
>> Below is the modified code snippet from the file
>> osaf/libs/core/mds/mds_dt_trans.c:
>>
>>     } else if (2 == recd_bytes) {
>>         uint16_t local_len_buf = 0;
>>
>>         data = tcp_cb->len_buff;
>>         local_len_buf = ncs_decode_16bit(&data);
>>         tcp_cb->buff_total_len = local_len_buf;
>>         tcp_cb->num_by_read_for_len_buff = 2;
>>
>>         if (NULL == (tcp_cb->buffer = calloc(1, (local_len_buf + 1)))) {
>>             /* Length + 2 is done to reuse the same buffer
>>                while sending to other nodes */
>>             syslog(LOG_ERR, "Memory allocation failed in dtm_intranode_processing");
>>             return;
>>         }
>>         recd_bytes = recv(tcp_cb->DBSRsock, tcp_cb->buffer, local_len_buf, 0);
>>         if (recd_bytes < 0) {
>>             return;
>>         } else if (0 == recd_bytes) {
>>             syslog(LOG_ERR, "MDTM:socket_recv() = %d, conn lost with dh server, exiting library err:%d len:%d", recd_bytes, errno, local_len_buf);
>>             close(tcp_cb->DBSRsock);
>>             exit(0);    <<<<<<<EXITS HERE>>>>>>>>>>
>>         } else if (local_len_buf > recd_bytes) {
>>             /* can happen only in two cases, system call interrupt or half data, */
>>             TRACE("less data recd, recd bytes = %d, actual len = %d", recd_bytes, local_len_buf);
>>             tcp_cb->bytes_tb_read = tcp_cb->buff_total_len - recd_bytes;
>>             return;
>>
>> local_len_buf turns out to be 0; this causes recv() to return 0 and
>> the application exits. Is this a programming bug?
>>
>> Could someone please help to resolve these issues?
>>
>> Regards,
>> Girish
>
> _______________________________________________
> Opensaf-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/opensaf-users
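
The ambiguity discussed in this thread is that recv() returns 0 both when the peer has closed a stream socket and (per the Linux recv(2) man page) when the requested length was 0, so a decoded length of 0 can look like a lost connection. Below is a minimal sketch of a length-prefixed reader that keeps the two cases apart by never calling recv() with a length of 0. It is not the OpenSAF code or the fix for ticket #1285; the names frame_status, read_exact, read_frame and demo are hypothetical, and it assumes a 2-byte big-endian length header on a blocking stream socket.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes; these names are not part of OpenSAF. */
enum frame_status { FRAME_DATA, FRAME_EMPTY, FRAME_CLOSED, FRAME_ERROR };

/* Read exactly len bytes (len must be > 0).
 * Returns 1 on success, 0 if the peer closed first, -1 on error. */
static int read_exact(int fd, void *buf, size_t len)
{
    uint8_t *p = buf;

    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n == 0)
            return 0;          /* len > 0, so 0 really means end of stream */
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted by a signal: retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 1;
}

/* Read one frame: 2-byte big-endian length, then the payload.
 * On FRAME_DATA the caller owns *payload and must free() it. */
static enum frame_status read_frame(int fd, uint8_t **payload, uint16_t *len)
{
    uint8_t hdr[2];
    int rc = read_exact(fd, hdr, sizeof hdr);

    *payload = NULL;
    *len = 0;
    if (rc == 0)
        return FRAME_CLOSED;
    if (rc < 0)
        return FRAME_ERROR;

    *len = (uint16_t)((hdr[0] << 8) | hdr[1]);
    if (*len == 0)
        return FRAME_EMPTY;    /* empty frame: do not call recv() with 0 */

    *payload = malloc(*len);
    if (*payload == NULL)
        return FRAME_ERROR;

    rc = read_exact(fd, *payload, *len);
    if (rc <= 0) {
        free(*payload);
        *payload = NULL;
        return rc == 0 ? FRAME_CLOSED : FRAME_ERROR;
    }
    return FRAME_DATA;
}

/* Demo over a local socket pair: an empty frame, a 3-byte frame,
 * then the sender closes its end. */
static void demo(enum frame_status out[3])
{
    const uint8_t wire[] = { 0, 0, 0, 3, 'a', 'b', 'c' };
    int sv[2];
    int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    assert(rc == 0);
    (void)rc;

    ssize_t sent = send(sv[0], wire, sizeof wire, 0);
    assert(sent == (ssize_t)sizeof wire);
    (void)sent;
    close(sv[0]);

    for (int i = 0; i < 3; i++) {
        uint8_t *payload;
        uint16_t len;

        out[i] = read_frame(sv[1], &payload, &len);
        free(payload);         /* free(NULL) is a no-op */
    }
    close(sv[1]);
}
```

Calling demo() fills the array with one status per frame read. Whether an empty frame should be skipped, logged or rejected is a protocol decision for MDS; as noted above, simply returning on a zero length caused other issues in the real code, so this sketch only illustrates how the two meanings of a 0 return can be separated.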
