Hi,

On 2/20/2015 1:19 PM, Girish Nagaraj wrote:
> Hi,
>
> I think this is not a connection loss: we are passing 0 (the number of bytes
> to be read) to the recv() function, which returns 0 received bytes.
Do you mean you are seeing an issue similar to `TIPC ticket #1227 mds/tipc : protect mds application form zero bytes hacking messages`, but for TCP as well?

-AVM

>
> local_len_buf = ncs_decode_16bit(&data);
>
> Is there a mistake in decoding local_len_buf?
>
> Regards,
> Girish
>
> -----Original Message-----
> From: A V Mahesh [mailto:[email protected]]
> Sent: Friday, February 20, 2015 11:03 AM
> To: [email protected]
> Subject: Re: [users] Issues with CPSv
>
> Hi,
>
> On 2/19/2015 3:42 PM, Girish Nagaraj wrote:
>> local_len_buf turns out be 0, this causes recv() to return 0 and
>> application exits. Is this programming bug??
> This is expected behavior: if a connection loss happens on a TCP socket,
> the socket receives zero bytes. This is not related to CPSv.
>
> -AVM
>
>
> On 2/19/2015 3:42 PM, Girish Nagaraj wrote:
>> Hi,
>>
>> *Background*:
>>
>> Opensaf version: 4.5
>>
>> Number of checkpoints used: 2
>>
>> In our application we use CPSv to save application data. When the
>> application faults, it is restarted and its state is restored by
>> reading data from the checkpoints.
>>
>> Model: Simplex
>>
>> *Issue faced:*
>>
>> The application sometimes crashes, with the stack trace below:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> search (pTree=pTree@entry=0x8f733e4, key=key@entry=0xbfa0cdf8
>> "H\356\367\b") at patricia.c:94
>> 94 patricia.c: No such file or directory.
>>
>> (gdb) bt
>> #0 search (pTree=pTree@entry=0x8f733e4, key=key@entry=0xbfa0cdf8
>> "H\356\367\b") at patricia.c:94
>> #1 0xb76d0bef in ncs_patricia_tree_get (pTree=pTree@entry=0x8f733e4,
>> pKey=pKey@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:434
>> #2 0xb7738493 in cpa_lcl_ckpt_node_get
>> (lcl_ckpt_tree=lcl_ckpt_tree@entry=0x8f733e4,
>> lc_hdl=lc_hdl@entry=0xbfa0cdf8, lc_node=lc_node@entry=0xbfa0ce10)
>> at cpa_db.c:195
>> #3 0xb7734d76 in saCkptCheckpointWrite (checkpointHandle=150466120,
>> ioVector=0x92c6d28, numberOfElements=1320,
>> erroneousVectorIndex=erroneousVectorIndex@entry=0xbfa0d35c) at
>> cpa_api.c:3134
>>
>> (gdb) p pNode
>> $2 = (NCS_PATRICIA_NODE *) 0x5e
>> (gdb) p *pTree
>> $4 = {root_node = {bit = -1, left = 0x8f7e9c0, right = 0x8f733e4,
>> key_info = 0x8f734b8 ""}, params = {key_size = 8, info_size = 0,
>> actual_key_size = 0,
>> node_size = 0}, n_nodes = 3}
>>
>> sometimes application exits with below message:
>>
>> Feb 19 15:13:31 controller2 RIB[28395]: MDTM:socket_recv() = 0, conn
>> lost with dh server, exiting library err:0 len:0
>> Feb 19 15:13:31 controller2 osafamfnd[28110]: NO
>> 'safSu=SU1,safSg=zebos-simplex,safApp=zebos' component restart
>> probation timer started (timeout: 4000000000 ns)
>> Feb 19 15:13:31 controller2 osafamfnd[28110]: NO Restarting a
>> component of 'safSu=SU1,safSg=zebos-simplex,safApp=zebos' (comp
>> restart count: 1)
>>
>> Below is the modified code snippet from file
>> osaf/libs/core/mds/mds_dt_trans.c
>>
>> } else if (2 == recd_bytes) {
>>     uint16_t local_len_buf = 0;
>>
>>     data = tcp_cb->len_buff;
>>     local_len_buf = ncs_decode_16bit(&data);
>>     tcp_cb->buff_total_len = local_len_buf;
>>     tcp_cb->num_by_read_for_len_buff = 2;
>>
>>     if (NULL == (tcp_cb->buffer = calloc(1, (local_len_buf + 1)))) {
>>         /* Length + 2 is done to reuse the same buffer
>>            while sending to other nodes */
>>         syslog(LOG_ERR, "Memory allocation failed in dtm_intranode_processing");
>>         return;
>>     }
>>     recd_bytes = recv(tcp_cb->DBSRsock, tcp_cb->buffer, local_len_buf, 0);
>>     if (recd_bytes < 0) {
>>         return;
>>     } else if (0 == recd_bytes) {
>>         syslog(LOG_ERR, "MDTM:socket_recv() = %d, conn lost with dh server, exiting library err:%d len:%d", recd_bytes, errno, local_len_buf);
>>         close(tcp_cb->DBSRsock);
>>         exit(0); *<<<<<<<EXITS HERE>>>>>>>>>>*
>>     } else if (local_len_buf > recd_bytes) {
>>         /* can happen only in two cases, system call interrupt or half data, */
>>         TRACE("less data recd, recd bytes = %d, actual len = %d", recd_bytes,
>>               local_len_buf);
>>         tcp_cb->bytes_tb_read = tcp_cb->buff_total_len - recd_bytes;
>>         return;
>>
>> local_len_buf turns out be 0, this causes recv() to return 0 and
>> application exits. Is this programming bug??
>>
>> Could someone please help to resolve these issues.
>>
>> Regards,
>> Girish
>>
>
> ------------------------------------------------------------------------------
> Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server from
> Actuate! Instantly Supercharge Your Business Reports and Dashboards with
> Interactivity, Sharing, Native Excel Exports, App Integration & more Get
> technology previously reserved for billion-dollar corporations, FREE
> http://pubads.g.doubleclick.net/gampad/clk?id=190641631&iu=/4140/ostg.clktrk
> _______________________________________________
> Opensaf-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/opensaf-users
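
A note on the zero-length case raised in this thread: the Linux recv(2) man page states that 0 may also be returned when the requested number of bytes on a stream socket is 0, so a log line with "= 0 ... len:0" does not by itself show that the peer closed the connection. Below is a minimal sketch of a length-prefixed read that never calls recv() with a zero length. It is not the OpenSAF code and not a proposed patch. The names frame_status, decode_len16, read_exact and read_frame are hypothetical, and the 2-byte big-endian prefix is an assumption about the wire format.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not part of OpenSAF. */
enum frame_status { FRAME_OK, FRAME_EMPTY, FRAME_PEER_CLOSED, FRAME_ERROR };

/* Decode a 2-byte length prefix, assuming big-endian (network) order. */
static uint16_t decode_len16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Read exactly len bytes (len > 0), retrying on EINTR and short reads. */
static enum frame_status read_exact(int fd, uint8_t *buf, size_t len)
{
    size_t got = 0;

    assert(buf != NULL);
    while (got < len) {
        ssize_t n = recv(fd, buf + got, len - got, 0);
        if (n == 0) {
            /* The requested length is > 0 here, so 0 means end of stream. */
            return FRAME_PEER_CLOSED;
        }
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return FRAME_ERROR;
        }
        got += (size_t)n;
    }
    return FRAME_OK;
}

/* Read one length-prefixed frame into buf, which holds up to cap bytes. */
static enum frame_status read_frame(int fd, uint8_t *buf, size_t cap,
                                    uint16_t *out_len)
{
    uint8_t hdr[2];
    enum frame_status st = read_exact(fd, hdr, sizeof hdr);

    if (st != FRAME_OK)
        return st;

    *out_len = decode_len16(hdr);
    if (*out_len == 0) {
        /* Zero-length frame: report it instead of calling recv() with 0. */
        return FRAME_EMPTY;
    }
    if (*out_len > cap)
        return FRAME_ERROR;

    return read_exact(fd, buf, *out_len);
}
```

A caller would treat FRAME_EMPTY as a protocol-level event (log it, skip it, or reject it) and reserve the "connection lost" path for FRAME_PEER_CLOSED. Whether a zero length is ever legitimate on this socket is a separate question that the sketch does not answer.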
