Hi,

Please raise a ticket for this crash and attach the traces of CPND and CPA (your application). Also, please specify a test case, or describe what the application is doing and at what point the crash occurs.
Thanks,
Mathi.

----- [email protected] wrote:

> Hi,
>
> I don't get this issue with OpenSAF version 4.3, but I do get a segfault:
>
> application sometimes crashes, stack trace as below:
>
> Program received signal SIGSEGV, Segmentation fault.
> search (pTree=pTree@entry=0x8f733e4, key=key@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:94
> 94      patricia.c: No such file or directory.
> (gdb) bt
> #0  search (pTree=pTree@entry=0x8f733e4, key=key@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:94
> #1  0xb76d0bef in ncs_patricia_tree_get (pTree=pTree@entry=0x8f733e4, pKey=pKey@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:434
> #2  0xb7738493 in cpa_lcl_ckpt_node_get (lcl_ckpt_tree=lcl_ckpt_tree@entry=0x8f733e4, lc_hdl=lc_hdl@entry=0xbfa0cdf8, lc_node=lc_node@entry=0xbfa0ce10) at cpa_db.c:195
> #3  0xb7734d76 in saCkptCheckpointWrite (checkpointHandle=150466120, ioVector=0x92c6d28, numberOfElements=1320, erroneousVectorIndex=erroneousVectorIndex@entry=0xbfa0d35c) at cpa_api.c:3134
>
> (gdb) p pNode
> $2 = (NCS_PATRICIA_NODE *) 0x5e
> (gdb) p *pTree
> $4 = {root_node = {bit = -1, left = 0x8f7e9c0, right = 0x8f733e4, key_info = 0x8f734b8 ""}, params = {key_size = 8, info_size = 0, actual_key_size = 0, node_size = 0}, n_nodes = 3}
>
> Regards,
> Girish
>
> From: Girish Nagaraj [mailto:[email protected]]
> Sent: Friday, February 20, 2015 3:34 PM
> To: 'A V Mahesh'; '[email protected]'
> Subject: RE: [users] Issues with CPSv
>
> Hi,
>
> Yes, there is a similar issue with TCP as well. The application exits with this message:
>
> Feb 20 15:24:59 fedvm1 RIB[28549]: MDTM:socket_recv() = 0, conn lost with dh server, exiting library err :Success
> Feb 20 15:24:59 fedvm1 osafamfnd[28263]: NO 'safSu=SU1,safSg=zebos-simplex,safApp=zebos' component restart probation timer started (timeout: 4000000000 ns)
> Feb 20 15:24:59 fedvm1 osafamfnd[28263]: NO Restarting a component of 'safSu=SU1,safSg=zebos-simplex,safApp=zebos' (comp restart count: 1)
> Feb 20 15:24:59 fedvm1 osafamfnd[28263]: NO 'safComp=ribd,safSu=SU1,safSg=zebos-simplex,safApp=zebos' faulted due to 'avaDown' : Recovery is 'componentRestart'
>
> I experimented with code changes:
>
>     recd_bytes = recv(tcp_cb->DBSRsock, tcp_cb->len_buff, 2, MSG_NOSIGNAL);
>     if (0 == recd_bytes) {
>         syslog(LOG_ERR, "MDTM:socket_recv() = %d, conn lost with dh server, exiting library err 111:%d", recd_bytes, errno);
>         close(tcp_cb->DBSRsock);
>         exit(0);
>     } else if (2 == recd_bytes) {
>         uint16_t local_len_buf = 0;
>
>         data = tcp_cb->len_buff;
>         local_len_buf = ncs_decode_16bit(&data);
>
>         /* MY CHANGE START */
>         if (0 == local_len_buf)
>             return;
>         /* MY CHANGE END */
>
>         tcp_cb->buff_total_len = local_len_buf;
>         tcp_cb->num_by_read_for_len_buff = 2;
>
>         if (NULL == (tcp_cb->buffer = calloc(1, (local_len_buf + 1)))) {
>             /* Length + 2 is done to reuse the same buffer
>                while sending to other nodes */
>             syslog(LOG_ERR, "Memory allocation failed in dtm_intranode_processing");
>             return;
>         }
>         recd_bytes = recv(tcp_cb->DBSRsock, tcp_cb->buffer, local_len_buf, 0);
>         if (recd_bytes < 0) {
>             return;
>         } else if (0 == recd_bytes) {
>             syslog(LOG_ERR, "MDTM:socket_recv() = %d, conn lost with dh server, exiting library err 222:%d len:%d", recd_bytes, errno, local_len_buf);
>             close(tcp_cb->DBSRsock);
>             exit(0);
>
> This caused many other issues, so I think just returning won't work.
>
> Regards,
> Girish
>
> -----Original Message-----
> From: A V Mahesh [mailto:[email protected]]
> Sent: Friday, February 20, 2015 1:38 PM
> To: Girish Nagaraj; [email protected]
> Subject: Re: [users] Issues with CPSv
>
> > Hi,
> >
> > On 2/20/2015 1:19 PM, Girish Nagaraj wrote:
> > > Hi,
> > >
> > > I think this is not a connection loss. We are passing 0 (the number of bytes to be read) to recv(), which returns 0 received bytes.
> >
> > You mean, you are seeing an issue similar to `TIPC ticket #1227 mds/tipc : protect mds application form zero bytes hacking messages` for TCP as well?
> >
> > -AVM
> >
> > > local_len_buf = ncs_decode_16bit(&data);
> > >
> > > Is there a mistake in decoding local_len_buf?
> > >
> > > Regards,
> > > Girish
> > >
> > > -----Original Message-----
> > > From: A V Mahesh [mailto:[email protected]]
> > > Sent: Friday, February 20, 2015 11:03 AM
> > > To: [email protected]
> > > Subject: Re: [users] Issues with CPSv
> > >
> > > Hi,
> > >
> > > On 2/19/2015 3:42 PM, Girish Nagaraj wrote:
> > >> local_len_buf turns out to be 0, this causes recv() to return 0 and
> > >> the application exits. Is this a programming bug?
> > > This is expected behavior: if the connection is lost, a TCP socket
> > > receives zero bytes. This is not related to CPSv.
> > >
> > > -AVM
> > >
> > > On 2/19/2015 3:42 PM, Girish Nagaraj wrote:
> > >> Hi,
> > >>
> > >> Background:
> > >>
> > >> OpenSAF version: 4.5
> > >> Number of checkpoints used: 2
> > >> In our application we use CPSv to save application data. When the
> > >> application faults, it is restarted and its state is restored by
> > >> reading data from the checkpoints.
> > >> Model: Simplex
> > >>
> > >> Issue faced:
> > >>
> > >> application sometimes crashes, stack trace as below:
> > >>
> > >> Program received signal SIGSEGV, Segmentation fault.
> > >> search (pTree=pTree@entry=0x8f733e4, key=key@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:94
> > >> 94      patricia.c: No such file or directory.
> > >> (gdb) bt
> > >> #0  search (pTree=pTree@entry=0x8f733e4, key=key@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:94
> > >> #1  0xb76d0bef in ncs_patricia_tree_get (pTree=pTree@entry=0x8f733e4, pKey=pKey@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:434
> > >> #2  0xb7738493 in cpa_lcl_ckpt_node_get (lcl_ckpt_tree=lcl_ckpt_tree@entry=0x8f733e4, lc_hdl=lc_hdl@entry=0xbfa0cdf8, lc_node=lc_node@entry=0xbfa0ce10) at cpa_db.c:195
> > >> #3  0xb7734d76 in saCkptCheckpointWrite (checkpointHandle=150466120, ioVector=0x92c6d28, numberOfElements=1320, erroneousVectorIndex=erroneousVectorIndex@entry=0xbfa0d35c) at cpa_api.c:3134
> > >>
> > >> (gdb) p pNode
> > >> $2 = (NCS_PATRICIA_NODE *) 0x5e
> > >> (gdb) p *pTree
> > >> $4 = {root_node = {bit = -1, left = 0x8f7e9c0, right = 0x8f733e4, key_info = 0x8f734b8 ""}, params = {key_size = 8, info_size = 0, actual_key_size = 0, node_size = 0}, n_nodes = 3}
> > >>
> > >> sometimes application exits with below message:
> > >>
> > >> Feb 19 15:13:31 controller2 RIB[28395]: MDTM:socket_recv() = 0, conn lost with dh server, exiting library err:0 len:0
> > >> Feb 19 15:13:31 controller2 osafamfnd[28110]: NO 'safSu=SU1,safSg=zebos-simplex,safApp=zebos' component restart probation timer started (timeout: 4000000000 ns)
> > >> Feb 19 15:13:31 controller2 osafamfnd[28110]: NO Restarting a component of 'safSu=SU1,safSg=zebos-simplex,safApp=zebos' (comp restart count: 1)
> > >>
> > >> Below is the modified code snippet from file
> > >> osaf/libs/core/mds/mds_dt_trans.c
> > >>
> > >>     } else if (2 == recd_bytes) {
> > >>         uint16_t local_len_buf = 0;
> > >>
> > >>         data = tcp_cb->len_buff;
> > >>         local_len_buf = ncs_decode_16bit(&data);
> > >>         tcp_cb->buff_total_len = local_len_buf;
> > >>         tcp_cb->num_by_read_for_len_buff = 2;
> > >>
> > >>         if (NULL == (tcp_cb->buffer = calloc(1, (local_len_buf + 1)))) {
> > >>             /* Length + 2 is done to reuse the same buffer
> > >>                while sending to other nodes */
> > >>             syslog(LOG_ERR, "Memory allocation failed in dtm_intranode_processing");
> > >>             return;
> > >>         }
> > >>         recd_bytes = recv(tcp_cb->DBSRsock, tcp_cb->buffer, local_len_buf, 0);
> > >>         if (recd_bytes < 0) {
> > >>             return;
> > >>         } else if (0 == recd_bytes) {
> > >>             syslog(LOG_ERR, "MDTM:socket_recv() = %d, conn lost with dh server, exiting library err:%d len:%d", recd_bytes, errno, local_len_buf);
> > >>             close(tcp_cb->DBSRsock);
> > >>             exit(0);    <<<<<<<EXITS HERE>>>>>>>>>>
> > >>         } else if (local_len_buf > recd_bytes) {
> > >>             /* can happen only in two cases, system call interrupt or half data, */
> > >>             TRACE("less data recd, recd bytes = %d, actual len = %d", recd_bytes, local_len_buf);
> > >>             tcp_cb->bytes_tb_read = tcp_cb->buff_total_len - recd_bytes;
> > >>             return;
> > >>
> > >> local_len_buf turns out to be 0, this causes recv() to return 0 and
> > >> the application exits. Is this a programming bug?
> > >>
> > >> Could someone please help to resolve these issues?
> > >>
> > >> Regards,
> > >> Girish

_______________________________________________
Opensaf-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-users
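
The ambiguity discussed in the thread (recv() returning 0 both when the peer has closed the connection and when the caller asked for 0 bytes) can be illustrated with a minimal sketch. This is not the MDS code and not a proposed patch. The names `decode_be16`, `frame_status` and `classify_payload_read` are hypothetical, and the big-endian length header is an assumption about the wire format rather than something stated in the thread. The Linux recv(2) manual page notes that 0 may also be returned when the requested number of bytes on a stream socket was 0, which is why the sketch checks the announced length before interpreting the return value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Possible outcomes of one payload read (hypothetical names). */
enum frame_status {
    FRAME_EMPTY,       /* header announced 0 bytes: nothing to read */
    FRAME_COMPLETE,    /* all announced bytes arrived */
    FRAME_PARTIAL,     /* fewer bytes than announced arrived */
    FRAME_PEER_CLOSED, /* bytes were requested but recv() returned 0 */
    FRAME_ERROR        /* recv() returned a negative value */
};

/* Decode a 2-byte length header.
 * ASSUMPTION: big-endian byte order; this is a stand-in for illustration,
 * not a reimplementation of ncs_decode_16bit(). */
static uint16_t decode_be16(const uint8_t *p)
{
    assert(p != NULL);
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

/* Interpret the return value of recv() for the payload, given the length
 * announced by the header. The zero-length case is handled first, because
 * a 0 return for a 0-byte request says nothing about the connection. */
static enum frame_status classify_payload_read(uint16_t announced_len,
                                               ssize_t recd_bytes)
{
    if (announced_len == 0)
        return FRAME_EMPTY;
    if (recd_bytes < 0)
        return FRAME_ERROR;
    if (recd_bytes == 0)
        return FRAME_PEER_CLOSED;
    if ((size_t)recd_bytes < (size_t)announced_len)
        return FRAME_PARTIAL;
    return FRAME_COMPLETE;
}
```

With this split, a caller would treat only `FRAME_PEER_CLOSED` as a lost connection. Whether a zero-length header is legal for MDS at all, and what the receiver state machine should do after `FRAME_EMPTY`, is not settled in this thread; the experiment above that simply returned on a zero length caused other issues.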
