Can someone please check this issue?

*From:* Girish Nagaraj [mailto:[email protected]]
*Sent:* Thursday, February 19, 2015 3:42 PM
*To:* '[email protected]'
*Subject:* Issues with CPSv

Hi,

*Background*:
OpenSAF version: 4.5
Number of checkpoints used: 2

In our application we use CPSv to save application data. When the application faults, it is restarted and its state is restored by reading the data back from the checkpoints.

Model: Simplex

*Issue faced:*
The application sometimes crashes, with the stack trace below:

    Program received signal SIGSEGV, Segmentation fault.
    search (pTree=pTree@entry=0x8f733e4, key=key@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:94
    94      patricia.c: No such file or directory.
    (gdb) bt
    #0  search (pTree=pTree@entry=0x8f733e4, key=key@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:94
    #1  0xb76d0bef in ncs_patricia_tree_get (pTree=pTree@entry=0x8f733e4, pKey=pKey@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:434
    #2  0xb7738493 in cpa_lcl_ckpt_node_get (lcl_ckpt_tree=lcl_ckpt_tree@entry=0x8f733e4, lc_hdl=lc_hdl@entry=0xbfa0cdf8, lc_node=lc_node@entry=0xbfa0ce10) at cpa_db.c:195
    #3  0xb7734d76 in saCkptCheckpointWrite (checkpointHandle=150466120, ioVector=0x92c6d28, numberOfElements=1320, erroneousVectorIndex=erroneousVectorIndex@entry=0xbfa0d35c) at cpa_api.c:3134
    (gdb) p pNode
    $2 = (NCS_PATRICIA_NODE *) 0x5e
    (gdb) p *pTree
    $4 = {root_node = {bit = -1, left = 0x8f7e9c0, right = 0x8f733e4, key_info = 0x8f734b8 ""}, params = {key_size = 8, info_size = 0, actual_key_size = 0, node_size = 0}, n_nodes = 3}

Sometimes the application exits with the message below:

    Feb 19 15:13:31 controller2 RIB[28395]: MDTM:socket_recv() = 0, conn lost with dh server, exiting library err:0 len:0
    Feb 19 15:13:31 controller2 osafamfnd[28110]: NO 'safSu=SU1,safSg=zebos-simplex,safApp=zebos' component restart probation timer started (timeout: 4000000000 ns)
    Feb 19 15:13:31 controller2 osafamfnd[28110]: NO Restarting a component of 'safSu=SU1,safSg=zebos-simplex,safApp=zebos' (comp restart count: 1)

Below is the modified code snippet from file osaf/libs/core/mds/mds_dt_trans.c:

    } else if (2 == recd_bytes) {
        uint16_t local_len_buf = 0;

        data = tcp_cb->len_buff;
        local_len_buf = ncs_decode_16bit(&data);
        tcp_cb->buff_total_len = local_len_buf;
        tcp_cb->num_by_read_for_len_buff = 2;

        if (NULL == (tcp_cb->buffer = calloc(1, (local_len_buf + 1)))) {
            /* Length + 2 is done to reuse the same buffer while sending to other nodes */
            syslog(LOG_ERR, "Memory allocation failed in dtm_intranode_processing");
            return;
        }
        recd_bytes = recv(tcp_cb->DBSRsock, tcp_cb->buffer, local_len_buf, 0);
        if (recd_bytes < 0) {
            return;
        } else if (0 == recd_bytes) {
            syslog(LOG_ERR, "MDTM:socket_recv() = %d, conn lost with dh server, exiting library err:%d len:%d", recd_bytes, errno, local_len_buf);
            close(tcp_cb->DBSRsock);
            exit(0);    *<<<<<<<EXITS HERE>>>>>>>>>>*
        } else if (local_len_buf > recd_bytes) {
            /* can happen only in two cases, system call interrupt or half data, */
            TRACE("less data recd, recd bytes = %d, actual len = %d", recd_bytes, local_len_buf);
            tcp_cb->bytes_tb_read = tcp_cb->buff_total_len - recd_bytes;
            return;

local_len_buf turns out to be 0; this causes recv() to return 0, and the application exits. Is this a programming bug?

Could someone please help to resolve these issues?

Regards,
Girish
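
A note on the zero-length case described above: on a stream socket, recv() returns 0 both when the peer has closed the connection and when the caller asks for 0 bytes, so the return value alone may not tell the two apart. The sketch below is not OpenSAF code. It assumes a POSIX system and uses an AF_UNIX socketpair() in place of the real MDS connection; zero_length_recv_on_live_socket() and classify_recv() are hypothetical names introduced only for illustration.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical classification of a recv() result (not an OpenSAF type). */
enum recv_state { RECV_ERROR, RECV_PEER_CLOSED, RECV_EMPTY_REQUEST, RECV_DATA };

/* Interpret a recv() return value together with the length that was requested.
 * A 0 return only signals "peer closed" if at least one byte was asked for. */
enum recv_state classify_recv(size_t requested, ssize_t received)
{
    if (received < 0)
        return RECV_ERROR;
    if (received == 0)
        return (requested == 0) ? RECV_EMPTY_REQUEST : RECV_PEER_CLOSED;
    return RECV_DATA;
}

/* Issue a zero-length recv() on a connection whose peer is still open and has
 * unread data queued, and return whatever recv() reports (-1 on setup failure). */
ssize_t zero_length_recv_on_live_socket(void)
{
    int sv[2];
    char buf[1];
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* The peer sends one byte and stays open, so the connection is alive. */
    if (send(sv[0], "x", 1, 0) != 1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* Request 0 bytes, as happens when the decoded length field is 0.
     * MSG_DONTWAIT keeps the call from blocking in any case. */
    n = recv(sv[1], buf, 0, MSG_DONTWAIT);

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

Calling zero_length_recv_on_live_socket() from a small test program should show recv() reporting 0 even though the peer is still open. Something like classify_recv(local_len_buf, recd_bytes) is one possible way to keep an empty payload from being treated as a lost connection; whether a zero length is valid in the MDS protocol at all is a separate question for the maintainers.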
