Can someone please check this issue?


*From:* Girish Nagaraj [mailto:[email protected]]
*Sent:* Thursday, February 19, 2015 3:42 PM
*To:* '[email protected]'
*Subject:* Issues with CPSv



Hi,



*Background*:

OpenSAF version: 4.5

Number of checkpoints used: 2

In our application we use CPSv to save application data. When the
application faults, it is restarted and its state is restored by
reading the data back from the checkpoints.

Model: Simplex



*Issue faced:*

  The application sometimes crashes with the stack trace below:



Program received signal SIGSEGV, Segmentation fault.
search (pTree=pTree@entry=0x8f733e4, key=key@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:94
94      patricia.c: No such file or directory.
(gdb) bt
#0  search (pTree=pTree@entry=0x8f733e4, key=key@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:94
#1  0xb76d0bef in ncs_patricia_tree_get (pTree=pTree@entry=0x8f733e4, pKey=pKey@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:434
#2  0xb7738493 in cpa_lcl_ckpt_node_get (lcl_ckpt_tree=lcl_ckpt_tree@entry=0x8f733e4, lc_hdl=lc_hdl@entry=0xbfa0cdf8, lc_node=lc_node@entry=0xbfa0ce10)
    at cpa_db.c:195
#3  0xb7734d76 in saCkptCheckpointWrite (checkpointHandle=150466120, ioVector=0x92c6d28, numberOfElements=1320,
    erroneousVectorIndex=erroneousVectorIndex@entry=0xbfa0d35c) at cpa_api.c:3134

(gdb) p pNode
$2 = (NCS_PATRICIA_NODE *) 0x5e
(gdb) p *pTree
$4 = {root_node = {bit = -1, left = 0x8f7e9c0, right = 0x8f733e4, key_info = 0x8f734b8 ""}, params = {key_size = 8, info_size = 0, actual_key_size = 0,
    node_size = 0}, n_nodes = 3}



  Sometimes the application exits with the message below:



Feb 19 15:13:31 controller2 RIB[28395]: MDTM:socket_recv() = 0, conn lost with dh server, exiting library err:0 len:0
Feb 19 15:13:31 controller2 osafamfnd[28110]: NO 'safSu=SU1,safSg=zebos-simplex,safApp=zebos' component restart probation timer started (timeout: 4000000000 ns)
Feb 19 15:13:31 controller2 osafamfnd[28110]: NO Restarting a component of 'safSu=SU1,safSg=zebos-simplex,safApp=zebos' (comp restart count: 1)





Below is the modified code snippet from file
osaf/libs/core/mds/mds_dt_trans.c



} else if (2 == recd_bytes) {
        uint16_t local_len_buf = 0;

        data = tcp_cb->len_buff;
        local_len_buf = ncs_decode_16bit(&data);
        tcp_cb->buff_total_len = local_len_buf;
        tcp_cb->num_by_read_for_len_buff = 2;

        if (NULL == (tcp_cb->buffer = calloc(1, (local_len_buf + 1)))) {
                /* Length + 2 is done to reuse the same buffer
                   while sending to other nodes */
                syslog(LOG_ERR, "Memory allocation failed in dtm_intranode_processing");
                return;
        }
        recd_bytes = recv(tcp_cb->DBSRsock, tcp_cb->buffer, local_len_buf, 0);
        if (recd_bytes < 0) {
                return;
        } else if (0 == recd_bytes) {
                syslog(LOG_ERR, "MDTM:socket_recv() = %d, conn lost with dh server, exiting library err:%d len:%d", recd_bytes, errno, local_len_buf);
                close(tcp_cb->DBSRsock);
                exit(0);        /* <<<<<<< EXITS HERE >>>>>>> */
        } else if (local_len_buf > recd_bytes) {
                /* can happen only in two cases, system call interrupt or half data, */
                TRACE("less data recd, recd bytes = %d, actual len = %d", recd_bytes,
                       local_len_buf);
                tcp_cb->bytes_tb_read = tcp_cb->buff_total_len - recd_bytes;
                return;



local_len_buf turns out to be 0 (see len:0 in the log above). This causes
recv() to return 0, which the code treats as a lost connection, so the
application exits. Is this a programming bug?
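
To illustrate the ambiguity in the question above, here is a minimal sketch in plain POSIX C. It is not OpenSAF code. The names payload_status, zero_length_recv_demo and read_payload are hypothetical and exist only for this example. The first function shows that, at least on Linux, recv() with a zero-byte request on a connected stream socket with data queued returns 0 even though the peer is still there. The second shows one possible way to keep "zero-length payload" apart from "peer closed" by checking the decoded length before calling recv(). Whether a zero length is legal on this socket at all is a separate question that the sketch does not answer.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not part of OpenSAF. */
enum payload_status {
    PAYLOAD_OK = 0,       /* all payload_len bytes were read */
    PAYLOAD_EMPTY,        /* header announced a zero-length payload */
    PAYLOAD_PEER_CLOSED,  /* recv() returned 0 for a non-zero request */
    PAYLOAD_PARTIAL,      /* fewer bytes than announced have arrived */
    PAYLOAD_ERROR         /* recv() failed; inspect errno */
};

/* Demonstrates the ambiguity: a zero-byte recv() on a live stream socket
 * that has data queued returns 0, the same value that signals a closed
 * peer. Returns the recv() result, or -1 if the socket setup fails. */
static ssize_t zero_length_recv_demo(void)
{
    int sv[2];
    char byte = 0;
    ssize_t n;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    if (send(sv[1], "x", 1, 0) != 1) {  /* queue one byte for sv[0] */
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    /* Request 0 bytes; MSG_DONTWAIT guards against blocking. */
    n = recv(sv[0], &byte, 0, MSG_DONTWAIT);
    close(sv[0]);
    close(sv[1]);
    return n;
}

/* Reads a payload whose length was already decoded from the header.
 * A zero length is reported as PAYLOAD_EMPTY without calling recv(),
 * so a 0 return from recv() can only mean that the peer closed. */
static enum payload_status read_payload(int fd, void *buf,
                                        uint16_t payload_len, ssize_t *got)
{
    ssize_t n;

    *got = 0;
    if (payload_len == 0)
        return PAYLOAD_EMPTY;

    n = recv(fd, buf, payload_len, 0);
    if (n < 0)
        return PAYLOAD_ERROR;
    if (n == 0)
        return PAYLOAD_PEER_CLOSED;

    *got = n;
    return ((size_t)n < payload_len) ? PAYLOAD_PARTIAL : PAYLOAD_OK;
}
```

With a guard of this kind the caller can decide separately what an empty message means (ignore it, log it, or treat it as a protocol error) instead of taking the connection-lost path and calling exit().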



Could someone please help resolve these issues?



Regards,

Girish
