Hi,

I don't think this is a connection loss. We are passing 0 (the number of
bytes to read) to recv(), which then returns 0 received bytes.
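
To illustrate the point above, here is a minimal sketch (not OpenSAF code).
It assumes Linux and uses a local AF_UNIX stream socketpair in place of the
real TCP connection; demo_zero_len_recv() is a hypothetical helper name. The
peer is alive and has already sent data, yet a recv() that asks for 0 bytes
should still report 0, so a 0 return by itself does not prove the peer closed
the connection.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo helper (not part of OpenSAF).
 * Stores the result of a zero-length recv() in *zero_len_ret and the
 * result of a following normal recv() in *full_ret.
 * Returns 0 if the socket setup worked, -1 otherwise. */
static int demo_zero_len_recv(ssize_t *zero_len_ret, ssize_t *full_ret)
{
    int sv[2];
    char buf[8];

    assert(zero_len_ret != NULL && full_ret != NULL);

    /* Assumption: a connected AF_UNIX stream pair stands in for the
     * TCP socket to the server. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* Queue 4 bytes so the connection is clearly alive and readable. */
    if (send(sv[0], "ping", 4, 0) != 4) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* Ask for 0 bytes. MSG_DONTWAIT keeps the demo from ever blocking. */
    *zero_len_ret = recv(sv[1], buf, 0, MSG_DONTWAIT);

    /* The queued data is still there and can be read normally. */
    *full_ret = recv(sv[1], buf, sizeof(buf), MSG_DONTWAIT);

    close(sv[0]);
    close(sv[1]);
    return 0;
}
```

Calling demo_zero_len_recv(&z, &f) from a small main() and printing z and f
shows both results side by side; the second recv() still delivering the
queued bytes is what indicates the connection was never lost.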

     local_len_buf =  ncs_decode_16bit(&data);

Is there a mistake in decoding local_len_buf?
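
To make the question concrete, below is a rough sketch of how the two length
bytes could be checked before calling recv(). Everything in it is an
assumption for illustration: decode_be16() is a hypothetical stand-in that
assumes the length prefix is big-endian (network order), which I have not
verified against ncs_decode_16bit(), and classify_frame() is not an OpenSAF
function. I also do not know whether a zero length prefix is legal in MDS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the real decoder: reads a 16-bit value,
 * assumed to be big-endian, and advances the cursor by two bytes. */
static uint16_t decode_be16(const uint8_t **cursor)
{
    const uint8_t *p = *cursor;
    uint16_t v = (uint16_t)(((uint16_t)p[0] << 8) | p[1]);

    *cursor = p + 2;
    return v;
}

enum frame_action {
    FRAME_EMPTY,     /* length prefix is 0: do not call recv() for a body */
    FRAME_READ_BODY  /* length prefix is > 0: read that many bytes */
};

/* Hypothetical guard: decode the 2-byte prefix and decide what to do,
 * so a zero length is never mistaken for a closed connection. */
static enum frame_action classify_frame(const uint8_t len_buff[2],
                                        uint16_t *body_len)
{
    const uint8_t *cursor = len_buff;

    assert(len_buff != NULL && body_len != NULL);

    *body_len = decode_be16(&cursor);
    return (*body_len == 0) ? FRAME_EMPTY : FRAME_READ_BODY;
}
```

With a guard like this, the "conn lost" branch would only be reached when
recv() returns 0 for a non-zero request, which is the case that actually
indicates the peer closed the socket.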

Regards,
Girish

-----Original Message-----
From: A V Mahesh [mailto:[email protected]]
Sent: Friday, February 20, 2015 11:03 AM
To: [email protected]
Subject: Re: [users] Issues with CPSv

Hi,

On 2/19/2015 3:42 PM, Girish Nagaraj wrote:
> local_len_buf turns out to be 0, which causes recv() to return 0 and the
> application to exit. Is this a programming bug?
This is expected behavior: if the connection is lost on a TCP socket,
recv() returns zero bytes. This is not related to CPSv.

-AVM


On 2/19/2015 3:42 PM, Girish Nagaraj wrote:
> Hi,
>
>
>
> *Background*:
>
> Opensaf version: 4.5
>
> Number of checkpoints used: 2
>
> In our application we use CPSv to save application data. When the
> application faults, it is restarted and its state is restored by
> reading the data back from the checkpoints.
>
> Model: Simplex
>
>
>
> *Issue faced*:
>
> The application sometimes crashes with the stack trace below:
>
>
>
> Program received signal SIGSEGV, Segmentation fault.
> search (pTree=pTree@entry=0x8f733e4, key=key@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:94
> 94      patricia.c: No such file or directory.
> (gdb) bt
> #0  search (pTree=pTree@entry=0x8f733e4, key=key@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:94
> #1  0xb76d0bef in ncs_patricia_tree_get (pTree=pTree@entry=0x8f733e4, pKey=pKey@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:434
> #2  0xb7738493 in cpa_lcl_ckpt_node_get (lcl_ckpt_tree=lcl_ckpt_tree@entry=0x8f733e4, lc_hdl=lc_hdl@entry=0xbfa0cdf8, lc_node=lc_node@entry=0xbfa0ce10)
>     at cpa_db.c:195
> #3  0xb7734d76 in saCkptCheckpointWrite (checkpointHandle=150466120, ioVector=0x92c6d28, numberOfElements=1320,
>     erroneousVectorIndex=erroneousVectorIndex@entry=0xbfa0d35c) at cpa_api.c:3134
>
> (gdb) p pNode
> $2 = (NCS_PATRICIA_NODE *) 0x5e
> (gdb) p *pTree
> $4 = {root_node = {bit = -1, left = 0x8f7e9c0, right = 0x8f733e4, key_info = 0x8f734b8 ""}, params = {key_size = 8, info_size = 0, actual_key_size = 0,
>     node_size = 0}, n_nodes = 3}
>
>
>
> Sometimes the application exits with the message below:
>
>
>
> Feb 19 15:13:31 controller2 RIB[28395]: MDTM:socket_recv() = 0, conn lost with dh server, exiting library err:0 len:0
> Feb 19 15:13:31 controller2 osafamfnd[28110]: NO 'safSu=SU1,safSg=zebos-simplex,safApp=zebos' component restart probation timer started (timeout: 4000000000 ns)
> Feb 19 15:13:31 controller2 osafamfnd[28110]: NO Restarting a component of 'safSu=SU1,safSg=zebos-simplex,safApp=zebos' (comp restart count: 1)
>
>
>
>
>
> Below is the modified code snippet from file
> osaf/libs/core/mds/mds_dt_trans.c
>
>
>
> } else if (2 == recd_bytes) {
>         uint16_t local_len_buf = 0;
>
>         data = tcp_cb->len_buff;
>         local_len_buf = ncs_decode_16bit(&data);
>         tcp_cb->buff_total_len = local_len_buf;
>         tcp_cb->num_by_read_for_len_buff = 2;
>
>         if (NULL == (tcp_cb->buffer = calloc(1, (local_len_buf + 1)))) {
>                 /* Length + 2 is done to reuse the same buffer
>                    while sending to other nodes */
>                 syslog(LOG_ERR, "Memory allocation failed in dtm_intranode_processing");
>                 return;
>         }
>         recd_bytes = recv(tcp_cb->DBSRsock, tcp_cb->buffer, local_len_buf, 0);
>         if (recd_bytes < 0) {
>                 return;
>         } else if (0 == recd_bytes) {
>                 syslog(LOG_ERR, "MDTM:socket_recv() = %d, conn lost with dh server, exiting library err:%d len:%d", recd_bytes, errno, local_len_buf);
>                 close(tcp_cb->DBSRsock);
>                 exit(0); *<<<<<<<EXITS HERE>>>>>>>>>>*
>         } else if (local_len_buf > recd_bytes) {
>                 /* can happen only in two cases, system call interrupt or half data, */
>                 TRACE("less data recd, recd bytes = %d, actual len = %d", recd_bytes,
>                        local_len_buf);
>                 tcp_cb->bytes_tb_read = tcp_cb->buff_total_len - recd_bytes;
>                 return;
>
>
>
> local_len_buf turns out to be 0, which causes recv() to return 0 and the
> application to exit. Is this a programming bug?
>
>
>
> Could someone please help resolve these issues?
>
>
>
> Regards,
>
> Girish
>


------------------------------------------------------------------------------
Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server from
Actuate! Instantly Supercharge Your Business Reports and Dashboards with
Interactivity, Sharing, Native Excel Exports, App Integration & more Get
technology previously reserved for billion-dollar corporations, FREE
http://pubads.g.doubleclick.net/gampad/clk?id=190641631&iu=/4140/ostg.clktrk
_______________________________________________
Opensaf-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-users