Girish,

It appears that the crash is happening in the CPA (CheckpointAgent) library
linked to your application.
To enable CPA traces, export
CPA_TRACE_PATHNAME=/tmp/myapp_cpadebug.log before starting your application,
then share the resulting trace file.
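
A minimal sketch of that step, assuming the application is launched from the
same shell session. The application path in the final comment is hypothetical,
and the `sh -c` line only confirms that child processes inherit the variable;
it does not start anything:

```shell
# Set the trace file path in the environment the application will inherit.
export CPA_TRACE_PATHNAME=/tmp/myapp_cpadebug.log

# Sanity check: a child process started from this shell sees the variable.
sh -c 'echo "CPA_TRACE_PATHNAME=$CPA_TRACE_PATHNAME"'

# Then start the application from this same shell, for example
# (hypothetical path, replace with your own binary):
# /path/to/myapp
```

If the application is started by AMF or another launcher rather than from an
interactive shell, the variable has to be set in that launcher's environment
instead.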

Cheers,
Mathi.

----- [email protected] wrote:

> Hi Mathi/Mahesh,
> 
>   First of all thanks for helping me in resolving this issue.
> 
>   Do you require the CPA (application) or traces of the CPA? If it is the
> traces, please let me know how to get them.
> 
> Regards,
> Girish
> 
> -----Original Message-----
> From: Mathivanan Naickan Palanivelu [mailto:[email protected]]
> Sent: Friday, February 20, 2015 3:55 PM
> To: [email protected]
> Cc: [email protected]; [email protected]
> Subject: Re: [users] Issues with CPSv
> 
> Hi,
> 
> Please raise a ticket for this crash and share the traces of CPND and
> CPA (your application).
> Also, please specify a test case or explain what the application
> is doing and at what point the crash occurs.
> 
> 
> Thanks,
> Mathi.
> 
> ----- [email protected] wrote:
> 
> > Hi,
> >
> >
> >
> > I don’t get this issue with opensaf version 4.3, but I get segfault:
> >
> >
> >
> > application sometimes crashes, stack trace as below:
> >
> >
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > search (pTree=pTree@entry=0x8f733e4, key=key@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:94
> > 94      patricia.c: No such file or directory.
> > (gdb) bt
> > #0  search (pTree=pTree@entry=0x8f733e4, key=key@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:94
> > #1  0xb76d0bef in ncs_patricia_tree_get (pTree=pTree@entry=0x8f733e4, pKey=pKey@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:434
> > #2  0xb7738493 in cpa_lcl_ckpt_node_get (lcl_ckpt_tree=lcl_ckpt_tree@entry=0x8f733e4, lc_hdl=lc_hdl@entry=0xbfa0cdf8, lc_node=lc_node@entry=0xbfa0ce10) at cpa_db.c:195
> > #3  0xb7734d76 in saCkptCheckpointWrite (checkpointHandle=150466120, ioVector=0x92c6d28, numberOfElements=1320, erroneousVectorIndex=erroneousVectorIndex@entry=0xbfa0d35c) at cpa_api.c:3134
> >
> > (gdb) p pNode
> > $2 = (NCS_PATRICIA_NODE *) 0x5e
> > (gdb) p *pTree
> > $4 = {root_node = {bit = -1, left = 0x8f7e9c0, right = 0x8f733e4, key_info = 0x8f734b8 ""}, params = {key_size = 8, info_size = 0, actual_key_size = 0, node_size = 0}, n_nodes = 3}
> >
> >
> >
> >
> >
> > Regards,
> >
> > Girish
> >
> >
> >
> > *From:* Girish Nagaraj [mailto:[email protected]]
> > *Sent:* Friday, February 20, 2015 3:34 PM
> > *To:* 'A V Mahesh'; '[email protected]'
> > *Subject:* RE: [users] Issues with CPSv
> >
> >
> >
> > Hi,
> >
> >
> >
> > Yes, similar issue in TCP also: exits with message:
> >
> >
> >
> > Feb 20 15:24:59 fedvm1 RIB[28549]: MDTM:socket_recv() = 0, conn lost with dh server, exiting library err :Success
> > Feb 20 15:24:59 fedvm1 osafamfnd[28263]: NO 'safSu=SU1,safSg=zebos-simplex,safApp=zebos' component restart probation timer started (timeout: 4000000000 ns)
> > Feb 20 15:24:59 fedvm1 osafamfnd[28263]: NO Restarting a component of 'safSu=SU1,safSg=zebos-simplex,safApp=zebos' (comp restart count: 1)
> > Feb 20 15:24:59 fedvm1 osafamfnd[28263]: NO 'safComp=ribd,safSu=SU1,safSg=zebos-simplex,safApp=zebos' faulted due to 'avaDown' : Recovery is 'componentRestart'
> >
> >
> >
> > I experimented with code changes:
> >
> >
> >
> > recd_bytes = recv(tcp_cb->DBSRsock, tcp_cb->len_buff, 2, MSG_NOSIGNAL);
> > if (0 == recd_bytes) {
> >         syslog(LOG_ERR, "MDTM:socket_recv() = %d, conn lost with dh server, exiting library err 111:%d", recd_bytes, errno);
> >         close(tcp_cb->DBSRsock);
> >         exit(0);
> > } else if (2 == recd_bytes) {
> >         uint16_t local_len_buf = 0;
> >
> >         data = tcp_cb->len_buff;
> >         local_len_buf = ncs_decode_16bit(&data);
> >
> >         /* MY CHANGE START */
> >         if (0 == local_len_buf)
> >                 return;
> >         /* MY CHANGE END */
> >
> >         tcp_cb->buff_total_len = local_len_buf;
> >         tcp_cb->num_by_read_for_len_buff = 2;
> >
> >         if (NULL == (tcp_cb->buffer = calloc(1, (local_len_buf + 1)))) {
> >                 /* Length + 2 is done to reuse the same buffer
> >                    while sending to other nodes */
> >                 syslog(LOG_ERR, "Memory allocation failed in dtm_intranode_processing");
> >                 return;
> >         }
> >         recd_bytes = recv(tcp_cb->DBSRsock, tcp_cb->buffer, local_len_buf, 0);
> >         if (recd_bytes < 0) {
> >                 return;
> >         } else if (0 == recd_bytes) {
> >                 syslog(LOG_ERR, "MDTM:socket_recv() = %d, conn lost with dh server, exiting library err 222:%d len:%d", recd_bytes, errno, local_len_buf);
> >                 close(tcp_cb->DBSRsock);
> >                 exit(0);
> >
> >
> >
> >  This caused many other issues, so I think just returning won’t work.
> >
> >
> >
> > Regards,
> >
> > Girish
> >
> >
> >
> > -----Original Message-----
> > From: A V Mahesh [mailto:[email protected] <[email protected]>]
> > Sent: Friday, February 20, 2015 1:38 PM
> > To: Girish Nagaraj; [email protected]
> > Subject: Re: [users] Issues with CPSv
> >
> >
> >
> > Hi,
> >
> >
> >
> > On 2/20/2015 1:19 PM, Girish Nagaraj wrote:
> >
> > > Hi ,
> >
> > >
> >
> > >   I think this is not a connection loss; we are passing 0 (the number
> > > of bytes to be read) to the recv() function, which returns 0 received
> > > bytes.
> >
> >
> >
> > You mean you are seeing an issue similar to `TIPC ticket #1227 mds/tipc
> > : protect mds application form zero bytes hacking messages` for TCP as
> > well?
> >
> >
> >
> > -AVM
> >
> >
> >
> > >
> >
> > >       local_len_buf =  ncs_decode_16bit(&data);
> >
> > >
> >
> > >   Is there mistake in decoding local_len_buf?
> >
> > >
> >
> > > Regards,
> >
> > > Girish
> >
> > >
> >
> > > -----Original Message-----
> >
> > > From: A V Mahesh [mailto:[email protected] <[email protected]>]
> >
> > > Sent: Friday, February 20, 2015 11:03 AM
> >
> > > To: [email protected]
> >
> > > Subject: Re: [users] Issues with CPSv
> >
> > >
> >
> > > Hi,
> >
> > >
> >
> > > On 2/19/2015 3:42 PM, Girish Nagaraj wrote:
> >
> > >> local_len_buf turns out to be 0; this causes recv() to return 0 and the
> > >> application exits. Is this a programming bug?
> >
> > > This is expected behavior: if a connection loss happens on the TCP
> > > socket, it will receive zero bytes. This is not related to CPSv.
> >
> > >
> >
> > > -AVM
> >
> > >
> >
> > >
> >
> > > On 2/19/2015 3:42 PM, Girish Nagaraj wrote:
> >
> > >> Hi,
> >
> > >>
> >
> > >>
> >
> > >>
> >
> > >> *Background*:
> >
> > >>
> >
> > >> Opensaf version: 4.5
> >
> > >>
> >
> > >> Number of checkpoints used: 2
> >
> > >>
> >
> > >> In our application we use CPSv to save application data, and when the
> > >> application faults, it is restarted and its state is restored
> > >> by reading data from the checkpoints.
> >
> > >>
> >
> > >> Model: Simplex
> >
> > >>
> >
> > >>
> >
> > >>
> >
> > >> * Issue faced:*
> >
> > >>
> >
> > >>     application sometimes crashes, stack trace as below:
> >
> > >>
> >
> > >>
> >
> > >>
> >
> > >> Program received signal SIGSEGV, Segmentation fault.
> > >> search (pTree=pTree@entry=0x8f733e4, key=key@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:94
> > >> 94      patricia.c: No such file or directory.
> > >> (gdb) bt
> > >> #0  search (pTree=pTree@entry=0x8f733e4, key=key@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:94
> > >> #1  0xb76d0bef in ncs_patricia_tree_get (pTree=pTree@entry=0x8f733e4, pKey=pKey@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:434
> > >> #2  0xb7738493 in cpa_lcl_ckpt_node_get (lcl_ckpt_tree=lcl_ckpt_tree@entry=0x8f733e4, lc_hdl=lc_hdl@entry=0xbfa0cdf8, lc_node=lc_node@entry=0xbfa0ce10) at cpa_db.c:195
> > >> #3  0xb7734d76 in saCkptCheckpointWrite (checkpointHandle=150466120, ioVector=0x92c6d28, numberOfElements=1320, erroneousVectorIndex=erroneousVectorIndex@entry=0xbfa0d35c) at cpa_api.c:3134
> > >>
> > >> (gdb) p pNode
> > >> $2 = (NCS_PATRICIA_NODE *) 0x5e
> > >> (gdb) p *pTree
> > >> $4 = {root_node = {bit = -1, left = 0x8f7e9c0, right = 0x8f733e4, key_info = 0x8f734b8 ""}, params = {key_size = 8, info_size = 0, actual_key_size = 0, node_size = 0}, n_nodes = 3}
> >
> > >>
> >
> > >>
> >
> > >>
> >
> > >>     sometimes application exits with below message:
> >
> > >>
> >
> > >>
> >
> > >>
> >
> > >> Feb 19 15:13:31 controller2 RIB[28395]: MDTM:socket_recv() = 0, conn lost with dh server, exiting library err:0 len:0
> > >> Feb 19 15:13:31 controller2 osafamfnd[28110]: NO 'safSu=SU1,safSg=zebos-simplex,safApp=zebos' component restart probation timer started (timeout: 4000000000 ns)
> > >> Feb 19 15:13:31 controller2 osafamfnd[28110]: NO Restarting a component of 'safSu=SU1,safSg=zebos-simplex,safApp=zebos' (comp restart count: 1)
> >
> > >>
> >
> > >>
> >
> > >>
> >
> > >>
> >
> > >>
> >
> > >> Below is the modified code snippet from file
> >
> > >> osaf/libs/core/mds/mds_dt_trans.c
> >
> > >>
> >
> > >>
> >
> > >>
> >
> > >> } else if (2 == recd_bytes) {
> > >>         uint16_t local_len_buf = 0;
> > >>
> > >>         data = tcp_cb->len_buff;
> > >>         local_len_buf = ncs_decode_16bit(&data);
> > >>         tcp_cb->buff_total_len = local_len_buf;
> > >>         tcp_cb->num_by_read_for_len_buff = 2;
> > >>
> > >>         if (NULL == (tcp_cb->buffer = calloc(1, (local_len_buf + 1)))) {
> > >>                 /* Length + 2 is done to reuse the same buffer
> > >>                    while sending to other nodes */
> > >>                 syslog(LOG_ERR, "Memory allocation failed in dtm_intranode_processing");
> > >>                 return;
> > >>         }
> > >>         recd_bytes = recv(tcp_cb->DBSRsock, tcp_cb->buffer, local_len_buf, 0);
> > >>         if (recd_bytes < 0) {
> > >>                 return;
> > >>         } else if (0 == recd_bytes) {
> > >>                 syslog(LOG_ERR, "MDTM:socket_recv() = %d, conn lost with dh server, exiting library err:%d len:%d", recd_bytes, errno, local_len_buf);
> > >>                 close(tcp_cb->DBSRsock);
> > >>                 exit(0); /* <<<<<<< EXITS HERE >>>>>>> */
> > >>         } else if (local_len_buf > recd_bytes) {
> > >>                 /* can happen only in two cases, system call interrupt or half data, */
> > >>                 TRACE("less data recd, recd bytes = %d, actual len = %d", recd_bytes, local_len_buf);
> > >>                 tcp_cb->bytes_tb_read = tcp_cb->buff_total_len - recd_bytes;
> > >>                 return;
> >
> > >>
> >
> > >>
> >
> > >>
> >
> > >> local_len_buf turns out to be 0; this causes recv() to return 0 and the
> > >> application exits. Is this a programming bug?
> >
> > >>
> >
> > >>
> >
> > >>
> >
> > >> Could someone please help to resolve these issues.
> >
> > >>
> >
> > >>
> >
> > >>
> >
> > >> Regards,
> >
> > >>
> >
> > >> Girish
> >
> > >>
> >
> > >
> >
> > >
> >
> ----------------------------------------------------------------------
> >
> > > -------- Download BIRT iHub F-Type - The Free Enterprise-Grade
> BIRT
> >
> > > Server from Actuate! Instantly Supercharge Your Business Reports
> and
> >
> > > Dashboards with Interactivity, Sharing, Native Excel Exports, App
> >
> > > Integration & more Get technology previously reserved for
> >
> > > billion-dollar corporations, FREE
> >
> > >
> >
> http://pubads.g.doubleclick.net/gampad/clk?id=190641631&iu=/4140/ostg.
> >
> > > clktrk _______________________________________________
> >
> > > Opensaf-users mailing list
> >
> > > [email protected]
> >
> > > https://lists.sourceforge.net/lists/listinfo/opensaf-users
> >
> > >
> >

