Hi,
I don’t see this issue with OpenSAF version 4.3, but I still get a segfault.
The application sometimes crashes with the stack trace below:
Program received signal SIGSEGV, Segmentation fault.
search (pTree=pTree@entry=0x8f733e4, key=key@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:94
94 patricia.c: No such file or directory.
(gdb) bt
#0 search (pTree=pTree@entry=0x8f733e4, key=key@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:94
#1 0xb76d0bef in ncs_patricia_tree_get (pTree=pTree@entry=0x8f733e4, pKey=pKey@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:434
#2 0xb7738493 in cpa_lcl_ckpt_node_get (lcl_ckpt_tree=lcl_ckpt_tree@entry=0x8f733e4, lc_hdl=lc_hdl@entry=0xbfa0cdf8, lc_node=lc_node@entry=0xbfa0ce10) at cpa_db.c:195
#3 0xb7734d76 in saCkptCheckpointWrite (checkpointHandle=150466120, ioVector=0x92c6d28, numberOfElements=1320, erroneousVectorIndex=erroneousVectorIndex@entry=0xbfa0d35c) at cpa_api.c:3134
(gdb) p pNode
$2 = (NCS_PATRICIA_NODE *) 0x5e
(gdb) p *pTree
$4 = {root_node = {bit = -1, left = 0x8f7e9c0, right = 0x8f733e4, key_info = 0x8f734b8 ""}, params = {key_size = 8, info_size = 0, actual_key_size = 0, node_size = 0}, n_nodes = 3}
Regards,
Girish
*From:* Girish Nagaraj [mailto:[email protected]]
*Sent:* Friday, February 20, 2015 3:34 PM
*To:* 'A V Mahesh'; '[email protected]'
*Subject:* RE: [users] Issues with CPSv
Hi,
Yes, I see a similar issue with TCP as well. The application exits with this message:
Feb 20 15:24:59 fedvm1 RIB[28549]: MDTM:socket_recv() = 0, conn lost with dh server, exiting library err :Success
Feb 20 15:24:59 fedvm1 osafamfnd[28263]: NO 'safSu=SU1,safSg=zebos-simplex,safApp=zebos' component restart probation timer started (timeout: 4000000000 ns)
Feb 20 15:24:59 fedvm1 osafamfnd[28263]: NO Restarting a component of 'safSu=SU1,safSg=zebos-simplex,safApp=zebos' (comp restart count: 1)
Feb 20 15:24:59 fedvm1 osafamfnd[28263]: NO 'safComp=ribd,safSu=SU1,safSg=zebos-simplex,safApp=zebos' faulted due to 'avaDown' : Recovery is 'componentRestart'
I experimented with code changes:
    recd_bytes = recv(tcp_cb->DBSRsock, tcp_cb->len_buff, 2, MSG_NOSIGNAL);
    if (0 == recd_bytes) {
        syslog(LOG_ERR, "MDTM:socket_recv() = %d, conn lost with dh server, exiting library err 111:%d", recd_bytes, errno);
        close(tcp_cb->DBSRsock);
        exit(0);
    } else if (2 == recd_bytes) {
        uint16_t local_len_buf = 0;
        data = tcp_cb->len_buff;
        local_len_buf = ncs_decode_16bit(&data);
        /* MY CHANGE START */
        if (0 == local_len_buf)
            return;
        /* MY CHANGE END */
        tcp_cb->buff_total_len = local_len_buf;
        tcp_cb->num_by_read_for_len_buff = 2;
        if (NULL == (tcp_cb->buffer = calloc(1, (local_len_buf + 1)))) {
            /* Length + 2 is done to reuse the same buffer
               while sending to other nodes */
            syslog(LOG_ERR, "Memory allocation failed in dtm_intranode_processing");
            return;
        }
        recd_bytes = recv(tcp_cb->DBSRsock, tcp_cb->buffer, local_len_buf, 0);
        if (recd_bytes < 0) {
            return;
        } else if (0 == recd_bytes) {
            syslog(LOG_ERR, "MDTM:socket_recv() = %d, conn lost with dh server, exiting library err 222:%d len:%d", recd_bytes, errno, local_len_buf);
            close(tcp_cb->DBSRsock);
            exit(0);
This caused many other issues, so I think just returning won’t work.
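One possible alternative to returning early, sketched here as a standalone example and not as a patch to mds_dt_trans.c, is to treat a decoded length of 0 as an empty frame and skip the second recv() entirely, so that a return value of 0 from recv() can only mean the peer closed the connection. The names read_frame, decode_len16 and enum frame_status are hypothetical, and the big-endian decode is an assumption about what ncs_decode_16bit() does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; they are not part of MDS. */
enum frame_status { FRAME_OK, FRAME_EMPTY, FRAME_CLOSED, FRAME_ERROR };

/* Decode a 16-bit big-endian length (assumed to match ncs_decode_16bit()). */
static uint16_t decode_len16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Read one frame: a 2-byte length header followed by `len` body bytes. */
static enum frame_status read_frame(int fd, uint8_t *body, size_t cap,
				    uint16_t *len_out)
{
	uint8_t hdr[2];
	ssize_t n;
	uint16_t len;

	assert(body != NULL && len_out != NULL);

	n = recv(fd, hdr, sizeof(hdr), MSG_WAITALL);
	if (n == 0)
		return FRAME_CLOSED;	/* peer closed before any header */
	if (n != (ssize_t)sizeof(hdr))
		return FRAME_ERROR;	/* error or truncated header */

	len = decode_len16(hdr);
	*len_out = len;
	if (len == 0)
		return FRAME_EMPTY;	/* never call recv() with length 0 */
	if (len > cap)
		return FRAME_ERROR;	/* body would not fit in the buffer */

	n = recv(fd, body, len, MSG_WAITALL);
	if (n == 0)
		return FRAME_CLOSED;	/* here 0 really means the peer closed */
	if (n != (ssize_t)len)
		return FRAME_ERROR;	/* error or truncated body */
	return FRAME_OK;
}
```

A caller could keep the socket open on FRAME_EMPTY and close it only on FRAME_CLOSED. Whether MDS can safely ignore an empty frame, or why the sender produces one, is not something this sketch answers.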
Regards,
Girish
-----Original Message-----
From: A V Mahesh [mailto:[email protected] <[email protected]>]
Sent: Friday, February 20, 2015 1:38 PM
To: Girish Nagaraj; [email protected]
Subject: Re: [users] Issues with CPSv
Hi,
On 2/20/2015 1:19 PM, Girish Nagaraj wrote:
> Hi ,
>
> I think this is not a connection loss. We are passing 0 (the number of
> bytes to read) to recv(), which then returns 0 received bytes.
You mean you are seeing an issue similar to `TIPC ticket #1227 mds/tipc
: protect mds application form zero bytes hacking messages`, but for TCP as
well?
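
For reference, the zero-length case can be checked in isolation. The sketch below uses a local AF_UNIX stream socket pair instead of a TCP connection to the dh server, so it is an assumption that the MDS TCP socket behaves the same way. recv_len and make_pair are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ask recv() for `len` bytes and report what it returns. */
static ssize_t recv_len(int fd, size_t len)
{
	char buf[8];

	assert(len <= sizeof(buf));
	return recv(fd, buf, len, 0);
}

/* Hypothetical helper: create a connected stream socket pair, 0 on success. */
static int make_pair(int sv[2])
{
	return socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
}
```

On Linux, with data already queued on the socket, recv_len(fd, 0) is expected to return 0 without consuming anything, which is the same value recv() returns after the peer has closed the connection. The return value alone therefore cannot tell the two cases apart.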
-AVM
>
> local_len_buf = ncs_decode_16bit(&data);
>
> Is there a mistake in decoding local_len_buf?
>
> Regards,
> Girish
>
> -----Original Message-----
> From: A V Mahesh [mailto:[email protected] <[email protected]>]
> Sent: Friday, February 20, 2015 11:03 AM
> To: [email protected]
> Subject: Re: [users] Issues with CPSv
>
> Hi,
>
> On 2/19/2015 3:42 PM, Girish Nagaraj wrote:
>> local_len_buf turns out to be 0, which causes recv() to return 0 and the
>> application to exit. Is this a programming bug?
> This is expected behavior: if the connection is lost on a TCP socket,
> recv() returns zero bytes. This is not related to CPSv.
>
> -AVM
>
>
> On 2/19/2015 3:42 PM, Girish Nagaraj wrote:
>> Hi,
>>
>>
>>
>> *Background*:
>>
>> Opensaf version: 4.5
>>
>> Number of checkpoints used: 2
>>
>> In our application we use CPSv to save application data. When the
>> application faults, it is restarted and its state is restored by
>> reading data from the checkpoints.
>>
>> Model: Simplex
>>
>>
>>
>> *Issue faced*:
>>
>> The application sometimes crashes with the stack trace below:
>>
>>
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> search (pTree=pTree@entry=0x8f733e4, key=key@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:94
>> 94 patricia.c: No such file or directory.
>> (gdb) bt
>> #0 search (pTree=pTree@entry=0x8f733e4, key=key@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:94
>> #1 0xb76d0bef in ncs_patricia_tree_get (pTree=pTree@entry=0x8f733e4, pKey=pKey@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:434
>> #2 0xb7738493 in cpa_lcl_ckpt_node_get (lcl_ckpt_tree=lcl_ckpt_tree@entry=0x8f733e4, lc_hdl=lc_hdl@entry=0xbfa0cdf8, lc_node=lc_node@entry=0xbfa0ce10) at cpa_db.c:195
>> #3 0xb7734d76 in saCkptCheckpointWrite (checkpointHandle=150466120, ioVector=0x92c6d28, numberOfElements=1320, erroneousVectorIndex=erroneousVectorIndex@entry=0xbfa0d35c) at cpa_api.c:3134
>>
>> (gdb) p pNode
>> $2 = (NCS_PATRICIA_NODE *) 0x5e
>> (gdb) p *pTree
>> $4 = {root_node = {bit = -1, left = 0x8f7e9c0, right = 0x8f733e4, key_info = 0x8f734b8 ""}, params = {key_size = 8, info_size = 0, actual_key_size = 0, node_size = 0}, n_nodes = 3}
>>
>>
>>
>> Sometimes the application exits with the message below:
>>
>>
>>
>> Feb 19 15:13:31 controller2 RIB[28395]: MDTM:socket_recv() = 0, conn lost with dh server, exiting library err:0 len:0
>> Feb 19 15:13:31 controller2 osafamfnd[28110]: NO 'safSu=SU1,safSg=zebos-simplex,safApp=zebos' component restart probation timer started (timeout: 4000000000 ns)
>> Feb 19 15:13:31 controller2 osafamfnd[28110]: NO Restarting a component of 'safSu=SU1,safSg=zebos-simplex,safApp=zebos' (comp restart count: 1)
>>
>>
>>
>>
>>
>> Below is the modified code snippet from file
>> osaf/libs/core/mds/mds_dt_trans.c
>>
>>
>>
>>     } else if (2 == recd_bytes) {
>>         uint16_t local_len_buf = 0;
>>
>>         data = tcp_cb->len_buff;
>>         local_len_buf = ncs_decode_16bit(&data);
>>         tcp_cb->buff_total_len = local_len_buf;
>>         tcp_cb->num_by_read_for_len_buff = 2;
>>
>>         if (NULL == (tcp_cb->buffer = calloc(1, (local_len_buf + 1)))) {
>>             /* Length + 2 is done to reuse the same buffer
>>                while sending to other nodes */
>>             syslog(LOG_ERR, "Memory allocation failed in dtm_intranode_processing");
>>             return;
>>         }
>>         recd_bytes = recv(tcp_cb->DBSRsock, tcp_cb->buffer, local_len_buf, 0);
>>         if (recd_bytes < 0) {
>>             return;
>>         } else if (0 == recd_bytes) {
>>             syslog(LOG_ERR, "MDTM:socket_recv() = %d, conn lost with dh server, exiting library err:%d len:%d", recd_bytes, errno, local_len_buf);
>>             close(tcp_cb->DBSRsock);
>>             exit(0);    /* <<<<<<< EXITS HERE >>>>>>> */
>>         } else if (local_len_buf > recd_bytes) {
>>             /* can happen only in two cases, system call interrupt or half data, */
>>             TRACE("less data recd, recd bytes = %d, actual len = %d", recd_bytes, local_len_buf);
>>             tcp_cb->bytes_tb_read = tcp_cb->buff_total_len - recd_bytes;
>>             return;
>>
>>
>>
>> local_len_buf turns out to be 0, which causes recv() to return 0 and the
>> application to exit. Is this a programming bug?
>>
>>
>>
>> Could someone please help resolve these issues?
>>
>>
>>
>> Regards,
>>
>> Girish
>>
>
> Opensaf-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/opensaf-users
>