Hi Mahesh,


Would it be possible to provide the fix for this defect at the earliest? We
have a release in June; could the fix be made available before then?

Can I at least get the fix patch before it gets officially released?



Regards,

Girish



*From:* Girish Nagaraj [mailto:[email protected]]
*Sent:* Thursday, March 26, 2015 3:26 PM
*To:* 'A V Mahesh'; '[email protected]'
*Subject:* RE: [users] Issues with CPSv



Hi Mahesh,



I tested with OpenSAF 4.5 using TIPC as the MDS transport, and this issue is not seen.



I have raised a ticket: “*#1285 MDS TCP: zero bytes recvd results in
application exit*”



Regards,

Girish



*From:* A V Mahesh [mailto:[email protected] <[email protected]>]

*Sent:* Monday, February 23, 2015 10:04 AM
*To:* Girish Nagaraj; [email protected]
*Subject:* Re: [users] Issues with CPSv



Hi,

To confirm and isolate the problem further, test your application with the TIPC
transport and the ticket #1227 fix (on both 4.3 and 4.5), and provide your
observations.

If the issue is NOT reproducible with the TIPC transport, then as a workaround
prevent your ckpt application from sending ZERO-size (hack) messages over the
TCP transport, and raise a ticket with all the details, as Mathi explained.

-AVM

On 2/20/2015 3:33 PM, Girish Nagaraj wrote:

Hi,



Yes, there is a similar issue with TCP as well; the application exits with this message:



Feb 20 15:24:59 fedvm1 RIB[28549]: MDTM:socket_recv() = 0, conn lost with
dh server, exiting library err :Success

Feb 20 15:24:59 fedvm1 osafamfnd[28263]: NO
'safSu=SU1,safSg=zebos-simplex,safApp=zebos' component restart probation
timer started (timeout: 4000000000 ns)

Feb 20 15:24:59 fedvm1 osafamfnd[28263]: NO Restarting a component of
'safSu=SU1,safSg=zebos-simplex,safApp=zebos' (comp restart count: 1)

Feb 20 15:24:59 fedvm1 osafamfnd[28263]: NO
'safComp=ribd,safSu=SU1,safSg=zebos-simplex,safApp=zebos' faulted due to
'avaDown' : Recovery is 'componentRestart'



I experimented with code changes:



recd_bytes = recv(tcp_cb->DBSRsock, tcp_cb->len_buff, 2, MSG_NOSIGNAL);
if (0 == recd_bytes) {
        syslog(LOG_ERR, "MDTM:socket_recv() = %d, conn lost with dh server, exiting library err 111:%d", recd_bytes, errno);
        close(tcp_cb->DBSRsock);
        exit(0);
} else if (2 == recd_bytes) {
        uint16_t local_len_buf = 0;

        data = tcp_cb->len_buff;
        local_len_buf = ncs_decode_16bit(&data);

        /* MY CHANGE START */
        if (0 == local_len_buf)
                return;
        /* MY CHANGE END */

        tcp_cb->buff_total_len = local_len_buf;
        tcp_cb->num_by_read_for_len_buff = 2;

        if (NULL == (tcp_cb->buffer = calloc(1, (local_len_buf + 1)))) {
                /* Length + 2 is done to reuse the same buffer
                   while sending to other nodes */
                syslog(LOG_ERR, "Memory allocation failed in dtm_intranode_processing");
                return;
        }
        recd_bytes = recv(tcp_cb->DBSRsock, tcp_cb->buffer, local_len_buf, 0);
        if (recd_bytes < 0) {
                return;
        } else if (0 == recd_bytes) {
                syslog(LOG_ERR, "MDTM:socket_recv() = %d, conn lost with dh server, exiting library err 222:%d len:%d",
                       recd_bytes, errno, local_len_buf);
                close(tcp_cb->DBSRsock);
                exit(0);



 This caused many other issues, so I think just returning won’t work.
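
The ambiguity behind this is that recv() returns 0 both when the peer has closed the connection and when it is asked to read 0 bytes from a live stream socket, so the two cases have to be separated before the payload recv() is issued. Below is a minimal sketch of that separation for a blocking socket. It is only an illustration and not the OpenSAF fix: read_exact(), read_frame(), demo_frames() and the frame_status codes are hypothetical names, and the 2-byte header is assumed to be in network byte order.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical status codes for this sketch; these are not OpenSAF names. */
enum frame_status { FRAME_OK, FRAME_EMPTY, FRAME_PEER_CLOSED, FRAME_ERROR };

/* Read exactly len bytes from a blocking stream socket.
 * Returns 1 on success, 0 if the peer closed the connection, -1 on error.
 * With len == 0 the loop never runs, so recv() is never asked for 0 bytes. */
static int read_exact(int fd, uint8_t *buf, size_t len)
{
	size_t got = 0;

	while (got < len) {
		ssize_t n = recv(fd, buf + got, len - got, 0);

		if (n == 0)
			return 0;
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		got += (size_t)n;
	}
	return 1;
}

/* Read one frame: a 2-byte length header followed by that many payload bytes.
 * On FRAME_OK the caller owns *payload and must free() it. */
static enum frame_status read_frame(int fd, uint8_t **payload, uint16_t *len)
{
	uint8_t hdr[2];
	int rc;

	*payload = NULL;
	*len = 0;

	rc = read_exact(fd, hdr, sizeof(hdr));
	if (rc <= 0)
		return rc == 0 ? FRAME_PEER_CLOSED : FRAME_ERROR;

	/* Assumption: the header is big-endian (network byte order). */
	*len = (uint16_t)((hdr[0] << 8) | hdr[1]);
	if (*len == 0)
		return FRAME_EMPTY; /* header consumed, connection still up */

	*payload = malloc(*len);
	if (*payload == NULL)
		return FRAME_ERROR;

	rc = read_exact(fd, *payload, *len);
	if (rc <= 0) {
		free(*payload);
		*payload = NULL;
		return rc == 0 ? FRAME_PEER_CLOSED : FRAME_ERROR;
	}
	return FRAME_OK;
}

/* Demo over a local socket pair: an empty frame, a 3-byte frame, then EOF. */
static void demo_frames(enum frame_status out[3])
{
	static const uint8_t wire[] = { 0x00, 0x00, 0x00, 0x03, 'a', 'b', 'c' };
	int sv[2];
	int rc = socketpair(AF_UNIX, SOCK_STREAM, 0, sv);

	assert(rc == 0);
	(void)rc;
	ssize_t sent = send(sv[0], wire, sizeof(wire), 0);
	assert(sent == (ssize_t)sizeof(wire));
	(void)sent;
	close(sv[0]); /* peer goes away after sending both frames */

	for (int i = 0; i < 3; i++) {
		uint8_t *payload;
		uint16_t len;

		out[i] = read_frame(sv[1], &payload, &len);
		free(payload); /* free(NULL) is a no-op */
	}
	close(sv[1]);
}
```

In this sketch a zero-length header is reported as FRAME_EMPTY with the header already consumed, so the caller can carry on with the next frame instead of exiting. The real code keeps partial-read state in tcp_cb, so the same distinction would have to be adapted to that state handling.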



Regards,

Girish



-----Original Message-----
From: A V Mahesh [mailto:[email protected]]
Sent: Friday, February 20, 2015 1:38 PM
To: Girish Nagaraj; [email protected]
Subject: Re: [users] Issues with CPSv



Hi,



On 2/20/2015 1:19 PM, Girish Nagaraj wrote:

> Hi ,

>

>   I think this is not a connection loss. We are passing 0 (the number of
> bytes to be read) to the recv() function, which then returns 0 received bytes.



You mean you are seeing an issue similar to TIPC ticket #1227, `mds/tipc :
protect mds application form zero bytes hacking messages`, for TCP as well?



-AVM



>

>       local_len_buf =  ncs_decode_16bit(&data);

>

>   Is there a mistake in decoding local_len_buf?

>

> Regards,

> Girish

>

> -----Original Message-----

> From: A V Mahesh [mailto:[email protected] <[email protected]>
]

> Sent: Friday, February 20, 2015 11:03 AM

> To: [email protected]

> Subject: Re: [users] Issues with CPSv

>

> Hi,

>

> On 2/19/2015 3:42 PM, Girish Nagaraj wrote:

>> local_len_buf turns out to be 0; this causes recv() to return 0 and
>> the application exits. Is this a programming bug?

> This is expected behavior: if any connection loss happens on a TCP socket,
> the socket will receive ZERO bytes. This is not related to CPSv.

>

> -AVM

>

>

> On 2/19/2015 3:42 PM, Girish Nagaraj wrote:

>> Hi,

>>

>>

>>

>> *Background*:

>>

>> Opensaf version: 4.5

>>

>> Number of checkpoints used: 2

>>

>> In our application we use CPSv to save application data. When the
>> application faults, it is restarted and its state is restored by
>> reading data from the checkpoints.

>>

>> Model: Simplex

>>

>>

>>

>> *Issue faced*:
>>
>>     The application sometimes crashes; the stack trace is below:

>>

>>

>>

>> Program received signal SIGSEGV, Segmentation fault.
>> search (pTree=pTree@entry=0x8f733e4, key=key@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:94
>> 94      patricia.c: No such file or directory.
>> (gdb) bt
>> #0  search (pTree=pTree@entry=0x8f733e4, key=key@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:94
>> #1  0xb76d0bef in ncs_patricia_tree_get (pTree=pTree@entry=0x8f733e4, pKey=pKey@entry=0xbfa0cdf8 "H\356\367\b") at patricia.c:434
>> #2  0xb7738493 in cpa_lcl_ckpt_node_get (lcl_ckpt_tree=lcl_ckpt_tree@entry=0x8f733e4, lc_hdl=lc_hdl@entry=0xbfa0cdf8, lc_node=lc_node@entry=0xbfa0ce10)
>>       at cpa_db.c:195
>> #3  0xb7734d76 in saCkptCheckpointWrite (checkpointHandle=150466120, ioVector=0x92c6d28, numberOfElements=1320,
>>       erroneousVectorIndex=erroneousVectorIndex@entry=0xbfa0d35c) at cpa_api.c:3134
>>
>> (gdb) p pNode
>> $2 = (NCS_PATRICIA_NODE *) 0x5e
>> (gdb) p *pTree
>> $4 = {root_node = {bit = -1, left = 0x8f7e9c0, right = 0x8f733e4, key_info = 0x8f734b8 ""}, params = {key_size = 8, info_size = 0, actual_key_size = 0,
>>       node_size = 0}, n_nodes = 3}

>>

>>

>>

>>     Sometimes the application exits with the message below:

>>

>>

>>

>> Feb 19 15:13:31 controller2 RIB[28395]: MDTM:socket_recv() = 0, conn lost with dh server, exiting library err:0 len:0
>> Feb 19 15:13:31 controller2 osafamfnd[28110]: NO 'safSu=SU1,safSg=zebos-simplex,safApp=zebos' component restart probation timer started (timeout: 4000000000 ns)
>> Feb 19 15:13:31 controller2 osafamfnd[28110]: NO Restarting a component of 'safSu=SU1,safSg=zebos-simplex,safApp=zebos' (comp restart count: 1)

>>

>>

>>

>>

>>

>> Below is the modified code snippet from file

>> osaf/libs/core/mds/mds_dt_trans.c

>>

>>

>>

>> } else if (2 == recd_bytes) {
>>         uint16_t local_len_buf = 0;
>>
>>         data = tcp_cb->len_buff;
>>         local_len_buf = ncs_decode_16bit(&data);
>>         tcp_cb->buff_total_len = local_len_buf;
>>         tcp_cb->num_by_read_for_len_buff = 2;
>>
>>         if (NULL == (tcp_cb->buffer = calloc(1, (local_len_buf + 1)))) {
>>                 /* Length + 2 is done to reuse the same buffer
>>                    while sending to other nodes */
>>                 syslog(LOG_ERR, "Memory allocation failed in dtm_intranode_processing");
>>                 return;
>>         }
>>         recd_bytes = recv(tcp_cb->DBSRsock, tcp_cb->buffer, local_len_buf, 0);
>>         if (recd_bytes < 0) {
>>                 return;
>>         } else if (0 == recd_bytes) {
>>                 syslog(LOG_ERR, "MDTM:socket_recv() = %d, conn lost with dh server, exiting library err:%d len:%d", recd_bytes, errno, local_len_buf);
>>                 close(tcp_cb->DBSRsock);
>>                 exit(0); /* <<<<<<< EXITS HERE >>>>>>> */
>>         } else if (local_len_buf > recd_bytes) {
>>                 /* can happen only in two cases, system call interrupt or half data, */
>>                 TRACE("less data recd, recd bytes = %d, actual len = %d", recd_bytes,
>>                        local_len_buf);
>>                 tcp_cb->bytes_tb_read = tcp_cb->buff_total_len - recd_bytes;
>>                 return;

>>

>> local_len_buf turns out to be 0; this causes recv() to return 0 and
>> the application exits. Is this a programming bug?
>>
>> Could someone please help resolve these issues?

>>

>>

>>

>> Regards,

>>

>> Girish

>>

>


> Opensaf-users mailing list

> [email protected]

> https://lists.sourceforge.net/lists/listinfo/opensaf-users

>



