Re: [devel] [PATCH 1/1] log: logd crash due to well known stream has numOpeners = 0 [#3018]

2019-03-26 Thread Lennart Lund
Hi Canh,

You are right about that configuration streams except the "well known" ones can 
be deleted but if the stream configuration object exist and numOpeners = 0 
there is an inconsistency that must be considered as "out of synch" that 
requires a restart.

Regards
Lennart

From: Canh Van Truong 
Sent: den 26 mars 2019 10:17
To: Lennart Lund ; Vu Minh Nguyen 

Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 1/1] log: logd crash due to well known stream has 
numOpeners = 0 [#3018]

Thanks Lennart,

Please my comments.

Regards
Canh
From: Lennart Lund mailto:lennart.l...@ericsson.com>>
Sent: Tuesday, March 26, 2019 3:41 PM
To: Canh Van Truong 
mailto:canh.v.tru...@dektech.com.au>>; Vu Minh 
Nguyen mailto:vu.m.ngu...@dektech.com.au>>
Cc: 
opensaf-devel@lists.sourceforge.net;
 Lennart Lund mailto:lennart.l...@ericsson.com>>
Subject: RE: [PATCH 1/1] log: logd crash due to well known stream has 
numOpeners = 0 [#3018]

Hi Canh,

I don't really understand what you suggest and how this will keep standby and 
active in sync. However, there are two things I would like to mention:

  1.  It is not only the "well known streams" if you by this means the alarm, 
notification and system streams that must never have 0 openers. This applies to 
all streams having a configuration object in the IMM model.
[Canh] 1/ Create configuration app stream with configuration object in IMM. 
Numopener =1
immcfg -c SaLogStreamConfig  safLgStrCfg=TestApp5 -a saLogStreamPathName=. -a 
saLogStreamFileName=TestApp5

2/ Delete cfg stream: Numopeners =0, log file is closed, ...
immcfg -d  safLgStrCfg=TestApp5


  1.  If standby gets checkpoint data where something wrong can be detected, 
for example leading to numOpeners = 0 for a stream where this shall never 
happen, then standby and active is "out of synch". The "normal" behavior is 
then to exit the standby process in order to be restarted which will trig a 
"cold synch". Note that normally the whole node is restarted in such a case but 
it is possible to just restart the log server process. Check how this is 
handled in other places where an "out of synch" restart is done.
[Canh] Yes. It should be restarted the standby node. I am testing it.

Regards
Lennart

From: Canh Van Truong 
mailto:canh.v.tru...@dektech.com.au>>
Sent: den 25 mars 2019 04:50
To: Lennart Lund mailto:lennart.l...@ericsson.com>>; 
Vu Minh Nguyen mailto:vu.m.ngu...@dektech.com.au>>
Cc: 
opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 1/1] log: logd crash due to well known stream has 
numOpeners = 0 [#3018]

Thanks all,

With the case that causes the osaflogd crash (the "numOpeners" of "well know 
stream" is 0), I suggest solution is that when the standby process the 
checkpoint of open stream, it should always check if the ("numOpeners" ckpt 
from active) less than ("numOpeners" of standby + 1), The "numOpeners" should 
not use the value that ckpt from active. It should be ("numOpeners" of standby 
+ 1).  Because there is one or more than one streams that client has not 
removed these streams to its owned list on standby while these streams already 
removed on active node.   Is it ok?

@Lennart:  Yes, the numOpeners of "well know streams" should never become 0 as 
you said. But some unexpected case as I mention in previous email it may become 
0 due to ckpt problem.
Only "well know streams" should not become 0. Other configuration streams (app 
cfg stream) can till be 0 in case no client own this stream  and the user 
deletes the stream in IMM data base and it will call the callback to delete 
stream in lgd(e.g. immcfg -d ...)

@aVu: As your mention the case, although the "numOpeners" in standby is less 
than one with active node, but the stream has not associated with the client by 
"lgs_client_stream_add()"in standby node.  When the active node reboot or split 
and the standby node is up to active, the client is down and will not close 
stream because "stream_list_root" list does not have that stream. So if my 
thinking is correct, that case won't cause the issue happen ?
There is already a log ER if ckpt fail "LOG_ER("%s: MBCSV send FAILED rc=%u.", 
__FUNCTION__, rc);"

Thanks
Canh

From: Lennart Lund mailto:lennart.l...@ericsson.com>>
Sent: Friday, March 22, 2019 7:40 PM
To: Vu Minh Nguyen 
mailto:vu.m.ngu...@dektech.com.au>>; Canh Van 
Truong mailto:canh.v.tru...@dektech.com.au>>
Cc: 
opensaf-devel@lists.sourceforge.net;
 Lennart Lund mailto:lennart.l...@ericsson.com>>
Subject: RE: [PATCH 1/1] log: logd crash due to well known stream has 
numOpeners = 0 [#3018]

Hi Canh,

Just a small comment.
For "Well known streams" it shall never be possible that numOpeners becomes 0 
since the log service itself is one of the "openers" (the first opener) and 
that "opener" is never closed. This also applies for any configuration stream 
(well known streams are also 

Re: [devel] [PATCH 1/1] log: logd crash due to well known stream has numOpeners = 0 [#3018]

2019-03-26 Thread Canh Van Truong
Thanks Lennart,

 

Please my comments.

 

Regards

Canh

From: Lennart Lund  
Sent: Tuesday, March 26, 2019 3:41 PM
To: Canh Van Truong ; Vu Minh Nguyen

Cc: opensaf-devel@lists.sourceforge.net; Lennart Lund

Subject: RE: [PATCH 1/1] log: logd crash due to well known stream has
numOpeners = 0 [#3018]

 

Hi Canh,

 

I don't really understand what you suggest and how this will keep standby
and active in sync. However, there are two things I would like to mention:

1.  It is not only the "well known streams" if you by this means the
alarm, notification and system streams that must never have 0 openers. This
applies to all streams having a configuration object in the IMM model.

[Canh] 1/ Create configuration app stream with configuration object in IMM.
Numopener =1

immcfg -c SaLogStreamConfig  safLgStrCfg=TestApp5 -a saLogStreamPathName=.
-a saLogStreamFileName=TestApp5

 

2/ Delete cfg stream: Numopeners =0, log file is closed, .

immcfg -d  safLgStrCfg=TestApp5

 

2.  If standby gets checkpoint data where something wrong can be
detected, for example leading to numOpeners = 0 for a stream where this
shall never happen, then standby and active is "out of synch". The "normal"
behavior is then to exit the standby process in order to be restarted which
will trig a "cold synch". Note that normally the whole node is restarted in
such a case but it is possible to just restart the log server process. Check
how this is handled in other places where an "out of synch" restart is done.

[Canh] Yes. It should be restarted the standby node. I am testing it.

 

Regards

Lennart

 

From: Canh Van Truong mailto:canh.v.tru...@dektech.com.au> > 
Sent: den 25 mars 2019 04:50
To: Lennart Lund mailto:lennart.l...@ericsson.com> >; Vu Minh Nguyen
mailto:vu.m.ngu...@dektech.com.au> >
Cc: opensaf-devel@lists.sourceforge.net
 
Subject: RE: [PATCH 1/1] log: logd crash due to well known stream has
numOpeners = 0 [#3018]

 

Thanks all,

 

With the case that causes the osaflogd crash (the "numOpeners" of "well know
stream" is 0), I suggest solution is that when the standby process the
checkpoint of open stream, it should always check if the ("numOpeners" ckpt
from active) less than ("numOpeners" of standby + 1), The "numOpeners"
should not use the value that ckpt from active. It should be ("numOpeners"
of standby + 1).  Because there is one or more than one streams that client
has not removed these streams to its owned list on standby while these
streams already removed on active node.   Is it ok?

 

@Lennart:  Yes, the numOpeners of "well know streams" should never become 0
as you said. But some unexpected case as I mention in previous email it may
become 0 due to ckpt problem. 

Only "well know streams" should not become 0. Other configuration streams
(app cfg stream) can till be 0 in case no client own this stream  and the
user deletes the stream in IMM data base and it will call the callback to
delete stream in lgd(e.g. immcfg -d .)

 

@aVu: As your mention the case, although the "numOpeners" in standby is less
than one with active node, but the stream has not associated with the client
by "lgs_client_stream_add()"in standby node.  When the active node reboot or
split and the standby node is up to active, the client is down and will not
close stream because "stream_list_root" list does not have that stream. So
if my thinking is correct, that case won't cause the issue happen ?

There is already a log ER if ckpt fail "LOG_ER("%s: MBCSV send FAILED
rc=%u.", __FUNCTION__, rc);"

 

Thanks

Canh

 

From: Lennart Lund mailto:lennart.l...@ericsson.com> > 
Sent: Friday, March 22, 2019 7:40 PM
To: Vu Minh Nguyen mailto:vu.m.ngu...@dektech.com.au> >; Canh Van Truong
mailto:canh.v.tru...@dektech.com.au> >
Cc: opensaf-devel@lists.sourceforge.net
 ; Lennart Lund
mailto:lennart.l...@ericsson.com> >
Subject: RE: [PATCH 1/1] log: logd crash due to well known stream has
numOpeners = 0 [#3018]

 

Hi Canh,

 

Just a small comment.

For "Well known streams" it shall never be possible that numOpeners becomes
0 since the log service itself is one of the "openers" (the first opener)
and that "opener" is never closed. This also applies for any configuration
stream (well known streams are also configuration streams).

 

Regards

Lennart

 

From: Vu Minh Nguyen mailto:vu.m.ngu...@dektech.com.au> > 
Sent: den 22 mars 2019 09:30
To: Canh Van Truong mailto:canh.v.tru...@dektech.com.au> >; Lennart Lund
mailto:lennart.l...@ericsson.com> >
Cc: opensaf-devel@lists.sourceforge.net
 
Subject: RE: [PATCH 1/1] log: logd crash due to well known stream has
numOpeners = 0 [#3018]

 

Hi Canh,

 

Thanks for your good finding.

 

There is other possibility that well-known streams can be deleted as well.
Looking at below code, proc_stream_open_msg().

 

rc = lgs_mds_msg_send(cb, , >fr_dest, >mds_ctxt,

   

Re: [devel] [PATCH 1/1] log: logd crash due to well known stream has numOpeners = 0 [#3018]

2019-03-26 Thread Lennart Lund
Hi Canh,

I don't really understand what you suggest and how this will keep standby and 
active in sync. However, there are two things I would like to mention:

  1.  It is not only the "well known streams" if you by this means the alarm, 
notification and system streams that must never have 0 openers. This applies to 
all streams having a configuration object in the IMM model.
  2.  If standby gets checkpoint data where something wrong can be detected, 
for example leading to numOpeners = 0 for a stream where this shall never 
happen, then standby and active is "out of synch". The "normal" behavior is 
then to exit the standby process in order to be restarted which will trig a 
"cold synch". Note that normally the whole node is restarted in such a case but 
it is possible to just restart the log server process. Check how this is 
handled in other places where an "out of synch" restart is done.

Regards
Lennart

From: Canh Van Truong 
Sent: den 25 mars 2019 04:50
To: Lennart Lund ; Vu Minh Nguyen 

Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 1/1] log: logd crash due to well known stream has 
numOpeners = 0 [#3018]

Thanks all,

With the case that causes the osaflogd crash (the "numOpeners" of "well know 
stream" is 0), I suggest solution is that when the standby process the 
checkpoint of open stream, it should always check if the ("numOpeners" ckpt 
from active) less than ("numOpeners" of standby + 1), The "numOpeners" should 
not use the value that ckpt from active. It should be ("numOpeners" of standby 
+ 1).  Because there is one or more than one streams that client has not 
removed these streams to its owned list on standby while these streams already 
removed on active node.   Is it ok?

@Lennart:  Yes, the numOpeners of "well know streams" should never become 0 as 
you said. But some unexpected case as I mention in previous email it may become 
0 due to ckpt problem.
Only "well know streams" should not become 0. Other configuration streams (app 
cfg stream) can till be 0 in case no client own this stream  and the user 
deletes the stream in IMM data base and it will call the callback to delete 
stream in lgd(e.g. immcfg -d ...)

@aVu: As your mention the case, although the "numOpeners" in standby is less 
than one with active node, but the stream has not associated with the client by 
"lgs_client_stream_add()"in standby node.  When the active node reboot or split 
and the standby node is up to active, the client is down and will not close 
stream because "stream_list_root" list does not have that stream. So if my 
thinking is correct, that case won't cause the issue happen ?
There is already a log ER if ckpt fail "LOG_ER("%s: MBCSV send FAILED rc=%u.", 
__FUNCTION__, rc);"

Thanks
Canh

From: Lennart Lund mailto:lennart.l...@ericsson.com>>
Sent: Friday, March 22, 2019 7:40 PM
To: Vu Minh Nguyen 
mailto:vu.m.ngu...@dektech.com.au>>; Canh Van 
Truong mailto:canh.v.tru...@dektech.com.au>>
Cc: 
opensaf-devel@lists.sourceforge.net;
 Lennart Lund mailto:lennart.l...@ericsson.com>>
Subject: RE: [PATCH 1/1] log: logd crash due to well known stream has 
numOpeners = 0 [#3018]

Hi Canh,

Just a small comment.
For "Well known streams" it shall never be possible that numOpeners becomes 0 
since the log service itself is one of the "openers" (the first opener) and 
that "opener" is never closed. This also applies for any configuration stream 
(well known streams are also configuration streams).

Regards
Lennart

From: Vu Minh Nguyen 
mailto:vu.m.ngu...@dektech.com.au>>
Sent: den 22 mars 2019 09:30
To: Canh Van Truong 
mailto:canh.v.tru...@dektech.com.au>>; Lennart 
Lund mailto:lennart.l...@ericsson.com>>
Cc: 
opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 1/1] log: logd crash due to well known stream has 
numOpeners = 0 [#3018]

Hi Canh,

Thanks for your good finding.

There is other possibility that well-known streams can be deleted as well. 
Looking at below code, proc_stream_open_msg().

rc = lgs_mds_msg_send(cb, , >fr_dest, >mds_ctxt,
MDS_SEND_PRIORITY_HIGH);

  // Checkpoint the opened stream
if (ais_rv == SA_AIS_OK) {
lgs_ckpt_stream_open(logStream, open_sync_param->client_id);
}

If the active node is rebooted or is split from the peer after sending OK reply 
to log agent *and* before forwarding the update to standby node,
the numberOpeners value at standby will be less than one comparing with the 
actual total number of connections toward that stream.

I think, we should log Warning or Error in case checkpoint data gets failed and 
never close well-known streams even the numberOpenners is zero(0).

Regards, Vu

From: Canh Van Truong 
mailto:canh.v.tru...@dektech.com.au>>
Sent: Friday, March 22, 2019 12:01 PM
To: 'Vu Minh Nguyen' 
mailto:vu.m.ngu...@dektech.com.au>>; 
lennart.l...@ericsson.com
Cc: 

Re: [devel] [PATCH 1/1] log: logd crash due to well known stream has numOpeners = 0 [#3018]

2019-03-24 Thread Canh Van Truong
Thanks all,

 

With the case that causes the osaflogd crash (the "numOpeners" of "well know
stream" is 0), I suggest solution is that when the standby process the
checkpoint of open stream, it should always check if the ("numOpeners" ckpt
from active) less than ("numOpeners" of standby + 1), The "numOpeners"
should not use the value that ckpt from active. It should be ("numOpeners"
of standby + 1).  Because there is one or more than one streams that client
has not removed these streams to its owned list on standby while these
streams already removed on active node.   Is it ok?

 

@Lennart:  Yes, the numOpeners of "well know streams" should never become 0
as you said. But some unexpected case as I mention in previous email it may
become 0 due to ckpt problem. 

Only "well know streams" should not become 0. Other configuration streams
(app cfg stream) can till be 0 in case no client own this stream  and the
user deletes the stream in IMM data base and it will call the callback to
delete stream in lgd(e.g. immcfg -d .)

 

@aVu: As your mention the case, although the "numOpeners" in standby is less
than one with active node, but the stream has not associated with the client
by "lgs_client_stream_add()"in standby node.  When the active node reboot or
split and the standby node is up to active, the client is down and will not
close stream because "stream_list_root" list does not have that stream. So
if my thinking is correct, that case won't cause the issue happen ?

There is already a log ER if ckpt fail "LOG_ER("%s: MBCSV send FAILED
rc=%u.", __FUNCTION__, rc);"

 

Thanks

Canh

 

From: Lennart Lund  
Sent: Friday, March 22, 2019 7:40 PM
To: Vu Minh Nguyen ; Canh Van Truong

Cc: opensaf-devel@lists.sourceforge.net; Lennart Lund

Subject: RE: [PATCH 1/1] log: logd crash due to well known stream has
numOpeners = 0 [#3018]

 

Hi Canh,

 

Just a small comment.

For "Well known streams" it shall never be possible that numOpeners becomes
0 since the log service itself is one of the "openers" (the first opener)
and that "opener" is never closed. This also applies for any configuration
stream (well known streams are also configuration streams).

 

Regards

Lennart

 

From: Vu Minh Nguyen mailto:vu.m.ngu...@dektech.com.au> > 
Sent: den 22 mars 2019 09:30
To: Canh Van Truong mailto:canh.v.tru...@dektech.com.au> >; Lennart Lund
mailto:lennart.l...@ericsson.com> >
Cc: opensaf-devel@lists.sourceforge.net
 
Subject: RE: [PATCH 1/1] log: logd crash due to well known stream has
numOpeners = 0 [#3018]

 

Hi Canh,

 

Thanks for your good finding.

 

There is other possibility that well-known streams can be deleted as well.
Looking at below code, proc_stream_open_msg().

 

rc = lgs_mds_msg_send(cb, , >fr_dest, >mds_ctxt,

MDS_SEND_PRIORITY_HIGH);

 

  // Checkpoint the opened stream

if (ais_rv == SA_AIS_OK) {

lgs_ckpt_stream_open(logStream, open_sync_param->client_id);

}

 

If the active node is rebooted or is split from the peer after sending OK
reply to log agent *and* before forwarding the update to standby node, 

the numberOpeners value at standby will be less than one comparing with the
actual total number of connections toward that stream.

 

I think, we should log Warning or Error in case checkpoint data gets failed
and never close well-known streams even the numberOpenners is zero(0).

 

Regards, Vu

 

From: Canh Van Truong mailto:canh.v.tru...@dektech.com.au> > 
Sent: Friday, March 22, 2019 12:01 PM
To: 'Vu Minh Nguyen' mailto:vu.m.ngu...@dektech.com.au> >; lennart.l...@ericsson.com
 
Cc: opensaf-devel@lists.sourceforge.net
 
Subject: RE: [PATCH 1/1] log: logd crash due to well known stream has
numOpeners = 0 [#3018]

 

Hi Lennart and aVu,

 

Thanks for your review.

I check the code again and guess the condition for the crash:

1/Current "numOpeners" of one well known stream = 2. This mean that just one
client open this stream. LOGD get close that stream request from client. The
closing stream is successful on active node. But checkpoiting to standby
have problem and cannot closing stream on standby. 

2/Now the "numOpeners" on active node is (1), on standby node is (2). And
still have one client own that well known stream on standby although the
client already closed the stream on active node. 

3/One other client open again that stream again. "numOpeners" on both active
node and standby is (2). Only one client own the stream, but 2 client own
that stream on standby.

4/Reboot active node and assume that 2 clients that is on active node, so 2
client will be downed. Standby node up to active will close that well known
stream 2 times with  numOpeners = 2. Then numOpeners will be zero for the
well known stream.

 

I don't see any more hint that cause the issue happen. 

 

Regards

Canh

From: Vu Minh Nguyen mailto:vu.m.ngu...@dektech.com.au> > 

Re: [devel] [PATCH 1/1] log: logd crash due to well known stream has numOpeners = 0 [#3018]

2019-03-22 Thread Lennart Lund
Hi Canh,

Just a small comment.
For "Well known streams" it shall never be possible that numOpeners becomes 0 
since the log service itself is one of the "openers" (the first opener) and 
that "opener" is never closed. This also applies for any configuration stream 
(well known streams are also configuration streams).

Regards
Lennart

From: Vu Minh Nguyen 
Sent: den 22 mars 2019 09:30
To: Canh Van Truong ; Lennart Lund 

Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 1/1] log: logd crash due to well known stream has 
numOpeners = 0 [#3018]

Hi Canh,

Thanks for your good finding.

There is other possibility that well-known streams can be deleted as well. 
Looking at below code, proc_stream_open_msg().

rc = lgs_mds_msg_send(cb, , >fr_dest, >mds_ctxt,
MDS_SEND_PRIORITY_HIGH);

  // Checkpoint the opened stream
if (ais_rv == SA_AIS_OK) {
lgs_ckpt_stream_open(logStream, open_sync_param->client_id);
}

If the active node is rebooted or is split from the peer after sending OK reply 
to log agent *and* before forwarding the update to standby node,
the numberOpeners value at standby will be less than one comparing with the 
actual total number of connections toward that stream.

I think, we should log Warning or Error in case checkpoint data gets failed and 
never close well-known streams even the numberOpenners is zero(0).

Regards, Vu

From: Canh Van Truong 
mailto:canh.v.tru...@dektech.com.au>>
Sent: Friday, March 22, 2019 12:01 PM
To: 'Vu Minh Nguyen' 
mailto:vu.m.ngu...@dektech.com.au>>; 
lennart.l...@ericsson.com
Cc: 
opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 1/1] log: logd crash due to well known stream has 
numOpeners = 0 [#3018]

Hi Lennart and aVu,

Thanks for your review.
I check the code again and guess the condition for the crash:
1/Current "numOpeners" of one well known stream = 2. This mean that just one 
client open this stream. LOGD get close that stream request from client. The 
closing stream is successful on active node. But checkpoiting to standby have 
problem and cannot closing stream on standby.
2/Now the "numOpeners" on active node is (1), on standby node is (2). And still 
have one client own that well known stream on standby although the client 
already closed the stream on active node.
3/One other client open again that stream again. "numOpeners" on both active 
node and standby is (2). Only one client own the stream, but 2 client own that 
stream on standby.
4/Reboot active node and assume that 2 clients that is on active node, so 2 
client will be downed. Standby node up to active will close that well known 
stream 2 times with  numOpeners = 2. Then numOpeners will be zero for the well 
known stream.

I don't see any more hint that cause the issue happen.

Regards
Canh
From: Vu Minh Nguyen 
mailto:vu.m.ngu...@dektech.com.au>>
Sent: Monday, March 18, 2019 2:14 PM
To: 'Canh Van Truong' 
mailto:canh.v.tru...@dektech.com.au>>; 
lennart.l...@ericsson.com
Cc: 
opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 1/1] log: logd crash due to well known stream has 
numOpeners = 0 [#3018]


Hi Canh,



Log stream is allocated with brackets, (), going with new operator; It means 
numOpeners field is already zero-initialized.

log_stream_t *stream = new (std::nothrow) log_stream_t();



So, I don't think your change will address the issue. The bug may locate at 
another place.



Regards, Vu



> -Original Message-

> From: Canh Van Truong 
> mailto:canh.v.tru...@dektech.com.au>>

> Sent: Thursday, March 14, 2019 6:49 PM

> To: lennart.l...@ericsson.com; 
> vu.m.ngu...@dektech.com.au

> Cc: 
> opensaf-devel@lists.sourceforge.net;
>  Canh Van Truong

> mailto:canh.v.tru...@dektech.com.au>>

> Subject: [PATCH 1/1] log: logd crash due to well known stream has

> numOpeners = 0 [#3018]

>

> When the stream is created, the numOpeners is not initialized and

> may be started with unexpected value (e.g max value of unsigned int32).

> It is not correct. That may cause when client close the well known stream

> and numOpeners may be 0. The crash happens.

>

> The "numOpeners" should be initialized with 0.

> ---

>  src/log/logd/lgs_stream.cc | 1 +

>  1 file changed, 1 insertion(+)

>

> diff --git a/src/log/logd/lgs_stream.cc b/src/log/logd/lgs_stream.cc

> index 28344c8cd..8ad0757b9 100644

> --- a/src/log/logd/lgs_stream.cc

> +++ b/src/log/logd/lgs_stream.cc

> @@ -719,6 +719,7 @@ log_stream_t *log_stream_new(const std::string

> , int stream_id) {

>stream->severityFilter = 0x7f; /* by default all levels are allowed */

>stream->isRtStream = SA_FALSE;

>stream->dest_names.clear();

> +  stream->numOpeners = 0;  // Set the number of openers is 0 at 

Re: [devel] [PATCH 1/1] log: logd crash due to well known stream has numOpeners = 0 [#3018]

2019-03-22 Thread Vu Minh Nguyen
Hi Canh,

 

Thanks for your good finding.

 

There is other possibility that well-known streams can be deleted as well.
Looking at below code, proc_stream_open_msg().

 

rc = lgs_mds_msg_send(cb, , >fr_dest, >mds_ctxt,

MDS_SEND_PRIORITY_HIGH);

 

  // Checkpoint the opened stream

if (ais_rv == SA_AIS_OK) {

lgs_ckpt_stream_open(logStream, open_sync_param->client_id);

}

 

If the active node is rebooted or is split from the peer after sending OK
reply to log agent *and* before forwarding the update to standby node, 

the numberOpeners value at standby will be less than one comparing with the
actual total number of connections toward that stream.

 

I think, we should log Warning or Error in case checkpoint data gets failed
and never close well-known streams even the numberOpenners is zero(0).

 

Regards, Vu

 

From: Canh Van Truong  
Sent: Friday, March 22, 2019 12:01 PM
To: 'Vu Minh Nguyen' ; lennart.l...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 1/1] log: logd crash due to well known stream has
numOpeners = 0 [#3018]

 

Hi Lennart and aVu,

 

Thanks for your review.

I check the code again and guess the condition for the crash:

1/Current "numOpeners" of one well known stream = 2. This mean that just one
client open this stream. LOGD get close that stream request from client. The
closing stream is successful on active node. But checkpoiting to standby
have problem and cannot closing stream on standby. 

2/Now the "numOpeners" on active node is (1), on standby node is (2). And
still have one client own that well known stream on standby although the
client already closed the stream on active node. 

3/One other client open again that stream again. "numOpeners" on both active
node and standby is (2). Only one client own the stream, but 2 client own
that stream on standby.

4/Reboot active node and assume that 2 clients that is on active node, so 2
client will be downed. Standby node up to active will close that well known
stream 2 times with  numOpeners = 2. Then numOpeners will be zero for the
well known stream.

 

I don't see any more hint that cause the issue happen. 

 

Regards

Canh

From: Vu Minh Nguyen mailto:vu.m.ngu...@dektech.com.au> > 
Sent: Monday, March 18, 2019 2:14 PM
To: 'Canh Van Truong' mailto:canh.v.tru...@dektech.com.au> >; lennart.l...@ericsson.com
 
Cc: opensaf-devel@lists.sourceforge.net
 
Subject: RE: [PATCH 1/1] log: logd crash due to well known stream has
numOpeners = 0 [#3018]

 

Hi Canh,

 

Log stream is allocated with brackets, (), going with new operator; It means
numOpeners field is already zero-initialized. 

log_stream_t *stream = new (std::nothrow) log_stream_t();

 

So, I don't think your change will address the issue. The bug may locate at
another place.

 

Regards, Vu

 

> -Original Message-

> From: Canh Van Truong mailto:canh.v.tru...@dektech.com.au> >

> Sent: Thursday, March 14, 2019 6:49 PM

> To: lennart.l...@ericsson.com  ;
vu.m.ngu...@dektech.com.au  

> Cc: opensaf-devel@lists.sourceforge.net
 ; Canh Van Truong

> mailto:canh.v.tru...@dektech.com.au> >

> Subject: [PATCH 1/1] log: logd crash due to well known stream has

> numOpeners = 0 [#3018]

> 

> When the stream is created, the numOpeners is not initialized and

> may be started with unexpected value (e.g max value of unsigned int32).

> It is not correct. That may cause when client close the well known stream

> and numOpeners may be 0. The crash happens.

> 

> The "numOpeners" should be initialized with 0.

> ---

>  src/log/logd/lgs_stream.cc | 1 +

>  1 file changed, 1 insertion(+)

> 

> diff --git a/src/log/logd/lgs_stream.cc b/src/log/logd/lgs_stream.cc

> index 28344c8cd..8ad0757b9 100644

> --- a/src/log/logd/lgs_stream.cc

> +++ b/src/log/logd/lgs_stream.cc

> @@ -719,6 +719,7 @@ log_stream_t *log_stream_new(const std::string

> , int stream_id) {

>stream->severityFilter = 0x7f; /* by default all levels are allowed */

>stream->isRtStream = SA_FALSE;

>stream->dest_names.clear();

> +  stream->numOpeners = 0;  // Set the number of openers is 0 at creating

> stream

> 

>/* Initiate local or shared stream file descriptor dependant on shared
or

> * split file system

> --

> 2.15.1

 


___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] log: logd crash due to well known stream has numOpeners = 0 [#3018]

2019-03-21 Thread Canh Van Truong
Hi Lennart and aVu,

 

Thanks for your review.

I check the code again and guess the condition for the crash:

1/Current "numOpeners" of one well known stream = 2. This mean that just one
client open this stream. LOGD get close that stream request from client. The
closing stream is successful on active node. But checkpoiting to standby
have problem and cannot closing stream on standby. 

2/Now the "numOpeners" on active node is (1), on standby node is (2). And
still have one client own that well known stream on standby although the
client already closed the stream on active node. 

3/One other client open again that stream again. "numOpeners" on both active
node and standby is (2). Only one client own the stream, but 2 client own
that stream on standby.

4/Reboot active node and assume that 2 clients that is on active node, so 2
client will be downed. Standby node up to active will close that well known
stream 2 times with  numOpeners = 2. Then numOpeners will be zero for the
well known stream.

 

I don't see any more hint that cause the issue happen. 

 

Regards

Canh

From: Vu Minh Nguyen  
Sent: Monday, March 18, 2019 2:14 PM
To: 'Canh Van Truong' ;
lennart.l...@ericsson.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: RE: [PATCH 1/1] log: logd crash due to well known stream has
numOpeners = 0 [#3018]

 

Hi Canh,

 

Log stream is allocated with brackets, (), going with new operator; It means
numOpeners field is already zero-initialized. 

log_stream_t *stream = new (std::nothrow) log_stream_t();

 

So, I don't think your change will address the issue. The bug may locate at
another place.

 

Regards, Vu

 

> -Original Message-

> From: Canh Van Truong mailto:canh.v.tru...@dektech.com.au> >

> Sent: Thursday, March 14, 2019 6:49 PM

> To: lennart.l...@ericsson.com  ;
vu.m.ngu...@dektech.com.au  

> Cc: opensaf-devel@lists.sourceforge.net
 ; Canh Van Truong

> mailto:canh.v.tru...@dektech.com.au> >

> Subject: [PATCH 1/1] log: logd crash due to well known stream has

> numOpeners = 0 [#3018]

> 

> When the stream is created, the numOpeners is not initialized and

> may be started with unexpected value (e.g max value of unsigned int32).

> It is not correct. That may cause when client close the well known stream

> and numOpeners may be 0. The crash happens.

> 

> The "numOpeners" should be initialized with 0.

> ---

>  src/log/logd/lgs_stream.cc | 1 +

>  1 file changed, 1 insertion(+)

> 

> diff --git a/src/log/logd/lgs_stream.cc b/src/log/logd/lgs_stream.cc

> index 28344c8cd..8ad0757b9 100644

> --- a/src/log/logd/lgs_stream.cc

> +++ b/src/log/logd/lgs_stream.cc

> @@ -719,6 +719,7 @@ log_stream_t *log_stream_new(const std::string

> , int stream_id) {

>stream->severityFilter = 0x7f; /* by default all levels are allowed */

>stream->isRtStream = SA_FALSE;

>stream->dest_names.clear();

> +  stream->numOpeners = 0;  // Set the number of openers is 0 at creating

> stream

> 

>/* Initiate local or shared stream file descriptor dependant on shared
or

> * split file system

> --

> 2.15.1

 


___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] log: logd crash due to well known stream has numOpeners = 0 [#3018]

2019-03-18 Thread Vu Minh Nguyen
Hi Canh,

 

Log stream is allocated with brackets, (), going with new operator; It means
numOpeners field is already zero-initialized. 

log_stream_t *stream = new (std::nothrow) log_stream_t();

 

So, I don't think your change will address the issue. The bug may locate at
another place.

 

Regards, Vu

 

> -Original Message-

> From: Canh Van Truong 

> Sent: Thursday, March 14, 2019 6:49 PM

> To: lennart.l...@ericsson.com; vu.m.ngu...@dektech.com.au

> Cc: opensaf-devel@lists.sourceforge.net; Canh Van Truong

> 

> Subject: [PATCH 1/1] log: logd crash due to well known stream has

> numOpeners = 0 [#3018]

> 

> When the stream is created, the numOpeners is not initialized and

> may be started with unexpected value (e.g max value of unsigned int32).

> It is not correct. That may cause when client close the well known stream

> and numOpeners may be 0. The crash happens.

> 

> The "numOpeners" should be initialized with 0.

> ---

>  src/log/logd/lgs_stream.cc | 1 +

>  1 file changed, 1 insertion(+)

> 

> diff --git a/src/log/logd/lgs_stream.cc b/src/log/logd/lgs_stream.cc

> index 28344c8cd..8ad0757b9 100644

> --- a/src/log/logd/lgs_stream.cc

> +++ b/src/log/logd/lgs_stream.cc

> @@ -719,6 +719,7 @@ log_stream_t *log_stream_new(const std::string

> , int stream_id) {

>stream->severityFilter = 0x7f; /* by default all levels are allowed */

>stream->isRtStream = SA_FALSE;

>stream->dest_names.clear();

> +  stream->numOpeners = 0;  // Set the number of openers is 0 at creating

> stream

> 

>/* Initiate local or shared stream file descriptor dependant on shared
or

> * split file system

> --

> 2.15.1

 


___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] log: logd crash due to well known stream has numOpeners = 0 [#3018]

2019-03-15 Thread Lennart Lund
Hi Canh,

Ack

Thanks
Lennart

-Original Message-
From: Canh Van Truong  
Sent: den 14 mars 2019 12:49
To: Lennart Lund ; Vu Minh Nguyen 

Cc: opensaf-devel@lists.sourceforge.net; Canh Van Truong 

Subject: [PATCH 1/1] log: logd crash due to well known stream has numOpeners = 
0 [#3018]

When the stream is created, the numOpeners is not initialized and may be 
started with unexpected value (e.g max value of unsigned int32).
It is not correct. That may cause when client close the well known stream and 
numOpeners may be 0. The crash happens.

The "numOpeners" should be initialized with 0.
---
 src/log/logd/lgs_stream.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/log/logd/lgs_stream.cc b/src/log/logd/lgs_stream.cc index 
28344c8cd..8ad0757b9 100644
--- a/src/log/logd/lgs_stream.cc
+++ b/src/log/logd/lgs_stream.cc
@@ -719,6 +719,7 @@ log_stream_t *log_stream_new(const std::string , int 
stream_id) {
   stream->severityFilter = 0x7f; /* by default all levels are allowed */
   stream->isRtStream = SA_FALSE;
   stream->dest_names.clear();
+  stream->numOpeners = 0;  // Set the number of openers is 0 at 
+ creating stream
 
   /* Initiate local or shared stream file descriptor dependant on shared or
* split file system
--
2.15.1



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 1/1] log: logd crash due to well known stream has numOpeners = 0 [#3018]

2019-03-14 Thread Canh Van Truong
When the stream is created, the numOpeners is not initialized and
may be started with unexpected value (e.g max value of unsigned int32).
It is not correct. That may cause when client close the well known stream
and numOpeners may be 0. The crash happens.

The "numOpeners" should be initialized with 0.
---
 src/log/logd/lgs_stream.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/log/logd/lgs_stream.cc b/src/log/logd/lgs_stream.cc
index 28344c8cd..8ad0757b9 100644
--- a/src/log/logd/lgs_stream.cc
+++ b/src/log/logd/lgs_stream.cc
@@ -719,6 +719,7 @@ log_stream_t *log_stream_new(const std::string , int 
stream_id) {
   stream->severityFilter = 0x7f; /* by default all levels are allowed */
   stream->isRtStream = SA_FALSE;
   stream->dest_names.clear();
+  stream->numOpeners = 0;  // Set the number of openers is 0 at creating stream
 
   /* Initiate local or shared stream file descriptor dependant on shared or
* split file system
-- 
2.15.1



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel