I tried to reproduce this issue on opensaf version 5.18.04, but could not
reproduce it.

   1) I tried to reproduce this issue on two payloads (PL-3 and PL-4) and one
controller (SC-1), with PBE enabled on SC-1.
   2) Using the demo app in a loop, I wrote to the checkpoint opened on PL-3
and read it from PL-4.
        a) PL-3:
                root@mohan-VirtualBox:/home/mohan/ticket1733# ./tkt 1
                Writing to Checkpoint safCkpt=DemoCkpt,safApp=safCkptService ....
                Section-Id = 11 ....
                CheckpointData being written = "************ This is the saCkptCheckpointTrackCallback demo ***********"
                DataOffset = 0 ....
                saCkptCheckpointWrite PASSED
                Writing to Checkpoint safCkpt=DemoCkpt,safApp=safCkptService ....
                Section-Id = 11 ....
                CheckpointData being written = "************ This is the saCkptCheckpointTrackCallback demo ***********"
                DataOffset = 0 ....
                saCkptCheckpointWrite PASSED
                Writing to Checkpoint safCkpt=DemoCkpt,safApp=safCkptService ....
                Section-Id = 11 ....
                CheckpointData being written = "************ This is the saCkptCheckpointTrackCallback demo ***********"
                DataOffset = 0 ....
                saCkptCheckpointWrite PASSED
        b) PL-4:
root@mohan-VirtualBox:/home/mohan/ticket1733# ./tkt 2
*******************************************************************
Demonstrating Checkpoint Service Usage with a Track Callback
*******************************************************************
Initialising With Checkpoint Service....
Opening Collocated Checkpoint = safCkpt=DemoCkpt,safApp=safCkptService with create flags....
saCkptTrack being enabled ....
Checkpoint being on Select .....
saCkptTrack being enabled ....
Checkpoint being on Select .....
Ckpt TrackCallback received for SectionId`s =
Section-Id = 11 ....
SectionId-idLen = 2 ....
Reading from Checkpoint safCkpt=DemoCkpt,safApp=safCkptService  Ckpt handle =1956C00 ....

CheckpointRead Checkpoint TrackCallback processed
CheckpointData was written in sectionId: 11 = "************ This is the saCkptCheckpointTrackCallback demo ***********"


   3) I ran the following command:
        immcfg -a saAmfSGCompRestartMax=1000 safSg=NoRed,safApp=OpenSAF
      and then killed the checkpoint node director (ckptnd) a number of times
      on both payloads (PL-3 and PL-4), but neither node was rebooted.
   4) SI assignment state:
root@mohan-VirtualBox:~# amf-state siass
safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed4,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)
        saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
   5) PL-3:

root@mohan-VirtualBox:~# /etc/init.d/opensafd start
[ ok ] Starting opensafd (via systemctl): opensafd.service.
root@mohan-VirtualBox:~# immcfg -a saAmfSGCompRestartMax=1000 safSg=NoRed,safApp=openSAF
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~#

   6) PL-4:
root@mohan-VirtualBox:~# /etc/init.d/opensafd start
[ ok ] Starting opensafd (via systemctl): opensafd.service.
root@mohan-VirtualBox:~# immcfg -a saAmfSGCompRestartMax=1000 safSg=NoRed,safApp=openSAF
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
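
The kills in steps 5 and 6 were issued by hand. Below is a minimal sketch of automating them, assuming a POSIX shell with procps `pkill`/`pgrep` on the payload; `kill_ckptnd_loop` and the file name `kill_ckptnd_loop.sh` are hypothetical and not part of OpenSAF. The default wait of 15 s is an assumption chosen to exceed the ~10 s gap between `osafckptnd ... Started` and `component registration timer expired` in the PL-4 syslog of the original report.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenSAF): kill ckptnd repeatedly and
# report whether a ckptnd process exists again after each kill.
#   kill_ckptnd_loop COUNT DELAY
#   COUNT: number of kills (16 were done by hand on PL-3, 13 on PL-4)
#   DELAY: seconds to wait after each kill before checking (assumed 15 s,
#          longer than the ~10 s registration attempt seen in the PL-4 log)
kill_ckptnd_loop() {
    count=${1:-16}
    delay=${2:-15}
    i=1
    while [ "$i" -le "$count" ]; do
        # pkill matches the pattern against the process name, so "ckptnd"
        # also matches "osafckptnd"; "|| true" ignores the no-match status.
        pkill -9 ckptnd || true
        sleep "$delay"
        if pgrep ckptnd >/dev/null 2>&1; then
            echo "kill $i: ckptnd running"
        else
            echo "kill $i: ckptnd NOT running"
        fi
        i=$((i + 1))
    done
}
```

For example, `. ./kill_ckptnd_loop.sh && kill_ckptnd_loop 16 15` on PL-3. A "running" line only means a process exists, not that it has registered with AMF, so `amf-state siass` and the syslog remain the place to confirm whether the node rebooted.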

Since there is no trace information (no ckptnd traces are attached), further
debugging is not possible.
So I am closing this ticket; please reopen it with traces if the issue is
reproduced.


---

** [tickets:#1733] Payload got rebooted when cpnd is killed on payload**

**Status:** accepted
**Milestone:** 5.18.08
**Created:** Wed Apr 06, 2016 11:05 AM UTC by Madhurika Koppula
**Last Updated:** Thu Aug 23, 2018 08:30 AM UTC
**Owner:** mohan kanakam
**Attachments:**

- [cpsv.tgz](https://sourceforge.net/p/opensaf/tickets/1733/attachment/cpsv.tgz) (15.0 MB; application/octet-stream)


Setup:
Changeset - 7436
Version - opensaf 5.0
4 nodes configured with single PBE

Issue Observed: The issue is intermittent (it occurs randomly).

1) When CPND is killed on a payload, the component restart of CPND fails
because the component registration timer expires.
2) The node then goes for a reboot. A test application was running at the
time.

Below are the syslog messages from PL-4:

Apr  6 10:52:00 OEL_M-SLOT-4 osafamfnd[3015]: NO 'safSu=PL-4,safSg=NoRed,safApp=OpenSAF' component restart probation timer started (timeout: 60000000000 ns)
Apr  6 10:52:00 OEL_M-SLOT-4 osafamfnd[3015]: NO Restarting a component of 'safSu=PL-4,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
Apr  6 10:52:00 OEL_M-SLOT-4 osafamfnd[3015]: NO 'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'componentRestart'
Apr  6 10:52:00 OEL_M-SLOT-4 osafckptnd[6263]: Started
Apr  6 10:52:10 OEL_M-SLOT-4 osafamfnd[3015]: NO Instantiation of 'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' failed
Apr  6 10:52:10 OEL_M-SLOT-4 osafamfnd[3015]: NO Reason: component registration timer expired
Apr  6 10:52:10 OEL_M-SLOT-4 osafckptnd[6294]: Started
Apr  6 10:52:20 OEL_M-SLOT-4 osafamfnd[3015]: NO Instantiation of 'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' failed
Apr  6 10:52:20 OEL_M-SLOT-4 osafamfnd[3015]: NO Reason: component registration timer expired
Apr  6 10:52:20 OEL_M-SLOT-4 osafamfnd[3015]: WA 'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' Presence State RESTARTING => INSTANTIATION_FAILED
Apr  6 10:52:20 OEL_M-SLOT-4 osafamfnd[3015]: NO Component Failover trigerred for 'safSu=PL-4,safSg=NoRed,safApp=OpenSAF': Failed component: 'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF'
Apr  6 10:52:20 OEL_M-SLOT-4 osafamfnd[3015]: ER 'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF'got Inst failed
Apr  6 10:52:20 OEL_M-SLOT-4 osafamfnd[3015]: Rebooting OpenSAF NodeId = 132111 EE Name = , Reason: NCS component Instantiation failed, OwnNodeId = 132111, SupervisionTime = 60
Apr  6 10:52:20 OEL_M-SLOT-4 opensaf_reboot: Rebooting local node; timeout=60
Apr  6 10:52:46 OEL_M-SLOT-4 kernel: imklog 5.8.10, log source = /proc/kmsg started.
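
When reopening with new data, the relevant escalation sequence (component restart, failed instantiations, node reboot) can be pulled out of a payload's syslog with grep. This is a minimal sketch: the helper names `cpnd_restart_events` and `cpnd_inst_failures` are hypothetical, the message texts are copied from the PL-4 log above, and the default path `/var/log/messages` is an assumption (Debian/Ubuntu systems typically use `/var/log/syslog`). It expects each syslog message on one line, as in the raw log file.

```shell
#!/bin/sh
# Hypothetical helpers: extract the CPND restart/escalation sequence that
# osafamfnd logged on PL-4 in this ticket. Log path default is an assumption.

# Print the osafamfnd lines for component restart, CPND instantiation
# failure, registration-timer expiry and node reboot.
cpnd_restart_events() {
    log=${1:-/var/log/messages}
    grep -E "osafamfnd\[[0-9]+\]: .*(Restarting a component|Instantiation of 'safComp=CPND|component registration timer expired|Rebooting OpenSAF)" "$log"
}

# Count failed CPND instantiation attempts (two in the PL-4 log above,
# 10 s apart, before AMF escalated to a node reboot).
cpnd_inst_failures() {
    grep -c "Instantiation of 'safComp=CPND,.*' failed" "${1:-/var/log/messages}"
}
```

For example, `cpnd_restart_events /var/log/messages` on PL-4 after a reboot shows whether the restart escalated the same way as in the original report.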

3) Below are the syslog messages from the ACTIVE controller:

Apr  6 10:51:59 OEL_M-SLOT-1 osafimmd[6916]: WA No coordinator IMMND known (case B) - ignoring sync request
Apr  6 10:51:59 OEL_M-SLOT-1 osafimmd[6916]: NO Node 2040f request sync sync-pid:2980 epoch:0
Apr  6 10:52:24 OEL_M-SLOT-1 kernel: TIPC: Resetting link <1.1.1:eth3-1.1.4:eth3>, peer not responding
Apr  6 10:52:24 OEL_M-SLOT-1 kernel: TIPC: Lost link <1.1.1:eth3-1.1.4:eth3> on network plane A
Apr  6 10:52:24 OEL_M-SLOT-1 kernel: TIPC: Lost contact with <1.1.4>
Apr  6 10:52:24 OEL_M-SLOT-1 osafamfd[7003]: NO Node 'PL-4' left the cluster
Apr  6 10:52:24 OEL_M-SLOT-1 osafclmd[6988]: NO Node 132111 went down. Not sending track callback for agents on that node
Apr  6 10:52:24 OEL_M-SLOT-1 osafclmd[6988]: NO Node 132111 went down. Not sending track callback for agents on that node
Apr  6 10:52:24 OEL_M-SLOT-1 osafimmnd[3728]: NO Global discard node received for nodeId:2040f pid:2980
Apr  6 10:52:24 OEL_M-SLOT-1 osafimmnd[3728]: NO Implementer connected: 1539 (MsgQueueService132111) <12283, 2010f>



---

Sent from sourceforge.net because [email protected] is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets