I tried to reproduce this issue on OpenSAF 5.18.04, but could not.
1) The setup has two payloads (PL-3 and PL-4) and one controller (SC-1), with
PBE enabled on SC-1.
2) Using the demo app in a loop, I write to the checkpoint opened on PL-3 and
read it from PL-4.
a) PL-3:
root@mohan-VirtualBox:/home/mohan/ticket1733# ./tkt 1
Writing to Checkpoint safCkpt=DemoCkpt,safApp=safCkptService
....
Section-Id = 11 ....
CheckpointData being written = "************ This is the
saCkptCheckpointTrackCallback demo ***********"
DataOffset = 0 ....
saCkptCheckpointWrite PASSED
Writing to Checkpoint safCkpt=DemoCkpt,safApp=safCkptService
....
Section-Id = 11 ....
CheckpointData being written = "************ This is the
saCkptCheckpointTrackCallback demo ***********"
DataOffset = 0 ....
saCkptCheckpointWrite PASSED
Writing to Checkpoint safCkpt=DemoCkpt,safApp=safCkptService
....
Section-Id = 11 ....
CheckpointData being written = "************ This is the
saCkptCheckpointTrackCallback demo ***********"
DataOffset = 0 ....
saCkptCheckpointWrite PASSED
b) PL-4:
root@mohan-VirtualBox:/home/mohan/ticket1733# ./tkt 2
*******************************************************************
Demonstrating Checkpoint Service Usage with a Track Callback
*******************************************************************
Initialising With Checkpoint Service....
Opening Collocated Checkpoint = safCkpt=DemoCkpt,safApp=safCkptService with
create flags....
saCkptTrack being enabled ....
Checkpoint being on Select .....
saCkptTrack being enabled ....
Checkpoint being on Select .....
Ckpt TrackCallback received for SectionId`s =
Section-Id = 11 ....
SectionId-idLen = 2 ....
Reading from Checkpoint safCkpt=DemoCkpt,safApp=safCkptService Ckpt handle
=1956C00 ....
CheckpointRead Checkpoint TrackCallback processed
CheckpointData was written in sectionId: 11 = "************ This is the
saCkptCheckpointTrackCallback demo ***********"
3) I ran the following command:
immcfg -a saAmfSGCompRestartMax=1000 safSg=NoRed,safApp=OpenSAF
and then killed the checkpoint node director (ckptnd) a number of times on both
payloads (PL-3 and PL-4), but neither node was rebooted.
4) root@mohan-VirtualBox:~# amf-state siass
safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed4,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
5) PL-3:
root@mohan-VirtualBox:~# /etc/init.d/opensafd start
[ ok ] Starting opensafd (via systemctl): opensafd.service.
root@mohan-VirtualBox:~# immcfg -a saAmfSGCompRestartMax=1000
safSg=NoRed,safApp=openSAF
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~#
6) PL-4:
root@mohan-VirtualBox:~# /etc/init.d/opensafd start
[ ok ] Starting opensafd (via systemctl): opensafd.service.
root@mohan-VirtualBox:~# immcfg -a saAmfSGCompRestartMax=1000
safSg=NoRed,safApp=openSAF
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
root@mohan-VirtualBox:~# pkill -9 ckptnd
Since no trace information is available (there is no ckptnd trace), further
debugging is not possible.
I am therefore closing this ticket. Please reopen it with traces if the issue
is reproduced.
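
For anyone retrying this, the `amf-state siass` check from step 4 can be scripted so it is easy to repeat after each round of kills. This is only a minimal sketch: it assumes the output keeps the three-line shape shown above (a `safSISU=` line followed by its `saAmf...` attribute lines), and the function names and `SAMPLE` text are hypothetical helpers, not part of OpenSAF.

```python
# Sample text copied from the step 4 output (PL-3 and PL-4 entries only).
SAMPLE = r"""safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
safSISU=safSu=PL-4\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed4,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
"""


def parse_siass(text):
    """Return a list of (dn, attrs) pairs from `amf-state siass` output."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()  # tolerate indented attribute lines
        if line.startswith("safSISU="):
            entries.append((line, {}))
        elif entries and line.startswith("saAmf") and "=" in line:
            key, _, value = line.partition("=")
            entries[-1][1][key] = value
    return entries


def nodes_without_active_assignment(text, nodes):
    """Return the node names that have no ACTIVE, ready SI assignment.

    Assumption: each node's SU appears in the DN as `safSu=<node>\\,`.
    """
    entries = parse_siass(text)
    missing = []
    for node in nodes:
        marker = "safSu=" + node + "\\,"
        ok = any(
            marker in dn
            and attrs.get("saAmfSISUHAState", "").startswith("ACTIVE")
            and attrs.get("saAmfSISUHAReadinessState", "").startswith(
                "READY_FOR_ASSIGNMENT"
            )
            for dn, attrs in entries
        )
        if not ok:
            missing.append(node)
    return missing
```

Feeding it the captured output of `amf-state siass` together with `["PL-3", "PL-4"]` should return an empty list as long as both payloads still hold their assignments. A payload that has dropped out of the listing is reported by name.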
---
** [tickets:#1733] Payload got rebooted when cpnd is killed on payload**
**Status:** accepted
**Milestone:** 5.18.08
**Created:** Wed Apr 06, 2016 11:05 AM UTC by Madhurika Koppula
**Last Updated:** Thu Aug 23, 2018 08:30 AM UTC
**Owner:** mohan kanakam
**Attachments:**
- [cpsv.tgz](https://sourceforge.net/p/opensaf/tickets/1733/attachment/cpsv.tgz) (15.0 MB; application/octet-stream)
Setup:
Changeset- 7436
Version - opensaf 5.0
4 nodes configured with single PBE
Issue observed (it occurs randomly):
1) When CPND is killed on a payload, the component restart of CPND fails
because the component registration timer expires.
2) The node then goes for a reboot. A test application was running at the time.
Below is the syslog from PL-4:
Apr 6 10:52:00 OEL_M-SLOT-4 osafamfnd[3015]: NO
'safSu=PL-4,safSg=NoRed,safApp=OpenSAF' component restart probation timer
started (timeout: 60000000000 ns)
Apr 6 10:52:00 OEL_M-SLOT-4 osafamfnd[3015]: NO Restarting a component of
'safSu=PL-4,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
Apr 6 10:52:00 OEL_M-SLOT-4 osafamfnd[3015]: NO
'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' :
Recovery is 'componentRestart'
Apr 6 10:52:00 OEL_M-SLOT-4 osafckptnd[6263]: Started
Apr 6 10:52:10 OEL_M-SLOT-4 osafamfnd[3015]: NO Instantiation of
'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' failed
Apr 6 10:52:10 OEL_M-SLOT-4 osafamfnd[3015]: NO Reason: component registration
timer expired
Apr 6 10:52:10 OEL_M-SLOT-4 osafckptnd[6294]: Started
Apr 6 10:52:20 OEL_M-SLOT-4 osafamfnd[3015]: NO Instantiation of
'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' failed
Apr 6 10:52:20 OEL_M-SLOT-4 osafamfnd[3015]: NO Reason: component registration
timer expired
Apr 6 10:52:20 OEL_M-SLOT-4 osafamfnd[3015]: WA
'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' Presence State RESTARTING
=> INSTANTIATION_FAILED
Apr 6 10:52:20 OEL_M-SLOT-4 osafamfnd[3015]: NO Component Failover trigerred
for 'safSu=PL-4,safSg=NoRed,safApp=OpenSAF': Failed component:
'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF'
Apr 6 10:52:20 OEL_M-SLOT-4 osafamfnd[3015]: ER
'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF'got Inst failed
Apr 6 10:52:20 OEL_M-SLOT-4 osafamfnd[3015]: Rebooting OpenSAF NodeId = 132111
EE Name = , Reason: NCS component Instantiation failed, OwnNodeId = 132111,
SupervisionTime = 60
Apr 6 10:52:20 OEL_M-SLOT-4 opensaf_reboot: Rebooting local node; timeout=60
Apr 6 10:52:46 OEL_M-SLOT-4 kernel: imklog 5.8.10, log source = /proc/kmsg
started.
3) Below is the syslog from the active controller:
Apr 6 10:51:59 OEL_M-SLOT-1 osafimmd[6916]: WA No coordinator IMMND known
(case B) - ignoring sync request
Apr 6 10:51:59 OEL_M-SLOT-1 osafimmd[6916]: NO Node 2040f request sync
sync-pid:2980 epoch:0
Apr 6 10:52:24 OEL_M-SLOT-1 kernel: TIPC: Resetting link
<1.1.1:eth3-1.1.4:eth3>, peer not responding
Apr 6 10:52:24 OEL_M-SLOT-1 kernel: TIPC: Lost link <1.1.1:eth3-1.1.4:eth3> on
network plane A
Apr 6 10:52:24 OEL_M-SLOT-1 kernel: TIPC: Lost contact with <1.1.4>
Apr 6 10:52:24 OEL_M-SLOT-1 osafamfd[7003]: NO Node 'PL-4' left the cluster
Apr 6 10:52:24 OEL_M-SLOT-1 osafclmd[6988]: NO Node 132111 went down. Not
sending track callback for agents on that node
Apr 6 10:52:24 OEL_M-SLOT-1 osafclmd[6988]: NO Node 132111 went down. Not
sending track callback for agents on that node
Apr 6 10:52:24 OEL_M-SLOT-1 osafimmnd[3728]: NO Global discard node received
for nodeId:2040f pid:2980
Apr 6 10:52:24 OEL_M-SLOT-1 osafimmnd[3728]: NO Implementer connected: 1539
(MsgQueueService132111) <12283, 2010f>
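
The PL-4 log above shows each `osafckptnd` start followed by a CPND instantiation failure from `osafamfnd`. The gap between the two can be pulled out of a syslog file with a short script. This is a minimal sketch, assuming each syslog message sits on one line (the wrapping above comes from the email) and that all entries fall on the same day; `cpnd_restart_attempts` and `SAMPLE_LOG` are hypothetical names used for illustration.

```python
import re

# Lines taken from the PL-4 log above, with the wrapped messages rejoined.
SAMPLE_LOG = [
    "Apr 6 10:52:00 OEL_M-SLOT-4 osafckptnd[6263]: Started",
    "Apr 6 10:52:10 OEL_M-SLOT-4 osafamfnd[3015]: NO Instantiation of "
    "'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' failed",
    "Apr 6 10:52:10 OEL_M-SLOT-4 osafckptnd[6294]: Started",
    "Apr 6 10:52:20 OEL_M-SLOT-4 osafamfnd[3015]: NO Instantiation of "
    "'safComp=CPND,safSu=PL-4,safSg=NoRed,safApp=OpenSAF' failed",
]

LINE_RE = re.compile(
    r"^(?P<mon>\w{3})\s+(?P<day>\d+)\s+"
    r"(?P<h>\d\d):(?P<m>\d\d):(?P<s>\d\d)\s+"
    r"(?P<host>\S+)\s+(?P<proc>[\w-]+)\[(?P<pid>\d+)\]:\s+(?P<msg>.*)$"
)


def seconds_of_day(match):
    """Convert the HH:MM:SS part of a matched line to seconds since midnight."""
    return int(match["h"]) * 3600 + int(match["m"]) * 60 + int(match["s"])


def cpnd_restart_attempts(lines):
    """Pair each osafckptnd start with the next CPND instantiation failure.

    Returns a list of (ckptnd_pid, seconds_until_failure) tuples.
    """
    attempts = []
    pending = None  # (pid, start time) of the latest unmatched start
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # lines without a proc[pid] prefix are skipped
        msg = m["msg"].strip()
        if m["proc"] == "osafckptnd" and msg == "Started":
            pending = (int(m["pid"]), seconds_of_day(m))
        elif (
            pending is not None
            and m["proc"] == "osafamfnd"
            and "Instantiation of" in msg
            and "safComp=CPND" in msg
            and msg.endswith("failed")
        ):
            attempts.append((pending[0], seconds_of_day(m) - pending[1]))
            pending = None
    return attempts
```

On the sample lines this reports both attempts (pids 6263 and 6294) failing 10 seconds after the start, which matches the "component registration timer expired" reason logged by `osafamfnd`.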