As I stated in the README.txt:
3. For the BigMerged case (where the larger configuration is merged into
the imm.xml using immxml-merge commands before starting the cluster), and on
the SC-1 controller node:
After installing and starting the SC-1 controller, the modified status
command, all on one line:
reset && sudo /etc/init.d/opensafd status >/tmp/osafstat.txt && perl -0777 -pi -e 's/(\w)\n\s+(saAmfSISUHAState=)/$1,$2/mgx' /tmp/osafstat.txt && sort </tmp/osafstat.txt && wc -l </tmp/osafstat.txt
gives:
safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF,saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF,saAmfSISUHAState=ACTIVE(1)
2
(charles)~$
When the SC-2 controller is then started, the modified status initially shows:
safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF,saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-1\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF,saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF,saAmfSISUHAState=STANDBY(2)
safSISU=safSu=SC-2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF,saAmfSISUHAState=ACTIVE(1)
4
(charles)~$
and the same status is also reported on the SC-2 node. Then, after about 30
seconds, the SC-1 node fails, and about 30 seconds after that the SC-2 node fails.
So when all the small 2N.xml files are merged into imm.xml with immxml-merge
before the first controller node is started, the cluster comes up only briefly
with the two controller nodes and then dies. There is never a point at which
all the nodes are up, so stubinit.sh (which I only ever run once all the nodes
are in the cluster) never gets run.
So, as things stand, steps 1 and 2 cannot be carried out for the BigMerged
configuration. Are you suggesting that turning off the info log level might
keep the system up, i.e. that the logging is what causes it to die?
Charlie ...
-----Original Message-----
From: praveen malviya [mailto:[email protected]]
Sent: Wednesday, April 15, 2015 5:07 AM
To: Johnson, Charles; [email protected]
Subject: Re: [users] amf-adm question ...
On 03-Apr-15 10:25 PM, Johnson, Charles wrote:
> I have created a set of examples of the amf malfunctions and amf crashes I
> have experienced, and as Praveen indicated, the message logs, /etc/opensaf
> configurations and trace logs for the 2 imm and 3 amf daemons are included
> for the crashes.
>
> The README file in the examples directory tells the tale (actually, five
> different configurations of what is probably one bug).
>
> I tried a workaround, putting all the added services on the payloads only,
> but that didn't fix it.
>
> It is a 7-Zip archive with ultra compression, squeezed as small as I could
> get it (25MB), and it is located at the box.com storage site:
> https://app.box.com/s/t9ghgglv6cs0kaf10bnm0ial7be8h9zo
>
> On yum-based systems, 7-Zip can be installed with
> "sudo yum -y install p7zip-plugins.x86_64 p7zip.x86_64"
>
> FYI, the command line I used for making the archive is "7z a -t7z -m0=lzma
> -mx=9 -mfb=64 -md=32m -ms=on examples.7z examples", in case that info is
> needed.
>
I tried the imm.xml files provided in the BigMerged and smallmerged
directories on a four-node cluster. For the application SUs, I replaced
PL-5/PL-7/PL-9 with PL-3 and PL-6/PL-8/PL-10 with PL-4, so that all the
services are hosted on the four-node cluster.
I verified that all the services came up successfully on my four-node cluster.
For BigMerged, I have attached the output of:
1) the SI assignment status, in siass.txt;
2) the SI state, in si.txt.
I have also analyzed BigMerged/messages.SC-1.txt, which shows that SC-1
rebooted at Mar 31 11:01:58:
"
Mar 31 11:01:58 metabox-fedora19-dl360g6-7 osafimmnd[911]: IN Create runtime object 'safCSIComp=safComp=IMMD\#safSu=SC-2\#safSg=2N\#safApp=OpenSAF,safCsi=IMMD,safSi=SC-2N,safApp=OpenSAF' by Impl id: 3
SC-1 dies, and is rebooted.
Mar 31 11:06:17 metabox-fedora19-dl360g6-7 rsyslogd: [origin software="rsyslogd" swVersion="7.2.6" x-pid="436" x-info="http://www.rsyslog.com"] start
"
But BigMerged/traces.SC-1/osafamfd contains traces beyond Mar 31 11:01:58,
and there are no errors during that time:
"
Mar 31 11:02:20.216401 osafamfd [1005:mbcsv_util.c:0492] << mbcsv_send_ckpt_data_to_all_peers
Mar 31 11:02:20.216412 osafamfd [1005:mbcsv_api.c:0868] << mbcsv_process_snd_ckpt_request: retval: 1
Mar 31 11:02:20.216423 osafamfd [1005:imm.cc:0235] >> peek
Mar 31 11:02:20.216433 osafamfd [1005:imm.cc:0244] << peek
"
There is no clue as to what caused SC-1 to reboot.
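The comparison made here (the last syslog entry before the reboot versus the last amfd trace entry) can be sketched with two small awk-based shell functions. Both function names are hypothetical, and the patterns assume the classic "Mon DD HH:MM:SS host prog: msg" syslog layout and trace lines starting with "Mon DD HH:MM:SS.usec", as in the excerpts quoted above; other log formats would need different patterns.

```shell
#!/bin/sh
# Hypothetical helpers for lining up syslog and trace timestamps.

# Print the last timestamped syslog line seen before each rsyslogd
# "start" banner, i.e. roughly the last thing logged before a reboot.
# Untimestamped lines (such as hand-written notes) are ignored.
last_before_reboot() {
    awk '
        /rsyslogd: \[origin software="rsyslogd"/ && /\] start$/ {
            if (last != "") print last
            last = ""
            next
        }
        /^[A-Z][a-z][a-z] +[0-9]+ [0-9][0-9]:[0-9][0-9]:[0-9][0-9] / { last = $0 }
    ' "$1"
}

# Print the timestamp (first three fields) of the final entry in a trace file.
last_trace_time() {
    awk '/^[A-Z][a-z][a-z] +[0-9]+ [0-9][0-9]:/ { t = $1 " " $2 " " $3 } END { print t }' "$1"
}
```

Running last_before_reboot on BigMerged/messages.SC-1.txt and last_trace_time on BigMerged/traces.SC-1/osafamfd would put the two timestamps side by side; a trace time later than the last pre-reboot syslog line is the gap described above.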
Please try running the BigMerged configuration with the following changes:
1) Make sure stubinit.sh is called only after all nodes have joined the cluster.
2) In stubinit.sh, comment out all the immcfg -f <> calls: all the
applications are already merged into imm.xml, so there is no need to add
the configuration again.
3) For AMF, keep tracing enabled rather than setting the log level to info.
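Step 2 can be done mechanically. The sketch below assumes each immcfg -f call in stubinit.sh sits on its own line with -f as the first option (the script itself is not shown in this thread), and comment_out_immcfg_f is a hypothetical helper name. It keeps a .bak copy and uses only POSIX sed syntax, so it should behave the same with GNU and BSD sed.

```shell
#!/bin/sh
# comment_out_immcfg_f is a hypothetical helper. It assumes every
# "immcfg -f <file>" call in the script is on its own line, with -f as
# the first option; other immcfg calls are left untouched.
comment_out_immcfg_f() {
    script="$1"
    # Keep the original so the change is easy to undo.
    cp "$script" "$script.bak" || return 1
    # Prefix matching lines with "# ", preserving their indentation.
    # Redirecting into the existing file keeps its permissions (e.g. +x).
    sed 's/^\([[:space:]]*immcfg[[:space:]][[:space:]]*-f\)/# \1/' \
        "$script.bak" >"$script"
}
```

For example, comment_out_immcfg_f ./stubinit.sh, then check the result with diff ./stubinit.sh.bak ./stubinit.sh before running it.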
If the issue is still observed, please share the AMFD traces and syslog from
the active and standby controllers.
Thanks,
Praveen
_______________________________________________
Opensaf-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-users