- **status**: accepted --> review


---

**[tickets:#1077] opensaf randomly and frequently fails to start with trace enabled**

**Status:** review
**Milestone:** 4.5.0
**Created:** Mon Sep 15, 2014 07:08 AM UTC by Hans Feldt
**Last Updated:** Mon Sep 15, 2014 07:08 AM UTC
**Owner:** Hans Feldt

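In IMM and NTF, logging and tracing are done between fork() and exec(). 
Together with the newly added call to tzset() in logtrace, this creates a 
deadlock in the child: if another thread in the parent holds the libc lock 
used by tzset() at the moment of fork(), the child inherits it in the locked 
state and nothing in the child can ever release it. Here is an example of how 
immpbed hangs forever (there is no supervision in immnd):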

The system appears to have started correctly, but configuration changes time 
out:

root@SC-1:/# immcfg -a saAmfClusterStartupTimeout=10000000000 safAmfCluster=myAmfCluster
error - saImmOmCcbObjectModify_2 FAILED: SA_AIS_ERR_TRY_AGAIN (6)
error - immcfg command timed out (alarm)


>> all processes, including pbe, have started:

root       391  0.0  0.0 146660  1144 ?        S<sl 07:08   0:00 /usr/local/lib/opensaf/osafrded
root       405  0.0  0.0 148848  1144 ?        S<sl 07:08   0:00 /usr/local/lib/opensaf/osaffmd
root       414  0.0  0.0 157324  1428 ?        SNsl 07:08   0:00 /usr/local/lib/opensaf/osafimmd
root       423  0.0  0.0 238192  2600 ?        SNsl 07:08   0:00 /usr/local/lib/opensaf/osafimmnd --tracemask=0xffffffff
root       437  0.0  0.0 227412  3884 ?        SNsl 07:08   0:00 /usr/local/lib/opensaf/osaflogd
root       449  0.0  0.0 159552  1564 ?        SNsl 07:08   0:00 /usr/local/lib/opensaf/osafntfd
root       459  0.0  0.0 157892  1708 ?        SNsl 07:08   0:00 /usr/local/lib/opensaf/osafclmd
root       464  0.0  0.0 164344  1052 ?        SN   07:08   0:00 /usr/local/lib/opensaf/osafimmnd --tracemask=0xffffffff
root       469  0.0  0.0 146656  1156 ?        Ssl  07:08   0:00 /usr/local/lib/opensaf/osafclmna
root       477  0.0  0.0 167104  2712 ?        S<sl 07:08   0:00 /usr/local/lib/opensaf/osafamfd
root       486  0.0  0.0 225600  1964 ?        Ssl  07:08   0:00 /usr/local/lib/opensaf/osafamfnd
root       499  0.0  0.0 148728  1036 ?        Ssl  07:08   0:00 /usr/local/lib/opensaf/osafsmfnd
root       504  0.0  0.0 254200  1840 ?        Ssl  07:08   0:00 /usr/local/lib/opensaf/osafsmfd
root       536  0.0  0.0 157928  1904 ?        Ssl  07:08   0:00 /usr/local/lib/opensaf/osafckptnd
root       555  0.0  0.0 146644  1036 ?        Ssl  07:08   0:00 /usr/local/lib/opensaf/osafamfwd
root       596  0.0  0.0 153592  1332 ?        Ssl  07:08   0:00 /usr/local/lib/opensaf/osafckptd

>> gdb backtrace shows that pbe is hanging in the newly added tzset in logtrace:

(gdb) bt
#0  __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
#1  0x00007f180cee39de in _L_lock_2427 () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f180cee37b1 in __tzset () at tzset.c:598
#3  0x00007f180db5e8cf in output (file=0x4722ff "immnd_proc.c", line=1577, priority=priority@entry=7, category=category@entry=1, format=format@entry=0x472452 "Exec: %s %s %s", ap=ap@entry=0x7fff58b213d8) at logtrace.c:96
#4  0x00007f180db5ed9b in _logtrace_trace (file=file@entry=0x4722ff "immnd_proc.c", line=line@entry=1577, category=category@entry=1, format=format@entry=0x472452 "Exec: %s %s %s") at logtrace.c:173
#5  0x0000000000409cee in immnd_forkPbe (cb=cb@entry=0x691540 <_immnd_cb>) at immnd_proc.c:1577
#6  0x000000000041e570 in immnd_proc_server (timeout=timeout@entry=0x7fff58b21fd8) at immnd_proc.c:2111
#7  0x000000000040a763 in main (argc=<optimized out>, argv=<optimized out>) at immnd_main.c:355

Sep 13 07:08:17 SC-1 osafimmnd[423]: NO STARTING PBE process.
Sep 13 07:08:17 SC-1 osafimmnd[423]: NO pbe-db-file-path:/srv/shared/imm//imm.db VETERAN:0 B:0
Sep 13 07:08:17 SC-1 osafimmnd[423]: NO Implementer connected: 2 (safClmService) <13, 2010f>
Sep 13 07:08:17 SC-1 osafimmnd[423]: NO implementer for class 'SaClmNode' is safClmService => class extent is safe.
Sep 13 07:08:17 SC-1 osafimmnd[423]: NO implementer for class 'SaClmCluster' is safClmService => class extent is safe.
Sep 13 07:08:17 SC-1 osafclmna[469]: Started
Sep 13 07:08:17 SC-1 osafclmna[469]: NO safNode=SC-1,safCluster=myClmCluster Joined cluster, nodeid=2010f
Sep 13 07:08:17 SC-1 osafamfd[477]: Started
Sep 13 07:08:17 SC-1 osafamfd[477]: NO Invalid configuration, saAmfCtDefRecoveryOnError=NO_RECOMMENDATION(1) for 'safVersion=4.0.0,safCompType=OpenSafCompTypeAMFWDOG'
Sep 13 07:08:17 SC-1 osafamfd[477]: NO COMPONENT_FAILOVER(3) used instead of NO_RECOMMENDATION(1) for 'safVersion=4.0.0,safCompType=OpenSafCompTypeAMFWDOG'
Sep 13 07:08:17 SC-1 osafamfd[477]: NO Invalid configuration, saAmfCtDefRecoveryOnError=NO_RECOMMENDATION(1) for 'safVersion=4.0.0,safCompType=OpenSafCompTypeCPND'
Sep 13 07:08:17 SC-1 osafamfd[477]: NO COMPONENT_FAILOVER(3) used instead of NO_RECOMMENDATION(1) for 'safVersion=4.0.0,safCompType=OpenSafCompTypeCPND'
Sep 13 07:08:17 SC-1 osafamfd[477]: NO Invalid configuration, saAmfCtDefRecoveryOnError=NO_RECOMMENDATION(1) for 'safVersion=4.0.0,safCompType=OpenSafCompTypeSMFND'
Sep 13 07:08:17 SC-1 osafamfd[477]: NO COMPONENT_FAILOVER(3) used instead of NO_RECOMMENDATION(1) for 'safVersion=4.0.0,safCompType=OpenSafCompTypeSMFND'
Sep 13 07:08:17 SC-1 osafimmnd[423]: NO Implementer (applier) connected: 3 (@safAmfService2020f) <0, 2020f>
Sep 13 07:08:17 SC-1 osafimmnd[423]: NO Implementer connected: 4 (safAmfService) <18, 2010f>
Sep 13 07:08:17 SC-1 osafimmnd[423]: NO implementer for class 'SaAmfCompBaseType' is safAmfService => class extent is safe.
Sep 13 07:08:17 SC-1 osafimmnd[423]: NO implementer for class 'SaAmfSUBaseType' is safAmfService => class extent is safe.
Sep 13 07:08:17 SC-1 osafimmnd[423]: NO implementer for class 'SaAmfSGBaseType' is safAmfService => class extent is safe.
Sep 13 07:08:17 SC-1 osafimmnd[423]: NO implementer for class 'SaAmfAppBaseType' is safAmfService => class extent is safe.
Sep 13 07:08:17 SC-1 osafimmnd[423]: NO implementer for class 'SaAmfSvcBaseType' is safAmfService => class extent is safe.

tzset() needs to be moved to the init() function, and the services need to be 
cleaned up so that they do not log or trace between fork() and exec().

The problem became more apparent after tzset() was introduced, but it could 
have happened at any time just from using syslog in the child.
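
The proposed cleanup can be sketched roughly as follows. This is not the 
actual OpenSAF patch; the names trace_init(), child_msg() and spawn_and_wait() 
are hypothetical and exist only for illustration. The idea is that tzset() is 
called once during initialization, and that the child restricts itself to 
async-signal-safe calls (here write(), execv() and _exit()) between fork() 
and exec().

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical init hook: do lock-taking libc setup such as tzset()
 * once, early, and never between fork() and exec(). */
void trace_init(void)
{
    tzset();
}

/* Hypothetical helper for the child side: uses only write(2), which is
 * async-signal-safe, instead of syslog() or the trace macros. */
void child_msg(const char *msg)
{
    ssize_t n = write(STDERR_FILENO, msg, strlen(msg));
    (void)n; /* nothing useful can be done if this fails */
}

/* Fork, exec `path` with `argv`, and wait for it.
 * Returns the child's exit status, or -1 on error or abnormal exit. */
int spawn_and_wait(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: no syslog(), no tracing, no tzset(), no malloc(). */
        execv(path, argv);
        /* Only reached if execv() failed. */
        child_msg("exec failed\n");
        _exit(127);
    }

    /* Parent: logging and tracing are fine on this side of the fork. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would run trace_init() once at startup and do any "Exec: ..." style 
tracing in the parent, before calling fork(), rather than in the child.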

