---
**[tickets:#1077] opensaf randomly and frequently fails to start with trace enabled**
**Status:** accepted
**Milestone:** 4.5.0
**Created:** Mon Sep 15, 2014 07:08 AM UTC by Hans Feldt
**Last Updated:** Mon Sep 15, 2014 07:08 AM UTC
**Owner:** Hans Feldt
In IMM and NTF, logging and tracing are done between fork() and exec(). Together
with the added call to tzset() in logtrace, this creates a deadlock in the child.
Here's an example of how immpbed hangs forever (there is no supervision in immnd).
The system appears to have started correctly, but configuration changes time
out:
root@SC-1:/# immcfg -a saAmfClusterStartupTimeout=10000000000 safAmfCluster=myAmfCluster
error - saImmOmCcbObjectModify_2 FAILED: SA_AIS_ERR_TRY_AGAIN (6)
error - immcfg command timed out (alarm)
>> all processes, including pbe, have started:
root 391 0.0 0.0 146660 1144 ? S<sl 07:08 0:00
/usr/local/lib/opensaf/osafrded
root 405 0.0 0.0 148848 1144 ? S<sl 07:08 0:00
/usr/local/lib/opensaf/osaffmd
root 414 0.0 0.0 157324 1428 ? SNsl 07:08 0:00
/usr/local/lib/opensaf/osafimmd
root 423 0.0 0.0 238192 2600 ? SNsl 07:08 0:00
/usr/local/lib/opensaf/osafimmnd --tracemask=0xffffffff
root 437 0.0 0.0 227412 3884 ? SNsl 07:08 0:00
/usr/local/lib/opensaf/osaflogd
root 449 0.0 0.0 159552 1564 ? SNsl 07:08 0:00
/usr/local/lib/opensaf/osafntfd
root 459 0.0 0.0 157892 1708 ? SNsl 07:08 0:00
/usr/local/lib/opensaf/osafclmd
root 464 0.0 0.0 164344 1052 ? SN 07:08 0:00
/usr/local/lib/opensaf/osafimmnd --tracemask=0xffffffff
root 469 0.0 0.0 146656 1156 ? Ssl 07:08 0:00
/usr/local/lib/opensaf/osafclmna
root 477 0.0 0.0 167104 2712 ? S<sl 07:08 0:00
/usr/local/lib/opensaf/osafamfd
root 486 0.0 0.0 225600 1964 ? Ssl 07:08 0:00
/usr/local/lib/opensaf/osafamfnd
root 499 0.0 0.0 148728 1036 ? Ssl 07:08 0:00
/usr/local/lib/opensaf/osafsmfnd
root 504 0.0 0.0 254200 1840 ? Ssl 07:08 0:00
/usr/local/lib/opensaf/osafsmfd
root 536 0.0 0.0 157928 1904 ? Ssl 07:08 0:00
/usr/local/lib/opensaf/osafckptnd
root 555 0.0 0.0 146644 1036 ? Ssl 07:08 0:00
/usr/local/lib/opensaf/osafamfwd
root 596 0.0 0.0 153592 1332 ? Ssl 07:08 0:00
/usr/local/lib/opensaf/osafckptd
>> gdb backtrace shows that pbe is hanging in the newly added tzset in logtrace:
(gdb) bt
#0 __lll_lock_wait_private () at
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
#1 0x00007f180cee39de in _L_lock_2427 () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f180cee37b1 in __tzset () at tzset.c:598
#3 0x00007f180db5e8cf in output (file=0x4722ff "immnd_proc.c", line=1577,
priority=priority@entry=7, category=category@entry=1,
format=format@entry=0x472452 "Exec: %s %s %s", ap=ap@entry=0x7fff58b213d8)
at logtrace.c:96
#4 0x00007f180db5ed9b in _logtrace_trace (file=file@entry=0x4722ff
"immnd_proc.c", line=line@entry=1577, category=category@entry=1,
format=format@entry=0x472452 "Exec: %s %s %s") at logtrace.c:173
#5 0x0000000000409cee in immnd_forkPbe (cb=cb@entry=0x691540 <_immnd_cb>) at
immnd_proc.c:1577
#6 0x000000000041e570 in immnd_proc_server
(timeout=timeout@entry=0x7fff58b21fd8) at immnd_proc.c:2111
#7 0x000000000040a763 in main (argc=<optimized out>, argv=<optimized out>) at
immnd_main.c:355
Sep 13 07:08:17 SC-1 osafimmnd[423]: NO STARTING PBE process.
Sep 13 07:08:17 SC-1 osafimmnd[423]: NO
pbe-db-file-path:/srv/shared/imm//imm.db VETERAN:0 B:0
Sep 13 07:08:17 SC-1 osafimmnd[423]: NO Implementer connected: 2
(safClmService) <13, 2010f>
Sep 13 07:08:17 SC-1 osafimmnd[423]: NO implementer for class 'SaClmNode' is
safClmService => class extent is safe.
Sep 13 07:08:17 SC-1 osafimmnd[423]: NO implementer for class 'SaClmCluster' is
safClmService => class extent is safe.
Sep 13 07:08:17 SC-1 osafclmna[469]: Started
Sep 13 07:08:17 SC-1 osafclmna[469]: NO safNode=SC-1,safCluster=myClmCluster
Joined cluster, nodeid=2010f
Sep 13 07:08:17 SC-1 osafamfd[477]: Started
Sep 13 07:08:17 SC-1 osafamfd[477]: NO Invalid configuration,
saAmfCtDefRecoveryOnError=NO_RECOMMENDATION(1) for
'safVersion=4.0.0,safCompType=OpenSafCompTypeAMFWDOG'
Sep 13 07:08:17 SC-1 osafamfd[477]: NO COMPONENT_FAILOVER(3) used instead of
NO_RECOMMENDATION(1) for 'safVersion=4.0.0,safCompType=OpenSafCompTypeAMFWDOG'
Sep 13 07:08:17 SC-1 osafamfd[477]: NO Invalid configuration,
saAmfCtDefRecoveryOnError=NO_RECOMMENDATION(1) for
'safVersion=4.0.0,safCompType=OpenSafCompTypeCPND'
Sep 13 07:08:17 SC-1 osafamfd[477]: NO COMPONENT_FAILOVER(3) used instead of
NO_RECOMMENDATION(1) for 'safVersion=4.0.0,safCompType=OpenSafCompTypeCPND'
Sep 13 07:08:17 SC-1 osafamfd[477]: NO Invalid configuration,
saAmfCtDefRecoveryOnError=NO_RECOMMENDATION(1) for
'safVersion=4.0.0,safCompType=OpenSafCompTypeSMFND'
Sep 13 07:08:17 SC-1 osafamfd[477]: NO COMPONENT_FAILOVER(3) used instead of
NO_RECOMMENDATION(1) for 'safVersion=4.0.0,safCompType=OpenSafCompTypeSMFND'
Sep 13 07:08:17 SC-1 osafimmnd[423]: NO Implementer (applier) connected: 3
(@safAmfService2020f) <0, 2020f>
Sep 13 07:08:17 SC-1 osafimmnd[423]: NO Implementer connected: 4
(safAmfService) <18, 2010f>
Sep 13 07:08:17 SC-1 osafimmnd[423]: NO implementer for class
'SaAmfCompBaseType' is safAmfService => class extent is safe.
Sep 13 07:08:17 SC-1 osafimmnd[423]: NO implementer for class 'SaAmfSUBaseType'
is safAmfService => class extent is safe.
Sep 13 07:08:17 SC-1 osafimmnd[423]: NO implementer for class 'SaAmfSGBaseType'
is safAmfService => class extent is safe.
Sep 13 07:08:17 SC-1 osafimmnd[423]: NO implementer for class
'SaAmfAppBaseType' is safAmfService => class extent is safe.
Sep 13 07:08:17 SC-1 osafimmnd[423]: NO implementer for class
'SaAmfSvcBaseType' is safAmfService => class extent is safe.
tzset() needs to be moved to the init() function, and the services need to be
cleaned up so that they do not log or trace between fork() and exec().
The problem became more apparent after tzset() was introduced, but it could have
happened at any time just from using syslog in the child.
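
A minimal sketch of the proposed direction, not the actual patch: tzset() is
called once from an init function, logging is done in the parent before fork(),
and the child calls only async-signal-safe functions (execv, write, _exit)
before the exec. The names trace_init() and spawn_and_wait() are hypothetical
and are not taken from the OpenSAF code.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <syslog.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical init hook: initialise timezone state once at start-up,
 * instead of calling tzset() for every trace record. */
static void trace_init(void)
{
    tzset();
}

/* Hypothetical helper: fork, exec argv[0] and wait for it.
 * Returns the child's exit status, or -1 on error. */
static int spawn_and_wait(char *const argv[])
{
    /* Log in the parent, before fork(), where taking libc locks is safe. */
    syslog(LOG_INFO, "Exec: %s", argv[0]);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child of a multi-threaded parent: another thread may have held a
         * libc lock (tzset, syslog, malloc) at the time of fork(), and that
         * lock will never be released here. Use only async-signal-safe
         * calls until exec. */
        execv(argv[0], argv);

        /* Reached only if exec failed: no syslog, no trace, no stdio. */
        static const char msg[] = "exec failed\n";
        if (write(STDERR_FILENO, msg, sizeof msg - 1) < 0) {
            /* Nothing safe left to do about a failed write. */
        }
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A service would call trace_init() once during start-up and then use something
like spawn_and_wait() wherever it currently logs between fork() and exec().
IMMND does not wait for the PBE process, so the waitpid() part is only there
to make the sketch easy to try out.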
---