I'm not sure in which scenario IMM becomes slow. But if it is slow, then:
The bigger, actual problem is that once AMFD becomes QUIESCED it can neither
serve any messages/requests nor receive any messages. It is therefore important
that it transitions to the next state as early as possible, in real time; this
matters as much as the heartbeat timeout between amfd and amfnd.
Loss of service is more severe than a node-local, heartbeat-related failure.
That is, the more serious issue is that for as long as AMFD is hung in the
quiesced state, loss of service becomes more likely (for example, if AMFND
wants to talk to the director to perform any distributed HA operation).
Also, a slow-responding IMMND that leads to an AMFD-AMFND heartbeat timeout,
and thus to a failover, only adds to the problem.
Even if we implement a mechanism where we stop using immutil and instead call
the IMM API directly and periodically, we will still be at risk of loss of
service as long as IMM does not respond for a longer time (after 3 minutes
AMFND will see the AMFD service go down and the cluster will reboot). So it is
better to let the node go down quickly than to leave AMFD unresponsive to AMFND
requests, resulting in a loss of service (such as redundancy decisions) or a
cluster reset.
---
** [tickets:#516] Amfd: calling immutil_saImmOiImplementerClear in
avd_mds_qsd_role_evh leads to amfnd sending SIGABRT to amfd**
**Status:** unassigned
**Created:** Tue Jul 23, 2013 02:39 PM UTC by hano
**Last Updated:** Fri Jul 26, 2013 10:22 AM UTC
**Owner:** nobody
osafamfd is "supervised" by osafamfnd: osafamfd sends "heartbeats" to
osafamfnd. If no "heartbeats" are received within one minute, osafamfnd sends
an abort signal to osafamfd, which then aborts (produces a core dump and
exits). In the case below, osafamfd stops sending "heartbeats" because it has
received a role change message from MDS (Active to Quiesced) and calls
immutil_saImmOiImplementerClear. IMM does not respond, so osafamfd keeps
waiting, sends no "heartbeats", and is aborted by osafamfnd.
There are several cases with this behavior. To avoid these core dumps and keep
amf responsive, amfd should not call immutil_saImmOiImplementerClear; instead it
should call saImmOiImplementerClear directly and handle the return code and the
retry logic in the avd_main_proc poll loop.
---
Core was generated by `/usr/lib64/opensaf/osafamfd'.
Program terminated with signal 6, Aborted.
#0 0x00007f08e45b6dfd in nanosleep () from /lib64/libc.so.6
(gdb) bt full
#0 0x00007f08e45b6dfd in nanosleep () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007f08e45e2824 in usleep () from /lib64/libc.so.6
No symbol table info available.
#2 0x0000000000407506 in immutil_saImmOiImplementerClear
(immOiHandle=94489411855) at ../../../../../osaf/tools/safimm/src/immutil.c:1042
rc = <optimized out>
nTries = 54
#3 0x000000000043492a in avd_mds_qsd_role_evh (cb=0x69c980, evt=<optimized
out>) at avd_role.c:573
status = <optimized out>
rc = <optimized out>
__FUNCTION__ = <error reading variable>
#4 0x000000000043341d in avd_process_event (cb_now=0x69c980, evt=0x7ff160) at
avd_proc.c:591
__FUNCTION__ = <error reading variable>
#5 0x00000000004336a1 in avd_main_proc () at avd_proc.c:507
pollretval = <optimized out>
cb = 0x69c980
evt = 0x7ff160
mbx_fd = <optimized out>
error = <optimized out>
polltmo = -1
#6 0x00000000004096bd in main (argc=<optimized out>, argv=<optimized out>) at
amfd_main.c:47
error = 0
node_id = <optimized out>
(gdb) quit
---
Sent from sourceforge.net because [email protected] is
subscribed to https://sourceforge.net/p/opensaf/tickets/
_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets