- **status**: review --> fixed
---
**[tickets:#556] IMM: temporary deadlock between IMMND and PBE.**
**Status:** fixed
**Created:** Wed Aug 28, 2013 09:26 AM UTC by Anders Bjornerstedt
**Last Updated:** Tue Sep 03, 2013 07:54 AM UTC
**Owner:** Anders Bjornerstedt
The incident is similar to that in ticket #517.
https://sourceforge.net/p/opensaf/tickets/517/
Part of the problem is that the AMFND goes down due to a too-short timeout
(10 seconds) on an om-handle initialize. In general, an IMMND sync
of 300K objects or more could take up to 60 seconds. Put another way,
if a sync takes longer than 60 seconds, then the system is misconfigured
or has so much imm data that it is out of bounds for what OpenSAF
currently tries to support.
This ticket deals with why the sync took unexpectedly long in this particular
case. The imm data was small enough that other syncs took just a few seconds.
The problem discovered is a temporary service internal deadlock between the
PBE and the IMMND.
The sync is blocked from actually starting because it is waiting on the
outcome of one or more CCBs in critical state (i.e. being processed by the PBE).
Removal of that blocking is itself covered by enhancement ticket #31:
https://sourceforge.net/p/opensaf/tickets/31/
The PBE, it turns out, is being restarted at the new active SC due to an SC failover.
A restarted PBE in this context is forced to regenerate the imm.db sqlite file.
Regenerating the sqlite file is effectively an immdump, which tries to obtain
a "dump iterator" from IMMND. The dump iterator is special in that it iterates
over persistent objects only. It also allocates a new epoch so that
all persistent modifications done after the dump snapshot are covered by a new
epoch. But the epoch allocation fails because there is a sync ongoing.
So the PBE is stuck in a TRY_AGAIN loop to obtain the dump iterator.
This explains the deadlock between PBE and IMMND.
The PBE times out after 20 seconds, exits and gets restarted.
The sync also times out for lack of progress after 20 seconds and is aborted.
The restarted PBE then succeeds in obtaining the dump iterator and the
next sync attempt succeeds.
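
The sequence above can be replayed as a toy timeline. This is only a sketch:
the function and variable names are hypothetical, none of it is OpenSAF code,
and it assumes for simplicity that the critical CCBs are resolved as soon as
the restarted PBE obtains the dump iterator. The two 20 second timeouts are
the ones stated in this ticket.

```python
# Toy timeline of the stall described above. Hypothetical model, not OpenSAF code.
PBE_TIMEOUT_S = 20    # PBE exits after 20 s without a dump iterator (from the ticket)
SYNC_TIMEOUT_S = 20   # sync is aborted after 20 s without progress (from the ticket)


def simulate(max_seconds=120):
    """Return (events, try_again_count) for one deadlocked sync attempt."""
    sync_state = "waiting"     # announced, but blocked on critical CCBs
    critical_ccbs = 1          # CCBs whose outcome only the PBE can decide
    pbe_has_iterator = False
    pbe_started_at = 0
    sync_started_at = 0
    try_again_count = 0
    events = []

    for t in range(max_seconds + 1):
        # 1. The sync gives up if it has made no progress for too long.
        if sync_state == "waiting" and t - sync_started_at >= SYNC_TIMEOUT_S:
            sync_state = "aborted"
            events.append((t, "sync aborted"))

        # 2. The PBE keeps asking for the dump iterator.
        if not pbe_has_iterator:
            if t - pbe_started_at >= PBE_TIMEOUT_S:
                events.append((t, "PBE exits and is restarted"))
                pbe_started_at = t
            if sync_state == "waiting":
                try_again_count += 1      # epoch allocation refused: TRY_AGAIN
            else:
                pbe_has_iterator = True
                critical_ccbs = 0         # simplification: CCBs resolved at once
                events.append((t, "PBE obtains dump iterator"))

        # 3. With no critical CCBs left, the next sync attempt can start.
        if sync_state == "aborted" and critical_ccbs == 0:
            events.append((t, "next sync attempt starts"))
            return events, try_again_count

    return events, try_again_count
```

Calling `simulate()` returns the ordered events together with the number of
TRY_AGAIN replies the PBE received. In this model nothing moves until both
timeouts expire at the 20 second mark.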
The AMF should of course be fixed to have a longer wait.
But in this case it would first have waited 20 seconds due to this deadlock
and then waited for a successfully started sync to complete. Since the actual
sync could take up to 60 seconds, the total wait could here end up
clearly above 60 seconds. It is in principle possible that the imm internal
deadlock could recur in the next sync attempt.
So this problem needs to be fixed.
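
The arithmetic behind that concern, as a small sketch. The constants are the
figures quoted in this ticket; the function name is hypothetical, and the idea
that each deadlocked attempt costs exactly one 20 second stall is an assumption.

```python
DEADLOCK_STALL_S = 20   # time lost per deadlocked sync attempt (from the ticket)
MAX_SYNC_S = 60         # upper bound for a supported sync (from the ticket)
AMFND_TIMEOUT_S = 10    # current om-handle initialize timeout (from the ticket)


def worst_case_wait(deadlocked_attempts):
    """Worst-case seconds a client waits if N sync attempts deadlock first."""
    return deadlocked_attempts * DEADLOCK_STALL_S + MAX_SYNC_S


for attempts in range(3):
    total = worst_case_wait(attempts)
    print(f"{attempts} deadlocked attempt(s): up to {total} s "
          f"(AMFND timeout is {AMFND_TIMEOUT_S} s)")
```

With a single deadlocked attempt the bound is already 80 seconds, so raising
the AMF wait to 60 seconds alone would not be enough.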
A solution is to detect this situation in the IMMND and to allow the
dump iterator to proceed without generating an epoch. Since there is a sync
in progress, no persistent writes are allowed anyway. So the epoch should
not strictly be necessary in this case. The sync itself generates a new epoch
and the dump iterator should be able to share the epoch shift done by the
sync iterator.
The dump also cannot take longer than the sync, since the sync is still waiting
on the outcome of critical CCBs.
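
A minimal sketch of the decision proposed above, assuming a much simplified
IMMND state. All names here (`ImmndState`, `open_dump_iterator`,
`share_epoch_with_sync`) are hypothetical illustrations and not the actual
IMMND interfaces.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImmndState:
    epoch: int
    sync_in_progress: bool


@dataclass
class DumpIteratorReply:
    rc: str                          # "OK" or "TRY_AGAIN"
    allocated_epoch: Optional[int]   # None when no epoch was allocated here


def open_dump_iterator(state, share_epoch_with_sync):
    """Decide whether the PBE gets its dump iterator right now."""
    if state.sync_in_progress:
        if not share_epoch_with_sync:
            # Behaviour described in this ticket: epoch allocation fails,
            # so the PBE is told to retry.
            return DumpIteratorReply("TRY_AGAIN", None)
        # Proposed behaviour: no persistent writes are allowed during a
        # sync, so skip the allocation and rely on the sync's epoch shift.
        return DumpIteratorReply("OK", None)
    # No sync ongoing: allocate a fresh epoch for the dump snapshot.
    state.epoch += 1
    return DumpIteratorReply("OK", state.epoch)
```

For example, `open_dump_iterator(ImmndState(epoch=7, sync_in_progress=True), True)`
returns an OK reply without touching the epoch, which is the case that removes
the TRY_AGAIN loop.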
When enhancement ticket #31 is done, this issue (the shared epoch) needs to
be looked into again, because the sync can then complete before the PBE has
finished regenerating the file and re-attached.