Hi!

I have a couple of questions regarding this ticket:

1) Why did you choose a solution where you delay the reply to NID until 
amfd has completed its cold sync, instead of a solution where amfd would 
itself initiate a node reboot if a failover happens before it has 
completed the cold sync? The chosen solution will delay the startup time 
of the standby controller, so there is a clear (and visible, since we 
are measuring startup time) disadvantage with this solution. Is there 
any advantage with it, or disadvantage with the other solution?
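
To make the two alternatives concrete, here is a minimal sketch of the
decision logic in each case. It is a simplified model, not OpenSAF code:
the type and function names (StandbyDirector, Action, on_failover, and so
on) are invented for illustration and do not correspond to anything in amfd.

```cpp
#include <cassert>

// Hypothetical model of a standby director; all names are illustrative only.
enum class Action { kKeepWaiting, kReplyToNid, kBecomeActive, kRebootNode };

struct StandbyDirector {
  bool imm_init_done = false;   // initialization with IMM has finished
  bool cold_sync_done = false;  // cold sync with the active peer has finished
};

// Alternative A (the patch under review): the reply to NID is withheld
// until both IMM initialization and cold sync have completed.
inline Action startup_step_delayed_reply(const StandbyDirector& d) {
  return (d.imm_init_done && d.cold_sync_done) ? Action::kReplyToNid
                                               : Action::kKeepWaiting;
}

// Alternative B: the reply to NID is sent as soon as IMM initialization
// is done, so NID can move on while cold sync continues in the background.
inline Action startup_step_early_reply(const StandbyDirector& d) {
  return d.imm_init_done ? Action::kReplyToNid : Action::kKeepWaiting;
}

// Alternative B, continued: if a failover arrives before cold sync has
// completed, the director itself asks for a node reboot.
inline Action on_failover(const StandbyDirector& d) {
  return d.cold_sync_done ? Action::kBecomeActive : Action::kRebootNode;
}
```

In this model, alternative B never blocks NID on cold sync; the cost is
that the director needs its own check on the failover path.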

2) Isn't this problem applicable to all OpenSAF directors, and not just 
AMFD? Has anyone analyzed the other services to determine if they also 
need a similar fix? Since NID starts the services in serial and doesn't 
continue with the next service before it has received a reply from the 
previous one, it would mean that with the chosen solution all OpenSAF 
directors will be cold synced in serial. With the other solution, they 
can be cold synced in parallel.
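
As a rough illustration of the startup-time difference, the sketch below
compares the two cases. It assumes an idealized model in which parallel
cold syncs do not slow each other down; the function names and the
per-director durations are made up for the example, not measured values.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Serial case: NID waits for each director's cold sync before starting
// the next one, so the individual durations add up.
inline int serial_sync_time_ms(const std::vector<int>& per_director_ms) {
  return std::accumulate(per_director_ms.begin(), per_director_ms.end(), 0);
}

// Parallel case (idealized, no contention): all cold syncs overlap, so
// the total is bounded by the slowest director.
inline int parallel_sync_time_ms(const std::vector<int>& per_director_ms) {
  return per_director_ms.empty()
             ? 0
             : *std::max_element(per_director_ms.begin(),
                                 per_director_ms.end());
}
```

With hypothetical durations of 300, 500 and 200 ms, the serial total is
the sum of the three, while the parallel total is that of the slowest one.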

/ Anders Widell

On 04/27/2015 11:19 AM, [email protected] wrote:
> Summary: amfd: respond to nid only after initialization is completed [#1334]
> Review request for Trac Ticket(s): #1334
> Peer Reviewer(s): Mathi, Hans N, Praveen
> Pull request to: <<LIST THE PERSON WITH PUSH ACCESS HERE>>
> Affected branch(es): All
> Development branch: Default
>
> --------------------------------
> Impacted area       Impact y/n
> --------------------------------
>   Docs                    n
>   Build system            n
>   RPM/packaging           n
>   Configuration files     n
>   Startup scripts         n
>   SAF services            y
>   OpenSAF services        n
>   Core libraries          n
>   Samples                 n
>   Tests                   n
>   Other                   n
>
>
> Comments (indicate scope for each "y" above):
> ---------------------------------------------
>   <<EXPLAIN/COMMENT THE PATCH SERIES HERE>>
>
> changeset eeba7fe22afb8f4c40bb393d45a108cc59061eda
> Author:       Nagendra Kumar<[email protected]>
> Date: Mon, 27 Apr 2015 14:41:02 +0530
>
>       amfd: respond to nid only after initialization is completed [#1334]
>       Act Amfd initialization is said to be completed when it completes its
>       initialization with imm. Apart from initializing with imm, Standby Amfd
>       also needs to get run time data from Act Amfd using cold sync. So,
>       Standby Amfd initialization is said to be completed when it completes
>       its initialization with imm and its cold sync with Act Amfd. In the
>       present code, Standby sends the response to nid before cold sync is
>       complete. So, code has been added to send the nid response only when
>       Amfd completes its initialization.
>
>
> Complete diffstat:
> ------------------
>   osaf/services/saf/amf/amfd/chkop.cc |  4 ++++
>   osaf/services/saf/amf/amfd/main.cc  |  6 +++++-
>   2 files changed, 9 insertions(+), 1 deletions(-)
>
>
> Testing Commands:
> -----------------
> 1. Start Act controller SC-1.
>     Start Standby controller SC-2 and stop SC-1 when
>     Standby Amfd is in the middle of cold sync.
> 2. Start Act controller SC-1.
>     Start Standby controller SC-2 and stop SC-1 when
>     Standby Amfd has completed cold sync but Amfnd has not assigned
>     role to Fmd at Standby controller.
>
>
> Testing, Expected Results:
> --------------------------
> 1. Fmd reboots the node :
> Apr 27 12:56:32 PM_SC-2 osaffmd[11792]: Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: Failover occurred, but this node is not yet ready, OwnNodeId = 131599, SupervisionTime = 60
> 2. Same as above.
>
> Conditions of Submission:
> -------------------------
> Ack from peer reviewers.
>
> Arch      Built     Started    Linux distro
> -------------------------------------------
> mips        n          n
> mips64      n          n
> x86         n          n
> x86_64      y          y
> powerpc     n          n
> powerpc64   n          n
>
>
> Reviewer Checklist:
> -------------------
> [Submitters: make sure that your review doesn't trigger any checkmarks!]
>
>
> Your checkin has not passed review because (see checked entries):
>
> ___ Your RR template is generally incomplete; it has too many blank entries
>      that need proper data filled in.
>
> ___ You have failed to nominate the proper persons for review and push.
>
> ___ Your patches do not have proper short+long header
>
> ___ You have grammar/spelling in your header that is unacceptable.
>
> ___ You have exceeded a sensible line length in your headers/comments/text.
>
> ___ You have failed to put in a proper Trac Ticket # into your commits.
>
> ___ You have incorrectly put/left internal data in your comments/files
>      (i.e. internal bug tracking tool IDs, product names etc)
>
> ___ You have not given any evidence of testing beyond basic build tests.
>      Demonstrate some level of runtime or other sanity testing.
>
> ___ You have ^M present in some of your files. These have to be removed.
>
> ___ You have needlessly changed whitespace or added whitespace crimes
>      like trailing spaces, or spaces before tabs.
>
> ___ You have mixed real technical changes with whitespace and other
>      cosmetic code cleanup changes. These have to be separate commits.
>
> ___ You need to refactor your submission into logical chunks; there is
>      too much content into a single commit.
>
> ___ You have extraneous garbage in your review (merge commits etc)
>
> ___ You have giant attachments which should never have been sent;
>      Instead you should place your content in a public tree to be pulled.
>
> ___ You have too many commits attached to an e-mail; resend as threaded
>      commits, or place in a public tree for a pull.
>
> ___ You have resent this content multiple times without a clear indication
>      of what has changed between each re-send.
>
> ___ You have failed to adequately and individually address all of the
>      comments and change requests that were proposed in the initial review.
>
> ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)
>
> ___ Your computer has a badly configured date and time, confusing the
>      threaded patch review.
>
> ___ Your changes affect IPC mechanism, and you don't present any results
>      for in-service upgradability test.
>
> ___ Your changes affect user manual and documentation, your patch series
>      do not contain the patch that updates the Doxygen manual.
>
>
> _______________________________________________
> Opensaf-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/opensaf-devel
>
>

