Hi Anders,
Thanks for pointing this out. Please find my responses inline, marked with
[Nagu].
Thanks
-Nagu
> -----Original Message-----
> From: Anders Widell [mailto:[email protected]]
> Sent: 08 September 2015 15:44
> To: Nagendra Kumar; [email protected]; Praveen Malviya;
> Mathivanan Naickan Palanivelu
> Cc: [email protected]
> Subject: Re: [devel] [PATCH 0 of 1] Review Request for amfd: respond to nid
> only after initialization is completed [#1334]
>
> Hi!
>
> I have a couple of questions regarding this ticket:
>
> 1) Why did you choose a solution where you delay the reply to NID until
> amfd has completed its cold sync, instead of a solution where amfd would
> itself initiate a node reboot if a failover happens before it has
> completed the cold sync? The chosen solution will delay the startup time
> of the standby controller, so there is a clear (and visible, since we
> are measuring startup time) disadvantage with this solution. Is there
> any advantage with it, or disadvantage with the other solution?
[Nagu]: My comment in ticket #1334 points out a flaw in how FMS handles a
failover while the cold sync of Amfd is still in progress.
FMS takes its own CSI assignment as evidence that cold sync is complete.
However, Amfd acknowledged to NID that it had completed its initialization
before its cold sync had completed. NID therefore started Amfnd, and Amfnd
took its configuration from the Act Amfd and assigned the CSI to FMS. Up to
this point the Standby Amfd was still not synced, so when a failover was
triggered, FMS could not make any decision.
That is why #1334 was reported.
The question then was:
1. Whether Amfd can ack NID that it is initialized and ready for
assignment/failover without having synced.
2. Whether the FMS logic can be changed.
I thought Amfd needs to ack NID only when it has completed its sync.
Whether this
1. increased the startup time of the Standby controller, or
2. actually corrected an error in the measurement,
is subject to interpretation.
In my opinion, #2 is correct: if Amfd is not ready, it should not have sent
the ack.
So, it was a bug.
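To illustrate the intended behaviour, here is a minimal sketch. It is not the
actual patch: the class, enum and callback names are hypothetical and only
model the rule that a Standby director acks NID after both its imm
initialization and its cold sync are complete, while an Active director acks
after imm initialization alone.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical model of the ack rule; none of these names come from OpenSAF.
enum class Role { Active, Standby };

class DirectorInit {
 public:
  // notify_nid is an assumed callback standing in for the reply sent to NID.
  DirectorInit(Role role, std::function<void()> notify_nid)
      : role_(role), notify_nid_(std::move(notify_nid)) {
    assert(notify_nid_ != nullptr);
  }

  void on_imm_init_done() {
    imm_done_ = true;
    maybe_ack();
  }

  void on_cold_sync_done() {
    cold_sync_done_ = true;
    maybe_ack();
  }

  // Active: ready after imm init. Standby: ready after imm init and cold sync.
  bool ready() const {
    return imm_done_ && (role_ == Role::Active || cold_sync_done_);
  }

 private:
  // Send the ack exactly once, and only when initialization is complete.
  void maybe_ack() {
    if (!acked_ && ready()) {
      acked_ = true;
      notify_nid_();
    }
  }

  Role role_;
  std::function<void()> notify_nid_;
  bool imm_done_ = false;
  bool cold_sync_done_ = false;
  bool acked_ = false;
};
```

With this rule, a Standby instance that has only finished its imm
initialization does not invoke the callback, so NID would not go on to start
Amfnd until the cold sync has finished.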
>
> 2) Isn't this problem applicable to all OpenSAF directors, and not just
> AMFD? Has anyone analyzed the other services to determine if they also
> need a similar fix? Since NID starts the services in serial and doesn't
> continue with the next service before it has received a reply from the
> previous one, it would mean that with the chosen solution all OpenSAF
> directors will be cold synced in serial. With the other solution, they
> can be cold synced in parallel.
[Nagu]: I have now checked Clmd, Ntfd, Logd and Immd. All of them need the
same correction, and hence the startup time will increase further.
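As a rough way to reason about the difference between serial and parallel
cold sync, consider the idealised sketch below. The function names are
hypothetical and the model is an assumption: it ignores any contention
between directors that sync at the same time, and it says nothing about the
real sync durations, which would have to be measured.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Idealised model: each entry is the cold sync duration of one director.

// Serial: a director starts its cold sync only after the previous one has
// finished, so the total is the sum of all durations.
double serial_sync_time(const std::vector<double>& durations) {
  assert(std::all_of(durations.begin(), durations.end(),
                     [](double d) { return d >= 0.0; }));
  return std::accumulate(durations.begin(), durations.end(), 0.0);
}

// Parallel: all directors sync at the same time, so the total is the
// longest single duration (assuming no contention between them).
double parallel_sync_time(const std::vector<double>& durations) {
  assert(std::all_of(durations.begin(), durations.end(),
                     [](double d) { return d >= 0.0; }));
  if (durations.empty()) return 0.0;
  return *std::max_element(durations.begin(), durations.end());
}
```

Under this model the serial total grows with every director that delays its
NID reply, whereas the parallel total is bounded by the slowest director.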
>
> / Anders Widell
>
> On 04/27/2015 11:19 AM, [email protected] wrote:
> > Summary: amfd: respond to nid only after initialization is completed [#1334]
> > Review request for Trac Ticket(s): #1334
> > Peer Reviewer(s): Mathi, Hans N, Praveen
> > Pull request to: <<LIST THE PERSON WITH PUSH ACCESS HERE>>
> > Affected branch(es): All
> > Development branch: Default
> >
> > --------------------------------
> > Impacted area Impact y/n
> > --------------------------------
> > Docs n
> > Build system n
> > RPM/packaging n
> > Configuration files n
> > Startup scripts n
> > SAF services y
> > OpenSAF services n
> > Core libraries n
> > Samples n
> > Tests n
> > Other n
> >
> >
> > Comments (indicate scope for each "y" above):
> > ---------------------------------------------
> > <<EXPLAIN/COMMENT THE PATCH SERIES HERE>>
> >
> > changeset eeba7fe22afb8f4c40bb393d45a108cc59061eda
> > Author: Nagendra Kumar<[email protected]>
> > Date: Mon, 27 Apr 2015 14:41:02 +0530
> >
> > amfd: respond to nid only after initialization is completed [#1334]
> >
> > Act Amfd initialization is said to be completed when it completes its
> > initialization with imm. Apart from initializing with imm, Standby Amfd
> > also needs to get run time data from Act Amfd using cold sync. So,
> > Standby Amfd initialization is said to be completed when it completes
> > its initialization with imm and its cold sync with Act Amfd. In the
> > present code, Standby is sending the response to nid without cold sync
> > being complete. So, code has been added to send the nid response only
> > when Amfd completes its initialization.
> >
> >
> > Complete diffstat:
> > ------------------
> > osaf/services/saf/amf/amfd/chkop.cc | 4 ++++
> > osaf/services/saf/amf/amfd/main.cc | 6 +++++-
> > 2 files changed, 9 insertions(+), 1 deletions(-)
> >
> >
> > Testing Commands:
> > -----------------
> > 1. Start Act controller SC-1.
> > Start Standby controller SC-2 and stop SC-1 when
> > Standby Amfd is in the middle of cold sync.
> > 2. Start Act controller SC-1.
> > Start Standby controller SC-2 and stop SC-1 when
> > Standby Amfd has completed cold sync but Amfnd has not assigned
> > role to Fmd at Standby controller.
> >
> >
> > Testing, Expected Results:
> > --------------------------
> > 1. Fmd reboots the node :
> > Apr 27 12:56:32 PM_SC-2 osaffmd[11792]: Rebooting OpenSAF NodeId = 0 EE
> > Name = No EE Mapped, Reason: Failover occurred, but this node is not yet
> > ready, OwnNodeId = 131599, SupervisionTime = 60
> > 2. Same as above.
> >
> > Conditions of Submission:
> > -------------------------
> > Ack from peer reviewers.
> >
> > Arch Built Started Linux distro
> > -------------------------------------------
> > mips n n
> > mips64 n n
> > x86 n n
> > x86_64 y y
> > powerpc n n
> > powerpc64 n n
> >
> >
> > Reviewer Checklist:
> > -------------------
> > [Submitters: make sure that your review doesn't trigger any checkmarks!]
> >
> >
> > Your checkin has not passed review because (see checked entries):
> >
> > ___ Your RR template is generally incomplete; it has too many blank entries
> > that need proper data filled in.
> >
> > ___ You have failed to nominate the proper persons for review and push.
> >
> > ___ Your patches do not have proper short+long header
> >
> > ___ You have grammar/spelling in your header that is unacceptable.
> >
> > ___ You have exceeded a sensible line length in your
> > headers/comments/text.
> >
> > ___ You have failed to put in a proper Trac Ticket # into your commits.
> >
> > ___ You have incorrectly put/left internal data in your comments/files
> > (i.e. internal bug tracking tool IDs, product names etc)
> >
> > ___ You have not given any evidence of testing beyond basic build tests.
> > Demonstrate some level of runtime or other sanity testing.
> >
> > ___ You have ^M present in some of your files. These have to be removed.
> >
> > ___ You have needlessly changed whitespace or added whitespace crimes
> > like trailing spaces, or spaces before tabs.
> >
> > ___ You have mixed real technical changes with whitespace and other
> > cosmetic code cleanup changes. These have to be separate commits.
> >
> > ___ You need to refactor your submission into logical chunks; there is
> > too much content into a single commit.
> >
> > ___ You have extraneous garbage in your review (merge commits etc)
> >
> > ___ You have giant attachments which should never have been sent;
> > Instead you should place your content in a public tree to be pulled.
> >
> > ___ You have too many commits attached to an e-mail; resend as threaded
> > commits, or place in a public tree for a pull.
> >
> > ___ You have resent this content multiple times without a clear indication
> > of what has changed between each re-send.
> >
> > ___ You have failed to adequately and individually address all of the
> > comments and change requests that were proposed in the initial review.
> >
> > ___ You have a misconfigured ~/.hgrc file (i.e. username, email etc)
> >
> > ___ Your computer has a badly configured date and time; confusing
> > the threaded patch review.
> >
> > ___ Your changes affect IPC mechanism, and you don't present any results
> > for in-service upgradability test.
> >
> > ___ Your changes affect user manual and documentation, your patch series
> > do not contain the patch that updates the Doxygen manual.
> >
> >
> > _______________________________________________
> > Opensaf-devel mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/opensaf-devel
> >
> >
>