Re: [Users] Problems starting up hisv (HPI) service

Murthy E-G19462 Mon, 03 Dec 2007 21:34:12 -0800

Michael,

The main functionality of the HISv service is to:

1. Receive HPI events by opening a session with the underlying HPI daemon.
2. Provide a mechanism to reset the blades.

You can check the HISv log to see whether all the FRUs are detected.
Also try administrative triggers, such as removing a blade (if you have
other blades in the system).
When a hot-swap event is triggered, HISv publishes the event to all
subscribers.
You can check this information in the HISv and AvM logs.
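
To make the publish step concrete, here is a minimal sketch in Python. It
is only a toy model of the fan-out described above, not HISv or HPI code:
the names HotSwapEvent and EventPublisher, the "blade-3" identifier, and
the state string are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class HotSwapEvent:
    """Hypothetical event record; real HPI events carry far more detail."""
    entity: str  # made-up blade identifier
    state: str   # illustrative hot-swap state name


class EventPublisher:
    """Toy stand-in for the publish step: one event goes to every subscriber."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[HotSwapEvent], None]] = []

    def subscribe(self, callback: Callable[[HotSwapEvent], None]) -> None:
        # Register a callback that will receive every published event.
        self._subscribers.append(callback)

    def publish(self, event: HotSwapEvent) -> int:
        # Deliver the event to each subscriber; return how many received it.
        for callback in self._subscribers:
            callback(event)
        return len(self._subscribers)


# Two subscribers that simply record what they receive, standing in for
# the log entries you would look for after pulling a blade.
avm_log: List[HotSwapEvent] = []
other_log: List[HotSwapEvent] = []

publisher = EventPublisher()
publisher.subscribe(avm_log.append)
publisher.subscribe(other_log.append)

delivered = publisher.publish(HotSwapEvent("blade-3", "NOT_PRESENT"))
print(delivered, avm_log)
```

The point of the sketch is the check itself: after one trigger, every
subscriber should show the same event, which is what you would expect to
see mirrored in the HISv and AvM logs.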

Thanks
Murthy

 

> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of Bishop, 
> Michael (OSLO R&D)
> Sent: Monday, December 03, 2007 11:42 PM
> To: [email protected]
> Subject: Re: [Users] Problems starting up hisv (HPI) service
> 
> Thanks to Scott Peterson, I now have the hisv service up and 
> running on my 2 OpenSAF 1.0-4 controllers.
> 
> Can anyone recommend the best way to test the hisv service, to make
> sure it is properly communicating with my OpenHPI daemon process,
> which is also running on my controller nodes?
> 
> I've noticed that there is an hisv_demo.c program under 
> services/hisv/hcd - but it does not appear to get built via 
> the provided Makefile.
> 
> Any suggestions on how to test the hisv service would be appreciated.
> 
> Regards,
> Michael Bishop
> Open Source & Linux Organization (OSLO)
> Hewlett-Packard Company
> 3404 E. Harmony Rd.  Bldg. 5L, Post C8,  Mailstop 42
> Fort Collins, CO  80528-9599
> Phone: 970-898-4393
> E-Mail: [EMAIL PROTECTED]
> 
> 
> 
> 
> > -----Original Message-----
> > From: Petersen Scott-P27052 [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, November 29, 2007 4:06 PM
> > To: Bishop, Michael (OSLO R&D); [email protected]
> > Subject: RE: [Users] Problems starting up hisv (HPI) service
> >
> >
> > Hi Michael,
> >
> > I looked at your BOM file and it appears that not all of the sections
> > were uncommented for the HISV component. I uncommented those sections
> > and the updated file is attached.
> >
> > Hope this helps
> >
> > Scott G. Petersen
> > System Validation
> > Motorola, Inc.
> > Embedded Communications Computing
> > 2900 S Diablo Way
> > Tempe, AZ 85282
> > Phone: 602-438-3471
> > Cell: 480-600-6964
> > Text: [EMAIL PROTECTED]
> >
> >
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Bishop, Michael
> > (OSLO R&D)
> > Sent: Thursday, November 29, 2007 3:51 PM
> > To: [email protected]
> > Subject: [Users] Problems starting up hisv (HPI) service
> >
> > Greetings.
> >
> > I have OpenSAF 1.0-4 controllers, and I have an instance of OpenHPI 
> > running (HPI-B.01.01  OpenHPI version 2.8.1).
> >
> > I built the controllers with hw_mgmt=2.  I do have an ncs_hisv 
> > executable in /opt/opensaf/controller/bin.
> >
> > I cannot seem to get the hisv service to start properly.  Both
> > controllers get stuck in SCAP.  If I Ctrl-C out of SCAP and look at
> > the logs, there is no indication that hisv is attempting to start.
> >
> > I'm assuming that my problem most likely lies in my NCSSystemBOM.xml,
> > which I've attached.  I've uncommented the config lines pertaining to
> > HISV and hisv, as documented in earlier e-mails.
> >
> > Thanks for any help or suggestions.
> >
> > Regards,
> > Michael Bishop
> > Open Source & Linux Organization (OSLO) Hewlett-Packard Company
> > 3404 E. Harmony Rd.  Bldg. 5L, Post C8,  Mailstop 42
> > Fort Collins, CO  80528-9599
> > Phone: 970-898-4393
> > E-Mail: [EMAIL PROTECTED]
> >
> >
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED] On Behalf Of Hans Feldt
> > > Sent: Thursday, November 29, 2007 3:03 AM
> > > To: Gudipalli S.-G19449
> > > Cc: [email protected]
> > > Subject: Re: [Users] Controller HA mechanisms
> > >
> > >
> > >
> > > What do you mean by "configure NID config"?
> > >
> > > Change nid so that it stays alive and supervises its children?
> > >
> > > Thanks,
> > > Hans
> > >
> > > Gudipalli S.-G19449 wrote:
> > > > Hi Hans,
> > > >
> > > >
> > > >    In a system, the controller nodes that have RDE/SCAP will power
> > > > on by themselves.
> > > >
> > > > Case both blades in the box, the power to the box is applied:
> > > > -------------------------------------------------------------
> > > >
> > > > Sub case1:
> > > > ----------
> > > > RDE on node 1 becomes active.
> > > > SCAP on node 1 starts but fails to complete Init successfully.
> > > >
> > > > We expect the platform vendor porting OpenSAF to configure the NID
> > > > config to reboot the node on failure, or to have their platform
> > > > mechanisms do that for them.
> > > >
> > > > Sub case2:
> > > > ----------
> > > > RDE on node 1 becomes active.
> > > > SCAP on node 1 starts and completes Init successfully.
> > > > Immediately afterwards it crashes.
> > > >
> > > > Since both nodes, node 1 and node 2, were in the box when the power
> > > > was applied, we expect that, given small variations in boot times,
> > > > node 2 will be at SCAP initialization before node 1 is successfully
> > > > initialized. Since the other RDE/SCAP is there, this situation is
> > > > also solved.
> > > >
> > > > Case one blade in the box, the power to the box is applied:
> > > > ----------------------------------------------------------------
> > > >
> > > > Sub case1:
> > > > ----------
> > > > RDE on node 1 becomes active.
> > > > SCAP on node 1 starts but fails to complete Init successfully.
> > > >
> > > > We expect the platform vendor porting OpenSAF to configure the NID
> > > > config to reboot the node on failure, or to have their platform
> > > > mechanisms do that for them.
> > > >
> > > > Sub case2:
> > > > ----------
> > > > RDE on node 1 becomes active.
> > > > SCAP on node 1 starts and completes Init successfully.
> > > > Immediately afterwards it crashes.
> > > >
> > > > This is a double-fault case; a manual repair (restarting this single
> > > > node) is required. If the platform is normally run like this, then
> > > > the platform vendor can have their fault manager track SCAP and, on
> > > > its death, take the necessary recovery/repair actions.
> > > >
> > > > Regards
> > > > Sugadeesh
> > > >
> > > >> -----Original Message-----
> > > >> From: [EMAIL PROTECTED]
> > > >> [mailto:[EMAIL PROTECTED] On Behalf Of Hans Feldt
> > > >> Sent: Thursday, November 29, 2007 2:43 AM
> > > >> To: Saha Sayandeb-G19428
> > > >> Cc: [email protected]
> > > >> Subject: Re: [Users] Controller HA mechanisms
> > > >>
> > > >>
> > > >>
> > > >>> -----Original Message-----
> > > >>> From: Saha Sayandeb-G19428 [mailto:[EMAIL PROTECTED]
> > > >>> Sent: den 28 november 2007 18:23
> > > >>> To: Hans Feldt
> > > >>> Cc: [email protected]
> > > >>> Subject: RE: [Users] Controller HA mechanisms
> > > >>>
> > > >>> Hans,
> > > >>>
> > > >>> Comments below ...
> > > >>>
> > > >>>> How does OpenSAF handle the following scenario:
> > > >>>>
> > > >>>> - Controller 1 (C1) power on
> > > > >>>> - C1 RDE starts and decides to be active since it is alone in the
> > > > >>>> cluster
> > > >>>> - C1 PSR or AMF dies due to some reason
> > > >>>> - Controller 2 (C2) power on
> > > >>>> - C2 RDE starts and gets the role standby from RDE on C1
> > > >>>> - C2 waits forever to get synced from C1
> > > >>>>
> > > >>>> Some issues:
> > > >>>> C1 RDE claims to be active although it is not
> > > >>>> C1 does not reboot
> > > > >>>> C2 does not reboot when it loses contact with the active controller
> > > > >>>> and is not in sync.
> > > >>>> C2 cannot become active if we reboot C1
> > > >>>>
> > > >>>> Comments?
> > > > >>> [SS] I simulated this condition quite easily by simply killing the
> > > > >>> ncs_scap process in the one and only active controller and then
> > > > >>> running the get_ha_state command. As you say, the RDE in this
> > > > >>> controller still keeps thinking that it is active, which prevents
> > > > >>> the second controller from obtaining the active state. So this is a
> > > > >>> hole, as the RDE has no clue that the AvD+AvM has crashed. I guess
> > > > >>> we could add a role heartbeat from the AvD+AvM to the RDE to ensure
> > > > >>> that the RDE is always in sync with what's going on and can
> > > > >>> relinquish the active state so that the other controller can become
> > > > >>> active under such a circumstance.
> > > > >>> But this whole scenario of having only one controller which crashes
> > > > >>> and then the second one that tries to come up is probably not so
> > > > >>> common. Or do you think it will be, because of the way OpenSAF
> > > > >>> waits 3 minutes before rebooting payload blades when AvD goes down?
> > > > >> No, I just stumbled on this since we're doing a lot of power on/off
> > > > >> of controllers and fail-overs at the moment.
> > > > >>
> > > > >> As a solution, what if nid stays alive and supervises its children?
> > > > >> If rde or scap dies, nid reboots the system.
> > > >>
> > > >> Cheers,
> > > >> Hans
> > > >>
> > > >>> Sayan
> > > >>>
> > > >>>> Regards,
> > > >>>> Hans
> > > >>>> _______________________________________________
> > > >>>> Users mailing list
> > > >>>> [email protected]
> > > >>>> http://list.opensaf.org/maillist/listinfo/users
> > > >>>>
> > > >>
> > > >
> > >
> > >
> >
> 
