[EMAIL PROTECTED] wrote on 09/12/2007 04:51:38 PM:
> - A shelf manager with IPMI service (in fact, there are 2
> shelf managers, but the second one is in standby mode)
> - 2 blades, each running an openhpi daemon, and each daemon
> connected to the shelf manager to retrieve IPMI data
> - An SMS connected to OpenHPI through libopenhpi.so
>
>
> Probably using a Master/Slave model, I think that both daemons
> should keep their data in sync so that either one can continue to
> provide HPI service if the other HPI daemon disappears. Right? How
> hard do you think it would be to keep the necessary data in sync? We
> would probably add a data pipe between the 2 daemons to carry
> synchronization data and a heartbeat. Should we also sync some files
> (Domain Event Log, etc.)?
>
> What do you think about this? I am trying to evaluate how much
> effort would be required of us to add these features (so that I
> can get the go-ahead).
I think that if both domains, in a peer relationship, are accessing
the same hardware through a plugin, you don't need to take any steps to
keep the domains in sync other than what is specified in the spec as the
HPI user's responsibility (e.g. HPI B.02.01 spec: Peer Domains, page 23;
last paragraph, page 83).
Now, peer domains do not require:
- that the same resources (by Resource ID) have the same tag, severity,
failed flag, or entry ID (order in the RPT).
- that events be received in the same order.
- that internal domain events (Domain Event Log) and user events show
up in the other domain.
- the same alarm IDs or acknowledge status between DATs.
- the same user alarms.
- that alarms raised because of domain conditions appear in the other
domain.
You probably want some or all of these things if you are talking about
master/slave and heartbeats. In that case, you don't necessarily need a
peer domain (second paragraph, page 25, spec); a master/slave
configuration could work. libopenhpi.so could switch between the two
transparently if it loses its connection with one of them. To keep the
domains synchronized, the slave domain could act like a client and get
all its information from the master domain. When libopenhpi.so needs to
switch to the slave domain, it could send an extra message telling the
slave that it must now get its information from the plugin instead of
from the master domain. Or this could be automatic, if we let the slave
switch to the plugin as soon as it detects connection loss with the
master. Later, the slave keeps polling the master and stops answering HPI
requests from libopenhpi.so once it finds that the master is back up.
That would make libopenhpi.so switch back to the master... something
like that.
An effort like this would take a good chunk of time. The main catch is
that it first requires having each domain live within its own daemon,
which is another good chunk of time (possibly more). I'm probably going
to be doing the latter next year.
In my personal opinion, this level of failover redundancy, which goes
above and beyond the HPI spec, is best done in higher-level middleware
above the HPI layer. For example, Linux-HA supports a STONITH plugin for
the BladeCenter that uses OpenHPI.
That said, although the primary goal of OpenHPI is to implement and
comply with the SAF HPI spec, its secondary goals are guided by its
volunteers and their interests. So I would support any contributions,
as long as they are clean and integrate well.
--Renier
_______________________________________________
Openhpi-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/openhpi-devel