Re: [Openhpi-devel] Regarding bug 1794430

Renier Morales Thu, 15 May 2008 16:58:50 -0700

[EMAIL PROTECTED] wrote on 05/15/2008 01:15:12 PM:

> Hi,
> 
> While trying to analyze the second half (duplicate alarms) of the 
> bug 1794430, I got stuck with a basic question.
> Why do we need to store the DAT entries in persistent memory?
> 
> Please correct me if I'm wrong.
> 
> According to the SAF HPI spec, the DAT contains entries for active alarms.
> The DAT stores the active alarms and deletes them when the alarms are cleared.
> 
> Below is an extract from the SAF HPI B.02.01 spec (section 6.6):
> The domain controller maintains a Domain Alarm Table (DAT) which 
> contains entries for each active alarm in the domain. Alarms are 
> added to and deleted from the DAT by the HPI implementation as the 
> presence or absence of the corresponding conditions are detected by 
> the domain controller.
> 
> Storing the entries in persistent memory is required for maintaining 
> the history.
> This is true for the Domain Event Log (DEL) but not for the DAT.


It is neither incorrect nor correct to provide persistence for alarms. More 
on this below.

> 
> I'll try to explain with examples.
> Scenario 1
> -------------
> 1. The user has enabled the option of saving DAT entries to persistent 
> memory.
> 2. The openhpi daemon is started.
> 3. A resource, say R1, reports an alarm related to a temperature sensor.
> 4. An entry is created in the DAT and the same is stored in the DAT file.
> 5. The openhpi daemon is brought down (for any reason).
> 6. Before the openhpi daemon is restarted, the problem on resource R1 is 
> resolved and the temperature alarm is cleared.
> 7. openhpi daemon is restarted. 
>    Domain controller (openhpi daemon) reads the DAT file and creates
> an alarm entry for the temperature sensor in the DAT. 
> 
> This entry is wrong, as the temperature sensor alarm of resource R1
> is not active anymore.
> Since the domain controller added these alarm entries by reading the DAT file,
> the openhpi plugin (which is managing resource R1) will not be aware of 
> these entries.
> 
> Hence, alarm entries whose condition has already cleared will never get 
> deleted from the DAT.

In this case the HPI user has to at least acknowledge the alarm and look 
only at unacknowledged alarms.
The truth is that the HPI daemon won't restart very often, and if it does, 
it should come back up quickly. The probability that an event which 
deasserts the alarm condition arrives just while the daemon is down is very 
low. However, I'm not sure what the solution is in this edge case. Maybe the 
daemon should somehow ask the plugins about the validity of alarms that were 
read in on startup.
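
To make the "ask the plugins" idea concrete, here is a minimal sketch. It is 
written in Python for brevity rather than the daemon's C, and every name in 
it (Alarm, the per-resource check callbacks, reconcile_restored_alarms) is 
hypothetical. It is not OpenHPI code or an existing plugin interface, just 
one way the startup check could look.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Alarm:
    # Hypothetical, simplified stand-in for a DAT entry.
    resource_id: int
    sensor_num: int
    severity: str


# Hypothetical plugin hook: returns True if the alarm condition is still asserted.
StillAsserted = Callable[[Alarm], bool]


def reconcile_restored_alarms(
    restored: List[Alarm], plugins: Dict[int, StillAsserted]
) -> List[Alarm]:
    """Keep only the restored alarms whose owning plugin still confirms them."""
    kept = []
    for alarm in restored:
        check = plugins.get(alarm.resource_id)
        # If no plugin can vouch for the alarm, keep it: a stale alarm that
        # the user can acknowledge is safer than a missing one.
        if check is None or check(alarm):
            kept.append(alarm)
    return kept
```

Keeping alarms that no plugin can check is a deliberate choice in this 
sketch, in line with preferring a stale alarm over a lost one.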

> 
> Scenario 2
> ------------
> Let us take the same situation as explained in scenario 1 till step 5.
> 
> 6. The alarm condition in resource R1 still persists.
> 7. The openhpi daemon is restarted. 
>    Domain controller (openhpi daemon) reads the DAT file and creates
> an alarm entry for the temperature sensor in the DAT. 
> 8. Since the openhpi plugin (which is managing resource R1) is 
> not aware of the alarm added by the domain controller, the plugin 
> detects the temperature sensor alarm condition and reports the same into 
> the DAT.
> 
> Scenario 2 is the root cause of the duplicate alarms reported in bug 
> 1794430.

Duplication of alarms can happen and I don't see it as a problem if they 
are valid, that is, if the hardware reported an asserted alarm condition 
for the same instrument or resource on more than one occasion. That just 
means that HPI is reflecting exactly what the hardware is saying.
Once the daemon detects that the alarm condition is deasserted, all 
matching alarms (same instrument or resource) will be removed in one 
swoop.
Code can be inserted to scan the DAT before adding an alarm to avoid 
duplication if it's a problem for others, but I don't see it as a major 
problem.
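
For illustration, a rough sketch of both behaviours: scanning before an add, 
and removing every match on deassert. It is in Python rather than the 
daemon's C, and the Dat and DatEntry types and their methods are 
hypothetical, not OpenHPI's actual data structures. Matching on resource and 
sensor number is an assumption; a real check might also compare severity or 
event state.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatEntry:
    # Hypothetical, simplified alarm entry.
    resource_id: int
    sensor_num: int
    severity: str


@dataclass
class Dat:
    entries: List[DatEntry] = field(default_factory=list)

    def add_alarm(self, resource_id: int, sensor_num: int, severity: str) -> bool:
        """Add an alarm unless one for the same resource and sensor exists."""
        for entry in self.entries:
            if (entry.resource_id, entry.sensor_num) == (resource_id, sensor_num):
                return False  # duplicate, nothing added
        self.entries.append(DatEntry(resource_id, sensor_num, severity))
        return True

    def clear_alarms(self, resource_id: int, sensor_num: int) -> int:
        """Remove every matching alarm at once; return how many were removed."""
        before = len(self.entries)
        self.entries = [
            entry
            for entry in self.entries
            if (entry.resource_id, entry.sensor_num) != (resource_id, sensor_num)
        ]
        return before - len(self.entries)
```

With a scan like this in add_alarm, the alarm restored from the DAT file in 
Scenario 2 would absorb the plugin's later report instead of being 
duplicated.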

> 
> I'm not able to find a reference in SAF HPI spec for storing the DAT
> entries in persistent memory.
> 

The persistence of the DAT to disk is a feature of the HPI implementation. 
You won't find this in the specification.

Scenario 3
----------
1. Daemon is running and logs alarm conditions.
2. Daemon is brought down (node is recycled).
3. Hardware alarm condition is still asserted.
4. Daemon comes back up and reads the persisted DAT on startup.
5. As a result, the alarm is restored to the domain's live DAT.

Without persistence the daemon would have lost all record that an alarm 
had been asserted. It's better to acknowledge alarms that you know you or 
the hardware have already dealt with than to live with missing alarms.

        --Renier


