There have been times (and sorry, I don't have the emails from SIM/Zenoss
anymore) when I got notifications of hardware issues from SIM but not from
Zenoss. One example is an HP MSA500 attached to an HP DL380 with all the proper
agents installed. SIM notified me:
Event Identification and Details
  Event Severity:           Critical
  Cleared Status:           Not cleared
  Event Source:             SERVER
  Associated System:        SERVER
  Associated System Status: Normal
  Event Time:               Tue, 10/16/2007, 8:26 PM PDT
  Description:              Logical Drive Status Change. This trap signifies
                            that the agent has detected a change in the status
                            of a drive array logical drive. The variable
                            cpqDaLogDrvStatus indicates the current logical
                            drive status.
  Assignee:
  Comments:
Trap Details
Variable Description / Value

An administratively-assigned name for this managed node. By convention, this is
the node's fully-qualified domain name.
  Value: SERVER

The Trap Flags. This is a collection of flags used during trap delivery. Each
bit has the following meaning:
  Bit 5-31: RESERVED: Always 0.
  Bit 2-4:  Trap Condition
            0= Not used (for backward compatibility)
            1= Condition unknown or N/A
            2= Condition ok
            3= Condition degraded
            4= Condition failed
            5-7= reserved
  Bit 1:    Client IP address type 0= static entry 1= DHCP entry
  Bit 0:    Agent Type 0= Server 1= Client
NOTE: bit 31 is the most significant bit, bit 0 is the least significant.
  Value: 16

A text description of the hardware location of the controller. A NULL string
indicates that the hardware location could not be determined or is irrelevant.
  Value: Slot 3

Drive Array Logical Drive Controller Index. This maps the logical drives into
their respective controllers. Controller index 'i' under the controller group
owns the associated drives in the logical drive group which use that index.
  Value: 3

Drive Array Logical Drive Index. This logical drive number keeps track of
multiple instances of logical drives which are on the same controller. For each
controller index value, the logical drive index starts at 1 and increments for
each logical drive.
  Value: 2

Logical Drive Status. The logical drive can be in one of the following states:
  Ok (2) Indicates that the logical drive is in normal operation mode.
  Failed (3) Indicates that more physical drives have failed than the fault
    tolerance mode of the logical drive can handle without data loss.
  Unconfigured (4) Indicates that the logical drive is not configured.
  Recovering (5) Indicates that the logical drive is using Interim Recovery
    Mode. In Interim Recovery Mode, at least one physical drive has failed, but
    the logical drive's fault tolerance mode lets the drive continue to operate
    with no data loss.
  Ready Rebuild (6) Indicates that the logical drive is ready for Automatic
    Data Recovery. The physical drive that failed has been replaced, but the
    logical drive is still operating in Interim Recovery Mode.
  Rebuilding (7) Indicates that the logical drive is currently doing Automatic
    Data Recovery. During Automatic Data Recovery, fault tolerance algorithms
    restore data to the replacement drive.
  Wrong Drive (8) Indicates that the wrong physical drive was replaced after a
    physical drive failure.
  Bad Connect (9) Indicates that a physical drive is not responding.
  Overheating (10) Indicates that the drive array enclosure that contains the
    logical drive is overheating. The drive array is still functioning, but
    should be shutdown.
  Shutdown (11) Indicates that the drive array enclosure that contains the
    logical drive has overheated. The logical drive is no longer functioning.
  Expanding (12) Indicates that the logical drive is currently doing Automatic
    Data Expansion. During Automatic Data Expansion, fault tolerance algorithms
    redistribute logical drive data to the newly added physical drive.
  Not Available (13) Indicates that the logical drive is currently unavailable.
    If a logical drive is expanding and the new configuration frees additional
    disk space, this free space can be configured into another logical volume.
    If this is done, the new volume will be set to not available.
  Queued For Expansion (14) Indicates that the logical drive is ready for
    Automatic Data Expansion. The logical drive is in the queue for expansion.
  Value: failed
Mib Information
The associated MIB File Name for this trap is cpqida.mib and the MIB identifier
CPQIDA-MIB
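
For what it's worth, the Trap Flags value of 16 in that notification can be
decoded by hand from the bit layout quoted in the trap text. Here is a minimal
Python sketch of that decoding, assuming only the layout and status codes shown
in the notification. The function and dictionary names are made up for
illustration; they are not part of SIM, Zenoss, or the HP agents.

```python
# Bit layout and status codes are copied from the trap text in the SIM
# notification. All names below are hypothetical, chosen for illustration.

# Bits 2-4 of the Trap Flags: trap condition (5-7 are reserved).
TRAP_CONDITIONS = {
    0: "not used",
    1: "unknown or N/A",
    2: "ok",
    3: "degraded",
    4: "failed",
}

# Logical Drive Status codes as listed in the trap description.
LOG_DRV_STATUS = {
    2: "Ok",
    3: "Failed",
    4: "Unconfigured",
    5: "Recovering",
    6: "Ready Rebuild",
    7: "Rebuilding",
    8: "Wrong Drive",
    9: "Bad Connect",
    10: "Overheating",
    11: "Shutdown",
    12: "Expanding",
    13: "Not Available",
    14: "Queued For Expansion",
}


def decode_trap_flags(flags):
    """Split the Trap Flags integer into its documented fields."""
    return {
        # Bit 0: agent type (0 = server, 1 = client)
        "agent_type": "client" if flags & 0x1 else "server",
        # Bit 1: client IP address type (0 = static, 1 = DHCP)
        "ip_type": "dhcp" if (flags >> 1) & 0x1 else "static",
        # Bits 2-4: trap condition
        "condition": TRAP_CONDITIONS.get((flags >> 2) & 0x7, "reserved"),
    }


if __name__ == "__main__":
    print(decode_trap_flags(16))
    print(LOG_DRV_STATUS[3])
```

With a flags value of 16, bits 2-4 come out as 4, i.e. condition "failed",
which agrees with the "failed" logical drive status reported in the same trap.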
I have email from Zenoss from that same day in my deleted items folder, but
nothing about this server (it is possible that I permanently deleted just the
Zenoss messages for it...unfortunately I can't remember, and Zenoss didn't
keep a record back to 10/16).
I'm also still concerned about the WMI RPC and snmp_authenticationFailure
errors. I don't have these issues with SIM, but they've started to bite in the
Zenoss 2.1 release. I don't necessarily see any problems caused by the errors,
but I may (or may not!) be missing something because of them.
I do like that HP SIM doesn't send CLEAR emails. There are events I really
don't care about seeing CLEARs for. This is purely cosmetic, but I also like
HP SIM's HTML email versus plain text. Last two things...native, no-hassle
Active Directory authentication (I haven't even tried with Zenoss/Zope) and
brain-dead (hey, I'm a Windows admin after all) deployment/configuration of
OpenSSH/PegasusWMI. I wouldn't mind installing these two items manually, but
I would not want to configure them manually.
Besides that, you know, it may just be because Zenoss is new...I know how SIM
functions and what to expect from it. With Zenoss, sometimes I don't. IOW, I
do a lot more troubleshooting with Zenoss than I do with SIM (which is
basically 0).