Jim - HF04 has been pulled due to an inordinate number of bugs!  If you can
roll back to H03 until HF04 is "fixed", I'd strongly advise it.

 

 

From: Pfleger, Jim [mailto:[email protected]] 
Sent: 26 May 2011 16:09
To: spectrum
Subject: [spectrum] H04 experiences (WAS: Problems with Spectrum 9.2 H03)

 

As promised, here's an update on the issues from my previous email - all
four were addressed in H04. I'm awaiting final confirmation from our
reporting guru on the SRM issue, but I believe it's been resolved. Also,
we're occasionally getting duplicate notifications again, but the debug
looks different from the last time this happened, so we're treating it as a
separate issue.

Because of these fixes (and others) we installed H04 as soon as it was
available, so I can provide some early feedback.

*       We've found two events (so far) that are no longer logged to the
events database (0x1010a and 0x1010d, for model attr changes). We're now
going to conduct a full audit of what other undocumented event changes came
with H04. 
*       It seems that our failover servers are sometimes being started as
root instead of ssadmin, which is mangling file permissions. The problem
started after H04, but it's not clear why yet, partly because it's only
happening on some of our failover servers. 
*       The negative alarm counts issue cited in the release notes is not
(completely) fixed, and we already have a new ticket open on it.
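
As a rough illustration of the kind of audit mentioned in the first bullet,
here is a minimal sketch. It assumes you can export the list of event codes
logged over comparable periods before and after the hotfix. The function name
and the sample lists are hypothetical (only 0x1010a, 0x1010d and 0x10701 come
from this thread), and this is not a CA-provided tool.

```python
def missing_event_codes(before, after):
    """Return event codes seen before the upgrade but absent afterwards.

    Both arguments are iterables of hex strings such as "0x1010a".
    """
    # Normalize to integers so "0x1010D" and "0x1010d" compare equal.
    before_set = {int(code, 16) for code in before}
    after_set = {int(code, 16) for code in after}
    return [hex(code) for code in sorted(before_set - after_set)]


# Hypothetical exports of distinct event codes from two comparable periods.
codes_before = ["0x1010a", "0x1010D", "0x10701"]
codes_after = ["0x10701"]

print(missing_event_codes(codes_before, codes_after))
# -> ['0x1010a', '0x1010d']
```

A code that shows up as missing may simply not have occurred in the later
period, so each hit would still need to be confirmed by triggering the event.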


That's all I have so far on H04. When I have anything else of note, I'll
definitely share it with the group.

HTH,
Jim


-- 
JIM PFLEGER  |  Application Architect  |  Insight  |  insight.com

t. 480.889.9680 f. 480.889.9599  [email protected]




On 3/28/11 11:22 AM, "Pfleger, Jim" <[email protected]> wrote:

I wanted to share with everyone some of the problems we've had with Spectrum
that we're currently working on with CA, in the hopes that knowing about them
will help out others. I don't know if they are specific to 9.2 H03, but I do
know that they all exist on this version.

*       SRM missing alarms. The SRM alarm table is missing alarms that the
SRM event and outage tables say should be there. We originally found this
when going through the outages table and trying to match up outages to
alarms so we could get their trouble ticket IDs. As we've continued to work
this, we've been able to craft a query that matches the number of 0x10701
events ("Alarm will be generated") with the number of alarms actually in SRM
for the same time period. If anyone would like to check the consistency of
their SRM databases, contact me directly for the query. CA has accepted this
as an issue and is working to determine a cause. 
*       Condition correlation engine does not resuppress. This one takes a
bit of explaining, so please bear with me. Suppose you have a simple network
like this:

SS --- A --- B

If B goes down, you will get several alarms, including "blade status
unknown", "chassis down", and "device not responding to polls". In our
environment, we have a condition correlation that will use "device not
responding" to suppress the other two, so the operators see one root cause,
and two hidden symptoms. Now if A goes down, it will suppress B.
Specifically, the "device not responding" alarm on B is cleared and
immediately replaced with an "all device connections are unreachable" alarm.
The problem is that, with the original "device not responding" alarm on B
now cleared, there is nothing to suppress the "blade status unknown" and
"chassis down" alarms, which now display to the operators. Logically, the
alarm on A that is suppressing B should also be used to resuppress all the
alarms on B, but this doesn't happen. CA has accepted this as an issue, and
created a patch that we're currently testing.

*       Notifier duplicating or not sending alarms. We've received two
different patches for Notifier - one was for it occasionally acting on an
alarm twice (about 1.5% of alarms), and the other was for it not acting on
an alarm at all (about 0.1% of alarms). These patches were merged together
into D89a.
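
The resuppression problem in the second bullet can be sketched as a toy
model. This is only an illustration of the behavior described, not how
Spectrum's correlation engine or CA's patch is implemented. The function and
variable names are hypothetical, and the model looks at device B alone (it
does not represent B's "unreachable" alarm itself being suppressed by A).

```python
DNR = "device not responding to polls"
BSU = "blade status unknown"
CD = "chassis down"
UNREACHABLE = "all device connections are unreachable"


def visible_alarms(active, rules):
    """Return the alarms an operator would see on one device.

    `rules` maps a root-cause alarm to the set of symptom alarms it hides.
    A symptom is hidden only while its root cause is itself active.
    """
    hidden = set()
    for root, symptoms in rules.items():
        if root in active:
            hidden |= symptoms
    return sorted(active - hidden)


# Observed: only "device not responding" suppresses the two symptoms.
observed_rules = {DNR: {BSU, CD}}
# Expected: the replacement alarm should suppress them as well.
expected_rules = {DNR: {BSU, CD}, UNREACHABLE: {BSU, CD}}

b_down = {DNR, BSU, CD}          # B fails on its own
a_down = {UNREACHABLE, BSU, CD}  # then A fails; DNR on B is replaced

print(visible_alarms(b_down, observed_rules))  # one root cause shown
print(visible_alarms(a_down, observed_rules))  # symptoms leak through
print(visible_alarms(a_down, expected_rules))  # symptoms hidden again
```

In the second call the two symptom alarms reappear because the only alarm
that was hiding them has been cleared, which matches what the operators see.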


These are the ones that I think will be of wide interest. When I have
substantial updates to share with the group, I will definitely do so.

If you're seeing other strange behaviors, or have any questions about these,
please contact me to discuss. I think that an open flow of information about
these sorts of issues benefits us all.

Jim


