Jim - HF04 has been pulled due to an inordinate number of bugs! If you can roll back to H03 until HF04 is "fixed" I'd strongly advise it.
From: Pfleger, Jim [mailto:[email protected]]
Sent: 26 May 2011 16:09
To: spectrum
Subject: [spectrum] H04 experiences (WAS: Problems with Spectrum 9.2 H03)

As promised, here's an update on the issues from my previous email - all four were addressed in H04. I'm awaiting final confirmation from our reporting guru on the SRM issue, but I believe it's been resolved. Also, we're occasionally getting duplicate notifications again, but the debug output looks different from the last time this happened, so we're treating it as a separate issue.

Because of these fixes (and others) we installed H04 as soon as it was available, so I can provide some early feedback.

* We've found two events (so far) that are no longer logged to the events database (0x1010a and 0x1010d, for model attr changes). We're now going to conduct a full audit of what other undocumented event changes came with H04.

* It seems that our failover servers are sometimes being started as root instead of ssadmin, which is mangling file permissions. The problem started after H04, but it's not clear why yet, partly because it's only happening on some of our failover servers.

* The negative alarm counts issue cited in the release notes is not (completely) fixed, and we already have a new ticket open on it.

That's all I have so far on H04. When I have anything else of note, I'll definitely share it with the group.

HTH,
Jim

--
JIM PFLEGER | Application Architect | Insight | insight.com
t. 480.889.9680 f. 480.889.9599
[email protected]

On 3/28/11 11:22 AM, "Pfleger, Jim" <[email protected]> wrote:

I wanted to share with everyone some of the problems we've had with Spectrum that we're currently working on with CA, in the hope that knowing about them will help out others. I don't know if they are specific to 9.2 H03, but I do know that they all exist on this version.

* SRM missing alarms. The SRM alarm table is missing alarms that the SRM event and outage tables say should be there. We originally found this when going through the outages table and trying to match up outages to alarms so we could get their trouble ticket IDs. As we've continued to work this, we've been able to craft a query that compares the number of 0x10701 events ("Alarm will be generated") with the number of alarms actually in SRM for the same time period. If anyone would like to check the consistency of their SRM databases, contact me directly for the query. CA has accepted this as an issue and is working to determine a cause.

* Condition correlation engine does not resuppress. This one takes a bit of explaining, so please bear with me. Suppose you have a simple network like this:

  SS --- A --- B

  If B goes down, you will get several alarms, including "blade status unknown", "chassis down", and "device not responding to polls". In our environment, we have a condition correlation that will use "device not responding" to suppress the other two, so the operators see one root cause and two hidden symptoms.

  Now if A goes down, it will suppress B. Specifically, the "device not responding" alarm on B is cleared and immediately replaced with an "all device connections are unreachable" alarm. The problem is that, with the original "device not responding" alarm on B now cleared, there is nothing to suppress the "blade status unknown" and "chassis down" alarms, which now display to the operators. Logically, the alarm on A that is suppressing B should also be used to resuppress all the alarms on B, but this doesn't happen. CA has accepted this as an issue, and created a patch that we're currently testing.

* Notifier duplicating or not sending alarms. We've received two different patches for Notifier - one was for it occasionally acting on an alarm twice (about 1.5% of alarms), and the other was for it not acting on an alarm at all (about 0.1% of alarms). These patches were merged together into D89a.

These are the ones that I think will be of wide interest.
When I have substantial updates to share with the group, I will definitely do so. If you're seeing other strange behaviors, or have any questions about these, please contact me to discuss. I think that an open flow of information about these sorts of issues benefits us all.

Jim
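
For anyone who finds the condition correlation issue easier to follow in code, here is a rough sketch of the SS --- A --- B scenario from the message above. It is only a toy model written for illustration, not Spectrum's correlation engine or its API: the Alarm class, the visible() and fail_upstream() helpers, and the resuppress flag are all hypothetical names. With resuppress=False it reproduces the reported behavior (the symptoms on B reappear once their suppressor is cleared); with resuppress=True it shows the behavior the message argues is logically correct.

```python
from dataclasses import dataclass
from typing import List, Optional

DNR = "device not responding to polls"
UNREACHABLE = "all device connections are unreachable"


@dataclass
class Alarm:
    """Hypothetical alarm record; not a Spectrum data structure."""
    device: str
    title: str
    cleared: bool = False
    suppressed_by: Optional["Alarm"] = None


def visible(alarms: List[Alarm]) -> List[str]:
    """Alarms an operator would see: active, with no active suppressor."""
    return sorted(
        f"{a.device}: {a.title}"
        for a in alarms
        if not a.cleared
        and (a.suppressed_by is None or a.suppressed_by.cleared)
    )


def b_down() -> List[Alarm]:
    """B goes down: the DNR alarm suppresses the two symptom alarms."""
    root = Alarm("B", DNR)
    return [
        root,
        Alarm("B", "blade status unknown", suppressed_by=root),
        Alarm("B", "chassis down", suppressed_by=root),
    ]


def fail_upstream(alarms: List[Alarm], upstream: str, downstream: str,
                  resuppress: bool) -> None:
    """Upstream device goes down and suppresses the downstream device."""
    new_root = Alarm(upstream, DNR)
    alarms.append(new_root)
    # The downstream DNR alarm is cleared and replaced by "unreachable".
    old_root = next(a for a in alarms
                    if a.device == downstream and a.title == DNR
                    and not a.cleared)
    old_root.cleared = True
    alarms.append(Alarm(downstream, UNREACHABLE, suppressed_by=new_root))
    if resuppress:
        # Expected behavior: hand the orphaned symptoms to the new root.
        for a in alarms:
            if a.device == downstream and a.suppressed_by is old_root:
                a.suppressed_by = new_root


if __name__ == "__main__":
    for flag in (False, True):
        alarms = b_down()
        fail_upstream(alarms, "A", "B", resuppress=flag)
        print(f"resuppress={flag}: {visible(alarms)}")
```

In the resuppress=False run the operator view contains the alarm on A plus the two symptom alarms on B, which matches what was reported; in the resuppress=True run only the alarm on A remains visible.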
