I know that 2.2.0 has been out for a while now, but it only became generally 
available for Enterprise customers this past week in the form of 2.2.1.  After 
reading the many fixes and enhancements in this release, I was eager to get it 
installed and try it out.  Let's just say that this upgrade has made me 
seriously consider bringing up a test VM the next time I upgrade so I don't 
mess with the production instance.  (Yes, yes, I know I should be doing this 
anyway.)  2.2.1 works, it definitely works; it's just a case of that old saying, 
"better the devil you know...".

Throughout the last couple days I have been writing down things I have seen 
with this update, and I am presenting them here for others to comment on.  Note 
that I have done no searching through the archives yet about any of these 
issues; this is solely a first-impressions post.  Since I am an Enterprise 
customer, some of these issues may end up as support cases if I can't find a 
resolution on the forums--hopefully the issues I am seeing are all easy fixes, 
though.  That would be nice. :-)

Ok, here's the list, straight off my wiki article about the upgrade:


== Issues After Upgrade ==
        Upgrading the Zenoss Enterprise ZenPack changes many Windows Services' 
zMonitor property from false to true (kdc, ntfrs, exchange*, sql*), even though 
I had explicitly disabled monitoring on those services (after the last time I 
upgraded the Enterprise ZenPack--sigh).  This even affects servers that are 
*not* running any of the services listed above.  For example, Zenoss will show 
that the ntfrs service is down on all servers, even though it is only installed 
and running on the domain controllers and two or three others.  Also, after 
changing all the relevant services back to "zMonitor = false", Zenoss still 
reports that these services are down on a few servers (yet not all of 
them--curious).  I had to go into each affected server, find the service, and 
change the zMonitor property from false to true, save, and then switch back to 
false (and save) in order to completely fix the problem.
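
        To find which servers need the toggle treatment without clicking 
through every one, something like the rough sketch below might help.  It does 
not talk to Zenoss at all: it assumes the service list has already been 
exported somehow into plain records, and the field names ("device", "service", 
"monitored") and the sample data are made up for illustration.  Only the 
service-name patterns come from the list above.

```python
from fnmatch import fnmatch

# Patterns for the services named above. Matching is done on the
# lowercased service name, so the patterns are written in lowercase.
UNWANTED_PATTERNS = ["kdc", "ntfrs", "exchange*", "sql*"]


def wrongly_monitored(records, patterns=UNWANTED_PATTERNS):
    """Return sorted (device, service) pairs that are still monitored
    even though the service name matches one of the patterns."""
    hits = []
    for rec in records:
        name = rec["service"].lower()
        if rec["monitored"] and any(fnmatch(name, p) for p in patterns):
            hits.append((rec["device"], rec["service"]))
    return sorted(hits)


# Hypothetical export, NOT real Zenoss output.
sample = [
    {"device": "dc01", "service": "NtFrs", "monitored": True},
    {"device": "web01", "service": "NtFrs", "monitored": True},
    {"device": "web01", "service": "W3SVC", "monitored": True},
    {"device": "db01", "service": "SQLSERVERAGENT", "monitored": False},
]

for device, service in wrongly_monitored(sample):
    print(device, service)
```

        The resulting list is just a checklist of device/service pairs to 
revisit by hand; it changes nothing in Zenoss itself.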

        "Threshold of zenwin cycle time exceeded" and "zenwin heartbeat 
failure" issues.  Could these be contributing to the next issue below?  Update: 
this is still happening two days after the upgrade.

        ~1000 events regarding "Wmi communication failure during connect"--and 
others--after the first zenmodeler (?) poll of Windows servers.  LOTS of email 
alerts generated (had paging disabled though).  Even more "Wmi communication 
failure during connect" events the day after the upgrade.  Continuing to 
monitor this issue.  Update:  two days after the upgrade there are 23 of these 
errors in the event console.  Either 2.2.1 broke something, or I didn't know 
the problem existed before because of poor reporting.  Also, if these errors 
are transient, then shouldn't they be warnings by default?  That way they'll go 
away after a few hours and it won't look like the entire datacenter blew up in 
a sea of orange.  I assume they are transient because the count on all of them 
is only 1.  Or are these more errors like RPC_S_CALL_FAILED that must be 
cleared before the servers will be monitored again?
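
        The "count is only 1" reasoning above can be sketched as a quick 
triage script.  This is only an illustration: it assumes the events have been 
exported into plain records, and the field names, the sample events, and the 
cutoff of 2 are all assumptions of mine, not anything Zenoss defines.

```python
from collections import defaultdict


def split_by_count(events, repeat_threshold=2):
    """Sort events into 'one_off' and 'repeating' buckets, grouping the
    affected devices under each event summary."""
    buckets = {"one_off": defaultdict(list), "repeating": defaultdict(list)}
    for ev in events:
        key = "repeating" if ev["count"] >= repeat_threshold else "one_off"
        buckets[key][ev["summary"]].append(ev["device"])
    # Convert the defaultdicts to plain dicts for easier printing/comparing.
    return {name: dict(groups) for name, groups in buckets.items()}


# Hypothetical export, NOT real data from my event console.
sample_events = [
    {"device": "web01", "summary": "Wmi communication failure during connect", "count": 1},
    {"device": "db01", "summary": "Wmi communication failure during connect", "count": 1},
    {"device": "mail01", "summary": "zenwin heartbeat failure", "count": 14},
]

result = split_by_count(sample_events)
for bucket, groups in result.items():
    for summary, devices in groups.items():
        print(bucket, "|", summary, "|", len(devices), "device(s)")
```

        Anything that lands in the one-off bucket is a candidate for being 
transient; it still doesn't answer whether such an event has to be cleared 
before monitoring resumes.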

        The zenwinmodeler daemon is no longer listed under the Daemons tab (and 
doesn't get started with a 'zenoss start' command), yet 
$ZENHOME/bin/zenwinmodeler still exists.  Is it still used, or should it be 
deleted?  Also, zenwinmodeler shows up as a component on certain events.  If 
it's no longer used, why are there still references to it?

        I didn't see anything in the install that said I needed to reset the 
Page Command to something useful.  More of a documentation issue, really.

        Why do the daemon "threshold" alerts insist on setting the device name 
to "localhost", even after I changed the "Hostname" value (found at 
/Monitors/Hub/localhost/localhost) to the actual name of the server?  Is there 
somewhere else to change this value?

        On a good note, I haven't yet seen any RPC_S_CALL_FAILED errors.  
Usually I get at least one per day (at 10:26am, if you can believe the 
regularity).  The fact that I haven't seen one yet makes me happy.


=== Modifications after upgrade ===
        changed "Process Parallel Jobs" from 10 to 20 to try to get a little 
better performance.  (8 cores, and 7 are bored all day long.  Hello, 
parallelism?)  Need to read up on this to make sure it does what I think it 
does.

        changed "Windows Modeler Cycle Interval" from 60s to 120s to try and 
alleviate the "cycle time exceeded" error.  (As noted above, this didn't seem 
to help.  And interestingly, yesterday I saw one cycle take 84 seconds and 
generate an "exceeded" error, even though the cycle time was 120.  WTF?)
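
        One way to reason about the 84-second case: if the "exceeded" check 
were still comparing against the old 60s value instead of the new 120s one, 
an 84s cycle would trip it.  That is purely a guess on my part, and the 
function below is a hypothetical stand-in for whatever comparison Zenoss 
really does; it only spells out the arithmetic.

```python
def cycle_exceeded(cycle_seconds, threshold_seconds):
    """Hypothetical check: a cycle 'exceeds' when it runs longer than the threshold."""
    return cycle_seconds > threshold_seconds


OBSERVED_CYCLE = 84   # seconds, the cycle I saw yesterday
OLD_INTERVAL = 60     # seconds, the value before my change
NEW_INTERVAL = 120    # seconds, the value after my change

# Compare the observed cycle against both candidate thresholds.
checks = {
    "old interval (60s)": cycle_exceeded(OBSERVED_CYCLE, OLD_INTERVAL),
    "new interval (120s)": cycle_exceeded(OBSERVED_CYCLE, NEW_INTERVAL),
}

for label, fired in checks.items():
    print(label, "-> exceeded" if fired else "-> ok")
```

        If that guess is right, the threshold lives somewhere other than the 
"Windows Modeler Cycle Interval" setting, or the daemon hadn't picked up the 
new value yet.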

        changed "Page Command" value; substituted "snpp.metrocall.com" in place 
of "localhost".



And there you go.  If anyone has any comments on anything above, please let me 
know!  Like I said, I'll be trawling through the forums over the next few days 
to see if any of these issues have been fixed.  For those that I do find fixes 
for, and any other issues that crop up and / or fix themselves, I'll update 
this thread as well.

Finally:  a big thank you to the Zenoss team for this release!  Other than the 
issues I outlined above, this looks like it will be a good step forward from 
2.1.3.

--

seth wright ([EMAIL PROTECTED])
windows engineer
540.568.2912 (office)
james madison university



