While looking at my dtlog today, I noticed that snmpdiskspace.monitor
(both the official release and ours) dumps the header line into
last_summary instead of the failed hosts when -list or -listall is
specified.

In the process of fixing that,
...
What follows applies to our release only.
...
I found that some of my hosts were not reporting partitions, although
all the SNMP data was there.
While investigating that, I found the following and wanted to bounce it
off the list before releasing it.

In the hostmib processing code, right after the comment

# ignore this instance
# and try to move on
# to next we wouldn't
# need this if
# use-dummy-values
# really worked

If I add
$instancenum = $instancenum - 1;

I can recover from the error and get the data.

What I found is that it's usually not the current instance of the
dataset that is corrupted. Usually a previous dataset was incomplete,
which skews the instance numbers. If we decrement the instance number,
we can recover the current dataset.

That one-line fix, however, could cause a show stopper: if the previous
incomplete dataset's instance number is numerically right before the
current instance number, decrementing would cause the code to loop
infinitely.
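
To make the trade-off concrete, here is a toy Perl model. It is NOT the
actual hostmib code in snmpdiskspace.monitor; it only assumes that the
storage table is read one column at a time with getnext-style requests,
using $instancenum as the cursor. getnext, walk, %descr and %used are
hypothetical stand-ins for the SNMP session and its data. A row missing
from one column skews the instance numbers; stepping back one recovers
the next good row, and the guard only steps back when the cursor still
moves forward, so a bad row right after the cursor is never re-fetched
forever.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an SNMP getnext on one table column:
# return (instance, value) for the lowest instance greater than
# $after, or an empty list at the end of the column.
sub getnext {
    my ($col, $after) = @_;
    my ($next) = sort { $a <=> $b } grep { $_ > $after } keys %$col;
    return defined $next ? ($next, $col->{$next}) : ();
}

# Walk two columns in parallel and return the instances whose rows were
# read completely. $step_back turns the one-line decrement fix on/off.
sub walk {
    my ($descr, $used, $step_back) = @_;
    my @found;
    my $instancenum = 0;    # getnext cursor
    while (1) {
        my ($di, $dv) = getnext($descr, $instancenum);
        my ($ui, $uv) = getnext($used,  $instancenum);
        last unless defined $di && defined $ui;
        my $inst = $di > $ui ? $di : $ui;    # furthest instance returned
        if ($di != $ui || $dv eq '' || $uv eq '') {
            # ignore this instance and try to move on
            my $prev = $instancenum;
            $instancenum = $inst;
            # Step back one so $inst is re-read, but only if the cursor
            # still advances; otherwise the same bad row loops forever.
            $instancenum = $inst - 1 if $step_back && $inst - 1 > $prev;
            next;
        }
        push @found, $di;
        $instancenum = $di;
    }
    return @found;
}

# Hypothetical data: row 2 is missing from the "used" column.
my %descr = (1 => '/', 2 => '/var', 3 => '/home', 4 => '/tmp');
my %used  = (1 => 10, 3 => 30, 4 => 40);
print join(',', walk(\%descr, \%used, 0)), "\n";    # prints 1,4
print join(',', walk(\%descr, \%used, 1)), "\n";    # prints 1,3,4
```

Without the step-back, instance 3 is lost even though its data is
complete; with it, instance 3 is recovered. Because the cursor strictly
increases on every pass through the loop, the walk always terminates,
even when the bad row sits right after the cursor.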

I tested this by setting the instance number to one that was causing
us problems, instead of decrementing.

Here, decrementing the instance number works fine.
The question is: what is the risk of it breaking stuff elsewhere?



-- 
Sincerely,

Nathan Gibbs

Systems Administrator
Christ Media
http://www.cmpublishers.com


_______________________________________________
mon mailing list
mon@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/mon
