Hello,

Nobody likes to be awakened at night by an urgent call from the NOC about some boring Squid cache.log message that the NOC folks have not seen before (or to miss a critical message that the monitoring system ignored). To facilitate automatic monitoring of Squid cache.log files, I suggest adjusting Squid code to divide all level-0/1 messages into two major categories -- "problem messages" and "status messages"[0]:

* Problem messages are defined as those that start with one of the three well-known prefixes: FATAL:, ERROR:, and WARNING:. These are the messages that most admins may want to be notified about (by default[1]), and these standardized prefixes make setting up reliable automated notifications straightforward.

* Status messages are all other messages. Most admins do not want to be notified about normal Squid state changes and progress reports (by default[2]). These status messages are still valuable in triage, so they are _not_ going away[3].

Today, Squid does not support the problem/status message classification well. To reach the above state, we will need to adjust many messages so that they fall into the right category. However, my analysis of the existing level-0/1 messages shows that it is feasible to classify most of them correctly without a lot of tedious work (all numbers and prefix strings below are approximate and abridged for clarity):

* About 40% of messages (~700) already have "obvious" prefixes: BUG:, BUG [0-9]*:, ERROR:, WARNING:, and FATAL:. We will just need to adjust ~20 existing BUG messages to move them into one of the three proposed major categories (while still being clearly identified as Squid bugs, of course).

* About 15% of messages (~300) can be easily found and adjusted using their prefixes (after excluding the "obvious" messages above). Here is a representative sample of those prefixes: SECURITY NOTICE, Whoops!, XXX:, UPGRADE:, CRITICAL, Error, ERROR, error, ALERT, NOTICE, WARNING!, WARNING OVERRIDE, Warning:, Bug, Failed, Stop, Startup:, Shutdown:, FATAL Shutdown, avoiding, suspending, DNS error, bug, cannot, can't, could not, couldn't, bad, unable, malformed, unsupported, not found, missing, broken, unexpected, invalid, corrupt, obsolete, unrecognised, and unknown. Again, there is valuable information in many of these existing prefixes, and all valuable information will be preserved (after the standardized prefix).
  Some of these messages may be demoted to debugging level 2.

* The remaining 45% of messages (~800) may remain as is during the initial conversion. Many of them are genuine status/progress messages with prefixes like these: Creating, Processing, Adding, Accepting, Configuring, Sending, Making, Rebuilding, Skipping, Beginning, Starting, Initializing, Installing, Indexing, Loading, Preparing, Killing, Stopping, Completed, Finished, Removing, Closing, Shutting. There are also "squid -k parse" messages that are easy to find automatically if somebody wants to classify them properly. Most other messages can be adjusted as/if they get modified or if we discover that they are frequent/important enough to warrant a dedicated adjustment.

If there are no objections or better ideas, Factory will work on a few PRs that adjust the existing level-0/1 messages according to the above classification, in the rough order of the existing message categories/kinds discussed in the three bullets above.

Thank you,

Alex.

[0] The spelling of these two category names is unimportant. If you can suggest better category names, great, but let's focus on the category definitions.

[1] No default will satisfy everybody, and we already have the cache_log_message directive that can control the visibility and volume of individual messages. However, manually setting those parameters for every level-0/1 message is impractical -- we have more than 1600 such messages! This RFC makes a reasonable _default_ treatment possible.

[2] Admins can, of course, configure their log monitoring scripts to alert them of certain status messages if they consider those messages important. Again, this RFC is about facilitating reasonable _default_ treatment.

[3] We could give status messages a unique prefix as well (e.g., INFO:), but such a prefix is not necessary to easily distinguish them _and_ adding it would require a lot more painful code changes, so I think we should stay away from that idea.

_______________________________________________
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev
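
To illustrate the kind of default monitoring that the three standardized prefixes would make straightforward, here is a minimal Python sketch of a cache.log filter. It is not part of the proposal. The names classify() and problems(), the regular expression, and the assumed line layout (a timestamp, an optional kidN tag, a "|" separator, then the message) are illustrative assumptions and may need adjusting for a particular deployment.

```python
import re

# Assumed cache.log line layout (an assumption for illustration only):
#   "YYYY/MM/DD hh:mm:ss[.mmm] [kidN]| message"
LINE_RE = re.compile(
    r"^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}(?:\.\d+)?\s*(?:kid\d+)?\|\s?(?P<msg>.*)$"
)

# The three standardized problem prefixes named in the RFC.
PROBLEM_PREFIXES = ("FATAL:", "ERROR:", "WARNING:")


def classify(line):
    """Return ("problem", severity) or ("status", None) for one log line."""
    match = LINE_RE.match(line)
    # Fall back to the whole line if the assumed header does not match.
    message = match.group("msg") if match else line.strip()
    for prefix in PROBLEM_PREFIXES:
        if message.startswith(prefix):
            return "problem", prefix.rstrip(":")
    # Everything else is a status message by definition.
    return "status", None


def problems(lines):
    """Yield (severity, line) for every problem message in an iterable of lines."""
    for line in lines:
        category, severity = classify(line)
        if category == "problem":
            yield severity, line.rstrip("\n")
```

Passing the lines of a cache.log file to problems() would yield only the FATAL/ERROR/WARNING lines, which a monitoring script could forward as notifications. All other lines are treated as status messages and stay in the log for triage. Messages with non-standard prefixes (such as "Warning:") are deliberately classified as status here, which is why the proposed prefix cleanup matters for this kind of filter.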