Re: [squid-dev] RFC: Categorize level-0/1 messages

2021-10-20 Thread Alex Rousskov
On 10/20/21 3:14 PM, Amos Jeffries wrote:
> On 21/10/21 4:22 am, Alex Rousskov wrote:
>> To facilitate automatic monitoring of Squid cache.logs, I suggest
>> adjusting Squid code to divide all level-0/1 messages into two major
>> categories -- "problem messages" and "status messages"[0]:


> We already have a published categorization design which (when/if used)
> solves the problem(s) you are describing. Unfortunately that design has
> not been followed by all authors and conversion of old code to it has
> not been done.

> Please focus your project on making Squid actually use the system of
> debugs line labels. The labels are documented at:
>   https://wiki.squid-cache.org/SquidFaq/SquidLogs#Squid_Error_Messages

AFAICT, the partial classification in that wiki table is an opinion on
how things could be designed, and that opinion does not reflect Project
consensus. FWIW, I cannot use that wiki table for labeling messages, but
I do not want to hijack this RFC thread for that table review.

Fortunately, there are many similarities between the wiki table and this
RFC that we can and should capitalize on instead:

* While the wiki table is silent about the majority of existing
cache.log messages, most of the messages it is silent about probably
belong to the "status messages" category proposed by this RFC. This
assumption gives a usable match between the wiki table and the RFC for
about half of the existing level-0/1 cache.log messages. Great!

* The wiki table talks about FATAL, ERROR, and WARNING messages. These
labels match the RFC "problem messages" category. This match covers all
of the remaining cache.log messages except for 10 debugs() detailed
below. Thus, so far, there is a usable match on nearly all current
level-0/1 messages. Excellent!

* The wiki table also uses three "SECURITY ..." labels. The RFC does not
recognize those labels as special. I find their definitions in the wiki
table unusable/impractical, and you naturally think otherwise, but the
situation is not as bad as it may seem at first glance:

- "SECURITY ERROR" is used once to report a coding _bug_. That single
use case does not match the wiki table SECURITY ERROR description. We
should be able to rephrase that single message so that it does not
contradict the wiki table and the RFC.

- "SECURITY ALERT" is used 6 times. Most or all of those cases are a
poor match for the SECURITY ALERT description in the wiki table IMHO. I
hope we can find a way to rephrase those 6 cases to avoid conflicts.

- "SECURITY NOTICE" is used 3 times. Two of those use cases can be
simply removed by removing the long-deprecated and increasingly poorly
supported SslBump features. I do not see why we should keep the third
message/feature, but if it must be kept, we may be able to rephrase it.

If we cannot reach an agreement regarding these 10 special messages, we
can leave them as is for now, and come back to them when we find a way
to agree on how/whether to assign additional labels to some messages.


Thus, there are no significant conflicts between the RFC and the table!
We strongly disagree on how labels should be defined, but I do not think we
have to agree on those details to make progress here. We only need to
agree that (those 10 SECURITY messages aside) the RFC-driven message
categorization projects should adjust (the easily adjustable) messages
about Squid problems to use three standard labels: FATAL, ERROR, and
WARNING. Can we do just that and set aside the other disagreements for
another time?

If there are serious disagreements about whether a specific debugs() is an
ERROR or WARNING, we can leave those specific messages intact until we
find a way to reach consensus. I hope there will be very few such
messages if we use the three labels from the RFC and do our best to
avoid controversial changes.


> What we do not have in that design is clarity on which labels are shown
> at what level.

In the hope of making progress, I strongly suggest _ignoring_ the difference
between level 0 and level 1 for now. We are just too far apart on that
topic to reach consensus AFAICT. The vast majority of messages that
RFC-driven projects should touch (and, if really needed, _all_ such
messages!) can be left at their current level, avoiding this problem.



> I have one worry about you taking this on right now. PR 574 has not been
> resolved and merged yet, but many of the debugs() messages you are going
> to be touching in here should be converted to thrown exceptions - which
> ones and what exception type is used has some dependency on how that PR
> turns out.

If the RFC PRs are merged first, Factory will help resolve conflicts in
the PR 574 branch. While resolving boring conflicts is certainly
annoying, this is not really a big deal in this case, and both projects
are worth the pain.

Alternatively, I can offer to massage the PR 574 branch into mergeable
shape _before_ we start working on these PRs. While the current branch
code has serious problems, I believe they have 

Re: [squid-dev] RFC: Categorize level-0/1 messages

2021-10-20 Thread Amos Jeffries

On 21/10/21 4:22 am, Alex Rousskov wrote:

> Hello,
>
> Nobody likes to be awakened at night by an urgent call from NOC about
> some boring Squid cache.log message the NOC folks have not seen before
> (or miss a critical message that was ignored by the monitoring system).
> To facilitate automatic monitoring of Squid cache.logs, I suggest
> adjusting Squid code to divide all level-0/1 messages into two major
> categories -- "problem messages" and "status messages"[0]:



We already have a published categorization design which (when/if used) 
solves the problem(s) you are describing. Unfortunately that design has 
not been followed by all authors and conversion of old code to it has 
not been done.


Please focus your project on making Squid actually use the system of 
debugs line labels. The labels are documented at:

  https://wiki.squid-cache.org/SquidFaq/SquidLogs#Squid_Error_Messages


What we do not have in that design is clarity on which labels are shown 
at what level. IMO they should be:


 * DBG_CRITICAL(0) - admins *need* to know this even if they do not
think they want to.
  - FATAL
  - SECURITY ALERT
  - ERROR messages that were mislabeled and should be FATAL

 * DBG_IMPORTANT(1) - some admins want to know these, not mandatory though.
  - ERROR
  - SECURITY ERROR
  - SECURITY WARNING

 * level-2 - status, troubleshooting etc.
  - WARNING messages that the admin cannot do anything about
  - SECURITY NOTICE (these are for troubleshooting advice)

 * level-3+ - other
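
For illustration only, the label-to-level suggestion above could be written down as a lookup table. This is a rough sketch, not Squid code: the names `SUGGESTED_LEVEL` and `suggested_level` are hypothetical, the sketch assumes each label is followed by a colon at the start of the message, and it leaves plain WARNING out because the list above places only a subset of WARNING messages.

```python
from typing import Optional

DBG_CRITICAL = 0
DBG_IMPORTANT = 1

# Hypothetical table mirroring the suggested levels listed above.
# Plain WARNING is deliberately omitted: only WARNING messages the admin
# cannot act on are assigned a level (2) in that list.
SUGGESTED_LEVEL = {
    "FATAL": DBG_CRITICAL,
    "SECURITY ALERT": DBG_CRITICAL,
    "ERROR": DBG_IMPORTANT,
    "SECURITY ERROR": DBG_IMPORTANT,
    "SECURITY WARNING": DBG_IMPORTANT,
    "SECURITY NOTICE": 2,
}


def suggested_level(message: str) -> Optional[int]:
    """Return the suggested level for a labeled message, or None."""
    for label, level in SUGGESTED_LEVEL.items():
        # Assumption: the label is immediately followed by a colon.
        if message.startswith(label + ":"):
            return level
    return None


print(suggested_level("SECURITY ALERT: hypothetical sample"))
print(suggested_level("Starting hypothetical sample"))
```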




> There are also "squid -k parse" messages
> that are easy to find automatically if somebody wants to classify them
> properly.


Those are level 1-2 messages that become mandatory to display on 
startup/reconfigure.




I have one worry about you taking this on right now. PR 574 has not been 
resolved and merged yet, but many of the debugs() messages you are going 
to be touching in here should be converted to thrown exceptions - which 
ones and what exception type is used has some dependency on how that PR 
turns out.



Amos
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


[squid-dev] RFC: Categorize level-0/1 messages

2021-10-20 Thread Alex Rousskov
Hello,

Nobody likes to be awakened at night by an urgent call from NOC about
some boring Squid cache.log message the NOC folks have not seen before
(or miss a critical message that was ignored by the monitoring system).
To facilitate automatic monitoring of Squid cache.logs, I suggest
adjusting Squid code to divide all level-0/1 messages into two major
categories -- "problem messages" and "status messages"[0]:

* Problem messages are defined as those that start with one of the three
well-known prefixes: FATAL:, ERROR:, and WARNING:. These are the
messages that most admins may want to be notified about (by default[1])
and these standardized prefixes make setting up reliable automated
notifications straightforward.

* Status messages are all other messages. Most admins do not want to be
notified about normal Squid state changes and progress reports (by
default[2]). These status messages are still valuable in triage so they
are _not_ going away[3].
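
A minimal sketch of how a log monitor might apply the two categories above. It assumes, without checking against any particular Squid version, that each cache.log line starts with a date, a time, an optional kidN tag, and a `|` separator before the message text. The function name `classify` and the sample lines are made up for illustration.

```python
import re

# The three standardized prefixes proposed above.
PROBLEM_PREFIXES = ("FATAL:", "ERROR:", "WARNING:")

# Assumed line layout: "<date> <time>[ kidN]| <message>".
LINE_HEADER = re.compile(r"^\S+ \S+(?: kid\d+)?\| ?")


def classify(line: str) -> str:
    """Return "problem" or "status" for one cache.log line."""
    # Strip the assumed header; lines without it are used as is.
    message = LINE_HEADER.sub("", line, count=1)
    return "problem" if message.startswith(PROBLEM_PREFIXES) else "status"


samples = [
    "2021/10/20 15:14:00 kid1| WARNING: hypothetical sample problem",
    "2021/10/20 15:14:01 kid1| Starting hypothetical sample service",
]
for sample in samples:
    print(classify(sample), "<-", sample)
```

With standardized prefixes, a default alerting rule reduces to a single prefix test, and everything else stays available for triage.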


Today, Squid does not support the problem/status message classification
well. To reach the above state, we will need to adjust many messages so
that they fall into the right category. However, my analysis of the
existing level-0/1 messages shows that it is doable to correctly
classify most of them without a lot of tedious work (all numbers and
prefix strings below are approximate and abridged for clarity of the
presentation):

* About 40% of messages (~700) already have "obvious" prefixes: BUG:,
BUG [0-9]*:, ERROR:, WARNING:, and FATAL:. We will just need to adjust
~20 existing BUG messages to move them into one of the three proposed
major categories (while still being clearly identified as Squid bugs, of
course).

* About 15% of messages (~300) can be easily found and adjusted using
their prefixes (after excluding the "obvious" messages above). Here is a
representative sample of those prefixes: SECURITY NOTICE, Whoops!, XXX:,
UPGRADE:, CRITICAL, Error, ERROR, Error, error, ALERT, NOTICE, WARNING!,
WARNING OVERRIDE, Warning:, Bug, Failed, Stop, Startup:, Shutdown:,
FATAL Shutdown, avoiding, suspending, DNS error, bug, cannot, can't,
could not, couldn't, bad, unable, malformed, unsupported, not found,
missing, broken, unexpected, invalid, corrupt, obsolete, unrecognised,
and unknown. Again, there is valuable information in many of these
existing prefixes, and all valuable information will be preserved (after
the standardized prefix). Some of these messages may be demoted to
debugging level 2.

* The remaining 45% of messages (~800) may remain as is during the
initial conversion. Many of them are genuine status/progress messages
with prefixes like these: Creating, Processing, Adding, Accepting,
Configuring, Sending, Making, Rebuilding, Skipping, Beginning, Starting,
Initializing, Installing, Indexing, Loading, Preparing, Killing,
Stopping, Completed, Finished,
Removing, Closing, Shutting. There are also "squid -k parse" messages
that are easy to find automatically if somebody wants to classify them
properly. Most other messages can be adjusted as/if they get modified or
if we discover that they are frequent/important enough to warrant a
dedicated adjustment.
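
The three groups in the bullets above can be approximated with a small triage script. This is only a sketch under stated assumptions: it works on bare message texts that have already been extracted, `AD_HOC` covers just a small subset of the sampled prefixes, and the name `bucket`, the group names, and the sample strings are hypothetical.

```python
import re
from collections import Counter

# Prefixes called "obvious" in the first bullet (case-sensitive).
OBVIOUS = re.compile(r"^(?:BUG(?: \d+)?|ERROR|WARNING|FATAL):")

# A small, illustrative subset of the ad-hoc prefixes from the second bullet.
AD_HOC = re.compile(
    r"^(?:SECURITY NOTICE|Whoops!|XXX:|UPGRADE:|CRITICAL|Error|ALERT|NOTICE|"
    r"WARNING!|Warning:|Failed|cannot|can't|could not|unable|invalid)",
    re.IGNORECASE,
)


def bucket(message: str) -> str:
    """Place one message text into one of the three groups."""
    if OBVIOUS.match(message):
        return "obvious"      # already carries a recognizable prefix
    if AD_HOC.match(message):
        return "adjustable"   # easy to find by prefix, needs relabeling
    return "as-is"            # left alone during the initial conversion


samples = [
    "BUG 1234: hypothetical sample",
    "WARNING: hypothetical sample",
    "Whoops! hypothetical sample",
    "Warning: hypothetical sample",
    "Loading hypothetical sample",
]
print(Counter(bucket(m) for m in samples))
```

Checking the "obvious" pattern first keeps messages such as ERROR: out of the "adjustable" group even though the second pattern is case-insensitive.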

If there are no objections or better ideas, Factory will work on a few
PRs that adjust the existing level-0/1 messages according to the above
classification, in the rough order of existing message categories/kinds
discussed in the three bullets above.


Thank you,

Alex.

[0] The spelling of these two category names is unimportant. If you can
suggest better category names, great, but let's focus on the category
definitions.

[1] No default will satisfy everybody, and we already have the
cache_log_message directive that can control the visibility and volume
of individual messages. However, manually setting those parameters for
every level-0/1 message is impractical -- we have more than 1600 such
messages! This RFC makes a reasonable _default_ treatment possible.

[2] Admins can, of course, configure their log monitoring scripts to
alert them of certain status messages if they consider those messages
important. Again, this RFC is about facilitating reasonable _default_
treatment.

[3] We could give status messages a unique prefix as well (e.g., INFO:)
but such a prefix is not necessary to easily distinguish them _and_
adding a prefix would create a lot more painful code changes, so I think
we should stay away from that idea.