Re: [squid-dev] RFC: Categorize level-0/1 messages

2021-12-05 Thread Alex Rousskov
On 12/5/21 8:06 AM, Amos Jeffries wrote:
> On 21/10/21 16:16, Alex Rousskov wrote:
>> On 10/20/21 3:14 PM, Amos Jeffries wrote:
>>> On 21/10/21 4:22 am, Alex Rousskov wrote:
>>>> To facilitate automatic monitoring of Squid cache.logs, I suggest
>>>> adjusting Squid code to divide all level-0/1 messages into two major
>>>> categories -- "problem messages" and "status messages"[0]:
>>
>>> We already have a published categorization design which (when/if used)
>>> solves the problem(s) you are describing. Unfortunately that design has
>>> not been followed by all authors and conversion of old code to it has
>>> not been done.
>>
>>> Please focus your project on making Squid actually use the system of
>>> debugs line labels. The labels are documented at:
>>>    https://wiki.squid-cache.org/SquidFaq/SquidLogs#Squid_Error_Messages
>>
>> AFAICT, the partial classification in that wiki table is an opinion on
>> how things could be designed, and that opinion does not reflect Project
>> consensus.

> The wiki was written from observation of how the message labels are/were
> being used in the code. As such it reflects the de facto consensus of
> everyone ever authoring code that used one of the labels.

[ N.B. I am worried that this (mostly irrelevant IMO) part of the
discussion risks ruining the shaky agreement we have reached on the
important parts of the RFC, but I am also worried about
misrepresentation of the wiki table status. I will respond here, but
please move any future discussion about that table status (if you decide
to continue it) to a different email thread. ]

AFAICT, the wiki table in question does not accurately reflect Squid
code and does not constitute Project consensus on how things could be
designed, regardless of what observations led to that table creation.
The creative process of writing a classification table (based on code
observations) naturally allows for misinterpretations, mistakes, and
other problems. One cannot claim consensus on the _result_ on the
grounds that they have started with code observations.


>> FWIW, I cannot use that wiki table for labeling messages, but
>> I do not want to hijack this RFC thread for that table review.

> Your text below contradicts the "cannot" statement by describing how
> the two definitions fit together and offering to use the wiki table
> labels for the problem category.

I cannot use the wiki table for deciding how to label a given message
(because of the problems with the table definitions that I would rather
not review here), but the primary labels we use (and should continue to
use!) are naturally found in that table. There is no contradiction here.


>> * The wiki table talks about FATAL, ERROR, and WARNING messages. These
>> labels match the RFC "problem messages" category. This match covers all
>> of the remaining cache.log messages except for 10 debugs() detailed
>> below. Thus, so far, there is a usable match on nearly all current
>> level-0/1 messages. Excellent!

> Thus my request that you use the wiki definitions to categorize the
> unlabeled and fix any detected labeling mistakes.

While I cannot use those wiki definitions (because of the problems with
the table that I would rather not review here), it is not a big deal as
far as this RFC is concerned because I do not have to use or violate
those definitions to implement the vast majority of the proposed changes
-- those changes are orthogonal to the wiki table and its definitions.

If somebody finds a table violation introduced by the RFC PR, then we
will either undo the corresponding PR change, change the label used by
the PR, or fix the table, but my goal is to minimize the number of such
cases because they are likely to waste a lot of time on difficult
discussions about poorly defined concepts.



>> * The wiki table also uses three "SECURITY ..." labels. The RFC does not
>> recognize those labels as special. I find their definitions in the wiki
>> table unusable/impractical, and you naturally think otherwise, but the
>> situation is not as bad as it may seem at first glance:
>>
>> - "SECURITY ERROR" is used once to report a coding _bug_. That single
>> use case does not match the wiki table SECURITY ERROR description. We
>> should be able to rephrase that single message so that it does not
>> contradict the wiki table and the RFC.
>>
>> - "SECURITY ALERT" is used 6 times. Most or all of those cases are a
>> poor match for the SECURITY ALERT description in the wiki table IMHO. I
>> hope we can find a way to rephrase those 6 cases to avoid conflicts.
>>
>> - "SECURITY NOTICE" is used 3 times. Two of those use cases can be
>> simply removed by removing the long-deprecated and increasingly poorly
>> supported SslBump features. I do not see why we should keep the third
>> message/feature, but if it must be kept, we may be able to rephrase it.
>>
>> If we cannot reach an agreement regarding these 10 special messages, we
>> can leave them as is for now, and come back to them when we 

Re: [squid-dev] RFC: Categorize level-0/1 messages

2021-12-05 Thread Amos Jeffries

On 21/10/21 16:16, Alex Rousskov wrote:

> On 10/20/21 3:14 PM, Amos Jeffries wrote:
>> On 21/10/21 4:22 am, Alex Rousskov wrote:
>>> To facilitate automatic monitoring of Squid cache.logs, I suggest
>>> adjusting Squid code to divide all level-0/1 messages into two major
>>> categories -- "problem messages" and "status messages"[0]:
>
>> We already have a published categorization design which (when/if used)
>> solves the problem(s) you are describing. Unfortunately that design has
>> not been followed by all authors and conversion of old code to it has
>> not been done.
>
>> Please focus your project on making Squid actually use the system of
>> debugs line labels. The labels are documented at:
>>    https://wiki.squid-cache.org/SquidFaq/SquidLogs#Squid_Error_Messages
>
> AFAICT, the partial classification in that wiki table is an opinion on
> how things could be designed, and that opinion does not reflect Project
> consensus.


The wiki was written from observation of how the message labels are/were
being used in the code. As such it reflects the de facto consensus of
everyone ever authoring code that used one of the labels.



NP: The "core team" or "dev team" are not "The Project". There are a 
large number of developers contributing to each version of Squid whose 
only voice in any of the style/design decisions is the existing Squid code.




> FWIW, I cannot use that wiki table for labeling messages, but
> I do not want to hijack this RFC thread for that table review.



Your text below contradicts the "cannot" statement by describing how
the two definitions fit together and offering to use the wiki table
labels for the problem category.


I assume the below text is your definition of "cannot"? If not, then
please explain why not.




> Fortunately, there are many similarities between the wiki table and this
> RFC that we can and should capitalize on instead:
>
> * While the wiki table is silent about the majority of existing
> cache.log messages, most of the messages it is silent about probably
> belong to the "status messages" category proposed by this RFC.


Exactly so.


> This assumption gives a usable match between the wiki table and the
> RFC for about half of the existing level-0/1 cache.log messages. Great!
>
> * The wiki table talks about FATAL, ERROR, and WARNING messages. These
> labels match the RFC "problem messages" category. This match covers all
> of the remaining cache.log messages except for 10 debugs() detailed
> below. Thus, so far, there is a usable match on nearly all current
> level-0/1 messages. Excellent!


Thus my request that you use the wiki definitions to categorize the 
unlabeled and fix any detected labeling mistakes.




> * The wiki table also uses three "SECURITY ..." labels. The RFC does not
> recognize those labels as special. I find their definitions in the wiki
> table unusable/impractical, and you naturally think otherwise, but the
> situation is not as bad as it may seem at first glance:
>
> - "SECURITY ERROR" is used once to report a coding _bug_. That single
> use case does not match the wiki table SECURITY ERROR description. We
> should be able to rephrase that single message so that it does not
> contradict the wiki table and the RFC.
>
> - "SECURITY ALERT" is used 6 times. Most or all of those cases are a
> poor match for the SECURITY ALERT description in the wiki table IMHO. I
> hope we can find a way to rephrase those 6 cases to avoid conflicts.
>
> - "SECURITY NOTICE" is used 3 times. Two of those use cases can be
> simply removed by removing the long-deprecated and increasingly poorly
> supported SslBump features. I do not see why we should keep the third
> message/feature, but if it must be kept, we may be able to rephrase it.
>
> If we cannot reach an agreement regarding these 10 special messages, we
> can leave them as is for now, and come back to them when we find a way
> to agree on how/whether to assign additional labels to some messages.



AFAICT, they were added as equivalent to ERROR/WARNING in CVE fixes, or 
to highlight a known security vulnerability being opened by admin settings.


I am okay with them remaining untouched by a PR submission cleaning 
level 0/1 messages. Though they are there to use if any author finds a 
message that suitably meets their definition.





> Thus, there are no significant conflicts between the RFC and the table!
> We strongly disagree on how labels should be defined,


Recall that the wiki is describing the observed pattern of label usage 
by all Squid contributors. That means any significant conflict is 
between your choice of definition and "The Project" as a whole. Minor 
conflicts may be just differences in my wording and yours on the 
observed pattern.




> but I do not think we
> have to agree on those details to make progress here.


The options for any author are to comply with the existing 
consensus/pattern or to get agreement on changing the definitions.


Options like changing the labeling scheme are off the table because we 
already have significant amounts of community using those labels with 
third-party 

Re: [squid-dev] RFC: Categorize level-0/1 messages

2021-10-20 Thread Alex Rousskov
On 10/20/21 3:14 PM, Amos Jeffries wrote:
> On 21/10/21 4:22 am, Alex Rousskov wrote:
>> To facilitate automatic monitoring of Squid cache.logs, I suggest
>> adjusting Squid code to divide all level-0/1 messages into two major
>> categories -- "problem messages" and "status messages"[0]:


> We already have a published categorization design which (when/if used)
> solves the problem(s) you are describing. Unfortunately that design has
> not been followed by all authors and conversion of old code to it has
> not been done.

> Please focus your project on making Squid actually use the system of
> debugs line labels. The labels are documented at:
>   https://wiki.squid-cache.org/SquidFaq/SquidLogs#Squid_Error_Messages

AFAICT, the partial classification in that wiki table is an opinion on
how things could be designed, and that opinion does not reflect Project
consensus. FWIW, I cannot use that wiki table for labeling messages, but
I do not want to hijack this RFC thread for that table review.

Fortunately, there are many similarities between the wiki table and this
RFC that we can and should capitalize on instead:

* While the wiki table is silent about the majority of existing
cache.log messages, most of the messages it is silent about probably
belong to the "status messages" category proposed by this RFC. This
assumption gives a usable match between the wiki table and the RFC for
about half of the existing level-0/1 cache.log messages. Great!

* The wiki table talks about FATAL, ERROR, and WARNING messages. These
labels match the RFC "problem messages" category. This match covers all
of the remaining cache.log messages except for 10 debugs() detailed
below. Thus, so far, there is a usable match on nearly all current
level-0/1 messages. Excellent!

* The wiki table also uses three "SECURITY ..." labels. The RFC does not
recognize those labels as special. I find their definitions in the wiki
table unusable/impractical, and you naturally think otherwise, but the
situation is not as bad as it may seem at first glance:

- "SECURITY ERROR" is used once to report a coding _bug_. That single
use case does not match the wiki table SECURITY ERROR description. We
should be able to rephrase that single message so that it does not
contradict the wiki table and the RFC.

- "SECURITY ALERT" is used 6 times. Most or all of those cases are a
poor match for the SECURITY ALERT description in the wiki table IMHO. I
hope we can find a way to rephrase those 6 cases to avoid conflicts.

- "SECURITY NOTICE" is used 3 times. Two of those use cases can be
simply removed by removing the long-deprecated and increasingly poorly
supported SslBump features. I do not see why we should keep the third
message/feature, but if it must be kept, we may be able to rephrase it.

If we cannot reach an agreement regarding these 10 special messages, we
can leave them as is for now, and come back to them when we find a way
to agree on how/whether to assign additional labels to some messages.
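
For anyone who wants to reproduce rough counts like the ones above, here is a minimal sketch. It is not the method used to derive the numbers in this email. It assumes that each label appears at the very start of a string literal in the source text, and the function name and sample lines are hypothetical.

```python
import re
from collections import Counter

# Assumption: a SECURITY label starts a double-quoted string literal.
SECURITY_LABEL = re.compile(r'"(SECURITY (?:ERROR|ALERT|NOTICE))\b')

def count_security_labels(source_text):
    """Count SECURITY ... labels that open a string literal in source_text."""
    return Counter(m.group(1) for m in SECURITY_LABEL.finditer(source_text))

# Hypothetical source lines, for illustration only.
sample = (
    'debugs(1, 1, "SECURITY ALERT: first example");\n'
    'debugs(1, 1, "SECURITY NOTICE: second example");\n'
    'debugs(1, 1, "SECURITY ALERT: third example");\n'
)
counts = count_security_labels(sample)
print(counts["SECURITY ALERT"], counts["SECURITY NOTICE"])
```

Running the function over the concatenated contents of the source files would give one total per label. Messages built from several literals or from variables would be missed by this simple pattern.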


Thus, there are no significant conflicts between the RFC and the table!
We strongly disagree on how labels should be defined, but I do not think we
have to agree on those details to make progress here. We only need to
agree that (those 10 SECURITY messages aside) the RFC-driven message
categorization projects should adjust (the easily adjustable) messages
about Squid problems to use three standard labels: FATAL, ERROR, and
WARNING. Can we do just that and set aside the other disagreements for
another time?

If there are serious disagreements about whether a specific debugs() is an
ERROR or WARNING, we can leave those specific messages intact until we
find a way to reach consensus. I hope there will be very few such
messages if we use the three labels from the RFC and do our best to
avoid controversial changes.


> What we do not have in that design is clarity on which labels are shown
> at what level.

In the hope of making progress, I strongly suggest _ignoring_ the
difference between level 0 and level 1 for now. We are just too far
apart on that topic to reach consensus AFAICT. The vast majority of
messages that RFC-driven projects should touch (and, if really needed,
_all_ such messages!) can be left at their current level, avoiding this
problem.



> I have one worry about you taking this on right now. PR 574 has not been
> resolved and merged yet, but many of the debugs() messages you are going
> to be touching in here should be converted to thrown exceptions - which
> ones and what exception type is used has some dependency on how that PR
> turns out.

If the RFC PRs are merged first, Factory will help resolve conflicts in
the PR 574 branch. While resolving boring conflicts is certainly
annoying, this is not really a big deal in this case, and both projects
are worth the pain.

Alternatively, I can offer to massage the PR 574 branch into mergeable
shape _before_ we start working on these PRs. While the current branch
code has serious problems, I believe they have 

Re: [squid-dev] RFC: Categorize level-0/1 messages

2021-10-20 Thread Amos Jeffries

On 21/10/21 4:22 am, Alex Rousskov wrote:

> Hello,
>
> Nobody likes to be awakened at night by an urgent call from NOC about
> some boring Squid cache.log message the NOC folks have not seen before
> (or to miss a critical message that was ignored by the monitoring system).
> To facilitate automatic monitoring of Squid cache.logs, I suggest
> adjusting Squid code to divide all level-0/1 messages into two major
> categories -- "problem messages" and "status messages"[0]:



We already have a published categorization design which (when/if used) 
solves the problem(s) you are describing. Unfortunately that design has 
not been followed by all authors and conversion of old code to it has 
not been done.


Please focus your project on making Squid actually use the system of 
debugs line labels. The labels are documented at:

  https://wiki.squid-cache.org/SquidFaq/SquidLogs#Squid_Error_Messages


What we do not have in that design is clarity on which labels are shown 
at what level. IMO they should be:


 * DBG_CRITICAL(0) - admins *need* to know this even if they do not
think they want to.
  - FATAL
  - SECURITY ALERT
  - ERROR which were mislabeled and should be FATAL

 * DBG_IMPORTANT(1) - some admins want to know these, not mandatory though.
  - ERROR
  - SECURITY ERROR
  - SECURITY WARNING

 * level-2 - status, troubleshooting etc.
  - WARNING admin cannot do anything about
  - SECURITY NOTICE (these are for troubleshooting advice)

 * level-3+ - other
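
To make the proposed mapping concrete, here is a rough sketch of the list above as a lookup table. Nothing like this exists in Squid. The names PROPOSED_LEVEL and proposed_level are hypothetical, and WARNING is left out on purpose because the list only places it at level 2 when the admin cannot act on it, which a prefix alone cannot tell.

```python
DBG_CRITICAL = 0
DBG_IMPORTANT = 1

# Label -> suggested debugging level, copied from the list above.
# WARNING is omitted: its level depends on whether the admin can act on it.
PROPOSED_LEVEL = {
    "FATAL": DBG_CRITICAL,
    "SECURITY ALERT": DBG_CRITICAL,
    "ERROR": DBG_IMPORTANT,
    "SECURITY ERROR": DBG_IMPORTANT,
    "SECURITY WARNING": DBG_IMPORTANT,
    "SECURITY NOTICE": 2,
}

def proposed_level(message):
    """Return the suggested level for a labeled message, or None."""
    label, colon, _ = message.partition(":")
    if not colon:
        return None  # no "LABEL:" prefix at all
    return PROPOSED_LEVEL.get(label)

print(proposed_level("FATAL: example text"))            # 0
print(proposed_level("SECURITY NOTICE: example text"))  # 2
print(proposed_level("WARNING: example text"))          # None
```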




> There are also "squid -k parse" messages
> that are easy to find automatically if somebody wants to classify them
> properly.


Those are level 1-2 messages that become mandatory to display on 
startup/reconfigure.




I have one worry about you taking this on right now. PR 574 has not been 
resolved and merged yet, but many of the debugs() messages you are going 
to be touching in here should be converted to thrown exceptions - which 
ones and what exception type is used has some dependency on how that PR 
turns out.



Amos
___
squid-dev mailing list
squid-dev@lists.squid-cache.org
http://lists.squid-cache.org/listinfo/squid-dev


[squid-dev] RFC: Categorize level-0/1 messages

2021-10-20 Thread Alex Rousskov
Hello,

Nobody likes to be awakened at night by an urgent call from NOC about
some boring Squid cache.log message the NOC folks have not seen before
(or to miss a critical message that was ignored by the monitoring system).
To facilitate automatic monitoring of Squid cache.logs, I suggest
adjusting Squid code to divide all level-0/1 messages into two major
categories -- "problem messages" and "status messages"[0]:

* Problem messages are defined as those that start with one of the three
well-known prefixes: FATAL:, ERROR:, and WARNING:. These are the
messages that most admins may want to be notified about (by default[1])
and these standardized prefixes make setting up reliable automated
notifications straightforward.

* Status messages are all other messages. Most admins do not want to be
notified about normal Squid state changes and progress reports (by
default[2]). These status messages are still valuable in triage so they
are _not_ going away[3].
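
As an illustration of how a monitoring script could apply the two definitions above, here is a minimal sketch. It assumes that a cache.log line looks like "YYYY/MM/DD hh:mm:ss kidN| message" (with the kidN part optional). The names LINE_HEAD, PROBLEM_PREFIXES, and classify are hypothetical, and the sample lines are made up.

```python
import re

# Assumed line layout: timestamp, optional " kidN", then "| " and the message.
LINE_HEAD = re.compile(
    r"^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}(?:\.\d+)?(?: \w+)?\| "
)

# The three well-known prefixes that define a problem message.
PROBLEM_PREFIXES = ("FATAL:", "ERROR:", "WARNING:")

def classify(line):
    """Return "problem" or "status" for one cache.log line."""
    message = LINE_HEAD.sub("", line, count=1)
    if message.startswith(PROBLEM_PREFIXES):
        return "problem"
    return "status"

print(classify("2021/10/20 15:14:01 kid1| WARNING: made-up example"))  # problem
print(classify("2021/10/20 15:14:01| Starting made-up example"))       # status
```

The match is deliberately case-sensitive and requires the colon, so a message starting with "Warning:" or "WARNING!" is still reported as a status message until its prefix is standardized.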


Today, Squid does not support the problem/status message classification
well. To reach the above state, we will need to adjust many messages so
that they fall into the right category. However, my analysis of the
existing level-0/1 messages shows that most of them can be classified
correctly without a lot of tedious work (all numbers and prefix strings
below are approximate and abridged for clarity of presentation):

* About 40% of messages (~700) already have "obvious" prefixes: BUG:,
BUG [0-9]*:, ERROR:, WARNING:, and FATAL:. We will just need to adjust
~20 existing BUG messages to move them into one of the three proposed
major categories (while still being clearly identified as Squid bugs, of
course).

* About 15% of messages (~300) can be easily found and adjusted using
their prefixes (after excluding the "obvious" messages above). Here is a
representative sample of those prefixes: SECURITY NOTICE, Whoops!, XXX:,
UPGRADE:, CRITICAL, Error, ERROR, Error, error, ALERT, NOTICE, WARNING!,
WARNING OVERRIDE, Warning:, Bug, Failed, Stop, Startup:, Shutdown:,
FATAL Shutdown, avoiding, suspending, DNS error, bug, cannot, can't,
could not, couldn't, bad, unable, malformed, unsupported, not found,
missing, broken, unexpected, invalid, corrupt, obsolete, unrecognised,
and unknown. Again, there is valuable information in many of these
existing prefixes, and all valuable information will be preserved (after
the standardized prefix). Some of these messages may be demoted to
debugging level 2.

* The remaining 45% of messages (~800) may remain as is during the
initial conversion. Many of them are genuine status/progress messages
with prefixes like these: Creating, Processing, Adding, Accepting,
Configuring, Sending, Making, Rebuilding, Skipping, Beginning, Starting,
Initializing, Installing, Indexing, Loading, Preparing, Killing,
Stopping, Completed, Finished, Removing, Closing, and Shutting.
There are also "squid -k parse" messages
that are easy to find automatically if somebody wants to classify them
properly. Most other messages can be adjusted as/if they get modified or
if we discover that they are frequent/important enough to warrant a
dedicated adjustment.
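
To show how the three groups above could be separated mechanically, here is a rough sketch. It is not the analysis used to produce the numbers in this email. The ADJUSTABLE tuple is only an abridged subset of the sample prefixes from the second bullet, and the names OBVIOUS, ADJUSTABLE, and bucket are hypothetical.

```python
import re

# First bullet: BUG:, BUG <number>:, ERROR:, WARNING:, and FATAL:.
OBVIOUS = re.compile(r"^(?:BUG(?: \d+)?|ERROR|WARNING|FATAL):")

# Second bullet: an abridged subset of the sample prefixes.
ADJUSTABLE = (
    "SECURITY NOTICE", "Whoops!", "XXX:", "UPGRADE:", "CRITICAL",
    "WARNING!", "Warning:", "Error", "error", "cannot", "can't",
    "could not", "unable", "invalid", "unknown",
)

def bucket(message):
    """Return "obvious", "adjustable", or "remaining" for a message text."""
    if OBVIOUS.match(message):
        return "obvious"
    if message.startswith(ADJUSTABLE):
        return "adjustable"
    return "remaining"

print(bucket("BUG 123: made-up example"))  # obvious
print(bucket("WARNING! made-up example"))  # adjustable
print(bucket("Loading made-up example"))   # remaining
```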

If there are no objections or better ideas, Factory will work on a few
PRs that adjust the existing level-0/1 messages according to the above
classification, in the rough order of existing message categories/kinds
discussed in the three bullets above.


Thank you,

Alex.

[0] The spelling of these two category names is unimportant. If you can
suggest better category names, great, but let's focus on the category
definitions.

[1] No default will satisfy everybody, and we already have the
cache_log_message directive that can control the visibility and volume
of individual messages. However, manually setting those parameters for
every level-0/1 message is impractical -- we have more than 1600 such
messages! This RFC makes a reasonable _default_ treatment possible.

[2] Admins can, of course, configure their log monitoring scripts to
alert them of certain status messages if they consider those messages
important. Again, this RFC is about facilitating reasonable _default_
treatment.

[3] We could give status messages a unique prefix as well (e.g., INFO:)
but such a prefix is not necessary to easily distinguish them _and_
adding a prefix would create a lot more painful code changes, so I think
we should stay away from that idea.