some (well a lot ) review remarks

ons-huis.net!ALbert Wed, 31 Mar 2004 11:50:26 -0800

Hello Rainer, Anton, WG

Over the last weeks I have reviewed some of the current draft documents. I had
planned to spend more time on it in the coming month. But plans have changed.
It will be very busy, which is good... but it also means less time for syslog.



I have one important "issue": I think it would be a great improvement if
some parts of Rainer's document moved to Anton's. I think the
protocol document should NOT specify how to break messages into parts. It
should assume the transport layer can "transport" 'long messages'.
The transport layer should, when a long message can't be sent at once,
split it into several parts.

I think it should be possible to write another 'transport' document,
which describes how 'long, new, better' messages can be transported by
rfc-3164 syslog messages.
Note: I'm NOT saying we should make an RFC for this. Only that, if we
could write it, the separation between transport and "upper" layers is
better.
Note 2: That document should describe how to 'shrink' the longer
headers into the old headers, etc. Not by putting the complete message
into the payload.

--------------
Aside from the point above, I will include a rather long list of
(personal) notes 'as is'.

Rainer, please use those notes when relevant, skip the others. I'm
sorry, I don't have more time to rewrite my notes into a good review. It
is either me throwing them away, or giving someone else that option. :-)

Hope it helps a bit.

===========================================================

-- 
Groetjes,
--ALbert Mietus
Private mail to:           albert at ons-huis dot net
Business mail to:     albert dot mietus at PTS dot nl
Spam:   Just don't do it! Thrust me, I will not order!

Hello WG,

This is my comment on Rainer's syslog-protocol draft 04. I have been very passive in
following this mailing list (too busy :-). But on receiving a request to review I decided
to do so. I apologise if I raise issues that have been discussed already; I didn't read
those discussions. This week I started, after a long time, to study the current draft document(s).

Before continuing, let me say I like the idea of splitting syslog into several layers. I
say so in the hope the individual sub-specifications will become short.
My comment is split into 3 parts, of which you are reading part 1 now. Part 1 is about
the "idea" (the design). In a separate mail I will comment on the text of the
document, in the hope it will become clearer, cleaner and shorter. The 3rd mail will
contain some details and bits that didn't fit in the others.

Possibly, I will send more comments in future mails; it will depend on the amount of
time I have now and then.

-------------------------------------------------------

This review is becoming (very) long; I really hope all comments are worth reading. I
have tried to express my thoughts about it as well as possible, given the time I have...

General Comment on the idea/design of syslog-protocol
=====================================================

* This idea is GOOD!



H2 Architecture
================

Traditionally, syslog knows Devices, Collectors and Relays.
I would like to add two 'things'. One I would like to call a *Generator*, the other
a *Runner*.

*Generator*
As we can see in e.g. most Unix implementations, it is the application that knows WHAT
to log and transmits that to the system (the log device, the syslog daemon), which knows HOW
to log it. This can also be seen on embedded systems. Historically, the combination
of (a part of) the application and the "system" forms the log-device.
I would like to split this, now that the opportunity exists. The part that is built into
the application (in C: the lines syslog(..."hai there");) can be called the
(LOG-)Generator.
The communication between Generator and Device is system/platform dependent. On Unix
systems it is usually the log-device, on embedded systems a function call, and on Windows
log-events can be used.

By introducing this Generator, we (can) make clear that this private/dedicated
communication exists, and is allowed. We also make clear the Generator is
syslog-protocol INDEPENDENT.

The function of the Device now becomes clear: it gets log-messages (-events) and
transports them. It also does some bookkeeping, like timestamping, adding crypto (for
-sign), etc.

*Runner*
A LOG-runner is another kind of syslog-thing, which is frequently used, but without a
proper place in the architecture. Whereas a relay (should) forward syslog-messages
without knowledge of the semantics of the message, the Runner does have that knowledge. The simplest
life-form of a runner is a filter. It "relays" messages, but only when they are
important. On Unix there are several of these, in perl, grep-scripts, etc. Formally, they
are not relays (I think).
A more complex runner is a "program" that receives log-messages, CHANGES them, and
sends (or stores) the result. Examples: statistical analysis, intrusion detection, etc.

Both kinds of Runners are useful and frequently used, but they are not part of the
architecture. And as we try to make syslog "better", we had better add them and make sure
our standards can "deal" with them. Otherwise, non-RFC-compliant log-programs will be
the standard.
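
To make the simplest Runner a bit more concrete, here is a minimal sketch of such a filter. Everything in it is an assumption for illustration only: the name `filter_runner`, the modelling of a message as a `(severity, text)` pair, and the threshold are not taken from any draft.

```python
# Hypothetical sketch of the simplest Runner: a filter that only passes on
# important messages. A message is modelled as a (severity, text) tuple;
# that shape is an assumption, not something the draft defines.

def filter_runner(messages, max_severity=3):
    """Yield only the messages whose severity is max_severity or more urgent.

    Lower numbers are more urgent, as in RFC 3164 (0 = Emergency, 7 = Debug).
    """
    for severity, text in messages:
        if severity <= max_severity:
            yield severity, text


incoming = [(6, "user logged in"), (2, "disk failure"), (4, "slow reply")]
important = list(filter_runner(incoming))
```

Unlike a relay, this code has to understand (part of) the message to do its job, which is exactly why it deserves its own place in the architecture.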


H4 Syslog format
================

412 enterpriseID
-----------------

I don't see any reason to include an enterpriseID; at least not in the header. (When needed,
it can be used in the structured MSG part.)

Currently, it is just a number. It will be unused, misused, or will lead to a lot of
(operational) management. I'm afraid of the latter, as H4.1.3 suggests that
the semantics of the Facility can be enterpriseID dependent.

Also, it is required to use the "IANA assigned vendor" number. This implies
open-source/free/non-commercial implementations are 'ruled out', as they often will not do so.

Last, should the number of the Generator-vendor, the system-vendor or the
device-vendor be used? (See above about generator/device.) This is not clear to me. And
whichever one is chosen, it will be hard to implement. At least not when using the de facto (Unix)
logging APIs!

413 Facility
-------------

Although, at first sight, I liked the idea of "a terrible lot" of facilities, the
current <use a number> idea is wrong, as I see it. Aside from the problems mentioned
above, more than a million facilities will mean relays can't be managed! The set of
facilities seen in a (major) network, expressed as numbers, will be
more or less random. This implies very long, complex and unmanageable config files
(or MIBs) for each LOG-router!

As we have learned from routing IPv4, hierarchical structuring is needed.

I think extending the set of facilities is good. But I can't imagine more than, say,
1000 are ever needed.

So, my counter-proposition is:
        *  Make  facilities (as a number) structured
        *  Limit the number of facilities to a manageable number
        *  Keep the format such that extending the allowed numbers is possible

A Facility is then still a number, at least 3 (or 4) digits long. A longer number
means that it is an extended facility. Those have to be assigned by IANA.
Facilities of length 3 (4) MUST have the format '(K)KLM', where 'K' (or 'KK')
indicates the kind of facility, 'L' gives a sub-indication and 'M' is *SITE*
configurable (so, by the local sysadmin, see example below).
The 'K/KK' is based on the RFC3164 facilities, cleaned up and extended. Those numbers
can be IANA assigned. 'L' can be chosen by the (generator) vendor, and 'M' by the admin.
'M' defaults to 0 (zero), and applications/vendors MAY offer the possibility to set
that digit.

Example: For mail, there will be a K specified, let's say 1.
Then all mail logs will have the format '1XY', which is easily routable. That will do for
small sites.
Some vendors, like "sendmail" (only 1 process), will probably use only one value for L,
e.g. '0'; others, like "postfix" (several processes), can use multiple values, like
'1', '2' and '3'.
When supported by both sendmail and postfix, the local sysadmin can add (change) the
M-digit, such that mail systems on the border and internal ones use different facilities.

In all cases, the local sysadmin can either use simple routing-rules, like 1** (for 
all mail), or 10* for  sendmail and 1[1-2]*, 13* for postfix, or even more complex.
Now, the sysadmin has a choice, and can keep it manageable.

Note: the 0 for sendmail and 1-3 for postfix are "by example".
However, we can add a "rule" that '0' shall be used when only 1 L-value is used, and
that zero should be skipped when several values are used.
Also, I would like to prescribe/reserve "9" for local additions (on all digits).

The (K)KLM idea is meant as a suggestion to improve. IF this idea is accepted, THEN we can
discuss variations like 3 or 4 digits, where to save the mappings (IANA or this RFC), etc.
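
To show how simple routing on such a structured facility could be, here is a small sketch. It is an illustration only: the function names are made up, and it assumes that in a routing rule '*' stands for exactly one digit and '[a-b]' for a range of digits.

```python
import re


def split_facility(facility):
    """Split a 3- or 4-digit '(K)KLM' facility string into (kind, sub, site)."""
    if not re.fullmatch(r"[0-9]{3,4}", facility):
        raise ValueError("not a (K)KLM facility: %r" % facility)
    return facility[:-2], facility[-2], facility[-1]


def rule_matches(rule, facility):
    """Match a routing rule such as '1**' or '1[1-2]*' against a facility.

    Assumption of this sketch: '*' means exactly one digit, and '[a-b]'
    is a digit range, so a rule translates directly into a regular expression.
    """
    pattern = rule.replace("*", "[0-9]")
    return re.fullmatch(pattern, facility) is not None


kind, sub, site = split_facility("120")     # mail, second L-value, default site digit
all_mail = rule_matches("1**", "120")       # matches every mail facility
sendmail_only = rule_matches("10*", "120")  # only L = 0
```

A LOG-router then needs only a handful of such rules, instead of one entry per facility number.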

5 Structured data
==================

In short: I think having the option of structured data is a nice option. But let's keep
it simple.

The current one is too complex; it has to be, as we need 4 pages to describe it.
Also, I find those pages hard to study (given the time I had :-). It can also lead to
not implementing it. Programmers, and especially their bosses, don't have a lot of time!

More positive: The main reason why structured data is complex (currently) comes down
to 2 problems:
1) It isn't part of the main design (the ABNF on page 9)
2) The "structure" can start anywhere in MSG

Both are easy to solve:
Ad 2) Specify that the structured part ALWAYS START directly after the header.
Ad 1) We need to introduce it where it belongs. In an optional field on page 9


Let me give it a try (I also use the "improvement" on the ABNF from my other mail; it
saves typing). (Also, I "forget" the SP parts for now. Just the idea.)

SYSLOG-MSG  = HEADER DATA
HEADER      = VERSIONING PRIO ID         ; See other mail
DATA        = *STR-DATA MSG
STR-DATA    = <see below>
MSG         = <free format>

Given this ABNF, the structured data ALWAYS comes (in RFC3164 notation) at the start
of the MSG-part (or, in the new ABNF: before the free-format MSG).

This implies receivers always have (as a last resort) the option to treat everything after
the header as free-format, and just store/forward it.
It also implies the start of STR-DATA is simple to find: it starts directly after the
header, or directly after another STR-DATA.

This implies too that we have the option to start STR-DATA with '<', which is more usable
and XML-alike. The complex long "[EMAIL PROTECTED]" cookie isn't needed anymore.
However, we are free to use it. My personal vote is for the (XML) < > style.

See also my other posting about details of structured-data.
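
To illustrate how little a receiver needs when STR-DATA always starts directly after the header, here is a sketch that splits DATA into its STR-DATA elements and the free-format MSG. It assumes the '<' ... '>' style, double-quoted values, and no escape mechanism inside values; the function names are hypothetical.

```python
def find_element_end(data, start):
    """Return the index of the '>' closing the element at data[start], or -1.

    A '>' inside a double-quoted value does not close the element.
    Escaped quotes are not handled; that is a simplification of this sketch.
    """
    in_quotes = False
    for i in range(start + 1, len(data)):
        ch = data[i]
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == ">" and not in_quotes:
            return i
    return -1


def split_data(data):
    """Split DATA (everything after the header) into (elements, msg)."""
    elements = []
    pos = 0
    # STR-DATA starts directly after the header or after another STR-DATA.
    while pos < len(data) and data[pos] == "<":
        end = find_element_end(data, pos)
        if end < 0:
            break  # unterminated element: fall back to free format
        elements.append(data[pos:end + 1])
        pos = end + 1
    return elements, data[pos:]


elements, msg = split_data('<x-a k="v > w"><x-b n="1">free text here')
```

A receiver that knows nothing about STR-DATA can skip all of this and keep the whole DATA as free format. Note that in this sketch a free-format MSG that itself starts with '<' would be misread as structured data, so a real specification would have to settle that case.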

6 Multi-Part Messages
=====================

There are some mistakes in this part, but I like the general idea. However, I feel
splitting/reassembly is done in other protocols too. Maybe we can use/reference a (de
facto) standard? I don't know an RFC which we can use, but I'm sure there must be one!

Second, it is too complex, and too long (to read). I have studied it, but I'm not sure I
understand it. Some details which I find hard, wrong, or dislike:

MP-timestamp
------------
I do not like several messages having the same TIMESTAMP.



62 SD-ID receiving an optional STR-DATA
=======================================
This must be a mistake!
The 3rd paragraph states that a receiver sometimes MUST NOT parse a STR-DATA of a
log-message that is received. However, when the Multi-part option is not implemented,
it doesn't know this!



Hello WG,

This is part 2 (of 3) of my comment on the 3rd draft-syslog-protocol. See the
introduction in my posting [EMAIL PROTECTED]
This one contains comments on the text of the document, to help clarify and shorten
the document. It does NOT contain comments on the "idea" (design); see posting [EMAIL
PROTECTED]

-------------------------------------------------------



Implementation hints
====================

The current RFC contains a lot of valuable hints for programmers, like the one about 
time-secfrac (Yes I introduced it, at least the bug:-)!

Currently they are scattered around the document, making the document long and more
complex to read for non-programmers.

I would suggest to move all of them to a new chapter, at the end of the document 
(after the current chapter 9).


ABNF (Chapter 4)
================

I think we can clarify the syntax by "unflattening" it. RFC-3164 syslog uses some fields
and subfields which are nice when introducing syslog to others. They "understand" names
such as priority.

So, keep it structured. I give it a try (please correct the syntax of the ABNF, as I
am not writing it daily anymore).

SYSLOG-MSG  = HEADER DATA
HEADER      = VERSIONING PRIO ID
DATA        = [ STR-DATA ] MSG
VERSIONING  = "V" VERSION
VERSION     = 1*3DIGIT
PRIO        = "<" FACILITY "." SEVERITY ">"      ; See [EMAIL PROTECTED] for notation change
ID          = TIMESTAMP SP HOSTNAME SP TAG
STR-DATA    = <see elsewhere>
MSG         = <free format>
(etc)

Now we have meaningful fields that can be used too. E.g. the ID field (see other mail)
makes each message unique.

5.1 Format (typo?)
==================
On page 20, the paragraph starting with "The structured data element MUST ..." is
confusing. The second-to-last line says _no space_ is allowed, the last one says one or more
spaces are.
Is this a typo? Or is it just unclear (at least to me)?

Structured data, ID-length
==========================

I don't see why we should limit the several fields to 64 positions. I agree this will
normally suffice. But so will 32, or 16, or any other number. 64 is "too big" to use as
a fixed-size ("reserved") space for programmers, database fields, etc. (too big ==
wastes too many bytes on huge logs). So dynamic fields need to be used anyhow. Then there
is no need for an arbitrary maximum.
Note, there is a maximum anyhow, given by the size of a single log-message. That will
do for "short term allocation".

Removing this limit makes the RFC cleaner and smaller.

Structured data, spaces
=======================

I would like to have all lines about SP (spaces) in chapter 5 removed. The point about
0, 1 or more spaces is not relevant.
In general, syslog uses SP to separate fields (when needed), and allows them in the
MSG-part. The syntax and semantics of STR-DATA do not depend on the number of
spaces. Nor is it harder to read. Implementing receivers even becomes easier when spaces
(in STR-DATA) can be skipped (while { if 'sp' then skip } ) instead of checking for the
correct number, and doing something if it is wrong!

Proposal: Allow spaces anywhere, except inside SD-ID (see note) and SD-PARAM. SP in
SD-VALUE is allowed (already), but not as a separator. Prescribe (at least 1) SP between
each param-value pair.

Note: an SP between '[#@' (or '<') and the SD-ID itself is probably not a good idea,
but it is not a problem. We can fix this by moving the fixed string in the ABNF:
STR-DATA    = STR-START ... STR-END
STR-START   = "[#@" SD-ID   ; or '<'
STR-END     = "]"           ; or '>'
...         = as before, SP are allowed.

Note2: Doing so allows formatting lines for human reading, which is handy
Example     <x-gam-example  doYoe="like this"  or     = "This one?"   >
            <z-gam-more     Yes  = "I do"      find   = "it readable!" >

This example is simple to parse, both for humans and computers. This change will make 
chapter-5 shorter, I think!

Last, I think any whitespace should be allowed instead of SP (eg. TAB)
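
The claim that this is simple to parse can be checked with a small sketch. It assumes the '<' ... '>' style, double-quoted values without escapes, and any whitespace (including TAB) as filler; the names `PARAM` and `parse_element` are made up for this illustration.

```python
import re

# One param-value pair: a name, optional whitespace around '=', a quoted value.
PARAM = re.compile(r'([^\s="]+)\s*=\s*"([^"]*)"')


def parse_element(element):
    """Parse one '<SD-ID name="value" ...>' element into (sd_id, params)."""
    body = element.strip()
    if not (body.startswith("<") and body.endswith(">")):
        raise ValueError("not a <...> element")
    parts = body[1:-1].split(None, 1)  # the SD-ID is the first token
    if not parts:
        raise ValueError("missing SD-ID")
    sd_id = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    # Whitespace between and around the pairs is simply skipped by the pattern.
    return sd_id, dict(PARAM.findall(rest))


sd_id, params = parse_element(
    '<x-gam-example  doYoe="like this"  or     = "This one?"   >'
)
```

The receiver never has to count spaces, so there is nothing to check and no error case for "wrong number of spaces".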

Chapter-5, MSG
==============

I suggest, but only as a detail, the text of chapter-5 should be part of 4.2



Hello WG,

This is part 3 (of 3) of my comment on the 3rd draft-syslog-protocol. It only contains
short remarks on details, and bits that didn't fit in part 1 (idea/design) or part 2
(the document itself). See posting [EMAIL PROTECTED] for more introduction.

-------------------------------------------------------

413/414 FACILITY/SEVERITY
==========================

In this draft, both facility and severity are numbers. Even with my suggestion, they
are 'just numbers'. And numbers are hard to read for humans, especially when there are a
lot of them. Most people will forget which column contains which number.

Therefore, I think it is better to use the (verbose) notation used by most syslog
implementations: "'<' facility '.' severity '>'"
Both facility and severity are numbers (at least on the wire). Collectors (viewers)
can translate those numbers into their names, but still use this format. And even
without names, the first (new) line is easier to read than the second (current) one:
V1 0 <888.4> 2003-10-11T22:14:13.003Z new.Formated. ...
V1 0 888 4 2003-10-11T22:14:13.003Z old.Formated. ...

Note: I agree, we should not use the tricky "8 times F plus S" notation. I'm not
suggesting that! Just insert a dot and the angle brackets.
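
As a small illustration of the difference, the sketch below parses and formats the proposed '<facility.severity>' field and, for contrast, computes the old RFC 3164 PRI value. The function names are hypothetical.

```python
import re

PRIO = re.compile(r"<([0-9]+)\.([0-9]+)>")


def parse_prio(field):
    """Parse the proposed '<facility.severity>' field into two integers."""
    match = PRIO.fullmatch(field)
    if match is None:
        raise ValueError("not a <facility.severity> field: %r" % field)
    return int(match.group(1)), int(match.group(2))


def format_prio(facility, severity):
    """Format facility and severity in the proposed notation."""
    return "<%d.%d>" % (facility, severity)


def rfc3164_pri(facility, severity):
    """The old combined value: 8 times facility plus severity."""
    return facility * 8 + severity


facility, severity = parse_prio("<888.4>")
old_style = rfc3164_pri(2, 4)  # RFC 3164 mail facility (2), warning (4)
```

With the dotted notation both numbers stay visible on the wire, whereas the old combined value has to be divided by 8 before a human can tell which facility it was.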


4151 timestamp, without time
============================

There are 'devices' as meant in this section which have no idea of TIME. So this
section is a good idea.

Often, those devices can store ("know") a few bits of information. Therefore, I would
like to change this fixed TIMESTAMP to the same one, but with a sequence number
attached; the fractional-seconds field can be used for it.

Then a timestamp becomes 2001-01-01T00:00:60.<seq>Z.
As in the current draft, this time doesn't exist. But at least collectors can (more
or less) sort a set of log messages from 1 device.

Note: the latter is needed for e.g. syslog-sign
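
A device without a clock could produce such timestamps with almost no code. The sketch below assumes a fixed width of 6 digits for the sequence number, so that sorting the strings gives the same order as sorting the numbers; that width and the function name are my assumptions.

```python
def timeless_timestamp(seq):
    """Return the non-existent fixed timestamp with seq in the fraction field.

    Assumption: seq is zero-padded to 6 digits, so a plain string sort
    keeps the messages of one device in sending order.
    """
    if not 0 <= seq <= 999999:
        raise ValueError("sequence number out of range: %r" % seq)
    return "2001-01-01T00:00:60.%06dZ" % seq


stamps = [timeless_timestamp(n) for n in (12, 7, 100)]
in_order = sorted(stamps)
```

A collector then only needs an ordinary sort on the TIMESTAMP field to restore the order per device.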


417 TAG
=======

I think we need to make the TAG structured! All current syslog receivers (collectors,
relays) use __PARTS of__ the RFC3164 TAG to route messages.
In RFC3164 the TAG is simple and short, so it is quite simple to use it for routing.
Note: not the complete TAG is used, only the 'program name', never the PID part.

With the new TAG, with a static ID and a dynamic part, similar routing should be
possible. At least, the RFC should be clear on it: by demanding that the static part is
"fixed", and by making sure that static part can be found.

Given the current practice, where routing is based (mainly) on the program name, it would be
wise to (at least) suggest how (where) that part can be found.

Proposal: Forget native support for VMS/Windows/DOS and even Unix pathnames, and
introduce a URI (URL)-like scheme, where only '/' (not the one from Unix, but the one from
URLs) is used for "path separation" and ':' and '//' are major separators.

In this case, the ABNF is simple; only the semantics become a little more complex.
Also, it becomes simple for web applications to log. They have a URL already. All "old
fashioned :-)" applications have a URL too: ''file://path/to/appl''. This is valid
on any system.

Note: the dynamic part has to be added.
Note2: for web applications, which include a 'hostpart' in the URL: that hostname is
NOT the same as the HOSTPART in the header. Frequently (in a web farm) several systems
share the same URL, but not the hostname. Then the sysadmin can decide which one to
use for routing.
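
To show that a URL-like TAG still lets a relay find the 'program name' easily, here is a sketch using the standard URL parser. It assumes the program name is the last path segment, which is my reading and not something the draft says; the dynamic part is left out, and the second URL is a made-up example.

```python
from urllib.parse import urlsplit


def program_name(tag):
    """Return the part of a URL-like TAG that a relay could route on.

    Assumption of this sketch: the program name is the last non-empty
    path segment; if there is no path, the host part is used instead.
    """
    parts = urlsplit(tag)
    segments = [segment for segment in parts.path.split("/") if segment]
    if segments:
        return segments[-1]
    return parts.netloc


old_fashioned = program_name("file://path/to/appl")
web_app = program_name("http://www.example.org/shop/cart")
```

The host part of the URL stays available separately (here as `netloc`), so a sysadmin can still choose between it and the HOSTNAME from the header.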


Message ID
==========

Given the architecture of syslog-networks, messages can be duplicated. But sometimes
messages are related (e.g. with the signatures of syslog-sign). Neither RFC3164 nor
the current -protocol draft offers a way to uniquely identify a message.

In practice, messages are unique by their hostname, TAG and timestamp. But we can't
rely on this, as it isn't required.

I would like to introduce this requirement. It is simple to add to the RFC, and simple
to implement. Given the HOSTNAME and TAG, all the implementers have to do is never
send a message with the same timestamp twice. Given the microsecond resolution, this is doable.
(It does imply some systems have to fake the last digits; I don't see a problem with
this. Otherwise we can add a "." SequenceNo to the timestamp.)
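
How little an implementer has to do can be seen in the sketch below: it hands out microsecond timestamps and "fakes the last digits" whenever the clock has not advanced. The class name and the use of integer microseconds are assumptions for this illustration.

```python
class UniqueClock:
    """Hand out microsecond timestamps that never repeat.

    One instance is meant to be used per (HOSTNAME, TAG) combination.
    """

    def __init__(self):
        self._last = -1

    def next(self, now_us):
        """Return a unique timestamp, given the current time in microseconds.

        If the clock did not advance since the previous message, the
        previous value plus one microsecond is returned instead.
        """
        self._last = max(now_us, self._last + 1)
        return self._last


clock = UniqueClock()
stamps = [clock.next(t) for t in (100, 100, 100, 105)]
```

Three messages within the same microsecond get consecutive values, and the fourth one picks up the real clock again.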

Structured data, tokens
=======================

Given the current draft of structured data, or my counter-proposal for it, there are
(only) 2 kinds of tokens: the IANA controlled ones and the experimental ones, the
latter starting with "x-".

I think we can safely add a third: "X-" for private/local/vendor specific tokens. As
we can see in e.g. mail, this kind of field will be used a lot. Now we have an option
to allow them, without giving them the status of "testing".
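
For a receiver, telling the three kinds apart would be a trivial prefix check, as in this sketch (the function name and the returned labels are made up):

```python
def token_kind(sd_id):
    """Classify a structured-data token by its prefix (case-sensitive)."""
    if sd_id.startswith("x-"):
        return "experimental"
    if sd_id.startswith("X-"):
        return "private"
    return "iana"


kinds = [token_kind(t) for t in ("x-gam-example", "X-acme-queue", "origin")]
```

The token names in the last line are invented examples, not registered ones.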


STR-DATA, can we use it for syslog-sign (or similar)?
=====================================================

Just an idea: in a protocol such as syslog-sign (here just as an example), where messages
refer to other messages (now implicitly), could we use the STR-DATA to do so?

I verified the syntax/semantics of this, and YES, we could do so.
This means, I like STR-DATA a lot more :-) It is great. Even when -sign doesn't use 
it, I (we) can use the same format to present it to the user!

64 MultiPart examples
=====================

All examples use rfc3164 headers, shouldn't -protocol headers be used?
