RE: -international - support for oversize messages

Rainer Gerhards Wed, 10 Sep 2003 06:53:31 -0700

Anton,

> 1. Fragmented messages should be supported over any transport
> including legacy UDP syslog.  If you lose some fragments over
> unreliable transport, it is expected. A way to determine that this
> happened would be nice to have. I think even your original proposal
> addresses this with SEQNUM.


Actually, UDP was one of the main reasons I went into the more complex
approach.

With -international-00, we *just* have SEQNUM. Remember that UDP is not
just unreliable, but frames can also arrive out of sequence. Let's
assume you send:

SEQNO 0 msg1 frag1
SEQNO 1 msg1 frag2
SEQNO 0 msg2 frag1
SEQNO 1 msg2 frag2

And this is received in this sequence:

SEQNO 0 msg1 frag1
SEQNO 1 msg2 frag2
SEQNO 1 msg1 frag2
SEQNO 0 msg2 frag1

You see the issue? There is *no way* to determine which fragment
belongs to which message. This is why I limited fragmentation to
reliable transport in -00, but as you said, this is a bad idea. So I
looked into a way to do it somewhat more reliably.

Bottom line: a sequence counter alone is not sufficient; we need to
have a message id that uniquely identifies a message. That ID must be
transmitted with each fragment and then we can use a sequence number.

Let's extend the example above.

We now send:

ID 1 SEQNO 0 msg1 frag1
ID 1 SEQNO 1 msg1 frag2
ID 2 SEQNO 0 msg2 frag1
ID 2 SEQNO 1 msg2 frag2

The collector still receives:
ID 1 SEQNO 0 msg1 frag1
ID 2 SEQNO 1 msg2 frag2
ID 1 SEQNO 1 msg1 frag2
ID 2 SEQNO 0 msg2 frag1

It now can assign the right fragment to the right message.
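
A minimal sketch of what the collector could do with these four
fragments. The tuple layout, the function name and the frags_per_msg
parameter are assumptions made for illustration only; nothing here is
taken from the draft, which signals completeness differently.

```python
from collections import defaultdict

def reassemble(fragments, frags_per_msg):
    """Group fragments by message ID, then order each group by SEQNO.

    fragments: iterable of (msg_id, seqno, payload) in arrival order.
    frags_per_msg: hypothetical stand-in for however the collector
    learns that a message is complete.
    """
    pending = defaultdict(dict)   # msg_id -> {seqno: payload}
    complete = {}
    for msg_id, seqno, payload in fragments:
        pending[msg_id][seqno] = payload
        if len(pending[msg_id]) == frags_per_msg:
            parts = pending.pop(msg_id)
            complete[msg_id] = [parts[i] for i in sorted(parts)]
    return complete

# Arrival order from the example above (out of sequence).
received = [
    (1, 0, "msg1 frag1"),
    (2, 1, "msg2 frag2"),
    (1, 1, "msg1 frag2"),
    (2, 0, "msg2 frag1"),
]
result = reassemble(received, frags_per_msg=2)
```

Keying the pending table on the ID is what removes the ambiguity; the
SEQNO is then only needed to order fragments within one message.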

But now, again, we need to look at how unique the ID is. If it is
counted up from 0 (or 1;)) each time the engine starts, there is a
slight chance of confusion. This will also facilitate replay attacks.
This is where I turned to -sign. It has the exact same issue when
transferring larger blocks, so it looks a little tempting to me to use
what is specified there. I see two advantages in doing so:

#1 it has already been discussed, so there is some good wisdom in it
#2 if you implement both -sign and -international, you will already have
   the code at hand (and I assume over time both will be implemented
   hopefully frequently;))
>
> 2. In the new proposal, why do we need to send message length? The "."
> indicates the last message, right?  Looks like redundant framing
> methods.

That is a good point. But let's look at the fine print...

I introduced it because I borrowed from -sign. The difference, though,
is that I don't like to see the "fragmentation fields" as mandatory ones.
So "." no longer means it is the last fragment but it means it *is*
fragmented. I definitely need to clear up the text for this (if we stick
with this approach...). Right now, it is just in the ABNF. If you use
".", you MUST NOT use the fragmentation fields. So the overall size in
the frag fields is what effectively tells you when a message is
finished. The "."/"*" tells you if the message is fragmented or not.

Any more comments anyone?

> 3. I think it is better to talk in terms of message parts (fragments
> count) instead of length.  It is easier for implementers and for human
> readers.  So, I like SEQNUM stuff.

I initially thought so and this is one reason I asked for comments ;).
While thinking on how this whole thing could be implemented, I came to
the conclusion that the code most probably already knows the overall
size of the message. But it does not know in how many fragments it will
fit (ok, a quick computation will give you this). Besides that, the size
has the advantage that I can do some verification on the messages so
that I can detect minor modifications done by relays along the path.
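
To illustrate both points, here is a rough sketch, assuming the sender
knows the overall size and a maximum payload per fragment. The function
names and the max_payload parameter are hypothetical. Note that a plain
size comparison only catches modifications that change the length.

```python
def fragment_count(total_size, max_payload):
    """Fragments needed for total_size bytes (ceiling division)."""
    if total_size <= 0 or max_payload <= 0:
        raise ValueError("sizes must be positive")
    return (total_size + max_payload - 1) // max_payload

def check_reassembled(total_size, payloads):
    """Compare the announced overall size with the bytes received."""
    received = sum(len(p) for p in payloads)
    if received < total_size:
        return "incomplete"       # still waiting for fragments
    if received > total_size:
        return "size mismatch"    # something was altered on the path
    return "complete"
```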

Regarding human readability - I am right now implementing BEEP for syslog
purposes and have found that the byte counts present in the BEEP headers
are quite easy to follow once you know which size is in which field.
But, agreed, a plain number is obviously easier... ;)

> 4. Minor thing... Can we start any counts at 1 and not 0?  It matches
> English better -- it is the *first* fragment, not fragment zero. Why
> reinforce that barrier between CS and normal English?  If there are no
> IETF conventions that dictate otherwise, I would prefer counts to
> start at 1.  (No, I am not a VisualBasic programmer. :))

Actually, it got in out of habit. I also know a lot of RFCs do it. I
don't think there is an RFC that requests it. I will try to find one. If
anybody should have a pointer, please post it. All in all, I do not
object starting at 1 and the reasoning sounds good to me.

Would anyone like to stay with 0 as the initial number?

> 5. I am concerned about the scope of reboot id and message sequence
> numbers?  Are they per process?  I think we certainly have to support
> multiple processes firing syslog messages remotely directly without a
> central daemon. In that case, each has to have its own reboot
> (restart?) id and message numbers?  Then the process MUST be
> identified in the message, right?  There were some discussions on this
> which I did not quite follow.

Yes, this is indeed an issue. Let me say it this way: we must find a
solution for coding -sign anyhow. So I would apply this solution to
-international, too. I hope I can implement sign in the weeks to come
(but no promise, vacation season is over and it gets busier... ;)) If I
or others find it hard to adapt to such a scheme, we may need to change
something in -international. But as -sign is quite mature and
-international is in its initial stage, I would like to defer this
discussion a little.

One approach is to use the second the process started as the reboot
session id. This will work for all processes, except those that start
quite rapidly. There is also a slight chance of two processes starting
in the same second. This could be tackled with a small (0..9) second
random delay on startup.
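
A rough sketch of that idea, assuming the id is simply the Unix time in
seconds taken after the random delay. The function name is made up, and
the clock, sleep and random source are passed in only so the sketch can
be exercised without actually waiting.

```python
import random
import time

def make_session_id(now=time.time, sleep=time.sleep, rng=random):
    """Derive a reboot session id from the process start second.

    A random 0..9 second delay makes it less likely that two processes
    launched in the same second end up with the same id.
    """
    delay = rng.randint(0, 9)   # inclusive on both ends
    sleep(delay)
    return int(now())
```

This only lowers the chance of a collision; two processes can still
draw delays that land them in the same second.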

I think I will implement this algorithm in our solutions as we have the
exact needs that you describe - multiple processes sending
independently. I have to admit that I am still looking for a good
solution for fast, frequently executing processes. So far, the only
thing I have on my mind is to persist the reboot session id somewhere,
but this opens up Pandora's box [just think of the implications of an
OS reinstall]... :-(

> 6. How do we identify the message to which a given fragment belongs?
> I can have two processes on the same host originate fragmented
> messages concurrently.  Do all fragments have the MSGNUM of the first
> message?  There has to be some reference in the fragments to the
> original message.  Providing MSGNUM of the first fragment in all
> subsequent fragments could solve this.

Oops... I just re-read my message and I see I actually didn't describe
the MSGNBR field from the ABNF... Yes, this is exactly what you are
pointing out. Replace your MSGNUM with MSGNBR in your wording (I like
your name more) and paste this into my description...

> 7. I would propose that all fragments of the message should have the
> same timestamp.  Logically, the timestamp should be close to when the
> event happened, not when it was sent, right?

This is why I said this SHOULD be done. However, I can see some reasoning
that an implementation would like to send an updated timestamp with the
later fragments. They may want to do this to indicate how long
processing takes; they may even need to do this because this would allow
a quick implementation.

I don't see anything terribly wrong with the latter and thus would not
like to rule it out. What I - at least have tried to - rule out is that
time moves backward.

What does the rest of the group think? Should using the same timestamp
stay a SHOULD or become a MUST?

> 8. The message length limit is a tough one. I am leaning towards not
> limiting the length of messages unless we absolutely have to. The
> receiver should always make sure that it does not attempt to aggregate
> fragments of messages over what its memory limit (or hard limit)
> allows it. It is general out of memory error handling which can happen
> for many reasons, right?  Even if we say that message should not
> exceed X MB, it is likely we would not recommend that implementations
> just discard messages over this size, but rather default to a
> different behavior, right?  Just like UDP syslog receivers probably
> won't discard messages over 1024 bytes.

Actually, I am more or less of the same point of view. I thought allowing
9,999,999 bytes would be sufficient, but, yes, this may turn out to be a
false impression. I just like to place some limits on the fields, as no
limits lead to less robust code and thus easier exploitation. I can move
up this limit if there is no opposition, but I would definitely like to
limit it to the 32bit unsigned range. It is very hard to think of
anybody sending a syslog message of that size. And going above 32 bit is
probably calling for coding errors. Not limiting this field at all will
definitely cause big troubles.
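
As a sketch of what such a bound buys the receiver, assuming the
overall size arrives as a string of decimal digits (the function name
and the constant are hypothetical, not from the draft):

```python
MAX_TOTAL_SIZE = 2**32 - 1   # upper end of the 32 bit unsigned range

def parse_total_size(field):
    """Parse the overall-size field, rejecting anything out of range."""
    # Only plain ASCII digits, and no more digits than 4294967295 has.
    if not (field.isascii() and field.isdigit()) or len(field) > 10:
        raise ValueError("malformed size field")
    value = int(field)
    if value > MAX_TOTAL_SIZE:
        raise ValueError("size exceeds 32 bit unsigned range")
    return value
```

With a fixed upper bound the parser can reject a hostile or broken
field before any buffer is sized from it.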

For example, I suspect that we will see some interop issue with -sign
code as soon as the first client uses a reboot session id beyond the 32
bit scope. When I implement -sign, I already have identified a module to
do 64 bit arithmetic for machines / compilers that do not support this
natively (it also opens up a box of "portability worms" ;)). I would
like to remove this error cause from -international (but it stays inside
the reboot session id, if we carry it over from -sign - this is why I
asked if reboot session id should be limited to 32 bit unsigned range).

> 9. Does it make sense to put message fragmentation into a separate RFC
> from international?  The process-unique sequence numbers, for example,
> can be used in many contexts.

As you know from my previous posts, I personally would like to see this.
But I have the impression that there is not enough support for it at the
time being. So I am trying to bring this into a separate section of
-international, which then can be referred to by later IDs/RFCs without
the need for a full implementation of -international. I hope this suits
all well... and is doable under the current process.

> 10. Do I think all this is an overkill?  Well, what's the alternative?
> In the interim, at Cisco we had to come up with a pretty similar
> proprietary scheme.

Well, my point is I would like to see if there is consensus which could
lead to implementation. You may write a really smart and bright spec (I
am sure mine isn't ;)) - but that doesn't help if nobody implements it.
So my experience is that it is often better to do what you think is the
second-best - if that will lead to adoption.

In this spirit: looking forward to another round of comments ;)

Rainer
