Anton,

> 1. Fragmented messages should be supported over any transport
> including legacy UDP syslog. If you lose some fragments over
> unreliable transport, it is expected. A way to determine that this
> happened would nice to have. I think even your original proposal
> addresses this with SEQNUM.

Actually, UDP was one of the main reasons I went for the more complex approach. With -international-00, we *just* have SEQNUM. Remember that UDP is not just unreliable; frames can also arrive out of sequence. Let's assume you send:

  SEQNO 0 msg1 frag1
  SEQNO 1 msg1 frag2
  SEQNO 0 msg2 frag1
  SEQNO 1 msg2 frag2

And this is received in this sequence:

  SEQNO 0 msg1 frag1
  SEQNO 1 msg2 frag2
  SEQNO 1 msg1 frag2
  SEQNO 0 msg2 frag1

You see the issue? There is *no way* to determine which fragment belongs to which message. This is why I limited fragmentation to reliable transport in -00, but as you said, this is a bad idea. So I looked into a way to do it somewhat more reliably.

Bottom line: a sequence counter alone is not sufficient; we need a message ID that uniquely identifies a message. That ID must be transmitted with each fragment, and then we can use a sequence number. Let's extend the example above. We now send:

  ID 1 SEQNO 0 msg1 frag1
  ID 1 SEQNO 1 msg1 frag2
  ID 2 SEQNO 0 msg2 frag1
  ID 2 SEQNO 1 msg2 frag2

The collector still receives:

  ID 1 SEQNO 0 msg1 frag1
  ID 2 SEQNO 1 msg2 frag2
  ID 1 SEQNO 1 msg1 frag2
  ID 2 SEQNO 0 msg2 frag1

It can now assign the right fragment to the right message.

But now, again, we need to look at how unique the ID is. If it is counted up from 0 (or 1 ;)) each time the engine starts, there is a slight chance of confusion. This will also facilitate replay attacks. This is where I turned to -sign. It has the exact same issue when transferring larger blocks, so it looks a little tempting to me to use what is specified there. I see two advantages in doing so:

#1 it has already been discussed, so there is some good wisdom in it
#2 if you implement both -sign and -international, you will already have the code at hand (and I assume over time both will be implemented, hopefully frequently ;))

> 2. In the new proposal, why do we need to send message length? The "."
> indicates the last message, right? Looks like redundant framing
> methods.

That is a good point. But let's look at the fine print... I introduced it because I borrowed from -sign. The difference, though, is that I don't want the "fragmentation fields" to be mandatory. So "." no longer means it is the last fragment; it means it *is* fragmented. I definitely need to clear up the text for this (if we stick with this approach...). Right now, it is just in the ABNF. If you use ".", you MUST NOT use the fragmentation fields. So the overall size in the frag fields is what effectively tells you when a message is finished. The "."/"*" tells you whether the message is fragmented or not. Any more comments, anyone?

> 3. I think it is better to talk in terms of message parts (fragments
> count) instead of length. It is easier for implementers and for human
> readers. So, I like SEQNUM stuff.

I initially thought so, too, and this is one reason I asked for comments ;). While thinking about how this whole thing could be implemented, I came to the conclusion that the code most probably already knows the overall size of the message. But it does not know how many fragments it will need (OK, a quick computation will give you this). Besides that, the size has the advantage that I can do some verification on the messages, so that I can detect minor modifications made by relays along the path. Regarding human readability - I am right now implementing BEEP for syslog purposes and have found that the byte counts present in the BEEP headers are quite easy to follow once you know which size is in which field. But, agreed, a plain number is obviously easier... ;)

> 4. Minor thing... Can we start any counts at 1 and not 0? It matches
> English better -- it is the *first* fragment, not fragment zero. Why
> reinforce that barrier between CS and normal English? If there is no
> any IETF conventions that dictate otherwise, I would prefer counts to
> start 1. (No, I am not a VisualBasic programmer. :))

Actually, it got in out of habit. I also know a lot of RFCs do it.
I don't think there is an RFC that requires it. I will try to find one. If anybody has a pointer, please post it. All in all, I do not object to starting at 1, and the reasoning sounds good to me. Would anyone like to stay with 0 as the initial number?

> 5. I am concerned about the scope of reboot id and message sequence
> numbers? Are they per process? I think we certainly have to support
> multiple processes firing syslog messages remotely directly without a
> central daemon. In that case, each has to have its own reboot
> (restart?) id and message numbers? Then the process MUST be
> identified in the message, right? There were some discussions on this
> which I did not quite follow.

Yes, this is indeed an issue. Let me put it this way: we must find a solution for coding -sign anyhow. So I would apply this solution to -international, too. I hope I can implement -sign in the weeks to come (but no promise, vacation season is over and it gets busier... ;)). If I or others find it hard to adapt to such a scheme, we may need to change something in -international. But as -sign is quite mature and -international is in its initial stage, I would like to defer this discussion a little.

One approach is to use the second the process started as the reboot session id. This will work for all processes, except those that restart in rapid succession. There is also a slight chance of two processes starting in the same second. This could be tackled with a small random delay (0..9 seconds) on startup. I think I will implement this algorithm in our solutions, as we have the exact needs that you describe - multiple processes sending independently.

I have to admit that I am still looking for a good solution for fast, frequently executing processes. So far, the only thing I have in mind is to persist the reboot session id somewhere, but this opens up Pandora's box [just think of the implications of an OS reinstall]... :-(

> 6.
> How do we identify the message to which a given fragment belongs?
> I can have two processes on the same host originate fragmented
> messages concurrently. Do all fragments have the MSGNUM of the first
> message? There has to be some reference in the fragments to the
> original message. Providing MSGNUM of the first fragment in all
> subsequent fragments could solve this.

Oops... I just re-read my message and I see I actually didn't describe the MSGNBR field from the ABNF... Yes, this is exactly what you are pointing out. Replace your MSGNUM with MSGNBR in your wording (I like your name more) and paste this into my description...

> 7. I would propose that all fragments of the message should have the
> same timestamp. Logically, the timestamp should be close to when the
> event happened, not when it was sent, right?

This is why I said this SHOULD be done. However, I can see some reasoning why an implementation would like to send an updated timestamp with the later fragments. It may want to do this to indicate how long processing takes; it may even need to do this because it allows a quick implementation. I don't see anything terribly wrong with the latter and thus would not like to rule it out. What I have ruled out - or at least tried to - is that time moves backward. What does the rest of the group think? Should using the same timestamp stay a SHOULD or become a MUST?

> 8. The message length limit is a tough one. I am leaning towards not
> limiting the length of messages unless we absolutely have to. The
> receiver should always make sure that it does not attempt to aggregate
> fragments of messages over what its memory limit (or hard limit)
> allows it. It is general out of memory error handling which can happen
> for many reasons, right? Even if we say that message should not
> exceed X MB, it is likely we would not recommend that implementations
> just discard messages over this size, but rather default to a
> different behavior, right?
> Just like UDP syslog receivers probably
> won't discard messages over 1024 bytes.

Actually, I am more or less of the same point of view. I thought allowing 9,999,999 bytes would be sufficient, but, yes, this may turn out to be a false impression. I just like to place some limits on the fields, as no limits lead to less robust code and thus easier exploitation. I can raise this limit if there is no opposition, but I would definitely like to limit it to the 32-bit unsigned range. It is very hard to think of anybody sending syslog messages of that size. And going above 32 bits is probably asking for coding errors.

Not limiting this field at all will definitely cause big trouble. For example, I suspect that we will see some interop issues with -sign code as soon as the first client uses a reboot session id beyond the 32-bit range. For my -sign implementation, I have already identified a module to do 64-bit arithmetic for machines / compilers that do not support this natively (it also opens up a box of "portability worms" ;)). I would like to remove this error cause from -international (but it stays inside the reboot session id, if we carry it over from -sign - this is why I asked if reboot session id should be limited to the 32-bit unsigned range).

> 9. Does it make sense to put message fragmentation into a separate RFC
> from international? The process-unique sequence numbers, for example,
> can be used in many contexts.

As you know from my previous posts, I personally would like to see this. But I have the impression that there is not enough support for it at present. So I am trying to bring this into a separate section of -international, which can then be referred to by later IDs/RFCs without the need for a full implementation of -international. I hope this suits all well... and is doable under the current process.

> 10. Do I think all this is an overkill? Well, what's the alternative?
> In the interim, at Cisco we had to come up with a pretty similar
> proprietary scheme.

Well, my point is that I would like to see if there is consensus which could lead to implementation. You may write a really smart and bright spec (I am sure mine isn't ;)) - but that doesn't help if nobody implements it. So my experience is that it is often better to do what you think is second-best - if that will lead to adoption.

In this spirit: looking forward to another round of comments ;)

Rainer
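
For reference, the ID-plus-SEQNO example from point 1 can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `reassemble`, the tuple layout, and the `frags_per_msg` parameter are hypothetical and are not taken from -international or -sign. In particular, `frags_per_msg` stands in for whatever tells the collector that a message is complete (in the proposal discussed above, that role is played by the overall size in the frag fields).

```python
from collections import defaultdict


def reassemble(datagrams, frags_per_msg):
    """Group (msg_id, seqno, payload) tuples by msg_id, join in seqno order.

    frags_per_msg is a hypothetical completeness criterion used only to
    keep this sketch short.
    """
    pending = defaultdict(dict)  # msg_id -> {seqno: payload}
    complete = {}                # msg_id -> reassembled text
    for msg_id, seqno, payload in datagrams:
        pending[msg_id][seqno] = payload
        if len(pending[msg_id]) == frags_per_msg:
            parts = pending.pop(msg_id)
            complete[msg_id] = "".join(parts[i] for i in sorted(parts))
    # Whatever is left in pending lost at least one fragment.
    return complete, dict(pending)


# Arrival order from the example: fragments of two messages interleaved.
received = [
    (1, 0, "msg1 frag1 "),
    (2, 1, "msg2 frag2"),
    (1, 1, "msg1 frag2"),
    (2, 0, "msg2 frag1 "),
]
complete, incomplete = reassemble(received, frags_per_msg=2)
```

With the ID present, both messages come out intact despite the reordering. A message that lost a fragment stays in the second return value, so the collector can at least tell that something is missing. The sketch does not address how unique the ID is across restarts, which is the open question above.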
