Re: A hideously long description of syslog-auth

John Kelsey Wed, 06 Dec 2000 11:08:29 -0800
-----BEGIN PGP SIGNED MESSAGE-----

At 12:11 PM 12/5/00 -0800, Darren New wrote:
>BTW, did you mean for your reply to me to go to the list? I
>don't think it did, so neither did this. But feel free to
>forward it or whatever.

I forgot to add the syslog list to my note.  I've forwarded
my note; I didn't forward either of your notes, but you
probably should.  I'm planning to forward this message to
the list, as well.

...
>> Hmmm.  I see your point, but S-A doesn't really do anything
>> for reliability in the sense of making sure your packets
>> arrive, it just tells you whether they arrived or not, and
>> guarantees that they weren't changed in transit and
>> originated from the sender you expect them to have
>> originated from.
>
>Well, "delivered reliably" to me implies you know it
>arrived and you know that it wasn't changed. S-A gives you
>"you know whether it arrived" and you have a guarantee it
>wasn't changed. So you're 3/4ths the way there.

I guess we mean very different things, here.  TCP actually
does something to make the delivery reliable, in the absence
of an attacker sitting in the middle blocking messages.
Syslog-auth as I'm describing it tells you whether a
message has been altered or replayed, and tells you whether
you've missed an intervening message.  It can't do anything
to make those missing messages turn up, it just tells you
they're missing.
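
A minimal sketch of that receiver-side bookkeeping, in
Python.  The Receiver class, the 8-byte counter encoding, the
assumption that counters start at 1, and the use of
HMAC-SHA256 are all illustrative choices, not anything taken
from the syslog-auth draft.  The point is only that the
receiver can label a message as altered, replayed, or late,
and can list the counters it never saw, without being able to
recover them.

```python
import hashlib
import hmac


def make_tag(key: bytes, counter: int, payload: bytes) -> bytes:
    """MAC over counter and payload (HMAC-SHA256 is an assumption)."""
    data = counter.to_bytes(8, "big") + payload
    return hmac.new(key, data, hashlib.sha256).digest()


class Receiver:
    """Hypothetical receiver state for one sender and one counter."""

    def __init__(self, key: bytes) -> None:
        self.key = key
        self.highest = 0      # highest counter accepted so far
        self.missing = set()  # counters skipped and not yet seen

    def accept(self, counter: int, payload: bytes, tag: bytes) -> str:
        expected = make_tag(self.key, counter, payload)
        if not hmac.compare_digest(expected, tag):
            return "altered"
        if counter > self.highest:
            # Everything between the old high-water mark and this
            # counter is now a known gap.
            self.missing.update(range(self.highest + 1, counter))
            self.highest = counter
            return "ok"
        if counter in self.missing:
            # A message that fills a known gap arrived out of order.
            self.missing.discard(counter)
            return "late"
        return "replay"
```

After accepting counters 1, 2, and 4, `missing` holds 3; the
sketch can report that gap, but nothing in it can make
message 3 turn up.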

>syslog-reliable guarantees that it gets delivered, that it
>wasn't changed in transit, and that the previous
>relay/device is who you think it is. The only things it
>doesn't guarantee are (1) storage security, and (2) messages
>relayed thru a compromised machine. In other words, it
>doesn't guarantee that the message originated from somewhere
>other than where you are receiving it from - the security
>only covers one hop.

Syslog-reliable can make some additional assumptions, like

a.  Two-way message delivery

b.  Far fewer constraints on resources for the machines
implementing it.

Syslog-auth really has to provide what security can be
provided, given minimal resources and unreliable one-way
transport.

Now, I can see what you're saying here--if syslog-auth can
do all the crypto for all messages, then maybe we should
just use syslog-auth over TCP to get syslog-reliable.
Alternatively, maybe that means we're sticking the kitchen
sink into syslog-auth, when it should really just provide
minimal security for low-end machines and for situations
where reliable delivery isn't available.  I guess one thing
that bugs me about this is that once we've gone to the
trouble of getting syslog-auth implemented and installed,
it's not that much more work to get reasonable security
guarantees.  (I didn't really make this clear in my huge
document.  The way syslog-auth fits together in my head is
not really very clearly specified in the document/spec, just
yet.)

Anyway, if we've decided we want to put authentication of
some kind into machines that have to keep using UDP, and
that may not have a clock or anything similar, I think we
kind-of end up at syslog-auth, and that this is a reasonable
place to end up.  The alternatives seem to me to be either:

a.  Dumb down syslog-auth so that it won't compete with
syslog-reliable.

b.  Get rid of syslog-auth, and let people who can't or
won't use TCP for their syslog traffic do without security.

These both seem unreasonable to me, but maybe we should try
to hash this out at the meeting.

>> The thing that makes this so complicated
>> isn't really the duplication of TCP in things like sequence
>> numbers, it's dealing with huge numbers of options for the
>> devices, relays, and collectors, and propagating verified
>> statements from a chain of relays to the final collector.
>>
>> How does syslog-reliable handle forwarded information from
>> an old-style device?  My impression is that it doesn't do
>> anything with it, because that's not the way you're supposed
>> to use this stuff.
>
>I'm not sure what you mean by "do anything". It doesn't
>modify it. It receives it (reliably, securely,
>authenticatedly, etc) and passes it along. It's not
>unreasonable to expect that a syslog-reliable relay could be
>set up that can parse the types of "raw" messages it gets
>and extract out more precise information such as the
>timestamp, process ID, etc and break them out so they're
>easier to handle.

Right.  But if a syslog-reliable relay gets a message from
an old-style device, it seems to me that the eventual
collector that gets that message ought to somehow be
informed that, although this arrived over syslog-reliable,
there's no guarantee that this message wasn't changed in
transit over its first hop, that it really came from the
device we think it came from, that it's fresh (not replayed),
that it arrived in the order it was sent relative to other
messages, or that there weren't some messages dropped on
that first hop.  Otherwise, someone reading that log file on
the collector will make a bunch of incorrect assumptions,
because they'll believe that the messages in the log file
were protected when they weren't.

>>  And I'm inclined to think that this is
>> exactly right.  Our goal is clearly for people to move to
>> syslog-reliable wherever possible, since that gives superior
>> guarantees (e.g., instead of just noting where the missing
>> messages are, we either have all the messages in sequence,
>> or have evidence of a denial-of-service attack of some
>> kind), and since we can then use two-way communications for
>> key management and such.  But I wanted to make sure that
>> syslog-auth would be something that could be dropped in with
>> a minimum of hassle, and which would provide a clear
>> statement of guarantees about what could be said about the
>> logs stored on the concentrator.  And syslog-auth over TCP,
>> or over some other reliable delivery mechanism, would (I
>> think) provide the same kind of guarantees that
>> syslog-reliable would.
>
>Yes, that last is my basic complaint. :-)

Hmmm.  This seems inevitable.  Syslog-auth is supposed to
provide all the assurances that it's possible to provide,
given the limitations of one-way, unreliable transport.
When it's given reliable transport, it will provide all
those guarantees plus the knowledge that reliable transport
exists, as well.
...

>> Anyway, I am writing up syslog-sign (I'll think of a better
>> name) to deal with the whole totally in-band storage
>> authentication scheme I described before.  It's about ten
>> times simpler, since I don't have to worry about
>> transmission security at all.
>
>Yes. I (personally) think the better way to go would be to
>have "syslog-sign" that guarantees the message came from
>where it claims to have come from and hasn't been changed
>(i.e., not unlike the "storage signature") and use
>syslog-reliable for on-the-wire security and authentication.

So, what do you do for installations where they can't or
won't use a TCP session for each syslog ``session?''

>Basically, what syslog-reliable already gives you is:
>1) Delivery as reliable as TCP to the next hop.
>2) Secrecy from on-the-wire observation (i.e., encryption).
>3) Assurance (via shared password) that the previous hop
>   is who they claim to be.
>4) Replay prevention (via nonces).

>What I understand syslog-auth to be giving you is
>1) Knowledge of whether a message has been lost
>   (maybe, and with enough effort)
>2) Assurance of the previous hop (as (3) above)
>3) Replay prevention (as (4) above)
>   (uh, maybe. See below.)
>4) Authentication of the originator, in spite
>   of compromised relays
>5) Storage security, in spite of compromised
>   collectors.

Right.  Actually, (1) and (3) are guaranteed if the device,
relays, and collector all implement them, and are really
easy to get when messages are going directly from device to
collector.  When messages are going through a relay or two,
things get more complicated for both gap and replay
detection, but they're both possible.  I think the
difference here is that I'm working with the assumption that
relays have a lot of options; that they may or may not
implement online replay detection from all devices, for
example.

>I think the degree of overlap is high enough that it's
>worth discussing at the WG.

I agree it's worth discussing.

...
>> >Consider that it's only a
>> >collision if you actually *store* all the old session
>> >numbers; consider how much memory it would take to even list
>> >a sparse bit array with 2^96 entries in it. (I haven't done
>> >this, but I expect it's rather large if you assume you have
>> >(say) 2^32 bits set out of 2^96.)

>> There are two sides to this.
>> [...]
>> The pseudorandom RSID might need to be a little smaller, but
>> I hate to make it much smaller, since that will decide what
>> anyone is allowed to use it for in the future.
>
>I just thought that if the maximum reasonable number of
>session numbers is about 2^32, using 2^96 bits of selection
>space is bigger than it needs to be. 2^64 is probably
>plenty.

Well, you run into the birthday paradox here.  With 2^{32}
pseudorandom 64-bit session IDs generated by a single device
in its lifetime, we have about 2^{63} pairs of session IDs,
and we expect one collision with probability a little less
than 1/2.  So if we believe that some device using
pseudorandom session IDs will really generate 2^{32} session
IDs in its lifetime, then we expect a collision at some
point.  I don't ever want to have an incident of a reboot
session ID repeating in the lifetime of any syslog-auth
machine, so I chose a size of reboot session ID that would
give us 2^{-33} probability of ever seeing a collision for
any given device with this unreasonably large number of
reboot sessions in its lifetime.

Now, I suspect you're right that 96 bits is overkill,
because I can't imagine a device actually having 2^{32}
reboot session IDs in its lifetime under any kind of normal
circumstances.  If we imagine a device with a 100 year
lifetime, which reboots 100 times a day, we get less than
2^{22} possible session IDs, which we might take as an upper
bound.  That is about 2^{43} pairs of session IDs, which
would require a 76-bit session ID field to give us the same
2^{-33} probability of a given device with this number of
reboots ever getting the same reboot session ID twice by
chance.
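
The birthday arithmetic above can be checked with a short
Python sketch.  The function names are hypothetical, and the
probability uses the usual Poisson approximation
1 - exp(-pairs / 2^bits), which is my assumption about how to
turn an expected number of collisions into a probability.

```python
import math


def collision_probability(num_ids: int, bits: int) -> float:
    """Approximate chance that some pair of random IDs is equal."""
    pairs = num_ids * (num_ids - 1) // 2
    expected = pairs / 2 ** bits   # expected number of colliding pairs
    return -math.expm1(-expected)  # Poisson approximation


def bits_needed(num_ids: int, log2_bound: int) -> int:
    """Smallest ID width with expected collisions <= 2**-log2_bound."""
    pairs = num_ids * (num_ids - 1) // 2
    # Need 2**bits >= pairs * 2**log2_bound.
    return (pairs * 2 ** log2_bound - 1).bit_length()


print(collision_probability(2 ** 32, 64))  # 64-bit IDs, 2^32 sessions
print(collision_probability(2 ** 32, 96))  # 96-bit IDs, 2^32 sessions
print(bits_needed(100 * 365 * 100, 33))    # 100 reboots/day, 100 years
```

The last call uses 365-day years, so it slightly undercounts
leap days; that does not change the bit count.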

>> >4.3.2 - I think the wording you're looking for is that the
>> >PRNG needs to be cryptographically secure.
>>
>> That's close.  I need a seed that has something like 96 bits
>> of entropy in it, in the sense that I should expect to have
>> to wait until I've seen on the order of 2^{48} independently
>> generated seeds, before I see a pair that are equal.
>> There's no need for cryptographically strong mechanisms to
>> expand the seed or condense it.  There's not even any need
>> for the seed to be hard to guess, given other information
>> like the time of reboot or what's going on on the device's
>> local network.  The only thing that's required is that we
>> get a unique RSID.
>
>Then I've missed the point of your statement that a C
>library PRNG is inadequate.

The problem is in the seeding.  C libraries usually use a
linear congruential generator, which will have a 16 or 32
bit seed. Now, the best a PRNG of any kind can do is to take
the initial seed and get all of its entropy into the reboot
session ID.  But that basically means that the best we can
do is to put the PRNG's seed directly into the reboot
session ID field.  If there is a 16-bit seed, then there
will be only 2^{16} possible reboot session IDs generated by
it, and it doesn't matter what we use to expand that seed to
a session ID.  After about 256 reboot sessions, we will
expect to see a session ID repeated.

If a 16- or 32-bit seed is sufficient, then so is a 16- or
32-bit reboot session ID.  I hope I've made it clear why I
don't think a 16- or 32-bit pseudorandom session ID is okay.
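
A rough simulation of the seeding problem, in Python.  It
assumes a 16-bit seed drawn uniformly at each reboot and uses
SHA-256 as a stand-in for "whatever expands the seed into a
96-bit session ID"; both are my assumptions for illustration,
and all names here are hypothetical.  However good the
expansion is, it can only ever produce 2^{16} distinct
outputs, so repeats show up after a few hundred reboots.

```python
import hashlib
import random

SEED_BITS = 16  # width of the PRNG seed (assumed)
RSID_BITS = 96  # width of the reboot session ID field


def rsid_from_seed(seed: int) -> int:
    """Expand a small seed into a 96-bit reboot session ID."""
    digest = hashlib.sha256(seed.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[: RSID_BITS // 8], "big")


def reboots_until_repeat(rng: random.Random) -> int:
    """Count reboots until some reboot session ID repeats."""
    seen = set()
    reboots = 0
    while True:
        reboots += 1
        rsid = rsid_from_seed(rng.randrange(2 ** SEED_BITS))
        if rsid in seen:
            return reboots
        seen.add(rsid)


def mean_reboots_until_repeat(trials: int, rng_seed: int = 0) -> float:
    """Average the repeat time over several simulated devices."""
    rng = random.Random(rng_seed)
    total = sum(reboots_until_repeat(rng) for _ in range(trials))
    return total / trials


# Distinct IDs reachable from every possible seed: at most 2**16.
print(len({rsid_from_seed(s) for s in range(2 ** SEED_BITS)}))
print(mean_reboots_until_repeat(200))
```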

...
>Actually, you answer it later. My statement was meant to
>imply "if 64 bits is too big for me to brute force, why do I
>need 96 bits of key ID?" Later, you answer that as trying to
>reduce the likelihood of key-id collision. I still think 64
>bits is way plenty - that implies (if I understand the math
>right) something like a 1/65000 chance of two machines having
>the same key-id, *assuming* every single IPv4 address is in
>use at once and running syslog-auth and all talking to the
>same collector.

If there are 2^x key IDs being handled by a given collector,
and the key IDs are n-bit random numbers, then the
expected number of collisions is about 2^{2x-1-n}.  With
2^{32} key IDs in use, you'd expect about 1/2 a collision,
which means you wouldn't be surprised to see a collision.
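
For reference, the estimate above comes from counting pairs
of key IDs, assuming the IDs are independent and uniformly
random n-bit values:

```latex
\[
  E[\mathrm{collisions}]
    = \frac{2^{x}\,(2^{x}-1)}{2} \cdot 2^{-n}
    \approx \frac{2^{2x}}{2} \cdot 2^{-n}
    = 2^{2x-1-n}
\]
\[
  x = 32,\ n = 64: \qquad 2^{2 \cdot 32 - 1 - 64} = 2^{-1} = \frac{1}{2}
\]
```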

Again, I suspect you're right--the fields are defined wider
than they need to be.  On one hand, that leaves a lot of
room for expansion later; on the other, it wastes bits and
bandwidth and disk space.

>> Anyway, the basic question a relay needs to answer about a
>> message it's forwarding over syslog-auth is ``Do I have some
>> reason to trust that this message hasn't been altered or
>> replayed in transit to me?''  And it will embed its answer
>> in a flag in the forwarding block, so that later relays and
>> the final collector will know the answer.
>
>OK, so it's not "when a message is received from an
>old-style forwarder..."? It's really "when a message is
>received with an unrecognised key-id, it is treated as if it
>came from an old-style forwarder..."?

Right.  The text should read ``when a message is received
with no authentication,'' since what we care about in this
context is what the relays can pass along about the message.

This is a big underlying idea I don't think I made very
clear in the document.  When a message is going from a
device straight to a collector, the collector can
immediately determine what ``promises'' are being made by
the crypto.  The point of the forwarding block is to make it
equally clear what promises are being made by the crypto
when messages go through one or more relays on the way to
the collector.

What this basically means is that there are some status
flags of the message, such as

a.  ``If this message had been a replay, I or some previous
relay would have discarded it.''

or

b.  ``The receiver of this message can count on the original
authentication block's destination counter to show all
messages it should be receiving; you can use that counter to
see gaps.''

or

c.  ``This message has traveled along syslog-auth for its
whole lifetime.''

Now, whenever a relay receives a message that's already been
forwarded, it effectively ANDs its status flags (e.g. ``I would
have detected a replay from the previous sender.'') into the
status flags from the previous sender.  If there's ever a
relay that wouldn't be able to tell if this was a replayed
message, then all later receivers will be informed of this
fact.  If messages that were originally sent to the same
destination (all using one destination counter to keep them
in sequence) are kept together at this relay, then it can
inform later receivers of this fact.
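
A minimal sketch of the AND-ing, in Python.  The flag names
and bit values are hypothetical stand-ins for (a), (b), and
(c) above, not the encoding in the forwarding block.

```python
from enum import IntFlag
from functools import reduce


class Status(IntFlag):
    REPLAY_CHECKED = 1   # (a) a replay would have been discarded
    COUNTER_USABLE = 2   # (b) destination counter still shows gaps
    AUTH_WHOLE_PATH = 4  # (c) on syslog-auth for its whole lifetime


ALL = (Status.REPLAY_CHECKED
       | Status.COUNTER_USABLE
       | Status.AUTH_WHOLE_PATH)


def forward(incoming: Status, relay_vouches_for: Status) -> Status:
    """A relay keeps only the promises that it can also make."""
    return incoming & relay_vouches_for


def status_at_collector(hops) -> Status:
    """Fold the flags of every hop, starting from a fresh message."""
    return reduce(forward, hops, ALL)


# The second relay cannot detect replays; every later receiver sees that.
path = [ALL, Status.COUNTER_USABLE | Status.AUTH_WHOLE_PATH, ALL]
print(status_at_collector(path))
```

Because the combination is a plain AND, a promise that is
dropped at any hop can never reappear further down the chain.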

It may be that we should just simplify this whole idea away,
and require that destination counters are either not kept or
are kept per priority value, and that relays must always
forward messages based on priority values only, and that the
final collector is responsible for checking for replays.
That would simplify the role of the relays considerably.

>> Each relay checks to make sure that the incoming message
>> isn't being replayed, from the device or the previous relay.
>>
>> Here's the idea I'm trying to get across:
>>
>> (Device,Relay1) share a key and some context.
>> (Relay1,Relay2) share a key and some context.
>> (Relay2,Collector) share a key and some context.
>>
>> There's no reason to expect Device to share context with
>> Collector or Relay2, or for Relay1 and Collector to share
>> any context.  There's no reason to even imagine that Device
>> has any clue about the very existence of Relay2 or the
>> Collector.

>The problem is, someone breaks into Relay3, and replays
>messages from Device directly into Collector. At this point,
>Collector is trusting Relay3 to not forward the replays,
>right? Since it shares no context with Device or Relay1 that
>would let it detect this? Or if someone breaks into Relay1,
>they can send all the replays they want, yes?

Right.  In fact, this is an inevitable consequence of having
no shared key or context between Device and Collector.
Relay3 can just send it properly-formatted garbage in the
place of any replayed messages--how will Collector tell?
It can't really use the original authentication block to
check whether Relay3 is flooding it, since its only reason
to trust the original authentication blocks on its messages
from Device is because they're authenticated by Relay3.

Imagine the same situation with syslog-reliable--a relay is
compromised.  Wouldn't the relay be able to do replays (and
worse), assuming there's no shared key between the device
and the collector?

This is one reason to want to have a storage MAC--so that
the worst of this kind of attack can be detected offline.

>Even if the relays also numbered all the messages they sent
>(which I didn't see in the spec, altho I might have missed
>it when I looked), if you rely on only the relays to do
>replay protection, your relays become the points of
>vulnerability.

True.  In fact, so long as we have the situation where

Device -> Relay -> Collector,

where Device and Collector share no key material, we're
relying on Relay for message integrity and origin
authentication, as well.  A compromised Relay can send us
anything at all.  This is basically a feature of how we're
managing keys.

...
>> Right.  The big potential problem seems to me to exist if
>> the relay holds big messages back for a long time, but
>> happily sends along smaller messages right away.  If many
>> relays along a path do this, the message could end up
>> arriving outside its replay window.
>
>Yes, but I don't see any way to prevent this, if, as I say,
>the relays are permitted to hold messages arbitrarily long.
>The calculation above gives an upper limit on queueing
>delays, and is very conservative, with the hope that the
>application-added delays are sufficiently small that they're
>absorbed in the queueing delays.

My understanding of networking issues is mostly theoretical,
but it seems to me that:

a.  It's not possible in principle to make a replay window
big enough to handle all contingencies when we have
arbitrary-length chains of relays.

b.  It is possible in practice to make a replay window big
enough to handle nearly all cases that ever occur.

Does this look reasonable to you, or am I missing something?

...
>> >Appendix A: It's not that hard to test, if you have the
>> >source to the code. You simply put an "if" at the front of
>> >the hash function that says "If the key is YADDA1 or YADDA2,
>> >return a hash of 123". Then don't use YADDA1 or YADDA2 in
>> >your real configuration. :-)
>>
>> Well, the test would have to deal with intelligent handling
>> of colliding key IDs.
>
>I'm sorry. I figured what you'd do is put the key into the
>device, and into the relay, and when you put it into the
>relay, the relay would hash it into a key-id and say "Hey,
>this key-id is already there!" In other words, the relay
>already knows all the key-id's it can recognise, so when you
>configure it to recognise a new one, that's when you squawk.
>Then the administrator goes and picks a different password
>for the device.

Ah, I see.  You're right, this is the right place to check
for this, either here or when the administrator generates a
key for the thing in the first place.
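
A small sketch of that configuration-time check, in Python.
Deriving the key ID by truncating a SHA-256 hash of the key
is an assumption made for illustration, as are the KeyTable
class and its method names; the draft's actual key-ID
derivation may differ.

```python
import hashlib


def key_id(key: bytes, bits: int = 96) -> int:
    """Derive a key ID by truncating a hash of the key (assumed)."""
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


class KeyTable:
    """Hypothetical table of the keys a relay or collector knows."""

    def __init__(self, bits: int = 96) -> None:
        self.bits = bits
        self.by_id = {}  # key ID -> (device name, key)

    def add_key(self, name: str, key: bytes) -> int:
        """Register a key, refusing it if its key ID is taken."""
        kid = key_id(key, self.bits)
        if kid in self.by_id:
            owner = self.by_id[kid][0]
            raise ValueError(
                f"key ID {kid:#x} already used by {owner}; "
                "generate a different key"
            )
        self.by_id[kid] = (name, key)
        return kid
```

The administrator sees the error while installing the key,
which is the one point where a back-channel to the device
(sneaker-net, if need be) can be assumed.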

...
>But the hash on messages is a bit different than the
>key-id's. You want an astronomically low likelihood of
>finding two messages with the same hash, because the point
>is to keep you from changing the message. However, with
>keys, if you can detect when you generate the key that the
>key-id matches an already existing key, you just generate a
>new one.

This is a good point.  I've been constraining my design by
assuming that there's essentially no back-channel from the
relay or collector back to the device.  And that's true for
general operations, but not for key management, where we can
assume the use of sneaker-net as required.

...

>Darren New / Senior MTS & Free Radical / Invisible Worlds Inc.
>San Diego, CA, USA (PST).  Cryptokeys on demand.
>There is no "P" on the end of "Winnie the Pooh."

- --John Kelsey, [EMAIL PROTECTED]

-----BEGIN PGP SIGNATURE-----
Version: PGPfreeware 6.5.1 Int. for non-commercial use
<http://www.pgpinternational.com>
Comment: foo

iQCVAwUBOi6RvSZv+/Ry/LrBAQEtYgQArwBj9rN53BJ4zWsM1OOidzddYOJ4A4Qe
SZyGauLcb6s66kgqLaBH6auOnrki0oDl7m2V8Eh3/5IJITkj2tVQVyQ+q9MWRPbj
2mZh/A/tWltVOuK7JKOI25qaM5BIt97OstPX8O8i+Xe4EGMeN7rVvR0Ta/Yq8EVt
QcHvdoPoCJc=
=F5KE
-----END PGP SIGNATURE-----