FWIW, some input, and my opinion.
On Wed Jan 9 23:14:42 2008, Peter Saint-Andre wrote:
> ISSUE #1: Do we need a new namespace?
> Description: We have changed things around so radically since
> version 1.3 [2] of the spec that maybe we need a new namespace (as
> we did for the Entity Time protocol).
> Discussion: Yes we could do this, but then we'd have two separate
> entity capabilities notations in every presence notification that
> every user sent over the network, thus violating one of the
> requirements of XEP-0115 ("minimize network impact"). Therefore we
> have bent over backwards to not define a new namespace. The result
> is not the prettiest protocol in the world, but it doesn't break
> anything.
> My conclusion: I am opposed to defining a new namespace.
Equally, with the current design - and I agree it's ugly, and may
offend purists - we have the neat trick that it degrades gracefully
to disco in three key cases:
1) If the sender doesn't understand hashes, and therefore doesn't use
them.
2) If the receiver doesn't understand hashes, and therefore ignores
them.
3) If the sender uses a hash that the receiver doesn't understand,
even though the receiver *does* understand hashes in general.
The latter is key to our "hash agility" story, incidentally, as it
allows graceful fallback in the case where we're forced into using
hash agility.
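Roughly, a hash-aware receiver's decision looks something like this Python sketch. The function name, the SUPPORTED_HASHES set and the cache keyed on (hash, ver) are my own hypothetical illustrations, not anything from the spec. Case 2 doesn't appear because a receiver that ignores hashes never runs this logic at all.

```python
from typing import Optional

# Hypothetical: the hash names (IANA textual names) this receiver can verify.
SUPPORTED_HASHES = {"sha-1", "sha-256"}


def caps_strategy(hash_name: Optional[str], ver: str, cache: dict) -> str:
    """Decide how a hash-aware receiver learns a sender's features.

    Returns "cache" for a previously verified entry, "query-and-verify"
    when we can check the hash ourselves, or "disco-fallback" when we
    can only do an ordinary, unverified disco#info query.
    """
    if hash_name is None:
        # Case 1: the sender doesn't use hashes at all.
        return "disco-fallback"
    if hash_name not in SUPPORTED_HASHES:
        # Case 3: we understand hashes, just not this one.
        return "disco-fallback"
    if (hash_name, ver) in cache:
        return "cache"
    return "query-and-verify"


print(caps_strategy("md2", "abc=", {}))  # disco-fallback
```

Either way, the worst outcome is an ordinary disco#info round trip, which is exactly the graceful degradation described above.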
> ISSUE #2: Should the 'v' attribute be REQUIRED?
> Description: The 'ver' attribute was REQUIRED in version 1.3 [2] of
> the spec. In a late change made to version 1.4 [3] of the spec
> during the Council meeting at which version 1.4 was approved, we
> suggested that the value of the 'node' should be
> "ProductURL#ProductVersion" (e.g., "http://psi-im.org/#0.11") but
> we agreed that this would *not* be REQUIRED or even officially
> RECOMMENDED. In the proposed version 1.5 [4] of the spec, we added
> a new attribute 'v' to encapsulate the software version, but it is
> only RECOMMENDED, *not* REQUIRED.
> Discussion: Some people on the list objected strenuously to the
> late change made to version 1.4 [3] which suggested that the 'node'
> attribute should encapsulate the ProductVersion. Therefore the list
> consensus was that the 'node' attribute should be the ProductURL
> not including the ProductVersion, and that we would define a new
> attribute 'v' to communicate the ProductVersion; however, the list
> consensus was that this attribute would *not* be REQUIRED but
> instead only RECOMMENDED (some people argued for making it OPTIONAL
> or removing it altogether, but we settled on RECOMMENDED).
> My conclusion: Leave version 1.5 [4] as it is now, with 'v'
> RECOMMENDED but *not* REQUIRED. (In fact I would not object to
> making it OPTIONAL, but RECOMMENDED seems closest to the prior list
> consensus.)
Conflicting arguments here. As a not-really-client developer (I do
have a client, but even I don't use it), I hold no strong opinion.
1) The old spec did have a version, held in the 'ver' attribute, so the
new version is to this extent a regression.
2) Exposing your client software version is a potential security
issue.
If I had to state an opinion, I'd say that if you wanted to hide your
software version in "Classic" XEP-0115, it was pretty easy to
obfuscate the ver attribute, whereas making v optional (whether
OPTIONAL or RECOMMENDED) does at least make this choice explicit.
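For what it's worth, on the wire the difference is just one optional attribute on the <c/> element. A rough Python sketch using xml.etree.ElementTree; the helper name and the placeholder 'ver' value are hypothetical, and the node URL is the Psi example quoted above, minus the version:

```python
import xml.etree.ElementTree as ET
from typing import Optional

CAPS_NS = "http://jabber.org/protocol/caps"


def caps_element(node: str, ver: str, hash_name: str,
                 v: Optional[str] = None) -> str:
    """Serialize a caps <c/> element; 'v' appears only if the sender opts in."""
    attrs = {"xmlns": CAPS_NS, "hash": hash_name, "node": node, "ver": ver}
    if v is not None:
        # Exposing the software version is an explicit choice by the sender.
        attrs["v"] = v
    return ET.tostring(ET.Element("c", attrs), encoding="unicode")


print(caps_element("http://psi-im.org/", "HYPOTHETICAL-VER=", "sha-1"))
print(caps_element("http://psi-im.org/", "HYPOTHETICAL-VER=", "sha-1", v="0.11"))
```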
> ISSUE #3: Which hashing algorithms?
> Description: The Council discussion seemed to assume that version
> 1.5 [4] says SHA-1 is mandatory-to-implement ("MTI"). In fact,
> version 1.5 does not mandate implementation of any specific
> algorithm. Be that as it may, some Council members suggested that
> we recommend MD5 instead of SHA-1 (the only concrete reason I heard
> in the meeting is that MD5 output is smaller).
(Kind of. One issue is that MD5 might actually be more secure, at least
against the preimage attacks that matter here.)
> Discussion: As far as I can see, we had consensus not to mandate
> any particular hashing algorithm, but instead to allow any
> algorithm that is registered with the IANA [5]. Currently the
> registered algorithms are md2, md5, sha-1, sha-224, sha-256,
> sha-384, and sha-512. However, we seemed to have list consensus
> that most people would use SHA-1 at the beginning (SHA-1 is the
> default value of the 'hash' algorithm in the currently-approved
> version 1.4 [3] of the spec), and perhaps switch to SHA-256 in the
> future if it is shown that pre-image attacks (see RFC 4270) are
> likely against SHA-1. That said, people *could* implement MD5 if
> they want to because it is registered with the IANA.
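On the "MD5 output is smaller" point, the saving is easy to quantify. A small Python sketch mapping the IANA textual names above to hashlib names, assuming the 'ver' value carries the Base64-encoded raw digest; md2 is left out because hashlib doesn't normally provide it:

```python
import base64
import hashlib

# IANA textual hash names -> Python hashlib names (md2 omitted).
IANA_TO_HASHLIB = {
    "md5": "md5",
    "sha-1": "sha1",
    "sha-224": "sha224",
    "sha-256": "sha256",
    "sha-384": "sha384",
    "sha-512": "sha512",
}


def ver_length(iana_name: str) -> tuple:
    """Return (raw digest bytes, Base64 characters) for the named algorithm."""
    digest = hashlib.new(IANA_TO_HASHLIB[iana_name], b"").digest()
    return len(digest), len(base64.b64encode(digest))


for name in IANA_TO_HASHLIB:
    print(name, ver_length(name))
```

So MD5 saves 4 characters per presence stanza over SHA-1 (24 versus 28), and 20 over SHA-256 (24 versus 44).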
Note that RFC4270 was a fairly extensive survey by an experienced
IETF security chap (Paul Hoffman runs the VPN Consortium) and by Bruce
Schneier, whose name ought to be familiar to anyone interested in
crypto and security.
Note also that whilst it describes some progress in finding preimage
weaknesses in SHA-1, it mentions none for either SHA-2 (that's
SHA-256, SHA-512, etc.) or MD5. MD5 has had a lot of cryptanalysis -
you'll note that more researchers are producing papers on it than on
any other hash algorithm, and this isn't entirely down to its relative
strength compared to SHA-*; it's more down to the fact that MD5 has
considerably larger deployment, and so is a more attractive hash to
analyse.
The fact that after this length of time, nobody appears to have found
a preimage attack on it is pretty gratifying. MD5 *is* demonstrably
weak in two areas:
1) Challenge-Response password hashing, for example in CRAM-MD5. Not
because of a mathematical weakness, but because MD5 is fast enough
that you can brute-force the entire, fairly limited, space of possible
passwords. This doesn't affect us for the twin reasons that:
a) Our space is much bigger.
b) The space we have is quite rigid in format.
2) Collisions, and by extension signature algorithms. This is where
you come up with two inputs that produce an identical output. This is
useful if:
a) You get to choose both inputs. (Our poisoner cannot.)
b) There is scope for adding random junk somewhere. (Likewise.)
In theory, you can find a collision without random junk, but it would
take considerably longer. Also important to note is that none of this
has any bearing on how likely we are to hit inadvertent collisions
with MD5. In theory, the shorter hash length (128 bits, against
SHA-1's 160) will have an impact, simply by the birthday "paradox",
but such collisions are still pretty rare.
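To make the first of those two weaknesses concrete, a toy Python sketch: given an eavesdropped CRAM-MD5 style challenge and response (the hex HMAC-MD5 of the challenge, keyed by the password), the attacker just tries every candidate. The challenge string and the 4-digit numeric password are made up; the point is that the cost grows with the size of the password space, not with any flaw in MD5's maths.

```python
import hashlib
import hmac
from itertools import product
from typing import Optional


def cram_md5_digest(password: str, challenge: str) -> str:
    """Hex HMAC-MD5 of the challenge, keyed by the password."""
    return hmac.new(password.encode(), challenge.encode(), hashlib.md5).hexdigest()


def brute_force(challenge: str, observed: str,
                alphabet: str, length: int) -> Optional[str]:
    """Try every password of `length` characters drawn from `alphabet`."""
    for chars in product(alphabet, repeat=length):
        guess = "".join(chars)
        if hmac.compare_digest(cram_md5_digest(guess, challenge), observed):
            return guess
    return None


# Made-up challenge and password; the attacker sees only these two strings.
challenge = "<12345.67890@server.example>"
observed = cram_md5_digest("4711", challenge)
print(brute_force(challenge, observed, "0123456789", 4))  # 4711
```

A caps hash input, by contrast, is a long, rigidly formatted list of identities and features, so there is no small space to sweep.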
But it's not weak against preimage attacks - those where the attacker
knows the hash (a preimage attack) or the input (a second-preimage
attack), and wishes to construct an alternate input of their choosing
which matches.
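The gap in cost between the two kinds of attack shows up even at toy scale. This Python sketch truncates MD5 to 16 bits (purely an assumption so it runs in a moment; it says nothing about full MD5) and compares a birthday-style collision search, where the attacker picks both inputs, with a second-preimage search against an input someone else fixed. Generically you'd expect roughly 2^8 versus 2^16 attempts here, which at 128 bits become 2^64 and 2^128.

```python
import hashlib
from itertools import count


def tiny_hash(data: bytes, bits: int = 16) -> int:
    """Top `bits` bits of MD5: a toy stand-in so the searches finish quickly."""
    full = int.from_bytes(hashlib.md5(data).digest(), "big")
    return full >> (128 - bits)


def find_collision(bits: int = 16):
    """Birthday search: the attacker chooses BOTH inputs."""
    seen = {}
    for i in count():
        msg = b"msg-%d" % i
        h = tiny_hash(msg, bits)
        if h in seen:
            return seen[h], msg, i + 1  # two colliding inputs, attempts used
        seen[h] = msg


def find_second_preimage(target: bytes, bits: int = 16):
    """Second-preimage search: someone else fixed the target input."""
    goal = tiny_hash(target, bits)
    for i in count():
        msg = b"forgery-%d" % i
        if msg != target and tiny_hash(msg, bits) == goal:
            return msg, i + 1  # forged input, attempts used


a, b, collision_tries = find_collision()
forged, preimage_tries = find_second_preimage(b"client=example")
# Expect on the order of 2**8 tries for the collision, 2**16 for the preimage.
print(collision_tries, preimage_tries)
```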
In order to perform caps poisoning with MD5, therefore, the attacker
must:
i) Subvert the development process of the client.
ii) Optionally, to cover his tracks, subvert the XSF, giving the
attacker some control over what counts as legitimate input and so
reducing, to a degree, the random junk problem.
You'll note that Kevin Smith is in a position to do both, but no
other person or entity is, throughout the entire world.
And anyone in either position is capable of inflicting significantly
more damage by mounting some easier attack - if the developer of your
client turns out to be an Evil Genius, you're henceforth Doomed.
Similarly, if Council members wish to subtly undermine your security,
we are in a position to do that.
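The poisoner needs a collision or preimage at all only because a receiver can recompute the hash before trusting a disco#info result. A rough Python sketch of that check; the verification-string layout (sorted identities as category/type/lang/name, then sorted features, each followed by '<') is my assumption of how the proposal builds it, and the helper names, example identity and cache shape are hypothetical:

```python
import base64
import hashlib
from typing import List, Tuple

# Hypothetical identity shape: (category, type, lang, name).
Identity = Tuple[str, str, str, str]


def verification_string(identities: List[Identity], features: List[str]) -> str:
    """Sorted identities, then sorted features, each followed by '<'."""
    s = "".join("/".join(ident) + "<" for ident in sorted(identities))
    return s + "".join(feature + "<" for feature in sorted(features))


def compute_ver(identities, features, hash_name: str = "sha1") -> str:
    """Base64 of the digest of the verification string (hashlib names)."""
    data = verification_string(identities, features).encode("utf-8")
    return base64.b64encode(hashlib.new(hash_name, data).digest()).decode("ascii")


def accept_into_cache(cache: dict, ver: str, identities, features,
                      hash_name: str = "sha1") -> bool:
    """Cache a disco#info result only if it really hashes to the advertised ver."""
    identities, features = list(identities), list(features)
    if compute_ver(identities, features, hash_name) != ver:
        return False  # mismatch: possible poisoning, so don't cache it
    cache[(hash_name, ver)] = (identities, features)
    return True


ids = [("client", "pc", "", "ExampleClient")]
feats = ["http://jabber.org/protocol/caps",
         "http://jabber.org/protocol/disco#info"]
cache = {}
ver = compute_ver(ids, feats)
print(accept_into_cache(cache, ver, ids, feats))                          # True
print(accept_into_cache(cache, ver, ids, feats + ["urn:example:extra"]))  # False
```

To get a bogus entry past this check, the attacker has to produce a different feature list with the same hash, which is exactly the collision or second-preimage problem discussed above.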
> 3a. Do we specify an MTI algorithm or let the market decide?
I think we need an MTI. I have to admit I'd read the current text as
essentially stating that SHA-1 was the MTI.
> 3b. If we specify an MTI algorithm, do we specify MD5 or SHA-1 or
> something else?
What concerns me is not that SHA-1 is a particularly poor choice, but
that we may have reached that choice by applying faulty logic. SHA-1
does appear to have *some* weakness against preimage attacks. I don't know if,
given the similarity between SHA-1 and SHA-2, this also applies
there, but I cannot find any mention of preimage weakness in MD5.
I'll drop my objection if people want - in fact, I'll drop it if the
other two issues are resolved, but I would like people to take the
opportunity to satisfy themselves that they've made the right choice
in the face of the evidence.
So, some reading:
1) RFC4270 is an excellent backgrounder on the different attacks on
hashes, and how these affect real-world protocols.
2) Wikipædia is helpful, too:
http://en.wikipedia.org/wiki/Birthday_attack demonstrates that we
need around 2.2 x 10^19 distinct inputs for MD5 before the chance of
an inadvertent collision exceeds 50%, assuming that these are randomly
spread. (They aren't, so this is in effect a worst case.)
http://en.wikipedia.org/wiki/Preimage_attack,
http://en.wikipedia.org/wiki/Cryptographic_hash_function, both give
detailed background.
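The 2.2 x 10^19 figure falls straight out of the usual birthday approximation, n ~ sqrt(2 x 2^b x ln(1/(1-p))) for a b-bit hash and collision probability p. A small Python sketch comparing MD5's 128 bits with SHA-1's 160:

```python
import math


def birthday_bound(bits: int, p: float = 0.5) -> float:
    """Approximate number of random inputs before a collision has probability p."""
    return math.sqrt(2 * 2.0 ** bits * math.log(1 / (1 - p)))


for name, bits in [("md5", 128), ("sha-1", 160)]:
    print(f"{name}: {birthday_bound(bits):.2g}")
# md5: 2.2e+19
# sha-1: 1.4e+24
```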
Finally, of course, feel free to bug me by XMPP or email. :-)
Dave.
--
Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED]
- acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
- http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade