FWIW, some input, and my opinion.

On Wed Jan  9 23:14:42 2008, Peter Saint-Andre wrote:
ISSUE #1: Do we need a new namespace?

Description: We have changed things around so radically since version 1.3 [2] of the spec that maybe we need a new namespace (as we did for the Entity Time protocol).

Discussion: Yes we could do this, but then we'd have two separate entity capabilities notations in every presence notification that every user sent over the network, thus violating one of the requirements of XEP-0115 ("minimize network impact"). Therefore we have bent over backwards to not define a new namespace. The result is not the prettiest protocol in the world, but it doesn't break anything.

My conclusion: I am opposed to defining a new namespace.


Equally, with the current design - and I agree it's ugly, and may offend purists - we have the neat trick that it degrades gracefully to disco in three key cases:

1) If the sender doesn't understand hashes, and therefore doesn't use them.

2) If the receiver doesn't understand hashes, and therefore ignores them.

3) If the sender uses a hash that the receiver doesn't understand, even though the receiver *does* understand hashes in general.

That latter is key to our "hash agility" story, incidentally, as it allows graceful fallback in the case where we're forced into using hash agility.
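
To make the three fallback cases concrete, here is a rough sketch of the decision a receiver might make. It is illustrative only: the function name, the `SUPPORTED` table, and the assumption that a hash-unaware sender simply omits the 'hash' attribute are mine, not anything the spec defines.

```python
import hashlib

# Hypothetical table of hash names this receiver can verify,
# keyed by the IANA-style names used in the discussion.
SUPPORTED = {
    "md5": hashlib.md5,
    "sha-1": hashlib.sha1,
    "sha-256": hashlib.sha256,
}


def lookup_strategy(caps_attrs, receiver_understands_hashes=True,
                    supported=SUPPORTED):
    """Return "verify-hash" or "disco" for one caps annotation.

    caps_attrs is a plain dict of the attributes seen in presence.
    """
    hash_name = caps_attrs.get("hash")
    if hash_name is None:
        # Case 1: the sender doesn't use hashes (assumed: attribute absent).
        return "disco"
    if not receiver_understands_hashes:
        # Case 2: the receiver ignores hashes altogether.
        return "disco"
    if hash_name.lower() not in supported:
        # Case 3: the receiver knows hashes, just not this one.
        return "disco"
    return "verify-hash"


if __name__ == "__main__":
    print(lookup_strategy({"node": "http://psi-im.org/"}))
    print(lookup_strategy({"hash": "sha-1"}))
```

In all three fallback branches the receiver ends up doing an ordinary disco query, which is the whole point: nothing breaks, it just costs a round trip.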


ISSUE #2: Should the 'v' attribute be REQUIRED?

Description: The 'ver' attribute was REQUIRED in version 1.3 [2] of the spec. In a late change made to version 1.4 [3] of the spec during the Council meeting at which version 1.4 was approved, we suggested that the value of the 'node' should be "ProductURL#ProductVersion" (e.g., "http://psi-im.org/#0.11") but we agreed that this would *not* be REQUIRED or even officially RECOMMENDED. In the proposed version 1.5 [4] of the spec, we added a new attribute 'v' to encapsulate the software version, but it is only RECOMMENDED, *not* REQUIRED.

Discussion: Some people on the list objected strenuously to the late change made to version 1.4 [3] which suggested that the 'node' attribute should encapsulate the ProductVersion. Therefore the list consensus was that the 'node' attribute should be the ProductURL not including the ProductVersion, and that we would define a new attribute 'v' to communicate the ProductVersion; however, the list consensus was that this attribute would *not* be REQUIRED but instead only RECOMMENDED (some people argued for making it OPTIONAL or removing it altogether, but we settled on RECOMMENDED).

My conclusion: Leave version 1.5 [4] as it is now, with 'v' RECOMMENDED but *not* REQUIRED. (In fact I would not object to making it OPTIONAL, but RECOMMENDED seems closest to the prior list consensus.)


Conflicting arguments here. As a not-really-client developer (I do have a client, but even I don't use it), I hold no strong opinion.

1) The old spec did have a version, held in ver, so the new version is to this extent a regression.

2) Exposing your client software version is a potential security issue.

If I had to state an opinion, I'd say that if you wanted to hide your software version in "Classic" XEP-0115, it was pretty easy to obfuscate the ver attribute, whereas making v optional (whether OPTIONAL or RECOMMENDED) does at least make this choice explicit.


ISSUE #3: Which hashing algorithms?

Description: The Council discussion seemed to assume that version 1.5 [4] says SHA-1 is mandatory-to-implement ("MTI"). In fact, version 1.5 does not mandate implementation of any specific algorithm. Be that as it may, some Council members suggested that we recommend MD5 instead of SHA-1 (the only concrete reason I heard in the meeting is that MD5 output is smaller).


(Kind of. One issue is that MD5 might actually be more secure.)


Discussion: As far as I can see, we had consensus not to mandate any particular hashing algorithm, but instead to allow any algorithm that is registered with the IANA [5]. Currently the registered algorithms are md2, md5, sha-1, sha-224, sha-256, sha-384, and sha-512. However, we seemed to have list consensus that most people would use SHA-1 at the beginning (SHA-1 is the default value of the 'hash' algorithm in the currently-approved version 1.4 [3] of the spec), and perhaps switch to SHA-256 in the future if it is shown that pre-image attacks (see RFC 4270) are likely against SHA-1. That said, people *could* implement MD5 if they want to because it is registered with the IANA.
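
For a sense of how much smaller MD5 output actually is, a quick sketch comparing digest lengths for the registered algorithms that Python's hashlib ships with (md2 is left out because hashlib does not guarantee it). The base64 column assumes the hash travels base64-encoded on the wire, which is my assumption for illustration; the input string is arbitrary.

```python
import base64
import hashlib

# IANA-style name -> hashlib name (md2 omitted: not guaranteed by hashlib).
ALGORITHMS = {
    "md5": "md5",
    "sha-1": "sha1",
    "sha-224": "sha224",
    "sha-256": "sha256",
    "sha-384": "sha384",
    "sha-512": "sha512",
}


def digest_sizes(sample=b"example input"):
    """Map each algorithm to (raw digest bytes, base64 characters)."""
    sizes = {}
    for iana_name, hashlib_name in ALGORITHMS.items():
        digest = hashlib.new(hashlib_name, sample).digest()
        sizes[iana_name] = (len(digest), len(base64.b64encode(digest)))
    return sizes


if __name__ == "__main__":
    for name, (raw, encoded) in digest_sizes().items():
        print(f"{name:8s} {raw:3d} bytes raw, {encoded:3d} chars base64")
```

So the saving from MD5 over SHA-1 is 4 bytes raw, or 4 characters once encoded, per presence notification.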


Note that RFC4270 was a fairly extensive survey by an experienced IETF security chap - Paul Hoffman runs the VPN Consortium - and Bruce Schneier's name ought to be familiar to people interested in crypto and security.

Note also that whilst it describes some progress made in preimage weaknesses in SHA-1, none are mentioned for either SHA-2 (that's SHA-256, SHA-512, etc.) or MD5. MD5 has had a lot of cryptanalysis - you'll note that more researchers are producing papers on it than on any other hash algorithm, and this isn't entirely down to its strength relative to SHA-* - it's more down to the fact that MD5 has considerably larger deployment, and so is a more attractive hash to analyse.

The fact that after this length of time, nobody appears to have found a preimage attack on it is pretty gratifying. MD5 *is* demonstrably weak in two areas:

1) Challenge-Response password hashing, for example in CRAM-MD5. Not because of a mathematical weakness, but because you can brute force things too fast, across the entire, fairly limited, space of a password. This doesn't affect us for the twin reasons that:

a) Our space is much bigger.
b) The space we have is quite rigid in format.

2) Collisions, and from there signature algorithms. This is where you come up with two inputs that produce an identical output. This is useful if:

a) You get to choose both inputs. (Our poisoner cannot).
b) There is scope for adding random junk somewhere. (Likewise).

In theory, you can do a collision without random junk, but it would take considerably longer. Also important to note is that this has no impact on whether we're more likely to find inadvertent collisions with MD5. In theory, the shorter hash length will have an impact, simply by the birthday "paradox", but it's still pretty rare.

But it's not weak in preimage attacks - those where the attacker knows the hash, and/or the input, and wishes to construct an alternate input of their choosing which matches.

In order to perform caps poisoning with MD5, therefore, the attacker must:

i) Subvert the development process of the client.
ii) Optionally, to cover his tracks, subvert the XSF, thus allowing the attacker to have some control over what counts as legitimate input, thus reducing, to a degree, the random junk problem.

You'll note that Kevin Smith is in a position to do both, but no other person or entity is, throughout the entire world.

And anyone in either position is capable of inflicting significantly greater damage by choosing some easier attack - if the developer of your client turns out to be an Evil Genius, you're henceforth Doomed. Similarly, if Council members wish to subtly undermine your security, we are in a position to do that.

3a. Do we specify an MTI algorithm or let the market decide?


I think we need an MTI; I have to admit I'd read the current text as essentially stating that SHA-1 was the MTI.


3b. If we specify an MTI algorithm, do we specify MD5 or SHA-1 or something else?

What concerns me is not that SHA-1 is a particularly poor choice, but that we may have reached that choice by applying faulty logic. SHA-1 does appear to have *some* weakness in preimage. I don't know if, given the similarity between SHA-1 and SHA-2, this also applies there, but I cannot find any mention of preimage weakness in MD5.

I'll drop my objection if people want - in fact, I'll drop it if the other two issues are resolved, but I would like people to take the opportunity to satisfy themselves that they've made the right choice in the face of the evidence.

So, some reading:

1) RFC4270 is an excellent backgrounder on the different attacks on hashes, and how these affect real-world protocols.

2) Wikipædia is helpful, too:

http://en.wikipedia.org/wiki/Birthday_attack demonstrates that we need around 2.2 x 10^19 possible inputs for MD5 before an inadvertent collision is more likely than 50%, assuming that these are randomly spread. (They aren't, so this is in effect a worst case).

http://en.wikipedia.org/wiki/Preimage_attack, http://en.wikipedia.org/wiki/Cryptographic_hash_function, both give detailed background.
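
The 2.2 x 10^19 figure can be reproduced with the usual birthday approximation, n ~ sqrt(2 * N * ln(1 / (1 - p))) for a space of N possible outputs. A small sketch, assuming uniformly distributed outputs; the function name is just for illustration.

```python
import math


def birthday_threshold(bits, p=0.5):
    """Approximate number of uniformly random inputs needed before the
    chance of at least one collision reaches p, for a `bits`-bit hash."""
    space = 2 ** bits
    return math.sqrt(2 * space * math.log(1 / (1 - p)))


md5_inputs = birthday_threshold(128)   # MD5: 128-bit output
sha1_inputs = birthday_threshold(160)  # SHA-1: 160-bit output

if __name__ == "__main__":
    print(f"MD5:   {md5_inputs:.2e} inputs for a 50% collision chance")
    print(f"SHA-1: {sha1_inputs:.2e} inputs for a 50% collision chance")
```

Either number is far beyond the count of distinct client capability sets we are ever likely to see, which is why the shorter MD5 output only matters in theory for inadvertent collisions.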

Finally, of course, feel free to bug me by XMPP or email. :-)

Dave.
--
Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED]
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade
