Hello List!
During the discussion on the different ID types at the summit I had an idea for a possible solution to the problem, but not a sufficient understanding of the problem to even discuss it. I tried to find somebody to discuss it with in chat afterwards, but nobody was available and I forgot about it. To get it off my ToDo list, here is my current understanding. I hope it can be a basis for further discussion.

A) Status-Quo:

Currently there are
A1. stanza-ID: generated by server
A2. origin-ID: generated by client
(both from https://xmpp.org/extensions/xep-0359.html) and
A3. message-ID: this is the ID-attribute on the stanza from https://tools.ietf.org/html/rfc6120#section-8.1.3

There are also (4.) SM-IDs in stream management but those are per-stream and unrelated.

B) Use-cases:

B1. MAM https://xmpp.org/extensions/xep-0313.html uses stanza-ID.

B2. MUCs require IDs to detect reflections of own messages. And reflection is great because it gives everybody the same view on the MUC in the presence of things like autopastebin or other rewrites.

B3. Error responses have the same ID-attribute as the original stanza.

C) Problems with current situation:

C1. People dislike having so many different IDs. This is not a problem per se, but it does mean implementation complexity and confusion.

C2. According to Daniel it is not clear which ID should be used when referencing things. In other words, if he gets a delivery receipt for an ID, the client might have based that on the origin-ID or the message-ID. I'm not sure if this should be considered relevant. People can always write broken clients which send back crap. Of course, if it happens unintentionally because of (C1.), fewer IDs would help.

C3. Using origin-ID to detect MUC reflection doesn't always work because MUCs may not reflect it. That's of course unfortunate but should IMHO be considered an error in the MUC implementation (probably a transport) and fixed there.
I understand that it might be difficult in some cases ( https://lab.louiz.org/louiz/biboumi/issues/3283 ) but as Daniel already pointed out yesterday, it is much easier to fix a transport, since it knows which protocol it is talking to, instead of working around it at the end. In any case the current situation seems to be bad: https://wiki.xmpp.org/web/XEP-Remarks/XEP-0045:_Multi-User_Chat#Matching_Your_Reflected_Message

C4. Clients require a bounce of their messages to learn the stanza-id which is used for MAM. Why do they need to know? Maybe they want to reference their own message. Do they require this bounce anyway to make sure that there was no rewriting?

C5. Some MUCs rewrite the message-id. Why is this allowed? It is even suggested here: https://xmpp.org/extensions/xep-0045.html#message

C6. A global ID to reference messages might be nice.

C7. When referencing a message, for example by "liking" it, a forgeable ID could get you to like things you didn't intend to like. This is a difficult problem because in many cases it requires malicious clients and servers, and those have a lot of power anyway.

D) Possible root cause:

People do not trust the message IDs assigned by others and therefore want to assign their own.

E) Suggested solutions, including partial solutions:

E1. message-ID and origin-ID should always be the same, as proposed by Georg in https://mail.jabber.org/pipermail/standards/2017-September/033415.html
Some concerns were voiced in that thread; the only valid one is that, due to bad software, we need to deal with the situation that they are different anyway. There was a privacy concern about the "by=" attribute, but origin-ID does not actually have that. According to Daniel and Georg, things currently break down anyway if this does not hold.

E2. Make the ID verifiable: This is what I had in mind at the summit, and after some discussion yesterday Jonas and Dave basically immediately came up with the same thing, so it might be reasonably straightforward.
Basically, the client calculates the ID based on some information that it shares with the server, like HASH(stream-id || sm-counter). This would allow the server to verify that the client generated a proper ID. Jonas suggested HMAC(key=stream-id, msg=sm-counter). If the message is in a MUC, the MUC server can provide the user with some salt, and then a HASH(message-counter || salt) could be used to ensure that proper unique IDs are generated. This scheme relies on there being a party which is in charge of checking the IDs. If you connect to a malicious MUC with malicious clients they can still send you whatever. I don't think that is a problem, is it?

E3. Simply make the ID: FROM-TIMESTAMP. Here FROM needs to be the eventual FROM after possible rewriting. Can that be done? And TIMESTAMP has to be strictly increasing, so it should have sub-second resolution. I assume this is impossible because otherwise it would be too easy. But why is it impossible? :)

F) Left-overs:

F1. Would it be useful to have monotonically increasing IDs? It seems these might be useful, if not necessary, to query the MAM or some other archive for certain periods? I'm not sure.

F2. Discussions about malicious forgery of responses when IDs are predictable ended with the assumption that this is impossible because the receiver needs to be properly verified anyway.

F3. Zash wants to use timestamps in the MAM-ID. Why? Because of (F1.)?

F4. Related to (F1.): Would good IDs, possibly monotonically increasing ones, simplify the problems that MAM and SM are solving?

I would be very happy if people would comment! :)

Regards,
Simon
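
P.S. To make (E2.) a bit more concrete, here is a rough Python sketch of the two constructions mentioned there: HMAC(key=stream-id, msg=sm-counter) and HASH(message-counter || salt). Nothing here is specified anywhere. The function names, the choice of SHA-256, the hex encoding, and serialising the counter as an 8-byte big-endian integer are all my assumptions, just to have something to discuss.

```python
import hashlib
import hmac


def make_id(stream_id: str, sm_counter: int) -> str:
    """Client side: HMAC(key=stream-id, msg=sm-counter), hex-encoded."""
    # Assumption: the counter is serialised as an 8-byte big-endian integer.
    msg = sm_counter.to_bytes(8, "big")
    return hmac.new(stream_id.encode("utf-8"), msg, hashlib.sha256).hexdigest()


def verify_id(stream_id: str, sm_counter: int, claimed_id: str) -> bool:
    """Server side: recompute the ID from the shared state and compare."""
    expected = make_id(stream_id, sm_counter)
    # Compare as bytes so a non-ASCII claimed ID is rejected, not an error.
    return hmac.compare_digest(expected.encode("utf-8"), claimed_id.encode("utf-8"))


def make_muc_id(message_counter: int, salt: bytes) -> str:
    """MUC variant: HASH(message-counter || salt), salt provided by the MUC."""
    # The counter has a fixed width, so the concatenation is unambiguous.
    return hashlib.sha256(message_counter.to_bytes(8, "big") + salt).hexdigest()


if __name__ == "__main__":
    stream = "hypothetical-stream-id"
    msg_id = make_id(stream, 7)
    print(verify_id(stream, 7, msg_id))  # True: counter matches
    print(verify_id(stream, 8, msg_id))  # False: wrong counter
```

The server (or MUC) only needs the state it already shares with the client to recompute the ID, so no extra round trip is needed. As said in (E2.), this only helps as long as the checking party itself is honest.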