[Standards] Message-IDs

Simon Friedberger Tue, 13 Feb 2018 08:59:40 -0800

Hello List!

During the discussion of the different ID types at the summit I had an idea for a possible solution, but not a sufficient understanding of the problem to even discuss it. I tried to find somebody to discuss it with in chat afterwards, but nobody was available and I forgot about it. To get it off my ToDo list, here is my current understanding. I hope it can be a basis for further discussion.

A) Status quo:
Currently there are these IDs:
A1. stanza-ID: generated by the server
A2. origin-ID: generated by the client
(both from https://xmpp.org/extensions/xep-0359.html) and

A3. message-ID: the 'id' attribute on the stanza itself
(from https://tools.ietf.org/html/rfc6120#section-8.1.3)

A4. There are also SM-IDs in stream management, but those are per-stream and unrelated.
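
To make the IDs concrete, here is a minimal sketch of a <message/> carrying A1 to A3, built with Python's xml.etree.ElementTree. The JIDs and the helper names build_message and ids_of are made up for illustration; only the urn:xmpp:sid:0 namespace and the element/attribute names come from XEP-0359 and RFC 6120.

```python
import xml.etree.ElementTree as ET

# Namespace defined by XEP-0359 for <origin-id/> and <stanza-id/>.
SID_NS = "urn:xmpp:sid:0"


def build_message(msg_id, origin_id, stanza_id=None, by="juliet@example.com"):
    """Build a <message/> carrying the IDs from A1-A3 (addresses are made up).

    msg_id    -> A3, the RFC 6120 'id' attribute, chosen by the sender
    origin_id -> A2, <origin-id/>, chosen by the sending client
    stanza_id -> A1, <stanza-id/>, added by the archiving server; 'by'
                 names the entity that assigned it
    """
    msg = ET.Element("message", {"id": msg_id, "to": "juliet@example.com", "type": "chat"})
    ET.SubElement(msg, "body").text = "Hello"
    ET.SubElement(msg, f"{{{SID_NS}}}origin-id", {"id": origin_id})
    if stanza_id is not None:
        ET.SubElement(msg, f"{{{SID_NS}}}stanza-id", {"id": stanza_id, "by": by})
    return msg


def ids_of(msg):
    """Return (message-ID, origin-ID, stanza-ID), using None for absent ones."""
    origin = msg.find(f"{{{SID_NS}}}origin-id")
    stanza = msg.find(f"{{{SID_NS}}}stanza-id")
    return (
        msg.get("id"),
        origin.get("id") if origin is not None else None,
        stanza.get("id") if stanza is not None else None,
    )
```

Nothing in this structure forces msg_id and origin_id to be equal; a client is free to put two different values there.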

B) Use cases:
B1. MAM ( https://xmpp.org/extensions/xep-0313.html ) uses the stanza-ID.
B2. MUCs require IDs so that clients can detect reflections of their own messages.
Reflection is great because it gives everybody the same view of the MUC in the presence of things like autopastebin or other rewrites.
B3. Error responses carry the same ID attribute as the original stanza.
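
A minimal sketch of how a client might recognise the reflection of its own MUC message (B2), assuming it keeps a set of IDs it has sent but not yet seen come back. The function name and the pending set are hypothetical; the sketch tries the origin-ID first and falls back to the stanza's id attribute, since some MUCs drop the origin-ID or rewrite the id.

```python
import xml.etree.ElementTree as ET

SID_NS = "urn:xmpp:sid:0"  # XEP-0359 namespace


def is_own_reflection(reflected, pending):
    """Return the pending ID this reflected MUC message matches, or None.

    `pending` is a hypothetical client-side set of IDs of messages we sent
    to the room and have not yet seen reflected. A match is removed from it.
    Matching on IDs alone trusts the room not to attach our ID to somebody
    else's message.
    """
    candidates = []
    origin = reflected.find(f"{{{SID_NS}}}origin-id")
    if origin is not None:
        candidates.append(origin.get("id"))
    # Fallback: the RFC 6120 'id' attribute, unless the MUC rewrote it.
    candidates.append(reflected.get("id"))
    for cand in candidates:
        if cand is not None and cand in pending:
            pending.discard(cand)
            return cand
    return None
```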

C) Problems with the current situation:
C1. People dislike having so many different IDs.
This is not a problem per se, but it does mean implementation complexity and confusion.
C2. According to Daniel, it is not clear which ID should be used when referencing things. In other words, if he gets a delivery receipt for an ID, the client might have based it on the origin-ID or on the message-ID.
I'm not sure whether this should be considered relevant. People can always write broken clients which send back crap. Of course, if it happens unintentionally because of (C1.), fewer IDs would help.
C3. Using the origin-ID to detect MUC reflections doesn't always work, because MUCs may not reflect it.
That is of course unfortunate, but it should IMHO be considered an error in the MUC implementation (probably a transport) and fixed there. I understand that it might be difficult in some cases ( https://lab.louiz.org/louiz/biboumi/issues/3283 ), but as Daniel already pointed out yesterday, it is much easier to fix a transport, since it knows which protocol it is talking to, than to work around the problem at the receiving end.
In any case, the current situation seems to be bad:
https://wiki.xmpp.org/web/XEP-Remarks/XEP-0045:_Multi-User_Chat#Matching_Your_Reflected_Message
C4. Clients require a bounce of their own messages to learn the stanza-ID, which is used for MAM.
Why do they need to know it? Maybe they want to reference their own message.
Do they require this bounce anyway, to make sure that there was no rewriting?
C5. Some MUCs rewrite the message-ID.
Why is this allowed? It is even suggested here: https://xmpp.org/extensions/xep-0045.html#message
C6. A global ID to reference messages might be nice.
C7. When a message is referenced, for example by "liking" it, a forgeable ID could get you to like things you didn't intend to like.
This is a difficult problem, because in many cases exploiting it requires malicious clients and servers, and those have a lot of power anyway.
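
Regarding C4, a sketch of how a client could pick the room archive's stanza-ID out of the bounced message. It assumes the client only accepts a <stanza-id/> whose 'by' attribute is the room's own bare JID, since one with any other 'by' could have been added by somebody else (cf. C7). The function name and JIDs are illustrative.

```python
import xml.etree.ElementTree as ET

SID_NS = "urn:xmpp:sid:0"  # XEP-0359 namespace


def archive_id_from_reflection(reflected, room_jid):
    """Return the room archive's stanza-ID from a reflected message, or None.

    Only direct <stanza-id/> children whose 'by' equals room_jid are
    accepted; all others are ignored as untrusted.
    """
    for sid in reflected.findall(f"{{{SID_NS}}}stanza-id"):
        if sid.get("by") == room_jid:
            return sid.get("id")
    return None
```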

D) Possible root cause:
People do not trust the message IDs assigned by others and therefore want to assign their own.

E) Suggested solutions, including partial solutions:
E1. The message-ID and the origin-ID should always be the same, as proposed by Georg in https://mail.jabber.org/pipermail/standards/2017-September/033415.html
Some concerns were voiced in that thread; the only valid one is that, due to bad software, we need to deal with the situation where they differ anyway.
There was a privacy concern about the "by=" attribute, but the origin-ID does not actually have that.
According to Daniel and Georg, things currently break down anyway if this does not hold.
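
Under E1 a client would simply generate one value and use it in both places. A minimal sketch, assuming a random UUID as the ID format (the proposal does not prescribe one); new_message and ids_agree are hypothetical helpers. A receiver would still need a fallback whenever ids_agree returns False, since existing software may send differing values.

```python
import uuid
import xml.etree.ElementTree as ET

SID_NS = "urn:xmpp:sid:0"  # XEP-0359 namespace


def new_message(to, body):
    """Create a <message/> whose 'id' attribute and origin-ID are identical."""
    shared_id = str(uuid.uuid4())  # ID format is an assumption
    msg = ET.Element("message", {"to": to, "type": "groupchat", "id": shared_id})
    ET.SubElement(msg, "body").text = body
    ET.SubElement(msg, f"{{{SID_NS}}}origin-id", {"id": shared_id})
    return msg


def ids_agree(msg):
    """True if the stanza satisfies E1: origin-ID equals the 'id' attribute."""
    origin = msg.find(f"{{{SID_NS}}}origin-id")
    return origin is not None and origin.get("id") == msg.get("id")
```
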
E2. Make the ID verifiable: This is what I had in mind at the summit, and after some discussion yesterday Jonas and Dave came up with the same thing almost immediately, so it might be reasonably straightforward.
Basically, the client calculates the ID from some information that it shares with the server, like HASH(stream-id || sm-counter). This would allow the server to verify that the client generated a proper ID. Jonas suggested HMAC(key=stream-id, msg=sm-counter). If the message is sent to a MUC, the MUC server can provide the user with a salt, and then HASH(message-counter || salt) could be used to ensure that proper unique IDs are generated.
This scheme relies on there being a party which is in charge of checking the IDs. If you connect to a malicious MUC with malicious clients, they can still send you whatever they want. I don't think that is a problem, is it?
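
A sketch of Jonas's HMAC variant and of the MUC salt variant. SHA-256, the decimal encoding of sm-counter, and the 8-byte big-endian encoding of the message counter are my assumptions, not part of the proposal; the sketch also assumes client and server agree on the counter value for each stanza. The key (the stream-id) is known to the server, so this gives the server a way to verify IDs, not secrecy from it.

```python
import hashlib
import hmac


def client_message_id(stream_id: str, sm_counter: int) -> str:
    """HMAC(key=stream-id, msg=sm-counter), assuming HMAC-SHA-256."""
    mac = hmac.new(stream_id.encode(), str(sm_counter).encode(), hashlib.sha256)
    return mac.hexdigest()


def server_verifies(stream_id: str, expected_counter: int, claimed_id: str) -> bool:
    """The server knows the stream-id and tracks the counter itself, so it
    can recompute the ID and compare it in constant time."""
    expected = client_message_id(stream_id, expected_counter)
    return hmac.compare_digest(expected, claimed_id)


def muc_message_id(message_counter: int, salt: bytes) -> str:
    """MUC variant: HASH(message-counter || salt) with a salt from the MUC.
    SHA-256 and the counter encoding are assumptions."""
    data = message_counter.to_bytes(8, "big") + salt
    return hashlib.sha256(data).hexdigest()
```
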
E3. Simply make the ID FROM-TIMESTAMP.
Here FROM needs to be the eventual FROM after any rewriting. Can that be done?
TIMESTAMP has to be strictly increasing, so it should have sub-second resolution.
I assume this is impossible, because otherwise it would be too easy. But why is it impossible? :)
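
A minimal sketch of the FROM-TIMESTAMP idea, assuming microsecond timestamps and a generator that bumps the value whenever the clock does not advance, so IDs from one generator stay strictly increasing. The class name and the injectable clock are illustrative.

```python
import time


class TimestampIdGenerator:
    """ID = FROM + "-" + TIMESTAMP, with TIMESTAMP in microseconds (assumed).

    If the clock returns a value not greater than the previous one (same
    microsecond, or the clock stepped back), the previous value plus one is
    used instead, keeping this generator's sequence strictly increasing.
    """

    def __init__(self, from_jid, clock=time.time_ns):
        self.from_jid = from_jid
        self.clock = clock  # returns nanoseconds since the epoch
        self.last = -1

    def next_id(self):
        now_us = self.clock() // 1000
        if now_us <= self.last:
            now_us = self.last + 1
        self.last = now_us
        return f"{self.from_jid}-{now_us}"
```

This only guarantees uniqueness per generator instance; two sessions sharing the same eventual FROM (e.g. if a MUC lets two devices use one nick) could still produce the same ID.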

F) Leftovers:
F1. Would it be useful to have monotonically increasing IDs?
It seems these might be useful, if not necessary, for querying MAM or some other archive for certain periods. I'm not sure.
F2. Discussions about malicious forgery of responses when IDs are predictable ended with the assumption that this is impossible, because the receiver needs to be properly verified anyway.
F3. Zash wants to use timestamps in the MAM-ID. Why? Because of (F1.)?
F4. Related to (F1.): Would good IDs, possibly monotonically increasing ones, simplify the problems that MAM and SM are solving?
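
To illustrate F1 and F4, a sketch of why strictly increasing IDs are convenient for an archive: the IDs stay sorted as messages are appended, so a range query, or a "give me everything after the last ID I saw" catch-up, is a binary search. The MonotonicArchive class is hypothetical and uses plain integers as IDs.

```python
from bisect import bisect_left, bisect_right


class MonotonicArchive:
    """Hypothetical archive keyed by strictly increasing integer IDs."""

    def __init__(self):
        self.ids = []
        self.bodies = []

    def append(self, msg_id, body):
        # Monotonicity keeps self.ids sorted without ever re-sorting.
        if self.ids and msg_id <= self.ids[-1]:
            raise ValueError("IDs must be strictly increasing")
        self.ids.append(msg_id)
        self.bodies.append(body)

    def between(self, start, end):
        """Messages with start <= id <= end, located by binary search."""
        lo = bisect_left(self.ids, start)
        hi = bisect_right(self.ids, end)
        return list(zip(self.ids[lo:hi], self.bodies[lo:hi]))

    def after(self, last_seen):
        """Everything newer than an ID the client already has."""
        lo = bisect_right(self.ids, last_seen)
        return list(zip(self.ids[lo:], self.bodies[lo:]))
```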

I would be very happy if people would comment! :)

Regards,
Simon
_______________________________________________
Standards mailing list
Info: https://mail.jabber.org/mailman/listinfo/standards
_______________________________________________
