On Thu Apr  2 01:54:35 2009, Robin Redeker wrote:
If the server does route the wrong information to the wrong people it's buggy.

Well... Possibly. Or it might be lacking sufficient information to route correctly - that part we simply don't see today.


With a reverse roster lookup on a corrupted database anything might happen.

Hmmmm - I'm not talking about a corrupted database, I'm talking about a roster which has missed an unsubscribed or two.

I
don't get why this would justify the horrible amount of redundant information that is sent.

If privacy isn't important to you, go use MSN. ;-)

I want to tell my 10 contacts on jabber.org that I am online. So
sending 1 stanza will be 10 times less parsing, less bandwith, less CPU cost
for compression, than sending 10.


Do you have any figures? I'd be curious to know whether it's really linear. It could be, of course, but given the nature of the operations, I'd have thought it might be a bit less. Certainly the encryption wouldn't scale linearly, of course.


That factor scales with the number of contacts on a roster, which is linear, which is incredibly bad for something that can be accomplished with O(1)
complexity.


Well, that's the issue, you're not accomplishing the same result.

Even if you were, it's important to review what those O() things mean, because they can be a little deceptive.

You're suggesting the proposal is O(1), which is "Amortized Constant Time", or:

C = k

Which isn't really the case.

O(n) is what you say we currently use - that's:

C' = k'.n

I'll show later that your proposed solution is *also* O(n), at least considered as a whole.

Also getting a brief mention later on is O(log(n)), as performed by binary search trees:

C'' = k''.log(n)

Important to note is that C, C', and C'' depend on both quantity and the various constants - red-black trees, while O(log(n)), have quite a high k'', typically, and so are often avoided in favour of simpler linear algorithms which are faster for small values of n.


If the problem is synchronization of subscription states, well, then fix
the problem with synchronization of subscription states.


Or avoid it, which we're doing perfectly well right now.


> sender doesn't want you to, which is substantially worse, and without > any bad actor involved. (I could be wrong here, please check this.)

So we argue to "optimize" the protocol for this case? "Hmm, the other side might get information I didn't want to tell him, well, lets send it
10 times instead of once. I guess that will 'fix' the problem."


I'm certainly arguing that there's insufficient reason to break the protocol to save a few stanzas.


I understand what you mean: Going from 10 to 1 message will indeed
be a privacy problem in case of lost 'unsubscribed' stanzas. But the
solution is to ask why the unsubscribed stanza was lost. XEP-0198 will
address at least parts of this problem.


It will reduce, but not eliminate, a problem which we do not currently suffer as much from as we could.

Oh yay, let's burn some more CPU cycles, energy is sooo cheap. Everytime you send out 10 instead of 1 presence stanzas an additional amount of warmth is added
to the global warming.


You know XEP-0263 is a joke, right?


Compression is a good optimization if you don't have anything else to save anymore and really want to pay for the CPU time. But in this case, wouldn't be preventing the redundancy in the first place be more beneficial in the overall
performance?


Yes. But not at the cost of making the protocol worse. By the way, the "compression costs" argument can be somewhat misleading in some cases - not this particular case - but in general, with TLS on the stream, compression will reduce the cost. Besides, deflate is really very cheap in any case.

Everytime you compress a redundant presence stanza a tree dies. Please
think of the trees.

I really worry that this entire note is actually a excellent example of sarcasm, based on this, but I'll take it seriously as the safer option.

I could probably extract the actual energy involved in processing a presence stanza, but we're talking in terms of nanoseconds of CPU time - this doesn't, obviously, cost a tree, but over time it will add up.

Let's review what happens currently, assuming an initiating contact with n availableremote contacts across d domains, each of which has y resources connected - y being different in each instance obviously:

1) Client sends home server a broadcast presence. Cost E1, O(1).

2) Home server, which (almost certainly) has client's roster in-memory, iterates through and emits one presence stanza per remote subscribed contact known to be available. Cost E2, O(n). (Iterating through a list of known available contacts).

3) Home server encodes and transmits N stanzas, remote server decodes and transmits N stanzas. Cost E3, O(n). (One stanza per contact - arguably this is O(d) to open the stream, and O(y) to send the stanzas - you pick).

4) Remote server receives one or more fully-addressed stanzas, and broadcasts them to all resources. Cost E4, O(y). (Iterating through a list of resources of the recipient).

This gives E, as O(n) + O(y), a linear complexity algorithm.

You want to replace 2-4 with:

2') Home server collates roster by remote domain and emits one presence stanza per remote domain which has a subscribed contacts known to be available. This is, I'll accept, likely to be close to the energy cost of 2, although due to the fluctuating nature of the final clause that collation has to be done each time, so it'll be a little above. Cost E2' is, therefore > E2. O(n) + O(d) - I have a feeling this can be reduced to O(n).

3') encode/transmit/decode 1 stanza per remote domain. This is certainly an energy/cpu/cost saving compared to the 3 above, no argument from me here. Cost E3' is < E3. O(d) (One stanza per domain. Still linear, of course).

4') Lookup sender against all rosters in the system, and detirmine which of the resulting potential recipients is online and available to the sender. It seems reasonable that it would be possible to limit the lookup to only contacts who are online and available - assuming we ignore privacy lists - but you're still adding a lookup and the associating lookup mechanics (like a hash table or something) which must be maintained in-memory. I strongly suspect that this much, much more costly to use, build, and maintain than 4 above, hence E4' >> E4. I believe this to be possible to implement as O(log(y)), but I also suspect it's overwhemlingly more likely that it's O(y) (as derived from a reverse roster lookup O(log(y)) combined with a resource-broadcast lookup O(y)).

This give E' as O(n) + O(d) + O(y), a linear compexity algorithm. Poof goes your argument above about linear complexity versus constant amortized time.

I can't see (E3' - E3) < ((E4' - E4) + (E2' - E2)) as being at all likely, so I continue to maintain that this is a false optimization. (I also continue to maintain that this is an arse-clenchingly more complex solution to the problem of getting presence from one contact to another, but I hope nobody's arguing against me, there).

Now, as always, I encourage you to change my mind with suitably backed factual evidence.

Oh, and I would add that this does, of course, change dramatically if we introduce a non-meshed routing architecture between servers - since that reassociates responsibility anyway, it's not a problem there. (FWIW, I would note that this already occurs between client and home server).

Dave.
--
Dave Cridland - mailto:[email protected] - xmpp:[email protected]
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade

Reply via email to