Re: [Standards] Presence distribution

Dave Cridland Mon, 06 Apr 2009 05:19:54 -0700

On Thu Apr  2 01:54:35 2009, Robin Redeker wrote:

If the server does route the wrong information to the wrong peopleit's buggy.

Well... Possibly. Or it might be lacking sufficient information toroute correctly - that part we simply don't see today.

With a reverse roster lookup on a corrupted database anything mighthappen.

Hmmmm - I'm not talking about a corrupted database, I'm talking abouta roster which has missed an unsubscribed or two.

I
don't get why this would justify the horrible amount of redundantinformationthat is sent.


If privacy isn't important to you, go use MSN. ;-)

I want to tell my 10 contacts on jabber.org that I am online. So
sending 1 stanza will be 10 times less parsing, less bandwith, lessCPU cost
for compression, than sending 10.

Do you have any figures? I'd be curious to know whether it's reallylinear. It could be, of course, but given the nature of theoperations, I'd have thought it might be a bit less. Certainly theencryption wouldn't scale linearly, of course.

That factor scales with the number of contacts on a roster, whichis linear,which is incredibly bad for something that can be accomplished withO(1)
complexity.

Well, that's the issue, you're not accomplishing the same result.

Even if you were, it's important to review what those O() thingsmean, because they can be a little deceptive.

You're suggesting the proposal is O(1), which is "Amortized ConstantTime", or:


C = k

Which isn't really the case.

O(n) is what you say we currently use - that's:

C' = k'.n

I'll show later that your proposed solution is *also* O(n), at leastconsidered as a whole.

Also getting a brief mention later on is O(log(n)), as performed bybinary search trees:


C'' = k''.log(n)

Important to note is that C, C', and C'' depend on both quantity andthe various constants - red-black trees, while O(log(n)), have quitea high k'', typically, and so are often avoided in favour of simplerlinear algorithms which are faster for small values of n.

If the problem is synchronization of subscription states, well,then fix
the problem with synchronization of subscription states.

Or avoid it, which we're doing perfectly well right now.

> sender doesn't want you to, which is substantially worse, andwithout> any bad actor involved. (I could be wrong here, please checkthis.)
So we argue to "optimize" the protocol for this case? "Hmm, theotherside might get information I didn't want to tell him, well, letssend it
10 times instead of once. I guess that will 'fix' the problem."

I'm certainly arguing that there's insufficient reason to break theprotocol to save a few stanzas.

I understand what you mean: Going from 10 to 1 message will indeed
be a privacy problem in case of lost 'unsubscribed' stanzas. But the

solution is to ask why the unsubscribed stanza was lost. XEP-0198will

address at least parts of this problem.

It will reduce, but not eliminate, a problem which we do notcurrently suffer as much from as we could.

Oh yay, let's burn some more CPU cycles, energy is sooo cheap.Everytime yousend out 10 instead of 1 presence stanzas an additional amount ofwarmth is added
to the global warming.

You know XEP-0263 is a joke, right?

Compression is a good optimization if you don't have anything elseto saveanymore and really want to pay for the CPU time. But in this case,wouldn't bepreventing the redundancy in the first place be more beneficial inthe overall
performance?

Yes. But not at the cost of making the protocol worse. By the way,the "compression costs" argument can be somewhat misleading in somecases - not this particular case - but in general, with TLS on thestream, compression will reduce the cost. Besides, deflate is reallyvery cheap in any case.

Everytime you compress a redundant presence stanza a tree dies.Please
think of the trees.

I really worry that this entire note is actually a excellent exampleof sarcasm, based on this, but I'll take it seriously as the saferoption.

I could probably extract the actual energy involved in processing apresence stanza, but we're talking in terms of nanoseconds of CPUtime - this doesn't, obviously, cost a tree, but over time it willadd up.

Let's review what happens currently, assuming an initiating contactwith n availableremote contacts across d domains, each of which has yresources connected - y being different in each instance obviously:


1) Client sends home server a broadcast presence. Cost E1, O(1).

2) Home server, which (almost certainly) has client's rosterin-memory, iterates through and emits one presence stanza per remotesubscribed contact known to be available. Cost E2, O(n). (Iteratingthrough a list of known available contacts).

3) Home server encodes and transmits N stanzas, remote server decodesand transmits N stanzas. Cost E3, O(n). (One stanza per contact -arguably this is O(d) to open the stream, and O(y) to send thestanzas - you pick).

4) Remote server receives one or more fully-addressed stanzas, andbroadcasts them to all resources. Cost E4, O(y). (Iterating through alist of resources of the recipient).


This gives E, as O(n) + O(y), a linear complexity algorithm.

You want to replace 2-4 with:

2') Home server collates roster by remote domain and emits onepresence stanza per remote domain which has a subscribed contactsknown to be available. This is, I'll accept, likely to be close tothe energy cost of 2, although due to the fluctuating nature of thefinal clause that collation has to be done each time, so it'll be alittle above. Cost E2' is, therefore > E2. O(n) + O(d) - I have afeeling this can be reduced to O(n).

3') encode/transmit/decode 1 stanza per remote domain. This iscertainly an energy/cpu/cost saving compared to the 3 above, noargument from me here. Cost E3' is < E3. O(d) (One stanza per domain.Still linear, of course).

4') Lookup sender against all rosters in the system, and detirminewhich of the resulting potential recipients is online and availableto the sender. It seems reasonable that it would be possible to limitthe lookup to only contacts who are online and available - assumingwe ignore privacy lists - but you're still adding a lookup and theassociating lookup mechanics (like a hash table or something) whichmust be maintained in-memory. I strongly suspect that this much, muchmore costly to use, build, and maintain than 4 above, hence E4' >>E4. I believe this to be possible to implement as O(log(y)), but Ialso suspect it's overwhemlingly more likely that it's O(y) (asderived from a reverse roster lookup O(log(y)) combined with aresource-broadcast lookup O(y)).

This give E' as O(n) + O(d) + O(y), a linear compexity algorithm.Poof goes your argument above about linear complexity versus constantamortized time.

I can't see (E3' - E3) < ((E4' - E4) + (E2' - E2)) as being at alllikely, so I continue to maintain that this is a false optimization.(I also continue to maintain that this is an arse-clenchingly morecomplex solution to the problem of getting presence from one contactto another, but I hope nobody's arguing against me, there).

Now, as always, I encourage you to change my mind with suitablybacked factual evidence.

Oh, and I would add that this does, of course, change dramatically ifwe introduce a non-meshed routing architecture between servers -since that reassociates responsibility anyway, it's not a problemthere. (FWIW, I would note that this already occurs between clientand home server).


Dave.
--
Dave Cridland - mailto:[email protected] - xmpp:[email protected]
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade

Re: [Standards] Presence distribution

Reply via email to