Re: [Standards] Presence distribution

Mridul Muralidharan Mon, 06 Apr 2009 22:42:04 -0700

--- On Mon, 6/4/09, Dave Cridland <[email protected]> wrote:

> From: Dave Cridland <[email protected]>
> Subject: Re: [Standards] Presence distribution
> To: "XMPP Standards" <[email protected]>
> Date: Monday, 6 April, 2009, 11:10 PM
> On Mon Apr  6 17:54:16 2009,
> Philipp Hancke wrote:
> > Actually, the current model enables a single client to
> bring down
> > a server by making the server broadcast large presence
> stanzas at
> > a high rate. Simplistic bandwidth control mechanisms
> such as "karma"
> > account traffic per socket, and do not throttle the
> client when it
> > generates excessive amounts of s2s traffic.
> > 
> > 
> How does the client force the subscription?
> 
> >> Yes.. But not at the cost of making the protocol
> worse. By the way, the
> > 
> > It is not "worse", just different.
> 
> No, it is worse, the failure case of the missing
> unsubscribed is a privacy leak.
> 
> >> Let's review what happens currently, assuming an
> initiating contact with n availableremote contacts across d
> domains, each of which has y resources connected - y being
> different in each instance obviously:
> > 
> > "available remote contacts" is those bis
> optimizations?
> > Actually, were those discussed before they were
> introduced?
> > What is their impact on presence reliability?
> > 
> > 
> Yes, and they're not mandated.
> 
> 
> >> 1) Client sends home server a broadcast presence.
> Cost E1, O(1).
> > 
> > Why not shift the responsibility for distribution to
> the client?
> > This would make E1=O(n) and E2=O(1) (simple message
> routing).
> > Afaics, your argument applies for that as well.
> > 
> > 
> Ah, but the server has responsibility anyway, so it's a
> reasonable optimization there.
> 
> 
> >> 2) Home server, which (almost certainly) has
> client's roster in-memory, iterates through and emits one
> presence stanza per remote subscribed contact known to be
> available. Cost E2, O(n). (Iterating through a list of known
> available contacts).
> > 
> > Actually, why don't you send the presence stanza to
> each connected
> > resource of each available contact here - iterating
> through a list of
> > known avaible contact resources)?
> > 
> > 
> It's not that server's responsibility to maintain that
> definitive list, though.
> 
> >> 3) Home server encodes and transmits N stanzas,
> remote server decodes and transmits N stanzas. Cost E3,
> O(n). (One stanza per contact - arguably this is O(d) to
> open the stream, and O(y) to send the stanzas - you pick).
> > 
> > O(y) should be O(n) here?
> > 
> > 
> No, it's d*O(y), but that's equivalent to O(y). I don't
> think it makes a huge difference whether it's O(y) + O(d) or
> O(n), though.
> 
> 
> >> 4) Remote server receives one or more
> fully-addressed stanzas, and broadcasts them to all
> resources. Cost E4, O(y). (Iterating through a list of
> resources of the recipient).
> > 
> > This is done n times. And you are neglecting costs for
> privacy list
> > checks (I will call that O(p), different for each
> instance and
> > quite costly imo).
> > Hence E4 = O(n) + n*(O(p) + O(y))
> > 
> > 
> Well, the moment you start considering privacy lists, the
> entire thing breaks, because evaluation of those needs
> handling at both ends, so a reverse roster lookup fails.
> Instead, what's needed is list building, which means
> evaluating the privacy list locally n times, in order to
> collate a per-domain list of contacts. I've not considered
> this, because of the risks of such lists being lost, or out
> of sync themselves, and the problems that having arbitrary
> servers able to command resources on your server.



This was discussed and proposed a while back, and iirc there was a prototype 
... not sure where it went though : but the saving were interestingly high.

http://blogs.sun.com/mridul/entry/minimising_s2s_traffic
http://blogs.sun.com/mridul/entry/minimising_s2s_traffic_in_xmpp

are some rough notes/thoughts which could be bashed into shape, if required.

- Mridul


> 
> 
> >> This gives E, as O(n) + O(y), a linear complexity
> algorithm.
> > 
> > O(n) + n*(O(p) + O(y)).
> > 
> > 
> That's O(n + np + ny), I think.
> 
> 
> >> You want to replace 2-4 with:
> >> 
> >> 2') Home server collates roster by remote domain
> and emits one presence stanza per remote domain which has a
> subscribed contacts known to be available. This is, I'll
> accept, likely to be close to the energy cost of 2, although
> due to the fluctuating nature of  the final clause that
> collation has to be done each time, so it'll be a little
> above. Cost E2' is, therefore > E2. O(n) + O(d) - I have
> a feeling this can be reduced to O(n).
> >> 
> >> 3') encode/transmit/decode 1 stanza per remote
> domain.. This is certainly an energy/cpu/cost saving compared
> to the 3 above, no argument from me here. Cost E3' is <
> E3. O(d) (One stanza per domain.. Still linear, of course).
> > 
> > Linear with d. Assuming that people 'cluster', d
> << n is reasonable.
> > 
> > 
> Sure, but I'm astonishingly unconvinced that the O(y) term
> in E3 is significant - I suspect the maintainence of the
> domain XMLstreams is by far the bigger cost, here - in fact,
> I personally think it's entirely reasonable to suggest that
> for all practical purposes, E3 == E3' == O(d), although in
> practise E3' will be slightly smaller, and in the case where
> y is particularly big, that difference will of course become
> significant.
> 
> But for the vast majority of cases, y == 1 - in my roster,
> it's a maximum of 12 (for jabber.org), the next highest is 6
> or so (for gmail.com), and the remaining levels or 2 or 1,
> across quite a few domains. I don't think under these cases
> that - unless the constant involved is massive - O(y) is
> going to be a substantial factor.
> 
> 
> >> 4') Lookup sender against all rosters in the
> system, and detirmine which of the resulting potential
> recipients is online and available to the sender. It seems
> reasonable that it would be possible to limit the lookup to
> only contacts who are online and available - assuming we
> ignore privacy lists - but you're still adding a lookup and
> the
> > 
> > This depends on how you do the lookup imo.
> > 
> > 
> Yes, Waqas argues that it might be replacement rather than
> addition.
> 
> 
> >> associating lookup mechanics (like a hash table or
> something) which must be maintained in-memory. I strongly
> suspect that this much, much more costly to use, build, and
> maintain than 4 above, hence E4' >> E4. I
> > 
> > If you are neglecting privacy lists above certainly..
> > 
> > I think you can reduce this to E4' = n*O(y).
> > 
> > 
> n cannot possibly occur in this - n is a figure local to
> the initiating server. It might happen d times, though, on
> different servers.
> 
> 
> >> believe this to be possible to implement as
> O(log(y)), but I also suspect it's overwhemlingly more
> likely that it's O(y) (as derived from a reverse roster
> lookup O(log(y)) combined with a resource-broadcast lookup
> O(y)).
> >> 
> >> This give E' as O(n) + O(d) + O(y), a linear
> compexity algorithm. Poof goes your argument above about
> linear complexity versus constant amortized time.
> >> 
> >> I can't see (E3' - E3) < ((E4' - E4) + (E2' -
> E2)) as being at all likely, so I continue to maintain that
> this is a false optimization.
> > 
> > I disagree. But at least your argument is technically
> founded and better
> > than what I have heard from the council in 2006 (and I
> had to gather
> > that from the chatroom).
> > 
> > 
> Waqas points out some potential flaws - I'm still not
> convinced that it represents a cost-saving for either end,
> but the difference might well be smaller than I thought. In
> particular, he suggests that E4 might be implemented as a
> single multivalued hashtable, thus reducing that lookup to
> O(1).
> 
> We had a very interesting conversation about the relative
> costs of string internalization and all sorts.
> 
> 
> > I care about effective flood control mechanisms. This
> is about moving
> > cost where they are affordable.
> > 
> >> (I also continue to maintain that this is an
> arse-clenchingly more complex solution to the problem of
> getting presence from one contact to another, but I hope
> nobody's arguing against me, there).
> > 
> > It is a different solution. One that has been used for
> years on IRC.
> 
> IRC has a different structure, and uniform view throughout
> the network - indeed, one server is specifically designed to
> be indistinguishable from any other. That's not the case in
> XMPP, and means different approaches and considerations. IRC
> behaves like a single XMPP cluster, in that respect, where
> all these optimizations are entirely possible.
> 
> > p.s.: I still want a review for that 220-rewrite from
> you.
> 
> Pester me next week?
> 
> Dave.
> --Dave Cridland - mailto:[email protected]
> - xmpp:[email protected]
>  -
> acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
>  - http://dave.cridland.net/
> Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade
> 


      Add more friends to your messenger and enjoy! Go to 
http://messenger.yahoo.com/invite/
Re: [Standards] Presence distribution

Reply via email to