First of all, thanks for the insight. Kindly find my queries inline, along with some more insight into what the application actually looks like.
Abhinav Singh,
Bangalore,
India
http://abhinavsingh.com

________________________________
From: Nathan Fritz <[email protected]>
To: XMPP PubSub <[email protected]>
Sent: Wed, February 3, 2010 1:25:17 AM
Subject: Re: [PubSub] Questions over scalability while using pubsub under high traffic (200K concurrent users)

> Abhinav,
>
> Let me reply to you inline.
>
> On Mon, Feb 1, 2010 at 10:51 PM, Abhinav Singh <[email protected]> wrote:
>
>> Hi,
>>
>> I am looking to deploy pub sub for a website built to display live
>> sports scorecard to the users. At peak traffic, there are about 200K
>> concurrent users. Since the load is probably too much, I am doubting
>> if pub sub can really be a good solution here.
>
> Just like a website can scale to 200,000 concurrent users, Pubsub and
> XMPP can; it's just a matter of having an effective strategy to do so.

Thanks, that boosts my confidence. Yup, I understand from gtalk and
facebook chat that the challenge is mainly in the implementation.

>> Here is an overview of how I am looking to deploy pub sub:
>>
>>   1. User visits http://localhost/match-id-23000
>>   2. As soon as the user loads the match page, he is subscribed to
>>      the match specific node
>
> Do these users have accounts or are they anonymous? Temporary
> subscriptions while web browsing require a pubsub service that
> supports temporary subscriptions.

Yes, users do have an account; however, to start with we can skip the
user/pass check. That's how I am moving ahead right now: log the user
in anonymously and then subscribe him to the pubsub node.

>>   1. There can be 1 or more (<5) publishers, who are actually vendors
>>      sitting in remote locations
>>   2. Vendors publish the live scorecard as and when data is available
>>      with them
>>   3. Subscribed users get the live scorecard update in almost
>>      real-time
>>
>> Here is the technical flow of the same:
>>
>>   1. On page load, an ajax request is fired subscribing the user to
>>      the match node
>>   2. Target URL for all ajax requests is a PHP connection manager
>>      (say bosh.class.php)
>
> I assume you're talking about JAXL here. Also consider client-side
> for browser updates with strophe.js.

Yes, currently I am developing this using Jaxl. (Yes, I am quite in
favor of going with browser side and other solutions.)

>>   1. PHP connection manager parses incoming request params, generates
>>      the appropriate XML as specified in the various XEPs and
>>      communicates with the jabber server
>>   2. PHP conn. manager communicates by curling the jabber server
>>   3. Jabber server responds back whenever there is new data available
>>      for the subscribers
>>
>> Now the scalability doubts which come to my mind are:
>>
>>   1. With the PHP connection manager holding every single incoming
>>      request (timeout 60 sec) and waiting for a response from the
>>      jabber server, I assume under 200K concurrent users the web
>>      server will soon stop accepting the incoming ajax requests.
>
> This is why I suggest client-side strophe.js -- though I'm sure you
> *could* make the server-side management work if you threw enough
> hardware at the problem.

Yes, I think server side solutions might not scale because most of the
time the incoming ajax requests will be waiting for a response from the
web server. Apache max child and max threads might also require some
tuning.

>>   1. Also, are ejabberd, openfire and the likes capable of handling
>>      such a load?
>
> Ejabberd, configured correctly, can handle the load. Now, I'm not
> talking about ejabberd's publish-subscribe interface. I would suggest
> using Ralph's Idavoll or my SleekPubsub, which I'm very actively
> developing to solve this problem. I also suggest a small cluster or
> some load balancing. I do not recommend Openfire for this kind of
> load. You can also take a look at Tigase w/ their built in
> Publish-Subscribe.

Right now on my dev env I am using openfire, but will soon move to
ejabberd. Will study the other suggested alternatives.

>>   2. If yes, how much is the estimated throughput of these jabber
>>      servers? (need to assess the amount of infrastructure required)
>
> A well tuned XMPP Server generally shoots for 100,000 messages per
> second on a single core CPU. With extra cores, clustering, etc, you
> can easily get higher.

This is a good sign, I must say. However, here is why the 100,000
messages per second capacity is actually not the issue. Here is a bit
more about the application:

  1. It's about delivering the score of a match
  2. Hence the score of a match might not change all the time
  3. In the application, the match score can change roughly every 30 sec
  4. In those 30 seconds all these concurrent users will just be
     waiting for that update
  5. Hence 2 things are a problem as I see it:
     a) The webserver holding so many concurrent users for 30 seconds
     b) The jabber server being able to deliver this change in score to
        all these concurrent users in one go

100,000 is a promising number. Load balancing and clustering will of
course help. The throughput value will help in calculating the number
of servers required.

>>   2. Overall I think the PHP connection manager will be the main
>>      bottleneck here. So will it be advisable to proxy all incoming
>>      ajax requests directly to the jabber server and put the login
>>      between the vendor and the jabber server?
>
> I agree. Again, consider client-side connection to BOSH in JS with
> strophe.js.

Right, this can save me from a lot of pain and might still scale at the
same time.

>> Thanks in advance for your helpful advice and insight.
>>
>> Abhinav Singh,
>> Bangalore,
>> India
>> http://abhinavsingh.com
>
> Thanks. I'm a fulltime XMPP consultant, and happy to help with these
> kinds of projects. I'm happy to give free advice, however, when it
> comes to actually writing code or helping you install and configure
> services, I have reasonable rates. You should know that what you're
> proposing is completely doable. It was a lot harder a couple of years
> ago, as there weren't very many good implementations of
> Publish-Subscribe (although Idavoll was around). Some people had bad
> experience trying to do this with ejabberd's earlier implementations
> of publish-subscribe, though I hear it has improved.

Yes Nathan, I know about your capabilities :D However, over here I am
myself an employee who has this task of exploring an xmpp based
solution and making it work if possible. "You should know that what
you're proposing is completely doable" makes me feel better and gives
me the confidence to move forward with xmpp.

Also, I think language might not be a problem here. js, php, erlang
and c are the 4 options as I see it:

  1. js: with something like strophe.js
  2. php: with something like jaxl, but then I doubt the scalability of
     a php based connection manager under such load.
  3. c: I have some experience with libevent based custom web servers.
     Such a web server might scale and at the same time do the
     connection manager task as well as the web server task.
  4. erlang: Though I know little about erlang, if erlang has some
     inherent advantages in this kind of application, I won't mind
     learning a bit about it. Also, since facebook proved its worth in
     a similar application, I might dig into it (using mochiweb as
     first choice here) if none of the above solutions work out well.

To start with, a js based solution can be worked out. But as the
application matures and a few more features are added, we will also
need to have some logic done before the ajax request connects to the
jabber server and before it receives the response back from the server:

  user
    ---> web server does some logic (e.g. authentication)
    ---> request goes to jabber, which responds back to the web server
    ---> web server does some more logic (e.g. stitching some data into
         the outgoing response and formatting the response as JSON)
    ---> user gets the data

My main objective as of now is to know what configuration can help me
out over here. The application is already running in stable mode with
enough hardware put in, but it runs in a classic ajax mode: a request
is made every 20-30 seconds, mainly due to the nature of the
application. However, those extra few seconds still make a lot of
difference for the fans (the site visitors). Since I have some head
start with xmpp, I want to explore whether an xmpp based solution can
help us serve more real time scores to the users.

Abhinav Singh,
Bangalore,
India
http://abhinavsingh.com
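
A rough back-of-the-envelope sketch of the numbers discussed in this thread, written in Python. It only does arithmetic on the figures quoted above (200K concurrent users, roughly 100,000 messages per second per core, polling every 20-30 seconds). The function names are hypothetical, and two simplifying assumptions are baked in: throughput scales linearly with cores, and a score change lands at a uniformly random point in a client's polling cycle. Real servers will not scale this cleanly, so treat the results as an order-of-magnitude guide only.

```python
import math


def fanout_seconds(subscribers: int, msgs_per_sec_per_core: int, cores: int = 1) -> float:
    """Time to push one update to every subscriber.

    Assumes (idealized) that throughput scales linearly with core count.
    """
    return subscribers / (msgs_per_sec_per_core * cores)


def cores_needed(subscribers: int, msgs_per_sec_per_core: int, target_seconds: float) -> int:
    """Smallest core count that delivers one update within target_seconds,
    under the same linear-scaling assumption."""
    return math.ceil(subscribers / (msgs_per_sec_per_core * target_seconds))


def mean_polling_delay(poll_interval_seconds: float) -> float:
    """Average staleness with classic ajax polling, assuming the score
    changes at a uniformly random moment within the polling interval."""
    return poll_interval_seconds / 2


SUBSCRIBERS = 200_000   # peak concurrent users, from the thread
THROUGHPUT = 100_000    # messages/sec on a single core, the rough figure quoted in the thread

print(fanout_seconds(SUBSCRIBERS, THROUGHPUT))       # seconds to reach everyone on one core
print(cores_needed(SUBSCRIBERS, THROUGHPUT, 1.0))    # cores for a one-second fan-out
print(mean_polling_delay(20), mean_polling_delay(30))  # average delay under 20s and 30s polling
```

Under these assumptions, a single score update reaches all 200K subscribers in a couple of seconds on one core, while polling every 20-30 seconds leaves users on average 10-15 seconds behind. That gap is the "extra few seconds" a push-based setup is meant to recover. Holding 200K open connections is a separate cost that this sketch does not model.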
