I'm looking at Qpid as a possible replacement for a home-grown back-end for
a large real-time chat operation. My design target is 1,000,000 simultaneous
online users, none of whom is trusted (all the brokers and exchanges run in
a controlled data center), each of whom is subscribed to between two and
ten brokers through their individual queues. Exchanges are all fanout;
queues are all unreliable. In addition, there are about 200,000
communication channels ("chat rooms" or whatever -- not AMQP channels),
which are created by the controlled servers, have their own incoming and
outgoing exchanges, and enforce rules on the messages flowing through them.

The logical set-up is something like:

Server creates a channel:
1) Create "channelname".in exchange
2) Create "channelname".out exchange
3) Create server-specific queue
4) Bind "channelname".in to the server-specific queue
5) Subscribe to the server-specific queue
At times, send messages on the "channelname".out exchange.

Client joins the system:
1) Create a "clientname".in exchange
2) Create a client-specific queue
3) Bind "clientname".in to the client-specific queue
4) Subscribe to the client-specific queue
This allows the system to send messages targeted to the user.

Client joins a channel:
5) Bind "channelname".out to the client-specific queue
6) At times, send messages to the "channelname".in exchange

Now, to make this scale appropriately, I'll need to use
clustering/federation. However, the description of the federation commands
for Qpid seems to assume that message processing will be the bottleneck. In
this system, I'm almost certain that simply having a large number of
clients connected to a physical machine will be the main bottleneck. (If it
isn't, that probably points to an opportunity for optimizing the innards of
the system.)

So, let's assume that each 1U pizza box or blade can handle 20,000
connected users or processes. That means I'd need about 60 boxes/blades (50
for the 1,000,000 users, and 10 for the 200,000 server processes). It also
means there are 20,000 exchanges on each user-facing box, and 40,000
exchanges on each channel/chat-room-facing box. (Btw: I'm currently running
1 Gb/s Ethernet ports and backbone, but I could upgrade to a 10 Gb/s
backbone if needed.)

In the description of federation, it's unclear how to set this up so that
it load-balances appropriately without putting mirrors of all 1,400,000
exchanges on all of the server boxes (which would probably not be the way
to go). The most promising part I've seen is the queue route.
Suppose I create a fully connected set of 60 physical boxes, each running a
broker, with 20,000-40,000 exchanges on each. Perhaps I could create a
queue route that routes from a given exchange output to a given queue, for
delivery to the client over the internet. Vice versa, maybe I could have a
route from the client broker to the chat-room broker for each room the
client wants to send messages to.

So, questions:

1) Is this reasonable? Am I thinking wrong somewhere? Is 40,000 exchanges
on a single machine with 20,000 separately connected clients a reasonable
expectation for a quad-core Xeon blade-type server? (Each connection may
send 5 messages per second at most, so actual message traffic won't be that
great.)

2) Does a queue route have to have a queue on both the source and
destination sides, or can I queue-route from a local exchange to a remote
queue? How would I set up a queue route from a locally connected client to
a remote exchange? (For posting messages on the chat-room exchange.)

3) I'm separating exchanges for each client and channel/room here, with
full fan-out for each exchange. This seems simplest -- is it also the best
performing? Would it be better to have a single direct exchange per box,
and use the routing key instead?

4) How do I do reasonable authentication in this system? The kinds of
authentication and authorization I need are:

- The user is authenticated through external means (some super-user
process) and issued a ticket (Kerberos-style). When connecting to a broker,
the user uses this ticket.

- The user can only receive data from the user-specific queue for that
user. This queue is likely created by external means (super-user process),
but subscribed to by the (remote) user client.

- The user can send data only to exchanges that some external means
(super-user process) allows. When a user is joined to a channel/room, the
system should open up the exchange for that room to the user.
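For question 3, the routing-key variant can be simulated without a broker:
one direct exchange per box, where the routing key is the client (or room)
name and each binding maps a key to a queue. The class below is my own toy
model of AMQP direct-exchange matching, not Qpid code, just to show that it
expresses the same delivery pattern as 20,000 separate fanout exchanges.

```python
from collections import defaultdict

class DirectExchange:
    """Toy model of AMQP direct-exchange semantics: a message is copied to
    every queue bound with a routing key equal to the message's key."""
    def __init__(self):
        self.bindings = defaultdict(set)   # routing_key -> set of queue names
        self.queues = defaultdict(list)    # queue name -> delivered messages

    def bind(self, queue, routing_key):
        self.bindings[routing_key].add(queue)

    def publish(self, routing_key, body):
        for queue in self.bindings[routing_key]:
            self.queues[queue].append(body)

# One direct exchange per box instead of one fanout exchange per client:
ex = DirectExchange()
ex.bind("alice.queue", "alice")   # replaces the "alice.in" fanout exchange
ex.bind("bob.queue", "alice")     # a second subscriber to the same key
ex.publish("alice", "hello")
ex.publish("carol", "dropped")    # no binding for "carol" -> no delivery
print(ex.queues["alice.queue"])   # -> ['hello']
```

Whether the broker's internal routing-table lookup beats 20,000 small
fanout exchanges is exactly the performance question being asked.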
I *could* enforce this at the application level (when processing messages
in the channel/room server code), but I'd prefer to weed out invalid
message sources earlier than that.

- Binding of exchanges to queues is controlled by external means
(super-user process), not by the user.

The only things a remote user would be allowed to do are subscribe to the
user-specific queue, and post messages on a small subset of brokers. That
subset is separate for each user, and will change dynamically.

The group and authorization system does not seem like a good fit here,
because users mutate (and are created) all the time; anything that lives in
a text configuration file is unlikely to scale well with user flow. Peak
sign-up periods may see 100+ new users sign up per minute, all of whom
expect to be able to enter the system immediately (assuming the proper
client software is installed).

Would it fit the model better if any message *from* a user also went to a
user-specific queue, which in turn could be bound (perhaps based on routing
key) to the appropriate exchanges for the appropriate targets? Would that
make authentication and authorization easier to enforce?

Sincerely,

jw

--
Americans might object: there is no way we would sacrifice our living
standards for the benefit of people in the rest of the world. Nevertheless,
whether we get there willingly or not, we shall soon have lower consumption
rates, because our present rates are unsustainable.
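The dynamic per-user allow-list described above is easy to express as a
publish-time gate. The sketch below is application-level pseudologic with
hypothetical names -- not a Qpid ACL feature -- but it shows the intended
semantics: the super-user process opens and closes exchanges per user at
join/leave time, with no configuration-file reload.

```python
class PublishGate:
    """Publish-time authorization sketch: each user may publish only to
    exchanges an external super-user process has explicitly opened for
    them. Membership changes at join/leave time, entirely in memory."""
    def __init__(self):
        self.allowed = {}  # user -> set of exchange names

    def join_room(self, user, room):
        # Super-user process opens the room's inbound exchange to the user.
        self.allowed.setdefault(user, set()).add(f"{room}.in")

    def leave_room(self, user, room):
        self.allowed.get(user, set()).discard(f"{room}.in")

    def may_publish(self, user, exchange):
        return exchange in self.allowed.get(user, set())

gate = PublishGate()
gate.join_room("alice", "room42")
print(gate.may_publish("alice", "room42.in"))  # True
gate.leave_room("alice", "room42")
print(gate.may_publish("alice", "room42.in"))  # False
```

This is essentially what the user-specific outbound queue idea would buy:
the gate lives in one controlled place per user rather than in every
channel server.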
