Re: [zeromq-dev] queue overhead
On Sun, Jul 27, 2014 at 11:13:31AM -0700, Justin Karneges wrote:

> I have a stable (in the addressing sense) worker, and I want to take advantage of multiple cores. [...]

You want one worker per core, so this is all on a single system? Then why not multithread your worker? Have one main thread handle the communication with the outside world through a ROUTER socket and talk to the worker threads over inproc://. That way you have a single socket to connect to, and inproc:// avoids the overhead of retransmitting messages to other processes or between systems.

MfG
	Goswin
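For illustration, a minimal sketch of the layout Goswin describes: a ROUTER socket facing the outside, a DEALER socket bound to inproc://, and zmq_proxy() shuttling between them. It assumes libzmq >= 3.2 (for zmq_ctx_new and zmq_proxy); the port, the worker count of 4, and the echo "work" are placeholders.

#include <zmq.h>
#include <pthread.h>

static void *worker_routine (void *ctx)
{
    //  Each worker thread owns a REP socket on the shared inproc endpoint.
    void *rep = zmq_socket (ctx, ZMQ_REP);
    zmq_connect (rep, "inproc://workers");
    while (1) {
        char buf [256];
        int n = zmq_recv (rep, buf, sizeof buf, 0);
        if (n < 0)
            break;                      //  context terminated
        if (n > (int) sizeof buf)
            n = sizeof buf;             //  message was truncated
        //  ... real work would happen here; we just echo ...
        zmq_send (rep, buf, n, 0);
    }
    zmq_close (rep);
    return NULL;
}

int main (void)
{
    void *ctx = zmq_ctx_new ();

    //  Single well-known endpoint for remote components.
    void *frontend = zmq_socket (ctx, ZMQ_ROUTER);
    zmq_bind (frontend, "tcp://*:5555");

    //  Internal endpoint shared by all worker threads (bind before connect).
    void *backend = zmq_socket (ctx, ZMQ_DEALER);
    zmq_bind (backend, "inproc://workers");

    //  One worker per core; 4 stands in for the detected core count.
    for (int i = 0; i < 4; i++) {
        pthread_t t;
        pthread_create (&t, NULL, worker_routine, ctx);
    }

    //  Blocks forever, forwarding between the ROUTER and the workers.
    zmq_proxy (frontend, backend, NULL);

    zmq_close (frontend);
    zmq_close (backend);
    zmq_ctx_destroy (ctx);
    return 0;
}

With this layout, a remote component connects one REQ or DEALER socket to tcp://host:5555 and never needs to know how many worker threads exist behind it.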
Re: [zeromq-dev] queue overhead
There is a queue in zmq that you can use as a template for your own mods. I took it years ago for the following functionality:

- creating a queue device, supplying two endpoints that others can connect to (one bcast, one req/rep)
- creating two internal endpoints, one for broadcasting, one for req/rep
- instantiating multiple workers that connect to the two internal endpoints
- managing a list of ready workers, so that messages are dispatched only to idle ones (the sketch after this message shows the worker's side of that handshake)

I would have attached the code here, but the queue logic is so entangled with my own service concept that it would be of little use to you.

To handle the several endpoints to different services, I used a central dispatcher that is used by all external programs etc. and that knows where to send which messages (including subscription topics on the bcast channel to target a specific service instance). It has worked fine for about 4 years now with a really old zmq. I'm just about to check whether I can drop my own code when using a new version of zmq ;o)

^5 sven

Am 2014-07-27 20:13, schrieb Justin Karneges:

> I have a stable (in the addressing sense) worker, and I want to take advantage of multiple cores. [...]
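A rough sketch of the worker's side of the ready/busy bookkeeping Sven describes (the same idea as the load-balancing pattern in the zguide). The "READYforWORK" dummy follows his naming; the endpoint is a placeholder, and libzmq >= 3.2 is assumed.

#include <zmq.h>

int main (void)
{
    void *ctx = zmq_ctx_new ();
    void *req = zmq_socket (ctx, ZMQ_REQ);
    zmq_connect (req, "tcp://localhost:5556");

    //  Announce idleness before the first request arrives, so the
    //  queue can add this worker to its ready list.
    zmq_send (req, "READYforWORK", 12, 0);

    while (1) {
        char buf [256];
        int n = zmq_recv (req, buf, sizeof buf, 0);
        if (n < 0)
            break;
        if (n > (int) sizeof buf)
            n = sizeof buf;
        //  ... process the request ...
        //  The reply doubles as the next "I am idle again" signal.
        zmq_send (req, buf, n, 0);
    }
    zmq_close (req);
    zmq_ctx_destroy (ctx);
    return 0;
}

The queue side remembers the identities of workers it has heard from but not yet dispatched to, and polls its client-facing socket only while that list is non-empty; Sven's attached code later in this thread shows one way to do that.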
Re: [zeromq-dev] queue overhead
Putting the router process on the same box where your workers are is (in my experience) a bad idea if you strive for best-of-the-best perf. Routing is not negligible: it is a process with always-running threads, so it eats CPU all the time. So now, on a single machine, you have always-busy worker processes and an always-busy router, and hence a high rate of context switching.

-- artemv

2014-07-27 21:13 GMT+03:00 Justin Karneges jus...@affinix.com:

> I have a stable (in the addressing sense) worker, and I want to take advantage of multiple cores. [...]
Re: [zeromq-dev] queue overhead
... found some ==old== (but working) snippet of a queue device that dispatches to several workers using pub and req/rep sockets.

Warning: the workers have to supply those READYforWORK message dummies to maintain the ready/busy list in the queue.

I attached my modified code for zmq's queue device ... maybe it helps you a bit ;o)

Am 2014-07-27 20:13, schrieb Justin Karneges:

> I have a stable (in the addressing sense) worker, and I want to take advantage of multiple cores. [...]

//#include "/home/sven/Download/zeromq-2.0.10/src/socket_base.hpp"
//#include "/home/sven/Download/zeromq-2.0.10/src/err.hpp"

namespace zmq
{
    int ownZmqQueue (class socket_base_t*, class socket_base_t*,
                     class socket_base_t*, class socket_base_t*,
                     class socket_base_t*, class socket_base_t*);

    extern bool zmq_abort_on_queue_error;
    extern bool zmq_end_queue_devices;
}

/*
    Copyright (c) 2007-2010 iMatix Corporation

    This file is part of 0MQ.

    0MQ is free software; you can redistribute it and/or modify it under
    the terms of the Lesser GNU General Public License as published by
    the Free Software Foundation; either version 3 of the License, or
    (at your option) any later version.

    0MQ is distributed in the hope that it will be useful, but WITHOUT
    ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
    or FITNESS FOR A PARTICULAR PURPOSE.  See the Lesser GNU General
    Public License for more details.

    You should have received a copy of the Lesser GNU General Public
    License along with this program.  If not, see
    <http://www.gnu.org/licenses/>.
*/

#include <stddef.h>
#include <string.h>
#include <errno.h>
#include <iostream>
#include <set>

#include "../include/zmq.h"
#include "ownZmqQueue.h"
#include "../general/logging.h"
#include "zhelpers.h"
#include "zmsg.h"

using namespace std;

namespace zmq
{
    LOGGER_INIT_C(ZmqQueue);

#define NBR_WORKERS 99

    int ownZmqQueue (class socket_base_t *clsocket_,
        class socket_base_t *wosocket_,
        class socket_base_t *sub_clients,
        class socket_base_t *sub_workers,
        class socket_base_t *pub_clients,
        class socket_base_t *pub_workers)
    {
        bool zmq_end_queue_device = false;

        //  Queue of available workers
        int available_workers = 0;
        std::string *worker_queue [NBR_WORKERS];   //  max. 100 threads (senselessly high)
        memset (worker_queue, 0, NBR_WORKERS * sizeof (char*));
        std::set<std::string> known_workers;

        zmq_msg_t msg;
        int rc = zmq_msg_init (&msg);
        if (rc != 0) {
            logFatal ("zmq_msg could not be initialized (errno=" << errno << "," << strerror (errno) << ")");
            return errno;
        }

        int numPolls = 2;
        zmq_pollitem_t allitems [4] = {
            //  The publishing port of the dispatcher, if this IS a dispatcher
            { pub_workers, 0, ZMQ_POLLIN, 0 },
            //  Always poll for worker activity on backend
            { wosocket_, 0, ZMQ_POLLIN, 0 },
            //  Poll front-end only if we have available workers
Re: [zeromq-dev] queue overhead
Justin,

The way you would structure this is by setting up a zmq_proxy() that binds to the "well known" port to receive tasks. On the back end, the proxy would also bind to a local tcp port (or via ipc; perf should be the same). The workers on the box would all connect to this back-end port. The proxy would just receive incoming messages and then dole them out to the workers. If you use push/pull sockets, it will round-robin them. If you need something more sophisticated, you can build your own proxy component and make the routing and load-balancing logic as smart (or as dumb) as you like.

The perf hit will be measurable, but it's likely on the order of a millisecond or two. It's impossible to be more definitive than that, since it is very dependent upon your hardware. If this is on a dedicated box with 10Gb interfaces, the router will be very fast. If this is on a micro AWS instance in the cloud, then the router will probably take a few tens of milliseconds to do its work.

If you use TCP endpoints, you can start to spread your workers over multiple boxes and scale horizontally. It's pretty neat. None of this is difficult, so benchmark it on your own hardware to determine the exact overhead. I think the flexibility that a configuration like this offers is well worth a tiny overhead.

If you need redundancy, then I suggest reading through the advanced patterns in the zguide. All of this is covered there in detail, usually with working code.

cr

On Jul 27, 2014, at 1:13 PM, Justin Karneges jus...@affinix.com wrote:

> I have a stable (in the addressing sense) worker, and I want to take advantage of multiple cores. [...]
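A minimal sketch of the push/pull proxy cr describes; the endpoints are placeholders, and libzmq >= 3.2 is assumed for zmq_proxy().

#include <zmq.h>

int main (void)
{
    void *ctx = zmq_ctx_new ();

    //  Well-known front-end port that remote components push tasks to.
    void *frontend = zmq_socket (ctx, ZMQ_PULL);
    zmq_bind (frontend, "tcp://*:5555");

    //  Local back-end; each worker on this box connects a PULL socket
    //  here. An ipc:// endpoint would work equally well on one machine.
    void *backend = zmq_socket (ctx, ZMQ_PUSH);
    zmq_bind (backend, "tcp://127.0.0.1:5556");

    //  Forward every incoming task; PUSH round-robins across the
    //  connected workers automatically.
    zmq_proxy (frontend, backend, NULL);

    zmq_close (frontend);
    zmq_close (backend);
    zmq_ctx_destroy (ctx);
    return 0;
}

Swapping the socket types for ROUTER/DEALER (and writing your own loop instead of zmq_proxy) is the usual path once plain round-robin stops being smart enough.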
[zeromq-dev] queue overhead
I have a stable (in the addressing sense) worker, and I want to take advantage of multiple cores. So I run N instances of this worker, where N is the number of cores on the host machine, and each worker binds to its own socket. Components that wish to make use of this worker service connect to all N worker instances.

Unfortunately this is a little awkward. The connecting components must be configured with the N socket specs. And it's hard to automate this: even if the connecting components could generate socket specs programmatically, it still requires knowing the number of cores of the remote machine.

What I'd like to do is put an adapter component in front of the N worker instances (on the same machine as the worker instances) that binds to a single socket. It would route to the N workers, which is easily done since the adapter lives on the same machine and knows the number of cores. Connecting components could then simply connect to this adapter and not need to care about the number of remote cores.

The question I have is what kind of overhead this introduces. An MxN set of connections between M remote components and the N workers seems like it would be far more efficient than M-1-N, which looks like a bottleneck. But maybe in practice, if the routing is very simple, it becomes negligible?

Justin
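For concreteness, a sketch of the MxN arrangement described above: each connecting component opens one REQ socket and connects it to every worker endpoint, so requests are round-robined across the workers with no intermediary. The host name, the port range, and N = 4 are placeholders, and libzmq >= 3.2 is assumed.

#include <zmq.h>
#include <stdio.h>

int main (void)
{
    void *ctx = zmq_ctx_new ();
    void *req = zmq_socket (ctx, ZMQ_REQ);

    //  One connect per worker instance. N must be known in advance,
    //  which is exactly the awkward part of this setup.
    for (int port = 5555; port < 5559; port++) {
        char endpoint [64];
        snprintf (endpoint, sizeof endpoint, "tcp://remotehost:%d", port);
        zmq_connect (req, endpoint);
    }

    //  Each request goes to the next connected worker in turn.
    zmq_send (req, "hello", 5, 0);
    char buf [256];
    zmq_recv (req, buf, sizeof buf, 0);

    zmq_close (req);
    zmq_ctx_destroy (ctx);
    return 0;
}

The adapter variant collapses the loop into a single zmq_connect to the adapter's one well-known endpoint.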