Re: [zeromq-dev] queue overhead

2014-07-29 Thread Goswin von Brederlow
On Sun, Jul 27, 2014 at 11:13:31AM -0700, Justin Karneges wrote:
 I have a stable (in the addressing sense) worker that I want to take 
 advantage of multiple cores. So, I run N instances of this worker, where 
 N is the number of cores on the host machine, and each worker binds to 
 its own socket. Components that wish to make use of this worker service 
 connect to all N worker instances.
 
 Unfortunately this is a little awkward. The connecting components must 
 be configured with the N socket specs. And it's hard to automate this, 
 since even if the connecting components could generate socket specs 
 programmatically, this still requires knowing the number of cores of the 
 remote machine.
 
 What I'd like to do is put an adapter component in front of the N worker 
 instances (on the same machine as the worker instances) that binds to a 
 single socket. It would route to the N workers, and this is easily done 
 since the adapter lives on the same machine and knows the number of 
 cores. Connecting components could then simply connect to this adapter, 
 and not need to care about the number of remote cores.
 
 The question I have is what kind of overhead this introduces. An MxN set 
 of connections between M remote components and the N workers seems like 
 it would be far more efficient than M-1-N, which looks like a 
 bottleneck. But maybe in practice, if the routing is very simple, then 
 it becomes negligible?
 
 Justin

You want one worker per core? So this is all on a single system? Then why
not multithread your worker?

You have one main thread that handles the communication with the
outside using a ROUTER socket and talks to the worker threads using
inproc://. That way you have a single socket to connect to and
inproc:// avoids the overhead of retransmitting messages to other
processes or between systems.
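
For illustration, a minimal sketch of that layout in C++ (assumptions on my
side: request/reply traffic, libzmq >= 3.2 for zmq_proxy(), and the endpoint
names are made up):

#include <zmq.h>
#include <thread>
#include <vector>

// Worker thread: plain REP socket on the shared context, connected via inproc://.
static void worker_routine (void *ctx)
{
    void *rep = zmq_socket (ctx, ZMQ_REP);
    zmq_connect (rep, "inproc://workers");
    while (true) {
        char buf [256];
        int n = zmq_recv (rep, buf, sizeof buf, 0);
        if (n < 0)
            break;                              // context terminated
        if (n > (int) sizeof buf)
            n = sizeof buf;                     // frame was truncated
        // ... do the actual work here ...
        zmq_send (rep, buf, n, 0);              // echo back as a placeholder
    }
    zmq_close (rep);
}

int main ()
{
    void *ctx = zmq_ctx_new ();

    // Single externally visible endpoint.
    void *frontend = zmq_socket (ctx, ZMQ_ROUTER);
    zmq_bind (frontend, "tcp://*:5555");

    // Internal fan-out to the worker threads (bind before the threads connect).
    void *backend = zmq_socket (ctx, ZMQ_DEALER);
    zmq_bind (backend, "inproc://workers");

    unsigned ncores = std::thread::hardware_concurrency ();
    if (ncores == 0)
        ncores = 4;                             // fallback if unknown
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < ncores; ++i)
        workers.emplace_back (worker_routine, ctx);

    // Shuttle messages between clients and workers; blocks until the
    // context is terminated, so the rest of main() is not reached here.
    zmq_proxy (frontend, backend, nullptr);
    return 0;
}

Clients see exactly one socket to connect to, and the only transport cost on
top of the TCP hop they already pay is the in-process queue.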

Regards,
Goswin


Re: [zeromq-dev] queue overhead

2014-07-28 Thread sven . koebnick
 

There is a queue device in zmq that you can use as a template for your own modifications.


I took it years ago and extended it with the following functionality:

- creating a queue device that supplies two endpoints others can connect to
  (one broadcast, one req/rep)
- creating two internal endpoints, one for broadcasting, one for req/rep
- instantiating multiple workers that connect to the two internal endpoints
- managing a list of ready workers so that messages are dispatched only to
  idle ones (a minimal sketch of this idea follows below)
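
For comparison, a minimal sketch of that ready-worker dispatch (essentially
the zguide load-balancing pattern, not sven's actual code; the endpoints, the
READY marker and the frame layout are illustrative and assume REQ workers
that announce themselves before taking work):

#include <zmq.h>
#include <deque>
#include <string>

// Receive one frame as a string (truncated to 256 bytes; fine for a sketch).
static std::string recv_str (void *sock)
{
    char buf [256];
    int n = zmq_recv (sock, buf, sizeof buf, 0);
    if (n < 0)
        return std::string ();
    if ((size_t) n > sizeof buf)
        n = sizeof buf;
    return std::string (buf, (size_t) n);
}

int main ()
{
    void *ctx = zmq_ctx_new ();
    void *frontend = zmq_socket (ctx, ZMQ_ROUTER);  // clients connect here
    void *backend  = zmq_socket (ctx, ZMQ_ROUTER);  // workers connect here
    zmq_bind (frontend, "tcp://*:5555");
    zmq_bind (backend,  "tcp://*:5556");

    std::deque<std::string> ready;                  // identities of idle workers

    while (true) {
        zmq_pollitem_t items [] = {
            { backend,  0, ZMQ_POLLIN, 0 },
            { frontend, 0, ZMQ_POLLIN, 0 },
        };
        // Poll the frontend only while at least one worker is idle.
        zmq_poll (items, ready.empty () ? 1 : 2, -1);

        if (items [0].revents & ZMQ_POLLIN) {           // worker -> broker
            std::string worker = recv_str (backend);    // worker identity
            recv_str (backend);                         // empty delimiter
            std::string head = recv_str (backend);      // "READY" or client identity
            ready.push_back (worker);                   // worker is idle again
            if (head != "READY") {                      // it is a reply: forward it
                recv_str (backend);                     // empty delimiter
                std::string reply = recv_str (backend);
                zmq_send (frontend, head.data (), head.size (), ZMQ_SNDMORE);
                zmq_send (frontend, "", 0, ZMQ_SNDMORE);
                zmq_send (frontend, reply.data (), reply.size (), 0);
            }
        }
        if (!ready.empty () && (items [1].revents & ZMQ_POLLIN)) {
            std::string client  = recv_str (frontend);  // client identity
            recv_str (frontend);                        // empty delimiter
            std::string request = recv_str (frontend);
            std::string worker  = ready.front ();       // pick an idle worker
            ready.pop_front ();
            zmq_send (backend, worker.data (),  worker.size (),  ZMQ_SNDMORE);
            zmq_send (backend, "", 0, ZMQ_SNDMORE);
            zmq_send (backend, client.data (),  client.size (),  ZMQ_SNDMORE);
            zmq_send (backend, "", 0, ZMQ_SNDMORE);
            zmq_send (backend, request.data (), request.size (), 0);
        }
    }
}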

I would have attached the code here, but the queue logic is so mixed up with
my own service concept that it would be of little use to you.

To handle the several endpoints of different services, I used a central
dispatcher that is used by all external programs etc. and that knows where to
send which messages (including subscription topics on the broadcast channel
to target a specific service instance).

This has worked fine for about 4 years now with a really old zmq.

I'm just about to check whether I could drop my own code when using a new
version of zmq ;o)

^5 

sven 

On 2014-07-27 20:13, Justin Karneges wrote:

 I have a stable (in the addressing sense) worker that I want to take
 advantage of multiple cores. So, I run N instances of this worker, where
 N is the number of cores on the host machine, and each worker binds to
 its own socket. Components that wish to make use of this worker service
 connect to all N worker instances.

 Unfortunately this is a little awkward. The connecting components must
 be configured with the N socket specs. And it's hard to automate this,
 since even if the connecting components could generate socket specs
 programmatically, this still requires knowing the number of cores of the
 remote machine.

 What I'd like to do is put an adapter component in front of the N worker
 instances (on the same machine as the worker instances) that binds to a
 single socket. It would route to the N workers, and this is easily done
 since the adapter lives on the same machine and knows the number of
 cores. Connecting components could then simply connect to this adapter,
 and not need to care about the number of remote cores.

 The question I have is what kind of overhead this introduces. An MxN set
 of connections between M remote components and the N workers seems like
 it would be far more efficient than M-1-N, which looks like a
 bottleneck. But maybe in practice, if the routing is very simple, then
 it becomes negligible?

 Justin



Re: [zeromq-dev] queue overhead

2014-07-28 Thread artemv zmq
Putting the router process on the same box where your workers are is (in my
experience) a bad idea if you strive for best-of-the-best perf.
Routing is not negligible: it's a process with always-running threads, so it
eats CPU all the time. So now, on a single machine, you have always-busy
worker processes and an always-busy router, hence a high rate of context
switching.

--
artemv


2014-07-27 21:13 GMT+03:00 Justin Karneges jus...@affinix.com:

 I have a stable (in the addressing sense) worker that I want to take
 advantage of multiple cores. So, I run N instances of this worker, where
 N is the number of cores on the host machine, and each worker binds to
 its own socket. Components that wish to make use of this worker service
 connect to all N worker instances.

 Unfortunately this is a little awkward. The connecting components must
 be configured with the N socket specs. And it's hard to automate this,
 since even if the connecting components could generate socket specs
 programmatically, this still requires knowing the number of cores of the
 remote machine.

 What I'd like to do is put an adapter component in front of the N worker
 instances (on the same machine as the worker instances) that binds to a
 single socket. It would route to the N workers, and this is easily done
 since the adapter lives on the same machine and knows the number of
 cores. Connecting components could then simply connect to this adapter,
 and not need to care about the number of remote cores.

 The question I have is what kind of overhead this introduces. An MxN set
 of connections between M remote components and the N workers seems like
 it would be far more efficient than M-1-N, which looks like a
 bottleneck. But maybe in practice, if the routing is very simple, then
 it becomes negligible?

 Justin



Re: [zeromq-dev] queue overhead

2014-07-28 Thread sven
 

... found some ==old== (but working) snippet of a queue device that
dispatches to several workers using pub and req/rep sockets.

Warning: the workers have to supply those READYforWORK dummy messages to
maintain the ready/busy list in the queue.

I attached my modified code for zmq's queue device ... maybe it helps you a
bit ;o)
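
To make that concrete before the attached dispatcher code: a minimal
worker-side sketch of the READY handshake, assuming a REQ worker socket and a
broker like the one sketched earlier in this thread (the endpoint and
payloads are illustrative; sven's code uses a READYforWORK dummy rather than
the plain "READY" below):

#include <zmq.h>

int main ()
{
    void *ctx = zmq_ctx_new ();
    void *req = zmq_socket (ctx, ZMQ_REQ);
    zmq_connect (req, "tcp://localhost:5556");

    // Announce idleness once; the broker adds this worker to its ready list.
    zmq_send (req, "READY", 5, 0);

    while (true) {
        // The broker forwards [client identity][empty][request].
        char client [256], empty [8], request [256];
        int cn = zmq_recv (req, client, sizeof client, 0);
        zmq_recv (req, empty, sizeof empty, 0);
        int rn = zmq_recv (req, request, sizeof request, 0);
        if (cn < 0 || rn < 0)
            break;                                  // context terminated
        if (cn > (int) sizeof client)  cn = sizeof client;
        if (rn > (int) sizeof request) rn = sizeof request;

        // ... do the actual work on `request` here ...

        // Keep the client envelope so the broker can route the reply back;
        // seeing a reply also tells the broker this worker is idle again.
        zmq_send (req, client, cn, ZMQ_SNDMORE);
        zmq_send (req, "", 0, ZMQ_SNDMORE);
        zmq_send (req, request, rn, 0);             // echo as a placeholder
    }
    zmq_close (req);
    zmq_ctx_term (ctx);
}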

On 2014-07-27 20:13, Justin Karneges wrote:

 I have a stable (in the addressing sense) worker that I want to take
 advantage of multiple cores. So, I run N instances of this worker, where
 N is the number of cores on the host machine, and each worker binds to
 its own socket. Components that wish to make use of this worker service
 connect to all N worker instances.

 Unfortunately this is a little awkward. The connecting components must
 be configured with the N socket specs. And it's hard to automate this,
 since even if the connecting components could generate socket specs
 programmatically, this still requires knowing the number of cores of the
 remote machine.

 What I'd like to do is put an adapter component in front of the N worker
 instances (on the same machine as the worker instances) that binds to a
 single socket. It would route to the N workers, and this is easily done
 since the adapter lives on the same machine and knows the number of
 cores. Connecting components could then simply connect to this adapter,
 and not need to care about the number of remote cores.

 The question I have is what kind of overhead this introduces. An MxN set
 of connections between M remote components and the N workers seems like
 it would be far more efficient than M-1-N, which looks like a
 bottleneck. But maybe in practice, if the routing is very simple, then
 it becomes negligible?

 Justin

//#include "/home/sven/Download/zeromq-2.0.10/src/socket_base.hpp"
//#include "/home/sven/Download/zeromq-2.0.10/src/err.hpp"

namespace zmq
{

int ownZmqQueue (
    class socket_base_t*,
    class socket_base_t*,
    class socket_base_t*,
    class socket_base_t*,
    class socket_base_t*,
    class socket_base_t*);
extern bool zmq_abort_on_queue_error;
extern bool zmq_end_queue_devices;
}
/*
    Copyright (c) 2007-2010 iMatix Corporation

    This file is part of 0MQ.

    0MQ is free software; you can redistribute it and/or modify it under
    the terms of the Lesser GNU General Public License as published by
    the Free Software Foundation; either version 3 of the License, or
    (at your option) any later version.

    0MQ is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    Lesser GNU General Public License for more details.

    You should have received a copy of the Lesser GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.
*/

#include <stddef.h>
#include <iostream>
#include <set>

#include "../include/zmq.h"

#include "ownZmqQueue.h"
#include "../general/logging.h"
#include "zhelpers.h"

#include "zmsg.h"

using namespace std;

namespace zmq {
LOGGER_INIT_C(ZmqQueue);

#define NBR_WORKERS 99

int ownZmqQueue (
    class socket_base_t *clsocket_,
    class socket_base_t *wosocket_,
    class socket_base_t *sub_clients,
    class socket_base_t *sub_workers,
    class socket_base_t *pub_clients,
    class socket_base_t *pub_workers)
{
    bool zmq_end_queue_device = false;

    //  Queue of available (idle) workers.
    int available_workers = 0;
    std::string *worker_queue [NBR_WORKERS];    // max. 99 threads (senselessly high)
    memset (worker_queue, 0, NBR_WORKERS * sizeof (char*));
    std::set<std::string> known_workers;

    zmq_msg_t msg;
    int rc = zmq_msg_init (&msg);
    if (rc != 0) {
        logFatal ("zmq_msg could not be initialized (errno=" << errno << "," << strerror (errno) << ")");
        return errno;
    }

    int numPolls = 2;
    zmq_pollitem_t allitems [4] = {
        //  The publishing port of the dispatcher, if this IS a dispatcher.
        { pub_workers, 0, ZMQ_POLLIN, 0 },
        //  Always poll for worker activity on the backend.
        { wosocket_, 0, ZMQ_POLLIN, 0 },
        //  Poll the front-end only if we have available workers

Re: [zeromq-dev] queue overhead

2014-07-28 Thread Charles Remes
Justin,

The way you would structure this is by setting up a zmq_proxy() that would bind 
to the “well known” port to receive tasks. On the back end, the proxy would 
also bind to a local tcp port (or via ipc, perf should be the same). The 
workers on the box would all connect to this backend port. The proxy would just 
receive incoming messages and then dole them out to the workers. If you use 
push/pull sockets, it will round robin them. If you need something more 
sophisticated, you can build your own proxy component and make the routing and 
load balancing logic as smart (or as dumb) as you like.
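
A minimal sketch of that proxy, as I read it (the port number and ipc path
are made up; it assumes one-way task distribution with PUSH/PULL, so replies
would need a separate return path):

#include <zmq.h>

int main ()
{
    void *ctx = zmq_ctx_new ();

    // "Well known" front-end port that remote components connect to.
    void *frontend = zmq_socket (ctx, ZMQ_PULL);
    zmq_bind (frontend, "tcp://*:5555");

    // Local back-end; the workers on this box connect here.
    void *backend = zmq_socket (ctx, ZMQ_PUSH);
    zmq_bind (backend, "ipc:///tmp/tasks.ipc");

    // Shuttle messages frontend -> backend; PUSH round-robins across the
    // connected workers. Blocks until the context is terminated.
    zmq_proxy (frontend, backend, nullptr);

    zmq_close (frontend);
    zmq_close (backend);
    zmq_ctx_term (ctx);
    return 0;
}

Each worker is then just a PULL socket connected to ipc:///tmp/tasks.ipc;
adding or removing workers on the box needs no change on the remote
components.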

The perf hit will be measurable, but it's likely on the order of a millisecond 
or two. It's impossible to be more definitive than that since it is very 
dependent upon your hardware. If this is on a dedicated box with 10Gb 
interfaces, the router will be very fast. If this is on a Micro AWS instance in 
the cloud, then the router will probably take a few tens of milliseconds to do 
its work. If you use TCP endpoints, you can start to spread your workers over 
multiple boxes and scale horizontally. It's pretty neat.

None of this is difficult, so benchmark it on your own hardware to determine 
the exact overhead. I think the flexibility that a configuration like this 
offers is well worth a tiny overhead.

If you need redundancy, then I suggest reading through the advanced patterns in 
the zguide. All of this is covered there in detail, usually with working code.

cr

On Jul 27, 2014, at 1:13 PM, Justin Karneges jus...@affinix.com wrote:

 I have a stable (in the addressing sense) worker that I want to take 
 advantage of multiple cores. So, I run N instances of this worker, where 
 N is the number of cores on the host machine, and each worker binds to 
 its own socket. Components that wish to make use of this worker service 
 connect to all N worker instances.
 
 Unfortunately this is a little awkward. The connecting components must 
 be configured with the N socket specs. And it's hard to automate this, 
 since even if the connecting components could generate socket specs 
 programmatically, this still requires knowing the number of cores of the 
 remote machine.
 
 What I'd like to do is put an adapter component in front of the N worker 
 instances (on the same machine as the worker instances) that binds to a 
 single socket. It would route to the N workers, and this is easily done 
 since the adapter lives on the same machine and knows the number of 
 cores. Connecting components could then simply connect to this adapter, 
 and not need to care about the number of remote cores.
 
 The question I have is what kind of overhead this introduces. An MxN set 
 of connections between M remote components and the N workers seems like 
 it would be far more efficient than M-1-N, which looks like a 
 bottleneck. But maybe in practice, if the routing is very simple, then 
 it becomes negligible?
 
 Justin
 


[zeromq-dev] queue overhead

2014-07-27 Thread Justin Karneges
I have a stable (in the addressing sense) worker that I want to take 
advantage of multiple cores. So, I run N instances of this worker, where 
N is the number of cores on the host machine, and each worker binds to 
its own socket. Components that wish to make use of this worker service 
connect to all N worker instances.

Unfortunately this is a little awkward. The connecting components must 
be configured with the N socket specs. And it's hard to automate this, 
since even if the connecting components could generate socket specs 
programmatically, this still requires knowing the number of cores of the 
remote machine.

What I'd like to do is put an adapter component in front of the N worker 
instances (on the same machine as the worker instances) that binds to a 
single socket. It would route to the N workers, and this is easily done 
since the adapter lives on the same machine and knows the number of 
cores. Connecting components could then simply connect to this adapter, 
and not need to care about the number of remote cores.

The question I have is what kind of overhead this introduces. An MxN set 
of connections between M remote components and the N workers seems like 
it would be far more efficient than M-1-N, which looks like a 
bottleneck. But maybe in practice, if the routing is very simple, then 
it becomes negligible?

Justin
