>> sometimes having more threads than the number of cores is
>> desirable when you have a thread pool with worker threads doing
>> network/disk I/O
Yep, totally agreed, and this is certainly an intuitive way of programming.
But in all of these systems, either each worker thread has exclusive ownership
over the file descriptors/sockets it uses, or you have to introduce locking
around those shared resources.
The latter is typically a much larger performance drain, so that's why Thrift
doesn't do it by default (or at all - it is up to the application layer to lock
appropriately).
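A minimal sketch of the "each thread owns its own transport" option, in Python for brevity. FakeTransport, get_transport, the host, and the port are all hypothetical stand-ins, not Thrift APIs; a real client would open an actual Thrift transport where the fake one is constructed.

```python
import threading


class FakeTransport:
    """Hypothetical stand-in for a Thrift transport (a thin socket wrapper)."""

    def __init__(self, host, port):
        self.host, self.port = host, port
        self.owner = threading.get_ident()

    def call(self, message):
        # A real transport would write to and read from its socket here.
        # No lock is needed because only the owning thread ever gets here.
        assert threading.get_ident() == self.owner, "used from another thread"
        return f"echo:{message}"


_local = threading.local()


def get_transport(host="localhost", port=9090):
    """Return the calling thread's private transport, creating it on first use."""
    if not hasattr(_local, "transport"):
        _local.transport = FakeTransport(host, port)
    return _local.transport


def worker(results, index):
    transport = get_transport()
    results[index] = (transport, transport.call(f"req-{index}"))


def run_demo(n=4):
    results = [None] * n
    threads = [threading.Thread(target=worker, args=(results, i)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each worker ends up with a distinct transport object, so no locking is needed around the socket; the cost is one connection per thread.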
>> I suppose it is possible to keep most of the code in a worker thread model
>> and just use a queue of requests at the lowest level where I need to
>> make Thrift RPC calls
Yep, exactly. Each worker thread can have its own semaphore. You do all your
application logic in worker threads. When a worker thread needs a network thing
to happen, it places the request in a queue for the network thread, along with
its semaphore. The network thread runs an infinite loop, pulling items off the
queue and signaling the semaphore when they are complete. Your worker thread
code can still *look* synchronous and not have any callbacks.
Here is really rough pseudocode:
  network_queue;

  worker_thread {
    semaphore s;  // count starts at 0

    run() {
      // do application logic
      request = thrift_request();
      request.semaphore = s;
      lock(network_queue) {
        network_queue.enqueue(request);
      }
      // block until a networker signals
      s.wait();
      // request is now populated with our response data
      // more application logic
      return whatever;
    }
  }

  network_thread {
    while (true) {
      lock(network_queue) {
        // assume dequeue() waits (releasing the lock) until a request arrives
        request = network_queue.dequeue();
      }
      request.process();
      request.semaphore.signal();
    }
  }
You can have multiple network threads all working on this queue. It typically
makes sense to have as many networker threads as you have cores.
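For something runnable, here is a hedged Python sketch of the same queue-plus-semaphore pattern. The Request class, its process() body, and run_demo are hypothetical placeholders (the "RPC" just upper-cases a string); only the queue and semaphore handling is the point. It uses a single network thread and a None sentinel for shutdown, which is an assumption of this sketch and not part of the design above.

```python
import queue
import threading


class Request:
    """Hypothetical request object; stands in for a real Thrift call."""

    def __init__(self, payload, done):
        self.payload = payload
        self.done = done        # the worker's semaphore, count starts at 0
        self.response = None

    def process(self):
        # Placeholder for the real RPC on the network thread's transport.
        self.response = self.payload.upper()


network_queue = queue.Queue()   # thread-safe, so no explicit lock is needed


def network_thread():
    while True:
        request = network_queue.get()   # blocks until a request arrives
        if request is None:             # sentinel: shut down
            break
        request.process()
        request.done.release()          # signal the waiting worker


def worker(payload, results, index):
    # ... application logic ...
    done = threading.Semaphore(0)
    request = Request(payload, done)
    network_queue.put(request)
    done.acquire()                      # block until the networker signals
    results[index] = request.response   # response is now populated


def run_demo(payloads):
    results = [None] * len(payloads)
    networker = threading.Thread(target=network_thread)
    networker.start()
    workers = [
        threading.Thread(target=worker, args=(p, results, i))
        for i, p in enumerate(payloads)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    network_queue.put(None)             # stop the network thread
    networker.join()
    return results
```

To run several network threads, start more of them on the same queue and enqueue one None sentinel per thread at shutdown. Each network thread would then own its own transport.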
Cheers,
mcslee
________________________________________
From: Akshat Aranya [[email protected]]
Sent: Wednesday, August 08, 2012 2:06 PM
To: [email protected]
Subject: Re: Using multi-threaded clients with Thrift
On Wed, Aug 8, 2012 at 4:55 PM, Mark Slee <[email protected]> wrote:
> The Thrift transport layer is not thread-safe. It is essentially a wrapper on
> a socket.
>
> You can't interleave writing things to a single socket from multiple threads
> without locking. You also don't know what order the responses will come back
> in. Each thread is effectively calling read(). To have this work in a
> multi-threaded environment would require another layer of abstraction that
> parceled out responses on the socket and determined which data should go to
> which thread. This would be less efficient in the common case of a single
> transport per thread.
>
> You certainly could build this functionality on top of the Thrift
> abstractions, but the base layers are designed to be very lightweight and
> pretty closely mimic raw sockets.
>
>>> If so, is the only way to make it work in a
>>> multi-threaded environment is to use an independent connection (i.e.,
>>> a new Transport) per thread? That seems kind of wasteful and
>>> inefficient.
>
> In practice, assuming your number of threads is on the order of your number of
> cores, this is not inefficient and additional sockets aren't very expensive.
> Having each thread own its own socket obviates the need for locking around
> all accesses to the shared socket resource, which tends to be much more costly.
>
> Another common design in a multi-threaded environment is to have a single
> networker thread (or a low fixed number of them). This thread owns a
> transport, and the clients put in requests to this thread to perform an
> operation and then block, waiting to receive a callback when the operation
> they requested is complete.
>
> Cheers,
> mcslee
> ________________________________________
Thanks for the information, Mark. This is not a criticism of Thrift,
but sometimes having more threads than the number of cores is
desirable when you have a thread pool with worker threads doing
network/disk I/O. I have programmed with both a message passing model
and a worker thread model, and in my opinion, the latter results in
more readable code and is easier to program with, especially with
languages such as C++ that require explicit memory management. I
suppose it is possible to keep most of the code in a worker thread model
and just use a queue of requests at the lowest level where I need to
make Thrift RPC calls. I was hoping that Thrift would do this for me
out of the box. :-D
Cheers,
Akshat