Re: ConcurrentLinkedQueue vs MpscChunkedArrayQueue

2016-10-17 Thread Guido Medina
For a pool of buffers, it is usually better to use a bounded MpMc queue with
some pre-allocated capacity.
It doesn't need to be synchronized; it only needs to be used like any other
pool, in that:

- You take or create.
- Use and pass around.
- And finally you offer it back to the queue.

By default you can use a ConcurrentLinkedQueue for that, and later you can
easily move to a bounded MpSc queue.
Why bounded? If you happen to slow down processing for whatever reason, you
don't want your pool to grow beyond N capacity.

It is tricky, but it has been done before and it shouldn't be a problem here,
though it requires try...finally to make sure you return the buffer to the
queue (see the sketch below).
Bounded non-blocking queues are perfect because they do not throw an
exception when the queue is empty, or when it is full in a slow-processing
scenario.
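
A minimal sketch of that pattern, assuming a plain ConcurrentLinkedQueue as a
starting point (the BufferPool class and its method names are hypothetical; a
bounded JCTools MpmcArrayQueue could be dropped in later to get the capacity
limit):

import java.nio.ByteBuffer;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

final class BufferPool {
    private final Queue<ByteBuffer> pool = new ConcurrentLinkedQueue<>();
    private final int bufferSize;

    BufferPool(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    /** Take a pooled buffer, or create one if the pool is empty. */
    ByteBuffer take() {
        ByteBuffer buf = pool.poll();
        return buf != null ? buf : ByteBuffer.allocate(bufferSize);
    }

    /** Offer the buffer back; a bounded queue would simply reject it when full. */
    void release(ByteBuffer buf) {
        buf.clear();
        pool.offer(buf); // offer() returns false instead of throwing when full
    }
}

And the usage, with the try...finally mentioned above:

BufferPool pool = new BufferPool(4096);
ByteBuffer buf = pool.take();
try {
    // use and pass around
} finally {
    pool.release(buf); // always return the buffer to the pool
}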

Guido.

On Mon, Oct 17, 2016 at 7:01 PM, Emmanuel Lécharny wrote:

>
>
> On 17/10/16 at 11:12, Guido Medina wrote:
> > Hi Emmanuel,
> >
> > At the mina-core class AbstractNioSession the variable:
> >
> > /** the queue of pending writes for the session, to be dequeued by
> > the {@link SelectorLoop} */
> > private final Queue<WriteRequest> writeQueue = new DefaultWriteQueue();
> >
> > Is such a queue consumed by a selector loop?
> It's consumed by the IoProcessor selector loop (so you have many of them).
>
> The thread that has been selected to process the read will go on up to
> the point where it has written the bytes into the write queue, then it goes
> back to the point it was called from, and then it processes the writes:
>
> select:
> 1) read -> head -> filter -> filter -> ... -> handler ->
> session.write(something) -> filter -> filter -> ... -> head -> put in
> write queue and return (unstacking all the calls)
> 2) write -> get the write queue, and write the data from it until the
> queue is empty or the socket is full (and in this case, set the OP_WRITE
> status)
>
>
>
> > That makes me think it is a single-threaded loop, hence making it MpSc
> > and ideal for low-GC optimization.
> > But maybe such an optimization is so unnoticeable that it is not worth it.
> Actually, we have as many threads as we have IoProcessor instances. The
> thing is that I'm not even sure that we need a synchronized queue, as the
> IoProcessor executes all the processing on its own thread. The
> ConcurrentLinkedQueue is there just because one can use an executor filter,
> which will spread the processing across many other threads, and then we
> *may* have more than one thread accessing this queue.
>
> Regarding GC, we have removed some useless object allocations in 2.0.15,
> so the GC should be under slightly less pressure.
>
> If you want to alleviate the GC load, I think there are other areas
> where some improvement can be made. Typically, there is a pooling
> BufferAllocator, the CachedBufferAllocator, that allows you to
> reuse buffers. Now, this is a tricky solution, as it has to be
> synchronized, so expect some bottleneck here. Ideally, we should have
> another implementation that uses thread-local storage, but that would
> be memory expensive.
>
>
> >
> > That's the only place I think it is worth replacing with a low-GC-footprint
> > queue; it would avoid the creation of GC-able linked nodes
> > (as in ConcurrentLinkedQueue).
> >
> > In fact, further in that logic you try to avoid writing to the queue if it
> > is empty by passing the message directly to the next handler, which is a
> > micro-optimization;
> > isEmpty() will turn out to be false in 99.99% of cases for systems under
> > high load.
> Well, it depends. The queue will be emptied if the socket is able to
> swallow the data. If your system is under high load, I suspect that all
> the sockets will be full already, so you'll end up with some bigger
> problem! (Typically, the queue will grow, and at some point, you'll get
> an OOM... I have already experienced that, so yes, it may happen.)
>
> My point is that you should always expect that your OS and your network
> are capable of allocating enough buffers for the sockets, and have enough
> bandwidth to send them fast enough so that the socket buffer can always
> accept any write done on it. In that case, the write queue will always
> be empty, except if you are writing a huge message (typically above the
> socket buffer size, which defaults to 1 KB - from the top of my head - but
> that you can set up to 64 KB, more or less). Even on a loaded system.
>
>


Re: ConcurrentLinkedQueue vs MpscChunkedArrayQueue

2016-10-17 Thread Emmanuel Lécharny


On 17/10/16 at 11:35, Guido Medina wrote:
> And DefaultIoSessionDataStructureFactory has the following static inner
> class:
>
> private static class DefaultWriteRequestQueue implements WriteRequestQueue {
>     /** A queue to store incoming write requests */
>     private final Queue<WriteRequest> q = new ConcurrentLinkedQueue<WriteRequest>();
>
> I'm assuming that's also a queue consumed by a loop thread; in fact, anything
> that is consumed by a single loop thread and has a chance of receiving many
> messages can be optimized with a low-GC MpSc queue from the JCTools APIs.
> You can look at the other queues they have available, but TBH I prefer the
> array-based queues, as they have practically zero GC impact:
> https://github.com/JCTools/JCTools/tree/master/jctools-core/src/main/java/org/jctools/queues

Maybe.

I would suggest that you do your experiment, and measure the improvement
you get with the JCTools queue. If it's any better, we would be pleased
to swap what we have for something better :-)

It's pretty hard to tell what the hypothetical gain would be *in
theory*; this has to be tested in silico.
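
For reference, one way such an experiment might start: a minimal JMH sketch
(the class is hypothetical, and a credible comparison would also need
multi-threaded producer trials and the GC profiler, -prof gc):

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

import org.jctools.queues.MpscChunkedArrayQueue;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class QueueBench {
    private Queue<Object> clq;
    private Queue<Object> mpsc;
    private final Object msg = new Object();

    @Setup
    public void setup() {
        clq = new ConcurrentLinkedQueue<>();
        mpsc = new MpscChunkedArrayQueue<>(1024, 65536); // capacities are arbitrary
    }

    @Benchmark
    public Object clqOfferPoll() {
        clq.offer(msg);
        return clq.poll(); // returning the result prevents dead-code elimination
    }

    @Benchmark
    public Object mpscOfferPoll() {
        mpsc.offer(msg);
        return mpsc.poll();
    }
}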

Now, MINA is an OSS project, so we warmly welcome any contribution !
Keep in mind this is also *your* project, if you want to participate !


Re: ConcurrentLinkedQueue vs MpscChunkedArrayQueue

2016-10-17 Thread Emmanuel Lécharny


On 17/10/16 at 11:12, Guido Medina wrote:
> Hi Emmanuel,
>
> At the mina-core class AbstractNioSession the variable:
>
> /** the queue of pending writes for the session, to be dequeued by the
> {@link SelectorLoop} */
> private final Queue<WriteRequest> writeQueue = new DefaultWriteQueue();
>
> Is such a queue consumed by a selector loop?
It's consumed by the IoProcessor selector loop (so you have many of them).

The thread that has been selected to process the read will go on up to
the point where it has written the bytes into the write queue, then it goes
back to the point it was called from, and then it processes the writes:

select:
1) read -> head -> filter -> filter -> ... -> handler ->
session.write(something) -> filter -> filter -> ... -> head -> put in
write queue and return (unstacking all the calls)
2) write -> get the write queue, and write the data from it until the
queue is empty or the socket is full (and in this case, set the OP_WRITE
status)
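
A heavily simplified, hypothetical sketch of that two-phase loop (the class
and helper names are invented for illustration; the real IoProcessor is far
more involved):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.Queue;

final class MiniIoProcessorLoop {
    private final Selector selector;

    MiniIoProcessorLoop(Selector selector) {
        this.selector = selector;
    }

    void run() throws IOException {
        while (true) {
            selector.select();
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isReadable()) {
                    readAndFireChain(key); // 1) read -> filters -> handler
                }
                flushWriteQueue(key);      // 2) then drain the write queue
            }
        }
    }

    private void readAndFireChain(SelectionKey key) throws IOException {
        SocketChannel channel = (SocketChannel) key.channel();
        ByteBuffer buffer = ByteBuffer.allocate(4096);
        if (channel.read(buffer) > 0) {
            buffer.flip();
            // push the bytes through the filter chain up to the handler here;
            // session.write() only enqueues into the write queue and returns
        }
    }

    @SuppressWarnings("unchecked")
    private void flushWriteQueue(SelectionKey key) throws IOException {
        SocketChannel channel = (SocketChannel) key.channel();
        Queue<ByteBuffer> writeQueue = (Queue<ByteBuffer>) key.attachment();
        ByteBuffer buffer;
        while ((buffer = writeQueue.peek()) != null) {
            channel.write(buffer);
            if (buffer.hasRemaining()) {
                // socket buffer is full: register interest in OP_WRITE and stop
                key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
                return;
            }
            writeQueue.poll();
        }
        // queue drained: stop listening for OP_WRITE
        key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
    }
}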



> That makes me think it is a single-threaded loop, hence making it MpSc and
> ideal for low-GC optimization.
> But maybe such an optimization is so unnoticeable that it is not worth it.
Actually, we have as many threads as we have IoProcessor instances. The
thing is that I'm not even sure that we need a synchronized queue, as the
IoProcessor executes all the processing on its own thread. The
ConcurrentLinkedQueue is there just because one can use an executor filter,
which will spread the processing across many other threads, and then we
*may* have more than one thread accessing this queue.

Regarding GC, we have removed some useless object allocations in 2.0.15,
so the GC should be under slightly less pressure.

If you want to alleviate the GC load, I think there are other areas
where some improvement can be made. Typically, there is a pooling
BufferAllocator, the CachedBufferAllocator, that allows you to
reuse buffers. Now, this is a tricky solution, as it has to be
synchronized, so expect some bottleneck here. Ideally, we should have
another implementation that uses thread-local storage, but that would
be memory expensive.
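
For what it's worth, a minimal sketch of that thread-local alternative (the
class name is hypothetical; note that each thread pays for its own buffer,
and a buffer obtained this way must not be handed off to another thread):

import java.nio.ByteBuffer;

final class ThreadLocalBufferAllocator {
    private static final int SIZE = 64 * 1024; // arbitrary example size

    private static final ThreadLocal<ByteBuffer> BUFFER =
            ThreadLocal.withInitial(() -> ByteBuffer.allocateDirect(SIZE));

    /** No synchronization needed: each thread only ever sees its own buffer. */
    static ByteBuffer acquire() {
        ByteBuffer buffer = BUFFER.get();
        buffer.clear();
        return buffer;
    }
}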


>
> That's the only place I think it is worth replacing with a low-GC-footprint
> queue; it would avoid the creation of GC-able linked nodes
> (as in ConcurrentLinkedQueue).
>
> In fact, further in that logic you try to avoid writing to the queue if it is
> empty by passing the message directly to the next handler, which is a
> micro-optimization;
> isEmpty() will turn out to be false in 99.99% of cases for systems under high
> load.
Well, it depends. The queue will be emptied if the socket is able to
swallow the data. If your system is under high load, I suspect that all
the sockets will be full already, so you'll end up with some bigger
problem! (Typically, the queue will grow, and at some point, you'll get
an OOM... I have already experienced that, so yes, it may happen.)

My point is that you should always expect that your OS and your network
are capable of allocating enough buffers for the sockets, and have enough
bandwidth to send them fast enough so that the socket buffer can always
accept any write done on it. In that case, the write queue will always
be empty, except if you are writing a huge message (typically above the
socket buffer size, which defaults to 1 KB - from the top of my head - but
that you can set up to 64 KB, more or less). Even on a loaded system.
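
As an illustration of that last point, enlarging the send buffer is a
one-line tuning knob (a sketch on a plain java.net.Socket; MINA 2 exposes
the same setting through the session's SocketSessionConfig):

import java.io.IOException;
import java.net.Socket;

final class SocketTuning {
    static void enlargeSendBuffer(Socket socket) throws IOException {
        // A bigger send buffer lets the OS absorb larger writes, so the
        // session write queue stays empty most of the time. This is only
        // a hint: the OS may cap or round the value.
        socket.setSendBufferSize(64 * 1024);
    }
}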



Re: ConcurrentLinkedQueue vs MpscChunkedArrayQueue

2016-10-17 Thread Guido Medina
And DefaultIoSessionDataStructureFactory has the following static inner
class:

private static class DefaultWriteRequestQueue implements WriteRequestQueue {
    /** A queue to store incoming write requests */
    private final Queue<WriteRequest> q = new ConcurrentLinkedQueue<WriteRequest>();

I'm assuming that's also a queue consumed by a loop thread; in fact, anything
that is consumed by a single loop thread and has a chance of receiving many
messages can be optimized with a low-GC MpSc queue from the JCTools APIs.
You can look at the other queues they have available, but TBH I prefer the
array-based queues, as they have practically zero GC impact (see the sketch
below):
https://github.com/JCTools/JCTools/tree/master/jctools-core/src/main/java/org/jctools/queues
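
For illustration, here is roughly what those drop-ins look like (capacities
are arbitrary examples, and Object stands in for the real element type to
keep the snippet self-contained):

import java.util.Queue;

import org.jctools.queues.MpscArrayQueue;
import org.jctools.queues.MpscChunkedArrayQueue;

public class JcToolsQueues {
    public static void main(String[] args) {
        // Fixed capacity, no allocation on the hot path:
        Queue<Object> bounded = new MpscArrayQueue<>(4096);
        // Grows in linked chunks from 1024 up to a 65536 cap:
        Queue<Object> chunked = new MpscChunkedArrayQueue<>(1024, 65536);

        bounded.offer("a message");
        chunked.offer("another message");
        System.out.println(bounded.poll() + " / " + chunked.poll());
    }
}

Both queues assume exactly one consumer thread: offering is safe from any
number of threads, but poll() must stay on a single thread.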

Regards,
Guido.


On Mon, Oct 17, 2016 at 10:29 AM, Guido Medina  wrote:

> Sorry, wrong branch, that's on the current master (trunk); for our case it
> would be the following on the 2.0.15 branch:
>
> public abstract class AbstractProtocolEncoderOutput implements
>         ProtocolEncoderOutput {
>     private final Queue<Object> messageQueue = new ConcurrentLinkedQueue<Object>();
>     ...
> }
>
> The messageQueue variable is the candidate for this kind of optimization,
> assuming it is consumed by a single loop thread.
>
> Regards,
>
> Guido.
>
> On Mon, Oct 17, 2016 at 10:12 AM, Guido Medina  wrote:
>
>> Hi Emmanuel,
>>
>> At the mina-core class AbstractNioSession the variable:
>>
>> /** the queue of pending writes for the session, to be dequeued by
>> the {@link SelectorLoop} */
>> private final Queue<WriteRequest> writeQueue = new DefaultWriteQueue();
>>
>> Is such a queue consumed by a selector loop? That makes me think it
>> is a single-threaded loop, hence making it MpSc and ideal for low-GC
>> optimization.
>> But maybe such an optimization is so unnoticeable that it is not worth it.
>>
>> That's the only place I think it is worth replacing with a low-GC-footprint
>> queue; it would avoid the creation of GC-able linked nodes
>> (as in ConcurrentLinkedQueue).
>>
>> In fact, further in that logic you try to avoid writing to the queue if it
>> is empty by passing the message directly to the next handler, which is a
>> micro-optimization;
>> isEmpty() will turn out to be false in 99.99% of cases for systems under
>> high load.
>>
>> WDYT?
>>
>> Guido.
>>
>> On Sat, Oct 15, 2016 at 8:12 PM, Guido Medina  wrote:
>>
>>> I will take a look again at the source code, but not today; I will let
>>> you know on Monday if it is applicable to MINA core. It seems it is not
>>> the case:
>>> my application is simply forwarding each decoded FIX message to an Akka
>>> actor, which is backed by a high-performance queue.
>>> I was thinking (will double-check) that these ByteBuffers were queued
>>> somehow before being picked up by the handlers, which is where a
>>> non-blocking MpSc would play a role.
>>>
>>> But maybe I misunderstood the code I saw.
>>>
>>> I will check again and let you know,
>>>
>>> Have a nice weekend,
>>>
>>> Guido.
>>>
>>> On Sat, Oct 15, 2016 at 7:33 PM, Emmanuel Lecharny wrote:
>>>
 To be clear: when some sockets are ready for read (i.e., the OP_READ flag
 has been set, and there is something in the socket to be read), the
 IoProcessor call to select() will return and we will have a set of
 SelectionKeys returned. This set contains all the channels that are ready
 for some processing. The IoProcessor thread will process them one after
 the other, from top to bottom. That means we don't process multiple
 sessions in parallel when all those sessions are handled by one single
 IoProcessor. You have to be careful in what you do in your application,
 because any costly processing, or any synchronous access to a remote
 system, will block the other sessions' processing.

 Now, we always start the server with more than one IoProcessor (typically,
 number of cores + 1 IoProcessors). You can also configure a higher number
 of IoProcessors if you like, but at some point, if your CPU is 100% used,
 adding more IoProcessors does not help.

 What kind of performance are you expecting to reach? (i.e., how many
 requests per second?)

>>>
>>>
>>
>


Re: ConcurrentLinkedQueue vs MpscChunkedArrayQueue

2016-10-17 Thread Guido Medina
Sorry, wrong branch, that's on the current master (trunk); for our case it
would be the following on the 2.0.15 branch:

public abstract class AbstractProtocolEncoderOutput implements
        ProtocolEncoderOutput {
    private final Queue<Object> messageQueue = new ConcurrentLinkedQueue<Object>();
    ...
}

The messageQueue variable is the candidate for this kind of optimization,
assuming it is consumed by a single loop thread.

Regards,

Guido.

On Mon, Oct 17, 2016 at 10:12 AM, Guido Medina  wrote:

> Hi Emmanuel,
>
> At the mina-core class AbstractNioSession the variable:
>
> /** the queue of pending writes for the session, to be dequeued by the
> {@link SelectorLoop} */
> private final Queue<WriteRequest> writeQueue = new DefaultWriteQueue();
>
> Is such a queue consumed by a selector loop? That makes me think it
> is a single-threaded loop, hence making it MpSc and ideal for low-GC
> optimization.
> But maybe such an optimization is so unnoticeable that it is not worth it.
>
> That's the only place I think it is worth replacing with a low-GC-footprint
> queue; it would avoid the creation of GC-able linked nodes
> (as in ConcurrentLinkedQueue).
>
> In fact, further in that logic you try to avoid writing to the queue if it is
> empty by passing the message directly to the next handler, which is a
> micro-optimization;
> isEmpty() will turn out to be false in 99.99% of cases for systems under
> high load.
>
> WDYT?
>
> Guido.
>
> On Sat, Oct 15, 2016 at 8:12 PM, Guido Medina  wrote:
>
>> I will take a look again at the source code, but not today; I will let you
>> know on Monday if it is applicable to MINA core. It seems it is not the case:
>> my application is simply forwarding each decoded FIX message to an Akka
>> actor, which is backed by a high-performance queue.
>> I was thinking (will double-check) that these ByteBuffers were queued somehow
>> before being picked up by the handlers, which is where a non-blocking MpSc
>> would play a role.
>>
>> But maybe I misunderstood the code I saw.
>>
>> I will check again and let you know,
>>
>> Have a nice weekend,
>>
>> Guido.
>>
>> On Sat, Oct 15, 2016 at 7:33 PM, Emmanuel Lecharny 
>> wrote:
>>
>>> To be clear: when some sockets are ready for read (i.e., the OP_READ flag
>>> has been set, and there is something in the socket to be read), the
>>> IoProcessor call to select() will return and we will have a set of
>>> SelectionKeys returned. This set contains all the channels that are ready
>>> for some processing. The IoProcessor thread will process them one after
>>> the other, from top to bottom. That means we don't process multiple
>>> sessions in parallel when all those sessions are handled by one single
>>> IoProcessor. You have to be careful in what you do in your application,
>>> because any costly processing, or any synchronous access to a remote
>>> system, will block the other sessions' processing.
>>>
>>> Now, we always start the server with more than one IoProcessor (typically,
>>> number of cores + 1 IoProcessors). You can also configure a higher number
>>> of IoProcessors if you like, but at some point, if your CPU is 100% used,
>>> adding more IoProcessors does not help.
>>>
>>> What kind of performance are you expecting to reach? (i.e., how many
>>> requests per second?)
>>>
>>
>>
>


Re: ConcurrentLinkedQueue vs MpscChunkedArrayQueue

2016-10-17 Thread Guido Medina
Hi Emmanuel,

At the mina-core class AbstractNioSession the variable:

/** the queue of pending writes for the session, to be dequeued by the
{@link SelectorLoop} */
private final Queue<WriteRequest> writeQueue = new DefaultWriteQueue();

Is such a queue consumed by a selector loop? That makes me think it is a
single-threaded loop, hence making it MpSc and ideal for low-GC optimization.
But maybe such an optimization is so unnoticeable that it is not worth it.

That's the only place I think it is worth replacing with a low-GC-footprint
queue; it would avoid the creation of GC-able linked nodes
(as in ConcurrentLinkedQueue).

In fact, further in that logic you try to avoid writing to the queue if it is
empty by passing the message directly to the next handler, which is a
micro-optimization;
isEmpty() will turn out to be false in 99.99% of cases for systems under high
load.

WDYT?

Guido.

On Sat, Oct 15, 2016 at 8:12 PM, Guido Medina  wrote:

> I will take a look again at the source code, but not today; I will let you
> know on Monday if it is applicable to MINA core. It seems it is not the case:
> my application is simply forwarding each decoded FIX message to an Akka
> actor, which is backed by a high-performance queue.
> I was thinking (will double-check) that these ByteBuffers were queued somehow
> before being picked up by the handlers, which is where a non-blocking MpSc
> would play a role.
>
> But maybe I misunderstood the code I saw.
>
> I will check again and let you know,
>
> Have a nice weekend,
>
> Guido.
>
> On Sat, Oct 15, 2016 at 7:33 PM, Emmanuel Lecharny 
> wrote:
>
>> To be clear: when some sockets are ready for read (i.e., the OP_READ flag
>> has been set, and there is something in the socket to be read), the
>> IoProcessor call to select() will return and we will have a set of
>> SelectionKeys returned. This set contains all the channels that are ready
>> for some processing. The IoProcessor thread will process them one after
>> the other, from top to bottom. That means we don't process multiple
>> sessions in parallel when all those sessions are handled by one single
>> IoProcessor. You have to be careful in what you do in your application,
>> because any costly processing, or any synchronous access to a remote
>> system, will block the other sessions' processing.
>>
>> Now, we always start the server with more than one IoProcessor (typically,
>> number of cores + 1 IoProcessors). You can also configure a higher number
>> of IoProcessors if you like, but at some point, if your CPU is 100% used,
>> adding more IoProcessors does not help.
>>
>> What kind of performance are you expecting to reach? (i.e., how many
>> requests per second?)
>>
>
>


Re: ConcurrentLinkedQueue vs MpscChunkedArrayQueue

2016-10-15 Thread Guido Medina
I will take a look again at the source code, but not today; I will let you
know on Monday if it is applicable to MINA core. It seems it is not the case:
my application is simply forwarding each decoded FIX message to an Akka
actor, which is backed by a high-performance queue.
I was thinking (will double-check) that these ByteBuffers were queued somehow
before being picked up by the handlers, which is where a non-blocking MpSc
would play a role.

But maybe I misunderstood the code I saw.

I will check again and let you know,

Have a nice weekend,

Guido.

On Sat, Oct 15, 2016 at 7:33 PM, Emmanuel Lecharny 
wrote:

> To be clear: when some sockets are ready for read (i.e., the OP_READ flag
> has been set, and there is something in the socket to be read), the
> IoProcessor call to select() will return and we will have a set of
> SelectionKeys returned. This set contains all the channels that are ready
> for some processing. The IoProcessor thread will process them one after
> the other, from top to bottom. That means we don't process multiple
> sessions in parallel when all those sessions are handled by one single
> IoProcessor. You have to be careful in what you do in your application,
> because any costly processing, or any synchronous access to a remote
> system, will block the other sessions' processing.
>
> Now, we always start the server with more than one IoProcessor (typically,
> number of cores + 1 IoProcessors). You can also configure a higher number
> of IoProcessors if you like, but at some point, if your CPU is 100% used,
> adding more IoProcessors does not help.
>
> What kind of performance are you expecting to reach? (i.e., how many
> requests per second?)
>


Re: ConcurrentLinkedQueue vs MpscChunkedArrayQueue

2016-10-15 Thread Emmanuel Lecharny
To be clear: when some sockets are ready for read (i.e., the OP_READ flag
has been set, and there is something in the socket to be read), the
IoProcessor call to select() will return and we will have a set of
SelectionKeys returned. This set contains all the channels that are ready
for some processing. The IoProcessor thread will process them one after
the other, from top to bottom. That means we don't process multiple
sessions in parallel when all those sessions are handled by one single
IoProcessor. You have to be careful in what you do in your application,
because any costly processing, or any synchronous access to a remote
system, will block the other sessions' processing.

Now, we always start the server with more than one IoProcessor (typically,
number of cores + 1 IoProcessors). You can also configure a higher number
of IoProcessors if you like, but at some point, if your CPU is 100% used,
adding more IoProcessors does not help.

What kind of performance are you expecting to reach? (i.e., how many
requests per second?)
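
As a concrete illustration, MINA 2 lets you pick the IoProcessor count when
constructing the acceptor; a sketch following the "cores + 1" default
mentioned above (the class name is hypothetical):

import org.apache.mina.transport.socket.nio.NioSocketAcceptor;

public class AcceptorSetup {
    public static void main(String[] args) {
        // one IoProcessor per core, plus one
        int processorCount = Runtime.getRuntime().availableProcessors() + 1;
        NioSocketAcceptor acceptor = new NioSocketAcceptor(processorCount);
        // ... set the handler, build the filter chain, bind, etc.
        acceptor.dispose();
    }
}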


Re: ConcurrentLinkedQueue vs MpscChunkedArrayQueue

2016-10-15 Thread Emmanuel Lecharny
On Sat, Oct 15, 2016 at 8:26 PM, Guido Medina  wrote:

> Doesn't each ByteBuffer go into a queue?
>

No. What for? Once we have read the bytes, we process them immediately,
pushing them through the chain. You can multiplex the processing by adding an
ExecutorFilter in the chain, assuming you have a lot of cores available.
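
A minimal sketch of that multiplexing option (the class name and the pool
size of 16 are arbitrary examples):

import org.apache.mina.filter.executor.ExecutorFilter;
import org.apache.mina.transport.socket.nio.NioSocketAcceptor;

public class ExecutorFilterSetup {
    public static void main(String[] args) {
        NioSocketAcceptor acceptor = new NioSocketAcceptor();
        // events flowing past this filter are handed off to a 16-thread
        // pool, instead of staying on the IoProcessor thread
        acceptor.getFilterChain().addLast("executor", new ExecutorFilter(16));
        acceptor.dispose();
    }
}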


Re: ConcurrentLinkedQueue vs MpscChunkedArrayQueue

2016-10-15 Thread Emmanuel Lecharny
On Sat, Oct 15, 2016 at 3:59 PM, Guido Medina  wrote:

> Maybe I misunderstood what I saw in the code; I saw 10 (not sure) places
> where ConcurrentLinkedQueue was used. One of them was for the connections,
> which in this case wouldn't make a difference.
>

Most of those lists are used to manage new sessions to be created, flushed,
or deleted. This is because we usually have one acceptor processing
incoming connections and dispatching them to various IoProcessors,
responsible for the further processing.



> The other places are for handling the received frames/messages? Am I correct
> here? That's where I believe it would make a difference, where a single
> connection can potentially have hundreds of frames to be "handled" (by a
> handler).
>

Not sure what you mean by 'frame'.



>
> Isn't that a good place to introduce MpSc, if such a place has one consumer
> thread pulling from the queue and then delegating to a handler?
>

The IoProcessor is the thread that gets the messages (and it's not 'polling'
them, it's an event-driven system) and propagates them to the handler.


Re: ConcurrentLinkedQueue vs MpscChunkedArrayQueue

2016-10-15 Thread Emmanuel Lecharny
On Sat, Oct 15, 2016 at 3:54 PM, Guido Medina  wrote:

> The connections count is usually "finite" (not worth the effort),


If so, the best solution is certainly not to use NIO. One thread per
connection is the way to go, and if you have enough memory, you can handle
thousands of connections.


> but the
> queue for packets, isn't it also a ConcurrentLinkedQueue?
> I'm not sure how MINA core stores the packets received before they are
> passed to their handler.
>

Packets aren't stored in any data structure. They are read into a
ByteBuffer and passed through the filter chain up to the handler. You may
have a codec filter in the middle that decodes the packets into
application messages, which are then passed through the chain to the
handler.
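
A minimal sketch of such a codec filter in the chain (the text-line codec
stands in here for a real protocol codec; the class name is hypothetical):

import org.apache.mina.filter.codec.ProtocolCodecFilter;
import org.apache.mina.filter.codec.textline.TextLineCodecFactory;
import org.apache.mina.transport.socket.nio.NioSocketAcceptor;

public class CodecSetup {
    public static void main(String[] args) {
        NioSocketAcceptor acceptor = new NioSocketAcceptor();
        // decodes raw ByteBuffers into application messages (here, text
        // lines) before they reach the handler
        acceptor.getFilterChain().addLast(
                "codec", new ProtocolCodecFilter(new TextLineCodecFactory()));
        acceptor.dispose();
    }
}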


Re: ConcurrentLinkedQueue vs MpscChunkedArrayQueue

2016-10-15 Thread Guido Medina
Maybe I misunderstood what I saw in the code; I saw 10 (not sure) places
where ConcurrentLinkedQueue was used. One of them was for the connections,
which in this case wouldn't make a difference.
The other places are for handling the received frames/messages? Am I correct
here? That's where I believe it would make a difference, where a single
connection can potentially have hundreds of frames to be "handled" (by a
handler).

Isn't that a good place to introduce MpSc, if such a place has one consumer
thread pulling from the queue and then delegating to a handler?

On Sat, Oct 15, 2016 at 2:54 PM, Guido Medina  wrote:

> The connections count is usually "finite" (not worth the effort), but the
> queue for packets, isn't it also a ConcurrentLinkedQueue?
> I'm not sure how MINA core stores the packets received before they are
> passed to their handler.
>
> On Sat, Oct 15, 2016 at 2:27 PM, Emmanuel Lecharny 
> wrote:
>
>> On Sat, Oct 15, 2016 at 1:20 PM, Guido Medina  wrote:
>>
>> > Hi,
>> >
>> > I was looking at the MINA core source code and I noticed events are
>> > published to a ConcurrentLinkedQueue, so here are my questions and
>> > suggestions:
>> >
>> >- Do these uses of ConcurrentLinkedQueue follow the pattern of
>> >*Multiple Producer/Single Consumer* (MpSc) or *Multiple
>> >Producer/Multiple Consumer* (MpMc)?
>> >
>>
>> MpMc.
>>
>>
>>
>> >- For low-latency applications (in my case I'm talking QuickFixJ for
>> >the financial industry), would they benefit from an MpSc queue that has
>> >a low memory footprint (more like a low GC footprint)?
>> >
>> > If that is the case, I would shade the JCTools dependency and use this queue:
>> > https://github.com/JCTools/JCTools/blob/master/jctools-
>> > core/src/main/java/org/jctools/queues/MpscChunkedArrayQueue.java
>> >
>> > Such a queue uses ring buffers (power-of-two arrays) and links them if
>> > they need to expand, which is great for theoretically unbounded queues,
>> > with the benefit of not using a linked node per element, only linked
>> > arrays.
>> >
>> > Recently Netty replaced their non-blocking linked queues with that one.
>> >
>>
>> That is an option.
>>
>> Now, I would say that for an application requiring low latency, basing it
>> on top of NIO makes little sense, considering the extra cost compared to a
>> blocking IO solution (and we are talking about a 30% performance penalty,
>> at least).
>>
>> Do you need to handle potentially millions of connections ?
>>
>
>


Re: ConcurrentLinkedQueue vs MpscChunkedArrayQueue

2016-10-15 Thread Guido Medina
The connections count is usually "finite" (not worth the effort), but the
queue for packets, isn't it also a ConcurrentLinkedQueue?
I'm not sure how MINA core stores the packets received before they are
passed to their handler.

On Sat, Oct 15, 2016 at 2:27 PM, Emmanuel Lecharny 
wrote:

> On Sat, Oct 15, 2016 at 1:20 PM, Guido Medina  wrote:
>
> > Hi,
> >
> > I was looking at the MINA core source code and I noticed events are
> > published to a ConcurrentLinkedQueue, so here are my questions and
> > suggestions:
> >
> >- Do these uses of ConcurrentLinkedQueue follow the pattern of
> >*Multiple Producer/Single Consumer* (MpSc) or *Multiple
> >Producer/Multiple Consumer* (MpMc)?
> >
>
> MpMc.
>
>
>
> >- For low-latency applications (in my case I'm talking QuickFixJ for
> >the financial industry), would they benefit from an MpSc queue that has
> >a low memory footprint (more like a low GC footprint)?
> >
> > If that is the case, I would shade the JCTools dependency and use this queue:
> > https://github.com/JCTools/JCTools/blob/master/jctools-
> > core/src/main/java/org/jctools/queues/MpscChunkedArrayQueue.java
> >
> > Such a queue uses ring buffers (power-of-two arrays) and links them if
> > they need to expand, which is great for theoretically unbounded queues,
> > with the benefit of not using a linked node per element, only linked
> > arrays.
> >
> > Recently Netty replaced their non-blocking linked queues with that one.
> >
>
> That is an option.
>
> Now, I would say that for an application requiring low latency, basing it
> on top of NIO makes little sense, considering the extra cost compared to a
> blocking IO solution (and we are talking about a 30% performance penalty,
> at least).
>
> Do you need to handle potentially millions of connections ?
>


Re: ConcurrentLinkedQueue vs MpscChunkedArrayQueue

2016-10-15 Thread Emmanuel Lecharny
On Sat, Oct 15, 2016 at 1:20 PM, Guido Medina  wrote:

> Hi,
>
> I was looking at the MINA core source code and I noticed events are
> published to a ConcurrentLinkedQueue, so here are my questions and
> suggestions:
>
>- Do these uses of ConcurrentLinkedQueue follow the pattern of
>*Multiple Producer/Single Consumer* (MpSc) or *Multiple
>Producer/Multiple Consumer* (MpMc)?
>

MpMc.



>- For low-latency applications (in my case I'm talking QuickFixJ for the
>financial industry), would they benefit from an MpSc queue that has a low
>memory footprint (more like a low GC footprint)?
>
> If that is the case, I would shade the JCTools dependency and use this queue:
> https://github.com/JCTools/JCTools/blob/master/jctools-
> core/src/main/java/org/jctools/queues/MpscChunkedArrayQueue.java
>
> Such a queue uses ring buffers (power-of-two arrays) and links them if they
> need to expand, which is great for theoretically unbounded queues, with the
> benefit of not using a linked node per element, only linked arrays.
>
> Recently Netty replaced their non-blocking linked queues with that one.
>

That is an option.

Now, I would say that for an application requiring low latency, basing it
on top of NIO makes little sense, considering the extra cost compared to a
blocking IO solution (and we are talking about a 30% performance penalty,
at least).

Do you need to handle potentially millions of connections ?
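
To close the loop on the queue under discussion, an end-to-end sketch
(class name, capacities, and counts are arbitrary examples): several
producer threads offer, one consumer thread drains.

import org.jctools.queues.MpscChunkedArrayQueue;

public class MpscDemo {
    public static void main(String[] args) throws InterruptedException {
        MpscChunkedArrayQueue<String> queue =
                new MpscChunkedArrayQueue<>(1024, 65536);

        Runnable producer = () -> {
            for (int i = 0; i < 1000; i++) {
                // offer() returns false when the queue is full: back off, retry
                while (!queue.offer(Thread.currentThread().getName() + "-" + i)) {
                    Thread.yield();
                }
            }
        };
        Thread p1 = new Thread(producer, "producer-1");
        Thread p2 = new Thread(producer, "producer-2");
        p1.start();
        p2.start();

        // the single consumer: poll() must only ever be called from one thread
        int consumed = 0;
        while (consumed < 2000) {
            if (queue.poll() != null) {
                consumed++;
            }
        }
        p1.join();
        p2.join();
        System.out.println("drained " + consumed + " messages on one thread");
    }
}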