Re: ConcurrentLinkedQueue vs MpscChunkedArrayQueue
For a pool of buffers, a bounded MpMc queue with some pre-allocated capacity is usually better. It doesn't need to be synchronized; it only needs to be used like any other pool:
- You take or create.
- Use and pass around.
- And finally you offer back to such a queue.

By default you can use a ConcurrentLinkedQueue for that, and later you can easily move to a bounded MpSc queue. Why bounded? If you happen to slow down processing for whatever reason, you don't want your pool to grow beyond N capacity. It is tricky, but it has been done before and it shouldn't be a problem here, though it requires try...finally to make sure you return the buffer to the queue. Bounded non-blocking queues are perfect because they will not throw an exception if the queue is empty, or full in a slow-processing scenario.

Guido.

On Mon, Oct 17, 2016 at 7:01 PM, Emmanuel Lécharny wrote:
> On 17/10/16 at 11:12, Guido Medina wrote:
> > Hi Emmanuel,
> >
> > At the mina-core class AbstractNioSession the variable:
> >
> >     /** the queue of pending writes for the session, to be dequeued by the {@link SelectorLoop} */
> >     private final Queue writeQueue = new DefaultWriteQueue();
> >
> > such queue is being consumed by a selector loop?
>
> It's consumed by the IoProcessor selector loop (so you have many of them).
>
> The thread that has been selected to process the read will go on up to the point it has written the bytes into the write queue, then it goes back to the point it was called from, and then it processes the writes:
>
> select:
> 1) read -> head -> filter -> filter -> ... -> handler -> session.write(something) -> filter -> filter -> ... -> head -> put in write queue and return (unstacking all the calls)
> 2) write -> get the write queue, and write the data in it until the queue is empty or the socket is full (and in this case, set the OP_WRITE status)
>
> > which makes me think it is a single thread loop hence making it MpSc and ideal for low GC optimization.
> > But maybe such optimization is so unnoticeable that it is not worth it.
>
> Actually, we have as many threads as we have IoProcessor instances. The thing is that I'm not even sure that we need a synchronized queue, as the IoProcessor is executing all the processing on its own thread. The ConcurrentLinkedQueue is there just because one can use an executor filter, which will spread the processing onto many other threads, and then we *may* have more than one thread accessing this queue.
>
> Regarding GC, we have removed some useless object allocations in 2.0.15, so the GC should be slightly less under pressure.
>
> If you want to alleviate the GC load, I think there are other areas where some improvement can be made. Typically, there is a BufferAllocator, the CachedBufferAllocator, that allows you to reuse buffers. Now, this is a tricky solution, as it has to be synchronized, so expect some bottleneck here. Ideally, we should have another implementation that uses thread-local storage, but that would be memory expensive.
>
> > That's the only place I think it is worth replacing by a low GC footprint queue; it will avoid the creation of GC-able linked nodes (ConcurrentLinkedQueue).
> >
> > In fact, further in that logic you try to avoid writing to the queue if it is empty by passing the message directly to the next handler, which is a micro-optimization; isEmpty() will in 99.99% of cases render false for systems with high load.
>
> Well, it depends. The queue will be emptied if the socket is able to swallow the data. If your system is under high load, I suspect that all the sockets will be full already, so you'll end up with some bigger problem! (Typically, the queue will grow, and at some point you'll get an OOM... I have already experienced that, so yes, it may happen.)
>
> My point is that you should always expect that your OS and your network are capable of allocating enough buffer for the sockets, and have enough bandwidth to send them fast enough so that the socket buffer can always accept any write done on it. In this case, the write queue will always be empty, except if you are writing a huge message (typically above the socket buffer size, which defaults to 1Kb - from the top of my head - but that you can set up to 64Kb more or less). Even on a loaded system.
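The take-or-create / offer-back pool pattern described above can be sketched with the stdlib alone. This is a hypothetical sketch, not MINA's actual allocator: `ArrayBlockingQueue` stands in for a bounded MpMc queue (a JCTools queue could be swapped in later), and `poll()`/`offer()` never throw when the pool is empty or full.

```java
import java.nio.ByteBuffer;
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;

// Hypothetical buffer pool following the pattern above: take or create,
// use, and finally offer back. Bounded so a slow consumer can never make
// the pool retain more than 'capacity' buffers.
final class BufferPool {
    private final Queue<ByteBuffer> pool;
    private final int bufferSize;

    BufferPool(int capacity, int bufferSize) {
        this.pool = new ArrayBlockingQueue<>(capacity);
        this.bufferSize = bufferSize;
    }

    /** Take a pooled buffer, or create one if the pool is empty. */
    ByteBuffer take() {
        ByteBuffer buf = pool.poll();       // returns null when empty, no exception
        return buf != null ? buf : ByteBuffer.allocate(bufferSize);
    }

    /** Offer a buffer back; silently dropped (GC'd) if the pool is full. */
    void release(ByteBuffer buf) {
        buf.clear();
        pool.offer(buf);                    // returns false when full, no exception
    }
}
```

Callers would wrap usage in try...finally, e.g. `ByteBuffer buf = pool.take(); try { /* use */ } finally { pool.release(buf); }`.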
Re: ConcurrentLinkedQueue vs MpscChunkedArrayQueue
On 17/10/16 at 11:35, Guido Medina wrote:
> And DefaultIoSessionDataStructureFactory has the following public static inner class:
>
>     private static class DefaultWriteRequestQueue implements WriteRequestQueue {
>         /** A queue to store incoming write requests */
>         private final Queue q = new ConcurrentLinkedQueue();
>
> I'm assuming that's also a queue consumed by a loop thread; anything, in fact, that is consumed by a loop thread and has a chance of receiving many messages is optimizable by a low GC MpSc version of the JCTools APIs.
> You can look at the other queues they have available, but TBH I prefer the array-based queues as they have practically zero GC impact:
> https://github.com/JCTools/JCTools/tree/master/jctools-core/src/main/java/org/jctools/queues

Maybe. I would suggest that you do your experiment, and measure the improvement you get with the JCTools queue. If it's any better, we would be pleased to swap what we have with something better :-) It's pretty hard to tell what the hypothetical gain would be *in theory*; this has to be tested in silico.

Now, MINA is an OSS project, so we warmly welcome any contribution! Keep in mind this is also *your* project, if you want to participate!
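One way to run that experiment is to drive the same offer/poll workload through both queue implementations behind the common `Queue` interface. The sketch below is only an assumed shape for such a measurement (a real comparison should use JMH, and would plug in the JCTools queue where `ArrayBlockingQueue` stands in here); all class and method names are hypothetical.

```java
import java.util.Queue;

// Hypothetical measurement harness: the same single-consumer workload run
// against any Queue implementation. A linked queue allocates one node per
// element; an array-backed queue reuses pre-allocated slots.
final class QueueBench {
    /** Offers then polls 'messages' elements; returns how many were consumed. */
    static long drainThrough(Queue<Integer> q, int messages) {
        long consumed = 0;
        for (int i = 0; i < messages; i++) {
            q.offer(i);               // producer side
            Integer m = q.poll();     // consumer side (single consumer)
            if (m != null) {
                consumed++;
            }
        }
        return consumed;
    }

    /** Wall-clock time of one run; a stand-in for a proper JMH benchmark. */
    static long timeNanos(Queue<Integer> q, int messages) {
        long start = System.nanoTime();
        drainThrough(q, messages);
        return System.nanoTime() - start;
    }
}
```

Comparing `timeNanos(new ConcurrentLinkedQueue<>(), 1_000_000)` against the array-based candidate gives a first rough number before committing to the dependency swap.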
Re: ConcurrentLinkedQueue vs MpscChunkedArrayQueue
On 17/10/16 at 11:12, Guido Medina wrote:
> Hi Emmanuel,
>
> At the mina-core class AbstractNioSession the variable:
>
>     /** the queue of pending writes for the session, to be dequeued by the {@link SelectorLoop} */
>     private final Queue writeQueue = new DefaultWriteQueue();
>
> such queue is being consumed by a selector loop?

It's consumed by the IoProcessor selector loop (so you have many of them). The thread that has been selected to process the read will go on up to the point it has written the bytes into the write queue, then it goes back to the point it was called from, and then it processes the writes:

select:
1) read -> head -> filter -> filter -> ... -> handler -> session.write(something) -> filter -> filter -> ... -> head -> put in write queue and return (unstacking all the calls)
2) write -> get the write queue, and write the data in it until the queue is empty or the socket is full (and in this case, set the OP_WRITE status)

> which makes me think it is a single thread loop hence making it MpSc and ideal for low GC optimization.
> But maybe such optimization is so unnoticeable that it is not worth it.

Actually, we have as many threads as we have IoProcessor instances. The thing is that I'm not even sure that we need a synchronized queue, as the IoProcessor is executing all the processing on its own thread. The ConcurrentLinkedQueue is there just because one can use an executor filter, which will spread the processing onto many other threads, and then we *may* have more than one thread accessing this queue.

Regarding GC, we have removed some useless object allocations in 2.0.15, so the GC should be slightly less under pressure.

If you want to alleviate the GC load, I think there are other areas where some improvement can be made. Typically, there is a BufferAllocator, the CachedBufferAllocator, that allows you to reuse buffers. Now, this is a tricky solution, as it has to be synchronized, so expect some bottleneck here. Ideally, we should have another implementation that uses thread-local storage, but that would be memory expensive.

> That's the only place I think it is worth replacing by a low GC footprint queue; it will avoid the creation of GC-able linked nodes (ConcurrentLinkedQueue).
>
> In fact, further in that logic you try to avoid writing to the queue if it is empty by passing the message directly to the next handler, which is a micro-optimization; isEmpty() will in 99.99% of cases render false for systems with high load.

Well, it depends. The queue will be emptied if the socket is able to swallow the data. If your system is under high load, I suspect that all the sockets will be full already, so you'll end up with some bigger problem! (Typically, the queue will grow, and at some point you'll get an OOM... I have already experienced that, so yes, it may happen.)

My point is that you should always expect that your OS and your network are capable of allocating enough buffer for the sockets, and have enough bandwidth to send them fast enough so that the socket buffer can always accept any write done on it. In this case, the write queue will always be empty, except if you are writing a huge message (typically above the socket buffer size, which defaults to 1Kb - from the top of my head - but that you can set up to 64Kb more or less). Even on a loaded system.
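The write phase described in step 2 - drain the queue until it is empty or the socket stops accepting bytes, and in the latter case register OP_WRITE - can be sketched as follows. All names are hypothetical (the real logic lives in MINA's IoProcessor), and a simulated byte budget stands in for the socket; real code also handles partially written buffers.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch of step 2 above: drain the write queue until it is empty or the
// "socket" is full. Returns true when OP_WRITE should be registered so the
// selector wakes the loop up once the socket has room again.
final class WriteDrain {
    /** 'room' simulates how many bytes the socket will accept right now. */
    static boolean flush(Queue<byte[]> writeQueue, int room) {
        while (!writeQueue.isEmpty()) {
            byte[] buf = writeQueue.peek();
            if (buf.length > room) {
                return true;            // socket full: caller sets OP_WRITE
            }
            room -= buf.length;         // "written" to the socket
            writeQueue.poll();          // fully flushed, drop from the queue
        }
        return false;                   // queue drained, no OP_WRITE needed
    }
}
```

This also illustrates the point about load: as long as the socket keeps up, the loop exits with an empty queue, which is why the queue is usually empty on a healthy system.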
Re: ConcurrentLinkedQueue vs MpscChunkedArrayQueue
And DefaultIoSessionDataStructureFactory has the following public static inner class:

    private static class DefaultWriteRequestQueue implements WriteRequestQueue {
        /** A queue to store incoming write requests */
        private final Queue q = new ConcurrentLinkedQueue();

I'm assuming that's also a queue consumed by a loop thread; anything, in fact, that is consumed by a loop thread and has a chance of receiving many messages is optimizable by a low GC MpSc version of the JCTools APIs. You can look at the other queues they have available, but TBH I prefer the array-based queues as they have practically zero GC impact:
https://github.com/JCTools/JCTools/tree/master/jctools-core/src/main/java/org/jctools/queues

Regards,

Guido.

On Mon, Oct 17, 2016 at 10:29 AM, Guido Medina wrote:
> Sorry, wrong branch; that's on the current master (trunk). For our case it would be the following on the 2.0.15 branch:
>
>     public abstract class AbstractProtocolEncoderOutput implements ProtocolEncoderOutput {
>         private final Queue messageQueue = new ConcurrentLinkedQueue();
>         ...
>     }
>
> The messageQueue variable is the candidate for this kind of optimization, assuming it is consumed by a single loop thread.
>
> Regards,
>
> Guido.
>
> On Mon, Oct 17, 2016 at 10:12 AM, Guido Medina wrote:
>> Hi Emmanuel,
>>
>> At the mina-core class AbstractNioSession the variable:
>>
>>     /** the queue of pending writes for the session, to be dequeued by the {@link SelectorLoop} */
>>     private final Queue writeQueue = new DefaultWriteQueue();
>>
>> such queue is being consumed by a selector loop? Which makes me think it is a single thread loop, hence making it MpSc and ideal for low GC optimization.
>> But maybe such optimization is so unnoticeable that it is not worth it.
>>
>> That's the only place I think it is worth replacing by a low GC footprint queue; it will avoid the creation of GC-able linked nodes (ConcurrentLinkedQueue).
>>
>> In fact, further in that logic you try to avoid writing to the queue if it is empty by passing the message directly to the next handler, which is a micro-optimization; isEmpty() will in 99.99% of cases render false for systems with high load.
>>
>> WDYT?
>>
>> Guido.
>>
>> On Sat, Oct 15, 2016 at 8:12 PM, Guido Medina wrote:
>>> I will take a look again at the source code, but not today; I will let you know on Monday if it is applicable for MINA core. It seems it is not the case: my application is simply forwarding each decoded FIX message to an Akka actor, which is backed by a high performance queue.
>>> I was thinking (will double check) these ByteBuffers were queued somehow before they are picked up by the handlers, which is where a non-blocking MpSc would play a role.
>>>
>>> But maybe I misunderstood the code I saw.
>>>
>>> I will check again and let you know.
>>>
>>> Have a nice weekend,
>>>
>>> Guido.
>>>
>>> On Sat, Oct 15, 2016 at 7:33 PM, Emmanuel Lecharny wrote:
>>>> To be clear: when some sockets are ready for read (ie, the OP_READ flag has been set, and there is something in the socket to be read), the IoProcessor call to select() will return and we will have a set of SelectionKeys returned. This set contains the set of all the channels that are ready for some processing. The IoProcessor thread will process them one after the other, from top to bottom. That means we don't process multiple sessions in parallel when all those sessions are handled by one single IoProcessor. You have to be careful in what you do in your application, because any costly processing, or any synchronous access to a remote system, will block the other sessions' processing.
>>>>
>>>> Now, we always start the server with more than one IoProcessor (typically, number of cores + 1 IoProcessors). You can also set a higher number of IoProcessors if you like, but at some point, if your CPU is 100% used, adding more IoProcessors does not help.
>>>>
>>>> What kind of performance are you expecting to reach? (ie, how many requests per second?)
Re: ConcurrentLinkedQueue vs MpscChunkedArrayQueue
Sorry, wrong branch; that's on the current master (trunk). For our case it would be the following on the 2.0.15 branch:

    public abstract class AbstractProtocolEncoderOutput implements ProtocolEncoderOutput {
        private final Queue messageQueue = new ConcurrentLinkedQueue();
        ...
    }

The messageQueue variable is the candidate for this kind of optimization, assuming it is consumed by a single loop thread.

Regards,

Guido.

On Mon, Oct 17, 2016 at 10:12 AM, Guido Medina wrote:
> Hi Emmanuel,
>
> At the mina-core class AbstractNioSession the variable:
>
>     /** the queue of pending writes for the session, to be dequeued by the {@link SelectorLoop} */
>     private final Queue writeQueue = new DefaultWriteQueue();
>
> such queue is being consumed by a selector loop? Which makes me think it is a single thread loop, hence making it MpSc and ideal for low GC optimization.
> But maybe such optimization is so unnoticeable that it is not worth it.
>
> That's the only place I think it is worth replacing by a low GC footprint queue; it will avoid the creation of GC-able linked nodes (ConcurrentLinkedQueue).
>
> In fact, further in that logic you try to avoid writing to the queue if it is empty by passing the message directly to the next handler, which is a micro-optimization; isEmpty() will in 99.99% of cases render false for systems with high load.
>
> WDYT?
>
> Guido.
>
> On Sat, Oct 15, 2016 at 8:12 PM, Guido Medina wrote:
>> I will take a look again at the source code, but not today; I will let you know on Monday if it is applicable for MINA core. It seems it is not the case: my application is simply forwarding each decoded FIX message to an Akka actor, which is backed by a high performance queue.
>> I was thinking (will double check) these ByteBuffers were queued somehow before they are picked up by the handlers, which is where a non-blocking MpSc would play a role.
>>
>> But maybe I misunderstood the code I saw.
>>
>> I will check again and let you know.
>>
>> Have a nice weekend,
>>
>> Guido.
>>
>> On Sat, Oct 15, 2016 at 7:33 PM, Emmanuel Lecharny wrote:
>>> To be clear: when some sockets are ready for read (ie, the OP_READ flag has been set, and there is something in the socket to be read), the IoProcessor call to select() will return and we will have a set of SelectionKeys returned. This set contains the set of all the channels that are ready for some processing. The IoProcessor thread will process them one after the other, from top to bottom. That means we don't process multiple sessions in parallel when all those sessions are handled by one single IoProcessor. You have to be careful in what you do in your application, because any costly processing, or any synchronous access to a remote system, will block the other sessions' processing.
>>>
>>> Now, we always start the server with more than one IoProcessor (typically, number of cores + 1 IoProcessors). You can also set a higher number of IoProcessors if you like, but at some point, if your CPU is 100% used, adding more IoProcessors does not help.
>>>
>>> What kind of performance are you expecting to reach? (ie, how many requests per second?)
Re: ConcurrentLinkedQueue vs MpscChunkedArrayQueue
Hi Emmanuel,

At the mina-core class AbstractNioSession the variable:

    /** the queue of pending writes for the session, to be dequeued by the {@link SelectorLoop} */
    private final Queue writeQueue = new DefaultWriteQueue();

such queue is being consumed by a selector loop? Which makes me think it is a single thread loop, hence making it MpSc and ideal for low GC optimization. But maybe such optimization is so unnoticeable that it is not worth it.

That's the only place I think it is worth replacing by a low GC footprint queue; it will avoid the creation of GC-able linked nodes (ConcurrentLinkedQueue).

In fact, further in that logic you try to avoid writing to the queue if it is empty by passing the message directly to the next handler, which is a micro-optimization; isEmpty() will in 99.99% of cases render false for systems with high load.

WDYT?

Guido.

On Sat, Oct 15, 2016 at 8:12 PM, Guido Medina wrote:
> I will take a look again at the source code, but not today; I will let you know on Monday if it is applicable for MINA core. It seems it is not the case: my application is simply forwarding each decoded FIX message to an Akka actor, which is backed by a high performance queue.
> I was thinking (will double check) these ByteBuffers were queued somehow before they are picked up by the handlers, which is where a non-blocking MpSc would play a role.
>
> But maybe I misunderstood the code I saw.
>
> I will check again and let you know.
>
> Have a nice weekend,
>
> Guido.
>
> On Sat, Oct 15, 2016 at 7:33 PM, Emmanuel Lecharny wrote:
>> To be clear: when some sockets are ready for read (ie, the OP_READ flag has been set, and there is something in the socket to be read), the IoProcessor call to select() will return and we will have a set of SelectionKeys returned. This set contains the set of all the channels that are ready for some processing. The IoProcessor thread will process them one after the other, from top to bottom. That means we don't process multiple sessions in parallel when all those sessions are handled by one single IoProcessor. You have to be careful in what you do in your application, because any costly processing, or any synchronous access to a remote system, will block the other sessions' processing.
>>
>> Now, we always start the server with more than one IoProcessor (typically, number of cores + 1 IoProcessors). You can also set a higher number of IoProcessors if you like, but at some point, if your CPU is 100% used, adding more IoProcessors does not help.
>>
>> What kind of performance are you expecting to reach? (ie, how many requests per second?)
Re: ConcurrentLinkedQueue vs MpscChunkedArrayQueue
I will take a look again at the source code, but not today; I will let you know on Monday if it is applicable for MINA core. It seems it is not the case: my application is simply forwarding each decoded FIX message to an Akka actor, which is backed by a high performance queue.
I was thinking (will double check) these ByteBuffers were queued somehow before they are picked up by the handlers, which is where a non-blocking MpSc would play a role.

But maybe I misunderstood the code I saw.

I will check again and let you know.

Have a nice weekend,

Guido.

On Sat, Oct 15, 2016 at 7:33 PM, Emmanuel Lecharny wrote:
> To be clear: when some sockets are ready for read (ie, the OP_READ flag has been set, and there is something in the socket to be read), the IoProcessor call to select() will return and we will have a set of SelectionKeys returned. This set contains the set of all the channels that are ready for some processing. The IoProcessor thread will process them one after the other, from top to bottom. That means we don't process multiple sessions in parallel when all those sessions are handled by one single IoProcessor. You have to be careful in what you do in your application, because any costly processing, or any synchronous access to a remote system, will block the other sessions' processing.
>
> Now, we always start the server with more than one IoProcessor (typically, number of cores + 1 IoProcessors). You can also set a higher number of IoProcessors if you like, but at some point, if your CPU is 100% used, adding more IoProcessors does not help.
>
> What kind of performance are you expecting to reach? (ie, how many requests per second?)
Re: ConcurrentLinkedQueue vs MpscChunkedArrayQueue
To be clear: when some sockets are ready for read (ie, the OP_READ flag has been set, and there is something in the socket to be read), the IoProcessor call to select() will return and we will have a set of SelectionKeys returned. This set contains the set of all the channels that are ready for some processing. The IoProcessor thread will process them one after the other, from top to bottom. That means we don't process multiple sessions in parallel when all those sessions are handled by one single IoProcessor. You have to be careful in what you do in your application, because any costly processing, or any synchronous access to a remote system, will block the other sessions' processing.

Now, we always start the server with more than one IoProcessor (typically, number of cores + 1 IoProcessors). You can also set a higher number of IoProcessors if you like, but at some point, if your CPU is 100% used, adding more IoProcessors does not help.

What kind of performance are you expecting to reach? (ie, how many requests per second?)
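The select-then-iterate behaviour described above can be shown with plain NIO, no MINA classes involved. A minimal sketch using a `Pipe` as the selectable channel (names are illustrative; an IoProcessor does the same dance with sockets and a full event loop):

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Minimal sketch of one IoProcessor-style turn: select() returns the set of
// ready SelectionKeys, which are then processed one after the other on this
// single thread - exactly why slow handler work blocks sibling sessions.
final class SelectLoopDemo {
    static int readReadyOnce() throws Exception {
        Selector selector = Selector.open();
        Pipe pipe = Pipe.open();
        pipe.source().configureBlocking(false);
        pipe.source().register(selector, SelectionKey.OP_READ);

        // Make the source channel readable, as a peer writing to a socket would.
        pipe.sink().write(ByteBuffer.wrap(new byte[] {1, 2, 3}));

        int total = 0;
        selector.select();                        // blocks until a key is ready
        for (SelectionKey key : selector.selectedKeys()) {
            if (key.isReadable()) {               // keys processed sequentially
                ByteBuffer buf = ByteBuffer.allocate(64);
                total += pipe.source().read(buf);
            }
        }
        selector.selectedKeys().clear();          // done with this turn
        pipe.sink().close();
        pipe.source().close();
        selector.close();
        return total;
    }
}
```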
Re: ConcurrentLinkedQueue vs MpscChunkedArrayQueue
On Sat, Oct 15, 2016 at 8:26 PM, Guido Medina wrote:
> Doesn't each ByteBuffer go into a queue?

No. What for? Once we have read the bytes, we process them immediately, pushing them into the chain. You can multiplex the processing by adding an ExecutorFilter in the chain, assuming you have a lot of cores available.
Re: ConcurrentLinkedQueue vs MpscChunkedArrayQueue
On Sat, Oct 15, 2016 at 3:59 PM, Guido Medina wrote:
> Maybe I misunderstood what I saw in the code. I saw 10 (not sure) places where ConcurrentLinkedQueue was used; one of them was for the connections, which for this case wouldn't make a difference.

Most of those lists are used to manage new sessions to be created, flushed or deleted. This is because we usually have one acceptor processing incoming connections and dispatching them to various IoProcessors, responsible for the further processing.

> The other places are for handling the received frame/message? Am I correct here? That's where I believe it would make a difference, where a single connection can have potentially hundreds of frames to be "handled" (by a handler).

Not sure what you mean by 'frame'.

> Isn't that a good place to introduce MpSc? If such a place has 1 consumer thread pulling from such a queue and then delegating to a handler.

The IoProcessor is the thread that gets the messages (and it's not 'polling' them; it's an event-driven system) and propagates them to the handler.
Re: ConcurrentLinkedQueue vs MpscChunkedArrayQueue
On Sat, Oct 15, 2016 at 3:54 PM, Guido Medina wrote:
> The connections count is usually "finite" (not worth the effort),

If so, the best solution is certainly not to use NIO. One thread per connection is the way to go, and if you have enough memory, you can handle thousands of connections.

> but the queue for packets, isn't it also a ConcurrentLinkedQueue?
> I'm not sure how MINA core stores the packets received before they are passed to their handler.

Packets aren't stored in any data structure. They are read into a ByteBuffer and passed through the filter chain up to the handler. You may have a codec filter in the middle that decodes the packet into application messages, which are then passed through the chain to the handler.
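The read path just described - a ByteBuffer passed through a codec filter, with decoded messages pushed straight to the handler rather than parked in a queue - can be sketched with illustrative names (these are not MINA's actual interfaces):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative sketch, not MINA's real API: a packet arrives as a ByteBuffer,
// a codec "filter" decodes it into application messages, and each message is
// forwarded straight to the handler - no intermediate packet queue.
final class MiniChain {
    interface Handler { void messageReceived(String message); }

    /** Decodes a newline-delimited packet and forwards each message. */
    static void onRead(ByteBuffer packet, Handler handler) {
        String text = StandardCharsets.UTF_8.decode(packet).toString();
        for (String line : text.split("\n")) {
            if (!line.isEmpty()) {
                handler.messageReceived(line);   // straight through the chain
            }
        }
    }
}
```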
Re: ConcurrentLinkedQueue vs MpscChunkedArrayQueue
Maybe I misunderstood what I saw in the code. I saw 10 (not sure) places where ConcurrentLinkedQueue was used; one of them was for the connections, which for this case wouldn't make a difference.

The other places are for handling the received frame/message? Am I correct here? That's where I believe it would make a difference, where a single connection can have potentially hundreds of frames to be "handled" (by a handler).

Isn't that a good place to introduce MpSc? If such a place has 1 consumer thread pulling from such a queue and then delegating to a handler.

On Sat, Oct 15, 2016 at 2:54 PM, Guido Medina wrote:
> The connections count is usually "finite" (not worth the effort), but the queue for packets, isn't it also a ConcurrentLinkedQueue?
> I'm not sure how MINA core stores the packets received before they are passed to their handler.
>
> On Sat, Oct 15, 2016 at 2:27 PM, Emmanuel Lecharny wrote:
>> On Sat, Oct 15, 2016 at 1:20 PM, Guido Medina wrote:
>>> Hi,
>>>
>>> I was looking at the MINA core source code and I noticed events are published to a ConcurrentLinkedQueue, so here are my questions and suggestions:
>>>
>>> - Does ConcurrentLinkedQueue for these cases use the pattern of *Multiple Producer/Single Consumer* (MpSc) or *Multiple Producer/Multiple Consumer* (MpMc)?
>>
>> MpMc.
>>
>>> - For low latency applications (in my case I'm talking QuickFixJ for the financial industry), would it benefit from an MpSc queue that has a low memory footprint (more like low GC footprint)?
>>>
>>> If that is the case I would shade the JCTools dependency and use the queue:
>>> https://github.com/JCTools/JCTools/blob/master/jctools-core/src/main/java/org/jctools/queues/MpscChunkedArrayQueue.java
>>>
>>> Such a queue uses ring buffers (power of two arrays) and links them if they need to expand, which is great for theoretically unbounded queues, but with the benefit of not using linked nodes per element but linked arrays.
>>>
>>> Recently Netty replaced their non-blocking linked queues with that one.
>>
>> That is an option.
>>
>> Now, I would say that for an application requiring low latency, basing it on top of NIO makes little sense, considering the extra cost compared to a blocking IO solution (and we are talking about a 30% performance penalty, at least).
>>
>> Do you need to handle potentially millions of connections?
Re: ConcurrentLinkedQueue vs MpscChunkedArrayQueue
The connections count is usually "finite" (not worth the effort), but the queue for packets, isn't it also a ConcurrentLinkedQueue?
I'm not sure how MINA core stores the packets received before they are passed to their handler.

On Sat, Oct 15, 2016 at 2:27 PM, Emmanuel Lecharny wrote:
> On Sat, Oct 15, 2016 at 1:20 PM, Guido Medina wrote:
>> Hi,
>>
>> I was looking at the MINA core source code and I noticed events are published to a ConcurrentLinkedQueue, so here are my questions and suggestions:
>>
>> - Does ConcurrentLinkedQueue for these cases use the pattern of *Multiple Producer/Single Consumer* (MpSc) or *Multiple Producer/Multiple Consumer* (MpMc)?
>
> MpMc.
>
>> - For low latency applications (in my case I'm talking QuickFixJ for the financial industry), would it benefit from an MpSc queue that has a low memory footprint (more like low GC footprint)?
>>
>> If that is the case I would shade the JCTools dependency and use the queue:
>> https://github.com/JCTools/JCTools/blob/master/jctools-core/src/main/java/org/jctools/queues/MpscChunkedArrayQueue.java
>>
>> Such a queue uses ring buffers (power of two arrays) and links them if they need to expand, which is great for theoretically unbounded queues, but with the benefit of not using linked nodes per element but linked arrays.
>>
>> Recently Netty replaced their non-blocking linked queues with that one.
>
> That is an option.
>
> Now, I would say that for an application requiring low latency, basing it on top of NIO makes little sense, considering the extra cost compared to a blocking IO solution (and we are talking about a 30% performance penalty, at least).
>
> Do you need to handle potentially millions of connections?
Re: ConcurrentLinkedQueue vs MpscChunkedArrayQueue
On Sat, Oct 15, 2016 at 1:20 PM, Guido Medina wrote:
> Hi,
>
> I was looking at the MINA core source code and I noticed events are published to a ConcurrentLinkedQueue, so here are my questions and suggestions:
>
> - Does ConcurrentLinkedQueue for these cases use the pattern of *Multiple Producer/Single Consumer* (MpSc) or *Multiple Producer/Multiple Consumer* (MpMc)?

MpMc.

> - For low latency applications (in my case I'm talking QuickFixJ for the financial industry), would it benefit from an MpSc queue that has a low memory footprint (more like low GC footprint)?
>
> If that is the case I would shade the JCTools dependency and use the queue:
> https://github.com/JCTools/JCTools/blob/master/jctools-core/src/main/java/org/jctools/queues/MpscChunkedArrayQueue.java
>
> Such a queue uses ring buffers (power of two arrays) and links them if they need to expand, which is great for theoretically unbounded queues, but with the benefit of not using linked nodes per element but linked arrays.
>
> Recently Netty replaced their non-blocking linked queues with that one.

That is an option.

Now, I would say that for an application requiring low latency, basing it on top of NIO makes little sense, considering the extra cost compared to a blocking IO solution (and we are talking about a 30% performance penalty, at least).

Do you need to handle potentially millions of connections?
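The "ring buffer, power-of-two array" idea behind those JCTools queues can be illustrated with a deliberately simplified single-producer/single-consumer ring. This is a sketch of the indexing scheme only, not a drop-in MpSc queue: the real JCTools versions add CAS on the producer index, proper memory ordering, padding against false sharing, and (for the chunked variant) linking of arrays on expansion.

```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified SPSC ring buffer showing the array-based, allocation-free
// design: one pre-allocated power-of-two array, monotonic producer and
// consumer indices, and bit-masking instead of modulo. No node objects
// are ever allocated per element, which is the GC win discussed above.
final class SpscRing<E> {
    private final Object[] buffer;
    private final int mask;                    // capacity - 1, capacity is 2^n
    private final AtomicLong producerIndex = new AtomicLong();
    private final AtomicLong consumerIndex = new AtomicLong();

    SpscRing(int capacityPowerOfTwo) {
        buffer = new Object[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    boolean offer(E e) {
        long p = producerIndex.get();
        if (p - consumerIndex.get() == buffer.length) {
            return false;                      // full: no exception, no growth
        }
        buffer[(int) (p & mask)] = e;          // slots are reused forever
        producerIndex.set(p + 1);
        return true;
    }

    @SuppressWarnings("unchecked")
    E poll() {
        long c = consumerIndex.get();
        if (c == producerIndex.get()) {
            return null;                       // empty: no exception either
        }
        E e = (E) buffer[(int) (c & mask)];
        buffer[(int) (c & mask)] = null;       // allow GC of the element
        consumerIndex.set(c + 1);
        return e;
    }
}
```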