Jason,

As you know, with non-blocking I/O a small number of threads services a
potentially large number of connections (as opposed to blocking I/O, with one
thread per connection).  Synchronization can therefore have a dramatic effect
when scaling up if there is any possibility of lock contention across threads
where one of the contending threads is an I/O thread.  All connections
serviced by that I/O thread are affected by such a scenario.

We saw active connections serviced by the same I/O thread pause throughput
while processing the request to close other sessions serviced by the same
I/O thread.  Eliminating the lock prevented the blocking behavior.  With a
small number of connections or low throughput, the pause may not be
noticeable, but at high load it became much more obvious.

The patch uses atomics to preserve the semantics of starting the I/O thread
when the first connection is accepted and shutting it down when the last
connection is closed, but without acquiring the synchronization lock for each
connection in between.  It deals with the race where two or more connections
might attempt to start up or shut down the I/O thread at the same time, and
picks a winner atomically.
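To illustrate the pattern only, here is a minimal sketch of compare-and-set
winner-picking (hypothetical names throughout; the actual patch on
DIRMINA-819 differs in detail and handles shutdown from within the processor
loop itself):

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of a lock-free startup pattern (not the MINA code itself).
// A single compareAndSet picks exactly one winner when many connection
// threads race to start the processor; the losers simply proceed.
public class ProcessorStarter {

    private final AtomicReference<Runnable> processorRef = new AtomicReference<>();

    // Called for every added/removed session; no monitor is acquired.
    // Returns true only for the one thread whose CAS wins the race.
    public boolean startupProcessor() {
        if (processorRef.get() != null) {
            return false; // already running; cheap volatile read, no lock
        }
        Runnable candidate = () -> { /* select() loop would run here */ };
        if (processorRef.compareAndSet(null, candidate)) {
            // Winner: would hand the processor to an executor here.
            return true;
        }
        return false; // another thread won the race concurrently
    }

    // Called when the last session is removed; clearing the reference
    // allows a later connection to start a fresh processor.
    public void shutdownProcessor() {
        processorRef.set(null);
    }
}
```

The same CAS-picks-a-winner idea applies symmetrically to the acceptor and
connector startup paths.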

Kind Regards,
John Fallows

On Tue, Feb 15, 2011 at 10:29 AM, Jason Weinstein <[email protected]> wrote:

> John,
>
> I just looked at the jira and it looks like your patch was applied in the
> next branch.
>
> If you have time, I'd be interested in a more detailed explanation of the
> issue and in how your solution solved the problem. Mostly I'm interested in
> whether your solution could be applied in other contexts where similar
> problems exist. Basically, if there is some sort of pattern in there,
> perhaps something to learn.
>
> Thanks,
>
> Jason
>
>
> On 2/14/2011 7:19 PM, John Fallows wrote:
>
>> Hi Emmanuel,
>>
>> On Mon, Feb 14, 2011 at 6:30 PM, Emmanuel Lecharny <[email protected]> wrote:
>>
>>> On 2/15/11 2:35 AM, John Fallows wrote:
>>>
>>>> Folks,
>>>>
>>> Hi,
>>>
>>> first, if you think that you have found a bug, the best, really, is to
>>> create a JIRA. We can discuss the ins and outs of the problem in the JIRA;
>>> at least, nothing will get lost. The mailing list, on the other hand, is a
>>> perfect way to lose information :)
>>>
>>> Don't be afraid : if there is no bug, we simply close the JIRA, nothing
>>> more.
>>
>> No problem - in the past I've seen folks want it confirmed on the list
>> before filing JIRA.
>>
>>>> We have discovered some blocking behavior in the Mina
>>>> AbstractPollingIoProcessor that is triggered while adding and removing
>>>> connections.
>>>>
>>> Adding a connection first triggers the Acceptor, which creates the socket
>>> and adds it to an IoProcessor. This is just a matter of registering a
>>> SelectionKey with the IoProcessor selector. As soon as the SelectionKey is
>>> added, it generates an event which interrupts the worker's select() call,
>>> and then the newly added session is processed.
>>>
>>> Closing the connection does exactly the same thing: trigger an event and
>>> wake up the worker.
>>>
>>>> This class manages an internal worker that must be started when the first
>>>> connection is added, and stopped when the last connection is removed.
>>>>
>>> The worker is started *before* the first connection reaches it, AFAICT.
>>> And when the last connection has been closed, the worker is not killed
>>> unless you specifically require its disposal. It just waits for new
>>> incoming connections.
>>>
>>>> The code achieves this by using a synchronized block in startupProcessor()
>>>> as follows:
>>>>
>>>>    public final void remove(T session) {
>>>>        scheduleRemove(session);
>>>>        startupProcessor();
>>>>    }
>>>>
>>>>    private void scheduleRemove(T session) {
>>>>        removingSessions.add(session);
>>>>    }
>>>>
>>>>    private void startupProcessor() {
>>>>        synchronized (lock) {
>>>>            if (processor == null) {
>>>>                processor = new Processor();
>>>>                executor.execute(new NamePreservingRunnable(processor,
>>>>                        threadName));
>>>>            }
>>>>        }
>>>>
>>>>        // Just stop the select() and start it again, so that the
>>>> processor
>>>>        // can be activated immediately.
>>>>        wakeup();
>>>>    }
>>>>
>>>> Each call to session.close() triggers the "filterClose" event on the
>>>> filter
>>>> chain, ending in a call to removeSession (shown above) where the
>>>> synchronized lock is obtained to verify that the processor is running in
>>>> order to close the connection.  When a large number of connections are
>>>> closed at the same time, they will contend for the synchronized lock.
>>>>
>>> The lock is only held for the time it takes to check a value for nullity.
>>> It's *extremely* fast. I would not say that there is a potential
>>> contention issue here. You'd probably need tens of thousands of
>>> connections closing at the same time to see the consequences of such a
>>> lock.
>>
>> Agreed.  We did see significant impact when such collisions do occur.
>>
>>>> Similar behavior occurs when new connections are established via addSession
>>>> (not shown here).  Both removeSession and addSession synchronize on the
>>>> same
>>>> lock, so they also contend with each other as connections come and go.
>>>>
>>> I have to check the code to see if the IoProcessors can't be started
>>> *before* the Acceptor accepts incoming connections. That would eliminate
>>> the cost of creating new IoProcessors in a contended section of the code.
>>
>> Yes, and a similar solution would also be needed for Connectors.
>>
>>>> If you agree that this is an issue and add it to your issue tracker, then we
>>>> would be happy to upload a patch with a suggested solution using atomic
>>>> data
>>>> structures instead for your consideration.
>>>>
>>> Please, feel free to do so! This is a community effort; we don't expect
>>> to be the best coders on earth!
>>>
>>>> Note that we have found similar behavior in AbstractPollingIoAcceptor and
>>>> AbstractPollingIoConnector, and will gladly include suggested patches for
>>>> those as well.
>>>>
>>> Which makes sense, as they are symmetrical.
>>>
>>> Thanks for your investigation, I'm waiting for the JIRA.
>>>
>> Sure thing - https://issues.apache.org/jira/browse/DIRMINA-819.
>>
>> A patch is attached to DIRMINA-819 describing the proposed changes to
>> preserve the existing semantics using atomics and eliminate the lock.
>>
>> Hope this is helpful,
>> -john.
>>
>


-- 
>|< Kaazing Corporation >|<
John Fallows | CTO | +1.650.960.8148
444 Castro St, Suite 1100 | Mountain View, CA 94041, USA
