Re: sshd-core: dealock

2016-12-05 Thread Markus Rathgeb
Hello again,

I am still trying to create a minimal example to reproduce the problem.

If I am using only 1 NIO worker on server side it is easy to reproduce.
So, I assume there must be at least two workers, that there is one
that always could call the notifyAll method to wake up all the
sleeping ones.
Can you give me some hints, how you ensure, that there is always one
worker free to notify the ones that call lock.wait?

A perhaps more readable (because of the line breaks) version of the
thread dump could be found here:
https://gist.github.com/maggu2810/4c8df61e95938102726376a044ca9a9e

Best regards,
Markus


sshd-core: dealock

2016-12-02 Thread Markus Rathgeb
Hello,

I have a project that is using sshd-core (+mina-core 2.0.16) on Client
and Server side.
I previously used 1.3.0 and switched to 1.4.0-SNAPSHOT. The problem (I
will describe shortly) is raised by both versions (not pre 1.3.0
version has been used before).

The clients logic is very simple:
* create a SshClient
* setup the client
* connect
* authenticate
* startRemotePortForwarding

The relevant server logic looks like:
* create a SshServer
* setup the server (e.g. accept all forwarding filter, set default
TCP/IP forwarder)
* start the server

Multiple clients connect to the server and the forwarded ports on the
server side are used by other tools to establish a tunneled connection
to the clients.

FYI: The whole stuff is running in a Karaf instance.

This works most of the time...
But not always.

Sometimes no data seems to be transmitted to the tunnels anymore.
The server seems to be run into a deadlock.

I created a stack trace (attached at the end) of the server and
connected a debugging session at the time no data is transmitted
anymore. It shows that all NIO workers resides in the "lock.wait()"
statement of the AbstractClientChanne's waitFor method.

If I disconnect the respective client  by closing the session on the
server side (I created a Karaf command line command) then the handling
is working again.

I know that the best would be to have a minimal example or an unit
test, but I cannot reproduce the scenario if e.g. client and server
are on the same network. I assume it is very timing dependent.

Could someone with more knowledge about the channel and lock handling
have a look at?
Who is responsible to call lock.notify[All]() is all NIO workers are waiting?

Thanks you guys!


The deadlock looks like:
===
"sshd-SshServer[67de991c]-nio2-thread-3" - Thread t@125

java.lang.Thread.State: TIMED_WAITING

at java.lang.Object.wait(Native Method)

- waiting on <132c6b60> (a java.lang.Object)

at 
org.apache.sshd.client.channel.AbstractClientChannel.waitFor(AbstractClientChannel.java:244)

at 
org.apache.sshd.common.forward.DefaultTcpipForwarder$StaticIoHandler.messageReceived(DefaultTcpipForwarder.java:984)

at 
org.apache.sshd.common.io.nio2.Nio2Session.handleReadCycleCompletion(Nio2Session.java:276)

at 
org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:256)

at 
org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:253)

at 
org.apache.sshd.common.io.nio2.Nio2CompletionHandler.lambda$completed$0(Nio2CompletionHandler.java:38)

at 
org.apache.sshd.common.io.nio2.Nio2CompletionHandler$$Lambda$45/1071326492.run(Unknown
Source)

at java.security.AccessController.doPrivileged(Native Method)

at 
org.apache.sshd.common.io.nio2.Nio2CompletionHandler.completed(Nio2CompletionHandler.java:37)

at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126)

at sun.nio.ch.Invoker$2.run(Invoker.java:218)

at 
sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)

at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)


Locked ownable synchronizers:

- locked <6d02d7ed> (a java.util.concurrent.ThreadPoolExecutor$Worker)


"sshd-SshServer[67de991c]-nio2-thread-2" - Thread t@124

java.lang.Thread.State: TIMED_WAITING

at java.lang.Object.wait(Native Method)

- waiting on <7e9f4eff> (a java.lang.Object)

at 
org.apache.sshd.client.channel.AbstractClientChannel.waitFor(AbstractClientChannel.java:244)

at 
org.apache.sshd.common.forward.DefaultTcpipForwarder$StaticIoHandler.messageReceived(DefaultTcpipForwarder.java:984)

at 
org.apache.sshd.common.io.nio2.Nio2Session.handleReadCycleCompletion(Nio2Session.java:276)

at 
org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:256)

at 
org.apache.sshd.common.io.nio2.Nio2Session$1.onCompleted(Nio2Session.java:253)

at 
org.apache.sshd.common.io.nio2.Nio2CompletionHandler.lambda$completed$0(Nio2CompletionHandler.java:38)

at 
org.apache.sshd.common.io.nio2.Nio2CompletionHandler$$Lambda$45/1071326492.run(Unknown
Source)

at java.security.AccessController.doPrivileged(Native Method)

at 
org.apache.sshd.common.io.nio2.Nio2CompletionHandler.completed(Nio2CompletionHandler.java:37)

at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126)

at sun.nio.ch.Invoker$2.run(Invoker.java:218)

at 
sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)

at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)


Locked ownable synchronizers:

- locked <35fbf3e8> (a java.util.concurrent.ThreadPoolExecutor$Worker)


"sshd-SshServer[67de991c]-nio2-thread-1" - Thread t@122

java.lang.Thread.State: TIMED_WAITING

at java.lang.Object.wait(Native Method)

- waiting on