[jira] [Commented] (IGNITE-3412) Client instance hangs on close

2016-08-26 Thread Denis Magda (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15440280#comment-15440280
 ] 

Denis Magda commented on IGNITE-3412:
-

Hell, now I see why we can't reproduce this on our side.

> Client instance hangs on close
> --
>
> Key: IGNITE-3412
> URL: https://issues.apache.org/jira/browse/IGNITE-3412
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 1.6
>Reporter: Alexei Scherbakov
>Assignee: Alexei Scherbakov
> Fix For: 1.8
>
> Attachments: SocketsTest.zip, threadDump.txt
>
>
> In some cases calling close on Ignite client instance will lead to deadlock.
> The deadlock happens because of the following.
> Socket writer is waiting for new messages.
> {code}
> "tcp-client-disco-sock-writer-#2%null%" #100 prio=6 os_prio=0 
> tid=0x5fad2800 nid=0x13bc in Object.wait() [0x67d0e000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:502)
>   at 
> org.apache.ignite.spi.discovery.tcp.ClientImpl$SocketWriter.body(ClientImpl.java:1051)
>   - locked <0x863da2f8> (a java.lang.Object)
>   at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
> {code}
> The closing process is hanging because TcpDiscoverySPI waits while this 
> writer is terminated
> {code}
> "Thread-6" #29 prio=6 os_prio=0 tid=0x5a74 nid=0x17e8 in 
> Object.wait() [0x6077e000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Thread.join(Thread.java:1245)
>   - locked <0x863da010> (a 
> org.apache.ignite.spi.discovery.tcp.ClientImpl$SocketWriter)
>   at java.lang.Thread.join(Thread.java:1319)
>   at 
> org.apache.ignite.internal.util.IgniteUtils.join(IgniteUtils.java:4476)
>   at 
> org.apache.ignite.spi.discovery.tcp.ClientImpl.spiStop(ClientImpl.java:295)
>   at 
> org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStop(TcpDiscoverySpi.java:1905)
>   at 
> org.apache.ignite.internal.managers.GridManagerAdapter.stopSpi(GridManagerAdapter.java:325)
>   at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.stop(GridDiscoveryManager.java:1336)
>   at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:1940)
>   at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:1812)
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2248)
>   - locked <0x858e77a8> (a 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance)
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2211)
>   at org.apache.ignite.internal.IgnitionEx.stop(IgnitionEx.java:322)
>   at org.apache.ignite.Ignition.stop(Ignition.java:224)
>   at org.apache.ignite.internal.IgniteKernal.close(IgniteKernal.java:2921)
>   at ru.sbrf.ggcod.loader.job.MainLoader.run(MainLoader.java:123)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> There is some raise that led to the situation when the writer is hanging on 
> {{Object.wait}} method ignoring interrupted flag that was set at some point 
> of time.
> The full thread dump is attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (IGNITE-3412) Client instance hangs on close

2016-07-29 Thread Yakov Zhdanov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15399009#comment-15399009
 ] 

Yakov Zhdanov commented on IGNITE-3412:
---

Guys this looks really weird.

Can you please try this class with debug output?

{noformat}
/**
 *
 */
private class SocketWriter extends IgniteSpiThread {
/** */
private final Object mux = new Object();

/** */
private Socket sock;

/** */
private boolean clientAck;

/** */
private final Queue queue = new 
ArrayDeque<>();

/** */
private final long socketTimeout;

/** */
private TcpDiscoveryAbstractMessage unackedMsg;

/**
 *
 */
protected SocketWriter() {
super(spi.ignite().name(), "tcp-client-disco-sock-writer", log);

socketTimeout = spi.failureDetectionTimeoutEnabled() ? 
spi.failureDetectionTimeout() :
spi.getSocketTimeout();
}

/**
 * @param msg Message.
 */
private void sendMessage(TcpDiscoveryAbstractMessage msg) {
synchronized (mux) {
queue.add(msg);

mux.notifyAll();
}
}

/**
 * @param sock Socket.
 * @param clientAck {@code True} is server supports client message 
acknowlede.
 */
private void setSocket(Socket sock, boolean clientAck) {
synchronized (mux) {
this.sock = sock;

this.clientAck = clientAck;

unackedMsg = null;

mux.notifyAll();
}
}

/**
 * @return {@code True} if connection is alive.
 */
public boolean isOnline() {
synchronized (mux) {
return sock != null;
}
}

/**
 * @param res Acknowledge response.
 */
void ackReceived(TcpDiscoveryClientAckResponse res) {
synchronized (mux) {
if (unackedMsg != null) {
assert unackedMsg.id().equals(res.messageId()) : unackedMsg;

unackedMsg = null;
}

mux.notifyAll();
}
}

/** {@inheritDoc} */
@Override public void interrupt() {
super.interrupt();

U.debug(log, "SocketWriter has been interrupted: " + 
System.identityHashCode(this));
}

/** {@inheritDoc} */
@Override protected void body() throws InterruptedException {
TcpDiscoveryAbstractMessage msg = null;

while (!Thread.currentThread().isInterrupted()) {
Socket sock;

U.debug(log, "1 SocketWriter thread interrupted status " +
"[status=" + Thread.currentThread().isInterrupted() +
", hash=" + System.identityHashCode(this) + ']');

synchronized (mux) {
sock = this.sock;

if (sock == null) {
mux.wait();

continue;
}

if (msg == null)
msg = queue.poll();

if (msg == null) {
mux.wait();

continue;
}
}

U.debug(log, "2 SocketWriter thread interrupted status " +
"[status=" + Thread.currentThread().isInterrupted() +
", hash=" + System.identityHashCode(this) + ']');

for (IgniteInClosure msgLsnr : 
spi.sndMsgLsnrs)
msgLsnr.apply(msg);

boolean ack = clientAck && !(msg instanceof 
TcpDiscoveryPingResponse);

try {
if (ack) {
synchronized (mux) {
assert unackedMsg == null : "Unacked=" + unackedMsg 
+ ", received=" + msg;

unackedMsg = msg;
}
}

U.debug(log, "3 SocketWriter thread interrupted status " +
"[status=" + Thread.currentThread().isInterrupted() +
", hash=" + System.identityHashCode(this) + ']');

spi.writeToSocket(
sock,
msg,
socketTimeout);

msg = null;

U.debug(log, "4 SocketWriter thread interrupted status " +
"[status=" + Thread.currentThread().isInterrupted() +
", hash=" + System.identityHashCode(this) + ']');


if (ack) {
long waitEnd = U.currentTimeMillis() + 
(spi.failureDetectionTimeoutEnabled() ?

[jira] [Commented] (IGNITE-3412) Client instance hangs on close

2016-07-18 Thread Alexei Scherbakov (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15382619#comment-15382619
 ] 

Alexei Scherbakov commented on IGNITE-3412:
---

I did some work testing clearing of interrupted stated on socketexception and 
found out what it's not true.
Will try to debug the case later.

> Client instance hangs on close
> --
>
> Key: IGNITE-3412
> URL: https://issues.apache.org/jira/browse/IGNITE-3412
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 1.6
>Reporter: Alexei Scherbakov
>Assignee: Alexei Scherbakov
> Fix For: 1.7
>
> Attachments: SocketsTest.zip, threadDump.txt
>
>
> In some cases calling close on Ignite client instance will lead to deadlock.
> The deadlock happens because of the following.
> Socket writer is waiting for new messages.
> {code}
> "tcp-client-disco-sock-writer-#2%null%" #100 prio=6 os_prio=0 
> tid=0x5fad2800 nid=0x13bc in Object.wait() [0x67d0e000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:502)
>   at 
> org.apache.ignite.spi.discovery.tcp.ClientImpl$SocketWriter.body(ClientImpl.java:1051)
>   - locked <0x863da2f8> (a java.lang.Object)
>   at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
> {code}
> The closing process is hanging because TcpDiscoverySPI waits while this 
> writer is terminated
> {code}
> "Thread-6" #29 prio=6 os_prio=0 tid=0x5a74 nid=0x17e8 in 
> Object.wait() [0x6077e000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Thread.join(Thread.java:1245)
>   - locked <0x863da010> (a 
> org.apache.ignite.spi.discovery.tcp.ClientImpl$SocketWriter)
>   at java.lang.Thread.join(Thread.java:1319)
>   at 
> org.apache.ignite.internal.util.IgniteUtils.join(IgniteUtils.java:4476)
>   at 
> org.apache.ignite.spi.discovery.tcp.ClientImpl.spiStop(ClientImpl.java:295)
>   at 
> org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStop(TcpDiscoverySpi.java:1905)
>   at 
> org.apache.ignite.internal.managers.GridManagerAdapter.stopSpi(GridManagerAdapter.java:325)
>   at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.stop(GridDiscoveryManager.java:1336)
>   at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:1940)
>   at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:1812)
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2248)
>   - locked <0x858e77a8> (a 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance)
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2211)
>   at org.apache.ignite.internal.IgnitionEx.stop(IgnitionEx.java:322)
>   at org.apache.ignite.Ignition.stop(Ignition.java:224)
>   at org.apache.ignite.internal.IgniteKernal.close(IgniteKernal.java:2921)
>   at ru.sbrf.ggcod.loader.job.MainLoader.run(MainLoader.java:123)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> There is some raise that led to the situation when the writer is hanging on 
> {{Object.wait}} method ignoring interrupted flag that was set at some point 
> of time.
> The full thread dump is attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (IGNITE-3412) Client instance hangs on close

2016-07-11 Thread Denis Magda (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15370436#comment-15370436
 ] 

Denis Magda commented on IGNITE-3412:
-

[~ascherbakov],

According to the logs and thread dumps there shouldn't be any 
{{InterruptedIOException}} and the socket should have been closed by 
{{SocketReader}}. After that {{SocketWriter}} received {{SocketException}}. The 
question is why interrupted flag was reset after {{SocketWriter}} received this 
kind of exception. 

Please keep debugging performing the following:
- enable {{DEBUG}} level logging for {{ClientImpl}} and {{TcpDiscoverySpi}};
- run SocketTest.zip test on the target machine. First start {{Second}} 
executable and after that {{First}} executable. Looks like that Windows reset 
the interrupted flag on {{SocketException}}. Need to double check.

> Client instance hangs on close
> --
>
> Key: IGNITE-3412
> URL: https://issues.apache.org/jira/browse/IGNITE-3412
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 1.6
>Reporter: Alexei Scherbakov
>Assignee: Denis Magda
> Fix For: 1.7
>
> Attachments: SocketsTest.zip, threadDump.txt
>
>
> In some cases calling close on Ignite client instance will lead to deadlock.
> The deadlock happens because of the following.
> Socket writer is waiting for new messages.
> {code}
> "tcp-client-disco-sock-writer-#2%null%" #100 prio=6 os_prio=0 
> tid=0x5fad2800 nid=0x13bc in Object.wait() [0x67d0e000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:502)
>   at 
> org.apache.ignite.spi.discovery.tcp.ClientImpl$SocketWriter.body(ClientImpl.java:1051)
>   - locked <0x863da2f8> (a java.lang.Object)
>   at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
> {code}
> The closing process is hanging because TcpDiscoverySPI waits while this 
> writer is terminated
> {code}
> "Thread-6" #29 prio=6 os_prio=0 tid=0x5a74 nid=0x17e8 in 
> Object.wait() [0x6077e000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Thread.join(Thread.java:1245)
>   - locked <0x863da010> (a 
> org.apache.ignite.spi.discovery.tcp.ClientImpl$SocketWriter)
>   at java.lang.Thread.join(Thread.java:1319)
>   at 
> org.apache.ignite.internal.util.IgniteUtils.join(IgniteUtils.java:4476)
>   at 
> org.apache.ignite.spi.discovery.tcp.ClientImpl.spiStop(ClientImpl.java:295)
>   at 
> org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStop(TcpDiscoverySpi.java:1905)
>   at 
> org.apache.ignite.internal.managers.GridManagerAdapter.stopSpi(GridManagerAdapter.java:325)
>   at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.stop(GridDiscoveryManager.java:1336)
>   at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:1940)
>   at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:1812)
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2248)
>   - locked <0x858e77a8> (a 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance)
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2211)
>   at org.apache.ignite.internal.IgnitionEx.stop(IgnitionEx.java:322)
>   at org.apache.ignite.Ignition.stop(Ignition.java:224)
>   at org.apache.ignite.internal.IgniteKernal.close(IgniteKernal.java:2921)
>   at ru.sbrf.ggcod.loader.job.MainLoader.run(MainLoader.java:123)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> There is some raise that led to the situation when the writer is hanging on 
> {{Object.wait}} method ignoring interrupted flag that was set at some point 
> of time.
> The full thread dump is attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (IGNITE-3412) Client instance hangs on close

2016-07-08 Thread Denis Magda (JIRA)

[ 
https://issues.apache.org/jira/browse/IGNITE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368442#comment-15368442
 ] 

Denis Magda commented on IGNITE-3412:
-

[~ascherbakov], seems that I figured out why the client hangs and the 
interrupted flag is ignored. The reason is that we ignore 
{{InterruptedIOException}} in the loop of {{SocketWriter}}. Basically the 
following can happen:

1) {{SocketWriter}} is writing a chunk of data using {{spi.writeToSocket(sock, 
msg, socketTimeout)}};
2) {{Ignite}} is being closed and {{interrupt}} is executed for 
{{SocketWriter}} thread;
3) underlying I/O code that is executed by {{spi.writeToSocket(sock, msg, 
socketTimeout)}} sees that the thread has been interrupted and throws 
{{InterruptedIOExceptio}} and not {{InterruptedException}}!!! When 
{{InterruptedIOException}} is thrown the interrupted flag is cleared; 
4) since {{InterruptedIOException}} is just a kind of {{IOException}} then 
{{SocketWriter}} catches this exception, closes the socket, checks that the 
Thread is not interrupted and hangs at {{mux.wait()}}.

To fix the issue we must threat {{InterruptedIOException}} differently. Not 
only {{SocketWriter}} but other similar parts of the platform must be fixed. 
I'm assigning the ticket on myself and will finalize it in the nearest time.

BTW, do you have the logs at hands? I remember you were telling that before 
{{SocketWriter}} moved to the hanging state it threw some kind of IO Exception. 
What was that?

> Client instance hangs on close
> --
>
> Key: IGNITE-3412
> URL: https://issues.apache.org/jira/browse/IGNITE-3412
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 1.6
>Reporter: Alexei Scherbakov
>Assignee: Alexei Scherbakov
> Fix For: 1.7
>
> Attachments: threadDump.txt
>
>
> In some cases calling close on Ignite client instance will lead to deadlock.
> The deadlock happens because of the following.
> Socket writer is waiting for new messages.
> {code}
> "tcp-client-disco-sock-writer-#2%null%" #100 prio=6 os_prio=0 
> tid=0x5fad2800 nid=0x13bc in Object.wait() [0x67d0e000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:502)
>   at 
> org.apache.ignite.spi.discovery.tcp.ClientImpl$SocketWriter.body(ClientImpl.java:1051)
>   - locked <0x863da2f8> (a java.lang.Object)
>   at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
> {code}
> The closing process is hanging because TcpDiscoverySPI waits while this 
> writer is terminated
> {code}
> "Thread-6" #29 prio=6 os_prio=0 tid=0x5a74 nid=0x17e8 in 
> Object.wait() [0x6077e000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Thread.join(Thread.java:1245)
>   - locked <0x863da010> (a 
> org.apache.ignite.spi.discovery.tcp.ClientImpl$SocketWriter)
>   at java.lang.Thread.join(Thread.java:1319)
>   at 
> org.apache.ignite.internal.util.IgniteUtils.join(IgniteUtils.java:4476)
>   at 
> org.apache.ignite.spi.discovery.tcp.ClientImpl.spiStop(ClientImpl.java:295)
>   at 
> org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStop(TcpDiscoverySpi.java:1905)
>   at 
> org.apache.ignite.internal.managers.GridManagerAdapter.stopSpi(GridManagerAdapter.java:325)
>   at 
> org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.stop(GridDiscoveryManager.java:1336)
>   at org.apache.ignite.internal.IgniteKernal.stop0(IgniteKernal.java:1940)
>   at org.apache.ignite.internal.IgniteKernal.stop(IgniteKernal.java:1812)
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop0(IgnitionEx.java:2248)
>   - locked <0x858e77a8> (a 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance)
>   at 
> org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.stop(IgnitionEx.java:2211)
>   at org.apache.ignite.internal.IgnitionEx.stop(IgnitionEx.java:322)
>   at org.apache.ignite.Ignition.stop(Ignition.java:224)
>   at org.apache.ignite.internal.IgniteKernal.close(IgniteKernal.java:2921)
>   at ru.sbrf.ggcod.loader.job.MainLoader.run(MainLoader.java:123)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> There is some raise that led to the situation when the writer is hanging on 
> {{Object.wait}} method ignoring interrupted flag that was set at some point 
> of time.
> The full thread dump is attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)