[jira] [Commented] (IGNITE-8942) In some cases grid cannot be deactivated because of hanging CQ internal cleanup.

2018-07-11 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540147#comment-16540147
 ] 

ASF GitHub Bot commented on IGNITE-8942:


Github user asfgit closed the pull request at:

https://github.com/apache/ignite/pull/4329


> In some cases grid cannot be deactivated because of hanging CQ internal cleanup.
>
> Key: IGNITE-8942
> URL: https://issues.apache.org/jira/browse/IGNITE-8942
> Project: Ignite
> Issue Type: Bug
> Reporter: Alexei Scherbakov
> Assignee: Alexei Scherbakov
> Priority: Major
> Fix For: 2.7
>
> Attachments: thread_dump_eip-server_2018-07-05-18-02.log
>
>
> See the attachment for the thread dump.
> Most probably caused by the discovery message worker blocking while it waits
> for a cluster state change:
> {noformat}
> "tcp-disco-msg-worker-#2%DPL_GRID%DplGridNodeName%" #380 daemon prio=10 os_prio=0 tid=0x7fe084c4c000 nid=0x39aa waiting on condition [0x7fdcd76f5000]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
>     at org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor.publicApiActiveState(GridClusterStateProcessor.java:193)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validateCache(GridDhtTopologyFutureAdapter.java:83)
>     at org.apache.ignite.internal.processors.cache.CacheMetricsImpl.isValidForOperation(CacheMetricsImpl.java:715)
>     at org.apache.ignite.internal.processors.cache.CacheMetricsImpl.isValidForReading(CacheMetricsImpl.java:724)
>     at org.apache.ignite.internal.processors.cache.CacheMetricsSnapshot.<init>(CacheMetricsSnapshot.java:334)
>     at org.apache.ignite.internal.processors.cache.GridCacheAdapter.localMetrics(GridCacheAdapter.java:3255)
>     at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$7.cacheMetrics(GridDiscoveryManager.java:1098)
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMetricsUpdateMessage(ServerImpl.java:5141)
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2794)
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2570)
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:6903)
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2657)
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>     at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:6847)
>     at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
> {noformat}
> Another problem:
> org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor#onDeActivate
> is called during exchange before transactions have completed, so CQ updates
> for transactions still in progress may be lost.
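
The blocking pattern described above can be reproduced outside Ignite. The sketch below is only an illustration under stated assumptions: `BlockedWorkerSketch`, `firstTaskStalls` and `stateChanged` are hypothetical names, a single-threaded executor stands in for the discovery message worker, and it is assumed that the event completing the future would arrive through the same worker. The first task waits (with a timeout, so the demo terminates) on a future that only the second queued task can complete.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical demo class; it does not use any Ignite code. */
final class BlockedWorkerSketch {
    /**
     * Returns true if the first task timed out while waiting for a future
     * that only a later task on the same single-threaded worker completes.
     */
    static boolean firstTaskStalls(long waitMillis) {
        // One thread, like a single message worker draining a queue.
        ExecutorService worker = Executors.newSingleThreadExecutor();
        CompletableFuture<Boolean> stateChanged = new CompletableFuture<>();
        try {
            // Task 1: stands in for the metrics handler that waits for the state change.
            Future<Boolean> metricsTask = worker.submit(() -> {
                try {
                    stateChanged.get(waitMillis, TimeUnit.MILLISECONDS);
                    return false; // the future completed in time
                } catch (TimeoutException e) {
                    return true;  // the worker was stuck for the whole wait
                }
            });
            // Task 2: would complete the state change, but it is queued behind task 1.
            worker.execute(() -> stateChanged.complete(true));
            return metricsTask.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("first task stalled: " + firstTaskStalls(200));
    }
}
```

Without the timeout the first task would park indefinitely, which matches the WAITING (parking) state in the thread dump.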



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-8942) In some cases grid cannot be deactivated because of hanging CQ internal cleanup.

2018-07-11 Thread Sergey Chugunov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539968#comment-16539968
 ] 

Sergey Chugunov commented on IGNITE-8942:
-

[~ascherbakov],

The change looks good, please go ahead and merge it.

However, IMHO it doesn't fix the root cause of the issue; it only provides a 
workaround.

The issue is that the code collecting cache metrics synchronously waits for 
the cluster state transition to finish instead of returning immediately.
The *publicApiActiveState* method has a parameter *waitForTransition*, which 
is set to true on the code path shown in the attached stack trace.

We may create a ticket of Minor priority to figure out how to implement the 
correct fix: *waitForTransition* should be set to false when collecting 
cache metrics. After that everything should be good.
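
To make the proposal concrete, here is a minimal sketch of a state check with a *waitForTransition* flag. It is not Ignite's implementation: `ClusterStateSketch` and `beginTransition` are hypothetical, only the method and parameter names are borrowed from the comment above, and returning the last known state while a transition is in progress is an assumption made for the example.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical stand-in for a cluster state holder; not Ignite's actual class. */
final class ClusterStateSketch {
    /** Last fully applied state. */
    private volatile boolean active;

    /** Non-null while a state change is in progress; completes with the new state. */
    private volatile CompletableFuture<Boolean> transition;

    ClusterStateSketch(boolean active) {
        this.active = active;
    }

    /** Starts a transition; the caller completes the returned future with the new state. */
    CompletableFuture<Boolean> beginTransition() {
        CompletableFuture<Boolean> fut = new CompletableFuture<>();
        transition = fut;
        // When the transition finishes, publish the new state and clear the marker.
        fut.thenAccept(newState -> {
            active = newState;
            transition = null;
        });
        return fut;
    }

    /**
     * Returns the active flag. With waitForTransition == true the caller blocks
     * until a pending transition finishes; with false it returns the last known
     * state at once (an assumption of this sketch).
     */
    boolean publicApiActiveState(boolean waitForTransition) {
        CompletableFuture<Boolean> fut = transition;
        if (fut == null || !waitForTransition)
            return active;
        return fut.join(); // blocks the calling thread
    }

    public static void main(String[] args) {
        ClusterStateSketch state = new ClusterStateSketch(true);
        CompletableFuture<Boolean> fut = state.beginTransition(); // deactivation in progress
        System.out.println(state.publicApiActiveState(false)); // prints true, no waiting
        fut.complete(false);                                    // transition finished
        System.out.println(state.publicApiActiveState(true));  // prints false
    }
}
```

A metrics collector running on a message worker thread would call the non-blocking variant, so the worker keeps draining its queue while the transition is pending.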



[jira] [Commented] (IGNITE-8942) In some cases grid cannot be deactivated because of hanging CQ internal cleanup.

2018-07-10 Thread Alexei Scherbakov (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538455#comment-16538455
 ] 

Alexei Scherbakov commented on IGNITE-8942:
---

[~agoncharuk],

Please review.



[jira] [Commented] (IGNITE-8942) In some cases grid cannot be deactivated because of hanging CQ internal cleanup.

2018-07-07 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535774#comment-16535774
 ] 

ASF GitHub Bot commented on IGNITE-8942:


GitHub user ascherbakoff opened a pull request:

https://github.com/apache/ignite/pull/4329

IGNITE-8942

In some cases grid cannot be deactivated because of hanging CQ internal 
cleanup.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gridgain/apache-ignite ignite-8942

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/4329.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4329


commit 37a79c2d33ce17a8fa01f6205764f8099849e4b2
Author: Aleksei Scherbakov 
Date:   2018-07-06T17:25:31Z

IGNITE-8942 In some cases grid cannot be deactivated because of hanging CQ 
internal cleanup.

commit 49cd39caaa5ac29a88b2c0d3eae652f88feb3e5a
Author: ascherbakoff 
Date:   2018-07-07T14:40:58Z

IGNITE-8942 In some cases grid cannot be deactivated because of hanging CQ 
internal cleanup.




> In some cases grid cannot be deactivated because of hanging CQ internal cleanup.
>
> Key: IGNITE-8942
> URL: https://issues.apache.org/jira/browse/IGNITE-8942
> Project: Ignite
> Issue Type: Bug
> Reporter: Alexei Scherbakov
> Assignee: Alexei Scherbakov
> Priority: Major
> Fix For: 2.6
>
> Attachments: thread_dump_eip-server_2018-07-05-18-02.log
>
>


