[jira] [Commented] (IGNITE-8942) In some cases grid cannot be deactivated because of hanging CQ internal cleanup.
[ https://issues.apache.org/jira/browse/IGNITE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540147#comment-16540147 ]

ASF GitHub Bot commented on IGNITE-8942:

Github user asfgit closed the pull request at:

    https://github.com/apache/ignite/pull/4329

> In some cases grid cannot be deactivated because of hanging CQ internal cleanup.
>
>                 Key: IGNITE-8942
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8942
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexei Scherbakov
>            Assignee: Alexei Scherbakov
>            Priority: Major
>             Fix For: 2.7
>
>         Attachments: thread_dump_eip-server_2018-07-05-18-02.log
>
>
> See the attachment for thread dump.
> Most probably caused by blocking of message worker while waiting for cluster state change:
> {noformat}
> "tcp-disco-msg-worker-#2%DPL_GRID%DplGridNodeName%" #380 daemon prio=10 os_prio=0 tid=0x7fe084c4c000 nid=0x39aa waiting on condition [0x7fdcd76f5000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>         at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
>         at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
>         at org.apache.ignite.internal.processors.cluster.GridClusterStateProcessor.publicApiActiveState(GridClusterStateProcessor.java:193)
>         at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTopologyFutureAdapter.validateCache(GridDhtTopologyFutureAdapter.java:83)
>         at org.apache.ignite.internal.processors.cache.CacheMetricsImpl.isValidForOperation(CacheMetricsImpl.java:715)
>         at org.apache.ignite.internal.processors.cache.CacheMetricsImpl.isValidForReading(CacheMetricsImpl.java:724)
>         at org.apache.ignite.internal.processors.cache.CacheMetricsSnapshot.<init>(CacheMetricsSnapshot.java:334)
>         at org.apache.ignite.internal.processors.cache.GridCacheAdapter.localMetrics(GridCacheAdapter.java:3255)
>         at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$7.cacheMetrics(GridDiscoveryManager.java:1098)
>         at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMetricsUpdateMessage(ServerImpl.java:5141)
>         at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2794)
>         at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2570)
>         at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorker.body(ServerImpl.java:6903)
>         at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2657)
>         at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>         at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerThread.body(ServerImpl.java:6847)
>         at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
> {noformat}
> Another problem:
> org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor#onDeActivate
> is called during exchange before transactions have completed, having
> probability of losing CQ updates for current transactions.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (IGNITE-8942) In some cases grid cannot be deactivated because of hanging CQ internal cleanup.
[ https://issues.apache.org/jira/browse/IGNITE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16539968#comment-16539968 ]

Sergey Chugunov commented on IGNITE-8942:

[~ascherbakov],

The change looks good, please go ahead and merge it.

However, IMHO it does not fix the root cause of the issue and only provides a workaround.

The issue is that the code collecting cache metrics synchronously waits for the state transition to finish instead of returning immediately. The *publicApiActiveState* method has a parameter *waitForTransition*, which is set to true in the attached stack trace.

We may create a ticket of Minor priority to figure out how to implement the correct fix: *waitForTransition* should be set to false when collecting cache metrics. After that everything should be good.
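The difference Sergey Chugunov describes between waiting for a state transition and returning immediately can be sketched roughly as follows. This is a minimal illustration, not Ignite code: the class `ClusterStateSketch` and its methods `beginTransition`, `finishTransition` and `activeState` are hypothetical names, and returning the pre-transition state when `waitForTransition` is false is an assumption made for the example rather than the documented behaviour of `publicApiActiveState`.

```java
import java.util.concurrent.CompletableFuture;

/** Hypothetical stand-in for a cluster state holder; NOT Ignite's GridClusterStateProcessor. */
final class ClusterStateSketch {
    /** Last fully applied cluster state. */
    private volatile boolean active = true;

    /** Non-null while an activation or deactivation is in progress. */
    private volatile CompletableFuture<Boolean> transition;

    /** Marks the start of a state change. */
    void beginTransition() {
        transition = new CompletableFuture<>();
    }

    /** Applies the new state and releases every thread blocked in activeState(true). */
    void finishTransition(boolean newState) {
        CompletableFuture<Boolean> fut = transition;

        active = newState; // publish the new state before clearing the transition marker
        transition = null;

        if (fut != null)
            fut.complete(newState);
    }

    /**
     * Reads the cluster state.
     *
     * @param waitForTransition if true, parks the caller until the transition ends
     *      (the behaviour seen in the thread dump); if false, returns at once.
     */
    boolean activeState(boolean waitForTransition) {
        CompletableFuture<Boolean> fut = transition;

        if (fut == null)
            return active;

        if (waitForTransition)
            return fut.join(); // blocks the calling thread

        return active; // assumption for this sketch: report the pre-transition state
    }

    public static void main(String[] args) {
        ClusterStateSketch state = new ClusterStateSketch();

        state.beginTransition();

        // A metrics-style caller that must not block reads the state immediately.
        System.out.println("non-blocking read during transition: " + state.activeState(false));

        // A blocking caller only gets an answer once the transition has finished.
        CompletableFuture<Boolean> waiter = CompletableFuture.supplyAsync(() -> state.activeState(true));

        state.finishTransition(false);

        System.out.println("blocking read after transition: " + waiter.join());
    }
}
```

In this sketch a thread that must stay responsive, such as the discovery message worker in the thread dump, would pass false so that it never parks on the transition future.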
[jira] [Commented] (IGNITE-8942) In some cases grid cannot be deactivated because of hanging CQ internal cleanup.
[ https://issues.apache.org/jira/browse/IGNITE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538455#comment-16538455 ]

Alexei Scherbakov commented on IGNITE-8942:

[~agoncharuk],

Please review.
[jira] [Commented] (IGNITE-8942) In some cases grid cannot be deactivated because of hanging CQ internal cleanup.
[ https://issues.apache.org/jira/browse/IGNITE-8942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535774#comment-16535774 ]

ASF GitHub Bot commented on IGNITE-8942:

GitHub user ascherbakoff opened a pull request:

    https://github.com/apache/ignite/pull/4329

    IGNITE-8942 In some cases grid cannot be deactivated because of hanging CQ internal cleanup.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gridgain/apache-ignite ignite-8942

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/ignite/pull/4329.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4329

commit 37a79c2d33ce17a8fa01f6205764f8099849e4b2
Author: Aleksei Scherbakov
Date:   2018-07-06T17:25:31Z

    IGNITE-8942 In some cases grid cannot be deactivated because of hanging CQ internal cleanup.

commit 49cd39caaa5ac29a88b2c0d3eae652f88feb3e5a
Author: ascherbakoff
Date:   2018-07-07T14:40:58Z

    IGNITE-8942 In some cases grid cannot be deactivated because of hanging CQ internal cleanup.

> In some cases grid cannot be deactivated because of hanging CQ internal cleanup.
>
>                 Key: IGNITE-8942
>                 URL: https://issues.apache.org/jira/browse/IGNITE-8942
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Alexei Scherbakov
>            Assignee: Alexei Scherbakov
>            Priority: Major
>             Fix For: 2.6
>
>         Attachments: thread_dump_eip-server_2018-07-05-18-02.log
>
>
> See the attachment for thread dump.