[jira] [Commented] (IGNITE-4003) Slow or faulty client can stall the whole cluster.
[ https://issues.apache.org/jira/browse/IGNITE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803074#comment-16803074 ] Alexey Goncharuk commented on IGNITE-4003: -- [~ilantukh], are you still working on this? If you are not, can you please unassign the ticket?
> Slow or faulty client can stall the whole cluster.
> --
>
> Key: IGNITE-4003
> URL: https://issues.apache.org/jira/browse/IGNITE-4003
> Project: Ignite
> Issue Type: Bug
> Components: cache, general
> Affects Versions: 1.7
> Reporter: Vladimir Ozerov
> Assignee: Ilya Lantukh
> Priority: Critical
>
> Steps to reproduce:
> 1) Start two server nodes and add some data to cache.
> 2) Start a client from a Docker subnet which is not visible from the outside. The client will join the cluster.
> 3) Try to put something to cache, or start another node to force rebalance.
> The cluster is stuck at this moment. Root cause: the servers are constantly trying to establish an outgoing connection to the client, but fail because the Docker subnet is not visible from the outside. This can stop virtually all cluster operations.
> Typical thread dump:
> {code}
> org.apache.ignite.IgniteCheckedException: Failed to send message (node may have left the grid or TCP connection cannot be established due to firewall issues) [node=TcpDiscoveryNode [id=a15d74c2-1ec2-4349-9640-aeacd70d8714, addrs=[127.0.0.1, 172.17.0.6], sockAddrs=[/127.0.0.1:0, /127.0.0.1:0, /172.17.0.6:0], discPort=0, order=7241, intOrder=3707, lastExchangeTime=1474096941045, loc=false, ver=1.5.23#20160526-sha1:259146da, isClient=true], topic=T4 [topic=TOPIC_CACHE, id1=949732fd-1360-3a58-8d9e-0ff6ea6182cc, id2=a15d74c2-1ec2-4349-9640-aeacd70d8714, id3=2], msg=GridContinuousMessage [type=MSG_EVT_NOTIFICATION, routineId=7e13c48e-6933-48b2-9f15-8d92007930db, data=null, futId=null], policy=2]
>     at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1129) [ignite-core-1.5.23.jar:1.5.23]
>     at org.apache.ignite.internal.managers.communication.GridIoManager.sendOrderedMessage(GridIoManager.java:1347) [ignite-core-1.5.23.jar:1.5.23]
>     at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1227) ~[ignite-core-1.5.23.jar:1.5.23]
>     at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1198) ~[ignite-core-1.5.23.jar:1.5.23]
>     at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendWithRetries(GridContinuousProcessor.java:1180) ~[ignite-core-1.5.23.jar:1.5.23]
>     at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.sendNotification(GridContinuousProcessor.java:841) ~[ignite-core-1.5.23.jar:1.5.23]
>     at org.apache.ignite.internal.processors.continuous.GridContinuousProcessor.addNotification(GridContinuousProcessor.java:800) ~[ignite-core-1.5.23.jar:1.5.23]
>     at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.onEntryUpdate(CacheContinuousQueryHandler.java:787) [ignite-core-1.5.23.jar:1.5.23]
>     at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler.access$700(CacheContinuousQueryHandler.java:91) [ignite-core-1.5.23.jar:1.5.23]
>     at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryHandler$1.onEntryUpdated(CacheContinuousQueryHandler.java:412) [ignite-core-1.5.23.jar:1.5.23]
>     at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:343) [ignite-core-1.5.23.jar:1.5.23]
>     at org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager.onEntryUpdated(CacheContinuousQueryManager.java:250) [ignite-core-1.5.23.jar:1.5.23]
>     at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.initialValue(GridCacheMapEntry.java:3476) [ignite-core-1.5.23.jar:1.5.23]
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtForceKeysFuture$MiniFuture.onResult(GridDhtForceKeysFuture.java:548) [ignite-core-1.5.23.jar:1.5.23]
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtForceKeysFuture.onResult(GridDhtForceKeysFuture.java:207) [ignite-core-1.5.23.jar:1.5.23]
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader.processForceKeyResponse(GridDhtPreloader.java:636) [ignite-core-1.5.23.jar:1.5.23]
>     at ...
> {code}
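The stall described above comes from servers repeatedly retrying outgoing communication connections to an unreachable client. As a workaround sketch only (not the fix developed in this ticket), the retry window can be bounded through the public `TcpCommunicationSpi` timeout properties; the property names are from the Ignite API, but the concrete values below are illustrative assumptions and must be tuned per deployment:

```
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi;

public class BoundedCommTimeouts {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        TcpCommunicationSpi commSpi = new TcpCommunicationSpi();

        // Fail an individual connection attempt after 3 seconds
        // (illustrative value).
        commSpi.setConnectTimeout(3_000);

        // Cap the maximum handshake timeout so retries with growing
        // backoff cannot block operations for minutes.
        commSpi.setMaxConnectTimeout(30_000);

        // Limit reconnection attempts before giving up on the node.
        commSpi.setReconnectCount(2);

        cfg.setCommunicationSpi(commSpi);

        Ignite ignite = Ignition.start(cfg);
    }
}
```

This only shortens the window during which an unreachable client can block senders; it does not address the underlying design issue discussed in this ticket.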
[jira] [Commented] (IGNITE-4003) Slow or faulty client can stall the whole cluster.
[ https://issues.apache.org/jira/browse/IGNITE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791784#comment-16791784 ] Ilya Kasnacheev commented on IGNITE-4003: - [~briggxs], please elaborate on your case on the user list. JIRA is not for troubleshooting particular deployments.
[jira] [Commented] (IGNITE-4003) Slow or faulty client can stall the whole cluster.
[ https://issues.apache.org/jira/browse/IGNITE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784205#comment-16784205 ] Ross Brigoli commented on IGNITE-4003: -- I think the issue is not yet resolved. We are still seeing this problem with Ignite 2.6. When an Ignite client node JVM process terminates abnormally, the whole cluster sometimes becomes unresponsive. Any updates?
[jira] [Commented] (IGNITE-4003) Slow or faulty client can stall the whole cluster.
[ https://issues.apache.org/jira/browse/IGNITE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689619#comment-16689619 ] ASF GitHub Bot commented on IGNITE-4003: GitHub user ilantukh opened a pull request:

https://github.com/apache/ignite/pull/5417

IGNITE-4003

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gridgain/apache-ignite ignite-4003-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/ignite/pull/5417.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #5417

commit 7d3aae218f69ef3959026463e97b4f6f2a836cf1
Author: Ilya Lantukh
Date: 2018-11-16T15:39:56Z
IGNITE-4003 : WIP.

commit c9667fbaf47f37cdffa48bb1c78eafb3550e4c55
Author: Ilya Lantukh
Date: 2018-11-16T16:17:43Z
IGNITE-4003 : Fixed recovery descriptor initialization.
[jira] [Commented] (IGNITE-4003) Slow or faulty client can stall the whole cluster.
[ https://issues.apache.org/jira/browse/IGNITE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536741#comment-16536741 ] sunxm commented on IGNITE-4003: --- How will this issue be handled?
[jira] [Commented] (IGNITE-4003) Slow or faulty client can stall the whole cluster.
[ https://issues.apache.org/jira/browse/IGNITE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16259384#comment-16259384 ] Amit Pundir commented on IGNITE-4003: - Please provide an update on the status of this bug fix.
[jira] [Commented] (IGNITE-4003) Slow or faulty client can stall the whole cluster.
[ https://issues.apache.org/jira/browse/IGNITE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16240624#comment-16240624 ] Amit Pundir commented on IGNITE-4003: - With which version will this fix be released? Thanks
[jira] [Commented] (IGNITE-4003) Slow or faulty client can stall the whole cluster.
[ https://issues.apache.org/jira/browse/IGNITE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15964169#comment-15964169 ] Semen Boikov commented on IGNITE-4003: -- Andrey, I did a review and found only one potential implementation issue:
{noformat}
nioSrvr.sendSystem(ses, new RecoveryLastReceivedMessage(-1), new IgniteInClosure() {
    @Override public void apply(IgniteInternalFuture msgFut) {
        try {
            msgFut.get();
        }
        catch (IgniteCheckedException e) {
            if (log.isDebugEnabled())
                log.debug("Failed to send recovery handshake " +
                    "[rmtNode=" + rmtNode.id() + ", err=" + e + ']');

            recoveryDesc.release();
        }
        finally {
            fut.onDone();

            clientFuts.remove(connKey, fut);

            ses.close();
        }
    }
});
{noformat}
It seems the 'recoveryDesc.release()' call is not needed, since the descriptor should be released on session close. Another, more serious issue from my point of view: the TcpCommunicationSpi code was already overcomplicated, and now it has become even more unclear. I need to think about how it can be simplified. Thanks
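Semen's point about the redundant release can be illustrated with a toy sketch. Note these are hypothetical class names for illustration only, not the actual Ignite internals: if the session's close path already releases the recovery descriptor, an explicit `release()` in the failure callback results in a double release.

```java
// Toy sketch (hypothetical names): a recovery descriptor released both from a
// send-failure callback and again on session close. The double release is why
// the review suggests relying on session close alone.
public class RecoveryReleaseSketch {
    static class RecoveryDescriptor {
        private int releaseCount;

        void release() { releaseCount++; }

        int releaseCount() { return releaseCount; }
    }

    static class Session {
        private final RecoveryDescriptor desc;

        Session(RecoveryDescriptor desc) { this.desc = desc; }

        // Session close releases the descriptor, as the review notes.
        void close() { desc.release(); }
    }

    public static void main(String[] args) {
        RecoveryDescriptor desc = new RecoveryDescriptor();
        Session ses = new Session(desc);

        // Simulated send-failure callback: the explicit (redundant) release...
        desc.release();

        // ...followed by the finally block closing the session, releasing again.
        ses.close();

        System.out.println("releaseCount = " + desc.releaseCount());
    }
}
```

With the explicit call removed, the descriptor is released exactly once, on session close.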
[jira] [Commented] (IGNITE-4003) Slow or faulty client can stall the whole cluster.
[ https://issues.apache.org/jira/browse/IGNITE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15947325#comment-15947325 ] Yakov Zhdanov commented on IGNITE-4003: --- Andrey, can you please merge the latest master?
[jira] [Commented] (IGNITE-4003) Slow or faulty client can stall the whole cluster.
[ https://issues.apache.org/jira/browse/IGNITE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15889061#comment-15889061 ] Andrey Gura commented on IGNITE-4003: - There are still failing tests in the client-reconnect-during-query-execution subset. Fix in progress.
[jira] [Commented] (IGNITE-4003) Slow or faulty client can stall the whole cluster.
[ https://issues.apache.org/jira/browse/IGNITE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871973#comment-15871973 ] Andrey Gura commented on IGNITE-4003: - In async mode the handshake protocol can be broken. As a result, one node believes it is still trying to connect to some remote node and rejects all connections from remote nodes, while the remote node has already interrupted the handshake protocol. Fixed. Waiting for test results.
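The race Andrey describes can be sketched with a minimal state machine (hypothetical names, not the actual TcpCommunicationSpi code): a node that marks itself as "connecting" rejects incoming connections until its outgoing handshake finishes, so if the remote side abandons the handshake and the local flag is never cleared, every later connection attempt is rejected.

```java
// Toy sketch (hypothetical names) of a node that rejects incoming connections
// while an outgoing handshake is pending. The fix is to clear the flag when
// the handshake is interrupted, not only when it completes successfully.
public class HandshakeRaceSketch {
    static class Node {
        private boolean connecting;

        // Outgoing handshake started: reject incoming connections meanwhile.
        void startHandshake() { connecting = true; }

        boolean acceptIncoming() { return !connecting; }

        // Must be invoked when the remote side abandons the handshake;
        // otherwise the node rejects incoming connections forever.
        void handshakeInterrupted() { connecting = false; }
    }

    public static void main(String[] args) {
        Node local = new Node();

        local.startHandshake();
        System.out.println("accepts while connecting: " + local.acceptIncoming());

        // Remote side gave up; without this reset the stall is permanent.
        local.handshakeInterrupted();
        System.out.println("accepts after reset: " + local.acceptIncoming());
    }
}
```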
[jira] [Commented] (IGNITE-4003) Slow or faulty client can stall the whole cluster.
[ https://issues.apache.org/jira/browse/IGNITE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864043#comment-15864043 ] Andrey Gura commented on IGNITE-4003: - Actually, my changes broke the client reconnect tests. This was masked because shmem is switched on by default in my branch, but with shmem disabled the tests fail with 100% probability. Investigation in progress.
[jira] [Commented] (IGNITE-4003) Slow or faulty client can stall the whole cluster.
[ https://issues.apache.org/jira/browse/IGNITE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858362#comment-15858362 ] Andrey Gura commented on IGNITE-4003: - Fixed the failing tests from the TCP communication suite. Waiting for TC.
[jira] [Commented] (IGNITE-4003) Slow or faulty client can stall the whole cluster.
[ https://issues.apache.org/jira/browse/IGNITE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856619#comment-15856619 ] Andrey Gura commented on IGNITE-4003: - Merged the master branch into my changes. Most tests are fixed, but there are still a few issues related to paired connections. Fix in progress.
[jira] [Commented] (IGNITE-4003) Slow or faulty client can stall the whole cluster.
[ https://issues.apache.org/jira/browse/IGNITE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832077#comment-15832077 ] Andrey Gura commented on IGNITE-4003: - Fixed. The fix for IGNITE-4499 is also merged. Waiting for TC.
[jira] [Commented] (IGNITE-4003) Slow or faulty client can stall the whole cluster.
[ https://issues.apache.org/jira/browse/IGNITE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15780621#comment-15780621 ] Semen Boikov commented on IGNITE-4003: -- Hi Andrey, I have started the review and so far found two issues (left TODOs in the branch); please take a look:
- The 'onTimeout' method is called from a dedicated thread (one thread per node), so it is not safe to retry the connection from this callback.
- When a connect is cancelled by ConnectionTimeoutObject, you call 'recoveryDesc.release()' from the comm worker. At that moment the session may already be created and in use, so calling 'release' can break something. The descriptor release needs to be moved to the NIO thread.
Thanks!
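The thread-confinement concern in the review above can be sketched as follows. This is a minimal illustration, not Ignite's actual implementation (class and method names here are hypothetical): instead of releasing the recovery descriptor directly from the comm worker, where the session may already be live, the release is enqueued as a task that the owning NIO thread drains in its own loop, so the mutation only ever happens on one thread.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch of handing off a state change to the owning NIO thread. */
public class NioHandoff {
    // Tasks queued by other threads, executed only by the NIO thread.
    private final BlockingQueue<Runnable> nioTasks = new LinkedBlockingQueue<>();

    private volatile boolean released;

    /** Called from the comm worker: defer the release instead of doing it here. */
    public void requestRelease() {
        nioTasks.add(() -> released = true); // runs on the NIO thread only
    }

    /** Called by the NIO thread between selector iterations. */
    public void drainTasks() {
        Runnable task;
        while ((task = nioTasks.poll()) != null)
            task.run();
    }

    public boolean isReleased() {
        return released;
    }
}
```

Until the NIO thread drains its queue, the descriptor stays reserved, which avoids releasing it while a session created in the meantime is still using it.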
[jira] [Commented] (IGNITE-4003) Slow or faulty client can stall the whole cluster.
[ https://issues.apache.org/jira/browse/IGNITE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15778835#comment-15778835 ] Andrey Gura commented on IGNITE-4003: - Added tests for the scenario where a client node should be failed if the communication layer cannot establish a connection.
[jira] [Commented] (IGNITE-4003) Slow or faulty client can stall the whole cluster.
[ https://issues.apache.org/jira/browse/IGNITE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15776941#comment-15776941 ] Andrey Gura commented on IGNITE-4003: - At the moment there are several problems, and I think they all have a common root cause: a race condition on the {{clients}} map and {{GridNioRecoveryDescriptor}}. The following behaviour leads to infinite retries when connecting to a remote node:
# The client was removed from the {{clients}} map on the local node but still exists on the remote node. As a result, the remote node rejects the connection with a "Received incoming connection when already connected to this node, rejecting" message.
# The remote node rejects the connection because it is already trying to connect to the local node and cannot reserve the recovery descriptor. It rejects the connection with the same "Received incoming connection when already connected to this node, rejecting" message.
# The remote node does not reject the connection, but it also does not send {{RecoveryLastReceivedMessage}}, because it cannot reserve the recovery descriptor in the {{tryReserve}} call.
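The reservation race described above can be illustrated with a minimal sketch (assumed shape, not Ignite's actual {{GridNioRecoveryDescriptor}}): only one connection attempt may hold the descriptor at a time, so when both sides of a link try to reserve concurrently, exactly one succeeds and the other must back off instead of retrying in a tight loop; otherwise each side keeps rejecting the other, producing the livelock listed above.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of a one-holder-at-a-time recovery descriptor. */
public class RecoveryDescriptorSketch {
    // Atomic flag: compare-and-set guarantees at most one winner under a race.
    private final AtomicBoolean reserved = new AtomicBoolean();

    /** Non-blocking reservation; returns false if another attempt holds it. */
    public boolean tryReserve() {
        return reserved.compareAndSet(false, true);
    }

    /** Must be called by the holder once its connection attempt finishes. */
    public void release() {
        reserved.set(false);
    }
}
```

The loser of the {{tryReserve}} race should treat the failure as "someone else is already handling this link" and yield, rather than rejecting the peer and immediately reconnecting.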
[jira] [Commented] (IGNITE-4003) Slow or faulty client can stall the whole cluster.
[ https://issues.apache.org/jira/browse/IGNITE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15762512#comment-15762512 ] Andrey Gura commented on IGNITE-4003: - The outgoing connection creation process can go into an infinite loop due to incorrect handling of a handshake timeout that occurs after the connection has been established. As a result, the remote node rejects the connection while the local node tries to connect again and again. Needs investigation.
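The fix direction implied above, bounding the retry loop so a post-handshake timeout cannot spin forever, can be sketched like this. The names are hypothetical and {{connectOnce}} stands in for the real connect-plus-handshake step; the point is only that the loop has an upper bound, after which the peer is failed instead of retried.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a bounded connection-retry loop. */
public class BoundedConnect {
    /**
     * Attempts connectOnce() up to maxAttempts times.
     * Returns true on success, false when the budget is exhausted
     * (at which point the caller should fail the node, not keep spinning).
     */
    public static boolean connectWithRetries(BooleanSupplier connectOnce, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (connectOnce.getAsBoolean())
                return true;
        }
        return false;
    }
}
```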
[jira] [Commented] (IGNITE-4003) Slow or faulty client can stall the whole cluster.
[ https://issues.apache.org/jira/browse/IGNITE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15746046#comment-15746046 ] Andrey Gura commented on IGNITE-4003: - Found and fixed issues with recovery descriptor reservation. Waiting for TC.
[jira] [Commented] (IGNITE-4003) Slow or faulty client can stall the whole cluster.
[ https://issues.apache.org/jira/browse/IGNITE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15742701#comment-15742701 ] Andrey Gura commented on IGNITE-4003: - All TODOs fixed.
[jira] [Commented] (IGNITE-4003) Slow or faulty client can stall the whole cluster.
[ https://issues.apache.org/jira/browse/IGNITE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729499#comment-15729499 ] Andrey Gura commented on IGNITE-4003: - Minor fixes; new PR created: https://github.com/apache/ignite/pull/1328. {{TcpCommunicationSpi.ConnectGate}} should be optimized, because it currently has a single contention point (a mutex). Waiting for TC. Please review.
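For illustration only (this is not the actual Ignite implementation, and the class and method names below are hypothetical), a gate built on {{ReentrantReadWriteLock}} removes the single-mutex contention point: connecting threads enter under the shared read lock and so do not serialize against each other, while stopping the gate takes the exclusive write lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of a lower-contention connect gate.
final class ConnectGateSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Guarded by the lock: read under the read lock, written under the write lock. */
    private boolean stopped;

    /** Connecting threads enter under the shared read lock, so multiple
     *  connects proceed concurrently instead of queueing on one mutex.
     *  Returns false if the gate is already stopped. */
    boolean enter() {
        lock.readLock().lock();

        if (stopped) {
            lock.readLock().unlock();

            return false;
        }

        return true;
    }

    /** Must be called after a successful enter(). */
    void leave() {
        lock.readLock().unlock();
    }

    /** Takes the exclusive write lock, waiting out in-flight connects. */
    void stop() {
        lock.writeLock().lock();

        try {
            stopped = true;
        }
        finally {
            lock.writeLock().unlock();
        }
    }
}
```

The read/write split matters here because enters vastly outnumber stops: the common path is uncontended among readers, and only shutdown pays for exclusion.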
[jira] [Commented] (IGNITE-4003) Slow or faulty client can stall the whole cluster.
[ https://issues.apache.org/jira/browse/IGNITE-4003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727182#comment-15727182 ] Andrey Gura commented on IGNITE-4003: - Outgoing connection establishment has been reimplemented in an asynchronous manner, so user threads are no longer blocked. Most changes are in the {{TcpCommunicationSpi}} and {{GridNioServer}} classes. The handshake functionality (the {{TcpCommunicationSpi.safeHandshake()}} method) was kept for shmem IPC but rewritten for real network communication; in the latter case it was moved to the server listener's {{onMessage}} and {{onFirstMessageListener}}. The communication worker now also performs additional work in order to provide session management in an asynchronous way. See [PR #1318|https://github.com/apache/ignite/pull/1318]
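A minimal sketch of the idea behind the change (hypothetical names, not the actual patch): the sending thread only enqueues work and returns immediately, while a dedicated worker performs the possibly slow connect, so a client behind an unreachable subnet can stall the worker but never a user thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: asynchronous send that never blocks the caller.
final class AsyncSenderSketch {
    /** Single worker thread standing in for the communication worker. */
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "comm-worker");

        t.setDaemon(true);

        return t;
    });

    /** Enqueues the send and returns immediately; the (possibly slow)
     *  connect happens on the worker thread, not the caller's. */
    CompletableFuture<Void> send(String nodeAddr, String msg) {
        return CompletableFuture.runAsync(() -> {
            // Stand-in for connect-and-write: simulate a slow connection
            // attempt to an unreachable client (e.g. a hidden Docker subnet).
            try {
                TimeUnit.MILLISECONDS.sleep(200);
            }
            catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, worker);
    }
}
```

The returned future lets the caller attach completion or failure handling without ever waiting on connection establishment itself.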