Hi Igniters,

We have been testing with Ignite 1.9.0 and have a client that runs a simple (no-join) SQL query on a single distributed cache. If we kill a server node for testing while the client is running this query, the whole cluster stalls.
All we have to do for the grid to resume functioning is restart the client. This may have something to do with data rebalancing when a server node dies. Would setting a rebalanceDelay help? We are using the default of 0 now.

How can a client affect the whole cluster like this, and why does restarting it fix the stall? The exchange worker threads on the server nodes are stuck on partitioning data.

Client thread stuck below (thread dump):

Name: main
State: TIMED_WAITING
Total blocked: 40  Total waited: 102,828

Stack trace:
java.lang.Thread.sleep(Native Method)
org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.query(GridReduceQueryExecutor.java:494)
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$7.iterator(IgniteH2Indexing.java:1315)
org.apache.ignite.internal.processors.cache.QueryCursorImpl.iterator(QueryCursorImpl.java:94)
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$8.iterator(IgniteH2Indexing.java:1355)
org.apache.ignite.internal.processors.cache.QueryCursorImpl.iterator(QueryCursorImpl.java:94)
com.tudor.server.grid.matching.GridMatcher.getTradeOrdersForPSGroup(GridMatcher.java:322)
com.tudor.server.grid.matching.MatcherDelegate.unmatchRematch(MatcherDelegate.java:101)
com.tudor.server.grid.matching.GridMatcher.processPendingOrder(GridMatcher.java:275)
com.tudor.server.grid.matching.GridMatcher.run(GridMatcher.java:201)
com.tudor.server.grid.matching.GridMatcher.main(GridMatcher.java:99)

Server node exchange worker thread dump:

"exchange-worker-#34%DataGridServer-Development%" Id=68 in TIMED_WAITING on lock=org.apache.ignite.internal.util.future.GridCompoundFuture@7e9c149b
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:189)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:139)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.waitPartitionRelease(GridDhtPartitionsExchangeFuture.java:779)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:732)
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:489)
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1674)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:745)

Any help is appreciated.

Thanks,
Binti

--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/SQL-query-on-client-stalling-the-grid-when-server-node-dies-tp13107.html
