Hi,

Could you provide full logs or a reproducer?
Anyway, I see that the exchange is waiting for something; you should see the reason in the logs after the phrase "Failed to wait for partition release future".

On Wed, May 24, 2017 at 7:31 AM, bintisepaha <[email protected]> wrote:

> Hi Igniters,
>
> We have been testing with Ignite 1.9.0 and have this client that runs a
> simple (no-join) SQL Query on a single distributed cache. But if we kill the
> server node for testing in the meantime and if the client was running this
> query, it actually stalls the whole cluster.
>
> All we have to do for the grid to resume functioning is restart the client.
> This may have something to do with data rebalancing when a server node dies.
> Would setting a rebalanceDelay help? we are using the default of 0 now.
>
> How does a client affect the whole cluster like this? and restarting it
> fixes the stall? The server nodes exchange worker threads are stuck on
> partitioning data.
>
> Client thread stuck below (thread dump)
>
> Name: main
> State: TIMED_WAITING
> Total blocked: 40 Total waited: 102,828
>
> Stack trace:
> java.lang.Thread.sleep(Native Method)
> org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.query(GridReduceQueryExecutor.java:494)
> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$7.iterator(IgniteH2Indexing.java:1315)
> org.apache.ignite.internal.processors.cache.QueryCursorImpl.iterator(QueryCursorImpl.java:94)
> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$8.iterator(IgniteH2Indexing.java:1355)
> org.apache.ignite.internal.processors.cache.QueryCursorImpl.iterator(QueryCursorImpl.java:94)
> com.tudor.server.grid.matching.GridMatcher.getTradeOrdersForPSGroup(GridMatcher.java:322)
> com.tudor.server.grid.matching.MatcherDelegate.unmatchRematch(MatcherDelegate.java:101)
> com.tudor.server.grid.matching.GridMatcher.processPendingOrder(GridMatcher.java:275)
> com.tudor.server.grid.matching.GridMatcher.run(GridMatcher.java:201)
> com.tudor.server.grid.matching.GridMatcher.main(GridMatcher.java:99)
>
>
> server node exchange worker thread dump
>
>
> "exchange-worker-#34%DataGridServer-Development%" Id=68 in TIMED_WAITING on lock=org.apache.ignite.internal.util.future.GridCompoundFuture@7e9c149b
>     at sun.misc.Unsafe.park(Native Method)
>     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:189)
>     at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:139)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.waitPartitionRelease(GridDhtPartitionsExchangeFuture.java:779)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:732)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:489)
>     at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1674)
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>     at java.lang.Thread.run(Thread.java:745)
>
> Any help is appreciated.
>
> Thanks,
> Binti
>
>
>
> --
> View this message in context: http://apache-ignite-users.70518.x6.nabble.com/SQL-query-on-client-stalling-the-grid-when-server-node-dies-tp13107.html
> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
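
In case it helps with locating the reason in the server logs: below is a minimal sketch of a standalone scanner that prints every line containing "Failed to wait for partition release future" together with the lines that follow it. The class name, the default file name `ignite.log`, and the choice of 20 context lines are assumptions made for illustration, not anything Ignite defines; point it at wherever your server nodes actually write their logs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class PartitionReleaseLogScan {
    // Phrase to look for in the server log.
    static final String MARKER = "Failed to wait for partition release future";

    // Assumption: 20 following lines are enough to show what the exchange waits on.
    static final int CONTEXT_LINES = 20;

    // Returns each marker line plus up to `context` lines that follow it.
    static List<String> linesAfterMarker(List<String> logLines, int context) {
        List<String> out = new ArrayList<>();
        int remaining = 0;
        for (String line : logLines) {
            if (line.contains(MARKER)) {
                remaining = context + 1; // the marker line itself plus its context
            }
            if (remaining > 0) {
                out.add(line);
                remaining--;
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass the real log file as the first argument.
        Path log = Paths.get(args.length > 0 ? args[0] : "ignite.log");
        for (String line : linesAfterMarker(Files.readAllLines(log), CONTEXT_LINES)) {
            System.out.println(line);
        }
    }
}
```

Run it against each server node's log, for example `java PartitionReleaseLogScan /path/to/server.log`, and include the output when you send the full logs.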
