Hi Igniters,

We have been testing with Ignite 1.9.0 and have a client that runs a
simple (no-join) SQL query on a single distributed cache. If we kill a
server node while the client is running this query, the whole cluster
stalls.

Restarting the client is all it takes for the grid to resume working.
This may have something to do with data rebalancing when a server node dies.
Would setting a rebalanceDelay help? We are using the default of 0 now.
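For reference, this is roughly how we would set a non-zero rebalanceDelay on
the cache configuration. It is only a sketch: the cache name "tradeOrders"
and the 10-second value are placeholders, not our real config. As far as we
understand, the delay only postpones data rebalancing; the partition map
exchange itself still runs when a node leaves, so we are not sure it would
avoid the stall.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class RebalanceDelayConfig {
    public static void main(String[] args) {
        // Placeholder cache name; our real cache is named differently.
        CacheConfiguration<Long, Object> cacheCfg = new CacheConfiguration<>("tradeOrders");
        cacheCfg.setCacheMode(CacheMode.PARTITIONED);

        // Default is 0: rebalancing starts immediately after topology changes.
        // A positive value delays rebalancing by that many milliseconds;
        // -1 means rebalancing is only started manually.
        cacheCfg.setRebalanceDelay(10_000L);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setCacheConfiguration(cacheCfg);

        // The setting has to be on the server nodes' cache configuration.
        try (Ignite ignite = Ignition.start(cfg)) {
            System.out.println("Rebalance delay: " + cacheCfg.getRebalanceDelay() + " ms");
        }
    }
}
```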

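How can a single client affect the whole cluster like this, and why does
restarting it fix the stall? The exchange worker threads on the server nodes
are stuck waiting for partitions to be released
(GridDhtPartitionsExchangeFuture.waitPartitionRelease in the dump below).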

Stuck client thread (thread dump):

Name: main
State: TIMED_WAITING
Total blocked: 40  Total waited: 102,828

Stack trace: 
java.lang.Thread.sleep(Native Method)
org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.query(GridReduceQueryExecutor.java:494)
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$7.iterator(IgniteH2Indexing.java:1315)
org.apache.ignite.internal.processors.cache.QueryCursorImpl.iterator(QueryCursorImpl.java:94)
org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$8.iterator(IgniteH2Indexing.java:1355)
org.apache.ignite.internal.processors.cache.QueryCursorImpl.iterator(QueryCursorImpl.java:94)
com.tudor.server.grid.matching.GridMatcher.getTradeOrdersForPSGroup(GridMatcher.java:322)
com.tudor.server.grid.matching.MatcherDelegate.unmatchRematch(MatcherDelegate.java:101)
com.tudor.server.grid.matching.GridMatcher.processPendingOrder(GridMatcher.java:275)
com.tudor.server.grid.matching.GridMatcher.run(GridMatcher.java:201)
com.tudor.server.grid.matching.GridMatcher.main(GridMatcher.java:99)


Exchange worker thread dump from a server node:


"exchange-worker-#34%DataGridServer-Development%" Id=68 in TIMED_WAITING on lock=org.apache.ignite.internal.util.future.GridCompoundFuture@7e9c149b
  at sun.misc.Unsafe.park(Native Method)
  at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
  at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:189)
  at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:139)
  at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.waitPartitionRelease(GridDhtPartitionsExchangeFuture.java:779)
  at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.distributedExchange(GridDhtPartitionsExchangeFuture.java:732)
  at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:489)
  at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1674)
  at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
  at java.lang.Thread.run(Thread.java:745)

Any help is appreciated.

Thanks,
Binti



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/SQL-query-on-client-stalling-the-grid-when-server-node-dies-tp13107.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.