Hello, I'm doing a POC to see if Ignite is suitable for my company's application. While doing this, I have created the following environment:
Configuration:

Ignite version: 2.6.0
Java version used: Java(TM) SE Runtime Environment 1.8.0_171-b11 Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 25.171-b11
OS: Windows 10 (local dev env)
Server: running an Ignite server via the $IGNITE_HOME/bin/ignite.bat script.
Client: JUnit session running in IntelliJ, using Ignite's Java API to attach to the server, run in client mode, and activate the cluster once initially connected.
Configuration: see attached zip file, file ignitepoc.xml. Both the client and server use the same configuration.

PartitionExchangeProblemWhenReconnecting.zip <http://apache-ignite-users.70518.x6.nabble.com/file/t1951/PartitionExchangeProblemWhenReconnecting.zip>

What's Happening

Initial client run - ok

1. Start server up - server starts ok
2. Run client - client is able to connect to server and run test to completion. Client also explicitly calls Ignite.close() to shut down cleanly. During the client execution, it:
* Destroys any existing copy of the test cache from prior runs
* Creates a new test cache
* Loads 100K items into that cache using a DataStreamer
* Reads all items in the cache using an Iterator obtained from the cache
* Reads 100K items at random using the cache's get() method

Logs from this step are available in the attached zip file - file names ClientLog-FirstRun-Success.txt, ServerLog-FirstRun-Success.txt

Second client run - trouble starts

The server remains up and running from the first run.

3. Run client again.
*The problem here is that the client never successfully connects to the server.* The server fails to respond to one of the messages sent from the client, and I see the following exception in the server log:

2018-07-30 18:08:05.494 [exchange-worker-#42] ERROR o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture.error:137 - Failed to reinitialize local partitions (preloading will be stopped): GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=207a9b5d-0305-405b-9aee-32b7cbee7163, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, 172.27.225.23, 192.168.52.92], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0, ip-172-27-225-23.ec2.internal/172.27.225.23:0, ip-192-168-52-92.ec2.internal/192.168.52.92:0], discPort=0, order=4, intOrder=3, lastExchangeTime=1532988478915, loc=false, ver=2.6.0#20180710-sha1:669feacc, isClient=true], topVer=4, nodeId8=798ca779, msg=Node joined: TcpDiscoveryNode [id=207a9b5d-0305-405b-9aee-32b7cbee7163, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, 172.27.225.23, 192.168.52.92], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0, ip-172-27-225-23.ec2.internal/172.27.225.23:0, ip-192-168-52-92.ec2.internal/192.168.52.92:0], discPort=0, order=4, intOrder=3, lastExchangeTime=1532988478915, loc=false, ver=2.6.0#20180710-sha1:669feacc, isClient=true], type=NODE_JOINED, tstamp=1532988478959], nodeId=207a9b5d, evt=NODE_JOINED]
java.lang.NullPointerException: null
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$11.apply(GridCacheDatabaseSharedManager.java:1243) ~[ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$11.apply(GridCacheDatabaseSharedManager.java:1239) ~[ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383) ~[ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353) ~[ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.rebuildIndexesIfNeeded(GridCacheDatabaseSharedManager.java:1239) ~[ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:1711) ~[ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:126) ~[ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:451) ~[ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:729) ~[ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2419) [ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2299) [ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) [ignite-core-2.6.0.jar:2.6.0]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_171]

Logs from this step are available in the attached zip file - file names ClientLog-SecondRun-ClientCannotConnect.txt, ServerLog-SecondRun-ClientCannotConnect.txt

From stepping through the server code using a debugger, I can see that the usrFut variable is null at GridCacheDatabaseSharedManager.java:1243. But I have no idea whether that is the actual problem, or whether my setup should never have reached that area of the code in the first place.
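
Since the client never gets past the connect step in this state, one stopgap on the test side would be to bound that step with a timeout, so a hung run fails instead of blocking the JUnit session. Below is a minimal sketch using only java.util.concurrent. The ConnectGuard class and callWithTimeout method are hypothetical names of mine, not part of Ignite, and the Callable stands in for whatever call starts the client node. I have not verified that a hung node start reacts to interruption, so the worker is a daemon thread that may simply be abandoned. This only makes the symptom easier to live with; it does nothing about the NPE on the server.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical test helper (not part of Ignite): bounds a blocking call with a timeout. */
final class ConnectGuard {

    /**
     * Runs the given call on a daemon thread and waits at most timeoutMs for its result.
     * Throws TimeoutException if the call has not finished in time.
     */
    static <T> T callWithTimeout(Callable<T> call, long timeoutMs) throws Exception {
        ExecutorService exec = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "connect-guard");
            t.setDaemon(true); // a stuck call must not keep the JVM alive
            return t;
        });
        Future<T> fut = exec.submit(call);
        try {
            return fut.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            fut.cancel(true); // best effort: interrupts the worker, which the call may ignore
            throw e;
        } catch (ExecutionException e) {
            // Surface the original failure from the call where possible.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        } finally {
            exec.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a connect call that returns promptly.
        String result = callWithTimeout(() -> "connected", 1000);
        System.out.println(result);

        // Stand-in for a connect call that hangs.
        try {
            callWithTimeout(() -> { Thread.sleep(60_000); return "never"; }, 200);
        } catch (TimeoutException e) {
            System.out.println("timed out");
        }
    }
}
```

In the test, the Callable passed in would wrap the node start (for example the call that loads ignitepoc.xml in client mode), and a TimeoutException would be reported as a test failure.
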
I had to kill the client in order to stop it; otherwise it waits indefinitely for the response to come back.

Try to run client one more time - still a problem

The server is still up and running from before; it hasn't been restarted.

4. Try running the client again.

Here again, the client hangs. This time I don't see the NPE in the logs, but the client still waits indefinitely for a response from the server, and I have to kill it.

Logs from this step are available in the attached zip file - file names ClientLog-ThirdRun-ClientStillCannotConnect.txt, ServerLog-ThirdRun-ClientStillCannotConnect.txt

The client will not successfully connect to the server unless I restart the server. Then the pattern of events shown above repeats itself: the first time the client can connect, but on subsequent runs it hangs.

*Could someone please help? Is this a bug, or have I messed up something in the configuration?*

--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
