Hello,

I'm doing a POC to see if Ignite is suitable for my company's application. 
While doing this, I have created the following environment:

Environment:

Ignite version: 2.6.0
Java version used: Java(TM) SE Runtime Environment 1.8.0_171-b11, Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 25.171-b11
OS: Windows 10 (local dev env)

Server: running an Ignite server via the $IGNITE_HOME/bin/ignite.bat script.
Client: JUnit session running in IntelliJ that uses Ignite's Java API to
connect to the server in client mode, and activates the cluster once it is
initially connected.
Configuration: see ignitepoc.xml in the attached zip file. Both the client
and server use the same configuration.
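For reference, the client startup described above amounts to roughly the following. This is a minimal sketch, not the actual POC test code. It assumes ignite-core and ignite-spring 2.6.0 are on the classpath (Spring is needed to load an XML configuration) and that ignitepoc.xml does not set clientMode itself. It also assumes native persistence is enabled, which the persistence classes in the stack trace below suggest. The class name PocClientStartup and the config path are hypothetical placeholders.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

// Hypothetical class name: sketch of the POC client's connect-and-activate step.
public class PocClientStartup {
    public static void main(String[] args) {
        // Join the topology as a client node. This must be called before Ignition.start(),
        // and only takes effect if the XML does not configure clientMode explicitly.
        Ignition.setClientMode(true);

        // Placeholder path: point this at the same ignitepoc.xml the server uses
        // (an absolute path avoids any ambiguity in how the path is resolved).
        try (Ignite ignite = Ignition.start("ignitepoc.xml")) {
            // Activate the cluster after connecting. Activation is required when
            // native persistence is enabled, which is assumed here.
            ignite.cluster().active(true);
            System.out.println("Cluster active: " + ignite.cluster().active());
        } // try-with-resources calls Ignite.close(), so the client leaves the topology cleanly
    }
}
```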
PartitionExchangeProblemWhenReconnecting.zip
<http://apache-ignite-users.70518.x6.nabble.com/file/t1951/PartitionExchangeProblemWhenReconnecting.zip>
  

What's Happening

Initial client run - OK

1. Start server up - server starts ok
2. Run client - the client connects to the server and runs the test to
completion. The client also explicitly calls Ignite.close() to shut down
cleanly.
During execution, the client:
* Destroys any existing copy of the test cache from prior runs
* Creates a new test cache
* Loads 100K items into that cache using a DataStreamer
* Reads all items in the cache using an Iterator obtained from the cache
* Reads 100K items at random using the cache's get() method
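The workload listed above can be approximated with Ignite's public Java API as follows. This is a sketch under stated assumptions, not the actual POC test. The cache name pocTestCache, the Integer/String key and value types, the generated values, and the random seed are all hypothetical. The client is assumed to start and activate the cluster as in the setup above, and destroyCache() on a cache that does not exist is assumed to be a no-op.

```java
import java.util.Random;
import javax.cache.Cache;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;

// Hypothetical class: approximates the POC client's test steps.
public class PocClientWorkload {
    static final String CACHE_NAME = "pocTestCache"; // hypothetical cache name
    static final int ITEM_COUNT = 100_000;

    public static void main(String[] args) {
        Ignition.setClientMode(true);
        // Placeholder path: point this at the shared ignitepoc.xml.
        try (Ignite ignite = Ignition.start("ignitepoc.xml")) {
            ignite.cluster().active(true);
            runWorkload(ignite);
        } // Ignite.close() is called here
    }

    static void runWorkload(Ignite ignite) {
        // Destroy any copy of the test cache left over from a prior run
        // (assumed to be a no-op when the cache does not exist).
        ignite.destroyCache(CACHE_NAME);

        // Create a fresh test cache.
        IgniteCache<Integer, String> cache = ignite.createCache(CACHE_NAME);

        // Bulk-load ITEM_COUNT entries; closing the streamer flushes any buffered data.
        try (IgniteDataStreamer<Integer, String> streamer = ignite.dataStreamer(CACHE_NAME)) {
            for (int i = 0; i < ITEM_COUNT; i++)
                streamer.addData(i, "value-" + i);
        }

        // Read every entry through the cache's iterator.
        int iterated = 0;
        for (Cache.Entry<Integer, String> entry : cache)
            iterated++;

        // Read ITEM_COUNT entries at random with get().
        int found = 0;
        for (int key : randomKeys(ITEM_COUNT, ITEM_COUNT, 42L))
            if (cache.get(key) != null)
                found++;

        System.out.println("Iterated: " + iterated + ", random hits: " + found);
    }

    // Seeded pseudo-random keys in [0, bound), so every run reads the same sequence.
    static int[] randomKeys(int count, int bound, long seed) {
        Random rnd = new Random(seed);
        int[] keys = new int[count];
        for (int i = 0; i < count; i++)
            keys[i] = rnd.nextInt(bound);
        return keys;
    }
}
```

Seeding the random reads is a design choice for the sketch: it makes the second and third runs issue exactly the same get() calls as the first.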

Logs from this step are available in the attached zip file - file names
ClientLog-FirstRun-Success.txt, ServerLog-FirstRun-Success.txt

Second client run - trouble starts

The server remains up and running from the first run.
3. Run client again.

*The problem here is that the client never successfully connects to the
server.*
The server fails to respond to one of the messages sent by the client,
and I see the following exception in the server log:

2018-07-30 18:08:05.494 [exchange-worker-#42] ERROR o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture.error:137 - Failed to reinitialize local partitions (preloading will be stopped):

GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=4,
minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode
[id=207a9b5d-0305-405b-9aee-32b7cbee7163, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1,
172.27.225.23, 192.168.52.92], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0,
ip-172-27-225-23.ec2.internal/172.27.225.23:0,
ip-192-168-52-92.ec2.internal/192.168.52.92:0], discPort=0, order=4,
intOrder=3, lastExchangeTime=1532988478915, loc=false,
ver=2.6.0#20180710-sha1:669feacc, isClient=true], topVer=4,
nodeId8=798ca779, msg=Node joined: TcpDiscoveryNode
[id=207a9b5d-0305-405b-9aee-32b7cbee7163, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1,
172.27.225.23, 192.168.52.92], sockAddrs=[/0:0:0:0:0:0:0:1:0, /127.0.0.1:0,
ip-172-27-225-23.ec2.internal/172.27.225.23:0,
ip-192-168-52-92.ec2.internal/192.168.52.92:0], discPort=0, order=4,
intOrder=3, lastExchangeTime=1532988478915, loc=false,
ver=2.6.0#20180710-sha1:669feacc, isClient=true], type=NODE_JOINED,
tstamp=1532988478959], nodeId=207a9b5d, evt=NODE_JOINED]

java.lang.NullPointerException: null
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$11.apply(GridCacheDatabaseSharedManager.java:1243) ~[ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$11.apply(GridCacheDatabaseSharedManager.java:1239) ~[ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.util.future.GridFutureAdapter.notifyListener(GridFutureAdapter.java:383) ~[ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.util.future.GridFutureAdapter.listen(GridFutureAdapter.java:353) ~[ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.rebuildIndexesIfNeeded(GridCacheDatabaseSharedManager.java:1239) ~[ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:1711) ~[ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onDone(GridDhtPartitionsExchangeFuture.java:126) ~[ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:451) ~[ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:729) ~[ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:2419) [ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2299) [ignite-core-2.6.0.jar:2.6.0]
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) [ignite-core-2.6.0.jar:2.6.0]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_171]

Logs from this step are available in the attached zip file - file names
ClientLog-SecondRun-ClientCannotConnect.txt,
ServerLog-SecondRun-ClientCannotConnect.txt

From stepping through the server code in a debugger, I can see that the
usrFut variable is null at line 1243 of GridCacheDatabaseSharedManager.java.

However, I can't tell whether that is the actual problem, or whether my setup
should never have reached that part of the code in the first place.

I had to kill the client to stop it; otherwise it waits indefinitely for the
response.

Third client run - still a problem

The server is still running from before; it has not been restarted.
4. Try running the client again.

Again, the client hangs. This time I don't see the NPE, but the client still
waits indefinitely for a response from the server, and I have to kill it.

Logs from this step are available in the attached zip file - file names
ClientLog-ThirdRun-ClientStillCannotConnect.txt,
ServerLog-ThirdRun-ClientStillCannotConnect.txt

The client cannot connect to the server again unless I restart the server.
After a restart, the same pattern repeats: the first client run connects, but
every subsequent run hangs.

*Could someone please help?  Is this a bug, or have I messed up something in
the configuration?*



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
