I seem to be running into some sort of memory issue with my DataStreamers
and I'd like to get a better idea of how they work behind the scenes to
troubleshoot my problem.

I have a cluster of 4 nodes, each of which is pulling files from S3 over an
extended period of time and loading the contents. Each new file opens a new
DataStreamer, loads its contents, and closes the DataStreamer. At most, each
node has 4 DataStreamers open at once, writing to 4 different caches
simultaneously. A new DataStreamer isn't created until the last one on that
thread is closed.
I wait for the futures to complete, then close the DataStreamer. So far so
good.
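
Roughly, my per-file loading loop looks like the sketch below (simplified;
the cache name, the key/value types, and the Map standing in for one parsed
S3 file are all placeholders):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.lang.IgniteFuture;

public class FileLoader {
    /** Loads one file's entries into the given cache, then closes the streamer. */
    static void loadFile(Ignite ignite, String cacheName, Map<String, byte[]> entries) {
        try (IgniteDataStreamer<String, byte[]> streamer = ignite.dataStreamer(cacheName)) {
            List<IgniteFuture<?>> futures = new ArrayList<>();

            // Collect the future returned by each addData() call.
            for (Map.Entry<String, byte[]> e : entries.entrySet())
                futures.add(streamer.addData(e.getKey(), e.getValue()));

            // Wait for all of them before closing.
            for (IgniteFuture<?> f : futures)
                f.get();
        } // try-with-resources calls close(), flushing any remaining buffered data
    }
}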

After my nodes are running for a few hours, one or more inevitably ends up
crashing. Sometimes the Java heap fills up and the JVM exits with an
OutOfMemoryError, and sometimes the process is killed by the kernel's OOM
killer.

Here are my specs per node:
Total Available Memory: 110GB
Memory Assigned to All Data Regions: 50GB
Total Checkpoint Page Buffers: 5GB
Java Heap: 25GB
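
For reference, the storage side of my node configuration is roughly the
sketch below (collapsed into a single default data region for brevity; the
region name is a placeholder, and I'm assuming native persistence is enabled,
hence the checkpoint page buffer):

import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class NodeConfig {
    static IgniteConfiguration buildConfig() {
        DataRegionConfiguration regionCfg = new DataRegionConfiguration();
        regionCfg.setName("Default_Region");                            // placeholder name
        regionCfg.setMaxSize(50L * 1024 * 1024 * 1024);                 // 50GB off-heap
        regionCfg.setCheckpointPageBufferSize(5L * 1024 * 1024 * 1024); // 5GB checkpoint buffer
        regionCfg.setPersistenceEnabled(true);                          // assumed

        DataStorageConfiguration storageCfg = new DataStorageConfiguration();
        storageCfg.setDefaultDataRegionConfiguration(regionCfg);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDataStorageConfiguration(storageCfg);
        return cfg;
    }
}

// The Java heap is set on the JVM itself, e.g. -Xms25g -Xmx25g.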

Does DataStreamer.close() block until the data is loaded into the cache on
the remote nodes (I'm assuming it doesn't)? If not, is there any way to
monitor the progress of loading data into the cache on the remote
nodes/replicas, so I can slow down my DataStreamers to keep pace?


