I seem to be running into memory issues with my DataStreamers, and I'd like to get a better idea of how they work behind the scenes so I can troubleshoot the problem.
I have a cluster of 4 nodes, each of which pulls files from S3 over an extended period of time and loads their contents. Each file opens a new DataStreamer, loads its contents, and closes the DataStreamer. At most, each node has 4 DataStreamers writing to 4 different caches simultaneously, and a new DataStreamer isn't created until the last one on that thread is closed: I wait for the futures to complete, then close the DataStreamer. So far so good.

After my nodes have been running for a few hours, one or more of them inevitably crashes. Sometimes the Java heap overflows and Java exits, and sometimes Java is killed by the kernel because of an OOM error. Here are my specs per node:

Total Available Memory: 110GB
Memory Assigned to All Data Regions: 50GB
Total Checkpoint Page Buffers: 5GB
Java Heap: 25GB

Does DataStreamer.close() block until the data is loaded into the cache on the remote nodes (I'm assuming it doesn't)? If not, is there any way to monitor the progress of loading data into the cache on the remote nodes/replicas, so I can slow down my DataStreamers to keep pace?
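For reference, here is a stripped-down sketch of the pattern I described, not my actual code. The cache name, key/value types, record loop, and the perNode* values are placeholders; the perNode* calls are just the knobs I assume I could use to throttle things if close() doesn't give me back-pressure.

import java.util.ArrayList;
import java.util.List;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;
import org.apache.ignite.lang.IgniteFuture;

public class S3LoaderSketch {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start("ignite-config.xml"); // placeholder config path

        // One streamer per file; "myCache" and the key/value types are placeholders.
        try (IgniteDataStreamer<String, byte[]> streamer = ignite.dataStreamer("myCache")) {
            // Possible throttling knobs (values are made up):
            streamer.perNodeParallelOperations(8); // cap concurrent batches per node
            streamer.perNodeBufferSize(512);       // entries per batch

            List<IgniteFuture<?>> futures = new ArrayList<>();

            // Stand-in for the records parsed from one S3 file.
            for (int i = 0; i < 1_000_000; i++)
                futures.add(streamer.addData("key-" + i, new byte[1024]));

            // Wait for the futures to complete before closing.
            for (IgniteFuture<?> fut : futures)
                fut.get();
        } // try-with-resources calls close(), which flushes any remaining buffers
    }
}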
