Hello! DataStreamer.close() WILL block until all data is loaded into the caches.
The recommendation here is to reduce perNodeParallelOperations(), perNodeBufferSize() and perThreadBufferSize(), and to flush() your DataStreamer frequently, to avoid data building up in the DataStreamer's temporary data structures. Alternatively, if you have a few entries which are very large, you can populate those via the plain Cache API instead.

Regards,
--
Ilya Kasnacheev

Sun, Apr 14, 2019 at 18:45, kellan <kellan.bur...@gmail.com>:

> I seem to be running into some sort of memory issue with my DataStreamers,
> and I'd like to get a better idea of how they work behind the scenes so I
> can troubleshoot my problem.
>
> I have a cluster of 4 nodes, each of which pulls files from S3 over an
> extended period of time and loads their contents. Each new file opens a
> new DataStreamer, loads its contents, and closes the DataStreamer. At
> most, each node has 4 DataStreamers writing to 4 different caches
> simultaneously. A new DataStreamer isn't created until the last one on
> that thread is closed. I wait for the futures to complete, then close the
> DataStreamer. So far so good.
>
> After my nodes have been running for a few hours, one or more inevitably
> crashes. Sometimes the Java heap overflows and Java exits, and sometimes
> Java is killed by the kernel because of an OOM error.
>
> Here are my specs per node:
> Total Available Memory: 110GB
> Memory Assigned to All Data Regions: 50GB
> Total Checkpoint Page Buffers: 5GB
> Java Heap: 25GB
>
> Does DataStreamer.close() block until data is loaded into the cache on the
> remote nodes (I'm assuming it doesn't), and if not, is there any way to
> monitor the progress of loading data into the cache on the remote
> nodes/replicas, so I can slow down my DataStreamers to keep pace?
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
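For reference, a minimal sketch of the tuning suggested above, using Ignite's IgniteDataStreamer API. The cache name, buffer sizes, flush interval and dummy payload are illustrative assumptions, not recommendations; it also needs an Ignite node (started here with default configuration) and the ignite-core dependency on the classpath:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;

public class TunedStreamerSketch {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            ignite.getOrCreateCache(new CacheConfiguration<String, byte[]>("myCache"));

            // try-with-resources: close() flushes the remaining data and
            // blocks until it has been loaded into the cache.
            try (IgniteDataStreamer<String, byte[]> streamer =
                     ignite.dataStreamer("myCache")) {

                // Smaller buffers and fewer in-flight batches per node keep
                // less data parked in the streamer's temporary structures.
                streamer.perNodeBufferSize(128);
                streamer.perThreadBufferSize(512);
                streamer.perNodeParallelOperations(4);

                for (int i = 0; i < 100_000; i++) {
                    streamer.addData("key-" + i, new byte[256]); // dummy payload

                    // Flush periodically instead of letting data build up;
                    // flush() blocks until buffered data has been written.
                    if (i % 10_000 == 0)
                        streamer.flush();
                }
            } // close() here waits for the rest of the data to be written.
        }
    }
}
```

Lowering perNodeParallelOperations() also acts as backpressure: addData() will block once that many batches per node are in flight, which naturally slows the loaders down to the pace of the remote nodes.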