Hello!

DataStreamer.close() WILL block until all data is loaded into the caches.

The recommendation here would be to reduce perNodeParallelOperations(),
perNodeBufferSize() and perThreadBufferSize(), and to flush() your
DataStreamer frequently so that data does not build up in the
DataStreamer's temporary data structures. Alternatively, if you only have
a few entries which are very large, you can use the Cache API to populate
those instead.
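As a rough sketch of that tuning (the cache name, buffer values, and the
loadFilesFromS3() helper are placeholders I made up, and exact buffer-related
method names can vary slightly between Ignite versions):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;

public class StreamerTuning {
    public static void main(String[] args) {
        // Assumes a running Ignite node and an existing cache named "myCache".
        try (Ignite ignite = Ignition.start();
             IgniteDataStreamer<Long, byte[]> streamer = ignite.dataStreamer("myCache")) {

            // Smaller buffers and fewer in-flight batches mean less data
            // parked in the streamer's temporary structures at any moment.
            streamer.perNodeBufferSize(128);
            streamer.perNodeParallelOperations(4);

            // Flush buffered entries every few seconds even if buffers
            // are not full, instead of letting them accumulate.
            streamer.autoFlushFrequency(5_000);

            long key = 0;
            for (byte[] payload : loadFilesFromS3()) {   // hypothetical helper
                streamer.addData(key++, payload);

                // Optional manual back-pressure: block until everything
                // buffered so far has reached the caches.
                if (key % 10_000 == 0)
                    streamer.flush();
            }
        } // close() flushes remaining data and blocks until it is loaded.
    }

    // Placeholder for the poster's S3 loading logic.
    private static Iterable<byte[]> loadFilesFromS3() {
        return java.util.Collections.emptyList();
    }
}
```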

Regards,
-- 
Ilya Kasnacheev


Sun, Apr 14, 2019 at 18:45, kellan <kellan.bur...@gmail.com>:

> I seem to be running into some sort of memory issues with my DataStreamers
> and I'd like to get a better idea of how they work behind the scenes to
> troubleshoot my problem.
>
> I have a cluster of 4 nodes, each of which is pulling files from S3 over an
> extended period of time and loading their contents. Each new file opens up a
> new DataStreamer, loads its contents, and closes the DataStreamer. At most,
> each node has 4 DataStreamers writing to 4 different caches simultaneously. A
> new DataStreamer isn't created until the last one on that thread is closed.
> I wait for the futures to complete, then close the DataStreamer. So far so
> good.
>
> After my nodes are running for a few hours, one or more inevitably ends up
> crashing. Sometimes the Java heap overflows and Java exits, and sometimes
> Java is killed by the kernel because of an OOM error.
>
> Here are my specs per node:
> Total Available Memory: 110GB
> Memory Assigned to All Data Regions: 50GB
> Total Checkpoint Page Buffers: 5GB
> Java Heap: 25GB
>
> Does DataStreamer.close block until data is loaded into the cache on remote
> nodes (I'm assuming it doesn't)? And if not, is there any way to monitor the
> progress of loading data into the cache on the remote nodes/replicas, so I
> can slow down my DataStreamers to keep pace?
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>