Re: Disk usage during savepoints

2020-12-16 Thread Robert Metzger
Hey Rex, if I'm reading the Flink code correctly, RocksDB will allocate its storage across all configured tmp directories. Flink respects the io.tmp.dirs configuration property for that. It seems that you are using Flink on YARN, where Flink is respecting the tmp directory configs from
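To check how evenly the configured temporary directories are filling up, a small script along these lines may help. It is only a sketch and not part of Flink: the function names are hypothetical, and it assumes the io.tmp.dirs value is a list of paths separated by commas or the OS path separator.

```python
import os
import shutil

def split_tmp_dirs(value):
    """Split an io.tmp.dirs-style string on ',' or the OS path separator (assumed format)."""
    parts = value.replace(os.pathsep, ",").split(",")
    return [p.strip() for p in parts if p.strip()]

def free_space_report(value):
    """Return {directory: free bytes} for every listed directory that exists."""
    report = {}
    for directory in split_tmp_dirs(value):
        if os.path.isdir(directory):
            report[directory] = shutil.disk_usage(directory).free
    return report
```

Calling free_space_report with the same string that is set for io.tmp.dirs, before and during a savepoint, would show which volume is losing space.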

Re: Disk usage during savepoints

2020-12-12 Thread Rex Fenley
Our job just crashed while running a savepoint; it ran out of disk space. I inspected the disk and found the following:
-rw--- 1 yarn yarn 10139680768 Dec 12 22:14 presto-s3-10125099138119182412.tmp
-rw--- 1 yarn yarn 10071916544 Dec 12 22:14 presto-s3-10363672991943897408.tmp
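The two files listed add up to 20211597312 bytes, or about 18.8 GiB. To total such files on a live host, something like the following could be used. This is a minimal sketch: the function name is hypothetical, and the directory to scan is an assumption that has to be supplied by whoever runs it, since the listing above does not show it.

```python
import glob
import os

def presto_tmp_usage(directory):
    """Return ({file name: size in bytes}, total bytes) for presto-s3-*.tmp files."""
    pattern = os.path.join(directory, "presto-s3-*.tmp")
    sizes = {os.path.basename(p): os.path.getsize(p) for p in glob.glob(pattern)}
    return sizes, sum(sizes.values())
```

Running it repeatedly while a savepoint is in progress would show how quickly the temporary upload files grow.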

Re: Disk usage during savepoints

2020-12-12 Thread Rex Fenley
Also, a small correction from earlier: there are 4 volumes of 256 GiB, so that's 1 TiB total.

On Sat, Dec 12, 2020 at 10:08 AM Rex Fenley wrote:
> Our first big test run we wanted to eliminate as many variables as
> possible, so this is on 1 machine with 1 task manager and 1 parallelism.
> The

Re: Disk usage during savepoints

2020-12-12 Thread Rex Fenley
For our first big test run we wanted to eliminate as many variables as possible, so this is on 1 machine with 1 task manager and a parallelism of 1. The machine has 4 disks though, and as you can see, they all use roughly the same amount of space for storage until a savepoint is triggered. Could it be that

Re: Disk usage during savepoints

2020-12-12 Thread David Anderson
RocksDB does do compaction in the background, and incremental checkpoints simply mirror to S3 the set of RocksDB SST files needed by the current set of checkpoints. However, unlike checkpoints, which can be incremental, savepoints are always full snapshots. As for why one host would have much

Disk usage during savepoints

2020-12-11 Thread Rex Fenley
Hi, we're using the RocksDB state backend with incremental checkpoints and savepoints set up for S3. We notice that every time we trigger a savepoint, disk usage on one of the local disks on our host explodes. What are savepoints doing that would cause so much disk to be used? Our