Re: OOM error for heap state backend.

2020-08-27 Thread Congxian Qiu
Hi, The stack trace says that the job failed while restoring from a checkpoint/savepoint. If you encounter this during failover, maybe you can try to find out the root cause that made the job fail over in the first place. As for the stack trace, it is because when restoring the `HeapPriorityQueue`, Flink ensures there is enough

Re: OOM error for heap state backend.

2020-08-27 Thread Robert Metzger
Hi Vishwas, Your scenario sounds like one where RocksDB would actually be recommended. I would always suggest starting with RocksDB, unless your state is really small compared to the available memory, or you need to optimize for performance. But maybe your job is running fine with RocksDB (performance
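
For reference, a minimal sketch of how the choice between the two backends is expressed on the Flink 1.7-era API; the checkpoint URI below is a placeholder, not something from this thread:

    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.runtime.state.filesystem.FsStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class StateBackendChoice {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Heap-based backend: working state lives as objects on the TaskManager heap,
            // checkpoints are written to the given file system path.
            env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"));

            // RocksDB backend (needs the flink-statebackend-rocksdb dependency):
            // working state lives off-heap in RocksDB and can grow beyond the heap size.
            // Pick one backend or the other; this call would override the previous one.
            env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true));
        }
    }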

Re: OOM error for heap state backend.

2020-08-26 Thread Vishwas Siravara
Thanks Andrey. My question relates to this part of the docs: "The FsStateBackend is encouraged for: jobs with large state, long windows, large key/value states; all high-availability setups." How large is "large state", excluding any overhead added by the framework? Best, Vishwas On Wed, Aug 26, 2020 at 12:10

Re: OOM error for heap state backend.

2020-08-26 Thread Andrey Zagrebin
Hi Vishwas, > is this quantifiable with respect to JVM heap size on a single node without the node being used for other tasks? I don't quite understand this question. I believe the recommendation in the docs has the same rationale: use larger state objects so that the Java object overhead pays off.

Re: OOM error for heap state backend.

2020-08-26 Thread Andrey Zagrebin
Hi Vishwas, I believe the screenshots are from a heap size of 1 GB? There are indeed many internal Flink state objects; they are the overhead required for Flink to organise and track the state on-heap. Depending on the actual size of your state objects, the overhead may be relatively large
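
To put a number on that per-object overhead, one option (not mentioned in the thread, purely an illustration) is the OpenJDK JOL tool, which prints the actual on-heap layout of an object; a boxed 8-byte value typically occupies around 24 bytes on its own, before any of Flink's per-entry bookkeeping:

    // Requires the org.openjdk.jol:jol-core dependency (an assumed extra tool).
    import org.openjdk.jol.info.ClassLayout;
    import org.openjdk.jol.info.GraphLayout;

    public class StateObjectOverhead {
        public static void main(String[] args) {
            // Layout of a single boxed Long: object header + 8-byte payload (+ padding).
            System.out.println(ClassLayout.parseInstance(Long.valueOf(42L)).toPrintable());

            // Retained footprint of a small composite value, e.g. a short String:
            // headers, the backing array, and references all count against the heap.
            System.out.println(GraphLayout.parseInstance("some-small-state-value").toFootprint());
        }
    }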

Re: OOM error for heap state backend.

2020-08-25 Thread Andrey Zagrebin
Hi Vishwas, If you use Flink 1.7, check the older memory model docs [1], because you referred to the new memory model of Flink 1.10 in your reference 2. Could you also share the screenshot where you see the state size of 2.5 GB? Do you mean the Flink WebUI? Generally, it is quite hard to estimate the

Re: OOM error for heap state backend.

2020-08-23 Thread Xintong Song
Hi Vishwas, According to the log, the heap space is 13+ GB, which looks fine. Several reasons might lead to the heap space OOM: a memory leak, not enough GC threads, concurrent GC starting too late, ... I would suggest taking a look at the GC logs. Thank you~ Xintong Song On Fri,
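
If it helps, GC logging can usually be turned on by passing the standard JVM flags to the TaskManager via flink-conf.yaml; the flags below are for a Java 8 JVM (which a Flink 1.7 deployment would typically run on), and the log path is a placeholder:

    env.java.opts.taskmanager: "-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/path/to/taskmanager-gc.log"

On Java 9 and later, the unified -Xlog:gc* option is the equivalent.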

OOM error for heap state backend.

2020-08-21 Thread Vishwas Siravara
Hi guys, I use Flink version 1.7.2. I have a stateful streaming job which uses a keyed process function, and I use the heap state backend. Although I set the TM heap size to 16 GB, I get an OOM error when the state size is around 2.5 GB (I get the state size from the dashboard). I have set taskmanager.memory.fraction:
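
For context, here is a minimal sketch of the kind of job described above: a keyed process function keeping one value of keyed state per key, which with the heap state backend lives entirely as objects on the TaskManager heap. Class and field names are made up for illustration:

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.util.Collector;

    // Per-key running count: every live entry is a Java object on the heap,
    // so total heap usage grows with the number of distinct keys.
    public class CountPerKey extends KeyedProcessFunction<String, String, Long> {
        private transient ValueState<Long> count;

        @Override
        public void open(Configuration parameters) {
            count = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("count", Long.class));
        }

        @Override
        public void processElement(String value, Context ctx, Collector<Long> out) throws Exception {
            Long current = count.value();
            long updated = (current == null ? 0L : current) + 1;
            count.update(updated);
            out.collect(updated);
        }
    }

It would be wired in with something like stream.keyBy(...).process(new CountPerKey()).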