Hi
The stack trace says that the job failed while restoring from a
checkpoint/savepoint. If you encounter this during failover, you could try
to find the root cause that triggered the job failover in the first place.
As for the stack trace, it occurs because when restoring a
`HeapPriorityQueue`, Flink would ensure there are enough
Hi Vishwas,
Your scenario sounds like one where RocksDB would actually be recommended. I
would always suggest starting with RocksDB, unless your state is really small
compared to the available memory, or you need to optimize for performance.
But maybe your job is running fine with RocksDB (performance
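If you want to try it, switching the backend is mostly configuration. Here is a minimal sketch of a `flink-conf.yaml` fragment; the checkpoint directory is a placeholder you would replace with your own path:

```yaml
# flink-conf.yaml sketch -- the checkpoint directory below is a placeholder
state.backend: rocksdb
state.checkpoints.dir: hdfs:///flink/checkpoints
# incremental checkpoints are optional, but usually helpful with large state
state.backend.incremental: true
```

The backend can also be set programmatically per job, but the config-file route avoids touching job code.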
Thanks Andrey,
My question is related to
The FsStateBackend is encouraged for:
- Jobs with large state, long windows, large key/value states.
- All high-availability setups.
How large is "large state", excluding any overhead added by the framework?
Best,
Vishwas
On Wed, Aug 26, 2020 at 12:10
Hi Vishwas,
> is this quantifiable with respect to JVM heap size on a single node
> without the node being used for other tasks ?
I don't quite understand this question. I believe the recommendation in the
docs has the same rationale: use larger state objects so that the Java
object overhead pays off.
Hi Vishwas,
I believe the screenshots are from a heap size of 1GB?
There are indeed many internal Flink state objects. They are the overhead
required for Flink to organise and track the state on-heap.
Depending on the actual size of your state objects, the overhead may be
relatively large
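To get a feel for how quickly that overhead adds up, here is a back-of-the-envelope sketch in plain Java. All per-object sizes are assumptions for a typical 64-bit JVM with compressed oops (illustrative, not measured), and the entry count and payload size are hypothetical:

```java
// Back-of-the-envelope estimate of on-heap state overhead.
// All per-object sizes below are ASSUMPTIONS for a typical 64-bit JVM
// with compressed oops; they are illustrative, not measured.
public class StateOverheadEstimate {
    public static void main(String[] args) {
        long entries = 10_000_000L;   // hypothetical number of keyed-state entries
        long payload = 16;            // raw bytes per entry, e.g. a long key + long value

        long boxedKey   = 24;         // java.lang.Long: header + value + padding
        long boxedValue = 24;
        long mapNode    = 32;         // HashMap.Node: header + hash + 3 references
        long tableSlot  = 8;          // bucket-array reference

        long payloadTotal = entries * payload;
        long heapTotal    = entries * (boxedKey + boxedValue + mapNode + tableSlot);

        System.out.println("payload MB: " + payloadTotal / (1024 * 1024)); // 152
        System.out.println("on-heap MB: " + heapTotal / (1024 * 1024));    // 839
    }
}
```

Under these assumptions, roughly 150 MB of raw key/value data could occupy well over 800 MB of heap, which is why "state size" reported by Flink and actual heap usage can differ so much.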
Hi Vishwas,
If you use Flink 1.7, check the older memory model docs [1], because your
reference [2] refers to the new memory model of Flink 1.10.
Could you also share a screenshot where you get the state size of 2.5 GB?
Do you mean Flink WebUI?
Generally, it is quite hard to estimate the
Hi Vishwas,
According to the log, heap space is 13+GB, which looks fine.
Several reasons might lead to the heap space OOM:
- Memory leak
- Not enough GC threads
- Concurrent GC starts too late
- ...
I would suggest taking a look at the GC logs.
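If GC logging is not yet enabled, it can be turned on through the TaskManager JVM options. A sketch of a `flink-conf.yaml` fragment, assuming a Java 8 JVM (the era of Flink 1.7) and a placeholder log path:

```yaml
# flink-conf.yaml sketch -- Java 8 GC-logging flags; the log path is a placeholder
env.java.opts.taskmanager: "-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/tmp/taskmanager-gc.log"
```

Frequent full GCs that reclaim little memory would point toward a leak or too-late concurrent GC, while long young-generation pauses may point toward GC thread tuning.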
Thank you~
Xintong Song
On Fri,
Hi guys,
I use Flink version 1.7.2.
I have a stateful streaming job which uses a keyed process function with the
heap state backend. Although I set the TM heap size to 16 GB, I get an OOM
error when the state size is around 2.5 GB (as reported on the dashboard).
I have set taskmanager.memory.fraction: