Hi Flink Community,

In our Flink deployments, we see some Flink threads become CPU-busy (stuck) after a few hours, with the stack trace below:

"Sink:AggregationSink (2/4)" #567 daemon prio=5 os_prio=0 tid=0x00007f901dc97000 nid=0x254 runnable [0x00007f8fe017f000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.flink.api.java.typeutils.runtime.DataInputViewStream.read(DataInputViewStream.java:70)
    at com.esotericsoftware.kryo.io.Input.fill(Input.java:146)
    at org.apache.flink.api.java.typeutils.runtime.NoFetchingInput.require(NoFetchingInput.java:76)
    at com.esotericsoftware.kryo.io.Input.readVarInt(Input.java:355)
    at com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:109)
    at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:641)
    at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:752)
    at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:236)
    at org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:187)
    at org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.deserialize(StreamElementSerializer.java:40)
    at org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55)
    at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:109)
    at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:147)
    at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:63)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:261)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:656)
    at java.lang.Thread.run(Thread.java:745)

A few observations from the stack while the thread is stuck:
In SpillingAdaptiveSpanningRecordDeserializer.getNextRecord, nonSpanningRemaining = 13.
All 13 of these bytes had already been read before execution reached com.esotericsoftware.kryo.io.Input.readVarInt.

We are wondering whether this is a serialization bug or memory segment corruption. Any pointers on how to debug this further would be much appreciated.

Regards,
Mehar