Re: [External Sender] Re: ERROR org.apache.flink.runtime.io.network.netty.PartitionRequestQueue

2020-12-09 Thread Piotr Nowojski
Hi, At first glance, I cannot find anything wrong with those settings. If some memory configuration problem had caused this error, I guess it would be visible as an exception somewhere. It's unlikely to be a GC issue, as if some machine froze and stopped responding for a longer period of tim

Re: [External Sender] Re: ERROR org.apache.flink.runtime.io.network.netty.PartitionRequestQueue

2020-12-08 Thread Piotr Nowojski
Hi, This exception looks like it was thrown by a downstream Task/TaskManager when trying to read a message/packet from some upstream Task/TaskManager, and the connection between the two TaskManagers was reset (closed abruptly). So it's the case: > involves communicating with other non-collocated tas

Re: [External Sender] Re: ERROR org.apache.flink.runtime.io.network.netty.PartitionRequestQueue

2020-12-08 Thread Kye Bae
Hello, Piotr. Thank you. This is an error logged to the taskmanager just before it became "lost" to the jobmanager (i.e., reported as "lost" in the jobmanager log just before the job restart). In what context would this particular error (not the root-root cause you referred to) be thrown from a t

Re: ERROR org.apache.flink.runtime.io.network.netty.PartitionRequestQueue

2020-12-08 Thread Piotr Nowojski
Hi Kye, This error is almost certainly not the primary cause of the failure. It means that the node reporting it has detected some fatal failure on the other side of the wire (connection reset by peer), but the original error is somehow too slow or unable to propagate to the JobManager bef

Re: ERROR org.apache.flink.runtime.io.network.netty.PartitionRequestQueue

2020-12-07 Thread Kye Bae
I forgot to mention: this is Flink 1.10. -K On Mon, Dec 7, 2020 at 5:08 PM Kye Bae wrote: > Hello! > > We have a real-time streaming workflow that has been running for about 2.5 > weeks. > > Then, we began to get the exception below from taskmanagers (random) since > yesterday, and the job bega