Python pipeline on Flink not finishing cleanly

Janek Bevendorff Thu, 27 Jan 2022 06:40:40 -0800

Hi,

When I run a Python pipeline with multiple concurrent TaskManagers onFlink, the job hardly ever (or never) finishes properly. At the end,Beam (or Flink?) always throws a seemingly random gRPCIllegalStateException after my last GlobalCombine, so Beam goes intosome weird error handling mode and eventually fails to job, even thoughit should have finished cleanly.

This is only reproducible with parallelism set to at least 5 or 8. With1-4, I cannot reliably (or at all) reproduce it. It looks like a similarissue has already been reported on Jira(https://issues.apache.org/jira/browse/BEAM-8980), but it got marked asstale. Anyone else seeing this? Is there anything I can do? I don't wantmy job to restart after it's finished and I want a clean exit status,otherwise I don't really know if everything succeeded properly and Idon't want to comb through hundreds of log files to find out.

I added a comment with the stacktraces that I get below theabove-mentioned issue:https://issues.apache.org/jira/browse/BEAM-8980?focusedCommentId=17483174&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17483174


Janek

Python pipeline on Flink not finishing cleanly

Reply via email to