Hi, I've been running some Flink Scala applications on an AWS EMR cluster (EMR 5.26.0 with Flink 1.8.0 for Scala 2.11) for a while, and I've recently started having some issues.
I have a Flink app that reads some files from S3, processes them, and writes some output files back to S3 and some records to a database. The application is not very complex: it has one source that reads a directory (multiple files) and another that reads a single file, followed by some grouping and mapping, and a left outer join between these two sources.

The issue is that the application occasionally gets stuck with only two tasks running: one task has finished and the others never start. The two tasks that keep running forever are source1 (the directory with multiple files) and the left outer join; source2 (the single-file input) is the one that finishes. Interestingly, there should be several tasks between source1 and the left outer join, but they remain in the CREATED state.

When the app gets stuck, I usually just kill it and run it again, which works. The issue used to be infrequent but is becoming more and more frequent; it now happens almost every day. I also have a DEBUG log from a job that got stuck and another one from a job that worked. Thanks.
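For reference, a simplified sketch of the job's shape in the Flink 1.8 Scala DataSet API might look roughly like the following. It is not the actual code: the S3 paths, the `key,value` record format, the `parse` and `buildPlan` helpers, and the count aggregation are hypothetical stand-ins for the real grouping and mapping logic.

```scala
import org.apache.flink.api.scala._

// Hypothetical reconstruction of the job shape, not the real application.
object StuckJobSketch {

  // Assumption: each input line is "key,value".
  def parse(line: String): (String, String) = {
    val parts = line.split(",", 2)
    (parts(0), if (parts.length > 1) parts(1) else "")
  }

  // source1 -> map -> groupBy/sum, then left outer join with source2.
  def buildPlan(source1: DataSet[String], source2: DataSet[String]): DataSet[(String, Long, String)] = {
    // Placeholder for the real mapping/grouping: count lines per key.
    val counts = source1.map(l => (parse(l)._1, 1L)).groupBy(0).sum(1)
    val lookup = source2.map(l => parse(l))

    // In a left outer join, r is null when source2 has no matching key.
    counts.leftOuterJoin(lookup).where(0).equalTo(0) { (l, r) =>
      (l._1, l._2, if (r == null) "" else r._2)
    }
  }

  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    // Hypothetical paths: a directory (multiple files) and a single file.
    val source1 = env.readTextFile("s3://my-bucket/input-dir/")
    val source2 = env.readTextFile("s3://my-bucket/lookup.csv")
    buildPlan(source1, source2).writeAsCsv("s3://my-bucket/output/")
    env.execute("stuck-job-sketch")
  }
}
```

In a plan shaped like this, the map and groupBy/sum operators sit between source1 and the join, so they would be the tasks that stay in CREATED while source1 and the join keep running.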