Hi,

I'm running Flink batch jobs on EMR 5.21, and more than 50% of them stall
and make no progress after an initial period. I saw the same behaviour
on 5.17, but far less often.

The job is a fairly simple enrichment job: it loads an Avro metadata file,
creates several datasets from that file and broadcasts them. Later they
are used in joins with the dataset of input events, which are also Avro
files. There are no shuffles or keyBy operations.
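
To make the shape of the job concrete, here is a minimal plain-Java sketch
of the enrichment pattern. It is not the actual job code and it uses no
Flink API calls. The class name, method names, the {key, value} record
layout and the "unknown" fallback for unmatched events are all made up for
illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; the real job uses Flink datasets read from Avro.
final class EnrichmentSketch {

    // Build the lookup table that each parallel task would hold once the
    // metadata dataset has been broadcast to it. Records are {key, label} pairs.
    static Map<String, String> buildLookup(List<String[]> metadata) {
        Map<String, String> lookup = new HashMap<>();
        for (String[] record : metadata) {
            lookup.put(record[0], record[1]);
        }
        return lookup;
    }

    // Enrich each event {key, payload} locally against the lookup table.
    // Events are never repartitioned, which is why there is no shuffle.
    // Falling back to "unknown" for unmatched keys is an assumption.
    static List<String> enrich(Map<String, String> lookup, List<String[]> events) {
        List<String> enriched = new ArrayList<>();
        for (String[] event : events) {
            String label = lookup.getOrDefault(event[0], "unknown");
            enriched.add(event[0] + "," + event[1] + "," + label);
        }
        return enriched;
    }

    public static void main(String[] args) {
        List<String[]> metadata = new ArrayList<>();
        metadata.add(new String[] {"a", "alpha"});

        List<String[]> events = new ArrayList<>();
        events.add(new String[] {"a", "1"});
        events.add(new String[] {"b", "2"});

        // Prints "a,1,alpha" and then "b,2,unknown".
        for (String line : enrich(buildLookup(metadata), events)) {
            System.out.println(line);
        }
    }
}
```

In the real job the lookup side comes from the broadcast datasets and the
enriched events go to the Parquet output.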

I see nothing in the logs at INFO level, and the UI for the stalled jobs
shows the following:
* The metadata loading tasks are finished.
* All other tasks are running, except the Parquet output, which is in state
"created".
* The task immediately upstream of the Parquet output task shows back
pressure status "OK"; the task before that one shows back pressure status
"High".

Are there any specific logs I should enable to get more information on
this? Has anyone else seen this behaviour?

Kind regards,
Marko
