Our Flink/YARN pipeline has been reading Avro from Kafka for a while now.
We just introduced a source of Avro OCF (Object Container Files) read from
S3. The Kafka Avro continued to decode without incident, but the OCF files
failed 100% with anomalous parse errors in the decoding phase after the
schema and codec were successfully read from them. The pipeline would work
on my laptop, and when I submitted a test Main program to the Flink Session
in YARN, that would also successfully decode. Only the actual pipeline run
from the TaskManager failed. At one point I even remote debugged the
TaskManager process and stepped through what looked like a normal Avro
decode (if you can describe Avro code as normal!) -- until it abruptly
failed with an int decode or what-have-you.

This stumped me for a while, but I finally tried moving flink-avro.jar from
the lib to the application jar, and that fixed it. I'm not sure why this
is, especially since there were no typical classloader-type errors.  This
issue was observed both on Flink 1.5 and 1.6 in Flip-6 mode.

-Cliff

Reply via email to