Our Flink/YARN pipeline has been reading Avro from Kafka for a while now. We just introduced a new source that reads Avro OCF (Object Container Files) from S3. The Kafka Avro records continued to decode without incident, but the OCF files failed 100% of the time with anomalous parse errors in the decoding phase, after the schema and codec had been read from them successfully. The pipeline worked on my laptop, and a test Main program submitted to the Flink session in YARN also decoded the files successfully. Only the actual pipeline running in the TaskManager failed. At one point I even remote-debugged the TaskManager process and stepped through what looked like a normal Avro decode (if you can describe Avro code as normal!) -- until it abruptly failed with an int decode or what-have-you.
This stumped me for a while, but I finally tried moving flink-avro.jar out of Flink's lib directory and into the application jar, and that fixed it. I'm not sure why this is, especially since there were none of the typical classloader-type errors. This issue was observed on both Flink 1.5 and 1.6 in FLIP-6 mode. -Cliff
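
P.S. For anyone chasing a similar symptom: since moving the jar between lib and the application jar changed the outcome, it may help to log which classloader and which jar the Avro classes actually resolve from inside the TaskManager. The following is only a minimal sketch of that idea, not something I have confirmed would have exposed this particular failure. The class name ClassOrigin and the method describe are made up for illustration, and the two default class names are simply Avro classes I would check first.

```java
import java.net.URL;
import java.security.CodeSource;

public class ClassOrigin {

    /** Reports which classloader defined a class and which jar or directory it came from. */
    static String describe(Class<?> cls) {
        // A null loader means the class was defined by the JVM bootstrap loader.
        ClassLoader loader = cls.getClassLoader();
        // The code source can be null, e.g. for JDK classes.
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        URL location = (source == null) ? null : source.getLocation();
        return cls.getName()
                + " loader=" + (loader == null ? "bootstrap" : loader.getClass().getName())
                + " location=" + (location == null ? "unknown" : location.toString());
    }

    public static void main(String[] args) {
        // Class names to inspect; pass others as arguments to override the defaults.
        String[] names = args.length > 0 ? args : new String[] {
                "org.apache.avro.Schema",
                "org.apache.avro.file.DataFileStream"
        };
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        for (String name : names) {
            try {
                // Resolve through the context classloader without running static initializers.
                Class<?> cls = Class.forName(name, false, context);
                System.out.println(describe(cls));
            } catch (ClassNotFoundException e) {
                System.out.println(name + " not found on this classpath");
            }
        }
    }
}
```

Calling describe(...) from inside an operator (for example in an open() method) and comparing the TaskManager log output against the same call from the test Main program would show whether the two runs pick up Avro from different jars or classloaders.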