Hi Vino,

You were right in your assumption: an unshaded Avro was being added to our
application jar via a third-party dependency. Excluding it during packaging
fixed the issue. For the record, it looks like flink-avro must be loaded from
lib/, or there will be errors on checkpoint restore.
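For anyone chasing the same thing, one way to see which copy of a class actually won is to ask the class where it came from. Below is a minimal sketch using only the JDK. WhichJar is a hypothetical helper, not part of Flink or Avro. Inside a Flink function you could pass getRuntimeContext().getUserCodeClassLoader() as the loader and log the result from open(). The class name org.apache.avro.Schema is just an example.

```java
import java.net.URL;
import java.security.CodeSource;

/**
 * Hypothetical diagnostic helper (not a Flink/Avro API): reports which
 * classloader loaded a class and which jar (code source) it came from.
 */
public final class WhichJar {
    private WhichJar() {}

    /** Describes the loader and jar location of an already-loaded class. */
    public static String describe(Class<?> clazz) {
        ClassLoader loader = clazz.getClassLoader();
        // Bootstrap (JDK core) classes report a null classloader.
        String loaderName = (loader == null) ? "bootstrap" : loader.toString();
        // Bootstrap classes also have no CodeSource, so guard against null.
        CodeSource source = clazz.getProtectionDomain().getCodeSource();
        URL location = (source == null) ? null : source.getLocation();
        return clazz.getName() + " loaded by " + loaderName
                + " from " + (location == null ? "<unknown>" : location);
    }

    /** Resolves a class by name through the given loader, without initializing it. */
    public static String describe(String className, ClassLoader loader) {
        try {
            return describe(Class.forName(className, false, loader));
        } catch (ClassNotFoundException e) {
            return className + " not visible from " + loader;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        // Example lookup; in a Flink job, use the user-code classloader instead.
        System.out.println(describe("org.apache.avro.Schema", loader));
        System.out.println(describe(String.class));
    }
}
```

If the Avro location points at the application jar rather than lib/, an unshaded copy is being bundled. With Flink's default child-first resolution (the classloader.resolve-order setting), that bundled copy takes precedence.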

On Mon, Aug 20, 2018 at 8:43 AM Cliff Resnick <cre...@gmail.com> wrote:

> Hi Vino,
>
> Thanks for the explanation, but the job only ever uses the Avro (1.8.2)
> pulled in by flink-formats/avro, so it's not a class version conflict
> there.
>
> I'm using the default child-first loading. It might be a further transitive
> dependency, though that isn't clear from the stack trace or from stepping
> through the process. When I get a chance I'll look into it further, but in
> case anyone is experiencing similar problems, what is clear is that
> classloader order does matter with Avro.
>
> On Sun, Aug 19, 2018, 11:36 PM vino yang <yanghua1...@gmail.com> wrote:
>
>> Hi Cliff,
>>
>> My personal guess is that this may be caused by a conflict between the
>> job's Avro and the Avro that the Flink framework itself relies on.
>> Flink provides configuration parameters that let you determine the
>> classloader resolution order yourself. [1]
>> Alternatively, you can debug the classloading by following the
>> documentation. [2]
>>
>> [1]:
>> https://ci.apache.org/projects/flink/flink-docs-release-1.6/ops/config.html
>> [2]:
>> https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/debugging_classloading.html
>>
>> Thanks, vino.
>>
>> On Mon, Aug 20, 2018 at 10:40 AM Cliff Resnick <cre...@gmail.com> wrote:
>>
>>> Our Flink/YARN pipeline has been reading Avro from Kafka for a while
>>> now. We just introduced a new source: Avro OCF (Object Container Files)
>>> read from S3. The Kafka Avro continued to decode without incident, but
>>> every OCF file failed with anomalous parse errors in the decoding phase,
>>> after the schema and codec had been read from it successfully. The
>>> pipeline worked on my laptop, and when I submitted a test Main program to
>>> the Flink session on YARN, that also decoded successfully. Only the
>>> actual pipeline, running in the TaskManager, failed. At one point I even
>>> remote-debugged the TaskManager process and stepped through what looked
>>> like a normal Avro decode (if you can describe Avro code as normal!),
>>> until it abruptly failed on an int decode or what-have-you.
>>>
>>> This stumped me for a while, but I finally tried moving flink-avro.jar
>>> from lib/ into the application jar, and that fixed it. I'm not sure why
>>> this is, especially since there were no typical classloader-type errors.
>>> This issue was observed on both Flink 1.5 and 1.6 in FLIP-6 mode.
>>>
>>> -Cliff
