Hi Jamie,

No, it was nothing of the class not found variety, just parse errors. It
had to do with Avro getting mixed up with different versions.

-Cliff

On Mon, Aug 20, 2018 at 4:18 PM Jamie Grier <jgr...@lyft.com> wrote:

> Hey Cliff, can you provide the stack trace of the issue you were seeing?
> We recently ran into a similar issue that we're still debugging.  Did it
> look like this:
>
> java.lang.IllegalStateException: Could not initialize operator state
>> backend.
>> at
>> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initOperatorState(AbstractStreamOperator.java:301)
>> at
>> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:248)
>> at
>> org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:692)
>> at
>> org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:679)
>> at
>> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:253)
>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
>> at java.lang.Thread.run(Thread.java:748)
>> Caused by: java.lang.UnsupportedOperationException: Could not find
>> required Avro dependency.
>> at
>> org.apache.flink.api.java.typeutils.runtime.kryo.Serializers$DummyAvroKryoSerializerClass.read(Serializers.java:170)
>> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
>> at
>> org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:249)
>> at
>> org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:136)
>> at
>> org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:30)
>> at
>> org.apache.flink.runtime.state.DefaultOperatorStateBackend.deserializeStateValues(DefaultOperatorStateBackend.java:584)
>> at
>> org.apache.flink.runtime.state.DefaultOperatorStateBackend.restore(DefaultOperatorStateBackend.java:399)
>> at
>> org.apache.flink.streaming.runtime.tasks.StreamTask.createOperatorStateBackend(StreamTask.java:733)
>> at
>> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initOperatorState(AbstractStreamOperator.java:299)
>> ... 6 common frames omitted
>> 00:17:19.626 INFO o.a.f.r.e.ExecutionGraph - Job
>> ClientEventToElasticsearchJob (5cec438674e9a111703c83897f7c8138) switched
>> from state RUNNING to FAILING.
>> java.lang.IllegalStateException: Could not initialize operator state
>> backend.
>> at
>> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initOperatorState(AbstractStreamOperator.java:301)
>> at
>> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:248)
>> at
>> org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:692)
>> at
>> org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:679)
>> at
>> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:253)
>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
>> at java.lang.Thread.run(Thread.java:748)
>> Caused by: java.lang.UnsupportedOperationException: Could not find
>> required Avro dependency.
>> at
>> org.apache.flink.api.java.typeutils.runtime.kryo.Serializers$DummyAvroKryoSerializerClass.read(Serializers.java:170)
>> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
>> at
>> org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:249)
>> at
>> org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:136)
>> at
>> org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:30)
>> at
>> org.apache.flink.runtime.state.DefaultOperatorStateBackend.deserializeStateValues(DefaultOperatorStateBackend.java:584)
>> at
>> org.apache.flink.runtime.state.DefaultOperatorStateBackend.restore(DefaultOperatorStateBackend.java:399)
>> at
>> org.apache.flink.streaming.runtime.tasks.StreamTask.createOperatorStateBackend(StreamTask.java:733)
>> at
>> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initOperatorState(AbstractStreamOperator.java:299)
>> ... 6 common frames omitted
>
>
> -Jamie
>
>
> On Mon, Aug 20, 2018 at 10:42 AM, Cliff Resnick <cre...@gmail.com> wrote:
>
>> Hi Vino,
>>
>> You were right in your assumption -- unshaded avro was being added to our
>> application jar via third-party dependency. Excluding it in packaging fixed
>> the issue. For the record, it looks flink-avro must be loaded from the lib
>> or there will be errors in checkpoint restores.
>>
>> On Mon, Aug 20, 2018 at 8:43 AM Cliff Resnick <cre...@gmail.com> wrote:
>>
>>> Hi Vino,
>>>
>>> Thanks for the explanation, but the job only ever uses the Avro (1.8.2)
>>> pulled in by flink-formats/avro, so it's not a class version conflict
>>> there.
>>>
>>> I'm using default child-first loading. It might be a further transitive
>>> dependency, though it's not clear by stack trace or stepping through the
>>> process. When I get a chance I'll look further into it but in case anyone
>>> is experiencing similar problems, what is clear is that classloader order
>>> does matter with Avro.
>>>
>>> On Sun, Aug 19, 2018, 11:36 PM vino yang <yanghua1...@gmail.com> wrote:
>>>
>>>> Hi Cliff,
>>>>
>>>> My personal guess is that this may be caused by Job's Avro conflict
>>>> with the Avro that the Flink framework itself relies on.
>>>> Flink has provided some configuration parameters which allows you to
>>>> determine the order of the classloaders yourself. [1]
>>>> Alternatively, you can debug classloading and participate in the
>>>> documentation.[2]
>>>>
>>>> [1]:
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.6/ops/config.html
>>>> [2]:
>>>> https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/debugging_classloading.html
>>>>
>>>> Thanks, vino.
>>>>
>>>> Cliff Resnick <cre...@gmail.com> 于2018年8月20日周一 上午10:40写道:
>>>>
>>>>> Our Flink/YARN pipeline has been reading Avro from Kafka for a while
>>>>> now. We just introduced a source of Avro OCF (Object Container Files) read
>>>>> from S3. The Kafka Avro continued to decode without incident, but the OCF
>>>>> files failed 100% with anomalous parse errors in the decoding phase after
>>>>> the schema and codec were successfully read from them. The pipeline would
>>>>> work on my laptop, and when I submitted a test Main program to the Flink
>>>>> Session in YARN, that would also successfully decode. Only the actual
>>>>> pipeline run from the TaskManager failed. At one point I even remote
>>>>> debugged the TaskManager process and stepped through what looked like a
>>>>> normal Avro decode (if you can describe Avro code as normal!) -- until it
>>>>> abruptly failed with an int decode or what-have-you.
>>>>>
>>>>> This stumped me for a while, but I finally tried moving flink-avro.jar
>>>>> from the lib to the application jar, and that fixed it. I'm not sure why
>>>>> this is, especially since there were no typical classloader-type errors.
>>>>> This issue was observed both on Flink 1.5 and 1.6 in Flip-6 mode.
>>>>>
>>>>> -Cliff
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>

Reply via email to