Stefan Richter created FLINK-4150:
-------------------------------------

             Summary: Problem with Blobstore in Yarn HA setting on recovery after cluster shutdown
                 Key: FLINK-4150
                 URL: https://issues.apache.org/jira/browse/FLINK-4150
             Project: Flink
          Issue Type: Bug
          Components: Job-Submission
            Reporter: Stefan Richter


Submitting a job in Yarn with HA can lead to the following exception:

{code}
org.apache.flink.streaming.runtime.tasks.StreamTaskException: Cannot load user class: org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09
ClassLoader info: URL ClassLoader:
    file: '/tmp/blobStore-ccec0f4a-3e07-455f-945b-4fcd08f5bac1/cache/blob_7fafffe9595cd06aff213b81b5da7b1682e1d6b0' (invalid JAR: zip file is empty)
Class not resolvable through given classloader.
        at org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperator(StreamConfig.java:207)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:222)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:588)
        at java.lang.Thread.run(Thread.java:745)
{code}

Some job information, including the Blob IDs, is stored in ZooKeeper. If the
recovery mode is set to ZooKeeper, the actual Blobs are stored in a dedicated
BlobStore, which is typically located in a file system such as HDFS. When the
cluster is shut down, the path for the BlobStore is deleted. When the cluster
is then restarted, recovering jobs cannot be restored because their Blob IDs
stored in ZooKeeper now point to deleted files.
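
To make the failure mode concrete, here is a minimal, self-contained sketch
that does not use any Flink or ZooKeeper classes. It is only an illustration
under stated assumptions: a plain Map stands in for the job metadata kept in
ZooKeeper, a temporary local directory stands in for the BlobStore path, and
the names BlobRecoverySketch, findDanglingBlobs and deleteRecursively are
hypothetical. It shows how metadata that survives a shutdown ends up
referencing blobs that no longer exist.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class BlobRecoverySketch {

    /**
     * Returns the job ids whose referenced blob is missing or empty.
     * jobToBlobKey is a stand-in for the metadata kept in ZooKeeper (assumption).
     */
    public static List<String> findDanglingBlobs(Map<String, String> jobToBlobKey,
                                                 Path blobStoreDir) throws IOException {
        List<String> dangling = new ArrayList<>();
        for (Map.Entry<String, String> entry : jobToBlobKey.entrySet()) {
            Path blob = blobStoreDir.resolve(entry.getValue());
            // A missing file or a zero-length file cannot be loaded as a JAR.
            if (!Files.isRegularFile(blob) || Files.size(blob) == 0) {
                dangling.add(entry.getKey());
            }
        }
        return dangling;
    }

    /** Deletes a directory tree, mimicking the BlobStore cleanup on cluster shutdown. */
    public static void deleteRecursively(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order so that files are deleted before their parent directories.
            List<Path> ordered = paths.sorted(Comparator.reverseOrder())
                                      .collect(Collectors.toList());
            for (Path p : ordered) {
                Files.delete(p);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the BlobStore path (for example a directory on HDFS).
        Path blobStoreDir = Files.createTempDirectory("blobStore-sketch");

        // Submit: the blob goes to the store, its key goes to the metadata.
        Map<String, String> jobToBlobKey = new HashMap<>();
        Files.write(blobStoreDir.resolve("blob_user_jar"), new byte[] {1, 2, 3});
        jobToBlobKey.put("job-1", "blob_user_jar");
        System.out.println("dangling before shutdown: "
                + findDanglingBlobs(jobToBlobKey, blobStoreDir));

        // Shutdown: the store path is deleted, but the metadata is kept.
        deleteRecursively(blobStoreDir);

        // Restart: recovery finds the job, but its blob is gone.
        System.out.println("dangling after shutdown: "
                + findDanglingBlobs(jobToBlobKey, blobStoreDir));
    }
}
```

In this sketch the two cleanups are decoupled in the same way as described
above: the directory is removed while the map is left untouched. Either
cleaning up both or keeping both would avoid the dangling references; which
of the two is the right behavior is exactly the open question of this issue.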

In particular, this problem frequently occurs for HA in combination with -m
yarn-cluster. We should discuss to what extent this combination actually makes
sense and what the expected behavior should be.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
