Hi amenreet,

Based on the error message, I suggest logging in to the JM pod after it
starts and checking the access permissions on the directory
`file:///opt/flink/pm/ha`.
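
A minimal sketch of such a check, assuming a Python interpreter is
available in the pod (or on any host that mounts the same NFS export).
The helper name `check_dir_access` is hypothetical and not part of Flink;
it only reports what the current process is allowed to do on the path, so
it should be run as the same user the JM process runs as.

```python
import os
from urllib.parse import urlparse


def check_dir_access(uri: str) -> dict:
    """Report whether the directory behind a file:// URI is usable by this process."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("", "file"):
        raise ValueError(f"only file:// URIs or plain paths are supported, got {uri!r}")
    path = parsed.path

    report = {
        "path": path,
        "exists": os.path.isdir(path),
        # os.access checks permissions for the real uid/gid of this process.
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
        "listable": os.access(path, os.X_OK),
        "process_uid": os.getuid(),
        "process_gid": os.getgid(),
    }
    if report["exists"]:
        st = os.stat(path)
        report["mode"] = oct(st.st_mode & 0o777)
        report["owner_uid"] = st.st_uid
        report["owner_gid"] = st.st_gid
    return report


if __name__ == "__main__":
    # Path taken from the high-availability.storageDir setting in this thread.
    for key, value in check_dir_access("file:///opt/flink/pm/ha").items():
        print(f"{key}: {value}")
```

If `exists` is false or `writable` is false on the first deployment, that
would be consistent with the "base directory of the JobResultStore isn't
accessible" message, though it does not by itself prove the cause.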

Best,
Shammon FY


On Fri, Jul 7, 2023 at 6:04 PM amenreet sodhi <amenso...@gmail.com> wrote:

> Hi Shammon
>
> I am using an external NFS mount, which is mounted at the path
> /opt/flink/pm/, and the path mentioned there refers only to that mount,
> so it is not a local file. Could there be some other configuration issue?
>
> Thanks
> Regards
> Amenreet Singh Sodhi
>
> On Fri, Jul 7, 2023 at 2:00 PM Shammon FY <zjur...@gmail.com> wrote:
>
>> Hi amenreet,
>>
>> Maybe you can try using HDFS or S3 for `high-availability.storageDir`. I
>> noticed that your current job is using a local path, which starts with
>> `file:///`.
>>
>> Best,
>> Shammon FY
>>
>>
>> On Fri, Jul 7, 2023 at 4:20 PM amenreet sodhi <amenso...@gmail.com>
>> wrote:
>>
>>> Hi All,
>>> I am deploying a Flink cluster on Kubernetes in HA mode. However, I noticed
>>> that whenever I deploy the Flink cluster for the first time on a K8s
>>> cluster, it is not able to populate the cluster ConfigMap, and as a result
>>> the JM fails with the following exception:
>>>
>>> 2023-07-06 16:46:11,428 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint        [] - Fatal error occurred in the cluster entrypoint.
>>> java.util.concurrent.CompletionException: java.lang.IllegalStateException: The base directory of the JobResultStore isn't accessible. No dirty JobResults can be restored.
>>>     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:314) ~[?:?]
>>>     at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:319) [?:?]
>>>     at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1702) [?:?]
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
>>>     at java.lang.Thread.run(Thread.java:834) [?:?]
>>> Caused by: java.lang.IllegalStateException: The base directory of the JobResultStore isn't accessible. No dirty JobResults can be restored.
>>>     at org.apache.flink.util.Preconditions.checkState(Preconditions.java:193) ~[event_executor-1.1.20.jar:?]
>>>     at org.apache.flink.runtime.highavailability.FileSystemJobResultStore.getDirtyResultsInternal(FileSystemJobResultStore.java:182) ~[event_executor-1.1.20.jar:?]
>>>     at org.apache.flink.runtime.highavailability.AbstractThreadsafeJobResultStore.withReadLock(AbstractThreadsafeJobResultStore.java:118) ~[event_executor-1.1.20.jar:?]
>>>     at org.apache.flink.runtime.highavailability.AbstractThreadsafeJobResultStore.getDirtyResults(AbstractThreadsafeJobResultStore.java:100) ~[event_executor-1.1.20.jar:?]
>>>     at org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.getDirtyJobResults(SessionDispatcherLeaderProcess.java:194) ~[event_executor-1.1.20.jar:?]
>>>     at org.apache.flink.runtime.dispatcher.runner.AbstractDispatcherLeaderProcess.supplyUnsynchronizedIfRunning(AbstractDispatcherLeaderProcess.java:198) ~[event_executor-1.1.20.jar:?]
>>>     at org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess.getDirtyJobResultsIfRunning(SessionDispatcherLeaderProcess.java:188) ~[event_executor-1.1.20.jar:?]
>>>     at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) ~[?:?]
>>>
>>> Once we reinstall or run helm upgrade, this exception goes away. How can
>>> this be resolved? Is any additional configuration required?
>>>
>>> I am using the following configuration for HA:
>>>
>>>     high-availability.storageDir: file:///opt/flink/pm/ha
>>>     kubernetes.cluster-id: {{ include "fullname" . }}-cluster-{{ now | date "20060102150405" }}
>>>     high-availability.jobmanager.port: 6123
>>>     high-availability.type: kubernetes
>>>     high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
>>>     kubernetes.namespace: {{ .Release.Namespace }}
>>>
>>> Thanks
>>>
>>> Regards
>>> Amenreet Singh Sodhi
>>>
>>>
