[QUESTION] In Flink k8s Application mode with HA, cannot use the History Server as the history backend

2022-05-11 Thread 谭家良
In Flink k8s application mode with high availability, the job id is always 
00000000000000000000000000000000, but the history server uses the job id as 
its key. How can I use application mode with HA and still store the status of 
finished jobs in the history server?




Best,

tanjialiang.

Re: Can the same flink history server support multiple flink application clusters?

2021-08-20 Thread Tony Wei
Hi Chenyu,

This is indeed a problem that has not been solved yet; see the related jira issue [1].
The discussion under that jira issue mentions a workaround: use -D\$internal.pipeline.job-id=$(cat
/proc/sys/kernel/random/uuid|tr -d "-") to proactively generate a random jobid for application-mode jobs.
However, since this configuration parameter is a flink-internal one, backward compatibility
across future changes may not be guaranteed, so please consider carefully before using it.

[1] https://issues.apache.org/jira/browse/FLINK-19358
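
A minimal sketch of where that flag goes, assuming a kubernetes-application
submission like the ones later in this digest (the cluster id and image are
placeholders, not from the original thread):

# give every application-mode run a random 32-hex-char job id
./bin/flink run-application --target kubernetes-application \
  -Dkubernetes.cluster-id=my-batch-cluster \
  -Dkubernetes.container.image=my-registry/my-job:latest \
  -D\$internal.pipeline.job-id=$(cat /proc/sys/kernel/random/uuid | tr -d "-") \
  local:///opt/flink/usrlib/job.jar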


On Fri, Aug 20, 2021 at 12:16 PM Chenyu Zheng wrote:

> The History Server's API also distinguishes jobs by jobid:
>
>   *   /config
>   *   /jobs/overview
>   *   /jobs/<jobid>
>   *   /jobs/<jobid>/vertices
>   *   /jobs/<jobid>/config
>   *   /jobs/<jobid>/exceptions
>   *   /jobs/<jobid>/accumulators
>   *   /jobs/<jobid>/vertices/<vertexid>
>   *   /jobs/<jobid>/vertices/<vertexid>/subtasktimes
>   *   /jobs/<jobid>/vertices/<vertexid>/taskmanagers
>   *   /jobs/<jobid>/vertices/<vertexid>/accumulators
>   *   /jobs/<jobid>/vertices/<vertexid>/subtasks/accumulators
>   *   /jobs/<jobid>/vertices/<vertexid>/subtasks/<subtasknum>
>   *   /jobs/<jobid>/vertices/<vertexid>/subtasks/<subtasknum>/attempts/<attempt>
>   *   /jobs/<jobid>/vertices/<vertexid>/subtasks/<subtasknum>/attempts/<attempt>/accumulators
>   *   /jobs/<jobid>/plan
>
>
> From: Chenyu Zheng 
> Reply-To: "user-zh@flink.apache.org" 
> Date: Friday, August 20, 2021 at 11:43 AM
> To: "user-zh@flink.apache.org" 
> Subject: Can the same flink history server support multiple flink application clusters?
>
> Hello,
>
> We currently run jobs on k8s in flink application mode, and now want to deploy a history
> server to make debugging easier. According to the docs, however, the flink
> historyserver seems to only support different jobs within a single cluster; with
> multiple clusters, identical jobIDs will cause errors.
>
> For multiple application clusters, what is the best practice for using the history server?
>
> Thanks
>


Re: Can the same flink history server support multiple flink application clusters?

2021-08-19 Thread Chenyu Zheng
The History Server's API also distinguishes jobs by jobid:

  *   /config
  *   /jobs/overview
  *   /jobs/<jobid>
  *   /jobs/<jobid>/vertices
  *   /jobs/<jobid>/config
  *   /jobs/<jobid>/exceptions
  *   /jobs/<jobid>/accumulators
  *   /jobs/<jobid>/vertices/<vertexid>
  *   /jobs/<jobid>/vertices/<vertexid>/subtasktimes
  *   /jobs/<jobid>/vertices/<vertexid>/taskmanagers
  *   /jobs/<jobid>/vertices/<vertexid>/accumulators
  *   /jobs/<jobid>/vertices/<vertexid>/subtasks/accumulators
  *   /jobs/<jobid>/vertices/<vertexid>/subtasks/<subtasknum>
  *   /jobs/<jobid>/vertices/<vertexid>/subtasks/<subtasknum>/attempts/<attempt>
  *   /jobs/<jobid>/vertices/<vertexid>/subtasks/<subtasknum>/attempts/<attempt>/accumulators
  *   /jobs/<jobid>/plan
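
A minimal sketch of querying those endpoints, assuming a History Server on its
default port 8082 (the job id below is a placeholder):

# list all archived jobs, then fetch one job's exceptions by its id
curl http://localhost:8082/jobs/overview
curl http://localhost:8082/jobs/8a9c5c782a2c12c6787f3ad4a974daa8/exceptions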


From: Chenyu Zheng 
Reply-To: "user-zh@flink.apache.org" 
Date: Friday, August 20, 2021 at 11:43 AM
To: "user-zh@flink.apache.org" 
Subject: Can the same flink history server support multiple flink application clusters?

Hello,

We currently run jobs on k8s in flink application mode, and now want to deploy a history
server to make debugging easier. According to the docs, however, the flink historyserver
seems to only support different jobs within a single cluster; with multiple clusters,
identical jobIDs will cause errors.

For multiple application clusters, what is the best practice for using the history server?

Thanks


Can the same flink history server support multiple flink application clusters?

2021-08-19 Thread Chenyu Zheng
Hello,

We currently run jobs on k8s in flink application mode, and now want to deploy a history
server to make debugging easier. According to the docs, however, the flink historyserver
seems to only support different jobs within a single cluster; with multiple clusters,
identical jobIDs will cause errors.

For multiple application clusters, what is the best practice for using the history server?

Thanks


Re: Can the History Server view aggregated TaskManager logs?

2021-05-07 Thread Yang Wang
Flink's history server is currently not integrated with the Yarn NM's log
aggregation, so once a job has finished you can only view the webui and the
exceptions; there is no way to view the logs.

Best,
Yang
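
A minimal sketch of the usual fallback, assuming YARN log aggregation is
enabled (the application id is a placeholder):

# fetch the aggregated JM/TM logs for a finished job directly from YARN
yarn logs -applicationId application_1620000000000_0001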

On Fri, May 7, 2021 at 1:57 PM lhuiseu wrote:

> Hi:
> flink 1.12.0
> on yarn mode
> Jobs that have finished can be found in the history server, but viewing the TaskManager
> Log through the WebUI returns a 404. Does the Flink History Server currently not support
> viewing aggregated TaskManager logs? I'd appreciate help from anyone familiar with how
> the history server works.
> Many thanks.
>
> <http://apache-flink.147419.n8.nabble.com/file/t1254/file.png>
>
>
>
>


How can I run the Flink History Server on native kubernetes?

2021-05-07 Thread casel.chen
How can I run the Flink History Server on native kubernetes? Is there any corresponding documentation?
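
A minimal sketch of one way to do it, assuming the History Server runs as its
own long-lived deployment from the stock Flink image (the s3 path is a
placeholder):

# flink-conf.yaml inside the history server pod:
#   historyserver.archive.fs.dir: s3://flink/completed-jobs/
#   historyserver.archive.fs.refresh-interval: 10000
#   historyserver.web.port: 8082
# start in the foreground so the container does not exit
./bin/historyserver.sh start-foreground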

Can the History Server view aggregated TaskManager logs?

2021-05-06 Thread lhuiseu
Hi:
flink 1.12.0
on yarn mode
Jobs that have finished can be found in the history server, but viewing the TaskManager
Log through the WebUI returns a 404. Does the Flink History Server currently not support
viewing aggregated TaskManager logs? I'd appreciate help from anyone familiar with how
the history server works.
Many thanks.

<http://apache-flink.147419.n8.nabble.com/file/t1254/file.png> 





Re: Native kubernetes execution and History server

2021-03-25 Thread Yang Wang
Thanks Guowei for the comments and Lukáš Drbal for sharing the feedback.

This is the case not only for Kubernetes application mode, but also for Yarn
application and standalone application mode:
the job id will be set to ZERO if it is not configured explicitly in HA mode.

For a standalone application, we could use "--job-id" to specify a
user-defined job id.

However, for Yarn and Kubernetes applications, we do not have a public
config option for this.
Considering that we might support multiple jobs in a single Flink application
with HA enabled in the future,
introducing such a public config option may be inopportune.


Best,
Yang
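
A minimal sketch of that standalone route (the entry class and the
32-hex-character job id are placeholders):

# standalone application mode lets the job id be pinned explicitly
./bin/standalone-job.sh start \
  --job-classname com.example.MyBatchJob \
  --job-id 8a9c5c782a2c12c6787f3ad4a974daa8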

On Thu, Mar 25, 2021 at 7:09 PM Lukáš Drbal wrote:

> Hello Guowei,
>
> I just checked it and it works!
>
> Thanks a lot!
>
> Here is a workaround which uses a UUID as the jobId:
> -D\$internal.pipeline.job-id=$(cat /proc/sys/kernel/random/uuid|tr -d "-")
>
>
> L.
>
> On Thu, Mar 25, 2021 at 11:01 AM Guowei Ma  wrote:
>
>> Hi,
>> Thanks for providing the logs. From the logs this is a known bug.[1]
>> Maybe you could use `$internal.pipeline.job-id` to set your own
>> job-id.(Thanks to Wang Yang)
>> But keep in mind this is only for internal use and may be changed in
>> some release. So you should keep an eye on [1] for the correct solution.
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-19358
>>
>> Best,
>> Guowei
>>
>>
>> On Thu, Mar 25, 2021 at 5:31 PM Lukáš Drbal 
>> wrote:
>>
>>> Hello,
>>>
>>> sure. Here is log from first run which succeed -
>>> https://pastebin.com/tV75ZS5S
>>> and here is from second run (it's same for all next) -
>>> https://pastebin.com/pwTFyGvE
>>>
>>> My Docker file is pretty simple, just take wordcount + S3
>>>
>>> FROM flink:1.12.2
>>>
>>> RUN mkdir -p $FLINK_HOME/usrlib
>>> COPY flink-examples-batch_2.12-1.12.2-WordCount.jar
>>>  $FLINK_HOME/usrlib/wordcount.jar
>>>
>>> RUN mkdir -p ${FLINK_HOME}/plugins/s3-fs-presto
>>> COPY flink-s3-fs-presto-1.12.2.jar $FLINK_HOME/plugins/s3-fs-presto/
>>>
>>> Thanks!
>>>
>>> On Thu, Mar 25, 2021 at 9:24 AM Guowei Ma  wrote:
>>>
>>>> Hi,
>>>> After some discussion with Wang Yang offline, it seems that there might
>>>> be a jobmanager failover. So would you like to share full jobmanager log?
>>>> Best,
>>>> Guowei
>>>>
>>>>
>>>> On Wed, Mar 24, 2021 at 10:04 PM Lukáš Drbal 
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I would like to use native kubernetes execution [1] for one batch job
>>>>> and let scheduling on kubernetes. Flink version: 1.12.2.
>>>>>
>>>>> Kubernetes job:
>>>>> apiVersion: batch/v1beta1
>>>>> kind: CronJob
>>>>> metadata:
>>>>>   name: scheduled-job
>>>>> spec:
>>>>>   schedule: "*/1 * * * *"
>>>>>   jobTemplate:
>>>>> spec:
>>>>>   template:
>>>>> metadata:
>>>>>   labels:
>>>>> app: super-flink-batch-job
>>>>> spec:
>>>>>   containers:
>>>>>   - name: runner
>>>>> image: localhost:5000/batch-flink-app-v3:latest
>>>>> imagePullPolicy: Always
>>>>> command:
>>>>>   - /bin/sh
>>>>>   - -c
>>>>>   - /opt/flink/bin/flink run-application --target
>>>>> kubernetes-application -Dkubernetes.service-account=flink-service-account
>>>>> -Dkubernetes.rest-service.exposed.type=NodePort
>>>>> -Dkubernetes.cluster-id=batch-job-cluster
>>>>> -Dkubernetes.container.image=localhost:5000/batch-flink-app-v3:latest
>>>>> -Ds3.endpoint=http://minio-1616518256:9000 -Ds3.access-key=ACCESSKEY
>>>>> -Ds3.secret-key=SECRETKEY
>>>>> -Djobmanager.archive.fs.dir=s3://flink/completed-jobs/
>>>>> -Ds3.path-style-access=true -Ds3.ssl.enabled=false
>>>>> -Dhigh-availability=org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
>>>>> -Dhigh-availability.storageDir=s3://flink/flink-ha
>>>>> local:///opt/flink/usrlib/job.jar
>>>>>   restartPolicy: OnFailure
>>>>>
>>>>>

Re: Native kubernetes execution and History server

2021-03-25 Thread Lukáš Drbal
Hello Guowei,

I just checked it and it works!

Thanks a lot!

Here is a workaround which uses a UUID as the jobId:
-D\$internal.pipeline.job-id=$(cat /proc/sys/kernel/random/uuid|tr -d "-")


L.

On Thu, Mar 25, 2021 at 11:01 AM Guowei Ma  wrote:

> Hi,
> Thanks for providing the logs. From the logs this is a known bug.[1]
> Maybe you could use `$internal.pipeline.job-id` to set your own
> job-id.(Thanks to Wang Yang)
> But keep in mind this is only for internal use and may be changed in
> some release. So you should keep an eye on [1] for the correct solution.
>
> [1] https://issues.apache.org/jira/browse/FLINK-19358
>
> Best,
> Guowei
>
>
> On Thu, Mar 25, 2021 at 5:31 PM Lukáš Drbal  wrote:
>
>> Hello,
>>
>> sure. Here is log from first run which succeed -
>> https://pastebin.com/tV75ZS5S
>> and here is from second run (it's same for all next) -
>> https://pastebin.com/pwTFyGvE
>>
>> My Docker file is pretty simple, just take wordcount + S3
>>
>> FROM flink:1.12.2
>>
>> RUN mkdir -p $FLINK_HOME/usrlib
>> COPY flink-examples-batch_2.12-1.12.2-WordCount.jar
>>  $FLINK_HOME/usrlib/wordcount.jar
>>
>> RUN mkdir -p ${FLINK_HOME}/plugins/s3-fs-presto
>> COPY flink-s3-fs-presto-1.12.2.jar $FLINK_HOME/plugins/s3-fs-presto/
>>
>> Thanks!
>>
>> On Thu, Mar 25, 2021 at 9:24 AM Guowei Ma  wrote:
>>
>>> Hi,
>>> After some discussion with Wang Yang offline, it seems that there might
>>> be a jobmanager failover. So would you like to share full jobmanager log?
>>> Best,
>>> Guowei
>>>
>>>
>>> On Wed, Mar 24, 2021 at 10:04 PM Lukáš Drbal 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I would like to use native kubernetes execution [1] for one batch job
>>>> and let scheduling on kubernetes. Flink version: 1.12.2.
>>>>
>>>> Kubernetes job:
>>>> apiVersion: batch/v1beta1
>>>> kind: CronJob
>>>> metadata:
>>>>   name: scheduled-job
>>>> spec:
>>>>   schedule: "*/1 * * * *"
>>>>   jobTemplate:
>>>> spec:
>>>>   template:
>>>> metadata:
>>>>   labels:
>>>> app: super-flink-batch-job
>>>> spec:
>>>>   containers:
>>>>   - name: runner
>>>> image: localhost:5000/batch-flink-app-v3:latest
>>>> imagePullPolicy: Always
>>>> command:
>>>>   - /bin/sh
>>>>   - -c
>>>>   - /opt/flink/bin/flink run-application --target
>>>> kubernetes-application -Dkubernetes.service-account=flink-service-account
>>>> -Dkubernetes.rest-service.exposed.type=NodePort
>>>> -Dkubernetes.cluster-id=batch-job-cluster
>>>> -Dkubernetes.container.image=localhost:5000/batch-flink-app-v3:latest
>>>> -Ds3.endpoint=http://minio-1616518256:9000 -Ds3.access-key=ACCESSKEY
>>>> -Ds3.secret-key=SECRETKEY
>>>> -Djobmanager.archive.fs.dir=s3://flink/completed-jobs/
>>>> -Ds3.path-style-access=true -Ds3.ssl.enabled=false
>>>> -Dhigh-availability=org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
>>>> -Dhigh-availability.storageDir=s3://flink/flink-ha
>>>> local:///opt/flink/usrlib/job.jar
>>>>   restartPolicy: OnFailure
>>>>
>>>>
>>>> This works well for me but I would like to write the result to the
>>>> archive path and show it in the History server (running as separate
>>>> deployment in k8)
>>>>
>>>>> Anytime it creates JobId=00000000000000000000000000000000 which
>>>>> obviously leads to
>>>>
>>>> Caused by: java.util.concurrent.ExecutionException:
>>>> org.apache.flink.runtime.client.DuplicateJobSubmissionException: Job has
>>>> already been submitted.
>>>> at
>>>> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>>>> ~[?:1.8.0_282]
>>>> at
>>>> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>>>> ~[?:1.8.0_282]
>>>> at
>>>> org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1056)
>>>> ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>>>> at
>>>> org.apache.flink.client.program.ContextEnvironment.executeAsync(ContextEnvironment.java:129)

Re: Native kubernetes execution and History server

2021-03-25 Thread Guowei Ma
Hi,
Thanks for providing the logs. From the logs, this is a known bug [1].
Maybe you could use `$internal.pipeline.job-id` to set your own
job-id. (Thanks to Wang Yang.)
But keep in mind that this is only for internal use and may be changed in
some release, so you should keep an eye on [1] for the correct solution.

[1] https://issues.apache.org/jira/browse/FLINK-19358

Best,
Guowei


On Thu, Mar 25, 2021 at 5:31 PM Lukáš Drbal  wrote:

> Hello,
>
> sure. Here is log from first run which succeed -
> https://pastebin.com/tV75ZS5S
> and here is from second run (it's same for all next) -
> https://pastebin.com/pwTFyGvE
>
> My Docker file is pretty simple, just take wordcount + S3
>
> FROM flink:1.12.2
>
> RUN mkdir -p $FLINK_HOME/usrlib
> COPY flink-examples-batch_2.12-1.12.2-WordCount.jar
>  $FLINK_HOME/usrlib/wordcount.jar
>
> RUN mkdir -p ${FLINK_HOME}/plugins/s3-fs-presto
> COPY flink-s3-fs-presto-1.12.2.jar $FLINK_HOME/plugins/s3-fs-presto/
>
> Thanks!
>
> On Thu, Mar 25, 2021 at 9:24 AM Guowei Ma  wrote:
>
>> Hi,
>> After some discussion with Wang Yang offline, it seems that there might
>> be a jobmanager failover. So would you like to share full jobmanager log?
>> Best,
>> Guowei
>>
>>
>> On Wed, Mar 24, 2021 at 10:04 PM Lukáš Drbal 
>> wrote:
>>
>>> Hi,
>>>
>>> I would like to use native kubernetes execution [1] for one batch job
>>> and let scheduling on kubernetes. Flink version: 1.12.2.
>>>
>>> Kubernetes job:
>>> apiVersion: batch/v1beta1
>>> kind: CronJob
>>> metadata:
>>>   name: scheduled-job
>>> spec:
>>>   schedule: "*/1 * * * *"
>>>   jobTemplate:
>>> spec:
>>>   template:
>>> metadata:
>>>   labels:
>>> app: super-flink-batch-job
>>> spec:
>>>   containers:
>>>   - name: runner
>>> image: localhost:5000/batch-flink-app-v3:latest
>>> imagePullPolicy: Always
>>> command:
>>>   - /bin/sh
>>>   - -c
>>>   - /opt/flink/bin/flink run-application --target
>>> kubernetes-application -Dkubernetes.service-account=flink-service-account
>>> -Dkubernetes.rest-service.exposed.type=NodePort
>>> -Dkubernetes.cluster-id=batch-job-cluster
>>> -Dkubernetes.container.image=localhost:5000/batch-flink-app-v3:latest
>>> -Ds3.endpoint=http://minio-1616518256:9000 -Ds3.access-key=ACCESSKEY
>>> -Ds3.secret-key=SECRETKEY
>>> -Djobmanager.archive.fs.dir=s3://flink/completed-jobs/
>>> -Ds3.path-style-access=true -Ds3.ssl.enabled=false
>>> -Dhigh-availability=org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
>>> -Dhigh-availability.storageDir=s3://flink/flink-ha
>>> local:///opt/flink/usrlib/job.jar
>>>   restartPolicy: OnFailure
>>>
>>>
>>> This works well for me but I would like to write the result to the
>>> archive path and show it in the History server (running as separate
>>> deployment in k8)
>>>
>>> Anytime it creates JobId=00000000000000000000000000000000 which
>>> obviously leads to
>>>
>>> Caused by: java.util.concurrent.ExecutionException:
>>> org.apache.flink.runtime.client.DuplicateJobSubmissionException: Job has
>>> already been submitted.
>>> at
>>> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>>> ~[?:1.8.0_282]
>>> at
>>> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>>> ~[?:1.8.0_282]
>>> at
>>> org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1056)
>>> ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>>> at
>>> org.apache.flink.client.program.ContextEnvironment.executeAsync(ContextEnvironment.java:129)
>>> ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>>> at
>>> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:70)
>>> ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>>> at
>>> org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:942)
>>> ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>>> at org.apache.flink.api.java.DataSet.collect(DataSet.java:417)
>>> ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>>> at org.apache.flink.api.java.DataSet.print(DataSet.java:1748)
>>> ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>>> at
>>> org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:96)

Re: Native kubernetes execution and History server

2021-03-25 Thread Lukáš Drbal
Hello,

Sure. Here is the log from the first run, which succeeded:
https://pastebin.com/tV75ZS5S
and here is the log from the second run (it's the same for all subsequent runs):
https://pastebin.com/pwTFyGvE

My Dockerfile is pretty simple; it just adds wordcount + S3:

FROM flink:1.12.2

RUN mkdir -p $FLINK_HOME/usrlib
COPY flink-examples-batch_2.12-1.12.2-WordCount.jar
 $FLINK_HOME/usrlib/wordcount.jar

RUN mkdir -p ${FLINK_HOME}/plugins/s3-fs-presto
COPY flink-s3-fs-presto-1.12.2.jar $FLINK_HOME/plugins/s3-fs-presto/

Thanks!

On Thu, Mar 25, 2021 at 9:24 AM Guowei Ma  wrote:

> Hi,
> After some discussion with Wang Yang offline, it seems that there might be
> a jobmanager failover. So would you like to share full jobmanager log?
> Best,
> Guowei
>
>
> On Wed, Mar 24, 2021 at 10:04 PM Lukáš Drbal 
> wrote:
>
>> Hi,
>>
>> I would like to use native kubernetes execution [1] for one batch job and
>> let scheduling on kubernetes. Flink version: 1.12.2.
>>
>> Kubernetes job:
>> apiVersion: batch/v1beta1
>> kind: CronJob
>> metadata:
>>   name: scheduled-job
>> spec:
>>   schedule: "*/1 * * * *"
>>   jobTemplate:
>> spec:
>>   template:
>> metadata:
>>   labels:
>> app: super-flink-batch-job
>> spec:
>>   containers:
>>   - name: runner
>> image: localhost:5000/batch-flink-app-v3:latest
>> imagePullPolicy: Always
>> command:
>>   - /bin/sh
>>   - -c
>>   - /opt/flink/bin/flink run-application --target
>> kubernetes-application -Dkubernetes.service-account=flink-service-account
>> -Dkubernetes.rest-service.exposed.type=NodePort
>> -Dkubernetes.cluster-id=batch-job-cluster
>> -Dkubernetes.container.image=localhost:5000/batch-flink-app-v3:latest
>> -Ds3.endpoint=http://minio-1616518256:9000 -Ds3.access-key=ACCESSKEY
>> -Ds3.secret-key=SECRETKEY
>> -Djobmanager.archive.fs.dir=s3://flink/completed-jobs/
>> -Ds3.path-style-access=true -Ds3.ssl.enabled=false
>> -Dhigh-availability=org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
>> -Dhigh-availability.storageDir=s3://flink/flink-ha
>> local:///opt/flink/usrlib/job.jar
>>   restartPolicy: OnFailure
>>
>>
>> This works well for me but I would like to write the result to the
>> archive path and show it in the History server (running as separate
>> deployment in k8)
>>
>> Anytime it creates JobId=00000000000000000000000000000000 which obviously
>> leads to
>>
>> Caused by: java.util.concurrent.ExecutionException:
>> org.apache.flink.runtime.client.DuplicateJobSubmissionException: Job has
>> already been submitted.
>> at
>> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
>> ~[?:1.8.0_282]
>> at
>> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
>> ~[?:1.8.0_282]
>> at
>> org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1056)
>> ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>> at
>> org.apache.flink.client.program.ContextEnvironment.executeAsync(ContextEnvironment.java:129)
>> ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>> at
>> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:70)
>> ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>> at
>> org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:942)
>> ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>> at org.apache.flink.api.java.DataSet.collect(DataSet.java:417)
>> ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>> at org.apache.flink.api.java.DataSet.print(DataSet.java:1748)
>> ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>> at
>> org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:96)
>> ~[?:?]
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> ~[?:1.8.0_282]
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> ~[?:1.8.0_282]
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> ~[?:1.8.0_282]
>> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_282]
>> at
>> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:349)
>> ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>> at
>> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:219)
>> ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>> at
>> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
>> ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>> at
>> org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:242)
>> ~[flink-dist_2.12-1.12.2.jar:1.12.2]
>> ... 10 more
>>
>> I assume it is because it will spawn a completely new cluster for each
>> run.
>>
>> Can I somehow set the jobId, or am I trying to do something unsupported/bad?
>>
>> Thanks for advice.
>>
>> L.
>>
>> [1]
>> https://ci.apache.org/projects/flink/flink-docs-stable/deployment/resource-providers/native_kubernetes.html
>>
>


Re: Native kubernetes execution and History server

2021-03-25 Thread Guowei Ma
Hi,
After some discussion with Wang Yang offline, it seems that there might be
a jobmanager failover. So would you like to share full jobmanager log?
Best,
Guowei


On Wed, Mar 24, 2021 at 10:04 PM Lukáš Drbal  wrote:

> Hi,
>
> I would like to use native kubernetes execution [1] for one batch job and
> let scheduling on kubernetes. Flink version: 1.12.2.
>
> Kubernetes job:
> apiVersion: batch/v1beta1
> kind: CronJob
> metadata:
>   name: scheduled-job
> spec:
>   schedule: "*/1 * * * *"
>   jobTemplate:
> spec:
>   template:
> metadata:
>   labels:
> app: super-flink-batch-job
> spec:
>   containers:
>   - name: runner
> image: localhost:5000/batch-flink-app-v3:latest
> imagePullPolicy: Always
> command:
>   - /bin/sh
>   - -c
>   - /opt/flink/bin/flink run-application --target
> kubernetes-application -Dkubernetes.service-account=flink-service-account
> -Dkubernetes.rest-service.exposed.type=NodePort
> -Dkubernetes.cluster-id=batch-job-cluster
> -Dkubernetes.container.image=localhost:5000/batch-flink-app-v3:latest
> -Ds3.endpoint=http://minio-1616518256:9000 -Ds3.access-key=ACCESSKEY
> -Ds3.secret-key=SECRETKEY
> -Djobmanager.archive.fs.dir=s3://flink/completed-jobs/
> -Ds3.path-style-access=true -Ds3.ssl.enabled=false
> -Dhigh-availability=org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
> -Dhigh-availability.storageDir=s3://flink/flink-ha
> local:///opt/flink/usrlib/job.jar
>   restartPolicy: OnFailure
>
>
> This works well for me but I would like to write the result to the archive
> path and show it in the History server (running as separate deployment in
> k8)
>
> Anytime it creates JobId=00000000000000000000000000000000 which obviously
> leads to
>
> Caused by: java.util.concurrent.ExecutionException:
> org.apache.flink.runtime.client.DuplicateJobSubmissionException: Job has
> already been submitted.
> at
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
> ~[?:1.8.0_282]
> at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> ~[?:1.8.0_282]
> at
> org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1056)
> ~[flink-dist_2.12-1.12.2.jar:1.12.2]
> at
> org.apache.flink.client.program.ContextEnvironment.executeAsync(ContextEnvironment.java:129)
> ~[flink-dist_2.12-1.12.2.jar:1.12.2]
> at
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:70)
> ~[flink-dist_2.12-1.12.2.jar:1.12.2]
> at
> org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:942)
> ~[flink-dist_2.12-1.12.2.jar:1.12.2]
> at org.apache.flink.api.java.DataSet.collect(DataSet.java:417)
> ~[flink-dist_2.12-1.12.2.jar:1.12.2]
> at org.apache.flink.api.java.DataSet.print(DataSet.java:1748)
> ~[flink-dist_2.12-1.12.2.jar:1.12.2]
> at
> org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:96)
> ~[?:?]
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> ~[?:1.8.0_282]
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> ~[?:1.8.0_282]
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> ~[?:1.8.0_282]
> at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_282]
> at
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:349)
> ~[flink-dist_2.12-1.12.2.jar:1.12.2]
> at
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:219)
> ~[flink-dist_2.12-1.12.2.jar:1.12.2]
> at
> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
> ~[flink-dist_2.12-1.12.2.jar:1.12.2]
> at
> org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:242)
> ~[flink-dist_2.12-1.12.2.jar:1.12.2]
> ... 10 more
>
> I assume it is because it will spawn a completely new cluster for each run.
>
> Can I somehow set the jobId, or am I trying to do something unsupported/bad?
>
> Thanks for advice.
>
> L.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-stable/deployment/resource-providers/native_kubernetes.html
>


Native kubernetes execution and History server

2021-03-24 Thread Lukáš Drbal
Hi,

I would like to use native kubernetes execution [1] for one batch job and
let kubernetes handle the scheduling. Flink version: 1.12.2.

Kubernetes job:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: scheduled-job
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
spec:
  template:
metadata:
  labels:
app: super-flink-batch-job
spec:
  containers:
  - name: runner
image: localhost:5000/batch-flink-app-v3:latest
imagePullPolicy: Always
command:
  - /bin/sh
  - -c
  - /opt/flink/bin/flink run-application --target
kubernetes-application -Dkubernetes.service-account=flink-service-account
-Dkubernetes.rest-service.exposed.type=NodePort
-Dkubernetes.cluster-id=batch-job-cluster
-Dkubernetes.container.image=localhost:5000/batch-flink-app-v3:latest
-Ds3.endpoint=http://minio-1616518256:9000 -Ds3.access-key=ACCESSKEY
-Ds3.secret-key=SECRETKEY
-Djobmanager.archive.fs.dir=s3://flink/completed-jobs/
-Ds3.path-style-access=true -Ds3.ssl.enabled=false
-Dhigh-availability=org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
-Dhigh-availability.storageDir=s3://flink/flink-ha
local:///opt/flink/usrlib/job.jar
  restartPolicy: OnFailure


This works well for me, but I would like to write the result to the archive
path and show it in the History Server (running as a separate deployment in
k8s).

Anytime it creates JobId=00000000000000000000000000000000 which obviously
leads to

Caused by: java.util.concurrent.ExecutionException:
org.apache.flink.runtime.client.DuplicateJobSubmissionException: Job has
already been submitted.
at
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
~[?:1.8.0_282]
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
~[?:1.8.0_282]
at
org.apache.flink.api.java.ExecutionEnvironment.executeAsync(ExecutionEnvironment.java:1056)
~[flink-dist_2.12-1.12.2.jar:1.12.2]
at
org.apache.flink.client.program.ContextEnvironment.executeAsync(ContextEnvironment.java:129)
~[flink-dist_2.12-1.12.2.jar:1.12.2]
at
org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:70)
~[flink-dist_2.12-1.12.2.jar:1.12.2]
at
org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:942)
~[flink-dist_2.12-1.12.2.jar:1.12.2]
at org.apache.flink.api.java.DataSet.collect(DataSet.java:417)
~[flink-dist_2.12-1.12.2.jar:1.12.2]
at org.apache.flink.api.java.DataSet.print(DataSet.java:1748)
~[flink-dist_2.12-1.12.2.jar:1.12.2]
at
org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:96)
~[?:?]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
~[?:1.8.0_282]
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
~[?:1.8.0_282]
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
~[?:1.8.0_282]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_282]
at
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:349)
~[flink-dist_2.12-1.12.2.jar:1.12.2]
at
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:219)
~[flink-dist_2.12-1.12.2.jar:1.12.2]
at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
~[flink-dist_2.12-1.12.2.jar:1.12.2]
at
org.apache.flink.client.deployment.application.ApplicationDispatcherBootstrap.runApplicationEntryPoint(ApplicationDispatcherBootstrap.java:242)
~[flink-dist_2.12-1.12.2.jar:1.12.2]
... 10 more

I assume it is because it will spawn a completely new cluster for each run.

Can I somehow set the jobId, or am I trying to do something unsupported/bad?

Thanks for any advice.

L.

[1]
https://ci.apache.org/projects/flink/flink-docs-stable/deployment/resource-providers/native_kubernetes.html


Re: Flink History server ( running jobs )

2021-03-19 Thread Vishal Santoshi
Thank you for the confirmation.

On Fri, Mar 19, 2021 at 5:37 AM Matthias Pohl 
wrote:

> Hi Vishal,
> yes, as the documentation explains [1]: Only jobs that reached a globally
> terminal state are archived into Flink's history server. State information
> about running jobs can be retrieved through Flink's REST API.
>
> Best,
> Matthias
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/advanced/historyserver.html#overview
>
> On Wed, Mar 17, 2021 at 10:33 PM Vishal Santoshi <
> vishal.santo...@gmail.com> wrote:
>
>> Hello folks,
>>
>> Does the flink history server not provide for running jobs (like the spark
>> history server does)?
>>
>> Regards.
>>
>


Re: Flink History server ( running jobs )

2021-03-19 Thread Matthias Pohl
Hi Vishal,
yes, as the documentation explains [1]: Only jobs that reached a globally
terminal state are archived into Flink's history server. State information
about running jobs can be retrieved through Flink's REST API.

Best,
Matthias

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/advanced/historyserver.html#overview
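
A minimal sketch of that REST route, assuming the JobManager REST endpoint is
reachable on localhost:8081 (the job id is a placeholder):

# running jobs come from the cluster's own REST API, not the history server
curl http://localhost:8081/jobs/overview
curl http://localhost:8081/jobs/8a9c5c782a2c12c6787f3ad4a974daa8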

On Wed, Mar 17, 2021 at 10:33 PM Vishal Santoshi 
wrote:

> Hello folks,
>
> Does the flink history server not provide for running jobs (like the spark
> history server does)?
>
> Regards.
>


Flink History server ( running jobs )

2021-03-17 Thread Vishal Santoshi
Hello folks,

Does the flink history server not provide for running jobs (like the spark
history server does)?

Regards.


Re: Flink 1.10 history server cannot monitor FlinkSQL jobs

2020-10-23 Thread Robin Zhang
Hi yujianbo,

As long as the job has terminated, whether it was cancelled, failed, or killed, it will be
shown in the history server. You can first check whether the configured directory on hdfs
contains the folders for your jobs; you can also try restarting the history server. May I
ask which API your job is written with, and also the Flink version and the submission mode?
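
A minimal sketch of those two checks, assuming the archive directory configured
via jobmanager.archive.fs.dir (the hdfs path is a placeholder):

# every terminated job should leave one file named after its job id
hdfs dfs -ls hdfs:///flink/completed-jobs/
# restart the history server so it re-scans the directory
./bin/historyserver.sh stop && ./bin/historyserver.sh start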





yujianbo wrote
> I found that after finishing the configuration I can only see completed jobs in the
> history server; failed ones are not visible. What I'm now unsure about is whether
> failed jobs can appear in the history server at all.






Re: Flink 1.10 history server cannot monitor FlinkSQL jobs

2020-10-23 Thread Robin Zhang
Hi zhisheng,

1. I have tested both the default 10s refresh interval and 5s; in practice the reaction
time is rather long, reaching minute level, so I suspect tuning this parameter doesn't
achieve much.
2. The page actually provides a Running Job List, so in theory running jobs could be
displayed; if they can't be, even though the web UI uses the same front-end code, that
feels a bit half-baked. In practice you can currently only view some job statistics,
such as checkpoint- and DAG-related information.

We submit with Flink on yarn in per-job mode and the yarn JobHistoryServer is already
running, which should have no impact; monitoring works fine for jobs from every API
except SQL.

On the questions you raised: 1. We are still in a testing phase, not yet in production.
Following the yarn-session running-job-list display mode, the official page has no
pagination, so you would have to patch the source code yourself.
Question 2: version 1.10 is not very friendly for displaying logs; 1.11 can display
rolling files. As for how to fetch the jm and tm logs, given how little the official
docs cover, we haven't solved it yet; for now I still rely on yarn's job history server
and log aggregation for bug analysis. I'll follow up here if there is progress; new
findings are welcome.

Best,
Robin



zhisheng wrote
> Hi Robin:
> 
> 1. Did you change the refresh interval? Does it never show up?
> 
> 2. Running jobs are not displayed; you can check them directly in yarn. The history
> server, as far as I know, only displays jobs that have terminated.
> 
> PS: a few more history server questions:
> 
> 1. Could the listing of terminated jobs support pagination? Currently all historical
> jobs are shown on one page, which is very laggy to open.
> 
> 2. Is there a way to view the jm and tm logs of terminated jobs? HDFS actually has
> the logs, so in principle the log information could be fetched, parsed and displayed;
> the Spark history server can also show the logs of terminated jobs.
> 
> 
> Best!
> zhisheng
> 
> On Thu, Oct 22, 2020 at 6:11 PM Robin Zhang wrote:
> 
>>
>> As shown in the image below, with Flink 1.10 on yarn in per-job submission mode,
>> applications developed with the java datastream and table APIs have their statistics
>> pulled by the jm normally, but SQL-based jobs cannot be monitored by the history server.
>> The SQL used is not exactly the official examples, but it is converted to a datastream
>> and submitted to yarn in per-job mode, with just an extra SQL-parsing step.
>>
>> I cannot understand why the history server did not load the job information into the
>> target directory on hdfs. Both the jobmanager log and the configuration confirm that
>> the jm picked up the history-server-related settings.
>>
>> <
>> http://apache-flink.147419.n8.nabble.com/file/t447/%E5%8E%86%E5%8F%B2%E6%9C%8D%E5%8A%A1%E5%99%A8.png>
>>
>>
>>
>>
>>
>>






Re: Flink 1.10 history server cannot monitor FlinkSQL jobs

2020-10-23 Thread yujianbo
I found that after finishing the configuration I can only see completed jobs in the
history server; failed ones are not visible. What I'm now unsure about is whether failed
jobs can appear in the history server at all.




Re: Flink 1.10 history server cannot monitor FlinkSQL jobs

2020-10-22 Thread zhisheng
Hi Robin:

1. Did you change the refresh interval? Does it never show up?

2. Running jobs are not displayed; you can check them directly in yarn. The history
server, as far as I know, only displays jobs that have terminated.

PS: a few more history server questions:

1. Could the listing of terminated jobs support pagination? Currently all historical
jobs are shown on one page, which is very laggy to open.

2. Is there a way to view the jm and tm logs of terminated jobs? HDFS actually has
the logs, so in principle the log information could be fetched, parsed and displayed;
the Spark history server can also show the logs of terminated jobs.


Best!
zhisheng

On Thu, Oct 22, 2020 at 6:11 PM Robin Zhang wrote:

>
> As shown in the image below, with Flink 1.10 on yarn in per-job submission mode,
> applications developed with the java datastream and table APIs have their statistics
> pulled by the jm normally, but SQL-based jobs cannot be monitored by the history server.
> The SQL used is not exactly the official examples, but it is converted to a datastream
> and submitted to yarn in per-job mode, with just an extra SQL-parsing step.
>
> I cannot understand why the history server did not load the job information into the
> target directory on hdfs. Both the jobmanager log and the configuration confirm that
> the jm picked up the history-server-related settings.
>
> <
> http://apache-flink.147419.n8.nabble.com/file/t447/%E5%8E%86%E5%8F%B2%E6%9C%8D%E5%8A%A1%E5%99%A8.png>
>
>
>
>
>
>


Flink 1.10 history server cannot monitor FlinkSQL jobs

2020-10-22 Thread Robin Zhang

As shown in the image below, with Flink 1.10 on yarn in per-job submission mode,
applications developed with the java datastream and table APIs have their statistics
pulled by the jm normally, but SQL-based jobs cannot be monitored by the history server.
The SQL used is not exactly the official examples, but it is converted to a datastream
and submitted to yarn in per-job mode, with just an extra SQL-parsing step.

I cannot understand why the history server did not load the job information into the
target directory on hdfs. Both the jobmanager log and the configuration confirm that
the jm picked up the history-server-related settings.


 






Re: flink 1.11.2 deployed on k8s: how to start the history server

2020-10-10 Thread cxydeve...@163.com
My current solution is similar to what we did in the previous 1.10 version:
.
"containers": [
  {
"name": "jobmanager",
"image": "flink:1.11.2-scala_2.11",
"command": [
  "/bin/bash",
  "-c",
  "/opt/flink/bin/historyserver.sh start;/docker-entrypoint.sh
jobmanager;"
],
"ports": [
.






Re: flink 1.11.2 deployed on k8s: how to start the history server

2020-10-10 Thread Yun Tang
Hi,

You can override the original ENTRYPOINT start command in the yaml file [1],
or follow the changes in FLINK-17167 [2] and modify docker-entrypoint.sh in the original
Dockerfile.


[1]
https://kubernetes.io/zh/docs/tasks/inject-data-application/define-command-argument-container/
[2] https://issues.apache.org/jira/browse/FLINK-17167

Best,
Yun Tang

From: chenxuying 
Sent: Saturday, October 10, 2020 15:56
To: user-zh@flink.apache.org 
Subject: flink 1.11.2 deployed on k8s: how to start the history server

flink 1.11.2 is deployed on k8s; how do I start the history server?
In the previous 1.10 yaml a command could be added, but the 1.11 image goes through
docker-entrypoint.sh, and that entrypoint script doesn't seem to have a corresponding
history server argument.



flink 1.11.2 deployed on k8s: how to start the history server

2020-10-10 Thread chenxuying
flink 1.11.2 is deployed on k8s; how do I start the history server?
In the previous 1.10 yaml a command could be added, but the 1.11 image goes through
docker-entrypoint.sh, and that entrypoint script doesn't seem to have a corresponding
history server argument.



Re: History Server Not Showing Any Jobs - File Not Found?

2020-07-12 Thread Chesnay Schepler
Ah, I remembered wrong, my apologies. Unfortunately there is no option 
to prevent the cleanup; it is something I wanted to have for a long time 
though...


On 11/07/2020 17:57, Hailu, Andreas wrote:


Thanks for the clarity. To this point you made:

(Note that by configuring "historyserver.web.tmpdir" to some
permanent directory, subsequent (re)starts of the HistoryServer can
re-use this directory; so you only have to download things once)


The HistoryServer process in fact deletes this local cache during its 
shutdown hook. Is there a setting we can use so that it doesn’t do this?


2020-07-11 11:43:29,527 [HistoryServer shutdown hook] INFO 
HistoryServer - Removing web dashboard root cache directory 
/local/scratch/flink_historyserver_tmpdir


2020-07-11 11:43:29,536 [HistoryServer shutdown hook] INFO 
HistoryServer - Stopped history server.


We’re attempting to work around the UI becoming unresponsive/crashing 
the browser at a large number of archives (in my testing, that’s around 
20,000 archives with Chrome) by persisting the job IDs of our 
submitted apps and then navigating to the job overview page directly, 
e.g. http://(host):(port)/#/job/(jobId)/overview. It would have been 
really great if the server stored archives by the application ID 
rather than the job ID – particularly for apps that potentially submit 
hundreds of jobs. Tracking one application ID (ala Spark) would ease 
the burden on the dev + ops side. Perhaps a feature for the future :)


// ah

From: Chesnay Schepler
Sent: Tuesday, June 2, 2020 3:55 AM
To: Hailu, Andreas [Engineering]; user@flink.apache.org

Subject: Re: History Server Not Showing Any Jobs - File Not Found?

1) It downloads all archives and stores them on disk; the only thing 
stored in memory is the job ID of the archive. There is no hard upper 
limit; it is mostly constrained by disk space / memory. I say mostly, 
because I'm not sure how well the WebUI handles 100k jobs being loaded 
into the overview.


2) No, there is no retention policy. It is currently expected that an 
external process cleans up archives. If an archive was deleted (from 
the archive directory) the HistoryServer does notice that and also 
delete the local copy.


On 01/06/2020 23:05, Hailu, Andreas wrote:

So I created a new HDFS directory with just 1 archive and pointed
the server to monitor that directory, et voila – I’m able to see
the applications in the UI. So it must have been really churning
trying to fetch all of those initial archives :)

I have a couple of follow up questions if you please:

1. What is the upper limit of the number of archives the history
server can support? Does it attempt to download every archive and
load them all into memory?

2. Retention: we have on the order of 100K applications per day in
our production environment. Is there any native retention
policy? E.g. only keep the latest X archives in the dir - or is
this something we need to manage ourselves?

Thanks.

// ah

From: Hailu, Andreas [Engineering]
Sent: Friday, May 29, 2020 8:46 AM
To: 'Chesnay Schepler' <ches...@apache.org>; user@flink.apache.org
Subject: RE: History Server Not Showing Any Jobs - File Not Found?

Yes, these are all in the same directory, and we’re at 67G right
now. I’ll try with incrementally smaller directories and let you
know what I find.

// ah

From: Chesnay Schepler <ches...@apache.org>
Sent: Friday, May 29, 2020 3:11 AM
To: Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

oh I'm not using the HistoryServer; I just wrote it ;)

Are these archives all in the same location? So we're roughly
looking at 5 GB of archives then?

That could indeed "just" be a resource problem. The HistoryServer
eagerly downloads all archives, and not on-demand.

The next step would be to move some of the archives into a
separate HDFS directory and try again.

(Note that by configuring "historyserver.web.tmpdir" to some
permanent directory, subsequent (re)starts of the HistoryServer
can re-use this directory; so you only have to download things once)

On 29/05/2020 00:43, Hailu, Andreas wrote:

May I also ask what version of flink-hadoop you’re using and
the number of jobs you’re storing the history for? As of
writing we have roughly 101,000 application history files. I’m
curious to know if we’re encountering some kind of resource
problem.

// ah

From: Hailu, Andreas [Engineering]
Sent: Thursday, May 28, 2020 12:18 PM
To: 'Chesnay Schepler' <ches...@apache.org>

RE: History Server Not Showing Any Jobs - File Not Found?

2020-07-11 Thread Hailu, Andreas
Thanks for the clarity. To this point you made:
(Note that by configuring "historyserver.web.tmpdir" to some permanent 
directory, subsequent (re)starts of the HistoryServer can re-use this 
directory; so you only have to download things once)

The HistoryServer process in fact deletes this local cache during its shutdown 
hook. Is there a setting we can use so that it doesn't do this?
2020-07-11 11:43:29,527 [HistoryServer shutdown hook] INFO  HistoryServer - 
Removing web dashboard root cache directory 
/local/scratch/flink_historyserver_tmpdir
2020-07-11 11:43:29,536 [HistoryServer shutdown hook] INFO  HistoryServer - 
Stopped history server.

We're attempting to work around the UI becoming unresponsive/crashing the 
browser at a large number of archives (in my testing, that's around 20,000 
archives with Chrome)  by persisting the job IDs of our submitted apps and then 
navigating to the job overview page directly, e.g. 
http://(host):(port)/#/job/(jobId)/overview. It would have been really great if 
the server stored archives by the application ID rather than the job ID - 
particularly for apps that potentially submit hundreds of jobs. Tracking one 
application ID (ala Spark) would ease the burden on the dev + ops side. Perhaps 
a feature for the future :)
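
A minimal sketch of that persistence step, keying off the line the Flink CLI
prints on submission (the jar and output file are placeholders):

# record each submitted job id, then deep-link to http://(host):(port)/#/job/(jobId)/overview
./bin/flink run ./examples/batch/WordCount.jar 2>&1 \
  | grep -o 'Job has been submitted with JobID [0-9a-f]*' \
  | awk '{print $NF}' >> submitted-job-ids.txt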

// ah

From: Chesnay Schepler 
Sent: Tuesday, June 2, 2020 3:55 AM
To: Hailu, Andreas [Engineering] ; 
user@flink.apache.org
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

1) It downloads all archives and stores them on disk; the only thing stored in 
memory is the job ID of the archive. There is no hard upper limit; it is mostly 
constrained by disk space / memory. I say mostly, because I'm not sure how well 
the WebUI handles 100k jobs being loaded into the overview.

2) No, there is no retention policy. It is currently expected that an external 
process cleans up archives. If an archive was deleted (from the archive 
directory) the HistoryServer does notice that and also delete the local copy.

On 01/06/2020 23:05, Hailu, Andreas wrote:
So I created a new HDFS directory with just 1 archive and pointed the server to 
monitor that directory, et voila - I'm able to see the applications in the UI. 
So it must have been really churning trying to fetch all of those initial 
archives :)

I have a couple of follow up questions if you please:

1.   What is the upper limit of the number of archives the history server 
can support? Does it attempt to download every archive and load them all into 
memory?

2.   Retention: we have on the order of 100K applications per day in our 
production environment. Is there any native retention policy? E.g. only keep 
the latest X archives in the dir - or is this something we need to manage 
ourselves?

Thanks.

// ah

From: Hailu, Andreas [Engineering]
Sent: Friday, May 29, 2020 8:46 AM
To: 'Chesnay Schepler' <mailto:ches...@apache.org>; 
user@flink.apache.org<mailto:user@flink.apache.org>
Subject: RE: History Server Not Showing Any Jobs - File Not Found?

Yes, these are all in the same directory, and we're at 67G right now. I'll try 
with incrementally smaller directories and let you know what I find.

// ah

From: Chesnay Schepler mailto:ches...@apache.org>>
Sent: Friday, May 29, 2020 3:11 AM
To: Hailu, Andreas [Engineering] 
mailto:andreas.ha...@ny.email.gs.com>>; 
user@flink.apache.org<mailto:user@flink.apache.org>
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

oh I'm not using the HistoryServer; I just wrote it ;)
Are these archives all in the same location? So we're roughly looking at 5 GB 
of archives then?

That could indeed "just" be a resource problem. The HistoryServer eagerly 
downloads all archives, and not on-demand.
The next step would be to move some of the archives into a separate HDFS 
directory and try again.

(Note that by configuring "historyserver.web.tmpdir" to some permanent 
directory, subsequent (re)starts of the HistoryServer can re-use this 
directory; so you only have to download things once)

On 29/05/2020 00:43, Hailu, Andreas wrote:
May I also ask what version of flink-hadoop you're using and the number of jobs 
you're storing the history for? As of writing we have roughly 101,000 
application history files. I'm curious to know if we're encountering some kind 
of resource problem.

// ah

From: Hailu, Andreas [Engineering]
Sent: Thursday, May 28, 2020 12:18 PM
To: 'Chesnay Schepler' <mailto:ches...@apache.org>; 
user@flink.apache.org<mailto:user@flink.apache.org>
Subject: RE: History Server Not Showing Any Jobs - File Not Found?

Okay, I will look further to see if we're mistakenly using a version that's 
pre-2.6.0. However, I don't see flink-shaded-hadoop in my /lib directory for 
flink-1.9.1.

flink-dist_2.11-1.9.1.jar
flink-table-blink_2.11-1.9.1.jar
flink-table_2.11-1.9.1.jar
log4j-1.2.17.jar
slf4j-log4j12-1.7.15.jar

Are the files within /lib.

// ah


Re: History Server Not Showing Any Jobs - File Not Found?

2020-06-02 Thread Chesnay Schepler
1) It downloads all archives and stores them on disk; the only thing 
stored in memory is the job ID of the archive. There is no hard upper 
limit; it is mostly constrained by disk space / memory. I say mostly, 
because I'm not sure how well the WebUI handles 100k jobs being loaded 
into the overview.


2) No, there is no retention policy. It is currently expected that an 
external process cleans up archives. If an archive was deleted (from the 
archive directory) the HistoryServer does notice that and also delete 
the local copy.
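
A minimal sketch of such an external cleanup, assuming archives sit in HDFS and
the newest 50,000 should be kept (path and count are placeholders):

# run periodically, e.g. from cron: sort archives by modification time,
# drop the newest 50000 from the list, delete the rest
hdfs dfs -ls /completed-jobs | grep '^-' | sort -k6,7 | head -n -50000 \
  | awk '{print $8}' | xargs -r hdfs dfs -rm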


On 01/06/2020 23:05, Hailu, Andreas wrote:


So I created a new HDFS directory with just 1 archive and pointed the 
server to monitor that directory, et voila – I’m able to see the 
applications in the UI. So it must have been really churning trying to 
fetch all of those initial archives :)


I have a couple of follow up questions if you please:

1. What is the upper limit of the number of archives the history server 
can support? Does it attempt to download every archive and load them 
all into memory?


2. Retention: we have on the order of 100K applications per day in our 
production environment. Is there any native retention policy? E.g. 
only keep the latest X archives in the dir - or is this something we 
need to manage ourselves?


Thanks.

// ah

From: Hailu, Andreas [Engineering]
Sent: Friday, May 29, 2020 8:46 AM
To: 'Chesnay Schepler'; user@flink.apache.org
Subject: RE: History Server Not Showing Any Jobs - File Not Found?

Yes, these are all in the same directory, and we’re at 67G right now. 
I’ll try with incrementally smaller directories and let you know what 
I find.


// ah

From: Chesnay Schepler <ches...@apache.org>
Sent: Friday, May 29, 2020 3:11 AM
To: Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org

Subject: Re: History Server Not Showing Any Jobs - File Not Found?

oh I'm not using the HistoryServer; I just wrote it ;)

Are these archives all in the same location? So we're roughly looking 
at 5 GB of archives then?


That could indeed "just" be a resource problem. The HistoryServer 
eagerly downloads all archives, and not on-demand.


The next step would be to move some of the archives into a separate 
HDFS directory and try again.


(Note that by configuring "historyserver.web.tmpdir" to some permanent 
directory, subsequent (re)starts of the HistoryServer can re-use this 
directory; so you only have to download things once)


On 29/05/2020 00:43, Hailu, Andreas wrote:

May I also ask what version of flink-hadoop you’re using and the
number of jobs you’re storing the history for? As of writing we
have roughly 101,000 application history files. I’m curious to
know if we’re encountering some kind of resource problem.

// ah

From: Hailu, Andreas [Engineering]
Sent: Thursday, May 28, 2020 12:18 PM
To: 'Chesnay Schepler' <ches...@apache.org>; user@flink.apache.org
Subject: RE: History Server Not Showing Any Jobs - File Not Found?

Okay, I will look further to see if we’re mistakenly using a
version that’s pre-2.6.0. However, I don’t see flink-shaded-hadoop
in my /lib directory for flink-1.9.1.

flink-dist_2.11-1.9.1.jar

flink-table-blink_2.11-1.9.1.jar

flink-table_2.11-1.9.1.jar

log4j-1.2.17.jar

slf4j-log4j12-1.7.15.jar

Are the files within /lib.

// ah

From: Chesnay Schepler <ches...@apache.org>
Sent: Thursday, May 28, 2020 11:00 AM
To: Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

Looks like it is indeed stuck on downloading the archive.

I searched a bit in the Hadoop JIRA and found several similar
instances:

https://issues.apache.org/jira/browse/HDFS-6999

https://issues.apache.org/jira/browse/HDFS-7005

https://issues.apache.org/jira/browse/HDFS-7145

RE: History Server Not Showing Any Jobs - File Not Found?

2020-06-01 Thread Hailu, Andreas
So I created a new HDFS directory with just 1 archive and pointed the server to 
monitor that directory, et voila - I'm able to see the applications in the UI. 
So it must have been really churning trying to fetch all of those initial 
archives :)

I have a couple of follow up questions if you please:

1.  What is the upper limit of the number of archives the history server 
can support? Does it attempt to download every archive and load them all into 
memory?

2.  Retention: we have on the order of 100K applications per day in our 
production environment. Is there any native retention policy? E.g. only keep 
the latest X archives in the dir - or is this something we need to manage 
ourselves?

Thanks.

// ah

From: Hailu, Andreas [Engineering]
Sent: Friday, May 29, 2020 8:46 AM
To: 'Chesnay Schepler' ; user@flink.apache.org
Subject: RE: History Server Not Showing Any Jobs - File Not Found?

Yes, these are all in the same directory, and we're at 67G right now. I'll try 
with incrementally smaller directories and let you know what I find.

// ah

From: Chesnay Schepler mailto:ches...@apache.org>>
Sent: Friday, May 29, 2020 3:11 AM
To: Hailu, Andreas [Engineering] 
mailto:andreas.ha...@ny.email.gs.com>>; 
user@flink.apache.org<mailto:user@flink.apache.org>
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

oh I'm not using the HistoryServer; I just wrote it ;)
Are these archives all in the same location? So we're roughly looking at 5 GB 
of archives then?

That could indeed "just" be a resource problem. The HistoryServer eagerly 
downloads all archives, and not on-demand.
The next step would be to move some of the archives into a separate HDFS 
directory and try again.

(Note that by configuring "historyserver.web.tmpdir" to some permanent 
directory, subsequent (re)starts of the HistoryServer can re-use this 
directory; so you only have to download things once)

On 29/05/2020 00:43, Hailu, Andreas wrote:
May I also ask what version of flink-hadoop you're using and the number of jobs 
you're storing the history for? As of writing we have roughly 101,000 
application history files. I'm curious to know if we're encountering some kind 
of resource problem.

// ah

From: Hailu, Andreas [Engineering]
Sent: Thursday, May 28, 2020 12:18 PM
To: 'Chesnay Schepler' <mailto:ches...@apache.org>; 
user@flink.apache.org<mailto:user@flink.apache.org>
Subject: RE: History Server Not Showing Any Jobs - File Not Found?

Okay, I will look further to see if we're mistakenly using a version that's 
pre-2.6.0. However, I don't see flink-shaded-hadoop in my /lib directory for 
flink-1.9.1.

flink-dist_2.11-1.9.1.jar
flink-table-blink_2.11-1.9.1.jar
flink-table_2.11-1.9.1.jar
log4j-1.2.17.jar
slf4j-log4j12-1.7.15.jar

Are the files within /lib.

// ah

From: Chesnay Schepler mailto:ches...@apache.org>>
Sent: Thursday, May 28, 2020 11:00 AM
To: Hailu, Andreas [Engineering] 
mailto:andreas.ha...@ny.email.gs.com>>; 
user@flink.apache.org<mailto:user@flink.apache.org>
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

Looks like it is indeed stuck on downloading the archive.

I searched a bit in the Hadoop JIRA and found several similar instances:
https://issues.apache.org/jira/browse/HDFS-6999
https://issues.apache.org/jira/browse/HDFS-7005
https://issues.apache.org/jira/browse/HDFS-7145

It is supposed to be fixed in 2.6.0 though :/

If hadoop is available from the HADOOP_CLASSPATH and flink-shaded-hadoop in 
/lib then you basically don't know what Hadoop version is actually being used,
which could lead to incompatibilities and dependency clashes.
If flink-shaded-hadoop 2.4/2.5 is on the classpath, maybe that is being used 
and runs into HDFS-7005.

On 28/05/2020 16:27, Hailu, Andreas wrote:
Just created a dump, here's what I see:

"Flink-HistoryServer-ArchiveFetcher-thread-1" #19 daemon prio=5 os_prio=0 
tid=0x7f93a5a2c000 nid=0x5692 runnable [0x7f934a0d3000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)

RE: History Server Not Showing Any Jobs - File Not Found?

2020-05-29 Thread Hailu, Andreas
Yes, these are all in the same directory, and we're at 67G right now. I'll try 
with incrementally smaller directories and let you know what I find.

// ah

From: Chesnay Schepler 
Sent: Friday, May 29, 2020 3:11 AM
To: Hailu, Andreas [Engineering] ; 
user@flink.apache.org
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

oh I'm not using the HistoryServer; I just wrote it ;)
Are these archives all in the same location? So we're roughly looking at 5 GB 
of archives then?

That could indeed "just" be a resource problem. The HistoryServer eagerly 
downloads all archives, and not on-demand.
The next step would be to move some of the archives into a separate HDFS 
directory and try again.

(Note that by configuring "historyserver.web.tmpdir" to some permanent 
directory, subsequent (re)starts of the HistoryServer can re-use this 
directory; so you only have to download things once)

On 29/05/2020 00:43, Hailu, Andreas wrote:
May I also ask what version of flink-hadoop you're using and the number of jobs 
you're storing the history for? As of writing we have roughly 101,000 
application history files. I'm curious to know if we're encountering some kind 
of resource problem.

// ah

From: Hailu, Andreas [Engineering]
Sent: Thursday, May 28, 2020 12:18 PM
To: 'Chesnay Schepler' <mailto:ches...@apache.org>; 
user@flink.apache.org<mailto:user@flink.apache.org>
Subject: RE: History Server Not Showing Any Jobs - File Not Found?

Okay, I will look further to see if we're mistakenly using a version that's 
pre-2.6.0. However, I don't see flink-shaded-hadoop in my /lib directory for 
flink-1.9.1.

flink-dist_2.11-1.9.1.jar
flink-table-blink_2.11-1.9.1.jar
flink-table_2.11-1.9.1.jar
log4j-1.2.17.jar
slf4j-log4j12-1.7.15.jar

Are the files within /lib.

// ah

From: Chesnay Schepler mailto:ches...@apache.org>>
Sent: Thursday, May 28, 2020 11:00 AM
To: Hailu, Andreas [Engineering] 
mailto:andreas.ha...@ny.email.gs.com>>; 
user@flink.apache.org<mailto:user@flink.apache.org>
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

Looks like it is indeed stuck on downloading the archive.

I searched a bit in the Hadoop JIRA and found several similar instances:
https://issues.apache.org/jira/browse/HDFS-6999
https://issues.apache.org/jira/browse/HDFS-7005
https://issues.apache.org/jira/browse/HDFS-7145

It is supposed to be fixed in 2.6.0 though :/

If Hadoop is available from the HADOOP_CLASSPATH and flink-shaded-hadoop is also in /lib, then you basically don't know which Hadoop version is actually being used, which could lead to incompatibilities and dependency clashes.
If flink-shaded-hadoop 2.4/2.5 is on the classpath, maybe that is being used and runs into HDFS-7005.

On 28/05/2020 16:27, Hailu, Andreas wrote:
Just created a dump, here's what I see:

"Flink-HistoryServer-ArchiveFetcher-thread-1" #19 daemon prio=5 os_prio=0 
tid=0x7f93a5a2c000 nid=0x5692 runnable [0x7f934a0d3000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x0005df986960> (a sun.nio.ch.Util$2)
- locked <0x0005df986948> (a java.util.Collections$UnmodifiableSet)
- locked <0x0005df928390> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at 
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:258)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:209)
  

Re: History Server Not Showing Any Jobs - File Not Found?

2020-05-29 Thread Chesnay Schepler

oh I'm not using the HistoryServer; I just wrote it ;)
Are these archives all in the same location? So we're roughly looking at 
5 GB of archives then?


That could indeed "just" be a resource problem. The HistoryServer 
eagerly downloads all archives, and not on-demand.
The next step would be to move some of the archives into a separate HDFS 
directory and try again.


(Note that by configuring "historyserver.web.tmpdir" to some permanent 
directory, subsequent (re)starts of the HistoryServer can re-use this 
directory, so you only have to download things once)
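
As a sketch, the corresponding flink-conf.yaml lines could look like this; /var/flink/hs-cache is a hypothetical path chosen for illustration, and only historyserver.web.tmpdir is what the advice above actually requires:

  historyserver.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/
  # permanent local cache; survives HistoryServer restarts, so archives are downloaded only once
  historyserver.web.tmpdir: /var/flink/hs-cache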


On 29/05/2020 00:43, Hailu, Andreas wrote:


May I also ask what version of flink-hadoop you’re using and the 
number of jobs you’re storing the history for? As of writing we have 
roughly 101,000 application history files. I’m curious to know if 
we’re encountering some kind of resource problem.


// ah

From: Hailu, Andreas [Engineering]
Sent: Thursday, May 28, 2020 12:18 PM
To: 'Chesnay Schepler'; user@flink.apache.org
Subject: RE: History Server Not Showing Any Jobs - File Not Found?

Okay, I will look further to see if we’re mistakenly using a version 
that’s pre-2.6.0. However, I don’t see flink-shaded-hadoop in my /lib 
directory for flink-1.9.1.


flink-dist_2.11-1.9.1.jar

flink-table-blink_2.11-1.9.1.jar

flink-table_2.11-1.9.1.jar

log4j-1.2.17.jar

slf4j-log4j12-1.7.15.jar

Those are the files within /lib.

// ah

From: Chesnay Schepler <ches...@apache.org>
Sent: Thursday, May 28, 2020 11:00 AM
To: Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org

Subject: Re: History Server Not Showing Any Jobs - File Not Found?

Looks like it is indeed stuck on downloading the archive.

I searched a bit in the Hadoop JIRA and found several similar instances:

https://issues.apache.org/jira/browse/HDFS-6999
https://issues.apache.org/jira/browse/HDFS-7005
https://issues.apache.org/jira/browse/HDFS-7145


It is supposed to be fixed in 2.6.0 though :/

If Hadoop is available from the HADOOP_CLASSPATH and flink-shaded-hadoop is 
also in /lib, then you basically don't know which Hadoop version is actually 
being used, which could lead to incompatibilities and dependency clashes.

If flink-shaded-hadoop 2.4/2.5 is on the classpath, maybe that is 
being used and runs into HDFS-7005.


On 28/05/2020 16:27, Hailu, Andreas wrote:

Just created a dump, here’s what I see:

"Flink-HistoryServer-ArchiveFetcher-thread-1" #19 daemon prio=5
os_prio=0 tid=0x7f93a5a2c000 nid=0x5692 runnable
[0x7f934a0d3000]

java.lang.Thread.State: RUNNABLE

    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)

    at
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)

    at
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)

    at
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)

    - locked <0x0005df986960> (a sun.nio.ch.Util$2)

    - locked <0x0005df986948> (a
java.util.Collections$UnmodifiableSet)

    - locked <0x0005df928390> (a sun.nio.ch.EPollSelectorImpl)

    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)

    at

org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)

    at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)

    at
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)

    at

org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:258)

    at

org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:209)

    at

org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:171)

    at

org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)

    at
  

RE: History Server Not Showing Any Jobs - File Not Found?

2020-05-28 Thread Hailu, Andreas
May I also ask what version of flink-hadoop you're using and the number of jobs 
you're storing the history for? As of writing we have roughly 101,000 
application history files. I'm curious to know if we're encountering some kind 
of resource problem.

// ah

From: Hailu, Andreas [Engineering]
Sent: Thursday, May 28, 2020 12:18 PM
To: 'Chesnay Schepler' ; user@flink.apache.org
Subject: RE: History Server Not Showing Any Jobs - File Not Found?

Okay, I will look further to see if we're mistakenly using a version that's 
pre-2.6.0. However, I don't see flink-shaded-hadoop in my /lib directory for 
flink-1.9.1.

flink-dist_2.11-1.9.1.jar
flink-table-blink_2.11-1.9.1.jar
flink-table_2.11-1.9.1.jar
log4j-1.2.17.jar
slf4j-log4j12-1.7.15.jar

Those are the files within /lib.

// ah

From: Chesnay Schepler <ches...@apache.org>
Sent: Thursday, May 28, 2020 11:00 AM
To: Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

Looks like it is indeed stuck on downloading the archive.

I searched a bit in the Hadoop JIRA and found several similar instances:
https://issues.apache.org/jira/browse/HDFS-6999
https://issues.apache.org/jira/browse/HDFS-7005
https://issues.apache.org/jira/browse/HDFS-7145

It is supposed to be fixed in 2.6.0 though :/

If Hadoop is available from the HADOOP_CLASSPATH and flink-shaded-hadoop is also in /lib, then you basically don't know which Hadoop version is actually being used, which could lead to incompatibilities and dependency clashes.
If flink-shaded-hadoop 2.4/2.5 is on the classpath, maybe that is being used and runs into HDFS-7005.

On 28/05/2020 16:27, Hailu, Andreas wrote:
Just created a dump, here's what I see:

"Flink-HistoryServer-ArchiveFetcher-thread-1" #19 daemon prio=5 os_prio=0 
tid=0x7f93a5a2c000 nid=0x5692 runnable [0x7f934a0d3000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x0005df986960> (a sun.nio.ch.Util$2)
- locked <0x0005df986948> (a java.util.Collections$UnmodifiableSet)
- locked <0x0005df928390> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at 
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:258)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:209)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:171)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
at 
org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:201)
at 
org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:152)
- locked <0x0005ceade5e0> (a 
org.apache.hadoop.hdfs.RemoteBlockReader2)
at 
org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:781)
at 
org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:837)
- eliminated <0x0005cead3688> (a 
org.apache.hadoop.hdfs.DFSInputStream)
at 
org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:897)
- locked <0x0005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:945)

RE: History Server Not Showing Any Jobs - File Not Found?

2020-05-28 Thread Hailu, Andreas
Okay, I will look further to see if we're mistakenly using a version that's 
pre-2.6.0. However, I don't see flink-shaded-hadoop in my /lib directory for 
flink-1.9.1.

flink-dist_2.11-1.9.1.jar
flink-table-blink_2.11-1.9.1.jar
flink-table_2.11-1.9.1.jar
log4j-1.2.17.jar
slf4j-log4j12-1.7.15.jar

Those are the files within /lib.

// ah

From: Chesnay Schepler 
Sent: Thursday, May 28, 2020 11:00 AM
To: Hailu, Andreas [Engineering] ; 
user@flink.apache.org
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

Looks like it is indeed stuck on downloading the archive.

I searched a bit in the Hadoop JIRA and found several similar instances:
https://issues.apache.org/jira/browse/HDFS-6999
https://issues.apache.org/jira/browse/HDFS-7005
https://issues.apache.org/jira/browse/HDFS-7145

It is supposed to be fixed in 2.6.0 though :/

If Hadoop is available from the HADOOP_CLASSPATH and flink-shaded-hadoop is also in /lib, then you basically don't know which Hadoop version is actually being used, which could lead to incompatibilities and dependency clashes.
If flink-shaded-hadoop 2.4/2.5 is on the classpath, maybe that is being used and runs into HDFS-7005.

On 28/05/2020 16:27, Hailu, Andreas wrote:
Just created a dump, here's what I see:

"Flink-HistoryServer-ArchiveFetcher-thread-1" #19 daemon prio=5 os_prio=0 
tid=0x7f93a5a2c000 nid=0x5692 runnable [0x7f934a0d3000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x0005df986960> (a sun.nio.ch.Util$2)
- locked <0x0005df986948> (a java.util.Collections$UnmodifiableSet)
- locked <0x0005df928390> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at 
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:258)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:209)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:171)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
at 
org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:201)
at 
org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:152)
- locked <0x0005ceade5e0> (a 
org.apache.hadoop.hdfs.RemoteBlockReader2)
at 
org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:781)
at 
org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:837)
- eliminated <0x0005cead3688> (a 
org.apache.hadoop.hdfs.DFSInputStream)
at 
org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:897)
- locked <0x0005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)
   at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:945)
- locked <0x0005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)
at java.io.DataInputStream.read(DataInputStream.java:149)
at 
org.apache.flink.runtime.fs.hdfs.HadoopDataInputStream.read(HadoopDataInputStream.java:94)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.flink.util.IOUtils.copyBytes(IOUtils.java:69)
at org.apache.flink.util.IOUtils.copyBytes(IOUtils.java:91)
at 
org.apache.flink.runtime.history.FsJobArchivist.getArchivedJsons(FsJobArchivist.java:110)
at 
org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:169)

Re: History Server Not Showing Any Jobs - File Not Found?

2020-05-28 Thread Chesnay Schepler

Looks like it is indeed stuck on downloading the archive.

I searched a bit in the Hadoop JIRA and found several similar instances:
https://issues.apache.org/jira/browse/HDFS-6999
https://issues.apache.org/jira/browse/HDFS-7005
https://issues.apache.org/jira/browse/HDFS-7145

It is supposed to be fixed in 2.6.0 though :/

If Hadoop is available from the HADOOP_CLASSPATH and flink-shaded-hadoop is 
also in /lib, then you basically don't know which Hadoop version is actually 
being used, which could lead to incompatibilities and dependency clashes.
If flink-shaded-hadoop 2.4/2.5 is on the classpath, maybe that is being 
used and runs into HDFS-7005.
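
A quick way to check what is actually on the classpath, as a sketch with assumed paths (FLINK_HOME and the HADOOP_CLASSPATH variable come from the deployment, not from this thread):

  ls $FLINK_HOME/lib | grep -i shaded-hadoop   # should print nothing if no flink-shaded-hadoop is bundled
  hadoop version                               # the Hadoop version picked up via HADOOP_CLASSPATH
  echo "$HADOOP_CLASSPATH" | tr ':' '\n'       # list every classpath entry on its own line

The HDFS read hangs referenced above are fixed in Hadoop 2.6.0, so any pre-2.6.0 jar that shows up here is a suspect.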


On 28/05/2020 16:27, Hailu, Andreas wrote:


Just created a dump, here’s what I see:

"Flink-HistoryServer-ArchiveFetcher-thread-1" #19 daemon prio=5 
os_prio=0 tid=0x7f93a5a2c000 nid=0x5692 runnable [0x7f934a0d3000]


java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    - locked <0x0005df986960> (a sun.nio.ch.Util$2)
    - locked <0x0005df986948> (a java.util.Collections$UnmodifiableSet)
    - locked <0x0005df928390> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:258)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:209)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:171)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
    at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:201)
    at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:152)
    - locked <0x0005ceade5e0> (a org.apache.hadoop.hdfs.RemoteBlockReader2)
    at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:781)
    at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:837)
    - eliminated <0x0005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:897)
    - locked <0x0005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:945)
    - locked <0x0005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)
    at java.io.DataInputStream.read(DataInputStream.java:149)
    at org.apache.flink.runtime.fs.hdfs.HadoopDataInputStream.read(HadoopDataInputStream.java:94)
    at java.io.InputStream.read(InputStream.java:101)
    at org.apache.flink.util.IOUtils.copyBytes(IOUtils.java:69)
    at org.apache.flink.util.IOUtils.copyBytes(IOUtils.java:91)
    at org.apache.flink.runtime.history.FsJobArchivist.getArchivedJsons(FsJobArchivist.java:110)
    at org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:169)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

What problems could the flink-shaded-hadoop jar being included introduce?

// ah

From: Chesnay Schepler
Sent: Thursday, May 28, 2020 9:26 AM
To: Hailu, Andreas [Engineering]; user@flink.apache.org

Subject: Re: History Server Not Showing Any Jobs - File Not Found?

If it were a class-loading issue I would think that we'd see an 
exception of some kind. Maybe double-check that flink-shaded-hadoop is 
not in the lib directory. (usually I would ask for the full classpath 
that the HS is started with, but as it turns out this isn't getting logged :( (FLINK-18008))

RE: History Server Not Showing Any Jobs - File Not Found?

2020-05-28 Thread Hailu, Andreas
Just created a dump, here's what I see:

"Flink-HistoryServer-ArchiveFetcher-thread-1" #19 daemon prio=5 os_prio=0 
tid=0x7f93a5a2c000 nid=0x5692 runnable [0x7f934a0d3000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x0005df986960> (a sun.nio.ch.Util$2)
- locked <0x0005df986948> (a java.util.Collections$UnmodifiableSet)
- locked <0x0005df928390> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at 
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:258)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:209)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:171)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
at 
org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:201)
at 
org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:152)
- locked <0x0005ceade5e0> (a 
org.apache.hadoop.hdfs.RemoteBlockReader2)
at 
org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:781)
at 
org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:837)
- eliminated <0x0005cead3688> (a 
org.apache.hadoop.hdfs.DFSInputStream)
at 
org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:897)
- locked <0x0005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)
   at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:945)
- locked <0x0005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)
at java.io.DataInputStream.read(DataInputStream.java:149)
at 
org.apache.flink.runtime.fs.hdfs.HadoopDataInputStream.read(HadoopDataInputStream.java:94)
at java.io.InputStream.read(InputStream.java:101)
at org.apache.flink.util.IOUtils.copyBytes(IOUtils.java:69)
at org.apache.flink.util.IOUtils.copyBytes(IOUtils.java:91)
at 
org.apache.flink.runtime.history.FsJobArchivist.getArchivedJsons(FsJobArchivist.java:110)
at 
org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:169)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

What problems could the flink-shaded-hadoop jar being included introduce?

// ah

From: Chesnay Schepler 
Sent: Thursday, May 28, 2020 9:26 AM
To: Hailu, Andreas [Engineering] ; 
user@flink.apache.org
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

If it were a class-loading issue I would think that we'd see an exception of 
some kind. Maybe double-check that flink-shaded-hadoop is not in the lib 
directory. (usually I would ask for the full classpath that the HS is started 
with, but as it turns out this isn't getting logged :( (FLINK-18008))

The fact that overview.json and jobs/overview.json are missing indicates that 
something goes wrong directly on startup. What is supposed to happen is that 
the HS starts, fetches all currently available archives and then creates these 
files.
So it seems like the download gets stuck for some reason.

Can you use jstack to create a thread dump, and see what the 
Flink-HistoryServer-ArchiveFetcher is doing?

I will also file a JIRA for adding more logging statements, like when fetching 
starts/stops.

On 27/05/2020 20:57, Hailu, Andreas wrote:
Hi Chesnay, apologies for not getting back to you sooner here. So I did what 
you suggested - I downloaded a few files from my jobmanager.archive.fs.dir HDFS 
directory to a locally available directory named /local/scratch/hailua_p2epdlsuat/historyserver/archived/.

Re: History Server Not Showing Any Jobs - File Not Found?

2020-05-28 Thread Chesnay Schepler
If it were a class-loading issue I would think that we'd see an 
exception of some kind. Maybe double-check that flink-shaded-hadoop is 
not in the lib directory. (usually I would ask for the full classpath 
that the HS is started with, but as it turns out this isn't getting 
logged :( (FLINK-18008))


The fact that overview.json and jobs/overview.json are missing indicates 
that something goes wrong directly on startup. What is supposed to 
happen is that the HS starts, fetches all currently available archives 
and then creates these files.

So it seems like the download gets stuck for some reason.

Can you use jstack to create a thread dump, and see what the 
Flink-HistoryServer-ArchiveFetcher is doing?
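
For reference, a minimal sketch of taking that dump with the standard JDK tools (jps/jstack; the output file name is arbitrary):

  jps -l | grep HistoryServer                  # find the PID of the HistoryServer JVM
  jstack <pid> > /tmp/historyserver-threads.txt
  grep -A 20 'Flink-HistoryServer-ArchiveFetcher' /tmp/historyserver-threads.txt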


I will also file a JIRA for adding more logging statements, like when 
fetching starts/stops.


On 27/05/2020 20:57, Hailu, Andreas wrote:


Hi Chesnay, apologies for not getting back to you sooner here. So I 
did what you suggested - I downloaded a few files from my 
jobmanager.archive.fs.dir HDFS directory to a locally available 
directory named 
/local/scratch/hailua_p2epdlsuat/historyserver/archived/. I then 
changed my historyserver.archive.fs.dir to 
file:///local/scratch/hailua_p2epdlsuat/historyserver/archived/ and 
that seemed to work. I’m able to see the history of the applications I 
downloaded. So this points to a problem with sourcing the history from 
HDFS.


Do you think this could be classpath related? This is what we use for 
our HADOOP_CLASSPATH var:


/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-hdfs/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-hdfs/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-mapreduce/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-mapreduce/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-yarn/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-yarn/lib/*:/gns/software/ep/da/dataproc/dataproc-prod/lakeRmProxy.jar:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/bin::/gns/mw/dbclient/postgres/jdbc/pg-jdbc-9.3.v01/postgresql-9.3-1100-jdbc4.jar

You can see we have references to Hadoop mapred/yarn/hdfs libs in there.

// ah

From: Chesnay Schepler
Sent: Sunday, May 3, 2020 6:00 PM
To: Hailu, Andreas [Engineering]; user@flink.apache.org

Subject: Re: History Server Not Showing Any Jobs - File Not Found?

yes, exactly; I want to rule out that (somehow) HDFS is the problem.

I couldn't reproduce the issue locally myself so far.

On 01/05/2020 22:31, Hailu, Andreas wrote:

Hi Chesnay, yes – they were created using Flink 1.9.1 as we’ve
only just started to archive them in the past couple weeks. Could
you clarify on how you want to try local filesystem archives? As
in changing jobmanager.archive.fs.dir and historyserver.web.tmpdir
to the same local directory?

// ah

From: Chesnay Schepler <ches...@apache.org>
Sent: Wednesday, April 29, 2020 8:26 AM
To: Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

hmm...let's see if I can reproduce the issue locally.

Are the archives from the same version the history server runs on?
(Which I supposed would be 1.9.1?)

Just for the sake of narrowing things down, it would also be
interesting to check if it works with the archives residing in the
local filesystem.

On 27/04/2020 18:35, Hailu, Andreas wrote:

bash-4.1$ ls -l /local/scratch/flink_historyserver_tmpdir/

total 8

drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:43
flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9

drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:22
flink-web-history-95b3f928-c60f-4351-9926-766c6ad3ee76

There are just two directories in here. I don’t see cache
directories from my attempts today, which is interesting.
Looking a little deeper into them:

bash-4.1$ ls -lr

/local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9

total 1756

drwxrwxr-x 2 p2epdlsuat p2epdlsuat 1789952 Apr 21 10:44 jobs

bash-4.1$ ls -lr

/local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9/jobs

total 0

-rw-rw-r-- 1 p2epdlsuat p2epdlsuat 0 Apr 21 10:43 overview.json

There are indeed archives already in HDFS – I’ve included some
in my initial mail, but here they are again just for reference:

-bash-4.1$ hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs

Found 44282 items

-rw-r- 3 delp datalake_admin_dev  50569 2020-03-21
23:17
/user/p

RE: History Server Not Showing Any Jobs - File Not Found?

2020-05-27 Thread Hailu, Andreas
Hi Chesnay, apologies for not getting back to you sooner here. So I did what 
you suggested - I downloaded a few files from my jobmanager.archive.fs.dir HDFS 
directory to a locally available directory named 
/local/scratch/hailua_p2epdlsuat/historyserver/archived/. I then changed my 
historyserver.archive.fs.dir to 
file:///local/scratch/hailua_p2epdlsuat/historyserver/archived/ and that seemed 
to work. I'm able to see the history of the applications I downloaded. So this 
points to a problem with sourcing the history from HDFS.
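
A sketch of that local-filesystem test with assumed commands (the paths are the ones from this thread):

  mkdir -p /local/scratch/hailua_p2epdlsuat/historyserver/archived/
  hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs | grep '^-' | head -n 5 \
      | awk '{print $NF}' \
      | xargs -n 1 -I{} hdfs dfs -get {} /local/scratch/hailua_p2epdlsuat/historyserver/archived/
  # flink-conf.yaml then points at the local copy:
  # historyserver.archive.fs.dir: file:///local/scratch/hailua_p2epdlsuat/historyserver/archived/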

Do you think this could be classpath related? This is what we use for our 
HADOOP_CLASSPATH var:
/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-hdfs/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-hdfs/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-mapreduce/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-mapreduce/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-yarn/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-yarn/lib/*:/gns/software/ep/da/dataproc/dataproc-prod/lakeRmProxy.jar:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/bin::/gns/mw/dbclient/postgres/jdbc/pg-jdbc-9.3.v01/postgresql-9.3-1100-jdbc4.jar

You can see we have references to Hadoop mapred/yarn/hdfs libs in there.

// ah

From: Chesnay Schepler 
Sent: Sunday, May 3, 2020 6:00 PM
To: Hailu, Andreas [Engineering] ; 
user@flink.apache.org
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

yes, exactly; I want to rule out that (somehow) HDFS is the problem.

I couldn't reproduce the issue locally myself so far.

On 01/05/2020 22:31, Hailu, Andreas wrote:
Hi Chesnay, yes - they were created using Flink 1.9.1 as we've only just 
started to archive them in the past couple weeks. Could you clarify on how you 
want to try local filesystem archives? As in changing jobmanager.archive.fs.dir 
and historyserver.web.tmpdir to the same local directory?

// ah

From: Chesnay Schepler <ches...@apache.org>
Sent: Wednesday, April 29, 2020 8:26 AM
To: Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

hmm...let's see if I can reproduce the issue locally.

Are the archives from the same version the history server runs on? (Which I 
supposed would be 1.9.1?)

Just for the sake of narrowing things down, it would also be interesting to 
check if it works with the archives residing in the local filesystem.

On 27/04/2020 18:35, Hailu, Andreas wrote:
bash-4.1$ ls -l /local/scratch/flink_historyserver_tmpdir/
total 8
drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:43 
flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9
drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:22 
flink-web-history-95b3f928-c60f-4351-9926-766c6ad3ee76

There are just two directories in here. I don't see cache directories from my 
attempts today, which is interesting. Looking a little deeper into them:

bash-4.1$ ls -lr 
/local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9
total 1756
drwxrwxr-x 2 p2epdlsuat p2epdlsuat 1789952 Apr 21 10:44 jobs
bash-4.1$ ls -lr 
/local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9/jobs
total 0
-rw-rw-r-- 1 p2epdlsuat p2epdlsuat 0 Apr 21 10:43 overview.json

There are indeed archives already in HDFS - I've included some in my initial 
mail, but here they are again just for reference:
-bash-4.1$ hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs
Found 44282 items
-rw-r-   3 delp datalake_admin_dev  50569 2020-03-21 23:17 
/user/p2epda/lake/delp_qa/flink_hs/000144dba9dc0f235768a46b2f26e936
-rw-r-   3 delp datalake_admin_dev  49578 2020-03-03 08:45 
/user/p2epda/lake/delp_qa/flink_hs/000347625d8128ee3fd0b672018e38a5
-rw-r-   3 delp datalake_admin_dev  50842 2020-03-24 15:19 
/user/p2epda/lake/delp_qa/flink_hs/0004be6ce01ba9677d1eb619ad0fa757
...


// ah

From: Chesnay Schepler <ches...@apache.org>
Sent: Monday, April 27, 2020 10:28 AM
To: Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

If historyserver.web.tmpdir is not set then java.io.tmpdir is used, so that 
should be fine.

What are the contents of /local/scratch/flink_historyserver_tmpdir?
I assume there are already archives in HDFS?

On 27/04/2020 16:02, Hailu, Andreas wrote:
My machine's /tmp directory is not large enough to support the archived files, 
so I changed my java.io.tmpdir to be in some other location which is 
significantly larger. I hadn't set anything for historyserver.web.tmpdir, so I 
suspect it was still pointing at /tmp.

Re: History Server Not Showing Any Jobs - File Not Found?

2020-05-03 Thread Chesnay Schepler

yes, exactly; I want to rule out that (somehow) HDFS is the problem.

I couldn't reproduce the issue locally myself so far.

On 01/05/2020 22:31, Hailu, Andreas wrote:


Hi Chesnay, yes – they were created using Flink 1.9.1 as we’ve only 
just started to archive them in the past couple weeks. Could you 
clarify on how you want to try local filesystem archives? As in 
changing jobmanager.archive.fs.dir and historyserver.web.tmpdir to the 
same local directory?


// ah

From: Chesnay Schepler
Sent: Wednesday, April 29, 2020 8:26 AM
To: Hailu, Andreas [Engineering]; user@flink.apache.org

Subject: Re: History Server Not Showing Any Jobs - File Not Found?

hmm...let's see if I can reproduce the issue locally.

Are the archives from the same version the history server runs on? 
(Which I supposed would be 1.9.1?)


Just for the sake of narrowing things down, it would also be 
interesting to check if it works with the archives residing in the 
local filesystem.


On 27/04/2020 18:35, Hailu, Andreas wrote:

bash-4.1$ ls -l /local/scratch/flink_historyserver_tmpdir/

total 8

drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:43
flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9

drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:22
flink-web-history-95b3f928-c60f-4351-9926-766c6ad3ee76

There are just two directories in here. I don’t see cache
directories from my attempts today, which is interesting. Looking
a little deeper into them:

bash-4.1$ ls -lr

/local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9

total 1756

drwxrwxr-x 2 p2epdlsuat p2epdlsuat 1789952 Apr 21 10:44 jobs

bash-4.1$ ls -lr

/local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9/jobs

total 0

-rw-rw-r-- 1 p2epdlsuat p2epdlsuat 0 Apr 21 10:43 overview.json

There are indeed archives already in HDFS – I’ve included some in
my initial mail, but here they are again just for reference:

-bash-4.1$ hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs

Found 44282 items

-rw-r- 3 delp datalake_admin_dev  50569 2020-03-21 23:17
/user/p2epda/lake/delp_qa/flink_hs/000144dba9dc0f235768a46b2f26e936

-rw-r- 3 delp datalake_admin_dev  49578 2020-03-03 08:45
/user/p2epda/lake/delp_qa/flink_hs/000347625d8128ee3fd0b672018e38a5

-rw-r- 3 delp datalake_admin_dev  50842 2020-03-24 15:19
/user/p2epda/lake/delp_qa/flink_hs/0004be6ce01ba9677d1eb619ad0fa757

...

// ah

From: Chesnay Schepler <ches...@apache.org>
Sent: Monday, April 27, 2020 10:28 AM
To: Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

If historyserver.web.tmpdir is not set then java.io.tmpdir is
used, so that should be fine.

What are the contents of /local/scratch/flink_historyserver_tmpdir?

I assume there are already archives in HDFS?

On 27/04/2020 16:02, Hailu, Andreas wrote:

My machine’s /tmp directory is not large enough to support the
archived files, so I changed my java.io.tmpdir to be in some
other location which is significantly larger. I hadn’t set
anything for historyserver.web.tmpdir, so I suspect it was
still pointing at /tmp. I just tried setting
historyserver.web.tmpdir to the same location as my
java.io.tmpdir location, but I’m afraid I’m still seeing the
following issue:

2020-04-27 09:37:42,904 [nioEventLoopGroup-3-4] DEBUG
HistoryServerStaticFileServerHandler - Unable to load
requested file /overview.json from classloader

2020-04-27 09:37:42,906 [nioEventLoopGroup-3-6] DEBUG
HistoryServerStaticFileServerHandler - Unable to load
requested file /jobs/overview.json from classloader

flink-conf.yaml for reference:

jobmanager.archive.fs.dir:
hdfs:///user/p2epda/lake/delp_qa/flink_hs/

historyserver.archive.fs.dir:
hdfs:///user/p2epda/lake/delp_qa/flink_hs/

historyserver.web.tmpdir:
/local/scratch/flink_historyserver_tmpdir/

Did you have anything else in mind when you said pointing
somewhere funny?

// ah

From: Chesnay Schepler <ches...@apache.org>
Sent: Monday, April 27, 2020 5:56 AM
To: Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

overview.json is a generated file that is placed in the local directory 
controlled by historyserver.web.tmpdir.

Have you configured this option to point to some non-local filesystem? (Or if 
not, is the java.io.tmpdir property pointing somewhere funny?)

RE: History Server Not Showing Any Jobs - File Not Found?

2020-05-01 Thread Hailu, Andreas
Hi Chesnay, yes - they were created using Flink 1.9.1 as we've only just 
started to archive them in the past couple weeks. Could you clarify on how you 
want to try local filesystem archives? As in changing jobmanager.archive.fs.dir 
and historyserver.web.tmpdir to the same local directory?

// ah

From: Chesnay Schepler 
Sent: Wednesday, April 29, 2020 8:26 AM
To: Hailu, Andreas [Engineering] ; 
user@flink.apache.org
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

hmm...let's see if I can reproduce the issue locally.

Are the archives from the same version the history server runs on? (Which I 
supposed would be 1.9.1?)

Just for the sake of narrowing things down, it would also be interesting to 
check if it works with the archives residing in the local filesystem.

On 27/04/2020 18:35, Hailu, Andreas wrote:
bash-4.1$ ls -l /local/scratch/flink_historyserver_tmpdir/
total 8
drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:43 
flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9
drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:22 
flink-web-history-95b3f928-c60f-4351-9926-766c6ad3ee76

There are just two directories in here. I don't see cache directories from my 
attempts today, which is interesting. Looking a little deeper into them:

bash-4.1$ ls -lr 
/local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9
total 1756
drwxrwxr-x 2 p2epdlsuat p2epdlsuat 1789952 Apr 21 10:44 jobs
bash-4.1$ ls -lr 
/local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9/jobs
total 0
-rw-rw-r-- 1 p2epdlsuat p2epdlsuat 0 Apr 21 10:43 overview.json

There are indeed archives already in HDFS - I've included some in my initial 
mail, but here they are again just for reference:
-bash-4.1$ hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs
Found 44282 items
-rw-r-   3 delp datalake_admin_dev  50569 2020-03-21 23:17 
/user/p2epda/lake/delp_qa/flink_hs/000144dba9dc0f235768a46b2f26e936
-rw-r-   3 delp datalake_admin_dev  49578 2020-03-03 08:45 
/user/p2epda/lake/delp_qa/flink_hs/000347625d8128ee3fd0b672018e38a5
-rw-r-   3 delp datalake_admin_dev  50842 2020-03-24 15:19 
/user/p2epda/lake/delp_qa/flink_hs/0004be6ce01ba9677d1eb619ad0fa757
...


// ah

From: Chesnay Schepler <ches...@apache.org>
Sent: Monday, April 27, 2020 10:28 AM
To: Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

If historyserver.web.tmpdir is not set then java.io.tmpdir is used, so that 
should be fine.

What are the contents of /local/scratch/flink_historyserver_tmpdir?
I assume there are already archives in HDFS?

On 27/04/2020 16:02, Hailu, Andreas wrote:
My machine's /tmp directory is not large enough to support the archived files, 
so I changed my java.io.tmpdir to be in some other location which is 
significantly larger. I hadn't set anything for historyserver.web.tmpdir, so I 
suspect it was still pointing at /tmp. I just tried setting 
historyserver.web.tmpdir to the same location as my java.io.tmpdir location, 
but I'm afraid I'm still seeing the following issue:

2020-04-27 09:37:42,904 [nioEventLoopGroup-3-4] DEBUG 
HistoryServerStaticFileServerHandler - Unable to load requested file 
/overview.json from classloader
2020-04-27 09:37:42,906 [nioEventLoopGroup-3-6] DEBUG 
HistoryServerStaticFileServerHandler - Unable to load requested file 
/jobs/overview.json from classloader

flink-conf.yaml for reference:
jobmanager.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/
historyserver.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/
historyserver.web.tmpdir: /local/scratch/flink_historyserver_tmpdir/

Did you have anything else in mind when you said pointing somewhere funny?

// ah

From: Chesnay Schepler <ches...@apache.org>
Sent: Monday, April 27, 2020 5:56 AM
To: Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
Subject: Re: History Server Not Showing Any Jobs - File Not Found?


overview.json is a generated file that is placed in the local directory 
controlled by historyserver.web.tmpdir.

Have you configured this option to point to some non-local filesystem? (Or if 
not, is the java.io.tmpdir property pointing somewhere funny?)
On 24/04/2020 18:24, Hailu, Andreas wrote:
I'm having a further look at the code in HistoryServerStaticFileServerHandler - 
is there an assumption about where overview.json is supposed to be located?

// ah

From: Hailu, Andreas [Engineering]
Sent: Wednesday, April 22, 2020 1:32 PM
To: 'Chesnay Schepler' <ches...@apache.org>; Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
Subject: RE: History Server Not Showing Any Jobs - File Not Found?

Re: History Server Not Showing Any Jobs - File Not Found?

2020-04-29 Thread Chesnay Schepler

hmm...let's see if I can reproduce the issue locally.

Are the archives from the same version the history server runs on? 
(Which I supposed would be 1.9.1?)


Just for the sake of narrowing things down, it would also be interesting 
to check if it works with the archives residing in the local filesystem.


On 27/04/2020 18:35, Hailu, Andreas wrote:


bash-4.1$ ls -l /local/scratch/flink_historyserver_tmpdir/

total 8

drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:43 
flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9


drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:22 
flink-web-history-95b3f928-c60f-4351-9926-766c6ad3ee76


There are just two directories in here. I don’t see cache directories 
from my attempts today, which is interesting. Looking a little deeper 
into them:


bash-4.1$ ls -lr 
/local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9


total 1756

drwxrwxr-x 2 p2epdlsuat p2epdlsuat 1789952 Apr 21 10:44 jobs

bash-4.1$ ls -lr 
/local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9/jobs


total 0

-rw-rw-r-- 1 p2epdlsuat p2epdlsuat 0 Apr 21 10:43 overview.json

There are indeed archives already in HDFS – I’ve included some in my 
initial mail, but here they are again just for reference:


-bash-4.1$ hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs

Found 44282 items

-rw-r-   3 delp datalake_admin_dev  50569 2020-03-21 23:17 
/user/p2epda/lake/delp_qa/flink_hs/000144dba9dc0f235768a46b2f26e936


-rw-r-   3 delp datalake_admin_dev  49578 2020-03-03 08:45 
/user/p2epda/lake/delp_qa/flink_hs/000347625d8128ee3fd0b672018e38a5


-rw-r-   3 delp datalake_admin_dev  50842 2020-03-24 15:19 
/user/p2epda/lake/delp_qa/flink_hs/0004be6ce01ba9677d1eb619ad0fa757


...

// ah

From: Chesnay Schepler
Sent: Monday, April 27, 2020 10:28 AM
To: Hailu, Andreas [Engineering]; user@flink.apache.org

Subject: Re: History Server Not Showing Any Jobs - File Not Found?

If historyserver.web.tmpdir is not set then java.io.tmpdir is used, so 
that should be fine.


What are the contents of /local/scratch/flink_historyserver_tmpdir?

I assume there are already archives in HDFS?

On 27/04/2020 16:02, Hailu, Andreas wrote:

My machine’s /tmp directory is not large enough to support the
archived files, so I changed my java.io.tmpdir to be in some other
location which is significantly larger. I hadn’t set anything for
historyserver.web.tmpdir, so I suspect it was still pointing at
/tmp. I just tried setting historyserver.web.tmpdir to the same
location as my java.io.tmpdir location, but I’m afraid I’m still
seeing the following issue:

2020-04-27 09:37:42,904 [nioEventLoopGroup-3-4] DEBUG
HistoryServerStaticFileServerHandler - Unable to load requested
file /overview.json from classloader

2020-04-27 09:37:42,906 [nioEventLoopGroup-3-6] DEBUG
HistoryServerStaticFileServerHandler - Unable to load requested
file /jobs/overview.json from classloader

flink-conf.yaml for reference:

jobmanager.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

historyserver.archive.fs.dir:
hdfs:///user/p2epda/lake/delp_qa/flink_hs/

historyserver.web.tmpdir: /local/scratch/flink_historyserver_tmpdir/

Did you have anything else in mind when you said pointing
somewhere funny?

// ah

From: Chesnay Schepler <ches...@apache.org>
Sent: Monday, April 27, 2020 5:56 AM
To: Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

overview.json is a generated file that is placed in the local
directory controlled by historyserver.web.tmpdir.

Have you configured this option to point to some non-local
filesystem? (Or if not, is the java.io.tmpdir property pointing
somewhere funny?)

On 24/04/2020 18:24, Hailu, Andreas wrote:

I’m having a further look at the code in
HistoryServerStaticFileServerHandler - is there an assumption
about where overview.json is supposed to be located?

// ah

From: Hailu, Andreas [Engineering]
Sent: Wednesday, April 22, 2020 1:32 PM
To: 'Chesnay Schepler' <ches...@apache.org>; Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
Subject: RE: History Server Not Showing Any Jobs - File Not Found?

Hi Chesnay, thanks for responding. We’re using Flink 1.9.1. I
enabled DEBUG level logging and this is something relevant I see:

2020-04-22 13:25:52,566
[Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG
DFSInputStream - Connecting to datanode 10.79.252.101:1019

RE: History Server Not Showing Any Jobs - File Not Found?

2020-04-27 Thread Hailu, Andreas
bash-4.1$ ls -l /local/scratch/flink_historyserver_tmpdir/
total 8
drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:43 
flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9
drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:22 
flink-web-history-95b3f928-c60f-4351-9926-766c6ad3ee76

There are just two directories in here. I don't see cache directories from my 
attempts today, which is interesting. Looking a little deeper into them:

bash-4.1$ ls -lr 
/local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9
total 1756
drwxrwxr-x 2 p2epdlsuat p2epdlsuat 1789952 Apr 21 10:44 jobs
bash-4.1$ ls -lr 
/local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9/jobs
total 0
-rw-rw-r-- 1 p2epdlsuat p2epdlsuat 0 Apr 21 10:43 overview.json

There are indeed archives already in HDFS - I've included some in my initial 
mail, but here they are again just for reference:
-bash-4.1$ hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs
Found 44282 items
-rw-r-   3 delp datalake_admin_dev  50569 2020-03-21 23:17 
/user/p2epda/lake/delp_qa/flink_hs/000144dba9dc0f235768a46b2f26e936
-rw-r-   3 delp datalake_admin_dev  49578 2020-03-03 08:45 
/user/p2epda/lake/delp_qa/flink_hs/000347625d8128ee3fd0b672018e38a5
-rw-r-   3 delp datalake_admin_dev  50842 2020-03-24 15:19 
/user/p2epda/lake/delp_qa/flink_hs/0004be6ce01ba9677d1eb619ad0fa757
...


// ah

From: Chesnay Schepler 
Sent: Monday, April 27, 2020 10:28 AM
To: Hailu, Andreas [Engineering] ; 
user@flink.apache.org
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

If historyserver.web.tmpdir is not set then java.io.tmpdir is used, so that 
should be fine.

What are the contents of /local/scratch/flink_historyserver_tmpdir?
I assume there are already archives in HDFS?

On 27/04/2020 16:02, Hailu, Andreas wrote:
My machine's /tmp directory is not large enough to support the archived files, 
so I changed my java.io.tmpdir to be in some other location which is 
significantly larger. I hadn't set anything for historyserver.web.tmpdir, so I 
suspect it was still pointing at /tmp. I just tried setting 
historyserver.web.tmpdir to the same location as my java.io.tmpdir location, 
but I'm afraid I'm still seeing the following issue:

2020-04-27 09:37:42,904 [nioEventLoopGroup-3-4] DEBUG 
HistoryServerStaticFileServerHandler - Unable to load requested file 
/overview.json from classloader
2020-04-27 09:37:42,906 [nioEventLoopGroup-3-6] DEBUG 
HistoryServerStaticFileServerHandler - Unable to load requested file 
/jobs/overview.json from classloader

flink-conf.yaml for reference:
jobmanager.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/
historyserver.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/
historyserver.web.tmpdir: /local/scratch/flink_historyserver_tmpdir/

Did you have anything else in mind when you said pointing somewhere funny?

// ah

From: Chesnay Schepler <ches...@apache.org>
Sent: Monday, April 27, 2020 5:56 AM
To: Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
Subject: Re: History Server Not Showing Any Jobs - File Not Found?


overview.json is a generated file that is placed in the local directory 
controlled by historyserver.web.tmpdir.

Have you configured this option to point to some non-local filesystem? (Or if 
not, is the java.io.tmpdir property pointing somewhere funny?)
On 24/04/2020 18:24, Hailu, Andreas wrote:
I'm having a further look at the code in HistoryServerStaticFileServerHandler - 
is there an assumption about where overview.json is supposed to be located?

// ah

From: Hailu, Andreas [Engineering]
Sent: Wednesday, April 22, 2020 1:32 PM
To: 'Chesnay Schepler' <ches...@apache.org>; Hailu, Andreas [Engineering] <andreas.ha...@ny.email.gs.com>; user@flink.apache.org
Subject: RE: History Server Not Showing Any Jobs - File Not Found?

Hi Chesnay, thanks for responding. We're using Flink 1.9.1. I enabled DEBUG 
level logging and this is something relevant I see:

2020-04-22 13:25:52,566 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG 
DFSInputStream - Connecting to datanode 10.79.252.101:1019
2020-04-22 13:25:52,567 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG 
SaslDataTransferClient - SASL encryption trust check: localHostTrusted = false, 
remoteHostTrusted = false
2020-04-22 13:25:52,567 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG 
SaslDataTransferClient - SASL client skipping handshake in secured 
configuration with privileged port for addr = /10.79.252.101, datanodeId = DatanodeInfoWithStorage[10.79.252.101:1019,DS-7f4ec55d-7c5f-4a0e-b817-d9e635480b21,DISK]
2020-04-22 13:25:52,571 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG 
DFSInputStream - DFSInputStream has been closed already
2020-04-22 13:25:52,573 [nioEventLoopGroup-3-6] DEBUG HistoryServerStaticFileServerHandler - Unable to load requested file /jobs/overview.json from classloader

Re: History Server Not Showing Any Jobs - File Not Found?

2020-04-27 Thread Chesnay Schepler
If historyserver.web.tmpdir is not set then java.io.tmpdir is used, so 
that should be fine.


What are the contents of /local/scratch/flink_historyserver_tmpdir?
I assume there are already archives in HDFS?
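
A sketch of how to confirm what java.io.tmpdir resolves to (standard JDK tooling, nothing Flink-specific assumed):

  java -XshowSettings:properties -version 2>&1 | grep java.io.tmpdir
  # or, for an already running HistoryServer process:
  jcmd <pid> VM.system_properties | grep java.io.tmpdir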

On 27/04/2020 16:02, Hailu, Andreas wrote:


My machine’s /tmp directory is not large enough to support the 
archived files, so I changed my java.io.tmpdir to be in some other 
location which is significantly larger. I hadn’t set anything for 
historyserver.web.tmpdir, so I suspect it was still pointing at /tmp. 
I just tried setting historyserver.web.tmpdir to the same location as 
my java.io.tmpdir location, but I’m afraid I’m still seeing the 
following issue:


2020-04-27 09:37:42,904 [nioEventLoopGroup-3-4] DEBUG 
HistoryServerStaticFileServerHandler - Unable to load requested file 
/overview.json from classloader


2020-04-27 09:37:42,906 [nioEventLoopGroup-3-6] DEBUG 
HistoryServerStaticFileServerHandler - Unable to load requested file 
/jobs/overview.json from classloader


flink-conf.yaml for reference:

jobmanager.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

historyserver.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

historyserver.web.tmpdir: /local/scratch/flink_historyserver_tmpdir/

Did you have anything else in mind when you said pointing somewhere funny?

// ah

From: Chesnay Schepler
Sent: Monday, April 27, 2020 5:56 AM
To: Hailu, Andreas [Engineering]; user@flink.apache.org

Subject: Re: History Server Not Showing Any Jobs - File Not Found?

overview.json is a generated file that is placed in the local
directory controlled by historyserver.web.tmpdir.


Have you configured this option to point to some non-local filesystem? 
(Or if not, is the java.io.tmpdir property pointing somewhere funny?)


On 24/04/2020 18:24, Hailu, Andreas wrote:

I’m having a further look at the code in
HistoryServerStaticFileServerHandler - is there an assumption
about where overview.json is supposed to be located?

// ah

From: Hailu, Andreas [Engineering]
Sent: Wednesday, April 22, 2020 1:32 PM
To: 'Chesnay Schepler' <mailto:ches...@apache.org>; Hailu, Andreas [Engineering]
<mailto:andreas.ha...@ny.email.gs.com>; user@flink.apache.org
<mailto:user@flink.apache.org>
Subject: RE: History Server Not Showing Any Jobs - File Not Found?

Hi Chesnay, thanks for responding. We’re using Flink 1.9.1. I
enabled DEBUG level logging and this is something relevant I see:

2020-04-22 13:25:52,566
[Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG DFSInputStream
- Connecting to datanode 10.79.252.101:1019

2020-04-22 13:25:52,567
[Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG
SaslDataTransferClient - SASL encryption trust check:
localHostTrusted = false, remoteHostTrusted = false

2020-04-22 13:25:52,567
[Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG
SaslDataTransferClient - SASL client skipping handshake in secured
configuration with privileged port for addr = /10.79.252.101,
datanodeId = DatanodeI


nfoWithStorage[10.79.252.101:1019,DS-7f4ec55d-7c5f-4a0e-b817-d9e635480b21,DISK]

2020-04-22 13:25:52,571
[Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG DFSInputStream
- DFSInputStream has been closed already

2020-04-22 13:25:52,573 [nioEventLoopGroup-3-6] DEBUG
HistoryServerStaticFileServerHandler - Unable to load requested
file /jobs/overview.json from classloader

2020-04-22 13:25:52,576 [IPC Parameter Sending Thread #0] DEBUG
Client$Connection$3 - IPC Client (1578587450) connection to
d279536-002.dc.gs.com/10.59.61.87:8020 from d...@gs.com
<mailto:d...@gs.com> sending #1391

Aside from that, it looks like a lot of logging around datanodes
and block location metadata. Did I miss something in my classpath,
perhaps? If so, do you have a suggestion on what I could try?

// ah

From: Chesnay Schepler <mailto:ches...@apache.org>
Sent: Wednesday, April 22, 2020 2:16 AM
To: Hailu, Andreas [Engineering] <mailto:andreas.ha...@ny.email.gs.com>; user@flink.apache.org
<mailto:user@flink.apache.org>
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

Which Flink version are you using?

Have you checked the history server logs after enabling debug logging?

On 21/04/2020 17:16, Hailu, Andreas [Engineering] wrote:

Hi,

    I’m trying to set up the History Server, but none of my
applications are showing up in the Web UI. Looking at the
console, I see that all of the calls to /overview return the
following 404 response: {"errors":["File not found."]}.

I’ve set up my configuration as follows:

JobManager Archive directory:

jobmanager.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

-bash-4.1$ 

RE: History Server Not Showing Any Jobs - File Not Found?

2020-04-27 Thread Hailu, Andreas
My machine's /tmp directory is not large enough to support the archived files, 
so I changed my java.io.tmpdir to be in some other location which is 
significantly larger. I hadn't set anything for historyserver.web.tmpdir, so I 
suspect it was still pointing at /tmp. I just tried setting 
historyserver.web.tmpdir to the same location as my java.io.tmpdir location, 
but I'm afraid I'm still seeing the following issue:

2020-04-27 09:37:42,904 [nioEventLoopGroup-3-4] DEBUG 
HistoryServerStaticFileServerHandler - Unable to load requested file 
/overview.json from classloader
2020-04-27 09:37:42,906 [nioEventLoopGroup-3-6] DEBUG 
HistoryServerStaticFileServerHandler - Unable to load requested file 
/jobs/overview.json from classloader

flink-conf.yaml for reference:
jobmanager.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/
historyserver.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/
historyserver.web.tmpdir: /local/scratch/flink_historyserver_tmpdir/

Did you have anything else in mind when you said pointing somewhere funny?

// ah

From: Chesnay Schepler 
Sent: Monday, April 27, 2020 5:56 AM
To: Hailu, Andreas [Engineering] ; 
user@flink.apache.org
Subject: Re: History Server Not Showing Any Jobs - File Not Found?


overview.json is a generated file that is placed in the local directory 
controlled by historyserver.web.tmpdir.

Have you configured this option to point to some non-local filesystem? (Or if 
not, is the java.io.tmpdir property pointing somewhere funny?)
On 24/04/2020 18:24, Hailu, Andreas wrote:
I'm having a further look at the code in HistoryServerStaticFileServerHandler - 
is there an assumption about where overview.json is supposed to be located?

// ah

From: Hailu, Andreas [Engineering]
Sent: Wednesday, April 22, 2020 1:32 PM
To: 'Chesnay Schepler' <mailto:ches...@apache.org>; Hailu, 
Andreas [Engineering] 
<mailto:andreas.ha...@ny.email.gs.com>; 
user@flink.apache.org<mailto:user@flink.apache.org>
Subject: RE: History Server Not Showing Any Jobs - File Not Found?

Hi Chesnay, thanks for responding. We're using Flink 1.9.1. I enabled DEBUG 
level logging and this is something relevant I see:

2020-04-22 13:25:52,566 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG 
DFSInputStream - Connecting to datanode 10.79.252.101:1019
2020-04-22 13:25:52,567 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG 
SaslDataTransferClient - SASL encryption trust check: localHostTrusted = false, 
remoteHostTrusted = false
2020-04-22 13:25:52,567 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG 
SaslDataTransferClient - SASL client skipping handshake in secured 
configuration with privileged port for addr = /10.79.252.101, datanodeId = 
DatanodeI
nfoWithStorage[10.79.252.101:1019,DS-7f4ec55d-7c5f-4a0e-b817-d9e635480b21,DISK]
2020-04-22 13:25:52,571 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG 
DFSInputStream - DFSInputStream has been closed already
2020-04-22 13:25:52,573 [nioEventLoopGroup-3-6] DEBUG 
HistoryServerStaticFileServerHandler - Unable to load requested file 
/jobs/overview.json from classloader
2020-04-22 13:25:52,576 [IPC Parameter Sending Thread #0] DEBUG 
Client$Connection$3 - IPC Client (1578587450) connection to 
d279536-002.dc.gs.com/10.59.61.87:8020 from d...@gs.com<mailto:d...@gs.com> 
sending #1391

Aside from that, it looks like a lot of logging around datanodes and block 
location metadata. Did I miss something in my classpath, perhaps? If so, do you 
have a suggestion on what I could try?

// ah

From: Chesnay Schepler mailto:ches...@apache.org>>
Sent: Wednesday, April 22, 2020 2:16 AM
To: Hailu, Andreas [Engineering] 
mailto:andreas.ha...@ny.email.gs.com>>; 
user@flink.apache.org<mailto:user@flink.apache.org>
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

Which Flink version are you using?
Have you checked the history server logs after enabling debug logging?

On 21/04/2020 17:16, Hailu, Andreas [Engineering] wrote:
Hi,

I'm trying to set up the History Server, but none of my applications are 
showing up in the Web UI. Looking at the console, I see that all of the calls 
to /overview return the following 404 response: {"errors":["File not found."]}.

I've set up my configuration as follows:

JobManager Archive directory:
jobmanager.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/
-bash-4.1$ hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs
Found 44282 items
-rw-r-   3 delp datalake_admin_dev  50569 2020-03-21 23:17 
/user/p2epda/lake/delp_qa/flink_hs/000144dba9dc0f235768a46b2f26e936
-rw-r-   3 delp datalake_admin_dev  49578 2020-03-03 08:45 
/user/p2epda/lake/delp_qa/flink_hs/000347625d8128ee3fd0b672018e38a5
-rw-r-   3 delp datalake_admin_dev  50842 2020-03-24 15:19 
/user/p2epda/lake/delp_qa/flink_hs/0004be6ce01ba9677d1eb619ad0fa757
...
...

History Server will fetch the archived jobs from the same location:
historyserver.archive.f

Re: History Server Not Showing Any Jobs - File Not Found?

2020-04-27 Thread Chesnay Schepler
overview.json is a generated file that is placed in the local directory 
controlled by historyserver.web.tmpdir.


Have you configured this option to point to some non-local filesystem? 
(Or if not, is the java.io.tmpdir property pointing somewhere funny?)


On 24/04/2020 18:24, Hailu, Andreas wrote:


I’m having a further look at the code in 
HistoryServerStaticFileServerHandler - is there an assumption about 
where overview.json is supposed to be located?


// ah

From: Hailu, Andreas [Engineering]
Sent: Wednesday, April 22, 2020 1:32 PM
To: 'Chesnay Schepler'; Hailu, Andreas [Engineering]; user@flink.apache.org

Subject: RE: History Server Not Showing Any Jobs - File Not Found?

Hi Chesnay, thanks for responding. We’re using Flink 1.9.1. I enabled 
DEBUG level logging and this is something relevant I see:


2020-04-22 13:25:52,566 [Flink-HistoryServer-ArchiveFetcher-thread-1] 
DEBUG DFSInputStream - Connecting to datanode 10.79.252.101:1019


2020-04-22 13:25:52,567 [Flink-HistoryServer-ArchiveFetcher-thread-1] 
DEBUG SaslDataTransferClient - SASL encryption trust check: 
localHostTrusted = false, remoteHostTrusted = false


2020-04-22 13:25:52,567 [Flink-HistoryServer-ArchiveFetcher-thread-1] 
DEBUG SaslDataTransferClient - SASL client skipping handshake in 
secured configuration with privileged port for addr = /10.79.252.101, 
datanodeId = DatanodeI


nfoWithStorage[10.79.252.101:1019,DS-7f4ec55d-7c5f-4a0e-b817-d9e635480b21,DISK]

2020-04-22 13:25:52,571 [Flink-HistoryServer-ArchiveFetcher-thread-1] 
DEBUG DFSInputStream - DFSInputStream has been closed already


2020-04-22 13:25:52,573 [nioEventLoopGroup-3-6] DEBUG 
HistoryServerStaticFileServerHandler - Unable to load requested file 
/jobs/overview.json from classloader


2020-04-22 13:25:52,576 [IPC Parameter Sending Thread #0] DEBUG 
Client$Connection$3 - IPC Client (1578587450) connection to 
d279536-002.dc.gs.com/10.59.61.87:8020 from d...@gs.com 
<mailto:d...@gs.com> sending #1391


Aside from that, it looks like a lot of logging around datanodes and 
block location metadata. Did I miss something in my classpath, 
perhaps? If so, do you have a suggestion on what I could try?


// ah

From: Chesnay Schepler <mailto:ches...@apache.org>
Sent: Wednesday, April 22, 2020 2:16 AM
To: Hailu, Andreas [Engineering] <mailto:andreas.ha...@ny.email.gs.com>; user@flink.apache.org
<mailto:user@flink.apache.org>

Subject: Re: History Server Not Showing Any Jobs - File Not Found?

Which Flink version are you using?

Have you checked the history server logs after enabling debug logging?

On 21/04/2020 17:16, Hailu, Andreas [Engineering] wrote:

Hi,

    I’m trying to set up the History Server, but none of my
applications are showing up in the Web UI. Looking at the console,
I see that all of the calls to /overview return the following 404
response: {"errors":["File not found."]}.

I’ve set up my configuration as follows:

JobManager Archive directory:

jobmanager.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

-bash-4.1$ hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs

Found 44282 items

-rw-r- 3 delp datalake_admin_dev  50569 2020-03-21 23:17
/user/p2epda/lake/delp_qa/flink_hs/000144dba9dc0f235768a46b2f26e936

-rw-r- 3 delp datalake_admin_dev  49578 2020-03-03 08:45
/user/p2epda/lake/delp_qa/flink_hs/000347625d8128ee3fd0b672018e38a5

-rw-r- 3 delp datalake_admin_dev  50842 2020-03-24 15:19
/user/p2epda/lake/delp_qa/flink_hs/0004be6ce01ba9677d1eb619ad0fa757

...

...

History Server will fetch the archived jobs from the same location:

historyserver.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

So I’m able to confirm that there are indeed archived applications
that I should be able to view in the history server. I’m not able to
find out what file the overview service is looking for from the
repository - any suggestions as to what I could look into next?

Best,

Andreas





RE: History Server Not Showing Any Jobs - File Not Found?

2020-04-24 Thread Hailu, Andreas
I'm having a further look at the code in HistoryServerStaticFileServerHandler - 
is there an assumption about where overview.json is supposed to be located?

// ah

From: Hailu, Andreas [Engineering]
Sent: Wednesday, April 22, 2020 1:32 PM
To: 'Chesnay Schepler' ; Hailu, Andreas [Engineering] 
; user@flink.apache.org
Subject: RE: History Server Not Showing Any Jobs - File Not Found?

Hi Chesnay, thanks for responding. We're using Flink 1.9.1. I enabled DEBUG 
level logging and this is something relevant I see:

2020-04-22 13:25:52,566 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG 
DFSInputStream - Connecting to datanode 10.79.252.101:1019
2020-04-22 13:25:52,567 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG 
SaslDataTransferClient - SASL encryption trust check: localHostTrusted = false, 
remoteHostTrusted = false
2020-04-22 13:25:52,567 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG 
SaslDataTransferClient - SASL client skipping handshake in secured 
configuration with privileged port for addr = /10.79.252.101, datanodeId = 
DatanodeI
nfoWithStorage[10.79.252.101:1019,DS-7f4ec55d-7c5f-4a0e-b817-d9e635480b21,DISK]
2020-04-22 13:25:52,571 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG 
DFSInputStream - DFSInputStream has been closed already
2020-04-22 13:25:52,573 [nioEventLoopGroup-3-6] DEBUG 
HistoryServerStaticFileServerHandler - Unable to load requested file 
/jobs/overview.json from classloader
2020-04-22 13:25:52,576 [IPC Parameter Sending Thread #0] DEBUG 
Client$Connection$3 - IPC Client (1578587450) connection to 
d279536-002.dc.gs.com/10.59.61.87:8020 from d...@gs.com<mailto:d...@gs.com> 
sending #1391

Aside from that, it looks like a lot of logging around datanodes and block 
location metadata. Did I miss something in my classpath, perhaps? If so, do you 
have a suggestion on what I could try?

// ah

From: Chesnay Schepler mailto:ches...@apache.org>>
Sent: Wednesday, April 22, 2020 2:16 AM
To: Hailu, Andreas [Engineering] 
mailto:andreas.ha...@ny.email.gs.com>>; 
user@flink.apache.org<mailto:user@flink.apache.org>
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

Which Flink version are you using?
Have you checked the history server logs after enabling debug logging?

On 21/04/2020 17:16, Hailu, Andreas [Engineering] wrote:
Hi,

I'm trying to set up the History Server, but none of my applications are 
showing up in the Web UI. Looking at the console, I see that all of the calls 
to /overview return the following 404 response: {"errors":["File not found."]}.

I've set up my configuration as follows:

JobManager Archive directory:
jobmanager.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/
-bash-4.1$ hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs
Found 44282 items
-rw-r-   3 delp datalake_admin_dev  50569 2020-03-21 23:17 
/user/p2epda/lake/delp_qa/flink_hs/000144dba9dc0f235768a46b2f26e936
-rw-r-   3 delp datalake_admin_dev  49578 2020-03-03 08:45 
/user/p2epda/lake/delp_qa/flink_hs/000347625d8128ee3fd0b672018e38a5
-rw-r-   3 delp datalake_admin_dev  50842 2020-03-24 15:19 
/user/p2epda/lake/delp_qa/flink_hs/0004be6ce01ba9677d1eb619ad0fa757
...
...

History Server will fetch the archived jobs from the same location:
historyserver.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

So I'm able to confirm that there are indeed archived applications that I 
should be able to view in the history server. I'm not able to find out what file 
the overview service is looking for from the repository - any suggestions as to 
what I could look into next?

Best,
Andreas





RE: History Server Not Showing Any Jobs - File Not Found?

2020-04-22 Thread Hailu, Andreas
Hi Chesnay, thanks for responding. We're using Flink 1.9.1. I enabled DEBUG 
level logging and this is something relevant I see:

2020-04-22 13:25:52,566 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG 
DFSInputStream - Connecting to datanode 10.79.252.101:1019
2020-04-22 13:25:52,567 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG 
SaslDataTransferClient - SASL encryption trust check: localHostTrusted = false, 
remoteHostTrusted = false
2020-04-22 13:25:52,567 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG 
SaslDataTransferClient - SASL client skipping handshake in secured 
configuration with privileged port for addr = /10.79.252.101, datanodeId = 
DatanodeI
nfoWithStorage[10.79.252.101:1019,DS-7f4ec55d-7c5f-4a0e-b817-d9e635480b21,DISK]
2020-04-22 13:25:52,571 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG 
DFSInputStream - DFSInputStream has been closed already
2020-04-22 13:25:52,573 [nioEventLoopGroup-3-6] DEBUG 
HistoryServerStaticFileServerHandler - Unable to load requested file 
/jobs/overview.json from classloader
2020-04-22 13:25:52,576 [IPC Parameter Sending Thread #0] DEBUG 
Client$Connection$3 - IPC Client (1578587450) connection to 
d279536-002.dc.gs.com/10.59.61.87:8020 from d...@gs.com sending #1391

Aside from that, it looks like a lot of logging around datanodes and block 
location metadata. Did I miss something in my classpath, perhaps? If so, do you 
have a suggestion on what I could try?

// ah

From: Chesnay Schepler 
Sent: Wednesday, April 22, 2020 2:16 AM
To: Hailu, Andreas [Engineering] ; 
user@flink.apache.org
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

Which Flink version are you using?
Have you checked the history server logs after enabling debug logging?

On 21/04/2020 17:16, Hailu, Andreas [Engineering] wrote:
Hi,

I'm trying to set up the History Server, but none of my applications are 
showing up in the Web UI. Looking at the console, I see that all of the calls 
to /overview return the following 404 response: {"errors":["File not found."]}.

I've set up my configuration as follows:

JobManager Archive directory:
jobmanager.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/
-bash-4.1$ hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs
Found 44282 items
-rw-r-   3 delp datalake_admin_dev  50569 2020-03-21 23:17 
/user/p2epda/lake/delp_qa/flink_hs/000144dba9dc0f235768a46b2f26e936
-rw-r-   3 delp datalake_admin_dev  49578 2020-03-03 08:45 
/user/p2epda/lake/delp_qa/flink_hs/000347625d8128ee3fd0b672018e38a5
-rw-r-   3 delp datalake_admin_dev  50842 2020-03-24 15:19 
/user/p2epda/lake/delp_qa/flink_hs/0004be6ce01ba9677d1eb619ad0fa757
...
...

History Server will fetch the archived jobs from the same location:
historyserver.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

So I'm able to confirm that there are indeed archived applications that I 
should be able to view in the history server. I'm not able to find out what file 
the overview service is looking for from the repository - any suggestions as to 
what I could look into next?

Best,
Andreas





Re: History Server Not Showing Any Jobs - File Not Found?

2020-04-22 Thread Chesnay Schepler

Which Flink version are you using?
Have you checked the history server logs after enabling debug logging?

On 21/04/2020 17:16, Hailu, Andreas [Engineering] wrote:


Hi,

I’m trying to set up the History Server, but none of my applications 
are showing up in the Web UI. Looking at the console, I see that all 
of the calls to /overview return the following 404 response: 
{"errors":["File not found."]}.


I’ve set up my configuration as follows:

JobManager Archive directory:

jobmanager.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

-bash-4.1$ hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs

Found 44282 items

-rw-r- 3 delp datalake_admin_dev  50569 2020-03-21 23:17 
/user/p2epda/lake/delp_qa/flink_hs/000144dba9dc0f235768a46b2f26e936


-rw-r- 3 delp datalake_admin_dev  49578 2020-03-03 08:45 
/user/p2epda/lake/delp_qa/flink_hs/000347625d8128ee3fd0b672018e38a5


-rw-r- 3 delp datalake_admin_dev  50842 2020-03-24 15:19 
/user/p2epda/lake/delp_qa/flink_hs/0004be6ce01ba9677d1eb619ad0fa757


...

...

History Server will fetch the archived jobs from the same location:

historyserver.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

So I’m able to confirm that there are indeed archived applications 
that I should be able to view in the history server. I’m not able to find 
out what file the overview service is looking for from the repository 
- any suggestions as to what I could look into next?


Best,

Andreas









History Server Not Showing Any Jobs - File Not Found?

2020-04-21 Thread Hailu, Andreas [Engineering]
Hi,

I'm trying to set up the History Server, but none of my applications are 
showing up in the Web UI. Looking at the console, I see that all of the calls 
to /overview return the following 404 response: {"errors":["File not found."]}.

I've set up my configuration as follows:

JobManager Archive directory:
jobmanager.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/
-bash-4.1$ hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs
Found 44282 items
-rw-r-   3 delp datalake_admin_dev  50569 2020-03-21 23:17 
/user/p2epda/lake/delp_qa/flink_hs/000144dba9dc0f235768a46b2f26e936
-rw-r-   3 delp datalake_admin_dev  49578 2020-03-03 08:45 
/user/p2epda/lake/delp_qa/flink_hs/000347625d8128ee3fd0b672018e38a5
-rw-r-   3 delp datalake_admin_dev  50842 2020-03-24 15:19 
/user/p2epda/lake/delp_qa/flink_hs/0004be6ce01ba9677d1eb619ad0fa757
...
...

History Server will fetch the archived jobs from the same location:
historyserver.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

So I'm able to confirm that there are indeed archived applications that I 
should be able to view in the history server. I'm not able to find out what file 
the overview service is looking for from the repository - any suggestions as to 
what I could look into next?

Best,
Andreas
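
A diagnostic sketch for this symptom: overview.json is served from a local cache that the archive fetcher builds under historyserver.web.tmpdir (which defaults to java.io.tmpdir), not read from HDFS directly, so both ends are worth checking. The host, port, and /tmp default below are assumptions; 8082 is the history server's default port.

```
# the request that returns the 404 in the browser
curl -i http://localhost:8082/jobs/overview
# has the fetcher expanded any archives into the local cache yet?
find /tmp -maxdepth 4 -name overview.json 2>/dev/null
```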





Re: History server UI not working

2020-03-10 Thread Yadong Xie
Hi pwestermann

I believe this is related to
https://issues.apache.org/jira/browse/FLINK-13799

It seems that configuration.features['web-submit'] is missing from the
API when you upgrade from 1.7 to 1.9.2.

Do you have the same problem when upgrading to 1.10? Feel free to ping me if
you still have related problems.
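
A quick check, as a sketch with host and port assumed, of whether the /config response served by the history server includes the features section the 1.9 UI expects; its absence is what FLINK-13799 describes:

```
curl -s http://localhost:8082/config
# the UI reads configuration.features; complain if the section is absent
curl -s http://localhost:8082/config | grep -q '"features"' \
  || echo "features section missing (FLINK-13799)"
```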



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: History server UI not working

2020-03-09 Thread pwestermann
Hey Robert,

I just tried Flink 1.10 and the history server UI works for me too. Only
Flink 1.9.2 is not loading.
Since we were already looking into upgrading to 1.10, I might just do that
now.

Thanks,
Peter 



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: History server UI not working

2020-03-07 Thread Robert Metzger
Hey Peter,

I tried reproducing the error, and for a second I thought the 1.10 release
really broke the web UI, because I saw a pretty similar error.
However, after clearing the cache, the error was gone.
Are you sure that you cleared the cache of your browser?

I have also asked the main contributor working on the web UI to take a look
at this thread.

Best,
Robert

On Fri, Mar 6, 2020 at 5:52 PM pwestermann 
wrote:

> I am seeing this error in firefox:
>
> ERROR TypeError: "this.statusService.configuration.features is undefined"
> t http://10.25.197.60:8082/main.177039bdbab11da4f8ac.js:1
> qr http://10.25.197.60:8082/main.177039bdbab11da4f8ac.js:1
> Gr http://10.25.197.60:8082/main.177039bdbab11da4f8ac.js:1
> ko http://10.25.197.60:8082/main.177039bdbab11da4f8ac.js:1
> Oo http://10.25.197.60:8082/main.177039bdbab11da4f8ac.js:1
> Bo http://10.25.197.60:8082/main.177039bdbab11da4f8ac.js:1
> create http://10.25.197.60:8082/main.177039bdbab11da4f8ac.js:1
> create http://10.25.197.60:8082/main.177039bdbab11da4f8ac.js:1
> bootstrap http://10.25.197.60:8082/main.177039bdbab11da4f8ac.js:1
> _moduleDoBootstrap
> http://10.25.197.60:8082/main.177039bdbab11da4f8ac.js:1
> _moduleDoBootstrap
> http://10.25.197.60:8082/main.177039bdbab11da4f8ac.js:1
> o http://10.25.197.60:8082/main.177039bdbab11da4f8ac.js:1
> invoke http://10.25.197.60:8082/polyfills.b37850e8279bc3caafc9.js:1
> onInvoke http://10.25.197.60:8082/main.177039bdbab11da4f8ac.js:1
> invoke http://10.25.197.60:8082/polyfills.b37850e8279bc3caafc9.js:1
> run http://10.25.197.60:8082/polyfills.b37850e8279bc3caafc9.js:1
> I http://10.25.197.60:8082/polyfills.b37850e8279bc3caafc9.js:1
> invokeTask
> http://10.25.197.60:8082/polyfills.b37850e8279bc3caafc9.js:1
> onInvokeTask http://10.25.197.60:8082/main.177039bdbab11da4f8ac.js:1
> invokeTask
> http://10.25.197.60:8082/polyfills.b37850e8279bc3caafc9.js:1
> runTask http://10.25.197.60:8082/polyfills.b37850e8279bc3caafc9.js:1
> g http://10.25.197.60:8082/polyfills.b37850e8279bc3caafc9.js:1
> invokeTask
> http://10.25.197.60:8082/polyfills.b37850e8279bc3caafc9.js:1
> m http://10.25.197.60:8082/polyfills.b37850e8279bc3caafc9.js:1
> k http://10.25.197.60:8082/polyfills.b37850e8279bc3caafc9.js:1
>
> And this one in Chrome (both on Mac):
>
> main.177039bdbab11da4f8ac.js:1 ERROR TypeError: Cannot read property
> 'web-submit' of undefined
> at new t (main.177039bdbab11da4f8ac.js:1)
> at qr (main.177039bdbab11da4f8ac.js:1)
> at Gr (main.177039bdbab11da4f8ac.js:1)
> at ko (main.177039bdbab11da4f8ac.js:1)
> at Oo (main.177039bdbab11da4f8ac.js:1)
> at Object.Bo [as createRootView] (main.177039bdbab11da4f8ac.js:1)
> at e.create (main.177039bdbab11da4f8ac.js:1)
> at e.create (main.177039bdbab11da4f8ac.js:1)
> at t.bootstrap (main.177039bdbab11da4f8ac.js:1)
> at main.177039bdbab11da4f8ac.js:1
>
> Refreshing doesn't do anything.
>
> Thanks for looking into this,
>
> Peter
>
>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>


Re: History server UI not working

2020-03-06 Thread pwestermann
I am seeing this error in firefox:

ERROR TypeError: "this.statusService.configuration.features is undefined"
t http://10.25.197.60:8082/main.177039bdbab11da4f8ac.js:1
qr http://10.25.197.60:8082/main.177039bdbab11da4f8ac.js:1
Gr http://10.25.197.60:8082/main.177039bdbab11da4f8ac.js:1
ko http://10.25.197.60:8082/main.177039bdbab11da4f8ac.js:1
Oo http://10.25.197.60:8082/main.177039bdbab11da4f8ac.js:1
Bo http://10.25.197.60:8082/main.177039bdbab11da4f8ac.js:1
create http://10.25.197.60:8082/main.177039bdbab11da4f8ac.js:1
create http://10.25.197.60:8082/main.177039bdbab11da4f8ac.js:1
bootstrap http://10.25.197.60:8082/main.177039bdbab11da4f8ac.js:1
_moduleDoBootstrap
http://10.25.197.60:8082/main.177039bdbab11da4f8ac.js:1
_moduleDoBootstrap
http://10.25.197.60:8082/main.177039bdbab11da4f8ac.js:1
o http://10.25.197.60:8082/main.177039bdbab11da4f8ac.js:1
invoke http://10.25.197.60:8082/polyfills.b37850e8279bc3caafc9.js:1
onInvoke http://10.25.197.60:8082/main.177039bdbab11da4f8ac.js:1
invoke http://10.25.197.60:8082/polyfills.b37850e8279bc3caafc9.js:1
run http://10.25.197.60:8082/polyfills.b37850e8279bc3caafc9.js:1
I http://10.25.197.60:8082/polyfills.b37850e8279bc3caafc9.js:1
invokeTask http://10.25.197.60:8082/polyfills.b37850e8279bc3caafc9.js:1
onInvokeTask http://10.25.197.60:8082/main.177039bdbab11da4f8ac.js:1
invokeTask http://10.25.197.60:8082/polyfills.b37850e8279bc3caafc9.js:1
runTask http://10.25.197.60:8082/polyfills.b37850e8279bc3caafc9.js:1
g http://10.25.197.60:8082/polyfills.b37850e8279bc3caafc9.js:1
invokeTask http://10.25.197.60:8082/polyfills.b37850e8279bc3caafc9.js:1
m http://10.25.197.60:8082/polyfills.b37850e8279bc3caafc9.js:1
k http://10.25.197.60:8082/polyfills.b37850e8279bc3caafc9.js:1

And this one in Chrome (both on Mac):

main.177039bdbab11da4f8ac.js:1 ERROR TypeError: Cannot read property
'web-submit' of undefined
at new t (main.177039bdbab11da4f8ac.js:1)
at qr (main.177039bdbab11da4f8ac.js:1)
at Gr (main.177039bdbab11da4f8ac.js:1)
at ko (main.177039bdbab11da4f8ac.js:1)
at Oo (main.177039bdbab11da4f8ac.js:1)
at Object.Bo [as createRootView] (main.177039bdbab11da4f8ac.js:1)
at e.create (main.177039bdbab11da4f8ac.js:1)
at e.create (main.177039bdbab11da4f8ac.js:1)
at t.bootstrap (main.177039bdbab11da4f8ac.js:1)
at main.177039bdbab11da4f8ac.js:1

Refreshing doesn't do anything.

Thanks for looking into this,

Peter



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: History server UI not working

2020-03-06 Thread Robert Metzger
I also suspect a problem with the UI update.

Do the Developer Tools of the browser show any error messages?

On Thu, Mar 5, 2020 at 7:00 AM Yang Wang  wrote:

> If all the REST API endpoints can be viewed successfully, then the cause may
> be the JS cache.
> You could try to force a refresh (e.g. Cmd+Shift+R on Mac). It solved my
> problem before.
>
>
> Best,
> Yang
>
> pwestermann wrote on Wed, Mar 4, 2020 at 8:40 PM:
>
>> We recently upgraded from Flink 1.7 to Flink 1.9.2 and the history server
>> UI
>> now seems to be broken. It doesn't load and always just displays a blank
>> screen.
>> The individual endpoints (e.g. /jobs/overview) still work.
>> Could this be an issue caused by the Angular update for the regular UI?
>>
>>
>>
>>
>> --
>> Sent from:
>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>>
>


Re: History server UI not working

2020-03-04 Thread Yang Wang
If all the REST API endpoints can be viewed successfully, then the cause may be
the JS cache.
You could try to force a refresh (e.g. Cmd+Shift+R on Mac). It solved my
problem before.


Best,
Yang

pwestermann wrote on Wed, Mar 4, 2020 at 8:40 PM:

> We recently upgraded from Flink 1.7 to Flink 1.9.2 and the history server
> UI
> now seems to be broken. It doesn't load and always just displays a blank
> screen.
> The individual endpoints (e.g. /jobs/overview) still work.
> Could this be an issue caused by the Angular update for the regular UI?
>
>
>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>


History server UI not working

2020-03-04 Thread pwestermann
We recently upgraded from Flink 1.7 to Flink 1.9.2 and the history server UI
now seems to be broken. It doesn't load and always just displays a blank
screen. 
The individual endpoints (e.g. /jobs/overview) still work.
Could this be an issue caused by the Angular update for the regular UI?




--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: job history server

2020-02-18 Thread Richard Moorhead
2020-02-18 09:44:45,227 ERROR
org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher  -
Failure while fetching/processing job archive for job eaf0639027aca1624adaa100bdf1332e.
java.nio.file.FileSystemException:
/dev/shm/flink-history-server/jobs/eaf0639027aca1624adaa100bdf1332e/vertices/062e4d80ed1d4bdafd24e462245c5926/subtasks/86/attempts/0.json: No space left on device

and there it is:

42103b5b-5410-d2d8-6a0b-21757e4a0fbc ~
0 % df -iH
Filesystem                 Inodes IUsed IFree IUse% Mounted on
/dev/mapper/vg00-rootlv00    132k   13k  119k   10% /
tmpfs                        508k  465k   43k   92% /dev/shm

Thanks for the tip.
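
In short: the tmpfs still had free bytes, but 92% of its inodes were used, and the history server expands each archived job into many small JSON files (one per vertex/subtask/attempt), so inodes run out long before space does. A monitoring and mitigation sketch, with paths taken from this thread and the example tmpdir path an assumption:

```
# watch IUse%, not just Use%
df -i /dev/shm
# rough count of the files the expanded archives have created
find /dev/shm/flink-history-server -type f | wc -l
# mitigation: point the cache at a disk-backed path instead of tmpfs,
# e.g. in flink-conf.yaml:
#   historyserver.web.tmpdir: /var/tmp/flink-history-server
```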

On Mon, Feb 17, 2020 at 8:08 PM Richard Moorhead 
wrote:

> I did not know that.
>
> I have since wiped the directory. I will post when I see this error again.
>
> On Mon, Feb 17, 2020 at 8:03 PM Benchao Li  wrote:
>
>> `df -H` only gives the sizes, not inodes information. Could you also show
>> us the result of `df -iH`?
>>
>> Richard Moorhead wrote on Tue, Feb 18, 2020 at 9:40 AM:
>>
>>> Yes, I did. I mentioned it last but I should have been clearer:
>>>
>>> 22526:~/ $ df -H
>>>
>>>
>>>  [18:15:20]
>>> FilesystemSize  Used Avail Use% Mounted on
>>> /dev/mapper/vg00-rootlv00
>>>   2.1G  777M  1.2G  41% /
>>> tmpfs 2.1G  753M  1.4G  37% /dev/shm
>>>
>>> On Mon, Feb 17, 2020 at 7:13 PM Benchao Li  wrote:
>>>
>>>> Hi Richard,
>>>>
>>>> Have you checked that inodes of the disk partition were full or not?
>>>>
>>>> Richard Moorhead <richard.moorh...@gmail.com> wrote on Tue, Feb 18, 2020 at 8:16 AM:
>>>>
>>>>> I see the following exception often:
>>>>>
>>>>> 2020-02-17 18:13:26,796 ERROR
>>>>> org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher  -
>>>>> Failure while fetching/processing job archive for job
>>>>> eaf0639027aca1624adaa100bdf1332e.
>>>>> java.nio.file.FileSystemException:
>>>>> /dev/shm/flink-history-server/jobs/eaf0639027aca1624adaa100bdf1332e/vertices/6abf3ed37d1a5e48f2786b832033f074/subtasks/86/attempts:
>>>>> No space left on device
>>>>> at
>>>>> sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
>>>>> at
>>>>> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>>>>> at
>>>>> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>>>>> at
>>>>> sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
>>>>> at java.nio.file.Files.createDirectory(Files.java:674)
>>>>> at
>>>>> java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
>>>>> at java.nio.file.Files.createDirectories(Files.java:767)
>>>>> at
>>>>> org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:186)
>>>>> at
>>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>>> at
>>>>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>>>>> at
>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>>>>> at
>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>>> at java.lang.Thread.run(Thread.java:748)
>>>>>
>>>>>
>>>>> Unfortunately the partition listed does not appear to be full or
>>>>> anywhere near full?
>>>>>
>>>>> Is there a workaround to this?
>>>>>
>>>>>
>>>>
>>>> --
>>>>
>>>> Benchao Li
>>>> School of Electronics Engineering and Computer Science, Peking University
>>>> Tel:+86-15650713730
>>>> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>>>>
>>>>
>>
>> --
>>
>> Benchao Li
>> School of Electronics Engineering and Computer Science, Peking University
>> Tel:+86-15650713730
>> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>>
>>


Re: job history server

2020-02-17 Thread Richard Moorhead
I did not know that.

I have since wiped the directory. I will post when I see this error again.

On Mon, Feb 17, 2020 at 8:03 PM Benchao Li  wrote:

> `df -H` only gives the sizes, not inodes information. Could you also show
> us the result of `df -iH`?
>
> Richard Moorhead wrote on Tue, Feb 18, 2020 at 9:40 AM:
>
>> Yes, I did. I mentioned it last but I should have been clearer:
>>
>> 22526:~/ $ df -H
>>
>>
>>  [18:15:20]
>> FilesystemSize  Used Avail Use% Mounted on
>> /dev/mapper/vg00-rootlv00
>>   2.1G  777M  1.2G  41% /
>> tmpfs 2.1G  753M  1.4G  37% /dev/shm
>>
>> On Mon, Feb 17, 2020 at 7:13 PM Benchao Li  wrote:
>>
>>> Hi Richard,
>>>
>>> Have you checked that inodes of the disk partition were full or not?
>>>
>>> Richard Moorhead wrote on Tue, Feb 18, 2020 at 8:16 AM:
>>>
>>>> I see the following exception often:
>>>>
>>>> 2020-02-17 18:13:26,796 ERROR
>>>> org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher  -
>>>> Failure while fetching/processing job archive for job
>>>> eaf0639027aca1624adaa100bdf1332e.
>>>> java.nio.file.FileSystemException:
>>>> /dev/shm/flink-history-server/jobs/eaf0639027aca1624adaa100bdf1332e/vertices/6abf3ed37d1a5e48f2786b832033f074/subtasks/86/attempts:
>>>> No space left on device
>>>> at
>>>> sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
>>>> at
>>>> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>>>> at
>>>> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>>>> at
>>>> sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
>>>> at java.nio.file.Files.createDirectory(Files.java:674)
>>>> at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
>>>> at java.nio.file.Files.createDirectories(Files.java:767)
>>>> at
>>>> org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:186)
>>>> at
>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>> at
>>>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>>>> at
>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>>>> at
>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>>> at java.lang.Thread.run(Thread.java:748)
>>>>
>>>>
>>>> Unfortunately the partition listed does not appear to be full or
>>>> anywhere near full?
>>>>
>>>> Is there a workaround to this?
>>>>
>>>>
>>>
>>> --
>>>
>>> Benchao Li
>>> School of Electronics Engineering and Computer Science, Peking University
>>> Tel:+86-15650713730
>>> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>>>
>>>
>
> --
>
> Benchao Li
> School of Electronics Engineering and Computer Science, Peking University
> Tel:+86-15650713730
> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>
>


Re: job history server

2020-02-17 Thread Benchao Li
`df -H` only gives the sizes, not inodes information. Could you also show
us the result of `df -iH`?

Richard Moorhead wrote on Tue, Feb 18, 2020 at 9:40 AM:

> Yes, I did. I mentioned it last but I should have been clearer:
>
> 22526:~/ $ df -H
>
>
>[18:15:20]
> FilesystemSize  Used Avail Use% Mounted on
> /dev/mapper/vg00-rootlv00
>   2.1G  777M  1.2G  41% /
> tmpfs 2.1G  753M  1.4G  37% /dev/shm
>
> On Mon, Feb 17, 2020 at 7:13 PM Benchao Li  wrote:
>
>> Hi Richard,
>>
>> Have you checked that inodes of the disk partition were full or not?
>>
>> Richard Moorhead wrote on Tue, Feb 18, 2020 at 8:16 AM:
>>
>>> I see the following exception often:
>>>
>>> 2020-02-17 18:13:26,796 ERROR
>>> org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher  -
>>> Failure while fetching/processing job archive for job
>>> eaf0639027aca1624adaa100bdf1332e.
>>> java.nio.file.FileSystemException:
>>> /dev/shm/flink-history-server/jobs/eaf0639027aca1624adaa100bdf1332e/vertices/6abf3ed37d1a5e48f2786b832033f074/subtasks/86/attempts:
>>> No space left on device
>>> at
>>> sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
>>> at
>>> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>>> at
>>> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>>> at
>>> sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
>>> at java.nio.file.Files.createDirectory(Files.java:674)
>>> at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
>>> at java.nio.file.Files.createDirectories(Files.java:767)
>>> at
>>> org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:186)
>>> at
>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>> at
>>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>>> at
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>>> at
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>> at java.lang.Thread.run(Thread.java:748)
>>>
>>>
>>> Unfortunately the partition listed does not appear to be full or
>>> anywhere near full?
>>>
>>> Is there a workaround to this?
>>>
>>>
>>
>> --
>>
>> Benchao Li
>> School of Electronics Engineering and Computer Science, Peking University
>> Tel:+86-15650713730
>> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>>
>>

-- 

Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: libenc...@gmail.com; libenc...@pku.edu.cn


Re: job history server

2020-02-17 Thread Richard Moorhead
Yes, I did. I mentioned it last but I should have been clearer:

22526:~/ $ df -H                                                    [18:15:20]
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/vg00-rootlv00  2.1G  777M  1.2G  41% /
tmpfs                      2.1G  753M  1.4G  37% /dev/shm

On Mon, Feb 17, 2020 at 7:13 PM Benchao Li  wrote:

> Hi Richard,
>
> Have you checked that inodes of the disk partition were full or not?
>
> Richard Moorhead wrote on Tue, Feb 18, 2020 at 8:16 AM:
>
>> I see the following exception often:
>>
>> 2020-02-17 18:13:26,796 ERROR
>> org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher  -
>> Failure while fetching/processing job archive for job
>> eaf0639027aca1624adaa100bdf1332e.
>> java.nio.file.FileSystemException:
>> /dev/shm/flink-history-server/jobs/eaf0639027aca1624adaa100bdf1332e/vertices/6abf3ed37d1a5e48f2786b832033f074/subtasks/86/attempts:
>> No space left on device
>> at
>> sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
>> at
>> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>> at
>> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>> at
>> sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
>> at java.nio.file.Files.createDirectory(Files.java:674)
>> at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
>> at java.nio.file.Files.createDirectories(Files.java:767)
>> at
>> org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:186)
>> at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> at
>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>> at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>> at
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> at java.lang.Thread.run(Thread.java:748)
>>
>>
>> Unfortunately the partition listed does not appear to be full or anywhere
>> near full?
>>
>> Is there a workaround to this?
>>
>>
>
> --
>
> Benchao Li
> School of Electronics Engineering and Computer Science, Peking University
> Tel:+86-15650713730
> Email: libenc...@gmail.com; libenc...@pku.edu.cn
>
>


Re: job history server

2020-02-17 Thread Benchao Li
Hi Richard,

Have you checked that inodes of the disk partition were full or not?

Richard Moorhead wrote on Tue, Feb 18, 2020 at 8:16 AM:

> I see the following exception often:
>
> 2020-02-17 18:13:26,796 ERROR
> org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher  -
> Failure while fetching/processing job archive for job
> eaf0639027aca1624adaa100bdf1332e.
> java.nio.file.FileSystemException:
> /dev/shm/flink-history-server/jobs/eaf0639027aca1624adaa100bdf1332e/vertices/6abf3ed37d1a5e48f2786b832033f074/subtasks/86/attempts:
> No space left on device
> at
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
> at
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> at
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> at
> sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
> at java.nio.file.Files.createDirectory(Files.java:674)
> at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
> at java.nio.file.Files.createDirectories(Files.java:767)
> at
> org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:186)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
>
>
> Unfortunately the partition listed does not appear to be full or anywhere
> near full?
>
> Is there a workaround to this?
>
>

-- 

Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: libenc...@gmail.com; libenc...@pku.edu.cn


job history server

2020-02-17 Thread Richard Moorhead
I see the following exception often:

2020-02-17 18:13:26,796 ERROR
org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher  -
Failure while fetching/processing job archive for job
eaf0639027aca1624adaa100bdf1332e.
java.nio.file.FileSystemException:
/dev/shm/flink-history-server/jobs/eaf0639027aca1624adaa100bdf1332e/vertices/6abf3ed37d1a5e48f2786b832033f074/subtasks/86/attempts:
No space left on device
at
sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
at
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at
sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:384)
at java.nio.file.Files.createDirectory(Files.java:674)
at java.nio.file.Files.createAndCheckIsDirectory(Files.java:781)
at java.nio.file.Files.createDirectories(Files.java:767)
at
org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:186)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)


Unfortunately the partition listed does not appear to be full or anywhere
near full?

Is there a workaround to this?


Re: best practices on getting flink job logs from Hadoop history server?

2019-09-05 Thread Yang Wang
I think the best way to view the logs is the Flink history server.
However, it currently only supports the job graph and exceptions. Maybe
the Flink history server needs to be enhanced so that we could view
logs just as when the cluster is running.


Best,
Yang

Yu Yang wrote on Fri, Sep 6, 2019 at 3:06 AM:

> Hi Yun Tang & Zhu Zhu,
>
> Thanks for the reply!  With your current approach, we will still need to
> search job manager log / yarn client log to find information on job
> id/vertex id --> yarn container id mapping. I am wondering how we can
> propagate this kind of information to the Flink execution graph so that it can
> be stored under the flink history server's archived execution graph. Any
> suggestions about that?
>
> -Yu
>
> On Fri, Aug 30, 2019 at 2:21 AM Yun Tang  wrote:
>
>> Hi  Yu
>>
>> If you have the client job log, you can find your application id in the
>> description below:
>>
>> The Flink YARN client has been started in detached mode. In order to stop
>> Flink on YARN, use the following command or a YARN web interface to stop it:
>> yarn application -kill {appId}
>> Please also note that the temporary files of the YARN session in the home
>> directory will not be removed.
>>
>> Best
>> Yun Tang
>>
>> --
>> *From:* Zhu Zhu 
>> *Sent:* Friday, August 30, 2019 16:24
>> *To:* Yu Yang 
>> *Cc:* user 
>> *Subject:* Re: best practices on getting flink job logs from Hadoop
>> history server?
>>
>> Hi Yu,
>>
>> Regarding #2,
>> Currently we search task deployment log in JM log, which contains info of
>> the container and machine the task deploys to.
>>
>> Regarding #3,
>> You can find the application logs aggregated by machines on DFS, this
>> path of which relies on your YARN config.
>> Each log may still include multiple TM logs. However it can be much
>> smaller than the "yarn logs ..." generated log.
>>
>> Thanks,
>> Zhu Zhu
>>
>> Yu Yang wrote on Fri, Aug 30, 2019 at 3:58 PM:
>>
>> Hi,
>>
>> We run flink jobs through yarn on hadoop clusters. One challenge that we
>> are facing is to simplify flink job log access.
>>
>> The flink job logs can be accessible using "yarn logs $application_id".
>> That approach has a few limitations:
>>
>>1. It is not straightforward to find yarn application id based on
>>flink job id.
>>2. It is difficult to find the corresponding container id for the
>>flink sub tasks.
>>3. For jobs that have many tasks, it is inefficient to use "yarn logs
>>..."  as it mixes logs from all task managers.
>>
>> Any suggestions on the best practice to get logs for completed flink job
>> that run on yarn?
>>
>> Regards,
>> -Yu
>>
>>
>>


Re: best practices on getting flink job logs from Hadoop history server?

2019-09-05 Thread Yu Yang
Hi Yun Tang & Zhu Zhu,

Thanks for the reply!  With your current approach, we will still need to
search job manager log / yarn client log to find information on job
id/vertex id --> yarn container id mapping. I am wondering how we can
propagate this kind of information to the Flink execution graph so that it can
be stored under the flink history server's archived execution graph. Any
suggestions about that?

-Yu

On Fri, Aug 30, 2019 at 2:21 AM Yun Tang  wrote:

> Hi  Yu
>
> If you have the client job log, you can find your application id in the
> description below:
>
> The Flink YARN client has been started in detached mode. In order to stop
> Flink on YARN, use the following command or a YARN web interface to stop it:
> yarn application -kill {appId}
> Please also note that the temporary files of the YARN session in the home
> directory will not be removed.
>
> Best
> Yun Tang
>
> --
> *From:* Zhu Zhu 
> *Sent:* Friday, August 30, 2019 16:24
> *To:* Yu Yang 
> *Cc:* user 
> *Subject:* Re: best practices on getting flink job logs from Hadoop
> history server?
>
> Hi Yu,
>
> Regarding #2,
> Currently we search task deployment log in JM log, which contains info of
> the container and machine the task deploys to.
>
> Regarding #3,
> You can find the application logs aggregated by machines on DFS, this path
> of which relies on your YARN config.
> Each log may still include multiple TM logs. However it can be much
> smaller than the "yarn logs ..." generated log.
>
> Thanks,
> Zhu Zhu
>
> Yu Yang wrote on Fri, Aug 30, 2019 at 3:58 PM:
>
> Hi,
>
> We run flink jobs through yarn on hadoop clusters. One challenge that we
> are facing is to simplify flink job log access.
>
> The flink job logs can be accessible using "yarn logs $application_id".
> That approach has a few limitations:
>
>1. It is not straightforward to find yarn application id based on
>flink job id.
>2. It is difficult to find the corresponding container id for the
>flink sub tasks.
>3. For jobs that have many tasks, it is inefficient to use "yarn logs
>..."  as it mixes logs from all task managers.
>
> Any suggestions on the best practice to get logs for completed flink job
> that run on yarn?
>
> Regards,
> -Yu
>
>
>


Re: best practices on getting flink job logs from Hadoop history server?

2019-08-30 Thread Yun Tang
Hi  Yu

If you have the client job log, you can find your application id in the 
description below:

The Flink YARN client has been started in detached mode. In order to stop Flink 
on YARN, use the following command or a YARN web interface to stop it:
yarn application -kill {appId}
Please also note that the temporary files of the YARN session in the home 
directory will not be removed.

Best
Yun Tang
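
Sketch: if that client log was kept, the application id can be grepped back out of it after the fact; the log file name here is an assumption:

```
grep -oE 'application_[0-9]+_[0-9]+' flink-client.log | sort -u
```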


From: Zhu Zhu 
Sent: Friday, August 30, 2019 16:24
To: Yu Yang 
Cc: user 
Subject: Re: best practices on getting flink job logs from Hadoop history 
server?

Hi Yu,

Regarding #2,
Currently we search task deployment log in JM log, which contains info of the 
container and machine the task deploys to.

Regarding #3,
You can find the application logs aggregated by machines on DFS, this path of 
which relies on your YARN config.
Each log may still include multiple TM logs. However it can be much smaller 
than the "yarn logs ..." generated log.

Thanks,
Zhu Zhu

Yu Yang <mailto:yuyan...@gmail.com> wrote on Fri, Aug 30, 2019 at 3:58 PM:
Hi,

We run flink jobs through yarn on hadoop clusters. One challenge that we are 
facing is to simplify flink job log access.

The flink job logs can be accessible using "yarn logs $application_id". That 
approach has a few limitations:

  1.  It is not straightforward to find yarn application id based on flink job 
id.
  2.  It is difficult to find the corresponding container id for the flink sub 
tasks.
  3.  For jobs that have many tasks, it is inefficient to use "yarn logs ..."  
as it mixes logs from all task managers.

Any suggestions on the best practice to get logs for completed flink job that 
run on yarn?

Regards,
-Yu




Re: best practices on getting flink job logs from Hadoop history server?

2019-08-30 Thread Zhu Zhu
Hi Yu,

Regarding #2,
Currently we search the task deployment log in the JM log, which contains info
on the container and machine the task is deployed to.

Regarding #3,
You can find the application logs, aggregated per machine, on DFS; the path
depends on your YARN config.
Each log may still include multiple TM logs. However, it can be much smaller
than the log generated by "yarn logs ...".

Thanks,
Zhu Zhu
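
A hedged sketch of both routes; the application and container ids below are
hypothetical placeholders, and the DFS layout shown assumes stock YARN
log-aggregation defaults:

```
# Fetch only the log of the container a task was deployed to; the container id
# comes from the deployment entries in the JM log mentioned above. Depending
# on your Hadoop version, "yarn logs" may also require -nodeAddress.
yarn logs -applicationId application_1567000000000_0042 \
          -containerId container_1567000000000_0042_01_000003

# Or read the aggregated per-machine log files straight from DFS; with default
# settings they live under
# {yarn.nodemanager.remote-app-log-dir}/{user}/logs/{applicationId}:
hdfs dfs -ls /tmp/logs/$USER/logs/application_1567000000000_0042
```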

Yu Yang  wrote on Fri, Aug 30, 2019 at 3:58 PM:

> Hi,
>
> We run flink jobs through yarn on hadoop clusters. One challenge that we
> are facing is to simplify flink job log access.
>
> The flink job logs can be accessed using "yarn logs $application_id".
> That approach has a few limitations:
>
>1. It is not straightforward to find yarn application id based on
>flink job id.
>2. It is difficult to find the corresponding container id for the
>flink sub tasks.
>3. For jobs that have many tasks, it is inefficient to use "yarn logs
>..."  as it mixes logs from all task managers.
>
> Any suggestions on the best practice to get logs for completed flink job
> that run on yarn?
>
> Regards,
> -Yu
>
>
>


best practices on getting flink job logs from Hadoop history server?

2019-08-30 Thread Yu Yang
Hi,

We run flink jobs through yarn on hadoop clusters. One challenge that we
are facing is to simplify flink job log access.

The flink job logs can be accessed using "yarn logs $application_id".
That approach has a few limitations:

   1. It is not straightforward to find yarn application id based on flink
   job id.
   2. It is difficult to find the corresponding container id for the flink
   sub tasks.
   3. For jobs that have many tasks, it is inefficient to use "yarn logs
   ..."  as it mixes logs from all task managers.

Any suggestions on the best practice to get logs for completed flink job
that run on yarn?

Regards,
-Yu
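
As a hedged sketch for limitation 1: YARN can list applications by type, and
recent Flink versions register their YARN applications under the "Apache
Flink" type; both the type string and the job name below are assumptions to
verify against your deployment:

```
# List finished Flink applications known to YARN and match on the job name
# that was submitted.
yarn application -list -appStates FINISHED -appTypes "Apache Flink" 2>/dev/null \
  | grep "my-flink-job"
```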


Re: History Server in Kubernetes

2018-08-30 Thread Till Rohrmann
Hi Encho,

currently, the existing image does not support starting a HistoryServer.
The reason is simply that it has not been exposed, although the image
contains everything needed. In order to do this, you would need to extend
the docker-entrypoint.sh script with an additional history-server option.
It could look like the following:

```
if [ "${CMD}" == "${TASK_MANAGER}" ]; then
exec $FLINK_HOME/bin/taskmanager.sh start-foreground "$@"
elif [ "${CMD}" == "history-server" ]; then
exec $FLINK_HOME/bin/historyserver.sh start-foreground "$@"
else
exec $FLINK_HOME/bin/standalone-job.sh start-foreground "$@"
fi
```

Do you want to create a JIRA issue for that and contribute it?

Cheers,
Till
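
A hedged sketch of what using such an entrypoint could look like once it
exists; the image tag and names are illustrative, the archive directory must
already be configured via historyserver.archive.fs.dir in flink-conf.yaml,
and 8082 is the default historyserver.web.port:

```
# Plain Docker: run the same image with the new command.
docker run --name flink-history-server -p 8082:8082 flink:1.6 history-server

# Kubernetes: the same argument can back a Deployment; a quick one-off sketch:
kubectl run flink-history-server --image=flink:1.6 --port=8082 -- history-server
```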

On Thu, Aug 30, 2018 at 9:04 AM Encho Mishinev 
wrote:

> Hello,
>
> I am struggling to find how to run a history server in Kubernetes. The
> docker image takes an argument that starts a jobmanager or a taskmanager,
> but no history server. What's the best way to set up one in K8S?
>
> Thanks,
> Encho
>


History Server in Kubernetes

2018-08-30 Thread Encho Mishinev
Hello,

I am struggling to find how to run a history server in Kubernetes. The
docker image takes an argument that starts a jobmanager or a taskmanager,
but no history server. What's the best way to set up one in K8S?

Thanks,
Encho


Re: History Server

2018-01-16 Thread Eron Wright
As a follow-up question, how well does the history server work for
observing a running job? I'm trying to understand whether, in the
cluster-per-job model, a user would be expected to hop from the Web UI to
the History Server once the job completes.

Thanks

On Wed, Oct 4, 2017 at 3:49 AM, Stephan Ewen <se...@apache.org> wrote:

> To add to this:
>
> The History Server is mainly useful in cases where one runs a
> Flink-cluster-per-job. Once the job finishes, the processes disappear. The
> History Server should be longer lived to make past executions' stats
> available.
>
> On Mon, Sep 25, 2017 at 3:44 PM, Nico Kruber <n...@data-artisans.com>
> wrote:
>
>> Hi Elias,
>> in theory, it could be integrated into a single web interface, but this
>> has not been done so far.
>> I guess the main reason for keeping it separate was probably to have a
>> better separation of concerns: the history server is actually independent
>> of the current JobManager execution and merely displays previous job
>> results, which may also come from different or previously existing
>> JobManager instances that stored history data in its storage directory.
>>
>> Chesnay (cc'd) may elaborate a bit more in case you'd like to change that
>> and integrate the history server (interface) into the JobManager.
>>
>>
>> Nico
>>
>> On Sunday, 24 September 2017 02:48:40 CEST Elias Levy wrote:
>> > I am curious, why is the History Server a separate process and Web UI
>> > instead of being part of the Web Dashboard within the Job Manager?
>>
>>
>>
>


Re: History Server

2017-10-04 Thread Stephan Ewen
To add to this:

The History Server is mainly useful in cases where one runs a
Flink-cluster-per-job. Once the job finishes, the processes disappear. The
History Server should be longer lived to make past executions' stats
available.
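
For reference, a minimal configuration sketch for such a longer-lived history
server; the HDFS path is illustrative, while the keys are Flink's documented
archive/history options:

```
# flink-conf.yaml: JobManagers upload archives of finished jobs here...
jobmanager.archive.fs.dir: hdfs:///completed-jobs/
# ...and the history server polls the same location and serves a web UI.
historyserver.archive.fs.dir: hdfs:///completed-jobs/
historyserver.archive.fs.refresh-interval: 10000
historyserver.web.address: 0.0.0.0
historyserver.web.port: 8082
```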

On Mon, Sep 25, 2017 at 3:44 PM, Nico Kruber <n...@data-artisans.com> wrote:

> Hi Elias,
> in theory, it could be integrated into a single web interface, but this
> has not been done so far.
> I guess the main reason for keeping it separate was probably to have a
> better separation of concerns: the history server is actually independent
> of the current JobManager execution and merely displays previous job
> results, which may also come from different or previously existing
> JobManager instances that stored history data in its storage directory.
>
> Chesnay (cc'd) may elaborate a bit more in case you'd like to change that
> and integrate the history server (interface) into the JobManager.
>
>
> Nico
>
> On Sunday, 24 September 2017 02:48:40 CEST Elias Levy wrote:
> > I am curious, why is the History Server a separate process and Web UI
> > instead of being part of the Web Dashboard within the Job Manager?
>
>
>


Re: History Server

2017-09-25 Thread Nico Kruber
Hi Elias,
in theory, it could be integrated into a single web interface, but this has
not been done so far.
I guess the main reason for keeping it separate was probably to have a better
separation of concerns: the history server is actually independent of the
current JobManager execution and merely displays previous job results, which
may also come from different or previously existing JobManager instances that
stored history data in its storage directory.

Chesnay (cc'd) may elaborate a bit more in case you'd like to change that and 
integrate the history server (interface) into the JobManager.


Nico

On Sunday, 24 September 2017 02:48:40 CEST Elias Levy wrote:
> I am curious, why is the History Server a separate process and Web UI
> instead of being part of the Web Dashboard within the Job Manager?




History Server

2017-09-23 Thread Elias Levy
I am curious, why is the History Server a separate process and Web UI
instead of being part of the Web Dashboard within the Job Manager?