Flink 1.6 Job fails with IllegalStateException: Buffer pool is destroyed.

2018-09-06 Thread 杨力
Hi all, I am encountering a weird problem when running flink 1.6 in yarn per-job clusters. The job fails in about half an hour after it starts. Related logs is attached as an imange. This piece of log comes from one of the taskmanagers. There are not any other related log lines. No ERROR-level

Re: Cannot set classpath which can be used before job is submitted to yarn.

2018-09-06 Thread bupt_ljy
Hi Hequn, I can read it by using the full path, but I want it to be in program's classpath. For example, I use “bin/flink run -m yarn-cluster” to run the program on Server1, and I have a conf file “config.conf" located in “/server/conf” on Server1, I can’t read this file by using

Re: Behaviour of Process Window Function

2018-09-06 Thread vino yang
Hi Harshvardhan, 1) Yes, ProcessWindowFunction extends AbstractRichFunction, through getRuntimeContext,you can access keyed state API. 2) ProcessWindowFunction has given you considerable flexibility, you can based on processing time / event time / timer / it's clear method / customized

Re: After OutOfMemoryError State can not be readed

2018-09-06 Thread vino yang
Hi Edward, >From this log: Caused by: java.io.EOFException, it seems that the state metadata file has been corrupted. But I can't confirm it, maybe Stefan knows more details, Ping him for you. Thanks, vino. Edward Rojas 于2018年9月7日周五 上午1:22写道: > Hello all, > > We are running Flink 1.5.3 on

Re: Cannot set classpath which can be used before job is submitted to yarn.

2018-09-06 Thread Hequn Cheng
Hi bupt, No sure about the answer. Do you mean that you can't read the file from local FS? Have you ever tried load the file through a full path? or you choose a wrong classloader. Best, Hequn On Thu, Sep 6, 2018 at 11:01 PM bupt_ljy wrote: > Hi,all > >I’m using “bin/flink run -m

How to customize schedule mode and result partition type?

2018-09-06 Thread 陈梓立
Hi all, Here I prefer to forcing a task running in LAZY_FROM_SOURCE schedule mode with all ResultPartitionType be BLOCKING. But I cannot find options to config that in StreamExecutionEnvironment, thus using below as a workaround, quite triky. inal StreamExecutionEnvironment env =

Re: Why FlinkKafkaConsumer doesn't subscribe to topics?

2018-09-06 Thread Renjie Liu
Hi, Julio: Is checkpoint enabled in your job? Flink kafka connector only commits offsets when checkpoint is enabled. On Tue, Sep 4, 2018 at 11:43 PM Tzu-Li (Gordon) Tai wrote: > Hi Julio, > > As Renjie had already mentioned, to achieve exactly-once semantics with > the Kafka consumer, Flink

Re: Setting Flink Monitoring API Port on YARN Cluster

2018-09-06 Thread 陈梓立
Hi Austin, `rest.port` is the latest config option to configure "The port that the server listens on / the client connects to.", with deprecated key `web.port` which is with deprecated key `jobmanager.web.port`, so it is enough to config `rest.port` only (at least for 1.6). However, in your case

Setting Flink Monitoring API Port on YARN Cluster

2018-09-06 Thread Austin Cawley-Edwards
Hi everyone, I'm running a YARN session on a cluster with one master and one core and would like to use the Monitoring API programmatically to submit jobs. I have found that the configuration variables are read but ignored when starting the session - it seems to choose a random port each run.

Re: REST: reading completed jobs' details

2018-09-06 Thread Miguel Coimbra
Exactly, that was the problem. Didn't realize the restructured cluster channels all communications to the REST port. Thanks again. Best, On Thu, 6 Sep 2018 at 17:57, Chesnay Schepler wrote: > Did you by chance use the RemoteEnvironment and pass in 6123 as the port? > If so, try using 8081

After OutOfMemoryError State can not be readed

2018-09-06 Thread Edward Rojas
Hello all, We are running Flink 1.5.3 on Kubernetes with RocksDB as statebackend. When performing some load testing we got an /OutOfMemoryError: native memory exhausted/, causing the job to fail and be restarted. After the Taskmanager is restarted, the job is recovered from a Checkpoint, but it

Re: REST: reading completed jobs' details

2018-09-06 Thread Chesnay Schepler
Did you by chance use the RemoteEnvironment and pass in 6123 as the port? If so, try using 8081 instead, which is the REST port. On 06.09.2018 18:24, Miguel Coimbra wrote: Hello Chesnay, Thanks for the information. Decided to move straight away to launching a standalone cluster. I'm now

Re: REST: reading completed jobs' details

2018-09-06 Thread Miguel Coimbra
Hello Chesnay, Thanks for the information. Decided to move straight away to launching a standalone cluster. I'm now having another problem when trying to submit a job through my Java program after launching the standalone cluster. I configured the cluster (flink-1.6.0/conf/flink-conf.yaml) to

Re: Job goes down after 1/145 TMs is lost (NoResourceAvailableException)

2018-09-06 Thread Till Rohrmann
Hi Subramanya, if the container is still running and the TM can simply not connect to the JobManager, then the ResourceManager does not see a problem. The RM things in terms of containers and as long as n containers are running, it won't start new ones. That's the reason why the TM should exit in

Cannot set classpath which can be used before job is submitted to yarn.

2018-09-06 Thread bupt_ljy
Hi,all I’m using “bin/flink run -m yarn-cluster” to run my program on yarn. However, it seems that I can’t add my own files into classpath before the the job is submitted to yarn. For example, I have a conf file, which located in my own conf directory, and I need to load file from the conf

Re: Ask about running multiple jars for different stream jobs

2018-09-06 Thread Rad Rad
Thanks very much. Now it works fine. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Ask about running multiple jars for different stream jobs

2018-09-06 Thread Rad Rad
Thanks very much. Now it works fine. -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Flink failure recovery tooks very long time

2018-09-06 Thread Yun Tang
Hi Kien You could try to kill one TM container by using 'yarn container -signal FORCEFUL_SHUTDOWN' command, and then watch the first checkpoint after job failover. You could view the checkpoint details[1] to see whether exists outlier operator or sub-task which consumed extremely long time to

Re: Flink failure recovery tooks very long time

2018-09-06 Thread trung kien
And here is the snapshot of my checkpoint metrics in normal condition. On Thu, Sep 6, 2018 at 9:21 AM trung kien wrote: > Hi Yun, > > Yes, the job’s status change to Running pretty fast after failure (~ 1 > min). > > As soon as the status change to running, first checkpoint is kick off and >

Re: Flink failure recovery tooks very long time

2018-09-06 Thread trung kien
Hi Yun, Yes, the job’s status change to Running pretty fast after failure (~ 1 min). As soon as the status change to running, first checkpoint is kick off and it took 30 mins. I need to have exactly-one as i maintining some aggregation metric, do you know whats the diffrent between first

Behaviour of Process Window Function

2018-09-06 Thread Harshvardhan Agrawal
Hello, We have a Flink pipeline where we are windowing our data after a keyBy. i.e. myStream.keyBy().window().process(MyIncrementalAggregation(), MyProcessFunction()). I have two questions about the above line of code: 1) Does the state in the process window function qualify as KeyedState or

Re: Missing Calcite SQL functions in table API

2018-09-06 Thread françois lacombe
Thanks Fabian, I didn't notice select() wasn't SQL compliant. sqlQuery works fine, it's all right :) All the best François 2018-09-05 12:30 GMT+02:00 Fabian Hueske : > Hi > > You are using SQL syntax in a Table API query. You have to stick to Table > API syntax or use SQL as > >

Re: Flink failure recovery tooks very long time

2018-09-06 Thread vino yang
Hi trung, Can you provide more information to aid in positioning? For example, the size of the state generated by a checkpoint and more log information, you can try to switch the log level to DEBUG. Thanks, vino. Yun Tang 于2018年9月6日周四 下午7:42写道: > Hi Kien > > From your description, your job

Re: Flink failure recovery tooks very long time

2018-09-06 Thread Yun Tang
Hi Kien >From your description, your job has already started to execute checkpoint >after job failover, which means your job was in RUNNING status. From my point >of view, the actual recovery time should be the time during job's status: >RESTARTING->CREATED->RUNNING[1]. Your trouble sounds

Flink failure recovery tooks very long time

2018-09-06 Thread trung kien
Hi all, I am trying to test failure recovery of a Flink job when a JM or TM goes down. Our target is having job auto restart and back to normal condition in any case. However, what's I am seeing is very strange and hope someone here help me to understand it. When JM or TM went down, I see the

Re: RocksDB Number of Keys Metric

2018-09-06 Thread Andrey Zagrebin
Hi, afaik, the option is not exposed according to the current state of source code. I can see it to be useful and technically possible using: db.getLongProperty(stateColumnFamilyHandle, "rocksdb.estimate-num-keys”); Though couple of things come into my mind to take into account for this feature:

Re: Increased Size of Incremental Checkpoint

2018-09-06 Thread Stefan Richter
Hi, you should expect that the size can vary for some checkpoints, even if the change rate is constant. Some checkpoints will upload compacted replacements for previous checkpoints to prevent that the checkpoint history will grow without bounds. Whenever that happens, the checkpoint does some

Re: org.apache.flink.util.FlinkException: Could not cancel job

2018-09-06 Thread Chang Liu
You are correct. Thanks! I misused the job ID. Sorry for bothering you guys. Best regards, Chang from iPhone > On 4 Sep 2018, at 18:06, Chesnay Schepler wrote: > > Please check that the job ID is correct. > >> On 04.09.2018 15:48, Chang Liu wrote: >> Dear All, >> >> I had the following

Re: Increased Size of Incremental Checkpoint

2018-09-06 Thread Yun Tang
+ user mail list From: Yun Tang Sent: Thursday, September 6, 2018 14:36 To: burgesschen Subject: Re: Increased Size of Incremental Checkpoint Hi I think the "checkpoint size" metrics showed in your graph means the total checkpoint size of each time. The

Re: ListState - elements order

2018-09-06 Thread vino yang
Hi Alexey, The answer is Yes, which preserves the semantics of the List's order of elements. Thank, vino. Alexey Trenikhun 于2018年9月6日周四 上午10:55写道: > Hello, > Does keyed managed ListState preserve elements order, for example if I > call listState.add(e1); listState.add(e2); listState.add(e3);