Re: Change in sub-task id assignment from 1.9 to 1.10?

2020-08-12 Thread Zhu Zhu
that each slot can contain a source task. With config cluster.evenly-spread-out-slots set to true, slots can be evenly distributed in all available taskmanagers in most cases. Thanks, Zhu Zhu Ken Krugler 于2020年8月7日周五 上午5:28写道: > Hi all, > > Was there any change in how sub-tasks get all

[ANNOUNCE] Apache Flink 1.10.2 released

2020-08-24 Thread Zhu Zhu
The Apache Flink community is very happy to announce the release of Apache Flink 1.10.2, which is the first bugfix release for the Apache Flink 1.10 series. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming a

Re: Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread Zhu Zhu
Congratulations Dian! Thanks, Zhu Zhijiang 于2020年8月27日周四 下午6:04写道: > Congrats, Dian! > > -- > From:Yun Gao > Send Time:2020年8月27日(星期四) 17:44 > To:dev ; Dian Fu ; user < > user@flink.apache.org>; user-zh > Subject:Re: Re: [ANNOUNC

Re: Why setAllVerticesInSameSlotSharingGroupByDefault is set to false in batch mode

2020-09-15 Thread Zhu Zhu
Hi Zheng, To divide managed memory for operators[1], we need to consider which tasks will run in the same slot. In batch jobs, vertices in different regions may not run at the same time. If we put them in the same slot sharing group, running tasks may run slower with less managed memory, while man

[ANNOUNCE] Apache Flink 1.11.2 released

2020-09-16 Thread Zhu Zhu
The Apache Flink community is very happy to announce the release of Apache Flink 1.11.2, which is the second bugfix release for the Apache Flink 1.11 series. Apache Flink® is an open-source stream processing framework for distributed, high-performing, always-available, and accurate data streaming

Re: App gets stuck in Created State

2020-09-21 Thread Zhu Zhu
Hi Arpith, All tasks in CREATED state indicates no task is scheduled yet. It is strange it a job gets stuck in this state. Is it possible that you share the job manager log so we can check what is happening there? Thanks, Zhu Arpith P 于2020年9月21日周一 下午3:52写道: > Hi, > > We have Flink 1.8.0 clust

Re: slot problem

2020-11-24 Thread Zhu Zhu
Each task will be assigned a dedicated thread for its data processing. A slot can be shared by multiple tasks if they are in the same slot sharing group[1]. [1] https://ci.apache.org/projects/flink/flink-docs-release-1.10/concepts/runtime.html#task-slots-and-resources Thanks, Zhu ゞ野蠻遊戲χ 于2020年1

Re: How to test flink job recover from checkpoint

2020-03-04 Thread Zhu Zhu
break the network connection between the Flink app and the source service). Thanks, Zhu Zhu Eleanore Jin 于2020年3月5日周四 上午8:40写道: > Hi, > > I have a flink application and checkpoint is enabled, I am running locally > using miniCluster. > > I just wonder if there is a way to simulat

Re: Issue with Could not resolve ResourceManager address akka.tcp://flink

2020-03-26 Thread Zhu Zhu
e root cause. Would you check whether the hostname *prod-bigd-dn11* is resolvable? And whether the port 43757 of that machine is permitted to be accessed? Thanks, Zhu Zhu Vitaliy Semochkin 于2020年3月27日周五 上午1:54写道: > Hi, > > I'm facing an issue similar to > https://issues.apach

Re: Flink job didn't restart when a task failed

2020-04-13 Thread Zhu Zhu
this case. @Bruce would you take a look at the TM log? If the guess is right, in task manager logs there will be one line "Task {} is already in state FAILED." Thanks, Zhu Zhu Till Rohrmann 于2020年4月10日周五 上午12:51写道: > For future reference, here is the issue to track the reconciliat

Re: Flink job didn't restart when a task failed

2020-04-14 Thread Zhu Zhu
to confirm that the line "FOG_PREDICTION_FUNCTION (15/20) ( 3086efd0e57612710d0ea74138c01090) switched from RUNNING to FAILED" does not appear in the JM log, right? This might be an issue that the message was lost on network, which should be a rare case. Do you encounter it often? Thanks,

Re: Flink 1.10.0 failover

2020-04-25 Thread Zhu Zhu
ugh resource? Thanks, Zhu Zhu seeksst 于2020年4月26日周日 上午11:21写道: > Hi, > > > Recently, I find a problem when job failed in 1.10.0, flink didn’t > release resource first. > > > > You can see I used flink on yarn, and it doesn’t allocate task > manager, beacause

Re: Flink 1.10.0 failover

2020-04-26 Thread Zhu Zhu
Seems something bad happened in the task managers and led to heartbeat timeouts. These TMs were not released by flink but lost connections with the master node. I think you need to check the TM log to see what happens there. Thanks, Zhu Zhu seeksst 于2020年4月26日周日 下午2:13写道: > Thank you for y

Re: Flink restart strategy on specific exception

2020-05-12 Thread Zhu Zhu
ng to support later. @Till Rohrmann @Gary Yao what do you think? [1] https://lists.apache.org/thread.html/6ed95eb6a91168dba09901e158bc1b6f4b08f1e176db4641f79de765%40%3Cdev.flink.apache.org%3E Thanks, Zhu Zhu Ken Krugler 于2020年5月13日周三 上午7:34写道: > Hi Til, > > Sorry, missed the key q

Re: Flink restart strategy on specific exception

2020-05-14 Thread Zhu Zhu
Ticket FLINK-17714 is created to track this requirement. Thanks, Zhu Zhu Till Rohrmann 于2020年5月13日周三 下午8:30写道: > Yes, you are right Zhu Zhu. Extending > the RestartBackoffTimeStrategyFactoryLoader to also load custom > RestartBackoffTimeStrategies sound like a good improvement for t

Re: [ANNOUNCE] Apache Flink 1.10.1 released

2020-05-17 Thread Zhu Zhu
Thanks Yu for being the release manager. Thanks everyone who made this release possible! Thanks, Zhu Zhu Benchao Li 于2020年5月15日周五 下午7:51写道: > Thanks Yu for the great work, and everyone else who made this possible. > > Dian Fu 于2020年5月15日周五 下午6:55写道: > >> Thanks Yu for man

Re: Apache Flink - Question about application restart

2020-05-24 Thread Zhu Zhu
Hi M, Regarding your questions: 1. yes. The id is fixed once the job graph is generated. 2. yes Regarding yarn mode: 1. the job id keeps the same because the job graph will be generated once at client side and persist in DFS for reuse 2. yes if high availability is enabled Thanks, Zhu Zhu M

Re: Apache Flink - Question about application restart

2020-05-27 Thread Zhu Zhu
mentioned. One example is that you can submit an application to a cluster multiple times at the same time, different JobIDs are needed to differentiate them. Thanks, Zhu Zhu Till Rohrmann 于2020年5月27日周三 下午10:05写道: > Hi, > > if you submit the same job multiple times, then it will get eve

Re: Apache Flink - Question about application restart

2020-05-28 Thread Zhu Zhu
) but each submission will be treated as a different job and will have a different job id. Thanks, Zhu Zhu M Singh 于2020年5月29日周五 上午4:59写道: > Thanks Till - in the case of restart of flink master - I believe the jobid > will be different. Thanks > > On Thursday, May 28, 2020, 11:

Re: NoResourceAvailableException and JobNotFound Errors

2020-06-02 Thread Zhu Zhu
of your job is c23a172cda6cc659296af6452ff57f45, but the REST request is get the info of job be3d6b9751b6e9c509b9bedeb581a72e. Thanks, Zhu Zhu Prasanna kumar 于2020年6月3日周三 上午2:16写道: > Hi , > > I am running flink locally in my machine with following configurations. > > # The

Re: How does TaskManager announce JobManager about available ResultPartitions?

2020-07-20 Thread Zhu Zhu
Hi Joseph, The availability of pipelined result partition is notified to JM via scheduleOrUpdateConsumers RPC. Just want to mention that it's better to send such questions to the user mail list. Thanks, Zhu Zhu Fork Joseph 于2020年7月21日周二 上午3:30写道: > Hi, > > According to

Re: Flink jobs getting finished because of "Could not allocate the required slot within slot request timeout"

2020-07-29 Thread Zhu Zhu
resources for a requested container can be found in Flink JM log. Thanks, Zhu Zhu mars 于2020年7月29日周三 下午10:52写道: > Hi All, > > I have an EMR Cluster with one Master Node and 3 worker Nodes ( it has > auto > scaling enabled and the max no.of worker nodes can go up to 8). > > I have

Re: In JobGraph, does an IntermediateDataSet contain multiple JobEdges?

2020-08-04 Thread Zhu Zhu
What you obsessed is right. At the moment, one IntermediateDataSet can have one only consumer job edge in production code path. Thanks, Zhu Zhu yuehan1 于2020年8月4日周二 下午5:14写道: > IntermediateDataSet.java has a JobEdge list named consumers. > In which case, an IntermediateDataSet co

Re: [ANNOUNCE] Apache Flink 1.10.3 released

2021-01-31 Thread Zhu Zhu
Thanks Xintong for being the release manager and everyone who helped with the release! Cheers, Zhu Dian Fu 于2021年1月29日周五 下午5:56写道: > Thanks Xintong for driving this release! > > Regards, > Dian > > 在 2021年1月29日,下午5:24,Till Rohrmann 写道: > > Thanks Xintong for being our release manager. Well don

Re: [ANNOUNCE] Apache Flink 1.12.2 released

2021-03-08 Thread Zhu Zhu
Thanks Roman and Yuan for being the release managers! Thanks everyone who has made this release possible! Cheers, Zhu Piotr Nowojski 于2021年3月6日周六 上午12:38写道: > Thanks Roman and Yuan for your work and driving the release process :) > > pt., 5 mar 2021 o 15:53 Till Rohrmann napisał(a): > >> Great

Re: [ANNOUNCE] Apache Flink 1.13.0 released

2021-05-06 Thread Zhu Zhu
Thanks Dawid and Guowei for being the release managers! And thanks everyone who has made this release possible! Thanks, Zhu Yun Tang 于2021年5月6日周四 下午2:30写道: > Thanks for Dawid and Guowei's great work, and thanks for everyone involved > for this release. > > Best > Yun Tang >

Re: Jobmanagers are in a crash loop after upgrade from 1.12.2 to 1.13.1

2021-06-30 Thread Zhu Zhu
Hi Shilpa, JobType was introduced in 1.13. So I guess the cause is that the client which creates and submit the job is still 1.12.2. The client generates a outdated job graph which does not have its JobType set and resulted in this NPE problem. Thanks, Zhu Austin Cawley-Edwards 于2021年7月1日周四 上午1

Re: Job manager failing because Flink does not find checkpoints on HDFS

2019-08-01 Thread Zhu Zhu
Hi Farouk, This issue does not relate to checkpoints. The JM launching fails due to the job's user jar blob is missing on HDFS. Does this issue always happen? If it rarely occurs, the file might be unexpectedly deleted by someone else. Thanks, Zhu Zhu Farouk 于2019年8月1日周四 下午5:22写道: > H

Re: Streaming from a file

2019-08-01 Thread Zhu Zhu
Hi Vishwas, Not sure whether I understand your needs correctly. I think currently readTextFile(path) does return a DataStream. From the code it is emitting one line once it is read from the file, thus in a line-by-line streaming pattern. Thanks, Zhu Zhu Vishwas Siravara 于2019年8月1日周四 下午11:50写道

Re: Flink issue

2019-08-01 Thread Zhu Zhu
Hi Karthick, >From the log seems the TM "flink-taskmanager-b/2:6121" is lost unexpectedly. You may need to check the log of that TM to see why it exits, which should be the root cause. Thanks, Zhu Zhu Karthick Thanigaimani 于2019年8月2日周五 下午1:54写道: > Hi Team, > We are facing f

Re: Flink issue

2019-08-02 Thread Zhu Zhu
is needed to check why the TM exits, for internal failure(e.g. cancel timeout) or killed by other services(e.g. K8S). I'm not familiar with K8S so I'm not sure in which case the cluster may kill the TM. Thanks, Zhu Zhu Karthick Thanigaimani 于2019年8月2日周五 下午2:48写道: > Thanks Zhu. That&#

Re: Flink issue

2019-08-02 Thread Zhu Zhu
For network issue, this answer might help http://mail-archives.apache.org/mod_mbox/flink-user/201907.mbox/%3cdb49d6f2-1a6b-490a-8502-edd9562b0163.yungao...@aliyun.com%3E . Thanks, Zhu Zhu Karthick Thanigaimani 于2019年8月2日周五 下午5:34写道: > Yes Zhu, the server is online and it didn't get die

Re: [ANNOUNCE] Hequn becomes a Flink committer

2019-08-07 Thread Zhu Zhu
Congratulations to Hequn! Thanks, Zhu Zhu Zili Chen 于2019年8月7日周三 下午5:16写道: > Congrats Hequn! > > Best, > tison. > > > Jeff Zhang 于2019年8月7日周三 下午5:14写道: > >> Congrats Hequn! >> >> Paul Lam 于2019年8月7日周三 下午5:08写道: >> >>> Congrats Hequn! W

Re: Passing jvm options to flink

2019-08-07 Thread Zhu Zhu
han one URL. The protocol must be supported by the " + "{@link java.net.URLClassLoader}."* I think it should work. Thanks, Zhu Zhu Vishwas Siravara 于2019年8月8日周四 上午1:03写道: > Hi , > I am running flink on a standalone cluster without any resource manager > like yarn or K

Re: NoClassDefFoundError in failing-restarting job that uses url classloader

2019-08-08 Thread Zhu Zhu
cause. Thanks, Zhu Zhu Subramanyam Ramanathan 于2019年8月9日周五 上午1:45写道: > > > Hello, > > > > I'm currently using flink 1.7.2. > > > > I'm trying to run a job that's submitted programmatically using the > ClusterClient API. > >

Re: some slots are not be available,when job is not running

2019-08-09 Thread Zhu Zhu
Hi pengchengling, Does this issue happen before you submitting any job to the cluster or after some jobs are terminated? If it's the latter case, didi you wait for a while to see if the unavailable slots became available again? Thanks, Zhu Zhu pengcheng...@bonc.com.cn 于2019年8月9日周五 下午4

Re: Strange DataSet behavior when using custom FileInputFormat

2019-08-09 Thread Zhu Zhu
data are some how corrupted, this case may happen. Do you have the detailed error info that why your program exits? That can be helpful to identify the root cause. Thanks, Zhu Zhu Hynek Noll 于2019年8月9日周五 下午8:59写道: > Hi, > I'm trying to implement a custom FileInputFormat (to rea

Re: NoClassDefFoundError in failing-restarting job that uses url classloader

2019-08-10 Thread Zhu Zhu
file exists but it has some static initialization process which may fail. This can also lead to the class to not be loaded and cause NoClassDefFoundError. Thanks, Zhu Zhu Subramanyam Ramanathan 于2019年8月10日周六 下午2:38写道: > Hi. > > > > 1) The url pattern example : > file:///r

Re: How many task managers can Flink efficiently scale to?

2019-08-11 Thread Zhu Zhu
JM main thread and increased computation complexity of each RPC handling. Thanks, Zhu Zhu qi luo 于2019年8月11日周日 下午6:17写道: > Hi Chad, > > In our cases, 1~2k TMs with up to ~10k TM slots are used in one job. In > general, the CPU/memory of Job Manager should be increased with more TMs.

Re: Why available task slots are not leveraged for pipeline?

2019-08-11 Thread Zhu Zhu
Hi Cam, This case is expected due to slot sharing. A slot can be shared by one instance of different tasks. So the used slot is count of your max parallelism of a task. You can specify the shared group with slotSharingGroup(String slotSharingGroup) on operators. Thanks, Zhu Zhu Abhishek Jain 于

Re: Why Job Manager die/restarted when Task Manager die/restarted?

2019-08-11 Thread Zhu Zhu
connection to ZK 3. encounters other unexpected fatal errors. In this case we need to check the log to see what happens then Thanks, Zhu Zhu Cam Mach 于2019年8月12日周一 下午12:15写道: > Hello Flink experts, > > We are running Flink under Kubernetes and see that Job Manager > die/restarted w

Re: Why Job Manager die/restarted when Task Manager die/restarted?

2019-08-11 Thread Zhu Zhu
Another possibility is the JM is killed externally, e.g. K8s may kill JM/TM if it exceeds the resource limit. Thanks, Zhu Zhu Zhu Zhu 于2019年8月12日周一 下午1:45写道: > Hi Cam, > > Flink master should not die when getting disconnected with task managers. > It may exit for cases below: >

Re: Why available task slots are not leveraged for pipeline?

2019-08-12 Thread Zhu Zhu
parallelism individually, you can invoke setParallelism on each operator. Thanks, Zhu Zhu Zili Chen 于2019年8月12日周一 下午8:00写道: > Hi Cam, > > If you set parallelism to 60, then you would make use of all 60 slots you > have and > for you case, each slot executes a chained operator contai

Re: How can I pass jvm options to flink when started from command line

2019-08-14 Thread Zhu Zhu
Hi Vishwas, If what you want is to set JVM options for Flink client JVM when running jobs with "flink run", I think export the variable 'JVM_ARGS' does help. Thanks, Zhu Zhu Vishwas Siravara 于2019年8月15日周四 上午4:03写道: > I understand that when I run a flink job from comm

Re: Customize file assignments logic in flink application

2019-08-14 Thread Zhu Zhu
Split*s in one request, you can implement a new *InputSplit* which wraps multiple *FileInputSplit*s. And you may need to define in your *InputFormat* on how to process the new *InputSplit*. Thanks, Zhu Zhu Lu Niu 于2019年8月15日周四 上午12:26写道: > Hi, > > I have a data set backed by a director

Re: How to load udf jars in flink program

2019-08-15 Thread Zhu Zhu
Hi Jiangang, Does "flink run -j jarpath ..." work for you? If that jar id deployed to the same path on each worker machine, you can try "flink run -C classpath ..." as well. Thanks, Zhu Zhu 刘建刚 于2019年8月15日周四 下午5:31写道: > We are using per-job to load udf jar when

Re: Customize file assignments logic in flink application

2019-08-16 Thread Zhu Zhu
Hi Lu, I think it's OK to choose any way as long as it works. Though I've no idea how you would extend SplittableIterator in your case. The underlying is ParallelIteratorInputFormat and its processing is not matched to a certain subtask index. Thanks, Zhu Zhu Lu Niu 于2019年8月16日周五

Re: TaskManager not connecting to ResourceManager in HA mode

2019-08-21 Thread Zhu Zhu
t the stored resource manager address in HA is replaced by jobmanager address in any case? Thanks, Zhu Zhu Aleksandar Mastilovic 于2019年8月22日周四 上午8:16写道: > Hi all, > > I’m experimenting with using my own implementation of HA services instead > of ZooKeeper that would persist JobManager

Re: Externalized checkpoints

2019-08-21 Thread Zhu Zhu
Hi Vishwas, You can configure "state.checkpoints.num-retained" to specify the max checkpoints to retain. By default it is 1. Thanks, Zhu Zhu Vishwas Siravara 于2019年8月22日周四 上午6:48写道: > I am also using exactly once checkpointing mode, I have a kafka source and > sink

Re: [ANNOUNCE] Apache Flink 1.9.0 released

2019-08-22 Thread Zhu Zhu
Thanks Gordon for the update. Congratulations that we have Flink 1.9.0 released! Thanks to all the contributors. Thanks, Zhu Zhu Eliza 于2019年8月22日周四 下午8:10写道: > > > On 2019/8/22 星期四 下午 8:03, Tzu-Li (Gordon) Tai wrote: > > The Apache Flink community is very happy to announce

Re: Problem with Flink on Yarn

2019-08-23 Thread Zhu Zhu
, Zhu Zhu Juan Gentile 于2019年8月23日周五 下午7:48写道: > Hello! > > > > We are running Flink on Yarn and we are currently getting the following > error: > > > > *2019-08-23 06:11:01,534 WARN > org.apache.hadoop.security.UserGroupInformation - >

Re: Using shell environment variables

2019-08-24 Thread Zhu Zhu
ink's task manager process. For example for passing LD_LIBRARY_PATH as an env variable to the workers, set: containerized.taskmanager.env.LD_LIBRARY_PATH: "/usr/lib/native" in the flink-conf.yaml. Thanks, Zhu Zhu Abhishek Jain 于2019年8月25日周日 上午2:48写道: > Hi Miki, > Thanks for your

Re: Problem upgrading HA set up from 1.7.2 to 1.9.0

2019-08-28 Thread Zhu Zhu
with a non-null value. In that case, would you check if the docker image is storing a pre-generated legacy(1.7.2) JobGraph which is not compatible with Flink 1.9? Thanks, Zhu Zhu Steven Nelson 于2019年8月28日周三 下午11:23写道: > I am trying to update a cluster running in HA mode from 1.7.2 to 1.9.0. I &

Re: How to handle Flink Job with 400MB+ Uberjar with 800+ containers ?

2019-08-30 Thread Zhu Zhu
One optimization that we take is letting yarn to reuse the flink-dist jar which was localized when running previous jobs. Thanks, Zhu Zhu Jörn Franke 于2019年8月30日周五 下午4:02写道: > Increase replication factor and/or use HDFS cache > https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/

Re: best practices on getting flink job logs from Hadoop history server?

2019-08-30 Thread Zhu Zhu
multiple TM logs. However it can be much smaller than the "yarn logs ..." generated log. Thanks, Zhu Zhu Yu Yang 于2019年8月30日周五 下午3:58写道: > Hi, > > We run flink jobs through yarn on hadoop clusters. One challenge that we > are facing is to simplify flink job log access. >

Re: [SURVEY] Is the default restart delay of 0s causing problems?

2019-08-30 Thread Zhu Zhu
In our production, we usually override the restart delay to be 10 s. We once encountered cases that external services are overwhelmed by reconnections from frequent restarted tasks. As a safer though not optimized option, a default delay larger than 0 s is better in my opinion. 未来阳光 <2217232...@q

Re: How to handle Flink Job with 400MB+ Uberjar with 800+ containers ?

2019-09-01 Thread Zhu Zhu
ed cache of YARN. In this way, the localized dist jar can be shared by different YARN applications and it will not be removed when the YARN application which localized it terminates. This requires some changes in Flink though. We will open a ISSUE to contribute this optimization to the community. Thank

Re: [SURVEY] Is the default restart delay of 0s causing problems?

2019-09-02 Thread Zhu Zhu
1s looks good to me. And I think the conclusion that when a user should override the delay is worth to be documented. Thanks, Zhu Zhu Steven Wu 于2019年9月3日周二 上午4:42写道: > 1s sounds a good tradeoff to me. > > On Mon, Sep 2, 2019 at 1:30 PM Till Rohrmann wrote: > >> Thanks

Re: error in LocalStreamEnvironment

2019-09-04 Thread Zhu Zhu
client, I think you can change "val env: StreamExecutionEnvironment = DemoStreamEnvironment.env" to be "val env = StreamExecutionEnvironment.getExecutionEnvironment" in the demo code like TotalArrivalCount.scala. Thanks, Zhu Zhu alaa 于2019年9月4日周三 下午5:18写道: > Hallo

Re: error in LocalStreamEnvironment

2019-09-05 Thread Zhu Zhu
cs-release-1.9/getting-started/examples/ https://ci.apache.org/projects/flink/flink-docs-release-1.9/getting-started/tutorials/local_setup.html Thanks, Zhu Zhu alaa 于2019年9月5日周四 下午3:10写道: > thank for your reply but Unfortunately this solution is not suitable > > < > http://apach

Re: Post-processing batch JobExecutionResult

2019-09-06 Thread Zhu Zhu
tionEnvironment();" Thanks, Zhu Zhu spoganshev 于2019年9月6日周五 下午11:39写道: > Due to OptimizerPlanEnvironment.execute() throwing exception on the last > line > there is not way to post-process batch job execution result, like: > > JobExecutionResult r = env.execute(); //

Re: [ANNOUNCE] Zili Chen becomes a Flink committer

2019-09-11 Thread Zhu Zhu
Congratulations Zili! Thanks, Zhu Zhu Terry Wang 于2019年9月11日周三 下午5:34写道: > Congratulations! > > Best, > Terry Wang > > > > 在 2019年9月11日,下午5:28,Dian Fu 写道: > > Congratulations! > > 在 2019年9月11日,下午5:26,Jeff Zhang 写道: > > Congratulations Zili Che

[SURVEY] How many people are using customized RestartStrategy(s)

2019-09-12 Thread Zhu Zhu
hanks, Zhu Zhu

Re: [SURVEY] How many people are using customized RestartStrategy(s)

2019-09-12 Thread Zhu Zhu
estart-strategy: org.foobar.MyRestartStrategyFactoryFactory". The usage of restart strategies you mentioned will keep working with the new scheduler. Thanks, Zhu Zhu Oytun Tez 于2019年9月12日周四 下午10:05写道: > Hi Zhu, > > We are using custom restart strategy like this: > > environment.setRestartStrategy(f

Re: [SURVEY] How many people are using customized RestartStrategy(s)

2019-09-19 Thread Zhu Zhu
Flink 1.10 Other usages are still supported, including all the strategies and configuring ways described in https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/task_failure_recovery.html#restart-strategies . Feel free to share in this thread if you has any concern for it. Thanks, Zh

Re: [SURVEY] How many people are using customized RestartStrategy(s)

2019-09-19 Thread Zhu Zhu
Thanks Steven for the feedback! Could you share more information about the metrics you add in you customized restart strategy? Thanks, Zhu Zhu Steven Wu 于2019年9月20日周五 上午7:11写道: > We do use config like "restart-strategy: > org.foobar.MyRestartStrategyFactoryFactory". Mainly t

Re: [SURVEY] How many people are using customized RestartStrategy(s)

2019-09-22 Thread Zhu Zhu
not be restarted when task failures happen and the "fullRestart" value will not increment in such cases. I'd appreciate if you can help with these questions and we can make better decisions for Flink. Thanks, Zhu Zhu Steven Wu 于2019年9月22日周日 上午3:31写道: > Zhu Zhu, > > Flink fullRes

Re: Is there a lifecycle listener that gets notified when a topology starts/stops on a task manager

2019-09-23 Thread Zhu Zhu
Hi Stephen, I think disposing static components in the closing stage of a task is required. This is because your code(operators/UDFs) is part of the task, namely that it can only be executed when the task is not disposed. Thanks, Zhu Zhu Stephen Connolly 于2019年9月24日周二 上午2:13写道: > Curren

Re: [SURVEY] How many people are using customized RestartStrategy(s)

2019-09-23 Thread Zhu Zhu
ric to show failovers that respects fine grained recovery. [1] https://issues.apache.org/jira/browse/FLINK-14164 Thanks, Zhu Zhu Steven Wu 于2019年9月24日周二 上午6:41写道: > > When we setup alert like "fullRestarts > 1" for some rolling window, we > want to use counter. if it i

Re: NoClassDefFoundError in failing-restarting job that uses url classloader

2019-09-24 Thread Zhu Zhu
throw NoClassDefFoundError due to the class loader gets closed. Thanks, Zhu Zhu Subramanyam Ramanathan 于2019年9月24日周二 下午8:07写道: > Hi, > > > > Thank you. > > I think the takeaway for us is that we need to make sure that the threads > are stopped in the close() method. > > > > W

Re: NoClassDefFoundError in failing-restarting job that uses url classloader

2019-09-24 Thread Zhu Zhu
Hi Subramanyam, I checked the commits. There are 2 fixes in FLINK-10455, only release 1.8.1 and release 1.9.0 contain both of them. Thanks, Zhu Zhu Subramanyam Ramanathan 于2019年9月24日周二 下午11:02写道: > Hi Zhu, > > > > We also use FlinkKafkaProducer(011), hence I felt this fi

Re: [SURVEY] How many people are using customized RestartStrategy(s)

2019-09-24 Thread Zhu Zhu
Hi Steven, As a conclusion, since we will have a meter metric[1] for restarts, customized restart strategy is not needed in your case. Is that right? [1] https://issues.apache.org/jira/browse/FLINK-14164 Thanks, Zhu Zhu Steven Wu 于2019年9月25日周三 上午2:30写道: > Zhu Zhu, > > Sorry, I

Re: NoClassDefFoundError in failing-restarting job that uses url classloader

2019-09-25 Thread Zhu Zhu
Yes. 1.8.2 contains all commits in 1.8.1. Subramanyam Ramanathan 于2019年9月25日周三 下午5:03写道: > Hi Zhu, > > > > Thanks a lot ! > > Since 1.8.2 is also available, would it be right to assume 1.8.2 would > also contain the fix ? > > > > Thanks, > > Subb

Re: [SURVEY] How many people are using customized RestartStrategy(s)

2019-09-25 Thread Zhu Zhu
We will then keep the decision that we do not support customized restart strategy in Flink 1.10. Thanks Steven for the inputs! Thanks, Zhu Zhu Steven Wu 于2019年9月26日周四 上午12:13写道: > Zhu Zhu, that is correct. > > On Tue, Sep 24, 2019 at 8:04 PM Zhu Zhu wrote: > >> Hi

Re: Anomaly in handling late arriving data

2019-09-25 Thread Zhu Zhu
"e12", 7L) AND allowedLateness(Time.milliseconds(2L)) change to allowedLateness(Time.milliseconds(4L)) * The same as case 1). Thanks, Zhu Zhu Indraneel R 于2019年9月26日周四 上午2:24写道: > Hi Everyone, > > I am trying to execute this simple sessionization pipeline, with the > allo

Re: ** Help need w.r.t parallelism settings in flink **

2019-09-26 Thread Zhu Zhu
hange the parallelism via manually rescaling. Thanks, Zhu Zhu Akshay Iyangar 于2019年9月27日周五 上午4:20写道: > Hi > > So we are running a beam pipeline that uses flink as its execution engine. > We are currently on flink1.8 > > So per the flink documentation I see that there is an option

Re: Where are uploaded Job jars stored?

2019-10-10 Thread Zhu Zhu
be java.io.tmpdir in standalone mode. Thanks, Zhu Zhu John Smith 于2019年10月11日周五 上午2:41写道: > And can that folder be shared so that all nodes see it? > > On Thu, 10 Oct 2019 at 14:36, Yun Tang wrote: > >> Hi John >> >> The jar is not stored in HA path, I think

Re: Discard message on deserialization errors.

2019-10-11 Thread Zhu Zhu
a null deserialized record. Just pay attention to also not make *DeserializationSchema#isEndOfStream* throw errors on a null record provided. Thanks, Zhu Zhu John Smith 于2019年10月12日周六 上午5:36写道: > Hi using Flink 1.8.0. > > I am ingesting data from Kafka, unfortunately for the time bei

Re: Discard message on deserialization errors.

2019-10-12 Thread Zhu Zhu
I mean the Kafka source provided in Flink can correctly ignores null deserialized values. isEndOfStream allows you to control when to end the input stream. If it is used for running infinite stream jobs, you can simply return false. Thanks, Zhu Zhu John Smith 于2019年10月12日周六 下午8:40写道: >

Re: Is it possible to get Flink job name in an operator?

2019-10-15 Thread Zhu Zhu
I think ExecutionConfig.GlobalJobParameters is the way to do this if you want to retrieve it in runtime. Or you can just pass the name to each operator you implement to have it serialized together with the udf. Thanks, Zhu Zhu 马阳阳 于2019年10月15日周二 下午3:11写道: > As the title. Is it possible now?

Re: Data processing with HDFS local or remote

2019-10-18 Thread Zhu Zhu
, Zhu Zhu Pritam Sadhukhan 于2019年10月18日周五 上午10:59写道: > Hi, > > I am trying to process data stored on HDFS using flink batch jobs. > Our data is splitted into 16 data nodes. > > I am curious to know how data will be pulled from the data nodes with the > same number of parall

Re: Data processing with HDFS local or remote

2019-10-20 Thread Zhu Zhu
source interfaces provided in ExecutionEnvironment, like #readTextFile and #readFile, use FileInputFormat, so the input locality is supported by default. Thanks, Zhu Zhu Pritam Sadhukhan 于2019年10月21日周一 上午10:17写道: > Hi Zhu Zhu, > > Thanks for your detailed answer. > Can you pleas

Re: Flink batch app occasionally hang

2019-10-29 Thread Zhu Zhu
Hi Caio, Did you check whether there are enough resources to launch the other nodes? Could you attach the logs you mentioned? And elaborate how the tasks are connected in the topology? Thanks, Zhu Zhu Caio Aoque 于2019年10月30日周三 上午8:31写道: > Hi, I've been running some flink scala appl

Re: Flink disaster recovery test problems

2019-11-11 Thread Zhu Zhu
with the initial state, i.e. the job will consume data from the very beginning and there can be a big data regression. Thanks, Zhu Zhu 钟旭阳 于2019年11月5日周二 下午3:01写道: > hello: > > > I am currently learning flink.I recently had a problem with Flink for > disaster recovery testin

Re: Job Distribution Strategy On Cluster.

2019-11-11 Thread Zhu Zhu
Hi Srikanth, Is this issue what you encounter? FLINK-12122: a job would tend to fill one TM before using another. If it is, you may need to wait for the release 1.9.2 or 1.10, since it is just fixed. Thanks, Zhu Zhu vino yang 于2019年11月11日周一 下午5:48写道: > Hi srikanth, > > What

Re: Job Distribution Strategy On Cluster.

2019-11-11 Thread Zhu Zhu
There is no plan for release 1.9.2 yet. Flink 1.10.0 is planned to be released in early January. Thanks, Zhu Zhu srikanth flink 于2019年11月11日周一 下午9:53写道: > Zhu Zhu, > > That's awesome and is what I'm looking for. > Any update on when would be the next release date? > &

Re: Flink configuration at runtime

2019-11-18 Thread Zhu Zhu
[1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/deployment/docker.html#flink-job-cluster [2] https://ci.apache.org/projects/flink/flink-docs-release-1.9/ops/deployment/yarn_setup.html#run-a-single-flink-job-on-yarn Thanks, Zhu Zhu amran dean 于2019年11月19日周二 上午5:53写道: > Is it possible

Re: Collections as Flink job parameters

2019-11-18 Thread Zhu Zhu
parse it later in the main method. Thanks, Zhu Zhu Протченко Алексей 于2019年11月19日周二 上午12:29写道: > > Hello all. > > I have a question about providing complex configuration to Flink job. We > are working on some kind of platform for running used-defined packages > which actually ca

Re: Apache Flink - Retries for async processing

2019-12-10 Thread Zhu Zhu
the error is recoverable, you can just retry (or refresh the token), and only complete the ResultFuture until it succeeds (until timeout). Thanks, Zhu Zhu M Singh 于2019年12月10日周二 下午8:51写道: > Thanks Jingsong for sharing your solution. > > Since both refreshing the token and the actual AP

Re: [ANNOUNCE] Apache Flink 1.8.3 released

2019-12-12 Thread Zhu Zhu
Thanks Hequn for driving the release and everyone who makes this release possible! Thanks, Zhu Zhu Wei Zhong 于2019年12月12日周四 下午3:45写道: > Thanks Hequn for being the release manager. Great work! > > Best, > Wei > > 在 2019年12月12日,15:27,Jingsong Li 写道: > > Thanks Hequn

Re: [EXTERNAL] Flink and Prometheus monitoring question

2019-12-16 Thread Zhu Zhu
Hi Jesús, If your job has checkpointing enabled, you can monitor 'numberOfCompletedCheckpoints' to see wether the job is still alive and healthy. Thanks, Zhu Zhu Jesús Vásquez 于2019年12月17日周二 上午2:43写道: > The thing about numRunningJobs metric is that i have to configure in

Re: How to reprocess certain events in Flink?

2019-12-17 Thread Zhu Zhu
not be a problem with the unique transaction_id assumption. Thanks, Zhu Zhu Pooja Agrawal 于2019年12月17日周二 下午9:17写道: > > Hi, > > I have a use case where we are reading events from kinesis stream.The > event can look like this > Event { > event_id, > transaction_id >

Re: Different jobName per Job when reporting Flink metrics to PushGateway

2019-12-17 Thread Zhu Zhu
on cluster or submit a job in job cluster mode. Thanks, Zhu Zhu Sidney Feiner 于2019年12月17日周二 下午11:08写道: > I'm using Flink 1.9.1 with PrometheusPushGateway to report my metrics. The > jobName the metrics are reported with is defined in the flink-conf.yaml > file which makes the j

Re: How to reprocess certain events in Flink?

2019-12-18 Thread Zhu Zhu
original event was processed. Thanks, Zhu Zhu Rafi Aroch 于2019年12月18日周三 下午3:50写道: > Hi Pooja, > > Here's an implementation from Jamie Grier for de-duplication using > in-memory cache with some expiration time: > > https://github.com/jgrier/FilteringExample/blob/master/src/mai

Re: Deploying Operator on concrete Task Manager

2019-12-18 Thread Zhu Zhu
Hi KristoffSC, Flink does not support specifying the TM for tasks. So I think you need to launch a separate job to do the "AsyncCall + map" in the secured zone. Thanks, Zhu Zhu KristoffSC 于2019年12月18日周三 下午8:04写道: > Hi, > I have a question regarding job/operator deployment

Re: Rich Function Thread Safety

2019-12-18 Thread Zhu Zhu
Hi Aaron, It is thread safe since the state snapshot happens in the same thread with the user function. Thanks, Zhu Zhu Aaron Langford 于2019年12月19日周四 上午11:25写道: > Hello Flink Community, > > I'm hoping to verify some understanding: > > If I have a function with managed sta

Re: How long Flink state default TTL,if I don't config the state ttl config?

2020-01-05 Thread Zhu Zhu
Yes. State TTL is by default disabled. Thanks, Zhu Zhu LakeShen 于2020年1月6日周一 上午10:09写道: > I saw the flink source code, I find the flink state ttl default is > never expire,is it right? > > LakeShen 于2020年1月6日周一 上午9:58写道: > >> Hi community,I have a question about flink

Re: Flink Job claster scalability

2020-01-08 Thread Zhu Zhu
e-flink-user-mailing-list-archive.2336050.n4.nabble.com/DISCUSS-Temporarily-remove-support-for-job-rescaling-via-CLI-action-quot-modify-quot-td27447.html Thanks, Zhu Zhu KristoffSC 于2020年1月9日周四 上午1:05写道: > Hi all, > I must say I'm very impressed by Flink and what it can do. > > I

Re: Flink Job claster scalability

2020-01-09 Thread Zhu Zhu
case. KristoffSC 于2020年1月9日周四 下午9:26写道: > Thank you David and Zhu Zhu, > this helps a lot. > > I have follow up questions though. > > Having this > /"Instead the Job must be stopped via a savepoint and restarted with a new > parallelism"/ > > and slot s

Re: How to assign a UID to a KeyedStream?

2020-01-09 Thread Zhu Zhu
Hi Ken, This is actually a bug that a Partition should not require a UID. It is fixed in 1.9.2 and 1.10. see FLINK-14910 <https://jira.apache.org/jira/browse/FLINK-14910>. Thanks, Zhu Zhu Ken Krugler 于2020年1月10日周五 上午7:51写道: > Hi all, > > [Of course, right after hitting send I r

  1   2   >