Re: How to work on the Chinese translation of the homepage

2022-05-21 Thread Zhilong Hong
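Hi, Zhenyu: The source for the official Flink homepage lives in [1]; every file in that directory whose name ends in .zh.md is a Chinese version. The Chinese document for Documentation Style is at [2], and it indeed has not been translated into Chinese yet. If you are interested, you can follow the guide [3] to contribute. First create a new issue in JIRA [4] and describe the task in English. Once an Apache Flink committer assigns the issue to you, you can open a pull request against the directory [1]. Best, Zhilong [1]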

Re: taskexecutor .out files

2022-05-16 Thread Zhilong Hong
Hi, Zain: The taskmanager.out file only contains what is written to stdout. Some fatal errors, such as JVM exit exceptions, are also written to the .out file. If you don't specify a file path for the GC log, the GC log content is saved to the .out file as well.

Re: Flink OLAP vs. Trino on TPC-DS

2022-05-08 Thread Zhilong Hong
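Many thanks to Yu Li for the reminder. The fifth document link in the original email (the "10 GiB TPC-DS dataset test results") has been updated to Google Docs [1]. [1] https://docs.google.com/spreadsheets/d/1nietTOrFg93p7k7L82lGPlUjwCpw97bWfP21xI_MLcE/edit?usp=sharing Best, Zhilong Hong On Fri, May 6, 2022 at 4:51 PM Yu Li wrote: > Thanks, everyone, for sharing and for the analysis. Looking forward to Flink's continued optimization in this area! > > Let's m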

Re: Flink OLAP vs. Trino on TPC-DS

2022-05-01 Thread Zhilong Hong
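gap has narrowed substantially; see [5] for details. In OLAP scenarios there is indeed still a gap between Flink and Trino, and the community is currently optimizing for this scenario [6]. On our internal development branch at Alibaba we have already caught up with Trino's performance, and the related optimizations are expected to be contributed back to the community over Flink 1.16 and 1.17. Best, Zhilong Hong [1] https://github.com/ververica/flink-sql-gateway [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL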

Re: "The file STDOUT does not exist on the TaskExecutor" exception

2022-04-20 Thread Zhilong Hong
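Hello, Zhuoyu: This is an error from the REST API. It means you clicked the Stdout tab on the TaskManager page of the Flink Dashboard, but the stdout file could not be accessed on the corresponding TaskManager, hence the error. It does not affect the normal running of the job and can be ignored. Best, Zhilong On Wed, Apr 20, 2022 at 3:06 PM Chen Zhuoyu <2572805...@qq.com.invalid> wrote: > Hello: > I would like to ask what causes this exception, what impact it has on production, and how to get rid of it > >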

Re: Flink OLAP vs. Trino on TPC-DS

2022-04-15 Thread Zhilong Hong
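Hello, Luning! We are also currently looking at Flink's performance in OLAP scenarios. Which versions of Flink and Trino did you test? Also, I noticed that the cluster configuration used in flink-sql-benchmark differs from yours, so you may need to adjust resource settings such as taskmanager.memory.process.size in flink-conf.yaml to match your cluster resources. Best, Zhilong On Fri, Apr 15, 2022 at 2:38 PM LuNing Wang wrote: > Ran 100 TPC-DS SQL > 10 GB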

Re: io.network.netty.exception

2022-03-07 Thread Zhilong Hong
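Hi, Mingwen: This error actually means the TM was lost, usually because the TM was killed. You can find out why it was killed from the TM's Flink logs and GC logs, the cluster-level NM logs (on YARN), or the K8S logs. Typical causes are a long GC pause that makes the TM heartbeat time out so that the TM is killed, or the TM exceeding its memory limit so that the container/pod is killed. Best, Zhilong On Mon, Mar 7, 2022 at 10:18 AM Pan Mingwen wrote: > Hi, reading from Kafka, writing to HBase and Kafka > the Flink job frequently reports errors > >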

Re: Task Manager shutdown causing jobs to fail

2022-03-07 Thread Zhilong Hong
Hi, Puneet: As Terry says, if your job failed unexpectedly, you could check the restart-strategy setting in your flink-conf.yaml. If the restart strategy is set to disabled or none, the job transitions to failed as soon as it encounters a failover. The job would also fail itself

Re: PyFlink : submission via rest

2022-03-05 Thread Zhilong Hong
Hi, Aryan: You could refer to the official docs [1] for how to submit PyFlink jobs. $ ./bin/flink run --target yarn-per-job --python examples/python/table/word_count.py With this command you can submit a per-job application to YARN. The docs [2] and [3] describe how to submit jobs

Re: Flink failure rate restart not work as expect

2022-03-02 Thread Zhilong Hong
Hi, Jiaqiao: Since checkpointing is enabled for your job, you can just try removing the restart strategy config. The default will then be fixed-delay with Integer.MAX_VALUE restart attempts and a '1 s' delay, as mentioned in [1]. This way, when a failover occurs, your job will wait for 1 second before
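
The fixed-delay behavior described in this reply can be illustrated with a minimal Python sketch. It is only a toy model, not Flink's implementation: the class FixedDelayRestart and its method on_failure are hypothetical names introduced here, and the only values taken from the message are the Integer.MAX_VALUE attempt limit and the 1 s delay.

```python
from dataclasses import dataclass
from typing import Optional

# Java's Integer.MAX_VALUE, the attempt limit quoted in the reply.
JAVA_INT_MAX = 2**31 - 1


@dataclass
class FixedDelayRestart:
    """Toy model of a fixed-delay restart policy (hypothetical, not Flink code)."""
    max_attempts: int
    delay_seconds: float
    attempts_used: int = 0

    def on_failure(self) -> Optional[float]:
        """Return the delay before the next restart, or None if the job should fail."""
        if self.attempts_used >= self.max_attempts:
            return None  # attempts exhausted: the job goes to failed
        self.attempts_used += 1
        return self.delay_seconds


# The default described in the reply for jobs with checkpointing enabled.
default_with_checkpointing = FixedDelayRestart(JAVA_INT_MAX, 1.0)

# A policy with two attempts restarts twice, then gives up.
limited = FixedDelayRestart(max_attempts=2, delay_seconds=1.0)
print([limited.on_failure() for _ in range(3)])
```

In this model a policy with max_attempts=0 gives up on the first failure, which is only a rough analogue of running with no restart strategy at all.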

Re: Flink job recovery after task manager failure

2022-02-24 Thread Zhilong Hong
start, and the task manager was restarted a few times until it was > stabilized. > > > > You can find the log here: > > jobmanager-log.txt.gz > <https://nokia-my.sharepoint.com/:u:/p/ifat_afek/EUsu4rb_-BpNrkpvSwzI-vgBtBO9OQlIm0CHtW0gsZ7Gqg?email=zhlonghong%40gmail.com=ww5Idt&

Re: Flink job recovery after task manager failure

2022-02-23 Thread Zhilong Hong
Hi, Afek! When a TaskManager is killed, the JobManager is not aware of it until a heartbeat timeout occurs. Currently, the default value of heartbeat.timeout is 50 seconds [1]. That's why it takes more than 30 seconds for Flink to trigger a failover. If you'd like to shorten the time a
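
To make the timing above concrete, here is a small Python sketch of the arithmetic. It is a simplified model rather than Flink's actual heartbeat code: it assumes the JobManager declares the TaskManager lost once heartbeat.timeout has elapsed since the last heartbeat it received, and it assumes a heartbeat interval of 10 seconds, which is not stated in the message. The function name detection_delay is hypothetical.

```python
def detection_delay(last_heartbeat_s: float, kill_time_s: float,
                    timeout_s: float = 50.0) -> float:
    """Seconds between the TaskManager being killed and the timeout firing.

    Simplified model (an assumption, not Flink's implementation): the
    TaskManager is declared lost once `timeout_s` seconds have passed
    since the last heartbeat that was received from it.
    """
    if kill_time_s < last_heartbeat_s:
        raise ValueError("the kill must happen after the last heartbeat")
    return (last_heartbeat_s + timeout_s) - kill_time_s


# Assumed heartbeat interval of 10 s: the kill lands 0 to 10 s after a heartbeat.
interval_s = 10.0
worst_case = detection_delay(last_heartbeat_s=0.0, kill_time_s=0.0)
best_case = detection_delay(last_heartbeat_s=0.0, kill_time_s=interval_s)
print(f"failover is triggered {best_case:.0f} to {worst_case:.0f} s after the kill")
```

Under these assumptions the delay always falls between the timeout minus the interval and the full timeout, which is consistent with the "more than 30 seconds" observed in the thread.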

Re: When a TaskManager's slots are released

2022-01-25 Thread Zhilong Hong
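Hello, johnjlong: TaskExecutor#cancel is an RPC call and carries no information about whether the TM is alive. TM liveness is detected by the Heartbeat Service; the default value of the heartbeat.timeout option [1] is currently 50s, while the RPC timeout option akka.ask.timeout [2] defaults to 10s. If you want to detect a lost TM sooner, you can lower both values, but this may make the cluster or the job unstable. The community has already discussed reducing the heartbeat timeout; see [3] and [4]. [1]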

Re: Do Flink jobs support automatic resource scaling?

2021-12-11 Thread Zhilong Hong
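For streaming jobs, take a look at Reactive mode [1] and the Adaptive scheduler, introduced starting with 1.13, which adjust job parallelism as resources change. Users can adjust resources based on job metrics, and Flink then adjusts the job to match the available resources. For batch jobs, have a look at the Adaptive batch scheduler [2] coming in 1.15, in which the parallelism of each vertex is adjusted automatically according to the data volume. [1] https://nightlies.apache.org/flink/flink-docs-release-1.14/zh/docs/deployment/elastic_scaling/ [2]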

Re: New blog post published - Sort-Based Blocking Shuffle Implementation in Flink

2021-11-08 Thread Zhilong Hong
Thank you for writing this blog post, Daisy and Kevin! It helps me to understand what sort-based shuffle is and how to use it. Looking forward to your future improvements! On Wed, Nov 3, 2021 at 6:32 PM Yuxin Tan wrote: > Thanks Daisy and Kevin! The IO scheduling idea of the sequential reading

Re: [ANNOUNCE] Apache Flink 1.11.2 released

2020-09-18 Thread Zhilong Hong
Thank you, @ZhuZhu, for driving this release! Best regards, Zhilong From: Zhu Zhu Sent: Thursday, September 17, 2020 13:29 To: dev ; user ; user-zh ; Apache Announce List Subject: [ANNOUNCE] Apache Flink 1.11.2 released The Apache Flink community is very