Re: [SURVEY] Drop Share and Key_Shared subscription support in Pulsar connector

2022-12-14 Thread Zili Chen
Hi Yufan,

Thanks for starting this discussion. My two cents:

1. It could help upstream fix the transaction issues if you also submit the
instability and performance issues to the Pulsar repo.
2. Could you elaborate on whether, and if so why, we should drop the Shared and
Key_Shared subscription support in Flink?

Best,
tison.

On 2022/12/14 10:00:56 盛宇帆 wrote:
> Hi, I'm the maintainer of flink-connector-pulsar. I would like to
> start a survey on a function change proposal in
> flink-connector-pulsar.
> 
> I have created a ticket on JIRA and pasted
> its description here:
> 
> Many of the Pulsar connector's unstable-test issues are related to the Shared
> and Key_Shared subscriptions, because these two subscription types consume
> records in an unordered way and allow multiple consumers on the same topic
> partition. But this feature leads to some drawbacks in the connector.
> 
> 1. Performance
> 
> Flink is a true stream processor with strong correctness guarantees. But
> supporting multiple consumers requires stronger correctness, which depends
> on Pulsar transactions. The internal implementation of Pulsar transactions
> on the source side records the messages one by one and keeps all the
> pending-ack state on the client side, which is slow and memory-inefficient.
> 
> This means that we can only use Shared and Key_Shared on Flink with
> low throughput, which goes against our intention in supporting these two
> subscription types: adding multiple consumers to the same partition should
> increase the consuming speed.
> 
> 2. Unstable
> 
> Pulsar transactions acknowledge the messages one by one in an internal
> Pulsar topic, but this is not yet stable enough to work reliably. A lot of
> pending issues in Flink's JIRA are related to Pulsar transactions, and we
> don't have any workaround.
> 
> 3. Complex
> 
> Supporting the Shared and Key_Shared subscriptions makes the connector's code
> more complex than we expected. We have to split every part of the code into
> an ordered and an unordered path, which is hard to understand for the
> maintainer.
> 
> 4. Necessary
> 
> The current implementation of Shared and Key_Shared is completely
> unusable in a production environment. For the user, this feature is not
> necessary: there is no bottleneck in consuming data from Pulsar; the
> bottleneck is in processing the data, which we can address by increasing
> the parallelism of the processing operators.
> 
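For context, this is roughly how a job opts into the subscription type under
discussion. A minimal sketch against the 1.16-era flink-connector-pulsar
builder API (the URLs, topic, and subscription name are illustrative, and
method names may differ in other connector versions):

import org.apache.flink.api.common.eventtime.WatermarkStrategy
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.connector.pulsar.source.PulsarSource
import org.apache.flink.connector.pulsar.source.enumerator.cursor.StartCursor
import org.apache.flink.connector.pulsar.source.reader.deserializer.PulsarDeserializationSchema
import org.apache.flink.streaming.api.scala._
import org.apache.pulsar.client.api.SubscriptionType

object SharedSubscriptionJob {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Shared subscription: unordered consumption with multiple consumers per
    // partition, the mode whose exactly-once path depends on Pulsar
    // transactions as described in the proposal above.
    val source = PulsarSource.builder[String]()
      .setServiceUrl("pulsar://localhost:6650")
      .setAdminUrl("http://localhost:8080")
      .setTopics("my-topic")
      .setSubscriptionName("flink-survey")
      .setSubscriptionType(SubscriptionType.Shared)
      .setStartCursor(StartCursor.earliest())
      .setDeserializationSchema(PulsarDeserializationSchema.flinkSchema(new SimpleStringSchema()))
      .build()

    env.fromSource(source, WatermarkStrategy.noWatermarks[String](), "pulsar-source").print()
    env.execute("pulsar-shared-subscription")
  }
}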


FLINK WEEKLY 2019/42

2019-10-21 Thread Zili Chen
FLINK WEEKLY 2019/42 

Happy to share the Flink community's progress from last week. Last week Jark Wu drove the
release of Flink 1.9.1, which fixes a series of defects in 1.9.0. Flink users are encouraged
to upgrade to 1.9.1 promptly to avoid being bitten by the defects in 1.9.0.
User questions

How to cap resource usage in Blink (per-job mode)


Flink 1.9 SQL/Table API: setting uids and state-upgrade questions


How to change the HDFS path inside the _metadata file generated by a checkpoint


Example DDL and DML for joining a MySQL dimension table with Flink SQL


Data processing with HDFS local or remote


How Flink batch jobs reading input from HDFS can place tasks on the machines that store the
input, exploiting locality to improve performance. Flink already implements this locality
optimization internally; Zhu Zhu explained the implementation details on the mailing list

Submitting jobs via REST


How to submit jobs through Flink's JarRun REST API

Customize Part file naming (Flink 1.9.0)


Customizing the part-file names of StreamingFileSink, asked on the English user list this
time; the same question came up on the user-zh list last week

ProcessFunction Timer


On using the right timer inside a window to implement concrete business logic

JDBC Table Sink doesn't seem to sink to database.


Some issues using the JDBC Table Sink: the batch interval makes the sink flush only after it
has received the configured number of records, so trailing records may never get flushed

Warnings connecting to Akka


An Akka connection-refused warning can be expected: it occurs when Akka tries to reconnect
to an endpoint that is already gone. Once Flink's heartbeat mechanism detects that the
endpoint is lost, it stops trying to reconnect

Jar Uploads in High Availability (Flink 1.7.2)


When starting several Flink clusters on k8s for HA, the k8s network routing may forward Web
UI requests to different Dispatchers, making the Web UI behave erratically

Is it possible to get Flink job name in an operator?


Getting the job name inside an operator is not really possible, but you can set the job name
yourself up front, use it when starting the job, and use the same name inside the operators

Discard message on deserialization errors


When the Kafka connector deserializes records with a KafkaDeserializationSchema, returning
null for a record that cannot be deserialized simply drops that record
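A minimal sketch of that pattern (assuming the 1.9-era universal Kafka connector; the UTF-8
decoding stands in for whatever deserialization might fail):

import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.streaming.connectors.kafka.KafkaDeserializationSchema
import org.apache.kafka.clients.consumer.ConsumerRecord

class DroppingDeserializationSchema extends KafkaDeserializationSchema[String] {
  override def isEndOfStream(nextElement: String): Boolean = false

  // Returning null drops the record instead of failing the job.
  override def deserialize(record: ConsumerRecord[Array[Byte], Array[Byte]]): String =
    try new String(record.value(), "UTF-8")
    catch { case _: Exception => null }

  override def getProducedType: TypeInformation[String] =
    TypeInformation.of(classOf[String])
}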
Known defects

FLINK-14429 Wrong app final status when running batch job on yarn with
non-detached mode 

A job deployed on YARN in non-detached mode can show SUCCEEDED as its final status even
though the job may actually have failed. This relates to Flink's per-job implementation;
since hacky logic is involved, there is no clear fix for this problem yet
Development discussions

[DISCUSS] Stateful Functions - in which form to contribute? (same or
different repository)


Stephan started a discussion on how to contribute Stateful Functions back to the Flink
community, focused on whether it should live in a standalone repository and on some handling
of documentation and the build

[NOTICE] Binary licensing is now auto-generated


Chesnay Schepler moved the generation of Flink's NOTICE files into the automated release
steps, relieving developers of the burden of keeping an eye on the NOTICE files

[DISCUSS] FLIP-80: Expression String Serializable and Deserializable


Jark Wu's FLIP-80 aims to solve how to serialize and deserialize the expressions in a
catalog

[DISCUSS] Rename the SQL ANY type to OPAQUE type


Timo Walther started a discussion on renaming the SQL ANY type to OPAQUE, mainly because the
current ANY type represents a user-serialized type that is a black box to Flink, rather than
a truly arbitrary type

[DISCUSS] FLIP-59: Enable execution configuration from Configuration object


The FLIP-59 discussion on driving execution configuration from the Configuration object was
revived because FLIP-73 touches related topics; it centers on clarifying the concepts
involved and on naming

[ARM support] Travis ARM CI is now in Alpha Release


Xiyuan Wang shared the progress of testing Flink on ARM; Travis now offers an ARM test
environment, and he proposed …

Re: [ANNOUNCE] Apache Flink 1.9.1 released

2019-10-19 Thread Zili Chen
Thanks a lot for being release manager Jark. Great work!

Best,
tison.


Till Rohrmann wrote on Sat, Oct 19, 2019 at 10:15 PM:

> Thanks a lot for being our release manager Jark and thanks to everyone who
> has helped to make this release possible.
>
> Cheers,
> Till
>
> On Sat, Oct 19, 2019 at 3:26 PM Jark Wu  wrote:
>
>>  Hi,
>>
>> The Apache Flink community is very happy to announce the release of
>> Apache Flink 1.9.1, which is the first bugfix release for the Apache Flink
>> 1.9 series.
>>
>> Apache Flink® is an open-source stream processing framework for
>> distributed, high-performing, always-available, and accurate data streaming
>> applications.
>>
>> The release is available for download at:
>> https://flink.apache.org/downloads.html
>>
>> Please check out the release blog post for an overview of the
>> improvements for this bugfix release:
>> https://flink.apache.org/news/2019/10/18/release-1.9.1.html
>>
>> The full release notes are available in Jira:
>> https://issues.apache.org/jira/projects/FLINK/versions/12346003
>>
>> We would like to thank all contributors of the Apache Flink community who
>> helped to verify this release and made this release possible!
>> Great thanks to @Jincheng for helping finalize this release.
>>
>> Regards,
>> Jark Wu
>>
>



Re: [PROPOSAL] Contribute Stateful Functions to Apache Flink

2019-10-14 Thread Zili Chen
+1 to adding Stateful Functions to the Flink core repository.

Best,
tison.


Becket Qin wrote on Mon, Oct 14, 2019 at 4:16 PM:

> +1 to adding Stateful Function to Flink. It is a very useful addition to
> the Flink ecosystem.
>
> Given this is essentially a new top-level / first-class API of Flink, it
> seems better to have it in the Flink core repo. This will also avoid letting
> this important new API be blocked on potential problems of maintaining
> multiple different repositories.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Sun, Oct 13, 2019 at 4:48 AM Hequn Cheng  wrote:
>
>> Hi Stephan,
>>
>> Big +1 for adding this to Apache Flink!
>>
>> As for the problem of whether this should be added to the Flink main
>> repository, from my side, I prefer to put it in the main repository. Not
>> only does Stateful Functions relate closely to the current Flink, but other
>> libs or modules in Flink could also make use of it the other way round in
>> the future. At that point the Flink API stack would also change a bit,
>> which would be cool.
>>
>> Best, Hequn
>>
>> On Sat, Oct 12, 2019 at 9:16 PM Biao Liu  wrote:
>>
>>> Hi Stephan,
>>>
>>> +1 for having Stateful Functions in Flink.
>>>
>>> Before discussing which repository it should belong to, I was wondering if
>>> we have reached an agreement on "splitting the flink repository" as Piotr
>>> mentioned or not. It seems the discussion simply went no further.
>>> It's OK for me to add it to the core repository. After all, almost everything
>>> is in the core repository now. But if we decide to split the core repository
>>> someday, I would tend to create a separate repository for Stateful Functions. It
>>> might be a good time to take the first step of splitting.
>>>
>>> Thanks,
>>> Biao /'bɪ.aʊ/
>>>
>>>
>>>
>>> On Sat, 12 Oct 2019 at 19:31, Yu Li  wrote:
>>>
 Hi Stephan,

 Big +1 for adding stateful functions to Flink. I believe a lot of users
 would be interested in trying this out, and I can imagine how this could
 help reduce the TCO for businesses requiring both stream
 processing and stateful functions.

 And my 2 cents is to put it into flink core repository since I could
 see a tight connection between this library and flink state.

 Best Regards,
 Yu


 On Sat, 12 Oct 2019 at 17:31, jincheng sun 
 wrote:

> Hi Stephan,
>
> Big +1 for adding this great feature to Apache Flink.
>
> Regarding where we should place it: put it into the Flink core repository,
> or create a separate repository? I prefer putting it into the main repository,
> and I look forward to more detailed discussion of this decision.
>
> Best,
> Jincheng
>
>
> Jingsong Li wrote on Sat, Oct 12, 2019 at 11:32 AM:
>
>> Hi Stephan,
>>
>> big +1 for this contribution. It provides another user interface that
>> is easy to use and popular at this time. These functions are hard for
>> users to write in SQL/Table API, while using DataStream is too complex
>> (we've done some StateFun-kind jobs using DataStream before). With
>> StateFun, it is very easy.
>>
>> I think it's also a good opportunity to exercise Flink's core
>> capabilities. I looked at stateful-functions-flink briefly; it is very
>> interesting. I think there are many other things Flink can improve. So I
>> think it's better to put it into Flink, and the improvements to it will
>> come more naturally in the future.
>>
>> Best,
>> Jingsong Lee
>>
>> On Fri, Oct 11, 2019 at 7:33 PM Dawid Wysakowicz <
>> dwysakow...@apache.org> wrote:
>>
>>> Hi Stephan,
>>>
>>> I think this is a nice library, but what I like more about it is that it
>>> suggests exploring different use-cases. I think it definitely makes sense
>>> for the Flink community to explore more lightweight applications that
>>> reuse resources. Therefore I definitely think it is a good idea for the
>>> Flink community to accept this contribution and help maintain it.
>>>
>>> Personally I'd prefer to have it in a separate repository. There
>>> were a few discussions before where different people were suggesting to
>>> extract connectors and other libraries to separate repositories. Moreover,
>>> I think it could serve as an example for the Flink ecosystem website[1].
>>> This could be the first project in there and give a good impression that
>>> the community sees potential in the ecosystem website.
>>>
>>> Lastly, I'm wondering if this should go through a PMC vote according
>>> to our bylaws[2]. In the end the suggestion is to adopt an existing code
>>> base as is. It also proposes a new programming concept that could result
>>> in a shift of priorities for the community in the long run.
>>>
>>> Best,
>>>
>>> Dawid
>>>
>>> [1]
>>> 

FLINK WEEKLY 2019/41

2019-10-14 Thread Zili Chen
FLINK WEEKLY 2019/41 

Happy to share the Flink community's progress from last week. Last week at Flink Forward
Berlin, Stephan Ewen announced Stateful Functions, a general-purpose computation library
built on Flink; with it, Flink's use cases extend to almost all existing data systems. For
the mail links, see the community section at the end of this weekly
User questions

How to control TaskManager resources in Flink 1.8


Resource-configuration problems hit after upgrading Flink 1.5 to 1.8 and switching the
runtime to FLIP-6

File renaming


Customizing, to a degree, the names of the files produced by StreamingFileSink

Flink SQL: Unknown or invalid SQL statement.


Limitations of the Flink SQL Client's SQL support: CREATE TABLE statements are not supported

How to write stream data to other Hadoop Cluster by StreamingFileSink


The configuration a Flink job needs to write its output to another Hadoop cluster, so that
Flink can resolve the other cluster's information

Reducing parallelism when starting from a savepoint


You can change the previously configured parallelism when starting a job from a savepoint,
but the maximum parallelism must not change

Flink checkpoint timeout problems


You can consult this article for troubleshooting checkpoint problems

Discussion: savepoints keep the original cluster address after migrating a Flink cluster


Flink savepoints currently store absolute file paths, so moving a savepoint to another HDFS
cluster and starting from it is unsupported. As a stopgap, you can work around this with the
unofficial trick of editing the meta file

Flink 1.9 Web UI exception-log display problem


Problems displaying exceptions in the Web UI since Flink 1.9, possibly related to the 1.9
update of the failover strategy; no conclusion yet

Flink StreamingFileSink.forBulkFormat to HDFS


Supporting ORC-format Hive tables when consuming Kafka data into HDFS

Group by multiple fields


How to call the API to group by multiple fields

[SURVEY] How do people upgrade their Flink applications?


A survey on upgrading Flink applications; engineers from Alibaba briefly described their
experience

Backpressure tuning/failure


A question about approaches to tuning Flink backpressure
Development discussions

[DISCUSS] Drop Python 2 support for 1.10


Dian Fu started a discussion on dropping Flink's Python 2 support in 1.10; Flink is
currently implementing a new Python API. The discussion has largely reached consensus and is
now in the voting phase

Mongo Connector


Vijay Srinivasaraghavan started a discussion about a Flink Mongo connector

[DISCUSS] FLIP-76: Unaligned checkpoints


Arvid Heise's FLIP-76 was very well received; the FLIP aims to improve checkpoint
performance under backpressure

[DISCUSS] FLIP-77: Introduce ConfigOptions with Data Types


Timo Walther's FLIP-77 was split off from FLIP-54; as part of evolving Flink's
configuration, it first supports ConfigOptions carrying data-type information

[SURVEY] How do you use ExternallyInducedSource or WithMasterCheckpointHook


Biao Liu started a survey of how Flink users use the ExternallyInducedSource and
WithMasterCheckpointHook interfaces. It will help the CheckpointCoordinator threading-model
refactoring he is driving and help ensure the refactoring does not break existing use cases
Community

[PROPOSAL] Contribute Stateful Functions to Apache Flink


At Flink Forward Berlin, Stephan Ewen announced Stateful Functions, a general-purpose
computation library built on Flink that extends Flink's use cases to almost all existing
data systems. This thread proposes contributing Stateful Functions back to Flink's code
repositories; the discussion is currently about whether to accept the contribution and
whether the code should live in a standalone repository or be folded into the main Flink
repository

[VOTE] Release 1.9.1, release candidate #1


The Flink 1.9.1 release is progressing steadily: several positive reports have come in and
the release page is ready. It is expected to go out this week


Re: File renaming

2019-10-08 Thread Zili Chen
I took a quick look at how the part-file name is assembled. You can try these two methods:

RowFormatBuilder#withPartFilePrefix
RowFormatBuilder#withPartFileSuffix

They should let you set your file name to

<prefix>-<subtask-index>-<count><suffix>

The middle segment is hard-coded.
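For example, a minimal sketch (assuming a Flink version whose RowFormatBuilder has the two
methods above; the prefix and suffix values are illustrative):

import org.apache.flink.api.common.serialization.SimpleStringEncoder
import org.apache.flink.core.fs.Path
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink

// Only the prefix and suffix are configurable; the -<subtask-index>-<count>
// middle segment stays fixed.
val sink: StreamingFileSink[String] = StreamingFileSink
  .forRowFormat(new Path("hdfs:///tmp/output"), new SimpleStringEncoder[String]("UTF-8"))
  .withPartFilePrefix("data")
  .withPartFileSuffix(".txt")
  .build()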

If you have more customized renaming needs, I suggest you describe concretely why you need
the renaming.

Best,
tison.


Wesley Peng wrote on Tue, Oct 8, 2019 at 5:43 PM:

> Maybe you want to rename them in HDFS with the FileSystem.rename method?
>
>
> on 2019/10/8 17:39, yanggang_it_job wrote:
> > The files written to HDFS are all of the form
> > part-{parallel-task}-{count}
> >
> >
> >
> > How can I rename them?
>


FLINK WEEKLY 2019/40

2019-10-07 Thread Zili Chen
FLINK WEEKLY 2019/40
User questions

Dynamic stream handling


Flink does not yet support dynamic updates to the stream graph, but this is a feature on
Flink's roadmap

[SURVEY] What is the most subtle/hard to catch bug that people have seen?


Konstantinos Kallas started an interesting survey about the most subtle and tricky bugs
Flink users have run into. He and his team are preparing to build a testing framework for
Flink and hope to collect samples of known problems

Broadcast state


On reusing broadcast state within a job

Finding the Maximum Value Received so far in a Stream


Scenario implementation: finding the maximum value seen so far in a stream
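A minimal sketch of one way to do it, keying on a constant to obtain a single global maximum
(the sample values are illustrative):

import org.apache.flink.streaming.api.scala._

object RunningMax {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.fromElements(3, 1, 4, 1, 5)
      .keyBy(_ => 0)                      // one logical key: global maximum
      .reduce((a, b) => math.max(a, b))   // emits the max seen so far
      .print()
    env.execute("running-max")
  }
}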

POJO serialization vs immutability


Details of Flink's POJO implementation: because a POJO's fields are mutable, a POJO cannot
be used as a key under the default hashCode implementation
Known defects

FLINK-14315 NPE with JobMaster.disconnectTaskManager


A race condition in the JobMaster means a NullPointerException can be thrown at runtime; the
problem has been located and will be fixed in 1.10.0/1.9.1/1.8.3
Development discussions

[SURVEY] Dropping non Credit-based Flow Control


Piotr Nowojski started a discussion on dropping the non-credit-based flow-control mechanism.
Credit-based flow control was introduced in Flink 1.5; with Flink's network stack under
active development, removing this code will make that work easier

[jira] [Created] (FLINK-14320) [FLIP-66] Support Time Attribute in SQL DDL


Jark Wu's FLIP-66 has passed its vote and development has started. FLIP-66 aims to support
time attributes in SQL DDL

[DISCUSS] Improve Flink logging with contextual information


Gyula Fóra started a discussion on enriching Flink's log content, mainly with information
such as the TaskManager, container, and job id

[DISCUSS] FLIP-65: New type inference for Table API UDFs


Timo Walther's FLIP-65 aims to provide new type inference for Table API user-defined
functions, another part of the new round of Table API development

[DISCUSS] FLIP-76: Unaligned checkpoints


Arvid Heise's FLIP-76 introduces unaligned checkpoints, aiming to improve checkpoint
performance under backpressure

[VOTE] FLIP-73: Introducing Executors for job submission

[VOTE] FLIP-74: Flink JobClient API


The two FLIPs improving the client API entered the voting phase
Community

[VOTE] Release 1.9.1, release candidate #1


Jark Wu, as release manager for 1.9.1, cut the first release candidate

Real-time experiment analytics at Pinterest using Apache Flink


Developers from Pinterest Engineering shared their experience using Flink for real-time
computation

Turning messy data into a gold mine using Spark, Flink, and ScyllaDB


Oran Hirsch from Dynamic Yield shared their data-analytics stack built on Spark, Flink, and
ScyllaDB


Re: Warnings connecting to Akka

2019-10-03 Thread Zili Chen
Does the log you attached above come from a TaskManager node? If so, what
state is the JobManager node it tried to connect to in? Did it crash?

BTW, it would be helpful if you could attach more of the TM and JM logs beyond
the two lines saying the Akka connection was refused.


John Smith wrote on Fri, Oct 4, 2019 at 2:08 AM:

> So I guess it had some older state?
>
> On Thu., Oct. 3, 2019, 11:29 a.m. John Smith, 
> wrote:
>
>> I'm running a standalone cluster with Zookeeper. It seems it was trying to
>> connect to an older node. I rebooted the Job node that was complaining. It
>> seems to be OK now...
>>
>> I have 3 Zookeepers, 3 Job Nodes and 3 Tasks Nodes
>>
>> On Thu, 3 Oct 2019 at 11:15, Zili Chen  wrote:
>>
>>> Hi John,
>>>
>>> could you provide some details such as which mode you run
>>> on (standalone/YARN)
>>> and the related configuration (jobmanager.address, jobmanager.port, and so on)?
>>>
>>> Best,
>>> tison.
>>>
>>>
>>> John Smith wrote on Thu, Oct 3, 2019 at 11:02 PM:
>>>
>>>> Hi, running 1.8 the cluster seems to be OK but I see these warnings in
>>>> the logs...
>>>>
>>>> 2019-10-03 14:57:25,152 WARN
>>>>  akka.remote.transport.netty.NettyTransport- Remote
>>>> connection to [null] failed with java.net.ConnectException: Connection
>>>> refused: /xxx.xxx.xxx.65:46167
>>>> 2019-10-03 14:57:25,156 WARN  akka.remote.ReliableDeliverySupervisor
>>>>  - Association with remote system
>>>> [akka.tcp://fl...@xxx.xxx.xxx.65:46167] has failed, address is now
>>>> gated for [50] ms. Reason: [Association failed with
>>>> [akka.tcp://fl...@xxx.xxx.xxx.65:46167]] Caused by: [Connection
>>>> refused: /xxx.xxx.xxx.65:46167]
>>>>
>>>>
>>>>


Re: Warnings connecting to Akka

2019-10-03 Thread Zili Chen
Hi John,

could you provide some details such as which mode you run
on (standalone/YARN)
and the related configuration (jobmanager.address, jobmanager.port, and so on)?

Best,
tison.


John Smith wrote on Thu, Oct 3, 2019 at 11:02 PM:

> Hi, running 1.8 the cluster seems to be OK but I see these warnings in the
> logs...
>
> 2019-10-03 14:57:25,152 WARN  akka.remote.transport.netty.NettyTransport
>  - Remote connection to [null] failed with
> java.net.ConnectException: Connection refused: /xxx.xxx.xxx.65:46167
> 2019-10-03 14:57:25,156 WARN  akka.remote.ReliableDeliverySupervisor
>  - Association with remote system
> [akka.tcp://fl...@xxx.xxx.xxx.65:46167] has failed, address is now gated
> for [50] ms. Reason: [Association failed with
> [akka.tcp://fl...@xxx.xxx.xxx.65:46167]] Caused by: [Connection refused:
> /xxx.xxx.xxx.65:46167]
>
>
>


Re: Problems with java.utils

2019-09-30 Thread Zili Chen
Thanks for the information, Dian. It is not an urgent issue though. Maybe
we can revisit it later :-)

Best,
tison.


Dian Fu wrote on Mon, Sep 30, 2019 at 7:34 PM:

> Hi tison,
>
> Actually there may be compatibility issues as the
> BatchTableEnvironment/StreamTableEnvironment under "api.java" are public
> interfaces.
>
> Regards,
> Dian
>
> On Sep 30, 2019, at 4:49 PM, Zili Chen wrote:
>
> Hi Dian,
>
> What about renaming api.java to japi if there is no unexpected compatibility
> issue? I think we can always avoid using `.java.` in package names.
>
> Best,
> tison.
>
>
> Dian Fu wrote on Thu, Sep 26, 2019 at 10:54 PM:
>
>> Hi Nick,
>>
>> There is a package named "org.apache.flink.table.api.java" in flink, and
>> so the import of "org.apache.flink.table.api._" causes "
>> org.apache.flink.table.api.java" to be imported. Then any import of a
>> package starting with "java", such as "import java.util.ArrayList", will try
>> to find the classes under "org.apache.flink.table.api.java", as Scala
>> uses relative imports by default. So I think nothing is wrong here. This is
>> the behavior of the Scala language. You just need to use the "_root_" prefix
>> to force an absolute import.
>>
>> Regards,
>> Dian
>>
>> On Sep 26, 2019, at 10:26 PM, Nicholas Walton wrote:
>>
>> Dian
>>
>> That fixed the problem, thank you. It would appear that someone has taken
>> it upon themselves to redefine part of the Java standard library in
>> org.apache.flink.table.api._
>>
>> NIck
>>
>> On 26 Sep 2019, at 15:16, Dian Fu  wrote:
>>
>> Hi Nick,
>>
>> [error] ……/src/main/scala/org/example/Job.scala:30:13: object util
>> is not a member of package org.apache.flink.table.api.java
>> [error] import java.util.ArrayList
>>
>>
>> The error message shows that it tries to find "util.ArrayList" under
>> package "org.apache.flink.table.api.java". So you can try one of the
>> following two solutions:
>> 1) Don't import "org.apache.flink.table.api._"
>> 2) Use absolute import: "import _root_.java.util.ArrayList"
>>
>> Regards,
>> Dian
>>
>> On Sep 26, 2019, at 10:04 PM, Nicholas Walton wrote:
>>
>> I’ve shrunk the problem down to a minimal size. The code
>>
>> package org.example
>>
>> import org.apache.flink.table.api._
>> import org.apache.http.HttpHost
>>
>> import java.util.ArrayList
>>
>> object foo {
>>
>>   val httpHosts = new ArrayList[HttpHost]
>>   httpHosts.add(new HttpHost("samwise.local", 9200, "http"))
>>
>> }
>>
>> will not compile, but remove the import org.apache.flink.table.api._ and
>> all is well
>>
>> Nick
>>
>>
>> On 26 Sep 2019, at 12:53, Nicholas Walton  wrote:
>>
>> I’m having a problem using ArrayList in Scala . The code is below
>>
>> import org.apache.flink.core.fs._
>> import org.apache.flink.streaming.api._
>> import org.apache.flink.streaming.api.scala._
>> import org.apache.flink.table.api._
>> import org.apache.flink.table.api.scala._
>> import org.apache.flink.table.sinks._
>> import org.apache.http.HttpHost
>> import org.slf4j.LoggerFactory
>>
>> import java.util.ArrayList
>>
>> object Job {
>>
>>   val httpHosts = new ArrayList[HttpHost]
>>   httpHosts.add(new HttpHost("samwise.local", 9200, "http"))
>>
>> …..
>>
>> }
>>
>> Scala refuses to recognise ArrayList. The IDE IntelliJ likewise flags
>> java.util as not existing, even though it recognises ArrayList and suggests
>> auto-insertion of the import java.util.ArrayList statement. The
>> compilation errors are
>>
>> [error] ……/src/main/scala/org/example/Job.scala:30:13: object util
>> is not a member of package org.apache.flink.table.api.java
>> [error] import java.util.ArrayList
>> [error] ^
>> [error] ……/src/main/scala/org/example/Job.scala:34:23: not found:
>> type ArrayList
>> [error]   val httpHosts = new ArrayList[HttpHost]
>> [error]   ^
>> [error] two errors found
>> [error] (Compile / compileIncremental) Compilation failed
>>
>> Without the ArrayList reference the code compiles with no errors.
>>
>> The sbt build file, if it helps is,
>>
>> resolvers in ThisBuild ++= Seq("Apache Development Snapshot Repository"
>> at "https://repository.apache.org/content/repositories/snapsh

FLINK WEEKLY 2019/39

2019-09-30 Thread Zili Chen
FLINK WEEKLY 2019/39 

Happy National Day, everyone! The past week was another week of vigorous growth for Flink;
let's look at last week's discussions and progress.
User questions

On exactly-once with Async I/O


A discussion of exactly what guarantee Flink's exactly-once provides: Flink only guarantees
delivery of data inside the framework itself; interacting with external systems requires
their cooperation to provide a cross-system guarantee

A question on initializing a system cache


Flink practice in IoT and connected-vehicle scenarios, on data caching and optimization for
Flink applications

How to verify your code yourself before submitting to the community


Ways to self-test changes to the Flink code base before opening a pull request

Can't map return null?


Indeed, it cannot
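A minimal sketch of the usual workaround: use flatMap and emit zero elements for the records
you would have mapped to null (parseOrNull is a stand-in for any transformation that can
fail):

import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

object NoNullFromMap {
  def parseOrNull(s: String): Integer =
    try Integer.valueOf(s) catch { case _: Exception => null }

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.fromElements("1", "oops", "3")
      .flatMap { (s: String, out: Collector[Int]) =>
        val v = parseOrNull(s)
        if (v != null) out.collect(v)   // drop instead of returning null
      }
      .print()
    env.execute("no-null-from-map")
  }
}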

A question about the Flink command line


The YARN option -yj only supports local jars

How to run the test cases in the Flink code base


Really more of a Maven/IDE topic; Xintong Song described how to run the test cases with
Maven from the command line

Challenges Deploying Flink With Savepoints On Kubernetes


Troubleshooting starting Flink on k8s from a savepoint

Help need w.r.t parallelism settings in flink


A question about configuring parallelism in Flink

Flink job manager doesn't remove stale checkmarks


With incremental checkpoints enabled, earlier checkpoints that serve as the base are not
considered stale just because of the retained-checkpoint count

Flink SQL update-mode set to retract in env file


How to configure update-mode=retract in Flink SQL

Problems with java.utils


An import clash after importing org.apache.flink.table.api._, caused by a Scala language
feature

Setting environment variables of the taskmanagers (yarn)


Configuring the environment variables of TaskManagers started under YARN

Joins Usage Clarification


A question about the definitions of windows and joins in Flink SQL

How to correctly use checkstyle in IntelliJ IDEA


How to import the checkstyle rules into IDEA when developing Flink
Known defects

FLINK-14145 getLatestCheckpoint(true) returns wrong checkpoint


A serious bug can occur on Flink 1.9.0 when checkpoint recovery is set to prefer
checkpoints: the most recent checkpoint can still be skipped even when it is not a
savepoint. It will be fixed in Flink 1.9.1

FLINK-13708 transformations should be cleared because a table environment
could execute multiple job


On Flink 1.9.0, when a table environment runs several execute calls, an earlier job can
interfere with the ones after it; a job's information should be cleared once it finishes
executing. It will be fixed in Flink 1.9.1
Development discussions

[DISCUSS] Expose multiple level clients

REST API / JarRunHandler: More flexibility for launching jobs

[DISCUSS] FLIP-73: Introducing Executors for job submission

[DISCUSS] FLIP-74: Flink JobClient API


This series of discussions is all about refactoring Flink's client API. Flink wants to
settle on a standardized client API to offer Flink users, including those who submit Flink
jobs directly and the developers of Flink job-management platforms. If you care about
improving the experience of interacting with Flink, feel free to join any of these threads
or start one stating your needs and views

[DISCUSS] FLIP 69 - Flink SQL DDL Enhancement


Terry Wang's FLIP-69 proposes a series of enhancements to Flink SQL DDL

Per Key Grained Watermark Support


廖嘉逸 proposed a design to support per-key watermarks

[DISCUSS] FLIP-75: Flink Web UI Improvement Proposal


Yadong Xie's FLIP-75 proposes a series of Web UI improvements

Re: Problems with java.utils

2019-09-30 Thread Zili Chen
Hi Dian,

What about renaming api.java to japi if there is no unexpected compatibility
issue? I think we can always avoid using `.java.` in package names.

Best,
tison.


Dian Fu wrote on Thu, Sep 26, 2019 at 10:54 PM:

> Hi Nick,
>
> There is a package named "org.apache.flink.table.api.java" in flink, and
> so the import of "org.apache.flink.table.api._" causes "
> org.apache.flink.table.api.java" to be imported. Then any import of a package
> starting with "java", such as "import java.util.ArrayList", will try to find
> the classes under "org.apache.flink.table.api.java", as Scala uses
> relative imports by default. So I think nothing is wrong here. This is the
> behavior of the Scala language. You just need to use the "_root_" prefix to
> force an absolute import.
>
> Regards,
> Dian
>
> On Sep 26, 2019, at 10:26 PM, Nicholas Walton wrote:
>
> Dian
>
> That fixed the problem, thank you. It would appear that someone has taken
> it upon themselves to redefine part of the Java standard library in
> org.apache.flink.table.api._
>
> NIck
>
> On 26 Sep 2019, at 15:16, Dian Fu  wrote:
>
> Hi Nick,
>
> [error] ……/src/main/scala/org/example/Job.scala:30:13: object util is
> not a member of package org.apache.flink.table.api.java
> [error] import java.util.ArrayList
>
>
> The error message shows that it tries to find "util.ArrayList" under
> package "org.apache.flink.table.api.java". So you can try one of the
> following two solutions:
> 1) Don't import "org.apache.flink.table.api._"
> 2) Use absolute import: "import _root_.java.util.ArrayList"
>
> Regards,
> Dian
>
> On Sep 26, 2019, at 10:04 PM, Nicholas Walton wrote:
>
> I’ve shrunk the problem down to a minimal size. The code
>
> package org.example
>
> import org.apache.flink.table.api._
> import org.apache.http.HttpHost
>
> import java.util.ArrayList
>
> object foo {
>
>   val httpHosts = new ArrayList[HttpHost]
>   httpHosts.add(new HttpHost("samwise.local", 9200, "http"))
>
> }
>
> will not compile, but remove the import org.apache.flink.table.api._ and
> all is well
>
> Nick
>
>
> On 26 Sep 2019, at 12:53, Nicholas Walton  wrote:
>
> I’m having a problem using ArrayList in Scala . The code is below
>
> import org.apache.flink.core.fs._
> import org.apache.flink.streaming.api._
> import org.apache.flink.streaming.api.scala._
> import org.apache.flink.table.api._
> import org.apache.flink.table.api.scala._
> import org.apache.flink.table.sinks._
> import org.apache.http.HttpHost
> import org.slf4j.LoggerFactory
>
> import java.util.ArrayList
>
> object Job {
>
>   val httpHosts = new ArrayList[HttpHost]
>   httpHosts.add(new HttpHost("samwise.local", 9200, "http"))
>
> …..
>
> }
>
> Scala refuses to recognise ArrayList. The IDE IntelliJ likewise flags
> java.util as not existing, even though it recognises ArrayList and suggests
> auto-insertion of the import java.util.ArrayList statement. The
> compilation errors are
>
> [error] ……/src/main/scala/org/example/Job.scala:30:13: object util is
> not a member of package org.apache.flink.table.api.java
> [error] import java.util.ArrayList
> [error] ^
> [error] ……/src/main/scala/org/example/Job.scala:34:23: not found:
> type ArrayList
> [error]   val httpHosts = new ArrayList[HttpHost]
> [error]   ^
> [error] two errors found
> [error] (Compile / compileIncremental) Compilation failed
>
> Without the ArrayList reference the code compiles with no errors.
>
> The sbt build file, if it helps is,
>
> resolvers in ThisBuild ++= Seq("Apache Development Snapshot Repository" at
> "https://repository.apache.org/content/repositories/snapshots/;,
> Resolver.mavenLocal)
>
> name := "Flink MultiChannel Project"
>
> version := "0.1-SNAPSHOT"
>
> organization := "org.example"
>
> scalaVersion in ThisBuild := "2.11.0"
>
> val flinkVersion = "1.8.1"
>
> val flinkDependencies = Seq(
>   "org.apache.flink" %% "flink-scala" % flinkVersion ,
>   "org.apache.flink" %% "flink-table" % "1.7.2" ,
>   "org.apache.flink" %% "flink-connector-elasticsearch" % flinkVersion,
>   "org.apache.flink" %% "flink-streaming-scala" % flinkVersion,
>   "org.apache.httpcomponents" % "httpclient" % "4.5.10",
>   "org.apache.httpcomponents" % "httpcore" % "4.4.11")
> // https://mvnrepository.com/artifact/org.apache.httpcomponents/httpcore
>
> lazy val root = (project in file(".")).
>   settings(
> libraryDependencies ++= flinkDependencies)
>
> mainClass in assembly := Some("org.example.Job")
>
> // make run command include the provided dependencies
> run in Compile := Defaults.runTask(fullClasspath in Compile, mainClass in
> (Compile, run), runner in (Compile, run))
>
> // exclude Scala library from assembly
> assemblyOption in assembly := (assemblyOption in
> assembly).value.copy(includeScala = false)
>
>
> TIA
>
> Nick
>
>
>
>
>
>


Re: Re: How to verify your code yourself before submitting to the community

2019-09-27 Thread Zili Chen
You could file it as an unstable-test report (x

Many Flink tests involve concurrency, so a test may well pass CI at check-in and then keep
failing hauntingly afterwards (x

Best,
tison.


gaofeilong198...@163.com wrote on Fri, Sep 27, 2019 at 9:53 PM:

>  Dian Fu, thanks for your reply.
>
> I've put the log of the failed build here:
> https://issues.apache.org/jira/projects/FLINK/issues/FLINK-14115?filter=myopenissues
>
> I still can't see the cause from it. I now have a few questions:
> 1. Running build_docs.sh -i under the docs directory locally works fine. So can I open a
> new PR and resubmit my change? If not, what should I do on top of the current PR?
> 2. My PR contains only one commit; why did flinkbot build three commits, one failing and
> two passing? See:
> https://github.com/apache/flink/pull/9749#issuecomment-534149758
> 3. My change only touches documentation; how could it make the avro module fail?
>
>
>
> gaofeilong198...@163.com
>
> From: Dian Fu
> Sent: 2019-09-26 22:29
> To: user-zh
> Subject: Re: How to verify your code yourself before submitting to the community
> 1) If the build fails, look at the failure cause; if it has nothing to do with this PR,
> you can re-trigger Travis with "@flinkbot run travis"
> 2) Locally you can verify with "mvn clean verify"; see [1] for details. Your change is
> doc-related and generally should not fail the build
>
> [1] https://flink.apache.org/contributing/contribute-code.html <
> https://flink.apache.org/contributing/contribute-code.html>
> > 在 2019年9月26日,下午9:56,高飞龙  写道:
> >
> > hi,我在向社区提交PR时,提示build失败(
> https://github.com/apache/flink/pull/9749#issuecomment-534149758)
> >
> >
> > 我应该怎么做?在提交PR之前我可以执行什么脚本先在本地进行build测试吗?
> >
> >
> >
> >
> >
> > --
> >
> >
> >
> > gaofeilong198...@163.com
>
>


Re: How to verify your code yourself before submitting to the community

2019-09-26 Thread Zili Chen
I took a look at your PR; the failure should be caused by an unstable test. A
documentation-only change should have nothing to do with CI failures.

Best,
tison.


Zili Chen wrote on Thu, Sep 26, 2019 at 10:21 PM:

> mvn verify runs the unit tests and the compile-time checks (such as checkstyle)
>
> Best,
> tison.
>
>
> 高飞龙 wrote on Thu, Sep 26, 2019 at 9:56 PM:
>
>> Hi, when I submitted a PR to the community, the build was reported as failed (
>> https://github.com/apache/flink/pull/9749#issuecomment-534149758)
>>
>>
>> What should I do? Is there a script I can run to test the build locally before submitting a PR?
>>
>>
>>
>>
>>
>> --
>>
>>
>>
>> gaofeilong198...@163.com
>
>


Re: How to verify your code yourself before submitting to the community

2019-09-26 Thread Zili Chen
mvn verify runs the unit tests and the compile-time checks (such as checkstyle)

Best,
tison.


高飞龙 wrote on Thu, Sep 26, 2019 at 9:56 PM:

> Hi, when I submitted a PR to the community, the build was reported as failed (
> https://github.com/apache/flink/pull/9749#issuecomment-534149758)
>
>
> What should I do? Is there a script I can run to test the build locally before submitting a PR?
>
>
>
>
>
> --
>
>
>
> gaofeilong198...@163.com


Re: Recommended approach to debug this

2019-09-24 Thread Zili Chen
>>>   // the runner waits for the execution to complete
>>>   // In normal circumstances it will run forever for streaming data
>>> source unless
>>>   // being stopped forcibly or any of the queries faces an exception
>>>   Await.result(streamletExecution.completed, Duration.Inf)
>>> } match { //..
>>>
>>> Apparently it looks like the exception I was facing earlier leaked
>>> through the Flink engine, Try caught it, and it got logged. But moving it
>>> out of the Try now enables Flink to catch it again and follow the course
>>> that it should. I am not sure this is a cogent explanation, though, and I
>>> look forward to a more accurate one from the experts. Note there is no
>>> asynchrony or concurrency going on here; the Runner code may look a bit
>>> over-engineered, but there is a context to this: the Runner handles not
>>> only Flink but other types of streaming engines as well, like Spark and
>>> Akka Streams.
>>>
>>> regards.
>>>
>>>
>>> On Tue, Sep 24, 2019 at 10:17 AM Biao Liu  wrote:
>>>
>>>> Hi Zili,
>>>>
>>>> Thanks for pointing that out.
>>>> I didn't realize that it's a REST API based case. Debasish's case has
>>>> been discussed not only in this thread...
>>>>
>>>> It's really hard to analyze the case without the full picture.
>>>>
>>>> I think the reason of why `ProgramAbortException` is not caught is that
>>>> he did something outside `env.execute`. Like executing this piece of codes
>>>> inside a Scala future.
>>>>
>>>> I guess the scenario is that he is submitting job through REST API. But
>>>> in the main method, he wraps `env.execute` with Scala future, not executing
>>>> it directly.
>>>> The reason of env has been set to `StreamPlanEnvironment` is
>>>> `JarHandlerUtils` retrieves job graph through it.
>>>> And the `ProgramAbortException` is not thrown out, because the Scala
>>>> future tackles this exception.
>>>> So retrieving job graph fails due to an unrecognized exception (Boxed
>>>> Error).
>>>>
>>>> Thanks,
>>>> Biao /'bɪ.aʊ/
>>>>
>>>>
>>>>
>>>> On Tue, 24 Sep 2019 at 10:44, Zili Chen  wrote:
>>>>
>>>>> Hi Biao,
>>>>>
>>>>> The log below already infers that the job was submitted via REST API
>>>>> and I don't think it matters.
>>>>>
>>>>> at org.apache.flink.runtime.webmonitor.handlers.utils.JarHandlerUtils$
>>>>> JarHandlerContext.toJobGraph(JarHandlerUtils.java:126)
>>>>> at org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.lambda$
>>>>> getJobGraphAsync$6(JarRunHandler.java:142)
>>>>>
>>>>> What I don't understand is that Flink DOES catch the exception at the
>>>>> point it is reportedly thrown...
>>>>>
>>>>> Best,
>>>>> tison.
>>>>>
>>>>>
>>>>>> Biao Liu wrote on Tue, Sep 24, 2019 at 10:34 AM:
>>>>>
>>>>>>
>>>>>> > We submit the code through Kubernetes Flink Operator which uses the
>>>>>> REST API to submit the job to the Job Manager
>>>>>>
>>>>>> So you are submitting job through REST API, not Flink client? Could
>>>>>> you explain more about this?
>>>>>>
>>>>>> Thanks,
>>>>>> Biao /'bɪ.aʊ/
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, 24 Sep 2019 at 03:44, Debasish Ghosh <
>>>>>> ghosh.debas...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Dian -
>>>>>>>
>>>>>>> We submit one job through the operator. We just use the following to
>>>>>>> complete a promise when the job completes ..
>>>>>>>
>>>>>>>   Try {
>>>>>>> createLogic.executeStreamingQueries(ctx.env)
>>>>>>>   }.fold(
>>>>>>> th ⇒ completionPromise.tryFailure(th),
>>>>>>> _ ⇒ completionPromise.trySuccess(Dun)
>>>>>>>   )
>>>>>>>
>>>>>>> If we totally do away with the promise and future stuff then we
>>>>>
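A minimal sketch of the pitfall resolved in this thread: during job-graph extraction (for
example via the REST JarRunHandler), env.execute throws Flink's internal
ProgramAbortException, so the main method must not swallow it with a catch-all (the job
body here is illustrative):

import org.apache.flink.streaming.api.scala._

object SubmitSafely {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.fromElements(1, 2, 3).print()

    // Anti-pattern: scala.util.Try(env.execute("my-job")) traps the internal
    // exception and breaks plan extraction with a confusing error.
    env.execute("my-job") // let exceptions propagate to Flink
  }
}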

Re: Recommended approach to debug this

2019-09-23 Thread Zili Chen
Hi Biao,

The log below already indicates that the job was submitted via the REST API,
and I don't think it matters.

at org.apache.flink.runtime.webmonitor.handlers.utils.JarHandlerUtils$
JarHandlerContext.toJobGraph(JarHandlerUtils.java:126)
at org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.lambda$
getJobGraphAsync$6(JarRunHandler.java:142)

What I don't understand is that Flink DOES catch the exception at the point
it is reportedly thrown...

Best,
tison.


Biao Liu wrote on Tue, Sep 24, 2019 at 10:34 AM:

>
> > We submit the code through Kubernetes Flink Operator which uses the REST
> API to submit the job to the Job Manager
>
> So you are submitting job through REST API, not Flink client? Could you
> explain more about this?
>
> Thanks,
> Biao /'bɪ.aʊ/
>
>
>
> On Tue, 24 Sep 2019 at 03:44, Debasish Ghosh 
> wrote:
>
>> Hi Dian -
>>
>> We submit one job through the operator. We just use the following to
>> complete a promise when the job completes ..
>>
>>   Try {
>> createLogic.executeStreamingQueries(ctx.env)
>>   }.fold(
>> th ⇒ completionPromise.tryFailure(th),
>> _ ⇒ completionPromise.trySuccess(Dun)
>>   )
>>
>> If we totally do away with the promise and future stuff then we don't get
>> the boxed error - only the exception reported in Caused By.
>>
>> regards.
>>
>> On Mon, Sep 23, 2019 at 10:20 PM Dian Fu  wrote:
>>
>>> Hi Debasish,
>>>
>>> In which case will the exception occur? Does it occur when you submit
>>> one job at a time or when multiple jobs are submitted at the same time? I'm
>>> asking this because I noticed that you used Future to execute the job
>>> unblocking. I guess ThreadLocal doesn't work well in this case.
>>>
>>> Regards,
>>> Dian
>>>
>>> 在 2019年9月23日,下午11:57,Debasish Ghosh  写道:
>>>
>>> Hi tison -
>>>
>>> Please find my response below in >>.
>>>
>>> regards.
>>>
>>> On Mon, Sep 23, 2019 at 6:20 PM Zili Chen  wrote:
>>>
>>>> Hi Debasish,
>>>>
>>>> The OptimizerPlanEnvironment.ProgramAbortException should be caught at
>>>> OptimizerPlanEnvironment#getOptimizedPlan
>>>> in its catch (Throwable t) branch.
>>>>
>>>
>>> >> true but what I get is a StreamPlanEnvironment. From my code I am
>>> only doing val env = StreamExecutionEnvironment.getExecutionEnvironment
>>> .
>>>
>>>>
>>>> It should always throw a ProgramInvocationException instead of
>>>> OptimizerPlanEnvironment.ProgramAbortException if any
>>>> exception thrown in the main method of your code.
>>>>
>>>> Another important problem is how the code is executed, (set context
>>>> environment should be another flink internal operation)
>>>> but given that you submit the job via flink k8s operator it might
>>>> require time to take a look at k8s operator implementation.
>>>>
>>>
>>> >> We submit the code through Kubernetes Flink Operator which uses the
>>> REST API to submit the job to the Job Manager
>>>
>>>>
>>>> However, given we catch Throwable in the place this exception thrown, I
>>>> highly suspect whether it is executed by an official
>>>> flink release.
>>>>
>>>
>>> >> It is an official Flink release 1.9.0
>>>
>>>>
>>>> A completed version of the code and the submission process is helpful.
>>>> Besides, what is buildExecutionGraph return type,
>>>> I think it is not ExecutionGraph in flink...
>>>>
>>>
>>> >> buildExecutionGraph is our function which returns a Unit. It's not
>>> ExecutionGraph. It builds the DataStream s by reading from Kafka and then
>>> finally writes to Kafka.
>>>
>>>>
>>>> Best,
>>>> tison.
>>>>
>>>>
>>>> Debasish Ghosh wrote on Mon, Sep 23, 2019 at 8:21 PM:
>>>>
>>>>> This is the complete stack trace which we get from execution on
>>>>> Kubernetes using the Flink Kubernetes operator .. The boxed error comes
>>>>> from the fact that we complete a Promise with Success when it returns a
>>>>> JobExecutionResult and with Failure when we get an exception. And here we 
>>>>> r
>>>>> getting an exception. So the real stack trace we have is the one below in
>>>>

Re: Recommended approach to debug this

2019-09-23 Thread Zili Chen
ae) {
>>>> ./flink-tests/src/test/java/org/apache/flink/test/optimizer/jsonplan/PreviewPlanDumpTest.java:
>>>> } catch (OptimizerPlanEnvironment.ProgramAbortException pae) {
>>>> ./flink-tests/src/test/java/org/apache/flink/test/optimizer/jsonplan/PreviewPlanDumpTest.java:
>>>> } catch (OptimizerPlanEnvironment.ProgramAbortException pae) {
>>>> ./flink-tests/src/test/java/org/apache/flink/test/optimizer/jsonplan/PreviewPlanDumpTest.java:
>>>> } catch (OptimizerPlanEnvironment.ProgramAbortException pae) {
>>>> ./flink-tests/src/test/java/org/apache/flink/test/optimizer/jsonplan/PreviewPlanDumpTest.java:
>>>> } catch (OptimizerPlanEnvironment.ProgramAbortException pae) {
>>>> ./flink-tests/src/test/java/org/apache/flink/test/optimizer/jsonplan/PreviewPlanDumpTest.java:
>>>> } catch (OptimizerPlanEnvironment.ProgramAbortException pae) {
>>>> ./flink-tests/src/test/java/org/apache/flink/test/optimizer/jsonplan/DumpCompiledPlanTest.java:
>>>> } catch (OptimizerPlanEnvironment.ProgramAbortException pae) {
>>>> ./flink-tests/src/test/java/org/apache/flink/test/optimizer/jsonplan/DumpCompiledPlanTest.java:
>>>> } catch (OptimizerPlanEnvironment.ProgramAbortException pae) {
>>>> ./flink-tests/src/test/java/org/apache/flink/test/optimizer/jsonplan/DumpCompiledPlanTest.java:
>>>> } catch (OptimizerPlanEnvironment.ProgramAbortException pae) {
>>>> ./flink-tests/src/test/java/org/apache/flink/test/optimizer/jsonplan/DumpCompiledPlanTest.java:
>>>> } catch (OptimizerPlanEnvironment.ProgramAbortException pae) {
>>>> ./flink-tests/src/test/java/org/apache/flink/test/optimizer/jsonplan/DumpCompiledPlanTest.java:
>>>> } catch (OptimizerPlanEnvironment.ProgramAbortException pae) {
>>>> ./flink-tests/src/test/java/org/apache/flink/test/optimizer/jsonplan/DumpCompiledPlanTest.java:
>>>> } catch (OptimizerPlanEnvironment.ProgramAbortException pae) {
>>>> ./flink-tests/src/test/java/org/apache/flink/test/optimizer/examples/KMeansSingleStepTest.java:
>>>> } catch (OptimizerPlanEnvironment.ProgramAbortException pae) {
>>>> ./flink-tests/src/test/java/org/apache/flink/test/optimizer/examples/RelationalQueryCompilerTest.java:
>>>> } catch (OptimizerPlanEnvironment.ProgramAbortException pae) {
>>>> ./flink-tests/src/test/java/org/apache/flink/test/optimizer/iterations/ConnectedComponentsCoGroupTest.java:import
>>>> org.apache.flink.client.program.OptimizerPlanEnvironment.ProgramAbortException;
>>>> ./flink-tests/src/test/java/org/apache/flink/test/planning/LargePlanTest.java:
>>>> @Test(expected = OptimizerPlanEnvironment.ProgramAbortException.class,
>>>> timeout = 30_000)
>>>>
>>>> What am I missing here ?
>>>>
>>>> regards.
>>>>
>>>> On Mon, Sep 23, 2019 at 7:50 AM Zili Chen  wrote:
>>>>
>>>>> Hi Debasish,
>>>>>
>>>>> As mentioned by Dian, it is an internal exception that should be
>>>>> always caught by
>>>>> Flink internally. I would suggest you share the job(abstractly).
>>>>> Generally it is because
>>>>> you use StreamPlanEnvironment/OptimizerPlanEnvironment directly.
>>>>>
>>>>> Best,
>>>>> tison.
>>>>>
>>>>>
>>>>> Austin Cawley-Edwards  于2019年9月23日周一
>>>>> 上午5:09写道:
>>>>>
>>>>>> Have you reached out to the FlinkK8sOperator team on Slack? They’re
>>>>>> usually pretty active on there.
>>>>>>
>>>>>> Here’s the link:
>>>>>>
>>>>>> https://join.slack.com/t/flinkk8soperator/shared_invite/enQtNzIxMjc5NDYxODkxLTEwMThmN2I0M2QwYjM3ZDljYTFhMGRiNDUzM2FjZGYzNTRjYWNmYTE1NzNlNWM2YWM5NzNiNGFhMTkxZjA4OGU
>>>>>>
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>> Austin
>>>>>>
>>>>>> On Sun, Sep 22, 2019 at 12:38 PM Debasish Ghosh <
>>>>>> ghosh.debas...@gmail.com> wrote:
>>>>>>
>>>>>>> The problem is I am submitting Flink jobs to Kubernetes cluster
>>>>>>> using a Flink Operator. Hence it's difficult to debug in the traditional
>>>>>>> sense of the term. And all I get is the exception that I reported ..
>>>>>>>
>>>>>>> Caused by:
>>>>>>> org.apache.flink.client.program.OptimizerPlanEnvi

FLINK WEEKLY 2019/38

2019-09-22 Thread Zili Chen
FLINK WEEKLY 2019/38 

This week's FLINK WEEKLY reorganizes its categories into four sections: user questions,
known defects, development discussions, and community. The known-defects section is meant as
a reference for known Flink problems that did not show up on the user list. Besides what the
old NEWS section covered, the community section now also carries links to Flink-related blog
posts. If you write a post on using Flink, or on any related topic, feel free to send the
link to wander4...@gmail.com and I will attach it to the community section of that week's
FLINK WEEKLY.
User questions

On computing Top-N with Flink


The underlying Top-N implementation uses RetractableTopNFunction, which lacks the logic to
clean up idle state, so state lingers. The community already has a corresponding issue
and pull request addressing the problem.

Recommended approach to debug this


On a program throwing
org.apache.flink.client.program.OptimizerPlanEnvironment$ProgramAbortException. Still under
discussion, but this exception is internal to Flink; having it thrown from user code usually
means the ExecutionEnvironment is being misused.

changing flink/kafka configs for stateful flink streaming applications


Unlike Spark stateful streaming, where no configuration item can be changed once the job
starts, Flink can migrate jobs between differently configured clusters and can even change
its own configuration while running. The Savepoint Processor API (the State Processor API)
introduced in Flink 1.9 goes further: it can modify Flink savepoints, enabling more ways to
reconfigure a job.
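A minimal sketch with the 1.9 State Processor API (flink-state-processor-api; the savepoint
path, uid, and state name are illustrative):

import org.apache.flink.api.common.typeinfo.Types
import org.apache.flink.api.java.ExecutionEnvironment
import org.apache.flink.runtime.state.memory.MemoryStateBackend
import org.apache.flink.state.api.Savepoint

object InspectSavepoint {
  def main(args: Array[String]): Unit = {
    val bEnv = ExecutionEnvironment.getExecutionEnvironment
    // Load an existing savepoint as a batch data set ...
    val savepoint = Savepoint.load(bEnv, "hdfs:///savepoints/savepoint-1234", new MemoryStateBackend())
    // ... and read an operator's list state registered under uid "my-uid".
    val elements = savepoint.readListState("my-uid", "buffered-elements", Types.STRING)
    elements.print()
  }
}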

Window metadata removal


In some scenarios window users do not need to keep the metadata, but Flink currently stores
metadata for every window operation in order to keep windows general. For jobs with short
windows but long allowed lateness, you can restructure the job as Fabian suggested on the
mailing list, or simply add machines, to cope with the problems caused by too much useless
metadata.

Flink on yarn use jar on hdfs


Shengnan YU noticed that Flink always ships the flink uber jar when starting a new
application on YARN, a noticeable performance burden when deploying many per-job tasks. Yang
Wang opened the related issue FLINK-13938 so that a Flink application started on YARN can
point at a lib location on HDFS, avoiding the repeated shipping.

Difference between data stream window function and cep within


Dian Fu explained the implementation differences between Flink's window functions and CEP's
within clause.

Extending Flink's SQL-Parser


Dominik needs to extend Flink SQL's semantics for a concrete business scenario; Rui Li and
Rong Rong briefly described how the SQL parser in Flink is implemented and pointed to
material worth consulting.
Known defects

FLINK-14010 Dispatcher & JobManagers don't give up leadership when AM is
shut down

A Flink-on-YARN bug: when the heartbeat between the AM and the YARN RM times out, the newly
started AM may still fail to obtain leadership and thus never start working, even after the
old AM receives the shutdown request from the YARN RM.

FLINK-14107 Kinesis consumer record emitter deadlock under event time
alignment

A bug in the Flink Kinesis connector: with event-time alignment in use, the Kinesis consumer
can deadlock. Already fixed for the upcoming 1.8.3 and 1.9.1.
Development discussions

[DISCUSS] FLIP-68: Extend Core Table System with Modular Plugins


Bowen Li's FLIP-68 grew out of the discussion on supporting Hive built-in functions in Flink
SQL; it aims to deliver that functionality by building modular plugin support on top of
Flink's table system.

[DISCUSS] FLIP-70 - Support Computed Column for Flink SQL


Danny Chen's FLIP-70 aims to support computed columns in Flink SQL.

[DISCUSS] FLIP-71 - E2E View support in Flink SQL


Zhenghua Gao's FLIP-71 aims to support end-to-end views in Flink SQL.

FLIP-72: Introduce Pulsar Connector


Yijie Shen's FLIP-72 stems from the earlier discussion on bringing a Pulsar connector into
the Flink scope. The community is still discussing how to handle the proposal, but the
connector code can already be found in the Pulsar repository; the main question is whether
the connector should be maintained in the Flink repository.
Community

Apache Flink on YARN with Kerberos Authentication


Ana Esguerra published a blog post on using Kerberos authentication with Flink applications
on YARN.


Re: How to prevent from launching 2 jobs at the same time

2019-09-22 Thread Zili Chen
The situation is as Dian said: Flink identifies jobs by job id rather than by
job name.

However, I think it is still a valid question whether Flink could instead
identify jobs by job name and leave the work of distinguishing jobs by name
to users. The advantages of this approach include a readable display and
interaction, as well as reducing some hardcoded handling of job ids; for
example, we always set the job id to new JobID(0, 0) in standalone per-job
mode in order to get the same ZK path.

Best,
tison.


Dian Fu wrote on Mon, Sep 23, 2019 at 10:55 AM:

> Hi David,
>
> The jobs are identified by job id, not by job name, internally in Flink, and
> so it will only check whether there are two jobs with the same job id.
>
> If you submit the job via CLI[1], I'm afraid there are still no built-in
> ways provided as currently the job id is generated randomly when submitting
> a job via CLI and the generated job id has nothing to do with the job name.
> However, if you submit the job via REST API [2], it did provide an option
> to specify the job id when submitting a job. You can generate the job id by
> yourself.
>
> Regards,
> Dian
>
> [1] https://ci.apache.org/projects/flink/flink-docs-master/ops/cli.html
> [2]
> https://ci.apache.org/projects/flink/flink-docs-master/monitoring/rest_api.html#jars-jarid-run
>
> On Sep 23, 2019, at 4:57 AM, David Morin wrote:
>
> Hi,
>
> What is the best way to prevent from launching 2 jobs with the same name
> concurrently ?
> Instead of doing a check in the script that starts the Flink job, I would
> prefer to stop a job if another one with the same name is in progress
> (Exception or something like that).
>
> David
>
>
>


Re: Recommended approach to debug this

2019-09-22 Thread Zili Chen
Hi Debasish,

As mentioned by Dian, it is an internal exception that should always be
caught by Flink internally. I would suggest you share the job (abstractly).
Generally it appears because you use
StreamPlanEnvironment/OptimizerPlanEnvironment directly.

Best,
tison.


Austin Cawley-Edwards  于2019年9月23日周一 上午5:09写道:

> Have you reached out to the FlinkK8sOperator team on Slack? They’re
> usually pretty active on there.
>
> Here’s the link:
>
> https://join.slack.com/t/flinkk8soperator/shared_invite/enQtNzIxMjc5NDYxODkxLTEwMThmN2I0M2QwYjM3ZDljYTFhMGRiNDUzM2FjZGYzNTRjYWNmYTE1NzNlNWM2YWM5NzNiNGFhMTkxZjA4OGU
>
>
>
> Best,
> Austin
>
> On Sun, Sep 22, 2019 at 12:38 PM Debasish Ghosh 
> wrote:
>
>> The problem is I am submitting Flink jobs to Kubernetes cluster using a
>> Flink Operator. Hence it's difficult to debug in the traditional sense of
>> the term. And all I get is the exception that I reported ..
>>
>> Caused by:
>> org.apache.flink.client.program.OptimizerPlanEnvironment$ProgramAbortException
>> at
>> org.apache.flink.streaming.api.environment.StreamPlanEnvironment.execute(StreamPlanEnvironment.java:66)
>> at
>> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1507)
>> at
>> org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:654)
>>
>> I am thinking that this exception must be coming because of some other
>> exceptions, which are not reported BTW. I expected a Caused By portion in
>> the stack trace. Any clue as to which area I should look into to debug this.
>>
>> regards.
>>
>> On Sat, Sep 21, 2019 at 8:10 AM Debasish Ghosh 
>> wrote:
>>
>>> Thanks for the pointer .. I will try debugging. I am getting this
>>> exception running my application on Kubernetes using the Flink operator
>>> from Lyft.
>>>
>>> regards.
>>>
>>> On Sat, 21 Sep 2019 at 6:44 AM, Dian Fu  wrote:
>>>
 This exception is used internally to get the plan of a job before
 submitting it for execution. It's thrown with special purpose and will be
 caught internally in [1] and will not be thrown to end users usually.

 You could check the following places to find out the cause to this
 problem:
 1. Check the execution environment you used
 2. If you can debug, set a breakpoint at[2] to see if the type of the
 env wrapped in StreamPlanEnvironment is OptimizerPlanEnvironment.
 Usually it should be.

 Regards,
 Dian

 [1]
 https://github.com/apache/flink/blob/e2728c0dddafcfe7fac0652084be6c7fd9714d85/flink-clients/src/main/java/org/apache/flink/client/program/OptimizerPlanEnvironment.java#L76
 [2]
 https://github.com/apache/flink/blob/15a7f978a2e591451c194c41408a12e64d04e9ba/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/environment/StreamPlanEnvironment.java#L57

 On Sep 21, 2019, at 4:14 AM, Debasish Ghosh wrote:

 Hi -

 When you get an exception stack trace like this ..

 Caused by:
 org.apache.flink.client.program.OptimizerPlanEnvironment$ProgramAbortException
 at
 org.apache.flink.streaming.api.environment.StreamPlanEnvironment.execute(StreamPlanEnvironment.java:66)
 at
 org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1507)
 at
 org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:654)

 what is the recommended approach of debugging ? I mean what kind of
 errors can potentially lead to such a stacktrace ? In my case it starts
 from env.execute(..) but does not give any information as to what can
 go wrong.

 Any help will be appreciated.

 regards.

 --
 Debasish Ghosh
 http://manning.com/ghosh2
 http://manning.com/ghosh

 Twttr: @debasishg
 Blog: http://debasishg.blogspot.com
 Code: http://github.com/debasishg


 --
>>> Sent from my iPhone
>>>
>>
>>
>> --
>> Debasish Ghosh
>> http://manning.com/ghosh2
>> http://manning.com/ghosh
>>
>> Twttr: @debasishg
>> Blog: http://debasishg.blogspot.com
>> Code: http://github.com/debasishg
>>
>


Re: Contribute as freelancer on a flink project from scratch

2019-09-16 Thread Zili Chen
Hi Deepak,

There was a similar thread[1] recently. Could you please check it out first?

Briefly, Flink hasn't maintained a good-first-issue list so far. But the
community is open to answering your questions, and if you provide more
information about the topics you are interested in, members who are
familiar with those topics can provide more assistance and guidance.

Best,
tison.

[1]
https://lists.apache.org/x/thread.html/4bb8549495917d312eec8d348087a07a650982f35df4797da0d86455@%3Cdev.flink.apache.org%3E


Deepak Sharma wrote on Mon, Sep 16, 2019 at 11:20 PM:

> Hi All
> I am looking to contribute on a flink project (May be a enterprise use
> case)
> Please do let me know if anyone got such opportunities.
>
> Thanks
> Deepak
> --
> Thanks
> Deepak
> www.bigdatabig.com
> www.keosha.net
>


FLINK WEEKLY 2019/37

2019-09-15 Thread Zili Chen
FLINK WEEKLY 2019/37 <https://zhuanlan.zhihu.com/p/82657420>

Happy Mid-Autumn Festival >_<

It's tison again, sharing last week's progress in Flink: user questions from the USER list, the latest state of community development, and the release of 1.8.2.
USER

With multiple keyBy calls, does only the last one take effect?
<https://lists.apache.org/x/thread.html/778fc353602525993ca9813af09d2a4e8e555ef4d7b07faf5a218560@%3Cuser-zh.flink.apache.org%3E>

Indeed so; the reply also offers a possible solution for doing complex grouping on different keys

Kafka and exactly-once
<https://lists.apache.org/x/thread.html/e4fc8d9c2092e5af743f3c1f222f3004bd9efac329acf34159ba2e1d@%3Cuser-zh.flink.apache.org%3E>
Reply: Kafka and exactly-once
<https://lists.apache.org/x/thread.html/9973126f1deec91e87a552c9c53979c4276733f297fb739e11cf4090@%3Cuser-zh.flink.apache.org%3E>

A common question: Flink guarantees exactly-once consumption of data inside the framework
itself; achieving exactly-once across the whole pipeline with external systems such as Kafka
requires extra support, e.g. Kafka's transaction feature
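A minimal sketch of the Kafka side of that extra support (assuming the universal Kafka
connector of that era; the topic and bootstrap servers are illustrative, and downstream
consumers must read with isolation.level=read_committed):

import java.util.Properties

import org.apache.flink.streaming.connectors.kafka.{FlinkKafkaProducer, KafkaSerializationSchema}
import org.apache.kafka.clients.producer.ProducerRecord

object ExactlyOnceSink {
  val props = new Properties()
  props.setProperty("bootstrap.servers", "localhost:9092")

  val schema = new KafkaSerializationSchema[String] {
    override def serialize(element: String, timestamp: java.lang.Long): ProducerRecord[Array[Byte], Array[Byte]] =
      new ProducerRecord[Array[Byte], Array[Byte]]("my-topic", element.getBytes("UTF-8"))
  }

  // EXACTLY_ONCE writes inside Kafka transactions, extending Flink's internal
  // guarantee across the Flink-to-Kafka boundary.
  val producer = new FlinkKafkaProducer[String](
    "my-topic", schema, props, FlinkKafkaProducer.Semantic.EXACTLY_ONCE)
}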

Discussion: large Flink state reads hit disk, disk I/O saturates, and jobs interfere with each other
<https://lists.apache.org/x/thread.html/bea276d69a49212619e846ef5f9b132b2d4265de635922316993b9eb@%3Cuser-zh.flink.apache.org%3E>

Wesley Peng offered a technique and a line of thinking for tuning RocksDB

flink-table-api-java fails to compile when building Flink 1.9
<https://lists.apache.org/x/thread.html/45cb35f36835a825a03fd86f82113c66a41d486521de045473a746e3@%3Cuser-zh.flink.apache.org%3E>

The problem itself has little to do with Flink; it is caused by a JDK bug, and upgrading to a newer JDK minor version fixes it

Will there be a Flink 1.9.1 release ?
<https://lists.apache.org/x/thread.html/d40ed5a338bf32d516b7b1590fe3d0e37f3607466b5bb74234ad1725@%3Cuser.flink.apache.org%3E>

A question about whether the 1.9.1 release will include a certain fix; the reply explains
the Flink community's release policy: a bugfix release does not include every new commit on
master

[SURVEY] How many people are using customized RestartStrategy(s)
<https://lists.apache.org/x/thread.html/8914256debb28b69a967fa3ae3a01a111b4987ae4d8bf8ea7277c92a@%3Cuser.flink.apache.org%3E>

Zhu Zhu started a survey on user-customized restart strategies; if you use a restart strategy you implemented yourself, you are welcome to take part

How do I start a Flink application on my Flink+Mesos cluster?
<https://lists.apache.org/x/thread.html/ad2cb86ba1fa35481f734d30b06c5afadf05ae799e51958a60229e28@%3Cuser.flink.apache.org%3E>

Hands-on practice with Flink on Mesos
DEV

Interested in contributing and looking for good first issues
<https://lists.apache.org/x/thread.html/4bb8549495917d312eec8d348087a07a650982f35df4797da0d86455@%3Cdev.flink.apache.org%3E>

The Flink community does not yet have a dedicated good-first-issue label marking issues
suitable for newcomers, but any contributor can always speak up on the corresponding JIRA or
the dev mailing list to ask for help

[DISCUSS] Drop older versions of Kafka Connectors (0.9, 0.10) for Flink 1.10
<https://lists.apache.org/x/thread.html/58a84b9cac5e8fbb36e34a42237a778a7f2d4012f461742d15180f3e@%3Cuser.flink.apache.org%3E>

Stephan Ewen started a discussion on removing the Kafka 0.9 and 0.10 connectors; if you use
these connector versions in production and want to keep up with the latest Flink, join the
discussion and share your scenarios and needs

[DISCUSS] Retrieval services in non-high-availability scenario
<https://lists.apache.org/x/thread.html/584d84e2ac60de24a2ae1e814608f437057b3c4a264743ce90d21d0f@%3Cdev.flink.apache.org%3E>

tison started a discussion about Flink providing non-fault-tolerant retrieval services in
the non-HA scenario. The original plan was to unify the two existing incomplete
implementations into a single location-transparent implementation, but out of stability
concerns the community's consensus was to split the work into finer-grained steps for
implementation and review

[DISCUSS] modular built-in functions
<https://lists.apache.org/x/thread.html/cd2dd1376915b3c64c0bd90c189b0faf60b1f6e17fdee8e2825dff2f@%3Cdev.flink.apache.org%3E>

Bowen Li started a discussion on loading external built-in functions via modules, a branch
of the discussion about the Flink function catalog

Call for approving Elasticsearch 7.x connector
<https://lists.apache.org/x/thread.html/74b8c35bf58ffab0f9b72ab32ec9ea808038c4bc1a9171d1f90f3bf2@%3Cdev.flink.apache.org%3E>

Vino Yang revived the discussion on an ES 7.x connector; since the ES API is incompatible across major versions, FLINK needs a separate connector for ES 7.x

[DISCUSS] Contribute Pulsar Flink connector back to Flink
<https://lists.apache.org/x/thread.html/88960831be9ce316c45ec48aeece946ca90c944fa34ddcd39083c6c2@%3Cdev.flink.apache.org%3E>

The Pulsar community's request to contribute its connector appears to have been accepted; this time no fewer than two or three committers said they will follow its progress and review the code. The conflict between this connector and
FLIP-27 is still under discussion, but the new connector is very likely to land in the 1.10 feature set
NEWS

[ANNOUNCE] Apache Flink 1.8.2 released
<https://lists.apache.org/x/thread.html/50e9a48154f1e497d28443583dbd754540408a0f361249b3c5f9418f@%3Cuser.flink.apache.org%3E>

Jark Wu released FLINK 1.8.2. Bugfix releases usually contain important fixes; check the mailing list and the linked pages for the bugs fixed in 1.8.2
to decide whether you should upgrade, or whether upgrading solves a problem you are currently hitting

[ANNOUNCE] Zili Chen becomes a Flink committer
<https://lists.apache.org/x/thread.html/384c91ce1dc6ae5a437f2d87d9878727f893d2a7182bae9831a32502@%3Cdev.flink.apache.org%3E>

Little tison has become a FLINK committer >_< My main focus is the FLINK runtime, so while compiling FLINK
WEEKLY I often run into threads I cannot fully follow; if you know FLINK reasonably well, you are welcome to contact me and write FLINK WEEKLY
together. I have merely done a small amount of humble work
Best,
tison.


Re: Will there be a Flink 1.9.1 release ?

2019-09-15 Thread Zili Chen
Hi Debasish,

From the information in the corresponding JIRA[1], 1.9.1 is not a fix
version of the issue you referred to. Technically, Flink back-ports notable fixes to
the branch release-1.9 and starts the release of 1.9.1 from that branch.

Visually, it looks like

... - ... - PR#9565 - ... - master
             \- release-1.9 (for 1.9.0) - ... (without PR#9565) - ... - release-1.9 (for 1.9.1)

You can request a back port of the commit to release-1.9 by providing a
persuasive reason,
and if the community reaches a consensus to include it, it will be
included in
release-1.9.1.

Best,
tison.

[1] https://issues.apache.org/jira/browse/FLINK-12501


On Mon, Sep 9, 2019 at 6:20 PM Debasish Ghosh  wrote:

> Thanks Kurt. I was just asking as it would help us a lot with the issue (
> https://github.com/apache/flink/pull/9565) that I mentioned in my mail.
> It got merged recently (after the 1.9.0 release).
>
> BTW, the issue you mentioned isn't fixed in 1.9.1.
>
>
> Would you please explain what you mean by this? The issue that I mentioned got
> merged recently, after the 1.9.0 release.
>
> regards.
>
> On Mon, Sep 9, 2019 at 3:34 PM Kurt Young  wrote:
>
>> Hi Debasish,
>>
>> I think there is a good chance to have 1.9.1; the only question is when.
>> 1.9.0 was released ~2 weeks ago, and I think some users are still in the middle of
>> migrating to 1.9.0. Waiting another 1 or 2 weeks and also
>> seeing whether there are critical bugs in 1.9.0 sounds reasonable to
>> me.
>>
>> BTW, the issue you mentioned isn't fixed in 1.9.1.
>>
>> Best,
>> Kurt
>>
>>
>> On Mon, Sep 9, 2019 at 5:54 PM Kostas Kloudas  wrote:
>>
>>> Hi Debasish,
>>>
>>> So far I am not aware of any concrete timeline for Flink 1.9.1 but
>>> I think that Gordon and Kurt (cc'ed) who were the release-1.9
>>> managers are the best to answer this question.
>>>
>>> Cheers,
>>> Kostas
>>>
>>> On Mon, Sep 9, 2019 at 9:38 AM Debasish Ghosh 
>>> wrote:
>>> >
>>> > Hello -
>>> >
>>> > Is there a plan for a Flink 1.9.1 release in the short term ? We are
>>> using Flink and Avro, with Avrohugger generating Scala case classes from
>>> Avro schemas. Hence we need https://github.com/apache/flink/pull/9565
>>> which was closed recently.
>>> >
>>> > regards.
>>> >
>>> >
>>> > --
>>> > Debasish Ghosh
>>> > http://manning.com/ghosh2
>>> > http://manning.com/ghosh
>>> >
>>> > Twttr: @debasishg
>>> > Blog: http://debasishg.blogspot.com
>>> > Code: http://github.com/debasishg
>>>
>>
>
> --
> Debasish Ghosh
> http://manning.com/ghosh2
> http://manning.com/ghosh
>
> Twttr: @debasishg
> Blog: http://debasishg.blogspot.com
> Code: http://github.com/debasishg
>


FLINK WEEKLY 2019/36

2019-09-08 Thread Zili Chen
FLINK WEEKLY 2019/36 

Happy to share last week's developments in the FLINK community. Over the past week, more FLINK 1.10 features were proposed and discussed, including new FLIPs and the connector contribution from the Apache
Pulsar community. The discussion dedicated to which features FLINK 1.10 should implement is also under way.
User questions

Questions about the Streaming File Sink


How to configure nameservices for correct addressing when the cluster running the FLINK job and the HDFS cluster receiving its output are not the same

About Flink SQL DISTINCT


How windowed deduplication for FLINK SQL DISTINCT is implemented (a sketch of the typical query follows below)
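
A sketch of the kind of windowed distinct aggregation discussed in that thread; the table name `clicks`, the event-time attribute `ts` and the field `user_id` are assumptions made for illustration, and the table must have been registered on the environment beforehand.

```scala
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api.scala.StreamTableEnvironment

val env = StreamExecutionEnvironment.getExecutionEnvironment
val tableEnv = StreamTableEnvironment.create(env)

// COUNT(DISTINCT ...) deduplicates user_id within each one-hour tumbling window.
val uv = tableEnv.sqlQuery(
  """
    |SELECT TUMBLE_START(ts, INTERVAL '1' HOUR) AS window_start,
    |       COUNT(DISTINCT user_id) AS uv
    |FROM clicks
    |GROUP BY TUMBLE(ts, INTERVAL '1' HOUR)
  """.stripMargin)
```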

DDL support in flink1.9.0


FLINK 1.9.0 supports CREATE VIEW only through the SQL CLI

How to optimize flink memory usage?


A particular workload's sliding windows consumed a lot of memory; community members shared the solutions or workarounds they use in their own scenarios

TIMESTAMP type error for UDFs with the blink SQL planner in flink1.9


The BLINK planner handles TIMESTAMP differently from the FLINK planner; this was confirmed as a defect and will be fixed in 1.10

Making broadcast state queryable?


A community member's improvement suggestion for queryable state; at the moment not enough committers
can participate in the existing design discussions. If more users voice a need for queryable state, the community may re-prioritize the feature

Post-processing batch JobExecutionResult


A batch-job request to keep processing the job result after env.execute. Except for submission to a session cluster, the way FLINK
implements job submission makes further processing impossible today. The ongoing Client API rework should improve this

Flink SQL client support for running in Flink cluster


The FLINK SQL Client can only interact with a pre-deployed standalone session cluster and is a fairly basic implementation. It is likewise limited by the current
Client API and is expected to improve together with the Client API rework
Development progress

FLINK-13954 Clean up ExecutionEnvironment / JobSubmission code paths


Part of the Client API rework: refactoring ExecutionEnvironment and the legacy job-submission code paths is in progress

FLINK-13958 Job class loader may not be reused after batch job recovery


Under the new region-based restart recovery mode for batch jobs, a native library may end up being loaded
twice by the job class loader

[DISCUSS] Support JSON functions in Flink SQL


Xu Forward started a discussion on supporting JSON functions in FLINK SQL

[DISCUSS] Reducing build times


Chesnay Schepler's earlier discussion on reducing FLINK CI build times made new progress: the community is discussing moving FLINK's CI to a system other than
Travis, so that the e2e tests can also run for every pull request commit

[DISCUSS] Contribute Pulsar Flink connector back to Flink


Yijie Shen from the Apache Pulsar community proposed contributing the connector for FLINK 1.9.0 and Pulsar 2.4.0
to the FLINK community. Since the Pulsar connector went through the same request before, and the resulting pull
request was then left unattended, the FLINK and Pulsar communities are discussing a suitable way to contribute and maintain the connector

[DISCUSS] FLIP-61 Simplify Flink's cluster level RestartStrategy
configuration


Till Rohrmann's FLIP on simplifying FLINK's cluster-level restart strategy configuration has been accepted and is being implemented

[DISCUSS] FLIP-62: Set default restart delay for FixedDelay- and
FailureRateRestartStrategy to 1s


Till Rohrmann's FLIP to set the default restart delay to a non-zero value, which helps avoid hammering systems outside FLINK with immediate restart storms. The FLIP
has been accepted and is being implemented

FLIP-63: Rework table partition support


Jingsong Lee proposed FLIP-63, which aims to rework the Table partition support

[DISCUSS] FLIP-64: Support for Temporary Objects in Table module


Dawid Wysakowicz proposed FLIP-64 to support temporary objects in the Table module, rounding out the Catalog API

[DISCUSS] FLIP-66: Support time attribute in SQL DDL


Jark Wu proposed FLIP-66 to support time attributes in SQL DDL, which will let users apply window operations to tables created via DDL
Community news

[DISCUSS] Features for Apache Flink 1.10


Gary Yao started the discussion on features for FLINK 1.10, to get an initial picture of what functionality and improvements 1.10 will bring. Gary
also proposed himself and Yu Li as the release managers for 1.10

[ANNOUNCE] Kostas Kloudas joins the Flink PMC

Re: Post-processing batch JobExecutionResult

2019-09-06 Thread Zili Chen
Attach the missing link.

[1]
https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit#heading=h.s8r4pkyalskt
[2]
https://lists.apache.org/x/thread.html/ce99cba4a10b9dc40eb729d39910f315ae41d80ec74f09a356c73938@


On Sat, Sep 7, 2019 at 12:52 AM Zili Chen  wrote:

> Besides, if you submit the job via the Jar Run REST API, it is also
> OptimizerPlanEnvironment that is used. So again, _no_ post
> processing support at the moment.
>
>
> On Sat, Sep 7, 2019 at 12:51 AM Zili Chen  wrote:
>
>> Hi spoganshev,
>>
>> If you deploy in per-job mode, OptimizerPlanEnvironment would be used,
>> and thus,
>> as you pointed out, there is _no_ way to post-process the
>> JobExecutionResult.
>> We, the community, regard this situation as a shortcoming and work on an
>> enhancement
>> to enable you to get a JobClient as the return value of #execute in all
>> deployment
>> and execution modes. Take a look at [1] and [2] for a preview and feel
>> free to describe
>> your requirement so that a following version can satisfy your demand.
>>
>> Besides, if you deploy in session mode, which might be more natural in
>> batch cases,
>> at the moment ContextEnvironment is used, which executes normally and
>> returns the
>> JobExecutionResult that you can make use of.
>>
>> To simply sum up, you can try out session mode deployment to see if it
>> satisfies your
>> requirement on post processing.
>>
>> Best,
>> tison.
>>
>>
>> On Sat, Sep 7, 2019 at 12:07 AM Zhu Zhu  wrote:
>>
>>> Hi spoganshev,
>>>
>>> The *OptimizerPlanEnvironment* is for creating the optimized plan only, as
>>> described in its javadoc:
>>> "An {@link ExecutionEnvironment} that never executes a job but only
>>> creates the optimized plan."
>>> Its execute() is invoked with some internal handling so that it only
>>> generates the optimized plan and does not actually submit a job.
>>> Some other execution environment will execute the job instead.
>>>
>>> Not sure how you created your ExecutionEnvironment?
>>> Usually for DataSet jobs, it should be created in the way as below.
>>> "final ExecutionEnvironment env =
>>> ExecutionEnvironment.getExecutionEnvironment();"
>>>
>>> Thanks,
>>> Zhu Zhu
>>>
>>> On Fri, Sep 6, 2019 at 11:39 PM spoganshev  wrote:
>>>
>>>> Due to OptimizerPlanEnvironment.execute() throwing an exception on the
>>>> last line,
>>>> there is no way to post-process the batch job execution result, like:
>>>>
>>>> JobExecutionResult r = env.execute(); // execute batch job
>>>> analyzeResult(r); // this will never get executed due to plan
>>>> optimization
>>>>
>>>>
>>>> https://github.com/apache/flink/blob/master/flink-clients/src/main/java/org/apache/flink/client/program/OptimizerPlanEnvironment.java#L54
>>>>
>>>> Is there any way to allow such post-processing in batch jobs?
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Sent from:
>>>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>>>>
>>>
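
To make the session-mode path described above concrete, here is a minimal sketch, assuming submission to a 1.9-era session cluster where execute() actually returns; the job itself is a throwaway example.

```scala
import org.apache.flink.api.java.io.DiscardingOutputFormat
import org.apache.flink.api.scala._

// In session mode execute() returns a JobExecutionResult that can be inspected.
val env = ExecutionEnvironment.getExecutionEnvironment
env.fromElements(1, 2, 3).map(_ * 2).output(new DiscardingOutputFormat[Int])
val result = env.execute("post-processing demo")
println(s"job ${result.getJobID} finished in ${result.getNetRuntime} ms")
```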


Re: Post-processing batch JobExecutionResult

2019-09-06 Thread Zili Chen
Besides, if you submit the job via the Jar Run REST API, it is also
OptimizerPlanEnvironment that is used. So again, _no_ post
processing support at the moment.


On Sat, Sep 7, 2019 at 12:51 AM Zili Chen  wrote:

> Hi spoganshev,
>
> If you deploy in per-job mode, OptimizerPlanEnvironment would be used, and
> thus,
> as you pointed out, there is _no_ way to post-process the
> JobExecutionResult.
> We, the community, regard this situation as a shortcoming and work on an
> enhancement
> to enable you to get a JobClient as the return value of #execute in all
> deployment
> and execution modes. Take a look at [1] and [2] for a preview and feel free
> to describe
> your requirement so that a following version can satisfy your demand.
>
> Besides, if you deploy in session mode, which might be more natural in
> batch cases,
> at the moment ContextEnvironment is used, which executes normally and
> returns the
> JobExecutionResult that you can make use of.
>
> To simply sum up, you can try out session mode deployment to see if it
> satisfies your
> requirement on post processing.
>
> Best,
> tison.
>
>
>> On Sat, Sep 7, 2019 at 12:07 AM Zhu Zhu  wrote:
>
>> Hi spoganshev,
>>
>> The *OptimizerPlanEnvironment* is for creating the optimized plan only, as
>> described in its javadoc:
>> "An {@link ExecutionEnvironment} that never executes a job but only
>> creates the optimized plan."
>> Its execute() is invoked with some internal handling so that it only
>> generates the optimized plan and does not actually submit a job.
>> Some other execution environment will execute the job instead.
>>
>> Not sure how you created your ExecutionEnvironment?
>> Usually for DataSet jobs, it should be created in the way as below.
>> "final ExecutionEnvironment env =
>> ExecutionEnvironment.getExecutionEnvironment();"
>>
>> Thanks,
>> Zhu Zhu
>>
>> On Fri, Sep 6, 2019 at 11:39 PM spoganshev  wrote:
>>
>>> Due to OptimizerPlanEnvironment.execute() throwing an exception on the last
>>> line,
>>> there is no way to post-process the batch job execution result, like:
>>>
>>> JobExecutionResult r = env.execute(); // execute batch job
>>> analyzeResult(r); // this will never get executed due to plan
>>> optimization
>>>
>>>
>>> https://github.com/apache/flink/blob/master/flink-clients/src/main/java/org/apache/flink/client/program/OptimizerPlanEnvironment.java#L54
>>>
>>> Is there any way to allow such post-processing in batch jobs?
>>>
>>>
>>>
>>>
>>> --
>>> Sent from:
>>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>>>
>>


Re: [ANNOUNCE] Kostas Kloudas joins the Flink PMC

2019-09-06 Thread Zili Chen
Congrats Klou!

Best,
tison.


On Fri, Sep 6, 2019 at 9:23 PM Till Rohrmann  wrote:

> Congrats Klou!
>
> Cheers,
> Till
>
> On Fri, Sep 6, 2019 at 3:00 PM Dian Fu  wrote:
>
>> Congratulations Kostas!
>>
>> Regards,
>> Dian
>>
>> > On Sep 6, 2019, at 8:58 PM, Wesley Peng  wrote:
>> >
>> > On 2019/9/6 8:55 下午, Fabian Hueske wrote:
>> >> I'm very happy to announce that Kostas Kloudas is joining the Flink
>> PMC.
>> >> Kostas has been contributing to Flink for many years and puts lots of effort
>> into helping our users and growing the Flink community.
>> >> Please join me in congratulating Kostas!
>> >
>> > congratulation Kostas!
>> >
>> > regards.
>>
>>


Re: Request for compiled Flink binaries, version 1.6 or above

2019-09-06 Thread Zili Chen
Hi, the official download page provides compiled binary releases. Since you asked for versions *above* 1.6:
the website offers binaries for 1.7.2, 1.8.1 and 1.9.0.

具体见 https://flink.apache.org/downloads.html

Best,
tison.


On Fri, Sep 6, 2019 at 5:16 PM Wesley Peng  wrote:

> Helo
>
> guaishushu1...@163.com wrote:
> > Building the flink source in a VM environment on Windows keeps failing, so I hope to get a compiled binary of flink 1.6 or above
>
> there is docker image for flink 1.9. since you are using a virtual
> machine, then docker might be used.
>
> regards
>


FLINK WEEKLY 2019/35

2019-09-01 Thread Zili Chen
FLINK WEEKLY 2019/35 

The FLINK community is busy building the new features of 1.10, and many discussions of FLINK's
current limitations, covering functionality, configuration and documentation, are in full swing. Last week the user-zh
list became much more active, and community developers and users answered user questions very quickly; the growth of the Chinese-speaking FLINK community is plain to see. As usual, this issue covers user-list Q&A, FLINK
development progress and community events.
USER

flink 1.9 fails consuming from kafka


The actual problem was how to use the BLINK planner; developers from Alibaba explained the right way to use it.

flink1.9 blink planner table ddl usage questions

flink1.9
Blink planner create view questions


Again questions about the right way to use the BLINK planner (see the sketch below).
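
For reference, a minimal sketch of opting into the Blink planner in 1.9, which is the gist of the advice in these threads:

```scala
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.table.api.EnvironmentSettings
import org.apache.flink.table.api.scala.StreamTableEnvironment

// Without explicit settings, 1.9 still defaults to the old planner.
val env = StreamExecutionEnvironment.getExecutionEnvironment
val settings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build()
val tableEnv = StreamTableEnvironment.create(env, settings)
```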

The elasticSearch table sink is overly complex to construct


How to wire query results into an ES sink.

RocksDB serialization with the Flink state backend


Upgrade to FLINK 1.8 to get POJO Schema Evolution support for evolving state schemas.

Using checkpoints


How to restore a job from a checkpoint rather than a savepoint; the parallelism can be adjusted to some extent on restore.

FLINK 1.9 Docker images 

The FLINK 1.9 Docker images have been published, with variants for both Scala 2.11 and 2.12.

How can TMs distribute evenly over Flink on YARN cluster?


FLINK currently cannot guarantee that the TMs of a job started on YARN are spread across different nodes.

type error with generics


The FLINK Java API sometimes needs type information to be supplied by hand; in Scala, implicits make the two APIs behave rather differently at times (see the sketch below).
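
A small sketch of why the two APIs differ: the Scala API can derive the type information implicitly, while the Java API often needs it spelled out on generic transformations. The case class here is made up for illustration.

```scala
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.scala._

// createTypeInformation is the macro-backed implicit the Scala API relies on;
// the Java API has no such mechanism and may require .returns(...) instead.
case class Event(id: Long, payload: String)
val info: TypeInformation[Event] = createTypeInformation[Event]
```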

Re: Flink operators for Kubernetes


A FLINK operator for k8s has been built by members of the Apache Beam community; anyone who needs FLINK on k8s can give it a try.

Is there Go client for Flink?


FLINK currently offers only a Java client and the REST API; Go users can submit and monitor FLINK
jobs through the REST API.

How to handle Flink Job with 400MB+ Uberjar with 800+ containers ?


Best practices for large FLINK jobs with large uberjars, mainly constrained by some shortcomings of the FLINK Resource Manager.
Developers from Alibaba and Tencent shared their own ways of handling big jobs with big jars.
DEV

[DISCUSS] FLIP-57 - Rework FunctionCatalog


Bowen Li's FLIP-57 aims at a better experience for developing and writing FLINK SQL.

[DISCUSS] FLIP-60: Restructure the Table API & SQL documentation


Timo Walther's FLIP-60 promotes the Table API & SQL documentation from an appendix of the DataStream API
to first-class documentation. Many FLINK users write their jobs in SQL, so better docs improve the experience of looking things up during development.

[DISCUSS] FLIP-59: Enable execution configuration from Configuration object


Dawid Wysakowicz's FLIP-59 is closely related to FLIP-54; both focus on improving FLINK's configuration story. Today FLINK's
execution configuration can only be set from within the program while writing it, unlike many other settings that can be passed via config files or command-line arguments.

[DISCUSS] Simplify Flink's cluster level RestartStrategy configuration


Till Rohrmann started a discussion on simplifying FLINK's cluster-level restart strategy configuration; it
has grown complicated over time, mainly because many default behaviours exist besides the recommended restart-strategy setting.

Re: [DISCUSS] Flink client api enhancement for downstream project


Kostas Kloudas posted an update on the Client API rework: prototypes of JobClient and of the Executor
for multiple deployment backends, following the design doc, are under development.
NEWS

[ANNOUNCE] Apache Flink-shaded 8.0 released


Apache Flink-shaded 8.0 has been released, with Chesnay Schepler as the release manager this time; this project provides
FLINK's shaded dependencies.

[DISCUSS] Releasing Flink 1.8.2


jincheng sun 发起了 FLINK 1.8.2 的发布讨论,有望在近期发布 1.8.2 版本。

Best,
tison.


Re: [SURVEY] How do you use high-availability services in Flink?

2019-08-28 Thread Zili Chen
Thanks for your email Aleksandar! Sorry for the late reply.

May I ask a question: do you configure high-availability.storageDir in your
case?
That is, do you persist and retrieve job graphs & checkpoints entirely in MapDB,
or, as the ZooKeeper implementation does, persist them in an external filesystem
and just store a handle in MapDB?
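
For readers following the survey, the two knobs in question can be sketched as below; the factory class name is hypothetical (made up to stand in for a custom HighAvailabilityServicesFactory), and the storage path is a placeholder too.

```scala
import org.apache.flink.configuration.{Configuration, HighAvailabilityOptions}

// HA_MODE ("high-availability") can point at a custom factory FQN;
// HA_STORAGE_PATH ("high-availability.storageDir") is the external storage dir.
val conf = new Configuration()
conf.setString(HighAvailabilityOptions.HA_MODE, "org.example.MapDbHaServicesFactory") // made-up name
conf.setString(HighAvailabilityOptions.HA_STORAGE_PATH, "file:///mnt/k8s-volume/flink/ha")
```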

Best,
tison.


On Sat, Aug 24, 2019 at 7:04 AM Aleksandar Mastilovic  wrote:

> Hi all,
>
> Since I’m currently working on an implementation of
> HighAvailabilityServicesFactory I thought it would be good to report here
> about my experience so far.
>
> Our use case is cloud based, where we package Flink and our supplementary
> code into a docker image, then run those images through Kubernetes+Helm
> orchestration.
>
> We don’t use Hadoop nor HDFS but rather Google Cloud Storage, and we don’t
> run ZooKeepers. Our Flink setup consists of one JobManager and multiple
> TaskManagers on-demand.
>
> Due to the nature of cloud computing there’s a possibility our JobManager
> instance might go down, only to be automatically recreated through
> Kubernetes. Since we don’t run ZooKeeper,
> we needed a way to run a variant of a High Availability cluster where we
> would keep JobManager information on our attached persistent k8s volume
> instead of in ZooKeeper.
> We found this (
> https://stackoverflow.com/questions/52104759/apache-flink-on-kubernetes-resume-job-if-jobmanager-crashes/52112538)
> post on StackOverflow and decided to give it a try.
>
> So far we have a setup that seems to be working on our local deployment,
> we haven’t yet tried it in the actual cloud.
>
> As far as implementation goes, here’s what we did:
>
> We used MapDB (mapdb.org) as our storage format, to persist lists of
> objects onto disk. We partially relied on StandaloneHaServices for our
> HaServices implementation. Otherwise we looked at the ZooKeeperHaServices
> and related classes for inspiration and guidance.
>
> Here’s a list of new classes:
>
> FileSystemCheckpointIDCounter implements CheckpointIDCounter
> FileSystemCheckpointRecoveryFactory implements CheckpointRecoveryFactory
> FileSystemCompletedCheckpointStore implements CompletedCheckpointStore
> FileSystemHaServices extends StandaloneHaServices
> FileSystemHaServicesFactory implements HighAvailabilityServicesFactory
> FileSystemSubmittedJobGraphStore implements SubmittedJobGraphStore
>
> Testing so far proved that bringing down a JobManager and bringing it back
> up does indeed restore all the running jobs. Job creation/destruction also
> works.
>
> Hope this helps!
>
> Thanks,
> Aleksandar Mastilovic
>
> On Aug 21, 2019, at 12:32 AM, Zili Chen  wrote:
>
> Hi guys,
>
> We want to have an accurate idea of how users actually use
> high-availability services in Flink, especially how you customize
> high-availability services by HighAvailabilityServicesFactory.
>
> Basically there are standalone impl., zookeeper impl., embedded impl.
> used in MiniCluster, YARN impl. not yet implemented, and a gate to
> customized implementations.
>
> Generally I think standalone impl. and zookeeper impl. are the most
> widely used implementations. The embedded impl. is used without
> awareness when users run a MiniCluster.
>
> Besides that, it is helpful to know how you guys customize
> high-availability services using HighAvailabilityServicesFactory
> interface for the ongoing FLINK-10333[1] which would evolve
> high-availability services in Flink. As well as whether there is any
> user take interest in the not yet implemented YARN impl..
>
> Any user case should be helpful. I really appreciate your time and your
> insight.
>
> Best,
> tison.
>
> [1] https://issues.apache.org/jira/browse/FLINK-10333
>
>
>


Re: flink log level question

2019-08-27 Thread Zili Chen
Another idea: if you can download the logs, you can simply post-process the text and pick out the ERROR lines (x

Best,
tison.


On Tue, Aug 27, 2019 at 6:56 PM 陈思 <58683...@qq.com> wrote:

> Goal: set the flink job's log level to ERROR
>
>
> Background: at our company, flink jobs can only be submitted through an in-house visual platform, so we cannot modify the log4j.properties file on the cluster. The current log level is INFO, and the flood of logs makes troubleshooting hard
>
> Current state: we planned to set the log level in code, calling LogManager.getRootLogger().setLevel(Level.ERROR) inside an operator's open method, but INFO logs are still emitted. Does anyone have a solution?


Re: where can I find the blink docs for flink1.9

2019-08-26 Thread Zili Chen
The Blink docs should all be at [1]; they do not seem to change along with Flink releases (x

Best,
tison.

[1] https://github.com/apache/flink/blob/blink/README.md


On Tue, Aug 27, 2019 at 10:18 AM rockey...@163.com  wrote:

>
> hi,all
> Where can I find the docs about blink in flink1.9? I have searched for ages without luck 0.0
>
>
> rockey...@163.com
> Have a good day !
>


Re: [ANNOUNCE] Apache Flink 1.9.0 released

2019-08-26 Thread Zili Chen
Hi Oytun,

I think it intends to publish flink-queryable-state-client-java
without a scala suffix since it is scala-free. An artifact without
the scala suffix has been published [2].

See also [1].

Best,
tison.

[1] https://issues.apache.org/jira/browse/FLINK-12602
[2]
https://mvnrepository.com/artifact/org.apache.flink/flink-queryable-state-client-java/1.9.0



On Mon, Aug 26, 2019 at 3:50 PM Till Rohrmann  wrote:

> The missing support for the Scala shell with Scala 2.12 was documented in
> the 1.7 release notes [1].
>
> @Oytun, the docker image should be updated in a bit. Sorry for the
> inconveniences. Thanks for the pointer that
> flink-queryable-state-client-java_2.11 hasn't been published. We'll upload
> this in a bit.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.7/release-notes/flink-1.7.html#scala-shell-does-not-work-with-scala-212
>
> Cheers,
> Till
>
> On Sat, Aug 24, 2019 at 12:14 PM chaojianok  wrote:
>
>> Congratulations and thanks!
>> At 2019-08-22 20:03:26, "Tzu-Li (Gordon) Tai" 
>> wrote:
>> >The Apache Flink community is very happy to announce the release of
>> Apache
>> >Flink 1.9.0, which is the latest major release.
>> >
>> >Apache Flink® is an open-source stream processing framework for
>> >distributed, high-performing, always-available, and accurate data
>> streaming
>> >applications.
>> >
>> >The release is available for download at:
>> >https://flink.apache.org/downloads.html
>> >
>> >Please check out the release blog post for an overview of the
>> improvements
>> >for this new major release:
>> >https://flink.apache.org/news/2019/08/22/release-1.9.0.html
>> >
>> >The full release notes are available in Jira:
>> >
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12344601
>> >
>> >We would like to thank all contributors of the Apache Flink community who
>> >made this release possible!
>> >
>> >Cheers,
>> >Gordon
>>
>


Re: flink 1.9.0 StreamTableEnvironment compile error

2019-08-25 Thread Zili Chen
That should not happen; I can still see this method:

def registerDataStream[T](name: String, dataStream: DataStream[T], fields:
Expression*): Unit

Could you provide more complete context and the error message? (A usage sketch follows below.)

Best,
tison.
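
For completeness, a usage sketch of the overload quoted above, assuming Flink 1.9's flink-table-api-scala-bridge and following the import convention of the 1.9 docs; the table and field names are illustrative.

```scala
import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.scala._

// The 'symbol syntax supplies the Expression* field list.
val env = StreamExecutionEnvironment.getExecutionEnvironment
val tableEnv = StreamTableEnvironment.create(env)
val stream: DataStream[(Long, String)] = env.fromElements((1L, "a"), (2L, "b"))
tableEnv.registerDataStream("events", stream, 'id, 'name)
```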


On Mon, Aug 26, 2019 at 11:38 AM ddwcg <3149768...@qq.com> wrote:

> Thanks for the reply; that was indeed the cause. But afterwards, registering the table could not use Expression.
> It always feels like the java api and the scala api are a bit mixed up now
>
>
> On Aug 26, 2019, at 11:22 AM, Zili Chen  wrote:
>
> Try replacing
>
> import
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment
>
> with
>
> import org.apache.flink.table.api.scala.StreamExecutionEnvironment
>
> It is probably because same-named classes from different packages were imported by accident
>
> Best,
> tison.
>
>
> On Mon, Aug 26, 2019 at 11:12 AM ddwcg <3149768...@qq.com> wrote:
>
> Hi all,
> I upgraded to 1.9.0 over the weekend, but initializing the table env no longer compiles. Could you help me find out why?
>
> import org.apache.flink.streaming.api.CheckpointingMode
> import
> org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup
> import
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment
> import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer
> import org.apache.flink.streaming.util.serialization.SimpleStringSchema
> import org.apache.flink.table.api.scala.StreamTableEnvironment
> import org.apache.flink.table.planner.expressions.StddevPop
> import org.apache.kafka.clients.consumer.ConsumerConfig
> import org.apache.kafka.clients.producer.ProducerConfig
>
> object StreamingJob {
>  def main(args: Array[String]) {
>val kafkaTopic = "source.kafka.topic"
>val jobName ="test"
>val parallelism =1
>val checkPointPath ="checkpoint/"
>val kafkaBrokers =""
>
>// set up the streaming execution environment
>val env = StreamExecutionEnvironment.getExecutionEnvironment
>env.setParallelism(parallelism)
>env.enableCheckpointing(1)
>
>
> env.getCheckpointConfig.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE)
>env.getCheckpointConfig.setMaxConcurrentCheckpoints(1)
>
>
> env.getCheckpointConfig.enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)
>//env.setStateBackend(new FsStateBackend(checkPointPath))
>
>
>val tableEnv = StreamTableEnvironment.create(env)
>
>
> The compiler reports multiple implementations:
>
> The pom file is below:
>
> <dependency>
>   <groupId>org.apache.flink</groupId>
>   <artifactId>flink-scala_${scala.binary.version}</artifactId>
>   <version>${flink.version}</version>
>   <scope>compile</scope>
> </dependency>
> <dependency>
>   <groupId>org.apache.flink</groupId>
>   <artifactId>flink-streaming-scala_${scala.binary.version}</artifactId>
>   <version>${flink.version}</version>
>   <scope>compile</scope>
> </dependency>
> <dependency>
>   <groupId>org.apache.flink</groupId>
>   <artifactId>flink-table-planner-blink_${scala.binary.version}</artifactId>
>   <version>${flink.version}</version>
>   <scope>provided</scope>
> </dependency>
> <dependency>
>   <groupId>org.apache.flink</groupId>
>   <artifactId>flink-table-runtime-blink_2.11</artifactId>
>   <version>1.9.0</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.flink</groupId>
>   <artifactId>flink-connector-kafka_2.11</artifactId>
>   <version>1.9.0</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.flink</groupId>
>   <artifactId>flink-table-common</artifactId>
>   <version>${flink.version}</version>
>   <scope>provided</scope>
> </dependency>
>
>
>
>
>
>


Re: flink 1.9.0 StreamTableEnvironment compile error

2019-08-25 Thread Zili Chen
Try replacing

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment

with

import org.apache.flink.table.api.scala.StreamExecutionEnvironment

It is probably because same-named classes from different packages were imported by accident

Best,
tison.


On Mon, Aug 26, 2019 at 11:12 AM ddwcg <3149768...@qq.com> wrote:

> Hi all,
> I upgraded to 1.9.0 over the weekend, but initializing the table env no longer compiles. Could you help me find out why?
>
> import org.apache.flink.streaming.api.CheckpointingMode
> import 
> org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup
> import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment
> import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer
> import org.apache.flink.streaming.util.serialization.SimpleStringSchema
> import org.apache.flink.table.api.scala.StreamTableEnvironment
> import org.apache.flink.table.planner.expressions.StddevPop
> import org.apache.kafka.clients.consumer.ConsumerConfig
> import org.apache.kafka.clients.producer.ProducerConfig
>
> object StreamingJob {
>   def main(args: Array[String]) {
> val kafkaTopic = "source.kafka.topic"
> val jobName ="test"
> val parallelism =1
> val checkPointPath ="checkpoint/"
> val kafkaBrokers =""
>
> // set up the streaming execution environment
> val env = StreamExecutionEnvironment.getExecutionEnvironment
> env.setParallelism(parallelism)
> env.enableCheckpointing(1)
> 
> env.getCheckpointConfig.setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE)
> env.getCheckpointConfig.setMaxConcurrentCheckpoints(1)
> 
> env.getCheckpointConfig.enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION)
> //env.setStateBackend(new FsStateBackend(checkPointPath))
>
>
> val tableEnv = StreamTableEnvironment.create(env)
>
>
> The compiler reports multiple implementations:
>
> The pom file is below:
>
> <dependency>
>   <groupId>org.apache.flink</groupId>
>   <artifactId>flink-scala_${scala.binary.version}</artifactId>
>   <version>${flink.version}</version>
>   <scope>compile</scope>
> </dependency>
> <dependency>
>   <groupId>org.apache.flink</groupId>
>   <artifactId>flink-streaming-scala_${scala.binary.version}</artifactId>
>   <version>${flink.version}</version>
>   <scope>compile</scope>
> </dependency>
> <dependency>
>   <groupId>org.apache.flink</groupId>
>   <artifactId>flink-table-planner-blink_${scala.binary.version}</artifactId>
>   <version>${flink.version}</version>
>   <scope>provided</scope>
> </dependency>
> <dependency>
>   <groupId>org.apache.flink</groupId>
>   <artifactId>flink-table-runtime-blink_2.11</artifactId>
>   <version>1.9.0</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.flink</groupId>
>   <artifactId>flink-connector-kafka_2.11</artifactId>
>   <version>1.9.0</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.flink</groupId>
>   <artifactId>flink-table-common</artifactId>
>   <version>${flink.version}</version>
>   <scope>provided</scope>
> </dependency>
>
>
>
>


FLINK WEEKLY 2019/34

2019-08-25 Thread Zili Chen
Happy to share last week's developments in the FLINK community. FLINK 1.9.0[1]
was officially released; the major updates include fine-grained recovery (FLIP-1), the State Processor API (FLIP-43),
stop-with-savepoint with strong consistency guarantees (FLIP-34), and the FLINK WebUI
rewritten with Angular 7.x. The release also ships a series of still-in-development preview features, such as Blink's SQL Query Processor, Hive integration, and the new
Python Table API (FLIP-38). Everyone is welcome to download FLINK 1.9.0 and try the new features!

As in the previous WEEKLY[2], FLINK WEEKLY is split into USER, DEV and NEWS
sections, covering answers to user questions, community development progress, and community news.

[0] https://zhuanlan.zhihu.com/p/79781544
[1] https://flink.apache.org/news/2019/08/22/release-1.9.0.html
[2] https://zhuanlan.zhihu.com/p/78753149

@USER

[3] build Flink master branch fail due to npm

When building the FLINK project from source, npm problems (usually network issues) sometimes make the build slow or fail. Since npm
is only used to build FLINK's WebUI, you can pass `-Dskip.npm` to maven to skip the npm step and cut the build
time.

[4] Questions about the Flink Kafka Connector

Consistency guarantees when FLINK and Kafka work together, and the definitions and meaning of the several offsets involved.

[5] flink1.9.0 LOCAL_WEBSERVER question

To use FLINK's WebUI in your own project you need to depend on `flink-runtime-web`. Most REST functionality only depends on
`flink-runtime`, but a few REST endpoints and the Angular-based WebUI live in `flink-runtime-web`.

[6] processing avro data source using DataSet API and output to parquet

How to make FLINK work with avro and parquet?

[7] Using S3 as a sink (StreamingFileSink)

A user hooked up S3 as a `StreamingFileSink` (set up roughly as sketched below) and then could not restore the job from a savepoint; this may be related to how S3
manages files.
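
The typical row-format setup under discussion looks roughly like this; the bucket path is a placeholder, and the s3:// scheme assumes one of the flink-s3-fs-* filesystems is installed.

```scala
import org.apache.flink.api.common.serialization.SimpleStringEncoder
import org.apache.flink.core.fs.Path
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink

// Row-format sink writing UTF-8 strings into per-bucket part files.
val sink: StreamingFileSink[String] = StreamingFileSink
  .forRowFormat(new Path("s3://my-bucket/output"), new SimpleStringEncoder[String]("UTF-8"))
  .build()
```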

[8] Issue with FilterableTableSource and the logical optimizer rules

A job failure caused by CALCITE when using FilterableTableSource; a community committer offered a workaround,
but the complete fix is still in progress.

[9] Recovery from job manager crash using check points

FLINK's job graph store and checkpoint store provide a high-availability mechanism when the JobManager or TaskManagers
crash; depending on the implementation, a ZooKeeper cluster may be needed to store the metadata.

[10] Can I use watermarkers to have a global trigger of different
ProcessFunction's?

The semantics of watermarks in FLINK and how to use them correctly (see the sketch below).
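
A sketch of the usual 1.9-era event-time and watermark setup that such threads revolve around; the Reading type and its fields are made up for illustration.

```scala
import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

// Event time plus a watermark generator that tolerates 5 seconds of out-of-orderness.
case class Reading(sensor: String, ts: Long)
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
val withWatermarks = env
  .fromElements(Reading("a", 1000L), Reading("a", 4000L))
  .assignTimestampsAndWatermarks(
    new BoundedOutOfOrdernessTimestampExtractor[Reading](Time.seconds(5)) {
      override def extractTimestamp(r: Reading): Long = r.ts
    })
```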

[11] Externalized checkpoints

Retention and cleanup policies when using externalized checkpoints.

[12] [SURVEY] How do you use high-availability services in Flink?

Based on the ongoing rework of FLINK's high-availability mechanism (FLINK-10333), I started a survey on how users use FLINK's
high-availability services; you are welcome to join the survey and describe your usage.

[3]
https://lists.apache.org/x/thread.html/3d983f5c49b88a316a2e13fdefa10548584c6e841270923033962dc0@%3Cuser-zh.flink.apache.org%3E
[4]
https://lists.apache.org/x/thread.html/b64e1cd6fc02239589fe3a316293b07ad47ab84f8f62b96b9198b8dc@%3Cuser-zh.flink.apache.org%3E
[5]
https://lists.apache.org/x/thread.html/2f6e5624079ecb09b18affc18ebf9dce2abba8ecb701657c84043e27@%3Cuser-zh.flink.apache.org%3E
[6]
https://lists.apache.org/x/thread.html/9349327ab7f130bcaca1b4c3515fcfc6b89b12ac2fac53540cc996df@%3Cuser.flink.apache.org%3E
[7]
https://lists.apache.org/x/thread.html/a23cb1b0247bb3d9206717bf99c735e11ffe3548fe58fdee8fb96ccc@%3Cuser.flink.apache.org%3E
[8]
https://lists.apache.org/x/thread.html/69dca869019f39c469907eb23f5dba02696d8fc1fd8ba86d870282e6@%3Cuser.flink.apache.org%3E
[9]
https://lists.apache.org/x/thread.html/598f3c6d92c316a78e28c8aefb6aa5a00ddea6cdf2dd2c937d635275@%3Cuser.flink.apache.org%3E
[10]
https://lists.apache.org/x/thread.html/3bebd6e6bb3a11eeb3bc5d5943b7bfce333b737cae3419ebab6490ea@%3Cuser.flink.apache.org%3E
[11]
https://lists.apache.org/x/thread.html/166f9e21411a1c3b8d749b9b51875f9ff7a1a497debd35603243144a@%3Cuser.flink.apache.org%3E
[12]
https://lists.apache.org/x/thread.html/c0cc07197e6ba30b45d7709cc9e17d8497e5e3f33de504d58dfcafad@%3Cuser.flink.apache.org%3E

@DEV

[13] [DISCUSS] FLIP-55: Introduction of a Table API Java Expression DSL

Timo Walther started the FLIP-55 discussion, which aims to give the Table API an Expression DSL so that users can write programs more easily.

[14] [DISCUSS] FLIP-56: Dynamic Slot Allocation

Xintong Song started the FLIP-56 discussion, which proposes dynamic slot allocation to make better use of cluster resources.

[15] [DISCUSS] Upgrade kinesis connector to Apache 2.0 License and include
it in official release

Upgrade the Kinesis connector so that it is licensed under the Apache 2.0 License; FLINK can then bundle
the Kinesis connector directly in its releases.

[16] Support disk spilling in HeapKeyedStateBackend

Yu Li's proposed FLIP-50: Spill-able Heap Keyed State Backend

entered the development phase.

[17] [DISCUSS] Enhance Support for Multicast Communication Pattern

Yun Gao started a discussion on enhancing FLINK's multicast communication pattern, aiming to support more complex communication between operators.

[18] CiBot Update

Chesnay Schepler added the ability to re-run tests by posting a comment to FLINK's pull request bot; this feature is
also widely available in communities such as ZooKeeper. From now on, contributors no longer need to push empty commits or close and reopen a pull request
to re-trigger the tests.

[19] [DISCUSS] Use Java's Duration instead of Flink's Time

Stephan Ewen started a discussion on replacing FLINK's `Time` with Java 8's `Duration`. FLINK
carries two simplistic `Time` classes for representing a span of time in the runtime, which regularly confuses developers and users.
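
The gist of the proposal in a line each; a sketch only, comparing the existing Flink type with java.time.Duration:

```scala
import java.time.Duration

// Today: Flink's own runtime time type.    Proposed: the JDK standard type.
val today = org.apache.flink.api.common.time.Time.seconds(30)
val proposed: Duration = Duration.ofSeconds(30)
```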

[13]
https://lists.apache.org/x/thread.html/eb5e7b0579e5f1da1e9bf1ab4e4b86dba737946f0261d94d8c30521e@%3Cdev.flink.apache.org%3E
[14]
https://lists.apache.org/x/thread.html/72e5c211fb39ac1c596e12ae096d593ca30118dc12dcf664b7538624@%3Cdev.flink.apache.org%3E
[15]
https://lists.apache.org/x/thread.html/3876eec7aced42d2ac28728bc5084980ed7bf8ca6a6a8ed56e01e387@%3Cdev.flink.apache.org%3E
[16] https://issues.apache.org/jira/browse/FLINK-12692
[17]
https://lists.apache.org/x/thread.html/06834937769fda7c7afa4114e4f2f4ec84d95a54cc6ec46a5aa839de@%3Cdev.flink.apache.org%3E
[18]

Re: [ANNOUNCE] Apache Flink 1.9.0 released

2019-08-23 Thread Zili Chen
Hi Till,

Did we mention this in the release notes (or maybe the previous release notes where we
did the exclusion)?

Best,
tison.


On Fri, Aug 23, 2019 at 10:28 PM Till Rohrmann  wrote:

> Hi Gavin,
>
> if I'm not mistaken, then the community excluded the Scala FlinkShell
> since a couple of versions for Scala 2.12. The problem seems to be that
> some of the tests failed. See here [1] for more information.
>
> [1] https://issues.apache.org/jira/browse/FLINK-10911
>
> Cheers,
> Till
>
> On Fri, Aug 23, 2019 at 11:44 AM Gavin Lee  wrote:
>
>> I used the package from the apache official site, with mirror [1]; the difference is
>> that I used the scala 2.12 version.
>> I also tried to build from source for both scala 2.11 and 2.12; when building
>> 2.12 the FlinkShell.class is in the flink-dist jar file, but after running mvn
>> clean package -Dscala-2.12, this class was removed from the flink-dist_2.12-1.9
>> jar
>> file.
>> Seems broken here for scala 2.12, right?
>>
>> [1]
>>
>> http://mirror.bit.edu.cn/apache/flink/flink-1.9.0/flink-1.9.0-bin-scala_2.12.tgz
>>
>> On Fri, Aug 23, 2019 at 4:37 PM Zili Chen  wrote:
>>
>> > I downloaded the 1.9.0 dist from here[1] and didn't see the issue. Where did
>> you
>> > download it? Could you try the dist from [1] and see whether
>> > the problem persists?
>> >
>> > Best,
>> > tison.
>> >
>> > [1]
>> >
>> http://mirrors.tuna.tsinghua.edu.cn/apache/flink/flink-1.9.0/flink-1.9.0-bin-scala_2.11.tgz
>> >
>> >
> On Fri, Aug 23, 2019 at 4:34 PM Gavin Lee  wrote:
>> >
>> >> Thanks for your reply @Zili.
>> >> I'm afraid it's not the same issue.
>> >> I found that the FlinkShell.class was not included in flink dist jar
>> file
>> >> in 1.9.0 version.
>> >> The class file is nowhere to be found inside the jars, either in the opt or lib
>> >> directory under the root folder of the flink distribution.
>> >>
>> >>
>> >> On Fri, Aug 23, 2019 at 4:10 PM Zili Chen 
>> wrote:
>> >>
>> >>> Hi Gavin,
>> >>>
>> >>> I also found a problem in the shell: if the directory contains whitespace,
>> >>> then the final command to run is incorrect. Could you check the
>> >>> final command to be executed?
>> >>>
>> >>> FYI, here is the ticket[1].
>> >>>
>> >>> Best,
>> >>> tison.
>> >>>
>> >>> [1] https://issues.apache.org/jira/browse/FLINK-13827
>> >>>
>> >>>
>> >>> On Fri, Aug 23, 2019 at 3:36 PM Gavin Lee  wrote:
>> >>>
>> >>>> Why bin/start-scala-shell.sh local return following error?
>> >>>>
>> >>>> bin/start-scala-shell.sh local
>> >>>>
>> >>>> Error: Could not find or load main class
>> >>>> org.apache.flink.api.scala.FlinkShell
>> >>>> For flink 1.8.1 and previous ones, no such issues.
>> >>>>
>> >>>> On Fri, Aug 23, 2019 at 2:05 PM qi luo  wrote:
>> >>>>
>> >>>>> Congratulations and thanks for the hard work!
>> >>>>>
>> >>>>> Qi
>> >>>>>
>> >>>>> On Aug 22, 2019, at 8:03 PM, Tzu-Li (Gordon) Tai <
>> tzuli...@apache.org>
>> >>>>> wrote:
>> >>>>>
>> >>>>> The Apache Flink community is very happy to announce the release of
>> >>>>> Apache Flink 1.9.0, which is the latest major release.
>> >>>>>
>> >>>>> Apache Flink® is an open-source stream processing framework for
>> >>>>> distributed, high-performing, always-available, and accurate data
>> streaming
>> >>>>> applications.
>> >>>>>
>> >>>>> The release is available for download at:
>> >>>>> https://flink.apache.org/downloads.html
>> >>>>>
>> >>>>> Please check out the release blog post for an overview of the
>> >>>>> improvements for this new major release:
>> >>>>> https://flink.apache.org/news/2019/08/22/release-1.9.0.html
>> >>>>>
>> >>>>> The full release notes are available in Jira:
>> >>>>>
>> >>>>>
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12344601
>> >>>>>
>> >>>>> We would like to thank all contributors of the Apache Flink
>> community
>> >>>>> who made this release possible!
>> >>>>>
>> >>>>> Cheers,
>> >>>>> Gordon
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>
>> >>>> --
>> >>>> Gavin
>> >>>>
>> >>>
>> >>
>> >> --
>> >> Gavin
>> >>
>> >
>>
>> --
>> Gavin
>>
>


Re: Re: Re: Re: Re: Re: flink1.9.0 LOCAL_WEBSERVER question

2019-08-23 Thread Zili Chen
Filed a corresponding issue for LOCAL_WEBSERVER [1]

Best,
tison.

[1] https://issues.apache.org/jira/browse/FLINK-13828


On Fri, Aug 23, 2019 at 3:26 PM hb <343122...@163.com> wrote:

> Thanks, that was indeed it: a dependency was missing, haha
>
>
> At 2019-08-23 14:20:54, "Zili Chen"  wrote:
> >That is because the web files are packaged under the resources of flink-runtime-web_${scala.binary.version};
> >as long as that module is correctly depended on, downloaded and discovered, it works.
> >
> >It probably worked for you before because that module was among your dependencies.
> >
> >Best,
> >tison.
> >
> >
> >On Fri, Aug 23, 2019 at 3:19 PM Zili Chen  wrote:
> >
> >> Adding this dependency fixes it:
> >>
> >> <dependency>
> >>   <groupId>org.apache.flink</groupId>
> >>   <artifactId>flink-runtime-web_2.11</artifactId>
> >>   <version>1.9.0</version>
> >> </dependency>
> >>
> >> Best,
> >> tison.
> >>
> >>
> >> On Fri, Aug 23, 2019 at 3:12 PM Zili Chen  wrote:
> >>
> >>> This is probably related to the new WebUI in 1.9; I am not sure. You could file an issue on JIRA with before/after screenshots
> >>> of 1.9 and earlier versions so the relevant Flink developers can take a look.
> >>>
> >>> As for the later question: I found it by reading the source (x
> >>> 后面的问题,看源码发现的(x
> >>>
> >>> Best,
> >>> tison.
> >>>
> >>>
> >>> hb <343122...@163.com> 于2019年8月23日周五 下午3:05写道:
> >>>
> >>>> 请问 这个【配置项无效】 是在哪里看的, debug程序里看的么
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> At 2019-08-23 14:01:32, "Zili Chen"  wrote:
> >>>> >Let me see whether adding a dependency or manually placing the web files can solve this.
> >>>> >
> >>>> >Also, "the config option has no effect" means the option is not used anywhere in the code, so however you set it,
> >>>> it has no impact on the program; it does not mean that setting it broke the WebUI.
> >>>> >
> >>>> >Best,
> >>>> >tison.
> >>>> >
> >>>> >
> >>>> >On Fri, Aug 23, 2019 at 2:59 PM Zili Chen  wrote:
> >>>> >
> >>>> >> Oh, I see. That is because /taskmanagers is a REST endpoint; Flink's web server
> >>>> is actually working fine, which is why it responds to you normally.
> >>>> >>
> >>>> >> When you open the home page, Flink needs the corresponding html files but cannot find them, so it tells you not found
> >>>> >>
> >>>> >> Best,
> >>>> >> tison.
> >>>> >>
> >>>> >>
> >>>> >> On Fri, Aug 23, 2019 at 2:51 PM hb <343122...@163.com> wrote:
> >>>> >>
> >>>> >>> My understanding: being able to reach the restful API on this port should mean the setting took effect, yet the home page is 404
> >>>> >>>
> >>>> >>>
> >>>> >>> Visiting http://localhost:8089/ gives 404
> >>>> >>> Visiting http://localhost:8089/taskmanagers/
> >>>> >>> works:
> >>>> >>>
> >>>> >>>
> >>>>
> {"taskmanagers":[{"id":"ef7030d2-eb13-4c68-8d0c-20b3d59616c8","path":"akka://flink/user/taskmanager_0","dataPort":-1,"timeSinceLastHeartbeat":0,"slotsNumber":8,"freeSlots":0,"hardware":{"cpuCores":4,"physicalMemory":34359738368,"freeMemory":7635730432,"managedMemory":5338540032}}]}
> >>>> >>>
> >>>> >>>
> >>>> >>> Source code:
> >>>> >>> ```
> >>>> >>> package test
> >>>> >>>
> >>>> >>>
> >>>> >>> import org.apache.flink.configuration.{ConfigConstants,
> >>>> Configuration,
> >>>> >>> ResourceManagerOptions, RestOptions}
> >>>> >>> import
> org.apache.flink.streaming.api.functions.source.SourceFunction
> >>>> >>> import
> >>>> org.apache.flink.streaming.api.scala.{StreamExecutionEnvironment,
> >>>> >>> _}
> >>>> >>>
> >>>> >>>
> >>>> >>>
> >>>> >>>
> >>>> >>> object File1 extends App {
> >>>> >>>   class MySource extends SourceFunction[String] {
> >>>> >>> override def run(sourceContext:
> >>>> >>> SourceFunction.SourceContext[String]): Unit = {
> >>>> >>>   for (i <- 1 to 1000) {
> >>>> >>> sourceContext.collect(i.toString)
> >>>> >>> Thread.sleep(1)
> >>>> >>>   }
> >>>> >>> }
> >>>> >>> override def cancel(): Unit = {}
> >>>> >>>   }
> >>>> >>>
> >>>> >>>
> >>&

Re: [ANNOUNCE] Apache Flink 1.9.0 released

2019-08-23 Thread Zili Chen
I downloaded the 1.9.0 dist from here[1] and didn't see the issue. Where did you
download it? Could you try the dist from [1] and see whether
the problem persists?

Best,
tison.

[1]
http://mirrors.tuna.tsinghua.edu.cn/apache/flink/flink-1.9.0/flink-1.9.0-bin-scala_2.11.tgz


On Fri, Aug 23, 2019 at 4:34 PM Gavin Lee  wrote:

> Thanks for your reply @Zili.
> I'm afraid it's not the same issue.
> I found that the FlinkShell.class was not included in flink dist jar file
> in 1.9.0 version.
> The class file is nowhere to be found inside the jars, either in the opt or lib
> directory under the root folder of the flink distribution.
>
>
> On Fri, Aug 23, 2019 at 4:10 PM Zili Chen  wrote:
>
>> Hi Gavin,
>>
>> I also found a problem in the shell: if the directory contains whitespace,
>> then the final command to run is incorrect. Could you check the
>> final command to be executed?
>>
>> FYI, here is the ticket[1].
>>
>> Best,
>> tison.
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-13827
>>
>>
>> On Fri, Aug 23, 2019 at 3:36 PM Gavin Lee  wrote:
>>
>>> Why bin/start-scala-shell.sh local return following error?
>>>
>>> bin/start-scala-shell.sh local
>>>
>>> Error: Could not find or load main class
>>> org.apache.flink.api.scala.FlinkShell
>>> For flink 1.8.1 and previous ones, no such issues.
>>>
>>> On Fri, Aug 23, 2019 at 2:05 PM qi luo  wrote:
>>>
>>>> Congratulations and thanks for the hard work!
>>>>
>>>> Qi
>>>>
>>>> On Aug 22, 2019, at 8:03 PM, Tzu-Li (Gordon) Tai 
>>>> wrote:
>>>>
>>>> The Apache Flink community is very happy to announce the release of
>>>> Apache Flink 1.9.0, which is the latest major release.
>>>>
>>>> Apache Flink® is an open-source stream processing framework for
>>>> distributed, high-performing, always-available, and accurate data streaming
>>>> applications.
>>>>
>>>> The release is available for download at:
>>>> https://flink.apache.org/downloads.html
>>>>
>>>> Please check out the release blog post for an overview of the
>>>> improvements for this new major release:
>>>> https://flink.apache.org/news/2019/08/22/release-1.9.0.html
>>>>
>>>> The full release notes are available in Jira:
>>>>
>>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12344601
>>>>
>>>> We would like to thank all contributors of the Apache Flink community
>>>> who made this release possible!
>>>>
>>>> Cheers,
>>>> Gordon
>>>>
>>>>
>>>>
>>>
>>> --
>>> Gavin
>>>
>>
>
> --
> Gavin
>


Re: [ANNOUNCE] Apache Flink 1.9.0 released

2019-08-23 Thread Zili Chen
Hi Gavin,

I also found a problem in the shell: if the directory contains whitespace,
then the final command to run is incorrect. Could you check the
final command to be executed?

FYI, here is the ticket[1].

Best,
tison.

[1] https://issues.apache.org/jira/browse/FLINK-13827


On Fri, Aug 23, 2019 at 3:36 PM Gavin Lee  wrote:

> Why bin/start-scala-shell.sh local return following error?
>
> bin/start-scala-shell.sh local
>
> Error: Could not find or load main class
> org.apache.flink.api.scala.FlinkShell
> For flink 1.8.1 and previous ones, no such issues.
>
> On Fri, Aug 23, 2019 at 2:05 PM qi luo  wrote:
>
>> Congratulations and thanks for the hard work!
>>
>> Qi
>>
>> On Aug 22, 2019, at 8:03 PM, Tzu-Li (Gordon) Tai 
>> wrote:
>>
>> The Apache Flink community is very happy to announce the release of
>> Apache Flink 1.9.0, which is the latest major release.
>>
>> Apache Flink® is an open-source stream processing framework for
>> distributed, high-performing, always-available, and accurate data streaming
>> applications.
>>
>> The release is available for download at:
>> https://flink.apache.org/downloads.html
>>
>> Please check out the release blog post for an overview of the
>> improvements for this new major release:
>> https://flink.apache.org/news/2019/08/22/release-1.9.0.html
>>
>> The full release notes are available in Jira:
>>
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12344601
>>
>> We would like to thank all contributors of the Apache Flink community who
>> made this release possible!
>>
>> Cheers,
>> Gordon
>>
>>
>>
>
> --
> Gavin
>


Re: Re: Re: Re: Re: flink1.9.0 LOCAL_WEBSERVER question

2019-08-23 Thread Zili Chen
That is because the web files are packaged under the resources of flink-runtime-web_${scala.binary.version};
as long as that module is correctly depended on, downloaded and discovered, it works.

It probably worked for you before because that module was among your dependencies.

Best,
tison.


On Fri, Aug 23, 2019 at 3:19 PM Zili Chen  wrote:

> Adding this dependency fixes it:
>
> <dependency>
>   <groupId>org.apache.flink</groupId>
>   <artifactId>flink-runtime-web_2.11</artifactId>
>   <version>1.9.0</version>
> </dependency>
>
> Best,
> tison.
>
>
> On Fri, Aug 23, 2019 at 3:12 PM Zili Chen  wrote:
>
>> This is probably related to the new WebUI in 1.9; I am not sure. You could file an issue on JIRA with before/after screenshots
>> of 1.9 and earlier versions so the relevant Flink developers can take a look.
>>
>> As for the later question: I found it by reading the source (x
>>
>> Best,
>> tison.
>>
>>
>> On Fri, Aug 23, 2019 at 3:05 PM hb <343122...@163.com> wrote:
>>
>>> Where did you see that this [config option has no effect]? By debugging the program?
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> At 2019-08-23 14:01:32, "Zili Chen"  wrote:
>>> >Let me see whether adding a dependency or manually placing the web files can solve this.
>>> >
>>> >Also, "the config option has no effect" means the option is not used anywhere in the code, so however you set it,
>>> it has no impact on the program; it does not mean that setting it broke the WebUI.
>>> >
>>> >Best,
>>> >tison.
>>> >
>>> >
>>> >On Fri, Aug 23, 2019 at 2:59 PM Zili Chen  wrote:
>>> >
>>> >> Oh, I see. That is because /taskmanagers is a REST endpoint; Flink's web server
>>> is actually working fine, which is why it responds to you normally.
>>> >>
>>> >> When you open the home page, Flink needs the corresponding html files but cannot find them, so it tells you not found
>>> >>
>>> >> Best,
>>> >> tison.
>>> >>
>>> >>
>>> >> On Fri, Aug 23, 2019 at 2:51 PM hb <343122...@163.com> wrote:
>>> >>
>>> >>> My understanding: being able to reach the restful API on this port should mean the setting took effect, yet the home page is 404
>>> >>>
>>> >>>
>>> >>> Visiting http://localhost:8089/ gives 404
>>> >>> Visiting http://localhost:8089/taskmanagers/
>>> >>> works:
>>> >>>
>>> >>>
>>> {"taskmanagers":[{"id":"ef7030d2-eb13-4c68-8d0c-20b3d59616c8","path":"akka://flink/user/taskmanager_0","dataPort":-1,"timeSinceLastHeartbeat":0,"slotsNumber":8,"freeSlots":0,"hardware":{"cpuCores":4,"physicalMemory":34359738368,"freeMemory":7635730432,"managedMemory":5338540032}}]}
>>> >>>
>>> >>>
>>> >>> Source code:
>>> >>> ```
>>> >>> package test
>>> >>>
>>> >>>
>>> >>> import org.apache.flink.configuration.{ConfigConstants,
>>> Configuration,
>>> >>> ResourceManagerOptions, RestOptions}
>>> >>> import org.apache.flink.streaming.api.functions.source.SourceFunction
>>> >>> import
>>> org.apache.flink.streaming.api.scala.{StreamExecutionEnvironment,
>>> >>> _}
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> object File1 extends App {
>>> >>>   class MySource extends SourceFunction[String] {
>>> >>> override def run(sourceContext:
>>> >>> SourceFunction.SourceContext[String]): Unit = {
>>> >>>   for (i <- 1 to 1000) {
>>> >>> sourceContext.collect(i.toString)
>>> >>> Thread.sleep(1)
>>> >>>   }
>>> >>> }
>>> >>> override def cancel(): Unit = {}
>>> >>>   }
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>>   var config = new Configuration()
>>> >>>   config.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true)
>>> >>>
>>> >>>
>>> >>>   config.setInteger(RestOptions.PORT, 8089)
>>> >>>   val env = StreamExecutionEnvironment.createLocalEnvironment(8,
>>> config)
>>> >>>   env.addSource(new MySource).print()
>>> >>>
>>> >>>
>>> >>>   env.execute()
>>> >>> }
>>> >>> ```
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> At 2019-08-23 13:41:36, "Zili Chen"  wrote:
>>> >>> >That the source does not mark it deprecated should be a FLINK issue; you can file it on JIRA[1]. The option is indeed unused.
>>> >>> >
>>> >>> >Your program sounds like a test program; could you provide its source? If you can access /taskmanagers afterwards there may be no problem (x
>>> >>> >
>>> >>> >Best,
>>> >>> >tison.
>&g

Re: Re: Re: Re: Re: flink1.9.0 LOCAL_WEBSERVER question

2019-08-23 Thread Zili Chen
Adding this dependency fixes it:

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-runtime-web_2.11</artifactId>
  <version>1.9.0</version>
</dependency>

Best,
tison.


On Fri, Aug 23, 2019 at 3:12 PM Zili Chen  wrote:

> This is probably related to the new WebUI in 1.9; I am not sure. You could file an issue on JIRA with before/after screenshots
> of 1.9 and earlier versions so the relevant Flink developers can take a look.
>
> As for the later question: I found it by reading the source (x
>
> Best,
> tison.
>
>
> On Fri, Aug 23, 2019 at 3:05 PM hb <343122...@163.com> wrote:
>
>> Where did you see that this [config option has no effect]? By debugging the program?
>>
>>
>>
>>
>>
>>
>>
>>
>> At 2019-08-23 14:01:32, "Zili Chen"  wrote:
>> >Let me see whether adding a dependency or manually placing the web files can solve this.
>> >
>> >Also, "the config option has no effect" means the option is not used anywhere in the code, so however you set it,
>> it has no impact on the program; it does not mean that setting it broke the WebUI.
>> >
>> >Best,
>> >tison.
>> >
>> >
>> >On Fri, Aug 23, 2019 at 2:59 PM Zili Chen  wrote:
>> >
>> >> Oh, I see. That is because /taskmanagers is a REST endpoint; Flink's web server
>> is actually working fine, which is why it responds to you normally.
>> >>
>> >> When you open the home page, Flink needs the corresponding html files but cannot find them, so it tells you not found
>> >>
>> >> Best,
>> >> tison.
>> >>
>> >>
>> >> On Fri, Aug 23, 2019 at 2:51 PM hb <343122...@163.com> wrote:
>> >>
>> >>> My understanding: being able to reach the restful API on this port should mean the setting took effect, yet the home page is 404
>> >>>
>> >>>
>> >>> Visiting http://localhost:8089/ gives 404
>> >>> Visiting http://localhost:8089/taskmanagers/
>> >>> works:
>> >>>
>> >>>
>> {"taskmanagers":[{"id":"ef7030d2-eb13-4c68-8d0c-20b3d59616c8","path":"akka://flink/user/taskmanager_0","dataPort":-1,"timeSinceLastHeartbeat":0,"slotsNumber":8,"freeSlots":0,"hardware":{"cpuCores":4,"physicalMemory":34359738368,"freeMemory":7635730432,"managedMemory":5338540032}}]}
>> >>>
>> >>>
>> >>> Source code:
>> >>> ```
>> >>> package test
>> >>>
>> >>>
>> >>> import org.apache.flink.configuration.{ConfigConstants, Configuration,
>> >>> ResourceManagerOptions, RestOptions}
>> >>> import org.apache.flink.streaming.api.functions.source.SourceFunction
>> >>> import
>> org.apache.flink.streaming.api.scala.{StreamExecutionEnvironment,
>> >>> _}
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> object File1 extends App {
>> >>>   class MySource extends SourceFunction[String] {
>> >>> override def run(sourceContext:
>> >>> SourceFunction.SourceContext[String]): Unit = {
>> >>>   for (i <- 1 to 1000) {
>> >>> sourceContext.collect(i.toString)
>> >>> Thread.sleep(1)
>> >>>   }
>> >>> }
>> >>> override def cancel(): Unit = {}
>> >>>   }
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>   var config = new Configuration()
>> >>>   config.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true)
>> >>>
>> >>>
>> >>>   config.setInteger(RestOptions.PORT, 8089)
>> >>>   val env = StreamExecutionEnvironment.createLocalEnvironment(8,
>> config)
>> >>>   env.addSource(new MySource).print()
>> >>>
>> >>>
>> >>>   env.execute()
>> >>> }
>> >>> ```
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> At 2019-08-23 13:41:36, "Zili Chen"  wrote:
>> >>> >That the source does not mark it deprecated should be a FLINK issue; you can file it on JIRA[1]. The option is indeed unused.
>> >>> >
>> >>> >Your program sounds like a test program; could you provide its source? If you can access /taskmanagers afterwards there may be no problem (x
>> >>> >
>> >>> >Best,
>> >>> >tison.
>> >>> >
>> >>> >[1] https://issues.apache.org/jira/browse/
>> >>> >
>> >>> >
>> >>> >hb <343122...@163.com> 于2019年8月23日周五 下午2:27写道:
>> >>> >
>> >>> >> 我在idea里 用maven 下载依赖,在idea里运行flink程序源码里没有标志为废弃啊
>> >>> >> ```package
>> >>> >> org.apache.flink.configurationpublic final class
>> >>> ConfigConstants
>> >>> >> {  ... /** * @deprecated Use {@link
>> >>> >> ResourceManagerOptions#LOCAL_NUMBER_RESOURCE_MANAGER} instead.
>> >>> >> */@Deprecatedpublic static final int
>> >>> >> DEFAULT_LOCAL_NUMBER_RESOURCE_MANAGER = 1;
>>  public
>> >>> >> static final String LOCAL_START_WEBSERVER =
>> >>> >> "local.start-webserver";```
>> >>> >> 在 2019-08-23 13:07:27,"Zili Chen"  写道:
>> >>> >> >另外有个问题是,你是下载二进制 zip 还是从源码编译安装的?
>> >>> >> >
>> >>> >> >Best,
>> >>> >> >tison.
>> >>> >> >
>> >>> >> >
>> >>> >> >On Fri, Aug 23, 2019 at 2:04 PM Zili Chen  wrote:
>> >>> >> >
>> >>> >> >> I checked out the 1.9 code; ConfigConstants.LOCAL_START_WEBSERVER
>> >>> >> looks like a dead setting: setting it or not has no effect at all.
>> >>> >> >>
>> >>> >> >> So the problem should be unrelated to this option. For instance, does refreshing localhost:8089 recover?
>> >>> >> >>
>> >>> >> >> Best,
>> >>> >> >> tison.
>> >>> >> >>
>> >>> >> >>
>> >>> >> >> On Fri, Aug 23, 2019 at 1:47 PM hb <343122...@163.com> wrote:
>> >>> >> >>
>> >>> >> >>> Before 1.9 it could be used like this and worked fine, and 1.9 still has this API
>> >>> >> >>> At 2019-08-23 12:28:14, "Zili Chen"  wrote:
>> >>> >> >>> >Where did you see this config? I searched the code and the option has no usages at all (x
>> >>> >> >>> >
>> >>> >> >>> >Best,
>> >>> >> >>> >tison.
>> >>> >> >>> >
>> >>> >> >>> >
>> >>> >> >>> >On Fri, Aug 23, 2019 at 1:22 PM hb <343122...@163.com> wrote:
>> >>> >> >>> >
>> >>> >> >>> >> On flink1.9.0 the local web home page is 404. Code:
>> >>> >> >>> >> ```
>> >>> >> >>> >> var config = new Configuration()
>> >>> >> >>> >> config.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER,
>> true)
>> >>> >> >>> >>
>> >>> >> >>> >> config.setInteger(RestOptions.PORT, 8089)
>> >>> >> >>> >> val env =
>> StreamExecutionEnvironment.createLocalEnvironment(8,
>> >>> >> config)
>> >>> >> >>> >> ```
>> >>> >> >>> >> 打开 http://localhost:8089/ 显示
>> >>> >> >>> >> {"errors":["Not found."]}
>> >>> >> >>> >> 打开 http://localhost:8089/taskmanagers/ 能正常显示
>> >>> >> >>>
>> >>> >> >>
>> >>> >>
>> >>>
>> >>
>>
>


Re: Re: Re: Re: Re: flink1.9.0 LOCAL_WEBSERVER question

2019-08-23 Thread Zili Chen
This is probably related to the new WebUI in 1.9; I am not sure. You could file an issue on JIRA with before/after screenshots
of 1.9 and earlier versions so the relevant Flink developers can take a look.

As for the later question: I found it by reading the source (x

Best,
tison.


On Fri, Aug 23, 2019 at 3:05 PM hb <343122...@163.com> wrote:

> Where did you see that this [config option has no effect]? By debugging the program?
>
>
>
>
>
>
>
>
> At 2019-08-23 14:01:32, "Zili Chen"  wrote:
> >Let me see whether adding a dependency or manually placing the web files can solve this.
> >
> >Also, "the config option has no effect" means the option is not used anywhere in the code, so however you set it,
> it has no impact on the program; it does not mean that setting it broke the WebUI.
> >
> >Best,
> >tison.
> >
> >
> >On Fri, Aug 23, 2019 at 2:59 PM Zili Chen  wrote:
> >
> >> Oh, I see. That is because /taskmanagers is a REST endpoint; Flink's web server
> is actually working fine, which is why it responds to you normally.
> >>
> >> When you open the home page, Flink needs the corresponding html files but cannot find them, so it tells you not found
> >>
> >> Best,
> >> tison.
> >>
> >>
> >> On Fri, Aug 23, 2019 at 2:51 PM hb <343122...@163.com> wrote:
> >>
> >>> My understanding: being able to reach the restful API on this port should mean the setting took effect, yet the home page is 404
> >>>
> >>>
> >>> Visiting http://localhost:8089/ gives 404
> >>> Visiting http://localhost:8089/taskmanagers/
> >>> works:
> >>>
> >>>
> {"taskmanagers":[{"id":"ef7030d2-eb13-4c68-8d0c-20b3d59616c8","path":"akka://flink/user/taskmanager_0","dataPort":-1,"timeSinceLastHeartbeat":0,"slotsNumber":8,"freeSlots":0,"hardware":{"cpuCores":4,"physicalMemory":34359738368,"freeMemory":7635730432,"managedMemory":5338540032}}]}
> >>>
> >>>
> >>> Source code:
> >>> ```
> >>> package test
> >>>
> >>>
> >>> import org.apache.flink.configuration.{ConfigConstants, Configuration,
> >>> ResourceManagerOptions, RestOptions}
> >>> import org.apache.flink.streaming.api.functions.source.SourceFunction
> >>> import
> org.apache.flink.streaming.api.scala.{StreamExecutionEnvironment,
> >>> _}
> >>>
> >>>
> >>>
> >>>
> >>> object File1 extends App {
> >>>   class MySource extends SourceFunction[String] {
> >>> override def run(sourceContext:
> >>> SourceFunction.SourceContext[String]): Unit = {
> >>>   for (i <- 1 to 1000) {
> >>> sourceContext.collect(i.toString)
> >>> Thread.sleep(1)
> >>>   }
> >>> }
> >>> override def cancel(): Unit = {}
> >>>   }
> >>>
> >>>
> >>>
> >>>
> >>>   var config = new Configuration()
> >>>   config.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true)
> >>>
> >>>
> >>>   config.setInteger(RestOptions.PORT, 8089)
> >>>   val env = StreamExecutionEnvironment.createLocalEnvironment(8,
> config)
> >>>   env.addSource(new MySource).print()
> >>>
> >>>
> >>>   env.execute()
> >>> }
> >>> ```
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> At 2019-08-23 13:41:36, "Zili Chen"  wrote:
> >>> >That the source does not mark it deprecated should be a FLINK issue; you can file it on JIRA[1]. The option is indeed unused.
> >>> >
> >>> >Your program sounds like a test program; could you provide its source? If you can access /taskmanagers afterwards there may be no problem (x
> >>> >
> >>> >Best,
> >>> >tison.
> >>> >
> >>> >[1] https://issues.apache.org/jira/browse/
> >>> >
> >>> >
> >>> >On Fri, Aug 23, 2019 at 2:27 PM hb <343122...@163.com> wrote:
> >>> >
> >>> >> I use maven inside IDEA to download the dependencies and run the flink program there; the source does not mark it deprecated:
> >>> >> ```
> >>> >> package org.apache.flink.configuration;
> >>> >>
> >>> >> public final class ConfigConstants {
> >>> >>   ...
> >>> >>   /** @deprecated Use {@link ResourceManagerOptions#LOCAL_NUMBER_RESOURCE_MANAGER} instead. */
> >>> >>   @Deprecated
> >>> >>   public static final int DEFAULT_LOCAL_NUMBER_RESOURCE_MANAGER = 1;
> >>> >>
> >>> >>   public static final String LOCAL_START_WEBSERVER = "local.start-webserver";
> >>> >> }
> >>> >> ```
> >>> >> At 2019-08-23 13:07:27, "Zili Chen"  wrote:
> >>> >> >Another question: did you download the binary zip or build from source?
> >>> >> >
> >>> >> >Best,
> >>> >> >tison.
> >>> >> >
> >>> >> >
> >>> >> >On Fri, Aug 23, 2019 at 2:04 PM Zili Chen  wrote:
> >>> >> >
> >>> >> >> I checked out the 1.9 code; ConfigConstants.LOCAL_START_WEBSERVER
> >>> >> looks like a dead setting: setting it or not has no effect at all.
> >>> >> >>
> >>> >> >> So the problem should be unrelated to this option. For instance, does refreshing localhost:8089 recover?
> >>> >> >>
> >>> >> >> Best,
> >>> >> >> tison.
> >>> >> >>
> >>> >> >>
> >>> >> >> On Fri, Aug 23, 2019 at 1:47 PM hb <343122...@163.com> wrote:
> >>> >> >>
> >>> >> >>> Before 1.9 it could be used like this and worked fine, and 1.9 still has this API
> >>> >> >>> At 2019-08-23 12:28:14, "Zili Chen"  wrote:
> >>> >> >>> >Where did you see this config? I searched the code and the option has no usages at all (x
> >>> >> >>> >
> >>> >> >>> >Best,
> >>> >> >>> >tison.
> >>> >> >>> >
> >>> >> >>> >
> >>> >> >>> >On Fri, Aug 23, 2019 at 1:22 PM hb <343122...@163.com> wrote:
> >>> >> >>> >
> >>> >> >>> >> On flink1.9.0 the local web home page is 404. Code:
> >>> >> >>> >> ```
> >>> >> >>> >> var config = new Configuration()
> >>> >> >>> >> config.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER,
> true)
> >>> >> >>> >>
> >>> >> >>> >> config.setInteger(RestOptions.PORT, 8089)
> >>> >> >>> >> val env =
> StreamExecutionEnvironment.createLocalEnvironment(8,
> >>> >> config)
> >>> >> >>> >> ```
> >>> >> >>> >> Opening http://localhost:8089/ shows
> >>> >> >>> >> {"errors":["Not found."]}
> >>> >> >>> >> Opening http://localhost:8089/taskmanagers/ works fine
> >>> >> >>>
> >>> >> >>
> >>> >>
> >>>
> >>
>


Re: Re: Re: Re: flink1.9.0 LOCAL_WEBSERVER question

2019-08-23 Thread Zili Chen
Let me see whether adding a dependency or manually placing the web files can solve this.

Also, "the config option has no effect" means the option is not used anywhere in the code, so however you set it, it has no impact on the program; it does not mean that setting it broke the WebUI.

Best,
tison.


On Fri, Aug 23, 2019 at 2:59 PM Zili Chen  wrote:

> Oh, I see. That is because /taskmanagers is a REST endpoint; Flink's web server is actually working fine, which is why it responds to you normally.
>
> When you open the home page, Flink needs the corresponding html files but cannot find them, so it tells you not found
>
> Best,
> tison.
>
>
> On Fri, Aug 23, 2019 at 2:51 PM hb <343122...@163.com> wrote:
>
>> My understanding: being able to reach the restful API on this port should mean the setting took effect, yet the home page is 404
>>
>>
>> Visiting http://localhost:8089/ gives 404
>> Visiting http://localhost:8089/taskmanagers/
>> works:
>>
>> {"taskmanagers":[{"id":"ef7030d2-eb13-4c68-8d0c-20b3d59616c8","path":"akka://flink/user/taskmanager_0","dataPort":-1,"timeSinceLastHeartbeat":0,"slotsNumber":8,"freeSlots":0,"hardware":{"cpuCores":4,"physicalMemory":34359738368,"freeMemory":7635730432,"managedMemory":5338540032}}]}
>>
>>
>> Source code:
>> ```
>> package test
>>
>>
>> import org.apache.flink.configuration.{ConfigConstants, Configuration,
>> ResourceManagerOptions, RestOptions}
>> import org.apache.flink.streaming.api.functions.source.SourceFunction
>> import org.apache.flink.streaming.api.scala.{StreamExecutionEnvironment,
>> _}
>>
>>
>>
>>
>> object File1 extends App {
>>   class MySource extends SourceFunction[String] {
>> override def run(sourceContext:
>> SourceFunction.SourceContext[String]): Unit = {
>>   for (i <- 1 to 1000) {
>> sourceContext.collect(i.toString)
>> Thread.sleep(1)
>>   }
>> }
>> override def cancel(): Unit = {}
>>   }
>>
>>
>>
>>
>>   var config = new Configuration()
>>   config.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true)
>>
>>
>>   config.setInteger(RestOptions.PORT, 8089)
>>   val env = StreamExecutionEnvironment.createLocalEnvironment(8, config)
>>   env.addSource(new MySource).print()
>>
>>
>>   env.execute()
>> }
>> ```
>>
>>
>>
>>
>>
>>
>> 在 2019-08-23 13:41:36,"Zili Chen"  写道:
>> >源码里没有标为废弃应该是 FLINK 的一个 issue,你可以到 JIRA[1]上提,这个选项确实是没有用的。
>> >
>> >听起来你的程序是个测试程序,能提供相应的源码吗?如果你说后面能访问 /taskmanagers 的话可能并没有问题(x
>> >
>> >Best,
>> >tison.
>> >
>> >[1] https://issues.apache.org/jira/browse/
>> >
>> >
>> >hb <343122...@163.com> 于2019年8月23日周五 下午2:27写道:
>> >
>> >> 我在idea里 用maven 下载依赖,在idea里运行flink程序源码里没有标志为废弃啊
>> >> ```package
>> >> org.apache.flink.configurationpublic final class
>> ConfigConstants
>> >> {  ... /** * @deprecated Use {@link
>> >> ResourceManagerOptions#LOCAL_NUMBER_RESOURCE_MANAGER} instead.
>> >> */    @Deprecatedpublic static final int
>> >> DEFAULT_LOCAL_NUMBER_RESOURCE_MANAGER = 1;   public
>> >> static final String LOCAL_START_WEBSERVER =
>> >> "local.start-webserver";```
>> >> 在 2019-08-23 13:07:27,"Zili Chen"  写道:
>> >> >另外有个问题是,你是下载二进制 zip 还是从源码编译安装的?
>> >> >
>> >> >Best,
>> >> >tison.
>> >> >
>> >> >
>> >> >Zili Chen  于2019年8月23日周五 下午2:04写道:
>> >> >
>> >> >> 我切到了 1.9 的代码上看,这个 ConfigConstants.LOCAL_START_WEBSERVER
>> >> 应该是个废设置,设不设都没有任何效果。
>> >> >>
>> >> >> 所以问题应该跟这个选项没关系,比如你刷新 localhost:8089 能不能恢复?
>> >> >>
>> >> >> Best,
>> >> >> tison.
>> >> >>
>> >> >>
>> >> >> hb <343122...@163.com> 于2019年8月23日周五 下午1:47写道:
>> >> >>
>> >> >>> 1.9 版本之前,都是可以这么用的,正常的,1.9也是有这个API的啊
>> >> >>> 在 2019-08-23 12:28:14,"Zili Chen"  写道:
>> >> >>> >你是在哪看到这个配置的,我查了下代码甚至这个选项都没有使用点(x
>> >> >>> >
>> >> >>> >Best,
>> >> >>> >tison.
>> >> >>> >
>> >> >>> >
>> >> >>> >hb <343122...@163.com> 于2019年8月23日周五 下午1:22写道:
>> >> >>> >
>> >> >>> >> flink1.9.0 下 本地web 页面主页404,代码:
>> >> >>> >> ```
>> >> >>> >> var config = new Configuration()
>> >> >>> >> config.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true)
>> >> >>> >>
>> >> >>> >> config.setInteger(RestOptions.PORT, 8089)
>> >> >>> >> val env = StreamExecutionEnvironment.createLocalEnvironment(8,
>> >> config)
>> >> >>> >> ```
>> >> >>> >> 打开 http://localhost:8089/ 显示
>> >> >>> >> {"errors":["Not found."]}
>> >> >>> >> 打开 http://localhost:8089/taskmanagers/ 能正常显示
>> >> >>>
>> >> >>
>> >>
>>
>


Re: Re: Re: Re: flink1.9.0 LOCAL_WEBSERVER issue

2019-08-23 Thread Zili Chen
Oh, I see. That's because the /taskmanagers you are hitting is a REST endpoint; Flink's WebUI is in fact working, so it answers you normally.

When you hit the homepage, loading it needs the corresponding html and related files, which Flink cannot find, so it tells you not found

Best,
tison.


hb <343122...@163.com> wrote on Fri, Aug 23, 2019, 2:51 PM:

> My understanding: being able to reach the RESTful API on this port should mean the setting took effect, yet the homepage is still 404
>
>
> Visiting: http://localhost:8089/   404
> Visiting: http://localhost:8089/taskmanagers/
> Normal:
>
> {"taskmanagers":[{"id":"ef7030d2-eb13-4c68-8d0c-20b3d59616c8","path":"akka://flink/user/taskmanager_0","dataPort":-1,"timeSinceLastHeartbeat":0,"slotsNumber":8,"freeSlots":0,"hardware":{"cpuCores":4,"physicalMemory":34359738368,"freeMemory":7635730432,"managedMemory":5338540032}}]}
>
>
> Source:
> ```
> package test
>
>
> import org.apache.flink.configuration.{ConfigConstants, Configuration,
> ResourceManagerOptions, RestOptions}
> import org.apache.flink.streaming.api.functions.source.SourceFunction
> import org.apache.flink.streaming.api.scala.{StreamExecutionEnvironment, _}
>
>
>
>
> object File1 extends App {
>   class MySource extends SourceFunction[String] {
> override def run(sourceContext: SourceFunction.SourceContext[String]):
> Unit = {
>   for (i <- 1 to 1000) {
> sourceContext.collect(i.toString)
> Thread.sleep(1)
>   }
> }
> override def cancel(): Unit = {}
>   }
>
>
>
>
>   var config = new Configuration()
>   config.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true)
>
>
>   config.setInteger(RestOptions.PORT, 8089)
>   val env = StreamExecutionEnvironment.createLocalEnvironment(8, config)
>   env.addSource(new MySource).print()
>
>
>   env.execute()
> }
> ```


Re: Re: Re: flink1.9.0 LOCAL_WEBSERVER issue

2019-08-23 Thread Zili Chen
That it is not marked deprecated in the source should be a FLINK issue; you can file it on JIRA [1]. This option is indeed unused.

It sounds like your program is a test program; could you provide the source? If, as you say, /taskmanagers is reachable afterwards, there may be no problem at all (x

Best,
tison.

[1] https://issues.apache.org/jira/browse/


hb <343122...@163.com> wrote on Fri, Aug 23, 2019, 2:27 PM:

> I downloaded the dependencies with maven in IDEA and run the flink program there; in the source it is not marked deprecated:
> ```
> package org.apache.flink.configuration;
>
> public final class ConfigConstants {
>   ...
>   /** @deprecated Use {@link ResourceManagerOptions#LOCAL_NUMBER_RESOURCE_MANAGER} instead. */
>   @Deprecated
>   public static final int DEFAULT_LOCAL_NUMBER_RESOURCE_MANAGER = 1;
>
>   public static final String LOCAL_START_WEBSERVER = "local.start-webserver";
> }
> ```


Re: Re: flink1.9.0 LOCAL_WEBSERVER issue

2019-08-23 Thread Zili Chen
Another question: did you download the binary zip, or compile and install from source?

Best,
tison.




Re: Re: flink1.9.0 LOCAL_WEBSERVER issue

2019-08-23 Thread Zili Chen
I switched to the 1.9 code to check: this ConfigConstants.LOCAL_START_WEBSERVER should be a dead setting; whether you set it or not has no effect at all.

So the problem should be unrelated to this option. For example, does refreshing localhost:8089 recover it?

Best,
tison.


hb <343122...@163.com> wrote on Fri, Aug 23, 2019, 1:47 PM:

> Before 1.9 it could be used this way and worked fine; 1.9 still has this API.
> On 2019-08-23 12:28:14, "Zili Chen"  wrote:
> >Where did you see this config? I checked the code and this option does not even have a usage site (x
> >
> >Best,
> >tison.
> >
> >
> >hb <343122...@163.com> wrote on Fri, Aug 23, 2019, 1:22 PM:
> >
> >> Under flink1.9.0 the local web homepage returns 404. Code:
> >> ```
> >> var config = new Configuration()
> >> config.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true)
> >>
> >> config.setInteger(RestOptions.PORT, 8089)
> >> val env = StreamExecutionEnvironment.createLocalEnvironment(8, config)
> >> ```
> >> Opening http://localhost:8089/ shows
> >> {"errors":["Not found."]}
> >> Opening http://localhost:8089/taskmanagers/ renders normally
>


Re: TaskManager not connecting to ResourceManager in HA mode

2019-08-22 Thread Zili Chen
Nice to hear :-)

Best,
tison.


Aleksandar Mastilovic  wrote on Fri, Aug 23, 2019, 2:22 AM:

> Thanks for all the help, people - you made me go through my code once
> again and discover that I switched argument positions for job manager and
> resource manager addresses :-)
>
> The docker ensemble now starts fine, I’m working on ironing out the bugs
> now.
>
> I’ll participate in the survey too!
Re: [ANNOUNCE] Apache Flink 1.9.0 released

2019-08-22 Thread Zili Chen
Congratulations!

Thanks Gordon and Kurt for being the release managers.

Thanks all the contributors who have made this release possible.

Best,
tison.


Jark Wu  wrote on Thu, Aug 22, 2019, 8:11 PM:

> Congratulations!
>
> Thanks Gordon and Kurt for being the release managers, and thanks a lot to
> all the contributors.
>
>
> Cheers,
> Jark
>
> On Thu, 22 Aug 2019 at 20:06, Oytun Tez  wrote:
>
>> Congratulations team; thanks for the update, Gordon.
>>
>> ---
>> Oytun Tez
>>
>> *M O T A W O R D*
>> The World's Fastest Human Translation Platform.
>> oy...@motaword.com — www.motaword.com
>>
>>
>> On Thu, Aug 22, 2019 at 8:03 AM Tzu-Li (Gordon) Tai 
>> wrote:
>>
>>> The Apache Flink community is very happy to announce the release of
>>> Apache Flink 1.9.0, which is the latest major release.
>>>
>>> Apache Flink® is an open-source stream processing framework for
>>> distributed, high-performing, always-available, and accurate data streaming
>>> applications.
>>>
>>> The release is available for download at:
>>> https://flink.apache.org/downloads.html
>>>
>>> Please check out the release blog post for an overview of the
>>> improvements for this new major release:
>>> https://flink.apache.org/news/2019/08/22/release-1.9.0.html
>>>
>>> The full release notes are available in Jira:
>>>
>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344601
>>>
>>> We would like to thank all contributors of the Apache Flink community
>>> who made this release possible!
>>>
>>> Cheers,
>>> Gordon
>>>
>>


Re: Re: flink startup waits 10 minutes

2019-08-21 Thread Zili Chen
Basically you are stuck at the step of uploading the user jars; going from job submission to a successful deployment takes only an instant

2019-08-22 11:38:02,185 INFO
org.apache.flink.yarn.AbstractYarnClusterDescriptor   - Submitting
application master application_1566383236573_0004
2019-08-22 11:38:02,226 INFO
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted
application application_1566383236573_0004

What FLINK does during the gap in between is upload the jars your job depends on

Best,
tison.


々守护々 <346531...@qq.com> wrote on Thu, Aug 22, 2019, 11:49 AM:

> This is the log printed in the terminal; only after ten minutes does it reach - Submitting application master
> application_1566383236573_0004
>
>
>
> 2019-08-22 11:28:21,766 WARN
> org.apache.flink.yarn.AbstractYarnClusterDescriptor   - This YARN
> session requires 34816MB of memory in the cluster. There are currently only
> 26624MB available.
> The Flink YARN client will try to allocate the YARN session, but maybe not
> all TaskManagers are connecting from the beginning because the resources
> are currently not available in the cluster. The allocation might take more
> time than usual because the Flink YARN client needs to wait until the
> resources become available.
> 2019-08-22 11:28:21,766 WARN
> org.apache.flink.yarn.AbstractYarnClusterDescriptor   - There is
> not enough memory available in the YARN cluster. The TaskManager(s) require
> 8192MB each. NodeManagers available: [14336, 12288]
> After allocating the JobManager (2048MB) and (2/4) TaskManagers, the
> following NodeManagers are available: [4096, 4096]
> The Flink YARN client will try to allocate the YARN session, but maybe not
> all TaskManagers are connecting from the beginning because the resources
> are currently not available in the cluster. The allocation might take more
> time than usual because the Flink YARN client needs to wait until the
> resources become available.
> 2019-08-22 11:28:21,766 WARN
> org.apache.flink.yarn.AbstractYarnClusterDescriptor   - There is
> not enough memory available in the YARN cluster. The TaskManager(s) require
> 8192MB each. NodeManagers available: [14336, 12288]
> After allocating the JobManager (2048MB) and (3/4) TaskManagers, the
> following NodeManagers are available: [4096, 4096]
> The Flink YARN client will try to allocate the YARN session, but maybe not
> all TaskManagers are connecting from the beginning because the resources
> are currently not available in the cluster. The allocation might take more
> time than usual because the Flink YARN client needs to wait until the
> resources become available.
> 2019-08-22 11:28:21,767 INFO
> org.apache.flink.yarn.AbstractYarnClusterDescriptor   - Cluster
> specification: ClusterSpecification{masterMemoryMB=2048,
> taskManagerMemoryMB=8192, numberTaskManagers=4, slotsPerTaskManager=2}
> 2019-08-22 11:28:22,317 WARN
> org.apache.flink.yarn.AbstractYarnClusterDescriptor   - The
> configuration directory ('/usr/flink/flink-1.8.1/conf') contains both LOG4J
> and Logback configuration files. Please delete or rename one of them.
> 2019-08-22 11:38:02,185 INFO
> org.apache.flink.yarn.AbstractYarnClusterDescriptor   - Submitting
> application master application_1566383236573_0004
> 2019-08-22 11:38:02,226 INFO
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted
> application application_1566383236573_0004
> 2019-08-22 11:38:02,226 INFO
> org.apache.flink.yarn.AbstractYarnClusterDescriptor   - Waiting for
> the cluster to be allocated
> 2019-08-22 11:38:02,228 INFO
> org.apache.flink.yarn.AbstractYarnClusterDescriptor   - Deploying
> cluster, current state ACCEPTED
> 2019-08-22 11:38:07,244 INFO
> org.apache.flink.yarn.AbstractYarnClusterDescriptor   - YARN
> application has been deployed successfully.
> 2019-08-22 11:38:07,244 INFO
> org.apache.flink.yarn.AbstractYarnClusterDescriptor   - The Flink
> YARN client has been started in detached mode. In order to stop Flink on
> YARN, use the following command or a YARN web interface to stop it:
> yarn application -kill application_1566383236573_0004
> Please also note that the temporary files of the YARN session in the home
> directory will not be removed.
> Job has been submitted with JobID c16c4fc1556ccb2ceaaa2f0e8f32ec88
>

Re: Re: flink startup waits 10 minutes

2019-08-21 Thread Zili Chen
By "it just stops there", do you mean the terminal output of flink run stops moving? Check in that terminal output when YARN accepted your application; I suspect
the YARN cluster was busy, so there was no response for 10 minutes.

Best,
tison.


Zili Chen  wrote on Thu, Aug 22, 2019, 11:35 AM:

> user-zh does not support inline images; use third-party storage and paste a link, or as I recall you can attach files to the email
>
> Best,
> tison.
>
>
> 々守护々 <346531...@qq.com> wrote on Thu, Aug 22, 2019, 11:33 AM:
>
>> It's like this: during startup the client log just stops there, and only after waiting 10 minutes is the yarn application created
>> <http://hadoop.changba.com/cluster/app/application_1566383236573_0003>
>>
>>
>> -- Original message --
>> *From:* "Yun Tang";
>> *Sent:* Thursday, Aug 22, 2019, 11:23 AM
>> *To:* "user-zh";
>> *Subject:* Re: Re: flink startup waits 10 minutes
>>
>> The startup time was 20:00:25, and all tasks were running; even the first checkpoint
>> completed at 20:00:42, i.e. 17 seconds in total. Where is the 10-minute problem?
>> 
>> From: 々守护々 <346531...@qq.com>
>> Sent: Thursday, August 22, 2019 11:18
>> To: user-zh 
>> Subject: Re: flink startup waits 10 minutes
>>
>> Hello, this is my jobmanager startup log; please help take a look, thanks!
>>
>>
>> 2019-08-21 20:00:25,428 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint -
>> 
>> 2019-08-21 20:00:25,430 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Registered
>> UNIX signal handlers for [TERM, HUP, INT]
>> 2019-08-21 20:00:25,433 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint - YARN daemon
>> is running as: root Yarn client user obtainer: root
>> 2019-08-21 20:00:25,437 INFO
>> org.apache.flink.configuration.GlobalConfiguration- Loading
>> configuration property: high-availability.cluster-id,
>> application_1566383236573_0003
>> 2019-08-21 20:00:25,437 INFO
>> org.apache.flink.configuration.GlobalConfiguration- Loading
>> configuration property: jobmanager.rpc.address, localhost
>> 2019-08-21 20:00:25,437 INFO
>> org.apache.flink.configuration.GlobalConfiguration- Loading
>> configuration property: jobmanager.rpc.port, 6123
>> 2019-08-21 20:00:25,437 INFO
>> org.apache.flink.configuration.GlobalConfiguration- Loading
>> configuration property: high-availability.zookeeper.path.root, /flink-yarn
>> 2019-08-21 20:00:25,437 INFO
>> org.apache.flink.configuration.GlobalConfiguration- Loading
>> configuration property: high-availability.storageDir,
>> hdfs://host51:9000/flink/ha-yarn
>> 2019-08-21 20:00:25,438 INFO
>> org.apache.flink.configuration.GlobalConfiguration- Loading
>> configuration property: high-availability.zookeeper.quorum, host51:2181
>> 2019-08-21 20:00:25,438 INFO
>> org.apache.flink.configuration.GlobalConfiguration- Loading
>> configuration property: internal.cluster.execution-mode, DETACHED
>> 2019-08-21 20:00:25,438 INFO
>> org.apache.flink.configuration.GlobalConfiguration- Loading
>> configuration property: high-availability, zookeeper
>> 2019-08-21 20:00:25,438 INFO
>> org.apache.flink.configuration.GlobalConfiguration- Loading
>> configuration property: parallelism.default, 1
>> 2019-08-21 20:00:25,438 INFO
>> org.apache.flink.configuration.GlobalConfiguration- Loading
>> configuration property: yarn.application-attempts, 10
>> 2019-08-21 20:00:25,438 INFO
>> org.apache.flink.configuration.GlobalConfiguration- Loading
>> configuration property: taskmanager.numberOfTaskSlots, 2
>> 2019-08-21 20:00:25,439 INFO
>> org.apache.flink.configuration.GlobalConfiguration- Loading
>> configuration property: taskmanager.heap.size, 8192m
>> 2019-08-21 20:00:25,439 INFO
>> org.apache.flink.configuration.GlobalConfiguration- Loading
>> configuration property: jobmanager.heap.size, 2048m
>> 2019-08-21 20:00:25,465 WARN
>> org.apache.flink.configuration.Configuration  - Config uses
>> deprecated configuration key 'web.port' instead of proper key
>> 'rest.bind-port'
>> 2019-08-21 20:00:25,469 INFO
>> org.apache.flink.runtime.clusterframework.BootstrapTools  - Setting
>> directories for temporary files to:
>> /usr/hadoop/hadoop-2.7.7/tmp/nm-local-dir/usercache/root/appcache/application_1566383236573_0003
>> 2019-08-21 20:00:25,485 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Starting
>> YarnJobClusterEntrypoint.
>> 2019-08-21 20:00:25,485 INFO
>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Install
>> default filesystem.
>> 2019-08-21 20:00:25,561 INFO
>> org.apache.flink.runtime.security.modules.HadoopModule- H

Re: TaskManager not connecting to ResourceManager in HA mode

2019-08-21 Thread Zili Chen
Besides, would you like to participate in our survey thread [1] on the
user list about "How do you use high-availability services in Flink?"

It would help Flink improve its high-availability offering.

Best,
tison.

[1]
https://lists.apache.org/x/thread.html/c0cc07197e6ba30b45d7709cc9e17d8497e5e3f33de504d58dfcafad@%3Cuser.flink.apache.org%3E




Re: TaskManager not connecting to ResourceManager in HA mode

2019-08-21 Thread Zili Chen
Hi Aleksandar,

based on your log:

taskmanager_1   | 2019-08-22 00:05:03,713 INFO
 org.apache.flink.runtime.taskexecutor.TaskExecutor- Connecting
to ResourceManager
akka.tcp://flink@jobmanager:6123/user/jobmanager()
.
taskmanager_1   | 2019-08-22 00:05:04,137 INFO
 org.apache.flink.runtime.taskexecutor.TaskExecutor- Could not
resolve ResourceManager address
akka.tcp://flink@jobmanager:6123/user/jobmanager, retrying in 1 ms:
Could not connect to rpc endpoint under address
akka.tcp://flink@jobmanager:6123/user/jobmanager..

it looks like you return the jobmanager address from the retrieval service for the
resource manager. Please check the implementation carefully, or share it on the
mailing list so that others can help investigate.

Best,
tison.


Zhu Zhu  wrote on Thu, Aug 22, 2019, 10:11 AM:

> Hi Aleksandar,
>
> The resource manager address is retrieved from the HA services.
> Would you check whether your customized HA services are returning the
> right LeaderRetrievalService and whether the LeaderRetrievalService is
> really retrieving the right leader's address?
> Or is it possible that the stored resource manager address in HA is
> replaced by jobmanager address in any case?
>
> Thanks,
> Zhu Zhu
>
> Aleksandar Mastilovic  于2019年8月22日周四
> 上午8:16写道:
>
>> Hi all,
>>
>> I’m experimenting with using my own implementation of HA services instead
>> of ZooKeeper that would persist JobManager information on a Kubernetes
>> volume instead of in ZooKeeper.
>>
>> I’ve set the high-availability option in flink-conf.yaml to the FQN of my
>> factory class, and started the docker ensemble as I usually do (i.e. with
>> no special “cluster” arguments or scripts.)
>>
>> What’s happening now is that TaskManager is unable to connect to
>> ResourceManager, because it seems it’s trying to use the /user/jobmanager
>> path instead of /user/resourcemanager.
>>
>> Here’s what I found in the logs:
>>
>>
>> jobmanager_1| 2019-08-22 00:05:00,963 INFO  akka.remote.Remoting
>>  - Remoting started; listening on
>> addresses :[akka.tcp://flink@jobmanager:6123]
>> jobmanager_1| 2019-08-22 00:05:00,975 INFO
>>  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils - Actor
>> system started at akka.tcp://flink@jobmanager:6123
>>
>> jobmanager_1| 2019-08-22 00:05:02,380 INFO
>>  org.apache.flink.runtime.rpc.akka.AkkaRpcService  - Starting
>> RPC endpoint for
>> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at
>> akka://flink/user/resourcemanager .
>>
>> jobmanager_1| 2019-08-22 00:05:03,138 INFO
>>  org.apache.flink.runtime.rpc.akka.AkkaRpcService  - Starting
>> RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher
>> at akka://flink/user/dispatcher .
>>
>> jobmanager_1| 2019-08-22 00:05:03,211 INFO
>>  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  -
>> ResourceManager akka.tcp://flink@jobmanager:6123/user/resourcemanager
>> was granted leadership with fencing token 
>>
>> jobmanager_1| 2019-08-22 00:05:03,292 INFO
>>  org.apache.flink.runtime.dispatcher.StandaloneDispatcher  - Dispatcher
>> akka.tcp://flink@jobmanager:6123/user/dispatcher was granted leadership
>> with fencing token ----
>>
>> taskmanager_1   | 2019-08-22 00:05:03,713 INFO
>>  org.apache.flink.runtime.taskexecutor.TaskExecutor- Connecting
>> to ResourceManager
>> akka.tcp://flink@jobmanager:6123/user/jobmanager()
>> .
>> taskmanager_1   | 2019-08-22 00:05:04,137 INFO
>>  org.apache.flink.runtime.taskexecutor.TaskExecutor- Could not
>> resolve ResourceManager address
>> akka.tcp://flink@jobmanager:6123/user/jobmanager, retrying in 1 ms:
>> Could not connect to rpc endpoint under address
>> akka.tcp://flink@jobmanager:6123/user/jobmanager..
>>
>> Is this a known bug? I’d appreciate any help I can get.
>>
>> Thanks,
>> Aleksandar Mastilovic
>>
>


Re: Recovery from job manager crash using check points

2019-08-21 Thread Zili Chen
Hi Min,

For your question, the answer is no.

In the standalone case Flink uses an in-memory checkpoint store, which
is able to restore the savepoint configured on the command line and
recover state from it.

Besides, stopping with a savepoint and resuming the job from that savepoint
is the standard path for migrating jobs.

Best,
tison.
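
For reference, the usual command sequence (the job id and paths here are illustrative):

```
# trigger a savepoint for a running job
./bin/flink savepoint <jobId> hdfs:///flink/savepoints

# or cancel the job with a savepoint in one step
./bin/flink cancel -s hdfs:///flink/savepoints <jobId>

# resume from the savepoint
./bin/flink run -s hdfs:///flink/savepoints/savepoint-xxxx myJob.jar
```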


 wrote on Wed, Aug 21, 2019, 9:46 PM:

> Thanks for the helpful reply.
>
>
>
> One more question, does this zookeeper or HA requirement apply for a
> savepoint?
>
>
>
> Can I bounce a single jobmanager cluster and rerun my flink job from its
> previous states with a save point directory? e.g.
>
> ./bin/flink run myJob.jar -s savepointDirectory
>
>
>
> Regards,
>
>
>
> Min
>
>
>
> *From:* Zili Chen [mailto:wander4...@gmail.com]
> *Sent:* Dienstag, 20. August 2019 04:16
> *To:* Biao Liu
> *Cc:* Tan, Min; user
> *Subject:* [External] Re: Recovery from job manager crash using check
> points
>
>
>
> Hi Min,
>
>
>
> I guess you use standalone high-availability, and when a TM fails,
>
> the JM can recover the job from an in-memory checkpoint store.
>
>
>
> However, when the JM fails, since you don't persist state on an HA backend
>
> such as ZooKeeper, even if the JM is relaunched by the YARN RM or superseded by a
>
> standby, the new one knows nothing about the previous jobs.
>
>
>
> In short, you need to set up ZooKeepers as you yourself mentioned.
>
>
>
> Best,
>
> tison.
>
>
>
>
>
> Biao Liu  wrote on Mon, Aug 19, 2019, 11:49 PM:
>
> Hi Min,
>
>
>
> > Do I need to set up zookeepers to keep the states when a job manager
> crashes?
>
>
>
> I guess you need to set up the HA [1] properly. Besides that, I would
> suggest you should also check the state backend.
>
>
>
> 1.
> https://ci.apache.org/projects/flink/flink-docs-master/ops/jobmanager_high_availability.html
>
> 2.
> https://ci.apache.org/projects/flink/flink-docs-master/ops/state/state_backends.html
>
>
>
> Thanks,
>
> Biao /'bɪ.aʊ/
>
>
>
>
>
>
>
> On Mon, 19 Aug 2019 at 23:28,  wrote:
>
> Hi,
>
>
>
> I can use check points to recover Flink states when a task manger crashes.
>
>
>
> I can not use check points to recover Flink states when a job manger
> crashes.
>
>
>
> Do I need to set up zookeepers to keep the states when a job manager
> crashes?
>
>
>
> Regards
>
>
>
> Min
>
>
>
>


FLINK WEEKLY 2019/33

2019-08-21 Thread Zili Chen
Recently the FLINK community's weekly community update has resumed publication [1]. Given that the FLINK Chinese community has grown to the point of opening a dedicated user-zh
mailing list, I plan to try compiling a Chinese version of the FLINK weekly community update.

The first issue was published here [0]; following the discussion with the community, future issues will also be posted to the mailing list, with the format adjusted for reading as email.
The English WEEKLY focuses mainly on development and community events, but I find that users in the Chinese community are not very interested in events that mostly take
place overseas, while the demand for answers to common FLINK questions is high. So in the first part I select some meaningful Q&A emails from the FLINK user and user-zh
lists. Since I mainly focus on the runtime part, I am not familiar with other modules, especially SQL,
and will not excerpt them for now. Anyone who wants to collaborate on the WEEKLY is welcome to contact me directly.
This WEEKLY is structured as: USER, answers to user questions; DEV, progress of community development and proposals; NEWS, community news; and finally, as an appendix to this
first WEEKLY, how to join the FLINK mailing lists.
USER
Flink 1.8 run parameters differ

Concerns the change that since FLINK 1.8.1 the pre-bundled Hadoop package is released separately; it has to be downloaded separately, or an
existing Hadoop path has to be specified.
[DISCUSS] Flink Docker Playgrounds

Fabian Hueske notes that the community has published a sample Flink Docker image, offering users who want to deploy Flink
on Docker a convenient way to try it out.
Why available task slots are not leveraged for pipeline?

On the relation between slot usage and parallelism in FLINK scheduling, involving the effect of operator chaining.
flink on yarn: with per-job submission, how is high availability guaranteed?

The answer introduces the implementation of FLINK's high-availability mechanism.
What should the "Data Source" be translated into Chinese

A discussion of how to translate the term Data Source in the Chinese translation of the FLINK documentation. The translation effort is led by Jark Wu,
with umbrella issue FLINK-11529 [2]; it is a good starting point for Chinese community users to contribute.
How many task managers can Flink efficiently scale to?

How large a cluster can FLINK handle? Developers from Alibaba shared their in-house experience and introduced some common bottlenecks.
Launching Flink server from IntelliJ

How to run a FLINK program in IntelliJ IDEA?
DEV
In the past two weeks a total of 6 FLIPs (FLINK Improvement Proposals)
were raised; with
1.9.0 smoothly in the release phase, discussion of the next development cycle keeps heating up in the FLINK community.
[DISCUSS] FLIP-49: Unified Memory Configuration for TaskExecutors

Xintong Song's FLIP-49 mainly addresses the differing memory configuration of streaming and batch and the difficulty of configuring RocksDB. It reworks
MemoryManager/TaskManager and the Configuration itself to support a unified configuration interface and improve the readability of config options. This FLIP
is still in the early discussion stage.
[DISCUSS] FLIP-50: Spill-able Heap Keyed State Backend

Yu Li's FLIP-50 aims to implement a new state backend, the spill-able heap keyed state
backend, which will effectively reduce the chance of OOM during job execution. This FLIP has entered the VOTE stage.
[DISCUSS] FLIP-51: Rework of the Expression Design

JingsongLee's FLIP-51 arose from the community's integration of the BLINK planner and is part of the FLINK SQL
improvements. FLINK's SQL support depends heavily on Calcite; this FLIP will reduce the complexity of converting between FLINK Expressions and Calcite
RexNodes, and make the TableAPI and Calcite definitions more consistent. This FLIP has already entered
the VOTE stage.
[DISCUSS] FLIP-52: Remove legacy Program interface.

Kostas Kloudas's FLIP-52 builds on the discussion [3] and survey [4] I raised earlier, and mainly removes the Program
interface and its code paths from the codebase. The interface was offered by FLINK's predecessor Stratosphere as the early job submission interface, and has
gone unused as FLINK evolved. The deprecation and removal of Program is also part of the client API rework under discussion. This FLIP
has been accepted by the community and has entered the implementation stage.
[DISCUSS] FLIP-53: Fine Grained Resource Management

Xintong Song's FLIP-53 aims to rework FLINK's resource management module and truly activate the ResourceProfile
class and its related logic. FLINK's current resource management is rather coarse-grained: resources are allocated per slot, and concrete resources are hard to configure. FLIP-53
proposes a resource management view unified across batch and streaming, with the ability to configure resources at a fine granularity and change resource configuration dynamically. This FLIP is still in the early discussion stage.
[DISCUSS] FLIP-54: Evolve ConfigOption and Configuration

Timo Walther's FLIP-54 started from the observation, made while developing TableAPI-related features, that FLINK's ExecutionConfig and
TableConfig are not easy to configure, and raises the more general point that FLINK's
user configuration should internally offer a consistent view and support convenient evolution, i.e. adding new config options. This FLIP is still in the early discussion stage.
Rework threading model of CheckpointCoordinator

FLINK-13698, a collaboration between Piotr Nowojski and Biao Liu, mainly unifies the threading model of the CheckpointCoordinator.
Each FLINK component was originally designed with an actor-like single main thread
model, but as the code evolved the threading became more and more tangled, and many hard-to-debug, lingering bugs
stem from this. Sorting out and reworking the components' threading models helps reduce the stability and performance problems caused by unnecessary or poorly managed concurrency.
[DISCUSS] Best practice to run flink on kubernetes

The discussion started by Yang Wang about FLINK on k8s continues FLINK-9953 

Re: [SURVEY] How do you use high-availability services in Flink?

2019-08-21 Thread Zili Chen
In addition, FLINK-13750 [1] will also likely introduce breaking changes
to the high-availability services, so you who might be affected by the
change are highly encouraged to share your cases :-)

Best,
tison.

[1] https://issues.apache.org/jira/browse/FLINK-13750




[SURVEY] How do you use high-availability services in Flink?

2019-08-21 Thread Zili Chen
Hi guys,

We want to have an accurate idea of how users actually use
high-availability services in Flink, especially how you customize
high-availability services by HighAvailabilityServicesFactory.

Basically there are standalone impl., zookeeper impl., embedded impl.
used in MiniCluster, YARN impl. not yet implemented, and a gate to
customized implementations.

Generally I think standalone impl. and zookeeper impl. are the most
widely used implementations. The embedded impl. is used without
awareness when users run a MiniCluster.

Besides that, it is helpful to know how you customize
high-availability services using the HighAvailabilityServicesFactory
interface, for the ongoing FLINK-10333 [1] which will evolve
high-availability services in Flink, as well as whether any user takes
interest in the not-yet-implemented YARN impl.

Any use case would be helpful. I really appreciate your time and your
insight.

Best,
tison.

[1] https://issues.apache.org/jira/browse/FLINK-10333
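
For anyone replying with a customized implementation, a minimal skeleton of the factory gate mentioned above; the method signature is the one I see in the 1.8/1.9 line, and the class name is made up. You then point the high-availability option in flink-conf.yaml at the factory's fully qualified class name:

```
import java.util.concurrent.Executor;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.highavailability.HighAvailabilityServices;
import org.apache.flink.runtime.highavailability.HighAvailabilityServicesFactory;

public class MyHaServicesFactory implements HighAvailabilityServicesFactory {

    @Override
    public HighAvailabilityServices createHAServices(Configuration configuration, Executor executor)
            throws Exception {
        // Return your own HighAvailabilityServices here, e.g. one that stores
        // leader information on a shared volume instead of in ZooKeeper.
        throw new UnsupportedOperationException("fill in your implementation");
    }
}
```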


Re: Weekly Community Update 2019/33, Personal Chinese Version

2019-08-19 Thread Zili Chen
Hi Paul & Jark,

Thanks for your feedback!

I also thought of putting the content in the email, but hesitated over
where it should be sent (user-zh only, IMO), what kind of thread
it should be sorted into ([ANNOUNCE] or just a normal thread), and how
to format it to fit the email form.

It is reasonable to keep this thread in sync on both the user-zh list
and the web page, so that we follow the Apache way and keep it
convenient to reach for (potential) Chinese users.

I'm glad to put the content in the email but decided to collect some
feedback on the idea first :-) If there are no other suggestions I am going to
start a separate normal thread, i.e., without the [ANNOUNCE] header, on the
user-zh list later today or tomorrow.

Best,
tison.


Jark Wu  wrote on Tue, Aug 20, 2019, 11:28 AM:

> Hi Zili,
>
> +1 for the Chinese Weekly Community Update.
> I think this will definitely attract more Chinese users.
> Btw, could you also put the content of the Chinese Weekly Updates in the
> email? I think this would be more aligned with the Apache Way,
> so that we can help respond to users who have interest/questions in
> some items.
>
> Thanks,
> Jark
>
>
> On Mon, 19 Aug 2019 at 13:27, Paul Lam  wrote:
>
>> Hi Tison,
>>
>> Big +1 for the Chinese Weekly Community Update. The content is
>> well-organized, and I believe it would be very helpful for Chinese users to
>> get an overview of what’s going on in the community.
>>
>> Best,
>> Paul Lam
>>


Re: Recovery from job manager crash using check points

2019-08-19 Thread Zili Chen
Hi Min,

I guess you use standalone high-availability, and when a TM fails,
the JM can recover the job from an in-memory checkpoint store.

However, when the JM fails, since you don't persist state on an HA backend
such as ZooKeeper, even if the JM is relaunched by the YARN RM or superseded
by a standby, the new one knows nothing about the previous jobs.

In short, you need to set up ZooKeeper as you yourself mentioned.

Best,
tison.
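
For reference, a minimal ZooKeeper HA section of flink-conf.yaml (the hosts and paths are illustrative):

```
high-availability: zookeeper
high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
high-availability.storageDir: hdfs:///flink/ha
```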


Biao Liu  wrote on Mon, Aug 19, 2019, 11:49 PM:

> Hi Min,
>
> > Do I need to set up zookeepers to keep the states when a job manager
> crashes?
>
> I guess you need to set up the HA [1] properly. Besides that, I would
> suggest you should also check the state backend.
>
> 1.
> https://ci.apache.org/projects/flink/flink-docs-master/ops/jobmanager_high_availability.html
> 2.
> https://ci.apache.org/projects/flink/flink-docs-master/ops/state/state_backends.html
>
> Thanks,
> Biao /'bɪ.aʊ/
>
>
>
> On Mon, 19 Aug 2019 at 23:28,  wrote:
>
>> Hi,
>>
>>
>>
>> I can use check points to recover Flink states when a task manger crashes.
>>
>>
>>
>> I can not use check points to recover Flink states when a job manger
>> crashes.
>>
>>
>>
>> Do I need to set up zookeepers to keep the states when a job manager
>> crashes?
>>
>>
>>
>> Regards
>>
>>
>>
>> Min
>>
>>
>>
>


Weekly Community Update 2019/33, Personal Chinese Version

2019-08-18 Thread Zili Chen
Hi community,

Inspired by the weekly community update thread, and given the growth
of the Chinese community and the somewhat different concerns of its
members, I'd like to start a personally maintained Chinese version of
the Weekly Community Update.

For now I have posted this week's updates on this page [1], which Chinese
users as well as potential ones can easily reach.

Any feedback is welcome, and I am looking for a collaborator who is
familiar with TableAPI/SQL topics to enrich the content.

Best,
tison.

[1] https://zhuanlan.zhihu.com/p/78753149


Re: [ANNOUNCE] Andrey Zagrebin becomes a Flink committer

2019-08-14 Thread Zili Chen
Congratulations Andrey!

Best,
tison.


Till Rohrmann  wrote on Wed, Aug 14, 2019, 9:26 PM:

> Hi everyone,
>
> I'm very happy to announce that Andrey Zagrebin accepted the offer of the
> Flink PMC to become a committer of the Flink project.
>
> Andrey has been an active community member for more than 15 months. He
> has helped shape numerous features such as State TTL, the FRocksDB release,
> Shuffle service abstraction, FLIP-1, result partition management and
> various fixes/improvements. He's also frequently helping out on the
> user@f.a.o mailing lists.
>
> Congratulations Andrey!
>
> Best, Till
> (on behalf of the Flink PMC)
>


Re: [Discuss] What should the "Data Source" be translated into Chinese

2019-08-13 Thread Zili Chen
The standard translations of Source and Sink are 源 and 汇, so data source and data sink correspond to 数据源 and
数据汇 respectively. For provenance see the China Computer Federation's《计算机科学技术名词》(Terms in Computer Science and Technology); an online lookup is available here [1]

But for FLINK specifically, I actually suggest leaving all FLINK proper nouns untranslated; translation only makes them harder to understand.

Best,
tison.
[1] http://www.termonline.cn/index.htm


hsw  wrote on Tue, Aug 13, 2019, 6:46 PM:

> Agreed; I thought about it for a long while and couldn't come up with a suitable Chinese translation for data sink
>
>
>
> --
> From: Jeff Zhang
> Date: 2019-08-13 18:20:34
> To:
> Cc: dafei1...@sina.com; dev
> Subject: Re: [Discuss] What should the "Data Source" be translated into Chinese
>
> I'd advise against translating it
>
> Simon Su  wrote on Tue, Aug 13, 2019, 6:18 PM:
>
> > I'd prefer not to translate Data Source and Data Sink; explaining them in Chinese is enough
> >
> >
> > Thanks,
> > SImon
> >
> >
> > On 08/13/2019 18:07, wrote:
> > How about translating "data sink" into "数据漕"?
> > 漕 is pronounced cáo; the character's basic meaning is transporting grain by waterway, as in 漕运 | 漕粮. ==>
> > https://baike.baidu.com/item/%E6%BC%95?forcehttps=1%3Ffr%3Dkg_hanyu
> >
> >
> >
> > - Original message -
> > From: Kurt Young 
> > To: dev , user-zh 
> > Subject: Re: [Discuss] What should the "Data Source" be translated into Chinese
> > Date: 2019-08-13 16:44
> >
> > cc user-zh mailing list, since there are lots of Chinese-speaking people.
> > Best,
> > Kurt
> > On Tue, Aug 13, 2019 at 4:02 PM WangHengwei  wrote:
> > Hi all,
> >
> >
> > I'm working on [FLINK-13405] Translate "Basic API Concepts" page into
> > Chinese. I have a problem.
> >
> > Usually we translate "Data Source" into "数据源" but there is no agreed
> > translation for "Data Sink". Since it often appears in documents, I think
> > we'd better to have a unified translation. I have some alternatives, e.g.
> > "数据沉淀","数据归" or "数据终".
> >
> > Committer Xingcan Cui has a good suggestion for "数据汇" which
> > corresponds to source ("数据源"). I asked Committer Jark Wu, he is also fine
> > with it. I think "数据汇" is a good representation of flow charactiristics
> so
> > I would like to use it.
> >
> >
> > I want to hear more thoughts from the community on whether we should
> > translate it and what it should be translated into.
> >
> >
> > Thanks,
> > WangHW
> >
>
>
> --
> Best Regards
>
> Jeff Zhang
>
>


Re: JDBC sink for streams API

2019-08-12 Thread Zili Chen
Hi Eduardo,

JDBCSinkFunction is a package-private class which you can make use of
via JDBCAppendTableSink. A typical statement could be:

JDBCAppendTableSink.builder()
    . ...
    .build()
    .consumeDataStream(upstream);

Also JDBCUpsertTableSink and JDBCTableSourceSinkFactory are worth a
look.

The successor of OutputFormatSinkFunction is StreamingFileSink, but it
seems outside
your requirements.

Best,
tison.
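
For completeness, a compilable sketch of the append path, assuming the 1.9 flink-jdbc API; the driver, URL, and table are illustrative, and setBatchSize(1) just makes the flush visible immediately:

```
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.io.jdbc.JDBCAppendTableSink;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.types.Row;

public class JdbcAppendSinkExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // A small Row stream standing in for the real upstream.
        DataStream<Row> upstream = env
                .fromElements(Row.of(1, "a"), Row.of(2, "b"))
                .returns(new RowTypeInfo(BasicTypeInfo.INT_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO));

        JDBCAppendTableSink sink = JDBCAppendTableSink.builder()
                .setDrivername("org.postgresql.Driver")             // illustrative driver
                .setDBUrl("jdbc:postgresql://localhost:5432/mydb")  // illustrative URL
                .setQuery("INSERT INTO t (id, name) VALUES (?, ?)") // illustrative table
                .setParameterTypes(BasicTypeInfo.INT_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO)
                .setBatchSize(1) // default batching delays the flush until the batch fills
                .build();

        sink.consumeDataStream(upstream);
        env.execute("jdbc-append-sink-example");
    }
}
```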


Eduardo Winpenny Tejedor  wrote on Tue, Aug 13, 2019, 7:59 AM:

> Hi all,
>
> Could someone point me to the current advised way of adding a JDBC sink?
>
> Online I've seen one can DataStream.writeUsingOutputFormat() however I see
> the OutputFormatSinkFunction is deprecated.
>
> I can also see a JDBCSinkFunction (or JDBCUpsertSinkFunction) but that is
> "package private" so I can't use it myself. JDBCAppendSinkTable makes use
> of it but that doesn't work with the Streams API...I tried simply copying
> its code into a class of my own alas the flush method in JDBCOutputFormat
> is also "package private"...
>
> JDBC does not receive a lot of attention in Fabian's and Vasili's Stream
> Processing with Apache Flink or in the online documentation - is there a
> reason for this? Is it bad practice?
>
>
> Regards,
> Eduardo
>


Re: How can I pass multiple java options in standalone mode ?

2019-08-12 Thread Zili Chen
Hi Vishwas,

Replacing ',' with ' ' (space) should work.

Best,
tison.
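
That is, env.java.opts takes a single string of JVM options separated by spaces, so your entry becomes:

```
env.java.opts: "-Djava.security.auth.login.config={{ flink_installed_dir }}/kafka-jaas.conf -Djava.security.krb5.conf={{ flink_installed_dir }}/krb5.conf"
```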


Vishwas Siravara  wrote on Tue, Aug 13, 2019, 6:50 AM:

> Hi guys,
> I have this entry in flink-conf.yaml file for jvm options.
> env.java.opts: "-Djava.security.auth.login.config={{ flink_installed_dir
> }}/kafka-jaas.conf,-Djava.security.krb5.conf={{ flink_installed_dir
> }}/krb5.conf"
>
> Is this supposed to be a , separated list ? I get a parse exception when
> the cluster starts.
>
> Thanks,
> Vishwas
>


Re: Why available task slots are not leveraged for pipeline?

2019-08-12 Thread Zili Chen
Hi Cam,

If you set the parallelism to 60, then you would make use of all 60 slots you
have, and
in your case each slot executes a chained pipeline containing 13 sub-tasks,
one per operator. It is not
the case that one slot executes 60 sub-tasks.

Best,
tison.
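
To make the two knobs from this thread concrete, a minimal sketch; the operator bodies and the group name are made up:

```
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SlotSharingExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(5);

        env.fromElements("a", "b", "c")
                // chains with the source by default, sharing its slot
                .map(s -> s.toUpperCase())
                // breaks the chain here, making this map a separate task
                .map(s -> s + "!").startNewChain()
                // puts downstream subtasks into their own slot sharing group,
                // so they occupy slots separate from the ones above
                .filter(s -> !s.isEmpty()).slotSharingGroup("downstream")
                .print();

        env.execute("slot-sharing-example");
    }
}
```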


Cam Mach  wrote on Mon, Aug 12, 2019, 7:55 PM:

> Hi Zhu and Abhishek,
>
> Thanks for your response and pointers. It's correct, the count of
> parallelism will be the number of slot used for a pipeline. And, the number
> (or count) of the parallelism is also used to generate number of sub-tasks
> for each operator. In my case, I have parallelism of 60, it generates 60
> sub-tasks for each operator. And so it'll be too much for one slot execute
> at least 60 sub-tasks. I am wondering if there is a way we can set number
> of generated sub-tasks, different than number of parallelism?
>
> Cam Mach
> Software Engineer
> E-mail: cammac...@gmail.com
> Tel: 206 972 2768
>
>
>
> On Sun, Aug 11, 2019 at 10:37 PM Zhu Zhu  wrote:
>
>> Hi Cam,
>> This case is expected due to slot sharing.
>> A slot can be shared by one instance of different tasks. So the used slot
>> is count of your max parallelism of a task.
>> You can specify the shared group with slotSharingGroup(String
>> slotSharingGroup) on operators.
>>
>> Thanks,
>> Zhu Zhu
>>
>> Abhishek Jain  wrote on Mon, Aug 12, 2019, 1:23 PM:
>>
>>> What you're seeing is likely operator chaining. This is the default
>>> behaviour of grouping sub-tasks to avoid transfer overhead (from one slot to
>>> another). You can disable chaining if you need to. Please refer to task
>>> and operator chains
>>> 
>>> .
>>>
>>> - Abhishek
>>>
>>> On Mon, 12 Aug 2019 at 09:56, Cam Mach  wrote:
>>>
 Hello Flink expert,

 I have a cluster with 10 Task Managers, configured with 6 task slot
 each, and a pipeline that has 13 tasks/operators with parallelism of 5. But
 when running the pipeline I observer that only  5 slots are being used, the
 other 55 slots are available/free. It should use all of my slots, right?
 since I have 13 (tasks) x 5 = 65 sub-tasks? What are the configuration that
 I missed in order to leverage all of the available slots for my pipelines?

 Thanks,
 Cam


>>>


Re: Can building the Flink source skip compiling the Scala code?

2019-08-11 Thread Zili Chen
The default Scala version for building Flink is 2.11.x; you could try switching your Scala version to 2.11.x and compiling again.

Best,
tison.
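
For reference, a typical way to verify and resume; the resume target matches the module Maven reported below, and the version-check step is just illustrative:

```
scala -version   # should report 2.11.x before rebuilding
mvn clean install -DskipTests -rf :flink-scala_2.11
```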


苟刚  wrote on Sat, Aug 10, 2019, 11:08 PM:

>
>
>
> Hi, all:
>
>
>   When trying to compile the Flink 1.7 source I hit the following error. I don't know Scala well, so I'm not sure whether a version mismatch causes it. Also, can the scala modules be skipped during the build?
>  Local scala version: 2.13.0
> JDK version: 1.8.0_91
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-compiler-plugin:3.1:compile
> (default-compile) on project flink-scala_2.11: Compilation failure:
> Compilation failure:
> [ERROR]
> /Users/gang.gou/work/git/github/flink/flink-scala/src/main/java/org/apache/flink/api/scala/typeutils/ScalaEitherSerializerSnapshot.java:[67,44]
> incompatible types:
> cannot infer type arguments for org.apache.flink.api.java.typeutils.runtime.EitherSerializer<>
> [ERROR] reason: no instance(s) of type variable(s) L,R,T,T exist
> so that org.apache.flink.api.java.typeutils.runtime.EitherSerializer conforms to org.apache.flink.api.common.typeutils.TypeSerializer>
> [ERROR]
> /Users/gang.gou/work/git/github/flink/flink-scala/src/main/java/org/apache/flink/api/scala/typeutils/ScalaEitherSerializerSnapshot.java:[78,86]
> incompatible types:
> org.apache.flink.api.common.typeutils.TypeSerializer> cannot be converted to org.apache.flink.api.java.typeutils.runtime.EitherSerializer
> [ERROR] -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the
> -e switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions,
> please read the following articles:
> [ERROR] [Help 1]
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> [ERROR]
> [ERROR] After correcting the problems, you can resume the build with the
> command
> [ERROR]   mvn  -rf :flink-scala_2.11
>
>
> Process finished with exit code 1
>
>
>
>
> --
> Best Wishes
>Galen.K


Re: need help

2019-08-11 Thread Zili Chen
As said above, the cause of the exception is an akka ask timeout; someone hit the same problem a couple of days ago when deploying on k8s [1]

In his case the JM failed to respond in time because too few resources were configured. Besides tuning the parameters above, also check whether that is the issue.

Best,
tison.

[1]
https://lists.apache.org/thread.html/84db9dca2e990dd0ebc30aa35390ac75a0e9c7cbfcdbc2029986d4d7@%3Cuser-zh.flink.apache.org%3E
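
For example, raising both values in flink-conf.yaml (the numbers are illustrative):

```
akka.ask.timeout: 60 s
web.timeout: 60000
```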


Biao Liu  wrote on Thu, Aug 8, 2019, 8:00 PM:

> Hi,
>
> From the exception you can see an AskTimeoutException. You can adjust two parameters, akka.ask.timeout and web.timeout,
> and try again; the defaults are as follows:
>
> akka.ask.timeout: 10 s
> web.timeout: 1
>
> PS: searching "AskTimeoutException Flink" turns up many related answers
>
> Thanks,
> Biao /'bɪ.aʊ/
>
>
>
> On Thu, Aug 8, 2019 at 7:33 PM 陈某  wrote:
>
> >
> >
> > -- Forwarded message -
> > From: 陈某 
> > Date: Thu, Aug 8, 2019, 7:25 PM
> > Subject: need help
> > To: 
> >
> >
> > Hi, I'm a newcomer who just started with flink. After setting up a flink on
> >
> yarn cluster, I started the zookeeper, hadoop, yarn and flink clusters in order, and hit a problem when submitting a job to yarn. I searched online for the problem but haven't found a solution yet; I hope to get some help, thanks.
> >
> > The command used to run it:
> > [root@flink01 logs]# flink run -m  yarn-cluster
> > ./examples/streaming/WordCount.jar
> > The error in the log is as follows (the complete log file is attached):
> > org.apache.flink.client.program.ProgramInvocationException: Could not
> > retrieve the execution result. (JobID: 91e82fd8626bde4c901bf0b1639e12e7)
> > at
> >
> org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:261)
> > at
> > org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:483)
> > at
> >
> org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66)
> > at
> >
> org.apache.flink.streaming.examples.wordcount.WordCount.main(WordCount.java:89)
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:498)
> > at
> >
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:529)
> > at
> >
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421)
> > at
> > org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:423)
> > at
> >
> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:813)
> > at
> org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:287)
> > at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:213)
> > at
> >
> org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1050)
> > at
> >
> org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1126)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at javax.security.auth.Subject.doAs(Subject.java:422)
> > at
> >
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
> > at
> >
> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
> > at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1126)
> > Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed
> > to submit JobGraph.
> > at
> >
> org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$8(RestClusterClient.java:388)
> > at
> >
> java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
> > at
> >
> java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
> > at
> >
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> > at
> >
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
> > at
> >
> org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:208)
> > at
> >
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> > at
> >
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
> > at
> >
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
> > at
> >
> java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
> > at
> >
> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:929)
> > at
> >
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > at java.lang.Thread.run(Thread.java:748)
> > Caused by: org.apache.flink.runtime.rest.util.RestClientException:
> > [Internal server error.,  > akka.pattern.AskTimeoutException: Ask timed out on
> > [Actor[akka://flink/user/dispatcher#2035575525]] after [1 ms].
> > Sender[null] sent message of type
> > "org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
> > at
> >
> akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604)
> > at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
> > at

Re: some slots are not available when job is not running

2019-08-09 Thread Zili Chen
Hi,

Could you attach the stack trace in exception or relevant logs?

Best,
tison.


pengcheng...@bonc.com.cn wrote on Fri, Aug 9, 2019 at 4:55 PM:

> Hi,
>
> Why are some slots unavailable?
>
> My cluster mode is standalone, and the high-availability mode is zookeeper.
> task.cancellation.timeout: 0
> Some slots are not available even when no job is running.
>
>
>
> --
> pengcheng...@bonc.com.cn
>


Re: flink-1.8.1 YARN per-job mode usage

2019-08-08 Thread Zili Chen
I just found that user-zh does have an archive [1]. The thread with the problem similar to yours that I mentioned before is this one [2].

Best,
tison.

[1] https://lists.apache.org/list.html?user-zh@flink.apache.org
[2]
https://lists.apache.org/thread.html/061d8e48b091b27e797975880c193838f2c37894c2a90aa6a6e83d36@%3Cuser-zh.flink.apache.org%3E

Yuhuan Li wrote on Wed, Aug 7, 2019 at 7:57 PM:

> Thanks a lot tison, this solved my problem perfectly. I will pay more
> attention to community threads from now on.
>
> For my own hadoop version, it was enough to build the Flink project and
> put the jar from
> flink-1.8.1/flink-shaded-hadoop/flink-shaded-hadoop2-uber/target
> under lib/.
>
Zili Chen wrote on Wed, Aug 7, 2019 at 7:33 PM:
>
> > Someone asked this on the mailing list before... but user-zh has no
> > archive at the moment, so it is hard to cite.
> >
> > Please check whether lib/ is missing a file like
> > flink-shaded-hadoop-2-uber--7.0.jar.
> >
> > Since 1.8.1, FLINK releases the hadoop (YARN) libs separately; you need
> > to set your own HADOOP_CLASSPATH or download the pre-bundled hadoop from
> > the FLINK website.
> >
> > For details, please read the first paragraph on this page
> > (https://flink.apache.org/downloads.html).
> >
> > Best,
> > tison.
> >
> >
> > 李玉环 wrote on Wed, Aug 7, 2019 at 7:15 PM:
> >
> > > Hi everyone:
> > >
> > > While using flink, I ran the command given by the official docs
> > >
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/yarn_setup.html#run-a-single-flink-job-on-yarn
> > > and got the following error:
> > >
> > > ➜  flink-1.8.1 ./bin/flink run -m yarn-cluster
> > > ./examples/batch/WordCount.jar
> > > 
> > >  The program finished with the following exception:
> > >
> > > java.lang.RuntimeException: Could not identify hostname and port in
> > > 'yarn-cluster'.
> > > at
> > >
> > >
> >
> org.apache.flink.client.ClientUtils.parseHostPortAddress(ClientUtils.java:47)
> > > at
> > >
> > >
> >
> org.apache.flink.client.cli.AbstractCustomCommandLine.applyCommandLineOptionsToConfiguration(AbstractCustomCommandLine.java:83)
> > > at
> > >
> > >
> >
> org.apache.flink.client.cli.DefaultCLI.createClusterDescriptor(DefaultCLI.java:60)
> > > at
> > >
> > >
> >
> org.apache.flink.client.cli.DefaultCLI.createClusterDescriptor(DefaultCLI.java:35)
> > > at
> > org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:224)
> > > at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:213)
> > > at
> > >
> > >
> >
> org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1050)
> > > at
> > >
> > >
> >
> org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1126)
> > > at
> > >
> > >
> >
> org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
> > > at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1126)
> > >
> > >
> > > Questions:
> > > 1. Why is yarn-cluster parsed as a host?
> > > 2. What is the correct way to run a single flink job on yarn?
> > >
> > > Best,
> > > Yuhuan
> > >
> >
>


Re: flink-1.8.1 YARN per-job mode usage

2019-08-07 Thread Zili Chen
Someone asked this on the mailing list before... but user-zh has no archive
at the moment, so it is hard to cite.

Please check whether lib/ is missing a file like
flink-shaded-hadoop-2-uber--7.0.jar.

Since 1.8.1, FLINK releases the hadoop (YARN) libs separately; you need to
set your own HADOOP_CLASSPATH or download the pre-bundled hadoop from the
FLINK website.

For details, please read the first paragraph on this page
(https://flink.apache.org/downloads.html).
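As a concrete sketch of the second route (the exact jar name depends on the
Hadoop version you pick on the downloads page; the paths here are
illustrative):

    cp flink-shaded-hadoop-2-uber-2.8.3-7.0.jar /path/to/flink-1.8.1/lib/
    bin/flink run -m yarn-cluster ./examples/batch/WordCount.jar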

Best,
tison.


李玉环 wrote on Wed, Aug 7, 2019 at 7:15 PM:

> Hi everyone:
>
> While using flink, I ran the command given by the official docs
>
> https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/yarn_setup.html#run-a-single-flink-job-on-yarn
> and got the following error:
>
> ➜  flink-1.8.1 ./bin/flink run -m yarn-cluster
> ./examples/batch/WordCount.jar
> 
>  The program finished with the following exception:
>
> java.lang.RuntimeException: Could not identify hostname and port in
> 'yarn-cluster'.
> at
>
> org.apache.flink.client.ClientUtils.parseHostPortAddress(ClientUtils.java:47)
> at
>
> org.apache.flink.client.cli.AbstractCustomCommandLine.applyCommandLineOptionsToConfiguration(AbstractCustomCommandLine.java:83)
> at
>
> org.apache.flink.client.cli.DefaultCLI.createClusterDescriptor(DefaultCLI.java:60)
> at
>
> org.apache.flink.client.cli.DefaultCLI.createClusterDescriptor(DefaultCLI.java:35)
> at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:224)
> at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:213)
> at
>
> org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1050)
> at
>
> org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1126)
> at
>
> org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
> at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1126)
>
>
> Questions:
> 1. Why is yarn-cluster parsed as a host?
> 2. What is the correct way to run a single flink job on yarn?
>
> Best,
> Yuhuan
>


Re: [ANNOUNCE] Hequn becomes a Flink committer

2019-08-07 Thread Zili Chen
Congrats Hequn!

Best,
tison.


Jeff Zhang wrote on Wed, Aug 7, 2019 at 5:14 PM:

> Congrats Hequn!
>
> Paul Lam wrote on Wed, Aug 7, 2019 at 5:08 PM:
>
>> Congrats Hequn! Well deserved!
>>
>> Best,
>> Paul Lam
>>
>> On Aug 7, 2019, at 16:28, jincheng sun wrote:
>>
>> Hi everyone,
>>
>> I'm very happy to announce that Hequn accepted the offer of the Flink PMC
>> to become a committer of the Flink project.
>>
>> Hequn has been contributing to Flink for many years, mainly working on
>> SQL/Table API features. He's also frequently helping out on the user
>> mailing lists and helping check/vote the release.
>>
>> Congratulations Hequn!
>>
>> Best, Jincheng
>> (on behalf of the Flink PMC)
>>
>>
>>
>
> --
> Best Regards
>
> Jeff Zhang
>


Re: Re: submit jobGraph error on server side

2019-08-07 Thread Zili Chen
From the stack trace, your request has already reached the jobmanager, so
this is not a case of failing to find the port. Rather, some action timed
out while the jobmanager was handling the job submission. Does the problem
reproduce reliably once the gateway is separated? It could also be an
occasional akka timeout.
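If separating the gateway does turn out to matter, one thing worth ruling
out is ports that default to random values. A sketch of pinning them in
flink-conf.yaml so a k8s Service can expose them (values are illustrative):

    jobmanager.rpc.port: 6123
    blob.server.port: 6124
    query.server.port: 6125
    rest.port: 8081
    high-availability.jobmanager.port: 50010  # only relevant in HA setups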

Best,
tison.


王智 wrote on Wed, Aug 7, 2019 at 2:33 PM:

> Thanks for your reply and guidance~
>
>
> After a simple verification (the verification plan is at the end of this
> mail), it is clearly a network problem.
>
>
> Now I suspect that flink run opens ports other than these four when
> submitting the job graph. May I ask again: does the flink jobmanager open
> new ports for communication (or are there other port settings I missed)?
>
> ports:
>   - containerPort: 6123
> protocol: TCP
>   - containerPort: 6124
> protocol: TCP
>   - containerPort: 6125
> protocol: TCP
>   - containerPort: 8081
> protocol: TCP
>
>
> # configuration in flink-conf.yaml
>
> jobmanager.rpc.port: 6123
>
> blob.server.port: 6124
>
> query.server.port: 6125
>
> # 8081 is not configured and uses the default; the web ui is reachable
>
> My environment runs on k8s. The node that submits jobs (called the
> gateway) and the jobmanager live in two different pods; the gateway finds
> the jobmanager through its jobmanager-service. My guess is that, because
> only the four ports above are defined in the service, the process on the
> gateway node cannot talk to the jobmanager through the jobmanager-service.
>
>
> P.S. my verification plan: putting the submitting node and the jobmanager
> into the same pod and talking over the loopback address (which has no port
> restrictions) makes the job submission succeed [with exactly the same
> flink configuration and code].
>
>
>
>
>
>
>
>
> Original message
>
> From: "Zili Chen" < wander4...@gmail.com >;
>
> Sent: 2019/8/6 19:19
>
> To: "user-zh" < user-zh@flink.apache.org >;
>
> Subject: Re: submit jobGraph error on server side
>
>
> 问题是 Ask timed out on [Actor[akka://flink/user/dispatcher#-273192824]] after
> [1 ms]. Sender[null] sent message of type
> "org.apache.flink.runtime.rpc.
> messages.LocalFencedMessage".
>
> That is, the akka ask to the Dispatcher timed out while submitting the
> job. Please check whether the configured address and port are correct, or
> paste your relevant configuration.
>
> Best,
> tison.
>
>
> 王智 wrote on Tue, Aug 6, 2019 at 7:13 PM:
>
> > Submitting a job to the session cluster fails. Could anyone take a
> > look and give some troubleshooting hints? THX~
> >
> >
> >
> >
> > Environment:
> >
> > blink 1.8.0
> >
> > A flink session cluster started via docker; the flink cluster is
> > standalone, and I submit the job from a docker node outside the cluster
> > (whose flink-conf.yaml matches the configuration inside the cluster)
> >
> >
> >
> >
> > --
> >
> >
> > Error message:
> >
> > 
> >
> >  The program finished with the following exception:
> >
> >
> >
> >
> > org.apache.flink.client.program.ProgramInvocationException: The main
> > method caused an error:
> > org.apache.flink.client.program.ProgramInvocationException: Could not
> > retrieve the execution result. (JobID: 82
> >
> > 3a336683f6476b3e7ee2780c33395b)
> >
> > at
> >
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:546)
> >
> > at
> >
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421)
> >
> > at
> > org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:423)
> >
> > at
> >
> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:813)
> >
> > at
> > org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:287)
> >
> > at
> > org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:213)
> >
> > at
> >
> org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1050)
> >
> > at
> >
> org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1126)
> >
> > at
> >
> org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
> >
> > at
> > org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1126)
> >
> > Caused by: java.lang.RuntimeException:
> > org.apache.flink.client.program.ProgramInvocationException: Could not
> > retrieve the execution result. (JobID: 823a336683f6476b3e7ee2780c33395b)
> >
> > at
> >
> com.xx.data.platform.pandora.flink.table.BatchSqlRunner.run(BatchSqlRunner.java:176)
> >
> > at
> >
> com.xx.data.platform.pandora.flink.EntryPoint.main(EntryPoint.java:78)
> >
> > at
> > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >
> > at
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >
> > at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >
> > at
> > java.lang.reflect.Method.invoke(Method.java:498)
> >
> > at
> >
> org.apache.flink.client.program.PackagedProgra

Re: submit jobGraph error on server side

2019-08-06 Thread Zili Chen
问题是 Ask timed out on [Actor[akka://flink/user/dispatcher#-273192824]] after
[1 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.
messages.LocalFencedMessage".

That is, the akka ask to the Dispatcher timed out while submitting the job.
Please check whether the configured address and port are correct, or paste
your relevant configuration.

Best,
tison.


王智 wrote on Tue, Aug 6, 2019 at 7:13 PM:

> Submitting a job to the session cluster fails. Could anyone take a look
> and give some troubleshooting hints? THX~
>
>
>
>
> Environment:
>
> blink 1.8.0
>
> A flink session cluster started via docker; the flink cluster is
> standalone, and I submit the job from a docker node outside the cluster
> (whose flink-conf.yaml matches the configuration inside the cluster)
>
>
>
>
> --
>
>
> Error message:
>
> 
>
> The program finished with the following exception:
>
>
>
>
> org.apache.flink.client.program.ProgramInvocationException: The main
> method caused an error:
> org.apache.flink.client.program.ProgramInvocationException: Could not
> retrieve the execution result. (JobID: 82
>
> 3a336683f6476b3e7ee2780c33395b)
>
> at
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:546)
>
> at
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421)
>
> at
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:423)
>
> at
> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:813)
>
> at
> org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:287)
>
> at
> org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:213)
>
> at
> org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1050)
>
> at
> org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1126)
>
> at
> org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
>
> at
> org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1126)
>
> Caused by: java.lang.RuntimeException:
> org.apache.flink.client.program.ProgramInvocationException: Could not
> retrieve the execution result. (JobID: 823a336683f6476b3e7ee2780c33395b)
>
> at
> com.xx.data.platform.pandora.flink.table.BatchSqlRunner.run(BatchSqlRunner.java:176)
>
> at
> com.xx.data.platform.pandora.flink.EntryPoint.main(EntryPoint.java:78)
>
> at
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>
> at
> java.lang.reflect.Method.invoke(Method.java:498)
>
> at
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:529)
>
> ... 9 more
>
> Caused by: org.apache.flink.client.program.ProgramInvocationException:
> Could not retrieve the execution result. (JobID:
> 823a336683f6476b3e7ee2780c33395b)
>
> at
> org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:261)
>
> at
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:483)
>
> at
> org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:471)
>
> at
> org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:62)
>
> at
> com.xx.data.platform.pandora.flink.table.BatchSqlRunner.run(BatchSqlRunner.java:174)
>
> ... 15 more
>
>
>
> Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed
> to submit JobGraph.
>
> at
> org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$8(RestClusterClient.java:388)
>
> at
> java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:870)
>
> at
> java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:852)
>
> at
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>
> at
> java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
>
> at
> org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:207)
>
> at
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
>
> at
> java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
>
> at
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
>
> at
> java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:561)
>
> at
> java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:929)
>
> at
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>
> at java.lang.Thread.run(Thread.java:748)
>
> Caused by: org.apache.flink.runtime.rest.util.RestClientException:
> [Internal server 

Re: Flink and Akka version incompatibility ..

2019-07-24 Thread Zili Chen
I can see that we relocate Akka's netty and Akka's uncommons-maths, but I
am also curious why Flink doesn't shade all of the Akka dependencies...

Best,
tison.


Debasish Ghosh wrote on Wed, Jul 24, 2019 at 3:15 PM:

> Hello Haibo -
>
> Yes, my application depends on Akka 2.5.
> Just curious, why do you think it's recommended to shade the Akka version
> of my application instead of Flink's?
>
> regards.
>
> On Wed, Jul 24, 2019 at 12:42 PM Haibo Sun  wrote:
>
>> Hi  Debasish Ghosh,
>>
>> Does your application have to depend on Akka 2.5? If not, it's a good
>> idea to always keep the Akka version that the application depends on in line
>> with Flink.
>> If you want to try shading Akka dependency, I think that it is more
>> recommended to shade Akka dependency of your application.
>>
>> Best,
>> Haibo
>>
>> At 2019-07-24 14:31:29, "Debasish Ghosh" 
>> wrote:
>>
>> Hello -
>>
>> An application that uses Akka 2.5 and Flink 1.8.0 gives runtime errors
>> because of a version mismatch between the Akka we use and the one that
>> Flink uses (Akka 2.4). Has anyone tried shading the Akka dependency with
>> Flink?
>>
>> Or is there any other way to handle this issue? I know Flink 1.9 has
>> upgraded to Akka 2.5, but this is (I think) going to be a recurring
>> problem down the line with mismatches between new releases of Akka and
>> Flink.
>>
>> regards.
>>
>> --
>> Debasish Ghosh
>> http://manning.com/ghosh2
>> http://manning.com/ghosh
>>
>> Twttr: @debasishg
>> Blog: http://debasishg.blogspot.com
>> Code: http://github.com/debasishg
>>
>>
>
> --
> Debasish Ghosh
> http://manning.com/ghosh2
> http://manning.com/ghosh
>
> Twttr: @debasishg
> Blog: http://debasishg.blogspot.com
> Code: http://github.com/debasishg
>


Re: Flink 1.8 run options are different

2019-07-23 Thread Zili Chen
Hi, please check whether the logs under the log/ directory contain a section like this:

2019-07-24 09:34:36,507 WARN  org.apache.flink.client.cli.CliFrontend
- Could not load CLI class
org.apache.flink.yarn.cli.FlinkYarnSessionCli.

java.lang.NoClassDefFoundError:
org/apache/hadoop/yarn/exceptions/YarnException

at java.lang.Class.forName0(Native Method)

at java.lang.Class.forName(Class.java:264)

at
org.apache.flink.client.cli.CliFrontend.loadCustomCommandLine(CliFrontend.java:1187)

at
org.apache.flink.client.cli.CliFrontend.loadCustomCommandLines(CliFrontend.java:1147)

at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1072)

Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.yarn.exceptions.YarnException

at java.net.URLClassLoader.findClass(URLClassLoader.java:382)

at java.lang.ClassLoader.loadClass(ClassLoader.java:424)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)

at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

... 5 more


If so: Flink now releases the core distribution and the pre-bundled hadoop
package separately, so you need to download the pre-bundled hadoop yourself
and put it into the lib/ folder.


Specifically, please read carefully the text right above the words "Apache
Flink 1.8.1" on the download page [1].

Best,
tison.

[1] https://flink.apache.org/downloads.html
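A sketch of the HADOOP_CLASSPATH route, assuming a hadoop client is
installed on the machine:

    export HADOOP_CLASSPATH=`hadoop classpath`
    bin/flink run --help   # the yarn-cluster options should show up again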


王佩 wrote on Wed, Jul 24, 2019 at 9:30 AM:

> With the Flink 1.8 I downloaded earlier, bin/flink run --help showed some
> yarn-cluster options, like these:
> Options for yarn-cluster mode:
>  -d,--detachedIf present, runs the job in
> detached
>   mode
>  -m,--jobmanager Address of the JobManager
> (master) to
>   which to connect. Use this flag
> to
>   connect to a different JobManager
> than
>   the one specified in the
>   configuration.
>  -sae,--shutdownOnAttachedExitIf the job is submitted in
> attached
>   mode, perform a best-effort
> cluster
>   shutdown when the CLI is
> terminated
>   abruptly, e.g., in response to a
> user
>   interrupt, such as typing Ctrl +
> C.
>  -yD  use value for given property
>  -yd,--yarndetached   If present, runs the job in
> detached
>   mode (deprecated; use non-YARN
>   specific option instead)
>  -yh,--yarnhelp   Help for the Yarn session CLI.
>  -yid,--yarnapplicationIdAttach to running YARN session
>  -yj,--yarnjar   Path to Flink jar file
>  -yjm,--yarnjobManagerMemory Memory for JobManager Container
> with
>   optional unit (default: MB)
>  -yn,--yarncontainer Number of YARN container to
> allocate
>   (=Number of Task Managers)
>  -ynl,--yarnnodeLabelSpecify YARN node label for the
> YARN
>   application
>  -ynm,--yarnname Set a custom name for the
> application
>   on YARN
>  -yq,--yarnquery  Display available YARN resources
>   (memory, cores)
>  -yqu,--yarnqueueSpecify YARN queue.
>  -ys,--yarnslots Number of slots per TaskManager
>  -yst,--yarnstreaming Start Flink in streaming mode
>  -yt,--yarnship  Ship files in the specified
> directory
>   (t for transfer)
>  -ytm,--yarntaskManagerMemoryMemory per TaskManager Container
> with
>   optional unit (default: MB)
>  -yz,--yarnzookeeperNamespaceNamespace to create the Zookeeper
>   sub-paths for high availability
> mode
>  -z,--zookeeperNamespace Namespace to create the Zookeeper
>   sub-paths for high availability
> mode
>
>
> With the Flink 1.8 I downloaded now, bin/flink run --help only shows the
> following options; the yarn-cluster ones are missing:
> Action "run" compiles and runs a program.
>
>   Syntax: run [OPTIONS]  
>   "run" action options:
>  -c,--classClass with the program entry
> point
>   ("main" method or "getPlan()"
> method.
>   Only needed if the JAR file does
> not
>   specify the class in its
> manifest.
>  -C,--classpath  Adds a URL to each user code
>   classloader  on all nodes in the
>   cluster. The paths must specify a
>   protocol (e.g. file://) 

Re: apache flink: Why checkpoint coordinator takes long time to get completion

2019-07-23 Thread Zili Chen
Hi Xiangyu,

Could you share the corresponding JIRA that fixed this issue?
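For readers following along, the interval and minimum pause described
further down the thread correspond roughly to this in the DataStream API (a
sketch, not the reporter's actual code):

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.enableCheckpointing(5 * 60 * 1000); // checkpoint every 5 minutes
    env.getCheckpointConfig().setMinPauseBetweenCheckpoints(2 * 60 * 1000); // 2 min pause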

Best,
tison.


Xiangyu Su wrote on Fri, Jul 19, 2019 at 8:47 PM:

> btw. it seems like this issue has been fixed in 1.8.1
>
> On Fri, 19 Jul 2019 at 12:21, Xiangyu Su  wrote:
>
>> Ok, thanks.
>>
>> and this delay so far always happens after the 3rd checkpoint, and the
>> unexpected extra time was consistently the same (~4 min under ~4 GB/min
>> of incoming traffic).
>>
>> On Fri, 19 Jul 2019 at 11:06, Biao Liu  wrote:
>>
>>> Hi Xiangyu,
>>>
>>> Just took a glance at the relevant code. There is a gap between
>>> calculating the duration and logging it out. I guess checkpoint 4
>>> finished within 1 minute, but there was an unexpected time-consuming
>>> operation during that window. I can't tell which part it is, though.
>>>
>>>
>>> Xiangyu Su wrote on Fri, Jul 19, 2019 at 4:14 PM:
>>>
 Dear flink community,

 We are doing a POC of flink (1.8) to process data in real time, using
 global checkpointing (S3) and local checkpointing (EBS), with the cluster
 deployed on EKS. Our application consumes data from Kinesis.

 For my test, e.g., I am using a checkpointing interval of 5 min and a
 minimum pause of 2 min.

 The issue we saw is: the flink checkpointing process seems to sit idle for
 3-4 min before the job manager gets the completion notification.

 Here is some logging from the job manager:

 2019-07-10 11:59:03,893 INFO  
 org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering 
 checkpoint 4 @ 1562759941082 for job e7a97014f5799458f1c656135712813d.
 2019-07-10 12:05:01,836 INFO  
 org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Completed 
 checkpoint 4 for job e7a97014f5799458f1c656135712813d (22387207650 bytes 
 in 58645 ms).

 As I understand the logging above, the
 completedCheckpoint (CheckpointCoordinator)
 object was completed in 58645 ms, but the whole checkpointing process
 took ~6 min.

 This logging is for the 4th checkpoint, but the first 3 checkpoints
 finished on time.
 Could you please tell me why checkpointing in my test went "idle" for a
 few minutes after 3 checkpoints?

 Best Regards
 --
 Xiangyu Su
 Java Developer
 xian...@smaato.com

 Smaato Inc.
 San Francisco - New York - Hamburg - Singapore
 www.smaato.com

 Germany:
 Valentinskamp 70, Emporio, 19th Floor
 20355 Hamburg
 M 0049(176)22943076


>>>
>>
>> --
>> Xiangyu Su
>> Java Developer
>> xian...@smaato.com
>>
>> Smaato Inc.
>> San Francisco - New York - Hamburg - Singapore
>> www.smaato.com
>>
>> Germany:
>> Valentinskamp 70, Emporio, 19th Floor
>> 20355 Hamburg
>> M 0049(176)22943076
>>
>>
>
>
> --
> Xiangyu Su
> Java Developer
> xian...@smaato.com
>
> Smaato Inc.
> San Francisco - New York - Hamburg - Singapore
> www.smaato.com
>
> Germany:
> Valentinskamp 70, Emporio, 19th Floor
> 20355 Hamburg
> M 0049(176)22943076
>
>


[SURVEY] How many people implement Flink job based on the interface Program?

2019-07-22 Thread Zili Chen
Hi guys,

We want to get an accurate idea of how many people are implementing
Flink jobs based on the interface Program, and how they actually
implement them.

The reason for this survey comes from this thread[1], where we noticed
that this code path is stale and less useful than it should be. As an
interface marked @PublicEvolving, it was originally aimed at serving
as a user-facing interface. Thus, before deprecating or dropping it,
we would like to see whether there are users implementing their jobs
based on this interface (org.apache.flink.api.common.Program) and, if
there are any, how it is used. A minimal sketch of such an
implementation is shown below.
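For context, a job written against this interface looks roughly like the
following (a sketch only; class name and output path are made up):

    import org.apache.flink.api.common.Plan;
    import org.apache.flink.api.common.Program;
    import org.apache.flink.api.java.ExecutionEnvironment;

    public class WordListProgram implements Program {
        @Override
        public Plan getPlan(String... args) {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            // build the DataSet job as usual, then hand back its plan
            env.fromElements("to", "be", "or", "not", "to", "be")
               .writeAsText("/tmp/words");
            return env.createProgramPlan();
        }
    }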

If few or no Flink users rely on this interface, we would propose
deprecating or dropping it.

I really appreciate your time and your insight.

Best,
tison.

[1]
https://lists.apache.org/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E


Re: Jython support for Flink

2019-07-20 Thread Zili Chen
Hi Dante,

Nice find! I had simply missed this powerful project :-)

Best,
tison.


Dante Van den Broeke wrote on Sat, Jul 20, 2019 at 5:28 PM:

>
> Hi Tison, Jeff,
>
> Thanks a lot for the help! I'll definitely look into the python API and
> py4j support next. I was also thinking about trying to build the pipeline
> with beam and flink instead of kafka and flink, as I see that python is a
> first-class citizen in the beam framework!
>
> Regards,
> Dante
>
> On Jul 20, 2019, at 01:59, Zili Chen wrote the
> following:
>
> Hi Dante,
>
> Both Jython and the Jython support for Flink are no longer developed or
> maintained. As pointed out by Jeff, Flink 1.9 supports a Python API via
> py4j[1]; see the documentation page posted in the quoted mail.
>
> I guess your algorithms are written in CPython rather than Jython and you
> want Jython only for interoperability, so I recommend having a look at
> the doc posted there.
>
> For an example or setup of the previous Jython support, cc Chesnay, a
> committer familiar with this area.
>
> Best,
> tison.
>
> [1] https://www.py4j.org/
>
>
>
> Jeff Zhang wrote on Fri, Jul 19, 2019 at 11:06 PM:
>
>> Hi Dante,
>>
>> Flink 1.9 supports a Python API, which may be what you want. See
>> https://ci.apache.org/projects/flink/flink-docs-master/tutorials/python_table_api.html
>>
>>
>> Dante Van den Broeke wrote on Fri, Jul 19, 2019 at 10:40 PM:
>>
>>> Dear,
>>>
>>>
>>> I'm a student currently working on a project involving apache kafka and
>>> flink. The project revolves around path prediction and machine learning
>>> for websites. To test a proof of concept I set up a kafka server locally
>>> (the goal is to extend this to a google cloud server or similar later)
>>> and a kafka producer (written in a java intelliJ idea project). The
>>> producer sends JSON data (currently just a local file, later json data
>>> from the website itself) to a flink-kafka connection, and the data
>>> transformation (key-windowed by user id) then happens in the flink
>>> framework.
>>>
>>>
>>> The problem I'm facing, however, is that I wrote all the data
>>> transformation algorithms in python, and I'm struggling to initialize a
>>> jython environment to set up the flink-kafka connection.
>>>
>>> I was wondering whether there is a working example for this setup, or
>>> some documentation regarding the framework, as I'm struggling to find
>>> much documentation for my application online.
>>>
>>>
>>> thanks in advance.
>>>
>>>
>>> kind regards,
>>>
>>> Dante Van den Broeke
>>>
>>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>


Re: Jython support for Flink

2019-07-19 Thread Zili Chen
Hi Dante,

Both Jython and the Jython support for Flink are no longer developed or
maintained. As pointed out by Jeff, Flink 1.9 supports a Python API via
py4j[1]; see the documentation page posted in the quoted mail.

I guess your algorithms are written in CPython rather than Jython and you
want Jython only for interoperability, so I recommend having a look at the
doc posted there.

For an example or setup of the previous Jython support, cc Chesnay, a
committer familiar with this area.

Best,
tison.

[1] https://www.py4j.org/



Jeff Zhang wrote on Fri, Jul 19, 2019 at 11:06 PM:

> Hi Dante,
>
> Flink 1.9 supports a Python API, which may be what you want. See
> https://ci.apache.org/projects/flink/flink-docs-master/tutorials/python_table_api.html
>
>
> Dante Van den Broeke wrote on Fri, Jul 19, 2019 at 10:40 PM:
>
>> Dear,
>>
>>
>> I'm a student currently working on a project involving apache kafka and
>> flink. The project revolves around path prediction and machine learning
>> for websites. To test a proof of concept I set up a kafka server locally
>> (the goal is to extend this to a google cloud server or similar later)
>> and a kafka producer (written in a java intelliJ idea project). The
>> producer sends JSON data (currently just a local file, later json data
>> from the website itself) to a flink-kafka connection, and the data
>> transformation (key-windowed by user id) then happens in the flink
>> framework.
>>
>>
>> The problem I'm facing, however, is that I wrote all the data
>> transformation algorithms in python, and I'm struggling to initialize a
>> jython environment to set up the flink-kafka connection.
>>
>> I was wondering whether there is a working example for this setup, or
>> some documentation regarding the framework, as I'm struggling to find
>> much documentation for my application online.
>>
>>
>> thanks in advance.
>>
>>
>> kind regards,
>>
>> Dante Van den Broeke
>>
>>
>
> --
> Best Regards
>
> Jeff Zhang
>


Re: When translating the Glossary, is it necessary to translate these terms into Chinese?

2019-07-18 Thread Zili Chen
Hi,

Feel free to sync back to this thread once there is a PR :-)

Best,
tison.


highfei2011 wrote on Fri, Jul 19, 2019 at 8:34 AM:

> Hi, Zili Chen:
> Good morning. You are right, thanks. I also noticed that the English
> Glossary has no entries for Slot and Parallelism; I suggest adding them.
> That would make things easier for beginners and users!
>
> Best regards
>
>
>
>  Original Message 
> Subject: Re: When translating the Glossary, is it necessary to translate these terms into Chinese?
> From: Zili Chen
> To: user-zh@flink.apache.org
> CC:
>
> For proper nouns without an established, citable translation, I suggest
> not translating them. The explanation part of the Glossary can be made
> more detailed. Terms like record and task, which enjoy fairly broad
> consensus, are still open for discussion; but force-translating terms
> like transformation or "operator chain" likely means those who understand
> them would have understood anyway, while those who don't still won't.
> Leaving them untranslated now lets us change them once an established
> translation exists; translating them now by personal preference makes
> them hard to change later.
>
> Just my two cents.
>
>
> Best,
> tison.
>
>
> highfei2011 wrote on Thu, Jul 18, 2019 at 11:35 PM:
>
> > Hi all,
> >   Good evening!
> >   When translating the Glossary chapter, is it necessary to translate
> > the following terms into Chinese? The list is:
> >
> >
> >
> > Flink Application Cluster
> >
> >
> > Flink Cluster
> >
> >
> > Event
> >
> >
> > ExecutionGraph
> >
> >
> > Function
> >
> >
> > Instance
> >
> >
> > Flink Job
> >
> >
> > JobGraph
> >
> >
> > Flink JobManager
> >
> >
> > Logical Graph
> >
> >
> > Managed State
> >
> >
> > Flink Master
> >
> >
> > Operator
> >
> >
> > Operator Chain
> >
> >
> > Partition
> >
> >
> > Physical Graph
> >
> >
> > Record
> >
> >
> > Flink Session Cluster
> >
> >
> > State Backend
> >
> >
> > Sub-Task
> >
> >
> > Task
> >
> >
> > Flink TaskManager
> >
> >
> > Transformation
> >
> >
> >
> >
> > Best regards!
>
>


Re: When translating the Glossary, is it necessary to translate these terms into Chinese?

2019-07-18 Thread Zili Chen
For proper nouns without an established, citable translation, I suggest not
translating them. The explanation part of the Glossary can be made more
detailed. Terms like record and task, which enjoy fairly broad consensus,
are still open for discussion; but force-translating terms like
transformation or "operator chain" likely means those who understand them
would have understood anyway, while those who don't still won't. Leaving
them untranslated now lets us change them once an established translation
exists; translating them now by personal preference makes them hard to
change later.

Just my two cents.


Best,
tison.


highfei2011 wrote on Thu, Jul 18, 2019 at 11:35 PM:

> Hi all,
>   Good evening!
>   When translating the Glossary chapter, is it necessary to translate the
> following terms into Chinese? The list is:
>
>
>
> Flink Application Cluster
>
>
> Flink Cluster
>
>
> Event
>
>
> ExecutionGraph
>
>
> Function
>
>
> Instance
>
>
> Flink Job
>
>
> JobGraph
>
>
> Flink JobManager
>
>
> Logical Graph
>
>
> Managed State
>
>
> Flink Master
>
>
> Operator
>
>
> Operator Chain
>
>
> Partition
>
>
> Physical Graph
>
>
> Record
>
>
> Flink Session Cluster
>
>
> State Backend
>
>
> Sub-Task
>
>
> Task
>
>
> Flink TaskManager
>
>
> Transformation
>
>
>
>
> Best regards!


Re: flink custom UDTF function

2019-06-16 Thread Zili Chen
@imj...@gmail.com This question has already been answered in the other thread, [Types source code].

Best,
tison.


Jark Wu wrote on Mon, Jun 17, 2019 at 11:26 AM:

> Hi Liu,
>
> It seems we still haven't received your function's source code. Could you
> try pasting the source as text instead of an image?
>
> On Mon, 17 Jun 2019 at 10:09, liu_mingzhang  wrote:
>
> > The custom function is as follows:
> >
> >
> > The example in the source code comments is written the same way, but
> > compilation fails...
> >
> > On Jun 15, 2019 at 15:52, Hequn Cheng < chenghe...@gmail.com > wrote:
> >
> > Did you forget to paste the custom function? :)
> >
> > On Fri, Jun 14, 2019 at 6:58 PM liu_mingzhang 
> > wrote:
> >
> >
> > I want to write a UDTF with the following behavior:
> >
> > it should turn the original table data
> >
> > id  field1  field2
> > 1   A,B,C   D,E,F
> >
> > into
> >
> > id  num  field1  field2
> > 1   1    A       D
> > 1   2    B       E
> > 1   3    C       F
> >
> >
> > Below is the function I wrote, but it reports an error.
> >
> >
> > However, ROW does have such a constructor.
> >
> >
> > I don't understand why I get the error; any help is greatly appreciated.
> >
> >
>


Re: Types source code

2019-06-16 Thread Zili Chen
Have you tried running it directly? IDEA sometimes has trouble with Scala
type inference and falsely flags type mismatches in code that compiles and
runs fine. If it runs, it is an IDEA problem, which you can report to the
corresponding issue tracker [1].

Best,
tison.

[1] https://youtrack.jetbrains.com/oauth?state=%2Fissues%2FIDEA
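Side note for readers: a sketch of the UDTF described in the quoted question
below, written against the Flink 1.8 Java Table API (the class name and the
assumption that both fields hold equally long comma-separated lists are
mine, not the asker's):

    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.table.functions.TableFunction;
    import org.apache.flink.types.Row;

    public class ZipSplit extends TableFunction<Row> {
        // "A,B,C", "D,E,F" -> (1, A, D), (2, B, E), (3, C, F)
        public void eval(String field1, String field2) {
            String[] a = field1.split(",");
            String[] b = field2.split(",");
            for (int i = 0; i < Math.min(a.length, b.length); i++) {
                Row row = new Row(3);
                row.setField(0, i + 1);
                row.setField(1, a[i]);
                row.setField(2, b[i]);
                collect(row);
            }
        }

        @Override
        public TypeInformation<Row> getResultType() {
            return Types.ROW(Types.INT, Types.STRING, Types.STRING);
        }
    }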


liu_mingzhang wrote on Mon, Jun 17, 2019 at 10:22 AM:

>
> I want to write a UDTF with the following behavior:
>
> it should turn the original table data
>
> id  field1  field2
> 1   A,B,C   D,E,F
>
> into
>
> id  num  field1  field2
> 1   1    A       D
> 1   2    B       E
> 1   3    C       F
>
>
> Below is the function I wrote, but it reports an error.
>
> However, org.apache.flink.table.api.Types.ROW does have such a
> constructor, and the example in the comments is written the same way.
>
> I don't understand why I get the error; any help is greatly appreciated.
>
> P.S. I don't know why the images I sent earlier didn't come through. If
> you still can't see them this time, please check the attachments. Thanks.
>


Re: [ANNOUNCE] Weekly Community Update 2019/24

2019-06-16 Thread Zili Chen
Hi Konstantin and all,

Thanks Konstantin very much for reviving this tradition! It reminds
me of the joyful time when I could easily catch up on interesting threads.
Thanks for Till's work, too.

Besides the exciting updates and news above, I'd like to pick out
some other threads you may be interested in.

* xiaogang has recently started a discussion[1] on allowing
at-most-once delivery in case of failures, which adapts Flink
to more scenarios.

* vino has raised a discussion[2] on supporting local aggregation
in Flink, which received a lot of positive feedback, and now
there is an ongoing FLIP-44 thread[3].

* Jeff Zhang has raised a discussion[4] and drafted a design doc[5]
on Flink client API enhancement, which aims at overcoming limitations
when integrating Flink with projects such as Zeppelin or Beam.

Best,
tison.

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Allow-at-most-once-delivery-in-case-of-failures-td29464.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Support-Local-Aggregation-in-Flink-td29307.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-44-Support-Local-Aggregation-in-Flink-td29513.html
[4]
https://lists.apache.org/thread.html/ce99cba4a10b9dc40eb729d39910f315ae41d80ec74f09a356c73938@%3Cdev.flink.apache.org%3E
[5]
https://docs.google.com/document/d/1VavBrYn8vJeZs-Mhu5VzKO6xrWCF40aY0nlQ_UVVTRg/


Konstantin Knauf  于2019年6月17日周一 上午12:10写道:

> Dear community,
>
> last year Till did a great job on summarizing recent developments in the
> Flink community in a "Weekly community update" thread. I found this very
> helpful and would like to revive this tradition with a focus on topics &
> threads which are particularly relevant to the wider community of Flink
> users.
>
> As we haven't had such an update for some time (since December 2018), I
> find it impossible to cover everything that's currently going on in this
> email. I'll try to include most ongoing discussions and FLIPs over the
> course of the next weeks to catch up. Afterwards I am going to go back to
> focusing only on news since the last update.
>
> You are welcome to share any additional news and updates with the
> community in this thread.
>
> Flink Development
> ===
>
> * [releases] The community is currently working on a Flink 1.8.1 release
> [1]. The first release candidate should be ready soon (one critical bug to
> fix as of writing, FLINK-12863).
> * [releases] Kurt and Gordon stepped up as release managers for Flink 1.9
> and started a thread [2] to sync on the status of various development
> threads targeted for Flink 1.9. Check it out to see if the feature you are
> waiting for is likely to make it or not.
> * [savepoints] Gordon, Kostas and Congxian have recently started a
> discussion [3] on unifying the savepoint format across StateBackends, which
> will enable users to switch between StateBackends when recovering from a
> Savepoint. The related discussion on introducing Stop-With-Checkpoint [4]
> initiated by Yu Li is closely related and worth a read to understand the
> long term vision.
> * [savepoints] Seth and Gordon have started a discussion to add a State
> Processing API ("Savepoint Connector"), which will allow reading &
> modifying existing Savepoints as well as creating new Savepoints from
> scratch with the DataSet API. The feature is targeted for Flink 1.9.0 as a
> new *library*.
> * [python-support] Back in April we had a discussion on the mailing list
> about adding Python Support to the Table API [6]. This support will likely
> be available in Flink 1.9 (without UDFs and later with UDF support as
> well). Therefore, Stephan has started a discussion [7] to deprecate the
> current Python API in Flink 1.9. This has gotten a lot of positive feedback
> and the only open question as of writing is whether to only deprecate it or
> to remove it directly.
>
> [1]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Releasing-Flink-1-8-1-td29154.html
> [2]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Features-for-Apache-Flink-1-9-0-td28701.html
> [3]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-41-Unify-Keyed-State-Snapshot-Binary-Format-for-Savepoints-td29197.html
> [4] https://issues.apache.org/jira/browse/FLINK-12619
> [5]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Discuss-FLIP-43-Savepoint-Connector-td29232.html
> [6]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-38-Support-python-language-in-flink-TableAPI-td28061.html#a28096
> [7]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Deprecate-previous-Python-APIs-td29483.html#a29522
>
> Notable Bugs
> ===
>
> In this section I am going to list some recently discovered bugs, which
> might be relevant to a larger audience. I'll try to explain them to the
> best of my knowledge, but no 

Re: Local blobStore not freed

2019-06-11 Thread Zili Chen
Hi Dan,

Said "The files are removed after a restart of the process", it sounds Flink
cleaned up blobs properly. From your description I don't understand clearly
in which case/situation you expected Flink deleted blobs but it doesn't.

Could you describe the difference between 1.4.2 and 1.7.2/1.8.0 in detail?
Especially what moment exactly you want blobs to be deleted and it was not.

Best,
tison.


Dede wrote on Tue, Jun 11, 2019 at 11:06 PM:

> Hi Team,
>
> I've been struggling for a while with a strange issue: the local blob
> store files are not actually deleted from the job manager/task manager in
> versions 1.7.2 and 1.8.0; lsof shows them like this:
> java6528  root   63r   REG 202,16 162458786 1802248
> /mnt/tmp1/blobStore-542fc202-b263-482f-87d7-11b5ad70cc32/job_b3446b24474ac3e107bbde27ff24df98/blob_p-96a29e8796d15ce6359edb4ab80ff8661f8b1fd0-73395221d4ffcfd603dbd1d25961aee3
> (deleted)
>
> The files are removed after a restart of the process, so I guess Flink
> itself keeps a handle to the deleted file.
>
> The same process works fine on 1.4.2, deleting all the files properly.
>
> Is there something that I'm missing? I played around with the blob server
> configuration, but with no luck.
>
> I can provide more logs/debug if needed.
>
> Thanks,
> Dan
>
>
>


Re: [DISCUSS] Deprecate previous Python APIs

2019-06-11 Thread Zili Chen
+1

Best,
tison.


zhijiang wrote on Tue, Jun 11, 2019 at 10:52 PM:

> It is reasonable as stephan explained. +1 from my side!
>
> --
> From:Jeff Zhang 
> Send Time: Tue, Jun 11, 2019, 22:11
> To:Stephan Ewen 
> Cc:user ; dev 
> Subject:Re: [DISCUSS] Deprecate previous Python APIs
>
> +1
>
> Stephan Ewen wrote on Tue, Jun 11, 2019 at 9:30 PM:
>
> > Hi all!
> >
> > I would suggest deprecating the existing Python APIs for DataSet and
> > DataStream API with the 1.9 release.
> >
> > Background is that there is a new Python API under development.
> > The new Python API is initially against the Table API. Flink 1.9 will
> > support Table API programs without UDFs, 1.10 is planned to support UDFs.
> > Future versions would support also the DataStream API.
> >
> > In the long term, Flink should have one Python API for DataStream and
> > Table APIs. We should not maintain multiple different implementations and
> > confuse users that way.
>
> > Given that the existing Python APIs are a bit limited and not under active
> > development, I would suggest to deprecate them in favor of the new API.
> >
> > Best,
> > Stephan
> >
> >
>
> --
> Best Regards
>
> Jeff Zhang
>
>
>


Re: Errors when importing the apache-flink project into IDEA

2019-05-13 Thread Zili Chen
Hi,

The images are not visible; please use attachments or paste links. In
general, you can check the project out in IDEA directly from git version
control.
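A typical sketch for getting a buildable checkout (assuming network access
to GitHub and Maven Central; building from source is also documented in the
repository README):

    git clone https://github.com/apache/flink.git
    cd flink
    mvn clean install -DskipTests
    # then in IDEA: File -> New -> Project from Existing Sources... -> pom.xml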

Best,
tison.


程 婕 wrote on Mon, May 13, 2019 at 3:09 PM:

> Dear Flink Committer,
>
> Hello. I am a first-year graduate student and I am very interested in
> flink development. Recently I have been trying to implement an algorithm
> in the flink framework, but I ran into some problems when importing the
> flink code into IDEA. For example, when I build the project, the
> following error is shown:
>
> [image: cid:image001.png@01D5082C.AE3E9D60]
>
> The error occurs during the build.
>
> I am curious why this error appears, because I did not change any flink
> code while importing and building the project; in my understanding this
> kind of problem should not occur. I tried to find a solution online but
> without much success. Could you help me solve this problem?
>
> Many thanks,
>
> Jie Cheng
>
> Sent from the Mail app for Windows 10
>
>
>


Re: [ANNOUNCE] Apache Flink 1.8.0 released

2019-04-10 Thread Zili Chen
Cheers!

Glad to meet the release of 1.8.0. Thank Aljoscha for being
the release manager and all contributors who made this release
possible.

Best,
tison.


Aljoscha Krettek wrote on Wed, Apr 10, 2019 at 4:31 PM:

> The Apache Flink community is very happy to announce the release of Apache
> Flink 1.8.0, which is the next major release.
>
> Apache Flink® is an open-source stream processing framework for
> distributed, high-performing, always-available, and accurate data streaming
> applications.
>
> The release is available for download at:
> https://flink.apache.org/downloads.html
>
> Please check out the release blog post for an overview of the improvements
> for this bugfix release:
> https://flink.apache.org/news/2019/04/09/release-1.8.0.html
>
> The full release notes are available in Jira:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12344274
>
> We would like to thank all contributors of the Apache Flink community who
> made this release possible!
>
> Regards,
> Aljoscha


Re: blink submission to yarn stuck repeatedly allocating containers

2019-04-07 Thread Zili Chen
Hi, the apache mailing lists do not support inline images; please share
them as attachments or links.

Best,
tison.


苏 欣 wrote on Mon, Apr 8, 2019 at 10:17 AM:

> I submitted a job to yarn in per-job mode and found that containers are
> allocated over and over again.
>
> On the yarn web ui, the TM containers appear to be allocated successfully
> for a moment, but immediately only the JM container remains, and then TM
> containers are allocated again. This repeats until the job fails because
> it cannot be scheduled for lack of resources.
>
> I looked up the exit code but couldn't find what 31 means. Could anyone
> help analyze this? Thanks a lot!
>
>
>
> Sent from the Mail app for Windows 10
>
>
>


Re: How is Collector out element processed?

2019-04-03 Thread Zili Chen
Collector should follow the "push" mechanism.

Best,
tison.


Son Mai wrote on Thu, Apr 4, 2019 at 12:11 PM:

> Hi Tison,
>
> so are you saying that the output will be iterated over by the next
> operator that calls it? And that it is processed not via a push but via a
> pull mechanism by the next operator, like a sink?
>
> Thanks,
>
> On Thu, Apr 4, 2019 at 9:46 AM Zili Chen  wrote:
>
>> Hi Son,
>>
>> As the Collector's documentation says, it collects a record and forwards
>> it. The collector is the "push" counterpart of the Iterator, which
>> "pulls" data in.
>>
>> Best,
>> tison.
>>
>>
>>> Son Mai wrote on Thu, Apr 4, 2019 at 10:15 AM:
>>
>>> Hi all,
>>>
>>> when I create new classes extending ProcessFunction or implementing
>>> WindowFunction, there is a *Collector out* for output.
>>>
>>> How is this output processed in the next stage, for example a Sink or
>>> another WindowAssigner? Is it processed immediately by the next operator
>>> via a push mechanism, or is the collector checked regularly by the next
>>> operator to see if an element exists?
>>>
>>> Thanks,
>>> Son
>>>
>>


Re: How is Collector out element processed?

2019-04-03 Thread Zili Chen
Hi Son,

As the Collector's documentation says, it collects a record and forwards
it. The collector is the "push" counterpart of the Iterator, which "pulls"
data in.
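To make the push model concrete, here is a minimal sketch of a
ProcessFunction whose Collector hands each record straight to the next
operator (the class name is illustrative):

    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;

    public class Doubler extends ProcessFunction<Integer, Integer> {
        @Override
        public void processElement(Integer value, Context ctx, Collector<Integer> out)
                throws Exception {
            // each collect() call pushes the record downstream immediately;
            // nothing sits in the collector waiting to be pulled
            out.collect(value * 2);
        }
    }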

Best,
tison.


Son Mai wrote on Thu, Apr 4, 2019 at 10:15 AM:

> Hi all,
>
> when I create new classes extending ProcessFunction or implementing
> WindowFunction, there is a *Collector out* for output.
>
> How is this output processed in the next stage, for example a Sink or
> another WindowAssigner? Is it processed immediately by the next operator via
> a push mechanism, or is the collector checked regularly by the next operator
> to see if an element exists?
>
> Thanks,
> Son
>


Re: Reading csv file

2019-04-01 Thread Zili Chen
Option 1. Provide command-line parameter following
https://stackoverflow.com/questions/2066307/how-do-you-input-commandline-argument-in-intellij-idea
Option 2. Directly set `String inputPath =
"/home/alaa/Downloads/sorted_data.csv";`

Best,
tison.


alaa wrote on Mon, Apr 1, 2019 at 2:42 PM:

> So what should I do?
>
>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>


Re: Reading csv file

2019-04-01 Thread Zili Chen
Hi,

Isn't it because you do not provide command-line arguments?

Best,
tison.


alaa wrote on Mon, Apr 1, 2019 at 2:33 PM:

> Hi,
>
> I am following the Taxi example provided at
> "http://training.data-artisans.com/exercises/taxiData.html"; however, I
> got the following error message when I run AreasTotalPerHour.java in my
> IntelliJ IDE.
>
>
> <
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1965/Screenshot_from_2019-04-01_08-18-03.png>
>
>
> I just inserted the path for the csv file in the line:
> System.err.println("/home/alaa/Downloads/sorted_data.csv");
>
> Should I change anything in the utils package?
>
>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>


Re: Re: flink ha mode process hangs!!!

2019-03-25 Thread Zili Chen
If the old zk data was not cleaned up, it may be that you previously set
high-availability.storageDir to /flink/ha/zookeeper, later cleaned up
hdfs, but zk still holds stale handle information.
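If stale entries are indeed left behind, one way to clean them up is via
the ZooKeeper CLI (a sketch; this assumes the default
high-availability.zookeeper.path.root of /flink and that no live cluster
still uses the znodes):

    zkCli.sh -server ip1:2181
    [zk: ip1:2181(CONNECTED) 0] deleteall /flink   # rmr /flink on older ZK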

Best,
tison.


Han Xiao wrote on Tue, Mar 26, 2019 at 9:33 AM:

> Hi, good morning, and thanks for your reply. Below are my configuration
> items and parameters:
>
> flink-conf.yaml
> common:
> jobmanager.rpc.address: test10
> jobmanager.rpc.port: 6123
> jobmanager.heap.size: 1024m
> taskmanager.heap.size: 1024m
> taskmanager.numberOfTaskSlots: 2
> parallelism.default: 2
> taskmanager.tmp.dirs: /app/tools/flink-1.7.2/tmp
>
> High Availability:
> high-availability: zookeeper
> high-availability.storageDir: hdfs://test10:8020/flink/ha/
>  ## this directory is created normally, but contains no jobGraph directory;
> high-availability.zookeeper.quorum:
> ip1:2181,ip2:2181,ip3:2181,ip4:2181,ip5:2181
> high-availability.zookeeper.client.acl: open
>
> Fault tolerance and checkpointing:
> state.backend: filesystem
> state.checkpoints.dir: hdfs://test10:8020/flink-checkpoints  ## this directory is not created;
>
>  Web Frontend:
> rest.port: 8081
>
> masters:          slaves:
> test10:8081       test12
> test11:8082       test13
>                   test14
>
> That is the complete configuration. The lookup path in the error message
> below does not appear anywhere in my configuration... which puzzles me.
>
> Thank you for your reply!
> From: Zili Chen
> Sent: 2019-03-25 19:57
> To: user-zh@flink.apache.org
> Subject: Re: flink ha mode process hangs!!!
> It looks like HDFS is asked to find the submittedJobGraph under the path
> /flink/ha/zookeeper/submittedJobGraphb05001535f91, which already looks
> wrong.
>
> Flink's ha needs both a zk path and a path for storing state on a file
> system. You could try setting high-availability.storageDir to a valid
> HDFS path.
>
> Best,
> tison.
>
>
> Zili Chen wrote on Mon, Mar 25, 2019 at 7:53 PM:
>
> > Could you provide your ha configuration? In particular
> > high-availability.storageDir; I suspect it is not configured.
> > Best,
> > tison.
> >
> >
> > Han Xiao wrote on Mon, Mar 25, 2019 at 7:26 PM:
> >
> >> Hi everyone, I am a flink beginner and ran into some problems while
> >> deploying flink ha; please help take a look.
> >> After starting flink ha, the jobmanager process hangs right away. I am
> >> using flink 1.7.2. The log below contains the error: File does not
> >> exist: /flink/ha/zookeeper/submittedJobGraphb05001535f91.
> >> What puzzles me is that neither my checkpoint directory nor my ha
> >> directory is this path, so why does it look there? No JobGraph is
> >> created under the directories I configured, yet it keeps looking up
> >> the job graph /a5ffe00b0bc5688d9a7de5c62b8150e6 and cannot find it. I
> >> deleted all the related configured paths and rebuilt the setup, but it
> >> still looks it up on startup. How can I stop flink from looking up
> >> this JobGraph so that my ha cluster runs healthily?
> >>
> >>
> >> Error log:
> >> 2019-03-25 18:55:00,742 ERROR
> >> org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Fatal
> error
> >> occurred in the cluster entrypoint.
> >> java.lang.RuntimeException: org.apache.flink.util.FlinkException: Could
> >> not retrieve submitted JobGraph from state handle under
> >> /a5ffe00b0bc5688d9a7de5c62b8150e6. This indicates that the retrieved
> state
> >> handle is broken. Try cleaning the state handle store.
> >> at
> >> org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:199)
> >> at
> >>
> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:74)
> >> at
> >>
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
> >> ...
> >> Caused by: org.apache.flink.util.FlinkException: Could not retrieve
> >> submitted JobGraph from state handle under
> >> /a5ffe00b0bc5688d9a7de5c62b8150e6. This indicates that the retrieved
> state
> >> handle is broken. Try cleaning the state handle store.
> >> at
> >>
> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.recoverJobGraph(ZooKeeperSubmittedJobGraphStore.java:208)
> >> at
> >>
> org.apache.flink.runtime.dispatcher.Dispatcher.recoverJob(Dispatcher.java:696)
> >> at
> >>
> org.apache.flink.runtime.dispatcher.Dispatcher.recoverJobGraphs(Dispatcher.java:681)
> >> 
> >> Caused by: java.io.FileNotFoundException: File does not exist:
> >> /flink/ha/zookeeper/submittedJobGraphb05001535f91
> >> at
> >>
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> >> at
> >>
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> >> at
> >>
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2100)
> >> at
> >>
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2070)
> >> ...
> >> Caused by: org.apache.hadoop.ipc.RemoteException(java.io
> .FileNotFoundException):
> >> File does not exist: /flink/ha/zookeeper/submittedJobGraphb05001535f91
> >> at
> >>
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> >> at
> >>
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> >> at
> >>
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2100)
> >> at
> >>
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2070)
> >> ...
> >>
> >> 谢谢!
> >>
> >
>


Re: flink ha mode process hangs!!!

2019-03-25 Thread Zili Chen
Could you provide your ha configuration? In particular
high-availability.storageDir; I suspect it is not configured.
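For reference, a minimal working HA block usually looks like this (paths
and quorum are illustrative):

    high-availability: zookeeper
    high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
    high-availability.storageDir: hdfs:///flink/ha/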
Best,
tison.


Han Xiao wrote on Mon, Mar 25, 2019 at 7:26 PM:

> Hi everyone, I am a flink beginner and ran into some problems while
> deploying flink ha; please help take a look.
> After starting flink ha, the jobmanager process hangs right away. I am
> using flink 1.7.2. The log below contains the error: File does not
> exist: /flink/ha/zookeeper/submittedJobGraphb05001535f91.
> What puzzles me is that neither my checkpoint directory nor my ha
> directory is this path, so why does it look there? No JobGraph is created
> under the directories I configured, yet it keeps looking up the job graph
> /a5ffe00b0bc5688d9a7de5c62b8150e6 and cannot find it. I deleted all the
> related configured paths and rebuilt the setup, but it still looks it up
> on startup. How can I stop flink from looking up this JobGraph so that my
> ha cluster runs healthily?
>
>
> Error log:
> 2019-03-25 18:55:00,742 ERROR
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Fatal error
> occurred in the cluster entrypoint.
> java.lang.RuntimeException: org.apache.flink.util.FlinkException: Could
> not retrieve submitted JobGraph from state handle under
> /a5ffe00b0bc5688d9a7de5c62b8150e6. This indicates that the retrieved state
> handle is broken. Try cleaning the state handle store.
> at
> org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:199)
> at
> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:74)
> at
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
> ...
> Caused by: org.apache.flink.util.FlinkException: Could not retrieve
> submitted JobGraph from state handle under
> /a5ffe00b0bc5688d9a7de5c62b8150e6. This indicates that the retrieved state
> handle is broken. Try cleaning the state handle store.
> at
> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.recoverJobGraph(ZooKeeperSubmittedJobGraphStore.java:208)
> at
> org.apache.flink.runtime.dispatcher.Dispatcher.recoverJob(Dispatcher.java:696)
> at
> org.apache.flink.runtime.dispatcher.Dispatcher.recoverJobGraphs(Dispatcher.java:681)
> 
> Caused by: java.io.FileNotFoundException: File does not exist:
> /flink/ha/zookeeper/submittedJobGraphb05001535f91
> at
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> at
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2100)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2070)
> ...
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException):
> File does not exist: /flink/ha/zookeeper/submittedJobGraphb05001535f91
> at
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
> at
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2100)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2070)
> ...
>
> Thanks!
>


Re: blink ha, processes die right after startup

2019-03-14 Thread ZiLi Chen
Note this line:

ERROR org.apache.flink.shaded.curator.org.apache.curator.ConnectionState  -
Authentication failed

Is your ZK working properly, and is blink actually connecting to it?
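A quick sanity check from one of the nodes (a sketch; the address comes
from your high-availability.zookeeper.quorum):

    zkCli.sh -server ip:2181
    [zk: ip:2181(CONNECTED) 0] ls /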

Best,
tison.


xiao...@chinaunicom.cn wrote on Wed, Mar 13, 2019 at 3:58 PM:

> Hi all,
>
> I set up blink ha with nodes JM (node1, node2) and TM (node3, node4,
> node5), but after startup the process on node1 dies and the process on
> node2 cannot start. The errors are as follows:
>
> JobManager log on node1:
> ERROR
> org.apache.flink.shaded.curator.org.apache.curator.ConnectionState  -
> Authentication failed
>
> ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint -
> Fatal error occurred in the cluster entrypoint.
> org.apache.flink.util.FlinkException: Could not retrieve submitted
> JobGraph from state handle under /a5ffe00b0bc5688d9a7de5c62b8150e6. This
> indicates that the retrieved state handle is broken. Try
> cleaning the state handle store.
> at
> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.recoverJobGraph(ZooKeeperSubmittedJobGraphStore.java:196)
> at
> org.apache.flink.runtime.dispatcher.Dispatcher.recoverJob(Dispatcher.java:646)
> 
>
> JobManager log on node2:
> ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint -
> Fatal error occurred in the cluster entrypoint.
> org.apache.flink.runtime.dispatcher.DispatcherException: Could not
> start the added job a5ffe00b0bc5688d9a7de5c62b8150e6
> at
> org.apache.flink.runtime.dispatcher.Dispatcher.lambda$onAddedJobGraph$31(Dispatcher.java:878)
> at
> java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
> 
>
> TaskManager log:
> ERROR org.apache.flink.runtime.taskexecutor.TaskManagerRunner
>  - Fatal error occurred while executing the TaskManager. Shutting it down...
> java.lang.Exception: Reconnect to RM failed
> at
> org.apache.flink.runtime.taskexecutor.TaskExecutor.lambda$closeResourceManagerConnection$3(TaskExecutor.java:1179)
> at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:332)
> at
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:158)
> 
>
> flink-conf.yaml configuration:
> jobmanager.rpc.address: localhost
> jobmanager.rpc.port: 6123
> jobmanager.heap.mb: 4096
> taskmanager.heap.mb: 4096
> taskmanager.numberOfTaskSlots: 2
> parallelism.default: 6
> taskmanager.managed.memory.size: 256
> yarn.application-attempts: 10
> env.java.home: /opt/jdk1.8.0_171/
> fs.hdfs.hadoopconf:
> /app/cloudera/parcels/CDH-5.14.0-1.cdh5.14.0.p0.24/lib/hadoop/etc/hadoop/
> taskmanager.network.numberOfBuffers: 1024
> high-availability: zookeeper
> high-availability.storageDir: hdfs://ip:8020/blink/ha/zookeeper/storageDir/
> high-availability.zookeeper.quorum: ip:2181
> high-availability.filesystem.path.jobgraphs:
> /app/blinkTmp/TaskTmp/jobgraphs/
> state.backend: filesystem
> state.checkpoints.dir: hdfs://ip:8020/blink/flink-checkpoints
> state.backend.incremental: true
> rest.port: 8081
>
> masters file:
> node1:8081
> node2:8081
>
> slaves file:
> node3
> node4
> node5
>
> I have just started with blink, and I believe there is a problem with my
> configuration. Has anyone tried installing and deploying blink? Could you
> share your configuration with me? How should I fix the problems in my
> environment?
>
> Thanks.
>
>

