Re: [Question] How to use different filesystem between checkpointdata and user data sink

2019-12-23 Thread ouywl






Hi Piotr,

While debugging the code I found that the JobManager classpath does not contain "system-plugin.jar". However, when `configureFileSystems(configuration)` runs in `ClusterEntrypoint.startCluster()`, it initializes the file systems together with the FileSystem plugins via `FileSystem.initialize(configuration, PluginUtils.createPluginManagerFromRootFolder(configuration));`. After that, when the blob store calls getFileSystem, the FileSystem is loaded by scheme, so it ends up calling the "MyFileSystemFactory" that I implemented myself. I have put core-site.xml and hdfs-site.xml into "system-plugin.jar", and they are not suitable for 'hdfs://slothTest/user/sloth/HA/'. (A minimal sketch of how such a factory is selected by scheme follows the log below.) The full log is:

Log Type: jobmanager.log
Log Upload Time: Sun Dec 22 19:09:16 +0800 2019
Log Length: 39600

2019-12-22 19:09:12,305 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint - 
2019-12-22 19:09:12,307 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Starting YarnJobClusterEntrypoint (Version: 1.9.1, Rev:4d56de8, Date:30.09.2019 @ 11:32:19 CST)
2019-12-22 19:09:12,308 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  OS current user: yarn
2019-12-22 19:09:12,790 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Current Hadoop/Kerberos user: sloth
2019-12-22 19:09:12,791 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  JVM: Java HotSpot(TM) 64-Bit Server VM - Oracle Corporation - 1.8/25.152-b16
2019-12-22 19:09:12,791 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Maximum heap size: 406 MiBytes
2019-12-22 19:09:12,791 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  JAVA_HOME: /usr/jdk64/jdk1.8.0_152
2019-12-22 19:09:12,792 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Hadoop version: 2.7.3
2019-12-22 19:09:12,792 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  JVM Options:
2019-12-22 19:09:12,793 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xms424m
2019-12-22 19:09:12,793 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xmx424m
2019-12-22 19:09:12,793 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -XX:+PrintGCDetails
2019-12-22 19:09:12,793 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -XX:+PrintGCDateStamps
2019-12-22 19:09:12,793 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xloggc:/home/sloth/hadoop/yarn/logs/application_1576548114502_0152/container_e70_1576548114502_0152_02_01/jobmanager-gc.log
2019-12-22 19:09:12,793 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -XX:+UseGCLogFileRotation
2019-12-22 19:09:12,794 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -XX:NumberOfGCLogFiles=1
2019-12-22 19:09:12,794 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -XX:GCLogFileSize=1M
2019-12-22 19:09:12,794 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog.file=/home/sloth/hadoop/yarn/logs/application_1576548114502_0152/container_e70_1576548114502_0152_02_01/jobmanager.log
2019-12-22 19:09:12,794 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlogback.configurationFile=file:logback.xml
2019-12-22 19:09:12,794 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Dlog4j.configuration=file:log4j.properties
2019-12-22 19:09:12,794 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Program Arguments: (none)
2019-12-22 19:09:12,795 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Classpath: 
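
For reference, here is a minimal sketch of a plugin FileSystemFactory, assuming the Flink 1.9 interface (getScheme / configure / create); the class name and the "myfs" scheme are illustrative, not the real implementation from this thread. FileSystem.initialize(...) registers each discovered factory under the scheme returned by getScheme(), and later lookups route a URI to the factory whose scheme matches, so a factory that answers for "hdfs" will also receive checkpoint/HA paths like hdfs://slothTest/...

package com.example.fs;  // hypothetical package

import java.io.IOException;
import java.net.URI;

import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.FileSystemFactory;

/**
 * Illustrative plugin factory. To be discovered by the plugin manager it must be
 * listed in META-INF/services/org.apache.flink.core.fs.FileSystemFactory inside
 * the plugin jar placed under plugins/<some-dir>/.
 */
public class MyFileSystemFactory implements FileSystemFactory {

    @Override
    public String getScheme() {
        // URIs are routed by this value: only myfs://... paths reach this factory,
        // while hdfs://... stays on Flink's default Hadoop file system.
        return "myfs";
    }

    @Override
    public void configure(Configuration config) {
        // Receives the Flink configuration before create() is called.
    }

    @Override
    public FileSystem create(URI fsUri) throws IOException {
        // Build the file system for the user-data sink here, e.g. from the
        // core-site.xml / hdfs-site.xml bundled with the plugin.
        throw new UnsupportedOperationException("sketch only");
    }
}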

[ANNOUNCE] Weekly Community Update #51 (20191216-20191222)

2019-12-23 Thread Hequn Cheng
Hi everyone,

Happy to share this week's community digest with you. It covers updates on the
Flink 1.10 and Flink 1.9.2 releases, a discussion about integrating the Flink
Docker image publication into the Flink release process, a discussion about
upcoming PyFlink features, and a few blog posts.

Flink Development
==

* [releases] Kostas Kloudas suggests focusing on the documentation for the new
1.10 features during the feature freeze. He created an umbrella issue
(FLINK-15273) to track the outstanding documentation tasks. [1]

* [releases] Hequn started a discussion about kicking off the Flink 1.9.2 release.
One blocker was resolved this week and one blocker remains. Given the ongoing 1.10
release and the community's limited resources, the 1.9.2 vote is planned for after
Christmas. [2]

* [releases] Patrick proposes integrating the Flink Docker image publication into
the Flink release process. The current point of debate is whether the Dockerfiles
used to publish the Docker images should get a dedicated git repo. [3]

* [sql] The discussion about supporting JSON functions in Flink SQL seems to have
reached consensus. Jark Wu suggested that Forward Xu start the FLIP vote. [4]

* [runtime] After trying out the new FLIP-49 memory configuration, Stephan started
a discussion with some feedback. He suggested improvements to the configuration key
names and descriptions, which has received approval from many others so far. [5]

* [connectors] The FLIP-27 (new source interface) discussion got some updates this week. The focus of this week's discussion was the notion of "bounded and unbounded" sources. [6]

* [pyflink] Jincheng started a discussion to collect, together with the community,
the features PyFlink should support next. So far one person replied, hoping for
better PyFlink integration with Jupyter. [7]

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Documentation-tasks-for-release-1-10-td36031.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Releasing-Flink-1-9-2-td36087.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Integrate-Flink-Docker-image-publication-into-Flink-release-process-td36139.html
[4]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Support-JSON-functions-in-Flink-SQL-td32674.html
[5]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Some-feedback-after-trying-out-the-new-FLIP-49-memory-configurations-td36129.html
[6]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-27-Refactor-Source-Interface-td24952.html
[7]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-What-parts-of-the-Python-API-should-we-focus-on-next-td36119.html

Notable Bugs
==

[FLINK-15262] [1.10.0] kafka connector doesn't read from beginning
immediately when 'connector.startup-mode' = 'earliest-offset'. [8]
Even with 'connector.startup-mode' = 'earliest-offset' configured, the Kafka
connector does not consume from the earliest offsets.

[FLINK-15300] [1.10.0] Shuffle memory fraction sanity check does not
account for its min/max limit. [9]
If the shuffle memory min/max is configured but the fraction falls outside of the
min/max range, the sanity check (TaskExecutorResourceUtils#sanityCheckShuffleMemory) may fail.

[FLINK-15304] [1.11.0] Remove unexpected Hadoop dependency from Flink's
Mesos integration. [10]
Flink's Mesos integration currently carries a Hadoop dependency that needs to be removed.

[FLINK-15313] [1.10.0] Can not insert decimal with precision into sink
using TypeInformation. [11]
Inserting into a table with a Decimal column that specifies a precision currently makes Flink throw an exception.

[FLINK-15320] [1.10.0] JobManager crashes in the standalone model when
cancelling job which subtask' status is scheduled. [12]
On a standalone cluster, cancelling a job while one of its subtasks is still in the scheduled state crashes the JobManager.

[8] https://issues.apache.org/jira/browse/FLINK-15262
[9] https://issues.apache.org/jira/browse/FLINK-15300
[10] https://issues.apache.org/jira/browse/FLINK-15304
[11] https://issues.apache.org/jira/browse/FLINK-15313
[12] https://issues.apache.org/jira/browse/FLINK-15320

Events / Blog Posts / Misc
===

* Philip Wilcox published a blog post on how Bird uses Flink to detect offline
scooters. The post shares their experience solving a number of tricky problems in
a real-world scenario, touching on Kafka, event time, watermarks, and ordering. [13]

* Preetdeep Kumar published a blog post about use cases and best practices for processing streaming data with Apache Flink. [14]

[13]
https://www.ververica.com/blog/replayable-process-functions-time-ordering-and-timers
[14] https://dzone.com/articles/streaming-etl-with-apache-flink

Thanks,
Hequn


Question about CEP matching out-of-order data

2019-12-23 Thread qishang zhong
Hi everyone,

I have a question about the flink-training-exercises project, specifically the class
com.ververica.flinktraining.solutions.datastream_java.cep.LongRidesSolution:

Pattern<TaxiRide, TaxiRide> completedRides =
        Pattern.<TaxiRide>begin("start")
                .where(new SimpleCondition<TaxiRide>() {
                    @Override
                    public boolean filter(TaxiRide ride) throws Exception {
                        return ride.isStart;
                    }
                })
                .next("end")
                .where(new SimpleCondition<TaxiRide>() {
                    @Override
                    public boolean filter(TaxiRide ride) throws Exception {
                        return !ride.isStart;
                    }
                });
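
For context, a hedged sketch of how such a pattern is usually consumed with a time bound and a timeout side output (roughly what the training solution does; the two-hour bound, the rideId key, and the tag name are my assumptions, and a DataStream<TaxiRide> called rides is taken as given). It uses org.apache.flink.cep.{CEP, PatternStream, PatternFlatTimeoutFunction, PatternFlatSelectFunction}, org.apache.flink.util.{OutputTag, Collector}, and java.util.{Map, List}:

// Only matches that complete within two hours count as completed;
// older partial matches are reported through the timeout tag.
PatternStream<TaxiRide> patternStream = CEP.pattern(
        rides.keyBy(ride -> ride.rideId),
        completedRides.within(Time.hours(2)));

OutputTag<TaxiRide> timedOut = new OutputTag<TaxiRide>("timedOutRides") {};

SingleOutputStreamOperator<TaxiRide> completed = patternStream.flatSelect(
        timedOut,
        new PatternFlatTimeoutFunction<TaxiRide, TaxiRide>() {
            @Override
            public void timeout(Map<String, List<TaxiRide>> pattern,
                                long timeoutTimestamp,
                                Collector<TaxiRide> out) {
                // START events whose END did not arrive within the bound.
                out.collect(pattern.get("start").get(0));
            }
        },
        new PatternFlatSelectFunction<TaxiRide, TaxiRide>() {
            @Override
            public void flatSelect(Map<String, List<TaxiRide>> pattern,
                                   Collector<TaxiRide> out) {
                // Completed rides; nothing to emit for the long-rides use case.
            }
        });

DataStream<TaxiRide> longRides = completed.getSideOutput(timedOut);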

I now have a similar monitoring scenario where unmatched data also needs to be emitted after a timeout, but the stream's data may arrive out of order.
Does that mean the Pattern from the example can no longer match?
If I want the out-of-order data to still be matched instead of being emitted as a timeout, is there a corresponding solution?
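
On the out-of-order part: when the pattern runs on event time, the CEP operator buffers incoming elements and only processes them, sorted by timestamp, once the watermark passes, so data that arrives out of order but within the watermark bound is still matched in timestamp order rather than landing in the timeout output. A minimal sketch, assuming TaxiRide exposes getEventTime() (as in the training exercises), that env and rides already exist, and that 60 seconds covers the out-of-orderness of the stream:

// Assumptions: env is the StreamExecutionEnvironment, rides is a DataStream<TaxiRide>,
// and 60 seconds of lateness is enough; tune the bound to your data.
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

DataStream<TaxiRide> ridesInEventTime = rides.assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor<TaxiRide>(Time.seconds(60)) {
            @Override
            public long extractTimestamp(TaxiRide ride) {
                return ride.getEventTime();
            }
        });

// Apply the CEP pattern on ridesInEventTime. Events arriving up to 60 seconds out of
// order are re-ordered by event time before matching; events later than the watermark
// are dropped by default (recent Flink versions also offer PatternStream#sideOutputLateData
// to capture them; check the CEP docs of your version).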