Re: [ANNOUNCE] Apache Flink CDC 3.1.0 released

2024-05-17 文章 Leonard Xu
Congratulations !

Thanks Qingsheng for the great work and all contributors involved !!

Best,
Leonard


> 2024年5月17日 下午5:32,Qingsheng Ren  写道:
> 
> The Apache Flink community is very happy to announce the release of
> Apache Flink CDC 3.1.0.
> 
> Apache Flink CDC is a distributed data integration tool for real-time
> and batch data. It brings simplicity and elegance to data integration,
> using YAML to describe the data movement and transformation of a data
> pipeline.
> 
> Please check out the release blog post for an overview of the release:
> https://flink.apache.org/2024/05/17/apache-flink-cdc-3.1.0-release-announcement/
> 
> The release is available for download at:
> https://flink.apache.org/downloads.html
> 
> Maven artifacts for Flink CDC can be found at:
> https://search.maven.org/search?q=g:org.apache.flink%20cdc
> 
> The full release notes are available in Jira:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12354387
> 
> We would like to thank all contributors of the Apache Flink community
> who made this release possible!
> 
> Regards,
> Qingsheng Ren



[ANNOUNCE] Donation of Flink CDC to Apache Flink Has Completed

2024-03-20 文章 Leonard Xu
Hi devs and users,

We are thrilled to announce that the donation of Flink CDC as a sub-project of 
Apache Flink has completed. We invite you to explore the new resources 
available:

- GitHub Repository: https://github.com/apache/flink-cdc
- Flink CDC Documentation: 
https://nightlies.apache.org/flink/flink-cdc-docs-stable

After the Flink community accepted this donation [1], we completed the software 
copyright signing, code repository migration, code cleanup, website migration, CI 
migration, GitHub issues migration, etc. 
Here I am particularly grateful to Hang Ruan, Zhongqiang Gong, Qingsheng Ren, 
Jiabao Sun, LvYanquan, loserwang1024 and other contributors for their 
contributions and help during this process!


For all previous contributors: the contribution process has changed slightly to 
align with the main Flink project. To report bugs or suggest new features, 
please open tickets in 
Apache Jira (https://issues.apache.org/jira).  Note that we will no longer 
accept GitHub issues for these purposes.


Welcome to explore the new repository and documentation. Your feedback and 
contributions are invaluable as we continue to improve Flink CDC.

Thanks everyone for your support and happy exploring Flink CDC!

Best,
Leonard
[1] https://lists.apache.org/thread/cw29fhsp99243yfo95xrkw82s5s418ob



Re: [ANNOUNCE] Apache Flink 1.19.0 released

2024-03-18 文章 Leonard Xu
Congratulations, thanks release managers and all involved for the great work!


Best,
Leonard

> 2024年3月18日 下午4:32,Jingsong Li  写道:
> 
> Congratulations!
> 
> On Mon, Mar 18, 2024 at 4:30 PM Rui Fan <1996fan...@gmail.com> wrote:
>> 
>> Congratulations, thanks for the great work!
>> 
>> Best,
>> Rui
>> 
>> On Mon, Mar 18, 2024 at 4:26 PM Lincoln Lee  wrote:
>>> 
>>> The Apache Flink community is very happy to announce the release of Apache 
>>> Flink 1.19.0, which is the first release for the Apache Flink 1.19 series.
>>> 
>>> Apache Flink® is an open-source stream processing framework for 
>>> distributed, high-performing, always-available, and accurate data streaming 
>>> applications.
>>> 
>>> The release is available for download at:
>>> https://flink.apache.org/downloads.html
>>> 
>>> Please check out the release blog post for an overview of the improvements 
>>> for this release:
>>> https://flink.apache.org/2024/03/18/announcing-the-release-of-apache-flink-1.19/
>>> 
>>> The full release notes are available in Jira:
>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353282
>>> 
>>> We would like to thank all contributors of the Apache Flink community who 
>>> made this release possible!
>>> 
>>> 
>>> Best,
>>> Yun, Jing, Martijn and Lincoln



Re: Unsubscribe

2024-02-22 文章 Leonard Xu
You can unsubscribe from the user-zh@flink.apache.org mailing list by sending an email with 
any content to user-zh-unsubscr...@flink.apache.org. For managing mailing list subscriptions, see [1].

Best,
[1] https://flink.apache.org/zh/what-is-flink/community/

> 2024年2月20日 下午4:36,任香帅  写道:
> 
> Unsubscribe



Re: [ANNOUNCE] Apache Flink 1.18.1 released

2024-01-21 文章 Leonard Xu
Thanks Jing for driving the release, nice work!

Thanks to all who were involved in this release!

Best,
Leonard

> 2024年1月20日 上午12:01,Jing Ge  写道:
> 
> The Apache Flink community is very happy to announce the release of Apache
> Flink 1.18.1, which is the first bugfix release for the Apache Flink 1.18
> series.
> 
> Apache Flink® is an open-source stream processing framework for
> distributed, high-performing, always-available, and accurate data streaming
> applications.
> 
> The release is available for download at:
> https://flink.apache.org/downloads.html
> 
> Please check out the release blog post for an overview of the improvements
> for this bugfix release:
> https://flink.apache.org/2024/01/19/apache-flink-1.18.1-release-announcement/
> 
> Please note: Users that have state compression should not migrate to 1.18.1
> (nor 1.18.0) due to a critical bug that could lead to data loss. Please
> refer to FLINK-34063 for more information.
> 
> The full release notes are available in Jira:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353640
> 
> We would like to thank all contributors of the Apache Flink community who
> made this release possible! Special thanks to @Qingsheng Ren @Leonard Xu
> @Xintong Song @Matthias Pohl @Martijn Visser for the support during this
> release.
> 
> A Jira task series based on the Flink release wiki has been created for the
> 1.18.1 release. Tasks that need to be done by the PMC have been explicitly
> created separately, which makes it convenient for the release manager to reach
> out to the PMC for those tasks. Any future patch release could consider cloning
> it and following the standard release process.
> https://issues.apache.org/jira/browse/FLINK-33824
> 
> Feel free to reach out to the release managers (or respond to this thread)
> with feedback on the release process. Our goal is to constantly improve the
> release process. Feedback on what could be improved or things that didn't
> go so well are appreciated.
> 
> Regards,
> Jing



Re: [ANNOUNCE] Apache Flink 1.17.2 released

2023-11-28 文章 Leonard Xu

Thanks Yun for driving the release.  
Thanks a lot to everyone that has contributed with bug fixes and other 
improvements!

Best,
Leonard


> 2023年11月29日 下午1:05,Yun Tang  写道:
> 
> The Apache Flink community is very happy to announce the release of Apache 
> Flink 1.17.2, which is the second bugfix release for the Apache Flink 1.17 
> series.
> 
> Apache Flink® is a framework and distributed processing engine for stateful 
> computations over unbounded and bounded data streams. Flink has been designed 
> to run in all common cluster environments, perform computations at in-memory 
> speed and at any scale.
> 
> The release is available for download at:
> https://flink.apache.org/downloads.html 
> 
> 
> Please check out the release blog post for an overview of the improvements 
> for this bugfix release: 
> https://flink.apache.org/2023/11/29/apache-flink-1.17.2-release-announcement/ 
> 
> 
> The full release notes are available in Jira:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353260
>  
> 
> 
> We would like to thank all contributors of the Apache Flink community who 
> made this release possible!
> 
> 
> Feel free to reach out to the release managers (or respond to this thread) 
> with feedback on the release process. Our goal is to constantly improve the 
> release process. Feedback on what could be improved or things that didn't go 
> so well are appreciated.
> 
> 
> Regards,
> Release Manager



Re: [ANNOUNCE] Apache Flink 1.18.0 released

2023-10-26 文章 Leonard Xu
Congratulations, Well done!

Best,
Leonard

On Fri, Oct 27, 2023 at 12:23 AM Lincoln Lee  wrote:

> Thanks for the great work! Congrats all!
>
> Best,
> Lincoln Lee
>
>
> Jing Ge  于2023年10月27日周五 00:16写道:
>
> > The Apache Flink community is very happy to announce the release of
> Apache
> > Flink 1.18.0, which is the first release for the Apache Flink 1.18
> series.
> >
> > Apache Flink® is an open-source unified stream and batch data processing
> > framework for distributed, high-performing, always-available, and
> accurate
> > data applications.
> >
> > The release is available for download at:
> > https://flink.apache.org/downloads.html
> >
> > Please check out the release blog post for an overview of the
> improvements
> > for this release:
> >
> >
> https://flink.apache.org/2023/10/24/announcing-the-release-of-apache-flink-1.18/
> >
> > The full release notes are available in Jira:
> >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12352885
> >
> > We would like to thank all contributors of the Apache Flink community who
> > made this release possible!
> >
> > Best regards,
> > Konstantin, Qingsheng, Sergey, and Jing
> >
>


Re: Question

2023-05-22 文章 Leonard Xu
(1) Check whether another job or synchronization tool is using the same server-id.
(2) You can try generating the server-id from the machine IP plus a timestamp, which makes collisions as unlikely as possible.

Best,
雪尽
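
As an illustration of the range-based alternative mentioned in the original question, here is a minimal Flink SQL sketch; all connection values, database and table names are placeholders, and the only point is that the server-id range must cover at least as many distinct ids as the source parallelism and must not overlap with any other job or tool:

```sql
-- Hypothetical example: a source running with 4 parallel subtasks needs at
-- least 4 distinct server ids, so the range 5401-5404 covers exactly 4 values.
CREATE TABLE orders_src (
  id INT,
  name STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector'     = 'mysql-cdc',
  'hostname'      = 'localhost',
  'port'          = '3306',
  'username'      = 'flinkuser',
  'password'      = '******',
  'database-name' = 'mydb',
  'table-name'    = 'orders',
  'server-id'     = '5401-5404'  -- must not overlap with other jobs or sync tools
);
```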

> On May 22, 2023, at 3:34 PM, 曹明勤  wrote:
> 
> 在我提交的flink-cdc-mysql的任务中,需要flink同步多张表的数据,但是我遇到了server-id重复的问题。我尝试过设置随机数,但是server-id有一定的取值范围,并且随机数还是有可能重复。官方文档建议我将server-id设置为一个范围,比如5400-6400,并且设置flink的并行度。这些我都做了,但是当我同步表的数量较多时,还是会出现server-id重复的问题导致任务提交失败。我需要如何设置才能如何避免这种错误?
> 
> 
> 
> 
> In the flink-cdc-mysql task I submitted, Flink was required to synchronize 
> data of multiple tables, but I encountered the problem of server-id 
> duplication. I tried to set a random number, but server-id has a range of 
> values, and random numbers can be repeated. The official documentation 
> advised me to set server-id to a range, such as 5400-6400, and set flink's 
> parallelism. I did all of this, but when I synchronized a large number of 
> tables, I still had the problem of server-id duplication, which caused the 
> task submission to fail. What do I need to set up to avoid this error?



Re: checkpoint Kafka Offset commit failed

2023-05-04 文章 Leonard Xu
You can unsubscribe from the user-zh@flink.apache.org mailing list by sending an email with 
any content to user-zh-unsubscr...@flink.apache.org. For managing mailing list subscriptions, see [1].

Best,
Leonard
[1] https://flink.apache.org/zh/community/#%e9%82%ae%e4%bb%b6%e5%88%97%e8%a1%a8

> 2023年5月4日 下午9:00,wuzhongxiu  写道:
> 
> Unsubscribe
> 
> 
> 
> | |
> go574...@163.com
> |
> |
> 邮箱:go574...@163.com
> |
> 
> 
> 
> 
>  回复的原邮件 
> | 发件人 | zhan...@eastcom-sw.com |
> | 日期 | 2023年05月04日 14:54 |
> | 收件人 | user-zh |
> | 抄送至 | |
> | 主题 | checkpoint Kafka Offset commit failed |
> Hi, in Flink (1.14, 1.16) with a 10s checkpoint interval, committing the Kafka offsets reports "The coordinator is not 
> available".  
> 
> The Kafka cluster logs all look normal, offsets can be committed manually, and after restarting the Flink job commits work again for a while before failing once more. Are there any parameters that could be tuned?
> 
> flink 日志如下:
> 2023-05-04 11:31:02,636 WARN  
> org.apache.flink.connector.kafka.source.reader.KafkaSourceReader [] - Failed 
> to commit consumer offsets for checkpoint 69153
> org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset 
> commit failed with a retriable exception. You should retry committing the 
> latest consumed offsets.
> Caused by: org.apache.kafka.common.errors.CoordinatorNotAvailableException: 
> The coordinator is not available.



Re: Unsubscribe

2023-05-04 文章 Leonard Xu


If you need to unsubscribe from the user-zh@flink.apache.org mailing list, please send an email with 
any content to user-zh-unsubscr...@flink.apache.org; see [1]

[1] https://flink.apache.org/zh/community/

> 2023年4月21日 上午10:52,琴师 <1129656...@qq.com.INVALID> 写道:
> 
> Unsubscribe
> 
> 
> 琴师
> 1129656...@qq.com
> 
> 
> 
> 



Re: [ANNOUNCE] Flink Table Store Joins Apache Incubator as Apache Paimon(incubating)

2023-03-27 文章 Leonard Xu
Congratulations!


Best,
Leonard

> On Mar 27, 2023, at 5:23 PM, Yu Li  wrote:
> 
> Dear Flinkers,
> 
> As you may have noticed, we are pleased to announce that Flink Table Store 
> has joined the Apache Incubator as a separate project called Apache 
> Paimon(incubating) [1] [2] [3]. The new project still aims at building a 
> streaming data lake platform for high-speed data ingestion, change data 
> tracking and efficient real-time analytics, with the vision of supporting a 
> larger ecosystem and establishing a vibrant and neutral open source community.
> 
> We would like to thank everyone for their great support and efforts for the 
> Flink Table Store project, and warmly welcome everyone to join the 
> development and activities of the new project. Apache Flink will continue to 
> be one of the first-class citizens supported by Paimon, and we believe that 
> the Flink and Paimon communities will maintain close cooperation.
> 
> 亲爱的Flinkers,
> 
> 正如您可能已经注意到的,我们很高兴地宣布,Flink Table Store 已经正式加入 Apache 孵化器独立孵化 [1] [2] 
> [3]。新项目的名字是 Apache 
> Paimon(incubating),仍致力于打造一个支持高速数据摄入、流式数据订阅和高效实时分析的新一代流式湖仓平台。此外,新项目将支持更加丰富的生态,并建立一个充满活力和中立的开源社区。
> 
> 在这里我们要感谢大家对 Flink Table Store 项目的大力支持和投入,并热烈欢迎大家加入新项目的开发和社区活动。Apache Flink 
> 将继续作为 Paimon 支持的主力计算引擎之一,我们也相信 Flink 和 Paimon 社区将继续保持密切合作。
> 
> Best Regards,
> Yu (on behalf of the Apache Flink PMC and Apache Paimon PPMC)
> 
> 致礼,
> 李钰(谨代表 Apache Flink PMC 和 Apache Paimon PPMC)
> 
> [1] https://paimon.apache.org/ 
> [2] https://github.com/apache/incubator-paimon 
> 
> [3] https://cwiki.apache.org/confluence/display/INCUBATOR/PaimonProposal 
> 


Re: Configuration parsing exception at startup after adding flink-sql-connector-oracle-cdc-2.3.0.jar to a project

2023-03-25 文章 Leonard Xu
The flink-sql-connector-xx artifacts are all uber jars and should not be pulled into a project directly. In your project you should 
depend on flink-connector-xx and manage the transitive dependencies yourself.


Best,
Leonard

> On Mar 25, 2023, at 3:25 PM, casel.chen  wrote:
> 
> After adding flink-sql-connector-oracle-cdc-2.3.0.jar to the project, the following exception is thrown during startup. 
> I found that this jar contains the class oracle.xml.jaxp.JXDocumentBuilderFactory. Is there any way to solve this?
> 
> 
> ERROR StatusLogger Caught javax.xml.parsers.ParserConfigurationException 
> setting feature http://xml.org/sax/features/external-general-entities to 
> false on DocumentBuilderFactory 
> oracle.xml.jaxp.JXDocumentBuilderFactory@68dc098b: 
> javax.xml.parsers.ParserConfigurationException
> javax.xml.parsers.ParserConfigurationException
> at 
> oracle.xml.jaxp.JXDocumentBuilderFactory.setFeature(JXDocumentBuilderFactory.java:374)
> at 
> org.apache.logging.log4j.core.config.xml.XmlConfiguration.setFeature(XmlConfiguration.java:204)
> at 
> org.apache.logging.log4j.core.config.xml.XmlConfiguration.disableDtdProcessing(XmlConfiguration.java:197)
> at 
> org.apache.logging.log4j.core.config.xml.XmlConfiguration.newDocumentBuilder(XmlConfiguration.java:186)
> at 
> org.apache.logging.log4j.core.config.xml.XmlConfiguration.<init>(XmlConfiguration.java:89)
> at 
> org.apache.logging.log4j.core.config.xml.XmlConfigurationFactory.getConfiguration(XmlConfigurationFactory.java:46)
> at 
> org.apache.logging.log4j.core.config.ConfigurationFactory$Factory.getConfiguration(ConfigurationFactory.java:558)
> at 
> org.apache.logging.log4j.core.config.ConfigurationFactory$Factory.getConfiguration(ConfigurationFactory.java:482)
> at 
> org.apache.logging.log4j.core.config.ConfigurationFactory.getConfiguration(ConfigurationFactory.java:322)
> at 
> org.apache.logging.log4j.core.LoggerContext.reconfigure(LoggerContext.java:695)
> 



[ANNOUNCE] Apache Flink 1.17.0 released

2023-03-23 文章 Leonard Xu
The Apache Flink community is very happy to announce the release of Apache 
Flink 1.17.0, which is the first release for the Apache Flink 1.17 series.

Apache Flink® is an open-source unified stream and batch data processing 
framework for distributed, high-performing, always-available, and accurate data 
applications.

The release is available for download at:
https://flink.apache.org/downloads.html

Please check out the release blog post for an overview of the improvements for 
this release:
https://flink.apache.org/2023/03/23/announcing-the-release-of-apache-flink-1.17/

The full release notes are available in Jira:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351585
 

We would like to thank all contributors of the Apache Flink community who made 
this release possible!

Best regards,
Qingsheng, Martijn, Matthias and Leonard

Re: [ANNOUNCE] FRocksDB 6.20.3-ververica-2.0 released

2023-01-31 文章 Leonard Xu
Thanks Yanfei for driving the release!


Best,
Leonard

> On Jan 31, 2023, at 3:43 PM, Yun Tang  wrote:
> 
> Thanks Yuanfei for driving the frocksdb release!
> 
> Best
> Yun Tang
> From: Yuan Mei 
> Sent: Tuesday, January 31, 2023 15:09
> To: Jing Ge 
> Cc: Yanfei Lei ; d...@flink.apache.org 
> ; user ; 
> user-zh@flink.apache.org 
> Subject: Re: [ANNOUNCE] FRocksDB 6.20.3-ververica-2.0 released
>  
> Thanks Yanfei for driving the release!
> 
> Best
> Yuan
> 
> On Mon, Jan 30, 2023 at 8:46 PM Jing Ge via user  > wrote:
> Hi Yanfei,
> 
> Thanks for your effort. Looking forward to checking it.
> 
> Best regards,
> Jing
> 
> On Mon, Jan 30, 2023 at 1:42 PM Yanfei Lei  > wrote:
> We are very happy to announce the release of FRocksDB 6.20.3-ververica-2.0.
> 
> Compiled files for Linux x86, Linux arm, Linux ppc64le, MacOS x86,
> MacOS arm, and Windows are included in FRocksDB 6.20.3-ververica-2.0
> jar, and the FRocksDB in Flink 1.17 would be updated to
> 6.20.3-ververica-2.0.
> 
> Release highlights:
> - [FLINK-30457] Add periodic_compaction_seconds option to RocksJava[1].
> - [FLINK-30321] Upgrade ZLIB of FRocksDB to 1.2.13[2].
> - Avoid expensive ToString() call when not in debug[3].
> - [FLINK-24932] Support build FRocksDB Java on Apple silicon[4].
> 
> Maven artifacts for FRocksDB can be found at:
> https://mvnrepository.com/artifact/com.ververica/frocksdbjni 
> 
> 
> We would like to thank all efforts from the Apache Flink community
> that made this release possible!
> 
> [1] https://issues.apache.org/jira/browse/FLINK-30457 
> 
> [2] https://issues.apache.org/jira/browse/FLINK-30321 
> 
> [3] https://github.com/ververica/frocksdb/pull/55 
> 
> [4] https://issues.apache.org/jira/browse/FLINK-24932 
> 
> 
> Best regards,
> Yanfei
> Ververica(Alibaba)



Re: Flink CDC can read MySQL change data, but cannot insert it into the new MySQL table

2022-11-29 文章 Leonard Xu


> On Nov 4, 2022, at 2:34 PM, 左岩 <13520871...@163.com> wrote:
> 
> tenv.executeSql("xxx");
> env.execute();


Using the two together like this is not correct; please take a look at the Javadoc of these two methods.

Best,
Leonard

Re: Aggregating the latest records from a CDC source with Flink SQL

2022-11-29 文章 Leonard Xu


> On Nov 29, 2022, at 8:32 AM, casel.chen  wrote:
> 
> The business requirement is to compute, per day and per supplier, the real-time transaction amount from a MySQL orders table; rows in the orders table can be updated and deleted. How should this be implemented with Flink 
> SQL? Open a window, take the latest record per key, and then aggregate? If a delete record arrives, will the corresponding price be subtracted? I wrote the following Flink SQL but am not sure whether it is correct.


Yes, it will. You can read articles on how Flink SQL handles changelogs; a Baidu/Google search turns up plenty.

Best,
Leonard


> 
> 
> select 
>  s.biddate, 
>  s.supplier, 
>  sum(s.price) 
> from 
>  (
>select 
>  * 
>from 
>  (
>select 
>  biddate, 
>  supplier, 
>  price, 
>  ROW_NUMBER() OVER (
>PARTITION BY biddate, 
>supplier 
>ORDER BY 
>  bidtime DESC
>  ) as rownum 
>from 
>  (
>select 
>  bidtime, 
>  date_format(bidtime, '-MM-dd-HH') as biddate, 
>  supplier, 
>  price 
>from 
>  orders
>  )
>  ) as t 
>where 
>  t.rownum = 1
>  ) as s 
> group by 
>  s.biddate, 
>  s.supplier
> ;
> 



Re: Time zone problem with the timestamp type in debezium-json data

2022-11-24 文章 Leonard Xu
Is the type of the column in your Oracle database TIMESTAMP or TIMESTAMP WITH LOCAL TIME ZONE? 
I suspect it is the latter; if so, simply map it to the TIMESTAMP_LTZ type in Flink SQL.
Oracle's TIMESTAMP WITH LOCAL TIME ZONE and Flink SQL's TIMESTAMP_LTZ have the same semantics and storage, namely epoch 
millis stored without a time zone. In both systems, when you want to view such data, the session time zone is used to convert the epoch millis 
into a readable timestamp string.

The command to set the session time zone in Oracle is:
ALTER SESSION SET TIME_ZONE='Asia/Shanghai';

The command to set the session time zone in Flink SQL is:
Flink SQL> SET 'table.local-time-zone' = 'Asia/Shanghai';

Best,
Leonard
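
A minimal sketch of this direction, assuming the Debezium value arrives as epoch millis; the table, topic and column names are made up, and the connector/format options are placeholders only:

```sql
-- Hypothetical table: the raw Debezium value is epoch millis (BIGINT);
-- TO_TIMESTAMP_LTZ turns it into TIMESTAMP_LTZ, so no manual +/- 8h shift is needed.
CREATE TABLE ods_orders (
  order_id       BIGINT,
  update_time_ms BIGINT,
  update_time AS TO_TIMESTAMP_LTZ(update_time_ms, 3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'oracle_orders',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'json'
);

-- Display is rendered in the session time zone:
SET 'table.local-time-zone' = 'Asia/Shanghai';
SELECT order_id, update_time FROM ods_orders;
```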


> On Nov 22, 2022, at 4:32 PM, Kyle Zhang  wrote:
> 
> Hi all,
>我们有一个场景,是把oracle数据通过debezium-oracle-cdc插件抽到kafka中,后面接flink
> sql分析,现在遇到一个时区的问题,比如数据库中有一个timestamp类型的字段,值是‘2022-11-17
> 16:16:44’,但是debezium处理的时候用了int64保存,还不带时区信息,变成1668701804000,导致flink
> sql中用FROM_UNIXTIME处理后变成‘2022-11-18 00:16:44
> ’,差了8小时,需要手工再减8h。请问有没有一种统一的方式处理这种情况?
> 
> Best



Re: [ANNOUNCE] Apache Flink Elasticsearch Connector 3.0.0 released

2022-11-10 文章 Leonard Xu
Thanks Chesnay and Martijn for the great work!   I believe the 
flink-connector-shared-utils[1] you built will help Flink connector developers 
a lot.


Best,
Leonard
[1] https://github.com/apache/flink-connector-shared-utils

> 2022年11月10日 下午9:53,Martijn Visser  写道:
> 
> Really happy with the first externalized connector for Flink. Thanks a lot to 
> all of you involved!
> 
> On Thu, Nov 10, 2022 at 12:51 PM Chesnay Schepler  > wrote:
> The Apache Flink community is very happy to announce the release of 
> Apache Flink Elasticsearch Connector 3.0.0.
> 
> Apache Flink® is an open-source stream processing framework for 
> distributed, high-performing, always-available, and accurate data 
> streaming applications.
> 
> The release is available for download at:
> https://flink.apache.org/downloads.html 
> 
> 
> This release marks the first time we have released a connector 
> separately from the main Flink release.
> Over time more connectors will be migrated to this release model.
> 
> This release is equivalent to the connector version released alongside 
> Flink 1.16.0 and acts as a drop-in replacement.
> 
> The full release notes are available in Jira:
> https://issues.apache.org/jira/projects/FLINK/versions/12352291 
> 
> 
> We would like to thank all contributors of the Apache Flink community 
> who made this release possible!
> 
> Regards,
> Chesnay



Re: Flink CDC cannot read data from MySQL

2022-11-02 文章 Leonard Xu
The Flink CDC community does provide a release that supports Flink 1.14: version 2.2.1. Your problem looks like checkpointing is simply not enabled; enable it and it should work:
// enable checkpoint
env.enableCheckpointing(1000);


Best,
Leonard
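
For SQL-only jobs, the same thing can be done with a configuration option instead of Java code. A hedged sketch, assuming a Flink version where configuration can be set from SQL; the interval value is arbitrary:

```sql
-- Enable periodic checkpoints for a SQL job (e.g. in the SQL client or an init file);
-- mysql-cdc relies on checkpointing to commit progress reliably in streaming mode.
SET 'execution.checkpointing.interval' = '10s';
```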

> 2022年11月3日 上午11:34,左岩 <13520871...@163.com> 写道:
> 
> I am using Flink 1.14. Since there was no officially matching release, I compiled Flink CDC myself. The binlog is enabled 
> and there are no errors, but no MySQL data is read; the IDEA console shows neither errors nor output. What could be the reason? (run log attached)
> public static void main(String[] args) throws Exception {
> Configuration conf = new Configuration();
> conf.setInteger("rest.port", 10041);
> StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.getExecutionEnvironment(conf);
> StreamTableEnvironment tenv = StreamTableEnvironment.create(env);
> 
> env.enableCheckpointing(1000, CheckpointingMode.EXACTLY_ONCE);
> 
> env.getCheckpointConfig().setCheckpointStorage("file:///d:/checkpoint3");
> 
> //StreamTableEnvironment tenv = StreamTableEnvironment.create(env);
> 
> env.setParallelism(4);
> 
> // 建表
> tenv.executeSql("CREATE TABLE flink_t_stu ( " +
> "  userid INT, " +
> "  username string, " +
> "  age string, " +
> "  `partition` INT, " +
> " PRIMARY KEY(userid) NOT ENFORCED " +
> " ) WITH ( " +
> " 'connector' = 'mysql-cdc', " +
> " 'server-id' = '5401-5404', " +
> " 'hostname' = '192.168.0.220', " +
> " 'port' = '3306', " +
> " 'username' = 'root', " +
> " 'password' = 'root', " +
> " 'database-name' = 'zy', " +
> " 'table-name' = 't_stu' " +
> ")");
> 
> // 查询
> tenv.executeSql("select * from flink_t_stu").print();
> 
> env.execute();
> 
> }
> 



Re: Building wide tables with Flink CDC

2022-11-02 文章 Leonard Xu
Yes. If you build the wide table with a dual-stream join and the TTL is set too short, the historical data in state gets cleaned up; later update records arriving at the join node then fail to find a match and may emit NULLs downstream.

Best,
Leonard
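
A hedged sketch of the knob involved, assuming the wide table is built with a regular dual-stream join; the retention value is only an example and should be chosen from how late updates can arrive in the business:

```sql
-- Keep join state long enough that late-arriving updates still find their
-- matching rows; '0' (the default) means state is never expired.
SET 'table.exec.state.ttl' = '3 d';
```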

> On Nov 2, 2022, at 11:49 AM, Fei Han wrote:
> 
> Hi all! I have the following question about building wide tables with Flink CDC:
> After a job starts, a field has values at first, but after running for a while or crossing a day boundary, the field becomes NULL for no apparent reason, while other engines produce correct results on the same data.
> For example, on the first day after starting the job, field A has values, but on the second day field A is entirely NULL, while a Presto query returns correct results. I suspect this is related to the TTL setting? I set it to 1 day.



Re: Setting a server-id range with Flink CDC 2.2.1

2022-10-31 文章 Leonard Xu

> On Oct 31, 2022, at 4:57 PM, 林影 wrote:
> 
> Hi, Leonard.
> 
> I have a similar question.
> 
> An online Flink application previously configured the server-id as
> 6416-6418 with a parallelism of 3; later, when scaling in, the parallelism was changed to 2. Does the server-id range need to be adjusted in this scenario?

Scaling in does not require it; in your case only the two ids 6416 and 6417 will be used. Only scaling out needs attention, and if you scale out without enlarging the range, an error is currently reported as a reminder.

Best,
Leonard




> 
> casel.chen  于2022年10月31日周一 16:50写道:
> 
>> 
>> 
>> 
>> 
>> server-id配置范围对于后面修改并发度是不是不太友好?每改一次并发度就得重新调整server-id范围么?还是说先配置一个较大的server-id范围,在在这个较大的范围内调整并发度?
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 在 2022-10-31 16:04:32,"Leonard Xu"  写道:
>>> Hi,
>>> 
>> 
>>> 你5张表对应的source并发是多少呀?如果是多并发需要把server-id设置成一个范围,范围和并发数匹配,比如4个并发,应该类似’1101-1104’.
>>> 另外 server-id 是全局唯一的,你需要确保下你使用的server-id 和其他作业、其他同步工具都不冲突才可以。
>>> 
>>> 
>>> Best,
>>> Leonard
>>> 
>>> 
>>>> 2022年10月31日 下午4:00,Fei Han  写道:
>>>> 
>>>> 大家好!
>>>> 现在我在 Flink CDC2.2.1设置了server id。有5张表且server id的范围都不同,通过Flink CDC
>> 打宽表。但是在任务跑一段时间后,还是出现如下报错:
>>>> Caused by: com.github.shyiko.mysql.binlog.network.ServerException: A
>> slave with the same server_uuid/server_id as this slave has connected to
>> the master;
>>>> 请教下各位,还有什么解决方案没有
>>> 
>> 



Re: Setting a server-id range with Flink CDC 2.2.1

2022-10-31 文章 Leonard Xu

> Isn't configuring a server-id range unfriendly to later parallelism changes? Do I have to re-adjust the server-id range every time I change the parallelism? Or should I configure a larger server-id range up front and adjust the parallelism within that larger range?

Once the job is running, changing the parallelism does require adjusting the range. I suggest handling this in your platform layer, so that users writing SQL understand what the parameters in the WITH clause actually do.

Best,
Leonard


> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 在 2022-10-31 16:04:32,"Leonard Xu"  写道:
>> Hi, 
>> 
>> 你5张表对应的source并发是多少呀?如果是多并发需要把server-id设置成一个范围,范围和并发数匹配,比如4个并发,应该类似’1101-1104’.
>> 另外 server-id 是全局唯一的,你需要确保下你使用的server-id 和其他作业、其他同步工具都不冲突才可以。
>> 
>> 
>> Best,
>> Leonard
>> 
>> 
>>> 2022年10月31日 下午4:00,Fei Han  写道:
>>> 
>>> 大家好!
>>> 现在我在 Flink CDC2.2.1设置了server id。有5张表且server id的范围都不同,通过Flink CDC 
>>> 打宽表。但是在任务跑一段时间后,还是出现如下报错:
>>> Caused by: com.github.shyiko.mysql.binlog.network.ServerException: A slave 
>>> with the same server_uuid/server_id as this slave has connected to the 
>>> master;
>>> 请教下各位,还有什么解决方案没有
>> 



Re: Now that sql-clients-default.yaml is gone from the Flink SQL client, where should the predefined catalogs be defined?

2022-10-31 文章 Leonard Xu
Hi,

I recall there is a -i option for specifying an initialization SQL file; put your initialization SQL statements in that file and they will be executed at startup.

Best,
Leonard
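
A hedged sketch of what such an initialization file might contain; the catalog name and paths are placeholders, and the file would be passed as `sql-client.sh -i init.sql`:

```sql
-- init.sql: executed by the SQL client at startup via the -i option.
CREATE CATALOG my_hive WITH (
  'type' = 'hive',
  'hive-conf-dir' = '/opt/hive/conf'
);

USE CATALOG my_hive;

-- Session-level options can also live here.
SET 'execution.runtime-mode' = 'streaming';
```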




> On Oct 31, 2022, at 4:52 PM, casel.chen wrote:
> 
> The sql-clients-default.yaml file no longer exists in newer Flink versions. Where should the previously configured predefined catalogs be defined now? Via an initialization SQL file?



Re: Setting a server-id range with Flink CDC 2.2.1

2022-10-31 文章 Leonard Xu
Hi, 

What is the source parallelism for your 5 tables? With multiple parallel subtasks, server-id must be set to a range whose size matches the parallelism; for example, with 4 parallel subtasks it should look like '1101-1104'.
Also, server-id must be globally unique: make sure the server-ids you use do not conflict with other jobs or other synchronization tools.


Best,
Leonard


> On Oct 31, 2022, at 4:00 PM, Fei Han wrote:
> 
> Hi all!
> I have set server ids in Flink CDC 2.2.1. There are 5 tables, each with a different server-id range, and I build wide tables with Flink CDC. 
> But after the job runs for a while, the following error still occurs:
> Caused by: com.github.shyiko.mysql.binlog.network.ServerException: A slave 
> with the same server_uuid/server_id as this slave has connected to the master;
> Could you advise whether there are any other solutions?



Re: A question about Flink SQL

2022-10-22 文章 Leonard Xu
Hello, which Flink version are you using? I recall this problem only existed in older versions.
Also, could you paste the SQL?

Best,
Leonard



> On Oct 22, 2022, at 11:11 AM, 邮件帮助中心 wrote:
> 
> Hi all!
>    Recently, while developing a project, we did a temporal table 
> join between a CDC table and a dimension table and found that the join key types of the two tables must be identical; otherwise submission fails with the following error:
>    The main method caused an error: Temporal table join requires an equality 
> condition on fields of table.
>    To work around this we tried the following:
> 1: Casting the join key of the dimension table in the join condition, e.g. JOIN ON CAST(tableA.filedA  AS INT) = 
> cdc_table_b.fieldB, so that the join key types of the two tables match.
> 2: Creating a view over the dimension table, defining/casting the column types in the view, and joining the CDC table with the view; 
> in theory the join key types are then identical.
>    Unfortunately neither workaround solved the problem; both report the same error (The main method caused an error: Temporal 
> table join requires an equality condition on fields of 
> table). If the dimension table's join column is declared in the DDL with the same type as the CDC table's join column, submission succeeds, but a CastException occurs at runtime while processing data. How can this problem be solved? Many thanks!



Re: Some questions about Versioned Tables after reading the official docs

2022-08-08 文章 Leonard Xu



> On Aug 8, 2022, at 3:34 PM, 林影 wrote:
> 
> Link first: Versioned Table
> 
> From the documentation, a source using Upsert-Kafka with the debezium or canal format can be considered a Versioned
> Table Source.
> 
> 1. Can the connectors provided by flink cdc also be regarded as a kind of Versioned Table Source?
Yes; once a primary key and a watermark are defined on the CDC stream, it can serve as a versioned table.


> 2. When a Versioned Table is converted into a DataStream, is the result necessarily a retract stream?
Yes; every CDC stream (i.e. changelog stream) converted from the SQL API to the DataStream API is a retract stream.

> 3. Can every Versioned Table be sent to a sink with retraction capability (such as MySQL/ES/Hudi)?


Yes; as long as the sink supports retraction, it can consume a changelog stream.


Best,
Leonard
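
A minimal sketch of a versioned table defined on a changelog source; the table name, topic and columns are assumptions, and the point is that the primary key plus the watermark is what makes it usable in an event-time temporal join:

```sql
CREATE TABLE product_rates (
  product_id  STRING,
  rate        DECIMAL(10, 2),
  update_time TIMESTAMP(3),
  WATERMARK FOR update_time AS update_time - INTERVAL '5' SECOND,
  PRIMARY KEY (product_id) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'rates',
  'properties.bootstrap.servers' = 'localhost:9092',
  'key.format' = 'json',
  'value.format' = 'json'
);
```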



Re: flink table store

2022-04-07 文章 Leonard Xu

The project is open source [1], and the first release is coming soon; keep an eye on it.

Best,
Leonard
[1] https://github.com/apache/flink-table-store 




> 2022年4月7日 上午9:54,Xianxun Ye  写道:
> 
> Here is the design document for flink table store; you can read it to learn more.
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-188%3A+Introduce+Built-in+Dynamic+Table+Storage
> 
> 
> Best regards,
> 
> 
> Xianxun
> 
> 
> On 04/6/2022 16:56,LuNing Wang wrote:
> Hi,
> 
> Table Store is storage; it should be similar in spirit to a data lake.
> 
> Best,
> LuNing Wang
> 
> yidan zhao  于2022年4月6日周三 16:55写道:
> 
> I saw flink table store appear on the website; how does it differ from the flink table / flink sql stack?
> 



Re: [ANNOUNCE] Apache Flink 1.14.4 released

2022-03-15 文章 Leonard Xu
Thanks a lot for being our release manager, Konstantin, and thanks to everyone who was 
involved! 

Best,
Leonard

> 2022年3月15日 下午9:34,Martijn Visser  写道:
> 
> Thank you Konstantin and everyone who contributed! 



Re: flinkcdc:slave with the same server_uuid/server_id as this slave has connected to the master;

2022-03-14 文章 Leonard Xu
Please see the FAQ document [1]

Best,
Leonard

[1] 
https://github.com/ververica/flink-cdc-connectors/wiki/FAQ(ZH)#q10-%E4%BD%9C%E4%B8%9A%E6%8A%A5%E9%94%99-connectexception-a-slave-with-the-same-server_uuidserver_id-as-this-slave-has-connected-to-the-master%E6%80%8E%E4%B9%88%E5%8A%9E%E5%91%A2

> On Mar 14, 2022, at 7:05 PM, maker_d...@foxmail.com wrote:
> 
> A month has passed and I have run into this problem again; can anyone help solve it now?
> 
> 
> 
> maker_d...@foxmail.com
> 
> 发件人: maker_d...@foxmail.com
> 发送时间: 2022-02-15 14:13
> 收件人: user-zh@flink.apache.org
> 主题: flinkcdc:slave with the same server_uuid/server_id as this slave has 
> connected to the master;
> flink version:flink-1.13.5
> cdc version:2.1.1
> 
> While using Flink CDC to synchronize multiple tables, I hit the following error:
> org.apache.flink.runtime.JobException: Recovery is suppressed by 
> FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=3, 
> backoffTimeMS=1)
> at 
> org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:138)
> at 
> org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:82)
> at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:216)
> at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:206)
> at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:197)
> at 
> org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:682)
> at 
> org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:79)
> at 
> org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:435)
> at sun.reflect.GeneratedMethodAccessor135.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:305)
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:212)
> at 
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77)
> at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158)
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
> at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
> at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
> at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
> at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
> at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
> at akka.actor.Actor.aroundReceive(Actor.scala:517)
> at akka.actor.Actor.aroundReceive$(Actor.scala:515)
> at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
> at akka.actor.ActorCell.invoke(ActorCell.scala:561)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
> at akka.dispatch.Mailbox.run(Mailbox.scala:225)
> at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
> at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at 
> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at 
> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: java.lang.RuntimeException: One or more fetchers have encountered 
> exception
> at 
> org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManager.checkErrors(SplitFetcherManager.java:223)
> at 
> org.apache.flink.connector.base.source.reader.SourceReaderBase.getNextFetch(SourceReaderBase.java:154)
> at 
> org.apache.flink.connector.base.source.reader.SourceReaderBase.pollNext(SourceReaderBase.java:116)
> at 
> org.apache.flink.streaming.api.operators.SourceOperator.emitNext(SourceOperator.java:294)
> at 
> org.apache.flink.streaming.runtime.io.StreamTaskSourceInput.emitNext(StreamTaskSourceInput.java:69)
> at 
> org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:66)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:423)
> at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:684)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:639)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:650)
> at 
> 

Re: How to combine Flink with historical data

2021-12-06 文章 Leonard Xu
The MySQL CDC connector supports parallel reading and takes no locks while reading. 6 million rows is a small data volume; we and community users have tested sharded databases/tables at the scale of tens of billions of rows without problems. You can simply try it yourself.

Best,
Leonard
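
A hedged sketch of the hybrid read described above; the connection values, database and table names are placeholders. With the default 'initial' startup mode the connector first takes a lock-free parallel snapshot of the existing rows and then switches to the binlog without gaps:

```sql
CREATE TABLE orders (
  order_id BIGINT,
  amount   DECIMAL(10, 2),
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector'     = 'mysql-cdc',
  'hostname'      = 'localhost',
  'port'          = '3306',
  'username'      = 'flinkuser',
  'password'      = '******',
  'database-name' = 'shop',
  'table-name'    = 'orders',
  'scan.startup.mode' = 'initial'  -- snapshot first, then binlog (the default)
);
```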


> On Dec 6, 2021, at 3:54 PM, 张阳 wrote:
> 
> Because there are about 6 million rows, I am worried the initial load will take too long or cause performance problems.
> 
> 
> 
> 
> --原始邮件--
> 发件人:  
>   "user-zh"   
>  
>  发送时间:2021年12月6日(星期一) 下午2:38
> 收件人:"user-zh" 
> 主题:Re: flink结合历史数据怎么处理
> 
> 
> 
> 如果你的数据源是 数据库,你可以尝试下 Flink CDC Connectors[1], 这些Connector 就是 hybrid source, 
> 先读历史全量数据,再读增量数据,
> 历史和增量阶段是无缝衔接的。
> 
> 祝好,
> Leonard 
> [1] 
> https://ververica.github.io/flink-cdc-connectors/release-2.1/content/connectors/mysql-cdc.html
> 
> 
>  2021年12月2日 下午2:40,张阳   
>  统计的指标有大量的历史数据,怎么把历史的数据和今天的实时数据进行汇总呢。



Re: How to combine Flink with historical data

2021-12-05 文章 Leonard Xu
If your data source is a database, you can try the Flink CDC Connectors [1]. These connectors are hybrid sources: they first read the full historical data and then the incremental changes,
with a seamless hand-over between the historical and incremental phases.

Best,
Leonard 
[1] 
https://ververica.github.io/flink-cdc-connectors/release-2.1/content/connectors/mysql-cdc.html


> On Dec 2, 2021, at 2:40 PM, 张阳 wrote:
> 
> The metrics we compute involve a large amount of historical data; how do we aggregate the historical data together with today's real-time data?



Re: Unsubscribe

2021-11-23 文章 Leonard Xu
Hello, to unsubscribe, send an email with any content to user-zh-unsubscr...@flink.apache.org; 
see 
https://flink.apache.org/zh/community.html#section 


Best

> 在 2021年11月24日,14:33,Gauler Tan  写道:
> 
> Hello, I have already sent unsubscribe requests many times; why am I still constantly receiving emails?
> 
> Thanks



Re: The Flink SQL ES7 connector cannot be used

2021-11-22 文章 Leonard Xu
This is a dependency problem. Check whether your environment uses only the SQL connector jar, i.e. flink-sql-connector-elasticsearch7; 
if it is not a DataStream job, the flink-connector-elasticsearch7 jar is not needed. If that is not the issue, analyze the ES-related 
dependencies your job uses: locate the class from the exception stack, find the jar it comes from, and check whether some unnecessary jars were added.

Best,
Leonard
 

> On Nov 22, 2021, at 12:30, mispower wrote:
> 
> Hello, may I ask how this problem was eventually solved?
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> At 2021-06-10 10:15:58, "mokaful" <649713...@qq.com> wrote:
>> org.apache.flink.streaming.runtime.tasks.StreamTaskException: Cannot
>> instantiate user function.
>>  at
>> org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperatorFactory(StreamConfig.java:338)
>> ~[flink-dist_2.11-1.13.1.jar:1.13.1]
>>  at
>> org.apache.flink.streaming.runtime.tasks.OperatorChain.createOperator(OperatorChain.java:653)
>> ~[flink-dist_2.11-1.13.1.jar:1.13.1]
>>  at
>> org.apache.flink.streaming.runtime.tasks.OperatorChain.createOperatorChain(OperatorChain.java:626)
>> ~[flink-dist_2.11-1.13.1.jar:1.13.1]
>>  at
>> org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:566)
>> ~[flink-dist_2.11-1.13.1.jar:1.13.1]
>>  at
>> org.apache.flink.streaming.runtime.tasks.OperatorChain.createOperatorChain(OperatorChain.java:616)
>> ~[flink-dist_2.11-1.13.1.jar:1.13.1]
>>  at
>> org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:566)
>> ~[flink-dist_2.11-1.13.1.jar:1.13.1]
>>  at
>> org.apache.flink.streaming.runtime.tasks.OperatorChain.createOperatorChain(OperatorChain.java:616)
>> ~[flink-dist_2.11-1.13.1.jar:1.13.1]
>>  at
>> org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:566)
>> ~[flink-dist_2.11-1.13.1.jar:1.13.1]
>>  at
>> org.apache.flink.streaming.runtime.tasks.OperatorChain.createOperatorChain(OperatorChain.java:616)
>> ~[flink-dist_2.11-1.13.1.jar:1.13.1]
>>  at
>> org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:566)
>> ~[flink-dist_2.11-1.13.1.jar:1.13.1]
>>  at
>> org.apache.flink.streaming.runtime.tasks.OperatorChain.<init>(OperatorChain.java:181)
>> ~[flink-dist_2.11-1.13.1.jar:1.13.1]
>>  at
>> org.apache.flink.streaming.runtime.tasks.StreamTask.executeRestore(StreamTask.java:548)
>> ~[flink-dist_2.11-1.13.1.jar:1.13.1]
>>  at
>> org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:647)
>> ~[flink-dist_2.11-1.13.1.jar:1.13.1]
>>  at
>> org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:537)
>> ~[flink-dist_2.11-1.13.1.jar:1.13.1]
>>  at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:759)
>> ~[flink-dist_2.11-1.13.1.jar:1.13.1]
>>  at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
>> ~[flink-dist_2.11-1.13.1.jar:1.13.1]
>>  at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_181]
>> Caused by: java.io.InvalidClassException:
>> org.apache.flink.streaming.connectors.elasticsearch.table.Elasticsearch7DynamicSink$AuthRestClientFactory;
>> local class incompatible: stream classdesc serialVersionUID =
>> -2564582543942331131, local class serialVersionUID = -2353232579685349916
>>  at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
>> ~[?:1.8.0_181]
>>  at 
>> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885)
>> ~[?:1.8.0_181]
>>  at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
>> ~[?:1.8.0_181]
>>  at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
>> ~[?:1.8.0_181]
>>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>> ~[?:1.8.0_181]
>>  at 
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
>> ~[?:1.8.0_181]
>>  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
>> ~[?:1.8.0_181]
>>  at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
>> ~[?:1.8.0_181]
>>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>> ~[?:1.8.0_181]
>>  at 
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
>> ~[?:1.8.0_181]
>>  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
>> ~[?:1.8.0_181]
>>  at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
>> ~[?:1.8.0_181]
>>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>> ~[?:1.8.0_181]
>>  at 
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
>> ~[?:1.8.0_181]
>>  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
>> ~[?:1.8.0_181]
>>  at
>> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
>> ~[?:1.8.0_181]
>>  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
>> ~[?:1.8.0_181]
>>  at 
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
>> ~[?:1.8.0_181]
>>

Re: Unsubscribe

2021-11-03 文章 Leonard Xu
If you need to unsubscribe from the user-zh@flink.apache.org mailing list, please send an email with any content to 
user-zh-unsubscr...@flink.apache.org

> 在 2021年11月2日,14:15,李芳奎  写道:
> 
> Unsubscribe
> 
> felix 
> 
> felix_...@163.com



Re: Flink SQL fails to read ORC files

2021-10-29 文章 Leonard Xu
The image did not come through; try pasting the log text directly, or use an image hosting tool.


> 在 2021年10月28日,16:54,陈卓宇 <2572805...@qq.com.INVALID> 写道:
> 
> 
> 
> 
> 
> Flink version: 1.12.2
> JDK: 1.8
> Scenario: Flink SQL reading ORC files on HDFS
> Could you tell me what causes this error?
> 
> 
> 宇
>  



Re: Some questions about the flink rabbitmq connector

2021-10-28 文章 Leonard Xu
Hi, Peng

There’s no doubt that RabbitMQ is a good open source community with active 
users. 
I understand @renqschn to mean that the Flink RabbitMQ Connector is one of the 
connectors with few users among the many connectors in the Flink project. From 
my observation, the more widely used connectors in the Flink project are 
Kafka, Filesystem, JDBC and so on. So please help us promote Flink in the 
RabbitMQ community and let more RabbitMQ users learn about and then use the Flink 
RabbitMQ Connector, which will give the Flink community more motivation to 
improve it.

Best,
Leonard

> 在 2021年10月29日,11:13,Ken Peng  写道:
> 
> I am one of the Forum Moderators for RabbitMQ, which does have a lot of
> active users. :)
> If you have any questions about RMQ please subscribe to its official group
> and ask there.
> rabbitmq-users+subscr...@googlegroups.com
> 
> Regards.
> 
> 
> On Fri, Oct 29, 2021 at 11:09 AM 任庆盛  wrote:
> 
>> Hello,
>> 
>> Judging from the code, the RabbitMQ sink indeed provides no delivery-semantics guarantee. The RabbitMQ connector currently
>> has few community users, so it is maintained relatively infrequently; if you are interested, you are welcome to contribute to the community.
>> 
>> 
>> 
>>> 2021年10月28日 下午7:53,wx liao  写道:
>>> 
>>> 你好:
>>> 
>> 冒昧打扰,最近项目在使用flink,sink端是rabbitmq,但是查看项目源码发现RMQSink好像并没有对消息不丢失做保证,没有看到有使用waitForConfirm()或者是confirm
>> listener,想问一下RMQSink部分是否没有保证at least once?希望可以得到解答,谢谢。
>> 
>> 



Re: Unsubscribe

2021-09-27 文章 Leonard Xu
If you need to unsubscribe from the user-zh@flink.apache.org mailing list, 
please send an email with any content to user-zh-unsubscr...@flink.apache.org.


> 在 2021年9月27日,14:43,rzy1107  写道:
> 
> Unsubscribe



Re: GroupWindowAggregate doesn't support consuming update and delete changes which is produced by node Deduplicate

2021-09-27 文章 Leonard Xu
hi, could you paste the error details in the email?


> On Sep 27, 2021, at 11:36, lzy139...@outlook.com wrote:
> 
> After filtering the data with ROW_NUMBER, a windowed aggregation fails with this error.



Re: Error ingesting MySQL CDC data into a Hudi lake

2021-09-26 文章 Leonard Xu
Hi, chan

Could you paste the complete log? It cannot be determined from this log alone.



> 在 2021年9月24日,18:23,casel.chen  写道:
> 
> SELECT `id`, `name`, `birthday`, `ts`, DATE_FORMAT(`birthday`, 'MMdd') AS 
> `partition` FROM mysql_users;



Re: Unsubscribe

2021-09-26 文章 Leonard Xu
If you need to unsubscribe from the user-zh@flink.apache.org mailing list, 
please send an email with any content to user-zh-unsubscr...@flink.apache.org.

Best,
Leonard

> 在 2021年9月26日,14:25,luoye <13033709...@163.com> 写道:
> 
> Unsubscribe



Re: flink-1.12.0 DDL watermark error, but no error on 1.13.2

2021-09-25 文章 Leonard Xu
This is a known bug [1], fixed in both 1.13.0 and 1.12.3; you can use a 1.12.5 or 1.13.2 patch release.

[1] https://issues.apache.org/jira/browse/FLINK-22082

Best

> 在 2021年9月25日,21:29,kcz <573693...@qq.com.INVALID> 写道:
> 
> The SQL definition is as follows; on 1.12.0, once the watermark statement is removed, the error goes away.
> CREATE TABLE KafkaTable (
>  test ARRAY<ROW<signalValue STRING>>,
>  gatherTime STRING,
>  log_ts as TO_TIMESTAMP(FROM_UNIXTIME(CAST(gatherTime AS 
> bigint)/1000,'-MM-dd HH:mm:ss')),
>  WATERMARK FOR log_ts AS log_ts - INTERVAL '5' SECOND
> ) WITH (
>  'connector' = 'kafka',
>  'topic' = 'user_behavior',
>  'properties.bootstrap.servers' = 'localhost:9092',
>  'properties.group.id' = 'testGroup',
>  'scan.startup.mode' = 'earliest-offset',
>  'format' = 'json'
> );
> 
> SELECT test[1].signalValue from KafkaTable;
> 
> 
> 
> 
> Exception in thread "main" scala.MatchError: ITEM($0, 1) (of class 
> org.apache.calcite.rex.RexCall)
>   at 
> org.apache.flink.table.planner.plan.utils.NestedSchemaExtractor.internalVisit$1(NestedProjectionUtil.scala:273)
>   at 
> org.apache.flink.table.planner.plan.utils.NestedSchemaExtractor.visitFieldAccess(NestedProjectionUtil.scala:283)
>   at 
> org.apache.flink.table.planner.plan.utils.NestedSchemaExtractor.visitFieldAccess(NestedProjectionUtil.scala:269)
>   at org.apache.calcite.rex.RexFieldAccess.accept(RexFieldAccess.java:92)
>   at 
> org.apache.flink.table.planner.plan.utils.NestedProjectionUtil$$anonfun$build$1.apply(NestedProjectionUtil.scala:112)
>   at 
> org.apache.flink.table.planner.plan.utils.NestedProjectionUtil$$anonfun$build$1.apply(NestedProjectionUtil.scala:111)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at 
> org.apache.flink.table.planner.plan.utils.NestedProjectionUtil$.build(NestedProjectionUtil.scala:111)
>   at 
> org.apache.flink.table.planner.plan.utils.NestedProjectionUtil.build(NestedProjectionUtil.scala)
>   at 
> org.apache.flink.table.planner.plan.rules.logical.ProjectWatermarkAssignerTransposeRule.getUsedFieldsInTopLevelProjectAndWatermarkAssigner(ProjectWatermarkAssignerTransposeRule.java:127)
>   at 
> org.apache.flink.table.planner.plan.rules.logical.ProjectWatermarkAssignerTransposeRule.matches(ProjectWatermarkAssignerTransposeRule.java:62)
>   at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.matchRecurse(VolcanoRuleCall.java:284)
>   at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.matchRecurse(VolcanoRuleCall.java:411)
>   at 
> org.apache.calcite.plan.volcano.VolcanoRuleCall.match(VolcanoRuleCall.java:268)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.fireRules(VolcanoPlanner.java:985)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1245)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:589)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:604)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:84)
>   at 
> org.apache.calcite.rel.AbstractRelNode.onRegister(AbstractRelNode.java:268)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl(VolcanoPlanner.java:1132)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.register(VolcanoPlanner.java:589)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.ensureRegistered(VolcanoPlanner.java:604)
>   at 
> org.apache.calcite.plan.volcano.VolcanoPlanner.changeTraits(VolcanoPlanner.java:486)
>   at 
> org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:309)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkVolcanoProgram.optimize(FlinkVolcanoProgram.scala:64)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram$$anonfun$optimize$1.apply(FlinkChainedProgram.scala:62)
>   at 
> org.apache.flink.table.planner.plan.optimize.program.FlinkChainedProgram$$anonfun$optimize$1.apply(FlinkChainedProgram.scala:58)
>   at 
> scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
>   at 
> scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at 
> scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
>   at scala.collection.AbstractTraversable.foldLeft(Traversable.scala:104)
>   at 
> 

Re: flink cdc SQL2ES,GC overhead limit exceeded

2021-09-15 文章 Leonard Xu
This should be unrelated to Flink CDC; CDC is only the source, and the exception stack is thrown from the join node. Please paste your SQL and configuration
so that people can analyze it more easily.
Best,
Leonard


> On Sep 15, 2021, at 15:01, wukon...@foxmail.com wrote:
> 
> hi LIYUAN:
> Please describe how you are using Flink and in what scenario this error occurs, so that everyone can help you locate the problem.
> 
> 
> 
> wukon...@foxmail.com
> 
> 发件人: LI YUAN
> 发送时间: 2021-09-09 20:38
> 收件人: user-zh
> 主题: flink cdc SQL2ES,GC overhead limit exceeded
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> at org.rocksdb.RocksIterator.key0(Native Method)
> at org.rocksdb.RocksIterator.key(RocksIterator.java:37)
> at 
> org.apache.flink.contrib.streaming.state.RocksIteratorWrapper.key(RocksIteratorWrapper.java:99)
> at 
> org.apache.flink.contrib.streaming.state.RocksDBMapState$RocksDBMapIterator.loadCache(RocksDBMapState.java:670)
> at 
> org.apache.flink.contrib.streaming.state.RocksDBMapState$RocksDBMapIterator.hasNext(RocksDBMapState.java:585)
> at 
> org.apache.flink.table.runtime.operators.join.stream.state.OuterJoinRecordStateViews$InputSideHasNoUniqueKey$1.hasNext(OuterJoinRecordStateViews.java:285)
> at 
> org.apache.flink.table.runtime.operators.join.stream.AbstractStreamingJoinOperator$AssociatedRecords.of(AbstractStreamingJoinOperator.java:199)
> at 
> org.apache.flink.table.runtime.operators.join.stream.StreamingJoinOperator.processElement(StreamingJoinOperator.java:211)
> at 
> org.apache.flink.table.runtime.operators.join.stream.StreamingJoinOperator.processElement2(StreamingJoinOperator.java:129)
> at 
> org.apache.flink.streaming.runtime.io.StreamTwoInputProcessorFactory.processRecord2(StreamTwoInputProcessorFactory.java:221)
> at 
> org.apache.flink.streaming.runtime.io.StreamTwoInputProcessorFactory.lambda$create$1(StreamTwoInputProcessorFactory.java:190)
> at 
> org.apache.flink.streaming.runtime.io.StreamTwoInputProcessorFactory$$Lambda$363/366743523.accept(Unknown
>  Source)
> at 
> org.apache.flink.streaming.runtime.io.StreamTwoInputProcessorFactory$StreamTaskNetworkOutput.emitRecord(StreamTwoInputProcessorFactory.java:291)
> at 
> org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134)
> at 
> org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105)
> at 
> org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:66)
> at 
> org.apache.flink.streaming.runtime.io.StreamTwoInputProcessor.processInput(StreamTwoInputProcessor.java:98)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:423)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$$Lambda$208/999379751.runDefaultAction(Unknown
>  Source)
> at 
> org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:204)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:681)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.executeInvoke(StreamTask.java:636)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$$Lambda$625/601779266.run(Unknown
>  Source)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.runWithCleanUpOnFail(StreamTask.java:647)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:620)
> at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:779)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:566)
> at java.lang.Thread.run(Thread.java:748)
> 
> Environment :
> 
> Flink version : 1.13.1
> 
> Flink CDC version: 1.4.0
> 
> Database and version: Mysql 7.0



Re: flink-connector-postgres-cdc: no changes will be captured

2021-09-06 文章 Leonard Xu
I did not see your attachment. You could also open an issue in the Flink CDC project and paste the images there;
attaching images on GitHub is easier than by email.

> 在 2021年9月6日,19:08,Fisher Xiang  写道:
> 
> Thank u Leonard.
> I attached the run log; please take a look.
> 使用的用户是:role 'repl' [superuser: true, replication: true, inherit: true, create 
> role: false, create db: false, can log in: true]
> 
> 
> BR
> Fisher
> 
> 
> On Mon, Sep 6, 2021 at 5:55 PM Leonard Xu  <mailto:xbjt...@gmail.com>> wrote:
> Hello, Fisher
> 
> The images did not come through; could you paste them using an image hosting tool?
> I can help take a look.
> 
> 祝好,
> Leonard
> 
> > 在 2021年9月6日,17:48,Fisher Xiang  > <mailto:fisherxia...@gmail.com>> 写道:
> > 
> > hi,
> > 
> > 在使用  flink-connector-postgres-cdc时(版本从1.1.1 ~ 1.4.0都试过), 出现一个警告:
> > WARN io.debezium.relational.RelationalDatabaseSchema - After applying the 
> > include/exclude list filters, no changes will be captured. Please check 
> > your configuration!
> > 启动配置是,Starting PostgresConnectorTask with configuration :
> > 
> > 
> > 
> > 然后,往这些表里面(  table.whitelist = stud,data_input,data_output 
> > )insert和delete记录时,没有捕获到变更数据,只有以下输出,求看是什么问题:
> > 
> > 
> > 
> > 使用的用户是:role 'repl' [superuser: true, replication: true, inherit: true, 
> > create role: false, create db: false, can log in: true]
> > 
> > BR
> > Fisher
> 



Re: flink-connector-postgres-cdc: no changes will be captured

2021-09-06 文章 Leonard Xu
Hello, Fisher

The images did not come through; could you paste them using an image hosting tool?
I can help take a look.

Best,
Leonard

> 在 2021年9月6日,17:48,Fisher Xiang  写道:
> 
> hi,
> 
> 在使用  flink-connector-postgres-cdc时(版本从1.1.1 ~ 1.4.0都试过), 出现一个警告:
> WARN io.debezium.relational.RelationalDatabaseSchema - After applying the 
> include/exclude list filters, no changes will be captured. Please check your 
> configuration!
> 启动配置是,Starting PostgresConnectorTask with configuration :
> 
> 
> 
> 然后,往这些表里面(  table.whitelist = stud,data_input,data_output 
> )insert和delete记录时,没有捕获到变更数据,只有以下输出,求看是什么问题:
> 
> 
> 
> 使用的用户是:role 'repl' [superuser: true, replication: true, inherit: true, create 
> role: false, create db: false, can log in: true]
> 
> BR
> Fisher



[ANNOUNCE] Flink mailing lists archive service has migrated to Apache Archive service

2021-09-06 文章 Leonard Xu
Hi, all

The mailing list archive service Nabble Archive broke at the end of June. The Flink 
community has migrated the mailing list archives [1] to the Apache Archive 
service via commit [2]; you can refer to [3] to learn more about Flink's mailing 
list archives.

The Apache Archive service is maintained by the ASF, so its stability is guaranteed. 
It is a web-based mail archive service that allows you to browse, search, 
interact with, subscribe to, and unsubscribe from mailing lists.

The Apache Archive service shows the mails of the last month by default; you can 
specify a date range to browse and search the historical mails.  


Hope it would be helpful.

Best,
Leonard

[1] The Flink mailing lists in Apache archive service
dev mailing list archives: 
https://lists.apache.org/list.html?d...@flink.apache.org 
user mailing list archives : 
https://lists.apache.org/list.html?u...@flink.apache.org  
user-zh mailing list archives : 
https://lists.apache.org/list.html?user-zh@flink.apache.org
[2] 
https://github.com/apache/flink-web/commit/9194dda862da00d93f627fd315056471657655d1
[3] https://flink.apache.org/community.html#mailing-lists


Re: 退订

2021-08-31 文章 Leonard Xu
Hi,
 
  Please send email to dev-unsubscr...@flink.apache.org 
 if you want to unsubscribe the mail 
from d...@flink.apache.org  .
  Please send email to user-unsubscr...@flink.apache.org 
 if you want to unsubscribe the mail 
from u...@flink.apache.org  .
  Please send email to user-zh-unsubscr...@flink.apache.org 
 if you want to unsubscribe the 
mail from user-zh@flink.apache.org  .
 
You can refer[1] for more details. 

[1] https://flink.apache.org/community.html#mailing-lists 
  

Best,
Leonard

> 在 2021年8月31日,22:06,kindragos <6230...@163.com> 写道:
> 
> 退订



Re: 【Announce】Zeppelin 0.10.0 is released, Flink on Zeppelin Improved

2021-08-25 文章 Leonard Xu
Thanks Jeff for the great work !

Best,
Leonard 

> 在 2021年8月25日,22:48,Jeff Zhang  写道:
> 
> Hi Flink users,
> 
> We (Zeppelin community) are very excited to announce Zeppelin 0.10.0 is 
> officially released. In this version, we made several improvements on Flink 
> interpreter.  Here's the main features of Flink on Zeppelin:
> Support multiple versions of Flink
> Support multiple versions of Scala
> Support multiple languages
> Support multiple execution modes
> Support Hive
> Interactive development
> Enhancement on Flink SQL
> Multi-tenancy
> Rest API Support
> Take a look at this document for more details:  
> https://zeppelin.apache.org/docs/0.10.0/interpreter/flink.html 
> 
> The quickest way to try Flink on Zeppelin is via its docker image 
> https://zeppelin.apache.org/docs/0.10.0/interpreter/flink.html#play-flink-in-zeppelin-docker
>  
> 
> 
> Besides these, here’s one blog about how to run Flink sql cookbook on 
> Zeppelin, 
> https://medium.com/analytics-vidhya/learn-flink-sql-the-easy-way-d9d48a95ae57 
> 
>   The easy way to learn Flink Sql.
> 
> Hope it would be helpful for you and welcome to join our community to discuss 
> with others. http://zeppelin.apache.org/community.html 
> 
> 
> 
> -- 
> Best Regards
> 
> Jeff Zhang



Re: mini-batch 设置后没效果

2021-08-25 文章 Leonard Xu

> How do I unsubscribe from this mailing list?

If you need to unsubscribe from the user-zh@flink.apache.org mailing list, 
please send an email with any content to user-zh-unsubscr...@flink.apache.org.

Best,
Leonard

Re: The Flink SQL API does not support columns of type TIMESTAMP(p) WITH TIME ZONE

2021-08-19 文章 Leonard Xu
Hello,

Flink does not support the TIMESTAMP WITH TIME ZONE type yet.

The currently supported types are: 
TIMESTAMP WITHOUT TIME ZONE, abbreviated as TIMESTAMP
TIMESTAMP WITH LOCAL TIME ZONE, abbreviated as TIMESTAMP_LTZ

Best,
Leonard
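
A short sketch contrasting the two supported types; the table, columns and connector/format options are placeholders only, and a column modeled as TIMESTAMP WITH TIME ZONE elsewhere would have to be remapped to one of these:

```sql
CREATE TABLE ts_demo (
  local_ts   TIMESTAMP(3),       -- TIMESTAMP [WITHOUT TIME ZONE]: wall-clock value, no zone
  instant_ts TIMESTAMP_LTZ(3)    -- TIMESTAMP WITH LOCAL TIME ZONE: epoch-based instant
) WITH (
  'connector' = 'kafka',
  'topic' = 'demo',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'json'
);
```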

> 在 2021年8月19日,20:51,changfeng  写道:
> 
> ` TIMESTAMP(6) WITH TIME ZONE



Re: A question about Flink timestamps

2021-08-15 文章 Leonard Xu
Hi,
The images you pasted did not come through; you can use an image hosting tool, or paste the code directly if it is short.
The 'T' shown in a TIMESTAMP value has no special meaning; it is just a separator used when formatting a timestamp. When you finally write the TIMESTAMP
to your sink, the sink (e.g. MySQL) will apply its own formatting.
For the second question, I cannot see your image; check your Flink version: support for the TIMESTAMP_LTZ type only became complete in 1.13.

Best,
Leonard


> On Aug 16, 2021, at 10:27, Geoff nie wrote:
> 
> Question 1: why does a Flink timestamp contain a 'T' in the middle, and how can I get rid of it?
> 



Re: Is Flink SQL backward compatible?

2021-08-11 文章 Leonard Xu
Does the SQL here mean DDL or DML? DML is generally compatible, and incompatible upgrades are rare.
For DDL, every SQL dialect has its own syntax and this is quite flexible; Flink SQL DDL differs slightly across versions, but newer Flink SQL versions are compatible with older 
DDL. It is only that where a newer version's DDL offers richer functionality, the older version cannot provide it.

So the compatibility concern you describe should not be an issue. But note that if your SQL job is stateful and you need to upgrade it together with its state, that state is not compatible across versions.

Best,
Leonard

> On Aug 10, 2021, at 11:44, Jason Lee wrote:
> 
> Hi all,
> 
> 
> A quick question: we are planning to upgrade our Flink version. Will the SQL of existing jobs be compatible with the new version?
> For example, if I upgrade to 1.13, will my 1.10 SQL syntax still be accepted?
> 
> 
> 感恩
> 
> 
> | |
> Chuang Li
> |
> |
> jasonlee1...@163.com
> |
> 签名由网易邮箱大师定制
> 



Re: 退订

2021-08-11 文章 Leonard Xu
如果需要取消订阅 user-zh@flink.apache.org 邮件组,请发送任意内容的邮件到 
user-zh-unsubscr...@flink.apache.org 
Best,
Leonard

> 在 2021年8月6日,10:49,汪嘉富  写道:
> 
> 退订
> 



Re: 退订

2021-08-11 文章 Leonard Xu
如果需要取消订阅 user-zh@flink.apache.org 邮件组,请发送任意内容的邮件到 
user-zh-unsubscr...@flink.apache.org 

Best,
Leonard

> 在 2021年8月11日,08:16,Lee2097  写道:
> 
> 退订



Re: flink 1.13.1 使用hive方言,执行hive sql解析报错

2021-07-29 文章 Leonard Xu
看起来是sql语法报错,这里面的ELSE呢?
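
补充一个带 ELSE 分支的完整 CASE 表达式示意(列名沿用你贴的片段,仅作参考):

CASE
  WHEN mipd.`param_cn` = '月池尺寸' THEN mipv.`param_value`
  ELSE NULL
END AS `Moonpool`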

祝好,
Leonard


> 在 2021年7月27日,20:04,Asahi Lee <978466...@qq.com.INVALID> 写道:
> 
> CASE
>   WHEN mipd.`param_cn` = '月池尺寸' THEN
>   mipv.`param_value`
>   END AS `Moonpool`



Re: 退订

2021-07-28 文章 Leonard Xu


如果需要取消订阅 user-zh@flink.apache.org 邮件组,请发送任意内容的邮件到 
user-zh-unsubscr...@flink.apache.org 


> 在 2021年7月28日,10:52,赵珠峰  写道:
> 
> 退订
> 
> 
> 
> 本邮件载有秘密信息,请您恪守保密义务。未经许可不得披露、使用或允许他人使用。谢谢合作。
> This email contains confidential information. Recipient is obliged to keep 
> the information confidential. Any unauthorized disclosure, use, or 
> distribution of the information in this email is strictly prohibited. Thank 
> you.



Re: 退订

2021-06-28 文章 Leonard Xu
如果需要取消订阅 u...@flink.apache.org   邮件组,请发送任意内容的邮件到 
user-zh-unsubscr...@flink.apache.org 
 。


> 在 2021年6月29日,11:01,大雨 <95133...@qq.com.INVALID> 写道:
> 
> 退订



Re: Processing-time temporal join is not supported yet

2021-06-23 文章 Leonard Xu
会保留维表状态的,靠watermark清理过期数据。
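
一个 event-time temporal join 的写法示意(表名、字段名均为假设,主流和维表流都需要定义 watermark):

SELECT o.order_id, o.price, r.rate
FROM orders AS o
LEFT JOIN currency_rates FOR SYSTEM_TIME AS OF o.order_time AS r
  ON o.currency = r.currency;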

祝好
Leonard


> 在 2021年6月23日,19:20,op <520075...@qq.com.INVALID> 写道:
> 
> 谢谢,Event time temporal join 
> 会保存temporal每个的key的最新状态吗,官网文档说跟两边watermark有关,每太看明白。。。
> 
> 
> 
> 
> --原始邮件--
> 发件人:  
>   "user-zh"   
>  
>  发送时间:2021年6月23日(星期三) 下午5:40
> 收件人:"user-zh" 
> 主题:Re: Processing-time temporal join is not supported yet
> 
> 
> 
> Hi,
> 
> Flink SQL 目前支持 Event time temporal join 任意表/视图,还不支持 Processing-time 
> temporal join 任意表/视图(支持Processing-time join 
> 实现了LookupTableSource的表)。
> 
> Processing-time temporal join 任意表目前不支持的原因主要是语义问题,具体来说: 
> 在Processing time关联时,Flink SQL 层面还没比较好的机制保证维表加载完后再关联。比如如用来做维表流的kafka中有 1000万 
> 条数据,但目前没有办法实现将这 
> 1000万条先记录全部加载完后主流过来的数据再去关联,在作业启动阶段,主流的数据预期能够关联上的数据可能因为维表还未加载完成而关联不上。
> 
> 可以参考下 https://issues.apache.org/jira/browse/FLINK-19830 
>  
> 祝好
> Leonard
> 
> 
> 
>  在 2021年6月23日,17:03,op <520075...@qq.com.INVALID 写道:
>  
>  Processing-time temporal join is not supported yet.



Re: Processing-time temporal join is not supported yet

2021-06-23 文章 Leonard Xu
Hi,

Flink SQL 目前支持 Event time  temporal join 任意表/视图,还不支持 Processing-time temporal  
join 任意表/视图(支持Processing-time  join 实现了LookupTableSource的表)。

Processing-time temporal  join 任意表目前不支持的原因主要是语义问题,具体来说:  在Processing 
time关联时,Flink SQL 层面还没比较好的机制保证维表加载完后再关联。比如如用来做维表流的kafka中有 1000万 条数据,但目前没有办法实现将这 
1000万条先记录全部加载完后主流过来的数据再去关联,在作业启动阶段,主流的数据预期能够关联上的数据可能因为维表还未加载完成而关联不上。

可以参考下 https://issues.apache.org/jira/browse/FLINK-19830 
 

祝好
Leonard



> 在 2021年6月23日,17:03,op <520075...@qq.com.INVALID> 写道:
> 
>  Processing-time temporal join is not supported yet.



Re: flink sql cdc如何获取元数据

2021-06-22 文章 Leonard Xu
Hello,

Flink sql cdc 还不支持获取元数据, 获取元数据的业务场景通常是怎么样的呢?

祝好,
Leonard




> 在 2021年6月23日,08:21,casel.chen  写道:
> 
> flink sql cdc如何获取元数据?像数据库名,表名,操作类型,binlog产生时间等。
> 
> 
> create table xxx_tbl (
>   k_op varchar, -- 操作类型
>   k_database varchar, -- 数据库名
>   k_table varchar, -- 表名
>   k_ts. BIGINT, -- binlog产生时间
>   idBIGINT,
>   name. varchar
> ) with (
>   'connector' = 'mysql-cdc',
>   .
>   'meta.fields-prefix' = 'k_'
> )



Re: 动态选择流

2021-06-22 文章 Leonard Xu
你的动态规则是怎么输入的? 流作业都是预先编译好作业拓扑,然后再调度运行,动态调整作业拓扑基本可不能。

祝好,
Leonard

> 在 2021年6月22日,20:10,梁鲁江  写道:
> 
> 你好,
> 麻烦问一下,有没有API或者实现方式来动态选择多个流。
> 例如我有A、B、C三条流,我的动态规则1要去joinA、B ,关联条件不定;动态规则2要去join B、C两条流 ……



Re: 场景实现咨询

2021-06-20 文章 Leonard Xu
Hi,

你可以试下 event time 的 temporal join, 把订单扩展信息表作为版本表。

Best,
Leonard
[1] 
https://ci.apache.org/projects/flink/flink-docs-master/zh/docs/dev/table/sql/queries/joins/#event-time-temporal-join

> 在 2021年6月20日,11:33,chenchencc <1353637...@qq.com.INVALID> 写道:
> 
> 大佬们好,我有个生产场景,不知道怎么用flink sql实现。想咨询下。
> 
> 表说明:
> 1.订单主表:包含着订单的基本信息,有订单id主键,和其他基本信心,每次更新订单或者新增订单都会造成该表的更新或者新增
> 2.订单扩展信息表:包含着订单扩展信息,主键为订单id,订单的新增或者更新都会造成该表的新增或者更新
> 
> 两张表的新增更新相差10s内
> 场景:
> 需要关联两张表,要求订单主表left join 订单扩展表,并且是订单扩展表的最新信息
> 
> 想问下有什么好的实现方式吗
> 
> 
> 
> --
> Sent from: http://apache-flink.147419.n8.nabble.com/



Re: flink sql cdc做计算后, elastic search connector 多并发sink 会有问题,偶现!数据量较大,目前还没较好复现。请问有人是否也遇到过类似的问题。

2021-06-16 文章 Leonard Xu
看起来和 Flink-CDC 关系不大,看异常栈是 ES 侧抛出的异常 version_conflict_engine_exception, 
可以查下这个异常,看下是不是有写(其他作业/业务 也在写同步表)冲突。

祝好,
Leonard

> 在 2021年6月16日,17:05,mokaful <649713...@qq.com> 写道:
> 
> 相同问题,请问有处理方式吗
> 
> 
> 
> --
> Sent from: http://apache-flink.147419.n8.nabble.com/



Re: flink sql cdc数据同步至mysql

2021-06-08 文章 Leonard Xu
试着回答下这两个问题。

> flink 1.12的jdbc connector不支持 sink.parallism 参数,是不是说只能和上游cdc 
> connector保持相同的并行度1?如果是的话,下游sink会有瓶颈,因为表中存有大量的历史数据,将存量数据同步到目标表需要花费不少时间,而新的业务数据也源源不断进来,导致追不上。这要如何解决?
是的,关键问题是cdc connector为了保证数据一致性只能单并发,所以作业也只能单并发。这个需要cdc 
connector支持多并发读取,下游sink自然就能解决。


> flink 1.13的jdbc connector新增 sink.parallism 
> 参数,问题是如果增大并行度的话能否确保同一个主键记录发到同一个sink task呢?SQL正确的写法是什么?

这个不仅在同步场景,在其他场景也需要注意 sink.parallism这个参数的使用,目前框架没有保证同一个pk的记录发送到同一个task,需要用户自己保证 
sink 上定义的pk 和上游数据shuffle的key(比如 group key, join key)保持一致,
否则可能导致数据乱序。这个问题社区也在从 plan 推导上解决,可以参考
https://issues.apache.org/jira/browse/FLINK-20374
https://issues.apache.org/jira/browse/FLINK-22901
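
在 sink 表上声明主键并设置 sink.parallelism 的 DDL 示意(连接信息、字段均为假设,1.13 的 jdbc connector 支持该参数):

CREATE TABLE mysql_sink (
  id   BIGINT,
  name STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://localhost:3306/mydb',
  'table-name' = 'target_table',
  'username' = 'xxx',
  'password' = 'xxx',
  'sink.parallelism' = '4'
);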
 

祝好,
Leonard

Re: flink自定义connector相关报错

2021-06-02 文章 Leonard Xu
路径错了

> 在 2021年6月2日,17:02,MOBIN <18814118...@163.com> 写道:
> 
> META-INF.services/org.apache.flink.table.factories.Factory

=>   META-INF/services/org.apache.flink.table.factories.Factory

祝好
Leonard

Re: flink1.13 通过sql cli执行hdfs上面的SQL文件

2021-05-30 文章 Leonard Xu
> 
> 目前还不支持HDFS路径,只支持本地的文件,未来应该会支持.


是的, 目前还不支持,只支持本地文件,这个异常信息不是很明确


https://issues.apache.org/jira/browse/FLINK-22795 


祝好,
Leonard

> 
> 
> 
> -
> Best Wishes
> JasonLee
> --
> Sent from: http://apache-flink.147419.n8.nabble.com/



Re: flink 1.13.0 中cumulate window 使用

2021-05-27 文章 Leonard Xu
Hi,

Cumulate window 是基于 window TVF 语法的,和之前的 groupWindowFunction 不一样, 你可以参考 [1]
Window TVF 也支持了 tumble window, hop window, 并且性能更优,session window 预计会在1.14支持, 
session window 有需要可以使用老的语法。
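
基于 window TVF 的大致写法示意(其中 ts 假设为 test_in 上已声明 watermark 的事件时间列,若你的版本支持处理时间窗口也可以换成处理时间属性列):

SELECT uid, window_start, window_end, MAX(age)
FROM TABLE(
  CUMULATE(TABLE test_in, DESCRIPTOR(ts), INTERVAL '1' MINUTE, INTERVAL '1' HOUR))
GROUP BY uid, window_start, window_end;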

Best,
Leonard
[1] 
https://ci.apache.org/projects/flink/flink-docs-release-1.13/zh/docs/dev/table/sql/queries/window-agg/#windowing-tvfs


> 在 2021年5月28日,11:43,邹云鹤  写道:
> 
> insert into `test_out` select a.uid, 'dt', max(a.age) from `test_in` as a 
> group by uid, cumulate(PROCTIME(), interval '1' minute, interval '1' hour);
> 
> 
> hello, 各位大佬, flink 1.13.0 中流式作业中该怎么使用cumulate window呢?我用上面的SQL 貌似是不行, 
> 有没有使用过的大佬给点建议?
> 
> 
> 
> 
> | |
> 邹云鹤
> |
> |
> kevinyu...@163.com
> |
> 签名由网易邮箱大师定制



Re: flink 维表

2021-05-27 文章 Leonard Xu
Hi


> 1, 这个时态表必须是changlog流吗, 就是debezium - kafka 这样, 用和主表的flink cdc不可以吗, 
> 我用flink cdc测试没成功,因为flink cdc的维表好像不能指定watermark?

我们常说的 lookup维表也是时态表的一种,lookup 正如这个单词字面意思一样,主流的每条数据到来时都按 lookup key 去外部DB中 查询 
一条记录,这张维表自然就是最新的维表,这就是 proctime temporal join 的语义。

基于 Event-time temporal join,是按照 event-time 去取对应的版本,与 proctime temporal 
join 的区别是:proctime temporal join 只能取最新版本,Event-time temporal join 可以取该 event-time 
对应的版本。
Flink 声明时态表只需要两个属性:主键 和 event time,其中主键甚至可以是推导的。可以看下这个文档【1】
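
声明一张版本表(时态表)的 DDL 示意(表名、字段、topic 等均为假设,关键是声明主键和 event time/watermark):

CREATE TABLE product_changelog (
  product_id    STRING,
  product_price DECIMAL(10, 4),
  update_time   TIMESTAMP(3) METADATA FROM 'value.source.timestamp' VIRTUAL,
  PRIMARY KEY (product_id) NOT ENFORCED,
  WATERMARK FOR update_time AS update_time
) WITH (
  'connector' = 'kafka',
  'topic' = 'products',
  'properties.bootstrap.servers' = 'localhost:9092',
  'scan.startup.mode' = 'earliest-offset',
  'value.format' = 'debezium-json'
);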

> 2, 订单表和时态表都需要一直写入数据来触发watermark吗? 
是的,event-time temporal join 靠两条流的 watermark 来共同驱动。


祝好,
Leonard
【1】https://ci.apache.org/projects/flink/flink-docs-master/zh/docs/dev/table/concepts/versioned_tables/#%e6%97%b6%e6%80%81%e8%a1%a8

> 
> 烦请解答下
> 
> 
> --原始邮件--
> 发件人:  
>   "user-zh"   
>  
>  发送时间:2021年5月27日(星期四) 下午5:14
> 收件人:"user-zh" 
> 主题:Re: flink 维表
> 
> 
> 
> HI,
> 可以修改的,修改后不需要重启作业。
> 修改后 flink 事实流 是能立即 查询到 最新的维表数据(前提是维表查询出来的数据没有开启cache)。
> 你可以理解下 lookup 维表的语法: A join B for system time as of A.proctime on A.id 
>  
> 
> 祝好,
> Leonard
> 
>  在 2021年5月27日,16:35,liujian <13597820...@qq.com 写道:
>  
>  请问flink lookup表作为维表,那么lookup表是不是不能新增或者修改,如果修改了那么作业就需要重启?
>  想要作业不重启咋操作



Re: flink 维表

2021-05-27 文章 Leonard Xu
HI,
可以修改的,修改后不需要重启作业。
修改后 flink 事实流 是能立即 查询到 最新的维表数据(前提是维表查询出来的数据没有开启cache)。
你可以理解下 lookup 维表的语法: A join B for system time as of A.proctime on A.id 
 = B.id  就是 查询当前最新的维表(proctime代表了最新时间)并关联。
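
完整一点的 lookup 维表 join 写法示意(表名、字段均为假设,其中 a.proctime 需要在 orders 的 DDL 里通过 proctime AS PROCTIME() 声明,dim_table 需要是支持 lookup 的 connector,比如 jdbc/hbase):

SELECT a.order_id, b.name
FROM orders AS a
JOIN dim_table FOR SYSTEM_TIME AS OF a.proctime AS b
  ON a.dim_id = b.id;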


祝好,
Leonard

> 在 2021年5月27日,16:35,liujian <13597820...@qq.com> 写道:
> 
> 请问flink lookup表作为维表,那么lookup表是不是不能新增或者修改,如果修改了那么作业就需要重启?
> 想要作业不重启咋操作



Re: flink问题咨询

2021-05-17 文章 Leonard Xu
Hello
你可以把具体的问题描述清楚点,比如给出一些数据和sql,能够复现你遇到的问题,这样大家才能帮忙排查。

祝好,
Leonard Xu


> 在 2021年5月18日,09:49,清酌  写道:
> 
> 您好!
>   我在使用1.11版本flink sql cdc 
> 时候,用sql形式想对多表关联生成实时的宽表,发现经常出现宽表的数据不准。特别是源表在cdc变更时候。比如:宽表本应该10条数据变更,但是实际只变更了3条。
>   我想知道这个问题是基于我使用不当产生的还是1.11版本的问题,如果是版本的问题后续会修复吗?



Re: 维度表 处理时间

2021-05-17 文章 Leonard Xu
只需要最新的维表数据,可以用处理时间,这样事实表每条数据都会实时去查 mysql 最新的维表数据;
如果业务可以接受近似最新的维表数据,也可以将查询到的维表结果通过缓存优化,减少访问 mysql 的 io,对应这两个参数:
lookup.cache.max-rows
lookup.cache.ttl
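
开启 lookup 缓存的 JDBC 维表 DDL 示意(连接信息、字段均为假设):

CREATE TABLE dim_mysql (
  id   BIGINT,
  name STRING
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://localhost:3306/mydb',
  'table-name' = 'dim_table',
  'username' = 'xxx',
  'password' = 'xxx',
  'lookup.cache.max-rows' = '10000',
  'lookup.cache.ttl' = '10min'
);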

祝好,
Leonard
[1] 
https://ci.apache.org/projects/flink/flink-docs-master/zh/docs/connectors/table/jdbc/#%E8%BF%9E%E6%8E%A5%E5%99%A8%E5%8F%82%E6%95%B0

> 在 2021年5月18日,08:46,流弊 <1353637...@qq.com> 写道:
> 
> 大佬们好,我们现在有个场景,是left join 
> mysql维度表的,但是更新比较慢,大概10分钟更新一条,但是事实表速度比较快,每秒几万条。并且需要更新最新数据。如果采用mysql 
> cdc形式,那水位对等就要较长延迟。有什么好方式能够join到最新数据吗,使用处理时间?



Re: UpsertStreamTableSink requires that Table has a full primary keys if it is updated.

2021-05-14 文章 Leonard Xu
这里说的 PK 是定义在你结果表 DDL 上的PK,最开始的报错信息应该是你结果表上没声明PK吧。
你自定义的 connector 支持 upsert 的话,参考下 HBaseUpsertTableSink 的实现,你的 sink 获取到  Factory 
Context 中schema 的 pk 后,需要按照 upsert 语义处理下。

祝好,
Leonard




> On May 14, 2021, at 15:39, automths  wrote:
> 
> Hi:
> 
> 该问题有进一步的进展了。
> 
> 
> 我把cdc对应的表在创建表时设置了primary key,然后该问题解决了。
> 
> 
> 现在有一点弄不明白,在查找primary  key的时候,是去查找source表的primary key吗?
> 源码位置(org.apache.flink.table.planner.plan.utils.UpdatingPlanChecker#getUniqueKeyForUpsertSink)如下:
> 
> 
> def getUniqueKeyForUpsertSink(
>sinkNode: LegacySink,
>planner: PlannerBase,
>sink: UpsertStreamTableSink[_]): Option[Array[String]] = {
>  // extract unique key fields
>  // Now we pick shortest one to sink
>  // TODO UpsertStreamTableSink setKeyFields interface should be 
> Array[Array[String]]
>  val sinkFieldNames = sink.getTableSchema.getFieldNames
>  /** Extracts the unique keys of the table produced by the plan. */
>  val fmq = FlinkRelMetadataQuery.reuseOrCreate(
>planner.getRelBuilder.getCluster.getMetadataQuery)
>  val uniqueKeys = fmq.getUniqueKeys(sinkNode.getInput) // 此处是查找source的primary 
> key吗?
>  if (uniqueKeys != null && uniqueKeys.size() > 0) {
>uniqueKeys
>.filter(_.nonEmpty)
>.map(_.toArray.map(sinkFieldNames))
>.toSeq
>.sortBy(_.length)
>.headOption
>  } else {
>None
>  }
> }
> 
> 
> 
> 
> flink 版本为1.12.0。
> 在1.13.0上也有出现。
> 
> 
> 请知道的大佬告知。
> 
> 
> 
> 
> 祝好!
> automths
> 
> 
> 
> 
> On 05/14/2021 11:00,automths wrote:
> Hi:
> 
> 我自定义了一个UpsertStreamTableSink,在接入cdc数据向我的这个Sink写数据的时候,抛出了下面的异常:
> Exception in thread "main" org.apache.flink.table.api.TableException: 
> UpsertStreamTableSink requires that Table has a full primary keys if it is 
> updated.
> at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecLegacySink.translateToPlanInternal(StreamExecLegacySink.scala:93)
> at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecLegacySink.translateToPlanInternal(StreamExecLegacySink.scala:48)
> at 
> org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:59)
> at 
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecLegacySink.translateToPlan(StreamExecLegacySink.scala:48)
> at 
> org.apache.flink.table.planner.delegation.StreamPlanner$$anonfun$translateToPlan$1.apply(StreamPlanner.scala:66)
> at 
> org.apache.flink.table.planner.delegation.StreamPlanner$$anonfun$translateToPlan$1.apply(StreamPlanner.scala:65)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.Iterator$class.foreach(Iterator.scala:891)
> 
> 
> 我在构建tableSchema的时候,已经设置了primary key了,但依旧抛出这个错误。
> 我的flink版本是flink-1.12.0的。
> 
> 
> 请教一下,这个问题,该怎么解决?
> 
> 
> 
> 
> 祝好!
> automths
> 
> 
> 



Re: 扩展SqlServerDialect 运行在flink on k8s报错

2021-05-07 文章 Leonard Xu
Hi

看日志是加载不到对应的class文件,(1)可以对比下你jar包里的路径是不是不正确,(2) 检查下集群上是不是还有之前的jar包,没替换干净

祝好
Leonard

> 在 2021年5月7日,13:58,18756225...@163.com 写道:
> 
> 大家好,遇到一个问题:
> 坏境:flink 版本1.12.1,  k8s集群为session模式,  该集群之前可以将数据正常写入到mysql
> 参考mysqlDialect 扩展了一个 
> SqlServerDialect,替换了flink的lib包下的flink-connector-jdbc_2.11-1.12.1.jar,在on 
> yarn时 任务正常运行,flink-sql也可以将数据写入到sqlserver
> 在同一台机器坏境 提交到k8s搭建的flink session模式集群, 就报如下错误 , JdbcBatchingOutputFormat 
> 这个类加载不到? 
> 
> 谢谢!
> 
> 完整异常如下:
> org.apache.flink.runtime.JobException: Recovery is suppressed by 
> NoRestartBackoffTimeStrategy
>at 
> org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:116)
>at 
> org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:78)
>at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:224)
>at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.maybeHandleTaskFailure(DefaultScheduler.java:217)
>at 
> org.apache.flink.runtime.scheduler.DefaultScheduler.updateTaskExecutionStateInternal(DefaultScheduler.java:208)
>at 
> org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:610)
>at 
> org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:89)
>at 
> org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:419)
>at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
>at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>at java.lang.reflect.Method.invoke(Method.java:498)
>at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:286)
>at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:201)
>at 
> org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74)
>at 
> org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:154)
>at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
>at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
>at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
>at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
>at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
>at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
>at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
>at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
>at akka.actor.ActorCell.invoke(ActorCell.scala:561)
>at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
>at akka.dispatch.Mailbox.run(Mailbox.scala:225)
>at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
>at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>at 
> akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>at 
> akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: org.apache.flink.streaming.runtime.tasks.StreamTaskException: 
> Cannot load user class: 
> org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat
> ClassLoader info: URL ClassLoader:
> Class not resolvable through given classloader.
>at 
> org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperatorFactory(StreamConfig.java:322)
>at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.createOperator(OperatorChain.java:614)
>at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.createOperatorChain(OperatorChain.java:583)
>at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:526)
>at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.createOperatorChain(OperatorChain.java:574)
>at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:526)
>at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:164)
>at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:485)
>at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:531)
>at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:722)
>at org.apache.flink.runtime.taskmanager.Task.run(Task.java:547)
>at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.flink.connector.jdbc.internal.JdbcBatchingOutputFormat
>at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>at 

Re: 設置look up table source

2021-04-21 文章 Leonard Xu
Hi, ChongAih

你可以参考 JdbcDynamicTableSource [1],这个 table source 实现了 LookupTableSource 
接口,你需要写一个类似 JdbcRowDataLookupFunction 的函数即可。

祝好,
Leonard
[1] 
https://github.com/apache/flink/blob/4be9aff3eccb3808df1f10ef7c30480ec11a9cb0/flink-connectors/flink-connector-jdbc/src/main/java/org/apache/flink/connector/jdbc/table/JdbcDynamicTableSource.java#L70



> 在 2021年4月21日,16:04,Chongaih Hau  写道:
> 
> hi all,
> 
> flink在使用temporal join只支持look up table source。我在做單元測試的時候, 下載了hive
> 表裡面的數據,嘗試了用filesystem註冊temporal table。可是後來發現file system不支持lookup。查詢了文檔(
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/legacySourceSinks.html),用戶可以自定義look
> up table source。可是我找不到類似用csv設置look up table source的方法。所以想請問有什麼例子我可以參考嗎?
> 
> Regards,
> 
> 
> *Hau ChongAih*



Re: flink-sql-connector-elasticsearch6_2.11_1.10.0 与 flink-connector-elasticsearch6_2.11_1.10.0 并存问题

2021-04-20 文章 Leonard Xu
Hi
如果只是sql作业,使用flink-sql-connector-elasticsearch6_2.11_1.10.0 
就可以了,如果纯datastream作业使用flink-connector-elasticsearch6_2.11_1.10.0 就可以了
如果两个包都要使用,有两个思路
1. 你自己自己打个包,把上面两个包的依赖放在一起。
2. 和1类似,shade掉flink-connector-elasticsearch6_2.11_1.10.0 
我没实际打过,你可以动手试下。

祝好


> 在 2021年4月20日,14:13,william <712677...@qq.com> 写道:
> 
> 你好,我也遇到了同样的问题,请问你们是怎么解决的,谢谢!
> 
> 
> 
> --
> Sent from: http://apache-flink.147419.n8.nabble.com/



Re: 退订

2021-04-12 文章 Leonard Xu
On Mon, Apr 12, 2021 at 3:06 PM yangxiaofei  wrote:

> 退订
>
>
Hi
是指取消订阅邮件吗?取消和订阅邮件组 不是直接发给邮件组, Apache的邮件组管理都类似。

请发送任意内容的邮件到  user-zh-unsubscr...@flink.apache.org   就可以取消订阅
user-zh@flink.apache.org  邮件列表

邮件列表的订阅管理,可以参考[1]

祝好,
Leonard Xu
[1]
https://flink.apache.org/community.html#how-to-subscribe-to-a-mailing-list


Re: BUG :DataStream 转 Table 后无法 触发窗口计算

2021-03-09 文章 Leonard Xu
你好,
你的flink版本是多少?
之前有个bug是Table转datastream 会丢rowtime问题,看起来是这个问题。

我在[1]里修复了,你可以升级对应的版本试下。


祝好,
Leonard
[1]https://issues.apache.org/jira/browse/FLINK-21013 
 



> 在 2021年3月10日,14:34,HunterXHunter <1356469...@qq.com> 写道:
> 
> 再试了一下:
> 修改并行度也不行
>.setParallelism(9)
> 
> 
> 
> 
> --
> Sent from: http://apache-flink.147419.n8.nabble.com/



Re: flink sql中如何使用异步io关联维表?

2021-03-04 文章 Leonard Xu
目前Flink SQL 中的connector都没实现异步io关联维表,接口是上已经支持了的,如果是自己实现可以参考[1]
另外,HBase connector 社区有人正在支持异步io关联维表,预计1.13可以使用[2]

祝好

[1]https://github.com/apache/flink/blob/73cdd3d0d9f6a807b3e47c09eef7983c9aa180c7/flink-table/flink-table-planner-blink/src/test/java/org/apache/flink/table/planner/factories/TestValuesRuntimeFunctions.java#L618
[2] https://github.com/apache/flink/pull/14684#pullrequestreview-604148209 




> 在 2021年3月4日,14:22,HunterXHunter <1356469...@qq.com> 写道:
> 
> 定义一个 sourcetable
> 
> 
> 
> --
> Sent from: http://apache-flink.147419.n8.nabble.com/



Re: [DISCUSS] Deprecation and removal of the legacy SQL planner

2021-03-04 文章 Leonard Xu
+1 for the roadmap.

Thanks Timo for driving this.

Best,
Leonard

> 在 2021年3月4日,20:40,Timo Walther  写道:
> 
> Last call for feedback on this topic.
> 
> It seems everyone agrees to finally complete FLIP-32. Since FLIP-32 has been 
> accepted for a very long time, I think we don't need another voting thread 
> for executing the last implementation step. Please let me know if you think 
> differently.
> 
> I will start deprecating the affected classes and interfaces beginning of 
> next week.
> 
> Regards,
> Timo
> 
> 
> On 26.02.21 15:46, Seth Wiesman wrote:
>> Strong +1
>> Having two planners is confusing to users and the diverging semantics make
>> it difficult to provide useful learning material. It is time to rip the
>> bandage off.
>> Seth
>> On Fri, Feb 26, 2021 at 12:54 AM Kurt Young  wrote:
>>> >> change.>
>>> 
>>> Hi Timo,
>>> 
>>> First of all I want to thank you for introducing this planner design back
>>> in 1.9, this is a great work
>>> that allows lots of blink features to be merged to Flink in a reasonably
>>> short time. It greatly
>>> accelerates the evolution speed of Table & SQL.
>>> 
>>> Everything comes with a cost, as you said, right now we are facing the
>>> overhead of maintaining
>>> two planners and it causes bugs and also increases imbalance between these
>>> two planners. As
>>> a developer and also for the good of all Table & SQL users, I also think
>>> it's better for us to be more
>>> focused on a single planner.
>>> 
>>> Your proposed roadmap looks good to me, +1 from my side and thanks
>>> again for all your efforts!
>>> 
>>> Best,
>>> Kurt
>>> 
>>> 
>>> On Thu, Feb 25, 2021 at 5:01 PM Timo Walther  wrote:
>>> 
 Hi everyone,
 
 since Flink 1.9 we have supported two SQL planners. Most of the original
 plan of FLIP-32 [1] has been implemented. The Blink code merge has been
 completed and many additional features have been added exclusively to
 the new planner. The new planner is now in a much better shape than the
 legacy one.
 
 In order to avoid user confusion, reduce duplicate code, and improve
 maintainability and testing times of the Flink project as a whole we
 would like to propose the following steps to complete FLIP-32:
 
 In Flink 1.13:
 - Deprecate the `flink-table-planner` module
 - Deprecate `BatchTableEnvironment` for both Java, Scala, and Python
 
 In Flink 1.14:
 - Drop `flink-table-planner` early
 - Drop many deprecated interfaces and API on demand
 - Rename `flink-table-planner-blink` to `flink-table-planner`
 - Rename `flink-table-runtime-blink` to `flink-table-runtime`
 - Remove references of "Blink" in the code base
 
 This will have an impact on users that still use DataSet API together
 with Table API. With this change we will not support converting between
 DataSet API and Table API anymore. We hope to compensate the missing
 functionality in the new unified TableEnvironment and/or the batch mode
 in DataStream API during 1.14 and 1.15. For this, we are looking for
 further feedback which features are required in Table API/DataStream API
 to have a smooth migration path.
 
 Looking forward to your feedback.
 
 Regards,
 Timo
 
 [1]
 
 
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions
 
>>> 
> 



Re: Flink SQL temporal table join with Hive 报错

2021-02-19 文章 Leonard Xu

> 
>  二,维表有有分区,每个分区仅仅包含当天的数据,没有 primary key
> 
>   这种情况因为要 Join 全部的数据,所以还是需要设置 'streaming-source.partition.include' = 
> 'all',但是还是因为没有 primary Key,所以无法 run。
> 
> 现在就是针对第二种情况,因为Hive的维度表不是我维护的,很多人都在用,所以不能修改去加上 primary key,无法进行 join.

第二种情况,hive表不是streaming读的,相当于是一张静态表,每次都是加载最新的全量,所以配置如下参数即可:
  'streaming-source.enable' = 'false',           -- option with default value, can be ignored.
  'streaming-source.partition.include' = 'all',  -- option with default value, can be ignored.
  'lookup.join.cache.ttl' = '12 h'
其中 'streaming-source.partition.include' = 'all' 是默认值,也可以不配,参考【1】
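
把整张 Hive 表作为维表做 temporal join 的完整示意(表名、字段均为假设,参数也可以像下面这样通过 SQL hint 在作业里覆盖):

SELECT o.order_id, d.dim_name
FROM orders AS o
JOIN hive_dim
  /*+ OPTIONS('streaming-source.enable'='false',
              'streaming-source.partition.include'='all',
              'lookup.join.cache.ttl'='12 h') */
  FOR SYSTEM_TIME AS OF o.proctime AS d
  ON o.dim_id = d.id;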
> 
> 
> 还有我看文档现在不支持 event time join, 官网的汇率是按照 process time 
> join,但是如果要回溯昨天的数据的时候,其实就会有问题。
> 
> 我看 FLIP-132 
> 
>  有提到 Event Time semantics, 这是以后回支持的吗?

Kafka connector已经支持了 event time join, 但hive表目前还不支持在上面声明watermark,所以还不支持


祝好,
Leonard
[1] 
https://ci.apache.org/projects/flink/flink-docs-master/docs/connectors/table/hive/hive_read_write/#temporal-join-the-latest-table

> 
> 
> macia kk mailto:pre...@gmail.com>> 于2021年2月8日周一 下午6:53写道:
> Hi. Leonard
> 
>   麻烦帮忙看下 Flink 邮件里的这个问题,卡了我很久了,谢谢



Re: Flink SQL temporal table join with Hive 报错

2021-02-09 文章 Leonard Xu
Hi,  macia

> 在 2021年2月9日,10:40,macia kk  写道:
> 
> SELECT *FROM
>(
>SELECT  tt.*
>FROM
>input_tabe_01 tt
>FULL OUTER JOIN input_tabe_02 mt
>ON (mt.transaction_sn = tt.reference_id)
>and tt.create_time >= mt.create_time + INTERVAL '5' MINUTES
>and tt.create_time <= mt.create_time - INTERVAL '5' MINUTES
>WHERE COALESCE(tt.create_time, mt.create_time) is not NULL
>) lt
>LEFT JOIN exchange_rate ex
>/*+ 
> OPTIONS('streaming-source.enable'='true','streaming-source.partition.include'
> = 'all') */
>FOR SYSTEM_TIME AS OF lt.event_time ex ON DATE_FORMAT
> (lt.event_time, '-MM-dd') = cast(ex.date_id as String)


你说的异常我本地没有复现,异常栈能直接贴下吗?

另外看你写的是lt.event_time, 
这个sql的是要做版本表的维表关联吗?目前Hive还不支持指定版本表的,只支持最新分区作为维表或者整个hive表作为维表,
两种维表的option你可参考下[1]

祝好,
Leonard
[1] 
https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connectors/hive/hive_read_write.html#temporal-join-the-latest-table





Re: Flink SQL Hive 使用 partition.include 结果跟文档不一致

2021-02-04 文章 Leonard Xu
Hi

> 在 2021年2月5日,09:47,macia kk  写道:
> 
> the `latest` only works` when the
> streaming hive source table used as temporal table. 

只能用在temporal(时态)表中,时态表只能在 temporal join(也就是我们常说的维表join) 中使用

祝好

Re: 是否可以 hive 流 join hive 流?

2021-01-31 文章 Leonard Xu
还没有,你可以关注下这个issue[1]

祝好,
Leonard
[1] https://issues.apache.org/jira/browse/FLINK-21183

> 在 2021年2月1日,13:29,macdoor  写道:
> 
> 当前的 1.13-snapshot 支持了吗?我可以试试吗?
> 
> 
> 
> --
> Sent from: http://apache-flink.147419.n8.nabble.com/



Re: 是否可以 hive 流 join hive 流?

2021-01-31 文章 Leonard Xu
Okay, 和我理解的一样,这个时间上是 event time, 基于event time的 interval join 
需要定义watermark,目前hive表还不支持定义watermark,1.13应该会支持。



> 在 2021年2月1日,10:58,macdoor  写道:
> 
> p1.time 是数据记录里的时间,也用这个时间做分区
> 
> 
> 
> --
> Sent from: http://apache-flink.147419.n8.nabble.com/



Re: flink sql时区问题

2021-01-31 文章 Leonard Xu
嗯,flink 中 很多时间函数比如PROCTIME()/CURRENT_TIMESTAMP 返回的值都是 
UTC+0的时间值,这里的timezone设置对这些函数不生效的,这些函数是有点时区问题的,
目前只能在代码里通过加减时区偏移绕过。
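
手动加时区偏移的简单示意(以东八区为例,表名、字段名为假设):

SELECT CURRENT_TIMESTAMP + INTERVAL '8' HOUR AS local_ts
FROM my_table;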

> 在 2021年2月1日,10:50,沉醉寒風 <1039601...@qq.com> 写道:
> 
> 在代码中这样设置 streamTableEnv.getConfig().setLocalTimeZone(ZoneId.of("+8")) 
> 也不管用. 还是要自己手动去加减时间才能做到,方法比较笨,
> 
> 
> 
> 
> --原始邮件--
> 发件人:  
>   "user-zh"   
>  
>  发送时间:2021年2月1日(星期一) 上午10:46
> 收件人:"user-zh" 
> 主题:Re: flink sql时区问题
> 
> 
> 
> Hi,
> 时区不生效在你的代码中是体现在那些地方呀?目前flink sql是有些时区问题,社区也希望在1.13能解决掉。
> 
> 
>  在 2021年2月1日,10:42,沉醉寒風 <1039601...@qq.com 写道:
>  
>  streamTableEnv.getConfig().setLocalTimeZone(ZoneId.of("+8"))



Re: flink sql时区问题

2021-01-31 文章 Leonard Xu
Hi,
时区不生效在你的代码中是体现在那些地方呀?目前flink sql是有些时区问题,社区也希望在1.13能解决掉。


> 在 2021年2月1日,10:42,沉醉寒風 <1039601...@qq.com> 写道:
> 
> streamTableEnv.getConfig().setLocalTimeZone(ZoneId.of("+8"))



Re: 是否可以 hive 流 join hive 流?

2021-01-31 文章 Leonard Xu
Hi,macdoor

很有意思的case,p1.time字段是你记录里的时间吗? 你hive表的分区字段和这个时间字段的关系是怎么样的呀?


> 在 2021年1月30日,17:54,macdoor  写道:
> 
> 具体需求是这样,采集取得的通道总流量5分钟一次存入 hive 表,为了取得 5 分钟内该通道的流量,需要前后2次采集到的总流量相减,我想用同一个 hive
> 表自己相互 join,形成 2 个 hive 流 join,不知道是否可以实现?或者有其他实现方法吗?
> 我现在使用 crontab 定时 batch 模式做,希望能改成 stream 模式
> 
> select p1.traffic -p2.traffic
> from p as p1
> inner join p as p2 on p1.id=p2.id and p1.time=p2.time + interval 5 minutes
> 
> 
> 
> --
> Sent from: http://apache-flink.147419.n8.nabble.com/



Re: 咨询求助

2021-01-31 文章 Leonard Xu


> 在 2021年1月31日,20:15,Appleyuchi  写道:
> 
> 一点小小的建议哈,
> 目前flink社区讨论主要还是java/scala为主,
> 如果执意使用pyflink的话,后续极有可能会遇到较大的阻力.

我理解这种较大阻力应该不存在的,社区里pyflink的投入还挺大的,蛮多开发者的,我也cc两位在这块做贡献的社区开发者,从JIRA上看pyflink相关的开发进度都挺快的。
如果有机器学习,python相关的经验,用pyflink我觉得挺合适的。

祝好,
Leonard

Re: 退订

2021-01-24 文章 Leonard Xu
Hi
需要取消订阅邮件, 可以发送任意内容的邮件到  user-zh-unsubscr...@flink.apache.org   取消订阅来自 
user-zh@flink.apache.org  邮件列表的邮件
邮件列表的订阅管理,请参考[1]

祝好,
Leonard
[1] https://flink.apache.org/community.html#how-to-subscribe-to-a-mailing-list 
 

> 在 2021年1月25日,15:06,Far  写道:
> 
> 退订,为什么不起作用
> 
> 发自我的iPhone
> 
>> 在 2021年1月24日,下午9:55,唐军亮  写道:
>> 



Re: Flink sql去重问题

2021-01-24 文章 Leonard Xu
Hello

特殊的Top-N是说去重的语义是Top 1,所以每个 key 只需要保留一条记录,其底层实现和其他Top-N的数据结构不一样,并不需要维护一个堆;其他的数据根据语义
要么被丢掉,要么撤回旧值并下发新值。另外这种有状态的算子,数据都是放在state里的,设置的TTL是生效的,表示state中数据的有效期是多久,这个有效期会用来判断新来的数据是丢掉还是撤回旧值并下发新的值。
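
Blink planner 去重(Top-1)的典型写法示意(表名、字段名均为假设,按 proc_time 保留每个 id 最早的一条):

SELECT id, name, proc_time
FROM (
  SELECT *,
    ROW_NUMBER() OVER (PARTITION BY id ORDER BY proc_time ASC) AS row_num
  FROM source_table
)
WHERE row_num = 1;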

祝好,
Leonard 



> 在 2021年1月22日,10:53,guaishushu1...@163.com 写道:
> 
> 
> 看到社区文档说是Blink的去重是一种特殊Top-N。经了解Top-N会保留一个最小堆,麻烦问下那其他数据是被机制清除了,还是会保存在内存中呀。用了这个Blink去重还需要用Idle
>  State Retention Time来设置状态的TTL吗?
> 
> guaishushu1...@163.com 


[DISCUSS] FLIP-162: Consistent Flink SQL time function behavior

2021-01-21 文章 Leonard Xu
Hello, everyone

I want to start the discussion of FLIP-162: Consistent Flink SQL time function 
behavior[1]. 
We’ve some initial discussion of several problematic functions in dev mail 
list[2], and I think it's the right time to resolve them by a FLIP.   
 
Currently some time function behaviors are wired to user, user can not get 
local date/time/timestamp in their local time zone for time functions:
CURRENT_DATE
CURRENT_TIME
CURRENT_TIMESTAMP
NOW()
PROCTIME()
Assume user's clock time is '2021-01-20 07:52:52.270' in Beijing time(UTC+8), 
currently the unexpected values are returned when user SELECT above functions 
in Flink SQL client 

Flink SQL> SELECT NOW(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE, CURRENT_TIME;
+-------------------------+-------------------------+-------------------------+--------------+--------------+
|                   NOW() |              PROCTIME() |       CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME |
+-------------------------+-------------------------+-------------------------+--------------+--------------+
| 2021-01-19T23:52:52.270 | 2021-01-19T23:52:52.270 | 2021-01-19T23:52:52.270 |   2021-01-19 | 23:52:52.270 |
+-------------------------+-------------------------+-------------------------+--------------+--------------+

Besides, the window with interval one day width based on PROCTIME() can not 
collect correct data that belongs to the date '2021-01-20', because some data 
was assigned to window '2021-01-19' due to the PROCTIME() does not return local 
TIMESTAMP as user expected. 

These problems come from these time-related functions like PROCTIME(), NOW(), 
CURRENT_DATE, CURRENT_TIME and CURRENT_TIMESTAMP are returning time values 
based on UTC+0 time zone, this is an incorrect behavior from my 
investigation[3].
I Invested all Flink time-related functions and compared with other DB vendors 
like Pg,Presto, Hive, Spark, Snowflake, this topic will lead to a comparison of 
the three types, i.e. 
 TIMESTAMP/TIMESTAMP WITHOUT TIME ZONE
 TIMESTAMP WITH LOCAL TIME ZONE
 TIMESTAMP WITH TIME ZONE
In order to better understand above three types, I wrote a document[4] to help 
understand them better. You will found the behavior of them is same with in 
Hadoop ecosystem from the document.The document is detailed and pretty long, 
it’s necessary to make the semantics clear(You can focus on the FLIP and skip 
the document). 

In one word, to correct the behavior of above functions, we can change the 
function return type or function return value. Both of them are valid because 
SQL:2011 does not specify the function return type, and every SQL engine vendor 
has its own implementation. For example the CURRENT_TIMESTAMP function in the 
document[3], Spark, Presto, Snowflake have different behaviors.

I tend to only change the return value for these problematic functions and 
introduce an option for compatibility consideration, the detailed proposal can 
be found in FLIP-162[1].  
After corrected these function, user can get their expected return values as 
following:

Flink SQL> SELECT NOW(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE, CURRENT_TIME;
+-------------------------+-------------------------+-------------------------+--------------+--------------+
|                   NOW() |              PROCTIME() |       CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME |
+-------------------------+-------------------------+-------------------------+--------------+--------------+
| 2021-01-20T07:52:52.270 | 2021-01-20T07:52:52.270 | 2021-01-20T07:52:52.270 |   2021-01-20 | 07:52:52.270 |
+-------------------------+-------------------------+-------------------------+--------------+--------------+

Looking forward to your feedback.

Best,
Leonard

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-162%3A+Consistent+Flink+SQL+time+function+behavior
[2] 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Correct-time-related-function-behavior-in-Flink-SQL-tc47989.html
 
[3] 
https://docs.google.com/spreadsheets/d/1T178krh9xG-WbVpN7mRVJ8bzFnaSJx3l-eg1EWZe_X4/edit?usp=sharing
[4] 
https://docs.google.com/document/d/1iY3eatV8LBjmF0gWh2JYrQR0FlTadsSeuCsksOVp_iA/edit?usp=sharing
 



Re: [DISCUSS] Correct time-related function behavior in Flink SQL

2021-01-21 文章 Leonard Xu


> Before the changes, as I am writing this reply, the local time here is 
> 2021-01-21 12:03:35 (Beijing time, UTC+8).
> And I tried these 5 functions in sql client, and got:
> 
> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE, CURRENT_TIME;
> +-------------------------+-------------------------+-------------------------+--------------+--------------+
> |                  EXPR$0 |                  EXPR$1 |       CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME |
> +-------------------------+-------------------------+-------------------------+--------------+--------------+
> | 2021-01-21T04:03:35.228 | 2021-01-21T04:03:35.228 | 2021-01-21T04:03:35.228 |   2021-01-21 | 04:03:35.228 |
> +-------------------------+-------------------------+-------------------------+--------------+--------------+
> After the changes, the expected behavior will change to:
> 
> Flink SQL> select now(), PROCTIME(), CURRENT_TIMESTAMP, CURRENT_DATE, CURRENT_TIME;
> +-------------------------+-------------------------+-------------------------+--------------+--------------+
> |                  EXPR$0 |                  EXPR$1 |       CURRENT_TIMESTAMP | CURRENT_DATE | CURRENT_TIME |
> +-------------------------+-------------------------+-------------------------+--------------+--------------+
> | 2021-01-21T12:03:35.228 | 2021-01-21T12:03:35.228 | 2021-01-21T12:03:35.228 |   2021-01-21 | 12:03:35.228 |
> +-------------------------+-------------------------+-------------------------+--------------+--------------+
> The return type of now(), proctime() and CURRENT_TIMESTAMP still be TIMESTAMP;

To Kurt, thanks for the intuitive case, it is really clear. You're right that I 
want to propose to change the return value of these functions. It's the most 
important part of the topic from the user's perspective.

> I think this definitely deserves a FLIP.
To Jark,  nice suggestion, I prepared a FLIP for this topic, and will start the 
FLIP discussion soon.

>> If use the default Flink SQL, the window time range of the
>> statistics is incorrect, then the statistical results will naturally be
>> incorrect.
To zhisheng, sorry to hear that this problem influenced your production jobs,  
Could you share your SQL pattern?  we can have more inputs and try to resolve 
them.


Best,
Leonard



> On Tue, Jan 19, 2021 at 6:42 PM Leonard Xu  <mailto:xbjt...@gmail.com>> wrote:
> I found above example format may mess up in different mail client, I post a 
> picture here[1].
> 
> Best,
> Leonard
> 
> [1] https://github.com/leonardBang/flink-sql-etl/blob/master/etl-job/src/main/resources/pictures/CURRRENT_TIMESTAMP.png
> 
> > 在 2021年1月19日,16:22,Leonard Xu  > <mailto:xbjt...@gmail.com>> 写道:
> > 
> > Hi, all
> > 
> > I want to start the discussion about correcting time-related function 
> > behavior in Flink SQL, this is a tricky topic but I think it’s time to 
> > address it. 
> > 
> > Currently some temporal function behaviors are wired to users.
> > 1.  When users use a PROCTIME() in SQL, the value of PROCTIME() has a 
> > timezone offset with the wall-clock time in users' local time zone, users 
> > need to add their local time zone offset manually to get expected local 
> > timestamp(e.g: Users in Germany need to +1h to get expected local 
> > timestamp). 
> > 
> > 2. Users can not use CURRENT_DATE/CURRENT_TIME/CURRENT_TIMESTAMP  to get 
> > wall-clock timestamp in local time zone, and thus they need write UDF in 
> > their SQL just for implementing a simple filter like WHERE date_col =  
> > CURRENT_DATE. 
> > 
> > 3. Another common case  is the time window  with day interval based on 
> > PROCTIME(), user plan to put all data from one day into the same window, 
> > but the window is assigned using timestamp in UTC+0 timezone rather than 
> > the session timezone which leads to the window starts with an offset(e.g: 
> > Users in China need to add -8h in their business sql start and then +8h 
> > when output the result, the conversion like a magic for users). 
> > 
> > These problems come from that lots of time-related functions like 
> > PROCTIME(), NOW(), CURRENT_DATE, CURRENT_TIME and CURRENT_TIMESTAMP are 
> > returning time values based on UTC+0 time zone.
> > 
>

Re: Flink SQL kafka connector有办法获取到partition、offset信息嘛?

2021-01-21 文章 Leonard Xu

> 看了下,是1.12才开始支持么,1.11是不行的嘛?
是的,1.11不支持,文档也是有版本的,如果对应版本的文档里没有该功能介绍,那就是不支持的。
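
1.12 及以后的版本可以通过 metadata 列拿到 partition、offset 等元数据,DDL 示意如下(topic、字段均为假设):

CREATE TABLE kafka_source (
  `part`   INT    METADATA FROM 'partition' VIRTUAL,
  `offset` BIGINT METADATA VIRTUAL,
  id   BIGINT,
  name STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'my_topic',
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'test-group',
  'scan.startup.mode' = 'earliest-offset',
  'format' = 'json'
);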






Re: 【Flink SQL】维表优化规则建议

2021-01-10 文章 Leonard Xu
Hi,
这个异常信息可以提升,准确说是需要主键和 event-time 时间属性,你的自定义维表是同时支持lookup和scan的吗?
如果确定问题的话,可以在社区建个JIRA来提升下这个异常信息。

祝好
Leonard Xu

> 在 2021年1月9日,09:39,张韩  写道:
> 
> 版本:1.12
> 问题:维表关联若是支持事件时间,维表需要有主键和时间属性,在满足这两个条件前提下,自定义维表若是实现LookupTableSource接口则优化会报异常:
> Caused by: org.apache.calcite.plan.RelOptPlanner$CannotPlanException: There 
> are not enough rules to produce a node with desired properties: 
> convention=STREAM_PHYSICAL, FlinkRelDistributionTraitDef=any, 
> MiniBatchIntervalTraitDef=None: 0, ModifyKindSetTraitDef=[NONE], 
> UpdateKindTraitDef=[NONE].
> Missing conversion is FlinkLogicalTableSourceScan[convention: LOGICAL -> 
> STREAM_PHYSICAL]
> 建议:LookupTableSource的维表关联需是系统时间,在优化规则LogicalCorrelateToJoinFromTemporalTableRule中把这种限制信息提示出来是不是更好些



Re: flink-sql流平台工具

2020-12-29 文章 Leonard Xu
感谢分享!
看起来很nice的平台化实践,star 了.

> 在 2020年12月29日,21:09,zhp <499348...@qq.com> 写道:
> 
> 本人业余时间开发了一个基于flink-sql 的web 可视化ui平台供大家学习、参考、使用 
> https://github.com/zhp8341/flink-streaming-platform-web
>   
> 
> 
> 
> --
> Sent from: http://apache-flink.147419.n8.nabble.com/



Re: flink1.12支持hbase1.2吗

2020-12-29 文章 Leonard Xu
Hi,
hbase-1.2社区没测试过,社区支持的是hbase-1.4.x 和 hbase-2.3.x,
你可以用hbase-1.4.x的connector试下,connector使用到Hbase的API不多,1.4.x 和 1.2.x 应该是兼容的

祝好,
Leonard
> 在 2020年12月29日,21:12,zhp <499348...@qq.com> 写道:
> 
> flink1.12支持hbase1.2吗
> 
> 
> 
> --
> Sent from: http://apache-flink.147419.n8.nabble.com/



Re: pyflink1.12 使用connector read.query参数报错

2020-12-24 文章 Leonard Xu
Hi, 嘉伟  

1.12 应该不支持 read.query, 社区还在讨论是否要开放这个,有些concern, 简单的讲,就如你这个query写的,创建的这张JDBC 
表应该是一个 View 而不是对应一张JDBC 表,同时这个表只能用来作为source,不能用来作为sink。

祝好,
Leonard

> 在 2020年12月24日,19:16,冯嘉伟 <1425385...@qq.com> 写道:
> 
> hi! 试试这个
> 
> CREATE TABLE source_table(
>yldrate DECIMAL,
>pf_id VARCHAR,
>symbol_id VARCHAR) WITH(
>'connector' = 'jdbc',
>'url' = 'jdbc:mysql://ip/db',
>'driver' = 'com.mysql.cj.jdbc.Driver',
>'username' = 'xxx',
>'password' = 'xxx',
>'table-name' = 'TS_PF_SEC_YLDRATE',
>'read.query' = 'SELECT YLDRATE, PF_ID, SYMBOL_ID FROM
> TS_PF_SEC_YLDRATE LEFT JOIN TP_GL_DAY ON DAY_ID = BIZ_DATE WHERE CCY_TYPE =
> "AC" AND PF_ID = "1030100122" AND SYMBOL_ID = "2030004042" AND BIZ_DATE
> between "20160701" AND "20170307"'
>)
> 
> 
> 
> --
> Sent from: http://apache-flink.147419.n8.nabble.com/



Re: Flink-1.12支持kafka join jdbc维表吗

2020-12-20 文章 Leonard Xu
>  
> 这么讲的话算是比较清晰了,所以如果想要基于事件时间进行jdbc维表Join,首先需要将jdbc维表的changelog数据接入kafka再进行join,这也是官网给的例子,对吗?

是的


> 你说的这种方式就是好像基于处理时间的join~
是的,基于处理时间的维表join和大家熟知的lookup关联, 
语法都是一样的,因为两者语义是一样的,就是在运行时关联最新的维表数据,只是两者实现方式不同,lookup 
关联维表只是一种实现方式,实现方式是运行时每条数据都去查询数据库(语义上就是关联了最新的维表数据),关联维表也有其他的实现方式,比如把维表最新的数据维护放在state里,在运行时每条数据去和state中的数据关联。

祝好
Leonard



>   
> 
> 
> 
> 发件人: Leonard Xu
> 发送时间: 2020-12-21 14:44
> 收件人: user-zh
> 主题: Re: Flink-1.12支持kafka join jdbc维表吗
> Hi 
>> 在给jdbc声明主键和watermark后,使用官网的事件时间join语法,任务启动后发现jdbc算子一次性读取了所有jdbc维表数据,状态变成了finished
> 
> 这是正常的,jdbc 
> connector实现的表就是bounded的,只会scan一次,一次读完,之后数据库表的数据更新是没有去捕捉的,connector也没有很好的办法去实时监控数据库表的更新并广播到下游节点。
> 
> 如果想要有获取实时更新的维表并做基于event time语义的维表关联,那么推荐的方式就是接入数据库表的binlog(changelog), 用主流去 
> temporal join changelog流 实现关联维表的准确版本。
> 
> 另外,如果对维表实时性要求不高,也没有延迟join的需求,你也可以用运行时lookup的方式完成维表join(即:JOIN latest_rates 
> FOR SYSTEM_TIME AS OF o.proctime),这种方式是在每条主流的数据到来时去查询一次外部DB,这种方式支持你参数中写的   
> 'lookup.cache.max-rows' = '1',   'lookup.cache.ttl' = ‘1min’ 优化。
> 
> 
> 祝好,
> Leonard 
> 



Re: Flink-1.12支持kafka join jdbc维表吗

2020-12-20 文章 Leonard Xu
Hi 
> 在给jdbc声明主键和watermark后,使用官网的事件时间join语法,任务启动后发现jdbc算子一次性读取了所有jdbc维表数据,状态变成了finished

这是正常的,jdbc 
connector实现的表就是bounded的,只会scan一次,一次读完,之后数据库表的数据更新是没有去捕捉的,connector也没有很好的办法去实时监控数据库表的更新并广播到下游节点。

如果想要有获取实时更新的维表并做基于event time语义的维表关联,那么推荐的方式就是接入数据库表的binlog(changelog), 用主流去 
temporal join changelog流 实现关联维表的准确版本。

另外,如果对维表实时性要求不高,也没有延迟join的需求,你也可以用运行时lookup的方式完成维表join(即:JOIN latest_rates FOR 
SYSTEM_TIME AS OF o.proctime),这种方式是在每条主流的数据到来时去查询一次外部DB,这种方式支持你参数中写的   
'lookup.cache.max-rows' = '1',   'lookup.cache.ttl' = ‘1min’ 优化。


祝好,
Leonard 



Re: Pyflink1.12尝试连接oracle数据,报错findAndCreateTableSource failed

2020-12-16 文章 Leonard Xu
目前 JDBC connector 只支持 MySQL, Pg和Derby(一般测试用)这几种dialect, Oracle还不支持。

祝好,
Leonard

> 在 2020年12月17日,09:47,肖越 <18242988...@163.com> 写道:
> 
> pyflink小白,测试pyflink1.12功能,想通过定义connector从oracle数据库中获取数据
> 通过如下方式定义:
> env = StreamExecutionEnvironment.get_execution_environment()
> env.set_parallelism(1)
> env = StreamTableEnvironment \
>.create(env, environment_settings=EnvironmentSettings
>.new_instance()
>.use_blink_planner().build())
> source_ddl1 = """
>CREATE TABLE source_table (id BIGINT,pf_id VARCHAR,\
>tree_id VARCHAR,node_id VARCHAR,biz_date DATE,\
>ccy_type VARCHAR,cur_id_d VARCHAR,tldrate DOUBLE,\
>is_valid INT,time_mark TIMESTAMP) WITH (
>'connector.type' = 'jdbc',
>'connector.url' = 'jdbc:oracle:thin:@ip:dbnamel',
>'connector.table' = 'ts_pf_ac_yldrate',
>'connector.driver' = 'oracle.jdbc.driver.OracleDriver',
>'connector.username' = 'xxx',
>'connector.password' = 'xxx')
>"""
> sql = "select pf_id,biz_date from source_table where biz_date='20160701' "
> env.sql_update(source_ddl1)
> table = env.sql_query(sql)
> env.execute("flink_test")
> 报错信息:
>raise java_exception
> pyflink.util.exceptions.TableException: findAndCreateTableSource failed.
> at 
> org.apache.flink.table.factories.TableFactoryUtil.findAndCreateTableSource(TableFactoryUtil.java:49)
> at 
> org.apache.flink.table.planner.plan.schema.LegacyCatalogSourceTable.findAndCreateLegacyTableSource(LegacyCatalogSourceTable.scala:193)
> at 
> org.apache.flink.table.planner.plan.schema.LegacyCatalogSourceTable.toRel(LegacyCatalogSourceTable.scala:94)
> at 
> org.apache.calcite.sql2rel.SqlToRelConverter.toRel(SqlToRelConverter.java:3585)
> at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertIdentifier(SqlToRelConverter.java:2507)
> at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2144)
> at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2093)
> at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertFrom(SqlToRelConverter.java:2050)
> at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:663)
> at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:644)
> at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:3438)
> at 
> org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:570)
> at 
> org.apache.flink.table.planner.calcite.FlinkPlannerImpl.org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$rel(FlinkPlannerImpl.scala:165)
> at 
> org.apache.flink.table.planner.calcite.FlinkPlannerImpl.rel(FlinkPlannerImpl.scala:157)
> at 
> org.apache.flink.table.planner.operations.SqlToOperationConverter.toQueryOperation(SqlToOperationConverter.java:823)
> at 
> org.apache.flink.table.planner.operations.SqlToOperationConverter.convertSqlQuery(SqlToOperationConverter.java:795)
> at 
> org.apache.flink.table.planner.operations.SqlToOperationConverter.convert(SqlToOperationConverter.java:250)
> at 
> org.apache.flink.table.planner.delegation.ParserImpl.parse(ParserImpl.java:78)
> at 
> org.apache.flink.table.api.internal.TableEnvironmentImpl.sqlQuery(TableEnvironmentImpl.java:639)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
> at 
> org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> at org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282)
> at 
> org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
> at 
> org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)
> at 
> org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)
> at java.lang.Thread.run(Thread.java:748)



Re: A group window expects a time attribute for grouping in a stream environment谢谢

2020-12-10 文章 Leonard Xu

> ①请问输出结果里面的"+"是什么意思,不应该是2017吗?
时间戳格式而已,可以检查下你的时间数据

> 最后的999是啥意思???
代表窗口结束时间,即timestamp(3)精度下小于window边界的时间

> 
> 
> ②代码中有两处时间限定
> .window(Tumble.over(lit(5).minutes())
> $("user"),
> $("w").start(),//输出很奇怪
> $("w").end(),
> $("w").rowtime(),
> 请问这两处时间限定有什么区别吗?
> 是不是
> 前者是"全局范围限定"?
> 后者是在前者限定的基础上做进一步限定?
> 如果是的话,end里面是否可以设定时间戳?
> -
没看出你说的两处时间限定,你是不是理解偏了,你可以先看下关于窗口和时间的文档[1]。

祝好,
Leonard
[1] https://ci.apache.org/projects/flink/flink-docs-master/zh/dev/stream/operators/windows.html#tumbling-windows
[2] https://ci.apache.org/projects/flink/flink-docs-master/zh/dev/event_time.html







> -
> 昨天的完整代码是:
> https://paste.ubuntu.com/p/9JsFDKC5V8/
> 
> 
> ~!!!
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 在 2020-12-10 12:02:31,"Leonard Xu"  写道:
>> Hi,
>> 补充下昨天线下提供的答疑,在从datastream 转换到Table时,如果希望转换后的Table上能够继续使用watermark,
>> 需要(1)让datastream中包含watermark (2)在table上声明event time 属性. 文档可以参考[1]
>> 
>> 给出文档中省略的watermark生成部分code:
>> 
>>   // 老版本
>> //Table orders = 
>> tEnv.fromDataStream(orderA.assignTimestampsAndWatermarks(new 
>> AscendingTimestampExtractor() {
>> //@Override
>> //public long extractAscendingTimestamp(Order element) {
>> //return element.rowtime;
>> //}
>> //}), $("user"), $("product"), 
>> $("amount"),$("rowtime").rowtime());
>>   // 新版本
>>   Table orders = 
>> tEnv.fromDataStream(orderA.assignTimestampsAndWatermarks(WatermarkStrategy.forGenerator((ctx)
>>  -> new WatermarkGenerator() {
>>   @Override
>>   public void onEvent(Order order, long eventTimestamp, 
>> WatermarkOutput watermarkOutput) {
>>   watermarkOutput.emitWatermark(new 
>> Watermark(eventTimestamp));
>>   }
>>   @Override
>>   public void onPeriodicEmit(WatermarkOutput watermarkOutput) {
>>   }
>>   }))
>>   , $("user"), $("product"), $("amount"),$("rowtime").rowtime());
>> 如果代码不多,可以直接贴在邮件中哈。
>> 
>> 
>> 祝好,
>> Leonard
>> [1] 
>> https://ci.apache.org/projects/flink/flink-docs-release-1.12/zh/dev/table/streaming/time_attributes.html#%E5%9C%A8-datastream-%E5%88%B0-table-%E8%BD%AC%E6%8D%A2%E6%97%B6%E5%AE%9A%E4%B9%89-1
>> 
>>> 在 2020年12月10日,11:10,Jark Wu  写道:
>>> 
>>> 链接错了。重发下。
>>> 
>>> 1) 所谓时间属性,不是指 timestamp 类型的字段,而是一个特殊概念。可以看下文档如果声明时间属性:
>>> https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/streaming/time_attributes.html
>>> 2) 你的代码好像也不对。 L45: Table orders = tEnv.from("Orders"); 没看到你有注册过 "Orders"
>>> 表。这一行应该执行不成功把。
>>> 
>>> Best,
>>> Jark
>>> 
>>> On Thu, 10 Dec 2020 at 11:09, Jark Wu  wrote:
>>> 
>>>> 1). 所谓时间属性,不是指 timestamp 类型的字段,而是一个特殊概念。可以看下文档如果声明时间属性:
>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/streaming/time_attributes.html
>>>> 2. 你的代码好像也不对。 L45: Table orders = tEnv.from("Orders"); 没看到你有注册过 "Orders"
>>>> 表。这一行应该执行不成功把。
>>>> 
>>>> Best,
>>>> Jark
>>>> 
>>>> On Wed, 9 Dec 2020 at 15:44, Appleyuchi  wrote:
>>>> 
>>>>> 代码是:
>>>>> https://paste.ubuntu.com/p/gVGrj2V7ZF/
>>>>> 报错:
>>>>> A group window expects a time attribute for grouping in a stream
>>>>> environment.
>>>>> 但是代码的数据源中已经有时间属性了.
>>>>> 请问应该怎么修改代码?
>>>>> 谢谢
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>> 


