Re: RE: Re: RE: Missing binlog file issue

2024-01-21 Posted by wyk



Hello,
  I can confirm that the GTID option is enabled on both of our MySQL replicas, and we have reproduced the problem. Reproduction steps:
Flink version: 1.14.5
flink-connector-mysql-cdc version: 2.2.0
MySQL version: 5.6.0


1. Prepare two replicas whose binlog file names are far apart, with no overlap.
2. Capture from replica 1; once data is being written normally, stop the CDC job and take a savepoint normally.
3. Change the MySQL connection configuration to replica 2 and restart the Flink job from the savepoint; the problem reported above then appears.
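For reference, a minimal sketch of the DataStream job, with placeholder connection values (replica2.example.com, app_db.orders, etc. are not our real settings); step 3 amounts to changing the hostname below from replica 1 to replica 2 before restarting from the savepoint:

import com.ververica.cdc.connectors.mysql.source.MySqlSource;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ReplicaSwitchRepro {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(10_000); // needed so savepoints can be taken

        // All connection values are placeholders. Step 3 of the repro only
        // changes the hostname from replica 1 to replica 2 before restarting
        // the job from the savepoint taken in step 2.
        MySqlSource<String> source = MySqlSource.<String>builder()
                .hostname("replica2.example.com") // was replica1.example.com
                .port(3306)
                .username("flink_cdc")
                .password("******")
                .databaseList("app_db")
                .tableList("app_db.orders")
                .deserializer(new JsonDebeziumDeserializationSchema())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "mysql-cdc-source").print();
        env.execute("replica-switch-repro");
    }
}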

On 2024-01-19 20:36:10, "Jiabao Sun" wrote:
>Hi,
>
>Does the log contain any GTID-related entries?
>Could you run SHOW VARIABLES LIKE 'gtid_mode'; to confirm that GTID is enabled?
>
>Best,
>Jiabao
>
>
>On 2024/01/19 09:36:38 wyk wrote:
>> 
>> Sorry, the exact error and code are as follows:
>> 
>> 
>> 报错部分:
>> Caused by: java.lang.IllegalStateException: The connector is trying to read binlog starting at Struct{version=1.5.4.Final,connector=mysql,name=mysql_binlog_source,ts_ms=1705645599953,db=,server_id=0,file=mysql_bin.007132,pos=729790304,row=0}, but this is no longer available on the server. Reconfigure the connector to use a snapshot when needed.
>> at com.ververica.cdc.connectors.mysql.debezium.task.context.StatefulTaskContext.loadStartingOffsetState(StatefulTaskContext.java:179)
>> at com.ververica.cdc.connectors.mysql.debezium.task.context.StatefulTaskContext.configure(StatefulTaskContext.java:112)
>> at com.ververica.cdc.connectors.mysql.debezium.reader.BinlogSplitReader.submitSplit(BinlogSplitReader.java:93)
>> at com.ververica.cdc.connectors.mysql.debezium.reader.BinlogSplitReader.submitSplit(BinlogSplitReader.java:65)
>> at com.ververica.cdc.connectors.mysql.source.reader.MySqlSplitReader.checkSplitOrStartNext(MySqlSplitReader.java:170)
>> at com.ververica.cdc.connectors.mysql.source.reader.MySqlSplitReader.fetch(MySqlSplitReader.java:75)
>> at org.apache.flink.connector.base.source.reader.fetcher.FetchTask.run(FetchTask.java:58)
>> at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:142)
>> ... 6 more
>> 
>> Code:
>> if (!isBinlogAvailable(mySqlOffsetContext)) {
>>     throw new IllegalStateException(
>>             "The connector is trying to read binlog starting at "
>>                     + mySqlOffsetContext.getSourceInfo()
>>                     + ", but this is no longer "
>>                     + "available on the server. Reconfigure the connector to use a snapshot when needed.");
>> }
>> 
>> On 2024-01-19 17:33:03, "Jiabao Sun" wrote:
>> >Hi,
>> >
>> >Your images didn't come through. Could you post image-hosting links or paste the code directly?
>> >
>> >Best,
>> >Jiabao
>> >
>> >
>> >On 2024/01/19 09:16:55 wyk wrote:
>> >> 
>> >> 
>> >> Hi all,
>> >> I have a problem with a missing binlog file and would like your advice. Details below:
>> >> 
>> >> 
>> >> Problem description:
>> >> Scenario: our MySQL setup has two replicas: replica 1 and replica 2.
>> >> 1. Replica 1 needs to be decommissioned, so the job has to be migrated to replica 2.
>> >> 2. After taking a savepoint normally, I changed the connection settings to replica 2 and restarted from the savepoint. At that point it reported that the binlog file does not exist; see the error in Figure 1 below.
>> >> 3. Following the error I located the corresponding code (Figure 2 below) and found it is logic that checks whether the binlog file exists. My understanding is that when starting from a GTID we do not need to touch the binlog file at all, so I commented out this code; the job then started from the savepoint normally and data ingestion worked fine.
>> >> 
>> >> 
>> >> 
>> >> 
>> >> Question: is this binlog-file existence check actually needed, or should it be changed to check whether the GTIDs still exist? Looking forward to your reply, thanks.
>> >> 
>> >> 
>> >> Note: the GTIDs of replica 1 and replica 2 are kept consistent.
>> >> 
>> >> 
>> >> 
>> >> 
>> >> Figure 1: (image broken in the archive; the error is pasted above)
>> >> 
>> >> Figure 2: (image broken in the archive; the code is pasted above)
>> 
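To make the suggestion above concrete: a minimal sketch of what a GTID-based availability check could look like, assuming plain JDBC plus the GtidSet class from mysql-binlog-connector-java (already on the connector's classpath via Debezium); isGtidSetAvailable and the surrounding class are hypothetical names, not connector code:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

import com.github.shyiko.mysql.binlog.GtidSet;

public class GtidAvailabilityCheck {

    // Hypothetical alternative to the file-based isBinlogAvailable() check:
    // instead of requiring the exact binlog file from the savepoint to still
    // exist, verify that no GTID recorded in the savepoint has been purged
    // on the (possibly different) server we are now connected to.
    static boolean isGtidSetAvailable(Connection connection, String savedGtidSet) throws Exception {
        try (Statement stmt = connection.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW GLOBAL VARIABLES LIKE 'gtid_purged'")) {
            if (!rs.next()) {
                return false; // server has no GTID support at all
            }
            GtidSet purged = new GtidSet(rs.getString("Value"));
            GtidSet saved = new GtidSet(savedGtidSet);
            // Resuming is safe iff everything the server has already purged
            // was also consumed before the savepoint was taken.
            return purged.isContainedWithin(saved);
        }
    }
}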


RE: Re: RE: How to build a custom non-Debezium full-plus-incremental source connector on Flink CDC 3.x?

2024-01-21 Posted by Jiabao Sun
Hi,

Flink CDC MongoDB connector V1 is implemented on top of mongo-kafka [1], not Debezium.
Flink CDC MongoDB connector V2 is implemented on the incremental snapshot read framework and depends on neither mongo-kafka nor Debezium.
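
For orientation, a sketch of the shape of a Source V2 (FLIP-27) connector; every My* name below is a hypothetical placeholder, and the real work of chunk splitting, offset tracking and reading goes into the stubbed methods:

import org.apache.flink.api.connector.source.Boundedness;
import org.apache.flink.api.connector.source.Source;
import org.apache.flink.api.connector.source.SourceReader;
import org.apache.flink.api.connector.source.SourceReaderContext;
import org.apache.flink.api.connector.source.SourceSplit;
import org.apache.flink.api.connector.source.SplitEnumerator;
import org.apache.flink.api.connector.source.SplitEnumeratorContext;
import org.apache.flink.core.io.SimpleVersionedSerializer;

public class MySource implements Source<String, MySource.MySplit, MySource.MyEnumState> {

    // Hypothetical split: in an incremental-snapshot design it carries either
    // a snapshot chunk range or a stream offset to resume from.
    public static class MySplit implements SourceSplit {
        private final String id;
        public MySplit(String id) { this.id = id; }
        @Override public String splitId() { return id; }
    }

    // Hypothetical enumerator checkpoint: unassigned/assigned splits, phase flag, ...
    public static class MyEnumState {}

    @Override
    public Boundedness getBoundedness() {
        // snapshot splits first, then switch to the change stream
        return Boundedness.CONTINUOUS_UNBOUNDED;
    }

    @Override
    public SourceReader<String, MySplit> createReader(SourceReaderContext ctx) {
        throw new UnsupportedOperationException("sketch: read snapshot chunks, then the change stream");
    }

    @Override
    public SplitEnumerator<MySplit, MyEnumState> createEnumerator(SplitEnumeratorContext<MySplit> ctx) {
        throw new UnsupportedOperationException("sketch: split tables into chunks and assign them");
    }

    @Override
    public SplitEnumerator<MySplit, MyEnumState> restoreEnumerator(
            SplitEnumeratorContext<MySplit> ctx, MyEnumState checkpoint) {
        // Restoring this state after failover is what makes the snapshot
        // phase resumable instead of starting over from scratch.
        throw new UnsupportedOperationException("sketch");
    }

    @Override
    public SimpleVersionedSerializer<MySplit> getSplitSerializer() {
        throw new UnsupportedOperationException("sketch");
    }

    @Override
    public SimpleVersionedSerializer<MyEnumState> getEnumeratorCheckpointSerializer() {
        throw new UnsupportedOperationException("sketch");
    }
}

Because splits and the enumerator checkpoint are serialized into each checkpoint, the snapshot phase can resume after failover instead of re-reading from scratch.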

Best,
Jiabao

[1] https://github.com/mongodb/mongo-kafka


On 2024/01/22 02:57:38 "casel.chen" wrote:
> 
> The Flink CDC MongoDB connector is still implemented based on Debezium, though.
> 
> On 2024-01-22 10:14:32, "Jiabao Sun" wrote:
> >Hi,
> >
> >You can refer to the implementation of the Flink CDC MongoDB connector.
> >
> >Best,
> >Jiabao
> >
> >
> >On 2024/01/22 02:06:37 "casel.chen" wrote:
> >> We have a data source that Debezium does not support, and we need to consume it through Flink SQL as a unified full-plus-incremental stream, so I am considering building it myself on Flink CDC 3.x. From what I can find, most existing Flink CDC source connectors are built on the Debezium library; only the OceanBase and TiDB ones are not, but those two still use the V1 source interface rather than the latest V2 incremental snapshot one, which means the full snapshot phase does not support checkpoints and a failed job has to re-consume from the beginning.
> >> Is there an example of a V2 source connector that is not based on Debezium? Thanks!
> 

Re: RE: How to build a custom non-Debezium full-plus-incremental source connector on Flink CDC 3.x?

2024-01-21 Posted by casel.chen

The Flink CDC MongoDB connector is still implemented based on Debezium, though.

On 2024-01-22 10:14:32, "Jiabao Sun" wrote:
>Hi,
>
>You can refer to the implementation of the Flink CDC MongoDB connector.
>
>Best,
>Jiabao
>
>
>On 2024/01/22 02:06:37 "casel.chen" wrote:
>> We have a data source that Debezium does not support, and we need to consume it through Flink SQL as a unified full-plus-incremental stream, so I am considering building it myself on Flink CDC 3.x. From what I can find, most existing Flink CDC source connectors are built on the Debezium library; only the OceanBase and TiDB ones are not, but those two still use the V1 source interface rather than the latest V2 incremental snapshot one, which means the full snapshot phase does not support checkpoints and a failed job has to re-consume from the beginning.
>> Is there an example of a V2 source connector that is not based on Debezium? Thanks!


RE: How to build a custom non-Debezium full-plus-incremental source connector on Flink CDC 3.x?

2024-01-21 Posted by Jiabao Sun
Hi,

You can refer to the implementation of the Flink CDC MongoDB connector.

Best,
Jiabao


On 2024/01/22 02:06:37 "casel.chen" wrote:
> We have a data source that Debezium does not support, and we need to consume it through Flink SQL as a unified full-plus-incremental stream, so I am considering building it myself on Flink CDC 3.x. From what I can find, most existing Flink CDC source connectors are built on the Debezium library; only the OceanBase and TiDB ones are not, but those two still use the V1 source interface rather than the latest V2 incremental snapshot one, which means the full snapshot phase does not support checkpoints and a failed job has to re-consume from the beginning.
> Is there an example of a V2 source connector that is not based on Debezium? Thanks!

How to build a custom non-Debezium full-plus-incremental source connector on Flink CDC 3.x?

2024-01-21 Posted by casel.chen
We have a data source that Debezium does not support, and we need to consume it through Flink SQL as a unified full-plus-incremental stream, so I am considering building it myself on Flink CDC 3.x. From what I can find, most existing Flink CDC source connectors are built on the Debezium library; only the OceanBase and TiDB ones are not, but those two still use the V1 source interface rather than the latest V2 incremental snapshot one, which means the full snapshot phase does not support checkpoints and a failed job has to re-consume from the beginning.
Is there an example of a V2 source connector that is not based on Debezium? Thanks!

Re: [ANNOUNCE] Apache Flink 1.18.1 released

2024-01-21 Posted by Jing Ge
Thanks Leonard for the feedback! Also thanks @Jark Wu @Chesnay Schepler
and each and every one who worked closely with me for this release. We
made it together!

Best regards,
Jing

On Sun, Jan 21, 2024 at 9:25 AM Leonard Xu  wrote:

> Thanks Jing for driving the release, nice work!
>
> Thanks all who involved this release!
>
> Best,
> Leonard
>
> > On Jan 20, 2024 at 12:01 AM, Jing Ge wrote:
> >
> > The Apache Flink community is very happy to announce the release of
> Apache
> > Flink 1.18.1, which is the first bugfix release for the Apache Flink 1.18
> > series.
> >
> > Apache Flink® is an open-source stream processing framework for
> > distributed, high-performing, always-available, and accurate data
> streaming
> > applications.
> >
> > The release is available for download at:
> > https://flink.apache.org/downloads.html
> >
> > Please check out the release blog post for an overview of the
> improvements
> > for this bugfix release:
> >
> https://flink.apache.org/2024/01/19/apache-flink-1.18.1-release-announcement/
> >
> > Please note: Users that have state compression should not migrate to
> 1.18.1
> > (nor 1.18.0) due to a critical bug that could lead to data loss. Please
> > refer to FLINK-34063 for more information.
> >
> > The full release notes are available in Jira:
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353640
> >
> > We would like to thank all contributors of the Apache Flink community who
> > made this release possible! Special thanks to @Qingsheng Ren @Leonard Xu
> > @Xintong Song @Matthias Pohl @Martijn Visser for the support during this
> > release.
> >
> > A Jira task series based on the Flink release wiki has been created for
> > 1.18.1 release. Tasks that need to be done by PMC have been explicitly
> > created separately. It will be convenient for the release manager to
> reach
> > out to PMC for those tasks. Any future patch release could consider
> cloning
> > it and follow the standard release process.
> > https://issues.apache.org/jira/browse/FLINK-33824
> >
> > Feel free to reach out to the release managers (or respond to this
> thread)
> > with feedback on the release process. Our goal is to constantly improve
> the
> > release process. Feedback on what could be improved or things that didn't
> > go so well are appreciated.
> >
> > Regards,
> > Jing
>
>


Re: [ANNOUNCE] Apache Flink 1.18.1 released

2024-01-21 Posted by Leonard Xu
Thanks Jing for driving the release, nice work!

Thanks all who involved this release!

Best,
Leonard

> On Jan 20, 2024 at 12:01 AM, Jing Ge wrote:
> 
> The Apache Flink community is very happy to announce the release of Apache
> Flink 1.18.1, which is the first bugfix release for the Apache Flink 1.18
> series.
> 
> Apache Flink® is an open-source stream processing framework for
> distributed, high-performing, always-available, and accurate data streaming
> applications.
> 
> The release is available for download at:
> https://flink.apache.org/downloads.html
> 
> Please check out the release blog post for an overview of the improvements
> for this bugfix release:
> https://flink.apache.org/2024/01/19/apache-flink-1.18.1-release-announcement/
> 
> Please note: Users that have state compression should not migrate to 1.18.1
> (nor 1.18.0) due to a critical bug that could lead to data loss. Please
> refer to FLINK-34063 for more information.
> 
> The full release notes are available in Jira:
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12353640
> 
> We would like to thank all contributors of the Apache Flink community who
> made this release possible! Special thanks to @Qingsheng Ren @Leonard Xu
> @Xintong Song @Matthias Pohl @Martijn Visser for the support during this
> release.
> 
> A Jira task series based on the Flink release wiki has been created for
> 1.18.1 release. Tasks that need to be done by PMC have been explicitly
> created separately. It will be convenient for the release manager to reach
> out to PMC for those tasks. Any future patch release could consider cloning
> it and follow the standard release process.
> https://issues.apache.org/jira/browse/FLINK-33824
> 
> Feel free to reach out to the release managers (or respond to this thread)
> with feedback on the release process. Our goal is to constantly improve the
> release process. Feedback on what could be improved or things that didn't
> go so well are appreciated.
> 
> Regards,
> Jing