Hello, Dan

> On Feb 21, 2022, at 9:11 PM, Dan Serb <dan.s...@boatyardx.com> wrote:
> 1. Have a processor that uses Flink JDBC CDC Connector over the table that 
> stores the information I need. (This is implemented currently - working)

Do you mean you have implemented a Flink JDBC connector? The Flink CDC 
Connectors[1] might help you here.
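
For reference, here is a minimal sketch of consuming a MySQL table with the 
mysql-cdc DataStream source; the hostname, port, database, table, and 
credentials below are placeholders you would replace with your own:

import com.ververica.cdc.connectors.mysql.source.MySqlSource;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MySqlCdcExample {
    public static void main(String[] args) throws Exception {
        // Reads an initial snapshot of the table and then its binlog changes.
        MySqlSource<String> source = MySqlSource.<String>builder()
                .hostname("localhost")        // placeholder host
                .port(3306)
                .databaseList("mydb")         // placeholder database
                .tableList("mydb.orders")     // placeholder table
                .username("flinkuser")
                .password("flinkpw")
                .deserializer(new JsonDebeziumDeserializationSchema())
                .build();

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Checkpointing lets the source track its binlog position across failures.
        env.enableCheckpointing(3000);
        env.fromSource(source, WatermarkStrategy.noWatermarks(), "MySQL CDC Source")
           .print();
        env.execute("MySQL snapshot + binlog");
    }
}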


> 2. Find a way to store that Stream Source inside a table inside Flink. (I 
> tried with the approach to create a MySql JDBC Catalog – but apparently, I 
> can only create Postgres Catalog programmatically) – This is the question – 
> What api do I need to use to facilitate saving inside Flink in a SQL Table, 
> the data retrieved by the CDC Source?
> 3. The solution from point 2 needs to be done in a way that I can query that 
> table, for each record I receive in a different Job that has a Kafka Source 
> as the entrypoint.


The Flink JDBC Catalog currently only provides a Postgres implementation, so you 
would need to implement your own catalog, e.g. a MySQL catalog that provides a 
CDC TableSource. You can encapsulate the mysql-cdc source[2] in your catalog 
implementation.
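
Until you have such a catalog, you can get the same effect by declaring the table 
with DDL against the mysql-cdc connector; a custom catalog's getTable() would 
essentially hand back a CatalogTable with the same connector options. A rough 
sketch (table name, schema, and connection options are placeholders):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class MySqlCdcTableExample {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // Table backed by the mysql-cdc connector; a custom MySQL catalog
        // would return an equivalent table definition from getTable().
        tEnv.executeSql(
                "CREATE TABLE orders (" +
                "  order_id INT," +
                "  amount DECIMAL(10, 2)," +
                "  PRIMARY KEY (order_id) NOT ENFORCED" +
                ") WITH (" +
                "  'connector' = 'mysql-cdc'," +
                "  'hostname' = 'localhost'," +
                "  'port' = '3306'," +
                "  'username' = 'flinkuser'," +
                "  'password' = 'flinkpw'," +
                "  'database-name' = 'mydb'," +
                "  'table-name' = 'orders'" +
                ")");

        // The changelog-backed table can now be queried or joined in SQL.
        tEnv.executeSql("SELECT * FROM orders").print();
    }
}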

> I’m just worried that I might need to reuse this data sets from the sql 
> database in future jobs, so this is why I’d like to have something decoupled 
> and available for the entire cluster.

If you want to reuse the data set to avoid capturing the database table multiple 
times, you can send the CDC data to a message queue such as Kafka or Pulsar and 
then consume the changelog from the message queue in different Flink jobs. 
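
For example, one job could materialize the changelog into Kafka with the 
upsert-kafka connector, and the other jobs would read that topic instead of 
capturing MySQL again. A rough sketch (topic name, broker address, and schema 
are placeholders):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CdcToKafkaExample {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // The mysql-cdc table from the previous sketch.
        tEnv.executeSql(
                "CREATE TABLE orders (" +
                "  order_id INT," +
                "  amount DECIMAL(10, 2)," +
                "  PRIMARY KEY (order_id) NOT ENFORCED" +
                ") WITH (" +
                "  'connector' = 'mysql-cdc'," +
                "  'hostname' = 'localhost'," +
                "  'port' = '3306'," +
                "  'username' = 'flinkuser'," +
                "  'password' = 'flinkpw'," +
                "  'database-name' = 'mydb'," +
                "  'table-name' = 'orders'" +
                ")");

        // Changelog topic; downstream jobs declare the same table and read it.
        tEnv.executeSql(
                "CREATE TABLE orders_changelog (" +
                "  order_id INT," +
                "  amount DECIMAL(10, 2)," +
                "  PRIMARY KEY (order_id) NOT ENFORCED" +
                ") WITH (" +
                "  'connector' = 'upsert-kafka'," +
                "  'topic' = 'orders-changelog'," +
                "  'properties.bootstrap.servers' = 'localhost:9092'," +
                "  'key.format' = 'json'," +
                "  'value.format' = 'json'" +
                ")");

        // This job captures MySQL once and publishes the changelog to Kafka.
        tEnv.executeSql("INSERT INTO orders_changelog SELECT * FROM orders");
    }
}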

Hope the above information helps. 

Best,
Leonard
[1] https://ververica.github.io/flink-cdc-connectors/master/content/about.html
[2] https://github.com/ververica/flink-cdc-connectors/blob/master/flink-connector-mysql-cdc/src/main/java/com/ververica/cdc/connectors/mysql/table/MySqlTableSource.java

