Hi, Marco ~

It seems what you need is a temporal join on the SQL side: you can define two 
Flink tables for your PostgreSQL ones and join your Kafka stream against them 
[1][3].
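
For illustration, here is a rough sketch in 1.11 SQL; the connection details, 
table names, and columns are made-up placeholders, not taken from your setup:

-- One of your PostgreSQL tables, registered as a JDBC table
-- (URL, table name, and credentials below are placeholders)
CREATE TABLE dim_table (
  id BIGINT,
  payload STRING
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:postgresql://localhost:5432/mydb',
  'table-name' = 'dim_table',
  'username' = '...',
  'password' = '...'
);

-- The Kafka stream, with a processing-time attribute for the lookup
CREATE TABLE kafka_stream (
  id BIGINT,
  amount DOUBLE,
  proc_time AS PROCTIME()
) WITH (
  'connector' = 'kafka',
  'topic' = 'topic-one',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'json'
);

-- Temporal (lookup) join: each stream record is joined against the
-- database state as of its processing time
SELECT s.id, s.amount, d.payload
FROM kafka_stream AS s
JOIN dim_table FOR SYSTEM_TIME AS OF s.proc_time AS d
ON s.id = d.id;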

Flink 1.10 also supports this; the DDL is somewhat different from the 1.11 
one [2].
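
In 1.10, the JDBC table above would instead be declared with the older 
connector.* property keys, roughly like this (same made-up placeholders):

-- 1.10-style DDL for the same JDBC table
CREATE TABLE dim_table (
  id BIGINT,
  payload STRING
) WITH (
  'connector.type' = 'jdbc',
  'connector.url' = 'jdbc:postgresql://localhost:5432/mydb',
  'connector.table' = 'dim_table',
  'connector.username' = '...',
  'connector.password' = '...'
);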

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connectors/jdbc.html#how-to-create-a-jdbc-table
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/connect.html#jdbc-connector
[3] https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/streaming/temporal_tables.html#temporal-table

Best,
Danny Chan
On Aug 5, 2020 at 4:34 AM +0800, Marco Villalobos <mvillalo...@kineteque.com> wrote:
> Let's say that I have:
>
> SQL Query One from data in PostgreSQL (200K records).
> SQL Query Two from data in PostgreSQL (1000 records).
> and Kafka Topic One.
>
> Let's also say that the main data for this Flink job arrives in Kafka Topic One.
>
> If I need SQL Query One and SQL Query Two to run just one time, when the 
> job starts up, and afterwards perhaps store their results in Keyed State or 
> Broadcast State, but the results are not really part of the stream, then 
> what is the best practice for supporting that in Flink?
>
> The Flink job needs to stream data from Kafka Topic One, aggregate it, and 
> perform computations whose business logic requires all of the data from 
> SQL Query One and SQL Query Two.
>
> I am using Flink 1.10.
>
> Am I supposed to query the database before the job is submitted, and then 
> pass the results on as parameters to a function?
> Or am I supposed to use JDBCInputFormat for both queries, create two 
> streams, and somehow connect or broadcast both of them to the main stream 
> that uses Kafka Topic One?
>
> I would appreciate guidance, please. Thank you.
>
> Sincerely,
>
> Marco A. Villalobos
