It was not clear to me that JdbcInputFormat was part of the DataSet API.
Now I understand.
Thank you.
On Fri, Jun 18, 2021 at 5:23 AM Timo Walther wrote:
Hi Marco,
as Robert already mentioned, the BatchTableEnvironment is simply built
on top of the DataSet API; partitioning functionality is also available
in the DataSet API.
So using the JdbcInputFormat directly should work in the DataSet API.
Otherwise I would recommend using some initial pipeline
I need to bootstrap a keyed process function.
So, I was hoping to use the Table SQL API because I thought it could
parallelize the work more efficiently via partitioning.
I need to bootstrap keyed state for a keyed process function, with
Flink 1.12.1, thus I think I am required to use the DataSet
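For reference, a minimal sketch of bootstrapping keyed state with the State Processor API on Flink 1.12 (which is DataSet-based), following the pattern from the official docs. The operator uid, state name, savepoint path, and the Account type are illustrative placeholders; the DataSet source would be replaced by the real JDBC read.

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.state.api.BootstrapTransformation;
import org.apache.flink.state.api.OperatorTransformation;
import org.apache.flink.state.api.Savepoint;
import org.apache.flink.state.api.functions.KeyedStateBootstrapFunction;

public class BootstrapKeyedState {

    // Placeholder record; in practice this would come from the JDBC read.
    public static class Account {
        public long id;
        public double balance;
    }

    // Writes one ValueState entry per key into the savepoint.
    static class AccountBootstrapper extends KeyedStateBootstrapFunction<Long, Account> {
        transient ValueState<Double> balance;

        @Override
        public void open(Configuration parameters) {
            balance = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("balance", Double.class));
        }

        @Override
        public void processElement(Account account, Context ctx) throws Exception {
            balance.update(account.balance);
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Replace with the actual JDBC-backed DataSet.
        DataSet<Account> accounts = env.fromElements(new Account());

        BootstrapTransformation<Account> transformation = OperatorTransformation
                .bootstrapWith(accounts)
                .keyBy(account -> account.id)
                .transform(new AccountBootstrapper());

        Savepoint.create(new MemoryStateBackend(), 128) // 128 = max parallelism
                .withOperator("my-keyed-operator-uid", transformation)
                .write("file:///tmp/bootstrap-savepoint");

        env.execute("bootstrap");
    }
}
```

The written savepoint can then be restored by the streaming job, provided the uid and state descriptor match the keyed process function's operator.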
Hi Marco,
which operations do you want to execute in the bootstrap pipeline?
Maybe you don't need to use SQL and the old planner. At least this would
reduce the friction of going through another API layer.
The JDBC connector can be used directly in the DataSet API as well.
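A rough sketch of what a direct JdbcInputFormat read could look like in the DataSet API, including a numeric-range parameters provider to split the scan into parallel partitions. The URL, credentials, table, column names, and bounds are placeholders, not values from this thread.

```java
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
import org.apache.flink.connector.jdbc.JdbcInputFormat;
import org.apache.flink.connector.jdbc.split.JdbcNumericBetweenParametersProvider;
import org.apache.flink.types.Row;

public class JdbcDataSetRead {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Schema of the rows produced by the query below.
        RowTypeInfo rowType = new RowTypeInfo(
                BasicTypeInfo.LONG_TYPE_INFO,     // id
                BasicTypeInfo.STRING_TYPE_INFO);  // payload

        JdbcInputFormat inputFormat = JdbcInputFormat.buildJdbcInputFormat()
                .setDrivername("org.postgresql.Driver")
                .setDBUrl("jdbc:postgresql://localhost:5432/mydb")
                .setUsername("user")
                .setPassword("secret")
                // The BETWEEN placeholders are filled per split by the
                // parameters provider, so splits can be read in parallel.
                .setQuery("SELECT id, payload FROM my_table WHERE id BETWEEN ? AND ?")
                .setParametersProvider(
                        new JdbcNumericBetweenParametersProvider(0, 1_000_000)
                                .ofBatchNum(8)) // 8 range splits
                .setRowTypeInfo(rowType)
                .finish();

        DataSet<Row> rows = env.createInput(inputFormat);
        rows.print();
    }
}
```

This is the same partitioned-read idea the SQL connector exposes, just configured programmatically.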
Regards,
Timo
Thank you very much!
I tried using Flink's SQL JDBC connector, and ran into issues. According
to the Flink documentation, only the old planner is compatible with the
DataSet API.
When I connect to the table:
CREATE TABLE my_table (
) WITH (
'connector.type' = 'jdbc',
'connector.url'
Hi Marco,
The DataSet API will not run out of memory, as it spills to disk when the
data no longer fits in memory.
Load is distributed by partitioning data.
Giving you advice depends a bit on the use-case. I would explore two major
options:
a) reading the data from postgres using Flink's SQL JDBC
I must bootstrap state from Postgres (approximately 200 GB of data), and I
notice that the State Processor API requires the DataSet API in order to
bootstrap state for the DataStream API.
I wish there were a way to use the SQL API with a partitioned scan, but I
don't know if that is even possible
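A partitioned scan may in fact be possible with the legacy (old-planner-compatible) JDBC connector via its 'connector.read.partition.*' options. A hedged sketch, assuming the old planner's BatchTableEnvironment on Flink 1.12; the URL, table, column, and bound values below are placeholders:

```java
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.BatchTableEnvironment;

public class PartitionedSqlScan {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Old-planner table environment, interoperable with the DataSet API.
        BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);

        tEnv.executeSql(
                "CREATE TABLE my_table (" +
                "  id BIGINT," +
                "  payload STRING" +
                ") WITH (" +
                "  'connector.type' = 'jdbc'," +
                "  'connector.url' = 'jdbc:postgresql://localhost:5432/mydb'," +
                "  'connector.table' = 'my_table'," +
                "  'connector.driver' = 'org.postgresql.Driver'," +
                // Split the scan into 8 parallel range queries over 'id'.
                "  'connector.read.partition.column' = 'id'," +
                "  'connector.read.partition.num' = '8'," +
                "  'connector.read.partition.lower-bound' = '0'," +
                "  'connector.read.partition.upper-bound' = '1000000'" +
                ")");
    }
}
```

The resulting Table could then be converted back to a DataSet with the old planner's bridge for use with the State Processor API.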