Re: Please advise bootstrapping large state

2021-06-18 Thread Marco Villalobos
It was not clear to me that JdbcInputFormat was part of the DataSet API. Now I understand. Thank you. On Fri, Jun 18, 2021 at 5:23 AM Timo Walther wrote: > Hi Marco, > > as Robert already mentioned, the BatchTableEnvironment is simply built > on top of the DataSet API, partitioning

Re: Please advise bootstrapping large state

2021-06-18 Thread Timo Walther
Hi Marco, as Robert already mentioned, the BatchTableEnvironment is simply built on top of the DataSet API; partitioning functionality is also available in the DataSet API. So using the JdbcInputFormat directly should work in the DataSet API. Otherwise I would recommend using some initial pipeline
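A minimal sketch of what Timo suggests, using `JdbcInputFormat` directly in the DataSet API (Flink 1.12's `flink-connector-jdbc` module). The database URL, credentials, table, and column schema below are placeholders, not taken from the thread:

```java
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
import org.apache.flink.connector.jdbc.JdbcInputFormat;
import org.apache.flink.types.Row;

public class JdbcDataSetRead {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Column types of the query result; adjust to the real schema.
        RowTypeInfo rowTypeInfo = new RowTypeInfo(
                BasicTypeInfo.LONG_TYPE_INFO,
                BasicTypeInfo.STRING_TYPE_INFO);

        DataSet<Row> rows = env.createInput(
                JdbcInputFormat.buildJdbcInputFormat()
                        .setDrivername("org.postgresql.Driver")
                        .setDBUrl("jdbc:postgresql://localhost:5432/mydb") // hypothetical URL
                        .setUsername("user")          // placeholder credentials
                        .setPassword("secret")
                        .setQuery("SELECT id, name FROM my_table") // hypothetical table
                        .setRowTypeInfo(rowTypeInfo)
                        .finish());

        rows.print();
    }
}
```

This avoids the Table/SQL layer entirely: the `DataSet<Row>` can feed the State Processor API directly.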

Re: Please advise bootstrapping large state

2021-06-17 Thread Marco Villalobos
I need to bootstrap a keyed process function. So, I was hoping to use the Table SQL API because I thought it could parallelize the work more efficiently via partitioning. I need to bootstrap keyed state for a keyed process function with Flink 1.12.1, thus I think I am required to use the DataSet
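For context, bootstrapping keyed state for a `KeyedProcessFunction` on Flink 1.12 goes through the State Processor API, which does indeed require a `DataSet` input. A sketch under assumed names (the operator uid, state name, savepoint path, and sample data are all hypothetical, and the state name must match the descriptor the streaming job uses):

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.state.api.BootstrapTransformation;
import org.apache.flink.state.api.OperatorTransformation;
import org.apache.flink.state.api.Savepoint;
import org.apache.flink.state.api.functions.KeyedStateBootstrapFunction;

public class BootstrapKeyedState {

    // Writes one ValueState entry per key.
    static class StateWriter extends KeyedStateBootstrapFunction<String, Tuple2<String, Long>> {
        private transient ValueState<Long> state;

        @Override
        public void open(Configuration parameters) {
            state = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("my-state", Types.LONG)); // hypothetical state name
        }

        @Override
        public void processElement(Tuple2<String, Long> value, Context ctx) throws Exception {
            state.update(value.f1);
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Stand-in for the real JDBC read from postgres.
        DataSet<Tuple2<String, Long>> data = env.fromElements(
                Tuple2.of("a", 1L), Tuple2.of("b", 2L));

        BootstrapTransformation<Tuple2<String, Long>> transformation =
                OperatorTransformation.bootstrapWith(data)
                        .keyBy(t -> t.f0)
                        .transform(new StateWriter());

        Savepoint.create(new MemoryStateBackend(), 128) // max parallelism must match the streaming job
                .withOperator("keyed-process-uid", transformation) // must match the operator's uid()
                .write("file:///tmp/bootstrap-savepoint");

        env.execute("bootstrap");
    }
}
```

The streaming job is then started from the written savepoint, and its `KeyedProcessFunction` finds the pre-populated state under the matching uid and state descriptor.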

Re: Please advise bootstrapping large state

2021-06-17 Thread Timo Walther
Hi Marco, which operations do you want to execute in the bootstrap pipeline? Maybe you don't need to use SQL and the old planner. At least this would reduce the friction of going through another API layer. The JDBC connector can be used directly in the DataSet API as well. Regards, Timo On

Re: Please advise bootstrapping large state

2021-06-16 Thread Marco Villalobos
Thank you very much! I tried using Flink's SQL JDBC connector and ran into issues. According to the Flink documentation, only the old planner is compatible with the DataSet API. When I connect to the table: CREATE TABLE my_table ( ) WITH ( 'connector.type' = 'jdbc', 'connector.url'
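The snippet above is cut off; for reference, a complete DDL in the legacy (old-planner) `connector.type` style that the message appears to be using might look like this. The URL, table, and schema are placeholders, not values from the thread:

```sql
CREATE TABLE my_table (
  id BIGINT,
  name STRING
) WITH (
  'connector.type' = 'jdbc',
  'connector.url' = 'jdbc:postgresql://localhost:5432/mydb',
  'connector.table' = 'my_table',
  'connector.driver' = 'org.postgresql.Driver',
  'connector.username' = 'user',
  'connector.password' = 'secret'
)
```

Note the legacy connector uses `connector.*`-prefixed option keys, unlike the newer `'connector' = 'jdbc'` style introduced with the Blink planner.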

Re: Please advise bootstrapping large state

2021-06-16 Thread Robert Metzger
Hi Marco, The DataSet API will not run out of memory, as it spills to disk if the data doesn't fit anymore. Load is distributed by partitioning data. Giving you advice depends a bit on the use-case. I would explore two major options: a) reading the data from postgres using Flink's SQL JDBC
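A sketch of the partitioned-read pattern Robert alludes to, using `JdbcInputFormat` with a parameters provider so that the 200 GB table is split into many independent range queries read in parallel. The key range, batch size, connection details, and schema below are illustrative assumptions:

```java
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
import org.apache.flink.connector.jdbc.JdbcInputFormat;
import org.apache.flink.connector.jdbc.split.JdbcNumericBetweenParametersProvider;
import org.apache.flink.types.Row;

public class PartitionedJdbcRead {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Split the assumed key range [1, 200_000_000] into batches of
        // 1_000_000 keys; each split becomes an independent input split
        // that parallel subtasks pick up.
        JdbcNumericBetweenParametersProvider provider =
                new JdbcNumericBetweenParametersProvider(1L, 200_000_000L)
                        .ofBatchSize(1_000_000L);

        DataSet<Row> rows = env.createInput(
                JdbcInputFormat.buildJdbcInputFormat()
                        .setDrivername("org.postgresql.Driver")
                        .setDBUrl("jdbc:postgresql://localhost:5432/mydb") // hypothetical URL
                        .setQuery("SELECT id, name FROM my_table WHERE id BETWEEN ? AND ?")
                        .setParametersProvider(provider)
                        .setRowTypeInfo(new RowTypeInfo(
                                BasicTypeInfo.LONG_TYPE_INFO,
                                BasicTypeInfo.STRING_TYPE_INFO))
                        .finish());

        rows.print();
    }
}
```

The two `?` placeholders in the query are bound per split with the lower and upper bound of each batch, so postgres only ever serves bounded range scans.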

Please advise bootstrapping large state

2021-06-15 Thread Marco Villalobos
I must bootstrap state from postgres (approximately 200 GB of data), and I notice that the State Processor API requires the DataSet API in order to bootstrap state for the Stream API. I wish there was a way to use the SQL API with a partitioned scan, but I don't know if that is even possible
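As the replies later in the thread suggest, a partitioned scan is in fact available. For a SQL-level illustration, the legacy JDBC table descriptor exposes `connector.read.partition.*` options that split the scan across parallel readers; the values here are placeholders, not figures from the thread:

```sql
CREATE TABLE my_table (
  id BIGINT,
  name STRING
) WITH (
  'connector.type' = 'jdbc',
  'connector.url' = 'jdbc:postgresql://localhost:5432/mydb',
  'connector.table' = 'my_table',
  'connector.driver' = 'org.postgresql.Driver',
  'connector.read.partition.column' = 'id',
  'connector.read.partition.num' = '16',
  'connector.read.partition.lower-bound' = '1',
  'connector.read.partition.upper-bound' = '200000000'
)
```

Equivalently, the same partitioned read can be done without SQL via `JdbcInputFormat` and a parameters provider directly in the DataSet API, which is the route the thread converges on.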