Re: JDBCInputFormat and SplitDataProperties

2018-08-13 Thread Fabian Hueske
Hi Alexis, Yes, the job cannot be executed until the required number of processing slots becomes available. IIRC, there is a timeout and a job gets canceled once the waiting time exceeds the threshold. Best, Fabian 2018-08-10 15:35 GMT+02:00 Alexis Sarda : > It ended up being a wrong

Re: JDBCInputFormat and SplitDataProperties

2018-08-10 Thread Alexis Sarda
It ended up being a wrong configuration of the cluster; there was only 1 task manager with 1 slot. If I submit a job with "flink run -p 24 ...", will the job hang until at least 24 slots are available? Regards, Alexis. On Fri, 10 Aug 2018, 14:01 Fabian Hueske wrote: > Can you share the plan

Re: JDBCInputFormat and SplitDataProperties

2018-08-10 Thread Fabian Hueske
Can you share the plan for the program? Are you sure that more than 1 split is generated by the JdbcInputFormat? 2018-08-10 12:04 GMT+02:00 Alexis Sarda : > It seems I may have spoken too soon. After executing the job with more > data, I can see the following things in the Flink dashboard: > >

Re: JDBCInputFormat and SplitDataProperties

2018-08-10 Thread Alexis Sarda
It seems I may have spoken too soon. After executing the job with more data, I can see the following things in the Flink dashboard: - The first subtask is a chained DataSource -> GroupCombine. Even with parallelism set to 24 and a ParameterValuesProvider returning Array(Array("first"),

Re: JDBCInputFormat and SplitDataProperties

2018-08-09 Thread Alexis Sarda
Hi Fabian, Thanks a lot for the help. The scala DataSet, at least in version 1.5.0, declares javaSet as private[flink], so I cannot access it directly. Nevertheless, I managed to get around it by using the java environment: val env = org.apache.flink.api.java.ExecutionEnvironment.

Re: JDBCInputFormat and SplitDataProperties

2018-08-09 Thread Fabian Hueske
Hi Alexis, The Scala API does not expose a DataSource object but only a Scala DataSet which wraps the Java object. You can get the SplitDataProperties from the Scala DataSet as follows: val dbData: DataSet[...] = ??? val sdp = dbData.javaSet.asInstanceOf[DataSource].getSplitDataProperties So

Re: JDBCInputFormat and SplitDataProperties

2018-08-08 Thread Alexis Sarda
Hi Fabian, Thanks for the clarification. I have a few remarks, but let me provide more concrete information. You can find the query I'm using, the JDBCInputFormat creation, and the execution plan in this github gist: https://gist.github.com/asardaes/8331a117210d4e08139c66c86e8c952d I cannot

Re: JDBCInputFormat and SplitDataProperties

2018-08-08 Thread Fabian Hueske
Hi Alexis, First of all, I think you leverage the partitioning and sorting properties of the data returned by the database using SplitDataProperties. However, please be aware that SplitDataProperties are a rather experimental feature. If used without query parameters, the JDBCInputFormat

JDBCInputFormat and SplitDataProperties

2018-08-07 Thread Alexis Sarda
Hi everyone, I have the following scenario: I have a database table with 3 columns: a host (string), a timestamp, and an integer ID. Conceptually, what I'd like to do is: group by host and timestamp -> based on all the IDs in each group, create a mapping to n new tuples -> for each unique tuple,