Hi Alexis,
Yes, the job cannot be executed until the required number of processing
slots becomes available.
IIRC, there is a timeout and a job gets canceled once the waiting time
exceeds the threshold.
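The timeout Fabian mentions can be sketched as a config entry; the exact key name and default are an assumption and may differ between Flink versions, so check the configuration reference for your release:

```yaml
# flink-conf.yaml — sketch; key name and default value may vary by Flink version
# How long (in milliseconds) a job waits for the requested slots before failing
slot.request.timeout: 300000
```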
Best, Fabian
2018-08-10 15:35 GMT+02:00 Alexis Sarda :

It ended up being a wrong configuration of the cluster; there was only 1
task manager with 1 slot.
If I submit a job with "flink run -p 24 ...", will the job hang until at
least 24 slots are available?
Regards,
Alexis.
On Fri, 10 Aug 2018, 14:01 Fabian Hueske wrote:
Can you share the plan for the program?
Are you sure that more than 1 split is generated by the JdbcInputFormat?
2018-08-10 12:04 GMT+02:00 Alexis Sarda :
It seems I may have spoken too soon. After executing the job with more
data, I can see the following things in the Flink dashboard:
- The first subtask is a chained DataSource -> GroupCombine. Even with
parallelism set to 24 and a ParameterValuesProvider returning
Array(Array("first"),
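For reference, a ParameterValuesProvider as described above might be wired into the JDBCInputFormat roughly like this; the driver, URL, query, and parameter values are placeholders, not the original configuration from the gist:

```scala
import org.apache.flink.api.java.io.jdbc.JDBCInputFormat
import org.apache.flink.api.java.io.jdbc.split.GenericParameterValuesProvider

// One inner array per input split; each array supplies the query's "?" values.
val parameters: Array[Array[java.io.Serializable]] =
  Array(Array("first"), Array("second"))

val inputFormat = JDBCInputFormat.buildJDBCInputFormat()
  .setDrivername("org.postgresql.Driver")                  // placeholder driver
  .setDBUrl("jdbc:postgresql://localhost/db")              // placeholder URL
  .setQuery("SELECT host, ts, id FROM events WHERE host = ?")
  .setParametersProvider(new GenericParameterValuesProvider(parameters))
  .setRowTypeInfo(rowTypeInfo)                             // assumed defined elsewhere
  .finish()
```

With two inner parameter arrays, the format should generate two splits, which is what determines whether the source can actually run with parallelism greater than one.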
Hi Fabian,
Thanks a lot for the help. The Scala DataSet, at least in version 1.5.0,
declares javaSet as private[flink], so I cannot access it directly.
Nevertheless, I managed to get around it by using the java environment:
val env = org.apache.flink.api.java.ExecutionEnvironment.
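The workaround described above might look roughly like the following sketch; the method completion and the identifiers other than the Flink API calls are assumptions:

```scala
// Using the Java ExecutionEnvironment directly: createInput returns a
// DataSource, so no access to the package-private javaSet field is needed.
val env = org.apache.flink.api.java.ExecutionEnvironment.getExecutionEnvironment

val source: org.apache.flink.api.java.operators.DataSource[org.apache.flink.types.Row] =
  env.createInput(inputFormat, rowTypeInfo)  // inputFormat/rowTypeInfo assumed defined

val sdp = source.getSplitDataProperties
```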
Hi Alexis,
The Scala API does not expose a DataSource object but only a Scala DataSet
which wraps the Java object.
You can get the SplitDataProperties from the Scala DataSet as follows:
val dbData: DataSet[...] = ???
val sdp = dbData.javaSet.asInstanceOf[DataSource].getSplitDataProperties
So
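Once the SplitDataProperties are obtained as in the snippet above, the properties could be declared along these lines; the field indices are placeholders and must match how the splits are actually produced, otherwise results will be incorrect:

```scala
// Sketch: declare what the DBMS already guarantees per split.
val sdp = dbData.javaSet.asInstanceOf[DataSource[Row]].getSplitDataProperties
sdp.splitsPartitionedBy(0)   // each split holds one partition of field 0
sdp.splitsGroupedBy(0, 1)    // rows within a split arrive grouped on fields 0 and 1
```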
Hi Fabian,
Thanks for the clarification. I have a few remarks, but let me provide more
concrete information. You can find the query I'm using, the JDBCInputFormat
creation, and the execution plan in this github gist:
https://gist.github.com/asardaes/8331a117210d4e08139c66c86e8c952d
I cannot
Hi Alexis,
First of all, I think you can leverage the partitioning and sorting properties
of the data returned by the database using SplitDataProperties.
However, please be aware that SplitDataProperties are a rather experimental
feature.
If used without query parameters, the JDBCInputFormat
Hi everyone,
I have the following scenario: I have a database table with 3 columns: a
host (string), a timestamp, and an integer ID. Conceptually, what I'd like
to do is:
group by host and timestamp -> based on all the IDs in each group, create a
mapping to n new tuples -> for each unique tuple,
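The first steps of that pipeline could be sketched as follows; `expandIds`, the case class, and the field names are hypothetical placeholders for the actual mapping logic, and the step after "for each unique tuple" is cut off in the thread, so it is left out:

```scala
import org.apache.flink.api.scala._
import org.apache.flink.util.Collector

case class Event(host: String, ts: Long, id: Int)

// Placeholder for the real ID-to-tuple mapping.
def expandIds(ids: Seq[Int]): Seq[(String, Int)] = ???

def pipeline(events: DataSet[Event]): DataSet[(String, Int)] =
  events
    .groupBy(e => (e.host, e.ts))                 // group by host and timestamp
    .reduceGroup { (group: Iterator[Event], out: Collector[(String, Int)]) =>
      val ids = group.map(_.id).toSeq
      expandIds(ids).foreach(out.collect)          // n new tuples per group
    }
    .distinct()                                    // keep each unique tuple once
```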