Hi,
The equivalent would be setting a parallelism on your sink operator, e.g.
`stream.addSink(…).setParallelism(…)`.
By default the parallelism of all operators in the pipeline will be whatever
parallelism was set for the whole job, unless parallelism is explicitly set for
a specific operator. For more details on the distributed runtime concepts,
take a look at [1].
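To make the effect concrete, here is a small plain-Java sketch (not actual Flink code; the numbers and names are made up for illustration) of what happens when a 2-way-parallel Kafka source feeds an 8-way-parallel sink: Flink redistributes records round-robin ("rebalance") across the sink subtasks, so you get more concurrent ES writers than consumers.

```java
import java.util.*;

public class RebalanceSketch {
    // Simulate round-robin redistribution from `sources` source subtasks
    // to `sinks` sink subtasks; returns the records each sink subtask saw.
    static List<List<String>> distribute(int sources, int sinks, int recordsPerSource) {
        List<List<String>> subtasks = new ArrayList<>();
        for (int i = 0; i < sinks; i++) subtasks.add(new ArrayList<>());
        int next = 0; // round-robin counter across all sink subtasks
        for (int s = 0; s < sources; s++)
            for (int r = 0; r < recordsPerSource; r++)
                subtasks.get(next++ % sinks).add("source-" + s + "-record-" + r);
        return subtasks;
    }

    public static void main(String[] args) {
        // 2 Kafka consumers, 8 ES writers: load spreads evenly over all 8.
        List<List<String>> sinks = distribute(2, 8, 100);
        for (int i = 0; i < sinks.size(); i++)
            System.out.println("sink subtask " + i + " handled " + sinks.get(i).size() + " records");
    }
}
```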
I saw the implementation of elasticsearch sink in Flink which can do
batching of messages before writes. How can I batch data based on a custom
logic? E.g. batch writes grouped on one of the message keys. This is
possible in Storm via FieldGrouping.
The equivalent of partitioning streams in Flink is `stream.keyBy(…)`. All
messages with the same key then go to the same parallel downstream operator
instance. If that operator is an `ElasticsearchSink`, then following a `keyBy`
all messages with the same key will be batched by the same Elasticsearch writer.
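As a rough illustration, here is a plain-Java sketch of the routing idea (Flink's real implementation hashes into key groups; the hash scheme and all names below are simplified placeholders): each key is hashed to one of N parallel subtasks, so all records for a key accumulate in the same writer's batch.

```java
import java.util.*;

public class KeyBySketch {
    // Route a key to one of `parallelism` subtasks by hashing, as keyBy
    // does conceptually. floorMod keeps the result non-negative.
    static int subtaskFor(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    public static void main(String[] args) {
        int parallelism = 4; // pretend there are 4 parallel ES writers
        Map<Integer, List<String>> batches = new HashMap<>();
        for (String msg : List.of("userA:login", "userB:click", "userA:logout")) {
            String key = msg.split(":")[0]; // group on one of the message fields
            batches.computeIfAbsent(subtaskFor(key, parallelism), k -> new ArrayList<>())
                   .add(msg);
        }
        // Both "userA" messages land in the same writer's batch.
        System.out.println(batches);
    }
}
```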
Hope this helps!
Cheers,
Gordon
[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.3/concepts/runtime.html
On 8 August 2017 at 8:58:30 AM, Jakes John (jakesjohn12...@gmail.com) wrote:
I am coming from the Apache Storm world. I am planning to switch from Storm
to Flink. I was reading the Flink documentation, but I couldn't find some
capabilities in Flink that are present in Storm.
I need to have a streaming pipeline Kafka -> Flink -> Elasticsearch. In Storm,
I have seen that I can specify the number of tasks per bolt. Databases are
typically slow on writes, and hence I need more writers to the database. Reading
from Kafka is pretty fast compared to ES writes. This means that I need to
have more ES writer tasks than Kafka consumers. How can I achieve this in Flink?
What are the concepts in Flink similar to Storm Parallelism concepts like
workers, executors, tasks?
I saw the implementation of elasticsearch sink in Flink which can do
batching of messages before writes. How can I batch data based on a custom
logic? E.g. batch writes grouped on one of the message keys. This is
possible in Storm via FieldGrouping. But I couldn't find an equivalent way to
do grouping in Flink and control the overall number of writes to ES.
Please help me with the above questions and some pointers to Flink parallelism.