Is there any downside to using Kafka high level consumer as spout?
The main downside of the high level consumer is that you can't control
exactly when it asks a broker for more data, and that it always commits
the latest offset you have read from the stream it provides. With a
fairly continuous stream of messages the first point won't matter much;
you can tune the fetch behaviour through the client properties listed on
the Kafka site. The second point gets complicated when you need to be
able to replay messages that fail within your topology, assuming you
don't want your client to commit anything that hasn't been ack()'d to
the spout.
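To make the replay problem a bit more concrete, here is a minimal sketch of the bookkeeping a spout needs if it should only ever commit what has been ack()'d. It is not taken from either spout linked in this mail; the class AckedOffsetTracker and its methods are hypothetical names, and it assumes one tracker per partition with offsets that increase by one per message.

```java
import java.util.TreeSet;

// Hypothetical helper, not part of Storm or Kafka: tracks which offsets of a
// single partition are still in flight so only acked work is ever committed.
class AckedOffsetTracker {
    // Offsets emitted into the topology but not yet acked, kept sorted.
    private final TreeSet<Long> pending = new TreeSet<>();
    // One past the highest offset emitted so far.
    private long nextOffset;

    AckedOffsetTracker(long startOffset) {
        this.nextOffset = startOffset;
    }

    // Call when the spout emits the message read at this offset.
    void emitted(long offset) {
        pending.add(offset);
        nextOffset = Math.max(nextOffset, offset + 1);
    }

    // Call from the spout's ack() for the tuple carrying this offset.
    // A failed tuple is simply never passed here, so it stays pending
    // and blocks the committable offset until it is replayed and acked.
    void acked(long offset) {
        pending.remove(offset);
    }

    // Offset that is safe to commit, i.e. the position to resume from after
    // a restart. Every offset below it has been acked.
    long committableOffset() {
        return pending.isEmpty() ? nextOffset : pending.first();
    }
}
```

With offsets 0, 1 and 2 emitted and only 0 and 2 acked, committableOffset() stays at 1 until offset 1 is acked as well. The high level consumer gives you no hook to hold its commit back like this, which is why the low level client is the easier fit when you need replay.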
There are upsides and downsides to both the low and the high level
client; feel free to examine the differences between
https://github.com/wurstmeister/storm-kafka-0.8-plus (low level client)
and https://github.com/HolmesNL/kafka-spout/ (high level client).
I plan to spawn threads to read from various partitions of the topic in Kafka.
I reckon that won't be necessary with Storm; Storm will manage the
parallelism for you by running multiple spout instances across your
cluster. As long as you put the spouts in the same consumer group,
together they'll consume all partitions (even a single client will
switch partitions now and then to ensure all are read).
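As a rough illustration of how partitions end up spread over the spouts in one consumer group, here is a simplified sketch of a range-style split. It is not the client's actual rebalancing code; RangeAssignmentSketch and assign are hypothetical names, and it assumes partitions and consumers are simply numbered from 0.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration only: splits partition ids 0..numPartitions-1 into
// contiguous ranges, one range per consumer in the group.
class RangeAssignmentSketch {
    // Returns, for each consumer index, the partition ids it would own.
    static List<List<Integer>> assign(int numPartitions, int numConsumers) {
        List<List<Integer>> result = new ArrayList<>();
        int base = numPartitions / numConsumers;   // partitions every consumer gets
        int extra = numPartitions % numConsumers;  // leftovers go to the first consumers
        int next = 0;
        for (int c = 0; c < numConsumers; c++) {
            int count = base + (c < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                owned.add(next++);
            }
            result.add(owned);
        }
        return result;
    }
}
```

For example, assign(5, 2) gives the first spout partitions 0 to 2 and the second spout partitions 3 and 4, while assign(4, 1) gives a single spout all four. Note that with more spout instances than partitions, as in assign(2, 3), the surplus instance gets nothing to read, so setting the spout parallelism above the partition count doesn't buy you anything.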
Kind regards,
Mattijs