Is there any downside to using Kafka high level consumer as spout?
The main downside of the high level consumer is that you can't control exactly when it asks a broker for more data, and that it always commits the latest offset you have read from the stream it provides. With a fairly continuous stream of messages the first point won't matter much, and you can tweak the client properties listed on the Kafka site. The second point gets complicated when you need to replay messages that fail within your topology, assuming you don't want your client to commit anything that hasn't been ack()'d to the spout.
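To make the commit problem a bit more concrete, here is a minimal sketch of the bookkeeping a spout would need if it only wants to commit what has been ack()'d. OffsetTracker and its methods are hypothetical names of my own, not part of Storm or Kafka, and the sketch assumes a single partition with the spout calling emitted() from nextTuple() and acked() from ack().

```java
import java.util.TreeSet;

/** Hypothetical helper for illustration; not part of Storm or Kafka. */
public class OffsetTracker {
    // Offsets that were emitted into the topology but not ack()'d yet.
    private final TreeSet<Long> pending = new TreeSet<>();
    // The offset right after the highest one emitted so far.
    private long nextOffset;

    public OffsetTracker(long startOffset) {
        this.nextOffset = startOffset;
    }

    /** Record that the message at this offset was emitted. */
    public void emitted(long offset) {
        pending.add(offset);
        nextOffset = Math.max(nextOffset, offset + 1);
    }

    /** Record that the message at this offset was fully processed. */
    public void acked(long offset) {
        pending.remove(offset);
    }

    /**
     * The offset a restarted consumer should resume from: the lowest
     * offset still pending, or the next unread offset if none is pending.
     */
    public long committableOffset() {
        return pending.isEmpty() ? nextOffset : pending.first();
    }

    public static void main(String[] args) {
        OffsetTracker tracker = new OffsetTracker(0);
        for (long offset = 0; offset < 5; offset++) {
            tracker.emitted(offset);
        }
        tracker.acked(0);
        tracker.acked(1);
        tracker.acked(3);
        // Offsets 2 and 4 are still pending, so only 0 and 1 are safe.
        System.out.println(tracker.committableOffset()); // prints 2
    }
}
```

The high level consumer gives you no hook to commit an offset like this one; it commits whatever you have read so far, which is why replaying failed messages gets awkward with it.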

There are upsides and downsides to both the low and the high level client. Feel free to examine the differences between https://github.com/wurstmeister/storm-kafka-0.8-plus (low level client) and https://github.com/HolmesNL/kafka-spout/ (high level client).

I plan to spawn threads to read from various partitions of the topic in Kafka.
I reckon that won't be necessary with Storm; Storm will manage the parallelism for you by running multiple spout instances across your cluster. As long as you put the spouts in the same consumer group, together they'll consume all partitions (even a single client will switch partitions now and then to ensure all are read).
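As a rough illustration of how the partitions of a topic end up spread over the spouts in one consumer group, the sketch below hands out contiguous ranges of partitions. It is a simplified model that I am assuming for the sake of the example, not Kafka's actual rebalancing code, and GroupAssignment and assign() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of a consumer group; not Kafka's actual code. */
public class GroupAssignment {

    /** Split partitions 0..partitions-1 into contiguous ranges, one per consumer. */
    static List<List<Integer>> assign(int partitions, int consumers) {
        List<List<Integer>> result = new ArrayList<>();
        int base = partitions / consumers;   // every consumer gets at least this many
        int extra = partitions % consumers;  // the first 'extra' consumers get one more
        int next = 0;
        for (int c = 0; c < consumers; c++) {
            int count = base + (c < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                owned.add(next++);
            }
            result.add(owned);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two spout instances sharing five partitions.
        System.out.println(assign(5, 2)); // prints [[0, 1, 2], [3, 4]]
        // A single spout instance reads every partition by itself.
        System.out.println(assign(5, 1)); // prints [[0, 1, 2, 3, 4]]
    }
}
```

Note that in this model any spout instances beyond the number of partitions get nothing to read, so there is little point in setting the spout parallelism higher than the partition count.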

Kind regards,

Mattijs
