Thanks for the great explanation, Stephen.
From: Stephen Powis [mailto:[email protected]]
Sent: 21 November 2017 10:17
To: [email protected]
Subject: Re: Regarding storm & Kafka Configuration.

1. Parallelism - You can set a maximum of 3, one for each partition in your topic. Typically, this will net you the fastest way to get messages out of Kafka and into your topology, but doing your own testing/benchmarks would be best to know for sure.

2. How many workers - This probably depends on what kind of work your topology is doing. Is it IO bound? Memory bound? CPU bound?

3. Max pending - Are you using timeouts/tracking tuples through your topology? Typically you want this high enough that your bolts are not starved for things to work on, but not so high that tuples are queued up waiting to be processed and time out before they can be worked on.

The biggest trick here is that your "total tuples in flight" is equal to (Number Of Spout Instances * Your Configured Max Spout Pending). For example, if you set max pending to 1000 and have 3 spout instances, you can have ~3000 tuples in flight.

On Tue, Nov 21, 2017 at 12:55 PM, Mahabaleshwar <[email protected]> wrote:

Hi,

I am using a 3-node Kafka cluster and I have created one topic called iot_gateway with 3 partitions and a replication factor of 3.

My doubt is about the Storm Kafka spout configuration:

1. How much parallelism hint should I give?
2. How many workers should I give?
3. How many max pending messages should I configure?
4. How should I maintain the task & partition relation?

I need your help, friends.

Thanks,
Mahabaleshwar
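
For anyone finding this thread later, the arithmetic in Stephen's reply can be sketched as below. This is only an illustration under stated assumptions: the function names are hypothetical and not part of Storm or Kafka, and the numbers (3 partitions, max pending of 1000) are the ones from the thread. In Storm the max pending limit is set with topology.max.spout.pending and applies per spout task, which is why the total scales with the number of spout instances.

```python
# Hypothetical helpers (not a Storm API) illustrating the sizing rules
# discussed in the thread.


def useful_spout_parallelism(partitions: int, requested: int) -> int:
    """Cap the spout parallelism hint at the topic's partition count.

    Per the reply above, the maximum is one spout instance per partition.
    """
    if partitions < 1 or requested < 1:
        raise ValueError("partitions and requested must both be >= 1")
    return min(partitions, requested)


def max_tuples_in_flight(spout_instances: int, max_spout_pending: int) -> int:
    """Upper bound on un-acked tuples across the whole topology.

    The pending limit applies to each spout instance separately,
    so the bound is the product of the two values.
    """
    if spout_instances < 1 or max_spout_pending < 1:
        raise ValueError("both arguments must be >= 1")
    return spout_instances * max_spout_pending


if __name__ == "__main__":
    partitions = 3  # iot_gateway topic from the original question
    spouts = useful_spout_parallelism(partitions, requested=5)
    print("spout instances:", spouts)
    print("max tuples in flight:", max_tuples_in_flight(spouts, 1000))
```

If tuples start timing out, lowering max pending (or the spout count) reduces this bound; if bolts sit idle, raising it is the first thing to try, followed by benchmarking as suggested in the reply.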
