Hi,

We are trying to optimize our Storm topology that uses Kafka-Spout and Elastic-Search-Bolt (no other spouts/bolts).
Our current setup is as follows:

- storm-workers: 1
- elastic-search primaries: 1
- elastic-search replicas: 1
- 1 Storm process with 1 kafka-spout thread and 6 elastic-search bolt threads
- kafka-fetch-size: 10 MB
- kafka-buffer-size: 11 MB
- es-flush-entries-size: 10,000
- heap size: 16 GB with new-ratio = 1 (for both Elastic-Search and Storm)
- average kafka-message-size: 1 KB

The maximum ingestion rate we can achieve with this setup is 800,000 messages per minute from Kafka to Elastic-Search. This rate scales almost linearly with the number of Storm worker nodes/processes (we use LOCAL_OR_SHUFFLE grouping), given a similar increase in Elastic-Search nodes.

Can someone comment on these throughput numbers? Any recommendations for increasing the throughput would be much appreciated.

Thanks,
Tid
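
P.S. For a sense of scale, the figures above can be converted to per-second rates as in the rough sketch below. It is only back-of-the-envelope arithmetic: it assumes "1 kb" means 1024 bytes and that messages are spread evenly across the 6 bolt threads, and the function and key names are made up for illustration.

```python
# Back-of-the-envelope conversion of the figures quoted in this post.
MESSAGES_PER_MINUTE = 800_000  # maximum ingestion rate from the post
AVG_MESSAGE_BYTES = 1024       # assumption: "1 kb" read as 1024 bytes
BOLT_THREADS = 6               # elastic-search bolt threads from the post


def throughput_summary(msgs_per_min, msg_bytes, bolt_threads):
    """Return per-second rates implied by a per-minute message rate."""
    msgs_per_sec = msgs_per_min / 60
    # Payload volume only; ignores protocol and indexing overhead.
    mib_per_sec = msgs_per_sec * msg_bytes / (1024 * 1024)
    # Assumption: load is distributed evenly across the bolt threads.
    msgs_per_sec_per_bolt = msgs_per_sec / bolt_threads
    return {
        "msgs_per_sec": msgs_per_sec,
        "mib_per_sec": mib_per_sec,
        "msgs_per_sec_per_bolt": msgs_per_sec_per_bolt,
    }


if __name__ == "__main__":
    summary = throughput_summary(
        MESSAGES_PER_MINUTE, AVG_MESSAGE_BYTES, BOLT_THREADS
    )
    for name, value in summary.items():
        print(f"{name}: {value:.2f}")
```

Under those assumptions this works out to roughly 13,300 messages per second, about 13 MiB/s of payload, and around 2,200 messages per second per bolt thread.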
