Hi experts,
I would like to ask for some guidelines, web pages, or comments regarding my
use case.
*Requirements*:
- 2000+ producers
- input rate 600k messages/s
- consumers must write to 3 different databases (so I assume 3 consumer
groups) at 600k messages/s overall (200k messages/s per database)
- latency < 500ms between producers and databases
- good availability
- possibility to process messages before sending them to the databases
(Kafka Streams? Of course in HA. Docker? Marathon?)
- missing data is tolerated (0.5% max; disk writes are not strictly
required), latency has higher priority
- record size: 100-1000 bytes
*Resources*:
brokers: 25 Gbps bandwidth, 32 CPUs, 1 disk (I/O: 99.0 MB/s)
producers -> brokers -> consumers: 1 Gbps bandwidth
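A quick back-of-envelope check of the target rate against the stated record-size range (a sketch; MB/s taken as 10^6 bytes/s):

```shell
# Aggregate throughput implied by 600k messages/s at the two ends
# of the stated 100-1000 byte record-size range.
echo "$(( 600000 * 100  / 1000000 )) MB/s at 100-byte records"
echo "$(( 600000 * 1000 / 1000000 )) MB/s at 1000-byte records"
# For comparison: a 1 Gbps link carries ~125 MB/s,
# and each broker has a single ~99 MB/s disk.
```

At the upper end of the record-size range, 600 MB/s aggregate would exceed both a shared 1 Gbps producer/consumer link (~125 MB/s) and each broker's single 99 MB/s disk once data is flushed, so it may be worth confirming whether the 1 Gbps is per host or shared, and where in that range your real records sit.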
*My* *configuration*:
3 brokers
6 partitions (no replication, in order to minimize latency)
acks = 0 (missing data is tolerated)
batch.size = 1024 (throughput peaks at 8196)
producers -> compression.type=none
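For reference, the producer settings above written out as a properties fragment; linger.ms is my addition (its default is already 0, but pinning it makes the latency intent explicit):

```
# producer.properties -- sketch of the settings listed above
acks=0
batch.size=1024
compression.type=none
# linger.ms=0 is the default; an explicit 0 means "send as soon as a
# batch is ready", trading batching efficiency for lower latency
linger.ms=0
```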
I tested with kafka-producer-perf-test.sh and
kafka-consumer-perf-test.sh and I get good throughput (500-600k
messages/s using 3 producers and 3 consumers), but I would like to improve
latency (currently 0.3-2 s), or hear about features I'm not yet considering.
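In case it helps to reproduce the numbers, a sketch of the producer side of my test (broker address, topic name, record size, and record count are placeholders, not the exact values I used):

```shell
# Hypothetical invocation mirroring the configuration above;
# adjust bootstrap.servers and --topic to the actual cluster.
bin/kafka-producer-perf-test.sh \
  --topic perf-test \
  --num-records 10000000 \
  --record-size 500 \
  --throughput -1 \
  --producer-props bootstrap.servers=broker1:9092 acks=0 batch.size=1024 compression.type=none \
  --print-metrics
```

--throughput -1 sends unthrottled; --print-metrics reports producer-side request latency. Since both perf scripts are throughput-oriented, the bundled end-to-end latency tool (kafka.tools.EndToEndLatency via kafka-run-class.sh, if your distribution includes it) may match the produce-to-consume latency question more directly.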
Thank you in advance.
Cheers,
Gioacchino