Hi all,

At our company we use Kafka extensively and love the scalability and durability guarantees it offers.
However, almost all of our use cases involve unordered messages: latency-sensitive "online" producers pump messages into Kafka, and offline consumers then process them at leisure. When ordering is not that important, the dev-ops work of maintaining partitions becomes increasingly painful. There are tools such as Kafka cluster manager, but this post is about whether partitions are needed at all in an unordered messaging system.

If we drop the ordering constraint, the design of a cluster of message brokers becomes very simple: every producer round-robins messages equally across all brokers, and broker skew disappears. Some issues that can be solved without partitions are:

*Partitions are rigid*
Increasing the number of partitions on a topic is a chore, and decreasing it is not possible at all without resorting to something like surgery on the cluster. As the load on topics changes, partitions tend to become skewed across the brokers, which leads to several problems. Some brokers carry a heavy load while others sit idle. Some brokers' disks fill up while others stay relatively empty.

*Adding brokers is not seamless*
Adding new brokers means manually recomputing the entire partition-to-broker mapping and running an expensive rebalancing process. Kafka does not do this kind of rebalancing natively. We have written sophisticated custom rebalancer scripts to handle partition distribution, but the actual rebalancing is still time-consuming.

*In-sync replicas, acks=-1 and under-replicated partitions (oh my!)*
Kafka's existing design forces a fairly complicated replication model built around in-sync replicas and acks=-1. An otherwise functioning broker can go "out of sync", which means that the broker's replica of a partition has not caught up with the partition leader.

*Consumer parallelism is limited by the number of partitions*
In Kafka, a consumer group cannot have more active consumers than the topic has partitions; any extra consumers sit idle.
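To make the consumer-parallelism point concrete, here is a toy Python model. It is a sketch, not Kafka's actual assignor code: `assign_partitions`, `idle_consumers`, `dispatch_unordered` and the consumer/partition names are all hypothetical. It assumes each partition is owned by exactly one consumer in the group, and contrasts that with an assumed partition-free broker that hands each message to the next consumer in rotation.

```python
from itertools import cycle


def assign_partitions(partitions, consumers):
    """Toy consumer-group assignment (hypothetical, not Kafka's assignor code).

    Each partition is owned by exactly one consumer in the group; partitions
    are dealt out round-robin, so consumers beyond len(partitions) get nothing.
    """
    assignment = {c: [] for c in consumers}
    # zip() stops when the partitions run out, so surplus consumers stay empty.
    for partition, consumer in zip(partitions, cycle(consumers)):
        assignment[consumer].append(partition)
    return assignment


def idle_consumers(assignment):
    """Consumers that own no partitions and therefore receive no messages."""
    return [c for c, owned in assignment.items() if not owned]


def dispatch_unordered(messages, consumers):
    """Hypothetical partition-free model: each message goes to the next
    consumer in rotation, so every consumer in the group gets work."""
    load = {c: 0 for c in consumers}
    for _, consumer in zip(messages, cycle(consumers)):
        load[consumer] += 1
    return load


if __name__ == "__main__":
    consumers = [f"consumer-{i}" for i in range(6)]
    partitions = [f"events-{i}" for i in range(4)]

    print(idle_consumers(assign_partitions(partitions, consumers)))
    # ['consumer-4', 'consumer-5']
    print(dispatch_unordered(range(12), consumers))
    # every consumer handles 2 of the 12 messages
```

With 4 partitions and 6 consumers, two consumers are idle no matter how the group is tuned; the only fix is to change the partition count. In the unordered model the same 6 consumers split the messages evenly, and adding a seventh consumer needs no broker-side change.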
The partition count therefore imposes an artificial bottleneck on consumer parallelism. A lot of our ops work consists of tuning the number of partitions, because consumers keep changing the amount of parallelism they want.

Kafka occupies a very important niche as a scalable, ordered messaging system. But I wanted to know what the community thinks about an unordered messaging system that primarily functions as an "online"-to-"offline" message broker:
-- Does your company really depend on ordering a lot?
-- Do you face ops issues dealing with partitions?

Regards,
Sharath