Good point. We've only got two disks per node and two topics, so I was planning to have one partition per disk.
Our workload is very write heavy so I'm mostly concerned about write throughput. Will we get write speed improvements by sticking to 1 partition/disk or will the difference between 1 and 3 partitions/node be negligible?

> On 24/06/2014, at 9:42 pm, Paul Mackles <pmack...@adobe.com> wrote:
>
> You'll want to account for the number of disks per node. Normally,
> partitions are spread across multiple disks. Even more important, the OS
> file cache reduces the amount of seeking provided that you are reading
> mostly sequentially and your consumers are keeping up.
>
>> On 6/24/14 3:58 AM, "Daniel Compton" <d...@danielcompton.net> wrote:
>>
>> I've been reading the Kafka docs and one thing that I'm having trouble
>> understanding is how partitions affect sequential disk IO. One of the
>> reasons Kafka is so fast is that you can do lots of sequential IO with
>> read-ahead cache and all of that goodness. However, if your broker is
>> responsible for say 20 partitions, then won't the disk be seeking to 20
>> different spots for its writes and reads? I thought that maybe letting
>> the OS handle fsync would make this less of an issue but it still seems
>> like it could be a problem.
>>
>> In our particular situation, we are going to have 6 brokers, 3 in each
>> DC, with mirror maker replication from the secondary DC to the primary
>> DC. We aren't likely to need to add more nodes for a while so would it be
>> faster to have 1 partition/node than say 3-4/node to minimise the seek
>> times on disk?
>>
>> Are my assumptions correct or is this not an issue in practice? There are
>> some nice things about having more partitions like rebalancing more
>> evenly if we lose a broker but we don't want to make things significantly
>> slower to get this.
>>
>> Thanks, Daniel.
>
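
For what it's worth, one rough way to get a feel for the 1 vs 3 partitions question on our own disks might be a small append test like the sketch below. It is not Kafka and only loosely approximates a broker's write pattern: it appends fixed-size chunks round-robin to N files (standing in for N partition logs on one disk) and fsyncs once at the end, so the OS decides when to flush. The function name, file names, and sizes are all made up for illustration, and a real answer would still need a test against an actual broker.

```python
import os
import sys
import time


def append_throughput(directory, num_files, total_bytes, chunk_size=64 * 1024):
    """Append about total_bytes round-robin across num_files files.

    Returns (megabytes_per_second, bytes_written). The files stand in for
    partition logs; this is an assumption, not how Kafka itself writes.
    """
    chunk = b"x" * chunk_size
    paths = [os.path.join(directory, f"partition-{i}.log") for i in range(num_files)]
    files = [open(p, "ab") for p in paths]
    written = 0
    start = time.perf_counter()
    try:
        i = 0
        while written < total_bytes:
            # Interleave writes so the disk sees appends to several files.
            files[i % num_files].write(chunk)
            written += chunk_size
            i += 1
        # Flush once at the end, leaving earlier write-back to the OS.
        for f in files:
            f.flush()
            os.fsync(f.fileno())
    finally:
        for f in files:
            f.close()
    elapsed = max(time.perf_counter() - start, 1e-9)
    for p in paths:
        os.remove(p)
    return written / elapsed / (1024 * 1024), written


if __name__ == "__main__":
    # Pass a directory on the disk under test, e.g. one of the data disks.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for n in (1, 3):
        rate, _ = append_throughput(target, n, 256 * 1024 * 1024)
        print(f"{n} file(s): {rate:.1f} MB/s")
```

The total size should be well above available RAM to avoid mostly measuring the page cache, and the numbers will differ a lot between spinning disks and SSDs.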