Hi Peter, when you were struggling with HDDs, were you using custom flush settings, or the "do not flush" default?
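(For context, these are the broker-level knobs I have in mind, with the defaults as I understand them; the paths in log.dirs are just placeholder examples. By default Kafka never forces a flush itself and leaves persistence to the OS page cache:)

```properties
# server.properties – flush-related settings (defaults sketched, not authoritative)
# Force an fsync after this many messages; the default is effectively "never".
log.flush.interval.messages=9223372036854775807
# Force an fsync after this many ms; unset by default, so the OS decides.
#log.flush.interval.ms=
# Multiple data directories spread partitions (and thus write streams)
# across files/disks – example paths only.
log.dirs=/var/kafka-logs-1,/var/kafka-logs-2
```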
I think a system like BookKeeper is essential if you want to fsync synchronously, as Pulsar does - especially on HDDs. For SSDs, configuring multiple directories, i.e. multiple files being written to concurrently, may be necessary to maximize throughput.

Eugen

________________________________
From: Péter Sinóros-Szabó <peter.sinoros-sz...@transferwise.com.INVALID>
Sent: March 13, 2020, 21:42
To: users@kafka.apache.org <users@kafka.apache.org>
Subject: Re: Sequential writes make Kafka fast, or so they say

Hi,

yes, if you write to one partition only, it will be sequential. But that's unlikely, so in practice it won't be sequential overall.

I used AWS EC2 instances with st1 EBS disks, the older HDD-type rotational storage. It struggled to deliver any kind of performance for our 6000+ partitions. Switching to gp2 SSDs solved that immediately.

There are other emerging messaging systems that, for this reason, write to one single file... like BookKeeper.

Peter

On Thu, 12 Mar 2020 at 03:30, Eugen Dueck <eu...@tworks.co.jp> wrote:

> A question about something that was always in the back of my mind.
>
> According to Jay Kreps:
>
> > The first [reason that Kafka is so fast despite writing to disk] is that
> > Kafka does only sequential file I/O.
>
> I wonder how true this statement is, because Kafka uses 3 segments per
> partition. So even with a single topic and partition per broker and disk,
> it would not be sequential. Now say we have 1000 partitions per
> broker/disk, i.e. 3000 files. How can concurrent/interleaved writes to
> thousands of files on a single disk be considered sequential file I/O?
>
> Isn't the reason Kafka is so fast despite writing to disk the fact that it
> does not fsync to disk, leaving that to the OS? The OS would, I assume, be
> smart enough to order the writes when it flushes its caches to disk in a
> way that minimizes random seeks. But then, wouldn't the manner in which
> Kafka writes to files be more or less irrelevant?
> Or put differently: If Kafka was synchronously flushing to disk, wouldn't
> it have to limit itself to writing all partitions for a broker/disk to a
> single file, if it wanted to do sequential file I/O?
>
> For reading (historical, non-realtime) data that is not in the OS cache,
> keeping it in append-only files, the statement of course makes sense.
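(To make the fsync point above concrete, here is a minimal plain-Python sketch - not Kafka code - of the difference between handing an append to the OS page cache and forcing it to the device:)

```python
import os
import tempfile

def append(path, payload, sync):
    """Append payload to a log file; optionally force it to the device."""
    with open(path, "ab") as f:
        f.write(payload)
        f.flush()                 # push the userspace buffer to the OS page cache
        if sync:
            os.fsync(f.fileno())  # block until the OS persists to the device

path = os.path.join(tempfile.mkdtemp(), "segment.log")

# Without fsync: the write is durable only after the OS flushes its cache,
# at which point the kernel can reorder/merge writes to minimize seeks.
append(path, b"record-1\n", sync=False)

# With fsync: the call does not return until the data is on disk - this is
# the synchronous-flush mode that is so seek-sensitive on HDDs.
append(path, b"record-2\n", sync=True)

print(os.path.getsize(path))  # 18
```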