Hey, I used the default settings. I think (but I didn't check) that means it flushed to disk every 5 seconds. I just switched to SSDs (it is kind of amazing to be able to do this online on an AWS EBS disk :D ) and it works fine now.
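For what it's worth, the application-level flush behaviour is controlled by these broker settings (this is my understanding of the defaults — please double-check against the Kafka docs for your version):

```properties
# Force an fsync after this many accumulated messages per partition.
# The default is Long.MAX_VALUE, i.e. effectively never - flushing is
# left entirely to the OS page cache.
log.flush.interval.messages=9223372036854775807

# Force an fsync after this many milliseconds. Unset by default, in
# which case log.flush.scheduler.interval.ms applies (also effectively
# "never" out of the box). Uncomment to flush every 5 seconds:
#log.flush.interval.ms=5000
```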
Regards,
Peter

On Sat, 14 Mar 2020 at 01:27, Eugen Dueck <eu...@tworks.co.jp> wrote:

> Hi Peter
>
> when you were struggling with HDD, were you using custom flush settings,
> or the "do not flush" default?
>
> I think a system like BookKeeper is essential if you want to fsync
> synchronously, like Pulsar does - especially on HDDs. For SSDs, configuring
> multiple directories, i.e. multiple files being written to concurrently,
> may be necessary to maximize throughput.
>
> Eugen
> ________________________________
> From: Péter Sinóros-Szabó <peter.sinoros-sz...@transferwise.com.INVALID>
> Sent: March 13, 2020 21:42
> To: users@kafka.apache.org <users@kafka.apache.org>
> Subject: Re: Sequential writes make Kafka fast, or so they say
>
> Hi,
>
> yes, if you write to one partition only, it will be sequential. But that's
> unlikely, so in practice it won't be sequential overall.
> I used AWS EC2 instances with st1 EBS disks, the old rotational HDD type.
> It struggled to deliver any kind of performance for our 6000+ partitions.
> Switching to gp2 SSD solved that in a second.
>
> There are other emerging messaging systems that, for this reason, write to
> a single file... like BookKeeper
>
> Peter
>
> On Thu, 12 Mar 2020 at 03:30, Eugen Dueck <eu...@tworks.co.jp> wrote:
>
> > A question about something that was always in the back of my mind.
> >
> > According to Jay Kreps:
> >
> > > The first [reason that Kafka is so fast despite writing to disk] is
> > > that Kafka does only sequential file I/O.
> >
> > I wonder how true this statement is, because Kafka uses 3 segments per
> > partition, so even with a single topic and partition per broker and disk,
> > it would not be sequential. Now say we have 1000 partitions per
> > broker/disk, i.e. 3000 files. How can concurrent/interleaved writes to
> > thousands of files on a single disk be considered sequential file I/O?
> >
> > Isn't the reason Kafka is so fast despite writing to disk the fact that
> > it does not fsync to disk, leaving that to the OS? The OS would, I
> > assume, be smart enough to order the writes when it flushes its caches
> > to disk in a way that minimizes random seeks. But then, wouldn't the
> > manner in which Kafka writes to files be more or less irrelevant? Or,
> > put differently: if Kafka were synchronously flushing to disk, wouldn't
> > it have to limit itself to writing all partitions for a broker/disk to
> > a single file, if it wanted to do sequential file I/O?
> >
> > For reading (historical, non-realtime) data that is not in the OS cache,
> > keeping it in append-only files of course makes sense.
> >
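To illustrate the interleaving point being discussed above, here is a toy sketch (not Kafka code — the file layout and counts are made up). It appends messages round-robin across many per-partition files; each append is sequential *within* its file, but the stream of writes hitting the disk hops between files. With `fsync=False` the OS page cache absorbs and reorders the writes; with `fsync=True` each message forces a disk round-trip, which on an HDD means a seek per file.

```python
import os
import tempfile

def append_round_robin(n_partitions=8, n_messages=100, fsync=False):
    """Append messages round-robin to one log file per partition."""
    log_dir = tempfile.mkdtemp()
    files = [
        open(os.path.join(log_dir, f"partition-{p}.log"), "ab")
        for p in range(n_partitions)
    ]
    for i in range(n_messages):
        # Interleaved across files: sequential per file, not per disk.
        f = files[i % n_partitions]
        f.write(f"message-{i}\n".encode())
        if fsync:
            # Synchronous durability per message (the BookKeeper/Pulsar
            # style mentioned above) - one forced write-out per append.
            f.flush()
            os.fsync(f.fileno())
    for f in files:
        f.close()
    return log_dir

log_dir = append_round_robin()
```

Timing the two variants on an HDD versus an SSD makes the difference discussed in this thread quite visible.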