Hey,

I used the default settings. I think (though I didn't check) that means it
flushed to disk every 5 seconds.
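For reference, these are the flush-related broker settings involved (a sketch; the values shown are, as far as I know, the defaults, which leave the actual fsync timing to the OS page cache rather than flushing on a fixed 5-second timer):

```properties
# server.properties - flush-related settings (defaults sketched, not
# verified against a specific Kafka version)
# Kafka itself does not force an fsync unless one of these thresholds is set:
log.flush.interval.messages=9223372036854775807
# log.flush.interval.ms is unset by default, so the OS decides when
# dirty pages are written back to disk. To force a flush every 5 seconds
# you would set:
#log.flush.interval.ms=5000
```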
I just switched to SSDs (it is kinda amazing to do this online on an AWS
EBS disk :D ) and it works fine now.

Regards,
Peter

On Sat, 14 Mar 2020 at 01:27, Eugen Dueck <eu...@tworks.co.jp> wrote:

> Hi Peter
>
> when you were struggling with HDD, were you using custom flush settings,
> or the "do not flush" default?
>
> I think a system like BookKeeper is essential if you want to fsync
> synchronously, like Pulsar does - especially on HDDs. For SSDs, configuring
> multiple directories, i.e. multiple files being written to concurrently,
> may be necessary to maximize throughput.
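> A minimal sketch of what I mean, in server.properties (the paths are
> hypothetical):
>
> ```properties
> # One log directory per physical disk; Kafka spreads partitions across
> # them, so writes to different disks can proceed concurrently.
> log.dirs=/mnt/ssd0/kafka-logs,/mnt/ssd1/kafka-logs
> ```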
>
> Eugen
> ________________________________
> From: Péter Sinóros-Szabó <peter.sinoros-sz...@transferwise.com.INVALID>
> Sent: 13 March 2020 21:42
> To: users@kafka.apache.org <users@kafka.apache.org>
> Subject: Re: Sequential writes make Kafka fast, or so they say
>
> Hi,
>
> Yes, if you write to one partition only, the writes will be sequential. But
> that's unlikely, so in practice the writes won't be sequential overall.
> I used AWS EC2 instances with st1 EBS disks, the older HDD-type rotational
> storage. It struggled to deliver any kind of performance for our 6000+
> partitions. Switching to gp2 SSDs solved that in a second.
>
> There are other emerging messaging systems that, for this reason, write to
> a single file... like BookKeeper.
>
> Peter
>
> On Thu, 12 Mar 2020 at 03:30, Eugen Dueck <eu...@tworks.co.jp> wrote:
>
> > A question about something that was always in the back of my mind.
> >
> > According to Jay Kreps
> >
> > > The first [reason that Kafka is so fast despite writing to disk] is
> > > that Kafka does only sequential file I/O.
> >
> > I wonder how true this statement is, because Kafka uses 3 segments per
> > partition. So even with a single topic and partition per broker and disk,
> > it would not be sequential. Now say we have 1000 partitions per
> > broker/disk, i.e. 3000 files. How can concurrent/interleaved writes to
> > thousands of files on a single disk be considered sequential file I/O?
> >
> > Isn't the reason Kafka is so fast despite writing to disk the fact that
> > it does not fsync to disk, leaving that to the OS? The OS would, I
> > assume, be smart enough to order the writes when it flushes its caches
> > to disk in a way that minimizes random seeks. But then, wouldn't the
> > manner in which Kafka writes to files be more or less irrelevant? Or put
> > differently: If Kafka was synchronously flushing to disk, wouldn't it
> > have to limit itself to writing all partitions for a broker/disk to a
> > single file, if it wanted to do sequential file I/O?
> >
> > For reading (historical, non-realtime) data that is not in the OS cache,
> > keeping it in append-only files, the statement of course makes sense.
> >
>
