What would you consider to be a message that is “too large”?


In April I ran a set of tests, which I outlined in the following thread:



http://grokbase.com/t/kafka/users/145g8k62rf/performance-testing-data-to-share



It includes a Google Doc link with all the results (it's easiest to download
it into Excel and use the filters to drill into what you want).  Comparing
Snappy against no compression (NONE), I didn't see much improvement for the
2,200-byte messages we are looking at, and for small messages NONE was the
fastest.



Running Kafka 0.8.0 on a three-node cluster: 16 cores, 256 GB RAM, and
12 x 4 TB drives per node.

message.size = 2200

batch.size = 400

partitions = 12

replication = 3

acks = leader
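
For reference, those settings map onto the 0.8 producer roughly as
follows.  This is a sketch of the old producer API from memory, not the
actual test harness; the broker host names are hypothetical, and "acks =
leader" corresponds to request.required.acks=1.

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.ProducerConfig;

Properties props = new Properties();
// Hypothetical broker addresses for the three-node cluster.
props.put("metadata.broker.list", "broker1:9092,broker2:9092,broker3:9092");
props.put("serializer.class", "kafka.serializer.DefaultEncoder"); // raw byte[]
props.put("compression.codec", "snappy");  // "none", "gzip", or "snappy"
props.put("batch.num.messages", "400");    // batch size used in the tests
props.put("producer.type", "async");       // batching requires the async producer
props.put("request.required.acks", "1");   // wait for the partition leader only

Producer<byte[], byte[]> producer =
    new Producer<byte[], byte[]>(new ProducerConfig(props));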



With the 2,200-byte messages above I was able to get:

SNAPPY = 151K messages per second

NONE = 140K messages per second

GZIP = 86K messages per second



With small messages of 200 bytes:

SNAPPY = 660K messages per second

NONE = 740K messages per second

GZIP = 340K messages per second





So let’s assume I can compress 2,200 bytes down to 200 bytes.  (I’m just
using these numbers because I ran tests at these sizes; my guess is I will
not get compression that good, but it makes the example concrete.)  If I
run uncompressed, I can process 140K messages per second.  If I compress
each message in my application from 2,200 down to 200 bytes, I can then
send through Kafka at 740K events per second.  Put another way, 140K x
2,200 bytes is about 308 MB/s on the wire, while 740K x 200 bytes is about
148 MB/s, so the brokers move fewer bytes while delivering more than five
times as many application events.
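
To make the application-side option concrete, here is a minimal sketch of
the batch variant I described in my original message: pack many rows into
one Kafka message, compress in our code, and produce with the Kafka
compression setting of none.  This is my own illustration, not code from
the tests; it assumes the snappy-java library (org.xerial.snappy) and a
simple length-prefixed framing.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.xerial.snappy.Snappy;

// Packs many application rows into one Snappy-compressed payload that is
// sent to Kafka as a single message, and unpacks it on the consumer side.
public class BatchCodec {

    // Length-prefix each row, concatenate, then compress the whole batch.
    public static byte[] pack(List<byte[]> rows) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        for (byte[] row : rows) {
            out.writeInt(row.length);  // 4-byte length prefix
            out.write(row);
        }
        out.flush();
        return Snappy.compress(buf.toByteArray());
    }

    // Reverse of pack(): decompress, then split on the length prefixes.
    public static List<byte[]> unpack(byte[] payload) throws IOException {
        byte[] raw = Snappy.uncompress(payload);
        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(raw));
        List<byte[]> rows = new ArrayList<byte[]>();
        while (in.available() > 0) {
            byte[] row = new byte[in.readInt()];
            in.readFully(row);
            rows.add(row);
        }
        return rows;
    }
}

The producer would send pack(rows) as one Kafka message, and the consumer
would call unpack() on each message it receives; the broker never has to
decompress or recompress anything because, as far as Kafka is concerned,
the payload is just opaque uncompressed bytes.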




Bert








On Thu, Jun 26, 2014 at 5:23 PM, Neha Narkhede <neha.narkh...@gmail.com>
wrote:

> Using a single Kafka message to contain an application snapshot has the
> upside of getting atomicity for free. Either the snapshot will be written
> as a whole to Kafka or not. This is poor man's transactionality. Care needs
> to be taken to ensure that the message is not too large since that might
> cause memory consumption problems on the server or the consumers.
>
> As far as compression overhead is concerned, have you tried running Snappy?
> Snappy's performance is good enough to offset the decompression-compression
> overhead on the server.
>
> Thanks,
> Neha
>
>
> On Thu, Jun 26, 2014 at 12:42 PM, Bert Corderman <bertc...@gmail.com>
> wrote:
>
> > We are in the process of engineering a system that will be using Kafka.
> > The legacy system uses the local file system and a database as the
> > queue.  In terms of scale, we process about 35 billion events per day,
> > contained in 15 million files.
> >
> >
> >
> > I am looking for feedback on a design decision we are discussing:
> >
> >
> >
> > In our current system we depend heavily on compression as a performance
> > optimization.  In Kafka the use of compression has some overhead, as the
> > broker needs to decompress the data to assign offsets and re-compress it
> > (explained in detail here:
> >
> > http://geekmantra.wordpress.com/2013/03/28/compression-in-kafka-gzip-or-snappy/
> > )
> >
> >
> >
> > We are thinking about NOT using Kafka compression but rather compressing
> > multiple rows in our own code. For example, say we wanted to send data in
> > batches of 5,000 rows.  Using Kafka compression, we would use a batch
> > size of 5,000 rows and enable compression. The other option is to use a
> > batch size of 1 in Kafka BUT, in our code, take 5,000 messages, compress
> > them, and then send them to Kafka with the Kafka compression setting of
> > none.
> >
> >
> >
> > Is this a pattern others have used?
> >
> >
> >
> > Regardless of compression, I am curious whether others are using a
> > single message in Kafka to contain multiple messages from an application
> > standpoint.
> >
> >
> > Bert
> >
>
