Hello,

During my internship at KLA Inc., we used Kafka as a streaming platform to transfer text and image data from a Linux machine to a Windows machine.
The data is in the form of discrete, non-serialized records containing text and image data as blobs. We serialize each record with FlatBuffers or Protocol Buffers and hand it to a Kafka producer, which pushes the data to the broker. On the Windows side, the data is consumed by 10 consumers per topic. There were two topics with 10 partitions each: one for the text data and one for the blob data. To simulate large volumes of data, we read a single file once and loop over its contents until we reach the desired volume. We tested it for 10K (722 MB), 100K (7.01 GB), 500K (35 GB), and 1M (69.7 GB) records, and it worked flawlessly. The issue occurred when we tried to simulate 5M records, which amounts to 348 GB. The text data still worked perfectly, but when it came to consuming the blob/image data, the consumer would not poll more than 1 or 2 records, whereas in the earlier runs the image consumer would poll >1000 records. I believe the producer also slowed down massively. We were unable to solve this issue, and I don't know whether it's a configuration issue, a problem with the systems, or a limit of Kafka itself.
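For context, the run sizes quoted above imply a roughly constant ~73 KB per record, and with 10 partitions per topic the 5M-record run leaves on the order of 35 GB in each blob partition's log. A quick back-of-the-envelope check (all record counts and totals are taken from the runs above; the per-partition figure assumes an even spread across partitions):

```python
# Sanity-check the quoted volumes: each run should imply a roughly
# constant per-record size (counts and totals from the runs above).
runs = {
    10_000: 722 * 1024**2,       # 722 MB
    100_000: 7.01 * 1024**3,     # 7.01 GB
    500_000: 35 * 1024**3,       # 35 GB
    1_000_000: 69.7 * 1024**3,   # 69.7 GB
    5_000_000: 348 * 1024**3,    # 348 GB
}
for count, total in runs.items():
    print(f"{count:>9,} records -> {total / count / 1024:.1f} KB/record")

# With 10 partitions per topic and an even spread, the 5M-record run
# puts roughly 348 GB / 10 = ~35 GB in each partition's log.
print(f"5M run: ~{348 / 10:.1f} GB per partition")
```

So the record size itself does not change between the runs; only the accumulated log volume per partition grows.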
We were monitoring the run through Prometheus and Grafana, and the only insight we got was that the slowdown started at around 2-3M records (140-210 GB).

These are the specifications of the systems used, along with the configuration of the producer, broker, and consumer:

*Linux Machine:*
Processor: Intel® Xeon® Processor E5-2658 @ 2.40 GHz
Storage: 2 TB SSD + 1 TB HDD
RAM: 128 GB
Operating System: SUSE Linux Enterprise Server 11

*Windows Machine:*
Processor: Intel® Xeon® Processor E5-2620 @ 2.00 GHz
RAM: 32 GB
Operating System: Windows Server 2008 R2 Standard

*Consumer Config:*
max.partition.fetch.bytes=1000000000
max.poll.records=1000000000
fetch.min.bytes=100000000
fetch.max.wait.ms=1000
fetch.max.bytes=1000000000
poll.ms=1000

*Producer Config:*
compression.type=snappy
batch.size=30000
linger.ms=10
buffer.memory=33554432

*Additional Points:*
- Only one broker was used.
- The transfer of records took place over InfiniBand, and the Kafka logs were on an SSD.
- The same issue occurred when we used a Spark consumer.

Any insight into why this occurred and how to solve it would be highly appreciated.

Best regards,
Sivaranjan M