Nilesh,

It's expected that a lot of memory is used for cache. This makes sense
because under the hood, Kafka mostly just reads and writes data to/from
files. While Kafka does manage some in-memory data, mostly it is writing
produced data (or replicated data) to log files and then serving those same
messages to consumers directly out of the log files. It relies on OS-level
file system caching to optimize how data is managed. Operating systems are
already designed to do this well, so it's generally better to reuse this
functionality than to try to implement a custom caching layer.
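
To make this concrete, here's a rough sketch (hypothetical code, not
Kafka's actual implementation) of the zero-copy pattern that serving
consumers straight out of log files enables. FileChannel.transferTo() maps
to sendfile(2) on Linux, so bytes can move from the OS page cache directly
to the consumer's socket without ever being copied into the JVM heap:

import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: serve a slice of a log segment file directly to a
// consumer's socket. If the requested range is already in the page cache,
// this call typically never touches the disk.
public class SegmentServer {
    static long serveSlice(Path segment, SocketChannel consumer,
                           long offset, long length) throws IOException {
        try (FileChannel log = FileChannel.open(segment, StandardOpenOption.READ)) {
            return log.transferTo(offset, length, consumer);
        }
    }
}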

So when you see most of your memory consumed as cache, that's because the
OS has used the access patterns on those files to decide which parts of
which files are most likely to be useful again in the future. As Daniel's
link points out, it only does this when that memory is not needed for some
other purpose.

This approach isn't always perfect. If you have too much data to fit in
memory and you scan through it, performance will suffer. Eventually, you
will hit regions of files that are not in cache and the OS will be forced
to read those off disk, which is much slower than reading from cache.

From your description I'm not sure if you have 120 partitions *per topic*
or *total* across all topics. Let's go with the lesser, 120 partitions
total. You also mention 3 brokers. Dividing 120 partitions across 3
brokers, we get about 40 partitions each broker is a leader for, which is
data it definitely needs cached in order to serve consumers. You didn't
mention the replication factor, so let's just ignore it here and assume the
lowest possible, only 1 copy of the data. Even so, it looks like you have
~8GB of memory (based on the free -m numbers), and at 15 MB/message with 40
partitions per broker, that's only 8192/(15*40) = ~14 messages per
partition that would fit in memory, assuming it was all used for file
cache. That's not much, so if your total data stored is much larger and you
ever have to read through any old data, your throughput will likely suffer.
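
If it helps, here's that back-of-the-envelope estimate as a quick snippet
(all inputs are taken from the numbers in this thread, not measured):

public class CacheEstimate {
    public static void main(String[] args) {
        double totalMemoryMb = 8192;       // ~8 GB RAM on the broker (free -m)
        double messageSizeMb = 15;         // ~15 MB per message
        int partitionsPerBroker = 120 / 3; // 120 partitions across 3 brokers
        double perPartition = totalMemoryMb / (messageSizeMb * partitionsPerBroker);
        System.out.printf("~%.0f messages per partition fit in cache%n", perPartition);
        // prints ~14, and that assumes every byte of RAM went to the file cache
    }
}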

It's hard to say much more without understanding what your workload is
like, if you're consuming data other than what the Storm spout is
consuming, the rate at which you're producing data, etc. However, my
initial impression is that you may be trying to process too much data with
too little memory and too little disk throughput.

If you want more details, I'd suggest reading this section of the docs,
which further explains how a lot of this stuff works:
http://kafka.apache.org/documentation.html#persistence

-Ewen

On Mon, Jul 27, 2015 at 11:19 PM, Nilesh Chhapru <
nilesh.chha...@ugamsolutions.com> wrote:

> Hi Ewen,
>
> I am using 3 brokers with 12 topics and roughly 120-125 partitions
> without any replication, and the message size is approx 15 MB/message.
>
> The problem is that when the cache memory increases and reaches the max
> available, performance starts degrading. Also, I am using a Storm spout
> as a consumer, which stops reading at times.
>
> When I do a free -m on my broker node after 1/2 - 1 hr, the memory
> footprint is as follows.
> 1) Physical memory - 500 MB - 600 MB
> 2) Cache Memory - 6.5 GB
> 3) Free Memory - 50 - 60 MB
>
> Regards,
> Nilesh Chhapru.
>
> On Monday 27 July 2015 11:02 PM, Ewen Cheslack-Postava wrote:
> > Having the OS cache the data in Kafka's log files is useful since it
> means
> > that data doesn't need to be read back from disk when consumed. This is
> > good for the latency and throughput of consumers. Usually this caching
> > works out pretty well, keeping the latest data from your topics in cache
> > and only pulling older data into memory if a consumer reads data from
> > earlier in the log. In other words, by leveraging OS-level caching of
> > files, Kafka gets an in-memory caching layer for free.
> >
> > Generally you shouldn't need to clear this data -- the OS should only be
> > using memory that isn't being used anyway. Is there a particular problem
> > you're encountering that clearing the cache would help with?
> >
> > -Ewen
> >
> > On Mon, Jul 27, 2015 at 2:33 AM, Nilesh Chhapru <
> > nilesh.chha...@ugamsolutions.com> wrote:
> >
> >> Hi All,
> >>
> >> I am facing issues with the Kafka broker process taking a lot of cache
> >> memory. I just wanted to know if the process really needs that much
> >> cache memory, or can I clear the OS-level cache by setting a cron.
> >>
> >> Regards,
> >> Nilesh Chhapru.
> >>
> >
> >
>
>


-- 
Thanks,
Ewen
