Hi Ewen,

Thanks for the reply.
The assumptions that you made for replication and partitions are
correct, 120 is total number of partitions and replication factor is 1
for all the topics.

Does that mean a broker will keep all the messages that are produced in
memory, or only the unconsumed messages?

Is there a way we can restrict this to only x messages or x MB of total
data in memory?

Regards,
Nilesh Chhapru.

On Tuesday 28 July 2015 12:37 PM, Ewen Cheslack-Postava wrote:
> Nilesh,
>
> It's expected that a lot of memory is used for cache. This makes sense
> because under the hood, Kafka mostly just reads and writes data to/from
> files. While Kafka does manage some in-memory data, mostly it is writing
> produced data (or replicated data) to log files and then serving those same
> messages to consumers directly out of the log files. It relies on OS-level
> file system caching to optimize how data is managed. Operating systems are
> already designed to do this well, so it's generally better to reuse this
> functionality than to try to implement a custom caching layer.
>
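A minimal Python sketch of the page-cache effect described above (not Kafka
code; it uses a hypothetical /tmp/pagecache_demo.bin scratch file):

    import os, time

    PATH = "/tmp/pagecache_demo.bin"   # hypothetical scratch file

    # Append ~256 MB sequentially, much like a broker appending to its log segments.
    with open(PATH, "wb") as f:
        for _ in range(256):
            f.write(os.urandom(1024 * 1024))

    def scan(path):
        # Read the whole file sequentially and return elapsed seconds.
        start = time.time()
        with open(path, "rb") as f:
            while f.read(8 * 1024 * 1024):
                pass
        return time.time() - start

    # Both passes are normally fast because the freshly written data is still
    # in the OS page cache; to see a true cold read you would have to drop the
    # cache first (requires root), after which the first pass falls back to
    # disk speed while the second is served from cache again.
    print("first read :", scan(PATH), "s")
    print("second read:", scan(PATH), "s")
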
> So when you see most of your memory consumed as cache, that's because the
> OS has used the access patterns for data in those files to select which
> parts of different files seem most likely to be useful in the future. As
> Daniel's link points out, it's only doing this when that memory is not
> needed for some other purpose.
>
> This approach isn't always perfect. If you have too much data to fit in
> memory and you scan through it, performance will suffer. Eventually, you
> will hit regions of files that are not in cache and the OS will be forced
> to read those off disk, which is much slower than reading from cache.
>
> From your description I'm not sure if you have 120 partitions *per topic*
> or *total* across all topics. Let's go with the lesser, 120 partitions
> total. You also mention 3 brokers. Dividing 120 partitions across 3
> brokers, we get about 40 partitions each broker is a leader for, which is
> data it definitely needs cached in order to serve consumers. You didn't
> mention the replication factor, so let's just ignore it here and assume the
> lowest possible, only 1 copy of the data. Even so, it looks like you have
> ~8GB of memory (based on the free -m numbers), and at 15 MB/message with 40
> partitions per broker, that's only 8192/(15*40) = ~14 messages per
> partition that would fit in memory, assuming it was all used for file
> cache. That's not much, so if your total data stored is much larger and you
> ever have to read through any old data, your throughput will likely suffer.
>
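For reference, the back-of-the-envelope calculation above in a few lines of
Python (the ~8 GB RAM, 15 MB message size, and 40 leader partitions per broker
are the assumed figures from this thread):

    total_cache_mb = 8192              # assume all ~8 GB of RAM is usable as file cache
    message_mb = 15                    # approximate message size
    partitions_per_broker = 120 // 3   # 40 leader partitions per broker

    # Messages per partition that would fit in one broker's cache.
    msgs_per_partition = total_cache_mb / (message_mb * partitions_per_broker)
    print(round(msgs_per_partition, 1))   # ~13.7, i.e. roughly 14 messages
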
> It's hard to say much more without understanding what your workload is
> like, if you're consuming data other than what the Storm spout is
> consuming, the rate at which you're producing data, etc. However, my
> initial impression is that you may be trying to process too much data with
> too little memory and too little disk throughput.
>
> If you want more details, I'd suggest reading this section of the docs,
> which further explains how a lot of this stuff works:
> http://kafka.apache.org/documentation.html#persistence
>
> -Ewen
>
> On Mon, Jul 27, 2015 at 11:19 PM, Nilesh Chhapru <
> nilesh.chha...@ugamsolutions.com> wrote:
>
>> Hi Ewen,
>>
>> I am using 3 brokers with 12 topics and about 120-125 partitions
>> without any replication, and the message size is approx 15 MB/message.
>>
>> The problem is that when the cache memory increases and reaches the max
>> available, the performance starts degrading. Also, I am using a Storm spout
>> as the consumer, which stops reading at times.
>>
>> When I do a free -m on my broker node after 1/2 - 1 hr, the memory
>> footprint is as follows.
>> 1) Physical memory - 500 MB - 600 MB
>> 2) Cache Memory - 6.5 GB
>> 3) Free Memory - 50 - 60 MB
>>
>> Regards,
>> Nilesh Chhapru.
>>
>> On Monday 27 July 2015 11:02 PM, Ewen Cheslack-Postava wrote:
>>> Having the OS cache the data in Kafka's log files is useful since it
>>> means
>>> that data doesn't need to be read back from disk when consumed. This is
>>> good for the latency and throughput of consumers. Usually this caching
>>> works out pretty well, keeping the latest data from your topics in cache
>>> and only pulling older data into memory if a consumer reads data from
>>> earlier in the log. In other words, by leveraging OS-level caching of
>>> files, Kafka gets an in-memory caching layer for free.
>>>
>>> Generally you shouldn't need to clear this data -- the OS should only be
>>> using memory that isn't being used anyway. Is there a particular problem
>>> you're encountering that clearing the cache would help with?
>>>
>>> -Ewen
>>>
>>> On Mon, Jul 27, 2015 at 2:33 AM, Nilesh Chhapru <
>>> nilesh.chha...@ugamsolutions.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I am facing issues with the Kafka broker process taking a lot of cache
>>>> memory. I just wanted to know if the process really needs that much cache
>>>> memory, or whether I can clear the OS-level cache by setting a cron job.
>>>>
>>>> Regards,
>>>> Nilesh Chhapru.
>>>>
>>>
>>
>
