Hi Ewen,

Thanks for the reply. The assumptions you made about replication and partitions are correct: 120 is the total number of partitions, and the replication factor is 1 for all topics.
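For what it's worth, I redid the back-of-envelope calculation from your mail below with these numbers -- a rough sketch (Python), assuming, as in your estimate, that roughly 8 GB of each broker's memory ends up usable as OS page cache:

    # Rough cache-fit estimate for one broker, using the numbers from this thread.
    total_partitions = 120     # total across all 12 topics
    brokers = 3
    message_size_mb = 15       # approx. size of one message
    page_cache_mb = 8192       # assumed ~8 GB usable as OS page cache

    leader_partitions = total_partitions / brokers                    # ~40 per broker
    messages_per_partition = page_cache_mb / (message_size_mb * leader_partitions)
    print("~%d leader partitions per broker, ~%.1f messages per partition fit in cache"
          % (leader_partitions, messages_per_partition))

So only the last ~13-14 messages of each partition can be hot in cache at any time, which matches your estimate.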
Does that mean a broker will keep all the messages that are produced in memory, or
only the unconsumed messages? Is there a way to restrict this to only x number of
messages or x MB of total data in memory?

Regards,
Nilesh Chhapru.

On Tuesday 28 July 2015 12:37 PM, Ewen Cheslack-Postava wrote:
> Nilesh,
>
> It's expected that a lot of memory is used for cache. This makes sense
> because under the hood, Kafka mostly just reads and writes data to/from
> files. While Kafka does manage some in-memory data, mostly it is writing
> produced data (or replicated data) to log files and then serving those same
> messages to consumers directly out of the log files. It relies on OS-level
> file system caching to optimize how data is managed. Operating systems are
> already designed to do this well, so it's generally better to reuse this
> functionality than to try to implement a custom caching layer.
>
> So when you see most of your memory consumed as cache, that's because the
> OS has used the access patterns for data in those files to select which
> parts of different files seem most likely to be useful in the future. As
> Daniel's link points out, it's only doing this when that memory is not
> needed for some other purpose.
>
> This approach isn't always perfect. If you have too much data to fit in
> memory and you scan through it, performance will suffer. Eventually, you
> will hit regions of files that are not in cache and the OS will be forced
> to read those off disk, which is much slower than reading from cache.
>
> From your description I'm not sure if you have 120 partitions *per topic*
> or *total* across all topics. Let's go with the lesser, 120 partitions
> total. You also mention 3 brokers. Dividing 120 partitions across 3
> brokers, we get about 40 partitions each broker is a leader for, which is
> data it definitely needs cached in order to serve consumers. You didn't
> mention the replication factor, so let's just ignore it here and assume the
> lowest possible, only 1 copy of the data. Even so, it looks like you have
> ~8GB of memory (based on the free -m numbers), and at 15 MB/message with 40
> partitions per broker, that's only 8192/(15*40) = ~14 messages per
> partition that would fit in memory, assuming it was all used for file
> cache. That's not much, so if your total data stored is much larger and you
> ever have to read through any old data, your throughput will likely suffer.
>
> It's hard to say much more without understanding what your workload is
> like, if you're consuming data other than what the Storm spout is
> consuming, the rate at which you're producing data, etc. However, my
> initial impression is that you may be trying to process too much data with
> too little memory and too little disk throughput.
>
> If you want more details, I'd suggest reading this section of the docs,
> which further explains how a lot of this stuff works:
> http://kafka.apache.org/documentation.html#persistence
>
> -Ewen
>
> On Mon, Jul 27, 2015 at 11:19 PM, Nilesh Chhapru <
> nilesh.chha...@ugamsolutions.com> wrote:
>
>> Hi Ewen,
>>
>> I am using 3 brokers with 12 topics and about 120-125 partitions
>> without any replication, and the message size is approx 15 MB/message.
>>
>> The problem is that when the cache memory increases and reaches the max
>> available, the performance starts degrading; also, I am using a Storm
>> spout as consumer, which stops reading at times.
>>
>> When I do a free -m on my broker node after 1/2 - 1 hr, the memory
>> footprint is as follows.
>> 1) Physical memory - 500 MB - 600 MB
>> 2) Cache memory - 6.5 GB
>> 3) Free memory - 50 - 60 MB
>>
>> Regards,
>> Nilesh Chhapru.
>>
>> On Monday 27 July 2015 11:02 PM, Ewen Cheslack-Postava wrote:
>>> Having the OS cache the data in Kafka's log files is useful since it
>>> means that data doesn't need to be read back from disk when consumed.
>>> This is good for the latency and throughput of consumers. Usually this
>>> caching works out pretty well, keeping the latest data from your topics
>>> in cache and only pulling older data into memory if a consumer reads
>>> data from earlier in the log. In other words, by leveraging OS-level
>>> caching of files, Kafka gets an in-memory caching layer for free.
>>>
>>> Generally you shouldn't need to clear this data -- the OS should only be
>>> using memory that isn't being used anyway. Is there a particular problem
>>> you're encountering that clearing the cache would help with?
>>>
>>> -Ewen
>>>
>>> On Mon, Jul 27, 2015 at 2:33 AM, Nilesh Chhapru <
>>> nilesh.chha...@ugamsolutions.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I am facing issues with the Kafka broker process taking a lot of cache
>>>> memory. I just wanted to know if the process really needs that much
>>>> cache memory, or whether I can clear the OS-level cache by setting a cron.
>>>>
>>>> Regards,
>>>> Nilesh Chhapru.
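P.S. In case it helps with interpreting the free -m numbers above: this is roughly how we check how much of a broker's memory is truly free versus reclaimable page cache -- a small sketch, assuming a Linux broker host (values in /proc/meminfo are in kB):

    # Report free vs. reclaimable (page cache) memory on a Linux host.
    meminfo = {}
    with open("/proc/meminfo") as f:
        for line in f:
            name, value = line.split(":", 1)
            meminfo[name] = int(value.split()[0])   # values are in kB

    free_mb = meminfo["MemFree"] / 1024
    reclaimable_mb = (meminfo["Cached"] + meminfo.get("Buffers", 0)) / 1024
    print("free: %d MB, page cache/buffers (reclaimable): %d MB"
          % (free_mb, reclaimable_mb))

The large "cache" figure reported by free -m falls in the reclaimable bucket, which is why the box still has memory the OS can hand back to Kafka even when "free" looks tiny.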