Re: Cache Memory Kafka Process
Hi Ewen,

Thanks for the reply. The assumptions you made about replication and partitions are correct: 120 is the total number of partitions, and the replication factor is 1 for all topics. Does that mean a broker will keep all produced messages in memory, or only the unconsumed ones? Is there a way we can restrict this to only X number of messages or X MB of total data in memory?

Regards,
Nilesh Chhapru.

On Tuesday 28 July 2015 12:37 PM, Ewen Cheslack-Postava wrote:

Nilesh,

It's expected that a lot of memory is used for cache. This makes sense because under the hood, Kafka mostly just reads and writes data to/from files. While Kafka does manage some in-memory data, mostly it is writing produced data (or replicated data) to log files and then serving those same messages to consumers directly out of the log files. It relies on OS-level file system caching to optimize how data is managed. Operating systems are already designed to do this well, so it's generally better to reuse this functionality than to try to implement a custom caching layer. So when you see most of your memory consumed as cache, that's because the OS has used the access patterns for data in those files to select which parts of different files seem most likely to be useful in the future. As Daniel's link points out, it only does this when that memory is not needed for some other purpose.

This approach isn't always perfect. If you have too much data to fit in memory and you scan through it, performance will suffer: eventually you will hit regions of files that are not in cache, and the OS will be forced to read those off disk, which is much slower than reading from cache.

From your description I'm not sure if you have 120 partitions *per topic* or *total* across all topics. Let's go with the lesser, 120 partitions total. You also mention 3 brokers.
Dividing 120 partitions across 3 brokers, each broker is a leader for about 40 partitions, which is data it definitely needs cached in order to serve consumers. You didn't mention the replication factor, so let's just ignore it here and assume the lowest possible: only 1 copy of the data. Even so, it looks like you have ~8 GB of memory (based on the free -m numbers), and at 15 MB/message with 40 partitions per broker, that's only 8192/(15*40) = ~14 messages per partition that would fit in memory, assuming it was all used for file cache. That's not much, so if your total data stored is much larger and you ever have to read through any old data, your throughput will likely suffer.

It's hard to say much more without understanding what your workload is like: whether you're consuming data other than what the Storm spout is consuming, the rate at which you're producing data, etc. However, my initial impression is that you may be trying to process too much data with too little memory and too little disk throughput. If you want more details, I'd suggest reading this section of the docs, which further explains how a lot of this stuff works: http://kafka.apache.org/documentation.html#persistence

-Ewen

On Mon, Jul 27, 2015 at 11:19 PM, Nilesh Chhapru nilesh.chha...@ugamsolutions.com wrote:

Hi Ewen,

I am using 3 brokers with 12 topics and about 120-125 partitions without any replication, and the message size is approximately 15 MB/message. The problem is that when the cache memory increases and reaches the maximum available, performance starts degrading; also, I am using a Storm spout as consumer, which stops reading at times. When I do a free -m on my broker node after 1/2 - 1 hour, the memory footprint is as follows:

1) Physical memory - 500 MB - 600 MB
2) Cache memory - 6.5 GB
3) Free memory - 50 - 60 MB

Regards,
Nilesh Chhapru.
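The back-of-envelope estimate above can be sketched quickly. All figures are the thread's stated assumptions (8 GB of cache, 15 MB messages, 40 leader partitions per broker), not measurements:

```python
# Rough estimate: how many 15 MB messages per partition fit in ~8 GB
# of page cache on one broker, per the numbers in this thread.
cache_mb = 8192              # ~8 GB usable as file cache (assumed)
message_mb = 15              # approximate message size
partitions_per_broker = 40   # 120 partitions / 3 brokers

messages_per_partition = cache_mb / (message_mb * partitions_per_broker)
print(f"~{messages_per_partition:.1f} messages per partition fit in cache")
```

That works out to roughly 13-14 messages per partition, which is why any scan over older data quickly falls out of cache and onto disk.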
On Monday 27 July 2015 11:02 PM, Ewen Cheslack-Postava wrote:

Having the OS cache the data in Kafka's log files is useful since it means that data doesn't need to be read back from disk when consumed. This is good for the latency and throughput of consumers. Usually this caching works out pretty well, keeping the latest data from your topics in cache and only pulling older data into memory if a consumer reads data from earlier in the log. In other words, by leveraging OS-level caching of files, Kafka gets an in-memory caching layer for free.

Generally you shouldn't need to clear this data -- the OS should only be using memory that isn't being used anyway. Is there a particular problem you're encountering that clearing the cache would help with?

-Ewen

On Mon, Jul 27, 2015 at 2:33 AM, Nilesh Chhapru nilesh.chha...@ugamsolutions.com wrote:

Hi All,

I am facing issues with the Kafka broker process taking a lot of cache memory. I just wanted to know whether the process really needs that much cache memory, or whether I can clear the OS-level cache by setting a cron.

Regards,
Nilesh Chhapru.
Re: Cache Memory Kafka Process
Hi Ewen,

I am using 3 brokers with 12 topics and about 120-125 partitions without any replication, and the message size is approximately 15 MB/message. The problem is that when the cache memory increases and reaches the maximum available, performance starts degrading; also, I am using a Storm spout as consumer, which stops reading at times. When I do a free -m on my broker node after 1/2 - 1 hour, the memory footprint is as follows:

1) Physical memory - 500 MB - 600 MB
2) Cache memory - 6.5 GB
3) Free memory - 50 - 60 MB

Regards,
Nilesh Chhapru.

On Monday 27 July 2015 11:02 PM, Ewen Cheslack-Postava wrote:

Having the OS cache the data in Kafka's log files is useful since it means that data doesn't need to be read back from disk when consumed. This is good for the latency and throughput of consumers. Usually this caching works out pretty well, keeping the latest data from your topics in cache and only pulling older data into memory if a consumer reads data from earlier in the log. In other words, by leveraging OS-level caching of files, Kafka gets an in-memory caching layer for free.

Generally you shouldn't need to clear this data -- the OS should only be using memory that isn't being used anyway. Is there a particular problem you're encountering that clearing the cache would help with?

-Ewen

On Mon, Jul 27, 2015 at 2:33 AM, Nilesh Chhapru nilesh.chha...@ugamsolutions.com wrote:

Hi All,

I am facing issues with the Kafka broker process taking a lot of cache memory. I just wanted to know whether the process really needs that much cache memory, or whether I can clear the OS-level cache by setting a cron.

Regards,
Nilesh Chhapru.
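A note on reading the free -m output above: on Linux, memory reported as cache is reclaimable on demand, so the effective headroom is roughly "free" plus "cache", not "free" alone. A minimal sketch using the numbers from this message (values are the email's approximations):

```python
# Page cache is reclaimable: the kernel gives it back when applications
# need memory, so low "free" alongside high "cache" is normal, not a leak.
# Rough headroom = free + cache (values in MB, from the email).
free_mb = 55        # midpoint of the reported 50-60 MB free
cache_mb = 6656     # ~6.5 GB reported as cache
headroom_mb = free_mb + cache_mb
print(f"~{headroom_mb} MB effectively reclaimable")
```

So the ~50 MB "free" figure by itself does not indicate memory exhaustion; the degradation is more likely cache misses forcing disk reads.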
Cache Memory Kafka Process
Hi All,

I am facing issues with the Kafka broker process taking a lot of cache memory. I just wanted to know whether the process really needs that much cache memory, or whether I can clear the OS-level cache by setting a cron.

Regards,
Nilesh Chhapru.
Issues With Parallelism In Kafka Spout
Hi All,

I have implemented a high-level Kafka consumer in Storm, but it looks like parallelism isn't being achieved: I have 3 partitions and 2 tasks for the spout, but only one of them is emitting data. PFA the screen grab showing the number of spout tasks and the data emitted by only one of them. Please assist on how to achieve parallelism using the high-level Kafka spout.

Regards,
Nilesh Chhapru.

---Disclaimer-- Opinions expressed in this e-mail are those of the author and do not necessarily represent those of Ugam. Ugam does not accept any responsibility or liability for it. This e-mail message may contain proprietary, confidential or legally privileged information for the sole use of the person or entity to whom this message was originally addressed. Any review, re-transmission, dissemination or other use of or taking of any action in reliance upon this information by persons or entities other than the intended recipient is prohibited. If you have received this e-mail in error, please delete it and all attachments from any servers, hard drives or any other media. Warning: Sufficient measures have been taken to scan any presence of viruses however the recipient should check this email and any attachments for the presence of viruses. Ugam accepts no liability for any damage caused by any virus transmitted by this email.
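For what it's worth, the partition math alone shouldn't leave a task idle. A minimal sketch of range-style assignment (illustrative only; the function name is made up, and this is not Storm's actual scheduling code) shows that 2 tasks over 3 partitions each get at least one partition, so a task emitting nothing points at a consumer-group or rebalancing issue rather than the partition count:

```python
# Range-style partition assignment: split partitions as evenly as
# possible, earlier consumers absorbing the remainder. With more
# consumers than partitions, the extras are assigned nothing.
def range_assign(partitions, consumers):
    per, extra = divmod(len(partitions), len(consumers))
    out, i = {}, 0
    for c, consumer in enumerate(consumers):
        n = per + (1 if c < extra else 0)
        out[consumer] = partitions[i:i + n]
        i += n
    return out

# 3 partitions, 2 spout tasks: an uneven 2/1 split, but neither is idle.
print(range_assign([0, 1, 2], ["task-0", "task-1"]))
```

If the task count exceeded the partition count (e.g. 4 tasks over 3 partitions), idle tasks would be expected; with 2 tasks over 3 partitions they are not.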
RE: Issues With Parallelism In Kafka Spout
Hi All,

Please give some inputs, as this has been pending for long and we need to meet the deadlines.

Regards,
Nilesh Chhapru.

From: Nilesh Chhapru [mailto:nilesh.chha...@ugamsolutions.com]
Sent: 18 December 2014 01:24 PM
To: u...@storm.apache.org; users@kafka.apache.org
Subject: Issues With Parallelism In Kafka Spout

Hi All,

I have implemented a high-level Kafka consumer in Storm, but it looks like parallelism isn't being achieved: I have 3 partitions and 2 tasks for the spout, but only one of them is emitting data. PFB the screen grab showing the number of spout tasks and the data emitted by only one of them. Please assist on how to achieve parallelism using the high-level Kafka spout.

[cid:image001.png@01D01ABD.FA56FB90]

Regards,
Nilesh Chhapru.