Re: Cache Memory Kafka Process

2015-07-29 Thread Nilesh Chhapru
Hi Ewen,

Thanks for the reply.
The assumptions you made about replication and partitions are
correct: 120 is the total number of partitions, and the replication
factor is 1 for all the topics.

Does that mean a broker will keep all produced messages in memory, or
only the unconsumed ones?

Is there a way we can restrict this to only x messages or x MB
of total data in memory?

Regards,
Nilesh Chhapru.

On Tuesday 28 July 2015 12:37 PM, Ewen Cheslack-Postava wrote:
 Nilesh,

 It's expected that a lot of memory is used for cache. This makes sense
 because under the hood, Kafka mostly just reads and writes data to/from
 files. While Kafka does manage some in-memory data, mostly it is writing
 produced data (or replicated data) to log files and then serving those same
 messages to consumers directly out of the log files. It relies on OS-level
 file system caching to optimize how data is managed. Operating systems are
 already designed to do this well, so it's generally better to reuse this
 functionality than to try to implement a custom caching layer.
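
 If you want to see this in action, here's a quick sketch (assuming the
 third-party vmtouch tool is installed, and using an example path -- your
 actual log.dirs location may differ) that checks how much of a log
 segment is resident in the page cache:

     # overall memory/cache usage on the broker
     free -m
     # page-cache residency of one Kafka log segment
     # (example path -- substitute your log.dirs and topic-partition)
     vmtouch -v /tmp/kafka-logs/my-topic-0/00000000000000000000.log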

 So when you see most of your memory consumed as cache, that's because the
 OS has used the access patterns for data in those files to select which
 parts of different files seem most likely to be useful in the future. As
 Daniel's link points out, it's only doing this when that memory is not
 needed for some other purpose.

 This approach isn't always perfect. If you have too much data to fit in
 memory and you scan through it, performance will suffer. Eventually, you
 will hit regions of files that are not in cache and the OS will be forced
 to read those off disk, which is much slower than reading from cache.

 From your description I'm not sure if you have 120 partitions *per topic*
 or *total* across all topics. Let's go with the lesser, 120 partitions
 total. You also mention 3 brokers. Dividing 120 partitions across 3
 brokers, we get about 40 partitions each broker is a leader for, which is
 data it definitely needs cached in order to serve consumers. You didn't
 mention the replication factor, so let's just ignore it here and assume the
 lowest possible, only 1 copy of the data. Even so, it looks like you have
 ~8GB of memory (based on the free -m numbers), and at 15 MB/message with 40
 partitions per broker, that's only 8192/(15*40) = ~14 messages per
 partition that would fit in memory, assuming it was all used for file
 cache. That's not much, so if your total data stored is much larger and you
 ever have to read through any old data, your throughput will likely suffer.
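
 To make that arithmetic concrete, here is the same back-of-the-envelope
 check as a runnable sketch, using the rough numbers from this thread:

     # approximate values from this thread -- adjust to your setup
     MEM_MB=8192       # memory potentially available as page cache
     MSG_MB=15         # approximate message size
     PARTITIONS=40     # leader partitions per broker (120 / 3)
     echo "scale=1; $MEM_MB / ($MSG_MB * $PARTITIONS)" | bc
     # prints 13.6 -> only ~14 messages per partition fit in cache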

 It's hard to say much more without understanding what your workload is
 like, if you're consuming data other than what the Storm spout is
 consuming, the rate at which you're producing data, etc. However, my
 initial impression is that you may be trying to process too much data with
 too little memory and too little disk throughput.
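
 One quick way to check whether the brokers are disk-bound (a sketch,
 assuming the sysstat package is installed) is to watch disk utilization
 while the workload runs:

     # extended device stats every 2 seconds, in MB/s; sustained high
     # %util and large await values on the disks holding the Kafka logs
     # suggest the brokers are falling back to disk reads
     iostat -xm 2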

 If you want more details, I'd suggest reading this section of the docs,
 which further explains how a lot of this stuff works:
 http://kafka.apache.org/documentation.html#persistence

 -Ewen

 On Mon, Jul 27, 2015 at 11:19 PM, Nilesh Chhapru 
 nilesh.chha...@ugamsolutions.com wrote:

 Hi Ewen,

 I am using 3 brokers with 12 topics and roughly 120-125 partitions
 without any replication, and the message size is approximately 15 MB/message.

 The problem is that when the cache memory grows and reaches the maximum
 available, performance starts degrading. Also, I am using a Storm spout as
 the consumer, and it stops reading at times.

 When I run free -m on my broker node after half an hour to an hour, the
 memory footprint is as follows:
 1) Physical memory: 500-600 MB
 2) Cache memory: 6.5 GB
 3) Free memory: 50-60 MB

 Regards,
 Nilesh Chhapru.

 On Monday 27 July 2015 11:02 PM, Ewen Cheslack-Postava wrote:
 Having the OS cache the data in Kafka's log files is useful since it
 means
 that data doesn't need to be read back from disk when consumed. This is
 good for the latency and throughput of consumers. Usually this caching
 works out pretty well, keeping the latest data from your topics in cache
 and only pulling older data into memory if a consumer reads data from
 earlier in the log. In other words, by leveraging OS-level caching of
 files, Kafka gets an in-memory caching layer for free.

 Generally you shouldn't need to clear this data -- the OS should only be
 using memory that isn't being used anyway. Is there a particular problem
 you're encountering that clearing the cache would help with?
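
 For reference, that kind of cache clearing is usually done on Linux by
 writing to /proc/sys/vm/drop_caches; a cron-based sketch is below. But
 note that dropping the cache forces subsequent reads back to disk, so it
 typically hurts consumer performance rather than helping.

     # one-off, from a root shell: flush dirty pages, drop clean caches
     sync && echo 3 > /proc/sys/vm/drop_caches
     # hypothetical /etc/crontab entry (note the user field), hourly:
     # 0 * * * * root sync && echo 3 > /proc/sys/vm/drop_caches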

 -Ewen

 On Mon, Jul 27, 2015 at 2:33 AM, Nilesh Chhapru 
 nilesh.chha...@ugamsolutions.com wrote:

 Hi All,

 I am facing issues with the Kafka broker process taking a lot of cache
 memory. I just wanted to know whether the process really needs that much
 cache memory, or whether I can clear the OS-level cache with a cron job.

 Regards,
 Nilesh Chhapru.







Issues With Parallelism In Kafka Spout

2014-12-18 Thread Nilesh Chhapru
Hi All,

I have implemented a high-level Kafka consumer in Storm, but it looks like the 
expected parallelism isn't being achieved: I have 3 partitions and 2 tasks for 
the spout, yet only one of them is emitting data.
Please find attached a screen grab showing the number of spout tasks and the 
data being emitted by only one of them.

Please advise on how to achieve parallelism using the high-level Kafka spout.

Regards,
Nilesh Chhapru.





RE: Issues With Parallelism In Kafka Spout

2014-12-18 Thread Nilesh Chhapru
Hi All,

Please share some inputs, as this has been pending for a while and we need to 
meet our deadlines.

Regards,
Nilesh Chhapru.

From: Nilesh Chhapru [mailto:nilesh.chha...@ugamsolutions.com]
Sent: 18 December 2014 01:24 PM
To: u...@storm.apache.org; users@kafka.apache.org
Subject: Issues With Parallelism In Kafka Spout

Hi All,

I have implemented a high-level Kafka consumer in Storm, but it looks like the 
expected parallelism isn't being achieved: I have 3 partitions and 2 tasks for 
the spout, yet only one of them is emitting data.
Please find below the screen grab showing the number of spout tasks and the 
data being emitted by only one of them.

Please advise on how to achieve parallelism using the high-level Kafka spout.

[attachment: screen grab of spout task counts and emitted data]

Regards,
Nilesh Chhapru.




