Hi Tim, 

Not sure whether this will be of any use for improving overall cluster
performance, but I hope it gives you and others some ideas. 

https://media.amazonwebservices.com/AWS_Amazon_EMR_Best_Practices.pdf 

---
Regards,
Jonathan Aquilina
Founder Eagle Eye T

On 2015-02-22 07:57, Tim Chou wrote: 

> Hi Jonathan, 
> 
> Very useful information. I will take a look at Ganglia. 
> 
> However, I do not have administrative privileges on the cluster, so I don't 
> know whether I can install Ganglia there. 
> 
> Thank you for your information. 
> 
> Best, 
> Tim 
> 
> 2015-02-22 0:53 GMT-06:00 Jonathan Aquilina <[email protected]>:
> 
> Where I work, we run transient (temporary) clusters on Amazon EMR. When I was 
> reading up on how things work, the documentation suggested using Ganglia to 
> monitor memory usage, network usage, and so on. That way, depending on how 
> things are set up (for example, pulling data directly into the cluster from an 
> Amazon S3 bucket), you can check that the network link stays saturated and 
> data keeps flowing. 
> 
> In short, what I am suggesting is to take a look at Ganglia. 
> 
> ---
> Regards,
> Jonathan Aquilina
> Founder Eagle Eye T
> 
> On 2015-02-22 07:42, Fang Zhou wrote: 
> 
> Hi Jonathan, 
> 
> Thank you. 
> 
> The number of files impacts the memory usage of the NameNode. 
> 
> I just want to measure the NameNode's actual memory usage. 
> 
> The heap memory in use changes constantly, so I have no idea which value is 
> the right one. 
> 
> Thanks, 
> Tim 
> 
> On Feb 22, 2015, at 12:22 AM, Jonathan Aquilina <[email protected]> 
> wrote: 
> 
> I am rather new to Hadoop, but couldn't the difference potentially lie in how 
> the files are split in terms of size? 
> 
> ---
> Regards,
> Jonathan Aquilina
> Founder Eagle Eye T
> 
> On 2015-02-21 21:54, Fang Zhou wrote: 
> 
> Hi All,
> 
> I want to test the memory usage on Namenode and Datanode.
> 
> I tried using jmap, jstat, /proc/<pid>/stat, top, ps aux, and the Hadoop web 
> interface to check the memory.
> The values they report are all different. I also found that the memory usage 
> changes periodically.
> This is the first thing that confused me.
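Replying inline here: the disagreement between those tools is expected, because they measure different things. jmap and jstat report the *used* heap, which rises and falls with every GC cycle; top and ps report the process's resident size, which tracks roughly the *committed* heap plus non-heap overhead; the ceiling is the -Xmx *max*. A minimal Java sketch of the distinction (it just reads the current JVM's own MemoryMXBean, so it is illustrative rather than NameNode-specific):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapReport {
    public static void main(String[] args) {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = bean.getHeapMemoryUsage();
        // "used" is what jmap/jstat show and fluctuates with every GC cycle;
        // "committed" is memory actually reserved from the OS (closest to what
        // top/ps show for the heap's share); "max" is the -Xmx ceiling.
        System.out.printf("used=%d MB committed=%d MB max=%d MB%n",
                heap.getUsed() >> 20,
                heap.getCommitted() >> 20,
                heap.getMax() >> 20);
    }
}
```

Running the same program twice, or right after a GC, will print different "used" values, which is the periodic change you saw.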
> 
> I thought that the more files are stored, the more memory the NameNode and 
> DataNodes would use.
> I also thought the NameNode should use more memory than each individual 
> DataNode.
> However, some results show that my assumptions are wrong.
> For example, I tested the NameNode's memory usage with 6000 files and with 
> 1000 files.
> According to jmap, memory usage with 6000 files was lower than with 1000 files. 
> I also found that a DataNode's memory usage was larger than the NameNode's.
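On the 6000-vs-1000 result: a back-of-envelope estimate suggests why the difference may be invisible. A commonly cited rule of thumb (an approximation, not an official figure) is on the order of 150 bytes of NameNode heap per namespace object (file, directory, or block). At a few thousand files the metadata amounts to well under a few megabytes, which is easily swamped by GC fluctuation and the JVM's other allocations, so jmap comparisons at that scale are inconclusive:

```java
public class NameNodeEstimate {
    // Rough rule of thumb for HDFS: roughly 150 bytes of NameNode heap per
    // namespace object. This constant is an approximation for illustration.
    static final long BYTES_PER_OBJECT = 150;

    // One namespace object per file (the inode) plus one per block.
    static long estimateBytes(long files, long blocksPerFile) {
        long objects = files + files * blocksPerFile;
        return objects * BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        System.out.println(estimateBytes(1000, 1)); // ~0.3 MB of metadata
        System.out.println(estimateBytes(6000, 1)); // ~1.8 MB of metadata
    }
}
```

Both figures are tiny next to a typical multi-hundred-MB heap, while a DataNode also holds block data buffers and its own JVM overhead, which would explain it appearing larger than the NameNode at this scale.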
> 
> I really don't know how to measure the memory usage of the NameNode and 
> DataNodes.
> 
> Can anyone give me some advice?
> 
> Thanks,
> Tim
 
