Thank you for sharing. Much appreciated.
Tim

> On Feb 22, 2015, at 1:23 AM, Jonathan Aquilina <[email protected]> wrote:
>
> Hi Tim,
>
> Not sure if this might be of any use in terms of improving overall cluster
> performance for you, but I hope it might give you and others some ideas.
>
> https://media.amazonwebservices.com/AWS_Amazon_EMR_Best_Practices.pdf
>
> ---
> Regards,
> Jonathan Aquilina
> Founder Eagle Eye T
>
> On 2015-02-22 07:57, Tim Chou wrote:
>
>> Hi Jonathan,
>>
>> Very useful information. I will look at Ganglia.
>>
>> However, I do not have administrative privileges on the cluster, so I
>> don't know whether I can install Ganglia there.
>>
>> Thank you for the information.
>>
>> Best,
>> Tim
>>
>> 2015-02-22 0:53 GMT-06:00 Jonathan Aquilina <[email protected]>:
>> Where I work we run transient (temporary) clusters on Amazon EMR. When I
>> was reading up on how things work, the documentation suggested using
>> Ganglia to monitor memory usage, network usage, and so on. That way,
>> depending on how things are set up (for example, pulling data into the
>> cluster directly from an Amazon S3 bucket), you can check that the network
>> link stays saturated to ensure a constant flow of data.
>>
>> What I am suggesting is to take a look at Ganglia.
>>
>> ---
>> Regards,
>> Jonathan Aquilina
>> Founder Eagle Eye T
>>
>> On 2015-02-22 07:42, Fang Zhou wrote:
>>
>> Hi Jonathan,
>>
>> Thank you.
>>
>> The number of files affects the memory usage of the Namenode.
>>
>> I just want to measure the Namenode's actual memory usage.
>>
>> The heap memory in use changes constantly, so I have no idea which value
>> is the right one.
>>
>> Thanks,
>> Tim
>>
>> On Feb 22, 2015, at 12:22 AM, Jonathan Aquilina <[email protected]>
>> wrote:
>>
>> I am rather new to Hadoop, but wouldn't the difference potentially be in
>> how the files are split in terms of size?
>>
>> ---
>> Regards,
>> Jonathan Aquilina
>> Founder Eagle Eye T
>>
>> On 2015-02-21 21:54, Fang Zhou wrote:
>>
>> Hi All,
>>
>> I want to measure the memory usage of the Namenode and the Datanodes.
>>
>> I have tried jmap, jstat, /proc/<pid>/stat, top, ps aux, and the Hadoop
>> web interface to check the memory. The values I get from them differ, and
>> the memory usage also changes periodically. This is the first thing that
>> confused me.
>>
>> I assumed that the more files stored in HDFS, the more memory the Namenode
>> and Datanodes would use, and that the Namenode should use more memory than
>> each individual Datanode. However, some of my results contradict these
>> assumptions. For example, when I measured the Namenode's memory usage with
>> 6000 files and with 1000 files, jmap reported less memory for the
>> 6000-file case than for the 1000-file case. I also found that a Datanode
>> was using more memory than the Namenode.
>>
>> I really don't know how to measure the memory usage of the Namenode and
>> Datanodes.
>>
>> Can anyone give me some advice?
>>
>> Thanks,
>> Tim
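A note on the fluctuating numbers in the thread above: a JVM's used heap grows between garbage collections and drops after each GC, so a single sample from jmap or jstat is not very meaningful, and comparing one sample against another (e.g. 6000 files vs. 1000 files) can easily go the "wrong" way. Running `jmap -histo:live <pid>` forces a full GC before reporting, which makes samples more comparable; alternatively, sample `jstat -gc <pid>` repeatedly and sum the "used" columns. Below is a minimal sketch of that summing step. It assumes a JDK 8-style `jstat -gc` column layout (older JDKs use slightly different columns, e.g. PC/PU instead of the Metaspace columns), and the sample line is made up for illustration, not taken from a real cluster:

```python
# Sketch: compute the total used heap (KB) from one data line of
# `jstat -gc <pid>` output, assuming the JDK 8 column layout:
#   S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT
# Used heap = survivor used (S0U + S1U) + eden used (EU) + old gen used (OU).
# (Metaspace, MU, lives outside the Java heap and is not counted here.)

HEADER = ("S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU "
          "YGC YGCT FGC FGCT GCT").split()

def used_heap_kb(jstat_line: str) -> float:
    """Sum the 'used' generation columns of a single jstat -gc data line."""
    values = dict(zip(HEADER, (float(v) for v in jstat_line.split())))
    return values["S0U"] + values["S1U"] + values["EU"] + values["OU"]

if __name__ == "__main__":
    # Hypothetical sample line (all sizes in KB), for illustration only:
    sample = ("1024.0 1024.0 0.0 512.0 8192.0 4096.0 65536.0 30000.0 "
              "21504.0 20000.0 2560.0 2300.0 15 0.120 2 0.340 0.460")
    print(used_heap_kb(sample))  # 0 + 512 + 4096 + 30000 = 34608.0 KB
```

Averaging this value over many samples (or sampling right after a forced full GC) gives a far more stable number than a single reading. If you cannot install anything on the cluster, the Hadoop daemons' built-in JMX servlet (`http://<namenode>:50070/jmx?qry=java.lang:type=Memory` on that era's default port) exposes the same heap used/committed/max figures over HTTP without any extra privileges.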
