Hi Tim,
Not sure whether this will help you improve overall cluster performance, but I hope it gives you and others some ideas:
https://media.amazonwebservices.com/AWS_Amazon_EMR_Best_Practices.pdf

---
Regards,
Jonathan Aquilina
Founder Eagle Eye T

On 2015-02-22 07:57, Tim Chou wrote:
> Hi Jonathan,
>
> Very useful information. I will look at Ganglia.
>
> However, I do not have administrative privileges on the cluster, so I don't
> know whether I can install Ganglia on it.
>
> Thank you for the information.
>
> Best,
> Tim
>
> 2015-02-22 0:53 GMT-06:00 Jonathan Aquilina:
>
> Where I work, we run transient (temporary) clusters on Amazon EMR. When I was
> reading up on how things work, the recommendation for monitoring was to use
> Ganglia to track memory usage, network usage, etc. That way, depending on how
> things are set up (for example, pulling data directly from an Amazon S3 bucket
> into the cluster), the network link will always be saturated to ensure a
> constant flow of data.
>
> What I am suggesting is to look at Ganglia.
>
> ---
> Regards,
> Jonathan Aquilina
> Founder Eagle Eye T
>
> On 2015-02-22 07:42, Fang Zhou wrote:
>
> Hi Jonathan,
>
> Thank you.
>
> The number of files affects the memory usage of the Namenode.
>
> I just want to get the real memory usage of the Namenode.
>
> The heap memory in use changes all the time, so I have no idea which value
> is the right one.
>
> Thanks,
> Tim
>
> On Feb 22, 2015, at 12:22 AM, Jonathan Aquilina wrote:
>
> I am rather new to Hadoop, but wouldn't the difference potentially be in how
> the files are split in terms of size?
>
> ---
> Regards,
> Jonathan Aquilina
> Founder Eagle Eye T
>
> On 2015-02-21 21:54, Fang Zhou wrote:
>
> Hi All,
>
> I want to test the memory usage on the Namenode and Datanode.
>
> I tried using jmap, jstat, /proc/<pid>/stat, top, ps aux, and the Hadoop web
> interface to check the memory. The values I get from them are different. I
> also found that the memory usage changes periodically. This was the first
> thing that confused me.
>
> I thought that the more files stored in the Namenode, the more memory the
> Namenode and Datanode would use. I also thought the memory used by the
> Namenode should be larger than the memory used by each Datanode. However,
> some results show that my ideas are wrong.
> For example, I tested the memory usage of the Namenode with 6000 files and
> with 1000 files. According to jmap, the memory used with 6000 files was less
> than with 1000 files.
> I also found that the memory usage of the Datanode is larger than that of
> the Namenode.
>
> I really don't know how to measure the memory usage of the Namenode and
> Datanode.
>
> Can anyone give me some advice?
>
> Thanks,
> Tim
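On the 6000 vs 1000 files result: a rough back-of-the-envelope sketch, assuming the commonly quoted rule of thumb that each file, directory and block object costs roughly 150 bytes of NameNode heap, and assuming the Hadoop 2.x default block size (dfs.blocksize) of 128 MB. The constants and function names below are illustrative only, not part of any Hadoop API.

```python
from typing import Iterable

# Assumption: a commonly quoted rule of thumb, not an exact figure. Each file,
# directory and block object kept in NameNode memory costs roughly 150 bytes;
# the real cost varies with the Hadoop version and path-name lengths.
BYTES_PER_OBJECT = 150

# Assumption: Hadoop 2.x default dfs.blocksize (128 MB).
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024


def blocks_for_file(file_size: int, block_size: int = DEFAULT_BLOCK_SIZE) -> int:
    """Number of HDFS blocks needed for a file (an empty file needs none)."""
    return -(-file_size // block_size)  # ceiling division on integers


def estimate_namespace_bytes(
    file_sizes: Iterable[int],
    directories: int = 0,
    block_size: int = DEFAULT_BLOCK_SIZE,
) -> int:
    """Rough NameNode heap taken by namespace objects (files + dirs + blocks)."""
    files = 0
    blocks = 0
    for size in file_sizes:
        files += 1
        blocks += blocks_for_file(size, block_size)
    return (files + directories + blocks) * BYTES_PER_OBJECT


# Hypothetical workload: 1000 vs 6000 files of 1 KB each.
for count in (1000, 6000):
    print(count, "files ->", estimate_namespace_bytes([1024] * count), "bytes")
```

With 1 KB files this gives about 300 KB for 1000 files and about 1.8 MB for 6000 files. A difference of roughly 1.5 MB can easily be hidden by the amount of not-yet-collected garbage on the heap at the moment jmap runs, which would be consistent with the surprising jmap reading. It also bears on Jonathan's point about splitting: the same data stored as many small files creates far more namespace objects than a few large ones (one 1 GB file is 1 file + 8 blocks, while 1024 files of 1 MB are 2048 objects).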
