Hi,
Depending on the version you are using, there are several ways to monitor jobs.
You can use Hue (originally a Cloudera project), which includes a job browser, 
but you could also use the YARN ResourceManager web UI to follow jobs.
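
If you would rather script this than watch the web UI, the ResourceManager 
also exposes the same information over its REST API. Below is a minimal 
sketch, not a tested tool: it assumes the ResourceManager web port is the 
default 8088, and the host name "master" and the helper names are made up 
for illustration.

```python
import json
from urllib.request import urlopen


def apps_url(rm_host, port=8088, states=("RUNNING",)):
    """Build the ResourceManager REST URL listing applications in the given states."""
    return "http://{}:{}/ws/v1/cluster/apps?states={}".format(
        rm_host, port, ",".join(states)
    )


def summarize_apps(payload):
    """Reduce the REST response to (id, name, state, progress) tuples.

    The ResourceManager returns {"apps": null} when nothing matches,
    so both levels are guarded.
    """
    apps = (payload.get("apps") or {}).get("app") or []
    return [
        (a["id"], a["name"], a["state"], a.get("progress", 0.0))
        for a in apps
    ]


def fetch_running_apps(rm_host):
    """Query a live ResourceManager (needs network access to the cluster)."""
    with urlopen(apps_url(rm_host)) as resp:
        return summarize_apps(json.load(resp))


if __name__ == "__main__":
    # "master" is a hypothetical host name; replace it with your ResourceManager.
    for app_id, name, state, progress in fetch_running_apps("master"):
        print(app_id, name, state, "{:.0f}%".format(progress))
```

Running it in a loop while a MapReduce job is active should show the progress 
value climbing until the application leaves the RUNNING state.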

Monitoring of nodes can be done through Apache Ambari 
(https://ambari.apache.org/) or Cloudera Manager (only available for Cloudera 
distributions).

As far as I know the replication process for HDFS cannot be changed to favour 
particular nodes.
An even distribution is needed in order to have an evenly spread load.
If a block replica gets corrupted this will be visible in the logs, but the 
NameNode will correct the problem automatically by scheduling a new copy of 
the block from a healthy replica.
The default replication factor is 3 (the dfs.replication setting), but you can 
change this if you want data to be spread across more nodes.
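
To change or check the replication of files that already exist, the standard 
"hdfs dfs -setrep" and "hdfs fsck" commands can be used. Here is a small 
sketch that only wraps those two commands; it assumes the hdfs client is on 
the PATH, and the path /data/input and the function names are hypothetical.

```python
import subprocess


def setrep_command(path, factor, wait=True):
    """Build 'hdfs dfs -setrep [-w] <factor> <path>' as an argument list."""
    if factor < 1:
        raise ValueError("replication factor must be at least 1")
    cmd = ["hdfs", "dfs", "-setrep"]
    if wait:
        # -w makes the command block until the target replication is reached
        cmd.append("-w")
    cmd += [str(factor), path]
    return cmd


def fsck_command(path):
    """Build an fsck call that reports files, blocks and replica locations."""
    return ["hdfs", "fsck", path, "-files", "-blocks", "-locations"]


def run(cmd):
    """Run a command on a machine with the hdfs client installed; return stdout."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # /data/input is a made-up example path.
    print(run(setrep_command("/data/input", 2)))
    print(run(fsck_command("/data/input")))
```

The fsck report lists, per block, which DataNodes hold a replica, so you can 
confirm the file really is stored on the number of nodes you expect.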

Hope this answers some questions.

With Regards,
Yves
From: [email protected]
To: [email protected]
Subject: Monitoring dashboard for Hadoop?
Date: Wed, 3 Jun 2015 17:25:43 -0400

Hello, I’m new to Hadoop and successfully built a fully distributed cluster of 
3 nodes (1 master, 2 slaves) as a proof of concept. I have some questions 
below.

Is there a dashboard to monitor the progress of a mapreduce computation?
1. I’m looking to ensure the computation gets allocated and uses the 
   correct number of computation nodes
2. Monitor computation on the nodes (up/down/in-progress/completed)
3. If possible direct computation to specific group of nodes (depending on 
   the computation priority).

Similarly for HDFS
1. Ensure data file gets replicated to the correct number of nodes
2. If possible prioritize data replication (i.e. replicate data files that 
   are accessed frequently to nodes that have better hardware, so some sort 
   of load balancing distribution)

Many Thanks,
Caesar.
