OK, here is how I solved the namespace errors on the datanodes, as
well as the startup and communication problems between the
datanodes/tasktrackers and the namenode/jobtracker.
As you can read on several sites, there are two strategies for fixing
datanode namespaces. Since I prefer deleting old data, because it seems
more reliable to me, I wrote this script, which can be called at any
time to fix the namespaces in an arbitrarily complex environment:
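############ SCRIPT OVER HERE ##########
#!/bin/sh
# Stop all Hadoop daemons before touching the HDFS directories.
~/hadoop-1.0.2/bin/stop-all.sh
rm -f curclean.sh
sleep 3
# Generate a helper script with one ssh call per host listed in conf/slaves.
echo "#!/bin/sh" > curclean.sh
while read line
do
echo "ssh '$line' 'rm -rf /home/work/bmacek/hadoop/hdfs/slave'" >> curclean.sh
done < "/home/fb16/bmacek/hadoop-1.0.2/conf/slaves"
# Run the helper through sh so it does not need the executable bit.
sh ./curclean.sh
sleep 3
# Reformat HDFS on the namenode host (conf/namenode is expected to contain its hostname).
ssh "$(cat ~/hadoop-1.0.2/conf/namenode)" "~/hadoop-1.0.2/bin/hadoop namenode -format"
#####################################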
!!! WARNING: ADAPT THE PATHS TO YOUR SETUP. THE SCRIPT DELETES ALL DATA STORED IN HDFS !!!
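For comparison, the other strategy usually described keeps the datanode data and instead makes the namespaceID in each datanode's VERSION file match the one of the namenode. Below is a minimal sketch of that idea, not the method used above. The function names are made up for illustration, and the location of the VERSION files (normally <dfs.name.dir>/current/VERSION on the namenode and <dfs.data.dir>/current/VERSION on each datanode) is an assumption you have to check against your own configuration.

```sh
#!/bin/sh
# Hypothetical helpers, not part of Hadoop.

# Print the namespaceID stored in a VERSION file.
get_namespace_id() {
    sed -n 's/^namespaceID=//p' "$1"
}

# Set the namespaceID in a VERSION file to the given value.
# A backup copy is kept next to it as <file>.bak.
set_namespace_id() {
    version_file=$1
    new_id=$2
    cp "$version_file" "$version_file.bak" || return 1
    # Rewrite only the namespaceID line, leave all other lines untouched.
    sed "s/^namespaceID=.*/namespaceID=$new_id/" "$version_file.bak" > "$version_file"
}
```

With the daemons stopped, you would read the ID once from the namenode's VERSION file with get_namespace_id and then run set_namespace_id on every datanode (for example via ssh) before starting the cluster again.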
The other two problems (startup and communication) can be avoided by
setting the following properties in mapred-site.xml:
############## FIX PORT PROBLEMS FOR SLAVES #############
<property>
<name>mapred.task.tracker.http.address</name>
<value>0.0.0.0:0</value>
</property>
<property>
<name>dfs.datanode.port</name>
<value>0</value>
</property>
For people who are working with huge amounts of data, I strongly recommend also setting:
<property>
<name>mapred.task.timeout</name>
<value>0</value>
</property>
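A value of 0 disables the task timeout. Otherwise a task that does not
report progress within the timeout (600000 ms, i.e. 10 minutes, by
default) is killed, and your job might fail for a reason that should
not influence the job execution.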
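If you would rather not disable the timeout for the whole cluster, the same property can probably be set for a single job on the command line. This is only a sketch: it assumes the job's main class parses the generic Hadoop options (for example by running through ToolRunner), and the helper name, jar name and class name are made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: build a -D option for mapred.task.timeout
# from a value given in minutes (0 disables the timeout).
task_timeout_opt() {
    echo "-D mapred.task.timeout=$(( $1 * 60 * 1000 ))"
}

# Example call (my-job.jar and MyJob are placeholders); the generic
# option has to come before the job's own arguments:
# hadoop jar my-job.jar MyJob $(task_timeout_opt 60) input output
```

The unquoted $(task_timeout_opt 60) is intentional, so that the shell splits the output into the two words -D and mapred.task.timeout=3600000.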
So much from me ... for now. ;)
Best regards and thanks for having a look into my problems here and there.
Björn