OK, here is what I did to solve the namespace errors on the datanodes, the startup problems, and the communication problem between the datanodes/tasktrackers and the namenode/jobtracker:

As you can read on several sites, there are two strategies for fixing datanode namespace errors. I prefer deleting the old data because it seems more reliable to me, so I wrote this script, which can be called at any time to fix the namespaces in an arbitrarily complex environment:

############ SCRIPT OVER HERE##########
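#!/bin/sh
# Stop every Hadoop daemon before touching any HDFS directories.
~/hadoop-1.0.2/bin/stop-all.sh

# Throw away the cleanup script from the previous run, if there is one.
CLEAN=/home/fb16/bmacek/curclean.sh
rm -f "$CLEAN"

sleep 3

# Generate one "ssh <slave> rm -rf <datadir>" line per host in conf/slaves.
echo '#!/bin/sh' > "$CLEAN"
while read -r line
do
    echo "ssh '$line' 'rm -rf /home/work/bmacek/hadoop/hdfs/slave'" >> "$CLEAN"
done < "/home/fb16/bmacek/hadoop-1.0.2/conf/slaves"

# Run the generated script; this deletes the datanode storage on every slave.
sh "$CLEAN"

sleep 3

# Reformat the namenode so that it gets a fresh namespace ID.
# conf/namenode is expected to contain the namenode's hostname.
ssh "$(cat ~/hadoop-1.0.2/conf/namenode)" "~/hadoop-1.0.2/bin/hadoop namenode -format"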

#####################################

!!! WARNING: ADAPT THE PATHS TO YOUR SETUP. THE SCRIPT DELETES ALL DATA STORED IN HDFS !!!
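
Since the script runs rm -rf on every slave, it can help to look at the generated commands before executing them. The following is only a sketch of that idea: gen_clean_cmds and its two arguments are names I made up for illustration, and skipping blank lines and lines starting with # is an assumption about how your slaves file is written.

```shell
#!/bin/sh
# Print (do not run) one cleanup command per host listed in a slaves file.
# gen_clean_cmds is a hypothetical helper, not part of Hadoop.
#   $1 = path to the slaves file, $2 = datanode directory to remove
gen_clean_cmds() {
    slaves_file=$1
    data_dir=$2
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r host || [ -n "$host" ]; do
        # Skip blank lines and comment lines.
        case $host in
            ''|'#'*) continue ;;
        esac
        printf "ssh '%s' 'rm -rf %s'\n" "$host" "$data_dir"
    done < "$slaves_file"
}
```

Calling gen_clean_cmds with the slaves file and the datanode directory prints the ssh commands to the terminal; once they look right, the same output can be redirected into a file and run with sh.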



The other two problems (startup and communication) can be avoided by setting the following properties in mapred-site.xml:

############## FIX PORT PROBLEMS FOR SLAVES #############
    <property>
        <name>mapred.task.tracker.http.address</name>
        <value>0.0.0.0:0</value>
    </property>
    <property>
        <name>dfs.datanode.port</name>
        <value>0</value>
    </property>



For people who are working with huge data sets, I strongly recommend setting:
    <property>
        <name>mapred.task.timeout</name>
        <value>0</value>
    </property>
A value of 0 disables the task timeout. Otherwise your job might fail for reasons that you do not want to affect job execution, for example a task that runs for a long time without reporting progress.
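
To check that a property really ended up in a config file with the value you intended, a small helper like the one below could be used. It is only a sketch under stated assumptions: get_prop is a made-up name, and it assumes that each <name> and each <value> element sits on a single line, as in the snippets in this mail. It is not a real XML parser.

```shell
#!/bin/sh
# Print the <value> of a named property from a Hadoop-style *-site.xml file.
# get_prop is a hypothetical helper, not part of Hadoop.
#   $1 = path to the XML file, $2 = property name to look up
get_prop() {
    awk -v want="$2" '
        /<name>/ {
            # Remember the most recent property name seen.
            name = $0
            sub(/.*<name>/, "", name)
            sub(/<\/name>.*/, "", name)
        }
        /<value>/ && name == want {
            # Print the value that follows the wanted name, then stop.
            val = $0
            sub(/.*<value>/, "", val)
            sub(/<\/value>.*/, "", val)
            print val
            exit
        }
    ' "$1"
}
```

For example, get_prop with the path to mapred-site.xml and mapred.task.timeout as arguments should print 0 once the property above is in place, and print nothing if the property is missing.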


So much from me ... for now. ;)


Best regards and thanks for having a look into my problems here and there.
Björn
