(Moving to user list, hdfs-dev bcc'd.)
Hi Prithvi,
From a quick scan, it looks to me like one of your commands ends up passing
"input_path" as a literal string instead of expanding the input_path
variable. I've pasted the command below; notice that one of the -file
options uses "input_path" instead of "$input_path". The error message
points the same way: streaming resolved the literal name against your
working directory and looked for /home/hduser/Downloads/mgmf/trunk/input_path.
Is that the problem?
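A quick way to see the difference at a bash prompt (echo just stands in for
the streaming jar here, to show what the shell actually passes along):

    input_path=/data/dblp_author_conf_adj.txt
    echo -file input_path     # prints: -file input_path   (the literal name)
    echo -file $input_path    # prints: -file /data/dblp_author_conf_adj.txt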
Hope this helps,
--Chris
$hadoop_bin --config $hadoop_config jar $hadoop_streaming \
    -D mapred.task.timeout=0 \
    -D mapred.job.name="BC_N$((num_of_node))_M$((num_of_mapper))" \
    -D mapred.reduce.tasks=$num_of_reducer \
    -input input_BC_N$((num_of_node))_M$((num_of_mapper)) \
    -output $output_path \
    -file brandes_mapper \
    -file src/mslab/BC_reducer.py \
    -file src/mslab/MapReduceUtil.py \
    -file input_path \
    -mapper "./brandes_mapper $input_path $num_of_node" \
    -reducer "./BC_reducer.py"
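If so, the fix is to let the shell expand the variable:

    -file $input_path

One thing to keep in mind: -file ships a file from the local filesystem
along with the job, so $input_path has to name a readable local file
(/data/dblp_author_conf_adj.txt in your script); the copy you put in HDFS
is not what -file looks at, which is consistent with the "does not exist,
or is not readable" check failing.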
On Mon, Apr 22, 2013 at 10:11 AM, prithvi dammalapati <
[email protected]> wrote:
> I have the following Hadoop code to find the betweenness centrality of a
> graph:
>
> java_home=/usr/lib/jvm/java-1.7.0-openjdk-amd64
> hadoop_home=/usr/local/hadoop/hadoop-1.0.4
> hadoop_lib=$hadoop_home/hadoop-core-1.0.4.jar
> hadoop_bin=$hadoop_home/bin/hadoop
> hadoop_config=$hadoop_home/conf
>
> hadoop_streaming=$hadoop_home/contrib/streaming/hadoop-streaming-1.0.4.jar
> #task specific parameters
> source_code=BetweennessCentrality.java
> jar_file=BetweennessCentrality.jar
> main_class=mslab.BetweennessCentrality
> num_of_node=38012
> num_of_mapper=100
> num_of_reducer=8
> input_path=/data/dblp_author_conf_adj.txt
> output_path=dblp_bc_N$(($num_of_node))_M$((num_of_mapper))
> rm build -rf
> mkdir build
> $java_home/bin/javac -d build -classpath .:$hadoop_lib \
>     src/mslab/$source_code
> rm $jar_file -f
> $java_home/bin/jar -cf $jar_file -C build/ .
> $hadoop_bin --config $hadoop_config fs -rmr $output_path
> $hadoop_bin --config $hadoop_config jar $jar_file $main_class \
>     $num_of_node $num_of_mapper
>
> rm brandes_mapper
>
> g++ src/mslab/mapred_brandes.cpp -O3 -o brandes_mapper
> $hadoop_bin --config $hadoop_config jar $hadoop_streaming \
>     -D mapred.task.timeout=0 \
>     -D mapred.job.name="BC_N$((num_of_node))_M$((num_of_mapper))" \
>     -D mapred.reduce.tasks=$num_of_reducer \
>     -input input_BC_N$((num_of_node))_M$((num_of_mapper)) \
>     -output $output_path \
>     -file brandes_mapper \
>     -file src/mslab/BC_reducer.py \
>     -file src/mslab/MapReduceUtil.py \
>     -file input_path \
>     -mapper "./brandes_mapper $input_path $num_of_node" \
>     -reducer "./BC_reducer.py"
>
> When I run this code in a shell script, I get the following error:
>
> Warning: $HADOOP_HOME is deprecated.
> File: /home/hduser/Downloads/mgmf/trunk/input_path does not exist, or
> is not readable.
> Streaming Command Failed!
>
> but the file exists at the specified path:
>
> /Downloads/mgmf/trunk/data$ ls
> dblp_author_conf_adj.txt
>
> I have also added the input file into HDFS using
>
> /usr/local/hadoop$ bin/hadoop dfs -copyFromLocal /source /destination
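>
> To confirm the copy landed, a listing like this can be used:
>
> /usr/local/hadoop$ bin/hadoop dfs -ls /destination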
>
> Can someone help me solve this problem?
>
>
> Any help is appreciated,
> Thanks
> Prithvi
>