It's possible to do the whole thing in one round of map/reduce.
The only requirement is to be able to differentiate between the 2
different types of input files, possibly using different file name
extensions.
One of my coworkers wrote a smart InputFormat class that creates a
different
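Here is a minimal sketch of a simpler route, checking the input file's
name inside the mapper instead of in a custom InputFormat; the extension
and the tag below are placeholders:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Tags every record with the kind of file it came from, based on the
// file name extension, so a single job can process both inputs.
public class TwoInputMapper extends Mapper<LongWritable, Text, Text, Text> {
  private final Text tag = new Text();

  @Override
  protected void setup(Context context) {
    // Each map task reads one split, so the file name is fixed per task.
    FileSplit split = (FileSplit) context.getInputSplit();
    String fileName = split.getPath().getName();
    tag.set(fileName.endsWith(".typeA") ? "A" : "B");
  }

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    // How the key and value are built is job-specific; here the record
    // is simply emitted under its source tag.
    context.write(tag, line);
  }
}

A custom InputFormat that hands back a different RecordReader based on
the same file-name check works the same way, just one level lower in
the stack.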
1. You can if you copy the cfg file into HDFS. Otherwise, it's local to
one node and can't be accessed by map/reduce jobs running on other
nodes (see the sketch after this list).
2. You can write your own RecordReader/InputFormat classes and handle
input files in whatever format you need.
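As mentioned in point 1, here is a rough sketch of reading a small cfg
file straight from HDFS in the mapper's setup(); the path and the
key=value layout are assumptions:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Reads a small key=value side file out of HDFS in setup(), so every
// map task on every node sees the same copy.
public class CfgAwareMapper extends Mapper<LongWritable, Text, Text, Text> {
  private final Map<String, String> cfg = new HashMap<String, String>();

  @Override
  protected void setup(Context context) throws IOException {
    Path cfgPath = new Path("/shared/job.cfg");   // hypothetical HDFS path
    FileSystem fs = FileSystem.get(context.getConfiguration());
    BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(cfgPath)));
    try {
      String line;
      while ((line = in.readLine()) != null) {
        String[] kv = line.split("=", 2);
        if (kv.length == 2) {
          cfg.put(kv[0].trim(), kv[1].trim());
        }
      }
    } finally {
      in.close();
    }
  }

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    // Placeholder use of the shared settings; real use is job-specific.
    String prefix = cfg.containsKey("prefix") ? cfg.get("prefix") : "";
    context.write(new Text(prefix), line);
  }
}

DistributedCache is the more usual way to ship side files to every
node, but opening the file from HDFS like this works on any node too.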
Nathan
-Original Message-
Hi,
We're having problems trying to deal with namenode failover by
following the wiki:
http://wiki.apache.org/hadoop/NameNodeFailover
If we point dfs.name.dir to 2 local directories, it works fine.
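(For reference, the two-directory setup is just a comma-separated list
for dfs.name.dir in hdfs-site.xml; the paths below are placeholders.)

<property>
  <name>dfs.name.dir</name>
  <value>/data/dfs/name,/mnt/nfs/dfs/name</value>
</property>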
But, if one of the directories is NFS mounted, we're having these problems:
1)
Right, you can't add that line globally; it would affect all processes.
What you can do is modify HADOOP_HOME/bin/hadoop and give each process
a different port number.
For example, for tasktracker, assign port 12345:
...
elif [ $COMMAND = tasktracker ] ; then
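  # Rough sketch of how the rest of this branch might look, assuming the
  # line being added is a JVM option that binds a port (for example the
  # JMX remote port); exact variable names vary between Hadoop versions.
  CLASS=org.apache.hadoop.mapred.TaskTracker
  HADOOP_OPTS="$HADOOP_OPTS $HADOOP_TASKTRACKER_OPTS -Dcom.sun.management.jmxremote.port=12345"
  # Give each of the other daemons (namenode, datanode, jobtracker, ...)
  # a different port in its own branch, so no two processes on the same
  # machine try to bind the same port.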
It depends on the uniqueness of your input data and maybe on how you
implemented concatenateValues.
You're collecting twice for each line, once on the subject and once on
the object, and then concatenating the original line twice again.
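To make the "collecting twice" part concrete, here is a sketch of that
map side, assuming tab-separated lines with the subject in the first
field and the object in the last (the field layout is a guess):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Each input line is emitted twice, once keyed by its subject and once
// by its object, so the reducer sees the full line under both keys.
public class SubjectObjectMapper extends Mapper<LongWritable, Text, Text, Text> {
  private final Text outKey = new Text();

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    String[] fields = line.toString().split("\t");
    if (fields.length < 3) {
      return;                                 // skip malformed lines
    }
    outKey.set(fields[0]);                    // collect under the subject
    context.write(outKey, line);
    outKey.set(fields[fields.length - 1]);    // collect under the object
    context.write(outKey, line);
  }
}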
If you have many rows with the same subjects and objects, you'll end up