Hello folks -
I tried running the ItemSimilarityJob on our Hadoop cluster today,
but I can't seem to get past this error:
11/02/24 09:18:17 INFO mapred.JobClient: Task Id : attempt_201102231433_0008_m_000070_0, Status : FAILED
java.io.FileNotFoundException: File does not exist: /user/mruno/temp
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1586)
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1577)
	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:428)
	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:187)
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:456)
	at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:67)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:450)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
	at org.apache.hadoop.mapred.Child.main(Child.java:234)
I ran the job with this command:
hadoop jar mahout-core-0.5-SNAPSHOT-jar-with-dependencies.jar \
    org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob \
    -Dmapred.input.dir=/user/mruno -Dmapred.output.dir=/user/mruno \
    --similarityClassname SIMILARITY_LOGLIKELIHOOD \
    --tempDir /user/mruno/temp
The first job runs fine, but the jobs it spawns after that always fail:
ItemSimilarityJob-ItemIDIndexMapper-ItemIDIndexReducer (runs fine)
ItemSimilarityJob-CountUsersMapper-CountUsersReducer (fails with above error)
ItemSimilarityJob-ToItemPrefsMapper-ToUserVectorReducer (fails with above error)
If I look in HDFS, I have the following directory structure:
/user/mruno/input-data-file.csv
/user/mruno/temp/countUsers/...
/user/mruno/temp/itemIDIndex/...
/user/mruno/temp/userVectors/...
...so the path I gave for --tempDir obviously exists and is writable.
After all, the job created everything listed above (everything except
the input file) without any trouble.
Does anyone have an idea what's going on here? I'm sort of lost as to
where to start; the exception isn't all that helpful.
If I look at the job's XML file, I see that mapred.output.dir is set
to /user/mruno/temp/userVectors, which does exist.
I'd appreciate any ideas, and I apologize if this would be better
asked on the Hadoop mailing list, but I thought I'd try here first
since it's specific to the ItemSimilarityJob.