That may have been it.. I'm not sure though. This command seems to work:

hadoop jar mahout-core-0.5-SNAPSHOT-jar-with-dependencies.jar \
  org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob \
  --similarityClassname SIMILARITY_LOGLIKELIHOOD \
  --tempDir /user/mruno/temp \
  -o /user/mruno/output \
  -i /user/mruno/input

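For anyone scripting this, here is a minimal sketch of a wrapper that prints the working command instead of running it. The function name item_similarity_cmd and both sanity checks are my own assumptions, not part of Mahout. It rejects identical input and output directories (the thing Sebastian asked about) and a temp directory nested under the input directory, which is only my guess at why the failed run tried to read /user/mruno/temp as input.

```shell
#!/bin/sh
# Hypothetical helper, not part of Mahout: prints the ItemSimilarityJob
# command line (dry run) after two sanity checks on the HDFS paths.
# Only flags that appear in this thread are used.

JAR=mahout-core-0.5-SNAPSHOT-jar-with-dependencies.jar
JOB_CLASS=org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob

item_similarity_cmd() {
    input_dir=$1
    output_dir=$2
    temp_dir=$3

    # Check 1: input and output must be different directories.
    if [ "$input_dir" = "$output_dir" ]; then
        echo "input and output directories are the same: $input_dir" >&2
        return 1
    fi

    # Check 2 (assumption): keep the temp directory out of the input
    # directory so it cannot be picked up as an input path.
    case "$temp_dir" in
        "$input_dir"/*)
            echo "temp directory $temp_dir is inside input directory $input_dir" >&2
            return 1
            ;;
    esac

    # Print the command rather than executing it.
    printf 'hadoop jar %s %s --similarityClassname %s --tempDir %s -o %s -i %s\n' \
        "$JAR" "$JOB_CLASS" SIMILARITY_LOGLIKELIHOOD \
        "$temp_dir" "$output_dir" "$input_dir"
}
```

Calling item_similarity_cmd /user/mruno/input /user/mruno/output /user/mruno/temp prints the command above on one line. The layout from my first attempt (input and output both /user/mruno) is refused by the first check.
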
I hope this helps anyone who's trying to run this stuff..

--Matthew

On Thu, Feb 24, 2011 at 11:46 AM, Sebastian Schelter <[email protected]> wrote:
> Hi Matthew,
>
> I can't really see what's wrong, only thing that makes me wonder is
> that your input and output dir are the same, you sure that's right?
>
> --sebastian
>
> 2011/2/24 Matthew Runo <[email protected]>:
>> Hello folks -
>>
>> I made an attempt at running the ItemSimilarityJob on our hadoop
>> cluster today, but I can't seem to get past this error:
>>
>> 11/02/24 09:18:17 INFO mapred.JobClient: Task Id : attempt_201102231433_0008_m_000070_0, Status : FAILED
>> java.io.FileNotFoundException: File does not exist: /user/mruno/temp
>>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1586)
>>     at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1577)
>>     at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:428)
>>     at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:187)
>>     at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:456)
>>     at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:67)
>>     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:450)
>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:645)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:234)
>>
>> I ran the job with this command:
>>
>> hadoop jar mahout-core-0.5-SNAPSHOT-jar-with-dependencies.jar
>> org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob
>> -Dmapred.input.dir=/user/mruno -Dmapred.output.dir=/user/mruno
>> --similarityClassname SIMILARITY_LOGLIKELIHOOD --tempDir
>> /user/mruno/temp
>>
>> The "first" job runs fine, but the second one it spawns after that always
>> fails:
>> ItemSimilarityJob-ItemIDIndexMapper-ItemIDIndexReducer (runs fine)
>> ItemSimilarityJob-CountUsersMapper-CountUsersReducer (fails with above error)
>> ItemSimilarityJob-ToItemPrefsMapper-ToUserVectorReducer (fails with above
>> error)
>>
>> If I look in HDFS, I have the following directory structure:
>> /user/mruno/input-data-file.csv
>> /user/mruno/temp/countUsers/...
>> /user/mruno/temp/itemIDIndex/...
>> /user/mruno/temp//userVectors/...
>>
>> ...so obviously the path I gave for --tempDir exists and is writable,
>> after all the job created all that stuff just fine except for the
>> input file.
>>
>> Does anyone have an idea on this? I'm sort of lost as to where to
>> start, the exception isn't all that helpful.
>>
>> If I look at the job's XML file, I see that it has mapred.output.dir
>> set to /user/mruno/temp/userVectors, which does exist there.
>>
>> I'd appreciate any ideas, and I apologize if this would be better
>> asked on the Hadoop message list but I thought I'd try here first
>> since it was specific to the ItemSimilarityJob.
>>
>
