Thanks for your help. Yes, I had figured that out from the stack trace and a glance at the code of the ItemSimilarityJob. I'm afraid I am a complete newbie in this field. I'm not sure what you mean when you say "Counters from PreparePreferenceMatrixJob". As you can see in the "hadoop fs -lsr" output I posted, there is only one file created that is not 0-size: temp/prepareRatingMatrix/itemIDIndex/part-r-00000. Or do you mean something specific from the output? As far as I can see, line 103 of the PreparePreferenceMatrixJob should have written the file.
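For what it's worth, while poking at this I assumed numUsers.bin is nothing more than a single Java int written via Hadoop's DataOutput (4 big-endian bytes) — I haven't verified that against HadoopUtil.writeInt, so treat this as a guess. A tiny local sketch under that assumption, handy for inspecting the file after `hadoop fs -get`:

```python
import struct

def write_num_users(path, n):
    # Assumed format of numUsers.bin: one big-endian 4-byte Java int,
    # as DataOutput.writeInt would produce.
    with open(path, "wb") as f:
        f.write(struct.pack(">i", n))

def read_num_users(path):
    # Mirror of the assumed HadoopUtil.readInt behaviour.
    with open(path, "rb") as f:
        return struct.unpack(">i", f.read(4))[0]

write_num_users("numUsers.bin", 117838687)
print(read_num_users("numUsers.bin"))  # → 117838687
```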
I will try to run the job again and post the full output.

Thanks,
Teun


2013/5/25 Suneel Marthi <[email protected]>

> The error is quite clear from the stacktrace you had posted
>
> Exception in thread "main" java.io.FileNotFoundException: File does not
> exist: /user/ubuntu/temp/prepareRatingMatrix/numUsers.bin
>
> ItemSimilarityJob first executes a PreparePreferenceMatrixJob which
> generates the 'numUsers.bin' and apparently this wasn't generated.
> This is needed by the RowSimilarityJob which is the next phase in the
> execution of ItemSimilarityJob.
>
> Could u post the Counters from PreparePreferenceMatrixJob?
>
>
> ________________________________
> From: Teun Duynstee <[email protected]>
> To: [email protected]
> Sent: Friday, May 24, 2013 3:44 AM
> Subject: FileNotFoundException while trying to run ItemSimilarityJob
>
>
> Hi all,
> I am trying to get the ItemSimilarityJob to work for me. I have a
> standalone Hadoop setup and I am running the Job like this
>
> ~$ hadoop jar /usr/local/mahout/core/target/mahout-core-0.8-SNAPSHOT-job.jar \
>      org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob \
>      --input /user/ubuntu --output /output --similarityClassname SIMILARITY_COOCCURRENCE
>
> In the HDFS folder /user/ubuntu, I have a somewhat large (2G) file with
> records of the form:
> [userID],[objectID],1
>
> I can see hadoop starting on the job and mapping and reducing away for a
> while, but then at some point it fails, logging:
>
> 13/05/23 17:21:10 INFO mapred.LocalJobRunner:
> 13/05/23 17:21:10 INFO mapred.MapTask: Finished spill 7
> 13/05/23 17:21:10 INFO mapred.MapTask: Starting flush of map output
> 13/05/23 17:21:12 INFO mapred.MapTask: Finished spill 8
> 13/05/23 17:21:12 INFO mapred.Merger: Merging 9 sorted segments
> 13/05/23 17:21:12 INFO mapred.Merger: Down to the last merge-pass, with 9 segments left of total size: 8563929 bytes
> 13/05/23 17:21:13 INFO mapred.LocalJobRunner:
> 13/05/23 17:21:13 INFO mapred.JobClient:  map 100% reduce 0%
> 13/05/23 17:21:16 INFO mapred.LocalJobRunner:
> 13/05/23 17:21:19 INFO mapred.LocalJobRunner:
> 13/05/23 17:21:22 INFO mapred.LocalJobRunner:
> 13/05/23 17:21:25 INFO mapred.LocalJobRunner:
> 13/05/23 17:21:25 INFO mapred.Task: Task:attempt_local_0002_m_000029_0 is done. And is in the process of commiting
> 13/05/23 17:21:28 INFO mapred.LocalJobRunner:
> 13/05/23 17:21:28 INFO mapred.LocalJobRunner:
> 13/05/23 17:21:28 INFO mapred.Task: Task 'attempt_local_0002_m_000029_0' done.
> 13/05/23 17:21:28 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@39c07f3a
> 13/05/23 17:21:28 INFO mapred.MapTask: io.sort.mb = 100
> 13/05/23 17:21:28 INFO mapred.MapTask: data buffer = 79691776/99614720
> 13/05/23 17:21:28 INFO mapred.MapTask: record buffer = 262144/327680
> 13/05/23 17:21:28 WARN mapred.LocalJobRunner: job_local_0002
> java.io.FileNotFoundException: File does not exist: /user/ubuntu/temp
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1843)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1834)
>         at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)
>         at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
>         at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:67)
>         at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 13/05/23 17:21:29 INFO mapred.JobClient: Job complete: job_local_0002
> 13/05/23 17:21:29 INFO mapred.JobClient: Counters: 17
> 13/05/23 17:21:29 INFO mapred.JobClient:   FileSystemCounters
> 13/05/23 17:21:29 INFO mapred.JobClient:     FILE_BYTES_READ=28341894805
> 13/05/23 17:21:29 INFO mapred.JobClient:     HDFS_BYTES_READ=90730532541
> 13/05/23 17:21:29 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=35182198348
> 13/05/23 17:21:29 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=841343250
> 13/05/23 17:21:29 INFO mapred.JobClient:   File Input Format Counters
> 13/05/23 17:21:29 INFO mapred.JobClient:     Bytes Read=1985044985
> 13/05/23 17:21:29 INFO mapred.JobClient:   Map-Reduce Framework
> 13/05/23 17:21:29 INFO mapred.JobClient:     Map output materialized bytes=436635087
> 13/05/23 17:21:29 INFO mapred.JobClient:     Combine output records=0
> 13/05/23 17:21:29 INFO mapred.JobClient:     Map input records=117838687
> 13/05/23 17:21:29 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
> 13/05/23 17:21:29 INFO mapred.JobClient:     Spilled Records=289154604
> 13/05/23 17:21:29 INFO mapred.JobClient:     Map output bytes=1300200048
> 13/05/23 17:21:29 INFO mapred.JobClient:     CPU time spent (ms)=0
> 13/05/23 17:21:29 INFO mapred.JobClient:     Total committed heap usage (bytes)=3796746240
> 13/05/23 17:21:29 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
> 13/05/23 17:21:29 INFO mapred.JobClient:     Combine input records=0
> 13/05/23 17:21:29 INFO mapred.JobClient:     Map output records=117838687
> 13/05/23 17:21:29 INFO mapred.JobClient:     SPLIT_RAW_BYTES=3240
> Exception in thread "main" java.io.FileNotFoundException: File does not exist: /user/ubuntu/temp/prepareRatingMatrix/numUsers.bin
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1843)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1834)
>         at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)
>         at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
>         at org.apache.mahout.common.HadoopUtil.readInt(HadoopUtil.java:290)
>         at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.run(ItemSimilarityJob.java:146)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.main(ItemSimilarityJob.java:94)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:616)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
>
> The HDFS file system at this point contains this: (my input file is
> /user/ubuntu/input.txt)
>
> drwxr-xr-x   - ubuntu supergroup          0 2013-05-23 13:16 /user
> drwxr-xr-x   - ubuntu supergroup          0 2013-05-23 16:21 /user/ubuntu
> -rw-r--r--   3 ubuntu supergroup 1984926172 2013-05-23 13:06 /user/ubuntu/input.txt
> drwxr-xr-x   - ubuntu supergroup          0 2013-05-23 16:21 /user/ubuntu/temp
> drwxr-xr-x   - ubuntu supergroup          0 2013-05-23 16:47 /user/ubuntu/temp/prepareRatingMatrix
> drwxr-xr-x   - ubuntu supergroup          0 2013-05-23 16:47 /user/ubuntu/temp/prepareRatingMatrix/itemIDIndex
> -rw-r--r--   3 ubuntu supergroup          0 2013-05-23 16:47 /user/ubuntu/temp/prepareRatingMatrix/itemIDIndex/_SUCCESS
> -rw-r--r--   3 ubuntu supergroup   28044775 2013-05-23 16:47 /user/ubuntu/temp/prepareRatingMatrix/itemIDIndex/part-r-00000
> drwxr-xr-x   - ubuntu supergroup          0 2013-05-23 17:21 /user/ubuntu/temp/prepareRatingMatrix/userVectors
>
> I have retried with identical results and I have tried a different
> similarityClassname with the same results. Where am I going wrong here?
>
> Thanks in advance for any pointers,
> Teun
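For anyone following along: as I understand it, SIMILARITY_COOCCURRENCE simply counts, for each pair of items, how many users have both items in their preference list. A tiny local sketch of that idea on the [userID],[objectID],1 record format (illustrative only, not Mahout's actual implementation — it groups items per user and counts pairs in memory, which is exactly what the MapReduce job avoids doing at scale):

```python
from collections import defaultdict
from itertools import combinations

# A few sample records in the same [userID],[objectID],1 format as input.txt.
lines = [
    "1,apple,1",
    "1,banana,1",
    "2,apple,1",
    "2,banana,1",
    "2,cherry,1",
]

# Group items by user (roughly the userVectors phase of PreparePreferenceMatrixJob).
items_by_user = defaultdict(set)
for line in lines:
    user, item, _pref = line.strip().split(",")
    items_by_user[user].add(item)

# Co-occurrence count: for each item pair, the number of users holding both.
cooccur = defaultdict(int)
for items in items_by_user.values():
    for a, b in combinations(sorted(items), 2):
        cooccur[(a, b)] += 1

print(dict(cooccur))
# → {('apple', 'banana'): 2, ('apple', 'cherry'): 1, ('banana', 'cherry'): 1}
```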
