The line just before line 103 in PreparePreferenceMatrixJob reads a counter, Counters.USERS, from the toUserVectors job, and line 103 then writes that value to numUsers.bin. You should be able to see this counter in the Hadoop JobTracker. Check whether its value is being set.
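The dependency described above can be sketched locally in plain Java, without Hadoop. This is a hypothetical analog, not Mahout's code: the class and method names (NumUsersSketch, prepare, readNumUsers) are made up, and it assumes (1) the preparation step only writes the user count after the toUserVectors step has succeeded, and (2) numUsers.bin holds a single int as written by DataOutputStream.writeInt.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical local analog (plain files, no Hadoop) of how numUsers.bin is
// produced by the preparation phase and consumed by the next phase.
final class NumUsersSketch {

    // Stands in for PreparePreferenceMatrixJob. ASSUMPTION: the USERS counter value
    // is only written out after the toUserVectors step has succeeded.
    static int prepare(boolean toUserVectorsSucceeded, int usersCounter, File numUsersFile)
            throws IOException {
        if (!toUserVectorsSucceeded) {
            return -1; // early exit: numUsers.bin is never created
        }
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(numUsersFile))) {
            out.writeInt(usersCounter); // ASSUMPTION: a single 4-byte big-endian int
        }
        return 0;
    }

    // Stands in for the read seen in the stack trace (HadoopUtil.readInt); a missing
    // file surfaces as FileNotFoundException, as in the reported failure.
    static int readNumUsers(File numUsersFile) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(numUsersFile))) {
            return in.readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("prepareRatingMatrix").toFile();
        File numUsers = new File(dir, "numUsers.bin");

        // Failed preparation whose return code is not checked by the caller.
        prepare(false, 0, numUsers);
        try {
            readNumUsers(numUsers);
        } catch (FileNotFoundException e) {
            System.out.println("next phase fails: " + e.getMessage());
        }

        // Successful preparation: the count written by one phase is read by the next.
        prepare(true, 42, numUsers);
        System.out.println("numUsers = " + readNumUsers(numUsers));
    }
}
```

The point of the sketch is the ordering: if the step that sets the counter fails and nothing checks its return value, the first visible symptom is a missing file one phase later, which appears consistent with the two exceptions in the log.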
I am not too familiar with this code, but could you try specifying --tempDir <tempPath> explicitly and see what output you get (not sure if you had that already)?

________________________________
From: Teun Duynstee
To: Suneel Marthi
Sent: Saturday, May 25, 2013 5:44 PM
Subject: Re: FileNotFoundException while trying to run ItemSimilarityJob

Thanks for your help. Yes, I had figured that out from the stack trace and a glance at the code of the ItemSimilarityJob. I'm afraid I am a complete newbie in this field.

I'm not sure what you mean when you say "Counters from PreparePreferenceMatrixJob". As you can see in the "hadoop fs -lsr" output I posted, only one file was created that is not zero-size: temp/prepareRatingMatrix/itemIDIndex/part-r-00000. Or do you mean something specific from the output? As far as I can see, line 103 of PreparePreferenceMatrixJob should have written the file.

I will try to run the job again and post the full output.

Thanks,
Teun

2013/5/25 Suneel Marthi:

>The error is quite clear from the stack trace you posted:
>
>Exception in thread "main" java.io.FileNotFoundException: File does not exist: /user/ubuntu/temp/prepareRatingMatrix/numUsers.bin
>
>ItemSimilarityJob first executes a PreparePreferenceMatrixJob, which generates 'numUsers.bin', and apparently this file wasn't generated.
>It is needed by the RowSimilarityJob, which is the next phase in the execution of ItemSimilarityJob.
>
>Could you post the Counters from PreparePreferenceMatrixJob?
>
>________________________________
>From: Teun Duynstee
>Sent: Friday, May 24, 2013 3:44 AM
>Subject: FileNotFoundException while trying to run ItemSimilarityJob
>
>Hi all,
>I am trying to get the ItemSimilarityJob to work for me.
>I have a standalone Hadoop setup and I am running the job like this:
>
>~$ hadoop jar /usr/local/mahout/core/target/mahout-core-0.8-SNAPSHOT-job.jar org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob --input /user/ubuntu --output /output --similarityClassname SIMILARITY_COOCCURRENCE
>
>In the HDFS folder /user/ubuntu, I have a somewhat large (2 GB) file with records of the form:
>[userID],[objectID],1
>
>I can see Hadoop starting on the job and mapping and reducing away for a while, but then at some point it fails, logging:
>
>13/05/23 17:21:10 INFO mapred.LocalJobRunner:
>13/05/23 17:21:10 INFO mapred.MapTask: Finished spill 7
>13/05/23 17:21:10 INFO mapred.MapTask: Starting flush of map output
>13/05/23 17:21:12 INFO mapred.MapTask: Finished spill 8
>13/05/23 17:21:12 INFO mapred.Merger: Merging 9 sorted segments
>13/05/23 17:21:12 INFO mapred.Merger: Down to the last merge-pass, with 9 segments left of total size: 8563929 bytes
>13/05/23 17:21:13 INFO mapred.LocalJobRunner:
>13/05/23 17:21:13 INFO mapred.JobClient: map 100% reduce 0%
>13/05/23 17:21:16 INFO mapred.LocalJobRunner:
>13/05/23 17:21:19 INFO mapred.LocalJobRunner:
>13/05/23 17:21:22 INFO mapred.LocalJobRunner:
>13/05/23 17:21:25 INFO mapred.LocalJobRunner:
>13/05/23 17:21:25 INFO mapred.Task: Task:attempt_local_0002_m_000029_0 is done. And is in the process of commiting
>13/05/23 17:21:28 INFO mapred.LocalJobRunner:
>13/05/23 17:21:28 INFO mapred.LocalJobRunner:
>13/05/23 17:21:28 INFO mapred.Task: Task 'attempt_local_0002_m_000029_0' done.
>13/05/23 17:21:28 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@39c07f3a
>13/05/23 17:21:28 INFO mapred.MapTask: io.sort.mb = 100
>13/05/23 17:21:28 INFO mapred.MapTask: data buffer = 79691776/99614720
>13/05/23 17:21:28 INFO mapred.MapTask: record buffer = 262144/327680
>13/05/23 17:21:28 WARN mapred.LocalJobRunner: job_local_0002
>java.io.FileNotFoundException: File does not exist: /user/ubuntu/temp
>        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1843)
>        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1834)
>        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578)
>        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)
>        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
>        at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:67)
>        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522)
>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>13/05/23 17:21:29 INFO mapred.JobClient: Job complete: job_local_0002
>13/05/23 17:21:29 INFO mapred.JobClient: Counters: 17
>13/05/23 17:21:29 INFO mapred.JobClient: FileSystemCounters
>13/05/23 17:21:29 INFO mapred.JobClient: FILE_BYTES_READ=28341894805
>13/05/23 17:21:29 INFO mapred.JobClient: HDFS_BYTES_READ=90730532541
>13/05/23 17:21:29 INFO mapred.JobClient: FILE_BYTES_WRITTEN=35182198348
>13/05/23 17:21:29 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=841343250
>13/05/23 17:21:29 INFO mapred.JobClient: File Input Format Counters
>13/05/23 17:21:29 INFO mapred.JobClient: Bytes Read=1985044985
>13/05/23 17:21:29 INFO mapred.JobClient: Map-Reduce Framework
>13/05/23 17:21:29 INFO mapred.JobClient: Map output materialized bytes=436635087
>13/05/23 17:21:29 INFO mapred.JobClient: Combine output records=0
>13/05/23 17:21:29 INFO mapred.JobClient: Map input records=117838687
>13/05/23 17:21:29 INFO mapred.JobClient: Physical memory (bytes) snapshot=0
>13/05/23 17:21:29 INFO mapred.JobClient: Spilled Records=289154604
>13/05/23 17:21:29 INFO mapred.JobClient: Map output bytes=1300200048
>13/05/23 17:21:29 INFO mapred.JobClient: CPU time spent (ms)=0
>13/05/23 17:21:29 INFO mapred.JobClient: Total committed heap usage (bytes)=3796746240
>13/05/23 17:21:29 INFO mapred.JobClient: Virtual memory (bytes) snapshot=0
>13/05/23 17:21:29 INFO mapred.JobClient: Combine input records=0
>13/05/23 17:21:29 INFO mapred.JobClient: Map output records=117838687
>13/05/23 17:21:29 INFO mapred.JobClient: SPLIT_RAW_BYTES=3240
>Exception in thread "main" java.io.FileNotFoundException: File does not exist: /user/ubuntu/temp/prepareRatingMatrix/numUsers.bin
>        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1843)
>        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1834)
>        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578)
>        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)
>        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
>        at org.apache.mahout.common.HadoopUtil.readInt(HadoopUtil.java:290)
>        at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.run(ItemSimilarityJob.java:146)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>        at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.main(ItemSimilarityJob.java:94)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:616)
>        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
>At this point the HDFS file system contains the following (my input file is /user/ubuntu/input.txt):
>
>drwxr-xr-x - ubuntu supergroup 0 2013-05-23 13:16 /user
>drwxr-xr-x - ubuntu supergroup 0 2013-05-23 16:21 /user/ubuntu
>-rw-r--r-- 3 ubuntu supergroup 1984926172 2013-05-23 13:06 /user/ubuntu/input.txt
>drwxr-xr-x - ubuntu supergroup 0 2013-05-23 16:21 /user/ubuntu/temp
>drwxr-xr-x - ubuntu supergroup 0 2013-05-23 16:47 /user/ubuntu/temp/prepareRatingMatrix
>drwxr-xr-x - ubuntu supergroup 0 2013-05-23 16:47 /user/ubuntu/temp/prepareRatingMatrix/itemIDIndex
>-rw-r--r-- 3 ubuntu supergroup 0 2013-05-23 16:47 /user/ubuntu/temp/prepareRatingMatrix/itemIDIndex/_SUCCESS
>-rw-r--r-- 3 ubuntu supergroup 28044775 2013-05-23 16:47 /user/ubuntu/temp/prepareRatingMatrix/itemIDIndex/part-r-00000
>drwxr-xr-x - ubuntu supergroup 0 2013-05-23 17:21 /user/ubuntu/temp/prepareRatingMatrix/userVectors
>
>I have retried with identical results, and I have also tried a different similarityClassname with the same results. Where am I going wrong here?
>
>Thanks in advance for any pointers,
>Teun
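When comparing the listing from one run to the next, it can help to check it mechanically rather than by eye. The helper below is a hypothetical sketch (LsrCheck is not part of Hadoop or Mahout). It assumes the whitespace-separated column order shown in the `hadoop fs -lsr` output above (permissions, replication, owner, group, size, date, time, path) and paths without spaces, and it reports the non-empty files under a prefix and whether a given file such as numUsers.bin was produced at all.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: scans `hadoop fs -lsr`-style lines as pasted in the thread.
// ASSUMPTION: columns are permissions, replication, owner, group, size, date, time, path,
// separated by whitespace, and paths contain no spaces.
final class LsrCheck {

    // Returns the paths of regular files under `prefix` whose size is greater than zero.
    static List<String> nonEmptyFilesUnder(List<String> lsrLines, String prefix) {
        List<String> result = new ArrayList<>();
        for (String line : lsrLines) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 8 || cols[0].startsWith("d")) {
                continue; // skip malformed lines and directories
            }
            long size = Long.parseLong(cols[4]);
            String path = cols[7];
            if (size > 0 && path.startsWith(prefix)) {
                result.add(path);
            }
        }
        return result;
    }

    // True if any listed path ends with the given suffix, e.g. "/numUsers.bin".
    static boolean hasEntryEndingWith(List<String> lsrLines, String suffix) {
        for (String line : lsrLines) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length >= 8 && cols[7].endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> listing = List.of(
            "-rw-r--r-- 3 ubuntu supergroup 1984926172 2013-05-23 13:06 /user/ubuntu/input.txt",
            "-rw-r--r-- 3 ubuntu supergroup 0 2013-05-23 16:47 /user/ubuntu/temp/prepareRatingMatrix/itemIDIndex/_SUCCESS",
            "-rw-r--r-- 3 ubuntu supergroup 28044775 2013-05-23 16:47 /user/ubuntu/temp/prepareRatingMatrix/itemIDIndex/part-r-00000",
            "drwxr-xr-x - ubuntu supergroup 0 2013-05-23 17:21 /user/ubuntu/temp/prepareRatingMatrix/userVectors");
        // prints [/user/ubuntu/temp/prepareRatingMatrix/itemIDIndex/part-r-00000]
        System.out.println(nonEmptyFilesUnder(listing, "/user/ubuntu/temp/"));
        // prints false
        System.out.println(hasEntryEndingWith(listing, "/numUsers.bin"));
    }
}
```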
