Thanks for your help. Yes, I had figured that out from the stack trace and
a glance at the code of the ItemSimilarityJob. I'm afraid I am a complete
newbie in this field, and I'm not sure what you mean by "Counters from
PreparePreferenceMatrixJob". As you can see in the "hadoop fs -lsr" output
I posted, only one file was created that is not 0-size:
temp/prepareRatingMatrix/itemIDIndex/part-r-00000. Or do you mean something
specific from the output? As far as I can see, line 103 of the
PreparePreferenceMatrixJob should have written the file.
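
For anyone who wants to check a listing like this quickly, here is a minimal
sketch in Python. It is not part of Mahout or Hadoop; the function name
non_empty_files is hypothetical, and it assumes each "hadoop fs -lsr" entry
sits on one unwrapped line with eight whitespace-separated fields
(permissions, replication, owner, group, size, date, time, path).

```python
def non_empty_files(listing, prefix="/"):
    """Return (path, size) for each regular file under prefix with size > 0."""
    found = []
    for line in listing.splitlines():
        # Assumed layout: perms, replication, owner, group, size, date, time, path
        fields = line.split(None, 7)
        if len(fields) != 8 or not fields[0].startswith("-"):
            continue  # skip directories and lines that do not parse
        if not fields[4].isdigit():
            continue  # size column is not a plain number
        size = int(fields[4])
        path = fields[7]
        if size > 0 and path.startswith(prefix):
            found.append((path, size))
    return found


# Entries copied from the listing in this thread, one entry per line.
LISTING = """\
drwxr-xr-x   - ubuntu supergroup          0 2013-05-23 16:21 /user/ubuntu
-rw-r--r--   3 ubuntu supergroup 1984926172 2013-05-23 13:06 /user/ubuntu/input.txt
drwxr-xr-x   - ubuntu supergroup          0 2013-05-23 16:47 /user/ubuntu/temp/prepareRatingMatrix/itemIDIndex
-rw-r--r--   3 ubuntu supergroup          0 2013-05-23 16:47 /user/ubuntu/temp/prepareRatingMatrix/itemIDIndex/_SUCCESS
-rw-r--r--   3 ubuntu supergroup   28044775 2013-05-23 16:47 /user/ubuntu/temp/prepareRatingMatrix/itemIDIndex/part-r-00000
drwxr-xr-x   - ubuntu supergroup          0 2013-05-23 17:21 /user/ubuntu/temp/prepareRatingMatrix/userVectors
"""

if __name__ == "__main__":
    for path, size in non_empty_files(LISTING, "/user/ubuntu/temp/"):
        print(size, path)
```

Restricted to /user/ubuntu/temp/, this reports only the itemIDIndex
part-r-00000 file; the userVectors directory contributes nothing.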

I will try to run the job again and post the full output.

Thanks,
Teun


2013/5/25 Suneel Marthi <[email protected]>

> The error is quite clear from the stack trace you posted:
>
> Exception in thread "main" java.io.FileNotFoundException: File does not
> exist: /user/ubuntu/temp/prepareRatingMatrix/numUsers.bin
>
> ItemSimilarityJob first executes a PreparePreferenceMatrixJob, which
> should generate 'numUsers.bin'; apparently that file wasn't generated.
> It is needed by the RowSimilarityJob, which is the next phase in the
> execution of ItemSimilarityJob.
>
> Could you post the Counters from PreparePreferenceMatrixJob?
>
> ________________________________
>  From: Teun Duynstee <[email protected]>
> To: [email protected]
> Sent: Friday, May 24, 2013 3:44 AM
> Subject: FileNotFoundException while trying to run ItemSimilarityJob
>
>
> Hi all,
> I am trying to get the ItemSimilarityJob to work for me. I have a
> standalone Hadoop setup and I am running the job like this:
>
> ~$ hadoop jar /usr/local/mahout/core/target/mahout-core-0.8-SNAPSHOT-job.jar \
>      org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob \
>      --input /user/ubuntu --output /output \
>      --similarityClassname SIMILARITY_COOCCURRENCE
>
> In the HDFS folder /user/ubuntu, I have a somewhat large (2G) file with
> records of the form:
> [userID],[objectID],1
>
> I can see hadoop starting on the job and mapping and reducing away for a
> while, but then at some point it fails, logging:
>
> 13/05/23 17:21:10 INFO mapred.LocalJobRunner:
> 13/05/23 17:21:10 INFO mapred.MapTask: Finished spill 7
> 13/05/23 17:21:10 INFO mapred.MapTask: Starting flush of map output
> 13/05/23 17:21:12 INFO mapred.MapTask: Finished spill 8
> 13/05/23 17:21:12 INFO mapred.Merger: Merging 9 sorted segments
> 13/05/23 17:21:12 INFO mapred.Merger: Down to the last merge-pass, with 9 segments left of total size: 8563929 bytes
> 13/05/23 17:21:13 INFO mapred.LocalJobRunner:
> 13/05/23 17:21:13 INFO mapred.JobClient:  map 100% reduce 0%
> 13/05/23 17:21:16 INFO mapred.LocalJobRunner:
> 13/05/23 17:21:19 INFO mapred.LocalJobRunner:
> 13/05/23 17:21:22 INFO mapred.LocalJobRunner:
> 13/05/23 17:21:25 INFO mapred.LocalJobRunner:
> 13/05/23 17:21:25 INFO mapred.Task: Task:attempt_local_0002_m_000029_0 is done. And is in the process of commiting
> 13/05/23 17:21:28 INFO mapred.LocalJobRunner:
> 13/05/23 17:21:28 INFO mapred.LocalJobRunner:
> 13/05/23 17:21:28 INFO mapred.Task: Task 'attempt_local_0002_m_000029_0' done.
> 13/05/23 17:21:28 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@39c07f3a
> 13/05/23 17:21:28 INFO mapred.MapTask: io.sort.mb = 100
> 13/05/23 17:21:28 INFO mapred.MapTask: data buffer = 79691776/99614720
> 13/05/23 17:21:28 INFO mapred.MapTask: record buffer = 262144/327680
> 13/05/23 17:21:28 WARN mapred.LocalJobRunner: job_local_0002
> java.io.FileNotFoundException: File does not exist: /user/ubuntu/temp
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1843)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1834)
>         at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)
>         at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
>         at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:67)
>         at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 13/05/23 17:21:29 INFO mapred.JobClient: Job complete: job_local_0002
> 13/05/23 17:21:29 INFO mapred.JobClient: Counters: 17
> 13/05/23 17:21:29 INFO mapred.JobClient:   FileSystemCounters
> 13/05/23 17:21:29 INFO mapred.JobClient:     FILE_BYTES_READ=28341894805
> 13/05/23 17:21:29 INFO mapred.JobClient:     HDFS_BYTES_READ=90730532541
> 13/05/23 17:21:29 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=35182198348
> 13/05/23 17:21:29 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=841343250
> 13/05/23 17:21:29 INFO mapred.JobClient:   File Input Format Counters
> 13/05/23 17:21:29 INFO mapred.JobClient:     Bytes Read=1985044985
> 13/05/23 17:21:29 INFO mapred.JobClient:   Map-Reduce Framework
> 13/05/23 17:21:29 INFO mapred.JobClient:     Map output materialized bytes=436635087
> 13/05/23 17:21:29 INFO mapred.JobClient:     Combine output records=0
> 13/05/23 17:21:29 INFO mapred.JobClient:     Map input records=117838687
> 13/05/23 17:21:29 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
> 13/05/23 17:21:29 INFO mapred.JobClient:     Spilled Records=289154604
> 13/05/23 17:21:29 INFO mapred.JobClient:     Map output bytes=1300200048
> 13/05/23 17:21:29 INFO mapred.JobClient:     CPU time spent (ms)=0
> 13/05/23 17:21:29 INFO mapred.JobClient:     Total committed heap usage (bytes)=3796746240
> 13/05/23 17:21:29 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
> 13/05/23 17:21:29 INFO mapred.JobClient:     Combine input records=0
> 13/05/23 17:21:29 INFO mapred.JobClient:     Map output records=117838687
> 13/05/23 17:21:29 INFO mapred.JobClient:     SPLIT_RAW_BYTES=3240
> Exception in thread "main" java.io.FileNotFoundException: File does not exist: /user/ubuntu/temp/prepareRatingMatrix/numUsers.bin
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1843)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1834)
>         at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)
>         at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
>         at org.apache.mahout.common.HadoopUtil.readInt(HadoopUtil.java:290)
>         at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.run(ItemSimilarityJob.java:146)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.main(ItemSimilarityJob.java:94)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:616)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
>
> The HDFS file system at this point contains this: (my input file is
> /user/ubuntu/input.txt)
>
> drwxr-xr-x   - ubuntu supergroup          0 2013-05-23 13:16 /user
> drwxr-xr-x   - ubuntu supergroup          0 2013-05-23 16:21 /user/ubuntu
> -rw-r--r--   3 ubuntu supergroup 1984926172 2013-05-23 13:06 /user/ubuntu/input.txt
> drwxr-xr-x   - ubuntu supergroup          0 2013-05-23 16:21 /user/ubuntu/temp
> drwxr-xr-x   - ubuntu supergroup          0 2013-05-23 16:47 /user/ubuntu/temp/prepareRatingMatrix
> drwxr-xr-x   - ubuntu supergroup          0 2013-05-23 16:47 /user/ubuntu/temp/prepareRatingMatrix/itemIDIndex
> -rw-r--r--   3 ubuntu supergroup          0 2013-05-23 16:47 /user/ubuntu/temp/prepareRatingMatrix/itemIDIndex/_SUCCESS
> -rw-r--r--   3 ubuntu supergroup   28044775 2013-05-23 16:47 /user/ubuntu/temp/prepareRatingMatrix/itemIDIndex/part-r-00000
> drwxr-xr-x   - ubuntu supergroup          0 2013-05-23 17:21 /user/ubuntu/temp/prepareRatingMatrix/userVectors
>
> I have retried with identical results, and I have tried a different
> similarityClassname with the same outcome. Where am I going wrong here?
>
> Thanks in advance for any pointers,
> Teun
>
