Re: FileNotFoundException while trying to run ItemSimilarityJob

Suneel Marthi Sat, 25 May 2013 14:58:09 -0700

Just before line 103, PreparePreferenceMatrixJob reads a counter, 
Counters.USERS, from the toUserVectors job.
You should be able to see it in the Hadoop JobTracker. Check whether the value 
of this counter is being set.
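
If the JobTracker UI is not handy (for example with the LocalJobRunner), the counters that JobClient prints at the end of each job can be scanned instead. The following is a minimal sketch, assuming the log lines are unwrapped and that the counter shows up under the display name "USERS"; that name and the helper functions are illustrative assumptions, not part of Mahout or Hadoop.

```python
import re

# Matches the counter lines that JobClient prints when a job finishes, e.g.
# "13/05/23 17:21:29 INFO mapred.JobClient:     Map input records=117838687"
COUNTER_LINE = re.compile(
    r"INFO mapred\.JobClient:\s+(?P<name>[^=]+?)=(?P<value>\d+)\s*$"
)


def parse_counters(log_text):
    """Return {counter_name: int_value} for every counter line in the log."""
    counters = {}
    for line in log_text.splitlines():
        match = COUNTER_LINE.search(line)
        if match:
            counters[match.group("name").strip()] = int(match.group("value"))
    return counters


def find_counter(counters, name):
    """Case-insensitive lookup; None means the counter was never reported."""
    for key, value in counters.items():
        if key.lower() == name.lower():
            return value
    return None


# A few unwrapped lines taken from the log quoted in this thread.
sample_log = """\
13/05/23 17:21:29 INFO mapred.JobClient: Counters: 17
13/05/23 17:21:29 INFO mapred.JobClient:   Map-Reduce Framework
13/05/23 17:21:29 INFO mapred.JobClient:     Map input records=117838687
13/05/23 17:21:29 INFO mapred.JobClient:     Combine output records=0
"""

counters = parse_counters(sample_log)
print(counters)
# "USERS" is an assumed display name for the counter discussed above.
print("USERS counter:", find_counter(counters, "USERS"))
```

A result of None here only says that no such counter line was found in the pasted text; it does not by itself show why the counter is missing.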


I am not too familiar with this code, but could you try specifying --tempDir 
<tempPath> explicitly and see what the output is (not sure if you had already 
tried that)?
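
One thing that may be worth ruling out when picking a temp path: in the first stack trace quoted further down, a map task's LineRecordReader fails while opening /user/ubuntu/temp, and the job was started with --input /user/ubuntu, so the temp directory sits inside the input directory. Whether that is the actual cause is an assumption and is not confirmed anywhere in this thread. The sketch below only shows how to check that a candidate temp path is not nested under the input path; `is_inside` and the `/tmp/mahout-work` value are hypothetical.

```python
import posixpath


def is_inside(path, directory):
    """True if `path` equals `directory` or lies somewhere below it."""
    path = posixpath.normpath(path)
    directory = posixpath.normpath(directory)
    if directory == "/":
        return True
    return path == directory or path.startswith(directory + "/")


input_dir = "/user/ubuntu"          # value passed to --input in the original post
default_temp = "/user/ubuntu/temp"  # where the temp files appear in the listing
explicit_temp = "/tmp/mahout-work"  # hypothetical value for --tempDir

print("default temp inside input: ", is_inside(default_temp, input_dir))
print("explicit temp inside input:", is_inside(explicit_temp, input_dir))
```

Comparing with a trailing "/" avoids treating a sibling such as /user/ubuntu2 as if it were inside /user/ubuntu.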



________________________________
 From: Teun Duynstee <[email protected]>
To: [email protected]; Suneel Marthi <[email protected]> 
Sent: Saturday, May 25, 2013 5:44 PM
Subject: Re: FileNotFoundException while trying to run ItemSimilarityJob
 


Thanks for your help. Yes, I had figured that out from the stack trace and a 
glance at the code of the ItemSimilarityJob. I'm afraid I am a complete newbie 
in this field. I'm not sure what you mean when you say "Counters from 
PreparePreferenceMatrixJob". As you can see in the "hadoop fs -lsr" output I 
posted, there is only one file created that is not 0-size: 
temp/prepareRatingMatrix/itemIDIndex/part-r-00000. Or do you mean something 
specific from the output? As far as I can see, line 103 of the 
PreparePreferenceMatrixJob should have written the file. 
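
The check described above (which files under the temp directory are non-empty, and whether numUsers.bin exists at all) can be done mechanically on the listing. This is a rough sketch, assuming unwrapped "hadoop fs -lsr" lines with eight whitespace-separated fields as in the listing posted in this thread; the function names are made up for illustration.

```python
def parse_lsr(listing):
    """Parse `hadoop fs -lsr` output into (path, is_dir, size) tuples."""
    entries = []
    for line in listing.splitlines():
        fields = line.split(None, 7)
        if len(fields) != 8 or not fields[4].isdigit():
            continue  # skip blank, wrapped, or otherwise unexpected lines
        perms, _repl, _owner, _group, size, _date, _time, path = fields
        entries.append((path, perms.startswith("d"), int(size)))
    return entries


def non_empty_files(entries):
    """Paths of regular files whose size is greater than zero."""
    return [path for path, is_dir, size in entries if not is_dir and size > 0]


def has_file(entries, name):
    """True if some regular file in the listing has this base name."""
    return any(
        not is_dir and path.rsplit("/", 1)[-1] == name
        for path, is_dir, _size in entries
    )


# Lines for the temp directory, copied from the listing in the original post.
listing = """\
drwxr-xr-x   - ubuntu supergroup          0 2013-05-23 16:47 /user/ubuntu/temp/prepareRatingMatrix
drwxr-xr-x   - ubuntu supergroup          0 2013-05-23 16:47 /user/ubuntu/temp/prepareRatingMatrix/itemIDIndex
-rw-r--r--   3 ubuntu supergroup          0 2013-05-23 16:47 /user/ubuntu/temp/prepareRatingMatrix/itemIDIndex/_SUCCESS
-rw-r--r--   3 ubuntu supergroup   28044775 2013-05-23 16:47 /user/ubuntu/temp/prepareRatingMatrix/itemIDIndex/part-r-00000
drwxr-xr-x   - ubuntu supergroup          0 2013-05-23 17:21 /user/ubuntu/temp/prepareRatingMatrix/userVectors
"""

entries = parse_lsr(listing)
print("non-empty files:", non_empty_files(entries))
print("numUsers.bin present:", has_file(entries, "numUsers.bin"))
```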

I will try to run the job again and post the full output. 

Thanks,
Teun



2013/5/25 Suneel Marthi <[email protected]>

>The error is quite clear from the stack trace you had posted:
>
>
>Exception in thread "main" java.io.FileNotFoundException: File does not
>exist: /user/ubuntu/temp/prepareRatingMatrix/numUsers.bin
>
>ItemSimilarityJob first executes a PreparePreferenceMatrixJob which generates 
>the 'numUsers.bin' and apparently this wasn't generated.
>This is needed by the RowSimilarityJob which is the next phase in the 
>execution of ItemSimilarityJob.
>
>Could you post the Counters from PreparePreferenceMatrixJob?
>
>
>
>
>
>
>________________________________
> From: Teun Duynstee <[email protected]>
>To: [email protected]
>Sent: Friday, May 24, 2013 3:44 AM
>Subject: FileNotFoundException while trying to run ItemSimilarityJob
>
>
>
>Hi all,
>I am trying to get the ItemSimilarityJob to work for me. I have a
>standalone Hadoop setup and I am running the Job like this
>
>~$ hadoop jar /usr/local/mahout/core/target/mahout-core-0.8-SNAPSHOT-job.jar \
>     org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob \
>     --input /user/ubuntu --output /output \
>     --similarityClassname SIMILARITY_COOCCURRENCE
>
>In the HDFS folder /user/ubuntu, I have a somewhat large (2G) file with
>records of the form:
>[userID],[objectID],1
>
>I can see hadoop starting on the job and mapping and reducing away for a
>while, but then at some point it fails, logging:
>
>13/05/23 17:21:10 INFO mapred.LocalJobRunner:
>13/05/23 17:21:10 INFO mapred.MapTask: Finished spill 7
>13/05/23 17:21:10 INFO mapred.MapTask: Starting flush of map output
>13/05/23 17:21:12 INFO mapred.MapTask: Finished spill 8
>13/05/23 17:21:12 INFO mapred.Merger: Merging 9 sorted segments
>13/05/23 17:21:12 INFO mapred.Merger: Down to the last merge-pass, with 9 segments left of total size: 8563929 bytes
>13/05/23 17:21:13 INFO mapred.LocalJobRunner:
>13/05/23 17:21:13 INFO mapred.JobClient:  map 100% reduce 0%
>13/05/23 17:21:16 INFO mapred.LocalJobRunner:
>13/05/23 17:21:19 INFO mapred.LocalJobRunner:
>13/05/23 17:21:22 INFO mapred.LocalJobRunner:
>13/05/23 17:21:25 INFO mapred.LocalJobRunner:
>13/05/23 17:21:25 INFO mapred.Task: Task:attempt_local_0002_m_000029_0 is done. And is in the process of commiting
>13/05/23 17:21:28 INFO mapred.LocalJobRunner:
>13/05/23 17:21:28 INFO mapred.LocalJobRunner:
>13/05/23 17:21:28 INFO mapred.Task: Task 'attempt_local_0002_m_000029_0' done.
>13/05/23 17:21:28 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@39c07f3a
>13/05/23 17:21:28 INFO mapred.MapTask: io.sort.mb = 100
>13/05/23 17:21:28 INFO mapred.MapTask: data buffer = 79691776/99614720
>13/05/23 17:21:28 INFO mapred.MapTask: record buffer = 262144/327680
>13/05/23 17:21:28 WARN mapred.LocalJobRunner: job_local_0002
>java.io.FileNotFoundException: File does not exist: /user/ubuntu/temp
>        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1843)
>        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1834)
>        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578)
>        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)
>        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
>        at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:67)
>        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522)
>        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>13/05/23 17:21:29 INFO mapred.JobClient: Job complete: job_local_0002
>13/05/23 17:21:29 INFO mapred.JobClient: Counters: 17
>13/05/23 17:21:29 INFO mapred.JobClient:   FileSystemCounters
>13/05/23 17:21:29 INFO mapred.JobClient:     FILE_BYTES_READ=28341894805
>13/05/23 17:21:29 INFO mapred.JobClient:     HDFS_BYTES_READ=90730532541
>13/05/23 17:21:29 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=35182198348
>13/05/23 17:21:29 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=841343250
>13/05/23 17:21:29 INFO mapred.JobClient:   File Input Format Counters
>13/05/23 17:21:29 INFO mapred.JobClient:     Bytes Read=1985044985
>13/05/23 17:21:29 INFO mapred.JobClient:   Map-Reduce Framework
>13/05/23 17:21:29 INFO mapred.JobClient:     Map output materialized bytes=436635087
>13/05/23 17:21:29 INFO mapred.JobClient:     Combine output records=0
>13/05/23 17:21:29 INFO mapred.JobClient:     Map input records=117838687
>13/05/23 17:21:29 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
>13/05/23 17:21:29 INFO mapred.JobClient:     Spilled Records=289154604
>13/05/23 17:21:29 INFO mapred.JobClient:     Map output bytes=1300200048
>13/05/23 17:21:29 INFO mapred.JobClient:     CPU time spent (ms)=0
>13/05/23 17:21:29 INFO mapred.JobClient:     Total committed heap usage (bytes)=3796746240
>13/05/23 17:21:29 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
>13/05/23 17:21:29 INFO mapred.JobClient:     Combine input records=0
>13/05/23 17:21:29 INFO mapred.JobClient:     Map output records=117838687
>13/05/23 17:21:29 INFO mapred.JobClient:     SPLIT_RAW_BYTES=3240
>Exception in thread "main" java.io.FileNotFoundException: File does not exist: /user/ubuntu/temp/prepareRatingMatrix/numUsers.bin
>        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1843)
>        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1834)
>        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578)
>        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)
>        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
>        at org.apache.mahout.common.HadoopUtil.readInt(HadoopUtil.java:290)
>        at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.run(ItemSimilarityJob.java:146)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>        at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.main(ItemSimilarityJob.java:94)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>        at java.lang.reflect.Method.invoke(Method.java:616)
>        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
>
>The HDFS file system at this point contains this: (my input file is
>/user/ubuntu/input.txt)
>
>drwxr-xr-x   - ubuntu supergroup          0 2013-05-23 13:16 /user
>drwxr-xr-x   - ubuntu supergroup          0 2013-05-23 16:21 /user/ubuntu
>-rw-r--r--   3 ubuntu supergroup 1984926172 2013-05-23 13:06 /user/ubuntu/input.txt
>drwxr-xr-x   - ubuntu supergroup          0 2013-05-23 16:21 /user/ubuntu/temp
>drwxr-xr-x   - ubuntu supergroup          0 2013-05-23 16:47 /user/ubuntu/temp/prepareRatingMatrix
>drwxr-xr-x   - ubuntu supergroup          0 2013-05-23 16:47 /user/ubuntu/temp/prepareRatingMatrix/itemIDIndex
>-rw-r--r--   3 ubuntu supergroup          0 2013-05-23 16:47 /user/ubuntu/temp/prepareRatingMatrix/itemIDIndex/_SUCCESS
>-rw-r--r--   3 ubuntu supergroup   28044775 2013-05-23 16:47 /user/ubuntu/temp/prepareRatingMatrix/itemIDIndex/part-r-00000
>drwxr-xr-x   - ubuntu supergroup          0 2013-05-23 17:21 /user/ubuntu/temp/prepareRatingMatrix/userVectors
>
>I have retried with identical results, and I have tried a different
>similarityClassname with the same outcome. Where am I going wrong here?
>
>Thanks in advance for any pointers,
>Teun
