FileNotFoundException while trying to run ItemSimilarityJob

Teun Duynstee Fri, 24 May 2013 05:47:20 -0700

[if you got this mail twice, please accept my apologies, but I didn't
receive it myself after posting]


Hi all,
I am trying to get the ItemSimilarityJob to work for me. I have a
standalone Hadoop setup and I am running the job like this:

~$ hadoop jar \
     /usr/local/mahout/core/target/mahout-core-0.8-SNAPSHOT-job.jar \
     org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob \
     --input /user/ubuntu --output /output \
     --similarityClassname SIMILARITY_COOCCURRENCE

In the HDFS folder /user/ubuntu, I have a somewhat large (2G) file with
records of the form:
[userID],[objectID],1
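
To make that record format concrete, here is a minimal sketch of a check that a line matches it. The class name RecordFormatCheck is hypothetical, and the sketch assumes both IDs are plain non-negative integers, which is not stated above.

```java
import java.util.regex.Pattern;

public class RecordFormatCheck {
    // Assumption: userID and objectID are non-negative integers,
    // and the third field is always the literal value 1.
    private static final Pattern RECORD = Pattern.compile("\\d+,\\d+,1");

    /** Returns true if the line has the form [userID],[objectID],1. */
    public static boolean isValidRecord(String line) {
        return line != null && RECORD.matcher(line).matches();
    }

    public static void main(String[] args) {
        // Hypothetical sample lines, not taken from the real input file.
        String[] samples = {"42,1001,1", "42,1001", "alice,1001,1"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + isValidRecord(sample));
        }
    }
}
```

Running such a check over a sample of the input would only rule out malformed lines; it says nothing about the missing file reported further down.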

I can see hadoop starting on the job and mapping and reducing away for a
while, but then at some point it fails, logging:

13/05/23 17:21:10 INFO mapred.LocalJobRunner:
13/05/23 17:21:10 INFO mapred.MapTask: Finished spill 7
13/05/23 17:21:10 INFO mapred.MapTask: Starting flush of map output
13/05/23 17:21:12 INFO mapred.MapTask: Finished spill 8
13/05/23 17:21:12 INFO mapred.Merger: Merging 9 sorted segments
13/05/23 17:21:12 INFO mapred.Merger: Down to the last merge-pass, with 9 segments left of total size: 8563929 bytes
13/05/23 17:21:13 INFO mapred.LocalJobRunner:
13/05/23 17:21:13 INFO mapred.JobClient:  map 100% reduce 0%
13/05/23 17:21:16 INFO mapred.LocalJobRunner:
13/05/23 17:21:19 INFO mapred.LocalJobRunner:
13/05/23 17:21:22 INFO mapred.LocalJobRunner:
13/05/23 17:21:25 INFO mapred.LocalJobRunner:
13/05/23 17:21:25 INFO mapred.Task: Task:attempt_local_0002_m_000029_0 is done. And is in the process of commiting
13/05/23 17:21:28 INFO mapred.LocalJobRunner:
13/05/23 17:21:28 INFO mapred.LocalJobRunner:
13/05/23 17:21:28 INFO mapred.Task: Task 'attempt_local_0002_m_000029_0' done.
13/05/23 17:21:28 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@39c07f3a
13/05/23 17:21:28 INFO mapred.MapTask: io.sort.mb = 100
13/05/23 17:21:28 INFO mapred.MapTask: data buffer = 79691776/99614720
13/05/23 17:21:28 INFO mapred.MapTask: record buffer = 262144/327680
13/05/23 17:21:28 WARN mapred.LocalJobRunner: job_local_0002
java.io.FileNotFoundException: File does not exist: /user/ubuntu/temp
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1843)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1834)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
        at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:67)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
13/05/23 17:21:29 INFO mapred.JobClient: Job complete: job_local_0002
13/05/23 17:21:29 INFO mapred.JobClient: Counters: 17
13/05/23 17:21:29 INFO mapred.JobClient:   FileSystemCounters
13/05/23 17:21:29 INFO mapred.JobClient:     FILE_BYTES_READ=28341894805
13/05/23 17:21:29 INFO mapred.JobClient:     HDFS_BYTES_READ=90730532541
13/05/23 17:21:29 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=35182198348
13/05/23 17:21:29 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=841343250
13/05/23 17:21:29 INFO mapred.JobClient:   File Input Format Counters
13/05/23 17:21:29 INFO mapred.JobClient:     Bytes Read=1985044985
13/05/23 17:21:29 INFO mapred.JobClient:   Map-Reduce Framework
13/05/23 17:21:29 INFO mapred.JobClient:     Map output materialized bytes=436635087
13/05/23 17:21:29 INFO mapred.JobClient:     Combine output records=0
13/05/23 17:21:29 INFO mapred.JobClient:     Map input records=117838687
13/05/23 17:21:29 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
13/05/23 17:21:29 INFO mapred.JobClient:     Spilled Records=289154604
13/05/23 17:21:29 INFO mapred.JobClient:     Map output bytes=1300200048
13/05/23 17:21:29 INFO mapred.JobClient:     CPU time spent (ms)=0
13/05/23 17:21:29 INFO mapred.JobClient:     Total committed heap usage (bytes)=3796746240
13/05/23 17:21:29 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
13/05/23 17:21:29 INFO mapred.JobClient:     Combine input records=0
13/05/23 17:21:29 INFO mapred.JobClient:     Map output records=117838687
13/05/23 17:21:29 INFO mapred.JobClient:     SPLIT_RAW_BYTES=3240
Exception in thread "main" java.io.FileNotFoundException: File does not exist: /user/ubuntu/temp/prepareRatingMatrix/numUsers.bin
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1843)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1834)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
        at org.apache.mahout.common.HadoopUtil.readInt(HadoopUtil.java:290)
        at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.run(ItemSimilarityJob.java:146)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.main(ItemSimilarityJob.java:94)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)


At this point the HDFS file system contains the following (my input file
is /user/ubuntu/input.txt):

drwxr-xr-x   - ubuntu supergroup          0 2013-05-23 13:16 /user
drwxr-xr-x   - ubuntu supergroup          0 2013-05-23 16:21 /user/ubuntu
-rw-r--r--   3 ubuntu supergroup 1984926172 2013-05-23 13:06 /user/ubuntu/input.txt
drwxr-xr-x   - ubuntu supergroup          0 2013-05-23 16:21 /user/ubuntu/temp
drwxr-xr-x   - ubuntu supergroup          0 2013-05-23 16:47 /user/ubuntu/temp/prepareRatingMatrix
drwxr-xr-x   - ubuntu supergroup          0 2013-05-23 16:47 /user/ubuntu/temp/prepareRatingMatrix/itemIDIndex
-rw-r--r--   3 ubuntu supergroup          0 2013-05-23 16:47 /user/ubuntu/temp/prepareRatingMatrix/itemIDIndex/_SUCCESS
-rw-r--r--   3 ubuntu supergroup   28044775 2013-05-23 16:47 /user/ubuntu/temp/prepareRatingMatrix/itemIDIndex/part-r-00000
drwxr-xr-x   - ubuntu supergroup          0 2013-05-23 17:21 /user/ubuntu/temp/prepareRatingMatrix/userVectors

I have retried with identical results, and I have tried a different
similarityClassname, also with the same results. Where am I going wrong here?

Thanks in advance for any pointers,
Teun
