Hi All,

I am having another issue with item similarity. For some reason the
numUsers.bin file does not get generated. I am copying the command here:
./mahout itemsimilarity -i /scratch/SimilartyInput -o /scratch/SimilartyOutput \
    --tempDir /scratch/Similartytemp -s SIMILARITY_COOCCURRENCE \
    --maxSimilaritiesPerItem 10
The first MR job runs and then at the end of it I see the following error:
13/12/27 12:56:57 INFO mapred.JobClient: map 84% reduce 22%
13/12/27 12:57:00 INFO mapred.JobClient: map 86% reduce 22%
13/12/27 12:57:05 INFO mapred.JobClient: Job complete: job_201311111627_0438
13/12/27 12:57:05 INFO mapred.JobClient: Counters: 24
13/12/27 12:57:05 INFO mapred.JobClient:   Job Counters
13/12/27 12:57:05 INFO mapred.JobClient:     Launched reduce tasks=1
13/12/27 12:57:05 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=314781
13/12/27 12:57:05 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/12/27 12:57:05 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/12/27 12:57:05 INFO mapred.JobClient:     Rack-local map tasks=12
13/12/27 12:57:05 INFO mapred.JobClient:     Launched map tasks=61
13/12/27 12:57:05 INFO mapred.JobClient:     Data-local map tasks=49
13/12/27 12:57:05 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=27061
13/12/27 12:57:05 INFO mapred.JobClient:     Failed map tasks=1
13/12/27 12:57:05 INFO mapred.JobClient:   FileSystemCounters
13/12/27 12:57:05 INFO mapred.JobClient:     HDFS_BYTES_READ=19279584
13/12/27 12:57:05 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1310480
13/12/27 12:57:05 INFO mapred.JobClient:   File Input Format Counters
13/12/27 12:57:05 INFO mapred.JobClient:     Bytes Read=19272534
13/12/27 12:57:05 INFO mapred.JobClient:   Map-Reduce Framework
13/12/27 12:57:05 INFO mapred.JobClient:     Map output materialized bytes=189690
13/12/27 12:57:05 INFO mapred.JobClient:     Combine output records=43081
13/12/27 12:57:05 INFO mapred.JobClient:     Map input records=1294478
13/12/27 12:57:05 INFO mapred.JobClient:     Physical memory (bytes) snapshot=17124999168
13/12/27 12:57:05 INFO mapred.JobClient:     Spilled Records=43081
13/12/27 12:57:05 INFO mapred.JobClient:     Map output bytes=7755258
13/12/27 12:57:05 INFO mapred.JobClient:     CPU time spent (ms)=37540
13/12/27 12:57:05 INFO mapred.JobClient:     Total committed heap usage (bytes)=17996169216
13/12/27 12:57:05 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=129811210240
13/12/27 12:57:05 INFO mapred.JobClient:     Combine input records=1294478
13/12/27 12:57:05 INFO mapred.JobClient:     Map output records=1294478
13/12/27 12:57:05 INFO mapred.JobClient:     SPLIT_RAW_BYTES=7050
Exception in thread "main" java.io.FileNotFoundException: File does not exist: /scratch/Similartytemp/prepareRatingMatrix/numUsers.bin
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1843)
    at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1834)
    at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
    at org.apache.mahout.common.HadoopUtil.readInt(HadoopUtil.java:339)
    at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.run(ItemSimilarityJob.java:147)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.main(ItemSimilarityJob.java:93)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
I checked the temp directory, and its contents are listed below. The
prepareRatingMatrix directory exists, but I am not sure why the numUsers.bin
file was not generated inside it.
-bash-4.1$ hadoop dfs -ls /scratch/Similartytemp/
Warning: $HADOOP_HOME is deprecated.
Found 1 items
drwxr-xr-x - userid supergroup 0 2013-12-27 12:56 /scratch/Similartytemp/prepareRatingMatrix
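
In case it helps anyone reading the job output above, here is a minimal Python sketch that pulls the name=value counters out of JobClient log lines and flags a non-zero "Failed map tasks" counter. The helper name parse_counters and the regular expression are my own assumptions for illustration; they are not part of Mahout or Hadoop, and the pattern only targets the log format shown in this message.

```python
import re

# Matches one JobClient counter line in the format shown in this post, e.g.
# "13/12/27 12:57:05 INFO mapred.JobClient: Failed map tasks=1"
# (hypothetical pattern written for this log layout only)
COUNTER_LINE = re.compile(
    r"INFO mapred\.JobClient:\s+(?P<name>[^=]+?)=(?P<value>\d+)\s*$"
)


def parse_counters(log_text):
    """Return {counter name: int value} for every 'name=value' JobClient line."""
    counters = {}
    for line in log_text.splitlines():
        match = COUNTER_LINE.search(line)
        if match:
            counters[match.group("name").strip()] = int(match.group("value"))
    return counters


# Three lines copied from the job output in this message.
sample = """\
13/12/27 12:57:05 INFO mapred.JobClient: Launched map tasks=61
13/12/27 12:57:05 INFO mapred.JobClient: Failed map tasks=1
13/12/27 12:57:05 INFO mapred.JobClient: Map input records=1294478
"""

counters = parse_counters(sample)
if counters.get("Failed map tasks", 0) > 0:
    print("the log reports at least one failed map task; check the task logs")
```

Lines without a name=value pair (progress lines, "Job complete", group headers such as "Job Counters") are simply skipped, so the whole console output can be passed in as one string.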