Hi,
I am having trouble running the Hadoop RecommenderJob example from Chapter 6
of Mahout in Action and was hoping someone could help me out. I am using
the CDH3 VMware image from Cloudera on my laptop. I first downloaded the
links-simple-sorted.txt dataset and added it to HDFS as input/input.txt. I
also added a users.txt file with the number 3 on its first and only line.
[cloudera@localhost ~]$ hadoop fs -ls input
Found 2 items
-rw-r--r-- 1 cloudera supergroup 0 2012-06-18 12:26 /user/cloudera/input/input.txt
-rw-r--r-- 1 cloudera supergroup 0 2012-06-18 12:28 /user/cloudera/input/users.txt
The following is my command line.
[cloudera@localhost mahout-distribution-0.5]$ hadoop jar mahout-core-0.5-job.jar \
  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output \
  --usersFile input/users.txt --booleanData
It gets about two-thirds of the way through the full run and then throws
the following NumberFormatException.
12/06/18 18:04:39 INFO mapred.JobClient: Running job: job_201206181608_0047
12/06/18 18:04:40 INFO mapred.JobClient: map 0% reduce 0%
12/06/18 18:04:45 INFO mapred.JobClient: map 100% reduce 0%
12/06/18 18:04:53 INFO mapred.JobClient: map 100% reduce 33%
12/06/18 18:04:55 INFO mapred.JobClient: map 100% reduce 100%
12/06/18 18:04:55 INFO mapred.JobClient: Job complete: job_201206181608_0047
12/06/18 18:04:55 INFO mapred.JobClient: Counters: 26
12/06/18 18:04:55 INFO mapred.JobClient: Job Counters
12/06/18 18:04:55 INFO mapred.JobClient: Launched reduce tasks=1
12/06/18 18:04:55 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=5261
12/06/18 18:04:55 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/06/18 18:04:55 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/06/18 18:04:55 INFO mapred.JobClient: Launched map tasks=1
12/06/18 18:04:55 INFO mapred.JobClient: Data-local map tasks=1
12/06/18 18:04:55 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=9266
12/06/18 18:04:55 INFO mapred.JobClient: FileSystemCounters
12/06/18 18:04:55 INFO mapred.JobClient: FILE_BYTES_READ=22
12/06/18 18:04:55 INFO mapred.JobClient: HDFS_BYTES_READ=226
12/06/18 18:04:55 INFO mapred.JobClient: FILE_BYTES_WRITTEN=113312
12/06/18 18:04:55 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=97
12/06/18 18:04:55 INFO mapred.JobClient: Map-Reduce Framework
12/06/18 18:04:55 INFO mapred.JobClient: Map input records=0
12/06/18 18:04:55 INFO mapred.JobClient: Reduce shuffle bytes=14
12/06/18 18:04:55 INFO mapred.JobClient: Spilled Records=0
12/06/18 18:04:55 INFO mapred.JobClient: Map output bytes=0
12/06/18 18:04:55 INFO mapred.JobClient: CPU time spent (ms)=1970
12/06/18 18:04:55 INFO mapred.JobClient: Total committed heap usage (bytes)=210960384
12/06/18 18:04:55 INFO mapred.JobClient: Combine input records=0
12/06/18 18:04:55 INFO mapred.JobClient: SPLIT_RAW_BYTES=123
12/06/18 18:04:55 INFO mapred.JobClient: Reduce input records=0
12/06/18 18:04:55 INFO mapred.JobClient: Reduce input groups=0
12/06/18 18:04:55 INFO mapred.JobClient: Combine output records=0
12/06/18 18:04:55 INFO mapred.JobClient: Physical memory (bytes) snapshot=302817280
12/06/18 18:04:55 INFO mapred.JobClient: Reduce output records=0
12/06/18 18:04:55 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1217081344
12/06/18 18:04:55 INFO mapred.JobClient: Map output records=0
Exception in thread "main" java.lang.NumberFormatException: For input string: ""
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Integer.parseInt(Integer.java:470)
    at java.lang.Integer.parseInt(Integer.java:499)
    at org.apache.mahout.cf.taste.hadoop.TasteHadoopUtils.readIntFromFile(TasteHadoopUtils.java:93)
    at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:215)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.main(RecommenderJob.java:333)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
[cloudera@localhost mahout-distribution-0.5]$
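For reference, the message in the trace is what Integer.parseInt reports when it is handed an empty string, so it looks as if whatever readIntFromFile read was empty. Here is a minimal sketch of that behavior using only the JDK; the class and method names are my own for illustration and are not Mahout code.

```java
public class EmptyParseDemo {

    // Returns the parsed value, or null if the text is not a valid integer.
    static Integer parseOrNull(String text) {
        try {
            return Integer.parseInt(text);
        } catch (NumberFormatException e) {
            // For an empty string this is the same kind of failure as in the trace.
            System.out.println("parse failed: " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        // A well-formed value parses normally.
        System.out.println(parseOrNull("3"));
        // An empty string cannot be parsed and takes the catch branch.
        System.out.println(parseOrNull(""));
    }
}
```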
Since this is just the out-of-the-box code and dataset, I suspect I have
configured something incorrectly. Has anyone seen this before?
Out of curiosity, I also tried this on the current 0.7 release and got a
different error: the following ClassCastException on the similarity
class name.
[cloudera@localhost mahout-distribution-0.7]$ hadoop jar mahout-core-0.7-job.jar \
  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output \
  --usersFile input/users.txt --booleanData \
  --similarityClassname org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity
12/06/18 17:49:08 INFO mapred.JobClient: Task Id : attempt_201206181608_0042_m_000000_1, Status : FAILED
java.lang.ClassCastException: class org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity
    at java.lang.Class.asSubclass(Class.java:3018)
    at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:28)
    at org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$VectorNormMapper.setup(RowSimilarityJob.java:194)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
    at org.apache.hadoop.mapred.Child.main(Child.java:264)
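As far as I can tell from the trace, the failure comes from Class.asSubclass, which throws a ClassCastException when the given class does not extend or implement the expected type. Below is a minimal sketch of that behavior using JDK classes only; the class and method names are hypothetical, nothing in it is Mahout code, and I do not know which type RowSimilarityJob actually expects here.

```java
public class AsSubclassDemo {

    // Returns true if 'candidate' is the same as, or a subtype of, 'expected'.
    static boolean canCastTo(Class<?> candidate, Class<?> expected) {
        try {
            candidate.asSubclass(expected);
            return true;
        } catch (ClassCastException e) {
            // Thrown when 'candidate' is not a subtype of 'expected'.
            return false;
        }
    }

    public static void main(String[] args) {
        // Integer extends Number, so the cast is allowed.
        System.out.println(canCastTo(Integer.class, Number.class));
        // String is unrelated to Number, so asSubclass throws.
        System.out.println(canCastTo(String.class, Number.class));
    }
}
```

If the same thing is happening in my run, the class I passed to --similarityClassname would simply be of a different type than the one the 0.7 job tries to cast it to, but that is a guess on my part.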
Any help would be greatly appreciated.
-Jonathan