Please use a later version of Mahout! The 0.5 release has a major bug in the recommendation code.
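Separately from the version issue, it may be worth checking the input format. The ArrayIndexOutOfBoundsException in ItemIDIndexMapper quoted below would be consistent with input lines that do not split into a user ID and an item ID. This rests on two assumptions: that links-simple-sorted.txt stores each record as "fromID: toID toID ...", and that RecommenderJob reads one "userID,itemID" pair per line. If both hold, a conversion along these lines might help. It is only a sketch, and links_to_prefs is a name made up for illustration, not part of Mahout.

```python
def links_to_prefs(lines):
    """Yield 'userID,itemID' strings from 'userID: item1 item2 ...' lines.

    Assumed input layout (not confirmed in this thread): an ID, a colon,
    then a space-separated list of linked IDs.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, sep, items = line.partition(":")
        if not sep:
            continue  # no colon found, so the line is not in the assumed layout
        for item in items.split():
            # One boolean preference (no rating value) per output line.
            yield f"{user.strip()},{item}"


# Small made-up sample, only to show the shape of the output.
sample = ["1: 2 3 4", "5: 6"]
for pref in links_to_prefs(sample):
    print(pref)
```

Because the function is a generator, it can be fed an open file object and will process the roughly 1 GB input line by line instead of loading it all into memory.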
2012/6/19 Jonathan Hodges <[email protected]>:
> After rereading my email I noticed that the input.txt and users.txt files
> had size of 0. I deleted and then put them in HDFS.
>
> [cloudera@localhost mahout-distribution-0.5]$ hadoop fs -ls input
> Found 2 items
> -rw-r--r-- 1 cloudera supergroup 1058414409 2012-06-18 19:37
> /user/cloudera/input/input.txt
> -rw-r--r-- 1 cloudera supergroup 2 2012-06-18 19:00
> /user/cloudera/input/users.txt
>
> Now that I have verified the correct sizes I rerun the following command
> and get an ArrayIndexOutOfBoundsException.
>
> [cloudera@localhost mahout-distribution-0.5]$ hadoop jar
> mahout-core-0.5-job.jar
> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
> -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output --usersFile
> input/users.txt --booleanData
> 12/06/18 19:39:58 INFO common.AbstractJob: Command line arguments:
> {--booleanData=false, --endPhase=2147483647, --maxCooccurrencesPerItem=100,
> --maxPrefsPerUser=10, --maxSimilaritiesPerItem=100, --minPrefsPerUser=1,
> --numRecommendations=10, --similarityClassname=SIMILARITY_COOCCURRENCE,
> --startPhase=0, --tempDir=temp, --usersFile=input/users.txt}
> 12/06/18 19:40:02 INFO input.FileInputFormat: Total input paths to process
> : 1
> 12/06/18 19:40:02 WARN snappy.LoadSnappy: Snappy native library is available
> 12/06/18 19:40:02 INFO util.NativeCodeLoader: Loaded the native-hadoop
> library
> 12/06/18 19:40:02 INFO snappy.LoadSnappy: Snappy native library loaded
> 12/06/18 19:40:06 INFO mapred.JobClient: Running job: job_201206181853_0001
> 12/06/18 19:40:07 INFO mapred.JobClient: map 0% reduce 0%
> 12/06/18 19:40:17 INFO mapred.JobClient: Task Id :
> attempt_201206181853_0001_m_000000_0, Status : FAILED
> java.lang.ArrayIndexOutOfBoundsException: 1
> at
> org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:47)
> at
> org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>
> I will keep trying and posting my results. Thanks for your help!
>
> -Jonathan
>
> On Mon, Jun 18, 2012 at 7:28 PM, Jonathan Hodges <[email protected]> wrote:
>
>> Hi,
>>
>> I am running into some difficulty running the Mahout in Action Ch.06
>> Hadoop RecommenderJob example and was hoping someone can help me out. I am
>> using the CDH3 VMWare image from Cloudera on my laptop. I first downloaded
>> the links-simple-sorted.txt dataset and added to HDFS as input/input.txt.
>> I also added the users.txt file with the number 3 on the first and only
>> line.
>>
>> [cloudera@localhost ~]$ hadoop fs -ls input
>> Found 2 items
>> -rw-r--r-- 1 cloudera supergroup 0 2012-06-18 12:26
>> /user/cloudera/input/input.txt
>> -rw-r--r-- 1 cloudera supergroup 0 2012-06-18 12:28
>> /user/cloudera/input/users.txt
>>
>> The following is my commandline.
>>
>> [cloudera@localhost mahout-distribution-0.5]$ hadoop jar
>> mahout-core-0.5-job.jar
>> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
>> -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output --usersFile
>> input/users.txt --booleanData
>>
>> It gets about 2/3 done with full run and throws the following
>> NumberFormatException.
>>
>> 12/06/18 18:04:39 INFO mapred.JobClient: Running job: job_201206181608_0047
>> 12/06/18 18:04:40 INFO mapred.JobClient: map 0% reduce 0%
>> 12/06/18 18:04:45 INFO mapred.JobClient: map 100% reduce 0%
>> 12/06/18 18:04:53 INFO mapred.JobClient: map 100% reduce 33%
>> 12/06/18 18:04:55 INFO mapred.JobClient: map 100% reduce 100%
>> 12/06/18 18:04:55 INFO mapred.JobClient: Job complete:
>> job_201206181608_0047
>> 12/06/18 18:04:55 INFO mapred.JobClient: Counters: 26
>> 12/06/18 18:04:55 INFO mapred.JobClient: Job Counters
>> 12/06/18 18:04:55 INFO mapred.JobClient: Launched reduce tasks=1
>> 12/06/18 18:04:55 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=5261
>> 12/06/18 18:04:55 INFO mapred.JobClient: Total time spent by all
>> reduces waiting after reserving slots (ms)=0
>> 12/06/18 18:04:55 INFO mapred.JobClient: Total time spent by all maps
>> waiting after reserving slots (ms)=0
>> 12/06/18 18:04:55 INFO mapred.JobClient: Launched map tasks=1
>> 12/06/18 18:04:55 INFO mapred.JobClient: Data-local map tasks=1
>> 12/06/18 18:04:55 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=9266
>> 12/06/18 18:04:55 INFO mapred.JobClient: FileSystemCounters
>> 12/06/18 18:04:55 INFO mapred.JobClient: FILE_BYTES_READ=22
>> 12/06/18 18:04:55 INFO mapred.JobClient: HDFS_BYTES_READ=226
>> 12/06/18 18:04:55 INFO mapred.JobClient: FILE_BYTES_WRITTEN=113312
>> 12/06/18 18:04:55 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=97
>> 12/06/18 18:04:55 INFO mapred.JobClient: Map-Reduce Framework
>> 12/06/18 18:04:55 INFO mapred.JobClient: Map input records=0
>> 12/06/18 18:04:55 INFO mapred.JobClient: Reduce shuffle bytes=14
>> 12/06/18 18:04:55 INFO mapred.JobClient: Spilled Records=0
>> 12/06/18 18:04:55 INFO mapred.JobClient: Map output bytes=0
>> 12/06/18 18:04:55 INFO mapred.JobClient: CPU time spent (ms)=1970
>> 12/06/18 18:04:55 INFO mapred.JobClient: Total committed heap usage
>> (bytes)=210960384
>> 12/06/18 18:04:55 INFO mapred.JobClient: Combine input records=0
>> 12/06/18 18:04:55 INFO mapred.JobClient: SPLIT_RAW_BYTES=123
>> 12/06/18 18:04:55 INFO mapred.JobClient: Reduce input records=0
>> 12/06/18 18:04:55 INFO mapred.JobClient: Reduce input groups=0
>> 12/06/18 18:04:55 INFO mapred.JobClient: Combine output records=0
>> 12/06/18 18:04:55 INFO mapred.JobClient: Physical memory (bytes)
>> snapshot=302817280
>> 12/06/18 18:04:55 INFO mapred.JobClient: Reduce output records=0
>> 12/06/18 18:04:55 INFO mapred.JobClient: Virtual memory (bytes)
>> snapshot=1217081344
>> 12/06/18 18:04:55 INFO mapred.JobClient: Map output records=0
>> Exception in thread "main" java.lang.NumberFormatException: For input
>> string: ""
>> at java.lang.NumberFormatException.forInputString(
>> NumberFormatException.java:48)
>> at java.lang.Integer.parseInt(Integer.java:470)
>> at java.lang.Integer.parseInt(Integer.java:499)
>> at org.apache.mahout.cf.taste.hadoop.TasteHadoopUtils.
>> readIntFromFile(TasteHadoopUtils.java:93)
>> at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.
>> run(RecommenderJob.java:215)
>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.
>> main(RecommenderJob.java:333)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(
>> NativeMethodAccessorImpl.java:39)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
>> DelegatingMethodAccessorImpl.java:25)
>> at java.lang.reflect.Method.invoke(Method.java:597)
>> at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
>> [cloudera@localhost mahout-distribution-0.5]$
>>
>> Since these are just using the out of the box code and dataset I figured I
>> configured something incorrectly. Has anyone seen this before?
>>
>> Out of curiosity I tried this on the current 0.7 release and got a
>> different error. I got the following ClassCastException with the
>> similarity classname.
>>
>> [cloudera@localhost mahout-distribution-0.7]$ hadoop jar
>> mahout-core-0.7-job.jar
>> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
>> -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output --usersFile
>> input/users.txt --booleanData --similarityClassname
>> org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity
>>
>> 12/06/18 17:49:08 INFO mapred.JobClient: Task Id :
>> attempt_201206181608_0042_m_000000_1, Status : FAILED
>> java.lang.ClassCastException: class
>> org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity
>> at java.lang.Class.asSubclass(Class.java:3018)
>> at
>> org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:28)
>> at
>> org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$VectorNormMapper.setup(RowSimilarityJob.java:194)
>> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
>> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:396)
>> at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
>> at org.apache.hadoop.mapred.Child.main(Child.java:264)
>>
>> Any help would be greatly appreciated.
>>
>> -Jonathan
