Re: Mahout in Action Ch.06 Hadoop RecommenderJob Example

Sebastian Schelter Mon, 18 Jun 2012 23:33:21 -0700

Please use a later version of Mahout! The 0.5 release has a major bug
in the recommendation code.


2012/6/19 Jonathan Hodges <[email protected]>:
> After rereading my email, I noticed that the input.txt and users.txt files
> had a size of 0.  I deleted them and then put them back in HDFS.
>
>
> [cloudera@localhost mahout-distribution-0.5]$ hadoop fs -ls input
> Found 2 items
> -rw-r--r--   1 cloudera supergroup 1058414409 2012-06-18 19:37 /user/cloudera/input/input.txt
> -rw-r--r--   1 cloudera supergroup          2 2012-06-18 19:00 /user/cloudera/input/users.txt
>
>
> Now that I have verified that the sizes are correct, I reran the following
> command and got an ArrayIndexOutOfBoundsException.
>
>
> [cloudera@localhost mahout-distribution-0.5]$ hadoop jar
> mahout-core-0.5-job.jar
> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
> -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output --usersFile
> input/users.txt --booleanData
> 12/06/18 19:39:58 INFO common.AbstractJob: Command line arguments:
> {--booleanData=false, --endPhase=2147483647, --maxCooccurrencesPerItem=100,
> --maxPrefsPerUser=10, --maxSimilaritiesPerItem=100, --minPrefsPerUser=1,
> --numRecommendations=10, --similarityClassname=SIMILARITY_COOCCURRENCE,
> --startPhase=0, --tempDir=temp, --usersFile=input/users.txt}
> 12/06/18 19:40:02 INFO input.FileInputFormat: Total input paths to process : 1
> 12/06/18 19:40:02 WARN snappy.LoadSnappy: Snappy native library is available
> 12/06/18 19:40:02 INFO util.NativeCodeLoader: Loaded the native-hadoop library
> 12/06/18 19:40:02 INFO snappy.LoadSnappy: Snappy native library loaded
> 12/06/18 19:40:06 INFO mapred.JobClient: Running job: job_201206181853_0001
> 12/06/18 19:40:07 INFO mapred.JobClient:  map 0% reduce 0%
> 12/06/18 19:40:17 INFO mapred.JobClient: Task Id : attempt_201206181853_0001_m_000000_0, Status : FAILED
> java.lang.ArrayIndexOutOfBoundsException: 1
>        at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:47)
>        at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
>        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>
>
> I will keep trying and posting my results.  Thanks for your help!
>
> -Jonathan
>
>
>
> On Mon, Jun 18, 2012 at 7:28 PM, Jonathan Hodges <[email protected]> wrote:
>
>> Hi,
>>
>> I am running into some difficulty running the Mahout in Action Ch.06
>> Hadoop RecommenderJob example and was hoping someone could help me out.  I am
>> using the CDH3 VMware image from Cloudera on my laptop.  I first downloaded
>> the links-simple-sorted.txt dataset and added it to HDFS as input/input.txt.
>> I also added the users.txt file with the number 3 on the first and only
>> line.
>>
>>
>> [cloudera@localhost ~]$ hadoop fs -ls input
>> Found 2 items
>> -rw-r--r--   1 cloudera supergroup          0 2012-06-18 12:26 /user/cloudera/input/input.txt
>> -rw-r--r--   1 cloudera supergroup          0 2012-06-18 12:28 /user/cloudera/input/users.txt
>>
>>
>> The following is my command line.
>>
>> [cloudera@localhost mahout-distribution-0.5]$ hadoop jar
>> mahout-core-0.5-job.jar
>> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
>> -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output --usersFile
>> input/users.txt --booleanData
>>
>>
>> It gets about two-thirds of the way through the full run and then throws the
>> following NumberFormatException.
>>
>>
>>
>> 12/06/18 18:04:39 INFO mapred.JobClient: Running job: job_201206181608_0047
>> 12/06/18 18:04:40 INFO mapred.JobClient:  map 0% reduce 0%
>> 12/06/18 18:04:45 INFO mapred.JobClient:  map 100% reduce 0%
>> 12/06/18 18:04:53 INFO mapred.JobClient:  map 100% reduce 33%
>> 12/06/18 18:04:55 INFO mapred.JobClient:  map 100% reduce 100%
>> 12/06/18 18:04:55 INFO mapred.JobClient: Job complete: job_201206181608_0047
>> 12/06/18 18:04:55 INFO mapred.JobClient: Counters: 26
>> 12/06/18 18:04:55 INFO mapred.JobClient:   Job Counters
>> 12/06/18 18:04:55 INFO mapred.JobClient:     Launched reduce tasks=1
>> 12/06/18 18:04:55 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=5261
>> 12/06/18 18:04:55 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
>> 12/06/18 18:04:55 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
>> 12/06/18 18:04:55 INFO mapred.JobClient:     Launched map tasks=1
>> 12/06/18 18:04:55 INFO mapred.JobClient:     Data-local map tasks=1
>> 12/06/18 18:04:55 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=9266
>> 12/06/18 18:04:55 INFO mapred.JobClient:   FileSystemCounters
>> 12/06/18 18:04:55 INFO mapred.JobClient:     FILE_BYTES_READ=22
>> 12/06/18 18:04:55 INFO mapred.JobClient:     HDFS_BYTES_READ=226
>> 12/06/18 18:04:55 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=113312
>> 12/06/18 18:04:55 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=97
>> 12/06/18 18:04:55 INFO mapred.JobClient:   Map-Reduce Framework
>> 12/06/18 18:04:55 INFO mapred.JobClient:     Map input records=0
>> 12/06/18 18:04:55 INFO mapred.JobClient:     Reduce shuffle bytes=14
>> 12/06/18 18:04:55 INFO mapred.JobClient:     Spilled Records=0
>> 12/06/18 18:04:55 INFO mapred.JobClient:     Map output bytes=0
>> 12/06/18 18:04:55 INFO mapred.JobClient:     CPU time spent (ms)=1970
>> 12/06/18 18:04:55 INFO mapred.JobClient:     Total committed heap usage (bytes)=210960384
>> 12/06/18 18:04:55 INFO mapred.JobClient:     Combine input records=0
>> 12/06/18 18:04:55 INFO mapred.JobClient:     SPLIT_RAW_BYTES=123
>> 12/06/18 18:04:55 INFO mapred.JobClient:     Reduce input records=0
>> 12/06/18 18:04:55 INFO mapred.JobClient:     Reduce input groups=0
>> 12/06/18 18:04:55 INFO mapred.JobClient:     Combine output records=0
>> 12/06/18 18:04:55 INFO mapred.JobClient:     Physical memory (bytes) snapshot=302817280
>> 12/06/18 18:04:55 INFO mapred.JobClient:     Reduce output records=0
>> 12/06/18 18:04:55 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1217081344
>> 12/06/18 18:04:55 INFO mapred.JobClient:     Map output records=0
>> Exception in thread "main" java.lang.NumberFormatException: For input string: ""
>>         at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>>         at java.lang.Integer.parseInt(Integer.java:470)
>>         at java.lang.Integer.parseInt(Integer.java:499)
>>         at org.apache.mahout.cf.taste.hadoop.TasteHadoopUtils.readIntFromFile(TasteHadoopUtils.java:93)
>>         at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:215)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>         at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.main(RecommenderJob.java:333)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
>> [cloudera@localhost mahout-distribution-0.5]$
>>
>>
>> Since this just uses the out-of-the-box code and dataset, I figure I
>> configured something incorrectly.  Has anyone seen this before?
>>
>>
>> Out of curiosity, I tried this on the current 0.7 release and got a
>> different error: the following ClassCastException on the similarity
>> classname.
>>
>>
>> [cloudera@localhost mahout-distribution-0.7]$ hadoop jar
>> mahout-core-0.7-job.jar
>> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
>> -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output --usersFile
>> input/users.txt --booleanData --similarityClassname
>> org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity
>>
>>
>>
>> 12/06/18 17:49:08 INFO mapred.JobClient: Task Id : attempt_201206181608_0042_m_000000_1, Status : FAILED
>> java.lang.ClassCastException: class org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity
>>         at java.lang.Class.asSubclass(Class.java:3018)
>>         at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:28)
>>         at org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$VectorNormMapper.setup(RowSimilarityJob.java:194)
>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
>>         at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:396)
>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:264)
>>
>>
>> Any help would be greatly appreciated.
>>
>> -Jonathan
>>
>>
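
A note on the ArrayIndexOutOfBoundsException reported in the follow-up: it is
raised in ItemIDIndexMapper.map while the job reads input/input.txt. One
possible cause, offered as an assumption and not confirmed anywhere in this
thread, is the input format. RecommenderJob is assumed here to read
preference lines of the form userID,itemID (comma- or tab-separated), while
links-simple-sorted.txt is assumed to hold a source ID, a colon, and a
space-separated list of destination IDs on each line. A line with no comma or
tab would split into a single token, so reading the second token would fail
at index 1. The Python sketch below converts such lines into userID,itemID
pairs. The function name links_to_prefs and the sample lines are made up for
illustration.

```python
from typing import Iterable, Iterator


def links_to_prefs(lines: Iterable[str]) -> Iterator[str]:
    """Yield 'userID,itemID' lines from 'sourceID: dest1 dest2 ...' lines.

    Assumes the colon-separated layout described above; lines that are
    blank or contain no colon are skipped instead of raising an error.
    """
    for line in lines:
        line = line.strip()
        if not line or ":" not in line:
            continue
        source, _, targets = line.partition(":")
        source = source.strip()
        # One output line per (source, destination) pair.
        for target in targets.split():
            yield f"{source},{target}"


# Illustrative sample lines only; these are not taken from the real dataset.
sample = ["1: 20", "2: 3 40 50"]
for pref in links_to_prefs(sample):
    print(pref)
```

If the assumption about the format holds, the converted file would be
uploaded to HDFS in place of input/input.txt before rerunning the job. This
is independent of the advice to move to a later Mahout release.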
