Re: Mahout in Action Ch.06 Hadoop RecommenderJob Example

Jonathan Hodges Mon, 18 Jun 2012 19:53:37 -0700

After rereading my email, I noticed that the input.txt and users.txt files
both had a size of 0 bytes.  I deleted them and put them into HDFS again.



[cloudera@localhost mahout-distribution-0.5]$ hadoop fs -ls input
Found 2 items
-rw-r--r--   1 cloudera supergroup 1058414409 2012-06-18 19:37 /user/cloudera/input/input.txt
-rw-r--r--   1 cloudera supergroup          2 2012-06-18 19:00 /user/cloudera/input/users.txt
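
The same size check can be scripted.  The following is a minimal Python
sketch, not part of Hadoop or Mahout.  It assumes the listing has been
captured as text exactly as the hadoop command prints it, with one entry per
line and eight whitespace-separated fields (permissions, replication, owner,
group, size, date, time, path).  The function name zero_byte_paths is made up
for illustration.

```python
def zero_byte_paths(listing):
    """Return the paths of zero-byte files in captured `hadoop fs -ls` output.

    Assumption: each entry is one line with eight whitespace-separated
    fields: permissions, replication, owner, group, size, date, time, path.
    """
    empty = []
    for line in listing.splitlines():
        fields = line.strip().split(None, 7)
        # Skip the "Found N items" header and anything that is not an entry.
        if len(fields) != 8 or not fields[4].isdigit():
            continue
        # Directories are listed with size 0 as well, so ignore them.
        if fields[0].startswith("d"):
            continue
        if int(fields[4]) == 0:
            empty.append(fields[7])
    return empty


listing = """Found 2 items
-rw-r--r--   1 cloudera supergroup 1058414409 2012-06-18 19:37 /user/cloudera/input/input.txt
-rw-r--r--   1 cloudera supergroup          2 2012-06-18 19:00 /user/cloudera/input/users.txt
"""
print(zero_byte_paths(listing))  # prints [] because neither file is empty
```

Run against the listing from the earlier message, where both files showed a
size of 0, the function would return both paths.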


Having verified that the sizes are now correct, I reran the following command
and got an ArrayIndexOutOfBoundsException:


[cloudera@localhost mahout-distribution-0.5]$ hadoop jar mahout-core-0.5-job.jar \
    org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
    -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output \
    --usersFile input/users.txt --booleanData
12/06/18 19:39:58 INFO common.AbstractJob: Command line arguments:
{--booleanData=false, --endPhase=2147483647, --maxCooccurrencesPerItem=100,
--maxPrefsPerUser=10, --maxSimilaritiesPerItem=100, --minPrefsPerUser=1,
--numRecommendations=10, --similarityClassname=SIMILARITY_COOCCURRENCE,
--startPhase=0, --tempDir=temp, --usersFile=input/users.txt}
12/06/18 19:40:02 INFO input.FileInputFormat: Total input paths to process : 1
12/06/18 19:40:02 WARN snappy.LoadSnappy: Snappy native library is available
12/06/18 19:40:02 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/06/18 19:40:02 INFO snappy.LoadSnappy: Snappy native library loaded
12/06/18 19:40:06 INFO mapred.JobClient: Running job: job_201206181853_0001
12/06/18 19:40:07 INFO mapred.JobClient:  map 0% reduce 0%
12/06/18 19:40:17 INFO mapred.JobClient: Task Id : attempt_201206181853_0001_m_000000_0, Status : FAILED
java.lang.ArrayIndexOutOfBoundsException: 1
        at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:47)
        at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
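
One possible cause, offered as an assumption rather than a confirmed
diagnosis: the trace points at ItemIDIndexMapper.map, and an out-of-bounds
index of 1 is what would result if the mapper splits each input line on a
comma or tab and reads the second field as the item ID.  Lines in
links-simple-sorted.txt are assumed here to have the form
"source: target1 target2 ...", which contains neither delimiter.  The Python
sketch below imitates that assumed tokenization and shows a conversion to one
"userID,itemID" pair per line.  The delimiter pattern and the function names
split_pref_tokens and to_pairs are illustrative guesses, not Mahout's API.

```python
import re

# Assumed delimiter: a single comma or tab between fields.
PREF_DELIMITER = re.compile(r"[\t,]")


def split_pref_tokens(line):
    """Split a preference line the way the mapper is assumed to split it."""
    return PREF_DELIMITER.split(line)


def to_pairs(line):
    """Convert 'user: item1 item2 ...' into ['user,item1', 'user,item2', ...]."""
    user, _, items = line.partition(":")
    return ["%s,%s" % (user.strip(), item) for item in items.split()]


raw = "1: 2 3 4"  # made-up line in the assumed 'source: targets' layout

tokens = split_pref_tokens(raw)
print(len(tokens))  # a single token, so reading tokens[1] would fail

for pair in to_pairs(raw):
    # After conversion every line splits into exactly two fields.
    user_id, item_id = split_pref_tokens(pair)
    print(user_id, item_id)
```

If the assumption holds, converting the input before running the job, or
using a mapper that understands the original layout, would avoid the
exception.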


I will keep trying and post my results.  Thanks for your help!

-Jonathan



On Mon, Jun 18, 2012 at 7:28 PM, Jonathan Hodges wrote:

> Hi,
>
> I am having some difficulty running the Mahout in Action Ch.06 Hadoop
> RecommenderJob example and was hoping someone could help me out.  I am
> using the CDH3 VMware image from Cloudera on my laptop.  I first downloaded
> the links-simple-sorted.txt dataset and added it to HDFS as input/input.txt.
> I also added the users.txt file with the number 3 on the first and only
> line.
>
>
> [cloudera@localhost ~]$ hadoop fs -ls input
> Found 2 items
> -rw-r--r--   1 cloudera supergroup          0 2012-06-18 12:26 /user/cloudera/input/input.txt
> -rw-r--r--   1 cloudera supergroup          0 2012-06-18 12:28 /user/cloudera/input/users.txt
>
>
> The following is my command line:
>
> [cloudera@localhost mahout-distribution-0.5]$ hadoop jar mahout-core-0.5-job.jar \
>     org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
>     -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output \
>     --usersFile input/users.txt --booleanData
>
>
> The full run gets about two-thirds of the way through and then throws the
> following NumberFormatException:
>
>
>
> 12/06/18 18:04:39 INFO mapred.JobClient: Running job: job_201206181608_0047
> 12/06/18 18:04:40 INFO mapred.JobClient:  map 0% reduce 0%
> 12/06/18 18:04:45 INFO mapred.JobClient:  map 100% reduce 0%
> 12/06/18 18:04:53 INFO mapred.JobClient:  map 100% reduce 33%
> 12/06/18 18:04:55 INFO mapred.JobClient:  map 100% reduce 100%
> 12/06/18 18:04:55 INFO mapred.JobClient: Job complete:
> job_201206181608_0047
> 12/06/18 18:04:55 INFO mapred.JobClient: Counters: 26
> 12/06/18 18:04:55 INFO mapred.JobClient:   Job Counters
> 12/06/18 18:04:55 INFO mapred.JobClient:     Launched reduce tasks=1
> 12/06/18 18:04:55 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=5261
> 12/06/18 18:04:55 INFO mapred.JobClient:     Total time spent by all
> reduces waiting after reserving slots (ms)=0
> 12/06/18 18:04:55 INFO mapred.JobClient:     Total time spent by all maps
> waiting after reserving slots (ms)=0
> 12/06/18 18:04:55 INFO mapred.JobClient:     Launched map tasks=1
> 12/06/18 18:04:55 INFO mapred.JobClient:     Data-local map tasks=1
> 12/06/18 18:04:55 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=9266
> 12/06/18 18:04:55 INFO mapred.JobClient:   FileSystemCounters
> 12/06/18 18:04:55 INFO mapred.JobClient:     FILE_BYTES_READ=22
> 12/06/18 18:04:55 INFO mapred.JobClient:     HDFS_BYTES_READ=226
> 12/06/18 18:04:55 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=113312
> 12/06/18 18:04:55 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=97
> 12/06/18 18:04:55 INFO mapred.JobClient:   Map-Reduce Framework
> 12/06/18 18:04:55 INFO mapred.JobClient:     Map input records=0
> 12/06/18 18:04:55 INFO mapred.JobClient:     Reduce shuffle bytes=14
> 12/06/18 18:04:55 INFO mapred.JobClient:     Spilled Records=0
> 12/06/18 18:04:55 INFO mapred.JobClient:     Map output bytes=0
> 12/06/18 18:04:55 INFO mapred.JobClient:     CPU time spent (ms)=1970
> 12/06/18 18:04:55 INFO mapred.JobClient:     Total committed heap usage
> (bytes)=210960384
> 12/06/18 18:04:55 INFO mapred.JobClient:     Combine input records=0
> 12/06/18 18:04:55 INFO mapred.JobClient:     SPLIT_RAW_BYTES=123
> 12/06/18 18:04:55 INFO mapred.JobClient:     Reduce input records=0
> 12/06/18 18:04:55 INFO mapred.JobClient:     Reduce input groups=0
> 12/06/18 18:04:55 INFO mapred.JobClient:     Combine output records=0
> 12/06/18 18:04:55 INFO mapred.JobClient:     Physical memory (bytes)
> snapshot=302817280
> 12/06/18 18:04:55 INFO mapred.JobClient:     Reduce output records=0
> 12/06/18 18:04:55 INFO mapred.JobClient:     Virtual memory (bytes)
> snapshot=1217081344
> 12/06/18 18:04:55 INFO mapred.JobClient:     Map output records=0
> Exception in thread "main" java.lang.NumberFormatException: For input string: ""
>         at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>         at java.lang.Integer.parseInt(Integer.java:470)
>         at java.lang.Integer.parseInt(Integer.java:499)
>         at org.apache.mahout.cf.taste.hadoop.TasteHadoopUtils.readIntFromFile(TasteHadoopUtils.java:93)
>         at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:215)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.main(RecommenderJob.java:333)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
> [cloudera@localhost mahout-distribution-0.5]$
>
>
> Since this uses just the out-of-the-box code and dataset, I figure I have
> configured something incorrectly.  Has anyone seen this before?
>
>
> Out of curiosity, I tried this on the current 0.7 release and got a
> different error: the following ClassCastException, which names the class
> I passed as --similarityClassname.
>
>
> [cloudera@localhost mahout-distribution-0.7]$ hadoop jar mahout-core-0.7-job.jar \
>     org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
>     -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output \
>     --usersFile input/users.txt --booleanData \
>     --similarityClassname org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity
>
>
>
> 12/06/18 17:49:08 INFO mapred.JobClient: Task Id : attempt_201206181608_0042_m_000000_1, Status : FAILED
> java.lang.ClassCastException: class org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity
>         at java.lang.Class.asSubclass(Class.java:3018)
>         at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:28)
>         at org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$VectorNormMapper.setup(RowSimilarityJob.java:194)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
>         at org.apache.hadoop.mapred.Child.main(Child.java:264)
>
>
> Any help would be greatly appreciated.
>
> -Jonathan
>
>
