After rereading my email I noticed that the input.txt and users.txt files
both had a size of 0 bytes. I deleted them and uploaded them to HDFS again.
[cloudera@localhost mahout-distribution-0.5]$ hadoop fs -ls input
Found 2 items
-rw-r--r-- 1 cloudera supergroup 1058414409 2012-06-18 19:37 /user/cloudera/input/input.txt
-rw-r--r-- 1 cloudera supergroup 2 2012-06-18 19:00 /user/cloudera/input/users.txt
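
In case it is useful to anyone else, here is a small Python sketch of the check I should have done the first time: it reads a listing like the one above and reports any zero-byte files. It assumes each entry is on a single line in the order permissions, replication, owner, group, size, date, time, path, and that paths contain no spaces. The function names parse_ls_sizes and empty_files are my own, not part of Hadoop.

```python
def parse_ls_sizes(listing):
    """Return {path: size_in_bytes} parsed from `hadoop fs -ls` style text.

    Assumption: one entry per line, with the fields
    permissions, replication, owner, group, size, date, time, path.
    """
    sizes = {}
    for line in listing.splitlines():
        fields = line.split()
        # Skip the "Found N items" header and any line that is not a full entry.
        if len(fields) < 8 or not fields[4].isdigit():
            continue
        sizes[fields[-1]] = int(fields[4])
    return sizes


def empty_files(listing):
    """Return the sorted paths whose reported size is 0 bytes."""
    return sorted(p for p, size in parse_ls_sizes(listing).items() if size == 0)


if __name__ == "__main__":
    # The listing from my first message, where both uploads were empty.
    first_attempt = (
        "Found 2 items\n"
        "-rw-r--r-- 1 cloudera supergroup 0 2012-06-18 12:26 /user/cloudera/input/input.txt\n"
        "-rw-r--r-- 1 cloudera supergroup 0 2012-06-18 12:28 /user/cloudera/input/users.txt\n"
    )
    for path in empty_files(first_attempt):
        print("empty file:", path)
```

Feeding it the saved output of hadoop fs -ls before launching the job would have flagged both files in my first attempt.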
Having verified that the sizes are now correct, I reran the following command
and got an ArrayIndexOutOfBoundsException.
[cloudera@localhost mahout-distribution-0.5]$ hadoop jar mahout-core-0.5-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output --usersFile input/users.txt --booleanData
12/06/18 19:39:58 INFO common.AbstractJob: Command line arguments:
{--booleanData=false, --endPhase=2147483647, --maxCooccurrencesPerItem=100,
--maxPrefsPerUser=10, --maxSimilaritiesPerItem=100, --minPrefsPerUser=1,
--numRecommendations=10, --similarityClassname=SIMILARITY_COOCCURRENCE,
--startPhase=0, --tempDir=temp, --usersFile=input/users.txt}
12/06/18 19:40:02 INFO input.FileInputFormat: Total input paths to process : 1
12/06/18 19:40:02 WARN snappy.LoadSnappy: Snappy native library is available
12/06/18 19:40:02 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/06/18 19:40:02 INFO snappy.LoadSnappy: Snappy native library loaded
12/06/18 19:40:06 INFO mapred.JobClient: Running job: job_201206181853_0001
12/06/18 19:40:07 INFO mapred.JobClient: map 0% reduce 0%
12/06/18 19:40:17 INFO mapred.JobClient: Task Id : attempt_201206181853_0001_m_000000_0, Status : FAILED
java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:47)
at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
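
One possible reading of this trace, which I have not confirmed against the Mahout source: the "1" in the message is the array index that was out of range, so the mapper may have split an input line and got back only one field. The Python sketch below is purely illustrative. It assumes the job wants one userID,itemID pair per line, separated by a comma or a tab, and that lines in links-simple-sorted.txt look like "userID: itemID itemID ...". Both are assumptions on my part, and split_pref_line and expand_adjacency_line are made-up names, not Mahout APIs.

```python
import re

# Assumed delimiter between userID and itemID: a single comma or tab.
PREF_DELIMITER = re.compile(r"[\t,]")


def split_pref_line(line):
    """Split one input line into fields on the assumed delimiter."""
    return PREF_DELIMITER.split(line.strip())


def expand_adjacency_line(line):
    """Turn "user: item item ..." into a list of "user,item" strings."""
    user, sep, rest = line.partition(":")
    if not sep:
        raise ValueError("expected 'user: item item ...', got %r" % line)
    return ["%s,%s" % (user.strip(), item) for item in rest.split()]


if __name__ == "__main__":
    raw = "1: 2 3 4"  # hypothetical line in the assumed dataset format
    tokens = split_pref_line(raw)
    print("fields found:", len(tokens))
    try:
        tokens[1]  # Python's analogue of reading array index 1 in Java
    except IndexError as err:
        print("index 1 is out of range:", err)
    # After conversion, every line has a second field to read.
    for pair in expand_adjacency_line(raw):
        print(pair, "->", split_pref_line(pair))
```

If that reading is right, the raw file would need to be converted to one pair per line (or fed through a mapper that understands its format) before RecommenderJob can read it.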
I will keep trying and will post my results. Thanks for your help!
-Jonathan
On Mon, Jun 18, 2012 at 7:28 PM, Jonathan Hodges wrote:
> Hi,
>
> I am running into some difficulty running the Mahout in Action Chapter 6
> Hadoop RecommenderJob example and was hoping someone could help me out. I am
> using the CDH3 VMware image from Cloudera on my laptop. I first downloaded
> the links-simple-sorted.txt dataset and added it to HDFS as input/input.txt.
> I also added the users.txt file with the number 3 on its first and only
> line.
>
>
> [cloudera@localhost ~]$ hadoop fs -ls input
> Found 2 items
> -rw-r--r-- 1 cloudera supergroup 0 2012-06-18 12:26 /user/cloudera/input/input.txt
> -rw-r--r-- 1 cloudera supergroup 0 2012-06-18 12:28 /user/cloudera/input/users.txt
>
>
> The following is my command line.
>
> [cloudera@localhost mahout-distribution-0.5]$ hadoop jar mahout-core-0.5-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output --usersFile input/users.txt --booleanData
>
>
> It gets about 2/3 of the way through the full run and then throws the
> following NumberFormatException.
>
>
>
> 12/06/18 18:04:39 INFO mapred.JobClient: Running job: job_201206181608_0047
> 12/06/18 18:04:40 INFO mapred.JobClient: map 0% reduce 0%
> 12/06/18 18:04:45 INFO mapred.JobClient: map 100% reduce 0%
> 12/06/18 18:04:53 INFO mapred.JobClient: map 100% reduce 33%
> 12/06/18 18:04:55 INFO mapred.JobClient: map 100% reduce 100%
> 12/06/18 18:04:55 INFO mapred.JobClient: Job complete: job_201206181608_0047
> 12/06/18 18:04:55 INFO mapred.JobClient: Counters: 26
> 12/06/18 18:04:55 INFO mapred.JobClient: Job Counters
> 12/06/18 18:04:55 INFO mapred.JobClient: Launched reduce tasks=1
> 12/06/18 18:04:55 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=5261
> 12/06/18 18:04:55 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
> 12/06/18 18:04:55 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
> 12/06/18 18:04:55 INFO mapred.JobClient: Launched map tasks=1
> 12/06/18 18:04:55 INFO mapred.JobClient: Data-local map tasks=1
> 12/06/18 18:04:55 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=9266
> 12/06/18 18:04:55 INFO mapred.JobClient: FileSystemCounters
> 12/06/18 18:04:55 INFO mapred.JobClient: FILE_BYTES_READ=22
> 12/06/18 18:04:55 INFO mapred.JobClient: HDFS_BYTES_READ=226
> 12/06/18 18:04:55 INFO mapred.JobClient: FILE_BYTES_WRITTEN=113312
> 12/06/18 18:04:55 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=97
> 12/06/18 18:04:55 INFO mapred.JobClient: Map-Reduce Framework
> 12/06/18 18:04:55 INFO mapred.JobClient: Map input records=0
> 12/06/18 18:04:55 INFO mapred.JobClient: Reduce shuffle bytes=14
> 12/06/18 18:04:55 INFO mapred.JobClient: Spilled Records=0
> 12/06/18 18:04:55 INFO mapred.JobClient: Map output bytes=0
> 12/06/18 18:04:55 INFO mapred.JobClient: CPU time spent (ms)=1970
> 12/06/18 18:04:55 INFO mapred.JobClient: Total committed heap usage (bytes)=210960384
> 12/06/18 18:04:55 INFO mapred.JobClient: Combine input records=0
> 12/06/18 18:04:55 INFO mapred.JobClient: SPLIT_RAW_BYTES=123
> 12/06/18 18:04:55 INFO mapred.JobClient: Reduce input records=0
> 12/06/18 18:04:55 INFO mapred.JobClient: Reduce input groups=0
> 12/06/18 18:04:55 INFO mapred.JobClient: Combine output records=0
> 12/06/18 18:04:55 INFO mapred.JobClient: Physical memory (bytes) snapshot=302817280
> 12/06/18 18:04:55 INFO mapred.JobClient: Reduce output records=0
> 12/06/18 18:04:55 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1217081344
> 12/06/18 18:04:55 INFO mapred.JobClient: Map output records=0
> Exception in thread "main" java.lang.NumberFormatException: For input string: ""
> at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
> at java.lang.Integer.parseInt(Integer.java:470)
> at java.lang.Integer.parseInt(Integer.java:499)
> at org.apache.mahout.cf.taste.hadoop.TasteHadoopUtils.readIntFromFile(TasteHadoopUtils.java:93)
> at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:215)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.main(RecommenderJob.java:333)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
> [cloudera@localhost mahout-distribution-0.5]$
>
>
> Since I am just using the out-of-the-box code and dataset, I figure I have
> configured something incorrectly. Has anyone seen this before?
>
>
> Out of curiosity I tried this on the current 0.7 release and got a
> different error: the following ClassCastException involving the
> similarity classname.
>
>
> [cloudera@localhost mahout-distribution-0.7]$ hadoop jar mahout-core-0.7-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -Dmapred.input.dir=input/input.txt -Dmapred.output.dir=output --usersFile input/users.txt --booleanData --similarityClassname org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity
>
>
>
> 12/06/18 17:49:08 INFO mapred.JobClient: Task Id : attempt_201206181608_0042_m_000000_1, Status : FAILED
> java.lang.ClassCastException: class org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity
> at java.lang.Class.asSubclass(Class.java:3018)
> at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:28)
> at org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$VectorNormMapper.setup(RowSimilarityJob.java:194)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
>
>
> Any help would be greatly appreciated.
>
> -Jonathan
>
>