It's the same issue as https://issues.apache.org/jira/browse/MAHOUT-1061
After modifying SplitInputJob.java, it works well on CDH4.
Thank you all.
-Shengchao
-----Original Message-----
From: Ted Dunning
Sent: Thursday, January 10, 2013 8:15 PM
To: [email protected]
Subject: Re: Class not found in mahout split -xm mapreduce
Using the 0.23 profile might allow you to compile a version that works with
CDH4. Until Cloudera cares enough to test and commit a patch, however, we
can't be sure.
On Thu, Jan 10, 2013 at 5:52 PM, Marty Kube <
[email protected]> wrote:
Hi Shengchao,
My understanding is that CDH4 is not supported. Try CDH3.
Marty
On 01/10/2013 01:03 PM, Shengchao Ding wrote:
I'm running the 20 newsgroups example on a CDH4.1.2 virtual machine.
It ran smoothly, but it fails if I change the split command to
mahout split \
-i newsgroup/vectors \
--trainingOutput newsgroup/train-vectors \
--testOutput newsgroup/test-vectors \
--randomSelectionPct 40 --overwrite --sequenceFiles -xm mapreduce \
-mro newsgroup/mro
The only difference from the original command is that the method is changed
to mapreduce, whereas the original example uses sequential.
I got the following exception.
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.mahout.utils.SplitInputJob$SplitInputMapper not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1571)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:685)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:152)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:147)
Caused by: java.lang.ClassNotFoundException: Class org.apache.mahout.utils.SplitInputJob$SplitInputMapper not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1477)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1569)
    ... 8 more
I checked the mahout package on the distribution as follows.
[cloudera@localhost ~]$ jar tf /usr/lib/mahout/mahout-examples-0.7-cdh4.1.2-job.jar | grep SplitInput
org/apache/mahout/utils/SplitInputJob$SplitInputReducer.class
org/apache/mahout/utils/SplitInputJob$SplitInputMapper.class
org/apache/mahout/utils/SplitInputJob$SplitInputComparator.class
org/apache/mahout/utils/SplitInputJob.class
org/apache/mahout/utils/SplitInput.class
org/apache/mahout/utils/SplitInput$SplitCallback.class
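The listing above shows the mapper class is packaged in the local job jar, while the trace shows the lookup failing inside the task JVM (YarnChild). For anyone repeating the check, here is a minimal sketch that matches the exact class entry instead of a substring. The helper names class_to_entry and jar_has_class are made up for illustration and are not part of Mahout or the JDK; only `jar tf` and `grep` are real tools here.

```shell
#!/bin/sh
# Hypothetical helper: turn a fully qualified class name into the
# entry path it would have inside a jar (dots become slashes).
class_to_entry() {
    printf '%s.class\n' "$(printf '%s' "$1" | tr . /)"
}

# Hypothetical helper: succeed only if the jar lists exactly that entry.
# grep -F treats the pattern as a fixed string (so '$' is literal),
# -x requires a whole-line match, -q suppresses output.
jar_has_class() {
    jar tf "$1" | grep -Fxq -- "$(class_to_entry "$2")"
}

# Example call (jar path taken from the listing above; keep the single
# quotes so the shell does not expand '$SplitInputMapper'):
# jar_has_class /usr/lib/mahout/mahout-examples-0.7-cdh4.1.2-job.jar \
#     'org.apache.mahout.utils.SplitInputJob$SplitInputMapper' && echo present
```

A match here only says the class exists in the jar on the machine where the command runs. It says nothing about whether that jar reaches the task's classpath, which is where the exception is raised.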
Can anyone help me out? Thanks.