AFAIK, SequenceFileTokenizerMapper is not called from KMeansDriver.
That mapper tokenizes sequence files, so the error is probably being
thrown during that step.
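One way to confirm where the type mismatch comes from is to inspect the key/value classes of the sequence file you are feeding to KMeansDriver. The sketch below uses Mahout's seqdumper utility, which prints the key and value classes in its header; the input path is a placeholder for your own vector directory, and the option name may differ between Mahout versions (older releases used -s/--seqFile instead of -i/--input):

```shell
# Print the header of the sequence file to see its key/value classes.
# "/path/to/lucene-vectors/part-00000" is a placeholder for your actual file.
# SequenceFileTokenizerMapper expects Text keys; if the dump shows
# org.apache.hadoop.io.LongWritable as the key class (or if the file is a
# plain text file read via TextInputFormat), you will get exactly this
# ClassCastException.
bin/mahout seqdumper -i /path/to/lucene-vectors/part-00000 | head -n 3
```

If the key class is LongWritable rather than Text, the file being picked up by that mapper is not the output of lucene.vector, which would explain why the job logs failures in SequenceFileTokenizerMapper even though the k-means step itself succeeds.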
On 17-09-2012 12:39, jung hoon sohn wrote:
Thank you for the reply.
However, the error was thrown during the map phase
(org.apache.hadoop.mapreduce.Mapper.run).
Isn't the mapping function part of the KMeansDriver class?
Thank You.
Jung Hoon
On Sat, Sep 15, 2012 at 5:48 PM, Paritosh Ranjan <[email protected]> wrote:
I don't think this is a KMeansDriver error.
SequenceFileTokenizerMapper is not used by KMeansDriver. I think you are
getting the error while transforming the data.
On 15-09-2012 12:59, jung hoon sohn wrote:
Hello, I am trying to cluster input data using KMeansDriver.
The input vectors were produced from a Lucene index with the
"bin/mahout lucene.vector ..." command, and when I run
KMeansDriver via its run method, I get:
12/09/15 15:18:13 INFO mapred.JobClient: Task Id :
attempt_201209121951_0067_m_000000_1, Status : FAILED
java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot
be cast to org.apache.hadoop.io.Text
        at org.apache.mahout.vectorizer.document.SequenceFileTokenizerMapper.map(SequenceFileTokenizerMapper.java:37)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
for several attempts, but the job keeps running and still generates the
output data.
I can even run clusterdump on the output cluster data, but I am
concerned about the effect of the errors above.
Please help me get past this problem.
Thanks.
Jung Hoon