Hello all,
I am trying to run seq2sparse as follows:
bin/mahout seq2sparse \
-i clustering/items-seq \
-o clustering/items-vectors \
-wt tfidf \
-nr 3 \
-ng 3 \
-s 5 \
-md 3 \
-x 90 \
-ml 50 \
-ow
The first task is failing with the following error:
Running on hadoop, using HADOOP_HOME=/usr/lib/hadoop
HADOOP_CONF_DIR=/etc/hadoop/conf
11/06/08 17:39:13 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 3
11/06/08 17:39:13 INFO common.HadoopUtil: Deleting clustering/items-vectors
11/06/08 17:39:13 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 50.0
11/06/08 17:39:13 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 3
11/06/08 17:39:13 INFO input.FileInputFormat: Total input paths to process : 1
11/06/08 17:39:13 INFO mapred.JobClient: Running job: job_201106061352_0055
11/06/08 17:39:14 INFO mapred.JobClient: map 0% reduce 0%
11/06/08 17:39:18 INFO mapred.JobClient: Task Id : attempt_201106061352_0055_m_000000_0, Status : FAILED
Error: Cannot inherit from final class
11/06/08 17:39:23 INFO mapred.JobClient: Task Id : attempt_201106061352_0055_m_000000_1, Status : FAILED
Error: Cannot inherit from final class
11/06/08 17:39:26 INFO mapred.JobClient: Task Id : attempt_201106061352_0055_m_000000_2, Status : FAILED
Error: Cannot inherit from final class
11/06/08 17:39:31 INFO mapred.JobClient: Job complete: job_201
The logs show:
*_syslog logs_*
2011-06-08 17:39:16,900 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2011-06-08 17:39:17,097 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2011-06-08 17:39:17,372 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2011-06-08 17:39:17,380 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.VerifyError: Cannot inherit from final class
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:632)
at java.lang.ClassLoader.defineClass(ClassLoader.java:616)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
at org.apache.mahout.vectorizer.document.SequenceFileTokenizerMapper.setup(SequenceFileTokenizerMapper.java:57)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
I am running Mahout 0.5 built from source: I downloaded a fresh copy and ran
mvn package.
I tried the same with the distribution package, but when I run that, Hadoop
complains about missing jars, i.e. Lucene and Google Preconditions.
Is there something I am doing wrong or is this a possible bug?
Here are my system details. Note that I am running Cloudera's Hadoop 0.20.2 (CDH3u0):
Fedora Core 9
java version "1.6.0_24"
Java(TM) SE Runtime Environment (build 1.6.0_24-b07)
Java HotSpot(TM) 64-Bit Server VM (build 19.1-b02, mixed mode)
Hadoop 0.20.2-cdh3u0
Subversion -r 81256ad0f2e4ab2bd34b04f53d25a6c23686dd14
Compiled by root on Fri Mar 25 20:07:24 EDT 2011
From source with checksum 6c1f62dddc4eac69b6b973c18bbc0f55
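
A VerifyError of this kind is raised when a class being loaded extends a class that is final in the version found on the runtime classpath, which can happen when two versions of the same library are both visible to the task. Nothing in the logs above confirms that this is the cause here. As a rough check, the following is a minimal sketch in Python (the function name find_duplicate_classes is made up for illustration) that lists every .class entry appearing in more than one of the jars it is given:

```python
import zipfile
from collections import defaultdict


def find_duplicate_classes(jar_paths):
    """Return {class entry: [jars containing it]} for entries found in 2+ jars.

    Hypothetical helper for illustration; a jar is read as a plain zip archive.
    """
    owners = defaultdict(list)
    for jar in jar_paths:
        with zipfile.ZipFile(jar) as zf:
            # set() guards against an archive listing the same entry twice
            for name in sorted(set(zf.namelist())):
                if name.endswith(".class"):
                    owners[name].append(jar)
    # Keep only the classes provided by more than one jar
    return {name: jars for name, jars in owners.items() if len(jars) > 1}
```

It could be called with the Mahout job jar plus the jars that Hadoop puts on the task classpath (for example those under HADOOP_HOME; the exact directory is an assumption and depends on the installation). A class reported from both a Mahout jar and a Hadoop jar would be a candidate for the conflict, though a duplicate by itself does not prove the two copies differ.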