You are missing the Lucene jars from your classpath. Mahout is presently at Lucene 4.6.1, so that's the version you should be including.
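Since you are launching through ToolRunner, one thing that may help is Hadoop's generic -libjars option, which asks Hadoop to ship the listed jars to the task JVMs on the cluster. Here is a minimal sketch of building the argument array that way. I haven't tested it against your setup. The jar paths are hypothetical placeholders, the LibJarsArgs class and withLibJars helper are names I made up for illustration, and including lucene-core alongside lucene-analyzers-common is my assumption about what the analyzer needs. It also assumes the job uses the Configuration that ToolRunner hands it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LibJarsArgs {

    // Hypothetical helper: prepends Hadoop's generic -libjars option
    // (a comma-separated jar list) to an existing argument array.
    // Generic options have to come before the job's own options.
    public static String[] withLibJars(String[] jobArgs, List<String> jarPaths) {
        List<String> all = new ArrayList<>();
        all.add("-libjars");
        all.add(String.join(",", jarPaths));
        all.addAll(Arrays.asList(jobArgs));
        return all.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Placeholder paths: point these at wherever the jars live locally.
        List<String> luceneJars = Arrays.asList(
                "/path/to/lucene-core-4.6.1.jar",
                "/path/to/lucene-analyzers-common-4.6.1.jar");

        // Same job arguments as in the original message.
        String[] jobArgs = {
                "--input", "/input/index",
                "--output", "/output/vectors",
                "--maxNGramSize", "3",
                "--namedVector",
                "--overwrite"
        };

        String[] fullArgs = withLibJars(jobArgs, luceneJars);
        System.out.println(String.join(" ", fullArgs));

        // With Hadoop and Mahout on the client classpath, the array would then
        // be passed on unchanged:
        // ToolRunner.run(configuration, new SparseVectorsFromSequenceFiles(), fullArgs);
    }
}
```

The command-line run probably works because bin/mahout submits the Mahout job jar, which bundles its dependencies, so submitting that job jar from your project may be another route.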
On Tuesday, June 3, 2014 3:40 PM, Terry Blankers <[email protected]> wrote:

Hello, can anyone please give me a clue as to what I may be missing here? I'm trying to run a SparseVectorsFromSequenceFiles job via ToolRunner from a java project and I'm getting the following exception:

Error: java.lang.ClassNotFoundException: org.apache.lucene.analysis.standard.StandardAnalyzer
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at org.apache.mahout.vectorizer.document.SequenceFileTokenizerMapper.setup(SequenceFileTokenizerMapper.java:62)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)

I've tried adding the location of lucene-analyzers-common-4.6.1.jar to my hadoop classpath which doesn't make any difference.
I'm running against Hadoop 2.2 and Mahout trunk, compiled with:

    mvn clean install -Dhadoop2.version=2.2.0 -DskipTests

I'm trying to run the job like this:

    String[] args = {
        "--input", "/input/index",
        "--output", "/output/vectors",
        "--maxNGramSize", "3",
        "--namedVector",
        "--overwrite"
    };
    SparseVectorsFromSequenceFiles sparse = new SparseVectorsFromSequenceFiles();
    ToolRunner.run(configuration, sparse, args);

Running seq2sparse from the commandline works successfully with no exceptions:

    $MAHOUT_HOME/bin/mahout seq2sparse -i /input/index --namedVector -o /output/vectors -ow --maxNGramSize 3

Many thanks,
Terry
