Hi Suneel, can you please provide a little more detail? I still
can't get this to work.
Which classpath are the Lucene jars supposed to be added to? My Java
project's, or the Hadoop instance's?
Thanks,
Terry
On 6/3/14, 5:35 PM, Terry Blankers wrote:
Thanks Suneel. I thought having the jar as a dependency and the class
imported was enough.
On 6/3/14, 4:18 PM, Suneel Marthi wrote:
You're missing the Lucene jars from your classpath. Mahout is presently
on Lucene 4.6.1, so that's what you should be including.
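A quick way to check is to try loading the analyzer class from whatever
classpath the job driver is launched with. A minimal sketch (the class
name LuceneClasspathCheck is just illustrative):

public class LuceneClasspathCheck {
    public static void main(String[] args) throws ClassNotFoundException {
        // Throws ClassNotFoundException if the Lucene 4.6.1 jars
        // (lucene-core, lucene-analyzers-common) are not on this classpath.
        Class.forName("org.apache.lucene.analysis.standard.StandardAnalyzer");
        System.out.println("StandardAnalyzer found on the classpath");
    }
}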
On Tuesday, June 3, 2014 3:40 PM, Terry Blankers
<[email protected]> wrote:
Hello, can anyone please give me a clue as to what I may be missing
here?
I'm trying to run a SparseVectorsFromSequenceFiles job via ToolRunner
from a Java project, and I'm getting the following exception:
Error: java.lang.ClassNotFoundException: org.apache.lucene.analysis.standard.StandardAnalyzer
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at org.apache.mahout.vectorizer.document.SequenceFileTokenizerMapper.setup(SequenceFileTokenizerMapper.java:62)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
I've tried adding the location of lucene-analyzers-common-4.6.1.jar to
my Hadoop classpath, but it doesn't make any difference.
I'm running against Hadoop 2.2 and Mahout trunk, compiled with:
mvn clean install -Dhadoop2.version=2.2.0 -DskipTests
I'm trying to run the job like this:
String[] args = {"--input", "/input/index",
                 "--output", "/output/vectors",
                 "--maxNGramSize", "3",
                 "--namedVector", "--overwrite"};

SparseVectorsFromSequenceFiles sparse = new SparseVectorsFromSequenceFiles();
ToolRunner.run(configuration, sparse, args);
Running seq2sparse from the command line works successfully with no
exceptions:

$MAHOUT_HOME/bin/mahout seq2sparse -i /input/index --namedVector -o /output/vectors -ow --maxNGramSize 3
Many thanks,
Terry