Thanks for the reply, Alexander. However, I used seqwiki to parse the wiki docs XML and create the sequence file, so I couldn't really verify any document duplication, at least not easily.
Using seq2encoded instead of seq2sparse when creating the vector files looks to have worked OK though! :)

> Date: Mon, 23 Jul 2012 22:03:08 +0400
> Subject: Re:: trainnb
> From: [email protected]
> To: [email protected]
>
> Hi
> I had this error when I fed in two docs with the same name (key). So check
> input files for duplications.
>
> Alexander
> 23.07.2012 2:44 user "Sam Hodgson" <[email protected]> wrote:
>
> > Hi,
> >
> > I'm trying to create a classification model using some wiki sample articles
> > using seqwiki on the wiki xml dumps. It creates the sequence files and
> > vectors ok but when executing:
> >
> > bin/mahout trainnb -i vectors/tfidf-vectors -el -o mod -li labidx -ow -c
> >
> > I get the following:
> >
> > Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
> >     at org.apache.mahout.classifier.naivebayes.BayesUtils.writeLabelIndex(BayesUtils.java:122)
> >     at org.apache.mahout.classifier.naivebayes.training.TrainNaiveBayesJob.createLabelIndex(TrainNaiveBayesJob.java:178)
> >     at org.apache.mahout.classifier.naivebayes.training.TrainNaiveBayesJob.run(TrainNaiveBayesJob.java:93)
> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >     at org.apache.mahout.classifier.naivebayes.training.TrainNaiveBayesJob.main(TrainNaiveBayesJob.java:63)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> >     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> >     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
> >
> > I've tried using the various output methods from seq2sparse with the same
> > results, and also various wiki source files.
> >
> > Any advice would be greatly appreciated.
> >
> > Cheers
> >
> > Sam
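
In case it helps anyone hitting the same exception: below is a minimal sketch of the duplicate-key check Alexander suggested. It assumes the document keys have already been extracted from the sequence file into a plain text file, one key per line; how that file is produced is not covered here. The file name keys.txt and both function names are made up for illustration and are not part of Mahout.

```python
from collections import Counter
from typing import Dict, Iterable


def find_duplicate_keys(keys: Iterable[str]) -> Dict[str, int]:
    """Return every key that occurs more than once, mapped to its count.

    Surrounding whitespace is stripped and blank lines are ignored.
    """
    counts = Counter(k.strip() for k in keys if k.strip())
    return {key: n for key, n in counts.items() if n > 1}


def duplicates_in_file(path: str) -> Dict[str, int]:
    """Run the check on a text file holding one document key per line.

    The path is hypothetical, e.g. "keys.txt".
    """
    with open(path, encoding="utf-8") as fh:
        return find_duplicate_keys(fh)
```

Calling duplicates_in_file("keys.txt") returns an empty dict when every key is unique; any entries it does return are the document names worth checking in the input.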
