The error is thrown because you do not have enough training samples. Try running your code with at least 10 to 20 training samples.
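Before training, it can help to check the training file for both problems reported in the output: too few samples, and lines that are empty or contain only a category. The following is a minimal sketch using only the Java standard library, assuming one sample per line in the form "category text". The class name TrainingFileCheck, its methods, and the MIN_SAMPLES threshold are hypothetical and not part of OpenNLP; the validity rule is inferred from the IOException message, not taken from the library source.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: sanity-checks a doccat training file.
final class TrainingFileCheck {

    // Assumed lower bound, taken from the "10 to 20 samples" advice above.
    static final int MIN_SAMPLES = 10;

    // A line is usable if it has a category token followed by at least one text token.
    static boolean isValidSample(String line) {
        String[] tokens = line.trim().split("\\s+");
        return tokens.length >= 2;
    }

    // Counts the lines that look like "category text...".
    static long countValidSamples(List<String> lines) {
        return lines.stream().filter(TrainingFileCheck::isValidSample).count();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: TrainingFileCheck <training-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);

        // Report empty or category-only lines by their 1-based line number.
        for (int i = 0; i < lines.size(); i++) {
            if (!isValidSample(lines.get(i))) {
                System.out.println("line " + (i + 1) + ": empty or category-only");
            }
        }

        long valid = countValidSamples(lines);
        System.out.println(valid + " usable sample(s)");
        if (valid < MIN_SAMPLES) {
            System.out.println("fewer than " + MIN_SAMPLES + " samples; add more before training");
        }
    }
}
```

Run it with the path to the training file as the only argument. The file should be plain text; a file saved as RTF contains markup that will not match this format.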
Jörn

On 08/23/2012 03:15 PM, andrea maestroni wrote:
Hi all!

I am trying to develop a Java program that takes a document, extracts its text, analyzes it, and extracts the main topic of the document. I think this is a document categorization problem, right?

I tried the example from the manual page. I created the training file, an RTF file with these lines:

    GMDecrease Major acquisitions that have a lower gross margin than the existing network also \
               had a negative impact on the overall gross margin, but it should improve following \
               the implementation of its integration strategies .
    GMIncrease The upward movement of gross margin resulted from amounts pursuant to adjustments \
               to obligations towards dealers .

Then in my code I use this function to train a model:

    public static void Train() throws InvalidFormatException, IOException {
        DoccatModel model = null;
        InputStream dataIn = null;
        try {
            dataIn = new FileInputStream("/Users/andry85mae/Desktop/apache-opennlp-1.5.2-incubating/bin/train.train");
            ObjectStream<String> lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
            ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream);
            model = DocumentCategorizerME.train("en", sampleStream);
        } catch (IOException e) {
            // Failed to read or parse training data, training failed
            e.printStackTrace();
        } finally {
            if (dataIn != null) {
                try {
                    dataIn.close();
                } catch (IOException e) {
                    // Not an issue, training already finished.
                    // The exception should be logged and investigated
                    // if part of a production system.
                    e.printStackTrace();
                }
            }
        }
    }

But it gives me an error:

    java.io.IOException: Empty lines, or lines with only a category string are not allowed!
    Computing event counts...
    Incorporating indexed data for training...
    Exception in thread "main" java.lang.NullPointerException
        at opennlp.maxent.GISTrainer.trainModel(GISTrainer.java:263)
        at opennlp.maxent.GIS.trainModel(GIS.java:256)
        at opennlp.model.TrainUtil.train(TrainUtil.java:182)
        at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:154)
        at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:176)
        at opennlp.tools.doccat.DocumentCategorizerME.train(DocumentCategorizerME.java:207)
        at opennlp_prova.Opennlp_prova.Train(Opennlp_prova.java:55)
        at opennlp_prova.Opennlp_prova.main(Opennlp_prova.java:96)
    Java Result: 1

What is the error? Thanks in advance!
