Hi,
I apologise if the question is trivial, but I'm not experienced with OpenNLP
(and not too confident in my Java skills either).

I'm trying to train a sentence detection model for Zulu. Whether I use the
command-line interface or the API, training appears to run, but no model file
is created. Instead I get the following exception [1]:
java.lang.IllegalArgumentException: The maxent model is not compatible with the sentence detector!

The original data comes from the Ukwabelana corpus [2] as a text file
(US-ASCII), one sentence per line. It is completely stripped of
capitalisation and punctuation. I automatically added a "." at the end of
every sentence, so that the trainer has an end-of-sentence (EOS) character to
pick up.
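For reference, the preprocessing step described above (one sentence per line, a "." appended to each) can be sketched in plain Java as below. This is only an illustration, not the exact script used: the class and method names are hypothetical, the sample lines in `main` are placeholders rather than corpus data, and blank lines are assumed to be left as they are. The second helper is an extra check, not part of the original workflow, that reports how many periods end a line versus how many occur elsewhere in a line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the data preparation: append "." to every sentence line.
public class AddSentenceEnds {

    // Appends "." to every non-empty (trimmed) line; blank lines stay blank.
    static List<String> addFinalPeriods(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.strip();
            out.add(trimmed.isEmpty() ? trimmed : trimmed + ".");
        }
        return out;
    }

    // Returns {periods that end a line, periods anywhere else in a line}.
    static int[] countPeriods(List<String> lines) {
        int atEnd = 0;
        int inside = 0;
        for (String line : lines) {
            for (int i = 0; i < line.length(); i++) {
                if (line.charAt(i) == '.') {
                    if (i == line.length() - 1) {
                        atEnd++;
                    } else {
                        inside++;
                    }
                }
            }
        }
        return new int[] {atEnd, inside};
    }

    public static void main(String[] args) {
        // Placeholder lines standing in for the lower-cased, unpunctuated corpus.
        List<String> corpus = List.of("first example line", "", "second example line");
        List<String> prepared = addFinalPeriods(corpus);
        prepared.forEach(System.out::println);

        int[] counts = countPeriods(prepared);
        System.out.println("line-final periods: " + counts[0] + ", other periods: " + counts[1]);
    }
}
```

On data prepared this way the second count is always 0, i.e. every "." in the training file sits at the end of a sentence.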

I would appreciate any insight into what needs to be done!

Mariya

[1] The whole output is:

Indexing events using cutoff of 5

    Computing event counts… done. 29424 events
    Indexing… done.
    Sorting and merging events… done. Reduced 29424 events to 7830.
    Done indexing.
    Incorporating indexed data for training…
    done.

    Number of Event Tokens: 7830
    Number of Outcomes: 1
    Number of Predicates: 1673

    …done.

    Computing model parameters …
    Performing 100 iterations.
    1: … loglikelihood=0.0 1.0
    2: … loglikelihood=0.0 1.0

    Exception in thread "main" java.lang.IllegalArgumentException: The maxent model is not compatible with the sentence detector!
    at opennlp.tools.util.model.BaseModel.checkArtifactMap(BaseModel.java:275)
    at opennlp.tools.sentdetect.SentenceModel.<init>(SentenceModel.java:64)
    at opennlp.tools.sentdetect.SentenceDetectorME.train(SentenceDetectorME.java:285)
    at opennlp.tools.sentdetect.SentenceDetectorME.train(SentenceDetectorME.java:296)
    at opennlp.tools.cmdline.sentdetect.SentenceDetectorTrainerTool.run(SentenceDetectorTrainerTool.java:111)
    at opennlp.tools.cmdline.CLI.main(CLI.java:191)


[2] http://www.cs.bris.ac.uk/Research/MachineLearning/Morphology/resources.jsp#corpus
