On 01/10/2014 05:30 AM, Andrew Bagshaw wrote:
Hi! I’m new to OpenNLP and have been playing around with it in C# .NET 
following the instructions at 
https://cwiki.apache.org/confluence/display/OPENNLP/Introduction+to+using+openNLP+in+.NET+Projects.


So far everything has been going without a hitch, except I can’t figure out how 
to use the training API in C# to train a Name Finder model. I’ve been tinkering 
around without success and can’t seem to find any documentation on this.


I've finally managed to get some code that at least compiles, but it gives me 
this error on the third-to-last line:

An unhandled exception of type 'java.lang.IllegalStateException' occurred in 
opennlp.dll

Additional information: java.security.NoSuchAlgorithmException: class 
configured for MessageDigest(provider: SUN)cannot be found.



During training, a hash of all the events is computed with the Java API. It looks like retrieving the MessageDigest fails on .NET. I am not sure how that can be fixed. Maybe we could use a different approach to compute the hash.
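
For context, the kind of call that fails can be sketched in plain Java as below. This is only an illustration, not OpenNLP's actual code: the class and method names are made up, and the choice of MD5 is an assumption. MessageDigest.getInstance is the standard Java lookup that throws NoSuchAlgorithmException when no provider can supply the algorithm, which matches the cause named in the exception above.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

public class EventHashSketch {

    // Hypothetical helper: digests all event strings in order and
    // returns the resulting hash as a 32-character hex string.
    static String hashEvents(List<String> events) {
        MessageDigest digest;
        try {
            // Assumption: MD5. This provider lookup is the step that
            // can fail when the runtime cannot load the digest class.
            digest = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // Wrapped the same way as in the reported error.
            throw new IllegalStateException(e);
        }
        for (String event : events) {
            digest.update(event.getBytes(StandardCharsets.UTF_8));
        }
        return String.format("%032x", new BigInteger(1, digest.digest()));
    }

    public static void main(String[] args) {
        System.out.println(hashEvents(Arrays.asList("event one", "event two")));
    }
}
```

If a snippet like this runs on the JVM but fails under .NET, the problem is in how the runtime resolves the security provider rather than in the training code.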

A pragmatic solution could be to export the training data to a file and then use the command-line tool to do the training (using the JVM).
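
The export step could look roughly like the sketch below. It assumes the usual name finder training format: one whitespace-tokenized sentence per line, with names wrapped in <START:person> ... <END>. The class name, the method names, and the sample sentences are hypothetical; train.txt is the file name from your code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class NameTrainingDataExport {

    // Hypothetical helper: marks tokens[start..end) as a name of the given
    // type and returns one training line in the name finder format.
    static String formatSample(String[] tokens, int start, int end, String type) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            if (i == start) {
                sb.append("<START:").append(type).append("> ");
            }
            sb.append(tokens[i]);
            if (i == end - 1) {
                sb.append(" <END>");
            }
        }
        return sb.toString();
    }

    // Writes one sentence per line, UTF-8 encoded.
    static void writeSamples(Path file, List<String> lines) throws IOException {
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<>();
        lines.add(formatSample(
                new String[] {"Pierre", "Vinken", ",", "61", "years", "old"}, 0, 2, "person"));
        lines.add(formatSample(
                new String[] {"Mr", ".", "Vinken", "is", "chairman", "."}, 2, 3, "person"));
        writeSamples(Paths.get("train.txt"), lines);
    }
}
```

The training itself would then run on the JVM with something like `opennlp TokenNameFinderTrainer -model test.bin -lang en -data train.txt -encoding UTF-8`. Please check the exact arguments against the manual for your OpenNLP version.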

My code:


  FileReader fileReader = new FileReader("train.txt");
  ObjectStream fileStream = new PlainTextByLineStream(fileReader);
  ObjectStream sampleStream = new NameSampleDataStream(fileStream);

  TokenNameFinderModel test;
  test = NameFinderME.train("en", "person", sampleStream, Collections.emptyMap());

  opennlp.tools.namefind.TokenNameFinderModel model = NameFinderME.train("en", "person", sampleStream, Collections.emptyMap()); //I get the error here

  BufferedOutputStream modelOut = new BufferedOutputStream(new FileOutputStream("test.bin"));
  model.serialize(modelOut);


I’m not sure if I’m even on the right track with this code. If someone would be 
kind enough to set me on the right track I would be very grateful.


The code looks good.



I have been using the name finder algorithm with the person name finder model 
with success, although I find that it misses a bunch of names that I would like 
it to detect. Is there a way that I can add to the model (train it without 
overwriting the current information in it)? That is what I am trying to 
accomplish.

No, that is not possible in our current implementation.

To get a well-performing NER model you usually need to annotate your own data. If you want to process English text, it might be worth training on OntoNotes; OpenNLP has built-in support for it.

HTH,
Jörn
