On 11/05/2014 08:14 AM, Rodrigo Agerri wrote:
Hi Raj,
I believe that the NameFinder models were trained with MUC, but I am
not sure. In any case, if you are going to annotate data from a domain
different from that of MUC, you will be better off annotating data for
that domain, because supervised approaches do not adapt well when used
on other genres/domains.
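If you do annotate your own domain data, the OpenNLP name finder
trainer expects plain text with one whitespace-tokenized sentence per
line and the names marked inline as <START:type> ... <END>. Below is a
minimal sketch of producing such lines from token spans. The helper
name to_opennlp_line and the span layout (start, end, type) are
made up for illustration and are not part of OpenNLP.

```python
def to_opennlp_line(tokens, spans):
    """Render one tokenized sentence in the OpenNLP name finder training format.

    tokens: list of token strings
    spans:  list of (start, end, type) tuples, end exclusive, non-overlapping
            (hypothetical layout chosen for this sketch)
    """
    starts = {start: name_type for start, _end, name_type in spans}
    ends = {end for _start, end, _type in spans}
    out = []
    for i, tok in enumerate(tokens):
        if i in starts:
            # Open the entity before its first token.
            out.append("<START:%s>" % starts[i])
        out.append(tok)
        if i + 1 in ends:
            # Close the entity after its last token.
            out.append("<END>")
    return " ".join(out)


line = to_opennlp_line(
    ["Pierre", "Vinken", "joined", "Elsevier", "in", "November", "."],
    [(0, 2, "person"), (3, 4, "organization")],
)
print(line)
```

Write one such line per sentence into the training file; an empty line
marks a document boundary.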
The English name finder models are trained on MUC 6 / 7 plus some
corrections to solve certain detection problems.
I suggest not using MUC anymore because it is quite dated.
If you want to train name finder models which perform well, I suggest
having a look at OntoNotes 4.0. We have support for training OpenNLP
models directly on it.
The data is not free; we had to pay around 50 USD to get it.
There is now also a newer version 5.0:
https://catalog.ldc.upenn.edu/LDC2013T19
I guess its format didn't change too much, so there is a good chance
it works with the 4.0 parsing code.
HTH,
Jörn