Hello,

I have fairly large data sets for OpenNLP's TokenNameFinder covering a variety 
of entity types, and training one combined model for all types takes a lot of 
time and memory. So instead, I wanted to see what I get when I assign just one 
type to each model file.

So I trained three models on the same data set: one with both types A and B, 
and two separate models for A and B each. The file size of model AB roughly 
matches the sizes of A and B combined, and the TokenNameFinder CLI tool uses 
about the same amount of heap for the combined model as for the two separate 
models loaded together.

Their performance was nearly identical, but the combined model had a slightly 
higher recall: just two more correctly detected entities, one for each type. 
How is this possible? Is this to be expected? Or is there something I might 
have done wrong or didn't think of?

I use OpenNLP 1.9.2, and I trained the models with no options other than 
-nameTypes, -model, -data and -lang.
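For reference, this is roughly how I invoke the trainer (file names here are 
placeholders, not my actual paths):

```shell
# Combined model covering both entity types
opennlp TokenNameFinderTrainer -lang en -nameTypes A,B \
    -data train.txt -model ner-ab.bin

# Separate models, one type each, from the same training data
opennlp TokenNameFinderTrainer -lang en -nameTypes A \
    -data train.txt -model ner-a.bin
opennlp TokenNameFinderTrainer -lang en -nameTypes B \
    -data train.txt -model ner-b.bin
```

I then compared them with TokenNameFinderEvaluator on the same held-out data.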

Many thanks,
Markus
