Hello, I have fairly large data sets for OpenNLP's TokenNameFinder covering a variety of entity types, and training one combined model for all types takes a lot of time and memory. So instead I wanted to see what I get when I assign just one type to each model file.
So I trained three models on the same data set: one with both types A and B, and two separate models, one for A and one for B. The file size of model AB roughly matches the sizes of A and B combined, and the TokenNameFinder CLI tool uses about the same amount of heap for the combined model as for the two separate models loaded together. Their performance was nearly identical, but the combined model had slightly higher recall: just two more correctly detected entities, one for each type.

How is this possible? Is this to be expected? Is there something I might have done wrong or didn't think of? I use OpenNLP 1.9.2, and I trained the models with no options other than -nameTypes, -model, -data and -lang.

Many thanks,
Markus
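P.S. For reference, the training runs were along these lines. This is a sketch of the command shape, using only the flags mentioned above; the data and model file names (train.txt, ab.bin, a.bin, b.bin) and the type names are placeholders, not the actual ones.

```shell
# Combined model covering both entity types:
opennlp TokenNameFinderTrainer -lang en -nameTypes typeA,typeB \
  -data train.txt -model ab.bin

# One model per type, trained on the same data file:
opennlp TokenNameFinderTrainer -lang en -nameTypes typeA \
  -data train.txt -model a.bin
opennlp TokenNameFinderTrainer -lang en -nameTypes typeB \
  -data train.txt -model b.bin
```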