Hi all,
I was wondering whether we can apply bug fixes that slightly decrease
the performance of existing models.
In this case I am talking about OPENNLP-172, which fixes the handling
of lowercase sequences in the token class feature. Currently a sequence
is detected as lowercase only when it contains nothing but the letters
A to Z, but other languages have more letters, such as the German umlauts.
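
To illustrate the difference, here is a minimal sketch and not the actual OpenNLP code. The class name, method names, and both patterns are assumptions made for illustration; the only point is that an ASCII-only check rejects a lowercase token containing an umlaut, while a Unicode-aware check accepts it.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of OpenNLP.
final class LowercaseCheck {

    // Assumed stand-in for the old behaviour: ASCII letters a-z only.
    static final Pattern ASCII_LOWER = Pattern.compile("^[a-z]+$");

    // Unicode-aware variant: \p{Ll} matches any lowercase letter.
    static final Pattern UNICODE_LOWER = Pattern.compile("^\\p{Ll}+$");

    static boolean isAsciiLower(String token) {
        return ASCII_LOWER.matcher(token).matches();
    }

    static boolean isUnicodeLower(String token) {
        return UNICODE_LOWER.matcher(token).matches();
    }

    public static void main(String[] args) {
        String token = "m\u00fcller"; // "müller", with a precomposed u-umlaut
        System.out.println(isAsciiLower(token));   // prints "false"
        System.out.println(isUnicodeLower(token)); // prints "true"
    }
}
```

A token that changes from "not lowercase" to "lowercase" gets a different token class feature, which is why a model trained with the old behaviour can lose some accuracy until it is retrained.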
This fix will decrease the recall of the existing Spanish person NER
model by 2%.
Should we apply it anyway for the next release?
After retraining, the recall goes up by 6%.
Jörn