Hello OpenNLP community,

We are using the OpenNLP Name Finder to train models on a domain-specific German dataset. Since upgrading from version 1.6.0 to 1.8.4, I have noticed that the Name Finder model scores much better on our original data, but is no longer robust to noisy training data.
Trained on the small amount of data we have, the new version achieves a higher F-score on our test set than 1.6.0 did. However, to supplement this small training set, I have also generated some "synthetic" data. It is conceivable that this "unclean" data could confuse the model, but in 1.6.0 adding it improved the F-score. This is no longer the case in 1.8.4: any manipulation of the training data appears to confuse the model and causes it to produce many false positives.

I'd like to understand a little better what has changed between these two versions, but the release notes aren't very descriptive. Has anybody else experienced similarly drastic changes with the new version?

Many thanks in advance!

Fraser