I am using OpenNLP 1.8.0 and have trained the NameFinder on approximately 78K sentences (perceptron model). I have 11 named entity types and am finding a lot of noise in the output. The training output reports 39 outcomes, which I would have assumed would match the number of named entity types. Could someone please explain what "Number of Outcomes" refers to? Any guidance on data preparation, or on areas to explore to reduce the false positives, would also be helpful. Thanks - viraf
Indexing events using cutoff of 3
Computing event counts... done. 1315813 events
Indexing... done.
Collecting events... Done indexing.
Incorporating indexed data for training... done.
Number of Event Tokens: 1315813
Number of Outcomes: 39
Number of Predicates: 290935
Computing model parameters...
Performing 300 iterations.
1: . (1313259/1315813) 0.9980589947051747
2: . (1314613/1315813) 0.9990880163062684
3: . (1314904/1315813) 0.9993091723519983
4: . (1315136/1315813) 0.9994854891994531
5: . (1315250/1315813) 0.9995721276503576
6: . (1315335/1315813) 0.9996367264953303
7: . (1315402/1315813) 0.999687645584897
8: . (1315451/1315813) 0.9997248849190576
9: . (1315517/1315813) 0.9997750440222128
10: . (1315509/1315813) 0.9997689641309213
20: . (1315687/1315813) 0.9999042417121582
Stopping: change in training set accuracy less than 1.0E-5
Stats: (1315427/1315813) 0.999706645245183
...done.
Compressed 290935 parameters to 13506
2507 outcome patterns
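
One way to sanity-check the type count against the training data is sketched below. It is a rough check, not OpenNLP's own logic: it assumes the training file uses the `<START:type> ... <END>` annotation format, and it assumes the outcomes are per-token labels of the form `type-start`, `type-cont` and `other` rather than the entity types themselves. The function names and the sample sentences are made up for illustration. If the assumption holds, the outcome count would normally be larger than the number of types, and stray type spellings (for example `person` vs `Person`) would inflate it further.

```python
import re
from collections import Counter

# Assumed annotation format: <START:type> token token <END>
SPAN = re.compile(r"<START:([^>\s]+)>\s+(.*?)\s+<END>")


def count_entity_types(lines):
    """Return a Counter mapping each entity type to its number of annotated spans."""
    counts = Counter()
    for line in lines:
        for entity_type, _tokens in SPAN.findall(line):
            counts[entity_type] += 1
    return counts


def start_cont_labels(lines):
    """Collect the labels a start/cont/other scheme would need (an assumption)."""
    labels = set()
    for line in lines:
        for entity_type, tokens in SPAN.findall(line):
            labels.add(entity_type + "-start")
            # Only multi-token spans need a continuation label.
            if len(tokens.split()) > 1:
                labels.add(entity_type + "-cont")
        # Any token left outside the annotated spans is labelled "other".
        if SPAN.sub("", line).split():
            labels.add("other")
    return labels


# Hypothetical sample; replace with the lines of the real training file.
sample = [
    "<START:person> Pierre Vinken <END> , 61 years old , will join the board .",
    "<START:Person> Vinken <END> is chairman of <START:organization> Elsevier N.V. <END> .",
]

type_counts = count_entity_types(sample)
labels = start_cont_labels(sample)
for entity_type, n in sorted(type_counts.items()):
    print(entity_type, n)
print(len(type_counts), "types,", len(labels), "labels")
```

Running this over the full training file and comparing the printed type list with the 11 intended types should show whether inconsistent type names are contributing extra outcomes.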