I am using OpenNLP 1.8.0 and have trained NameFinder on approximately 78K 
sentences (perceptron model).  I have 11 named entity types and am seeing 
a lot of noise in the output.  The training output (below) reports 39 
outcomes; I had assumed this number would match the number of named entity 
types.  Could someone please explain what the Number of Outcomes refers to?
Also, any guidance on data prep and/or areas to explore to reduce the 
false positives would be helpful.
Thanks
- viraf


Indexing events using cutoff of 3

    Computing event counts...  done. 1315813 events
    Indexing...  done.
Collecting events... Done indexing.
Incorporating indexed data for training...  
done.
    Number of Event Tokens: 1315813
        Number of Outcomes: 39
      Number of Predicates: 290935
Computing model parameters...
Performing 300 iterations.
  1:  . (1313259/1315813) 0.9980589947051747
  2:  . (1314613/1315813) 0.9990880163062684
  3:  . (1314904/1315813) 0.9993091723519983
  4:  . (1315136/1315813) 0.9994854891994531
  5:  . (1315250/1315813) 0.9995721276503576
  6:  . (1315335/1315813) 0.9996367264953303
  7:  . (1315402/1315813) 0.999687645584897
  8:  . (1315451/1315813) 0.9997248849190576
  9:  . (1315517/1315813) 0.9997750440222128
 10:  . (1315509/1315813) 0.9997689641309213
 20:  . (1315687/1315813) 0.9999042417121582
Stopping: change in training set accuracy less than 1.0E-5
Stats: (1315427/1315813) 0.999706645245183
...done.
Compressed 290935 parameters to 13506
2507 outcome patterns 
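
One data prep check that may be relevant to the type count: a minimal sketch, assuming the training file uses the standard `<START:type> ... <END>` annotation format, that tallies the distinct type labels actually present. A misspelled or differently cased label would show up as a separate type. The function name and the sample sentences are hypothetical and not taken from the real training set.

```python
import re
from collections import Counter

# Matches the opening tag of a typed span, e.g. "<START:person>".
# Untyped "<START>" tags are deliberately not matched.
START_TAG = re.compile(r"<START:([^>\s]+)>")


def count_entity_types(lines):
    """Count how many annotated spans each entity type label has."""
    counts = Counter()
    for line in lines:
        counts.update(START_TAG.findall(line))
    return counts


if __name__ == "__main__":
    # Hypothetical sample; in practice, pass the open training file instead.
    sample = [
        "<START:person> Pierre Vinken <END> , 61 years old , will join the board .",
        "Mr . <START:person> Vinken <END> is chairman of <START:organization> Elsevier <END> .",
        "<START:Person> Rudolph Agnew <END> was named a director .",
    ]
    for entity_type, n in sorted(count_entity_types(sample).items()):
        print(f"{entity_type}\t{n}")
```

If this reports more than the 11 expected labels, the extra ones are worth cleaning up before retraining.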
