Only slightly related: for cross-validation one might also want to calculate the standard deviation; then it's easy to see whether there are big outliers in the individual computations. They might not be noticeable when only the average is printed.
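To make that concrete, here is a minimal sketch of reporting the standard deviation next to the mean of the per-fold scores. The class name `FoldStats` and the fold values are made up for illustration, and none of this is existing OpenNLP code. It uses the sample standard deviation (dividing by n - 1), which is one reasonable choice among several.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper, not part of OpenNLP.
final class FoldStats {

    // Arithmetic mean of the per-fold scores (NaN for an empty array).
    static double mean(double[] scores) {
        return Arrays.stream(scores).average().orElse(Double.NaN);
    }

    // Sample standard deviation (divides by n - 1); needs at least two folds.
    static double sampleStdDev(double[] scores) {
        if (scores.length < 2) {
            return Double.NaN;
        }
        double m = mean(scores);
        double sumSq = 0.0;
        for (double s : scores) {
            sumSq += (s - m) * (s - m);
        }
        return Math.sqrt(sumSq / (scores.length - 1));
    }

    public static void main(String[] args) {
        // Made-up F-measures for five folds; the fourth fold is an outlier.
        double[] f1PerFold = {0.74, 0.75, 0.73, 0.52, 0.76};
        System.out.printf(Locale.ROOT, "mean=%.4f stddev=%.4f%n",
            mean(f1PerFold), sampleStdDev(f1PerFold));
    }
}
```

A large standard deviation relative to the mean is the hint that one fold behaves very differently from the others.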
Jörn

On 8/17/11 6:51 PM, [email protected] wrote:
Hi,

Would it be useful to have detailed output from FMeasure when using spans with types? For example, we could use it to see the individual precision and recall for person, organization, and date in a NameFinder model, or for the Chunker. Something like the output from CONLL2000<http://www.cnts.ua.ac.be/conll2000/chunking/output.html>:

processed 961 tokens with 459 phrases; found: 539 phrases; correct: 371.
accuracy: 84.08%; precision: 68.83%; recall: 80.83%; FB1: 74.35
ADJP: precision: 0.00%; recall: 0.00%; FB1: 0.00
ADVP: precision: 45.45%; recall: 62.50%; FB1: 52.63
NP: precision: 64.98%; recall: 78.63%; FB1: 71.16
PP: precision: 83.18%; recall: 98.89%; FB1: 90.36
SBAR: precision: 66.67%; recall: 33.33%; FB1: 44.44
VP: precision: 69.00%; recall: 79.31%; FB1: 73.80

I will need something like that for my master's dissertation. If it is useful, I would add it to OpenNLP.

Thanks,
William
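As a rough sketch of the per-type bookkeeping this would need: keep three counters per type (reference spans, predicted spans, exact matches) and derive precision, recall, and F1 from them. Everything below is hypothetical and self-contained; `PerTypeFMeasure` and `TypedSpan` are illustrative names, not the existing OpenNLP `FMeasure` or `Span` classes. It also assumes that a prediction only counts as correct when boundaries and type both match, and that undefined ratios are reported as 0.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper, not part of OpenNLP: per-type precision/recall/F1.
final class PerTypeFMeasure {

    // Hypothetical stand-in for a typed span: token range [start, end) plus a type.
    static final class TypedSpan {
        final int start;
        final int end;
        final String type;

        TypedSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof TypedSpan)) {
                return false;
            }
            TypedSpan s = (TypedSpan) o;
            return start == s.start && end == s.end && type.equals(s.type);
        }

        @Override
        public int hashCode() {
            return Objects.hash(start, end, type);
        }
    }

    // Per type: [0] reference spans, [1] predicted spans, [2] exact matches.
    private final Map<String, int[]> counts = new TreeMap<>();

    // Call once per sentence with the gold spans and the spans the model found.
    void update(Set<TypedSpan> reference, Set<TypedSpan> predicted) {
        for (TypedSpan s : reference) {
            counts.computeIfAbsent(s.type, k -> new int[3])[0]++;
        }
        for (TypedSpan s : predicted) {
            int[] c = counts.computeIfAbsent(s.type, k -> new int[3]);
            c[1]++;
            if (reference.contains(s)) {
                c[2]++; // same boundaries and same type
            }
        }
    }

    // Assumption: report 0 when nothing of this type was predicted.
    double precision(String type) {
        int[] c = counts.get(type);
        return (c == null || c[1] == 0) ? 0.0 : (double) c[2] / c[1];
    }

    // Assumption: report 0 when the reference has no span of this type.
    double recall(String type) {
        int[] c = counts.get(type);
        return (c == null || c[0] == 0) ? 0.0 : (double) c[2] / c[0];
    }

    // Harmonic mean of precision and recall.
    double f1(String type) {
        double p = precision(type);
        double r = recall(type);
        return (p + r == 0.0) ? 0.0 : 2 * p * r / (p + r);
    }

    // One line per type, loosely following the CoNLL-2000 layout.
    String report() {
        StringBuilder sb = new StringBuilder();
        for (String type : counts.keySet()) {
            sb.append(String.format(Locale.ROOT,
                "%s: precision: %.2f%%; recall: %.2f%%; FB1: %.2f%n",
                type, 100 * precision(type), 100 * recall(type), 100 * f1(type)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up spans for a single sentence.
        Set<TypedSpan> gold = new HashSet<>(Arrays.asList(
            new TypedSpan(0, 2, "person"), new TypedSpan(3, 5, "organization")));
        Set<TypedSpan> found = new HashSet<>(Arrays.asList(
            new TypedSpan(0, 2, "person"), new TypedSpan(3, 5, "person")));
        PerTypeFMeasure m = new PerTypeFMeasure();
        m.update(gold, found);
        System.out.print(m.report());
    }
}
```

Keeping raw counts rather than running averages means the overall figures can still be obtained by summing the counters across types, and results from several folds can be merged the same way.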
