Yes, it is. With NLP we need to deal with low precision/recall. You could add more examples related to the misclassified tokens, but it is not easy; there are a number of issues you would face, such as overfitting. For a NER model I would not expect an F1 much better than 85%, and you are getting 99%!
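For reference, the F-Measure the evaluator reports is the harmonic mean of precision and recall, F = 2 * P * R / (P + R). Checking against the first model's figures quoted below: 2 * 0.9906 * 0.9946 / (0.9906 + 0.9946) ≈ 0.9926, which is exactly the F-Measure it printed.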
2015-03-15 18:12 GMT-03:00 Richard Head Jr. <[email protected]>:

> > Could you restate your question? I could not understand what you want.
>
> I was wondering why the precision/recall/f-measure values output by
> TokenNameFinderEvaluator were higher for models that performed worse. By
> "worse" I mean they miss tokens that the other, lower performing (according
> to TokenNameFinderEvaluator) models, don't.
>
> > Actually, your models are performing too well for a NER.
>
> Well, they still misclassify tokens. Should I be using something else, or
> is this just the reality of NER?
> Thanks.
>
> On Monday, March 2, 2015 8:44 AM, William Colen <[email protected]> wrote:
>
> Could you restate your question? I could not understand what you want.
> Actually, your models are performing too well for a NER.
>
> 2015-02-27 23:34 GMT-03:00 Richard Head Jr. <[email protected]>:
>
> > > Add -misclassified true
> >
> > Very handy.
> >
> > > To evaluate you need an annotated corpus...
> >
> > This was my problem. Now that I can run it I see measurements of
> > 0.99XXXXX, but I noticed that the better models (as determined by my
> > separate unit tests, which check what was actually classified) have
> > lower measurements.
> >
> > According to my test cases this is a very good model:
> > Precision: 0.9905921169966114
> > Recall: 0.9946277476832162
> > F-Measure: 0.9926058304478945
> >
> > While this one is not so great:
> > Precision: 0.9951354487436962
> > Recall: 0.9982540179970453
> > F-Measure: 0.9966922939388522
> >
> > Am I missing something here?
> > Thanks
> >
> > On Wednesday, February 25, 2015 11:48 PM, William Colen <[email protected]> wrote:
> >
> > Add -misclassified true to the command to output what was misclassified.
> > But I have a guess. To evaluate you need an annotated corpus. Is the
> > file /tmp/db-raw.txt annotated? It should look like this:
> >
> > <START:person> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .
> > Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. , the Dutch publishing group .
> > <START:person> Rudolph Agnew <END> , 55 years old and former chairman of Consolidated Gold Fields PLC , was named a director of this British industrial conglomerate .
> >
> > Regards,
> > William
> >
> > 2015-02-26 1:24 GMT-03:00 Richard Head Jr. <[email protected]>:
> >
> > > Are you using 1.5.3?
> >
> > Yes.
> >
> > > Can you send a small sample?
> >
> > I can't send the model. Any other options? What format is the file given
> > to the -data option supposed to be in?
> > Thanks
> >
> > On Friday, February 20, 2015 2:14 PM, William Colen <[email protected]> wrote:
> >
> > Are you using 1.5.3? Can you send a small sample?
> >
> > On Monday, February 16, 2015, Richard Head Jr. <[email protected]> wrote:
> >
> > I ran the command line evaluator several times on tokenized/untokenized
> > and large/small input but get no results (see below). The model appears
> > to be finding tokens quite well, I'd just like to evaluate *how* well:
> >
> > opennlp TokenNameFinderEvaluator -data some-data.txt -model a-model.bin
> > Loading Token Name Finder model ... done (0.111s)
> >
> > Average: 104.2 sent/s
> > Total: 15 sent
> > Runtime: 0.144s
> >
> > Precision: 0.0
> > Recall: 0.0
> > F-Measure: -1.0
> >
> > Now on a larger set of data:
> >
> > opennlp TokenNameFinderEvaluator -encoding latin1 -data /tmp/db-raw.txt -model a-model.bin
> > Loading Token Name Finder model ... done (0.156s)
> > current: 364.9 sent/s avg: 364.9 sent/s total: 366 sent
> > current: 427.4 sent/s avg: 396.1 sent/s total: 793 sent
> >
> > Average: 477.7 sent/s
> > Total: 1434 sent
> > Runtime: 3.002s
> >
> > Precision: 0.0
> > Recall: 0.0
> > F-Measure: -1.0
> >
> > What am I doing wrong?
> >
> > Thanks
> >
> > --
> > William Colen
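For anyone who wants to reproduce the evaluation programmatically instead of through the opennlp CLI, a minimal sketch against the 1.5.x API could look like the following. The file names are placeholders, and the exact PlainTextByLineStream constructor should be double-checked against your OpenNLP version:

import java.io.FileInputStream;
import java.io.InputStreamReader;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderEvaluator;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.eval.FMeasure;

public class EvaluateNameFinder {
    public static void main(String[] args) throws Exception {
        // Load the trained name finder model (placeholder file name).
        TokenNameFinderModel model =
            new TokenNameFinderModel(new FileInputStream("a-model.bin"));

        // The evaluation data must be annotated with <START:type> ... <END>
        // spans, one sentence per line, in the same format used for training.
        ObjectStream<String> lines = new PlainTextByLineStream(
            new InputStreamReader(new FileInputStream("annotated-corpus.txt"), "UTF-8"));
        ObjectStream<NameSample> samples = new NameSampleDataStream(lines);

        // Run the evaluator over the annotated samples.
        TokenNameFinderEvaluator evaluator =
            new TokenNameFinderEvaluator(new NameFinderME(model));
        evaluator.evaluate(samples);

        // Print the same precision/recall/F-measure the CLI reports.
        FMeasure result = evaluator.getFMeasure();
        System.out.println("Precision: " + result.getPrecisionScore());
        System.out.println("Recall:    " + result.getRecallScore());
        System.out.println("F-Measure: " + result.getFMeasure());

        samples.close();
    }
}

As discussed in the thread above, the data file must already contain the <START:type> ... <END> annotations; if it is raw, untagged text there are no reference names to compare against, which is consistent with the Precision: 0.0 / Recall: 0.0 / F-Measure: -1.0 output shown.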
