I second Steve. ClearTK is an NLP library built on UIMA (and uimaFIT). It includes a fully functional named entity extraction example annotator, along with an evaluation that uses the MASC corpus.
Himanshu

On Tue, Nov 6, 2012 at 4:28 AM, Steven Bethard <[email protected]> wrote:

> On Tue, Nov 6, 2012 at 9:26 AM, Yasen Kiprov <[email protected]>
> wrote:
>
> > I'm writing a named entity recognition system for text excerpts from the
> > social/public domain: blogs, news, etc. I'm testing different approaches
> > with rules and ML, and I need to evaluate annotation accuracy (in terms of
> > F-score against a gold corpus). My plan is to use the MASC corpus or build
> > a custom one, but the first task is to find the right tools for evaluation.
>
> This sounds a lot like an example we have in ClearTK (it also uses the
> MASC named entity data). The full cross-validation evaluation code is
> here:
>
> https://code.google.com/p/cleartk/source/browse/cleartk-examples/src/main/java/org/cleartk/examples/chunking/EvaluateNamedEntityChunker.java
>
> And the class that actually calculates F-score, etc. over annotations is
> here:
>
> https://code.google.com/p/cleartk/source/browse/cleartk-eval/src/main/java/org/cleartk/eval/AnnotationStatistics.java
>
> Steve
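For anyone who wants to see what "F-score over annotations" means before digging into ClearTK, here is a minimal standalone sketch of exact-match span scoring. It is NOT ClearTK's AnnotationStatistics API; the `SpanF1` class and `Span` record are hypothetical names introduced only for illustration, and it assumes a predicted entity counts as correct only when its begin offset, end offset, and label all match a gold entity (ClearTK's class may support other matching criteria).

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SpanF1 {
    // Hypothetical span type (not a UIMA/ClearTK class): character offsets plus an entity label.
    record Span(int begin, int end, String label) {}

    // Exact-match scoring: returns {precision, recall, F1}.
    // A predicted span is a true positive only if an identical gold span exists.
    static double[] score(List<Span> gold, List<Span> predicted) {
        Set<Span> goldSet = new HashSet<>(gold);
        Set<Span> predSet = new HashSet<>(predicted);
        int truePositives = 0;
        for (Span s : predSet) {
            if (goldSet.contains(s)) {
                truePositives++;
            }
        }
        double precision = predSet.isEmpty() ? 0.0 : (double) truePositives / predSet.size();
        double recall = goldSet.isEmpty() ? 0.0 : (double) truePositives / goldSet.size();
        // F1 is the harmonic mean of precision and recall; define it as 0 when both are 0.
        double f1 = (precision + recall == 0.0) ? 0.0 : 2 * precision * recall / (precision + recall);
        return new double[] {precision, recall, f1};
    }

    public static void main(String[] args) {
        List<Span> gold = List.of(
                new Span(0, 5, "PERSON"),
                new Span(10, 16, "ORG"),
                new Span(20, 26, "LOCATION"));
        List<Span> predicted = List.of(
                new Span(0, 5, "PERSON"),      // exact match
                new Span(10, 16, "PERSON"),    // right offsets, wrong label: counted as an error
                new Span(20, 26, "LOCATION")); // exact match
        double[] prf = score(gold, predicted);
        System.out.printf("P=%.3f R=%.3f F1=%.3f%n", prf[0], prf[1], prf[2]);
    }
}
```

With two of three predictions matching exactly, precision, recall, and F1 all come out to 2/3 here. Treating a span with correct offsets but the wrong label as a miss is the strict convention; a lenient evaluation would score offsets and labels separately.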
