Hi Jörn, let me be more precise. Do you have a sense of how the area under the precision-recall curve (PR AUC) changes as a function of the number of annotations? I'm curious how many annotations are needed to get a model with a reasonable PR AUC and reasonable runtime performance (memory and speed).
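One way to measure this is a learning curve: train on growing slices of the annotated data and compute PR AUC on a fixed held-out set at each size. The sketch below assumes a hypothetical `train` callable that takes a list of training examples and returns a scoring function. It is not an OpenNLP API. PR AUC is computed as step-wise average precision.

```python
from typing import Callable, List, Sequence, Tuple


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    labels: 1 for positive, 0 for negative; higher score means "more positive".
    Ties keep their input order (sorted() is stable); no tie averaging is done.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank  # precision at each point where recall increases
    return ap / total_pos


def learning_curve(
    train: Callable[[list], Callable[[object], float]],  # hypothetical trainer
    examples: list,
    eval_inputs: Sequence[object],
    eval_labels: Sequence[int],
    sizes: Sequence[int],
) -> List[Tuple[int, float]]:
    """Train on the first n examples for each n in sizes and report PR AUC."""
    results = []
    for n in sizes:
        model = train(examples[:n])
        scores = [model(x) for x in eval_inputs]
        results.append((n, average_precision(eval_labels, scores)))
    return results
```

Plotting the resulting (size, PR AUC) pairs for sizes that double each step (an arbitrary choice) shows where the curve flattens out. Past that point, more annotations buy little accuracy, while model size and tagging speed can still get worse.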
Peace.
Michael

On Mon, Oct 7, 2013 at 3:29 PM, Jörn Kottmann <[email protected]> wrote:
> On 10/07/2013 11:00 PM, Michael Schmitz wrote:
>>
>> Do you know how many sentences/tokens were annotated for the OpenNLP
>> POS and CHUNK models? Do you have an idea of the "sweet spot" for
>> number of annotations vs performance?
>
>
> If the model gets bigger the computations get more complex, but as far as I
> know the effect of the model not fitting anymore in the CPU cache is much
> more significant than that. I am using hash based int features to reduce
> the memory footprint in the name finder.
>
> I don't have much experience with the Chunker or Pos Tagger in regards to
> performance, but it should be easy to do a series of tests, the command
> line tools have built in performance monitoring.
>
> Jörn
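To illustrate what "hash based int features" can mean, here is a minimal sketch of the general hashing trick. It is not OpenNLP's name finder code. String features are hashed into a fixed number of buckets, so the weight table has a constant size no matter how many distinct feature strings the training data contains. The class name, `crc32` as the hash, and the default bucket count are illustrative assumptions.

```python
import zlib
from array import array
from typing import Iterable


class HashedWeights:
    """Hypothetical fixed-size weight table indexed by hashed string features."""

    def __init__(self, num_buckets: int = 1 << 18) -> None:
        self.num_buckets = num_buckets
        # One double per bucket: memory stays fixed regardless of vocabulary size.
        self.weights = array("d", [0.0]) * num_buckets

    def index(self, feature: str) -> int:
        # crc32 is stable across runs, unlike Python's salted built-in str hash.
        return zlib.crc32(feature.encode("utf-8")) % self.num_buckets

    def score(self, features: Iterable[str]) -> float:
        return sum(self.weights[self.index(f)] for f in features)

    def update(self, features: Iterable[str], delta: float) -> None:
        # Colliding features share a bucket and therefore a weight.
        for f in features:
            self.weights[self.index(f)] += delta
```

The trade-off is that unrelated features occasionally collide. In return there is no string-to-id dictionary to store, and a smaller table is more likely to stay in the CPU cache.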
