Did you configure a custom feature generator and model type?

2016-06-28 10:03 GMT-03:00 Julian Bunzel <jbun...@zedat.fu-berlin.de>:
> Hey guys,
>
> I am currently trying to build my own NameFinderME model. The corpus (from
> LCC) has around 20K sentences, and every sentence contains at least one
> (single-token) forename.
> I have a name dictionary with the names that occur in the corpus, and I split
> it into an 80% set and a 20% set. After that, I tagged the sentences with a
> dictionary tagger using the 80% set and trained the NameFinderME model on
> the tagged sentences.
>
> Everything works, but the results are not very good. When I apply the
> created model to the same (untagged) corpus, it finds ~99% of the forenames
> that I used for training, but unfortunately it doesn't find many new
> entities or entities from the 20% set.
>
> Some information:
> Sentences in corpus: ~20K
> Different names occurring in corpus: ~2400
> Names in 80% set: ~1920 (trained with Cutoff = 5; ~980 names remaining for
> training)
> Names in 20% set: ~480
> Overall found names with own model: ~1100
>
> 99% of the ~980 names used for training are found, 20% of the "cut off"
> names are found, and <1% of the names from the test set are found.
>
> Perhaps you can give me some advice on how to increase the percentage of
> newly found entities, or maybe I got something wrong.
>
> Cheers,
>
> Julian
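The workflow in the quoted message (split the name dictionary 80/20, tag the corpus with the 80% part, train on the result) can be sketched as below. This is a minimal illustration, assuming whitespace-tokenized sentences and single-token forenames; it emits the `<START:person> ... <END>` markup used by OpenNLP's name finder training format. The function names `split_dictionary` and `tag_sentence`, the fixed seed, and the `person` type are hypothetical choices for illustration, not part of the original setup.

```python
import random


def split_dictionary(names, train_fraction=0.8, seed=42):
    """Shuffle the name list and split it into a training part and a held-out part.

    Hypothetical helper: the seed only makes the split reproducible.
    """
    names = sorted(set(names))  # deduplicate and fix the order before shuffling
    rng = random.Random(seed)
    rng.shuffle(names)
    cut = int(len(names) * train_fraction)
    return set(names[:cut]), set(names[cut:])


def tag_sentence(tokens, dictionary, entity_type="person"):
    """Wrap every token found in the dictionary in OpenNLP-style name markup.

    Assumes single-token names, as in the quoted corpus.
    """
    out = []
    for tok in tokens:
        if tok in dictionary:
            out.append(f"<START:{entity_type}> {tok} <END>")
        else:
            out.append(tok)
    return " ".join(out)


if __name__ == "__main__":
    print(tag_sentence("Yesterday Anna met Peter .".split(), {"Anna"}))
    # prints: Yesterday <START:person> Anna <END> met Peter .
```

Note that in this sketch a held-out name such as `Peter` stays untagged in the training text, so the trainer sees it as a non-name in that sentence.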
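The effect of `Cutoff = 5` on name-identity features can be illustrated with a small counting sketch. It assumes, as we understand OpenNLP's maxent trainer to behave, that a feature must occur at least `cutoff` times across the training events to be kept. The `w=<token>` and `shape=Xx` feature strings are hypothetical stand-ins for whatever the real feature generator produces.

```python
from collections import Counter


def surviving_features(events, cutoff=5):
    """Count how often each feature string occurs across all training events
    and keep only those that occur at least `cutoff` times."""
    counts = Counter(f for features in events for f in features)
    return {f for f, c in counts.items() if c >= cutoff}


# Each event is the (hypothetical) feature list for one tagged name token:
# a word-identity feature plus a capitalization-shape feature.
events = (
    [["w=Anna", "shape=Xx"]] * 6    # "Anna" tagged 6 times
    + [["w=Ines", "shape=Xx"]] * 2  # "Ines" tagged only twice
)

kept = surviving_features(events, cutoff=5)
print(sorted(kept))  # prints: ['shape=Xx', 'w=Anna']
```

In this toy setup the rarely tagged name loses its identity feature, while the shape feature shared by all names survives the cutoff; features of that kind, rather than word identity, are what could carry over to names the model has never seen.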