- Use the cross validation support to test how well it performs on your training data (a rough sketch follows below)
- Try the perceptron instead of maxent
- Try adding a word2vec dictionary
- Are there tokenization issues?
- Is there a difference between the data you use for training and tagging?
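For the first two bullets, a rough, untested sketch against the OpenNLP 1.6 API could look like this ("train.txt", the "de" language code and the "person" type are just placeholders for your own setup):

import java.io.File;
import java.nio.charset.StandardCharsets;

import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderCrossValidator;
import opennlp.tools.namefind.TokenNameFinderFactory;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class NameFinderCrossValidation {

    public static void main(String[] args) throws Exception {
        // Switch the trainer from the default maxent to the perceptron
        TrainingParameters params = new TrainingParameters();
        params.put(TrainingParameters.ALGORITHM_PARAM, "PERCEPTRON");
        params.put(TrainingParameters.ITERATIONS_PARAM, "100");
        params.put(TrainingParameters.CUTOFF_PARAM, "5");

        // Training data in the usual one-sentence-per-line format with
        // <START:person> ... <END> marks
        ObjectStream<NameSample> samples = new NameSampleDataStream(
            new PlainTextByLineStream(
                new MarkableFileInputStreamFactory(new File("train.txt")),
                StandardCharsets.UTF_8));

        // 10-fold cross validation on the training data only
        TokenNameFinderCrossValidator cv = new TokenNameFinderCrossValidator(
            "de", "person", params, new TokenNameFinderFactory());
        cv.evaluate(samples, 10);

        System.out.println(cv.getFMeasure());
    }
}

That should quickly show whether the model generalizes at all beyond the names it has already seen, independently of your 20% hold-out set.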
Even a really bad model would reach precision and recall in the 30 to 50% range.

Jörn

On Tue, Jun 28, 2016 at 3:36 PM, William Colen <william.co...@gmail.com> wrote:

> Did you configure a custom feature generator and model type?
>
> 2016-06-28 10:03 GMT-03:00 Julian Bunzel <jbun...@zedat.fu-berlin.de>:
>
> > Hey guys,
> >
> > I am currently trying to build my own NameFinderME model. The corpus
> > (from LCC) has around 20K sentences, and every sentence contains at
> > least one (single-token) forename.
> > I have a name dictionary with the names that occur in the corpus and
> > split it into an 80% set and a 20% set. I then tagged the sentences
> > with a dictionary tagger using the 80% set and trained the
> > NameFinderME model on the tagged sentences.
> >
> > Everything works fine, but the results are not the best. When I run
> > the created model on the same (untagged) corpus, it finds ~99% of the
> > forenames I used for training, but unfortunately it finds hardly any
> > new entities or entities from the 20% set.
> >
> > Some information:
> > Sentences in corpus: ~20K
> > Distinct names occurring in corpus: ~2400
> > Names in 80% set: ~1920 (trained with cutoff = 5; ~980 names remaining
> > for training)
> > Names in 20% set: ~480
> > Names found overall with my own model: ~1100
> >
> > 99% of the 980 names used for training are found, 20% of the "cut off"
> > names are found, and <1% of the names from the 20% test set are found.
> >
> > Perhaps you can give me some pointers on how to increase the share of
> > newly found entities, or maybe I got something wrong.
> >
> > Cheers,
> >
> > Julian
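For reference, the training step Julian describes might look roughly like the sketch below (untested, OpenNLP 1.6 API; the file names, the "de" language code and the "person" type are assumptions about his setup). NameFinderME expects one whitespace-tokenized sentence per line, with names marked by <START:person> ... <END>; if the dictionary tagger writes out untokenized text, the features seen when tagging new text will not match the ones seen during training.

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderFactory;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class TrainForenameModel {

    public static void main(String[] args) throws Exception {
        // One whitespace-tokenized sentence per line, e.g. (made-up example):
        //   Gestern traf <START:person> Julia <END> ihre Schwester .
        ObjectStream<NameSample> samples = new NameSampleDataStream(
            new PlainTextByLineStream(
                new MarkableFileInputStreamFactory(new File("tagged-sentences.txt")),
                StandardCharsets.UTF_8));

        // Same cutoff as in Julian's setup
        TrainingParameters params = new TrainingParameters();
        params.put(TrainingParameters.ALGORITHM_PARAM, "MAXENT");
        params.put(TrainingParameters.ITERATIONS_PARAM, "100");
        params.put(TrainingParameters.CUTOFF_PARAM, "5");

        TokenNameFinderModel model = NameFinderME.train(
            "de", "person", samples, params, new TokenNameFinderFactory());

        try (OutputStream out = new FileOutputStream("de-ner-person.bin")) {
            model.serialize(out);
        }
    }
}

Whatever tokenization produced the training lines should then also be applied to the text that is tagged with the finished model.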