- Use the cross validation support to test how well the model performs on
your training data (see the sketch right after this list)
- Try the perceptron trainer instead of maxent (also shown in the sketch)
- Try adding a word2vec dictionary
- Are there tokenization issues? (see the format note further down)
- Is there a difference between the data you use for training and the data
you tag?
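
For the first two points, here is a rough sketch of how a cross validation
run with the perceptron could look through the Java API (written against the
1.6-style API; train.txt, the "person" type and the parameter values are just
placeholders you would need to adapt):

import java.io.File;
import java.nio.charset.StandardCharsets;

import opennlp.tools.ml.perceptron.PerceptronTrainer;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderCrossValidator;
import opennlp.tools.namefind.TokenNameFinderFactory;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class NameFinderCV {

    public static void main(String[] args) throws Exception {
        // Read the training data in the name finder format
        // (train.txt is a placeholder for your tagged corpus)
        ObjectStream<String> lines = new PlainTextByLineStream(
                new MarkableFileInputStreamFactory(new File("train.txt")),
                StandardCharsets.UTF_8);
        ObjectStream<NameSample> samples = new NameSampleDataStream(lines);

        // Switch from the default maxent trainer to the perceptron and
        // lower the feature cutoff (a cutoff of 5 drops features seen
        // fewer than 5 times, which hits rare names hard)
        TrainingParameters params = TrainingParameters.defaultParams();
        params.put(TrainingParameters.ALGORITHM_PARAM,
                PerceptronTrainer.PERCEPTRON_VALUE);
        params.put(TrainingParameters.CUTOFF_PARAM, "1");
        params.put(TrainingParameters.ITERATIONS_PARAM, "300");

        // 10-fold cross validation over the training data;
        // "person" is just an example entity type
        TokenNameFinderCrossValidator cv = new TokenNameFinderCrossValidator(
                "en", "person", params, new TokenNameFinderFactory());
        cv.evaluate(samples, 10);

        // Precision, recall and F1 aggregated over all folds
        System.out.println(cv.getFMeasure());
    }
}

That gives you a much better picture than running the model over its own
training data.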

Even a really bad model should get precision and recall in the range of
30 to 50%.
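
On the tokenization point: check that the output of your dictionary tagger is
really in the format the name finder trainer expects, i.e. one sentence per
line, whitespace-separated tokens, and the name spans marked with tags that
are themselves separated from the tokens by whitespace (the sentence below is
just a made-up example):

<START:person> Julia <END> went to Berlin yesterday .

If the tags stick to the tokens, or the tokenization at tagging time differs
from the one used at training time, that alone can hurt recall on unseen
names.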

Jörn

On Tue, Jun 28, 2016 at 3:36 PM, William Colen <william.co...@gmail.com>
wrote:

> Did you configure a custom feature generator and model type?
>
> 2016-06-28 10:03 GMT-03:00 Julian Bunzel <jbun...@zedat.fu-berlin.de>:
>
> > Hey guys,
> >
> > I am currently trying to build my own NameFinderME model. The corpus
> > (from LCC) has around 20K sentences, and every sentence contains at
> > least one (one-word) forename.
> > I have a name dictionary with the names that occur in the corpus and
> > split it into an 80% set and a 20% set. After that I tagged the
> > sentences with a dictionary tagger using the 80% set and trained the
> > NameFinderME model on the tagged sentences.
> >
> > Everything works fine, but the results are not very good. When I use
> > the trained model on the same (untagged) corpus, it finds ~99% of the
> > forenames that I used for training, but unfortunately it doesn't find
> > many new entities or entities from the 20% set.
> >
> > Some information:
> > Sentences in corpus: ~20K
> > Distinct names occurring in corpus: ~2400
> > Names in 80% set: ~1920 (trained with cutoff = 5; ~980 names remaining
> > for training)
> > Names in 20% set: ~480
> > Names found overall with my own model: ~1100
> >
> > 99% of the ~980 names used for training are found, 20% of the "cut off"
> > names are found, and <1% of the names from the test set are found.
> >
> > Perhaps you can give me some advice on how to increase the percentage
> > of newly found entities, or maybe I got something wrong.
> >
> > Cheers,
> >
> > Julian
> >
> >
>
