Did you configure a custom feature generator and model type?

2016-06-28 10:03 GMT-03:00 Julian Bunzel <jbun...@zedat.fu-berlin.de>:

> Hey guys,
>
> I am currently trying to build my own NameFinderME model. The corpus (from
> LCC) has around 20K sentences, and every sentence contains at least one
> (one-term) forename.
> I have a dictionary of the names that occur in the corpus and split it
> into an 80% set and a 20% set. I then tagged the sentences with a
> dictionary tagger using only the 80% set and trained the NameFinderME
> model on the tagged sentences.
>
> Everything runs fine, but the results are not very good. When I apply
> the trained model to the same (untagged) corpus, it finds ~99% of the
> forenames that I used for training, but unfortunately it finds hardly
> any new entities or entities from the 20% set.
>
> Some information:
> Sentences in corpus: ~20K
> Different names occurring in corpus: ~2400
> Names in 80% set: ~1920 (Trained with Cutoff = 5; ~980 remaining names for
> training)
> Names in 20% set: ~480
> Overall found names with own model: ~1100
>
> 99% of the 980 names used for training are found, 20% of the "cut off"
> names are found, and <1% of the names from the 20% (test) set are found.
>
> Perhaps you can give me some advice on how to increase the percentage
> of newly found entities, or tell me whether I got something wrong.
>
> Cheers,
>
> Julian
>
>
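
For readers following the thread, the preprocessing described in the quoted
message (split the name dictionary 80/20, then dictionary-tag the sentences
with the 80% part only) could look roughly like the sketch below. It is only
an illustration under assumptions, not the poster's actual code: the helper
names split_names and tag_sentence, the fixed seed, the toy name list and the
entity type "person" are all made up here. The output markup follows the
whitespace-tokenized <START:type> ... <END> format that the OpenNLP name
finder trainer reads, one sentence per line.

```python
import random


def split_names(names, train_fraction=0.8, seed=0):
    """Shuffle the unique names and split them into a training and a held-out set."""
    shuffled = sorted(set(names))           # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return set(shuffled[:cut]), set(shuffled[cut:])


def tag_sentence(tokens, known_names, entity_type="person"):
    """Wrap every single-token name found in known_names in name-finder markup."""
    out = []
    for tok in tokens:
        if tok in known_names:
            out.extend([f"<START:{entity_type}>", tok, "<END>"])
        else:
            out.append(tok)                 # names outside the dictionary stay untagged
    return " ".join(out)


# Toy data (hypothetical); the real corpus has ~20K sentences and ~2400 names.
names = ["Anna", "Ben", "Clara", "David", "Eva"]
train_names, heldout_names = split_names(names)

# Tag with an explicit dictionary so the example does not depend on the shuffle.
line = tag_sentence("Yesterday Anna met Ben .".split(), {"Anna"})
print(line)
```

Note that with this kind of tagging, any held-out name that appears in a
training sentence (Ben in the toy sentence) is left unmarked and is therefore
presented to the trainer as a non-name.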
