What does good training data look like?

Em Sun, 02 Oct 2011 15:59:43 -0700

Hello list,

I am currently trying to create a person model for a specific domain for
testing purposes.
While the general suggestion is to have around 10k-15k sentences, I
retrain and re-evaluate the model on my training data as I tag new
sentences.


At the moment I have fewer than 1k sentences. However, I have been
wondering whether it makes sense to include sentences without persons.
My experiments gave no clear answer: precision almost always increased
when I included sentences without persons, while recall *sometimes*
dropped a little.
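
For anyone who wants to reproduce this kind of comparison, here is a
minimal sketch of how precision and recall could be computed over tagged
person spans. The span representation (sentence index, start token, end
token) and the function name are assumptions made for illustration; they
are not taken from any particular toolkit, and the spans are made-up
example values.

```python
def precision_recall(gold, predicted):
    """Return (precision, recall) for two collections of entity spans.

    A span counts as correct only if it matches a gold span exactly.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    # Guard against division by zero when either side is empty.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical spans: (sentence index, first token, one past last token).
gold = {(0, 0, 2), (1, 3, 4), (3, 1, 3)}
predicted = {(0, 0, 2), (2, 0, 1), (3, 1, 3)}

p, r = precision_recall(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f}")
```

Note that in an evaluation set, a sentence without persons has no gold
spans, so any span predicted there counts as a false positive and lowers
precision without affecting recall.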

Are there general guidelines for tagging training data?

By the way, this is the first time I am preparing training data. I have
never seen a complete training dataset before.

Any shared experience is appreciated!

Thanks!

Regards,
Em
