Hello list,

I am currently trying to create a person model for a specific domain, for testing purposes. The general recommendation is to have around 10k-15k training sentences, but I retrain the model and re-evaluate the results from my training data as I tag new sentences.
At the moment I am under 1k sentences. However, I have been wondering whether it makes sense to include sentences that contain no persons. My experiments so far were inconclusive: precision almost always increased when I included sentences without persons, while recall *sometimes* dropped slightly. Is there a general guideline for tagging training data?

By the way, this is the first time I am preparing training data, and I have never seen a complete training data set before. Any experiences are appreciated!

Thanks!

Regards,
Em