Hello all,

Does anyone have insights on whether a document categorizer works better when trained on a single data set or when split across multiple data sets?

I am building a sentence category classifier (e.g. surprise, disgust, sarcasm, unknown). I would like to know which of these is more effective: (a) a single DocumentCategorizerME instance trained on data covering all four categories, with a confidence threshold of, say, 0.25 to decide on a category, or (b) three DocumentCategorizerME instances (one each for surprise, disgust, and sarcasm), each with a confidence threshold of, say, 0.6 to decide on a category, falling back to unknown otherwise.

I am new to this mailing list and apologize if this question is off topic. Please point me in the right direction.

Best,
Neeraj
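
P.S. To make the two setups concrete, here is a minimal sketch of the decision step only, in plain Java with no OpenNLP calls. It assumes the per-category probabilities have already been obtained from the categorizer(s). The class and method names (ThresholdDecision, decideSingle, decidePerCategory) are hypothetical and only there for illustration.

```java
import java.util.Map;

final class ThresholdDecision {

    static final String UNKNOWN = "unknown";

    /**
     * Single-model setup: probs[i] is the score the one multi-class
     * model assigned to categories[i] for a given sentence.
     */
    static String decideSingle(String[] categories, double[] probs, double threshold) {
        int best = 0;
        for (int i = 1; i < probs.length; i++) {
            if (probs[i] > probs[best]) {
                best = i;
            }
        }
        // Accept the top category only if it clears the threshold.
        return probs[best] >= threshold ? categories[best] : UNKNOWN;
    }

    /**
     * Per-category setup: each map value is the positive-class score
     * from that category's own binary model for the same sentence.
     */
    static String decidePerCategory(Map<String, Double> positiveProbs, double threshold) {
        String best = UNKNOWN;
        double bestProb = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : positiveProbs.entrySet()) {
            if (entry.getValue() > bestProb) {
                best = entry.getKey();
                bestProb = entry.getValue();
            }
        }
        // If no model is confident enough, fall back to unknown.
        return bestProb >= threshold ? best : UNKNOWN;
    }
}
```

One thing I noticed while writing this: if the four probabilities from a single model sum to 1, the highest one is always at least 0.25, so a 0.25 threshold in the single-model setup would never fall back to unknown.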