Hello there,

We are using OpenNLP for document categorization with ngram features to categorize our incoming text. For example:

"The shape of water and Frances McDormand rule oscar 2018"

Given this sentence, we would like to arrive at:

Shape of Water : Movie
Frances McDormand : Actress

We are able to achieve this with the following document categorization training data and the ngram features:

Movie Shape of Water
Actress Frances McDormand

*What is not working:*

If we try to categorize a single word, say Oscar as an Award category, we are not able to. Any idea how we can get this working?

*Target training data*

Movie Shape of Water
Actress Frances McDormand
Award Oscar

*Desired Output :*

Shape of Water : Movie
Frances McDormand : Actress
Oscar : Award

Implementation details :

OpenNLP version : 1.8.4
Training algorithm used : Naive Bayes
Iterations set : 100

*General Questions*

Q : Why can't we use NER?
A : We need ngram feature analysis, which is not possible in NER.

Q : Are we going to build our own training data?
A : Yes

We would really appreciate any help towards solving this issue.

--
Thanks and Regards
Manjunath