Hello There,

We are using the OpenNLP document categorizer with n-gram features to
categorize our incoming text. For example:

"The shape of water and Frances McDormand rule oscar 2018"

Given this sentence, we would like to arrive at:

Shape of Water : Movie
Frances McDormand : Actress

We are able to achieve this with the following document categorization
training data and the n-gram features:

Movie Shape of Water
Actress Frances McDormand

*What is not working:*
If we try to categorize a single word, say Oscar, under an Award category,
it does not work. Any idea how we can get this working?

*Target training data*
Movie Shape of Water
Actress Frances McDormand
Award Oscar

*Desired Output :*
Shape of Water : Movie
Frances McDormand : Actress
Oscar : Award

*Implementation details*
OpenNLP version : 1.8.4
Training algorithm used : Naive Bayes
Iterations set : 100
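
To illustrate what we suspect may be happening, here is a minimal sketch in
plain Java. It is a simplified, hypothetical n-gram extractor written for
this mail, not OpenNLP's actual code, and the class and method names
(NGramSketch, extractNGrams) are our own. It assumes the n-gram features are
generated with a minimum n-gram size of 2, which we have not confirmed for
our setup. Under that assumption, a one-word entry such as "Oscar" produces
no features at all:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical n-gram extractor for illustration only.
// This is NOT OpenNLP's implementation.
class NGramSketch {

    // Returns every contiguous run of minGram..maxGram tokens, joined by spaces.
    static List<String> extractNGrams(String[] tokens, int minGram, int maxGram) {
        List<String> features = new ArrayList<>();
        for (int start = 0; start < tokens.length; start++) {
            StringBuilder gram = new StringBuilder();
            // Grow the n-gram one token at a time, up to maxGram tokens.
            for (int len = 1; len <= maxGram && start + len <= tokens.length; len++) {
                if (len > 1) {
                    gram.append(' ');
                }
                gram.append(tokens[start + len - 1]);
                // Only emit n-grams that are at least minGram tokens long.
                if (len >= minGram) {
                    features.add(gram.toString());
                }
            }
        }
        return features;
    }

    public static void main(String[] args) {
        String[] movie = {"Shape", "of", "Water"};
        String[] award = {"Oscar"};

        // Bigrams only (assumed setting): multi-word entries produce features.
        System.out.println(extractNGrams(movie, 2, 2)); // prints [Shape of, of Water]

        // Bigrams only: a single word produces nothing to train or match on.
        System.out.println(extractNGrams(award, 2, 2)); // prints []

        // Allowing unigrams as well: the single word becomes a feature.
        System.out.println(extractNGrams(award, 1, 2)); // prints [Oscar]
    }
}
```

If this is indeed the cause, lowering the minimum n-gram size to 1, or adding
a bag-of-words feature next to the n-gram feature, might let single words such
as Oscar be categorized. We have not verified this.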

*General Questions*
Q : Why can't we use NER?
A : We need n-gram feature analysis, which is not possible in NER.

Q : Are we going to build our own training data?
A : Yes

We would really appreciate any help in solving this issue.

-- 
Thanks and Regards
Manjunath
