Hi Manjunath,

The best way is to go with NER.
I don't understand what you mean by N-gram feature analysis; it would be helpful if you could elaborate.
From your example, I see that all of the matches are exact, so I suggest you go with a Dictionary Name Finder.

Thanks,
Manoj

On Mon, Mar 5, 2018 at 4:16 PM, manjunath nakshathri <nakshat...@gmail.com> wrote:

> Hello There,
>
> We are using OpenNLP for document categorization with n-gram features to
> categorize our incoming text. For example:
>
> "The shape of water and Frances McDormand rule oscar 2018"
>
> Given this sentence, we would like to arrive at:
>
> Shape of Water : Movie
> Frances McDormand : Actress
>
> We are able to achieve this with the following document categorization
> training data and the n-gram features:
>
> Movie Shape of Water
> Actress Frances McDormand
>
> *What is not working:*
> If we try to categorize a single word, say Oscar, as an award category, we
> are not able to. Any idea how we can get this working?
>
> *Target training data*
> Movie Shape of Water
> Actress Frances McDormand
> Award Oscar
>
> *Desired Output:*
> Shape of Water : Movie
> Frances McDormand : Actress
> Oscar : Award
>
> Implementation details:
> OpenNLP version: 1.8.4
> Training algorithm used: Naive Bayes
> Iterations set: 100
>
> *General Questions*
> Q: Why can't we use NER?
> A: We need n-gram feature analysis, which is not possible in NER.
>
> Q: Are we going to build our own training data?
> A: Yes
>
> Really appreciate any help towards solving this issue.
>
> --
> Thanks and Regards
> Manjunath

--
Regards,
Manoj
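
P.S. To illustrate why a dictionary lookup handles the single-word "Oscar" case as easily as the multi-word entries, here is a minimal sketch of the idea in plain Python. It is not OpenNLP's API (OpenNLP ships its own DictionaryNameFinder class); the function name `find_names` and the dictionary layout are hypothetical, and the matching is assumed to be case-insensitive over whitespace-separated tokens.

```python
def find_names(tokens, dictionary):
    """Return (start, end, label) spans for dictionary entries found in tokens.

    Hypothetical helper for illustration only; not part of OpenNLP.
    """
    # Index each entry by its lowercased token tuple for case-insensitive lookup.
    index = {
        tuple(word.lower() for word in name.split()): label
        for name, label in dictionary.items()
    }
    max_len = max((len(key) for key in index), default=0)
    lowered = [t.lower() for t in tokens]

    spans = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest window first so multi-word entries win over shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = index.get(tuple(lowered[i:i + n]))
            if label is not None:
                match = (i, i + n, label)
                break
        if match:
            spans.append(match)
            i = match[1]  # continue after the matched span
        else:
            i += 1
    return spans


# Entries taken from the target training data in the quoted message.
entries = {
    "Shape of Water": "Movie",
    "Frances McDormand": "Actress",
    "Oscar": "Award",
}
tokens = "The shape of water and Frances McDormand rule oscar 2018".split()

for start, end, label in find_names(tokens, entries):
    print(" ".join(tokens[start:end]), ":", label)
```

Because every entry is matched as an exact token sequence, a one-token entry needs no special treatment; the trade-off is that anything not in the dictionary is never found.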