The most straightforward way to do what you want would be to define a string-similarity measure (e.g. Levenshtein distance) and then, for each word in S2, iterate over S1 and discard every word whose distance from it exceeds some predefined threshold. You are actually overcomplicating the problem by using maxent...
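
Something along these lines is what I have in mind. It is only a rough sketch: the class and method names (EditDistanceFilter, levenshtein, closeWords) are made up for illustration and are not part of OpenNLP, and the threshold is something you would have to tune for your data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of OpenNLP.
class EditDistanceFilter {

    // Levenshtein distance via dynamic programming; insertions,
    // deletions and substitutions each cost 1.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1], prev[j]) + 1;
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Keep only the words of s1 whose distance to the given S2 word
    // is at most maxDistance (an assumed, user-chosen threshold).
    static List<String> closeWords(String s2Word, List<String> s1, int maxDistance) {
        List<String> result = new ArrayList<>();
        for (String candidate : s1) {
            if (levenshtein(s2Word, candidate) <= maxDistance) {
                result.add(candidate);
            }
        }
        return result;
    }
}
```

You would call closeWords once per word in S2. Since S1 is much larger than S2, the cost is roughly ||S1|| * ||S2|| distance computations, which may or may not be acceptable depending on your sizes.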

hope that helps,
Jim



On 02/10/13 04:10, George Ramonov wrote:
Hi everyone,

I am new to the OpenNLP maxent classifier, and I have a question regarding
using features that are label-dependent.

I have two sets of words (S1 and S2, where ||S1|| >> ||S2||), and I am
trying to find words from S2 that are most similar to S1 using
features I designed. I turned this into a classification problem, treating
words from S2 as labels, and built a nice training set. However, my
features are dependent on the labels themselves. I can't find a simple way in
OpenNLP to utilize labels in the prediction process. My guess is I would
have to subclass MaxentModel and implement the eval() method? Is there an
easier way to solve this problem? Or perhaps, maximum entropy is not the
best algorithm of choice?

Thanks,
George

