The most straightforward way of doing what you want would be to
define a string-similarity measure (e.g. Levenshtein distance) and then,
for each word in S2, simply iterate over S1 and discard every word whose
distance exceeds some predefined threshold. You are actually
overcomplicating the problem by using maxent...
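A minimal Java sketch of that approach, assuming plain Levenshtein distance and an inclusive threshold (a word is kept if its distance is at most maxDistance). The class and method names (SimilarWords, levenshtein, matches) and the sample words are hypothetical and have nothing to do with the OpenNLP API; List.of needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SimilarWords {

    // Dynamic-programming Levenshtein distance: the minimum number of
    // single-character insertions, deletions and substitutions turning a into b.
    // Only two rows of the DP table are kept, so memory is O(|b|).
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // empty prefix of a -> first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // first i chars of a -> empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // last row is now in prev
        }
        return prev[b.length()];
    }

    // For each word in s2, iterate over s1 and keep only the words whose
    // distance does not exceed maxDistance (all others are discarded).
    public static Map<String, List<String>> matches(List<String> s1, List<String> s2,
                                                    int maxDistance) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String w2 : s2) {
            List<String> close = new ArrayList<>();
            for (String w1 : s1) {
                if (levenshtein(w1, w2) <= maxDistance) {
                    close.add(w1);
                }
            }
            result.put(w2, close);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> s1 = List.of("colour", "color", "colder", "flavour", "house");
        List<String> s2 = List.of("color", "flavor");
        // prints {color=[colour, color], flavor=[flavour]}
        System.out.println(matches(s1, s2, 1));
    }
}
```

The brute-force loop costs |S1| x |S2| distance computations; since |S2| is the small set, that is usually tolerable, and the threshold is the only parameter to tune.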
Hope that helps,
Jim
On 02/10/13 04:10, George Ramonov wrote:
Hi everyone,
I am new to OpenNLP maxent classifier, and I have a question regarding
using features that are label-dependent.
I have two sets of words (S1 and S2, where |S1| >> |S2|), and I am
trying to find words from S2 that are most similar to S1 using
features I designed. I turned this into a classification problem, treating
words from S2 as labels, and built a nice training set. However, my
features depend on the labels themselves. I can't find a simple way in
OpenNLP to utilize labels in the prediction process. My guess is I would
have to subclass MaxentModel and implement the eval() method? Is there an
easier way to solve this problem? Or is maximum entropy perhaps not the
best algorithm of choice?
Thanks,
George