Hello all,

I understand that maximum entropy models are excellent at categorizing
documents. As it turns out, I have a situation where one document can be in
many categories (a 1:m relationship). I believe that I could create training
data that looks something like:

category_1 <text>
category_2 <text>
...

If I do this, will the resulting probability model return category
probabilities as Pr(<text> in category_m) = 1/m for all categories m, or will
it return Pr(<text> in category_m) = 1 for all categories m?

This is a very important distinction. I really hope it is the latter. If it
isn't, is there a way to make sure that when I receive a text that is
similar to the training data and fits into multiple categories, I get a
probability close to 1 for each of them?
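
To make the two outcomes concrete, here is a minimal sketch of what I mean. It
assumes a plain softmax-style maximum entropy model trained by gradient ascent;
the toy texts, the function names (softmax, predict, train_maxent) and the step
count and learning rate are all made up for illustration and do not come from
any particular toolkit. Setup (1) is the single model trained on one line per
(category, text) pair, as above. Setup (2) is one possible alternative: a
separate two-class "in / not in" model per category.

```python
import math

def softmax(scores):
    """Normalize raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, x):
    """Class probabilities for feature vector x (one weight row per class)."""
    return softmax([sum(w * f for w, f in zip(row, x)) for row in weights])

def train_maxent(examples, n_classes, steps=500, lr=0.5):
    """Fit an unregularized softmax model by batch gradient ascent."""
    n_features = len(examples[0][0])
    weights = [[0.0] * n_features for _ in range(n_classes)]
    for _ in range(steps):
        grad = [[0.0] * n_features for _ in range(n_classes)]
        for x, y in examples:
            probs = predict(weights, x)
            for k in range(n_classes):
                err = (1.0 if k == y else 0.0) - probs[k]
                for j, f in enumerate(x):
                    grad[k][j] += err * f
        for k in range(n_classes):
            for j in range(n_features):
                weights[k][j] += lr * grad[k][j] / len(examples)
    return weights

# Hypothetical toy data: text A is in categories 0, 1 and 2; text B only in 3.
texts = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
labels = {"A": {0, 1, 2}, "B": {3}}

# (1) Single model: one training line per (category, text) pair.
flat = [(texts[t], c) for t in sorted(labels) for c in sorted(labels[t])]
joint = predict(train_maxent(flat, n_classes=4), texts["A"])

# (2) One two-class model per category (class 1 means "in the category").
independent = []
for c in range(4):
    binary = [(texts[t], 1 if c in labels[t] else 0) for t in sorted(labels)]
    model = train_maxent(binary, n_classes=2)
    independent.append(predict(model, texts["A"])[1])

print("single model :", [round(p, 3) for p in joint])
print("per category :", [round(p, 3) for p in independent])
```

In setup (1) the probabilities for one text are normalized across categories,
so they have to sum to 1 and text A ends up with roughly 1/3 for each of its
three categories. In setup (2) each category is scored on its own, so text A
can get a probability close to 1 for every category it belongs to.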

Thanks,
~Ben
