I can probably live with Pr() >> 0 for the matching labels. One reasonable option might be to require that a sum of probabilities clears a certain margin: sort the probabilities in descending order and select the top few that together sum past a threshold. That could actually work.
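
Roughly what I have in mind, as a minimal Python sketch rather than anything OpenNLP-specific. The function name `select_labels` and the 0.9 threshold are made up for illustration, and the scores are assumed to be the normalized per-category probabilities the classifier returns:

```python
def select_labels(probs, threshold=0.9):
    """Return the top-ranked labels whose probabilities sum to at least `threshold`.

    `probs` maps label -> probability and is assumed to sum to 1.
    `threshold` is a hypothetical cutoff that would need tuning on real data.
    """
    # Rank labels from most to least probable (ties keep their input order).
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)

    selected = []
    total = 0.0
    for label, p in ranked:
        selected.append(label)
        total += p
        # Stop as soon as the accumulated probability mass clears the threshold.
        if total >= threshold:
            break
    return selected


# The 3-class example from the thread: two likely categories, one unlikely.
scores = {"cat_1": 0.49, "cat_2": 0.49, "cat_3": 0.02}
print(select_labels(scores, threshold=0.9))  # ['cat_1', 'cat_2']
```

One caveat with this approach: when no category stands out (say 0.4, 0.3, 0.3), a high threshold pulls in every label, so the cutoff probably has to be checked against held-out documents.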

~Ben

On Thu, Apr 12, 2018 at 10:26 PM, <dr...@apache.org> wrote:

> Hi Ben,
>
> if a document can be in multiple categories, you should see it
> reflected in the probabilities. The top categories will be close in
> score. It will not be 1/m because that would imply that ALL categories are
> “equally probable” or you have no idea. However, if you have 3 classes and
> two are likely, it may be 0.49,0.49,0.02. Remember that the results are
> normalized by a softmax at the end. So the sum of all probabilities
> will always be 1.
> Sorry, but multi-class classification is more complicated than binary
> classification. If you really are interested in multi-label
> classification, I’m not sure maxent (at least the way openNLP formulated
> the solution) is appropriate for your needs. You might want to consider
> individual binary classifiers for each label. Have 1 model for each label:
>
> train_cat1.txt...
> cat_1_TRUE <text>
> cat_1_FALSE <text>
> …
>
> train_cat2.txt…
> cat_2_FALSE <text>
> cat_2_TRUE <text>
>
> Hope it helps, Let me know what you wind up doing...
> Daniel
>
>
> On Apr 12, 2018, at 4:22 PM, Benedict Holland <
> benedict.m.holl...@gmail.com> wrote:
> >
> > Hello all,
> >
> > I understand that maximum entropy models are excellent at categorizing
> > documents. As it turns out, I have a situation where 1 document can be in
> > many categories (1:m relationship). I believe that I could create training
> > data that looks something like:
> >
> > category_1 <text>
> > category_2 <text>
> > ...
> >
> > If I do this, will the resulting probability model return category
> > probabilities as Pr(<text> in category_m) = 1/m for all categories m, or
> > will it return Pr(<text> in category_m) = 1 for all categories m?
> >
> > This is a very important distinction. I really hope it is the latter. If it
> > isn't, do you have a way to make sure that if I receive a text that is
> > similar to the training data, I can get a probability close to 1 if it fits
> > into multiple categories?
> >
> > Thanks,
> > ~Ben
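
For reference, the one-model-per-label layout suggested in the quoted reply could be generated along these lines. This is only a sketch in Python under a few assumptions: the function name `build_binary_training_sets` is hypothetical, each document is assumed to fit on a single line, and the `<label>_TRUE` / `<label>_FALSE` outcome names simply follow the pattern in the quoted example. Training the actual models would still happen in OpenNLP, once per label.

```python
def build_binary_training_sets(documents, labels):
    """Split multi-label documents into one binary training set per label.

    `documents` is a list of (set_of_labels, text) pairs.
    Returns {label: [lines]}, where each line is "<outcome> <text>".
    """
    training = {label: [] for label in labels}
    for doc_labels, text in documents:
        for label in labels:
            # Every document appears in every label's set, as TRUE or FALSE.
            outcome = f"{label}_TRUE" if label in doc_labels else f"{label}_FALSE"
            training[label].append(f"{outcome} {text}")
    return training


# Hypothetical documents: the first one belongs to two categories at once.
docs = [
    ({"cat_1", "cat_2"}, "text that belongs to both categories"),
    ({"cat_2"}, "text that belongs only to the second category"),
]

sets = build_binary_training_sets(docs, ["cat_1", "cat_2"])
for label, lines in sets.items():
    print(f"train_{label}.txt")  # illustrative file name, nothing is written to disk
    for line in lines:
        print("  " + line)
```

With separate binary models, each label gets its own TRUE probability, so a document that fits several categories can score close to 1 on each of them instead of splitting a single probability mass.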