I can actually live with Pr() >> 0 for the matching labels. What might be
a reasonable option is to require a cumulative probability above a certain
margin: sort the probabilities in descending order and select the top few
categories whose probabilities sum over a threshold. That could actually work.
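
For what it's worth, a minimal sketch of that selection rule (plain Java;
the threshold value, class name, and array names are just placeholders, and
the probabilities are whatever the categorizer hands back):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class CumulativeThreshold {

    // Return the smallest set of labels whose probabilities, taken in
    // descending order, sum past the threshold.
    static List<String> select(String[] labels, double[] probs, double threshold) {
        Integer[] idx = new Integer[probs.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble((Integer i) -> probs[i]).reversed());

        List<String> picked = new ArrayList<>();
        double sum = 0.0;
        for (int i : idx) {
            picked.add(labels[i]);
            sum += probs[i];
            if (sum >= threshold) break;
        }
        return picked;
    }

    public static void main(String[] args) {
        String[] labels = {"cat_1", "cat_2", "cat_3"};
        double[] probs = {0.49, 0.02, 0.49};
        // Prints [cat_1, cat_3]: 0.49 + 0.49 clears the 0.9 threshold.
        System.out.println(select(labels, probs, 0.9));
    }
}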

~Ben

On Thu, Apr 12, 2018 at 10:26 PM, <dr...@apache.org> wrote:

> Hi Ben,
>
>    if a document can be in multiple categories, you should see that
> reflected in the probabilities.  The top categories will be close in
> score.  It will not be 1/m, because that would imply that ALL categories
> are “equally probable” or that you have no idea.  However, if you have 3
> classes and two are likely, it may be 0.49, 0.49, 0.02.  Remember that the
> results are normalized by a softmax at the end, so the sum of all
> probabilities will always be 1.
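> Concretely, that normalization is just (a tiny plain-Java sketch; the raw
> scores below are made up):
>
> // Softmax: exponentiate each raw model score, then divide by the total,
> // so the outputs are non-negative and sum to exactly 1.
> double[] scores = {2.0, 2.0, -1.2};   // hypothetical per-category scores
> double[] probs = new double[scores.length];
> double total = 0.0;
> for (double s : scores) total += Math.exp(s);
> for (int i = 0; i < scores.length; i++) {
>     probs[i] = Math.exp(scores[i]) / total;
> }
> // probs comes out to roughly {0.49, 0.49, 0.02} -- two strong categories.
>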
>    Sorry, but multi-class classification is more complicated than binary
> classification.  If you are really interested in multi-label
> classification, I’m not sure maxent (at least the way OpenNLP formulates
> the solution) is appropriate for your needs.  You might want to consider
> individual binary classifiers for each label.  Have 1 model for each
> label (a rough training sketch follows the example files below):
>
> train_cat1.txt...
> cat_1_TRUE <text>
> cat_1_FALSE <text>
> …
>
> train_cat2.txt…
> cat_2_FALSE <text>
> cat_2_TRUE <text>
>
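> Roughly, training one such model with the OpenNLP doccat API could look
> like the sketch below (exact class and method signatures may differ a bit
> by OpenNLP version; the file names are the ones above):
>
> import java.io.File;
> import java.nio.charset.StandardCharsets;
> import opennlp.tools.doccat.DoccatFactory;
> import opennlp.tools.doccat.DoccatModel;
> import opennlp.tools.doccat.DocumentCategorizerME;
> import opennlp.tools.doccat.DocumentSample;
> import opennlp.tools.doccat.DocumentSampleStream;
> import opennlp.tools.util.MarkableFileInputStreamFactory;
> import opennlp.tools.util.ObjectStream;
> import opennlp.tools.util.PlainTextByLineStream;
> import opennlp.tools.util.TrainingParameters;
>
> public class PerLabelTraining {
>     // Train one binary doccat model from a file in the
>     // "<category> <text>" one-sample-per-line format shown above.
>     static DoccatModel trainOne(File trainFile) throws Exception {
>         ObjectStream<String> lines = new PlainTextByLineStream(
>                 new MarkableFileInputStreamFactory(trainFile), StandardCharsets.UTF_8);
>         ObjectStream<DocumentSample> samples = new DocumentSampleStream(lines);
>         return DocumentCategorizerME.train(
>                 "en", samples, TrainingParameters.defaultParams(), new DoccatFactory());
>     }
>
>     public static void main(String[] args) throws Exception {
>         DoccatModel cat1Model = trainOne(new File("train_cat1.txt"));
>         DoccatModel cat2Model = trainOne(new File("train_cat2.txt"));
>         // At prediction time, run every model on the document and keep each
>         // label whose <cat>_TRUE probability clears your threshold.
>     }
> }
>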
> Hope it helps.  Let me know what you wind up doing.
> Daniel
>
> > On Apr 12, 2018, at 4:22 PM, Benedict Holland <benedict.m.holl...@gmail.com> wrote:
> >
> > Hello all,
> >
> > I understand that maximum entropy models are excellent at categorizing
> > documents. As it turns out, I have a situation where 1 document can be in
> > many categories (a 1:m relationship). I believe that I could create
> > training data that looks something like:
> >
> > category_1 <text>
> > category_2 <text>
> > ...
> >
> > If I do this, will the resulting probability model return category
> > probabilities as Pr(<text> in category_m) = 1/m for all categories m, or
> > will it return Pr(<text> in category_m) = 1 for all categories m?
> >
> > This is a very important distinction. I really hope it is the latter. If
> > it isn't, do you have a way to make sure that if I receive a text that is
> > similar to the training data, I can get a probability close to 1 if it
> > fits into multiple categories?
> >
> > Thanks,
> > ~Ben
>
>
