I have talked to one user who had ~60,000 classes and was able to use OLR
successfully.

The way that they did this was to arrange the output classes into a
multi-level tree and then train a classifier at each level of the tree.
 At any level, if there was a dominating result, then only that sub-tree
would be searched.  Otherwise, all of the top few sub-trees would be
searched.
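
As a concrete illustration, here is a minimal Java sketch of that
arrangement, using Mahout's OnlineLogisticRegression as the per-node
classifier.  The TreeNode class and its fields are illustrative names of
my own, not part of any Mahout API, and the prior and learning-rate
settings are placeholders:

    import java.util.List;

    import org.apache.mahout.classifier.sgd.L1;
    import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
    import org.apache.mahout.math.Vector;

    class TreeNode {
      // Routes an input among this node's children.
      OnlineLogisticRegression router;
      // Interior nodes have children; leaves carry a final category id.
      List<TreeNode> children;
      int category = -1;

      TreeNode(int numChildren, int numFeatures) {
        router = new OnlineLogisticRegression(numChildren, numFeatures, new L1())
            .learningRate(1)
            .lambda(1e-5);
      }

      // Train the router on one example whose true class lives under
      // the child at childIndex.
      void train(int childIndex, Vector features) {
        router.train(childIndex, features);
      }
    }

Training the whole tree then amounts to mapping each example's true class
to the child index at every level along the path to its leaf and calling
train on each node along that path.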

Thus, execution would proceed by evaluating the classifier at the root of
the tree.  One or more sub-trees would be selected, and each of the
classifiers at the roots of those sub-trees would be evaluated in turn.
This would give a set of sub-sub-trees, and the recursion would eventually
bottom out at possible answers.  These possible answers would then be
combined to get a final set of categories.
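
Continuing the sketch above, the search could look something like this.
The definitions here are assumptions for illustration only: "dominating"
is taken to mean the best child probability exceeds 0.9, "top few" means
the three highest-scoring children, and answers are combined by
accumulating the product of the probabilities along each path:

    import java.util.Arrays;
    import java.util.Map;

    import org.apache.mahout.math.Vector;

    class TreeSearch {
      static final double DOMINANCE = 0.9;  // assumed meaning of "dominating"
      static final int BEAM = 3;            // assumed meaning of "top few"

      static void search(TreeNode node, Vector features, double pathProb,
                         Map<Integer, Double> results) {
        if (node.children == null) {
          // Leaf: combine answers by summing path probabilities per category.
          results.merge(node.category, pathProb, Double::sum);
          return;
        }
        Vector p = node.router.classifyFull(features);  // one probability per child
        int best = p.maxValueIndex();
        if (p.get(best) > DOMINANCE) {
          // A dominating result: search only that sub-tree.
          search(node.children.get(best), features,
                 pathProb * p.get(best), results);
        } else {
          // No dominator: search the top few sub-trees.
          Integer[] order = new Integer[p.size()];
          for (int i = 0; i < order.length; i++) {
            order[i] = i;
          }
          Arrays.sort(order, (a, b) -> Double.compare(p.get(b), p.get(a)));
          for (int i = 0; i < Math.min(BEAM, order.length); i++) {
            int c = order[i];
            search(node.children.get(c), features,
                   pathProb * p.get(c), results);
          }
        }
      }
    }

Calling search(root, features, 1.0, results) fills results with category
scores; taking the highest-scoring entries gives the final set of
categories.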

The detailed meanings of "dominating", "top few", and "answers are
combined" are left as an exercise (the sketch above just picks one
plausible set of choices), but I think you can see the general outline.
 The detailed definitions are very likely application specific in any
case.



On Thu, Aug 1, 2013 at 11:25 AM, yikes aroni <[email protected]> wrote:

> Say that I am trying to determine which customers buy particular candy
> bars. So I want to classify training data consisting of candy-bar
> attributes (an N-dimensional vector of variables) into customer
> attributes (an M-dimensional vector).
>
> Is there a preferred method when N and M are large? That is, say, 100 or
> more?
>
> I have done binary classification using AdaptiveLogisticRegression and
> OnlineLogisticRegression with small numbers of input features, with
> relative success. As I'm trying to implement this for large N and M, I
> feel like I'm veering into the woods. Is there a code example anyone can
> point me to that uses Mahout libraries to do multi-class classification
> when the number of classes is large?
>
