2014-07-04 16:37 GMT+02:00 Kyle Kastner <[email protected]>:
> You should probably read the paper: Training Highly Multiclass Classifiers
> http://jmlr.org/papers/v15/gupta14a.html
>
> That said, I think you could gain a lot of value by looking into
> hierarchical approaches - training a bunch of small classifiers on subsets
> of the overall data to subselect the "right region" before trying to do a
> larger more exact classifier that focuses on specific areas.

Or maybe reduce the problem to binary classification (yes/no) of (sample, category) pairs. That way, you can also add features based on the candidate category's name, e.g. "token X occurs in both category name and sample document".

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
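
A rough sketch of the (sample, category) pair reduction, in plain Python so it stands alone. Everything here is an assumption made for illustration: the function names, the toy data, and the two overlap features are hypothetical, and the hand-rolled logistic regression is only a stand-in for whatever binary classifier you prefer (scikit-learn's LogisticRegression would be a natural choice).

```python
import math


def tokenize(text):
    """Lowercase whitespace tokenization; a real setup would use a proper vectorizer."""
    return set(text.lower().split())


def pair_features(sample, category):
    """Feature vector for one (sample, category) pair (illustrative features only)."""
    s, c = tokenize(sample), tokenize(category)
    overlap = len(s & c)
    return [
        1.0,                             # bias term
        float(overlap),                  # tokens occurring in both category name and sample
        overlap / len(c) if c else 0.0,  # fraction of category-name tokens found in the sample
    ]


def make_pairs(samples, labels, categories):
    """Expand the multiclass data set into binary (sample, category) pairs."""
    X, y = [], []
    for sample, label in zip(samples, labels):
        for cat in categories:
            X.append(pair_features(sample, cat))
            y.append(1 if cat == label else 0)  # yes/no: is this the right category?
    return X, y


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Tiny stochastic-gradient logistic regression on the pair features."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j, xj in enumerate(xi):
                w[j] += lr * (yi - p) * xj
    return w


def predict(sample, categories, w):
    """Score every candidate category for the sample and return the best one."""
    def score(cat):
        return sum(wj * xj for wj, xj in zip(w, pair_features(sample, cat)))
    return max(categories, key=score)


# Toy data, invented for this sketch.
categories = ["machine learning", "web development", "database systems"]
samples = [
    "an introduction to machine learning models",
    "tips for modern web development",
    "tuning database systems for speed",
    "learning to rank with a machine",
]
labels = ["machine learning", "web development", "database systems", "machine learning"]

X, y = make_pairs(samples, labels, categories)
w = train_logistic(X, y)
print(predict("scaling web development teams", categories, w))
```

Prediction means scoring every candidate category for a sample and taking the highest-scoring one, so the cost grows with the number of categories. With a large category set you would probably want to shortlist candidates first.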
