You should probably read the paper: Training Highly Multiclass Classifiers
http://jmlr.org/papers/v15/gupta14a.html

That said, I think you could gain a lot by looking into hierarchical
approaches: train a number of small classifiers on subsets of the overall
data to narrow things down to the "right region" first, and only then apply
a larger, more exact classifier that focuses on that specific area.
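A rough sketch of that two-level idea follows. It assumes scikit-learn's MultinomialNB at both levels and a hypothetical `group_of` dict that maps each fine category to a coarse group. For Wikipedia this might come from parent categories or from clustering, but how to build the groups is an open assumption here. `TwoLevelClassifier` is likewise a hypothetical name, not a scikit-learn class. The top model only learns the groups, and each leaf model only sees the rows and classes of its own group. As a result, no single model has to allocate the full (n_classes, n_features) array that partial_fit tries to create in the thread below. That array would hold 1018664 x 16011 float64 values, roughly 1.3e11 bytes or about 121 GiB.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB


class TwoLevelClassifier:
    """Hypothetical two-level classifier: route to a group, then pick a class.

    `group_of` maps every fine label to a coarse group id (assumed given).
    """

    def __init__(self, group_of):
        self.group_of = group_of
        self.top = MultinomialNB()  # level 1: predicts coarse groups only
        self.leaves = {}  # group id -> fitted MultinomialNB, or a single label

    def fit(self, X, y):
        y = np.asarray(y)
        groups = np.array([self.group_of[label] for label in y])
        # Level 1: a small model whose classes are the groups, not the labels.
        self.top.fit(X, groups)
        # Level 2: one model per group, trained only on that group's rows.
        for g in np.unique(groups):
            mask = groups == g
            labels = np.unique(y[mask])
            if len(labels) == 1:
                # Only one fine label in this group: nothing to discriminate.
                self.leaves[g] = labels[0]
            else:
                self.leaves[g] = MultinomialNB().fit(X[mask], y[mask])
        return self

    def predict(self, X):
        # Route each row to a group, then let that group's model decide.
        groups = self.top.predict(X)
        out = np.empty(X.shape[0], dtype=object)
        for g in np.unique(groups):
            mask = groups == g
            leaf = self.leaves[g]
            if isinstance(leaf, MultinomialNB):
                out[mask] = leaf.predict(X[mask])
            else:
                out[mask] = leaf
        return out
```

Splitting this way does not by itself reduce the total number of parameters. Its main benefit is that each piece can be trained, stored and loaded separately. The usual trade-off also applies: a row routed to the wrong group at the top level cannot be fixed by the leaf models.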

On Fri, Jul 4, 2014 at 4:17 PM, Kartik Kumar Perisetla <
[email protected]> wrote:

> Hi Lars,
> I am trying to model a classifier trained on categories present in
> Wikipedia. There are approx 1 million categories in it.
>
> Is there a way to accomplish this?
>
> Any help would be appreciated.
>
> Thanks,
> Kartik Perisetla
> On Jul 3, 2014 6:28 PM, "Lars Buitinck" <[email protected]> wrote:
>
>> 2014-07-03 12:23 GMT+02:00 Kartik Kumar Perisetla <[email protected]>:
>> > I am trying to use the naive_bayes algorithm to train the model using
>> > partial_fit in scikit-learn.
>> >
>> > I tried with 16011 features, 100 training instances and 1018664 classes
>> > in total, and I get an error when I invoke the partial_fit method. I
>> > think there is an upper limit on ma
>> >
>> > I see that partial_fit will compute np.zeros((1018664, 16011)) for this,
>> > which gives an "Array is too big" exception.
>>
>> That array would take 121 GB of storage. In any case, 1e6 classes is
>> *extremely* multiclass. What are you trying to model?
>>
>>
>> ------------------------------------------------------------------------------
>> Open source business process management suite built on Java and Eclipse
>> Turn processes into business applications with Bonita BPM Community
>> Edition
>> Quickly connect people, data, and systems into organized workflows
>> Winner of BOSSIE, CODIE, OW2 and Gartner awards
>> http://p.sf.net/sfu/Bonitasoft
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>