It sounds like you don't have enough memory to store a dense matrix of
binarized labels.
There is already a PR that tries to alleviate this problem; see
https://github.com/scikit-learn/scikit-learn/pull/2458
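
To put numbers on it: with the figures from your message (6M samples, 42K
labels), a dense indicator matrix at 8 bytes per entry needs roughly 2 TB,
while at most 5 labels per sample means at most 30M nonzero entries. Below is
a minimal sketch of building the indicator matrix directly in sparse form with
scipy. The helper names (dense_size_bytes, sparse_indicator) are made up for
illustration, and this is not necessarily how the PR above does it. Whether an
estimator accepts a sparse label matrix depends on your scikit-learn version.

```python
import numpy as np
from scipy import sparse


def dense_size_bytes(n_samples, n_labels, itemsize=8):
    """Bytes needed for a dense n_samples x n_labels indicator matrix."""
    return n_samples * n_labels * itemsize


def sparse_indicator(label_lists):
    """Build a CSR indicator matrix from a list of per-sample label lists.

    Hypothetical helper, not part of scikit-learn.
    """
    # Map each distinct label to a column index (sorted for determinism).
    classes = sorted({label for labels in label_lists for label in labels})
    column = {label: j for j, label in enumerate(classes)}

    indices = []  # column index of every nonzero entry
    indptr = [0]  # row i spans indices[indptr[i]:indptr[i + 1]]
    for labels in label_lists:
        # Use a set so a label repeated within one sample is stored once.
        indices.extend(sorted({column[label] for label in labels}))
        indptr.append(len(indices))

    data = np.ones(len(indices), dtype=np.int8)
    Y = sparse.csr_matrix(
        (data, indices, indptr),
        shape=(len(label_lists), len(classes)),
    )
    return Y, classes


# Figures from the original question: 6M samples, 42K unique labels.
print(dense_size_bytes(6_000_000, 42_000) / 1e12, "TB if stored densely")

# Tiny toy example with three samples and four distinct labels.
Y, classes = sparse_indicator([["a", "b"], ["c"], ["a", "c", "d"]])
print(Y.shape, Y.nnz, classes)
```

Only the nonzero entries are stored, so memory grows with the total number of
(sample, label) pairs rather than with n_samples * n_labels.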
Best,
Arnaud
On 20 Oct 2013, at 20:20, Olivier Grisel <[email protected]> wrote:
> 2013/10/20 Mahendra Kariya <[email protected]>:
>> Hi All,
>>
>> I am doing multi-label classification, for which I am using LabelBinarizer.
>> I am dealing with more than 6M data items, and each data item has a minimum
>> of 1 and a maximum of 5 labels. The number of unique labels is more than
>> 42K. When I try to binarize the labels, I get "ValueError: array is too
>> big", which is to be expected.
>>
>> Are there any other alternatives for classifying such a large amount of
>> multi-labelled data?
>
> Rather than finding a way to work around the bug, I think we should try
> to find a way to fix the bug :)
>
> Can you reproduce the issue with some randomly generated data? If so
> please open a github issue with the code snippet to reproduce it. If
> you want to investigate further and issue a Pull Request as well
> please feel free to do so.
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general