Oh, thanks a lot, Arnaud!

I was actually going to implement a sparse matrix for this issue, but it is 
already done in this pull request.
BTW, when will this change be merged into the master branch?
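
For reference, here is a minimal sketch of the sparse idea. It is not the code from the pull request: `binarize_sparse` is a hypothetical helper written only for illustration, and it assumes NumPy and SciPy are installed. With the figures from my original post (at least 6M items, 42K unique labels, at most 5 labels per item), a dense indicator matrix needs about 2.5e11 cells, while at most 3e7 of them are non-zero.

```python
import numpy as np
from scipy import sparse


def binarize_sparse(label_lists):
    """Build a sparse (n_samples, n_labels) 0/1 indicator matrix.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    # Map each distinct label to a column index (sorted for determinism).
    classes = sorted({label for labels in label_lists for label in labels})
    column_of = {label: j for j, label in enumerate(classes)}

    rows, cols = [], []
    for i, labels in enumerate(label_lists):
        # set() drops duplicate labels within one item, so no cell exceeds 1.
        for label in set(labels):
            rows.append(i)
            cols.append(column_of[label])

    data = np.ones(len(rows), dtype=np.uint8)
    shape = (len(label_lists), len(classes))
    return sparse.csr_matrix((data, (rows, cols)), shape=shape), classes


# Tiny example: three items, four distinct labels.
Y, classes = binarize_sparse([["a", "b"], ["c"], ["a", "c", "d"]])

# Rough cell counts for the (lower-bound) figures quoted in this thread.
n_samples, n_labels, max_labels_per_item = 6 * 10**6, 42 * 10**3, 5
dense_cells = n_samples * n_labels            # cells a dense matrix would hold
max_nonzeros = n_samples * max_labels_per_item  # upper bound on stored entries
```

Only the non-zero entries are stored, so memory grows with the number of (item, label) pairs rather than with items times labels. Whether a given classifier accepts such a sparse target is a separate question that I have not checked.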

On a side note, what IDE do you guys prefer for Python? I was just browsing 
through the code in the master branch in Git and tried to Ctrl+Click on one 
of the methods out of habit, and it didn't work. (I am mainly a Java 
developer using Eclipse.) I was wondering if you guys have any suggestions.

 

Regards,
Mahendra Kariya



On Monday, 21 October 2013 1:55 PM, Arnaud Joly <arnaud4...@gmail.com> wrote:
 

>
>It sounds like you don't have enough memory to store a dense matrix of binarized 
>labels.
>
>
>There is already one PR that tries to alleviate this problem:
>see https://github.com/scikit-learn/scikit-learn/pull/2458
>
>
>
>
>Best,
>Arnaud
>
>
>
>On 20 Oct 2013, at 20:20, Olivier Grisel <olivier.gri...@ensta.org> wrote:
>
>2013/10/20 Mahendra Kariya <geek3142-skle...@yahoo.co.in>:
>>
>>Hi All,
>>>
>>>I am doing multi-label classification, for which I am using LabelBinarizer.
>>>I am dealing with more than 6M data items, and each data item has a minimum
>>>of 1 and a maximum of 5 labels. The number of unique labels is more than 42K.
>>>When I try to binarize the labels, I get "ValueError: array is too big",
>>>which is to be expected at this size.
>>>
>>>Are there any alternatives for classifying such a large amount of
>>>multi-labelled data?
>>>
>>Rather than finding a way to work around the bug, I think we should try
>>to find a way to fix the bug :)
>>
>>Can you reproduce the issue with some randomly generated data? If so,
>>please open a GitHub issue with a code snippet that reproduces it. If
>>you want to investigate further and submit a pull request as well,
>>please feel free to do so.
>>
>>-- 
>>Olivier
>>http://twitter.com/ogrisel - http://github.com/ogrisel
>>
>>------------------------------------------------------------------------------
>>October Webinars: Code for Performance
>>Free Intel webinars can help you accelerate application performance.
>>Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from 
>>the latest Intel processors and coprocessors. See abstracts and register >
>>http://pubads.g.doubleclick.net/gampad/clk?id=60135031&iu=/4140/ostg.clktrk
>>_______________________________________________
>>Scikit-learn-general mailing list
>>Scikit-learn-general@lists.sourceforge.net
>>https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>
>