Ha, no, buying more RAM would probably not be viable (I want to avoid
starting a war with my sysadmin).

I do think that memory is the issue here. I ran this code via the command
line and noticed that it was throwing this error:

Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/local/anaconda-1.9.2/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/local/anaconda-1.9.2/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/anaconda-1.9.2/lib/python2.7/multiprocessing/pool.py", line 342, in _handle_tasks
    put(task)
SystemError: NULL result without error in PyObject_Call


I noticed that it works or fails under the following conditions:

*(hangs with the above error)*
X is 815,000 by 400, with nbytes ~2,608,000,000 bytes each
using two jobs, for a total of ~5,216,000,000 bytes

*(works fine and fits in a few minutes)*
X is 815,000 by 300, with nbytes ~1,956,000,000 bytes each
using three jobs, for a total of ~5,868,000,000 bytes (more than the
conditions that threw an error)
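
As a rough sanity check on the sizes above, here is a minimal sketch that
reproduces the byte counts. It assumes X is a dense array with 8-byte
(float64) elements, which is what the reported nbytes imply
(815,000 x 400 x 8 = 2,608,000,000). The comparison against the signed
32-bit limit is only a guess at why the smaller total fails while the
larger one works: in the failing case each individual array is over
2**31 - 1 bytes, in the working case it is under. Nothing in this thread
confirms that this is the cause. The names array_nbytes, ITEMSIZE and
INT32_MAX are made up for this example.

```python
# Assumption: X holds float64 values, i.e. 8 bytes per element.
ITEMSIZE = 8

# Largest signed 32-bit integer. Assumption: some step in sending a task
# to a worker may be limited to objects of at most this many bytes.
INT32_MAX = 2**31 - 1


def array_nbytes(n_rows, n_cols, itemsize=ITEMSIZE):
    """Size in bytes of a dense (n_rows, n_cols) array, without allocating it."""
    return n_rows * n_cols * itemsize


# (columns, jobs) for the failing case and for the working case.
for n_cols, n_jobs in [(400, 2), (300, 3)]:
    per_array = array_nbytes(815000, n_cols)
    total = per_array * n_jobs
    status = "over" if per_array > INT32_MAX else "under"
    print("{} cols, {} jobs: {:,} bytes per array ({} 2**31 - 1), {:,} bytes total"
          .format(n_cols, n_jobs, per_array, status, total))
```

If that guess is right, the total across jobs would matter less than the
size of the single array handed to each worker.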


I did some googling, and it seems that this has been reported in other
sklearn scripts as well, and that it is indeed a memory error of some sort:

https://github.com/scikit-learn/scikit-learn/issues/3032
https://github.com/scikit-learn/scikit-learn/issues/2878


That said, there didn't seem to be a really satisfying conclusion to the
issues above other than upgrading sklearn to the bleeding-edge version,
which I'm hoping to avoid if possible. Any thoughts?
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general