Hi Andy,

Thanks for your comment. I didn't know about the pre_dispatch parameter
before. Here is my PR that adds the parameter to ``cross_val_score``:
https://github.com/scikit-learn/scikit-learn/pull/1961

Unfortunately, this doesn't solve my problem. I'm still getting the same
error for an array of shape 2000 x 150000. I'm very surprised, as this is
not a particularly large dataset. Are you sure there are no other settings
(maybe at the system level) that might be causing this issue?

Best,
 Matthias
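
For scale, the size of the array in question can be worked out by hand. The sketch below assumes 8-byte float64 elements, which is what np.random.randn returns. The comparison against the signed 32-bit limit is only one possible explanation for the error and is not confirmed anywhere in this thread. The names array_nbytes and INT32_MAX are made up for illustration.

```python
# Back-of-the-envelope size of a dense float64 array, in pure Python.
BYTES_PER_FLOAT64 = 8   # np.random.randn returns float64
INT32_MAX = 2**31 - 1   # largest signed 32-bit value (assumed relevant, see note)


def array_nbytes(n_rows, n_cols, itemsize=BYTES_PER_FLOAT64):
    """Return the raw data size in bytes of a dense n_rows x n_cols array."""
    return n_rows * n_cols * itemsize


small = array_nbytes(2000, 14000)    # the case that worked
large = array_nbytes(2000, 150000)   # the case that fails

for label, nbytes in [("14000 features", small), ("150000 features", large)]:
    print("%s: %d bytes (%.2f GiB), above 32-bit limit: %s"
          % (label, nbytes, nbytes / 2.0**30, nbytes > INT32_MAX))
```

The larger array is small next to 512GB of RAM, but it is the only one of the two that exceeds 2**31 - 1 bytes. If some step that ships the array to a worker process uses a signed 32-bit length, the failure would not depend on available memory at all.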





Hi Matthias.
Unfortunately joblib doesn't handle large datasets very gracefully at the
moment. Have you tried setting the pre_dispatch parameter? Otherwise it could
be that all jobs are dispatched even if only two are run.

Hth,
Andy
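
The idea behind pre_dispatch can be illustrated with the standard library alone. joblib's Parallel accepts a pre_dispatch argument, but the code below is not joblib's implementation: it is a minimal sketch that uses threads instead of processes, and the function name run_with_bounded_dispatch and its parameters are hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from itertools import islice


def run_with_bounded_dispatch(func, tasks, n_jobs=2, pre_dispatch=2):
    """Run func over tasks with at most pre_dispatch tasks submitted at once.

    Returns (results, peak_in_flight). Results are in completion order.
    """
    task_iter = iter(tasks)   # consumed lazily, so tasks are created on demand
    results = []
    peak_in_flight = 0
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # Submit only the first pre_dispatch tasks instead of all of them.
        pending = {pool.submit(func, t) for t in islice(task_iter, pre_dispatch)}
        while pending:
            peak_in_flight = max(peak_in_flight, len(pending))
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            results.extend(f.result() for f in done)
            # Refill: one new task for every task that just finished.
            for t in islice(task_iter, len(done)):
                pending.add(pool.submit(func, t))
    return results, peak_in_flight


results, peak = run_with_bounded_dispatch(lambda x: x * x, range(10))
```

With all tasks dispatched up front, every task's arguments exist at once. With a bounded window, only pre_dispatch of them do, which matters when each task carries a large slice of the data.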

On 05/12/2013 05:12 PM, Matthias Ekman wrote:
> Dear all,
>
> using sklearn 0.13 (fresh Ubuntu 12.04 installation), I'm getting the
> error below, which I believe is a memory error. What strikes me is that
> I'm using a machine with 512GB of RAM, so that shouldn't be happening.
>
> Is there maybe a system setting that limits the amount of RAM on a
> user basis?
>
> With n_features=14000, this is the memory usage:
> In [5]: %memit cross_val_score(clf, X, y=y, score_func=score_func,
> cv=cv, n_jobs=2, verbose=0, fit_params=None)
> maximum of 1: 6997.214844 MB per loop
>
> Increasing the number of features to n_features=150000 raises an
> error. Here is a minimal example:
>
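> # Imports as laid out in sklearn 0.13
> import numpy as np
> from sklearn import svm
> from sklearn.cross_validation import KFold, cross_val_score
> from sklearn.metrics import accuracy_score
>
> n_samples = 1000  # per class
> n_features = 150000
> # Dense float64 array of shape (2000, 150000)
> X = np.random.randn(n_samples * 2, n_features)
> y = np.repeat([0, 1], n_samples)
>
> clf = svm.LinearSVC(C=1)
> score_func = accuracy_score
> cv = KFold(y.size, n_folds=3)
> scores = cross_val_score(clf, X, y=y, score_func=score_func, cv=cv,
>                          n_jobs=2, verbose=0, fit_params=None)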
>
> Error
> -------
> Exception in thread Thread-3:
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
>     self.run()
>   File "/usr/lib/python2.7/threading.py", line 504, in run
>     self.__target(*self.__args, **self.__kwargs)
>   File "/usr/lib/python2.7/multiprocessing/pool.py", line 319, in
> _handle_tasks
>     put(task)
> SystemError: NULL result without error in PyObject_Call
>
> Appreciate your help,
>  Matthias
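
On the question of per-user memory limits: on Unix-like systems, the limits that apply to the current process can be read from Python with the standard resource module. The sketch below assumes Linux, and the helper describe_limit is a hypothetical name. It only reports the limits and says nothing about whether they cause the error above.

```python
import resource  # Unix-only standard library module


def describe_limit(limit_id):
    """Return the (soft, hard) values of a resource limit as readable strings."""
    soft, hard = resource.getrlimit(limit_id)

    def fmt(value):
        if value == resource.RLIM_INFINITY:
            return "unlimited"
        return "%d bytes" % value

    return fmt(soft), fmt(hard)


limits = {
    "RLIMIT_AS": describe_limit(resource.RLIMIT_AS),      # address space
    "RLIMIT_DATA": describe_limit(resource.RLIMIT_DATA),  # data segment
}

for name, (soft, hard) in sorted(limits.items()):
    print("%s: soft=%s hard=%s" % (name, soft, hard))
```

If both limits are reported as unlimited, a per-user cap on memory is unlikely to be the cause.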
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
