Hi Matthias.
Unfortunately, joblib doesn't handle large datasets very gracefully at the moment. Have you tried setting the pre_dispatch parameter? Otherwise, it could be that all jobs are dispatched at once, even if only two are run at a time.
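To illustrate what bounding the dispatch means, here is a minimal stdlib sketch. It is an analogue using threads from concurrent.futures, not joblib's implementation, and the helper name bounded_map is hypothetical. At most max_pending tasks are queued or running at any moment, and items are pulled from the input lazily, so task arguments are only created as slots free up:

```python
import itertools
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def bounded_map(func, iterable, n_workers=2, max_pending=4):
    """Apply func to every item, keeping at most max_pending tasks outstanding.

    Hypothetical helper for illustration; results come back in input order.
    """
    if max_pending < 1:
        raise ValueError("max_pending must be at least 1")
    results = {}
    items = enumerate(iterable)  # consumed lazily via islice below
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Dispatch only the first max_pending tasks up front.
        pending = {pool.submit(func, item): idx
                   for idx, item in itertools.islice(items, max_pending)}
        while pending:
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                results[pending.pop(fut)] = fut.result()
            # Refill one new task per finished task, never exceeding the bound.
            for idx, item in itertools.islice(items, len(done)):
                pending[pool.submit(func, item)] = idx
    return [results[i] for i in range(len(results))]
```

In joblib itself the corresponding knob is Parallel's pre_dispatch argument (for example pre_dispatch='2*n_jobs'); whether cross_val_score in sklearn 0.13 passes such a setting through is not confirmed here.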

Hth,
Andy

On 05/12/2013 05:12 PM, Matthias Ekman wrote:
Dear all,

Using sklearn 0.13 (fresh Ubuntu 12.04 installation), I'm getting the error below, which I believe is a memory error. What strikes me is that I'm using a machine with 512 GB of RAM, so that shouldn't be happening.

Is there perhaps a system setting that limits the amount of RAM per user?
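Per-user memory caps on Linux are usually applied as per-process resource limits (the same values `ulimit` reports). A minimal sketch for inspecting them from Python, assuming a Unix system where the standard resource module is available; the helper describe_limit is hypothetical:

```python
import resource


def describe_limit(value):
    """Format an rlimit value, treating RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else "%d bytes" % value


# RLIMIT_AS caps the process's total virtual address space;
# RLIMIT_DATA caps its data segment (heap).
for name in ("RLIMIT_AS", "RLIMIT_DATA"):
    soft, hard = resource.getrlimit(getattr(resource, name))
    print("%s: soft=%s, hard=%s" % (name, describe_limit(soft), describe_limit(hard)))
```

If both report "unlimited", a per-process memory limit is unlikely to be the cause.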

With n_features=14000, this is the memory usage:
In [5]: %memit cross_val_score(clf, X, y=y, score_func=score_func, cv=cv, n_jobs=2, verbose=0, fit_params=None)
maximum of 1: 6997.214844 MB per loop

Increasing the number of features to n_features=150000 raises an error. Here is a minimal example:

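import numpy as np
from sklearn import svm
from sklearn.cross_validation import KFold, cross_val_score
from sklearn.metrics import accuracy_score

n_samples = 1000  # per class
n_features = 150000
X = np.random.randn(n_samples * 2, n_features)
y = np.repeat([0, 1], n_samples)

clf = svm.LinearSVC(C=1)
score_func = accuracy_score
cv = KFold(y.size, n_folds=3)
scores = cross_val_score(clf, X, y=y, score_func=score_func, cv=cv, n_jobs=2, verbose=0, fit_params=None)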
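For scale, the size of the dense float64 matrix X (the default dtype of np.random.randn) is simply rows × columns × 8 bytes. A minimal sketch; the helper dense_nbytes is hypothetical:

```python
INT32_MAX = 2**31 - 1  # largest signed 32-bit integer


def dense_nbytes(n_rows, n_cols, itemsize=8):
    """Bytes in the data buffer of a dense array; itemsize=8 matches float64."""
    return n_rows * n_cols * itemsize


n_samples = 1000  # per class, as in the example
for n_features in (14000, 150000):
    nbytes = dense_nbytes(2 * n_samples, n_features)
    print("n_features=%d: %d bytes (%.2f GiB), exceeds 2**31 - 1: %s"
          % (n_features, nbytes, nbytes / 2.0**30, nbytes > INT32_MAX))
```

At n_features=150000, X alone is 2,400,000,000 bytes (about 2.24 GiB), which is above 2**31 - 1, while at 14000 it is only 224,000,000 bytes. One plausible explanation, not confirmed in this thread, is that Python 2.7's multiprocessing cannot send a pickled payload larger than a signed 32-bit size to the worker processes, in which case running with n_jobs=1 would avoid the error.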

Error
-------
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
    put(task)
SystemError: NULL result without error in PyObject_Call

Appreciate your help,
 Matthias



------------------------------------------------------------------------------
Learn Graph Databases - Download FREE O'Reilly Book
"Graph Databases" is the definitive new guide to graph databases and
their applications. This 200-page book is written by three acclaimed
leaders in the field. The early access version is available now.
Download your free book today! http://p.sf.net/sfu/neotech_d2d_may


_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
