Hi everybody.
Today I ran into problems doing grid search on a large dataset.
When I try to use more than one job, I get the following error message:
Process PoolWorker-1:
Traceback (most recent call last):
  File "/usr/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap
    self.run()
  File "/usr/lib/python2.6/multiprocessing/process.py", line 88, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.6/multiprocessing/pool.py", line 57, in worker
    task = get()
  File "/usr/lib/python2.6/multiprocessing/queues.py", line 352, in get
    return recv()
ValueError: buffer size does not match array size
I am not very familiar with joblib and multiprocessing, so I don't know
how to proceed.
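For reference, here is a minimal sketch of the kind of call that fails
for me; the estimator, parameter grid, and n_jobs value are illustrative,
not my exact script:

    import numpy as np
    from sklearn.linear_model import SGDClassifier
    from sklearn.grid_search import GridSearchCV

    # Tiny random stand-in so the snippet is self-contained; with the
    # real ~6 GB dataset, any n_jobs > 1 raises the ValueError above.
    X = np.random.rand(1000, 100)
    y = np.random.randint(0, 2, size=1000)

    # Made-up grid, just for illustration.
    param_grid = {'alpha': [1e-4, 1e-3, 1e-2]}

    search = GridSearchCV(SGDClassifier(), param_grid, n_jobs=4)
    search.fit(X, y)

With n_jobs=1 the same call runs fine.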
A student in my lab ran into the same problem before, though at the time
I thought it was because the dataset did not fit into RAM multiple times
(i.e. once per worker).
This time I hit the problem running SGD on a ~6 GB dataset on a 48 GB
machine. The whole process used "only" 12 GB, so there should be plenty
of room.
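(For what it's worth, by ~6 GB I mean the raw size of the feature array,
checked roughly like this; the file name is a placeholder:

    import numpy as np

    X = np.load('features.npy')   # placeholder for however the data is loaded
    print(X.nbytes / 1e9)         # raw array size in GB, ~6 here

Even a few worker-side copies of that should fit comfortably into 48 GB.)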
Does anyone know about this problem? It seems to happen at a pretty low
level, so I had the impression it should not be something I could break ;)
Cheers,
Andy