Hi everybody.
Today I ran into problems doing grid search on a large dataset.
When I try to use more than one job, I get the following error message:
Process PoolWorker-1:
Traceback (most recent call last):
  File "/usr/lib/python2.6/multiprocessing/process.py", line 232, in _bootstrap
    self.run()
  File "/usr/lib/python2.6/multiprocessing/process.py", line 88, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python2.6/multiprocessing/pool.py", line 57, in worker
    task = get()
  File "/usr/lib/python2.6/multiprocessing/queues.py", line 352, in get
    return recv()
ValueError: buffer size does not match array size
I am not very familiar with joblib and multiprocessing, so I don't know
how to proceed.
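For reference, here is a minimal sketch of the kind of call that fails
for me; the estimator, parameter grid, and n_jobs value are illustrative,
not my exact script:

    import numpy as np
    from sklearn.linear_model import SGDClassifier
    from sklearn.grid_search import GridSearchCV

    # Tiny random stand-in so the snippet is self-contained; with the
    # real ~6 GB dataset, any n_jobs > 1 raises the ValueError above.
    X = np.random.rand(1000, 100)
    y = np.random.randint(0, 2, size=1000)

    # Made-up grid, just for illustration.
    param_grid = {'alpha': [1e-4, 1e-3, 1e-2]}

    search = GridSearchCV(SGDClassifier(), param_grid, n_jobs=4)
    search.fit(X, y)

With n_jobs=1 the same call runs fine.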
A student in my lab ran into the same problem before, though at the time
I thought it was because the dataset did not fit into RAM multiple times
(i.e. once per worker).
This time I hit the problem running SGD on a ~6 GB dataset on a 48 GB
machine. The whole process used "only" 12 GB, so there should be plenty
of room.
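(For what it's worth, by ~6 GB I mean the raw size of the feature array,
checked roughly like this; the file name is a placeholder:

    import numpy as np

    X = np.load('features.npy')   # placeholder for however the data is loaded
    print(X.nbytes / 1e9)         # raw array size in GB, ~6 here

Even a few worker-side copies of that should fit comfortably into 48 GB.)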
Does anyone know about this problem? It seems to happen at a pretty low
level, so I had the impression it should not be something I could break ;)
Cheers,
Andy