Hi

I've got a reasonably large dataset I'm trying to run a grid search on. If I
feed in a subset of it, it works fine, but if I feed in the entire file it
dies with: "Array can't be memory-mapped: Python objects in dtype.". I
realize what that error is telling me, but I've built pipelines with a
CountVectorizer in them plenty of times and fed datasets with columns of
strings to my grid searches' fit methods. Also, why would this work on a
small file but not a large one?
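
For reference, the setup is roughly like this (the file name, column names
and parameter grid here are just placeholders, not my actual code):

import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.grid_search import GridSearchCV  # 0.15-era import path

# a dataframe with a string column and a label column
df = pd.read_csv("data.csv")

pipeline = Pipeline([
    ("vect", CountVectorizer()),   # turns the string column into counts
    ("clf", LogisticRegression()),
])

params = {
    "vect__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}

# n_jobs=-1 so joblib spawns worker processes
grid = GridSearchCV(pipeline, params, n_jobs=-1)
grid.fit(df["text"].values, df["label"].values)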

I stuck a fake classifier at the top of my pipeline with some print
statements to find out whether the pipeline itself was causing this, but I
never get there. So the failure seems to happen before any of the input data
is passed to my pipeline.
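
The fake classifier is just a pass-through step along these lines (names
made up, but this is the idea), stuck in front of the real steps:

from sklearn.base import BaseEstimator, TransformerMixin

class DebugStep(BaseEstimator, TransformerMixin):
    # pass-through step that only prints what it receives
    def fit(self, X, y=None):
        print("fit reached, type:", type(X))
        return self

    def transform(self, X):
        print("transform reached")
        return X

pipeline = Pipeline([
    ("debug", DebugStep()),
    ("vect", CountVectorizer()),
    ("clf", LogisticRegression()),
])

With the full file, those prints never show up.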

Backtrace : https://gist.github.com/andaag/f8e4c3df2e41fcc1f84f

Does anyone have any ideas what's going on? This is on scikit-learn 0.15.1.
The dtypes are identical for the large file and the smaller one.

-- 
Best regards
Anders Aagaard