Hi, I've got a reasonably large dataset that I'm trying to run a grid search on. If I feed in a subset of it everything works fine, but if I feed in the entire file it dies with: "Array can't be memory-mapped: Python objects in dtype." Now I realize what that error is telling me, but I'm fairly sure I've built pipelines with a CountVectorizer in them plenty of times and fed datasets with columns of strings to my grid search's fit method. Also, why would this work on a small file but not a large one?
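
To be concrete, here's a rough sketch of the kind of setup I mean. The file name, column names, classifier and parameter grid are placeholders, not my actual code:

import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.grid_search import GridSearchCV  # sklearn.model_selection in newer releases

df = pd.read_csv("data.csv")   # placeholder file: one text column plus a label column
X = df["text"].values          # dtype=object, since these are Python strings
y = df["label"].values

pipeline = Pipeline([
    ("vect", CountVectorizer()),
    ("clf", SGDClassifier()),
])

param_grid = {
    "vect__ngram_range": [(1, 1), (1, 2)],
    "clf__alpha": [1e-4, 1e-3],
}

# This is the call that dies with the memmap error when I feed in the full file.
search = GridSearchCV(pipeline, param_grid, n_jobs=4)
search.fit(X, y)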
I stuck a fake classifier at the top of my pipeline with some print statements to find out whether my pipeline was causing it, but execution never gets that far, so the failure seems to happen before any of the input data reaches my pipeline.

Backtrace: https://gist.github.com/andaag/f8e4c3df2e41fcc1f84f

Does anyone have any idea what's going on? This is on scikit-learn 0.15.1, and the dtypes are identical for the large file and the smaller one.
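
For reference, the fake step I put at the top of the pipeline looks roughly like this (a sketch, not the exact code; written as a pass-through transformer, since that's what a first pipeline step has to be):

from sklearn.base import BaseEstimator, TransformerMixin

class DebugStep(BaseEstimator, TransformerMixin):
    """Pass-through first step that just prints whatever it receives."""

    def fit(self, X, y=None):
        print("fit called with %s, dtype=%s" % (type(X), getattr(X, "dtype", None)))
        return self

    def transform(self, X):
        print("transform called with %s" % type(X))
        return X

# Used as the first step, e.g.
#   Pipeline([("debug", DebugStep()), ("vect", CountVectorizer()), ("clf", SGDClassifier())])
# but the prints never fire, so the crash happens before the pipeline ever sees the data.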
--
Best regards
Anders Aagaard
