Hi,
I am having some issues getting the nicely parallelized CV code
running on big data (888 features, 3333333 samples, a 12.5 GB text file as
input). Olivier Grisel kindly helped me on StackOverflow, and we (mostly he)
think there is a bug in joblib's automatic memory mapping. However,
when I map my data manually, I get different MemoryErrors, apparently
a different one for each thread, and I have an even harder time
understanding what is going wrong and whether I could get it working
anytime soon.
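For context on the sizes involved: a dense float64 array of 3333333 × 888 values needs about 3333333 × 888 × 8 bytes ≈ 23.7 GB, roughly twice the 12.5 GB text file (float32 halves that to about 11.8 GB). Below is a minimal sketch of one common way to "map the data manually", assuming the parsed features are saved once to a .npy file and reopened read-only with np.load(..., mmap_mode="r"). The helper name to_memmap, the file name X.npy and the toy shapes are hypothetical; this is not necessarily how the code on SO does it.

```python
import os
import tempfile

import numpy as np


def to_memmap(X, path):
    """Hypothetical helper: save X as .npy and reopen it as a read-only memory map."""
    np.save(path, X)
    # mmap_mode="r" leaves the data on disk; the OS pages it in lazily
    return np.load(path, mmap_mode="r")


# Hypothetical toy data, far smaller than the real 3333333 x 888 matrix
rng = np.random.RandomState(0)
X = rng.rand(1000, 888).astype(np.float32)  # float32 halves memory vs float64

path = os.path.join(tempfile.mkdtemp(), "X.npy")
X_mm = to_memmap(X, path)
print(type(X_mm).__name__, X_mm.shape, X_mm.dtype)
```

The resulting np.memmap can then be passed to the parallel CV code in place of the in-memory array, so that worker processes can read the same file instead of each receiving a full copy.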
If you are interested, please see the question, with sample code and error
messages, on SO: http://stackoverflow.com/q/24406937/938408
I have already submitted a GitHub issue about the first problem:
https://github.com/scikit-learn/scikit-learn/issues/3313
I have only a very poor understanding of the second.
Thanks,
Laszlo
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general