I'm using the IPython notebook as my programming environment, with the pandas and sklearn packages, to analyze data from the Digit Recognizer tutorial <http://www.kaggle.com/c/digit-recognizer/data>.
The data is available on the webpage linked above, and my IPython notebook is attached. I use KNeighborsClassifier for the prediction.

Problem: a MemoryError occurs when I load the large dataset with the read_csv function. As a temporary workaround I restart the kernel, after which read_csv loads the file successfully, but the same error occurs when I run the same cell again.

When read_csv does load the file, I make changes to the dataframe and then pass the features and labels to KNeighborsClassifier's fit function. At this point a similar memory error occurs.

I tried iterating through the CSV file in chunks and fitting each chunk in turn, but the problem is that the predictive model is overwritten every time it fits a chunk of data...

What can I do to make this work? Thanks!
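For reference, the chunked approach could accumulate the data instead of calling fit on every chunk. The following is only a minimal sketch under assumptions, not a tested fix for the notebook: it assumes the training CSV has a column named "label" and that all remaining columns are pixel intensities in the range 0-255, so that they fit in uint8. The helper names load_in_chunks and fit_knn and the chunk size are hypothetical. Each chunk is converted to a compact NumPy array as soon as it is read, and the classifier is fitted once on the concatenated result.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier


def load_in_chunks(csv_source, label_col="label", chunksize=5000):
    """Read a CSV piece by piece and return (features, labels) as uint8 arrays.

    Assumes every value fits in 0..255 (hypothetical for this sketch).
    """
    feature_parts, label_parts = [], []
    for chunk in pd.read_csv(csv_source, chunksize=chunksize):
        # Convert each chunk right away so only compact arrays are kept.
        label_parts.append(chunk[label_col].to_numpy(dtype=np.uint8))
        feature_parts.append(
            chunk.drop(columns=[label_col]).to_numpy(dtype=np.uint8)
        )
    return np.concatenate(feature_parts), np.concatenate(label_parts)


def fit_knn(csv_source, n_neighbors=5, chunksize=5000):
    """Load the whole file in chunks, then fit the classifier a single time."""
    X, y = load_in_chunks(csv_source, chunksize=chunksize)
    model = KNeighborsClassifier(n_neighbors=n_neighbors)
    model.fit(X, y)  # one fit on all rows, so nothing is overwritten
    return model
```

Calling fit again on an estimator discards what it learned before, which is why fitting chunk by chunk keeps only the last chunk. KNeighborsClassifier has no partial_fit method; estimators that do provide one, such as SGDClassifier, can be trained incrementally, but that would be a different model from the one used here. sklearn may still convert the arrays internally, so the saving shown applies mainly to the loading step.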
DRecognizer.ipynb
Description: Binary data
_______________________________________________ Scikit-learn-general mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
