[Scikit-learn-general] Problem in Reading Large CSV and Fitting to ML Algorithm

Ji H. Park Sat, 28 Jul 2012 20:19:07 -0700

I'm using the IPython notebook as my programming environment, with the pandas and
sklearn packages, to analyze data from the Digit Recognizer tutorial
<http://www.kaggle.com/c/digit-recognizer/data>.

The data is available on that page (link above), and my IPython notebook is
attached.

KNeighborsClassifier is used for the prediction.

Problem:

A "MemoryError" occurs when I load the large dataset with the read_csv
function. As a temporary workaround I restart the kernel, after which read_csv
loads the file successfully, but the same error occurs as soon as I run the
same cell again.
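
One thing that may help with the read_csv MemoryError (a sketch, not a confirmed fix): by default pandas parses every integer column as int64, eight bytes per value, while the pixel values (0-255) and labels (0-9) in this dataset all fit in a single byte. Passing dtype=np.uint8 to read_csv should shrink the parsed frame to about one eighth of its default size. The example builds a small synthetic CSV in memory as a stand-in for train.csv; the column names ("label", "pixel0", ...) are assumed to match the Kaggle file.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for train.csv: a "label" column plus 784 pixel
# columns, every value in 0-255 (layout assumed to mirror the Kaggle file).
n_rows, n_pixels = 500, 784
rng = np.random.default_rng(0)
data = rng.integers(0, 256, size=(n_rows, n_pixels + 1))
header = ",".join(["label"] + [f"pixel{i}" for i in range(n_pixels)])
csv_text = header + "\n" + "\n".join(",".join(map(str, row)) for row in data)

# Default parsing infers int64 for each integer column (8 bytes per value).
df_default = pd.read_csv(io.StringIO(csv_text))

# Every value fits in one byte, so request uint8 from the parser up front.
df_small = pd.read_csv(io.StringIO(csv_text), dtype=np.uint8)

print("default bytes:", df_default.memory_usage(index=False).sum())
print("uint8 bytes:  ", df_small.memory_usage(index=False).sum())
```

With the real file, the same call would be pd.read_csv("train.csv", dtype=np.uint8), assuming the file sits in the working directory.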

Anyway, once read_csv has loaded the file and I have made my changes to the
DataFrame, I pass the features and labels to KNeighborsClassifier's fit
method. At this point a similar MemoryError occurs.
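
For the fit step, a minimal sketch assuming the memory pressure comes from large intermediate copies: pull the features out of the DataFrame once as a single float32 array (half the size of float64), fit, and then call predict on the test set one slice at a time so the loop only asks for neighbours of one batch at once. The data is synthetic, and predict_in_batches is a hypothetical helper, not part of scikit-learn. Recent scikit-learn versions already chunk some of the neighbour search internally, so the gain may vary.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the digit data (assumed layout: a "label" column
# with values 0-9 plus 784 pixel columns with values 0-255).
rng = np.random.default_rng(0)
pixel_cols = [f"pixel{i}" for i in range(784)]
train = pd.DataFrame(rng.integers(0, 256, size=(2000, 784)).astype(np.uint8),
                     columns=pixel_cols)
train.insert(0, "label", rng.integers(0, 10, size=2000).astype(np.uint8))
test = pd.DataFrame(rng.integers(0, 256, size=(500, 784)).astype(np.uint8),
                    columns=pixel_cols)

# Extract the features once as a float32 array (half the size of float64).
X_train = train[pixel_cols].to_numpy(dtype=np.float32)
y_train = train["label"].to_numpy()
X_test = test[pixel_cols].to_numpy(dtype=np.float32)

clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)


def predict_in_batches(model, X, batch_size=100):
    """Hypothetical helper: predict X one slice of rows at a time."""
    parts = [model.predict(X[start:start + batch_size])
             for start in range(0, X.shape[0], batch_size)]
    return np.concatenate(parts)


y_pred = predict_in_batches(clf, X_test)
```

A smaller batch_size lowers peak memory during prediction at the cost of more Python-level calls.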

I tried the following: iterating through the CSV file in chunks and fitting
each chunk in turn, but the problem is that the predictive model is
overwritten every time it fits a chunk of data...
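
On the chunking attempt: calling fit on each chunk retrains the estimator from scratch, which is why the model gets overwritten. KNeighborsClassifier has no partial_fit method (a nearest-neighbour model needs every training point at prediction time anyway), so it cannot learn chunk by chunk. Some estimators, such as SGDClassifier, support incremental learning through partial_fit. The sketch below swaps in that linear model, which is a different algorithm from k-NN and will not give the same predictions. The CSV is synthetic and its column names are assumptions.

```python
import io

import numpy as np
import pandas as pd
from sklearn.linear_model import SGDClassifier

# Hypothetical small CSV standing in for train.csv ("label" + pixel columns).
rng = np.random.default_rng(0)
n_rows, n_pixels = 1500, 784
labels = rng.integers(0, 10, size=n_rows)
pixels = rng.integers(0, 256, size=(n_rows, n_pixels))
header = ",".join(["label"] + [f"pixel{i}" for i in range(n_pixels)])
lines = [",".join(map(str, [lab, *row])) for lab, row in zip(labels, pixels)]
csv_text = header + "\n" + "\n".join(lines)

clf = SGDClassifier(random_state=0)
# classes is required on the first partial_fit call; passing the same
# array again on later calls is allowed.
all_classes = np.arange(10)

n_chunks = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=500, dtype=np.uint8):
    X = chunk.drop(columns="label").to_numpy(dtype=np.float32) / 255.0
    y = chunk["label"].to_numpy()
    # partial_fit updates the existing weights instead of starting over.
    clf.partial_fit(X, y, classes=all_classes)
    n_chunks += 1
```

If you would rather stay with k-NN, the same chunked reader can instead be used to fill one compact uint8 array, which is then passed to a single fit call.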

What can I do to make this work?

Thanks!

DRecognizer.ipynb
Description: Binary data

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
