This dataset [1] is published as text-encoded files (a CSV-like format) that you can read with `numpy.loadtxt` (read the source code of the `load_digits` function for details).
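As a rough sketch of the `numpy.loadtxt` step: the snippet below parses a small in-memory stand-in for such a file. The row layout (64 comma-separated pixel values followed by the class label) and the made-up values are assumptions for illustration, so check the actual files before relying on it.

```python
import io

import numpy as np

# Two made-up rows in the assumed layout: 64 pixel intensities, then the label.
rows = [
    ",".join(["0"] * 64 + ["3"]),
    ",".join(["16"] * 64 + ["7"]),
]
fake_file = io.StringIO("\n".join(rows))

# loadtxt accepts a file name or a file-like object.
data = np.loadtxt(fake_file, delimiter=",")

X = data[:, :-1]              # pixel features, shape (n_samples, 64)
y = data[:, -1].astype(int)   # class labels, shape (n_samples,)
images = X.reshape(-1, 8, 8)  # the same pixels viewed as 8x8 images
```

With a real file you would pass its path to `numpy.loadtxt` instead of the `StringIO` object.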
[1] http://mlearn.ics.uci.edu/databases/optdigits/

In "real life", raw picture data is more often collected and stored as PNG or JPEG files. To convert a single PNG or JPEG file into a numpy array you can use `scipy.misc.imread` (you will additionally need to install PIL or Pillow if it is not already installed on your system; recent SciPy releases have removed this function, and `imageio.imread` is the suggested replacement). Then you can use the usual numpy magic such as `numpy.reshape`, `numpy.hstack` or `numpy.vstack` to rework and combine individual images into a single homogeneous 2D array that represents the image collection as a whole.

If you are not familiar with common numpy array manipulations you should start by reading a tutorial such as: http://scipy-lectures.github.io/

In "even more real life", images are not aligned or don't have the same size, so you need to write your own pre-processing layer using tools such as http://scikit-image.org or http://opencv.org . This can be very hard to get working. Some people even write PhD theses on stuff like that :)

-- Olivier
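
For the stacking step mentioned above, here is a minimal sketch. It assumes all images are grayscale and already have the same size; the random arrays are stand-ins for what an image reader would return, so no files are needed to run it.

```python
import numpy as np

# Stand-ins for decoded images: three random 8x8 grayscale arrays.
# In practice each array would come from an image reader instead.
rng = np.random.RandomState(0)
images = [rng.randint(0, 256, size=(8, 8)).astype(np.uint8) for _ in range(3)]

# Flatten each image into a single row, then stack the rows into one
# 2D array of shape (n_images, n_pixels).
X = np.vstack([img.reshape(1, -1) for img in images])

# Any row can be turned back into its original image.
first_again = X[0].reshape(8, 8)
```

Images of different sizes would have to be resized or cropped to a common shape first, which is where the pre-processing tools come in.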