This dataset [1] is published as plain-text files (a CSV-like
format) that you can read with `numpy.loadtxt` (read the source code
of the `load_digits` function for details).

[1] http://mlearn.ics.uci.edu/databases/optdigits/
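
To make the `numpy.loadtxt` step concrete, here is a minimal sketch.
It assumes a comma-separated file where each row holds the flattened
pixel values followed by the class label in the last column; the tiny
2x2 "images" below are made-up stand-ins, not real rows from [1]:

```python
import io

import numpy as np

# Hypothetical stand-in for a downloaded data file: each row is a
# flattened 2x2 image (4 pixel values) followed by its class label.
sample = io.StringIO(
    "0,5,13,9,0\n"
    "0,0,12,13,1\n"
    "16,4,0,15,2\n"
)

# loadtxt accepts a file name or any file-like object.
data = np.loadtxt(sample, delimiter=",")

X = data[:, :-1]             # pixel features, shape (n_samples, n_pixels)
y = data[:, -1].astype(int)  # class labels, shape (n_samples,)

# Recover the 2D layout of each image if you want to display it.
images = X.reshape(-1, 2, 2)
```

With a real file you would pass its path instead of the `StringIO`
object and reshape to the actual image size.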

In "real life", raw picture data is more often collected and stored as
PNG or JPEG files. To convert a single PNG or JPEG file into a numpy
array you can use `scipy.misc.imread` (you will additionally need to
install PIL or Pillow if not already installed on your system; note
that `scipy.misc.imread` has been removed from recent SciPy releases,
where Pillow's `Image.open` is the usual replacement). Then you can
use the usual numpy magic such as `numpy.reshape`, `numpy.hstack` or
`numpy.vstack` to rework and combine individual images into a single
homogeneous 2D array that represents the image collection as a whole.
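
As a rough sketch of that workflow: the helper below reads one file
with Pillow's `Image.open`, and the rest shows the reshape-and-stack
step. The helper name `load_grayscale` and the random 8x8 arrays are
only illustrative assumptions, used so the snippet runs without any
image files on disk:

```python
import numpy as np


def load_grayscale(path):
    """Read one PNG/JPEG file as a 2D uint8 array (needs Pillow)."""
    # Imported here so the rest of the example runs without Pillow.
    from PIL import Image

    with Image.open(path) as img:
        return np.asarray(img.convert("L"))


# Stand-ins for three decoded 8x8 grayscale images. With real files
# this would be: images = [load_grayscale(p) for p in paths]
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(8, 8), dtype=np.uint8)
          for _ in range(3)]

# Flatten each image into one row, then stack the rows into a single
# 2D array of shape (n_images, n_pixels).
X = np.vstack([img.reshape(1, -1) for img in images])
```

This only works when all images have the same shape; otherwise
`numpy.vstack` raises an error.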

If you are not familiar with common numpy array manipulations you
should start by reading a tutorial such as:

  http://scipy-lectures.github.io/

In "even more real life", images are not aligned or don't have the
same size so you need to write your own pre-processing layer using
tools such as http://scikit-image.org or http://opencv.org . This can
be very hard to get working. Some people even write PhD theses on
stuff like that :)
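
For the simplest part of that problem (different sizes only, no
alignment), a pre-processing step could look like the sketch below.
It is a plain numpy nearest-neighbour resize, written here as a
stand-in for a proper routine such as `skimage.transform.resize`;
the function name `resize_nearest` and the toy arrays are
assumptions for illustration:

```python
import numpy as np


def resize_nearest(img, shape):
    """Resize a 2D array to `shape` by nearest-neighbour sampling."""
    # For each output row/column, pick the matching source index.
    rows = np.arange(shape[0]) * img.shape[0] // shape[0]
    cols = np.arange(shape[1]) * img.shape[1] // shape[1]
    return img[np.ix_(rows, cols)]


# Two toy "images" with different sizes.
a = np.arange(12).reshape(3, 4)
b = np.arange(100).reshape(10, 10)

# Bring both to 8x8, flatten, and stack into one 2D array.
X = np.vstack([resize_nearest(im, (8, 8)).ravel() for im in (a, b)])
```

Nearest-neighbour sampling is crude (no smoothing or interpolation),
so for real data the library routines are the better choice.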

-- 
Olivier

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
