This is a pretty classic machine learning problem and can be handled with several different algorithms. Logistic regression is the obvious choice. Clustering algorithms such as k-means can also work, though they are unsupervised, so you have to map each cluster to a digit label afterwards. Just unroll the pixels of each image into one really long vector and train your algorithm on the input-output pairs. You can get quite high accuracy on this without much trouble if you are careful with the bias-variance trade-off. This is a fun one for neural networks too!
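To make the "unroll the pixels and train on input-output pairs" step concrete, here is a rough sketch in plain Python rather than Mahout. Everything in it is made up for illustration: the 3x3 "images", the two stroke classes standing in for digits, and the helper names `flatten`, `train_logistic`, `predict` and `noisy`. It only handles two classes; for ten digits you would train one such model per digit (one-vs-rest) or use a multinomial version.

```python
import math
import random


def flatten(image):
    """Unroll a 2-D grid of pixel intensities into one long feature vector."""
    return [pixel for row in image for pixel in row]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, epochs=500, lr=0.5):
    """Binary logistic regression fitted with batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample drives the gradient.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x):
    """Return class 1 if the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical toy data: class 0 is a vertical stroke, class 1 a horizontal one.
VERTICAL = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
HORIZONTAL = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]


def noisy(image, rng, amount=0.2):
    """Add uniform noise to every pixel, clipped to the range [0, 1]."""
    return [[min(1.0, max(0.0, p + rng.uniform(-amount, amount))) for p in row]
            for row in image]


rng = random.Random(0)
images = ([noisy(VERTICAL, rng) for _ in range(20)]
          + [noisy(HORIZONTAL, rng) for _ in range(20)])
labels = [0] * 20 + [1] * 20

X = [flatten(img) for img in images]  # each image becomes a 9-element vector
w, b = train_logistic(X, labels)
```

With real digit images the idea is the same, only the vectors are much longer (a 16x16 scan becomes 256 features) and you would want a held-out test set to see how well the model generalizes.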
Essentially any machine learning book will go into greater detail on this, as the US Postal Service digit data has been around for a long time. I think Kaggle even had this as a training exercise for a while, so there's probably a ton of discussion of various methods and algorithms on their message boards. For kicks, why don't you compare k-means clustering to logistic regression using Mahout?

-Angus

On Thu, Jan 23, 2014 at 8:00 PM, Chameera Wijebandara <[email protected]> wrote:

> Hi,
>
> I am trying to classify handwritten digits using mahout classification. Any
> suggestion to come up with good solution?
>
> --
> Thanks,
> Chameera
