This paper had some interesting references. The problem they worked on was different from yours, but if you know something about the training images, this might work out. That something might be the text near the image on the original web site, or almost anything else.
http://www.public.asu.edu/~huanliu/.../SBP09_3-31(Baoxin%20Li%20-4).pdf

This paper describes the use of Gabor transforms and histograms for image clustering:

http://www-nlpir.nist.gov/projects/tvpubs/tv6.papers/eurecom.pdf

HSV histogram clustering might be a reasonably sized effort for a student project.

Another approach is to try a latent factor method to characterize images. This paper describes an image completion task on a handwritten digit dataset. I am pretty sure that clustering on these latent features would work very well because they inherently have a Euclidean metric imposed on them.

http://arxiv.org/abs/1006.2156

The recommendation that you use OpenCV for image extraction is a very good one. You might want to use Mahout for clustering, but I doubt you will have enough images to make that worthwhile. Just extracting useful features will take a long time.

On Sun, Oct 3, 2010 at 10:33 AM, gagan chhabra <[email protected]> wrote:

> Hello Steven Bourke,
>
> The data is actually not text. The query is an image and the database is again of images.
>
> I wanted to know how one can declare one image similar to another, in programming terms. I mean there has to be some parameter of analysis or algorithm which can solve this problem.
>
> On Sun, Oct 3, 2010 at 10:44 PM, Steven Bourke <[email protected]> wrote:
>
> > Where is the semantic data coming from? I think something like lucene would be more relevant if you are searching text based on available meta data.
> >
> > On Sun, Oct 3, 2010 at 6:54 PM, Sean Owen <[email protected]> wrote:
> >
> > > You probably want to look at Shannon's spectral clustering code? That's the closest thing I can think of in Mahout. It doesn't have much of anything for image processing.
> > >
> > > On Sun, Oct 3, 2010 at 5:02 PM, gagan chhabra <[email protected]> wrote:
> > >
> > > > Hello all,
> > > >
> > > > I am an Engineering candidate and took a project which is based on Machine Learning. The idea is to Query-by-Image, it is a research paper by Googlers. I am not getting any point to start off.
> > > >
> > > > I don't know if Mahout is of any use to me but since it is meant for Machine Learning I joined to know more about it.
> > > >
> > > > My application will go like:
> > > > > User enters a query (which is an image).
> > > > > Then the application searches for other images in the database with the same semantic.
> > > > for example, if the user enters an image of a dog the app will retrieve other images of dogs
> > > > or if the user enters an image of a snowy mountain it retrieves similar images.
> > > >
> > > > So I don't get how to compare images. What metric to use to declare any image similar to the query image.
> > > >
> > > > Please suggest something... any help will make a huge difference.
> > > >
> > > > --
> > > > gagan
>
> --
> gagan
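
To make the HSV histogram suggestion at the top of this message a bit more concrete, here is a minimal sketch in plain Python of one way to compare images by color. It is only an illustration under assumptions of mine: images are already decoded into lists of (r, g, b) tuples, the bin counts (8, 4, 4) are arbitrary, and histogram intersection is just one possible similarity measure. The function names are hypothetical and none of this comes from the papers linked above or from Mahout. In practice you would probably let OpenCV decode the images and compute the histograms.

```python
import colorsys


def hsv_histogram(pixels, h_bins=8, s_bins=4, v_bins=4):
    """Return a normalized HSV histogram for an iterable of (r, g, b) tuples in 0..255."""
    hist = [0.0] * (h_bins * s_bins * v_bins)
    count = 0
    for r, g, b in pixels:
        # colorsys works on floats in [0, 1] and returns h, s, v in [0, 1].
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Clamp so that a value of exactly 1.0 falls into the last bin.
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)
        vi = min(int(v * v_bins), v_bins - 1)
        hist[(hi * s_bins + si) * v_bins + vi] += 1.0
        count += 1
    if count == 0:
        raise ValueError("image has no pixels")
    # Normalize so images of different sizes are comparable.
    return [x / count for x in hist]


def histogram_intersection(a, b):
    """Similarity in [0, 1]; 1.0 means the two normalized histograms are identical."""
    return sum(min(x, y) for x, y in zip(a, b))


def rank_by_similarity(query_pixels, database):
    """Return database image names ordered from most to least similar to the query."""
    q = hsv_histogram(query_pixels)
    scored = [
        (histogram_intersection(q, hsv_histogram(pixels)), name)
        for name, pixels in database.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)]


if __name__ == "__main__":
    # Toy "images": flat lists of pixels standing in for real decoded images.
    query = [(250, 250, 255)] * 90 + [(90, 90, 100)] * 10
    database = {
        "snowy_mountain": [(250, 250, 255)] * 80 + [(90, 90, 100)] * 20,
        "grass_field": [(30, 160, 40)] * 100,
    }
    print(rank_by_similarity(query, database))
```

Color histograms only capture color distribution, not what is in the picture, so this will not reliably find "other images of dogs". The same histogram vectors could also be used as the input features for clustering.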
