This paper has some interesting references.  The problem they worked on is
different from yours, but if you know something about the training images,
their approach might work out.  That something might be the original web
site, nearby text, or almost anything else.

http://www.public.asu.edu/~huanliu/.../SBP09_3-31(Baoxin%20Li%20-4).pdf

This paper describes the use of Gabor transforms and histograms for image
clustering:

http://www-nlpir.nist.gov/projects/tvpubs/tv6.papers/eurecom.pdf

HSV histogram clustering might be a reasonably scoped effort for a student
project.
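
As a rough illustration of the HSV histogram idea, here is a minimal sketch
using only the Python standard library.  The function names, the bin counts,
and the choice of histogram intersection as the similarity measure are my
assumptions for illustration, not something taken from the papers above.  In
practice you would read pixels and convert colors with OpenCV rather than
colorsys.

```python
import colorsys


def hsv_histogram(pixels, h_bins=8, s_bins=4, v_bins=4):
    """Build a normalized HSV histogram from (r, g, b) pixels in 0..255.

    The bin counts are arbitrary illustrative defaults (an assumption).
    """
    hist = [0.0] * (h_bins * s_bins * v_bins)
    count = 0
    for r, g, b in pixels:
        # colorsys works on floats in [0, 1] and returns h, s, v in [0, 1].
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Clamp so that a value of exactly 1.0 falls into the last bin.
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)
        vi = min(int(v * v_bins), v_bins - 1)
        hist[(hi * s_bins + si) * v_bins + vi] += 1.0
        count += 1
    if count:
        hist = [x / count for x in hist]
    return hist


def histogram_intersection(a, b):
    """Similarity in [0, 1] for two normalized histograms; 1.0 means identical."""
    return sum(min(x, y) for x, y in zip(a, b))


# Tiny synthetic "images": flat lists of RGB pixels.
red_image = [(255, 0, 0)] * 4
blue_image = [(0, 0, 255)] * 4
same = histogram_intersection(hsv_histogram(red_image), hsv_histogram(red_image))
different = histogram_intersection(hsv_histogram(red_image), hsv_histogram(blue_image))
```

The resulting histograms are ordinary fixed-length vectors, so they can be
handed to any clustering or nearest-neighbor method.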

Another approach is to try a latent factor method to characterize images.
The paper below describes an image completion task on a handwritten digit
dataset.  I am pretty sure that clustering on these latent features would
give very nice clusters because they inherently have a Euclidean metric
imposed on them.

http://arxiv.org/abs/1006.2156

The recommendation that you use OpenCV for image extraction is a very good
one.  You might want to use Mahout for clustering, but I doubt you will have
enough images to make that worthwhile.  Just extracting useful features
will take a long time.
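
To give a sense of how small the clustering step is at this scale, here is a
minimal sketch of plain in-memory k-means over feature vectors such as
histograms.  This is a generic textbook version under my own assumptions
(squared Euclidean distance, random initial centroids, a fixed iteration
count); it is not Mahout's implementation, and the function names are
hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20, seed=0):
    """Cluster vectors into k groups; returns (assignments, centroids)."""
    rng = random.Random(seed)
    # Start from k distinct input vectors chosen at random.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Two obvious groups of 2-D points standing in for image feature vectors.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
assignments, centroids = kmeans(points, k=2)
```

With a few thousand feature vectors this runs comfortably on one machine,
which is why a distributed system is probably more than you need here.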

On Sun, Oct 3, 2010 at 10:33 AM, gagan chhabra <[email protected]> wrote:

> Hello Steven Bourke,
>
> The data is actually not text. The query is an image, and the database is
> again made up of images.
>
> I wanted to know how one can declare one image similar to another, in
> programming terms. I mean there has to be some parameter of analysis or
> algorithm that can solve this problem.
>
>
>
> On Sun, Oct 3, 2010 at 10:44 PM, Steven Bourke <[email protected]> wrote:
>
> > Where is the semantic data coming from? I think something like Lucene
> > would be more relevant if you are searching text based on available
> > metadata.
> >
> > On Sun, Oct 3, 2010 at 6:54 PM, Sean Owen <[email protected]> wrote:
> >
> > > You probably want to look at Shannon's spectral clustering code.
> > > That's the closest thing I can think of in Mahout. It doesn't have
> > > much of anything for image processing.
> > >
> > > On Sun, Oct 3, 2010 at 5:02 PM, gagan chhabra <[email protected]> wrote:
> > >
> > > > Hello all,
> > > >
> > > > I am an engineering candidate and have taken on a project based on
> > > > machine learning. The idea is Query-by-Image, from a research paper
> > > > by Googlers. I do not know where to start.
> > > >
> > > > I don't know if Mahout is of any use to me, but since it is meant
> > > > for machine learning I joined to learn more about it.
> > > >
> > > > My application will work like this:
> > > > >  The user enters a query (which is an image).
> > > > >
> > > > >  The application then searches the database for other images
> > > > >  with the same semantics.
> > > > For example, if the user enters an image of a dog, the app will
> > > > retrieve other images of dogs; if the user enters an image of a
> > > > snowy mountain, it retrieves similar images.
> > > >
> > > > So I don't understand how to compare images, or what metric to use
> > > > to declare an image similar to the query image.
> > > >
> > > > Please suggest something... any help will make a huge difference.
> > > >
> > > > --
> > > > gagan
> > > >
> > >
> >
>
>
>
> --
> gagan
>
