Hi Ted,

Could you please post the reference to the first paper again?

IMHO the problem with this kind of project is *how* you obtain and define
the feature vector. That is the key to comparing the different images
successfully enough to return the k most similar images for any given image.

gagan, look at the features provided in the UCI machine learning dataset for
images. They are a good starting point and do not require you to perform any
image processing tasks yourself. You can also obtain those features with the
algorithms implemented in the OpenCV library.

Cheers,
Fede

2010/10/3 Ted Dunning <[email protected]>:
> This paper had some interesting references. The problem they worked on was
> different from yours, but if you know something about the training images,
> this might work out. The something might be the original web site, nearby
> text, or almost anything.
>
> http://www.public.asu.edu/~huanliu/.../SBP09_3-31(Baoxin%20Li%20-4).pdf
>
> This paper describes the use of Gabor transforms and histograms for image
> clustering:
>
> http://www-nlpir.nist.gov/projects/tvpubs/tv6.papers/eurecom.pdf
>
> HSV histogram clustering might be a reasonably scaled effort for a student
> project.
>
> Another approach is to try a latent factor method to characterize images.
> This paper describes an image completion task on a handwritten digit
> dataset. I am pretty sure that clustering on these latent features would
> give very nice clustering because they inherently have a Euclidean metric
> imposed on them.
>
> http://arxiv.org/abs/1006.2156
>
> The recommendation that you use OpenCV for image extraction is a very good
> one. You might want to use Mahout for clustering, but I doubt you will have
> enough images to make that worthwhile. Just extracting useful features
> will take a long time.
>
> On Sun, Oct 3, 2010 at 10:33 AM, gagan chhabra
> <[email protected]> wrote:
>
>> Hello Steven Bourke,
>>
>> The data is actually not text. The query is an image, and the database
>> again consists of images.
>>
>> I wanted to know how one can declare one image similar to another, in
>> programming terms. I mean, there has to be some parameter of analysis or
>> algorithm which can solve this problem.
>>
>> On Sun, Oct 3, 2010 at 10:44 PM, Steven Bourke <[email protected]> wrote:
>>
>> > Where is the semantic data coming from? I think something like Lucene
>> > would be more relevant if you are searching text based on available
>> > metadata.
>> >
>> > On Sun, Oct 3, 2010 at 6:54 PM, Sean Owen <[email protected]> wrote:
>> >
>> > > You probably want to look at Shannon's spectral clustering code?
>> > > That's the closest thing I can think of in Mahout. It doesn't have
>> > > much of anything for image processing.
>> > >
>> > > On Sun, Oct 3, 2010 at 5:02 PM, gagan chhabra <[email protected]>
>> > > wrote:
>> > >
>> > > > Hello all,
>> > > >
>> > > > I am an engineering candidate and took on a project which is based
>> > > > on machine learning. The idea is Query-by-Image; it is a research
>> > > > paper by Googlers. I can't find a point to start from.
>> > > >
>> > > > I don't know if Mahout is of any use to me, but since it is meant
>> > > > for machine learning I joined to know more about it.
>> > > >
>> > > > My application will go like:
>> > > > > User enters a query (which is an image).
>> > > >
>> > > > > Then the application searches for other images in the database
>> > > > > with the same semantics.
>> > > > For example, if the user enters an image of a dog, the app will
>> > > > retrieve other images of dogs, or if the user enters an image of a
>> > > > snowy mountain, it retrieves similar images.
>> > > >
>> > > > So I don't get how to compare images. What metric should I use to
>> > > > declare any image similar to the query image?
>> > > >
>> > > > Please suggest something... any help will make a huge difference.
>> > > >
>> > > > --
>> > > > gagan
>>
>> --
>> gagan
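
P.S. To make the feature-vector idea concrete, here is a minimal sketch of one
possible approach: a normalized HSV colour histogram as the feature vector,
with Euclidean distance to rank the k most similar images. It uses only the
Python standard library so it runs anywhere. The function names
(hsv_histogram, k_most_similar), the bin counts and the toy "images" (flat
lists of RGB tuples) are all made up for illustration. In a real project the
pixels and the histogram would more likely come from OpenCV, and a colour
histogram alone will not capture semantics such as "dog".

```python
import colorsys
import math


def hsv_histogram(pixels, bins=(8, 4, 4)):
    """Return a normalized HSV histogram for (r, g, b) pixels in 0..255.

    The bin counts are an arbitrary illustrative choice, not a recommendation.
    """
    h_bins, s_bins, v_bins = bins
    hist = [0.0] * (h_bins * s_bins * v_bins)
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # min() keeps a component equal to 1.0 inside the last bin.
        hi = min(int(h * h_bins), h_bins - 1)
        si = min(int(s * s_bins), s_bins - 1)
        vi = min(int(v * v_bins), v_bins - 1)
        hist[(hi * s_bins + si) * v_bins + vi] += 1.0
    total = sum(hist) or 1.0  # avoid dividing by zero for an empty image
    return [count / total for count in hist]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_most_similar(query_features, database, k=3):
    """Return the names of the k database entries closest to the query.

    `database` maps an image name to its feature vector.
    """
    ranked = sorted(database,
                    key=lambda name: euclidean(query_features, database[name]))
    return ranked[:k]


# Toy single-colour "images": 16 identical pixels each (hypothetical data).
toy_images = {
    "red": [(250, 10, 10)] * 16,
    "dark_red": [(120, 10, 10)] * 16,
    "blue": [(10, 10, 250)] * 16,
    "green": [(10, 250, 10)] * 16,
}
database = {name: hsv_histogram(px) for name, px in toy_images.items()}

# A slightly different bright red should land closest to "red".
query = hsv_histogram([(240, 30, 30)] * 16)
print(k_most_similar(query, database, k=1))
```

A brute-force sort like this is fine for a student-sized collection. The
interesting work is in choosing better features, which is where the Gabor and
latent-factor papers above come in.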
