Why are you using a search engine to build a recommender? None of the leading teams in the Netflix Prize used search engines as a base technology.
Start with the recommender algorithms in Mahout: http://mahout.apache.org/

wunder

On Jan 26, 2012, at 9:18 AM, Nejla Karacan wrote:

> Hey there,
>
> I'm using Solr for my thesis, where I have to implement a content-based
> recommender system for movies.
>
> I have indexed about 20 thousand movies with their information:
> movie-id
> title
> genre
> plot/movie-description <- !!!
> cast
>
> I've enabled the TermVectorComponent for the fields genre, description and
> cast.
> So I can get the tf-idf values for the terms of every movie.
>
> With these term/tf-idf pairs I have to compute the similarities
> between movies by using the cosine similarity.
> I know about the Solr feature MLT (MoreLikeThis), but that's not the
> solution; I have to
> implement the cosine similarity in Java myself.
>
> Now I have some problems/questions:
> I get the responses in XML format, which I read with an XML reader in
> Java,
> which walks through every child node in order to reach the right node.
> Is there a better way to get these values from node attributes or node texts?
> I have tried it with wt=csv, but for the requests I get
> responses only with the movie IDs, nothing more.
> With the XML response writer my request is for example this:
> http://localhost:8983/solr/select/?qt=tvrh&q=id:1800180382&fl=id&tv.tf_idf=true
> I get the right response with all terms and tf-idf values, in XML.
>
> And if I add the csv parameter
> http://localhost:8983/solr/select/?qt=tvrh&q=id:1800180382&fl=id&tv.tf_idf=true&wt=csv
> I get only this:
> id
> 1800180382
>
> Maybe my request is wrong?
>
> Another problem is, when I get the terms and their tf-idf values, I store
> them in a map.
> But the values are not in any order. I want e.g. to store only the 10
> top terms,
> so the 10 terms with the highest tf-idf values. Can I sort them in descending
> order?
> I haven't found anything for that. If it's not possible, I must sort them
> later in the map.
>
> My last question is:
> every movie has a genre, often more than one.
> It's like the "cat" field (category) in the exampledocs with ipod/monitor
> etc., and it's an important point for the movies.
> How can I integrate this factor?
> I changed the boost attribute in the Solr XML schema like this:
> <field name="genre" type="string" indexed="true" stored="true"
> multiValued="true" omitNorms="false" boost="3" termVectors="true"
> termPositions="true" termOffsets="true"/>
> Is that enough or is there any other possibility?
>
> Perhaps you can see that I am a beginner in Solr;
> at the beginning a few weeks ago it was even more difficult for me, but now
> it is going better.
> I would be very grateful for any help, ideas, tips or suggestions!
>
> Many regards
> Nejla
>
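
On the cosine-similarity, top-10 and genre-weighting questions in the quoted message, here is a minimal Java sketch (Java 8 or later assumed). It assumes the term/tf-idf pairs of each field have already been read into a Map<String, Double>. The class and method names (TfIdfSimilarity, cosine, topTerms, addWeighted) are made up for illustration and are not part of Solr or Mahout. Scaling the genre weights before merging the fields is only one possible way to make genre count more.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; nothing here comes from Solr or Mahout.
final class TfIdfSimilarity {

    // Cosine similarity between two sparse term -> tf-idf vectors.
    // Returns 0.0 if either vector is empty or all zeros.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other; // only shared terms contribute
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    // Euclidean length of a sparse vector.
    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    // The n entries with the highest weights, in descending order.
    static List<Map.Entry<String, Double>> topTerms(Map<String, Double> v, int n) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(v.entrySet());
        entries.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    // Adds one field's terms into a combined movie vector, scaled by a
    // field weight (assumption: e.g. 3.0 for genre, 1.0 for the rest).
    static void addWeighted(Map<String, Double> target,
                            Map<String, Double> field, double weight) {
        for (Map.Entry<String, Double> e : field.entrySet()) {
            target.merge(e.getKey(), e.getValue() * weight, Double::sum);
        }
    }
}
```

One way to use it: build one vector per movie by calling addWeighted for description, cast and genre (with a larger weight for genre, if that is what you want), optionally keep only the entries returned by topTerms(vector, 10), then compare two movies with cosine. If the same term can occur in several fields, prefixing the key with the field name (for example "genre:drama") keeps them apart.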