Re: Solr and TF-IDF

Walter Underwood Thu, 26 Jan 2012 09:28:04 -0800

Why are you using a search engine to build a recommender? None of the leading 
teams in the Netflix Prize used search engines as a base technology.


Start with the recommender algorithms in Mahout: http://mahout.apache.org/

wunder

On Jan 26, 2012, at 9:18 AM, Nejla Karacan wrote:

> Hey there,
> 
> I'm using Solr for my thesis, where I have to implement a content-based
> recommender system for movies.
> 
> I have indexed about 20,000 movies with the following information:
> movie-id
> title
> genre
> plot/movie-description <- !!!
> cast
> 
> I've enabled the TermVectorComponent for the fields genre, description and
> cast,
> so I can get the tf-idf values for the terms of every movie.
> 
> With these term/tf-idf pairs I have to compute the similarities
> between movies using cosine similarity.
> I know about Solr's MoreLikeThis (MLT) feature, but that is not an option:
> I have to implement the cosine similarity in Java myself.
> 
> Now I have some problems/questions:
> I get the responses in XML format, which I read with an XML reader in
> Java
> that walks through every child node in order to reach the right node.
> Is there a better way to get at these values, e.g. as node attributes or
> node text?
> I have tried wt=csv, but for those requests I get
> responses with only the movie IDs, nothing more.
> With the XML response writer, my request is for example this:
> http://localhost:8983/solr/select/?qt=tvrh&q=id:1800180382&fl=id&tv.tf_idf=true
> I get the right response with all terms and tf-idf values, in XML.
> 
> And if I add wt=csv:
> http://localhost:8983/solr/select/?qt=tvrh&q=id:1800180382&fl=id&tv.tf_idf=true&wt=csv
> I get only this:
> id
> 1800180382
> 
> Maybe my request is wrong?
> 
> Another problem: when I get the terms and their tf-idf values, I store
> them in a map.
> But the values are not in any order. I want, e.g., to store only the
> top 10 terms,
> i.e. the 10 terms with the highest tf-idf values. Can I get them sorted in
> descending order?
> I haven't found anything for that. If it's not possible, I must sort them
> later in the map.
> 
> My last question is:
> every movie has a genre, often more than one.
> It's like the "cat" field (category) in the exampledocs with ipod/monitor
> etc., and it's an important point for the movies.
> How can I integrate this factor?
> I changed the boost attribute in the Solr XML schema like this:
> <field name="genre" type="string" indexed="true" stored="true"
> multiValued="true" omitNorms="false" boost="3" termVectors="true"
> termPositions="true" termOffsets="true"/>
> Is that enough or is there any other possibility?
> 
> As you may see, I am a beginner in Solr;
> a few weeks ago, at the beginning, it was even more difficult for me, but now
> it is going better.
> I would be very grateful for any help, ideas, tips or suggestions!
> 
> Many regards
> Nejla
>
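
The cosine similarity asked about in the quoted message can be sketched in
plain Java as below. This is only a minimal sketch, assuming each movie's
terms and tf-idf values have already been read from the TermVectorComponent
response into a Map<String, Double>. The class name CosineSimilarity and the
method name cosine are hypothetical, not part of Solr or Lucene.

```java
import java.util.Map;

/** Cosine similarity between two sparse term -> tf-idf weight vectors (hypothetical helper). */
final class CosineSimilarity {

    private CosineSimilarity() {}

    /**
     * Returns (a . b) / (|a| * |b|).
     * Returns 0.0 if either vector is empty or has zero length.
     */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        // Only terms present in both maps contribute to the dot product,
        // so iterate over the smaller map and look up in the larger one.
        Map<String, Double> small = a.size() <= b.size() ? a : b;
        Map<String, Double> large = (small == a) ? b : a;

        double dot = 0.0;
        for (Map.Entry<String, Double> e : small.entrySet()) {
            Double other = large.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }

        double denom = norm(a) * norm(b);
        return denom == 0.0 ? 0.0 : dot / denom;
    }

    /** Euclidean length of a sparse vector. */
    private static double norm(Map<String, Double> v) {
        double sumSquares = 0.0;
        for (double w : v.values()) {
            sumSquares += w * w;
        }
        return Math.sqrt(sumSquares);
    }
}
```

Calling CosineSimilarity.cosine(movieA, movieB) with two such maps gives a
value between 0 and 1 for non-negative tf-idf weights. Whether to keep one
map per field (genre, description, cast) or merge them into one map per movie
is a design choice the sketch leaves open.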

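For the question about keeping only the 10 terms with the highest tf-idf
values, one option is to sort on the client side after reading the response.
The following is a minimal sketch, assuming Java 8 or later and the same
Map<String, Double> of term weights as input. The names TopTerms and topN
are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Keeps the n highest-weighted terms of a term -> tf-idf map (hypothetical helper). */
final class TopTerms {

    private TopTerms() {}

    /**
     * Returns a new map holding at most n entries, ordered by descending weight.
     * A LinkedHashMap is used so that iteration follows that order.
     */
    static Map<String, Double> topN(Map<String, Double> weights, int n) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(weights.entrySet());

        // Highest weight first; ties are broken alphabetically by term
        // so the result is deterministic.
        entries.sort((x, y) -> {
            int byWeight = Double.compare(y.getValue(), x.getValue());
            return byWeight != 0 ? byWeight : x.getKey().compareTo(y.getKey());
        });

        int limit = Math.min(Math.max(n, 0), entries.size());
        Map<String, Double> top = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : entries.subList(0, limit)) {
            top.put(e.getKey(), e.getValue());
        }
        return top;
    }
}
```

TopTerms.topN(termWeights, 10) then yields the 10 top terms in descending
order. The input map is left unchanged, so the full vector can still be used
elsewhere.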