Re: Decaying score for old preferences when using the .refresh()

2013-11-07 Thread Gokhan Capan
Cassio, I am not sure if there are direct/indirect ways to do this with existing code. Recall that an item-neighborhood-based score prediction, in simplest terms, is a weighted average of the active user's ratings on other items, where the weights are item-to-item similarities. Applying a decay
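
The weighted average described in this message can be sketched as follows. This is a minimal illustration, not Mahout's implementation; `predict_score` and its argument names are hypothetical, and normalizing by the sum of absolute similarities is an assumed (though common) choice.

```python
def predict_score(user_ratings, similarities):
    """Predict a score for a target item as a similarity-weighted average.

    user_ratings: dict of item_id -> rating given by the active user
    similarities: dict of item_id -> similarity between that item and the target
    (Both names are hypothetical, chosen for illustration only.)
    """
    numerator = 0.0
    denominator = 0.0
    for item, rating in user_ratings.items():
        sim = similarities.get(item, 0.0)  # items with no known similarity add nothing
        numerator += sim * rating
        denominator += abs(sim)
    if denominator == 0.0:
        return None  # no similar items were rated, so no prediction is possible
    return numerator / denominator
```

With ratings `{"a": 4.0, "b": 2.0}` and similarities `{"a": 0.75, "b": 0.25}`, the more similar item dominates and the prediction is pulled toward its rating.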

RE: Solr-recommender for Mahout 0.9

2013-11-07 Thread Dyer, James
Pat, can you give us the query it generates when you enter "vampire werewolf zombie", along with q/qt/defType? My guess is you're using the default query parser with q.op=AND, or you're using dismax/edismax with a high mm (minimum-should-match) value. James Dyer Ingram Content Group (615) 213-4311

Re: Solr-recommender for Mahout 0.9

2013-11-07 Thread Pat Ferrel
I have dismax (no edismax) but am not using it yet, using the default query, which does use ‘AND’. I had much the same thought as I slept on it. Changing to OR is now working much, much better. So obvious it almost bit me, not good in this case... With only a trivially small amount of testing
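
The AND-versus-OR difference comes down to one request parameter. A minimal sketch, assuming the standard query parser and Solr's `q.op` parameter; `build_query` is a hypothetical helper, and the real request in this thread would carry video ids and whatever other parameters the integration framework adds.

```python
from urllib.parse import urlencode


def build_query(terms, operator="OR"):
    """Build the query string for a space-delimited term query.

    With q.op=AND every term must match, so a long list of ids returns
    few or no documents; with q.op=OR any matching term contributes.
    """
    params = {
        "q": " ".join(terms),  # space-delimited terms (ids in the recommender case)
        "q.op": operator,      # default operator for the standard query parser
        "wt": "json",
    }
    return urlencode(params)
```

Calling `build_query(["vampire", "werewolf", "zombie"], operator="AND")` reproduces the restrictive behaviour, and the default `operator="OR"` reproduces the fix.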

Re: Solr-recommender for Mahout 0.9

2013-11-07 Thread Dominik Hübner
Does anyone know what the difference is between keeping the ids in a space-delimited string and indexing a multivalued field of ids? I recently tried the latter since ... it felt right; however, I am not sure what the advantages of each are. On 07 Nov 2013, at 18:18, Pat Ferrel

Re: Solr-recommender for Mahout 0.9

2013-11-07 Thread Pat Ferrel
One difference is that a “text” field has analyzers like Porter stemming applied. I had to take these out of the schema.xml. I think TFIDF is also applied to the terms in “text” but may not be to MV fields. I think TFIDF is good in the application. The idea is that if everyone likes a movie, it

Re: Solr-recommender for Mahout 0.9

2013-11-07 Thread Andrew Psaltis
Pat, perhaps I am missing something here, but why not use a String field if you do not need any of the analysis? It seems from your previous email ("The query is a simple text query made of space delimited video id strings") that you basically have a keyword style query which would seem to fit

Re: Solr-recommender for Mahout 0.9

2013-11-07 Thread Pat Ferrel
Yes, you are correct, but my integration framework treats non-text fields as scalars, so it is easier to neuter text than implement fulltext searching on strings. I would do what you suggest if I were using raw Solr. My understanding was that string also does not get tfidf applied, which is not what

Re: Solr-recommender for Mahout 0.9

2013-11-07 Thread Pat Ferrel
Interesting to think about ordering and adjacency. The index ids are sorted by Mahout strength, so the first id is the most similar to the row key and so forth. But the query is ordered by recency. In both cases the first id is in some sense the most important. Does Solr/Lucene care about

Re: Decaying score for old preferences when using the .refresh()

2013-11-07 Thread Pat Ferrel
Not sure how you are going to decay in Mahout. Once ingested into Mahout there are no timestamps. So you’ll have to do that before ingesting. Last year we set up an ecom-department store type recommender with data from online user purchase, add-to-cart, and view. The data was actual user
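
One way to apply decay before ingestion is to down-weight each preference by its age while the timestamps are still available. The sketch below assumes an exponential half-life decay, which is an illustrative choice and not something this message prescribes; `decay_preferences`, the event tuple layout, and the 30-day half-life are all hypothetical.

```python
SECONDS_PER_DAY = 86400.0


def decay_preferences(events, now, half_life_days=30.0):
    """Scale each preference value by an age-based weight.

    events: iterable of (user_id, item_id, value, timestamp_seconds)
    now:    reference time in seconds, on the same clock as the timestamps
    Returns (user_id, item_id, decayed_value) triples with timestamps dropped,
    ready to be written out for ingestion.
    """
    decayed = []
    for user, item, value, ts in events:
        age_days = max(0.0, (now - ts) / SECONDS_PER_DAY)  # clamp future timestamps
        weight = 0.5 ** (age_days / half_life_days)        # halves every half_life_days
        decayed.append((user, item, value * weight))
    return decayed
```

A preference exactly one half-life old keeps half its value; a fresh preference is unchanged.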

RE: Solr-recommender for Mahout 0.9

2013-11-07 Thread Dyer, James
To the best of my knowledge, Lucene does not care about the position of a keyword within a document. You could bucket the ids into several fields. Then use a dismax query to boost the top-tier ids more than the second, etc. A more fine-grained approach would probably involve a custom Similarity
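
The bucketing idea can be sketched like this. The field names (`ids_tier1`, ...), tier sizes, and boost values are hypothetical; `defType`, `qf` with `field^boost` syntax, and `mm` are standard dismax parameters, but the schema would need matching fields defined.

```python
def bucket_ids(ranked_ids, tier_sizes=(2, 3)):
    """Split ids (already sorted by similarity strength) into tier fields.

    The first tier_sizes[0] ids go to ids_tier1, the next tier_sizes[1]
    to ids_tier2, and whatever remains goes to one final tier.
    """
    doc = {}
    start = 0
    for n, size in enumerate(tier_sizes, start=1):
        doc[f"ids_tier{n}"] = " ".join(ranked_ids[start:start + size])
        start += size
    doc[f"ids_tier{len(tier_sizes) + 1}"] = " ".join(ranked_ids[start:])
    return doc


def dismax_params(query_ids, boosts=(4, 2, 1)):
    """Build dismax parameters that boost matches in higher tiers more."""
    qf = " ".join(f"ids_tier{n}^{b}" for n, b in enumerate(boosts, start=1))
    return {
        "defType": "dismax",
        "q": " ".join(query_ids),
        "qf": qf,     # per-field boosts: top tier counts most
        "mm": "1",    # at least one query term must match
    }
```

A match on an id in the top tier then contributes more to the score than the same id matched in a lower tier, which approximates position sensitivity without a custom Similarity.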

OnlineLogisticRegression: Are my settings sensible

2013-11-07 Thread Andreas Bauer
Hi, I’m trying to use OnlineLogisticRegression for a two-class classification problem, but as my classification results are not very good, I wanted to ask for support to find out if my settings are correct and if I’m using Mahout correctly. Because if I’m doing it correctly then probably my

Re: Solr-recommender for Mahout 0.9

2013-11-07 Thread Pat Ferrel
Another approach would be to weight the terms in the docs by their Mahout similarity strength. But that will be for another day. My current question is whether Lucene looks at word proximity. I see the query syntax supports proximity but I don’t see that it is default so that’s good. On Nov

Re: Solr-recommender for Mahout 0.9

2013-11-07 Thread Ken Krugler
Hi Pat, On Nov 7, 2013, at 7:30pm, Pat Ferrel pat.fer...@gmail.com wrote: Another approach would be to weight the terms in the docs by their Mahout similarity strength. But that will be for another day. My current question is whether Lucene looks at word proximity. I see the query

Re: Decaying score for old preferences when using the .refresh()

2013-11-07 Thread Ted Dunning
On Thu, Nov 7, 2013 at 12:50 AM, Gokhan Capan gkhn...@gmail.com wrote: This particular approach is discussed, and proven to increase the accuracy, in "Collaborative Filtering with Temporal Dynamics" by Yehuda Koren. The decay function is parameterized per user, keeping track of how consistent

Re: OnlineLogisticRegression: Are my settings sensible

2013-11-07 Thread Ted Dunning
Why is FEATURE_NUMBER != 13? With 12 features that are already lovely and continuous, just stick them in elements 1..12 of a 13-element vector and put a constant value at the beginning of it. Hashed encoding is good for sparse stuff, but confusing for your case. Also, it looks like you only pass
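
The direct encoding suggested here can be sketched as follows. A plain Python list stands in for the vector that would be passed to the learner; `encode` is a hypothetical helper, and using 1.0 as the constant is an assumption (the conventional intercept value).

```python
NUM_FEATURES = 12  # continuous input features, as described in the thread


def encode(features, bias=1.0):
    """Place a constant at index 0 and the raw features at indices 1..12.

    No hashing is involved: each feature always lands in the same slot,
    so each learned weight corresponds to exactly one input feature.
    """
    if len(features) != NUM_FEATURES:
        raise ValueError(f"expected {NUM_FEATURES} features, got {len(features)}")
    return [bias] + [float(x) for x in features]
```

The resulting vector has length 13, which is the value the feature count should then be set to.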

Re: OnlineLogisticRegression: Are my settings sensible

2013-11-07 Thread Ted Dunning
On Thu, Nov 7, 2013 at 9:45 PM, Andreas Bauer b...@gmx.net wrote: Hi, Thanks for your comments. I modified the examples from the Mahout in Action book, therefore I used the hashed approach and that's why I used 100 features. I'll adjust the number. Makes sense. But the book was doing

Re: Decaying score for old preferences when using the .refresh()

2013-11-07 Thread Gokhan Capan
On Fri, Nov 8, 2013 at 6:24 AM, Ted Dunning ted.dunn...@gmail.com wrote: On Thu, Nov 7, 2013 at 12:50 AM, Gokhan Capan gkhn...@gmail.com wrote: This particular approach is discussed, and proven to increase the accuracy, in "Collaborative Filtering with Temporal Dynamics" by Yehuda Koren. The