Cassio,
I am not sure if there are direct/indirect ways to do this with existing
code.
Recall that an item neighborhood based score prediction, in simplest terms,
is a weighted average of the active user's ratings on other items, where
the weights are item-to-item similarities. Applying a decay
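The weighted-average formulation above can be sketched in a few lines of Python (the function and data layout here are hypothetical illustrations, not the Mahout API):

```python
# Sketch of item-neighborhood score prediction:
#   predicted(u, i) = sum_j sim(i, j) * rating(u, j) / sum_j |sim(i, j)|
def predict_score(target_item, user_ratings, similarity):
    """user_ratings: {item_id: rating} for the active user.
    similarity: {(item_a, item_b): sim} item-to-item similarities."""
    num = 0.0
    den = 0.0
    for item, rating in user_ratings.items():
        sim = similarity.get((target_item, item), 0.0)
        num += sim * rating
        den += abs(sim)
    return num / den if den else 0.0

ratings = {"m1": 4.0, "m2": 2.0}
sims = {("m3", "m1"): 0.8, ("m3", "m2"): 0.2}
print(predict_score("m3", ratings, sims))  # (0.8*4 + 0.2*2) / 1.0 = 3.6
```

A time decay would simply multiply each `sim * rating` term by an extra age-dependent weight.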
Pat,
Can you give us the query it generates when you enter "vampire werewolf
zombie" (q / qt / defType)?
My guess is you're using the default query parser with q.op=AND, or you're
using dismax/edismax with a high mm (min-should-match) value.
James Dyer
Ingram Content Group
(615) 213-4311
I have dismax (not edismax) but am not using it yet; I'm using the default query
parser, which does use ‘AND’. I had much the same thought as I slept on it. Changing to
OR is now working much, much better. So obvious it almost bit me; not good in
this case...
With only a trivially small amount of testing
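For illustration, the same three-term query parses roughly like this under the two settings (standard Lucene syntax, where `+` marks a required clause):

```
q.op=AND: +vampire +werewolf +zombie   (all three terms must match)
q.op=OR:   vampire  werewolf  zombie   (any term may match; more matches score higher)
```

With AND, a document missing even one of the terms is excluded entirely, which is why the result set opens up so much after switching to OR.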
Does anyone know what the difference is between keeping the ids in a space-delimited
string and indexing a multivalued field of ids? I recently tried the
latter since ... it felt right, but I am not sure which of the two has which
advantages.
On 07 Nov 2013, at 18:18, Pat Ferrel
One difference is that a “text” field has analyzers like Porter stemming
applied. I had to take these out of the schema.xml. I think TF-IDF is also
applied to the terms in “text” but may not be to MV fields. I think TF-IDF is
good in the application. The idea is that if everyone likes a movie, it
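As a rough illustration, a schema.xml fragment along these lines would strip the analysis down to whitespace tokenizing while keeping it a text field (the field and type names here are made up, and the stock `string` type is assumed to be defined elsewhere in the schema):

```xml
<!-- Hypothetical sketch: a text field with no stemming or stopword analysis,
     tokenized only on whitespace, so each id survives verbatim. -->
<fieldType name="whitespace_text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
<field name="similar_items" type="whitespace_text" indexed="true" stored="true"/>

<!-- By contrast, a string field is indexed as one verbatim value, no analysis: -->
<field name="similar_items_raw" type="string" indexed="true" stored="true"/>
```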
Pat,
Perhaps I am missing something here, but why not use a String field if you
do not need any of the analysis? It seems from your previous email ("The
query is a simple text query made of space delimited video id strings")
that you basically have a keyword-style query, which would seem to fit
Yes, you are correct, but my integration framework treats non-text fields as
scalars, so it is easier to neuter text than to implement full-text searching on
strings. I would do what you suggest if I were using raw Solr. My understanding
was that string also does not get TF-IDF applied, which is not what
Interesting to think about ordering and adjacency. The index ids are sorted
by Mahout strength, so the first id is the most similar to the row key and so
forth. But the query is ordered by recency. In both cases the first id is in
some sense the most important. Does Solr/Lucene care about
Not sure how you are going to decay in Mahout. Once ingested into Mahout there
are no timestamps. So you’ll have to do that before ingesting.
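A sketch of what that pre-ingest decay might look like (the function and the half-life parameter are assumptions for illustration, not anything Mahout provides):

```python
import math

# Hypothetical pre-ingest step: down-weight old preference events with an
# exponential decay before writing them out for Mahout, since Mahout itself
# keeps no timestamps once the data is ingested.
def decayed_weight(base_weight, event_ts, now_ts, half_life_days=30.0):
    """Halve the weight for every `half_life_days` days of age (timestamps in seconds)."""
    age_days = (now_ts - event_ts) / 86400.0
    return base_weight * math.pow(0.5, age_days / half_life_days)

now = 1_700_000_000
print(decayed_weight(1.0, now - 30 * 86400, now))  # exactly one half-life old -> 0.5
```

The half-life is a tuning knob; Koren's temporal-dynamics work (mentioned below in this thread) goes further and fits the decay rate per user.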
Last year we set up an e-commerce department-store type recommender with data from
online user purchases, add-to-cart actions, and views. The data was actual user
To the best of my knowledge, Lucene does not care about the position of a keyword
within a document.
You could bucket the ids into several fields, then use a dismax query to boost
the top-tier ids more than the second, etc.
A more fine-grained approach would probably involve a custom Similarity
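A minimal sketch of such a tiered dismax query, assuming hypothetical field names `tier1_ids`/`tier2_ids`/`tier3_ids` populated at index time from the Mahout similarity ranking, and using the standard `qf` boost syntax:

```python
from urllib.parse import urlencode

# Hypothetical tiered query: the strongest Mahout similarities were indexed
# into tier1_ids, the next band into tier2_ids, and so on. dismax's qf
# parameter then boosts a match in tier1_ids 4x over one in tier3_ids.
params = {
    "defType": "dismax",
    "q": "id1 id2 id3",
    "qf": "tier1_ids^4 tier2_ids^2 tier3_ids^1",
    "mm": "1",  # any one id matching is enough
}
query_string = urlencode(params)
print(query_string)
```

The resulting string would be appended to the usual `/select?` handler URL; the boost values themselves are arbitrary and would need tuning.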
Hi,
I’m trying to use OnlineLogisticRegression for a two-class classification
problem, but as my classification results are not very good, I wanted to ask
for support to find out if my settings are correct and if I’m using Mahout
correctly. Because if I’m doing it correctly then probably my
Another approach would be to weight the terms in the docs by their Mahout
similarity strength. But that will be for another day.
My current question is whether Lucene looks at word proximity. I see the query
syntax supports proximity, but I don’t see that it is the default, so that’s good.
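For reference, proximity in the standard Lucene query syntax is opt-in, expressed as phrase slop:

```
"id1 id2"~10    (the two terms must occur within 10 positions of each other)
id1 id2         (plain terms: no proximity constraint under the default parser)
```

So a bare space-delimited list of ids, as used here, carries no positional requirement.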
On Nov
Hi Pat,
On Nov 7, 2013, at 7:30pm, Pat Ferrel pat.fer...@gmail.com wrote:
Another approach would be to weight the terms in the docs by their Mahout
similarity strength. But that will be for another day.
My current question is whether Lucene looks at word proximity. I see the
query
On Thu, Nov 7, 2013 at 12:50 AM, Gokhan Capan gkhn...@gmail.com wrote:
This particular approach is discussed, and proven to increase the accuracy
in Collaborative filtering with Temporal Dynamics by Yehuda Koren. The
decay function is parameterized per user, keeping track of how consistent
Why is FEATURE_NUMBER != 13?
With 12 features that are already lovely and continuous, just stick them in
elements 1..12 of a 13-long vector and put a constant value at the
beginning of it. Hashed encoding is good for sparse stuff, but confusing
for your case.
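A plain-Python sketch of that encoding (illustration only, not Mahout's Vector API): the constant in element 0 acts as the intercept/bias term, and the 12 continuous features fill the rest.

```python
# Dense encoding for 12 continuous features plus a constant intercept term,
# as suggested above: element 0 is always 1.0, elements 1..12 are the features.
def encode(features):
    assert len(features) == 12, "expected exactly 12 continuous features"
    return [1.0] + [float(x) for x in features]

v = encode([0.5] * 12)
print(len(v), v[0])  # 13 1.0
```

This is why FEATURE_NUMBER would be 13 rather than the 100 used for hashed encoding: every slot is meaningful, so there is nothing to hash.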
Also, it looks like you only pass
On Thu, Nov 7, 2013 at 9:45 PM, Andreas Bauer b...@gmx.net wrote:
Hi,
Thanks for your comments.
I modified the examples from the Mahout in Action book; therefore I used
the hashed approach, and that's why I used 100 features. I'll adjust the
number.
Makes sense. But the book was doing
On Fri, Nov 8, 2013 at 6:24 AM, Ted Dunning ted.dunn...@gmail.com wrote:
On Thu, Nov 7, 2013 at 12:50 AM, Gokhan Capan gkhn...@gmail.com wrote:
This particular approach is discussed, and proven to increase the
accuracy
in Collaborative filtering with Temporal Dynamics by Yehuda Koren. The