I think the simplest implementation is to just get extra results from the
recommender and rescore after the rough retrieval. Integrating this into
the actual scoring engine is very hard since it depends on global
characteristics of the final result.
The same applies to result set clustering.
On Tue, May 21, 2013 at 10:34 PM, Johannes Schulte
johannes.schu...@gmail.com wrote:
Thanks for the list... as a non-native speaker, I have trouble understanding
the meaning of dithering here.
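A minimal sketch in Java of the over-fetch-and-rescore pipeline described
above (the Recommender and SetRescorer interfaces are hypothetical, not
Mahout API): pull more candidates than needed from the cheap retrieval
stage, then apply a rescoring pass that is free to use whole-set properties
such as diversity or clustering.

import java.util.List;

// Hypothetical interfaces: the rough retrieval stage and a rescorer that
// may depend on global characteristics of the whole candidate set.
interface Recommender { List<Long> recommend(long userId, int howMany); }
interface SetRescorer { List<Long> rescore(List<Long> candidates, int howMany); }

final class OverFetchPipeline {
  private final Recommender recommender;
  private final SetRescorer rescorer;
  private final int overFetchFactor; // e.g. 5: retrieve 5x the needed results

  OverFetchPipeline(Recommender recommender, SetRescorer rescorer, int overFetchFactor) {
    this.recommender = recommender;
    this.rescorer = rescorer;
    this.overFetchFactor = overFetchFactor;
  }

  List<Long> recommend(long userId, int howMany) {
    // Rough retrieval: cheap, scores items independently.
    List<Long> candidates = recommender.recommend(userId, howMany * overFetchFactor);
    // Rescoring pass: free to use set-level criteria the scoring engine can't.
    return rescorer.rescore(candidates, howMany);
  }
}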
Sorry about that. Your English is good enough that I hadn't noticed any
deficit.
Dithering is
Yes, what you are describing as diversification is something that I have
called anti-flood. It comes from the fact that we are really optimizing a
portfolio of recommendations rather than a batch of independent
recommendations. Doing this from first principles is very hard, but there are
very
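One simple greedy scheme in that direction (my sketch, not Mahout code):
walk the ranked candidates and penalize each by its similarity to items
already selected, so near-duplicates get pushed down rather than flooding
the top of the list.

import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

final class AntiFlood {
  // ranked/scores are the recommender's output, highest score first;
  // similarity returns a value in [0, 1]; penalty trades relevance for variety.
  static <T> List<T> select(List<T> ranked, List<Double> scores,
                            BiFunction<T, T, Double> similarity,
                            double penalty, int howMany) {
    List<T> pool = new ArrayList<>(ranked);
    List<Double> poolScores = new ArrayList<>(scores);
    List<T> picked = new ArrayList<>();
    while (picked.size() < howMany && !pool.isEmpty()) {
      int best = -1;
      double bestScore = Double.NEGATIVE_INFINITY;
      for (int i = 0; i < pool.size(); i++) {
        double s = poolScores.get(i);
        for (T p : picked) {
          s -= penalty * similarity.apply(pool.get(i), p); // flood penalty
        }
        if (s > bestScore) { bestScore = s; best = i; }
      }
      picked.add(pool.remove(best));
      poolScores.remove(best);
    }
    return picked;
  }
}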
Johannes,
Your summary is good.
I would add that the precalculated recommendations can be large enough that
the lookup becomes more expensive. Your point about staleness is very
on-point.
On Mon, May 20, 2013 at 10:15 PM, Johannes Schulte
johannes.schu...@gmail.com wrote:
I think Pat is
Thanks! Could you also add how to learn the weights you talked about, or at
least a hint? Learning weights for search engine query terms always sounds
like learning to rank to me, but that always seemed pretty complicated
and I never managed to try it out.
On Tue, May 21, 2013 at 8:01 AM, Ted
In the interest of getting some empirical data out about various architectures:
On Mon, May 20, 2013 at 9:46 AM, Pat Ferrel pat.fer...@gmail.com wrote:
...
You use the user history vector as a query?
The most recent suffix of the history vector. How much is used varies by
the purpose.
We
I have so far just used the weights that Solr applies natively.
In my experience, what makes a recommendation engine work better is, in
order of importance,
a) dithering so that you gather wider data (see the sketch below)
b) using multiple sources of input
c) returning results quickly and reliably
d) the actual
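Item (a) in code form, a minimal sketch of one common dithering scheme (my
formulation, not a Mahout API): perturb the log of each result's rank with
Gaussian noise and re-sort, so items below the usual fold are occasionally
shown and you gather training data on them.

import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class Dithering {
  private final Random random = new Random();
  private final double logEpsilon; // noise scale; epsilon just above 1 is mild

  Dithering(double epsilon) {
    this.logEpsilon = Math.log(epsilon);
  }

  // Reorder by score = log(rank) + N(0, log(epsilon)); top ranks mostly
  // keep their place, deep ranks get shuffled a lot.
  <T> List<T> dither(List<T> ranked) {
    List<Map.Entry<T, Double>> scored = new ArrayList<>();
    for (int rank = 0; rank < ranked.size(); rank++) {
      double score = Math.log(rank + 1) + random.nextGaussian() * logEpsilon;
      scored.add(new AbstractMap.SimpleEntry<>(ranked.get(rank), score));
    }
    scored.sort(Map.Entry.comparingByValue());
    List<T> out = new ArrayList<>(scored.size());
    for (Map.Entry<T, Double> e : scored) {
      out.add(e.getKey());
    }
    return out;
  }
}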
Inline
On Tue, May 21, 2013 at 8:59 AM, Pat Ferrel p...@occamsmachete.com wrote:
In the interest of getting some empirical data out about various
architectures:
On Mon, May 20, 2013 at 9:46 AM, Pat Ferrel pat.fer...@gmail.com wrote:
...
You use the user history vector as a query?
Thanks for the list... as a non-native speaker, I have trouble understanding
the meaning of dithering here.
I got the feeling that somewhere between a) and d) there is also
diversification of items in the recommendation list, i.e. increasing the
distance between the list items according to some
I certainly have questions about this architecture mentioned below but first
let me make sure I understand.
You use the user history vector as a query? This will be a list of item IDs and
strength-of-preference values (maybe 1s for purchases). The cooccurrence matrix
has columns treated like
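My reading of that architecture as a SolrJ sketch (the core name, field
name, and item IDs are my assumptions, not anything from the thread): each
item is indexed as a Solr document whose indicator field holds the IDs of
items that cooccur with it, and the recent suffix of the user's history
becomes the query string.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SolrRecommendQuery {
  public static void main(String[] args) throws Exception {
    HttpSolrClient solr =
        new HttpSolrClient.Builder("http://localhost:8983/solr/items").build();
    // Recent suffix of the user's history; how much to use varies by purpose.
    String[] recentHistory = {"item42", "item7", "item99"};
    SolrQuery query = new SolrQuery(String.join(" ", recentHistory));
    query.set("df", "indicators"); // field holding cooccurring item IDs
    query.setRows(10);
    QueryResponse response = solr.query(query);
    response.getResults()
        .forEach(doc -> System.out.println(doc.getFieldValue("id")));
    solr.close();
  }
}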
Hi Pat,
On May 20, 2013, at 9:46am, Pat Ferrel wrote:
I certainly have questions about this architecture mentioned below but first
let me make sure I understand.
You use the user history vector as a query? This will be a list of item IDs
and strength-of-preference values (maybe 1s for
Inline answers.
On Mon, May 20, 2013 at 9:46 AM, Pat Ferrel pat.fer...@gmail.com wrote:
...
You use the user history vector as a query?
The most recent suffix of the history vector. How much is used varies by
the purpose.
This will be a list of item IDs and strength-of-preference values
I think Pat is just saying that
time(history_lookup) (1) + time(recommendation_calculation) (2) >
time(precalc_lookup) (3)
since (1) and (3) are assumed to be served by the same system class (key-value
store, db) with a single key, and (2) > 0.
Ted is using a lot of information that is available at
Hi,
I would like to use Mahout to make recommendations on my web site. Since the
data is (hopefully) going to be big, I plan to use the Hadoop implementations
of the recommender algorithms.
I'm currently storing the data in MySQL. Should I continue with it, or should
I switch to a NoSQL database?
It doesn't matter, in the sense that it is never going to be fast
enough for real-time at any reasonable scale if actually run off a
database directly. One operation results in thousands of queries. It's
going to read data into memory anyway and cache it there. So, whatever
is easiest for you. The
Thanks Sean, but I couldn't follow your answer. Can you please explain it again?
On Sun, May 19, 2013 at 8:00 PM, Sean Owen sro...@gmail.com wrote:
It doesn't matter, in the sense that it is never going to be fast
enough for real-time at any reasonable scale if actually run off a
database
I'm first saying that you really don't want to use the database as a
data model directly. It is far too slow.
Instead you want to use a data model implementation that reads all of
the data, once, serially, into memory. And in that case, it makes no
difference where the data is being read from,
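In Mahout's Taste API that looks like the following (a sketch; the file name
and format are placeholders): the model is read once, serially, and
everything afterwards is served from memory. For a JDBC source, wrapping the
JDBC data model in ReloadFromJDBCDataModel plays the same caching role.

import java.io.File;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.model.DataModel;

public class LoadModel {
  public static void main(String[] args) throws Exception {
    // CSV lines of the form: userID,itemID,preference
    DataModel model = new FileDataModel(new File("preferences.csv"));
    System.out.println("users: " + model.getNumUsers()
        + ", items: " + model.getNumItems());
  }
}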
ok, got it, thanks.
On Sun, May 19, 2013 at 8:20 PM, Sean Owen sro...@gmail.com wrote:
I'm first saying that you really don't want to use the database as a
data model directly. It is far too slow.
Instead you want to use a data model implementation that reads all of
the data, once, serially,
Hi Tevfik,
one request to the recommender could turn into more than 1,000 queries to the
database, depending on which recommender you use and the number of preferences
for the given user.
The problem is not whether you are using SQL, NoSQL, or any other query
language. The problem is the latency of the
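To put rough, illustrative numbers on that latency point: 1,000 queries at
even 0.5 ms of round-trip time each is about 500 ms inside the database for
a single recommendation request, while scanning the same preferences already
held in memory costs on the order of microseconds.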
Hi Manuel,
But if one uses matrix factorization and stores the user and item
factors in memory, then there will be no database access during
recommendation.
I thought that the original question was where to store the data and
how to give it to hadoop.
On Sun, May 19, 2013 at 9:01 PM, Manuel
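A sketch of what serving from in-memory factors looks like (my illustration,
not a particular Mahout class): once the factorization has run, scoring a
user/item pair is just a dot product over the latent features, so no
database is touched at recommendation time.

import java.util.Map;

final class FactorScorer {
  private final Map<Long, double[]> userFactors; // userID -> k latent features
  private final Map<Long, double[]> itemFactors; // itemID -> k latent features

  FactorScorer(Map<Long, double[]> userFactors, Map<Long, double[]> itemFactors) {
    this.userFactors = userFactors;
    this.itemFactors = itemFactors;
  }

  double score(long userId, long itemId) {
    double[] u = userFactors.get(userId);
    double[] v = itemFactors.get(itemId);
    double dot = 0.0;
    for (int i = 0; i < u.length; i++) {
      dot += u[i] * v[i];
    }
    return dot;
  }
}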
I think everyone is agreeing that it is essential to only access
information in memory at run-time, yes, whatever that info may be.
I don't think the original question was about Hadoop, but, the answer
is the same: Hadoop mappers are just reading the input serially. There
is no advantage to a
From: Sean Owen sro...@gmail.com
To: Mahout User List user@mahout.apache.org
Sent: Sunday, May 19, 2013 9:26 PM
Subject: Re: Which database should I use with Mahout
I think everyone is agreeing that it is essential to only access
information in memory at run-time, yes, whatever that info may
for showing the past ratings of a user.
Ahmet
From: Sean Owen sro...@gmail.com
To: Mahout User List user@mahout.apache.org
Sent: Sunday, May 19, 2013 9:26 PM
Subject: Re: Which database should I use with Mahout
I think everyone is agreeing
Hi Tevfik,
I am working with MySQL, but I would guess that HDFS, as Sean suggested, would
be a good idea as well.
There is also a project called Sqoop which can be used to transfer data from
relational databases to Hadoop.
http://sqoop.apache.org/
Scribe might also be an option for transferring
Using a Hadoop version of a Mahout recommender will create some number of recs
for all users as its output. Sean is talking about Myrrix, I think, which uses
factorization to get much smaller models and so can calculate the recs at
runtime for fairly large user sets.
However if you are using
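The serving side of that precalculated path is then just a keyed lookup; a
sketch, with a plain Map standing in for whatever store (HBase, Redis, a db
table) actually holds the Hadoop job's output:

import java.util.Collections;
import java.util.List;
import java.util.Map;

final class PrecalcStore {
  private final Map<Long, List<Long>> recsByUser; // bulk-loaded job output

  PrecalcStore(Map<Long, List<Long>> recsByUser) {
    this.recsByUser = recsByUser;
  }

  List<Long> recommend(long userId) {
    List<Long> recs = recsByUser.get(userId);
    return recs != null ? recs : Collections.<Long>emptyList();
  }
}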
(I had in mind non distributed parts of Mahout but the principle is
similar, yes.)
On May 19, 2013 6:27 PM, Pat Ferrel pat.fer...@gmail.com wrote:
Using a Hadoop version of a Mahout recommender will create some number of
recs for all users as its output. Sean is talking about Myrrix I think
On Sun, May 19, 2013 at 6:26 PM, Pat Ferrel pat.fer...@gmail.com wrote:
Using a Hadoop version of a Mahout recommender will create some number of
recs for all users as its output. Sean is talking about Myrrix I think
which uses factorization to get much smaller models and so can calculate
the
Ah, which for completeness brings up another scaling issue with Mahout. The
in-memory Mahout recommenders do not pre-calculate all users' recs. They keep
the preference matrix in memory and calculate the recommendations at runtime.
At some point the size of your data will max out a single machine.
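That runtime pattern, in Mahout's Taste classes (a sketch; the input file
and the similarity choice are placeholders): the whole preference matrix
lives in the DataModel and each request is computed on the fly.

import java.io.File;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;

public class InMemoryRecs {
  public static void main(String[] args) throws Exception {
    DataModel model = new FileDataModel(new File("preferences.csv"));
    GenericItemBasedRecommender recommender =
        new GenericItemBasedRecommender(model, new LogLikelihoodSimilarity(model));
    for (RecommendedItem item : recommender.recommend(42L, 10)) {
      System.out.println(item.getItemID() + " " + item.getValue());
    }
  }
}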
On Sun, May 19, 2013 at 8:04 PM, Pat Ferrel p...@occamsmachete.com wrote:
Two basic solutions to this are: factorize (reduces 100s of thousands of
items to hundreds of 'features') and continue to calculate recs at runtime,
which you have to do with Myrrix since Mahout does not have an
Won't argue with how fast Solr is. It's another fast and scalable lookup engine
and another option. Especially if you don't need to look up anything else by
user, in which case you are back to a db...
Using a cooccurrence matrix means you are doing item similarity since there is
no user data in
On Sun, May 19, 2013 at 8:34 PM, Pat Ferrel p...@occamsmachete.com wrote:
Won't argue with how fast Solr is. It's another fast and scalable lookup
engine and another option. Especially if you don't need to look up anything
else by user, in which case you are back to a db...
But remember, it