Hi,
I would like to use Mahout to make recommendations on my web site. Since the
data will hopefully be big, I plan to use the Hadoop implementations of the
recommender algorithms.
I'm currently storing the data in MySQL. Should I continue with it, or should I
switch to a NoSQL database?
It doesn't matter, in the sense that it is never going to be fast
enough for real-time at any reasonable scale if actually run off a
database directly. One operation results in thousands of queries. It's
going to read data into memory anyway and cache it there. So, whatever
is easiest for you.
Thanks Sean, but I didn't quite understand your answer. Can you please explain it again?
On Sun, May 19, 2013 at 8:00 PM, Sean Owen sro...@gmail.com wrote:
It doesn't matter, in the sense that it is never going to be fast
enough for real-time at any reasonable scale if actually run off a
database
I'm first saying that you really don't want to use the database as a
data model directly. It is far too slow.
Instead you want to use a data model implementation that reads all of
the data, once, serially, into memory. And in that case, it makes no
difference where the data is being read from.
ok, got it, thanks.
On Sun, May 19, 2013 at 8:20 PM, Sean Owen sro...@gmail.com wrote:
I'm first saying that you really don't want to use the database as a
data model directly. It is far too slow.
Instead you want to use a data model implementation that reads all of
the data, once, serially,
Hi Tevfik,
one request to the recommender could become more than 1,000 queries to the
database, depending on which recommender you use and the amount of preferences
for the given user.
The problem is not whether you are using SQL, NoSQL, or any other store.
The problem is the latency of those round trips.
Hi Manuel,
But if one uses matrix factorization and stores the user and item
factors in memory, then there will be no database access during
recommendation.
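For concreteness, a sketch of why a factored model needs no database at serving time: each user and item is reduced to a short feature vector held in memory, and a predicted preference is just their dot product (the dimensions and values below are made up for illustration):

```java
// Sketch: after matrix factorization, each user and item is a short
// "feature" vector. A predicted rating is the dot product of the two
// vectors, so serving a recommendation touches no database at all.
public class FactorScorer {
    public static double predict(double[] userFactors, double[] itemFactors) {
        double score = 0.0;
        for (int f = 0; f < userFactors.length; f++) {
            score += userFactors[f] * itemFactors[f];
        }
        return score;
    }
}
```

Ranking candidate items for a user is then just this dot product repeated over the item factors, all in memory.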
I thought that the original question was where to store the data and
how to give it to hadoop.
On Sun, May 19, 2013 at 9:01 PM, Manuel
I think everyone is agreeing that it is essential to only access
information in memory at run-time, yes, whatever that info may be.
I don't think the original question was about Hadoop, but the answer
is the same: Hadoop mappers are just reading the input serially. There
is no advantage to a database as the input source.
Hi Sean,
If I understood you correctly, you are saying that I will not need MySQL. But if
I store my data on HDFS, will I be able to make fast queries such as
"Return all the ratings of a specific user"
which will be needed for showing the past ratings of a user?
Ahmet
(Oh, by the way, I realize the original question was about Hadoop. I
can't read carefully.)
No, HDFS is not good for anything like random access. For input,
that's OK, because you don't need random access, so HDFS is just fine.
For output, if you are going to then serve these precomputed results,
you will want them in a store built for fast per-user lookups.
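As an illustrative sketch (not Mahout API): serving the batch output just means bulk-loading the precomputed per-user recommendation lists into any fast key-value store once per run, here stood in for by a plain map:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: precomputed recommendations from the Hadoop job are bulk-loaded
// into a key-value store (here just a HashMap) after each batch run;
// serving is then a single in-memory lookup per request, with no HDFS
// random access at all.
public class RecStore {
    private final Map<Long, List<Long>> recsByUser = new HashMap<>();

    // Called once per user while loading the batch job's output.
    public void load(long userId, List<Long> recommendedItemIds) {
        recsByUser.put(userId, recommendedItemIds);
    }

    public List<Long> recommend(long userId) {
        return recsByUser.getOrDefault(userId, List.of());
    }
}
```

In production the map would be a real lookup service (a database, a cache, Solr, etc.), but the access pattern is the same: one key, one fetch.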
Hi Tevfik,
I am working with MySQL, but I would guess that HDFS, as Sean suggested, would
be a good idea as well.
There is also a project called Sqoop which can be used to transfer data from
relational databases to Hadoop.
http://sqoop.apache.org/
Scribe might also be an option for transferring data.
Dear all,
I'm experiencing difficulties with the HPPC library
(http://labs.carrotsearch.com/hppc.html) that I'm using. My
algorithms work perfectly fine for small inputs,
but when I go to an Amazon machine and want to compute larger inputs, my code
hangs forever as a result of some hidden bugs in that library.
Hello Sophie,
Mahout 0.7 Math module is available on Maven Central repository:
http://repo1.maven.org/maven2/org/apache/mahout/mahout-math/0.7/
Besides the jar with binaries, there are also javadoc and sources jars.
I've just counted: since the 0.7 release there have been 60 commits which
included math module changes.
Dear Stevo,
At this link
https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/math/package-summary.html
there are no OpenIntHashSet or OpenIntIntHashMap classes, or any with similar
names. Do they exist there?
Thank you for your reply,
Best wishes
On 19 May 2013 22:50, Stevo Slavić
I found it here; it seems okay now. That link is just strange.
On 19 May 2013 23:15, Sophie Sperner sophie.sper...@gmail.com wrote:
Dear Stevo,
By this link
https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/math/package-summary.html
there is no OpenIntHashSet or
They do, but it seems javadoc generation is not configured well: it doesn't
generate a report for the generated sources.
org.apache.mahout.math.set.OpenIntHashSet
org.apache.mahout.math.map.OpenIntIntHashMap
On Mon, May 20, 2013 at 12:15 AM, Sophie Sperner
sophie.sper...@gmail.com wrote:
Dear Stevo,
Sophie,
Can you say a bit more about what you want to do?
On Sun, May 19, 2013 at 2:22 PM, Sophie Sperner sophie.sper...@gmail.com wrote:
Dear,
I'm experiencing difficulties with
the HPPC library (http://labs.carrotsearch.com/hppc.html) that I'm using. My
algorithms work perfectly fine for small
Using a Hadoop version of a Mahout recommender will create some number of recs
for all users as its output. Sean is talking about Myrrix, I think, which uses
factorization to get much smaller models and so can calculate the recs at
runtime for fairly large user sets.
However if you are using
(I had in mind non distributed parts of Mahout but the principle is
similar, yes.)
On May 19, 2013 6:27 PM, Pat Ferrel pat.fer...@gmail.com wrote:
Using a Hadoop version of a Mahout recommender will create some number of
recs for all users as its output. Sean is talking about Myrrix I think
On Sun, May 19, 2013 at 6:26 PM, Pat Ferrel pat.fer...@gmail.com wrote:
Using a Hadoop version of a Mahout recommender will create some number of
recs for all users as its output. Sean is talking about Myrrix I think
which uses factorization to get much smaller models and so can calculate
the
Ah, which, for completeness, brings up another scaling issue with Mahout. The
in-memory Mahout recommenders do not pre-calculate all users' recs. They keep
the preference matrix in memory and calculate the recommendations at runtime.
At some point the size of your data will max out a single machine.
On Sun, May 19, 2013 at 8:04 PM, Pat Ferrel p...@occamsmachete.com wrote:
Two basic solutions to this are: factorize (reducing hundreds of thousands of
items to hundreds of 'features') and continue to calculate recs at runtime,
which you have to do with Myrrix since Mahout does not have an
Won't argue with how fast Solr is. It's another fast and scalable lookup engine
and another option, especially if you don't need to look up anything else by
user, in which case you are back to a db...
Using a cooccurrence matrix means you are doing item similarity, since there is
no user data in the model.
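A self-contained sketch of the cooccurrence idea (illustrative, not Mahout's implementation): count, for every item pair, how many users interacted with both. The resulting matrix depends only on items, which is why this is an item-similarity method:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: build item-item cooccurrence counts from per-user item lists.
// counts.get(a).get(b) = number of users who interacted with both a and b.
// No per-user data survives into the matrix itself.
public class Cooccurrence {
    public static Map<Long, Map<Long, Integer>> build(List<List<Long>> itemsPerUser) {
        Map<Long, Map<Long, Integer>> counts = new HashMap<>();
        for (List<Long> items : itemsPerUser) {
            for (long a : items) {
                for (long b : items) {
                    if (a != b) {
                        counts.computeIfAbsent(a, k -> new HashMap<>())
                              .merge(b, 1, Integer::sum);
                    }
                }
            }
        }
        return counts;
    }
}
```

At serving time you would look up the current user's recent items and rank other items by these counts, which is what makes a fast lookup engine like Solr a plausible serving layer for it.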
On Sun, May 19, 2013 at 8:34 PM, Pat Ferrel p...@occamsmachete.com wrote:
Won't argue with how fast Solr is, It's another fast and scalable lookup
engine and another option. Especially if you don't need to lookup anything
else by user, in which case you are back to a db...
But remember, it