Re: Which database should I use with Mahout

Manuel Blechschmidt Sun, 19 May 2013 11:02:03 -0700

Hi Tevfik,
a single request to the recommender can turn into more than 1000 queries to
the database, depending on which recommender you use and the number of
preferences for the given user.


The problem is not whether you use SQL, NoSQL, or any other query language. 
The problem is the latency of each answer.

An average round trip of a TCP packet within the same data center takes about
500 µs. A main memory reference takes about 0.1 µs. This means that the main
memory of your Java process can be accessed about 5000 times faster than any
other process, such as a database, that is reached via TCP/IP.

http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html
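
To make the arithmetic concrete, here is a rough sketch in plain Java. It only
uses the two figures quoted above and the "1000 queries per request" number
from the start of this mail. The class and method names are made up for
illustration, and it assumes the queries are issued one after another and
ignores the time the database needs to execute them.

```java
// Back-of-the-envelope estimate, not a measurement of any real system.
class LatencyEstimate {
    // Figures quoted in the mail above, expressed in nanoseconds.
    static final long ROUND_TRIP_NANOS = 500_000L; // TCP round trip, same data center (500 µs)
    static final long MEMORY_REF_NANOS = 100L;     // main memory reference (0.1 µs)

    // How many memory references fit into one network round trip.
    static long speedUp() {
        return ROUND_TRIP_NANOS / MEMORY_REF_NANOS;
    }

    // Total time for `lookups` sequential lookups at `nanosPerLookup` each.
    static long totalNanos(long lookups, long nanosPerLookup) {
        return lookups * nanosPerLookup;
    }

    public static void main(String[] args) {
        long queries = 1000; // assumed number of lookups for one recommender request
        System.out.println("speed-up factor: " + speedUp());
        System.out.println("via TCP/IP (ms): "
                + totalNanos(queries, ROUND_TRIP_NANOS) / 1_000_000.0);
        System.out.println("in memory  (ms): "
                + totalNanos(queries, MEMORY_REF_NANOS) / 1_000_000.0);
    }
}
```

Under these assumptions a single request spends about half a second just
waiting on the network, compared with about a tenth of a millisecond when the
same lookups are served from memory.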

Here you can see a screenshot that shows that database communication is by far 
(99%) the slowest component of a recommender request:

https://source.apaxo.de/MahoutDatabaseLowPerformance.png
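
The alternative discussed in this thread is to read all preferences once and
keep them in the Java process. A minimal sketch of that idea in plain Java
follows. It does not use Mahout's own data model classes, the class
InMemoryPreferences is hypothetical, and the "userID,itemID,preference" line
format is an assumption made for this example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mahout: keeps all preferences on the JVM heap.
class InMemoryPreferences {
    private final Map<Long, Map<Long, Float>> byUser = new HashMap<>();

    // Reads every "userID,itemID,preference" line once, in order.
    static InMemoryPreferences load(List<String> lines) {
        InMemoryPreferences prefs = new InMemoryPreferences();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = line.split(",");
            long userId = Long.parseLong(parts[0].trim());
            long itemId = Long.parseLong(parts[1].trim());
            float value = Float.parseFloat(parts[2].trim());
            prefs.byUser.computeIfAbsent(userId, id -> new HashMap<>()).put(itemId, value);
        }
        return prefs;
    }

    // Answered from memory: no network round trip per lookup.
    Map<Long, Float> preferencesOf(long userId) {
        return byUser.getOrDefault(userId, Collections.emptyMap());
    }
}
```

The lines could come from a file, for example via
java.nio.file.Files.readAllLines, or from a single bulk query against the
database. Either way the source is touched once at load time, and every
recommender request afterwards only touches the map.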

If you do not want to cache your data in your Java process, you can use a
fully in-memory database technology such as SAP HANA
http://www.saphana.com/welcome or EXASOL http://www.exasol.com/

Nevertheless, if you are using one of these, you do not need Mahout anymore.

An example architecture for a Mahout system can be seen here:
https://github.com/ManuelB/facebook-recommender-demo/blob/master/docs/RecommenderArchitecture.png

Hope that helps
    Manuel

On 19.05.2013 at 19:20, Sean Owen wrote:

> I'm first saying that you really don't want to use the database as a
> data model directly. It is far too slow.
> Instead you want to use a data model implementation that reads all of
> the data, once, serially, into memory. And in that case, it makes no
> difference where the data is being read from, because it is read just
> once, serially. A file is just as fine as a fancy database. In fact
> it's probably easier and faster.
> 
> On Sun, May 19, 2013 at 10:14 AM, Tevfik Aytekin
> <[email protected]> wrote:
>> Thanks Sean, but I could not get your answer. Can you please explain it 
>> again?
>> 
>> 
>> On Sun, May 19, 2013 at 8:00 PM, Sean Owen <[email protected]> wrote:
>>> It doesn't matter, in the sense that it is never going to be fast
>>> enough for real-time at any reasonable scale if actually run off a
>>> database directly. One operation results in thousands of queries. It's
>>> going to read data into memory anyway and cache it there. So, whatever
>>> is easiest for you. The simplest solution is a file.
>>> 
>>> On Sun, May 19, 2013 at 9:52 AM, Ahmet Ylmaz
>>> <[email protected]> wrote:
>>>> Hi,
>>>> I would like to use Mahout to make recommendations on my web site. Since
>>>> the data is going to be big, hopefully, I plan to use the Hadoop
>>>> implementations of the recommender algorithms.
>>>> 
>>>> I'm currently storing the data in MySQL. Should I continue with it or
>>>> should I switch to a NoSQL database such as MongoDB or something else?
>>>> 
>>>> Thanks
>>>> Ahmet

-- 
Manuel Blechschmidt
M.Sc. IT Systems Engineering
Dortustr. 57
14467 Potsdam
Mobil: 0173/6322621
Twitter: http://twitter.com/Manuel_B
