Re: Which database should I use with Mahout

Tevfik Aytekin Sun, 19 May 2013 11:19:27 -0700

Hi Manuel,
But if one uses matrix factorization and stores the user and item
factors in memory, then there will be no database access during
recommendation.
I thought that the original question was where to store the data and
how to give it to Hadoop.
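
To illustrate the point about in-memory factors, here is a minimal
sketch in Python (not Mahout's API). It assumes the user and item
factor vectors have already been computed offline and loaded into plain
dictionaries; the names user_factors, item_factors, score and recommend,
and all the numbers, are made up for the example. Scoring is just a dot
product per item, so serving a request touches only memory:

```python
# Hypothetical factor vectors, assumed to be precomputed offline
# (for example by a batch factorization job) and loaded once at startup.
user_factors = {
    "u1": [0.9, 0.1],
    "u2": [0.2, 0.8],
}
item_factors = {
    "i1": [1.0, 0.0],
    "i2": [0.0, 1.0],
    "i3": [0.5, 0.5],
}


def score(user_vec, item_vec):
    """Predicted preference: dot product of user and item factors."""
    return sum(u * v for u, v in zip(user_vec, item_vec))


def recommend(user_id, seen_items, how_many):
    """Return the top-scoring unseen items, using only in-memory data."""
    user_vec = user_factors[user_id]
    scored = [
        (item_id, score(user_vec, item_vec))
        for item_id, item_vec in item_factors.items()
        if item_id not in seen_items
    ]
    # Highest predicted preference first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [item_id for item_id, _ in scored[:how_many]]


if __name__ == "__main__":
    print(recommend("u1", {"i1"}, 2))
```

The set of items a user has already seen is passed in by the caller
here; in a real system it would also have to be held in memory (or
fetched once per request) to keep the database off the request path.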


On Sun, May 19, 2013 at 9:01 PM, Manuel Blechschmidt
<[email protected]> wrote:
> Hi Tevfik,
> one request to the recommender could become more than 1000 queries to the 
> database depending on which recommender you use and the amount of preferences 
> for the given user.
>
> The problem is not if you are using SQL, NoSQL, or any other query language. 
> The problem is the latency of the answers.
>
> An average TCP round trip within the same data center takes about 500 µs. A 
> main memory reference takes about 0.1 µs. This means that the main memory of 
> your Java process can be accessed about 5000 times faster than any other 
> process, such as a database, connected via TCP/IP.
>
> http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html
>
> Here you can see a screenshot that shows that database communication is by 
> far (99%) the slowest component of a recommender request:
>
> https://source.apaxo.de/MahoutDatabaseLowPerformance.png
>
> If you do not want to cache your data in your Java process, you can use a 
> fully in-memory database technology like SAP HANA 
> http://www.saphana.com/welcome or EXASOL http://www.exasol.com/
>
> Nevertheless, if you are using these, you do not need Mahout anymore.
>
> An architecture of a Mahout system can be seen here:
> https://github.com/ManuelB/facebook-recommender-demo/blob/master/docs/RecommenderArchitecture.png
>
> Hope that helps
>     Manuel
>
> On 19.05.2013 at 19:20, Sean Owen wrote:
>
>> I'm first saying that you really don't want to use the database as a
>> data model directly. It is far too slow.
>> Instead you want to use a data model implementation that reads all of
>> the data, once, serially, into memory. And in that case, it makes no
>> difference where the data is being read from, because it is read just
>> once, serially. A file is just as fine as a fancy database. In fact
>> it's probably easier and faster.
>>
>> On Sun, May 19, 2013 at 10:14 AM, Tevfik Aytekin
>> <[email protected]> wrote:
>>> Thanks Sean, but I did not understand your answer. Could you please 
>>> explain it again?
>>>
>>>
>>> On Sun, May 19, 2013 at 8:00 PM, Sean Owen <[email protected]> wrote:
>>>> It doesn't matter, in the sense that it is never going to be fast
>>>> enough for real-time at any reasonable scale if actually run off a
>>>> database directly. One operation results in thousands of queries. It's
>>>> going to read data into memory anyway and cache it there. So, whatever
>>>> is easiest for you. The simplest solution is a file.
>>>>
>>>> On Sun, May 19, 2013 at 9:52 AM, Ahmet Ylmaz
>>>> <[email protected]> wrote:
>>>>> Hi,
>>>>> I would like to use Mahout to make recommendations on my web site. Since 
>>>>> the data is going to be big, hopefully, I plan to use the Hadoop 
>>>>> implementations of the recommender algorithms.
>>>>>
>>>>> I'm currently storing the data in MySQL. Should I continue with it or 
>>>>> should I switch to a NoSQL database such as MongoDB or something else?
>>>>>
>>>>> Thanks
>>>>> Ahmet
>
> --
> Manuel Blechschmidt
> M.Sc. IT Systems Engineering
> Dortustr. 57
> 14467 Potsdam
> Mobil: 0173/6322621
> Twitter: http://twitter.com/Manuel_B
>
