Hi Tevfik,

One request to the recommender can turn into more than 1,000 queries to the database, depending on which recommender you use and the number of preferences stored for the given user.
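As a rough back-of-the-envelope sketch of what that means, the snippet below multiplies the query count by the two latency figures cited in this message (about 500 µs for a TCP round trip inside one data center, about 0.1 µs for a main memory reference). The class and method names are made up for illustration, the count of 1,000 queries is an assumed example value, and the model assumes the queries run strictly one after another with no batching or connection overhead.

```java
public class LatencyEstimate {
    // Figures cited in this message, stored in nanoseconds so the
    // arithmetic stays in exact integers.
    public static final long TCP_ROUND_TRIP_NANOS = 500_000L; // about 500 µs
    public static final long MEMORY_REFERENCE_NANOS = 100L;   // about 0.1 µs

    /** Total waiting time when the lookups are issued one after another. */
    public static long sequentialWaitNanos(long lookups, long latencyNanos) {
        return lookups * latencyNanos;
    }

    /** How many times slower the slow access path is than the fast one. */
    public static long slowdownFactor(long slowNanos, long fastNanos) {
        return slowNanos / fastNanos;
    }

    public static void main(String[] args) {
        long lookups = 1000L; // assumed number of lookups for one recommendation

        long dbWait = sequentialWaitNanos(lookups, TCP_ROUND_TRIP_NANOS);
        long memWait = sequentialWaitNanos(lookups, MEMORY_REFERENCE_NANOS);

        System.out.println("database wait (ms): " + dbWait / 1_000_000.0);
        System.out.println("in-memory wait (ms): " + memWait / 1_000_000.0);
        System.out.println("slowdown factor: "
                + slowdownFactor(TCP_ROUND_TRIP_NANOS, MEMORY_REFERENCE_NANOS));
    }
}
```

Under these assumptions a single recommendation spends about half a second just waiting on the network, before the database has done any real work, while the same number of in-memory lookups costs a fraction of a millisecond.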
The problem is not whether you are using SQL, NoSQL, or any other query language. The problem is the latency of the answers. An average TCP round trip within the same data center takes about 500 µs. A main memory reference takes about 0.1 µs. This means that the main memory of your Java process can be accessed about 5000 times faster than any other process, such as a database, that is connected via TCP/IP.
http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html

Here you can see a screenshot showing that database communication is by far (99%) the slowest component of a recommender request:
https://source.apaxo.de/MahoutDatabaseLowPerformance.png

If you do not want to cache your data in your Java process, you can use a fully in-memory database technology such as SAP HANA http://www.saphana.com/welcome or EXASOL http://www.exasol.com/
Nevertheless, if you are using one of these, you do not need Mahout anymore.

An architecture of a Mahout system can be seen here:
https://github.com/ManuelB/facebook-recommender-demo/blob/master/docs/RecommenderArchitecture.png

Hope that helps
Manuel

On 19.05.2013 at 19:20, Sean Owen wrote:

> I'm first saying that you really don't want to use the database as a
> data model directly. It is far too slow.
> Instead you want to use a data model implementation that reads all of
> the data, once, serially, into memory. And in that case, it makes no
> difference where the data is being read from, because it is read just
> once, serially. A file is just as fine as a fancy database. In fact
> it's probably easier and faster.
>
> On Sun, May 19, 2013 at 10:14 AM, Tevfik Aytekin
> <[email protected]> wrote:
>> Thanks Sean, but I could not get your answer. Can you please explain it
>> again?
>>
>>
>> On Sun, May 19, 2013 at 8:00 PM, Sean Owen <[email protected]> wrote:
>>> It doesn't matter, in the sense that it is never going to be fast
>>> enough for real-time at any reasonable scale if actually run off a
>>> database directly. One operation results in thousands of queries. It's
>>> going to read data into memory anyway and cache it there. So, whatever
>>> is easiest for you. The simplest solution is a file.
>>>
>>> On Sun, May 19, 2013 at 9:52 AM, Ahmet Ylmaz
>>> <[email protected]> wrote:
>>>> Hi,
>>>> I would like to use Mahout to make recommendations on my web site. Since
>>>> the data is going to be big, hopefully, I plan to use hadoop
>>>> implementations of the recommender algorithms.
>>>>
>>>> I'm currently storing the data in mysql. Should I continue with it or
>>>> should I switch to a nosql database such as mongodb or something else?
>>>>
>>>> Thanks
>>>> Ahmet

-- 
Manuel Blechschmidt
M.Sc. IT Systems Engineering
Dortustr. 57
14467 Potsdam
Mobile: 0173/6322621
Twitter: http://twitter.com/Manuel_B
