Hi Manuel,

But if one uses matrix factorization and keeps the user and item factors in memory, then there is no database access at all during recommendation. I thought the original question was where to store the data and how to feed it to Hadoop.
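
To make the point concrete, here is a rough sketch of what serving from in-memory factors could look like. This is not Mahout's API; the function names, the dictionaries, and the factor values are all made up for illustration. It only assumes that the factorization has already been computed offline (for example by a Hadoop job) and loaded into memory once.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical factor tables, loaded once at startup (values are made up).
USER_FACTORS: Dict[int, List[float]] = {1: [1.0, 0.0], 2: [0.0, 1.0]}
ITEM_FACTORS: Dict[int, List[float]] = {10: [2.0, 0.0], 11: [0.0, 3.0], 12: [1.0, 1.0]}
# Items each user has already rated, also held in memory.
SEEN: Dict[int, Set[int]] = {1: {10}}


def dot(u: List[float], v: List[float]) -> float:
    """Dot product of two equally sized factor vectors."""
    return sum(a * b for a, b in zip(u, v))


def recommend(
    user_id: int,
    user_factors: Dict[int, List[float]],
    item_factors: Dict[int, List[float]],
    seen: Dict[int, Set[int]],
    how_many: int = 3,
) -> List[Tuple[int, float]]:
    """Score every unseen item for one user using only in-memory data."""
    u = user_factors[user_id]
    already = seen.get(user_id, set())
    scored = [
        (item_id, dot(u, f))
        for item_id, f in item_factors.items()
        if item_id not in already
    ]
    # Highest predicted preference first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:how_many]


if __name__ == "__main__":
    print(recommend(1, USER_FACTORS, ITEM_FACTORS, SEEN))
```

A request here is one dictionary lookup plus one dot product per item, so nothing touches the database while recommending. Where the raw preferences live (MySQL, MongoDB, a flat file) would only matter for the offline job that produces the factors.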

On Sun, May 19, 2013 at 9:01 PM, Manuel Blechschmidt <[email protected]> wrote:
> Hi Tevfik,
> one request to the recommender could become more than 1000 queries to the
> database, depending on which recommender you use and the number of
> preferences for the given user.
>
> The problem is not whether you are using SQL, NoSQL, or any other query
> language. The problem is the latency of the answers.
>
> An average TCP packet in the same data center takes 500 µs. A main memory
> reference takes 0.1 µs. This means that the main memory of your Java process
> can be accessed 5000 times faster than any other process, like a database,
> connected via TCP/IP.
>
> http://www.eecs.berkeley.edu/~rcs/research/interactive_latency.html
>
> Here you can see a screenshot that shows that database communication is by
> far (99%) the slowest component of a recommender request:
>
> https://source.apaxo.de/MahoutDatabaseLowPerformance.png
>
> If you do not want to cache your data in your Java process, you can use a
> fully in-memory database technology like SAP HANA
> http://www.saphana.com/welcome or EXASOL http://www.exasol.com/
>
> Nevertheless, if you are using these you do not need Mahout anymore.
>
> An architecture of a Mahout system can be seen here:
> https://github.com/ManuelB/facebook-recommender-demo/blob/master/docs/RecommenderArchitecture.png
>
> Hope that helps
> Manuel
>
> On 19.05.2013 at 19:20, Sean Owen wrote:
>
>> I'm first saying that you really don't want to use the database as a
>> data model directly. It is far too slow.
>> Instead you want to use a data model implementation that reads all of
>> the data, once, serially, into memory. And in that case, it makes no
>> difference where the data is being read from, because it is read just
>> once, serially. A file is just as fine as a fancy database. In fact
>> it's probably easier and faster.
>>
>> On Sun, May 19, 2013 at 10:14 AM, Tevfik Aytekin
>> <[email protected]> wrote:
>>> Thanks Sean, but I did not understand your answer. Can you please
>>> explain it again?
>>>
>>>
>>> On Sun, May 19, 2013 at 8:00 PM, Sean Owen <[email protected]> wrote:
>>>> It doesn't matter, in the sense that it is never going to be fast
>>>> enough for real-time at any reasonable scale if actually run off a
>>>> database directly. One operation results in thousands of queries. It's
>>>> going to read data into memory anyway and cache it there. So, whatever
>>>> is easiest for you. The simplest solution is a file.
>>>>
>>>> On Sun, May 19, 2013 at 9:52 AM, Ahmet Ylmaz
>>>> <[email protected]> wrote:
>>>>> Hi,
>>>>> I would like to use Mahout to make recommendations on my web site. Since
>>>>> the data is going to be big, hopefully, I plan to use the Hadoop
>>>>> implementations of the recommender algorithms.
>>>>>
>>>>> I'm currently storing the data in MySQL. Should I continue with it or
>>>>> should I switch to a NoSQL database such as MongoDB or something else?
>>>>>
>>>>> Thanks
>>>>> Ahmet
>
> --
> Manuel Blechschmidt
> M.Sc. IT Systems Engineering
> Dortustr. 57
> 14467 Potsdam
> Mobil: 0173/6322621
> Twitter: http://twitter.com/Manuel_B
