Hi Marco, I'm also new to Riak and in the middle of researching whether Riak makes sense for a new project I'm working on.

Here are my thoughts based on what I've read so far:

- Riak can be pretty fast, but their game is more about rock-solid
  stability than raw performance
- you're going to want a cluster of 3+ nodes; you'd want multiple nodes
  for any m/r setup anyway
- Riak won't be super great for ad-hoc queries, so I'm assuming that
  we're talking about a canned daily report
- your schema needs to match your query patterns: add a 2i on the date,
  query by a specific date, then group by session; if you run the m/r as
  described w/o a 2i date filter first, every run will go back through
  page view data for past days, which presumably won't change after
  you've processed it
- m/r using protocol buffers can be significantly faster than using HTTP,
  so try again with a client that uses pb
- once you have multiple nodes set up, do_prereduce will split the load
  of reducing across the nodes; I think this would definitely be useful
  if your data reduces well per node, like: for day d, x registered user
  page views, y unregistered user page views
- updating shouldn't be a problem as long as it isn't hard for you to
  resolve collisions; if there's a collision you have to decide on the
  strategy for resolution; since this is log data and not something that
  requires transactions, that shouldn't be a problem

Some links that I've found useful:

http://joyeur.com/2010/10/31/riak-smartmachine-benchmark-the-technical-details/
(especially the comment)
http://devblog.seomoz.org/2011/10/using-riak-for-ranking-collection/
http://inakanetworks.com/blog/2011/08/25/when-to-use-riak/

The Basho Vimeo channel has a pile of informative videos where you'll
find good nuggets here and there:
http://vimeo.com/17604126

Hope that helps,
Steve

On Sun, Feb 12, 2012 at 9:00 AM, <[email protected]> wrote:
> Message: 1
> Date: Sun, 12 Feb 2012 11:27:22 +0000
> From: Marco Monteiro <[email protected]>
> To: [email protected]
> Subject: Is Riak a good solution for this problem?
>
> Hello!
>
> I'm considering Riak for the statistics of a site that is approaching a
> billion page views per month. The plan is to log a little information
> about each page view and then to query that data.
>
> I'm very new to Riak. I've gone over the documentation on the wiki, and I
> know about map-reduce, secondary indexes and Riak search. I've installed
> Riak on a single node and made a test with the default configuration. The
> results were a little below what I expected. For the test I used the
> following requirement.
>
> We want the page view count by day for registered and unregistered users.
> We are storing session documents. Each document has a session identifier
> as its key and a list of page views as the value (and a few additional
> properties we can ignore). This document structure comes from CouchDB,
> where I organised things like this to be able to more easily query the
> database. I've done a basic javascript map-reduce query for this. I just
> map over each session (every k/v in a bucket), returning the length of
> the page views array for either the registered or unregistered field (the
> other is zero), and the day of the request. In the reduce I collect them
> by hashing the day and summing the two numbers of page views. Then I have
> a second reduce to sort the list by day.
>
> This is very slow on a single-machine setup with the default Riak
> configuration. 1,000 sessions takes 6 seconds. 10,000 sessions takes more
> than 2 minutes (timeout). We want to handle 10,000,000 sessions, at
> least. Is there a way, maybe with secondary indexes, to make this go
> faster using only Riak? Or must I use some kind of persistent cache to
> store this info as time goes by? Or can I make Riak run 100 times faster
> by tweaking the config? I don't want to have 1000 machines for making
> this work.
>
> Also, will updating the session documents be a problem for Riak?
> Would it be better to store each page hit under a new key, so as not to
> update the session document? Because of the "multilevel" map reduce this
> can work on Riak, where it didn't work on CouchDB because of its view
> system limitations. Unfortunately, with the update of documents the
> CouchDB database was growing way too fast for it to be a feasible
> solution.
>
> Any advice to make Riak work for this problem is greatly appreciated.
>
> Thanks,
> Marco
>
> End of riak-users Digest, Vol 31, Issue 16
> ******************************************

--
facebook: http://facebook.com/picturebookApp
twitter: http://twitter.com/maplekey
blog: http://maplekeycompany.blogspot.com/
site: http://www.maplekeycompany.com/mobile/
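
For reference, the query discussed in this thread (map each session to per-day counts, sum by day, then sort) can be sketched in plain Python. This is only a simulation of the logic, not Riak's m/r API or the original JavaScript. The session field names (`day`, `registered`, `page_views`), the sample data, `select_by_day` (a stand-in for a 2i date lookup) and the `batch_size` batching (a stand-in for per-node pre-reduction) are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical session documents; the field names are assumptions for
# illustration, not the layout used in the original post.
SESSIONS = [
    {"day": "2012-02-10", "registered": True, "page_views": ["/", "/a", "/b"]},
    {"day": "2012-02-10", "registered": False, "page_views": ["/"]},
    {"day": "2012-02-11", "registered": False, "page_views": ["/", "/c"]},
    {"day": "2012-02-11", "registered": True, "page_views": ["/d"]},
]


def select_by_day(sessions, day):
    """Stand-in for a 2i lookup: keep only the sessions for one day."""
    return [s for s in sessions if s["day"] == day]


def map_session(session):
    """Map phase: emit one (day, registered_views, unregistered_views) row."""
    views = len(session["page_views"])
    if session["registered"]:
        return [(session["day"], views, 0)]
    return [(session["day"], 0, views)]


def reduce_by_day(rows):
    """Reduce phase: sum both counters per day.

    The output has the same shape as the input, so the function can be
    re-applied to partial results (one batch at a time, then a final pass).
    """
    totals = defaultdict(lambda: [0, 0])
    for day, registered, unregistered in rows:
        totals[day][0] += registered
        totals[day][1] += unregistered
    return [(day, reg, unreg) for day, (reg, unreg) in totals.items()]


def sort_by_day(rows):
    """Second reduce: order by day (ISO dates sort correctly as strings)."""
    return sorted(rows)


def run(sessions, batch_size=2):
    """Run map, batched pre-reduce, final reduce and sort over the sessions."""
    mapped = [row for s in sessions for row in map_session(s)]
    # Simulate pre-reduction by reducing fixed-size batches first.
    partials = []
    for i in range(0, len(mapped), batch_size):
        partials.extend(reduce_by_day(mapped[i:i + batch_size]))
    return sort_by_day(reduce_by_day(partials))


if __name__ == "__main__":
    for row in run(SESSIONS):
        print(row)
```

Because `reduce_by_day` returns rows in the same shape it consumes, the batch size does not change the final totals, which is the property that makes splitting the reduce across nodes safe for this kind of count.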
_______________________________________________ riak-users mailing list [email protected] http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
