Hi,

We're an early-stage analytics company evaluating Riak as an option for our analytics DB. Long story short: we don't have the volume to necessitate, or the time to support, Hive / HBase / Hadoop, and our current analytics database (RavenDB) is starting to buckle under the load we're throwing at it.
I went through this presentation about Kiip (a company with struggles similar to ours) scaling on Riak, and wanted to ask some questions as prospective heavy MapReduce users: https://speakerdeck.com/mitchellh/day-at-kiip

So here's what I wanted to ask:

1. Is there an easy mechanism for running scheduled MapReduce jobs within Riak? If there isn't, how would you set this up: a cron job, or Hadoop running on a separate box?

2. I've seen some creative strategies for using post-commit hooks in Riak to write incremental aggregate values (i.e. for each record added to the User bucket, increment the TotalUsers property by one on a DailyUsers record in the DailyUsers bucket). How well does this work in practice? Are there any atomicity or consistency issues across Riak's vnodes that could cause problems?

3. Is there a way to give Riak a hint about where it should store data? I.e., if I have related objects across multiple buckets, can I store them on the same vnode so we don't have to span the network to run a M/R job?

4. What are some best practices for running large (millions of objects) MapReduce jobs?

Best,

--
*Aaron Stannard* • Founder • MarkedUp • markedup.com • [email protected] • 424.256.8675 • github.com/Aaronontheweb • @aaronontheweb
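P.S. To make question 2 concrete, here's a minimal sketch of the lost-update race I'm worried about. This is plain Python against an in-memory dict, not Riak client code, and the key name is made up; it just models two post-commit hooks that each do a read-increment-write against the same aggregate record:

```python
# Simulated key/value store standing in for a Riak bucket (illustration only).
store = {"DailyUsers/2012-11-01": 0}

def read(key):
    return store[key]

def write(key, value):
    store[key] = value

# Two hooks fire "concurrently": both read the counter before either writes.
a = read("DailyUsers/2012-11-01")      # hook A reads 0
b = read("DailyUsers/2012-11-01")      # hook B also reads 0
write("DailyUsers/2012-11-01", a + 1)  # hook A writes 1
write("DailyUsers/2012-11-01", b + 1)  # hook B writes 1 -- A's increment is lost

print(store["DailyUsers/2012-11-01"])  # prints 1, not the expected 2
```

If Riak's vnodes can interleave hook executions like this, I'd expect either sibling resolution or some conflict-handling strategy to be necessary for these aggregate counters.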
_______________________________________________ riak-users mailing list [email protected] http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
