Wow, so many replies, and very educational. Thank you all!

I'm working on a graph backend, and I hope the same infrastructure can support:

1) Interactive graph exploration and queries
   Answering questions like: what are the interactions among N users from time A to time B, and how are the users connected (now and in the past)?

2) Real-time (<100 ms) feature calculation (aggregation, matching) in a network of accounts
   Answering questions like: what is the ratio of newly registered accounts in my "connected" network (the definition of "connected" needs to be flexible), and how fast does it change? Does the network have a path satisfying A(CN) -> B(IT) -> C(US) where the age of the path is less than 3 days? And so on.

3) Offline simulation of events, or offline calculation of new features (used for building models)
   For this I need to take snapshots and also save point-in-time data.

Having all of them in the same infrastructure will greatly simplify the implementation.

BTW, I'm working for PayPal, Risk Data Science. (All the questions above are made up and are not related to PayPal :)

I built a prototype for purpose 1) over the last two weeks, and my feeling about Accumulo is exactly what many of you have said: it just works! Very little admin work, and clean, clear documentation and APIs. One thing I haven't gotten right yet is high-speed ingestion: I only got 100K rows/sec/node, but that's already very satisfying. :)

BTW, from Mike's slides it seems HBase has much higher read throughput when the number of columns is small. Any comments? What about latency? Can I cache all data in memory in Accumulo to reduce latency for cold data (say, right after restarting my cluster)?

Jianshi

On Tue, Jun 24, 2014 at 10:41 AM, William Slacum <[email protected]> wrote:

> I think first and foremost, how has writing your application been? Is it
> something you can easily onboard other people for? Does it seem stable
> enough? If you can answer those questions positively, I think you have a
> winning situation.
>
> The big three Hadoop vendors (Cloudera, Hortonworks and MapR) all provide
> some level of support for Accumulo, so it has the pedigree of other members
> of the Hadoop ecosystem.
>
> Regarding the performance, I think Mike's presentation needs some context.
> He can definitely provide more context than the rest of us (and possibly
> Sean or Bill |-|), but I think one thing he was driving home is that out of
> the box, Accumulo is configured to run on someone's laptop. There are
> adjustments to be made when running at any scale greater than a dev machine,
> and they may not be documented clearly.
>
>
> On Mon, Jun 23, 2014 at 8:16 PM, Tejinder S Luthra <[email protected]>
> wrote:
>
>> Mike did a pretty good presentation comparing the performance of
>> Accumulo and HBase. Again, not official IMO, but it is pretty detailed in
>> the approach taken and is an apples-to-apples comparison:
>> http://www.slideshare.net/AccumuloSummit/10-30-drob
>>
>> From: Jeremy Kepner <[email protected]>
>> To: <[email protected]>
>> Date: 06/23/2014 07:42 PM
>> Subject: Re: How does Accumulo compare to HBase
>> ------------------------------
>>
>> Performance is probably the largest difference between Accumulo and HBase.
>>
>> Accumulo can ingest/scan at a rate of 800K entries/sec/node.
>> This performance scales well into the hundreds of nodes to deliver
>> 100M+ entries/sec.
>>
>> There are no recent HBase benchmarks and none in the peer-reviewed
>> literature.
>> Old data suggests that HBase performance is ~1% of Accumulo performance.
>>
>> In short, one can often replace a 20+ node database with
>> a single node Accumulo database.
>>
>> On Tue, Jun 24, 2014 at 01:55:54AM +0800, Jianshi Huang wrote:
>> > Er... basically I need to explain to my manager why we should choose
>> > Accumulo instead of HBase.
>> >
>> > So what are the pros and cons of Accumulo vs. HBase? (btw HBase 0.98 also
>> > got cell-level security, modeled after Accumulo)
>> >
>> > --
>> > Jianshi Huang
>> >
>> > LinkedIn: jianshi
>> > Twitter: @jshuang
>> > Github & Blog: http://huangjs.github.com/
>>
>
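Jianshi's first requirement (the interactions among N users from time A to time B) maps naturally onto a sorted key-value store such as Accumulo, where entries are kept sorted by key, so a key range can be read without touching the rest of the table. The sketch below emulates that with a sorted Python list and `bisect`. The row-key layout `<user>|<zero-padded timestamp>` and the helpers `make_key` / `scan_range` are hypothetical illustrations, not Accumulo's API and not a schema anyone in the thread proposed. In Accumulo itself the rough equivalent would be one `Range` per user, fetched together by a `BatchScanner`.

```python
from bisect import bisect_left

SEP = "|"  # hypothetical separator; assumes user ids never contain it


def make_key(user: str, ts: int) -> str:
    """Build a hypothetical row key "<user>|<epoch seconds, zero-padded>".

    Zero-padding (for non-negative ts) makes lexicographic order match
    numeric time order, which is what a sorted key-value store relies on.
    """
    return f"{user}{SEP}{ts:012d}"


def scan_range(sorted_keys: list, user: str, start_ts: int, end_ts: int) -> list:
    """Return the keys of `user` with start_ts <= ts < end_ts.

    Because the keys are sorted, the answer is one contiguous slice,
    located with two binary searches instead of a full scan.
    """
    lo = bisect_left(sorted_keys, make_key(user, start_ts))
    hi = bisect_left(sorted_keys, make_key(user, end_ts))
    return sorted_keys[lo:hi]


keys = sorted([
    make_key("alice", 100),
    make_key("alice", 150),
    make_key("alice", 250),
    make_key("bob", 120),
])
print(scan_range(keys, "alice", 100, 200))
# prints ['alice|000000000100', 'alice|000000000150']
```

For N users, the same lookup is simply repeated once per user; the key design choice is putting the user first and time second, so each user's history for a time window is contiguous.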
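Jianshi's second requirement can be pinned down with a small in-memory sketch of the two example features. Everything here is an assumption for illustration: the data model (`adj`, `registered_at`, `country`), the helper names, "connected" meaning "within k directed hops", "newly registered" meaning "registered within the last 7 days", and "age of path less than 3 days" meaning every edge on the path was created less than 3 days ago. It only defines the features; it says nothing about meeting the <100 ms budget, which would depend on how the graph is laid out in the store.

```python
from collections import deque

DAY = 86_400  # seconds

# Hypothetical in-memory model (not an Accumulo schema):
#   adj[account] -> list of (neighbor, edge_created_at) for directed edges
#   registered_at[account] -> registration time (epoch seconds)
#   country[account] -> country code


def k_hop_network(adj, start, k):
    """Accounts reachable from `start` in at most k hops (BFS), excluding start."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for nbr, _created in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    seen.discard(start)
    return seen


def new_account_ratio(adj, registered_at, start, k, now, window=7 * DAY):
    """Fraction of the k-hop network registered less than `window` seconds ago."""
    net = k_hop_network(adj, start, k)
    if not net:
        return 0.0
    new = sum(1 for a in net if now - registered_at[a] < window)
    return new / len(net)


def has_country_path(adj, country, pattern, now, max_age=3 * DAY):
    """True if some walk visits accounts whose countries follow `pattern` in order,
    using only edges created less than `max_age` seconds before `now`."""

    def extend(node, i):
        if i == len(pattern):
            return True  # every step of the pattern has been matched
        return any(
            now - created < max_age and country[nbr] == pattern[i] and extend(nbr, i + 1)
            for nbr, created in adj.get(node, ())
        )

    return any(country[n] == pattern[0] and extend(n, 1) for n in country)


now = 10 * DAY
adj = {"a": [("b", now - DAY)], "b": [("c", now - 2 * DAY), ("d", now - 5 * DAY)]}
country = {"a": "CN", "b": "IT", "c": "US", "d": "US"}
registered_at = {"a": 0, "b": now - DAY, "c": 0, "d": now - 2 * DAY}
print(new_account_ratio(adj, registered_at, "a", 2, now))
print(has_country_path(adj, country, ["CN", "IT", "US"], now))
```

Here `b` and `d` count as new, so the ratio for `a`'s 2-hop network is 2/3, and the path a(CN) -> b(IT) -> c(US) qualifies because both of its edges are younger than 3 days. Making `k`, `window`, and `max_age` parameters is one way to get the "flexible definition" the message asks for.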
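Jianshi's 100K rows/sec/node and Jeremy's 800K entries/sec/node are not directly comparable if, as assumed here, an "entry" is one Accumulo key/value pair: a row written with several columns then counts as several entries. The arithmetic below is a rough sketch under that assumption and under idealized linear scaling; the function names and the column counts are hypothetical.

```python
import math


def entries_per_sec(rows_per_sec: float, entries_per_row: float) -> float:
    """Convert a row rate to an entry rate.

    Assumes one entry = one key/value pair (one column of one row),
    so a row written with c columns counts as c entries.
    """
    return rows_per_sec * entries_per_row


def nodes_for_rate(target_per_sec: float, per_node_per_sec: float) -> int:
    """Nodes needed to reach a target rate, assuming perfectly linear scaling."""
    return math.ceil(target_per_sec / per_node_per_sec)


# Jeremy's figures: 800K entries/sec/node, 100M+ entries/sec in aggregate.
print(nodes_for_rate(100_000_000, 800_000))  # 125

# Jianshi's 100K rows/sec/node, for a few hypothetical column counts per row.
for cols in (1, 4, 8):
    print(cols, entries_per_sec(100_000, cols))
```

With 8 columns per row, for example, 100K rows/sec would already be 800K entries/sec, the same per-node figure Jeremy quotes; with a single column per row it would be one eighth of it.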
