In my opinion, one of our main goals for Accumulo is “it just works.” Specifically, Accumulo’s development focuses on fault tolerance, ingest performance, and ease of administration. Its design likely scales to larger clusters than HBase's does, because Accumulo splits its metadata table, which gives it a 3-level lookup hierarchy instead of a 2-level one.

A few other important points:

* Accumulo supports very large rows with very large numbers of columns; its rows do not need to fit in memory and its columns do not need to be specified in advance. This opens Accumulo up to new kinds of table designs that take advantage of those features.
* Accumulo has an off-heap in-memory map (where recently written data goes), which makes it less susceptible to Java garbage collection issues and may have a positive effect on its ingest rates.
* Accumulo has faster, more fault-tolerant splits than HBase and has no issue performing splits while a table is in use. User-initiated administrative operations are also performed atomically through Accumulo’s fault-tolerant execution system.
* Accumulo ensures that key timestamps set by the server never go backwards, even when time across the cluster is incorrect. I have no idea how HBase operates without this, as it is essential in preventing data loss.
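To illustrate the timestamp point above, here is a minimal Java sketch of a server-side timestamp source that cannot go backwards. It is not Accumulo's implementation: the class `MonotonicTimestamps` and its `next()` method are hypothetical names, and the sketch assumes each server issues timestamps from one local counter that never drops below the last value it handed out. It is strictly increasing, which is slightly stronger than "never goes backwards."

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch (not Accumulo code): timestamps issued by one server
 * never go backwards, even if the wall clock jumps back.
 */
public class MonotonicTimestamps {
    private final LongSupplier wallClock; // assumed source of "now" in milliseconds
    private long lastIssued = Long.MIN_VALUE;

    public MonotonicTimestamps(LongSupplier wallClock) {
        this.wallClock = wallClock;
    }

    /** Returns a timestamp strictly greater than every one issued before. */
    public synchronized long next() {
        long now = wallClock.getAsLong();
        // If the clock went backwards or stood still, step past the last value instead.
        lastIssued = Math.max(now, lastIssued + 1);
        return lastIssued;
    }

    public static void main(String[] args) {
        // A fake clock that jumps backwards from 1005 to 990.
        long[] fakeClock = {1000L, 1005L, 990L, 990L};
        int[] idx = {0};
        MonotonicTimestamps ts = new MonotonicTimestamps(() -> fakeClock[idx[0]++]);
        for (int k = 0; k < fakeClock.length; k++) {
            System.out.println(ts.next()); // prints 1000, 1005, 1006, 1007
        }
    }
}
```

Taking the maximum of the clock reading and the previous value plus one keeps a later write from ever receiving an older timestamp than an earlier one, so it cannot be hidden behind that earlier write.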
Accumulo’s read caching is not currently as good as HBase's, and HBase generally has more features and better integration with other projects. Our notable features page may provide you with some additional ideas: http://accumulo.apache.org/notable_features.html

On Mon, Jun 23, 2014 at 10:55 AM, Jianshi Huang <[email protected]> wrote:

> Er... basically I need to explain to my manager why choosing Accumulo,
> instead of HBase.
>
> So what are the pros and cons of Accumulo vs. HBase? (btw HBase 0.98 also
> got cell-level security, modeled after Accumulo)
>
> --
> Jianshi Huang
>
> LinkedIn: jianshi
> Twitter: @jshuang
> Github & Blog: http://huangjs.github.com/
