Re: How does Accumulo compare to HBase

Josh Elser Mon, 23 Jun 2014 12:25:38 -0700

A few observations I can make from watching both communities (although only really participating in Accumulo's).

- HBase undeniably has a much larger public community of both users and developers; however, we are seeing broader adoption of Accumulo across different vertical markets. IMO, we have built up a rather responsive community here: lots of smart people work on Accumulo and are available and happy to help with problems.

- BatchScanner: The BatchScanner is a query construct which will automatically fetch data from a collection of Ranges on a table and return the results in the form of a Java Iterator. This makes for a very natural way to read lots of data from Accumulo, automatically performing some reduction in the data server-side (using Accumulo Iterators), and getting a wonderfully simple Iterator<Entry<Key,Value>> in your client code. It really helps to encourage a stateless and functional-like style in your code.

I really like it, and, when combined with the ability to push a bunch of work server-side, it has often kept me from having to write MapReduce jobs (which is always a win to me).

- Accumulo Iterators are a common thing you might hear mentioned as a difference. AFAICT, they're a bit more powerful than what you can do with HBase filters because you are presented with a stream of Key-Value pairs inside of the TServer. Again, it's a bit functional-programming inspired. You have the ability to combine, consume, seek within the stream and do what you please (more context would be helpful in giving specific examples).

That being said, Iterators do come with a learning curve, but that's to be expected given the amount of flexibility they provide. It's just like anything else :)

- <disclaimer>I can't comment about running HBase in production environments, but I tend to hear a lot of "war stories" about it. I also don't know how much of this comes from running old versions of HBase that don't have known issues patched.</disclaimer>

In my experience, Accumulo just works. It doesn't require much day-to-day interaction, processes stay running, and if some node goes haywire, I have absolutely no qualms about `kill -9`'ing it, knowing that everything will come back fine.
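To make the BatchScanner point above a little more concrete, here is a toy, in-memory sketch of the idea in plain Java (no Accumulo classes involved): a sorted "table", a collection of ranges, and one flat Iterator over everything those ranges cover. `ToyBatchScan`, `RowRange`, and `batchScan` are hypothetical names made up for illustration. With the real 1.x client API the entry point is roughly `connector.createBatchScanner(tableName, authorizations, numQueryThreads)` followed by `setRanges(...)`; the real scanner queries tablet servers in parallel and makes no ordering guarantee across ranges, whereas this sketch runs sequentially.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Toy, in-memory model of the BatchScanner idea (this is NOT the Accumulo API):
 * a sorted "table", a collection of row ranges, and one flat Iterator over
 * every entry that falls inside any of those ranges.
 */
public class ToyBatchScan {

    /** Hypothetical half-open row range [start, end), standing in for Accumulo's Range. */
    public static final class RowRange {
        final String start;
        final String end;

        public RowRange(String start, String end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Gathers the entries of every range and hands them back as a single Iterator. */
    public static Iterator<Map.Entry<String, String>> batchScan(
            NavigableMap<String, String> table, List<RowRange> ranges) {
        List<Map.Entry<String, String>> results = new ArrayList<>();
        for (RowRange r : ranges) {
            // subMap(start, true, end, false) is the slice [start, end) of the sorted table
            results.addAll(table.subMap(r.start, true, r.end, false).entrySet());
        }
        // Unlike the real BatchScanner, this runs sequentially and keeps ranges in order
        return results.iterator();
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (String row : new String[] {"a", "b", "c", "d", "e"}) {
            table.put(row, "value-" + row);
        }
        Iterator<Map.Entry<String, String>> it = batchScan(table,
                Arrays.asList(new RowRange("a", "c"), new RowRange("d", "z")));
        while (it.hasNext()) {
            Map.Entry<String, String> e = it.next();
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

The client only ever deals with one Iterator, no matter how many ranges were requested, which is the property that makes the real construct so pleasant to use.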
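Similarly, a rough sketch of the Iterators point: server-side code sees a sorted stream of key-value pairs and can transform it as it flows past. The toy below is plain Java, not Accumulo's actual `SortedKeyValueIterator` interface, and `ToyCombiner` and `Cell` are hypothetical names. It wraps a sorted stream and sums consecutive cells that share a row and column, which is the same idea behind Accumulo's bundled `SummingCombiner`.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/**
 * Toy model of a server-side combining iterator (NOT Accumulo's
 * SortedKeyValueIterator interface): it wraps a sorted stream and folds
 * consecutive cells that share row and column into a single summed cell.
 */
public class ToyCombiner {

    /** Hypothetical simplified cell: row, column, timestamp and a numeric value. */
    public static final class Cell {
        final String row;
        final String column;
        final long ts;
        final long value;

        public Cell(String row, String column, long ts, long value) {
            this.row = row;
            this.column = column;
            this.ts = ts;
            this.value = value;
        }

        @Override
        public String toString() {
            return row + "/" + column + "@" + ts + "=" + value;
        }
    }

    /** Wraps a source sorted by (row, column, ts descending) and sums each run of equal row+column. */
    public static Iterator<Cell> summing(final Iterator<Cell> source) {
        return new Iterator<Cell>() {
            // one cell of look-ahead: the first cell of the next run, or null at the end
            private Cell pending = source.hasNext() ? source.next() : null;

            @Override
            public boolean hasNext() {
                return pending != null;
            }

            @Override
            public Cell next() {
                if (pending == null) {
                    throw new NoSuchElementException();
                }
                Cell first = pending;
                long sum = first.value;
                pending = null;
                while (source.hasNext()) {
                    Cell c = source.next();
                    if (c.row.equals(first.row) && c.column.equals(first.column)) {
                        sum += c.value; // same row and column: fold into the running sum
                    } else {
                        pending = c; // start of the next run, kept for the following call
                        break;
                    }
                }
                // the combined cell keeps the newest timestamp (first in sorted order)
                return new Cell(first.row, first.column, first.ts, sum);
            }
        };
    }

    public static void main(String[] args) {
        List<Cell> sorted = Arrays.asList(
                new Cell("r1", "count", 30, 1),
                new Cell("r1", "count", 20, 2),
                new Cell("r1", "count", 10, 3),
                new Cell("r2", "count", 15, 5));
        Iterator<Cell> combined = summing(sorted.iterator());
        while (combined.hasNext()) {
            System.out.println(combined.next()); // r1/count@30=6, then r2/count@15=5
        }
    }
}
```

Because the wrapper only needs one cell of look-ahead, it can run over an arbitrarily long stream without buffering it, which is why this style fits naturally inside a tablet server.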


My $0.02.

- Josh

On 6/23/14, 2:49 PM, Josh Elser wrote:

Another way you could word this is that Accumulo has a very "mature"
security implementation, whereas, like you pointed out, HBase has only
recently added this in 0.98.

The point about visibility being in the Key, as opposed to the Value,
also has an impact when writing Iterators. Because the visibility is a
"first class citizen" instead of an afterthought, it helps uniquely
define a Key-Value pair, which makes aggregations much easier to think
about, IMO. This is especially true when aggregating server-side with an
Accumulo Iterator.
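A toy sketch of why visibility in the Key helps aggregation, using made-up names (`ToyVisibilityAggregation`, `Event`) and a simplified label model in which a visibility is either empty (visible to all) or a single token the reader must hold. Because the label is part of the grouping key, counts written under different labels are never merged, and each reader only adds up the groups their authorizations allow.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Toy sketch (no Accumulo classes): because the visibility label is part of
 * the grouping key, sums under different labels are never merged, and each
 * reader only adds up the groups their authorizations allow.
 */
public class ToyVisibilityAggregation {

    /** Hypothetical raw input record. */
    public static final class Event {
        final String row;
        final String column;
        final String visibility;
        final long count;

        public Event(String row, String column, String visibility, long count) {
            this.row = row;
            this.column = column;
            this.visibility = visibility;
            this.count = count;
        }
    }

    /** Sums counts per (row, column, visibility); '|' is assumed not to appear in any field. */
    public static Map<String, Long> aggregate(List<Event> events) {
        Map<String, Long> sums = new TreeMap<>();
        for (Event e : events) {
            sums.merge(e.row + "|" + e.column + "|" + e.visibility, e.count, Long::sum);
        }
        return sums;
    }

    /** Total for one row and column as seen by a reader holding {@code auths}. */
    public static long visibleTotal(Map<String, Long> sums, String row, String column,
                                    Set<String> auths) {
        String prefix = row + "|" + column + "|";
        long total = 0;
        for (Map.Entry<String, Long> e : sums.entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                continue;
            }
            String vis = e.getKey().substring(prefix.length());
            // simplification: "" means unlabeled (visible to all), otherwise one required token
            if (vis.isEmpty() || auths.contains(vis)) {
                total += e.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> sums = aggregate(Arrays.asList(
                new Event("doc1", "views", "", 2),
                new Event("doc1", "views", "secret", 4),
                new Event("doc1", "views", "", 1)));
        System.out.println(sums); // {doc1|views|=3, doc1|views|secret=4}
        System.out.println(visibleTotal(sums, "doc1", "views", Collections.<String>emptySet())); // 3
        System.out.println(visibleTotal(sums, "doc1", "views", Collections.singleton("secret"))); // 7
    }
}
```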

There are also other differences between the two implementations'
visibility filtering, the most common being HBase's support of a "NOT"
operator, which Accumulo explicitly chose not to implement. By allowing
"NOT" into the syntax, it becomes much more likely that data is
inadvertently leaked. Marking data correctly is more difficult than it
seems, and introducing the ability to negate certain branches makes it
even more difficult. Auditors are scary :)
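To illustrate the "no NOT" design, here is a hypothetical, simplified evaluator for Accumulo-style visibility expressions (tokens, `&`, `|`, and parentheses; quoting and the exact allowed token characters are glossed over, and `ToyVisibility`/`canSee` are made-up names). It rejects `!` outright and, like Accumulo, rejects mixing `&` and `|` at one level without parentheses. With only AND and OR, granting a user an extra authorization can only ever reveal more data, never less; a label like `!contractor` would instead expose data to anyone who merely lacks a token, which is exactly the kind of leak described above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Toy evaluator for Accumulo-style visibility expressions: tokens, '&', '|', parentheses, no NOT. */
public class ToyVisibility {
    private final String s;
    private final Set<String> auths;
    private int pos = 0;

    private ToyVisibility(String s, Set<String> auths) {
        this.s = s;
        this.auths = auths;
    }

    /** Returns true if a reader holding {@code auths} may see data labeled {@code expr}. */
    public static boolean canSee(String expr, Set<String> auths) {
        if (expr.isEmpty()) {
            return true; // no label: visible to everyone
        }
        ToyVisibility p = new ToyVisibility(expr, auths);
        boolean result = p.expr();
        if (p.pos != expr.length()) {
            throw p.error("unexpected '" + expr.charAt(p.pos) + "'");
        }
        return result;
    }

    // expr := term ( ('&' term)* | ('|' term)* ), i.e. one operator kind per nesting level
    private boolean expr() {
        boolean value = term();
        char op = 0;
        while (pos < s.length() && (s.charAt(pos) == '&' || s.charAt(pos) == '|')) {
            char c = s.charAt(pos++);
            if (op != 0 && c != op) {
                throw error("cannot mix '&' and '|' without parentheses");
            }
            op = c;
            boolean rhs = term(); // always parsed, so syntax errors are never skipped
            value = (op == '&') ? (value && rhs) : (value || rhs);
        }
        return value;
    }

    // term := TOKEN | '(' expr ')'
    private boolean term() {
        if (pos >= s.length()) {
            throw error("unexpected end of expression");
        }
        char c = s.charAt(pos);
        if (c == '!') {
            throw error("NOT is deliberately unsupported");
        }
        if (c == '(') {
            pos++;
            boolean v = expr();
            if (pos >= s.length() || s.charAt(pos) != ')') {
                throw error("missing ')'");
            }
            pos++;
            return v;
        }
        int start = pos;
        while (pos < s.length() && isTokenChar(s.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw error("expected a token at position " + start);
        }
        return auths.contains(s.substring(start, pos));
    }

    private static boolean isTokenChar(char c) {
        return Character.isLetterOrDigit(c) || "_-:./".indexOf(c) >= 0;
    }

    private IllegalArgumentException error(String msg) {
        return new IllegalArgumentException(msg + " in \"" + s + "\"");
    }

    public static void main(String[] args) {
        Set<String> ab = new HashSet<>(Arrays.asList("a", "b"));
        System.out.println(canSee("a&(b|c)", ab));                        // true
        System.out.println(canSee("a&(b|c)", Collections.singleton("a"))); // false
        try {
            canSee("!a", ab);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // NOT is deliberately unsupported in "!a"
        }
    }
}
```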

- Josh

On 6/23/14, 2:34 PM, Aaron wrote:

I'm not sure of all the differences, but wrt HBase Cell Level Security
(CLS): while similar, it's not 100% the same. If I understand correctly,
HBase CLS is an extension of its ACL system, and that ACL is "applied"
to a cell. In Accumulo's case, the visibility is part of the key. So the
ramification is that in Accumulo, you can have:

RowID, CF, CQ, VIS1, TS --> Value1
RowID, CF, CQ, VIS2, TS --> Value2

If everything else is the same, including the timestamp, the visibility
can actually determine which value to return. So, a more concrete
example would be:

XXX, METADATA, NAME, everyone,    100 --> Bruce Wayne
XXX, METADATA, NAME, alfred-only, 100 --> Batman

Where Alfred could/would see both "values"...but everyone else would
only see "Bruce Wayne".
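A toy sketch of Aaron's example in plain Java (`ToyCellVisibility` and its `Key` are hypothetical, not Accumulo classes): keys sort by row, column family, qualifier, visibility, then timestamp newest-first, so two entries that differ only in visibility are distinct keys and both survive. For simplicity each visibility is treated as a single token the reader must hold, so "everyone" is just an authorization every user is assumed to have.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy model: visibility is part of the key, so otherwise-identical cells can coexist. */
public class ToyCellVisibility {

    /** Simplified Accumulo-style key; TreeMap orders and de-duplicates via compareTo. */
    public static final class Key implements Comparable<Key> {
        final String row;
        final String cf;
        final String cq;
        final String vis;
        final long ts;

        public Key(String row, String cf, String cq, String vis, long ts) {
            this.row = row;
            this.cf = cf;
            this.cq = cq;
            this.vis = vis;
            this.ts = ts;
        }

        @Override
        public int compareTo(Key o) {
            int c = row.compareTo(o.row);
            if (c == 0) c = cf.compareTo(o.cf);
            if (c == 0) c = cq.compareTo(o.cq);
            if (c == 0) c = vis.compareTo(o.vis); // visibility participates in key identity
            if (c == 0) c = Long.compare(o.ts, ts); // newest timestamp first
            return c;
        }
    }

    /** Values visible to a reader with the given authorizations, in key order. */
    public static List<String> scan(SortedMap<Key, String> table, Set<String> auths) {
        List<String> seen = new ArrayList<>();
        for (Map.Entry<Key, String> e : table.entrySet()) {
            // simplification: each visibility is a single token the reader must hold
            if (auths.contains(e.getKey().vis)) {
                seen.add(e.getValue());
            }
        }
        return seen;
    }

    /** Aaron's two cells: same row, family, qualifier and timestamp; different visibility. */
    public static SortedMap<Key, String> example() {
        SortedMap<Key, String> table = new TreeMap<>();
        table.put(new Key("XXX", "METADATA", "NAME", "everyone", 100), "Bruce Wayne");
        table.put(new Key("XXX", "METADATA", "NAME", "alfred-only", 100), "Batman");
        return table;
    }

    public static void main(String[] args) {
        SortedMap<Key, String> table = example();
        Set<String> alfred = new HashSet<>(Arrays.asList("everyone", "alfred-only"));
        System.out.println(scan(table, alfred));                              // [Batman, Bruce Wayne]
        System.out.println(scan(table, Collections.singleton("everyone")));   // [Bruce Wayne]
    }
}
```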

Hope that helps.

Cheers,
Aaron

PS:  this is my understanding of how HBase CLS works...based on what I
have read/interpreted.



On Mon, Jun 23, 2014 at 1:55 PM, Jianshi Huang <[email protected]> wrote:

    Er... basically I need to explain to my manager why we are choosing
    Accumulo instead of HBase.

    So what are the pros and cons of Accumulo vs. HBase? (btw HBase 0.98
    also got cell-level security, modeled after Accumulo)

    --
    Jianshi Huang

    LinkedIn: jianshi
    Twitter: @jshuang
    Github & Blog: http://huangjs.github.com/
