Re: How does Accumulo compare to HBase

Josh Elser Mon, 23 Jun 2014 13:29:32 -0700

Noted: I'll add it to the top of my "to blog" queue. If anyone else wants to do a write-up, I'm happy to help.


On 6/23/14, 4:23 PM, Donald Miner wrote:

This needs to be documented on the official blog.



On Mon, Jun 23, 2014 at 3:31 PM, Josh Elser wrote:

    Sent too quickly..

    - The BatchScanner communicates with tservers in *parallel*, which
    is where it really shows its strength.

    - A "default" locality group. You don't have to define the locality
    groups for a table at creation time in Accumulo (or modify the table
    if you want to insert a new column family). Because of this, you
    have a lot more flexibility in how you structure your tables, while
    still getting the efficient filtering that comes from the locality
    groups you have configured. Adding a new locality group does still
    require a compaction to rewrite the data into separate files.


    On 6/23/14, 3:24 PM, Josh Elser wrote:

        A few observations I can make from watching both communities
        (although only really participating in Accumulo's).

        - HBase undeniably has a much larger public community of both users
        and developers; however, we are seeing broader adoption across
        different vertical markets with Accumulo. IMO, we have a rather
        responsive community built up here. Lots of smart people are working
        on it who are available and happy to help with problems.

        - BatchScanner: The BatchScanner is a query construct which will
        automatically fetch data from a collection of Ranges on a table and
        return the results in the form of a Java Iterator. This makes for a
        very natural way to read lots of data from Accumulo, automatically
        performing some reduction of the data server-side (using Accumulo
        Iterators), and getting a wonderfully simple
        Iterator<Entry<Key,Value>> in your client code. It really helps to
        encourage a stateless, functional style in your code.

        I really like it, and, when combined with the ability to push a
        bunch of work server-side, it has often kept me from having to write
        MapReduce jobs (which is always a win to me).
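
The BatchScanner idea described above (many ranges in, one Iterator out, lookups done in parallel) can be sketched with nothing but the Java standard library. This is a toy model under stated assumptions, not the Accumulo API: `ToyBatchScan`, `RowRange`, and the in-memory `NavigableMap` standing in for a table are all hypothetical names made up for illustration. It also returns results grouped in range order for simplicity, whereas the real BatchScanner does not promise any ordering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map.Entry;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model only: this is NOT the Accumulo BatchScanner API.
class ToyBatchScan {

    // Hypothetical inclusive row range, standing in for an Accumulo Range.
    static final class RowRange {
        final String start;
        final String end;

        RowRange(String start, String end) {
            this.start = start;
            this.end = end;
        }
    }

    // Looks up each range on a worker thread, then exposes the combined
    // results to the caller as a plain Iterator.
    static Iterator<Entry<String, String>> scan(NavigableMap<String, String> table,
                                                List<RowRange> ranges, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<Entry<String, String>>>> futures = new ArrayList<>();
            for (RowRange r : ranges) {
                // One task per range; the table is only read, never modified.
                Callable<List<Entry<String, String>>> task =
                    () -> new ArrayList<>(table.subMap(r.start, true, r.end, true).entrySet());
                futures.add(pool.submit(task));
            }
            List<Entry<String, String>> results = new ArrayList<>();
            for (Future<List<Entry<String, String>>> f : futures) {
                try {
                    results.addAll(f.get()); // blocks until that range is done
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException("range lookup failed", e);
                }
            }
            return results.iterator();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("row1", "a");
        table.put("row2", "b");
        table.put("row5", "c");
        List<RowRange> ranges = Arrays.asList(new RowRange("row1", "row2"),
                                              new RowRange("row5", "row5"));
        Iterator<Entry<String, String>> it = scan(table, ranges, 2);
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

The point of the sketch is the shape of the client code: the caller hands over a collection of ranges and a thread count, and only ever deals with an Iterator.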

        - Accumulo Iterators are a common thing you might hear as a
        difference. AFAICT, they're a bit more powerful than what you can do
        with HBase filters because you are presented with a stream of
        Key-Value pairs inside of the TServer. Again, it's a bit functional
        programming inspired. You have the ability to combine, consume, and
        seek within the stream and do what you please (more context would be
        helpful in giving specific examples).

        That being said, Iterators do come with a learning curve, but that's
        to be expected with the amount of flexibility they provide. It's
        just like anything else :)
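
As a rough illustration of "combining" within a sorted stream of Key-Value pairs, here is a minimal sketch in plain Java. It is not an Accumulo Iterator and does not implement any Accumulo interface: `ToySummingIterator`, the `entry` helper, and the use of `String` keys with `Long` values are assumptions chosen to keep the example small. It relies on the one property the stream is assumed to have, namely that entries arrive sorted, so equal keys are adjacent.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map.Entry;
import java.util.NoSuchElementException;

// Toy model only: this does NOT implement an Accumulo Iterator.
class ToySummingIterator implements Iterator<Entry<String, Long>> {
    private final Iterator<Entry<String, Long>> source;
    private Entry<String, Long> pending; // next unread source entry, or null

    ToySummingIterator(Iterator<Entry<String, Long>> sortedSource) {
        this.source = sortedSource;
        this.pending = source.hasNext() ? source.next() : null;
    }

    @Override
    public boolean hasNext() {
        return pending != null;
    }

    @Override
    public Entry<String, Long> next() {
        if (pending == null) {
            throw new NoSuchElementException();
        }
        String key = pending.getKey();
        long sum = 0;
        // Consume every consecutive entry that shares the current key.
        while (pending != null && pending.getKey().equals(key)) {
            sum += pending.getValue();
            pending = source.hasNext() ? source.next() : null;
        }
        return entry(key, sum);
    }

    // Hypothetical helper for building entries in the example.
    static Entry<String, Long> entry(String key, long value) {
        return new SimpleImmutableEntry<String, Long>(key, Long.valueOf(value));
    }

    // Drains the combined stream into a list, for demonstration purposes.
    static List<Entry<String, Long>> combine(List<Entry<String, Long>> sorted) {
        List<Entry<String, Long>> out = new ArrayList<>();
        Iterator<Entry<String, Long>> it = new ToySummingIterator(sorted.iterator());
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry<String, Long>> sorted = new ArrayList<>();
        sorted.add(entry("clicks", 2));
        sorted.add(entry("clicks", 3));
        sorted.add(entry("views", 10));
        System.out.println(combine(sorted));
    }
}
```

Because the wrapper is itself an Iterator over the same entry type, several such stages could be stacked, which is the functional flavor mentioned above.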

        - <disclaimer>I can't comment about running HBase in production
        environments, but I tend to hear a lot of "war stories" about it. I
        also don't know how much of this comes from running old versions of
        HBase that don't have known issues patched.</disclaimer>

        In my experience, Accumulo just works. It doesn't require much
        day-to-day interaction, processes stay running, and if some node goes
        haywire, I have absolutely no qualms about `kill -9`'ing it and
        knowing that everything will come back fine.

        My $0.02.

        - Josh

        On 6/23/14, 2:49 PM, Josh Elser wrote:

            Another way you could word this is that Accumulo has a very
            "mature" security implementation, whereas, like you pointed out,
            HBase has only recently added this in 0.98.

            The note about visibility being in the Key, as opposed to the
            Value, also has an impact when writing Iterators. Because the
            visibility is a "first class citizen" instead of an afterthought,
            having it uniquely define some pair makes aggregations much
            easier to think about, IMO. This is especially relevant when
            doing this server-side with an Accumulo Iterator.

            There are also other differences between the implementations'
            visibility filtering, the most common being the support of a
            "NOT" operator in HBase, whereas Accumulo explicitly chose not to
            implement this. By allowing "NOT" into the syntax, it becomes
            much more likely that data is inadvertently leaked. Marking data
            correctly is more difficult than it seems, and introducing the
            ability to negate certain branches makes it even more difficult.
            Auditors are scary :)
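
To illustrate what a visibility syntax without "NOT" looks like in practice, here is a minimal sketch of an expression evaluator in plain Java. It is not Accumulo's implementation: `ToyVisibility`, `canSee`, and the small grammar (labels, `&`, `|`, parentheses, and a requirement to parenthesize when mixing `&` and `|`) are assumptions that only loosely follow Accumulo's visibility expressions. A leading `!` is simply a syntax error here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy evaluator only: this is NOT Accumulo's visibility implementation.
class ToyVisibility {
    private final String expr;
    private final Set<String> auths;
    private int pos = 0;

    private ToyVisibility(String expr, Set<String> auths) {
        this.expr = expr;
        this.auths = auths;
    }

    // Returns true if a reader holding `auths` may see data labeled `expr`.
    static boolean canSee(String expr, Set<String> auths) {
        ToyVisibility parser = new ToyVisibility(expr, auths);
        boolean result = parser.parseExpression();
        if (parser.pos != expr.length()) {
            throw new IllegalArgumentException(
                "unexpected '" + expr.charAt(parser.pos) + "' at " + parser.pos);
        }
        return result;
    }

    // expression := term ('&' term)* | term ('|' term)*
    private boolean parseExpression() {
        boolean result = parseTerm();
        char op = 0; // operator used at this nesting level, 0 if none yet
        while (pos < expr.length() && (expr.charAt(pos) == '&' || expr.charAt(pos) == '|')) {
            char c = expr.charAt(pos++);
            if (op != 0 && op != c) {
                throw new IllegalArgumentException("mixing & and | needs parentheses");
            }
            op = c;
            boolean rhs = parseTerm(); // always parsed, so syntax errors are never skipped
            result = (op == '&') ? (result && rhs) : (result || rhs);
        }
        return result;
    }

    // term := label | '(' expression ')'
    private boolean parseTerm() {
        if (pos < expr.length() && expr.charAt(pos) == '(') {
            pos++;
            boolean inner = parseExpression();
            if (pos >= expr.length() || expr.charAt(pos) != ')') {
                throw new IllegalArgumentException("missing ')' at " + pos);
            }
            pos++;
            return inner;
        }
        int start = pos;
        while (pos < expr.length() && isLabelChar(expr.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            // Anything that is not a label or '(' lands here, including '!'.
            throw new IllegalArgumentException("expected a label at " + pos);
        }
        return auths.contains(expr.substring(start, pos));
    }

    private static boolean isLabelChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '-';
    }

    public static void main(String[] args) {
        Set<String> auths = new HashSet<String>(Arrays.asList("admin", "audit"));
        System.out.println(canSee("admin&(audit|ops)", auths));
        System.out.println(canSee("admin&ops", auths));
    }
}
```

One property worth noticing: with only AND and OR, evaluation is monotone, so removing an authorization from a reader can never reveal more data. A negation operator breaks that property, because data marked "not X" becomes visible to anyone who simply lacks X.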

            - Josh

            On 6/23/14, 2:34 PM, Aaron wrote:

                I'm not sure of all the differences, but wrt HBase Cell Level
                security (CLS): while similar, it is not 100% the same. If I
                understand how the HBase CLS works, it's an extension to the
                ACL system, and that ACL is "applied" to a cell. In
                Accumulo's case, it is part of the key. So the ramification
                is that in Accumulo, you can have:

                RowID, CF, CQ, VIS1, TS --> Value1
                RowID, CF, CQ, VIS2, TS --> Value2

                If everything is the same, including the timestamp, the
                visibility can actually determine which value to return. So,
                a more concrete example would be:

                XXX, METADATA, NAME, everyone, 100 --> Bruce Wayne
                XXX, METADATA, NAME, alfred-only, 100 --> Batman

                Where Alfred could/would see both "values", but everyone else
                would only see "Bruce Wayne".
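
The example above can be modeled in a few lines of plain Java to show why two entries that differ only in visibility can coexist. This is a sketch under assumptions, not the Accumulo API: `VisibilityInKeyDemo`, `ToyKey`, and `scan` are hypothetical names, and a visibility is treated as a single label rather than a full boolean expression. The sort order used (row, family, qualifier, visibility, then newest timestamp first) mirrors how Accumulo orders keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model only: this is NOT the Accumulo API.
class VisibilityInKeyDemo {

    // Hypothetical key type: the visibility label is one of the sort fields,
    // so it helps decide whether two keys are "the same" entry.
    static final class ToyKey implements Comparable<ToyKey> {
        final String row;
        final String family;
        final String qualifier;
        final String visibility;
        final long timestamp;

        ToyKey(String row, String family, String qualifier, String visibility, long timestamp) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.visibility = visibility;
            this.timestamp = timestamp;
        }

        @Override
        public int compareTo(ToyKey o) {
            int c = row.compareTo(o.row);
            if (c == 0) c = family.compareTo(o.family);
            if (c == 0) c = qualifier.compareTo(o.qualifier);
            if (c == 0) c = visibility.compareTo(o.visibility);
            if (c == 0) c = Long.compare(o.timestamp, timestamp); // newest first
            return c;
        }
    }

    // Builds the two-entry table from the example above.
    static TreeMap<ToyKey, String> exampleTable() {
        TreeMap<ToyKey, String> table = new TreeMap<>();
        table.put(new ToyKey("XXX", "METADATA", "NAME", "everyone", 100L), "Bruce Wayne");
        table.put(new ToyKey("XXX", "METADATA", "NAME", "alfred-only", 100L), "Batman");
        return table;
    }

    // Returns the values whose visibility label is among the reader's authorizations.
    static List<String> scan(TreeMap<ToyKey, String> table, Set<String> auths) {
        List<String> visible = new ArrayList<>();
        for (Map.Entry<ToyKey, String> e : table.entrySet()) {
            if (auths.contains(e.getKey().visibility)) {
                visible.add(e.getValue());
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        TreeMap<ToyKey, String> table = exampleTable();
        Set<String> everyone = new HashSet<String>(Arrays.asList("everyone"));
        Set<String> alfred = new HashSet<String>(Arrays.asList("everyone", "alfred-only"));
        System.out.println(scan(table, everyone));
        System.out.println(scan(table, alfred));
    }
}
```

Had the visibility been attached to the value instead of the key, the second `put` would have overwritten the first, since row, family, qualifier, and timestamp are identical.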

                Hope that helps.

                Cheers,
                Aaron

                PS: this is my understanding of how HBase CLS works, based on
                what I have read/interpreted.



                On Mon, Jun 23, 2014 at 1:55 PM, Jianshi Huang wrote:

                     Er... basically I need to explain to my manager why we
                     should choose Accumulo instead of HBase.

                     So what are the pros and cons of Accumulo vs. HBase?
                     (btw HBase 0.98 also got cell-level security, modeled
                     after Accumulo)

                     --
                     Jianshi Huang

                     LinkedIn: jianshi
                     Twitter: @jshuang
                     Github & Blog: http://huangjs.github.com/





--
Donald Miner
Chief Technology Officer
ClearEdge IT Solutions, LLC
Cell: 443 799 7807
www.clearedgeit.com <http://www.clearedgeit.com>
