This needs to be documented on the official blog.
On Mon, Jun 23, 2014 at 3:31 PM, Josh Elser <[email protected]> wrote:

> Sent too quickly..
>
> - The BatchScanner is communicating to tservers in *parallel* which is
> where this really shows its strength.
>
> - A "default" locality group. You don't have to define the locality groups
> for a table at creation time in Accumulo (or have to modify the table if
> you want to insert a new column family). Because of this, you have a lot
> more flexibility in how you structure your tables while also being able to
> take advantage of the efficient filtering you get from the locality groups
> you have configured. Adding a new locality group does still require a
> compaction to re-write the data in separate files.
>
>
> On 6/23/14, 3:24 PM, Josh Elser wrote:
>
>> A few observations I can make from watching both communities (although
>> only really participating in Accumulo's).
>>
>> - HBase undeniably has a much larger public community of both users and
>> developers; however, we are seeing broader adoption across different
>> vertical markets with Accumulo. IMO, we have a rather responsive
>> community built up here. Lots of smart people are working here who are
>> available and happy to help with problems.
>>
>> - BatchScanner: The BatchScanner is a query construct which will
>> automatically fetch data from a collection of Ranges on a table and
>> return the results in the form of a Java Iterator. This makes for a very
>> natural way to read lots of data from Accumulo, automatically performing
>> some reduction in the data server-side (using Accumulo Iterators), and
>> getting a wonderfully simple Iterator<Entry<Key,Value>> in your client
>> code. It really helps to encourage a state-less and functional-like
>> style to your code.
>>
>> I really like it, and, when combined with the ability to push a bunch of
>> work server-side, it has often kept me from having to write MapReduce
>> jobs (which is always a win to me).
>>
>> - Accumulo Iterators are a common thing you might hear as a difference.
>> AFAICT, they're a bit more powerful than what you can do with HBase
>> filters because you are presented with a stream of Key-Value pairs
>> inside of the TServer. Again, it's a bit functional programming
>> inspired. You have the ability to combine, consume, seek within the
>> stream and do what you please (more context would be helpful in giving
>> specific examples)
>>
>> That being said, Iterators do come with a learning curve, but that's to
>> be expected with the amount of flexibility they provide. It's just like
>> anything else :)
>>
>> - <disclaimer>I can't comment about running HBase in production
>> environments, but I tend to hear a lot of "war stories" about it. I also
>> don't know how much of this is from running old versions of HBase which
>> don't have known issues patched. </disclaimer>
>>
>> In my experience, Accumulo just works. It doesn't require much
>> day-to-day interaction, processes stay running and if some node goes
>> haywire, I have absolutely no qualms about `kill -9`'ing it and
>> knowing that everything will come back fine.
>>
>> My $0.02.
>>
>> - Josh
>>
>> On 6/23/14, 2:49 PM, Josh Elser wrote:
>>
>>> Another way you could word this is that Accumulo has a very "mature"
>>> security implementation, whereas, like you pointed out, HBase has only
>>> recently added this in 0.98.
>>>
>>> The note about visibility being in the Key as opposed to the Value
>>> also has an impact when writing Iterators. Because the visibility is a
>>> "first class citizen" instead of an afterthought, having it uniquely
>>> define some pair makes aggregations much easier to think about, IMO.
>>> This is especially prevalent when doing this server-side with an
>>> Accumulo Iterator.
>>>
>>> There are also other differences between the implementations' visibility
>>> filtering, the most common being the support of a "NOT" operator in
>>> HBase whereas Accumulo explicitly chose not to implement this. By
>>> allowing "NOT" into the syntax, it becomes much more possible that data
>>> is inadvertently leaked. Marking data correctly is more difficult than
>>> it seems and introducing the ability to negate certain branches makes it
>>> even more difficult. Auditors are scary :)
>>>
>>> - Josh
>>>
>>> On 6/23/14, 2:34 PM, Aaron wrote:
>>>
>>>> I'm not sure of all the differences, but, wrt HBase Cell Level security
>>>> (CLS)..while similar..not 100% the same. If I understand how the HBase
>>>> CLS works, it's an extension to the ACL system. And that ACL is "applied"
>>>> to a cell. In Accumulo's case, it is part of the key. So the ramification
>>>> is that in Accumulo, you can have:
>>>>
>>>> RowID, CF, CQ, VIS1, TS --> Value1
>>>> RowID, CF, CQ, VIS2, TS --> Value2
>>>>
>>>> If everything is the same, including the timestamp, the visibility can
>>>> actually determine which value to return. So, a more concrete example
>>>> would be:
>>>>
>>>> XXX, METADATA, NAME, everyone, 100 --> Bruce Wayne
>>>> XXX, METADATA, NAME, alfred-only, 100 --> Batman
>>>>
>>>> Where Alfred could/would see both "values"...but, everyone else would
>>>> only see "Bruce"
>>>>
>>>> Hope that helps.
>>>>
>>>> Cheers,
>>>> Aaron
>>>>
>>>> PS: this is my understanding of how HBase CLS works...based on what I
>>>> have read/interpreted.
>>>>
>>>>
>>>>
>>>> On Mon, Jun 23, 2014 at 1:55 PM, Jianshi Huang <[email protected]> wrote:
>>>>
>>>>     Er... basically I need to explain to my manager why we are choosing
>>>>     Accumulo instead of HBase.
>>>>
>>>>     So what are the pros and cons of Accumulo vs. HBase?
>>>>     (btw HBase 0.98 also got cell-level security, modeled after Accumulo)
>>>>
>>>>     --
>>>>     Jianshi Huang
>>>>
>>>>     LinkedIn: jianshi
>>>>     Twitter: @jshuang
>>>>     Github & Blog: http://huangjs.github.com/
>>>>
>>>>

--
Donald Miner
Chief Technology Officer
ClearEdge IT Solutions, LLC
Cell: 443 799 7807
www.clearedgeit.com
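
For readers skimming the thread, the BatchScanner point (many Ranges fetched from tservers in parallel, handed back as one iterator) can be sketched with a toy model. This is plain Python, not the Accumulo client API: `servers`, `scan_server`, and `batch_scan` are hypothetical names invented for illustration, and the in-memory dicts merely stand in for tablet servers.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical stand-ins for tablet servers: each holds a slice of the rows.
servers = [
    {"a1": "v1", "a2": "v2"},
    {"m1": "v3", "m2": "v4"},
    {"z1": "v5"},
]

def scan_server(data, ranges):
    """Return the entries on one server whose row falls in any [start, end) range."""
    return [(row, val) for row, val in sorted(data.items())
            if any(start <= row < end for start, end in ranges)]

def batch_scan(servers, ranges, threads=3):
    """Query every server concurrently and yield entries as each server finishes."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(scan_server, s, ranges) for s in servers]
        for fut in as_completed(futures):
            yield from fut.result()

# The client just consumes one iterator; arrival order across servers is not
# guaranteed, so sort here only to get a stable view of the results.
results = sorted(batch_scan(servers, [("a", "b"), ("m2", "n")]))
```

The caller sees a single stream of entries and never manages the per-server requests itself, which is the convenience described in the quoted message.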
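
The remark that server-side Iterators see a sorted stream of Key-Value pairs, and that visibility being part of the key makes aggregation easy to reason about, can be illustrated roughly as follows. This is a toy sketch in plain Python rather than an Accumulo Iterator (those are written in Java against Accumulo's own interfaces); `summing_iterator` and the tuple layout `(row, cf, cq, vis, ts)` are assumptions made for the example.

```python
from itertools import groupby

def summing_iterator(sorted_entries):
    """Lazily yield ((row, cf, cq, vis), total) for each run of equal keys, ignoring timestamps."""
    for key, group in groupby(sorted_entries, key=lambda entry: entry[0][:4]):
        yield key, sum(value for _, value in group)

# Entries are ((row, cf, cq, vis, ts), value); sorting mimics the ordered stream.
entries = sorted([
    (("r1", "stats", "count", "public", 3), 5),
    (("r1", "stats", "count", "public", 2), 7),
    (("r1", "stats", "count", "secret", 1), 100),
    (("r2", "stats", "count", "public", 1), 1),
])

totals = list(summing_iterator(entries))
```

Because the visibility label is part of the grouping key, the `public` and `secret` counts for row `r1` are never merged, so a reader without the `secret` label cannot learn anything from the aggregate.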
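
Aaron's Bruce Wayne / Batman example, where two entries differ only in visibility, can also be written out as a small model. Again this is plain Python and not the Accumulo API: `Key`, `table`, and `scan` are hypothetical, and the visibility check is simple set membership, whereas real Accumulo visibilities are boolean expressions over labels.

```python
from typing import NamedTuple

class Key(NamedTuple):
    row: str
    cf: str
    cq: str
    vis: str
    ts: int

# Both entries survive because the visibility is part of the key,
# even though row, column, and timestamp are identical.
table = {
    Key("XXX", "METADATA", "NAME", "everyone", 100): "Bruce Wayne",
    Key("XXX", "METADATA", "NAME", "alfred-only", 100): "Batman",
}

def scan(table, auths):
    """Return (key, value) pairs in sorted key order whose label is in auths."""
    return [(k, v) for k, v in sorted(table.items()) if k.vis in auths]

alfred = [v for _, v in scan(table, {"everyone", "alfred-only"})]
public = [v for _, v in scan(table, {"everyone"})]
```

Here `alfred` contains both values while `public` contains only "Bruce Wayne", matching the behaviour described in the quoted message.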
