Have you started on the blog post? If not, I can spare some time to write it up. The ingestion and scan speed relative to HBase is an eye-opener for me, and I would be interested in digging deeper into it myself. Does anyone have information about ingestion/scan speed with Cassandra?

On Tue, Jun 24, 2014 at 1:58 AM, Josh Elser <[email protected]> wrote:

> Noted: I'll add it to the top of my "to blog" queue. If anyone else
> wants to do a write-up, I'm happy to help.
>
> On 6/23/14, 4:23 PM, Donald Miner wrote:
>
>> This needs to be documented on the official blog.
>>
>> On Mon, Jun 23, 2014 at 3:31 PM, Josh Elser <[email protected]> wrote:
>>
>>> Sent too quickly..
>>>
>>> - The BatchScanner is communicating to tservers in *parallel*,
>>>   which is where this really shows its strength.
>>>
>>> - A "default" locality group. You don't have to define the
>>>   locality groups for a table at creation time in Accumulo (or
>>>   have to modify the table if you want to insert a new column
>>>   family). Because of this, you have a lot more flexibility in
>>>   how you structure your tables while also being able to take
>>>   advantage of the efficient filtering you get from the locality
>>>   groups you have configured. Adding a new locality group does
>>>   still require a compaction to re-write the data in separate
>>>   files.
>>>
>>> On 6/23/14, 3:24 PM, Josh Elser wrote:
>>>
>>>> A few observations I can make from watching both communities
>>>> (although only really participating in Accumulo's).
>>>>
>>>> - HBase undeniably has a much larger public community of both
>>>>   users and developers; however, we are seeing broader adoption
>>>>   across different vertical markets with Accumulo. IMO, we have
>>>>   a rather responsive community built up here. Lots of smart
>>>>   people are working here who are available and happy to help
>>>>   with problems.
>>>>
>>>> - BatchScanner: The BatchScanner is a query construct which will
>>>>   automatically fetch data from a collection of Ranges on a
>>>>   table and return the results in the form of a Java Iterator.
>>>>   This makes for a very natural way to read lots of data from
>>>>   Accumulo, automatically performing some reduction in the data
>>>>   server-side (using Accumulo Iterators), and getting a
>>>>   wonderfully simple Iterator<Entry<Key,Value>> in your client
>>>>   code. It really helps to encourage a state-less and
>>>>   functional-like style in your code.
>>>>
>>>>   I really like it, and, when combined with the ability to push
>>>>   a bunch of work server-side, it has often kept me from having
>>>>   to write MapReduce jobs (which is always a win to me).
>>>>
>>>> - Accumulo Iterators are a common thing you might hear as a
>>>>   difference. AFAICT, they're a bit more powerful than what you
>>>>   can do with HBase filters because you are presented with a
>>>>   stream of Key-Value pairs inside of the TServer. Again, it's a
>>>>   bit functional-programming inspired. You have the ability to
>>>>   combine, consume, and seek within the stream and do what you
>>>>   please (more context would be helpful in giving specific
>>>>   examples).
>>>>
>>>>   That being said, Iterators do come with a learning curve, but
>>>>   that's to be expected with the amount of flexibility they
>>>>   provide. It's just like anything else :)
>>>>
>>>> - <disclaimer>I can't comment about running HBase in production
>>>>   environments, but I tend to hear a lot of "war stories" about
>>>>   it. I also don't know how much of this is from running old
>>>>   versions of HBase which don't have known issues patched.
>>>>   </disclaimer>
>>>>
>>>>   In my experience, Accumulo just works. It doesn't require much
>>>>   day-to-day interaction, processes stay running, and if some
>>>>   node goes haywire, I have absolutely no qualms about
>>>>   `kill -9`'ing it and knowing that everything will come back
>>>>   fine.
>>>>
>>>> My $0.02.
>>>>
>>>> - Josh
>>>>
>>>> On 6/23/14, 2:49 PM, Josh Elser wrote:
>>>>
>>>>> Another way you could word this is that Accumulo has a very
>>>>> "mature" security implementation, whereas, like you pointed
>>>>> out, HBase has only recently added this in 0.98.
>>>>>
>>>>> The note about visibility being in the Key as opposed to the
>>>>> Value also has an impact when writing Iterators. Because the
>>>>> visibility is a "first class citizen" instead of an
>>>>> afterthought, having it uniquely define some pair makes
>>>>> aggregations much easier to think about, IMO. This is
>>>>> especially prevalent when doing this server-side with an
>>>>> Accumulo Iterator.
>>>>>
>>>>> There are also other differences between the implementations'
>>>>> visibility filtering, the most common being the support of a
>>>>> "NOT" operator in HBase, whereas Accumulo explicitly chose not
>>>>> to implement this. By allowing "NOT" into the syntax, it
>>>>> becomes much more possible that data is inadvertently leaked.
>>>>> Marking data correctly is more difficult than it seems, and
>>>>> introducing the ability to negate certain branches makes it
>>>>> even more difficult. Auditors are scary :)
>>>>>
>>>>> - Josh
>>>>>
>>>>> On 6/23/14, 2:34 PM, Aaron wrote:
>>>>>
>>>>>> I'm not sure of all the differences, but wrt HBase Cell Level
>>>>>> Security (CLS): while similar, it is not 100% the same. If I
>>>>>> understand how HBase CLS works, it's an extension to the ACL
>>>>>> system, and that ACL is "applied" to a cell. In Accumulo's
>>>>>> case, it is part of the key. So the ramification is that in
>>>>>> Accumulo, you can have:
>>>>>>
>>>>>>     RowID, CF, CQ, VIS1, TS --> Value1
>>>>>>     RowID, CF, CQ, VIS2, TS --> Value2
>>>>>>
>>>>>> If everything is the same, including the timestamp, the
>>>>>> visibility can actually determine which value to return. So,
>>>>>> a more concrete example would be:
>>>>>>
>>>>>>     XXX, METADATA, NAME, everyone, 100 --> Bruce Wayne
>>>>>>     XXX, METADATA, NAME, alfred-only, 100 --> Batman
>>>>>>
>>>>>> Where Alfred could/would see both "values"...but everyone
>>>>>> else would only see "Bruce".
>>>>>>
>>>>>> Hope that helps.
>>>>>>
>>>>>> Cheers,
>>>>>> Aaron
>>>>>>
>>>>>> PS: this is my understanding of how HBase CLS works...based
>>>>>> on what I have read/interpreted.
>>>>>>
>>>>>> On Mon, Jun 23, 2014 at 1:55 PM, Jianshi Huang
>>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>> Er... basically I need to explain to my manager why to
>>>>>>> choose Accumulo instead of HBase.
>>>>>>>
>>>>>>> So what are the pros and cons of Accumulo vs. HBase? (btw
>>>>>>> HBase 0.98 also got cell-level security, modeled after
>>>>>>> Accumulo)
>>>>>>>
>>>>>>> --
>>>>>>> Jianshi Huang
>>>>>>>
>>>>>>> LinkedIn: jianshi
>>>>>>> Twitter: @jshuang
>>>>>>> Github & Blog: http://huangjs.github.com/
>>
>> --
>> Donald Miner
>> Chief Technology Officer
>> ClearEdge IT Solutions, LLC
>> Cell: 443 799 7807
>> www.clearedgeit.com <http://www.clearedgeit.com>
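
For anyone skimming the thread later, Aaron's point about the visibility being part of the key can be sketched in a few lines. This is only a toy model in plain Python, not the Accumulo API: the `table` dict, the `scan` helper, and treating "everyone" as a label the reader must hold are all assumptions made for illustration (real visibilities are boolean expressions evaluated against a scanner's authorizations).

```python
# Toy model only: this is NOT the Accumulo client API.
# A key is a plain tuple and a "visibility" is a single label,
# whereas real Accumulo visibilities are boolean expressions.
from typing import Dict, List, Set, Tuple

# (row, column family, column qualifier, visibility, timestamp)
Key = Tuple[str, str, str, str, int]

# Two entries that differ ONLY in the visibility part of the key,
# so both can coexist, as in Aaron's Bruce Wayne / Batman example.
table: Dict[Key, str] = {
    ("XXX", "METADATA", "NAME", "everyone", 100): "Bruce Wayne",
    ("XXX", "METADATA", "NAME", "alfred-only", 100): "Batman",
}


def scan(entries: Dict[Key, str], authorizations: Set[str]) -> List[str]:
    """Return the values whose visibility label the reader is authorized for.

    Hypothetical helper: stands in for the server-side visibility filter.
    """
    return [
        value
        for key, value in sorted(entries.items())
        if key[3] in authorizations
    ]


# Alfred holds both labels; a general reader holds only "everyone".
alfred = scan(table, {"everyone", "alfred-only"})
public = scan(table, {"everyone"})
```

Here `alfred` ends up holding both values while `public` holds only "Bruce Wayne", because the two entries are distinct keys rather than one cell with an ACL attached.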
