Hadoop Summit Europe took place last week, and we had an HBase Meetup there.
First Enis talked about HBase Architecture, and then Lars talked about some
interesting HBase Use Cases. Finally, we opened it up to the public and had
a frank discussion on the uptake of HBase vs. other NoSQL DBs such as Mongo
and Cassandra. This wasn't about bashing other DBs, just understanding how the
spectrum of NoSQL DBs was leading to an evaluation/production use of HBase. It
was also partly based on the report from InfoWorld:
http://podcasts.infoworld.com/d/big-data/big-data-showdown-cassandra-vs-hbase-239592

Anyways, these were the major points we discussed (Lars and Jon Hsieh from
Cloudera and Enis and Devaraj from Hortonworks contributed, with input from
about 12 other users from the community):

Documentation - Cassandra has a better web page than
HBase does. Even though HBase's documentation is complete, finding the
documentation is a bit hard.

Installation - HBase is hard to install for the newbie. I think there has been
some effort to make this more friendly by wrapping the master in
RegionServers.

Vendor Pushes - Cassandra has DataStax,
Pentaho pushes Mongo, Cloudera pushes Impala, MapR is pushing their proprietary
FS, and IBM their own DBs. Even though HBase is part of the Hadoop ecosystem,
there is no one vendor exclusively pushing HBase for uptake by the community,
or even by the Hadoop community.

Messaging - HBase has been on the receiving end of a number of negative
marketing claims by various vendors over things that were
possibly true in the past. For example, Lars mentioned that a certain vendor
was incorrectly stating that HBase has an issue with SPOF, even though this
hasn't been true for quite some time. Similarly, Jon mentioned that a certain
slide where he was talking about the complexity of HBase was taken out of
context and presented as a negative about HBase.

SQL based solutions - Even though there
are a number of efforts to showcase that HBase has some SQL based interfaces
available, like Phoenix, Impala and Hive (albeit with some issues), there is
still a misconception that HBase is accessed purely via Java.

Security in HBase - Even though 0.98 has security, it needs to be road tested.

Some recommendations:

Push
messaging out and make it more clear - Apache blogs, Hortonworks blogs,
Cloudera blogs.

Documentation - David Worms, who is a consultant out of France, has
volunteered to help make the website better. You may want to reach out to
him - fr.linkedin.com/pub/david-worms/7/626/630

Cost Calculator - Lars made a great point about having a cost calculator to
estimate the cost of various operations. This would make it more likely for
bigger organizations to choose HBase, since they could see how those
operations affect the bottom line.

Update from Andrew -
"HBase has had strong security since 0.94 if not 0.92 - secure RPC and ACLs at
the table and column family level. We had these features before Cassandra and
even Accumulo. Why stuff like that gets lost is we are a bunch of engineers not
marketers. The trouble with messaging is someone has to write it. Since it's a
joyless job for most engineers, someone must be paid to do it."

Thanks
Subash