Most of your points are dead-on.

> Cassandra is no less complex than HBase. All of this complexity is
> "hidden" in the sense that with Hadoop/HBase the layering is obvious --
> HDFS, HBase, etc. -- but the Cassandra internals are no less layered.
>
> Operationally, however, HBase is more complex. Admins have to configure
> and manage ZooKeeper, HDFS, and HBase. Could this be improved?
>

I strongly disagree with the premise[1]. Having personally been involved in the Digg Cassandra rollout, having been in part-time weekly contact with the Digg Cassandra administrator until a couple of months ago, and having very close ties to the SimpleGeo Cassandra admin, I know it is a fickle beast. Having also spent a good amount of time at StumbleUpon and Mozilla (and now Riot Games), I also see first-hand that HBase is far more stable and -- dare I say it? -- operationally simpler.

So okay, HBase is "harder to set up" if following a step-by-step guide on a wiki is "hard,"[2] but it's FAR easier to administer. Cassandra is rife with cascading cluster failure scenarios. I would not recommend running Cassandra in a highly-available, high-volume data scenario, but I wouldn't hesitate to do so with HBase.

I do not know if this is a guaranteed (provable due to architecture) result, or just the result of the Cassandra community being... how shall I say... hostile to administrators. But then, to me it doesn't matter. Results do.

--
Tim Ellis
Data Architect, Riot Games

[1] That said, the other part of your statement is spot-on, too. It's surely possible to improve the HBase architecture or simplify it.

[2] I went from having never set up HBase nor ever used Chef to having functional Chef recipes that installed a functional HBase/HDFS cluster in about 2 weeks. From my POV, the biggest stumbling point was that HDFS by default stores critical data in the underlying filesystem's /tmp directory, which is, for lack of a better word, insane. If I had to suggest how to simplify "HBase installation," I'd ask for sane HDFS config files that are extremely common and difficult to ignore.
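
To make the /tmp complaint in [2] concrete, here is a rough sketch of a check that could run before starting a cluster. It assumes the stock Hadoop behavior where hadoop.tmp.dir defaults to /tmp/hadoop-${user.name} and the NameNode/DataNode directories derive from it when unset. The function names (parse_site_xml, under_tmp, audit) are made up for illustration, and which property names apply depends on your Hadoop version, so treat the key list as an assumption to verify against your own defaults.

```python
import xml.etree.ElementTree as ET

# Directory properties worth checking. Both the older names and the newer
# ones are listed because which pair applies depends on the Hadoop version.
DFS_DIR_KEYS = (
    "dfs.name.dir",
    "dfs.data.dir",
    "dfs.namenode.name.dir",
    "dfs.datanode.data.dir",
)

# Assumed stock default for hadoop.tmp.dir when no site file overrides it.
DEFAULT_TMP_DIR = "/tmp/hadoop-${user.name}"


def parse_site_xml(text):
    """Return {name: value} for every <property> in a Hadoop *-site.xml string."""
    root = ET.fromstring(text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def under_tmp(path):
    """True if a single path (optionally a file:// URI) is /tmp or below it."""
    path = path.strip()
    if path.startswith("file://"):
        path = path[len("file://"):]
    return path == "/tmp" or path.startswith("/tmp/")


def audit(props):
    """List the properties whose effective location is under /tmp.

    hadoop.tmp.dir is checked with its assumed default; the dfs.* keys are
    only checked when explicitly set, since unset ones follow hadoop.tmp.dir.
    """
    findings = []
    if under_tmp(props.get("hadoop.tmp.dir", DEFAULT_TMP_DIR)):
        findings.append("hadoop.tmp.dir")
    for key in DFS_DIR_KEYS:
        value = props.get(key)
        # These properties may hold a comma-separated list of directories.
        if value and any(under_tmp(p) for p in value.split(",")):
            findings.append(key)
    return findings
```

One way to use it: read core-site.xml and hdfs-site.xml, merge the two dicts from parse_site_xml, and refuse to format or start the NameNode if audit returns anything. A Chef recipe could run the same check as a guard.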
