Hi Andrew,

Would you mind if I paraphrase your responses on StackOverflow?

-chris

On Aug 30, 2011, at 2:47 AM, Andrew Purtell wrote:

> Hi Chris,
>
> Appreciate your answer on the post.
>
> Personally speaking however the endless Cassandra vs. HBase discussion is
> tiresome and rarely do blog posts or emails in this regard shed any light.
> Often, Cassandra proponents mis-state their case out of ignorance of HBase or
> due to commercial or personal agendas. It is difficult to find clear eyed
> analysis among the partisans. I'm not sure it will make any difference
> posting a rebuttal to some random thing jbellis says. Better to focus on
> improving HBase than play whack a mole.
>
> Regarding some of the specific points in that post:
>
> HBase is proven in production deployments larger than the largest publicly
> reported Cassandra cluster, ~1K versus 400 or 700 or somesuch. But basically
> this is the same order of magnitude, with HBase having a slight edge. I don't
> see a meaningful difference here. Stating otherwise is false.
>
> HBase supports replication between clusters (i.e. data centers). I believe,
> but admit I'm not super familiar with the Cassandra option here, that the
> main difference is HBase provides simple mechanism and the user must build a
> replication architecture useful for them; while Cassandra attempts to hide
> some of that complexity. I do not know if they succeed there, but large scale
> cross data center replication is rarely one size fits all so I doubt it.
>
> Cassandra does not have strong consistency in the sense that HBase provides.
> It can provide strong consistency, but at the cost of failing any read if
> there is insufficient quorum. HBase/HDFS does not have that limitation. On
> the other hand, HBase has its own and different scenarios where data may not
> be immediately available. The differences between the systems are nuanced and
> which to use depends on the use case requirements.
>
> Cassandra's RandomPartitioner / hash based partitioning means efficient
> MapReduce or table scanning is not possible, whereas HBase's distributed
> ordered tree is naturally efficient for such use cases, I believe explaining
> why Hadoop users often prefer it. This may or may not be a problem for any
> given use case. Using an ordered partitioner with Cassandra used to require
> frequent manual rebalancing to avoid blowing up nodes. I don't know if more
> recent versions still have this mis-feature.
>
> Cassandra is no less complex than HBase. All of this complexity is "hidden"
> in the sense that with Hadoop/HBase the layering is obvious -- HDFS, HBase,
> etc. -- but the Cassandra internals are no less layered. An impartial
> analysis of implementation and algorithms will reveal that Cassandra's theory
> of operation in its full detail is substantially more complex. Compare the
> BigTable and Dynamo papers and this is clear. There are actually more
> opportunities for something to go wrong with Cassandra.
>
> While we are looking at codebases, it should be noted that HBase has
> substantially more unit tests.
>
> With Cassandra, all RPC is via Thrift with various wrappers, so actually all
> Cassandra clients are second class in the sense that jbellis means when he
> states "Non-Java clients are not second-class citizens".
>
> The master-slave versus peer-to-peer argument is larger than Cassandra vs.
> HBase, and not nearly as one sided as claimed. The famous (infamous?) global
> failure of Amazon's S3 in 2008, a fully peer-to-peer system, due to a single
> flipped bit in a gossip message demonstrates how in peer to peer systems
> every node can be a single point of failure. There is no obvious winner,
> instead, a series of trade offs. Claiming otherwise is intellectually
> dishonest. Master-slave architectures seem easier to operate and reason about
> in my experience. Of course, I'm partial there.
>
> I have just scratched the surface.
>
> Best regards,
>
> - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via
> Tom White)
>
>
>> ________________________________
>> From: Chris Tarnas <[email protected]>
>> To: [email protected]
>> Sent: Tuesday, August 30, 2011 2:02 PM
>> Subject: HBase and Cassandra on StackOverflow
>>
>> Someone with better knowledge than me might be interested in helping answer
>> this question over at StackOverflow:
>>
>> http://stackoverflow.com/questions/7237271/large-scale-data-processing-hbase-cassandra
>>
>> -chris
>>
