Re: HBase and Cassandra on StackOverflow

Chris Tarnas Tue, 30 Aug 2011 10:19:55 -0700

Hi Andrew,

Would you mind if I paraphrase your responses on StackOverflow?


-chris

On Aug 30, 2011, at 2:47 AM, Andrew Purtell wrote:

> Hi Chris,
> 
> Appreciate your answer on the post.
> 
> Personally speaking, however, the endless Cassandra vs. HBase discussion is 
> tiresome, and blog posts or emails on the subject rarely shed any light. 
> Often, Cassandra proponents misstate their case out of ignorance of HBase or 
> due to commercial or personal agendas. It is difficult to find clear-eyed 
> analysis among the partisans. I'm not sure that posting a rebuttal to some 
> random thing jbellis says will make any difference. Better to focus on 
> improving HBase than to play whack-a-mole.
> 
> 
> Regarding some of the specific points in that post:
> 
> HBase is proven in production deployments larger than the largest publicly 
> reported Cassandra cluster, ~1K nodes versus 400 or 700 or some such. But 
> basically this is the same order of magnitude, with HBase having a slight 
> edge. I don't see a meaningful difference here. Stating otherwise is false.
> 
> HBase supports replication between clusters (i.e. data centers). I believe, 
> but admit I'm not super familiar with the Cassandra option here, that the 
> main difference is that HBase provides a simple mechanism and the user must 
> build a replication architecture that suits them, while Cassandra attempts 
> to hide some of that complexity. I do not know if they succeed there, but 
> large-scale cross-data-center replication is rarely one-size-fits-all, so I 
> doubt it.
> 
> Cassandra does not have strong consistency in the sense that HBase provides. 
> It can provide strong consistency, but at the cost of failing any read if 
> there is insufficient quorum. HBase/HDFS does not have that limitation. On 
> the other hand, HBase has its own and different scenarios where data may not 
> be immediately available. The differences between the systems are nuanced and 
> which to use depends on the use case requirements.
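
[Note: the quorum trade-off described above can be sketched with a toy model. This is only an illustration under simplifying assumptions, not Cassandra's or HBase's actual API; `Replica`, `QuorumError`, and `quorum_read` are made-up names, and a real system would contact replicas over the network rather than inspect an `alive` flag.]

```python
from dataclasses import dataclass


class QuorumError(Exception):
    """Raised when too few replicas answer to satisfy the requested quorum."""


@dataclass
class Replica:
    value: str
    timestamp: int
    alive: bool = True  # hypothetical stand-in for "reachable over the network"


def quorum_read(replicas, required):
    """Return the newest value among live replicas, or fail without a quorum."""
    responses = [r for r in replicas if r.alive]
    if len(responses) < required:
        raise QuorumError(
            f"need {required} replicas, only {len(responses)} answered"
        )
    # The value with the highest timestamp wins.
    return max(responses, key=lambda r: r.timestamp).value


# Three replicas, one of which still holds a stale value.
replicas = [Replica("v2", 2), Replica("v2", 2), Replica("v1", 1)]
quorum = len(replicas) // 2 + 1  # majority: 2 of 3

# Healthy cluster: the read succeeds and the newest value is returned.
print(quorum_read(replicas, quorum))
```

If two of the three replicas are marked as not alive, the same call raises `QuorumError` instead of returning possibly stale data, which is the "failing any read" cost mentioned above.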
> 
> Cassandra's RandomPartitioner / hash-based partitioning means efficient 
> MapReduce or table scanning is not possible, whereas HBase's distributed 
> ordered tree is naturally efficient for such use cases, which I believe 
> explains why Hadoop users often prefer it. This may or may not be a problem 
> for any given use case. Using an ordered partitioner with Cassandra used to 
> require frequent manual rebalancing to avoid blowing up nodes. I don't know 
> if more recent versions still have this misfeature.
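
[Note: a minimal sketch of why key placement matters for scans. It assumes a four-node cluster, uses MD5 modulo the node count as a simplified stand-in for hash-based placement, and uses made-up split points for the ordered case; none of this is the real token-ring or region-assignment logic of either system.]

```python
import hashlib
from bisect import bisect_right

NODES = 4  # assumed cluster size for the illustration


def hash_node(key: str) -> int:
    """Hash-based placement: adjacent keys land on unrelated nodes."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest, "big") % NODES


SPLITS = ["g", "n", "t"]  # hypothetical boundaries between four key ranges


def range_node(key: str) -> int:
    """Ordered placement: each node owns one contiguous range of keys."""
    return bisect_right(SPLITS, key)


def nodes_for_scan(keys, placement):
    """Set of nodes that must be contacted to read all the given keys."""
    return {placement(k) for k in keys}


# A scan over 100 adjacent row keys, "user0000" through "user0099".
scan_keys = [f"user{i:04d}" for i in range(100)]

print("ordered:", sorted(nodes_for_scan(scan_keys, range_node)))
print("hashed: ", sorted(nodes_for_scan(scan_keys, hash_node)))
```

With ordered placement the whole scan is served by the single node that owns the range; with hashed placement the same keys are spread across the cluster, so a scan has to touch many nodes. The flip side is that an ordered layout can concentrate load on one node, which is the rebalancing problem mentioned above.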
> 
> Cassandra is no less complex than HBase. Its complexity is merely "hidden": 
> with Hadoop/HBase the layering is obvious -- HDFS, HBase, etc. -- but the 
> Cassandra internals are no less layered. An impartial 
> analysis of implementation and algorithms will reveal that Cassandra's theory 
> of operation in its full detail is substantially more complex. Compare the 
> BigTable and Dynamo papers and this is clear. There are actually more 
> opportunities for something to go wrong with Cassandra.
> 
> While we are looking at codebases, it should be noted that HBase has 
> substantially more unit tests.
> 
> With Cassandra, all RPC is via Thrift with various wrappers, so actually all 
> Cassandra clients are second class in the sense that jbellis means when he 
> states "Non-Java clients are not second-class citizens".
> 
> The master-slave versus peer-to-peer argument is larger than Cassandra vs. 
> HBase, and not nearly as one-sided as claimed. The famous (infamous?) global 
> failure of Amazon's S3 in 2008, a fully peer-to-peer system, due to a single 
> flipped bit in a gossip message demonstrates how in peer-to-peer systems 
> every node can be a single point of failure. There is no obvious winner; 
> instead, there is a series of trade-offs. Claiming otherwise is 
> intellectually dishonest. Master-slave architectures seem easier to operate 
> and reason about in my experience. Of course, I'm partial there.
> 
> I have just scratched the surface.
> 
> 
> Best regards,
> 
> 
>        - Andy
> 
> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via 
> Tom White)
> 
> 
>> ________________________________
>> From: Chris Tarnas <[email protected]>
>> To: [email protected]
>> Sent: Tuesday, August 30, 2011 2:02 PM
>> Subject: HBase and Cassandra on StackOverflow
>> 
>> Someone with better knowledge than I have might be interested in helping 
>> answer this question over at StackOverflow:
>> 
>> http://stackoverflow.com/questions/7237271/large-scale-data-processing-hbase-cassandra
>> 
>> -chris
>>
