Please help me overcome HBase's weaknesses

MauMau Sat, 04 Sep 2010 09:32:31 -0700

Hello,

We are considering whether to choose HBase or Cassandra for our future projects. I'm recommending HBase to my boss and coworkers, because HBase is good both for analysis (MapReduce) and for OLTP (get/put provides relatively fast response). Cassandra is superior in get/put response time, but it does not seem to be good at MapReduce because it can't perform range queries based on row keys (the order-preserving partitioner, OPP, can, but it seems difficult to use).

However, my boss points out the following as the weaknesses of HBase and insists that we choose Cassandra. I prefer HBase because HBase has stronger potential, thanks to its active community and rich ecosystem backed by its membership in the Hadoop family. Are there any good explanations (or future improvement plans/ideas) to persuade him and change his mind?


(1) Ease of use

Cassandra does not require any other software. All nodes of Cassandra have the same role. Pretty easy.
On the other hand, HBase requires HDFS and ZooKeeper. Users have to operate and manage HDFS and ZooKeeper. The nodes in the cluster have various roles, and the users need to design the placement of different types of nodes.


(2) Failover time

One of our potential customers requires that the system complete failover within one second. "One second" means the interval between when the system detects node failure and when the clients regain access to data.
Cassandra continues to process requests if one of three replica nodes remains. Therefore, the requirement is met.
However, HBase seems to take minutes, because it needs to reassign regions to live region servers, open the reassigned regions and load their block indexes into memory, and replay the logs. As the hardware gets more powerful, each node will be able to handle more regions. As a result, failover time will get longer in proportion to the number of regions, won't it?

## My question:

Is it possible to improve failover time? If yes, how much can it be shortened?

##

(3) SPOF

Cassandra has no SPOF. HBase and future HDFS eliminate the SPOF by using backup masters; however, master failure *can* affect the entire system operation in some way. How long does it take to detect master failure, promote one of the backup masters to the new master, and return to normal operation?


(4) Storage and analysis of sensor data

If the row key is (sensor_id) or (sensor_id, timestamp), Cassandra can hash the row key and distribute inserts from many sensors across the entire cluster (no hotspot). Though the MapReduce framework may send commands to all nodes, the nodes that do not have related data will not do anything or waste CPU or I/O resources.
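
To illustrate what I mean by hashing the row key, here is a minimal sketch in Python. It is not Cassandra's actual partitioner code: the function names, the sensor ID format, the node count of 8, and the "MD5 modulo number of nodes" placement are all my own assumptions, chosen only to show how hashed keys spread inserts evenly.

```python
import hashlib


def node_for_key(row_key: str, num_nodes: int) -> int:
    """Map a row key to a node index by hashing it (illustrative only).

    Hypothetical helper: a real cluster assigns token ranges to nodes
    rather than taking a simple modulo.
    """
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_nodes


def distribution(row_keys, num_nodes: int):
    """Count how many row keys land on each node."""
    counts = [0] * num_nodes
    for key in row_keys:
        counts[node_for_key(key, num_nodes)] += 1
    return counts


# Hypothetical workload: 10,000 sensors writing to an 8-node cluster.
sensor_ids = [f"sensor-{i:05d}" for i in range(10000)]
counts = distribution(sensor_ids, 8)
print(counts)
```

Because consecutive sensor IDs hash to unrelated values, no single node receives all the inserts. The price is that rows are no longer stored in key order, which is why range scans over row keys become difficult.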

## My question:
Is there any case study where HBase is used as a storage for sensor data?
##

Regards,
Maumau
