On Tue, Aug 30, 2011 at 1:42 PM, Joe Pallas <[email protected]> wrote:
>
> On Aug 30, 2011, at 2:47 AM, Andrew Purtell wrote:
>
> > Better to focus on improving HBase than play whack a mole.
>
> Absolutely. So let's talk about improving HBase. I'm speaking here as
> someone who has been learning about and experimenting with HBase for more
> than six months.
>
> > HBase supports replication between clusters (i.e. data centers).
>
> That’s … debatable. There's replication support in the code, but several
> times in the recent past when someone asked about it on this mailing list,
> the response was “I don't know of anyone actually using it.” My
> understanding of replication is that you can't replicate any existing data,
> so unless you activated it on day one, it isn't very useful. Do I
> misunderstand?
>
> > Cassandra does not have strong consistency in the sense that HBase
> provides. It can provide strong consistency, but at the cost of failing any
> read if there is insufficient quorum. HBase/HDFS does not have that
> limitation. On the other hand, HBase has its own and different scenarios
> where data may not be immediately available. The differences between the
> systems are nuanced and which to use depends on the use case requirements.
>
> That's fair enough, although I think your first two sentences nearly
> contradict each other :-). If you use N=3, W=3, R=1 in Cassandra, you
> should get similar behavior to HBase/HDFS with respect to consistency and
> availability ("strong" consistency and reads do not fail if any one copy is
> available).
>
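(A quick aside to make the quorum arithmetic above concrete. This is a
generic Dynamo-style sketch in Java, not code from Cassandra or HBase, and
the class/method names are made up for illustration.)

// If every read quorum (R) overlaps every write quorum (W) out of N
// replicas, a read is guaranteed to see the latest acknowledged write.
public class QuorumCheck {

    static boolean stronglyConsistent(int n, int w, int r) {
        return r + w > n;  // read and write sets must intersect
    }

    public static void main(String[] args) {
        // N=3, W=3, R=1: consistent, and a read needs only one live
        // replica, but a single dead replica fails every write.
        System.out.println(stronglyConsistent(3, 3, 1));  // true
        // N=3, W=2, R=1: a read can miss the latest write.
        System.out.println(stronglyConsistent(3, 2, 1));  // false
    }
}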
> A more important point, I think, is the one about storage. HBase uses two
> different kinds of files, data files and logs, but HDFS doesn't know about
> that and cannot, for example, optimize data files for write throughput (and
> random reads) and log files for low latency sequential writes. (For
> example, how could performance be improved by adding solid-state disk?)
>
> > Cassandra's RandomPartitioner / hash based partitioning means efficient
> MapReduce or table scanning is not possible, whereas HBase's distributed
> ordered tree is naturally efficient for such use cases, which I believe
> explains why Hadoop users often prefer it. This may or may not be a problem
> for any given use case.
>
> I don't think you can make a blanket statement that random partitioning
> makes efficient MapReduce impossible (scanning, yes). Many M/R tasks
> process entire tables. Random partitioning has definite advantages for some
> cases, and HBase might well benefit from recognizing that and adding some
> support.
>
> > Cassandra is no less complex than HBase. All of this complexity is
> "hidden" in the sense that with Hadoop/HBase the layering is obvious --
> HDFS, HBase, etc. -- but the Cassandra internals are no less layered.
>
> Operationally, however, HBase is more complex. Admins have to configure
> and manage ZooKeeper, HDFS, and HBase. Could this be improved?
>
> > With Cassandra, all RPC is via Thrift with various wrappers, so actually
> all Cassandra clients are second class in the sense that jbellis means when
> he states "Non-Java clients are not second-class citizens".
>
> That's disingenuous. Thrift exposes all of the Cassandra API to all of the
> wrappers, while HBase clients who want to use all of the HBase API must use
> Java. That can be fixed, but it is the status quo.
>
> joe
>
>
Hooked into another Cassandra/HBase thread...
Cassandra's RandomPartitioner / hash based partitioning means efficient
MapReduce or table scanning is not possible, whereas HBase's distributed
ordered tree is naturally efficient for such use cases, which I believe
explains why Hadoop users often prefer it. This may or may not be a problem
for any given use case.
Many people can and do benefit from this property of HBase. Efficient
map/reduce still strikes me as an oxymoron :) Yes, you can 'push down'
something like 'WHERE key > x AND key < y', and it is pretty nifty, but that
does not really bring you all the way to complex queries. Cassandra now has
support for built-in secondary indexes, and I think users will soon be able
to 'push down' WHERE clauses for 'efficient' map/reduce as well. You can also
currently range scan on columns (in both directions) in c*, and those scans
are efficient, so if you can turn a key-ranging design into a column-ranging
design you get the same effect. With both systems, HBase and Cassandra, you
likely end up needing to design your data around your queries.
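For what it is worth, here is roughly what that key-range 'push down' looks
like against the plain HBase client API. A minimal sketch, assuming the
0.90-era Java client; the table name and the x/y row-key bounds are made-up
placeholders:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class KeyRangeScan {
    public static void main(String[] args) throws Exception {
        HTable table = new HTable(HBaseConfiguration.create(), "mytable");
        try {
            // Rows are stored sorted by key, so 'WHERE key >= x AND key < y'
            // becomes a scan over only the regions covering [x, y) instead
            // of a pass over the whole table.
            Scan scan = new Scan(Bytes.toBytes("x"), Bytes.toBytes("y"));
            ResultScanner scanner = table.getScanner(scan);
            try {
                for (Result row : scanner) {
                    System.out.println(Bytes.toString(row.getRow()));
                }
            } finally {
                scanner.close();
            }
        } finally {
            table.close();
        }
    }
}

Under RandomPartitioner there is no contiguous row-key range to hand to a
scan like this, which is what the comparison above is getting at.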
Cassandra is no less complex than HBase. All of this complexity is "hidden"
in the sense that with Hadoop/HBase the layering is obvious -- HDFS, HBase,
etc. -- but the Cassandra internals are no less layered.
*This is an opinion*, and I will disagree on this one. For example, the
Cassandra gossip protocol exchanges two facts (IMHO): the state of the ring
(nodes UP/DOWN) and the token ownership of nodes. This information only
changes when nodes join or leave the cluster. On the HBase side of things,
many small regions are splitting and moving often, and this involves
communication between several components, say the master, ZK, and the region
servers. One-time setup complexity is one factor; monitoring and
troubleshooting is another. You also have to consider:
1) To make your NameNode actually redundant you need to depend on Linux-HA
or multiple NFS servers.
2) You need some way of protecting your masters/ZK nodes from processor/disk
starvation (i.e. they need their own machine).
3) Java's semi-piggish memory usage profile, and the fact that it rarely
gives memory back to the OS, means sharing a system among multiple Java
processes (DataNode, RegionServer, TaskTracker on the same box) is not
ideal, because each process tends to bubble up to higher than Xmx!
The one-JVM-per-node Cassandra stack is less complex architecturally.
I would argue it is administratively simpler too, but I do not know of
anyone with ROI numbers on ten-node Cassandra vs. HBase clusters :)