> While generalizations are dangerous, the one place where C++ code could
> shine over java (JVM really) is that one does not have to fight the GC.

Yes.

> That being said, the folks working on hbase
> have been actively addressing this problem to the extent possible
> in pure java by using unmanaged heap memory. Search for "mslab hbase" to
> learn more about it.
 
And Cloudera's Li Pi has been working on using off-heap memory as a secondary
cache in HBASE-4027 and related JIRAs:
https://issues.apache.org/jira/browse/HBASE-4027. I believe this is important
work. It gets us a lot closer to behaving like a C++-ish "large memory"
process than we can under a JVM GC regime, at least until G1 is perhaps stable
in the JVMs people run in production.
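
For readers unfamiliar with the idea, here is a minimal sketch of an off-heap
cache in plain Java. It is not the HBASE-4027 code: the class name
OffHeapCacheSketch and its put/get methods are made up for illustration, and
it assumes a single fixed-size slab with no eviction. The point is only that
the cached bytes sit in a direct ByteBuffer, outside the GC-managed heap,
while a small index stays on heap.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: values live in one direct (off-heap) buffer. */
class OffHeapCacheSketch {
    // Allocated outside the GC-managed heap; the collector never scans its contents.
    private final ByteBuffer slab;
    // Small on-heap index: key -> {offset, length} into the slab.
    private final Map<String, int[]> index = new HashMap<>();

    OffHeapCacheSketch(int capacityBytes) {
        slab = ByteBuffer.allocateDirect(capacityBytes);
    }

    /** Copies the value off heap; returns false when the slab is full (no eviction here). */
    synchronized boolean put(String key, byte[] value) {
        if (value.length > slab.remaining()) {
            return false;
        }
        int offset = slab.position();
        slab.put(value); // bump-pointer append; space for an overwritten key is not reclaimed
        index.put(key, new int[] {offset, value.length});
        return true;
    }

    /** Copies the value back onto the heap, or returns null on a miss. */
    synchronized byte[] get(String key) {
        int[] loc = index.get(key);
        if (loc == null) {
            return null;
        }
        byte[] out = new byte[loc[1]];
        ByteBuffer view = slab.duplicate(); // same memory, independent position
        view.position(loc[0]);
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        OffHeapCacheSketch cache = new OffHeapCacheSketch(1024);
        cache.put("row1", "value1".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(cache.get("row1"), StandardCharsets.UTF_8));
    }
}
```

A real secondary cache would also need eviction, space reuse, and finer-grained
concurrency control; the sketch leaves all of that out.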


Best regards,


   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via 
Tom White)


>________________________________
>From: Arvind Jayaprakash <[email protected]>
>To: [email protected]
>Sent: Thursday, September 8, 2011 2:49 AM
>Subject: Re: HBase Vs CitrusLeaf?
>
>On Sep 06, Something Something wrote:
>>Anyway, before I spent a lot of time on it, I thought I should check if
>>anyone has compared HBase against CitrusLeaf.  If you have, I would greatly
>>appreciate it if you would share your experiences.
>
>Disclaimer: I was an early evaluator/tester of citrusleaf about a year
>ago when it was in its infancy. Though I am not affiliated with them in
>any manner, I might be more benevolent to them than most readers of this
>mailing list.
>
>The short answer is that hbase & citrusleaf (called CL in the remainder of
>the mail) are very different products.
>
>CL cares a lot more about predictable latencies than hbase does. This is
>manifested in two aspects of the design:
>
>* It is heavily optimized for large RAM + SSD usage. While hbase does
>a fair job of using RAM, I can say for sure that both the throughput and
>latency trends are much better with CL in cases where spinning disks are
>not used directly in the read/write path.
>
>* Multiple machines can concurrently/actively handle requests for the
>same key, so the loss of one server does not mean that a range of keys
>is temporarily unavailable. A hbase cluster does have a partial,
>temporary outage when a region server dies. Things don't get back to
>normal immediately even when a new server takes over since not all
>region data may now be local disk reads. Even if they are, it won't be
>readily waiting for you in fast memory.
>
>* A third aspect that is more of a side-effect is that HDFS still has a
>SPOF in the form of the namenode, which continues to be a cause for
>concern wrt overall uptime guarantees.
>
>
>Here is where hbase would do much better:
>
>* It is designed for much larger data, to the point where it is natural
>for the entire dataset to be much larger than the total available RAM and
>for hard disks to be the primary storage medium.
>
>* A bigtable implementation is also designed for both ranged scans and
>full table scans. Last I recall, CL was more of a DHT, so ranged
>scans are infeasible and doing full scans would qualify as much more than
>shooting oneself in the foot.
>
>
>And here is where hbase has advantages in principle:
>
>* As others mentioned, there are "textbook" advantages of using an open
>source solution.
>
>* hbase definitely has run both longer and on larger clusters than CL
>possibly has.
>
>
>While generalizations are dangerous, the one place where C++ code could
>shine over java (JVM really) is that one does not have to fight the GC. I'd
>personally be more comfortable with handing off say 48GB of memory to
>good C/C++ code than to the JVM. That being said, the folks working on hbase
>have been actively addressing this problem to the extent possible
>in pure java by using unmanaged heap memory. Search for "mslab hbase" to
>learn more about it.
>
>
>My conclusion is that the two products address different problem spaces.
>So I'd urge you to spend time understanding your access patterns and see
>which one they map to more closely. Feel free to contact me off list
>if you feel the need to ask anything that is not appropriate for the
>mailing list but is relevant to this discussion.
>
>
>
