> While generalizations are dangerous, the one place where C++ code could
> shine over java (JVM really) is one does not have to fight the GC.

Yes.

> That being said, the folks working on hbase
> have been actively addressing this problem to the extent possible
> in pure java by using unmanaged heap memory. Search for "mslab hbase" to
> learn more about it.

And Cloudera's Li Pi has been working on using off-heap memory as a secondary cache in HBASE-4027 and related jiras: https://issues.apache.org/jira/browse/HBASE-4027 . I believe this is important work. This gets us a lot closer to behaving like a C++-ish "large memory" process than we can get under a JVM GC regime, until perhaps G1 is stable in what people run in production.

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)

>________________________________
>From: Arvind Jayaprakash <[email protected]>
>To: [email protected]
>Sent: Thursday, September 8, 2011 2:49 AM
>Subject: Re: HBase Vs CitrusLeaf?
>
>On Sep 06, Something Something wrote:
>>Anyway, before I spend a lot of time on it, I thought I should check if
>>anyone has compared HBase against CitrusLeaf. If you have, I would greatly
>>appreciate it if you would share your experiences.
>
>Disclaimer: I was an early evaluator/tester of citrusleaf about a year
>ago when it was in its infancy. Though I am not affiliated with them in
>any manner, I might be more benevolent to them than most readers of this
>mailing list.
>
>The short answer is that hbase & citrusleaf (called CL in remainder of
>the mail) are very different products.
>
>CL cares a lot more about predictable latencies than hbase does. This is
>manifested in two aspects of the design:
>
>* It is heavily optimized for large RAM + SSD usage. While hbase does
>a fair job of using RAM, I can say for sure that both the throughput and
>latency trends are much better with CL in cases where spinning disks are
>not used directly in the read/write path.
>
>* Multiple machines can concurrently/actively handle requests for the
>same key, so the loss of one server does not mean that a range of keys
>is temporarily unavailable. An hbase cluster does have a partial,
>temporary outage when a region server dies. Things don't get back to
>normal immediately even when a new server takes over since not all
>region data may now be local disk reads. Even if they are, it won't be
>readily waiting for you in fast memory.
>
>* A third aspect that is more of a side-effect is that HDFS still has a
>SPOF in the form of the namenode, which continues to be a cause for
>concern wrt overall uptime guarantees.
>
>
>Here is where hbase would do much better:
>
>* It is designed for much larger data to the point where it is natural
>for the entire dataset to be much larger than the total available RAM and
>the usage of hard disks as the primary storage medium is natural.
>
>* A bigtable implementation is also designed for both ranged scans and
>full table scans. Last I recall, CL was more of a DHT and so ranged
>scans are infeasible and doing full scans would qualify as much more than
>shooting oneself in the foot.
>
>
>And here is where hbase has advantages in principle:
>
>* As others mentioned, there are "textbook" advantages of using an open
>source solution.
>
>* hbase definitely has run both longer and on larger clusters than CL
>possibly has.
>
>
>While generalizations are dangerous, the one place where C++ code could
>shine over java (JVM really) is one does not have to fight the GC. I'd
>personally be more comfortable with handing off say 48GB of memory to
>good C/C++ code than the JVM. That being said, the folks working on hbase
>have been actively addressing this problem to the extent possible
>in pure java by using unmanaged heap memory. Search for "mslab hbase" to
>learn more about it.
>
>
>My conclusion is that the two products address different problem spaces.
>So I'd urge you to spend time understanding your access patterns and see
>which one it maps to more closely. Feel free to contact me off list
>if you feel the need to ask anything that is not appropriate for the
>mailing list but is relevant to this discussion.
>
>
>
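For anyone who has not seen the off-heap idea discussed in this thread, here is a minimal sketch of what keeping data outside the GC-managed heap can look like in pure Java. It is an illustration under stated assumptions, not HBase code: the class name OffHeapSlab and its put/get methods are hypothetical, and it is not the design used in HBASE-4027 or in MSLAB. The only real API it relies on is java.nio.ByteBuffer.allocateDirect, whose contents typically live outside the normal garbage-collected heap.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration only; not taken from HBase. */
public class OffHeapSlab {
    private final ByteBuffer buffer;

    public OffHeapSlab(int capacityBytes) {
        // A direct buffer's contents typically reside outside the
        // garbage-collected heap, so the GC does not scan or move them.
        this.buffer = ByteBuffer.allocateDirect(capacityBytes);
    }

    /** Appends a length-prefixed record; returns its offset, or -1 if it does not fit. */
    public synchronized int put(byte[] value) {
        if (buffer.remaining() < Integer.BYTES + value.length) {
            return -1;
        }
        int offset = buffer.position();
        buffer.putInt(value.length);
        buffer.put(value);
        return offset;
    }

    /** Copies the record stored at the given offset back onto the Java heap. */
    public synchronized byte[] get(int offset) {
        int length = buffer.getInt(offset);
        byte[] out = new byte[length];
        // Use a duplicate so reading does not disturb the write position.
        ByteBuffer view = buffer.duplicate();
        view.position(offset + Integer.BYTES);
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        OffHeapSlab slab = new OffHeapSlab(1024);
        int offset = slab.put("row1".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(slab.get(offset), StandardCharsets.UTF_8));
    }
}
```

The trade-off in a sketch like this is that every read copies bytes back onto the heap, and the slab itself never frees individual records; a real cache would need eviction and reuse of space on top of this.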
