Your company sounds lovely.

J-D

On Wed, Sep 7, 2011 at 11:10 PM, Something Something <[email protected]> wrote:
> This is GREAT information folks. This is why I like open source communities
> -:) I will present this to management, but in the meantime, the management
> has thrown another *monkey* wrench. They want me to check the possibility
> of replacing Netezza with *something*. Of course, I want to propose
> replacing Netezza with HBase. Anyway, it's best if I start another email
> thread. Thanks again.
>
> On Wed, Sep 7, 2011 at 10:27 PM, Andrew Purtell <[email protected]> wrote:
>
>> > While generalizations are dangerous, the one place where C++ code could
>> > shine over java (JVM really) is that one does not have to fight the GC.
>>
>> Yes.
>>
>> > That being said, the folks working on hbase
>> > have been actively addressing this problem to the extent possible
>> > in pure java by using unmanaged heap memory. Search for "mslab hbase" to
>> > learn more about it.
>>
>> And Cloudera's Li Pi has been working on using off heap memory as a
>> secondary cache in HBASE-4027 and related jiras:
>> https://issues.apache.org/jira/browse/HBASE-4027 . I believe this is
>> important work. This gets us a lot closer to behaving like a C++-ish "large
>> memory" process than we can under a JVM GC regime, until perhaps G1 is
>> stable in what people run in production.
>>
>> Best regards,
>>
>> - Andy
>>
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
>>
>> >________________________________
>> >From: Arvind Jayaprakash <[email protected]>
>> >To: [email protected]
>> >Sent: Thursday, September 8, 2011 2:49 AM
>> >Subject: Re: HBase Vs CitrusLeaf?
>> >
>> >On Sep 06, Something Something wrote:
>> >>Anyway, before I spend a lot of time on it, I thought I should check if
>> >>anyone has compared HBase against CitrusLeaf. If you have, I would greatly
>> >>appreciate it if you would share your experiences.
>> >
>> >Disclaimer: I was an early evaluator/tester of citrusleaf about a year
>> >ago when it was in its infancy. Though I am not affiliated with them in
>> >any manner, I might be more benevolent to them than most readers of this
>> >mailing list.
>> >
>> >The short answer is that hbase & citrusleaf (called CL in the remainder of
>> >the mail) are very different products.
>> >
>> >CL cares a lot more about predictable latencies than hbase does. This is
>> >manifested in two aspects of the design:
>> >
>> >* It is heavily optimized for large RAM + SSD usage. While hbase does
>> >a fair job of using RAM, I can say for sure that both the throughput and
>> >latency trends are much better with CL in cases where spinning disks are
>> >not used directly in the read/write path.
>> >
>> >* Multiple machines can concurrently/actively handle requests for the
>> >same key, so the loss of one server does not mean that a range of keys
>> >is temporarily unavailable. An hbase cluster does have a partial,
>> >temporary outage when a region server dies. Things don't get back to
>> >normal immediately even when a new server takes over, since not all
>> >region data may now be served from local disk reads. Even if it is, it
>> >won't be readily waiting for you in fast memory.
>> >
>> >* A third aspect, more of a side effect, is that HDFS still has a
>> >SPOF in the form of the namenode, which continues to be a cause for
>> >concern wrt overall uptime guarantees.
>> >
>> >
>> >Here is where hbase would do much better:
>> >
>> >* It is designed for much larger data, to the point where it is natural
>> >for the entire dataset to be much larger than the total available RAM and
>> >for hard disks to be the primary storage medium.
>> >
>> >* A bigtable implementation is also designed for both ranged scans and
>> >full table scans. Last I recall, CL was more of a DHT, so ranged
>> >scans are infeasible and doing full scans would qualify as much more than
>> >shooting oneself in the foot.
>> >
>> >
>> >And here is where hbase has advantages in principle:
>> >
>> >* As others mentioned, there are "textbook" advantages of using an open
>> >source solution.
>> >
>> >* hbase definitely has run both longer and on larger clusters than CL
>> >possibly has.
>> >
>> >
>> >While generalizations are dangerous, the one place where C++ code could
>> >shine over java (JVM really) is that one does not have to fight the GC. I'd
>> >personally be more comfortable with handing off say 48GB of memory to
>> >good C/C++ code than to the JVM. That being said, the folks working on hbase
>> >have been actively addressing this problem to the extent possible
>> >in pure java by using unmanaged heap memory. Search for "mslab hbase" to
>> >learn more about it.
>> >
>> >
>> >My conclusion is that the two products address different problem spaces.
>> >So I'd urge you to spend time understanding your access patterns and see
>> >which one they map to more closely. Feel free to contact me off list
>> >if you feel the need to ask anything that is not appropriate for the
>> >mailing list but is relevant to this discussion.
