Your company sounds lovely.

J-D

On Wed, Sep 7, 2011 at 11:10 PM, Something Something
<[email protected]> wrote:
> This is GREAT information folks.  This is why I like open source communities
> -:)  I will present this to management, but in the meantime, management
> has thrown in another *monkey* wrench.  They want me to check the possibility
> of replacing Netezza with *something*.  Of course, I want to propose
> replacing Netezza with HBase.  Anyway, it's best if I start another email
> thread.  Thanks again.
>
> On Wed, Sep 7, 2011 at 10:27 PM, Andrew Purtell <[email protected]> wrote:
>
>> > While generalizations are dangerous, the one place where C++ code could
>> > shine over java (JVM really) is that one does not have to fight the GC.
>>
>> Yes.
>>
>> > That being said, the folks working on hbase
>> > have been actively addressing this problem to the extent possible
>> > in pure java by using unmanaged heap memory. Search for "mslab hbase" to
>> > learn more about it.
>>
>> And Cloudera's Li Pi has been working on using off heap memory as a
>> secondary cache in HBASE-4027 and related jiras:
>> https://issues.apache.org/jira/browse/HBASE-4027 . I believe this is
>> important work. This gets us a lot closer to behaving like a C++-ish "large
>> memory" process than we can under a JVM GC regime, until perhaps G1 is
>> stable in what people run in production.
>>
>>
>> Best regards,
>>
>>
>>    - Andy
>>
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
>>
>>
>> >________________________________
>> >From: Arvind Jayaprakash <[email protected]>
>> >To: [email protected]
>> >Sent: Thursday, September 8, 2011 2:49 AM
>> >Subject: Re: HBase Vs CitrusLeaf?
>> >
>> >On Sep 06, Something Something wrote:
>> >>Anyway, before I spend a lot of time on it, I thought I should check if
>> >>anyone has compared HBase against CitrusLeaf.  If you have, I would greatly
>> >>appreciate it if you would share your experiences.
>> >
>> >Disclaimer: I was an early evaluator/tester of citrusleaf about a year
>> >ago when it was in its infancy. Though I am not affiliated with them in
>> >any manner, I might be more benevolent to them than most readers of this
>> >mailing list.
>> >
>> >The short answer is that hbase & citrusleaf (called CL in the remainder
>> >of this mail) are very different products.
>> >
>> >CL cares a lot more about predictable latencies than hbase does. This is
>> >manifested in two aspects of the design:
>> >
>> >* It is heavily optimized for large RAM + SSD usage. While hbase does
>> >a fair job of using RAM, I can say for sure that both the throughput and
>> >latency trends are much better with CL in cases where spinning disks are
>> >not used directly in the read/write path.
>> >
>> >* Multiple machines can concurrently/actively handle requests for the
>> >same key, so the loss of one server does not mean that a range of keys
>> >is temporarily unavailable. An hbase cluster does have a partial,
>> >temporary outage when a region server dies. Things don't get back to
>> >normal immediately even when a new server takes over, since not all
>> >region data may be readable from local disk. Even if it is, it won't be
>> >readily waiting for you in fast memory.
>> >
>> >* A third aspect, more of a side effect, is that HDFS still has a SPOF
>> >in the form of the namenode, which continues to be a cause for concern
>> >wrt overall uptime guarantees.
>> >
>> >
>> >Here is where hbase would do much better:
>> >
>> >* It is designed for much larger data, to the point where it is natural
>> >for the entire dataset to be much larger than the total available RAM and
>> >for hard disks to be the primary storage medium.
>> >
>> >* A bigtable implementation is also designed for both ranged scans and
>> >full table scans. Last I recall, CL was more of a DHT, so ranged
>> >scans are infeasible and doing full scans would qualify as much more than
>> >shooting oneself in the foot.
>> >
>> >
>> >And here is where hbase has advantages in principle:
>> >
>> >* As others mentioned, there are "textbook" advantages of using an open
>> >source solution.
>> >
>> >* hbase has definitely run both longer and on larger clusters than CL
>> >possibly has.
>> >
>> >
>> >While generalizations are dangerous, the one place where C++ code could
>> >shine over java (JVM really) is that one does not have to fight the GC. I'd
>> >personally be more comfortable with handing off say 48GB of memory to
>> >good C/C++ code than to the JVM. That being said, the folks working on hbase
>> >have been actively addressing this problem to the extent possible
>> >in pure java by using unmanaged heap memory. Search for "mslab hbase" to
>> >learn more about it.
>> >
>> >
>> >My conclusion is that the two products address different problem spaces.
>> >So I'd urge you to spend time understanding your access patterns and see
>> >which one they map to more closely. Feel free to contact me off list
>> >if you feel the need to ask anything that is not appropriate for the
>> >mailing list but is relevant to this discussion.
>> >
>> >
>> >
>>
>
