Awesome stuff, Benoit.
 
> Hello,
> I use Netty extensively, and for a long time I wanted to have a fully
> asynchronous / non-blocking and thread-safe HBase client.  So I ended
> up writing one from scratch, which I just released at:
> http://github.com/stumbleupon/asynchbase

So, this implements the wire protocol of our current RPC?

I imagine we'll want to change our server-side RPC mechanisms at some point 
soon.  Could this be applied to re-implementing the RPC servers?

> It's rather different from HBase's own client (HTable).  The core of
> asynchbase is made of HBaseClient (javadoc @ http://su.pr/1PJCSY), and
> you normally need only a single instance (vs. one per table with
> HTable).  This instance is entirely thread-safe (I think :D) and
> scales well in my limited loadtests (on a 4 core machine it scales
> linearly).  I wrote it originally for another project that relies
> heavily on HBase and Netty (a scalable time series database we use for
> monitoring at StumbleUpon - http://opentsdb.net), which is gonna be
> open sourced most likely during the last week of September.  In some
> write-heavy code paths I'm seeing a 3x to 4x throughput improvement
> with asynchbase.

Awesome.

> There's a heated debate ongoing at StumbleUpon about the license of
> the client.  Stack wants to use it in HBase itself, but the LGPLv3+ is
> unfortunately incompatible with the ASF license.  Others simply seem
> to dislike anything with the substring "GPL" in it (:D).  I haven't
> had much time to find another license yet, but I think we're going to
> switch to something like a BSD or MIT style license.  I'm going on
> vacation soon so I wanted to get some feedback from other HBase users
> instead of blocking on this stupid and annoying licensing issue.

Personally, I think licensing is absolutely critical and not a stupid/annoying 
issue at all.

If you want uptake around what you're releasing, I would strongly recommend an 
Apache license.  Is there a reason you're considering BSD but don't seem to be 
considering an ASF license?
 
And I don't think it's just that people dislike stuff because it has "GPL" in 
the license name.  The GPL comes with major ramifications.  It's viral, so all 
derivative works must be GPL.  And what you're building is built against an 
Apache project, so it should be Apache-compatible IMO.  Of course, this is your 
work and you can do with it whatever you'd like.  I'm just interested in your 
motives, I guess.

Is the aim to create a commercial system around this or to add something new to 
the HBase landscape for all to consume and build around?  Or do you have a fear 
that others will fork this into a commercial product and close source it?

> Your feedback or patches would be most welcome!
> 
> PS: The source code
> (http://github.com/stumbleupon/asynchbase/blob/master/src/HBaseRpc.java
> )
> contains an unofficial documentation of the Hadoop and HBase RPC
> protocols as well as Hadoop's variable-length encoding for integer
> values.  I've heard that others may be interested in implementing
> native HBase clients in non-Java languages, so I thought I'd pass this
> around to save their time.
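
For anyone following along who hasn't looked at the variable-length integer
encoding mentioned above, here is a minimal sketch of it as I understand it
from Hadoop's WritableUtils (writeVLong / readVLong).  The class name
VLongSketch and the byte-array interface are made up for illustration; the
real code works on DataInput / DataOutput streams, and HBaseRpc.java remains
the better reference.

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch only: VLongSketch is a hypothetical name, not part of
// Hadoop, HBase, or asynchbase.
final class VLongSketch {

    /** Encodes a long into 1 to 9 bytes. */
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Small values fit in a single byte with no length marker.
        if (value >= -112 && value <= 127) {
            out.write((int) value);
            return out.toByteArray();
        }
        // The first byte is a marker carrying the sign and payload length.
        int marker = -112;
        if (value < 0) {
            value = ~value;      // store the one's complement of negatives
            marker = -120;
        }
        for (long tmp = value; tmp != 0; tmp >>= 8) {
            marker--;            // one step per payload byte needed
        }
        out.write(marker);
        int payloadLen = (marker < -120) ? -(marker + 120) : -(marker + 112);
        // Payload bytes follow, most significant byte first.
        for (int i = payloadLen - 1; i >= 0; i--) {
            out.write((int) (value >>> (8 * i)));
        }
        return out.toByteArray();
    }

    /** Decodes a value produced by encode(), starting at bytes[0]. */
    static long decode(byte[] bytes) {
        byte first = bytes[0];
        if (first >= -112) {
            return first;        // single-byte value
        }
        boolean negative = first < -120;
        int payloadLen = negative ? -(first + 120) : -(first + 112);
        long value = 0;
        for (int i = 1; i <= payloadLen; i++) {
            value = (value << 8) | (bytes[i] & 0xFF);
        }
        return negative ? ~value : value;
    }

    public static void main(String[] args) {
        long[] samples = {0, 127, 128, -112, -113, 65536, Long.MIN_VALUE};
        for (long v : samples) {
            byte[] enc = encode(v);
            System.out.println(v + " -> " + enc.length + " byte(s), decoded "
                + decode(enc));
        }
    }
}
```

The point is that small values cost one byte, and anything else costs a
marker byte plus only as many payload bytes as the magnitude needs.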

Do you think this is the best long-term approach?  Deciphering the (already 
horrid) HBase RPC wire protocol to implement native clients?  I agree this was 
your own way forward at this point, but I'm wondering what your thoughts are 
about the long term.

I'd much rather see us move to something that doesn't require us to reverse 
engineer but instead is designed for just this.

This is sweet, Benoit.  I think your throughput results point to the fact that 
we do need to change our RPC ASAP and get into a world where it's easier for 
anyone to write new native clients.  It's silly to think our client and RPC 
code are killing performance given all the work we do in our RSs to make them 
faster.

JG
