Thrift Gateway Server, ZooKeeper & HBase

Pankaj Misra Mon, 01 Oct 2012 02:16:37 -0700

Dear All,

I would like your help in clearing up some doubts I have about how these 
components are deployed. I have run some tests on my pseudo-distributed 
environment and got very good throughput using a Thrift client and the 
gateway server. I need a clear view of the deployment components so that I 
can extend my environment with a clear thought process.


Based on my recent experience with gateway-based connectivity using Thrift to 
access HBase regions, it appears to me that a Thrift server has to run on the 
HBase node itself. I am trying to picture the deployment view for the Thrift 
gateway server running on an HBase node, the ZooKeeper quorum and the HBase 
nodes themselves.

I am using a pseudo-distributed configuration of HBase 0.94.1 with Hadoop 
0.23.1 (natively compiled) and have installed the Thrift library as per the 
installation instructions. I also see that running gateway servers on HBase is 
a big plus for a highly multi-threaded environment, as it takes advantage of 
thread pooling. Since I am running my setup in pseudo-distributed mode, I have 
1 HBase node, 1 ZooKeeper quorum, 1 region server, 1 NameNode (NN), 
1 DataNode (DN) and 1 Secondary NameNode (SNN).

To illustrate my thinking, the steps I perform to get HBase running with the 
Thrift gateway server are:
$HBASE_HOME/bin/start-hbase.sh  --> Starts the HBase node, ZooKeeper quorum & region server
$HBASE_HOME/bin/hbase thrift start -threadpool  --> Starts the Thrift gateway server on the HBase node
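
As a quick sanity check after these two steps, one can confirm that the 
gateway is accepting connections. The following is only a minimal sketch, 
assuming the gateway listens on localhost at the default Thrift port 9090; the 
function name gateway_is_listening is my own and purely illustrative. It only 
tests TCP reachability and does not speak the Thrift protocol.

```python
import socket


def gateway_is_listening(host="localhost", port=9090, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened.

    Assumption: the Thrift gateway uses the default port 9090.
    This checks reachability only; it sends no Thrift request.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False


if __name__ == "__main__":
    print("Thrift gateway reachable:", gateway_is_listening())
```

If the gateway was started on a different port, pass that port explicitly.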

This makes me think that the Thrift server is tightly coupled to every HBase 
node instance. If I just need to scale the Thrift server for load balancing, I 
cannot do it independently of HBase scaling; I would have to add another HBase 
node to the cluster in order to get another Thrift server.
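
To make the load-balancing idea concrete, this is roughly what I have in mind 
on the client side: rotating requests across several gateway endpoints. It is 
only a sketch under assumptions. The host names hbase-node-1 and hbase-node-2 
are hypothetical, 9090 is the default Thrift port, and the actual Thrift 
client connection is left out.

```python
import itertools

# Hypothetical gateway endpoints; replace with real hosts.
GATEWAYS = [("hbase-node-1", 9090), ("hbase-node-2", 9090)]


def round_robin(endpoints):
    """Return an iterator that cycles through the endpoints forever."""
    if not endpoints:
        raise ValueError("at least one gateway endpoint is required")
    return itertools.cycle(endpoints)


if __name__ == "__main__":
    picker = round_robin(GATEWAYS)
    # Each new client connection would take the next endpoint in turn.
    for _ in range(4):
        host, port = next(picker)
        print("would connect to", host, port)
```

With this approach, adding a gateway would only mean adding an entry to the 
list, which is why I am asking whether each entry really has to be a full 
HBase node.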

Also, with the above scenario in mind, it seems to me that the Thrift server 
running on HBase asks ZooKeeper for the connection, and ZooKeeper allocates 
and manages the connection lifecycle via native Java objects (HTable & 
HTablePool) for the respective RegionServers based on key values. Based on my 
understanding, which may be incorrect, if the Thrift server has to run on an 
HBase node that is also running region servers, why do the calls have to go 
through ZooKeeper? Or is it that once the client makes a successful connection 
to a Thrift server (on an HBase node), which may initially be mediated by 
ZooKeeper for allocation, the client interacts directly with the Thrift 
server?

I would greatly appreciate your inputs to help me build a correct 
understanding of the complete deployment view, as my current perception of it 
may be incorrect.

Thanks and Regards
Pankaj Misra

