Hi,

Replies inline.

On Mon, Oct 1, 2012 at 2:45 PM, Pankaj Misra <[email protected]> wrote:
> Dear All,
>
> I would like to request your help for clearing some doubts that I have around 
> the deployment view of these components. I have been able to do some tests on 
> my pseudo-distributed environment and have been able to get very good 
> throughput using Thrift client and gateway server. I need your help to have a 
> clear view of the deployment components, so that I can further elaborate my 
> environment with a clear thought process.
>
> Based on my recent experiences on gateway based connectivity using thrift to 
> access hbase regions, it occurs to me that in order to run a thrift server it 
> has to be run on the hbase node itself. I am trying to envision the 
> deployment view in context of thrift gateway server running on HBase node, 
> ZooKeeper quorum and the HBase node themselves.

A thrift server needs connectivity to all HBase and ZK service/daemon
nodes, but does not need to be co-located with one.

> I am using a pseudo-distributed configuration of HBase 0.94.1 with Hadoop 
> 0.23.1 natively compiled and have installed the thrift library as per the 
> installation instructions. I also see that running gateway servers on HBase 
> is a big plus for a highly multi-threaded environment as it takes advantage 
> of thread pooling. So since I am running my setup in a pseudo-distributed 
> mode, I have 1 node of HBase, 1 Zookeeper quorum, 1 region server, 1 NN, 1 DN 
> and 1 SNN.
>
> So if I have to illustrate my thinking here, the steps that I perform to have 
> HBase running with thrift gateway server are
> $HBASE_HOME/bin/start-hbase.sh
>     --> Starts the HBase node, Zookeeper Quorum & Region Server
> $HBASE_HOME/bin/hbase.sh thrift start -threadpool
>     --> Starts the Thrift gateway server on hbase node
>
> This makes me think that the thrift server is tightly coupled with every 
> instance of HBase node. If I just need to scale thrift server from a load 
> balancing perspective, I cannot do it independent of HBase scaling, I will 
> have to add another HBase node in the cluster to have another thrift server 
> for scalability.

Do not conflate library dependency with service dependency. They are
two different things.

You may _install_ HBase libs on any machine connected to the cluster,
and start _just_ the thrift server on it. The HBase thrift server does
need HBase libraries to run, but does not need a local service to run
alongside.
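
To make that concrete, here is a rough sketch (not an official tool) of
the client-side configuration such a gateway-only machine would need so
that its HBase libraries can find the cluster. The function name
render_hbase_site and the ZooKeeper hostnames are made up for
illustration; the two property names are the standard HBase client
settings.

```python
import xml.etree.ElementTree as ET


def render_hbase_site(zk_quorum, zk_client_port=2181):
    """Return hbase-site.xml text for a host that runs only the Thrift gateway.

    zk_quorum is a list of ZooKeeper hostnames belonging to the cluster.
    """
    root = ET.Element("configuration")
    settings = {
        # Where the HBase client libraries look up the cluster.
        "hbase.zookeeper.quorum": ",".join(zk_quorum),
        "hbase.zookeeper.property.clientPort": str(zk_client_port),
    }
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical ZooKeeper hosts; substitute the cluster's real quorum.
    print(render_hbase_site(["zk1.example.com", "zk2.example.com"]))
```

With a file like this in the HBase conf directory of the extra machine,
the same thrift start command you already use should bring up a gateway
there, with no HMaster or RegionServer running on that host.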

> Also with the above scenario in mind, what seems to me is that the thrift 
> server which runs on HBase, requests zookeeper for the connection and 
> zookeeper allocates and manages the connection lifecycle via native Java 
> objects (HTable & HTablePool) objects for respective RegionServers based on 
> key values. Based on my understanding, which may be incorrect, if thrift 
> server has to run on HBase node, which would also be running region servers 
> as well, why the calls have to go through the zookeeper? Or is it that once 
> the client makes a successful connection with a thrift server (on an Hbase 
> node),  which may be initially mediated by Zookeeper for allocation, the 
> client interaction happens directly with the thrift server?

If a thrift client is used, the client will only talk to the thrift
server. The client will not talk to ZooKeeper. The thrift server will
itself talk to ZooKeeper, HMaster and HRegionServers like a regular
Java client, and act as a 'gateway' for requests from thrift clients.
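
As an illustration of the client side, here is a minimal sketch that
assumes the third-party HappyBase library (a Python wrapper over the
HBase Thrift API) is installed. The helper name fetch_row and the host,
table and row key in the usage line are hypothetical. Note that the only
address the client is given is the gateway's; 9090 is the usual default
port of the HBase thrift server.

```python
def fetch_row(gateway_host, table_name, row_key, gateway_port=9090):
    """Read one row through the Thrift gateway.

    Returns a dict mapping column names to values. The client never
    contacts ZooKeeper, the HMaster or a RegionServer directly.
    """
    import happybase  # third-party library, assumed to be installed

    connection = happybase.Connection(host=gateway_host, port=gateway_port)
    try:
        return connection.table(table_name).row(row_key)
    finally:
        connection.close()


# Example usage (hypothetical names):
# fetch_row("thrift-gw.example.com", "mytable", b"row-1")
```

Since the client depends only on the gateway address, you can add more
thrift servers on other machines and spread clients across them without
touching the HBase nodes.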

Does this help clear up your questions?

-- 
Harsh J
