You've got it right, Pankaj :)
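
For the archives, here is a minimal sketch of what such a thrift-only node might check before starting the gateway. It assumes HBASE_HOME points at the local HBase install (the /opt/hbase default is hypothetical) and that the node's hbase-site.xml carries the same hbase.zookeeper.quorum setting as the rest of the cluster. The helper names are made up for illustration. The script only prints the start command via the bin/hbase launcher; it never calls start-hbase.sh, so no HMaster or RegionServer comes up on this machine.

```shell
#!/bin/sh
# Sketch: prepare a machine that runs ONLY the Thrift gateway.
# Assumption: HBase libraries and client configuration are installed under
# HBASE_HOME; /opt/hbase is a hypothetical default, adjust to your layout.
HBASE_HOME="${HBASE_HOME:-/opt/hbase}"
SITE_XML="$HBASE_HOME/conf/hbase-site.xml"

# The gateway locates the cluster through ZooKeeper, so the local
# hbase-site.xml should name the cluster's quorum.
# Returns 0 if the given file contains an hbase.zookeeper.quorum property.
has_quorum_setting() {
    grep -q '<name>hbase.zookeeper.quorum</name>' "$1" 2>/dev/null
}

# Prints the command that starts just the Thrift gateway with a thread
# pool (the same invocation discussed in this thread).
thrift_start_command() {
    printf '%s\n' "$HBASE_HOME/bin/hbase thrift start -threadpool"
}

if has_quorum_setting "$SITE_XML"; then
    echo "ZooKeeper quorum is configured; start the gateway with:"
else
    echo "Set hbase.zookeeper.quorum in $SITE_XML first, then run:"
fi
thrift_start_command
```

To add gateway capacity, repeat this on another machine with the same configuration and put a load balancer in front of the gateways; the HBase service nodes themselves stay untouched.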

On Mon, Oct 1, 2012 at 3:34 PM, Pankaj Misra <[email protected]> wrote:
> Thank you very much, Harsh. That's extremely helpful and clears up a lot 
> for me.
>
> Since I am running in pseudo-distributed mode, many things are mixed up; 
> moving to a small distributed setup will probably be better for me. Your 
> note about scaling Thrift servers independently was very helpful. While the 
> Thrift servers would run on machines that have HBase installed, as you 
> indicated, they would also need ZooKeeper connectivity, since they go 
> through ZooKeeper to interact with the HBase service nodes.
>
> So, it boils down to this: while a Thrift server node may have a complete 
> HBase install with the exact configuration for ZooKeeper connectivity, it 
> will run only the Thrift server and need not run the HBase service if we 
> intend to scale the Thrift server alone.
>
> Thank you again Harsh for your prompt help, and please do feel free to 
> indicate if my understanding above is incorrect or incomplete. Thanks.
>
> Thanks and Regards
> Pankaj Misra
>
>
> ________________________________________
> From: Harsh J [[email protected]]
> Sent: Monday, October 01, 2012 3:00 PM
> To: [email protected]
> Subject: Re: Thrift Gateway Server, ZooKeeper & HBase
>
> Hi,
>
> Inline.
>
> On Mon, Oct 1, 2012 at 2:45 PM, Pankaj Misra <[email protected]> 
> wrote:
>> Dear All,
>>
>> I would like to request your help for clearing some doubts that I have 
>> around the deployment view of these components. I have been able to do some 
>> tests on my pseudo-distributed environment and have been able to get very 
>> good throughput using Thrift client and gateway server. I need your help to 
>> have a clear view of the deployment components, so that I can further 
>> elaborate my environment with a clear thought process.
>>
>> Based on my recent experiences on gateway based connectivity using thrift to 
>> access hbase regions, it occurs to me that in order to run a thrift server 
>> it has to be run on the hbase node itself. I am trying to envision the 
>> deployment view in context of thrift gateway server running on HBase node, 
>> ZooKeeper quorum and the HBase node themselves.
>
> A thrift server needs connectivity to all HBase and ZK service/daemon
> nodes, but does not need to be co-located with one.
>
>> I am using a pseudo-distributed configuration of HBase 0.94.1 with Hadoop 
>> 0.23.1 natively compiled and have installed the thrift library as per the 
>> installation instructions. I also see that running gateway servers on HBase 
>> is a big plus for a highly multi-threaded environment as it takes advantage 
>> of thread pooling. So since I am running my setup in a pseudo-distributed 
>> mode, I have 1 node of HBase, 1 Zookeeper quorum, 1 region server, 1 NN, 1 
>> DN and 1 SNN.
>>
>> So if I have to illustrate my thinking here, the steps that I perform to 
>> have HBase running with the thrift gateway server are:
>> $HBASE_HOME/bin/start-hbase.sh
>>     --> Starts the HBase node, ZooKeeper quorum & region server
>> $HBASE_HOME/bin/hbase thrift start -threadpool
>>     --> Starts the Thrift gateway server on the HBase node
>>
>> This makes me think that the thrift server is tightly coupled with every 
>> instance of an HBase node. If I just need to scale the thrift server from a 
>> load-balancing perspective, I cannot do it independently of HBase scaling; 
>> I will have to add another HBase node to the cluster to get another thrift 
>> server for scalability.
>
> Do not conflate a library dependency with a service dependency; they
> are different things.
>
> You may _install_ HBase libs on any machine connected to the cluster,
> and start _just_ the thrift server on it. The HBase thrift server does
> need HBase libraries to run, but does not need a local service to run
> alongside.
>
>> Also, with the above scenario in mind, it seems to me that the thrift 
>> server, which runs on HBase, asks ZooKeeper for a connection, and ZooKeeper 
>> allocates and manages the connection lifecycle via native Java objects 
>> (HTable & HTablePool) for the respective RegionServers based on key values. 
>> Based on my understanding, which may be incorrect: if the thrift server has 
>> to run on an HBase node that is also running region servers, why do the 
>> calls have to go through ZooKeeper? Or is it that once the client has 
>> successfully connected to a thrift server (on an HBase node), which may 
>> initially be mediated by ZooKeeper for allocation, the client interacts 
>> directly with the thrift server?
>
> If a thrift client is used, the client will only talk to the thrift
> server. The client will not talk to ZooKeeper. The thrift server will
> talk to ZooKeeper, HMaster and HRegionServers like a regular Java
> client, and act as a 'gateway' for requests from thrift clients.
>
> Does this help clear your questions?
>
> --
> Harsh J
>



-- 
Harsh J
