You've got it right, Pankaj :)

On Mon, Oct 1, 2012 at 3:34 PM, Pankaj Misra <[email protected]> wrote:
> Thank you very much Harsh, that's extremely helpful and clears a lot of air
> for me.
>
> Since I am running in pseudo-distributed mode, many things are mixed up;
> possibly going to a small distributed setup will be better for me. Your note
> was very helpful around independent scaling of Thrift servers. While the
> Thrift servers would be run on machines having HBase installs, as you
> indicated, they would also need ZooKeeper connectivity, since they would
> be working via ZooKeeper for interaction with the HBase service nodes.
>
> So, it boils down to the fact that while the Thrift server node may have the
> complete HBase install with the exact configuration for ZooKeeper
> connectivity, these nodes will only be running the Thrift server and may not
> be running the HBase service if we only intend to scale the Thrift server alone.
>
> Thank you again Harsh for your prompt help, and please do feel free to
> indicate if my understanding above is incorrect or incomplete. Thanks.
>
> Thanks and Regards
> Pankaj Misra
>
>
> ________________________________________
> From: Harsh J [[email protected]]
> Sent: Monday, October 01, 2012 3:00 PM
> To: [email protected]
> Subject: Re: Thrift Gateway Server, ZooKeeper & HBase
>
> Hi,
>
> Inline.
>
> On Mon, Oct 1, 2012 at 2:45 PM, Pankaj Misra <[email protected]>
> wrote:
>> Dear All,
>>
>> I would like to request your help in clearing some doubts that I have
>> around the deployment view of these components. I have been able to do some
>> tests on my pseudo-distributed environment and have been able to get very
>> good throughput using the Thrift client and gateway server. I need your help to
>> get a clear view of the deployment components, so that I can further
>> elaborate my environment with a clear thought process.
>>
>> Based on my recent experiences with gateway-based connectivity using Thrift to
>> access HBase regions, it occurs to me that in order to run a Thrift server,
>> it has to be run on the HBase node itself. I am trying to envision the
>> deployment view in the context of the Thrift gateway server running on the
>> HBase node, the ZooKeeper quorum and the HBase nodes themselves.
>
> A Thrift server needs connectivity to all HBase and ZK service/daemon
> nodes, but does not need to be co-located with one.
>
>> I am using a pseudo-distributed configuration of HBase 0.94.1 with Hadoop
>> 0.23.1 natively compiled and have installed the Thrift library as per the
>> installation instructions. I also see that running gateway servers on HBase
>> is a big plus for a highly multi-threaded environment, as it takes advantage
>> of thread pooling. Since I am running my setup in pseudo-distributed
>> mode, I have 1 HBase node, 1 ZooKeeper quorum, 1 region server, 1 NN, 1
>> DN and 1 SNN.
>>
>> To illustrate my thinking, the steps that I perform to
>> have HBase running with the Thrift gateway server are:
>> $HBASE_HOME/bin/start-hbase.sh -->
>>     Starts the HBase node, ZooKeeper quorum & Region Server
>> $HBASE_HOME/bin/hbase thrift start -threadpool -->
>>     Starts the Thrift gateway server on the HBase node
>>
>> This makes me think that the Thrift server is tightly coupled with every
>> instance of an HBase node. If I just need to scale the Thrift server from a
>> load balancing perspective, I cannot do it independently of HBase scaling; I
>> will have to add another HBase node to the cluster to have another Thrift
>> server for scalability.
>
> Do not couple library dependency with service dependency - both are
> different things.
>
> You may _install_ HBase libs on any machine connected to the cluster,
> and start _just_ the Thrift server on it. The HBase Thrift server does
> need HBase libraries to run, but does not need a local service to run
> alongside.
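To make the "install the libs, start just the Thrift server" point concrete, here is a minimal Python sketch that writes the ZooKeeper client settings a gateway-only node needs into an hbase-site.xml. It assumes the property names `hbase.zookeeper.quorum` and `hbase.zookeeper.property.clientPort` with the default port 2181; the helper name `gateway_site_xml` and the `zk*.example.com` hosts are hypothetical placeholders. In practice, copying the cluster's own hbase-site.xml to the gateway node is often the simplest route, and any non-default setting such as `zookeeper.znode.parent` would have to carry over as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical helper: builds a minimal hbase-site.xml for a node that will run
# only the HBase Thrift gateway (no HMaster, RegionServer or ZooKeeper locally).
def gateway_site_xml(zk_hosts, zk_client_port=2181):
    config = ET.Element("configuration")
    settings = {
        # Where the gateway, acting as a regular HBase client, finds ZooKeeper.
        "hbase.zookeeper.quorum": ",".join(zk_hosts),
        # Assumed default ZooKeeper client port; change it if the cluster differs.
        "hbase.zookeeper.property.clientPort": str(zk_client_port),
    }
    for name, value in settings.items():
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(config, encoding="unicode")


if __name__ == "__main__":
    # Placeholder host names; substitute the cluster's real ZooKeeper quorum.
    print(gateway_site_xml(["zk1.example.com", "zk2.example.com", "zk3.example.com"]))
```

With HBase installed and such a file in the node's conf directory, only the Thrift server would be started there (for example with the `bin/hbase thrift start -threadpool` command from the thread), so adding gateways does not add RegionServers.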
>
>> Also, with the above scenario in mind, it seems to me that the Thrift
>> server, which runs on HBase, requests ZooKeeper for the connection, and
>> ZooKeeper allocates and manages the connection lifecycle via native Java
>> objects (HTable & HTablePool) for the respective RegionServers based on
>> key values. Based on my understanding, which may be incorrect, if the Thrift
>> server has to run on an HBase node, which would also be running region
>> servers, why do the calls have to go through ZooKeeper? Or is it that once
>> the client makes a successful connection with a Thrift server (on an HBase
>> node), which may be initially mediated by ZooKeeper for allocation, the
>> client interaction happens directly with the Thrift server?
>
> If a Thrift client is used, the client will only talk to the Thrift
> server. The client will not talk to ZooKeeper. The Thrift server will
> talk to ZooKeeper, HMaster and HRegionServers like a regular Java
> client instead, and act as a 'gateway' for requests from Thrift clients.
>
> Does this help clear your questions?
>
> --
> Harsh J
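The request path Harsh describes (client talks only to the gateway; the gateway does the ZooKeeper/region lookups and forwards to RegionServers) can be sketched as a toy in-process Python model. This is not the HBase or Thrift API: every class here is hypothetical, `Registry` lumps ZooKeeper and the META lookup into one object, and a real client also caches region locations and makes network RPCs, both of which are omitted.

```python
import bisect


class RegionServer:
    """Toy stand-in for an HRegionServer holding rows for its key ranges."""
    def __init__(self, name):
        self.name = name
        self.rows = {}

    def put(self, key, value):
        self.rows[key] = value

    def get(self, key):
        return self.rows.get(key)


class Registry:
    """Toy stand-in for ZooKeeper + META: maps region start keys to servers."""
    def __init__(self, regions):
        # regions: (start_key, RegionServer) pairs sorted by start key;
        # the first start key must be "" so every row key has a region.
        self.starts = [start for start, _ in regions]
        self.servers = [server for _, server in regions]
        self.lookups = 0

    def locate(self, key):
        self.lookups += 1
        # The region whose start key is the largest one <= key owns the row.
        return self.servers[bisect.bisect_right(self.starts, key) - 1]


class ThriftGateway:
    """Toy stand-in for the Thrift server: the only piece that uses the registry."""
    def __init__(self, registry):
        self._registry = registry

    def put(self, key, value):
        self._registry.locate(key).put(key, value)

    def get(self, key):
        return self._registry.locate(key).get(key)


class ThriftClient:
    """Toy stand-in for a Thrift client: it holds a gateway and nothing else."""
    def __init__(self, gateway):
        self._gateway = gateway

    def put(self, key, value):
        self._gateway.put(key, value)

    def get(self, key):
        return self._gateway.get(key)


if __name__ == "__main__":
    rs1, rs2 = RegionServer("rs1"), RegionServer("rs2")
    client = ThriftClient(ThriftGateway(Registry([("", rs1), ("m", rs2)])))
    client.put("apple", "red")
    client.put("zebra", "striped")
    print(client.get("apple"), rs2.rows)
```

The design point the sketch isolates is that only `ThriftGateway` holds a `Registry`; several gateways could share the same registry and RegionServers, which is why gateways can be scaled on their own.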
-- Harsh J
