Hi,

Inline.

On Mon, Oct 1, 2012 at 2:45 PM, Pankaj Misra <[email protected]> wrote:
> Dear All,
>
> I would like to request your help for clearing some doubts that I have around
> the deployment view of these components. I have been able to do some tests on
> my pseudo-distributed environment and have been able to get very good
> throughput using Thrift client and gateway server. I need your help to have a
> clear view of the deployment components, so that I can further elaborate my
> environment with a clear thought process.
>
> Based on my recent experiences on gateway based connectivity using thrift to
> access hbase regions, it occurs to me that in order to run a thrift server it
> has to be run on the hbase node itself. I am trying to envision the
> deployment view in context of thrift gateway server running on HBase node,
> ZooKeeper quorum and the HBase node themselves.

A thrift server needs connectivity to all HBase and ZK service/daemon
nodes, but does not need to be co-located with one.

> I am using a pseudo-distributed configuration of HBase 0.94.1 with Hadoop
> 0.23.1 natively compiled and have installed the thrift library as per the
> installation instructions. I also see that running gateway servers on HBase
> is a big plus for a highly multi-threaded environment as it takes advantage
> of thread pooling. So since I am running my setup in a pseudo-distributed
> mode, I have 1 node of HBase, 1 Zookeeper quorum, 1 region server, 1 NN, 1 DN
> and 1 SNN.
>
> So if I have to illustrate my thinking here, the steps that I perform to have
> HBase running with thrift gateway server are
> $HBASE_HOME/bin/start-hbase.sh -->
> Starts the HBase node, Zookeeper Quorum & Region Server
> $HBASE_HOME/bin/hbase.sh thrift start -threadpool --> Starts the Thrift
> gateway server on hbase node
>
> This makes me think that the thrift server is tightly coupled with every
> instance of HBase node. If I just need to scale thrift server from a load
> balancing perspective, I cannot do it independent of HBase scaling, I will
> have to add another HBase node in the cluster to have another thrift server
> for scalability.

Do not couple library dependency with service dependency - both are
different things. You may _install_ HBase libs on any machine
connected to the cluster, and start _just_ the thrift server on it.
The HBase thrift server does need HBase libraries to run, but does not
need a local service to run alongside.

> Also with the above scenario in mind, what seems to me is that the thrift
> server which runs on HBase, requests zookeeper for the connection and
> zookeeper allocates and manages the connection lifecycle via native Java
> objects (HTable & HTablePool) objects for respective RegionServers based on
> key values. Based on my understanding, which may be incorrect, if thrift
> server has to run on HBase node, which would also be running region servers
> as well, why the calls have to go through the zookeeper? Or is it that once
> the client makes a successful connection with a thrift server (on an Hbase
> node), which may be initially mediated by Zookeeper for allocation, the
> client interaction happens directly with the thrift server?

If a thrift client is used, the client will only talk to thrift
server. The client will not talk to ZooKeeper. The thrift server will
talk to ZooKeeper, HMaster and HRegionServers like a regular Java
client instead, and act as a 'gateway' for requests to thrift clients.

Does this help clear your questions?

--
Harsh J
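
As a rough illustration of the point about starting _just_ the thrift
server on a machine that only has the HBase libs installed, something
along these lines should work. This is a sketch, not a tested recipe:
the ZooKeeper host names and the ZK_QUORUM and CONF_DIR variables are
placeholders that do not come from this thread, and it assumes the
launcher script is bin/hbase with its --config option.

```shell
#!/bin/sh
# Sketch: run only the Thrift gateway on a machine that has the HBase
# libraries installed but hosts no HMaster/RegionServer/ZooKeeper daemon.
# ZK_QUORUM and CONF_DIR are placeholders (assumptions), not from the thread.
ZK_QUORUM="${ZK_QUORUM:-zk1.example.com,zk2.example.com,zk3.example.com}"
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# Minimal client-side config: tell the gateway where the ZooKeeper quorum is.
cat > "$CONF_DIR/hbase-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>$ZK_QUORUM</value>
  </property>
</configuration>
EOF

# Start just the Thrift server if HBase is installed here; otherwise only report.
if [ -x "${HBASE_HOME:-}/bin/hbase" ]; then
  "$HBASE_HOME/bin/hbase" --config "$CONF_DIR" thrift start -threadpool
else
  echo "HBase not found; wrote $CONF_DIR/hbase-site.xml only"
fi
```

Several such gateway machines can be put behind a load balancer
without adding any HBase daemon nodes, since each one only needs
network access to ZooKeeper, the HMaster and the RegionServers.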
