Dear All,

I would like your help in clearing up some doubts I have about the deployment view of the HBase Thrift gateway and its related components. I have run some tests on my pseudo-distributed environment and got very good throughput using a Thrift client and the gateway server. I now need a clear picture of how the components are deployed, so that I can extend my environment with a sound understanding.
Based on my recent experience with gateway-based connectivity, using Thrift to access HBase regions, it appears to me that a Thrift server has to run on the HBase node itself. I am trying to envision the deployment view in terms of the Thrift gateway server running on an HBase node, the ZooKeeper quorum and the HBase nodes themselves.

I am using a pseudo-distributed configuration of HBase 0.94.1 with natively compiled Hadoop 0.23.1, and have installed the Thrift library as per the installation instructions. I also see that running gateway servers on HBase is a big plus for a highly multi-threaded environment, as it takes advantage of thread pooling. Since my setup is pseudo-distributed, I have 1 HBase node, a single-member ZooKeeper quorum, 1 region server, 1 NN, 1 DN and 1 SNN.

To illustrate my thinking, these are the steps I perform to get HBase running with the Thrift gateway server:

    $HBASE_HOME/bin/start-hbase.sh --> Starts the HBase node, ZooKeeper quorum & Region Server
    $HBASE_HOME/bin/hbase thrift start -threadpool --> Starts the Thrift gateway server on the HBase node

This makes me think that the Thrift server is tightly coupled with every instance of an HBase node. If I only need to scale the Thrift server for load balancing, I cannot do it independently of HBase scaling; I would have to add another HBase node to the cluster to get another Thrift server.

With the above scenario in mind, it seems to me that the Thrift server running on HBase asks ZooKeeper for the connection, and that ZooKeeper allocates the connection and manages its lifecycle via native Java objects (HTable & HTablePool) for the respective RegionServers, based on key values. Based on my understanding, which may be incorrect: if the Thrift server has to run on an HBase node, which would also be running region servers, why do the calls have to go through ZooKeeper?
Or is it that once the client makes a successful connection with a thrift server (on an HBase node), which may be initially mediated by ZooKeeper for allocation, the client interaction happens directly with the thrift server?

I would greatly appreciate your inputs to help me build correct understanding around the complete deployment view, as I may have an incorrect perception around it.

Thanks and Regards
Pankaj Misra
