Hi~

I would like to know how data flows when you query it from HBase or Stargate, especially from an I/O perspective. Please point me in some directions to study. I mean questions like these:

- When is an HFile (StoreFile) loaded as a region into the region server's memory?
- Does a region stay in the region server's memory afterward? When is it freed?
- When Stargate uses a scan instance to obtain data, does it communicate with the region server over another connection, adding overhead?

I'm asking because I'm experimenting with Toad for Cloud Database on HBase, and I ran into a performance issue: querying 400K data rows takes about 5 minutes (roughly 1,300 rows per second), which is kind of an awkward number.

I installed HBase/HDFS on 7 VMs:

- 1 as ResourceManager
- 1 as NameNode and HMaster
- 5 as DataNodes and RegionServers

I have barely changed any configuration for performance tuning.

I drew a very simple chart to try to find where the bottlenecks are:
<http://apache-hbase.679495.n3.nabble.com/file/n4057719/Toad_Read_HBase_Process.png>

I know I may have missed many details in this simple chart. Please give me some clues.

Much appreciated,
yglin

--
View this message in context: http://apache-hbase.679495.n3.nabble.com/HBase-Stargate-dataflow-in-I-O-perspective-tp4057719.html
Sent from the HBase User mailing list archive at Nabble.com.
