Hi all,

We want to implement a bill query system. We have 20M users, and each user's monthly bill contains about 10 records of roughly 0.6 KB each. We want to keep bills for 6 months. User queries naturally focus on the latest month's reports, but there is no hot spot in which users are queried.
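For scale, the figures above imply the following raw data volume. This is a back-of-the-envelope sketch that assumes "0.6 KB" means 600 bytes and ignores HBase per-cell overhead (row key, column qualifier, timestamp) and HDFS replication, so the real on-disk footprint will be larger.

```python
# Back-of-the-envelope sizing for the bill data described above.
# Assumptions (not stated in the post): 0.6 KB = 600 bytes (decimal units);
# HBase cell overhead and HDFS replication are ignored.

USERS = 20_000_000
RECORDS_PER_USER_PER_MONTH = 10
RECORD_BYTES = 600
MONTHS_RETAINED = 6


def raw_bytes(months: int) -> int:
    """Raw payload bytes for the given number of months."""
    return USERS * RECORDS_PER_USER_PER_MONTH * RECORD_BYTES * months


per_month = raw_bytes(1)
total = raw_bytes(MONTHS_RETAINED)

print(f"latest month: {per_month / 1e9:.0f} GB")  # -> latest month: 120 GB
print(f"six months:   {total / 1e9:.0f} GB")      # -> six months:   720 GB
```

In other words, even the latest month alone is on the order of 120 GB of payload before any storage overhead.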
We use CDH3U0 on 6 servers (each with 24 GB of memory and 3 × 1 TB disks), each running a DataNode and a RegionServer; ZooKeeper, the NameNode and the HMaster run on separate servers. The RegionServer heap is 8 GB and the DataNode's is 12 GB. The maximum HFile size is 1 GB, and the block cache is set to 0.4 of the heap. The row key is month + user_id, and each record is stored as a cell, so one user's monthly report is one row in HBase.

Currently we can write about 30K records/second when storing bills. Query performance, however, is quite poor: we only achieve about 600~700 month_report/second, i.e. each RegionServer serves only about 100 rows/second. The block cache hit ratio is about 20%. Do you have any advice on how to improve query performance?

Below are some metrics reported by one RegionServer:

2011-04-26T10:56:12 hbase.regionserver: RegionServer=regionserver50820, blockCacheCount=40969, blockCacheEvictedCount=216359, blockCacheFree=671152504, blockCacheHitCachingRatio=20, blockCacheHitCount=67936, blockCacheHitRatio=20, blockCacheMissCount=257675, blockCacheSize=2743351688, compactionQueueSize=0, compactionSize_avg_time=0, compactionSize_num_ops=7, compactionTime_avg_time=0, compactionTime_num_ops=7, flushQueueSize=0, flushSize_avg_time=0, flushSize_num_ops=0, flushTime_avg_time=0, flushTime_num_ops=0, fsReadLatency_avg_time=46, fsReadLatency_num_ops=257905, fsSyncLatency_avg_time=0, fsSyncLatency_num_ops=1726, fsWriteLatency_avg_time=0, fsWriteLatency_num_ops=0, memstoreSizeMB=0, regions=169, requests=82.1, storefileIndexSizeMB=188, storefiles=343, stores=169

2011-04-26T10:56:22 hbase.regionserver: RegionServer=regionserver50820, blockCacheCount=42500, blockCacheEvictedCount=216359, blockCacheFree=569659040, blockCacheHitCachingRatio=20, blockCacheHitCount=68418, blockCacheHitRatio=20, blockCacheMissCount=259206, blockCacheSize=2844845152, compactionQueueSize=0, compactionSize_avg_time=0, compactionSize_num_ops=7, compactionTime_avg_time=0, compactionTime_num_ops=7, flushQueueSize=0, flushSize_avg_time=0, flushSize_num_ops=0, flushTime_avg_time=0, flushTime_num_ops=0, fsReadLatency_avg_time=44, fsReadLatency_num_ops=259547, fsSyncLatency_avg_time=0, fsSyncLatency_num_ops=1736, fsWriteLatency_avg_time=0, fsWriteLatency_num_ops=0, memstoreSizeMB=0, regions=169, requests=92.2, storefileIndexSizeMB=188, storefiles=343, stores=169

2011-04-26T10:56:32 hbase.regionserver: RegionServer=regionserver50820, blockCacheCount=39238, blockCacheEvictedCount=221509, blockCacheFree=785944072, blockCacheHitCachingRatio=20, blockCacheHitCount=69043, blockCacheHitRatio=20, blockCacheMissCount=261095, blockCacheSize=2628560120, compactionQueueSize=0, compactionSize_avg_time=0, compactionSize_num_ops=7, compactionTime_avg_time=0, compactionTime_num_ops=7, flushQueueSize=0, flushSize_avg_time=0, flushSize_num_ops=0, flushTime_avg_time=0, flushTime_num_ops=0, fsReadLatency_avg_time=39, fsReadLatency_num_ops=261070, fsSyncLatency_avg_time=0, fsSyncLatency_num_ops=1746, fsWriteLatency_avg_time=0, fsWriteLatency_num_ops=0, memstoreSizeMB=0, regions=169, requests=128.77777, storefileIndexSizeMB=188, storefiles=343, stores=169

We also tried disabling the block cache, and performance seemed even slightly better. And with a configuration of 6 DataNode servers + 3 RegionServer servers, we get better throughput, about 1000 month_report/second. I am confused. Can anyone explain why?

Thanks,
Weihua
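The block cache counters in the three snapshots can be cross-checked with a little arithmetic. The sketch below recomputes the hit ratio from blockCacheHitCount and blockCacheMissCount, both cumulatively and per 10-second interval, and compares one RegionServer's cache capacity (blockCacheSize + blockCacheFree) scaled to six RegionServers with the raw size of one month of bills. It assumes every RegionServer has the same cache capacity as regionserver50820, that 0.6 KB means 600 bytes, and ignores HBase cell overhead; treat it as a rough estimate, not a diagnosis.

```python
import re

# Block-cache fields copied verbatim from the three regionserver50820
# snapshots above (all other fields dropped).
SNAPSHOTS = [
    "2011-04-26T10:56:12 blockCacheFree=671152504, blockCacheHitCount=67936, "
    "blockCacheMissCount=257675, blockCacheSize=2743351688",
    "2011-04-26T10:56:22 blockCacheFree=569659040, blockCacheHitCount=68418, "
    "blockCacheMissCount=259206, blockCacheSize=2844845152",
    "2011-04-26T10:56:32 blockCacheFree=785944072, blockCacheHitCount=69043, "
    "blockCacheMissCount=261095, blockCacheSize=2628560120",
]

# Assumptions: all 6 RegionServers have the same cache capacity as this one,
# and one month of bills is ~120 GB of raw payload (20M x 10 x 600 bytes).
REGION_SERVERS = 6
LATEST_MONTH_BYTES = 20_000_000 * 10 * 600


def parse(line: str) -> dict:
    """Collect the integer-valued name=value pairs of one snapshot."""
    return {name: int(value) for name, value in re.findall(r"(\w+)=(\d+)", line)}


def hit_ratio(hits: int, misses: int) -> float:
    """Fraction of block lookups served from the cache."""
    lookups = hits + misses
    return hits / lookups if lookups else 0.0


snaps = [parse(line) for line in SNAPSHOTS]

# Ratio over the cumulative counters as reported in each snapshot.
cumulative = [hit_ratio(s["blockCacheHitCount"], s["blockCacheMissCount"]) for s in snaps]

# Ratio within each 10-second interval, from the counter deltas.
interval = [
    hit_ratio(b["blockCacheHitCount"] - a["blockCacheHitCount"],
              b["blockCacheMissCount"] - a["blockCacheMissCount"])
    for a, b in zip(snaps, snaps[1:])
]

# Cache capacity of one RegionServer = bytes in use + bytes free.
cache_per_rs = snaps[0]["blockCacheSize"] + snaps[0]["blockCacheFree"]
coverage = REGION_SERVERS * cache_per_rs / LATEST_MONTH_BYTES

print("cumulative:", ", ".join(f"{r:.1%}" for r in cumulative))  # -> 20.9%, 20.9%, 20.9%
print("per interval:", ", ".join(f"{r:.1%}" for r in interval))  # -> 23.9%, 24.9%
print(f"cluster cache / latest month: {coverage:.1%}")           # -> 17.1%
```

Under these assumptions the whole cluster's block cache holds only about a sixth of one month's raw data, which is in the same range as the observed hit ratio; that is roughly what one would expect if reads are spread evenly across users with no hot spot, though the estimate ignores index blocks and per-cell overhead.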
