user_month might still be helpful on average if a user looks for one month and then another a short time later. This is because your cache could be primed by the first query.
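
To make the locality argument concrete, here is a minimal sketch of how the two key orders sort. HBase orders rows lexicographically by row key, so the leading component decides which rows end up next to each other. The key formats, the `_` separator, and the helper names below are hypothetical, since the thread does not show the real key encoding.

```python
# Hypothetical key builders: the thread only says the current key is
# month+user_id and the suggested key is user_month.
def month_first_key(month: str, user_id: int) -> str:
    return f"{month}_{user_id:08d}"


def user_first_key(user_id: int, month: str) -> str:
    return f"{user_id:08d}_{month}"


months = ["201011", "201012", "201101", "201102", "201103", "201104"]
users = [1, 2, 3]

# Sorting plain ASCII strings mimics the lexicographic row order.
month_first = sorted(month_first_key(m, u) for m in months for u in users)
user_first = sorted(user_first_key(u, m) for m in months for u in users)


def positions(keys, user_field):
    """Indexes of the rows belonging to one user in the sorted key list."""
    return [i for i, key in enumerate(keys) if user_field in key.split("_")]


# Where do the six monthly rows of user 2 land under each layout?
print(positions(month_first, "00000002"))
print(positions(user_first, "00000002"))
```

With the month first, one user's rows are spread across the whole key space. With the user first, they are contiguous, so a read for one month is likely to pull neighbouring months of the same user into the block cache.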
But you know your application best, of course.

On Mon, Apr 25, 2011 at 10:27 PM, Weihua JIANG wrote:

> Changing key to user_month may not be useful to me since, for each
> query, we only need to get one month report for a user instead of all
> the data stored for a user.
>
> Putting multiple month data into a single row may be useful, but not
> sure. I will perform some experimentation when I have time.
>
> 2011/4/26 Ted Dunning:
> > Change your key to user_month.
> >
> > That will put all of the records for a user together so you will only need a
> > single disk operation to read all of your data. Also, test the option of
> > putting multiple months in a single row.
> >
> > On Mon, Apr 25, 2011 at 7:59 PM, Weihua JIANG wrote:
> >
> >> Hi all,
> >>
> >> We want to implement a bill query system. We have 20M users, the bill
> >> for each user per month contains about 10 0.6K-byte records. We want
> >> to store user bill for 6 months. Of course, user query focused on the
> >> latest month reports. But, the user to be queried doesn't have hot
> >> spot.
> >>
> >> We use CDH3U0 with 6 servers (each with 24G mem and 3 1T disk) for
> >> data node and region server (besides the ZK, namenode and hmaster
> >> servers). RS heap is 8G and DN is 12G. HFile max size is 1G. The
> >> block cache is 0.4.
> >>
> >> The row key is month+user_id. Each record is stored as a cell. So, a
> >> month report per user is a row in HBase.
> >>
> >> Currently, to store bill records, we can achieve about 30K record/second.
> >>
> >> However, the query performance is quite poor. We can only achieve
> >> about 600~700 month_report/second. That is, each region server can
> >> only serve query for about 100 row/second. Block cache hit ratio is
> >> about 20%.
> >>
> >> Do you have any advice on how to improve the query performance?
> >>
> >> Below is some metrics info reported by region server:
> >> 2011-04-26T10:56:12 hbase.regionserver:
> >> RegionServer=regionserver50820, blockCacheCount=40969,
> >> blockCacheEvictedCount=216359, blockCacheFree=671152504,
> >> blockCacheHitCachingRatio=20, blockCacheHitCount=67936,
> >> blockCacheHitRatio=20, blockCacheMissCount=257675,
> >> blockCacheSize=2743351688, compactionQueueSize=0,
> >> compactionSize_avg_time=0, compactionSize_num_ops=7,
> >> compactionTime_avg_time=0, compactionTime_num_ops=7, flushQueueSize=0,
> >> flushSize_avg_time=0, flushSize_num_ops=0, flushTime_avg_time=0,
> >> flushTime_num_ops=0, fsReadLatency_avg_time=46,
> >> fsReadLatency_num_ops=257905, fsSyncLatency_avg_time=0,
> >> fsSyncLatency_num_ops=1726, fsWriteLatency_avg_time=0,
> >> fsWriteLatency_num_ops=0, memstoreSizeMB=0, regions=169,
> >> requests=82.1, storefileIndexSizeMB=188, storefiles=343, stores=169
> >> 2011-04-26T10:56:22 hbase.regionserver:
> >> RegionServer=regionserver50820, blockCacheCount=42500,
> >> blockCacheEvictedCount=216359, blockCacheFree=569659040,
> >> blockCacheHitCachingRatio=20, blockCacheHitCount=68418,
> >> blockCacheHitRatio=20, blockCacheMissCount=259206,
> >> blockCacheSize=2844845152, compactionQueueSize=0,
> >> compactionSize_avg_time=0, compactionSize_num_ops=7,
> >> compactionTime_avg_time=0, compactionTime_num_ops=7, flushQueueSize=0,
> >> flushSize_avg_time=0, flushSize_num_ops=0, flushTime_avg_time=0,
> >> flushTime_num_ops=0, fsReadLatency_avg_time=44,
> >> fsReadLatency_num_ops=259547, fsSyncLatency_avg_time=0,
> >> fsSyncLatency_num_ops=1736, fsWriteLatency_avg_time=0,
> >> fsWriteLatency_num_ops=0, memstoreSizeMB=0, regions=169,
> >> requests=92.2, storefileIndexSizeMB=188, storefiles=343, stores=169
> >> 2011-04-26T10:56:32 hbase.regionserver:
> >> RegionServer=regionserver50820, blockCacheCount=39238,
> >> blockCacheEvictedCount=221509, blockCacheFree=785944072,
> >> blockCacheHitCachingRatio=20, blockCacheHitCount=69043,
> >> blockCacheHitRatio=20, blockCacheMissCount=261095,
> >> blockCacheSize=2628560120, compactionQueueSize=0,
> >> compactionSize_avg_time=0, compactionSize_num_ops=7,
> >> compactionTime_avg_time=0, compactionTime_num_ops=7, flushQueueSize=0,
> >> flushSize_avg_time=0, flushSize_num_ops=0, flushTime_avg_time=0,
> >> flushTime_num_ops=0, fsReadLatency_avg_time=39,
> >> fsReadLatency_num_ops=261070, fsSyncLatency_avg_time=0,
> >> fsSyncLatency_num_ops=1746, fsWriteLatency_avg_time=0,
> >> fsWriteLatency_num_ops=0, memstoreSizeMB=0, regions=169,
> >> requests=128.77777, storefileIndexSizeMB=188, storefiles=343,
> >> stores=169
> >>
> >>
> >> And we also tried to disable block cache, it seems the performance is
> >> even a little bit better. And if we use the configuration 6 DN servers
> >> + 3 RS servers, we can get better throughput at about 1000
> >> month_report/second. I am confused. Can anyone explain the reason?
> >>
> >> Thanks
> >> Weihua
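
A rough back-of-the-envelope check on the figures quoted above may help explain why the hit ratio sits near 20%. The sketch below compares the total block cache across the six region servers with the raw size of one month of bills. It rests on assumptions not stated in the thread: "0.6K-byte" is read as 0.6 KiB, queries are treated as uniformly spread over the latest month only, and per-cell storage overhead (row key, column name, timestamp) and index blocks are ignored, which means the real data set is larger than this estimate.

```python
KIB = 1024

# Figures taken from the original post.
users = 20_000_000
records_per_user_month = 10
record_bytes = 0.6 * KIB  # assumption: "0.6K-byte" means 0.6 KiB
region_servers = 6

# Raw payload for one month of bills, ignoring HBase per-cell overhead.
month_bytes = users * records_per_user_month * record_bytes

# Cache capacity per region server, read off the first metrics sample
# as blockCacheSize + blockCacheFree.
cache_per_rs = 2_743_351_688 + 671_152_504
total_cache = cache_per_rs * region_servers

# With uniformly random reads over one month, the cache can hold at most
# this fraction of the working set.
expected_hit_ratio = total_cache / month_bytes

# Hit ratio recomputed from the raw counters in the same sample.
hits, misses = 67_936, 257_675
observed_hit_ratio = hits / (hits + misses)

print(f"one month of raw data: {month_bytes / 1e9:.1f} GB")
print(f"total block cache:     {total_cache / 1e9:.1f} GB")
print(f"expected hit ratio:    {expected_hit_ratio:.1%}")
print(f"observed hit ratio:    {observed_hit_ratio:.1%}")
```

Under these assumptions the cache covers roughly a sixth of the latest month, which is in the same range as the observed ratio. That would suggest the cache is simply too small for a working set with no hot spot, so most reads go to disk regardless of tuning.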
