2016-01-26 2:29 GMT+08:00 Henning Blohm <henning.bl...@zfabrik.de>:
> I am looking for advise on an HBase mass data access optimization problem.
>
For multi-get and multi-scan: in my opinion, multi-get (with a small number of rows) can work for realtime queries. Multi-scan may also work, but it easily keeps the region servers busy and can turn other small queries into slow ones. Multi-get latency will not be stable, though: when one of the regions involved is busy, the whole request takes longer.

For realtime versus offline: look at your real query results. When a result has very many rows, say 1 MB or 10 MB in total, its query time will not stay in the millisecond range because of the network transfer time. We must reduce the number of result rows, the result size, or the number of result columns; otherwise the query is not suitable for true realtime use. If you really need that many queries with results that large, I suggest running them offline and in parallel rather than in realtime, because the server's network throughput will not keep up either: a 1000 Mbit NIC carries at most about 125 MB/s, so at 2 MB per query a server handles only around 50 qps.

If the problem is only the query pattern (multi-scan and multi-get), I think we can spend storage to improve query performance: use an extra table with a different schema, which may mean writing every record twice. For example, one table with rowkey A_B_time and another with rowkey B_A_time. When querying for B%, we just query the table keyed by B_A_time, which is one small scan, and we do not need multiple scans against the table keyed by A_B_time.

Hope this is helpful.

--
Thanks & Regards,
李剑 Jameson Li
Focus on Hadoop, MySQL
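
P.S. To make the two-table idea concrete, here is a minimal sketch in plain Python. It does not use the HBase client at all: `SortedTable`, `write_event`, `by_a` and `by_b` are hypothetical names, and the sorted in-memory list only imitates how HBase keeps rows ordered by rowkey. It assumes the rowkey parts are joined with "_" and do not contain "_" themselves.

```python
from bisect import bisect_left, insort


class SortedTable:
    """Toy stand-in for an HBase table: rows are kept sorted by rowkey."""

    def __init__(self):
        self._keys = []   # sorted rowkeys
        self._rows = {}   # rowkey -> value

    def put(self, rowkey, value):
        if rowkey not in self._rows:
            insort(self._keys, rowkey)
        self._rows[rowkey] = value

    def prefix_scan(self, prefix):
        """Read one contiguous key range, like a single small scan."""
        start = bisect_left(self._keys, prefix)
        result = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # left the contiguous range for this prefix
            result.append((key, self._rows[key]))
        return result


by_a = SortedTable()  # rowkey: A_B_time
by_b = SortedTable()  # rowkey: B_A_time (the extra table)


def write_event(a, b, ts, value):
    """Write every record twice, once per rowkey layout."""
    by_a.put(f"{a}_{b}_{ts}", value)
    by_b.put(f"{b}_{a}_{ts}", value)


write_event("a1", "b1", "20160125", "v1")
write_event("a1", "b2", "20160125", "v2")
write_event("a2", "b1", "20160126", "v3")

# Query "B = b1": one prefix scan on the B_A_time table,
# instead of one scan per A value on the A_B_time table.
rows_for_b1 = by_b.prefix_scan("b1_")
rows_for_a1 = by_a.prefix_scan("a1_")
```

The cost is double the storage and a second write per record, and the two tables can briefly disagree if one of the writes fails, so whether this trade is worth it depends on how often you query by B.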