2016-01-26 2:29 GMT+08:00 Henning Blohm <henning.bl...@zfabrik.de>:

> I am looking for advice on an HBase mass data access optimization problem.
>

For multi-get vs. multi-scan:
In my opinion, multi-get (reading fewer rows) can work for real-time
queries. Multi-scan may also work, but it easily keeps the servers busy,
and that can turn other small queries into slow ones.
On the other hand, multi-get latency is not stable: when any one of the
regions involved is busy, the time for the whole request goes up.
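To illustrate why one busy region slows down the whole multi-get, here is a
small simulation. It is a simplified sketch, not the HBase client: it assumes
the batch (e.g. Table.get(List<Get>) in the Java client) is split by region and
the per-region parts run in parallel, so the caller waits for the slowest
part. The region boundaries and latencies are made-up illustration values.

```python
# Simplified model: a multi-get is split per region, the sub-requests run in
# parallel, and the caller waits for the slowest one. Region start keys and
# latencies used below are hypothetical, not measurements.
from bisect import bisect_right
from collections import defaultdict

def group_by_region(row_keys, region_start_keys):
    """Map region index -> row keys it serves.

    region_start_keys must be sorted; region i covers
    [region_start_keys[i], region_start_keys[i + 1]).
    """
    groups = defaultdict(list)
    for key in row_keys:
        region = bisect_right(region_start_keys, key) - 1
        groups[region].append(key)
    return dict(groups)

def multi_get_latency_ms(row_keys, region_start_keys, region_latency_ms):
    """Estimated wall-clock latency: the slowest region touched by the batch."""
    groups = group_by_region(row_keys, region_start_keys)
    return max(region_latency_ms[r] for r in groups)

regions = ["", "g", "n", "t"]   # hypothetical region start keys
latency = [5, 5, 200, 5]        # region 2 ("n".."t") is busy

keys = ["apple", "kiwi", "zebra"]
print(multi_get_latency_ms(keys, regions, latency))
print(multi_get_latency_ms(keys + ["pear"], regions, latency))
```

Adding a single key ("pear") that lands on the busy region raises the whole
request from 5 ms to 200 ms, even though the other three gets are unchanged.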

For real-time vs. offline:
Look at your actual query results. When a result is large, e.g. 1 MB or
10 MB, the query time will not stay in the milliseconds range, because of
the network transfer time. We must reduce the number of result rows, the
result size, or the number of result columns; otherwise the workload is not
suited to truly real-time queries.
If you really need that many queries with such large results, I suggest
processing them offline and in parallel rather than in real time, because
the server's network throughput will also become the bottleneck (with a
1000 Mbit NIC and 2 MB per query, one server can only handle about 50 qps).
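The 50 qps figure can be checked with back-of-the-envelope arithmetic. This
sketch only counts NIC bandwidth; the 0.8 efficiency factor (protocol
overhead, other traffic on the link) is an assumption chosen for
illustration, so measure your own cluster for real numbers.

```python
# NIC-bandwidth bound on queries/second. The efficiency default of 0.8 is an
# illustrative assumption, not a measured HBase value.

def transfer_ms(result_mb, nic_mbit):
    """Milliseconds to push one result over the NIC, ignoring all other costs."""
    # Mbit/s -> MB/s is a division by 8; seconds -> ms is a multiplication by 1000.
    return result_mb * 8 * 1000 / nic_mbit

def max_qps(result_mb, nic_mbit, efficiency=0.8):
    """Upper bound on queries/second for one server, limited only by the NIC."""
    return (nic_mbit / 8) * efficiency / result_mb

print(transfer_ms(2, 1000))              # per-query transfer time for 2 MB
print(max_qps(2, 1000, efficiency=1.0))  # theoretical ceiling
print(max_qps(2, 1000))                  # with the assumed overhead
```

A 1000 Mbit NIC moves about 125 MB/s, so a 2 MB result costs about 16 ms on
the wire alone; that gives 62.5 qps as a hard ceiling and roughly 50 qps once
some overhead is allowed for.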

If the issue is only the query pattern (multi-scan vs. multi-get), I think we
can trade storage for query performance: use an extra table (which means
writing every row twice) with a different schema. For example, one table
has the rowkey A_B_time and another has B_A_time. When querying B%, we just
do one small scan on the B_A_time table, instead of running multiple scans
against the A_B_time table.
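A minimal sketch of the "write twice, read once" idea, assuming two in-memory
sorted key lists stand in for the two HBase tables (HBase keeps rows sorted by
rowkey, so a prefix scan reads one contiguous range). SortedTable,
write_event, and the sample values are hypothetical names for illustration.

```python
# Two sorted key lists stand in for the A_B_time and B_A_time tables.
# All names and sample data here are hypothetical.
from bisect import bisect_left, insort

class SortedTable:
    def __init__(self):
        self.keys = []

    def put(self, rowkey):
        insort(self.keys, rowkey)  # keep rowkeys sorted, as HBase does

    def prefix_scan(self, prefix):
        """Return all rowkeys starting with prefix (one contiguous range)."""
        start = bisect_left(self.keys, prefix)
        out = []
        for key in self.keys[start:]:
            if not key.startswith(prefix):
                break
            out.append(key)
        return out

ab_table = SortedTable()  # rowkey A_B_time
ba_table = SortedTable()  # rowkey B_A_time

def write_event(a, b, ts):
    # Every event is written twice, once per rowkey layout.
    ab_table.put(f"{a}_{b}_{ts}")
    ba_table.put(f"{b}_{a}_{ts}")

for a, b, ts in [("a1", "b1", "001"), ("a2", "b1", "002"),
                 ("a1", "b2", "003"), ("a3", "b1", "004")]:
    write_event(a, b, ts)

# Query "all rows with B = b1": one small scan on the B_A_time table ...
print(ba_table.prefix_scan("b1_"))
# ... versus one scan per A value on the A_B_time table.
print([k for a in ("a1", "a2", "a3") for k in ab_table.prefix_scan(f"{a}_b1_")])
```

Both queries return the same three events, but the second one needs one scan
per A value and only works if you already know every A. The "_" separator is
a simplification; if field values can contain it, fixed-width or encoded key
parts are safer.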

Hope this helps.




-- 


Thanks & Regards,
李剑 Jameson Li
Focus on Hadoop, MySQL
