re: "my first version is using 20,000 Get"

Just throwing this out there, but have you looked at multi-get? Multi-get will group the gets by RegionServer internally.
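
To make the grouping idea concrete, here is a rough, self-contained sketch in plain Java. It is not HBase's actual client code: the region layout, the server names (rs1, rs2, rs3), and the class and method names are all made up for illustration. In the real Java client, as far as I know, a multi-get is just a list of Get objects passed to the table's get method in one call (for example HTable.get(List<Get>), which returns a Result[]), and the client does the per-server batching for you.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MultiGetGrouping {

    // Hypothetical region layout: region start key -> hosting server.
    // The empty string stands for the start key of the first region.
    static TreeMap<String, String> exampleRegions() {
        TreeMap<String, String> regions = new TreeMap<>();
        regions.put("", "rs1");
        regions.put("h", "rs2");
        regions.put("p", "rs3");
        return regions;
    }

    // Group row keys by the server that hosts the region containing each key.
    // Assumes ASCII keys, so String ordering matches byte ordering.
    static Map<String, List<String>> groupByServer(
            TreeMap<String, String> regions, List<String> rowKeys) {
        Map<String, List<String>> batches = new TreeMap<>();
        for (String key : rowKeys) {
            // floorEntry returns the region with the greatest start key <= row key.
            String server = regions.floorEntry(key).getValue();
            batches.computeIfAbsent(server, s -> new ArrayList<>()).add(key);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("abc", "abd", "kiwi", "zebra", "apple");
        Map<String, List<String>> batches = groupByServer(exampleRegions(), keys);
        // One batch (one round trip) per server instead of one per row.
        batches.forEach((server, batch) -> System.out.println(server + " <- " + batch));
    }
}
```

With this made-up layout the five keys collapse into three batches, one per server. If the table had only a single region, every key would land in the same batch, so the round trips are still saved but all the work falls on one server.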
You are doing a lot of IO for a web-app so this is going to be tough to make "fast", but there are ways to make it "faster." But since you only have 1,000,000 rows you might not have many regions, so this might wind up all going on the same RegionServer.

On 4/14/14, 7:52 AM, "Li Li" <[email protected]> wrote:

>I need to get about 20,000 rows from the table. the table is about
>1,000,000 rows.
>my first version is using 20,000 Get and I found it's very slow. So I
>modified it to a scan and filter unrelated rows in the client.
>maybe I should write a coprocessor. btw, is there any filter available
>for me? something like sql statement where rowkey in('abc', 'abd'
>....). a very long in statement
>
>On Mon, Apr 14, 2014 at 7:46 PM, Jean-Marc Spaggiari
><[email protected]> wrote:
>> Hi Li Li,
>>
>> If you have more than one region, might be useful. MR will scan all the
>> regions in parallel. If you do a full scan from a client API with no
>> parallelism, then the MR job might be faster. But it will take more
>> resources on the cluster and might impact the SLA of the other clients,
>> if any,
>>
>> JM
>>
>>
>> 2014-04-14 2:42 GMT-04:00 Mohammad Tariq <[email protected]>:
>>
>>> Well, it depends. Could you please provide some more details? It will
>>> help us in giving a proper answer.
>>>
>>> Warm Regards,
>>> Tariq
>>> cloudfront.blogspot.com
>>>
>>>
>>> On Mon, Apr 14, 2014 at 11:38 AM, Li Li <[email protected]> wrote:
>>>
>>> > I have a full table scan which cost about 10 minutes. it seems a
>>> > bottleneck for our application. if use map-reduce to rewrite it. will
>>> > it be faster?
>>> >
>>>
