I need to get about 20,000 rows from a table of about 1,000,000 rows.
My first version issued 20,000 individual Gets, which I found very slow.
So I changed it to a Scan and filter out the unrelated rows on the client side.
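Before falling back to a full Scan, it may be worth batching the Gets: the client API accepts a list of Gets in a single call, which groups them by region server instead of making one round trip per row. A minimal sketch, assuming the 0.94-era client API; the table name "mytable" and the key-loading helper are placeholders, not anything from the thread:

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchGet {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // "mytable" is a placeholder for the real table name
        HTable table = new HTable(conf, "mytable");
        try {
            List<Get> gets = new ArrayList<String>().isEmpty()
                    ? new ArrayList<Get>() : new ArrayList<Get>();
            for (String key : loadKeys()) {          // the ~20,000 row keys
                gets.add(new Get(Bytes.toBytes(key)));
            }
            // One call: the client groups the Gets by region server,
            // so this costs a handful of RPCs rather than 20,000.
            Result[] results = table.get(gets);
            for (Result r : results) {
                if (r != null && !r.isEmpty()) {
                    // process the row
                }
            }
        } finally {
            table.close();
        }
    }

    // Placeholder for wherever the 20,000 keys actually come from
    static List<String> loadKeys() {
        List<String> keys = new ArrayList<String>();
        keys.add("abc");
        keys.add("abd");
        return keys;
    }
}
```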
Maybe I should write a coprocessor. By the way, is there any filter
available for me? Something like the SQL clause WHERE rowkey IN ('abc', 'abd',
....) -- a very long IN list.

On Mon, Apr 14, 2014 at 7:46 PM, Jean-Marc Spaggiari
<[email protected]> wrote:
> Hi Li Li,
>
> If you have more than one region, might be useful. MR will scan all the
> regions in parallel. If you do a full scan from a client API with no
> parallelism, then the MR job might be faster. But it will take more
> resources on the cluster and might impact the SLA of the other clients, if
> any,
>
> JM
>
>
> 2014-04-14 2:42 GMT-04:00 Mohammad Tariq <[email protected]>:
>
>> Well, it depends. Could you please provide some more details? It will help
>> us in giving a proper answer.
>>
>> Warm Regards,
>> Tariq
>> cloudfront.blogspot.com
>>
>>
>> On Mon, Apr 14, 2014 at 11:38 AM, Li Li <[email protected]> wrote:
>>
>> > I have a full table scan which cost about 10 minutes. it seems a
>> > bottleneck for our application. if use map-reduce to rewrite it. will
>> > it be faster?
>> >
>>
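For the "rowkey IN (...)" question: there is no single built-in IN filter in the client API of that era, but the same effect can be had by OR-ing together one RowFilter per key inside a FilterList with MUST_PASS_ONE semantics. A sketch under that assumption (the helper name and key array are illustrative, not from the thread):

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class InFilterScan {
    // Build a Scan whose filter behaves like "WHERE rowkey IN (keys)"
    static Scan buildInScan(String[] rowkeys) {
        // MUST_PASS_ONE = logical OR across the child filters
        FilterList in = new FilterList(FilterList.Operator.MUST_PASS_ONE);
        for (String key : rowkeys) {
            in.addFilter(new RowFilter(CompareFilter.CompareOp.EQUAL,
                    new BinaryComparator(Bytes.toBytes(key))));
        }
        Scan scan = new Scan();
        // Evaluated on the region servers, so non-matching rows are
        // dropped before they cross the network.
        scan.setFilter(in);
        return scan;
    }
}
```

One caveat: the region servers still read every row of the scan range to evaluate the filter, and with 20,000 keys the serialized FilterList itself is large, so a batched multi-get is often the better fit for this access pattern.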
