I need to get about 20,000 rows from a table of about 1,000,000 rows.
My first version issued 20,000 individual Gets, which I found very slow.
So I changed it to a Scan and filter out the unrelated rows on the client side.
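Before falling back to a full Scan, it may be worth batching the Gets: the client API accepts a list of Gets in a single call, which groups them by region server instead of making one round trip per row. A minimal sketch, assuming the 0.94-era client API; the table name "mytable" and the key-loading helper are placeholders, not anything from the thread:

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchGet {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // "mytable" is a placeholder for the real table name
        HTable table = new HTable(conf, "mytable");
        try {
            List<Get> gets = new ArrayList<String>().isEmpty()
                    ? new ArrayList<Get>() : new ArrayList<Get>();
            for (String key : loadKeys()) {          // the ~20,000 row keys
                gets.add(new Get(Bytes.toBytes(key)));
            }
            // One call: the client groups the Gets by region server,
            // so this costs a handful of RPCs rather than 20,000.
            Result[] results = table.get(gets);
            for (Result r : results) {
                if (r != null && !r.isEmpty()) {
                    // process the row
                }
            }
        } finally {
            table.close();
        }
    }

    // Placeholder for wherever the 20,000 keys actually come from
    static List<String> loadKeys() {
        List<String> keys = new ArrayList<String>();
        keys.add("abc");
        keys.add("abd");
        return keys;
    }
}
```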
Maybe I should write a coprocessor. By the way, is there any filter
available for me? Something like the SQL clause WHERE rowkey IN ('abc', 'abd',
....) -- a very long IN list.

On Mon, Apr 14, 2014 at 7:46 PM, Jean-Marc Spaggiari
<[email protected]> wrote:
> Hi Li Li,
>
> If you have more than one region, might be useful. MR will scan all the
> regions in parallel. If you do a full scan from a client API with no
> parallelism, then the MR job might be faster. But it will take more
> resources on the cluster and might impact the SLA of the other clients, if
> any,
>
> JM
>
>
> 2014-04-14 2:42 GMT-04:00 Mohammad Tariq <[email protected]>:
>
>> Well, it depends. Could you please provide some more details? It will help
>> us in giving a proper answer.
>>
>> Warm Regards,
>> Tariq
>> cloudfront.blogspot.com
>>
>>
>> On Mon, Apr 14, 2014 at 11:38 AM, Li Li <[email protected]> wrote:
>>
>> > I have a full table scan which cost about 10 minutes. it seems a
>> > bottleneck for our application. if use map-reduce to rewrite it. will
>> > it be faster?
>> >
>>
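For the "rowkey IN (...)" question: there is no single built-in IN filter in the client API of that era, but the same effect can be had by OR-ing together one RowFilter per key inside a FilterList with MUST_PASS_ONE semantics. A sketch under that assumption (the helper name and key array are illustrative, not from the thread):

```java
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class InFilterScan {
    // Build a Scan whose filter behaves like "WHERE rowkey IN (keys)"
    static Scan buildInScan(String[] rowkeys) {
        // MUST_PASS_ONE = logical OR across the child filters
        FilterList in = new FilterList(FilterList.Operator.MUST_PASS_ONE);
        for (String key : rowkeys) {
            in.addFilter(new RowFilter(CompareFilter.CompareOp.EQUAL,
                    new BinaryComparator(Bytes.toBytes(key))));
        }
        Scan scan = new Scan();
        // Evaluated on the region servers, so non-matching rows are
        // dropped before they cross the network.
        scan.setFilter(in);
        return scan;
    }
}
```

One caveat: the region servers still read every row of the scan range to evaluate the filter, and with 20,000 keys the serialized FilterList itself is large, so a batched multi-get is often the better fit for this access pattern.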
