Re: Scan vs map-reduce

Doug Meil Mon, 14 Apr 2014 12:40:40 -0700

re: "my first version is using 20,000 Get"

Just throwing this out there, but have you looked at multi-get?  Multi-get
will group the gets by RegionServer internally.


You are doing a lot of IO for a web-app so this is going to be tough to
make "fast", but there are ways to make it "faster."

But since you only have 1,000,000 rows you might not have many regions, so
the gets might all wind up going to the same RegionServer.
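
To make the grouping point concrete, here is a minimal sketch in plain Java
(no HBase dependency) of how a batch of row keys could be bucketed by the
server hosting each key's region. The region start keys and the server names
rs1/rs2 are made up for illustration, and the class and method names are
hypothetical. The real client does this internally when a list of Get objects
is passed to a single get call on the table, so treat this as a model of the
idea, not the actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: models bucketing a batch of gets by the server
// that hosts each row key's region. Not HBase client code.
class MultiGetGrouping {

    // Hypothetical region layout: region start key -> hosting server.
    // The first region starts at the empty key, so every row has a region.
    static TreeMap<String, String> exampleRegions() {
        TreeMap<String, String> regions = new TreeMap<>();
        regions.put("", "rs1");
        regions.put("h", "rs2");
        regions.put("p", "rs1");
        return regions;
    }

    // A row belongs to the region with the greatest start key <= row key.
    // String ordering stands in for HBase's byte ordering (same for ASCII).
    static String serverFor(TreeMap<String, String> regions, String rowKey) {
        return regions.floorEntry(rowKey).getValue();
    }

    // Bucket row keys per server so each server receives one batched request
    // instead of one round trip per row.
    static Map<String, List<String>> groupByServer(
            TreeMap<String, String> regions, List<String> rowKeys) {
        Map<String, List<String>> batches = new TreeMap<>();
        for (String row : rowKeys) {
            String server = serverFor(regions, row);
            batches.computeIfAbsent(server, k -> new ArrayList<>()).add(row);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("abc", "abd", "kiwi", "zebra");
        Map<String, List<String>> batches = groupByServer(exampleRegions(), rows);
        // Print how many gets each server would receive in its batch.
        batches.forEach((server, keys) ->
                System.out.println(server + " <- " + keys.size() + " gets"));
    }
}
```

With the sample layout, the four keys in main end up in two batches, one per
server. If the table had only one region, every key would fall into the same
bucket, which is the single-RegionServer case: you still save round trips, but
you get no parallelism across servers.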




On 4/14/14, 7:52 AM, "Li Li" <[email protected]> wrote:

>I need to get about 20,000 rows from the table. the table is about
>1,000,000 rows.
>my first version is using 20,000 Get and I found it's very slow. So I
>modified it to a scan and filter unrelated rows in the client.
>Maybe I should write a coprocessor. By the way, is there any filter
>available for this? Something like a SQL statement with a very long IN
>list: where rowkey in ('abc', 'abd' ....)
>
>On Mon, Apr 14, 2014 at 7:46 PM, Jean-Marc Spaggiari
><[email protected]> wrote:
>> Hi Li Li,
>>
>> If you have more than one region, might be useful. MR will scan all the
>> regions in parallel. If you do a full scan from a client API with no
>> parallelism, then the MR job might be faster. But it will take more
>> resources on the cluster and might impact the SLA of the other clients,
>> if any.
>>
>> JM
>>
>>
>> 2014-04-14 2:42 GMT-04:00 Mohammad Tariq <[email protected]>:
>>
>>> Well, it depends. Could you please provide some more details? It will
>>> help us in giving a proper answer.
>>>
>>> Warm Regards,
>>> Tariq
>>> cloudfront.blogspot.com
>>>
>>>
>>> On Mon, Apr 14, 2014 at 11:38 AM, Li Li <[email protected]> wrote:
>>>
>>> > I have a full table scan which takes about 10 minutes. It seems to be
>>> > a bottleneck for our application. If I rewrite it using map-reduce,
>>> > will it be faster?
>>> >
>>>
