Thanks, I will try List<Get> later.
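
For reference, a minimal sketch of how the 20,000 row keys could be split into batches before each batch is sent as one multi-get. The class name RowKeyBatcher, the "row-" key format, and the batch size of 1000 are assumptions for illustration only; nothing here comes from the HBase API itself. With the HBase client on the classpath, each batch would be turned into a List<Get> and passed to the table's get(List<Get>) call, which returns one Result per Get.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: splits row keys into fixed-size batches.
final class RowKeyBatcher {

    // Returns consecutive sublists of at most batchSize elements, in order.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Stand-in row keys; real keys would come from the application.
        List<String> rowKeys = new ArrayList<>();
        for (int i = 0; i < 20000; i++) {
            rowKeys.add("row-" + i);
        }

        // Batch size is an arbitrary assumption; tune it against the cluster.
        List<List<String>> batches = partition(rowKeys, 1000);

        // Each batch would become a List<Get> (one Get per row key) and be
        // sent with a single get(List<Get>) call on the table.
        System.out.println(batches.size() + " batches");
    }
}
```

Batching keeps any single request from carrying all 20,000 Gets at once, while still letting the client group the Gets in each request by RegionServer.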

On Tue, Apr 15, 2014 at 3:39 AM, Doug Meil
<[email protected]> wrote:
>
> re: "my first version is using 20,000 Get"
>
> Just throwing this out there, but have you looked at multi-get?  Multi-get
> will group the gets by RegionServer internally.
>
> You are doing a lot of IO for a web-app so this is going to be tough to
> make "fast", but there are ways to make it "faster."
>
> But since you only have 1,000,000 rows you might not have many regions, so
> this might wind up all going on the same RegionServer.
>
>
>
>
> On 4/14/14, 7:52 AM, "Li Li" <[email protected]> wrote:
>
>>I need to get about 20,000 rows from the table. The table has about
>>1,000,000 rows.
>>My first version is using 20,000 Get and I found it's very slow. So I
>>modified it to a scan that filters out unrelated rows in the client.
>>Maybe I should write a coprocessor. By the way, is there any filter
>>available for this? Something like the SQL statement
>>where rowkey in ('abc', 'abd' ....), with a very long in list.
>>
>>On Mon, Apr 14, 2014 at 7:46 PM, Jean-Marc Spaggiari
>><[email protected]> wrote:
>>> Hi Li Li,
>>>
>>> If you have more than one region, it might be useful. MR will scan all the
>>> regions in parallel. If you do a full scan from a client API with no
>>> parallelism, then the MR job might be faster. But it will take more
>>> resources on the cluster and might impact the SLA of the other clients,
>>> if any.
>>>
>>> JM
>>>
>>>
>>> 2014-04-14 2:42 GMT-04:00 Mohammad Tariq <[email protected]>:
>>>
>>>> Well, it depends. Could you please provide some more details? It will
>>>> help us in giving a proper answer.
>>>>
>>>> Warm Regards,
>>>> Tariq
>>>> cloudfront.blogspot.com
>>>>
>>>>
>>>> On Mon, Apr 14, 2014 at 11:38 AM, Li Li <[email protected]> wrote:
>>>>
>>>> > I have a full table scan which takes about 10 minutes. It seems to be a
>>>> > bottleneck for our application. If we rewrite it with map-reduce, will
>>>> > it be faster?
>>>> >
>>>>
>
