thanks, I will try List<Get> later
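
A rough sketch of how I might batch the lookups, in case it helps anyone else. Everything here is an assumption on my side: the class name RowKeyBatches, the batch size of 1000, and the fetcher callback are made up for illustration. The fetcher stands in for the real call, which with the HBase client would build a List<Get> for each batch and pass it to table.get(gets). The code below does not touch HBase at all; it only shows the chunking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Sketch only: splits row keys into fixed-size batches so that each batch
// can be sent as one multi-get. The fetcher is a hypothetical stand-in; with
// the HBase client it would build a List<Get> and call table.get(gets).
final class RowKeyBatches {

    // Split keys into consecutive sublists of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += batchSize) {
            int end = Math.min(start + batchSize, keys.size());
            batches.add(new ArrayList<>(keys.subList(start, end)));
        }
        return batches;
    }

    // Run one fetch call per batch and concatenate the results in input order.
    static <K, R> List<R> fetchInBatches(List<K> keys, int batchSize,
                                         Function<List<K>, List<R>> fetcher) {
        List<R> results = new ArrayList<>(keys.size());
        for (List<K> batch : partition(keys, batchSize)) {
            results.addAll(fetcher.apply(batch));
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 20000; i++) {
            keys.add("row" + i);
        }
        // Fake fetcher that echoes the keys; a real one would do the multi-get.
        List<String> rows = fetchInBatches(keys, 1000, batch -> new ArrayList<>(batch));
        System.out.println(rows.size());
    }
}
```

With 20,000 keys and a batch size of 1000 this makes 20 fetch calls instead of 20,000 single Gets. Whether 1000 is a good batch size for our cluster is something I would still have to measure.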

On Tue, Apr 15, 2014 at 3:39 AM, Doug Meil <[email protected]> wrote:
>
> re: "my first version is using 20,000 Get"
>
> Just throwing this out there, but have you looked at multi-get? Multi-get
> will group the gets by RegionServer internally.
>
> You are doing a lot of IO for a web-app so this is going to be tough to
> make "fast", but there are ways to make it "faster."
>
> But since you only have 1,000,000 rows you might not have many regions, so
> this might wind up all going on the same RegionServer.
>
>
> On 4/14/14, 7:52 AM, "Li Li" <[email protected]> wrote:
>
>>I need to get about 20,000 rows from the table. the table is about
>>1,000,000 rows.
>>my first version is using 20,000 Get and I found it's very slow. So I
>>modified it to a scan and filter unrelated rows in the client.
>>maybe I should write a coprocessor. btw, is there any filter available
>>for me? something like sql statement where rowkey in('abc', 'abd'
>>....). a very long in statement
>>
>>On Mon, Apr 14, 2014 at 7:46 PM, Jean-Marc Spaggiari
>><[email protected]> wrote:
>>> Hi Li Li,
>>>
>>> If you have more than one region, might be useful. MR will scan all the
>>> regions in parallel. If you do a full scan from a client API with no
>>> parallelism, then the MR job might be faster. But it will take more
>>> resources on the cluster and might impact the SLA of the other clients,
>>> if any,
>>>
>>> JM
>>>
>>>
>>> 2014-04-14 2:42 GMT-04:00 Mohammad Tariq <[email protected]>:
>>>
>>>> Well, it depends. Could you please provide some more details? It will
>>>> help us in giving a proper answer.
>>>>
>>>> Warm Regards,
>>>> Tariq
>>>> cloudfront.blogspot.com
>>>>
>>>>
>>>> On Mon, Apr 14, 2014 at 11:38 AM, Li Li <[email protected]> wrote:
>>>>
>>>> > I have a full table scan which cost about 10 minutes. it seems a
>>>> > bottleneck for our application. if use map-reduce to rewrite it. will
>>>> > it be faster?
>>>> >
