Re: HBase - Secondary Index

Shengjie Min Thu, 27 Dec 2012 08:29:40 -0800

>Didn't follow you completely here. There won't be any get() happening. Since
>we get the exact rowkey in a region from the index table, we can seek to the
>exact position and return that row.

Sorry, I misused "get()" here; I meant seeking. Yes, if only a small
number of rows is returned, this works perfectly. As you said, you get
the exact rowkey positions per region and simply seek to them. I was trying
to work out the case where the number of result rows increases
massively. In Anil's case, he wants to run a scan query against the
secondary index (timestamp): "select all rows from timestamp1 to timestamp2",
with no customerId provided. Within that time period, he might have a big
chunk of rows from different customerIds. The index table returns a lot of
rowkey positions for different customerIds (I believe they are scattered
across different regions), so you end up seeking to many different positions
in different regions to return all the rows needed. According to page 14 of
your presentation, Performance Test Results (Scan), without the index the
time increases linearly with the number of result rows. With the index, on
the other hand, the time spent climbs much more quickly than without it.
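
To make the concern concrete, here is a toy sketch in Python. It is not
HBase code: the region start keys, the row keys, and the names region_for
and seeks_for_range are all made up for illustration. It only shows how a
time-range scan over the index can fan out into seeks spread over every
region of the main table.

```python
from bisect import bisect_right

# Hypothetical region start keys for the main table.
# Main-table rowkeys are assumed to begin with the customerId.
REGION_START_KEYS = ["", "cust3", "cust6"]

def region_for(rowkey):
    """Return the index of the region whose key range contains rowkey."""
    return bisect_right(REGION_START_KEYS, rowkey) - 1

# Toy index table: (timestamp, main-table rowkey) pairs, sorted by timestamp.
INDEX = sorted([
    (100, "cust1_a"), (101, "cust7_a"), (102, "cust4_a"),
    (103, "cust1_b"), (104, "cust8_a"), (105, "cust5_a"),
])

def seeks_for_range(ts_start, ts_end):
    """Scan the index for [ts_start, ts_end] and group the seeks by region."""
    seeks = {}
    for ts, rowkey in INDEX:
        if ts_start <= ts <= ts_end:
            seeks.setdefault(region_for(rowkey), []).append(rowkey)
    return seeks

result = seeks_for_range(100, 105)
for region, rowkeys in sorted(result.items()):
    print(f"region {region}: {len(rowkeys)} seeks -> {rowkeys}")
```

In this toy data the six index hits are contiguous in the index table, yet
they land in all three regions of the main table, which is the scattered
seek pattern I am worried about when the result set grows.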

btw, quick question: in your presentation, is the scale there seconds or
milliseconds? :)

- Shengjie

On 27 December 2012 15:54, Anoop John <[email protected]> wrote:

> > how the massive number of get() is going to
> > perform against the main table
>
> Didn't follow you completely here. There won't be any get() happening. Since
> we get the exact rowkey in a region from the index table, we can seek to the
> exact position and return that row.
>
> -Anoop-
>
> On Thu, Dec 27, 2012 at 6:37 PM, Shengjie Min <[email protected]>
> wrote:
>
> > how the massive number of get() is going to
> > perform against the main table
> >
>

-- 
All the best,
Shengjie Min
