Are the regions scanned in parallel?

On Friday, January 3, 2014, rajeshbabu chintaguntla wrote:
>
> Here are some performance numbers with RLI.
>
> No. of region servers: 4
> Data per region: 2 GB
>
> Regions/RS | Total regions | Block size (KB) | Rows matching values | Time taken (sec)
> 50         | 200           | 64              | 199                  | 102
> 50         | 200           | 8               | 199                  | 35
> 100        | 400           | 8               | 350                  | 95
> 200        | 800           | 8               | 353                  | 153
>
> Without a secondary index, the scan takes hours.
>
> Thanks,
> Rajeshbabu
> ________________________________________
> From: Anoop John [[email protected]]
> Sent: Friday, January 03, 2014 3:22 PM
> To: [email protected]
> Subject: Re: secondary index feature
>
> > Is there any data on how RLI (or in particular Phoenix) query throughput
> > correlates with the number of region servers assuming homogeneously
> > distributed data?
>
> Phoenix has yet to add RLI; for now it has global indexing only. Correct,
> James?
>
> The RLI implementation from Huawei (HIndex) has some numbers with respect to
> regions, but I doubt whether they cover a large number of RSs. Do you have
> some data, Rajesh Babu?
>
> -Anoop-
>
> On Fri, Jan 3, 2014 at 3:11 PM, Henning Blohm <[email protected]> wrote:
>
> > Jesse, James, Lars,
> >
> > after looking around a bit and in particular looking into Phoenix (which I
> > find very interesting), assuming that you want secondary indexing on
> > HBase without adding other infrastructure, there seems to be not a lot of
> > choice really: Either go with a region-level (and co-processor based)
> > indexing feature (Phoenix, Huawei, is IHBase dead?) or add an index table
> > to store (index value, entity key) pairs.
> >
> > The main concern I have with region-level indexing (RLI) is that Gets
> > potentially need to visit all regions. Compared to global index tables
> > this seems to flatten the read-scalability curve of the cluster. In our
> > case, we have a large data set (hence HBase) that will be queried (mostly
> > point-gets via an index) in some linear correlation with its size.
> >
> > Is there any data on how RLI (or in particular Phoenix) query throughput
> > correlates with the number of region servers assuming homogeneously
> > distributed data?
> >
> > Thanks,
> > Henning
> >
> > On 24.12.2013 12:18, Henning Blohm wrote:
> >
> >> All that sounds very promising. I will give it a try and let you know
> >> how things worked out.
> >>
> >> Thanks,
> >> Henning
> >>
> >> On 12/23/2013 08:10 PM, Jesse Yates wrote:
> >>
> >>> The work that James is referencing grew out of the discussions Lars and I
> >>> had (which led to those blog posts). The solution we implemented is
> >>> designed to be generic, as James mentioned above, but was written with all
> >>> the hooks necessary for Phoenix to do some really fast updates (or skipping
> >>> updates in the case where there is no change).
> >>>
> >>> You should be able to plug in your own simple index builder (there is an
> >>> example in the phoenix codebase
> >>> <https://github.com/forcedotcom/phoenix/tree/master/src/main/java/com/salesforce/hbase/index/covered/example>)
> >>> to get a basic solution which supports the same transactional guarantees as
> >>> HBase (per row) + data guarantees across the index rows. There are more
> >>> details in the presentations James linked.
> >>>
> >>> I'd love to see if your implementation can fit into the framework we
> >>> wrote - we would be happy to work to see if it needs some more hooks or
> >>> modifications - I have a feeling this is pretty much what you guys will
> >>> need
> >>>
> >>> -Jesse
> >>>
> >>> On Mon, Dec 23, 2013 at 10:01 AM, James Taylor <[email protected]> wrote:
> >>>
> >>> Henning,
> >>>> Jesse Yates wrote the back-end of our global secondary indexing system in
> >>>> Phoenix. He designed it as a separate, pluggable module with no Phoenix
> >>>> dependencies. Here's an overview of the feature:
> >>>> https://github.com/forcedotcom/phoenix/wiki/Secondary-Indexing.
> >>>> The section that discusses the data guarantees and failure management
> >>>> might be of interest to you:
> >>>> https://github.com/forcedotcom/phoenix/wiki/Secondary-Indexing#data-guarantees-and-failure-management
> >>>>
> >>>> This presentation also gives a good overview of the pluggability of his
