Ron - Salting is only recommended when your primary key is monotonically increasing. It's mainly used to prevent write hotspotting. Also, I think Ron forgot to mention, but I was working with him a bit earlier on this, and I couldn't repro the issue either (in current 4.x or in 4.7 release). Here's the unit test I put together which hints a non covered global index:
@Test public void testIndexHintWithNonCoveredColumnSelected() throws Exception { String schemaName = ""; String dataTableName = "T_FOO"; String indexTableName = "I_FOO"; String dataTableFullName = SchemaUtil.getTableName(schemaName, dataTableName); String indexTableFullName = SchemaUtil.getTableName(schemaName, indexTableName); try (Connection conn = DriverManager.getConnection(getUrl())) { conn.createStatement().execute("CREATE TABLE " + dataTableFullName + "(k INTEGER PRIMARY KEY, v1 VARCHAR, v2 VARCHAR)"); conn.createStatement().execute("CREATE INDEX " + indexTableName + " ON " + dataTableName + "(v1)"); PhoenixStatement stmt = conn.createStatement().unwrap(PhoenixStatement.class); QueryPlan plan; stmt.executeQuery("SELECT v1, v2 " + " FROM " + dataTableFullName + " WHERE v1='a'"); plan = stmt.getQueryPlan(); assertEquals("Expected (" + dataTableFullName + ") but was " + plan.getSourceRefs(), 1, plan.getSourceRefs().size()); stmt.executeQuery("SELECT /*+ INDEX(" + dataTableFullName + " " + indexTableName + ") */ v1, v2" + " FROM " + dataTableFullName + " WHERE v1='a'"); plan = stmt.getQueryPlan(); assertEquals("Expected (" + dataTableFullName + "," + indexTableFullName + ") but was " + plan.getSourceRefs(), 2, plan.getSourceRefs().size()); } } On Fri, Apr 20, 2018 at 1:11 PM, Taylor, Ronald (Ronald) < ronald.tay...@cchmc.org> wrote: > Hello Sergey, > > > > Per your request, here are the commands that I used to create the table > and its indexes. Hopefully you can find something in here that provides a > guide as to what we are doing wrong. > > > > BTW – as I said, we are novices with Phoenix here. One thing we are doing > is playing around a bit with salting numbers. We believed that the data in > our test table was small enough to fit on one region server ( < 10 GB), so > we used a high salt number (24) to try to force HBase to use more than one > region server, to parallelize over more than one node. Did we get that > concept right? > > > > Ron > > > > %%%%%%%%%%%%%%%%%%%%%%%%%%%%% > > > CREATE TABLE variantjoin_RT_salted24 ( > chrom VARCHAR, > genomic_range VARCHAR, > reference VARCHAR, > alternate VARCHAR, > annotations VARCHAR, > consequence VARCHAR, > chrom_int INTEGER, > onekg_maf DOUBLE, > coding VARCHAR, > esp_aa_maf DOUBLE, > esp_ea_maf DOUBLE, > exac_maf DOUBLE, > filter VARCHAR, > gene VARCHAR, > impact VARCHAR, > polyphen VARCHAR, > sift VARCHAR, > viva_maf DOUBLE, > variant_id INTEGER PRIMARY KEY, > genomic_range_start INTEGER, > genomic_range_end INTEGER > ) SALT_BUCKETS = 24, IMMUTABLE_ROWS=false; > > > > > > 0: jdbc:phoenix:> !describe variantjoin_RTsalted24 > > > 0: jdbc:phoenix:> !describe variantjoin_RTsalted24 > +------------+--------------+-------------------------+----- > -----------------+-+ > | TABLE_CAT | TABLE_SCHEM | TABLE_NAME | > COLUMN_NAME | | > +------------+--------------+-------------------------+----- > -----------------+-+ > | | | VARIANTJOIN_RTSALTED24 | > CHROM | | > | | | VARIANTJOIN_RTSALTED24 | > GENOMIC_RANGE | | > | | | VARIANTJOIN_RTSALTED24 | > REFERENCE | | > | | | VARIANTJOIN_RTSALTED24 | > ALTERNATE | | > | | | VARIANTJOIN_RTSALTED24 | > ANNOTATIONS | | > | | | VARIANTJOIN_RTSALTED24 | > CONSEQUENCE | | > | | | VARIANTJOIN_RTSALTED24 | > CHROM_INT | | > | | | VARIANTJOIN_RTSALTED24 | > ONEKG_MAF | | > | | | VARIANTJOIN_RTSALTED24 | > CODING | | > | | | VARIANTJOIN_RTSALTED24 | > ESP_AA_MAF | | > | | | VARIANTJOIN_RTSALTED24 | > ESP_EA_MAF | | > | | | VARIANTJOIN_RTSALTED24 | > EXAC_MAF | | > | | | VARIANTJOIN_RTSALTED24 | > FILTER | | > | | | VARIANTJOIN_RTSALTED24 | > GENE | | > | | | VARIANTJOIN_RTSALTED24 | > IMPACT | | > | | | VARIANTJOIN_RTSALTED24 | > POLYPHEN | | > | | | VARIANTJOIN_RTSALTED24 | > SIFT | | > | | | VARIANTJOIN_RTSALTED24 | > VIVA_MAF | | > | | | VARIANTJOIN_RTSALTED24 | > VARIANT_ID | | > | | | VARIANTJOIN_RTSALTED24 | > GENOMIC_RANGE_START | | > | | | VARIANTJOIN_RTSALTED24 | > GENOMIC_RANGE_END | | > +------------+--------------+-------------------------+----- > -----------------+-+ > 0: jdbc:phoenix:> > > > > > > 0: jdbc:phoenix:> create index vj2_chrom on variantjoin_RTsalted24 (chrom); > create index vj2_chrom on variantjoin_RTsalted24 (chrom); > No rows affected (13.322 seconds) > 0: jdbc:phoenix:> > > reference - no index at present > alternate - no index at present > annotations - JSONB blob no index at present > > > 0: jdbc:phoenix:> create index vj2_genomic_range on variantjoin_RTsalted24 > (genomic_range); > create index vj2_genomic_range on variantjoin_RTsalted24 (genomic_range); > No rows affected (11.953 seconds) > 0: jdbc:phoenix:> > > 0: jdbc:phoenix:> create index vj2_chrom_int on variantjoin_RTsalted24 > (chrom_int); > create index vj2_chrom_int on variantjoin_RTsalted24 (chrom_int); > No rows affected (12.518 seconds) > 0: jdbc:phoenix:> > > 0: jdbc:phoenix:> create index vj2_onekg_maf on variantjoin_RTsalted24 > (onekg_maf); > create index vj2_onekg_maf on variantjoin_RTsalted24 (onekg_maf); > No rows affected (13.727 seconds) > 0: jdbc:phoenix:> > > 0: jdbc:phoenix:> create index vj2_coding_maf on variantjoin_RTsalted24 > (onekg_maf); > create index vj2_coding_maf on variantjoin_RTsalted24 (onekg_maf); > No rows affected (12.45 seconds) > 0: jdbc:phoenix:> > > 0: jdbc:phoenix:> create index vj2_consequence on variantjoin_RTsalted24 > (consequence); > create index vj2_consequence on variantjoin_RTsalted24 (consequence); > No rows affected (13.906 seconds) > 0: jdbc:phoenix:> > > 0: jdbc:phoenix:> create index vj2_esp_aa_maf on variantjoin_RTsalted24 > (esp_aa_maf); > create index vj2_esp_ea_maf on variantjoin_RTsalted24 (esp_ea_maf); > No rows affected (12.355 seconds) > 0: jdbc:phoenix:> > > 0: jdbc:phoenix:> create index vj2_exac_maf on variantjoin_RTsalted24 > (exac_maf); > create index vj2_exac_maf on variantjoin_RTsalted24 (exac_maf); > No rows affected (12.146 seconds) > 0: jdbc:phoenix:> > > 0: jdbc:phoenix:> create index vj2_filter on variantjoin_RTsalted24 > (filter); > create index vj2_filter on variantjoin_RTsalted24 (filter); > No rows affected (12.178 seconds) > 0: jdbc:phoenix:> > > 0: jdbc:phoenix:> create index vj2_gene on variantjoin_RTsalted24 (gene); > create index vj2_gene on variantjoin_RTsalted24 (gene); > No rows affected (12.375 seconds) > 0: jdbc:phoenix:> > > 0: jdbc:phoenix:> create index vj2_impact on variantjoin_RTsalted24 > (impact); > create index vj2_impact on variantjoin_RTsalted24 (impact); > No rows affected (14.806 seconds) > 0: jdbc:phoenix:> > > 0: jdbc:phoenix:> create index vj2_polyphen on variantjoin_RTsalted24 > (polyphen); > create index vj2_polyphen on variantjoin_RTsalted24 (polyphen); > No rows affected (13.218 seconds) > 0: jdbc:phoenix:> > > 0: jdbc:phoenix:> create index vj2_sift on variantjoin_RTsalted24 (sift); > create index vj2_sift on variantjoin_RTsalted24 (sift); > No rows affected (12.12 seconds) > 0: jdbc:phoenix:> > > 0: jdbc:phoenix:> create index vj2_viva_maf on variantjoin_RTsalted24 > (viva_maf); > create index vj2_viva_maf on variantjoin_RTsalted24 (viva_maf); > No rows affected (12.032 seconds) > 0: jdbc:phoenix:> > > 0: jdbc:phoenix:> create index vj2_genomic_range_start on > variantjoin_RTsalted24 (genomic_range_start); > create index vj2_genomic_range_start on variantjoin_RTsalted24 > (genomic_range_start); > No rows affected (12.682 seconds) > 0: jdbc:phoenix:> > > 0: jdbc:phoenix:> create index vj2_genomic_range_end on > variantjoin_RTsalted24 (genomic_range_end); > create index vj2_genomic_range_end on variantjoin_RTsalted24 > (genomic_range_end); > No rows affected (13.112 seconds) > 0: jdbc:phoenix:> > > > > %%%%%%%%%%%%%%%%%%%%%%% > > > > Ronald C. Taylor, Ph.D. > Divisions of Immunobiology and Biomedical Informatics > > Cincinnati Children's Hospital Medical Center > > Office phone: 513-803-4880 > > Cell phone: 509-783-7308 > > Email: ronald.tay...@cchmc.org > > > > > > *From: *<sergey.solda...@gmail.com> on behalf of Sergey Soldatov < > sergeysolda...@gmail.com> > *Reply-To: *"user@phoenix.apache.org" <user@phoenix.apache.org> > *Date: *Thursday, April 19, 2018 at 5:19 PM > *To: *"user@phoenix.apache.org" <user@phoenix.apache.org> > *Subject: *Re: hint to use a global index is not working - need to find > out why > > > > That looks strange. Could you please provide full DDLs for table and > indexes? I just tried a similar scenario and obviously index is used: > > > > 0: jdbc:phoenix:> create table VARIANTJOIN_RTSALTED24 (id integer primary > key, chrom_int integer, genomic_range integer); > > No rows affected (6.339 seconds) > > 0: jdbc:phoenix:> create index jv2_chrom_int on VARIANTJOIN_RTSALTED24 > (chrom_int); > > No rows affected (10.016 seconds) > > 0: jdbc:phoenix:> explain SELECT/*+ INDEX(VJ jv2_chrom_int) */ > VJ.chrom_int, genomic_range FROM VARIANTJOIN_RTSALTED24 as VJ WHERE > (chrom_int =18 ) limit 5; > > +----------------------------------------------------------- > ----------------------------+ > > | PLAN > | > > +----------------------------------------------------------- > ----------------------------+ > > | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER > VARIANTJOIN_RTSALTED24 | > > | CLIENT 5 ROW LIMIT > | > > | SKIP-SCAN-JOIN TABLE 0 > | > > | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER > JV2_CHROM_INT [18] | > > | SERVER FILTER BY FIRST KEY ONLY > | > > | DYNAMIC SERVER FILTER BY "VJ.ID" IN ($2.$4) > | > > | JOIN-SCANNER 5 ROW LIMIT > | > > +----------------------------------------------------------- > ----------------------------+ > > 7 rows selected (0.936 seconds) > > > > > > Thanks, > > Sergey > > > > On Thu, Apr 19, 2018 at 7:31 PM, Taylor, Ronald (Ronald) < > ronald.tay...@cchmc.org> wrote: > > Hello Phoenix users, > > I am a novice Phoenix user and this is my first post to this user list. I > did some searching in the list archives, but could not find an answer to > what I hope is a simple question: my global index is being ignored, even > after I add a Hint, and I want to know why. > > We are using Phoenix 4.7 in the Hortonworks distribution. Looks like > Hortonworks has been backporting at least some phoenix updates into their > version of phoenix 4.7, so I guess it is a custom distribution. See > > > > https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_ > release-notes/content/patch_phoenix.html > > > > I have created a simple table of about 8 million rows, and about 15 > columns, with several fields having global indexes. I created the main > table (variantjoin_rtsalted24) and its indexes, and then used a bulk loader > to populate them from a tab-delimited file. That appeared to work fine. > > chrom_int is one field on which there is a global index, named > vj2_chrom_int. And you can see the index being automatically being used > below, where it is the only field being returned. Time required is 0.124 > sec. > > 0: jdbc:phoenix:> SELECT VJ.chrom_int FROM VARIANTJOIN_RTSALTED24 as VJ > WHERE (chrom_int =18 ) limit 5; > > +------------+ > > | CHROM_INT | > > +------------+ > > | 18 | > > | 18 | > > | 18 | > > | 18 | > > | 18 | > > +------------+ > > 5 rows selected (0.124 seconds) > > 0: jdbc:phoenix:> > > You can see that the vj2_chrom_int index is automatically being used, as I > understand things by the "RANGE SCAN" wording and "[0,1" in the explain > plan: > > 0: jdbc:phoenix:> explain SELECT VJ.chrom_int FROM VARIANTJOIN_RTSALTED24 > as VJ WHERE (chrom_int =18 ) limit 5; > > +----------------------------------------------------------- > -------------------+ > > | PLAN > | > > +----------------------------------------------------------- > -------------------+ > > | CLIENT 24-CHUNK SERIAL 24-WAY ROUND ROBIN RANGE SCAN OVER VJ2_CHROM_INT > [0,1 | > > | SERVER FILTER BY FIRST KEY ONLY > | > > | SERVER 5 ROW LIMIT > | > > | CLIENT 5 ROW LIMIT > | > > +----------------------------------------------------------- > -------------------+ > > 4 rows selected (0.043 seconds) > > 0: jdbc:phoenix:> > > > I can use a Hint to tell Phoenix to NOT use this index, as seen below. And > that increases the time needed to 1.97 sec, over an order of magnitude more > time than the 0.124 sec required with index use. > > 0: jdbc:phoenix:> SELECT /*+ NO_INDEX */ VJ.chrom_int FROM > VARIANTJOIN_RTSALTED24 as VJ WHERE (chrom_int =18 ) limit 5; > > +------------+ > > | CHROM_INT | > > +------------+ > > | 18 | > > | 18 | > > | 18 | > > | 18 | > > | 18 | > > +------------+ > > 5 rows selected (1.977 seconds) > > 0: jdbc:phoenix:> > > And here is the explain plan for that: > > > 0: jdbc:phoenix:> explain SELECT /*+ NO_INDEX */ VJ.chrom_int FROM > VARIANTJOIN_RTSALTED24 as VJ WHERE (chrom_int =18 ) limit 5; > > +----------------------------------------------------------- > -------------------+ > > | > PLAN | > > +----------------------------------------------------------- > -------------------+ > > | CLIENT 72-CHUNK 14325202 ROWS 15099524157 BYTES PARALLEL 24-WAY ROUND > ROBIN | > > | SERVER FILTER BY CHROM_INT = 18 > | > > | SERVER 5 ROW LIMIT > | > > | CLIENT 5 ROW LIMIT > | > > +----------------------------------------------------------- > -------------------+ > > 4 rows selected (0.009 seconds) > > Now, I want to add other fields for retrieval. For example, > "genomic_range". The Phoenix documentation says in such a case I must add a > Hint to force Phoenix to make use of the index (since it is a simple global > index, not a covered index wherein genomic_range has been added.) So I > tried that. See below. Alas, the response time is about the same as what I > get with NO_INDEX. It appears that, even with the Hint, the index is not > being used. > > 0: jdbc:phoenix:> SELECT/*+ INDEX(VJ jv2_chrom_int) */ VJ.chrom_int, > genomic_range FROM VARIANTJOIN_RTSALTED24 as VJ WHERE (chrom_int =18 ) > limit 5; > +------------+----------------------+ > | CHROM_INT | GENOMIC_RANGE | > +------------+----------------------+ > | 18 | [49546,49547) | > | 18 | [20003625,20003626) | > | 18 | [19618749,19618752) | > | 18 | [47561,47583) | > | 18 | [20024261,20024272) | > +------------+----------------------+ > 5 rows selected (1.799 seconds) > 0: jdbc:phoenix:> > > > And below is the explain plan for the query with the index failure. No > indication of index use, that I can tell. > > 0: jdbc:phoenix:> explain SELECT/*+ INDEX(VJ jv2_chrom_int) */ > VJ.chrom_int, genomic_range FROM VARIANTJOIN_RTSALTED24 as VJ WHERE > (chrom_int =18 ) limit 5; > > +----------------------------------------------------------- > -------------------+ > | > PLAN | > +----------------------------------------------------------- > -------------------+ > | CLIENT 72-CHUNK 14325202 ROWS 15099524157 BYTES PARALLEL 24-WAY ROUND > ROBIN | > | SERVER FILTER BY CHROM_INT = 18 > | > | SERVER 5 ROW LIMIT > | > | CLIENT 5 ROW LIMIT > | > +----------------------------------------------------------- > -------------------+ > 4 rows selected (0.042 seconds) > 0: jdbc:phoenix:> > > So I am puzzled and frustrated. How do I get my index Hint to work? The > difference in timing between automatic use (when the index is the only > field being retrieved) and when NO_INDEX is used tells me that the index > table is there and can indeed be used. But something is going wrong when I > try to force its use via a Hint. Guidance would be very much appreciated on > this basic point. > > - Ron Taylor > > Ronald C. Taylor, Ph.D. > Divisions of Immunobiology and Biomedical Informatics > > Cincinnati Children's Hospital Medical Center > > Office phone: 513-803-4880 > > Cell phone: 509-783-7308 > > Email: ronald.tay...@cchmc.org > > > > > > >