Hello Phoenix users,

I am a novice Phoenix user and this is my first post to this user list. I did 
some searching in the list archives, but could not find an answer to what I 
hope is a simple question: my global index is being ignored, even after I add a 
Hint, and I want to know why.

We are using Phoenix 4.7 in the Hortonworks distribution. Looks like 
Hortonworks has been backporting at least some phoenix updates into their 
version of phoenix 4.7, so I guess it is a custom distribution. See

     
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_release-notes/content/patch_phoenix.html

I have created a simple table of about 8 million rows, and about 15 columns, 
with several fields having global indexes. I created the main table 
(variantjoin_rtsalted24) and its indexes, and then used a bulk loader to 
populate them from a tab-delimited file. That appeared to work fine.

chrom_int is one field on which there is a global index, named vj2_chrom_int. 
And you can see the index being automatically being used below, where it is the 
only field being returned. Time required is 0.124 sec.

0: jdbc:phoenix:> SELECT VJ.chrom_int  FROM VARIANTJOIN_RTSALTED24 as VJ WHERE 
(chrom_int =18 ) limit 5;

+------------+

| CHROM_INT  |

+------------+

| 18         |

| 18         |

| 18         |

| 18         |

| 18         |

+------------+

5 rows selected (0.124 seconds)

0: jdbc:phoenix:>

You can see that the vj2_chrom_int index is automatically being used, as I 
understand things  by the "RANGE SCAN" wording and "[0,1" in the explain plan:

0: jdbc:phoenix:> explain SELECT VJ.chrom_int  FROM VARIANTJOIN_RTSALTED24 as 
VJ WHERE (chrom_int =18 ) limit 5;

+------------------------------------------------------------------------------+

|                                           PLAN                               |

+------------------------------------------------------------------------------+

| CLIENT 24-CHUNK SERIAL 24-WAY ROUND ROBIN RANGE SCAN OVER VJ2_CHROM_INT [0,1 |

|     SERVER FILTER BY FIRST KEY ONLY                                          |

|     SERVER 5 ROW LIMIT                                                       |

| CLIENT 5 ROW LIMIT                                                           |

+------------------------------------------------------------------------------+

4 rows selected (0.043 seconds)

0: jdbc:phoenix:>


I can use a Hint to tell Phoenix to NOT use this index, as seen below. And that 
increases the time needed to 1.97 sec, over an order of magnitude more time 
than the 0.124 sec required with index use.

0: jdbc:phoenix:> SELECT /*+ NO_INDEX */ VJ.chrom_int  FROM 
VARIANTJOIN_RTSALTED24 as VJ WHERE (chrom_int =18 ) limit 5;

+------------+

| CHROM_INT  |

+------------+

| 18         |

| 18         |

| 18         |

| 18         |

| 18         |

+------------+

5 rows selected (1.977 seconds)

0: jdbc:phoenix:>

And here is the explain plan for that:


0: jdbc:phoenix:> explain SELECT /*+ NO_INDEX */ VJ.chrom_int  FROM 
VARIANTJOIN_RTSALTED24 as VJ WHERE (chrom_int =18 ) limit 5;

+------------------------------------------------------------------------------+

|                                                        PLAN                  |

+------------------------------------------------------------------------------+

| CLIENT 72-CHUNK 14325202 ROWS 15099524157 BYTES PARALLEL 24-WAY ROUND ROBIN  |

|     SERVER FILTER BY CHROM_INT = 18                                          |

|     SERVER 5 ROW LIMIT                                                       |

| CLIENT 5 ROW LIMIT                                                           |

+------------------------------------------------------------------------------+

4 rows selected (0.009 seconds)

Now,  I want to add other fields for retrieval. For example, "genomic_range". 
The Phoenix documentation says in such a case I must add a Hint to force 
Phoenix to make use of the index (since it is a simple global index, not a 
covered index wherein genomic_range has been added.) So I tried that. See 
below. Alas, the response time is about the same as what I get with NO_INDEX.  
It appears that, even with the Hint, the index is not being used.

0: jdbc:phoenix:> SELECT/*+ INDEX(VJ jv2_chrom_int) */ VJ.chrom_int, 
genomic_range  FROM VARIANTJOIN_RTSALTED24 as VJ WHERE (chrom_int =18 ) limit 5;
+------------+----------------------+
| CHROM_INT  |    GENOMIC_RANGE     |
+------------+----------------------+
| 18         | [49546,49547)        |
| 18         | [20003625,20003626)  |
| 18         | [19618749,19618752)  |
| 18         | [47561,47583)        |
| 18         | [20024261,20024272)  |
+------------+----------------------+
5 rows selected (1.799 seconds)
0: jdbc:phoenix:>


And below is the explain plan for the query with the index failure. No 
indication of index use, that I can tell.

0: jdbc:phoenix:> explain SELECT/*+ INDEX(VJ jv2_chrom_int) */ VJ.chrom_int, 
genomic_range  FROM VARIANTJOIN_RTSALTED24 as VJ WHERE (chrom_int =18 ) limit 5;

+------------------------------------------------------------------------------+
|                                                        PLAN                  |
+------------------------------------------------------------------------------+
| CLIENT 72-CHUNK 14325202 ROWS 15099524157 BYTES PARALLEL 24-WAY ROUND ROBIN  |
|     SERVER FILTER BY CHROM_INT = 18                                          |
|     SERVER 5 ROW LIMIT                                                       |
| CLIENT 5 ROW LIMIT                                                           |
+------------------------------------------------------------------------------+
4 rows selected (0.042 seconds)
0: jdbc:phoenix:>

So I am puzzled and frustrated. How do I get my index Hint to work?  The 
difference in timing between automatic use (when the index is the only field 
being retrieved) and when NO_INDEX is used tells me that the index table is 
there and can indeed be used. But something is going wrong when I try to force 
its use via a Hint. Guidance would be very much appreciated on this basic point.

  - Ron Taylor
Ronald C. Taylor, Ph.D.
Divisions of Immunobiology and Biomedical Informatics
Cincinnati Children's Hospital Medical Center
Office phone: 513-803-4880
Cell phone:     509-783-7308
Email: ronald.tay...@cchmc.org<mailto:ronald.tay...@cchmc.org>


Reply via email to