Re: hint to use a global index is not working - need to find out why

2018-04-20 Thread James Taylor
Ron - Salting is only recommended when your primary key is monotonically
increasing. It's mainly used to prevent write hotspotting. Also, I think
Ron forgot to mention, but I was working with him a bit earlier on this,
and I couldn't repro the issue either (in current 4.x or in 4.7 release).
Here's the unit test I put together which hints a non covered global index:

@Test
public void testIndexHintWithNonCoveredColumnSelected() throws
Exception {
String schemaName = "";
String dataTableName = "T_FOO";
String indexTableName = "I_FOO";
String dataTableFullName = SchemaUtil.getTableName(schemaName,
dataTableName);
String indexTableFullName = SchemaUtil.getTableName(schemaName,
indexTableName);
try (Connection conn = DriverManager.getConnection(getUrl())) {
conn.createStatement().execute("CREATE TABLE " +
dataTableFullName + "(k INTEGER PRIMARY KEY, v1 VARCHAR, v2 VARCHAR)");
conn.createStatement().execute("CREATE INDEX " + indexTableName
+ " ON " + dataTableName + "(v1)");
PhoenixStatement stmt =
conn.createStatement().unwrap(PhoenixStatement.class);
QueryPlan plan;
stmt.executeQuery("SELECT v1, v2 " +
" FROM " + dataTableFullName +
" WHERE v1='a'");
plan = stmt.getQueryPlan();
assertEquals("Expected (" + dataTableFullName + ") but was " +
plan.getSourceRefs(), 1, plan.getSourceRefs().size());

stmt.executeQuery("SELECT /*+ INDEX(" + dataTableFullName + " "
+ indexTableName + ") */ v1, v2" +
" FROM " + dataTableFullName +
" WHERE v1='a'");
plan = stmt.getQueryPlan();
assertEquals("Expected (" + dataTableFullName + "," +
indexTableFullName + ") but was " +  plan.getSourceRefs(), 2,
plan.getSourceRefs().size());
}
}



On Fri, Apr 20, 2018 at 1:11 PM, Taylor, Ronald (Ronald) <
ronald.tay...@cchmc.org> wrote:

> Hello Sergey,
>
>
>
> Per your request, here are the commands that I used to create the table
> and its indexes.  Hopefully you can find something in here that provides a
> guide as to what we are doing wrong.
>
>
>
> BTW – as I said, we are novices with Phoenix here. One thing we are doing
> is playing around a bit with salting numbers. We believed that the data in
> our test table was small enough to fit on one region server ( < 10 GB), so
> we used a high salt number (24) to try to force HBase to use more than one
> region server, to parallelize over more than one node. Did we get that
> concept right?
>
>
>
> Ron
>
>
>
> %
>
>
> CREATE TABLE variantjoin_RT_salted24 (
> chrom VARCHAR,
> genomic_range VARCHAR,
> reference VARCHAR,
> alternate VARCHAR,
> annotations VARCHAR,
> consequence VARCHAR,
> chrom_int INTEGER,
> onekg_maf DOUBLE,
> coding VARCHAR,
> esp_aa_maf DOUBLE,
> esp_ea_maf DOUBLE,
> exac_maf DOUBLE,
> filter VARCHAR,
> gene VARCHAR,
> impact VARCHAR,
> polyphen VARCHAR,
> sift VARCHAR,
> viva_maf DOUBLE,
> variant_id INTEGER PRIMARY KEY,
> genomic_range_start INTEGER,
> genomic_range_end INTEGER
> ) SALT_BUCKETS = 24, IMMUTABLE_ROWS=false;
>
>
>
>
>
> 0: jdbc:phoenix:> !describe variantjoin_RTsalted24
>
>
> 0: jdbc:phoenix:> !describe variantjoin_RTsalted24
> ++--+-+-
> -+-+
> | TABLE_CAT  | TABLE_SCHEM  |   TABLE_NAME|
> COLUMN_NAME  | |
> ++--+-+-
> -+-+
> ||  | VARIANTJOIN_RTSALTED24  |
> CHROM| |
> ||  | VARIANTJOIN_RTSALTED24  |
> GENOMIC_RANGE| |
> ||  | VARIANTJOIN_RTSALTED24  |
> REFERENCE| |
> ||  | VARIANTJOIN_RTSALTED24  |
> ALTERNATE| |
> ||  | VARIANTJOIN_RTSALTED24  |
> ANNOTATIONS  | |
> ||  | VARIANTJOIN_RTSALTED24  |
> CONSEQUENCE  | |
> ||  | VARIANTJOIN_RTSALTED24  |
> CHROM_INT| |
> ||  | VARIANTJOIN_RTSALTED24  |
> ONEKG_MAF| |
> ||  | VARIANTJOIN_RTSALTED24  |
> CODING   | |
> ||  | VARIANTJOIN_RTSALTED24  |
> ESP_AA_MAF   | |
> ||  | VARIANTJOIN_RTSALTED24  |
> ESP_EA_MAF   | |
> ||  | VARIANTJOIN_RTSALTED24  |
> EXAC_MAF | |
> ||  | VARIANTJOIN_RTSALTED24  |
> FILTER   | |
> ||  | VARIANTJOIN_RTSALTED24  |
> GENE | |
> ||  | VARIANTJOIN_RTSALTED24  |
> IMPACT   | |
> ||  | VARIANTJOIN_RTSALTED24  |
> POLYPHEN | |
> ||  | VARIANTJOIN_RTSALTED24  |

Re: hint to use a global index is not working - need to find out why

2018-04-20 Thread Taylor, Ronald (Ronald)
Hello Sergey,

Per your request, here are the commands that I used to create the table and its 
indexes.  Hopefully you can find something in here that provides a guide as to 
what we are doing wrong.

BTW – as I said, we are novices with Phoenix here. One thing we are doing is 
playing around a bit with salting numbers. We believed that the data in our 
test table was small enough to fit on one region server ( < 10 GB), so we used 
a high salt number (24) to try to force HBase to use more than one region 
server, to parallelize over more than one node. Did we get that concept right?

Ron

%

CREATE TABLE variantjoin_RT_salted24 (
chrom VARCHAR,
genomic_range VARCHAR,
reference VARCHAR,
alternate VARCHAR,
annotations VARCHAR,
consequence VARCHAR,
chrom_int INTEGER,
onekg_maf DOUBLE,
coding VARCHAR,
esp_aa_maf DOUBLE,
esp_ea_maf DOUBLE,
exac_maf DOUBLE,
filter VARCHAR,
gene VARCHAR,
impact VARCHAR,
polyphen VARCHAR,
sift VARCHAR,
viva_maf DOUBLE,
variant_id INTEGER PRIMARY KEY,
genomic_range_start INTEGER,
genomic_range_end INTEGER
) SALT_BUCKETS = 24, IMMUTABLE_ROWS=false;



0: jdbc:phoenix:> !describe variantjoin_RTsalted24


0: jdbc:phoenix:> !describe variantjoin_RTsalted24
++--+-+--+-+
| TABLE_CAT  | TABLE_SCHEM  |   TABLE_NAME| COLUMN_NAME  | |
++--+-+--+-+
||  | VARIANTJOIN_RTSALTED24  | CHROM| |
||  | VARIANTJOIN_RTSALTED24  | GENOMIC_RANGE| |
||  | VARIANTJOIN_RTSALTED24  | REFERENCE| |
||  | VARIANTJOIN_RTSALTED24  | ALTERNATE| |
||  | VARIANTJOIN_RTSALTED24  | ANNOTATIONS  | |
||  | VARIANTJOIN_RTSALTED24  | CONSEQUENCE  | |
||  | VARIANTJOIN_RTSALTED24  | CHROM_INT| |
||  | VARIANTJOIN_RTSALTED24  | ONEKG_MAF| |
||  | VARIANTJOIN_RTSALTED24  | CODING   | |
||  | VARIANTJOIN_RTSALTED24  | ESP_AA_MAF   | |
||  | VARIANTJOIN_RTSALTED24  | ESP_EA_MAF   | |
||  | VARIANTJOIN_RTSALTED24  | EXAC_MAF | |
||  | VARIANTJOIN_RTSALTED24  | FILTER   | |
||  | VARIANTJOIN_RTSALTED24  | GENE | |
||  | VARIANTJOIN_RTSALTED24  | IMPACT   | |
||  | VARIANTJOIN_RTSALTED24  | POLYPHEN | |
||  | VARIANTJOIN_RTSALTED24  | SIFT | |
||  | VARIANTJOIN_RTSALTED24  | VIVA_MAF | |
||  | VARIANTJOIN_RTSALTED24  | VARIANT_ID   | |
||  | VARIANTJOIN_RTSALTED24  | GENOMIC_RANGE_START  | |
||  | VARIANTJOIN_RTSALTED24  | GENOMIC_RANGE_END| |
++--+-+--+-+
0: jdbc:phoenix:>


0: jdbc:phoenix:> create index vj2_chrom on variantjoin_RTsalted24 (chrom);
create index vj2_chrom on variantjoin_RTsalted24 (chrom);
No rows affected (13.322 seconds)
0: jdbc:phoenix:>

reference- no index at present
alternate- no index at present
annotations  - JSONB blob no index at present


0: jdbc:phoenix:> create index vj2_genomic_range on variantjoin_RTsalted24 
(genomic_range);
create index vj2_genomic_range on variantjoin_RTsalted24 (genomic_range);
No rows affected (11.953 seconds)
0: jdbc:phoenix:>

0: jdbc:phoenix:> create index vj2_chrom_int on variantjoin_RTsalted24 
(chrom_int);
create index vj2_chrom_int on variantjoin_RTsalted24 (chrom_int);
No rows affected (12.518 seconds)
0: jdbc:phoenix:>

0: jdbc:phoenix:> create index vj2_onekg_maf on variantjoin_RTsalted24 
(onekg_maf);
create index vj2_onekg_maf on variantjoin_RTsalted24 (onekg_maf);
No rows affected (13.727 seconds)
0: jdbc:phoenix:>

0: jdbc:phoenix:> create index vj2_coding_maf on variantjoin_RTsalted24 
(onekg_maf);
create index vj2_coding_maf on variantjoin_RTsalted24 (onekg_maf);
No rows affected (12.45 seconds)
0: jdbc:phoenix:>

0: jdbc:phoenix:> create index vj2_consequence on variantjoin_RTsalted24 
(consequence);
create index vj2_consequence on variantjoin_RTsalted24 (consequence);
No rows affected (13.906 seconds)
0: jdbc:phoenix:>

0: jdbc:phoenix:> create index vj2_esp_aa_maf on variantjoin_RTsalted24 
(esp_aa_maf);
create index vj2_esp_ea_maf on variantjoin_RTsalted24 (esp_ea_maf);
No rows affected (12.355 seconds)
0: jdbc:phoenix:>

0: jdbc:phoenix:> create index vj2_exac_maf on variantjoin_RTsalted24 
(exac_maf);

Re: hint to use a global index is not working - need to find out why

2018-04-19 Thread Sergey Soldatov
That looks strange. Could you please provide full DDLs for table and
indexes? I just tried a similar scenario and obviously index is used:

0: jdbc:phoenix:> create table VARIANTJOIN_RTSALTED24 (id integer primary
key, chrom_int integer, genomic_range integer);
No rows affected (6.339 seconds)
0: jdbc:phoenix:>create index jv2_chrom_int on VARIANTJOIN_RTSALTED24
(chrom_int);
No rows affected (10.016 seconds)
0: jdbc:phoenix:> explain SELECT/*+ INDEX(VJ jv2_chrom_int) */
VJ.chrom_int, genomic_range  FROM VARIANTJOIN_RTSALTED24 as VJ WHERE
(chrom_int =18 ) limit 5;
+---+
| PLAN
|
+---+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER
VARIANTJOIN_RTSALTED24   |
| CLIENT 5 ROW LIMIT
|
| SKIP-SCAN-JOIN TABLE 0
|
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER
JV2_CHROM_INT [18]  |
| SERVER FILTER BY FIRST KEY ONLY
 |
| DYNAMIC SERVER FILTER BY "VJ.ID" IN ($2.$4)
 |
| JOIN-SCANNER 5 ROW LIMIT
|
+---+
7 rows selected (0.936 seconds)


Thanks,
Sergey

On Thu, Apr 19, 2018 at 7:31 PM, Taylor, Ronald (Ronald) <
ronald.tay...@cchmc.org> wrote:

> Hello Phoenix users,
>
> I am a novice Phoenix user and this is my first post to this user list. I
> did some searching in the list archives, but could not find an answer to
> what I hope is a simple question: my global index is being ignored, even
> after I add a Hint, and I want to know why.
>
> We are using Phoenix 4.7 in the Hortonworks distribution. Looks like
> Hortonworks has been backporting at least some phoenix updates into their
> version of phoenix 4.7, so I guess it is a custom distribution. See
>
>
>
>  https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/
> bk_release-notes/content/patch_phoenix.html
>
>
>
> I have created a simple table of about 8 million rows, and about 15
> columns, with several fields having global indexes. I created the main
> table (variantjoin_rtsalted24) and its indexes, and then used a bulk loader
> to populate them from a tab-delimited file. That appeared to work fine.
>
> chrom_int is one field on which there is a global index, named
> vj2_chrom_int. And you can see the index being automatically being used
> below, where it is the only field being returned. Time required is 0.124
> sec.
>
> 0: jdbc:phoenix:> SELECT VJ.chrom_int  FROM VARIANTJOIN_RTSALTED24 as VJ
> WHERE (chrom_int =18 ) limit 5;
>
> ++
>
> | CHROM_INT  |
>
> ++
>
> | 18 |
>
> | 18 |
>
> | 18 |
>
> | 18 |
>
> | 18 |
>
> ++
>
> 5 rows selected (0.124 seconds)
>
> 0: jdbc:phoenix:>
>
> You can see that the vj2_chrom_int index is automatically being used, as I
> understand things  by the "RANGE SCAN" wording and "[0,1" in the explain
> plan:
>
> 0: jdbc:phoenix:> explain SELECT VJ.chrom_int  FROM VARIANTJOIN_RTSALTED24
> as VJ WHERE (chrom_int =18 ) limit 5;
>
> +---
> ---+
>
> |   PLAN
> |
>
> +---
> ---+
>
> | CLIENT 24-CHUNK SERIAL 24-WAY ROUND ROBIN RANGE SCAN OVER VJ2_CHROM_INT
> [0,1 |
>
> | SERVER FILTER BY FIRST KEY ONLY
> |
>
> | SERVER 5 ROW LIMIT
> |
>
> | CLIENT 5 ROW LIMIT
>   |
>
> +---
> ---+
>
> 4 rows selected (0.043 seconds)
>
> 0: jdbc:phoenix:>
>
>
> I can use a Hint to tell Phoenix to NOT use this index, as seen below. And
> that increases the time needed to 1.97 sec, over an order of magnitude more
> time than the 0.124 sec required with index use.
>
> 0: jdbc:phoenix:> SELECT /*+ NO_INDEX */ VJ.chrom_int  FROM
> VARIANTJOIN_RTSALTED24 as VJ WHERE (chrom_int =18 ) limit 5;
>
> ++
>
> | CHROM_INT  |
>
> ++
>
> | 18 |
>
> | 18 |
>
> | 18 |
>
> | 18 |
>
> | 18 |
>
> ++
>
> 5 rows selected (1.977 seconds)
>
> 0: jdbc:phoenix:>
>
> And here is the explain plan for that:
>
>
> 0: jdbc:phoenix:> explain SELECT /*+ NO_INDEX */ VJ.chrom_int  FROM
> VARIANTJOIN_RTSALTED24 as VJ WHERE (chrom_int =18 ) limit 5;
>
> +---
> ---+
>
> |
> PLAN  |
>
> +---
> ---+
>
> | CLIENT 72-CHUNK 14325202 ROWS 15099524157 BYTES PARALLEL 24-WAY ROUND
> ROBIN  |
>
> | SERVER FILTER BY CHROM_INT = 18
> |
>
> | SERVER 5 ROW LIMIT
> |
>
> | CLIENT 5 

hint to use a global index is not working - need to find out why

2018-04-19 Thread Taylor, Ronald (Ronald)
Hello Phoenix users,

I am a novice Phoenix user and this is my first post to this user list. I did 
some searching in the list archives, but could not find an answer to what I 
hope is a simple question: my global index is being ignored, even after I add a 
Hint, and I want to know why.

We are using Phoenix 4.7 in the Hortonworks distribution. Looks like 
Hortonworks has been backporting at least some phoenix updates into their 
version of phoenix 4.7, so I guess it is a custom distribution. See

 
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_release-notes/content/patch_phoenix.html

I have created a simple table of about 8 million rows, and about 15 columns, 
with several fields having global indexes. I created the main table 
(variantjoin_rtsalted24) and its indexes, and then used a bulk loader to 
populate them from a tab-delimited file. That appeared to work fine.

chrom_int is one field on which there is a global index, named vj2_chrom_int. 
And you can see the index being automatically being used below, where it is the 
only field being returned. Time required is 0.124 sec.

0: jdbc:phoenix:> SELECT VJ.chrom_int  FROM VARIANTJOIN_RTSALTED24 as VJ WHERE 
(chrom_int =18 ) limit 5;

++

| CHROM_INT  |

++

| 18 |

| 18 |

| 18 |

| 18 |

| 18 |

++

5 rows selected (0.124 seconds)

0: jdbc:phoenix:>

You can see that the vj2_chrom_int index is automatically being used, as I 
understand things  by the "RANGE SCAN" wording and "[0,1" in the explain plan:

0: jdbc:phoenix:> explain SELECT VJ.chrom_int  FROM VARIANTJOIN_RTSALTED24 as 
VJ WHERE (chrom_int =18 ) limit 5;

+--+

|   PLAN   |

+--+

| CLIENT 24-CHUNK SERIAL 24-WAY ROUND ROBIN RANGE SCAN OVER VJ2_CHROM_INT [0,1 |

| SERVER FILTER BY FIRST KEY ONLY  |

| SERVER 5 ROW LIMIT   |

| CLIENT 5 ROW LIMIT   |

+--+

4 rows selected (0.043 seconds)

0: jdbc:phoenix:>


I can use a Hint to tell Phoenix to NOT use this index, as seen below. And that 
increases the time needed to 1.97 sec, over an order of magnitude more time 
than the 0.124 sec required with index use.

0: jdbc:phoenix:> SELECT /*+ NO_INDEX */ VJ.chrom_int  FROM 
VARIANTJOIN_RTSALTED24 as VJ WHERE (chrom_int =18 ) limit 5;

++

| CHROM_INT  |

++

| 18 |

| 18 |

| 18 |

| 18 |

| 18 |

++

5 rows selected (1.977 seconds)

0: jdbc:phoenix:>

And here is the explain plan for that:


0: jdbc:phoenix:> explain SELECT /*+ NO_INDEX */ VJ.chrom_int  FROM 
VARIANTJOIN_RTSALTED24 as VJ WHERE (chrom_int =18 ) limit 5;

+--+

|PLAN  |

+--+

| CLIENT 72-CHUNK 14325202 ROWS 15099524157 BYTES PARALLEL 24-WAY ROUND ROBIN  |

| SERVER FILTER BY CHROM_INT = 18  |

| SERVER 5 ROW LIMIT   |

| CLIENT 5 ROW LIMIT   |

+--+

4 rows selected (0.009 seconds)

Now,  I want to add other fields for retrieval. For example, "genomic_range". 
The Phoenix documentation says in such a case I must add a Hint to force 
Phoenix to make use of the index (since it is a simple global index, not a 
covered index wherein genomic_range has been added.) So I tried that. See 
below. Alas, the response time is about the same as what I get with NO_INDEX.  
It appears that, even with the Hint, the index is not being used.

0: jdbc:phoenix:> SELECT/*+ INDEX(VJ jv2_chrom_int) */ VJ.chrom_int, 
genomic_range  FROM VARIANTJOIN_RTSALTED24 as VJ WHERE (chrom_int =18 ) limit 5;
++--+
| CHROM_INT  |GENOMIC_RANGE |
++--+
| 18 | [49546,49547)|
| 18 | [20003625,20003626)  |
| 18 | [19618749,19618752)  |
| 18 | [47561,47583)|
| 18 | [20024261,20024272)  |
++--+
5 rows selected (1.799 seconds)
0: jdbc:phoenix:>


And below is the explain plan for the query with the index failure. No 
indication of index use, that I can tell.

0: jdbc:phoenix:> explain SELECT/*+ INDEX(VJ jv2_chrom_int) */ VJ.chrom_int, 
genomic_range  FROM