Re: hint to use a global index is not working - need to find out why

2018-04-20 Thread James Taylor
Ron - Salting is only recommended when your primary key is monotonically
increasing. It's mainly used to prevent write hotspotting. Also, I think
Ron forgot to mention, but I was working with him a bit earlier on this,
and I couldn't repro the issue either (in current 4.x or in 4.7 release).
Here's the unit test I put together which hints a non covered global index:

@Test
public void testIndexHintWithNonCoveredColumnSelected() throws
Exception {
String schemaName = "";
String dataTableName = "T_FOO";
String indexTableName = "I_FOO";
String dataTableFullName = SchemaUtil.getTableName(schemaName,
dataTableName);
String indexTableFullName = SchemaUtil.getTableName(schemaName,
indexTableName);
try (Connection conn = DriverManager.getConnection(getUrl())) {
conn.createStatement().execute("CREATE TABLE " +
dataTableFullName + "(k INTEGER PRIMARY KEY, v1 VARCHAR, v2 VARCHAR)");
conn.createStatement().execute("CREATE INDEX " + indexTableName
+ " ON " + dataTableName + "(v1)");
PhoenixStatement stmt =
conn.createStatement().unwrap(PhoenixStatement.class);
QueryPlan plan;
stmt.executeQuery("SELECT v1, v2 " +
" FROM " + dataTableFullName +
" WHERE v1='a'");
plan = stmt.getQueryPlan();
assertEquals("Expected (" + dataTableFullName + ") but was " +
plan.getSourceRefs(), 1, plan.getSourceRefs().size());

stmt.executeQuery("SELECT /*+ INDEX(" + dataTableFullName + " "
+ indexTableName + ") */ v1, v2" +
" FROM " + dataTableFullName +
" WHERE v1='a'");
plan = stmt.getQueryPlan();
assertEquals("Expected (" + dataTableFullName + "," +
indexTableFullName + ") but was " +  plan.getSourceRefs(), 2,
plan.getSourceRefs().size());
}
}



On Fri, Apr 20, 2018 at 1:11 PM, Taylor, Ronald (Ronald) <
ronald.tay...@cchmc.org> wrote:

> Hello Sergey,
>
>
>
> Per your request, here are the commands that I used to create the table
> and its indexes.  Hopefully you can find something in here that provides a
> guide as to what we are doing wrong.
>
>
>
> BTW – as I said, we are novices with Phoenix here. One thing we are doing
> is playing around a bit with salting numbers. We believed that the data in
> our test table was small enough to fit on one region server ( < 10 GB), so
> we used a high salt number (24) to try to force HBase to use more than one
> region server, to parallelize over more than one node. Did we get that
> concept right?
>
>
>
> Ron
>
>
>
> %
>
>
> CREATE TABLE variantjoin_RT_salted24 (
> chrom VARCHAR,
> genomic_range VARCHAR,
> reference VARCHAR,
> alternate VARCHAR,
> annotations VARCHAR,
> consequence VARCHAR,
> chrom_int INTEGER,
> onekg_maf DOUBLE,
> coding VARCHAR,
> esp_aa_maf DOUBLE,
> esp_ea_maf DOUBLE,
> exac_maf DOUBLE,
> filter VARCHAR,
> gene VARCHAR,
> impact VARCHAR,
> polyphen VARCHAR,
> sift VARCHAR,
> viva_maf DOUBLE,
> variant_id INTEGER PRIMARY KEY,
> genomic_range_start INTEGER,
> genomic_range_end INTEGER
> ) SALT_BUCKETS = 24, IMMUTABLE_ROWS=false;
>
>
>
>
>
> 0: jdbc:phoenix:> !describe variantjoin_RTsalted24
>
>
> 0: jdbc:phoenix:> !describe variantjoin_RTsalted24
> ++--+-+-
> -+-+
> | TABLE_CAT  | TABLE_SCHEM  |   TABLE_NAME|
> COLUMN_NAME  | |
> ++--+-+-
> -+-+
> ||  | VARIANTJOIN_RTSALTED24  |
> CHROM| |
> ||  | VARIANTJOIN_RTSALTED24  |
> GENOMIC_RANGE| |
> ||  | VARIANTJOIN_RTSALTED24  |
> REFERENCE| |
> ||  | VARIANTJOIN_RTSALTED24  |
> ALTERNATE| |
> ||  | VARIANTJOIN_RTSALTED24  |
> ANNOTATIONS  | |
> ||  | VARIANTJOIN_RTSALTED24  |
> CONSEQUENCE  | |
> ||  | VARIANTJOIN_RTSALTED24  |
> CHROM_INT| |
> ||  | VARIANTJOIN_RTSALTED24  |
> ONEKG_MAF| |
> ||  | VARIANTJOIN_RTSALTED24  |
> CODING   | |
> ||  | VARIANTJOIN_RTSALTED24  |
> ESP_AA_MAF   | |
> ||  | VARIANTJOIN_RTSALTED24  |
> ESP_EA_MAF   | |
> ||  | VARIANTJOIN_RTSALTED24  |
> EXAC_MAF | |
> ||  | VARIANTJOIN_RTSALTED24  |
> FILTER   | |
> ||  | VARIANTJOIN_RTSALTED24  |
> GENE | |
> ||  | VARIANTJOIN_RTSALTED24  |
> IMPACT   | |
> ||  | VARIANTJOIN_RTSALTED24  |
> POLYPHEN | |
> ||  | VARIANTJOIN_RTSALTED24  |

Re: hint to use a global index is not working - need to find out why

2018-04-20 Thread Taylor, Ronald (Ronald)
Hello Sergey,

Per your request, here are the commands that I used to create the table and its 
indexes.  Hopefully you can find something in here that provides a guide as to 
what we are doing wrong.

BTW – as I said, we are novices with Phoenix here. One thing we are doing is 
playing around a bit with salting numbers. We believed that the data in our 
test table was small enough to fit on one region server ( < 10 GB), so we used 
a high salt number (24) to try to force HBase to use more than one region 
server, to parallelize over more than one node. Did we get that concept right?

Ron

%

CREATE TABLE variantjoin_RT_salted24 (
chrom VARCHAR,
genomic_range VARCHAR,
reference VARCHAR,
alternate VARCHAR,
annotations VARCHAR,
consequence VARCHAR,
chrom_int INTEGER,
onekg_maf DOUBLE,
coding VARCHAR,
esp_aa_maf DOUBLE,
esp_ea_maf DOUBLE,
exac_maf DOUBLE,
filter VARCHAR,
gene VARCHAR,
impact VARCHAR,
polyphen VARCHAR,
sift VARCHAR,
viva_maf DOUBLE,
variant_id INTEGER PRIMARY KEY,
genomic_range_start INTEGER,
genomic_range_end INTEGER
) SALT_BUCKETS = 24, IMMUTABLE_ROWS=false;



0: jdbc:phoenix:> !describe variantjoin_RTsalted24


0: jdbc:phoenix:> !describe variantjoin_RTsalted24
++--+-+--+-+
| TABLE_CAT  | TABLE_SCHEM  |   TABLE_NAME| COLUMN_NAME  | |
++--+-+--+-+
||  | VARIANTJOIN_RTSALTED24  | CHROM| |
||  | VARIANTJOIN_RTSALTED24  | GENOMIC_RANGE| |
||  | VARIANTJOIN_RTSALTED24  | REFERENCE| |
||  | VARIANTJOIN_RTSALTED24  | ALTERNATE| |
||  | VARIANTJOIN_RTSALTED24  | ANNOTATIONS  | |
||  | VARIANTJOIN_RTSALTED24  | CONSEQUENCE  | |
||  | VARIANTJOIN_RTSALTED24  | CHROM_INT| |
||  | VARIANTJOIN_RTSALTED24  | ONEKG_MAF| |
||  | VARIANTJOIN_RTSALTED24  | CODING   | |
||  | VARIANTJOIN_RTSALTED24  | ESP_AA_MAF   | |
||  | VARIANTJOIN_RTSALTED24  | ESP_EA_MAF   | |
||  | VARIANTJOIN_RTSALTED24  | EXAC_MAF | |
||  | VARIANTJOIN_RTSALTED24  | FILTER   | |
||  | VARIANTJOIN_RTSALTED24  | GENE | |
||  | VARIANTJOIN_RTSALTED24  | IMPACT   | |
||  | VARIANTJOIN_RTSALTED24  | POLYPHEN | |
||  | VARIANTJOIN_RTSALTED24  | SIFT | |
||  | VARIANTJOIN_RTSALTED24  | VIVA_MAF | |
||  | VARIANTJOIN_RTSALTED24  | VARIANT_ID   | |
||  | VARIANTJOIN_RTSALTED24  | GENOMIC_RANGE_START  | |
||  | VARIANTJOIN_RTSALTED24  | GENOMIC_RANGE_END| |
++--+-+--+-+
0: jdbc:phoenix:>


0: jdbc:phoenix:> create index vj2_chrom on variantjoin_RTsalted24 (chrom);
create index vj2_chrom on variantjoin_RTsalted24 (chrom);
No rows affected (13.322 seconds)
0: jdbc:phoenix:>

reference- no index at present
alternate- no index at present
annotations  - JSONB blob no index at present


0: jdbc:phoenix:> create index vj2_genomic_range on variantjoin_RTsalted24 
(genomic_range);
create index vj2_genomic_range on variantjoin_RTsalted24 (genomic_range);
No rows affected (11.953 seconds)
0: jdbc:phoenix:>

0: jdbc:phoenix:> create index vj2_chrom_int on variantjoin_RTsalted24 
(chrom_int);
create index vj2_chrom_int on variantjoin_RTsalted24 (chrom_int);
No rows affected (12.518 seconds)
0: jdbc:phoenix:>

0: jdbc:phoenix:> create index vj2_onekg_maf on variantjoin_RTsalted24 
(onekg_maf);
create index vj2_onekg_maf on variantjoin_RTsalted24 (onekg_maf);
No rows affected (13.727 seconds)
0: jdbc:phoenix:>

0: jdbc:phoenix:> create index vj2_coding_maf on variantjoin_RTsalted24 
(onekg_maf);
create index vj2_coding_maf on variantjoin_RTsalted24 (onekg_maf);
No rows affected (12.45 seconds)
0: jdbc:phoenix:>

0: jdbc:phoenix:> create index vj2_consequence on variantjoin_RTsalted24 
(consequence);
create index vj2_consequence on variantjoin_RTsalted24 (consequence);
No rows affected (13.906 seconds)
0: jdbc:phoenix:>

0: jdbc:phoenix:> create index vj2_esp_aa_maf on variantjoin_RTsalted24 
(esp_aa_maf);
create index vj2_esp_ea_maf on variantjoin_RTsalted24 (esp_ea_maf);
No rows affected (12.355 seconds)
0: jdbc:phoenix:>

0: jdbc:phoenix:> create index vj2_exac_maf on variantjoin_RTsalted24 
(exac_maf);

Re: hint to use a global index is not working - need to find out why

2018-04-19 Thread Sergey Soldatov
That looks strange. Could you please provide full DDLs for table and
indexes? I just tried a similar scenario and obviously index is used:

0: jdbc:phoenix:> create table VARIANTJOIN_RTSALTED24 (id integer primary
key, chrom_int integer, genomic_range integer);
No rows affected (6.339 seconds)
0: jdbc:phoenix:>create index jv2_chrom_int on VARIANTJOIN_RTSALTED24
(chrom_int);
No rows affected (10.016 seconds)
0: jdbc:phoenix:> explain SELECT/*+ INDEX(VJ jv2_chrom_int) */
VJ.chrom_int, genomic_range  FROM VARIANTJOIN_RTSALTED24 as VJ WHERE
(chrom_int =18 ) limit 5;
+---+
| PLAN
|
+---+
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER
VARIANTJOIN_RTSALTED24   |
| CLIENT 5 ROW LIMIT
|
| SKIP-SCAN-JOIN TABLE 0
|
| CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN RANGE SCAN OVER
JV2_CHROM_INT [18]  |
| SERVER FILTER BY FIRST KEY ONLY
 |
| DYNAMIC SERVER FILTER BY "VJ.ID" IN ($2.$4)
 |
| JOIN-SCANNER 5 ROW LIMIT
|
+---+
7 rows selected (0.936 seconds)


Thanks,
Sergey

On Thu, Apr 19, 2018 at 7:31 PM, Taylor, Ronald (Ronald) <
ronald.tay...@cchmc.org> wrote:

> Hello Phoenix users,
>
> I am a novice Phoenix user and this is my first post to this user list. I
> did some searching in the list archives, but could not find an answer to
> what I hope is a simple question: my global index is being ignored, even
> after I add a Hint, and I want to know why.
>
> We are using Phoenix 4.7 in the Hortonworks distribution. Looks like
> Hortonworks has been backporting at least some phoenix updates into their
> version of phoenix 4.7, so I guess it is a custom distribution. See
>
>
>
>  https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/
> bk_release-notes/content/patch_phoenix.html
>
>
>
> I have created a simple table of about 8 million rows, and about 15
> columns, with several fields having global indexes. I created the main
> table (variantjoin_rtsalted24) and its indexes, and then used a bulk loader
> to populate them from a tab-delimited file. That appeared to work fine.
>
> chrom_int is one field on which there is a global index, named
> vj2_chrom_int. And you can see the index being automatically being used
> below, where it is the only field being returned. Time required is 0.124
> sec.
>
> 0: jdbc:phoenix:> SELECT VJ.chrom_int  FROM VARIANTJOIN_RTSALTED24 as VJ
> WHERE (chrom_int =18 ) limit 5;
>
> ++
>
> | CHROM_INT  |
>
> ++
>
> | 18 |
>
> | 18 |
>
> | 18 |
>
> | 18 |
>
> | 18 |
>
> ++
>
> 5 rows selected (0.124 seconds)
>
> 0: jdbc:phoenix:>
>
> You can see that the vj2_chrom_int index is automatically being used, as I
> understand things  by the "RANGE SCAN" wording and "[0,1" in the explain
> plan:
>
> 0: jdbc:phoenix:> explain SELECT VJ.chrom_int  FROM VARIANTJOIN_RTSALTED24
> as VJ WHERE (chrom_int =18 ) limit 5;
>
> +---
> ---+
>
> |   PLAN
> |
>
> +---
> ---+
>
> | CLIENT 24-CHUNK SERIAL 24-WAY ROUND ROBIN RANGE SCAN OVER VJ2_CHROM_INT
> [0,1 |
>
> | SERVER FILTER BY FIRST KEY ONLY
> |
>
> | SERVER 5 ROW LIMIT
> |
>
> | CLIENT 5 ROW LIMIT
>   |
>
> +---
> ---+
>
> 4 rows selected (0.043 seconds)
>
> 0: jdbc:phoenix:>
>
>
> I can use a Hint to tell Phoenix to NOT use this index, as seen below. And
> that increases the time needed to 1.97 sec, over an order of magnitude more
> time than the 0.124 sec required with index use.
>
> 0: jdbc:phoenix:> SELECT /*+ NO_INDEX */ VJ.chrom_int  FROM
> VARIANTJOIN_RTSALTED24 as VJ WHERE (chrom_int =18 ) limit 5;
>
> ++
>
> | CHROM_INT  |
>
> ++
>
> | 18 |
>
> | 18 |
>
> | 18 |
>
> | 18 |
>
> | 18 |
>
> ++
>
> 5 rows selected (1.977 seconds)
>
> 0: jdbc:phoenix:>
>
> And here is the explain plan for that:
>
>
> 0: jdbc:phoenix:> explain SELECT /*+ NO_INDEX */ VJ.chrom_int  FROM
> VARIANTJOIN_RTSALTED24 as VJ WHERE (chrom_int =18 ) limit 5;
>
> +---
> ---+
>
> |
> PLAN  |
>
> +---
> ---+
>
> | CLIENT 72-CHUNK 14325202 ROWS 15099524157 BYTES PARALLEL 24-WAY ROUND
> ROBIN  |
>
> | SERVER FILTER BY CHROM_INT = 18
> |
>
> | SERVER 5 ROW LIMIT
> |
>
> | CLIENT 5