Re: Hash join confusion

2016-10-06 Thread Maryann Xue
Assigned. Thanks a lot for filing the issue, Sumit!

On Thu, Oct 6, 2016 at 10:19 AM, Sumit Nigam <sumit_o...@yahoo.com> wrote:

> Hi Maryann,
>
> I created https://issues.apache.org/jira/browse/PHOENIX-3354 for this
> issue. I could not assign it to you.
>
> Best regards,
> Sumit
>
> --
> *From:* Maryann Xue <maryann@gmail.com>
> *To:* "user@phoenix.apache.org" <user@phoenix.apache.org>; Sumit Nigam <
> sumit_o...@yahoo.com>
> *Sent:* Wednesday, October 5, 2016 11:27 AM
> *Subject:* Re: Hash join confusion
>
> Not sure if it's related, because your DDL does not have DESC columns, but we
> do have a sort-merge-join bug fix in 4.8.0:
> https://issues.apache.org/jira/browse/PHOENIX-2894.
>
> Otherwise could you please just file a JIRA and assign to me? Thanks a lot!
>
>
> Thanks,
> Maryann
>
> On Tue, Oct 4, 2016 at 8:24 PM, Sumit Nigam <sumit_o...@yahoo.com> wrote:
>
> Hi Maryann,
>
> *Here are the 2 DDLs (for data and index tables)*:
>
> CREATE TABLE IF NOT EXISTS "ldmns:exDocStoreb" (CURRENT_TIMESTAMP BIGINT
> NOT NULL, ID VARCHAR(96), BINARY_CURR_EXDOC VARBINARY, CURR_CHECKSUM
> VARCHAR(32), BINARY_PREV_EXDOC VARBINARY, PREV_CHECKSUM VARCHAR(32),
> PREV_TIMESTAMP BIGINT, SUMMARY VARCHAR, OBJ_SUMMARY VARCHAR, PARAM_SAMPLES
> VARCHAR, BULK_PUBLISH_UUID  VARCHAR, TOTAL_FACTS INTEGER, CURR_EXDOC
> VARCHAR, PREV_EXDOC VARCHAR *CONSTRAINT PK PRIMARY KEY(CURRENT_TIMESTAMP,
> ID)*) COMPRESSION = 'SNAPPY', BLOCKCACHE =  false, *SALT_BUCKETS = 36*
>
>
> CREATE INDEX IF NOT EXISTS "ldmns:indx_exdocb" ON "ldmns:exDocStoreb"(ID)
> INCLUDE (SUMMARY, OBJ_SUMMARY, PARAM_SAMPLES, BULK_PUBLISH_UUID)
>
>
>
> *Here is the upsert query for this table*:
>
> UPSERT INTO "ldmns:exDocStoreb" (CURRENT_TIMESTAMP, BULK_PUBLISH_UUID, ID,
> CURR_CHECKSUM, CURR_EXDOC, SUMMARY, PREV_EXDOC, PREV_CHECKSUM,
> PREV_TIMESTAMP, OBJ_SUMMARY, PARAM_SAMPLES, TOTAL_FACTS, BINARY_CURR_EXDOC,
> BINARY_PREV_EXDOC) VALUES (TO_NUMBER(CURRENT_TIME()), ?, ?, ?, ?, ?, ?, ?,
> ?, ?, ?, ?, ?, ?)
>
>
> *Here is explain plan of a SELECT with merge sort:*
>
> explain   select  */* +USE_SORT_MERGE_JOIN*/ * ID, CURR_EXDOC,
> BINARY_CURR_EXDOC, CURRENT_TIMESTAMP, CURR_CHECKSUM, PREV_EXDOC,
> BINARY_PREV_EXDOC, PREV_CHECKSUM, PREV_TIMESTAMP from "ldmns:exDocStoreb"
>  as a inner join (select max(CURRENT_TIMESTAMP) as mct, ID as tid from
> "ldmns:exDocStoreb" where ID like
> '006389a6b10667f39bdbbdafdc4611e03cc04418cbc2619ddc01f54d88d7c3bf%'
> group by ID) as tmp on a.ID=tmp.tid and a.CURRENT_TIMESTAMP=tmp.mct where id like
> '006389a6b10667f39bdbbdafdc4611e03cc04418cbc2619ddc01f54d88d7c3bf%' ;
> +----------------------------------------------------------------------------------+
> |                                        PLAN                                       |
> +----------------------------------------------------------------------------------+
> | SORT-MERGE-JOIN (INNER) TABLES |
> |     CLIENT 36-CHUNK PARALLEL 36-WAY RANGE SCAN OVER ldmns:exDocStoreb [0] |
> |         SERVER FILTER BY ID LIKE '006389a6b10667f39bdbbdafdc4611e03cc04418cbc2619ddc01f54d88d7c3bf%' |
> |         SERVER SORTED BY [A.ID, A.CURRENT_TIMESTAMP] |
> |     CLIENT MERGE SORT |
> | AND (SKIP MERGE) |
> |     CLIENT 36-CHUNK PARALLEL 36-WAY RANGE SCAN OVER ldmns:indx_exdocb [0,'006389a6b10667f39bdbbdafdc4611e03cc04418cbc2619ddc01f54d88d7c3bf'] - [0,'006389 |
> |         SERVER FILTER BY FIRST KEY ONLY |
> |         SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID"] |
> |     CLIENT MERGE SORT |
> |     CLIENT SORTED BY ["ID", MAX("CURRENT_TIMESTAMP")] |
> +----------------------------------------------------------------------------------+
> 11 rows selected (0.025 seconds)
>
>
> *Here is explain plan with default join:*
>
> explain SELECT   ID, CURR_EXDOC, BINARY_CURR_EXDOC, CURRENT_TIMESTAMP,
> CURR_CHECKSUM, PREV_EXDOC, BINARY_PREV_EXDOC, PREV_CHECKSUM, PREV_TIMESTAMP
> from "ldmns:exDocStoreb" as a inner join (select max(CURRENT_TIMESTAMP) as
> mct, ID as tid from "ldmns:exDocStoreb" where ID like
> '42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%' group
> by ID) as tmp on a.ID=tmp.tid and a.CURRENT_TIMESTAMP=tmp.mct where ID like
> '42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%' ;
> +----------------------------------------------------------------------------------+
> |                                        PLAN                                       |
> +----------------------------------------------------------------------------------+
> | CLIENT 3-CHUNK PARALLEL 3-WAY RANGE SCAN OVER ldmns:exDocStoreb [0] |
> |     SERVER FILTER BY ID LIKE '42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7

Re: Hash join confusion

2016-10-06 Thread Sumit Nigam
Hi Maryann,
I created https://issues.apache.org/jira/browse/PHOENIX-3354 for this issue. I 
could not assign it to you.
Best regards,
Sumit

  From: Maryann Xue <maryann@gmail.com>
 To: "user@phoenix.apache.org" <user@phoenix.apache.org>; Sumit Nigam 
<sumit_o...@yahoo.com> 
 Sent: Wednesday, October 5, 2016 11:27 AM
 Subject: Re: Hash join confusion
   
Not sure if it's related, because your DDL does not have DESC columns, but we do 
have a sort-merge-join bug fix in 4.8.0: 
https://issues.apache.org/jira/browse/PHOENIX-2894.
Otherwise could you please just file a JIRA and assign to me? Thanks a lot!

Thanks,
Maryann
On Tue, Oct 4, 2016 at 8:24 PM, Sumit Nigam <sumit_o...@yahoo.com> wrote:

Hi Maryann,
Here are the 2 DDLs (for data and index tables):
CREATE TABLE IF NOT EXISTS "ldmns:exDocStoreb" (CURRENT_TIMESTAMP BIGINT NOT 
NULL, ID VARCHAR(96), BINARY_CURR_EXDOC VARBINARY, CURR_CHECKSUM VARCHAR(32), 
BINARY_PREV_EXDOC VARBINARY, PREV_CHECKSUM VARCHAR(32), PREV_TIMESTAMP BIGINT, 
SUMMARY VARCHAR, OBJ_SUMMARY VARCHAR, PARAM_SAMPLES VARCHAR, BULK_PUBLISH_UUID  
VARCHAR, TOTAL_FACTS INTEGER, CURR_EXDOC VARCHAR, PREV_EXDOC VARCHAR CONSTRAINT 
PK PRIMARY KEY(CURRENT_TIMESTAMP, ID)) COMPRESSION = 'SNAPPY', BLOCKCACHE =  
false, SALT_BUCKETS = 36

CREATE INDEX IF NOT EXISTS "ldmns:indx_exdocb" ON "ldmns:exDocStoreb"(ID) 
INCLUDE (SUMMARY, OBJ_SUMMARY, PARAM_SAMPLES, BULK_PUBLISH_UUID)


Here is the upsert query for this table:
UPSERT INTO "ldmns:exDocStoreb" (CURRENT_TIMESTAMP, BULK_PUBLISH_UUID, ID, 
CURR_CHECKSUM, CURR_EXDOC, SUMMARY, PREV_EXDOC, PREV_CHECKSUM, PREV_TIMESTAMP, 
OBJ_SUMMARY, PARAM_SAMPLES, TOTAL_FACTS, BINARY_CURR_EXDOC, BINARY_PREV_EXDOC) 
VALUES (TO_NUMBER(CURRENT_TIME()), ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)

Here is explain plan of a SELECT with merge sort:
explain   select  /* +USE_SORT_MERGE_JOIN*/  ID, CURR_EXDOC, BINARY_CURR_EXDOC, 
CURRENT_TIMESTAMP, CURR_CHECKSUM, PREV_EXDOC, BINARY_PREV_EXDOC, PREV_CHECKSUM, 
PREV_TIMESTAMP from "ldmns:exDocStoreb"  as a inner join (select 
max(CURRENT_TIMESTAMP) as mct, ID as tid from "ldmns:exDocStoreb" where ID like 
'006389a6b10667f39bdbbdafdc4611e03cc04418cbc2619ddc01f54d88d7c3bf%' group by 
ID) as tmp on a.ID=tmp.tid and a.CURRENT_TIMESTAMP=tmp.mct where id like 
'006389a6b10667f39bdbbdafdc4611e03cc04418cbc2619ddc01f54d88d7c3bf%' ;

+----------------------------------------------------------------------------------+
|                                        PLAN                                       |
+----------------------------------------------------------------------------------+
| SORT-MERGE-JOIN (INNER) TABLES |
|     CLIENT 36-CHUNK PARALLEL 36-WAY RANGE SCAN OVER ldmns:exDocStoreb [0] |
|         SERVER FILTER BY ID LIKE '006389a6b10667f39bdbbdafdc4611e03cc04418cbc2619ddc01f54d88d7c3bf%' |
|         SERVER SORTED BY [A.ID, A.CURRENT_TIMESTAMP] |
|     CLIENT MERGE SORT |
| AND (SKIP MERGE) |
|     CLIENT 36-CHUNK PARALLEL 36-WAY RANGE SCAN OVER ldmns:indx_exdocb [0,'006389a6b10667f39bdbbdafdc4611e03cc04418cbc2619ddc01f54d88d7c3bf'] - [0,'006389 |
|         SERVER FILTER BY FIRST KEY ONLY |
|         SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID"] |
|     CLIENT MERGE SORT |
|     CLIENT SORTED BY ["ID", MAX("CURRENT_TIMESTAMP")] |
+----------------------------------------------------------------------------------+
11 rows selected (0.025 seconds)

Here is explain plan with default join:
explain SELECT   ID, CURR_EXDOC, BINARY_CURR_EXDOC, CURRENT_TIMESTAMP, 
CURR_CHECKSUM, PREV_EXDOC, BINARY_PREV_EXDOC, PREV_CHECKSUM, PREV_TIMESTAMP 
from "ldmns:exDocStoreb" as a inner join (select max(CURRENT_TIMESTAMP) as mct, 
ID as tid from "ldmns:exDocStoreb" where ID like 
'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%' group by 
ID) as tmp on a.ID=tmp.tid and a.CURRENT_TIMESTAMP=tmp.mct where ID like 
'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%' ;

+----------------------------------------------------------------------------------+
|                                        PLAN                                       |
+----------------------------------------------------------------------------------+
| CLIENT 3-CHUNK PARALLEL 3-WAY RANGE SCAN OVER ldmns:exDocStoreb [0] |
|     SERVER FILTER BY ID LIKE '42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%' |
|     PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE) |
|         CLIENT 3-CHUNK PARALLEL 3-WAY RANGE SCAN OVER ldmns:indx_exdocb [0,'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf'] - [0,'42ecf4abd4bd7e7606025dc8eee3de6a3cc0 |
|             SERVER FILTER BY FIRST KEY ONLY |
|             SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID"] |
|         CLIENT MERGE SORT |
|     DYNAMIC SERVER FILTER BY (A.CURRENT_TIMESTAMP, A.ID) IN ((TMP.MCT, TMP.TID)) |
+----------------------------------------------------------------------------------+
8 rows selected (0.033 seconds)

Looking forward to hearing from you.
Best regards,
Sumit
  From: Sumit Nigam <sumit_o...

Re: Hash join confusion

2016-10-04 Thread Maryann Xue
Not sure if it's related, because your DDL does not have DESC columns, but we
do have a sort-merge-join bug fix in 4.8.0:
https://issues.apache.org/jira/browse/PHOENIX-2894.

Otherwise could you please just file a JIRA and assign to me? Thanks a lot!


Thanks,
Maryann

On Tue, Oct 4, 2016 at 8:24 PM, Sumit Nigam <sumit_o...@yahoo.com> wrote:

> Hi Maryann,
>
> *Here are the 2 DDLs (for data and index tables)*:
>
> CREATE TABLE IF NOT EXISTS "ldmns:exDocStoreb" (CURRENT_TIMESTAMP BIGINT
> NOT NULL, ID VARCHAR(96), BINARY_CURR_EXDOC VARBINARY, CURR_CHECKSUM
> VARCHAR(32), BINARY_PREV_EXDOC VARBINARY, PREV_CHECKSUM VARCHAR(32),
> PREV_TIMESTAMP BIGINT, SUMMARY VARCHAR, OBJ_SUMMARY VARCHAR, PARAM_SAMPLES
> VARCHAR, BULK_PUBLISH_UUID  VARCHAR, TOTAL_FACTS INTEGER, CURR_EXDOC
> VARCHAR, PREV_EXDOC VARCHAR *CONSTRAINT PK PRIMARY KEY(CURRENT_TIMESTAMP,
> ID)*) COMPRESSION = 'SNAPPY', BLOCKCACHE =  false, *SALT_BUCKETS = 36*
>
>
> CREATE INDEX IF NOT EXISTS "ldmns:indx_exdocb" ON "ldmns:exDocStoreb"(ID)
> INCLUDE (SUMMARY, OBJ_SUMMARY, PARAM_SAMPLES, BULK_PUBLISH_UUID)
>
>
>
> *Here is the upsert query for this table*:
>
> UPSERT INTO "ldmns:exDocStoreb" (CURRENT_TIMESTAMP, BULK_PUBLISH_UUID, ID,
> CURR_CHECKSUM, CURR_EXDOC, SUMMARY, PREV_EXDOC, PREV_CHECKSUM,
> PREV_TIMESTAMP, OBJ_SUMMARY, PARAM_SAMPLES, TOTAL_FACTS, BINARY_CURR_EXDOC,
> BINARY_PREV_EXDOC) VALUES (TO_NUMBER(CURRENT_TIME()), ?, ?, ?, ?, ?, ?, ?,
> ?, ?, ?, ?, ?, ?)
>
>
> *Here is explain plan of a SELECT with merge sort:*
>
> explain   select  */* +USE_SORT_MERGE_JOIN*/ * ID, CURR_EXDOC,
> BINARY_CURR_EXDOC, CURRENT_TIMESTAMP, CURR_CHECKSUM, PREV_EXDOC,
> BINARY_PREV_EXDOC, PREV_CHECKSUM, PREV_TIMESTAMP from "ldmns:exDocStoreb"
>  as a inner join (select max(CURRENT_TIMESTAMP) as mct, ID as tid from
> "ldmns:exDocStoreb" where ID like
> '006389a6b10667f39bdbbdafdc4611e03cc04418cbc2619ddc01f54d88d7c3bf%'
> group by ID) as tmp on a.ID=tmp.tid and a.CURRENT_TIMESTAMP=tmp.mct where id like
> '006389a6b10667f39bdbbdafdc4611e03cc04418cbc2619ddc01f54d88d7c3bf%' ;
> +----------------------------------------------------------------------------------+
> |                                        PLAN                                       |
> +----------------------------------------------------------------------------------+
> | SORT-MERGE-JOIN (INNER) TABLES |
> |     CLIENT 36-CHUNK PARALLEL 36-WAY RANGE SCAN OVER ldmns:exDocStoreb [0] |
> |         SERVER FILTER BY ID LIKE '006389a6b10667f39bdbbdafdc4611e03cc04418cbc2619ddc01f54d88d7c3bf%' |
> |         SERVER SORTED BY [A.ID, A.CURRENT_TIMESTAMP] |
> |     CLIENT MERGE SORT |
> | AND (SKIP MERGE) |
> |     CLIENT 36-CHUNK PARALLEL 36-WAY RANGE SCAN OVER ldmns:indx_exdocb [0,'006389a6b10667f39bdbbdafdc4611e03cc04418cbc2619ddc01f54d88d7c3bf'] - [0,'006389 |
> |         SERVER FILTER BY FIRST KEY ONLY |
> |         SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID"] |
> |     CLIENT MERGE SORT |
> |     CLIENT SORTED BY ["ID", MAX("CURRENT_TIMESTAMP")] |
> +----------------------------------------------------------------------------------+
> 11 rows selected (0.025 seconds)
>
>
> *Here is explain plan with default join:*
>
> explain SELECT   ID, CURR_EXDOC, BINARY_CURR_EXDOC, CURRENT_TIMESTAMP,
> CURR_CHECKSUM, PREV_EXDOC, BINARY_PREV_EXDOC, PREV_CHECKSUM, PREV_TIMESTAMP
> from "ldmns:exDocStoreb" as a inner join (select max(CURRENT_TIMESTAMP) as
> mct, ID as tid from "ldmns:exDocStoreb" where ID like
> '42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%' group
> by ID) as tmp on a.ID=tmp.tid and a.CURRENT_TIMESTAMP=tmp.mct where ID like
> '42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%' ;
> +----------------------------------------------------------------------------------+
> |                                        PLAN                                       |
> +----------------------------------------------------------------------------------+
> | CLIENT 3-CHUNK PARALLEL 3-WAY RANGE SCAN OVER ldmns:exDocStoreb [0] |
> |     SERVER FILTER BY ID LIKE '42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%' |
> |     PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE) |
> |         CLIENT 3-CHUNK PARALLEL 3-WAY RANGE SCAN OVER ldmns:indx_exdocb [0,'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf'] - [0,'42ecf4abd4bd7e7606025dc8eee3de6a3cc0 |
> |             SERVER FILTER BY FIRST KEY ONLY |
> |             SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID"] |
> |         CLIENT MERGE SORT |
> |     DYNAMIC SERVER FILTER BY (A.CURRENT_TIMESTAMP, A.ID) IN ((TMP.MCT, TMP.TID)) |
> +----------------------------------------------------------------------------------+
> 8 rows selected (0.033 seconds)
>
>
> Looking forward to hearing from you.
>
> Best regards,
> Sumit
>
>

Re: Hash join confusion

2016-10-04 Thread Sumit Nigam
Hi Maryann,
Here are the 2 DDLs (for data and index tables):
CREATE TABLE IF NOT EXISTS "ldmns:exDocStoreb" (CURRENT_TIMESTAMP BIGINT NOT 
NULL, ID VARCHAR(96), BINARY_CURR_EXDOC VARBINARY, CURR_CHECKSUM VARCHAR(32), 
BINARY_PREV_EXDOC VARBINARY, PREV_CHECKSUM VARCHAR(32), PREV_TIMESTAMP BIGINT, 
SUMMARY VARCHAR, OBJ_SUMMARY VARCHAR, PARAM_SAMPLES VARCHAR, BULK_PUBLISH_UUID  
VARCHAR, TOTAL_FACTS INTEGER, CURR_EXDOC VARCHAR, PREV_EXDOC VARCHAR CONSTRAINT 
PK PRIMARY KEY(CURRENT_TIMESTAMP, ID)) COMPRESSION = 'SNAPPY', BLOCKCACHE =  
false, SALT_BUCKETS = 36

CREATE INDEX IF NOT EXISTS "ldmns:indx_exdocb" ON "ldmns:exDocStoreb"(ID) 
INCLUDE (SUMMARY, OBJ_SUMMARY, PARAM_SAMPLES, BULK_PUBLISH_UUID)
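
If I recall Phoenix's global-index layout correctly (an assumption on my part, 
not something stated in this thread), the index row key is the indexed column 
followed by the data table's PK columns, with the salt byte up front since the 
base table is salted. That would explain why the aggregate sub-query in the 
plans below is answered from the index alone with a FIRST-KEY-ONLY scan:

    -- Conceptual row key of "ldmns:indx_exdocb" (my reading of the layout):
    --   (SALT_BYTE, ID, CURRENT_TIMESTAMP)
    -- ID is the indexed column; the data-table PK (CURRENT_TIMESTAMP, ID) is
    -- appended, so MAX(CURRENT_TIMESTAMP) per ID needs only index row keys:
    SELECT MAX(CURRENT_TIMESTAMP) AS mct, ID AS tid
    FROM "ldmns:exDocStoreb"
    WHERE ID LIKE '006389a6b10667f39bdbbdafdc4611e03cc04418cbc2619ddc01f54d88d7c3bf%'
    GROUP BY ID;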


Here is the upsert query for this table:
UPSERT INTO "ldmns:exDocStoreb" (CURRENT_TIMESTAMP, BULK_PUBLISH_UUID, ID, 
CURR_CHECKSUM, CURR_EXDOC, SUMMARY, PREV_EXDOC, PREV_CHECKSUM, PREV_TIMESTAMP, 
OBJ_SUMMARY, PARAM_SAMPLES, TOTAL_FACTS, BINARY_CURR_EXDOC, BINARY_PREV_EXDOC) 
VALUES (TO_NUMBER(CURRENT_TIME()), ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)

Here is explain plan of a SELECT with merge sort:
explain   select  /* +USE_SORT_MERGE_JOIN*/  ID, CURR_EXDOC, BINARY_CURR_EXDOC, 
CURRENT_TIMESTAMP, CURR_CHECKSUM, PREV_EXDOC, BINARY_PREV_EXDOC, PREV_CHECKSUM, 
PREV_TIMESTAMP from "ldmns:exDocStoreb"  as a inner join (select 
max(CURRENT_TIMESTAMP) as mct, ID as tid from "ldmns:exDocStoreb" where ID like 
'006389a6b10667f39bdbbdafdc4611e03cc04418cbc2619ddc01f54d88d7c3bf%' group by 
ID) as tmp on a.ID=tmp.tid and a.CURRENT_TIMESTAMP=tmp.mct where id like 
'006389a6b10667f39bdbbdafdc4611e03cc04418cbc2619ddc01f54d88d7c3bf%' 
;

+----------------------------------------------------------------------------------+
|                                        PLAN                                       |
+----------------------------------------------------------------------------------+
| SORT-MERGE-JOIN (INNER) TABLES |
|     CLIENT 36-CHUNK PARALLEL 36-WAY RANGE SCAN OVER ldmns:exDocStoreb [0] |
|         SERVER FILTER BY ID LIKE '006389a6b10667f39bdbbdafdc4611e03cc04418cbc2619ddc01f54d88d7c3bf%' |
|         SERVER SORTED BY [A.ID, A.CURRENT_TIMESTAMP] |
|     CLIENT MERGE SORT |
| AND (SKIP MERGE) |
|     CLIENT 36-CHUNK PARALLEL 36-WAY RANGE SCAN OVER ldmns:indx_exdocb [0,'006389a6b10667f39bdbbdafdc4611e03cc04418cbc2619ddc01f54d88d7c3bf'] - [0,'006389 |
|         SERVER FILTER BY FIRST KEY ONLY |
|         SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID"] |
|     CLIENT MERGE SORT |
|     CLIENT SORTED BY ["ID", MAX("CURRENT_TIMESTAMP")] |
+----------------------------------------------------------------------------------+
11 rows selected (0.025 seconds)
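
A line-by-line reading of that sort-merge plan (my own interpretation; as noted 
elsewhere in this thread, there is no documentation specifically for explain 
plans yet):

    -- SORT-MERGE-JOIN (INNER) TABLES: both inputs are produced in join-key
    --   order and then merged.
    -- LHS, RANGE SCAN OVER ldmns:exDocStoreb [0]: 36 parallel chunks because
    --   SALT_BUCKETS = 36 (one scan per salt bucket).
    -- SERVER FILTER BY ID LIKE ...: the outer WHERE, applied region-side.
    -- SERVER SORTED BY [A.ID, A.CURRENT_TIMESTAMP]: the PK is
    --   (CURRENT_TIMESTAMP, ID), so this side must be re-sorted into join-key
    --   order (ID, CURRENT_TIMESTAMP).
    -- AND (SKIP MERGE): as I read it, the RHS feeds only the join condition;
    --   its columns are not merged into the output rows.
    -- RHS, RANGE SCAN OVER ldmns:indx_exdocb: the index on ID serves the
    --   GROUP BY sub-query.
    -- SERVER FILTER BY FIRST KEY ONLY: only row keys are read.
    -- SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID"]: the GROUP BY ID
    --   with MAX(CURRENT_TIMESTAMP).
    -- CLIENT SORTED BY ["ID", MAX("CURRENT_TIMESTAMP")]: puts the aggregate
    --   output into join-key order before the merge.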

Here is explain plan with default join:
explain SELECT   ID, CURR_EXDOC, BINARY_CURR_EXDOC, CURRENT_TIMESTAMP, 
CURR_CHECKSUM, PREV_EXDOC, BINARY_PREV_EXDOC, PREV_CHECKSUM, PREV_TIMESTAMP 
from "ldmns:exDocStoreb" as a inner join (select max(CURRENT_TIMESTAMP) as mct, 
ID as tid from "ldmns:exDocStoreb" where ID like 
'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%' group by 
ID) as tmp on a.ID=tmp.tid and a.CURRENT_TIMESTAMP=tmp.mct where ID like 
'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%' 
;

+----------------------------------------------------------------------------------+
|                                        PLAN                                       |
+----------------------------------------------------------------------------------+
| CLIENT 3-CHUNK PARALLEL 3-WAY RANGE SCAN OVER ldmns:exDocStoreb [0] |
|     SERVER FILTER BY ID LIKE '42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%' |
|     PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE) |
|         CLIENT 3-CHUNK PARALLEL 3-WAY RANGE SCAN OVER ldmns:indx_exdocb [0,'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf'] - [0,'42ecf4abd4bd7e7606025dc8eee3de6a3cc0 |
|             SERVER FILTER BY FIRST KEY ONLY |
|             SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID"] |
|         CLIENT MERGE SORT |
|     DYNAMIC SERVER FILTER BY (A.CURRENT_TIMESTAMP, A.ID) IN ((TMP.MCT, TMP.TID)) |
+----------------------------------------------------------------------------------+
8 rows selected (0.033 seconds)
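
And the default (hash) plan read the same way (again my interpretation, tying 
it to Maryann's confirmation elsewhere in this thread that the sub-query is the 
side that gets cached):

    -- CLIENT 3-CHUNK PARALLEL 3-WAY RANGE SCAN OVER ldmns:exDocStoreb [0]:
    --   the probe side; its rows are streamed, never cached.
    -- SERVER FILTER BY ID LIKE ...: the outer WHERE, applied region-side.
    -- PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE): the sub-query (tmp) runs
    --   first; its result is the hash table serialized to the region servers
    --   hosting the data table (the "server cache" in the timeout and memory
    --   errors quoted elsewhere in this thread). SKIP MERGE: tmp's columns
    --   are not needed in the output, only its join keys.
    -- DYNAMIC SERVER FILTER BY (A.CURRENT_TIMESTAMP, A.ID) IN ((TMP.MCT,
    --   TMP.TID)): once tmp is computed, its join keys are pushed into the
    --   probe-side scan so the data table is not read in full.

Note there are no SERVER SORTED BY or CLIENT SORTED BY steps here: a hash join 
needs no ordering, which is what the sort-merge variant pays for instead.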

Looking forward to hearing from you.
Best regards,
Sumit
  From: Sumit Nigam <sumit_o...@yahoo.com>
 To: "user@phoenix.apache.org" <user@phoenix.apache.org> 
 Sent: Wednesday, October 5, 2016 12:13 AM
 Subject: Re: Hash join confusion
   
Thanks Maryann.
I will share the details in a few hours.
Under a heavy-load scenario, the default hash join failed with a time-out (and a 
memory issue), so I switched to sort-merge. But sort-merge is missing data 
randomly, so as of now I am not sure what the issue with sort-merge join is.
Hash join does not miss any data but has the issue of not fitting in memory 
(the actual issue with which I started this thread).
Thanks again!
Sumit

  From: Maryann Xue <maryann@gmail.com>
 To: Sumit Nigam <sumit_o...@yahoo.com>; "user@phoenix.apache.org" 
<user@phoenix.apache.org&

Re: Hash join confusion

2016-10-04 Thread Sumit Nigam
Thanks Maryann.
I will share the details in a few hours.
Under a heavy-load scenario, the default hash join failed with a time-out (and a 
memory issue), so I switched to sort-merge. But sort-merge is missing data 
randomly, so as of now I am not sure what the issue with sort-merge join is.
Hash join does not miss any data but has the issue of not fitting in memory 
(the actual issue with which I started this thread).
Thanks again!
Sumit

  From: Maryann Xue <maryann@gmail.com>
 To: Sumit Nigam <sumit_o...@yahoo.com>; "user@phoenix.apache.org" 
<user@phoenix.apache.org> 
 Sent: Tuesday, October 4, 2016 10:04 PM
 Subject: Re: Hash join confusion
   
Hi Sumit,

Thank you for the update! Would you mind sharing the queries and their plans, 
as well as the DDL for both the data tables and the index?

And just to confirm: are you saying hash joins are working with the config 
changes, or without?

Thanks,
Maryann
On Tue, Oct 4, 2016 at 9:17 AM Sumit Nigam <sumit_o...@yahoo.com> wrote:

Thank you Maryann.
From the time I have moved to sort-merge join, my use cases have stopped 
working. However, if I remove the hint (and fall back to hash), then they all 
work. I am on Phoenix 4.6 / HBase 1.1.
I thought just changing the join algorithm would be enough. I would assume that 
changing the hash join to sort-merge join would not alter the query results, 
right? Do I need to re-write my query?
I am using global index.
Thanks,
Sumit

  From: Maryann Xue <maryann@gmail.com>
 To: Sumit Nigam <sumit_o...@yahoo.com> 
Cc: "user@phoenix.apache.org" <user@phoenix.apache.org>
 Sent: Sunday, October 2, 2016 5:30 AM
 Subject: Re: Hash join confusion
  
Whether either or both sides of a sort-merge join have to be sorted simply 
depends on whether each side is already ordered on the join key.
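
In other words, the hint is the only change needed; Phoenix inserts any 
required sorts itself, which is what the SERVER SORTED BY and CLIENT SORTED BY 
steps in the plans above are. A minimal sketch with the tables from this thread 
(column lists trimmed):

    -- Default plan: hash join; the sub-query (tmp) is built into a server-side cache.
    SELECT a.ID, a.CURRENT_TIMESTAMP
    FROM "ldmns:exDocStoreb" AS a
    INNER JOIN (SELECT MAX(CURRENT_TIMESTAMP) AS mct, ID AS tid
                FROM "ldmns:exDocStoreb" GROUP BY ID) AS tmp
      ON a.ID = tmp.tid AND a.CURRENT_TIMESTAMP = tmp.mct;

    -- Sort-merge plan: same statement, only the hint differs; no manual ORDER BY.
    SELECT /*+ USE_SORT_MERGE_JOIN */ a.ID, a.CURRENT_TIMESTAMP
    FROM "ldmns:exDocStoreb" AS a
    INNER JOIN (SELECT MAX(CURRENT_TIMESTAMP) AS mct, ID AS tid
                FROM "ldmns:exDocStoreb" GROUP BY ID) AS tmp
      ON a.ID = tmp.tid AND a.CURRENT_TIMESTAMP = tmp.mct;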
So far we don't have any documentation specifically for explain plan yet, but 
the Phoenix website does have some examples for different types of queries or 
functionalities, including join queries.

Thanks,
Maryann
On Wed, Sep 28, 2016 at 10:52 PM, Sumit Nigam <sumit_o...@yahoo.com> wrote:

Thanks Maryann.
Yes, let me switch to sort-merge join because the other query uses many more 
columns. Also, if I just change the hint to use sort-merge, would that be 
enough, or do I need to sort both the driving query and the subquery with the 
same ORDER BY for the merge?
As an aside, is there a document to interpret explain plan?
Thanks,
Sumit
  From: Maryann Xue <maryann@gmail.com>
 To: Sumit Nigam <sumit_o...@yahoo.com> 
Cc: "user@phoenix.apache.org" <user@phoenix.apache.org>
 Sent: Thursday, September 29, 2016 11:03 AM
 Subject: Re: Hash join confusion
   
Thank you Sumit, for trying this out! So right now it's very clear that the 
table to be cached IS too big, so there is no point in using a hash join in 
this case. Is the other table much smaller, or is it about the same size or 
even bigger? If it's considerably smaller, you can probably rewrite your query 
to do the join the other way; otherwise let's just stick to sort-merge join.

Thanks,
Maryann
On Wed, Sep 28, 2016 at 10:29 PM, Sumit Nigam <sumit_o...@yahoo.com> wrote:

Thank you Maryann.
I am not using multi-tenancy for these tables. Increasing 
phoenix.coprocessor.maxServerCacheTimeToLiveMs and the corresponding cache size 
config just delayed the error.
I have also started seeing some memory problem -
Caused by: org.apache.phoenix.memory.InsufficientMemoryException: Requested 
memory of 22871932 bytes could not be allocated from remaining memory of 
776776654 bytes from global pool of 778469376 bytes after waiting for 1ms.
at org.apache.phoenix.memory.GlobalMemoryManager.allocateBytes(GlobalMemoryManager.java:78)
at org.apache.phoenix.memory.GlobalMemoryManager.access$300(GlobalMemoryManager.java:30)
at org.apache.phoenix.memory.GlobalMemoryManager$GlobalMemoryChunk.resize(GlobalMemoryManager.java:139)
What I am having trouble with is that the total size of the CSV produced by the 
sub-query is only ~7 MB. I have 12 region servers with a 5 GB heap each. So, when 
this result gets sent across to all region servers to perform the server-side 
join, I am not sure why a memory issue should show up (or a timeout occur). Any 
insights?

These tables are salted. Not sure if it is the 
https://issues.apache.org/jira/browse/PHOENIX-2900 issue.
Switching to sort-merge join helped. But I am not sure if that is the right 
solution going forward.
Thanks again!
Sumit

  From: Maryann Xue <maryann@gmail.com>
 To: "user@phoenix.apache.org" <user@phoenix.apache.org>; Sumit Nigam 
<sumit_o...@yahoo.com> 
 Sent: Wednesday, September 28, 2016 11:36 PM
 Subject: Re: Hash join confusion
   
Yes, Sumit, the sub-query will get cached in hash join. Are you using 
multi-tenancy for these tables? If yes, you might want to checkou

Re: Hash join confusion

2016-10-04 Thread Sumit Nigam
Thank you Maryann.
From the time I have moved to sort-merge join, my use cases have stopped 
working. However, if I remove the hint (and fall back to hash), then they all 
work. I am on Phoenix 4.6 / HBase 1.1.
I thought just changing the join algorithm would be enough. I would assume that 
changing the hash join to sort-merge join would not alter the query results, 
right? Do I need to re-write my query?
I am using global index.
Thanks,
Sumit

  From: Maryann Xue <maryann@gmail.com>
 To: Sumit Nigam <sumit_o...@yahoo.com> 
Cc: "user@phoenix.apache.org" <user@phoenix.apache.org>
 Sent: Sunday, October 2, 2016 5:30 AM
 Subject: Re: Hash join confusion
   
Whether either or both sides of a sort-merge join have to be sorted simply 
depends on whether each side is already ordered on the join key.
So far we don't have any documentation specifically for explain plan yet, but 
the Phoenix website does have some examples for different types of queries or 
functionalities, including join queries.

Thanks,
Maryann
On Wed, Sep 28, 2016 at 10:52 PM, Sumit Nigam <sumit_o...@yahoo.com> wrote:

Thanks Maryann.
Yes, let me switch to sort-merge join because the other query uses many more 
columns. Also, if I just change the hint to use sort-merge, would that be 
enough, or do I need to sort both the driving query and the subquery with the 
same ORDER BY for the merge?
As an aside, is there a document to interpret explain plan?
Thanks,
Sumit
  From: Maryann Xue <maryann@gmail.com>
 To: Sumit Nigam <sumit_o...@yahoo.com> 
Cc: "user@phoenix.apache.org" <user@phoenix.apache.org>
 Sent: Thursday, September 29, 2016 11:03 AM
 Subject: Re: Hash join confusion
   
Thank you Sumit, for trying this out! So right now it's very clear that the 
table to be cached IS too big, so there is no point in using a hash join in 
this case. Is the other table much smaller, or is it about the same size or 
even bigger? If it's considerably smaller, you can probably rewrite your query 
to do the join the other way; otherwise let's just stick to sort-merge join.

Thanks,
Maryann
On Wed, Sep 28, 2016 at 10:29 PM, Sumit Nigam <sumit_o...@yahoo.com> wrote:

Thank you Maryann.
I am not using multi-tenancy for these tables. Increasing 
phoenix.coprocessor.maxServerCacheTimeToLiveMs and the corresponding cache size 
config just delayed the error.
I have also started seeing some memory problem -
Caused by: org.apache.phoenix.memory.InsufficientMemoryException: Requested 
memory of 22871932 bytes could not be allocated from remaining memory of 
776776654 bytes from global pool of 778469376 bytes after waiting for 1ms.
at org.apache.phoenix.memory.GlobalMemoryManager.allocateBytes(GlobalMemoryManager.java:78)
at org.apache.phoenix.memory.GlobalMemoryManager.access$300(GlobalMemoryManager.java:30)
at org.apache.phoenix.memory.GlobalMemoryManager$GlobalMemoryChunk.resize(GlobalMemoryManager.java:139)
What I am having trouble with is that the total size of the CSV produced by the 
sub-query is only ~7 MB. I have 12 region servers with a 5 GB heap each. So, when 
this result gets sent across to all region servers to perform the server-side 
join, I am not sure why a memory issue should show up (or a timeout occur). Any 
insights?

These tables are salted. Not sure if it is the 
https://issues.apache.org/jira/browse/PHOENIX-2900 issue.
Switching to sort-merge join helped. But I am not sure if that is the right 
solution going forward.
Thanks again!
Sumit

  From: Maryann Xue <maryann@gmail.com>
 To: "user@phoenix.apache.org" <user@phoenix.apache.org>; Sumit Nigam 
<sumit_o...@yahoo.com> 
 Sent: Wednesday, September 28, 2016 11:36 PM
 Subject: Re: Hash join confusion
   
Yes, Sumit, the sub-query will get cached in hash join. Are you using 
multi-tenancy for these tables? If yes, you might want to check out Phoenix 4.7 
or 4.8, since a related bug fix got in the 4.7 release. 
https://issues.apache.org/jira/browse/PHOENIX-2381?jql=project%20%3D%20PHOENIX%20AND%20text%20~%20%22hash%20cache%20id%22
Otherwise I think it's the hash cache timeout issue, in which case changing 
phoenix.coprocessor.maxServerCacheTimeToLiveMs might be helpful.

Thanks,
Maryann
On Tue, Sep 27, 2016 at 10:02 PM, Sumit Nigam <sumit_o...@yahoo.com> wrote:

Hi,
Is there any document which can help me understand explain plan output in 
detail? Or, which piece of code should I look at, to get an idea?
Here is the explain plan for the inner join query below. Can anyone help in 
explaining it to me? Like, as per the plan, which table is being cached, etc.? 
Here, indx_exdocb is the index table (on ID) and exDocStoreb is the main table 
with row key (current_timestamp, ID).
+----------------------------------------------------------------------------------+
|                                        PLAN                                       |
+----------------------------------------------------------------------------------+
| CLIENT 36-CHUNK PARALLEL 36-WAY FULL SCAN OVER exDocSto

Re: Hash join confusion

2016-09-28 Thread Sumit Nigam
Thanks Maryann.
Yes, let me switch to sort-merge join because the other query uses many more 
columns. Also, if I just change the hint to use sort-merge, would that be 
enough, or do I need to sort both the driving query and the subquery with the 
same ORDER BY for the merge?
As an aside, is there a document to interpret explain plan?
Thanks,
Sumit
  From: Maryann Xue <maryann@gmail.com>
 To: Sumit Nigam <sumit_o...@yahoo.com> 
Cc: "user@phoenix.apache.org" <user@phoenix.apache.org>
 Sent: Thursday, September 29, 2016 11:03 AM
 Subject: Re: Hash join confusion
   
Thank you Sumit, for trying this out! So right now it's very clear that the 
table to be cached IS too big, so there is no point in using a hash join in 
this case. Is the other table much smaller, or is it about the same size or 
even bigger? If it's considerably smaller, you can probably rewrite your query 
to do the join the other way; otherwise let's just stick to sort-merge join.

Thanks,
Maryann
On Wed, Sep 28, 2016 at 10:29 PM, Sumit Nigam <sumit_o...@yahoo.com> wrote:

Thank you Maryann.
I am not using multi-tenancy for these tables. Increasing 
phoenix.coprocessor.maxServerCacheTimeToLiveMs and the corresponding cache size 
config just delayed the error.
I have also started seeing some memory problem -
Caused by: org.apache.phoenix.memory.InsufficientMemoryException: Requested 
memory of 22871932 bytes could not be allocated from remaining memory of 
776776654 bytes from global pool of 778469376 bytes after waiting for 1ms.
at org.apache.phoenix.memory.GlobalMemoryManager.allocateBytes(GlobalMemoryManager.java:78)
at org.apache.phoenix.memory.GlobalMemoryManager.access$300(GlobalMemoryManager.java:30)
at org.apache.phoenix.memory.GlobalMemoryManager$GlobalMemoryChunk.resize(GlobalMemoryManager.java:139)
What I am having trouble with is that the total size of the CSV produced by the 
sub-query is only ~7 MB. I have 12 region servers with a 5 GB heap each. So, when 
this result gets sent across to all region servers to perform the server-side 
join, I am not sure why a memory issue should show up (or a timeout occur). Any 
insights?

These tables are salted. Not sure if it is the 
https://issues.apache.org/jira/browse/PHOENIX-2900 issue.
Switching to sort-merge join helped. But I am not sure if that is the right 
solution going forward.
Thanks again!
Sumit

  From: Maryann Xue <maryann@gmail.com>
 To: "user@phoenix.apache.org" <user@phoenix.apache.org>; Sumit Nigam 
<sumit_o...@yahoo.com> 
 Sent: Wednesday, September 28, 2016 11:36 PM
 Subject: Re: Hash join confusion
   
Yes, Sumit, the sub-query will get cached in hash join. Are you using 
multi-tenancy for these tables? If yes, you might want to check out Phoenix 4.7 
or 4.8, since a related bug fix got in the 4.7 release. 
https://issues.apache.org/jira/browse/PHOENIX-2381?jql=project%20%3D%20PHOENIX%20AND%20text%20~%20%22hash%20cache%20id%22
Otherwise I think it's the hash cache timeout issue, in which case changing 
phoenix.coprocessor.maxServerCacheTimeToLiveMs might be helpful.

Thanks,
Maryann
On Tue, Sep 27, 2016 at 10:02 PM, Sumit Nigam <sumit_o...@yahoo.com> wrote:

Hi,
Is there any document which can help me understand explain plan output in 
detail? Or, which piece of code should I look at, to get an idea?
Here is the explain plan for the inner join query below. Can anyone help in 
explaining it to me? Like, as per the plan, which table is being cached, etc.? 
Here, indx_exdocb is the index table (on ID) and exDocStoreb is the main table 
with row key (current_timestamp, ID).
+----------------------------------------------------------------------------------+
|                                        PLAN                                       |
+----------------------------------------------------------------------------------+
| CLIENT 36-CHUNK PARALLEL 36-WAY FULL SCAN OVER exDocStoreb |
|     PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE) |
|         CLIENT 36-CHUNK PARALLEL 36-WAY RANGE SCAN OVER indx_exdocb [0,'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf'] - [0,'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bg' |
|             SERVER FILTER BY FIRST KEY ONLY |
|             SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID"] |
|         CLIENT MERGE SORT |
|     DYNAMIC SERVER FILTER BY (A.CURRENT_TIMESTAMP, A.ID) IN ((TMP.MCT, TMP.TID)) |
+----------------------------------------------------------------------------------+
Also, is there a way to turn on a more verbose explain plan? Like, seeing the 
number of bytes and rows that each step results in?
Thanks,
Sumit

  From: Sumit Nigam <sumit_o...@yahoo.com>
 To: Users Mail List Phoenix <user@phoenix.apache.org> 
 Sent: Tuesday, September 27, 2016 9:17 PM
 Subject: Hash join confusion
  
Hi,
I am using hbase 1.1 with phoenix 4.6. 
I have a table with row key as (current_timestamp, id) which is salted and 
index on (id). This table has ~3 million records.
I have a query like given below. 
SELECT  ID, CURRENT_TIMESTAMP,  fro

Re: Hash join confusion

2016-09-28 Thread Maryann Xue
Thank you Sumit, for trying this out! So right now it's very clear that the
table to be cached IS too big, so there is no point in using a hash join in
this case. Is the other table much smaller, or is it about the same size or
even bigger? If it's considerably smaller, you can probably rewrite your
query to do the join the other way; otherwise let's just stick to
sort-merge join.


Thanks,
Maryann
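
Per the Phoenix joins page cited later in this thread, in lhs INNER JOIN rhs it 
is the rhs that gets built into the server cache, so flipping the two sides of 
the join swaps which relation is cached. A sketch with the tables from this 
thread (column lists trimmed; only worth doing if the side moved to the rhs is 
the considerably smaller one):

    -- Original shape: the aggregate sub-query (tmp) is the rhs, so tmp is cached.
    SELECT a.ID, a.CURRENT_TIMESTAMP
    FROM "ldmns:exDocStoreb" AS a
    INNER JOIN (SELECT MAX(CURRENT_TIMESTAMP) AS mct, ID AS tid
                FROM "ldmns:exDocStoreb" GROUP BY ID) AS tmp
      ON a.ID = tmp.tid AND a.CURRENT_TIMESTAMP = tmp.mct;

    -- Flipped shape: the data table is now the rhs and would be cached instead.
    SELECT a.ID, a.CURRENT_TIMESTAMP
    FROM (SELECT MAX(CURRENT_TIMESTAMP) AS mct, ID AS tid
          FROM "ldmns:exDocStoreb" GROUP BY ID) AS tmp
    INNER JOIN "ldmns:exDocStoreb" AS a
      ON a.ID = tmp.tid AND a.CURRENT_TIMESTAMP = tmp.mct;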

On Wed, Sep 28, 2016 at 10:29 PM, Sumit Nigam <sumit_o...@yahoo.com> wrote:

> Thank you Maryann.
>
> I am not using multi-tenancy for these tables. Increasing
> phoenix.coprocessor.maxServerCacheTimeToLiveMs and the corresponding cache
> size config just delayed the error.
>
> I have also started seeing some memory problem -
>
> Caused by: org.apache.phoenix.memory.*InsufficientMemoryException*: Requested 
> memory of 22871932 bytes could not be allocated from remaining memory of 
> 776776654 bytes from global pool of 778469376 bytes after waiting for 1ms.
>   at 
> org.apache.phoenix.memory.GlobalMemoryManager.allocateBytes(GlobalMemoryManager.java:78)
>   at 
> org.apache.phoenix.memory.GlobalMemoryManager.access$300(GlobalMemoryManager.java:30)
>   at 
> org.apache.phoenix.memory.GlobalMemoryManager$GlobalMemoryChunk.resize(GlobalMemoryManager.java:139)
>
>
> What I am having trouble with is that the total size of the CSV produced by
> the sub-query is only ~7 MB. I have 12 region servers with a 5 GB heap each. So,
> when this result gets sent across to all region servers to perform the
> server-side join, I am not sure why a memory issue should show up (or a timeout
> occur). Any insights?
>
> These tables are salted. Not sure if it is the
> https://issues.apache.org/jira/browse/PHOENIX-2900 issue.
>
> Switching to sort-merge join helped. But I am not sure if that is the right
> solution going forward.
>
> Thanks again!
> Sumit
>
>
> --
> *From:* Maryann Xue <maryann@gmail.com>
> *To:* "user@phoenix.apache.org" <user@phoenix.apache.org>; Sumit Nigam <
> sumit_o...@yahoo.com>
> *Sent:* Wednesday, September 28, 2016 11:36 PM
> *Subject:* Re: Hash join confusion
>
> Yes, Sumit, the sub-query will get cached in hash join. Are you using
> multi-tenancy for these tables? If yes, you might want to check out Phoenix
> 4.7 or 4.8, since a related bug fix got in the 4.7 release.
> https://issues.apache.org/jira/browse/PHOENIX-2381?jql=project%20%
> 3D%20PHOENIX%20AND%20text%20~%20%22hash%20cache%20id%22
>
> Otherwise I think it's the hash cache timeout issue, in which case
> changing phoenix.coprocessor.maxServerCacheTimeToLiveMs might be helpful.
>
>
> Thanks,
> Maryann
>
> On Tue, Sep 27, 2016 at 10:02 PM, Sumit Nigam <sumit_o...@yahoo.com>
> wrote:
>
> Hi,
>
> Is there any document which can help me understand explain plan output in
> detail? Or, which piece of code should I look at, to get an idea?
>
> Here is explain plan for inner join query below. Can anyone help in
> explaining it to me? Like, as per the plan which table is being cached,
> etc.?
> Here, *indx_exdocb* is index table* (on ID) *and *exDocStoreb *is main
> table with rowkey as (current_timestamp, ID).
>
> +----------------------------------------------------------------------------------+
> |                                        PLAN                                       |
> +----------------------------------------------------------------------------------+
> | CLIENT 36-CHUNK PARALLEL 36-WAY FULL SCAN OVER *exDocStoreb* |
> |     PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE) |
> |         CLIENT 36-CHUNK PARALLEL 36-WAY RANGE SCAN OVER *indx_exdocb* [0,'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf'] - [0,'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bg' |
> |             SERVER FILTER BY FIRST KEY ONLY |
> |             SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID"] |
> |         CLIENT MERGE SORT |
> |     DYNAMIC SERVER FILTER BY (A.CURRENT_TIMESTAMP, A.ID) IN ((TMP.MCT, TMP.TID)) |
> +----------------------------------------------------------------------------------+
>
> Also, is there a way to turn on a more verbose explain plan? Like, seeing
> the number of bytes and rows that each step results in?
>
> Thanks,
> Sumit
>
> --
> *From:* Sumit Nigam <sumit_o...@yahoo.com>
> *To:* Users Mail List Phoenix <user@phoenix.apache.org>
> *Sent:* Tuesday, September 27, 2016 9:17 PM
> *Subject:* Hash join confusion
>
> Hi,
>
> I am using hbase 1.1 with phoenix 4.6.
>
> I have a table with row key as (current_timestamp, id) which is salted and
> index on (id). This table has ~3 million records.
>
> I have a query like given below.
>
> SELECT  ID, CURRENT_TIMESTAMP,  from TBL
> 

Re: Hash join confusion

2016-09-28 Thread Sumit Nigam
Thank you Maryann.
I am not using multi-tenancy for these tables. Increasing 
phoenix.coprocessor.maxServerCacheTimeToLiveMs and the corresponding cache size 
config just delayed the error. 
I have also started seeing some memory problem -
Caused by: org.apache.phoenix.memory.InsufficientMemoryException: Requested 
memory of 22871932 bytes could not be allocated from remaining memory of 
776776654 bytes from global pool of 778469376 bytes after waiting for 1ms.
at 
org.apache.phoenix.memory.GlobalMemoryManager.allocateBytes(GlobalMemoryManager.java:78)
at 
org.apache.phoenix.memory.GlobalMemoryManager.access$300(GlobalMemoryManager.java:30)
at 
org.apache.phoenix.memory.GlobalMemoryManager$GlobalMemoryChunk.resize(GlobalMemoryManager.java:139)
What I am having trouble with is that the total size of the CSV produced by the 
sub-query is only ~7 MB. I have 12 region servers with a 5 GB heap each. So, when 
this result gets sent across to all region servers to perform the server-side 
join, I am not sure why a memory issue should show up (or a timeout occur). Any 
insights?

These tables are salted. Not sure if it is the 
https://issues.apache.org/jira/browse/PHOENIX-2900 issue.
Switching to sort-merge join helped. But I am not sure if that is the right 
solution going forward.
Thanks again!
Sumit

  From: Maryann Xue <maryann@gmail.com>
 To: "user@phoenix.apache.org" <user@phoenix.apache.org>; Sumit Nigam 
<sumit_o...@yahoo.com> 
 Sent: Wednesday, September 28, 2016 11:36 PM
 Subject: Re: Hash join confusion
   
Yes, Sumit, the sub-query will get cached in hash join. Are you using 
multi-tenancy for these tables? If yes, you might want to check out Phoenix 4.7 
or 4.8, since a related bug fix got in the 4.7 release. 
https://issues.apache.org/jira/browse/PHOENIX-2381?jql=project%20%3D%20PHOENIX%20AND%20text%20~%20%22hash%20cache%20id%22
Otherwise I think it's the hash cache timeout issue, in which case changing 
phoenix.coprocessor.maxServerCacheTimeToLiveMs might be helpful.

Thanks,
Maryann
On Tue, Sep 27, 2016 at 10:02 PM, Sumit Nigam <sumit_o...@yahoo.com> wrote:

Hi,
Is there any document which can help me understand explain plan output in 
detail? Or, which piece of code should I look at, to get an idea?
Here is the explain plan for the inner join query below. Can anyone help in 
explaining it to me? Like, as per the plan, which table is being cached, etc.? 
Here, indx_exdocb is the index table (on ID) and exDocStoreb is the main table 
with row key (current_timestamp, ID).
+----------------------------------------------------------------------------------+
|                                        PLAN                                       |
+----------------------------------------------------------------------------------+
| CLIENT 36-CHUNK PARALLEL 36-WAY FULL SCAN OVER exDocStoreb |
|     PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE) |
|         CLIENT 36-CHUNK PARALLEL 36-WAY RANGE SCAN OVER indx_exdocb [0,'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf'] - [0,'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bg' |
|             SERVER FILTER BY FIRST KEY ONLY |
|             SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID"] |
|         CLIENT MERGE SORT |
|     DYNAMIC SERVER FILTER BY (A.CURRENT_TIMESTAMP, A.ID) IN ((TMP.MCT, TMP.TID)) |
+----------------------------------------------------------------------------------+
Also, is there a way to turn on a more verbose explain plan? Like, seeing the 
number of bytes and rows that each step results in?
Thanks,
Sumit

  From: Sumit Nigam <sumit_o...@yahoo.com>
 To: Users Mail List Phoenix <user@phoenix.apache.org> 
 Sent: Tuesday, September 27, 2016 9:17 PM
 Subject: Hash join confusion
  
Hi,
I am using hbase 1.1 with phoenix 4.6. 
I have a table with row key as (current_timestamp, id) which is salted and 
index on (id). This table has ~3 million records.
I have a query like given below. 
SELECT  ID, CURRENT_TIMESTAMP,  from TBL as a
    inner join (select max(CURRENT_TIMESTAMP) as mct, ID as tid from TBL
                where ID like '42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%'
                group by ID) as tmp
    on a.ID=tmp.tid and a.CURRENT_TIMESTAMP=tmp.mct


The query hangs for a long time and finally fails with a timeout. I have 12 
region servers, each with a 5 GB heap, and the total number of records 
satisfying the above query is only 62K, whose CSV dump is ~10 MB.
DoNotRetryIOException: Could not find hash cache for join Id: Ӧ�8�D�. The cache 
might have expired and have been removed

and - 
Caused by: java.sql.SQLException: Encountered exception in sub plan [0] 
execution.
at org.apache.phoenix.execute.HashJoinPlan.iterator(HashJoinPlan.java:175)
at com.infa.products.ldm.ingestion.server.java.hadoop.impl.FixPhoenixIngestInputFormat.getQueryPlan(FixPhoenixIngestInputFormat.java:94)
... 22 more
and
Caused by: java.sql.SQLException: java.util.concurrent.TimeoutException
at org.apache.phoenix.cache.S

Re: Hash join confusion

2016-09-28 Thread Maryann Xue
Yes, Sumit, the sub-query will get cached in hash join. Are you using
multi-tenancy for these tables? If yes, you might want to check out Phoenix
4.7 or 4.8, since a related bug fix got in the 4.7 release.
https://issues.apache.org/jira/browse/PHOENIX-2381?jql=project%20%3D%20PHOENIX%20AND%20text%20~%20%22hash%20cache%20id%22

Otherwise I think it's the hash cache timeout issue, in which case changing
phoenix.coprocessor.maxServerCacheTimeToLiveMs might be helpful.


Thanks,
Maryann

On Tue, Sep 27, 2016 at 10:02 PM, Sumit Nigam <sumit_o...@yahoo.com> wrote:

> Hi,
>
> Is there any document which can help me understand explain plan output in
> detail? Or, which piece of code should I look at, to get an idea?
>
> Here is explain plan for inner join query below. Can anyone help in
> explaining it to me? Like, as per the plan which table is being cached,
> etc.?
> Here, *indx_exdocb* is index table* (on ID) *and *exDocStoreb *is main
> table with rowkey as (current_timestamp, ID).
>
> +----------------------------------------------------------------------------------+
> |                                        PLAN                                       |
> +----------------------------------------------------------------------------------+
> | CLIENT 36-CHUNK PARALLEL 36-WAY FULL SCAN OVER *exDocStoreb* |
> |     PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE) |
> |         CLIENT 36-CHUNK PARALLEL 36-WAY RANGE SCAN OVER *indx_exdocb* [0,'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf'] - [0,'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bg' |
> |             SERVER FILTER BY FIRST KEY ONLY |
> |             SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID"] |
> |         CLIENT MERGE SORT |
> |     DYNAMIC SERVER FILTER BY (A.CURRENT_TIMESTAMP, A.ID) IN ((TMP.MCT, TMP.TID)) |
> +----------------------------------------------------------------------------------+
>
> Also, is there a way to turn on a more verbose explain plan? Like, seeing
> the number of bytes and rows that each step results in?
>
> Thanks,
> Sumit
>
> --
> *From:* Sumit Nigam <sumit_o...@yahoo.com>
> *To:* Users Mail List Phoenix <user@phoenix.apache.org>
> *Sent:* Tuesday, September 27, 2016 9:17 PM
> *Subject:* Hash join confusion
>
> Hi,
>
> I am using hbase 1.1 with phoenix 4.6.
>
> I have a table with row key as (current_timestamp, id) which is salted and
> index on (id). This table has ~3 million records.
>
> I have a query like given below.
>
> SELECT  ID, CURRENT_TIMESTAMP,  from TBL
>as a inner join (
> select max(CURRENT_TIMESTAMP) as mct, ID
> as tid from TBL where ID like '42ecf4abd4bd7e7606025dc8eee3de
> 6a3cc04418cbc2619ddc01f54d88d7c3bf%' group by ID) as tmp
>on a.ID=tmp.tid and a.CURRENT_TIMESTAMP=tmp.mct
>
>
> The query hangs for a long time and finally fails with a timeout. I have 12
> region servers, each with a 5 GB heap, and the total number of records
> satisfying the above query is only 62K, whose CSV dump is ~10 MB.
>
> DoNotRetryIOException: Could not find *hash cache for join Id*: Ӧ�8�D�.
> The cache might have expired and have been removed
>
> and -
>
> Caused by: java.sql.SQLException: Encountered exception in sub plan [0] 
> execution.
>   at 
> org.apache.phoenix.execute.HashJoinPlan.iterator(HashJoinPlan.java:175)
>   at 
> com.infa.products.ldm.ingestion.server.java.hadoop.impl.FixPhoenixIngestInputFormat.getQueryPlan(FixPhoenixIngestInputFormat.java:94)
>   ... 22 more
>
>
> andCaused by: java.sql.SQLException:
> java.util.concurrent.*TimeoutException*
>
> at 
> org.apache.phoenix.cache.ServerCacheClient.addServerCache(ServerCacheClient.java:264)
>
>
> I can try playing around with parameters such as 
> phoenix.coprocessor.maxServerCacheTimeToLiveMs; switching to 
> sort_merge_join actually helped.
>
> But my question is: as per Joins | Apache Phoenix 
> <http://phoenix.apache.org/joins.html>, in a case such as *lhs* INNER JOIN 
> *rhs*, it is *rhs* which will be built as the hash table in the server cache. 
> So, in the above query I assume this gets cached?
>
>  select max(CURRENT_TIMESTAMP) as mct, ID as tid from TBL where ID like 
> '42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%' group by 
> ID) as tmp
>
> Thanks,
> Sumit
>
>
>
>


Re: Hash join confusion

2016-09-27 Thread Sumit Nigam
Hi,
Is there any document which can help me understand explain plan output in 
detail? Or, which piece of code should I look at, to get an idea?
Here is the explain plan for the inner join query below. Can anyone help in 
explaining it to me? Like, as per the plan, which table is being cached, etc.? 
Here, indx_exdocb is the index table (on ID) and exDocStoreb is the main table 
with row key (current_timestamp, ID).
+----------------------------------------------------------------------------------+
|                                        PLAN                                       |
+----------------------------------------------------------------------------------+
| CLIENT 36-CHUNK PARALLEL 36-WAY FULL SCAN OVER exDocStoreb |
|     PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE) |
|         CLIENT 36-CHUNK PARALLEL 36-WAY RANGE SCAN OVER indx_exdocb [0,'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf'] - [0,'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bg' |
|             SERVER FILTER BY FIRST KEY ONLY |
|             SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID"] |
|         CLIENT MERGE SORT |
|     DYNAMIC SERVER FILTER BY (A.CURRENT_TIMESTAMP, A.ID) IN ((TMP.MCT, TMP.TID)) |
+----------------------------------------------------------------------------------+
Also, is there a way to turn on a more verbose explain plan? Like, seeing the 
number of bytes and rows that each step results in?
Thanks,
Sumit

  From: Sumit Nigam <sumit_o...@yahoo.com>
 To: Users Mail List Phoenix <user@phoenix.apache.org> 
 Sent: Tuesday, September 27, 2016 9:17 PM
 Subject: Hash join confusion
   
Hi,
I am using hbase 1.1 with phoenix 4.6. 
I have a table with row key as (current_timestamp, id) which is salted and 
index on (id). This table has ~3 million records.
I have a query like given below. 
SELECT  ID, CURRENT_TIMESTAMP,  from TBL as a
    inner join (select max(CURRENT_TIMESTAMP) as mct, ID as tid from TBL
                where ID like '42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%'
                group by ID) as tmp
    on a.ID=tmp.tid and a.CURRENT_TIMESTAMP=tmp.mct


The query hangs for a long time and finally fails with a timeout. I have 12 
region servers, each with a 5 GB heap, and the total number of records 
satisfying the above query is only 62K, whose CSV dump is ~10 MB.
DoNotRetryIOException: Could not find hash cache for join Id: Ӧ�8�D�. The cache 
might have expired and have been removed

and - 
Caused by: java.sql.SQLException: Encountered exception in sub plan [0] 
execution.
at 
org.apache.phoenix.execute.HashJoinPlan.iterator(HashJoinPlan.java:175)
at 
com.infa.products.ldm.ingestion.server.java.hadoop.impl.FixPhoenixIngestInputFormat.getQueryPlan(FixPhoenixIngestInputFormat.java:94)
... 22 more
and
Caused by: java.sql.SQLException: java.util.concurrent.TimeoutException
at 
org.apache.phoenix.cache.ServerCacheClient.addServerCache(ServerCacheClient.java:264)

I can try playing around with parameters such as 
phoenix.coprocessor.maxServerCacheTimeToLiveMs; switching to sort_merge_join 
actually helped.
But my question is: as per Joins | Apache Phoenix, in a case such as lhs INNER 
JOIN rhs, it is rhs which will be built as the hash table in the server cache. 
So, in the above query I assume this gets cached?
 select max(CURRENT_TIMESTAMP) as mct, ID as tid from TBL where ID like 
'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%' group by 
ID) as tmp 
Thanks,
Sumit

   

Hash join confusion

2016-09-27 Thread Sumit Nigam
Hi,
I am using hbase 1.1 with phoenix 4.6. 
I have a table with row key as (current_timestamp, id) which is salted and 
index on (id). This table has ~3 million records.
I have a query like given below. 
SELECT  ID, CURRENT_TIMESTAMP,  from TBL as a
    inner join (select max(CURRENT_TIMESTAMP) as mct, ID as tid from TBL
                where ID like '42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%'
                group by ID) as tmp
    on a.ID=tmp.tid and a.CURRENT_TIMESTAMP=tmp.mct


The query hangs for a long time and finally fails with a timeout. I have 12 
region servers, each with a 5 GB heap, and the total number of records 
satisfying the above query is only 62K, whose CSV dump is ~10 MB.
DoNotRetryIOException: Could not find hash cache for join Id: Ӧ�8�D�. The cache 
might have expired and have been removed

and - 
Caused by: java.sql.SQLException: Encountered exception in sub plan [0] 
execution.
at 
org.apache.phoenix.execute.HashJoinPlan.iterator(HashJoinPlan.java:175)
at 
com.infa.products.ldm.ingestion.server.java.hadoop.impl.FixPhoenixIngestInputFormat.getQueryPlan(FixPhoenixIngestInputFormat.java:94)
... 22 more
and
Caused by: java.sql.SQLException: java.util.concurrent.TimeoutException
at 
org.apache.phoenix.cache.ServerCacheClient.addServerCache(ServerCacheClient.java:264)

I can try playing around with parameters such as 
phoenix.coprocessor.maxServerCacheTimeToLiveMs; switching to sort_merge_join 
actually helped.
But my question is: as per Joins | Apache Phoenix, in a case such as lhs INNER 
JOIN rhs, it is rhs which will be built as the hash table in the server cache. 
So, in the above query I assume this gets cached?
 select max(CURRENT_TIMESTAMP) as mct, ID as tid from TBL where ID like 
'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%' group by 
ID) as tmp 
Thanks,
Sumit
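
Maryann confirms earlier in this thread that it does: in lhs INNER JOIN rhs, 
the rhs is the build side. An annotated version of the query (my annotation, 
based on her reply and the joins page):

    SELECT ID, CURRENT_TIMESTAMP
    FROM TBL AS a                        -- probe side: streamed, never cached
    INNER JOIN (SELECT MAX(CURRENT_TIMESTAMP) AS mct, ID AS tid
                FROM TBL
                WHERE ID LIKE '42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%'
                GROUP BY ID) AS tmp      -- build side: executed first, then
                                         -- serialized to every region server
                                         -- hosting TBL as the hash cache
      ON a.ID = tmp.tid AND a.CURRENT_TIMESTAMP = tmp.mct;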