Not sure if it's related, since your DDL does not have DESC columns, but we
do have a sort-merge-join bug fix in 4.8.0:
https://issues.apache.org/jira/browse/PHOENIX-2894.
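
For reference, that fix only matters if a primary-key column is declared with
DESC sort order. A hypothetical sketch of such a DDL (made-up table and
columns, just to illustrate what to look for):

CREATE TABLE IF NOT EXISTS EXAMPLE_DESC (
    TS BIGINT NOT NULL,
    ID VARCHAR(96),
    CONSTRAINT PK PRIMARY KEY (TS DESC, ID)
) SALT_BUCKETS = 36;

Since your DDL below declares both PK columns in the default ascending order,
it should not be affected by that particular bug.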

Otherwise, could you please just file a JIRA and assign it to me? Thanks a lot!


Thanks,
Maryann

On Tue, Oct 4, 2016 at 8:24 PM, Sumit Nigam <sumit_o...@yahoo.com> wrote:

> Hi Maryann,
>
> *Here are the 2 DDLs (for data and index tables)*:
>
> CREATE TABLE IF NOT EXISTS "ldmns:exDocStoreb" (CURRENT_TIMESTAMP BIGINT
> NOT NULL, ID VARCHAR(96), BINARY_CURR_EXDOC VARBINARY, CURR_CHECKSUM
> VARCHAR(32), BINARY_PREV_EXDOC VARBINARY, PREV_CHECKSUM VARCHAR(32),
> PREV_TIMESTAMP BIGINT, SUMMARY VARCHAR, OBJ_SUMMARY VARCHAR, PARAM_SAMPLES
> VARCHAR, BULK_PUBLISH_UUID  VARCHAR, TOTAL_FACTS INTEGER, CURR_EXDOC
> VARCHAR, PREV_EXDOC VARCHAR *CONSTRAINT PK PRIMARY KEY(CURRENT_TIMESTAMP,
> ID)*) COMPRESSION = 'SNAPPY', BLOCKCACHE =  false, *SALT_BUCKETS = 36*
>
>
> CREATE INDEX IF NOT EXISTS "ldmns:indx_exdocb" ON "ldmns:exDocStoreb"(ID)
> INCLUDE (SUMMARY, OBJ_SUMMARY, PARAM_SAMPLES, BULK_PUBLISH_UUID)
>
>
>
> *Here is the upsert query for this table*:
>
> UPSERT INTO "ldmns:exDocStoreb" (CURRENT_TIMESTAMP, BULK_PUBLISH_UUID, ID,
> CURR_CHECKSUM, CURR_EXDOC, SUMMARY, PREV_EXDOC, PREV_CHECKSUM,
> PREV_TIMESTAMP, OBJ_SUMMARY, PARAM_SAMPLES, TOTAL_FACTS, BINARY_CURR_EXDOC,
> BINARY_PREV_EXDOC) VALUES (TO_NUMBER(CURRENT_TIME()), ?, ?, ?, ?, ?, ?, ?,
> ?, ?, ?, ?, ?, ?)
>
>
> *Here is explain plan of a SELECT with merge sort:*
>
> explain select /*+ USE_SORT_MERGE_JOIN */ ID, CURR_EXDOC,
> BINARY_CURR_EXDOC, CURRENT_TIMESTAMP, CURR_CHECKSUM, PREV_EXDOC,
> BINARY_PREV_EXDOC, PREV_CHECKSUM, PREV_TIMESTAMP from "ldmns:exDocStoreb"
> as a inner join (select max(CURRENT_TIMESTAMP) as mct, ID as tid from
> "ldmns:exDocStoreb" where ID like
> '006389a6b10667f39bdbbdafdc4611e03cc04418cbc2619ddc01f54d88d7c3bf%'
> group by ID) as tmp on a.ID=tmp.tid and a.CURRENT_TIMESTAMP=tmp.mct
> where ID like '006389a6b10667f39bdbbdafdc4611e03cc04418cbc2619ddc01f54d88d7c3bf%' ;
> +------------------------------------------+
> |                   PLAN                   |
> +------------------------------------------+
> | SORT-MERGE-JOIN (INNER) TABLES           |
> |     CLIENT 36-CHUNK PARALLEL 36-WAY RANGE SCAN OVER ldmns:exDocStoreb [0] |
> |         SERVER FILTER BY ID LIKE '006389a6b10667f39bdbbdafdc4611e03cc04418cbc2619ddc01f54d88d7c3bf%' |
> |         SERVER SORTED BY [A.ID, A.CURRENT_TIMESTAMP] |
> |     CLIENT MERGE SORT                    |
> | AND (SKIP MERGE)                         |
> |     CLIENT 36-CHUNK PARALLEL 36-WAY RANGE SCAN OVER ldmns:indx_exdocb [0,'006389a6b10667f39bdbbdafdc4611e03cc04418cbc2619ddc01f54d88d7c3bf'] - [0,'006389 |
> |         SERVER FILTER BY FIRST KEY ONLY  |
> |         SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID"] |
> |     CLIENT MERGE SORT                    |
> |     CLIENT SORTED BY ["ID", MAX("CURRENT_TIMESTAMP")] |
> +------------------------------------------+
> 11 rows selected (0.025 seconds)
>
>
> *Here is explain plan with default join:*
>
> explain SELECT ID, CURR_EXDOC, BINARY_CURR_EXDOC, CURRENT_TIMESTAMP,
> CURR_CHECKSUM, PREV_EXDOC, BINARY_PREV_EXDOC, PREV_CHECKSUM, PREV_TIMESTAMP
> from "ldmns:exDocStoreb" as a inner join (select max(CURRENT_TIMESTAMP) as
> mct, ID as tid from "ldmns:exDocStoreb" where ID like
> '42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%' group
> by ID) as tmp on a.ID=tmp.tid and a.CURRENT_TIMESTAMP=tmp.mct where ID like
> '42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%' ;
> +------------------------------------------+
> |                   PLAN                   |
> +------------------------------------------+
> | CLIENT 3-CHUNK PARALLEL 3-WAY RANGE SCAN OVER ldmns:exDocStoreb [0] |
> |     SERVER FILTER BY ID LIKE '42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%' |
> |     PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE) |
> |         CLIENT 3-CHUNK PARALLEL 3-WAY RANGE SCAN OVER ldmns:indx_exdocb [0,'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf'] - [0,'42ecf4abd4bd7e7606025dc8eee3de6a3cc0 |
> |             SERVER FILTER BY FIRST KEY ONLY |
> |             SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID"] |
> |         CLIENT MERGE SORT                |
> |     DYNAMIC SERVER FILTER BY (A.CURRENT_TIMESTAMP, A.ID) IN ((TMP.MCT, TMP.TID)) |
> +------------------------------------------+
> 8 rows selected (0.033 seconds)
>
>
> Looking forward to hearing from you.
>
> Best regards,
> Sumit
>
> ------------------------------
> *From:* Sumit Nigam <sumit_o...@yahoo.com>
> *To:* "user@phoenix.apache.org" <user@phoenix.apache.org>
> *Sent:* Wednesday, October 5, 2016 12:13 AM
>
> *Subject:* Re: Hash join confusion
>
> Thanks Maryann.
>
> I will share the details in a few hours.
>
> Under a heavy load scenario the default hash join failed with a time-out
> (and a memory issue), so I switched to sort-merge. But sort-merge is
> randomly missing data, so as of now I am not sure what the issue with the
> sort-merge join is.
>
> Hash join does not miss any data but has the issue of not fitting in
> memory (the actual issue with which I started this thread).
>
> Thanks again!
> Sumit
>
> ------------------------------
> *From:* Maryann Xue <maryann....@gmail.com>
> *To:* Sumit Nigam <sumit_o...@yahoo.com>; "user@phoenix.apache.org" <
> user@phoenix.apache.org>
> *Sent:* Tuesday, October 4, 2016 10:04 PM
> *Subject:* Re: Hash join confusion
>
> Hi Sumit,
>
> Thank you for the update! Would you mind sharing the queries and their
> plans, as well as the DDL for both the data tables and the index?
>
> And just to confirm, when you say hash joins are working, is that with the
> config changes or without?
>
> Thanks,
> Maryann
> On Tue, Oct 4, 2016 at 9:17 AM Sumit Nigam <sumit_o...@yahoo.com> wrote:
>
> Thank you Maryann.
>
> Since I moved to sort-merge join, my use cases have stopped working.
> However, if I remove the hint (and fall back to hash join), then they all
> work. I am on Phoenix 4.6 / HBase 1.1.
> I thought just changing the join algorithm would be enough. I would assume
> that switching from hash join to sort-merge join would not alter the query
> results, right? Do I need to re-write my query?
>
> I am using global index.
>
> Thanks,
> Sumit
>
> ------------------------------
> *From:* Maryann Xue <maryann....@gmail.com>
> *To:* Sumit Nigam <sumit_o...@yahoo.com>
> *Cc:* "user@phoenix.apache.org" <user@phoenix.apache.org>
> *Sent:* Sunday, October 2, 2016 5:30 AM
>
> *Subject:* Re: Hash join confusion
>
> So whether either or both sides of a sort-merge join have to be sorted
> simply depends on whether that side is already ordered on the join key.
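>
> To make that concrete, here is a hypothetical sketch (made-up tables
> EVENTS(ID, TS, NAME) and DEVICES(ID, NAME), not your schema), assuming
> EVENTS has PRIMARY KEY (ID, TS):
>
> -- Join key matches the leading row-key column of EVENTS, so no extra
> -- sort is expected on that side:
> SELECT /*+ USE_SORT_MERGE_JOIN */ e.ID, e.TS, d.NAME
> FROM EVENTS e INNER JOIN DEVICES d ON e.ID = d.ID;
>
> -- Join key is not in row-key order, so you would expect a
> -- SERVER SORTED BY step for the EVENTS side in the plan:
> SELECT /*+ USE_SORT_MERGE_JOIN */ e.ID, e.TS, d.NAME
> FROM EVENTS e INNER JOIN DEVICES d ON e.NAME = d.NAME;
>
> Either way you do not need to add an ORDER BY yourself; the plan inserts
> the sort where it is needed.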
>
> We don't have documentation specifically for explain plans yet, but the
> Phoenix website does have some examples for different types of queries and
> functionality, including join queries.
>
>
> Thanks,
> Maryann
>
> On Wed, Sep 28, 2016 at 10:52 PM, Sumit Nigam <sumit_o...@yahoo.com>
> wrote:
>
> Thanks Maryann.
>
> Yes, let me switch to sort-merge join because the other query uses many
> more columns. Also, if I just change the hint to sort-merge, would that be
> enough, or do I need to sort both the driving query and the subquery with
> the same ORDER BY for the merge?
>
> As an aside, is there a document on interpreting explain plans?
>
> Thanks,
> Sumit
>
> ------------------------------
> *From:* Maryann Xue <maryann....@gmail.com>
> *To:* Sumit Nigam <sumit_o...@yahoo.com>
> *Cc:* "user@phoenix.apache.org" <user@phoenix.apache.org>
> *Sent:* Thursday, September 29, 2016 11:03 AM
> *Subject:* Re: Hash join confusion
>
> Thank you Sumit, for trying this out! So right now it's very clear that
> the table to be cached IS too big, so there is no point in using a hash
> join in this case. Is the other table much smaller, or is it about the
> same size or even bigger? If it's considerably smaller you can probably
> rewrite your query to do the join the other way; otherwise let's just
> stick to sort-merge join.
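>
> To illustrate "the other way": for an inner join Phoenix builds the
> right-hand side into the server-side hash cache, so the smaller relation
> should go on the right. A hypothetical sketch with made-up tables BIG_TBL
> and SMALL_TBL:
>
> -- SMALL_TBL (the right-hand side) is the side that gets cached:
> SELECT b.ID, s.VAL FROM BIG_TBL b INNER JOIN SMALL_TBL s ON b.ID = s.ID;
>
> -- Swapping the sides flips which relation gets cached (now BIG_TBL):
> SELECT b.ID, s.VAL FROM SMALL_TBL s INNER JOIN BIG_TBL b ON b.ID = s.ID;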
>
>
> Thanks,
> Maryann
>
> On Wed, Sep 28, 2016 at 10:29 PM, Sumit Nigam <sumit_o...@yahoo.com>
> wrote:
>
> Thank you Maryann.
>
> I am not using multi-tenancy for these tables. Increasing
> phoenix.coprocessor.maxServerCacheTimeToLiveMs and the corresponding cache
> size config just delayed the error.
>
> I have also started seeing some memory problem -
>
> Caused by: org.apache.phoenix.memory.*InsufficientMemoryException*:
> Requested memory of 22871932 bytes could not be allocated from remaining
> memory of 776776654 bytes from global pool of 778469376 bytes after waiting
> for 10000ms.
>       at org.apache.phoenix.memory.GlobalMemoryManager.allocateBytes(GlobalMemoryManager.java:78)
>       at org.apache.phoenix.memory.GlobalMemoryManager.access$300(GlobalMemoryManager.java:30)
>       at org.apache.phoenix.memory.GlobalMemoryManager$GlobalMemoryChunk.resize(GlobalMemoryManager.java:139)
>
>
> What I am having trouble with is that the total size of the CSV produced
> by the sub-query is only ~7 MB. I have 12 region servers with a 5 GB heap
> each. So when this result gets sent across to all the region servers to
> perform the server-side join, I am not sure why a memory issue should show
> up (or a time-out occur). Any insights?
>
> These tables are salted. Not sure if this is the
> https://issues.apache.org/jira/browse/PHOENIX-2900 issue.
>
> Switching to sort merge join helped. But not sure if that is the right
> solution going forward.
>
> Thanks again!
> Sumit
>
>
> ------------------------------
> *From:* Maryann Xue <maryann....@gmail.com>
> *To:* "user@phoenix.apache.org" <user@phoenix.apache.org>; Sumit Nigam <
> sumit_o...@yahoo.com>
> *Sent:* Wednesday, September 28, 2016 11:36 PM
> *Subject:* Re: Hash join confusion
>
> Yes, Sumit, the sub-query will get cached in a hash join. Are you using
> multi-tenancy for these tables? If yes, you might want to check out Phoenix
> 4.7 or 4.8, since a related bug fix went into the 4.7 release:
> https://issues.apache.org/jira/browse/PHOENIX-2381
>
> Otherwise I think it's the hash cache timeout issue, in which case changing
> phoenix.coprocessor.maxServerCacheTimeToLiveMs might be helpful.
>
>
> Thanks,
> Maryann
>
> On Tue, Sep 27, 2016 at 10:02 PM, Sumit Nigam <sumit_o...@yahoo.com>
> wrote:
>
> Hi,
>
> Is there any document which can help me understand explain plan output in
> detail? Or which piece of code should I look at to get an idea?
>
> Here is the explain plan for the inner join query below. Can anyone help
> explain it to me? For example, as per the plan, which table is being
> cached, etc.?
> Here, *indx_exdocb* is the index table (on ID) and *exDocStoreb* is the
> main table with row key (current_timestamp, ID).
>
> +------------------------------------------+
> |                   PLAN                   |
> +------------------------------------------+
> | CLIENT 36-CHUNK PARALLEL 36-WAY FULL SCAN OVER *exDocStoreb* |
> |     PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE) |
> |         CLIENT 36-CHUNK PARALLEL 36-WAY RANGE SCAN OVER *indx_exdocb* [0,'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf'] - [0,'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bg' |
> |             SERVER FILTER BY FIRST KEY ONLY |
> |             SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID"] |
> |         CLIENT MERGE SORT                |
> |     DYNAMIC SERVER FILTER BY (A.CURRENT_TIMESTAMP, A.ID) IN ((TMP.MCT, TMP.TID)) |
> +------------------------------------------+
>
> Also, is there a way to turn on a more verbose explain plan? For example,
> seeing the number of bytes and rows that each step produces?
>
> Thanks,
> Sumit
>
> ------------------------------
> *From:* Sumit Nigam <sumit_o...@yahoo.com>
> *To:* Users Mail List Phoenix <user@phoenix.apache.org>
> *Sent:* Tuesday, September 27, 2016 9:17 PM
> *Subject:* Hash join confusion
>
> Hi,
>
> I am using hbase 1.1 with phoenix 4.6.
>
> I have a table with row key (current_timestamp, id), which is salted, and
> an index on (id). This table has ~3 million records.
>
> I have a query like given below.
>
> SELECT ID, CURRENT_TIMESTAMP, <SOME OTHER COLUMNS> from TBL as a
>     inner join (select max(CURRENT_TIMESTAMP) as mct, ID as tid from TBL
>                 where ID like '42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%'
>                 group by ID) as tmp
>     on a.ID=tmp.tid and a.CURRENT_TIMESTAMP=tmp.mct
>
>
> The query hangs for a long time and finally fails with a timeout. I have
> 12 region servers, each with a 5 GB heap, and the total number of records
> satisfying the above query is only 62K, whose CSV dump is ~10 MB.
>
> DoNotRetryIOException: Could not find *hash cache for join Id*: Ӧ�8�D�.
> The cache might have expired and have been removed
>
> and -
>
> Caused by: java.sql.SQLException: Encountered exception in sub plan [0] execution.
>       at org.apache.phoenix.execute.HashJoinPlan.iterator(HashJoinPlan.java:175)
>       at com.infa.products.ldm.ingestion.server.java.hadoop.impl.FixPhoenixIngestInputFormat.getQueryPlan(FixPhoenixIngestInputFormat.java:94)
>       ... 22 more
>
>
> and
>
> Caused by: java.sql.SQLException: java.util.concurrent.*TimeoutException*
>
>         at org.apache.phoenix.cache.ServerCacheClient.addServerCache(ServerCacheClient.java:264)
>
>
> I can try playing around with parameters such as
> phoenix.coprocessor.maxServerCacheTimeToLiveMs, and switching to
> sort_merge_join actually helped.
>
> But my question is: as per Joins | Apache Phoenix
> <http://phoenix.apache.org/joins.html>, in a case such as *lhs* INNER JOIN
> *rhs*, it is *rhs* which will be built as a hash table in the server
> cache. So, in the above query, I assume this is what gets cached?
>
>  select max(CURRENT_TIMESTAMP) as mct, ID as tid from TBL where ID like
> '42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%'
> group by ID) as tmp
>
> Thanks,
> Sumit