Yes, Sumit, the sub-query will get cached in the hash join. Are you using multi-tenancy for these tables? If so, you might want to check out Phoenix 4.7 or 4.8, since a related bug fix went into the 4.7 release. https://issues.apache.org/jira/browse/PHOENIX-2381?jql=project%20%3D%20PHOENIX%20AND%20text%20~%20%22hash%20cache%20id%22
Otherwise I think it's the hash cache timeout issue, in which case changing
phoenix.coprocessor.maxServerCacheTimeToLiveMs might be helpful.

Thanks,
Maryann

On Tue, Sep 27, 2016 at 10:02 PM, Sumit Nigam <sumit_o...@yahoo.com> wrote:
> Hi,
>
> Is there any document that can help me understand explain plan output in
> detail? Or which piece of code should I look at to get an idea?
>
> Here is the explain plan for the inner join query below. Can anyone help
> explain it to me? For example, as per the plan, which table is being
> cached?
> Here, indx_exdocb is the index table (on ID) and exDocStoreb is the main
> table with row key (current_timestamp, ID).
>
> +------------------------------------------+
> | PLAN |
> +------------------------------------------+
> | CLIENT 36-CHUNK PARALLEL 36-WAY FULL SCAN OVER exDocStoreb |
> | PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE) |
> | CLIENT 36-CHUNK PARALLEL 36-WAY RANGE SCAN OVER indx_exdocb
> [0,'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf'] -
> [0,'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bg'] |
> | SERVER FILTER BY FIRST KEY ONLY |
> | SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID"] |
> | CLIENT MERGE SORT |
> | DYNAMIC SERVER FILTER BY (A.CURRENT_TIMESTAMP, A.ID) IN ((TMP.MCT,
> TMP.TID)) |
> +------------------------------------------+
>
> Also, is there a way to turn on a more verbose explain plan? For example,
> seeing the number of bytes and rows that each step produces?
>
> Thanks,
> Sumit
>
> ------------------------------
> From: Sumit Nigam <sumit_o...@yahoo.com>
> To: Users Mail List Phoenix <user@phoenix.apache.org>
> Sent: Tuesday, September 27, 2016 9:17 PM
> Subject: Hash join confusion
>
> Hi,
>
> I am using HBase 1.1 with Phoenix 4.6.
>
> I have a table with row key (current_timestamp, id) which is salted, and
> an index on (id). This table has ~3 million records.
>
> I have a query like the one given below.
>
> SELECT ID, CURRENT_TIMESTAMP, <SOME OTHER COLUMNS> from TBL as a
> inner join (
>     select max(CURRENT_TIMESTAMP) as mct, ID as tid from TBL
>     where ID like '42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%'
>     group by ID) as tmp
> on a.ID=tmp.tid and a.CURRENT_TIMESTAMP=tmp.mct
>
> The query hangs for a long time and finally fails with a timeout. I have
> 12 region servers, each with a 5 GB heap, and the total number of records
> satisfying the above query is only 62K, whose CSV dump is ~10 MB.
>
> DoNotRetryIOException: Could not find hash cache for join Id: Ӧ�8�D�.
> The cache might have expired and have been removed
>
> and:
>
> Caused by: java.sql.SQLException: Encountered exception in sub plan [0]
> execution.
>         at org.apache.phoenix.execute.HashJoinPlan.iterator(HashJoinPlan.java:175)
>         at com.infa.products.ldm.ingestion.server.java.hadoop.impl.FixPhoenixIngestInputFormat.getQueryPlan(FixPhoenixIngestInputFormat.java:94)
>         ... 22 more
>
> and:
>
> Caused by: java.sql.SQLException: java.util.concurrent.TimeoutException
>         at org.apache.phoenix.cache.ServerCacheClient.addServerCache(ServerCacheClient.java:264)
>
> I can try playing around with parameters such as
> phoenix.coprocessor.maxServerCacheTimeToLiveMs, and switching to
> sort_merge_join actually helped.
>
> But my question is: as per Joins | Apache Phoenix
> <http://phoenix.apache.org/joins.html>, in a case such as lhs INNER JOIN
> rhs, it is rhs that gets built as a hash table in the server cache. So in
> the above query, I assume this is what gets cached?
>
> select max(CURRENT_TIMESTAMP) as mct, ID as tid from TBL where ID like
> '42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%'
> group by ID) as tmp
>
> Thanks,
> Sumit
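For reference, the timeout property Maryann mentions is a server-side setting read by the region servers. A minimal sketch of raising it in hbase-site.xml (the 60000 ms value is an illustrative choice, not a recommendation; my understanding is that the default is 30000 ms, and region servers need a restart to pick up the change):

```
<!-- hbase-site.xml on each region server -->
<!-- How long a hash-join server cache entry may live before expiring -->
<property>
  <name>phoenix.coprocessor.maxServerCacheTimeToLiveMs</name>
  <value>60000</value>  <!-- example value; default is 30000 -->
</property>
```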
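Since switching to a sort-merge join helped, the per-query way to do that is Phoenix's USE_SORT_MERGE_JOIN hint, which bypasses the server-side hash cache entirely. A sketch using the table and column names from the thread (TBL and the prefix literal are the poster's; this trades the hash cache for a sorted merge, which is typically slower on small right-hand sides but immune to cache expiry):

```
SELECT /*+ USE_SORT_MERGE_JOIN */ a.ID, a.CURRENT_TIMESTAMP
FROM TBL AS a
INNER JOIN (
    SELECT MAX(CURRENT_TIMESTAMP) AS mct, ID AS tid
    FROM TBL
    WHERE ID LIKE '42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%'
    GROUP BY ID
) AS tmp
ON a.ID = tmp.tid AND a.CURRENT_TIMESTAMP = tmp.mct;
```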