Yes, Sumit, the sub-query will get cached in the hash join. Are you using multi-tenancy for these tables? If so, you might want to check out Phoenix 4.7 or 4.8, since a related bug fix went into the 4.7 release. https://issues.apache.org/jira/browse/PHOENIX-2381?jql=project%20%3D%20PHOENIX%20AND%20text%20~%20%22hash%20cache%20id%22
Otherwise I think it's the hash cache timeout issue, in which case changing
phoenix.coprocessor.maxServerCacheTimeToLiveMs might be helpful.

Thanks,
Maryann

On Tue, Sep 27, 2016 at 10:02 PM, Sumit Nigam <sumit_o...@yahoo.com> wrote:
> Hi,
>
> Is there any document that can help me understand explain plan output in
> detail? Or which piece of code should I look at to get an idea?
>
> Here is the explain plan for the inner join query below. Can anyone help
> explain it to me? For example, as per the plan, which table is being
> cached?
> Here, indx_exdocb is the index table (on ID) and exDocStoreb is the main
> table with row key (current_timestamp, ID).
>
> +------------------------------------------+
> | PLAN |
> +------------------------------------------+
> | CLIENT 36-CHUNK PARALLEL 36-WAY FULL SCAN OVER exDocStoreb |
> | PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE) |
> | CLIENT 36-CHUNK PARALLEL 36-WAY RANGE SCAN OVER indx_exdocb
> [0,'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf'] -
> [0,'42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bg'] |
> | SERVER FILTER BY FIRST KEY ONLY |
> | SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID"] |
> | CLIENT MERGE SORT |
> | DYNAMIC SERVER FILTER BY (A.CURRENT_TIMESTAMP, A.ID) IN ((TMP.MCT,
> TMP.TID)) |
> +------------------------------------------+
>
> Also, is there a way to turn on a more verbose explain plan? For example,
> seeing the number of bytes and rows that each step produces?
>
> Thanks,
> Sumit
>
> ------------------------------
> From: Sumit Nigam <sumit_o...@yahoo.com>
> To: Users Mail List Phoenix <user@phoenix.apache.org>
> Sent: Tuesday, September 27, 2016 9:17 PM
> Subject: Hash join confusion
>
> Hi,
>
> I am using HBase 1.1 with Phoenix 4.6.
>
> I have a table with row key (current_timestamp, id) which is salted, and
> an index on (id). This table has ~3 million records.
>
> I have a query like the one given below.
>
> SELECT ID, CURRENT_TIMESTAMP, <SOME OTHER COLUMNS> from TBL as a
> inner join (
>     select max(CURRENT_TIMESTAMP) as mct, ID as tid from TBL
>     where ID like '42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%'
>     group by ID) as tmp
> on a.ID=tmp.tid and a.CURRENT_TIMESTAMP=tmp.mct
>
> The query hangs for a long time and finally fails with a timeout. I have
> 12 region servers, each with a 5 GB heap, and the total number of records
> satisfying the above query is only 62K, whose CSV dump is ~10 MB.
>
> DoNotRetryIOException: Could not find hash cache for join Id: Ӧ�8�D�.
> The cache might have expired and have been removed
>
> and:
>
> Caused by: java.sql.SQLException: Encountered exception in sub plan [0]
> execution.
>         at org.apache.phoenix.execute.HashJoinPlan.iterator(HashJoinPlan.java:175)
>         at com.infa.products.ldm.ingestion.server.java.hadoop.impl.FixPhoenixIngestInputFormat.getQueryPlan(FixPhoenixIngestInputFormat.java:94)
>         ... 22 more
>
> and:
>
> Caused by: java.sql.SQLException: java.util.concurrent.TimeoutException
>         at org.apache.phoenix.cache.ServerCacheClient.addServerCache(ServerCacheClient.java:264)
>
> I can try playing around with parameters such as
> phoenix.coprocessor.maxServerCacheTimeToLiveMs, and switching to
> sort_merge_join actually helped.
>
> But my question is: as per Joins | Apache Phoenix
> <http://phoenix.apache.org/joins.html>, in a case such as lhs INNER JOIN
> rhs, it is rhs that gets built as a hash table in the server cache. So in
> the above query, I assume this is what gets cached?
>
> select max(CURRENT_TIMESTAMP) as mct, ID as tid from TBL where ID like
> '42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%'
> group by ID) as tmp
>
> Thanks,
> Sumit
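For reference, the timeout property Maryann mentions is a server-side setting read by the region servers. A minimal sketch of raising it in hbase-site.xml (the 60000 ms value is an illustrative choice, not a recommendation; my understanding is that the default is 30000 ms, and region servers need a restart to pick up the change):

```
<!-- hbase-site.xml on each region server -->
<!-- How long a hash-join server cache entry may live before expiring -->
<property>
  <name>phoenix.coprocessor.maxServerCacheTimeToLiveMs</name>
  <value>60000</value>  <!-- example value; default is 30000 -->
</property>
```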
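Since switching to a sort-merge join helped, the per-query way to do that is Phoenix's USE_SORT_MERGE_JOIN hint, which bypasses the server-side hash cache entirely. A sketch using the table and column names from the thread (TBL and the prefix literal are the poster's; this trades the hash cache for a sorted merge, which is typically slower on small right-hand sides but immune to cache expiry):

```
SELECT /*+ USE_SORT_MERGE_JOIN */ a.ID, a.CURRENT_TIMESTAMP
FROM TBL AS a
INNER JOIN (
    SELECT MAX(CURRENT_TIMESTAMP) AS mct, ID AS tid
    FROM TBL
    WHERE ID LIKE '42ecf4abd4bd7e7606025dc8eee3de6a3cc04418cbc2619ddc01f54d88d7c3bf%'
    GROUP BY ID
) AS tmp
ON a.ID = tmp.tid AND a.CURRENT_TIMESTAMP = tmp.mct;
```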