RE: Using Hints in Phoenix

Matthew Johnson Mon, 23 Mar 2015 06:09:31 -0700

Hi Maryann,



My apologies, my Java test app was indeed still referencing 4.2.2 (stupid
copy and paste error) which I have now fixed. Squirrel is very definitely
on 4.3.0 though. Interestingly, here are my results:



JAVA TEST APP – INNER JOIN



SORT-MERGE-JOIN (INNER) TABLES

    CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE1

AND

    CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE2

        SERVER SORTED BY [T2.ROWID]

    CLIENT MERGE SORT



JAVA TEST APP – FULL OUTER JOIN



SORT-MERGE-JOIN (FULL) TABLES

    CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE1

        SERVER SORTED BY [T1.ROWID]

    CLIENT MERGE SORT

AND

    CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE2

        SERVER SORTED BY [T2.ROWID]

    CLIENT MERGE SORT



SQUIRREL SQL – INNER JOIN



CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE1

    PARALLEL INNER-JOIN TABLE 0

        CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE2

    DYNAMIC SERVER FILTER BY ROWID IN (T2.ROWID)



SQUIRREL SQL – FULL OUTER JOIN



SORT-MERGE-JOIN (FULL) TABLES

    CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE1

        SERVER SORTED BY [T1.ROWID]

    CLIENT MERGE SORT

AND

    CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE2

        SERVER SORTED BY [T2.ROWID]

    CLIENT MERGE SORT





So it seems that Squirrel has an issue processing the hint when doing an
inner join, even though it works fine from a basic Java prepared statement.
I’m guessing the workaround for this is to use full outer join whenever I
want Sort Merge in Squirrel, and put a ‘where t2.rowid is not null’ in the
where clause to simulate an inner join?



Thank you so much for all your help thus far – really appreciate it!



Cheers,

Matt





*From:* Maryann Xue [mailto:maryann....@gmail.com]
*Sent:* 19 March 2015 16:38
*To:* user@phoenix.apache.org
*Subject:* Re: Using Hints in Phoenix



Hi Matt,



Ah, there we go. I think the your Squirrel and Java program somehow
referred to the old client JAR and that's why.





Thanks,

Maryann



On Thu, Mar 19, 2015 at 12:28 PM, Matthew Johnson <matt.john...@algomi.com>
wrote:

Thanks Maryann for looking into it. I tried to run the following:



*// Execute the SQL statement and get the results in a Resultset*

*       ResultSet rsLimited = stmtLimited.executeQuery(**"explain SELECT
/*+ USE_SORT_MERGE_JOIN */ * FROM testtable1 t1 full outer join testtable2
t2 on t2.rowid = t1.rowid"**);*

       *while**(rsLimited.next()) {*

*              String plan = rsLimited.getString(**"PLAN"**);*

*              System.**out**.println(plan);*

*       }*



But got this error:



Executing statement

*java.sql.SQLFeatureNotSupportedException*: Full joins not supported.

      at org.apache.phoenix.compile.QueryCompiler.compileJoinQuery(
*QueryCompiler.java:248*)

      at org.apache.phoenix.compile.QueryCompiler.compile(
*QueryCompiler.java:138*)

      at
org.apache.phoenix.jdbc.PhoenixStatement$ExecutableSelectStatement.compilePlan(
*PhoenixStatement.java:322*)

      at
org.apache.phoenix.jdbc.PhoenixStatement$ExecutableSelectStatement.compilePlan(
*PhoenixStatement.java:305*)

      at
org.apache.phoenix.jdbc.PhoenixStatement$ExecutableExplainStatement.compilePlan(
*PhoenixStatement.java:380*)

      at
org.apache.phoenix.jdbc.PhoenixStatement$ExecutableExplainStatement.compilePlan(
*PhoenixStatement.java:360*)

      at org.apache.phoenix.jdbc.PhoenixStatement$1.call(
*PhoenixStatement.java:221*)

      at org.apache.phoenix.jdbc.PhoenixStatement$1.call(
*PhoenixStatement.java:217*)

      at org.apache.phoenix.call.CallRunner.run(*CallRunner.java:53*)

      at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(
*PhoenixStatement.java:216*)

      at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(
*PhoenixStatement.java:1030*)

      at com.algomi.phoenix.PhoenixExplainPlan.main(
*PhoenixExplainPlan.java:24*)





Although this, too, seems to work from sqlline:



*0: jdbc:phoenix:sales1,sales2,sales3> explain SELECT * FROM testtable1 t1
full outer join testtable2 t2 on t2.rowid = t1.rowid;*

*+------------------------------------------+*

*|                   PLAN                   |*

*+------------------------------------------+*

*| SORT-MERGE-JOIN (FULL) TABLES            |*

*|     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE1 |*

*|         SERVER SORTED BY [T1.ROWID]      |*

*|     CLIENT MERGE SORT                    |*

*| AND                                      |*

*|     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE2 |*

*|         SERVER SORTED BY [T2.ROWID]      |*

*|     CLIENT MERGE SORT                    |*

*+------------------------------------------+*

*8 rows selected (0.032 seconds)*



Does Sqlline use the same phoenix-client jar to make the connection? Or
does it work differently?



Thanks!

Matt





*From:* Maryann Xue [mailto:maryann....@gmail.com]
*Sent:* 19 March 2015 16:08


*To:* user@phoenix.apache.org
*Subject:* Re: Using Hints in Phoenix



By the way, to answer your previous questions, Phoenix joins have not
started to use stats so far, but the hints are parsed and handled in a
universal way regardless of what type of query it is.



Thanks,

Maryann



On Thu, Mar 19, 2015 at 12:03 PM, Maryann Xue <maryann....@gmail.com> wrote:

Hi Matt,



Thank you very much for the investigation! Actually I tried with Java code
in a unit test (for I don't have an standalone HBase environment right
now), but could not reproduce it anyway. Then I thought it could be using
"createStatement()" vs "prepareStatement()" after seeing your
investigation, but again that was not it.



But anyway, it just occurred to me that there might be an easy way to force
running sort-merge-join (just for the purpose of testing), by running a
query with FULL OUTER JOIN. Could you please try that out and see what
happens?





Thanks,

Maryann





On Thu, Mar 19, 2015 at 7:54 AM, Matthew Johnson <matt.john...@algomi.com>
wrote:

Hi Maryann,



Sending this to the whole user group now I have a bit more investigation.
Managed to do some digging and also figured out how to use sqlline.py, and
got the following results.



With sqlline.py:



*0: jdbc:phoenix:sales1,sales2,sales3> explain SELECT /*+
USE_SORT_MERGE_JOIN */ * FROM testtable1 t1 join testtable2 t2 on t2.rowid
= t1.rowid;*



*+------------------------------------------+*

*|                   PLAN                   |*

*+------------------------------------------+*

*| SORT-MERGE-JOIN (INNER) TABLES           |*

*|     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE1 |*

*| AND                                      |*

*|     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE2 |*

*|         SERVER SORTED BY [T2.ROWID]      |*

*|     CLIENT MERGE SORT                    |*

*+------------------------------------------+*

*6 rows selected (0.035 seconds)*





So it does seem to work from sqlline!! Then, I decided to try and use it
from inside a Java class:



Class.*forName*("org.apache.phoenix.jdbc.PhoenixDriver");



                     Connection conn = DriverManager.*getConnection*(
"jdbc:phoenix:sales1,sales2,sales3", "", "");



                     // Create a Statement class to execute the SQL
statement

                     Statement stmtLimited = conn.createStatement();



                     System.*out*.println("Executing statement");



                     // Execute the SQL statement and get the results in a
*Resultset*

                     ResultSet rsLimited = stmtLimited.executeQuery("explain
SELECT /*+ USE_SORT_MERGE_JOIN */ * FROM testtable1 t1 join testtable2 t2
on t2.rowid = t1.rowid");

                     *while*(rsLimited.next()) {

                           String plan = rsLimited.getString("PLAN");

                           System.*out*.println(plan);

                     }



                     stmtLimited.close();

                     conn.close();



And I get:



Executing statement

CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE1

    PARALLEL INNER-JOIN TABLE 0

        CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE2

    DYNAMIC SERVER FILTER BY ROWID BETWEEN MIN/MAX OF (T2.ROWID)





So it seems that when executed from Java code (Squirrel is also a
Java-based client) it ignores the hint! What do you think? Is there a way
around this?



Thanks!

Matt





*From:* Maryann Xue [mailto:maryann....@gmail.com]
*Sent:* 17 March 2015 16:23

*To:* user@phoenix.apache.org
*Subject:* Re: Using Hints in Phoenix



Hi Matt,



Sorry that I still could not reproduce the issue. Could you try reproducing
it in a unit test?





Thanks,

Maryann



On Fri, Mar 13, 2015 at 6:20 AM, Matthew Johnson <matt.john...@algomi.com>
wrote:

Hi Maryann,



All hbases in my cluster have:



*phoenix-4.3.0-server.jar*



in the lib folder, and my client is using:



                *phoenix-4.3.0-client.jar*



I generated the MD5 in case something has gone wrong with the versioning:



*e0ade979e7b444fb8f8f4b7b5578edab*



And I have opened up the jar, and can see the new class
*SortMergeJoinPlan.class*, so presumably I have the right version – is
there anything else I can check?



Cheers,

Matt



*From:* Maryann Xue [mailto:maryann....@gmail.com]
*Sent:* 12 March 2015 23:02


*To:* user@phoenix.apache.org
*Subject:* Re: Using Hints in Phoenix



Hi Matt,



I checked on my side. Sort-merge-join would work for both tables and views.
And I also verified that 4.3 branch does have the corresponding check-in (
https://git1-us-west.apache.org/repos/asf?p=phoenix.git;a=commitdiff;h=ebc7ee42cdb2b05a293f54dc687ca975db9acbc3)
although the check-in message had a little mistake there.



Could you please verify your Phoenix library version again, Matt?
Especially the client.





Thanks,

Maryann



On Thu, Mar 12, 2015 at 6:00 PM, Maryann Xue <maryann....@gmail.com> wrote:

Hi Matt,



Thanks for sharing the query. Using that hint should supposedly force
sort-merge join no matter what. I will go ahead and verify that.





Thanks,

Maryann



On Thu, Mar 12, 2015 at 2:25 PM, Matthew Johnson <matt.john...@algomi.com>
wrote:

Hi Maryann,



My views:



*create view "mytestview1" *

*(*

*"rowid" varchar primary key, *

*"metadata"."bId" varchar, *

*"metadata"."dId" varchar*

*) *



*create view "mytestview2" *

*(*

*"rowid" varchar primary key, *

*"data"."bId" varchar,*

*"data"."details" varchar,*

*"data"."comment" varchar*

*) *



The amount of data:



*SELECT count(*) FROM "mytestview1";*

*  -- 78,549*



*SELECT count(*) FROM "mytestview2";*

*  -- 2,130,905*



The query:



Without hint:



*EXPLAIN select count(*) *

*from "mytestview1" m1*

*inner join "mytestview2" m2*

*on m1."bId" = m2."bId"*



*CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytest1*

*    SERVER AGGREGATE INTO SINGLE ROW*

*    PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE)*

*        CLIENT 5-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytest2*



With hint:



*EXPLAIN select /*+ USE_SORT_MERGE_JOIN */ count(*) *

*from "mytestview1" m1*

*inner join "mytestview2" m2*

*on m1."bId" = m2."bId"*



*CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytest1*

*    SERVER AGGREGATE INTO SINGLE ROW*

*    PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE)*

*        CLIENT 5-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytest2*





I know that when joining that I should ideally do <large table> join <small
table>, but this is on my test environment and in production both tables
are roughly the same size, so I’m trying to force it to use the sort merge
before running the query in prod. My region servers in test don’t have that
much heap space (about 2 gigs) if that makes a difference. Do I need to
force a major compaction, generate statistics, anything like that?



Thanks!

Matt





*From:* Maryann Xue [mailto:maryann....@gmail.com]
*Sent:* 11 March 2015 20:16


*To:* user@phoenix.apache.org
*Subject:* Re: Using Hints in Phoenix



Hi Matt,



Views or tables should not matter in this. Would you mind sharing your
query and DDLs?





Thanks,

Maryann





On Wed, Mar 11, 2015 at 6:06 AM, Matthew Johnson <matt.john...@algomi.com>
wrote:

Hi Maryann,



I am using 4.3.0 (I upgraded as soon as it was released, largely so I could
pick up this feature). I am actually joining views rather than tables –
would this make a difference?



Cheers,

Matt



*From:* Maryann Xue [mailto:maryann....@gmail.com]
*Sent:* 10 March 2015 20:54


*To:* user@phoenix.apache.org
*Subject:* Re: Using Hints in Phoenix



Hi Matt,



Which version of Phoenix are you using? Sort-merge join is only available
in Phoenix 4.3.





Thanks,

Maryann



On Tue, Mar 10, 2015 at 6:11 AM, Matthew Johnson <matt.john...@algomi.com>
wrote:

Hi Maryann,



Thanks for clarifying that for me. I’ve been playing with the hint
*USE_SORT_MERGE_JOIN* to try and solve an issue when joining two very large
tables:



*Error: Encountered exception in sub plan [0] execution.*

*SQLState:  null*

*ErrorCode: 0*



Which I believe is related to a lack of memory for building the hash table?
I thought that using *SELECT /*+ USE_SORT_MERGE_JOIN*/* would make the join
much slower but would work on an unlimited data set – did I misunderstand?
The explain plan does not change with or without this hint:



CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1

    SERVER AGGREGATE INTO SINGLE ROW

    PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE)

        CLIENT 15-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable2



Cheers,

Matt





*From:* Maryann Xue [mailto:maryann....@gmail.com]
*Sent:* 09 March 2015 15:00
*To:* user@phoenix.apache.org
*Subject:* Re: Using Hints in Phoenix



Hi Matt,



So far in Phoenix, hints are only supported as specified right after
keywords SELECT, UPSERT and DELETE. Same for join queries. It is currently
impossible to hint a certain join algorithm for a specific join node in a
multiple join query. However, for subqueries, the inner query can have its
own hints, independent of the outer query, like "SELECT /*+ INDEX(t idx1)*/
col1, col2 FROM t WHERE col3 IN (SELECT /*+ NO_INDEX*/ id FROM r WHERE name
= 'x')".





Thanks,

Maryann



On Mon, Mar 9, 2015 at 7:26 AM, Matthew Johnson <matt.john...@algomi.com>
wrote:

Hi guys,



This is more of a general question than a problem – but I’m just wondering
if someone can clarify for me what the syntax rules are for hints in
Phoenix. Does it matter where in the query they go? Do they always go
something like *SELECT <insert hint> x from y*? Or, if the hint is for a
join (eg Sort Merge) does it go in the join part (*SELECT x from y inner
join <insert hint> z on j = k*)?



Couldn’t seem to find anything specific on this in the docs, and haven’t
worked much with database hints in general so maybe there is a convention
that I am not aware of – apologies if it’s a stupid question!



Cheers,

Matt

RE: Using Hints in Phoenix

Reply via email to