By the way, to answer your previous questions, Phoenix joins have not started to use stats so far, but the hints are parsed and handled in a universal way regardless of what type of query it is.
Thanks, Maryann On Thu, Mar 19, 2015 at 12:03 PM, Maryann Xue <maryann....@gmail.com> wrote: > Hi Matt, > > Thank you very much for the investigation! Actually I tried with Java code > in a unit test (for I don't have an standalone HBase environment right > now), but could not reproduce it anyway. Then I thought it could be using > "createStatement()" vs "prepareStatement()" after seeing your > investigation, but again that was not it. > > But anyway, it just occurred to me that there might be an easy way to > force running sort-merge-join (just for the purpose of testing), by running > a query with FULL OUTER JOIN. Could you please try that out and see what > happens? > > > Thanks, > Maryann > > > On Thu, Mar 19, 2015 at 7:54 AM, Matthew Johnson <matt.john...@algomi.com> > wrote: > >> Hi Maryann, >> >> >> >> Sending this to the whole user group now I have a bit more investigation. >> Managed to do some digging and also figured out how to use sqlline.py, and >> got the following results. >> >> >> >> With sqlline.py: >> >> >> >> *0: jdbc:phoenix:sales1,sales2,sales3> explain SELECT /*+ >> USE_SORT_MERGE_JOIN */ * FROM testtable1 t1 join testtable2 t2 on t2.rowid >> = t1.rowid;* >> >> >> >> *+------------------------------------------+* >> >> *| PLAN |* >> >> *+------------------------------------------+* >> >> *| SORT-MERGE-JOIN (INNER) TABLES |* >> >> *| CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE1 |* >> >> *| AND |* >> >> *| CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE2 |* >> >> *| SERVER SORTED BY [T2.ROWID] |* >> >> *| CLIENT MERGE SORT |* >> >> *+------------------------------------------+* >> >> *6 rows selected (0.035 seconds)* >> >> >> >> >> >> So it does seem to work from sqlline!! Then, I decided to try and use it >> from inside a Java class: >> >> >> >> Class.*forName*("org.apache.phoenix.jdbc.PhoenixDriver"); >> >> >> >> Connection conn = DriverManager.*getConnection*( >> "jdbc:phoenix:sales1,sales2,sales3", "", ""); >> >> >> >> // Create a Statement class to execute the SQL >> statement >> >> Statement stmtLimited = conn.createStatement(); >> >> >> >> System.*out*.println("Executing statement"); >> >> >> >> // Execute the SQL statement and get the results in >> a *Resultset* >> >> ResultSet rsLimited = stmtLimited.executeQuery("explain >> SELECT /*+ USE_SORT_MERGE_JOIN */ * FROM testtable1 t1 join testtable2 t2 >> on t2.rowid = t1.rowid"); >> >> *while*(rsLimited.next()) { >> >> String plan = rsLimited.getString("PLAN"); >> >> System.*out*.println(plan); >> >> } >> >> >> >> stmtLimited.close(); >> >> conn.close(); >> >> >> >> And I get: >> >> >> >> Executing statement >> >> CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE1 >> >> PARALLEL INNER-JOIN TABLE 0 >> >> CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE2 >> >> DYNAMIC SERVER FILTER BY ROWID BETWEEN MIN/MAX OF (T2.ROWID) >> >> >> >> >> >> So it seems that when executed from Java code (Squirrel is also a >> Java-based client) it ignores the hint! What do you think? Is there a way >> around this? >> >> >> >> Thanks! >> >> Matt >> >> >> >> >> >> *From:* Maryann Xue [mailto:maryann....@gmail.com] >> *Sent:* 17 March 2015 16:23 >> *To:* user@phoenix.apache.org >> *Subject:* Re: Using Hints in Phoenix >> >> >> >> Hi Matt, >> >> >> >> Sorry that I still could not reproduce the issue. Could you try >> reproducing it in a unit test? >> >> >> >> >> >> Thanks, >> >> Maryann >> >> >> >> On Fri, Mar 13, 2015 at 6:20 AM, Matthew Johnson <matt.john...@algomi.com> >> wrote: >> >> Hi Maryann, >> >> >> >> All hbases in my cluster have: >> >> >> >> *phoenix-4.3.0-server.jar* >> >> >> >> in the lib folder, and my client is using: >> >> >> >> *phoenix-4.3.0-client.jar* >> >> >> >> I generated the MD5 in case something has gone wrong with the versioning: >> >> >> >> *e0ade979e7b444fb8f8f4b7b5578edab* >> >> >> >> And I have opened up the jar, and can see the new class >> *SortMergeJoinPlan.class*, so presumably I have the right version – is >> there anything else I can check? >> >> >> >> Cheers, >> >> Matt >> >> >> >> *From:* Maryann Xue [mailto:maryann....@gmail.com] >> *Sent:* 12 March 2015 23:02 >> >> >> *To:* user@phoenix.apache.org >> *Subject:* Re: Using Hints in Phoenix >> >> >> >> Hi Matt, >> >> >> >> I checked on my side. Sort-merge-join would work for both tables and >> views. And I also verified that 4.3 branch does have the corresponding >> check-in ( >> https://git1-us-west.apache.org/repos/asf?p=phoenix.git;a=commitdiff;h=ebc7ee42cdb2b05a293f54dc687ca975db9acbc3) >> although the check-in message had a little mistake there. >> >> >> >> Could you please verify your Phoenix library version again, Matt? >> Especially the client. >> >> >> >> >> >> Thanks, >> >> Maryann >> >> >> >> On Thu, Mar 12, 2015 at 6:00 PM, Maryann Xue <maryann....@gmail.com> >> wrote: >> >> Hi Matt, >> >> >> >> Thanks for sharing the query. Using that hint should supposedly force >> sort-merge join no matter what. I will go ahead and verify that. >> >> >> >> >> >> Thanks, >> >> Maryann >> >> >> >> On Thu, Mar 12, 2015 at 2:25 PM, Matthew Johnson <matt.john...@algomi.com> >> wrote: >> >> Hi Maryann, >> >> >> >> My views: >> >> >> >> *create view "mytestview1" * >> >> *(* >> >> *"rowid" varchar primary key, * >> >> *"metadata"."bId" varchar, * >> >> *"metadata"."dId" varchar* >> >> *) * >> >> >> >> *create view "mytestview2" * >> >> *(* >> >> *"rowid" varchar primary key, * >> >> *"data"."bId" varchar,* >> >> *"data"."details" varchar,* >> >> *"data"."comment" varchar* >> >> *) * >> >> >> >> The amount of data: >> >> >> >> *SELECT count(*) FROM "mytestview1";* >> >> * -- 78,549* >> >> >> >> *SELECT count(*) FROM "mytestview2";* >> >> * -- 2,130,905* >> >> >> >> The query: >> >> >> >> Without hint: >> >> >> >> *EXPLAIN select count(*) * >> >> *from "mytestview1" m1* >> >> *inner join "mytestview2" m2* >> >> *on m1."bId" = m2."bId"* >> >> >> >> *CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytest1* >> >> * SERVER AGGREGATE INTO SINGLE ROW* >> >> * PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE)* >> >> * CLIENT 5-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytest2* >> >> >> >> With hint: >> >> >> >> *EXPLAIN select /*+ USE_SORT_MERGE_JOIN */ count(*) * >> >> *from "mytestview1" m1* >> >> *inner join "mytestview2" m2* >> >> *on m1."bId" = m2."bId"* >> >> >> >> *CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytest1* >> >> * SERVER AGGREGATE INTO SINGLE ROW* >> >> * PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE)* >> >> * CLIENT 5-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytest2* >> >> >> >> >> >> I know that when joining that I should ideally do <large table> join >> <small table>, but this is on my test environment and in production both >> tables are roughly the same size, so I’m trying to force it to use the sort >> merge before running the query in prod. My region servers in test don’t >> have that much heap space (about 2 gigs) if that makes a difference. Do I >> need to force a major compaction, generate statistics, anything like that? >> >> >> >> Thanks! >> >> Matt >> >> >> >> >> >> *From:* Maryann Xue [mailto:maryann....@gmail.com] >> *Sent:* 11 March 2015 20:16 >> >> >> *To:* user@phoenix.apache.org >> *Subject:* Re: Using Hints in Phoenix >> >> >> >> Hi Matt, >> >> >> >> Views or tables should not matter in this. Would you mind sharing your >> query and DDLs? >> >> >> >> >> >> Thanks, >> >> Maryann >> >> >> >> >> >> On Wed, Mar 11, 2015 at 6:06 AM, Matthew Johnson <matt.john...@algomi.com> >> wrote: >> >> Hi Maryann, >> >> >> >> I am using 4.3.0 (I upgraded as soon as it was released, largely so I >> could pick up this feature). I am actually joining views rather than tables >> – would this make a difference? >> >> >> >> Cheers, >> >> Matt >> >> >> >> *From:* Maryann Xue [mailto:maryann....@gmail.com] >> *Sent:* 10 March 2015 20:54 >> >> >> *To:* user@phoenix.apache.org >> *Subject:* Re: Using Hints in Phoenix >> >> >> >> Hi Matt, >> >> >> >> Which version of Phoenix are you using? Sort-merge join is only available >> in Phoenix 4.3. >> >> >> >> >> >> Thanks, >> >> Maryann >> >> >> >> On Tue, Mar 10, 2015 at 6:11 AM, Matthew Johnson <matt.john...@algomi.com> >> wrote: >> >> Hi Maryann, >> >> >> >> Thanks for clarifying that for me. I’ve been playing with the hint >> *USE_SORT_MERGE_JOIN* to try and solve an issue when joining two very >> large tables: >> >> >> >> *Error: Encountered exception in sub plan [0] execution.* >> >> *SQLState: null* >> >> *ErrorCode: 0* >> >> >> >> Which I believe is related to a lack of memory for building the hash >> table? I thought that using *SELECT /*+ USE_SORT_MERGE_JOIN*/* would >> make the join much slower but would work on an unlimited data set – did I >> misunderstand? The explain plan does not change with or without this hint: >> >> >> >> CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1 >> >> SERVER AGGREGATE INTO SINGLE ROW >> >> PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE) >> >> CLIENT 15-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable2 >> >> >> >> Cheers, >> >> Matt >> >> >> >> >> >> *From:* Maryann Xue [mailto:maryann....@gmail.com] >> *Sent:* 09 March 2015 15:00 >> *To:* user@phoenix.apache.org >> *Subject:* Re: Using Hints in Phoenix >> >> >> >> Hi Matt, >> >> >> >> So far in Phoenix, hints are only supported as specified right after >> keywords SELECT, UPSERT and DELETE. Same for join queries. It is currently >> impossible to hint a certain join algorithm for a specific join node in a >> multiple join query. However, for subqueries, the inner query can have its >> own hints, independent of the outer query, like "SELECT /*+ INDEX(t idx1)*/ >> col1, col2 FROM t WHERE col3 IN (SELECT /*+ NO_INDEX*/ id FROM r WHERE name >> = 'x')". >> >> >> >> >> >> Thanks, >> >> Maryann >> >> >> >> On Mon, Mar 9, 2015 at 7:26 AM, Matthew Johnson <matt.john...@algomi.com> >> wrote: >> >> Hi guys, >> >> >> >> This is more of a general question than a problem – but I’m just >> wondering if someone can clarify for me what the syntax rules are for hints >> in Phoenix. Does it matter where in the query they go? Do they always go >> something like *SELECT <insert hint> x from y*? Or, if the hint is for a >> join (eg Sort Merge) does it go in the join part (*SELECT x from y inner >> join <insert hint> z on j = k*)? >> >> >> >> Couldn’t seem to find anything specific on this in the docs, and haven’t >> worked much with database hints in general so maybe there is a convention >> that I am not aware of – apologies if it’s a stupid question! >> >> >> >> Cheers, >> >> Matt >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> > >