Re: Using Hints in Phoenix

Maryann Xue Thu, 19 Mar 2015 09:09:13 -0700

By the way, to answer your previous questions, Phoenix joins have not
started to use stats so far, but the hints are parsed and handled in a
universal way regardless of what type of query it is.


Thanks,
Maryann

On Thu, Mar 19, 2015 at 12:03 PM, Maryann Xue <maryann....@gmail.com> wrote:

> Hi Matt,
>
> Thank you very much for the investigation! Actually I tried with Java code
> in a unit test (for I don't have an standalone HBase environment right
> now), but could not reproduce it anyway. Then I thought it could be using
> "createStatement()" vs "prepareStatement()" after seeing your
> investigation, but again that was not it.
>
> But anyway, it just occurred to me that there might be an easy way to
> force running sort-merge-join (just for the purpose of testing), by running
> a query with FULL OUTER JOIN. Could you please try that out and see what
> happens?
>
>
> Thanks,
> Maryann
>
>
> On Thu, Mar 19, 2015 at 7:54 AM, Matthew Johnson <matt.john...@algomi.com>
> wrote:
>
>> Hi Maryann,
>>
>>
>>
>> Sending this to the whole user group now I have a bit more investigation.
>> Managed to do some digging and also figured out how to use sqlline.py, and
>> got the following results.
>>
>>
>>
>> With sqlline.py:
>>
>>
>>
>> *0: jdbc:phoenix:sales1,sales2,sales3> explain SELECT /*+
>> USE_SORT_MERGE_JOIN */ * FROM testtable1 t1 join testtable2 t2 on t2.rowid
>> = t1.rowid;*
>>
>>
>>
>> *+------------------------------------------+*
>>
>> *|                   PLAN                   |*
>>
>> *+------------------------------------------+*
>>
>> *| SORT-MERGE-JOIN (INNER) TABLES           |*
>>
>> *|     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE1 |*
>>
>> *| AND                                      |*
>>
>> *|     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE2 |*
>>
>> *|         SERVER SORTED BY [T2.ROWID]      |*
>>
>> *|     CLIENT MERGE SORT                    |*
>>
>> *+------------------------------------------+*
>>
>> *6 rows selected (0.035 seconds)*
>>
>>
>>
>>
>>
>> So it does seem to work from sqlline!! Then, I decided to try and use it
>> from inside a Java class:
>>
>>
>>
>> Class.*forName*("org.apache.phoenix.jdbc.PhoenixDriver");
>>
>>
>>
>>                      Connection conn = DriverManager.*getConnection*(
>> "jdbc:phoenix:sales1,sales2,sales3", "", "");
>>
>>
>>
>>                      // Create a Statement class to execute the SQL
>> statement
>>
>>                      Statement stmtLimited = conn.createStatement();
>>
>>
>>
>>                      System.*out*.println("Executing statement");
>>
>>
>>
>>                      // Execute the SQL statement and get the results in
>> a *Resultset*
>>
>>                      ResultSet rsLimited = stmtLimited.executeQuery("explain
>> SELECT /*+ USE_SORT_MERGE_JOIN */ * FROM testtable1 t1 join testtable2 t2
>> on t2.rowid = t1.rowid");
>>
>>                      *while*(rsLimited.next()) {
>>
>>                            String plan = rsLimited.getString("PLAN");
>>
>>                            System.*out*.println(plan);
>>
>>                      }
>>
>>
>>
>>                      stmtLimited.close();
>>
>>                      conn.close();
>>
>>
>>
>> And I get:
>>
>>
>>
>> Executing statement
>>
>> CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE1
>>
>>     PARALLEL INNER-JOIN TABLE 0
>>
>>         CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER TESTTABLE2
>>
>>     DYNAMIC SERVER FILTER BY ROWID BETWEEN MIN/MAX OF (T2.ROWID)
>>
>>
>>
>>
>>
>> So it seems that when executed from Java code (Squirrel is also a
>> Java-based client) it ignores the hint! What do you think? Is there a way
>> around this?
>>
>>
>>
>> Thanks!
>>
>> Matt
>>
>>
>>
>>
>>
>> *From:* Maryann Xue [mailto:maryann....@gmail.com]
>> *Sent:* 17 March 2015 16:23
>> *To:* user@phoenix.apache.org
>> *Subject:* Re: Using Hints in Phoenix
>>
>>
>>
>> Hi Matt,
>>
>>
>>
>> Sorry that I still could not reproduce the issue. Could you try
>> reproducing it in a unit test?
>>
>>
>>
>>
>>
>> Thanks,
>>
>> Maryann
>>
>>
>>
>> On Fri, Mar 13, 2015 at 6:20 AM, Matthew Johnson <matt.john...@algomi.com>
>> wrote:
>>
>> Hi Maryann,
>>
>>
>>
>> All hbases in my cluster have:
>>
>>
>>
>> *phoenix-4.3.0-server.jar*
>>
>>
>>
>> in the lib folder, and my client is using:
>>
>>
>>
>>                 *phoenix-4.3.0-client.jar*
>>
>>
>>
>> I generated the MD5 in case something has gone wrong with the versioning:
>>
>>
>>
>> *e0ade979e7b444fb8f8f4b7b5578edab*
>>
>>
>>
>> And I have opened up the jar, and can see the new class
>> *SortMergeJoinPlan.class*, so presumably I have the right version – is
>> there anything else I can check?
>>
>>
>>
>> Cheers,
>>
>> Matt
>>
>>
>>
>> *From:* Maryann Xue [mailto:maryann....@gmail.com]
>> *Sent:* 12 March 2015 23:02
>>
>>
>> *To:* user@phoenix.apache.org
>> *Subject:* Re: Using Hints in Phoenix
>>
>>
>>
>> Hi Matt,
>>
>>
>>
>> I checked on my side. Sort-merge-join would work for both tables and
>> views. And I also verified that 4.3 branch does have the corresponding
>> check-in (
>> https://git1-us-west.apache.org/repos/asf?p=phoenix.git;a=commitdiff;h=ebc7ee42cdb2b05a293f54dc687ca975db9acbc3)
>> although the check-in message had a little mistake there.
>>
>>
>>
>> Could you please verify your Phoenix library version again, Matt?
>> Especially the client.
>>
>>
>>
>>
>>
>> Thanks,
>>
>> Maryann
>>
>>
>>
>> On Thu, Mar 12, 2015 at 6:00 PM, Maryann Xue <maryann....@gmail.com>
>> wrote:
>>
>> Hi Matt,
>>
>>
>>
>> Thanks for sharing the query. Using that hint should supposedly force
>> sort-merge join no matter what. I will go ahead and verify that.
>>
>>
>>
>>
>>
>> Thanks,
>>
>> Maryann
>>
>>
>>
>> On Thu, Mar 12, 2015 at 2:25 PM, Matthew Johnson <matt.john...@algomi.com>
>> wrote:
>>
>> Hi Maryann,
>>
>>
>>
>> My views:
>>
>>
>>
>> *create view "mytestview1" *
>>
>> *(*
>>
>> *"rowid" varchar primary key, *
>>
>> *"metadata"."bId" varchar, *
>>
>> *"metadata"."dId" varchar*
>>
>> *) *
>>
>>
>>
>> *create view "mytestview2" *
>>
>> *(*
>>
>> *"rowid" varchar primary key, *
>>
>> *"data"."bId" varchar,*
>>
>> *"data"."details" varchar,*
>>
>> *"data"."comment" varchar*
>>
>> *) *
>>
>>
>>
>> The amount of data:
>>
>>
>>
>> *SELECT count(*) FROM "mytestview1";*
>>
>> *  -- 78,549*
>>
>>
>>
>> *SELECT count(*) FROM "mytestview2";*
>>
>> *  -- 2,130,905*
>>
>>
>>
>> The query:
>>
>>
>>
>> Without hint:
>>
>>
>>
>> *EXPLAIN select count(*) *
>>
>> *from "mytestview1" m1*
>>
>> *inner join "mytestview2" m2*
>>
>> *on m1."bId" = m2."bId"*
>>
>>
>>
>> *CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytest1*
>>
>> *    SERVER AGGREGATE INTO SINGLE ROW*
>>
>> *    PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE)*
>>
>> *        CLIENT 5-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytest2*
>>
>>
>>
>> With hint:
>>
>>
>>
>> *EXPLAIN select /*+ USE_SORT_MERGE_JOIN */ count(*) *
>>
>> *from "mytestview1" m1*
>>
>> *inner join "mytestview2" m2*
>>
>> *on m1."bId" = m2."bId"*
>>
>>
>>
>> *CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytest1*
>>
>> *    SERVER AGGREGATE INTO SINGLE ROW*
>>
>> *    PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE)*
>>
>> *        CLIENT 5-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytest2*
>>
>>
>>
>>
>>
>> I know that when joining that I should ideally do <large table> join
>> <small table>, but this is on my test environment and in production both
>> tables are roughly the same size, so I’m trying to force it to use the sort
>> merge before running the query in prod. My region servers in test don’t
>> have that much heap space (about 2 gigs) if that makes a difference. Do I
>> need to force a major compaction, generate statistics, anything like that?
>>
>>
>>
>> Thanks!
>>
>> Matt
>>
>>
>>
>>
>>
>> *From:* Maryann Xue [mailto:maryann....@gmail.com]
>> *Sent:* 11 March 2015 20:16
>>
>>
>> *To:* user@phoenix.apache.org
>> *Subject:* Re: Using Hints in Phoenix
>>
>>
>>
>> Hi Matt,
>>
>>
>>
>> Views or tables should not matter in this. Would you mind sharing your
>> query and DDLs?
>>
>>
>>
>>
>>
>> Thanks,
>>
>> Maryann
>>
>>
>>
>>
>>
>> On Wed, Mar 11, 2015 at 6:06 AM, Matthew Johnson <matt.john...@algomi.com>
>> wrote:
>>
>> Hi Maryann,
>>
>>
>>
>> I am using 4.3.0 (I upgraded as soon as it was released, largely so I
>> could pick up this feature). I am actually joining views rather than tables
>> – would this make a difference?
>>
>>
>>
>> Cheers,
>>
>> Matt
>>
>>
>>
>> *From:* Maryann Xue [mailto:maryann....@gmail.com]
>> *Sent:* 10 March 2015 20:54
>>
>>
>> *To:* user@phoenix.apache.org
>> *Subject:* Re: Using Hints in Phoenix
>>
>>
>>
>> Hi Matt,
>>
>>
>>
>> Which version of Phoenix are you using? Sort-merge join is only available
>> in Phoenix 4.3.
>>
>>
>>
>>
>>
>> Thanks,
>>
>> Maryann
>>
>>
>>
>> On Tue, Mar 10, 2015 at 6:11 AM, Matthew Johnson <matt.john...@algomi.com>
>> wrote:
>>
>> Hi Maryann,
>>
>>
>>
>> Thanks for clarifying that for me. I’ve been playing with the hint
>> *USE_SORT_MERGE_JOIN* to try and solve an issue when joining two very
>> large tables:
>>
>>
>>
>> *Error: Encountered exception in sub plan [0] execution.*
>>
>> *SQLState:  null*
>>
>> *ErrorCode: 0*
>>
>>
>>
>> Which I believe is related to a lack of memory for building the hash
>> table? I thought that using *SELECT /*+ USE_SORT_MERGE_JOIN*/* would
>> make the join much slower but would work on an unlimited data set – did I
>> misunderstand? The explain plan does not change with or without this hint:
>>
>>
>>
>> CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1
>>
>>     SERVER AGGREGATE INTO SINGLE ROW
>>
>>     PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE)
>>
>>         CLIENT 15-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable2
>>
>>
>>
>> Cheers,
>>
>> Matt
>>
>>
>>
>>
>>
>> *From:* Maryann Xue [mailto:maryann....@gmail.com]
>> *Sent:* 09 March 2015 15:00
>> *To:* user@phoenix.apache.org
>> *Subject:* Re: Using Hints in Phoenix
>>
>>
>>
>> Hi Matt,
>>
>>
>>
>> So far in Phoenix, hints are only supported as specified right after
>> keywords SELECT, UPSERT and DELETE. Same for join queries. It is currently
>> impossible to hint a certain join algorithm for a specific join node in a
>> multiple join query. However, for subqueries, the inner query can have its
>> own hints, independent of the outer query, like "SELECT /*+ INDEX(t idx1)*/
>> col1, col2 FROM t WHERE col3 IN (SELECT /*+ NO_INDEX*/ id FROM r WHERE name
>> = 'x')".
>>
>>
>>
>>
>>
>> Thanks,
>>
>> Maryann
>>
>>
>>
>> On Mon, Mar 9, 2015 at 7:26 AM, Matthew Johnson <matt.john...@algomi.com>
>> wrote:
>>
>> Hi guys,
>>
>>
>>
>> This is more of a general question than a problem – but I’m just
>> wondering if someone can clarify for me what the syntax rules are for hints
>> in Phoenix. Does it matter where in the query they go? Do they always go
>> something like *SELECT <insert hint> x from y*? Or, if the hint is for a
>> join (eg Sort Merge) does it go in the join part (*SELECT x from y inner
>> join <insert hint> z on j = k*)?
>>
>>
>>
>> Couldn’t seem to find anything specific on this in the docs, and haven’t
>> worked much with database hints in general so maybe there is a convention
>> that I am not aware of – apologies if it’s a stupid question!
>>
>>
>>
>> Cheers,
>>
>> Matt
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>
>

Re: Using Hints in Phoenix

Reply via email to