Hi Matt, Thanks for sharing the query. Using that hint should supposedly force sort-merge join no matter what. I will go ahead and verify that.
Thanks, Maryann On Thu, Mar 12, 2015 at 2:25 PM, Matthew Johnson <matt.john...@algomi.com> wrote: > Hi Maryann, > > > > My views: > > > > *create view "mytestview1" * > > *(* > > *"rowid" varchar primary key, * > > *"metadata"."bId" varchar, * > > *"metadata"."dId" varchar* > > *) * > > > > *create view "mytestview2" * > > *(* > > *"rowid" varchar primary key, * > > *"data"."bId" varchar,* > > *"data"."details" varchar,* > > *"data"."comment" varchar* > > *) * > > > > The amount of data: > > > > *SELECT count(*) FROM "mytestview1";* > > * -- 78,549* > > > > *SELECT count(*) FROM "mytestview2";* > > * -- 2,130,905* > > > > The query: > > > > Without hint: > > > > *EXPLAIN select count(*) * > > *from "mytestview1" m1* > > *inner join "mytestview2" m2* > > *on m1."bId" = m2."bId"* > > > > *CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytest1* > > * SERVER AGGREGATE INTO SINGLE ROW* > > * PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE)* > > * CLIENT 5-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytest2* > > > > With hint: > > > > *EXPLAIN select /*+ USE_SORT_MERGE_JOIN */ count(*) * > > *from "mytestview1" m1* > > *inner join "mytestview2" m2* > > *on m1."bId" = m2."bId"* > > > > *CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytest1* > > * SERVER AGGREGATE INTO SINGLE ROW* > > * PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE)* > > * CLIENT 5-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytest2* > > > > > > I know that when joining that I should ideally do <large table> join > <small table>, but this is on my test environment and in production both > tables are roughly the same size, so I’m trying to force it to use the sort > merge before running the query in prod. My region servers in test don’t > have that much heap space (about 2 gigs) if that makes a difference. Do I > need to force a major compaction, generate statistics, anything like that? > > > > Thanks! > > Matt > > > > > > *From:* Maryann Xue [mailto:maryann....@gmail.com] > *Sent:* 11 March 2015 20:16 > > *To:* user@phoenix.apache.org > *Subject:* Re: Using Hints in Phoenix > > > > Hi Matt, > > > > Views or tables should not matter in this. Would you mind sharing your > query and DDLs? > > > > > > Thanks, > > Maryann > > > > > > On Wed, Mar 11, 2015 at 6:06 AM, Matthew Johnson <matt.john...@algomi.com> > wrote: > > Hi Maryann, > > > > I am using 4.3.0 (I upgraded as soon as it was released, largely so I > could pick up this feature). I am actually joining views rather than tables > – would this make a difference? > > > > Cheers, > > Matt > > > > *From:* Maryann Xue [mailto:maryann....@gmail.com] > *Sent:* 10 March 2015 20:54 > > > *To:* user@phoenix.apache.org > *Subject:* Re: Using Hints in Phoenix > > > > Hi Matt, > > > > Which version of Phoenix are you using? Sort-merge join is only available > in Phoenix 4.3. > > > > > > Thanks, > > Maryann > > > > On Tue, Mar 10, 2015 at 6:11 AM, Matthew Johnson <matt.john...@algomi.com> > wrote: > > Hi Maryann, > > > > Thanks for clarifying that for me. I’ve been playing with the hint > *USE_SORT_MERGE_JOIN* to try and solve an issue when joining two very > large tables: > > > > *Error: Encountered exception in sub plan [0] execution.* > > *SQLState: null* > > *ErrorCode: 0* > > > > Which I believe is related to a lack of memory for building the hash > table? I thought that using *SELECT /*+ USE_SORT_MERGE_JOIN*/* would make > the join much slower but would work on an unlimited data set – did I > misunderstand? The explain plan does not change with or without this hint: > > > > CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable1 > > SERVER AGGREGATE INTO SINGLE ROW > > PARALLEL INNER-JOIN TABLE 0 (SKIP MERGE) > > CLIENT 15-CHUNK PARALLEL 1-WAY FULL SCAN OVER mytable2 > > > > Cheers, > > Matt > > > > > > *From:* Maryann Xue [mailto:maryann....@gmail.com] > *Sent:* 09 March 2015 15:00 > *To:* user@phoenix.apache.org > *Subject:* Re: Using Hints in Phoenix > > > > Hi Matt, > > > > So far in Phoenix, hints are only supported as specified right after > keywords SELECT, UPSERT and DELETE. Same for join queries. It is currently > impossible to hint a certain join algorithm for a specific join node in a > multiple join query. However, for subqueries, the inner query can have its > own hints, independent of the outer query, like "SELECT /*+ INDEX(t idx1)*/ > col1, col2 FROM t WHERE col3 IN (SELECT /*+ NO_INDEX*/ id FROM r WHERE name > = 'x')". > > > > > > Thanks, > > Maryann > > > > On Mon, Mar 9, 2015 at 7:26 AM, Matthew Johnson <matt.john...@algomi.com> > wrote: > > Hi guys, > > > > This is more of a general question than a problem – but I’m just wondering > if someone can clarify for me what the syntax rules are for hints in > Phoenix. Does it matter where in the query they go? Do they always go > something like *SELECT <insert hint> x from y*? Or, if the hint is for a > join (eg Sort Merge) does it go in the join part (*SELECT x from y inner > join <insert hint> z on j = k*)? > > > > Couldn’t seem to find anything specific on this in the docs, and haven’t > worked much with database hints in general so maybe there is a convention > that I am not aware of – apologies if it’s a stupid question! > > > > Cheers, > > Matt > > > > > > > > >