Here it is: https://issues.apache.org/jira/browse/PHOENIX-4508
On Thu, Dec 28, 2017 at 9:19 AM, Flavio Pompermaier <pomperma...@okkam.it> wrote: > Hi James, > What should be the subject of the JIRA? > Could you open it for me...? I'm on vacation and opening tickets on JIRA > from mobile is not that easy... > Just 2 observations: PEOPLE table is indeed sorted by PEOPLE_ID, MY_TABLE > is somewhat of a pivot table so it's MUCH bigger that PEOPLE in terms of > cardinality, while that's probably not true in terms of cells and, thus, > occupied space and memory. > > Best, > Flavio > > > > On 28 Dec 2017 08:10, "James Taylor" <jamestay...@apache.org> wrote: > > Looks like the second query is sorting the entire PEOPLE table (though it > seems like that shouldn’t be necessary as it’s probably already sorted by > PEOPLE_ID) while the first one is sorting only part of MY_TABLE (which is > likely less data). Might be a bug as the queries look the same. > > Please log a JIRA and thanks for all the details. > > On Wed, Dec 27, 2017 at 3:44 PM Flavio Pompermaier <pomperma...@okkam.it> > wrote: > >> Ok. So why the 2nd query requires more memory than the first one >> (nonetheless USE_SORT_MERGE_JOIN is used) and can't complete? >> >> >> On 28 Dec 2017 00:33, "James Taylor" <jamestay...@apache.org> wrote: >> >> A hash join (the default) will be faster but the tables being cached >> (last or RHS table being joined) must be small enough to fit into memory on >> the region server. If it's too big, you can use the USE_SORT_MERGE_JOIN >> which would not have this restriction. >> >> On Wed, Dec 27, 2017 at 3:16 PM, Flavio Pompermaier <pomperma...@okkam.it >> > wrote: >> >>> Just to summarize things...is the best approach, in terms of required >>> memory, for Apache Phoenix queries to use sort merge join? Should inner >>> queries be avoided? >>> >>> >>> On 22 Dec 2017 22:47, "Flavio Pompermaier" <pomperma...@okkam.it> wrote: >>> >>> MYTABLE is definitely much bigger than PEOPLE table, in terms of >>> cardinality. In terms of cells (rows x columns) PEOPLE is probably bigger >>> >>> On 22 Dec 2017 22:36, "Ethan" <ew...@apache.org> wrote: >>> >>>> I see. I think client side probably hold on to the iterators from the >>>> both sides and crawling forward to do the merge sort. in this case should >>>> be no much memory footprint either way where the filter is performed. >>>> >>>> On December 22, 2017 at 1:04:18 PM, James Taylor ( >>>> jamestay...@apache.org) wrote: >>>> >>>> There’s no shipping of any tables with a sort merge join. >>>> >>>> On Fri, Dec 22, 2017 at 1:02 PM Ethan Wang <aerto...@gmail.com> wrote: >>>> >>>>> I see. Looks like it's possible the rhs (MYTABLE) is too big to ship >>>>> around without get filtered first. Just for experiment, if you took out >>>>> hint USE_SORT_MERGE_JOIN, what will be the plan? >>>>> >>>>> >>>>> On December 22, 2017 at 12:46:25 PM, James Taylor ( >>>>> jamestay...@apache.org) wrote: >>>>> >>>>> For sort merge join, both post-filtered table results are sorted on >>>>> the server side and then a merge sort is done on the client-side. >>>>> >>>>> On Fri, Dec 22, 2017 at 12:44 PM, Ethan <ew...@apache.org> wrote: >>>>> >>>>>> Hello Flavio, >>>>>> >>>>>> From the plan looks like to me the second query is doing the filter >>>>>> at parent table (PEOPLE). So what is the size of your PEOPLE and MYTABLE >>>>>> (after filtered) respectively? >>>>>> >>>>>> For sort merge join, anyone knows are the both sides get shipped to >>>>>> client to do the merge sort? >>>>>> >>>>>> Thanks, >>>>>> >>>>>> >>>>>> On December 22, 2017 at 9:58:30 AM, Flavio Pompermaier ( >>>>>> pomperma...@okkam.it) wrote: >>>>>> >>>>>> Any help here...? >>>>>> >>>>>> On 20 Dec 2017 17:58, "Flavio Pompermaier" <pomperma...@okkam.it> >>>>>> wrote: >>>>>> >>>>>>> Hi to all, >>>>>>> I'm trying to find the best query for my use case but I found that >>>>>>> one version work and the other one does not (unless that I don't apply >>>>>>> some >>>>>>> tuning to timeouts etc like explained in [1]). >>>>>>> >>>>>>> The 2 queries extract the same data but, while the first query >>>>>>> terminates the second does not. >>>>>>> *PS*: without the USE_SORT_MERGE_JOIN both queries weren't working >>>>>>> >>>>>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*) >>>>>>> FROM PEOPLE ds JOIN MYTABLE l ON ds.PERSON_ID = l.LOCALID >>>>>>> WHERE l.EID IS NULL AND l.DSID = 'PEOPLE' AND l.HAS_CANDIDATES = >>>>>>> FALSE; >>>>>>> +----------------------------------------------------------- >>>>>>> ----------------------------------------------------+------- >>>>>>> ----------+----------------+----------------+ >>>>>>> | PLAN >>>>>>> | EST_BYTES_READ | >>>>>>> EST_ROWS_READ | EST_INFO_TS | >>>>>>> +----------------------------------------------------------- >>>>>>> ----------------------------------------------------+------- >>>>>>> ----------+----------------+----------------+ >>>>>>> | SORT-MERGE-JOIN (INNER) TABLES >>>>>>> | 14155777900 >>>>>>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>>>>>> | CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 3-WAY >>>>>>> FULL SCAN OVER PEOPLE | 14155777900 >>>>>>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>>>>>> | SERVER FILTER BY FIRST KEY ONLY >>>>>>> | 14155777900 >>>>>>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>>>>>> | CLIENT MERGE SORT >>>>>>> | 14155777900 >>>>>>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>>>>>> | AND (SKIP MERGE) >>>>>>> | 14155777900 >>>>>>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>>>>>> | CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 15-WAY >>>>>>> RANGE SCAN OVER MYTABLE [0] - [2] | 14155777900 <(415)%20577-7900> >>>>>>> | 12077867 | 1513754378759 | >>>>>>> | SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND >>>>>>> HAS_CANDIDATES = false) | 14155777900 >>>>>>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>>>>>> | SERVER SORTED BY [L.LOCALID] >>>>>>> | 14155777900 >>>>>>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>>>>>> | CLIENT MERGE SORT >>>>>>> | 14155777900 >>>>>>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>>>>>> | CLIENT AGGREGATE INTO SINGLE ROW >>>>>>> | 14155777900 >>>>>>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>>>>>> +----------------------------------------------------------- >>>>>>> ----------------------------------------------------+------- >>>>>>> ----------+----------------+----------------+ >>>>>>> >>>>>>> >>>>>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*) >>>>>>> FROM (SELECT LOCALID FROM MYTABLE >>>>>>> WHERE EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = FALSE) l >>>>>>> JOIN PEOPLE ds ON ds.PERSON_ID = l.LOCALID; >>>>>>> +----------------------------------------------------------- >>>>>>> ---------------------------------------------------+-------- >>>>>>> ---------+----------------+----------------+ >>>>>>> | PLAN >>>>>>> | EST_BYTES_READ | >>>>>>> EST_ROWS_READ | EST_INFO_TS | >>>>>>> +----------------------------------------------------------- >>>>>>> ---------------------------------------------------+-------- >>>>>>> ---------+----------------+----------------+ >>>>>>> | SORT-MERGE-JOIN (INNER) TABLES >>>>>>> | 14155777900 >>>>>>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>>>>>> | CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 3-WAY >>>>>>> RANGE SCAN OVER MYTABLE [0] - [2] | 14155777900 <(415)%20577-7900> >>>>>>> | 12077867 | 1513754378759 | >>>>>>> | SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND >>>>>>> HAS_CANDIDATES = false) | 14155777900 >>>>>>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>>>>>> | CLIENT MERGE SORT >>>>>>> | 14155777900 >>>>>>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>>>>>> | AND (SKIP MERGE) >>>>>>> | 14155777900 >>>>>>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>>>>>> | CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 42-WAY >>>>>>> FULL SCAN OVER PEOPLE | 14155777900 <(415)%20577-7900> >>>>>>> | 12077867 | 1513754378759 | >>>>>>> | SERVER FILTER BY FIRST KEY ONLY >>>>>>> | 14155777900 >>>>>>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>>>>>> | SERVER SORTED BY [DS.PERSON_ID] >>>>>>> | 14155777900 >>>>>>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>>>>>> | CLIENT MERGE SORT >>>>>>> | 14155777900 >>>>>>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>>>>>> | CLIENT AGGREGATE INTO SINGLE ROW >>>>>>> | 14155777900 >>>>>>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>>>>>> +----------------------------------------------------------- >>>>>>> ---------------------------------------------------+-------- >>>>>>> ---------+----------------+----------------+ >>>>>>> 10 rows selected (0.041 seconds) >>>>>>> >>>>>>> What do you think? Whould I try to give more resources to >>>>>>> HBase/Phoenix or is the first query the best (and reliable) one? >>>>>>> Any insight about this is highly appreciated.. >>>>>>> >>>>>>> Best, >>>>>>> Flavio >>>>>>> >>>>>> >>>>> >>> >> >> > -- Flavio Pompermaier Development Department OKKAM S.r.l. Tel. +(39) 0461 041809