There’s no shipping of any tables with a sort merge join. On Fri, Dec 22, 2017 at 1:02 PM Ethan Wang <aerto...@gmail.com> wrote:
> I see. Looks like it's possible the rhs (MYTABLE) is too big to ship > around without get filtered first. Just for experiment, if you took out > hint USE_SORT_MERGE_JOIN, what will be the plan? > > > On December 22, 2017 at 12:46:25 PM, James Taylor (jamestay...@apache.org) > wrote: > > For sort merge join, both post-filtered table results are sorted on the > server side and then a merge sort is done on the client-side. > > On Fri, Dec 22, 2017 at 12:44 PM, Ethan <ew...@apache.org> wrote: > >> Hello Flavio, >> >> From the plan looks like to me the second query is doing the filter at >> parent table (PEOPLE). So what is the size of your PEOPLE and MYTABLE >> (after filtered) respectively? >> >> For sort merge join, anyone knows are the both sides get shipped to >> client to do the merge sort? >> >> Thanks, >> >> >> On December 22, 2017 at 9:58:30 AM, Flavio Pompermaier ( >> pomperma...@okkam.it) wrote: >> >> Any help here...? >> >> On 20 Dec 2017 17:58, "Flavio Pompermaier" <pomperma...@okkam.it> wrote: >> >>> Hi to all, >>> I'm trying to find the best query for my use case but I found that one >>> version work and the other one does not (unless that I don't apply some >>> tuning to timeouts etc like explained in [1]). >>> >>> The 2 queries extract the same data but, while the first query >>> terminates the second does not. >>> *PS*: without the USE_SORT_MERGE_JOIN both queries weren't working >>> >>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*) >>> FROM PEOPLE ds JOIN MYTABLE l ON ds.PERSON_ID = l.LOCALID >>> WHERE l.EID IS NULL AND l.DSID = 'PEOPLE' AND l.HAS_CANDIDATES = FALSE; >>> >>> +---------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+ >>> | PLAN >>> | EST_BYTES_READ | EST_ROWS_READ >>> | EST_INFO_TS | >>> >>> +---------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+ >>> | SORT-MERGE-JOIN (INNER) TABLES >>> | 14155777900 <(415)%20577-7900> >>> | 12077867 | 1513754378759 | >>> | CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 3-WAY FULL >>> SCAN OVER PEOPLE | 14155777900 <(415)%20577-7900> | >>> 12077867 | 1513754378759 | >>> | SERVER FILTER BY FIRST KEY ONLY >>> | 14155777900 <(415)%20577-7900> >>> | 12077867 | 1513754378759 | >>> | CLIENT MERGE SORT >>> | 14155777900 <(415)%20577-7900> >>> | 12077867 | 1513754378759 | >>> | AND (SKIP MERGE) >>> | 14155777900 <(415)%20577-7900> >>> | 12077867 | 1513754378759 | >>> | CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 15-WAY >>> RANGE SCAN OVER MYTABLE [0] - [2] | 14155777900 <(415)%20577-7900> >>> | 12077867 | 1513754378759 | >>> | SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND >>> HAS_CANDIDATES = false) | 14155777900 >>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>> | SERVER SORTED BY [L.LOCALID] >>> | 14155777900 >>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>> | CLIENT MERGE SORT >>> | 14155777900 <(415)%20577-7900> >>> | 12077867 | 1513754378759 | >>> | CLIENT AGGREGATE INTO SINGLE ROW >>> | 14155777900 <(415)%20577-7900> >>> | 12077867 | 1513754378759 | >>> >>> +---------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+ >>> >>> >>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*) >>> FROM (SELECT LOCALID FROM MYTABLE >>> WHERE EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = FALSE) l JOIN >>> PEOPLE ds ON ds.PERSON_ID = l.LOCALID; >>> >>> +--------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+ >>> | PLAN >>> | EST_BYTES_READ | EST_ROWS_READ >>> | EST_INFO_TS | >>> >>> +--------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+ >>> | SORT-MERGE-JOIN (INNER) TABLES >>> | 14155777900 <(415)%20577-7900> >>> | 12077867 | 1513754378759 | >>> | CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 3-WAY RANGE >>> SCAN OVER MYTABLE [0] - [2] | 14155777900 <(415)%20577-7900> | >>> 12077867 | 1513754378759 | >>> | SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND >>> HAS_CANDIDATES = false) | 14155777900 >>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>> | CLIENT MERGE SORT >>> | 14155777900 <(415)%20577-7900> >>> | 12077867 | 1513754378759 | >>> | AND (SKIP MERGE) >>> | 14155777900 <(415)%20577-7900> >>> | 12077867 | 1513754378759 | >>> | CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 42-WAY >>> FULL SCAN OVER PEOPLE | 14155777900 <(415)%20577-7900> >>> | 12077867 | 1513754378759 | >>> | SERVER FILTER BY FIRST KEY ONLY >>> | 14155777900 <(415)%20577-7900> >>> | 12077867 | 1513754378759 | >>> | SERVER SORTED BY [DS.PERSON_ID] >>> | 14155777900 <(415)%20577-7900> >>> | 12077867 | 1513754378759 | >>> | CLIENT MERGE SORT >>> | 14155777900 <(415)%20577-7900> >>> | 12077867 | 1513754378759 | >>> | CLIENT AGGREGATE INTO SINGLE ROW >>> | 14155777900 <(415)%20577-7900> >>> | 12077867 | 1513754378759 | >>> >>> +--------------------------------------------------------------------------------------------------------------+-----------------+----------------+----------------+ >>> 10 rows selected (0.041 seconds) >>> >>> What do you think? Whould I try to give more resources to HBase/Phoenix >>> or is the first query the best (and reliable) one? >>> Any insight about this is highly appreciated.. >>> >>> Best, >>> Flavio >>> >> >