MYTABLE is definitely much bigger than PEOPLE table, in terms of cardinality. In terms of cells (rows x columns) PEOPLE is probably bigger
On 22 Dec 2017 22:36, "Ethan" <ew...@apache.org> wrote: > I see. I think client side probably hold on to the iterators from the both > sides and crawling forward to do the merge sort. in this case should be no > much memory footprint either way where the filter is performed. > > On December 22, 2017 at 1:04:18 PM, James Taylor (jamestay...@apache.org) > wrote: > > There’s no shipping of any tables with a sort merge join. > > On Fri, Dec 22, 2017 at 1:02 PM Ethan Wang <aerto...@gmail.com> wrote: > >> I see. Looks like it's possible the rhs (MYTABLE) is too big to ship >> around without get filtered first. Just for experiment, if you took out >> hint USE_SORT_MERGE_JOIN, what will be the plan? >> >> >> On December 22, 2017 at 12:46:25 PM, James Taylor (jamestay...@apache.org) >> wrote: >> >> For sort merge join, both post-filtered table results are sorted on the >> server side and then a merge sort is done on the client-side. >> >> On Fri, Dec 22, 2017 at 12:44 PM, Ethan <ew...@apache.org> wrote: >> >>> Hello Flavio, >>> >>> From the plan looks like to me the second query is doing the filter at >>> parent table (PEOPLE). So what is the size of your PEOPLE and MYTABLE >>> (after filtered) respectively? >>> >>> For sort merge join, anyone knows are the both sides get shipped to >>> client to do the merge sort? >>> >>> Thanks, >>> >>> >>> On December 22, 2017 at 9:58:30 AM, Flavio Pompermaier ( >>> pomperma...@okkam.it) wrote: >>> >>> Any help here...? >>> >>> On 20 Dec 2017 17:58, "Flavio Pompermaier" <pomperma...@okkam.it> wrote: >>> >>>> Hi to all, >>>> I'm trying to find the best query for my use case but I found that one >>>> version work and the other one does not (unless that I don't apply some >>>> tuning to timeouts etc like explained in [1]). >>>> >>>> The 2 queries extract the same data but, while the first query >>>> terminates the second does not. >>>> *PS*: without the USE_SORT_MERGE_JOIN both queries weren't working >>>> >>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*) >>>> FROM PEOPLE ds JOIN MYTABLE l ON ds.PERSON_ID = l.LOCALID >>>> WHERE l.EID IS NULL AND l.DSID = 'PEOPLE' AND l.HAS_CANDIDATES = FALSE; >>>> +----------------------------------------------------------- >>>> ----------------------------------------------------+------- >>>> ----------+----------------+----------------+ >>>> | PLAN >>>> | EST_BYTES_READ | EST_ROWS_READ >>>> | EST_INFO_TS | >>>> +----------------------------------------------------------- >>>> ----------------------------------------------------+------- >>>> ----------+----------------+----------------+ >>>> | SORT-MERGE-JOIN (INNER) TABLES >>>> | 14155777900 >>>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>>> | CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 3-WAY >>>> FULL SCAN OVER PEOPLE | 14155777900 <(415)%20577-7900> >>>> | 12077867 | 1513754378759 | >>>> | SERVER FILTER BY FIRST KEY ONLY >>>> | 14155777900 >>>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>>> | CLIENT MERGE SORT >>>> | 14155777900 >>>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>>> | AND (SKIP MERGE) >>>> | 14155777900 >>>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>>> | CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 15-WAY >>>> RANGE SCAN OVER MYTABLE [0] - [2] | 14155777900 <(415)%20577-7900> >>>> | 12077867 | 1513754378759 | >>>> | SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND >>>> HAS_CANDIDATES = false) | 14155777900 >>>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>>> | SERVER SORTED BY [L.LOCALID] >>>> | 14155777900 >>>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>>> | CLIENT MERGE SORT >>>> | 14155777900 >>>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>>> | CLIENT AGGREGATE INTO SINGLE ROW >>>> | 14155777900 >>>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>>> +----------------------------------------------------------- >>>> ----------------------------------------------------+------- >>>> ----------+----------------+----------------+ >>>> >>>> >>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*) >>>> FROM (SELECT LOCALID FROM MYTABLE >>>> WHERE EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = FALSE) l >>>> JOIN PEOPLE ds ON ds.PERSON_ID = l.LOCALID; >>>> +----------------------------------------------------------- >>>> ---------------------------------------------------+-------- >>>> ---------+----------------+----------------+ >>>> | PLAN >>>> | EST_BYTES_READ | EST_ROWS_READ >>>> | EST_INFO_TS | >>>> +----------------------------------------------------------- >>>> ---------------------------------------------------+-------- >>>> ---------+----------------+----------------+ >>>> | SORT-MERGE-JOIN (INNER) TABLES >>>> | 14155777900 <(415)%20577-7900> >>>> | 12077867 | 1513754378759 | >>>> | CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 3-WAY >>>> RANGE SCAN OVER MYTABLE [0] - [2] | 14155777900 <(415)%20577-7900> >>>> | 12077867 | 1513754378759 | >>>> | SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND >>>> HAS_CANDIDATES = false) | 14155777900 >>>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>>> | CLIENT MERGE SORT >>>> | 14155777900 >>>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>>> | AND (SKIP MERGE) >>>> | 14155777900 <(415)%20577-7900> >>>> | 12077867 | 1513754378759 | >>>> | CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 42-WAY >>>> FULL SCAN OVER PEOPLE | 14155777900 <(415)%20577-7900> >>>> | 12077867 | 1513754378759 | >>>> | SERVER FILTER BY FIRST KEY ONLY >>>> | 14155777900 >>>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>>> | SERVER SORTED BY [DS.PERSON_ID] >>>> | 14155777900 <(415)%20577-7900> >>>> | 12077867 | 1513754378759 | >>>> | CLIENT MERGE SORT >>>> | 14155777900 >>>> <(415)%20577-7900> | 12077867 | 1513754378759 | >>>> | CLIENT AGGREGATE INTO SINGLE ROW >>>> | 14155777900 <(415)%20577-7900> >>>> | 12077867 | 1513754378759 | >>>> +----------------------------------------------------------- >>>> ---------------------------------------------------+-------- >>>> ---------+----------------+----------------+ >>>> 10 rows selected (0.041 seconds) >>>> >>>> What do you think? Whould I try to give more resources to HBase/Phoenix >>>> or is the first query the best (and reliable) one? >>>> Any insight about this is highly appreciated.. >>>> >>>> Best, >>>> Flavio >>>> >>> >>