For sort merge join, both post-filtered table results are sorted on the server side and then a merge sort is done on the client-side.
On Fri, Dec 22, 2017 at 12:44 PM, Ethan <ew...@apache.org> wrote: > Hello Flavio, > > From the plan looks like to me the second query is doing the filter at > parent table (PEOPLE). So what is the size of your PEOPLE and MYTABLE > (after filtered) respectively? > > For sort merge join, anyone knows are the both sides get shipped to client > to do the merge sort? > > Thanks, > > > On December 22, 2017 at 9:58:30 AM, Flavio Pompermaier ( > pomperma...@okkam.it) wrote: > > Any help here...? > > On 20 Dec 2017 17:58, "Flavio Pompermaier" <pomperma...@okkam.it> wrote: > >> Hi to all, >> I'm trying to find the best query for my use case but I found that one >> version work and the other one does not (unless that I don't apply some >> tuning to timeouts etc like explained in [1]). >> >> The 2 queries extract the same data but, while the first query terminates >> the second does not. >> *PS*: without the USE_SORT_MERGE_JOIN both queries weren't working >> >> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*) >> FROM PEOPLE ds JOIN MYTABLE l ON ds.PERSON_ID = l.LOCALID >> WHERE l.EID IS NULL AND l.DSID = 'PEOPLE' AND l.HAS_CANDIDATES = FALSE; >> +----------------------------------------------------------- >> ----------------------------------------------------+------- >> ----------+----------------+----------------+ >> | PLAN >> | EST_BYTES_READ | EST_ROWS_READ | >> EST_INFO_TS | >> +----------------------------------------------------------- >> ----------------------------------------------------+------- >> ----------+----------------+----------------+ >> | SORT-MERGE-JOIN (INNER) TABLES >> | 14155777900 <(415)%20577-7900> >> | 12077867 | 1513754378759 | >> | CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 3-WAY FULL >> SCAN OVER PEOPLE | 14155777900 <(415)%20577-7900> | >> 12077867 | 1513754378759 | >> | SERVER FILTER BY FIRST KEY ONLY >> | 14155777900 <(415)%20577-7900> >> | 12077867 | 1513754378759 | >> | CLIENT MERGE SORT >> | 14155777900 <(415)%20577-7900> >> | 12077867 | 1513754378759 | >> | AND (SKIP MERGE) >> | 14155777900 <(415)%20577-7900> >> | 12077867 | 1513754378759 | >> | CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 15-WAY RANGE >> SCAN OVER MYTABLE [0] - [2] | 14155777900 <(415)%20577-7900> | >> 12077867 | 1513754378759 | >> | SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND >> HAS_CANDIDATES = false) | 14155777900 >> <(415)%20577-7900> | 12077867 | 1513754378759 | >> | SERVER SORTED BY [L.LOCALID] >> | 14155777900 >> <(415)%20577-7900> | 12077867 | 1513754378759 | >> | CLIENT MERGE SORT >> | 14155777900 <(415)%20577-7900> >> | 12077867 | 1513754378759 | >> | CLIENT AGGREGATE INTO SINGLE ROW >> | 14155777900 <(415)%20577-7900> >> | 12077867 | 1513754378759 | >> +----------------------------------------------------------- >> ----------------------------------------------------+------- >> ----------+----------------+----------------+ >> >> >> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*) >> FROM (SELECT LOCALID FROM MYTABLE >> WHERE EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = FALSE) l JOIN >> PEOPLE ds ON ds.PERSON_ID = l.LOCALID; >> +----------------------------------------------------------- >> ---------------------------------------------------+-------- >> ---------+----------------+----------------+ >> | PLAN >> | EST_BYTES_READ | EST_ROWS_READ | >> EST_INFO_TS | >> +----------------------------------------------------------- >> ---------------------------------------------------+-------- >> ---------+----------------+----------------+ >> | SORT-MERGE-JOIN (INNER) TABLES >> | 14155777900 <(415)%20577-7900> >> | 12077867 | 1513754378759 | >> | CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 3-WAY RANGE >> SCAN OVER MYTABLE [0] - [2] | 14155777900 <(415)%20577-7900> | >> 12077867 | 1513754378759 | >> | SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND >> HAS_CANDIDATES = false) | 14155777900 <(415)%20577-7900> >> | 12077867 | 1513754378759 | >> | CLIENT MERGE SORT >> | 14155777900 <(415)%20577-7900> >> | 12077867 | 1513754378759 | >> | AND (SKIP MERGE) >> | 14155777900 <(415)%20577-7900> >> | 12077867 | 1513754378759 | >> | CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 42-WAY FULL >> SCAN OVER PEOPLE | 14155777900 <(415)%20577-7900> | >> 12077867 | 1513754378759 | >> | SERVER FILTER BY FIRST KEY ONLY >> | 14155777900 <(415)%20577-7900> >> | 12077867 | 1513754378759 | >> | SERVER SORTED BY [DS.PERSON_ID] >> | 14155777900 <(415)%20577-7900> >> | 12077867 | 1513754378759 | >> | CLIENT MERGE SORT >> | 14155777900 <(415)%20577-7900> >> | 12077867 | 1513754378759 | >> | CLIENT AGGREGATE INTO SINGLE ROW >> | 14155777900 <(415)%20577-7900> >> | 12077867 | 1513754378759 | >> +----------------------------------------------------------- >> ---------------------------------------------------+-------- >> ---------+----------------+----------------+ >> 10 rows selected (0.041 seconds) >> >> What do you think? Whould I try to give more resources to HBase/Phoenix >> or is the first query the best (and reliable) one? >> Any insight about this is highly appreciated.. >> >> Best, >> Flavio >> >