For sort merge join, both post-filtered table results are sorted on the
server side and then a merge sort is done on the client-side.

On Fri, Dec 22, 2017 at 12:44 PM, Ethan <ew...@apache.org> wrote:

> Hello Flavio,
>
> From the plan looks like to me the second query is doing the filter at
> parent table (PEOPLE). So what is the size of your PEOPLE and MYTABLE
> (after filtered) respectively?
>
> For sort merge join, anyone knows are the both sides get shipped to client
> to do the merge sort?
>
> Thanks,
>
>
> On December 22, 2017 at 9:58:30 AM, Flavio Pompermaier (
> pomperma...@okkam.it) wrote:
>
> Any help here...?
>
> On 20 Dec 2017 17:58, "Flavio Pompermaier" <pomperma...@okkam.it> wrote:
>
>> Hi to all,
>> I'm trying to find the best query for my use case but I found that one
>> version work and the other one does not (unless that I don't apply some
>> tuning to timeouts etc like explained in [1]).
>>
>> The 2 queries extract the same data but, while the first query terminates
>> the second does not.
>> *PS*:  without the USE_SORT_MERGE_JOIN both queries weren't working
>>
>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>> FROM PEOPLE ds JOIN MYTABLE l ON ds.PERSON_ID = l.LOCALID
>> WHERE l.EID IS NULL AND l.DSID = 'PEOPLE' AND l.HAS_CANDIDATES = FALSE;
>> +-----------------------------------------------------------
>> ----------------------------------------------------+-------
>> ----------+----------------+----------------+
>> |                                                     PLAN
>>                                       | EST_BYTES_READ  | EST_ROWS_READ  |
>> EST_INFO_TS   |
>> +-----------------------------------------------------------
>> ----------------------------------------------------+-------
>> ----------+----------------+----------------+
>> | SORT-MERGE-JOIN (INNER) TABLES
>>                                       | 14155777900 <(415)%20577-7900>
>>    | 12077867       | 1513754378759  |
>> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 3-WAY FULL
>> SCAN OVER PEOPLE                 | 14155777900 <(415)%20577-7900>     |
>> 12077867       | 1513754378759  |
>> |         SERVER FILTER BY FIRST KEY ONLY
>>                                        | 14155777900 <(415)%20577-7900>
>>    | 12077867       | 1513754378759  |
>> |     CLIENT MERGE SORT
>>                                        | 14155777900 <(415)%20577-7900>
>>    | 12077867       | 1513754378759  |
>> | AND (SKIP MERGE)
>>                                       | 14155777900 <(415)%20577-7900>
>>    | 12077867       | 1513754378759  |
>> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 15-WAY RANGE
>> SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>     |
>> 12077867       | 1513754378759  |
>> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>> HAS_CANDIDATES = false)                   | 14155777900
>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>> |         SERVER SORTED BY [L.LOCALID]
>>                                           | 14155777900
>> <(415)%20577-7900>     | 12077867       | 1513754378759  |
>> |     CLIENT MERGE SORT
>>                                        | 14155777900 <(415)%20577-7900>
>>    | 12077867       | 1513754378759  |
>> | CLIENT AGGREGATE INTO SINGLE ROW
>>                                       | 14155777900 <(415)%20577-7900>
>>    | 12077867       | 1513754378759  |
>> +-----------------------------------------------------------
>> ----------------------------------------------------+-------
>> ----------+----------------+----------------+
>>
>>
>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>> FROM (SELECT LOCALID FROM MYTABLE
>> WHERE EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = FALSE) l JOIN
>> PEOPLE  ds ON ds.PERSON_ID = l.LOCALID;
>> +-----------------------------------------------------------
>> ---------------------------------------------------+--------
>> ---------+----------------+----------------+
>> |                                                     PLAN
>>                                      | EST_BYTES_READ  | EST_ROWS_READ  |
>> EST_INFO_TS   |
>> +-----------------------------------------------------------
>> ---------------------------------------------------+--------
>> ---------+----------------+----------------+
>> | SORT-MERGE-JOIN (INNER) TABLES
>>                                      | 14155777900 <(415)%20577-7900>
>>  | 12077867       | 1513754378759  |
>> |     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 3-WAY RANGE
>> SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>     |
>> 12077867       | 1513754378759  |
>> |         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>> HAS_CANDIDATES = false)                  | 14155777900 <(415)%20577-7900>
>>    | 12077867       | 1513754378759  |
>> |     CLIENT MERGE SORT
>>                                       | 14155777900 <(415)%20577-7900>
>>    | 12077867       | 1513754378759  |
>> | AND (SKIP MERGE)
>>                                      | 14155777900 <(415)%20577-7900>
>>  | 12077867       | 1513754378759  |
>> |     CLIENT 42-CHUNK 6168903 ROWS 11324622221 BYTES PARALLEL 42-WAY FULL
>> SCAN OVER PEOPLE               | 14155777900 <(415)%20577-7900>     |
>> 12077867       | 1513754378759  |
>> |         SERVER FILTER BY FIRST KEY ONLY
>>                                       | 14155777900 <(415)%20577-7900>
>>    | 12077867       | 1513754378759  |
>> |         SERVER SORTED BY [DS.PERSON_ID]
>>                                     | 14155777900 <(415)%20577-7900>
>>  | 12077867       | 1513754378759  |
>> |     CLIENT MERGE SORT
>>                                       | 14155777900 <(415)%20577-7900>
>>    | 12077867       | 1513754378759  |
>> | CLIENT AGGREGATE INTO SINGLE ROW
>>                                      | 14155777900 <(415)%20577-7900>
>>  | 12077867       | 1513754378759  |
>> +-----------------------------------------------------------
>> ---------------------------------------------------+--------
>> ---------+----------------+----------------+
>> 10 rows selected (0.041 seconds)
>>
>> What do you think? Whould I try to give more resources to HBase/Phoenix
>> or is the first query the best (and reliable) one?
>> Any insight about this is highly appreciated..
>>
>> Best,
>> Flavio
>>
>

Reply via email to