Re: Query optimization

2017-12-22 Thread Flavio Pompermaier
MYTABLE is definitely much bigger than PEOPLE table, in terms of
cardinality. In terms of cells (rows x columns) PEOPLE is probably bigger

On 22 Dec 2017 22:36, "Ethan"  wrote:

> I see. I think client side probably hold on to the iterators from the both
> sides and crawling forward to do the merge sort. in this case should be no
> much memory footprint either way where the filter is performed.
>
> On December 22, 2017 at 1:04:18 PM, James Taylor (jamestay...@apache.org)
> wrote:
>
> There’s no shipping of any tables with a sort merge join.
>
> On Fri, Dec 22, 2017 at 1:02 PM Ethan Wang  wrote:
>
>> I see. Looks like it's possible the rhs (MYTABLE) is too big to ship
>> around without get filtered first. Just for experiment, if you took out
>> hint USE_SORT_MERGE_JOIN, what will be the plan?
>>
>>
>> On December 22, 2017 at 12:46:25 PM, James Taylor (jamestay...@apache.org)
>> wrote:
>>
>> For sort merge join, both post-filtered table results are sorted on the
>> server side and then a merge sort is done on the client-side.
>>
>> On Fri, Dec 22, 2017 at 12:44 PM, Ethan  wrote:
>>
>>> Hello Flavio,
>>>
>>> From the plan looks like to me the second query is doing the filter at
>>> parent table (PEOPLE). So what is the size of your PEOPLE and MYTABLE
>>> (after filtered) respectively?
>>>
>>> For sort merge join, anyone knows are the both sides get shipped to
>>> client to do the merge sort?
>>>
>>> Thanks,
>>>
>>>
>>> On December 22, 2017 at 9:58:30 AM, Flavio Pompermaier (
>>> pomperma...@okkam.it) wrote:
>>>
>>> Any help here...?
>>>
>>> On 20 Dec 2017 17:58, "Flavio Pompermaier"  wrote:
>>>
 Hi to all,
 I'm trying to find the best query for my use case but I found that one
 version work and the other one does not (unless that I don't apply some
 tuning to timeouts etc like explained in [1]).

 The 2 queries extract the same data but, while the first query
 terminates the second does not.
 *PS*:  without the USE_SORT_MERGE_JOIN both queries weren't working

 SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
 FROM PEOPLE ds JOIN MYTABLE l ON ds.PERSON_ID = l.LOCALID
 WHERE l.EID IS NULL AND l.DSID = 'PEOPLE' AND l.HAS_CANDIDATES = FALSE;
 +---
 +---
 --+++
 | PLAN
 | EST_BYTES_READ  | EST_ROWS_READ
 |  EST_INFO_TS   |
 +---
 +---
 --+++
 | SORT-MERGE-JOIN (INNER) TABLES
 | 14155777900
 <(415)%20577-7900> | 12077867   | 1513754378759  |
 | CLIENT 42-CHUNK 6168903 ROWS 1132461 BYTES PARALLEL 3-WAY
 FULL SCAN OVER PEOPLE | 14155777900 <(415)%20577-7900>
| 12077867   | 1513754378759  |
 | SERVER FILTER BY FIRST KEY ONLY
  | 14155777900
 <(415)%20577-7900> | 12077867   | 1513754378759  |
 | CLIENT MERGE SORT
  | 14155777900
 <(415)%20577-7900> | 12077867   | 1513754378759  |
 | AND (SKIP MERGE)
 | 14155777900
 <(415)%20577-7900> | 12077867   | 1513754378759  |
 | CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 15-WAY
 RANGE SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>
  | 12077867   | 1513754378759  |
 | SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
 HAS_CANDIDATES = false)   | 14155777900
 <(415)%20577-7900> | 12077867   | 1513754378759  |
 | SERVER SORTED BY [L.LOCALID]
 | 14155777900
 <(415)%20577-7900> | 12077867   | 1513754378759  |
 | CLIENT MERGE SORT
  | 14155777900
 <(415)%20577-7900> | 12077867   | 1513754378759  |
 | CLIENT AGGREGATE INTO SINGLE ROW
 | 14155777900
 <(415)%20577-7900> | 12077867   | 1513754378759  |
 +---
 +---
 --+++


 SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
 FROM (SELECT LOCALID FROM MYTABLE
 WHERE EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = FALSE) l
 JOIN PEOPLE  ds ON ds.PERSON_ID = l.LOCALID;
 +---
 

Re: Query optimization

2017-12-22 Thread Ethan
I see. I think client side probably hold on to the iterators from the both 
sides and crawling forward to do the merge sort. in this case should be no much 
memory footprint either way where the filter is performed. 
On December 22, 2017 at 1:04:18 PM, James Taylor (jamestay...@apache.org) wrote:

There’s no shipping of any tables with a sort merge join.

On Fri, Dec 22, 2017 at 1:02 PM Ethan Wang  wrote:
I see. Looks like it's possible the rhs (MYTABLE) is too big to ship around 
without get filtered first. Just for experiment, if you took out hint 
USE_SORT_MERGE_JOIN, what will be the plan?


On December 22, 2017 at 12:46:25 PM, James Taylor (jamestay...@apache.org) 
wrote:

For sort merge join, both post-filtered table results are sorted on the server 
side and then a merge sort is done on the client-side.

On Fri, Dec 22, 2017 at 12:44 PM, Ethan  wrote:
Hello Flavio,

From the plan looks like to me the second query is doing the filter at parent 
table (PEOPLE). So what is the size of your PEOPLE and MYTABLE (after filtered) 
respectively?

For sort merge join, anyone knows are the both sides get shipped to client to 
do the merge sort?

Thanks,


On December 22, 2017 at 9:58:30 AM, Flavio Pompermaier (pomperma...@okkam.it) 
wrote:

Any help here...?

On 20 Dec 2017 17:58, "Flavio Pompermaier"  wrote:
Hi to all,
I'm trying to find the best query for my use case but I found that one version 
work and the other one does not (unless that I don't apply some tuning to 
timeouts etc like explained in [1]).

The 2 queries extract the same data but, while the first query terminates the 
second does not.
PS:  without the USE_SORT_MERGE_JOIN both queries weren't working

SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*) 
FROM PEOPLE ds JOIN MYTABLE l ON ds.PERSON_ID = l.LOCALID
WHERE l.EID IS NULL AND l.DSID = 'PEOPLE' AND l.HAS_CANDIDATES = FALSE;
+---+-+++
|                                                     PLAN                      
                                | EST_BYTES_READ  | EST_ROWS_READ  |  
EST_INFO_TS   |
+---+-+++
| SORT-MERGE-JOIN (INNER) TABLES                                                
                                | 14155777900     | 12077867       | 
1513754378759  |
|     CLIENT 42-CHUNK 6168903 ROWS 1132461 BYTES PARALLEL 3-WAY FULL SCAN 
OVER PEOPLE                 | 14155777900     | 12077867       | 1513754378759  
|
|         SERVER FILTER BY FIRST KEY ONLY                                       
                                | 14155777900     | 12077867       | 
1513754378759  |
|     CLIENT MERGE SORT                                                         
                                | 14155777900     | 12077867       | 
1513754378759  |
| AND (SKIP MERGE)                                                              
                                | 14155777900     | 12077867       | 
1513754378759  |
|     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 15-WAY RANGE SCAN 
OVER MYTABLE [0] - [2]  | 14155777900     | 12077867       | 1513754378759  |
|         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES 
= false)                   | 14155777900     | 12077867       | 1513754378759  |
|         SERVER SORTED BY [L.LOCALID]                                          
                                    | 14155777900     | 12077867       | 
1513754378759  |
|     CLIENT MERGE SORT                                                         
                                | 14155777900     | 12077867       | 
1513754378759  |
| CLIENT AGGREGATE INTO SINGLE ROW                                              
                                | 14155777900     | 12077867       | 
1513754378759  |
+---+-+++


SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*) 
FROM (SELECT LOCALID FROM MYTABLE
WHERE EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = FALSE) l JOIN PEOPLE 
 ds ON ds.PERSON_ID = l.LOCALID;
+--+-+++
|                                                     PLAN                      
                               | EST_BYTES_READ  | EST_ROWS_READ  |  
EST_INFO_TS   |
+--+-+++
| SORT-MERGE-JOIN (INNER) TABLES                     

Re: Query optimization

2017-12-22 Thread James Taylor
There’s no shipping of any tables with a sort merge join.

On Fri, Dec 22, 2017 at 1:02 PM Ethan Wang  wrote:

> I see. Looks like it's possible the rhs (MYTABLE) is too big to ship
> around without get filtered first. Just for experiment, if you took out
> hint USE_SORT_MERGE_JOIN, what will be the plan?
>
>
> On December 22, 2017 at 12:46:25 PM, James Taylor (jamestay...@apache.org)
> wrote:
>
> For sort merge join, both post-filtered table results are sorted on the
> server side and then a merge sort is done on the client-side.
>
> On Fri, Dec 22, 2017 at 12:44 PM, Ethan  wrote:
>
>> Hello Flavio,
>>
>> From the plan looks like to me the second query is doing the filter at
>> parent table (PEOPLE). So what is the size of your PEOPLE and MYTABLE
>> (after filtered) respectively?
>>
>> For sort merge join, anyone knows are the both sides get shipped to
>> client to do the merge sort?
>>
>> Thanks,
>>
>>
>> On December 22, 2017 at 9:58:30 AM, Flavio Pompermaier (
>> pomperma...@okkam.it) wrote:
>>
>> Any help here...?
>>
>> On 20 Dec 2017 17:58, "Flavio Pompermaier"  wrote:
>>
>>> Hi to all,
>>> I'm trying to find the best query for my use case but I found that one
>>> version work and the other one does not (unless that I don't apply some
>>> tuning to timeouts etc like explained in [1]).
>>>
>>> The 2 queries extract the same data but, while the first query
>>> terminates the second does not.
>>> *PS*:  without the USE_SORT_MERGE_JOIN both queries weren't working
>>>
>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>>> FROM PEOPLE ds JOIN MYTABLE l ON ds.PERSON_ID = l.LOCALID
>>> WHERE l.EID IS NULL AND l.DSID = 'PEOPLE' AND l.HAS_CANDIDATES = FALSE;
>>>
>>> +---+-+++
>>> | PLAN
>>> | EST_BYTES_READ  | EST_ROWS_READ
>>> |  EST_INFO_TS   |
>>>
>>> +---+-+++
>>> | SORT-MERGE-JOIN (INNER) TABLES
>>> | 14155777900 <(415)%20577-7900>
>>>| 12077867   | 1513754378759  |
>>> | CLIENT 42-CHUNK 6168903 ROWS 1132461 BYTES PARALLEL 3-WAY FULL
>>> SCAN OVER PEOPLE | 14155777900 <(415)%20577-7900> |
>>> 12077867   | 1513754378759  |
>>> | SERVER FILTER BY FIRST KEY ONLY
>>>| 14155777900 <(415)%20577-7900>
>>>| 12077867   | 1513754378759  |
>>> | CLIENT MERGE SORT
>>>| 14155777900 <(415)%20577-7900>
>>>| 12077867   | 1513754378759  |
>>> | AND (SKIP MERGE)
>>> | 14155777900 <(415)%20577-7900>
>>>| 12077867   | 1513754378759  |
>>> | CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 15-WAY
>>> RANGE SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900>
>>>  | 12077867   | 1513754378759  |
>>> | SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>>> HAS_CANDIDATES = false)   | 14155777900
>>> <(415)%20577-7900> | 12077867   | 1513754378759  |
>>> | SERVER SORTED BY [L.LOCALID]
>>> | 14155777900
>>> <(415)%20577-7900> | 12077867   | 1513754378759  |
>>> | CLIENT MERGE SORT
>>>| 14155777900 <(415)%20577-7900>
>>>| 12077867   | 1513754378759  |
>>> | CLIENT AGGREGATE INTO SINGLE ROW
>>> | 14155777900 <(415)%20577-7900>
>>>| 12077867   | 1513754378759  |
>>>
>>> +---+-+++
>>>
>>>
>>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>>> FROM (SELECT LOCALID FROM MYTABLE
>>> WHERE EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = FALSE) l JOIN
>>> PEOPLE  ds ON ds.PERSON_ID = l.LOCALID;
>>>
>>> +--+-+++
>>> | PLAN
>>>| EST_BYTES_READ  | EST_ROWS_READ
>>> |  EST_INFO_TS   |
>>>
>>> +--+-+++
>>> | SORT-MERGE-JOIN (INNER) TABLES
>>>| 14155777900 <(415)%20577-7900>
>>>| 12077867   | 1513754378759  |
>>> | CLIENT 15-CHUNK 5908964 ROWS 

Re: Query optimization

2017-12-22 Thread Ethan Wang
I see. Looks like it's possible the rhs (MYTABLE) is too big to ship around
without get filtered first. Just for experiment, if you took out hint
USE_SORT_MERGE_JOIN, what will be the plan?


On December 22, 2017 at 12:46:25 PM, James Taylor (jamestay...@apache.org)
wrote:

For sort merge join, both post-filtered table results are sorted on the
server side and then a merge sort is done on the client-side.

On Fri, Dec 22, 2017 at 12:44 PM, Ethan  wrote:

> Hello Flavio,
>
> From the plan looks like to me the second query is doing the filter at
> parent table (PEOPLE). So what is the size of your PEOPLE and MYTABLE
> (after filtered) respectively?
>
> For sort merge join, anyone knows are the both sides get shipped to client
> to do the merge sort?
>
> Thanks,
>
>
> On December 22, 2017 at 9:58:30 AM, Flavio Pompermaier (
> pomperma...@okkam.it) wrote:
>
> Any help here...?
>
> On 20 Dec 2017 17:58, "Flavio Pompermaier"  wrote:
>
>> Hi to all,
>> I'm trying to find the best query for my use case but I found that one
>> version work and the other one does not (unless that I don't apply some
>> tuning to timeouts etc like explained in [1]).
>>
>> The 2 queries extract the same data but, while the first query terminates
>> the second does not.
>> *PS*:  without the USE_SORT_MERGE_JOIN both queries weren't working
>>
>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>> FROM PEOPLE ds JOIN MYTABLE l ON ds.PERSON_ID = l.LOCALID
>> WHERE l.EID IS NULL AND l.DSID = 'PEOPLE' AND l.HAS_CANDIDATES = FALSE;
>> +---
>> +---
>> --+++
>> | PLAN
>>   | EST_BYTES_READ  | EST_ROWS_READ  |
>> EST_INFO_TS   |
>> +---
>> +---
>> --+++
>> | SORT-MERGE-JOIN (INNER) TABLES
>>   | 14155777900 <(415)%20577-7900>
>>| 12077867   | 1513754378759  |
>> | CLIENT 42-CHUNK 6168903 ROWS 1132461 BYTES PARALLEL 3-WAY FULL
>> SCAN OVER PEOPLE | 14155777900 <(415)%20577-7900> |
>> 12077867   | 1513754378759  |
>> | SERVER FILTER BY FIRST KEY ONLY
>>| 14155777900 <(415)%20577-7900>
>>| 12077867   | 1513754378759  |
>> | CLIENT MERGE SORT
>>| 14155777900 <(415)%20577-7900>
>>| 12077867   | 1513754378759  |
>> | AND (SKIP MERGE)
>>   | 14155777900 <(415)%20577-7900>
>>| 12077867   | 1513754378759  |
>> | CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 15-WAY RANGE
>> SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900> |
>> 12077867   | 1513754378759  |
>> | SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>> HAS_CANDIDATES = false)   | 14155777900
>> <(415)%20577-7900> | 12077867   | 1513754378759  |
>> | SERVER SORTED BY [L.LOCALID]
>>   | 14155777900
>> <(415)%20577-7900> | 12077867   | 1513754378759  |
>> | CLIENT MERGE SORT
>>| 14155777900 <(415)%20577-7900>
>>| 12077867   | 1513754378759  |
>> | CLIENT AGGREGATE INTO SINGLE ROW
>>   | 14155777900 <(415)%20577-7900>
>>| 12077867   | 1513754378759  |
>> +---
>> +---
>> --+++
>>
>>
>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>> FROM (SELECT LOCALID FROM MYTABLE
>> WHERE EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = FALSE) l JOIN
>> PEOPLE  ds ON ds.PERSON_ID = l.LOCALID;
>> +---
>> ---+
>> -+++
>> | PLAN
>>  | EST_BYTES_READ  | EST_ROWS_READ  |
>> EST_INFO_TS   |
>> +---
>> ---+
>> -+++
>> | SORT-MERGE-JOIN (INNER) TABLES
>>  | 14155777900 <(415)%20577-7900>
>>  | 12077867   | 1513754378759  |
>> | CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 3-WAY RANGE
>> SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900> |
>> 12077867   | 1513754378759  |
>> | SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>> 

Re: Query optimization

2017-12-22 Thread James Taylor
For sort merge join, both post-filtered table results are sorted on the
server side and then a merge sort is done on the client-side.

On Fri, Dec 22, 2017 at 12:44 PM, Ethan  wrote:

> Hello Flavio,
>
> From the plan looks like to me the second query is doing the filter at
> parent table (PEOPLE). So what is the size of your PEOPLE and MYTABLE
> (after filtered) respectively?
>
> For sort merge join, anyone knows are the both sides get shipped to client
> to do the merge sort?
>
> Thanks,
>
>
> On December 22, 2017 at 9:58:30 AM, Flavio Pompermaier (
> pomperma...@okkam.it) wrote:
>
> Any help here...?
>
> On 20 Dec 2017 17:58, "Flavio Pompermaier"  wrote:
>
>> Hi to all,
>> I'm trying to find the best query for my use case but I found that one
>> version work and the other one does not (unless that I don't apply some
>> tuning to timeouts etc like explained in [1]).
>>
>> The 2 queries extract the same data but, while the first query terminates
>> the second does not.
>> *PS*:  without the USE_SORT_MERGE_JOIN both queries weren't working
>>
>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>> FROM PEOPLE ds JOIN MYTABLE l ON ds.PERSON_ID = l.LOCALID
>> WHERE l.EID IS NULL AND l.DSID = 'PEOPLE' AND l.HAS_CANDIDATES = FALSE;
>> +---
>> +---
>> --+++
>> | PLAN
>>   | EST_BYTES_READ  | EST_ROWS_READ  |
>> EST_INFO_TS   |
>> +---
>> +---
>> --+++
>> | SORT-MERGE-JOIN (INNER) TABLES
>>   | 14155777900 <(415)%20577-7900>
>>| 12077867   | 1513754378759  |
>> | CLIENT 42-CHUNK 6168903 ROWS 1132461 BYTES PARALLEL 3-WAY FULL
>> SCAN OVER PEOPLE | 14155777900 <(415)%20577-7900> |
>> 12077867   | 1513754378759  |
>> | SERVER FILTER BY FIRST KEY ONLY
>>| 14155777900 <(415)%20577-7900>
>>| 12077867   | 1513754378759  |
>> | CLIENT MERGE SORT
>>| 14155777900 <(415)%20577-7900>
>>| 12077867   | 1513754378759  |
>> | AND (SKIP MERGE)
>>   | 14155777900 <(415)%20577-7900>
>>| 12077867   | 1513754378759  |
>> | CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 15-WAY RANGE
>> SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900> |
>> 12077867   | 1513754378759  |
>> | SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>> HAS_CANDIDATES = false)   | 14155777900
>> <(415)%20577-7900> | 12077867   | 1513754378759  |
>> | SERVER SORTED BY [L.LOCALID]
>>   | 14155777900
>> <(415)%20577-7900> | 12077867   | 1513754378759  |
>> | CLIENT MERGE SORT
>>| 14155777900 <(415)%20577-7900>
>>| 12077867   | 1513754378759  |
>> | CLIENT AGGREGATE INTO SINGLE ROW
>>   | 14155777900 <(415)%20577-7900>
>>| 12077867   | 1513754378759  |
>> +---
>> +---
>> --+++
>>
>>
>> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
>> FROM (SELECT LOCALID FROM MYTABLE
>> WHERE EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = FALSE) l JOIN
>> PEOPLE  ds ON ds.PERSON_ID = l.LOCALID;
>> +---
>> ---+
>> -+++
>> | PLAN
>>  | EST_BYTES_READ  | EST_ROWS_READ  |
>> EST_INFO_TS   |
>> +---
>> ---+
>> -+++
>> | SORT-MERGE-JOIN (INNER) TABLES
>>  | 14155777900 <(415)%20577-7900>
>>  | 12077867   | 1513754378759  |
>> | CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 3-WAY RANGE
>> SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900> |
>> 12077867   | 1513754378759  |
>> | SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
>> HAS_CANDIDATES = false)  | 14155777900 <(415)%20577-7900>
>>| 12077867   | 1513754378759  |
>> | CLIENT MERGE SORT
>>   | 14155777900 <(415)%20577-7900>
>>| 12077867   | 1513754378759  |
>> | AND (SKIP MERGE)
>>  

Re: Query optimization

2017-12-22 Thread Ethan
Hello Flavio,

From the plan looks like to me the second query is doing the filter at parent 
table (PEOPLE). So what is the size of your PEOPLE and MYTABLE (after filtered) 
respectively?

For sort merge join, anyone knows are the both sides get shipped to client to 
do the merge sort?

Thanks,


On December 22, 2017 at 9:58:30 AM, Flavio Pompermaier (pomperma...@okkam.it) 
wrote:

Any help here...?

On 20 Dec 2017 17:58, "Flavio Pompermaier"  wrote:
Hi to all,
I'm trying to find the best query for my use case but I found that one version 
work and the other one does not (unless that I don't apply some tuning to 
timeouts etc like explained in [1]).

The 2 queries extract the same data but, while the first query terminates the 
second does not.
PS:  without the USE_SORT_MERGE_JOIN both queries weren't working

SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*) 
FROM PEOPLE ds JOIN MYTABLE l ON ds.PERSON_ID = l.LOCALID
WHERE l.EID IS NULL AND l.DSID = 'PEOPLE' AND l.HAS_CANDIDATES = FALSE;
+---+-+++
|                                                     PLAN                      
                                | EST_BYTES_READ  | EST_ROWS_READ  |  
EST_INFO_TS   |
+---+-+++
| SORT-MERGE-JOIN (INNER) TABLES                                                
                                | 14155777900     | 12077867       | 
1513754378759  |
|     CLIENT 42-CHUNK 6168903 ROWS 1132461 BYTES PARALLEL 3-WAY FULL SCAN 
OVER PEOPLE                 | 14155777900     | 12077867       | 1513754378759  
|
|         SERVER FILTER BY FIRST KEY ONLY                                       
                                | 14155777900     | 12077867       | 
1513754378759  |
|     CLIENT MERGE SORT                                                         
                                | 14155777900     | 12077867       | 
1513754378759  |
| AND (SKIP MERGE)                                                              
                                | 14155777900     | 12077867       | 
1513754378759  |
|     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 15-WAY RANGE SCAN 
OVER MYTABLE [0] - [2]  | 14155777900     | 12077867       | 1513754378759  |
|         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES 
= false)                   | 14155777900     | 12077867       | 1513754378759  |
|         SERVER SORTED BY [L.LOCALID]                                          
                                    | 14155777900     | 12077867       | 
1513754378759  |
|     CLIENT MERGE SORT                                                         
                                | 14155777900     | 12077867       | 
1513754378759  |
| CLIENT AGGREGATE INTO SINGLE ROW                                              
                                | 14155777900     | 12077867       | 
1513754378759  |
+---+-+++


SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*) 
FROM (SELECT LOCALID FROM MYTABLE
WHERE EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = FALSE) l JOIN PEOPLE 
 ds ON ds.PERSON_ID = l.LOCALID;
+--+-+++
|                                                     PLAN                      
                               | EST_BYTES_READ  | EST_ROWS_READ  |  
EST_INFO_TS   |
+--+-+++
| SORT-MERGE-JOIN (INNER) TABLES                                                
                               | 14155777900     | 12077867       | 
1513754378759  |
|     CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 3-WAY RANGE SCAN 
OVER MYTABLE [0] - [2]  | 14155777900     | 12077867       | 1513754378759  |
|         SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES 
= false)                  | 14155777900     | 12077867       | 1513754378759  |
|     CLIENT MERGE SORT                                                         
                               | 14155777900     | 12077867       | 
1513754378759  |
| AND (SKIP MERGE)                                                              
                               | 14155777900     | 12077867       | 
1513754378759  |
|     CLIENT 42-CHUNK 6168903 ROWS 1132461 BYTES PARALLEL 42-WAY FULL SCAN 
OVER PEOPLE               | 14155777900     | 12077867       | 

Re: Query optimization

2017-12-22 Thread Flavio Pompermaier
Any help here...?

On 20 Dec 2017 17:58, "Flavio Pompermaier"  wrote:

> Hi to all,
> I'm trying to find the best query for my use case but I found that one
> version work and the other one does not (unless that I don't apply some
> tuning to timeouts etc like explained in [1]).
>
> The 2 queries extract the same data but, while the first query terminates
> the second does not.
> *PS*:  without the USE_SORT_MERGE_JOIN both queries weren't working
>
> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
> FROM PEOPLE ds JOIN MYTABLE l ON ds.PERSON_ID = l.LOCALID
> WHERE l.EID IS NULL AND l.DSID = 'PEOPLE' AND l.HAS_CANDIDATES = FALSE;
> +---
> +---
> --+++
> | PLAN
>   | EST_BYTES_READ  | EST_ROWS_READ  |
> EST_INFO_TS   |
> +---
> +---
> --+++
> | SORT-MERGE-JOIN (INNER) TABLES
>   | 14155777900 <(415)%20577-7900>
>  | 12077867   | 1513754378759  |
> | CLIENT 42-CHUNK 6168903 ROWS 1132461 BYTES PARALLEL 3-WAY FULL
> SCAN OVER PEOPLE | 14155777900 <(415)%20577-7900> |
> 12077867   | 1513754378759  |
> | SERVER FILTER BY FIRST KEY ONLY
>  | 14155777900 <(415)%20577-7900>
>  | 12077867   | 1513754378759  |
> | CLIENT MERGE SORT
>  | 14155777900 <(415)%20577-7900>
>  | 12077867   | 1513754378759  |
> | AND (SKIP MERGE)
>   | 14155777900 <(415)%20577-7900>
>  | 12077867   | 1513754378759  |
> | CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 15-WAY RANGE
> SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900> |
> 12077867   | 1513754378759  |
> | SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
> HAS_CANDIDATES = false)   | 14155777900 <(415)%20577-7900>
>| 12077867   | 1513754378759  |
> | SERVER SORTED BY [L.LOCALID]
>   | 14155777900 <(415)%20577-7900>
>| 12077867   | 1513754378759  |
> | CLIENT MERGE SORT
>  | 14155777900 <(415)%20577-7900>
>  | 12077867   | 1513754378759  |
> | CLIENT AGGREGATE INTO SINGLE ROW
>   | 14155777900 <(415)%20577-7900>
>  | 12077867   | 1513754378759  |
> +---
> +---
> --+++
>
>
> SELECT /*+ USE_SORT_MERGE_JOIN */ COUNT(*)
> FROM (SELECT LOCALID FROM MYTABLE
> WHERE EID IS NULL AND DSID = 'PEOPLE' AND HAS_CANDIDATES = FALSE) l JOIN
> PEOPLE  ds ON ds.PERSON_ID = l.LOCALID;
> +---
> ---+
> -+++
> | PLAN
>  | EST_BYTES_READ  | EST_ROWS_READ  |
> EST_INFO_TS   |
> +---
> ---+
> -+++
> | SORT-MERGE-JOIN (INNER) TABLES
>  | 14155777900 <(415)%20577-7900>
>  | 12077867   | 1513754378759  |
> | CLIENT 15-CHUNK 5908964 ROWS 2831155679 BYTES PARALLEL 3-WAY RANGE
> SCAN OVER MYTABLE [0] - [2]  | 14155777900 <(415)%20577-7900> |
> 12077867   | 1513754378759  |
> | SERVER FILTER BY (EID IS NULL AND DSID = 'PEOPLE' AND
> HAS_CANDIDATES = false)  | 14155777900 <(415)%20577-7900>
>| 12077867   | 1513754378759  |
> | CLIENT MERGE SORT
> | 14155777900 <(415)%20577-7900>
>  | 12077867   | 1513754378759  |
> | AND (SKIP MERGE)
>  | 14155777900 <(415)%20577-7900>
>  | 12077867   | 1513754378759  |
> | CLIENT 42-CHUNK 6168903 ROWS 1132461 BYTES PARALLEL 42-WAY FULL
> SCAN OVER PEOPLE   | 14155777900 <(415)%20577-7900> |
> 12077867   | 1513754378759  |
> | SERVER FILTER BY FIRST KEY ONLY
> | 14155777900 <(415)%20577-7900>
>  | 12077867   | 1513754378759  |
> | SERVER SORTED BY [DS.PERSON_ID]
>   | 14155777900 <(415)%20577-7900> |
> 12077867   | 1513754378759  |
> | CLIENT MERGE SORT
> | 14155777900 <(415)%20577-7900>
>  | 12077867   | 

Re: jdbc driver - fat vs thin and hbase-site.xml

2017-12-22 Thread Marcelo Valle
Hi Josh,

Thanks for answering. Out of curiosity, is there any way of connecting to
phoenix without having to add files on class path? I was hoping it would
work if I inform the DNS of a single node of zoo keeper quorum, as the rest
of the nodes can be found from the node itself.

Thanks,
Marcelo.


On 21 December 2017 at 20:02, Josh Elser  wrote:

> Hi Marcelo,
>
> The requirement for hbase-site.xml and core-site.xml to be on the
> classpath are a "wart", resulting from the close ties to HBase and Hadoop
> -- much of the Kerberos authentication code is inherited from Hadoop, and
> thus requires some of that configuration.
>
> I'd encourage you to look at the contents of the core-site.xml and
> hbase-site.xml files. There is very likely nothing that you would need to
> change in these files if you add a new node to your cluster.
>
> The thin-client talks to the Phoenix Query Server instead of talking
> directly to HBase which does simplify the scenario. However, using the thin
> driver implies that you're (always) doing more work to answer a equivalent
> query as compared to the thick driver.
>
>
> On 12/21/17 11:11 AM, Marcelo Valle wrote:
>
>> Hi,
>>
>> I am new to Apache Phoenix and I am trying to use it in a new project. I
>> am sorry if I am asking something basic here, but I was struggling to find
>> this info on the web.
>>
>> I successfully created a simple application using jdbc fat driver,
>> starting up phoenix using HBaseTestingUtility without any problems, I was
>> able to connect, test, and make sure my code is working as expected.
>>
>> Now it's time to connect to the real cluster we have in dev environment.
>> I was surprised when I found out that just JDBC connection url and kerberos
>> auth are not enough to connect to the real cluster, I have to add some
>> hbase xml files with the cluster information to the classpath, so the
>> driver can know about the cluster nodes.
>>
>> Question - is there a way to avoid that? If I use the thin driver, will I
>> still need these XML files? Suppose I add a new node to my phoenix cluster,
>> does it mean I will have to redeploy my application so the new node can be
>> considered?
>>
>> Thanks,
>> --
>> Marcelo Valle
>> http://mvalle.com - @mvallebr
>>
>


-- 
Marcelo Valle
http://mvalle.com - @mvallebr