Re: Hash aggregation

Gerald Sangudi Wed, 23 May 2018 11:00:47 -0700

Hi Maryann,

I filed PHOENIX-4751 <https://issues.apache.org/jira/browse/PHOENIX-4751>.


Is this likely to be reviewed soon (say next few weeks), or should I look
at the Phoenix source to estimate the scope / impact?

Thanks,
Gerald

On Tue, May 22, 2018 at 11:12 AM, Maryann Xue <[email protected]> wrote:

> Since the performance running a group-by aggregation on client side is
> most likely bad, it’s usually not desired. The original implementation was
> for functionality completeness only so it chose the easiest way, which
> reused some existing classes. In some cases, though, the client group-by
> can still be tolerable if there aren’t many distinct keys. So yes, please
> open a JIRA for implementing hash aggregation on client side. Thank you!
>
>
> Thanks,
> Maryann
>
> On Tue, May 22, 2018 at 10:50 AM Gerald Sangudi <[email protected]>
> wrote:
>
>> Hello,
>>
>> Any guidance or thoughts on the thread below?
>>
>> Thanks,
>> Gerald
>>
>>
>> On Fri, May 18, 2018 at 11:39 AM, Gerald Sangudi <[email protected]>
>> wrote:
>>
>>> Maryann,
>>>
>>> Can Phoenix provide hash aggregation on the client side? Are there
>>> design / implementation reasons not to, or should I file a ticket for this?
>>>
>>> Thanks,
>>> Gerald
>>>
>>> On Fri, May 18, 2018 at 11:29 AM, Maryann Xue <[email protected]>
>>> wrote:
>>>
>>>> Hi Gerald,
>>>>
>>>> Phoenix does have hash aggregation. The reason why sort-based
>>>> aggregation is used in your query plan is that the aggregation happens on
>>>> the client side. And that is because sort-merge join is used (as hinted)
>>>> which is a client driven join, and after that join stage all operations can
>>>> only be on the client-side.
>>>>
>>>>
>>>> Thanks,
>>>> Marynn
>>>>
>>>> On Fri, May 18, 2018 at 10:57 AM, Gerald Sangudi <[email protected]>
>>>> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Does Phoenix provide hash aggregation? If not, is it on the roadmap,
>>>>> or should I file a ticket? We have aggregation queries that do not require
>>>>> sorted results.
>>>>>
>>>>> For example, this EXPLAIN plan shows a CLIENT SORT.
>>>>>
>>>>> *CREATE TABLE unsalted (       keyA BIGINT NOT NULL,       keyB BIGINT
>>>>> NOT NULL,       val SMALLINT,       CONSTRAINT pk PRIMARY KEY (keyA,
>>>>> keyB));*
>>>>>
>>>>>
>>>>> *EXPLAINSELECT /*+ USE_SORT_MERGE_JOIN */ t1.val v1, t2.val v2,
>>>>> COUNT(*) c FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA = t2.keyA) GROUP
>>>>> BY t1.val,
>>>>> t2.val;+------------------------------------------------------------+-----------------+----------------+--+|
>>>>>                            PLAN   | EST_BYTES_READ | EST_ROWS_READ  |
>>>>> |+------------------------------------------------------------+-----------------+----------------+--+|
>>>>> SORT-MERGE-JOIN (INNER) TABLES                             | null | null |
>>>>> ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | null | 
>>>>> null
>>>>> | || AND                                                        | null |
>>>>> null | ||     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED  | 
>>>>> null
>>>>> | null | || CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]              |
>>>>> null | null | || CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]
>>>>>    | null | null |
>>>>> |+------------------------------------------------------------+-----------------+----------------+--+*
>>>>> Thanks,
>>>>> Gerald
>>>>>
>>>>
>>>>
>>>
>>

Re: Hash aggregation

Reply via email to