Re: Hash aggregation

2018-07-16 Thread Gerald Sangudi
Hi folks,

I've received a couple of reviews and applied some of the feedback.

Is there any way to know when this pull request will be merged?

Thanks,
Gerald

On Mon, Jul 9, 2018 at 9:36 AM, Gerald Sangudi wrote:

> Hi folks,
>
> Any idea of when this might be reviewed? I realize there are many open
> tasks.
>
> Thanks,
> Gerald

Re: Hash aggregation

2018-07-09 Thread Gerald Sangudi
Hi folks,

Any idea of when this might be reviewed? I realize there are many open
tasks.

Thanks,
Gerald

On Mon, Jul 2, 2018 at 1:54 PM, Gerald Sangudi wrote:

> Hello all,
>
> I've submitted a patch for this issue:
> https://github.com/apache/phoenix/pull/308
>
> The JIRA ticket is https://issues.apache.org/jira/browse/PHOENIX-4751
>
> Thanks,
> Gerald


Re: Hash aggregation

2018-07-02 Thread Gerald Sangudi
Hello all,

I've submitted a patch for this issue:
https://github.com/apache/phoenix/pull/308

The JIRA ticket is https://issues.apache.org/jira/browse/PHOENIX-4751

Thanks,
Gerald


On Thu, Jun 14, 2018 at 8:33 AM, Gerald Sangudi wrote:

> Thanks James. Looking into that.
>
> Gerald


Re: Hash aggregation

2018-06-14 Thread Gerald Sangudi
Thanks James. Looking into that.

Gerald


On Thu, Jun 14, 2018 at 6:30 AM, James Taylor wrote:

> Hi Gerald,
> No further suggestions than my comments on the JIRA. Maybe a good next
> step would be a patch?
> Thanks,
> James


Re: Hash aggregation

2018-06-14 Thread James Taylor
Hi Gerald,
No further suggestions than my comments on the JIRA. Maybe a good next step
would be a patch?
Thanks,
James

On Tue, Jun 12, 2018 at 8:15 PM, Gerald Sangudi wrote:

> Hi Maryann and James,
>
> Any further guidance on PHOENIX-4751
> <https://issues.apache.org/jira/browse/PHOENIX-4751>?
>
> Thanks,
> Gerald


Re: Hash aggregation

2018-06-12 Thread Gerald Sangudi
Hi Maryann and James,

Any further guidance on PHOENIX-4751
<https://issues.apache.org/jira/browse/PHOENIX-4751>?

Thanks,
Gerald

On Wed, May 23, 2018 at 11:00 AM, Gerald Sangudi wrote:

> Hi Maryann,
>
> I filed PHOENIX-4751 <https://issues.apache.org/jira/browse/PHOENIX-4751>.
>
> Is this likely to be reviewed soon (say next few weeks), or should I look
> at the Phoenix source to estimate the scope / impact?
>
> Thanks,
> Gerald


Re: Hash aggregation

2018-05-23 Thread Gerald Sangudi
Hi Maryann,

I filed PHOENIX-4751 <https://issues.apache.org/jira/browse/PHOENIX-4751>.

Is this likely to be reviewed soon (say next few weeks), or should I look
at the Phoenix source to estimate the scope / impact?

Thanks,
Gerald

On Tue, May 22, 2018 at 11:12 AM, Maryann Xue <maryann@gmail.com> wrote:

> Since the performance running a group-by aggregation on client side is
> most likely bad, it’s usually not desired. The original implementation was
> for functionality completeness only so it chose the easiest way, which
> reused some existing classes. In some cases, though, the client group-by
> can still be tolerable if there aren’t many distinct keys. So yes, please
> open a JIRA for implementing hash aggregation on client side. Thank you!
>
>
> Thanks,
> Maryann


Re: Hash aggregation

2018-05-22 Thread Maryann Xue
Since the performance running a group-by aggregation on client side is most
likely bad, it’s usually not desired. The original implementation was for
functionality completeness only so it chose the easiest way, which reused
some existing classes. In some cases, though, the client group-by can still
be tolerable if there aren’t many distinct keys. So yes, please open a JIRA
for implementing hash aggregation on client side. Thank you!
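
As a rough illustration of why the number of distinct keys matters for hash aggregation: the hash table holds one entry per distinct grouping key, so its size grows with key cardinality rather than with row count. The Python below is only a sketch under that assumption. It is not Phoenix code, and `hash_group_count` and the synthetic inputs are hypothetical names and data chosen for the example.

```python
def hash_group_count(rows):
    """Stream the rows once, keeping one counter per distinct grouping key."""
    counts = {}
    for key in rows:
        counts[key] = counts.get(key, 0) + 1
    return counts


# 100,000 synthetic rows, but only 10 distinct grouping keys.
few_keys = hash_group_count(i % 10 for i in range(100_000))

# 10,000 synthetic rows where every row has its own key.
many_keys = hash_group_count(range(10_000))

# The aggregation state tracks the number of distinct keys, not the number of rows.
print(len(few_keys), len(many_keys))
```

With few distinct keys the table stays small however many rows stream through it. With many distinct keys it approaches the size of the input.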


Thanks,
Maryann

On Tue, May 22, 2018 at 10:50 AM Gerald Sangudi <gsang...@23andme.com>
wrote:

> Hello,
>
> Any guidance or thoughts on the thread below?
>
> Thanks,
> Gerald


Re: Hash aggregation

2018-05-22 Thread Gerald Sangudi
Hello,

Any guidance or thoughts on the thread below?

Thanks,
Gerald

On Fri, May 18, 2018 at 11:39 AM, Gerald Sangudi <gsang...@23andme.com>
wrote:

> Maryann,
>
> Can Phoenix provide hash aggregation on the client side? Are there design
> / implementation reasons not to, or should I file a ticket for this?
>
> Thanks,
> Gerald


Re: Hash aggregation

2018-05-18 Thread Gerald Sangudi
Maryann,

Can Phoenix provide hash aggregation on the client side? Are there design /
implementation reasons not to, or should I file a ticket for this?

Thanks,
Gerald

On Fri, May 18, 2018 at 11:29 AM, Maryann Xue <maryann@gmail.com> wrote:

> Hi Gerald,
>
> Phoenix does have hash aggregation. The reason why sort-based aggregation
> is used in your query plan is that the aggregation happens on the client
> side. And that is because sort-merge join is used (as hinted) which is a
> client driven join, and after that join stage all operations can only be on
> the client-side.
>
>
> Thanks,
> Maryann


Re: Hash aggregation

2018-05-18 Thread Maryann Xue
Hi Gerald,

Phoenix does have hash aggregation. The reason why sort-based aggregation
is used in your query plan is that the aggregation happens on the client
side. And that is because sort-merge join is used (as hinted) which is a
client driven join, and after that join stage all operations can only be on
the client-side.
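
To make the distinction concrete, here is a minimal Python sketch of the two strategies for a GROUP BY with COUNT(*). It is not Phoenix's implementation. The function names `sort_aggregate` and `hash_aggregate` and the sample rows are hypothetical, chosen only to show the general techniques: sort-based aggregation orders the rows by grouping key and counts each run of equal keys, while hash-based aggregation keeps one counter per key and needs no sort.

```python
from collections import defaultdict
from itertools import groupby


def sort_aggregate(rows):
    """Sort-based GROUP BY with COUNT(*): sort by key, then count each run of equal keys."""
    ordered = sorted(rows)  # O(n log n); results come back ordered by key
    return [(key, sum(1 for _ in run)) for key, run in groupby(ordered)]


def hash_aggregate(rows):
    """Hash-based GROUP BY with COUNT(*): one counter per distinct key, no sort."""
    counts = defaultdict(int)
    for key in rows:  # single pass over the input
        counts[key] += 1
    return dict(counts)


# Hypothetical (t1.val, t2.val) pairs standing in for the rows a join would produce.
joined = [(2, 1), (1, 1), (2, 1), (1, 3), (1, 1)]

sorted_result = sort_aggregate(joined)
hashed_result = hash_aggregate(joined)

# Both strategies yield the same groups; only the sort-based one guarantees key order.
assert dict(sorted_result) == hashed_result
print(sorted_result)
```

When the caller does not need ordered results, the sort is pure overhead, which is the case raised in the original question.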


Thanks,
Maryann

On Fri, May 18, 2018 at 10:57 AM, Gerald Sangudi <gsang...@23andme.com>
wrote:

> Hello,
>
> Does Phoenix provide hash aggregation? If not, is it on the roadmap, or
> should I file a ticket? We have aggregation queries that do not require
> sorted results.
>
> For example, this EXPLAIN plan shows a CLIENT SORT.
>
> CREATE TABLE unsalted (
>    keyA BIGINT NOT NULL,
>    keyB BIGINT NOT NULL,
>    val SMALLINT,
>    CONSTRAINT pk PRIMARY KEY (keyA, keyB));
>
> EXPLAIN
> SELECT /*+ USE_SORT_MERGE_JOIN */ t1.val v1, t2.val v2, COUNT(*) c
> FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA = t2.keyA)
> GROUP BY t1.val, t2.val;
>
> +---------------------------------------------------------+----------------+---------------+
> | PLAN                                                    | EST_BYTES_READ | EST_ROWS_READ |
> +---------------------------------------------------------+----------------+---------------+
> | SORT-MERGE-JOIN (INNER) TABLES                          | null           | null          |
> | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED   | null           | null          |
> | AND                                                     | null           | null          |
> | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED   | null           | null          |
> | CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]           | null           | null          |
> | CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL] | null           | null          |
> +---------------------------------------------------------+----------------+---------------+
>
> Thanks,
> Gerald
>


Hash aggregation

2018-05-18 Thread Gerald Sangudi
Hello,

Does Phoenix provide hash aggregation? If not, is it on the roadmap, or
should I file a ticket? We have aggregation queries that do not require
sorted results.

For example, this EXPLAIN plan shows a CLIENT SORT.

CREATE TABLE unsalted (
   keyA BIGINT NOT NULL,
   keyB BIGINT NOT NULL,
   val SMALLINT,
   CONSTRAINT pk PRIMARY KEY (keyA, keyB));

EXPLAIN
SELECT /*+ USE_SORT_MERGE_JOIN */ t1.val v1, t2.val v2, COUNT(*) c
FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA = t2.keyA)
GROUP BY t1.val, t2.val;

+---------------------------------------------------------+----------------+---------------+
| PLAN                                                    | EST_BYTES_READ | EST_ROWS_READ |
+---------------------------------------------------------+----------------+---------------+
| SORT-MERGE-JOIN (INNER) TABLES                          | null           | null          |
| CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED   | null           | null          |
| AND                                                     | null           | null          |
| CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED   | null           | null          |
| CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]           | null           | null          |
| CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL] | null           | null          |
+---------------------------------------------------------+----------------+---------------+

Thanks,
Gerald