Hi Gerald, No further suggestions than my comments on the JIRA. Maybe a good next step would be a patch? Thanks, James
On Tue, Jun 12, 2018 at 8:15 PM, Gerald Sangudi <gsang...@23andme.com> wrote: > Hi Maryann and James, > > Any further guidance on PHOENIX-4751 > <https://issues.apache.org/jira/browse/PHOENIX-4751>? > > Thanks, > Gerald > > On Wed, May 23, 2018 at 11:00 AM, Gerald Sangudi <gsang...@23andme.com> > wrote: > >> Hi Maryann, >> >> I filed PHOENIX-4751 <https://issues.apache.org/jira/browse/PHOENIX-4751> >> . >> >> Is this likely to be reviewed soon (say next few weeks), or should I look >> at the Phoenix source to estimate the scope / impact? >> >> Thanks, >> Gerald >> >> On Tue, May 22, 2018 at 11:12 AM, Maryann Xue <maryann....@gmail.com> >> wrote: >> >>> Since the performance running a group-by aggregation on client side is >>> most likely bad, it’s usually not desired. The original implementation was >>> for functionality completeness only so it chose the easiest way, which >>> reused some existing classes. In some cases, though, the client group-by >>> can still be tolerable if there aren’t many distinct keys. So yes, please >>> open a JIRA for implementing hash aggregation on client side. Thank you! >>> >>> >>> Thanks, >>> Maryann >>> >>> On Tue, May 22, 2018 at 10:50 AM Gerald Sangudi <gsang...@23andme.com> >>> wrote: >>> >>>> Hello, >>>> >>>> Any guidance or thoughts on the thread below? >>>> >>>> Thanks, >>>> Gerald >>>> >>>> >>>> On Fri, May 18, 2018 at 11:39 AM, Gerald Sangudi <gsang...@23andme.com> >>>> wrote: >>>> >>>>> Maryann, >>>>> >>>>> Can Phoenix provide hash aggregation on the client side? Are there >>>>> design / implementation reasons not to, or should I file a ticket for >>>>> this? >>>>> >>>>> Thanks, >>>>> Gerald >>>>> >>>>> On Fri, May 18, 2018 at 11:29 AM, Maryann Xue <maryann....@gmail.com> >>>>> wrote: >>>>> >>>>>> Hi Gerald, >>>>>> >>>>>> Phoenix does have hash aggregation. The reason why sort-based >>>>>> aggregation is used in your query plan is that the aggregation happens on >>>>>> the client side. And that is because sort-merge join is used (as hinted) >>>>>> which is a client driven join, and after that join stage all operations >>>>>> can >>>>>> only be on the client-side. >>>>>> >>>>>> >>>>>> Thanks, >>>>>> Marynn >>>>>> >>>>>> On Fri, May 18, 2018 at 10:57 AM, Gerald Sangudi < >>>>>> gsang...@23andme.com> wrote: >>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> Does Phoenix provide hash aggregation? If not, is it on the roadmap, >>>>>>> or should I file a ticket? We have aggregation queries that do not >>>>>>> require >>>>>>> sorted results. >>>>>>> >>>>>>> For example, this EXPLAIN plan shows a CLIENT SORT. >>>>>>> >>>>>>> *CREATE TABLE unsalted ( keyA BIGINT NOT NULL, keyB >>>>>>> BIGINT NOT NULL, val SMALLINT, CONSTRAINT pk PRIMARY KEY >>>>>>> (keyA, >>>>>>> keyB));* >>>>>>> >>>>>>> >>>>>>> *EXPLAINSELECT /*+ USE_SORT_MERGE_JOIN */ t1.val v1, t2.val v2, >>>>>>> COUNT(*) c FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA = t2.keyA) >>>>>>> GROUP >>>>>>> BY t1.val, >>>>>>> t2.val;+------------------------------------------------------------+-----------------+----------------+--+| >>>>>>> PLAN | EST_BYTES_READ | EST_ROWS_READ | >>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+| >>>>>>> SORT-MERGE-JOIN (INNER) TABLES | null | >>>>>>> null | >>>>>>> || CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED | null | >>>>>>> null >>>>>>> | || AND | null | >>>>>>> null | || CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED | >>>>>>> null >>>>>>> | null | || CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL] | >>>>>>> null | null | || CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL] >>>>>>> | null | null | >>>>>>> |+------------------------------------------------------------+-----------------+----------------+--+* >>>>>>> Thanks, >>>>>>> Gerald >>>>>>> >>>>>> >>>>>> >>>>> >>>> >> >