Thank you so much for the response Josh. Will check on 4.14 and get back to you with the results.
Thanks On Tue, Sep 11, 2018 at 8:45 PM Josh Elser <els...@apache.org> wrote: > The versions you provided in the description make it sound like you're > actually using HDP's distribution of Apache Phoenix, not an official > Apache Phoenix release. > > Please test against an Apache Phoenix release or contact Hortonworks for > support. It would not be unheard of that this issue has already been fixed. > > On 9/11/18 10:01 AM, Abhishek Gupta wrote: > > Hi, > > > > We are trying to use the LAST_VALUE aggregation UDF in our Phoenix SQL > > queries that serve REST APIs over analytics data. However we are seeing > > a number of issues/limitations in this function. > > First, there seems to be a bug where it fails for sparse columns that > > have NULLs. Have created a JIRA ticket for that containing the details > > of the exception and steps to reproduce > > https://issues.apache.org/jira/browse/PHOENIX-4898 > > > > Second, the function works in a way that it ignores all the group by > > columns except the column in the order by clause in finding the last > > value. For example > > Following is a dataset (pk1,pk2,pk3,pk4 and pk5 are Primary key columns > > and m1-m3 are metric columns, I intend to do a SUM on m2 and m3 and a > > last value on m1 > > > > pk1 . | pk2 | pk3 . | pk4 | > > pk5 . | m1 | m2 | m3 > > > > ------------------------------------------------------------------------------------------------------ > > | test | 201808010000 | app1 | plat1 | lang1 | 1 > > | 10 | 100 | > > | test | 201808010000 | app1 | plat1 | lang2 | 2 > > | 10 | 100 | > > | test | 201808010000 | app1 | plat2 | lang1 | 3 > > | 10 | 100 | > > | test | 201808010000 | app2 | plat1 | lang1 | 4 > > | 10 | 100 | > > | test | 201808010030 | app1 | plat1 | lang1 | > > 10 | 10 | 100 | > > | test | 201808010030 | app1 | plat1 | lang2 | > > 20 | 10 | 100 | > > | test | 201808010030 | app1 | plat2 | lang1 | > > 30 | 10 | 100 | > > | test | 201808010030 | app2 | plat1 | lang1 | > > 40 | 10 | 100 | > > | test | 201808010100 | app1 | plat1 | lang1 | > > 100 | 10 | 100 | > > | test | 201808010100 | app1 | plat1 | lang2 | > > 200 | 10 | 100 | > > | test | 201808010100 | app1 | plat2 | lang1 | > > 300 | 10 | 100 | > > | test | 201808010100 | app2 | plat1 | lang1 | > > 400 | 10 | 100 | > > | test | 201808010130 | app1 | plat1 | lang1 | > > 1000 | 10 | 100 | > > | test | 201808010130 | app1 | plat1 | lang2 | > > 2000 | 10 | 100 | > > | test | 201808010130 | app1 | plat2 | lang1 | > > 3000 | 10 | 100 | > > | test | 201808010130 | app2 | plat1 | lang1 | > > 4000 | 10 | 100 | > > | test | 201808010200 | app1 | plat1 | lang1 | > > 10000 | 10 | 100 | > > | test | 201808010200 | app1 | plat1 | lang2 | > > 20000 | 10 | 100 | > > | test | 201808010200 | app1 | plat2 | lang1 | > > 30000 | 10 | 100 | > > | test | 201808010200 | app2 | plat1 | lang1 | > > 40000 | 10 | 100 | > > > > > > If I run the following query (using all primary key columns in group by) > > > > select TO_CHAR(TO_DATE(pk2,'yyyyMMddHHmm'),'yyyyMMddHH') as > > t,pk3,pk4,pk5, last_value(m1) within group (order by pk2 asc) as oi, > > sum(m2), sum(m3) from test_table group by pk1,t,pk3,pk4,pk5; > > > > I get the correct result for last value as for each pk2 value which is > > used in order by there is only 1 row > > > > > > +-------------+---------+----------------+--------+--------+----------+----------+ > > | T | pk3 | pk4 | pk5 | OI | SUM(M2) | SUM(M3) | > > > > +-------------+---------+----------------+--------+--------+----------+----------+ > > | 2018080100 | app1 | plat1 | lang1 | 10 | 20 > > | 200 | > > | 2018080100 | app1 | plat1 | lang2 | 20 | 20 > > | 200 | > > | 2018080100 | app1 | plat2 | lang1 | 30 | 20 > > | 200 | > > | 2018080100 | app2 | plat1 | lang1 | 40 | 20 > > | 200 | > > | 2018080101 | app1 | plat1 | lang1 | 1000 | 20 > > | 200 | > > | 2018080101 | app1 | plat1 | lang2 | 2000 | 20 > > | 200 | > > | 2018080101 | app1 | plat2 | lang1 | 3000 | 20 > > | 200 | > > | 2018080101 | app2 | plat1 | lang1 | 4000 | 20 > > | 200 | > > | 2018080102 | app1 | plat1 | lang1 | 10000 | 20 > > | 100 | > > | 2018080102 | app1 | plat1 | lang2 | 20000 | 10 > > | 100 | > > | 2018080102 | app1 | plat2 | lang1 | 30000 | 10 > > | 100 | > > | 2018080102 | app2 | plat1 | lang1 | 40000 | 10 > > | 100 | > > > > +-------------+---------+----------------+--------+--------+----------+----------+ > > > > > > However if I do I group by on less than all the primary columns the > > LAST_VALUE function ignores the rest of the group by columns in sorting > > and returns incorrect last_value > > > > select TO_CHAR(TO_DATE(pk2,'yyyyMMddHHmm'),'yyyyMMddHH') as t, pk3, > > last_value(m1) within group (order by pk2 asc) as oi, sum(m2), > > sum(m3) from test_table group by pk1,t,pk3; > > > > > > +-------------+---------+--------+----------+----------+ > > | T | pk3 | OI | SUM(M2) | SUM(M3) | > > +-------------+---------+--------+----------+----------+ > > | 2018080100 | app1 | 10 | 60 | 600 | > > | 2018080100 | app2 | 40 | 20 | 200 | > > | 2018080101 | app1 | 1000 | 60 | 600 | > > | 2018080101 | app2 | 4000 | 20 | 200 | > > | 2018080102 | app1 | 10000 | 40 | 300 | > > | 2018080102 | app2 | 40000 | 10 | 100 | > > +-------------+---------+--------+----------+----------+ > > > > So instead of taking the last value of the group formed by 2018080100 > > and app1 i.e 30 it is picking the first i.e 10. > > Thanks, > > Abhishek > > >