How did you input the data?
On Mon, Apr 6, 2015 at 1:27 PM, Marek Wiewiorka <[email protected]> wrote:

  Oh, I'm sorry - I forgot to attach them.

  DDL of my table:

    create table se_dwh.homes_usage_hour (
        hu_ts        time    not null,
        hu_ho_id     integer not null,
        hu_stream_id integer not null,
        hu_usage     double
        constraint pk PRIMARY KEY (hu_ts, hu_ho_id, hu_stream_id)
    );

  Phoenix: 4.3.0

  Thanks,
  Marek

2015-04-06 22:25 GMT+02:00 James Taylor <[email protected]>:

  Hi Marek,
  How did you input the data and what does your CREATE TABLE/VIEW
  statement look like? What version of Phoenix and HBase are you using?
  Thanks,
  James

On Monday, April 6, 2015, Marek Wiewiorka <[email protected]> wrote:

  Hi James - sure, here is the result of your query (limited to 5 rows):

    +-------------------------+----------+--------------+--------------------+
    | TO_CHAR(HU_TS)          | HU_HO_ID | HU_STREAM_ID | HU_USAGE           |
    +-------------------------+----------+--------------+--------------------+
    | 2015-03-19 00:59:59.000 | 4720     | 0            | 0.287              |
    | 2015-03-19 00:59:59.000 | 6854     | 0            | 3.6189999999999998 |
    | 2015-03-19 01:59:59.000 | 4720     | 0            | 0.323              |
    | 2015-03-19 01:59:59.000 | 6854     | 0            | 2.556              |
    | 2015-03-19 02:59:59.000 | 4720     | 0            | 0.37               |
    +-------------------------+----------+--------------+--------------------+

  Ok - I will try to prepare a sample csv with DDL/DML.

  Thanks again,
  Marek

2015-04-06 22:06 GMT+02:00 James Taylor <[email protected]>:

  Hi Marek,
  Thanks for the additional information. If you could answer my earlier
  questions, that would be helpful.

  If you can't repro with a simple test case, then how about attaching a
  csv dump of some of your data (the smallest amount of data that repros
  the issue) to a JIRA, along with the CREATE TABLE statement and the
  query?

  Thanks,
  James

On Mon, Apr 6, 2015 at 12:49 PM, Marek Wiewiorka <[email protected]> wrote:

  Hi Guys - thanks for your messages.

  I did another round of testing and found that if I use the to_char
  function instead of trunc, I get the expected result:

    select to_char(dt,'yyyy-MM-dd'), id, stream, sum(usage)
    from usage
    where id in (4720,6854)
    group by to_char(dt,'yyyy-MM-dd'), stream, id;

    +-------------+------+--------+--------------------+
    | TO_CHAR(DT) | ID   | STREAM | SUM(USAGE)         |
    +-------------+------+--------+--------------------+
    | 2015-03-19  | 4720 | 0      | 8.405999999999999  |
    | 2015-03-19  | 6854 | 0      | 28.339000000000006 |
    +-------------+------+--------+--------------------+

  The same query with trunc returns 48 rows (24 rows per id):

    select trunc(dt,'DAY') as day, id, stream, sum(usage)
    from usage
    where id in (4720,6854)
    group by trunc(dt,'DAY'), stream, id;

  e.g.:

    ....
    | 2015-03-19 00:00:00.000 | 4720 | 0 | 0.406               |
    | 2015-03-19 00:00:00.000 | 6854 | 0 | 0.9490000000000001  |
    | 2015-03-19 00:00:00.000 | 4720 | 0 | 0.332               |
    | 2015-03-19 00:00:00.000 | 6854 | 0 | 2.738               |
    | 2015-03-19 00:00:00.000 | 4720 | 0 | 0.33499999999999996 |
    | 2015-03-19 00:00:00.000 | 6854 | 0 | 2.421               |
    | 2015-03-19 00:00:00.000 | 4720 | 0 | 0.45599999999999996 |
    | 2015-03-19 00:00:00.000 | 6854 | 0 | 2.167               |

  So for some reason grouping by trunc is not working, but grouping by
  to_char is. This applies to a table loaded from a csv file using psql.

  When I tried to create a sample table and populate it with upserts,
  everything worked as expected in both cases:

    create table usage (
        dt     time    not null,
        id     integer not null,
        stream integer not null,
        usage  double
        constraint pk PRIMARY KEY (dt, id, stream)
    );

    UPSERT INTO usage VALUES ('2015-04-01 10:00:00', 100, 0, 2.0);
    UPSERT INTO usage VALUES ('2015-04-01 11:00:00', 100, 0, 3.0);
    UPSERT INTO usage VALUES ('2015-04-01 12:00:00', 200, 0, 4.0);
    UPSERT INTO usage VALUES ('2015-04-01 13:00:00', 200, 0, 6.0);

    0: jdbc:phoenix:dwh.sayene.com:2181:/hbase-un> select trunc(dt,'DAY') as day, id, stream, sum(usage)
       from usage where id in (100,200) group by trunc(dt,'DAY'), stream, id;

    +-------------------------+-----+--------+------------+
    | DAY                     | ID  | STREAM | SUM(USAGE) |
    +-------------------------+-----+--------+------------+
    | 2015-04-01 00:00:00.000 | 100 | 0      | 5.0        |
    | 2015-04-01 00:00:00.000 | 200 | 0      | 10.0       |
    +-------------------------+-----+--------+------------+

    0: jdbc:phoenix:dwh.sayene.com:2181:/hbase-un> select to_char(dt,'yyyy-MM-dd'), id, stream, sum(usage)
       from usage where id in (100,200) group by to_char(dt,'yyyy-MM-dd'), stream, id;

    +-------------+-----+--------+------------+
    | TO_CHAR(DT) | ID  | STREAM | SUM(USAGE) |
    +-------------+-----+--------+------------+
    | 2015-04-01  | 100 | 0      | 5.0        |
    | 2015-04-01  | 200 | 0      | 10.0       |
    +-------------+-----+--------+------------+
    2 rows selected (1.49 seconds)

  Shall I open a jira for that?

  Regards,
  Marek

2015-04-06 20:16 GMT+02:00 James Taylor <[email protected]>:

  Hi Marek,
  How did you input the data and what does your CREATE TABLE/VIEW
  statement look like? What version of Phoenix and HBase are you using?

  Also, would you mind running the following query and letting us know
  the output?

    select to_char(hu_ts,'yyyy-MM-dd HH:mm:ss.SSS'), hu_ho_id, hu_stream_id, hu_usage
    from se_dwh.homes_usage_hour
    where hu_ho_id in (4720,6854);

  Thanks,
  James

On Mon, Apr 6, 2015 at 10:34 AM, Gabriel Reid <[email protected]> wrote:

  That certainly looks like a bug. Would it be possible to make a small
  reproducible test case and, if possible, log this in the Phoenix JIRA
  (https://issues.apache.org/jira/browse/PHOENIX)?

  - Gabriel

On Mon, Apr 6, 2015 at 6:10 PM, Marek Wiewiorka <[email protected]> wrote:

  Hi All,
  I came across a weird situation while running a query with group by.
  I executed 2 queries:

  1)

    select trunc(hu_ts,'DAY',1), hu_ho_id, hu_stream_id, sum(hu_usage)
    from se_dwh.homes_usage_hour
    where hu_ho_id in (4720,6854)
    group by trunc(hu_ts,'DAY',1), hu_ho_id, hu_stream_id;

    | 2015-03-19 00:00:00.000 | 4720 | 0 | 0.45599999999999996 |
    | 2015-03-19 00:00:00.000 | 6854 | 0 | 2.167               |

  2)

    select trunc(hu_ts,'DAY',1), hu_ho_id, hu_stream_id, sum(hu_usage)
    from se_dwh.homes_usage_hour
    where hu_ho_id in (4720)
    group by trunc(hu_ts,'DAY',1), hu_ho_id, hu_stream_id;

    | 2015-03-19 00:00:00.000 | 4720 | 0 | 8.405999999999999 |

  The only difference is that in the first case I included 2 ids
  (4720, 6854) and in the other one only 1 (4720).
  The result for hu_ho_id=4720 should be the same in both cases, but it
  isn't (0.45 vs 8.4). The second result (8.4) is correct.

  Am I doing something wrong?

  Regards,
  Marek
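Stepping back from the thread: the semantics Marek expects from the trunc-based GROUP BY - every hourly reading folding into one row per (day, id, stream) - can be sketched in plain Python. The sample rows below are made-up stand-ins for the se_dwh.homes_usage_hour data, and trunc_day is a hypothetical helper mimicking what Phoenix's TRUNC(ts, 'DAY') is documented to do; this illustrates only the intended aggregation, not Phoenix internals or the bug itself.

```python
from collections import defaultdict
from datetime import datetime

# Hourly usage rows: (timestamp, home_id, stream_id, usage).
# Invented sample values standing in for the real table's data.
rows = [
    (datetime(2015, 3, 19, 0, 59, 59), 4720, 0, 0.287),
    (datetime(2015, 3, 19, 0, 59, 59), 6854, 0, 3.619),
    (datetime(2015, 3, 19, 1, 59, 59), 4720, 0, 0.323),
    (datetime(2015, 3, 19, 1, 59, 59), 6854, 0, 2.556),
    (datetime(2015, 3, 19, 2, 59, 59), 4720, 0, 0.370),
]

def trunc_day(ts):
    """Truncate a timestamp to midnight, like TRUNC(ts, 'DAY')."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)

# GROUP BY trunc_day(ts), home_id, stream_id with SUM(usage):
# all hourly readings for one home on one day fold into a single row.
totals = defaultdict(float)
for ts, home_id, stream_id, usage in rows:
    totals[(trunc_day(ts), home_id, stream_id)] += usage

for key in sorted(totals):
    print(key, round(totals[key], 3))
```

With these sample rows the grouping yields exactly two result rows, one per home for 2015-03-19, which is the shape Marek's to_char query produced and the trunc query (on the csv-loaded table) did not.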
