Hi James - shall I still open a JIRA for that?

Thanks!
Marek

2015-04-06 22:48 GMT+02:00 Marek Wiewiorka <marek.wiewio...@gmail.com>:
> psql from a csv file:
> ./psql.py dwh:2181:/hbase-unsecure -t SE_DWH.HOMES_USAGE_HOUR
> /mnt/spark/export/usage_convert.txt/usage_merged.csv
>
> Here is a sample:
> 2015-03-19 23:59:59,6854,0,2.167
> 2015-03-19 22:59:59,6854,0,2.421
> 2015-03-19 21:59:59,6854,0,2.738
> 2015-03-19 20:59:59,6854,0,0.9490000000000001
> 2015-03-19 19:59:59,6854,0,0.748
> 2015-03-19 18:59:59,6854,0,2.76
> 2015-03-19 17:59:59,6854,0,1.801
> 2015-03-19 16:59:59,6854,0,0.661
> 2015-03-19 15:59:59,6854,0,1.082
> 2015-03-19 14:59:59,6854,0,1.303
>
> M.
>
> 2015-04-06 22:38 GMT+02:00 James Taylor <jamestay...@apache.org>:
>
>> How did you input the data?
>>
>> On Mon, Apr 6, 2015 at 1:27 PM, Marek Wiewiorka
>> <marek.wiewio...@gmail.com> wrote:
>> > Oh, I'm sorry - I forgot to attach them.
>> > DDL of my table:
>> > create table se_dwh.homes_usage_hour (
>> >     hu_ts time not null,
>> >     hu_ho_id integer not null,
>> >     hu_stream_id integer not null,
>> >     hu_usage double
>> >     constraint pk PRIMARY KEY (hu_ts, hu_ho_id, hu_stream_id)
>> > );
>> >
>> > Phoenix: 4.3.0
>> >
>> > Thanks,
>> > Marek
>> >
>> > 2015-04-06 22:25 GMT+02:00 James Taylor <jamestay...@apache.org>:
>> >>
>> >> Hi Marek,
>> >> How did you input the data and what does your CREATE TABLE/VIEW
>> >> statement look like? What version of Phoenix and HBase are you using?
>> >> Thanks,
>> >> James
>> >>
>> >> On Monday, April 6, 2015, Marek Wiewiorka <marek.wiewio...@gmail.com>
>> >> wrote:
>> >>>
>> >>> Hi James - sure, here is the result of your query (limited to 5 rows):
>> >>>
>> >>> +-------------------------+----------+--------------+--------------------+
>> >>> | TO_CHAR(HU_TS)          | HU_HO_ID | HU_STREAM_ID | HU_USAGE           |
>> >>> +-------------------------+----------+--------------+--------------------+
>> >>> | 2015-03-19 00:59:59.000 | 4720     | 0            | 0.287              |
>> >>> | 2015-03-19 00:59:59.000 | 6854     | 0            | 3.6189999999999998 |
>> >>> | 2015-03-19 01:59:59.000 | 4720     | 0            | 0.323              |
>> >>> | 2015-03-19 01:59:59.000 | 6854     | 0            | 2.556              |
>> >>> | 2015-03-19 02:59:59.000 | 4720     | 0            | 0.37               |
>> >>> +-------------------------+----------+--------------+--------------------+
>> >>>
>> >>> Ok - I will try to prepare a sample csv with DDL/DML.
>> >>>
>> >>> Thanks again,
>> >>> Marek
>> >>>
>> >>> 2015-04-06 22:06 GMT+02:00 James Taylor <jamestay...@apache.org>:
>> >>>>
>> >>>> Hi Marek,
>> >>>> Thanks for the additional information. If you could answer my earlier
>> >>>> questions, that would be helpful.
>> >>>>
>> >>>> If you can't repro with a simple test case, then how about attaching
>> >>>> a csv dump of some of your data (the smallest amount of data that
>> >>>> repros the issue) to a JIRA along with the CREATE TABLE statement
>> >>>> and the query?
>> >>>>
>> >>>> Thanks,
>> >>>> James
>> >>>>
>> >>>> On Mon, Apr 6, 2015 at 12:49 PM, Marek Wiewiorka
>> >>>> <marek.wiewio...@gmail.com> wrote:
>> >>>> > Hi guys - thanks for your messages.
>> >>>> >
>> >>>> > I did another round of testing and found that if I use the to_char
>> >>>> > function instead of trunc, I get the expected result:
>> >>>> >
>> >>>> > select to_char(dt,'yyyy-MM-dd'),id,stream,sum(usage) from usage
>> >>>> > where id in (4720,6854)
>> >>>> > group by to_char(dt,'yyyy-MM-dd'),stream,id;
>> >>>> >
>> >>>> > +-------------+------+--------+--------------------+
>> >>>> > | TO_CHAR(DT) | ID   | STREAM | SUM(USAGE)         |
>> >>>> > +-------------+------+--------+--------------------+
>> >>>> > | 2015-03-19  | 4720 | 0      | 8.405999999999999  |
>> >>>> > | 2015-03-19  | 6854 | 0      | 28.339000000000006 |
>> >>>> > +-------------+------+--------+--------------------+
>> >>>> >
>> >>>> > The same query with trunc returns 48 rows (24 rows per id):
>> >>>> > select trunc(dt,'DAY') as day,id,stream,sum(usage) from usage
>> >>>> > where id in (4720,6854)
>> >>>> > group by trunc(dt,'DAY'),stream,id;
>> >>>> > e.g.:
>> >>>> > ....
>> >>>> > | 2015-03-19 00:00:00.000 | 4720 | 0 | 0.406               |
>> >>>> > | 2015-03-19 00:00:00.000 | 6854 | 0 | 0.9490000000000001  |
>> >>>> > | 2015-03-19 00:00:00.000 | 4720 | 0 | 0.332               |
>> >>>> > | 2015-03-19 00:00:00.000 | 6854 | 0 | 2.738               |
>> >>>> > | 2015-03-19 00:00:00.000 | 4720 | 0 | 0.33499999999999996 |
>> >>>> > | 2015-03-19 00:00:00.000 | 6854 | 0 | 2.421               |
>> >>>> > | 2015-03-19 00:00:00.000 | 4720 | 0 | 0.45599999999999996 |
>> >>>> > | 2015-03-19 00:00:00.000 | 6854 | 0 | 2.167               |
>> >>>> > +-------------------------+------+---+---------------------+
>> >>>> >
>> >>>> > So for some reason grouping by trunc is not working... but grouping
>> >>>> > by to_char is. This applies to a table loaded with psql from a csv
>> >>>> > file.
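For readers following along: the expected TRUNC semantics can be simulated outside the database. This is a plain-Python sketch (not Phoenix), using the hourly csv sample quoted earlier in this thread; truncating each timestamp to midnight should collapse a whole day into one group per (day, id, stream), instead of the one-row-per-input-row output shown above.

```python
from collections import defaultdict
from datetime import datetime

# Hourly csv sample from earlier in the thread: ts, hu_ho_id, hu_stream_id, hu_usage
rows = [
    "2015-03-19 23:59:59,6854,0,2.167",
    "2015-03-19 22:59:59,6854,0,2.421",
    "2015-03-19 21:59:59,6854,0,2.738",
    "2015-03-19 20:59:59,6854,0,0.9490000000000001",
    "2015-03-19 19:59:59,6854,0,0.748",
    "2015-03-19 18:59:59,6854,0,2.76",
    "2015-03-19 17:59:59,6854,0,1.801",
    "2015-03-19 16:59:59,6854,0,0.661",
    "2015-03-19 15:59:59,6854,0,1.082",
    "2015-03-19 14:59:59,6854,0,1.303",
]

sums = defaultdict(float)
for row in rows:
    ts, ho_id, stream_id, usage = row.split(",")
    # TRUNC(ts,'DAY')-style key: the timestamp with its time part zeroed out.
    day = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(
        hour=0, minute=0, second=0)
    sums[(day, int(ho_id), int(stream_id))] += float(usage)

# One group per (day, id, stream) - all 10 hourly rows sum into a single row.
print(len(sums), sums)
```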
>> >>>> >
>> >>>> > When I tried to create a sample table and populate it with
>> >>>> > upserts, everything worked as expected in both cases:
>> >>>> >
>> >>>> > create table usage (
>> >>>> >     dt time not null,
>> >>>> >     id integer not null,
>> >>>> >     stream integer not null,
>> >>>> >     usage double
>> >>>> >     constraint pk PRIMARY KEY (dt, id, stream)
>> >>>> > );
>> >>>> > UPSERT INTO usage VALUES('2015-04-01 10:00:00',100,0,2.0);
>> >>>> > UPSERT INTO usage VALUES('2015-04-01 11:00:00',100,0,3.0);
>> >>>> > UPSERT INTO usage VALUES('2015-04-01 12:00:00',200,0,4.0);
>> >>>> > UPSERT INTO usage VALUES('2015-04-01 13:00:00',200,0,6.0);
>> >>>> >
>> >>>> > 0: jdbc:phoenix:dwh.sayene.com:2181:/hbase-un> select
>> >>>> > trunc(dt,'DAY') as day,id,stream,sum(usage) from usage
>> >>>> > where id in (100,200) group by trunc(dt,'DAY'),stream,id;
>> >>>> >
>> >>>> > +-------------------------+-----+--------+------------+
>> >>>> > | DAY                     | ID  | STREAM | SUM(USAGE) |
>> >>>> > +-------------------------+-----+--------+------------+
>> >>>> > | 2015-04-01 00:00:00.000 | 100 | 0      | 5.0        |
>> >>>> > | 2015-04-01 00:00:00.000 | 200 | 0      | 10.0       |
>> >>>> > +-------------------------+-----+--------+------------+
>> >>>> >
>> >>>> > 0: jdbc:phoenix:dwh.sayene.com:2181:/hbase-un> select
>> >>>> > to_char(dt,'yyyy-MM-dd'),id,stream,sum(usage) from usage
>> >>>> > where id in (100,200) group by to_char(dt,'yyyy-MM-dd'),stream,id;
>> >>>> >
>> >>>> > +-------------+-----+--------+------------+
>> >>>> > | TO_CHAR(DT) | ID  | STREAM | SUM(USAGE) |
>> >>>> > +-------------+-----+--------+------------+
>> >>>> > | 2015-04-01  | 100 | 0      | 5.0        |
>> >>>> > | 2015-04-01  | 200 | 0      | 10.0       |
>> >>>> > +-------------+-----+--------+------------+
>> >>>> > 2 rows selected (1.49 seconds)
>> >>>> >
>> >>>> > Shall I open a JIRA for that?
>> >>>> >
>> >>>> > Regards,
>> >>>> > Marek
>> >>>> >
>> >>>> > 2015-04-06 20:16 GMT+02:00 James Taylor <jamestay...@apache.org>:
>> >>>> >>
>> >>>> >> Hi Marek,
>> >>>> >> How did you input the data and what does your CREATE TABLE/VIEW
>> >>>> >> statement look like? What version of Phoenix and HBase are you
>> >>>> >> using?
>> >>>> >>
>> >>>> >> Also, would you mind running the following query and letting us
>> >>>> >> know the output?
>> >>>> >>
>> >>>> >> select to_char(hu_ts,'yyyy-MM-dd HH:mm:ss.SSS'),
>> >>>> >>        hu_ho_id,hu_stream_id,hu_usage
>> >>>> >> from se_dwh.homes_usage_hour
>> >>>> >> where hu_ho_id in (4720,6854);
>> >>>> >>
>> >>>> >> Thanks,
>> >>>> >> James
>> >>>> >>
>> >>>> >> On Mon, Apr 6, 2015 at 10:34 AM, Gabriel Reid
>> >>>> >> <gabriel.r...@gmail.com> wrote:
>> >>>> >> > That certainly looks like a bug. Would it be possible to make a
>> >>>> >> > small reproducible test case and, if possible, log this in the
>> >>>> >> > Phoenix JIRA (https://issues.apache.org/jira/browse/PHOENIX)?
>> >>>> >> >
>> >>>> >> > - Gabriel
>> >>>> >> >
>> >>>> >> > On Mon, Apr 6, 2015 at 6:10 PM Marek Wiewiorka
>> >>>> >> > <marek.wiewio...@gmail.com> wrote:
>> >>>> >> >>
>> >>>> >> >> Hi All,
>> >>>> >> >> I came across a weird situation while running a query with
>> >>>> >> >> group by. I executed 2 queries:
>> >>>> >> >>
>> >>>> >> >> 1)
>> >>>> >> >> select trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id,sum(hu_usage)
>> >>>> >> >> from se_dwh.homes_usage_hour
>> >>>> >> >> where hu_ho_id in (4720,6854)
>> >>>> >> >> group by trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id;
>> >>>> >> >>
>> >>>> >> >> | 2015-03-19 00:00:00.000 | 4720 | 0 | 0.45599999999999996 |
>> >>>> >> >> | 2015-03-19 00:00:00.000 | 6854 | 0 | 2.167               |
>> >>>> >> >>
>> >>>> >> >> 2)
>> >>>> >> >> select trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id,sum(hu_usage)
>> >>>> >> >> from se_dwh.homes_usage_hour
>> >>>> >> >> where hu_ho_id in (4720)
>> >>>> >> >> group by trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id;
>> >>>> >> >>
>> >>>> >> >> | 2015-03-19 00:00:00.000 | 4720 | 0 | 8.405999999999999 |
>> >>>> >> >>
>> >>>> >> >> The only difference is that in the first case I included 2 ids
>> >>>> >> >> (4720,6854) and in the second one only 1 (4720).
>> >>>> >> >> The result for hu_ho_id=4720 should be the same in both cases,
>> >>>> >> >> but it isn't (0.45 vs 8.4). The second result (8.4) is correct.
>> >>>> >> >>
>> >>>> >> >> Am I doing something wrong?
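The invariant violated by those two queries can be stated as runnable code. This plain-Python sketch (illustrative rows, not the real table data) shows what any correct engine must guarantee: narrowing the WHERE ... IN list cannot change the per-(day, id, stream) sum of the ids that remain.

```python
from collections import defaultdict

# Illustrative hourly rows: (day, hu_ho_id, hu_stream_id, hu_usage)
rows = [
    ("2015-03-19", 4720, 0, 0.406),
    ("2015-03-19", 6854, 0, 2.738),
    ("2015-03-19", 4720, 0, 0.332),
    ("2015-03-19", 6854, 0, 2.421),
]

def day_sums(rows, ids):
    """GROUP BY day, id, stream with a WHERE id IN (...) filter."""
    sums = defaultdict(float)
    for day, ho_id, stream_id, usage in rows:
        if ho_id in ids:
            sums[(day, ho_id, stream_id)] += usage
    return sums

both = day_sums(rows, {4720, 6854})   # WHERE hu_ho_id IN (4720, 6854)
only = day_sums(rows, {4720})         # WHERE hu_ho_id IN (4720)

# The 4720 sum must be identical either way; the reported 0.45-vs-8.4
# discrepancy is exactly a violation of this property.
assert both[("2015-03-19", 4720, 0)] == only[("2015-03-19", 4720, 0)]
```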
>> >>>> >> >>
>> >>>> >> >> Regards,
>> >>>> >> >> Marek
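As a closing note for readers of this thread: the equivalence that the upsert test case above demonstrates working correctly - grouping by TRUNC(dt,'DAY') versus by TO_CHAR(dt,'yyyy-MM-dd') - can be sketched in plain Python (not Phoenix) with the same four upserted rows. Both keys must partition the rows identically, so the sums have to agree.

```python
from datetime import datetime

# The four rows upserted in the sample test case: (dt, id, stream, usage)
rows = [
    ("2015-04-01 10:00:00", 100, 0, 2.0),
    ("2015-04-01 11:00:00", 100, 0, 3.0),
    ("2015-04-01 12:00:00", 200, 0, 4.0),
    ("2015-04-01 13:00:00", 200, 0, 6.0),
]

def group_sums(rows, day_key):
    """GROUP BY <day_key(dt)>, id, stream summing usage."""
    sums = {}
    for dt, id_, stream, usage in rows:
        parsed = datetime.strptime(dt, "%Y-%m-%d %H:%M:%S")
        key = (day_key(parsed), id_, stream)
        sums[key] = sums.get(key, 0.0) + usage
    return sums

# TRUNC-style key: midnight of the same day. TO_CHAR-style key: a date string.
by_trunc = group_sums(rows, lambda d: d.replace(hour=0, minute=0, second=0))
by_char = group_sums(rows, lambda d: d.strftime("%Y-%m-%d"))

# Same partition, same sums: 5.0 for id 100 and 10.0 for id 200.
assert sorted(by_trunc.values()) == sorted(by_char.values()) == [5.0, 10.0]
```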