Yes, please open a JIRA and attach that CSV (or ideally the smallest subset that exhibits the problem).

Thanks,
James
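For reference, a minimal sketch of what such a JIRA attachment could contain, assuming the DDL quoted downthread; "repro.csv" is an illustrative name for the trimmed CSV subset, which would be loaded with the same psql.py invocation shown below:

-- DDL, as posted downthread:
CREATE TABLE se_dwh.homes_usage_hour (
    hu_ts        TIME NOT NULL,
    hu_ho_id     INTEGER NOT NULL,
    hu_stream_id INTEGER NOT NULL,
    hu_usage     DOUBLE
    CONSTRAINT pk PRIMARY KEY (hu_ts, hu_ho_id, hu_stream_id));

-- After loading repro.csv, these two queries should return an identical
-- row for hu_ho_id = 4720; per the report below, they do not.
SELECT TRUNC(hu_ts,'DAY',1), hu_ho_id, hu_stream_id, SUM(hu_usage)
FROM se_dwh.homes_usage_hour
WHERE hu_ho_id IN (4720, 6854)
GROUP BY TRUNC(hu_ts,'DAY',1), hu_ho_id, hu_stream_id;

SELECT TRUNC(hu_ts,'DAY',1), hu_ho_id, hu_stream_id, SUM(hu_usage)
FROM se_dwh.homes_usage_hour
WHERE hu_ho_id IN (4720)
GROUP BY TRUNC(hu_ts,'DAY',1), hu_ho_id, hu_stream_id;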
On Tuesday, April 7, 2015, Marek Wiewiorka <[email protected]> wrote:

> Hi James - shall I still open a JIRA for that?
> Thanks!
> Marek
>
> 2015-04-06 22:48 GMT+02:00 Marek Wiewiorka <[email protected]>:
>
>> psql from a csv file:
>> ./psql.py dwh:2181:/hbase-unsecure -t SE_DWH.HOMES_USAGE_HOUR
>> /mnt/spark/export/usage_convert.txt/usage_merged.csv
>>
>> Here is a sample:
>> 2015-03-19 23:59:59,6854,0,2.167
>> 2015-03-19 22:59:59,6854,0,2.421
>> 2015-03-19 21:59:59,6854,0,2.738
>> 2015-03-19 20:59:59,6854,0,0.9490000000000001
>> 2015-03-19 19:59:59,6854,0,0.748
>> 2015-03-19 18:59:59,6854,0,2.76
>> 2015-03-19 17:59:59,6854,0,1.801
>> 2015-03-19 16:59:59,6854,0,0.661
>> 2015-03-19 15:59:59,6854,0,1.082
>> 2015-03-19 14:59:59,6854,0,1.303
>>
>> M.
>>
>> 2015-04-06 22:38 GMT+02:00 James Taylor <[email protected]>:
>>
>>> How did you input the data?
>>>
>>> On Mon, Apr 6, 2015 at 1:27 PM, Marek Wiewiorka <[email protected]> wrote:
>>>
>>>> Oh, I'm sorry - I forgot to attach them.
>>>> DDL of my table:
>>>> create table se_dwh.homes_usage_hour (hu_ts time not null, hu_ho_id integer
>>>> not null, hu_stream_id integer not null, hu_usage double
>>>> constraint pk PRIMARY KEY (hu_ts, hu_ho_id, hu_stream_id));
>>>>
>>>> Phoenix: 4.3.0
>>>>
>>>> Thanks,
>>>> Marek
>>>>
>>>> 2015-04-06 22:25 GMT+02:00 James Taylor <[email protected]>:
>>>>
>>>>> Hi Marek,
>>>>> How did you input the data and what does your CREATE TABLE/VIEW
>>>>> statement look like? What version of Phoenix and HBase are you using?
>>>>> Thanks,
>>>>> James
>>>>>
>>>>> On Monday, April 6, 2015, Marek Wiewiorka <[email protected]> wrote:
>>>>>
>>>>>> Hi James - sure, here is the result of your query (limited to 5 rows):
>>>>>>
>>>>>> +--------------------------+-----------+---------------+---------------------+
>>>>>> | TO_CHAR(HU_TS)           | HU_HO_ID  | HU_STREAM_ID  | HU_USAGE            |
>>>>>> +--------------------------+-----------+---------------+---------------------+
>>>>>> | 2015-03-19 00:59:59.000  | 4720      | 0             | 0.287               |
>>>>>> | 2015-03-19 00:59:59.000  | 6854      | 0             | 3.6189999999999998  |
>>>>>> | 2015-03-19 01:59:59.000  | 4720      | 0             | 0.323               |
>>>>>> | 2015-03-19 01:59:59.000  | 6854      | 0             | 2.556               |
>>>>>> | 2015-03-19 02:59:59.000  | 4720      | 0             | 0.37                |
>>>>>> +--------------------------+-----------+---------------+---------------------+
>>>>>>
>>>>>> Ok - I will try to prepare a sample csv with DDL/DML.
>>>>>>
>>>>>> Thanks again,
>>>>>> Marek
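A DML equivalent of the CSV sample quoted above might look like the following sketch (values copied from the quoted rows). One caveat: per Marek's mail further down the thread, UPSERT-loaded rows did not reproduce the problem, so the CSV load path itself appears to matter and the actual file may still be needed.

-- Illustrative UPSERTs mirroring three rows of the CSV sample; the table's
-- (hu_ts, hu_ho_id, hu_stream_id) primary key applies.
UPSERT INTO se_dwh.homes_usage_hour VALUES ('2015-03-19 23:59:59', 6854, 0, 2.167);
UPSERT INTO se_dwh.homes_usage_hour VALUES ('2015-03-19 22:59:59', 6854, 0, 2.421);
UPSERT INTO se_dwh.homes_usage_hour VALUES ('2015-03-19 21:59:59', 6854, 0, 2.738);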
>>>>>>
>>>>>> 2015-04-06 22:06 GMT+02:00 James Taylor <[email protected]>:
>>>>>>
>>>>>>> Hi Marek,
>>>>>>> Thanks for the additional information. If you could answer my earlier
>>>>>>> questions, that would be helpful.
>>>>>>>
>>>>>>> If you can't repro with a simple test case, then how about attaching a
>>>>>>> csv dump of some of your data (the smallest amount of data that repros
>>>>>>> the issue) to a JIRA along with the CREATE TABLE statement and the
>>>>>>> query?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> James
>>>>>>>
>>>>>>> On Mon, Apr 6, 2015 at 12:49 PM, Marek Wiewiorka <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Guys - thanks for your messages.
>>>>>>>>
>>>>>>>> I did another round of testing and found that if I use the to_char
>>>>>>>> function instead of trunc, I get the expected result:
>>>>>>>>
>>>>>>>> select to_char(dt,'yyyy-MM-dd'),id,stream,sum(usage) from usage
>>>>>>>> where id in (4720,6854) group by to_char(dt,'yyyy-MM-dd'),stream,id;
>>>>>>>>
>>>>>>>> +--------------+-------+---------+---------------------+
>>>>>>>> | TO_CHAR(DT)  | ID    | STREAM  | SUM(USAGE)          |
>>>>>>>> +--------------+-------+---------+---------------------+
>>>>>>>> | 2015-03-19   | 4720  | 0       | 8.405999999999999   |
>>>>>>>> | 2015-03-19   | 6854  | 0       | 28.339000000000006  |
>>>>>>>> +--------------+-------+---------+---------------------+
>>>>>>>>
>>>>>>>> The same query with trunc returns 48 rows (24 rows per id):
>>>>>>>>
>>>>>>>> select trunc(dt,'DAY') as day,id,stream,sum(usage) from usage
>>>>>>>> where id in (4720,6854) group by trunc(dt,'DAY'),stream,id;
>>>>>>>>
>>>>>>>> e.g.:
>>>>>>>> ....
>>>>>>>> | 2015-03-19 00:00:00.000  | 4720  | 0  | 0.406                |
>>>>>>>> | 2015-03-19 00:00:00.000  | 6854  | 0  | 0.9490000000000001   |
>>>>>>>> | 2015-03-19 00:00:00.000  | 4720  | 0  | 0.332                |
>>>>>>>> | 2015-03-19 00:00:00.000  | 6854  | 0  | 2.738                |
>>>>>>>> | 2015-03-19 00:00:00.000  | 4720  | 0  | 0.33499999999999996  |
>>>>>>>> | 2015-03-19 00:00:00.000  | 6854  | 0  | 2.421                |
>>>>>>>> | 2015-03-19 00:00:00.000  | 4720  | 0  | 0.45599999999999996  |
>>>>>>>> | 2015-03-19 00:00:00.000  | 6854  | 0  | 2.167                |
>>>>>>>> +--------------------------+-------+----+----------------------+
>>>>>>>>
>>>>>>>> So for some reason grouping by trunc is not working... but using
>>>>>>>> to_char is. This applies to the table loaded from the csv file using
>>>>>>>> psql.
>>>>>>>>
>>>>>>>> When I tried to create a sample table and populate it with upserts,
>>>>>>>> everything worked as expected in both cases:
>>>>>>>>
>>>>>>>> create table usage (dt time not null, id integer not null, stream integer
>>>>>>>> not null, usage double constraint pk PRIMARY KEY (dt, id, stream));
>>>>>>>> UPSERT INTO usage VALUES('2015-04-01 10:00:00',100,0,2.0);
>>>>>>>> UPSERT INTO usage VALUES('2015-04-01 11:00:00',100,0,3.0);
>>>>>>>> UPSERT INTO usage VALUES('2015-04-01 12:00:00',200,0,4.0);
>>>>>>>> UPSERT INTO usage VALUES('2015-04-01 13:00:00',200,0,6.0);
>>>>>>>>
>>>>>>>> 0: jdbc:phoenix:dwh.sayene.com:2181:/hbase-un> select trunc(dt,'DAY') as
>>>>>>>> day,id,stream,sum(usage) from usage where id in (100,200) group by
>>>>>>>> trunc(dt,'DAY'),stream,id;
>>>>>>>>
>>>>>>>> +--------------------------+------+---------+-------------+
>>>>>>>> | DAY                      | ID   | STREAM  | SUM(USAGE)  |
>>>>>>>> +--------------------------+------+---------+-------------+
>>>>>>>> | 2015-04-01 00:00:00.000  | 100  | 0       | 5.0         |
>>>>>>>> | 2015-04-01 00:00:00.000  | 200  | 0       | 10.0        |
>>>>>>>> +--------------------------+------+---------+-------------+
>>>>>>>>
>>>>>>>> 0: jdbc:phoenix:dwh.sayene.com:2181:/hbase-un> select
>>>>>>>> to_char(dt,'yyyy-MM-dd'),id,stream,sum(usage) from usage where id in
>>>>>>>> (100,200) group by to_char(dt,'yyyy-MM-dd'),stream,id;
>>>>>>>>
>>>>>>>> +--------------+------+---------+-------------+
>>>>>>>> | TO_CHAR(DT)  | ID   | STREAM  | SUM(USAGE)  |
>>>>>>>> +--------------+------+---------+-------------+
>>>>>>>> | 2015-04-01   | 100  | 0       | 5.0         |
>>>>>>>> | 2015-04-01   | 200  | 0       | 10.0        |
>>>>>>>> +--------------+------+---------+-------------+
>>>>>>>> 2 rows selected (1.49 seconds)
>>>>>>>>
>>>>>>>> Shall I open a JIRA for that?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Marek
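One way to confirm that the TRUNC groups are not merging at all would be to count the rows per group, a sketch against the same usage table as above (hourly data, so a working GROUP BY should produce cnt = 24 per id per day, while the broken case would produce many rows with cnt = 1):

-- If GROUP BY TRUNC(dt,'DAY') merges correctly, this returns one row per id
-- per day with cnt = 24; many rows with cnt = 1 would show the truncated
-- keys are not comparing as equal during aggregation.
SELECT TRUNC(dt,'DAY') AS day, id, COUNT(*) AS cnt, SUM(usage) AS total
FROM usage
WHERE id IN (4720, 6854)
GROUP BY TRUNC(dt,'DAY'), id;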
>>>>>>>>
>>>>>>>> 2015-04-06 20:16 GMT+02:00 James Taylor <[email protected]>:
>>>>>>>>
>>>>>>>>> Hi Marek,
>>>>>>>>> How did you input the data and what does your CREATE TABLE/VIEW
>>>>>>>>> statement look like? What version of Phoenix and HBase are you using?
>>>>>>>>>
>>>>>>>>> Also, would you mind running the following query and letting us know
>>>>>>>>> the output?
>>>>>>>>>
>>>>>>>>> select to_char(hu_ts,'yyyy-MM-dd HH:mm:ss.SSS'),hu_ho_id,hu_stream_id,hu_usage
>>>>>>>>> from se_dwh.homes_usage_hour
>>>>>>>>> where hu_ho_id in (4720,6854);
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> James
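If the formatted timestamps in that output all look identical, a follow-up along these lines could show whether several distinct raw hu_ts values hide behind one printed value (a sketch using the same table and format pattern as James's query; it assumes one stream per home, as in the sample data):

-- More than one row per (formatted timestamp, id) pair would indicate raw
-- TIME values that differ below the printed millisecond precision.
SELECT TO_CHAR(hu_ts, 'yyyy-MM-dd HH:mm:ss.SSS') AS ts_fmt,
       hu_ho_id, COUNT(*) AS cnt
FROM se_dwh.homes_usage_hour
WHERE hu_ho_id IN (4720, 6854)
GROUP BY TO_CHAR(hu_ts, 'yyyy-MM-dd HH:mm:ss.SSS'), hu_ho_id;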
>>>>>>>>>
>>>>>>>>> On Mon, Apr 6, 2015 at 10:34 AM, Gabriel Reid <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> That certainly looks like a bug. Would it be possible to make a small
>>>>>>>>>> reproducible test case and, if possible, log this in the Phoenix JIRA
>>>>>>>>>> (https://issues.apache.org/jira/browse/PHOENIX)?
>>>>>>>>>>
>>>>>>>>>> - Gabriel
>>>>>>>>>>
>>>>>>>>>> On Mon, Apr 6, 2015 at 6:10 PM Marek Wiewiorka <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi All,
>>>>>>>>>>> I came across a weird situation while running a query with group by.
>>>>>>>>>>> I executed 2 queries:
>>>>>>>>>>>
>>>>>>>>>>> 1)
>>>>>>>>>>> select trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id,sum(hu_usage)
>>>>>>>>>>> from se_dwh.homes_usage_hour where hu_ho_id in (4720,6854) group by
>>>>>>>>>>> trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id;
>>>>>>>>>>>
>>>>>>>>>>> | 2015-03-19 00:00:00.000  | 4720  | 0  | 0.45599999999999996  |
>>>>>>>>>>> | 2015-03-19 00:00:00.000  | 6854  | 0  | 2.167                |
>>>>>>>>>>>
>>>>>>>>>>> 2)
>>>>>>>>>>> select trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id,sum(hu_usage)
>>>>>>>>>>> from se_dwh.homes_usage_hour where hu_ho_id in (4720) group by
>>>>>>>>>>> trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id;
>>>>>>>>>>>
>>>>>>>>>>> | 2015-03-19 00:00:00.000  | 4720  | 0  | 8.405999999999999  |
>>>>>>>>>>>
>>>>>>>>>>> The only difference is that in the first case I included two ids
>>>>>>>>>>> (4720, 6854) and in the other one only one (4720).
>>>>>>>>>>> The result for hu_ho_id=4720 should be the same in both cases, but it
>>>>>>>>>>> isn't (0.45 vs 8.4). The second result (8.4) is correct.
>>>>>>>>>>>
>>>>>>>>>>> Am I doing something wrong?
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Marek
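As a reference for anyone hitting the same issue: the TO_CHAR-based grouping Marek reports as working upthread can serve as a stopgap for daily rollups until the TRUNC behaviour is resolved.

-- Workaround reported upthread: group by the formatted day instead of
-- TRUNC; this returned the expected single row per id per day.
SELECT TO_CHAR(hu_ts, 'yyyy-MM-dd') AS day, hu_ho_id, hu_stream_id,
       SUM(hu_usage) AS daily_usage
FROM se_dwh.homes_usage_hour
WHERE hu_ho_id IN (4720, 6854)
GROUP BY TO_CHAR(hu_ts, 'yyyy-MM-dd'), hu_ho_id, hu_stream_id;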
