Hi James - I'm sorry for the delay; I had to troubleshoot another, more urgent issue. I will try to open a JIRA with a sample CSV today.
Thanks!
Marek

2015-04-08 23:08 GMT+02:00 James Taylor <[email protected]>:

  Any JIRA yet, Marek? It'd be good to get to the bottom of this. Your
  workaround is not going to perform nearly as well as using TRUNC on the
  date.
  Thanks,
  James

On Tue, Apr 7, 2015 at 8:53 AM, James Taylor <[email protected]> wrote:

  Yes, please open a JIRA and attach that CSV (or ideally the smallest
  subset that exhibits the problem).
  Thanks,
  James

On Tuesday, April 7, 2015, Marek Wiewiorka <[email protected]> wrote:

  Hi James - shall I still open a JIRA for that?
  Thanks!
  Marek

2015-04-06 22:48 GMT+02:00 Marek Wiewiorka <[email protected]>:

  psql from a CSV file:

    ./psql.py dwh:2181:/hbase-unsecure -t SE_DWH.HOMES_USAGE_HOUR /mnt/spark/export/usage_convert.txt/usage_merged.csv

  Here is a sample:

    2015-03-19 23:59:59,6854,0,2.167
    2015-03-19 22:59:59,6854,0,2.421
    2015-03-19 21:59:59,6854,0,2.738
    2015-03-19 20:59:59,6854,0,0.9490000000000001
    2015-03-19 19:59:59,6854,0,0.748
    2015-03-19 18:59:59,6854,0,2.76
    2015-03-19 17:59:59,6854,0,1.801
    2015-03-19 16:59:59,6854,0,0.661
    2015-03-19 15:59:59,6854,0,1.082
    2015-03-19 14:59:59,6854,0,1.303

  M.

2015-04-06 22:38 GMT+02:00 James Taylor <[email protected]>:

  How did you input the data?

On Mon, Apr 6, 2015 at 1:27 PM, Marek Wiewiorka <[email protected]> wrote:

  Oh, I'm sorry - I forgot to attach them.
  DDL of my table:

    create table se_dwh.homes_usage_hour (
        hu_ts        time    not null,
        hu_ho_id     integer not null,
        hu_stream_id integer not null,
        hu_usage     double
        constraint pk PRIMARY KEY (hu_ts, hu_ho_id, hu_stream_id));

  Phoenix: 4.3.0

  Thanks,
  Marek

2015-04-06 22:25 GMT+02:00 James Taylor <[email protected]>:

  Hi Marek,
  How did you input the data and what does your CREATE TABLE/VIEW
  statement look like? What version of Phoenix and HBase are you using?
  Thanks,
  James

On Monday, April 6, 2015, Marek Wiewiorka <[email protected]> wrote:

  Hi James - sure, here is the result of your query (limited to 5 rows):

    +--------------------------+-----------+---------------+---------------------+
    | TO_CHAR(HU_TS)           | HU_HO_ID  | HU_STREAM_ID  | HU_USAGE            |
    +--------------------------+-----------+---------------+---------------------+
    | 2015-03-19 00:59:59.000  | 4720      | 0             | 0.287               |
    | 2015-03-19 00:59:59.000  | 6854      | 0             | 3.6189999999999998  |
    | 2015-03-19 01:59:59.000  | 4720      | 0             | 0.323               |
    | 2015-03-19 01:59:59.000  | 6854      | 0             | 2.556               |
    | 2015-03-19 02:59:59.000  | 4720      | 0             | 0.37                |
    +--------------------------+-----------+---------------+---------------------+

  OK - I will try to prepare a sample CSV with DDL/DML.

  Thanks again,
  Marek

2015-04-06 22:06 GMT+02:00 James Taylor <[email protected]>:

  Hi Marek,
  Thanks for the additional information. If you could answer my earlier
  questions, that would be helpful.

  If you can't repro with a simple test case, then how about attaching a
  CSV dump of some of your data (the smallest amount of data that repros
  the issue) to a JIRA, along with the CREATE TABLE statement and the
  query?

  Thanks,
  James

On Mon, Apr 6, 2015 at 12:49 PM, Marek Wiewiorka <[email protected]> wrote:

  Hi guys - thanks for your messages.
  I did another round of testing and found that if I use the to_char
  function instead of trunc, I get the expected result:

    select to_char(dt,'yyyy-MM-dd'),id,stream,sum(usage) from usage
    where id in (4720,6854) group by to_char(dt,'yyyy-MM-dd'),stream,id;

    +--------------+-------+---------+---------------------+
    | TO_CHAR(DT)  | ID    | STREAM  | SUM(USAGE)          |
    +--------------+-------+---------+---------------------+
    | 2015-03-19   | 4720  | 0       | 8.405999999999999   |
    | 2015-03-19   | 6854  | 0       | 28.339000000000006  |
    +--------------+-------+---------+---------------------+

  The same query with trunc returns 48 rows (24 rows per id), seemingly
  one row per source hour rather than one per day:

    select trunc(dt,'DAY') as day,id,stream,sum(usage) from usage
    where id in (4720,6854) group by trunc(dt,'DAY'),stream,id;

  e.g.:

    ....
    | 2015-03-19 00:00:00.000  | 4720  | 0  | 0.406                |
    | 2015-03-19 00:00:00.000  | 6854  | 0  | 0.9490000000000001   |
    | 2015-03-19 00:00:00.000  | 4720  | 0  | 0.332                |
    | 2015-03-19 00:00:00.000  | 6854  | 0  | 2.738                |
    | 2015-03-19 00:00:00.000  | 4720  | 0  | 0.33499999999999996  |
    | 2015-03-19 00:00:00.000  | 6854  | 0  | 2.421                |
    | 2015-03-19 00:00:00.000  | 4720  | 0  | 0.45599999999999996  |
    | 2015-03-19 00:00:00.000  | 6854  | 0  | 2.167                |
    +--------------------------+-------+----+----------------------+

  So for some reason grouping by trunc is not working, but grouping by
  to_char is. This applies to a table loaded from a CSV file using psql.
  When I tried to create a sample table and populate it with upserts,
  everything worked as expected in both cases:

    create table usage (
        dt     time    not null,
        id     integer not null,
        stream integer not null,
        usage  double
        constraint pk PRIMARY KEY (dt, id, stream));

    UPSERT INTO usage VALUES('2015-04-01 10:00:00',100,0,2.0);
    UPSERT INTO usage VALUES('2015-04-01 11:00:00',100,0,3.0);
    UPSERT INTO usage VALUES('2015-04-01 12:00:00',200,0,4.0);
    UPSERT INTO usage VALUES('2015-04-01 13:00:00',200,0,6.0);

    0: jdbc:phoenix:dwh.sayene.com:2181:/hbase-un> select trunc(dt,'DAY') as day,
       id,stream,sum(usage) from usage where id in (100,200)
       group by trunc(dt,'DAY'),stream,id;

    +--------------------------+------+---------+-------------+
    | DAY                      | ID   | STREAM  | SUM(USAGE)  |
    +--------------------------+------+---------+-------------+
    | 2015-04-01 00:00:00.000  | 100  | 0       | 5.0         |
    | 2015-04-01 00:00:00.000  | 200  | 0       | 10.0        |
    +--------------------------+------+---------+-------------+

    0: jdbc:phoenix:dwh.sayene.com:2181:/hbase-un> select to_char(dt,'yyyy-MM-dd'),
       id,stream,sum(usage) from usage where id in (100,200)
       group by to_char(dt,'yyyy-MM-dd'),stream,id;

    +--------------+------+---------+-------------+
    | TO_CHAR(DT)  | ID   | STREAM  | SUM(USAGE)  |
    +--------------+------+---------+-------------+
    | 2015-04-01   | 100  | 0       | 5.0         |
    | 2015-04-01   | 200  | 0       | 10.0        |
    +--------------+------+---------+-------------+
    2 rows selected (1.49 seconds)

  Shall I open a JIRA for that?
  Regards,
  Marek

2015-04-06 20:16 GMT+02:00 James Taylor <[email protected]>:

  Hi Marek,
  How did you input the data and what does your CREATE TABLE/VIEW
  statement look like? What version of Phoenix and HBase are you using?

  Also, would you mind running the following query and letting us know
  the output?

    select to_char(hu_ts,'yyyy-MM-dd HH:mm:ss.SSS'),hu_ho_id,hu_stream_id,hu_usage
    from se_dwh.homes_usage_hour
    where hu_ho_id in (4720,6854);

  Thanks,
  James

On Mon, Apr 6, 2015 at 10:34 AM, Gabriel Reid <[email protected]> wrote:

  That certainly looks like a bug. Would it be possible to make a small
  reproducible test case and, if possible, log this in the Phoenix JIRA
  (https://issues.apache.org/jira/browse/PHOENIX)?

  - Gabriel

On Mon, Apr 6, 2015 at 6:10 PM, Marek Wiewiorka <[email protected]> wrote:

  Hi all,
  I came across a weird situation while running a query with GROUP BY.
  I executed two queries:

  1)

    select trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id,sum(hu_usage)
    from se_dwh.homes_usage_hour where hu_ho_id in (4720,6854)
    group by trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id;

    | 2015-03-19 00:00:00.000  | 4720  | 0  | 0.45599999999999996  |
    | 2015-03-19 00:00:00.000  | 6854  | 0  | 2.167                |

  2)

    select trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id,sum(hu_usage)
    from se_dwh.homes_usage_hour where hu_ho_id in (4720)
    group by trunc(hu_ts,'DAY',1),hu_ho_id,hu_stream_id;

    | 2015-03-19 00:00:00.000  | 4720  | 0  | 8.405999999999999  |

  The only difference is that in the first case I included two ids
  (4720, 6854) and in the second one only one (4720).
  The result for hu_ho_id = 4720 should be the same in both cases, but it
  isn't (0.45 vs. 8.4). The second result (8.4) is the correct one.

  Am I doing something wrong?

  Regards,
  Marek
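To consolidate the thread for the JIRA: a minimal, self-contained repro, stitched together from the DDL, CSV sample, and queries quoted above, might look like the sketch below. It is untested; "sample.csv" is a hypothetical file name, the ZooKeeper quorum string is copied from Marek's psql invocation, and the CSV column order is assumed to match the PRIMARY KEY declaration (dt, id, stream, then usage).

    -- DDL (same shape as Marek's sample table above)
    create table usage (
        dt     time    not null,
        id     integer not null,
        stream integer not null,
        usage  double
        constraint pk PRIMARY KEY (dt, id, stream));

    -- sample.csv (hypothetical name; rows taken from the sample above):
    --   2015-03-19 23:59:59,6854,0,2.167
    --   2015-03-19 22:59:59,6854,0,2.421
    --   2015-03-19 21:59:59,6854,0,2.738

    -- Load via psql, as in the thread:
    --   ./psql.py dwh:2181:/hbase-unsecure -t USAGE sample.csv

    -- Expected: a single row for 2015-03-19; observed on CSV-loaded data,
    -- per this thread: one row per distinct timestamp.
    select trunc(dt,'DAY') as day, id, stream, sum(usage)
    from usage where id = 6854
    group by trunc(dt,'DAY'), id, stream;

    -- Workaround that groups correctly per the thread (though, as James
    -- notes, it will not perform as well as TRUNC on the date):
    select to_char(dt,'yyyy-MM-dd'), id, stream, sum(usage)
    from usage where id = 6854
    group by to_char(dt,'yyyy-MM-dd'), id, stream;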
