Alok, Please file a JIRA with this info. We need a representative data set that exhibits this bug - would it be possible to provide that? The smaller the better. Thanks, James
On Monday, March 21, 2016, Alok Singh <[email protected]> wrote: > Environment: > * Phoenix 4.6 > * Hbase 1.1.2 > * Hadoop 2.7.1 > * JDK 1.7 > > Table: > COST ( > ACCOUNT_ID VARCHAR NOT NULL, > PRECISION TINYINT NOT NULL, > START_DATE TIMESTAMP NOT NULL, > SECONDARY_ACCOUNT_ID VARCHAR NOT NULL, > TAG VARCHAR NOT NULL, > VENDOR_ID VARCHAR NOT NULL, > SERVICE VARCHAR NOT NULL, > .... > > CONSTRAINT PK PRIMARY KEY ( > ACCOUNT_ID, > PRECISION, > START_DATE, > SECONDARY_ACCOUNT_ID, > TAG, > VENDOR_ID, > SERVICE > ) COMPRESSION='SNAPPY', SALT_BUCKETS=16; > > When querying this table with a GROUP BY clause that contains the > 'START_DATE" column, the results returned are incorrectly aggregated. I > find multiple rows with the same "START_DATE" which should have been > aggregated to a single row. The workaround is to do a TO_CHAR(START_DATE) > which causes the query to return the correct results. > > e.g: > ---------------- > select start_date, account_id, > sum(quantity) as total, > sum(discounted_quantity) as disc_total > from COST > where > start_date >= (to_date('2016-03-01')) > and precision = 1 > and account_id in ('1234', '5678') > group start_date, account_id > > --RESULT-- > START_DATE ACCOUNT_ID > TOTAL DISC_TOTAL > > > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ > 2016-03-01 16:00:00.0 1234 > 312 0 > > .... > 2016-03-07 16:00:00.0 1234 > 312 0 > > 2016-03-08 16:00:00.0 1234 > 312 0 > > 2016-03-09 16:00:00.0 1234 > 216 0 > > 2016-02-29 16:00:00.0 5678 > 194 24 > > 2016-03-01 16:00:00.0 5678 > 262 0 > > .... > 2016-03-07 16:00:00.0 5678 > 237 48 > > 2016-03-08 16:00:00.0 5678 > 178 0 > > 2016-03-09 16:00:00.0 1234 > 96 0 > > 2016-03-09 16:00:00.0 5678 > 173 50 > > 2016-03-10 16:00:00.0 1234 > 324 0 > > .... > 2016-03-07 16:00:00.0 5678 > 178 24 > > 2016-03-08 16:00:00.0 5678 > 218 0 > > 2016-03-09 16:00:00.0 5678 > 218 48 > > ^^^^^^^^^^^^^^^^^^^^^^ > Repeated rows > > -------------------- > Workaround: > select to_char(start_date), account_id, > sum(quantity) as total, > sum(discounted_quantity) as disc_total > from COST > where > start_date >= (to_date('2016-03-01')) > and precision = 1 > and account_id in ('1234', '5678') > group to_char(start_date), account_id > > --RESULT-- > START_DATE ACCOUNT_ID > TOTAL DISC_TOTAL > > > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ > 2016-03-01 00:00:00.000 1234 > 312 0 > > 2016-03-01 00:00:00.000 5678 > 3,465 384 > > 2016-03-02 00:00:00.000 1234 > 312 0 > > 2016-03-02 00:00:00.000 5678 > 3,238 384 > > 2016-03-03 00:00:00.000 1234 > 312 0 > > 2016-03-03 00:00:00.000 5678 > 3,243 384 > > 2016-03-04 00:00:00.000 1234 > 312 0 > > 2016-03-04 00:00:00.000 5678 > 3,212 384 > > 2016-03-05 00:00:00.000 1234 > 312 0 > > 2016-03-05 00:00:00.000 5678 > 2,907 384 > > 2016-03-06 00:00:00.000 1234 > 312 0 > > 2016-03-06 00:00:00.000 5678 > 2,907 384 > > 2016-03-07 00:00:00.000 1234 > 312 0 > > 2016-03-07 00:00:00.000 5678 > 2,919 384 > > 2016-03-08 00:00:00.000 1234 > 312 0 > > 2016-03-08 00:00:00.000 5678 > 2,955 384 > > 2016-03-09 00:00:00.000 1234 > 312 0 > > 2016-03-09 00:00:00.000 5678 > 2,955 384 > > 2016-03-10 00:00:00.000 1234 > 312 0 > > 2016-03-10 00:00:00.000 5678 > 2,978 384 > > 2016-03-11 00:00:00.000 1234 > 324 0 > > 2016-03-11 00:00:00.000 5678 > 3,125 384 > > 2016-03-12 00:00:00.000 1234 > 336 0 > > 2016-03-12 00:00:00.000 5678 > 3,102 384 > > 2016-03-13 00:00:00.000 1234 > 336 0 > > 2016-03-13 00:00:00.000 5678 > 3,117 384 > > 2016-03-14 00:00:00.000 1234 > 153 0 > > 2016-03-14 00:00:00.000 5678 > 2,187 278 > > > ----------------- > > > Alok > > [email protected] > <javascript:_e(%7B%7D,'cvml','[email protected]');> >
