Looks like a bug to me. Please open a ticket. A simple repro would be very useful.
https://issues.apache.org/jira/browse/DRILL - Sudheesh On Apr 3, 2017, at 2:11 PM, Joel Wilsson <[email protected]<mailto:[email protected]>> wrote: Hi, I'm seeing some strange results when trying to group by date_trunc('QUARTER', <timestamp column>). I can work around it by doing more or less the same thing as in DateTruncFunctions. Am I missing something, or is this a bug? 0: jdbc:drill:> SELECT date_trunc('QUARTER', `taxi_trips`.`dropoff_datetime`), COUNT(*) FROM `hive.default`.`taxi_trips` GROUP BY date_trunc('QUARTER', `taxi_trips`.`dropoff_datetime`) ORDER BY date_trunc('QUARTER', `taxi_trips`.`dropoff_datetime`); +------------------------+------------+ | EXPR$0 | EXPR$1 | +------------------------+------------+ | 2012-01-01 00:00:00.0 | 21817 | | 2013-01-01 00:00:00.0 | 173157926 | | 2013-04-01 00:00:00.0 | 3 | | 2013-07-01 00:00:00.0 | 2 | | 2013-10-01 00:00:00.0 | 3 | | 2014-01-01 00:00:00.0 | 8 | | 2020-01-01 00:00:00.0 | 4 | +------------------------+------------+ 7 rows selected (12.734 seconds) The data is spread out over all months of 2013: 0: jdbc:drill:> SELECT date_trunc('MONTH', `taxi_trips`.`dropoff_datetime`), COUNT(*) FROM `hive.default`.`taxi_trips` GROUP BY date_trunc('MONTH', `taxi_trips`.`dropoff_datetime`) ORDER BY date_trunc('MONTH', `taxi_trips`.`dropoff_datetime`); +------------------------+-----------+ | EXPR$0 | EXPR$1 | +------------------------+-----------+ | 2012-12-01 00:00:00.0 | 21817 | | 2013-01-01 00:00:00.0 | 14772657 | | 2013-02-01 00:00:00.0 | 13990803 | | 2013-03-01 00:00:00.0 | 15744402 | | 2013-04-01 00:00:00.0 | 15108210 | | 2013-05-01 00:00:00.0 | 15313848 | | 2013-06-01 00:00:00.0 | 14355098 | | 2013-07-01 00:00:00.0 | 13830436 | | 2013-08-01 00:00:00.0 | 12613596 | | 2013-09-01 00:00:00.0 | 14080300 | | 2013-10-01 00:00:00.0 | 15009363 | | 2013-11-01 00:00:00.0 | 14388420 | | 2013-12-01 00:00:00.0 | 13950801 | | 2014-01-01 00:00:00.0 | 8 | | 2020-05-01 00:00:00.0 | 4 | +------------------------+-----------+ 15 rows selected (12.25 seconds) This workaround gives the correct results: 0: jdbc:drill:> SELECT date_trunc('YEAR', `taxi_trips`.`dropoff_datetime`) + ((extract(month from `taxi_trips`.`dropoff_datetime`)-1)/3) * interval '3' MONTH, COUNT(*) FROM `hive.default`.`taxi_trips` GROUP BY date_trunc('YEAR', `taxi_trips`.`dropoff_datetime`) + ((extract(month from `taxi_trips`.`dropoff_datetime`)-1)/3) * interval '3' MONTH ORDER BY date_trunc('YEAR', `taxi_trips`.`dropoff_datetime`) + ((extract(month from `taxi_trips`.`dropoff_datetime`)-1)/3) * interval '3' MONTH; +------------------------+-----------+ | EXPR$0 | EXPR$1 | +------------------------+-----------+ | 2012-10-01 00:00:00.0 | 21817 | | 2013-01-01 00:00:00.0 | 44507862 | | 2013-04-01 00:00:00.0 | 44777156 | | 2013-07-01 00:00:00.0 | 40524332 | | 2013-10-01 00:00:00.0 | 43348584 | | 2014-01-01 00:00:00.0 | 8 | | 2020-04-01 00:00:00.0 | 4 | +------------------------+-----------+ 7 rows selected (13.261 seconds) The data is read from an external Parquet table: 0: jdbc:drill:> describe `hive.default`.`taxi_trips`; +---------------------+--------------------+--------------+ | COLUMN_NAME | DATA_TYPE | IS_NULLABLE | +---------------------+--------------------+--------------+ | dropoff_datetime | TIMESTAMP | YES | | dropoff_latitude | DOUBLE | YES | | dropoff_longitude | DOUBLE | YES | | hack_license | CHARACTER VARYING | YES | | medallion | CHARACTER VARYING | YES | | passenger_count | BIGINT | YES | | pickup_datetime | TIMESTAMP | YES | | pickup_latitude | DOUBLE | YES | | pickup_longitude | DOUBLE | YES | | rate_code | BIGINT | YES | | store_and_fwd_flag | CHARACTER VARYING | YES | | trip_distance | DOUBLE | YES | | trip_time_in_secs | BIGINT | YES | | vendor_id | CHARACTER VARYING | YES | +---------------------+--------------------+--------------+ 14 rows selected (0.184 seconds) 0: jdbc:drill:> Best regards, Joel
