slim bouguerra created HIVE-18226:
-------------------------------------

             Summary: handle UDF to double/int over aggregate
                 Key: HIVE-18226
                 URL: https://issues.apache.org/jira/browse/HIVE-18226
             Project: Hive
          Issue Type: Sub-task
          Components: Druid integration
            Reporter: slim bouguerra


In cases like the following query, the Hive planner adds an extra UDFToDouble over integer columns.
This kind of UDF can be pushed to Druid by emitting a doubleSum aggregation instead of a longSum, and vice versa (see the sketch after the plan below).
{code}
PREHOOK: query: EXPLAIN SELECT floor_year(`__time`), SUM(ctinyint)/ count(*)
FROM druid_table GROUP BY floor_year(`__time`)
PREHOOK: type: QUERY
POSTHOOK: query: EXPLAIN SELECT floor_year(`__time`), SUM(ctinyint)/ count(*)
FROM druid_table GROUP BY floor_year(`__time`)
POSTHOOK: type: QUERY
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: druid_table
            properties:
              druid.query.json {"queryType":"timeseries","dataSource":"default.druid_table","descending":false,"granularity":"year","aggregations":[{"type":"longSum","name":"$f1","fieldName":"ctinyint"},{"type":"count","name":"$f2"}],"intervals":["1900-01-01T00:00:00.000/3000-01-01T00:00:00.000"],"context":{"skipEmptyBuckets":true}}
              druid.query.type timeseries
            Statistics: Num rows: 9173 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Select Operator
              expressions: __time (type: timestamp with local time zone), (UDFToDouble($f1) / UDFToDouble($f2)) (type: double)
              outputColumnNames: _col0, _col1
              Statistics: Num rows: 9173 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 9173 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{code}
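
For illustration, a hedged sketch of what the pushed-down Druid query could look like if the cast were folded into the aggregation, assuming the planner maps UDFToDouble(SUM(ctinyint)) to a doubleSum so the Select Operator no longer needs the UDFToDouble wrappers (this is an assumed target, not current planner output):

{code}
{"queryType":"timeseries","dataSource":"default.druid_table","descending":false,"granularity":"year","aggregations":[{"type":"doubleSum","name":"$f1","fieldName":"ctinyint"},{"type":"count","name":"$f2"}],"intervals":["1900-01-01T00:00:00.000/3000-01-01T00:00:00.000"],"context":{"skipEmptyBuckets":true}}
{code}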


