slim bouguerra created HIVE-16026:
-------------------------------------

             Summary: Generated query will timeout and/or kill the druid cluster.
                 Key: HIVE-16026
                 URL: https://issues.apache.org/jira/browse/HIVE-16026
             Project: Hive
          Issue Type: Bug
          Components: Druid integration
            Reporter: slim bouguerra


Grouping by `__time` and another dimension generates a Druid groupBy query with 
granularity NONE and an interval from 1900 to 3000. This will kill the Druid 
cluster, because the Druid groupBy strategy creates a cursor for every 
millisecond and there are a lot of milliseconds between 1900 and 3000. Such a 
query can instead be turned into a select query, with the group by done within 
Hive. This should only happen when we don't know the `__time` granularity.
{code}
explain select `__time`, userid from login_druid group by `__time`, userid
    > ;
OK
Plan optimized by CBO.

Stage-0
  Fetch Operator
    limit:-1
    Select Operator [SEL_1]
      Output:["_col0","_col1"]
      TableScan [TS_0]
        Output:["__time","userid"],properties:{"druid.query.json":"{\"queryType\":\"groupBy\",\"dataSource\":\"druid_user_login\",\"granularity\":\"NONE\",\"dimensions\":[\"userid\"],\"limitSpec\":{\"type\":\"default\"},\"aggregations\":[{\"type\":\"longSum\",\"name\":\"dummy_agg\",\"fieldName\":\"dummy_agg\"}],\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"]}","druid.query.type":"groupBy"}
{code}  
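
To give a rough sense of scale, the sketch below counts how many time buckets the interval in the generated query spans. It assumes, as described above, that granularity NONE means one bucket (and hence one cursor) per millisecond. The helper `bucket_count` and the constants `START` and `END` are hypothetical names used only for this illustration and are not part of Hive or Druid.

```python
from datetime import datetime, timedelta, timezone

# Interval copied from the "intervals" field of the generated Druid query.
START = datetime(1900, 1, 1, tzinfo=timezone.utc)
END = datetime(3000, 1, 1, tzinfo=timezone.utc)


def bucket_count(start: datetime, end: datetime, bucket: timedelta) -> int:
    """Return how many fixed-width buckets are needed to cover [start, end)."""
    span = end - start
    # Ceiling division: timedelta // timedelta yields an int in Python 3.
    return -(-span // bucket)


if __name__ == "__main__":
    # Assumption from the description: granularity NONE behaves as 1 ms buckets.
    per_ms = bucket_count(START, END, timedelta(milliseconds=1))
    # For contrast, the same interval at a known day granularity.
    per_day = bucket_count(START, END, timedelta(days=1))
    print(f"1 ms buckets:  {per_ms:,}")
    print(f"1 day buckets: {per_day:,}")
```

Under that assumption the interval covers on the order of 10^13 one-millisecond buckets, versus a few hundred thousand at day granularity, which is why knowing the `__time` granularity matters before generating a groupBy.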



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
