Zhihua Deng created HIVE-23893:
----------------------------------

             Summary: Extract deterministic conditions for pdd when the 
predicate contains non-deterministic function
                 Key: HIVE-23893
                 URL: https://issues.apache.org/jira/browse/HIVE-23893
             Project: Hive
          Issue Type: Improvement
          Components: Logical Optimizer
            Reporter: Zhihua Deng


Taken the following query for example, assume unix_timestamp is 
non-deterministic before version 1.3.0:
 
{{SELECT}}
{{        from_unixtime(unix_timestamp(a.first_dt), 'yyyyMMdd') AS ft,}}
{{        b.game_id AS game_id,}}
{{        b.game_name AS game_name,}}
{{        count(DISTINCT a.sha1_imei) uv}}
{{FROM}}
{{        gamesdk_userprofile a}}
{{        JOIN game_info_all b ON a.appid = b.dev_app_id}}
{{WHERE}}
{{        a.date = 20200704}}
{{        AND from_unixtime(unix_timestamp(a.first_dt), 'yyyyMMdd') = 20200704}}
{{        AND b.date = 20200704}}
{{GROUP BY}}
{{        from_unixtime(unix_timestamp(a.first_dt), 'yyyyMMdd'),}}
{{        b.game_id,}}
{{        b.game_name}}
{{ORDER BY}}
{{        uv DESC}}
{{LIMIT 200;}}
 
The predicates(a.date = 20200704, b.date = 20200704) are unable to push down to 
join op, make the optimizer unable to prune partitions, which may result  to a 
full scan on tables gamesdk_userprofile and game_info_all.
{{}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to