gumengchao created HIVE-23304:
---------------------------------

             Summary: Two SQL execution results with the same semantics are not 
the same
                 Key: HIVE-23304
                 URL: https://issues.apache.org/jira/browse/HIVE-23304
             Project: Hive
          Issue Type: Bug
          Components: Hive, SQL
    Affects Versions: 2.3.7
         Environment: hadoop version 2.7.7

hive version 2.3.7

hive.execution.engine=mr

 
            Reporter: gumengchao


execute sql

 
{noformat}
SELECT AVG(a.rate), a.movieid, MAX(b.userid)
FROM t_rating a
        INNER JOIN (
                SELECT a.movieid AS movieid, a.userid AS userid, a.rate
                FROM t_rating a
                        JOIN (
                                SELECT COUNT(*) AS rate_times, a.userid AS 
userid
                                FROM t_rating a
                                        JOIN (
                                                SELECT *
                                                FROM t_user
                                                WHERE sex = 'F'
                                        ) b
                                        ON a.userid = b.userid
                                GROUP BY a.userid
                                ORDER BY rate_times DESC
                                LIMIT 0, 1
                        ) b
                        ON a.userid = b.userid
                ORDER BY a.rate DESC
                LIMIT 0, 10
        ) b
        ON a.movieid = b.movieid
GROUP BY a.movieid{noformat}
The result is below.But this is not the result I want 
{noformat}
4.063136456211812       162     1150
4.476190476190476       904     1150
4.249370277078086       951     1150
4.14167916041979        1230    1150
3.6464646464646466      1966    1150
4.163043478260869       2330    1150
3.7039473684210527      3163    1150
4.387453874538745       3307    1150
4.047363717605005       3671    1150
3.8265682656826567      3675    1150{noformat}
So I divided sql into two steps

fisrt:

 
{noformat}
create temporary table tmp as
SELECT a.movieid AS movieid, a.userid AS userid, a.rate
                FROM t_rating a
                        JOIN (
                                SELECT COUNT(*) AS rate_times, a.userid AS 
userid
                                FROM t_rating a
                                        JOIN (
                                                SELECT *
                                                FROM t_user
                                                WHERE sex = 'F'
                                        ) b
                                        ON a.userid = b.userid
                                GROUP BY a.userid
                                ORDER BY rate_times DESC
                                LIMIT 0, 1
                        ) b
                        ON a.userid = b.userid
                ORDER BY a.rate DESC
                LIMIT 0, 10{noformat}
second:

 
{noformat}
SELECT AVG(a.rate), a.movieid, MAX(b.userid)
FROM t_rating a INNER JOIN tmp b
        ON a.movieid = b.movieid
GROUP BY a.movieid{noformat}
the result 
{noformat}
4.52054794520548        745     1150
4.4498902706656915      750     1150
4.476190476190476       904     1150
4.280748663101604       905     1150
3.7314890154597236      1094    1150
4.188888888888889       1236    1150
4.21043771043771        1256    1150
3.747422680412371       1279    1150
4.0739348370927315      2064    1150
4.125390450691656       2997    1150{noformat}
Why the results of two executions are different?

 

 

 

 

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to