[jira] [Commented] (HIVE-4333) most windowing tests fail on hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638591#comment-13638591 ]

Hudson commented on HIVE-4333:

Integrated in Hive-trunk-h0.21 #2074 (See [https://builds.apache.org/job/Hive-trunk-h0.21/2074/])
HIVE-4333 : most windowing tests fail on hadoop 2 (Harish Butani via Ashutosh Chauhan) (Revision 1470317)

Result = FAILURE
hashutosh : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1470317
Files : 
* /hive/trunk/ql/src/test/queries/clientpositive/leadlag.q
* /hive/trunk/ql/src/test/queries/clientpositive/ptf.q
* /hive/trunk/ql/src/test/queries/clientpositive/ptf_general_queries.q
* /hive/trunk/ql/src/test/queries/clientpositive/windowing.q
* /hive/trunk/ql/src/test/queries/clientpositive/windowing_expressions.q
* /hive/trunk/ql/src/test/queries/clientpositive/windowing_multipartitioning.q
* /hive/trunk/ql/src/test/queries/clientpositive/windowing_navfn.q
* /hive/trunk/ql/src/test/queries/clientpositive/windowing_ntile.q
* /hive/trunk/ql/src/test/queries/clientpositive/windowing_rank.q
* /hive/trunk/ql/src/test/queries/clientpositive/windowing_udaf.q
* /hive/trunk/ql/src/test/queries/clientpositive/windowing_windowspec.q
* /hive/trunk/ql/src/test/results/clientpositive/leadlag.q.out
* /hive/trunk/ql/src/test/results/clientpositive/ptf.q.out
* /hive/trunk/ql/src/test/results/clientpositive/ptf_general_queries.q.out
* /hive/trunk/ql/src/test/results/clientpositive/windowing.q.out
* /hive/trunk/ql/src/test/results/clientpositive/windowing_expressions.q.out
* /hive/trunk/ql/src/test/results/clientpositive/windowing_multipartitioning.q.out
* /hive/trunk/ql/src/test/results/clientpositive/windowing_navfn.q.out
* /hive/trunk/ql/src/test/results/clientpositive/windowing_ntile.q.out
* /hive/trunk/ql/src/test/results/clientpositive/windowing_rank.q.out
* /hive/trunk/ql/src/test/results/clientpositive/windowing_udaf.q.out
* /hive/trunk/ql/src/test/results/clientpositive/windowing_windowspec.q.out

most windowing tests fail on hadoop 2

Key: HIVE-4333
URL: https://issues.apache.org/jira/browse/HIVE-4333
Project: Hive
Issue Type: Bug
Components: PTF-Windowing
Affects Versions: 0.11.0
Reporter: Gunther Hagleitner
Assignee: Harish Butani
Fix For: 0.12.0
Attachments: HIVE-4333.1.patch.txt, HIVE-4333.D10389.1.patch, HIVE-4333.D10389.2.patch

Problem is different order of results on hadoop 2

-- 
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4333) most windowing tests fail on hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637305#comment-13637305 ]

Phabricator commented on HIVE-4333:

ashutoshc has requested changes to the revision "HIVE-4333 [jira] most windowing tests fail on hadoop 2".

A few comments. Also, you need to apply HIVE-4369 to test on hadoop 2.

INLINE COMMENTS
ql/src/test/queries/clientpositive/ptf.q:200 Is this one of those queries that differ in precision after 2 decimal places? Shall we use round(sum(p_size), 1) for these?
ql/src/test/queries/clientpositive/leadlag.q:76 It would be better to rewrite this query as
select p_name, p_retailprice,
lead(p_retailprice) over() as l1,
lag(p_retailprice) over() as l2
from part
where p_retailprice = 1173.15;
We want to test over() here, so adding order by effectively defeats the purpose. I tested this query and it passes with 23 as well.
ql/src/test/queries/clientpositive/windowing.q:278 Shall we use round(sum(p_size), 1) here?

REVISION DETAIL
https://reviews.facebook.net/D10389

BRANCH
HIVE-4333

ARCANIST PROJECT
hive

To: JIRA, ashutoshc, hbutani

most windowing tests fail on hadoop 2

Key: HIVE-4333
URL: https://issues.apache.org/jira/browse/HIVE-4333
Project: Hive
Issue Type: Bug
Components: PTF-Windowing
Affects Versions: 0.11.0
Reporter: Gunther Hagleitner
Assignee: Harish Butani
Attachments: HIVE-4333.1.patch.txt, HIVE-4333.D10389.1.patch

Problem is different order of results on hadoop 2
[jira] [Commented] (HIVE-4333) most windowing tests fail on hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637474#comment-13637474 ]

Ashutosh Chauhan commented on HIVE-4333:

+1, will commit.

most windowing tests fail on hadoop 2

Key: HIVE-4333
URL: https://issues.apache.org/jira/browse/HIVE-4333
Project: Hive
Issue Type: Bug
Components: PTF-Windowing
Affects Versions: 0.11.0
Reporter: Gunther Hagleitner
Assignee: Harish Butani
Attachments: HIVE-4333.1.patch.txt, HIVE-4333.D10389.1.patch, HIVE-4333.D10389.2.patch

Problem is different order of results on hadoop 2
[jira] [Commented] (HIVE-4333) most windowing tests fail on hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13636446#comment-13636446 ]

Ashutosh Chauhan commented on HIVE-4333:

[~rhbutani] Can you create a Phabricator entry for this? Since it's a huge patch, it's hard to read the diff file.

most windowing tests fail on hadoop 2

Key: HIVE-4333
URL: https://issues.apache.org/jira/browse/HIVE-4333
Project: Hive
Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Matthew Weaver
Attachments: HIVE-4333.1.patch.txt

Problem is different order of results on hadoop 2
[jira] [Commented] (HIVE-4333) most windowing tests fail on hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13636456#comment-13636456 ]

Ashutosh Chauhan commented on HIVE-4333:

bq. There are diffs because of precision. Some of the avg and sum functions are now wrapped in 'round'

I didn't get this part. All this computation happens within Hive, so it shouldn't be affected by the Hadoop version. Wrapped in 'round'? In Hive or Hadoop?

bq. Looks like the shuffle in 2.0 reorders the rows even in this case.

Yeah, that's possible. Since in over() the partitioning is by a constant, all rows have the same value for the partitioning column, so they can arrive in any order. We need to come up with a clever way of writing the test that still tests over() but gives an ordered result on both hadoop 1 and hadoop 2.

most windowing tests fail on hadoop 2

Key: HIVE-4333
URL: https://issues.apache.org/jira/browse/HIVE-4333
Project: Hive
Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Matthew Weaver
Attachments: HIVE-4333.1.patch.txt

Problem is different order of results on hadoop 2
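The arrival-order effect described above can be reproduced with a small Python simulation. It assumes that an empty over() places all rows in a single partition and applies no sort, so lead and lag simply read neighbouring rows in the order the reducer receives them. The rows are the six Manufacturer#1 rows listed in Harish Butani's comment on this issue; the function name lead_lag_over_empty is hypothetical and this is not Hive's implementation.

```python
from typing import List, Optional, Sequence, Tuple

Row = Tuple[str, float]  # (p_name, p_retailprice)
Result = Tuple[str, float, Optional[float], Optional[float]]  # (p_name, p_retailprice, l1, l2)

# Manufacturer#1 rows in the order hadoop 1 delivered them to the reducer,
# i.e. the order produced by the sorted subquery.
HADOOP1_ARRIVAL: List[Row] = [
    ("almond antique burnished rose metallic", 1173.15),
    ("almond antique burnished rose metallic", 1173.15),
    ("almond antique chartreuse lavender yellow", 1753.76),
    ("almond antique salmon chartreuse burlywood", 1602.59),
    ("almond aquamarine burnished black steel", 1414.42),
    ("almond aquamarine pink moccasin thistle", 1632.66),
]

# The hadoop 2 output in the thread shows the same rows in reverse order.
HADOOP2_ARRIVAL: List[Row] = list(reversed(HADOOP1_ARRIVAL))


def lead_lag_over_empty(rows: Sequence[Row]) -> List[Result]:
    """Simulate lead(p_retailprice) over() and lag(p_retailprice) over().

    Assumption: one partition, no sort, so lead/lag read neighbours in arrival order.
    """
    prices = [price for _, price in rows]
    results: List[Result] = []
    for i, (name, price) in enumerate(rows):
        lead = prices[i + 1] if i + 1 < len(prices) else None  # NULL after the last row
        lag = prices[i - 1] if i > 0 else None                 # NULL before the first row
        results.append((name, price, lead, lag))
    return results


def show(value: Optional[float]) -> str:
    """Render None the way Hive prints NULL."""
    return "NULL" if value is None else str(value)


if __name__ == "__main__":
    for label, rows in (("hadoop 1", HADOOP1_ARRIVAL), ("hadoop 2", HADOOP2_ARRIVAL)):
        print(label)
        for name, price, l1, l2 in lead_lag_over_empty(rows):
            print(f"  {name}\t{price}\t{show(l1)}\t{show(l2)}")
```

Feeding the same rows in the two arrival orders yields the two different result sets quoted in the thread, even though the input data is identical.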
[jira] [Commented] (HIVE-4333) most windowing tests fail on hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13636548#comment-13636548 ]

Harish Butani commented on HIVE-4333:

I think the precision diffs come from the same ordering issue. Since the rows in the partitions are not in the same order, the overall sum/avg differs beyond 2 decimal places.

most windowing tests fail on hadoop 2

Key: HIVE-4333
URL: https://issues.apache.org/jira/browse/HIVE-4333
Project: Hive
Issue Type: Bug
Components: PTF-Windowing
Affects Versions: 0.11.0
Reporter: Gunther Hagleitner
Assignee: Matthew Weaver
Attachments: HIVE-4333.1.patch.txt, HIVE-4333.D10389.1.patch

Problem is different order of results on hadoop 2
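Why row order can change a sum or avg at all is easiest to see with a minimal Python sketch: double-precision addition is not associative, so accumulating the same values in a different order can change the low-order digits, and rounding (as with the round(..., 1) wrapping discussed in the review) hides the difference. This assumes the aggregate adds doubles one row at a time in arrival order; the values 0.1, 0.2, 0.3 are illustrative and not taken from the test data.

```python
from typing import Iterable


def accumulate(values: Iterable[float]) -> float:
    """Add doubles one at a time, in the order the rows arrive."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same three values, arriving in two different orders.
sum_a = accumulate([0.1, 0.2, 0.3])
sum_b = accumulate([0.3, 0.2, 0.1])

print(sum_a == sum_b)                      # False: the low-order digits differ
print(round(sum_a, 2) == round(sum_b, 2))  # True: rounding hides the difference
```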
[jira] [Commented] (HIVE-4333) most windowing tests fail on hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635362#comment-13635362 ]

Harish Butani commented on HIVE-4333:

Attached a patch. The changes fall into these categories:
- Some queries had 'partition by p_mfgr order by p_mfgr' or just 'partition by p_mfgr'. In these cases rows within a partition do not arrive in the same order as in hadoop 1. Changed to 'partition by p_mfgr order by p_name'.
- Manufacturer 1 has 2 rows with exactly the same data, so if we use a 'row based window' there are diffs between hadoop 1 and hadoop 2. Changed to using a 'range based window'.
- There are diffs because of precision. Some of the avg and sum functions are now wrapped in 'round'.
- Finally, tests that use the empty over() with functions that rely on order had to be changed, e.g. leadlag.q Query 8.

I tried the following change:
{noformat}
select p_name, p_retailprice,
lead(p_retailprice) over() as l1 ,
lag(p_retailprice) over() as l2
from (select p_name, p_retailprice from part where p_mfgr = 'Manufacturer#1' order by p_name, p_retailprice ) p;
{noformat}

The output in hadoop 1 is:
{noformat}
almond antique burnished rose metallic	1173.15	1173.15	NULL
almond antique burnished rose metallic	1173.15	1753.76	1173.15
almond antique chartreuse lavender yellow	1753.76	1602.59	1173.15
almond antique salmon chartreuse burlywood	1602.59	1414.42	1753.76
almond aquamarine burnished black steel	1414.42	1632.66	1602.59
almond aquamarine pink moccasin thistle	1632.66	NULL	1414.42
{noformat}

The input to the lead and lag query is ordered on p_name and p_retailprice and is very small, just 6 rows (so only 1 mapper is involved). In 1.0 the rows reach the reducer in the same order as the input.

In hadoop 2.0 the result is:
{noformat}
almond aquamarine pink moccasin thistle	1632.66	1414.42	NULL
almond aquamarine burnished black steel	1414.42	1602.59	1632.66
almond antique salmon chartreuse burlywood	1602.59	1753.76	1414.42
almond antique chartreuse lavender yellow	1753.76	1173.15	1602.59
almond antique burnished rose metallic	1173.15	1173.15	1753.76
almond antique burnished rose metallic	1173.15	NULL	1173.15
{noformat}

Looks like the shuffle in 2.0 reorders the rows even in this case.

most windowing tests fail on hadoop 2

Key: HIVE-4333
URL: https://issues.apache.org/jira/browse/HIVE-4333
Project: Hive
Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Matthew Weaver
Attachments: HIVE-4333.1.patch.txt

Problem is different order of results on hadoop 2
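The switch from a row based to a range based window can be illustrated with a Python sketch of a running sum. It follows standard SQL frame semantics, where a RANGE frame ending at CURRENT ROW extends to the last peer (row with an equal ORDER BY key), while a ROWS frame ends at the physical row; whether Hive 0.11's range frames behave exactly this way is an assumption. The (order_key, value) rows and all function names are hypothetical: the rows tie on the key but differ in value so the effect is visible, unlike the two fully identical Manufacturer#1 rows in the thread.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[str, int]  # hypothetical (order_key, value) pairs


def order_by_key(rows: Sequence[Row]) -> List[Row]:
    """Stable sort on the ORDER BY key: tied rows keep their arrival order."""
    return sorted(rows, key=lambda r: r[0])


def rows_frame_sum(rows: Sequence[Row]) -> List[Tuple[Row, int]]:
    """sum(value) over (order by key rows between unbounded preceding and current row)."""
    out: List[Tuple[Row, int]] = []
    total = 0
    for row in order_by_key(rows):
        total += row[1]  # each physical row extends the frame by one
        out.append((row, total))
    return out


def range_frame_sum(rows: Sequence[Row]) -> List[Tuple[Row, int]]:
    """sum(value) over (order by key range between unbounded preceding and current row).

    The frame ends at the last peer of the current row, so all rows that share
    an ORDER BY key get the same total.
    """
    ordered = order_by_key(rows)
    totals: Dict[str, int] = {}
    running = 0
    for key, value in ordered:
        running += value
        totals[key] = running  # overwritten until the last peer of this key
    return [(row, totals[row[0]]) for row in ordered]


# The same four rows arriving in two different orders; the "b" rows tie.
arrival_1: List[Row] = [("a", 1), ("b", 10), ("b", 20), ("c", 5)]
arrival_2: List[Row] = [("c", 5), ("b", 20), ("a", 1), ("b", 10)]

if __name__ == "__main__":
    print(sorted(rows_frame_sum(arrival_1)) == sorted(rows_frame_sum(arrival_2)))    # False
    print(sorted(range_frame_sum(arrival_1)) == sorted(range_frame_sum(arrival_2)))  # True
```

With the row based frame, the tied "b" rows get different running totals depending on which one arrives first; with the range based frame both peers get the same total, so the result no longer depends on arrival order.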
[jira] [Commented] (HIVE-4333) most windowing tests fail on hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13634570#comment-13634570 ]

Matthew Weaver commented on HIVE-4333:

The OVER clauses don't fully specify the ordering, causing different ordering of results and different values for FIRST and LAST. The fix is just to add enough fields to guarantee an unambiguous ordering in the window. This will fix many of the queries, maybe all.

most windowing tests fail on hadoop 2

Key: HIVE-4333
URL: https://issues.apache.org/jira/browse/HIVE-4333
Project: Hive
Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Matthew Weaver

Problem is different order of results on hadoop 2
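The effect on FIRST and LAST, and why adding ordering fields fixes it, can be sketched in Python. The sketch assumes the sort inside a partition does not break ties (modelled with Python's stable sorted(), which keeps tied rows in arrival order) and treats the frame as the whole partition. The rows are the Manufacturer#1 rows from this thread, with the hadoop 2 arrival order taken as the reverse of hadoop 1; the function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple[str, str, float]  # (p_mfgr, p_name, p_retailprice)
SortKey = Callable[[Row], Tuple[str, ...]]

MFGR1_HADOOP1: List[Row] = [
    ("Manufacturer#1", "almond antique burnished rose metallic", 1173.15),
    ("Manufacturer#1", "almond antique burnished rose metallic", 1173.15),
    ("Manufacturer#1", "almond antique chartreuse lavender yellow", 1753.76),
    ("Manufacturer#1", "almond antique salmon chartreuse burlywood", 1602.59),
    ("Manufacturer#1", "almond aquamarine burnished black steel", 1414.42),
    ("Manufacturer#1", "almond aquamarine pink moccasin thistle", 1632.66),
]
MFGR1_HADOOP2: List[Row] = list(reversed(MFGR1_HADOOP1))


def first_last_name(rows: Sequence[Row], sort_key: SortKey) -> Tuple[str, str]:
    """first_value(p_name) and last_value(p_name) over the whole partition.

    sorted() is stable, so rows that tie on sort_key stay in arrival order.
    """
    ordered = sorted(rows, key=sort_key)
    return ordered[0][1], ordered[-1][1]


def by_mfgr(r: Row) -> Tuple[str, ...]:
    """Like 'partition by p_mfgr order by p_mfgr': every row in the partition ties."""
    return (r[0],)


def by_mfgr_name(r: Row) -> Tuple[str, ...]:
    """Like 'partition by p_mfgr order by p_name': only identical rows tie."""
    return (r[0], r[1])


if __name__ == "__main__":
    for key in (by_mfgr, by_mfgr_name):
        print(key.__name__,
              first_last_name(MFGR1_HADOOP1, key),
              first_last_name(MFGR1_HADOOP2, key))
```

Ordering only by p_mfgr lets FIRST and LAST swap with the arrival order, while ordering by p_name makes them agree. The remaining tie is between the two fully identical rows, which cannot change the output.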