[jira] [Commented] (HIVE-4333) most windowing tests fail on hadoop 2

2013-04-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638591#comment-13638591
 ] 

Hudson commented on HIVE-4333:
--

Integrated in Hive-trunk-h0.21 #2074 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2074/])
HIVE-4333 : most windowing tests fail on hadoop 2 (Harish Butani via 
Ashutosh Chauhan) (Revision 1470317)

 Result = FAILURE
hashutosh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1470317
Files : 
* /hive/trunk/ql/src/test/queries/clientpositive/leadlag.q
* /hive/trunk/ql/src/test/queries/clientpositive/ptf.q
* /hive/trunk/ql/src/test/queries/clientpositive/ptf_general_queries.q
* /hive/trunk/ql/src/test/queries/clientpositive/windowing.q
* /hive/trunk/ql/src/test/queries/clientpositive/windowing_expressions.q
* /hive/trunk/ql/src/test/queries/clientpositive/windowing_multipartitioning.q
* /hive/trunk/ql/src/test/queries/clientpositive/windowing_navfn.q
* /hive/trunk/ql/src/test/queries/clientpositive/windowing_ntile.q
* /hive/trunk/ql/src/test/queries/clientpositive/windowing_rank.q
* /hive/trunk/ql/src/test/queries/clientpositive/windowing_udaf.q
* /hive/trunk/ql/src/test/queries/clientpositive/windowing_windowspec.q
* /hive/trunk/ql/src/test/results/clientpositive/leadlag.q.out
* /hive/trunk/ql/src/test/results/clientpositive/ptf.q.out
* /hive/trunk/ql/src/test/results/clientpositive/ptf_general_queries.q.out
* /hive/trunk/ql/src/test/results/clientpositive/windowing.q.out
* /hive/trunk/ql/src/test/results/clientpositive/windowing_expressions.q.out
* /hive/trunk/ql/src/test/results/clientpositive/windowing_multipartitioning.q.out
* /hive/trunk/ql/src/test/results/clientpositive/windowing_navfn.q.out
* /hive/trunk/ql/src/test/results/clientpositive/windowing_ntile.q.out
* /hive/trunk/ql/src/test/results/clientpositive/windowing_rank.q.out
* /hive/trunk/ql/src/test/results/clientpositive/windowing_udaf.q.out
* /hive/trunk/ql/src/test/results/clientpositive/windowing_windowspec.q.out


 most windowing tests fail on hadoop 2
 -

 Key: HIVE-4333
 URL: https://issues.apache.org/jira/browse/HIVE-4333
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Affects Versions: 0.11.0
Reporter: Gunther Hagleitner
Assignee: Harish Butani
 Fix For: 0.12.0

 Attachments: HIVE-4333.1.patch.txt, HIVE-4333.D10389.1.patch, 
 HIVE-4333.D10389.2.patch


 Problem is different order of results on hadoop 2

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4333) most windowing tests fail on hadoop 2

2013-04-20 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637305#comment-13637305
 ] 

Phabricator commented on HIVE-4333:
---

ashutoshc has requested changes to the revision HIVE-4333 [jira] most 
windowing tests fail on hadoop 2.

  A few comments. Also, you need to apply HIVE-4369 to test on hadoop 2.

INLINE COMMENTS
  ql/src/test/queries/clientpositive/ptf.q:200 Is this one of those that 
differ in precision after 2 decimal places? Shall we use 
round(sum(p_size), 1) for these?
  ql/src/test/queries/clientpositive/leadlag.q:76 It would be better to rewrite 
this query as:
  select p_name, p_retailprice,
  lead(p_retailprice) over() as l1 ,
  lag(p_retailprice)  over() as l2
  from part
  where p_retailprice = 1173.15;

  We want to test over() here, so adding an order by effectively defeats the 
purpose.
  I tested this query and it passes with 23 as well.
  ql/src/test/queries/clientpositive/windowing.q:278 Shall we use 
round(sum(p_size), 1) here?

REVISION DETAIL
  https://reviews.facebook.net/D10389

BRANCH
  HIVE-4333

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, hbutani


 most windowing tests fail on hadoop 2
 -

 Key: HIVE-4333
 URL: https://issues.apache.org/jira/browse/HIVE-4333
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Affects Versions: 0.11.0
Reporter: Gunther Hagleitner
Assignee: Harish Butani
 Attachments: HIVE-4333.1.patch.txt, HIVE-4333.D10389.1.patch


 Problem is different order of results on hadoop 2



[jira] [Commented] (HIVE-4333) most windowing tests fail on hadoop 2

2013-04-20 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637474#comment-13637474
 ] 

Ashutosh Chauhan commented on HIVE-4333:


+1 will commit.

 most windowing tests fail on hadoop 2
 -

 Key: HIVE-4333
 URL: https://issues.apache.org/jira/browse/HIVE-4333
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Affects Versions: 0.11.0
Reporter: Gunther Hagleitner
Assignee: Harish Butani
 Attachments: HIVE-4333.1.patch.txt, HIVE-4333.D10389.1.patch, 
 HIVE-4333.D10389.2.patch


 Problem is different order of results on hadoop 2



[jira] [Commented] (HIVE-4333) most windowing tests fail on hadoop 2

2013-04-19 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13636446#comment-13636446
 ] 

Ashutosh Chauhan commented on HIVE-4333:


[~rhbutani] Can you create a Phabricator entry for this? Since it's a huge 
patch, it's hard to read the diff file.

 most windowing tests fail on hadoop 2
 -

 Key: HIVE-4333
 URL: https://issues.apache.org/jira/browse/HIVE-4333
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Matthew Weaver
 Attachments: HIVE-4333.1.patch.txt


 Problem is different order of results on hadoop 2



[jira] [Commented] (HIVE-4333) most windowing tests fail on hadoop 2

2013-04-19 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13636456#comment-13636456
 ] 

Ashutosh Chauhan commented on HIVE-4333:


bq. There are diffs because of precision. Some of the avg and sum functions are 
now wrapped in 'round'
I didn't get this part. All this computation is within Hive, so it shouldn't be 
affected by the hadoop version. Wrapped in 'round'? In Hive or Hadoop?

bq. Looks like the shuffle in 2.0 reorders the rows even in this case.
Yeah, that's possible. Since over() partitions by a constant, all rows have the 
same value for the partitioning column, so they can arrive in any order. We 
need to come up with a clever way of writing a test that still tests over() but 
gives ordered results on both hadoop 1 and hadoop 2.


 most windowing tests fail on hadoop 2
 -

 Key: HIVE-4333
 URL: https://issues.apache.org/jira/browse/HIVE-4333
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Matthew Weaver
 Attachments: HIVE-4333.1.patch.txt


 Problem is different order of results on hadoop 2



[jira] [Commented] (HIVE-4333) most windowing tests fail on hadoop 2

2013-04-19 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13636548#comment-13636548
 ] 

Harish Butani commented on HIVE-4333:
-

I think the precision diffs come from the same ordering issue. Since the rows 
in the partitions are not in the same order, there are differences in the 
overall sum/avg beyond 2 decimal places.
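
To illustrate why row order alone can change a sum, here is a minimal Python sketch (not Hive code). It assumes the aggregate accumulates double-precision values one row at a time; the values and the helper name running_total are hypothetical and not taken from the part table.

```python
# Hypothetical values; they only stand in for a double-typed column.
prices = [0.1, 0.2, 0.3]


def running_total(values):
    """Accumulate values left to right, the way a row-at-a-time sum would."""
    total = 0.0
    for v in values:
        total += v
    return total


forward = running_total(prices)                   # rows arrive in one order
backward = running_total(list(reversed(prices)))  # same rows, reverse order

# Floating-point addition is not associative, so the two totals differ
# in their last digits even though the inputs are identical.
print(forward == backward)

# Rounding both totals hides the order-dependent digits.
print(round(forward, 1) == round(backward, 1))
```

This is the same effect that wrapping sum/avg in 'round' is meant to absorb in the test queries.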

 most windowing tests fail on hadoop 2
 -

 Key: HIVE-4333
 URL: https://issues.apache.org/jira/browse/HIVE-4333
 Project: Hive
  Issue Type: Bug
  Components: PTF-Windowing
Affects Versions: 0.11.0
Reporter: Gunther Hagleitner
Assignee: Matthew Weaver
 Attachments: HIVE-4333.1.patch.txt, HIVE-4333.D10389.1.patch


 Problem is different order of results on hadoop 2



[jira] [Commented] (HIVE-4333) most windowing tests fail on hadoop 2

2013-04-18 Thread Harish Butani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635362#comment-13635362
 ] 

Harish Butani commented on HIVE-4333:
-

Attached a patch. The changes fall into these categories:

- Some queries had 'partition by p_mfgr order by p_mfgr' or just 'partition by 
p_mfgr'. In these cases rows within a partition do not arrive in the same 
order as in hadoop 1. Changed to 'partition by p_mfgr order by p_name'.
- Manufacturer 1 has 2 rows with exactly the same data, so if we use a 'row 
based window' there are diffs between hadoop 1 and 2. Changed to using a 
'range based window'.
- There are diffs because of precision. Some of the avg and sum functions are 
now wrapped in 'round'.
- Finally, tests with the empty over() on fns that relied on order had to be 
changed. 
For example, leadlag.q Query 8. I tried the following change:
{noformat}
select p_name, p_retailprice,
lead(p_retailprice) over() as l1 ,
lag(p_retailprice)  over() as l2
from (select p_name, p_retailprice from part where p_mfgr = 'Manufacturer#1' 
order by p_name, p_retailprice ) p;
{noformat}

The output in hadoop 1 is:
{noformat}
almond antique burnished rose metallic  1173.15 1173.15 NULL
almond antique burnished rose metallic  1173.15 1753.76 1173.15
almond antique chartreuse lavender yellow   1753.76 1602.59 1173.15
almond antique salmon chartreuse burlywood  1602.59 1414.42 1753.76
almond aquamarine burnished black steel 1414.42 1632.66 1602.59
almond aquamarine pink moccasin thistle 1632.66 NULL 1414.42
{noformat}

The input to the lead and lag query is ordered on p_name and p_retailprice and 
is very small, just 6 rows (so only 1 mapper is involved). In 1.0 the rows 
arrive at the reducer in the same order as the input.


In hadoop 2.0 the result is:
{noformat}
almond aquamarine pink moccasin thistle 1632.66 1414.42 NULL
almond aquamarine burnished black steel 1414.42 1602.59 1632.66
almond antique salmon chartreuse burlywood  1602.59 1753.76 1414.42
almond antique chartreuse lavender yellow   1753.76 1173.15 1602.59
almond antique burnished rose metallic  1173.15 1173.15 1753.76
almond antique burnished rose metallic  1173.15 NULL 1173.15
{noformat}

Looks like the shuffle in 2.0 reorders the rows even in this case. 
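
As a rough illustration, the two outputs above can be reproduced with a small Python sketch (not Hive code) that computes lead and lag purely from arrival order, which is what an empty over() leaves them to depend on. The helper lead_lag is hypothetical. Reversing the input only mirrors the difference between the two listings; it is not a claim about how the hadoop 2 shuffle works.

```python
def lead_lag(rows):
    """Return (name, price, lead, lag) per row, based only on arrival order."""
    out = []
    for i, (name, price) in enumerate(rows):
        lead = rows[i + 1][1] if i + 1 < len(rows) else None  # next row's price
        lag = rows[i - 1][1] if i > 0 else None               # previous row's price
        out.append((name, price, lead, lag))
    return out


# The six Manufacturer#1 rows, in the order shown in the hadoop 1 output.
rows = [
    ("almond antique burnished rose metallic", 1173.15),
    ("almond antique burnished rose metallic", 1173.15),
    ("almond antique chartreuse lavender yellow", 1753.76),
    ("almond antique salmon chartreuse burlywood", 1602.59),
    ("almond aquamarine burnished black steel", 1414.42),
    ("almond aquamarine pink moccasin thistle", 1632.66),
]

hadoop1_like = lead_lag(rows)        # rows arrive in input order
hadoop2_like = lead_lag(rows[::-1])  # same rows, arriving reversed

for row in hadoop2_like:
    print(row)
```

Each row keeps the same neighbours in both runs, but its lead and lag values swap because the direction of travel flips.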

 most windowing tests fail on hadoop 2
 -

 Key: HIVE-4333
 URL: https://issues.apache.org/jira/browse/HIVE-4333
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Matthew Weaver
 Attachments: HIVE-4333.1.patch.txt


 Problem is different order of results on hadoop 2



[jira] [Commented] (HIVE-4333) most windowing tests fail on hadoop 2

2013-04-17 Thread Matthew Weaver (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13634570#comment-13634570
 ] 

Matthew Weaver commented on HIVE-4333:
--

The OVER clauses don't fully specify the ordering, causing different ordering 
of results and different values for FIRST and LAST.  The fix is just to add 
enough fields to guarantee an unambiguous ordering in the window.  This will 
fix many of the queries, maybe all.
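
A minimal Python sketch of the idea (not Hive code), assuming the window sort keeps tied rows in arrival order, which Python's stable sorted() mimics. The rows, column layout, and helper first_and_last are hypothetical.

```python
def first_and_last(rows, key):
    """Sort rows by key (stable) and return the first and last row of the window."""
    ordered = sorted(rows, key=key)
    return ordered[0], ordered[-1]


# Hypothetical (mfgr, name, size) rows, all in the same partition.
arrival_a = [("m1", "bolt", 2), ("m1", "nut", 7), ("m1", "gear", 5)]
arrival_b = list(reversed(arrival_a))  # same rows, different arrival order


def by_mfgr(row):
    return row[0]            # every row ties on this key


def by_mfgr_name(row):
    return (row[0], row[1])  # name breaks the ties


# Ambiguous ordering: first and last follow the arrival order.
print(first_and_last(arrival_a, by_mfgr))
print(first_and_last(arrival_b, by_mfgr))

# Unambiguous ordering: both arrival orders give the same answer.
print(first_and_last(arrival_a, by_mfgr_name))
print(first_and_last(arrival_b, by_mfgr_name))
```

Adding a tie-breaking column to the sort key corresponds to adding fields to the order by in the OVER clause.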

 most windowing tests fail on hadoop 2
 -

 Key: HIVE-4333
 URL: https://issues.apache.org/jira/browse/HIVE-4333
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Matthew Weaver

 Problem is different order of results on hadoop 2
