[jira] [Commented] (HIVE-3852) Multi-groupby optimization fails when same distinct column is used twice or more

2013-01-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556086#comment-13556086
 ] 

Hudson commented on HIVE-3852:
--

Integrated in hive-trunk-hadoop1 #20 (See 
[https://builds.apache.org/job/hive-trunk-hadoop1/20/])
HIVE-3852 Multi-groupby optimization fails when same distinct column is
used twice or more (Navis via namit) (Revision 1434600)

 Result = ABORTED
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1434600
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/queries/clientpositive/groupby10.q
* /hive/trunk/ql/src/test/results/clientpositive/groupby10.q.out


 Multi-groupby optimization fails when same distinct column is used twice or more

 Key: HIVE-3852
 URL: https://issues.apache.org/jira/browse/HIVE-3852
 Project: Hive
 Issue Type: Bug
 Components: Query Processor
 Reporter: Navis
 Assignee: Navis
 Priority: Trivial
 Fix For: 0.11.0

 Attachments: HIVE-3852.D7737.1.patch


 {code}
 FROM INPUT
 INSERT OVERWRITE TABLE dest1
 SELECT INPUT.key,
        sum(distinct substr(INPUT.value,5)),
        count(distinct substr(INPUT.value,5))
 GROUP BY INPUT.key
 INSERT OVERWRITE TABLE dest2
 SELECT INPUT.key,
        sum(distinct substr(INPUT.value,5)),
        avg(distinct substr(INPUT.value,5))
 GROUP BY INPUT.key;
 {code}
 fails with the exception: FAILED: IndexOutOfBoundsException Index: 0, Size: 0

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3852) Multi-groupby optimization fails when same distinct column is used twice or more

2013-01-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556585#comment-13556585
 ] 

Hudson commented on HIVE-3852:
--

Integrated in Hive-trunk-hadoop2 #70 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/70/])
HIVE-3852 Multi-groupby optimization fails when same distinct column is
used twice or more (Navis via namit) (Revision 1434600)

 Result = FAILURE
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1434600
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/queries/clientpositive/groupby10.q
* /hive/trunk/ql/src/test/results/clientpositive/groupby10.q.out




[jira] [Commented] (HIVE-3852) Multi-groupby optimization fails when same distinct column is used twice or more

2013-01-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556634#comment-13556634
 ] 

Hudson commented on HIVE-3852:
--

Integrated in Hive-trunk-h0.21 #1919 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1919/])
HIVE-3852 Multi-groupby optimization fails when same distinct column is
used twice or more (Navis via namit) (Revision 1434600)

 Result = SUCCESS
namit : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1434600
Files : 
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/queries/clientpositive/groupby10.q
* /hive/trunk/ql/src/test/results/clientpositive/groupby10.q.out




[jira] [Commented] (HIVE-3852) Multi-groupby optimization fails when same distinct column is used twice or more

2013-01-16 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555191#comment-13555191
 ] 

Ashutosh Chauhan commented on HIVE-3852:


Namit,
bq. Should we have this optimization now?
I am not sure which particular optimization you are referring to. I assume you 
mean there is no need for reduce-side group-bys anymore, since we have map-side 
aggregates. If so, I think those are still required. As Navis pointed out, if 
the reduction ratio is not high enough, mappers may run out of memory, and then 
we suggest that users turn off map-side aggregation.
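
For reference, a minimal sketch of the session settings involved. The property names are standard Hive configuration properties; the values shown are, to my knowledge, the usual defaults and are not recommendations made in this thread. Comments are kept on their own lines, since trailing comments after a SET statement are not reliably stripped by the CLI.

```sql
-- Map-side (hash) aggregation on or off for the session.
SET hive.map.aggr=true;

-- Fraction of mapper memory the aggregation hash table may use.
SET hive.map.aggr.hash.percentmemory=0.5;

-- Hash aggregation is abandoned at runtime if
-- (hash table entries / input rows) exceeds this value.
SET hive.map.aggr.hash.min.reduction=0.5;

-- Number of rows processed before the check above is made.
SET hive.groupby.mapaggr.checkinterval=100000;

-- Fallback when mappers still run out of memory:
-- disable map-side aggregation and aggregate on the reduce side only.
SET hive.map.aggr=false;
```

With hive.map.aggr=false, all aggregation happens in the reducers, which is why the reduce-side group-by code paths are still needed.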




[jira] [Commented] (HIVE-3852) Multi-groupby optimization fails when same distinct column is used twice or more

2013-01-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555904#comment-13555904
 ] 

Namit Jain commented on HIVE-3852:
--

OK, I agree.
There may be scenarios in which this is useful.

I will review.



[jira] [Commented] (HIVE-3852) Multi-groupby optimization fails when same distinct column is used twice or more

2013-01-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555908#comment-13555908
 ] 

Namit Jain commented on HIVE-3852:
--

+1



[jira] [Commented] (HIVE-3852) Multi-groupby optimization fails when same distinct column is used twice or more

2013-01-14 Thread Navis (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13553336#comment-13553336
 ] 

Navis commented on HIVE-3852:
-

Namit, 
I don't think I'm the right person to answer this, but IMHO it depends on the 
reduction ratio achieved by map-side aggregation. If the group-by column is 
rather distinctive, this optimization could be useful, but if it's not, two (or 
more) MR tasks would be faster. 
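
One rough way to gauge that ratio ahead of time is to compare the number of distinct map-side keys with the number of input rows. The query below is only an illustration against the INPUT table from the issue description. It assumes the map-side hash key is the grouping column together with the distinct expression, and the reduction_estimate alias is made up for the example; Hive's own runtime check may measure this differently.

```sql
-- Illustrative estimate, not Hive's internal calculation:
-- distinct (group-by key, distinct expression) pairs per input row.
SELECT count(DISTINCT key, substr(value, 5)) / count(*) AS reduction_estimate
FROM INPUT;
```

A result close to 1 means nearly every row carries a new key, so map-side hashing collapses very little; a result close to 0 means most rows are absorbed on the map side.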



[jira] [Commented] (HIVE-3852) Multi-groupby optimization fails when same distinct column is used twice or more

2013-01-07 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545819#comment-13545819
 ] 

Namit Jain commented on HIVE-3852:
--

[~navis], I have a higher-level question.
Should we have this optimization now?
I mean, is this really needed with map-side aggregates, or can we remove this 
code completely?
