[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2015-01-31 Thread Arun Gurumurthi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299962#comment-14299962
 ] 

Arun Gurumurthi commented on HIVE-3552:
---

Hi Namit,

This functionality is great.

Will there be further enhancements to allow the following:
a) Rollup & Cube to allow formats such as -

Current format: group by a, b, c with rollup

New formats:
a) group by rollup(a, b, c) -- gives the same output as the current format
b) group by rollup((a, b), c) -- gives output for (a, b, c) / (a, b) / total
c) group by a, rollup((b, c), d) -- gives output for (a, b, c, d) / (a, b, c) / (a)

Similar functionality for CUBE.

These would let us use rollup instead of spelling out too many combinations
with grouping sets, for cases where we do not want all combinations, only a
selective subset that is still too large to list by hand in grouping sets.
(A GROUPING SETS workaround is sketched below.)

This functionality is available in other RDBMSs.
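For reference, the selective combinations can already be written out with
GROUPING SETS; the request above is essentially to avoid enumerating them.
A rough sketch of the equivalent of rollup((a, b), c), assuming a table t
with columns a, b, c (all names are placeholders only):

select a, b, c, count(*)
from t
group by a, b, c
grouping sets ((a, b, c), (a, b), ());

This produces the same three sets -- (a, b, c), (a, b), and the grand total --
but becomes unwieldy once the number of desired sets grows, which is exactly
the motivation for the rollup((a, b), c) style of syntax.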

Another request: if we use cube but want only the grouping sets we need
instead of all combinations, I tried using grouping__ID to filter specific
sets, and the query below does not return any output.

select a, b, c, grouping__ID, count(*)
from tableA
group by a, b, c with cube
having grouping__ID = 2
;
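A possible workaround (a sketch only, not verified here; tableA and the
column names are taken from the query above) is to compute grouping__ID in an
inner query and filter in an outer one, rather than in HAVING:

select a, b, c, gid, cnt
from (
  select a, b, c, grouping__ID as gid, count(*) as cnt
  from tableA
  group by a, b, c with cube
) tmp
where gid = 2;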

Thanks
Arun

 HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
 high number of grouping set keys
 -

 Key: HIVE-3552
 URL: https://issues.apache.org/jira/browse/HIVE-3552
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.11.0

 Attachments: hive.3552.1.patch, hive.3552.10.patch, 
 hive.3552.11.patch, hive.3552.12.patch, hive.3552.2.patch, hive.3552.3.patch, 
 hive.3552.4.patch, hive.3552.5.patch, hive.3552.6.patch, hive.3552.7.patch, 
 hive.3552.8.patch, hive.3552.9.patch


 This is a follow-up for HIVE-3433.
 Had an offline discussion with Sambavi - she pointed out a scenario where the
 implementation in HIVE-3433 will not scale. Assume that the user is performing
 a cube on many columns, say 8 columns. Then each row would generate 256 rows
 for the hash table, which may kill the current group by implementation.
 A better implementation would be to add an additional MR job: in the first
 MR job, perform the group by as if there were no cube. Add another MR job
 where you would perform the cube. The assumption is that the group by would
 have decreased the output data significantly, and the rows would arrive in
 the order of the grouping keys, which gives a higher probability of hitting
 the hash table.
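To illustrate the idea above with count(*) (a conceptual sketch only, not the
actual plan Hive generates; t and stage1 are placeholder names):

-- first job: plain group by, no cube, to shrink the data
select a, b, c, count(*) as partial_cnt
from t
group by a, b, c;

-- second job: cube over the (much smaller) first-job output,
-- combining the partial counts
select a, b, c, sum(partial_cnt)
from stage1
group by a, b, c with cube;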



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2014-01-22 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878542#comment-13878542
 ] 

Lefty Leverenz commented on HIVE-3552:
--

This adds hive.new.job.grouping.set.cardinality to HiveConf.java and 
hive-default.xml.template.

Also documented in the wiki, with a link to this JIRA ticket: [Query Execution|https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryExecution] (search for grouping).
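For example, whether Hive adds the extra map-reduce job is governed by this
parameter; the value below is only illustrative, and the table and columns
are placeholders:

set hive.new.job.grouping.set.cardinality=30;

select a, b, c, d, e, f, g, h, count(*)
from t
group by a, b, c, d, e, f, g, h with cube;

If the number of rows each input row expands into (256 for an 8-column cube)
is larger than the configured cardinality, the additional job described in
this issue is expected to be used.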

 HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
 high number of grouping set keys
 -

 Key: HIVE-3552
 URL: https://issues.apache.org/jira/browse/HIVE-3552
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.11.0

 Attachments: hive.3552.1.patch, hive.3552.10.patch, 
 hive.3552.11.patch, hive.3552.12.patch, hive.3552.2.patch, hive.3552.3.patch, 
 hive.3552.4.patch, hive.3552.5.patch, hive.3552.6.patch, hive.3552.7.patch, 
 hive.3552.8.patch, hive.3552.9.patch


 This is a follow-up for HIVE-3433.
 Had an offline discussion with Sambavi - she pointed out a scenario where the
 implementation in HIVE-3433 will not scale. Assume that the user is performing
 a cube on many columns, say 8 columns. Then each row would generate 256 rows
 for the hash table, which may kill the current group by implementation.
 A better implementation would be to add an additional MR job: in the first
 MR job, perform the group by as if there were no cube. Add another MR job
 where you would perform the cube. The assumption is that the group by would
 have decreased the output data significantly, and the rows would arrive in
 the order of the grouping keys, which gives a higher probability of hitting
 the hash table.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2014-01-22 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13878581#comment-13878581
 ] 

Lefty Leverenz commented on HIVE-3552:
--

The main wikidoc for this is here: 

* [Enhanced Aggregation, Cube, Grouping and Rollup|https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C+Grouping+and+Rollup]

 HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
 high number of grouping set keys
 -

 Key: HIVE-3552
 URL: https://issues.apache.org/jira/browse/HIVE-3552
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.11.0

 Attachments: hive.3552.1.patch, hive.3552.10.patch, 
 hive.3552.11.patch, hive.3552.12.patch, hive.3552.2.patch, hive.3552.3.patch, 
 hive.3552.4.patch, hive.3552.5.patch, hive.3552.6.patch, hive.3552.7.patch, 
 hive.3552.8.patch, hive.3552.9.patch


 This is a follow-up for HIVE-3433.
 Had an offline discussion with Sambavi - she pointed out a scenario where the
 implementation in HIVE-3433 will not scale. Assume that the user is performing
 a cube on many columns, say 8 columns. Then each row would generate 256 rows
 for the hash table, which may kill the current group by implementation.
 A better implementation would be to add an additional MR job: in the first
 MR job, perform the group by as if there were no cube. Add another MR job
 where you would perform the cube. The assumption is that the group by would
 have decreased the output data significantly, and the rows would arrive in
 the order of the grouping keys, which gives a higher probability of hitting
 the hash table.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2013-06-26 Thread Irwin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694416#comment-13694416
 ] 

Irwin commented on HIVE-3552:
-

I have tested cubes and rollups, but they failed.
My table is t1, formatted as follows:

The error message is:

I have tried both hive-0.10.0 and hive-0.11.0, and the error is the same.
Why can't I use Enhanced Aggregation, Cube, Grouping and Rollup?
Can anyone help? Thanks!

 HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
 high number of grouping set keys
 -

 Key: HIVE-3552
 URL: https://issues.apache.org/jira/browse/HIVE-3552
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Fix For: 0.11.0

 Attachments: hive.3552.10.patch, hive.3552.11.patch, 
 hive.3552.12.patch, hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, 
 hive.3552.4.patch, hive.3552.5.patch, hive.3552.6.patch, hive.3552.7.patch, 
 hive.3552.8.patch, hive.3552.9.patch


 This is a follow-up for HIVE-3433.
 Had an offline discussion with Sambavi - she pointed out a scenario where the
 implementation in HIVE-3433 will not scale. Assume that the user is performing
 a cube on many columns, say 8 columns. Then each row would generate 256 rows
 for the hash table, which may kill the current group by implementation.
 A better implementation would be to add an additional MR job: in the first
 MR job, perform the group by as if there were no cube. Add another MR job
 where you would perform the cube. The assumption is that the group by would
 have decreased the output data significantly, and the rows would arrive in
 the order of the grouping keys, which gives a higher probability of hitting
 the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2013-01-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549423#comment-13549423
 ] 

Hudson commented on HIVE-3552:
--

Integrated in Hive-trunk-h0.21 #1904 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1904/])
HIVE-3552. performant manner for performing cubes/rollups/grouping sets for 
a high number of grouping set keys. (Revision 1430979)

 Result = SUCCESS
kevinwilfong : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1430979
Files : 
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/data/files/grouping_sets1.txt
* /hive/trunk/data/files/grouping_sets2.txt
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/queries/clientnegative/groupby_grouping_sets6.q
* /hive/trunk/ql/src/test/queries/clientnegative/groupby_grouping_sets7.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets2.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets3.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets4.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets5.q
* /hive/trunk/ql/src/test/results/clientnegative/groupby_grouping_sets6.q.out
* /hive/trunk/ql/src/test/results/clientnegative/groupby_grouping_sets7.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets4.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets5.q.out
* /hive/trunk/ql/src/test/results/compiler/plan/groupby1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby3.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby4.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby5.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby6.q.xml


 HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
 high number of grouping set keys
 -

 Key: HIVE-3552
 URL: https://issues.apache.org/jira/browse/HIVE-3552
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3552.10.patch, hive.3552.11.patch, 
 hive.3552.12.patch, hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, 
 hive.3552.4.patch, hive.3552.5.patch, hive.3552.6.patch, hive.3552.7.patch, 
 hive.3552.8.patch, hive.3552.9.patch


 This is a follow-up for HIVE-3433.
 Had an offline discussion with Sambavi - she pointed out a scenario where the
 implementation in HIVE-3433 will not scale. Assume that the user is performing
 a cube on many columns, say 8 columns. Then each row would generate 256 rows
 for the hash table, which may kill the current group by implementation.
 A better implementation would be to add an additional MR job: in the first
 MR job, perform the group by as if there were no cube. Add another MR job
 where you would perform the cube. The assumption is that the group by would
 have decreased the output data significantly, and the rows would arrive in
 the order of the grouping keys, which gives a higher probability of hitting
 the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2013-01-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549459#comment-13549459
 ] 

Hudson commented on HIVE-3552:
--

Integrated in Hive-trunk-hadoop2 #56 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/56/])
HIVE-3552. performant manner for performing cubes/rollups/grouping sets for 
a high number of grouping set keys. (Revision 1430979)

 Result = FAILURE
kevinwilfong : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1430979
Files : 
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/data/files/grouping_sets1.txt
* /hive/trunk/data/files/grouping_sets2.txt
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/queries/clientnegative/groupby_grouping_sets6.q
* /hive/trunk/ql/src/test/queries/clientnegative/groupby_grouping_sets7.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets2.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets3.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets4.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets5.q
* /hive/trunk/ql/src/test/results/clientnegative/groupby_grouping_sets6.q.out
* /hive/trunk/ql/src/test/results/clientnegative/groupby_grouping_sets7.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets4.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets5.q.out
* /hive/trunk/ql/src/test/results/compiler/plan/groupby1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby3.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby4.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby5.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby6.q.xml


 HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
 high number of grouping set keys
 -

 Key: HIVE-3552
 URL: https://issues.apache.org/jira/browse/HIVE-3552
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3552.10.patch, hive.3552.11.patch, 
 hive.3552.12.patch, hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, 
 hive.3552.4.patch, hive.3552.5.patch, hive.3552.6.patch, hive.3552.7.patch, 
 hive.3552.8.patch, hive.3552.9.patch


 This is a follow-up for HIVE-3433.
 Had an offline discussion with Sambavi - she pointed out a scenario where the
 implementation in HIVE-3433 will not scale. Assume that the user is performing
 a cube on many columns, say 8 columns. Then each row would generate 256 rows
 for the hash table, which may kill the current group by implementation.
 A better implementation would be to add an additional MR job: in the first
 MR job, perform the group by as if there were no cube. Add another MR job
 where you would perform the cube. The assumption is that the group by would
 have decreased the output data significantly, and the rows would arrive in
 the order of the grouping keys, which gives a higher probability of hitting
 the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2013-01-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13548842#comment-13548842
 ] 

Hudson commented on HIVE-3552:
--

Integrated in hive-trunk-hadoop1 #4 (See 
[https://builds.apache.org/job/hive-trunk-hadoop1/4/])
HIVE-3552. performant manner for performing cubes/rollups/grouping sets for 
a high number of grouping set keys. (Revision 1430979)

 Result = ABORTED
kevinwilfong : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1430979
Files : 
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/data/files/grouping_sets1.txt
* /hive/trunk/data/files/grouping_sets2.txt
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
* /hive/trunk/ql/src/test/queries/clientnegative/groupby_grouping_sets6.q
* /hive/trunk/ql/src/test/queries/clientnegative/groupby_grouping_sets7.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets2.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets3.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets4.q
* /hive/trunk/ql/src/test/queries/clientpositive/groupby_grouping_sets5.q
* /hive/trunk/ql/src/test/results/clientnegative/groupby_grouping_sets6.q.out
* /hive/trunk/ql/src/test/results/clientnegative/groupby_grouping_sets7.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets2.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets3.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets4.q.out
* /hive/trunk/ql/src/test/results/clientpositive/groupby_grouping_sets5.q.out
* /hive/trunk/ql/src/test/results/compiler/plan/groupby1.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby2.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby3.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby4.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby5.q.xml
* /hive/trunk/ql/src/test/results/compiler/plan/groupby6.q.xml


 HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
 high number of grouping set keys
 -

 Key: HIVE-3552
 URL: https://issues.apache.org/jira/browse/HIVE-3552
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3552.10.patch, hive.3552.11.patch, 
 hive.3552.12.patch, hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, 
 hive.3552.4.patch, hive.3552.5.patch, hive.3552.6.patch, hive.3552.7.patch, 
 hive.3552.8.patch, hive.3552.9.patch


 This is a follow-up for HIVE-3433.
 Had an offline discussion with Sambavi - she pointed out a scenario where the
 implementation in HIVE-3433 will not scale. Assume that the user is performing
 a cube on many columns, say 8 columns. Then each row would generate 256 rows
 for the hash table, which may kill the current group by implementation.
 A better implementation would be to add an additional MR job: in the first
 MR job, perform the group by as if there were no cube. Add another MR job
 where you would perform the cube. The assumption is that the group by would
 have decreased the output data significantly, and the rows would arrive in
 the order of the grouping keys, which gives a higher probability of hitting
 the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2013-01-08 Thread Kevin Wilfong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547083#comment-13547083
 ] 

Kevin Wilfong commented on HIVE-3552:
-

+1

 HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
 high number of grouping set keys
 -

 Key: HIVE-3552
 URL: https://issues.apache.org/jira/browse/HIVE-3552
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3552.10.patch, hive.3552.11.patch, 
 hive.3552.12.patch, hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, 
 hive.3552.4.patch, hive.3552.5.patch, hive.3552.6.patch, hive.3552.7.patch, 
 hive.3552.8.patch, hive.3552.9.patch


 This is a follow-up for HIVE-3433.
 Had an offline discussion with Sambavi - she pointed out a scenario where the
 implementation in HIVE-3433 will not scale. Assume that the user is performing
 a cube on many columns, say 8 columns. Then each row would generate 256 rows
 for the hash table, which may kill the current group by implementation.
 A better implementation would be to add an additional MR job: in the first
 MR job, perform the group by as if there were no cube. Add another MR job
 where you would perform the cube. The assumption is that the group by would
 have decreased the output data significantly, and the rows would arrive in
 the order of the grouping keys, which gives a higher probability of hitting
 the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2013-01-02 Thread Kevin Wilfong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13542382#comment-13542382
 ] 

Kevin Wilfong commented on HIVE-3552:
-

The patch needs to be updated; it's not applying cleanly.

Some minor style comments on Phabricator.

 HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
 high number of grouping set keys
 -

 Key: HIVE-3552
 URL: https://issues.apache.org/jira/browse/HIVE-3552
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3552.10.patch, hive.3552.11.patch, 
 hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, hive.3552.4.patch, 
 hive.3552.5.patch, hive.3552.6.patch, hive.3552.7.patch, hive.3552.8.patch, 
 hive.3552.9.patch


 This is a follow-up for HIVE-3433.
 Had an offline discussion with Sambavi - she pointed out a scenario where the
 implementation in HIVE-3433 will not scale. Assume that the user is performing
 a cube on many columns, say 8 columns. Then each row would generate 256 rows
 for the hash table, which may kill the current group by implementation.
 A better implementation would be to add an additional MR job: in the first
 MR job, perform the group by as if there were no cube. Add another MR job
 where you would perform the cube. The assumption is that the group by would
 have decreased the output data significantly, and the rows would arrive in
 the order of the grouping keys, which gives a higher probability of hitting
 the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-12-20 Thread Kevin Wilfong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537462#comment-13537462
 ] 

Kevin Wilfong commented on HIVE-3552:
-

A few more comments on Phabricator.

 HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
 high number of grouping set keys
 -

 Key: HIVE-3552
 URL: https://issues.apache.org/jira/browse/HIVE-3552
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3552.10.patch, hive.3552.1.patch, 
 hive.3552.2.patch, hive.3552.3.patch, hive.3552.4.patch, hive.3552.5.patch, 
 hive.3552.6.patch, hive.3552.7.patch, hive.3552.8.patch, hive.3552.9.patch


 This is a follow-up for HIVE-3433.
 Had an offline discussion with Sambavi - she pointed out a scenario where the
 implementation in HIVE-3433 will not scale. Assume that the user is performing
 a cube on many columns, say 8 columns. Then each row would generate 256 rows
 for the hash table, which may kill the current group by implementation.
 A better implementation would be to add an additional MR job: in the first
 MR job, perform the group by as if there were no cube. Add another MR job
 where you would perform the cube. The assumption is that the group by would
 have decreased the output data significantly, and the rows would arrive in
 the order of the grouping keys, which gives a higher probability of hitting
 the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-12-19 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13536762#comment-13536762
 ] 

Namit Jain commented on HIVE-3552:
--

Refreshed and attached the latest patch.

 HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
 high number of grouping set keys
 -

 Key: HIVE-3552
 URL: https://issues.apache.org/jira/browse/HIVE-3552
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3552.10.patch, hive.3552.1.patch, 
 hive.3552.2.patch, hive.3552.3.patch, hive.3552.4.patch, hive.3552.5.patch, 
 hive.3552.6.patch, hive.3552.7.patch, hive.3552.8.patch, hive.3552.9.patch


 This is a follow-up for HIVE-3433.
 Had an offline discussion with Sambavi - she pointed out a scenario where the
 implementation in HIVE-3433 will not scale. Assume that the user is performing
 a cube on many columns, say 8 columns. Then each row would generate 256 rows
 for the hash table, which may kill the current group by implementation.
 A better implementation would be to add an additional MR job: in the first
 MR job, perform the group by as if there were no cube. Add another MR job
 where you would perform the cube. The assumption is that the group by would
 have decreased the output data significantly, and the rows would arrive in
 the order of the grouping keys, which gives a higher probability of hitting
 the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-12-17 Thread Kevin Wilfong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534273#comment-13534273
 ] 

Kevin Wilfong commented on HIVE-3552:
-

Added a couple of comments on Phabricator.

 HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
 high number of grouping set keys
 -

 Key: HIVE-3552
 URL: https://issues.apache.org/jira/browse/HIVE-3552
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, 
 hive.3552.4.patch


 This is a follow-up for HIVE-3433.
 Had an offline discussion with Sambavi - she pointed out a scenario where the
 implementation in HIVE-3433 will not scale. Assume that the user is performing
 a cube on many columns, say 8 columns. Then each row would generate 256 rows
 for the hash table, which may kill the current group by implementation.
 A better implementation would be to add an additional MR job: in the first
 MR job, perform the group by as if there were no cube. Add another MR job
 where you would perform the cube. The assumption is that the group by would
 have decreased the output data significantly, and the rows would arrive in
 the order of the grouping keys, which gives a higher probability of hitting
 the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-12-12 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530021#comment-13530021
 ] 

Namit Jain commented on HIVE-3552:
--

comments addressed + tests passed

 HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
 high number of grouping set keys
 -

 Key: HIVE-3552
 URL: https://issues.apache.org/jira/browse/HIVE-3552
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3552.1.patch, hive.3552.2.patch, hive.3552.3.patch, 
 hive.3552.4.patch


 This is a follow-up for HIVE-3433.
 Had an offline discussion with Sambavi - she pointed out a scenario where the
 implementation in HIVE-3433 will not scale. Assume that the user is performing
 a cube on many columns, say 8 columns. Then each row would generate 256 rows
 for the hash table, which may kill the current group by implementation.
 A better implementation would be to add an additional MR job: in the first
 MR job, perform the group by as if there were no cube. Add another MR job
 where you would perform the cube. The assumption is that the group by would
 have decreased the output data significantly, and the rows would arrive in
 the order of the grouping keys, which gives a higher probability of hitting
 the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3552) HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a high number of grouping set keys

2012-12-03 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508575#comment-13508575
 ] 

Namit Jain commented on HIVE-3552:
--

tests passed

 HIVE-3552 performant manner for performing cubes/rollups/grouping sets for a 
 high number of grouping set keys
 -

 Key: HIVE-3552
 URL: https://issues.apache.org/jira/browse/HIVE-3552
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3552.1.patch


 This is a follow-up for HIVE-3433.
 Had an offline discussion with Sambavi - she pointed out a scenario where the
 implementation in HIVE-3433 will not scale. Assume that the user is performing
 a cube on many columns, say 8 columns. Then each row would generate 256 rows
 for the hash table, which may kill the current group by implementation.
 A better implementation would be to add an additional MR job: in the first
 MR job, perform the group by as if there were no cube. Add another MR job
 where you would perform the cube. The assumption is that the group by would
 have decreased the output data significantly, and the rows would arrive in
 the order of the grouping keys, which gives a higher probability of hitting
 the hash table.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira